Thanks for the data. I realized there have been further problems with the benchmark on both our ends.
1. The speedups (Skylake i7-6700HQ vs. Haswell i7-4770, column I) aren't a valid comparison.
Those numbers are almost certainly the multithreaded throughput for 4 cores. I forgot to mention that if you want to benchmark single-thread throughput, you need to uncomment the line // omp_set_num_threads(1).
If I compare your numbers with mine from the 2nd sheet, the speedups range from 0.4x to 1.2x, with an average of 0.78x. This is believable, since you're using a power-sipping 2.6 GHz CPU and mine runs at 3.4 GHz. If I scale your numbers up by 3.4/2.6 (an optimistic correction, since in many cases, especially IIR, memory bandwidth is a bigger bottleneck than the CPU), the speedup becomes 1.02x.
That's about right. Haswell and Skylake should have almost the same performance per clock cycle.
2. The width and height for the benchmark (but not for the accuracy-checking test) are swapped, because IterateCombinations() is called with the width and height swapped. I made the same mistake, so there's no inconsistency between our results.
3. My desktop memory is only dual-channel. I must have gotten it mixed up with the desktop and servers I use at work, which are all quad-channel.
4. Multithreaded throughput is unstable: more than 10% run-to-run difference.
Could it be Windows' anti-malware service?
Also, I found that the performance of the optimized loops, where I use "goto middle" to handle SIMD remainders while keeping code size to a minimum (so the loop fits in cache or, even better, the uop cache), is fragile. The Microsoft compiler reorders the basic blocks, resulting in a loop that takes 2 branches per iteration instead of 1 and drops performance by almost 2x. Luckily, GCC 6 doesn't do that, which is why I used it for the benchmark. If anyone knows how to discourage the compiler from reordering code like that, I'd like to know.
I just tried GCC 6.1.0, but saw no noticeable difference.
"can I make it print also the reference speed?" Yes, set useRefCode in BenchmarkFunction() to true.
-Yale
On Tue, Sep 20, 2016 at 4:00 AM, Alexander Brock <a.brock@...2965...> wrote:
On 09/20/2016 03:33 AM, Yale Zhang wrote:
"'error loading'. I think I need the file." Right, that's a panel from my comic. I'd rather not share it, but you can just use any 8-bit color PNG. The resolution I used for the benchmark is 1467x1373.
I made a file with this resolution and ran the test. The program output is attached in the file log1
real    0m7.513s
user    0m57.092s
sys     0m0.284s
May I ask how you're using blurs in your work?
I only use Gaussian blur, and I use it very rarely.
I also ran the benchmark code, but it only prints the speed of the vectorized version. Can I make it print the reference speed too? I attached my results and my memory configuration.
Best Regards, Alexander
Inkscape-devel mailing list Inkscape-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/inkscape-devel