Thanks for the encouragement. It's both encouraging and intimidating to hear that multithreaded rendering hasn't been seriously attempted before.
I have an ace up my sleeve: ThreadSanitizer, which ships with both GCC and LLVM/Clang. It works like Valgrind/AddressSanitizer but reports data races instead of buffer overflows. I figured out how to use it and eliminated the fatal race conditions one by one. The patch is attached if anyone wants to try it.
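For anyone who hasn't used it, here is a minimal sketch of the kind of bug ThreadSanitizer catches. It is not taken from the attached patch; the names (Tally, add_unsafe, add_safe, run) are made up for illustration. When built with -fsanitize=thread -g on GCC or Clang, running the unsynchronized path from two threads should be reported as a data race, while the mutex-guarded path should run clean.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared state, standing in for something like a dirty-rect list.
struct Tally {
    long count = 0;
    std::mutex lock;

    // Unsynchronized read-modify-write: a data race if called concurrently.
    void add_unsafe() { ++count; }

    // Same update, guarded by the mutex.
    void add_safe() {
        std::lock_guard<std::mutex> guard(lock);
        ++count;
    }
};

// Two threads each bump the tally n times; returns the final count.
// With use_mutex == false the result is undefined (that is the bug).
long run(int n, bool use_mutex) {
    Tally t;
    auto work = [&t, n, use_mutex] {
        for (int i = 0; i < n; ++i) {
            if (use_mutex) {
                t.add_safe();
            } else {
                t.add_unsafe();
            }
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return t.count;  // safe to read: both threads have been joined
}
```

Calling run(100000, true) always yields 200000. Calling run(100000, false) may appear to work in a normal build, which is exactly why the sanitizer is useful: it flags the race even when the output happens to look right.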
Here are the timings I got on a complex scene with lots of filters (unexpected_visitor.svg from the discussion on my vectorized Gaussian blur):
1 thread:                  4.2 s
8 threads (hyperthreaded): 1.1 s
1 thread (8 for filters):  2.6 s
CPU: Intel 4770 @ 3.4 GHz
Memory: 2 channels DDR3 @ 1866 MHz
OS: Windows 10
So that's a ~2.4x speedup over the current implementation (2.6 s vs. 1.1 s). Not bad, but on another synthetic scene (3 heavily blurred boxes), multithreading is actually more than 2x slower than a single thread! This was very puzzling, but I finally figured it out: the multithreaded render does almost 4x more work.
When rendering a filtered object, all objects behind it have to be rendered immediately, and that intermediate rendering can cover a larger area than the rendered region itself, since filters can access neighboring pixels. For the blur filter, the expanded region was far bigger than the rectangle each thread was rendering to!
This also explains why the current renderer, without multithreading, is very slow when zooming into a heavily blurred region: the rendering is done in blocks to improve responsiveness, but the block size (64k) is too small. Please consider increasing it. Perhaps make it 1/8 of the window height?
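To put rough numbers on the extra work, here is a back-of-the-envelope sketch. It assumes a simplified model in which each tile renders its background over the tile rectangle grown by the filter margin on every side. The function names, the frame size, and the 100 px margin are made-up values for illustration, not measurements from Inkscape.

```cpp
#include <cstdint>

// Background pixels rendered for one w x h tile when a filter needs
// `margin` extra pixels on every side (hypothetical model).
std::int64_t expanded_area(std::int64_t w, std::int64_t h, std::int64_t margin) {
    return (w + 2 * margin) * (h + 2 * margin);
}

// Total background pixels when a w x h frame is split into `strips`
// equal horizontal strips that are each rendered independently.
// Assumes h is divisible by strips.
std::int64_t total_background(std::int64_t w, std::int64_t h, int strips,
                              std::int64_t margin) {
    return strips * expanded_area(w, h / strips, margin);
}

// How much more background work the split render does than a single tile.
double split_overhead(std::int64_t w, std::int64_t h, int strips,
                      std::int64_t margin) {
    return static_cast<double>(total_background(w, h, strips, margin)) /
           static_cast<double>(total_background(w, h, 1, margin));
}
```

Under this model, a 1920x1080 frame with a 100 px margin renders 2,713,600 background pixels as one tile, but 5,681,600 in total as 8 strips, about 2.1x as much. The gap widens as the margin grows relative to the strip height, which is the effect described above.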
I hope this isn't a fundamental problem. Any thoughts on how this might be improved?
The other, safer approach is to multithread pixman, but that also has lots of challenges and probably won't be as fast for most scenes:
- lots of small functions to optimize
- some functions, like Cairo's scanline rendering (cairo_tor_scan_converter_generate()), are probably difficult or too small to be multithreaded
- needs more forks/joins (fine-grained parallelism). This will be bad on Windows, where the pthread wake-up latency is 7 times longer than on Linux.
"I don't know how widespread AVX2 is, or if the 1.3x improvement is a large enough benefit to warrant considering it for [pixman]"
AVX2 should be on all Intel processors since summer 2013 (Haswell). I measured the speedups again on my desktop, which has more than 2x the memory bandwidth of my laptop, and they're still quite weak. It must be that functions like blits (2D memcpy), fills, composite_in, and composite_out are all bandwidth limited, so wider SIMD isn't much help. You might say this bottleneck contradicts the speedups reported above, but keep in mind that 1 core alone can't use up all the memory bandwidth.
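As a rough sanity check on the bandwidth argument, here is a small sketch that computes the theoretical peak bandwidth of a DDR memory configuration and the minimum time a blit would take at that rate. The 8 bytes per transfer per channel is the standard 64-bit DDR channel width. The function names and the 1920x1080, 4 bytes per pixel frame are assumptions for illustration, and real sustained bandwidth is lower than the theoretical peak.

```cpp
#include <cstdint>

// Theoretical peak bandwidth in bytes/s: channels x 8 bytes x transfers/s.
double peak_bandwidth(int channels, double mega_transfers_per_s) {
    const double bytes_per_transfer = 8.0;  // one 64-bit DDR channel
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6;
}

// Minimum bytes moved by a blit: every pixel is read once and written once.
std::int64_t blit_traffic(std::int64_t w, std::int64_t h, int bytes_per_pixel) {
    return 2 * w * h * bytes_per_pixel;
}

// Lower bound on blit time in seconds if memory bandwidth is the only limit.
double min_blit_seconds(std::int64_t w, std::int64_t h, int bytes_per_pixel,
                        double bandwidth) {
    return static_cast<double>(blit_traffic(w, h, bytes_per_pixel)) / bandwidth;
}
```

For 2 channels at 1866 MT/s this gives about 29.9 GB/s peak, so a full-frame blit that misses the caches can't finish in much under 0.56 ms no matter how wide the SIMD registers are.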
I'll have a discussion with the pixman developers to see what they think. On a related note, I've also submitted a patch for Windows touchscreen support in GTK: https://bugzilla.gnome.org/show_bug.cgi?id=776568
-Yale
On Sun, Jan 1, 2017 at 1:42 AM, Bryce Harrington <bryce@...961...> wrote:
On Sat, Dec 31, 2016 at 06:21:40PM -0500, Yale Zhang wrote:
In my quest for a more fluid experience with fewer distractions, I've attempted to multithread the rendering. I see this has been a long standing discussion:
https://bugs.launchpad.net/inkscape/+bug/200415 https://bugs.launchpad.net/inkscape/+bug/330271
I'm using a Lenovo P40 tablet and the total frame rendering time for a simple piece is 80 to 100ms (1920x1080, a few hundred vertices on 2 layers with no filter effects - only alpha compositing). This slow rendering speed makes the touchscreen zooming I recently implemented very jerky.
So, I tried multithreading SPCanvas::paintRectInternal() with OpenMP by splitting the rectangle into 2 and rendering the halves in parallel. I used mutual exclusion for some obviously thread-unsafe code like the call to markRect in SPCanvas::paintSingleBuffer(). The rendering would work for a few frames before freezing (waiting threads time out and then exit).
Then, I put other calls that I suspected were thread unsafe in mutually exclusive blocks until I discovered _root->render() isn't safe. No point in going further.
Excuse my naive attempt. Can anyone guess how feasible it is to multithread the rendering? For now, I don't care if it's pixel perfect. I just need something that's decent and doesn't crash/freeze.
Yes, various people have looked at multi-threading before, but not found a feasible way to attack it.
I'm also wondering why the Cairo OpenGL backend isn't being used? GPU rendering on integrated GPUs should give a nice speedup since there should be no copying overhead.
On Linux, the cairo library is typically shipped with its GL backend disabled, so that presents sort of a logistical roadblock that would need to be solved. Also, while theoretically you're right it should provide a performance boost, it's not guaranteed. OpenGL has been experimental in Cairo and not as thoroughly tested as the X and other backends, so there may well be corner cases where performance is poorer. But there's no way to know for certain except to hook it up and try it out. A number of us have had this task on our todo list but I don't think anyone's taken a solid shot at it yet.
The other place to optimize is pixman. I did some profiling (rapidly zooming in and out with touchscreen) and >= 25% of the time is spent in pixman rendering. I already went ahead and ported a few functions to AVX2 and got a ~1.3x speedup (should get more, since my laptop is bottlenecked by memory bandwidth owing to having only 1 memory channel).
Since pixman is low-level and widely used, optimizations would be very interesting. I don't know how widespread AVX2 is, or if the 1.3x improvement is a large enough benefit to warrant considering it for Pixman, though. Regardless, I'd be interested in learning more of your work along these paths. Perhaps you'll discover something worth inclusion in upstream codebases?
Thanks, Bryce
Function                                                     Module                Samples
sse2_blt.part.0                                              libpixman-1-0.dll     4221
sse2_combine_in_u                                            libpixman-1-0.dll     2189
sse2_fill                                                    libpixman-1-0.dll     1693
cairo_tor_scan_converter_generate                            libcairo-2.dll        1494
sse2_composite_over_8888_8888                                libpixman-1-0.dll     1424
bits_image_fetch_separable_convolution_affine_none_a8r8g8b8  libpixman-1-0.dll     1104
feed_curve_to_cairo(_cairo*Geom::Curve const&                libinkscape_base.dll   611
fast_composite_scaled_bilinear_sse2_8888_8888_cover_SRC      libpixman-1-0.dll      475
fill_xrgb32_lerp_opaque_spans                                libcairo-2.dll         348
cairo_tor_scan_converter_add_polygon                         libcairo-2.dll         260
compute_face                                                 libcairo-2.dll         238
_dynamic_cast                                                libstdc++-6.dll        209
outer_join                                                   libcairo-2.dll         179
cairo_polygon_add_edge                                       libcairo-2.dll         178
g_hash_table_lookup                                          libglib-2.0-0.dll      169
cairo_spline_decompose_into                                  libcairo-2.dll         153
g_slice_alloc                                                libglib-2.0-0.dll      138
cairo_spline_intersects                                      libcairo-2.dll         131
feed_pathvector_to_cairo(_cairo*Geom::PathVector const&)     libinkscape_base.dll   127
line_to                                                      libcairo-2.dll         119
void std::vector<Geom::Pointstd::allocatorGeom::Point        libinkscape_base.dll   116
cairo_matrix_transform_point                                 libcairo-2.dll         110
cell_list_render_edge                                        libcairo-2.dll         106
g_type_check_instance_is_a                                   libgobject-2.0-0.dll   106