I have an update on my progress. I've made the following changes, which I'd like merged from my simdgenius_inkscape branch:

1. disable event compression (previously mentioned)

2. fix priority inversion (09f55021, previously mentioned) - without this, dragging objects too fast would cause no redraw.

3. draw immediately (78d47938) - this partially reverts Krzysztof's Hackfest 2016 changes, as was done in 0.92. This reduces latency by ~10 ms (compare cells G22 and E22 in the spreadsheet).

I originally wanted to remove the backing store and draw directly to the window surface, but I ran into this dilemma, so I didn't do it.


But there is still way more speedup to be had:

1. rendering cache slows things down 6x!  Compare cells E9 & E10 in the spreadsheet.

2. optimize GDK (maybe it's only slow on Windows) - compare cells J17 and J23 
      I made 2 changes:
      a. disabled layered windows - this has almost the same effect as setting GTK_CSD=0. I'm not clear why it speeds things up. I'm guessing it reduces image copying. Layered windows have to draw to CPU memory (GDI DIB), while non-layered windows draw directly to GPU memory (device dependent bitmap)?

      b. replace surface_content = gdk_window_get_content (window);   in gdk_window_begin_paint_internal()
          with surface_content = CAIRO_CONTENT_COLOR_ALPHA

          why create/delete a surface just to get the type?

3. multithreaded rendering - I've updated it to work with trunk, so you can try it. Very little speed up for simple scenes.


On Tue, Jan 9, 2018 at 2:23 AM, Eduard Braun <eduard.braun2@...173...> wrote:
Am 09.01.2018 um 08:58 schrieb Yale Zhang:
"so this might be tough to achieve unless we can motivate upstream sufficiently or can implement it ourselves."
Right, I don't think anyone cares about Windows GTK except LRN, you, and I. I'm pretty familiar with low level Windows stuff, so I feel up to the task.
That is great to hear! Let me know if I can help in any way down the road (testing, etc.).

I think the best way to proceed is to,

1. restore direct rendering in trunk
2. optimize GDK windows backend by not creating and destroying surfaces for every frame draw. Reduce buffer copying.
    Use DWM for transparency instead of the ancient layered windows. That can be a fallback for Windows < 7

3. (optional) see if there are any benefits to using DXGI or OpenGL surfaces. Earlier I thought they would be faster since they support page flipping instead of copying. But it seems page flipping is only for full screen apps, while for windowed apps, even when you swap buffers with wglSwapBuffers(), it's actually doing a copy.
This sounds like a good plan. As gtk3 has already dropped support for Windows XP we don't have to care for fallbacks.


On Mon, Jan 8, 2018 at 7:30 AM, Eduard Braun <eduard.braun2@...173...> wrote:
Am 08.01.2018 um 14:41 schrieb Yale Zhang:
Hurray, I've dug much deeper into the problem and have a very good picture of what's slow.
Thank you so much for digging into this! It would probably have taken me weeks to figure out only half of it...

The biggest cause of lag is bzr r14795 (Hackfest 2016: Fix SPCanvas to comply with GTK3 rendering model). That adds 13.7 ms of latency (maybe less without my changes) because instead of sending the rendered canvas immediately to the screen, it schedules a call to SPCanvas::handle_draw() by invalidating the drawn area. I've changed it back to the original method and it cuts the latency from 20 to 11.5 ms. Benchmark attached.
Is there any advantage of scheduling a draw here instead of rendering immediately? (i.e. is there any reason for doing it "the gtk3 way"?)
I know we had performance degradation even in gtk2 which is why this code was eventually reverted in 0.92.x branch restoring performance to the status-quo, see https://gitlab.com/inkscape/inkscape/commit/972b7daf0ea9f73b55e0b9e48503a130aefce9f5

*too many buffer copies (see inkscape_render_buffers.svg) -
    a. There's no need to draw to a separate buffer returned by gdk_window_begin_draw_frame() for eliminating tearing because Inkscape already renders to backing store.
       Or you can render directly to that buffer and not have the separate backing store.
This matches the observations in https://bugzilla.gnome.org/show_bug.cgi?id=781153#c8 and following comments.
From what I understand gtk3 always uses double buffering, so probably we could render directly to the buffer without risk for regressions in other environments?

    b. GDK should not have to copy from the temporary surface to the window surface in gdk_window_end_draw_frame(). It should use page flipping like in OpenGL and Direct3D. They should add an OpenGL or Direct2D backend for Windows.
I'm afraid the Windows backend is not under overly active development, so this might be tough to achieve unless we can motivate upstream sufficiently or can implement it ourselves.

    c. The buffer gdk_window_end_draw_frame() copies to might not be the actual screen (DC for Windows). That's the case for layered windows. There was a recent change to use layered windows for the Windows backend for transparency:

    I suspect that the legacy Win2k API is slower than the DWM functions for transparent windows.
    Also, the reason GTK_CSD=0 makes things slow on Windows is that it uses a slightly different rendering path:
      1. GTK_CSD=1 -  uses layered windows. Both Cairo surfaces are memory buffers and not actual GPU memory (DC in GDI terms). UpdateLayeredWindow() is used.
      2. GTK_CSD=0 -  seems both surfaces are actual GPU memory. My guesses to why it's slow are:
               i.  gdk_window_end_draw_frame() slow because it has to read the surface back from GPU to CPU memory, only to copy it back to GPU memory?
               ii.  rendering to a Cairo Win32 surface is slow? Cairo on Windows uses pixman for software rendering in CPU memory, but at what granularity does it upload the pixels to GPU memory?
*no need for gdk_window_begin_draw_frame() to clear buffer

We're going to have to seriously work with GTK developers to improve rendering speed for Windows.
The most active (only?) GTK developer working on the win32 backend seems to be LRN (author of the commit you linked) and I assume he's our best bet to get feedback.
I'm not sure how much he's willing to help with the GTK_CSD=0 part as it seems upstream is not overly interested in providing native looking apps and therefore mainly cares for the GTK_CSD=1 case and layered windows. On the other hand I can't imagine they'll put obstacles in our way if we can put in sufficient effort...


On Thu, Jan 4, 2018 at 2:39 AM, Eduard Braun <eduard.braun2@...3331.....> wrote:

Cool! I will try it this evening.

Any idea why this only seemed to affect Windows with CSD disabled? From your explanation I gather this could have been an issue on any platform.

Best Regards,

Am 04.01.2018 um 08:17 schrieb Yale Zhang:
OK, I think I've solved the slow dragging problem. Eduard, can you try my latest commit?

There are 4 events involved for dragging it seems:
1. mouse motion event (GdkEventMotion)
2. selection modified
   generated in Selection::_scheduled_modified() with priority 101
3. redraw
   generated in SPCanvas::addIdle() with priority 200
   calls SPCanvas::paint() which only redraws to *offscreen* buffer
4. refresh/expose generated in gtk_widget_queue_draw_area() with priority 120
   calls SPCanvas::handle_draw() which actually updates the screen

For fast redraw response, you want to execute #2, then #3, then #4 as soon as possible. This
requires the priority of each event to be higher than that of the previous one, or else mouse events that
come later will get processed before earlier redraw and expose events already in the queue. Since
this is real time stuff, the longer an event waits in the queue, the higher its priority should be.
Here it's the other way around, so this is a priority inversion.

So all I did was change the event priorities.
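
To illustrate the priority inversion described above, here is a minimal, self-contained C++ sketch of a toy main loop. It is not GLib or Inkscape code: the names (Priorities, screen_updates_during_drag) are made up for illustration, the mouse event priority of 0 is an assumption (GDK's default event priority), and the model dispatches exactly one pending item per turn, which is a simplification. The only convention borrowed from GLib is that a numerically lower value means a higher priority.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Kinds of work in the dragging pipeline (mouse -> modified -> redraw -> expose).
enum class Kind { Mouse, Modified, Redraw, Expose };

// GLib convention: a numerically LOWER value means a HIGHER priority.
struct Priorities {
    int mouse, modified, redraw, expose;
};

struct Pending {
    int priority;
    std::uint64_t seq;  // arrival order, used as a FIFO tie-breaker
    Kind kind;
};

// Puts the numerically lowest priority (then the oldest item) on top.
struct Worse {
    bool operator()(const Pending& a, const Pending& b) const {
        if (a.priority != b.priority) return a.priority > b.priority;
        return a.seq > b.seq;
    }
};

// Toy model: a mouse event arrives every `mouse_period` turns, and each turn
// dispatches exactly one pending item. Returns how many times the screen was
// actually updated (expose handled) within `iterations` turns.
int screen_updates_during_drag(const Priorities& p, int iterations, int mouse_period) {
    assert(mouse_period > 0);
    std::priority_queue<Pending, std::vector<Pending>, Worse> queue;
    std::uint64_t seq = 0;
    int updates = 0;

    for (int turn = 0; turn < iterations; ++turn) {
        if (turn % mouse_period == 0) queue.push({p.mouse, seq++, Kind::Mouse});
        if (queue.empty()) continue;

        Pending next = queue.top();
        queue.pop();
        switch (next.kind) {
            case Kind::Mouse:    queue.push({p.modified, seq++, Kind::Modified}); break;
            case Kind::Modified: queue.push({p.redraw, seq++, Kind::Redraw}); break;
            case Kind::Redraw:   queue.push({p.expose, seq++, Kind::Expose}); break;
            case Kind::Expose:   ++updates; break;  // pixels reach the screen here
        }
    }
    return updates;
}
```

Under this model, with priorities {0, 101, 200, 120} and a mouse event arriving every turn, screen_updates_during_drag(p, 100, 1) returns 0: nothing reaches the screen while the mouse keeps moving. With a mouse event only every 4th turn it returns 25, which is consistent with redraw working when you move slowly. With hypothetical priorities {0, -1, -2, -3} (not necessarily the values in the actual commit) it returns 25 even with a mouse event every turn.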

On Wed, Jan 3, 2018 at 4:53 AM, Yale Zhang <yzhang1985@...400...> wrote:
After digging further, I have some questions about how rendering works.

At 1st, it seems the lack of refresh when dragging things is because the refresh priority (for calling idle function) is lower than the mouse event priority.
This is consistent with the behavior of refresh working only when you move the mouse slowly.

I tried changing UPDATE_PRIORITY in sp-canvas.cpp to high, but it didn't help since idle_handler() & SPCanvas::paint() are being called frequently when dragging rapidly.
But why don't the changes show on the screen? Is it drawing to an off screen buffer?

I noticed there's a call to gtk_widget_queue_draw_area() that's not in 0.92. What does that do? The documentation says it will generate an expose event for the invalidated area, but that doesn't make sense: why would you invalidate the region right after drawing it?

I just need to know where the screen is actually updated.


On Wed, Jan 3, 2018 at 12:34 AM, Yale Zhang <yzhang1985@...400...> wrote:
Correct, the slow refresh when dragging objects is still there. I'm looking into it next since it also kills my productivity. But for now at least, pen & calligraphy responsiveness matches that in 0.92. I use the Wacom pen all the time and with event compression it simply could not keep up with something like signing your name.

I've had lots of experience debugging CPU throughput bottlenecks (I've used Linux perf, gprof, Zoom, VTune), but not *latency* ones. Validating real time & parallel behavior is a lot harder from what I've heard.

I've approached this rather amateurishly so far with good old printf() debugging :)  I just recorded the time stamps of a GdkEvent throughout its life cycle from creation, dispatch, and to when it's handled. You could plot all the events on parallel time lines (like in NVIDIA's CUDA profiler) to get a big picture and spot any anomalies (I've made a web app that generates parallel timelines in SVG) but that will probably take too long to study.
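
As a rough idea of what recording time stamps over an event's life cycle can look like, here is a minimal C++ sketch. The EventTrace struct and its member names are hypothetical and are not the instrumentation actually used; in real code the three time points would be captured with Clock::now() at the places where the event is created, dispatched, and handled.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Converts a clock duration to fractional milliseconds.
inline double to_ms(Clock::duration d) {
    return std::chrono::duration<double, std::milli>(d).count();
}

// One record per event: when it was created, dispatched, and handled.
struct EventTrace {
    Clock::time_point created;
    Clock::time_point dispatched;
    Clock::time_point handled;

    // Time spent waiting in the queue before dispatch.
    double queue_wait_ms() const { return to_ms(dispatched - created); }
    // Time spent between dispatch and the end of handling.
    double handling_ms() const { return to_ms(handled - dispatched); }
    // End-to-end latency from creation to handled.
    double total_ms() const {
        assert(handled >= created);
        return to_ms(handled - created);
    }
};
```

Printing queue_wait_ms() and total_ms() per event (printf() style) is enough to see whether events are waiting in the queue or whether the handling itself is slow.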

On Tue, Jan 2, 2018 at 4:07 PM, Eduard Braun <eduard.braun2@...173...> wrote:

Hi Yale,

great to see somebody looking into this!

I was looking into motion event compression before and it certainly sounds like something that could improve responsiveness of certain tools in Inkscape.

Unfortunately it does not noticeably improve redraw performance in relation to the cited bug for me - as mentioned in the bug report it becomes extremely noticeable with increasing window size and happens for "simple" tasks like moving a rect on canvas. For a 2560x1440 window redrawing basically stops for me while moving the mouse and only resumes once I stop movement of the mouse pointer...

I hope we can figure out the source - you certainly seem to be more experienced with profiling tasks (maybe you can give me some pointers on your workflow?) eventually...

Best Regards,

Am 31.12.2017 um 10:52 schrieb Yale Zhang:
OK, I got developer access now. That was fast.

I've created a merge request

On Sun, Dec 31, 2017 at 4:08 AM, Yale Zhang <yzhang1985@...233.....400...> wrote:
Hi, I've identified why drawing is lagging with GTK+3.

It's because of GTK3's motion event compression:

Adding a  gdk_window_set_event_compression (window, FALSE);  in SPCanvas::handle_realize() makes things much smoother.

At 1st I thought it was because the events were sitting in the queue for too long. So I added some timing code to measure the latency between when a motion event was generated in GDK  to when SPCanvas::paint() is called. Actually, I detect bursts of mouse moves or redraws and only use the 1st for latency measurements since there might not be a 1 to 1 relation between motion events and redraws. I was seeing a 4 to 10ms latency for head (GTK3) but only 0.5 ms for 0.92 (GTK2).
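
The burst-based measurement described above can be sketched roughly as follows. This is a simplified, self-contained C++ illustration, not the timing code I actually added: the function names and the gap threshold parameter are assumptions, and time stamps are plain millisecond values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the time stamp of the first entry of each burst. A new burst starts
// whenever the gap to the previous time stamp exceeds gap_ms.
std::vector<double> burst_starts(const std::vector<double>& t_ms, double gap_ms) {
    assert(gap_ms >= 0.0);
    std::vector<double> starts;
    for (std::size_t i = 0; i < t_ms.size(); ++i) {
        if (i == 0 || t_ms[i] - t_ms[i - 1] > gap_ms) starts.push_back(t_ms[i]);
    }
    return starts;
}

// For each burst of motion events, returns the delay until the first paint
// burst that starts at or after it. Only the first event and first paint of a
// burst are used, since motion events and redraws need not map 1 to 1.
// Note: two motion bursts can map to the same paint burst if no paint
// happened in between.
std::vector<double> burst_latencies(const std::vector<double>& motion_ms,
                                    const std::vector<double>& paint_ms,
                                    double gap_ms) {
    const std::vector<double> motion = burst_starts(motion_ms, gap_ms);
    const std::vector<double> paint = burst_starts(paint_ms, gap_ms);
    std::vector<double> latencies;
    std::size_t j = 0;
    for (double start : motion) {
        while (j < paint.size() && paint[j] < start) ++j;
        if (j == paint.size()) break;  // no paint followed this burst
        latencies.push_back(paint[j] - start);
    }
    return latencies;
}
```

For example, motion events at {0, 1, 2, 100, 101} ms and paints at {4, 5, 107, 108} ms with a 20 ms gap threshold give two bursts with latencies of 4 ms and 7 ms.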

I thought I was on to something, but this misled me for a while. Finally, I saw that the # of motion events and redraws was 10x higher for GTK2.
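
To show why compression reduces the number of motion events the canvas sees, here is a small C++ sketch. It uses a deliberately simplified model that is an assumption on my part, not GDK's actual implementation: motion events are grouped into fixed frame intervals and only the most recent event of each interval is delivered. The Point struct and deliver_compressed() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A motion event: generation time in milliseconds plus pointer position.
struct Point {
    double t_ms;
    int x;
    int y;
};

// Simplified compression model: events must be sorted by time. Within each
// frame interval of length frame_ms, only the latest event is delivered and
// the earlier ones are dropped.
std::vector<Point> deliver_compressed(const std::vector<Point>& events, double frame_ms) {
    assert(frame_ms > 0.0);
    std::vector<Point> delivered;
    std::size_t i = 0;
    while (i < events.size()) {
        const long frame = static_cast<long>(events[i].t_ms / frame_ms);
        std::size_t last = i;
        // Advance to the last event that falls into the same frame interval.
        while (last + 1 < events.size() &&
               static_cast<long>(events[last + 1].t_ms / frame_ms) == frame) {
            ++last;
        }
        delivered.push_back(events[last]);
        i = last + 1;
    }
    return delivered;
}
```

With 160 events generated 1 ms apart and a 16 ms frame interval, this model delivers only 10 of them, so a pen stroke ends up with far fewer points than without compression.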

I haven't stayed up to date with the GitLab migration. I tried to push a patch to my branch simdgenius_inkscape, migrated from Bazaar, but access is denied. I just requested project access, so I'd appreciate it if someone could grant it.



Inkscape-devel mailing list