Increasing performance through OpenCL

Abhishek Sharma

5 Sep 2010

10:35 a.m.

Hi all,

Previously I was discussing, on the lib2geom list, the idea of applying OpenCL-based refactoring to the code (lib2geom initially). On the suggestion of some developers I have moved from lib2geom to Inkscape as a whole, with the aim of improving Inkscape's performance wherever possible. One earlier suggestion was to parallelize the Cairo-based rendering. I would love to move forward in this area, but I will need a push to get started. The first question that stands before me like a wall is: "Which areas of the code can be parallelized, and how much would that affect performance?" The second question is: what approach should be taken to parallelize them? Most serial algorithms change their appearance completely when implemented in parallel. Will that be OK or not? There are other hurdles, but it would be better to start with the snacks and then move on to the main feast. I hope the developers will give this their attention and kindly join the discussion.

A hearty thank you to all the Inkscape developers.

-- Abhishek Sharma


Jasper van de Gronde

5 Sep 2010

3:48 p.m.

On 2010-09-05 12:35, Abhishek Sharma wrote:

...

... I would love to move forward in this area, but I will need a push to get started. The first question that stands before me like a wall is: "Which areas of the code can be parallelized, and how much would that affect performance?" The second question is: what approach should be taken to parallelize them?

As explained earlier, one possibility would be to optimize Cairo. This is NOT part of the Inkscape code base in any way, but any work on this library will (in the near future) directly benefit not only Inkscape but also other open-source projects. As it is largely a rasterizing library, it should not be too hard to find things to parallelize, but you should really contact the Cairo folks for that.

...

Most serial algorithms change their appearance completely when implemented in parallel. Will that be OK or not? There are other hurdles, but it would be better to start with the snacks and then move on to the main feast. I hope the developers will give this their attention and kindly join the discussion.

Inkscape has a number of filters that can be easily parallelized (in fact, Gaussian blur already uses a crude, but effective, form using OpenMP). They are all located in the src/display directory as nr-filter-*.cpp. You're free to pick one and parallelize it (they all have a pretty obvious "render" function). Quite a few of them would be trivial to parallelize, so have a look, see if you can find one you feel comfortable modifying, and have a go. If you run into trouble you can always present your problem here.
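As a rough illustration of the kind of row-level OpenMP parallelism described above, here is a minimal sketch. It assumes a packed 8-bit RGBA buffer, and `invert_rows` is a hypothetical stand-in for a filter's render step; it is not taken from Inkscape's nr-filter-*.cpp files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel filter: inverts every 8-bit channel of a packed
// RGBA image. `in` and `out` must be distinct vectors.
void invert_rows(const std::vector<std::uint8_t>& in,
                 std::vector<std::uint8_t>& out,
                 int width, int height)
{
    if (width <= 0 || height <= 0) {
        out.clear();
        return;
    }
    // Bytes per row, assuming 4 bytes per pixel and no row padding.
    const std::size_t stride = static_cast<std::size_t>(width) * 4;
    const std::size_t total = stride * static_cast<std::size_t>(height);
    assert(in.size() >= total);
    out.assign(total, 0);

    // Each row reads and writes its own slice of the buffers, so the rows
    // can be distributed across threads without synchronization.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* src = &in[static_cast<std::size_t>(y) * stride];
        std::uint8_t* dst = &out[static_cast<std::size_t>(y) * stride];
        for (std::size_t x = 0; x < stride; ++x) {
            dst[x] = static_cast<std::uint8_t>(255 - src[x]);
        }
    }
}
```

With GCC or Clang this would typically be built with `-fopenmp`. Filters whose output pixels depend on neighbouring output pixels would need more care than this simple case.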

To write a patch for Inkscape all you have to do is check out the source, modify it to suit your needs, make a patch file (bzr diff -p1 for example) and post it here and/or to the bug tracker. You don't need our permission or approval to get started. (There is information on the Wiki on how to check out the code and build it.)

Joshua A. Andler

5:53 p.m.

On Sun, 2010-09-05 at 17:48 +0200, Jasper van de Gronde wrote:

...

Inkscape has a number of filters that can be easily parallelized (in fact, Gaussian blur already uses a crude, but effective, form using OpenMP). They are all located in the src/display directory as nr-filter-*.cpp. You're free to pick one and parallelize it (they all have a pretty obvious "render" function). Quite a few of them would be trivial to parallelize, so have a look, see if you can find one you feel comfortable modifying, and have a go. If you run into trouble you can always present your problem here.

If I am not mistaken, Krzysztof has already done the parallelization of the rest of the filters in his GSoC branch.

Cheers, Josh

Krzysztof Kosiński

9:03 p.m.

2010/9/5 Joshua A. Andler <scislac@...400...>:

...

If I am not mistaken, Krzysztof has already done the parallelization of the rest of the filters in his GSoC branch.

Cheers, Josh

That's right, but they use OpenMP only, no OpenCL for now.

Converting the filters to OpenCL might or might not give some performance gain. With multiple filters the overhead of transmitting the data to the graphics card will dominate and we might end up being slower than the CPU. Additionally, older Nvidia cards and all ATI cards older than HD 5xxx lack the ability to create OpenCL image objects. It's theoretically possible to implement the filters without them, using generic memory objects, but I'm not sure what the performance will be like.

The best approach would be to use cairo-gl surfaces and create OpenCL contexts that directly refer to OpenGL pixmap contents. This would bring the count of roundtrips down to 1 (at the end of rendering, to draw to the X surface provided by GTK). Unfortunately, the performance of cairo-gl is rather bad, especially on ATI hardware. I have a Radeon HD 4850, which is a mid-range gaming card, and the Cairo performance tests run ~10x slower on cairo-gl than on the image backend.
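The round-trip argument above can be put into a back-of-the-envelope model. The sketch below is only that: the struct, the function names and any numbers fed into it are hypothetical, not measurements from Inkscape, Cairo or any particular card.

```cpp
#include <cassert>

// All fields are hypothetical inputs in milliseconds, not measured values.
struct FilterChainCost {
    double transfer_ms;   // one-way copy of the image between host and GPU
    double gpu_filter_ms; // time to run one filter on the GPU
    double cpu_filter_ms; // time to run one filter on the CPU
};

// GPU path where every filter uploads its input and downloads its output.
double gpu_per_filter_roundtrip(const FilterChainCost& c, int filters)
{
    assert(filters >= 0);
    return filters * (2.0 * c.transfer_ms + c.gpu_filter_ms);
}

// GPU path where the data stays on the card and is read back once at the end.
double gpu_single_readback(const FilterChainCost& c, int filters)
{
    assert(filters >= 0);
    return filters * c.gpu_filter_ms + c.transfer_ms;
}

// CPU-only path with no transfers at all.
double cpu_only(const FilterChainCost& c, int filters)
{
    assert(filters >= 0);
    return filters * c.cpu_filter_ms;
}
```

With made-up figures such as a 4 ms transfer, 1 ms per GPU filter and 6 ms per CPU filter, three filters with a round trip each come out slower than the CPU, while a single readback comes out faster. The model ignores kernel launch latency and any overlap of transfers with computation.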

Regards, Krzysztof

Alexandre Prokoudine

9:51 p.m.

On 9/6/10, Krzysztof Kosiński wrote:

...

2010/9/5 Joshua A. Andler <scislac@...400...>:

...

If I am not mistaken, Krzysztof has already done the parallelization of the rest of the filters in his GSoC branch.

That's right, but they use OpenMP only, no OpenCL for now.

Converting the filters to OpenCL might or might not give some performance gain. With multiple filters the overhead of transmitting the data to the graphics card will dominate and we might end up being slower than the CPU.

Then make it possible to disable the GPU ;) IIRC, OpenCL can make use of multicore CPUs just like OpenMP.

Alexandre Prokoudine http://libregraphicsworld.org

Abhishek Sharma

6 Sep 2010

6:39 a.m.

Well, yes, it is quite true that we can at least use the multiple cores on the system, if not the graphics card, to increase performance.
