I'm interested in improving Inkscape's test framework for the GSoC. I've read the GSoC related documentation. I've also read all the information on testing on the Wiki that I could find and had a look at some of the current unit tests in SVN.
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
1. Make sure the existing tests work on Win32 (and MacOS X).
2. Set up system to rerun tests periodically/when needed.
3. *Possibly* also try generating some historical data (where possible).
4. Implement new tests for (at least):
   - nr-compose (I already did this locally, at least partially)
   - livarot (parts of)
   - sp_svg_... (transform_read/write, read/write_path, etc.)
5. Integrate (existing) SVG conformance tests so that they are also rerun. I will attempt to partially automate this (as far as I understand the current tests are evaluated manually) by storing result images and only asking for human judgement when an image changes. I'm also thinking of a few more advanced ways of automating some of these tests.
6. Add performance information by timing tests (as well as rendering of images, for the SVG conformance tests for example).
In short, my intention is to create a system which (continuously) keeps track of the status of unit tests as well as SVG conformance and performance of Inkscape. Test results will be accessible online.
I also plan to use gcov to keep track of test coverage and, if worthwhile (read: if the tests take long), reduce the number of tests that have to be rerun each time. I'm currently checking whether I can get gcov to work to my satisfaction with Inkscape.
If deemed useful, I might also add information from gprof to give a better idea of which parts of the code are worth optimizing (and keep track of performance over time on a fine-grained level). I already regularly use gprof with Inkscape myself.
Is this more or less what was originally intended? In any case, I would appreciate any suggestions and/or information you might have.
Sounds nice Jasper.
Yesterday I had a chat with mgsloan of 2geom, and we discussed lib2geom integration into Inkscape. We both thought we need tests to make sure nothing gets broken by the switch to 2geom etc. Since I am on Windows as well, I also have trouble with not being able to do 'make check'. What you propose sounds very nice (I like the publishing on the internet). The profiling is also nice; I was looking for that a while ago.
Cheers, Johan
J.B.C.Engelen@...1578... wrote:
Sounds nice Jasper.
Yesterday I had a chat with mgsloan of 2geom, and we discussed lib2geom integration into Inkscape. We both thought we need tests to make sure nothing gets broken by the switch to 2geom etc. Since I am on Windows as well, I also have trouble with not being able to do 'make check'. What you propose sounds very nice (I like the publishing on the internet). The profiling is also nice; I was looking for that a while ago.
If you want to profile Inkscape yourself, just add -pg to the build flags in build.xml (I've attached my build.xml as an example). You then have to rebuild Inkscape (takes a while) and, voilà, you're ready to start profiling.
To get the profiling information using gprof, just start Inkscape and do whatever you're interested in profiling. A file called gmon.out will be written. Now you can execute:

gprof inkscape.dbg gmon.out > gmon.txt

The gmon.txt file will now contain the standard profiling information that gprof generates. (gprof has lots of switches to do all sorts of interesting stuff, but this is the basic procedure.)
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
Running Ubuntu in MS Virtual PC is quite simple to set up if you don't happen to have a spare computer lying around. The Ubuntu 6.* series works better with Virtual PC than the 7.* series.
Of course it would be very good if make check also worked on Windows, but having a Linux reference machine to build on seems like a good way to check that you don't break anything on the other side of the fence.
// Albin
Albin Sunnanbo wrote:
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
Running Ubuntu in MS Virtual PC is quite simple to set up if you don't happen to have a spare computer lying around. The Ubuntu 6.* series works better with Virtual PC than the 7.* series.
Thanks for the information, could save me some frustration.
Of course it would be very good if make check also worked on Windows, but having a Linux reference machine to build on seems like a good way to check that you don't break anything on the other side of the fence.
I most definitely plan on doing just that. I already had a look at some emulators :)
On Mar 21, 2008, at 5:19 AM, Jasper van de Gronde wrote:
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
The newer unit tests were done using CxxTest, which works well on Windows. Getting them set up in a Windows build should not take too much more...
Also a while back I did a custom test reporter for it that did output in a standard XML format that Bryce wanted. That might be another thing to look at leveraging.
Jon A. Cruz wrote:
On Mar 21, 2008, at 5:19 AM, Jasper van de Gronde wrote:
Unfortunately I don't have a Linux system set up at the moment, so I
can't actually execute 'make check'. So far I've come up with the
following draft plan:
The newer unit tests were done using CxxTest, which works well on Windows. Getting them set up in a Windows build should not take too much more...
I indeed hope not to spend too much time on that :)
Also a while back I did a custom test reporter for it that did output in a standard XML format that Bryce wanted. That might be another thing to look at leveraging.
That's a good idea, especially as I want to integrate the results and publish them on-line. Do you still have any working code lying around for writing to the aforementioned XML format?
On Mar 21, 2008, at 1:12 PM, Jasper van de Gronde wrote:
Also a while back I did a custom test reporter for it that did output in a standard XML format that Bryce wanted. That might be another thing to look at leveraging.
That's a good idea, especially as I want to integrate the results and publish them on-line. Do you still have any working code lying around for writing to the aforementioned XML format?
Yes. That should still be in there and live.
TRPIFormatter.h has the one that does the TRPI XML format (or a slightly extended version).
Once you have the info in some form of XML, it's usually easy to convert it to other XML using XSLT.
Oh, and it looks like I did some "Pylog" formatter also. PylogFormatter.h
They are hooked up in "make check" to create test-all.xml and test-all.log
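As an illustration of that XML-to-XML conversion step, here is a minimal sketch in Python, assuming the lxml library is available; the stylesheet name report.xsl and the output file name are made up, and the real structure of test-all.xml would dictate what the stylesheet actually does:

# Hypothetical sketch: transform the CxxTest XML report (test-all.xml)
# into, say, an HTML summary using an XSLT stylesheet. The stylesheet
# name "report.xsl" is invented for illustration.
from lxml import etree

def transform_report(xml_path="test-all.xml", xsl_path="report.xsl",
                     out_path="test-report.html"):
    xslt = etree.XSLT(etree.parse(xsl_path))   # compile the stylesheet
    result = xslt(etree.parse(xml_path))       # apply it to the test report
    with open(out_path, "wb") as out:
        out.write(etree.tostring(result, pretty_print=True))

if __name__ == "__main__":
    transform_report()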
Jon A. Cruz wrote:
On Mar 21, 2008, at 5:19 AM, Jasper van de Gronde wrote:
Unfortunately I don't have a Linux system set up at the moment, so I
can't actually execute 'make check'. So far I've come up with the
following draft plan:
The newer unit tests were done using CxxTest, which works well on Windows. Getting them set up in a Windows build should not take too much more... ...
I also saw some of the Utest unit tests were converted to the CxxTest framework; is there any specific reason (other than time, etc.) why some are still not converted? I'm asking because for the test suite project I would like to convert the remaining Utest unit tests to CxxTest (this way the building process becomes a bit simpler and at the same time we can make sure all the tests use the same output format, etc.).
BTW, they indeed work perfectly fine on Windows :)
On Mar 29, 2008, at 4:34 AM, Jasper van de Gronde wrote:
I also saw some of the Utest unit tests were converted to the CxxTest framework; is there any specific reason (other than time, etc.) why some are still not converted? I'm asking because for the test suite project I would like to convert the remaining Utest unit tests to CxxTest (this way the building process becomes a bit simpler and at the same time we can make sure all the tests use the same output format, etc.).
No. Not much other than that.
Well... a minor factor of trying to get more people involved with them.
There are also a few that are in both places.
BTW, they indeed work perfectly fine on Windows :)
Yay.
On Fri, Mar 21, 2008 at 9:19 AM, Jasper van de Gronde <th.v.d.gronde@...528...> wrote:
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
Great plan, and long overdue! :)
I think the important thing is to test both from bottom up (unit tests of classes and functions) and from top down (scripted tests and performance measurements of the entire program doing various tasks on various files). For the latter, Ted's command line access to verbs is priceless, but you'll also need to find some kind of bitmap diff program to compare PNG renditions of various files to the reference versions and find differences. This will be especially critical when we switch main rendering to cairo.
I have a couple thousand misc SVG files from various sources, and before a release I ran a very simple script that would load each file and export it to PNG. Even without any bitmap diffing, this found at least a couple bugs.
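As a rough sketch (not bulia's actual script), a load-and-export smoke test along those lines could look something like the following; it assumes an inkscape binary on the PATH and the --without-gui and --export-png command-line options:

# Walk a directory of SVG files, export each to PNG with the Inkscape
# command line, and report any file that makes Inkscape exit with an error.
import os
import subprocess
import sys

def smoke_test(svg_dir, png_dir):
    os.makedirs(png_dir, exist_ok=True)
    failures = []
    for name in sorted(os.listdir(svg_dir)):
        if not name.endswith(".svg"):
            continue
        svg = os.path.join(svg_dir, name)
        png = os.path.join(png_dir, name[:-4] + ".png")
        ret = subprocess.call(["inkscape", "--without-gui",
                               "--export-png=" + png, svg])
        if ret != 0:
            failures.append(name)
    return failures

if __name__ == "__main__":
    failed = smoke_test(sys.argv[1], sys.argv[2])
    print("%d file(s) failed to render" % len(failed))
    for name in failed:
        print("  " + name)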
bulia byak wrote:
On Fri, Mar 21, 2008 at 9:19 AM, Jasper van de Gronde <th.v.d.gronde@...528...> wrote:
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
Great plan, and long overdue! :)
I think the important thing is to test both from bottom up (unit tests of classes and functions) and from top down (scripted tests and performance measurements of the entire program doing various tasks on various files).
It's a good idea to also make use of the command line access to verbs; this would indeed allow for some very interesting testing possibilities. The main problem would be how to test it. I would imagine comparing the SVG result to a reference output would be most interesting (after all, just looking good usually isn't good enough when executing verbs). But to compare SVGs it would be necessary to parse them and write comparison routines for different elements, which might be a bit much to do as part of this project.
Alternatively, the resulting bitmaps could be compared; this would be a lot less work (it would be needed for the SVG conformance tests anyway). And it would still capture bugs resulting in visible problems.
For the latter, Ted's command line access to verbs is priceless, but you'll also need to find some kind of bitmap diff program to compare PNG renditions of various files to the reference versions and find differences. This will be especially critical when we switch main rendering to cairo.
One way I am planning to deal with this is letting humans decide whether something is acceptable and caching the result. The underlying assumption is that Inkscape's rendering output probably doesn't change that often (on most of the tests), so it would only require human intervention when something actually changes.
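A minimal sketch of that caching idea, assuming the rendered PNGs are compared by hash and a small JSON file (the name is arbitrary) records which outputs a human has already approved:

# Remember which rendered outputs a person has already approved (keyed by
# a hash of the PNG bytes) and only ask again when the rendering changes.
import hashlib
import json
import os

CACHE_FILE = "approved_renderings.json"

def load_cache():
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}

def needs_review(test_name, png_path, cache):
    with open(png_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    # Only ask a human if this test's output hash is new or has changed.
    return cache.get(test_name) != digest, digest

def approve(test_name, digest, cache):
    cache[test_name] = digest
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f, indent=2)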
Another method to reduce human intervention I was thinking of was to define areas of the result image to disregard when comparing it. The idea behind this was that a lot of the conformance tests I found contained lots of labels which weren't actually important (for that test case). With a bit of luck the rest of the image is good enough to be compared directly to the reference image (although the image might not be exactly the same, the difference will be considerably smaller, and methods like looking at the MSE or maximum error will provide reasonable results).
In addition, I just stumbled across http://pdiff.sourceforge.net/. I still have to try it, but it sounds interesting.
As a last resort it would definitely be feasible for me to program a simple bitmap diff program (one that computes the MSE for example).
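A minimal sketch of such a bitmap diff, assuming PIL is available; it computes the MSE over two renderings and optionally skips regions marked in a mask image (tying in with the "areas to disregard" idea above):

# Mean squared error between two renderings, optionally ignoring
# masked-out regions (e.g. text labels in a conformance test).
from PIL import Image

def mse(path_a, path_b, mask_path=None):
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    if a.size != b.size:
        raise ValueError("image sizes differ: %s vs %s" % (a.size, b.size))
    mask = None
    if mask_path is not None:
        # White pixels in the mask mark regions to ignore.
        mask = Image.open(mask_path).convert("L").getdata()
    total, count = 0, 0
    for i, (pa, pb) in enumerate(zip(a.getdata(), b.getdata())):
        if mask is not None and mask[i] > 128:
            continue
        total += sum((ca - cb) ** 2 for ca, cb in zip(pa, pb))
        count += 3  # three channels per pixel
    return total / count if count else 0.0

# Example: print the raw error so a harness can apply its own threshold.
# print(mse("render.png", "reference.png"))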
Combining these methods I hope to minimize the amount of human intervention needed. If anyone has any other ideas I'd be glad to hear them.
I have a couple thousand misc SVG files from various sources, and before a release I ran a very simple script that would load each file and export it to PNG. Even without any bitmap diffing, this found at least a couple bugs.
I get the feeling there is no need to worry about getting enough test images :)
On Fri, Mar 21, 2008 at 10:13:32PM +0100, Jasper van de Gronde wrote:
bulia byak wrote:
On Fri, Mar 21, 2008 at 9:19 AM, Jasper van de Gronde <th.v.d.gronde@...528...> wrote:
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
Great plan, and long overdue! :)
I think the important thing is to test both from bottom up (unit tests of classes and functions) and from top down (scripted tests and performance measurements of the entire program doing various tasks on various files).
It's a good idea to also make use of the command line access to verbs; this would indeed allow for some very interesting testing possibilities. The main problem would be how to test it. I would imagine comparing the SVG result to a reference output would be most interesting (after all, just looking good usually isn't good enough when executing verbs). But to compare SVGs it would be necessary to parse them and write comparison routines for different elements, which might be a bit much to do as part of this project.
As a first order, a simple diff | wc -l could be used to estimate the size of the change. An md5sum of the SVG file can be used to test output identity. There are probably also tools for extracting the elements of an XML file, and a simple listing of the SVG elements could be useful for comparisons (e.g., after adding a group, check to see if a new group object is present).
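As an illustration of the element-listing idea, here is a small sketch that counts element names in two SVG files and prints the differences (the file names are placeholders):

# Count element names in two SVG files and show how the counts differ,
# e.g. to confirm that a grouping operation really added one more <g>.
import xml.etree.ElementTree as ET
from collections import Counter

def element_counts(svg_path):
    tree = ET.parse(svg_path)
    # Tags come back as "{namespace}name"; strip the namespace for readability.
    return Counter(el.tag.split("}")[-1] for el in tree.iter())

def compare(before_path, after_path):
    before, after = element_counts(before_path), element_counts(after_path)
    for tag in sorted(set(before) | set(after)):
        if before[tag] != after[tag]:
            print("%-20s %d -> %d" % (tag, before[tag], after[tag]))

# compare("original.svg", "after-grouping.svg")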
Alternatively, the resulting bitmaps could be compared; this would be a lot less work (it would be needed for the SVG conformance tests anyway). And it would still capture bugs resulting in visible problems.
The Cairo test suite has a simple bitmap diff tool that could be reused here. There are also other similar tools (I used one once from a video camera motion sensor project I found on SourceForge.)
Combining these methods I hope to minimize the amount of human intervention needed. If anyone has any other ideas I'd be glad to hear them.
I have a couple thousand misc SVG files from various sources, and before a release I ran a very simple script that would load each file and export it to PNG. Even without any bitmap diffing, this found at least a couple bugs.
I get the feeling there is no need to worry about getting enough test images :)
Along these lines of using bitmap diff tools, you could create a suite of regression tests using past problematic .svg's. This could be done by scouring launchpad for Fixed bugs with attachments that are not patches (you'd need to write a spider script to do this, or use a tool like bugbuddy, since the advanced search doesn't allow for this), downloading the svg's and prepending the bug ID to their filenames.
Collect those files, run them through current inkscape to produce png's, and store those as the reference images, then make a simple test script that will run through the .svg's and compare output to the reference images, to look for future regressions. Toss that into a new SVN repo, and we'd suddenly have a big regression test to run automatically.
Bryce
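A sketch of what that regression run could look like, under the assumptions that the bug-numbered SVGs sit in one directory, the blessed reference PNGs in another, and that a byte-for-byte comparison is good enough as a first cut (a fuzzier diff such as the MSE sketch above, Cairo's diff tool, or pdiff could be dropped in instead):

# Render each bug-numbered SVG and compare against the stored reference PNG.
# Directory names and the comparison strategy are placeholders.
import glob
import os
import subprocess

def render_svg(svg_path, png_path):
    subprocess.check_call(["inkscape", "--without-gui",
                           "--export-png=" + png_path, svg_path])

def images_match(out_png, ref_png):
    # Simplest possible check: byte-for-byte identical files.
    with open(out_png, "rb") as a, open(ref_png, "rb") as b:
        return a.read() == b.read()

def run_regressions(svg_dir="regression-svgs", ref_dir="reference-pngs"):
    regressions = []
    for svg in sorted(glob.glob(os.path.join(svg_dir, "*.svg"))):
        name = os.path.splitext(os.path.basename(svg))[0]  # e.g. "184321-crash"
        out_png = name + ".out.png"
        ref_png = os.path.join(ref_dir, name + ".png")
        render_svg(svg, out_png)
        if not images_match(out_png, ref_png):
            regressions.append(name)
    return regressions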
Bryce Harrington wrote:
...
It's a good idea to also make use of the command line access to verbs; this would indeed allow for some very interesting testing possibilities. The main problem would be how to test it. I would imagine comparing the SVG result to a reference output would be most interesting (after all, just looking good usually isn't good enough when executing verbs). But to compare SVGs it would be necessary to parse them and write comparison routines for different elements, which might be a bit much to do as part of this project.
As a first order, a simple diff | wc -l could be used to estimate the size of the change. An md5sum of the SVG file can be used to test output identity. There are probably also tools for extracting the elements of an XML file, and a simple listing of the SVG elements could be useful for comparisons (e.g., after adding a group, check to see if a new group object is present).
Unfortunately none of those would account for changes like increased/reduced precision or alternative encodings for paths.
Alternatively, the resulting bitmaps could be compared; this would be a lot less work (it would be needed for the SVG conformance tests anyway). And it would still capture bugs resulting in visible problems.
The Cairo test suite has a simple bitmap diff tool that could be reused here. There are also other similar tools (I used one once from a video camera motion sensor project I found on SourceForge.)
Thanks for the suggestions.
...
Along these lines of using bitmap diff tools, you could create a suite of regression tests using past problematic .svg's. This could be done by scouring launchpad for Fixed bugs with attachments that are not patches (you'd need to write a spider script to do this, or use a tool like bugbuddy, since the advanced search doesn't allow for this), downloading the svg's and prepending the bug ID to their filenames. ...
That would indeed also be a potentially interesting source of test data. I already looked at Launchpad's APIs; unfortunately they're not very advanced (yet), so it would indeed need some tricks. But I will definitely look into it.
On Fri, Mar 21, 2008 at 01:19:28PM +0100, Jasper van de Gronde wrote:
I'm interested in improving Inkscape's test framework for the GSoC. I've read the GSoC related documentation. I've also read all the information on testing on the Wiki that I could find and had a look at some of the current unit tests in SVN.
Unfortunately I don't have a Linux system set up at the moment, so I can't actually execute 'make check'. So far I've come up with the following draft plan:
1. Make sure the existing tests work on Win32 (and MacOS X).
2. Set up system to rerun tests periodically/when needed.
3. *Possibly* also try generating some historical data (where possible).
4. Implement new tests for (at least):
   - nr-compose (I already did this locally, at least partially)
   - livarot (parts of)
   - sp_svg_... (transform_read/write, read/write_path, etc.)
5. Integrate (existing) SVG conformance tests so that they are also rerun. I will attempt to partially automate this (as far as I understand the current tests are evaluated manually) by storing result images and only asking for human judgement when an image changes. I'm also thinking of a few more advanced ways of automating some of these tests.
6. Add performance information by timing tests (as well as rendering of images, for the SVG conformance tests for example).
In short, my intention is to create a system which (continuously) keeps track of the status of unit tests as well as SVG conformance and performance of Inkscape. Test results will be accessible online.
Is this more or less what was originally intended? In any case, I would appreciate any suggestions and/or information you might have.
Actually the original intent is more about gaining a suite of useful tests, than setting up the test framework. We already have most of the infrastructure for running tests via cxxtest and make check; certainly it could be better/fancier, but that's far secondary to having more valuable tests in the first place. (Having worked on several test frameworks myself, I know how much more attractive they are to work on than "just" tests, but the ultimate goal is finding and fixing bugs, and for that we need tests.)
So, given that the summer moves along fast, I would encourage you to focus on #1 and #4, which are the things where you could add the most unique value. The others are of course worthwhile, but other folks have set those up in the past and I expect could again in the future; so far the limiting factor has been our scarcity of tests, which reduces the usefulness of frameworks to begin with. So if you focused particularly on #1 and #4, I think it might stimulate the rest to fall into place, and would make the most valuable use of your time.
livarot would probably not be worth the while to instrument with tests as it's scheduled for removal anyway.
With testing, there are some general rules of what things are best to make tests for:
* Code that is under active development according to svn logs (bugs breed in new code)
* Code that integrates two different chunks of code (bugs hide out between the cracks)
* Code in which a lot of other bugs have already been found (bugs tend to congregate together). Check recent bug tracker activity for where bugs are being found.
* Code that isn't documented (bugs live in dark areas)
* Code that is executed a lot (these bugs hurt the most)
* Code that is rarely ever executed (these bugs are fat and lazy, and easy to find)
* Code that when you ask a developer about it, they wince and wish to change the subject. ;-)
Hopefully those heuristics can be used to prioritize which sections of code need tests written the most.
The types of tests needed, as Bulia and others mentioned, include both unit tests (via cxxtest) and high-level functional tests (such as via the command line verbs). The former are more suited to running via make check; the latter could be hooked into make check but would probably be more useful as a stand-alone suite of test scripts.
Bryce
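To make the stand-alone functional test idea a bit more concrete, here is a sketch of a verb-driven test; the verb ids used (EditSelectAll, SelectionGroup, FileSave, FileClose) and the exact behaviour of --verb are assumptions drawn from this thread rather than a verified recipe:

# Drive Inkscape through command-line verbs (select all, group, save,
# close), then check that the saved file gained one <g> element.
import shutil
import subprocess
import xml.etree.ElementTree as ET

def count_groups(svg_path):
    return sum(1 for el in ET.parse(svg_path).iter()
               if el.tag.split("}")[-1] == "g")

def test_group_verb(original_svg, work_copy="verb-test.svg"):
    shutil.copy(original_svg, work_copy)  # verbs modify the file in place
    before = count_groups(work_copy)
    subprocess.check_call(["inkscape",
                           "--verb=EditSelectAll",
                           "--verb=SelectionGroup",
                           "--verb=FileSave",
                           "--verb=FileClose",
                           work_copy])
    assert count_groups(work_copy) == before + 1, "grouping did not add a <g>"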
Bryce Harrington wrote:
... Actually the original intent is more about gaining a suite of useful tests, than setting up the test framework. We already have most of the infrastructure for running tests via cxxtest and make check; certainly it could be better/fancier, but that's far secondary to having more valuable tests in the first place. (Having worked on several test frameworks myself, I know how much more attractive they are to work on than "just" tests, but the ultimate goal is finding and fixing bugs, and for that we need tests.)
Great, thanks for the feedback. For me the main thing is to make sure the tests are actually used (obviously this goes hand in hand with actually *having* tests). That is, I'd like to make sure the tests are run often and systematically. Personally I don't think developers will run them themselves very often (I know how difficult it can be to get people to change their habits), so by running them as often as possible (preferably after each commit) on all three supported platforms I hope any future regressions might be picked up more easily. And with a bit of luck it will also increase the visibility of the tests.
Having said that, that is also pretty much all I want to do in the testing *framework* department; I'm not looking to create a fancy web interface with all sorts of nice features. If I can get away with a system which regularly reruns tests and uploads a few XML files, and perhaps sends a message to a mailing list if it encounters a regression, I'd be quite happy.
So, given that the summer moves along fast, I would encourage you to focus on #1 and #4, which are the things where you could add the most unique value. The others are of course worthwhile, but other folks have set those up in the past and I expect could again in the future; so far the limiting factor has been our scarcity of tests, which reduces the usefulness of frameworks to begin with. So if you focused particularly on #1 and #4, I think it might stimulate the rest to fall into place, and would make the most valuable use of your time.
I'll definitely focus on those first, but I do intend to also incorporate higher-level tests, like the W3C conformance tests and tests for verbs.
livarot would probably not be worth the while to instrument with tests as it's scheduled for removal anyway.
I was wondering about that. I know the plan is to use Cairo for rendering, but for what parts exactly, and on what time scale? For example, would nr-compose also become redundant? And if the current renderer will probably remain to be used for one or two years it may still be useful to make sure it is in good shape.
With testing, there are some general rules of what things are best to make tests for:...
Hopefully those heuristics can be used to prioritize which sections of code need tests written the most.
I had been thinking about this too, and I'll definitely have a look at things like this. But initially I thought I'd try to close some "gaps" and more or less cover the code involved in converting SVG to a bitmap. I like systematic approaches :)
The types of tests needed, as Bulia and others mentioned, include both unit tests (via cxxtest) and high-level functional tests (such as via the command line verbs). The former are more suited to running via make check; the latter could be hooked into make check but would probably be more useful as a stand-alone suite of test scripts.
Yes, indeed. It might also have different output. I could imagine it might be useful to have a more quantitative result: instead of saying a test failed or passed, it would simply report the similarity measure it uses, so people could more easily evaluate the effect of changes that only affect the precision of the renderer.
On Fri, Mar 21, 2008 at 7:29 PM, Jasper van de Gronde wrote:
livarot would probably not be worth the while to instrument with tests as it's scheduled for removal anyway.
I was wondering about that. I know the plan is to use Cairo for rendering, but for what parts exactly, and on what time scale?
No promises on time scale; as for coverage, have a look at how different things are done in display/ depending on mode - the goal is to make the normal mode like the outline mode which already uses cairo.
For example, would nr-compose also become redundant?
It will be less used but still used for many things. And btw it will need some serious changes because cairo uses a different bitmap format.
And if the current renderer will probably remain to be used for one or two years it may still be useful to make sure it is in good shape.
Sure, if you find a bug in livarot, it's worth patching. Just don't spend your time specifically on that part.
On Fri, Mar 21, 2008 at 11:29:28PM +0100, Jasper van de Gronde wrote:
livarot would probably not be worth the while to instrument with tests as it's scheduled for removal anyway.
I was wondering about that. I know the plan is to use Cairo for rendering, but for what parts exactly, and on what time scale? For example, would nr-compose also become redundant? And if the current renderer will probably remain to be used for one or two years it may still be useful to make sure it is in good shape.
The time frame I'm thinking is less than a year. Some aspects of the old system may remain where analogs in Cairo or 2geom don't exist, but I don't know offhand what those would be.
Bryce
On Fri, 21 Mar 2008 16:21:32 -0700, Bryce Harrington <bryce@...1798...> wrote:
On Fri, Mar 21, 2008 at 11:29:28PM +0100, Jasper van de Gronde wrote:
livarot would probably not be worth the while to instrument with tests as it's scheduled for removal anyway.
I was wondering about that. I know the plan is to use Cairo for rendering, but for what parts exactly, and on what time scale? For example, would nr-compose also become redundant? And if the current renderer will probably remain to be used for one or two years it may still be useful to make sure it is in good shape.
The time frame I'm thinking is less than a year. Some aspects of the old system may remain where analogs in Cairo or 2geom don't exist, but I don't know offhand what those would be.
I think even the nr-composite stuff has analogs in libpixman.
The only thing unique to Inkscape that I can think of is bulia's compositing mode for rubber bands and such.
-mental
participants (7)
- unknown@example.com
- Albin Sunnanbo
- Bryce Harrington
- bulia byak
- Jasper van de Gronde
- Jon A. Cruz
- MenTaLguY