Re: [Inkscape-devel] W3C SVG test suite and trunk
Hi. Sorry, my fault. I thought I was sending the messages to the mailing list, but by mistake I only sent them to Martin and Alex. I'm resending them here.
============ Email 1 ==================
2013/10/10 Alex Valavanis <valavanisalex@...400...>:
Well, I guess the easiest thing would be to organise the SVG input files for the rendering tests into two separate subfolders, known-pass/testcases and known-fail/testcases. We can then just tell the test runner to look in the appropriate folder:
# cat run-svg-tests-good
runtests.py --directory=known-pass

# cat run-svg-tests-bad
runtests.py --directory=known-fail
OK, if I understood correctly, the idea here is to have known-pass tests and known-fail tests, each compared against its respective reference image.
If a known-pass test's output changes from the reference image, we can assume we have a regression.
If a known-fail test's output changes from the reference, we might have fixed the problem, but since we can't decide that automatically, we would need to check manually and move the test to the known-pass set if appropriate.
That sounds to me like a good compromise between automation and some manual work. The regressions would be detected automatically but the improvements would need to be checked manually.
Thoughts about this?
============ Email 2 ==================
2013/10/10 Martin Owens <doctormo@...400...>:
On Thu, 2013-10-10 at 16:14 +0100, Alex Valavanis wrote:
two separate executable scripts
Not yet. And while the Python is pretty hairy (it needs some code review), it should be possible to make it do what you want without too many issues.
Guiu, Tav; what do you think? Would it be easy enough to modify and would you like to patch it or should I?
Yes, I think I would be able to modify the code so it fits into the current testing setup. I sent a message trying to shed some light on what needs to be done and to define the problem exactly. If we're happy with what I propose, I can start making the changes soon. There is some hairy code, and rewriting parts of it would do a lot of good for readability.
Guiu
Martin,
========= Message 3 ==============
OK, sorry for sending so many messages; I just wanted to make a little summary of the work that would be needed.
1. Get the SVG 1.1 Second Edition test files and strip the text from them. Jasper, are the files in the link [1] you sent from the Second Edition of the test suite or the first one? Are they up to date?
By the way, I do think that removing the text is a good idea: if there is a problem with the text rendering code, it would affect a lot of tests. By removing the text from the tests, we can better isolate what each file is testing.
2. Then we would need to manually separate the tests into pass/fail.
3. Implement the code as explained in the message I sent before, then add it to the tests in the current Inkscape trunk so it is executed every time someone makes a commit.
4. Finally, we would need a way to make this information public, but we can discuss that while the first points are being worked on.
[1] https://svn.code.sf.net/p/inkscape/code/gsoc-testsuite/tester/testcases/svgt...
If we're happy with that and we agree that this is a good solution, I can start working on this soon.
Cheers Guiu Rocafort
I'm sorry about that, Guiu.
2013/10/11 Martin Owens <doctormo@...400...>:
On Fri, 2013-10-11 at 01:01 +0200, Guiu Rocafort wrote:
If we're happy with that and we agree that this is a good solution, I can start working on this soon.
You need to send this information to the devel mailing list. I recommend compiling your two emails into one and replying to all, since it makes sense for everyone to see this.
Martin,
On 2013-10-11 12:12, Guiu Rocafort wrote:
...OK, sorry for sending so many messages; I just wanted to make a little summary of the work that would be needed. 1. Get the SVG 1.1 Second Edition test files and strip the text from them. Jasper, are the files in the link [1] you sent from the Second Edition of the test suite or the first one? Are they up to date?
Probably not (it was years ago). But it should not be terribly difficult to regenerate them from the new tests (textclean.py can remove text and set links to be local).
By the way, I do think that removing the text is a good idea: if there is a problem with the text rendering code, it would affect a lot of tests. By removing the text from the tests, we can better isolate what each file is testing.
That's the idea, but for tests that explicitly test text rendering it is of course another story (I think I simply left those out initially).
- Then we would need to manually separate the tests into pass/fail.
Assuming the reference images are up to date: if you run the tests once, you should get a list of results in testresults.txt. A little scripting or sed magic should then let you put them in two different directories, or simply give you two lists of tests to run. I didn't test it, but I mean something along these lines:

    grep Pass testresults.txt | sed 's/\([^:]*\):.*/\1/' | xargs -I{} svn mv {} pass/{}

Alternatively, you could easily hack the test script to output test names to different files depending on the result (generating fail.txt and pass.txt, for example), in the section near the end where the actual testing occurs.
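For that second option, something along these lines might work in the runner (just an untested sketch: it assumes runtests.py collects (test name, passed) pairs somewhere, and the function and file names are only illustrative):

    # Hypothetical: split the results into pass.txt / fail.txt as the tests finish.
    # "results" is assumed to be a list of (test_name, passed) tuples.
    def write_result_lists(results, pass_path='pass.txt', fail_path='fail.txt'):
        with open(pass_path, 'w') as passed, open(fail_path, 'w') as failed:
            for name, ok in results:
                (passed if ok else failed).write(name + '\n')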
- Implement the code as explained in the message I sent before, then add it to the tests in the current Inkscape trunk so it is executed every time someone makes a commit.
That would be great!
- Finally, we would need a way to make this information public, but we can discuss that while the first points are being worked on.
Sure. Whatever you do, I would not recommend the GUI I made (turned out to be incredibly inefficient, and essentially more than you really need, especially if the tests run automatically, as you then have much finer-grained information anyway).
Hi! I've been making some progress on the automated testing code. I've simplified it a lot and tried to keep it simple. ( http://bazaar.launchpad.net/~neandertalspeople/+junk/inkscape-testsuite/file... )
I've been having more problems trying to get the image comparison to work well. Because I removed the text from the tests, I didn't have reference images to use. So what I did is the following:
I ran the tests with Inkscape and manually compared the outputs with the reference images (with text). If they looked visually the same, I copied the Inkscape-rendered image as the reference for that test. This way I could get reference images for the tests that passed. The failing tests remained without correct reference images, so they all failed. Then I ran the tests against trunk Inkscape (I had made the reference images with Inkscape 0.48) and found that many of the passing tests now failed (all of them except 2), even though the references had been rendered with Inkscape.
So I've started to realize that this is going to be more difficult than I initially thought. I might try to use perceptualdiff to decide whether the changes are small enough to consider the test a pass. A more developed idea would be to measure the "density of changes" in pixels over areas of the image.
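Something like this is what I have in mind for the density idea (a rough, untested sketch; it assumes Pillow is installed, the file names are just examples, and the block size and thresholds would need tuning):

    # Split the diff image into blocks; the test only fails if some block
    # has too high a fraction of changed pixels.
    from PIL import Image, ImageChops

    def max_change_density(ref_path, out_path, block=16, threshold=8):
        ref = Image.open(ref_path).convert('RGB')
        out = Image.open(out_path).convert('RGB')
        if ref.size != out.size:
            return 1.0  # treat a size mismatch as a complete change
        diff = ImageChops.difference(ref, out).convert('L')
        px = diff.load()
        w, h = diff.size
        worst = 0.0
        for by in range(0, h, block):
            for bx in range(0, w, block):
                xs = range(bx, min(bx + block, w))
                ys = range(by, min(by + block, h))
                changed = sum(1 for y in ys for x in xs if px[x, y] > threshold)
                area = len(xs) * len(ys)
                worst = max(worst, changed / float(area))
        return worst

    # e.g. consider the test passed if no block changed by more than 5%:
    # passed = max_change_density('reference.png', 'rendered.png') < 0.05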
Any thoughts? Tavmjong? Jasper?
Guiu
On Wed, 2013-10-16 at 02:48 +0200, Guiu Rocafort wrote:
Hi! I've been making some progress on the automated testing code. I've simplified it a lot and tried to keep it simple. ( http://bazaar.launchpad.net/~neandertalspeople/+junk/inkscape-testsuite/file... )
I've been having more problems trying to get the image comparison to work well. Because I removed the text from the tests, I didn't have reference images to use. So what I did is the following:
I ran the tests with Inkscape and manually compared the outputs with the reference images (with text). If they looked visually the same, I copied the Inkscape-rendered image as the reference for that test. This way I could get reference images for the tests that passed. The failing tests remained without correct reference images, so they all failed. Then I ran the tests against trunk Inkscape (I had made the reference images with Inkscape 0.48) and found that many of the passing tests now failed (all of them except 2), even though the references had been rendered with Inkscape.
Trunk uses a new renderer based on Cairo. It is not surprising that the images from 0.48 don't match trunk on a pixel-by-pixel basis. (BTW, the SVG spec says that there is a one pixel tolerance in rendering SVGs.) Since automated testing is mostly for checking for regressions, I would simply make the reference images using trunk (comparing them with the PNGs from W3C to determine pass/fail).
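Regenerating the references from trunk could be as simple as the following (untested sketch; it assumes the trunk binary is first on PATH and uses the pre-1.0 --export-png option, and the directory names are just examples):

    # Render every SVG test with the current trunk Inkscape to use as reference.
    import glob
    import os
    import subprocess

    def regenerate_references(svg_dir='testcases', ref_dir='references'):
        if not os.path.isdir(ref_dir):
            os.makedirs(ref_dir)
        for svg in sorted(glob.glob(os.path.join(svg_dir, '*.svg'))):
            png = os.path.join(ref_dir, os.path.basename(svg)[:-4] + '.png')
            subprocess.check_call(['inkscape', '--export-png=' + png, svg])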
So I've started to realize that this is going to be more difficult than I initially thought. I might try to use perceptualdiff to decide whether the changes are small enough to consider the test a pass. A more developed idea would be to measure the "density of changes" in pixels over areas of the image.
Any thoughts? Tavmjong? Jasper?
On 2013-10-16 07:29, Tavmjong Bah wrote:
On Wed, 2013-10-16 at 02:48 +0200, Guiu Rocafort wrote:
Hi! I've been making some progress on the automated testing code. I've simplified it a lot and tried to keep it simple. ( http://bazaar.launchpad.net/~neandertalspeople/+junk/inkscape-testsuite/file... )
Very nice! You may have simplified a little too much, though it does depend a bit on what the goal is. In particular, you seem to have removed any support for patch files (which are insanely useful for testing certain things in isolation and for avoiding problems due to different renderings on different systems, for example when using fonts). I would also recommend quoting command-line arguments, or you risk running into trouble the first time someone puts a space in a filename, for example. The rest of the simplifications you can probably get away with, although I would indeed test what happens when Inkscape crashes (that was one of the reasons for tester.cpp, but it is primarily a problem under Windows, so if Linux is the only target, it might indeed just work).
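On the quoting and crash points, something along these lines would cover both (untested sketch, Python 3.3+ for the timeout; the function name and the option used are only illustrative): passing the command as a list means filenames with spaces need no quoting at all, and a crash or hang just becomes a failed test.

    import subprocess

    def render(svg_path, png_path, timeout=60):
        # List arguments: no shell, so no quoting problems with spaces.
        cmd = ['inkscape', '--export-png=' + png_path, svg_path]
        try:
            returncode = subprocess.call(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # Inkscape hung: count it as a failure.
        return returncode == 0  # a crash gives a non-zero code: also a failure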
Trunk uses a new renderer based on Cairo. It is not surprising that the images from 0.48 don't match trunk on a pixel-by-pixel basis. (BTW, the SVG spec says that there is a one pixel tolerance in rendering SVGs.) Since automated testing is mostly for checking for regressions, I would simply make the reference images using trunk (comparing them with the PNGs from W3C to determine pass/fail).
I fully agree, make life easy on yourself. You may even consider using just one reference (like you appear to be doing now), then for now you don't have to sort out whether or not tests pass or fail (assuming the automated test system can live with that). In that case I would not use the names "fail" and "pass" though, as they are a bit misleading. Maybe "same" and "changed" or something?
So I've started to realize that this is going to be more difficult than I initially thought. I might try to use perceptualdiff to decide whether the changes are small enough to consider the test a pass. A more developed idea would be to measure the "density of changes" in pixels over areas of the image.
Any thoughts? Tavmjong? Jasper?
The W3C reference images are (if I remember correctly, and this may have changed with the new test suite) essentially unsuitable for automated testing with Inkscape. There are simply too many small differences between Inkscape and what they used (Batik I think) to make a direct comparison. The one pixel off rule might help (if we implemented something for using that), but since Inkscape is systematically half a pixel off compared to Batik... Inkscape considers a pixel to be at (x+0.5,y+0.5) while Batik considers it to be at (x,y), if I'm not mistaken (just compare the black border around the tests to see what I mean). Also, suddenly rendering a whole object one pixel off would typically be considered a bug, so anything that uses the "one pixel off" rule would probably have to be a bit more intelligent than just allowing things to be one pixel off.
Long story short: just use the current output from Inkscape as a reference, and (optionally) divide the tests into pass and fail categories. (It would be much nicer to have the division, as it would allow us to keep track of where we are in terms of compliance, but don't let it get in the way of having usable automated testing.)
On Wed, 2013-10-16 at 09:20 +0200, Jasper van de Gronde wrote:
Long story short: just use the current output from Inkscape as a reference, and (optionally) divide the tests into pass and fail categories. (It would be much nicer to have the division, as it would allow us to keep track of where we are in terms of compliance, but don't let it get in the way of having usable automated testing.)
I agree. I imagine something like this:
bool pass; int compliance;
By which I mean: you have your pass/fail, which can be used for regression testing, and then we have a log or other data set which is purely about compliance with the SVG specification. This wouldn't fail the automatic test suite, but could feed into the website to give an overview of the precision of our renderer.
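Roughly like this, for example (just an illustration; the record and function names are hypothetical): each test carries a hard pass/fail used for regressions plus a compliance score that is only reported.

    from collections import namedtuple

    # passed: gates the automatic suite; compliance: reported, never gates.
    TestResult = namedtuple('TestResult', ['name', 'passed', 'compliance'])

    def summarize(results):
        regressions = [r.name for r in results if not r.passed]
        average_compliance = (
            sum(r.compliance for r in results) / len(results) if results else 0.0
        )
        return regressions, average_compliance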
Martin,
Hello Jasper,
Wednesday, October 16, 2013, 9:20:46 AM, you wrote:
The W3C reference images are (if I remember correctly, and this may have changed with the new test suite) essentially unsuitable for automated testing with Inkscape.
Yes, they are not intended for regression testing. In particular, SVG allows 0.5 pixel of slop on "where is the middle of the pixel", and also does not specify anti-aliasing algorithms, text hinting, and suchlike, which will produce per-implementation differences.
There are simply too many small differences between Inkscape and what they used (Batik I think)
Batik, if it passes. Something else (another renderer, or a patched SVG that gives the correct result) if not.
to make a direct comparison. The one pixel off rule might help (if we implemented something for using that), but since Inkscape is systematically half a pixel off compared to Batik... Inkscape considers a pixel to be at (x+0.5,y+0.5) while Batik considers it to be at (x,y), if I'm not mistaken (just compare the black border around the tests to see what I mean).
If you are making your own version of the test suite, have you considered wrapping everything in a
<svg start tag copied from the test>
  <g transform="translate(-0.5,-0.5)">
    ... the SVG content of the test here ...
  </g>
</svg>
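Applying that wrapping to a whole set of test files could look something like this (untested sketch; it naively moves every child of the root, including <defs> and <title>, into the shifted group, which should still render the same, and the file names are only examples):

    import xml.etree.ElementTree as ET

    SVG_NS = 'http://www.w3.org/2000/svg'
    ET.register_namespace('', SVG_NS)
    ET.register_namespace('xlink', 'http://www.w3.org/1999/xlink')

    def shift_half_pixel(src, dst):
        # Move everything inside the <svg> element into a half-pixel-shifted group.
        tree = ET.parse(src)
        root = tree.getroot()
        children = list(root)
        for child in children:
            root.remove(child)
        group = ET.SubElement(root, '{%s}g' % SVG_NS)
        group.set('transform', 'translate(-0.5,-0.5)')
        group.extend(children)
        tree.write(dst, xml_declaration=True, encoding='UTF-8')

    # e.g. shift_half_pixel('coords-trans-01-b.svg', 'shifted/coords-trans-01-b.svg')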
On 2013-10-16 20:42, Chris Lilley wrote:
...If you are making your own version of the test suite, have you considered wrapping everything in a <svg start tag copied from the test> <g transform="translate(-0.5,-0.5)"> ... the SVG content of the test here ... </g> </svg>
Yes :) But even then there are quite a few more subtle differences. Still, perhaps it's good to give it another shot.
On Thu, 2013-10-17 at 09:34 +0200, Jasper van de Gronde wrote:
Yes :) But even then there are quite a few more subtle differences. Still, perhaps it's good to give it another shot.
Hmm;
Maybe we're thinking of two things at the same time? The test suite is clearly geared more towards regressions, reusing the SVG tests as a baseline from which to extend regression testing. This means getting renderings from Inkscape is not just a good idea, but vital.
We'd also like to know how close our rendering is to the actual specification. For this we'd need an unmodified set of the SVG files (text, errors and all), and the results should be more of a percentage similarity so we can see our precision.
I'm convinced these should be separate.
Martin,
Hello Martin,
Thursday, October 17, 2013, 3:42:57 PM, you wrote:
On Thu, 2013-10-17 at 09:34 +0200, Jasper van de Gronde wrote:
Yes :) But even then there are quite a few more subtle differences. Still, perhaps it's good to give it another shot.
Hmm;
Maybe we're thinking of two things at the same time? The test suite is clearly geared more towards regressions, reusing the SVG tests as a baseline from which to extend regression testing. This means getting renderings from Inkscape is not just a good idea, but vital.
We'd also like to know how close our rendering is to the actual specification. For this we'd need an unmodified set of the SVG files (text, errors and all), and the results should be more of a percentage similarity so we can see our precision.
I'm convinced these should be separate.
I agree that these are solving different problems and should be separate.
participants (5):
- Chris Lilley
- Guiu Rocafort
- Jasper van de Gronde
- Martin Owens
- Tavmjong Bah