
Hi all,
Could someone have a look at the testsuite and reprogram it such that (at least when perceptualdiff is used) the results that are marked 'new' are marked 'fail' instead? This is much clearer, and cleans up the daily checks at http://auriga.mine.nu/inkscape/. Thank you very very much!
I love the testsuite, but I don't want to delve in to the code and change things myself.
Thanks a bunch, Johan

NO! There is a definite and extremely important difference between new and fail. Perceptualdiff only helps with images that really are almost exactly the same. It does NOT help when:
- A Bezier curve is slightly (but benignly) perturbed (which has happened), or other changes occur that are small and, to a human, insignificant as far as the correctness of the render is concerned (for example changes in the resampling of bitmaps).
- A complicated test case was judged as 'pass' incorrectly.
- A 'new' result is actually a 'pass' (for example when there is no pass reference yet).
- Something changes that is unrelated to the specific test. For example, there are a few filter tests that register as passes because Inkscape indeed implements them correctly, but that still do not render correctly because Inkscape doesn't implement the color-interpolation properties. (It would be better to change the tests of course, but still, stuff like this happens.)
In short, perceptualdiff is nowhere near a true substitute for a human judge. It is great for filtering out spurious results based on minute numerical differences and/or differences in the binary encoding of the pngs, but that's about it.
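(For what it's worth, the filtering step amounts to roughly the following. This is a minimal sketch rather than the actual runtests.py code, and it assumes perceptualdiff exits with status 0 when it considers two images indistinguishable; check that against your version.)

import os
import subprocess

def perceptually_equal(png_a, png_b):
    # Assumption: perceptualdiff exits with status 0 when the two images
    # are perceptually indistinguishable and non-zero otherwise.
    with open(os.devnull, 'w') as devnull:
        status = subprocess.call(['perceptualdiff', png_a, png_b],
                                 stdout=devnull, stderr=devnull)
    return status == 0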
For this reason the system was set up specifically to allow for multiple pass/fail references and flag anything it can't match as a new result. In the past I could easily keep up with judging any new results because they don't occur very frequently, but recently a lot of tests suddenly had new results (probably because of changes in bitmap rendering) and since I was/am way too busy I was unable to rejudge them myself. At the time I sent a mail about this, but no one responded.
So: PLEASE just judge these new results once (probably won't take too long) and then the results will be like normal again.
P.S. The system is set up so that if there are two (or more) results in one day it only displays the last, that's why hardly any new results show up in the history of the results (I'd run the tests, rejudge any new results, if any, and then rerun the tests).
J.B.C.Engelen@...1578... wrote:
Hi all,
Could someone have a look at the testsuite and reprogram it such that (at least when perceptualdiff is used) the results that are marked 'new' are marked 'fail' instead? This is much clearer, and cleans up the daily checks at http://auriga.mine.nu/inkscape/. Thank you very very much!
I love the testsuite, but I don't want to delve in to the code and change things myself.
Thanks a bunch, Johan

** If there is someone out there who likes to make website stuff, please help with our test suite ! **
It's a usability issue and depends too much on one person. Right now, we have many "new" results. A result that is obviously different from the pass reference is marked "new"; is something only marked "fail" when it is equal to a fail reference? Perhaps there is a program that gives a measure of how equal images are, instead of simply "equal"/"not equal"? It would be very nice if the system didn't need much user intervention, or if that intervention would be very easy. (e.g. website based)
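(Something graded is not hard to compute from the pixel data itself, for example a root-mean-square difference. A rough sketch using the Python Imaging Library, just to illustrate the idea; this is not something that is in the test suite.)

import math
from PIL import Image, ImageChops

def rms_difference(path_a, path_b):
    # Root-mean-square pixel difference: 0 means identical, larger means
    # more different. Differing sizes are simply treated as "very different".
    a = Image.open(path_a).convert('RGB')
    b = Image.open(path_b).convert('RGB')
    if a.size != b.size:
        return float('inf')
    hist = ImageChops.difference(a, b).histogram()  # 256 bins per channel
    total = sum(count * (i % 256) ** 2 for i, count in enumerate(hist))
    return math.sqrt(total / float(a.size[0] * a.size[1] * 3))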
So: PLEASE just judge these new results once (probably won't take too long) and then the results will be like normal again.
What action does it take to judge? Should one commit a new fail or pass reference? The website's system is different, so the new fail/pass ref should be generated by the website too, otherwise the result will still be flagged "new". This means that, for example, I cannot do it.
You know I am a big fan of the testsuite, but for some reason, there are very few people using it. IMHO it would be much better if people would add testfiles to the testsuite instead of adding them to the bugtracker, but it doesn't happen. Perhaps clarifying the result listing can help. I don't know...
Ciao, Johan
-----Original Message----- From: Jasper van de Gronde [mailto:th.v.d.gronde@...528...] Sent: Tuesday, October 20, 2009 10:25
NO! There is a definite and extremely important difference between new and fail. Perceptualdiff only helps with images that really are almost exactly the same. It does NOT help when:
- A Bezier curve is slightly (but benignly) perturbed (which has happened), or other changes occur that are small and, to a human, insignificant as far as the correctness of the render is concerned (for example changes in the resampling of bitmaps).
- A complicated test case was judged as 'pass' incorrectly.
- A 'new' result is actually a 'pass' (for example when there is no pass reference yet).
- Something changes that is unrelated to the specific test. For example, there are a few filter tests that register as passes because Inkscape indeed implements them correctly, but that still do not render correctly because Inkscape doesn't implement the color-interpolation properties. (It would be better to change the tests of course, but still, stuff like this happens.)
In short, perceptualdiff is nowhere near a true substitute for a human judge. It is great for filtering out spurious results based on minute numerical differences and/or differences in the binary encoding of the pngs, but that's about it.
For this reason the system was set up specifically to allow for multiple pass/fail references and flag anything it can't match as a new result. In the past I could easily keep up with judging any new results because they don't occur very frequently, but recently a lot of tests suddenly had new results (probably because of changes in bitmap rendering) and since I was/am way too busy I was unable to rejudge them myself. At the time I sent a mail about this, but no one responded.
So: PLEASE just judge these new results once (probably won't take too long) and then the results will be like normal again.
P.S. The system is set up so that if there are two (or more) results in one day it only displays the last, that's why hardly any new results show up in the history of the results (I'd run the tests, rejudge any new results, if any, and then rerun the tests).

J.B.C.Engelen@...1578... wrote:
** If there is someone out there who likes to make website stuff, please help with our test suite ! ** ... Perhaps there is a program that gives a measure of how equal images are, instead of simply "equal"/"not equal"? It would be very nice if the system didn't need much user intervention, or if that intervention would be very easy. (e.g. website based)
During the time I did it I only had to rejudge images occasionally, and you only have to look at the output image and move it to either the pass or the fail references (and then commit). But yes, it would be great to have a web interface for this, especially for users who are unfamiliar with SVN.
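(Judging by hand really is no more than something like this. A sketch only; the directory names below are illustrative, the actual layout is documented in the test repository.)

import os
import shutil
import subprocess

def judge(output_png, reference_dir):
    # Move the rendered output into the test's pass or fail reference
    # directory (reference_dir is illustrative) and schedule it for commit.
    dest = os.path.join(reference_dir, os.path.basename(output_png))
    shutil.move(output_png, dest)
    subprocess.check_call(['svn', 'add', dest])
    # After that an ordinary 'svn commit' publishes the new reference.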
... You know I am a big fan of the testsuite, but for some reason, there are very few people using it. IMHO it would be much better if people would add testfiles to the testsuite instead of adding them to the bugtracker, but it doesn't happen.
Indeed, I greatly appreciate the effort you've put into making new test files.
Perhaps clarifying the result listing can help. I don't know...
The Wiki is currently down so I can't check the exact URL (probably just http://www.inkscape.org/wiki/index.php/TestingInkscape as linked to from the testsuite result page), but there is quite a bit of documentation on the Wiki on testing Inkscape (both unit tests and rendering tests). In short:
Inkscape has unit tests to (mostly) test low-level functionality, which are run using make check (or the Windows equivalent) and implemented in code using CxxTest.
In addition Inkscape has rendering tests to test high-level functionality. These tests are run using runtests.py in the test repository (in Inkscape's SVN) and consist of a test SVG and reference PNGs. For each test there can be any number of fail and/or pass references to which the test program tries to match the output. As Inkscape's output doesn't change too often, this makes it relatively easy to keep up manually with any changes that do occur, especially when combined with perceptualdiff to filter out really trivial changes.
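(In other words, the decision made per test is essentially the following. This is a simplified sketch, not the actual runtests.py code; the comparison function is left as a parameter so that it can be a plain pixel comparison or perceptualdiff.)

def classify(output_png, pass_refs, fail_refs, images_match):
    # pass_refs and fail_refs are lists of reference PNG paths; anything
    # that matches no reference at all is flagged as a new result.
    if any(images_match(output_png, ref) for ref in pass_refs):
        return 'pass'
    if any(images_match(output_png, ref) for ref in fail_refs):
        return 'fail'
    return 'new'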
Also, instead of a pass reference another SVG file (called a patch file) can be given which Inkscape should render in exactly the same way but which IS rendered correctly. This can make comparisons slightly more robust and enables a pass reference to be made before Inkscape actually passes a test.
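(Conceptually the patch-file comparison boils down to this. A sketch that assumes the --export-png command-line option of the 0.4x series and borrows the images_match parameter from the sketch above; the output file names are made up.)

import subprocess

def render(svg_path, png_path):
    # Export the SVG with Inkscape itself (0.4x command line; newer
    # versions use --export-filename instead of --export-png).
    subprocess.check_call(['inkscape', '--export-png=%s' % png_path, svg_path])

def matches_patch(test_svg, patch_svg, images_match):
    # Inkscape should render the test file and its patch file identically,
    # and the patch file is drawn so that this shared rendering is correct.
    render(test_svg, 'test-output.png')
    render(patch_svg, 'patch-output.png')
    return images_match('test-output.png', 'patch-output.png')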

-----Original Message----- From: Jasper van de Gronde [mailto:th.v.d.gronde@...528...] Sent: Wednesday, October 21, 2009 15:44 To: Engelen, J.B.C. (Johan) Cc: inkscape-devel@lists.sourceforge.net Subject: Re: Test suite
Also, instead of a pass reference another SVG file (called a patch file) can be given which Inkscape should render in exactly the same way but which IS rendered correctly. This can make comparisons slightly more robust and enables a pass reference to be made before Inkscape actually passes a test.
Yeah I did that. It works well. However, if the test fails, it gets marked "new" for me (which is why I proposed changing "new" to "fail").
I think the testing itself is very well documented, but it needs to get more exposure and it needs someone to work on the result output. (The CxxTest results should be online as well, for example!)
- Johan

Hi, Thanks, now I understand the meaning of "new" and what "fail references" are for! I'm not involved in software development nor CS at large, and to tell you the truth, this was really unclear to me. This should be explained somewhere online (maybe it is already and I missed it), or made clear from reading the results themselves.
So I support Johan's suggestion; what about replacing "new" by "fail", and "fail" by "fail (known)", with a little caption explaining that known failures are those matching a "fail reference"?
(and needless to say, I definitely agree the test suite is vital for Inkscape development and maintenance and should be better known so that more people contribute test cases)
Cheers.

jf barraud wrote:
Hi, Thanks, now I understand the meaning of "new" and what "fail references" are for! I'm not involved in software development nor CS at large, and to tell you the truth, this was really unclear to me. This should be explained somewhere online (maybe it is already and I missed it), or made clear from reading the results themselves.
Perhaps a simple legend? (Can't think why I didn't include one in the first place.)
So I support Johan's suggestion; what about replacing "new" by "fail", and "fail" by "fail (known)", with a little caption explaining that known failures are those matching a "fail reference"?
As such I think it might be a bit misleading, but Johan does have some ideas to perhaps reduce the number of new results, so who knows.
And if the term "new" is misleading, perhaps something else, like "unknown" might be better?
BTW, the project I was feverishly working on over the past few months (http://2009.igem.org/Team:Groningen) is finally (almost) over! So I might actually be contributing to Inkscape (for example to the test suite) again in the near future. (A lot of the graphics on our Wiki were made with Inkscape btw.)
participants (3):
- unknown@example.com
- Jasper van de Gronde
- jf barraud