** If there is someone out there who likes doing website work, please help with our test suite! **
It's a usability issue, and the suite depends too much on one person. Right now we have many "new" results. A result that is obviously different from the pass reference is marked "new"; is something only marked "fail" when it exactly matches a fail reference? Perhaps there is a program that gives a measure of how similar two images are, instead of a simple "equal"/"not equal"? It would be very nice if the system didn't need much user intervention, or if that intervention were very easy (e.g. website-based).
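As a rough illustration of the kind of score I mean (just a sketch, nothing that exists in the test suite today; it assumes Pillow and NumPy, and the file names are made up):

from PIL import Image
import numpy as np

def image_difference(path_a, path_b):
    # Per-pixel root-mean-square difference, scaled to [0, 1]:
    # 0.0 means pixel-identical, larger means more different.
    a = np.asarray(Image.open(path_a).convert("RGBA"), dtype=np.float64)
    b = np.asarray(Image.open(path_b).convert("RGBA"), dtype=np.float64)
    if a.shape != b.shape:
        return 1.0  # different dimensions: treat as completely different
    return float(np.sqrt(np.mean((a - b) ** 2)) / 255.0)

A result listing sorted by such a score would let a human skim the near-zero ones very quickly.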
> So: PLEASE just judge these new results once (probably won't take too long) and then the results will be like normal again.
What does judging actually involve? Should one commit a new fail or pass reference? The website's system is different, so the new fail/pass reference should be generated by the website too; otherwise the result will still be flagged "new". This means that, for example, I cannot do it.
You know I am a big fan of the test suite, but for some reason very few people use it. IMHO it would be much better if people added test files to the test suite instead of attaching them to the bug tracker, but it doesn't happen. Perhaps clarifying the result listing would help. I don't know...
Ciao, Johan
-----Original Message-----
From: Jasper van de Gronde [mailto:th.v.d.gronde@...528...]
Sent: Tuesday, October 20, 2009 10:25
NO! There is a definite and extremely important difference between new and fail. Perceptualdiff only helps with images that really are almost exactly the same. It does NOT help when:
- A Bézier curve is slightly (but benignly) perturbed (which has happened), or other changes occur that are small but, to a human, insignificant as far as the correctness of the render is concerned (for example, changes in the resampling of bitmaps).
- A complicated test case was judged as 'pass' incorrectly.
- A 'new' result is actually a 'pass' (for example when there is no pass reference yet).
- Something changes that is unrelated to the specific test. For example, there are a few filter tests that register as passes because Inkscape indeed implements them correctly, but that still do not render correctly because Inkscape doesn't implement the color-interpolation properties. (It would be better to change the tests, of course, but still, stuff like this happens.)
In short, perceptualdiff is nowhere near a true substitute for a human judge. It is great for filtering out spurious results caused by minute numerical differences and/or differences in the binary encoding of the PNGs, but that's about it.
For this reason the system was set up specifically to allow multiple pass/fail references and to flag anything it can't match as a new result. In the past I could easily keep up with judging any new results because they don't occur very frequently, but recently a lot of tests suddenly had new results (probably because of changes in bitmap rendering), and since I was/am way too busy I was unable to rejudge them myself. At the time I sent a mail about this, but no one responded.
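Roughly, the idea is this (an illustrative sketch only, not the actual test suite code; it assumes Pillow/NumPy and a small per-pixel tolerance so that encoding or rounding differences don't trigger false mismatches):

from PIL import Image
import numpy as np

def images_match(result_path, ref_path, tolerance=1):
    # Compare decoded pixels, not file bytes, allowing a tiny per-channel difference.
    a = np.asarray(Image.open(result_path).convert("RGBA"), dtype=np.int16)
    b = np.asarray(Image.open(ref_path).convert("RGBA"), dtype=np.int16)
    return a.shape == b.shape and int(np.abs(a - b).max()) <= tolerance

def judge(result_path, pass_refs, fail_refs):
    # Multiple references per test are allowed; anything that matches
    # no known reference needs a human judge and is therefore flagged "new".
    if any(images_match(result_path, ref) for ref in pass_refs):
        return "pass"
    if any(images_match(result_path, ref) for ref in fail_refs):
        return "fail"
    return "new"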
So: PLEASE just judge these new results once (probably won't take too long) and then the results will be like normal again.
P.S. The system is set up so that if there are two (or more) results in one day it only displays the last; that's why hardly any new results show up in the history of the results (I'd run the tests, rejudge any new results, if any, and then rerun the tests).