
I've run the rendering tests for a bit more than two months now, roughly once a week. The results have actually been relatively stable, considering all the refactoring work going on. However, there have been a few notable changes.
The good news: the number of passed tests has gone from 96 to 105 and the number of failed tests from 65 to 56.
The bad news: we have three tests that crash at the moment (introduced between 2008-12-28 and 2009-01-04), and some of the changes in the number of passed tests were due to rejudging some of the filter outputs. Basically, there appear to be at least one or two wrong references in the W3C test suite, and Inkscape currently simply doesn't handle the color-interpolation properties, while the actual filters are fine.
You can get more details at: http://home.hccnet.nl/th.v.d.gronde/inkscape/ResultViewer.html
I intend to continue running the tests as before in the near future, but I am still most definitely interested in any additional help with further automating these tests and/or writing new ones. Specifically:

- I currently run the tests by starting a batch file on my notebook, basically whenever I feel like it. A machine that could run them on a timer would be great (it would have to compile Inkscape, run the tests and upload the results, possibly sending an e-mail if there are any regressions).
- I currently judge any new results (i.e. results which don't match any existing reference). Usually this is fine (most of the time there are only about 1-3 new results), but any help would be appreciated. Of course this would be a lot easier if the machine running the tests could also serve the output images; then a small on-line application could be made to let people judge the new results.
- I'd like to change the comparison algorithm. Currently I'm using perceptualdiff, but I would prefer something that simply allows a given error threshold (maximum error and/or RMS error). Obviously extra bonus points for an algorithm that can do something sensible when comparing curves (the SVG standard says something about rendered curves having to be within one pixel of the "true" curve).
- Any additional tests (based on bug reports, for example) would obviously be welcome.
- I'd love to be able to link a bug report to a specific fail result (linking it to a file is probably not a good idea, as one test may exhibit different bugs over time). Any ideas and/or implementations for doing this would be great. Note that the system currently cannot distinguish between two fail references, so some easy way of specifying for which outputs a bug holds would be very nice.
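To make the comparison idea concrete, here is a minimal sketch of a threshold-based comparison: it takes two images as flat sequences of 8-bit channel values and checks both the maximum per-channel difference and the RMS difference against thresholds. The function name and the default threshold values are made up for illustration; real values would have to be tuned against the reference renderings.

```python
import math

def images_match(a, b, max_err=8, rms_err=2.0):
    """Compare two images given as equal-length flat sequences of
    8-bit channel values. Returns (ok, max_diff, rms_diff).

    max_err and rms_err are hypothetical placeholder thresholds.
    """
    if len(a) != len(b) or not a:
        return False, None, None
    diffs = [abs(x - y) for x, y in zip(a, b)]
    max_diff = max(diffs)
    rms_diff = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    ok = max_diff <= max_err and rms_diff <= rms_err
    return ok, max_diff, rms_diff
```

In practice one would decode the two PNGs to raw pixel data first; the point is only that, unlike perceptualdiff, pass/fail becomes a simple, reproducible function of two explicit error bounds.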