Hi,
I just checked into trunk a fix for Cairo export of masked objects (bug 597974). In tracing the problem I came across another mask-related problem: Inkscape is calculating the value of the mask incorrectly for screen display. The mask value is supposed to be calculated by applying "the luminance-to-alpha coefficients (as defined in the 'feColorMatrix' filter primitive)" to linear RGB values. At the moment, Inkscape is combining the RGB values with equal weight. This leads to a noticeable difference between Inkscape's screen display and the Cairo-based exports, as well as display in browsers. I am guessing that sRGB color values are also being used where linear RGB is called for, although I don't see a noticeable difference when comparing Inkscape PDF output, or the corrected screen-display output, with Batik or Firefox rendering.
The code to fix is in nrarenaitem.cpp at lines 518 and 536... but as this is speed-critical code, I don't know the best way of doing this. Can someone who is familiar with nrpixops, etc. have a look?
This is the current code:
m = NR_PREMUL_112(s[0] + s[1] + s[2], s[3]); d[0] = FAST_DIV_ROUND<3 * 255>(m);
This is the desired code equivalent:
d[0] = (int)((s[0]*0.2125 + s[1]*0.7154 + s[2]*0.0721) * s[3]/255.0);
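To make the difference concrete, here is a minimal self-contained sketch (hypothetical helper functions, not the actual nrpixops macros) contrasting the two weightings:

```cpp
#include <cassert>

// Current (incorrect) behaviour: the three channels are averaged with
// equal weight before being combined with alpha.
inline int mask_equal_weight(int r, int g, int b, int a) {
    int m = (r + g + b) * a;                // NR_PREMUL_112-style product
    return (m + (3 * 255) / 2) / (3 * 255); // FAST_DIV_ROUND<3*255>-style divide
}

// Desired behaviour: luminance-to-alpha coefficients from 'feColorMatrix'.
inline int mask_luminance(int r, int g, int b, int a) {
    return (int)((r * 0.2125 + g * 0.7154 + b * 0.0721) * a / 255.0);
}
```

For a pure-green pixel at full alpha, the equal-weight version gives 85 while the luminance-to-alpha version gives 182, which is exactly the kind of visible mismatch described above.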
Tav
On 2010-11-21 21:31, Tavmjong Bah wrote:
... The code to fix is in nrarenaitem.cpp at lines 518 and 536... but as this is speed-critical code, I don't know the best way of doing this. Can someone who is familiar with nrpixops, etc. have a look?
This is the current code:
m = NR_PREMUL_112(s[0] + s[1] + s[2], s[3]); d[0] = FAST_DIV_ROUND<3 * 255>(m);
This is the desired code equivalent:
d[0] = (int)((s[0]*0.2125 + s[1]*0.7154 + s[2]*0.0721) * s[3]/255.0);
Thanks a lot for finding this, I think I'll be able to fix this reasonably fast, so I'll have a look.
I just saw that this is (probably) already fixed in the Cairo branch (although the calculation used does have a small problem, see below). So it might be better to copy that. Or... we could finally make the switchover. Seriously, if switching to the Cairo branch fixes some things and "only" introduces some problems with certain transformation matrices (and only in the context of gradients?) then I think it might be time to just switch (well, move the changes to trunk). Especially since we've just released.
As for the computation in the Cairo branch (in struct MaskLuminanceToAlpha), it basically multiplies the coefficients by 255 and does the computation in integers. In principle not a bad way to do it, except that the rounded coefficients don't add up to 1. I would recommend adding one to the green component (so use 183 instead of 182), as its fractional part is closest to one (0.427, vs. 0.1875 and 0.3855). Also, since it's using 32 bit ints anyway a bit more precision could easily be used.
As for speed, the present code is probably fast enough. It might benefit from SSE or something like that though. It might also be possible to speed it up a bit by (partially) expanding the multiplies using shifts and adds. But this would have to be tried, with compilers and processors being quite good at this sort of stuff, and the coefficients not being that "sparse". In any case, there is no good reason to multiply the coefficients by 255, just use 256 (or some other, higher, power of 2), that way the divide can be done with a single shift (and that should definitely be somewhat faster).
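As a sketch of that suggestion (my own rounding, not code from either branch): scaling the coefficients by 256 gives roughly 54.4, 183.1 and 18.46; rounding them and bumping blue up by one makes them sum to exactly 256, so the divide collapses to a single shift:

```cpp
#include <cassert>

// Luminance coefficients scaled by 256, with blue rounded up (18.46 -> 19)
// so the three sum to exactly 256; the divide by 256 is then one shift.
inline int luminance_shift(int r, int g, int b) {
    return (54 * r + 183 * g + 19 * b) >> 8;
}
```

Because the scaled coefficients sum to exactly 256, a full-white input (255, 255, 255) maps back to exactly 255 after the shift.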
On 2010-11-22 10:46, Jasper van de Gronde wrote:
On 2010-11-21 21:31, Tavmjong Bah wrote:
... The code to fix is in nrarenaitem.cpp at lines 518 and 536... but as this is speed-critical code, I don't know the best way of doing this. Can someone who is familiar with nrpixops, etc. have a look?
This is the current code:
m = NR_PREMUL_112(s[0] + s[1] + s[2], s[3]); d[0] = FAST_DIV_ROUND<3 * 255>(m);
This is the desired code equivalent:
d[0] = (int)((s[0]*0.2125 + s[1]*0.7154 + s[2]*0.0721) * s[3]/255.0);
Thanks a lot for finding this, I think I'll be able to fix this reasonably fast, so I'll have a look. ...
Hi,
Just checked in a fix to trunk.
On Mon, 2010-11-22 at 11:28 +0100, Jasper van de Gronde wrote:
I just saw that this is (probably) already fixed in the Cairo branch (although the calculation used does have a small problem, see below). So it might be better to copy that. Or... we could finally make the switchover. Seriously, if switching to the Cairo branch fixes some things and "only" introduces some problems with certain transformation matrices (and only in the context of gradients?) then I think it might be time to just switch (well, move the changes to trunk). Especially since we've just released.
I would love to see this happen!
As for the computation in the Cairo branch (in struct MaskLuminanceToAlpha), it basically multiplies the coefficients by 255 and does the computation in integers. In principle not a bad way to do it, except that the rounded coefficients don't add up to 1. I would recommend adding one to the green component (so use 183 instead of 182), as its fractional part is closest to one (0.427, vs. 0.1875 and 0.3855). Also, since it's using 32 bit ints anyway a bit more precision could easily be used.
As for speed, the present code is probably fast enough. It might benefit from SSE or something like that though. It might also be possible to speed it up a bit by (partially) expanding the multiplies using shifts and adds. But this would have to be tried, with compilers and processors being quite good at this sort of stuff, and the coefficients not being that "sparse". In any case, there is no good reason to multiply the coefficients by 255, just use 256 (or some other, higher, power of 2), that way the divide can be done with a single shift (and that should definitely be somewhat faster).
This is exactly what I came up with this morning... and just checked in. I multiply the RGB values by the coefficients scaled by 256, then shift the result right to divide by 256. I then multiply by alpha and divide by 255.
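A sketch of the checked-in scheme as described (the coefficient values are my assumption of what "scaled by 256" rounds to, not constants copied from the commit):

```cpp
#include <cassert>

// Weigh with coefficients that sum to 256, shift right by 8 to divide
// by 256, then multiply by alpha and do a rounded divide by 255.
inline int mask_value(int r, int g, int b, int a) {
    int l = (54 * r + 183 * g + 19 * b) >> 8; // luminance, 0..255
    return (l * a + 127) / 255;               // premultiply by alpha
}
```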
Note: this calculation is still using sRGB when it should be using linear RGB. I've verified that it gives the correct results when using solid red, green, and blue masks individually (matching Batik and Firefox). With a mask of 50% gray, however, there is a significant difference between Inkscape and other renderers. I've attached a test file to bug 597974:
https://bugs.launchpad.net/inkscape/+bug/597974
Tav
On 2010-11-22 10:46, Jasper van de Gronde wrote:
On 2010-11-21 21:31, Tavmjong Bah wrote:
... The code to fix is in nrarenaitem.cpp at lines 518 and 536... but as this is speed-critical code, I don't know the best way of doing this. Can someone who is familiar with nrpixops, etc. have a look?
This is the current code:
m = NR_PREMUL_112(s[0] + s[1] + s[2], s[3]); d[0] = FAST_DIV_ROUND<3 * 255>(m);
This is the desired code equivalent:
d[0] = (int)((s[0]*0.2125 + s[1]*0.7154 + s[2]*0.0721) * s[3]/255.0);
Thanks a lot for finding this, I think I'll be able to fix this reasonably fast, so I'll have a look. ...
On Mon, 2010-11-22 at 16:06 +0100, Tavmjong Bah wrote:
Note: this calculation is still using sRGB when it should be using linear RGB. I've verified that it gives the correct results when using solid red, green, and blue masks individually (matching Batik and Firefox). With a mask of 50% gray, however, there is a significant difference between Inkscape and other renderers. I've attached a test file to bug 597974:
To convert sRGB to linear RGB I propose adding three lookup tables, one per color channel. The table values will include both the conversion and the luminance-to-alpha coefficients. I've done a quick test and this seems to work.
const unsigned int srgb_to_linear_r[] = { 0, 0, 0, 0, 1, 2, 3, 5, 6, 8, 11, 13, 16, 19, 23, 27, ... };
m = NR_PREMUL_112((srgb_to_linear_r[s[0]] + srgb_to_linear_g[s[1]] + srgb_to_linear_b[s[2]]) >> 8, s[3]); d[0] = FAST_DIV_ROUND<255>(m);
This would fix Inkscape's display but not Cairo PDF export. The latter is more problematic as the pixel data is already multiplied by alpha.
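A hedged sketch of how such tables could be generated. The scaling here is an assumption (each entry holds the linearised channel value, premultiplied by its luminance coefficient and scaled by 255*256, so that the summed tables shifted right by 8 land in 0..255); Tav's pasted sample values may use a different scaling:

```cpp
#include <cassert>
#include <cmath>

// Build one lookup table: sRGB byte -> linearised channel value,
// premultiplied by its luminance coefficient and scaled by 255*256,
// so that (r_lut + g_lut + b_lut) >> 8 yields a luminance in 0..255.
void build_srgb_to_linear_lut(unsigned int lut[256], double coeff) {
    for (int i = 0; i < 256; ++i) {
        double c = i / 255.0;
        // sRGB -> linear transfer function
        double lin = (c <= 0.04045) ? c / 12.92
                                    : std::pow((c + 0.055) / 1.055, 2.4);
        lut[i] = (unsigned int)(lin * coeff * 255.0 * 256.0 + 0.5);
    }
}
```

With tables built for 0.2125, 0.7154 and 0.0721, a white pixel's three entries sum to 65280, which shifts down to 255 as expected.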
Tav
On 22 November 2010 11:28, Jasper van de Gronde <th.v.d.gronde@...528...> wrote:
As for the computation in the Cairo branch (in struct MaskLuminanceToAlpha), it basically multiplies the coefficients by 255 and does the computation in integers. In principle not a bad way to do it, except that the rounded coefficients don't add up to 1. I would recommend adding one to the green component (so use 183 instead of 182), as its fractional part is closest to one (0.427, vs. 0.1875 and 0.3855). Also, since it's using 32 bit ints anyway a bit more precision could easily be used.
I didn't notice that the coefficients didn't add up to 255; thanks for spotting this. I changed them to add up to 512, so the divide can be done with two bitwise operations. I'll commit this change once I finally succeed in doing the trunk merge :)
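For reference, a sketch with the 512 scaling (my own rounding; the actual committed constants may differ): round(0.2125*512) = 109, round(0.7154*512) = 366, round(0.0721*512) = 37, and these happen to sum to exactly 512 with no manual adjustment:

```cpp
#include <cassert>

// Luminance coefficients scaled by 512; the rounded values sum to
// exactly 512 (109 + 366 + 37), so the divide is a shift right by 9.
inline int luminance_512(int r, int g, int b) {
    return (109 * r + 366 * g + 37 * b) >> 9;
}
```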
Regards, Krzysztof
Jasper van de Gronde wrote:
In any case, there is no good reason to multiply the coefficients by 255, just use 256 (or some other, higher, power of 2), that way the divide can be done with a single shift (and that should definitely be somewhat faster).
Maybe I'm going to say something obvious to you; if so, forgive me. When you do integer math there is a difference between multiplying by 255 or 256, and it's in what '1' is. '1' can be 255 (for example if you are speaking about R, G, and B coordinates), so you can make all the calculations up to 0xffffffff (or in general 2^(n+8)-1) and then finally, with x>>n, you get your 8-bit result directly. But if you consider 2^(n+8) as being '1', you finally have to do x/(2^(n+8))*(2^8-1); that is, you need a multiplication that could be avoided because it is implicit in all preceding calculations. And of course, the division cannot be done first (as it would be in the expression I wrote) without using a cast to float (provided x is an integer). Also, having (2^n)-1 as '1' saves one bit that would be 1 only when followed by 0s, and that bit could otherwise be used to increase the precision.
I understand that having 2^n as '1' is conceptually easier, but when quantities limited to n bits are involved, usually (2^n)-1 is '1', and then it's better to use their natural limit as the normalizer.
Regards. Luca
2010/11/26 LucaDC <dicappello@...2144...>:
Jasper van de Gronde wrote:
In any case, there is no good reason to multiply the coefficients by 255, just use 256 (or some other, higher, power of 2), that way the divide can be done with a single shift (and that should definitely be somewhat faster).
Maybe I'm going to say something obvious to you; if so, forgive me. When you do integer math there is a difference between multiplying by 255 or 256, and it's in what '1' is.
In this case it doesn't matter, because the input values are limited to 0...255. We can use any multiplier we want: as long as the coefficients sum up to the value we use as the divisor, it will work correctly. There are many other cases where it won't, precisely because of the reason you mentioned (e.g. in blending, or when alpha is used as a coefficient), and then we can't avoid the division.
Regards, Krzysztof
Krzysztof Kosiński wrote:
In this case it doesn't matter, because the input values are limited to 0...255. We can use any multiplier we want: as long as the coefficients sum up to the value we use as the divisor, it will work correctly. There are many other cases where it won't, precisely because of the reason you mentioned (e.g. in blending, or when alpha is used as a coefficient), and then we can't avoid the division.
Well, I was speaking about avoiding the final multiplication and the use of floats: if you want to get 255 as the final maximum starting from 256 (or 512, or 2^n), you either have to divide by 1.00392... (or 2.00784..., or some other float), or you first have to multiply by 255 and then shift-divide (by 8 or 9). So why not implicitly multiply each partial coefficient by 2^n so that they sum up to 255*2^n? These are multiplications you must do anyway, and this way they become integer multiplications rather than floating-point ones, with no loss of final precision. If you sum up to 32768 (16 bits!) and then divide by 128 (>>7), and your result is '1', you get 256, which is over the limit of 255. But if you sum up to 32767 (or better 32640; both fit in 15 bits, so one more bit could be used for precision), you just have to >>7 and you're done. No floats, no final multiplication: just the final scale already taken into account in all preceding calculations. And the difference between the two approaches is only in considering 256 or 255 (or better, 2^n rather than (2^n)-1) as the maximum. Indeed, it's really in this case that it matters.
This way of doing calculations is typical of microcontrollers and DSPs, where performance counts a lot because you either need to be really fast (but really!) or you don't have enough resources for your application but have to make it work anyway :) My opinion is that Inkscape could be faster than it is, or at least give the feeling that it is. Maybe some more attention to these details (because they are details, but many details make a whole) could help improve performance.
Regards. Luca
On 2010-11-29 10:18, LucaDC wrote:
Krzysztof Kosiński wrote:
In this case it doesn't matter because the input values are limited to 0...255. We can use any multiplier we want  as long as the coeffs sum up to the value we use as the divisor, it will work correctly. There are many other cases where it won't, precisely because of the reason you mentioned (e.g. in blending, or when alpha is used as a coefficient) and then we can't avoid the division.
Well, I was speaking about avoiding the final multiplication and the use of floats: if you want to get 255 as final maximum starting from 256 (or 512, or 2^n),
And that's the point. We do NOT want to get 255 as maximum starting from 256; we want to get 255 as maximum from 255. We have three values (r, g, b) in the range [0,255] and want to weigh them according to certain coefficients, giving l, the luminance, in the range [0,255]. Let's call these coefficients cr, cg and cb (defined such that cr+cg+cb=1). So the ideal computation would be: l = cr*r + cg*g + cb*b
Now, as we've noted we'd like to avoid floating point computations, as they're usually still a bit slower, and we're using integers, so why not keep them as integers. So instead of multiplying by cr,cg,cb we multiply by Bcr,Bcg,Bcb (which are approximately the old coefficients scaled by B, such that Bcr+Bcg+Bcb=B), and then divide by B again to get the unscaled l: l = (Bcr*r+Bcg*g+Bcb*b)/B
To choose a proper B we need two things: it should be large enough to give a relatively precise answer, and the division should ideally be easy. To make it large enough we just take some value at least as large as one divided by the smallest coefficient, and since we have plenty of spare room we can choose something comfortably large. To make the division as easy as possible we choose a power of two, so the division becomes a shift.
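The scheme above, sketched with an arbitrary comfortably large B = 2^15 (the rounded scaled coefficients happen to sum to exactly 32768, so no adjustment is needed):

```cpp
#include <cassert>

// l = (Bcr*r + Bcg*g + Bcb*b) / B with B = 1 << 15; since the scaled
// coefficients sum to B, inputs in 0..255 give l in 0..255.
const int BCR = 6963;   // round(0.2125 * 32768)
const int BCG = 23442;  // round(0.7154 * 32768)
const int BCB = 2363;   // round(0.0721 * 32768)
inline int luminance_fixed(int r, int g, int b) {
    return (BCR * r + BCG * g + BCB * b) >> 15;
}
```

The largest intermediate value is 255 * 32768, which fits comfortably in a 32-bit int.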
Does this make sense now?
Jasper van de Gronde wrote:
And that's the point. We do NOT want to get 255 as maximum starting from 256; we want to get 255 as maximum from 255. ... Does this make sense now?
Yes: I understand I didn't explain myself well enough, sorry. I was trying to say the exact same thing. I meant that if the final sum can be up to 2^n (not the initial values! I know that the range is 0-255, i.e. 8 bits), you get into trouble when dividing by 2^(n-8). Of course, if the coefficients sum up to 2^n but the starting values are up to 255 (i.e. (2^8)-1), the sum never reaches 2^n and the result of the final shift by (n-8) will never be more than 255.
Of course this means that your maximum sum is not 2^n but 255*2^(n-8). This is what I was referring to as '1', while, if I understood correctly, you considered the sum of the coefficients, and this works only because the starting variables all have the same maximum. Try doing the same reasoning for three different variables going from 0 respectively to 255, 127 and 63 to obtain a final maximum of 255: you'll see that you need a quite different approach. Anyway, I'm glad that we agree on this simpler case, and I hope this clarifies my thoughts a bit more.
Regards Luca
participants (4)
- Jasper van de Gronde
- Krzysztof Kosiński
- LucaDC
- Tavmjong Bah