I have modified Inkscape's blur filter to operate in linear time using a method that gives a very good approximation of a gaussian blur using a small IIR filter (falling back on a heavily modified version of the old code for very small kernel sizes, as the IIR filter doesn't handle those well). This means that the time needed to compute a blur doesn't depend on the size of the kernel (except for the fact that larger borders are needed for larger kernels).
I'm posting it here for feedback, if people think this is a good idea I'll clean up some last things and submit it to the patch tracker. Also, I wouldn't mind some testing, I only tested with a simple image with a blurred rectangle and gallardo.svgz (the latter has quite a nice collection of blurs).
When rendering gallardo.svgz using the new code cuts the total time spent in FilterGaussian::render roughly in half (on my computer it is 30-40%, so 50% should be quite attainable), compared to the SVN implementation using average(!) quality, while being virtually indistinguishable from best(!) quality rendering. I have seen some differences on certain test drawings in my test application, but for a rendering of gallardo.svgz the maximum difference was 1 and didn't occur often (and could be due to slightly different rounding, numerical errors, etc.).
In my patch I completely eliminated the old resampling stuff, but I could also imagine keeping it to let people speed up gaussian blur even more.
Some things I know of that could be done to make it faster/smaller/etc.: - Currently it uses either the IIR filter or the FIR filter for both the X and Y directions, which could in theory result in bad performance on blurs where the deviation is high in only one direction. There is no technical reason preventing this decision to be made separately for the X and Y directions though (I just haven't come around to doing that). - Code for the X and Y directions could be merged, as they are so similar that only the bounds and some strides differ (might have a negative impact on performance though, would have to be timed). - A fixed-point format could be used for the intermediate values in the IIR filter (I already experimented with this in a test application, it works quite well and is slightly faster, probably more so on older hardware). - Currently temporary storage is used to store the intermediate values for an entire row/column for the IIR filter. This isn't strictly necessary (but probably provides slightly better quality and without it the values would need to be clipped when storing them, at least in my tests).
BTW, I also made a small change to how the normal gaussian kernel is calculated which results in slightly higher quality blurs but can make direct comparison with the old code harder (the step size wasn't very accurate for small kernel sizes, resulting in a blur that could be quite visibly different from the correct one). So when evaluating the performance of the IIR filter it should be compared to the output of the FIR filter in the new code.
Index: src/display/nr-filter-gaussian.cpp =================================================================== --- src/display/nr-filter-gaussian.cpp (revision 13436) +++ src/display/nr-filter-gaussian.cpp (working copy) @@ -6,13 +6,16 @@ * Authors: * Niko Kiirala <niko@...1267...> * bulia byak + * Jasper van de Gronde <th.v.d.gronde@...528...> * * Copyright (C) 2006 authors * * Released under GNU GPL, read the file 'COPYING' for more information */
+#include <algorithm> #include <cmath> +#include <complex> #include <glib.h>
using std::isnormal; @@ -24,6 +27,37 @@ #include "libnr/nr-matrix.h" #include "prefs-utils.h"
+// IIR filtering method based on: +// L.J. van Vliet, I.T. Young, and P.W. Verbeek, Recursive Gaussian Derivative Filters, +// in: A.K. Jain, S. Venkatesh, B.C. Lovell (eds.), +// ICPR'98, Proc. 14th Int. Conference on Pattern Recognition (Brisbane, Aug. 16-20), +// IEEE Computer Society Press, Los Alamitos, 1998, 509-514. +// +// Using the backwards-pass initialization procedure from: +// Boundary Conditions for Young - van Vliet Recursive Filtering +// Bill Triggs, Michael Sdika +// IEEE Transactions on Signal Processing, Volume 54, Number 5 - may 2006 + +// Number of IIR filter coefficients used. Currently only 3 is supported. +// "Recursive Gaussian Derivative Filters" says this is enough though (and +// some testing indeed shows that the quality doesn't improve much if larger +// filters are used). +const size_t N = 3; + +template<typename InIt, typename OutIt, typename Size> +void copy_n(InIt beg_in, Size N, OutIt beg_out) { + std::copy(beg_in, beg_in+N, beg_out); +} + +typedef double Value; // Type used for filter coefficients (can be 10.21 fixed point) +template<typename T> inline T sqr(T const v) { return v*v; } + +template<typename Tt, typename Ts> +Tt round_cast(Ts const& v) { + static Ts const rndoffset(.5); + return static_cast<Tt>(v+rndoffset); +} + namespace NR {
FilterGaussian::FilterGaussian() @@ -45,211 +79,39 @@ { int length_x = _effect_area_scr_x(trans); int length_y = _effect_area_scr_y(trans); - return _max(length_x, length_y) * 2 + 1; + return std::max(length_x, length_y) + 1; }
void FilterGaussian::_make_kernel(double *kernel, double deviation, double expansion) { - double length = deviation * 3.0; - int scr_len = (int)std::floor(length * expansion); - if(scr_len < 1) scr_len = 1; - double d_sq = deviation * deviation * 2; - double step = length / scr_len; + int const scr_len = (int)std::ceil(deviation * 3.0 * expansion); + double const d_sq = sqr(deviation * expansion) * 2;
+ // Compute kernel and sum of coefficients + // Note that actually only half the kernel is computed, as it is symmetric double sum = 0; - for ( int i = 0; i < scr_len * 2 + 1 ; i++ ) { - double i_sq = (step * i - length) * (step * i - length); - sum += (std::exp(-i_sq / d_sq) / std::sqrt(M_PI * d_sq)); + for ( int i = 0; i <= scr_len ; i++ ) { + kernel[i] = std::exp(-sqr(i) / d_sq); + sum += kernel[i]; } + sum = 2*sum - kernel[0]; // the sum of the complete kernel is twice as large (minus the center element to avoid counting it twice)
- for ( int i = 0; i < scr_len * 2 + 1 ; i++ ) { - double i_sq = (step * i - length) * (step * i - length); - kernel[i] = (std::exp(-i_sq / d_sq) / std::sqrt(M_PI * d_sq)) / sum; + // Normalize kernel + for ( int i = 0; i <= scr_len ; i++ ) { + kernel[i] /= sum; } }
int FilterGaussian::_effect_area_scr_x(Matrix const &trans) { - int ret = (int)std::floor(_deviation_x * 3.0 * trans.expansionX()); - if(ret < 1) ret = 1; - return ret; + return (int)std::ceil(_deviation_x * 3.0 * trans.expansionX()); }
int FilterGaussian::_effect_area_scr_y(Matrix const &trans) { - int ret = (int)std::floor(_deviation_y * 3.0 * trans.expansionY()); - if(ret < 1) ret = 1; - return ret; + return (int)std::ceil(_deviation_y * 3.0 * trans.expansionY()); }
-int FilterGaussian::_effect_subsample_step(int scr_len_x, int quality) -{ - switch (quality) { - case BLUR_QUALITY_WORST: - if (scr_len_x < 8) { - return 1; - } else if (scr_len_x < 32) { - return 4; - } else if (scr_len_x < 64) { - return 8; - } else if (scr_len_x < 128) { - return 32; - } else if (scr_len_x < 256) { - return 128; - } else if (scr_len_x < 512) { - return 512; - } else if (scr_len_x < 1024) { - return 4096; - } else { - return 65536; - } - break; - case BLUR_QUALITY_WORSE: - if (scr_len_x < 16) { - return 1; - } else if (scr_len_x < 64) { - return 4; - } else if (scr_len_x < 120) { - return 8; - } else if (scr_len_x < 200) { - return 32; - } else if (scr_len_x < 400) { - return 64; - } else if (scr_len_x < 800) { - return 256; - } else if (scr_len_x < 1200) { - return 1024; - } else { - return 65536; - } - break; - case BLUR_QUALITY_BETTER: - if (scr_len_x < 32) { - return 1; - } else if (scr_len_x < 160) { - return 4; - } else if (scr_len_x < 320) { - return 8; - } else if (scr_len_x < 640) { - return 32; - } else if (scr_len_x < 1280) { - return 64; - } else if (scr_len_x < 2560) { - return 256; - } else { - return 1024; - } - break; - case BLUR_QUALITY_BEST: - return 1; // no subsampling at all - break; - case BLUR_QUALITY_NORMAL: - default: - if (scr_len_x < 16) { - return 1; - } else if (scr_len_x < 80) { - return 4; - } else if (scr_len_x < 160) { - return 8; - } else if (scr_len_x < 320) { - return 32; - } else if (scr_len_x < 640) { - return 64; - } else if (scr_len_x < 1280) { - return 256; - } else if (scr_len_x < 2560) { - return 1024; - } else { - return 65536; - } - break; - } -} - -int FilterGaussian::_effect_subsample_step_log2(int scr_len_x, int quality) -{ - switch (quality) { - case BLUR_QUALITY_WORST: - if (scr_len_x < 8) { - return 0; - } else if (scr_len_x < 32) { - return 2; - } else if (scr_len_x < 64) { - return 3; - } else if (scr_len_x < 128) { - return 5; - } else if (scr_len_x < 256) { - return 7; - } else if (scr_len_x < 512) { - return 9; - } else if (scr_len_x < 1024) { - return 12; - } else { - return 16; - } - break; - case BLUR_QUALITY_WORSE: - if (scr_len_x < 16) { - return 0; - } else if (scr_len_x < 64) { - return 2; - } else if (scr_len_x < 120) { - return 3; - } else if (scr_len_x < 200) { - return 5; - } else if (scr_len_x < 400) { - return 6; - } else if (scr_len_x < 800) { - return 8; - } else if (scr_len_x < 1200) { - return 10; - } else { - return 16; - } - break; - case BLUR_QUALITY_BETTER: - if (scr_len_x < 32) { - return 0; - } else if (scr_len_x < 160) { - return 2; - } else if (scr_len_x < 320) { - return 3; - } else if (scr_len_x < 640) { - return 5; - } else if (scr_len_x < 1280) { - return 6; - } else if (scr_len_x < 2560) { - return 8; - } else { - return 10; - } - break; - case BLUR_QUALITY_BEST: - return 0; // no subsampling at all - break; - case BLUR_QUALITY_NORMAL: - default: - if (scr_len_x < 16) { - return 0; - } else if (scr_len_x < 80) { - return 2; - } else if (scr_len_x < 160) { - return 3; - } else if (scr_len_x < 320) { - return 5; - } else if (scr_len_x < 640) { - return 6; - } else if (scr_len_x < 1280) { - return 8; - } else if (scr_len_x < 2560) { - return 10; - } else { - return 16; - } - break; - } -} - /** * Sanity check function for indexing pixblocks. * Catches reading and writing outside the pixblock area. @@ -264,6 +126,155 @@ } }
+static void calcFilter(double const sigma, double b[N]) { + assert(N==3); + std::complex<double> const d1_org(1.40098, 1.00236); + double const d3_org = 1.85132; + double qbeg = 1; // Don't go lower than sigma==2 (we'd probably want a normal convolution in that case anyway) + double qend = 2*sigma; + double const sigmasqr = sqr(sigma); + double s; + do { // Binary search for right q (a linear interpolation scheme is suggested, but this should work fine as well) + double const q = (qbeg+qend)/2; + // Compute scaled filter coefficients + std::complex<double> const d1 = pow(d1_org, 1.0/q); + double const d3 = pow(d3_org, 1.0/q); + double const absd1sqr = std::norm(d1); + double const re2d1 = 2*d1.real(); + double const bscale = 1.0/(absd1sqr*d3); + b[2] = -bscale; + b[1] = bscale*(d3+re2d1); + b[0] = -bscale*(absd1sqr+d3*re2d1); + // Compute actual sigma^2 + double const ssqr = 2*(2*(d1/sqr(d1-1.)).real()+d3/sqr(d3-1.)); + if ( ssqr < sigmasqr ) { + qbeg = q; + } else { + qend = q; + } + s = sqrt(ssqr); + } while(qend-qbeg>(sigma/(1<<30))); + printf("real sigma: %g, q: %g\n", s, qend); +} + +static void calcTriggsSdikaM(double const b[N], double M[N*N]) { + assert(N==3); + double a1=b[0], a2=b[1], a3=b[2]; + double const Mscale = 1.0/((1+a1-a2+a3)*(1-a1-a2-a3)*(1+a2+(a1-a3)*a3)); + M[0] = 1-a2-a1*a3-sqr(a3); + M[1] = (a1+a3)*(a2+a1*a3); + M[2] = a3*(a1+a2*a3); + M[3] = a1+a2*a3; + M[4] = (1-a2)*(a2+a1*a3); + M[5] = a3*(1-a2-a1*a3-sqr(a3)); + M[6] = a1*(a1+a3)+a2*(1-a2); + M[7] = a1*(a2-sqr(a3))+a3*(1+a2*(a2-1)-sqr(a3)); + M[8] = a3*(a1+a2*a3); + for(size_t i=0; i<9; i++) M[i] *= Mscale; +} + +template<unsigned int SIZE> +static void calcTriggsSdikaInitialization(double const M[N*N], Value const uold[N][SIZE], Value const uplus[SIZE], Value const vplus[SIZE], Value const alpha, Value vold[N][SIZE]) { + for(size_t c=0; c<SIZE; c++) { + double uminp[N]; + for(size_t i=0; i<N; i++) uminp[i] = uold[i][c] - uplus[c]; + for(size_t i=0; i<N; i++) { + double voldf = 0; + for(size_t j=0; j<N; j++) { + voldf += uminp[j]*M[i*N+j]; + } + // Properly takes care of the scaling coefficient alpha and vplus (which is already appropriately scaled) + // This was arrived at by starting from a version of the blur filter that ignored the scaling coefficient + // (and scaled the final output by alpha^2) and then gradually reintroducing the scaling coefficient. + voldf *= alpha; + vold[i][c] = voldf; + vold[i][c] += vplus[c]; + } + } +} + +// TODO: Make filterX and filterY the same function by doing something with strides +template<typename PT, unsigned int PC> +void filterX(NRPixBlock *const in, NRPixBlock *const out, Value const b[N], double const M[N*N], Value *const tmpdata) { + size_t const w=in->area.x1-in->area.x0, h=in->area.y1-in->area.y0; + // Blur image (x) + for ( size_t y = 0 ; y < h ; y++ ) { + // corresponding line in the source and output buffer + PT const *const srcimg = reinterpret_cast<PT const *>(NR_PIXBLOCK_PX(in) + y*in->rs); + PT *const dstimg = reinterpret_cast<PT *>(NR_PIXBLOCK_PX(out) + y*out->rs); + // Border constants + Value imin[PC]; copy_n(srcimg + (0)*PC, PC, imin); + Value iplus[PC]; copy_n(srcimg + (w-1)*PC, PC, iplus); + // Forward pass + Value u[N+1][PC]; + for(size_t i=0; i<N; i++) copy_n(imin, PC, u[i]); + for ( size_t x = 0 ; x < w ; x++ ) { + for(size_t i=N; i>0; i--) copy_n(u[i-1], PC, u[i]); + copy_n(srcimg+x*PC, PC, u[0]); + for(size_t c=0; c<PC; c++) { + u[0][c] *= b[0]; + for(size_t i=1; i<N+1; i++) u[0][c] += u[i][c]*b[i]; + } + copy_n(u[0], PC, tmpdata+x*PC); + } + // Backward pass + Value v[N+1][PC]; + calcTriggsSdikaInitialization(M, u, iplus, iplus, b[0], v); + for(size_t c=0; c<PC; c++) dstimg[(w-1)*PC+c] = round_cast<PT>(v[0][c]); + size_t x=w-1; + while(x-->0) { + for(size_t i=N; i>0; i--) copy_n(v[i-1], PC, v[i]); + copy_n(tmpdata+x*PC, PC, v[0]); + for(size_t c=0; c<PC; c++) { + v[0][c] *= b[0]; + for(size_t i=1; i<N+1; i++) v[0][c] += v[i][c]*b[i]; + } + for(size_t c=0; c<PC; c++) dstimg[x*PC+c] = round_cast<PT>(v[0][c]); + } + } +} + +template<typename PT, unsigned int PC> +void filterY(NRPixBlock *const in, NRPixBlock *const out, Value const b[N], double const M[N*N], Value *const tmpdata) { + size_t const w=in->area.x1-in->area.x0, h=in->area.y1-in->area.y0; + size_t const in_rs=in->rs/sizeof(PT), out_rs=out->rs/sizeof(PT); + // Blur image (y) + for ( size_t x = 0 ; x < w ; x++ ) { + // corresponding line in the source and output buffer + PT const *const srcimg = reinterpret_cast<PT const *>(NR_PIXBLOCK_PX(in) + x*PC); + PT *const dstimg = reinterpret_cast<PT *>(NR_PIXBLOCK_PX(out) + x*PC); + // Border constants + Value imin[PC]; copy_n(srcimg + (0)*in_rs, PC, imin); + Value iplus[PC]; copy_n(srcimg + (h-1)*in_rs, PC, iplus); + // Forward pass + Value u[N+1][PC]; + for(size_t i=0; i<N; i++) copy_n(imin, PC, u[i]); + for ( size_t y = 0 ; y < h ; y++ ) { + for(size_t i=N; i>0; i--) copy_n(u[i-1], PC, u[i]); + copy_n(srcimg+y*in_rs, PC, u[0]); + for(size_t c=0; c<PC; c++) { + u[0][c] *= b[0]; + for(size_t i=1; i<N+1; i++) u[0][c] += u[i][c]*b[i]; + } + copy_n(u[0], PC, tmpdata+y*PC); + } + // Backward pass + Value v[N+1][PC]; + calcTriggsSdikaInitialization(M, u, iplus, iplus, b[0], v); + for(size_t c=0; c<PC; c++) dstimg[(h-1)*out_rs+c] = round_cast<PT>(v[0][c]); + size_t y=h-1; + while(y-->0) { + for(size_t i=N; i>0; i--) copy_n(v[i-1], PC, v[i]); + copy_n(tmpdata+y*PC, PC, v[0]); + for(size_t c=0; c<PC; c++) { + v[0][c] *= b[0]; + for(size_t i=1; i<N+1; i++) v[0][c] += v[i][c]*b[i]; + } + for(size_t c=0; c<PC; c++) dstimg[y*out_rs+c] = round_cast<PT>(v[0][c]); + } + } +} + int FilterGaussian::render(FilterSlot &slot, Matrix const &trans) { /* in holds the input pixblock */ @@ -288,262 +299,313 @@ return 0; }
- /* Blur radius in screen units (pixels) */ - int scr_len_x = _effect_area_scr_x(trans); - int scr_len_y = _effect_area_scr_y(trans); + // TODO: Find the best threshold (should not be lower than 1) + // TODO: Split into separate x and y parts. + //if (_deviation_x * trans.expansionX() >= 2 && _deviation_y * trans.expansionY() >= 2 && std::abs(_deviation_x * trans.expansionX()-2.37749)>0.00001) { + // This condition was chosen based on the old code that was used to decide on how much subsampling to use. + // (Using the condition below the old code would never have used subsampling.) + if (3*std::min(_deviation_x * trans.expansionX(), _deviation_y * trans.expansionY()) >= 8) { + // new buffer for the final output, same resolution as the in buffer + NRPixBlock *out = new NRPixBlock; + nr_pixblock_setup_fast(out, in->mode, in->area.x0, in->area.y0, + in->area.x1, in->area.y1, true); + if (out->data.px == NULL) { + // alas, we've accomplished a lot, but ran out of memory - so abort + return 0; + } + size_t numChannels; + switch(in->mode) { + case NR_PIXBLOCK_MODE_A8: ///< Grayscale + numChannels = 1; + break; + case NR_PIXBLOCK_MODE_R8G8B8: ///< 8 bit RGB + numChannels = 3; + break; + case NR_PIXBLOCK_MODE_R8G8B8A8N: ///< Normal 8 bit RGBA + numChannels = 4; + break; + case NR_PIXBLOCK_MODE_R8G8B8A8P: ///< Premultiplied 8 bit RGBA + numChannels = 4; + break; + default: + assert(false); + } + Value *const tmpdata = new Value[std::max(in->area.x1-in->area.x0,in->area.y1-in->area.y0)*numChannels]; + if (tmpdata == NULL) { + nr_pixblock_release(out); + delete out; + return 0; + }
- // subsampling step; it depends on the radius, but somewhat nonlinearly, to make high zooms - // workable; is adjustable by quality in -2..2; 0 is the default; 2 is the best quality with no - // subsampling - int quality = prefs_get_int_attribute ("options.blurquality", "value", 0); - int stepx = _effect_subsample_step(scr_len_x, quality); - int stepx_l2 = _effect_subsample_step_log2(scr_len_x, quality); - int stepy = _effect_subsample_step(scr_len_y, quality); - int stepy_l2 = _effect_subsample_step_log2(scr_len_y, quality); - int stepx2 = stepx >> 1; - int stepy2 = stepy >> 1; + // Filter variables + Value b[N+1]; // scaling coefficient + filter coefficients (can be 10.21 fixed point) + double bf[N]; // computed filter coefficients + double M[N*N]; // matrix used for initialization procedure (has to be double)
- /* buffer for x-axis blur */ - NRPixBlock *bufx = new NRPixBlock; - /* buffer for y-axis blur */ - NRPixBlock *bufy = new NRPixBlock; + // Compute filter (x) + calcFilter(_deviation_x * trans.expansionX(), bf); + for(size_t i=0; i<N; i++) bf[i] = -bf[i]; + b[0] = 1; // b[0] == alpha (scaling coefficient) + for(size_t i=0; i<N; i++) { + b[i+1] = bf[i]; + b[0] -= b[i+1]; + }
- // boundaries of the subsampled (smaller, unless step==1) buffers - int xd0 = (in->area.x0 >> stepx_l2); - int xd1 = (in->area.x1 >> stepx_l2) + 1; - int yd0 = (in->area.y0 >> stepy_l2); - int yd1 = (in->area.y1 >> stepy_l2) + 1; + // Compute initialization matrix (x) + calcTriggsSdikaM(bf, M);
- // set up subsampled buffers - nr_pixblock_setup_fast(bufx, in->mode, xd0, yd0, xd1, yd1, true); - nr_pixblock_setup_fast(bufy, in->mode, xd0, yd0, xd1, yd1, true); - if (bufx->data.px == NULL || bufy->data.px == NULL) { // no memory - return 0; - } + // Filter (x) + switch(in->mode) { + case NR_PIXBLOCK_MODE_A8: ///< Grayscale + filterX<unsigned char,1>(in, out, b, M, tmpdata); + break; + case NR_PIXBLOCK_MODE_R8G8B8: ///< 8 bit RGB + filterX<unsigned char,3>(in, out, b, M, tmpdata); + break; + case NR_PIXBLOCK_MODE_R8G8B8A8N: ///< Normal 8 bit RGBA + filterX<unsigned char,4>(in, out, b, M, tmpdata); + break; + case NR_PIXBLOCK_MODE_R8G8B8A8P: ///< Premultiplied 8 bit RGBA + filterX<unsigned char,4>(in, out, b, M, tmpdata); + break; + default: + assert(false); + };
- //mid->visible_area = in->visible_area; - //out->visible_area = in->visible_area; + if ( std::abs(_deviation_x*trans.expansionX() - _deviation_y*trans.expansionY()) > 0.000001 ) { + // Compute filter (y) + calcFilter(_deviation_y * trans.expansionY(), bf); + for(size_t i=0; i<N; i++) bf[i] = -bf[i]; + b[0] = 1; // b[0] == alpha (scaling coefficient) + for(size_t i=0; i<N; i++) { + b[i+1] = bf[i]; + b[0] -= b[i+1]; + }
- /* Array for filter kernel, big enough to fit kernels for both X and Y - * direction kernel, one at time */ - double kernel[_kernel_size(trans)]; - - /* 1. Blur in direction of X-axis, from in to bufx (they have different resolution)*/ - _make_kernel(kernel, _deviation_x, trans.expansionX()); + // Compute initialization matrix (y) + calcTriggsSdikaM(bf, M); + }
- for ( int y = bufx->area.y0 ; y < bufx->area.y1; y++ ) { + // Filter (y) + switch(in->mode) { + case NR_PIXBLOCK_MODE_A8: ///< Grayscale + filterY<unsigned char,1>(out, out, b, M, tmpdata); + break; + case NR_PIXBLOCK_MODE_R8G8B8: ///< 8 bit RGB + filterY<unsigned char,3>(out, out, b, M, tmpdata); + break; + case NR_PIXBLOCK_MODE_R8G8B8A8N: ///< Normal 8 bit RGBA + filterY<unsigned char,4>(out, out, b, M, tmpdata); + break; + case NR_PIXBLOCK_MODE_R8G8B8A8P: ///< Premultiplied 8 bit RGBA + filterY<unsigned char,4>(out, out, b, M, tmpdata); + break; + default: + assert(false); + };
- // corresponding line in the source buffer - int in_line; - if ((y << stepy_l2) >= in->area.y1) { - in_line = (in->area.y1 - in->area.y0 - 1) * in->rs; - } else { - in_line = ((y << stepy_l2) - (in->area.y0)) * in->rs; - if (in_line < 0) - in_line = 0; + delete[] tmpdata; + + out->empty = FALSE; + slot.set(_output, out); + } else { + /* Blur radius in screen units (pixels) */ + int scr_len_x = _effect_area_scr_x(trans); + int scr_len_y = _effect_area_scr_y(trans); + + int const PC = NR_PIXBLOCK_BPP(in); + typedef unsigned char PT; + + /* buffer for blur */ + NRPixBlock *out = new NRPixBlock; + nr_pixblock_setup_fast(out, in->mode, in->area.x0, in->area.y0, + in->area.x1, in->area.y1, true); + if (out->data.px == NULL) { // no memory + return 0; }
- // current line in bufx - int bufx_line = (y - yd0) * bufx->rs; + /* Array for filter kernel, big enough to fit kernels for both X and Y + * direction kernel, one at time */ + double kernel[_kernel_size(trans)];
- int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN}; + // Past pixels seen (to enable in-place operation) + PT history[_kernel_size(trans)][PC]; + + int const width=in->area.x1-in->area.x0, height=in->area.y1-in->area.y0;
- for ( int x = bufx->area.x0 ; x < bufx->area.x1 ; x++ ) { + /* 1. Blur in direction of X-axis, from in to bufx (they have different resolution)*/ + _make_kernel(kernel, _deviation_x, trans.expansionX());
- // for all bytes of the pixel - for ( int byte = 0 ; byte < NR_PIXBLOCK_BPP(in) ; byte++) { + for ( int y = 0 ; y < height; y++ ) {
- if(skipbuf[byte] > x) continue; + // corresponding line in the source buffer + int const in_line = y * in->rs;
- double sum = 0; - int last_in = -1; - int different_count = 0; + // current line in bufx + int const bufx_line = y * out->rs;
- // go over our point's neighborhood on x axis in the in buffer, with stepx increment - for ( int i = -scr_len_x ; i <= scr_len_x ; i += stepx ) { + int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN};
- // the pixel we're looking at - int x_in = (((x << stepx_l2) + i + stepx2) >> stepx_l2) << stepx_l2; + // history initialization + PT imin[PC]; copy_n(NR_PIXBLOCK_PX(in) + in_line, PC, imin); + for(int i=0; i<scr_len_x; i++) copy_n(imin, PC, history[i]);
- // distance from it to the current x,y - int dist = x_in - (x << stepx_l2); - if (dist < -scr_len_x) - dist = -scr_len_x; - if (dist > scr_len_x) - dist = scr_len_x; + for ( int x = 0 ; x < width ; x++ ) {
- if (x_in >= in->area.x1) { - x_in = (in->area.x1 - in->area.x0 - 1); - } else { - x_in = (x_in - in->area.x0); - if (x_in < 0) - x_in = 0; - } + int const disp = PC * x;
- // value at the pixel - _check_index(in, in_line + NR_PIXBLOCK_BPP(in) * x_in + byte, __LINE__); - unsigned char in_byte = NR_PIXBLOCK_PX(in)[in_line + NR_PIXBLOCK_BPP(in) * x_in + byte]; + // update history + for(int i=scr_len_x; i>0; i--) copy_n(history[i-1], PC, history[i]); + copy_n(NR_PIXBLOCK_PX(in) + in_line + disp, PC, history[0]);
- // is it the same as last one we saw? - if(in_byte != last_in) different_count++; - last_in = in_byte; + // for all bytes of the pixel + for ( int byte = 0 ; byte < PC ; byte++) {
- // sum pixels weighted by the kernel; multiply by stepx because we're skipping stepx pixels - sum += stepx * in_byte * kernel[scr_len_x + dist]; - } + if(skipbuf[byte] > x) continue;
- // store the result in bufx - _check_index(bufx, bufx_line + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte, __LINE__); - NR_PIXBLOCK_PX(bufx)[bufx_line + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte] = (unsigned char)sum; + double sum = 0; + int last_in = -1; + int different_count = 0;
- // optimization: if there was no variation within this point's neighborhood, - // skip ahead while we keep seeing the same last_in byte: - // blurring flat color would not change it anyway - if (different_count <= 1) { - int pos = x + 1; - while(((pos << stepx_l2) + scr_len_x) < in->area.x1 && - NR_PIXBLOCK_PX(in)[in_line + NR_PIXBLOCK_BPP(in) * ((pos << stepx_l2) + scr_len_x - in->area.x0) + byte] == last_in) - { - _check_index(in, in_line + NR_PIXBLOCK_BPP(in) * ((pos << stepx_l2) + scr_len_x - in->area.x0) + byte, __LINE__); - _check_index(bufx, bufx_line + NR_PIXBLOCK_BPP(bufx) * (pos - xd0) + byte, __LINE__); - NR_PIXBLOCK_PX(bufx)[bufx_line + NR_PIXBLOCK_BPP(bufx) * (pos - xd0) + byte] = last_in; - pos++; + // go over our point's neighbours in the history + for ( int i = 0 ; i <= scr_len_x ; i++ ) { + // value at the pixel + unsigned char in_byte = history[i][byte]; + + // is it the same as last one we saw? + if(in_byte != last_in) different_count++; + last_in = in_byte; + + // sum pixels weighted by the kernel + sum += in_byte * kernel[i]; } - skipbuf[byte] = pos; - } - } - } - }
+ // go over our point's neighborhood on x axis in the in buffer + for ( int i = 1 ; i <= scr_len_x ; i++ ) {
- /* 2. Blur in direction of Y-axis, from bufx to bufy (they have the same resolution) */ - _make_kernel(kernel, _deviation_y, trans.expansionY()); + // the pixel we're looking at + int x_in = x + i; + if (x_in >= width) { + x_in = width - 1; + } else if (x_in < 0) { + x_in = 0; + }
- for ( int x = bufy->area.x0 ; x < bufy->area.x1; x++ ) { + // value at the pixel + _check_index(in, in_line + PC * x_in + byte, __LINE__); + unsigned char in_byte = NR_PIXBLOCK_PX(in)[in_line + PC * x_in + byte];
- int bufy_disp = NR_PIXBLOCK_BPP(bufy) * (x - xd0); - int bufx_disp = NR_PIXBLOCK_BPP(bufx) * (x - xd0); + // is it the same as last one we saw? + if(in_byte != last_in) different_count++; + last_in = in_byte;
- int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN}; + // sum pixels weighted by the kernel + sum += in_byte * kernel[i]; + }
- for ( int y = bufy->area.y0; y < bufy->area.y1; y++ ) { + // store the result in bufx + _check_index(out, bufx_line + disp + byte, __LINE__); + NR_PIXBLOCK_PX(out)[bufx_line + disp + byte] = (unsigned char)sum;
- int bufy_line = (y - yd0) * bufy->rs; + // optimization: if there was no variation within this point's neighborhood, + // skip ahead while we keep seeing the same last_in byte: + // blurring flat color would not change it anyway + if (different_count <= 1) { + int pos = x + 1; + while(pos + scr_len_x < width && + NR_PIXBLOCK_PX(in)[in_line + PC * (pos + scr_len_x) + byte] == last_in) + { + _check_index(in, in_line + PC * (pos + scr_len_x) + byte, __LINE__); + _check_index(out, bufx_line + PC * pos + byte, __LINE__); + NR_PIXBLOCK_PX(out)[bufx_line + PC * pos + byte] = last_in; + pos++; + } + skipbuf[byte] = pos; + } + } + } + }
- for ( int byte = 0 ; byte < NR_PIXBLOCK_BPP(bufx) ; byte++) {
- if (skipbuf[byte] > y) continue; + /* 2. Blur in direction of Y-axis, from bufx to bufy (they have the same resolution) */ + _make_kernel(kernel, _deviation_y, trans.expansionY());
- double sum = 0; - int last_in = -1; - int different_count = 0; + for ( int x = 0 ; x < width; x++ ) {
- for ( int i = -scr_len_y ; i <= scr_len_y ; i += stepy ) { + int const disp = PC * x;
- int y_in = ((((y << stepy_l2) + i + stepy2) >> stepy_l2) - yd0); + int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN};
- int dist = ((y_in + yd0) << stepy_l2) - (y << stepy_l2); - if (dist < -scr_len_y) - dist = -scr_len_y; - if (dist > scr_len_y) - dist = scr_len_y; + PT imin[PC]; copy_n(NR_PIXBLOCK_PX(out) + disp, PC, imin); + for(int i=0; i<scr_len_y; i++) copy_n(imin, PC, history[i]);
- if (y_in >= (yd1 - yd0)) y_in = (yd1 - yd0) - 1; - if (y_in < 0) y_in = 0; + for ( int y = 0; y < height; y++ ) {
- _check_index(bufx, y_in * bufx->rs + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte, __LINE__); - unsigned char in_byte = NR_PIXBLOCK_PX(bufx)[y_in * bufx->rs + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte]; - if(in_byte != last_in) different_count++; - last_in = in_byte; - sum += stepy * in_byte * kernel[scr_len_y + dist]; - } + int bufy_line = y * out->rs;
- _check_index(bufy, bufy_line + bufy_disp + byte, __LINE__); - NR_PIXBLOCK_PX(bufy)[bufy_line + bufy_disp + byte] = (unsigned char)sum; + for(int i=scr_len_y; i>0; i--) copy_n(history[i-1], PC, history[i]); + copy_n(NR_PIXBLOCK_PX(out) + bufy_line + disp, PC, history[0]);
- if (different_count <= 1) { - int pos = y + 1; - while((pos + (scr_len_y >> stepy_l2) + 1) < yd1 && - NR_PIXBLOCK_PX(bufx)[(pos + (scr_len_y >> stepy_l2) + 1 - yd0) * bufx->rs + bufx_disp + byte] == last_in) - { - _check_index(bufx, (pos + (scr_len_y >> stepy_l2) + 1 - yd0) * bufx->rs + bufx_disp + byte, __LINE__); - _check_index(bufy, (pos - yd0) * bufy->rs + bufy_disp + byte, __LINE__); - NR_PIXBLOCK_PX(bufy)[(pos - yd0) * bufy->rs + bufy_disp + byte] = last_in; - pos++; - } - skipbuf[byte] = pos; - } + for ( int byte = 0 ; byte < PC ; byte++) {
- } - } - } + if (skipbuf[byte] > y) continue;
- // we don't need bufx anymore - nr_pixblock_release(bufx); - delete bufx; + double sum = 0; + int last_in = -1; + int different_count = 0;
- // interpolation will need to divide by stepx * stepy - int divisor = stepx_l2 + stepy_l2; + for ( int i = 0 ; i <= scr_len_y ; i++ ) { + // value at the pixel + unsigned char in_byte = history[i][byte];
- // new buffer for the final output, same resolution as the in buffer - NRPixBlock *out = new NRPixBlock; - nr_pixblock_setup_fast(out, in->mode, in->area.x0, in->area.y0, - in->area.x1, in->area.y1, true); - if (out->data.px == NULL) { - // alas, we've accomplished a lot, but ran out of memory - so abort - return 0; - } + // is it the same as last one we saw? + if(in_byte != last_in) different_count++; + last_in = in_byte;
- for ( int y = yd0 ; y < yd1 - 1; y++ ) { - for ( int x = xd0 ; x < xd1 - 1; x++ ) { - for ( int byte = 0 ; byte < NR_PIXBLOCK_BPP(bufy) ; byte++) { + // sum pixels weighted by the kernel + sum += in_byte * kernel[i]; + }
- // get 4 values at the corners of the pixel from bufy - _check_index(bufy, ((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) + (x - xd0) + byte, __LINE__); - unsigned char a00 = NR_PIXBLOCK_PX(bufy)[((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x - xd0) + byte]; - if (stepx == 1 && stepy == 1) { // if there was no subsampling, just use a00 - _check_index(out, ((y - yd0) * out->rs) + NR_PIXBLOCK_BPP(out) * (x - xd0) + byte, __LINE__); - NR_PIXBLOCK_PX(out)[((y - yd0) * out->rs) + NR_PIXBLOCK_BPP(out) * (x - xd0) + byte] = a00; - continue; - } - _check_index(bufy, ((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte, __LINE__); - unsigned char a10 = NR_PIXBLOCK_PX(bufy)[((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte]; - _check_index(bufy, ((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x - xd0) + byte, __LINE__); - unsigned char a01 = NR_PIXBLOCK_PX(bufy)[((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x - xd0) + byte]; - _check_index(bufy, ((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte, __LINE__); - unsigned char a11 = NR_PIXBLOCK_PX(bufy)[((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte]; + for ( int i = 1 ; i <= scr_len_y ; i++ ) {
- // iterate over the rectangle to be interpolated - for ( int yi = 0 ; yi < stepy; yi++ ) { - int iy = stepy - yi; - int y_out = (y << stepy_l2) + yi; - if ((y_out < out->area.y0) || (y_out >= out->area.y1)) - continue; - int out_line = (y_out - out->area.y0) * out->rs; + int y_in = y + i; + if (y_in >= height) { + y_in = height - 1; + } else if (y_in < 0) { + y_in = 0; + }
- for ( int xi = 0 ; xi < stepx; xi++ ) { - int ix = stepx - xi; - int x_out = (x << stepx_l2) + xi; - if ((x_out < out->area.x0) || (x_out >= out->area.x1)) - continue; + _check_index(out, y_in * out->rs + disp + byte, __LINE__); + unsigned char in_byte = NR_PIXBLOCK_PX(out)[y_in * out->rs + disp + byte]; + if(in_byte != last_in) different_count++; + last_in = in_byte; + sum += in_byte * kernel[i]; + }
- // simple linear interpolation - int a = (a00*ix*iy + a10*xi*iy + a01*ix*yi + a11*xi*yi) >> divisor; + _check_index(out, bufy_line + disp + byte, __LINE__); + NR_PIXBLOCK_PX(out)[bufy_line + disp + byte] = (unsigned char)sum;
- _check_index(out, out_line + NR_PIXBLOCK_BPP(out) * (x_out - out->area.x0) + byte, __LINE__); - NR_PIXBLOCK_PX(out)[out_line + NR_PIXBLOCK_BPP(out) * (x_out - out->area.x0) + byte] = (unsigned char) a; + if (different_count == 0) { + int pos = y + 1; + while((pos + scr_len_y) < height && + NR_PIXBLOCK_PX(out)[(pos + scr_len_y) * out->rs + disp + byte] == last_in) + { + _check_index(out, (pos + scr_len_y) * out->rs + disp + byte, __LINE__); + _check_index(out, pos * out->rs + disp + byte, __LINE__); + NR_PIXBLOCK_PX(out)[pos * out->rs + disp + byte] = last_in; + pos++; + } + skipbuf[byte] = pos; } + } } } + + out->empty = FALSE; + slot.set(_output, out); }
- nr_pixblock_release(bufy); - delete bufy; - - out->empty = FALSE; - slot.set(_output, out); - return 0; }
@@ -551,7 +613,7 @@ { int area_x = _effect_area_scr_x(trans); int area_y = _effect_area_scr_y(trans); - return _max(area_x, area_y); + return std::max(area_x, area_y); }
void FilterGaussian::set_deviation(double deviation) Index: src/display/nr-filter-gaussian.h =================================================================== --- src/display/nr-filter-gaussian.h (revision 13436) +++ src/display/nr-filter-gaussian.h (working copy) @@ -7,6 +7,7 @@ * Authors: * Niko Kiirala <niko@...1267...> * bulia byak + * Jasper van de Gronde <th.v.d.gronde@...528...> * * Copyright (C) 2006 authors * @@ -62,17 +63,6 @@ void _make_kernel(double *kernel, double deviation, double expansion); int _effect_area_scr_x(Matrix const &trans); int _effect_area_scr_y(Matrix const &trans); - int _effect_subsample_step(int scr_len_x, int quality); - int _effect_subsample_step_log2(int scr_len_x, int quality); - - inline int _min(int const a, int const b) - { - return ((a < b) ? a : b); - } - inline int _max(int const a, int const b) - { - return ((a > b) ? a : b); - } };
On Sat, 2006-12-02 at 19:13 +0100, Jasper van de Gronde wrote:
When rendering gallardo.svgz using the new code cuts the total time spent in FilterGaussian::render roughly in half (on my computer it is 30-40%, so 50% should be quite attainable), compared to the SVN implementation using average(!) quality, while being virtually indistinguishable from best(!) quality rendering. I have seen some differences on certain test drawings in my test application, but for a rendering of gallardo.svgz the maximum difference was 1 and didn't occur often (and could be due to slightly different rounding, numerical errors, etc.).
Hmm. That worries me a bit -- can you try and construct SVG versions of the drawings in your test application? We really need to get a sense of what the worst case is, and how to mitigate it.
-mental
MenTaLguY wrote:
On Sat, 2006-12-02 at 19:13 +0100, Jasper van de Gronde wrote:
When rendering gallardo.svgz using the new code cuts the total time spent in FilterGaussian::render roughly in half (on my computer it is 30-40%, so 50% should be quite attainable), compared to the SVN implementation using average(!) quality, while being virtually indistinguishable from best(!) quality rendering. I have seen some differences on certain test drawings in my test application, but for a rendering of gallardo.svgz the maximum difference was 1 and didn't occur often (and could be due to slightly different rounding, numerical errors, etc.).
Hmm. That worries me a bit -- can you try and construct SVG versions of the drawings in your test application? We really need to get a sense of what the worst case is, and how to mitigate it.
I imported the image I used earlier in Inkscape and compared the new code with and without the IIR filter and found no (significant) differences. Previously I used Corel PhotoPaint to generate the reference image, so I also tried some other applications and found it quite hard to get the same results in each application.
My guess is that the differences I found previously weren't due to errors in the IIR filter, but rather a difference in interpretation of the radius between Corel PhotoPaint and my code or some optimization of gaussian blur on their side, perhaps they quantize coefficients quite coarsely for example.
I do have a question though. I found the relation between the radius entered in the fill & stroke dialog and the actual value of sigma used quite odd (sigma=radius*400/(width+height)), so I was wondering whether there was a reason for this, because it makes it quite difficult to predict the result of the specified blur.
On 12/3/06, Jasper van de Gronde <th.v.d.gronde@...528...> wrote:
I do have a question though. I found the relation between the radius entered in the fill & stroke dialog and the actual value of sigma used quite odd (sigma=radius*400/(width+height)), so I was wondering whether there was a reason for this, because it makes it quite difficult to predict the result of the specified blur.
This is the end user control, and therefore the formula is designed to make most sense for the user, not the programmer :)
When using blur, I'm rarely interested as a designer in setting a precise radius. Instead I tend to think in terms like "slightly blurred" or "heavily blurred", and these terms are clearly relative to the size of the selected object. If the blur slider was setting an absolute radius, it would be very hard to use for small obejcts (where even a slightest movement of the slider would cause great changes in visible image) and of limited use for large objects (where even the maximum value may be not enough). Therefore it uses percentage. Taking width+height as a base is an attempt to define blur as percentage of a vaguely understood "visual size" of the object - surely the "size" is not the same as as width+height, but it correlates better with that sum than with just width or just height, and slightly better than with sqrt(width^2 + height^2) because a square looks "bigger" than an oblong rectangle with the same diagonal. Finally, the 400 coefficient is there so that the maximum 100% corresponds to the maximum useful blur radius, which turns out to be approximately 1/4 of width+height; anything larger will not alter the image in any meaningful way, just make the already amorphous blob larger and lighter.
P.S. I haven't tried the patch yet but I hope to find time for that in the next few days. Thanks Jasper for your efforts!
This patch is really powerfull, the rendering of my picture (http://www.gnome-look.org/content/show.php?content=48742) take only 1min30s to render in PNG contrary to the previous rendering way that take 4 hours on the same box for the same quality.
But there is one problem with this patch, this is about the speed of rendering in user space, the time needed to render the picture during the design is as long as the rendering to PNG. There is now way to set the quality for the user space with your patch and this is a big problem for the reactivity.
I think you should add a way to choose the quality like the previous rendering method.
Yann
participants (4)
-
bulia byak
-
Jasper van de Gronde
-
MenTaLguY
-
yann.papouin