Re: [Inkscape-devel] PNG export of SVG with Blur

4 Jan 2007

I have also this problem with big artwork, so for the rendering I'm 
using an Inkscape version with the patch from Jasper van de Gronde (from 
the subject [Inkscape-devel] Faster gaussian blur ). With this patch the 
rendering of my picture takes only 3minutes against 6hours normally.
The problem is that it take the same length to render during the design 
and apparently the quality is not the same (personnally I'm not able to 
do a difference) so I'm using this version only to render my picutres
I have added the patch that was attached to the original post 
(02.12.2006 19:13)
++
Nicu Buculei wrote:
...
I have a not very complex SVG (7 layers, under 20 blurred objects) and 
it proved impossible to export it as PNG at 1600x1200. CPU usage 
climbs to 100% (Athlon XP 1800+) and stay here, Inkscape becomes 
unresponsive. I had to kill it after 30 minutes.
Exporting a thumbnail at 128x96 work OK, but the full image not.
I tried a workaround: separate the file in 7 different files (one for 
each layer), export them as PNG and put together in GIMP. It is doable 
for 6 of the 7 layers (but not nice with some of them).
The last layer remaining (attached) is the worst: I can't export it as 
PNG. It contains only two shapes, each of them with a blur filter. 
Export to PNG at 1600x1200 freeze Inkscape and I killed it after 30 
minutes.
Is there a way for me to get the PNG? I guess I can leave it over 
night to export, but this is ridiculous for such a small and simple file.
Used the autopackage version 20070101 (after I got the same bad 
performance with a slightly older autopackage version).


Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=D...


Inkscape-devel mailing list
Inkscape-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/inkscape-devel
Index: src/display/nr-filter-gaussian.cpp
===================================================================

--- src/display/nr-filter-gaussian.cpp	(revision 13436)
+++ src/display/nr-filter-gaussian.cpp	(working copy)
@@ -6,13 +6,16 @@
  * Authors:
  *   Niko Kiirala <niko@...1267...>
  *   bulia byak
+ *   Jasper van de Gronde <th.v.d.gronde@...528...>
  *
  * Copyright (C) 2006 authors
  *
  * Released under GNU GPL, read the file 'COPYING' for more information
  */
+#include <algorithm>
 #include <cmath>
+#include <complex>
 #include <glib.h>
using std::isnormal;
@@ -24,6 +27,37 @@
 #include "libnr/nr-matrix.h"
 #include "prefs-utils.h"
+// IIR filtering method based on:
+// L.J. van Vliet, I.T. Young, and P.W. Verbeek, Recursive Gaussian Derivative Filters,
+// in: A.K. Jain, S. Venkatesh, B.C. Lovell (eds.),
+// ICPR'98, Proc. 14th Int. Conference on Pattern Recognition (Brisbane, Aug. 16-20),
+// IEEE Computer Society Press, Los Alamitos, 1998, 509-514.
+//
+// Using the backwards-pass initialization procedure from:
+// Boundary Conditions for Young - van Vliet Recursive Filtering
+// Bill Triggs, Michael Sdika
+// IEEE Transactions on Signal Processing, Volume 54, Number 5 - may 2006 
+
+// Number of IIR filter coefficients used. Currently only 3 is supported.
+// "Recursive Gaussian Derivative Filters" says this is enough though (and
+// some testing indeed shows that the quality doesn't improve much if larger
+// filters are used).
+const size_t N = 3;
+
+template<typename InIt, typename OutIt, typename Size>
+void copy_n(InIt beg_in, Size N, OutIt beg_out) {
+    std::copy(beg_in, beg_in+N, beg_out);
+}
+
+typedef double Value; // Type used for filter coefficients (can be 10.21 fixed point)
+template<typename T> inline T sqr(T const v) { return v*v; }
+
+template<typename Tt, typename Ts>
+Tt round_cast(Ts const& v) {
+	static Ts const rndoffset(.5);
+	return static_cast<Tt>(v+rndoffset);
+}
+
 namespace NR {
FilterGaussian::FilterGaussian()
@@ -45,211 +79,39 @@
 {
     int length_x = _effect_area_scr_x(trans);
     int length_y = _effect_area_scr_y(trans);
-    return _max(length_x, length_y) * 2 + 1;
+    return std::max(length_x, length_y) + 1;
 }
void FilterGaussian::_make_kernel(double *kernel, double deviation, double expansion)
 {
-    double length = deviation * 3.0;
-    int scr_len = (int)std::floor(length * expansion);
-    if(scr_len < 1) scr_len = 1;
-    double d_sq = deviation * deviation * 2;
-    double step = length / scr_len;
+    int const scr_len = (int)std::ceil(deviation * 3.0 * expansion);
+    double const d_sq = sqr(deviation * expansion) * 2;
+    // Compute kernel and sum of coefficients
+    // Note that actually only half the kernel is computed, as it is symmetric
     double sum = 0;
-    for ( int i = 0; i < scr_len * 2 + 1 ; i++ ) {
-        double i_sq = (step * i - length) * (step * i - length);
-        sum += (std::exp(-i_sq / d_sq) / std::sqrt(M_PI * d_sq));
+    for ( int i = 0; i <= scr_len ; i++ ) {
+        kernel[i] = std::exp(-sqr(i) / d_sq);
+        sum += kernel[i];
     }
+    sum = 2*sum - kernel[0]; // the sum of the complete kernel is twice as large (minus the center element to avoid counting it twice)
-    for ( int i = 0; i < scr_len * 2 + 1 ; i++ ) {
-        double i_sq = (step * i - length) * (step * i - length);
-        kernel[i] = (std::exp(-i_sq / d_sq) / std::sqrt(M_PI * d_sq)) / sum;
+    // Normalize kernel
+    for ( int i = 0; i <= scr_len ; i++ ) {
+        kernel[i] /= sum;
     }
 }
int FilterGaussian::_effect_area_scr_x(Matrix const &trans)
 {
-    int ret = (int)std::floor(_deviation_x * 3.0 * trans.expansionX());
-    if(ret < 1) ret = 1;
-    return ret;
+    return (int)std::ceil(_deviation_x * 3.0 * trans.expansionX());
 }
int FilterGaussian::_effect_area_scr_y(Matrix const &trans)
 {
-    int ret = (int)std::floor(_deviation_y * 3.0 * trans.expansionY());
-    if(ret < 1) ret = 1;
-    return ret;
+    return (int)std::ceil(_deviation_y * 3.0 * trans.expansionY());
 }
-int FilterGaussian::_effect_subsample_step(int scr_len_x, int quality)
-{
-    switch (quality) {
-        case BLUR_QUALITY_WORST:
-            if (scr_len_x < 8) {
-                return 1;
-            } else if (scr_len_x < 32) {
-                return 4;
-            } else if (scr_len_x < 64) {
-                return 8;
-            } else if (scr_len_x < 128) {
-                return 32;
-            } else if (scr_len_x < 256) {
-                return 128;
-            } else if (scr_len_x < 512) {
-                return 512;
-            } else if (scr_len_x < 1024) {
-                return 4096; 
-            } else {
-                return 65536;
-            }
-            break;
-        case BLUR_QUALITY_WORSE:
-            if (scr_len_x < 16) {
-                return 1;
-            } else if (scr_len_x < 64) {
-                return 4;
-            } else if (scr_len_x < 120) {
-                return 8;
-            } else if (scr_len_x < 200) {
-                return 32;
-            } else if (scr_len_x < 400) {
-                return 64;
-            } else if (scr_len_x < 800) {
-                return 256;
-            } else if (scr_len_x < 1200) {
-                return 1024; 
-            } else {
-                return 65536;
-            }
-            break;
-        case BLUR_QUALITY_BETTER:
-            if (scr_len_x < 32) {
-                return 1;
-            } else if (scr_len_x < 160) {
-                return 4;
-            } else if (scr_len_x < 320) {
-                return 8;
-            } else if (scr_len_x < 640) {
-                return 32;
-            } else if (scr_len_x < 1280) {
-                return 64;
-            } else if (scr_len_x < 2560) {
-                return 256;
-            } else {
-                return 1024;
-            }
-            break;
-        case BLUR_QUALITY_BEST:
-            return 1; // no subsampling at all
-            break;
-        case BLUR_QUALITY_NORMAL:
-        default: 
-            if (scr_len_x < 16) {
-                return 1;
-            } else if (scr_len_x < 80) {
-                return 4;
-            } else if (scr_len_x < 160) {
-                return 8;
-            } else if (scr_len_x < 320) {
-                return 32;
-            } else if (scr_len_x < 640) {
-                return 64;
-            } else if (scr_len_x < 1280) {
-                return 256;
-            } else if (scr_len_x < 2560) {
-                return 1024; 
-            } else {
-                return 65536;
-            }
-            break;
-    }
-}
-
-int FilterGaussian::_effect_subsample_step_log2(int scr_len_x, int quality)
-{
-    switch (quality) {
-        case BLUR_QUALITY_WORST:
-            if (scr_len_x < 8) {
-                return 0;
-            } else if (scr_len_x < 32) {
-                return 2;
-            } else if (scr_len_x < 64) {
-                return 3;
-            } else if (scr_len_x < 128) {
-                return 5;
-            } else if (scr_len_x < 256) {
-                return 7;
-            } else if (scr_len_x < 512) {
-                return 9;
-            } else if (scr_len_x < 1024) {
-                return 12; 
-            } else {
-                return 16;
-            }
-            break;
-        case BLUR_QUALITY_WORSE:
-            if (scr_len_x < 16) {
-                return 0;
-            } else if (scr_len_x < 64) {
-                return 2;
-            } else if (scr_len_x < 120) {
-                return 3;
-            } else if (scr_len_x < 200) {
-                return 5;
-            } else if (scr_len_x < 400) {
-                return 6;
-            } else if (scr_len_x < 800) {
-                return 8;
-            } else if (scr_len_x < 1200) {
-                return 10; 
-            } else {
-                return 16;
-            }
-            break;
-        case BLUR_QUALITY_BETTER:
-            if (scr_len_x < 32) {
-                return 0;
-            } else if (scr_len_x < 160) {
-                return 2;
-            } else if (scr_len_x < 320) {
-                return 3;
-            } else if (scr_len_x < 640) {
-                return 5;
-            } else if (scr_len_x < 1280) {
-                return 6;
-            } else if (scr_len_x < 2560) {
-                return 8;
-            } else {
-                return 10;
-            }
-            break;
-        case BLUR_QUALITY_BEST:
-            return 0; // no subsampling at all
-            break;
-        case BLUR_QUALITY_NORMAL:
-        default:
-            if (scr_len_x < 16) {
-                return 0;
-            } else if (scr_len_x < 80) {
-                return 2;
-            } else if (scr_len_x < 160) {
-                return 3;
-            } else if (scr_len_x < 320) {
-                return 5;
-            } else if (scr_len_x < 640) {
-                return 6;
-            } else if (scr_len_x < 1280) {
-                return 8;
-            } else if (scr_len_x < 2560) {
-                return 10; 
-            } else {
-                return 16;
-            }
-            break;
-    }
-}
-
 /**
  * Sanity check function for indexing pixblocks.
  * Catches reading and writing outside the pixblock area.
@@ -264,6 +126,155 @@
     }
 }
+static void calcFilter(double const sigma, double b[N]) {
+    assert(N==3);
+    std::complex<double> const d1_org(1.40098,  1.00236);
+    double const d3_org = 1.85132;
+    double qbeg = 1; // Don't go lower than sigma==2 (we'd probably want a normal convolution in that case anyway)
+    double qend = 2*sigma;
+    double const sigmasqr = sqr(sigma);
+    double s;
+    do { // Binary search for right q (a linear interpolation scheme is suggested, but this should work fine as well)
+        double const q = (qbeg+qend)/2;
+        // Compute scaled filter coefficients
+        std::complex<double> const d1 = pow(d1_org, 1.0/q);
+        double const d3 = pow(d3_org, 1.0/q);
+        double const absd1sqr = std::norm(d1);
+        double const re2d1 = 2*d1.real();
+        double const bscale = 1.0/(absd1sqr*d3);
+        b[2] = -bscale;
+        b[1] =  bscale*(d3+re2d1);
+        b[0] = -bscale*(absd1sqr+d3*re2d1);
+        // Compute actual sigma^2
+        double const ssqr = 2*(2*(d1/sqr(d1-1.)).real()+d3/sqr(d3-1.));
+        if ( ssqr < sigmasqr ) {
+            qbeg = q;
+        } else {
+            qend = q;
+        }
+        s = sqrt(ssqr);
+    } while(qend-qbeg>(sigma/(1<<30)));
+    printf("real sigma: %g, q: %g\n", s, qend);
+}
+
+static void calcTriggsSdikaM(double const b[N], double M[N*N]) {
+    assert(N==3);
+    double a1=b[0], a2=b[1], a3=b[2];
+    double const Mscale = 1.0/((1+a1-a2+a3)*(1-a1-a2-a3)*(1+a2+(a1-a3)*a3));
+    M[0] = 1-a2-a1*a3-sqr(a3);
+    M[1] = (a1+a3)*(a2+a1*a3);
+    M[2] = a3*(a1+a2*a3);
+    M[3] = a1+a2*a3;
+    M[4] = (1-a2)*(a2+a1*a3);
+    M[5] = a3*(1-a2-a1*a3-sqr(a3));
+    M[6] = a1*(a1+a3)+a2*(1-a2);
+    M[7] = a1*(a2-sqr(a3))+a3*(1+a2*(a2-1)-sqr(a3));
+    M[8] = a3*(a1+a2*a3);
+    for(size_t i=0; i<9; i++) M[i] *= Mscale;
+}
+
+template<unsigned int SIZE>
+static void calcTriggsSdikaInitialization(double const M[N*N], Value const uold[N][SIZE], Value const uplus[SIZE], Value const vplus[SIZE], Value const alpha, Value vold[N][SIZE]) {
+    for(size_t c=0; c<SIZE; c++) {
+        double uminp[N];
+        for(size_t i=0; i<N; i++) uminp[i] = uold[i][c] - uplus[c];
+        for(size_t i=0; i<N; i++) {
+            double voldf = 0;
+            for(size_t j=0; j<N; j++) {
+                voldf += uminp[j]*M[i*N+j];
+            }
+            // Properly takes care of the scaling coefficient alpha and vplus (which is already appropriately scaled)
+            // This was arrived at by starting from a version of the blur filter that ignored the scaling coefficient
+            // (and scaled the final output by alpha^2) and then gradually reintroducing the scaling coefficient.
+            voldf *= alpha; 
+            vold[i][c] = voldf;
+            vold[i][c] += vplus[c];
+        }
+    }
+}
+
+// TODO: Make filterX and filterY the same function by doing something with strides
+template<typename PT, unsigned int PC>
+void filterX(NRPixBlock *const in, NRPixBlock *const out, Value const b[N], double const M[N*N], Value *const tmpdata) {
+    size_t const w=in->area.x1-in->area.x0, h=in->area.y1-in->area.y0;
+    // Blur image (x)
+    for ( size_t y = 0 ; y < h ; y++ ) {
+        // corresponding line in the source and output buffer
+        PT const *const srcimg = reinterpret_cast<PT const *>(NR_PIXBLOCK_PX(in)  + y*in->rs);
+        PT       *const dstimg = reinterpret_cast<PT       *>(NR_PIXBLOCK_PX(out) + y*out->rs);
+        // Border constants
+        Value imin[PC];  copy_n(srcimg + (0)*PC, PC, imin);
+        Value iplus[PC]; copy_n(srcimg + (w-1)*PC, PC, iplus);
+        // Forward pass
+        Value u[N+1][PC];
+        for(size_t i=0; i<N; i++) copy_n(imin, PC, u[i]);
+        for ( size_t x = 0 ; x < w ; x++ ) {
+            for(size_t i=N; i>0; i--) copy_n(u[i-1], PC, u[i]);
+            copy_n(srcimg+x*PC, PC, u[0]);
+            for(size_t c=0; c<PC; c++) {
+                u[0][c] *= b[0];
+                for(size_t i=1; i<N+1; i++) u[0][c] += u[i][c]*b[i];
+            }
+            copy_n(u[0], PC, tmpdata+x*PC);
+        }
+        // Backward pass
+        Value v[N+1][PC];
+        calcTriggsSdikaInitialization(M, u, iplus, iplus, b[0], v);
+        for(size_t c=0; c<PC; c++) dstimg[(w-1)*PC+c] = round_cast<PT>(v[0][c]);
+        size_t x=w-1;
+        while(x-->0) {
+            for(size_t i=N; i>0; i--) copy_n(v[i-1], PC, v[i]);
+            copy_n(tmpdata+x*PC, PC, v[0]);
+            for(size_t c=0; c<PC; c++) {
+                v[0][c] *= b[0];
+                for(size_t i=1; i<N+1; i++) v[0][c] += v[i][c]*b[i];
+            }
+            for(size_t c=0; c<PC; c++) dstimg[x*PC+c] = round_cast<PT>(v[0][c]);
+        }
+    }
+}
+
+template<typename PT, unsigned int PC>
+void filterY(NRPixBlock *const in, NRPixBlock *const out, Value const b[N], double const M[N*N], Value *const tmpdata) {
+    size_t const w=in->area.x1-in->area.x0, h=in->area.y1-in->area.y0;
+    size_t const in_rs=in->rs/sizeof(PT), out_rs=out->rs/sizeof(PT);
+    // Blur image (y)
+    for ( size_t x = 0 ; x < w ; x++ ) {
+        // corresponding line in the source and output buffer
+        PT const *const srcimg = reinterpret_cast<PT const *>(NR_PIXBLOCK_PX(in)  + x*PC);
+        PT       *const dstimg = reinterpret_cast<PT       *>(NR_PIXBLOCK_PX(out) + x*PC);
+        // Border constants
+        Value imin[PC];  copy_n(srcimg + (0)*in_rs, PC, imin);
+        Value iplus[PC]; copy_n(srcimg + (h-1)*in_rs, PC, iplus);
+        // Forward pass
+        Value u[N+1][PC];
+        for(size_t i=0; i<N; i++) copy_n(imin, PC, u[i]);
+        for ( size_t y = 0 ; y < h ; y++ ) {
+            for(size_t i=N; i>0; i--) copy_n(u[i-1], PC, u[i]);
+            copy_n(srcimg+y*in_rs, PC, u[0]);
+            for(size_t c=0; c<PC; c++) {
+                u[0][c] *= b[0];
+                for(size_t i=1; i<N+1; i++) u[0][c] += u[i][c]*b[i];
+            }
+            copy_n(u[0], PC, tmpdata+y*PC);
+        }
+        // Backward pass
+        Value v[N+1][PC];
+        calcTriggsSdikaInitialization(M, u, iplus, iplus, b[0], v);
+        for(size_t c=0; c<PC; c++) dstimg[(h-1)*out_rs+c] = round_cast<PT>(v[0][c]);
+        size_t y=h-1;
+        while(y-->0) {
+            for(size_t i=N; i>0; i--) copy_n(v[i-1], PC, v[i]);
+            copy_n(tmpdata+y*PC, PC, v[0]);
+            for(size_t c=0; c<PC; c++) {
+                v[0][c] *= b[0];
+                for(size_t i=1; i<N+1; i++) v[0][c] += v[i][c]*b[i];
+            }
+            for(size_t c=0; c<PC; c++) dstimg[y*out_rs+c] = round_cast<PT>(v[0][c]);
+        }
+    }
+}
+
 int FilterGaussian::render(FilterSlot &slot, Matrix const &trans)
 {
     /* in holds the input pixblock */
@@ -288,262 +299,313 @@
         return 0;
     }
-    /* Blur radius in screen units (pixels) */
-    int scr_len_x = _effect_area_scr_x(trans);
-    int scr_len_y = _effect_area_scr_y(trans);
+    // TODO: Find the best threshold (should not be lower than 1)
+    // TODO: Split into separate x and y parts.
+    //if (_deviation_x * trans.expansionX() >= 2 && _deviation_y * trans.expansionY() >= 2 && std::abs(_deviation_x * trans.expansionX()-2.37749)>0.00001) {
+    // This condition was chosen based on the old code that was used to decide on how much subsampling to use.
+    // (Using the condition below the old code would never have used subsampling.)
+    if (3*std::min(_deviation_x * trans.expansionX(), _deviation_y * trans.expansionY()) >= 8) {
+        // new buffer for the final output, same resolution as the in buffer
+        NRPixBlock *out = new NRPixBlock;
+        nr_pixblock_setup_fast(out, in->mode, in->area.x0, in->area.y0,
+                                              in->area.x1, in->area.y1, true);
+        if (out->data.px == NULL) {
+            // alas, we've accomplished a lot, but ran out of memory - so abort
+            return 0;
+        }
+        size_t numChannels;
+        switch(in->mode) {
+        case NR_PIXBLOCK_MODE_A8:        ///< Grayscale
+            numChannels = 1;
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8:    ///< 8 bit RGB
+            numChannels = 3;
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8A8N: ///< Normal 8 bit RGBA
+            numChannels = 4;
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8A8P:  ///< Premultiplied 8 bit RGBA
+            numChannels = 4;
+            break;
+        default:
+            assert(false);
+        }
+        Value *const tmpdata = new Value[std::max(in->area.x1-in->area.x0,in->area.y1-in->area.y0)*numChannels];
+        if (tmpdata == NULL) {
+            nr_pixblock_release(out);
+            delete out;
+            return 0;
+        }
-    // subsampling step; it depends on the radius, but somewhat nonlinearly, to make high zooms
-    // workable; is adjustable by quality in -2..2; 0 is the default; 2 is the best quality with no
-    // subsampling
-    int quality = prefs_get_int_attribute ("options.blurquality", "value", 0);
-    int stepx = _effect_subsample_step(scr_len_x, quality);
-    int stepx_l2 = _effect_subsample_step_log2(scr_len_x, quality);
-    int stepy = _effect_subsample_step(scr_len_y, quality);
-    int stepy_l2 = _effect_subsample_step_log2(scr_len_y, quality);
-    int stepx2 = stepx >> 1;
-    int stepy2 = stepy >> 1;
+        // Filter variables
+        Value b[N+1];  // scaling coefficient + filter coefficients (can be 10.21 fixed point)
+        double bf[N];  // computed filter coefficients
+        double M[N*N]; // matrix used for initialization procedure (has to be double)
-    /* buffer for x-axis blur */
-    NRPixBlock *bufx = new NRPixBlock;
-    /* buffer for y-axis blur */
-    NRPixBlock *bufy = new NRPixBlock;
+        // Compute filter (x)
+        calcFilter(_deviation_x * trans.expansionX(), bf);
+        for(size_t i=0; i<N; i++) bf[i] = -bf[i];
+        b[0] = 1; // b[0] == alpha (scaling coefficient)
+        for(size_t i=0; i<N; i++) {
+            b[i+1] = bf[i];
+            b[0] -= b[i+1];
+        }
-    // boundaries of the subsampled (smaller, unless step==1) buffers
-    int xd0 = (in->area.x0 >> stepx_l2);
-    int xd1 = (in->area.x1 >> stepx_l2) + 1;
-    int yd0 = (in->area.y0 >> stepy_l2);
-    int yd1 = (in->area.y1 >> stepy_l2) + 1;
+        // Compute initialization matrix (x)
+        calcTriggsSdikaM(bf, M);
-    // set up subsampled buffers
-    nr_pixblock_setup_fast(bufx, in->mode, xd0, yd0, xd1, yd1, true);
-    nr_pixblock_setup_fast(bufy, in->mode, xd0, yd0, xd1, yd1, true);
-    if (bufx->data.px == NULL || bufy->data.px == NULL) { // no memory
-        return 0;
-    }
+        // Filter (x)
+        switch(in->mode) {
+        case NR_PIXBLOCK_MODE_A8:        ///< Grayscale
+            filterX<unsigned char,1>(in, out, b, M, tmpdata);
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8:    ///< 8 bit RGB
+            filterX<unsigned char,3>(in, out, b, M, tmpdata);
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8A8N: ///< Normal 8 bit RGBA
+            filterX<unsigned char,4>(in, out, b, M, tmpdata);
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8A8P:  ///< Premultiplied 8 bit RGBA
+            filterX<unsigned char,4>(in, out, b, M, tmpdata);
+            break;
+        default:
+            assert(false);
+        };
-    //mid->visible_area = in->visible_area;
-    //out->visible_area = in->visible_area;
+        if ( std::abs(_deviation_x*trans.expansionX() - _deviation_y*trans.expansionY()) > 0.000001 ) {
+            // Compute filter (y)
+            calcFilter(_deviation_y * trans.expansionY(), bf);
+            for(size_t i=0; i<N; i++) bf[i] = -bf[i];
+            b[0] = 1; // b[0] == alpha (scaling coefficient)
+            for(size_t i=0; i<N; i++) {
+                b[i+1] = bf[i];
+                b[0] -= b[i+1];
+            }
-    /* Array for filter kernel, big enough to fit kernels for both X and Y
-     * direction kernel, one at time */
-    double kernel[_kernel_size(trans)];
-    
-    /* 1. Blur in direction of X-axis, from in to bufx (they have different resolution)*/
-    _make_kernel(kernel, _deviation_x, trans.expansionX());
+            // Compute initialization matrix (y)
+            calcTriggsSdikaM(bf, M);
+        }
-    for ( int y = bufx->area.y0 ; y < bufx->area.y1; y++ ) {
+        // Filter (y)
+        switch(in->mode) {
+        case NR_PIXBLOCK_MODE_A8:        ///< Grayscale
+            filterY<unsigned char,1>(out, out, b, M, tmpdata);
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8:    ///< 8 bit RGB
+            filterY<unsigned char,3>(out, out, b, M, tmpdata);
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8A8N: ///< Normal 8 bit RGBA
+            filterY<unsigned char,4>(out, out, b, M, tmpdata);
+            break;
+        case NR_PIXBLOCK_MODE_R8G8B8A8P:  ///< Premultiplied 8 bit RGBA
+            filterY<unsigned char,4>(out, out, b, M, tmpdata);
+            break;
+        default:
+            assert(false);
+        };
-        // corresponding line in the source buffer
-        int in_line;
-        if ((y << stepy_l2) >= in->area.y1) {
-            in_line = (in->area.y1 - in->area.y0 - 1) * in->rs;
-        } else {
-            in_line = ((y << stepy_l2) - (in->area.y0))  * in->rs;
-            if (in_line < 0)
-                in_line = 0;
+        delete[] tmpdata;
+
+        out->empty = FALSE;
+        slot.set(_output, out);
+    } else {
+		/* Blur radius in screen units (pixels) */
+        int scr_len_x = _effect_area_scr_x(trans);
+        int scr_len_y = _effect_area_scr_y(trans);
+
+        int const PC = NR_PIXBLOCK_BPP(in);
+        typedef unsigned char PT;
+
+        /* buffer for blur */
+        NRPixBlock *out = new NRPixBlock;
+        nr_pixblock_setup_fast(out, in->mode, in->area.x0, in->area.y0,
+                                              in->area.x1, in->area.y1, true);
+        if (out->data.px == NULL) { // no memory
+            return 0;
         }
-        // current line in bufx
-        int bufx_line = (y - yd0) * bufx->rs;
+        /* Array for filter kernel, big enough to fit kernels for both X and Y
+        * direction kernel, one at time */
+        double kernel[_kernel_size(trans)];
-        int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN};
+        // Past pixels seen (to enable in-place operation)
+        PT history[_kernel_size(trans)][PC];
+        
+        int const width=in->area.x1-in->area.x0, height=in->area.y1-in->area.y0;
-        for ( int x = bufx->area.x0 ; x < bufx->area.x1 ; x++ ) {
+        /* 1. Blur in direction of X-axis, from in to bufx (they have different resolution)*/
+        _make_kernel(kernel, _deviation_x, trans.expansionX());
-            // for all bytes of the pixel
-            for ( int byte = 0 ; byte < NR_PIXBLOCK_BPP(in) ; byte++) {
+        for ( int y = 0 ; y < height; y++ ) {
-                if(skipbuf[byte] > x) continue;
+            // corresponding line in the source buffer
+            int const in_line = y * in->rs;
-                double sum = 0;
-                int last_in = -1;
-                int different_count = 0;
+            // current line in bufx
+            int const bufx_line = y * out->rs;
-                // go over our point's neighborhood on x axis in the in buffer, with stepx increment
-                for ( int i = -scr_len_x ; i <= scr_len_x ; i += stepx ) {
+            int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN};
-                    // the pixel we're looking at
-                    int x_in = (((x << stepx_l2) + i + stepx2) >> stepx_l2) << stepx_l2;
+            // history initialization
+            PT imin[PC]; copy_n(NR_PIXBLOCK_PX(in) + in_line, PC, imin);
+            for(int i=0; i<scr_len_x; i++) copy_n(imin, PC, history[i]);
-                    // distance from it to the current x,y
-                    int dist = x_in - (x << stepx_l2);
-                    if (dist < -scr_len_x) 
-                        dist = -scr_len_x;
-                    if (dist > scr_len_x) 
-                        dist = scr_len_x;
+            for ( int x = 0 ; x < width ; x++ ) {
-                    if (x_in >= in->area.x1) {
-                        x_in = (in->area.x1 - in->area.x0 - 1);
-                    } else {
-                        x_in = (x_in - in->area.x0);
-                        if (x_in < 0)
-                            x_in = 0;
-                    }
+                int const disp = PC * x;
-                    // value at the pixel
-                    _check_index(in, in_line + NR_PIXBLOCK_BPP(in) * x_in + byte, __LINE__);
-                    unsigned char in_byte = NR_PIXBLOCK_PX(in)[in_line + NR_PIXBLOCK_BPP(in) * x_in + byte];
+                // update history
+                for(int i=scr_len_x; i>0; i--) copy_n(history[i-1], PC, history[i]);
+                copy_n(NR_PIXBLOCK_PX(in) + in_line + disp, PC, history[0]);
-                    // is it the same as last one we saw?
-                    if(in_byte != last_in) different_count++;
-                    last_in = in_byte;
+                // for all bytes of the pixel
+                for ( int byte = 0 ; byte < PC ; byte++) {
-                    // sum pixels weighted by the kernel; multiply by stepx because we're skipping stepx pixels
-                    sum += stepx * in_byte * kernel[scr_len_x + dist];
-                }
+                    if(skipbuf[byte] > x) continue;
-                // store the result in bufx
-                _check_index(bufx, bufx_line + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte, __LINE__);
-                NR_PIXBLOCK_PX(bufx)[bufx_line + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte] = (unsigned char)sum;
+                    double sum = 0;
+                    int last_in = -1;
+                    int different_count = 0;
-                // optimization: if there was no variation within this point's neighborhood, 
-                // skip ahead while we keep seeing the same last_in byte: 
-                // blurring flat color would not change it anyway
-                if (different_count <= 1) {
-                    int pos = x + 1;
-                    while(((pos << stepx_l2) + scr_len_x) < in->area.x1 &&
-                          NR_PIXBLOCK_PX(in)[in_line + NR_PIXBLOCK_BPP(in) * ((pos << stepx_l2) + scr_len_x - in->area.x0) + byte] == last_in)
-                    {
-                        _check_index(in, in_line + NR_PIXBLOCK_BPP(in) * ((pos << stepx_l2) + scr_len_x - in->area.x0) + byte, __LINE__);
-                        _check_index(bufx, bufx_line + NR_PIXBLOCK_BPP(bufx) * (pos - xd0) + byte, __LINE__);
-                        NR_PIXBLOCK_PX(bufx)[bufx_line + NR_PIXBLOCK_BPP(bufx) * (pos - xd0) + byte] = last_in;
-                        pos++;
+                    // go over our point's neighbours in the history
+                    for ( int i = 0 ; i <= scr_len_x ; i++ ) {
+                        // value at the pixel
+                        unsigned char in_byte = history[i][byte];
+
+                        // is it the same as last one we saw?
+                        if(in_byte != last_in) different_count++;
+                        last_in = in_byte;
+
+                        // sum pixels weighted by the kernel
+                        sum += in_byte * kernel[i];
                     }
-                    skipbuf[byte] = pos;
-                }
-            }
-        }
-    }
+                    // go over our point's neighborhood on x axis in the in buffer
+                    for ( int i = 1 ; i <= scr_len_x ; i++ ) {
-    /* 2. Blur in direction of Y-axis, from bufx to bufy (they have the same resolution) */
-    _make_kernel(kernel, _deviation_y, trans.expansionY());
+                        // the pixel we're looking at
+                        int x_in = x + i;
+                        if (x_in >= width) {
+                            x_in = width - 1;
+                        } else if (x_in < 0) {
+                            x_in = 0;
+                        }
-    for ( int x = bufy->area.x0 ; x < bufy->area.x1; x++ ) {
+                        // value at the pixel
+                        _check_index(in, in_line + PC * x_in + byte, __LINE__);
+                        unsigned char in_byte = NR_PIXBLOCK_PX(in)[in_line + PC * x_in + byte];
-        int bufy_disp = NR_PIXBLOCK_BPP(bufy) * (x - xd0);
-        int bufx_disp = NR_PIXBLOCK_BPP(bufx) * (x - xd0);
+                        // is it the same as last one we saw?
+                        if(in_byte != last_in) different_count++;
+                        last_in = in_byte;
-        int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN};
+                        // sum pixels weighted by the kernel
+                        sum += in_byte * kernel[i];
+                    }
-        for ( int y = bufy->area.y0; y < bufy->area.y1; y++ ) {
+                    // store the result in bufx
+                    _check_index(out, bufx_line + disp + byte, __LINE__);
+                    NR_PIXBLOCK_PX(out)[bufx_line + disp + byte] = (unsigned char)sum;
-            int bufy_line = (y - yd0) * bufy->rs;
+                    // optimization: if there was no variation within this point's neighborhood, 
+                    // skip ahead while we keep seeing the same last_in byte: 
+                    // blurring flat color would not change it anyway
+                    if (different_count <= 1) {
+                        int pos = x + 1;
+                        while(pos + scr_len_x < width &&
+                              NR_PIXBLOCK_PX(in)[in_line + PC * (pos + scr_len_x) + byte] == last_in)
+                        {
+                            _check_index(in, in_line + PC * (pos + scr_len_x) + byte, __LINE__);
+                            _check_index(out, bufx_line + PC * pos + byte, __LINE__);
+                            NR_PIXBLOCK_PX(out)[bufx_line + PC * pos + byte] = last_in;
+                            pos++;
+                        }
+                        skipbuf[byte] = pos;
+                    }
+                }
+            }
+        }
-            for ( int byte = 0 ; byte < NR_PIXBLOCK_BPP(bufx) ; byte++) {
-                if (skipbuf[byte] > y) continue;
+        /* 2. Blur in direction of Y-axis, from bufx to bufy (they have the same resolution) */
+        _make_kernel(kernel, _deviation_y, trans.expansionY());
-                double sum = 0;
-                int last_in = -1;
-                int different_count = 0;
+        for ( int x = 0 ; x < width; x++ ) {
-                for ( int i = -scr_len_y ; i <= scr_len_y ; i += stepy ) {
+            int const disp = PC * x;
-                    int y_in = ((((y << stepy_l2) + i + stepy2) >> stepy_l2) - yd0);
+            int skipbuf[4] = {INT_MIN, INT_MIN, INT_MIN, INT_MIN};
-                    int dist = ((y_in + yd0) << stepy_l2) - (y << stepy_l2);
-                    if (dist < -scr_len_y) 
-                        dist = -scr_len_y;
-                    if (dist > scr_len_y) 
-                        dist = scr_len_y;
+            PT imin[PC]; copy_n(NR_PIXBLOCK_PX(out) + disp, PC, imin);
+            for(int i=0; i<scr_len_y; i++) copy_n(imin, PC, history[i]);
-                    if (y_in >= (yd1 - yd0)) y_in = (yd1 - yd0) - 1;
-                    if (y_in < 0) y_in = 0;
+            for ( int y = 0; y < height; y++ ) {
-                    _check_index(bufx, y_in * bufx->rs + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte, __LINE__);
-                    unsigned char in_byte = NR_PIXBLOCK_PX(bufx)[y_in * bufx->rs + NR_PIXBLOCK_BPP(bufx) * (x - xd0) + byte];
-                    if(in_byte != last_in) different_count++;
-                    last_in = in_byte;
-                    sum += stepy * in_byte * kernel[scr_len_y + dist];
-                }
+                int bufy_line = y * out->rs;
-                _check_index(bufy, bufy_line + bufy_disp + byte, __LINE__);
-                NR_PIXBLOCK_PX(bufy)[bufy_line + bufy_disp + byte] = (unsigned char)sum;
+                for(int i=scr_len_y; i>0; i--) copy_n(history[i-1], PC, history[i]);
+                copy_n(NR_PIXBLOCK_PX(out) + bufy_line + disp, PC, history[0]);
-                if (different_count <= 1) {
-                    int pos = y + 1;
-                    while((pos + (scr_len_y >> stepy_l2) + 1) < yd1 &&
-                          NR_PIXBLOCK_PX(bufx)[(pos + (scr_len_y >> stepy_l2) + 1 - yd0) * bufx->rs + bufx_disp + byte] == last_in)
-                    {
-                        _check_index(bufx, (pos + (scr_len_y >> stepy_l2) + 1 - yd0) * bufx->rs + bufx_disp + byte, __LINE__);
-                        _check_index(bufy, (pos - yd0) * bufy->rs + bufy_disp + byte, __LINE__);
-                        NR_PIXBLOCK_PX(bufy)[(pos - yd0) * bufy->rs + bufy_disp + byte] = last_in;
-                        pos++;
-                    }
-                    skipbuf[byte] = pos;
-                }
+                for ( int byte = 0 ; byte < PC ; byte++) {
-            }
-        }
-    }
+                    if (skipbuf[byte] > y) continue;
-    // we don't need bufx anymore
-    nr_pixblock_release(bufx);
-    delete bufx;
+                    double sum = 0;
+                    int last_in = -1;
+                    int different_count = 0;
-    // interpolation will need to divide by stepx * stepy
-    int divisor = stepx_l2 + stepy_l2;
+                    for ( int i = 0 ; i <= scr_len_y ; i++ ) {
+                        // value at the pixel
+                        unsigned char in_byte = history[i][byte];
-    // new buffer for the final output, same resolution as the in buffer
-    NRPixBlock *out = new NRPixBlock;
-    nr_pixblock_setup_fast(out, in->mode, in->area.x0, in->area.y0,
-                           in->area.x1, in->area.y1, true);
-    if (out->data.px == NULL) {
-        // alas, we've accomplished a lot, but ran out of memory - so abort
-        return 0;
-    }
+                        // is it the same as last one we saw?
+                        if(in_byte != last_in) different_count++;
+                        last_in = in_byte;
-    for ( int y = yd0 ; y < yd1 - 1; y++ ) {
-        for ( int x = xd0 ; x < xd1 - 1; x++ ) {
-            for ( int byte = 0 ; byte < NR_PIXBLOCK_BPP(bufy) ; byte++) {
+                        // sum pixels weighted by the kernel
+                        sum += in_byte * kernel[i];
+                    }
-                // get 4 values at the corners of the pixel from bufy
-                _check_index(bufy, ((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) + (x - xd0) + byte, __LINE__);
-                unsigned char a00 = NR_PIXBLOCK_PX(bufy)[((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x - xd0) + byte];
-                if (stepx == 1 && stepy == 1) { // if there was no subsampling, just use a00
-                    _check_index(out, ((y - yd0) * out->rs) + NR_PIXBLOCK_BPP(out) * (x - xd0) + byte, __LINE__);
-                    NR_PIXBLOCK_PX(out)[((y - yd0) * out->rs) + NR_PIXBLOCK_BPP(out) * (x - xd0) + byte] = a00;
-                    continue;
-                }
-                _check_index(bufy, ((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte, __LINE__);
-                unsigned char a10 = NR_PIXBLOCK_PX(bufy)[((y - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte];
-                _check_index(bufy, ((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x - xd0) + byte, __LINE__);
-                unsigned char a01 = NR_PIXBLOCK_PX(bufy)[((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x - xd0) + byte];
-                _check_index(bufy, ((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte, __LINE__);
-                unsigned char a11 = NR_PIXBLOCK_PX(bufy)[((y + 1 - yd0) * bufy->rs) + NR_PIXBLOCK_BPP(bufy) * (x + 1 - xd0) + byte];
+                    for ( int i = 1 ; i <= scr_len_y ; i++ ) {
-                // iterate over the rectangle to be interpolated
-                for ( int yi = 0 ; yi < stepy; yi++ ) {
-                    int iy = stepy - yi;
-                    int y_out = (y << stepy_l2) + yi;
-                    if ((y_out < out->area.y0) || (y_out >= out->area.y1))
-                        continue;
-                    int out_line  = (y_out - out->area.y0) * out->rs;
+                        int y_in = y + i;
+                        if (y_in >= height) {
+                            y_in = height - 1;
+                        } else if (y_in < 0) {
+                            y_in = 0;
+                        }
-                    for ( int xi = 0 ; xi < stepx; xi++ ) {
-                        int ix = stepx - xi;
-                        int x_out = (x << stepx_l2) + xi;
-                        if ((x_out < out->area.x0) || (x_out >= out->area.x1))
-                            continue;
+                        _check_index(out, y_in * out->rs + disp + byte, __LINE__);
+                        unsigned char in_byte = NR_PIXBLOCK_PX(out)[y_in * out->rs + disp + byte];
+                        if(in_byte != last_in) different_count++;
+                        last_in = in_byte;
+                        sum += in_byte * kernel[i];
+                    }
-                        // simple linear interpolation
-                        int a = (a00*ix*iy + a10*xi*iy + a01*ix*yi + a11*xi*yi) >> divisor;
+                    _check_index(out, bufy_line + disp + byte, __LINE__);
+                    NR_PIXBLOCK_PX(out)[bufy_line + disp + byte] = (unsigned char)sum;
-                        _check_index(out, out_line + NR_PIXBLOCK_BPP(out) * (x_out - out->area.x0) + byte, __LINE__);
-                        NR_PIXBLOCK_PX(out)[out_line + NR_PIXBLOCK_BPP(out) * (x_out - out->area.x0) + byte] = (unsigned char) a;
+                    if (different_count == 0) {
+                        int pos = y + 1;
+                        while((pos + scr_len_y) < height &&
+                            NR_PIXBLOCK_PX(out)[(pos + scr_len_y) * out->rs + disp + byte] == last_in)
+                        {
+                            _check_index(out, (pos + scr_len_y) * out->rs + disp + byte, __LINE__);
+                            _check_index(out, pos * out->rs + disp + byte, __LINE__);
+                            NR_PIXBLOCK_PX(out)[pos * out->rs + disp + byte] = last_in;
+                            pos++;
+                        }
+                        skipbuf[byte] = pos;
                     }
+
                 }
             }
         }
+
+        out->empty = FALSE;
+        slot.set(_output, out);
     }
-    nr_pixblock_release(bufy);
-    delete bufy;
-
-    out->empty = FALSE;
-    slot.set(_output, out);
-
     return 0;
 }
@@ -551,7 +613,7 @@
 {
     int area_x = _effect_area_scr_x(trans);
     int area_y = _effect_area_scr_y(trans);
-    return _max(area_x, area_y);
+    return std::max(area_x, area_y);
 }
void FilterGaussian::set_deviation(double deviation)
Index: src/display/nr-filter-gaussian.h
===================================================================
--- src/display/nr-filter-gaussian.h	(revision 13436)
+++ src/display/nr-filter-gaussian.h	(working copy)
@@ -7,6 +7,7 @@
  * Authors:
  *   Niko Kiirala <niko@...1267...>
  *   bulia byak
+ *   Jasper van de Gronde <th.v.d.gronde@...528...>
  *
  * Copyright (C) 2006 authors
  *
@@ -62,17 +63,6 @@
     void _make_kernel(double *kernel, double deviation, double expansion);
     int _effect_area_scr_x(Matrix const &trans);
     int _effect_area_scr_y(Matrix const &trans);
-    int _effect_subsample_step(int scr_len_x, int quality);
-    int _effect_subsample_step_log2(int scr_len_x, int quality);
-
-    inline int _min(int const a, int const b)
-    {
-        return ((a < b) ? a : b);
-    }
-    inline int _max(int const a, int const b)
-    {
-        return ((a > b) ? a : b);
-    }
 };

    

Re: [Inkscape-devel] PNG export of SVG with Blur

Yann Papouin