Hi. I'm using Inkscape to author a comic, and the slow speed of certain operations is very annoying. I already have OpenMP turned on and am using 4 cores.

1. Large blurs are slow. I'm an expert at writing SIMD code, so I was thinking about vectorizing the Gaussian IIR filter with SIMD intrinsics, even though that's harder than for a FIR filter. But I noticed there isn't any SIMD code in Inkscape, so does that mean it's something to avoid?
I'm pretty sure no current compiler is smart enough to vectorize it automatically, and besides, Inkscape is compiled with -O2, meaning -ftree-vectorize isn't on by default.


2. When there's a large raster image as the background, scrolling in a zoomed region is very slow.
I compiled the latest 0.49 code with GCC profiling enabled, and it shows this:

 33.98     22.47    22.47                             exp2l
 21.29     36.55    14.08                             log2l
 17.57     48.17    11.62                             pow
  7.12     52.88     4.71      658     0.01     0.01  ink_cairo_surface_srgb_to_linear(_cairo_surface*)
  6.72     57.32     4.44      563     0.01     0.01  ink_cairo_surface_linear_to_srgb(_cairo_surface*)
  5.51     60.96     3.64     1216     0.00     0.00  Inkscape::Filters::FilterGaussian::~FilterGaussian()
  5.23     64.42     3.46                             internal_modf
  0.59     64.81     0.39                             _mcount_private
  0.41     65.08     0.27                             __fentry__
  0.12     65.16     0.08                             GC_mark_from
  0.09     65.22     0.06     5579     0.00     0.00  Geom::parse_svg_path(char const*, Geom::SVGPathSink&)
  0.06     65.26     0.04    35320     0.00     0.00  bounds_exact_transformed(std::vector<Geom::Path, std::allocator<Geom::Path> > const&, Geom::Affine const&)
  0.06     65.30     0.04        8     0.01     0.01  convert_pixbuf_normal_to_argb32(_GdkPixbuf*)
  0.05     65.33     0.03   885444     0.00     0.00  std::vector<Geom::Linear, std::allocator<Geom::Linear> >::_M_fill_insert(__gnu_cxx::__normal_iterator<Geom::Linear*, std::vector<Geom::Linear, std::allocator<Geom::Linear> > >, unsigned long long, Geom::Linear const&)


The cost is absolutely dominated by ink_cairo_surface_srgb_to_linear() and ink_cairo_surface_linear_to_srgb(), presumably including the exp2l/log2l/pow time at the top of the profile. My first instinct was to optimize those 2 functions, but then I thought: why are they even being called every time I scroll through the image?
Why not convert the images to linear up front and keep them that way in memory?

If that can't be done, then my optimization approach is:
1. Replace ink_cairo_surface_srgb_to_linear() with a simple 3rd-degree polynomial approximation (0.902590573087882 - 0.010238759806148x + 0.002825455367280x^2 + 0.000004414767235x^3, with x and the result both on the 0-255 scale) and vectorize it with SSE intrinsics. The coefficients were calculated by minimizing the squared error over the range [10, 255]; the maximum error is 0.313. For x < 10, it uses simple scaling.

2. Replace ink_cairo_surface_linear_to_srgb() with a vectorized implementation of pow(). Unlike srgb_to_linear(), a low-degree polynomial can't be used here because the curve has larger high-order derivatives. An alternative would be piecewise low-order polynomials.
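
To make step 1 concrete, here is a minimal scalar sketch (not SSE, and not Inkscape's actual code) that evaluates the cubic in Horner form and measures its worst-case error against the standard sRGB decoding curve. It works on a single un-premultiplied 8-bit channel value. The function names are made up for illustration, and I'm assuming "simple scaling" below 10 means the linear segment of the sRGB curve, i.e. x / 12.92.

```cpp
#include <algorithm>
#include <cmath>

// Reference: standard sRGB decoding of one 8-bit channel value.
// The result stays on the 0..255 scale and is NOT quantized.
double srgb_to_linear_exact(int v) {
    double c = v / 255.0;
    c = (c <= 0.04045) ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
    return c * 255.0;
}

// Cubic from step 1, evaluated in Horner form (3 multiplies, 3 adds).
// Assumption: "simple scaling" for v < 10 is the linear segment v / 12.92.
double srgb_to_linear_poly(int v) {
    if (v < 10) {
        return v / 12.92;
    }
    const double x = v;
    return 0.902590573087882 +
           x * (-0.010238759806148 +
           x * (0.002825455367280 +
           x * 0.000004414767235));
}

// Largest absolute difference between the two over all 256 inputs,
// measured before quantization.
double max_abs_error() {
    double worst = 0.0;
    for (int v = 0; v <= 255; ++v) {
        double err = std::fabs(srgb_to_linear_poly(v) - srgb_to_linear_exact(v));
        worst = std::max(worst, err);
    }
    return worst;
}
```

If the assumptions hold, max_abs_error() should come out close to the 0.313 quoted in step 1. The Horner form is also the shape that would map onto packed multiply/add instructions in an SSE version.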

The main question I have is: what degree of accuracy is desired? Certainly, it doesn't need double precision pow() since the input is only 8 bits! Is +/- 0.5 from the true value (before quantization) OK, or do people depend on getting pixel-perfect results?
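
As a rough way to put a number on the precision question, the sketch below compares a single-precision evaluation of the standard sRGB encoding curve against a double-precision one for all 256 input values, before quantization. Again this is a scalar illustration with made-up function names, operating on one un-premultiplied channel value; it is not a vectorized pow() and not Inkscape's code.

```cpp
#include <algorithm>
#include <cmath>

// Double-precision reference: standard sRGB encoding of one 8-bit
// linear channel value, result on the 0..255 scale, not quantized.
double linear_to_srgb_double(int v) {
    double c = v / 255.0;
    c = (c <= 0.0031308) ? c * 12.92
                         : 1.055 * std::pow(c, 1.0 / 2.4) - 0.055;
    return c * 255.0;
}

// Same curve evaluated entirely in single precision.
float linear_to_srgb_float(int v) {
    float c = v / 255.0f;
    c = (c <= 0.0031308f) ? c * 12.92f
                          : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
    return c * 255.0f;
}

// Largest gap between the float and double results over all inputs.
double max_precision_gap() {
    double worst = 0.0;
    for (int v = 0; v <= 255; ++v) {
        double gap = std::fabs(linear_to_srgb_float(v) - linear_to_srgb_double(v));
        worst = std::max(worst, gap);
    }
    return worst;
}
```

The gap reported by max_precision_gap() can then be compared with whatever tolerance is considered acceptable, such as the +/- 0.5 asked about above.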


 UJ