On Tue, 2004-09-21 at 23:52, Peter Selinger wrote:
> One thing is that I use -O3, whereas you probably use -O2. One should manually inline a lot of functions in this case, to make it faster. This makes a difference of 20-30 percent.
For the sake of the less experienced developers here, I feel compelled to point out that there is an important tradeoff involved:
Manual inlining and similar micro-optimizations generally can't yield order-of-magnitude improvements in performance like algorithmic optimizations can.
Worse, most micro-optimizations tend to obscure patterns in the code that are helpful for making larger optimizations and improving the algorithms.
They also really screw up maintainability, and occasionally they can make performance even worse.
For example, an issue raised on the Linux kernel mailing list recently was that the kernel actually performed significantly better when the tiny spinlock functions used throughout the kernel were _not_ inlined. Surprising, but true. The most recent versions of 2.6 have non-inlined spinlocks now.
So, IMO manual inlining and related transformations are only something to do if:
* you've completely debugged the code
* you don't plan on making any more major changes to it
* you've exhausted all the options for better algorithms
* you've measured the performance benefit
* the performance improvement is significant (whether a 20% increase is significant may depend on whether your typical runtime is a few seconds or a few weeks)
* you're a seasoned expert like Peter or myself ^_-
But, you get the idea.
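To make the "measure it" point concrete, here is a rough sketch of the kind of before/after check I mean. It is in Python only for brevity, and nothing in it comes from potrace: clamp(), sum_with_helper(), sum_inlined() and measure() are made-up names for illustration. The idea is to confirm that the hand-inlined version still computes the same thing, and then time both before deciding whether the uglier code is worth keeping.

```python
import timeit


def clamp(x, lo, hi):
    # Tiny helper: the kind of function one is tempted to inline by hand.
    return lo if x < lo else hi if x > hi else x


def sum_with_helper(values):
    # Readable version: calls the helper for every element.
    total = 0
    for v in values:
        total += clamp(v, 0, 255)
    return total


def sum_inlined(values):
    # Same logic with the helper body pasted into the loop.
    total = 0
    for v in values:
        total += 0 if v < 0 else 255 if v > 255 else v
    return total


def measure(fn, values, repeat=5, number=20):
    # Best-of-`repeat` wall-clock time per call, in seconds.
    best = min(timeit.repeat(lambda: fn(values), repeat=repeat, number=number))
    return best / number


if __name__ == "__main__":
    data = list(range(-500, 1500))

    # Correctness first: an "optimization" that changes the answer is a bug.
    assert sum_with_helper(data) == sum_inlined(data)

    t_helper = measure(sum_with_helper, data)
    t_inline = measure(sum_inlined, data)
    print(f"helper:  {t_helper:.6f} s per call")
    print(f"inlined: {t_inline:.6f} s per call")
    print(f"ratio:   {t_helper / t_inline:.2f}x")
```

The numbers will vary by machine and interpreter, which is the whole point: run it on the workload you actually care about rather than trusting a rule of thumb.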
> But this cannot be the only reason it is so slow. I compiled the standalone potrace with g++ and -O0 just for comparison, and it still finishes Jon's image in 1.5 seconds.
So, as demonstrated here, even poor compiler options have relatively little impact on ultimate performance, compared to choosing good algorithms -- potrace is very fast even in the presence of pessimal compiler options.
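A toy illustration of the same point, unrelated to what potrace actually does internally: the sketch below counts basic steps for two ways of checking a list for duplicates. The function names are hypothetical, and step counts stand in for runtime so the result does not depend on the machine.

```python
def has_duplicate_quadratic(values):
    # Compare every pair: about n*(n-1)/2 steps when there is no duplicate.
    steps = 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            steps += 1
            if values[i] == values[j]:
                return True, steps
    return False, steps


def has_duplicate_linear(values):
    # Remember what has been seen so far: about n steps.
    steps = 0
    seen = set()
    for v in values:
        steps += 1
        if v in seen:
            return True, steps
        seen.add(v)
    return False, steps


if __name__ == "__main__":
    for n in (100, 1000, 2000):
        data = list(range(n))  # worst case: no duplicates at all
        _, slow = has_duplicate_quadratic(data)
        _, fast = has_duplicate_linear(data)
        # Pretend micro-optimization shaved 25% off the quadratic version.
        tuned = int(slow * 0.75)
        print(f"n={n}: quadratic={slow} tuned={tuned} linear={fast}")
```

Even with a generous 25% constant-factor saving applied to the pairwise version, it still does hundreds of times more work than the linear one at n=1000, and the gap keeps growing with n.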
-mental