On Jan 15, 2010, at 3:06 AM, Krzysztof KosiĆski wrote:
Using a list will result in completely unacceptable performance. Most actions will be O(n^2) where n is the number of selected nodes, because most actions that work on selected nodes check whether each one is selected. Using a sorted set instead of a hash set / hash map is slightly better - O(n log n), but this structure is also inadequate. The correct alternative is fixing Apple's broken and outdated headers and using a hash set, then we get O(n) on average.
BTW, I did expect the type of list to be a factor here... but you did not cite the important information.
Logically in an abstract world, what statements you've made here are perhaps sufficient. Unfortunately such abstractions can lead to spherical cows
http://en.wikipedia.org/wiki/Spherical_cow
We need to see a bit of the actual numbers involved. Among other things, CPU look-ahead and caching can significantly affect performance... I've seen simple arrays and array walking outperform hash tables/maps contrary to Big-O predictions. (If interested, I think MSDN has a decent article on this).
So the statement "will result" is really more of "may result, in the abstract". The number of items that we might encounter in use probably is a large factor. So we really need to see some rough numbers for min, max and median counts, along with a bit of performance smoke-testing comparing hash-set to some others. Of course, this needs to be in the context of the whole program compiled with the final optimization levels used for release (I generally have my debug builds set to compile with no optimizations, so as to allow for debugging)
You also only seemed to compare one set implementation to another... but there are other data structures that might be employed. So at the abstract level the information there is missing. Now, we do know that the one hashing algorithm used is merely the pointer cast to an int, so we might have a clue as to the shape of the data and the performance of the hashing itself. However, the size of sets involved and the types of operations performed need to be listed too are factors.
(Again, I'm *guessing* that hashed set will help Inkscape's performance. However, without any details we are just guessing... and that is not software engineering)