On 09/24/2009 08:52 AM, Jasper van de Gronde wrote:
Diederik van Lierop wrote:
... For now I've fixed my patch (see the attachement) by limiting it to the non-MMX c++ code. This fixes the bitmap pixel alignment issue for some users, but not for everyone. There must be someone reading this who can port this to the assembly code too. It really is only a few lines of code! Until then this bug cannot be completely solved, .... unless we start using Cairo (?)
I would recommend timing to see whether the MMX code is (still) really much faster than the normal C++ code. If it's not significantly faster than the old 32bit C++ code I would go with the higher quality code.
As for porting the code to assembly, if it is indeed making use of MMX I have doubts about the feasibility/usefulness of porting the code, as the assembly would likely not be as efficient for 64bit integers as it would be for 32bit integers (assuming the right instructions do exist). But of course this would have to be tested.
Yes, valid points indeed! So I prepared a patch to do some measurements and got some interesting results: when running Fedora 10 i386 virtualized on a Fedora 10 x86_64 host, the non-mmx code actually appears to be faster by 10 - 15%! But before we decide to kick out some of the mmx optimalizations (only the ones that are being tested here), we would need more measurements because my setup might not be representative. So to all the bug reporters who had their images disappearing due to my previous patch: I know you're running with the mmx optimized code ;-). Please test the attached patch against rev. 22287 and report back the measurement data. This is what you should get on the console:
... avg. time spent rendering; with mmx -> 4.10622 msec | without mmx -> 3.5991 msec avg. time spent rendering; with mmx -> 4.10563 msec | without mmx -> 3.59859 msec avg. time spent rendering; with mmx -> 4.10459 msec | without mmx -> 3.59769 msec
Only the last line is relevant.
PS: Don't use this patch for production work, because everything will be rendered twice (with and without mmx optimalisations). This will slow down the rendering. I also tried running the non-mmx optimized code before the other to eliminate any caching effects, but the results look similar
If we can get rid of this specific mmx optimized assembly code, then we can solve the misalignment in rendering the bitmaps for everyone!
Kind regards,
Diederik
Index: src/libnr/nr-compose-transform.cpp =================================================================== --- src/libnr/nr-compose-transform.cpp (revision 22287) +++ src/libnr/nr-compose-transform.cpp (working copy) @@ -345,14 +345,37 @@ }
#ifdef WITH_MMX - if (NR_PIXOPS_MMX) { - /* WARNING: MMX composer REQUIRES w > 0 and h > 0 */ - nr_mmx_R8G8B8A8_P_R8G8B8A8_P_R8G8B8A8_N_TRANSFORM_n (px, w, h, rs, spx, sw, sh, srs, FFd2s, FF_S, alpha, dbits); - return; - } + GTimeVal t1, t2, t3; + static double total_mmx = 0; + static double total_no_mmx = 0; + static double n = 0; + + if (NR_PIXOPS_MMX) { + g_get_current_time(&t1); + + /* WARNING: MMX composer REQUIRES w > 0 and h > 0 */ + nr_mmx_R8G8B8A8_P_R8G8B8A8_P_R8G8B8A8_N_TRANSFORM_n (px, w, h, rs, spx, sw, sh, srs, FFd2s, FF_S, alpha, dbits); + + g_get_current_time(&t2); + } #endif - nr_R8G8B8A8_P_R8G8B8A8_P_R8G8B8A8_N_TRANSFORM_n (px, w, h, rs, spx, sw, sh, srs, FFd2s, FF_S, alpha, dbits); - } + nr_R8G8B8A8_P_R8G8B8A8_P_R8G8B8A8_N_TRANSFORM_n (px, w, h, rs, spx, sw, sh, srs, FFd2s, FF_S, alpha, dbits); +#ifdef WITH_MMX + if (NR_PIXOPS_MMX) { + g_get_current_time(&t3); + double dt12 = ((t2.tv_sec - t1.tv_sec) * G_USEC_PER_SEC + (t2.tv_usec - t1.tv_usec)) / 1000.0; + double dt23 = ((t3.tv_sec - t2.tv_sec) * G_USEC_PER_SEC + (t3.tv_usec - t2.tv_usec)) / 1000.0; + total_mmx += dt12; + total_no_mmx += dt23; + n++; + std::cout << "avg. time spent rendering; with mmx -> " << total_mmx/n << " msec | without mmx -> " << total_no_mmx/n << " msec" << std::endl; + } else { + std::cout << "sorry, no NR_PIXOPS_MMX" << std::endl; + } +#else + std::cout << "sorry, no WITH_MMX" << std::endl; +#endif + } }
void nr_R8G8B8A8_P_R8G8B8A8_P_R8G8B8A8_P_TRANSFORM (unsigned char *px, int w, int h, int rs,