Talking to Simarilius, it looks like the version of libgc we've been using on Win32 has been built to replace the standard malloc.
This should in principle be harmless (except that the collector will be scanning a lot of memory it does not need to, and might mistake e.g. bitmap data for pointers), but
Could someone try a version of libgc that has been built without the malloc-replacement option and report back?
A different problem may be that Win9x and WinNT/2k/XP could require different builds of libgc.
-mental
On Fri, 2004-08-27 at 14:37, MenTaLguY wrote:
Talking to Simarilius, it looks like the version of libgc we've been using on Win32 has been built to replace the standard malloc.
This should in principle be harmless (except that the collector will be scanning a lot of memory it does not need to, and might mistake e.g. bitmap data for pointers), but
Hrm. Didn't finish my thought there.
The thing is, I trust the Boehm collector on Win32, provided we use it conservatively, as it's been very well tested by other projects before us (e.g. Mono currently uses it exclusively on Win32 for memory allocation).
However, I believe less conservative usages (like attempting to override the system allocator with it) are problematic.
If the override isn't perfect, for example, we could end up with memory allocated by the Win32 heap allocation API passed to GC_FREE(), or one of the Win32 heap freeing functions getting passed a gc-managed pointer.
That would be consistent with the symptoms and backtraces we appear to be seeing with the Win32 crashes; dying either in GC_free() or in a Win32 heap free function.
-mental
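To make the failure mode concrete, here is a toy Python model of the mismatch described above. It is not how libgc or the Win32 heap is implemented; the `Heap` class and the names `win32_heap` and `gc_heap` are invented for illustration. Each heap only recognises blocks it handed out itself, so a block allocated by one allocator and released through the other is rejected, which is roughly the situation an imperfect malloc override could produce.

```python
class ForeignPointerError(Exception):
    """Raised when a heap is asked to free a block it never allocated."""


class Heap:
    """Toy allocator that only tracks which block handles it owns."""

    def __init__(self, name):
        self.name = name
        self._owned = set()
        self._next = 0

    def alloc(self):
        # Hand out a fresh handle and remember that this heap owns it.
        self._next += 1
        handle = (self.name, self._next)
        self._owned.add(handle)
        return handle

    def free(self, handle):
        if handle not in self._owned:
            # A real allocator has no such check; it would more likely
            # corrupt its own bookkeeping or crash at this point.
            raise ForeignPointerError(f"{self.name} heap cannot free {handle}")
        self._owned.remove(handle)


win32_heap = Heap("win32")  # hypothetical stand-in for the system heap
gc_heap = Heap("gc")        # hypothetical stand-in for the collector's heap

block = gc_heap.alloc()     # the allocation was routed to the collector...
try:
    win32_heap.free(block)  # ...but the matching free reached the system heap
    outcome = "freed"
except ForeignPointerError:
    outcome = "rejected"
```

In a real program there would be no tidy exception; the mismatch would presumably surface as a crash inside whichever free function received the foreign block.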
On chat, mental and bulia spoke of plans for solving the Windows crash issues. It's possibly due to one of the new dependencies. The tricky part is that since none of us do builds on Windows, it's hard to troubleshoot, especially since stuff changes quite a bit in the codebase on a daily basis. However, we have a plan.
What we'd like to do is temporarily go into a development freeze until this issue is fixed. The focus for developers should be on fixing the critical bugs, memory leaks identified by gc, and all platform build issues. We want to get back to a point where all developers can confidently build and run Inkscape again.
The steps for eliminating libgc as a source of the issues are:
1) test with the existing malloc-overriding libgc build, but USE_LIBGC undefined in src/gc-core.h
2) test with a fixed libgc build, but re-enable USE_LIBGC
3) test with a fixed libgc build, but disable USE_LIBGC again
4) test without libgc linked at all, and with USE_LIBGC disabled
If we still have the problem at #4, it's not libgc-related.
We also need to have people test each of the nightly builds, going back in time until we find the first Win32 CVS version that crashes, so we can do a diff to find out which changes went into that version.
Once we've narrowed it to the smallest set, we can determine the next steps. Hopefully we can get this problem licked quickly, via this approach.
Unless anyone has a major concern about doing this, let's plan on starting the freeze tomorrow (say, 12 hrs from right now). If we can't get good progress on this within a couple days, we'll de-freeze and figure something else out.
Bryce
On Fri, 2004-08-27 at 21:04, Bryce Harrington wrote:
We also need to have people test each of the nightly builds, going back in time until we find the first Win32 CVS version that crashes, so we can do a diff to find out which changes went into that version.
For those of you following along at home, a binary search is probably the best way of going about this; find two builds that are reasonably far apart, where the oldest one doesn't fail, but the newest one does. The offending changes are somewhere in-between.
Next, try a build halfway in-between the first two you checked; if it fails, then the breakage happened somewhere in the first half, otherwise someplace in the second. Repeat the process within that smaller interval, and so forth... eventually it'll be narrowed down to one day.
-mental
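The halving procedure described above can be sketched in a few lines of Python. This is only an illustration: the build numbers and the `crashes` callback are hypothetical stand-ins for downloading and launching a nightly build by hand, and the sketch assumes that once a build crashes, every later build crashes too.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int], crashes: Callable[[int], bool]) -> int:
    """Return the earliest build that crashes.

    Assumes builds are ordered oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and every build after the first bad one is bad.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if crashes(builds[mid]):
            bad = mid    # breakage happened at or before the midpoint
        else:
            good = mid   # breakage happened after the midpoint
    return builds[bad]


# Hypothetical data: 28 nightly builds, with breakage introduced in build 17.
nightlies = list(range(1, 29))
tested = []


def crashes(build: int) -> bool:
    tested.append(build)  # record each manual test run
    return build >= 17    # made-up breakage point, for illustration only


culprit = first_bad_build(nightlies, crashes)
```

Each probe halves the interval, so only a handful of builds need to be tested by hand instead of every nightly in the range.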
MenTaLguY <mental@...360...> writes:
On Fri, 2004-08-27 at 21:04, Bryce Harrington wrote:
We also need to have people test each of the nightly builds, going back in time until we find the first Win32 CVS version that crashes, so we can do a diff to find out which changes went into that version.
For what it is worth: I used to take autobuilds from http://cortijodelrio.net/~inkscape/win32/ and found that the last one that would work (start up without crashing) was from about Aug 5th 2004. After vainly trying builds daily, I gave up this Thursday, when I came across Bob Jamison's post (http://troi.hous.es3.titan.com/~rjamison/inkscape/builds). I then tried building it myself (on Windows, not cross-compiling). I was able to get the build to start up OK, and most of the new things I tried out work (like clones and randomization of stars). But all the text objects are invisible/missing, and I could not create new ones either.
So, there you have it ... a rough date to focus on.
Sudhan
--- Sudhan <rsudharsan@...19...> wrote:
MenTaLguY <mental@...360...> writes:
On Fri, 2004-08-27 at 21:04, Bryce Harrington wrote:
We also need to have people test each of the nightly builds, going back in time until we find the first Win32 CVS version that crashes, so we can do a diff to find out which changes went into that version.
For what it is worth: I used to take autobuilds from http://cortijodelrio.net/~inkscape/win32/ and found that the last one that would work (start up without crashing) was from about Aug 5th 2004. After vainly trying builds daily, I gave up this Thursday, when I came across Bob Jamison's post (http://troi.hous.es3.titan.com/~rjamison/inkscape/builds). I then tried building it myself (on Windows, not cross-compiling). I was able to get the build to start up OK, and most of the new things I tried out work (like clones and randomization of stars). But all the text objects are invisible/missing, and I could not create new ones either.
So, there you have it ... a rough date to focus on.
Sudhan
This just confuses the heck out of me, as native builds on Win2K will not get past startup for me unless run through gdb. I just checked out a clean copy, redownloaded the libs, and compiled afresh to check, and there is no way CVS wants to run on my box. I have done a make clean and am now trying without USE_LIBGC defined.
John
John Cliff wrote:
This just confuses the heck out of me, as native builds on Win2K will not get past startup for me unless run through gdb. I just checked out a clean copy, redownloaded the libs, and compiled afresh to check, and there is no way CVS wants to run on my box. I have done a make clean and am now trying without USE_LIBGC defined.
In my case, it definitely compiles and runs all right (I have tried it many times now, with clean/make/dist), with the exception of the text entities. Even there, I figured out just now that I am able to select these "invisible" entities, and I can even see them if I set them up for "vertical" orientation (!!). In this orientation, all the letters overlap each other.
BTW, I could not compile the stuff without USE_LIBGC defined.
If it will help someone debug, I can provide the distribution. I am on W2K.
Sudhan
--- Bryce Harrington <bryce@...260...> wrote:
We also need to have people test each of the nightly builds, going back in time until we find the first Win32 CVS version that crashes, so we can do a diff to find out which changes went into that version.
For me, using the autobuilds to track it, the last working build was the one from the 4th and the first broken one was from the 6th; the autobuild didn't run on the 5th.
On Sat, 2004-08-28 at 05:45, John Cliff wrote:
--- Bryce Harrington <bryce@...260...> wrote:
We also need to have people test each of the nightly builds, going back in time until we find the first Win32 CVS version that crashes, so we can do a diff to find out which changes went into that version.
For me, using the autobuilds to track it, the last working build was the one from the 4th and the first broken one was from the 6th; the autobuild didn't run on the 5th.
Okay, I've been looking over the deltas between the 4th and 5th, and the 5th and 6th.
Here is what changed:
August 4th -> August 5th:
* I made Inkscape::Refcounted a subclass of Inkscape::GC::FinalizedObject (which primarily affected SPSelection and the few NR::Object-derived classes), and added -lgc to the compiler flags
* New keybindings were added to inkview
* a range bug was fixed in src/draw-context.cpp
* the "vacuum defs" functionality was added
* peter documented various correctness proofs
* some changes were made to SPPath to (theoretically) cope with a missing d= attribute
* large changes were made to SPText which I do not fully understand
* SVG preview and some other stuff was added to the file dialog
* src/helper/helper-forward.h no longer includes src/display/display-forward.h
August 5th -> August 6th:
* -lgc was added to the Win32 makefile (its absence was why the August 5th autobuild failed)
* peter made some fixes to src/display/bezier-utils.cpp
Based on this information, I think it is fairly certain that the crashes are libgc-related.
Based on the analysis that we have done on Jabber over the past few days, it appears that libgc is attempting to take over the system new/delete functionality, so all code that allocates memory is affected.
Interestingly, it appears that things actually got worse when we undefined USE_LIBGC (which causes Inkscape::GC to use the system new/delete rather than the libgc allocator) [item 1 on the checklist I proposed to bryce].
This suggests to me that libgc proper is working fine; it is the system allocator that has been broken by libgc's attempt at takeover.
Since "malloc takeover" is not required, I suggest we make our own build of libgc that does not include it (I am fairly certain it is not the default). That sort of magic is always dicey and error-prone to begin with.
This hypothesis can be confirmed by completing the checklist:
1) existing libgc without USE_LIBGC: should fail
2) "fixed" libgc with USE_LIBGC: should work
3) "fixed" libgc without USE_LIBGC: should work
4) no libgc at all, also without USE_LIBGC: should work
-mental
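The checklist can also be written down as a small table of configurations with the outcome the hypothesis predicts for each. The sketch below is illustrative Python, not part of the Inkscape build; the field names and the `predicted_to_crash` helper are made up here, and the rule it encodes is simply the hypothesis above: a build is expected to fail only when it links the malloc-overriding libgc.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    libgc: Optional[str]  # "overriding", "fixed", or None when libgc is not linked
    use_libgc: bool       # whether USE_LIBGC is defined


def predicted_to_crash(cfg: Config) -> bool:
    # Hypothesis: the malloc takeover is the culprit, regardless of USE_LIBGC.
    return cfg.libgc == "overriding"


# The four checklist items, in order.
CHECKLIST = [
    Config("overriding", use_libgc=False),
    Config("fixed", use_libgc=True),
    Config("fixed", use_libgc=False),
    Config(None, use_libgc=False),
]


def hypothesis_holds(observed_crashes: Sequence[bool]) -> bool:
    """observed_crashes holds one bool per checklist item, in order."""
    return all(
        predicted_to_crash(cfg) == seen
        for cfg, seen in zip(CHECKLIST, observed_crashes)
    )


predictions = [
    "should fail" if predicted_to_crash(cfg) else "should work"
    for cfg in CHECKLIST
]
```

If any observed result disagrees with its prediction, for instance a crash with no libgc linked at all, the hypothesis needs revisiting.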
On Sat, 28 Aug 2004, MenTaLguY wrote:
Okay, I've been looking over the deltas between the 4th and 5th, and the 5th and 6th.
Here is what changed:
Based on this information, I think it is fairly certain that the crashes are libgc-related.
Yup, sounds like a smoking gun.
Based on the analysis that we have done on Jabber over the past few days, it appears that libgc is attempting to take over the system new/delete functionality, so all code that allocates memory is affected.
Interestingly, it appears that things actually got worse when we undefined USE_LIBGC (which causes Inkscape::GC to use the system new/delete rather than the libgc allocator) [item 1 on the checklist I proposed to bryce].
This suggests to me that libgc proper is working fine; it is the system allocator that has been broken by libgc's attempt at takeover.
Since "malloc takeover" is not required, I suggest we make our own build of libgc that does not include it (I am fairly certain it is not the default). That sort of magic is always dicey and error-prone to begin with.
I think this would be a good solution especially due to the fact that gc must be compiled with the --enable-cplusplus option, which it appears few if any pre-packaged versions of gc have, anyway.
If the gc package were smaller, I'd even be tempted to suggest incorporating it into the inkscape codebase, but at over 200 files I think that would be way too much bloat.
This hypothesis can be confirmed by completing the checklist:
- existing libgc without USE_LIBGC: should fail
- "fixed" libgc with USE_LIBGC: should work
- "fixed" libgc without USE_LIBGC: should work
- no libgc at all, also without USE_LIBGC: should work
Sounds like we're making good progress narrowing it down.
Looking at the gc codebase, there's some platform-specific code in it, so it would not be a surprise to find it working fine on Linux but having issues on Windows... Do we know if it crashes on all variants of Windows, or specific versions (like only on XP?)
There's a gctest program included - perhaps someone on Windows could compile and run this to see if it produces anything informative? There's also some todo's at the end of the doc/README.changes. There's also a BUGS section in the README.
Bryce
On Sat, 2004-08-28 at 17:30, Bryce Harrington wrote:
There's a gctest program included - perhaps someone on Windows could compile and run this to see if it produces anything informative?
We definitely ought to do that, on OS X and Linux too if possible.
-mental
On Sat, 2004-08-28 at 17:30, Bryce Harrington wrote:
Looking at the gc codebase, there's some platform-specific code in it, so it would not be a surprise to find it working fine on Linux but having issues on Windows... Do we know if it crashes on all variants of Windows, or specific versions (like only on XP?)
Looking at this I don't think it's libgc crashing -- rather it's libgc cutting the legs out from under the system malloc, which then crashes.
-mental
On Sat, 28 Aug 2004 14:30:17 -0700, Bryce Harrington wrote:
I think this would be a good solution especially due to the fact that gc must be compiled with the --enable-cplusplus option, which it appears few if any pre-packaged versions of gc have, anyway.
If the gc package were smaller, I'd even be tempted to suggest incorporating it into the inkscape codebase, but at over 200 files I think that would be way too much bloat.
The autopackages of 0.40 already statically link libgc for exactly this reason: on the few systems that already have it installed, it usually lacks C++ support. On the tiny number of systems that have it with C++ support, the recent C++ ABI breakage would stop it from being usable anyway.
I'm pondering static linking GTKmm as well, again because there's no stable C++ ABI on Linux yet. The DSO is only about 700k, and in this case I'm not sure the effort of having separate packages, C++ ABI detection, rpath mangling and so on is really worth it.
thanks -mike
Mike Hearn wrote:
I'm pondering static linking GTKmm as well, again because there's no stable C++ ABI on Linux yet.
I thought that in general this was a C++ design choice.
If everything was built with gcc, then all might be better, but since different compilers from different vendors (Sun, gcc, MSVC, Borland...) come into play, C++ was intended to be incompatible in this regard. That's long been a reason for favoring C API's over C++.
Have they finally cleaned up that mess, especially on Windows?
On Mon, 30 Aug 2004, Jon A. Cruz wrote:
If everything was built with gcc, then all might be better, but since different compilers from different vendors (Sun, gcc, MSVC, Borland...) come into play, C++ was intended to be incompatible in this regard. That's long been a reason for favoring C API's over C++.
Well, there is finally an official C++ ABI, and gcc decided to adopt it.
The problem is that they are apparently approaching it asymptotically.
-mental
participants (6): Bryce Harrington, John Cliff, Jon A. Cruz, MenTaLguY, Mike Hearn, Sudhan