weird livarot crash

MenTaLguY

7 Sep 2004

6:28 a.m.

I've been working on some of the groundwork for the layers dialog; doing a routine test build, I discovered that Inkscape had started crashing.

Backtrace points to livarot, which I haven't touched at all...

The only part of my new code that is actually currently used is the addition of three sigc++ signals to SPReprDoc; so far as I can tell that should not cause Shape::~Shape() to crash.

Can you all review this patch (attached) and see if I'm missing something obvious? I'm at my wits' end here...

-mental

Attachments:

blah.diff.gz (application/x-gzip — 2.2 KB)
signature.asc (application/pgp-signature — 189 bytes)

bulia byak

7 Sep

8:20 a.m.

On Tue, 07 Sep 2004 02:28:34 -0400, MenTaLguY <mental@...3...> wrote:

...

I've been working on some of the groundwork for the layers dialog; doing a routine test build, I discovered that Inkscape had started crashing.

You mean, it's crashing with this patch but is OK without it?

I can't test because I get this when trying to compile with your patch:

make: *** No rule to make target `widgets/document-tree-model.cpp', needed by `widgets/document-tree-model.o'. Stop.

Anyway, all I can say is that this seems to me somewhat similar to what we had on Windows recently - misleading tracebacks, sigc connection crashes etc. Maybe some problem with boehm again. Just a guess.

Bob Jamison

1:01 p.m.

I have noticed this, also, that the last few lines of the backtrace are often incorrect, and that the cause might be elsewhere. Maybe this is the real effect of a 'stack corruption', and we are seeing it. Several times recently gdb has reported a segfault in fill-style.cpp, when I -know- that it is not there at all.

Bob

bulia byak wrote:

...

On Tue, 07 Sep 2004 02:28:34 -0400, MenTaLguY <mental@...3...> wrote:

...
I've been working on some of the groundwork for the layers dialog; doing a routine test build, I discovered that Inkscape had started crashing.

You mean, it's crashing with this patch but is OK without it?

I can't test because I get this when trying to compile with your patch:

make: *** No rule to make target `widgets/document-tree-model.cpp', needed by `widgets/document-tree-model.o'. Stop.

Anyway, all I can say is that this seems to me somewhat similar to what we had on Windows recently - misleading tracebacks, sigc connection crashes etc. Maybe some problem with boehm again. Just a guess.

MenTaLguY

8 Sep

1:18 a.m.

On Tue, 2004-09-07 at 09:01, Bob Jamison wrote:

...

I have noticed this, also, that the last few lines of the backtrace are often incorrect, and that the cause might be elsewhere. Maybe this is the real effect of a 'stack corruption', and we are seeing it. Several times recently gdb has reported a segfault in fill-style.cpp, when I -know- that it is not there at all.

I've noticed that a common thread among the various crashes is that they happen in a specifically malloc()-related context (at least the ones I've seen personally, and I think also the sigc++ crashes).

I'm wondering whether something is writing out of bounds and corrupting the malloc headers. I have tried testing for this using ElectricFence, but at least on my machine we allocate too many small blocks with EF to make it all the way through startup. (not a bug on our part, just an annoying limitation of EF)

Anyway, if the malloc data structures are getting corrupted, stack corruption might easily follow on from that.

As far as potential culprits? libgc itself should be immune to corruption as it does not try to write to memory it does not manage, but of course if its own header structures are corrupted, it could theoretically stomp on something else.

Otherwise, the only other (obvious) possibilities I can think of are (in decreasing likelihood):

1) I screwed up the definition of Inkscape::GC::Finalized, and it's not using the right value for 'this' when calling the destructor.

In this case, introducing sigc++ signals to SPReprDoc (derived from Inkscape::GC::Finalized) could well result in heap corruption when the object was destroyed, since the signal destructors try to free memory and their internal pointers would be wrong.

2) livarot is corrupting the malloc headers, and it's pure luck that we have not been caught by it before.

livarot fails to check for array indices of -1 in many places where they may occur. I've caught it reading from such locations in malloc()ed arrays; it's certain that if it ever writes to them it will corrupt the heap.

Such corruption might be hard to detect, and only result in dramatic failures with specific usage patterns (like the allocation sequence introduced by the new sigc++ signals).
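To make the failure mode in (2) concrete, here is a minimal sketch. It is not livarot code: `Edge`, `kNoEdge` and `checked_at` are hypothetical names, and the "-1 means no element" convention is an assumption for illustration. The point is only that the sentinel is rejected before it is used as an index, so nothing ever touches the bytes just before the array, which is where a malloc implementation commonly keeps its bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an element of a livarot-style array.
struct Edge {
    int start;
    int end;
};

// Assumed convention for this sketch: -1 means "no such element".
constexpr int kNoEdge = -1;

// Returns a pointer to edges[index], or nullptr when the index is the
// sentinel or otherwise out of range. Indexing a malloc()ed array with -1
// would instead touch the memory just before the block.
inline Edge *checked_at(std::vector<Edge> &edges, int index) {
    if (index < 0 || static_cast<std::size_t>(index) >= edges.size()) {
        return nullptr;
    }
    return &edges[static_cast<std::size_t>(index)];
}

// Usage: callers have to handle the "no element" case explicitly.
inline void checked_at_demo() {
    std::vector<Edge> edges{{0, 1}, {1, 2}};
    assert(checked_at(edges, kNoEdge) == nullptr);
    assert(checked_at(edges, 1)->end == 2);
}
```

A nullable return forces every call site to decide what "no element" means there, which is roughly the check that seems to be missing.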

-mental

MenTaLguY

1:34 a.m.

On Tue, 2004-09-07 at 21:18, MenTaLguY wrote:

...

I screwed up the definition of Inkscape::GC::Finalized, and it's not using the right value for 'this' when calling the destructor.

In this case, introducing sigc++ signals to SPReprDoc (derived from Inkscape::GC::Finalized) could well result in heap corruption when the object was destroyed, since the signal destructors try to free memory and their internal pointers would be wrong.

This one doesn't appear to be the problem; I set breakpoints on Inkscape::GC::Finalized::~Finalized() and SPReprDoc::~SPReprDoc(); the crash apparently occurs before either has been called.

(which I should have known; finalized garbage collected objects aren't destroyed until the idle loop)
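For anyone unfamiliar with the "wrong 'this'" hazard from hypothesis (1), a small sketch of how it can arise in general. The classes below are hypothetical and are not Inkscape's actual hierarchy; the assumption is a collector that hands a finalizer the address of the whole allocation. With multiple inheritance a base subobject need not live at that address, so the pointer has to be converted with a typed cast rather than reinterpreted.

```cpp
#include <cassert>

// Hypothetical classes; not Inkscape's actual hierarchy.
struct Payload {
    virtual ~Payload() = default;
    int data = 0;
};

struct Finalizable {
    virtual ~Finalizable() = default;
    int flags = 0;
};

struct Document : Payload, Finalizable {};

// Correct: the compiler adjusts the pointer to the base subobject.
inline Finalizable *as_finalizable(Document *doc) {
    return static_cast<Finalizable *>(doc);
}

// Address of the complete object, as an allocator or collector would see it.
inline void *allocation_address(Document *doc) {
    return static_cast<void *>(doc);
}

inline void this_adjustment_demo() {
    Document doc;
    // Two non-empty base subobjects cannot share an address, so at least
    // one of them differs from the address of the whole allocation.
    assert(static_cast<void *>(static_cast<Payload *>(&doc)) !=
           static_cast<void *>(as_finalizable(&doc)));
    // Converting back through the properly typed pointer recovers the object.
    assert(static_cast<Document *>(as_finalizable(&doc)) == &doc);
}
```

If a finalizer treated `allocation_address(doc)` as a `Finalizable *` directly, the destructor could run on the wrong bytes; that is the kind of mistake ruled out above.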

-mental

MenTaLguY

3:57 a.m.

New subject: MALLOC_CHECK_

Could those of you on a glibc-using platform please try setting the environment variable MALLOC_CHECK_ to 1 or 2 and running Inkscape?

- a value of 1 will print an error message to stderr if malloc is abused

- a value of 2 will call abort() -- the latter is more useful in the debugger probably, so you can see the backtrace
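As a rough idea of the class of mistake such a check can catch, here is a sketch of a guard byte placed just past the requested size and verified when the block is released. This is not glibc's implementation, only an illustration of the general technique; `guarded_alloc`, `guarded_free_ok` and `kGuard` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Arbitrary marker value chosen for this sketch.
constexpr unsigned char kGuard = 0xA5;

// Block layout: [size_t size][user bytes ...][one guard byte].
// The returned pointer is only aligned for size_t, which is enough for
// the byte buffers used in this demo.
inline void *guarded_alloc(std::size_t size) {
    unsigned char *raw = static_cast<unsigned char *>(
        std::malloc(sizeof(std::size_t) + size + 1));
    if (raw == nullptr) {
        return nullptr;
    }
    std::memcpy(raw, &size, sizeof size);
    raw[sizeof(std::size_t) + size] = kGuard;
    return raw + sizeof(std::size_t);
}

// Frees the block and reports whether the guard byte was still intact.
inline bool guarded_free_ok(void *user) {
    if (user == nullptr) {
        return true;
    }
    unsigned char *raw = static_cast<unsigned char *>(user) - sizeof(std::size_t);
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof size);
    const bool intact = raw[sizeof(std::size_t) + size] == kGuard;
    std::free(raw);
    return intact;
}

inline void guard_demo() {
    char *ok = static_cast<char *>(guarded_alloc(4));
    std::memset(ok, 'x', 4);  // stays inside the requested 4 bytes
    assert(guarded_free_ok(ok));

    char *bad = static_cast<char *>(guarded_alloc(4));
    std::memset(bad, 'x', 5);  // one byte too many: clobbers the guard
    assert(!guarded_free_ok(bad));
}
```

A check like this only notices an overrun that lands on the guard, and only at release time, so a clean run does not prove the heap is healthy.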

-mental

Carl Hetherington

11:23 a.m.

New subject: MALLOC_CHECK_

On Tue, 7 Sep 2004, MenTaLguY wrote:

...

Could those of you on a glibc-using platform please try setting the environment variable MALLOC_CHECK_ to 1 or 2 and running Inkscape?

a value of 1 will print an error message to stderr if malloc is abused

a value of 2 will call abort() -- the latter is more useful in the debugger probably, so you can see the backtrace

Hi,

This shows up an overwrite of a malloc()ed array in FlowRes.cpp. I'm just working on a fix now.

Is there anything else that I should test? I haven't been able to find any other problems with random clicking, drawing shapes etc.

Carl

MenTaLguY

9 Sep

4:08 a.m.

New subject: MALLOC_CHECK_

On Wed, 2004-09-08 at 07:23, Carl Hetherington wrote:

...

This shows up an overwrite of a malloc()ed array in FlowRes.cpp. I'm just working on a fix now.

Is there anything else that I should test? I haven't been able to find any other problems with random clicking, drawing shapes etc.

Different people seem to encounter different bugs. Unfortunately MALLOC_CHECK_ is not guaranteed to catch every bug.

Myself, I just fixed a handful of fairly serious bugs in my own code (missing initializations in SPRepr's copy constructor, primarily), but that doesn't appear to have solved all of the heap corruption problems.

-mental

Carl Hetherington

8 Sep

12:26 p.m.

New subject: MALLOC_CHECK_

On Tue, 7 Sep 2004, MenTaLguY wrote:

...

Could those of you on a glibc-using platform please try setting the environment variable MALLOC_CHECK_ to 1 or 2 and running Inkscape?

a value of 1 will print an error message to stderr if malloc is abused

a value of 2 will call abort() -- the latter is more useful in the debugger probably, so you can see the backtrace

OK, I've fixed two problems highlighted by this check; one in FlowRes.cpp (a buffer overrun whenever doing basically anything with a text object) and one in helper/stock-items.cpp (a bad free(), which appeared when you chose a marker).

I can't find any more, at the moment. Is there any other operation I should try?

Carl

Jon A. Cruz

2:29 p.m.

MenTaLguY wrote:

...

I'm wondering whether something is writing out of bounds and corrupting the malloc headers. I have tried testing for this using ElectricFence, but at least on my machine we allocate too many small blocks with EF to make it all the way through startup. (not a bug on our part, just an annoying limitation of EF)

Can you try valgrind? In general I've had much better luck with that. (It also catches many things that EF doesn't)

Bob Jamison

3:01 p.m.

New subject: Which font engine?

I am looking around for what might be a cause for <text> nodes not being rendered on Win32.

In FontFactory.cpp and FontInstance.cpp, there are numerous #ifdef switches between XFT and Win32. And any time we have different sets of code for different architectures, we have problems maintaining consistent functionality. And bugs that get fixed on one, often do not get fixed on the other.

Wouldn't it be better for us to drop the XFT/Win32 code, and use Freetype2? This level of abstraction means that we can let someone -else- worry about the machine dependent implementation. Or has this already been tried? Just a thought.

Bob

Carl Hetherington

3:20 p.m.

On Wed, 8 Sep 2004, Jon A. Cruz wrote:

...

MenTaLguY wrote:

...
I'm wondering whether something is writing out of bounds and corrupting the malloc headers. I have tried testing for this using ElectricFence, but at least on my machine we allocate too many small blocks with EF to make it all the way through startup. (not a bug on our part, just an annoying limitation of EF)

Can you try valgrind? In general I've had much better luck with that. (It also catches many things that EF doesn't)

valgrind is really unhappy with livarot at the moment; it hits 50k errors before the main window even comes up. But I suppose you could make a suppression profile for the very common one (a use of uninitialized data) and then look for overruns etc. that way.

Cheers

Carl

Carl Hetherington

3:57 p.m.

On Wed, 8 Sep 2004, Carl Hetherington wrote:

...

On Wed, 8 Sep 2004, Jon A. Cruz wrote:

...
MenTaLguY wrote:

...
I'm wondering whether something is writing out of bounds and corrupting the malloc headers. I have tried testing for this using ElectricFence, but at least on my machine we allocate too many small blocks with EF to make it all the way through startup. (not a bug on our part, just an annoying limitation of EF)

Can you try valgrind? In general I've had much better luck with that. (It also catches many things that EF doesn't)

valgrind is really unhappy with livarot at the moment; it hits 50k errors before the main window even comes up. But I suppose you could make a suppression profile for the very common one (a use of uninitialized data) and then look for overruns etc. that way.

I just had a fiddle with this, and the errors that valgrind finds seem ONLY to be those relating to uninitialized memory locations; I couldn't find any heap overwrites. But I only had a quick go; valgrind is very slow on my machine.

Cheers

Carl

MenTaLguY

9 Sep

4:21 a.m.

On Wed, 2004-09-08 at 11:57, Carl Hetherington wrote:

...

I just had a fiddle with this, and the errors that valgrind finds seem ONLY to be those relating to uninitialized memory locations; I couldn't find any heap overwrites. But I only had a quick go; valgrind is very slow on my machine.

Got one:

==2132== Invalid write of size 4
==2132==    at 0x8123898: prefs_get_recent_files() (prefs-utils.cpp:179)
==2132==    by 0x811EE5E: sp_menu_append_recent_documents(_GtkWidget*, SPView*) (interface.cpp:640)
==2132==    by 0x811E597: sp_ui_menu_append_submenu(_GtkMenu*, SPView*, void (*)(_GtkWidget*, SPView*), char const*, char const*, char const*) (interface.cpp:481)
==2132==    by 0x811F076: sp_ui_file_menu(_GtkMenu*, SPDocument*, SPView*) (interface.cpp:678)
==2132==  Address 0x1C69CC8C is 0 bytes after a block of size 4 alloc'd
==2132==    at 0x1B904EDD: malloc (vg_replace_malloc.c:131)
==2132==    by 0x1C3530D6: g_malloc (in /usr/lib/libglib-2.0.so.0.400.2)
==2132==    by 0x812382E: prefs_get_recent_files() (prefs-utils.cpp:169)
==2132==    by 0x811EE5E: sp_menu_append_recent_documents(_GtkWidget*, SPView*) (interface.cpp:640)

This one is my fault, as I broke sp_repr_n_children() :/

Should be fixed in CVS shortly.
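The trace looks consistent with a buffer that is sized from a child count and then filled by walking the children: if the count under-reports, the fill loop writes past the end of the block. That reading is an assumption drawn from the trace, not taken from prefs-utils.cpp. A minimal sketch of a pattern that does not depend on the count being right, with hypothetical names (`Node`, `count_children`, `collect_names`):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal node; not Inkscape's SPRepr.
struct Node {
    std::string name;
    Node *next = nullptr;  // next sibling, or nullptr at the end
};

// Counts siblings by walking the whole list.
inline std::size_t count_children(const Node *first) {
    std::size_t n = 0;
    for (const Node *c = first; c != nullptr; c = c->next) {
        ++n;
    }
    return n;
}

// Collects names into a container that grows as needed, so a wrong count
// can cost a reallocation but can never cause a write past the buffer.
inline std::vector<std::string> collect_names(const Node *first) {
    std::vector<std::string> names;
    names.reserve(count_children(first));  // a hint only, not a bound
    for (const Node *c = first; c != nullptr; c = c->next) {
        names.push_back(c->name);
    }
    return names;
}

inline void collect_names_demo() {
    Node b;
    b.name = "b.svg";
    Node a;
    a.name = "a.svg";
    a.next = &b;
    assert(count_children(&a) == 2);
    assert(collect_names(&a).size() == 2);
}
```

With a raw array sized once from the count, the same loop would have nowhere safe to put the extra entries.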

-mental

MenTaLguY

4:29 a.m.

On Thu, 2004-09-09 at 00:21, MenTaLguY wrote:

...

This one is my fault, as I broke sp_repr_n_children() :/

Should be fixed in CVS shortly.

Actually the bug turned out to have been a change local to my tree. Whoops ^^;

At least I know why the patch I posted earlier was breaking things (it was due to the sp_repr_n_children breakage).

Now that this is sorted (everything seems stable) I'm going to go ahead and commit the patch.

-mental

MenTaLguY

8 Sep

3:59 p.m.

On Wed, 8 Sep 2004, Jon A. Cruz wrote:

...

Can you try valgrind? In general I've had much better luck with that. (It also catches many things that EF doesn't)

livarot sets valgrind screaming, but previous attempts to fix livarot had broken it (more) horribly and we had to revert the changes.

The livarot warnings are mostly harmless (uninitialized data which is copied, but never actually used), but they drown out anything else that might be going on.

When I attempted to fix that, I did eventually manage to get rid of the uninitialized value warnings, but livarot had also stopped working. I don't know if the remainder of the valgrind warnings (e.g. out of bounds array accesses) at that point were my fault or fred's.

I am trying to fix it again, taking a more gradual approach this time.

-mental

Jon A. Cruz

9 Sep

3:11 p.m.

MenTaLguY wrote:

...

When I attempted to fix that, I did eventually manage to get rid of the uninitialized value warnings, but livarot had also stopped working. I don't know if the remainder of the valgrind warnings (e.g. out of bounds array accesses) at that point were my fault or fred's.

Ooooh. That's possibly bad.

(BTW, this is for the benefit of others, Mental is pretty up on these things)

The most likely thing is some accidental change that broke things.

However... in clearing up warnings of that type, it's possible that some buggy behavior was depended on, and cleaning things up properly "broke" some work-around that was elsewhere. Or just some sloppy code that just accidentally worked.

...

I am trying to fix it again, taking a more gradual approach this time.

Always good.

Of course, getting everyone possible to join in on "zero-warnings" is a good goal. Sometimes we have to be realistic about hitting it, but keeping it in mind helps get there slow and steady.

MenTaLguY

8 Sep

12:51 a.m.

On Tue, 2004-09-07 at 04:20, bulia byak wrote:

...

On Tue, 07 Sep 2004 02:28:34 -0400, MenTaLguY <mental@...3...> wrote:

...
I've been working on some of the groundwork for the layers dialog; doing a routine test build, I discovered that Inkscape had started crashing.

You mean, it's crashing with this patch but is OK without it?

Yes. At least for me.

...

I can't test because I get this when trying to compile with your patch:

make: *** No rule to make target `widgets/document-tree-model.cpp', needed by `widgets/document-tree-model.o'. Stop.

I shouldn't have included the changes to src/widgets/Makefile_insert; you can safely omit them as the code in the missing files is not used by anything yet.

...

Anyway, all I can say is that this seems to me somewhat similar to what we had on Windows recently - misleading tracebacks, sigc connection crashes etc. Maybe some problem with boehm again. Just a guess.

I hope not. If it is a boehm issue we need to ask what we're doing differently than the other projects which have been using it without problems (gcc, for example).

-mental
