Re: [Inkscape-devel] Re: C-based garbage collection for Inkscape

2 Dec 2004

On Wed, 2004-12-01 at 01:07, Keith Packard wrote:
...
...
I have to say I find the use of a "reference stack" in avoiding the need
to examine the stack or registers to be very elegant.
I was much more interested in portability than transparent use, and with 
some practice, it's not that hard to use the macros.  One nice thing is 
that forgetting to use the macros causes no harm, it just delays the 
collection of garbage collected below the function failing to invoke the 
macros.
That's part of the beauty of it -- I am very fond of intrinsically
fail-safe designs.  Sadly they're rather rare in the software world
("defensive programming" doesn't count).
...
...
The one issue that does come to mind is the problem of storing
references to collector-managed objects in malloc-managed memory.
And this is easy enough with the nickle collector -- you create a 
synthetic object and add it as another root in the system.  The 'mark' 
function then walks the malloced memory to reference the gc'able objects.
The malloc'ed storage need never know that it's being used in this fashion.
That wouldn't always be possible; in our case the referencing malloc()ed
memory is often abstracted away in e.g. a glib or sigc++ closure.  In
those instances there is generally no way for such a 'mark' function to
get at it.
The only general solution I've found so far is to create synthetic
"anchor" objects which are attached to a root and contain a refcount and
a pointer to the managed memory.  Usually the interfaces provided by the
closures or whatever are sufficient to manage a refcount.
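
A minimal sketch of the anchor idea described above. The collector's root set is only simulated here with a std::set, and the names Anchor, g_roots, anchor_ref and anchor_unref are hypothetical; none of this is Inkscape's or the boehm collector's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for the collector's root set: the collector is
// assumed to treat everything reachable from a registered anchor as live.
struct Anchor;
static std::set<Anchor*> g_roots;

// Synthetic "anchor": pins one collector-managed object for as long as
// opaque malloc-managed code (e.g. a closure) still refers to it.
struct Anchor {
    void* managed;         // pointer to the collector-managed object
    std::size_t refcount;  // number of outside holders

    explicit Anchor(void* obj) : managed(obj), refcount(1) {
        g_roots.insert(this);  // attach to the root set
    }
};

// Called when another outside holder takes a copy of the reference.
inline void anchor_ref(Anchor* a) { ++a->refcount; }

// Called from e.g. a closure's destroy notification; detaches the anchor
// from the root set once the last outside holder is gone.
inline void anchor_unref(Anchor* a) {
    if (--a->refcount == 0) {
        g_roots.erase(a);
        delete a;
    }
}

// Demonstration: encodes "roots while held" and "roots after release"
// as a two-digit number.
inline std::size_t anchor_demo() {
    int object = 42;                           // pretend this is GC-managed
    Anchor* a = new Anchor(&object);
    anchor_ref(a);                             // second holder appears
    anchor_unref(a);                           // first holder is done
    std::size_t while_held = g_roots.size();   // anchor still rooted
    anchor_unref(a);                           // last holder is done
    return while_held * 10 + g_roots.size();
}
```

The malloc-managed side only ever calls the ref/unref pair, which is the kind of hook closure interfaces usually expose.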
...
...
That would be exceptionally nice to have, especially as large RGBA image
buffers have a decent chance of looking a lot like arrays of valid
pointers in places.
heh.  One wonders how much of the GC time is spent wandering through 
non-pointer data.  Think of the cache thrashing when you hit something 
like this which isn't explicitly marked as non-pointer data...
Supposedly it isn't a big deal in practice, but I still avoided using it
as a wholesale malloc replacement for that reason.
As things stand right now, GTK and other libraries still use the system
allocator for everything.  In our own code we use either the system
allocator or GC_malloc_atomic() for strings and image data (the latter
function marks the allocated memory as containing non-pointer data).
The boehm collector does also provide a facility for typed allocation,
but with this approach we've never needed to resort to it.  It does some
clever things like blacklisting unallocated ranges while bogus pointers
or pointer-like values
...
...

implementing other allocation interfaces:
array new/delete
an STL allocator

The fundamental problem in these two cases is that no type information
is provided to the allocator -- the only information available is the
raw size of the memory requested.  Figuring out what's actually been put
in the memory and how to mark it gets painful.
For example, if I use the new[] operator to allocate an array of 32
objects of some type (which happens to have a size of 4 bytes), all the
function implementing the operator will be told is that 128 bytes were
requested.  For all it knows, the requested memory might get used for
e.g. 16 8-byte objects.
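
The point can be seen in a small sketch, assuming a hypothetical 4-byte type Small with a class-level operator new[]. The allocation function is handed a byte count and nothing else; the language also permits the request to include some array bookkeeping overhead, so the count need not be exactly 128.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Size most recently passed to Small::operator new[].
static std::size_t g_last_request = 0;

// Hypothetical 4-byte type used only for illustration.
struct Small {
    std::int32_t value;

    // The allocation function receives only a byte count: no element
    // type and no element count.
    static void* operator new[](std::size_t bytes) {
        g_last_request = bytes;
        void* p = std::malloc(bytes);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete[](void* p) noexcept { std::free(p); }
};

// Allocates 32 elements and reports what the allocator was told.
inline std::size_t request_for_32() {
    Small* xs = new Small[32];
    delete[] xs;
    return g_last_request;
}
```

Nothing in the recorded number distinguishes 32 four-byte objects from 16 eight-byte ones.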
In the case of an STL allocator, I don't think it's even guaranteed that
the objects placed in the requested memory will be homogeneously typed or
sized.  I know it's certainly possible that they will not be
homogeneously spaced nor the memory block fully used.
The simple new/delete case isn't really different, but there we could at
least assume that the memory would only contain one object and that its
address would lie at the start of the allocated block, which would let
us find a pointer to its 'mark' function easily enough.
In a nutshell, C++ is really unkind to typed allocators.
...
...

use in automatic variables

Whether stack-allocated objects are a problem depends on the approach
taken in marrying the C++ type system to the allocator's.
Some approaches, like the sketch in my last email (which has some bugs,
so don't read too much into it), place the struct bfree (at least the
type pointer; I am assuming the 'next' pointer was safe to reuse when it
wasn't in a free list) before the start of the object.  In those cases,
a stack-allocated object will be missing the bfree "header".
...
Thanks much for taking time to look at the code; the prototype C++ code 
looks a lot easier to use than the C macros.  Note that you'll still want 
to use the macros in places like:
   a () { return new a_type; }
   b () { return new b_type; }
   c () { return new c_type; }
   bar () { operate (a (), b (), c ()); }
   foo () { for (i = 0; i < 100; i++) bar (); }
Failing to use the macros in bar will result in a large pile of 
uncollectable garbage until foo returns to some higher level function 
which does use the macros.  This is probably the most annoying part of the 
whole system.
Hmm.  One of the reasons I adopted a garbage collector was to facilitate
writing in a functional style which would unfortunately be susceptible
to this sort of problem.
Also in some cases the equivalent of bar() may be a generic function
written by someone else (e.g. an STL generic algorithm).  In those cases
it might not always be appropriate to set up a reference stack frame,
and we couldn't anyway if we wanted to.
In this specific example, though, I don't believe the macros are
necessary.  I would expect this to work:
void bar() {
   MemStackFrame frame;
   operate(a(), b(), c());
}
(MemStack would be reset by MemStackFrame::~MemStackFrame() when 'frame'
goes out of scope)
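
A rough sketch of how such a frame could behave, assuming the reference stack is simply a vector of pointers that the collector treats as roots. Only the names MemStack and MemStackFrame come from the discussion; their internals here, along with push() and the stub versions of a(), b(), c() and operate(), are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference stack: the collector is assumed to treat every
// pointer stored here as a root.
class MemStack {
public:
    static std::vector<void*>& refs() {
        static std::vector<void*> stack;
        return stack;
    }
    // Record a freshly allocated object so it survives collection.
    static void* push(void* p) { refs().push_back(p); return p; }
};

// RAII frame: remembers the stack depth on entry and restores it on
// exit, dropping every reference pushed in between.
class MemStackFrame {
public:
    MemStackFrame() : saved_(MemStack::refs().size()) {}
    ~MemStackFrame() { MemStack::refs().resize(saved_); }
    MemStackFrame(const MemStackFrame&) = delete;
    MemStackFrame& operator=(const MemStackFrame&) = delete;
private:
    std::size_t saved_;
};

// Stubs standing in for functions that allocate collectable objects.
static int g_objs[3];
inline void* a() { return MemStack::push(&g_objs[0]); }
inline void* b() { return MemStack::push(&g_objs[1]); }
inline void* c() { return MemStack::push(&g_objs[2]); }
inline void operate(void*, void*, void*) {}

inline void bar() {
    MemStackFrame frame;        // references pushed below are released
    operate(a(), b(), c());     // when 'frame' goes out of scope
}

// Returns how many references are still held after 100 calls to bar().
inline std::size_t foo() {
    for (int i = 0; i < 100; i++) bar();
    return MemStack::refs().size();
}
```

Because the destructor also runs during stack unwinding, the frame is restored even when operate() throws.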
...
Oh, one thing I used to do was erase newly allocated memory.  This was a 
huge simplification as it meant you didn't need to be particularly 
careful about allocation and initialization order.  As it is, any 
potential pointer values in newly allocated data must be initialized 
before the next allocation call is made.  This turned out to be a 
non-trivial source of bugs when I stopped calling memset.  I'm not sure 
it's worth the modest performance improvements myself.
At least it's more portable.  As you probably know, at least on paper
memset() isn't a portable way to initialize pointers.
[ For those following along at home: the compiler will maintain the
invariant NULL == (void *)0 mandated by the C and C++ languages, but on
some (rare) architectures the NULL pointer is not the zero address,
requiring gymnastics on the compiler's part which would be defeated by
the use of memset(). ]
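
As a small illustration of the portable alternative, value-initialization asks the compiler for a null pointer rather than for all-zero bytes, so it stays correct whatever bit pattern the platform uses for null. Node is a hypothetical type invented for this sketch.

```cpp
#include <cassert>

// Hypothetical node type used only for illustration.
struct Node {
    Node* next;
    int value;
};

// Value-initialization: the compiler stores a genuine null pointer in
// 'next', whatever its bit pattern is on the target architecture.
inline bool value_init_gives_null() {
    Node n = Node();
    return n.next == nullptr && n.value == 0;
}
```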
In my experience another problem with the allocator zeroing memory is
social: bugs tend to creep in if folks are relying on the allocator for
implicit initialization.  I think this is partly because explicit
initialization forces people to think through the initial state of the
object, and serves something of a documentation role as well.
In Inkscape I actually fill memory allocated for some classes with
garbage before their constructor is called.  It's helped shake out some
subtle bugs that were previously flying below the radar because zero was
close to a sensible value (and pages claimed from the OS are often
initially zero-filled).
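
A minimal sketch of the garbage-filling technique, assuming a class-level operator new and an arbitrary fill byte. The names Poisoned, kPoisonByte and storage_is_poisoned are hypothetical and this is not the code Inkscape uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Arbitrary non-zero fill pattern (an assumption; any value that is
// unlikely to look sensible would do).
static const unsigned char kPoisonByte = 0xAB;

// Hypothetical base class: fills fresh storage with the pattern before
// any constructor gets to run.
struct Poisoned {
    static void* operator new(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (!p) throw std::bad_alloc();
        std::memset(p, kPoisonByte, bytes);
        return p;
    }
    static void operator delete(void* p) noexcept { std::free(p); }
};

// Inspects the raw storage handed out by the allocation function,
// before any object has been constructed in it.
inline bool storage_is_poisoned(std::size_t bytes) {
    void* raw = Poisoned::operator new(bytes);
    const unsigned char* b = static_cast<const unsigned char*>(raw);
    bool ok = true;
    for (std::size_t i = 0; i < bytes; ++i) {
        ok = ok && (b[i] == kPoisonByte);
    }
    Poisoned::operator delete(raw);
    return ok;
}
```

A class deriving from Poisoned inherits the allocation function, so a member its constructor forgets to set starts out as the fill pattern rather than a plausible zero.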
Er, forgive my curmudgeonly ranting. ^^;  I still wonder whether there
might be another good conservative solution to the
initialization/allocation ordering problem...
-mental

MenTaLguY