Implementing an overly simple mark-sweep garbage collector (GC) for C.
The GC is implemented as a C library that allows memory allocations
in a specific code region to be managed and automatically freed.
A GC'ed region is marked by a call to gc_begin()
and ends with
a call to gc_end()
. All managed allocations within the GC'ed region
must use gc_malloc()
instead of malloc()
. A call to gc_collect()
ask the GC to free all unused allocations.
The system stack is a contigous block of memory allocated by the system to store temporary data to support the program execution. A stack-pointer keeps track of the top of the stack. Data are added or removed from the top of the stack.
This GC implementation only looks at the stack for life variables.
C uses the system stack to store some local variables and some call
arguments and return values.
We highlighted "some" because not all variables are located in the stack.
For speed, C keeps actively used variables in machine registers and will
only put them in the stack when it runs out of register space.
The same is true for call arguments and return values.
One can write assembly code to force all variables into the stack by manually
saving each registers. But for our GC, we simply call setjmp()
to store
all the register values into the stack. This trick seems to work for now given
that the jmp_buf
is just a C struct that contains a field for each registers
and probably a bit more, but this relies on OS implementation detail.
The alternative is to handwrite some assembly code to store all the registers.
The heap is where dynamic allocated memory are located.
The purpose of the GC is to free up unused allocation in the heap.
Textbooks like simplify things and just say that the heap starts from the
opposite end of the memory space from the stack and grows towards the stack.
In practice, it really depends on the OS and the language.
We will just use it as a black-box and add my own structure on top of
the whatever malloc()
implementation we am given. The gc_malloc
will
allocate extra header space to store the following extra information:
note: In practice, implementing an efficient heap is the hardest part for implementating a GC'ed language.
Global variables (or statically allocated variables) are located in the BSS segment. For simplicity, they are ignored.
The mark phase scans the stack for any life objects, which are represented as pointers on the stack. In the machine level, pointers are just numeric address and is no different from other integer of the same bit size. On a 64-bit arch, we look for 64-bit sized and aligned numbers of the stack and match it against any known allocation inside our heap. Since any value that resembles a valid heap pointer is marked, this GC is conservative. (A GC that knows exactly what is a heap pointer is accurate.) We implement an naive approach that will scan each the heap (using our linked-list) for each possible pointer on the stack.
Recall that this GC is only enabled between gc_begin()
and gc_end()
.
We will only stack the stack address range between the current stack location
and the point when gc_begin()
was called. To read the stack pointer
register, we use the following inline assembly:
void* sp;
asm ("movq %%rsp, %0" : "=r" (sp) );
Inside gc_begin()
, we need to know the stack location before the call
to the function but not the stack location inside the function.
To do that, we need to know that the base pointer for AMD64 points to the
stack memory when the previous stack-pointer is preserved.
Reading from the base pointer (RBP) will get what we need.
For other architecture, check the platform ABI.
During the stack scan, every value that matches a heap allocation is marked.
By marking, we set a bit in the linked-list pointer in the header of the
corresponding allocation.
This is a trick to keep the overhead low by using the lowest bit of the pointer
value to store our mark bit.
This is possible because the the malloc
implementation will always return
memory on even address. (This should be true on all architectures.)
Once the stack is scanned, we repeat the process on all interior memory of each marked heap allocation for reference to other heap allocations.
The sweep phase is relatively simple. We will loop through our heap allocations (using the heap linked-list) and free all unmarked heap allocations.
A Makefile is provided.
The garbage collector is in the "gc.c" and "gc.h" files.
The "test.c" file is the test. Run it with valgrind to verify that the GC worked.
The GC presented here is not a complete nor fast implementation. This solely serves as a simplified study material for those interested in how a GC works.
The code is only tested on AMD64 architecture.