This is the Lisp-2 mark-compact algorithm shuffled around [0]. The Lisp-2 algorithm doesn't need semispaces, because if you compute forwarding pointers and rewrite references before moving objects, you can compact in the one space just fine. The next field of the GC header isn't necessary as the next object will always be immediately after when bump allocating [1], and rewrite_nested doesn't need to recursively call itself, since the heap traversal will fix references in all objects anyway.
About "Since allocations are expensive due to them requiring a system interaction via syscalls" and "For this only bump allocation is an option, simply because syscalls are slow, multiple for many small objects are even slower and hot paths would explode" there's no reason that a free-list allocator has to do more syscalls than a bump allocator; any malloc under the sun is going to have at least one layer of caching before it makes a syscall for more memory. You could do a doubling scheme like with the bump allocator, or allocate a lot at a time; mimalloc for example requests segments of 4MiB from the kernel [2].
This is the Lisp-2 mark-compact algorithm shuffled around [0]. The Lisp-2 algorithm doesn't need semispaces, because if you compute forwarding pointers and rewrite references before moving objects, you can compact in the one space just fine. The next field of the GC header isn't necessary as the next object will always be immediately after when bump allocating [1], and rewrite_nested doesn't need to recursively call itself, since the heap traversal will fix references in all objects anyway.
About "Since allocations are expensive due to them requiring a system interaction via syscalls" and "For this only bump allocation is an option, simply because syscalls are slow, multiple for many small objects are even slower and hot paths would explode" there's no reason that a free-list allocator has to do more syscalls than a bump allocator; any malloc under the sun is going to have at least one layer of caching before it makes a syscall for more memory. You could do a doubling scheme like with the bump allocator, or allocate a lot at a time; mimalloc for example requests segments of 4MiB from the kernel [2].
[0] It's approximately steps #1 and #3 fused together, then #2 in https://en.wikipedia.org/wiki/Mark%E2%80%93compact_algorithm...
[1] Something like for (void *object = segment->start; object < segment->end; object += size_in_words(object))
[2] https://www.microsoft.com/en-us/research/wp-content/uploads/...