Mesh: Compacting Memory Management for C/C++ Applications

Source: PLDI 2019 paper — Mesh: Compacting Memory Management for C/C++ Applications

Implementation: Github

I am not the author, this post is just a quick reading summary for the paper.

For the readers who has a bit knowledge of System Programming.

Introduction

Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications.

Mesh generally matches the runtime performance of state-of-the-art memory allocators while reducing memory consumption; in particular, it reduces the memory of consumption of Firefox by 16% and Redis by 39%.

Background

malloc()

malloc() is a user-space interface for C/C++ programmers to allocate memories from heap, and it requires users to “free” it manually.

glibc malloc() ← ptmalloc()

In user-space, it uses a free-list to manage the allocated memory chunks. Memory chunks are added to free-list when the user invokes free(), but the chunks will not be freed immediately. If neighbors of the chunk can be merged, it may be merged for reducing the fragmentation. If the required memory size cannot be found in the free-list, it uses brk()/sbrk() to acquire desired memory from kernel.

For more details in chinese.

Other malloc() replacement

  • tcmalloc() from Google
  • jemalloc() from Facebook

Programming Language

Some languages features compacting garbage collection, such as Java, LISP. Other languages like Rust and Go currently does not support compacting garbage collection, and the authors of Mesh claim they may try to integrate into it.

Key Ideas

Mesh

The key idea: Mesh

Random Allocation

A key threat to meshing is that pages could contain objects at the same offset, preventing them from being meshed. In the worst case, all spans would have only one allocated object, each at the same offset, making them non-meshable. Mesh employs randomized allocation to make this worst-case behavior exceedingly unlikely. It allocates objects uniformly at random across all available offsets in a span. As a result, the probability that all objects will occupy the same offset is (1/b)^(n−1), where b is the number of objects in a span (page), and n is the number of spans.

Implementation

  • Mesh = MiniHeaps + Shuffle Vectors + Thread Local Heaps + Global Heap + Meshing
  • MiniHeaps

MiniHeaps manage allocated physical spans of memory and are either attached or detached. An attached MiniHeap is owned by a specific thread-local heap, while a detached MiniHeap is only referenced through the global heap. New small objects are only allocated out of attached MiniHeaps. Each MiniHeap contains metadata that comprises span length, object size, allocation bitmap, and the start addresses of any virtual spans meshed to a unique physical span.

  • Shuffle Vectors
  • Thread Local Heaps

All malloc and free requests from an application start at the thread’s local heap. Allocation requests are handled differently depending on the size of the allocation. If an allocation request is larger than 16K, it is forwarded to the global heap for fulfillment. Allocation requests 16K and smaller are small object allocations and are handled directly by the shuffle vector for the size class corresponding to the allocation request. If the shuffle vector is empty, it is refilled by requesting an appropriately sized MiniHeap from the global
heap.

  • Global Heap

The global heap allocates MiniHeaps for thread-local heaps, handles all large object allocations, performs non-local frees for both small and large objects, and coordinates meshing.

  • Meshing

Meshing is rate limited by a configurable parameter, settable at program startup and during runtime by the application through the semi-standard mallctl API. The default rate meshes at most once every tenth of a second. If the last meshing freed less than one MB of heap space, the timer is not restarted until a subsequent allocation is freed through the global heap. This approach ensures that Mesh does not waste time searching for meshes when the application and heap are in a steady state.

The concept just like the Figure 1 as above.

Related OS APIs in the implementation

  • mmap
  • fallocate (FALLOC_FL_PUNCH_HOLE)
  • mallctl
  • memfd_create
  • mprotect

Evaluation

For a number of memory-intensive applications, including aggressively space-optimized applications like Firefox, Mesh can substantially reduce memory consumption (by 16% to 39%) while imposing a modest impact on runtime performance (e.g., around 1% for Firefox and SPECint 2006).

Conclusion

Mesh, a memory allocator that efficiently performs compaction without relocation to save memory for unmanaged languages.

This paper shows analytically that Mesh provably avoids catastrophic memory fragmentation with high probability, and empirically show that Mesh can substantially reduce memory fragmentation for memory intensive applications written in C/C++ with low runtime overhead.

--

--

A system software engineer from Taiwan. The posts are about my daily reading and polishing my English skills.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
張家榮 Tiba Chang

A system software engineer from Taiwan. The posts are about my daily reading and polishing my English skills.