The Buddy Allocator
Explaining memory management can be tricky, so I've created this document to break down the algorithm: the Buddy Allocator.
What is the Buddy Allocator?
The Buddy Allocator is a memory management algorithm that takes charge of a large, contiguous region of physical memory. It acts like a librarian for your computer's RAM, handing out blocks of memory when requested and tidying them up when they're returned. As a core kernel component, it works directly with physical addresses, the actual hardware addresses of your RAM.
Why "Buddy"?
The algorithm gets its name from how it manages memory. It works on a simple principle: to get a small block of memory, you take a large block and split it in half. These two halves are now "buddies." This splitting process continues until a block of the perfect size is created.
When a block is freed, the allocator checks to see if its buddy is also free. If it is, they are instantly merged back into their original, larger block. This buddy system is a clever way to combat memory fragmentation, ensuring that large, contiguous blocks of memory are available when needed.
The Struct
Our Buddy Allocator has the following two structs:
#define PAGE_SIZE 4096 // The size of a single page, the smallest allocation unit.
#define MAX_ORDER 32 // The theoretical maximum order. The actual max is calculated at runtime.
#define MIN_ORDER 0 // The smallest order, representing a single page (2^0 pages).
// A node in a free list, stored within the free block itself.
typedef struct buddy_node {
struct buddy_node *next;
} buddy_node_t;
typedef struct {
uintptr_t base; // Starting physical address of the managed memory.
size_t size; // Total size of the managed memory in bytes.
buddy_node_t *free_lists[MAX_ORDER + 1]; // Array of free lists, indexed by block order.
uint8_t *map; // Bitmap to track block states for efficient merging.
size_t map_size; // Size of the state bitmap in bytes.
uint8_t max_order; // Actual max order, calculated from the total memory size.
} buddy_allocator_t;
My kernel creates a pre-defined buddy struct in the kernel that will manage the whole available physical memory.
The Math
The buddy allocator doesn't work with page counts directly. Instead, it uses a concept called order to manage block sizes. The order is simply the exponent in a power-of-two calculation.
The number of pages in a block is calculated as \(2^{\text{order}}\).
- Order 0: \(2^0 = 1\) page
- Order 1: \(2^1 = 2\) pages
- Order 2: \(2^2 = 4\) pages
- Order 3: \(2^3 = 8\) pages
To find the maximum order for a given amount of memory, we use the inverse of this formula: the base-2 logarithm.
$$\text{max_order} = \lfloor \log_2(\text{total_pages}) \rfloor$$
An example
Let's say we have 4 MiB of available memory and a page size of 4 KiB (4096 bytes).
-
Calculate Total Pages: $$\text{total_pages} = \frac{4 \text{ MiB}}{4 \text{ KiB}} = \frac{4 \times 1024 \times 1024}{4096} = \frac{4,194,304}{4096} = 1024 \text{ pages}$$
-
Calculate the Order: $$\text{max_order} = \lfloor \log_2(1024) \rfloor = 10$$
This tells us that our 4 MiB memory region can be managed as a single, top-level block of order 10.
On intialisation
Each variable in the struct needs to be initialised. The memory layout will be as follows:
Low Address
------------------------------------
| buddy_allocator_t struct |
|----------------------------------|
| Bitmap | <-- Metadata
|----------------------------------|
| (Optional padding for alignment) |
|----------------------------------|
| Managed Memory Region | <-- buddy_allocator_t->base points here
| ... |
| ... |
------------------------------------
High Address
As you can see, the struct and bitmap of the Buddy Allocator are located
before the usable memory region that the buddy allocator will use.
The start of this memory region is represented by base
.
The Bitmap
At the heart of my buddy allocator is a simple but powerful data structure:
the bitmap. While the free_lists
tell us if a block of a certain size
is available, the bitmap is important for managing the memory space efficiently,
especially when freeing memory.
The bitmap's primary job is to track the state of memory blocks to make merging buddies incredibly fast. Every bit in the map corresponds to a single page of physical memory.
- A
0
means the page is part of a larger, free block. - A
1
means the page is part of a block that is currently allocated or has been split into smaller chunks.
Let's see it in action. Imagine we have 16 pages of memory. When nothing is allocated, our bitmap is all zeroes.
Initial State (All Free):
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Now, let's say a program requests a 4-page block (order 2). We allocate pages 4, 5, 6, and 7. The bitmap is updated to mark these pages as "in-use".
After Allocating 4 Pages:
Bitmap: [0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
Pages: 0 1 2 3 4 5 6 7 8 9 ...
^-----^
Allocated
Merging
So, what happens when we free this 4-page block? To keep our memory from becoming fragmented, we want to merge the newly freed block with its "buddy" if the buddy is also free.
The "buddy" is the adjacent block of the same size. For our block at page 4 (which is of order 2), its buddy is the block starting at page 0.
To check if we can merge, the allocator performs these steps:
- Identify the Buddy: It calculates the buddy's starting address.
- Check the Bitmap: It looks at the bitmap to check the buddy's status. It only needs to check the first bit of the buddy block (the bit for page 0).
- Make a Decision:
- If the buddy's bit is
1
, the buddy is still in use. We can't merge. The freed block is simply added to thefree_list
. - If the buddy's bit is
0
, the buddy is free. We can merge them into a single, larger 8-page block.
- If the buddy's bit is
In our case, the bit for page 0 is 0
, so we can merge.
The allocator combines the block from pages 0-3 and our newly freed block from
pages 4-7 into one large, 8-page block.
After Deallocation and Merging:
Bitmap: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
The bitmap allowed us to instantly know the state of the buddy without searching through complicated data structures. This process is repeated up the chain, allowing small blocks to merge back into the largest possible free chunks, keeping the memory ready for future allocations.
Programming the bitmap:
Programming the bitmap is straightforward. We get the start and end addresses of the available physical memory and calculate how many pages fit within that range. If 16 pages fit in the available physical memory it means we need to create a bitmap of 16 bits (2 bytes).
uintptr_t start_addr = (uintptr_t)get_next_free_addr();
uintptr_t end_addr = (uintptr_t)get_heap_end_addr();
size_t num_pages = (end_addr - start_addr) / PAGE_SIZE;
size_t map_size_needed = CEIL_DIV(num_pages, 8);
g_buddy.map_size = ALIGN(map_size_needed, sizeof(size_t));
g_buddy.map = (uint8_t *)memblock(g_buddy.map_size);
// Check if map exists
memset(g_buddy.map, 0, g_buddy.map_size);
The base & size
The base
& size
is the second section we will need to figure out.
With the help of the bitmap we can find the starting address for base
.
Ensure that base
is aligned with your PAGE_SIZE
.
Take the end_addr
and calculate how big your heap will be for the Buddy Allocator.
g_buddy.base = ALIGN((uintptr_t)g_buddy.map + g_buddy.map_size, PAGE_SIZE);
g_buddy.size = end_addr - g_buddy.base;
// Check if size is smaller than PAGE_SIZE (MIN_ORDER).
With the bitmap, base
, and size
established,
we have enough information to set up the rest of the buddy allocator.
The max order
The default max order on my kernel is 32 (4 294 967 296 bytes),
which is impossible to reach, since we are working on a x86_32
machine.
We calculate the max_order
with our current amount of memory with the following formula:
$$O_{\text{max}} = \lfloor \log_2\left(\frac{S_{\text{total}}}{S_{\text{page}}}\right) \rfloor$$
If the requested allocation is over the max_order
, it can be safely discarded.
The free_list
The free_list
is the most important part of the buddy allocator's allocation strategy.
It's structured as an array where each index corresponds to a block order (size).
Each element of the array is the head of a singly linked list containing all the
free blocks of that specific size. Since the blocks are free,
we cleverly use the memory of the block itself to store a pointer to the next free block,
requiring no extra memory.
An example
Imagine our system has just started, and all the memory is in one giant, free block.
If our maximum size is an order 9 block (\(2^9 = 512\) pages), our free_list
looks like this:
Free List: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
Order: 0 1 2 3 4 5 6 7 8 9
Now, a program asks for a tiny order 0 block (1 page). The allocator follows these steps:
- Check Order 0: It looks in
free_list[0]
and finds it empty. - Find a Bigger Block: It searches up the orders until it finds the first non-empty list, which is at order 9.
- Split the Block: The order 9 block is too big, so the allocator splits it. This creates two "buddy" blocks of order 8. One is kept to continue the process, and the other is placed in the order 8 free list.
Free List: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
Order: 0 1 2 3 4 5 6 7 8 9
-
Repeat: This process repeats. The order 8 block is split into two order 7s. One is kept, and the other is added to the
free_list[7]
. This continues all the way down until we finally create two order 0 blocks. -
Fulfill the Request: Once we have two order 0 blocks, the allocator gives one to the user and places the other on the
free_list[0]
.
After fulfilling the request for one order 0 block, our free_list
now has a
free block available at every order except order 9.
Free List: [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
Order: 0 1 2 3 4 5 6 7 8 9
This splitting mechanism ensures that even if only large blocks are free, the allocator can efficiently generate a block of any required size.
Implementation
A key challenge in initializing the buddy allocator is that the total amount of available physical memory is rarely a perfect power of two. If we have 7 pages of memory, for example, we can't create a single 8-page block.
To handle this, the initialization code "carves" the available memory into the largest possible power-of-two blocks that will fit.
The Carving Loop
We run a loop that repeatedly carves off the largest possible block from the remaining space and adds it to the appropriate free list. Let's trace it with an example of 7 pages of available memory:
-
First Iteration:
remaining
is 7 pages (28 672 bytes)- The largest power-of-two block that fits is 4 pages (order 2).
- We create a 4-page block and add it to
free_list[2]
. remaining
is now \(7 - 4 = 3\) pages.
-
Second Iteration:
remaining
is 3 pages (12 288 bytes)- The largest power-of-two block that fits is 2 pages (order 1).
- We create a 2-page block and add it to
free_list[1]
. remaining
is now \(3 - 2 = 1\) page.
-
Third Iteration:
remaining
is 1 page (4096 bytes)- The largest power-of-two block that fits is 1 page (order 0).
- We create a 1-page block and add it to
free_list[0]
. remaining
is now \(1 - 1 = 0\) pages. The loop stops here.
After the loop, our free lists will contain one block of order 2, one of order 1, and one of order 0, perfectly accounting for our 7 pages of memory.
Here is the C code that implements this logic:
// First, ensure all free lists are empty.
for (int32_t i = 0; i <= MAX_ORDER; i++) {
g_buddy.free_lists[i] = NULL;
}
size_t remaining = g_buddy.size;
uintptr_t current_addr = g_buddy.base;
// Loop until all available memory has been carved up.
while (remaining >= PAGE_SIZE) {
// Find the largest order that fits in the remaining space.
uint32_t order = floor_log2(remaining / PAGE_SIZE);
// This check is a safeguard.
if (order > g_buddy.max_order) {
order = g_buddy.max_order;
}
// Add the new block to the head of the correct free list.
buddy_node_t *block = (buddy_node_t *)current_addr;
block->next = g_buddy.free_lists[order];
g_buddy.free_lists[order] = block;
// Subtract the carved block's size and advance our address.
size_t block_size = PAGE_SIZE << order;
current_addr += block_size;
remaining -= block_size;
}
Conclusion
The Buddy Allocator, with its clever use of free_lists
, a bitmap,
and a power-of-two "order" system, stands as a simple yet powerful solution
for managing physical memory. By treating adjacent memory blocks as "buddies"
that can be split and merged, it efficiently provides memory of various sizes
while effectively combating fragmentation.