Emile Timothy
Main Page

Projects

Optimized Dynamic Memory Allocator

During the fall quarter of my sophomore year at Caltech, I took CS 24 (Computing Systems) at Caltech, which was taught by Professor Adam Blank. One of the more notorious week-long assignments in the class was to build an optimized dynamic memory allocator in C through some pretty cool techniques that I'd like to describe in this page. It took ~20 hours to implement these projects, and while it was pretty challenging, it was also incredibly rewarding and fun! So, in this page, I'll describe what a memory allocator is, and then go into details about how I optimized it by using explicit free-lists, coalescing, and block-splitting to minimize calls to the sbrk() function.


Malloc, Calloc, Free, Realloc

In the C programming language, the size of an array cannot be changed in the same way as it is changed in Python. In Python, list manipulation commands like slicing and appending easily allow you to resize arrays until they have an appropriate size. However, in C, this task is less straightforward - you have to use the stdlib.h library. The relevant functions in the stdlib.h library that allow you to resize the storage space for your data are malloc, calloc, free, and realloc. Firstly, I'm going to briefly describe these four functions in terms of their purposes, functionalities, and working mechanisms - after all, it is important to know the weaknesses of a function before embarking on the task of optimizing the function.

1) Malloc (Memory Allocation): Malloc is a function in C that dynamically allocates a large contiguous block of memory with a size that is specified by the user when they call the function. Malloc first calls a low-level system memory management function called sbrk() which finds available space in the system's heap that it can allocate to the usr. Sbrk() then returns a pointer (of type void) which can later be cast by the user program into a pointer of any other type. Malloc then returns a pointer of the memory space to the user, where the memory space is initialized with default garbage values.

ptr = (void*) malloc(size_t size);

2) Calloc (Contiguous Allocation): Calloc is a function in C that is used to dynamically allocate a user-specified number of blocks of memory of a specific type which it returns to the user. It is functionally similar to malloc() since it also uses the sbrk() function, but differs because Calloc initializes each block with a default value of 0, and because it has two parameters whereas Malloc only has one parameter.

ptr = (void*) calloc(size_t number_of_blocks, size_t size_of_each_block);

3) Free (Deallocation): Free is a function in C that is used to dynamically deallocate memory to the heap that has been previously allocated to the user. This function is necessary since memory from malloc() and calloc() do not deallocate themselves after the user has finished using them. Freeing memory after using it is an imperative coding practice since it helps mitigate memory wastage. Once the memory space is freed, the pointer can never be used again: it would be pointing to restricted space in the system which would lead to an error.

ptr = void free(void *ptr);

4) Realloc (Reallocation): Realloc is a function in C that is used to dynamically change the memory allocation of a previously allocated memory. In other words, if the memory previously allocated to the user (through malloc or calloc) is too small or too large for the user's wishes, realloc can be used to dynamically reallocate memory to the user. Realloc works by allocating (through malloc) the memory space with the size specified by the user, copying the data from the old memory space to the new memory space, freeing the pointer associated to the old memory space, and finally returning the new pointer to the user. The reallocation of memory retains as much of the existing data as possible. Specifically, if the storage space is increased from n bytes to m bytes, the last m-n new bytes will be initialized with default garbage values. Similarly, if the storage size is decreased to n bytes, the first n bytes of the initial memory space will be copied to the new memory space.

ptr = (void*) realloc(void *ptr, size_t size);



Faults in Malloc, Calloc, Free, and Realloc

To paint a picture of the current dynamic memory allocation set-up, it is worth noting that sbrk() and free() are extremely slow functions to call since they run at the system level. Let's quantiify this more precisely now. Let \(T_1(n)\) denote the time it takes for sbrk() to allocate n bytes of memory, and let \(T_2(n)\) denote the time it takes for free() to dellocate n bytes of memory. Then, the time it takes to run malloc() must be \(T_1(n)\), the time it takese to un calloc() must be \(T_1(n)+O(n)\), the time it takes to run free() must be \(T_2(n)\), and the time it takes to run realloc() for a pointer that currently points to a space of size n with the command to reshape the space to size m must be \(T_2(n)+O(n)+T_1(m)\). The bottleneck in this system is simply the fact that \(T_1(n)\) and \(T_2(n)\) are steep functions. So, calling malloc(), calloc(), realloc(), or free() whenever some memory is desired or undesired is just too expensive.

Optimization Plan
So, how can we minimize calls to sbrk() and free()? Here's the plan: if we call sbrk() to allocate memory and then free the memory, store a pointer to the storage space in a linked-list at the program-level. Then, if the user requests memory again in the future, we first check the linked-list to see if it contains pointers to any block in memory space that is atleast the size of the space that the user is asking for. If the memory space has more space than the user asks for, we segment it by splitting the block (block-splitting) and allocate the space to the user while adding the residual space as a separate block of memory in the free list. Conversely, if there is no other block of memory in the free-list, we make another call to sbrk(). Additionally, we add in some other optimizations: Every time we free some memory, we check if the blocks to its left or right are also free. If they are free, we coalesce the blocks together to make them contiguous. If not, we just add the block as a separate node on the linked-list. An additional design decision to make was how to search through the free-list to find blocks of optimal size: it was decided that a LIFO (last-in first-out) strategy would be used: this meant that blocks would be added to the head of the linked-list, and removed from its tail. Additionally, to store the sizes of the blocks, we added a header for each block, where the header stated the size of its corresponding memory block. Finally, we denote the head and tail of the free-list with blocks called the prologue and epilogue, where the pointers to the prologue and epilogue are stored as global variables.
Results
As expected, this turned out to be a strong optimization. It increased the speed and utility of the initial dynamic memory allocation by roughly 200%. One way to increase the speed and utility of the code further would be to use an alternative datastructure rather than linked-lissts: using a balanced binary search tree (BST) would allow for a faster way to find blocks of the correct size, which would avoid unnecessary block-splitting (followed by future coalescing).