Process memory defragmentation by another process

Discussion of ForwardCom instruction set and corresponding hardware and software

Moderator: agner

Kulasko
Posts: 31
Joined: 2017-11-14, 21:41:53
Location: Germany

Process memory defragmentation by another process

Post by Kulasko »

I came up with an idea for an algorithm that can reduce, and in the best case completely eliminate, memory fragmentation.

When we try to expand the memory space of a process and there is not enough free space directly before or behind the already allocated block, we would allocate a new block of memory somewhere else, then create an entry in the memory map that extends the address space to the new block, using an offset. The algorithm I came up with lets another process merge these blocks without interrupting the application process unless absolutely necessary.

The naive solution would be to interrupt the process, merge the blocks, then resume execution. This would result in a latency spike similar to a big garbage collection run in some managed programming languages, and such pauses are sometimes not tolerable. The idea I came up with uses another process to do the defragmentation, leaving the application thread running unless it accesses the block that is currently being moved. Naturally, this approach comes with higher system-level overhead.

The critical requirement for this to work is that the memory map of a running process can be atomically patched by another process. This might actually be too complicated to be feasible in a real implementation, but I find the idea interesting nonetheless.

In the following example, I assume that we expanded the high end of the process address space and now have enough free space behind the old block to move the expanded block behind it. The memory map entry for the expanded block is just over the entry for the old block, so we have no gap in our address space.

The memory block to be moved is named block A.
A memory map entry without access rights is called entry E.

- Allocate the needed space.
- Determine a transfer block size.
- Set up a custom access violation handler for the application process.
- Add entry E between both blocks in the memory map; its address is the same as that of the entry for block A.
- For each transfer block of block A, starting at the lowest address:
-> Increase the address of the memory map entry for block A by the transfer block size.
-> Copy the current transfer block of block A to its new position.
-> Increase the address of entry E by the transfer block size.
- Delete entry E and the entry for block A.
- Deallocate the old space.
- Reset the access violation handler.

If the application process accesses memory in the address range of entry E, it triggers the custom handler. This handler checks whether the accessed address is indeed in the range of entry E. If so, it simply waits until the current transfer block has finished copying, then lets the process resume execution. If not, the standard access violation handler is called. A concrete sketch of the whole sequence follows below.
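
To make the sequence concrete, here is a minimal user-space model of the copy loop in C. The MapEntry layout, the set_vstart helper and the example addresses are illustrative assumptions rather than the actual ForwardCom memory map format, and the model runs everything in one process; in a real implementation the store in set_vstart is exactly the patch that would have to be atomic.

/* Software model of the copy loop described above.  The MapEntry layout,
 * the set_vstart helper and the example addresses are illustrative
 * assumptions, not the actual ForwardCom memory map format. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

typedef struct {
    uint64_t vstart;   /* first virtual address covered by this entry */
    int64_t  offset;   /* physical address = virtual address + offset */
    int      access;   /* 0 = no access (like entry E)                */
} MapEntry;

/* In a real implementation this patch must appear atomic to the
 * application process; in this single-process model it is a plain store. */
static void set_vstart(MapEntry *e, uint64_t vstart) { e->vstart = vstart; }

/* Move block A so that it becomes physically contiguous with the old
 * block.  old_i, e_i and a_i index the entries for the old block,
 * entry E and block A; phys[] models physical memory. */
static void defragment(uint8_t *phys, MapEntry *map,
                       int old_i, int e_i, int a_i,
                       uint64_t block_a_size, uint64_t transfer_size) {
    uint64_t moved = 0;
    while (moved < block_a_size) {
        uint64_t chunk = transfer_size;
        if (chunk > block_a_size - moved)
            chunk = block_a_size - moved;
        uint64_t v = map[e_i].vstart;      /* virtual start of this chunk */

        /* 1. Shrink block A's entry from below: the chunk is now covered
         *    by entry E, so any access to it faults while it is copied.  */
        set_vstart(&map[a_i], map[a_i].vstart + chunk);

        /* 2. Copy the chunk from block A's physical location to the free
         *    space directly behind the old block's physical location.    */
        memcpy(phys + v + map[old_i].offset,   /* new physical position   */
               phys + v + map[a_i].offset,     /* old physical position   */
               chunk);

        /* 3. Move entry E up: the chunk's virtual range now falls under
         *    the old block's entry and maps to the copied data.          */
        set_vstart(&map[e_i], map[e_i].vstart + chunk);
        moved += chunk;
    }
    /* Entry E and block A's entry can now be deleted and the old
     * physical space freed (not modeled here). */
}

int main(void) {
    static uint8_t phys[1 << 16];
    /* Old block: virtual 0x0000..0x3FFF, physically identical (offset 0).
     * Block A:   virtual 0x4000..0x7FFF, physically at 0xC000..0xFFFF.   */
    memset(phys + 0xC000, 0xAB, 0x4000);
    MapEntry map[] = {
        { 0x0000, 0x0000, 1 },   /* old block                 */
        { 0x4000, 0x0000, 0 },   /* entry E, no access        */
        { 0x4000, 0x8000, 1 },   /* block A, maps to 0xC000   */
    };
    defragment(phys, map, 0, 1, 2, 0x4000, 0x1000);
    printf("physical byte at 0x4000 after the move: 0x%02X\n",
           (unsigned)phys[0x4000]);
    return 0;
}

The essential property is that between steps 1 and 3 the range in transit is unreachable, so an access can only ever see the old or the new copy, never a half-written one.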

What do you think of this?
agner
Site Admin
Posts: 177
Joined: 2017-10-15, 8:07:27
Contact:

Re: Process memory defragmentation by another process

Post by agner »

This sounds like an excellent idea, if I understand you right (I am not sure what you mean by "behind" and "over").

We can really gain a lot if we can construct a system with little or no memory fragmentation. We can get rid of the large translation lookaside buffer (TLB) and multi-level page tables. We can improve caching by having contiguous memory and we can avoid TLB misses. Let me summarize the problems we have to solve to obtain this goal:
  • Avoid DLLs and shared objects scattered around memory. I am soon finished implementing the re-linking feature that will accomplish this.
  • Reserve memory space for runtime linking of libraries and plugins if needed.
  • Let the compiler and linker calculate the maximum stack size. This will not work if we have unlimited recursive function calls.
  • Use statistics to predict the heap size. This may reduce the problem of heap overflow, but not eliminate it.
  • Garbage collection when multiple processes with different memory sizes are loaded, unloaded, and expanded randomly. Your proposal may help with this.
  • Interpreted languages and byte code languages are particularly complicated in terms of memory management. The interpreters and frameworks often consume more resources than the program itself. Why not replace interpreters and JIT compilers with a simple compiler that just compiles the code the first time it is run?
  • JavaScript running from a web page and other run-once scripts may run in a sandbox with a fixed amount of allocated memory.
  • What else? Have I forgotten something?
The first applications of ForwardCom will most likely be running only a single application where memory fragmentation is not an issue.
HubertLamontagne
Posts: 80
Joined: 2017-11-17, 21:39:51

Re: Process memory defragmentation by another process

Post by HubertLamontagne »

Question: How many active memory map entries do you think it would be electrically feasible to have on a typical chip, considering that it must do its job in 1~3 cycles? 64? More? Fewer?

Some observations:
- Garbage collected systems such as the Java JVM are actually easier to implement under ForwardCom because they recopy everything and compact everything when they garbage collect, so they can essentially be defragmented on demand, and the garbage collection pauses are expected.
- Interpreted languages (including Javascript) are hard because they do a million small allocations with variable life span, essentially because all their data structures are webs of loosely coupled sparse data. Compiling the languages doesn't help because even when compiled, their data types are still going to be webs of small variant-typed objects held together with pointers.
- Existing systems already have stack size limits statically built into programs, because large stacks are rare (and should be replaced by dynamic allocation). For instance, under Win32, the main thread stack is 1 MB (unless overridden in build options), and programs rarely have to change this.
- Thread stacks are more dicey, because they essentially have to be dynamically allocated. POSIX systems typically have lower limits (one document said 192 KB under AIX for instance, defined in pthreads.h). Win32 allocates 1 MB per thread(!), and the reason this works out in practice is that it uses aggressive demand paging (which is essentially impossible on ForwardCom); see the example below for how a fixed stack size is reserved explicitly instead.
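
To illustrate the last point, here is a minimal POSIX C example (ordinary pthreads, not ForwardCom code) of reserving a fixed per-thread stack size up front with pthread_attr_setstacksize; the 256 KiB figure is just an example value. A system without demand paging would have to rely on this kind of explicit, statically chosen reservation.

/* Minimal POSIX example of giving a thread an explicit, fixed stack size
 * up front instead of relying on demand paging to grow it lazily.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *worker(void *arg) {
    (void)arg;
    puts("worker running on a fixed 256 KiB stack");
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;
    size_t stack_size = 256 * 1024;   /* must be >= PTHREAD_STACK_MIN */
    int err;

    pthread_attr_init(&attr);
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err != 0) {
        fprintf(stderr, "pthread_attr_setstacksize: error %d\n", err);
        return EXIT_FAILURE;
    }
    err = pthread_create(&tid, &attr, worker, NULL);
    if (err != 0) {
        fprintf(stderr, "pthread_create: error %d\n", err);
        return EXIT_FAILURE;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return EXIT_SUCCESS;
}
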
agner
Site Admin
Posts: 177
Joined: 2017-10-15, 8:07:27
Contact:

Re: Process memory defragmentation by another process

Post by agner »

Hubert wrote:
How many active memory map entries do you think it would be electrically feasible to have on a typical chip?
Each entry marking a boundary between memory sections will need one comparator, so the silicon requirement will be somewhat similar to a content-addressable memory. Such memories can be quite big, but I don't think the memory map needs to be big. 64 entries sounds like enough for most applications.
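
A rough software model of that lookup, with 64 entries and purely illustrative boundary values: each entry contributes one comparison against the address, which hardware would evaluate in parallel (plus a priority encoder) rather than in a loop.

/* Software model of the lookup the hardware would do with one comparator
 * per memory map entry: each entry marks a section boundary, and the
 * covering section is the last boundary <= the address.  The 64-entry
 * limit and the boundary values are illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define MAP_ENTRIES 64

/* Section boundaries, sorted ascending.  A real entry would also carry
 * access rights and an address offset. */
static uint64_t boundary[MAP_ENTRIES] = { 0x0000, 0x4000, 0x8000, 0xC000 };
static int      used = 4;

static int find_section(uint64_t addr) {
    int hit = -1;                       /* -1: below the first boundary  */
    for (int i = 0; i < used; i++)      /* one comparison per comparator */
        if (boundary[i] <= addr)
            hit = i;
    return hit;
}

int main(void) {
    printf("address 0x5000 falls in section %d\n", find_section(0x5000));
    return 0;
}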