Java Garbage Collector algorithms
We all know what a garbage collector algorithm is. But have you ever asked how this algorithm really work ? Are there multiple algorithms from which we can choose ? If yes, how do we make sure we use the right algorithm for our application ? We will go through a few implementation and we will see what algorithm fits the best for our use case.
But, let’s start first discussing about memory.
When our application runs it needs access to the computer memory to store, for example, the object that we create.
In Java, the memory is split in two areas: The Stack and The Heap
What are these areas you may ask. Let’s take each of them separately.
The Stack is used by Java to store primitive values and references to the objects stored in the heap. The Stack is a very effective data structure being managed completely by the JVM.
Each thread operates on its own stack, meaning that in our application we don’t have only a stack, we may have multiple stacks.
How it’s data pushed and pulled to and from the stack memory ? It behaves as a stack data structure following the Last-In-First-Out (LIFO) rule. What does it mean ? It means we can push and pull data only from the top.
Always the last entry added is the first entry removed.
The heap memory is used to store the objects created by the running application. It is divided in multiple parts:
- Young Generation — here is where are stored all the new created objects. From time to time a minor garbage collector will run on this area to remove the garbage and to compact the memory.
- Old Generations — here is where long surviving objects are stored.
Long story short, every time we create a new object, the reference goes to the stack memory and the object goes to the heap memory. This is it!
Before talking about different types of GC algorithms, let’s see how the GC works in general.
A Garbage Collector algorithm consists of three steps:
Let’s take each step one by one.
The algorithms starts with the mark step. What does exactly at this step ? It goes through the stack memory and it marks all the existing references. The interesting part here is if we have more objects in memory without a reference and less references in the stack (meaning that we have more garbage to collect) the algorithm is faster.
The algorithm uses the information from the first step and it goes to the “sweep” step when it removes all the objects without a reference.
The last step is when the algorithm moves all the remaining object to a new memory area.
Let’s see the exactly steps in a more visual way. In the following picture we have the heap memory divided in 2 main parts(as we discussed): Young Generation and Old Generation.
Let’s suppose we start running the application and we create new objects.
Object a = new Object();
Object b = new Object();
Object g = new Object();
All the new created object will go directly in the Young Generation in the Eden area.
Now let’s assign null to some of our references.
b = null;
d = null;
g = null;
This means that the object that previous were referenced by this variables , now are eligible for garbage collector. Why ? Because there is no way for our application to access this objects again.
When the Eden area is full, a Minor Garbage Collector is triggered. It will do its job by following the 3 steps algorithm: mark, sweep and compact .
In the mark phase, as we discussed previously , it will mark all the references from stack.
At this point, the algorithm knows that the objects in gray have no references and are eligible to be collected.
In the next phase, the objects with no reference are removed from the memory.
Finally,in the last phase of the algorithm, all the objects are compacted to a new memory area.
So, after the first run of the minor garbage collector, the unreferenced objects are removed and the remaining objects are compacted to a new memory area, called Survivor 0. At this point, the Eden area is free and we can create new objects.
When the Eden area is full again, the minor garbage collector will run for the second time. It will start with the marking phase, then it will remove all the objects that have no reference. Please note that this time the algorithm will mark and remove also objects from the Survivor 0 area, not only Eden.
The third phase of the algorithm when the objects are compacted to a new memory consists of moving the surviving objects from Eden and Survivor 0 to Survivor 1.
While our application is running , and while we will create new and new objects the minor garbage collector will run again and again every time our Eden area is full, removing the unreferenced objects and moving all the remaining object to Survivor 1 or Survivor 0.
But what happens when we create new and new objects and the Young Generations becomes full ? Yes, you’re right. A Major Garbage Collector is triggered. What it does exactly ? It will move all the object that survived a number of Minor Garbage collector cycles, from the Young Generation to the Old Generation. The number of minor GC cycles is configurable and we can set a threshold specific for our application.
Why is the heap classified in Young and Old generation ? Because studies of real-life Java applications have shown that more than 90% of objects do not survive their first Garbage collection and the older objects that have survived a few collection cycle tend to remain alive and they have a 98% chances of surviving.
How many types of GC exists in Java ?
Java has four types of GC :
- Serial Garbage Collector
- Parallel Garbage collector
- CMS Garbage collector
- G1 Garbage collector
Let’s see each of them in detail.
Serial Garbage Collector
The serial Garbage Collector is the simplest implementation of the GC algorithm. When it runs it freezes all the other running threads of the application until garbage collection have concluded. It uses only one thread to perform the GC.
It is not recommended to be used in a real application running in production because it is very slow.
Parallel Garbage Collector
The Parallel Garbage Collector is the default Garbage Collector in Java 8 and it acts similar with the Serial Garbage Collector. It freezes all the other running threads while performing the garbage collection. The difference between these two is that the Parallel GC uses multiple threads to clean up the memory making it more performing.
This algorithm is best suited for application that can handle such pauses , like application that does not serve HTTP requests but process some data in the background.
This algorithm has two variations:
- multi threaded Minor GC with single threaded Major GC
- multi threaded Minor GC with multi threaded Major GC
The first variations is the default in Java 6 and 7 and the second variation is the default starting with Java 7 update 4 and Java 8.
CMS Garbage Collector
The CMS (Concurrent Mark Sweep) Garbage Collector algorithm runs concurrently with the application threads. It stops the world in only two situations: mark and remark. The main characteristic of this algorithm is that it has short pauses making it more suitable for applications that will be impacted by stopping all the threads.
It requires more CPU and more memory because the threads performing the GC run concurrently with the application threads.
G1 garbage collector(Garbage first)
It is the default Garbage Collector in Java 9 and it is a low pause garbage collector (similar with CMS). It became the default collector because the majority of the applications benefit from low pauses. Nobody wants long pauses when responding to http requests.
We have seen that the heap is divided in 3 parts: eden, survivor and old.
But this is not the case when talking about G1.
When using G1 as Garbage Collector the memory is divided in 2048 small regions. The size of each region depends on the heap size. Even though the memory is divided in more regions, G1 still uses the concept of eden, survivor and old by assigning a tag with one of these three to all the regions.
How does the memory look in G1 ?
Not all the regions are tagged with eden, survivor and old. And we will see why. Let’s see how G1 works.
When the JVM starts, the G1 garbage collector prepares the Eden regions.
When all the Eden regions are full, a minor GC runs, and it moves all the surviving object to new Survivor 1 regions.
When the minor GC runs for the second time, it will move all the surviving object from Eden and Survivor 1 to Survivor 2.
So, we need free space for our new created object to be moved. When the heap is 45% full then a mixed GC runs. It cleans the Eden and Survivor Area and it moves the most surviving object to the memory areas representing the Old generation.
We can configure the max pause size for this Garbage Collector using the following property –XX:MaxGCPauseMillis= (default= 200).
It means that we can set a short pause for the Garbage Collector and it will still do its job in an efficient way, because it starts scanning first the regions with the most available garbage.
When creating an application most of the developers don’t think at all about the garbage collector algorithm that fits the most for their use case. Using the right algorithm can make a difference because nobody wants all the threads to be freezed when the application is in the middle of serving http requests. On the other hand, the parallel garbage collector runs faster, so if you can handle long pauses this one may be the most appropriate for you .