Garbage collection is an important aspect of the JVM, and any Java developer should have at least a basic understanding of it.
Garbage collection (GC) fundamentally just means that the runtime automatically deallocates dynamically allocated memory when it is no longer needed, as opposed to explicit deallocation in code.
There are two types of GC:
There exists an alternative terminology where "tracing GC" is just called "GC" and "reference counting GC" is just called "reference counting". But I prefer the above definitions.
The biggest problem with reference counting is that unreachable objects can have reference counts greater than zero. If object A references object B and object B references object A, but neither A nor B is reachable, then both A and B have a reference count of one and do not get GC'ed.
Therefore a reference counting GC is often supplemented with an occasional tracing GC to get rid of these objects.
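The cycle problem can be illustrated with a small toy model. This is only a sketch: the `Node` class, its explicit `refCount` field and the `reachable` helper are hypothetical names made up for the illustration and have nothing to do with how a real JVM tracks objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RefCountCycleSketch {
    // Toy object with an explicit reference count and outgoing references.
    static final class Node {
        final String name;
        int refCount = 0;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Add a reference from -> to and increment the target's count.
    static void link(Node from, Node to) {
        from.refs.add(to);
        to.refCount++;
    }

    // Tracing view: collect everything reachable from the given roots.
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (seen.add(n)) {
                work.addAll(n.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        link(a, b); // A references B
        link(b, a); // B references A
        List<Node> roots = new ArrayList<>(); // no root references A or B
        Set<Node> live = reachable(roots);
        System.out.println("A refCount = " + a.refCount + ", reachable = " + live.contains(a));
        System.out.println("B refCount = " + b.refCount + ", reachable = " + live.contains(b));
    }
}
```

Both counts stay at one even though neither object is reachable from a root, so a pure reference counting GC would keep them, while a tracing pass from the roots never visits them.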
Java uses tracing GC.
Key Java GC terms:
The GC process works at a high level like:
This means that Java GC is:
This explains why Java GC is very efficient for lots of short-lived objects.
To avoid concurrency problems, GC has to pause other work for a short period of time.
This means that even though GC is very efficient for the most common scenarios, it usually has very poor real-time characteristics: the GC pauses are often milliseconds long and happen unpredictably.
The reality is a little more complicated than what is described above.
The young generation is actually usually split into two spaces, eden and survivor, while the old generation is only one space: tenured. At the first GC, objects get moved from eden space to survivor space, and at the N'th GC objects get moved from survivor space to tenured space. And usually the survivor space actually consists of two spaces: objects get copied from eden space and the current survivor space to the next survivor space, and then the two survivor spaces switch roles.
So instead of:
it is more like:
But the high level perspective is usually sufficient to understand Java GC and its characteristics.
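To see which spaces a given JVM actually uses, the standard `java.lang.management` API can list the heap memory pools. This is a minimal sketch; the class name `HeapPoolsSketch` is made up, and the pool names printed depend on the JVM version and the selected GC algorithm, so they will not necessarily be called eden, survivor and tenured.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class HeapPoolsSketch {
    // Print name and current usage of every heap memory pool; returns the number of pools seen.
    static int printHeapPools() {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as metaspace and code cache
            }
            MemoryUsage usage = pool.getUsage();
            long usedKb = (usage == null) ? 0 : usage.getUsed() / 1024;
            System.out.printf("%-30s used = %d KB%n", pool.getName(), usedKb);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        printHeapPools();
    }
}
```

Running it with different GC options is a quick way to check how the heap is divided for each algorithm.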
There are lots of -XX: options to control the behavior of Java GC.
First there is:
-XX:+UseWhateverGC
to select the 'Whatever' GC algorithm.
But there are many others. Some are general, covering many or all GC algorithms. Some are specific to a certain GC algorithm.
For an intro to these options, see the HotSpot Virtual Machine Garbage Collection Tuning Guide.
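When experimenting with these options it can be useful to verify what the running JVM actually picked up. The sketch below uses the standard management beans to list the active collectors and the -XX: options passed on the command line; the class name `ActiveGcSketch` and its helper methods are made up for the illustration, and the collector names reported vary between JVM versions and vendors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class ActiveGcSketch {
    // Names of the collectors the running JVM reports.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    // The -XX: options that were passed to this JVM.
    static List<String> xxOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.startsWith("-XX:"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Collectors:  " + collectorNames());
        System.out.println("-XX options: " + xxOptions());
    }
}
```

Started without any -XX: options, it shows which collector the JVM selected by default on the machine in question.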
Java has over time supported many different GC algorithms with different characteristics.
Serial GC does GC single threaded.
Usage: small data or single core CPU.
Parallel GC does GC multi threaded.
Usage: high throughput more important than small pauses.
Incremental GC (Train GC) is an early attempt at small pause GC.
Usage: none.
CMS (Concurrent Mark and Sweep) GC collects the old generation mostly concurrently with the application and only pauses the application briefly during that work. The young generation is still collected with a stop-the-world pause.
Usage: small pauses required on older Java.
Note that there are a ton of -XX options to control how CMS GC behaves. Back in the late 00's and early 10's, Java EE server gurus spent lots of hours optimizing those options.
G1 (Garbage First) GC works similarly to CMS, but instead of processing all of the heap at once it partitions the heap into regions and processes them individually.
Usage: small pauses required on newer Java.
Shenandoah GC works similarly to G1 GC but has shorter pauses.
Usage: small pauses required on newer Java.
Epsilon GC is a pseudo GC as it does not do any GC.
Usage: short runs with no pauses.
Z GC works similarly to G1 GC but has very short pauses.
Usage: very small pauses required or very large heaps.
Available GC and default GC have changed a lot over the lifetime of Java:
| GC | Java 1.1 | Java 1.2 | Java 1.3 | Java 1.4 | Java 5 | Java 6 | Java 7 | Java 8 | Java 11 | Java 17 | Java 21 | Java 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Serial GC | default | default | default | default | default/client, -XX:+UseSerialGC/server | default/client, -XX:+UseSerialGC/server | default/client, -XX:+UseSerialGC/server | default/client, -XX:+UseSerialGC/server | default/client, -XX:+UseSerialGC/server | default/client, -XX:+UseSerialGC/server | -XX:+UseSerialGC | -XX:+UseSerialGC |
| Parallel GC | N/A | N/A | N/A | N/A | -XX:+UseParallelGC/client, default/server | -XX:+UseParallelGC/client, default/server | -XX:+UseParallelGC/client, default/server | -XX:+UseParallelGC/client, default/server | -XX:+UseParallelGC | -XX:+UseParallelGC | -XX:+UseParallelGC | -XX:+UseParallelGC |
| Incremental GC | N/A | N/A | -XX:+UseTrainGC (experimental) | -XX:+UseTrainGC (experimental) | -XX:+UseTrainGC (experimental) | -XX:+UseTrainGC (experimental) | N/A | N/A | N/A | N/A | N/A | N/A |
| CMS GC | N/A | N/A | N/A | -XX:+UseConcMarkSweepGC | -XX:+UseConcMarkSweepGC | -XX:+UseConcMarkSweepGC | -XX:+UseConcMarkSweepGC | -XX:+UseConcMarkSweepGC | -XX:+UseConcMarkSweepGC (deprecated) | N/A | N/A | N/A |
| G1 GC | N/A | N/A | N/A | N/A | N/A | N/A | -XX:+UseG1GC | -XX:+UseG1GC | -XX:+UseG1GC/client, default/server | -XX:+UseG1GC/client, default/server | default | default |
| Shenandoah GC | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | -XX:+UseShenandoahGC (only in OpenJDK builds, not in Oracle builds) | -XX:+UseShenandoahGC (only in OpenJDK builds, not in Oracle builds) | -XX:+UseShenandoahGC (only in OpenJDK builds, not in Oracle builds) | -XX:+UseShenandoahGC (only in OpenJDK builds, not in Oracle builds) |
| Epsilon GC | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | -XX:+UseEpsilonGC (experimental) | -XX:+UseEpsilonGC (experimental) | -XX:+UseEpsilonGC (experimental) | -XX:+UseEpsilonGC (experimental) |
| Z GC | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | -XX:+UnlockExperimentalVMOptions -XX:+UseZGC (experimental) | -XX:+UseZGC | -XX:+UseZGC | -XX:+UseZGC |
Or:
| Java version | available GC |
|---|---|
| 1.1 | Serial GC (default) |
| 1.2 | Serial GC (default) |
| 1.3 | Serial GC (default), Incremental GC |
| 1.4 | Serial GC (default), Incremental GC, CMS GC |
| 5 | Serial GC (default/client), Parallel GC (default/server), Incremental GC, CMS GC |
| 6 | Serial GC (default/client), Parallel GC (default/server), Incremental GC, CMS GC |
| 7 | Serial GC (default/client), Parallel GC (default/server), CMS GC, G1 GC |
| 8 | Serial GC (default/client), Parallel GC (default/server), CMS GC, G1 GC |
| 11 | Serial GC (default/client), Parallel GC, CMS GC, G1 GC (default/server), Shenandoah GC, Epsilon GC, Z GC |
| 17 | Serial GC (default/client), Parallel GC, G1 GC (default/server), Shenandoah GC, Epsilon GC, Z GC |
| 21 | Serial GC, Parallel GC, G1 GC (default), Shenandoah GC, Epsilon GC, Z GC |
| 25 | Serial GC, Parallel GC, G1 GC (default), Shenandoah GC, Epsilon GC, Z GC |
The difference between client and server is:
if number of cores < 2 or memory < 2 GB then
    client
else
    server
end if
Which means that client is rare for Java 7 and newer.
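The client/server decision can be sketched in Java. This sketch assumes the rule documented for HotSpot ergonomics, where a machine only counts as server-class when it has at least 2 processors and at least 2 GB of physical memory. The class name `VmClassSketch`, the method `isServerClass` and the 8 GB sample value are made up for the illustration; it is not the JVM's own detection code.

```java
class VmClassSketch {
    private static final long GB = 1024L * 1024L * 1024L;

    // Assumed rule: server-class means at least 2 cores AND at least 2 GB of memory.
    static boolean isServerClass(int cores, long memoryBytes) {
        return cores >= 2 && memoryBytes >= 2 * GB;
    }

    static String describe(int cores, long memoryBytes) {
        return isServerClass(cores, memoryBytes) ? "server" : "client";
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long assumedMemory = 8 * GB; // hypothetical value, replace with the real physical memory
        System.out.println(cores + " core(s), 8 GB -> " + describe(cores, assumedMemory));
        System.out.println("1 core, 8 GB -> " + describe(1, 8 * GB));
        System.out.println("4 cores, 1 GB -> " + describe(4, 1 * GB));
    }
}
```

Under this assumption a small container limited to one core or to less than 2 GB would still be treated as client, which is worth keeping in mind when relying on the default GC.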
The above is based on various internet sources. Hopefully it is reasonably accurate, but there may be something that is not accurate.
Let us make a test to compare the characteristics of different GC algorithms.
Results will vary greatly with scenario. But I have come up with a test program that I consider relevant.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GCDemo {
    private static final int N1 = 100_000; // number of tasks
    private static final int N2 = 100;     // allocation rounds per task
    private static final int N3 = 1_000;   // array slots filled per round
    private static final int SIZ = 40;     // size in bytes of each allocated array
    private static List<Long> dt = new ArrayList<Long>(N1); // task durations in ns

    // One task: each round overwrites the slots, turning the previous arrays into garbage.
    private static void test() {
        long t1 = System.nanoTime();
        byte[][] a = new byte[N3][];
        for(int i = 0; i < N2; i++) {
            for(int j = 0; j < N3; j++) {
                if(a[j] != null || j % 5 != 0 || i % 10 == 0) {
                    a[j] = new byte[SIZ];
                    for(int k = 0; k < SIZ; k++) a[j][k] = (byte)k;
                }
            }
        }
        long t2 = System.nanoTime();
        // ArrayList is not thread safe, so guard the shared list
        synchronized(dt) {
            dt.add(t2 - t1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // run the tasks on 8 threads and wait for all of them to finish
        ExecutorService es = Executors.newFixedThreadPool(8);
        for(int i = 0; i < N1; i++) {
            es.submit(() -> test());
        }
        es.shutdown();
        es.awaitTermination(1, TimeUnit.HOURS);
        // summary statistics over the sorted durations
        Collections.sort(dt);
        long min = dt.get(0);
        System.out.printf("min = %d us\n", min / 1000);
        long median = dt.get(N1 / 2);
        System.out.printf("median = %d us\n", median / 1000);
        long average = dt.stream().mapToLong(v -> v.longValue()).sum() / N1;
        System.out.printf("average = %d us\n", average / 1000);
        long max = dt.get(N1 - 1);
        System.out.printf("max = %d us\n", max / 1000);
        // count tasks by how much longer than the fastest task they took: <1 ms, 1-10 ms, 10-50 ms, >=50 ms
        int r_below_1 = (int)dt.stream().filter(v -> (v - min) < 1_000_000).count();
        int r_1_10 = (int)dt.stream().filter(v -> (1_000_000 <= (v - min)) && ((v - min) < 10_000_000)).count();
        int r_10_50 = (int)dt.stream().filter(v -> (10_000_000 <= (v - min)) && ((v - min) < 50_000_000)).count();
        int r_50_up = (int)dt.stream().filter(v -> 50_000_000 <= (v - min)).count();
        System.out.printf("distribution = %d %d %d %d\n", r_below_1, r_1_10, r_10_50, r_50_up);
    }
}
The biggest problem with this program is that it only measures the time to run a task, not the actual GC time. So it does not distinguish between regular run time T1 plus stop-the-world GC pause TGC1 and degraded run time T2 plus stop-the-world GC pause TGC2, where T1 < T2, TGC1 > TGC2 and T1 + TGC1 = T2 + TGC2.
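One possible way to get a rough separate number for the GC part is to read the accumulated collection count and time from the standard `GarbageCollectorMXBean` before and after a workload. This is only a sketch and not part of the test above: the class name `GcTimeSketch` and the small stand-in workload are made up, and the reported time is the collector's accumulated elapsed time, which for concurrent collectors may include work done while the application was running, so it is not a pure pause measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcTimeSketch {
    // Accumulated collection time in ms over all collectors (undefined values of -1 are skipped).
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    // Accumulated number of collections over all collectors (undefined values of -1 are skipped).
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long n = gc.getCollectionCount();
            if (n > 0) total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        long gcMs0 = totalGcMillis();
        long gcN0 = totalGcCount();
        long t0 = System.nanoTime();
        // hypothetical stand-in workload: allocate many short-lived arrays
        byte[][] a = new byte[1_000][];
        for (int i = 0; i < 2_000_000; i++) {
            a[i % a.length] = new byte[1_000];
        }
        long wallMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.printf("wall time = %d ms%n", wallMs);
        System.out.printf("collections = %d, reported GC time = %d ms%n",
                totalGcCount() - gcN0, totalGcMillis() - gcMs0);
    }
}
```

Individual pause lengths are not visible this way; for those, the JVM's own GC logging is probably the better source.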
| memory size | average time iteration (=throughput) | longest GC pause | number long (>10 ms) GC pauses |
|---|---|---|---|
| 256M | Java 21 Parallel GC, Java 21 G1 GC, Java 8 Parallel GC, Java 21 Serial GC, Java 8 G1 GC, Java 8 CMS GC, Java 8 Serial GC, Java 21 Z GC, Java 21 generational Z GC, Java 21 Shenandoah GC | Java 21 Z GC, Java 21 Serial GC, Java 21 generational Z GC, Java 8 Serial GC, Java 21 Parallel GC, Java 21 G1 GC, Java 8 G1 GC, Java 8 Parallel GC, Java 8 CMS GC, Java 21 Shenandoah GC | Java 8 G1 GC, Java 21 G1 GC, Java 21 Parallel GC, Java 8 Parallel GC, Java 21 Serial GC, Java 8 CMS GC, Java 8 Serial GC, Java 21 Z GC, Java 21 generational Z GC, Java 21 Shenandoah GC |
| 1G | Java 21 Parallel GC, Java 21 G1 GC, Java 21 Serial GC, Java 8 Parallel GC, Java 21 Z GC, Java 21 generational Z GC, Java 8 CMS GC, Java 8 Serial GC, Java 8 G1 GC, Java 21 Shenandoah GC | Java 8 Serial GC, Java 21 Serial GC, Java 21 Z GC, Java 21 generational Z GC, Java 21 Parallel GC, Java 8 Parallel GC, Java 21 Shenandoah GC, Java 21 G1 GC, Java 8 CMS GC, Java 8 G1 GC | Java 21 G1 GC, Java 8 G1 GC, Java 8 Parallel GC, Java 21 Parallel GC, Java 21 Serial GC, Java 8 CMS GC, Java 21 Z GC, Java 21 generational Z GC, Java 8 Serial GC, Java 21 Shenandoah GC |
| 4G | Java 21 Parallel GC, Java 21 generational Z GC, Java 21 Serial GC, Java 21 Z GC, Java 8 Parallel GC, Java 8 Serial GC, Java 21 G1 GC, Java 21 Shenandoah GC, Java 8 CMS GC, Java 8 G1 GC | Java 8 Serial GC, Java 21 generational Z GC, Java 21 Parallel GC, Java 21 Z GC, Java 21 Serial GC, Java 21 Shenandoah GC, Java 21 G1 GC, Java 8 G1 GC, Java 8 Parallel GC, Java 8 CMS GC | Java 21 Z GC, Java 21 generational Z GC, Java 8 Parallel GC, Java 21 Parallel GC, Java 21 Serial GC, Java 21 G1 GC, Java 8 G1 GC, Java 8 Serial GC, Java 8 CMS GC, Java 21 Shenandoah GC |
The main conclusion must be that the best GC algorithm depends on the program, available memory, Java version and the criteria for "best".
But based on these results, the recommendations will be:
But again note: there is absolutely no guarantee that your application will have the same characteristics as my test program.
"HotSpot Virtual Machine Garbage Collection Tuning Guide" (see link in previous section) has the following recommendations:
If the performance still doesn't meet
your goals, then use the following guidelines as a starting point for selecting a collector:
• If the application has a small data set (up to approximately 100 MB), then select the serial
collector with the option -XX:+UseSerialGC.
• If the application will be run on a single processor and there are no pause-time
requirements, then select the serial collector with the option -XX:+UseSerialGC.
• If (a) peak application performance is the first priority and (b) there are no pause-time
requirements or pauses of one second or longer are acceptable, then let the VM select the
collector or select the parallel collector with -XX:+UseParallelGC.
• If response time is more important than overall throughput and garbage collection pauses
must be kept shorter, then select the mostly concurrent collector with -XX:+UseG1GC.
• If response time is a high priority, then select a fully concurrent collector with
-XX:+UseZGC -XX:+ZGenerational.
These guidelines provide only a starting point for selecting a collector because performance is
dependent on the size of the heap, the amount of live data maintained by the application, and
the number and speed of available processors.
These are somewhat compatible with my recommendations (I don't cover the memory < 100 MB and 1 CPU core cases, as they are rarely seen today).
Other notes:
| Version | Date | Description |
|---|---|---|
| 1.0 | October 25th 2025 | Initial version |
Please send comments to Arne Vajhøj