Java GC

Content:

  1. Introduction
  2. GC
    1. High level
    2. Details
  3. Java GC
    1. Serial GC
    2. Parallel GC
    3. Incremental GC
    4. CMS GC
    5. G1 GC
    6. Shenandoah GC
    7. Epsilon GC
    8. Z GC
  4. Java versions
  5. Test

Introduction:

Garbage collection is an important aspect of the JVM and it is relevant for any Java developer to have at least a basic understanding of garbage collection.

GC:

High level:

Garbage collection (GC) fundamentally just means that the runtime automatically deallocates dynamically allocated memory when it is no longer needed, as opposed to explicit deallocation in code.

There are two types of GC:

Tracing GC
marks reachable objects and collects objects that are no longer reachable
Reference counting GC
maintains a reference counter for each object and collects the object when the counter reaches zero

There exists an alternative terminology where "tracing GC" is just called "GC" and "reference counting GC" is just called "reference counting". But I prefer the above definitions.

The biggest problem with reference counting is that unreachable objects can have reference counts greater than zero. If object A references object B, object B references object A, and neither A nor B is reachable, then both A and B have reference count one and do not get GC'ed.

Therefore a reference counting GC is often supplemented with an occasional tracing GC to get rid of these objects.
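The cycle problem can be illustrated with a small toy model. This is only a sketch: the Node class and its hand-maintained refCount field are hypothetical and do not reflect how any JVM works internally.

```java
import java.util.ArrayList;
import java.util.List;

class RefCountCycleDemo {
    // Toy object with a hand-maintained reference count (hypothetical model).
    static class Node {
        final String name;
        int refCount;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
        // Store a reference to target and bump its counter.
        void pointTo(Node target) {
            refs.add(target);
            target.refCount++;
        }
    }

    // Builds the cycle A <-> B, drops the external references
    // and returns the counts that remain.
    static int[] countsAfterDroppingRoots() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refCount++;   // external reference held by the program
        b.refCount++;   // external reference held by the program
        a.pointTo(b);   // A references B
        b.pointTo(a);   // B references A
        a.refCount--;   // program drops its reference to A
        b.refCount--;   // program drops its reference to B
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingRoots();
        // Neither object is reachable by the program any more, yet both counts are 1.
        System.out.println("A=" + counts[0] + " B=" + counts[1]);
    }
}
```

A pure reference counting GC would never free A and B here, while a tracing GC starting from the roots would never reach them and would therefore collect both.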

Java uses tracing GC.

Key Java GC terms:

Generational GC
"young generation" heap area with young objects + "old generation" with old objects
Minor GC
GC for only young generation
Full GC
GC for both young and old generation

The GC process works at a high level like:

  1. mark heap objects that are still reachable as live
  2. move live objects from young generation to old generation
  3. reset young generation to start over
  4. deallocate non-live objects in old generation
  5. [optionally] compact old generation

This means that Java GC is:

Which explains why Java GC is very efficient for lots of short lived objects.
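The high level steps above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not how HotSpot is implemented: the class ToyGenerationalHeap, its lists and its methods are all hypothetical, the optional compaction step is omitted, and a real minor GC does not trace the entire heap.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyGenerationalHeap {
    // A toy heap object that may reference other toy objects.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    final List<Obj> roots = new ArrayList<>();
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    // New objects always start in the young generation.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    // Step 1: mark everything reachable from the roots as live.
    Set<Obj> mark() {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            Obj o = todo.pop();
            if (live.add(o)) todo.addAll(o.refs);
        }
        return live;
    }

    // Steps 2 and 3: move live young objects to old, then reset young.
    private void promote(Set<Obj> live) {
        for (Obj o : young) {
            if (live.contains(o)) old.add(o);
        }
        young.clear();
    }

    // Minor GC: only the young generation is cleaned up.
    void minorGC() {
        promote(mark());
    }

    // Full GC: additionally drop non-live objects from old (step 4).
    void fullGC() {
        Set<Obj> live = mark();
        promote(live);
        old.retainAll(live);
    }

    public static void main(String[] args) {
        ToyGenerationalHeap heap = new ToyGenerationalHeap();
        heap.roots.add(heap.allocate("keep"));
        heap.allocate("temp1");
        heap.allocate("temp2");
        heap.minorGC();
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());
        heap.roots.clear();
        heap.fullGC();
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());
    }
}
```

In this model the two temporary objects cost nothing to get rid of: the minor GC only touches the one live object and then clears the young list.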

To avoid concurrency problems, GC has to pause other work for a short period of time.

This means that even though GC is very efficient for most common scenarios, it usually has very poor real time characteristics: the GC pauses are often milliseconds long and happen unpredictably.

Details:

The reality is a little more complicated than what is described above.

The young generation is actually usually split into two spaces: eden and survivor, while the old generation is only one space: tenured. At the first GC, objects get moved from eden space to survivor space, and at the N'th GC, objects get moved from survivor space to tenured space. And usually the survivor space actually consists of two spaces. Objects get copied from eden space and the current survivor space to the next survivor space, and then the two survivor spaces switch roles.

So instead of:

[Figure: Java GC generations]

it is more like:

[Figure: Java GC spaces]

But the high level perspective is usually sufficient to understand Java GC and its characteristics.
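The spaces described above can be observed from inside a running JVM through the standard java.lang.management API. The following is a minimal sketch; the class name HeapSpacesDemo is made up for illustration, and the pool names that get printed depend on the JVM and on the selected GC, so the eden/survivor/tenured split will not show up with every collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapSpacesDemo {
    // Returns the names of all memory pools that belong to the Java heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            long usedKb = (usage == null) ? 0 : usage.getUsed() / 1024;
            System.out.printf("%-25s used = %d KB%n", pool.getName(), usedKb);
        }
    }
}
```

With Serial GC on HotSpot the heap pools are typically named Eden Space, Survivor Space and Tenured Gen, while G1 typically reports G1 Eden Space, G1 Survivor Space and G1 Old Gen.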

There are lots of -XX: options to control the behavior of Java GC.

First there is:

-XX:+UseWhateverGC

to select the 'Whatever' GC algorithm.

But there are many others. Some are general, covering many/all GC algorithms. Some are specific to a certain GC algorithm.

For an intro to these options see HotSpot Virtual Machine Garbage Collection Tuning Guide.
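To check which GC a given -XX:+Use...GC option actually selected, the JVM can be asked for its active collectors via the standard GarbageCollectorMXBean API. This is a minimal sketch; the class name ActiveGCDemo is made up for illustration, and the collector names are JVM specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveGCDemo {
    // Returns the names of the collectors the running JVM reports.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running it once with -XX:+UseSerialGC and once with -XX:+UseG1GC should print different collector names, which is a quick way to confirm that an option took effect.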

Java GC:

Java has over time supported many different GC algorithms with different characteristics.

Serial GC:

Serial GC does GC single threaded.

Usage: small data or single core CPU.

Parallel GC:

Parallel GC does GC multi threaded.

Usage: high throughput more important than small pauses.

Incremental GC:

Incremental GC (Train GC) is an early attempt at small pause GC.

Usage: none.

CMS GC:

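CMS (Concurrent Mark and Sweep) GC does most of the old generation work concurrently with the application and only pauses the application briefly during the initial mark and remark phases. Young generation collections still pause the application.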

Usage: small pauses required on older Java.

Note that there are a ton of -XX options to control how CMS GC behaves. Back in the late 00's and early 10's, Java EE server gurus spent lots of hours optimizing those options.

G1 GC:

G1 (Garbage First) GC works similarly to CMS, but instead of processing the whole heap at once it partitions the heap into regions and processes them individually.

Usage: small pauses required on newer Java.

Shenandoah GC:

Shenandoah GC works similarly to G1 GC but has shorter pauses.

Usage: small pauses required on newer Java.

Epsilon GC:

Epsilon GC is a pseudo GC as it does not do any GC.

Usage: short runs with no pauses.

Z GC:

Z GC works similarly to G1 GC but has very short pauses.

Usage: very small pauses required or very large heaps.

Java versions:

Available GC and default GC have changed a lot over the lifetime of Java:

Serial GC
Java 1.1, 1.2, 1.3, 1.4: default
Java 5, 6, 7, 8, 11, 17: default (client), -XX:+UseSerialGC (server)
Java 21, 25: -XX:+UseSerialGC

Parallel GC
Java 1.1, 1.2, 1.3, 1.4: N/A
Java 5, 6, 7, 8: -XX:+UseParallelGC (client), default (server)
Java 11, 17, 21, 25: -XX:+UseParallelGC

Incremental GC
Java 1.1, 1.2: N/A
Java 1.3, 1.4, 5, 6: -XX:+UseTrainGC (experimental)
Java 7, 8, 11, 17, 21, 25: N/A

CMS GC
Java 1.1, 1.2, 1.3: N/A
Java 1.4, 5, 6, 7, 8: -XX:+UseConcMarkSweepGC
Java 11: -XX:+UseConcMarkSweepGC (deprecated)
Java 17, 21, 25: N/A

G1 GC
Java 1.1, 1.2, 1.3, 1.4, 5, 6: N/A
Java 7, 8: -XX:+UseG1GC
Java 11, 17: -XX:+UseG1GC (client), default (server)
Java 21, 25: default

Shenandoah GC
Java 1.1, 1.2, 1.3, 1.4, 5, 6, 7, 8: N/A
Java 11, 17, 21, 25: -XX:+UseShenandoahGC (only in OpenJDK builds, not in Oracle builds)

Epsilon GC
Java 1.1, 1.2, 1.3, 1.4, 5, 6, 7, 8: N/A
Java 11, 17, 21, 25: -XX:+UseEpsilonGC (experimental)

Z GC
Java 1.1, 1.2, 1.3, 1.4, 5, 6, 7, 8: N/A
Java 11: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC (experimental)
Java 17, 21, 25: -XX:+UseZGC

Or:

Java version available GC
1.1 Serial GC (default)
1.2 Serial GC (default)
1.3 Serial GC (default)
Incremental GC
1.4 Serial GC (default)
Incremental GC
CMS GC
5 Serial GC (default/client)
Parallel GC (default/server)
Incremental GC
CMS GC
6 Serial GC (default/client)
Parallel GC (default/server)
Incremental GC
CMS GC
7 Serial GC (default/client)
Parallel GC (default/server)
CMS GC
G1 GC
8 Serial GC (default/client)
Parallel GC (default/server)
CMS GC
G1 GC
11 Serial GC (default/client)
Parallel GC
CMS GC
G1 GC (default/server)
Shenandoah GC
Epsilon GC
Z GC
17 Serial GC (default/client)
Parallel GC
G1 GC (default/server)
Shenandoah GC
Epsilon GC
Z GC
21 Serial GC
Parallel GC
G1 GC (default)
Shenandoah GC
Epsilon GC
Z GC
25 Serial GC
Parallel GC
G1 GC (default)
Shenandoah GC
Epsilon GC
Z GC

The difference between client and server is:

if number of cores < 2 or memory < 2 GB then
    client
else
    server
end if

Which means that client is rare for Java 7 and newer.

The above is based on various internet sources. Hopefully it is reasonably accurate, but there may be something that is not accurate.

Test:

Let us run a test to compare the characteristics of different GC algorithms.

Results will vary greatly with scenario. But I have come up with a test program that I consider relevant.

Test program:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GCDemo {
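    private static final int N1 = 100_000; // number of tasks submitted
    private static final int N2 = 100;     // allocation rounds per task
    private static final int N3 = 1_000;   // array slots filled in each round
    private static final int SIZ = 40;     // size in bytes of each allocated array
    private static final List<Long> dt = new ArrayList<Long>(N1); // elapsed time per task in nanoseconds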
    private static void test() {
        long t1 = System.nanoTime();
        byte[][] a = new byte[N3][];
        for(int i = 0; i < N2; i++) {
            for(int j = 0; j < N3; j++) {
                if(a[j] != null || j % 5 != 0 || i % 10 == 0) {
                    // the new array replaces the previous one in slot j, which then becomes garbage
                    a[j] = new byte[SIZ];
                    for(int k = 0; k < SIZ; k++) a[j][k] = (byte)k;
                }
            }
        }
        long t2 = System.nanoTime();
        synchronized(dt) {
            dt.add(t2 - t1);
        }
    }
    public static void main(String[] args) throws InterruptedException {
        ExecutorService es = Executors.newFixedThreadPool(8);
        for(int i = 0; i < N1; i++) {
            es.submit(() -> test());
        }
        es.shutdown();
        es.awaitTermination(1, TimeUnit.HOURS);
        Collections.sort(dt);
        long min = dt.get(0);
        System.out.printf("min = %d us\n", min / 1000);
        long median = dt.get(N1 / 2);
        System.out.printf("median = %d us\n", median / 1000);
        long average = dt.stream().mapToLong(v -> v.longValue()).sum() / N1;
        System.out.printf("average = %d us\n", average / 1000);
        long max = dt.get(N1 - 1);
        System.out.printf("max = %d us\n", max / 1000);
        int r_below_1 = (int)dt.stream().filter(v -> (v - min) < 1_000_000).count();
        int r_1_10 = (int)dt.stream().filter(v -> (1_000_000 <= (v - min)) && ((v - min) < 10_000_000)).count();
        int r_10_50 = (int)dt.stream().filter(v -> (10_000_000 <= (v - min)) && ((v - min) < 50_000_000)).count();
        int r_50_up = (int)dt.stream().filter(v -> 50_000_000 <= (v - min)).count();
        System.out.printf("distribution = %d %d %d %d\n", r_below_1, r_1_10, r_10_50, r_50_up);
    }
}

The biggest problem with this program is that it only measures the time to run a task, not the actual GC time. So it does not distinguish between regular run time T1 + stop the world GC pause TGC1 and degraded run time T2 + stop the world GC pause TGC2, where T1 < T2, TGC1 > TGC2 and T1 + TGC1 = T2 + TGC2.
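One possible way to get closer to the actual GC time is to read the counters the JVM itself keeps, using the standard GarbageCollectorMXBean API before and after a workload. The following is only a sketch and not part of the test above: the class GCTimeDemo and its allocate workload are made up for illustration. Note also that the reported collection time is whatever the collector accounts for, which for concurrent collectors is not necessarily stop the world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GCTimeDemo {
    // Sums collection count and accumulated collection time (ms) over all collectors.
    // Values of -1 (undefined for a collector) are skipped.
    static long[] gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc.getCollectionCount() > 0) count += gc.getCollectionCount();
            if (gc.getCollectionTime() > 0) timeMs += gc.getCollectionTime();
        }
        return new long[] { count, timeMs };
    }

    // Hypothetical stand-in workload: allocates n short lived 1 KB arrays.
    static long allocate(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            byte[] b = new byte[1024];
            b[0] = (byte) i;
            sum += b[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] before = gcTotals();
        long t1 = System.nanoTime();
        long checksum = allocate(5_000_000);
        long t2 = System.nanoTime();
        long[] after = gcTotals();
        System.out.printf("wall time = %d ms (checksum %d)%n", (t2 - t1) / 1_000_000, checksum);
        System.out.printf("collections = %d, GC time = %d ms%n",
                          after[0] - before[0], after[1] - before[1]);
    }
}
```

Comparing the wall time with the GC time gives a rough idea of how much of a slow run was collection work and how much was degraded regular run time.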

Test results:

256M, average time per iteration (=throughput):
Java 21 Parallel GC
Java 21 G1 GC
Java 8 Parallel GC
Java 21 Serial GC
Java 8 G1 GC
Java 8 CMS GC
Java 8 Serial GC
Java 21 Z GC
Java 21 generational Z GC
Java 21 Shenandoah GC

256M, longest GC pause:
Java 21 Z GC
Java 21 Serial GC
Java 21 generational Z GC
Java 8 Serial GC
Java 21 Parallel GC
Java 21 G1 GC
Java 8 G1 GC
Java 8 Parallel GC
Java 8 CMS GC
Java 21 Shenandoah GC

256M, number of long (>10 ms) GC pauses:
Java 8 G1 GC
Java 21 G1 GC
Java 21 Parallel GC
Java 8 Parallel GC
Java 21 Serial GC
Java 8 CMS GC
Java 8 Serial GC
Java 21 Z GC
Java 21 generational Z GC
Java 21 Shenandoah GC

1G, average time per iteration (=throughput):
Java 21 Parallel GC
Java 21 G1 GC
Java 21 Serial GC
Java 8 Parallel GC
Java 21 Z GC
Java 21 generational Z GC
Java 8 CMS GC
Java 8 Serial GC
Java 8 G1 GC
Java 21 Shenandoah GC

1G, longest GC pause:
Java 8 Serial GC
Java 21 Serial GC
Java 21 Z GC
Java 21 generational Z GC
Java 21 Parallel GC
Java 8 Parallel GC
Java 21 Shenandoah GC
Java 21 G1 GC
Java 8 CMS GC
Java 8 G1 GC

1G, number of long (>10 ms) GC pauses:
Java 21 G1 GC
Java 8 G1 GC
Java 8 Parallel GC
Java 21 Parallel GC
Java 21 Serial GC
Java 8 CMS GC
Java 21 Z GC
Java 21 generational Z GC
Java 8 Serial GC
Java 21 Shenandoah GC

4G, average time per iteration (=throughput):
Java 21 Parallel GC
Java 21 generational Z GC
Java 21 Serial GC
Java 21 Z GC
Java 8 Parallel GC
Java 8 Serial GC
Java 21 G1 GC
Java 21 Shenandoah GC
Java 8 CMS GC
Java 8 G1 GC

4G, longest GC pause:
Java 8 Serial GC
Java 21 generational Z GC
Java 21 Parallel GC
Java 21 Z GC
Java 21 Serial GC
Java 21 Shenandoah GC
Java 21 G1 GC
Java 8 G1 GC
Java 8 Parallel GC
Java 8 CMS GC

4G, number of long (>10 ms) GC pauses:
Java 21 Z GC
Java 21 generational Z GC
Java 8 Parallel GC
Java 21 Parallel GC
Java 21 Serial GC
Java 21 G1 GC
Java 8 G1 GC
Java 8 Serial GC
Java 8 CMS GC
Java 21 Shenandoah GC

Conclusion:

Main conclusion must be that best GC algorithm depends on program, available memory, Java version and criteria for "best".

But based on these results recommendations will be:

But again note: there is absolutely no guarantee that your application will have same characteristics as my test program.

"HotSpot Virtual Machine Garbage Collection Tuning Guide" (see link in previous section) has the following recommendations:

If the performance still doesn't meet your goals, then use the following guidelines as a starting point for selecting a collector:
• If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option -XX:+UseSerialGC.
• If the application will be run on a single processor and there are no pause-time requirements, then select the serial collector with the option -XX:+UseSerialGC.
• If (a) peak application performance is the first priority and (b) there are no pause-time requirements or pauses of one second or longer are acceptable, then let the VM select the collector or select the parallel collector with -XX:+UseParallelGC.
• If response time is more important than overall throughput and garbage collection pauses must be kept shorter, then select the mostly concurrent collector with -XX:+UseG1GC.
• If response time is a high priority, then select a fully concurrent collector with -XX:UseZGC -XX:+ZGenerational.

These guidelines provide only a starting point for selecting a collector because performance is dependent on the size of the heap, the amount of live data maintained by the application, and the number and speed of available processors.
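The quoted guidelines can be read as a simple decision rule. The following is only a sketch of one possible reading: the class CollectorChooser, the Priority enum and the order in which the rules are checked are my own assumptions and not part of the guide. The Z GC option is written with the + that HotSpot boolean options use, and -XX:+ZGenerational is specific to Java versions where generational Z GC is an opt-in.

```java
class CollectorChooser {
    // Hypothetical summary of what matters most to the application.
    enum Priority { THROUGHPUT, RESPONSE_TIME, LOWEST_LATENCY }

    // Returns a starting point option following the quoted guidelines.
    static String suggest(long dataSetMb, int processors, Priority priority) {
        // Small data set (up to approximately 100 MB): serial collector.
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";
        }
        // Single processor and no pause-time requirements: serial collector.
        if (processors == 1 && priority == Priority.THROUGHPUT) {
            return "-XX:+UseSerialGC";
        }
        switch (priority) {
            case THROUGHPUT:
                return "-XX:+UseParallelGC";  // peak performance first
            case RESPONSE_TIME:
                return "-XX:+UseG1GC";        // mostly concurrent collector
            default:
                return "-XX:+UseZGC -XX:+ZGenerational"; // fully concurrent collector
        }
    }

    public static void main(String[] args) {
        System.out.println(suggest(50, 8, Priority.RESPONSE_TIME));
        System.out.println(suggest(4096, 8, Priority.THROUGHPUT));
        System.out.println(suggest(4096, 8, Priority.RESPONSE_TIME));
        System.out.println(suggest(4096, 8, Priority.LOWEST_LATENCY));
    }
}
```

As the guide itself says, the result is only a starting point and should be verified by measuring the actual application.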

Which are somewhat compatible with my recommendations (I don't cover the memory < 100 MB and 1 CPU core cases as they are rarely seen today).

Other notes:

Article history:

Version Date Description
1.0 October 25th 2025 Initial version

Comments:

Please send comments to Arne Vajhøj