Java Strings

Content:

Everybody just *knows* that Java Strings are stored as UTF-16.

Did anyone read about the changes in Java 9? :-)

public class StringStorage {
    private static Runtime rt = Runtime.getRuntime();
    // Request a GC (only a hint to the JVM), then return the used heap in bytes.
    private static long memory() {
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
    // Build a String of exactly siz ASCII characters.
    private static String gen(int siz) {
        StringBuilder sb = new StringBuilder();
        int rem = siz;
        while(rem >= 20) {
            sb.append("12345678901234567890");
            rem -= 20;
        }
        while(rem > 0) {
            sb.append("X");
            rem--;
        }
        return sb.toString();
    }
    public static void main(String[] args) {
        int siz = 1000;
        // Double the size each round and report the heap growth caused by the new String.
        for(int i = 0; i < 20; i++) {
            long m1 = memory();
            String s = gen(siz);
            long m2 = memory();
            System.out.printf("%d chars = %d bytes (%.2f bytes/char)\n", s.length(), m2 - m1, (m2 - m1) * 1.0 / s.length());
            siz *= 2;
        }
    }
}

C:\Work\Java>java -version
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Temurin)(build 25.412-b08, mixed mode)

C:\Work\Java>javac StringStorage.java

C:\Work\Java>java StringStorage
1000 chars = 2304 bytes (2.30 bytes/char)
2000 chars = 4040 bytes (2.02 bytes/char)
4000 chars = 8040 bytes (2.01 bytes/char)
8000 chars = 16040 bytes (2.01 bytes/char)
16000 chars = 32040 bytes (2.00 bytes/char)
32000 chars = 64040 bytes (2.00 bytes/char)
64000 chars = 128968 bytes (2.02 bytes/char)
128000 chars = 256040 bytes (2.00 bytes/char)
256000 chars = 512040 bytes (2.00 bytes/char)
512000 chars = 1024040 bytes (2.00 bytes/char)
1024000 chars = 2048088 bytes (2.00 bytes/char)
2048000 chars = 4096040 bytes (2.00 bytes/char)
4096000 chars = 8192040 bytes (2.00 bytes/char)
8192000 chars = 16384040 bytes (2.00 bytes/char)
16384000 chars = 32768040 bytes (2.00 bytes/char)
32768000 chars = 65536040 bytes (2.00 bytes/char)
65536000 chars = 131072040 bytes (2.00 bytes/char)
131072000 chars = 262144040 bytes (2.00 bytes/char)
262144000 chars = 524288040 bytes (2.00 bytes/char)
524288000 chars = 1048576040 bytes (2.00 bytes/char)

C:\Work\Java>java -version
openjdk version "11.0.23" 2024-04-16
OpenJDK Runtime Environment Temurin-11.0.23+9 (build 11.0.23+9)
OpenJDK 64-Bit Server VM Temurin-11.0.23+9 (build 11.0.23+9, mixed mode)

C:\Work\Java>javac StringStorage.java

C:\Work\Java>java StringStorage
1000 chars = 1216 bytes (1.22 bytes/char)
2000 chars = 2040 bytes (1.02 bytes/char)
4000 chars = 4040 bytes (1.01 bytes/char)
8000 chars = 8040 bytes (1.01 bytes/char)
16000 chars = 16040 bytes (1.00 bytes/char)
32000 chars = 32040 bytes (1.00 bytes/char)
64000 chars = 64152 bytes (1.00 bytes/char)
128000 chars = 128040 bytes (1.00 bytes/char)
256000 chars = 256040 bytes (1.00 bytes/char)
512000 chars = 512040 bytes (1.00 bytes/char)
1024000 chars = 1024088 bytes (1.00 bytes/char)
2048000 chars = 2048040 bytes (1.00 bytes/char)
4096000 chars = 4194328 bytes (1.02 bytes/char)
8192000 chars = 8388632 bytes (1.02 bytes/char)
16384000 chars = 16777240 bytes (1.02 bytes/char)
32768000 chars = 33554456 bytes (1.02 bytes/char)
65536000 chars = 67108888 bytes (1.02 bytes/char)
131072000 chars = 134217752 bytes (1.02 bytes/char)
262144000 chars = 264241176 bytes (1.01 bytes/char)
524288000 chars = 528482328 bytes (1.01 bytes/char)

C:\Work\Java>java -XX:-CompactStrings StringStorage
1000 chars = 2256 bytes (2.26 bytes/char)
2000 chars = 4040 bytes (2.02 bytes/char)
4000 chars = 8040 bytes (2.01 bytes/char)
8000 chars = 16040 bytes (2.01 bytes/char)
16000 chars = 32040 bytes (2.00 bytes/char)
32000 chars = 64040 bytes (2.00 bytes/char)
64000 chars = 128176 bytes (2.00 bytes/char)
128000 chars = 256040 bytes (2.00 bytes/char)
256000 chars = 512040 bytes (2.00 bytes/char)
512000 chars = 1024040 bytes (2.00 bytes/char)
1024000 chars = 2048088 bytes (2.00 bytes/char)
2048000 chars = 4194328 bytes (2.05 bytes/char)
4096000 chars = 8388632 bytes (2.05 bytes/char)
8192000 chars = 16777240 bytes (2.05 bytes/char)
16384000 chars = 33554456 bytes (2.05 bytes/char)
32768000 chars = 67108888 bytes (2.05 bytes/char)
65536000 chars = 134217752 bytes (2.05 bytes/char)
131072000 chars = 264241176 bytes (2.02 bytes/char)
262144000 chars = 528482328 bytes (2.02 bytes/char)
524288000 chars = 1052770328 bytes (2.01 bytes/char)

What happened?

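Since Java 9 (JEP 254, compact strings) a Java String is stored internally as ISO-8859-1, one byte per character, if all of its characters fit in that encoding and -XX:-CompactStrings is not specified. Otherwise it is stored as UTF-16, two bytes per character, as it always was.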
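
The rule can be sketched in a few lines. This is only an illustration of the decision, not the JDK's internal code: the class CompactCheck and its two methods are made up for this example, and the estimate covers the character data only (no object or array headers) and assumes compact strings are enabled.

```java
class CompactCheck {
    // Hypothetical helper: true if every char fits in ISO-8859-1 (U+0000..U+00FF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Estimated size of the character data alone, assuming compact strings are on:
    // 1 byte per char if the whole string fits in ISO-8859-1, else 2 bytes per char.
    static long estimatedPayloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2L * s.length();
    }

    public static void main(String[] args) {
        String[] samples = { "12345678901234567890", "na\u00EFve", "price: 10 \u20AC" };
        for (String s : samples) {
            System.out.printf("%d chars -> %d bytes (fits ISO-8859-1: %b)%n",
                    s.length(), estimatedPayloadBytes(s), fitsLatin1(s));
        }
    }
}
```

Note that a single character outside ISO-8859-1, such as the euro sign in the last sample, is enough to make the whole string use two bytes per character.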

Comments: