Everybody just *knows* that Java Strings are stored as UTF-16.
Did anyone read about the changes in Java 9? :-)

public class StringStorage {
    private static Runtime rt = Runtime.getRuntime();

    // Approximate heap in use, after asking the JVM to collect garbage.
    private static long memory() {
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Build a String of exactly siz characters, all plain ASCII.
    private static String gen(int siz) {
        StringBuilder sb = new StringBuilder();
        int rem = siz;
        while (rem >= 20) {
            sb.append("12345678901234567890");
            rem -= 20;
        }
        while (rem > 0) {
            sb.append("X");
            rem--;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int siz = 1000;
        // Double the length each round and report the heap growth per character.
        for (int i = 0; i < 20; i++) {
            long m1 = memory();
            String s = gen(siz);
            long m2 = memory();
            System.out.printf("%d chars = %d bytes (%.2f bytes/char)\n", s.length(), m2 - m1, (m2 - m1) * 1.0 / s.length());
            siz *= 2;
        }
    }
}

C:\Work\Java>java -version
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Temurin)(build 25.412-b08, mixed mode)

C:\Work\Java>javac StringStorage.java

C:\Work\Java>java StringStorage
1000 chars = 2304 bytes (2.30 bytes/char)
2000 chars = 4040 bytes (2.02 bytes/char)
4000 chars = 8040 bytes (2.01 bytes/char)
8000 chars = 16040 bytes (2.01 bytes/char)
16000 chars = 32040 bytes (2.00 bytes/char)
32000 chars = 64040 bytes (2.00 bytes/char)
64000 chars = 128968 bytes (2.02 bytes/char)
128000 chars = 256040 bytes (2.00 bytes/char)
256000 chars = 512040 bytes (2.00 bytes/char)
512000 chars = 1024040 bytes (2.00 bytes/char)
1024000 chars = 2048088 bytes (2.00 bytes/char)
2048000 chars = 4096040 bytes (2.00 bytes/char)
4096000 chars = 8192040 bytes (2.00 bytes/char)
8192000 chars = 16384040 bytes (2.00 bytes/char)
16384000 chars = 32768040 bytes (2.00 bytes/char)
32768000 chars = 65536040 bytes (2.00 bytes/char)
65536000 chars = 131072040 bytes (2.00 bytes/char)
131072000 chars = 262144040 bytes (2.00 bytes/char)
262144000 chars = 524288040 bytes (2.00 bytes/char)
524288000 chars = 1048576040 bytes (2.00 bytes/char)

C:\Work\Java>java -version
openjdk version "11.0.23" 2024-04-16
OpenJDK Runtime Environment Temurin-11.0.23+9 (build 11.0.23+9)
OpenJDK 64-Bit Server VM Temurin-11.0.23+9 (build 11.0.23+9, mixed mode)

C:\Work\Java>javac StringStorage.java

C:\Work\Java>java StringStorage
1000 chars = 1216 bytes (1.22 bytes/char)
2000 chars = 2040 bytes (1.02 bytes/char)
4000 chars = 4040 bytes (1.01 bytes/char)
8000 chars = 8040 bytes (1.01 bytes/char)
16000 chars = 16040 bytes (1.00 bytes/char)
32000 chars = 32040 bytes (1.00 bytes/char)
64000 chars = 64152 bytes (1.00 bytes/char)
128000 chars = 128040 bytes (1.00 bytes/char)
256000 chars = 256040 bytes (1.00 bytes/char)
512000 chars = 512040 bytes (1.00 bytes/char)
1024000 chars = 1024088 bytes (1.00 bytes/char)
2048000 chars = 2048040 bytes (1.00 bytes/char)
4096000 chars = 4194328 bytes (1.02 bytes/char)
8192000 chars = 8388632 bytes (1.02 bytes/char)
16384000 chars = 16777240 bytes (1.02 bytes/char)
32768000 chars = 33554456 bytes (1.02 bytes/char)
65536000 chars = 67108888 bytes (1.02 bytes/char)
131072000 chars = 134217752 bytes (1.02 bytes/char)
262144000 chars = 264241176 bytes (1.01 bytes/char)
524288000 chars = 528482328 bytes (1.01 bytes/char)

C:\Work\Java>java -XX:-CompactStrings StringStorage
1000 chars = 2256 bytes (2.26 bytes/char)
2000 chars = 4040 bytes (2.02 bytes/char)
4000 chars = 8040 bytes (2.01 bytes/char)
8000 chars = 16040 bytes (2.01 bytes/char)
16000 chars = 32040 bytes (2.00 bytes/char)
32000 chars = 64040 bytes (2.00 bytes/char)
64000 chars = 128176 bytes (2.00 bytes/char)
128000 chars = 256040 bytes (2.00 bytes/char)
256000 chars = 512040 bytes (2.00 bytes/char)
512000 chars = 1024040 bytes (2.00 bytes/char)
1024000 chars = 2048088 bytes (2.00 bytes/char)
2048000 chars = 4194328 bytes (2.05 bytes/char)
4096000 chars = 8388632 bytes (2.05 bytes/char)
8192000 chars = 16777240 bytes (2.05 bytes/char)
16384000 chars = 33554456 bytes (2.05 bytes/char)
32768000 chars = 67108888 bytes (2.05 bytes/char)
65536000 chars = 134217752 bytes (2.05 bytes/char)
131072000 chars = 264241176 bytes (2.02 bytes/char)
262144000 chars = 528482328 bytes (2.02 bytes/char)
524288000 chars = 1052770328 bytes (2.01 bytes/char)

What happened?
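Since Java 9 (JEP 254, Compact Strings) a Java String is stored as ISO-8859-1, one byte per character, if all of its characters can be represented in that encoding and -XX:-CompactStrings is not specified. Otherwise it is stored as UTF-16, as it traditionally was.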
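
To make the rule concrete, here is a small sketch that models the decision. It is not the JDK's own code: the class CompactCheck and the helpers fitsLatin1 and estimatedPayloadBytes are names invented for this illustration, and the estimate covers only the character data, not the roughly 40 bytes of fixed overhead visible in the output above.

```java
public class CompactCheck {
    // Hypothetical helper: true if every char is <= 0xFF,
    // i.e. representable in ISO-8859-1.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical model of the character payload size in bytes:
    // 1 byte per char only when compact strings are enabled and the
    // whole string fits in ISO-8859-1, otherwise 2 bytes per char.
    static long estimatedPayloadBytes(String s, boolean compactStrings) {
        int bytesPerChar = (compactStrings && fitsLatin1(s)) ? 1 : 2;
        return (long) bytesPerChar * s.length();
    }

    public static void main(String[] args) {
        String[] samples = {
            "12345678901234567890", // plain ASCII, as in the test program
            "caf\u00e9",            // e-acute is 0xE9, still ISO-8859-1
            "\u20ac100"             // the euro sign is outside ISO-8859-1
        };
        for (String s : samples) {
            System.out.printf("%d chars: fits=%b, compact=%d bytes, non-compact=%d bytes%n",
                    s.length(), fitsLatin1(s),
                    estimatedPayloadBytes(s, true),
                    estimatedPayloadBytes(s, false));
        }
    }
}
```

Note that under this model a single character outside ISO-8859-1 makes the whole string take 2 bytes per character; the choice is made per string, not per character.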