Digging into JEP 280: Indify String Concatenation
Introduction
I came across this JEP while looking into the changes in Java 9, and realized that I did not know how String concatenation is actually executed in Java. While looking into it, I also realized that it is a useful example of the invokedynamic opcode.
invokedynamic was introduced in Java 7 to make it easier to implement dynamic languages on the JVM. Basically, the behaviour of the other invoke instructions in the JVM, such as invokevirtual, is hard-wired, whereas that of invokedynamic is not. Normally, if you look into the class file, you can see which method is actually called, but with invokedynamic you cannot, because the target is decided at run time.
Here is a simple example.
public class StringConcatExample {
public static void main(String[] args) {
System.out.println(args[0] + " and " + args[1]);
}
}
What do String concatenation operations actually compile to in Java? The answer is different before and after Java 9.
String Concatenation in Java 8, no invokedynamic
Java 8 compiles String concatenation into StringBuilder::append calls, as we can see by disassembling the class above:
$ javap -c StringConcatExample
Compiled from "StringConcatExample.java"
public class StringConcatExample {
public StringConcatExample();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: new #3 // class java/lang/StringBuilder
6: dup
7: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
10: aload_0
11: iconst_0
12: aaload
13: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
16: ldc #6 // String and
18: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: aload_0
22: iconst_1
23: aaload
24: invokevirtual #5 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
27: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
30: invokevirtual #8 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
33: return
}
You can see three StringBuilder::append calls in the listing above. So basically, args[0] + " and " + args[1] is equivalent to:
new StringBuilder().append(args[0]).append(" and ").append(args[1]).toString()
String Concatenation in Java 9, with invokedynamic
JEP 280, released with Java 9, changed this behavior. In Java 8, concatenation is translated into StringBuilder::append calls by the compiler, so optimizing the code above meant modifying the compiler. It is not feasible to modify the compiler every time such an optimization is made. So String concatenation was changed to use the invokedynamic instruction, which lets the optimizations be made without changing the compiler.
So what does it actually mean to use invokedynamic?
If we compile the class above with Java 9, the result is this:
$ javap -c StringConcatExample
Compiled from "StringConcatExample.java"
public class StringConcatExample {
public StringConcatExample();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: aload_0
4: iconst_0
5: aaload
6: aload_0
7: iconst_1
8: aaload
9: invokedynamic #3, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
14: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
17: return
}
As mentioned, the String concatenation is converted into a dynamic call (invokedynamic) of makeConcatWithConstants, rather than invocations of StringBuilder::append. Here is the verbose output of the disassembler:
$ javap -c -v StringConcatExample
(not all the output is shown)
(I filtered the output to include only the information related to String concat invokedynamic operation)
(I cleaned the output to show only raw data, not the resolved entries from the constant pool)
...
Constant Pool:
...
#3 = InvokeDynamic #0:#21
#19 = MethodHandle 6:#29
#20 = String #30
#21 = NameAndType #31:#32
#29 = Methodref #36.#37
#30 = Utf8 \u0001 and \u0001
#31 = Utf8 makeConcatWithConstants
#32 = Utf8 (Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
#36 = Class #38
#37 = NameAndType #31:#42
#38 = Utf8 java/lang/invoke/StringConcatFactory
#42 = Utf8 (Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
...
{
...
public static void main(java.lang.String[]);
...
9: invokedynamic #3, 0
...
}
...
BootstrapMethods:
0: #19
Method arguments:
#20
From the specification of invokedynamic, we learn that:
Its format is:
invokedynamic indexbyte1 indexbyte2 0 0
Each occurrence of invokedynamic is called a dynamic call site.
indexbyte1 and indexbyte2 form an index ((indexbyte1 << 8) | indexbyte2) into the constant pool. The item at that index is a call site specifier.
The call site specifier in the constant pool has a CONSTANT_InvokeDynamic_info structure:
CONSTANT_InvokeDynamic_info { u1 tag; u2 bootstrap_method_attr_index; u2 name_and_type_index; }
- bootstrap_method_attr_index is an index into the bootstrap_methods array of the bootstrap method table of the class.
- name_and_type_index is an index into the constant pool for a CONSTANT_NameAndType_info structure representing the method’s name and descriptor.
In our example, invokedynamic is invoked with #3, 0, so the constant pool entry #3 is our call site specifier, and the call site specifier refers to Bootstrap Methods table entry #0 and NameAndType constant pool entry #21. The 0 is just the last two bytes, which are always set to 0 for invokedynamic.
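To make the index arithmetic concrete, here is a minimal sketch that combines the two index bytes of our invokedynamic #3, 0 instruction. The class and method names are made up for illustration; the byte values assume the instruction is encoded as the opcode 0xba followed by the two index bytes and two zero bytes, as the format above describes.

```java
class IndyOperandSketch {
    // Opcode value of invokedynamic in the class file format.
    static final int INVOKEDYNAMIC = 0xba;

    // Combine the two index bytes into a constant pool index,
    // exactly as (indexbyte1 << 8) | indexbyte2.
    static int constantPoolIndex(int indexbyte1, int indexbyte2) {
        return (indexbyte1 << 8) | indexbyte2;
    }

    public static void main(String[] args) {
        // Assumed encoding of "invokedynamic #3, 0": opcode, two index bytes, two zero bytes.
        int[] code = {INVOKEDYNAMIC, 0x00, 0x03, 0x00, 0x00};
        System.out.println(constantPoolIndex(code[1], code[2])); // prints 3
    }
}
```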
Below is a graph of all the references starting from invokedynamic. If you are not familiar with the Java class file format: basically all static information in a class file is kept in tables, primarily the constant pool, and is referred to by index from the bytecode. The red boxes are constant pool entries; the green one is the Bootstrap Methods table entry.
digraph g {
node[style=filled, shape=box]
invokedynamic[label="invokedynamic #3, 0"]
bm0[label="#0:BootstrapMethods[0]\n#19, #20", fillcolor=green]
cp3[label="#3:InvokeDynamic\n#0, #21", fillcolor=red]
cp19[label="#19:MethodHandle\n6 (REF_invokeStatic ):#29", fillcolor=red]
cp20[label="#20:String\n#30", fillcolor=red]
cp21[label="#21:NameAndType\n#31, #32", fillcolor=red]
cp29[label="#29:Methodref\n#36:#37", fillcolor=red]
cp30[label="#30:Utf8\n\\u0001 and \\u0001", fillcolor=red]
cp31[label="#31:Utf8\nmakeConcatWithConstants", fillcolor=red]
cp32[label="#32:Utf8\n(Ljava/lang/String;\nLjava/lang/String;)\nLjava/lang/String;", fillcolor=red]
cp36[label="#36:Class\n#38", fillcolor=red]
cp37[label="#37:NameAndType\n#31, #42", fillcolor=red]
cp38[label="#38:Utf8\njava/lang/invoke/StringConcatFactory", fillcolor=red]
cp42[label="#42:Utf8\n(Ljava/lang/invoke/MethodHandles$Lookup;\nLjava/lang/String;\nLjava/lang/invoke/MethodType;\nLjava/lang/String;\n[Ljava/lang/Object;)\nLjava/lang/invoke/CallSite;", fillcolor=red]
invokedynamic -> cp3
cp3 -> bm0
cp3 -> cp21
bm0 -> cp19
bm0 -> cp20
cp20 -> cp30
cp21 -> cp31
cp21 -> cp32
cp19 -> cp29
cp29 -> cp36
cp29 -> cp37
cp37 -> cp31
cp37 -> cp42
cp36 -> cp38
}
I am not going to explain each entry, as that is not our focus and it is understandable from the graph, but I will describe how invokedynamic is executed.
If you look at the disassembly listing above again, you will see the following just before invokedynamic:
3: aload_0
4: iconst_0
5: aaload
6: aload_0
7: iconst_1
8: aaload
These instructions read args[0] and args[1] and push them onto the operand stack. So just before invokedynamic, these two elements are on the operand stack.
When invokedynamic is executed:
- First, these are pushed onto the operand stack:
  - a reference to java.lang.invoke.MethodHandle for the bootstrap method
  - a reference to java.lang.invoke.MethodHandles.Lookup object for the class
  - a reference to java.lang.String for the method name
  - a reference to java.lang.invoke.MethodType for the method descriptor
  - any other static arguments
- The bootstrap method is executed like an execution of an invokevirtual instruction with the following properties:
  - the method’s name is invoke
  - the method’s descriptor has a return type of java.lang.invoke.CallSite
  - the method’s descriptor has parameter types derived from the operand stack, of which the first four are:
    java.lang.invoke.MethodHandle
    java.lang.invoke.MethodHandles.Lookup
    java.lang.String
    java.lang.invoke.MethodType
  - if any static arguments are specified (like here, “\u0001 and \u0001”, referenced from bootstrap methods entry #0 to constant pool entry #20), their types are appended to the parameter types of the method.
The bootstrap method here is java.lang.invoke.StringConcatFactory::makeConcatWithConstants, with the argument types:
java.lang.invoke.MethodHandles.Lookup
java.lang.String
java.lang.invoke.MethodType
java.lang.String
java.lang.Object[]
and the return type java.lang.invoke.CallSite.
Here it is from the Java 9 source code:
public static CallSite makeConcatWithConstants(MethodHandles.Lookup lookup,
String name,
MethodType concatType,
String recipe,
Object... constants) throws StringConcatException {
if (DEBUG) {
System.out.println("StringConcatFactory " + STRATEGY + " is here for " + concatType + ", {" + recipe + "}, " + Arrays.toString(constants));
}
return doStringConcat(lookup, name, concatType, false, recipe, constants);
}
The arguments are set as:
- java.lang.invoke.MethodHandles.Lookup: a lookup object for the class in which this dynamic call site occurs
- java.lang.String: the method name in the call site specifier, which is makeConcatWithConstants
- java.lang.invoke.MethodType: the method descriptor in the call site specifier, which is here (String, String) returning String. This is the type of the method that the bootstrap method will return, and which is going to be invoked with the parameters (args[0] and args[1]).
- java.lang.String: the recipe, a static argument ("\u0001 and \u0001")
- java.lang.Object[]: empty here
This method is invoked like a regular Java method, as if by an invokevirtual instruction.
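Since the bootstrap method is an ordinary public static method, we can also call it by hand to see what it returns. The sketch below passes the same arguments as in our example (a lookup for the calling class, the name, the (String, String) returning String type, and the recipe) and then invokes the target of the returned CallSite. The class name and the helper method are made up for illustration, and this is only a rough imitation of what the JVM does at the call site, not the actual linkage code. It needs Java 9 or later.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class ManualBootstrapSketch {
    // Calls the bootstrap method by hand, then invokes the method it returned.
    static String concat(String first, String second) {
        try {
            // The call site type: (String, String) returning String.
            MethodType concatType =
                    MethodType.methodType(String.class, String.class, String.class);
            // Same recipe as constant pool entry #30; no extra constants are passed.
            CallSite callSite = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "makeConcatWithConstants",
                    concatType, "\u0001 and \u0001");
            MethodHandle target = callSite.dynamicInvoker();
            // This plays the role of the dynamic arguments on the operand stack.
            return (String) target.invokeExact(first, second);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("1", "2")); // prints "1 and 2"
    }
}
```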
The recipe argument is an interesting one: it describes how the concatenation is assembled, and it is processed character by character:
- \u0001 means the input to concatenation is taken from the dynamic arguments.
- \u0002 means the input to concatenation is taken from the static bootstrap arguments.
- any other character means a single character constant to be concatenated.
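The three rules above can be sketched as a tiny interpreter. To be clear, this is not how StringConcatFactory is implemented (it builds a MethodHandle according to a strategy); the class and method names here are hypothetical and the code only illustrates how a recipe is read.

```java
class RecipeSketch {
    // Walks the recipe character by character and builds the result.
    static String interpret(String recipe, Object[] dynamicArgs, Object[] constants) {
        StringBuilder out = new StringBuilder();
        int nextArg = 0;
        int nextConstant = 0;
        for (int i = 0; i < recipe.length(); i++) {
            char c = recipe.charAt(i);
            if (c == '\u0001') {
                // Tag for "take the next dynamic argument".
                out.append(dynamicArgs[nextArg++]);
            } else if (c == '\u0002') {
                // Tag for "take the next static bootstrap argument".
                out.append(constants[nextConstant++]);
            } else {
                // Any other character is a literal part of the result.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Object[] dynamicArgs = {"1", "2"};
        System.out.println(interpret("\u0001 and \u0001", dynamicArgs, new Object[0])); // prints "1 and 2"
    }
}
```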
If we look into the source code of doStringConcat, called from makeConcatWithConstants, we see that it generates the actual MethodHandle doing the concatenation according to a strategy.
This can be observed in OpenJDK 9 by setting the java.lang.invoke.stringConcat.debug property to true. For example:
$ java -Djava.lang.invoke.stringConcat.debug=true StringConcatExample 1 2
StringConcatFactory MH_INLINE_SIZED_EXACT is here for (String,String)String, { and }, []
1 and 2
MH_INLINE_SIZED_EXACT is the default strategy, and it is the one used here. The other strategies can be selected with the java.lang.invoke.stringConcat property. This strategy basically returns a method handle that produces a single String containing the concatenation result.
As a last note, there are actually two variants of the makeConcat methods. The one described above is makeConcatWithConstants; the other is makeConcat. As can be guessed from the name, makeConcat does not receive constants, so it is only used to concatenate the dynamic arguments.
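For comparison, here is a small sketch that calls makeConcat by hand. There is no recipe and there are no constants, only the concatenation type, which I chose here as (String, int) returning String for illustration. The class and helper names are made up, and Java 9 or later is assumed.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class MakeConcatSketch {
    // Concatenates a String and an int using only dynamic arguments.
    static String concat(String text, int number) {
        try {
            MethodType concatType =
                    MethodType.methodType(String.class, String.class, int.class);
            CallSite callSite = StringConcatFactory.makeConcat(
                    MethodHandles.lookup(), "makeConcat", concatType);
            // The argument types must match concatType exactly for invokeExact.
            return (String) callSite.dynamicInvoker().invokeExact(text, number);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("a", 1)); // prints "a1"
    }
}
```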
Summary
How does String concatenation work in Java 9+?
- The compiler places an invokedynamic call at the point of concatenation.
- The invokedynamic call first executes the bootstrap method makeConcat[WithConstants].
- This method, according to a strategy, returns a method (wrapped in a CallSite).
- The returned method is called, giving the result of the concatenation.
The principle should stay the same, but StringConcatFactory::makeConcat[WithConstants] or the default strategy can be different in Java 10, 11 or 12. That is actually the whole point of this JEP: to be able to change it easily.
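The same four steps can be imitated in plain Java with a hand-written bootstrap method. Everything in this sketch is hypothetical (the class, the bootstrap method and the join method are mine, not part of the JDK), and the bootstrap is called explicitly here, whereas for a real invokedynamic instruction the JVM does this when it links the call site. The counter shows that the bootstrap runs once, while the returned method can be called many times.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class LinkOnceSketch {
    static int bootstrapCalls = 0;

    // Hypothetical bootstrap method: picks the implementation and wraps it in a CallSite.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type) {
        bootstrapCalls++;
        try {
            return new ConstantCallSite(lookup.findStatic(LinkOnceSketch.class, name, type));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The "strategy" chosen by this sketch: a plain StringBuilder chain.
    static String join(String a, String b) {
        return new StringBuilder().append(a).append(" and ").append(b).toString();
    }

    // Links the call site once, then calls the returned method twice.
    static String[] linkOnceCallTwice() {
        try {
            MethodType type = MethodType.methodType(String.class, String.class, String.class);
            MethodHandle target = bootstrap(MethodHandles.lookup(), "join", type).dynamicInvoker();
            String first = (String) target.invokeExact("1", "2");
            String second = (String) target.invokeExact("3", "4");
            return new String[] {first, second};
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        String[] results = linkOnceCallTwice();
        System.out.println(results[0]);     // prints "1 and 2"
        System.out.println(results[1]);     // prints "3 and 4"
        System.out.println(bootstrapCalls); // prints 1
    }
}
```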
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.