Digging into JEP 280: Indify String Concatenation

December 05, 2018

Introduction

I came across this JEP while looking into the changes in Java 9, and I realized I did not know how String concatenation is actually executed in Java. While looking into the JEP, I also realized it is a useful example of the invokedynamic opcode.

invokedynamic was introduced in Java 7 to make it easier to implement dynamic languages on the JVM. The behaviour of the other invoke instructions in the JVM, such as invokevirtual, is hard-wired, whereas that of invokedynamic is not. Normally, if you look into a class file, you can see which method is actually called; with invokedynamic you cannot, because the target is only decided at run time.

Here is a simple example.

public class StringConcatExample {
  public static void main(String[] args) {
    System.out.println(args[0] + " and " + args[1]);
  }
}

What do String concat operations actually mean in Java? The answer is different before and after Java 9.

String Concatenation in Java 8, no invokedynamic

Java 8 compiles String concat operations into StringBuilder::append calls, as we can see by disassembling the class above:

$ javap -c StringConcatExample

Compiled from "StringConcatExample.java"
public class StringConcatExample {
  public StringConcatExample();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return
  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: new           #3                  // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      10: aload_0
      11: iconst_0
      12: aaload
      13: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      16: ldc           #6                  // String  and
      18: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      21: aload_0
      22: iconst_1
      23: aaload
      24: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      27: invokevirtual #7                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      30: invokevirtual #8                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      33: return
}

You can see three StringBuilder::append calls in the listing above. So basically, args[0] + " and " + args[1] is equivalent to:

new StringBuilder().append(args[0]).append(" and ").append(args[1]).toString()
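
As a quick check of that equivalence, here is a minimal sketch that puts the two forms side by side. The class and method names (ConcatEquivalence, withPlus, withBuilder) are made up for illustration:

```java
public class ConcatEquivalence {
    // The concatenation as it is written in the source code.
    static String withPlus(String a, String b) {
        return a + " and " + b;
    }

    // The chain of calls that the Java 8 bytecode above corresponds to.
    static String withBuilder(String a, String b) {
        return new StringBuilder().append(a).append(" and ").append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("1", "2"));
        System.out.println(withBuilder("1", "2"));
    }
}
```

Both methods return the same String for the same inputs; only the way the compiler expresses the first one in bytecode differs between Java versions.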

String Concatenation in Java 9, with invokedynamic

JEP 280, released with Java 9, changed this behavior. In Java 8 the compiler itself translates concatenation into StringBuilder::append calls, so optimizing that translation means modifying the compiler. It is not feasible to modify the compiler every time such an optimization is made, so String concatenation was changed to use the invokedynamic instruction, which lets the optimizations be made without changing the compiler.

So what does it actually mean to use invokedynamic?

If we compile the class above with Java 9, this is the result:

$ javap -c StringConcatExample

Compiled from "StringConcatExample.java"
public class StringConcatExample {
  public StringConcatExample();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return
  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: aload_0
       4: iconst_0
       5: aaload
       6: aload_0
       7: iconst_1
       8: aaload
       9: invokedynamic #3,  0              // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
      14: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      17: return
}

As mentioned, the String concatenation is converted into a dynamic call (invokedynamic) of makeConcatWithConstants, rather than invocations of StringBuilder::append. Let us look at the verbose output of the disassembler:

$ javap -c -v StringConcatExample

(not all the output is shown)
(I filtered the output to include only the information related to String concat invokedynamic operation)
(I cleaned the output to show only raw data, not the resolved entries from the constant pool)

...
Constant Pool:
...
    #3 = InvokeDynamic      #0:#21         
   #19 = MethodHandle       6:#29          
   #20 = String             #30
   #21 = NameAndType        #31:#32        
   #29 = Methodref          #36.#37        
   #30 = Utf8               \u0001 and \u0001
   #31 = Utf8               makeConcatWithConstants
   #32 = Utf8               (Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
   #36 = Class              #38            
   #37 = NameAndType        #31:#42        
   #38 = Utf8               java/lang/invoke/StringConcatFactory
   #42 = Utf8               (Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
...
{
...
  public static void main(java.lang.String[]);
  ... 
         9: invokedynamic #3,  0
  ...
}
...
BootstrapMethods:
  0: #19
    Method arguments:
      #20

From the spec of invokedynamic, we learn that:

  • Its format is: invokedynamic indexbyte1 indexbyte2 0 0

  • Each occurrence of invokedynamic is called a dynamic call site

  • indexbyte1 and indexbyte2 together form an index, (indexbyte1 << 8) | indexbyte2, into the constant pool. The item at that index is a call site specifier.

  • The call site specifier in the constant pool entry has a CONSTANT_InvokeDynamic_info structure:

    CONSTANT_InvokeDynamic_info {
        u1 tag;
        u2 bootstrap_method_attr_index;
        u2 name_and_type_index;
    }
    
    • bootstrap_method_attr_index is an index into the bootstrap_methods array of the bootstrap method table of the class.
    • name_and_type_index is an index into the constant pool for a CONSTANT_NameAndType_info representing the method’s name and descriptor.

    In our example, invokedynamic is invoked with #3, 0, so the constant pool entry #3 is our call site specifier, and the call site specifier refers to Bootstrap Methods table entry #0 and NameAndType constant pool entry #21. The trailing 0 stands for the last two bytes of the instruction, which are always set to 0 for invokedynamic.
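
To make the index arithmetic concrete, here is a small sketch that decodes the instruction operands and the call site specifier. The byte arrays are reconstructed by hand from the javap output above (they are an assumption, not a dump of the real class file), and the class and helper names are hypothetical:

```java
public class CallSiteSpecifierDecoder {
    // Reads an unsigned 16-bit big-endian value (a "u2" in class file terms).
    static int u2(int highByte, int lowByte) {
        return ((highByte & 0xFF) << 8) | (lowByte & 0xFF);
    }

    public static void main(String[] args) {
        // "invokedynamic #3, 0": opcode 0xBA, indexbyte1, indexbyte2, 0, 0.
        int[] instruction = {0xBA, 0x00, 0x03, 0x00, 0x00};
        System.out.println("call site specifier: #" + u2(instruction[1], instruction[2]));

        // CONSTANT_InvokeDynamic_info of entry #3: tag 18, then the two u2 fields.
        int[] entry = {18, 0x00, 0x00, 0x00, 0x15};
        System.out.println("bootstrap method:    #" + u2(entry[1], entry[2]));
        System.out.println("name and type:       #" + u2(entry[3], entry[4]));
    }
}
```

The instruction occupies five bytes, which is why the next instruction in the listing starts at offset 14.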

Below is a graph of all the references starting from invokedynamic. If you are not familiar with the Java class file format: basically all static information in the bytecode is kept in tables, primarily in the constant pool, and is referred to by index from the bytecode instructions. The red boxes are entries of the constant pool, the green one is the Bootstrap Methods table entry.

digraph g {
  node[style=filled, shape=box]
  invokedynamic[label="invokedynamic #3, 0"]
  bm0[label="#0:BootstrapMethods[0]\n#19, #20", fillcolor=green]
  cp3[label="#3:InvokeDynamic\n#0, #21", fillcolor=red]
  cp19[label="#19:MethodHandle\n6 (REF_invokeStatic):#29", fillcolor=red]
  cp20[label="#20:String\n#30", fillcolor=red]
  cp21[label="#21:NameAndType\n#31, #32", fillcolor=red]
  cp29[label="#29:Methodref\n#36:#37", fillcolor=red]
  cp30[label="#30:Utf8\n\\u0001 and \\u0001", fillcolor=red]
  cp31[label="#31:Utf8\nmakeConcatWithConstants", fillcolor=red]
  cp32[label="#32:Utf8\n(Ljava/lang/String;\nLjava/lang/String;)\nLjava/lang/String;", fillcolor=red]
  cp36[label="#36:Class\n#38", fillcolor=red]
  cp37[label="#37:NameAndType\n#31, #42", fillcolor=red]
  cp38[label="#38:Utf8\njava/lang/invoke/StringConcatFactory", fillcolor=red]
  cp42[label="#42:Utf8\n(Ljava/lang/invoke/MethodHandles$Lookup;\nLjava/lang/String;\nLjava/lang/invoke/MethodType;\nLjava/lang/String;\n[Ljava/lang/Object;)\nLjava/lang/invoke/CallSite;", fillcolor=red]
  invokedynamic -> cp3
  cp3 -> bm0
  cp3 -> cp21
  bm0 -> cp19
  bm0 -> cp20
  cp20 -> cp30
  cp21 -> cp31
  cp21 -> cp32
  cp19 -> cp29
  cp29 -> cp36
  cp29 -> cp37
  cp37 -> cp31
  cp37 -> cp42
  cp36 -> cp38
}

I am not going to explain each entry as it is not our focus and it is also understandable from the graph, but I will describe the invokedynamic execution.

If you look at the disassembly listing above again, you will see the following just before invokedynamic:

3: aload_0
4: iconst_0
5: aaload
6: aload_0
7: iconst_1
8: aaload

These instructions read args[0] and args[1] and push them onto the operand stack. So, just before invokedynamic, we have these two elements on the operand stack.

For invokedynamic execution:

  • First, the following are pushed onto the operand stack:
    • a reference to java.lang.invoke.MethodHandle for the bootstrap method
    • a reference to java.lang.invoke.MethodHandles.Lookup object for the class
    • a reference to java.lang.String for the method name
    • a reference to java.lang.invoke.MethodType for the method descriptor
    • any other static arguments
  • The bootstrap method is then invoked as if by an invokevirtual instruction referring to a method with the following properties:
    • the method’s name is invoke
    • the method’s descriptor has a return type of java.lang.invoke.CallSite
    • the method’s descriptor has parameter types derived from the operand stack, of which the first four are:
      • java.lang.invoke.MethodHandle
      • java.lang.invoke.MethodHandles.Lookup
      • java.lang.String
      • java.lang.invoke.MethodType
    • if any static arguments are specified (like here, “\u0001 and \u0001”, referenced from bootstrap methods entry #0 to constant pool entry #20), their types are appended to the parameter types of the method.

The Bootstrap Method here is: java.lang.invoke.StringConcatFactory::makeConcatWithConstants with the argument types:

  • java.lang.invoke.MethodHandles.Lookup
  • java.lang.String
  • java.lang.invoke.MethodType
  • java.lang.String
  • java.lang.Object[]

and the return type java.lang.invoke.CallSite.

Here it is in the Java 9 source code:

public static CallSite makeConcatWithConstants(MethodHandles.Lookup lookup,
                                               String name,
                                               MethodType concatType,
                                               String recipe,
                                               Object... constants) throws StringConcatException {
  if (DEBUG) {
    System.out.println("StringConcatFactory " + STRATEGY + " is here for " + concatType + ", {" + recipe + "}, " + Arrays.toString(constants));
  }

  return doStringConcat(lookup, name, concatType, false, recipe, constants);
}

The arguments are set as:

  • java.lang.invoke.MethodHandles.Lookup: object for the class in which this call happened (where this dynamic call site occurs)
  • java.lang.String: method name in the call site specifier, which is makeConcatWithConstants
  • java.lang.invoke.MethodType: method descriptor in the call site specifier, which is here (String, String) returning String. This is the type of the method that the returned CallSite is linked to, and that method is going to be invoked with the dynamic arguments (args[0] and args[1]).
  • java.lang.String: recipe, static parameter ("\u0001 and \u0001")
  • java.lang.Object[]: empty here

The bootstrap method itself is a regular static Java method; the JVM calls it through its MethodHandle, as if by an invokevirtual of MethodHandle.invoke.
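
The same call can be reproduced by hand from plain Java code. The sketch below looks up the bootstrap method as a MethodHandle, invokes it with the arguments listed above, and then calls the target of the returned CallSite. It only approximates what the JVM does when linking the call site; the class name ManualBootstrap and its concat helper are made up for this example:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class ManualBootstrap {
    static String concat(String first, String second) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();

            // Descriptor of the bootstrap method (constant pool entry #42).
            MethodType bootstrapType = MethodType.methodType(
                    CallSite.class, MethodHandles.Lookup.class, String.class,
                    MethodType.class, String.class, Object[].class);

            // Roughly what the MethodHandle entry #19 (REF_invokeStatic) resolves to.
            MethodHandle bootstrap = lookup.findStatic(
                    StringConcatFactory.class, "makeConcatWithConstants", bootstrapType);

            // Descriptor from the call site specifier: (String, String) returning String.
            MethodType concatType =
                    MethodType.methodType(String.class, String.class, String.class);

            // Run the bootstrap method; no extra constants, so the Object[] stays empty.
            CallSite callSite = (CallSite) bootstrap.invoke(
                    lookup, "makeConcatWithConstants", concatType, "\u0001 and \u0001");

            // Call the linked method with the dynamic arguments.
            return (String) callSite.dynamicInvoker().invokeExact(first, second);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("1", "2"));
    }
}
```

Run on Java 9 or later, this should print the same "1 and 2" as the compiled example does for the arguments 1 and 2.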

The recipe argument is the interesting one here. It describes how the concatenation is to be performed, and it is processed character by character:

  • \u0001 means the input to concatenation is taken from the dynamic arguments.
  • \u0002 means the input to concatenation is taken from the static bootstrap arguments.
  • any other character means a single character constant to be concatenated.
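
The three rules above can be sketched as a tiny interpreter. This is only an illustration of how a recipe reads, under the assumption that arguments and constants are consumed in order; it is not how StringConcatFactory is implemented, and all names in it (RecipeSketch, apply, TAG_ARG, TAG_CONST) are hypothetical:

```java
public class RecipeSketch {
    static final char TAG_ARG = '\u0001';   // take the next dynamic argument
    static final char TAG_CONST = '\u0002'; // take the next static bootstrap argument

    // Walks the recipe character by character and builds the result.
    static String apply(String recipe, Object[] dynamicArgs, Object[] constants) {
        StringBuilder result = new StringBuilder();
        int nextArg = 0;
        int nextConst = 0;
        for (char c : recipe.toCharArray()) {
            if (c == TAG_ARG) {
                result.append(dynamicArgs[nextArg++]);
            } else if (c == TAG_CONST) {
                result.append(constants[nextConst++]);
            } else {
                result.append(c); // any other character is a literal
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The recipe from the example, applied to the arguments "1" and "2".
        System.out.println(apply("\u0001 and \u0001", new Object[] {"1", "2"}, new Object[0]));
    }
}
```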

If we look into the source code of doStringConcat, which is called from makeConcatWithConstants, we see that it generates the actual MethodHandle doing the concatenation, according to a strategy.

This can be observed in OpenJDK 9 by setting the java.lang.invoke.stringConcat.debug system property to true. For example:

$ java -Djava.lang.invoke.stringConcat.debug=true StringConcatExample 1 2

StringConcatFactory MH_INLINE_SIZED_EXACT is here for (String,String)String, { and }, []
1 and 2

MH_INLINE_SIZED_EXACT is the default strategy in Java 9, and it is the one used here. The other strategies can be selected with the java.lang.invoke.stringConcat property. In short, this strategy produces a method that takes the dynamic arguments and returns a single String containing the concatenation result.

As a last note, there are actually two variants of these makeConcat methods. The one I described above is makeConcatWithConstants, and the other is makeConcat. As can be guessed from the name, makeConcat does not receive a recipe or constants, so it can only concatenate the dynamic arguments.
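
For comparison, here is a minimal sketch that calls makeConcat directly. There is no recipe, so the result is simply the dynamic arguments joined in order. The class name MakeConcatExample, the concat helper and the choice of a (String, int) signature are assumptions made for the example:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class MakeConcatExample {
    static String concat(String text, int number) {
        try {
            // The concatenation takes a String and an int and returns a String.
            MethodType concatType =
                    MethodType.methodType(String.class, String.class, int.class);

            // No recipe and no constants: only the lookup, a name and the type.
            CallSite callSite = StringConcatFactory.makeConcat(
                    MethodHandles.lookup(), "makeConcat", concatType);

            return (String) callSite.dynamicInvoker().invokeExact(text, number);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("x", 7));
    }
}
```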

Summary

How does String concatenation work in Java 9+?

  • The compiler places an invokedynamic call at the point of concatenation
  • The first time it is executed, the invokedynamic call runs the bootstrap method makeConcat[WithConstants]
  • This method, according to a strategy, returns a CallSite linked to a method
  • The linked method is called with the dynamic arguments, giving the result of the concatenation

The principle should stay the same, but StringConcatFactory::makeConcat[WithConstants] or the default strategy may be different in Java 10, 11 or 12. That is actually the whole point of this JEP: to make such changes easy.