Fooling Truffle’s Intelligence: Part 2

Keen readers who have seen the previous posts here will easily spot a problem that I overlooked. I address it later in this post.

I got a JVM built with Graal as the main JIT, and performance looks a bit more reasonable now. An iteration that takes 1000 msec before compilation still takes 1000 msec after compilation, which makes somewhat more sense. I am still trying to think of a better example (for instance, a primality test or something).

Primality test: I ended up creating a highly inefficient primality tester that manages to stress the JIT a bit more. Under Graal, an outer iteration drops from 2030 msec to 1141 msec; under the original -server JIT, it drops from around 800 msec to around 700 msec:

public static class HardMathInnerNode extends Node {
		public long exec(long i){
			for(long m = 2; m < Math.sqrt(i); m++){
				// very inefficient primality test for testing performance
				if(i%m==0) return i;
			}
			return i+1;
		}
	}
	
	public static class HardMathNode extends Node {
		public HardMathNode(){
			this.innerNode = new HardMathInnerNode();
		}
		
		@Child HardMathInnerNode innerNode;
		public long exec(long m) {
			long i;
			for (i = 0; i < 1_000_00; i++) {
				i  = innerNode.exec(i);
				
				
				if (CompilerDirectives.injectBranchProbability(0.00001, i % 1_000_000 == 0) && CompilerDirectives.injectBranchProbability(0.01, m == 42)) {
					//System.out.println("TRAP");

				}
			}
			return i;
		}
	}

I’m running 60 iterations of the above code.
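
To make it easier to see what the inner node actually returns, here is a minimal plain-Java sketch of the same loop with no Truffle dependency. The class name PrimalitySketch is mine and not part of the benchmark. One thing it shows: because the bound is the strict m < Math.sqrt(i), squares of primes such as 9 are never divided by their root and so count as "prime". That does not matter for stressing the JIT, but it is worth knowing when reading the results.

```java
public class PrimalitySketch {
    // Same logic as HardMathInnerNode.exec above:
    // returns i when a divisor below sqrt(i) is found, i + 1 otherwise.
    static long exec(long i) {
        for (long m = 2; m < Math.sqrt(i); m++) {
            if (i % m == 0) return i;
        }
        return i + 1;
    }

    public static void main(String[] args) {
        System.out.println(exec(7)); // 8: no divisor found, treated as prime
        System.out.println(exec(8)); // 8: divisible by 2
        System.out.println(exec(9)); // 10: 3 is never tried, since 3 < 3.0 is false
    }
}
```

Since the outer loop assigns the result back to its own counter, every value that passes the test also makes the loop skip the next number.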

The actual problem: If one looks carefully at the code above, the child node is declared as @Child HardMathInnerNode innerNode, with no access modifier. While this may look innocuous, the field being package-private keeps the JIT and compiler from doing anything about inlining the call, which makes the compilation nearly useless. I updated the code to declare child nodes as private, and the time per iteration is now around 750-800 msec before compilation and 600-610 msec after compilation. Strangely, a few iterations right after compilation are still just as slow, perhaps because the new code is being swapped in lazily.

VM args are -server -G:+TraceTruffleCompilationDetails -XX:+TraceDeoptimization -G:+TraceTrufflePerformanceWarnings -G:TruffleCompilationThreshold=5. With these, the code starts out with the old HotSpot optimizations applied to the Truffle runtime code itself, and then transitions to Truffle-aware Graal optimizations.

package com.wordpress.hextruffle.tests;

import java.util.Random;

import com.oracle.truffle.api.CallTarget;
import com.oracle.truffle.api.CompilerDirectives;
import com.oracle.truffle.api.Truffle;
import com.oracle.truffle.api.TruffleRuntime;
import com.oracle.truffle.api.CompilerDirectives.CompilationFinal;
import com.oracle.truffle.api.frame.VirtualFrame;
import com.oracle.truffle.api.nodes.Node;
import com.oracle.truffle.api.nodes.RootNode;

//-G:+TraceTruffleCompilationDetails -XX:+TraceDeoptimization -G:+TraceTrufflePerformanceWarnings -G:TruffleCompilationThreshold=5

public class LoopInvocationTest {
	public static class DivisibilityTestNode extends Node {
		final int target;
		
		public DivisibilityTestNode(int target) {
			super();
			this.target = target;
		}

		public boolean exec(long i, long m){
			return (i%m==0);
		}
	}
	
	
	public static class HardMathInnerNode extends Node {
		private @Child DivisibilityTestNode dtn = new DivisibilityTestNode(0);
		
		public long exec(long i){
			for(long m = 2; m < Math.sqrt(i); m++){
				// very inefficient primality test for testing performance
				if(dtn.exec(i, m)) return i;
				//if(i%m==0) return i;
			}
			return i+1;
		}
	}
	
	public static class HardMathNode extends Node {
		public HardMathNode(){
			this.innerNode = new HardMathInnerNode();
		}
		
		private @Child HardMathInnerNode innerNode;
		public long exec(long m) {
			long i;
			for (i = 0; i < 1_000_00; i++) {
				i  = innerNode.exec(i);
				
				
				if (CompilerDirectives.injectBranchProbability(0.00001, i % 1_000_000 == 0) && CompilerDirectives.injectBranchProbability(0.01, m == 42)) {
					//System.out.println("TRAP");

				}
			}
			return i;
		}
	}

	public static class ImmutableNode extends Node {

		private @Child
		HardMathNode child;

		public ImmutableNode() {
			this.child = new HardMathNode();
		}

		public long exec(Integer q) {
			long i = 0;
			for (long m = 0; m < 10; m++) {
				child.exec(m+q);
				//i += Math.min(0, child.exec());
			}
			i += child.exec(q);
			if (CompilerDirectives.inCompiledCode())
				return q + i;
			return -q - i;
		}

	}

	public static class TestNode extends RootNode {
		private @Child ImmutableNode child;

		public TestNode(ImmutableNode child) {
			super();
			this.child = child;
		}

		@Override
		public Object execute(VirtualFrame frame) {
			System.out.println(CompilerDirectives.inCompiledCode());
			return child.exec((Integer) frame.getArguments()[0]);
		}

	}

	public static void main(String[] args) {
		TruffleRuntime runtime = Truffle.getRuntime();
		TestNode root = new TestNode(new ImmutableNode());
		CallTarget tgt = runtime.createCallTarget(root);
		for (int i = 0; i < 60; i++) {
			long start = System.currentTimeMillis();
			long r = (long) tgt.call(i);
			System.out.println(System.currentTimeMillis()-start+":"+r);
		}
	}
}
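
The timings quoted in this post come from the simple System.currentTimeMillis() loop in main. As a rough sketch of a slightly more careful way to record them, the helper below uses System.nanoTime(), which is intended for measuring elapsed time, and keeps every per-iteration value so the slow iterations around the compilation point stay visible. The class IterationTimer, its method, and the stand-in workload are hypothetical names of mine, and nothing here touches Truffle.

```java
import java.util.function.LongSupplier;

public class IterationTimer {
    // Accumulates workload results so the JIT cannot discard the work as dead code.
    static long sink;

    // Runs the workload `iterations` times and returns the elapsed msec of each run.
    static long[] timeIterations(LongSupplier workload, int iterations) {
        long[] elapsedMs = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            sink += workload.getAsLong();
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the benchmark this would be the tgt.call(i) invocation.
        long[] times = timeIterations(() -> {
            long acc = 0;
            for (long n = 1; n <= 5_000_000; n++) acc += n % 7;
            return acc;
        }, 20);
        for (int i = 0; i < times.length; i++) {
            System.out.println(i + ": " + times[i] + " msec");
        }
    }
}
```

Printing the whole array, rather than an average, is deliberate: the interesting part is how the numbers change from one iteration to the next.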

Anyway, starting with the command-line parameter -G:TruffleCompilationThreshold=1 apparently throws an assertion error after a few iterations. Can anyone confirm this or explain why it happens?
