Writing a Java byte-code compiler for JRuby
===========================================

$Id$

Copyright (c) Anders Bengtsson 2002


0. About
--------

I'm writing this document at the same time as we are implementing the
compiler, so not everything here is necessarily correct or even remotely
true.


1. Introduction
---------------

The transformation of Ruby code into Java byte-codes is done in several
steps. The first step is to transform Ruby source into an AST, which is
already done as part of the interpreter.

The compilation of the AST into Java byte-codes is then done in two
steps, since we don't want to deal with the horrors of the JVM at the
same time as the horrors of the AST.


2. AST -> Ruby byte-code
------------------------

The first step in compilation is translating the AST into custom
high-level byte-codes. The byte-codes assume a VM with an "operand
stack", the same model the Java VM uses.

These byte-codes serve several purposes. They are intended to be easy to
translate into JVM byte-code, and they could also be interpreted
directly.

Ideally the byte-codes should also be at a slightly higher abstraction
level than the syntax-oriented AST. We flatten the AST and make explicit
some information that is hidden in it, such as the number of arguments a
particular method call uses.

A simple example, "x = 10", is in AST form a tree like this:

    newline-node
      local-assignment-node (variable-index = 3)
        fixnum-node (value = 10)

When transformed to byte-codes it looks like this:

    push-fixnum (value = 10)
    assign-local (variable-index = 3)

An interpreter working on these byte-codes would probably be faster than
the AST-walking interpreter. But since we already have a working
interpreter, we instead focus on getting to JVM byte-codes.


3. Ruby byte-code -> JVM byte-code
----------------------------------

3.1 Invoking compiled code
--------------------------

This is where it gets interesting.
The big question is perhaps not how to do the compilation, but how to
use the resulting byte-code. ASTs and Ruby byte-code can be passed around
as objects and used in many different ways, but Java byte-code has to be
neatly placed in methods within classes. How do we integrate that into
our Ruby runtime?

We can always use reflection callbacks to reach our generated code, but
that would be slow, probably slower than our interpreted code, which uses
indexed callbacks(*). A better approach would use direct, compiled calls
to our methods. This could be done with custom generated callback
classes or with something similar to the old indexed callbacks.

*) See IndexedCallback, ReflectionCallback


3.2 Compilation units
---------------------

The most direct mapping from Ruby's code structure onto Java's class
structure is to compile each Ruby file into a Java class file. The
top-level code of the file is compiled into one rubyMain() method, and
all other method and block bodies into methods of their own.

This also makes sense from the user's point of view: each '.rb' file
they see can be compiled into a corresponding binary file.

Note that we do not use Java's object-oriented features here. Since we
have an entirely separate OO model, we just use Java classes as a place
to store code. For this reason it is important that we do not use the
'.class' extension for the generated files, since that would confuse
users.

[Add example here]


3.3 The environment
-------------------

The compiled code doesn't run in a vacuum. It needs access to the Ruby
runtime just as much as interpreted code does. To keep code generation
as easy as possible, this environment must be very simple. For this
purpose, the two variables 'runtime' and 'self' are passed to every Java
method that implements Ruby code.


3.4 Handling exceptions
-----------------------

[To be written]
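

Appendix A. A sketch of section 2
---------------------------------

The first compilation step from section 2 can be sketched on the
"x = 10" example. This is a minimal toy under stated assumptions, not
JRuby's implementation: the node records, the Op enum, the Insn record
and the compile() and run() methods are hypothetical names invented for
this illustration, local variables are plain long values rather than
Ruby objects, and assign-local is assumed to pop its value off the
operand stack. It needs Java 16 or later (records and pattern matching
for instanceof).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

// Toy sketch of section 2: flatten an AST into operand-stack byte-codes,
// then interpret them. All names here are hypothetical, not JRuby's API.
public class ToyRubyCompiler {

    // Hypothetical AST node types (JRuby's real node classes differ).
    interface Node {}
    record FixnumNode(long value) implements Node {}
    record LocalAsgnNode(int index, Node valueNode) implements Node {}
    record NewlineNode(Node next) implements Node {}

    // Hypothetical high-level byte-codes, each with a single operand.
    enum Op { PUSH_FIXNUM, ASSIGN_LOCAL }

    record Insn(Op op, long operand) {
        @Override
        public String toString() {
            // PUSH_FIXNUM -> "push-fixnum", matching the notation in section 2
            return op.name().toLowerCase(Locale.ROOT).replace('_', '-') + " " + operand;
        }
    }

    // Flatten the tree into a linear instruction list.
    static List<Insn> compile(Node root) {
        List<Insn> out = new ArrayList<>();
        emit(root, out);
        return out;
    }

    private static void emit(Node node, List<Insn> out) {
        if (node instanceof NewlineNode n) {
            // Newline nodes mark line boundaries; this sketch emits nothing for them.
            emit(n.next(), out);
        } else if (node instanceof LocalAsgnNode a) {
            // Post-order: the value is pushed onto the stack before it is stored.
            emit(a.valueNode(), out);
            out.add(new Insn(Op.ASSIGN_LOCAL, a.index()));
        } else if (node instanceof FixnumNode f) {
            out.add(new Insn(Op.PUSH_FIXNUM, f.value()));
        } else {
            throw new IllegalArgumentException("unsupported node: " + node);
        }
    }

    // Interpret the byte-codes directly; returns the final local variable slots.
    static long[] run(List<Insn> code, int localCount) {
        Deque<Long> stack = new ArrayDeque<>();
        long[] locals = new long[localCount];
        for (Insn insn : code) {
            switch (insn.op()) {
                case PUSH_FIXNUM -> stack.push(insn.operand());
                // Assumption: assign-local pops its value. Since Ruby assignments
                // are expressions, a fuller design would leave the value on the stack.
                case ASSIGN_LOCAL -> locals[(int) insn.operand()] = stack.pop();
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        // "x = 10" with x in local slot 3, as in the example in section 2.
        Node ast = new NewlineNode(new LocalAsgnNode(3, new FixnumNode(10)));
        List<Insn> code = compile(ast);
        System.out.println(code);                      // [push-fixnum 10, assign-local 3]
        long[] locals = run(code, 4);
        System.out.println("local[3] = " + locals[3]); // local[3] = 10
    }
}
```

Running main() prints the flattened instruction list followed by the
value stored in local slot 3. The run() loop also illustrates that these
byte-codes could be interpreted directly, although the goal stated in
section 2 is to translate them further into JVM byte-code.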