I’ve written a JVM in Rust · Andrea Bergia’s Web site
![](https://blinkingrobots.com/wp-content/uploads/2023/07/I-have-written-a-JVM-in-Rust-·-Andrea-Bergias.png)
Printed Wednesday, Jul 12, 2023
–
2164 phrases, 11 minutes
These days I’ve been spending fairly a little bit of time studying Rust, and as any sane individual would do, after writing a couple of 100 strains packages I’ve determined to tackle one thing slightly bit extra formidable: I’ve written a Java Digital Machine in Rust. ???? With plenty of originality, I’ve referred to as it rjvm
. The code is out there on GitHub.
I need to stress that it is a toy JVM, constructed for studying functions and never a severe implementation. Specifically, it doesn’t assist:
- generics
- threads
- reflection
- annotations
- I/O
- simply in time compiler
- string interning
Nonetheless, there are fairly a couple of non-trivial issues carried out:
- management circulation statements (
if, for, ...
) - primitive and object creations
- digital and static technique invocation
- exceptions
- rubbish assortment
- class decision from a
jar
file
For instance, the next is a part of the check suite:
class StackTracePrinting {
public static void predominant(String[] args) {
Throwable ex = new Exception();
StackTraceElement[] stackTrace = ex.getStackTrace();
for (StackTraceElement ingredient : stackTrace) {
tempPrint(
ingredient.getClassName() + "::" + ingredient.getMethodName() + " - " +
ingredient.getFileName() + ":" + ingredient.getLineNumber());
}
}
// We use this rather than System.out.println as a result of we do not have actual I/O
non-public static native void tempPrint(String worth);
}
It makes use of the true rt.jar
containing the lessons from the OpenJDK 7 – thus, within the instance above, the java.lang.StackTraceElement
class comes from an actual JDK!
I’m very proud of what I’ve realized, about Rust and about find out how to implement a digital machine. Specifically, I’m tremendous comfortable about having carried out an actual, working, rubbish collector. It’s fairly mediocre, nevertheless it’s mine and I adore it. ???? On condition that I’ve achieved what I got down to do initially, I’ve determined to cease the undertaking right here. I do know there are bugs, however I don’t plan to repair them.
On this publish, I will provide you with an summary of how my JVM works. In additional articles, I’ll go into extra element about a number of the points mentioned right here.
Code group
The code is a typical Rust undertaking. I’ve cut up it into three crates (i.e. packages):
reader
, which is ready to learn.class
information and comprises numerous sorts that mannequin their content material;vm
, which comprises the digital machine that may execute the code as a library;vm_cli
, which comprises a quite simple command-line launcher to run the VM, within the spirit of thejava
executable.
I’m contemplating extracting the reader
crate in a separate repository and publishing it on crates.io, because it may really be helpful to another person.
Parsing a .class
file
As you recognize, Java is a compiled language – the javac
compiler takes your .java
supply information and produces numerous .class
information, usually distributed in a .jar
file – which is only a zip
. Thus, the very first thing to do to execute some Java code is to load a .class
file, containing the bytecode generated by the compiler. A category file comprises numerous issues:
- metadata concerning the class, similar to its identify or the supply file identify
- the superclass identify
- the carried out interfaces
- the fields, together with their sorts and annotations
- and the strategies with:
- their descriptor, which is a string representing the kind of every parameter and the strategy’s return kind
- metadata such because the
throws
clause, annotation, generics data - and the bytecode, together with some further metadata such because the exception handler desk and the road numbers desk.
As talked about above, for rjvm
I’ve created a separate crate, named reader
, which might parse a category file and return a Rust struct that fashions a category and all its content material.
Executing strategies
The principle API of the vm
crate is Vm::invoke
, which is used to execute a technique. It takes a CallStack
, which is able to comprise the assorted CallFrame
, one for every technique being executed. For executing predominant
, the decision stack will initially be empty, and a brand new body shall be created to run it. Then, every perform invocation will add a brand new body to the decision stack. When a technique’s execution completes, its corresponding body shall be dropped and faraway from the decision stack.
Most strategies shall be carried out in Java, and thus their bytecode shall be executed. Nonetheless, rjvm
additionally helps native strategies, i.e. strategies which might be carried out straight by the JVM and never within the Java bytecode. There are fairly a couple of of them within the “decrease components” of the Java API, the place interplay with the working system (for instance, to do I/O) or the assist runtime is important. Some examples of the latter you might need seen embody System::currentTimeMillis
, System::arraycopy
, or Throwable::fillInStackTrace
. In rjvm
, these are carried out by Rust functions.
The JVM is a stack-based virtual machine, i.e. the bytecode directions function primarily on a price stack. There’s additionally a set of native variables, recognized by an index, that can be utilized to retailer values and go arguments to strategies. These are related to every name body in rjvm
.
Modeling values and objects
The sort Value
fashions a attainable worth of a neighborhood variable, stack ingredient, or object’s area, and is carried out as follows:
/// Fashions a generic worth that may be saved in a neighborhood variable or on the stack.
#[derive(Debug, Default, Clone, PartialEq)]
pub enum Worth<'a> {
/// An unitialized ingredient. Ought to by no means be on the stack,
/// however it's the default state for native variables.
#[default]
Uninitialized,
/// Fashions all of the 32-or-lower-bits sorts within the jvm: `boolean`,
/// `byte`, `char`, `quick`, and `int`.
Int(i32),
/// Fashions a `lengthy` worth.
Lengthy(i64),
/// Fashions a `float` worth.
Float(f32),
/// Fashions a `double` worth.
Double(f64),
/// Fashions an object worth
Object(AbstractObject<'a>),
/// Fashions a null object
Null,
}
As an apart, that is one place the place a sum kind, similar to Rust’s enum
, is a superb abstraction – it’s nice for expressing the truth that a price may be of a number of differing kinds.
For storing objects and their values, I initially used a easy struct referred to as Object
containing a reference to the category (to mannequin the item’s kind) and a Vec<Worth>
for storing fields’ values. Nonetheless, once I carried out the rubbish collector, I modified this to make use of a lower-level implementation, with a ton of pointers and casts – fairly C fashion! Within the present implementation, an AbstractObject
(which fashions a “actual” object, or an array) is just a pointer to an array of bytes, which comprise a few header phrases after which the fields’ values.
Executing directions
Executing a technique means executing its bytecode directions, one after the other. The JVM has a large checklist of directions (over 200!), encoded by one byte within the bytecode. Many directions are adopted by arguments, and a few have a variable size. That is modeled within the code by the kind Instruction
:
/// Represents a Java bytecode instruction.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Instruction {
Aaload,
Aastore,
Aconst_null,
Aload(u8),
// ...
The execution of a technique will preserve, as talked about above, a stack and an array of native variables, referred by the directions through their index. It would additionally initialize this system counter to zero – that’s, the deal with of the subsequent instruction to execute. The instruction shall be processed and this system counter up to date – usually superior by one, however numerous bounce directions can transfer it to a special location. These are used to implement all circulation management statements, similar to if
, for
, or whereas
.
A particular household of instruction is fabricated from these that may invoke one other technique. There are numerous methods of resolving which technique must be invoked: digital or static lookup are the primary ones, however there are others. After resolving the proper instruction, rjvm
will add a brand new body to the decision stack and begin the strategy’s execution. The tactic’s return worth shall be pushed to the stack until it’s void
, and execution will resume.
The Java bytecode format is kind of fascinating and I plan to dedicate a publish to the assorted type of directions.
Exceptions
Exceptions are fairly a posh beast to implement since they break the conventional management circulation, and may return early from a technique (and propagate on the decision stack!). I’m fairly proud of the best way I’ve carried out them, although, and I’m going to indicate you a number of the related code.
The very first thing it’s good to know is that any catch
block corresponds to an entry of a technique’s exception desk – every entry comprises the vary of coated program counters, the deal with for the primary instruction within the catch block, and the exception’s class identify which the block catches.
Subsequent, the signature of CallFrame::execute_instruction
is as follows:
fn execute_instruction(
&mut self,
vm: &mut Vm<'a>,
call_stack: &mut CallStack<'a>,
instruction: Instruction,
) -> Outcome<InstructionCompleted<'a>, MethodCallFailed<'a>>
The place the categories are:
/// Doable execution results of an instruction
enum InstructionCompleted<'a> {
/// Signifies that the instruction executed was one of many return household. The caller
/// ought to cease the strategy execution and return the worth.
ReturnFromMethod(Possibility<Worth<'a>>),
/// Signifies that the instruction was not a return, and thus the execution ought to
/// resume from the instruction on the program counter.
ContinueMethodExecution,
}
/// Fashions the truth that a technique execution has failed
pub enum MethodCallFailed<'a> {
InternalError(VmError),
ExceptionThrown(JavaException<'a>),
}
and the usual Rust Outcome
kind is:
enum Outcome<T, E> {
Okay(T),
Err(E),
}
Thus, executing an instruction may end up in 4 attainable states:
- the instruction was executed efficiently, and the execution of the present technique can proceed (the usual case);
- the instruction was executed efficiently, and it was a return instruction, thus the present technique ought to return with (optionally) a return worth;
- the instruction couldn’t be executed, as a result of some inner VM error occurred;
- or the instruction couldn’t be executed, as a result of a typical Java exception was thrown.
The code that executes a method is thus as follows:
/// Executes the entire technique
impl<'a> CallFrame<'a> {
pub fn execute(
&mut self,
vm: &mut Vm<'a>,
call_stack: &mut CallStack<'a>,
) -> MethodCallResult<'a> {
self.debug_start_execution();
loop {
let executed_instruction_pc = self.laptop;
let (instruction, new_address) =
Instruction::parse(
self.code,
executed_instruction_pc.0.into_usize_safe()
).map_err(|_| MethodCallFailed::InternalError(
VmError::ValidationException)
)?;
self.debug_print_status(&instruction);
// Transfer laptop to the subsequent instruction, _before_ executing it,
// since we wish a "goto" to override this
self.laptop = ProgramCounter(new_address as u16);
let instruction_result =
self.execute_instruction(vm, call_stack, instruction);
match instruction_result {
Okay(ReturnFromMethod(return_value)) => return Okay(return_value),
Okay(ContinueMethodExecution) => { /* proceed the loop */ }
Err(MethodCallFailed::InternalError(err)) => {
return Err(MethodCallFailed::InternalError(err))
}
Err(MethodCallFailed::ExceptionThrown(exception)) => {
let exception_handler = self.find_exception_handler(
vm,
call_stack,
executed_instruction_pc,
&exception,
);
match exception_handler {
Err(err) => return Err(err),
Okay(None) => {
// Bubble exception as much as the caller
return Err(MethodCallFailed::ExceptionThrown(exception));
}
Okay(Some(catch_handler_pc)) => {
// Re-push exception on the stack and proceed
// execution of this technique from the catch handler
self.stack.push(Worth::Object(exception.0))?;
self.laptop = catch_handler_pc;
}
}
}
}
}
}
}
I do know that there are fairly a couple of implementation particulars on this code, however I hope it provides an concept of how utilizing Rust’s Outcome
and sample matching maps rather well to the outline of the conduct above. I’ve to say I’m somewhat pleased with this code. ????
Rubbish assortment
The ultimate milestone in rjvm
has been the implementation of the rubbish collector. The algorithm I’ve chosen is a stop-the-world (which trivially follows from not having threads!) semispace copying collector. I’ve implemented a (poorer) variant of Cheney’s algorithm – however I actually ought to go and implement the true factor… ????
The thought is to separate the accessible reminiscence into two components, referred to as semispaces: one shall be energetic and used to allocate objects, and the opposite shall be unused. When full, a rubbish assortment shall be triggered and all alive objects shall be copied to the opposite semispace. Then, all references to things shall be up to date, in order that they level to the brand new copies. Lastly, the position of the 2 shall be swapped – just like how blue-green deployment works.
This algorithm has the next traits:
- clearly, it wastes plenty of reminiscence (half of the attainable max reminiscence!);
- allocations are tremendous quick (bumping a pointer);
- copying and compacting objects signifies that it doesn’t need to cope with reminiscence fragmentation;
- compacting objects can enhance performances, as a consequence of higher cache line utilization.
Actual Java VMs use much more refined algorithms, usually generational garbage collectors, similar to G1 or the parallel GC, which use evolutions of the copying technique.
In writing rjvm
, I realized rather a lot and I had plenty of enjoyable. Can’t ask for extra from a facet undertaking… however perhaps subsequent time I’ll choose one thing a bit much less formidable to study a brand new programming language! ????
As an apart, I need to say that I had plenty of enjoyable with Rust. I believe it’s a nice language, as I have written before, and I’ve actually loved utilizing it for implementing my JVM!
In case you are fascinated with additional particulars on how rjvm
is carried out (and on how the JVM really works), keep tuned for the upcoming posts!