# Rava

A Java interpreter written in C99. Compiles and executes Java source code. Faster than Python on all three included benchmarks.

Author: retoor

## Introduction

Rava is a Java interpreter implemented in C. It provides a complete pipeline from Java source code to execution:

- Lexer tokenizes Java source code
- Parser builds an abstract syntax tree
- Semantic analyzer performs type checking
- IR generator produces stack-based bytecode
- Runtime VM executes the bytecode

Supported features:

- Primitives: `int`, `long`, `double`, `boolean`, `char`
- Arrays, strings, and array initializers
- Objects, instance methods, and `instanceof`
- Inheritance and interfaces
- Control flow: `if`/`else`, `while`, `do`-`while`, `for`, enhanced for-each, `switch`/`case`, `break`, `continue`
- Operators: arithmetic, bitwise (AND, OR, XOR, shifts), ternary (`? :`)
- Exception handling: `try`/`catch`/`finally`, `throw`
- Math functions and String methods
- File I/O
- Recursion
- `System.out.println`

Compiles with `-Wall -Wextra -Werror` and produces zero warnings. No memory leaks.

## Installation

```bash
make
```

## Usage

Run a Java file, optionally naming the class and method to execute:

```bash
./rava file.java
./rava file.java ClassName
./rava file.java ClassName method
```

Start the interactive REPL by running `rava` without arguments:

```bash
./rava
```

Example source code:

```java
public class Fibonacci {
    public static int fib(int n) {
        if (n <= 1) {
            return n;
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static int main() {
        System.out.println(fib(30));
        return 0;
    }
}
```

Run the benchmark:

```bash
make benchmark
```

Run all tests:

```bash
make test
```

## Interactive REPL

Rava includes a full-featured interactive interpreter.

```
$ ./rava
Rava 1.0 Interactive Interpreter
Type "%help" for commands, "%quit" to exit

>>> int x = 10;
>>> int y = 20;
>>> x + y
30
>>> int fib(int n) { if (n <= 1) return n; return fib(n-1) + fib(n-2); }
Method 'fib' defined.
>>> fib(10)
55
>>> class Point { public int x; public int y; public Point(int px, int py) { this.x = px; this.y = py; } }
Class 'Point' defined.
>>> Point p = new Point(3, 4);
>>> p.x
3
>>> %whos
Variable   Type    Value
--------   ----    -----
x          int     10
y          int     20
p          Point   null
>>> %quit
```

### REPL Features

- Variable declarations that persist across executions
- Expression evaluation with automatic output
- User-defined methods, callable after definition
- User-defined classes, instantiable after definition
- Array declarations
- Multi-line input with brace, bracket, and parenthesis tracking
- String methods and Math functions
- Control flow statements (`for`, `while`, `if`/`else`, `switch`)

### Magic Commands

| Command    | Description                    |
|------------|--------------------------------|
| %help      | Show help for commands         |
| %whos      | List all variables with types  |
| %who       | List all variable names        |
| %methods   | List session methods           |
| %classes   | List session classes           |
| %reset     | Clear all session state        |
| %clear     | Clear screen                   |
| %debug     | Toggle debug mode              |
| %history   | Show input history             |
| %quit      | Exit REPL                      |

### REPL Tests

```bash
make test_repl
```

## Performance

Rava is faster than Python on all three benchmarks below.

| Benchmark     | Rava    | Python  | Speedup (Python / Rava) |
|---------------|---------|---------|-------------------------|
| Fibonacci(30) | 257 ms  | 291 ms  | 1.13x                   |
| Primes(100k)  | 273 ms  | 416 ms  | 1.52x                   |
| Sum(10M)      | 666 ms  | 1104 ms | 1.66x                   |

The Fibonacci benchmark started at 1402 ms. After nine optimization phases it runs in 257 ms, roughly a 5.5x speedup (1402 / 257 ≈ 5.46).
## Structure

```
rava/
├── lexer/
│   ├── lexer.h
│   ├── lexer_tokenizer.c
│   ├── lexer_keywords.c
│   └── lexer_literals.c
├── parser/
│   ├── parser.h
│   ├── parser.c
│   ├── parser_expressions.c
│   ├── parser_statements.c
│   ├── parser_declarations.c
│   └── parser_printer.c
├── types/
│   ├── types.h
│   └── types.c
├── semantic/
│   ├── semantic.h
│   ├── semantic.c
│   ├── symbol_table.h
│   └── symbol_table.c
├── ir/
│   ├── ir.h
│   ├── ir.c
│   ├── ir_gen.h
│   └── ir_gen.c
├── runtime/
│   ├── runtime.h
│   ├── runtime.c
│   ├── nanbox.h
│   ├── fastframe.h/c
│   ├── labeltable.h/c
│   ├── methodcache.h/c
│   ├── superinst.h/c
│   └── gc/
├── repl/
│   ├── repl.h/c
│   ├── repl_session.h/c
│   ├── repl_input.h/c
│   ├── repl_executor.h/c
│   ├── repl_output.h/c
│   ├── repl_commands.h/c
│   ├── repl_history.h/c
│   ├── repl_types.h
│   ├── tests/
│   └── examples/
├── tests/
│   └── test_*.c
├── examples/
│   └── *.java
└── Makefile
```

## Optimization

The runtime was optimized in nine phases, using established techniques from V8, LuaJIT, and CPython. The main techniques are described below.

### NaN Boxing

64-bit value representation that stores non-double values in the unused payload bits of IEEE 754 NaNs. The technique is used in engines such as SpiderMonkey and LuaJIT. Every value fits in 8 bytes instead of 16. Branchless type checking via bitwise operations.

Location: `runtime/nanbox.h`

### Fast Frames

Pre-allocated frame pool with LIFO stack discipline, a standard technique in Lua and LuaJIT. No heap allocation per function call. Constant-time allocation. Cache-friendly contiguous memory.

Location: `runtime/fastframe.h`, `runtime/fastframe.c`

### Label Table

O(1) jump resolution via a pre-computed label-to-PC mapping, replacing an O(n) linear search for each jump target. Resolving jump targets before execution is standard practice in bytecode interpreters such as CPython and LuaJIT.

Location: `runtime/labeltable.h`, `runtime/labeltable.c`

### Method Cache

Hash-based method lookup cache, inspired by the inline caching used in V8 and the HotSpot JVM. Lookups take O(1) instead of an O(n*m) nested search over classes and their methods. Cache hit rate is typically above 90%.
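The sketch below shows the general shape of a direct-mapped, hash-based method cache in C. It is a hypothetical illustration, not Rava's `methodcache.c`: the `Class`/`Method` layout, `mcache_lookup`, `slow_lookup`, and the 256-entry size are all assumptions. It also assumes method names are interned, so the cache can key on the name pointer instead of comparing strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical object model: single inheritance, flat method tables. */
typedef struct Method {
    const char *name;
    int id;
} Method;

typedef struct Class {
    const char *name;
    struct Class *super;
    Method *methods;
    int method_count;
} Class;

#define MCACHE_SIZE 256 /* power of two, so the slot index is a bit mask */

typedef struct {
    const Class *cls;  /* receiver class (cache key, part 1) */
    const char *name;  /* interned method name (cache key, part 2) */
    Method *method;    /* cached result, may be NULL */
} McacheEntry;

static McacheEntry mcache[MCACHE_SIZE]; /* zero-initialized: every slot empty */
unsigned long mcache_hits, mcache_misses;

/* Slow path: walk up the class chain and scan each method table. */
static Method *slow_lookup(const Class *cls, const char *name) {
    for (const Class *c = cls; c != NULL; c = c->super)
        for (int i = 0; i < c->method_count; i++)
            if (strcmp(c->methods[i].name, name) == 0)
                return &c->methods[i];
    return NULL;
}

/* Mix the class pointer and the name pointer into a slot index. */
static size_t mcache_index(const Class *cls, const char *name) {
    uintptr_t h = ((uintptr_t)cls >> 4) ^ ((uintptr_t)name >> 3) * 31u;
    return (size_t)(h & (MCACHE_SIZE - 1));
}

Method *mcache_lookup(const Class *cls, const char *name) {
    assert(cls != NULL && name != NULL);
    McacheEntry *e = &mcache[mcache_index(cls, name)];
    if (e->cls == cls && e->name == name) { /* hit: two pointer compares */
        mcache_hits++;
        return e->method;
    }
    mcache_misses++;
    Method *m = slow_lookup(cls, name);
    /* Direct-mapped: a miss simply overwrites whatever occupied the slot. */
    e->cls = cls;
    e->name = name;
    e->method = m;
    return m;
}
```

Looking up an inherited method on a subclass takes the slow path once; later lookups with the same class and name pointer are answered from the cache. Missing methods are cached as `NULL` too. A runtime in which classes can be redefined or freed would need to clear the cache when that happens.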
Location: `runtime/methodcache.h`, `runtime/methodcache.c`

### Superinstructions

Bytecode fusion that combines common opcode sequences into a single instruction. Developed by Ertl and Krall; used in LuaJIT and CPython 3.11+. Reduces instruction dispatch overhead.

Fused opcodes:

- `INC_LOCAL`: load + const 1 + add + store
- `DEC_LOCAL`: load + const 1 + sub + store
- `ADD_LOCAL_TO_LOCAL`: fused accumulator pattern
- `LOAD_LOCAL_CONST_LT_JUMPFALSE`: fused loop condition
- `LOAD_TWO_LOCALS`: combined local loads

Location: `runtime/superinst.h`, `runtime/superinst.c`

### Computed Goto

Faster opcode dispatch using GCC's labels-as-values extension (also supported by Clang). Each opcode handler jumps directly to the next handler through a table of label addresses instead of looping back to a central `switch` statement. This removes the switch's bounds check and gives every handler its own indirect branch, which the CPU's branch predictor can predict more accurately.

### Profile-Guided Optimization

PGO build using GCC profile instrumentation. Collects runtime data from benchmark runs, then rebuilds with that profile so the compiler can optimize the hot paths.

```bash
make pgo
```

## References

### Source Repositories

- V8 JavaScript Engine: https://github.com/v8/v8
- LuaJIT: https://github.com/LuaJIT/LuaJIT
- CPython: https://github.com/python/cpython
- PyPy: https://github.com/pypy/pypy
- OpenJDK HotSpot: https://github.com/openjdk/jdk
- SpiderMonkey: https://hg.mozilla.org/mozilla-central

### Documentation

- Lua Manual: https://www.lua.org/manual/5.4/
- GCC Optimization Options: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
- LLVM Documentation: https://llvm.org/docs/
- JVM Specification: https://docs.oracle.com/javase/specs/

### Standards

- IEEE 754 Floating Point: https://ieeexplore.ieee.org/document/8766229

### Performance Resources

- Agner Fog, CPU Optimization: https://www.agner.org/optimize/
- Brendan Gregg, Systems Performance: http://www.brendangregg.com/