Writing An Interpreter In Rust
Last month, Thorsten Ball released his first book: Writing An Interpreter In Go. It’s an awesome book that teaches its readers how to write their own programming language, step by step. It comes bundled with the complete Go code, including tests.
While reading it, I was looking for a challenge. I’m a huge fan of Rust, a safe systems programming language by Mozilla, so I thought it might be a good exercise to port the Go implementation of the programming language “Monkey” to Rust.
Here’s an example of what Monkey code looks like (you can find more examples here):
// Bind values to names with let statements
let version = 1;
let name = "Monkey programming language";
let myArray = [1, 2, 3, 4, 5];
let coolBooleanLiteral = true;
// Use expressions to produce values
let awesomeValue = (10 / 2) * 5 + 30;
let arrayWithValues = [1 + 1, 2 * 2, 3];
I’ve finished implementing the first chapter of the book (the lexer), and decided to share my work. The code is not very idiomatic yet, so I’d welcome pull requests to improve it.
To check out and run my version of the Monkey lexer, use the following commands (in case you do not have Rust or Cargo installed, you can get both via rustup):
$ git clone https://github.com/chr4/writing_an_interpreter_in_rust
$ cd writing_an_interpreter_in_rust
$ cargo test
$ cargo run
While I followed the original source code quite closely, a few things can't be done in Rust the same way they are implemented in Go. Furthermore, I tried to write idiomatic Rust code, which resulted in quite a few adaptations.
Let me walk you through the challenges that I faced, and the resulting changes.
Porting token.go
The first step was to migrate the token constants. Here’s how it was implemented in Go:
const (
    ILLEGAL = "ILLEGAL"
    EOF     = "EOF"

    // Identifiers + literals
    IDENT = "IDENT" // add, foobar, x, y, ...
    INT   = "INT"   // 1343456

    // [...]
)
This can be easily ported to Rust by using constants as well:
pub const ILLEGAL: TokenType = "ILLEGAL";
pub const EOF: TokenType = "EOF";
// Identifiers + literals
pub const IDENT: TokenType = "IDENT"; // add, foobar, x, y, ...
pub const INT: TokenType = "INT"; // 1343456
// [...]
However, there’s a more idiomatic way of handling this in Rust: Rust supports enums, so let’s use an enum instead:
#[derive(Debug, PartialEq)]
pub enum TokenType {
    Illegal,
    EndOfFile,

    // Identifiers + literals
    Ident,   // add, foobar, x, y, ...
    Integer, // 1343456

    // [...]
}
Note: For our implementation, the Debug and PartialEq traits are derived.
This looks way cleaner. Have a look at this commit to see the full changeset necessary to migrate from constants to an Enum.
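As a small aside, here is a minimal, self-contained sketch (not taken from the repository) of what those two derives buy us: Debug makes the variants printable with the {:?} formatter, and PartialEq lets us compare them with == and != — the REPL further down relies on both:

#[derive(Debug, PartialEq)]
pub enum TokenType {
    Illegal,
    EndOfFile,
    Ident,
    Integer,
}

fn main() {
    let tok = TokenType::Ident;
    let eof = TokenType::EndOfFile;

    // Debug: print the variants with the {:?} formatter
    println!("{:?} {:?}", tok, eof); // => Ident EndOfFile

    // PartialEq: compare variants with == and !=
    assert!(tok == TokenType::Ident);
    assert!(tok != TokenType::Illegal);
    assert!(TokenType::Integer != eof);
}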
Porting lexer.go
The first problem I came across when porting the lexer was that the Go version uses the integer 0 (as a byte) to indicate when the end of file is reached. Let’s have a look at the Go readChar() function presented in the book:
func (l *Lexer) readChar() {
    if l.readPosition >= len(l.input) {
        l.ch = 0
    } else {
        l.ch = l.input[l.readPosition]
    }
    l.position = l.readPosition
    l.readPosition += 1
}
The caller then checks whether l.ch is 0. If that’s the case, it generates an EOF token:
switch l.ch {
case '=':
    // [...]
case 0:
    tok.Literal = ""
    tok.Type = token.EOF
default:
    // [...]
}
In Rust, a char can’t take an integer value. It’s more idiomatic to use an Option&lt;char&gt; and return None in case the end of the file is reached. It might be even more idiomatic to return a Result with an EOF error, but choosing Option&lt;char&gt; seemed to be the quicker solution for now.
Edit: A Reddit user pointed out that this statement is not true. You can assign an integer to a char by casting it using the following syntax: let character = 97 as char;. Thanks!
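To make the difference concrete, here is a minimal, runnable sketch of a read_char() built around Option&lt;char&gt;. The field names input, position, read_position and ch are assumptions that mirror the snippets below, not necessarily the repository’s exact layout:

struct Lexer {
    input: String,
    position: usize,
    read_position: usize,
    ch: Option<char>,
}

impl Lexer {
    fn read_char(&mut self) {
        // chars().nth() yields None once we read past the end of the input,
        // so no sentinel value like 0 is needed
        self.ch = self.input.chars().nth(self.read_position);
        self.position = self.read_position;
        self.read_position += 1;
    }
}

fn main() {
    let mut lexer = Lexer {
        input: "=".to_string(),
        position: 0,
        read_position: 0,
        ch: None,
    };

    lexer.read_char();
    assert_eq!(lexer.ch, Some('='));

    lexer.read_char();
    assert_eq!(lexer.ch, None); // end of input reached
    assert_eq!(lexer.position, 1);
}

With that in place, the corresponding match in the token loop handles None as the EOF case: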
match self.ch {
    Some(ch @ '=') => {
        // [...]
    }

    // Handle EOF
    None => {
        tok.literal = String::new();
        tok.token_type = token::TokenType::EndOfFile;
    }
}
The full changeset is implemented in this commit.
Furthermore, I’ve changed the peekChar() function to peek_char_eq(ch: char) -> bool, which seemed like a quick solution for not having to return another Option&lt;char&gt; type.
fn peek_char_eq(&mut self, ch: char) -> bool {
    // Return false on EOF
    if self.read_position >= self.input.len() {
        return false;
    }

    // Look at the character at read_position without consuming it
    let peek_ch = self.input
        .chars()
        .nth(self.read_position)
        .unwrap();

    peek_ch == ch
}
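For illustration, here is a small, standalone sketch (not the repository’s actual next_token()) of why a bool-returning peek is enough, using two-character operators like == as an example:

// Standalone analogue of the peek helper: look at the char at read_position
// and report whether it matches the expected one (false on EOF).
fn peek_char_eq(input: &str, read_position: usize, ch: char) -> bool {
    match input.chars().nth(read_position) {
        Some(peek_ch) => peek_ch == ch,
        None => false,
    }
}

fn main() {
    let input = "==";

    // Pretend the lexer just consumed the first '=', so read_position is 1.
    if peek_char_eq(input, 1, '=') {
        println!("emit an Equal token (==)");
    } else {
        println!("emit an Assign token (=)");
    }
}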
Porting the REPL
Porting the REPL was pretty straightforward. Read a line from STDIN and then run the lexer on it.
The only pitfall here is that STDOUT needs to be flushed manually so the prompt >> is actually displayed:
use std::io::{self, BufRead, Write};

pub mod lexer;
pub mod token;

fn main() {
    let stdin = io::stdin();

    loop {
        // Stdout needs to be flushed, due to missing newline
        print!(">> ");
        io::stdout().flush().expect("Error flushing stdout");

        let mut line = String::new();
        stdin.lock().read_line(&mut line).expect("Error reading from stdin");

        let mut lexer = lexer::Lexer::new(&mut line);

        loop {
            let tok = lexer.next_token();
            println!("{:?}", tok);

            if tok.token_type == token::TokenType::EndOfFile {
                break;
            }
        }
    }
}
Running the tests and REPL
This is the exciting part. Let’s see whether the implementation (still) works!
Run the tests:
$ cargo test
Finished debug [unoptimized + debuginfo] target(s) in 0.1 secs
Running target/debug/deps/writing_an_interpreter_in_rust-e35a8839d5e6ef77
running 3 tests
test lexer::new_token_test ... ok
test token::lookup_ident_test ... ok
test lexer::next_token_test ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured
Running target/debug/deps/writing_an_interpreter_in_rust-0dfac4ec669be969
running 3 tests
test token::lookup_ident_test ... ok
test lexer::new_token_test ... ok
test lexer::next_token_test ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured
Doc-tests writing_an_interpreter_in_rust
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
BAM! That worked. Now let’s try the REPL:
$ cargo run
Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/writing_an_interpreter_in_rust`
>> let add = fn(x, y) { x + y; };
Token { token_type: Let, literal: "let" }
Token { token_type: Ident, literal: "add" }
Token { token_type: Assign, literal: "=" }
Token { token_type: Function, literal: "fn" }
Token { token_type: LeftParenthesis, literal: "(" }
Token { token_type: Ident, literal: "x" }
Token { token_type: Comma, literal: "," }
Token { token_type: Ident, literal: "y" }
Token { token_type: RightParenthesis, literal: ")" }
Token { token_type: LeftBrace, literal: "{" }
Token { token_type: Ident, literal: "x" }
Token { token_type: Plus, literal: "+" }
Token { token_type: Ident, literal: "y" }
Token { token_type: Semicolon, literal: ";" }
Token { token_type: RightBrace, literal: "}" }
Token { token_type: Semicolon, literal: ";" }
Token { token_type: EndOfFile, literal: "" }
>>
That’s it! This is my current implementation of chapter 1 of Writing An Interpreter In Go! As mentioned, I’m still fairly new to Rust, and I’d welcome pull requests or hints on how to make the Monkey interpreter in Rust more idiomatic. Feedback welcome!
You can find my contact details here.
Note: There’s a follow-up post reviewing the changes I’ve made with the help of the Rust community. They improved the overall performance by a factor of 10!