chr4

Devops. I've never asked for this.

Writing An Interpreter In Rust (Part 2)

Note: This is a follow-up post to Writing An Interpreter In Rust

Thanks for all your feedback, explainations and pull requests! I’m pretty overwhelmed by the feedback and I can confirm that the Rust community is very friendly and extremly helpful.

In this post, I want to quickly review some of the changes to my implementation of the Monkey interpreter I’ve made with your help. I’ve implemented a small benchmark (Disclaimer: I’ve just put next_token() into a #[bench], I’m not sure whether that’s the best practise for this kind of tests), and the results are really impressive so far!

First implementation:

1
2
3
$ cargo bench
[...]
test tests::bench_next_token ... bench: 30,614 ns/iter (+/- 2,177)

Current implementation:

1
2
3
$ cargo bench
[...]
test tests::bench_next_token ... bench: 3,863 ns/iter (+/- 627)

The new implementation is almost ten times faster than the first one! Let’s have a look at the changes:

Use built-in functions

The first (small) step was to replace hand-written custom functions with built-in Rust functions when possible. For example, the function is_digit() can be replaced with Rust’s built-in is_numeric(). Unfortunately, there’s no built-in function to check whether a character is either a letter or an underscore, but our custom function is_letter() could at least make use of the built-in function is_alphabetic().

1
2
3
4
5
6
fn is_letter(ch: char) -> bool {
    // This was the old statement:
    // 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'

    ch.is_alphabetic() || ch == '_'
}

In another commit, a loop could be improved using the built-in function is_whitespace().

Improve the enum

Someone pointed out in one of the many Reddit comments that Rust can store different types in a single enum. With this, it was possible to drop the Token struct completely and merge TokenType and Token into a single enum. We can store Ident and Integer as a String, while keeping the other elements plain and simple:

1
2
3
4
5
6
7
8
9
10
11
#[derive(Debug, PartialEq)]
pub enum Token {
    Illegal,
    EndOfFile,

    // Literals are stored as strings
    Ident(String),
    Integer(String),

    // [...]
}

See this commit to review the whole changeset.

Custom iterator

The biggest change was made using Rust’s internal Peekable<Chars> iterator instead of rolling our own. The following struct was the original Lexer:

1
2
3
4
5
6
pub struct Lexer<'a> {
    input: &'a str,
    position: usize,
    read_position: usize,
    ch: Option<char>,
}

Two people pointed out, that running a custom iterator in the way I did actually is not very efficient. With their help, and Jay Klickliter’s implementation as a reference, the struct could be reduced to a single Peekable<Chars>:

1
2
3
pub struct Lexer<'a> {
    input: Peekable<Chars<'a>>,
}

See this commit for the complete adaption.

Thanks

Thanks again for all the valuable feedback I received via different channels! Your comments and pull requests are still welcome!

The feedback and pull requests of the following four Rustaceans was especially helpful, so let me give you guys a special shoutout!

  • Taryn Hill
  • Jay Kickliter
  • Cameron Derwin
  • Tohie

You can find the Writing An Interpreter In Rust project on Github and my contact details here.