Writing An Interpreter In Rust (Part 2)
Note: This is a follow-up post to Writing An Interpreter In Rust
Thanks for all your feedback, explanations and pull requests! I’m pretty overwhelmed by the response and I can confirm that the Rust community is very friendly and extremely helpful.
In this post, I want to quickly review some of the changes to my implementation of the Monkey interpreter that I’ve made with your help. I’ve implemented a small benchmark (Disclaimer: I’ve just put next_token() into a #[bench] function, and I’m not sure whether that’s the best practice for this kind of test), and the results are really impressive so far!
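For context, the benchmark boils down to something like this on nightly Rust. This is only a minimal sketch: the input program, the Lexer::new() constructor and the loop that drains the tokens are placeholders for illustration, not the exact code from the repository.

#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_next_token(b: &mut Bencher) {
        // Hypothetical Monkey snippet to tokenize on every iteration.
        let input = "let add = fn(x, y) { x + y; }; let result = add(5, 10);";
        b.iter(|| {
            let mut lexer = Lexer::new(input);
            // Pull tokens until the lexer reports the end of the input.
            while lexer.next_token() != Token::EndOfFile {}
        });
    }
}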
First implementation:
$ cargo bench
[...]
test tests::bench_next_token ... bench: 30,614 ns/iter (+/- 2,177)
Current implementation:
$ cargo bench
[...]
test tests::bench_next_token ... bench: 3,863 ns/iter (+/- 627)
The new implementation is roughly eight times faster than the first one! Let’s have a look at the changes:
Use built-in functions
The first (small) step was to replace hand-written custom functions with built-in Rust functions where possible. For example, the function is_digit() can be replaced with Rust’s built-in is_numeric(). Unfortunately, there’s no built-in function to check whether a character is either a letter or an underscore, but our custom function is_letter() could at least make use of the built-in function is_alphabetic().
fn is_letter(ch: char) -> bool {
    // This was the old statement:
    // 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'
    ch.is_alphabetic() || ch == '_'
}
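The digit check works the same way. Sketched out, the old helper can simply delegate to the standard library (or disappear entirely in favour of calling ch.is_numeric() at the call site):

fn is_digit(ch: char) -> bool {
    // Roughly what the hand-written check amounted to:
    // '0' <= ch && ch <= '9'
    ch.is_numeric()
}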
In another commit, the loop that skips whitespace could be simplified using the built-in function is_whitespace().
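As a small, self-contained illustration (not the exact code from that commit), a whitespace-skipping loop over a character iterator can be as simple as this:

fn main() {
    let mut chars = "   \t\n let".chars().peekable();
    // Advance past whitespace, using the built-in check.
    while chars.peek().map_or(false, |ch| ch.is_whitespace()) {
        chars.next();
    }
    assert_eq!(chars.next(), Some('l'));
}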
Improve the enum
Someone pointed out in one of the many Reddit comments that Rust enums can store data directly in their variants. With this, it was possible to drop the Token struct completely and merge TokenType and Token into a single enum. We can store the literals for Ident and Integer as a String, while keeping the other variants plain and simple:
#[derive(Debug, PartialEq)]
pub enum Token {
    Illegal,
    EndOfFile,
    // Literals are stored as strings
    Ident(String),
    Integer(String),
    // [...]
}
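One nice consequence is that pattern matching now hands us the literal directly. A hypothetical helper (not code from the repository) shows the idea:

// Pattern matching extracts the stored literal along with the token kind.
fn describe(token: &Token) -> String {
    match token {
        Token::Ident(name) => format!("identifier '{}'", name),
        Token::Integer(value) => format!("integer literal {}", value),
        Token::EndOfFile => String::from("end of file"),
        other => format!("{:?}", other),
    }
}

Since each variant carries its own String, there is no separate literal field that has to be kept in sync with a token type anymore.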
See this commit to review the whole changeset.
Custom iterator
The biggest change was switching to Rust’s standard-library Peekable<Chars> iterator instead of rolling our own. The following struct was the original Lexer:
pub struct Lexer<'a> {
    input: &'a str,
    position: usize,
    read_position: usize,
    ch: Option<char>,
}
Two people pointed out that running a custom iterator the way I did is actually not very efficient. With their help, and with Jay Kickliter’s implementation as a reference, the struct could be reduced to a single Peekable<Chars>:
pub struct Lexer<'a> {
    input: Peekable<Chars<'a>>,
}
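With Peekable, advancing and looking ahead map directly onto the iterator’s next() and peek() methods. The helpers below (read_char, peek_char) are only an assumption for illustration and build on the struct shown above; the actual methods in the repository may look different:

impl<'a> Lexer<'a> {
    // Build the lexer from a string slice by wrapping its character iterator.
    pub fn new(input: &'a str) -> Self {
        Lexer { input: input.chars().peekable() }
    }

    // Consume and return the next character.
    fn read_char(&mut self) -> Option<char> {
        self.input.next()
    }

    // Look at the next character without consuming it.
    fn peek_char(&mut self) -> Option<&char> {
        self.input.peek()
    }
}

Peekable buffers at most one element internally, which is exactly the kind of single-character lookahead the old position/read_position/ch trio was tracking by hand.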
See this commit for the complete adaptation.
Thanks
Thanks again for all the valuable feedback I received via different channels! Your comments and pull requests are still welcome!
The feedback and pull requests of the following four Rustaceans were especially helpful, so let me give you guys a special shoutout!
- Taryn Hill
- Jay Kickliter
- Cameron Derwin
- Tohie
You can find the Writing An Interpreter In Rust project on GitHub and my contact details here.