The big question when implementing a runtime for a Lisp-like language in Rust is how the interpreter will interact with the GC. It looks like this project has avoided most of the complexity by not implementing actual lists, and allowing multiple references to the same value to exist only in restricted circumstances, hewing close to Rust's ownership model rather than a typical Lisp's. I wonder, is the plan to stay on this side of the space, as a lightweight language with potentially smooth FFI interoperation with other low-level/Rust code? Or to rewrite almost everything to support the high-level semantics of list-processing languages, with their emphasis on shared structure?
The way I’ve approached it in my compiler [1] is that garbage collection can never occur while Lisp/RFI (Rust function interface) code is executing. Once the program returns to the event loop GC can occur with a small number of known roots. This works because there will be no way to block without waiting on a future which can then return to the event loop. I’m considering adding a `conditional-gc` macro for long running computations that’s internally implemented as waiting on a future that’s immediately resolved.
This means that RFI code can treat GCed values as having a `'static` lifetime. One caveat is that RFI code cannot capture values except for a few special runtime functions. This works naturally with the language’s pure functional design, however.
As far as data structures go its impossible to support performant Lisp code by implementing lists as native vectors; the guarantees and idioms are too different. My runtime [2] provides Rust bindings for data types that e.g. allow you to create an `Iterator` from a list or construct a list from one. That should allow transparent interoperability with idiomatic Rust patterns and data structures.
> I wonder, is the plan to stay on this side of the space, as a lightweight language with potentially smooth FFI interoperation with other low-level/Rust code?
I personally hope this is the case; I feel like Lisps in that space are scarce. Strange interactions between the GC and "alien" functions are one of the things that make FFI painful even when using a Lisp with otherwise-thorough FFI support (like Chicken Scheme or most Common Lisps).
I think ultralight scripting languages are an interesting quadrant of language space, but I'm not sure the strengths of Lisp would be evident in a language without lists.
Fair point. Perhaps "Lisp" would be a misnomer, but I do think there's value in an S-expression-based scripting/configuration language even if that language doesn't insist on revolving entirely around linked lists and the semantics/quirks thereof. Seems like vectors/tuples/structs should be just as valid as lists for a program to represent itself.
Great question firethief! I think both approaches -- a) sticking close to rust and supporting beautiful interop, or b) creating our own GC and using interesting data structures (I've wanted to implement persistent vector tries for a while) -- could be very cool.
For now, this was more of a toy project. If you have an interest in pushing either of these directions, would love to talk!
Passing both a &[String] and an index into the function is duplicative, because a slice already has an index in it (the start point). It would be cleaner to just use `tokens.split_first()` to yank the first token off the list, which gives you both that first token plus the remainder, then you can pass the remainder into subsequent calls, and return the remainder at the end when you're done. Or, heck, you could even take a `&mut &[String]` instead, allowing you to mutate the slice directly and not have to return anything.
I love the article, I actually tinkered with something like this a few moths ago.
I tried to navigate back to HN using the back button, however it was inoperable. I found that passing every section heading on this article adds a line in your browsing history. Please don't mess with my browser history while I'm just scrolling. If I click on something that's fine, but not while I'm scrolling. I already have a history of where I am in your article, it's called the SCROLLBAR!
This is a fairly recent addition to medium I think, but I also hate it a lot.
This kind of behavior can make sense on news sites that load another article when you get to the bottom of the page, but adding history entries for each header you haven't interacted with is beyond words.
re: article -- thank you for the kind words TehCorwiz :).
re: scrollbar -- oi -- definitely did not intend that. Will look into this (I am hosted medium, so don't have much ability to move things around, but will see what I can do)
Sorry, that name is reserved for the next step in the project - a parentheses-heavy input syntax for Rust, or more plausibly, for the simplified representation thereof known as MIR. Its biggest impact on the tech industry will actually be to significantly broaden the appeal of the Rust Evangelism Strike Force; indeed, it's expected that very few programs will ever be written in Lust, and that even the programs that are written in the language will be largely toy examples! But the overall effect has nevertheless been considered to be highly worthwhile, so the project will go forward.
It's long been a dream of mine to make some software package that would be immensely useful and offer it for free with the catch being that the name is something childish and embarrassing and offensive.
And then maybe offer a version of it with a nice, normal, appropriate name, but charge money for that version.
I worked on a product sold in the financial industry, and we wanted to ship it with a built-in 3rd-party debugger. The execs complained, so we had to buy and repackage the debugger source as "debug utility".
I thought this was just going to be another forgettable "X implemented in Y", but this is actually a Rust-flavored version of Peter Norvig's classic "(How to Write a (Lisp) Interpreter (in Python))", which is an absolute treasure. Glad I didn't just skip over this!
I did almost the exact same thing (follow Norvig's Lisp tutorial, but in Rust) at one point, though I didn't get as far: https://github.com/eamonnmr/atmoslisp
A bit of feedback to the author: I cringe really, really hard when I see things like:
#[derive(Debug)]
enum RispErr {
Reason(String),
}
One of the beauties of Rust's type system is that you can combine meaningful error types with the From trait to build zero-effort error handling. When you turn every error into a String, you lose information that future programmers (including yourself!) will be glad to have. I know it can be a little bit painful to get used to, but once you do, meaningful error handling will be at your fingertips.
To be helpful to readers of this comment and not just critical of the author, I recommend you don't overthink your error types when you're first starting to write your program. Think of the contents of the enum like a "scratchpad" where you list every failure mode you've encountered so far. While you're still developing, you can greedily match on every variant in your match statements, and it's fine. As you do this, you'll begin to notice themes that will inform the eventual refactor. Best of all, once you do refactor, you won't have to re-understand your code to tease out the various kinds of failures - you'll already have a list of them!
Because I'm on my soapbox, I'll also recommend implementing ToString[1] right off the bat, too. That way you can print out useful error messages immediately.
For example, if your errors are:
enum RispError {
/// Syntax error returning line number and char
SyntaxErr(u32, u32),
/// Parens not balanced; contains number of parens needed
UnbalancedParens(usize),
}
You can impl ToString straight away:
impl ToString for RispErr {
fn to_string(&self) -> String {
match self {
RispErr::SyntaxErr(l,c) => format!("syntax error at line {}, col {}", l, c),
RispErr::UnbalancedParens(n) => format!("use your % key more! add {} more paren", n),
}
}
}
Then, when handling the error, you can
match do_thing() {
Ok(v) => { ... },
Err(e) => { // TODO handle error
println!(e)
}
}
where do_thing's signature looks like
fn do_thing(...) -> Result<T, RispErr>;
It takes the same amount of code, but all your error messages are in the same place, you haven't lost information about them, and refactoring to handle them becomes super easy.
(n.b. I dashed this comment off without actually compiling the above, so please forgive any dumb errors =] )
Instead of implementing `ToString` directly, you should implement `Display`. A `ToString` implementation will automatically be added, and you get all the formatting goodness out of the box.
As a Lisp programmer, the way most other languages handle errors make me cringe. How could I even make a decent ice cream sundae without error recovery? But I do enjoy that 'make a Lisp' is a like a sport in every other language.
Glad to help! I am a huge huge fan of Rust. I'm almost always hanging out on ##rust and ##rust-beginners on Freenode if you want to ask questions synchronously. :)
YAGNI, KISS. Lots of code never ends up caring about differentiating between different error conditions. A lot doesn't even care if the code it called failed - maybe it'll log, but the caller can continue regardless. And for a parser, I'm more likely going to want a struct with file/line info (or a full blown list of context) than a bunch of hardcoded enum cases. Having the enum in place to easily identify refactoring points is nice, but having a bunch of mostly unused variations inside the enum will mostly just distract me, and slow me down, giving me more code to re-understand regardless.
In the cases where I do want to distinguish between error cases, I'd still rather wait until I have the code consuming those error enums to help guide what errors are meaningfully different - evidenced by needing different error handling paths - instead of making every caller decide which of the dozens or hundreds of edge cases they want to be handled which way(s).
There are cases where your approach of distinguishing cases up front would be better - as an example, if you're limited in your ability to refactor due to the interface being part of a public API with semver restrictions to your refactoring - but there's plenty of cases where the author's approach is the correct one, where you should be cringing if you don't see things kept as simple as RispErr.
Oh, I know. :) I mostly spoke up because of my own journey with Rust, and knowing how important articles like these could be to people new to the language. I totally understand why Stringifying errors is appealing in Rust but breaking the mental habit is harder than learning good habits straight away.
This sounds quite cool -- if you have the time, would you mind giving me a quick example? Want to make sure I have a full understanding of what you mean.
Whenever I've read into error writing and handling I inevitability get lost in the weeds. A tl;dr summary like you provided below would be a really helpful writeup in general, especially if it showed how it's just as easy as string errors and showed how much you get for free when doing things the right way.
One interesting way to extend this is to implement ‘call with current continuation’ or call/cc from Scheme.
It’s a great challenge. I’ve done it in the past with rewriting the interpreter to continuation passing style.
This also make it easy to implement breakpoints and a whole debugger for the interpreter.
I still have it running here:
https://csokavar.hu/projects/mini-scheme/
fn env_for_lambda<'a>(
params: Rc<RispExp>,
arg_forms: &[RispExp],
outer_env: &'a mut RispEnv,
) -> Result<RispEnv<'a>, RispErr> {
let ks = parse_list_of_symbol_strings(params)?;
if ks.len() != arg_forms.len() {
return Err(
RispErr::Reason(
format!("expected {} arguments, got {}", ks.len(), arg_forms.len())
)
);
}
let vs = eval_forms(arg_forms, outer_env)?;
let mut data: HashMap<String, RispExp> = HashMap::new();
for (k, v) in ks.iter().zip(vs.iter()) {
data.insert(k.clone(), v.clone());
}
Ok(
RispEnv {
data,
outer: Some(outer_env),
}
)
}
Arguments to values in TXR Lisp (interpreter), written in C:
static void do_eval_args(val form, val env, val ctx,
val (*lookup)(val env, val sym),
struct args *args)
{
for (; form; form = cdr(form))
args_add(args, do_eval(car(form), env, ctx, lookup));
}
Or, if you don't like return value polymorphism, then:
fn eval_forms(arg_forms: &[RispExp], env: &mut RispEnv) -> Result<Vec<RispExp>, RispErr>> {
let mut exps = vec![];
for x in arg_forms {
exps.push(eval(x, env)?);
}
Ok(exps)
}
It looks no more noisy than your C code, and I can't actually tell whether your
C code is doing error handling. Is it? If not, try adding it. Now which is
noisier?
It's at least twice as noisy. It has more operators, more different punctuators, < > brackets in addition to ( ) and { }.
Not sure what error handling you have in mind, but it's robust. We can interactively call it from gdb with some garbage values:
(gdb) r
Starting program: /home/kaz/txr/txr-dbg
This is the TXR Lisp interactive listener of TXR 215.
Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
1> (raise 5)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00132416 in __kernel_vsyscall ()
(gdb) p do_eval_args(9, 9, 9, 9)
Too few arguments in function call.
(gdb) p do_eval_args(9, 9, 9, 9, 9)
** car: 2 is not a cons
** during evaluation at expr-1:1 of form (raise 5)
** run with --backtrace to enable backtraces
2> _
The function relies on args having enough room for all the values; both callers ensure that. The lookup function can't be wrong, either.
What a strange comment. When compared to my first Rust snippet in my previous comment, your C code has more parens, the same number of brackets, more semi-colons, more asterisks and more inscrutable function pointer syntax. Of course, I don't actually find C's syntax all that noisy, but I don't find Rust's syntax particularly noisy either. Probably because I know both languages. But at least be fair and try to look at both with virgin eyes instead of only one of them.
The fact that it does error handling is not at all clear from its type signature, in contrast to Rust's function signature. So that's going to contribute "more noise" from your perspective, but on the flip side, it also conveys more information. Based on your demonstration, it looks like your function just aborts the program on an error, but the Rust function is a bit more versatile. It gives the caller a choice of how to deal with an error. Otherwise, I could just write this instead:
Since we're working on Lisp internals, we can accept a few parentheses.
The code has two asterisks, both in parameter declarations, indicating pointers.
It has exactly one binary operator in the body, the assignment = denoting the one local side effect (stepping the iteration variable of the simple for (;;) loop).
All else is simple function calls. Except for the function pointer declaration, and perhaps not knowing car and cdr, an Awk or JS programmer might grok this.
The program doesn't abort; the exception was caught in the REPL. We were thrown right out of the GNU Debugger where we caused the problem, and back into the Lisp REPL. We can demonstrate that in other ways, like this:
(gdb) r
Starting program: /home/kaz/txr/txr-dbg
This is the TXR Lisp interactive listener of TXR 215.
Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
1> (catch (raise sig-trap) (error (x) (put-line `caught error: @x`)))
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00132416 in __kernel_vsyscall ()
(gdb) p car(9)
caught error: car: 2 is not a cons
t
2> (+ 2 2)
4
3> _
gdb gets confused here, though:
3> (exit 0)
[Inferior 1 (process 8529) exited normally]
The program being debugged exited while in a function called from GDB.
Evaluation of the expression containing the function
(car) will be abandoned.
Quite understandably, it doesn't understand the exception handling and didn't notice that we jumped out; it still thinks we are executing the car function. Oh well!
Nah, that will happen once someone implements a Javascript compiler written in a Scheme dialect with a VM which is written in Rust, with the entire stack compiling to WASM so it can run in the browser.
`tokenize` function looks weird. Why use replace and then split? How would you extend this to stirngs (which might contain "()" chars)?
It might be OK in a toy example, but it would have been better if the author wrote an actual Reader (not familiar with Rust, but I guess it should have some readers that allow peeking 1 char ahead OR something like Java's PushbackReader).
Then `parse_list_of_floats`, if I am reading it right, this function is fail-safe, not fail-fast, right? It won't panic upon the very first reading error?
Why eval makes environment lookups? Shouldn't environment itself know how to perform a lookup?
It's basically a clone of the tokenizer from this python implementation https://norvig.com/lispy.html which was referenced in the introduction.
The same code has been seen in a bunch of simple/similar lisp interpreters in different languages.
To be honest I'm not sure if that was the first documented tokenizer using this simple approach, but it is definitely a common pattern for such things - and as you say it is naive at best, and buggy at worst. That said it is easy to get it working, and later fix it properly.
RispEsp is a tree using value semantics. How will RispExp evolve to support e.g. circularly linked lists? How will that interact with Rust's ownership requirements?
The big question when implementing a runtime for a Lisp-like language in Rust is how the interpreter will interact with the GC. It looks like this project has avoided most of the complexity by not implementing actual lists, and allowing multiple references to the same value to exist only in restricted circumstances, hewing close to Rust's ownership model rather than a typical Lisp's. I wonder, is the plan to stay on this side of the space, as a lightweight language with potentially smooth FFI interoperation with other low-level/Rust code? Or to rewrite almost everything to support the high-level semantics of list-processing languages, with their emphasis on shared structure?