Show HN: A Rust macro that parses Java-like syntax and runs it as a Rust program (gitlab.com)
195 points by jD91mZM2 6 months ago | 31 comments

There are a few things that are annoying about writing macros* in Rust (in particular, when I last checked there was no way to break hygiene, which is occasionally useful, e.g. for making lots of functions that follow a similar template), but in general I think they strike a really nice balance for a language with complex syntax:

1. Macros only happen at certain points and don’t need to be known in order to be parsed, so parsers can be written that don’t need to understand macro syntax (or know which macros are in scope before parsing). I think this is better than e.g. camlp4 (writing a plugin for an alternative compiler frontend) or Perl 6 (subclassing the grammar of the language and adding your own rules, which is a bit nice but also a bit crazy. Perl has the advantage of being so dynamic that analysis tools are written to introspect the code rather than to parse it).

2. Macros don’t require any code to be compiled and then run at compile time. This also means that macros are not programs that have to understand the type of the AST and operate on it, something that would be particularly painful in Rust for writing something simple. Languages where macros are not like this are OCaml with ppx, Rust (with compiler plugins), and Julia (where the AST is super vaguely defined and hard to work with, especially in the way it magically deals with module scoping and quoting, quasiquoting, and “hygiene”).

3. The input to a macro is not a parse tree but rather a “token tree”, which is essentially a sequence of tokens where delimiters are matched and each delimited group is converted into a single “token”, something that feels vaguely like TeX. The only other macro system that works at all like this is Lisp, but that’s because the language (and AST) is sequences of tokens with parenthesised bits grouped into single tokens, so it doesn’t really count.

4. Macros are based on rewrite rules. These make it quite hard to write macros, but I think they’re the best we have for not needing to invent a huge language just to write macros in, and for having macros be hygienic. Scheme macros are also rewrite based, but it is easier to understand how they work because the shape of their input is more obvious.

One thing that makes me sad about Rust macros is having to specify the “type” of things that are matched against, e.g. “expr”. I can see why it’s needed, but it does feel somewhat limiting. Another thing is that macro invocations look different from the rest of the syntax, so a macro feels like it’s tacked on rather than a part of the language.

* by macro I mean the kind made of rewrite rules and not compiler plugins
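To make points 3 and the "expr" remark a bit more concrete, here is a minimal sketch. The macro names `count_tts` and `double` are made up for illustration and are not from the linked project. The first macro counts top-level token trees, so a delimited group counts as one; the second shows that a fragment matched as `expr` stays grouped as a single expression after substitution.

```rust
// Hypothetical helper: counts the top-level token trees it is given.
// A delimited group such as `(2 + 3)` is a single token tree.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// Hypothetical helper: `$e` is matched with the `expr` fragment specifier,
// so it is substituted as one expression, not as loose tokens.
macro_rules! double {
    ($e:expr) => { $e * 2 };
}

fn main() {
    // `5`, `*`, and the group `(2 + 3)`: three token trees.
    assert_eq!(count_tts!(5 * (2 + 3)), 3);
    // `{ (x) }` is one token tree, however deeply it nests.
    assert_eq!(count_tts!({ (x) }), 1);
    // (1 + 2) * 2, not 1 + 2 * 2.
    assert_eq!(double!(1 + 2), 6);
}
```

Note that `x` never has to exist in the second call: the macro only sees tokens and throws them away.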

You might be interested in the work that Carl Mäsak is doing with 007 (http://masak.github.io/007/). It is the playground for developing future versions of macros in Perl 6.

I gotta say what I really like about macros is that ( and )s are matched. If you say:

  while ($($cond:tt)*) {
You'll actually match both

  while (true) {

  while (5 * (2 + 3)) {
It counts the parentheses :D
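A small runnable sketch of that pattern, assuming a made-up macro called `java_while` (not the project's actual macro) that strips the Java-style parentheses and emits a plain Rust `while`:

```rust
// Hypothetical macro: accepts `while (cond) { body }` and rewrites it to
// Rust's `while cond { body }`. `$($cond:tt)*` matches any balanced tokens.
macro_rules! java_while {
    (while ($($cond:tt)*) { $($body:tt)* }) => {
        while $($cond)* { $($body)* }
    };
}

// Counts how many iterations run before `i * 2` reaches `limit`.
fn count_up(limit: u32) -> u32 {
    let mut i = 0;
    // The nested `(1 + 1)` sits inside the condition without confusing the
    // matcher, because it is a single token tree.
    java_while!(while (i * (1 + 1) < limit) { i += 1; });
    i
}

fn main() {
    assert_eq!(count_up(10), 5);
}
```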

Even C preprocessor macros count parentheses.

   MAC((here, have one argument), and then this second one)

But this is a requirement of the “you must be able to parse source code including macros without having to know how to expand macros” rule. Otherwise how would you know how to parse the following:


  foo! ( { ( x ) } )

  foo! ( { ( x } ) )
With {} representing the argument to the macro.

Reminds me of this cool hack: a 6502 assembler as a Rust macro.


Cool. On the other hand, projects like these suggest to me that macro systems like Rust's are a bit too powerful.

Not that I know how to make it just enough less powerful.

In my experience Rust macros are painful enough to write to deter people from abusing them too much. Whether that's a bug or a feature, I'll let you decide.

Rust's macro system is actually quite limited (I don't know if it's Turing-complete, but the artificial recursion limit is low enough that even if it is, it's not going to be all that abusable); macros are mostly just good for getting rid of repetitive boilerplate, and community norms (and implementation papercuts) discourage people from reaching for macros willy-nilly.

They are Turing complete, see https://github.com/durka/brainmunch

Not that "Turing complete" matters much. Raw text find-and-replace is Turing complete.

Like the language /// ? (https://esolangs.org/wiki////)

Interesting, but my thoughts were along a simpler route of making a literal Turing machine out of text.

Represent your tape with digits, and your head with a letter. Have an arrow from the letter to the active cell.

  [ 0 0 0 1 1 A->2 2 2 ]
Then the text replacements look like this

  A->2 becomes 1 A->

  2<-A becomes 1 A->

  B->2 becomes <-D 3

  2<-B becomes <-D 3
So two text replacements for every state/symbol combo. And the halt state has no replacements. Easy!

(Plus a replacement from [<- to [0<- and from ->] to ->0] to make the tape infinite.)
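For the curious, the idea above can be sketched in a few lines. This is only an illustration under my own assumptions: spaces are dropped, the arrows are shortened to `>` and `<`, and the rule table (`RULES`) describes a made-up machine that turns 2s into 1s and halts in state `H` on the first blank (0). The helper names `step` and `run` are hypothetical.

```rust
// Assumed rule table: two replacements per state/symbol combination,
// plus two rules that grow the tape at either end. State H has no rules.
const RULES: &[(&str, &str)] = &[
    ("A>2", "1A>"), // state A reads 2: write 1, move right, stay in A
    ("2<A", "1A>"),
    ("A>0", "0H>"), // state A reads 0: write 0, move right, halt in H
    ("0<A", "0H>"),
    (">]", ">0]"),  // extend the tape on the right
    ("[<", "[0<"),  // extend the tape on the left
];

// Applies the first rule whose pattern occurs in the tape, once.
fn step(tape: &str, rules: &[(&str, &str)]) -> Option<String> {
    for &(pat, rep) in rules {
        if tape.contains(pat) {
            return Some(tape.replacen(pat, rep, 1));
        }
    }
    None
}

// Runs until no rule applies (halt) or `max_steps` is reached.
fn run(start: &str, rules: &[(&str, &str)], max_steps: usize) -> String {
    let mut tape = start.to_string();
    for _ in 0..max_steps {
        match step(&tape, rules) {
            Some(next) => tape = next,
            None => break,
        }
    }
    tape
}

fn main() {
    println!("{}", run("[A>222]", RULES, 100));
}
```

The step cap is only there so that a non-halting rule set cannot loop forever in the sketch.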

Rust macros are intended to be powerful, by design.

I don't know what I would do with this yet, but this is certainly a cool feat!

I was actually just toying with a similar thing myself. I'm experimenting with a system that generates abilities for a game within certain design parameters, then tests the games' balance by running lots of example games. I figured compiling a DSL via macro compiled down would be worth the upfront cost given how much testing I planned on giving each generated variant. Thanks for laying some groundwork in this part!

Perhaps I don't understand something? (I've only done a very small amount of Rust.)

What exactly does this do? Let you write Java-style code in Rust? Why would I use this? (Is it because writing code like this is easier than straight rust?)

> Let you write Java-style code in Rust?


> Why would I use this?

You wouldn't. It's a toy

Would it be possible to write a compiler from Java to Rust? C# to rust? The goal then would be to get fast(er) execution times.

Rust gets a speed advantage over those languages from three things:

1. Being safer for writing concurrent programs and therefore having programs which are concurrent

2. (Mostly) type-safe memory management, which means most things are allocated on the stack and those things that are allocated on the heap don’t need to be gc’d (note that gc doesn’t mean slow memory management, but it is harder to have a shared heap in concurrent programs as some synchronisation is needed), and Rust programs are typically written in a way that doesn’t allocate much on the heap.

3. Specialisation of generic code to specific data types

1 and 2 are hard for Java and C#, as programs are not normally written in ways which would satisfy Rust’s checks, and so a compiler would just compile these to runtime checks like the C# or Java compilers would use. As for number 3, C# already does this.

Note that if a C# or Java program were translated into a Rust program which does the same work, then the Rust program would probably be slower, because the Java and C# compilers are much better at optimising programs written in the Java/C# style. A fourth advantage Rust gets (part of 3 and part of the type system) is a lack of subclassing and vtables, which are typically fast at runtime but hard to inline.

One way to make a Java or C# program faster is to be conscious of the hidden work being done and to write programs in a way that lets safety checks be eliminated and allocation be made automatic.
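A rough sketch of point 3 and the vtable remark, with made-up names (`Shape`, `Square`, `total_static`, `total_dynamic`). The generic function is compiled separately for each concrete type it is used with, which makes the `area` call easy to inline. Rust does also offer vtable dispatch through `dyn Trait`, but it is opt-in, which is roughly where a naive Java-to-Rust translation would end up.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Static dispatch: a separate copy is generated per concrete `T`.
fn total_static<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Dynamic dispatch: one copy, each `area` call goes through a vtable.
fn total_dynamic(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let concrete = [Square(2.0), Square(3.0)];
    let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Square(3.0))];
    println!("{} {}", total_static(&concrete), total_dynamic(&boxed));
}
```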

> Would it be possible to write a compiler from Java to Rust? C# to rust?

It's possible to write a compiler from any Turing-complete language to any other Turing-complete language

> The goal then would be to get fast(er) execution times.

You'd improve startup times from not having to warm up the JIT, but otherwise it's very unlikely you'd get a noticeable performance improvement. The speed of a language is at least as dependent on its execution model and idioms as its compiler. Not to mention that the JVM already does plenty of optimizations.

I'm not sure the statement "It's possible to write a compiler from any Turing-complete language to any other Turing-complete language" is really accurate, other than in a strict CS sense. If you have a language that only allows printing to the screen as output and reading from the keyboard as input, then you can't write anything that requires low-level hardware access in it.

That's more a symptom of the environment/compiler, not the language itself. There's no reason printing to the screen would stop you from printing binary and feeding that into your processor as normal. There's also no reason why the language can't have another implementation that does allow more standard outputs.

Language-wise, there's nothing stopping you from parsing Java and producing equivalent semantics. There's nothing stopping you from producing equivalent JVM bytecode.

The interesting part of Racket/Perl 6/this project is that this compilation step is being done without having to parse "another language", by using macros. Parsing is still done of course, but it could also be said that you're really just parsing the Rust language, which happens to look like Java, and producing Rust code. The macro system itself is operating as the compiler, and since macros are "Rust", what's the difference?
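As a toy illustration of "the macro system is the compiler", here is a sketch that accepts a tiny Java-looking subset. It is not the linked project's implementation; the macro name `mini_java` and the three statement forms it supports are assumptions chosen to keep the example short.

```rust
// Hypothetical macro: translates a tiny Java-like statement list into Rust.
// Each rule consumes one statement and recurses on the remaining tokens.
macro_rules! mini_java {
    // `return expr;` ends the program and yields the value.
    (return $value:expr;) => { $value };
    // `int name = expr;` becomes a mutable i32 binding.
    (int $name:ident = $value:expr; $($rest:tt)*) => {{
        #[allow(unused_mut)]
        let mut $name: i32 = $value;
        mini_java!($($rest)*)
    }};
    // `name += expr;` is passed through unchanged.
    ($name:ident += $value:expr; $($rest:tt)*) => {{
        $name += $value;
        mini_java!($($rest)*)
    }};
}

fn compute() -> i32 {
    let total: i32 = mini_java! {
        int x = 2;
        int y = x + 3;
        x += y;
        return x * 2;
    };
    total
}

fn main() {
    assert_eq!(compute(), 14);
}
```

Every recursive call costs one level of macro expansion depth, which is one reason large inputs to macros written this way run into the compiler's recursion limit.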

Rust is a good base for many languages, but in these cases it would probably still imply implementing the JVM/the CLR, rather than "Java" or "C#".

There are probably research experiments that have involved ahead of time compilation, but if they had succeeded then I expect most people would have used them.

GraalVM looks very promising.

How well do LSP implementations cope with complex macros like this?

In short they don’t, neither does the compiler.

I wrote a rather large macro to auto-generate a Java-Ops parser. Then I spent the next 4 days rewriting it so I didn’t have to give the Rust compiler 4GiB of stack just to compile.

Deeply recursive structures get rough (bordering on 1mil+ recursion depth).

I think that I actually managed to get the language server to take 100% CPU once, I think it was stuck recursing. Don't want to dig deeper because I had to REISUB my computer, heh. But other than that one very rare time it works pretty well. Can't always show you the most useful things, because sometimes it depends on invocation and line number but it's doing well with syntax errors.

this is the wrong way to do translation problems in lisp.

But the issue with Rust isn't the syntax, it's where you need to change your entire thought model to match what Rust expects.

Well, ok, the syntax can be pretty horrific at times, but going to something Java-like doesn't seem like much improvement.

This isn't really intended to be used for any actual code, it's just a fun exercise.

The issue contrariwise with most other languages that can achieve similar performance characteristics as Rust, is that if you don't think about your code in a similar level of detail as the Rust compiler can "think" about Rust code ... then you get undefined behavior, memory leaks, and all sorts of really exciting run-time bugs that your tests may or may not detect.
