Hacker News new | past | comments | ask | show | jobs | submit login
Roll your own minilanguages with mini-interpreters (1989) (textfiles.com)
99 points by bleachedsleet on July 6, 2021 | hide | past | favorite | 36 comments



"Another World", the game, is an interesting example of this approach. It's implemented in a mini-VM, with 29 opcodes - apart from the low-level "mov", "call", "ret", etc. there are high-level ones too, like "fill fb" or "draw txt". The VM design was closely related to the art style of the game, based on polygons, not bitmaps.

With this approach it was fairly easy to port the game to various platforms. "Only" the VM implementation had to be ported, while the bytecode remains the same. Of course, the VM implementation had to be tied to the graphics and sound subsystems of the target platform.

There's a super-interesting series of blog posts, starting with [1], describing the "Another World" internals and also the design of all the platforms to which it was ported: Amiga, Atari ST, PC, Genesis, SNES and GBA.

[1] https://fabiensanglard.net/another_world_polygons/index.html


You might be interested to know that Another World was not Eric Chahi's first game to implement a VM. Infernal Runner (Amstrad CPC, 1985) was recently reverse-engineered to be playable in the browser. Some details and source: https://github.com/cyxx/infernal_js


I've never played it, but apparently it's being ported to the Zx Spectrum Next and the C64, which are supposedly herculean efforts due to the limitations on those machines


Another famous example: adventure games using z-machine (Infocom), and later SCUMM.


I think most people shy away from doing something like this for good reasons. It is complicated, not easy to understand for other people and can require intense maintenance. So it's usually only worth when it can be used wide-scale.

But in languages like Haskell or Scala (and others), you can now use a concept called "free monad" in a quite easy way.

It allows for almost as much freedom in expressing concepts like minilanguages do, while keeping allowing the user to stay within the original host-language and keep all the benefits that come from that - e.g. you don't have to re-invent the concept of arrays or list, because it already exists in the host-language and you keep all the nice benefits from tooling like IDE and refactoring support, syntax-highlighting, etc. The syntax however will be limited to what the host-language can offer compared to writing a minilanguage (where you have 100% flexibility).

Here's an example how it can look in practice: https://typelevel.org/cats/datatypes/freemonad.html


Free monads are hard to get your head around IMO.

But there is any easier way. In Haskell just use a typeclass! The typeclass defines the “syntax” and your implementation of the class defines the “semantics”

You can also do this in C# or Typescript simply using an interface, or in JS I guess using duck typing.

Basically you can make DSLs using OO concepts!


I think Haskell's return-type polymorphism makes this much easier.

In other languages, a monadic structure is at least general.

A "monad" is just a type with a sequence operation. The heart of it is just that we aim to do "1 then 2 then 3", etc.

This only becomes something worth talking about when not using the semi-colon to sequence, as we then need to use function application (ie., passing arguments).

The benefit of using a function to sequence operations is that we can just "make it the semicolon", ie., and do nothing extra. *Or* we can add extra functionality, "sequence and ...".

In the case of free monads (I just skim read the article), it appears this "extra" is just a "package" operation. So rather than sequence to execute, we (recursively) wrap in a container.

This recursive container is then an abstract syntax tree, and we can define an interpreter separately.

This I think is very straightforward, and really just confused by terminology. We are just making a AST constructor from a linear sequence of operations.


> A "monad" is just a type with a sequence operation. The heart of it is just that we aim to do "1 then 2 then 3", etc.

Isn't that an Applicative Functor? Monads also have join / bind.


Yeah for some reason this explanation which should be simple just blows my mind. I’m probably too experienced in imperative languages. It took me a long time to grok plain old monads. And when you get those GHC compiler errors from heavily stacked up types I find it very hard to work out what went wrong. For practical purposes I prefer OO even though it might be more limited in some regards.


    p = newPerson();
    w = walk(p);
    t = talk(p, w);
    s = shout(p, t);

becomes,

    seq(shout, seq(talk, seq(walk, seq(newPerson())))

"infix",

    newPerson() seq
    walk seq
    talk seq
    shout 

The operations `newPerson` and `seq` form a monad.

If you can omit `w =`, `t=`, and `s=` you have an applicative, rather than a monad. Ie., the operations being sequenced are independent.


> But there is any easier way. In Haskell just use a typeclass

I understand your point. In fact, I mostly do what you say, because it is simple and usually sufficient in most cases.

But it is important to understand that there is a difference. A minilanguage or an embedded DSL with free monads allows for custom transformations and it completely separates the interpretation from the description of the domain logic/actions.

With the approach(es) that you mention, that is not the case. You both describe and directly execute/interprete the logic in an intermingled way. That can be an advantage because it makes things easier (less abstraction) and more direct, but it can be a disadvantage if you have the need to interpret in different ways and have full separation between these two phases.


I cannot imagine a situation where a small interpreted language is "too complicated" but "free monads" are not.


Let me help your imagination: the users of your language ask you to help them get auto completion when they write code in this language.

How much more complicated would that make things based on the two choices?

And, just for completeness, here's an example that turns things around: your users are asking you to change the syntax und use lisp-like evaluation/expressions. So (* 3 (+ 1 2)) instead of 3 * (1 + 2). Or, they ask you to do it the other way around.

How much more complicated would that make things based on the two choices?


At this point rather a lot more weight is being put on the DSL than a mere minilanguage, so I'd embed a language interpreter for something existing. I've used Tcl for this in the past.

But when you say "users", that indicates a different use case anyway. The minilanguage is basically source compression or code generation, it's for the developers.

> your users are asking you to change the syntax und use lisp-like evaluation/expressions

Nobody ever wants that except Lisp users, and if you're surrounded by angry lisp users you should just give them what they want and embed a lisp. Or even just write one, Scheme is small enough.

But as far as I can tell, "free monads" are a lot of machinery to re-implement imperative stateful programming when you could have just used a language that wasn't Haskell in the first place.


> At this point rather a lot more weight is being put on the DSL than a mere minilanguage, so I'd embed a language interpreter for something existing.

Sure, but then why not just use the (host) programming language directly?

> But when you say "users", that indicates a different use case anyway. The minilanguage is basically source compression or code generation, it's for the developers.

I meant developers as well - they are still users of your language no? :)

> Nobody ever wants that except Lisp users

If you are just going to disregard all use-cases and examples I give, then of course in the end we will end up with a solution-space where all my arguments don't apply. You said you can't imagine such a situation, I gave you an example. I can't really do more than that.

> But as far as I can tell, "free monads" are a lot of machinery to re-implement imperative stateful programming

Not at all. And free monads are completely orthogonal to Haskell and very useful in non-pure-functional programming languages as well. I think this conversation is not very fruitful without a basic understanding of what they are.


> It is complicated, not easy to understand for other people and can require intense maintenance

I think this code sample illustrates the exact opposite. Look how concise it is - you really don't need much code at all to implement a simple interpreter.


But this interpreter does almost nothing. It draws mazes and prints text, and it only works in 80-char terminals.

A hosted DSL in a modern high level language gives you the full power of a modern high level language to build the simple constrained language of your choosing, and it can be highly portable to different environments, including keyboard vs network play, and different UI environments.


> But this interpreter does almost nothing. It draws mazes and prints text, and it only works in 80-char terminals.

That is part of the point, though: An interpreter does not need to be complex. Adding "instructions" that does more complicated things does not need to complicate the interpreter much.

Nor does the implementation language make much difference here - fundamentally the interpreter is only the part up to the definition of SubProg. Everything else is just implementation of specific instructions.

Don't get hung up on the use of assembler here - unless you need blazing speed from the interpreter itself, it's trivial to do in pretty much any language.

Conceptually, the interpreter is basically just a table of instructions, a bytecode buffer, and a loop that keeps fetching the next instruction, looks it up in the table, and calls it. The "SubProg" instruction could reasonable be considered part of the generic aspect of the interpreter, but the rest are the domain specific bits, no different

You can abstract out the instructions easily enough. E.g. here's a trivially dumb interpreter written in Ruby where the instructions are separate from the interpreter, that works conceptually not much differently to the linked assembler:

    class Interpreter
      attr_accessor :pc

      def initialize program, instructions
        @program = program
        @instructions = instructions
        @pc = 0
      end
  
      def getnext
        n = @program[@pc]
        @pc+=1
        return n
      end

      def run
        while i = getnext
          @instructions[i].call(self)
        end
      end
    end

    # We could make this an array and insist on numeric instructions,
    # but that's an implementation detail - the interpreter doesn't even need to know
    # if you pass it a hash or an array.

    instructions = {
      :myarg => ->(interp) { puts "My argument is #{interp.getnext}" },
      :helloworld => ->(interp) { puts "Hello World" }
    }

    program = [:myarg, 42, :helloworld, :myarg, 3]

    Interpreter.new(program, instructions).run

Because the program counter is settable, "SubProg" can just as easily be implemented externally to the interpreter, btw.. Adding variables would really just be a matter of exposing another array or hash in the interpreter so that you could add instructions to read/write them.


The point is to not have abstraction power. The goal is to make the DSL be a direct expression of the intended thought, which usually means cutting it down to the point where it's reduced almost everything to preset categories. At that point, you don't need much of a language to build it in, so it ends up porting well to all of them.


Just look at the article. They define "SubProg%" precisely to have more abstraction power. At some point someone will want to have loops, conditions, recursion, ... where does it end?

Because of that, it's much nicer to use the power of the host language directly, so you don't have re-invent all these things everytime - but with free monads you keep the property of having a well defined, business specific list of actions or instructions that can be interpreted in different ways.


What you're describing is mostly an implementation detail to me. The point of this example is not that you can't do it in a higher level language. The point is to illustrate the power of being able to create a list of code as data. Everything else is implementation detail.


Well, I totally agree with you! My whole point was to offer an alternative that makes this implementation easier and offers some nice extras (while sacrificing free choice of syntax). :)


Free monads are a bit misnamed, you probably want 'freer monads'?


These days you can benefit from minilanguages without actually implementing them. It is surprisingly easy to embed JS or Lua into your program.

Recently I was looking for Linux games which match my taste at some Russian torrent site. Search engine on it is not particularly good, it allows search only on torrent title. While each torrent description contains genre, year and a lot of other info which is pretty approachable for machine interpretation.

So I found XML dump of tracker database (about 22 GB) with torrent descriptions and built tool which allows to process torrent records with an arbitrary JS script to filter records. Like AWK, but suited for specific XML input and scripted with JS.

This way I could filter games based on year, genre and gaming engine. JS engine I used is implemented in pure Go, so it doesn't introduce additional runtime dependencies. With such approach it's easy to write any search query, aggregation and so on.

BTW, here is the project link: https://github.com/Snawoot/trusearch

Pure-Go JS engine: https://github.com/dop251/goja


Sigh... I am getting old. I had the issue of Dr Dobbs this originally appeared in and I was just thinking about this the other day.


I've read stories about old games on constrained systems, where the solution to limited space was to design a special byte code which would be much more compact than native code. I think one of the examples was Another World, maybe ScummVM.

The idea sounds like it could be still relevant for microcontrollers, AVRs can have <2KiB program space.

Are there any good resources that discuss the design of the "mini-languages" composed of byte codes and interpreters?


Just to be pedantic, ScummVM is a re-implementation of the original SCUMM, which totally was an instance of this approach. There is a (crazy long) Youtube video [1] where they take a peek at the SCUMM code for the first Monkey Island, and hack it almost on the fly. Pretty good for a 1990 game.

Of course, Infocom's Z-machine may be an even earlier example.

[1] https://www.youtube.com/watch?v=ikaqus5_QIg (the relevant bit starts at around 3:09:30)


Well, a (more) compact bytecode representation is one of the design decisions made for web assembly... That is one reason why it is a stack machine.


Bytecode in practice is almost always more compact than corresponding machine code.


And then notice that a Forth is a sort of meta-mini-interpreter. (Whether it's good to use one in any particular case is a different matter, of course.)


Reminds me of a couple of threads I saw recently where people were creating their own in-game scripting languages with Godot's Expression class. I keep wondering how far one could take this.

[0]https://old.reddit.com/r/godot/duplicates/msodaw/finally_got...

[1]https://www.youtube.com/watch?v=Sk2f2gPFH80


X86 16-bit real mode assembly for MS-DOS? Don't see a lot of that these days.


Can anyone point me at a walkthrough of these concepts in a higher level language than assembly?


Today the mini languages torch is most prominently carried by Racket. The most accessible ease-in to this I've come across is Matthew Flatt's 'Creating Languages in Racket' https://queue.acm.org/detail.cfm?id=2068896


Love the no nonsense licensing!


Dan knows mazes!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: