Flattening ASTs and other compiler data structures (cornell.edu)
285 points by fagnerbrack on July 2, 2023 | 81 comments



Blender (the 3D modelling software) is a fascinating case of this. To make loading and saving files fast and lossless, it uses identical on-disk and in-memory representations. In other words, everything is in an arena, and saving and loading is just a memcpy of the entire arena. Considering the massive potential complexity of a Blender project and the myriad issues you could have with serialisation and deserialisation, this seems like a great design to me.
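The core trick can be sketched in a few lines. This is a hedged illustration, not Blender's actual implementation: the `Node` type and the `save`/`load` helpers are invented here, and the round trip only works when writer and reader agree on struct layout and endianness, which is exactly the stickiness described below.

```rust
// Minimal sketch (not Blender's actual code): a fixed-layout arena of
// plain-old-data nodes, "saved" and "loaded" as one contiguous byte copy.

#[repr(C)]
#[derive(Clone, Copy, PartialEq, Debug)]
struct Node {
    kind: u32,
    // child/next are arena indices, not pointers, so the blob is
    // position-independent and survives the round trip.
    child: u32,
    next: u32,
    value: f32,
}

fn save(arena: &[Node]) -> Vec<u8> {
    // One memcpy of the whole arena. Safe here because Node is repr(C)
    // with no padding bytes.
    let bytes = std::mem::size_of_val(arena);
    let ptr = arena.as_ptr() as *const u8;
    unsafe { std::slice::from_raw_parts(ptr, bytes) }.to_vec()
}

fn load(blob: &[u8]) -> Vec<Node> {
    let n = blob.len() / std::mem::size_of::<Node>();
    let mut arena = vec![Node { kind: 0, child: 0, next: 0, value: 0.0 }; n];
    unsafe {
        std::ptr::copy_nonoverlapping(
            blob.as_ptr(),
            arena.as_mut_ptr() as *mut u8,
            blob.len(),
        );
    }
    arena
}

fn main() {
    let arena = vec![
        Node { kind: 1, child: 1, next: 0, value: 0.0 },
        Node { kind: 2, child: 0, next: 2, value: 3.5 },
        Node { kind: 2, child: 0, next: 0, value: 7.0 },
    ];
    let blob = save(&arena);
    assert_eq!(load(&blob), arena);
    println!("round-tripped {} nodes via raw bytes", arena.len());
}
```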

The trade-off of course is that your data structure design is kinda sticky, insofar as you need to be able to open files for older versions.


> The trade-off of course is that your data structure design is kinda sticky, insofar as you need to be able to open files for older versions.

The missing piece is a way to evolve those data structures over time, in a similar way to database migrations. These migrations would only need to run when loading data from an older version of the app, and the app could then save the updated version to disk immediately to avoid incurring that cost again.

An example implementation of this concept is this project [1] that was created in the context of CRDTs. If it is not directly applicable, at least it should be a good inspiration.

[1] https://www.inkandswitch.com/cambria/


I once worked on a very large commercial application that was based on a home-grown I/O framework which operated on similar principles.

It was a complete pain in the ass. You were constantly future-proofing your data structures because you knew you were going to be stuck with them for all eternity because the I/O framework was going to serialize them verbatim whether you liked it or not. Those were dark days...


I don't get it. Surely if you ever needed to, you could write an interpreter for loading the older files if you decided to abandon this 1:1 representation?


You ever hear of .doc files?

https://learn.microsoft.com/en-us/openspecs/office_file_form...

Yeah. There is a reason XML was seen as the future back in the 90s.

Custom databases often get extended into custom filesystems. And these systems on top of systems get obscure features (like embedding excel sheets inside of Word) and... Thing get hairy.


I think the effort involved disincentivizes it, and nudges people towards defensively adding spare fields or whatever.

The other comment pointed out that you can make a fallback migration code path that migrates older file versions. That's the escape hatch if you have no other options.


Wasn't that also the way Word used to dump structs onto disc in the "doc" days? I think I read that this was also a major problem in writing converters: you needed to decode Word's internal data structures, which were of course undocumented.


It's very common in old software and video games. Pre-97 Office did that, though with more formalisation than most (the binary format is now called the Binary Interchange File Format, or BIFF, and is relatively generic; at the time MS almost certainly had utility libraries for working with BIFF).

This was a common source of portability issues for game saves (as well as blowing away your saves on updates), as they'd commonly just blit internal data structures, with no formalised interface.


Yep, pretty sure Valve's brief experiment with a 64-bit Source engine in late 2005 resulted in save games that didn't work with 32-bit Half-Life 2.


I remember Spolsky saying Excel .xls worked like this, but I guess this should apply to .doc files as well.

This is usually orders of magnitude faster than serializing to/deserializing from a different storage format.


I don't think so. Serializing is IO-bound; if your format is "simple" (e.g. CBOR), you can't measure the difference.

All this looks like BS to justify laziness to me; it makes loading/saving operations fragile and sets in-memory structures in stone.


Don't forget that the Excel file format is literally 30 years old. When you have 4MB of RAM in total for your OS, Excel, and whatever part of the file you can fit into the rest, and your CPU is an i486 if you're lucky, minimizing parsing starts to make a lot of sense.


> Serializing is IO-bound

Once upon a time, DMA did not meaningfully exist in the x86 world, so an IO-bound task was also CPU-bound.


Yes of course, but the original example was Blender, running on today's hardware. IMO this is just sloppy engineering in this case.


the cbor implementations i've had the misfortune to have to use are ridiculously slow, as slow as json parsers, perhaps because everything in cbor is variable-length

just because it's unreadable in a text editor doesn't mean it'll be fast


And potentially photoshop.


This is exactly how a PlayStation game I worked on worked, and because everything was signed, they were in no way worried about attackers: if you were able to alter the files on disk, you had already owned your device. It meant loading was absurdly fast; we pulled everything into memory and then just patched pointer locations to where they actually got loaded.


I believe Microsoft Word originally did this, and it was a major pain as the file format evolved.


It also means you can't share files between little endians and big endians.


There's almost certainly byte order+version markers in the header, a "happy path" that just memcpy's the structs, and a "safe path" that does byte order swapping, migrates the old structs to the current format, etc.

You can version the structs/loaders by keeping the relevant header files in their own version subfolders, and copious usage of sed. I've done this in a Django project to maintain support for ancient untouchable clients using an old version of the API, it's pretty manageable.
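The header dispatch described above can be sketched for a hypothetical format; the magic value, version number, and path names here are all invented. Reading the magic in native byte order tells you whether the file came from a same-endian writer:

```rust
// Sketch of a "happy path / safe path" dispatch on a file header.
// MAGIC doubles as a byte-order marker: if it reads back byte-swapped,
// the file was written on an other-endian machine.

const MAGIC: u32 = 0x424C_4E44; // made-up marker
const VERSION: u16 = 3;

enum LoadPath {
    Fast,         // same endianness + current version: memcpy the structs
    Swap,         // other endianness: byte-swap every field on the way in
                  // (a real loader would also version-check on this path)
    Migrate(u16), // older version: run struct migrations, then save back
}

fn classify(header: &[u8; 8]) -> Option<LoadPath> {
    let magic = u32::from_ne_bytes([header[0], header[1], header[2], header[3]]);
    let swapped = magic == MAGIC.swap_bytes();
    if magic != MAGIC && !swapped {
        return None; // not our file at all
    }
    let mut ver = u16::from_ne_bytes([header[4], header[5]]);
    if swapped {
        ver = ver.swap_bytes();
    }
    match (swapped, ver) {
        (false, VERSION) => Some(LoadPath::Fast),
        (true, _) => Some(LoadPath::Swap),
        (false, v) if v < VERSION => Some(LoadPath::Migrate(v)),
        _ => None, // file from a newer version than this reader
    }
}

fn main() {
    let mut h = [0u8; 8];
    h[..4].copy_from_slice(&MAGIC.to_ne_bytes());
    h[4..6].copy_from_slice(&VERSION.to_ne_bytes());
    assert!(matches!(classify(&h), Some(LoadPath::Fast)));
}
```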


I checked the repo. Blender detects files with a different endian and byte swaps if necessary.


Is this a real problem though? It seems to me that any system you would realistically want to run Blender on will be little-endian.


It isn't a real problem. Blender supports only little-endian targets and the code byte swaps data files as needed. I guess someone could port it to AIX, z/OS, or Linux/zSeries, but that's unlikely at best.


It doesn't, you just need to be careful to always read and write properties via accessor functions.


You can have the benefits of fast save and load without having the same representation. Stream changes to disk.


Storing changes and the originals blew up badly for Microsoft Word in the late 1990s due to privacy concerns.


That is of course not Kosher at all. Compilers are generally free to lay out and pad structs in whichever way they prefer, meaning that two compilers for the same platform might very well use incompatible layouts. It can lead to major problems if structs saved by, say, a 64-bit version of the program are to be loaded by the 32-bit version.


Compilers (for C and C++) typically provide mechanisms to control padding and other layout requirements as necessary to make this work.

It's not that unusual in C and C++ code to define a struct that has a specific and well-defined memory layout. It's kosher, as long as you accept that you're working with a specific set of real compilers and use the appropriate #pragmas or other controls as needed to avoid undefined behaviour.


Yep, but then you force the compiler to choose a specific struct layout which may be suboptimal if, for example, you're coding for SIMD. You also can't control the values of the struct padding bytes, which may cause very interesting bugs.


I love flattened ASTs. One of my favorite uses of them is for inline markup in pulldown-cmark (see [1] for a brief description).

Here's a sketch of the problem being solved. The raw input is broken into a sequence of nodes, and something like * is turned into a "MaybeEmphasis" node, as it has the potential to turn into emphasis, or remain text if no match for it is found.

Another pass goes through those nodes in sequence, using a stack to find potential matches (the rules for whether a pair of such nodes actually match are quite fiddly and complicated). When a match is found, the MaybeEmphasis node is turned into the appropriate emphasis node, and the entire sequence of nodes between open and close is snipped out and made into a subtree of the new node. This is a somewhat unusual tree transformation, and a straightforward implementation could easily be O(n). But with the flattened AST representation, it can all be done in O(1), no matter the number of nodes or stack depth.

For those interested in details, the tree representation is in [2] - there's basically a "child" and "next" index along with the node body - and the tree surgery on emphasis match is in [3].

The performance is excellent. I think pulldown-cmark may not be the single fastest CommonMark parser out there, but it's certainly competitive, and a lot faster than approaches that do, for example, an allocation per node.
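A minimal sketch of that child/next surgery (simplified, not pulldown-cmark's actual code; the names here are mine). Because the matching pass already knows the close node and the node just before it, turning the run between a matched pair into a subtree is constant-time index rewiring:

```rust
// Flat node pool: `child` and `next` are indices into the same Vec.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind { MaybeEmphasis, Text, Emphasis }

#[derive(Debug)]
struct Node {
    kind: Kind,
    child: Option<usize>, // first child
    next: Option<usize>,  // next sibling
}

// O(1) surgery: the run between `open` and `close` becomes the children
// of `open`; `close` itself is dropped from the sibling chain.
fn make_emphasis(nodes: &mut [Node], open: usize, before_close: usize, close: usize) {
    let first_inner = nodes[open].next;
    nodes[open].kind = Kind::Emphasis;
    nodes[open].child = if first_inner == Some(close) { None } else { first_inner };
    nodes[before_close].next = None;      // terminate the child list
    nodes[open].next = nodes[close].next; // skip past the consumed close marker
}

fn main() {
    // "*ab* text" -> nodes 0..5: Maybe, Text, Text, Maybe, Text
    let mut nodes = vec![
        Node { kind: Kind::MaybeEmphasis, child: None, next: Some(1) },
        Node { kind: Kind::Text, child: None, next: Some(2) },
        Node { kind: Kind::Text, child: None, next: Some(3) },
        Node { kind: Kind::MaybeEmphasis, child: None, next: Some(4) },
        Node { kind: Kind::Text, child: None, next: None },
    ];
    make_emphasis(&mut nodes, 0, 2, 3);
    assert_eq!(nodes[0].kind, Kind::Emphasis);
    assert_eq!(nodes[0].child, Some(1)); // subtree contains the two Text nodes
    assert_eq!(nodes[2].next, None);     // child list terminated
    assert_eq!(nodes[0].next, Some(4));  // sibling chain skips the close marker
}
```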

[1]: https://fullyfaithful.eu/pulldown-cmark/

[2]: https://github.com/raphlinus/pulldown-cmark/blob/b7e709c0bd6...

[3]: https://github.com/raphlinus/pulldown-cmark/blob/b7e709c0bd6...


This reminds me of a GDC talk where Rust was praised because it forces you either to go mad fighting the borrow checker or to structure code using an entity component system. I find it funny that the value of the borrow checker in all these real-world scenarios with complicated lifetimes is finding a way to avoid it entirely by putting stuff in arrays and referencing it by index.


Using an entity component system felt like writing very fancy spaghetti code in my limited experience with the Bevy game engine in Rust. No longer having direct references between objects means the type system isn't nearly as useful, and I found it very hard to reason about the code. I can believe it becomes useful in very large systems, but it was just a hindrance in the small program I was writing.


You can use phantom-typed index types (a wrapper type per container) to recover type safety. Or at least you can when using an ECS in C++; I'm not sure how easy it is to write the wrapper types generically in Rust.


You can have typed indices (https://lib.rs/crates/typed-index-collections), but unless you force the collection to be a singleton, there's no way to prevent mixing of indices between different instances (clones) of the same collection type.
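Writing the wrapper type generically in Rust is in fact fairly painless; here is a minimal sketch (my own names, not the `typed-index-collections` crate's API) using `PhantomData` so that an index into one pool cannot be used with another pool's element type:

```rust
use std::marker::PhantomData;

// A typed index: Id<Expr> and Id<Stmt> are distinct types, so handing an
// expression id to the statement pool is a compile error, not a runtime bug.
struct Id<T>(u32, PhantomData<T>);

impl<T> Id<T> {
    fn new(i: u32) -> Self { Id(i, PhantomData) }
}
// Manual impls so Id<T> is Copy even when T isn't.
impl<T> Clone for Id<T> { fn clone(&self) -> Self { *self } }
impl<T> Copy for Id<T> {}

struct Pool<T>(Vec<T>);

impl<T> Pool<T> {
    fn new() -> Self { Pool(Vec::new()) }
    fn push(&mut self, v: T) -> Id<T> {
        self.0.push(v);
        Id::new((self.0.len() - 1) as u32)
    }
    fn get(&self, id: Id<T>) -> &T { &self.0[id.0 as usize] }
}

struct Expr(i64);
struct Stmt(&'static str);

fn main() {
    let mut exprs: Pool<Expr> = Pool::new();
    let mut stmts: Pool<Stmt> = Pool::new();
    let e = exprs.push(Expr(42));
    let _s = stmts.push(Stmt("return"));
    assert_eq!(exprs.get(e).0, 42);
    // stmts.get(e); // rejected by the type checker: Id<Expr> != Id<Stmt>
}
```

As the comment notes, this still doesn't stop you mixing indices between two pools of the *same* element type; that needs generation counters or singleton pools.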


A lot of the mangling produced by ECS is an outcome of representing dynamic behaviors through static optimizations.

If you have a dynamic type that you can attach arbitrary new fields to, then, capability-wise, you have exactly what's needed to make game entities: to do a lookup by entity type, just traverse the list of all entities looking for a magic pattern in the data. To look up a specific one, assign each one a unique ID and search by ID.

The problem is that game developers get anxious about this kind of lackadaisical structure (for good reason, if there's any aim at serious performance) and want to put more things in their own index, and allocate things with a more compact representation and less fragmentation.

And the alternatives are...god object with every possible behavior crammed into an oversized record type, and ECS bookkeeping, in its various flavors, some more compilation-heavy and others more reflective with more runtime editing functionality.

But regardless of what approach you take, every time you introduce dynamism into your entities and enable more flexibility in asset assignment, you convert more of your bugs into data bugs. There are a huge number of bugs in games that are configuration problems with the entity and not a flow control or algorithms issue.


Correct. Rust lifetimes become less powerful in certain niches of high-performance programming where you're avoiding the heap or don't have one in the first place. Games, databases, embedded, HPC-style batch workloads, and even compilers.

Of course you still have restrictions on aliasing, so you're not going to get data races. But you'll still get bugs that are essentially equivalent to ones you'd get with raw pointers.


Note that you still have implicit/elided lifetimes for basically every function argument and every local variable. They’re everywhere, even if you’re not typing 'a.


You could say that it forces you to define long living entities that own memory and lend it to short-lived objects.

In the process you avoid use-after-free, double-free, and access to unallocated memory: borrowers always live shorter lives than their owners, and can only access borrowed (and thus allocated) memory, which is deallocated once the owner dies, with nobody else left who could use it or free it again.


> In the process you avoid use-after-free, double-free, and accessing unallocated memory

You don't really avoid those things, though. In the process you end up with index-use-after-index-free, double-index-free, and accessing index-unallocated array entries.

These are the exact same bugs the borrow checker prevents in main memory, just hidden from the borrow checker by adding a layer of abstraction.

These are memory-unsafety bugs. You still get the same wrong answers caused by coding errors. Random junk values and behaviours, when dereferencing use-after-free invalid pointers that point to memory reused for a new object. It's just that pointers are now called indexes, and the borrow checker doesn't check these pointers.

It's like turning the borrow checker off for this set of objects. Those long-lived entities you mentioned act as a mechanism to enable that. That's useful to do, but nobody should be under the impression use-after-free, pointers to the wrong objects and other memory-unsafety bugs don't happen in the array index model.


Yes, this is true, but sometimes it’s because pointer ownership is a poor fit for the problem. Sometimes dangling references are logically possible and what you want. (There should be a runtime check, though. Generation numbers can help.)

A compiler can only prevent bad things from happening within a system, typically just one process. The world around it doesn’t work that way. A system is often a cache, not an owner, and caches go out of date because the world changed without notice. You want to update on what notices you get that the world changed. You can’t prevent inconsistency, only detect it and react by updating or removing outdated information.

This probably isn’t relevant within a batch compiler, but it would be for a language service in an IDE.

Games often simulate worlds where references shouldn’t own things. It would be a weird form of power to remotely prevent something from getting destroyed because you remember it. Even though both objects are within the same process, they’re modeled as independent systems where pointer ownership doesn’t happen.


With something like the slotmap, you store a generation counter, so you can't accidentally use something that has been freed and recreated.

It also checks that the slot you are trying to get is occupied, and if it's not, it won't let you access it. It is actually forced to check that the slot is occupied, because the slot is an enum with vacant and occupied variants, and the vacant variant doesn't have a value to get in the first place. In Rust you can't read data out of an enum without checking the variant first.

With these two restrictions, you can't use stale keys, you can't UAF, and you can't read junk data at all, it's enforced by the type system and the runtime checks.

You can still have keys that are invalid, but the slotmap's "get" method returns an option type, and invalid keys cause it to return "None".

So in practice, you will end up panicking if your keys become invalid (the index operator panics when the return value is "None"), or you will explicitly handle the "None" case and do whatever you want to do.

All of this only applies to this specific data structure though, and the checks do have some performance cost.
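A minimal sketch of that generation-counter scheme (my own simplification, not the actual `slotmap` crate internals): a stale key's generation no longer matches the slot's, so use-after-free becomes a recoverable `None` instead of junk data:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Key { index: usize, generation: u32 }

enum Slot<T> {
    Occupied { generation: u32, value: T },
    Vacant { generation: u32 },
}

struct Arena<T> { slots: Vec<Slot<T>> }

impl<T> Arena<T> {
    fn new() -> Self { Arena { slots: Vec::new() } }

    fn insert(&mut self, value: T) -> Key {
        // Reuse the first vacant slot, bumping its generation.
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if let Slot::Vacant { generation } = slot {
                let g = *generation + 1;
                *slot = Slot::Occupied { generation: g, value };
                return Key { index: i, generation: g };
            }
        }
        self.slots.push(Slot::Occupied { generation: 0, value });
        Key { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, key: Key) {
        if self.get(key).is_some() {
            self.slots[key.index] = Slot::Vacant { generation: key.generation };
        }
    }

    fn get(&self, key: Key) -> Option<&T> {
        match self.slots.get(key.index) {
            // A key is only valid if its generation matches the slot's.
            Some(Slot::Occupied { generation, value }) if *generation == key.generation => Some(value),
            _ => None,
        }
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("old");
    arena.remove(a);
    let b = arena.insert("new"); // reuses slot 0 with a bumped generation
    assert_eq!(arena.get(a), None);         // stale key detected
    assert_eq!(arena.get(b), Some(&"new"));
    assert_eq!(a.index, b.index);           // same slot, different generation
}
```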


> These are memory-unsafety bugs.

It seems that these are memory-unsafety bugs with the caveat that they have nothing to do with the memory allocator. Maybe a sort of sandboxed memory unsafety? I don’t know.


> nothing to do with the memory allocator

It's an application-specific memory allocator.


That’s in effect what I’m saying. You’ve implemented a memory allocator using the “system” one.


That's the specific case of (mis)using an arena, no? Could this be extended to the general model of ownership?


Ironically, we could say that we're losing the true ownership relationships between the objects, and we're making medium-term memory leaks more likely; drop() can free a Box pointing to a child, but can't free an index. In other words, we lose the guarantee that an element is released for reuse.

AFAICT, a language would need something like higher RAII [0] or linear types [1] for that. I'd love to see Rust adopt these features too one day, though it may be difficult to do backwards compatibly.

[0] https://verdagon.dev/blog/higher-raii-7drl

[1] https://austral-lang.org/linear-types


Re linear types, Rust actually has an affine typesystem, and the compiler complains when you move a variable and try to access it, so instead you need to provide a reference if you intend to do that multiple times.

A reference is a new object that references an existing memory value. You can not store a reference unless the borrow-checker can prove that the object that stores it has a shorter lifetime than the referenced object.

That is also why you can't just pass that reference around willy-nilly, because the reference is consumed due to affine types.

I may be misunderstanding something though so feel free to correct me.


It’s almost as if immutability writ large is vindicated, but with a whole lot of complicated rules to let mutation infect everything everywhere even if you’ll never use it. At least that was my takeaway trying to learn Rust.


Do you think a purely functional language without GC would be much simpler than Rust and Rust's borrow checker? I don't think the mutation part of Rust is what makes it complicated.


It’s very possible there’s something I’m missing, but my instinct is: yes, it very probably would. If data can be presumed immutable, a whole lot of ownership rules could be relaxed to scope and cycle counts which could be verified at compile time with no runtime penalty. The whole premise that data needs to be borrowed is explicitly to guard against conflicting views of shared state. Such conflicts can only arise by mutability of shared state. Rust’s solution to that is to limit mutability to only one part of a procedure at a given time. If nothing can mutate state at any point in a program’s lifecycle, that tradeoff can be expanded to basically whatever facilities the hardware/compile target affords.


Ada has better tools for the task if mutability is allowed.


Yes totally, I keep seeing people mention these arena / flattened tree and graph structures, without mentioning memory safety.

And also weird claims that arenas solve memory safety problems in C, when it's equally likely (depending on the program) that they CAUSE dangling pointers, use-after-free, etc.

The same issue comes up in slightly different ways in both C/C++ and Rust

---

My comment on this post from 2 months ago:

https://old.reddit.com/r/ProgrammingLanguages/comments/1350d...

Summary: the upsides are very real, but we should mention the downsides too:

- Arenas punt on memory safety; Ownership can be nontrivial (a bunch of examples)

- Mutation, and appending to list/vectors are complications

- Pointer representations are more friendly to debuggers

My wiki page is linked at the bottom of this article (which I appreciate because it actually has code and measurements!)

https://github.com/oilshell/oil/wiki/Compact-AST-Representat...


> when it's equally likely (depending on the program) that they CAUSE dangling pointers

Packing your objects into their own tight heap not only doesn't solve memory allocation issues, but makes them harder to debug. The tools for fighting these problems in C don't work as well.

In TXR Lisp, I have to have a number of strategies in place for debugging GC issues.

One is Valgrind integration. If you want to use Valgrind, but have implemented your own bump allocation within packed heaps, Valgrind won't be as helpful.

For example, what is a semantic use-after-free in your world (something accessing an object that has been garbage collected) looks fine to the C library; you're just dereferencing a pointer into a large object that you allocated just fine.

With Valgrind integration, you can use the API to mark freed objects inaccessible. But: that only means Valgrind will detect the wrong use of a reclaimed object (quite swiftly, thank you very much). It will not tell you who allocated that object. The diagnostic will be something like "invalid read, 15300 bytes into a 262164-byte object, allocated at <call stack>". That is not very useful! It gives you the call stack when that entire heap was allocated, not when that misused object within that heap was allocated.

I have some debug support which takes advantage of reproducible test cases where we can count on the addresses of bad objects being the same. Once I know the address of the offending object, I can put it into a debug variable called break_obj. When the object is allocated or reclaimed, there is a VALGRIND_PRINTF which will dump a trace. I can also get a breakpoint in gdb (that's why it's the break_obj).

There is also an option to run the GC in a kind of torture mode where garbage collection is invoked on every allocation. Newly allocated objects that are not properly retained by the caller (made visible to gc) will be scavenged immediately, revealing those kinds of bugs by bringing the allocation and misuse contexts close together. A substantial portion of the test suite runs in this GC torture mode. Not everything, because it's quite slow.


Yes, I mentioned the same issue a few days ago -- arenas break ASAN/Valgrind, and those are the BEST tools to ensure memory safety:

https://news.ycombinator.com/item?id=36427264

In response to a claim about arena-based allocators and memory safety


Maybe Valgrind has an API now to attach a backtrace to a custom allocation out of an arena. I haven't touched that in years. I did the initial Valgrind integration in 2009 and just maintained it with changes to the gc.

I see this in the current manual:

VALGRIND_MALLOCLIKE_BLOCK:

If your program manages its own memory instead of using the standard malloc / new / new[], tools that track information about heap blocks will not do nearly as good a job. For example, Memcheck won't detect nearly as many errors, and the error messages won't be as informative. To improve this situation, use this macro just after your custom allocator allocates some new memory. See the comments in valgrind.h for information on how to use it.

I can't remember whether I saw this before; i.e. is this something existing that had not been useful, or is it something new. I will look into it.

It does look like it should be attacking the same problem: that errors aren't detected well, and aren't nearly as informative, in blocks carved out of a larger block by custom allocators.


Alas, I've not been able to get it working. Possibly, the issue is that in my heaps, I do not have red zones; things are tightly packed. That's okay because the heap cells do not store arrays. I'm not concerned about overruns whatsoever.

When I go through the hoops of registering each heap as a memory pool, which supports malloc-like allocations doled out of it, problems happen.

When cell [0] of a heap becomes garbage and is indicated as freed (reclaimed by GC), Valgrind throws an invalid free/delete/delete[] error, saying there is a conflict between that and the entire block. I think this is because there is no offset between the block and the start of the heap. Valgrind assumes there will be some header or red zone between the entire memory zone and the pool area. It seems to be saying, whoa, you can't do VALGRIND_FREELIKE_BLOCK at that address; you allocated that address from malloc. Well no shit, and I told you that block was a heap, from which I'm getting VALGRIND_MALLOCLIKE_BLOCK blocks.

Secondly, after the first GC cycles, errors go haywire, like "invalid read of 2 bytes: 0 bytes after 16 byte block allocated at ... blah blah". Basically, it thinks that various accesses are overruns, possibly due to there being no red zones between the blocks. It's quite baffling.

Maybe current Valgrind has fixes for some of this; I just use what distros package.


Pointers are just indexes into memory space. Seems to me that if Rust is trying to solve a general programming problem the borrow checker needs to handle all indexes, not just those specialized for memory space.


I feel that for this to be really viable, Rust would have to allow references that consume less memory (like 8-bit or 16-bit pointers).


This is also how Prolog terms are represented on the heap in the Warren Abstract Machine (WAM). For instance, taking the example of the article, if we have an expression such as the Prolog term +(*(a,b), c), written using operator notation as:

    expr(E) :-
            E = a*b + c.
Then we get a flattened representation on the global stack of the virtual machine. In Scryer Prolog, we can inspect the WAM instructions with:

    ?- wam_instructions(expr/1, Is),
       maplist(portray_clause, Is).
yielding:

    put_structure(*,2,x(3)).
    set_constant(a).
    set_constant(b).
    put_structure(+,2,x(2)).
    set_value(x(3)).
    set_constant(c).
    execute(=,2).
Note how both compound terms are linearized, and appear on the heap as: functor, followed by arguments, each occupying exactly one memory cell of the WAM. The arguments can point to other memory cells. The heap is an array of such cells, all of the same concrete (as opposed to abstract, i.e., WAM-level) type. For example, Scryer Prolog uses 8 bytes for each cell, making cell access and modification very efficient on 64-bit architectures.


> Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array. Instead of referring to children via pointers, Exprs will refer to their children using their indices in that array.

This isn't flattening; it's just an alternative heap representation. The shape of the AST hasn't changed. It's been done in many languages before, such as in Lisps (packing cons cells and other objects into arrays, using bump allocation, indices for pointers, and such).

Objects being in an array makes them convenient for GC to traverse in a sweep pass after the marking is done. Marking walks the graph to find the reachable objects; sweep goes through the flat arrays to clear the GC bits and indicate unreachable objects for recycling.

I don't think you will easily find a semi-serious Lisp implementation that just mallocs every cons cell individually. You'd have to put them into a global linked list to be able to do the sweep part of GC. Or a global array that just contains pointers. I've seen at least two toy Lisp projects someone banged up in one weekend in which cons cells were malloced and leaked; GC was left as a giant TODO.

Global arrays can end up happening anyway, even if cells come from a packed array heap. E.g. if you implement generational GC in a non-copying allocator, one way to round up the baby objects so you can sweep them in a fast GC cycle is to add them into an auxiliary array; that then represents the nursery.


Good article, but I'll mention two gotchas:

1. Storing nodes in a resizable array means as the input program grows the compiler will need a larger and larger block of contiguous memory (which may or may not be available). You could work around this by allocating page-sized blocks to pool from.

2. Care needs to be taken with how AST nodes are represented in code. For example, using a union type to store nodes is self-defeating as a union type is as large as its largest member and since not all AST nodes are equal size, this means small AST nodes will be pointlessly padded to account for the size of the largest AST node.
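The enum-size effect in gotcha 2 is easy to demonstrate with `std::mem::size_of`; the node types below are made-up illustrations, and the side-array workaround is one common mitigation, not the only one:

```rust
use std::mem::size_of;

// A tagged union is as big as its largest variant, so every tiny node
// pays for the biggest one.
enum FatNode {
    Ident,                                // needs no payload...
    Call { callee: u32, args: [u32; 8] }, // ...but this variant sets the size
}

// One workaround: keep variable-size payloads in a side array and store
// only a (start, len) slice descriptor inline.
enum SlimNode {
    Ident,
    Call { callee: u32, args_start: u32, args_len: u32 },
}

fn main() {
    // FatNode must hold the 36-byte Call payload plus a discriminant;
    // SlimNode gets by with three u32s plus a discriminant.
    assert!(size_of::<FatNode>() >= 36);
    assert!(size_of::<SlimNode>() < size_of::<FatNode>());
    println!("FatNode: {} bytes, SlimNode: {} bytes",
             size_of::<FatNode>(), size_of::<SlimNode>());
}
```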


> small AST nodes will be pointlessly padded to account for the size of the largest AST node.

This is a great point, and something I mentioned in my blog post on custom-bitwidth integer types: https://alic.dev/blog/custom-bitwidth

Discriminated unions can work, but you have to be clever with your memory footprint.


Sometimes I wish we didn't end up with flat address spaces. The "virtual memory" technique allows us to stitch fragmented regions of physical memory into a contiguous region of virtual memory and if virtual address space was segmented, it simply would not ever become fragmented; any memory region could always be grown in-place, without intersecting with any other region, so realloc() implementations could just drop their memcopy() paths... oh well.


Can be mitigated by a rope(?) structure (i.e. the vector is split into chunks of fixed size, no re-allocation required) with real pointers. It will be unsafe inside, but safe (read-only) interface seems to be possible.

Deallocation will be O(n), but still much faster than a tree.


I was surprised to see nodes still have two pointers ("references") given that you now know that the first pointer will always point exactly to the next node. I've seen https://github.com/rswier/c4 use that. Granted, it doesn't make for the most readable code, but it's even smaller and faster.


When I think of "arena" related to memory management, it's not about "flattening", but just an arena allocator ... When you're allocating a bunch of items with the same lifetime, you can more efficiently allocate them from one or more large chunks of memory, and then at the end of their collective lifetime you free the large chunk(s) rather than freeing the items individually. The allocation can also be more efficient since you can just incrementally use the space in the parent chunk - no need for free-lists like a general purpose heap allocator.

In this context, the "flattening" aspect of this - using indices rather than pointers - could be viewed just as using pointers (offsets) that are relative to the parent chunk.
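That framing can be sketched directly: a bump arena whose "pointers" are offsets relative to the start of the chunk, with no free list and no per-item free. The names below are my own illustration:

```rust
// Bump arena: allocation is appending; freeing is dropping the whole
// arena at once. An Offset is a chunk-relative "pointer".
struct Arena {
    chunk: Vec<u8>,
}

#[derive(Clone, Copy)]
struct Offset(usize);

impl Arena {
    fn with_capacity(n: usize) -> Self {
        Arena { chunk: Vec::with_capacity(n) }
    }

    // Bump allocation: return the old length as the offset, then append.
    fn alloc(&mut self, bytes: &[u8]) -> Offset {
        let off = Offset(self.chunk.len());
        self.chunk.extend_from_slice(bytes);
        off
    }

    fn get(&self, off: Offset, len: usize) -> &[u8] {
        &self.chunk[off.0..off.0 + len]
    }
}

fn main() {
    let mut arena = Arena::with_capacity(1024);
    let hello = arena.alloc(b"hello");
    let world = arena.alloc(b"world");
    assert_eq!(arena.get(hello, 5), b"hello");
    assert_eq!(arena.get(world, 5), b"world");
    // Everything is freed together when `arena` goes out of scope.
}
```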


I used such a succinct AST structure to implement a JavaScript parser and interpreter for a severely memory constrained environment (embedded): V7 (https://github.com/cesanta/v7)

We later switched to a ast->bytecode compilation step but for a while the implicit AST was directly traversed during interpretation.


I did something similar for my YAML to SQL compiler at https://yaml2sql.netlify.app.

The process of flattening is kinda weird, but fun, and worth the effort after all.

For example, the Boolean expression flattening is a good exercise for anyone who wants to try it out: https://github.com/revskill10/yaml2sql/blob/main/app/query.r...


Flattened memory seems rather like normal memory, i.e. it's linear and addressed by an offset. So you could "just" use a custom memory allocator that didn't bother with all the design considerations that are needed to be able to free memory and prevent fragmentation.

It wouldn't get you the benefit of using 16-bit indices or whatever but it should still be helpful and might let you write your program very "normally".


I'm using this technique in a compiler [1]. One thing I really like about it is that, unlike pointers, I can reuse the integer indices (`ExprRef` in the article) between arrays. So, for example, I have separate arrays for the AST and computed types, which allows the AST to be immutable. (perhaps the article already mentions this advantage, but I didn't see it in the list of advantages)

[1] https://github.com/audulus/lyte
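The parallel-arrays idea described above, in a minimal sketch (my own names, not the linked compiler's): the AST pool stays immutable, and a later pass fills a second array indexed by the same `ExprRef`:

```rust
#[derive(Clone, Copy)]
struct ExprRef(u32);

enum Expr {
    Int(i64),
    Bool(bool),
    Add(ExprRef, ExprRef),
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum Type { Int, Bool, Error }

// The AST array is never mutated; inferred types go in a parallel array
// that reuses the same indices. Children are pushed before parents, so
// one forward pass suffices.
fn infer(ast: &[Expr]) -> Vec<Type> {
    let mut types = vec![Type::Error; ast.len()];
    for (i, e) in ast.iter().enumerate() {
        types[i] = match e {
            Expr::Int(_) => Type::Int,
            Expr::Bool(_) => Type::Bool,
            Expr::Add(a, b) => {
                if types[a.0 as usize] == Type::Int && types[b.0 as usize] == Type::Int {
                    Type::Int
                } else {
                    Type::Error
                }
            }
        };
    }
    types
}

fn main() {
    // 1 + 2, and a type error: 1 + true
    let ast = vec![
        Expr::Int(1),
        Expr::Int(2),
        Expr::Add(ExprRef(0), ExprRef(1)),
        Expr::Bool(true),
        Expr::Add(ExprRef(0), ExprRef(3)),
    ];
    let types = infer(&ast);
    assert_eq!(types[2], Type::Int);
    assert_eq!(types[4], Type::Error);
}
```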


Interesting. When I read the title, I thought "flattening ASTs" would refer to eliminating control structures (ifs and loops) and converting those to a flat list of instructions with jumps (gotos) between them.

And the reason I thought that is because I used this term, "flattening ASTs", when implementing "bytecode compilation" for EndBASIC (see https://jmmv.dev/2022/11/endbasic-bytecode.html). This kind of flattening had the nice side-effect of unlocking the ability to implement GOTO as well as the ability to more-accurately capture and handle "interrupts" in the language executor.

Anyhow, I'll have to keep this article in mind when I end up implementing flattening for expression evaluation as well, which I haven't gotten around to yet :P


AFAIK, many BASIC implementations worked in-place: each input line was replaced by its tokenization, then by the bytecode, then the labels would be resolved. Consequently, printing the program listing was not a matter of simply printing all earlier input lines (those are gone); it involved actual pretty-printing.

In fact, replacing the program text with its tokenization as the first compilation step was a very popular strategy back in the day, due to memory constraints. IIRC the early IBM FORTRAN compilers, being quite large (they performed quite a lot of optimizations), would not even fit into core memory together with the source code of any reasonably useful program. So they were instead written as a series of passes (about fifty or so, I believe), each of which took the output of the previous pass and transformed it into the input for the next one; the tokenizer was quite tiny but it freed a lot of space (text is quite redundant), so the next passes had room for the auxiliary structures and could themselves be larger, too.


Right, thanks. That’s more than I knew about how the old interpreters worked, although I suspected something along these lines.

This explains why I have had so much trouble representing the language as an AST, and I tried to cover this in this other post: https://jmmv.dev/2023/01/endbasic-parsing-difficulties.html


The same ideas are used in Nim's incremental compilation (WIP): https://github.com/nim-lang/Nim/tree/devel/compiler/ic. There are also two JSON implementations with a flat architecture: https://github.com/Araq/packedjson and https://github.com/planetis-m/packedjson2 (this one I wrote). But I agree it's a pain to write it like that.


One more JSON implementation using this approach is https://github.com/zserge/jsmn.


There was some work on having flattened JS as bytecode. It never got much adoption but I believe it would have made the web a lot more efficient.

https://github.com/tc39/proposal-binary-ast


Bellard's LibNC library uses flattening structures to lay out tensors.

https://bellard.org/libnc/libnc.html


Reminds me of tree_flatten/tree_unflatten from pytrees (jax). Just heard about co-free traversable functors but I don’t quite have the expertise to say whether they are also this


I haven't seen RPN inside a compiler yet, but that's actually not a bad idea. Interesting article.


Aren't stack machines the bread and butter of many intermediate representations? Or is that more of an interpreter thing than a compiler thing?


They are for executable formats.

On compiler IR, the modern way is to use SSA rather.



