
Using Lua as a Serialization Format - fish45
https://mkhan45.github.io/2020/06/16/using-lua-as-a-serialization-format.html
======
ufo
Fun fact: Lua started its life as a language for representing data like this.
At first it only had lists and hash tables but over time it gained support for
things like if statements and loops, to be able to describe more interesting
data. Eventually it morphed into the Turing complete programming language that
we know today :)

[https://www.lua.org/history.html](https://www.lua.org/history.html)

The current version of Lua still has some traces of that origin. For example,
to keep RAM use manageable when parsing very large files the parser directly
emits bytecode in a single pass, without building an intermediate abstract
syntax tree.

~~~
sillysaurusx
Ironically, your username is one of the neatest Lua[JIT] projects I've seen.
[https://github.com/malkia/ufo](https://github.com/malkia/ufo)

~~~
akiselev
Unspoken HN rule #27: There's a 75% chance this isn't a coincidence.

~~~
Macha
Though the person who wrote that library has a different HN account:
[https://news.ycombinator.com/item?id=1643658](https://news.ycombinator.com/item?id=1643658)

------
motogpjimbo
This isn't an argument for putting a Turing-complete language in your config
file parser. It's an argument for documenting your config file format so your
users can write scripts (in a language of their choice) to generate config
files programmatically.

It seems cute to do it this way, but now you have to start talking about
removing certain functions from the standard library, running the script in a
sandbox, etc.

~~~
joppy
What is your opinion on config files used by the Bazel build system, written
in a programming language called Skylark [1] (much like Python)? I haven't
looked into it but it seems plausibly Turing-complete, or at least powerful
enough to cause problems in the same way a Turing-complete language would.

[1]:
[https://docs.bazel.build/versions/3.3.0/skylark/language.htm...](https://docs.bazel.build/versions/3.3.0/skylark/language.html)

~~~
laurentlb
For the record, Starlark has some restrictions: no infinite loops, no access
to the world, etc.

It's true that computations could be arbitrarily long (as with most other
languages). Bazel has a flag to reduce this kind of problem
(--max_computation_steps).

------
flohofwoe
About 20 years ago, I used Tcl as object serialization format in a game
engine, and then made the next mistake to allow injecting manually written
'scriptlets' into this serialization format (after all, serialized objects
were just Tcl scripts).

These two things combined were probably the biggest software design mistake I
ever made. We still have to drag a stripped down Tcl interpreter along in our
asset pipeline 20 years later because it's nearly impossible to switch away
(because of the manually written 'scriptlets' lurking everywhere in asset
source files - of course it _would_ be possible, but it was never on the right
side of the cost-benefit equation).

Never mix code and data, folks :)

~~~
enriquto
A stripped-down Tcl interpreter is one of the tiniest and least harmful
dependencies you can have nowadays. I do not see a problem with that.

And code is data, and has always been.

~~~
kstenerud
Until someone sends serialized code-data like:

    
    
    for i = 1, 1000000 do
        for j = 1, 1000000 do
            for k = 1, 1000000 do
                for l = 1, 1000000 do
                    ...
    

Once you move your publicly injectable data format from declarative to
something else, you're in for a world of hurt.

~~~
nmadden
Tcl has had sandboxed interpreters with resource limits for a long time [1].
You can even remove built-in commands (for/while etc) if you want to. You can
limit the language to a purely declarative subset and selectively expose
commands and resources as you wish.

[1]:
[https://www.tcl.tk/man/tcl8.5/TclCmd/interp.htm#M46](https://www.tcl.tk/man/tcl8.5/TclCmd/interp.htm#M46)
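
For comparison, a rough Lua analogue of such a resource limit can be sketched with the standard `debug.sethook` count hook. This is only an illustration, assuming stock PUC-Lua; `run_with_budget` and the budget value are hypothetical names, not part of any library. As far as I know, LuaJIT does not call hooks from JIT-compiled code, so this would not be enough there without turning the JIT off for the untrusted chunk.

```lua
-- Sketch: stop an untrusted chunk once it has used up an instruction budget.
-- run_with_budget is a hypothetical helper written for this example.
local function run_with_budget(source, budget)
  local env = {}  -- empty environment: the chunk sees no globals at all
  local chunk, err
  if setfenv then
    -- Lua 5.1 style: compile, then swap the environment
    chunk, err = loadstring(source, "untrusted")
    if chunk then setfenv(chunk, env) end
  else
    -- Lua 5.2+: mode "t" accepts text only, env replaces the globals
    chunk, err = load(source, "untrusted", "t", env)
  end
  if not chunk then return false, err end

  -- Run in a coroutine so the hook applies only to the untrusted code.
  local co = coroutine.create(chunk)
  -- Count hook: called after `budget` VM instructions; the error unwinds the chunk.
  debug.sethook(co, function()
    error("instruction budget exceeded", 0)
  end, "", budget)
  return coroutine.resume(co)
end

print(run_with_budget("return 1 + 1", 1000))
print(run_with_budget("while true do end", 1000))
```

The budget bounds VM instructions only. Memory use, and time spent inside any C functions exposed to the chunk, would need separate limits.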

------
jmiskovic
The author is surprised by the convenience of describing data in Lua. I think
he's in for another surprise if he re-implements the simulation on the Lua
side. LuaJIT is so fast that I can just code the most naive implementation and
never go back to refine it. It's the simplest fast language and, to me, the
optimal choice for fully utilizing a single CPU core.

~~~
fish45
Well, I've been super impressed by LuaJIT in the past, but given that the
simulation is pretty highly multithreaded, I don't think a Lua version would
do so well in comparison.

~~~
metroholografix
It's trivial to have multiple Lua states running in different threads in the
same process. Messaging and data exchange between those states is easy and
LuaJIT gives you a lot of tools to make it painless.

------
thurn
Lisp programmers realized in the 70s that having an artificial division
between code and data was frequently problematic, hopefully other languages
continue to catch on as well :)

~~~
kiwidrew
On the other hand, it's often useful to have data and configuration files that
_aren't_ Turing complete, because then you can safely make assumptions about
how much time and space is required to load/parse them.

Unfortunately every configuration file format these days seems to add so many
features that it inevitably becomes Turing-complete... the trick is finding
the balance between expressiveness and halting decidability.

~~~
lmm
You might like to look into Dhall - a config language that has been careful to
remain non-Turing-complete.

~~~
kiwidrew
Wow, cool, that is a really nifty language!

I don't care for the syntax (at all) but I am seriously impressed by the
careful design. I've always wondered where you'd need to draw the line between
plaintext config files and executable code, and Dhall is much further into
"programming language" territory than I would've expected.

*Dhall homepage for anyone else that's interested - [https://dhall-lang.org](https://dhall-lang.org)

------
ReactiveJelly
I tried this on 1 or 2 small projects, years ago.

It was okay. It's cool that you can import data directly into Lua scripts,
since Lua usually has poor library support for any other data format.

But I hit a wall with a limit on the number of tables in a file or something,
that was specific to the Lua impl I was using. (Probably LuaJIT)

Sometimes I miss Lua. I think with only a few big breaking changes it could be
a much better language.

~~~
scruple
I used Lua as a data exchange format for some embedded devices (software
defined radios) because it played nicely with the C code and the Lua scripts
we would run on the device (which served as a sort of interface bridge between
hardware device drivers). The Lua binary can be compiled down to < 100kB and
the language itself is fast enough for a scripting language to have been
useful in this context (I don't know if that still holds today, I left
embedded in 2014).

It was a really pleasant experience but we were not working with a very large
number of tables. I miss Lua all of the time, too, and play around with it
whenever the opportunity arises. It holds a really special place in my
programmer heart, alongside lisp, C, and Ruby, for having shown me some really
interesting things at the right time in my career. It's too bad it doesn't
have more widespread adoption, I think a lot of people could benefit from
using it for the things it's good at.

~~~
rstupek
There are a number of places it's being used today. There's an nginx module
with an entire web front end development environment (OpenResty), Cloudflare
uses the nginx Lua module heavily, and you can develop cross-platform (iOS,
Android, web) games in Solar2D.

~~~
scruple
I'm aware, but I've never seen a shop I've consulted for or any of my
employers use it. I've tried to get it in use a few times and ultimately it
went nowhere because I would be the only programmer in a department with >
100 FTEs who would be able to work on it or maintain it unless some others
took the time to learn it. It's been a non-starter as a result.

When I ask around, I don't hear about it in use locally (edit: that's actually
not entirely true, Blizzard uses it and maybe some of the other local game
shops, but I don't see it anywhere else locally). I met someone at a
conference a year or two back who was using Kong. That's been it. Maybe I'm
just missing it but I don't really see widespread use.

~~~
vvanders
It never really saw adoption outside of games from what I've seen.

It's such a simple but powerful language, has one of the best interop APIs with
C and I love how they approached coroutines.

It's really a shame it doesn't see wider use. We used it to ship all our game
logic on the PSP, 400kb prealloc block and it worked like a charm.

~~~
zwirbl
It's used in CHDK, the Canon custom Firmware Project for user definable
scripts [https://chdk.fandom.com/wiki/CHDK](https://chdk.fandom.com/wiki/CHDK)
Besides this I've rarely encountered Lua in the wild

Edit: And of course in Magic Lantern, for Canon EOS DSLRs
[https://magiclantern.fm/](https://magiclantern.fm/)

------
smcameron
My hobby project, a space game (Space Nerds in Space) allows for Lua
scripting, and it will allow exporting (most of) the state of the game to a
lua script which, when run, restores that state. Ironically, the state of
potentially running Lua scripts is the main thing stopping me from being able
to checkpoint the entire state of the game.
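
To illustrate the general idea of exporting state as a script that rebuilds it, here is a minimal sketch in pure Lua. It is not how Space Nerds in Space does it; `serialize` and the `state` fields are made up for the example. It assumes the data contains only numbers, strings, booleans and nested tables without cycles.

```lua
-- Sketch: turn a table into Lua source text that rebuilds it when run.
-- serialize is a hypothetical helper; functions, userdata and cyclic
-- tables are deliberately not supported.
local function serialize(value, indent)
  indent = indent or ""
  local t = type(value)
  if t == "number" or t == "boolean" then
    -- tostring may lose float precision and cannot express inf or nan
    return tostring(value)
  elseif t == "string" then
    return string.format("%q", value)  -- %q produces a safely quoted literal
  elseif t == "table" then
    local keys = {}
    for k in pairs(value) do keys[#keys + 1] = k end
    -- sort keys so the output is stable between runs
    table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
    local parts = {}
    for _, k in ipairs(keys) do
      parts[#parts + 1] = string.format("%s  [%s] = %s,",
        indent, serialize(k), serialize(value[k], indent .. "  "))
    end
    return "{\n" .. table.concat(parts, "\n") .. "\n" .. indent .. "}"
  end
  error("cannot serialize a " .. t)
end

-- Round trip: write the state out as source, then run it to restore.
local state = { name = "ship", pos = { 1, 2 }, shields = true }
local source = "return " .. serialize(state)
local restored = (loadstring or load)(source)()
print(source)
print(restored.name, restored.pos[2], restored.shields)
```

Anything that is not plain data, such as a running coroutine, has no source-text form here, which is the same obstacle described above.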

~~~
dividuum
I solved the state for a Lua project of mine by replaying all external input
and making the game itself only depend on that input. Of course that's not
viable in most cases, but it helped me build
[https://geolua.com](https://geolua.com). You can restore a game from 6 years
ago almost instantly.
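
The replay approach can be sketched roughly as follows. This is an assumption-laden toy, not the geolua implementation: `apply_input`, `replay` and the event shapes are invented for illustration. The point is that the state is never saved, only the ordered input log, and the state is a deterministic function of that log.

```lua
-- Sketch: rebuild game state purely by replaying recorded external input.
-- apply_input, replay and the event fields are hypothetical.
local function apply_input(state, event)
  if event.kind == "move" then
    state.x = state.x + event.dx
    state.y = state.y + event.dy
  elseif event.kind == "score" then
    state.score = state.score + event.points
  end
  return state
end

local function replay(log)
  -- Always start from the same initial state so replays are reproducible.
  local state = { x = 0, y = 0, score = 0 }
  for _, event in ipairs(log) do
    state = apply_input(state, event)
  end
  return state
end

local log = {
  { kind = "move", dx = 2, dy = 1 },
  { kind = "score", points = 10 },
  { kind = "move", dx = -1, dy = 4 },
}
local state = replay(log)
print(state.x, state.y, state.score)
```

This only works if nothing else feeds into the state, for example wall-clock time or unseeded random numbers, which is why it is not viable in most cases.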

------
Impossible
My game, Shadow Physics, used Lua as a serialization format as a side effect of
the game initially having no level editor or level import capabilities, only
Lua scripts to create levels procedurally. Copy and paste was also implemented
using Lua; a fun side effect of this was that you could copy and paste
arbitrary Lua code as a live coding technique.

------
OJFord
OP if you're the author, FYI first link to previous post ('this is a follow-up
to') is broken:

[https://mkhan45.github.io/2020/06/12/Lua-integration.html](https://mkhan45.github.io/2020/06/12/Lua-integration.html)

Edit: Ah, 'Lua' should be lower-case.

~~~
fish45
thanks, I've fixed it

------
memexy
This is cool. Props to the author for thinking in original ways. Most people
would have reached for a static solution like templating markup or a
serialization format like YAML. Using a real programming language is way
better.

------
rurban
A warning: Don't think that this is a good idea. It's small, it's fast, it's
elegant. But it's totally insecure.

Everybody who has access to this format can take it over. It can do far too
much; that's why everybody uses JSON. It can only represent hashes, arrays,
and some limited primitives. No links, no logic, no lib. But Lua is far too
powerful. Game over.

Lisp would be even better, btw. Much easier to parse, and much easier to avoid
eval. But his idea is eval.

~~~
memexy
You can always sandbox the process if you don't trust the input. Ruby has
taint checking built-in and most operating systems have lightweight sandboxing
solutions. Docker is another way to go. You can just execute the script in a
container and then grab standard output as the configuration.

~~~
plorkyeran
A much simpler thing to do is to simply disable all of the standard library
functions which allow i/o. Lua has functionality specifically intended for
limiting the environment available to a script, and the standard library is
small enough that doing this is actually practical.
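
A minimal sketch of that approach, assuming the data file is a Lua chunk that returns a table: `load_config` and the exact whitelist are hypothetical choices for this example. It uses `setfenv` on Lua 5.1 and LuaJIT, and the `env` argument of `load` on Lua 5.2 and later.

```lua
-- Sketch: run a data file with a whitelisted environment instead of _G.
-- load_config and the choice of exposed functions are assumptions.
local function load_config(source)
  -- Refuse precompiled bytecode, which starts with the ESC byte (27).
  if source:byte(1) == 27 then
    return nil, "binary chunks are not allowed"
  end
  -- Only pure helpers are visible: no io, os, require, load or debug.
  -- math is copied so the script cannot modify the host's math table.
  local env = {
    math = { floor = math.floor, sqrt = math.sqrt, pi = math.pi,
             sin = math.sin, cos = math.cos },
    ipairs = ipairs, pairs = pairs,
    tostring = tostring, tonumber = tonumber,
  }
  local chunk, err
  if setfenv then
    chunk, err = loadstring(source, "config")     -- Lua 5.1 / LuaJIT
    if chunk then setfenv(chunk, env) end
  else
    chunk, err = load(source, "config", "t", env) -- Lua 5.2+
  end
  if not chunk then return nil, err end
  local ok, result = pcall(chunk)
  if not ok then return nil, result end
  return result
end

local cfg = load_config("return { radius = math.floor(2.7) }")
print(cfg.radius)
print(load_config("return os.getenv('HOME')"))
```

This removes I/O but does not bound CPU time or memory, and string methods remain reachable through the string metatable, so it addresses only part of the objection above.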

~~~
memexy
I remember playing around with Lua but didn't look too closely at its
sandboxing facilities; what you say makes sense.

------
mhd
I used Tcl for that in the past. Allows for some pretty powerful
configuration.

For pure "serialization", this is a bit overkill, of course.

------
skocznymroczny
I feel like the RON format is just too verbose, and JSON would be comparable
to your Lua example. As long as you don't serialize stuff like polymorphic
classes (and you don't have those in Rust anyway), you don't need to put the
name of every object into the serialization file.

~~~
fish45
I somewhat agree, but I feel that being able to do math/logic with Lua makes
it superior for this use case since I'm not worried about security

