Hacker News
“Little Languages” by Jon Bentley (1986) [pdf] (edu.mt)
78 points by akkartik 4 months ago | 17 comments

Counterpoint: "A Universal Scripting Framework, or: Lambda: the ultimate “little language”"

"Abstract. The “little languages” approach to systems programming is flawed: inefficient, fragile, error-prone, inexpressive, and difficult to compose. A better solution is to embed task-specific sublanguages within a powerful, syntactically extensible, universal language, such as Scheme. I demonstrate two such embeddings that have been implemented in scsh, a Scheme programming environment for Unix systems programming. The first embedded language is a high-level process-control notation; the second provides for Awk-like processing. Embedding systems in this way is a powerful technique: for example, although the embedded Awk system was implemented with 7% of the code required for the standard C-based Awk, it is significantly more expressive than its C counterpart."


I'm torn. I think most polyglot projects I deal with are overly complicated. However, I also feel most embedded languages I deal with are overly complicated.

Some have clearly worked out. Format strings, for example, are much more appealing to me than the equivalent code. Not even getting into the crazy abilities of Common Lisp. (Seeing what you can accomplish just from the format string, I really wonder why it isn't more widespread.)

And regexes, despite all of the well-justified jokes about them being problematic, are often much better than the code that would replace them. It is a shame there is not a BNF equivalent, sometimes.
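To make the regex point concrete, here is a minimal sketch (not from the thread) that validates the same thing twice: once with a pattern, once with the code that would replace it. The YYYY-MM-DD shape and both function names are illustrative assumptions, and the check is purely syntactic (it does not reject month 13).

```python
import re

# Hypothetical task: does the string have the shape YYYY-MM-DD?
# [0-9] is used instead of \d so both versions accept ASCII digits only.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def looks_like_date_regex(text: str) -> bool:
    """The 'little language' version: one pattern, one call."""
    return DATE_RE.fullmatch(text) is not None


def looks_like_date_by_hand(text: str) -> bool:
    """The equivalent host-language code, written out character by character."""
    if len(text) != 10:
        return False
    for index, char in enumerate(text):
        if index in (4, 7):
            if char != "-":
                return False
        elif char not in "0123456789":
            return False
    return True


for candidate in ["1986-08-01", "1986-8-1", "1986/08/01", "abcd-ef-gh"]:
    # Both implementations should always agree.
    assert looks_like_date_regex(candidate) == looks_like_date_by_hand(candidate)
    print(candidate, looks_like_date_regex(candidate))
```

The hand-written version is not hard, but every change to the format means editing loop logic rather than a single pattern string.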

But it seems most "inner languages" I have had the pleasure to deal with personally have not truly helped. My suspicion is that my colleagues and I just tried to do it all. Knowing when to say "not supported" in a toy language is tough. And the penalty for not doing so is catastrophic, it seems.

Agreed. Regexps and format strings are cool. Regarding small, dedicated languages, I also like SQL and (to a lesser extent) CSS selectors.

I think the LittleLanguages [0] should be further subdivided into:

* languages for describing something (e.g. regular expression, format strings, graph .dot format, LaTeX math equations, etc.) that are usable both from standalone UNIX tools, and from inside programming languages
* languages with a dedicated tool (awk, etc.) that are not widely available embedded inside other programming languages. Usually these languages allow you to perform some actions / transformations

I think both papers agree that the former is "good" (the reimplementation of awk in Scheme uses regular-expressions).

The counterpoint paper turns a LittleLanguage (Awk) into an Embedded Domain Specific Language by embedding it inside Scheme.

SQL is more difficult to categorize: I think the SELECT expression should be considered a LittleLanguage, however the UPDATE/INSERT part is better when having a full programming language available (see procedural languages inside SQL / triggers, which hint at the limits of the base SQL language).

[0] http://wiki.c2.com/?LittleLanguage

I'm not that fond of fancy strings myself, or things like CL's LOOP macro. I prefer real code with composable abstractions. The problem in my mind is that they're not integrated into the host language, and therefore don't compose well with the rest.

But that being said, using a simple embedded glue/scripting language that's fully integrated into the host environment makes a lot of sense to me. Emacs Lisp, AutoCAD, any game engine, etc. It's a tried and proven strategy.

I started writing my own glue language [0] for C++ since I couldn't find anything simple and integrated enough for my taste.

[0] https://github.com/codr4life/snabl

Is Lua not good enough for C++?

Lua is written in C, which is a very different language these days.

Snabl uses the C++ standard library all the way through, which leads to a tighter fit in C++ applications.

And I have enough issues with some of Lua's design choices to rule it out in either case. Everything is most definitely not a table, and I want an expressive, generic and gradual type system built in.

Cool. Was just curious about your thoughts on it.

> It is a shame there is not a BNF equivalent, sometimes.

Yeah, something like instaparse [1], but more widespread.

[1] https://github.com/Engelberg/instaparse

Looks great! Amusingly, often I don't care about the parse tree. Just want to know that something validated.

Another successful example of language layering is Python, which sits atop many C and Fortran libraries (and must do so for efficiency).

I can't believe this paper has never been submitted to HN.

(As the submitter of OP, I'd somehow never seen OP or this until today.)

Here is a PDF link found using Google Scholar: https://3e8.org/pub/scheme/doc/Universal%20Scripting%20Frame...

Or, as a pragmatic alternative, "Tcl: An Embeddable Command Language" [0].

[0] https://web.stanford.edu/~ouster/cgi-bin/papers/tcl-usenix.p...

Little languages are good for little programs. Once your program grows up, a little language starts holding it back. Your program needs a language for grown-ups. The truth is all little languages are quite primitive, and this is by design. Little languages are designed for a specific narrow purpose. Beyond that narrow purpose they are much less expressive, clunky and inefficient.

Real life tasks often require more than one little language. You can start with a simple text extraction using Awk, then use Sed for text conversion, and glue it all together with Bash. But very soon it becomes hard to maintain and extend this collection of scripts and easier to replace them with one Perl or Python program, which can do all of the above and more.
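As a rough illustration of that consolidation (a sketch, not anyone's actual scripts): suppose the pipeline was something like `awk '/ERROR/ {print $3}' | sed 's/^svc-//'`. The log format, the `svc-` prefix, and the function name below are all made up for the example.

```python
import re


def extract_and_rewrite(lines):
    """Roughly the Awk filter/extract step plus the Sed substitution, in one place."""
    results = []
    for line in lines:
        # Awk step: keep only lines mentioning ERROR.
        if "ERROR" not in line:
            continue
        fields = line.split()
        # Unlike awk, which would print an empty line, skip short records.
        if len(fields) < 3:
            continue
        # Sed step: strip the hypothetical "svc-" prefix from the third field.
        results.append(re.sub(r"^svc-", "", fields[2]))
    return results


# Hypothetical sample input, invented for this sketch.
sample = [
    "day1 ERROR svc-auth timeout",
    "day1 INFO svc-auth ok",
    "day2 ERROR svc-db refused",
]
print(extract_and_rewrite(sample))  # ['auth', 'db']
```

Note that the regex is still in there: moving to one host language does not mean giving up the descriptive little languages, only the glue between separate tools.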

So the bottom line is, use little languages when they suit your task, but know their limitations.

What would you say makes gluing those languages together hard to maintain?

I've found the opposite, where it's difficult to look through the many dark corners of Perl to do something that could have been accomplished instead by a smaller tool.

I've had a lot of luck with little languages that have an escape hatch into the implementation language.

My "little languages" always have atomic programs/expressions whose meaning is given by an implementation in the base language.

Sometimes, e.g., when my little language has an optimizing compiler or some fancy semantics, the atomic programs/expressions need a tighter interface than "does something". In those cases, I write down the full semantics of the language and figure out what lemmas I need to prove about atomic programs/expressions in order to prove whatever property I want about the overall language. I then enforce those lemmas at runtime using asserts (or, in the case of termination, wrapper code that enforces a timeout).

This pretty much only goes wrong when "everything" is factored out into the atomic programs/expressions. I avoid these situations by only using "little languages" when I can do something useful and difficult using the semantics of the little language. Most common examples include: automatic optimization, automatic parallelization, and automatic verification/static analysis.

But even when it does go wrong in this way, it's not the end of the world: if everything is implemented in the atomics, then you just refactor the implementations of the atomic programs/expressions into a library, delete the interpreter/compiler, and write some wrapper code.
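A minimal sketch of what this pattern might look like, under loud assumptions: the two-construct language (`Atom`, `Seq`), the list-to-list semantics, and the "atoms preserve length" lemma are all invented here for illustration and are not the commenter's actual design. The timeout wrapper mentioned for termination is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Atom:
    """Escape hatch: the meaning is whatever the host-language function does."""
    name: str
    fn: Callable[[list], list]


@dataclass(frozen=True)
class Seq:
    """Run `first`, then feed its result to `second`."""
    first: "Program"
    second: "Program"


Program = Union[Atom, Seq]


def run(program: Program, data: list) -> list:
    if isinstance(program, Atom):
        result = program.fn(list(data))
        # Assumed lemma for this sketch: every atom preserves length.
        # If all atoms do, every Seq does too, so every program does.
        assert len(result) == len(data), f"atom {program.name!r} changed length"
        return result
    if isinstance(program, Seq):
        return run(program.second, run(program.first, data))
    raise TypeError(f"not a program: {program!r}")


double = Atom("double", lambda xs: [2 * x for x in xs])
incr = Atom("incr", lambda xs: [x + 1 for x in xs])
drop_first = Atom("drop_first", lambda xs: xs[1:])  # violates the lemma

print(run(Seq(double, incr), [1, 2, 3]))  # [3, 5, 7]

try:
    run(Seq(double, drop_first), [1, 2, 3])
except AssertionError as error:
    print("rejected:", error)
```

The interpreter never looks inside an atom; it only checks the one property it relies on. If all the interesting work ends up inside the atoms, the functions are already plain Python and can be lifted into a library as described.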
