Domain-Specific Language Engineering (2007) [pdf] (psu.edu)
38 points by davidjnelson 892 days ago | 30 comments



The main challenge I've observed with DSLs is maintaining and growing them over time. It's not unusual for a DSL to grow organically based on individual use cases that seem like a good idea, and the end result is a language that's not quite consistent or orthogonal.

"A foolish consistency is the hobgoblin of little minds" is the rallying cry of those who only have to deal with a small number of libraries/languages/tools/systems, and don't realise the true externalities of having to memorise hundreds to thousands of special cases. Small consistent languages are good languages.


To avoid this issue DSLs must be highly modular, which is easily achievable if you're implementing them as eDSLs on top of a meta-language (i.e., mixing DSLs is trivial and interop is not a problem).


I'm not following you. I'm thinking of things like https://github.com/prometheus/prometheus/issues/482 where core elements of a language don't behave in a predictable manner (in this case == has two distinct modes). That's a simple example; more structural issues can be compounded by workarounds and lead to a lot of complexity growing over time.


In order for DSLs not to evolve into a mess they must be made of high quality semantic components, which you simply mix together and alter syntax a bit in order to get a new DSL.

When you develop your DSLs this way the problems you're talking about are nearly impossible to run into.

Essentially, if you build DSLs on top of an elaborate hierarchy of DSLs, every single DSL adds only a tiny bit of functionality which makes it maintainable and protects from evolving into a pile of messy layers.
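
A minimal sketch of the layering idea, written in Python rather than in any of the toolchains mentioned in this thread. The tuple-based AST, the node names (`unless`, `if`, `not`) and the functions `desugar_unless`, `fold_double_not` and `lower` are all hypothetical, chosen only to show how each layer can be a single small AST-to-AST pass:

```python
from functools import reduce

# Hypothetical AST: nested tuples whose first element is the node kind.

def desugar_unless(node):
    """Layer 1: ('unless', cond, body) -> ('if', ('not', cond), body)."""
    if isinstance(node, tuple):
        node = tuple(desugar_unless(child) for child in node)
        if node[0] == "unless":
            _, cond, body = node
            return ("if", ("not", cond), body)
    return node

def fold_double_not(node):
    """Layer 2: ('not', ('not', x)) -> x."""
    if isinstance(node, tuple):
        node = tuple(fold_double_not(child) for child in node)
        if node[0] == "not" and isinstance(node[1], tuple) and node[1][0] == "not":
            return node[1][1]
    return node

# The "hierarchy" is just an ordered list of small passes.
PASSES = [desugar_unless, fold_double_not]

def lower(ast):
    """Run every pass in order, feeding each result to the next pass."""
    return reduce(lambda tree, run_pass: run_pass(tree), PASSES, ast)

program = ("unless", ("not", ("var", "ready")), ("call", "start"))
lowered = lower(program)
print(lowered)
```

Each pass knows about exactly one construct, so adding a construct means adding one entry to `PASSES`. Nothing in this sketch checks that the passes compose soundly; that remains the designer's job.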

If you have any questions about technical details of this approach - feel free to ask.


> they must be made of high quality semantic components, which you simply mix together

> the problems you're talking about are nearly impossible to run into.

If you mean this in any non-fluffy sense, such a property would depend upon the set of semantic components having some incredibly strong properties.

It's not clear to me that those properties don't immediately conflict with the entire point of creating DSLs -- that the host language's semantic model isn't the one you want.

You can probably get trivial properties about the static semantics without a ton of machinery, and more dynamic properties might be possible to read off from very constrained examples. But it's entirely unclear to me why semantic models should easily combine without creating unintended behavior.

Could you share some concrete examples?


> such a property would depend upon the set of semantic components having some incredibly strong properties.

Why? You can have as many semantic building blocks as you like. My current tool belt contains, among others, a generic high performance imperative backend, generic functional backend, generic typing engine (which easily includes various Hindley-Milner implementations as well as simple type propagation forms), generic lazy functional, generic Prolog and Datalog - these are the low level semantic building blocks which can be mixed together in any proportion.

On top of them there is a large hierarchy of high level building blocks, which include, for example, generic optimisation components (e.g., IR-agnostic SSA-based optimisations), generic parsing frontends, data representation and transformation languages, and many more.

> But it's entirely unclear to me why semantic models should easily combine without creating unintended behavior.

Why would they? As long as the interop rules are clear, there is no problem in mixing different things together.

> Could you share some concrete examples?

Take a look at my github repo (linked in some other comments, do not want to keep spamming it).


> Why?

Because I care about soundness.

And especially soundness for combinations of under-specified semantic models, because eDSLs are useful exactly when the semantic models at hand aren't 30-60 years old with mature implementations.

> imperative backend, generic functional backend, generic typing engine (which easily include various Hindley-Milner implementations as well as simple type propagation forms), generic lazy functional... which can be mixed together in any proportion.

That's just so totally not true. There are a ridiculous number of ways to combine these things in ways that totally kill type soundness, confluence, etc.

The reason you didn't have a hard time combining them is because they are 1) really truly extraordinarily well-understood and well-defined languages; 2) we've known exactly how to combine these things for a really long time; and 3) you knew a priori the reasonable interaction points.

None of these three assumptions are true for most of the eDSLs engineers might want to write down in general.

In general, a framework that lets you get unsound bullcrap from two perfectly sound things isn't a solution to the (e)DSL problem.


> Because I care about soundness.

And what's the problem with soundness? I can even derive formal proofs for simple DSLs when I have to.

I never had any issues with mixing semantic properties of DSLs on any level of abstraction - and the higher the abstraction level, the easier it is to mix without caring at all about the underlying semantics.

For example - I've got a generic PEG frontend language. I can mix it into nearly any combination of execution semantics, because it's high level enough, and it has an intermediate representation which can be trivially mapped into pretty much anything at very little cost (because of the simplicity of the requirements). Why should I care about soundness of the whole mix when I can easily prove correctness of this very last step of translation, and this is enough to ensure the rest?

> There are a ridiculous number of ways to combine these things in ways that totally kill type soundness, confluence, etc.

Then do not combine these things in the unsound ways. As simple as that.

For example, this is how I'm using Prolog embedded in an eager somewhat-functional host: the host DSL is used to define transforms over ASTs, including simple flattening transforms that derive sequences of equations (i.e., Prolog terms). Then the Prolog code is used to ask questions about these systems of equations. This way many typical internal compiler tasks are dead simple but yet reasonably efficient (and for some of the most important things I can even use Datalog instead, with all its cool optimisations). The only point where the Prolog world is interacting with the rest of the system is via these lists of equations, so I do not have to care how to reason about WAM interaction with a memory-managed eager functional semantics, all is well compartmentalised.
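
A rough sketch of that "flatten to equations, then query" boundary. It is not the commenter's code: plain Python stands in for the host DSL, and a tiny solver for atomic type terms stands in for Prolog. The node kinds (`int`, `var`, `add`, `eq`, `if`), the `?name` convention for type variables and every function name are assumptions made for illustration:

```python
import itertools

def flatten(expr, eqs, fresh):
    """Walk the AST once, append type equations to eqs, return expr's type term."""
    kind = expr[0]
    if kind == "int":
        return "Int"
    if kind == "var":
        return "?" + expr[1]          # one type variable per program variable
    if kind == "add":
        for side in expr[1:]:
            eqs.append((flatten(side, eqs, fresh), "Int"))
        return "Int"
    if kind == "eq":
        eqs.append((flatten(expr[1], eqs, fresh), flatten(expr[2], eqs, fresh)))
        return "Bool"
    if kind == "if":
        eqs.append((flatten(expr[1], eqs, fresh), "Bool"))
        result = "?%d" % next(fresh)  # fresh type variable for the result
        eqs.append((flatten(expr[2], eqs, fresh), result))
        eqs.append((flatten(expr[3], eqs, fresh), result))
        return result
    raise ValueError("unknown node kind: %r" % (kind,))

def resolve(term, subst):
    """Follow variable bindings until an unbound variable or a concrete type."""
    while term in subst:
        term = subst[term]
    return term

def solve(eqs):
    """Solve equations over atomic terms; raise TypeError on a contradiction."""
    subst = {}
    for lhs, rhs in eqs:
        a, b = resolve(lhs, subst), resolve(rhs, subst)
        if a == b:
            continue
        if a.startswith("?"):
            subst[a] = b
        elif b.startswith("?"):
            subst[b] = a
        else:
            raise TypeError("cannot unify %s with %s" % (a, b))
    return subst

def infer(expr):
    """The only data crossing the boundary is the flat list of equations."""
    eqs = []
    result = flatten(expr, eqs, itertools.count())
    subst = solve(eqs)
    return resolve(result, subst), subst

def is_well_typed(expr):
    try:
        infer(expr)
        return True
    except TypeError:
        return False

expr = ("if", ("eq", ("var", "x"), ("int", 1)),
        ("var", "y"),
        ("add", ("var", "x"), ("int", 2)))
result_type, subst = infer(expr)
print(result_type, resolve("?x", subst), resolve("?y", subst))
```

For this input the sketch infers `Int` for the whole expression and for both `x` and `y`. The point is the narrow interface: `flatten` never needs to know how `solve` works, and `solve` only ever sees pairs of terms.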

> None of these three assumptions are true for most of the eDSLs engineers might want to write down in general.

How is it so? Everything is still boiling down to one of the "fundamental" execution models. Higher level semantics are added as sequences of well-understood, provable, trivial transforms all the way down to such fundamental blocks, for which, as you said, we have at least 30-60 years worth of understanding.

The very high level semantics are added independently of any fundamental blocks, with a transform to a specific execution model being added at the very last moment and it is always trivial enough.

> a framework that lets you get unsound bullcrap from two perfectly sound things

If the source language is sound, each of the target languages is sound, and all the transforms in between are sound, then the result is sound and robust.

> isn't a solution to the (e)DSL problem

In my practice it is a solution indeed. Every eDSL I design is dead simple in implementation, and I very rarely have to add something new to my current toolbox, with problem domains ranging from 3D CAD applications to hardware design.


> And what's the problem with soundness?

You can definitely not provide any guarantee for arbitrary DSLs implemented in your system, but claim you do.

> I can even derive formal proofs for simple DSLs when I have to.

Am I looking at the correct thing on GitHub? CombinatoryLogic? I can't find any code here that would suggest you're interfacing with a theorem prover, much less the incredible amount of machinery that must be involved in taking two correctness proofs and producing a correctness proof for their combination.

> Then do not combine these things in the unsound ways. As simple as that.

I think this pretty much sums up the contribution here. It's a framework that doesn't face any problems with compositions because you've chosen not to write down things that don't compose well.

> If the source language is sound, each of the target languages is sound, and all the transforms in between are sound, then the result is sound and robust.

To repeat myself once again, your framework doesn't force any of these to be true. It's just as easy to write down unsound things as to write down sound things. The fact that you choose to write down sound things is nice I guess, but it's not a demonstration of the capacity of the framework or methodology to preserve soundness. It's just a demonstration of your own capabilities... we hope.

That works fine and well when DSLs don't get inter-leaved in unexpected ways and are only developed by a small set of developers. But then, we've known how to design DSLs in those cases for a long time. There's no REAL composition going on -- just the sort of combination that's always happened.

The actually interesting question is how to support an open world of DSLs. Where composing DSL1 with DSL2 won't break some guarantee that the writer of DSL2 worked hard to achieve. And where even changes to DSLA, which DSL2 depends on, also won't break those guarantees. And where the authors of all three languages never even know each other exist. And not just "if you don't write down the wrong thing", but "because the framework prohibits it". You don't solve this problem. In fact, I can't even figure out how to state some specification of a DSL, much less that it's preserved under a transformation...

Again, your framework might be a great setting for doing DSL engineering. And I don't doubt that you designed a tool that's useful for you. But you're making some pretty extreme and apparently over-stated claims beyond "nice framework solving a few practical implementation problems".

(Which is a shame because your tool seems nice, but your comments come across (to me) a bit snake-oily, so I'm not sure how seriously I should take the tool / approach. I think if you re-stated your claims so that they're a more accurate representation of the problems you do (and don't) solve, you might get a better reception.)


> You can definitely not provide any guarantee for arbitrary DSLs implemented in your system, but claim you do.

No, I never claimed so. It's a responsibility of a DSL developer, although it is very easy, because of the small size of any practical DSL implementation.

> I can't find any code here that would suggest you're interfacing with a theorem prover

I have not yet published a large chunk of work which contains a statically typed version of the host language with an ACL2-like inference engine. It's still quite experimental, but I did many mechanical proofs manually in the past (mostly in the hardware-related DSLs).

> much less the incredible amount of machinery that must be involved in taking two correctness proofs and producing a correctness proof for their combination.

I still cannot understand your point. Why exactly would a combination of DSLs introduce any additional complexity?

I demonstrated in my examples that mixing the DSLs is exactly the same thing as using them. There is a dataflow dependency between various semantic realms, but it is really hard to break any constraints by merely feeding valid data in and getting valid data out.

So, yes, no existing language workbench enforces language soundness; it's up to the designer to ensure the correctness. But can you name a single general purpose language with enforced soundness, CompCert aside?

> get inter-leaved in unexpected ways

Mind providing any examples of this? I cannot think of any interesting case: due to the very nature of this approach, there is no difference between using the languages in a "normal" way and targeting them in code generation.

> guarantee that the writer of DSL2 worked hard to achieve

What is this guarantee worth if it can be broken by merely using this language?

I see that you're more on a bondage&discipline side of the PL research. In this case we have to agree to disagree, my decades of Lisp experience are forcing me to lean the other way. I'm all for the formal proofs and I'm doing them a lot when it is really necessary (e.g., in hardware, where the cost of error is huge and verification capabilities are limited), but besides that, having a hacking and permissive language allows experimentation, while b&d languages inhibit innovative designs.

I only built a b&d, non-Turing-complete (total functional), strictly typed version of my language workbench after years of experimenting on a dynamic, unrestricted codebase.

So, my claims are:

1) It's dead simple to implement eDSLs using metaprogramming, if you've got the right tools (features for dealing with ASTs and their transforms; examples: Nanopass framework, Racket in general)

2) If you have a hierarchy of ready-made DSLs, then with the above approach you can easily mix arbitrarily selected properties of all of your existing languages into a new DSL.

3) PEG and GLR are great. The others are inferior. All hail the lexerless parsing!

And, btw., my framework is not any different from the other metaprogramming-based language workbenches. You can achieve this ease of development and robustness with any other framework.


If the language reflects the domain of the problem being solved and it still grows, I'm not sure there is really a better way.

Artificially constraining it to a consistent API is just going to get you into the same problem many functional advocates are having. Which is the confusion of everything being covered by a few names.


Not only is maintaining and growing a DSL challenging, but so is designing it in the first place. You need a lot of both domain and technical experience to achieve a half-decent result. That's why the "hype" around 2007 didn't lead to widespread practice. DSLs are costly.


I'm currently developing a DSL. I definitely agree with what you're saying. Domain and technical expertise are needed in abundance. Writing a spec is HARD. It takes a certain perseverance to learn good parser design (I've written a great parser generator and I still feel inadequate in this area sometimes).

DSLs are costly. My concern is that the cost will outweigh the benefit and people won't like it and we'll be stuck with it. My hope is that it'll stabilize and be expressive enough to fit everybody's needs.

Did you ever write a DSL used in a production environment? How complex was the DSL, more declarative or closer to a general purpose language?


> Writing a spec is HARD.

Chances are that you're severely over-engineering it. Just take the English specification for any particular problem your DSL is supposed to solve and slightly tweak it into looking like a formal language. That's it. Extremely easy with a bit of practice.


You have no idea.


Yes, tell me more. I build eDSLs professionally. You can take a look at some of the stuff I'm building: https://github.com/combinatorylogic


But where is the domain in your Domain Specific Languages? You built SLs but not DSLs.


In this case the domain itself is language construction: I built a system of DSLs that make implementing any other DSLs easy. And I am using this and other similar toolchains (like Racket) to build DSLs for a very wide range of domains.


I never wrote a DSL for a production environment. Once I wrote a DSL that allowed users to script an existing API for testing purposes. In my experience e.g. EDI and ISO specifications and protocols are also good DSLs.


I actually find compilers for transformation of DSL AST to target languages much more costly than designing the DSL syntax.

But that's probably because I don't think using templates for code generation is good enough. At least if you want to do something interesting with it.

Language workbenches cut down the cost of DSL design to a minimum, but the more interesting problem is providing valuable output from it.


> I actually find compilers for transformation of DSL AST to target languages much more costly than designing the DSL syntax.

Mind explaining why? What can be costly and complex in a chain of trivial transforms, each being very simple, flat and comprehensible?


When I say DSL I mean external DSL, not a fluent interface.

So an example of the problem I deal with is a database migration. Let's say we have an entity with a value

entity SqlTable { List<TableColumn> nestedTuple; }

value TableColumn { int i; }

and if I change the column type in the value object to long, I want my compilers to prepare a DB migration with the appropriate SQL statements for the specific DB I use. Of course, there is nothing too complex about it, but it's not trivial either (in this case you have to prepare a second field, unnest the whole hierarchy to get to the nested field, copy it to the new type, and compact the hierarchy back again).
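
To make those four steps concrete, here is a hedged Python sketch of a model diff that emits an abstract migration plan. It does not generate real SQL and is not how any particular tool works; the dictionary-based model format, `SAFE_WIDENINGS`, `diff_value_fields`, `plan_migration` and the wording of the steps are all hypothetical:

```python
# Hypothetical model format: {value_name: {field_name: type_name}}.
OLD_MODEL = {"TableColumn": {"i": "int"}}
NEW_MODEL = {"TableColumn": {"i": "long"}}

# Assumption: only these type changes can be migrated without data loss.
SAFE_WIDENINGS = {("int", "long")}

def diff_value_fields(old_model, new_model):
    """Return (value, field, old_type, new_type) for every changed field type."""
    changes = []
    for value_name, old_fields in old_model.items():
        new_fields = new_model.get(value_name, {})
        for field, old_type in old_fields.items():
            new_type = new_fields.get(field)
            if new_type is not None and new_type != old_type:
                changes.append((value_name, field, old_type, new_type))
    return changes

def plan_migration(entity, collection, change):
    """Describe the migration of one nested field as four abstract steps."""
    value_name, field, old_type, new_type = change
    if (old_type, new_type) not in SAFE_WIDENINGS:
        raise ValueError("no safe migration from %s to %s" % (old_type, new_type))
    shadow = f"{field}_{new_type}"
    target = f"{entity}.{collection}[].{field}"
    return [
        f"add shadow field {shadow}: {new_type} to {value_name}",
        f"unnest {entity}.{collection} down to {field}",
        f"copy {target} into {shadow}, casting {old_type} to {new_type}",
        f"drop {field}, rename {shadow} to {field}, re-nest {entity}.{collection}",
    ]

changes = diff_value_fields(OLD_MODEL, NEW_MODEL)
plan = plan_migration("SqlTable", "nestedTuple", changes[0])
for step in plan:
    print(step)
```

Even this toy version needs a rule table and a per-change planner; turning each step into statements for a specific database, and handling changes that interact, is where the cost described above would show up.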

I find it costly since there are a gazillion such features. And when they start interacting with each other, things get messy.


> When I say DSL I mean external DSL, not a fluent interface

If you're using a meta-language, there is no difference between external and embedded DSLs. External (or standalone) DSLs also should be built exactly the same way, on top of a hierarchy of existing DSL components.

In your case, you're likely not compartmentalising your DSLs properly, trying to stuff too much functionality into a single DSL while it must be a chain of different DSLs. E.g., one DSL for describing a schema of the DB you want to migrate (with a tool for inferring it from the existing DB schema), another for representing the intermediate uniform data format and transforms over it, and a third for mapping the uniform intermediate representation on your target DB schema. Of course I do not know any details of your particular problem, just describing how I solved a DB migration problem before. Your specifics could add more to the DSL chain design.


But I find it highly valuable to have a single model representation as a source of truth. You can see how it works in practice here: https://github.com/ngs-doo/revenj

Also, there are no tools for describing DB schemas that way (except if you consider DB DDL such a schema). So my DSL is used as the uniform data format. And the problem is not the mapping between formats, but the logic required to do such a mapping. It can't be expressed as a simple transformation; a compiler is required to analyze and transform it appropriately in various scenarios. And if you want optimizations, good luck with "simple mapping".

So yeah, it's complicated, but it needs to be complicated to support a simple modeling DSL. Otherwise you are better off with having several DB schemas, various POOs, Protobuf/Flatbuffer IDL etc...


In this case you need four DSLs - your core model and three DSLs for mapping the model to the specific storage backends.


> When I say DSL I mean external DSL, not a fluent interface.

That's the problem here. 'DSL' is a vague and ambiguous notion. An ad-hoc 'fluent interface' usually isn't considered a DSL.


I'm talking about efficiently compiled embedded DSLs built on top of meta-language hosts (with potentially more than one backend).


> Not only is maintaining and growing a DSL challenging, but so is designing it in the first place.

It's not that challenging if you do it the right way.

> DSLs are costly.

Plainly wrong. DSLs are the easiest and cheapest way of eliminating complexity. You simply have to follow a proper discipline and use the right tools.

When I build a DSL, it's rarely more than 100 lines of code including literate comments, and the resulting DSL is a proper optimising compiler with debugging support, and this new DSL is immediately supported by an IDE (at least with syntax highlighting, autoindentation and autocompletion).


Funny, I was a student of his and as soon as I read the title, I thought it might be his work :)


Interesting article



