
Domain-Specific Languages and Code Synthesis Using Haskell (2014) - dodders
http://queue.acm.org/detail.cfm?id=2617811
======
conistonwater
Embedded DSLs in Haskell are really cool, but sentences like these confuse me:

> _The do statement can be reified by normalization._

???

> _Control flow is problematic and cannot be used directly, but there is a
> generalization of Haskell Boolean that does allow deep-embedding capture._

What does it mean for control flow to be problematic? Control flow is the most
basic feature I expect (it's not shift/reset), so what's up with
_problematic_? What does it even mean for something so basic to be
problematic?

For someone unfamiliar with these techniques, it's hard to tell the difference
between a nifty trick that makes it possible to make something work at all,
like a proof of concept, and a general tool that can be used to solve general
problems.

~~~
tikhonj
> What does it mean for control flow to be problematic?

The idea is that we're embedding a DSL in Haskell by evaluating Haskell
expressions _symbolically_ within the DSL. Instead of doing a computation _in
Haskell_ and providing a value, we want to output a program in some other
language from Haskell and then run that. This other program can have totally
different semantics from normal Haskell: it might be a description of hardware
(like Kansas Lava), it might be efficient C code, it might be an SMT
formula... etc.

A nice, lightweight way to do this is to define a type where operations
produce expressions in your target language. We can overload a bunch of things
like + to do this; 1 + 2 would evaluate to an AST along the lines of `Plus
(Literal 1) (Literal 2)` which can then be treated as a circuit or a logic
formula or whatever. This is called a "deep embedding".
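
A minimal sketch of that overloading, assuming a hypothetical `Expr` type (the `Plus` and `Literal` constructors come from the example above; `Var`, `Times` and `Neg` are made up for illustration):

```haskell
-- Hypothetical AST for a tiny arithmetic DSL.
data Expr
  = Literal Integer
  | Var String
  | Plus Expr Expr
  | Times Expr Expr
  | Neg Expr
  deriving (Eq, Show)

-- Overloading Num means numeric literals and + / * build AST nodes
-- instead of computing numbers.
instance Num Expr where
  fromInteger = Literal
  (+)         = Plus
  (*)         = Times
  negate      = Neg
  abs         = error "abs: not supported in this sketch"
  signum      = error "signum: not supported in this sketch"

-- The literals 1 and 2 go through fromInteger, so this is an AST, not 3.
example :: Expr
example = 1 + 2

-- Mixing a symbolic variable with literals works the same way.
scaled :: Expr
scaled = Var "x" * 3 + 1
```

Here `example` is the value `Plus (Literal 1) (Literal 2)`, which a backend could then walk to emit a circuit, C code, or a formula.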

The problem with control flow is that it happens at the _Haskell_ level, not
at your symbolic level. So `if (x > y) then 1 else 2` will either evaluate to
1 or to 2, but we actually want it to evaluate to the AST representing the
whole conditional expression. In a sense, we want to "take both branches"
symbolically.

A good way to think about this is that actual Haskell computations become
_compile-time operations_ for your DSL. Compile-time control flow can change
the result of the compilation, but it can't depend on runtime information—and,
of course, we definitely want to have control flow at runtime!
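
To make the "compile-time" point concrete, here is a rough sketch with a hypothetical `Expr` type and a made-up `power` helper. The exponent is an ordinary Haskell `Int`, so the `if` and the recursion run while the AST is being built and leave no trace in the result:

```haskell
-- Hypothetical AST; only what this example needs.
data Expr
  = Literal Integer
  | Var String
  | Times Expr Expr
  deriving (Eq, Show)

-- n is a plain Haskell value, so this `if` is decided at generation time.
-- The output contains no conditional and no loop, only unrolled multiplications.
power :: Int -> Expr -> Expr
power n x =
  if n <= 1
    then x
    else Times x (power (n - 1) x)

-- Evaluates to: Times (Var "x") (Times (Var "x") (Var "x"))
cube :: Expr
cube = power 3 (Var "x")
```

This works because `n` is known when the Haskell program runs. If the exponent were itself an `Expr`, Haskell's `if` would have nothing to inspect.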

In the case of `if` we can still make this work by defining our own symbolic
boolean type and overloading the if-then-else syntax, but we simply _can't_
do this for pattern matching within the confines of Haskell. Haskell patterns
are not first-class citizens and the language does not give you any way to talk
about what pattern matching means _within the language_ (except maybe
something really hideous involving Template Haskell).
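
A sketch of the symbolic boolean idea, using plain functions rather than overloaded syntax. All names here (`BExpr`, `.>`, `ifE`) are hypothetical. GHC's RebindableSyntax extension can route the built-in if-then-else through a user-defined `ifThenElse`, but that is not shown here:

```haskell
-- Hypothetical ASTs: comparisons produce a symbolic boolean, not a Haskell Bool.
data Expr
  = Literal Integer
  | Var String
  | If BExpr Expr Expr
  deriving (Eq, Show)

data BExpr
  = Gt Expr Expr
  deriving (Eq, Show)

-- Symbolic "greater than": records the comparison instead of deciding it.
infix 4 .>
(.>) :: Expr -> Expr -> BExpr
(.>) = Gt

-- Symbolic conditional: keeps both branches in the AST.
ifE :: BExpr -> Expr -> Expr -> Expr
ifE = If

-- The DSL version of `if (x > y) then 1 else 2`.
example :: Expr
example = ifE (Var "x" .> Var "y") (Literal 1) (Literal 2)
```

The result is `If (Gt (Var "x") (Var "y")) (Literal 1) (Literal 2)`, so both branches survive into the generated program. There is no comparable hook for `case` or function-clause pattern matching.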

As to your broader question: deeply embedding DSLs in Haskell is a really
broad and powerful technique, but it also has some real limitations. The
control flow thing is one; another is dealing with common subexpression
elimination. For example, if x gives you a big AST of some sort, you'd
probably like x + x to give you an AST that realizes the two xs are the same
and unifies them somehow, but actually getting this structure from a Haskell
expression is awkward at best[1].
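
A small sketch of the sharing problem, again with a hypothetical `Expr` type and made-up helper names. At the Haskell level `big` is a single value referenced twice, but a pure traversal of the AST cannot see that, so it treats the two children as separate copies:

```haskell
-- Hypothetical AST; only what this example needs.
data Expr
  = Literal Integer
  | Var String
  | Plus Expr Expr
  deriving (Eq, Show)

-- Count the nodes a naive code generator would visit.
size :: Expr -> Int
size (Plus a b) = 1 + size a + size b
size _          = 1

-- A largish expression: 1 + (2 + (3 + ... + 100)).
-- 100 literals and 99 Plus nodes, so 199 nodes in total.
big :: Expr
big = foldr1 Plus (map Literal [1 .. 100])

-- Both children are the same Haskell value, yet the traversal walks it twice:
-- 1 + 199 + 199 = 399 nodes.
doubled :: Expr
doubled = Plus big big
```

Recovering the fact that the two children are "the same `big`" means either adding an explicit let-style construct to the DSL or observing sharing by impure means, which is where the awkwardness comes from.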

However, the technique has one major advantage: it is really easy to use. It's
much less work to build a nice deeply embedded DSL than it is to write a full
compiler; if the above problems aren't a big deal, you're getting _a lot_ of
structure for free by relying on Haskell—you can rely on Haskell's types and
type inference, you don't have to write a parser, you get Haskell as a
powerful compile-time metaprogramming language and you can use things like
Haskell's rewrite rules for lightweight optimizations.

It's a great tool to have in your arsenal because it really lowers the barrier
to writing a DSL, which is an incredibly powerful technique for a whole host
of real-world problems.

~~~
actsasbuffoon
I know a comment like this doesn't add much to the conversation, but I greatly
appreciate your comment. I didn't initially grasp the point of overriding
basic operators to return expression nodes. I've built plenty of programs that
create things like expression nodes, but I wouldn't have ever thought to
override existing parts of the language to return them.

When I read the article I thought the overriding was weird, and I guess the
author's intention slipped past me. Your explanation made it click. I'm dying
to try it out now.

Thanks for the great post!

