A Case for a New Language (pl-rants.net)
95 points by jasim 27 days ago | 57 comments

Some DSLs are success stories (SQL, HTML, CSS, etc.).

But most DSLs I have encountered in the wild were terrible: no community, weak tooling and documentation, plenty of design mistakes and limitations.

And there are so many in the shadows of the corporate caves, lurking in systems created by the cowboy of the moment who thought that would be the perfect solution to the problem du jour. It then had to be supported forever, leaving bugs, frustrations, and unsatisfied programmers and users in its wake.

9 times out of 10, a library would have been a better choice.

But even famous and popular open source DSLs can be a mistake. For me the most obvious example is Ansible. Because they needed idempotence, and maybe expected other languages to join in, they created a YAML-based DSL.

In the end, you get a terrible coding experience; learning the thing is depressing, and the knowledge can only be leveraged for this very tool; plus, debugging very complicated deployment files is a huge hassle.

The ironic part? In the end, this DSL just calls Python modules. It's just bytecode with extra steps.

They should have skipped the middleman and gone with a well-crafted Python API + tooling + docs that make it hard for you to write something non-idempotent.
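The kind of API meant here can be sketched in plain Python. This is a minimal, hypothetical example (`ensure_line` is not an Ansible module or any existing library function, though it is similar in spirit to what Ansible's `lineinfile` does): the step checks the current state first and only changes it when needed, so running it twice has the same effect as running it once.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the text file at `path`.

    Returns True if the file was changed and False if it was already in the
    desired state, so calling it repeatedly is safe (idempotent).
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # desired state already holds: do nothing
    if text and not text.endswith("\n"):
        text += "\n"  # don't glue the new line onto an unterminated last line
    path.write_text(text + line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        hosts = Path(d) / "hosts"
        print(ensure_line(hosts, "10.0.0.5 db"))  # True: line was added
        print(ensure_line(hosts, "10.0.0.5 db"))  # False: nothing to do
```

Because the "check, then change" logic lives inside the library call, a user composing such steps in ordinary Python gets idempotence without learning a separate language.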

What do you think about the Java/Groovy DSL of Apache Camel? It is well documented, but the syntax requires chaining commands with ‘.’ (dots), which leads to awfully formatted code.

you might find this interesting to look at:


Haven't tried it.

> And if a type system can not guard against run-time errors, what’s the merit of having one?

The type annotations would document the programmer's assumptions and lead to fail-fast behavior. In the presented use case, it would be immediately clear from the run-time type error that the input data isn't in the expected format (as opposed to a typing bug somewhere else in the program).
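In a dynamically checked setting, this fail-fast effect can be approximated by checking the data's shape where it enters the program. A minimal Python sketch; the message shape and the helper names `expect` and `parse_reading` are hypothetical:

```python
from typing import Any


def expect(value: Any, typ: type, where: str) -> Any:
    """Return `value` unchanged, or fail fast if it is not an instance of `typ`."""
    if not isinstance(value, typ):
        raise TypeError(
            f"{where}: expected {typ.__name__}, got {type(value).__name__} ({value!r})"
        )
    return value


def parse_reading(msg: dict) -> float:
    # Hypothetical message shape: {"key": <str>, "value": <float>}.
    # Checking at the boundary makes malformed input fail here, with a
    # message that names the field, instead of deep inside later processing.
    expect(msg.get("key"), str, "reading.key")
    return expect(msg.get("value"), float, "reading.value")
```

Calling `parse_reading({"key": "temp", "value": "21.5"})` raises `TypeError: reading.value: expected float, got str ('21.5')`, which points straight at the input format rather than at some unrelated line further down.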

Second, as an aid in design. I have had the pleasure of using languages capable of deriving implementations from types directly (where otherwise you'd use reflection), and ones with type-directed search and typed holes (even though unsafeCoerce fitting every hole is a bit annoying :)

What language was this?

Sounds very much like Haskell. The search engine is Hoogle.


Haskell and PureScript (a close relative that compiles to JavaScript). I have yet to 'use it in anger', though :)

The problem the author describes is very similar to the problem of response caching, and a DSL for it may look similar to Varnish's DSL.


Varnish VCL accomplishes the job, but it is far from elegant, of course.

It would be nice, indeed, to be able to start from some template of a rules language.

One where you can customize the 'language syntax' for declaring entities and what to do with them, and then implement a domain-specific backend (either as generated code or interpreted at run time).
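A minimal sketch of the "interpreted at run time" backend, in Python: rules are plain data (a key pattern plus an action name) that a small interpreter applies to a message's ordered key-value pairs. The rule format, the key names, and the action names are all hypothetical; in a real rules language the table would come from parsing a user-written file.

```python
from fnmatch import fnmatchcase
from typing import Callable

# Hypothetical rule table: (glob pattern on the key, action name).
RULES = [
    ("*/counters/*", "to_int"),
    ("*/name", "strip"),
]

# The "domain-specific backend": what each action name means.
ACTIONS: dict[str, Callable[[str], object]] = {
    "to_int": int,
    "strip": str.strip,
}


def apply_rules(pairs: list[tuple[str, str]]) -> list[tuple[str, object]]:
    """Interpret RULES over an ordered list of key/value pairs.

    The first matching rule wins; unmatched pairs pass through unchanged,
    and the original order of the pairs is preserved.
    """
    out = []
    for key, value in pairs:
        for pattern, action in RULES:
            if fnmatchcase(key, pattern):
                value = ACTIONS[action](value)
                break
        out.append((key, value))
    return out


msg = [("if0/name", "  eth0 "), ("if0/counters/in_octets", "42"), ("if0/mtu", "1500")]
print(apply_rules(msg))
```

Keeping rules as data separates the user-facing "syntax" (here just tuples) from the backend (`ACTIONS`), which is the split the comment describes; a code-generating backend would emit equivalent code instead of looking actions up at run time.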

I could not find a framework like that when looking for it, however.

(At the time, I was looking for something that would allow me to quickly build a frontend and a backend runtime in a language that can be embedded into Java, C++, and JavaScript.)

I started with Metalua (this was years ago) and had some success, but it was too difficult at the time to take it further.

Then I tried Xtext and Xtend (so that I could get an IDE automatically... but again, it was too difficult).

> > "... The project I am currently working on involves receiving and processing gRPC messages from some A-brand network devices. The messages contain ordered sets of key-value pairs. The meaning and naming conventions for the keys and values varies greatly and is inconsistent across different message types. The problem was how to describe various processing rules for the messages...."

I'm neither a fan of new languages nor of DSLs, so maybe I'm in the minority? Rather than a DSL, I want a great API, and before APIs I want compact and expressive use of existing APIs.

I also agreed with this until very recently. Jumped into a new language and realized how language features can help shape great APIs.

DSL is more like a generalisation of config files.

Most DSLs I encounter started as configs and tend to be inelegant because they're incidental rather than planned languages. I applaud thinking in terms of languages up front and exercising restraint to keep them to the bare minimum.

While that is one quite useful application of DSLs, they're certainly used more widely than config files. Care to expand further?

I said "generalisation". You're thinking I mean DSLs in a static way - where they compile to a single static structure that is like a configuration file. I am trying to redefine the term here instead, because it seems more appropriate.

Essentially what I am asking here is: where do we set the line between "scripts" and "configuration files"? Because scripts are simply configuration files with a tree-like structure as they represent programs.

Scripts "configure the execution paths of your program".

A DSL thus is just a configuration file with a set of rules of applying those tree-like structures to each other.

Scripts are definitely not configuration files.

I would say the line is Turing-completeness.

For me, DSLs are halfway between configuration files and scripts. If they are not Turing-complete, they are configuration languages. If they are, then they are programming languages (with a specific use case).

And I think that it is usually a bad idea to have configuration files in Turing complete languages (with exceptions, it can certainly be done well).

>> Scripts are definitely not configuration files.

>> I would say the line is Turing-completeness.

>> For me, DSLs are halfway between configuration files and scripts

_you_ would say.

>> If they are not Turing-complete, they are configuration languages. If they are, then they are programming languages (with a specific use case).

Or, you could say that DSLs are what they are: "Domain Specific Languages". That means they can be either a set of static definitions à la configuration file or, on the other side, scripts.

All of these definitions fit the meaning of "Domain Specific Language" really well.

The decision to bring a new language into the world is not one taken lightly. There are simply too many languages.

DSLs are nonetheless a "necessary" evil; most languages are too narrow to be useful for everything, so anywhere you see a boundary in your language (something that might be done better in another), writing that part in a DSL means your code will be shorter, it will be faster, and you will make fewer bugs.

But what if we can create a language with no boundaries? That language would not need a DSL.

A good API (and the client to that API) is indistinguishable from a DSL


I think it's worth demonstrating that more pedestrian approaches are insufficient before writing a new language. New languages can be worth the effort, but they are certainly expensive approaches, all things considered.

Interesting approach! I've been contemplating a similar query DSL for compressed time-series data on embedded devices. It's easier to make a specialized language than commonly thought. Though generally the trickiest part is creating appropriate tooling for things like source maps, debugging, etc.

It surprises me that the author used Elixir for the implementation but didn't just write a query DSL using Elixir's macro facilities [0]. A macro-based DSL would likely be much simpler than writing a new grammar, and it builds on existing language tooling. Ecto, for example, does a great job (IMHO) of making a sane DSL for SQL [1]. Given the data processing primitives provided by Elixir libraries like Streams and GenStage [2], it seems it'd be pretty straightforward to implement.

Take for example the "segmented query path expression". It is handled fine with a minor modification as an Elixir quoted statement:

`quote do: @query_path//foo/goo[instance='a2']/blah/x[a='S 512']/y`

That yields a well-defined AST based on the Elixir syntax:

    {:@, [context: Elixir, import: Kernel], [
      {:path, [context: Elixir], [
        {:/, [context: Elixir, import: Kernel], [
          ...

The closures work as well by switching the dollar signs to ampersands: `quote do: select(@path/foo/&x/&goo/z/y/w, { value(&1) < 3 })`
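For contrast, without a macro system you have to parse such a path yourself. A minimal Python sketch, assuming the only predicate form is `[attr='value']`, that values never contain `/`, and treating the leading `//` simply as a prefix to strip (its exact meaning in the article's language is not modelled here):

```python
import re

# One step: a name (which may start with $ or &), optionally followed by a
# single [attr='value'] filter.
STEP = re.compile(r"(?P<name>[\w$&]+)(?:\[(?P<attr>\w+)='(?P<value>[^']*)'\])?")


def parse_path(path: str) -> list[tuple[str, dict[str, str]]]:
    """Split a path such as //foo/goo[instance='a2']/y into (name, filters) steps."""
    steps = []
    for part in path.strip("/").split("/"):
        m = STEP.fullmatch(part)
        if m is None:
            raise ValueError(f"bad path step: {part!r}")
        filters = {m["attr"]: m["value"]} if m["attr"] else {}
        steps.append((m["name"], filters))
    return steps


print(parse_path("//foo/goo[instance='a2']/blah/x[a='S 512']/y"))
# [('foo', {}), ('goo', {'instance': 'a2'}), ('blah', {}), ('x', {'a': 'S 512'}), ('y', {})]
```

Even this toy version needs its own error handling and escaping rules, which is the work the Elixir `quote` approach gets for free from the host language's parser.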

Makes me want to write an Ecto adapter for dealing with Elixir Streams. =)

0: https://hackernoon.com/understanding-elixir-macros-3464e1414...

1: https://github.com/elixir-ecto/ecto

2: https://hexdocs.pm/gen_stage/GenStage.html

> It's easier to make a specialized language than commonly thought. Though generally the trickiest part is creating appropriate tooling for things like source maps, debugging, etc.

Right. Maintenance, documentation, training, tooling, packaging, debugging, etc., of a DSL far outstrips its development cost.

When you use regular expressions to solve a problem, you have two problems. And regular expressions have (basically) well understood parsers, logic generators, and so on. When you use a DSL to solve a problem, you have N problems. Maybe that's much, much better than the alternative (query languages are particularly interesting), but people definitely shouldn't stop at "this was easy to make... what's the big deal?"

What’s with the macros? With Python you can build fully-featured DSLs without needing macros by simply overriding standard operators.

True, you can, but it results in a more limited DSL. This is mainly because variable names need to exist and be valid objects, and because you have to chain method invocations, which makes for more awkward expressions (IMHO). Importantly, macros enable compile-time checking of a DSL rather than only run-time checking. That alone is very powerful.
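To make the operator-overloading approach concrete, here is a minimal Python sketch in which comparisons build an expression tree instead of evaluating. `Expr`, `col`, and `to_sql` are hypothetical names, not SQLAlchemy's API. It also shows the limitation mentioned above: `and`/`or` cannot be overloaded, so `&`/`|` stand in, and their precedence forces extra parentheses.

```python
class Expr:
    """A node in a tiny query-expression tree built by operator overloading."""

    def __init__(self, op: str, *args):
        self.op = op
        self.args = args

    # Overloaded operators build new tree nodes instead of computing values.
    def __eq__(self, other):
        return Expr("=", self, other)

    def __lt__(self, other):
        return Expr("<", self, other)

    def __gt__(self, other):
        return Expr(">", self, other)

    # `and`/`or` cannot be overloaded in Python, so `&` and `|` stand in.
    def __and__(self, other):
        return Expr("AND", self, other)

    def __or__(self, other):
        return Expr("OR", self, other)

    def to_sql(self) -> str:
        if self.op == "col":
            return self.args[0]
        # Literals are rendered with repr() for brevity; real code should
        # use bound parameters instead of splicing values into SQL.
        left, right = (a.to_sql() if isinstance(a, Expr) else repr(a) for a in self.args)
        return f"({left} {self.op} {right})"


def col(name: str) -> Expr:
    """Reference a column by name."""
    return Expr("col", name)


# Parentheses are required: `&` binds tighter than `>` and `==`.
query = (col("age") > 30) & (col("email") == "jack@msn.com")
print(query.to_sql())  # ((age > 30) AND (email = 'jack@msn.com'))
```

Everything here is checked only at run time, which is the trade-off against macro-based DSLs described in the comment.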

Take a SQL query in Python's SQLAlchemy:

   from sqlalchemy import and_, select

   s = select([db.func.sum(users.c.id)]).\
           select_from(users.outerjoin(posts, users.c.id == posts.c.user_id)).\
           where(and_(
               users.c.id == a1.c.user_id,
               users.c.id == a2.c.user_id,
               a1.c.email_address == 'jack@msn.com',
               a2.c.email_address == 'jack@yahoo.com'
           ))
That's not bad, but compare it to a similar query in Elixir's Ecto:

   from account in App.Account,
       left_join: t0 in App.Transaction,
       on: t0.account_id == account.id
            and t0.user_id == ^current_user.id
            and t0.deleted == false
            and t0.type == "inflow",
       left_join: t1 in App.Transaction,
       on: t1.account_id == account.id,
       where: account.id == ^id and t1.user_id == t0.user_id,
       group_by: account.id,
       select: {account, {sum(t0.amount), sum(t1.amount)}}
The Elixir DSL allows a bit more flexibility when reusing language built-ins such as `and`/`or`, or undefined ones like `sum`, without needing hacked names like `and_` or `db.func.sum` as in Python (or Java, or other OO languages). Generally, formatting is easier with macro-based DSLs.

P.S. The examples might not be exactly correct as I cobbled a couple of different ones together.

Very cool, does it compose?

Some statically typed languages allow you to build up composable query snippets along the lines of (from a Scala DSL):

    val accountQ = for {
      (a, t) <- Account leftJoin Transaction (_.id is _.accountId)
      if a.id is ... && t.userId ...
      groupBy a.id
    } yield (a, t.id, t.amount.sum, ...)

    val invoiceQ = for {
      (a, transId, total) <- accountQ
      i <- Invoice join (_.transId is transId) 
    } yield (a, i, total)
Basically, you can build up arbitrarily complex queries, all compile-time checked; it's really quite wonderful. Haskell has something similar with Esqueleto, and C#/F# have LINQ to SQL.
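Python cannot check such queries at compile time, but the composition part (reusing a base query and refining it) can be sketched with an immutable builder. This is a minimal in-memory sketch; `Query` is hypothetical and is not SQLAlchemy's or any ORM's API:

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Query:
    """Immutable in-memory query; each refinement returns a new Query."""

    source: tuple
    filters: tuple = ()

    def where(self, pred: Callable[[dict], bool]) -> "Query":
        # Never mutates self, so a base query can be shared and extended freely.
        return replace(self, filters=self.filters + (pred,))

    def run(self) -> list:
        return [row for row in self.source if all(p(row) for p in self.filters)]


rows = (
    {"account": 1, "type": "inflow", "amount": 5},
    {"account": 1, "type": "outflow", "amount": 2},
    {"account": 2, "type": "inflow", "amount": 7},
)
inflows = Query(rows).where(lambda r: r["type"] == "inflow")
account1_inflows = inflows.where(lambda r: r["account"] == 1)
print([r["amount"] for r in inflows.run()])           # [5, 7]
print([r["amount"] for r in account1_inflows.run()])  # [5]
```

Immutability is the design choice that makes composition safe: refining `inflows` cannot accidentally change other queries built from it.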

I love query DSLs, the more we move away from string-y SQL the better :)

Ecto is pretty composable [1]. Having a compile-time-checked SQL DSL was always one of the features I wanted most back in my Java days.

1: https://elixirschool.com/blog/ecto-query-composition/

I agree - the truth is that in Python it's easy to define DSLs only if they consist of expressions.

It's not that easy to define an arbitrary language that allows you to write code that is close to something else completely (like Ecto vs SQL).

Can you give an example? I've never heard people talk about Python DSLs (just Lisp, Smalltalk, Rebol, and Forth usually).

Theano: http://deeplearning.net/software/theano/library/index.html

Define arbitrary execution graphs that compile to GPU or CPU.

Polymage is a good example: http://mcl.csa.iisc.ac.in/polymage.html#examples

Python embedded DSL for image processing / general matrix munging

Interesting use case! I think model-driven engineering deserves to be treated better nowadays. For people against DSLs, just try playing with Ecore (Eclipse) and Xtext; you will be amazed by the speed of conception and by the powerful reusability of your model. A DSL is just the tip of the iceberg; the concept of a meta-model, for example, is very interesting to study.

If you want to invest a bit in the learning curve, have a look at the free JetBrains MPS (meta programming system): it is a proper language workbench for exactly that.

Itemis, the company behind xtext, is now using MPS.

It supports composable languages and provides IDEs for the language you created with code completion etc.

It is complex, though, so it takes a while to learn.

I will have a look! Thank you for sharing!

I knew I read about "A domain-specific language for processing ad hoc data" in my interwebs trawling -> https://pads.cs.tufts.edu/papers/pldi.pdf (pdf).

IIRC I got a grammar working, then went off to play with some new shiny thing before trying to get it to generate code, since I didn't really have a use case beyond playing around. I believe my plan was to get it to spit out AST classes, since my ASDL parser/generator is kind of a goofy mix of boost::spirit and Mustache.

https://github.com/dhall-lang/dhall-lang seems like it might have been a suitable candidate here also.

Check out Gremlin by Apache TinkerPop (http://tinkerpop.apache.org/gremlin.html). It is a Turing-complete language that can be embedded in any host language that supports function composition (f.g) and function nesting (f(g)). Finally, if this resonates with you, see https://arxiv.org/abs/1508.03843 .
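To make the embedding idea concrete: if each traversal step is a function from a stream of vertices to a stream of vertices, a traversal is just their composition. This is a toy Python sketch; the graph, the step names `out`/`has_prefix`, and `compose` are hypothetical and are not TinkerPop's actual API.

```python
from functools import reduce
from typing import Callable, Iterable

# Hypothetical toy graph: vertex -> list of outgoing neighbours.
GRAPH = {
    "marko": ["vadas", "josh"],
    "josh": ["ripple", "lop"],
    "vadas": [],
    "ripple": [],
    "lop": [],
}

Step = Callable[[Iterable[str]], Iterable[str]]


def out() -> Step:
    # Traverse one hop along outgoing edges.
    return lambda vs: (n for v in vs for n in GRAPH[v])


def has_prefix(prefix: str) -> Step:
    # Keep only vertices whose name starts with `prefix`.
    return lambda vs: (v for v in vs if v.startswith(prefix))


def compose(*steps: Step) -> Step:
    # Left-to-right composition: compose(f, g)(x) == g(f(x)).
    return lambda vs: reduce(lambda acc, step: step(acc), steps, vs)


traversal = compose(out(), out(), has_prefix("r"))
print(list(traversal(["marko"])))  # ['ripple']
```

Nothing beyond first-class functions is needed from the host language, which is the property the comment points to.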

I wish the author had gone into their failed attempts with traditional tools. I can't help but think XPath and an existing language would do the job and be more approachable to would-be maintainers.
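As a rough illustration of the XPath route: Python's standard `xml.etree.ElementTree` supports a subset of XPath, including `[@attr='value']` predicates. The XML below is hypothetical; it assumes a message could be rendered as nested elements whose names mirror the article's path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of one message.
DOC = """
<foo>
  <goo instance="a1"><blah><x a="S 512"><y>1</y></x></blah></goo>
  <goo instance="a2"><blah><x a="S 512"><y>2</y></x><x a="S 256"><y>3</y></x></blah></goo>
</foo>
"""

root = ET.fromstring(DOC.strip())
# Roughly //foo/goo[instance='a2']/blah/x[a='S 512']/y, relative to <foo>.
hits = root.findall("./goo[@instance='a2']/blah/x[@a='S 512']/y")
print([y.text for y in hits])  # ['2']
```

Whether the real gRPC payloads map cleanly onto a tree like this is exactly the open question; if they do, maintainers would only need to know standard XPath.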

Related Racket blog post under discussion at the moment:


Quite a neat language. I just wish there were more code samples (full ones), and/or that the code were open sourced (maybe not possible, depending on the employer).

Also I thought it was going to talk about something like the Blub Paradox: http://wiki.c2.com/?BlubParadox

Oh, I did almost the exact same thing once in an SQL database reporting system, brings back memories!

We were putting very hierarchical data into a "flat" database, but we still wanted to be able to query on those deeply nested paths. At the time our front end used a lot of XSLT as well so it was a similar approach. Writing the library that turned these "special" queries back into SQL was one of the more fun engineering projects I ever worked on. Definitely wouldn't do it the same way today but for the project it was a good fit.

http://restsql.org/doc/Overview.html is pretty awesome. Same guy who wrote the YAML spec, I think.

I don't see why this can't be implemented as a simple library, with perhaps a little more verbosity.

For example:

> select(/foo/$x/$goo/z/y/w, { value($1) < 3 })

Could become in JS:

    select("/foo/$x/$goo/z/y/w", function(a){ return a<3; })
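The same library approach is easy to sketch in Python. This is a hypothetical `select` over nested dicts (not an existing library), assuming a `$name` segment matches any key and binds it to that name, mirroring the `$x`/`$goo` variables in the path above; the predicate is passed as an ordinary function.

```python
from typing import Any, Callable, Iterator


def select(data: dict, path: str, pred: Callable[[Any], bool]) -> Iterator[tuple[dict, Any]]:
    """Yield (bindings, value) for leaves at `path` whose value satisfies `pred`.

    A segment starting with '$' matches any key and binds it to that name.
    """
    segments = path.strip("/").split("/")

    def walk(node: Any, i: int, env: dict) -> Iterator[tuple[dict, Any]]:
        if i == len(segments):
            if pred(node):
                yield env, node
            return
        if not isinstance(node, dict):
            return  # path is longer than the data: no match here
        seg = segments[i]
        if seg.startswith("$"):
            for key, child in node.items():
                yield from walk(child, i + 1, {**env, seg[1:]: key})
        elif seg in node:
            yield from walk(node[seg], i + 1, env)

    yield from walk(data, 0, {})


data = {"foo": {"a": {"g1": {"z": {"y": {"w": 1}}}, "g2": {"z": {"y": {"w": 5}}}},
                "b": {"g3": {"z": {"y": {"w": 2}}}}}}
for bindings, value in select(data, "/foo/$x/$goo/z/y/w", lambda v: v < 3):
    print(bindings, value)
```

The cost of this design is the one raised in the reply: `select` only ever sees an opaque function, so it has to visit every candidate leaf.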

There's potential for optimization if the code executing the query can "understand" what it means.

E.g. an SQL engine can potentially use an index to answer x<3. Your library calling your function for every item probably can't do that, or can't guarantee that it can do that.
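A small Python sketch of that difference, using a sorted list with `bisect` as a stand-in for a database index (`SortedIndex`, `scan`, and `query` are hypothetical names): a predicate expressed as data can be routed to the index, while an opaque callback forces a full scan.

```python
import bisect
from typing import Callable, Union


class SortedIndex:
    """A sorted copy of a column, standing in for a database index."""

    def __init__(self, values):
        self.values = sorted(values)

    def less_than(self, bound):
        # Binary search finds the cut-off in O(log n); slicing then costs
        # time proportional to the size of the result.
        return self.values[: bisect.bisect_left(self.values, bound)]


def scan(values, pred: Callable) -> list:
    # An opaque callback can only be applied to every item: O(n).
    return [v for v in values if pred(v)]


def query(index: SortedIndex, pred: Union[tuple, Callable]) -> list:
    # A predicate given as data, e.g. ("<", 3), is something the engine can
    # "understand" and route to the index; a function falls back to a scan.
    if isinstance(pred, tuple) and pred[0] == "<":
        return index.less_than(pred[1])
    return scan(index.values, pred)


idx = SortedIndex([5, 1, 4, 2, 3, 0])
print(query(idx, ("<", 3)))         # [0, 1, 2] via binary search
print(query(idx, lambda x: x < 3))  # [0, 1, 2] via full scan
```

Both calls return the same rows; only the first gives the engine enough information to avoid touching every value.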

There's still potential for optimization, it's just a little harder, because you'd have to parse and process the JS. But ... you don't have to invent a new language.

I'd argue the difference between the two isn't just "a little harder", and is not the same as your function call example: in most languages, you can't or can't always access the source of a function you've been passed, so the code would need to be passed as text. If you define a subset of JS you accept, you've created a DSL. This can be a completely valid way of creating one, and if you actually want to fully interpret it picking something where you can bind a high-quality existing interpreter makes sense, but custom ones have their strengths too, especially if you otherwise would cross paradigms (e.g. the examples in the article seem clearly more like a declarative language).

Nice! It'd be good to hear Stephen Dolan's story of how jq [0] came to be!

[0] https://stedolan.github.io/jq

I have been coding for 49 years, and people seem to have been inventing new languages at a steady rate the entire period. The current emphasis is mobile full stack and cloud OS.

Glossing over the numerous articles, he's clearly an intelligent fellow. I can't follow even half of the concepts.

> And if a type system can not guard against run-time errors, what’s the merit of having one?

This makes me really question the author's credibility. Clearly the author is pretty confused about static vs. dynamic type systems, and the value and history of type systems in general.

Why? I think it's very reasonable that a type system should make runtime errors impossible. Obviously errors happen, but ideally the type system should make you handle them all.

For example this should be disallowed by the type system.

    db_handle h = new DBConnection()
* What if the connection arguments are incorrectly formatted?

* What if the DNS name doesn't resolve?

* What if the port isn't open?

* What if the password is wrong?

If you're going to assign to the handle you have to prove to the compiler that the constructor will return a valid, working handle.

    Result q = h.sendQuery(...)
This should also fail.

* What if the connection dropped?

* What if your credentials are no longer valid?

* What if the query is malformed?

    x = a / b
Prove to the compiler that b is nonzero or that you've handled the case where it is.

Elm is damn close to this ideal.

One of the early uses of type systems was about performance of memory allocation, which brought massive performance improvements. Folks quickly realized that they could be used to enforce invariants, which they're also really good at.

Those benefits are orthogonal to dynamic vs. static type checking. Even with a dynamically typed language, you get massive performance, security, and correctness guarantees vs. an untyped language. There's huge merit in having a type system that provides all these benefits but still doesn't guard against runtime errors.

Which brings me to my point: this person is clearly confused about the benefits of type systems, which don't require static typing to add incredible value.

> If you're going to assign to the handle you have to prove to the compiler that the constructor will return a valid, working handle.

I'm trying to imagine how one could prove to the compiler that an event in the far distant future is valid (without the obvious solution of throwing an exception in the constructor but that is at runtime so...)

Well, an event in the far distant future is always valid if none of the paths leading to that event contain any code that raises an exceptional condition. That’s it.

The link didn't work from my part of the world so I couldn't read it, but the title is self-explanatory, so I'll make a point which you may have already made. Previously constructed languages like Esperanto use vocabulary sampled from European languages only, which I suspect is why they haven't become popular. A new language would need to take words and grammar from non-European languages also, such as Arabic, Swahili, Chinese, Japanese, and Indonesian, in order to be adopted in any meaningful scale.

The topic is programming languages, DSLs specifically.

I'm surprised the link doesn't work for you. Any idea why? (To be clear, I'm in no way responsible for the link or its content.)

I don’t think that this comment is relevant to the article.
