
So you want to write your own language? - ternaryoperator
http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488
======
agentultra
My unsolicited, biased advice: use s-expressions. Don't trigger a parse simply
to compute with the sums of products and integers [0]. If you can think of
something better than s-expressions use that.

Also, read [http://this-plt-life.tumblr.com/](http://this-plt-life.tumblr.com/) and don't take yourself too seriously. Stand on the
shoulders of giants and continue reaching for the sky. Languages are super
fun.

[0] [http://www.infoq.com/interviews/mccarthy-elephant-2000](http://www.infoq.com/interviews/mccarthy-elephant-2000)

~~~
munificent
But parsers are easy to write and unfriendly syntax will completely kill a
nascent language's adoption. If you're aiming for users that like s-exprs,
then by all means use them. Otherwise, try to stick to a syntax roughly
similar to one familiar to the users you're targeting.

~~~
read
It's a misconception that parsers are easy to write. Plus, they do an
unfathomable amount of damage.

Parsers are far from harmless pieces of logic you can just throw together. And
not just because they take time to write - parsers are physically dangerous.
Compiler writers develop incipient carpal tunnel syndrome trying to write
parsers.

When you finish writing a parser as a language designer the problem of writing
a parser for the language doesn't go away. The programming language users
might want to apply transformations to source code written in the language --
which now means they need to write brand new parsers from scratch to do these
transformations.

What s-exprs give you instead is the option to write a "parser" in one
function call: using the read function. It doesn't get shorter than that.

Now, _that's_ a parser that's easy to write.
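For concreteness, here is roughly what that one-call "parser" looks like under the hood -- a toy s-expression reader sketched in Python (names and details are illustrative, not taken from any particular Lisp):

```python
def tokenize(src):
    # Pad parens with spaces so a plain split() yields the token stream.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    # One recursive function covers the whole grammar: lists, ints, symbols.
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return lst
    try:
        return int(tok)
    except ValueError:
        return tok  # anything else is a symbol

print(read(tokenize("(+ 1 (* 2 3))")))  # → ['+', 1, ['*', 2, 3]]
```

The whole "parser" is the call `read(tokenize(src))`; everything after that is semantics.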

edit: rewording

~~~
WalterBright
> It's a misconception parsers are easy to write.

Really, they are easy. They are literally insignificant when you factor in all
the hours you'll work on a language.

I wrote another one recently for a small side project. It took more time to
write the unit tests for it. The parser practically wrote itself.

~~~
giergirey
Are you including the time taken to make the parser produce helpful error
messages? That always seems like the hard bit to me ...

~~~
mattfenwick
Good point -- also, dealing with corner cases, writing tests, and maintaining
the parser as the language evolves are a couple more things that might take a
lot more work.

Do you know of any strategies for error reporting, or of tools that implement
them? I'm always on the lookout for cleaner approaches to this.

~~~
WalterBright
> dealing with corner cases, writing tests, and maintaining the parser as the
> language evolves are a couple more things that might take a lot more work.

Not really.

Seriously, if you're going to get bogged down trying to get the lexer/parser
to work, you're not ready to work on a full blown language/compiler.
Lexing/parsing is the EASY part, as in a minute, insignificant part of the
time you'll invest in the project. I really do mean that.

~~~
haberman
I think there are a few things worth mentioning here.

One is that there is a big difference between writing a parser for a language
you are inventing and writing a parser that is attempting to implement an
existing language. Languages have lots of corner cases; if you are inventing
the language then every quirk of your parser is correct by definition. You
might not even be aware of some of the subtle choices that your hand-written
parser is making.

As an example of this, it was not discovered that ALGOL 60 had a "dangling
else" ambiguity in its grammar until after it had been implemented, used
extensively, and even published in a technical report. It was essentially an
accident of the implementation that it resolved the ambiguity in the way that
it did. So while it might not be too much work to "get the lexer/parser to
work", it doesn't follow that all of the issues around parsing are trivial.
There is still a lot of complexity and subtlety around parsing if you're
trying to design something that could reasonably have multiple interoperating
implementations.
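To make the dangling-else point concrete, here it is in miniature as a hand-written recursive-descent sketch (Python, invented toy grammar): nothing in the grammar says which `if` an `else` belongs to; the shape of the recursion quietly decides.

```python
# Toy grammar:  stmt -> "if" IDENT "then" stmt ["else" stmt] | IDENT
# The optional "else" clause is ambiguous for nested ifs; this parser
# resolves it "by accident": the innermost call sees the "else" first.

def parse_stmt(toks):
    if toks and toks[0] == "if":
        toks.pop(0)
        cond = toks.pop(0)
        assert toks.pop(0) == "then"
        then = parse_stmt(toks)
        els = None
        if toks and toks[0] == "else":  # greedily binds to the nearest "if"
            toks.pop(0)
            els = parse_stmt(toks)
        return ("if", cond, then, els)
    return toks.pop(0)

tree = parse_stmt("if a then if b then x else y".split())
print(tree)  # → ('if', 'a', ('if', 'b', 'x', 'y'), None)
```

Moving the `else` check out of the recursive call would silently attach it to the outer `if` instead; both parsers "work", and nothing forces the language designer to notice the choice.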

Secondly, there is a seriously wide variation in lexical/syntactic
complexity between languages. You can pretty easily write a 100% correct JSON
parser in an hour or less (possibly much less, depending on what language you
choose to write it in). On the other hand, it takes man-years to write a 100%
correct C++ parser (not least because C++ tightly couples parsing and semantic
analysis). Now I know this article is more talking about designing your own
language, and no language will start out as syntactically complicated as C++,
but empirically most of the languages we actually use have a fair bit of
complexity to them, so delivering the lesson that lexers/parsers are easy _in
general_ is, I think, the wrong message to be sending.
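As a rough illustration of the easy end of that spectrum, here is a recursive-descent parser for a simplified JSON subset (Python sketch; no string escapes or exponents, so deliberately not 100% correct JSON):

```python
import re

# One regex matches any single token (punctuation, string, number, word),
# skipping leading whitespace.
TOKEN = re.compile(r'\s*(?:([{}\[\],:])|"([^"]*)"|(-?\d+(?:\.\d+)?)|(true|false|null))')

def tokenize(src):
    src, toks, pos = src.strip(), [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise ValueError(f"bad input at offset {pos}")
        punct, string, num, word = m.groups()
        if punct:
            toks.append(punct)
        elif string is not None:
            toks.append(("str", string))
        elif num:
            toks.append(("num", float(num)))
        else:
            toks.append(("word", word))
        pos = m.end()
    return toks

def parse(toks):
    val, rest = parse_value(toks)
    assert not rest, "trailing tokens"
    return val

def parse_value(toks):
    t, rest = toks[0], toks[1:]
    if t == "{":
        obj = {}
        while rest[0] != "}":
            _, key = rest[0]                 # a ("str", ...) token
            assert rest[1] == ":"
            val, rest = parse_value(rest[2:])
            obj[key] = val
            if rest[0] == ",":
                rest = rest[1:]
        return obj, rest[1:]
    if t == "[":
        arr = []
        while rest[0] != "]":
            val, rest = parse_value(rest)
            arr.append(val)
            if rest[0] == ",":
                rest = rest[1:]
        return arr, rest[1:]
    kind, v = t
    if kind == "word":
        return {"true": True, "false": False, "null": None}[v], rest
    return v, rest                           # "str" and "num" carry the value

print(parse(tokenize('{"a": [1, 2, {"b": true}], "c": null}')))
```

Filling in the missing corners (escapes, exponents, error positions) is exactly where the remaining effort goes, but even then the whole job stays small for a grammar this simple.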

Thirdly, there are a lot of practical considerations that can make parsing
more complex. For example, take Steve Yegge's attempt to do some incremental
parsing (from [http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascript-mode-for-emacs.html](http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascript-mode-for-emacs.html)):

    
    
      I had two options: incremental parsing, or asynchronous
      parsing. Clearly, since I'm a badass programmer who can't
      recognize my own incompetence, I chose to do incremental
      parsing. I mentioned this plan a few months ago to Brendan
      Eich, who said: "Let me know how the incremental parsing
      goes." Brendan is an amazingly polite guy, so at the time I
      didn't realize this was a code-phrase for: "Let me know when
      you give up on it, loser."
    
      The basic idea behind incremental parsing (at least, my
      version of it) was that I already have these little
      functions that know how to parse functions, statements,
      try-statements, for-statements, expressions,
      plus-expressions, and so on down the line. That's how a
      recursive-descent parser works. So I figured I'd use
      heuristics to back up to some coarse level of granularity —
      say, the current enclosing function – and parse exactly one
      function. Then I'd splice the generated syntax tree fragment
      into my main AST, and go through all the function's siblings
      and update their start-positions.
    
      Seems easy enough, right? Especially since I wasn't doing
      full-blown incremental parsing: I was just doing it at the
      function level. Well, it's not easy. It's "nontrivial", a
      word they use in academia whenever they're talking about the
      Halting Problem or problems of equivalent decidability.
      Actually it's quite doable, but it's a huge amount of work
      that I finally gave up on after a couple of weeks of effort.
      There are just too many edge-cases to worry about. And I had
      this nagging fear that even if I got it working, it would
      totally break down if you had a 5,000 line function, so I
      was kinda wasting my time anyway.
    

All of this is to say: I can't argue with your basic point that "getting
lexing/parsing to work" for a language you are inventing isn't terribly
difficult. But I disagree with your larger (somewhat implied) point that
parsers _as a whole_ are easy.

~~~
WalterBright
A couple points: 1. the dangling else problem is trivial to deal with. 2. I've
written a C++ parser, I know the issues involved, and I know that a parser
generator isn't going to fix that.

> delivering the lesson that lexers/parsers are easy in general is, I think,
> the wrong message to be sending.

I stand by the message :-) in the sense that if a person finds lexing/parsing
to be hard, they're likely to find the semantic/optimization/codegen parts of
the compiler to be insurmountable.

I've written compilers for numerous languages, including C++, including 2
languages I invented, and lexing & parsing is just not that hard relative to
the rest of a compiler.

------
cocoflunchy
If you don't know how to start but want to give it a try, I highly recommend
the _How To Create Your Own Freaking Awesome Programming Language_ book
[http://createyourproglang.com/](http://createyourproglang.com/) . One of the
best purchases of this year for sure.

I've been writing my own toy compile-to-js language after reading the book:
[https://github.com/cosmith/panda](https://github.com/cosmith/panda) (I might
have to change the name though, I recently discovered
[http://yesco.org/panda.html](http://yesco.org/panda.html) ...)

~~~
gordonguthrie
I am writing an OTP-ish dialect of Erlang that compiles to javascript
[http://luvv.ie](http://luvv.ie) Just bought that book on your recommendation
- better be good or else ;-D

~~~
biot
Suggestion: you really need "here's what the syntax looks like" example(s) on
your home page with some non-trivial code (more than just hello world).
Ideally, show the sample and the JS output side by side like on the
CoffeeScript home page: [http://coffeescript.org/](http://coffeescript.org/)

~~~
gordonguthrie
I suppose I am being a bit of a blushing bride about the syntax, as it isn't
mine; it's just good old-fashioned Erlang.

The demo doesn't show much more than hello world, because that's all there is
at the moment - the focus has been on building the end-to-end toolchain and I
am now filling out the features, but good suggestion going forward.

~~~
biot
I read "OTP-ish dialect of Erlang" as meaning it's not quite the same, and I
was looking for details of those differences. If it's just plain Erlang, then
I see your point.

------
stormbrew
I really like that this article suggests against using compiler compilers. My
experiences have always been much worse when using them. I think the biggest
risk when not using them (especially if you're writing a recursive descent
parser by hand) is the temptation to make things more context-sensitive (not
necessarily in the rigorous language-theory sense) than absolutely necessary.

~~~
anaphor
Yes, as Rob Pike said, just write everything by hand. Ken Thompson recommends
writing your lexers by hand as well! I also agree about the temptation to make
it too complex just because of how powerful writing it by hand is; I've found
myself succumbing a bit to that temptation by adding the ability to change the
behaviour of the lexer from within the source itself.

~~~
saurik
This doesn't make any sense to me. The algorithm to construct optimally
implemented lexers is not something any human would ever type by hand. Tools
like flex were designed with one goal in mind: performance; it even has a mode
that analyzes your set of tokens looking for "mistakes" that would cause
backtracking, as it has a really keen interest in getting exactly linear
performance. It also strives for an insanely low constant multiplier: the
manual page even talks about the number of machine instructions used per
character of input. Of course, some people might have needs that allow for
lower throughput in exchange for "less memory", so it had various modes to
"compress" its lookup tables, with varying performance tradeoffs, which you
can experiment with without changing your lexer definition. This is one of the
super powers you get from "the correct level of abstraction".

Like, seriously: what are you trying to achieve by writing your lexer by hand?
Your result will be both more difficult to maintain and is pretty much
guaranteed to be slower than the output of a tool like flex. At least when
people throw away the advantages of parser generators and write recursive
descent parsers they gain the ability to have "easy context sensitivity"
(which makes implementing many languages much easier), but I don't see why
anyone would ever hand write a lexer. "I know how it all works" is also fine,
but in that case you write your own lexer generator tool, you don't skip
directly to the lexer (unless you are doing your first one as a "homework
assignment"). If you don't like the input syntax of your lexer generator,
there are many to choose from, or maybe you write one yourself, but that's no
reason to switch to something that is not only going to be more verbose and
error-prone but will also be slower.
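The declarative style being argued for can be sketched in miniature: token rules as plain data driven by a generic matcher (Python; illustrative rules, and note that a real tool like flex compiles the rules into a single DFA rather than trying regexes in order):

```python
import re

# Token rules are data: (name, pattern). Adding a token means adding a
# row here, not writing new matching code.
RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))

def lex(src):
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}")
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
        pos = m.end()

print(list(lex("x = 40 + 2")))
# → [('IDENT', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]
```

The driver never changes as the token set grows; that separation of rules from machinery is the "correct level of abstraction" point in miniature.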

(afterthought: I guess an interesting analogy: you don't lay out a hashtable
by hand, like in a C initializer list in your source code that has a large
number of NULL entries with only a subset filled in, already in hashed order;
you instead write a hash function and let the computer reorder your entries
for you. Manually hashing the values feels "hard core", but offers no
advantages, and means you have to throw away all of your work and start over
when the size of your table changes. Doing it by hand is also strictly grunt
work: you don't gain anything by having placed it by hand other than the
possibility that you made a mistake somewhere and now your element will never
be found. And if later you want to try different hash functions or different
search algorithms--maybe you are willing to pay some costs to get range
queries, and end up using a tree--you can later do so without changing your
input files. You should think of your lexer like a really complex data
structure tied to an algorithm that always has the abstract interface "get
next token".)
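The analogy made concrete (Python sketch; the hard-coded table is deliberately the wrong approach, and its slot positions are invented for illustration):

```python
# The "by hand" layout: entries pre-placed in hashed order. The slot
# positions below are made up; change the table size or the hash
# function and every entry is in the wrong place.
HAND_LAID_OUT = [("cat", 1), None, ("dog", 2), None, None, None, None, None]

# The sane version: write the hash function and let the computer place
# the entries, so resizing is just a rebuild.
def toy_hash(key):
    return sum(map(ord, key))            # deterministic toy hash function

def build_table(pairs, size):
    table = [None] * size
    for key, value in pairs:
        slot = toy_hash(key) % size
        while table[slot] is not None:   # linear probing on collision
            slot = (slot + 1) % size
        table[slot] = (key, value)
    return table

def lookup(table, key):
    slot = toy_hash(key) % len(table)
    while table[slot] is not None:
        if table[slot][0] == key:
            return table[slot][1]
        slot = (slot + 1) % len(table)
    raise KeyError(key)

table = build_table([("cat", 1), ("dog", 2)], 8)
print(lookup(table, "dog"))  # → 2
```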

~~~
seanmcdirmid
Lexing is such a small part of compilation overhead that not being optimal
isn't going to kill you. In my case, writing the lexer by hand is necessary
because I have to memoize token identities (and all the parsing/typing info
attached).

Yes, I write my own hash tables also for the same reason (so they support
incremental computations).

~~~
saurik
> Lexing is such a small part of compilation overhead that not being optimal
> isn't going to kill you.

You still then need to write the extra boilerplate per element. If you use a
tool, you are literally looking at just "keyword <space> code to run when that
keyword is matched" without any surrounding "how to check if it is that
keyword". This is easier to write and easier to maintain, in addition to the
advantages I discussed earlier.

> In my case, writing the lexer by hand is necessary because I have to memoize
> token identities (and all the parsing/typing info attached).

If I understand what you mean, then this is trivially done with most existing
tools as part of your token rule (returning an interned string, which you can
use an existing data structure for). If not, then you would first write a
tool. Again: it may feel really "hard core" to write a lexer, but it is
repetitive code that a good design factors out into a lexer generator.

> Yes, I write my own hash tables also for the same reason (so they support
> incremental computations).

Careful: I imagine you mean to say you write your own hash table
library/compiler, which is different from laying out the hashtable by hand. I
have also written my own hashtable for many reasons, but I would never sit
down with an array literal and manually put the entries in there by hand: even
if I don't make any mistakes, it is a pointless waste of my time that is
trivially automatable using a computer.

~~~
seanmcdirmid
Keywords are quite easy: just plug your identifiers into the hash table after
their boundaries are detected. Again, it is much more expensive than a
generated FSM, but the expense is really in the noise when everything else is
considered.

As for incremental lexing, you need to tell if your tokens pre-existed as the
same kind (not necessarily the same string!) before the edit or not. It would
be trivial to add this to a generator, but how would it then feed back the
signals needed to take advantage of the memoization (e.g. by providing a
persistent token ID that can unlock pre-existing information about the token)?
There are simply no standards for that.

In most of the professionally written compilers (e.g. scalac) I've worked on,
lexer and parser generators aren't even used, and it really isn't that big of
a deal to write these in code; you also get the benefit that the same language
is being used. This becomes especially true when IDE services are considered,
whereas most generators are pretty much limited to batch applications.

> I have also written my own hashtable for many reasons, but I would never sit
> down with an array literal and manually put the entries in there by hand:
> even if I don't make any mistakes, it is a pointless waste of my time that
> is trivially automatable using a computer.

I see your point, but it really depends on the key space you are optimizing
for. You might just put the elements in by hand if a generic algorithm isn't
really called for.

------
programminggeek
The last bit, I think, is the most important: that you will have to take the
show on the road and tell people about your language for it to ever really
take off. Promoting a piece of tech as critical as a language is no small
chore. You can't just put something out on the internet and hope.

~~~
carsongross
Having helped put something out on the internet and hoping
([http://gosu-lang.org](http://gosu-lang.org)) I 100% agree, but I think there
is something else in there beyond just advocating a language. Let me try to
articulate it:

Programming languages don't win because they are better qua programming
languages. They win because they solve one problem really well that others do
not. Ruby (+ Rails) -> building structured web apps. Javascript -> Being
there. C -> Being portable assembler. Java -> C with garbage collection.

Consider the greatest language ever, Lisp (I hate it, but I recognize its
power). The reason it's never taken off is that there is no one big problem
that it solves that other languages can't solve easily enough.

The bad news for a lot of the better, smaller languages out there is that the
newer problems are often solved in libraries and momentum (tooling, deployment
support, etc.) in the big languages is huge.

~~~
masklinn
> They win because they solve one problem really well that others do not.

That assertion's got more holes than Emmental. Where does Python fit? C#?
Javascript and Objective-C don't really "solve one problem really well",
they're the only fucking option in their space. Java was not and has never
been "C with garbage collection"; the syntax is the only thing it got from C.

The language being great at solving a specific use case is definitely a boon
and increases its chances (witness PHP or Lua), but it's nowhere near
sufficient (Tcl lives in the same niche as Lua — lightweight, extensible,
multi-paradigm language for embedding in and scripting of native codebases,
predates Lua, and is considered by many of its lovers better than Lua, yet
it's dying its slow and protracted death)

> The reason it's never taken off is that there is no one big problem that it
> solves that other languages can't solve easily enough.

The reason it never took off is a combination of little advertising/pushing
(compare with Java), and highly fragmented communities (the community unit of
Lisp, as with mages, is 1), leading to no corpus of code.

> The bad news for a lot of the better, smaller languages

And yet there's never been more uptake in new languages.

~~~
Silhouette
_That assertion's got more holes than Emmental._

Maybe, but your counterexamples don't seem particularly convincing.

 _Where does Python fit?_

Originally, a readable scripting language.

More recently, a language for back-end web development that doesn't ram OO
down your throat.

Also carving out something of a niche as a general-purpose scripting language
embedded in other applications instead of a custom macro language, like Lua
but with many more people having prior experience programming in it.

 _C#_

Similar strengths to Java, but with a lot of useful extra features, without
most of the limitations that should have gone away a decade ago but didn't,
and with a runtime environment that is available by default on Windows.

 _The reason [Lisp] never took off is a combination of little
advertising/pushing ... and highly fragmented communities ... leading to no
corpus of code._

Also, a highly uniform and generic structure is not always an advantage in a
programming language. Sometimes it's better to have things that are different
obviously look like they're different.

------
agumonkey
What do you think about the point of view of skipping concrete syntax and just
designing the language at the construct level using s-exps? (As mentioned here:
[http://cs.brown.edu/courses/cs173/2012/book/Everything__We_W...](http://cs.brown.edu/courses/cs173/2012/book/Everything__We_Will_Say__About_Parsing.html))

~~~
mamcx
How would this help in building a C-like or Python-like language? Or are you
saying to just make a LISP-like language?

~~~
agumonkey
Meaning the syntax is not the semantics: I can try imperative Algol things in
Lisp clothing, or Prolog things, or both... but I won't waste time on syntax,
more on semantics.

~~~
marktangotango
I think that's valid, even desirable. Others have said lexing/parsing is a
fraction of the language implementation problem. So, when using
s-expression-like syntax (or XML, as some have done) you are basically writing
code in the AST. Then you can concentrate on interpreting the AST, or
translating it to bytecode for some VM (maybe of your own design), or
performing passes on the AST to do things like type checking, upcasting values
in expressions, optimizations, etc. There is really a lot more to language
implementation than lexing/parsing.
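A sketch of what "writing code in the AST" leaves you free to focus on: with s-expressions the tree is just nested lists, and all the interesting work is the evaluator (Python, invented mini-language, not any real Lisp):

```python
# The "AST" is nested Python lists; there is no lexer or parser at all.

def evaluate(expr, env):
    if isinstance(expr, str):          # a symbol: variable lookup
        return env[expr]
    if not isinstance(expr, list):     # a literal number
        return expr
    op, *args = expr
    if op == "if":
        cond, then, els = args
        return evaluate(then if evaluate(cond, env) else els, env)
    if op == "let":                    # (let name value body)
        name, value, body = args
        return evaluate(body, {**env, name: evaluate(value, env)})
    fn = env[op]                       # otherwise: a built-in operator
    return fn(*[evaluate(a, env) for a in args])

env = {"+": lambda a, b: a + b, "*": lambda a, b: a * b,
       "<": lambda a, b: a < b}
program = ["let", "x", 3, ["if", ["<", "x", 5], ["*", "x", ["+", "x", 1]], 0]]
print(evaluate(program, env))  # → 12
```

Type checking, bytecode emission, or optimization passes would be further functions walking the same nested-list structure.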

~~~
agumonkey
That's the point made by the course instructors: focus on
semantics/interpretation/compilation instead. Another point I'd like to see
made is that with a small syntactic surface it may be easier to compare
linguistic traits. Instead of having 219 languages we'd have one core and libs
(with their macro sugar) giving the desired features. But maybe that would
only move the problem further.

~~~
mamcx
I think there's a gap in how we build languages. I want to make one, but I
need to re-create a lot of basic stuff just to bring in my own thing. I know
.NET/Java/LLVM is supposed to solve this, but does something exist that gives
me just an AST I can plug into? So I don't need to invent how to declare a var
or how to make a function, but just control the syntax, the sugar, etc. (i.e.,
like having an API to build languages).

Of course, where this dream gets killed is when I'm interested in doing it in
C, not in .NET, or need a VM (or not). But when I see Julia/Go/Lua/Nimrod I
think "Hey, I could build on that", but it's too highly coupled, and then the
option is to do everything from the start...

------
lmm
I found [http://www.hokstad.com/compiler/](http://www.hokstad.com/compiler/)
really educational - implementing a language in the same way I'm used to
writing code, by solving one problem at a time. It makes sense in a way that
compiler theory lectures never did for me.

------
andrewflnr
I have to speak up in defense of minimizing typing. Sure, programming is
mostly long stretches of thinking with short stretches of typing, but when you
do start typing, it's usually the result of some piece of inspiration. It's
especially at those times that typing lots of useless characters is a
distraction. Lots of noise in the code is also distracting while reading,
which we all know is what we spend most of our time doing. That said, you
don't want the code to be _too_ terse either.

~~~
Scramblejams
As someone who had a bout with RSI, I agree. Minimizing keystrokes, if it can
be done without impacting readability too much, is excellent.

~~~
WalterBright
I don't have RSI, so don't know from experience, but wouldn't a syntax aware
IDE with autocomplete be very useful for minimizing keystrokes without
necessarily minimizing the characters?

~~~
Scramblejams
Probably. I tend to avoid IDEs because to me they feel like straitjackets
and their editors tend to be inferior to standalone editors. I'll have to look
around at autocomplete options for my usual languages (Erlang, Python) though
-- I figured if I couldn't fit what I was using into my head, I needed to work
on my head, not on my autocomplete options. But maybe I'm wrong. :-)

------
lnanek2
Hmm, so this is the author of D. I've never seen D actually used anywhere.
Honestly, I think he missed the biggest thing - being employed by a large
company that will push your platform. From the Wiki page, D sounds like a
replacement systems programming language, but I've never heard of it while
Google's Go system programming language replacement is all over the place.
Similarly Sun, now Oracle, and Microsoft push languages that actually get
used.

~~~
billforsternz
The first thing that jumped out at me in the (very interesting, I thought)
article was this: "If you need validation and encouragement from others, it
isn't going to happen". In other words, you need a thick skin as well as
technical ability. I remember Walter Bright achieved a certain amount of fame
in the programming community back in the 80s by writing and productizing
outstanding C and C++ compilers for MS-DOS when that was an important market
with big competitors including MS and Borland. He seems to have been plugging
away with D, conceived as a better successor for C than C++ for a long time
now. I suspect that the reason that C++ is ubiquitous while D remains
relatively obscure has a lot more to do with inertia and momentum than the
objective merits of the two languages. It's a tough field to get traction in
with so many brilliant and opinionated critics and competitors. My impression
is that Walter Bright is an outstanding programmer who deserves more
recognition and kudos.

~~~
WalterBright
MS and Borland produced C++ compilers long after Zortech C++ proved the market
for them on MS-DOS.

I was lucky in that Zortech C++ came out just as the market for C++ exploded
(and it might even be that ZTC++ was the cause of that explosion).

~~~
billforsternz
Thanks for replying. I feel a Wayne's World moment coming on... "I'm not
worthy, I'm not worthy...". Congratulations on a wonderful body of work. I am
going to use this moment to motivate myself to push my own projects forward.
Of course I'm not trying to write my own language (that would be crazy...).

------
rch
Though it probably isn't as straightforward as other options, I really like
how Lepl is put together:

[http://www.acooke.org/lepl/intro.html](http://www.acooke.org/lepl/intro.html)

A quick example:

[https://gist.github.com/rch/8791893](https://gist.github.com/rch/8791893)

Edit -- I guess not many people agree with me:

[http://www.acooke.org/lepl/discontinued.html](http://www.acooke.org/lepl/discontinued.html)

------
jhallenworld
Well my pet language Ivy
[http://sourceforge.net/projects/ivy-lang/](http://sourceforge.net/projects/ivy-lang/)
allows you to define new statements via delayed-evaluation tricks (but without
actual quoting):

    
    
      fn myfor(&init, &test, &incr, &body)
        *init
        while *test
          *body
          *incr
    
      var a
    
      myfor a=0, a!=10, ++a
        print a
    

& creates a zero-argument lambda function out of the argument. * is sugar to
call a zero-argument function.

Also:

    
    
      fn set(&x y)
        *x = y
    
      set a 7  # Set variable a to 7
    

It works because when variables are evaluated, the result includes both the
value and the address from which it originated.
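A loose Python analogue of the trick (illustrative only; Ivy's `&` also carries the address the value came from, which the plain closures here don't):

```python
# Ivy's "&" makes a zero-argument lambda (a thunk) out of the argument,
# and "*" calls it. In Python the caller has to write the thunks out.

def myfor(init, test, incr, body):
    init()
    while test():
        body()
        incr()

state = {"a": 0, "out": []}
myfor(
    init=lambda: state.update(a=0),
    test=lambda: state["a"] != 10,
    incr=lambda: state.update(a=state["a"] + 1),
    body=lambda: state["out"].append(state["a"]),
)
print(state["out"])  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The point of the `&` sugar is precisely that the caller doesn't have to write those lambdas by hand.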

Anyway, creating a language is one thing. Creating a fast compiler for it is
quite another. It's impressive how well v8 works for the dynamic language
javascript, but the amount of work embedded in it is daunting. On my list of
future projects is to try to make a simple SSA-based optimizer.

~~~
mattgreenrocks
That's really awesome!

------
webreac
I have had ideas for a new language since I finished school in 1993; some of
them got implemented in Java. Recently, I discovered Scala. All my vague ideas
are in Scala. The only complaint I have against Scala is the slowness of the
compiler. I have stopped learning about compilers.

~~~
username42
At coursera.org, there is an excellent resource for learning about compilers:
"Stanford University - Compilers"

------
kriro
I'm building a toy-DSL and working through the ANTLR4 book right now mostly
because it was the first thing that came up when searching. Since I don't have
a formal CS background I'm lacking a bit in the "language development"
department.

Is there a good overview comparison of the strengths/weaknesses of what's
available (blog post, academic journal, I don't really care)? I mostly know
stuff exists because I've heard it mentioned somewhere (Flex/Bison, yacc,
using Prolog :P) and it's not the easiest thing in the world to make an
educated decision.

My current plan is to stick with ANTLR and muddle through and build a
prototype, then spend a day or two researching alternatives and seeing what
comes up, but if someone could cut down that research time I'd be thankful :)

~~~
andrewflnr
Recursive descent parsers are easy, at least if your grammar is LL(1). I seem
to recall there being a lot of resources on how to make your grammar LL(1),
but I don't recall which ones I read.
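For example, the classic LL(1) expression grammar fits in three small functions (Python sketch that evaluates as it parses; building AST nodes instead has the same shape):

```python
# Grammar (left recursion removed, so one token of lookahead suffices):
#   expr   -> term (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | "(" expr ")"

def parse_expr(toks):
    val = parse_term(toks)
    while toks and toks[0] in "+-":
        op = toks.pop(0)
        rhs = parse_term(toks)
        val = val + rhs if op == "+" else val - rhs
    return val

def parse_term(toks):
    val = parse_factor(toks)
    while toks and toks[0] in "*/":
        op = toks.pop(0)
        rhs = parse_factor(toks)
        val = val * rhs if op == "*" else val / rhs
    return val

def parse_factor(toks):
    tok = toks.pop(0)
    if tok == "(":
        val = parse_expr(toks)
        assert toks.pop(0) == ")"  # expect the closing paren
        return val
    return int(tok)

print(parse_expr("2 * ( 3 + 4 ) - 5".split()))  # → 9
```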

------
rurban
Surprisingly shallow article for someone like Walter Bright. Just some basic
recommendations, but nothing practical? No links to books, parsers,
implementations?

The easiest start would be "Compiler Construction using Flex and Bison" by
Anthony A. Aaby, available as a free PDF, and for adding LLVM to the mix the
link would be:
[http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/](http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/)

Much more needs to be said, e.g. links to languages which were written in a
day and are _not_ necessarily Lisp or Scheme. That would be too easy.

~~~
joe_the_user
This is a fabulous article.

It seems like the perfect antidote to the usual "here's the dragon book"
article.

Articles that point you to the dragon book, give a few ideas of what a parser
and lexer are, and then turn you loose are just terrible. They are like
articles on "how to build a house" that focus on "here's where to buy a
backhoe, here's where you can buy concrete, etc..."

Essentially, designing your own programming language is a difficult and, in
many ways, bad idea. But what you need to have a ghost of a chance is a "what
choices should you make, what will it feel like" narrative (there should also
be "what are the milestones", but given programming, that will sadly be too
variable to predict).

------
je_bailey
Every so often, there's an article on HN promoting the concept that everyone
should learn to code. I feel the same way about programming languages, that
every developer should write a language. It doesn't need to be a great
language, or overly functional. For me, it has provided incredible insight
into what is sugar, what is required, and why languages are syntaxed the way
they are. The years I've spent writing languages have made me a far better
developer.

------
e12e
If you want to play with languages, you could do worse than taking a peek at
OMeta:

[http://tinlizzie.org/ometa/](http://tinlizzie.org/ometa/)

------
asdasf
I think this shows that Walter has lots of compiler experience, not so much
lots of language design experience. You can't get good error messages without
semicolons? How can anyone say something that absurd in a serious article?
Have you tried a language without them? It works just fine.

Minimizing keywords is important not because of a word shortage, but because
of an overlap with words the programmer wants to use. It is obvious that if
you make "i" a keyword, someone will likely murder you. But there's tons of
grey area where you might be stepping on toes and getting in the way, so
minimizing the amount of toe stepping is good. Especially since there is
absolutely no reason to have lots of keywords.

~~~
Goosey
Full context: "Redundancy. Yes, the grammar should be redundant. You've all
heard people say that statement terminating ; are not necessary because the
compiler can figure it out. That's true — but such non-redundancy makes for
incomprehensible error messages. Consider a syntax with no redundancy: Any
random sequence of characters would then be a valid program. No error messages
are even possible. A good syntax needs redundancy in order to diagnose errors
and give good error messages."

Given the context I think it isn't so much that Walter is saying you can't
have good error messages without semicolons. He's saying you can't have good
error messages without redundancy (in this example a statement terminator).
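A toy illustration of the redundancy argument (Python, invented mini-grammar of `name = value ;` statements): the terminator both pinpoints where a statement went wrong and gives the checker a place to resynchronize, so later errors are still reported:

```python
def check_statements(tokens):
    errors, i = [], 0
    while i < len(tokens):
        start = i
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        stmt = tokens[start:i]
        if i == len(tokens):
            errors.append(f"missing ';' after statement starting at token {start}")
        else:
            i += 1  # consume ";" and resynchronize: later statements
                    # stay checkable even if this one was garbage
        if len(stmt) < 3 or stmt[1] != "=":
            errors.append(f"malformed statement at token {start}: {' '.join(stmt)}")
    return errors

# The bad middle statement doesn't prevent diagnosing the unterminated last one.
print(check_statements("x = 1 ; y 2 ; z = 3".split()))
```

Without the terminator, the checker would have to guess where one statement ends and the next begins, and a single mistake would poison everything after it.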

~~~
asdasf
>Walter is saying you can't have good error messages without semicolons

That is what he said though. I understand it is one example for a larger
point, but it is an example that doesn't support that point at all. It is
pretty hard to judge the overall point when the example is nonsense.

~~~
ronaldx
I think you're really missing the point.

Your goal is to write a whole programming language.

Calculating where each statement starts and ends in order to serve a good
error message is misdirected effort. Just use a terminating character, and use
the standard one: a semi-colon.

~~~
repsilat
> Just use a terminating character, and use the standard one: a semi-colon.

Significant whitespace is popular as well. You can argue all day about it, but
at the moment I think making a language "look like python" is the safer bet
for a _new_ language.

~~~
mercurial
It's debatable. I know two languages with significant whitespace: Python and
Haskell. And Haskell's rules are a pain.

For the language I'm working on, I had started with significant whitespace.
Then I realized that it would be a pain for things like anonymous callbacks
(the kind of thing JS code is littered with) and that it was distracting. At
least when you start, don't get hung up on the syntax; focus on basic things.
What's important is what you are going to do with your AST; the lexer/parser
parts are the least important areas, unless you are doing a "syntactic skin"
over a language (e.g., CoffeeScript). You can always change the syntax later.

~~~
chc
There's also F#, CoffeeScript, Yaml, Sass/Stylus, Haml/Jade and several
others, just among the popular languages.

And if we mean "significant whitespace" as in "newlines can act as statement
and expression terminators" (since we were discussing semicolons as the
alternative) we can also add a ton of other languages, like Ruby, JavaScript,
Go, Scala, Visual Basic and Lua.

~~~
mercurial
I don't think it's fair to compare markup languages to programming languages,
so we're down to F# and Coffeescript. That's not much of a trend toward
significant whitespaces.

I agree with newlines vs semicolons (though I've always been told to write
Javascript with semi-colons, and that's how I have encountered it in the
wild).

~~~
chc
> _I don 't think it's fair to compare markup languages to programming
> languages, so we're down to F# and Coffeescript. That's not much of a trend
> toward significant whitespaces._

I take your point about Haml, though Sass is actually Turing-complete, so I
almost feel like it belongs in the list despite being really off-the-wall.

And I was only choosing from reasonably popular languages (so things like Boo
are out even though they'd help my numbers). There just aren't that many
mainstream programming languages out there. Four in a category seems like a
pretty fair number to me. You could just as easily say functional programming
isn't a thing if four mainstream languages is considered a paltry showing.

------
Fasebook
Here's a more technical version of a very similar article by the same author
that was previously posted on HN:
[http://www.drdobbs.com/tools/creating-your-own-domain-specific-langua/231002656](http://www.drdobbs.com/tools/creating-your-own-domain-specific-langua/231002656)

