
The Rewards of Creating a Programming Language - StylifyYourBlog
http://mikedrivendevelopment.blogspot.com/2015/02/the-incredible-rewards-of-creating.html
======
t1m
I was hired (a long time ago) to write a language for a Very Large Telco
Equipment Supplier in Canada in order to support their automated regression
testing effort for their digital telephone switches.

It was called, ingeniously enough, "T" (no, not _that_ "T"). As far as I (and
cursory Google searches) know, it was never released to the adoring public.

I used lex and yacc (half jokingly referred to as 'ick' and 'yuck') and K&R C
for the compiler and VM.

The particular type of testing we were targeting involved writing test cases
that would read/write over serial lines to a telephone switch's console
program. Therefore, the language needed to have good serial/terminal I/O, and
it needed to have amazing string/pattern matching.

I wrote two features that I am still particularly fond of:

\- regexps were a built-in type. Strings were written like 'hi there' and
regexps were written like `hi ..ere`, supporting standard Unix regexps

\- associative arrays. Lots of languages have these now, like Python's dict

The cool thing about it was how we tried to allow strings and regexps to have
a polymorphic relationship at the language level. The statements:

    
    
      x == 'some string' or x == `some [^t]ring`
    

would be valid for strings in x, though the regexp had other operators that
didn't make much sense with strings. It got really interesting when we
combined the regexps with the associative array:

    
    
      dict['hello'] = 4
      dict['help'] = 2
    
      dict[`he.*`] == [2, 4]   # true
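
For readers who want to play with the idea, here is a minimal Python sketch of the lookup behaviour shown above. It is not how T was implemented; the `PatternDict` name, the linear scan over keys, and the choice to return the matching values sorted are all assumptions made for illustration.

```python
import re


class PatternDict(dict):
    """A dict whose lookup also accepts a compiled regexp as the key."""

    def __getitem__(self, key):
        if isinstance(key, re.Pattern):
            # Linear scan: collect the values of every string key that the
            # pattern matches in full. Sorting the values is an assumption,
            # made only so the result is deterministic.
            return sorted(
                value for k, value in self.items()
                if isinstance(k, str) and key.fullmatch(k)
            )
        # Ordinary keys behave exactly like a normal dict lookup.
        return super().__getitem__(key)


d = PatternDict()
d['hello'] = 4
d['help'] = 2
matches = d[re.compile(r'he.*')]   # [2, 4]
```

A scan like this costs time proportional to the number of keys, which is fine for small tables and is the obvious thing to beat if speed matters.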

~~~
TheLoneWolfling
I wonder how one would go about efficiently implementing such an associative
array.

I suppose a trie + glue logic (regexp -> NFA -> DFA) could work, for classic
(read: actually regular expressions) regexps at least.
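
A rough Python sketch of that idea, under loud assumptions: the trie is a nested dict, and the DFA for `he.*` is written out by hand rather than produced by a real regexp -> NFA -> DFA compiler. The names `build_trie`, `step`, `search`, and `HE_STAR` are made up for the example. The point is only that walking the trie and the DFA in lockstep prunes every branch the pattern can no longer match.

```python
def build_trie(items):
    """Build a nested-dict trie; the key '' on a node holds a stored value."""
    root = {}
    for key, value in items.items():
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[''] = value
    return root


# Hand-built DFA for the pattern he.* (an assumption for this sketch).
# transitions[state] maps a character to the next state; ANY is a wildcard.
ANY = object()
HE_STAR = {
    'start': 0,
    'accepting': {2},
    'transitions': {0: {'h': 1}, 1: {'e': 2}, 2: {ANY: 2}},
}


def step(dfa, state, ch):
    """Return the next DFA state, or None if the DFA rejects this character."""
    row = dfa['transitions'].get(state, {})
    return row.get(ch, row.get(ANY))


def search(trie, dfa):
    """Collect the values of all keys the DFA accepts, pruning dead branches."""
    results = []
    stack = [(trie, dfa['start'])]
    while stack:
        node, state = stack.pop()
        if '' in node and state in dfa['accepting']:
            results.append(node[''])
        for ch, child in node.items():
            if ch == '':
                continue  # value marker, not a real edge
            nxt = step(dfa, state, ch)
            if nxt is not None:
                stack.append((child, nxt))
    return sorted(results)


trie = build_trie({'hello': 4, 'help': 2, 'world': 7})
found = search(trie, HE_STAR)   # [2, 4]
```

The `world` branch is abandoned at its first character, so the work done depends on the part of the trie the pattern can reach rather than on the total number of keys.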

~~~
t1m
I have wondered the same thing, but wasn't extremely concerned about getting
it super-fast at the time. We had an existing, older language that was much
loathed and quite slow, and we just had to beat it. And awk.

------
evincarofautumn
If you’re thinking of writing a language in earnest, you will create something
much more valuable if you start from a novel semantics, and only then come up
with a syntax to express those semantics, than if you were to start from
syntax.

The world does not need yet another reskin of Java, but it could use new
programming paradigms and new ways of solving problems.

As a learning exercise, implementing a language is worthwhile simply to gain
the understanding that languages are _not magic_. To boot, you’ll pick up
loads of useful techniques in the realms of parsing, data flow analysis, error
reporting, and optimisation.

~~~
nostrademons
If you're looking to build something that people will actually use, you're
better off not doing anything novel at all, but rather combining novel ideas
that have shown promise in research languages into a package that people might
actually want to use for everyday programming.

There's a rule of thumb among language designers that your language should
either focus on proving out one big _language feature_ , or it should
introduce zero new language features but combine new language features from a
number of other languages. So for example:

Erlang introduced lightweight CSP-based concurrency. Go popularized it.

Haskell introduced typeclasses. Go and Rust popularize them (as interfaces and
traits, respectively, the latter also influenced by C++ STL's concepts).

Cyclone introduced linear types. Rust popularizes them.

Smalltalk introduced object-orientation. (More precisely, Simula did and
Smalltalk took it to its logical conclusion.) Java, C++, and Objective-C
popularized it.

Self introduced prototype-based programming. Javascript popularized it.

CLOS (Common Lisp) introduced the meta-object protocol. Python and Ruby
popularized it, particularly in their Django and Rails web frameworks
respectively.

Usually popularizing a language involves a good deal of work that's not sexy,
notably building up a large standard library, developer tools, a package
manager, and a whole ecosystem around the language.

~~~
felixgallo
Erlang's concurrency is actors, which is different from Go's CSP. And I nearly
spat out my tea when you said that Go popularized something Haskell did; and I
haven't even had any tea today.

~~~
codygman
But didn't Go effectively popularize typeclasses through interfaces?

~~~
felixgallo
no?

~~~
codygman
Can you explain how? Obviously at least two people either disagree or aren't
privy to the knowledge you hold!

~~~
peteretep
I suspect a reasonable person might expect you to explain how you came to the
conclusion that a language with a user base smaller than or comparable to
Haskell's "popularised" type classes.

~~~
sanderjd
Huh, I interpreted it differently, as in questioning whether Go's interfaces
have much in common with Haskell's typeclasses. But now I'm intrigued: is
Haskell really more popular than Go, or about as popular? That would
surprise me, but I don't have any data, do you?

~~~
peteretep
It's remarkably difficult to find anything approaching hard numbers :-)
However, I am going off:

* [http://redmonk.com/sogrady/2015/01/14/language-rankings-1-15...](http://redmonk.com/sogrady/2015/01/14/language-rankings-1-15/) (Haskell slightly ahead in their numeric ranking)
* [http://www.tiobe.com/index.php/content/paperinfo/tpci/index....](http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html) (Golang slightly ahead in their ranking)
* Slightly more published books on Haskell that I can find (although obviously it's been around a lot longer)

I have no doubt that Go will eventually surpass Haskell in terms of
popularity, and may even have a slight edge, but suggesting that Go
popularized typeclasses smacks of fanboyism, and reminds me - in no small
measure - of:
[https://www.youtube.com/watch?v=bzkRVzciAZg](https://www.youtube.com/watch?v=bzkRVzciAZg)

~~~
codygman
> suggesting that Go popularized typeclasses smacks of fanboyism

I'm not exactly a fan of Go, nor am I a fanboy; quite the opposite these days.
I don't have to like Go to mention that the way interfaces are used kind of
accomplishes the same thing as typeclasses.

I also feel that (sadly) Go is more popular than Haskell, partly because
the number of Haskell programmers I've encountered in real life is much lower
than the number of Go programmers I've encountered.

------
nemesisrobot
The author says he turned to Coursera but doesn't mention which course(s) he
took; I'm going to guess it's the 'Compilers' class from Stanford [0].
I've heard good things about the course and the lecturer (Alex Aiken), so I
really wanted to take it while it was being offered, but I was too busy
last year. I hope they offer it again this year.

[0] [https://www.coursera.org/course/compilers](https://www.coursera.org/course/compilers)

~~~
zellyn
This is excellent: I recently worked through this class (at my own pace: you
can register for past offerings). I made things a little harder than necessary
by eschewing the provided framework code and writing everything myself, in Go.

I found SPIM super-annoying, but managed to resist the temptation to build a
better MIPS emulator. :-)

Highly recommend: writing a (very simple) compiler is no longer a black-art-
seeming thing to me.

~~~
zellyn
Oh, [http://github.com/zellyn/gocool](http://github.com/zellyn/gocool): I
recommend you steal my tests (bash, sorry!) but avoid cheating too much on the
actual meat of the exercises :-)

~~~
nemesisrobot
Thank you for this; cloning it now

------
austinz
Another interesting thing to try (and maybe complementary as well) is building
an interpreter or compiler for a language you like, trying to match the spec
and feature set as closely as possible, and (hopefully) gaining more insight
into the tradeoffs the language designers made when deciding why things are
the way they are.

~~~
SloopJon
Yes, I've written tools for existing languages, and found just getting the
parser right to be a challenge, be it with Emacs highlighting, flex/bison,
what have you. Real languages aren't as tidy as the textbook examples.

~~~
seanmcdirmid
When I do PL these days, I always start with the editor; e.g. see:

[http://research.microsoft.com/en-us/people/smcdirm/managedti...](http://research.microsoft.com/en-us/people/smcdirm/managedtime.aspx)

Co-designing your editor with your language allows for a better experience.

You are right in that all of this is basically undocumented in textbooks. Even
their presentation on parsing is mostly unhelpful (if you want an
error-tolerant incremental parser for your editor, recursive descent works
very well).

~~~
handojin
That is really cool. This focus on time is what draws me into the whole
clojure/datomic world. We're ill equipped to handle it. Tools help.

Does time itself reveal itself as the horizon of being?

~~~
seanmcdirmid
We can manage time one way or the other. Clojure does it with a focus on
immutability, I'm doing it with a focus on mutability :)

One thing that is cool is how time is related to observation, the same thing
that happens in quantum entanglement. If you don't observe the state of an
object, time doesn't really exist.

------
WalterBright
I wrote this a while back on creating your own language:
[http://digitalmars.com/articles/b89.html](http://digitalmars.com/articles/b89.html)

~~~
Toenex
_As an aside, I'll note that working on stuff that I needed has fared quite a
bit better than working on stuff that I was told others need._

Someone once told me something similar, "always be customer number one".

------
zak_mc_kracken
If you are a developer and you have never written a language, you owe it to
yourself to do this at least once.

Writing a language (lexical, syntactic, and semantic phases, then code
generation) will teach you an incredible amount in much less time than it
would take you to read about these things.

What your language looks like is completely irrelevant, this is strictly about
learning from the journey.

And once you've done that, make sure to mention it in your resume: I, for one,
will instantly give you brownie points if you mention in your resume that you
wrote a language.

~~~
tjradcliffe
I view writing a language as either a creative epiphany or evidence of
psychosis, so when I see someone with a language on their CV I'm always
careful to explore why they wrote it. Surprisingly often they think it's going
to Solve All The Problems rather than allowing them to explore one of the
deepest and most important parts of the programming art.

I've written little languages (mostly for generating code in other languages,
which proved to be an interesting way to explore certain design patterns) but
always shied away from going all in on a bigger language. Maybe when I
retire...

------
bsurmanski
I've been working on a programming language in my spare time, also. I call it
OWL. It's not quite ready, but here's a link anyway:
[https://github.com/bsurmanski/wlc](https://github.com/bsurmanski/wlc)

OWL aims to be a low level object-oriented language without a garbage
collector (albeit with reference counting).

The best 'example' program using OWL right now is a game I made for the 48
hour game jam, Global Game Jam 2015:
[https://github.com/bsurmanski/ggj2015](https://github.com/bsurmanski/ggj2015)

I agree with the poster that making a programming language from scratch is a
great way to explore programming. On top of that, it's a great experience
being able to implement all of those features you wish were in your favorite
language (or find out why they aren't).

------
nilliams
> Notice how not a single variable in this code has a name. They all only have
> types.

Pretty cool, never even thought about this being possible.

~~~
evincarofautumn
It’s not a novel idea, but Wake does it quite well. Most variable names aren’t
that useful—they’re just one of many possible syntactic ways to plumb values
around.

~~~
chipsy
The canonical example is how it's done in Forth: Ask for a number and it's
already on the stack. Ask for three numbers and you push onto the stack three
times. Upside: Terse code. Downside: Stack juggling can be a big maintenance
burden.
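
To make the "it's already on the stack" point concrete, here is a toy Forth-flavoured evaluator in Python. It is only a sketch: the `run` function and its handful of words (`+`, `*`, `dup`, `swap`) are assumptions chosen for the example, not real Forth semantics in full.

```python
def run(program, stack=None):
    """Evaluate a whitespace-separated stack program and return the stack."""
    stack = [] if stack is None else stack
    words = {
        # Each word takes its operands from the stack; nothing is named.
        '+': lambda s: s.append(s.pop() + s.pop()),
        '*': lambda s: s.append(s.pop() * s.pop()),
        'dup': lambda s: s.append(s[-1]),
        # Pop the top two items and push them back in reversed order.
        'swap': lambda s: s.extend([s.pop(), s.pop()]),
    }
    for token in program.split():
        if token in words:
            words[token](stack)
        else:
            stack.append(int(token))  # anything else is an integer literal
    return stack


squared = run('3 dup *')      # [9]
total = run('2 3 + 4 *')      # [20]
swapped = run('1 2 swap')     # [2, 1]
```

The terseness is visible in `3 dup *`, and so is the juggling: once a word needs its arguments in a different order, `swap` and friends start to pile up.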

FWIW I've realized that in languages that are variable centric, an anonymous
naming convention is often preferable to a human name because it centers one's
concerns around the algorithm. After studying the options for a while I now
flip between descriptive word style and "single letter and number." So I have
a lot of "x0, x1, i0, s0, a0" in my function arguments. Where it's a record
type, I'm deciding between abbreviation and descriptive - which ultimately
depends on how often and densely I expect the data structure to be
accessed(short names imply it gets used in a dense, idiomatic style).

~~~
evincarofautumn
Yep. I am working on a statically typed stack-based language, as I quite like
the Forth style. Stack manipulation sucks, so I mainly use locals for
auxiliary parameters, to get them out of the way of the main data structure or
values that a definition is manipulating implicitly. For example:

    
    
        define do_the_thing (Foo bool -> Bar):
          -> should_log;
    
          if (should_log):
            dup log_foo
    
          foo_to_bar
    
          if (should_log):
            dup log_bar

------
kolev
I've recently discovered PEG.js [0] and prototyped a small DSL in the browser
[1]. Amazing stuff! I've dealt with ANTLR [2] before, but PEG.js feels so much
nicer although it may not be as powerful!

[0] [http://pegjs.org/](http://pegjs.org/)

[1] [http://pegjs.org/online](http://pegjs.org/online)

[2] [http://www.antlr.org/](http://www.antlr.org/)

~~~
munro
PEG.js is definitely fun to prototype ideas in the browser.

My new love is Parsec [1] + doctest [2]. You start building simple little
parsers, and build on top of them, plus inline testing makes it easy to write
the parsers correctly. Also Haskell's algebraic datatypes and pattern matching
make it nice to build and work with the AST. <3

[1] [https://wiki.haskell.org/Parsec](https://wiki.haskell.org/Parsec)

[2] [https://hackage.haskell.org/package/doctest](https://hackage.haskell.org/package/doctest)

------
element11
I always wonder about the final conversion to assembly. Do you really need to
learn all the instructions, the linking, the binary formats and all that
stuff?

~~~
tlb
My favorite way is to output C, which is conveniently human-readable for
debugging. Many people also like the LLVM back end.
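
As a very small illustration of the "output C" route, here is a Python sketch that turns a toy expression tree into a C program. The tuple-based AST shape and the `emit_expr` / `emit_program` names are assumptions for the example; a real compiler would also need variables, types, and statements.

```python
def emit_expr(node):
    """Translate a tiny expression AST into a C expression string.

    A node is either an int literal or a tuple (op, left, right),
    where op is a C binary operator such as '+' or '*'.
    """
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    # Parenthesise everything so C precedence rules never matter.
    return f"({emit_expr(left)} {op} {emit_expr(right)})"


def emit_program(expr):
    """Wrap the expression in a C main() that prints its value."""
    return "\n".join([
        "#include <stdio.h>",
        "",
        "int main(void) {",
        f'    printf("%d\\n", {emit_expr(expr)});',
        "    return 0;",
        "}",
        "",
    ])


c_source = emit_program(('+', 1, ('*', 2, 3)))
```

The generated text can be handed to any C compiler, which then takes care of instruction selection, linking, and binary formats.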

~~~
element11
I knew about that possibility, but how efficient is it compared to other
techniques?

~~~
samatman
Nim is competing effectively by using this tactic. It's efficient.

------
emmab
When your top-down parser gets caught in an infinite loop, you'll need to
convert the left recursion to right recursion. See here:
[https://en.wikipedia.org/wiki/Left_recursion#Removing_left_r...](https://en.wikipedia.org/wiki/Left_recursion#Removing_left_recursion)

Writing a compiler is a significant amount of work and should be pre-
researched to avoid issues like the above.
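
A minimal Python sketch of that rewrite, assuming a toy grammar of integers joined by `+` and `-`. The left-recursive rule `expr -> expr op term | term` would make a naive recursive-descent `parse_expr` call itself forever; rewriting it as `expr -> term (op term)*` turns the recursion into a loop while keeping left associativity. All function names here are made up for the example.

```python
import re


def tokenize(text):
    """Split the input into integer and operator tokens (anything else is skipped)."""
    return re.findall(r'\d+|[-+]', text)


def parse_term(tokens, pos):
    """term -> INTEGER"""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected a number at token {pos}")
    return int(tokens[pos]), pos + 1


def parse_expr(tokens, pos=0):
    """expr -> term (('+' | '-') term)*

    Rewritten from the left-recursive  expr -> expr op term | term.
    The loop folds each new term into the running value, so
    '10 - 3 - 2' still groups as (10 - 3) - 2.
    """
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ('+', '-'):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == '+' else value - rhs
    return value, pos


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    tokens = tokenize(text)
    value, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value


result = evaluate('10 - 3 - 2')   # 5
```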

------
damian2000
I was reading something similar here about creating your own language using
LLVM (and also using the Ruby bindings for LLVM as opposed to C++) ...
[http://macournoyer.com/blog/2008/12/09/orange/](http://macournoyer.com/blog/2008/12/09/orange/)

------
agumonkey
I'm curious how people feel about syntax. I tend to think everything should be
written in sexp (I like [s]ml syntax too though). Much like this
[http://cs.brown.edu/courses/cs173/2012/book/Everything__We_W...](http://cs.brown.edu/courses/cs173/2012/book/Everything__We_Will_Say__About_Parsing.html)

Maybe I'm missing something, how many people felt that syntax made an idea pop
differently and was part of the understanding ? To me it's often the opposite,
it conflates too many things in a few symbols and then you waste time
discussing corner cases.

------
xjia
Inaccurately, programming language = syntax + semantics.

Do not waste your time on syntax, if you are going to create a new language,
rather than a parser.

I'm not saying that syntax is not important. But I feel semantics deserves
much more attention.

~~~
tormeh
I think syntax may actually be more important for adoption than semantics. But
for just learning semantics is king.

------
girvo
Yes! I agree entirely, which is why I've been diving deep into creating (and
playing with) new languages lately. Turns out a lot of problems that I have to
solve at work can be fit into a parsing problem too, so it's not just all
theoretical, but the big part of learning how to build languages is
appreciating how the ones I use every day work; I can extend them, understand
bugs, and push them to their limits. I couldn't do that prior: I always
thought it was just black magic that greying neckbeard wizards did off in some
ivory tower!

------
nubela
"that types are often a great name for variables, which both the programmer
and compiler can use to easily and effectively understand most common code."

This hit me, what if variable names CONTAIN types? int__i = 1

And of course, it can be made optional. I would much prefer to do optional
typing in the var name itself rather than the ugly syntax that Guido has
proposed for Python.

~~~
WalterBright
It's called Hungarian Notation, famously used by Microsoft.

[http://en.wikipedia.org/wiki/Hungarian_notation](http://en.wikipedia.org/wiki/Hungarian_notation)

------
lifeisstillgood
the best happy-accident ... Months and months later, I realized that ...
thanks to this simple idea, I could ...

That sentence is in every act of creation - something new just spawns
something else that was not possible before the original act.

It's marvellous to see and indicates he is on the trail of something good -
all the best.

------
erikb
I scrolled through the article and found some attributes of a language that is
probably created by the author. But where are the rewards? I'm confused.

------
scotty79
I wonder how much fun one could have with OMeta
[http://tinlizzie.org/ometa/](http://tinlizzie.org/ometa/) ?

------
davidw
I had fun doing Hecl. It didn't bring me fame and riches, but it did get used
by a few companies, and it got me some interesting gigs.

------
blt
currently stockpiling semantics ideas until I have enough good ones to justify
the effort...

