
Show HN: a lexer, parser, interpreter and web runtime (np, a 5-weekend project) - Udo
http://np-lang.org/
======
Udo
I made this primarily because I wanted to learn more about writing an
interpreter and I also wanted to experiment a little with the decisions that
go into language design - so it turned out not like a "normal" C-like, but
hopefully still recognizable. To my surprise, making decisions about how scope
should work was actually the hardest part.

I can only recommend that every hacker do this at least once in their
lifetime; it's been a real eye-opener for me with respect to problems I don't
usually think too much about.

That said, feedback or questions are very welcome :)

~~~
gngeal
"To my surprise, making decisions about how scope should work was actually the
hardest part."

Perhaps a thorough reading of SICP would enlighten you as to how scope should
be handled if you don't want to run into problems with it.

~~~
Udo
I know SICP, it's great. However, arguments from authority only get one so
far. I think it's definitely worth experimenting with different paradigms. And
I'm not done yet. However, there is something to be said for not taking "the
standard way of doing things" at face value all the time, even if that means
arriving at the exact same solution in the end. In my defense though,
different languages behave in different ways in this respect, so at least I'm
not violating the entire collective wisdom of computing (just a part of it).

------
peterkelly
"I wanted to explore what it feels like to write a parser, a lexer, and an
interpreter without using a framework or some other ready-made solution"

This is an excellent exercise for any programmer who wants to really
understand what's going on behind the scenes. I've always believed it's useful
to understand at least one layer below what you normally work with.

Some people will criticise you for "why create another programming language
when we already have X?" but ignore them. As long as you explain that it's an
experiment (as you have) you should be fine. And it's possible that by
starting something completely from the ground up you'll discover a different
take on things that might be better than what exists already.

Just be prepared to do a complete rewrite or three once you've got your
initial prototype working ;)

"and I didn't really consult any literature either"

I really recommend that you do.

The single best resource on this topic is the "Dragon Book" (officially
"Compilers: Principles, Techniques, and Tools" by Aho, Lam, Sethi, and Ullman)
- the latest edition is from 2006, though the older one should be fine for
what you want too. I wrote my first compiler before learning the "proper" way
of doing things and it was a complete mess. It was only after studying
compiler construction at university that I learnt about things like formal
grammars, syntax trees, etc. The dragon book will take you all the way through
to expert level, and is written in such a way that you can implement different
parts of your compiler as you go through the book.

One caveat though is it's _huge_ (close to 1,000 pages). I spent a couple of
months full time working through it (coding as I went along) during a long
break from work a couple of years ago, and only got about half-way through in
that time. But you only really need the first few chapters to get the basics;
the later parts go on to a lot of advanced code optimisation techniques that
you only really need if you're trying to write a high-performance production
compiler or get into compiler research.

SICP, as someone else mentioned, is also very good, though it's really more
about the overarching theoretical concepts and doesn't go into the same amount
of depth as the dragon book. I think reading both is a good idea.

Just be warned, this stuff is highly addictive, and you can spend _years_
studying it and still end up with lots of gaps in your knowledge ;)

~~~
stiff
_The single best resource on this topic is the "Dragon Book"_

Your advice is great in general, but I wish people would back up statements
like this with some context, for example, what specifically have you compared
it with? Inspired by Steve Yegge's blog posts, I once decided to write a toy
compiler for educational purposes, and I started by reading the Dragon Book.
In fact, I eventually read it almost cover to cover and implemented many
algorithms from it; it is definitely full of valuable information. But it is
also a very roundabout way of learning how to write an actual compiler; it is
more a theoretical reference work than anything else. There are several
hundreds of pages devoted to parsing, but some of the more modern techniques
are not covered, runtime is treated much more briefly and many practical
issues are not discussed at all. There are some nice modern textbooks that are
more to the point:

<http://www.amazon.com/Modern-Compiler-Design-D-Grune/dp/0471976970/>

<http://www.amazon.com/Engineering-Compiler-Second-Keith-Cooper/dp/012088478X/>

ANTLR is a great tool and many practical issues I solved with the help of this
book about it:

<http://www.amazon.com/gp/product/1934356999/>

Finally, the source code of the original AWK is a great example codebase for
learning about real-world parsing, building and traversing parse trees, etc.:

<https://github.com/danfuzz/one-true-awk>

It is an interpreter and not a compiler, but once you know how to do the
things just mentioned, converting it to a naive compiler isn't that hard if
you are not interested in fancy optimization, as the OP said.

~~~
peterkelly
s/The single best resource/Out of the resources I've read, the one I've
personally found most useful/

;)

I should also probably add that I only ever read the dragon book years after I
first started learning about compilers through more gentle means such as the
3rd-year compiler construction course I studied, and a bunch of other reading
and experimentation. It's probably better to start off with something simpler.
I haven't read either of the two books you referenced; they may indeed be more
appropriate for a beginner.

While I haven't used ANTLR, I've used bison & flex, as well as Stratego (which
operates on a much higher level and is actually very nice).

I guess it depends on what aspect of compilers you're most interested in and
how much depth you want to go into. The dragon book will teach you all about
how to write your own lexer + parser generator (which I found quite
fascinating), but you don't need to know about NFAs/DFA construction etc. if
you just want to create your own language implementation, given the existence
of many good lexer/parser generators to which you can just pass in a formal
grammar.

Another book I also found very useful was "The Implementation of Functional
Programming Languages" by Simon Peyton Jones
(<http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/index.htm>),
though it's mostly of use only if you're specifically interested in functional
programming (which is what I've mainly focused on in my research; this book is
much more specialised than some of the others).

~~~
stiff
_The dragon book will teach you all about how to write your own lexer + parser
generator (which I found quite fascinating), but you don't need to know about
NFAs/DFA construction etc._

One of the most fun projects I made inspired by the Dragon Book was a small
grep implementation that built up an NFA from a regular expression parsed by
recursive descent and then simulated the NFA using an algorithm with two
stacks described in the book :)

It's just that it takes a lot of time to get something practical running
starting from just the Dragon Book, and it can be discouraging given how much
effort it takes to read it. Overall I can second everything you say, though.
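
For readers who want to see the shape of such a project, here is a minimal Python sketch of the same idea: a recursive-descent parser that builds an NFA (Thompson-style) and then simulates it. It is not the grep described above and not the book's code. It supports only literals, `|`, `*` and parentheses, matches the whole input rather than searching within a line, and tracks the active states in a set instead of the two stacks the book uses. All class and function names are made up for illustration.

```python
# Sketch: regex -> NFA via recursive descent, then NFA simulation.
# Assumed toy grammar:
#   alt    := concat ('|' concat)*
#   concat := star*
#   star   := atom '*'*
#   atom   := literal | '(' alt ')'

class NFA:
    def __init__(self):
        self.eps = {}    # state -> list of states reachable on epsilon
        self.step = {}   # state -> (character, next state)
        self.count = 0

    def new_state(self):
        s = self.count
        self.count += 1
        self.eps[s] = []
        return s


class Parser:
    def __init__(self, pattern):
        self.p, self.i, self.nfa = pattern, 0, NFA()

    def peek(self):
        return self.p[self.i] if self.i < len(self.p) else None

    def parse(self):
        start, end = self.alt()
        if self.peek() is not None:
            raise ValueError("unexpected %r at %d" % (self.peek(), self.i))
        return self.nfa, start, end

    def alt(self):
        start, end = self.concat()
        while self.peek() == "|":
            self.i += 1
            s2, e2 = self.concat()
            s, e = self.nfa.new_state(), self.nfa.new_state()
            self.nfa.eps[s] += [start, s2]      # branch into either side
            self.nfa.eps[end].append(e)
            self.nfa.eps[e2].append(e)
            start, end = s, e
        return start, end

    def concat(self):
        start = end = self.nfa.new_state()      # empty fragment
        while self.peek() not in (None, "|", ")"):
            s2, e2 = self.star()
            self.nfa.eps[end].append(s2)        # chain fragments together
            end = e2
        return start, end

    def star(self):
        start, end = self.atom()
        while self.peek() == "*":
            self.i += 1
            s, e = self.nfa.new_state(), self.nfa.new_state()
            self.nfa.eps[s] += [start, e]       # enter the loop or skip it
            self.nfa.eps[end] += [start, e]     # repeat or leave
            start, end = s, e
        return start, end

    def atom(self):
        c = self.peek()
        if c == "(":
            self.i += 1
            frag = self.alt()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.i += 1
            return frag
        if c is None or c in "|)*":
            raise ValueError("unexpected %r at %d" % (c, self.i))
        self.i += 1
        s, e = self.nfa.new_state(), self.nfa.new_state()
        self.nfa.step[s] = (c, e)
        return s, e


def closure(nfa, states):
    """All states reachable from `states` using only epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        for t in nfa.eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def matches(pattern, text):
    """True if `pattern` matches all of `text`."""
    nfa, start, accept = Parser(pattern).parse()
    current = closure(nfa, [start])
    for ch in text:
        moved = [nfa.step[s][1] for s in current
                 if s in nfa.step and nfa.step[s][0] == ch]
        current = closure(nfa, moved)
    return accept in current
```

Calling `matches("(a|b)*abb", "aababb")` returns `True`, while `matches("(a|b)*abb", "abab")` returns `False`.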

------
mkdir
This is an inspiring project. It would have taken me _far_ more than five
weekends to create this, so you can feel smug knowing you're about ten times
as efficient as a random HN member!

I have a couple of criticisms. First of all, strange things happen when a
semicolon is omitted from the end of a function definition. In the following
example, there is no output. What's going on?

    
    
        square = { n |
          n * n
        }
        println(square(5));
    

Second, according to the rules for omitting parentheses, it seems like this
should output "25". Instead, it outputs "Function5":

    
    
        square = { n |
          n * n
        };
        println square(5);
    

It seems like it's passing "square" and 5 as separate parameters to println,
which is unlikely to be the intended behavior in most cases.

Lastly, is recursion possible?

    
    
        factorial = { n |
          if (n < 2) { 1 }
          { n * factorial(n - 1) }
        };
        println(factorial(5));
    

That code generates the following error:

    
    
        Error: function identifier expected ('Exp() :3:5' found) at line 3 char 5
    

Anyway, I don't mean to nitpick! Do you plan on developing this further?

~~~
Udo
First: thanks for even trying it out!

Yeah, if you don't leave off the final semicolon, it doesn't return the last
value. In that case you would have to use a normal "return" statement. It's
definitely a bug, but I left it in there for some time because I thought it
was weirdly interesting. I'm going to get rid of that bug though in the next
version.

 _> Second, according to the rules for omitting parentheses_

Ah, I'm doing a bad job with the tutorial then. Parentheses don't work like
they do in C-like languages. It's more like Lisp in that regard. Your last
statement prints "square"="Function" because you're not invoking the function
but getting the function pointer. The correct way would be (again, somewhat
Lisp-like):

    
    
        square = { n |
          n * n
        };
        println (square 5);
    

Again, this would also be the solution to your last example.

I should probably make a general syntax paragraph as part of the tutorial,
especially for people coming from C-likes where invocation goes function(a)
instead of (function a).
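
For readers more at home in C-likes, the difference can be mimicked in Python. This is only an analogy, not np semantics: `println` below is a hypothetical stand-in that concatenates its arguments and shows a function value as "Function", which is what the output reported above suggests np does.

```python
def square(n):
    return n * n


def println(*args):
    """Hypothetical stand-in for np's println: joins all of its arguments."""
    text = "".join("Function" if callable(a) else str(a) for a in args)
    print(text)
    return text


# Roughly what `println square(5);` means in np: two separate arguments,
# the function value itself and the parenthesised expression 5.
separate = println(square, 5)

# Roughly what `println (square 5);` means: one argument, the call result.
nested = println(square(5))
```

The first call prints `Function5` and the second prints `25`, matching the two behaviours discussed in this subthread.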

 _Edit_ : there's a section on the site now to explain this, I hope this will
make it easier.

~~~
mkdir
Actually—I think you did explain properly. I must confess that I jumped into
the sandbox before reading the documentation, and then I only checked the
documentation when I ran into behavior I didn't expect.

I _do_ think the semicolon behavior is a bit confusing. The documentation
explains that expressions aren't auto-returned if they're followed by a
semicolon, but I don't think that quite covers the behavior I pointed out. I
might be missing something else from the documentation, though...

Also, I found one more major issue. Your documentation uses the numeric
literal "3.1415" in the explanation for named parameters. That _really_ needs
to be "3.1416". :P

~~~
Udo
> _I do think the semicolon behavior is a bit confusing._

I agree.

Regarding the named params, the example is:

    
    
      f 
        3.1415 
        #name:"I'm using a named param!";
    

where 3.1415 is an unnamed parameter, and the 3rd line contains the named
part. I'm probably showing too many things at once here.

Also, there are definitely a few bugs still to iron out!

~~~
mkdir
Regarding "3.1415", I was simply passive-aggressively pointing out that if
you're referencing pi, you should probably round up to "3.1416". The example
itself made sense!

~~~
Udo
Ah, I'm so dense! Sure, there is also "math.pi" as a built-in for that. (And I
fixed this grave rounding error)

I also discovered a scope problem with the recursion example you gave. Turns
out, there is (of course) a major bug in there, thanks for discovering that.
It's about the visibility of the "factorial" symbol itself, so the workaround
is for now:

    
    
      factorial = { n |
        factorial = outer.factorial;
        if(n == 0) {
          1
        } {
          n * (factorial (n - 1))
        } 
      };
      println (factorial 5); 
    

At least, until I restart the service (which I don't want to do while everyone
is potentially using the tutorial).
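
For anyone wondering how a bug like this can arise in general, here is a small Python sketch of scope lookup through a chain of environments. It is an illustration under my own assumptions, not np's actual implementation: `Env`, `make_function` and the `lexical` switch are hypothetical names. When the call scope is linked to the scope where the function was defined, the function can find its own name; when that link is missing, the recursive lookup fails.

```python
class Env:
    """One scope: local bindings plus a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.vars:
                return env.vars[name]
            env = env.parent
        raise NameError(name)


def make_function(params, body, defining_env, lexical=True):
    """Build a callable whose body receives a fresh Env on every call."""
    def call(*args):
        # lexical=True: the call scope chains to the defining scope, so names
        # bound there (including the function's own name) are visible.
        # lexical=False: the call scope is isolated, which breaks recursion.
        local = Env(defining_env if lexical else None)
        local.vars.update(zip(params, args))
        return body(local)
    return call


def factorial_body(env):
    n = env.lookup("n")
    if n < 2:
        return 1
    return n * env.lookup("factorial")(n - 1)


# Working case: "factorial" is found through the parent link.
top = Env()
top.vars["factorial"] = make_function(["n"], factorial_body, top)
result = top.lookup("factorial")(5)

# Broken case: the body cannot see the outer "factorial" binding.
broken = Env()
broken.vars["factorial"] = make_function(
    ["n"], factorial_body, broken, lexical=False)
try:
    broken.lookup("factorial")(5)
    error = None
except NameError as exc:
    error = str(exc)
```

Here `result` is `120`, and `error` ends up as `"factorial"`, the name that could not be resolved.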

------
endlessvoid94
Keep up the good work!

I recently started working on a bytecoded javascript interpreter for the same
reason - fun. I've found the biggest criticism people have for me is "why are
you doing this?" You can definitely draw a line between the typical responses:
either "oh, that's cool!" or "that's stupid". It's been interesting.

~~~
mgallivan
I've seen this response to others working on new languages. How do people
expect languages to improve without hobbyists honing their skills? A lot of
_good_ languages start out in one person's head so please keep at it!

------
niggler
Failing in Chrome:

    
    
        java.lang.ArrayIndexOutOfBoundsException: 1
    	at np.LibRequest.parseCookies(LibRequest.java:37)
    	at np.LibRequest.<init>(LibRequest.java:26)
    	at np.Interpreter.initRootContext(Interpreter.java:94)
    	at np.Interpreter.run(Interpreter.java:115)
    	at np.Interpreter.load(Interpreter.java:142)
    	at npfcgi.handleRequest(npfcgi.java:60)
    	at npfcgi.main(npfcgi.java:107)

~~~
Udo
Of course I managed to cause a fatal error just after posting the link here...
;)

Anyway, the tutorial demo server should work now.
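
As background: an out-of-bounds index of 1 inside a cookie parser is often the result of splitting a `name=value` pair on `=` and reading the second element when a cookie has no `=` at all. That is only a guess at what happened in `LibRequest.parseCookies`; the np source is not shown in this thread. A defensive parser along those lines, sketched in Python with a hypothetical `parse_cookies` helper, might look like this:

```python
def parse_cookies(header):
    """Parse a Cookie header into a dict, tolerating pairs without '='."""
    cookies = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue                       # skip empty segments like "a=1;;"
        # partition never fails: sep is "" when there is no "=" in the part
        name, sep, value = part.partition("=")
        cookies[name.strip()] = value.strip() if sep else ""
    return cookies
```

With this version, `parse_cookies("a=1; flag; b=x=y")` returns `{"a": "1", "flag": "", "b": "x=y"}` instead of raising.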

------
iskander
Looks cool but the sandboxed interpreter fails to run in my instance of Chrome
(v25 on Mac OS X):

    Error: java.lang.NullPointerException
    np.LibRuntime.b_assign(LibRuntime.java:214)
    sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:616)
    np.LibRuntime.execute(LibRuntime.java:104)
    np.CoreBuiltin.execute(CoreBuiltin.java:85)
    np.ClastExp.invoke(ClastExp.java:32)
    np.ClastExp.run(ClastExp.java:45)
    np.ClastExp.invoke(ClastExp.java:21)
    np.ClastExp.run(ClastExp.java:45)
    np.ClastModule.run(ClastModule.java:29)
    np.Interpreter.run(Interpreter.java:120)

etc...

~~~
Udo
The tutorial examples seem to work. Could you post the code you're executing?

Thanks for letting me know!

~~~
iskander
I don't remember exactly what I typed in, something along the lines of:

    
    
        x = 10;
        y = 20;
        println (x ^ (y ^ y));

~~~
Udo
Huh, the ^ operator isn't defined yet, but that shouldn't have caused it.
Damn, I would have liked to find this out, because the end user should never
see exceptions of course. Anyway, thanks for telling me, I hope it'll crop up
again in the future!

~~~
huhtenberg

      oki-doki = 42;
    

will cause an exception too.

------
akkartik
I'm curious: why did you choose the lisp-like _(f x)_ syntax for function
calls?

~~~
Udo
The short answer is: I wanted to experiment and do something different. I
thought it was worth looking into the fact that we've become so accustomed to
the f(x) form, when more logically the function should form a more obvious
unit with its parameters.

 _However_ , I have to admit that it still takes some getting-used-to even for
me. Maybe there is something about f(x) that is inherently more readable than
(f x), but I'll reserve my judgement for a few weeks.

~~~
goldfeld
I have been coding CoffeeScript for a while and slowly got rid of its
parentheses down to doing `do someFunction`. When I learned about Clojure and
got excited to learn it (but haven't yet), I also gradually shifted my
aesthetics to using (someFunction param1, param2) in CoffeeScript every time
that was needed for making it unambiguous (e.g. chained function calls), and
now I consider someFunction(param) a most hideous invocation. But (fn x y) is
even nicer. I have noticed you allow for optional commas when invoking
multiple parameters.

Overall I love your syntax decisions, maybe I'm partial for liking lispy-ness
and lambda syntax, but still it's refreshing to see someone experimenting
beyond all the C-ness that pervades us. I feel like there's a dash of Rust and
a sprinkle of Haskell too. Nice work! And if you keep development I might
actually consider using this for a project or two in the future.

~~~
Udo
_> I have noticed you allow for optional commas when invoking multiple
parameters._

Yes, but they're thrown away by the lexer. I'm actually still on the fence
about some of these details.
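
To make "thrown away by the lexer" concrete, here is a small Python sketch of a tokenizer that treats commas exactly like whitespace. The token classes and the `tokenize` name are assumptions for illustration and do not reflect np's real lexer.

```python
import re

# Commas share a rule with whitespace, so they only separate tokens
# and never reach the parser.
TOKEN = re.compile(r"""
    (?P<skip>[\s,]+)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_][A-Za-z0-9_.]*)
  | (?P<punct>[(){};=|+\-*/<>])
""", re.VERBOSE)


def tokenize(src):
    """Return a list of (kind, text) pairs, dropping separators."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError("unexpected character %r at %d" % (src[pos], pos))
        pos = m.end()
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

With this approach `tokenize("(max 1, 2)")` and `tokenize("(max 1 2)")` produce the same token list, so the parser cannot tell whether a comma was written.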

 _> And if you keep development I might actually consider using this for a
project or two in the future._

That's great to hear, thanks! There is still a TON of bugfixing and fleshing-
out of the runtime library to do, but when it's ready I'll be doing a project
with it as well. Gotta see how it all holds up under pressure!

------
andrewflnr
I've only skimmed it so far, but the docs seem excellent for a mere 5-weekend
project. That is perhaps the thing that impresses me most.

~~~
Udo
I basically always had the doc/html editor open alongside the IDE. It helped
to document things as I went along. I'm not 100% sure I completely succeeded
in keeping the two in sync, however. Since I always planned to Show HN, that
motivated me to make it as nice as possible. There is also a programming buddy
who had a semi-watchful eye on it.

------
mgallivan
Could you post any helpful resources you used? Also, what was your experience
with language design before going into this project?

~~~
Udo
> _Could you post any helpful resources you used?_

Not a lot initially. I did have a look at the LLVM tutorial, which is
absolutely great, _but_ I eventually closed that tab because I found it more
exciting to think about these problems by myself (instead of devoting
resources to always figuring out "why did they do it this way?"). It probably
took a lot longer, but it was also a better learning experience. There is
nothing fundamentally magic about this, one simply has to go ahead and do it.

It's a mixed bag, because I can feel there are a lot of inefficiencies in my
code that exist either because I personally find it more readable or simply
because I couldn't quite grasp the alternative at the time.

> _Also, what was your experience with language design before going into this
> project?_

None, that's a big reason why I wanted to do this. I did make a tool for
pen&paper roleplaying before that could execute standard dice codes
(<http://rolz.org/>) but that's nothing like this project. Over the years I
made several domain-specific "configuration" languages for some projects, but
again, this was different.

It did help a lot to break down the process into distinct steps that have
almost no overlap. A lexer to convert a string into tokens, a parser to build a
tree, an interpreter to execute that tree in place (not the most efficient
thing to do but it was a lot of fun), and a minimal runtime library on top of
it. For the runtime functions I did cheat a little, however, because there are
a lot of Apache Commons function wrappers in there.
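
As a rough picture of that split, here is a Python sketch of the three stages for a tiny prefix-call language. It is a toy under my own assumptions, not np's grammar or code: `lex`, `parse`, `evaluate`, `run` and the `BUILTINS` table are hypothetical names, and only numbers, names and `(f x y)` calls are supported.

```python
import operator
import re


def lex(src):
    """Lexer: source string -> flat list of token strings."""
    return re.findall(r"[()]|[^\s()]+", src)


def parse(tokens):
    """Parser: token list -> nested lists (the tree). Consumes `tokens`."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens and tokens[0] != ")":
            node.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)                      # drop the closing ")"
        return node
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return float(tok)                  # numeric literal
    except ValueError:
        return tok                         # anything else is a name


# Minimal "runtime library": names mapped to host-language functions.
BUILTINS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node, env):
    """Interpreter: walk the tree and execute it in place."""
    if isinstance(node, float):
        return node
    if isinstance(node, str):
        return env[node]
    fn = evaluate(node[0], env)
    args = [evaluate(arg, env) for arg in node[1:]]
    return fn(*args)


def run(src):
    return evaluate(parse(lex(src)), dict(BUILTINS))
```

For example, `run("(* (+ 1 2) 4)")` evaluates to `12.0`. Each stage only sees the output of the previous one, which is what keeps the overlap between them small.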

There were some decisions of probably questionable value. For example, when I
decided that the language would have no commas and would use whitespace as a
generic token separator. It's been surprising sometimes to deal with the
consequences of those early decisions.

One of the things that surprised me was that despite the total absence of
optimization, np doesn't execute _abysmally slowly_. When I was banging out the
code I was convinced it would degrade to C64-like performance levels ;)

I'm actually planning to add more functionality, as I've hinted at in the
tutorial, up to the point where someone could conceivably do a web project
with it. That someone will probably be me ;)

------
kunil
I am also working on a programming language. I want programmers to be able to
dynamically edit program code at run time, for example replacing operators in
equations or conditions, or injecting code into labels. Currently I have a
working parser and a semi-working interpreter, but I can't find enough time to
continue.

------
kamaal
If you did this in your spare time, or even otherwise, 5 weeks is an amazingly
short time for a project of this size.

Good work done!

~~~
Udo
Thanks kamaal! It probably looks larger than it is, there is still a huge
amount of stuff missing that people would expect from a real programming
language (and also, a lot of bugs still need to be fixed).

------
LolWolf
Really digging the 5-week project; great job, mate, looking forward to seeing
more!

~~~
Udo
Thanks, it's great to finally show something, even if it's not really of any
practical value!

