
Plan to throw one away - AndrewDucker
http://garethrees.org/2015/12/14/javascript/
======
normac
Interestingly, Tcl was parsed the same way all the way up until Tcl 8.0
introduced a bytecode system in 1997. Each line was parsed right before
execution with no separate parsing stage--your code could crash in the middle
of executing from a syntax error.

Tcl is now about as fast as a typical scripting language and its
implementation is regarded as a model of high-quality readable code. I wonder
if John Ousterhout (the creator of Tcl) knew he was building a rough first
system that he would need to rewrite later, or whether he really thought it was
the right way to do things at the time.

(To be fair, Tcl is a whole lot simpler to parse than JavaScript and
Ousterhout is a good coder, so even the pre-8.0 versions were fast enough to
be practical for everyday scripting tasks.)

~~~
munificent
> Interestingly, Tcl was parsed the same way all the way up until Tcl 8.0
> introduced a bytecode system in 1997. Each line was parsed right before
> execution with no separate parsing stage--your code could crash in the
> middle of executing from a syntax error.

Fun fact: most modern JavaScript engines in browsers work this way _today_,
though at a different level of granularity. Parsing slows down application
startup, so many JS engines don't parse a function body until it's first
called. This means a function body can contain a syntax error and you'll never
know if the function is never called.

~~~
normac
That's pretty neat. Clearly some level of parsing needs to happen before run
time, or else it couldn't even balance braces to know where the function body
ends. So it must be that it parses the function body just enough to figure out
where it starts and ends, then does a complete parse at runtime (probably
building off the results from the first stage).

~~~
munificent
> or else it couldn't even balance braces to know where the function body ends

That's literally all it does. It tokenizes and counts braces.
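
To illustrate (a toy Python sketch, not how any real engine is implemented): finding the end of a function body takes only tokenizing and brace counting, with just enough lexing to keep braces inside strings and comments from miscounting.

```python
def skip_function_body(src, open_index):
    """Find the index just past the '}' matching the '{' at open_index,
    tokenizing only enough to count braces.  A real engine must also skip
    template literals and regex literals; this toy handles just strings
    and line comments."""
    depth = 0
    i = open_index
    n = len(src)
    while i < n:
        c = src[i]
        if c in "\"'":                      # skip a string literal
            quote = c
            i += 1
            while i < n and src[i] != quote:
                i += 2 if src[i] == "\\" else 1
            i += 1
            continue
        if src.startswith("//", i):         # skip a line comment
            newline = src.find("\n", i)
            if newline == -1:
                break
            i = newline
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return i + 1                # just past the matching '}'
        i += 1
    raise SyntaxError("unbalanced braces")

src = 'function f() { return "}"; } f();'
end = skip_function_body(src, src.index("{"))   # body ends at index 28
```

Note the brace inside the string literal is correctly ignored; the full parse of the body can then be deferred until the function is actually called.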

------
loopbit
Cool read; however, I feel the need to point out that 20 years ago, javascript
in the server wasn't "too soon", it already existed.

In 1995 Netscape had the Netscape Enterprise Server[0] that ran javascript for
server-side scripting. Actually, the two books I used, back in the day, to
learn Javascript were the client- and the server-side javascript guides
published by Netscape.

[0]
[https://en.wikipedia.org/wiki/Netscape_Enterprise_Server](https://en.wikipedia.org/wiki/Netscape_Enterprise_Server)

~~~
smegel
> javascript in the server wasn't "too soon", it already existed

Those two things aren't necessarily in conflict. Netscape's server wasn't
exactly a roaring success. What really made Javascript-in-the-server work was
a Javascript JIT engine (V8) that actually made Javascript very competitive
with traditional server-side scripting languages. So given that no Javascript
JIT engine existed back then, maybe it was too soon.

~~~
ghayes
Agreed that it was probably too soon, even if it existed in some form. I've
often wondered why precisely people love JavaScript on the server. I would
guess it's a mix of 1) event-machine parallelism, 2) isomorphism, 3) speed
(since V8/SpiderMonkey/Chakra), 4) cross-compatibility, and 5) engineers not
needing to learn another language.

~~~
eponeponepon
I've always assumed it's heavily weighted towards 5 - never underestimate the
ability of a coder to avoid effort!

------
t0rb3n
This is precisely not what Brooks meant by 'plan to throw one away'. Brooks's
point is that you gather information while writing your prototype and
incorporate the new-found knowledge into your second, hopefully sound, design.

Rees never learned anything new from the first system (what could he have
learned anyway? what he described sounds like a nightmare), but simply
reimplemented it from scratch.

Funny thing, I'm reading 'The Mythical Man-Month' right now :D

~~~
garethrees
Yes, you've spotted the irony in the title :)

~~~
t0rb3n
Awesome read btw :>

------
mcv
I would like to know how the CTO responded to the idea of throwing away, at the
very last minute, the code he'd been slaving over, and replacing it with
something that someone cobbled together in two weeks (but worked).

~~~
tankenmate
From the sound of the description, the original CTO-authored code was cobbled
together and its replacement was properly engineered (for some value of
properly).

~~~
mcv
Of course, but from the time invested, it would seem the other way around.
It's often painful to accept that you wasted a lot of time and that it's
better to start over (which is of course the entire point of the article), and
even more so if the new hire has already done that and thrown the CTO's code
away.

Ultimately it's a question of ego. Does the CTO welcome the new hire who knows
more about this than he does, or does he insist he knows better? Does he feel
threatened or slighted?

~~~
ifdefdebug
This is the "First class people hire first class people, second class people
hire third class people" paradigm.

------
jsjohnst
I find the dates a bit suspect (the referenced ECMAScript standard was June
1997, the book was late 1996, so how did this all happen twenty years ago, just
months after the official release of JavaScript in May '95?), but I guess
that's not the point. Maybe I'm alone, but implausible details distract me from
an otherwise great story that can teach a lesson many should learn.

~~~
garethrees
For "twenty years", read "once upon a time". I feel that a bit of vagueness in
dates is appropriate for a story like this. I have no contemporary written
documentation, so it's based on my fallible memory, and I don't want to give a
false impression of accuracy.

~~~
jsjohnst
Yep, that's more or less what I did ("back in the mid to late 90s") to
continue reading and not second-guess it too much. Great story otherwise,
thanks for sharing it!

~~~
garethrees
I do take your point, and so I've rewritten the opening to be even vaguer.

------
jaytaylor
Are Lex and Yacc still the state of the art 20 years later?

~~~
jhallenworld
Be prepared to just use the generated tables and forget the generated code:

The generated code is not so great if you have an event-driven environment: I
mean I want to push 50 bytes into the compiler, instead of having it request
the next input character. This means it's not so great for a REPL, unless some
other pre-parser gives complete translation units to lex/yacc.
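
A sketch of that push style in Python (illustrative only, nothing to do with lex/yacc's actual API): the caller pushes arbitrary chunks into a coroutine, which emits whatever complete tokens it can and buffers the trailing partial one until more input arrives.

```python
def push_tokenizer():
    """Coroutine: accept arbitrary chunks of input via send() and emit a
    list of the complete (space-delimited) tokens seen so far, buffering
    any trailing partial token until more input arrives."""
    buf = ""
    tokens = []
    while True:
        chunk = yield tokens
        tokens = []
        buf += chunk
        words = buf.split(" ")
        buf = words.pop()                  # may be an incomplete token
        tokens.extend(w for w in words if w)

tok = push_tokenizer()
next(tok)                                  # prime the coroutine
first = tok.send("let x = ")               # ['let', 'x', '=']
second = tok.send("42; le")                # ['42;']  ('le' stays buffered)
third = tok.send("t ")                     # ['let']
```

The same inversion works one level up: a push lexer can feed tokens into a push parser, which is what makes the structure usable from an event loop.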

I think Yacc has a pretty primitive conflict resolution strategy: pick shift
unless otherwise indicated by precedence declarations. If code could resolve
the conflict then you could handle a lot of newer syntax: for example, in
"a-b" vs. "a -b", you could use the whitespace to distinguish between prefix
(reduce) and infix (shift).

It would be nice (I know someone who has done this...) if there were an option
to fall back to backtracking to resolve conflicts. That means trying all
possibilities (splitting the stack at each conflict) and keeping only those
results which produce an error-free full parse. If there is more than one full
result, only then do you have a real conflict.

I'm not sure if other parser generators are any better, but there are
improvements which could be made.

~~~
eru
In terms of ease of use, Parser Combinators beat most Parser Generators. (For
absolute speed, Generators might still beat Combinators.)

~~~
jhallenworld
Yeah, I should try them. I remember a previous discussion about the old
language SNOBOL having parser combinators as a built-in feature:
[https://news.ycombinator.com/item?id=10103460](https://news.ycombinator.com/item?id=10103460)

~~~
eru
I learned Parser Combinators in Haskell. But a basic version is easy to write
yourself in any language that supports first-class functions.
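
As a sketch of how little is needed (illustrative Python, with a representation of my own choosing): a parser is just a function from (text, position) to either (value, new position) or None, and the combinators are higher-order functions over those.

```python
# A parser: (text, pos) -> (value, new_pos) on success, None on failure.

def char(c):
    def parse(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return parse

def seq(*ps):                       # all of ps, in order
    def parse(s, i):
        out = []
        for p in ps:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return out, i
    return parse

def alt(*ps):                       # first of ps that succeeds
    def parse(s, i):
        for p in ps:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

def many(p):                        # zero or more of p
    def parse(s, i):
        out = []
        while (r := p(s, i)) is not None:
            v, i = r
            out.append(v)
        return out, i
    return parse

# one 'a', then any further 'a's, then a 'b'
ab = seq(char("a"), many(char("a")), char("b"))
result = ab("aaab", 0)              # (['a', ['a', 'a'], 'b'], 4)
```

Real libraries add error reporting, token mapping, and left-recursion handling on top, but this is the whole core idea.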

------
SixSigma
"Plan to throw one away, you will anyhow." \-- Fred Brooks

"If you plan to throw one away, you will throw away two." \-- Craig Zerouni

------
curiousGambler
This makes me want to write a JS interpreter... Sweet read!

~~~
PeCaN
A simple ES3 interpreter is pretty easy, particularly if you just go for 'the
good parts' (plus basic global objects).

I can highly recommend mpc[1] if you do; it's a really amazing library for
parser combinators in C. I was never much of a fan of yacc/lex (perhaps out of
my ignorance of BNF). With yacc/lex I always felt like I had to have my
grammar all planned out (which, if you implement JS, I suppose isn't a problem)
before I could start developing, and it was very hard to work incrementally.
mpc is very flexible, and since it's just a C library I find it a lot easier to
iterate with.

1: [https://github.com/orangeduck/mpc](https://github.com/orangeduck/mpc)

------
kazinator
> _The failure to store a representation of the parsed code was disastrous
> because it meant that in order to evaluate an expression or statement for a
> second time, the engine had to parse it a second time._

Ha! They must have learned how to write an interpreter from Herbert Schildt:

[http://www.drdobbs.com/cpp/building-your-own-c-interpreter/1...](http://www.drdobbs.com/cpp/building-your-own-c-interpreter/184408184) [1989]

In this piece of amusement, the Little C program is a giant null-terminated
string, and the "instruction pointer" is of type char *. You get the picture.

> _When I did come to write a garbage collector I used the mark-and-sweep
> algorithm. But something puzzled me, and I couldn’t find an answer in any of
> the textbooks I looked at: how was I supposed to schedule the collections?
> In a classic description of a garbage collector, you wait until memory runs
> out and then you collect the world. But this leads to bad behaviour on
> modern systems, because of swapping, and because other processes need memory
> too. You need to schedule collections well before memory is full. But when
> exactly? I still don’t know of a comprehensive solution to this problem._

In a nutshell, you let your run-time pretend that it's running in a small
machine, until it starts huffing and puffing too hard, and then you say "hey,
I lied, you're actually in a bigger machine: have some breathing room". This
rubbery constraint keeps it behaving reasonably nicely, rather than "Wee, I
have 4GB of free RAM to stomp over with strings and cons cells before ever
calling GC!"

What you have to do is pick some heap size (that is typically substantially
smaller than the amount of RAM). You let the GC whack against this artificial
threshold, and if that gets too excessive, according to some measure, you
increase it. E.g. if after a full GC you have less than some fudge threshold
free, call the OS for more memory to integrate into the object heap.

The threshold is calculated in such a way that the image doesn't have to
execute numerous frequent GCs before it triggers the request for more space (it
doesn't have to whack too hard and wastefully against the artificial limit).

Also, ephemeral GC will help, and ephemeral GC can have its own threshold
against frequent ephemeral GC's. When not enough space is liberated by
ephemeral, you schedule a full. Then if _that_ doesn't liberate enough space,
add more. Since ephemeral is fast (doesn't scan the full heap), you can work a
bit closer to the heap limit (since you can suffer frequent ephemeral GC's
better than frequent full GC's).

And, of course, the parameters controlling these behaviors are exposed in some
way so users can tweak them. Command line arguments, env vars, local config
file, system config file, run time global variable/API, ...

~~~
eru
Didn't PHP also use the position of a file seek as the instruction pointer?

~~~
ploxiln
Shell script interpreters like bash do. You can change a bash script while
it's running (append, or truncate and re-write) and it'll keep going from the
same offset.

But then again, shell script is in a different class of language I'd say.

    
    
      #!/bin/bash
      COUNTER=0
      addmore() {
          echo hi $COUNTER
          ((COUNTER++))
          echo "addmore" >>$0
          sleep 1
      }
      addmore

~~~
kazinator
Bash reads code from a stream, and doesn't read the entire stream before
executing the code.

However, the stream pointer isn't its instruction pointer in the script. A
backwards branch (such as the end of a while loop, or a call to a function
defined earlier) does not rewind the stream to that piece of text.

> _But then again, shell script is in a different class of language I'd say._

Not much different in this regard from how, say, Common Lisp (load
"file.lisp") processes the top-level forms in the file.

    
    
      (defvar *counter* 0)
    
      (defun addmore ()
        (format t "hi ~s~%" (incf *counter*))
        (with-open-file (f "addmore.lisp" :direction :output :if-exists :append)
          (write-line "(addmore)" f)))
    
      $ clisp addmore.lisp 
      hi 1
      WARNING: OPEN: #<INPUT BUFFERED FILE-STREAM CHARACTER #P"addmore.lisp" @8>
               already points to file "/home/kaz/test/addmore.lisp", opening the
               file again for :OUTPUT may produce unexpected results
               Open the file anyway
      hi 2
      WARNING: OPEN: #<INPUT BUFFERED FILE-STREAM CHARACTER #P"addmore.lisp" @9>
               already points to file "/home/kaz/test/addmore.lisp", opening the
               file again for :OUTPUT may produce unexpected results
               Open the file anyway
      hi 3
      WARNING: OPEN: #<INPUT BUFFERED FILE-STREAM CHARACTER #P"addmore.lisp" @10>
               already points to file "/home/kaz/test/addmore.lisp", opening the
               file again for :OUTPUT may produce unexpected results
               Open the file anyway
      hi 4
      WARNING: OPEN: #<INPUT BUFFERED FILE-STREAM CHARACTER #P"addmore.lisp" @11>
               already points to file "/home/kaz/test/addmore.lisp", opening the
               file again for :OUTPUT may produce unexpected results
               Open the file anyway
      hi 5

------
frabcus
As an aside, but I think relevant to Hacker News, the author of this blog post
also wrote the classic mid-1990s text adventure game Christminster
[http://garethrees.org/1995/08/08/christminster/](http://garethrees.org/1995/08/08/christminster/)

------
jhallenworld
>One thing that I had not had time to do was to write a garbage collector.

Cheat: use Hans Boehm's conservative collector for C/C++:
[http://www.hboehm.info/gc/](http://www.hboehm.info/gc/)

------
dagw
Of course the opposite problem also exists. You plan to throw away that quick,
ugly hack you threw together for a demo as soon as the demo is done, and
somehow it sticks around in production for years to come.

------
dandare
This adage is fairly old; aren't today's programming and management techniques
rendering it obsolete?

~~~
tinco
If this guy were 'properly' managed he would never have been allowed to
overhaul this project like he did. Not many managers will be impressed by an
"in the next two weeks, right before the deadline, I will build a full
compiler that is going to be better than what the _CTO_ built" speech, as the
author also admits.

If the company were properly managed they also wouldn't have been in a
situation where there are only two weeks left and a critical component is in a
poorly understood state.

In practice, though, these kinds of things happen constantly even in
well-managed companies. If the author weren't around they'd probably just have
shipped a slow Javascript interpreter that couldn't do prototypes very well.

Just curious, what kind of programming technique renders the idea of this
article obsolete?

------
hinkley
As a corollary:

[https://en.wikipedia.org/wiki/Ship_of_Theseus](https://en.wikipedia.org/wiki/Ship_of_Theseus)

You can replace all the parts of your crappy v1.0 code and still call it the
same code base. That's what the smart ones go around doing quietly.

~~~
dvirsky
While in most cases it's probably the better option, I think that in this case
there wasn't time for that. I would probably have explored still using the
parser, having it generate an AST first, and from there going pretty much in
the direction the author took.

~~~
hinkley
Part of the problem is being given an assignment two weeks before a ship date.
I know people do stuff like that all the time but that's hardly an excuse (we
do terrible things to each other all the time so it's okay?)

The author is already screwed before he even has a chance to start.

------
legulere
Wow, this seems quite idiotic to me. Sure, recursive descent parsers have some
theoretically inferior properties compared to other techniques, but in
practice they're much better because you can actually handle errors in a
user-friendly way. That's why basically every bigger compiler uses a
handwritten recursive descent parser.
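
To make the error-handling point concrete, here's a toy recursive-descent expression parser in Python (my own sketch, not from any real compiler): at the point of failure it knows its exact position and what it expected, so the message practically writes itself.

```python
class Parser:
    """Toy recursive-descent parser for integer arithmetic with + - * /."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else "\0"

    def next(self):
        c = self.peek()
        self.pos += 1
        return c

    def error(self, expected):
        found = repr(self.peek()) if self.pos < len(self.text) else "end of input"
        raise SyntaxError(f"expected {expected} at position {self.pos}, found {found}")

    def parse(self):
        v = self.expr()
        if self.pos != len(self.text):
            self.error("end of input")
        return v

    def expr(self):                 # expr := term (('+'|'-') term)*
        v = self.term()
        while self.peek() in "+-":
            op = self.next()
            w = self.term()
            v = v + w if op == "+" else v - w
        return v

    def term(self):                 # term := atom (('*'|'/') atom)*
        v = self.atom()
        while self.peek() in "*/":
            op = self.next()
            w = self.atom()
            v = v * w if op == "*" else v // w   # integer division
        return v

    def atom(self):                 # atom := digit+
        if not self.peek().isdigit():
            self.error("a number")
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        return int(self.text[start:self.pos])

result = Parser("1+2*3").parse()    # 7
# Parser("1+*3").parse() raises:
#   SyntaxError: expected a number at position 2, found '*'
```

In a table-driven parser the equivalent of that error message has to be reconstructed from parser-stack state, which is why hand-written recursive descent tends to win on diagnostics.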

~~~
dang
> this seems quite idiotic to me

Please don't be uncharitably dismissive. The author didn't throw away the
parser because it was recursive descent—in fact he explicitly excludes that as
a reason.

(The rest of your comment is fine.)

~~~
legulere
Sorry I was too harsh, but to me it reads much like: I personally don't like
this code, so I threw away everything.

~~~
dang
It's true that that's a mega motivation lurking in the soul of every
programmer. But it's also often true that writing a program can be faster than
revising one (or even reading it) when the original programmer isn't
available.

To me his justification sounded pretty convincing, though not to you, which is
totally fair. As far as HN commenting goes it's just a matter of explaining
your view in a substantive way.

