

Dead Code Elimination for Beginners - kinetik
http://blog.mozilla.com/rob-sayre/2010/11/17/dead-code-elimination-for-beginners/

======
kenjackson
I highly recommend reading this article.

Although it does point out one reason that JS is a poor language to be the
foundation for client-side web computing: It's extremely difficult to optimize
a language this dynamic. I honestly think we could be leaving 50% perf
improvement on the floor due to the dynamic nature of the language.

And this is on top of the fact that you have to limit the time spent
optimizing, since you don't compile the code ahead of time (at best you get a
JIT of some sort).

Can we please move away from JS and move to a nice IL?

~~~
rbanffy
> JS is a poor language to be the foundation for client-side web computing

> I honestly think we could be leaving 50% perf improvement on the floor

Unfortunately, humans haven't gotten significantly faster (or smarter) in the
last couple thousand years, unlike computers, which get twice as fast every 18
months or so.

Dynamic languages, by placing the burden of optimization on the
compiler/runtime, are making the right bet. Static typing is little more than
a human doing the machine's work, often at the expense of introducing subtler
bugs (the compiler catches the obvious mistakes, and that may leave
programmers with a false sense of security).

> Can we please move away from JS and move to a nice IL?

As for moving to precompiled, possibly obfuscated scripts: I would much rather
give up a little of the language's dynamism and move the compiler into the
browser. You could, conceivably, declare the types of arguments and return
values, and in that case rather easily perform some optimization as the code
is parsed.
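
Something along these lines - a TypeScript-flavored sketch, since the
annotation syntax is hypothetical for today's JS - would give the parser
enough to work with:

      function scale(v: number, factor: number): number {
        // With declared numeric types the compiler can emit a specialized
        // floating-point multiply at parse time, instead of a generic "*"
        // that must check the types of v and factor on every call.
        return v * factor;
      }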

I have no love for JavaScript. It's a tortured little language that I am sure
passed through a lot more marketing specialists than it needed before taking
its current form, and it bears the scars of that process. It has, however,
some good parts that are very good.

~~~
sid0
> Static typing is little more than a human doing the machine's work

Interesting. I look at good static typing (e.g. ML or Haskell) as the machine
doing the work that a human would have to do in a dynamic language. Especially
type inference.
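
A small TypeScript sketch of the flavor (TypeScript's inference is weaker than
ML's, but the point carries):

      // The human writes one annotation; the machine infers everything else.
      const double = (x: number) => x * 2;  // inferred: (x: number) => number
      const xs = [1, 2, 3].map(double);     // inferred: number[]

      // It also catches representation mistakes for free:
      // xs.map(s => s.toUpperCase());      // rejected: number has no toUpperCase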

~~~
rbanffy
There are cases when it makes sense to use static typing or, at least, to
stick to certain types for input and output.

In my experience, however, those cases are not many, and doing it when you
don't have to creates unneeded complexity.

~~~
sid0
There are cases when it makes sense to use dynamic typing (e.g. dealing with
XML and friends).

In my experience, however, those cases are not many, and doing it when you
don't have to creates an unneeded burden on the programmer.

(sorry -- this is probably not to HN's taste, but I just wanted to show how an
entirely different perspective can also be valid)

~~~
rbanffy
Unless you intend to strictly enforce types (rather than validity) on your
dynamic code, there is no burden on the developer.

Now I am curious: what do you do? I do mostly server-side web stuff, but I
have done embedded/driver work and can see situations where you really need a
3-byte unsigned integer to stay a 3-byte unsigned integer. But, apart from
that, I can't imagine other cases.

~~~
sid0
_Unless you intend to strictly enforce types (rather than validity) on your
dynamic code, there is no burden on the developer._

Of course there is. I program more in dynamic languages (particularly JS) than
in statically typed ones, and I constantly worry about passing representation
A of some data rather than representation B. I find myself having to look at
the source code of callees far more often than I think I should have to.
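
A concrete TypeScript rendering of the worry (hypothetical, with made-up
names); in plain JS the only way to catch this is to go read the callee:

      interface UserRecord { id: number; name: string }  // representation A
      type UserTuple = [number, string];                 // representation B

      function greet(u: UserRecord): string {
        return "hello, " + u.name;
      }

      const u: UserTuple = [42, "sid0"];
      // greet(u);  // a type checker rejects this line at compile time;
      //            // dynamic JS runs it and silently produces "hello, undefined"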

On the other hand, my academic life revolves around statically-typed languages
(Haskell). The way Haskell's type system is designed means that there are
entire programming paradigms that are not just available but _production-
ready_ in Haskell, but are virtually impossible in any other popular language.
For a concrete example, see software transactional memory.

I'm also a huge fan of pushing as much lower-level drudgery as possible to
machines to free up human minds for higher-level thinking, and a good type
system combined with good type inference goes a long way.

~~~
sedachv
I don't understand why you need monads to get STM. orElse is neat, but there
are other ways to do the same thing in imperative languages.

~~~
thesz
I quote Simon Peyton-Jones: "STM requires runtime logging of every memory read
and write. In an imperative language (such as C#), in which the very fabric of
computation involves reads and writes, STM is bound to be expensive. [Lots of
work has been done to omit redundant logging based on program analysis, but
it's still expensive.] When do you need readTVar/writeTVar? Answer, precisely
when you are manipulating state shared between threads."

<http://www.0x61.com/forum/programming-haskell-cafe-f245/is-there-any-experience-using-software-transactional-memory-in-substantial-applications-t1289646.html>

It looks like STM works better for functional languages: you _have_ to mark
shared data somehow, and then you have to mark the code that manipulates
shared data somehow (so it won't prematurely launch missiles). At that point
you've reinvented the monad.

I reserve the right to be wrong, so I'd like to see what the ways are for
imperative languages.
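
For a sense of where that expense comes from, here is a minimal TypeScript toy
(not a real STM; the API is made up) showing what an imperative runtime is
forced to do - every read and write of shared state must go through a
transaction log, because nothing in the language distinguishes shared state
from local state:

      class TVar<T> { constructor(public value: T) {} }

      class Tx {
        private reads = new Map<TVar<unknown>, unknown>();
        private writes = new Map<TVar<unknown>, unknown>();

        read<T>(tv: TVar<T>): T {
          if (this.writes.has(tv)) return this.writes.get(tv) as T;
          this.reads.set(tv, tv.value);  // log every read, to validate at commit
          return tv.value;
        }

        write<T>(tv: TVar<T>, v: T): void {
          this.writes.set(tv, v);        // buffer every write until commit
        }

        commit(): boolean {
          for (const [tv, seen] of this.reads)
            if (tv.value !== seen) return false;  // conflicting write: abort and retry
          for (const [tv, v] of this.writes) tv.value = v;
          return true;
        }
      }

Haskell's STM monad pins reads and writes of shared state to exactly these two
operations at the type level; in C# every ordinary assignment inside a
transaction has to be rewritten into this shape, which is the expense SPJ is
describing.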

~~~
sid0
Oh, tons of work has been done in trying to bring STM to imperative languages,
most notably with STM.NET (which is what SPJ's referring to there).

------
ashish01
Follow-up on: <http://news.ycombinator.com/item?id=1913102>

------
jsvaughan
This is pretty damning for IE9 JS performance, isn't it? They are going to
have to rework their static code analysis, and it sounds like they don't have
the JS know-how. What else might they incorrectly be optimizing?

~~~
rbanffy
> This is pretty damning for IE9 JS performance, isn't it?

I think the whole point of the post is to poke fun at how Microsoft is
cheating on the benchmark and how it was caught. The part about the definition
of Angles, where the code should throw an exception but doesn't (most probably
because the "optimizer" isn't really optimizing anything), is hilarious.

~~~
sid0
_The part about the definition of Angles, where the code should throw an
exception but doesn't (most probably because the "optimizer" isn't really
optimizing anything), is hilarious._

What are you talking about? That code _is_ getting optimized away into a no-op
when it shouldn't be, which is why no exception is thrown. This means that

a. the optimization is _definitely_ happening

b. it is unsound

~~~
rbanffy
Unsound in a _very_ specific and suspicious way - yesterday it was pointed out
that adding a single do-nothing line to the middle of the function makes the
"optimizer" fail.

I also think the "for beginners" in the title is somewhat directed towards
whoever implemented the "optimizer".

~~~
sid0
_Unsound in a very specific and suspicious way - yesterday it was pointed out
that adding a single do-nothing line to the middle of the function makes the
"optimizer" fail._

That isn't unsoundness, that's incompleteness. An optimization being
incomplete means that something that _could_ be optimized isn't -- one being
unsound means that something that _shouldn't_ be optimized is. These terms
come from mathematical logic.
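
In JS terms (mightThrow is a hypothetical stand-in for any call with side
effects):

      // Incomplete: a legal optimization is missed; the code is merely slower.
      var y = 2 + 3;         // optimizer fails to constant-fold this to `var y = 5;`

      // Unsound: an illegal optimization is applied; the program changes meaning.
      var z = mightThrow();  // z is unused, so the optimizer deletes the call,
                             // silently skipping an exception it should have raised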

~~~
rbanffy
Sorry. I am an engineer. I read it as "not solid", "wrong", "flaky". It
appears all three apply perfectly to the optimizer at hand. Since it is _both_
unsound (it optimizes away too much of a benchmark) _and_ incomplete (it
doesn't recognize the invalidity of the code being optimized - the Angles
thing - and stops optimizing altogether when a no-op is added), we could just
be more concise and settle for "broken".

~~~
scott_s
I think you're still mixing up what "unsound" means. If I say that applying an
optimization is unsound, it means that if I apply that optimization, I can no
longer guarantee that the resulting code is semantically equivalent to the
original code.

Being incomplete means the code you generate is not as fast as possible. Being
unsound means the code you generate may be _wrong_. We don't normally call an
optimizer "broken" if it does not optimize something even though we can
recognize the optimization could be applied - no optimizer does every
conceivable thing. But generated code that is not semantically equivalent to
what was given violates correctness.

~~~
rbanffy
I think the "Angles" removal demonstrates the optimization is unsound. The
"optimized" code produces a result when correct execution would throw a
ReferenceError. This optimizer ignores invalid code (code that references a
non-existent thing) and happily short-circuits it.

It's hard to trust this code to do other optimizations well, given all the
evidence that it's targeted at one specific benchmark.

------
larsch
What are the real benefits of dead code elimination on actual production code
in real applications? It seems to me that only synthetic tests contain dead
code (by intent), whereas dead code in production code is by definition
unnecessary and, by most coding standards, a bug.

~~~
saucetenuto
It's important to be able to delete dead code even if your developers never
write any, because earlier compiler optimization passes can create dead code.
For example, consider this simple function written in a hypothetical dynamic
language:

    
    
      function add(a,b) { return a + b }
    

Our hypothetical compiler turns this function into some IL that looks like
this:

    
    
      function add1(a,b)
      {
        if(is_null(a)) throw ArgumentNullException()
        if(is_null(b)) throw ArgumentNullException()
        return low_level_generic_add(a,b)
      }
    

...where low_level_generic_add sums a and b if they're integers, or looks for
a suitable overloaded operator+ and invokes it if they're not.

Suppose further that our hypothetical language has a tracing JIT, and that
most of the time when add is called, a and b are both integers. In that case,
the compiler might perform the following transformation:

    
    
      function add2(a,b)
      {
        if(is_integer(a) && is_integer(b)) return add_integers(a,b)
        else return add_generic(a,b)
      }
    

...where add_generic is just a copy of add1 above, and add_integers is a copy
of add1 with low_level_generic_add() replaced by the less expensive
perform_integer_addition():

    
    
      function add_integers(a,b)
      {
        if(is_null(a)) throw ArgumentNullException()
        if(is_null(b)) throw ArgumentNullException()
        return perform_integer_addition(a,b)
      }
    

But suppose one more thing: let's say integers in this language are value
types, so that is_null(some_integer) is never true. Then you can get another
performance win in add_integers by removing the two null checks - but you can
only find that out if you have a dead code detector.
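
A sketch of what the eliminator would leave behind, once the always-false
is_null checks are folded away:

      function add_integers(a,b)
      {
        return perform_integer_addition(a,b)
      }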

This has been your woefully incomplete dynamic compilation moment.

