

Try-catch speeding up my code? - vishal0123
http://stackoverflow.com/questions/8928403/try-catch-speeding-up-my-code

======
gorhill
I've actually had a similar question about JS for a while now [with Chromium
34]... Consider these two pieces of code which do _exactly_ the same thing.
One is a standalone function:

    
    
        var makeKeyCodepoint = function(word) {
            var len = word.length;
            if ( len > 255 ) { return undefined; }
            var i = len >> 2;
            return String.fromCharCode(
                (word.charCodeAt(    0) & 0x03) << 14 |
                (word.charCodeAt(    i) & 0x03) << 12 |
                (word.charCodeAt(  i+i) & 0x03) << 10 |
                (word.charCodeAt(i+i+i) & 0x03) <<  8 |
                len
            );
        };
        

The other a method:

    
    
        var MakeKeyCodepoint = function() {};
        MakeKeyCodepoint.prototype.makeKey = function(word) {
            var len = word.length;
            if ( len > 255 ) { return undefined; }
            var i = len >> 2;
            return String.fromCharCode(
                (word.charCodeAt(    0) & 0x03) << 14 |
                (word.charCodeAt(    i) & 0x03) << 12 |
                (word.charCodeAt(  i+i) & 0x03) << 10 |
                (word.charCodeAt(i+i+i) & 0x03) <<  8 |
                len
            );
        };
        var makeKeyCodepointObj = new MakeKeyCodepoint();
    

Now why does the standalone function run at over 6.3M ops/sec, while the method
runs at 710M ops/sec (on my computer)?

Try it: [http://jsperf.com/makekey-concat-vs-join/3](http://jsperf.com/makekey-concat-vs-join/3)

~~~
chewxy
I could be wrong (and if so, pie my face), but I believe it's mostly due to
one of the many inline cache optimizations that V8 employs.

Let's consider the receiver (i.e. the `this` value) of Examples 1 and 2. The
receiver of Example 1 is Benchmark, if invoked normally. The receiver of
Example 2 is the empty function object function(){}.

When you call makeKeyCodepointObj.makeKey(), the VM looks up the object's
prototype chain and finds the function. This call site is cached (think of it
as a K:V store, where the key is "makeKeyCodepointObj.makeKey" and the value
is the call site of the function).

When you call makeKeyCodepoint(), the VM has to, for each call, look up the
prototype chain until it finds the variable. The variable is then resolved
into the function call site. Because of scoping issues in JS, I don't think
this is cached (or if it's cached, it'd be invalidated a lot), and a lookup
has to happen every time. (I know that in my own JS engine, I tried to
implement a caching optimization for global object properties and gave up.)

TL;DR: Function lookups happen all the time when the function is a method of
the global object. When a function is a method of an object, the lookup is
cached.

If I am talking out of my arse, please feel free to correct me.

~~~
Stratoscope
I don't think a global variable lookup is the reason for the difference. Here
is the code that jsperf generates for the function version of the test:

    
    
        (Benchmark.uid1400600789397runScript || function() {})();
        Benchmark.uid1400600789397createFunction = function(window, t14006007893970) {
            
            var global = window,
                clearTimeout = global.clearTimeout,
                setTimeout = global.setTimeout;
                
            var r14006007893970, s14006007893970, m14006007893970 = this,
                f14006007893970 = m14006007893970.fn,
                i14006007893970 = m14006007893970.count,
                n14006007893970 = t14006007893970.ns;
            
            // Test Setup
            var makeKeyCodepoint = function(word) {
                var len = word.length;
                if (len > 255) {
                    return undefined;
                }
                var i = len >> 2;
                return String.fromCharCode(
                    (word.charCodeAt(    0) & 0x03) << 14 |
                    (word.charCodeAt(    i) & 0x03) << 12 |
                    (word.charCodeAt(  i+i) & 0x03) << 10 |
                    (word.charCodeAt(i+i+i) & 0x03) <<  8 |
                    len
                );
            };
            
            s14006007893970 = n14006007893970.now();
            while (i14006007893970--) {
                // Test Code
                var key;
                
                key = makeKeyCodepoint('www.wired.com');
                key = makeKeyCodepoint('www.youtube.com');
                key = makeKeyCodepoint('scorecardresearch.com');
                key = makeKeyCodepoint('www.google-analytics.com');
            }
            r14006007893970 = (n14006007893970.now() - s14006007893970) / 1e3;
            
            return {
                elapsed: r14006007893970,
                uid: "uid14006007893970"
            }
        }
    

The test setup and the test itself are all part of the same function, and
makeKeyCodepoint is a local variable in that function.

~~~
chewxy
Variable and property lookups do go through different processes (and hence are
optimized differently).

Variables are stored on activation records (a Context object in V8), while
properties are stored in, well, a magic hidden-class type of thing (for V8).

The latter can be cached, the former not so much. Plus, the former also
creates quite a bit of garbage, so GC should theoretically kick in more often.
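
A toy sketch of what I mean (all names here are invented; numbers will vary a
lot by engine and version): a property-access site that only ever sees one
object shape stays monomorphic and its inline cache keeps working, while a site
that sees several shapes becomes polymorphic and slower.

        // One shape vs. two shapes at the same property-access site.
        function Point2(x, y) { this.x = x; this.y = y; }                 // hidden class {x, y}
        function Point3(x, y, z) { this.x = x; this.y = y; this.z = z; }  // hidden class {x, y, z}

        function sumX(points) {
            var total = 0;
            for (var i = 0; i < points.length; i++) {
                total += points[i].x;   // the inline cache here records the shapes it has seen
            }
            return total;
        }

        var mono = [], poly = [];
        for (var i = 0; i < 100000; i++) {
            mono.push(new Point2(i, i));                                  // one shape only
            poly.push(i % 2 ? new Point2(i, i) : new Point3(i, i, i));    // two shapes
        }

        console.time('monomorphic'); sumX(mono); console.timeEnd('monomorphic');
        console.time('polymorphic'); sumX(poly); console.timeEnd('polymorphic');

The monomorphic version is typically somewhat faster on V8, though that alone
doesn't explain a 100x gap like the one in the jsperf numbers above.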

~~~
comex
I would expect any local variables to be stored in registers once the
optimizer kicks in. I bet there's a different explanation.

------
userbinator
I wasn't surprised to see it had to do with register allocation, since I've
encountered some extremely odd compiler output with similar issues before.
"Why would it ever decide this was a good idea?" is the thought that often
comes to mind when looking through the generated code.

Register allocation is one of those areas where I think compilers are pretty
horrible compared to a good or even mid-level Asm programmer, and I've never
understood why graph colouring is often the only approach that is taught,
because it's clearly not the way an Asm programmer does it, and it's also
completely unintuitive to me. It seems to assume that variables are allocated
in a fixed fashion and a load-store architecture, which is overly restrictive
for real architectures like x86. There's also no interaction between RA and
instruction selection, despite them both influencing each other, whereas a
human programmer will essentially combine those two steps together. The bulk
of research appears to be stuck on "how do we improve graph colouring", when
IMHO a completely new, more intuitive approach would make more sense. At least
it would make odd behaviour like this one less of a problem, I think.

~~~
davidcuddeback
Register allocation is an NP-complete problem. Graph coloring works because it
can be done with information available to a compiler from live variable
analysis.
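
As a rough illustration (a deliberately naive greedy colourer with made-up
variables, live ranges and register count, not what a production compiler
actually does):

        // Variables whose live ranges overlap "interfere" and need different registers.
        var liveRanges = { a: [0, 4], b: [1, 3], c: [2, 6], d: [5, 7] };  // [firstUse, lastUse]
        var registers = ['r0', 'r1'];   // pretend the machine has two registers

        function interferes(x, y) {     // ranges overlap => cannot share a register
            return liveRanges[x][0] <= liveRanges[y][1] &&
                   liveRanges[y][0] <= liveRanges[x][1];
        }

        function allocate() {
            var assignment = {};
            Object.keys(liveRanges).forEach(function(v) {
                var taken = Object.keys(assignment)
                    .filter(function(u) { return interferes(u, v); })
                    .map(function(u) { return assignment[u]; });
                var free = registers.filter(function(r) { return taken.indexOf(r) < 0; });
                assignment[v] = free.length ? free[0] : 'spill';  // no colour left => spill
            });
            return assignment;
        }

        console.log(allocate());  // { a: 'r0', b: 'r1', c: 'spill', d: 'r0' }

Real allocators order the nodes more carefully and choose which value to spill,
which is exactly where the heuristics come in.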

Problems are classified as NP-complete based on the Turing machine model.
Since the human brain is not a Turing machine, it may be better suited for
solving NP-complete problems than a computer. Solutions to NP-complete
problems commonly employ heuristics to strike a balance between completeness
and efficiency. The human brain seems (at least to me) to be better at solving
problems involving heuristics. Chess is an obvious example.

~~~
userbinator
To me, whether it's NP-complete is of little concern, since humans have been
allocating registers (and beating compilers) with little difficulty. On the
contrary, I feel that being labeled as NP-complete has somehow slowed the
development of better RA algorithms, on the basis that it's "too hard".
There's a saying "one of the first steps to accomplishing something is to
believe that it's possible", and if people believe that RA is a more difficult
problem than it really is, then that has a discouraging effect. NP-
completeness is only related to the complexity as the problem size increases,
but in practice the problem sizes aren't that big --- e.g. within a function,
having several dozen live variables at a time is probably extremely uncommon,
and machines don't have that many registers - a few dozen at most.

I think the human brain is probably Turing-equivalent, but it also likely
doesn't matter -- if I can describe the algorithm that I, as a human, take to
perform compiler-beating register allocation and instruction selection, then a
machine can probably do it just as well if not faster than me since it can
consider many more alternatives and at a much faster rate.

I agree that heuristic approaches are the way to go, but in a "too far
abstracted" model like graph colouring, some heuristics just can't be easily
used; e.g. the technique of "clearing the table" --- setting up all the
registers prior to a tight loop, so that the instructions within do not have
to access memory at all. Using push/pop (x86) for "very ephemerally-spilled"
values is another one.

~~~
davidcuddeback
> _To me, whether it's NP-complete is of little concern, since humans have
> been allocating registers (and beating compilers) with little difficulty._

Following that logic, one would conclude that writing an AI for Go is trivial
as well [1].

> _if I can describe the algorithm that I, as a human, take ... then a machine
> can probably do it just as well if not faster_

This is pretty easy to disprove. You can probably look at a program's source
code and tell whether or not it halts. But it's been proven that a Turing
machine cannot [2]. The halting problem is one of many undecidable problems in
computer science [3]. If any one of the undecidable problems can be solved by
a human, that proves that the human brain is not Turing equivalent.

[1]
[https://en.wikipedia.org/wiki/Computer_Go](https://en.wikipedia.org/wiki/Computer_Go)

[2]
[https://en.wikipedia.org/wiki/Halting_problem](https://en.wikipedia.org/wiki/Halting_problem)

[3]
[https://en.wikipedia.org/wiki/Undecidable_problem](https://en.wikipedia.org/wiki/Undecidable_problem)

~~~
yongjik
I think you misunderstand what the halting problem is. It's being able to tell
whether a program will halt or not, _for all conceivable programs_. A human
brain certainly can't do that.

For example, does this program halt? (Let's assume infinite-precision numbers,
for simplicity. After all, a Turing machine can access an infinitely long
tape.)

    
    
        for (int n = 3; ; n++)
          for (int a = 1; a < n; a++)
            for (int b = 1; b < n; b++)
              for (int c = 1; c < n; c++)
                for (int m = 3; m < n; m++)
                  if (pow(a, m) + pow(b, m) == pow(c, m)) exit(1);
    

Show me that this program never halts, and you just proved Fermat's last
theorem.

Edit: added one missing loop

------
stinos
One of the nice things about this question (apart from the serious in-depth
answers) is that Eric Lippert himself comes in with an answer after discussing
it directly with the people who can actually provide the proper fix. Q&A at its
best!

 _edit_ Same goes for Jon Skeet of course, and looking for info about him I
came across this [http://meta.stackexchange.com/questions/9134/jon-skeet-facts...](http://meta.stackexchange.com/questions/9134/jon-skeet-facts/9235)
which has some hilarious ones like

 _Jon Skeet's SO reputation is only as modest as it is because of integer
overflow (SQL Server does not have a datatype large enough)_ and _When Jon
Skeet points to null, null quakes in fear._

------
driax
Notice that this question is 2 years old. I would imagine that lots of things
have happened with Roslyn since then. (They even talk about some of what they
were working on.) Nevertheless, quite interesting.

------
nutjob2
It's a compiler bug.

~~~
teebot
I wish I could say that more often

~~~
hugi
No you don't :)

~~~
dllthomas
More often per unit of bug, maybe (... maybe.). No more often per unit of time
or unit of code, please!

------
logn
I don't code in C#, but it would be interesting to surround the code in just a
plain block instead of a try-catch block and see if the same behavior is
evident. If a plain block is still slow, then maybe branch prediction gets
overwhelmed by having to consider unwinding the stack all the way to main and
dealing with open streams and objects on the stack.
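
Something like this rough JS analog of that experiment (function names and
iteration counts are made up; results vary by engine and version) is what I
have in mind:

        // Same work, done inside a plain block and inside a try-catch block.
        function work(n) {
            var sum = 0;
            for (var i = 0; i < n; i++) { sum += i & 7; }
            return sum;
        }

        function plainBlock(n)   { { return work(n); } }
        function withTryCatch(n) { try { return work(n); } catch (e) { return -1; } }

        var N = 1e6, REPS = 50, i;
        console.time('plain block');
        for (i = 0; i < REPS; i++) { plainBlock(N); }
        console.timeEnd('plain block');
        console.time('try-catch');
        for (i = 0; i < REPS; i++) { withTryCatch(N); }
        console.timeEnd('try-catch');

If the plain-block version runs at full speed and only the try-catch one is
slow, that points at exception handling rather than the block itself.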

edit: I don't know bytecode or machine code well, so my description of what
happens when unwinding the stack is probably wrong, but my point is just that
it's simpler for the CPU not to have the possibility of unwinding the stack
beyond the code section the OP called out.

------
batmansbelt
It's literally the best feeling in the world when Eric Lippert answers your C#
question.

It's like if you cried out "dear God, why?" about your troubles but actually
got a response.

------
fulafel
Puzzling that so many people still run in i386 mode. I haven't used a 32-bit
system since shortly after x86 hardware went 64-bit, 10+ years ago. I guess in
the Windows world it's because of XP?

~~~
listic
I run in i386 because it uses less memory (though I think I'll be switching).

~~~
dbaupp
Linux offers the x32 ABI[1] for this reason: small 32-bit pointers while
maintaining the advantages of x86-64 (more/bigger registers, etc.).

[1]:
[http://en.wikipedia.org/wiki/X32_ABI](http://en.wikipedia.org/wiki/X32_ABI)

~~~
srean
Have you used it or know people who have? I am very curious about the
experience and the details. Are there distributions that ship with libraries
compiled with the x32 ABI?

~~~
ldng
Maybe some embedded systems distros have tried, but as far as I know no major
distro has. That would probably uncover quite a lot of bugs and require
adaptations, since a lot of code assumes Intel and compatible are either x86 or
x86_64 and addresses memory accordingly.

There are some people trying to port Arch Linux to x32, but I really don't know
the status.

[edit] A one-year-old LWN article:
[http://lwn.net/Articles/548838/](http://lwn.net/Articles/548838/)

