

Computing file hashes with node.js
http://www.saltwaterc.eu/computing-file-hashes-with-node.js.html

======
nakkiel
I ran the code, which now displays number of calls to update(), and made
another test with the following code:

    
    
    var fs = require('fs');
    var filename = process.argv[2];
    var shasum = require('crypto').createHash('sha256');
    
    fs.readFile(filename, function (err, data) {
        shasum.update(data);
        console.log(shasum.digest('hex') + '  ' + filename);
    });
    

Here are the results:

    
    
      $ dd if=/dev/urandom of=xxx.data bs=1000 count=200000
      $ time sha256sum xxx.data
      e774e4c46ab832ec09dbfd1a944044651560c3fdc3c5e2e8b46c2ea7d54f6649  xxx.data
      
      real	0m2.087s
      user	0m2.000s
      sys	0m0.064s
      $ time node xxx-1.js xxx.data
      e774e4c46ab832ec09dbfd1a944044651560c3fdc3c5e2e8b46c2ea7d54f6649  xxx.data
      4883
      
      real	0m13.972s
      user	0m13.885s
      sys	0m0.264s
      $ time node xxx-2.js xxx.data
      e774e4c46ab832ec09dbfd1a944044651560c3fdc3c5e2e8b46c2ea7d54f6649  xxx.data
      
      real	0m14.043s
      user	0m13.433s
      sys	0m0.732s
    

Personally I never make multiple calls to update(), but that doesn't seem to be
the cause here. Oh, and check the docs: readFile() is async.

PS: the timings of xxx-1.js and xxx-2.js are equivalent across several runs.

~~~
SaltwaterC
The comment about blocking the event loop is all about hogging the CPU with
the hash computation, which is not exactly light work. See the "node.js is
cancer" article for more details. fs.readFile(), although non-blocking, is
worse since it buffers the entire file in memory, and some files may be larger
than the system memory. Buffering all the bytes simply isn't an option in that
case. Even without multiple calls to update(), node is still dog slow.
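
Streaming the file sidesteps the buffering part; roughly something like this
(a minimal sketch using node's core fs and crypto modules, with the file name
taken from argv just for illustration):

    var fs = require('fs');
    var crypto = require('crypto');
    
    var filename = process.argv[2];
    var shasum = crypto.createHash('sha256');
    var stream = fs.createReadStream(filename);
    
    // hash the file chunk by chunk instead of buffering it whole
    stream.on('data', function (chunk) {
        shasum.update(chunk);
    });
    stream.on('end', function () {
        console.log(shasum.digest('hex') + '  ' + filename);
    });

Memory stays flat regardless of file size, though the CPU cost of the hashing
itself doesn't change.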

------
krmmalik
I don't know if it's just me, but I found the results difficult to read and
understand. There's too much metadata in there that I don't need to know
about. Summing it up with just the actual results would make the information
easier to digest.

No doubt it's useful information, but presented like this the effect is
diluted.

~~~
SaltwaterC
node hash.js – 29.661s
sha256sum – 5.093s
openssl dgst -sha256 – 4.567s

Updated the article as well. I usually post all the output to avoid the
appearance that I made up the results, even though they are reproducible to a
fair extent. I have better things to do than make up horror stories about poor
node.js performance in certain areas.

------
mahmud
Useless insanity! You can do the same in the shell and with every PL worth using.

Seriously folks...

~~~
nakkiel
Well, in the real world people tend not to fork from within a webserver.

~~~
fexl
I have noticed a popular aversion to "fork", and I don't quite understand it.
It uses the "copy on write" technique, and it seems very fast in my
benchmarks. For example, I have a benchmark where I can spawn 1000 or more
processes right from the command line, all banging away on shared files with
locking, and it all seems very fast and stable, every time. I don't even
bother with pre-forking. But I do make heavy use of "Keep-Alive", with
appropriate guards against abuse under heavy load. I'm a big fan of the
simplicity of fork and blocking I/O, but I'm sure other people have different
requirements and constraints they might want to share here.

~~~
SaltwaterC
Spawning a new process is not the same thing as forking, but people often
forget this bit. This post wasn't about forking.
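
To make the distinction concrete: spawning here means running a separate
program (which on Unix is fork() followed by exec()), not a bare fork of the
current process. A rough sketch of what spawning an external hasher from node
would look like, assuming a sha256sum binary on the PATH:

    var spawn = require('child_process').spawn;
    
    // run the external sha256sum binary and collect its stdout
    var child = spawn('sha256sum', [process.argv[2]]);
    var output = '';
    
    child.stdout.on('data', function (chunk) {
        output += chunk;
    });
    child.on('exit', function (code) {
        if (code === 0) {
            console.log(output.trim());
        }
    });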

~~~
fexl
I thought that spawning a new process _was_ the same as forking. What am I
missing?

~~~
SaltwaterC
<http://linux.die.net/man/2/fork> - this explains it better.

~~~
fexl
Thanks, but I already know what fork does. My point is that "fork" is
precisely how one spawns a new process, and there is no other way to do so.

------
willvarfar
Ouch, that is horrid; you can almost imagine it feeding the data to update()
one boxed byte at a time.

