
Show HN: Fdir – Faster Node.js glob alternative - thecodrr
https://github.com/thecodrr/fdir
======
thecodrr
Hey everyone,

I recently created fdir mostly out of curiosity about how fast a program
written in Node.js could be. It so happened that I (accidentally) created the
fastest directory crawler in the Node.js environment. fdir can easily crawl
around 1 million files, distributed across about 100k directories, in under
1 second (your mileage may vary depending on hardware).

It's also under 1 KB in size (gzipped) and supports all Node.js versions (> 6).

Feel free to give it a run and ask me any questions (if any) :D

Blog post: [https://dev.to/thecodrr/how-i-wrote-the-fastest-directory-cr...](https://dev.to/thecodrr/how-i-wrote-the-fastest-directory-crawler-ever-3p9c)

Take care, thecodrr

~~~
29athrowaway
I don't like it when a software description includes claims like "fast",
"lightweight", "secure", "simple".

That is not an intrinsic trait of your software, just an aspirational thing.

~~~
threatofrain
All projects make trade-offs on values, and I'm happy to see what values the
author would like to identify their project with. Unless the project is
already done and in maintenance mode, yes, it's aspirational.

~~~
29athrowaway
It is also irresponsible and misleading to say it. It is self-promoting in a
way that is against the best interest of the user.

We are engineers not snake oil salesmen.

------
lhorie
This here looks wrong:
[https://github.com/thecodrr/fdir/blob/master/index.js#L86](https://github.com/thecodrr/fdir/blob/master/index.js#L86)
It's calling a blocking method from async in Node <10

Another thing I noticed: it looks like you only handle `dirent.isDirectory()`,
but not `dirent.isSymbolicLink()` (meaning the library won't find files in
symlinked folders, e.g. lerna node_modules).

~~~
thecodrr
Hey, thanks for taking the interest.

Yes, that sync method call is intentional (for now). I have been meaning to
add a separate lstat call for async/sync on Node < 10 without impacting
performance, so that will change. Thanks for pointing that out.

Handling symlinks is a work in progress. It will be exposed as an option that
you can enable/disable, because symlink resolution is expensive and not
everyone wants it.
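
To make the symlink trade-off concrete, here is a minimal sketch of a synchronous crawler with an opt-in symlink mode. It is not fdir's code: `crawlSync` and the `followSymlinks` option are hypothetical names, and it assumes Node 10.10+ for `withFileTypes`. The point is only that following links costs an extra `statSync` per link plus a `realpathSync` per directory to guard against cycles, which is why it is plausibly off by default.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical sketch, not fdir's implementation.
// Collects file paths under `root`; directories are not included.
function crawlSync(root, options = {}, files = [], seen = new Set()) {
  const followSymlinks = options.followSymlinks === true;

  if (followSymlinks) {
    // Remember resolved directories so a symlink cycle cannot recurse forever.
    const real = fs.realpathSync(root);
    if (seen.has(real)) return files;
    seen.add(real);
  }

  for (const dirent of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, dirent.name);
    if (dirent.isDirectory()) {
      crawlSync(full, options, files, seen);
    } else if (dirent.isFile()) {
      files.push(full);
    } else if (dirent.isSymbolicLink() && followSymlinks) {
      // Dirent describes the link itself, so resolve it with an extra stat.
      let stats;
      try {
        stats = fs.statSync(full);
      } catch (err) {
        continue; // dangling link: skip it
      }
      if (stats.isDirectory()) crawlSync(full, options, files, seen);
      else if (stats.isFile()) files.push(full);
    }
  }
  return files;
}
```

Calling `crawlSync("node_modules")` skips symlinked folders, while `crawlSync("node_modules", { followSymlinks: true })` descends into them at the cost of the extra system calls.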

------
zXuPh94rt
This has absolutely nothing to do with globbing. The title is very misleading
– I was super excited to see a new glob library.

~~~
thecodrr
How so? You can easily extend the searchFn to do the glob matching of files
for you. I wanted to keep it dependency free. So I fail to see how fdir has
"absolutely nothing to do with globbing"? Care to explain?

~~~
lhorie
Not parent, but I think their argument is that this lib doesn't implement a
glob interface so calling it a glob alternative is misleading, i.e. one cannot
easily replace a globbing library with this if they expose the glob DSL as an
interface. It's more accurate to say that your lib is a recursive-readdir
alternative.

~~~
thecodrr
In that case, fair enough. But no reason to be disheartened, we can always
make "fdir-glob" that has globbing built in :D but you are right, not an
*easy* alternative but still an alternative.
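
For what it's worth, a dependency-free glob filter layered on top of a crawler's output could look roughly like the sketch below. `globToRegExp` and `filterByGlob` are hypothetical helpers, not part of fdir, and they cover only a small subset of glob syntax (`*`, `?`, `**`) on forward-slash paths. Braces, character classes and negation are left out.

```javascript
// Hypothetical helpers, not fdir's API. Supports only a small glob subset:
//   "**/" matches zero or more directories
//   "*"   matches any run of characters inside one path segment
//   "?"   matches exactly one character inside a path segment
function globToRegExp(glob) {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        if (glob[i + 2] === "/") {
          source += "(?:.*/)?"; // "**/" may also match no directory at all
          i += 2;
        } else {
          source += ".*"; // bare "**" matches across segments
          i += 1;
        }
      } else {
        source += "[^/]*";
      }
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + source + "$");
}

// Keep only the paths that match the glob.
function filterByGlob(paths, glob) {
  const re = globToRegExp(glob);
  return paths.filter((p) => re.test(p));
}
```

With a list such as `["a.js", "src/b.js", "src/x/c.js", "src/d.ts"]`, the pattern `**/*.js` keeps the three `.js` files and `src/*.js` keeps only `src/b.js`. A real glob library handles far more cases than this.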

------
BiteCode_dev
I took it for a ride and compared it against equivalent scripts in bash, Perl,
Python and Ruby.

The scripts all list all files recursively in a sync fashion, from the
node_modules dir (like in the benchmark), excluding dirs, and print a total
count.

Here are the results calculated with hyperfine
([https://github.com/sharkdp/hyperfine](https://github.com/sharkdp/hyperfine)):

    
    
        hyperfine "bash test.sh" --warmup 5
        Benchmark #1: bash test.sh
        Time (mean ± σ):       7.5 ms ±   0.5 ms    [User: 4.5 ms, System: 4.2 ms]
        Range (min … max):     6.9 ms …  10.5 ms    332 runs
    
        hyperfine "perl test.pl" --warmup 5
        Benchmark #1: perl test.pl
        Time (mean ± σ):      25.6 ms ±   1.2 ms    [User: 16.8 ms, System: 8.8 ms]
        Range (min … max):    24.0 ms …  30.8 ms    97 runs
    
        hyperfine "python3.7 test.py" --warmup 5
        Benchmark #1: python3.7 test.py
        Time (mean ± σ):      43.4 ms ±   1.4 ms    [User: 32.6 ms, System: 10.9 ms]
        Range (min … max):    40.9 ms …  46.8 ms    66 runs
        
        hyperfine "ruby test.rb" --warmup 5
        Benchmark #1: ruby test.rb
        Time (mean ± σ):      66.5 ms ±   2.0 ms    [User: 52.1 ms, System: 14.4 ms]
        Range (min … max):    63.2 ms …  70.3 ms    42 runs
    
        hyperfine "node test.js" --warmup 5
        Benchmark #1: node test.js
        Time (mean ± σ):      83.7 ms ±   4.0 ms    [User: 74.7 ms, System: 15.6 ms]
        Range (min … max):    79.4 ms …  95.3 ms    36 runs
    

Here are the results of a hello world with each runtime for comparison:

    
    
        hyperfine "bash test.sh" --warmup 5
        Benchmark #1: bash test.sh
        Time (mean ± σ):       1.2 ms ±   0.3 ms    [User: 1.1 ms, System: 0.3 ms]
        Range (min … max):     0.9 ms …   3.8 ms    1521 runs
    
        hyperfine "perl test.pl" --warmup 5
        Benchmark #1: perl test.pl
        Time (mean ± σ):       1.3 ms ±   0.3 ms    [User: 1.3 ms, System: 0.3 ms]
        Range (min … max):     1.0 ms …   5.3 ms    1103 runs
    
        hyperfine "python3.7 test.py" --warmup 5
        Benchmark #1: python3.7 test.py
        Time (mean ± σ):      19.5 ms ±   0.9 ms    [User: 16.2 ms, System: 3.4 ms]
        Range (min … max):    18.3 ms …  23.7 ms    144 runs
    
        hyperfine "ruby test.rb" --warmup 5
        Benchmark #1: ruby test.rb
        Time (mean ± σ):      55.2 ms ±   2.2 ms    [User: 47.2 ms, System: 8.1 ms]
        Range (min … max):    52.2 ms …  61.9 ms    51 runs
        
        hyperfine "node test.js" --warmup 5
        Benchmark #1: node test.js
        Time (mean ± σ):      55.4 ms ±   1.8 ms    [User: 49.5 ms, System: 7.0 ms]
        Range (min … max):    53.0 ms …  60.0 ms    53 runs
    

Now, my machine is not set up for a clean benchmark. The disk cache is warmed
up. Hyperthreading is on. Other software is running.

Plus the scripts all found a slightly different number of files :) I suspect
they all treat symlinks/dotted dirs differently, and I didn't take the time to
normalize. Although I don't think this makes up for the difference.

Still, the result is a bit interesting. The non-JS tests are not using any 3rd
party libs. Ruby and Node seem to have the same cost for VM startup.

I'm quite surprised that Node is last, frankly, especially with such heavily
optimized code. I expected V8 code to be blazing fast, as it's made by Google
and written in C++.

~~~
thecodrr
Hey, thanks for taking the interest and the time to audit and benchmark.

You can get a rough idea of VM startup time by running a simple hello world
program.

fdir is supposed to compete against Node.js alternatives, so I am not
surprised it comes last. Maybe the next goal should be to optimize it against
crawlers in other languages.

V8 does a lot of optimizations but there is still overhead. A native solution
in bash has little to no overhead so no wonder it's the fastest.

Anyway, thanks again. This benchmark was insightful (also humbling).

~~~
BiteCode_dev
Gonna add a test with hello world to see VM startup time.

~~~
thecodrr
If we subtract the vm time from total time that should (very roughly) be the
time taken for fdir to crawl the directory.

So basically:

83ms - 55ms = 28ms

Not bad, not bad. :D Node still comes last tho, so it doesn't change the
picture much.
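
The same subtraction can be applied to every runtime. The sketch below uses only the mean times from the two hyperfine runs above. `startupAdjusted` is a helper name made up for illustration, and the result ignores the standard deviations and the fact that the scripts found slightly different file counts, so treat it as a rough estimate only.

```javascript
// Mean wall-clock times in ms, copied from the hyperfine output in this thread.
const crawlMs = { bash: 7.5, perl: 25.6, python: 43.4, ruby: 66.5, node: 83.7 };
const helloMs = { bash: 1.2, perl: 1.3, python: 19.5, ruby: 55.2, node: 55.4 };

// Subtract each runtime's hello-world time from its crawl time,
// rounded to one decimal place to hide floating point noise.
function startupAdjusted(crawl, hello) {
  const result = {};
  for (const name of Object.keys(crawl)) {
    result[name] = Math.round((crawl[name] - hello[name]) * 10) / 10;
  }
  return result;
}

const adjusted = startupAdjusted(crawlMs, helloMs);
console.log(adjusted);
```

By this rough measure Node lands at about 28.3 ms, Perl at 24.3 ms, Python at 23.9 ms, Ruby at 11.3 ms and bash at 6.3 ms.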

------
mayank
This is a replacement for walk, not glob.

