
Performance of nnn vs. ls - apjana
https://github.com/jarun/nnn/wiki/performance
======
SonOfLilit
Seems like running `ls` takes twice as long as running `nnn`, but I wonder how
long it takes to type each...

    
    
        $ ipython
        In [1]: %time input()
        ls
        Wall time: 324 ms
        Out[1]: 'ls'
        
        In [2]: %time input()
        nnn
        Wall time: 711 ms
        Out[2]: 'nnn'
    

In other words, `nnn` takes twice as long (on my setup, including a qwerty
keyboard and my fingers and muscle memory) and it's on the same order of
magnitude as running them, so I guess they're more or less equal.

This is all tongue in cheek, but my point is that the important optimization
is of user productivity, not runtime, and in that respect, as far as I'm
aware, `Total Commander` on Windows and its open-source, plugin-compatible
clone `doublecmd` (which runs on both Windows and Linux) are far superior to
anything else on the market. Don't let the Windows 3.11-reminiscent design
scare you: with Total Commander you'll be as productive doing complex file
operations as those vim masters are doing complex text transformations, and
you won't ever be able to go back to any other file manager.

edit: I misread; typing them is actually a decimal order of magnitude slower
than running them on 2083 files.

~~~
jandrese
I alias 'ls' to 'd' when I'm setting up an environment.

~~~
eindiran
Do you mean that you have an alias like this:

    
    
      alias d='ls'
    

Or is there a program called `d` that is a faster version of `ls` that you're
aliasing `ls` to?

~~~
jandrese
Just an alias. I figure I can use a single-letter command for something I do
about a third of the time on the command line, especially since it's on the
opposite hand from the enter key.

ls is actually pretty well optimized for a QWERTY layout already, so it's not
a big win, but it's something I've been doing for 25 years now and the habit
is well ingrained.

Usually the alias is now:

    
    
      alias d="ls -F --color=auto" 
    

on Linux systems.

~~~
v_lisivka
Why not just use a console file manager, like mc?

~~~
apjana
`nnn` is a console file manager.

------
kgwxd
Just before this made it to the front page, I happened to see this toot about
slow ls:
[https://ubuntu.social/@Simon/102054773088776436](https://ubuntu.social/@Simon/102054773088776436)

I'm curious if that would make a difference.

~~~
apjana
For reference, from the system where the numbers were taken:

    
    
         $ echo $LS_COLORS
         
         $
    

LS_COLORS wasn't set. The readings were taken for ls default performance.

~~~
BlackLotus89
LS_COLORS doesn't have to be set for colors to be used. If you unset it, the
default color scheme will be used.

There was an article about this posted to HN a week ago
[https://news.ycombinator.com/item?id=19761159](https://news.ycombinator.com/item?id=19761159)

------
kekebo
A comparison to exa[0] would be interesting as well

[0] [https://github.com/ogham/exa](https://github.com/ogham/exa)

~~~
apjana
I am on a much faster system right now. From 2 sample runs of each, ls is
consistently way (~5 times) faster than exa:

exa:

    
    
        run1:
        0.03user 0.02system 0:00.11elapsed 56%CPU (0avgtext+0avgdata 8672maxresident)k
        208inputs+0outputs (1major+1333minor)pagefaults 0swaps
    
        run2:
        0.04user 0.00system 0:00.13elapsed 42%CPU (0avgtext+0avgdata 8560maxresident)k
        0inputs+0outputs (0major+1310minor)pagefaults 0swaps
    

ls:

    
    
        run1:
        0.00user 0.01system 0:00.02elapsed 69%CPU (0avgtext+0avgdata 3812maxresident)k
        0inputs+0outputs (0major+303minor)pagefaults 0swaps
    
        run2:
        0.01user 0.00system 0:00.02elapsed 62%CPU (0avgtext+0avgdata 3840maxresident)k
        0inputs+0outputs (0major+305minor)pagefaults 0swaps

~~~
eridius
How about lsd?
[https://github.com/Peltoche/lsd](https://github.com/Peltoche/lsd)

~~~
apjana
lsd:

    
    
        run1:
        0.03user 0.04system 0:00.16elapsed 50%CPU (0avgtext+0avgdata 4884maxresident)k
        0inputs+0outputs (0major+4635minor)pagefaults 0swaps
    
        run2:
        0.02user 0.05system 0:00.10elapsed 77%CPU (0avgtext+0avgdata 4920maxresident)k
        0inputs+0outputs (0major+4635minor)pagefaults 0swaps
    
        run3:
        0.03user 0.04system 0:00.14elapsed 58%CPU (0avgtext+0avgdata 4936maxresident)k
        0inputs+0outputs (0major+4635minor)pagefaults 0swaps

~~~
eridius
I went ahead and ran the tests myself, using hyperfine. In fact, I reproduced
the benchmark described on the lsd README directly, except using `ls -FG`
instead of colorls.

    
    
      Benchmark #1: lsd -la /etc/*
        Time (mean ± σ):      34.2 ms ±   0.9 ms    [User: 18.7 ms, System: 13.0 ms]
        Range (min … max):    32.9 ms …  37.6 ms
       
      Benchmark #2: ls -FG /etc/*
        Time (mean ± σ):       7.1 ms ±   0.3 ms    [User: 3.2 ms, System: 3.4 ms]
        Range (min … max):     6.4 ms …   8.9 ms
       
      Benchmark #3: exa -la /etc/*
        Time (mean ± σ):      24.7 ms ±   1.2 ms    [User: 19.3 ms, System: 25.4 ms]
        Range (min … max):    23.3 ms …  33.3 ms

------
zeroimpl
I’m not surprised there are a bunch of optimizations which could be made to
“ls”. On a related note, I once looked at source code for “rm” (specifically
“rm -rf”) and was surprised how inefficient it seemed. Not sure if it has
changed since, but it would stat files before deleting and do various
permissions checks - things that could be accomplished by just letting the
kernel try to do the delete and then checking the status code. That ought to
be faster 99% of the time.
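A minimal sketch of the optimistic-delete approach described above (this is not the actual coreutils code, and `remove_fast` is a hypothetical name): let the kernel attempt the unlink, and only do extra work when it actually fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the "optimistic delete" idea: try unlink(2) first instead of
 * stat-ing up front. remove_fast() is a hypothetical name, not anything
 * from coreutils. */
static int remove_fast(const char *path)
{
    if (unlink(path) == 0)
        return 0;                 /* common case: one syscall, done */

    /* unlink on a directory fails with EISDIR (Linux) or EPERM (POSIX);
     * only then fall back to the directory path. */
    if (errno == EISDIR || errno == EPERM)
        return rmdir(path);

    fprintf(stderr, "rm: %s: %s\n", path, strerror(errno));
    return -1;
}
```

In the 99% case this costs a single `unlink` syscall, with the permission checks done once by the kernel instead of twice.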

~~~
apjana
Several aspects of `rm` seemed safer to keep as-is rather than re-writing the
defensive logic. Though I agree it could be done in the code without any
dependency.

------
ordu
_> no floating point arithmetic_

Does it really make things faster? I can't find enough motivation to test it
myself, but after reading Abrash's Black Book I got the impression that
floating point operations are at least as fast as operations on integers, and
in some cases faster.

While I see reasons not to use the FPU, I believe speed is not among them, at
least on x86, where a fast hardware implementation of floating point exists.

~~~
apjana
numbers here:
[https://github.com/jarun/nnn/pull/84](https://github.com/jarun/nnn/pull/84)

~~~
ordu
Thank you. It seems to have nothing to do with the speed of computation; the
difference in speed is caused by the formatting inside snprintf. If I'm
right, that explains why integers are faster in your case and slower in the
Black Book.

~~~
apjana
> It seems have nothing to do with speed of computation, difference in speed
> caused by formatting inside snprintf

Both versions have snprintf. Please share your numbers without snprintf.

~~~
ordu
> Both the versions have snprintf.

Yes, but with different format strings.

It is obvious, isn't it? There are just 2 FPU operations in coolsize_f, while
coolsize_i has a bunch of 'if's. The speed difference just cannot be due to
FPU operations; it must be somewhere in snprintf.

But if you asked, I tried to test my assumptions.

tl;dr:

- seems the measurement is off somehow

- in any case, formatting in snprintf is much slower than the calculations
themselves

Numbers for test "as is" are:

    
    
        Timing 500000 iterations
        F time 0.151318
        I time 0.064491
    

Then I changed:

    
    
        - snprintf(size_buf, 12, "%.*Lf%c", i, size + rem * div_2_pow_10, U[i]);
        + snprintf(size_buf, 12, "%" PRId64 "%c", size + rem * div_2_pow_10, U[i]);
    

This way I just replaced floating point formatting with integer formatting.
It broke all the tests of course, but that doesn't matter here: all the
calculations are the same, and that's what we are measuring.

The numbers are:

    
    
        Timing 500000 iterations
        F time 0.056365
        I time 0.064846
    
    

Now coolsize_f is faster. But it's somehow strange, because I expected
coolsize_f to be even faster in this case (coolsize_i conditionally uses a
longer format string).

Then I removed all the snprintfs (in case of coolsize_f I just commented them
out with FPU operations; in case of coolsize_i I commented out the if around
snprintf):

    
    
         Timing 500000 iterations
         F time 0.012604
         I time 0.007449
    

Now coolsize_f has just the while-loop, and coolsize_i has that loop plus a
bunch of 'if's. But coolsize_f is slower, wtf? I looked closer and found one
difference between the loops:

    
    
        if (rem >= 512) ++size;
    

Seems this is a costly operation inside a loop.

But then I hit a strangeness. I tried to make coolsize_f show the same
timings as coolsize_i or lower. I removed this 'if' inside the 'while',
discovered that the type of rem is long double (so it was converted to int
for the bitwise & and then back to long double?), and changed it to int. And
nevertheless I still see a difference:

    
    
        Timing 500000 iterations
        F time 0.010668
        I time 0.007448
    

WTF? I changed the order of the tests in test_relative_speed, so that
coolsize_i is tested before coolsize_f:

    
    
        srand48(17);
        gettimeofday(&t0, NULL);
    
        for(i = 0; i < n; i++)
            coolsize_i(lrand48());
    
        gettimeofday(&t1, NULL);
        timediff(&ti, &t1, &t0);
    
        srand48(17);
        gettimeofday(&t0, NULL);
    
        for(i = 0; i < n; i++)
            coolsize_f(lrand48());
    
        gettimeofday(&t1, NULL);
        timediff(&tf, &t1, &t0);
    

and voilà:

    
    
        Timing 500000 iterations
        F time 0.006025
        I time 0.011848
    

I don't know how to explain it. n seems large enough to make any caching
effects small and insignificant. Maybe it has something to do with the
compiler being too clever and removing "useless" code? But why would changing
the order of the time measurements make any difference? It needs more
testing.

edit: formatting
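Order-dependent timings like the ones above are a classic micro-benchmark pitfall. Here is a sketch of one way to harden such a harness; the `work_a`/`work_b` functions are toy stand-ins, not the real coolsize_f/coolsize_i from the nnn PR. The two fixes shown: an untimed warm-up pass, and making the accumulated result escape through a volatile sink so the compiler cannot delete the "useless" loop.

```c
#define _DEFAULT_SOURCE       /* for srand48/lrand48 and CLOCK_MONOTONIC */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Toy stand-ins for the functions under test; the point is the harness. */
static long work_a(long x) { return x / 1024; }
static long work_b(long x) { return x >> 10; }

static double time_loop(long (*fn)(long), int n, volatile long *sink)
{
    struct timespec t0, t1;
    long acc = 0;

    srand48(17);
    for (int i = 0; i < n; i++)   /* warm-up: caches, branch predictor */
        acc += fn(lrand48());

    srand48(17);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++)   /* the timed pass, identical inputs */
        acc += fn(lrand48());
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = acc;                  /* result escapes: no dead-code elimination */
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running each candidate through the same harness, and alternating their order across repeated program runs, should make the A-before-B vs. B-before-A discrepancy shrink or disappear.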

~~~
apjana

        Timing 500000 iterations
        F time 0.006025
        I time 0.011848
    

Expected, right? First you called `coolsize_i(lrand48())` and then
`coolsize_f(lrand48())`? In your results F should be I and I should be F.

In all your tests the F version is slower, as per my expectation.

------
eindiran
I just downloaded `nnn` from apt and I am not seeing this performance at all.
`ls` is blazingly faster than `nnn` on this machine. I am seeing a speedup
over `ranger`, but both are taking quite a while for a directory with 196388
files in it.

    
    
      time /usr/bin/nnn -S /local
    
      /usr/bin/nnn -S /local  0.04s user 0.36s system 15% cpu 2.503 total
    
      time /usr/bin/ranger /local
    
      /usr/bin/ranger /local  0.31s user 0.02s system 12% cpu 2.667 total
    
      time /bin/ls /local
    
      /bin/ls /local  0.00s user 0.00s system 86% cpu 0.002 total
    

Perhaps this is related to the `$LS_COLORS` environment variable, as discussed
in a few other comments? It is set on this machine.

    
    
      echo $LS_COLORS
    
      no=00:fi=00:di=34:ow=34;40:ln=35:...<many more entries>...:*.sqlite=34:

~~~
apjana
The version available on apt is older; the performance numbers are from
master. I am on a device with an SSD now, and `nnn` master consistently
performs better across 3 runs.

Here are the numbers I see with $LS_COLORS set to:

    
    
        ex=00:su=00:sg=00:ca=00:
    
        nnn (still shows dir colors as it's designed to):
        0.00user 0.00system 0:00.01elapsed 33%CPU (0avgtext+0avgdata 3420maxresident)k
        0inputs+0outputs (0major+319minor)pagefaults 0swaps
    
        ls (no colors):
        0.00user 0.01system 0:00.02elapsed 60%CPU (0avgtext+0avgdata 3724maxresident)k
        0inputs+0outputs (0major+297minor)pagefaults 0swaps
    

And going deeper, with the same $LS_COLORS, see the difference in the number
of calls (no wonder `nnn` outperforms `ls`):

    
    
        $ time strace -c /usr/local/bin/nnn /usr/bin | wc -l
        % time     seconds  usecs/call     calls    errors syscall
        ------ ----------- ----------- --------- --------- ----------------
         82.55    0.004592           2      1988           newfstatat
          8.47    0.000471         118         4           getdents
          2.82    0.000157           3        58         1 stat
          1.89    0.000105         105         1           inotify_add_watch
          1.64    0.000091           9        10           brk
          1.33    0.000074           5        15           close
          0.23    0.000013           1        14           openat
          0.18    0.000010           3         4           write
          0.18    0.000010           0        22         2 ioctl
          0.13    0.000007           0        15           fstat
          0.13    0.000007           1        14           mmap
          0.13    0.000007           7         1           inotify_rm_watch
          0.11    0.000006           1         8           read
          0.11    0.000006           6         1         1 unlink
          0.07    0.000004           4         1           sysinfo
          0.05    0.000003           3         1           lseek
          0.00    0.000000           0         2           lstat
          0.00    0.000000           0        10           mprotect
          0.00    0.000000           0         1           munmap
          0.00    0.000000           0         9           rt_sigaction
          0.00    0.000000           0         9         7 access
          0.00    0.000000           0         1           execve
          0.00    0.000000           0         1           arch_prctl
          0.00    0.000000           0         1           inotify_init1
        ------ ----------- ----------- --------- --------- ----------------
        100.00    0.005563                  2191        11 total
        0.03user 0.01system 0:00.06elapsed 74%CPU (0avgtext+0avgdata 3424maxresident)k
        2968inputs+0outputs (7major+541minor)pagefaults 0swaps
        0
        
        $ time strace -c ls -l /usr/bin | wc -l
        % time     seconds  usecs/call     calls    errors syscall
        ------ ----------- ----------- --------- --------- ----------------
         32.03    0.004882           2      1989           lstat
         30.42    0.004636           2      1989      1989 lgetxattr
         22.45    0.003421           2      1548      1548 getxattr
          6.57    0.001002           2       442           readlink
          6.15    0.000938         235         4           getdents
          0.45    0.000068           2        33         5 openat
          0.43    0.000066           6        11           munmap
          0.31    0.000047           1        34           close
          0.30    0.000045           1        40           mmap
          0.26    0.000040           2        17           lseek
          0.20    0.000031           1        30           fstat
          0.10    0.000015           4         4         4 connect
          0.09    0.000014           3         5           brk
          0.07    0.000011           3         4           socket
          0.07    0.000010           0        31           write
          0.06    0.000009           9         1           mremap
          0.03    0.000005           0        17           read
          0.00    0.000000           0        20           mprotect
          0.00    0.000000           0         2           rt_sigaction
          0.00    0.000000           0         1           rt_sigprocmask
          0.00    0.000000           0         2         2 ioctl
          0.00    0.000000           0        12        12 access
          0.00    0.000000           0         1           execve
          0.00    0.000000           0         2         2 statfs
          0.00    0.000000           0         1           arch_prctl
          0.00    0.000000           0         1           futex
          0.00    0.000000           0         1           set_tid_address
          0.00    0.000000           0         1           set_robust_list
          0.00    0.000000           0         1           prlimit64
        ------ ----------- ----------- --------- --------- ----------------
        100.00    0.015240                  6244      3562 total
        0.04user 0.06system 0:00.09elapsed 114%CPU (0avgtext+0avgdata 3676maxresident)k
        0inputs+0outputs (0major+521minor)pagefaults 0swaps
        1989

~~~
eindiran
Very cool. Do you know what the timeline is for getting the contents of
master into the version packaged for apt?

On the topic of `ls` making many more syscalls than `nnn`, it's interesting
that `nnn` uses the `newfstatat` syscall. Is that capturing the information
that the `lstat`, `lgetxattr`, and `getxattr` syscalls are getting for `ls`?
If so, will that potentially limit this speedup to architectures that support
the newer `newfstatat` syscall?

~~~
apjana
To get the last release:
[https://github.com/jarun/nnn/issues/217#issuecomment-4692263...](https://github.com/jarun/nnn/issues/217#issuecomment-469226327)

I am not sure when I will make the next release. I need to make sure it's
stable enough through testing and that would take a while.

> newfstatat

Don't think so. For the basic info ls/nnn shows in the list mode, extended
attributes are not strictly required.

------
kazinator
That's nice; _ls_ was fast enough for my personal use more than twenty years
ago.

I don't have that many more files to list today, just some vastly larger ones.

Speaking of twenty years ago, that's about when some of the most stubborn
programmers had quit the habit of using shifts instead of multiplies, and
embraced floating-point.

~~~
apjana
`nnn` is not a replacement for `ls`; it's a file manager. This is a
comparison of listing speed. It turned out we don't need floating point for a
file manager, so we left that out in favor of other useful work.

------
mereel
This seems pretty cool, but why does the source code have to be contained
within a single 4000 line file[0]? I'm fairly ignorant of the design patterns
used in Linux tool development, but it seems like splitting this into a few
more files could make the code a little easier to understand/maintain.

[0]
[https://github.com/jarun/nnn/blob/master/src/nnn.c](https://github.com/jarun/nnn/blob/master/src/nnn.c)

~~~
groovybits
I believe the concept of nnn is that all the source code is contained in one
file. If you wish to add extensions to nnn, you can write a C file and
include it in nnn.c.

~~~
apjana
Extensions are (mostly) POSIX-compliant shell scripts (though they can be
compiled binaries too). I have (kind of) explained in the main thread why I
never felt the need to refactor the code into multiple files.

~~~
groovybits
Ah, it was shell scripts I was thinking of.

For the record, the concept makes sense to me. I moved from ranger to nnn a
while back, and the performance improvement that nnn provides really shows.

Thanks for all the hard work!

~~~
apjana
Thanks for the compliment!

The plugin-based mechanism fits perfectly with the fundamental design to keep
the core file manager lean and fast, by delegating repetitive (but not
necessarily file manager-specific) tasks to the plugins.

------
mochomocha
Without variance information, the numbers reported are pretty meaningless. A
single run doesn't make a benchmark.

~~~
lwhsiao
I'll give a quick plug for hyperfine [0], which makes running a series of
benchmarks really easy.

[0]:
[https://github.com/sharkdp/hyperfine](https://github.com/sharkdp/hyperfine)

------
ape4
It would be nice to use some of these techniques to speed up * (wildcard
expansion) in bash.

~~~
SonOfLilit
Not sure they're relevant? * doesn't sort or filter by metadata, and since
it's all strings touched only once, I guess there's not much in the way of
data structures to not paragraph-align?

------
peterwwillis
What's the use case where you need 'ls' to be fast?

~~~
OJFord
Interactive

------
agumonkey
Timing should be amortized: when `nnn` (or similar) is running, you have
state and can think and operate infinitely faster.

~~~
apjana
Absolutely! I personally use xfce4-terminal in drop-down mode to preserve my
contexts over days.

