Hacker News
Performance of nnn vs. ls (github.com)
84 points by apjana 8 months ago | 57 comments



Seems like running `ls` takes twice as long as running `nnn`, but I wonder how long it takes to type each...

    $ ipython
    In [1]: %time input()
    ls
    Wall time: 324 ms
    Out[1]: 'ls'
    
    In [2]: %time input()
    nnn
    Wall time: 711 ms
    Out[2]: 'nnn'
In other words, `nnn` takes twice as long (on my setup, including a qwerty keyboard and my fingers and muscle memory) and it's on the same order of magnitude as running them, so I guess they're more or less equal.

This is all tongue in cheek, but my point is that the important optimization is of user productivity and not of runtime, and in that respect as far as I'm aware `Total Commander` on Windows and its open source plugin-compatible clone `doublecmd` that runs on both Windows and Linux are far superior to anything else in the market. Don't let the Windows-3.11 reminiscent design scare you, with Total Commander you'll be as productive doing complex file operations as those vim masters are doing complex text transformations, and you won't ever be able to go back to any other file manager.

edit: I misread, actually typing them is a decimal order of magnitude slower than running them on 2083 files.


`nnn` is covered there. It's not a one-shot utility, but a file manager.

Also, you would be happy to know I use the following alias on my system to fire it in 162 ms:

    alias n='nnn -isl'


Naturally :-) That's why "tongue in cheek". I still can't imagine myself switching from Total Commander, it's so amazingly powerful.


I alias 'ls' to 'd' when I'm setting up an environment.


Do you mean that you have an alias like this:

  alias d='ls'
Or is there a program called `d` that is a faster version of `ls` that you're aliasing `ls` to?


Just an alias. I figure I could use a single letter command for something I do like 1/3 of the time on the commandline, especially since it's on the opposite hand from the enter key.

ls is actually pretty well optimized for a QWERTY layout already so it's not a big win, but it's something I've been doing for 25 years now so the habit is well ingrained.

Usually the alias is now:

  alias d="ls -F --color=auto" 
on Linux systems.


It was the default since Mandriva/Linux Mandrake 6.2 too!


Why not just use a console file manager, like mc?


`nnn` is a console file manager.


`d` is taken by the much more important command `git diff`. I don't even have an alias for `ls`, I seldom use it, I mostly run `s` (which is an alias of `git status` of course).


Windows-3.11 reminiscent design

Strictly speaking, it's an MS DOS design - a descendant of Norton Commander.


I meant the graphic design, which is what put most people off.


The formal name is an orthodox file manager.


An inexplicable and incomprehensible choice, given that they've all been 'commander' for 30+ years. Norton Commander, Midnight Commander, Windows Commander, Total Commander, Derek Smart's Desktop Commander, etc. Clearly they should just call it a file commander. Fits much better in an overweening nerd-judgmental sort of recommendation too - "You're using a file manager? You should really switch to a file commander."


Just before this made it to the front page, I happened to see this toot about slow ls: https://ubuntu.social/@Simon/102054773088776436

I'm curious if that would make a difference.


For reference, from the system where the numbers were taken:

     $ echo $LS_COLORS
     
     $
LS_COLORS wasn't set. The readings were taken for ls default performance.


LS_COLORS doesn't need to be set for colors to be used. If it is unset, the default color scheme will be used.
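
A side note on the `echo $LS_COLORS` check: it prints a blank line both when the variable is unset and when it is set to an empty string, so it cannot tell the two apart. A minimal sketch of a check that does, with `ls_colors_state` being a made-up helper name for illustration:

```python
import os


def ls_colors_state(environ=os.environ):
    """Classify LS_COLORS as 'unset', 'empty', or 'set'."""
    value = environ.get("LS_COLORS")
    if value is None:
        return "unset"      # variable is not in the environment at all
    if value == "":
        return "empty"      # variable exists but holds no entries
    return "set"


print(ls_colors_state())
```

This only reports the state of the variable; whether `ls` actually colors its output also depends on how it is invoked (for example an alias adding `--color=auto`).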

There was an article about this posted to HN a week ago https://news.ycombinator.com/item?id=19761159


You'll find the original discussed on Hacker News at https://news.ycombinator.com/item?id=19761159 .


A comparison to exa[0] would be interesting as well

[0] https://github.com/ogham/exa


I am on a much faster system right now. From 2 sample runs of each, ls is consistently way (~5 times) faster than exa:

exa:

    run1:
    0.03user 0.02system 0:00.11elapsed 56%CPU (0avgtext+0avgdata 8672maxresident)k
    208inputs+0outputs (1major+1333minor)pagefaults 0swaps

    run2:
    0.04user 0.00system 0:00.13elapsed 42%CPU (0avgtext+0avgdata 8560maxresident)k
    0inputs+0outputs (0major+1310minor)pagefaults 0swaps
ls:

    run1:
    0.00user 0.01system 0:00.02elapsed 69%CPU (0avgtext+0avgdata 3812maxresident)k
    0inputs+0outputs (0major+303minor)pagefaults 0swaps

    run2:
    0.01user 0.00system 0:00.02elapsed 62%CPU (0avgtext+0avgdata 3840maxresident)k
    0inputs+0outputs (0major+305minor)pagefaults 0swaps



lsd:

    run1:
    0.03user 0.04system 0:00.16elapsed 50%CPU (0avgtext+0avgdata 4884maxresident)k
    0inputs+0outputs (0major+4635minor)pagefaults 0swaps

    run2:
    0.02user 0.05system 0:00.10elapsed 77%CPU (0avgtext+0avgdata 4920maxresident)k
    0inputs+0outputs (0major+4635minor)pagefaults 0swaps

    run3:
    0.03user 0.04system 0:00.14elapsed 58%CPU (0avgtext+0avgdata 4936maxresident)k
    0inputs+0outputs (0major+4635minor)pagefaults 0swaps


I went ahead and ran the tests myself, using hyperfine. In fact, I reproduced the benchmark described on the lsd README directly, except using `ls -FG` instead of colorls.

  Benchmark #1: lsd -la /etc/*
    Time (mean ± σ):      34.2 ms ±   0.9 ms    [User: 18.7 ms, System: 13.0 ms]
    Range (min … max):    32.9 ms …  37.6 ms
   
  Benchmark #2: ls -FG /etc/*
    Time (mean ± σ):       7.1 ms ±   0.3 ms    [User: 3.2 ms, System: 3.4 ms]
    Range (min … max):     6.4 ms …   8.9 ms
   
  Benchmark #3: exa -la /etc/*
    Time (mean ± σ):      24.7 ms ±   1.2 ms    [User: 19.3 ms, System: 25.4 ms]
    Range (min … max):    23.3 ms …  33.3 ms


I’m not surprised there are a bunch of optimizations which could be made to “ls”. On a related note, I once looked at source code for “rm” (specifically “rm -rf”) and was surprised how inefficient it seemed. Not sure if it has changed since, but it would stat files before deleting and do various permissions checks - things that could be accomplished by just letting the kernel try to do the delete and then checking the status code. That ought to be faster 99% of the time.
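
To illustrate the "just try the delete and check the status" idea, here is a minimal sketch in Python rather than C. It is not how `rm` is implemented; `remove_file` is a hypothetical helper that skips any prior stat or permission check and reports the error code the kernel returns.

```python
import errno
import os
import tempfile


def remove_file(path):
    """Try to delete `path` directly; return 0 on success or the errno on failure."""
    try:
        os.unlink(path)  # no stat() beforehand; the kernel performs the checks
    except OSError as exc:
        return exc.errno
    return 0


# Usage: a throwaway file is removed, then a second attempt fails.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
first = remove_file(demo_path)
second = remove_file(demo_path)
print(first, errno.errorcode.get(second))
```

The trade-off is that error messages and interactive prompts (as in `rm -i`) have to be derived from the error code after the fact instead of from information gathered up front.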


There are several aspects of `rm` which seemed safer rather than re-writing the defensive logic. Though, it can be done in the code without any dependency, I agree.


> no floating point arithmetic

Does it really make things faster? I cannot find enough motivation to test it myself, but after reading Abrash's Black Book, I got the impression that floating point operations are at least as fast as operations on integers, and in some cases faster.

While I see reasons not to use the FPU, I believe that speed is not among them. At least on x86, where a fast hardware implementation of floating point numbers exists.



Thank you. It seems to have nothing to do with the speed of computation; the difference in speed is caused by formatting inside snprintf. If I'm right, it explains why integers are faster in your case and slower in the Black Book.


> It seems to have nothing to do with the speed of computation; the difference in speed is caused by formatting inside snprintf

Both the versions have snprintf. Please share your numbers without snprintf.


> Both the versions have snprintf.

Yes, but with different format strings.

It is obvious, isn't it? There are just 2 FPU operations in coolsize_f, while coolsize_i has a bunch of 'if's. It just cannot be that the speed difference is due to FPU operations. It must be somewhere in snprintf.

But since you asked, I tried to test my assumptions.

tl;dr:

- seems that measuring is wrong somehow

- in any case, formatting in snprintf is much slower than the calculations themselves

Numbers for test "as is" are:

    Timing 500000 iterations
    F time 0.151318
    I time 0.064491
Then I changed:

    - snprintf(size_buf, 12, "%.*Lf%c", i, size + rem * div_2_pow_10, U[i]);
    + snprintf(size_buf, 12, "%" PRId64 "%c", size + rem * div_2_pow_10, U[i]);
This way I just replaced floating point formatting, with integer formatting. It broke all tests of course, but it doesn't matter for our test: all the calculations are the same and we are measuring them.

The numbers are:

    Timing 500000 iterations
    F time 0.056365
    I time 0.064846

Now coolsize_f is faster. But it is somewhat strange, because I expected coolsize_f to be even faster in this case (coolsize_i conditionally uses a longer format string).

Then I removed all the snprintfs (in case of coolsize_f I just commented them out with FPU operations; in case of coolsize_i I commented out the if around snprintf):

     Timing 500000 iterations
     F time 0.012604
     I time 0.007449
Now coolsize_f has just a while-loop, and coolsize_i has this loop and a bunch of 'if's. But coolsize_f is slower, wtf? I looked closer and found one difference between the loops:

    if (rem >= 512) ++size;
It seems to be a costly operation when inside a loop.

But then I hit a strangeness. I tried to make coolsize_f show the same timings as coolsize_i or lower. I removed this 'if' inside the 'while', discovered that the type of rem is long double (so it was converted to int for bitwise & and then back to long double?), and changed it to int. Nevertheless, I still see a difference:

    Timing 500000 iterations
    F time 0.010668
    I time 0.007448
WTF? I changed the order of tests in test_relative_speed, so that coolsize_i is tested before coolsize_f:

    srand48(17);
    gettimeofday(&t0, NULL);

    for(i = 0; i < n; i++)
        coolsize_i(lrand48());

    gettimeofday(&t1, NULL);
    timediff(&ti, &t1, &t0);

    srand48(17);
    gettimeofday(&t0, NULL);

    for(i = 0; i < n; i++)
        coolsize_f(lrand48());

    gettimeofday(&t1, NULL);
    timediff(&tf, &t1, &t0);
and voilà:

    Timing 500000 iterations
    F time 0.006025
    I time 0.011848
I don't know how to explain it. n seems to be large enough to make any caching effects small and insignificant. Maybe it has something to do with the compiler being too clever and removing "useless" code? But why does changing the order of the time measurements make any difference? It needs more testing.

edit: formatting
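
One way to keep ordering out of a comparison like the one above is to add an untimed warm-up pass, rotate which candidate runs first, and keep the best time per candidate under a fixed label. The sketch below does that in Python; the two lambdas are stand-in workloads, not the `coolsize_f`/`coolsize_i` code, and `compare` is a made-up helper.

```python
import time


def time_once(func, inputs):
    """Time one pass of `func` over `inputs`, in seconds."""
    start = time.perf_counter()
    for value in inputs:
        func(value)
    return time.perf_counter() - start


def compare(funcs, inputs, rounds=5):
    """Return {name: best time}, rotating which function is timed first."""
    names = list(funcs)
    best = {name: float("inf") for name in names}
    for func in funcs.values():          # warm-up pass, not timed
        time_once(func, inputs)
    for r in range(rounds):
        shift = r % len(names)
        order = names[shift:] + names[:shift]
        for name in order:
            # Results are stored by name, so labels cannot get swapped.
            best[name] = min(best[name], time_once(funcs[name], inputs))
    return best


# Stand-in workloads (hypothetical).
candidates = {
    "float_div": lambda n: n / 1024.0,
    "int_shift": lambda n: n >> 10,
}
results = compare(candidates, range(100_000))
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f}s")
```

Keying the results by name rather than by position also guards against mislabeling the output after reordering the measurements.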


    Timing 500000 iterations
    F time 0.006025
    I time 0.011848
Expected, right? First you called `coolsize_i(lrand48())` and then `coolsize_f(lrand48())`? In your results F should be I and I should be F.

In all your tests the F version is slower, as per my expectation.


Very interesting indeed! If you can come up with a faster and accurate floating point alternative do raise a PR by all means. Many thanks for running the tests and sharing the results!


Yes, I ran the tests too and noticed that printing the double in snprintf() is the bottleneck. Any idea how we can do that faster and still get the format we need in `nnn`?


I believe all these utilities must be memory bound, and having short `int`s instead of `float`s would mean less traffic on the bus.


no floating point arithmetic was still a bit of a coding mantra circa 1993.


I just downloaded `nnn` from apt and I am not seeing this performance at all. `ls` is blazingly faster than `nnn` on this machine. I am seeing a speedup over `ranger`, but both are taking quite a while for a directory with 196388 files in it.

  time /usr/bin/nnn -S /local

  /usr/bin/nnn -S /local  0.04s user 0.36s system 15% cpu 2.503 total

  time /usr/bin/ranger /local

  /usr/bin/ranger /local  0.31s user 0.02s system 12% cpu 2.667 total

  time /bin/ls /local

  /bin/ls /local  0.00s user 0.00s system 86% cpu 0.002 total
Perhaps this is related to the `$LS_COLORS` environment variable, as discussed in a few other comments? It is set on this machine.

  echo $LS_COLORS

  no=00:fi=00:di=34:ow=34;40:ln=35:...<many more entries>...:*.sqlite=34:


The version available on apt is older. The performance numbers are from master. I am on a device with SSD now and `nnn` master consistently gives a better performance in 3 runs.

Here are the numbers I see with $LS_COLORS set to:

    ex=00:su=00:sg=00:ca=00:

    nnn (still shows dir colors as it's designed to):
    0.00user 0.00system 0:00.01elapsed 33%CPU (0avgtext+0avgdata 3420maxresident)k
    0inputs+0outputs (0major+319minor)pagefaults 0swaps

    ls (no colors):
    0.00user 0.01system 0:00.02elapsed 60%CPU (0avgtext+0avgdata 3724maxresident)k
    0inputs+0outputs (0major+297minor)pagefaults 0swaps
And going deeper, with the same $LS_COLORS, see the difference in the number of calls (no wonder `nnn` outperforms `ls`):

    $ time strace -c /usr/local/bin/nnn /usr/bin | wc -l
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     82.55    0.004592           2      1988           newfstatat
      8.47    0.000471         118         4           getdents
      2.82    0.000157           3        58         1 stat
      1.89    0.000105         105         1           inotify_add_watch
      1.64    0.000091           9        10           brk
      1.33    0.000074           5        15           close
      0.23    0.000013           1        14           openat
      0.18    0.000010           3         4           write
      0.18    0.000010           0        22         2 ioctl
      0.13    0.000007           0        15           fstat
      0.13    0.000007           1        14           mmap
      0.13    0.000007           7         1           inotify_rm_watch
      0.11    0.000006           1         8           read
      0.11    0.000006           6         1         1 unlink
      0.07    0.000004           4         1           sysinfo
      0.05    0.000003           3         1           lseek
      0.00    0.000000           0         2           lstat
      0.00    0.000000           0        10           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         9           rt_sigaction
      0.00    0.000000           0         9         7 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
      0.00    0.000000           0         1           inotify_init1
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.005563                  2191        11 total
    0.03user 0.01system 0:00.06elapsed 74%CPU (0avgtext+0avgdata 3424maxresident)k
    2968inputs+0outputs (7major+541minor)pagefaults 0swaps
    0
    
    $ time strace -c ls -l /usr/bin | wc -l
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     32.03    0.004882           2      1989           lstat
     30.42    0.004636           2      1989      1989 lgetxattr
     22.45    0.003421           2      1548      1548 getxattr
      6.57    0.001002           2       442           readlink
      6.15    0.000938         235         4           getdents
      0.45    0.000068           2        33         5 openat
      0.43    0.000066           6        11           munmap
      0.31    0.000047           1        34           close
      0.30    0.000045           1        40           mmap
      0.26    0.000040           2        17           lseek
      0.20    0.000031           1        30           fstat
      0.10    0.000015           4         4         4 connect
      0.09    0.000014           3         5           brk
      0.07    0.000011           3         4           socket
      0.07    0.000010           0        31           write
      0.06    0.000009           9         1           mremap
      0.03    0.000005           0        17           read
      0.00    0.000000           0        20           mprotect
      0.00    0.000000           0         2           rt_sigaction
      0.00    0.000000           0         1           rt_sigprocmask
      0.00    0.000000           0         2         2 ioctl
      0.00    0.000000           0        12        12 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         2         2 statfs
      0.00    0.000000           0         1           arch_prctl
      0.00    0.000000           0         1           futex
      0.00    0.000000           0         1           set_tid_address
      0.00    0.000000           0         1           set_robust_list
      0.00    0.000000           0         1           prlimit64
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.015240                  6244      3562 total
    0.04user 0.06system 0:00.09elapsed 114%CPU (0avgtext+0avgdata 3676maxresident)k
    0inputs+0outputs (0major+521minor)pagefaults 0swaps
    1989


Very cool. Do you know what the timeline is for getting the contents of master into the version packaged for apt?

On the topic of `ls` making many more syscalls than `nnn`, it's interesting that `nnn` is using the `newfstatat` syscall. Is that capturing the information that the `lstat`, `lgetxattr`, and `getxattr` syscalls are getting for `ls`? If so, will that potentially limit this speed up to architectures that make use of the newer `newfstatat` syscall?


To get the last release: https://github.com/jarun/nnn/issues/217#issuecomment-4692263...

I am not sure when I will make the next release. I need to make sure it's stable enough through testing and that would take a while.

> newfstatat

Don't think so. For the basic info ls/nnn shows in the list mode, extended attributes are not strictly required.
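
As a rough illustration of that point, the basic columns of a listing (name, size, modification time) can be gathered with one stat-style call per entry and no extended-attribute lookups. This is a Python sketch, not `nnn`'s or `ls`'s code; `list_dir` is a hypothetical helper, and which syscall the interpreter issues underneath depends on the platform.

```python
import os
import tempfile


def list_dir(path):
    """Return sorted (name, size, mtime) tuples, one stat call per entry."""
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False gives lstat-like behaviour; no xattr
            # or readlink calls are made here.
            info = entry.stat(follow_symlinks=False)
            rows.append((entry.name, info.st_size, info.st_mtime))
    return sorted(rows)


# Usage on a throwaway directory containing one 5-byte file.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.txt"), "w") as fh:
        fh.write("hello")
    listing = list_dir(demo_dir)
print(listing[0][:2])
```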


That's nice; ls was fast enough for my personal use more than twenty years ago.

I don't have that many more files to list today, just some vastly larger ones.

Speaking of twenty years ago, that's about when some of the most stubborn programmers had quit the habit of using shifts instead of multiplies, and embraced floating-point.


`nnn` is not a replacement of `ls`, it's a file manager. This is a comparison of the speed while listing. Turned out we don't need floating point for a file manager, so we left that for some other useful work.
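
For readers wondering what a human-readable size without floating point can look like, here is a minimal sketch using only shifts, masks, and integer addition. It is not `nnn`'s `coolsize` implementation; `human_size`, the unit string, and the one-decimal output format are assumptions made for illustration.

```python
UNITS = "BKMGTPE"


def human_size(size):
    """Format a byte count with one decimal place using integers only."""
    unit = 0
    rem = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        rem = size & 0x3FF      # remainder of this division by 1024
        size >>= 10
        unit += 1
    if unit == 0:
        return f"{size}{UNITS[0]}"
    # Round rem/1024 to the nearest tenth without any float.
    tenths = (rem * 10 + 512) >> 10
    if tenths == 10:            # carry into the integer part
        size, tenths = size + 1, 0
        if size == 1024 and unit < len(UNITS) - 1:
            size, unit = 1, unit + 1
    return f"{size}.{tenths}{UNITS[unit]}"


for n in (0, 1023, 1536, 2047, 1048576):
    print(n, human_size(n))
```

Only the remainder of the last division is kept, so lower-order bits are ignored when rounding; that is a deliberate simplification in this sketch.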


This seems pretty cool, but why does the source code have to be contained within a single 4000 line file[0]? I'm fairly ignorant of the design patterns used in Linux tool development, but it seems like splitting this into a few more files could make the code a little easier to understand/maintain.

[0] https://github.com/jarun/nnn/blob/master/src/nnn.c


I wish some open source enthusiast offered to contribute in return for the re-factoring you are looking for ;). I am (generally speaking) the lone developer of this project for 2 years. And with ctags and cscope I don't find any problem in developing.


For comparison, the `ls` in GNU coreutils is a single 5,310 line C file[0], so I think nnn's codebase is pretty much par for the course (nnn has much more functionality than ls with 1K fewer lines of code). Splitting into different files makes more sense when you have a bunch of independent subsystems, but nnn is just a file manager. As someone who has dug through nnn's source code enough to track down a bug and propose a fix, I found it to be a pretty easy codebase to work with.

[0] https://github.com/coreutils/coreutils/blob/master/src/ls.c


I believe the concept of nnn is that all the source code is contained in one file. If you wish to add extensions onto nnn, you can then write a C file and include it in nnn.c


Extensions are (mostly) POSIX-compliant shell scripts (though they can be compiled binaries also). I have (kind of) explained in the main thread the reason I never felt the need of re-factoring the code into multiple files.


Ah, it was shell scripts I was thinking of.

For the record, the concept makes sense to me. I moved from ranger to nnn a while back, and the performance improvement that nnn provides really shows.

Thanks for all the hard work!


Thanks for the compliment!

The plugin-based mechanism fits perfectly with the fundamental design to keep the core file manager lean and fast, by delegating repetitive (but not necessarily file manager-specific) tasks to the plugins.


Without variance information, the numbers reported are pretty meaningless. A single run doesn't make a benchmark.
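
A minimal sketch of what reporting variance could look like: run the command several times and print the mean and standard deviation of the wall-clock time. `bench` is a made-up helper, and the command timed here is a placeholder (the Python interpreter doing nothing), not one of the programs measured in the thread.

```python
import statistics
import subprocess
import sys
import time


def bench(cmd, runs=10):
    """Run `cmd` `runs` times and return wall-clock samples in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return samples


# Placeholder command; swap in something like ["ls", "/usr/bin"].
samples = bench([sys.executable, "-c", "pass"], runs=5)
mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"mean {mean * 1000:.1f} ms, stdev {stdev * 1000:.1f} ms, "
      f"{len(samples)} runs")
```

The first run often pays for cold caches, so a warm-up run or a larger number of runs makes the spread more informative.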


I'll give a quick plug for hyperfine [0], which makes running a series of benchmarks really easy.

[0]: https://github.com/sharkdp/hyperfine


They were kind of consistent across multiple runs.


It would be nice to use some of these techniques to speed up * (wildcard) in bash.


Not sure they're relevant? * doesn't sort or filter by metadata, and since it's all strings touched only once I guess there's not much in the name of data structures to not paragraph-align?


What's the use case where you need 'ls' to be fast?


Interactive


timing should be amortized, when `nnn` (or similar) is on you have state and can think and operate infinitely faster


Absolutely! I personally use xfce4-terminal in drop-down mode to preserve my contexts over days.



