
Htop Explained - anderspitman
https://peteris.rocks/blog/htop/
======
for_xyz
The problem with htop on Linux is that once there are 200+ processes running
on the system, htop takes a significant portion of CPU time. This is because
htop has to go through each process entry in procfs (open, read, close) every
second and parse the text content, instead of just calling an appropriate
syscall, as on OpenBSD and friends, to retrieve the same information.

It would help if the kernel provided process information in binary form
instead of serializing it into text. Or, even better, it could provide
specific syscalls for it, as macOS, Windows, OpenBSD, Solaris and others do.
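
The per-refresh cost is easy to see. A rough sketch (Linux only) of how many procfs files a single htop scan has to open, read, and parse on this machine:

```shell
# every refresh, htop reads /proc/<pid>/stat (plus statm, status, ...)
# for each process, so the number of text files parsed per second is
# at least the number of live pids:
ls -d /proc/[0-9]*/stat | wc -l
```

Multiply by the per-file open/read/parse/close cost and by the refresh rate, and the overhead scales linearly with process count.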

~~~
herpderperator
Significant in what way? I created 400 processes + 328 threads on a 10-year-
old CPU and htop is not using more than 1.3% CPU on a machine with 800%
available CPU power (quad-core, 8-thread)[0]. That means 0.16% total CPU used.
While I agree that it is _less_ efficient than some other ways, in what way is
that _significant_?

[0] [https://i.imgur.com/onNSHQw.png](https://i.imgur.com/onNSHQw.png)

~~~
sanjayts
On a 64 core / 512GB RAM server-class machine with 2K tasks, around 20K
threads and 12% load, `htop` lags like crazy -- pretty much unusable.

------
shanecoin
One of my favorite things about htop are some of the projects that have been
created that are modeled after htop but focus on information other than system
resources.

Cointop [0] is one of these projects that comes to mind.

[0]
[https://github.com/miguelmota/cointop](https://github.com/miguelmota/cointop)

~~~
1996
Focusing on information does not require ncurses. Try:

elinks [http://cmplot.com/accessible-index.html](http://cmplot.com/accessible-index.html)

(the other parts require subscription)

~~~
pnako
Why is everything in scientific notation? This is about as sensible as the
concept of picodollars.

~~~
1996
The mantissa changes more from day to day than the exponent does, allowing
more information density for the relevant parts.

------
NelsonMinar
htop is an excellent tool. I appreciate his valiant effort to explain what
load average is; it's confused Unix users forever. His explanation is more or
less right but I think misses a bit of context about the old days of Unix
performance.

It used to be in the early 90s that Unix systems were quite frequently I/O
bound; disks were slow and typically bottlenecked through one bus and lots of
processes wanted disk access all the time. Swapping was also way more common.
Load average is more or less a quick count of all processes that are either
waiting on CPU or waiting on I/O. Either was likely in an old Unix system, so
adding them all up was sensible.

In modern systems your workload is probably very specifically disk-bound, or
CPU-bound, or network-bound. It's much better to actually look at all three
kinds of load separately. htop is a great tool for doing that! But the summary
load average number is not so useful anymore.
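
For reference, the summary number in question comes straight from a single procfs file (Linux; the values shown in the comment are illustrative):

```shell
# the first three fields are the 1-, 5- and 15-minute load averages:
cat /proc/loadavg                 # e.g. "0.42 0.35 0.30 1/523 12345"
cut -d' ' -f1-3 /proc/loadavg     # just the three averages
```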

~~~
codetrotter
> In modern systems your workload is probably very specifically disk bound, or
> CPU bound, or network bound. It's much better to actually look at all three
> kinds of load separately.

Linux recently got an interface called Pressure Stall Information that lets
you collect accurate measures of CPU, I/O and memory pressure.

[https://www.kernel.org/doc/html/latest/accounting/psi.html](https://www.kernel.org/doc/html/latest/accounting/psi.html)
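
The PSI files are plain text under /proc/pressure/ (kernels 4.20+). A sketch of pulling one field out of a line in that format; the sample values here are made up, and on a real system you would read /proc/pressure/cpu instead:

```shell
# PSI lines look like: "some avg10=... avg60=... avg300=... total=..."
psi_line='some avg10=1.53 avg60=0.12 avg300=0.22 total=123456'

# extract the 10-second average from that line:
echo "$psi_line" | tr ' ' '\n' | grep '^avg10=' | cut -d= -f2
```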

~~~
bluetech
PSI is a perfect fit for htop. I sent a PR to add it some time ago
([https://github.com/hishamhm/htop/pull/890](https://github.com/hishamhm/htop/pull/890))
but it hasn't been merged yet.

------
herpderperator
It should be noted that you shouldn't really do

> $ strace uptime 2>&1 | grep open

Instead, you should do `strace -e open uptime` to select that system call.
Like Useless Use of Cat[0], this could be considered a Useless Use of Grep.

[0]
[https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_...](https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_cat)

Edit: Heh, when I went back to continue reading the article this was mentioned
on the following line. Oops :)

~~~
ddevault
I disagree in this case; I think it's more Unix-style to use grep where
appropriate. You should learn a lot of generally applicable tools, not the
intricate details of a few specialized tools. The "useless cat" case can
likewise be handled with a general tool: instead of grep foo filename, write
grep foo < filename.
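
The same comparison in miniature, using a throwaway demo file (not from the article):

```shell
printf 'foo\nbar\n' > /tmp/uuoc_demo.txt

cat /tmp/uuoc_demo.txt | grep foo   # "useless use of cat": an extra process
grep foo < /tmp/uuoc_demo.txt       # same result via shell redirection
grep foo /tmp/uuoc_demo.txt         # grep can also open the file itself

rm /tmp/uuoc_demo.txt
```

All three print `foo`; the disagreement is only about which one to teach.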

~~~
herpderperator
That's like saying you can just use `find . | grep ... | wc -l` and then
learning the hard way that filenames can contain newlines. While I agree you
should learn lots of general tools, you should not stop there. If you have a
particular need, consult the manpage; it is one of the ways you become better.
In the htop example it might be fine as a quick-and-dirty method, but when
teaching others, as in this particular blog post, you should do it the right
way.
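
The newline pitfall is easy to reproduce (a sketch using a throwaway directory):

```shell
# one file whose name contains a newline inflates the line count:
mkdir -p /tmp/nl_demo && cd /tmp/nl_demo
touch "$(printf 'one\ntwo')"

find . -type f | wc -l                          # line count: reports 2
find . -type f -print0 | tr -dc '\0' | wc -c    # NUL-terminated names: 1

cd / && rm -rf /tmp/nl_demo
```

`-print0` terminates each name with a NUL byte, which cannot appear in a filename, so counting NULs gives the true file count.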

~~~
lixtra
If your pipeline is meant to count files then the bigger problem is that it
counts directories too.

If you have filenames with newlines you may have other - less nasty - stuff
too.

So you either get a reasonable answer in 1 s (or 2 s if you have a short look
at the output before counting), or you spend an hour or more discussing
requirements and carefully writing a program that gives a precise answer.

Things to consider:

- how to treat symlinks

- how to treat hardlinks

- what if files are added/removed to/from the directory tree while you scan

- how to react to missing read permissions

- is your regex on the whole path or just on the basename

- ...

~~~
herpderperator
Well, in Unix, everything is a file[0], even directories. So when I mentioned
filenames, this includes directories. :-)

As a real-world example for those who are curious, running mkdir twice yields
errno 17, EEXIST, which is "File exists"[1]:

    root@vbox:~# mkdir directory
    root@vbox:~# mkdir directory
    mkdir: cannot create directory ‘directory’: File exists

But sure, the fast way could be mostly right, and maybe your goal is just to
get it done rather than to get better at these tools; in that case, fair
enough.

[0]
[https://en.wikipedia.org/wiki/Everything_is_a_file](https://en.wikipedia.org/wiki/Everything_is_a_file)

[1]
[https://elixir.bootlin.com/linux/latest/source/include/uapi/...](https://elixir.bootlin.com/linux/latest/source/include/uapi/asm-generic/errno-base.h#L21)

------
anderspitman
This article is about htop but explains tons of useful Linux commands and
idiosyncrasies along the way.

~~~
ehsankia
Yeah, honestly, htop is just used as an excuse to talk about much, much more.
And I like how the author walks us through their thought process along the
way.

------
redsparrow
> I decided to look everything up and document it here.

You can be a hero, too! I find this inspiring. It's nice to see such an
accessible and pragmatic way of making a contribution to the community. My
very first thought on seeing that was "I could do that!"

Thank you.

------
aljarry
I regret that neither top nor htop shows an estimate of "I/O activity time"
like the Windows 10 task manager does; I need to use a separate iostat to
observe that.

I found "I/O activity time", the percentage of time the I/O subsystem is in
use, to be a really good indicator of I/O load at the machine level; neither
I/O operations per second nor bandwidth tells you much if you're already using
all available I/O. Load does not help here either, as the number of processes
doing I/O influences "load" more.

~~~
microcolonel
On Linux, that information is privileged, IIRC. So since most people run htop
as an ordinary user, they wouldn't see it anyway.

~~~
aljarry
Yeah, you need root or sudo to run iotop. But some information is available
without elevation (at least on a fresh Ubuntu), like the per-disk output from
iostat.

------
jszymborski
The one thing I don't get about htop is the progress bars... they never seem
to behave the way I'd expect them to based on the percentages, and they've got
some colour coding I'm not clear on either... surely there is something I'm
missing.

~~~
catalogia
Wait, which progress bars? Is it using SIGINFO or something?

~~~
jszymborski
Progress Bars was maybe the wrong term, I guess they're more "meters" since
they're used for CPU and RAM usage.

------
samfriedman
>There will be a lot of output. We can grep for the open system call. But that
will not really work since strace outputs everything to the standard error
(stderr) stream. We can redirect the stderr to the standard output (stdout)
stream with 2>&1.

Besides being a great explanation of htop, I like the way this article
captures the way I - far from a shell guru - tend to think when putting
together a few steps in the terminal. And even then it shows that it pays to
read the man page too!
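
The stderr/stdout distinction the author runs into can be demoed without strace at all (a minimal sketch):

```shell
# a function that writes one line to each stream:
demo() { echo "to stdout"; echo "to stderr" >&2; }

demo 2>/dev/null | grep stderr   # no match: the pipe carries only stdout
demo 2>&1 | grep stderr          # match: 2>&1 merges stderr into the pipe
```

strace writes its trace to stderr precisely so that it doesn't mix with the traced program's own stdout, which is why the article needs the redirect before grep can see anything.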

------
zaphar
I am convinced that load average on a machine is one of the most misleading
statistics you can view. It never means what you think it means and half the
time it doesn't even indicate a real problem when it's high.

------
vesinisa
> One process runs for a bit of time, then it is suspended while the other
> processes waiting to run take turns running for a while. The time slice is
> usually a few milliseconds so you don't really notice it that much when your
> system is not under high load. (It'd be really interesting to find out how
> long time slices usually are in Linux.)

Isn't this the famous kernel HZ? It was originally 100 (interrupts/second),
but nowadays often 250 or 1000:

[http://man7.org/linux/man-pages/man7/time.7.html](http://man7.org/linux/man-pages/man7/time.7.html)

~~~
bri3d
It’s much more complex than that these days. With the CFS scheduler, a process
will run for somewhere between the target latency (basically the size of the
slice that N processes waiting to be scheduled compete for; I think it
defaults to 20ms) as the upper bound and the minimum granularity (the smallest
slice that may be granted to a process being scheduled; I think it defaults to
4ms) as the lower bound.

This is made more complex by the ability to configure the scheduler with high
granularity, including the ability to schedule different processors and
process groups with different schedulers (and the rules that govern how the
schedulers can then preempt each other).
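
A back-of-the-envelope sketch of that bound, using the (assumed) defaults quoted above:

```shell
# assumed defaults: 20 ms target latency shared among N runnable
# tasks, floored at a 4 ms minimum granularity
target_ms=20
min_ms=4
n=10                                          # runnable tasks competing

slice=$(( target_ms / n ))                    # 20 / 10 = 2 ms ...
[ "$slice" -lt "$min_ms" ] && slice=$min_ms   # ... floored to 4 ms

echo "${slice} ms"                            # prints "4 ms"
```

With few runnable tasks the target latency dominates; once enough tasks compete, every slice bottoms out at the minimum granularity and the scheduling period stretches instead.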

~~~
vesinisa
Still, no scheduler can operate on a granularity finer than the fundamental
kernel tick, which is by default 4 ms (1000 ms / 250 ticks), as you said.

------
jmercha
Fascinating! I never knew I needed a htop t-shirt until I read this article!

------
jasoneckert
Excellent htop tutorial - I'll be sharing the link with my students!

------
carlchenet
Awesome article. This HN thread should indicate it was originally published on
HN in 2016; as mentioned on the front page, it was "#1 on Hacker News on
December 2, 2016".

~~~
helb
here:
[https://news.ycombinator.com/item?id=13087904](https://news.ycombinator.com/item?id=13087904)

------
zepearl
Nice explanations.

Btw, an additional recommendation (especially when a lot of CPUs and/or disks
are involved and you want to keep an eye on multiple things at once): nmon

------
pmoriarty
Is there any way to make it so that htop doesn't clear the screen when it
exits?

Seeing the last page of htop's output after it exits is usually useful to me.

~~~
Operyl
You could pause terminal output with Ctrl-S (XOFF) and resume it with Ctrl-Q
(XON)?

------
rajandatta
Excellent article and very well done in-depth work.

------
kccqzy
This is a great explanation of everything in the output of htop and related,
but I suggest the author clean up the prose a little bit to make it a bit less
conversational and easier to read.

------
Teknoman117
Another of my favorites is glances.

------
ngcc_hk
Just to say great. Thanks thanks

------
butuzov
It's like a good book.

------
sytelus
ooh... don't miss the t-shirt!

------
Jahak
interesting article, thank you

