Htop Explained (peteris.rocks)
818 points by anderspitman 17 days ago | 98 comments

The problem with htop on Linux is that once there are 200+ processes running on the system, htop takes a significant share of CPU time. This is because htop has to go through each process entry in procfs (open, read, close) every second and parse the text content, instead of just calling an appropriate syscall to retrieve that information, as on OpenBSD and friends.
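To make that per-process cost concrete, here is a minimal Python sketch of the open/read/parse cycle a top-like tool repeats for every process on every refresh. It assumes the field layout of `/proc/[pid]/stat` documented in proc(5); the names `ProcStat`, `parse_stat` and `scan_all` are hypothetical, and this is not htop's actual C code. Note that the command name sits in parentheses and may itself contain spaces or `)`, so the parser splits on the last `)`.

```python
import os
from typing import List, NamedTuple


class ProcStat(NamedTuple):
    pid: int
    comm: str
    state: str
    ppid: int
    utime: int  # user-mode CPU time, in clock ticks
    stime: int  # kernel-mode CPU time, in clock ticks


def parse_stat(line: str) -> ProcStat:
    """Parse one /proc/[pid]/stat line (field layout as described in proc(5))."""
    # comm is "(name)" and may contain spaces or ')', so split on the LAST ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    rest = line[rpar + 1:].split()
    # rest[0] is field 3 (state), so field k lives at rest[k - 3]:
    # ppid is field 4, utime field 14, stime field 15.
    return ProcStat(pid, comm, rest[0], int(rest[1]), int(rest[11]), int(rest[12]))


def scan_all() -> List[ProcStat]:
    """One refresh: open, read and close /proc/[pid]/stat for every process."""
    stats = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{name}/stat") as f:  # one open/read/close per process
                stats.append(parse_stat(f.read()))
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited between listdir() and open()/read()
    return stats
```

With a few thousand processes and threads, that loop means thousands of syscalls plus text parsing on every refresh, which is the overhead being discussed here.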

It would help if the kernel provided process information in binary form instead of serializing it into text. Or, even better, provided specific syscalls for it, as on macOS, Windows, OpenBSD, Solaris and others.

Significant in what way? I created 400 processes + 328 threads on a 10-year-old CPU and htop is not using more than 1.3% CPU on a machine with 800% available CPU power (quad-core, 8-thread)[0]. That means 0.16% total CPU used. While I agree that it is _less_ efficient than some other ways, in what way is that _significant_?

[0] https://i.imgur.com/onNSHQw.png
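The numbers above follow from the usual way per-process CPU% is derived: the change in `utime + stime` (in clock ticks) between two refreshes, divided by the ticks that elapsed. A minimal sketch, assuming a tick rate of 100 per second (on a real system you would pass `os.sysconf("SC_CLK_TCK")`); the helper name `cpu_percent` is hypothetical and htop's exact formula may differ in detail.

```python
def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck=100, ncpus=1):
    """CPU% for one process from two samples of utime + stime (clock ticks).

    clk_tck: ticks per second (assumed 100 here; query SC_CLK_TCK on a real
    system). With ncpus=1 the result is relative to one core, so a busy
    multithreaded process can exceed 100%; dividing by the number of logical
    CPUs gives the share of the whole machine instead.
    """
    cpu_seconds = (ticks_after - ticks_before) / clk_tck
    return 100.0 * cpu_seconds / interval_s / ncpus


# 13 ticks over a 10 s window is 1.3% of one core, or ~0.16% of 8 logical CPUs.
per_core = cpu_percent(0, 13, 10.0)
whole_machine = cpu_percent(0, 13, 10.0, ncpus=8)
print(f"{per_core:.2f}% of one core, {whole_machine:.4f}% of the machine")
```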

On a 64 core / 512GB RAM server-class machine with 2K tasks, around 20K threads and 12% load, `htop` lags like crazy -- pretty much unusable.

On my system there are 240 tasks running with ~1700 threads. Htop is using 6% of a single core with the cgroup column disabled and 9% with it enabled. It spends most of its time in kernel space, so it's not htop's fault.

> While I agree that it is _less_ efficient than some other ways, in what way is that _significant_?

This gets noticeable when every connected user/session runs htop.

Try many small or tiny processes, and soon htop overhead becomes significant (up to 5% total utilization in my situation), that's not exactly efficient.

I remember way back in the early years of the Third Age, I had to write a process accounting system that supported all kinds of Unices (HP-UX, AIX, Solaris, FreeBSD, Linux ...). And you're right, there's a plethora of options other than procfs, although IIRC, Linux wasn't the only one I had to support with that variant. Wasn't Solaris one of them?

I will say, htop is a lot more efficient than GNU top, despite its functionality. I do not use it in the (default?) mode where it lists all the threads, because that is nuts.

Traditional Unix implementations of ps and similar tools work by directly reading the appropriate data structures from the kernel (through /dev/kmem or something to that effect). Modern BSD implementations have libkvm.a, which abstracts that somehow, but still reads directly from kernel space.

I don't know, but I don't see doing random reads from kernel memory as a particularly sane API for getting a list of processes, and procfs is a several orders of magnitude cleaner solution.

Getting a list of numbers from the kernel by serialising them as text in a pretend filesystem is a several orders of magnitude cleaner solution?

> Traditional Unix implementation of ps and similar tools work by directly reading the appropriate data structures from kernel (through /dev/kmem or something to that effect).

This is not correct - /dev/kmem and similar are typically only readable by root. If what you say were correct, ps and friends wouldn't work for unprivileged users (unless they were setuid root, which they're not).

Some versions of the *BSDs probably fixed that with a sysctl interface, but on older Unixes you would read kernel memory to get that system information.

You can see a ps.c implementation here: https://searchcode.com/codesearch/view/29853364/

which uses kvm library: https://www.freebsd.org/cgi/man.cgi?query=kvm&sektion=3&apro...

FWIW, in FreeBSD libkvm also uses the sysctl interface, it doesn’t read kernel memory directly.

It's not meant to be used 24/7; it's just a guide/diagnostic for how the machine and its processes are performing! I use it for maybe 10 seconds.

Some people use tmux when connecting to their virtual machines on the server and forget to quit their htop instances. This adds to the total CPU utilization really quickly.

Well, you are right (and top has the same problem, if not worse), but I'm always surprised by what these tools are doing that makes them take significant CPU time. Parsing 200 virtual text files every second surely should not.

> Or even better to provide specific syscalls for it like on macOS, Windows, OpenBSD, Solaris and others.

top manages to take 6% CPU on my Macbook.

I have wondered if it would make sense to rewrite these top-style tools using eBPF https://lwn.net/Articles/740157/ but I think eBPF requires root.

Lennart, we know it's you.

One of my favorite things about htop are some of the projects that have been created that are modeled after htop but focus on information other than system resources.

Cointop [0] is one of these projects that comes to mind.

[0] https://github.com/miguelmota/cointop

intel_gpu_top helped me solve a mysterious performance issue on a MacBook after countless hours of fruitless investigation. Overheating and throttling were an issue, but even after I fixed them the system would lag hard: instantly when I used the external 4k display, and after a while on the internal 1440p screen. Turns out cool-retro-term was maxing out the integrated Intel GPU, which caused the entire system to stutter and lag.

Unfortunately both the MBP and my current XPS 15 are unable to drive cool-retro-term on a 4k display with the CPU integrated graphics, and they both overheat and throttle if I use the nvidia graphics card :/

It's a really cool terminal though: https://github.com/Swordfish90/cool-retro-term

It's amazing that we think it's a good idea to pack powerful hardware into laptops that are too thin to actually make use of that hardware.

Laptops have very poor cooling. I have a Clevo laptop with a great processor but it will sometimes throttle itself to cool down. Great for small bursts of activity such as compilation but I don't understand how they could market these laptops as gaming machines. Running ffmpeg stabilizes the temperature at a healthy 96 degrees.

It's more amazing to me that this modern powerful hardware can't emulate technology from 1983 without overheating.

Modern powerful hardware has a hard time emulating a glass of water with good fidelity. Reproducing physical effects like ghosting is often harder than it looks.

There are a lot of different workloads that may not heat the GPU as much. Also, Windows might have better thermal management in its drivers.

CPU-wise, Intel defines TDP as the average heat dissipation, but the CPU can boost higher than this. From what I understand, though, they tell manufacturers to design to the TDP.

Most importantly, nvtop: https://github.com/Syllo/nvtop

"NVIDIA GPUs htop like monitoring tool"

Shameless plug: aria2p. I built an interactive interface very similar to htop to see your aria2 downloads progress.


Curious to see more examples


Well, it has "top" in the name. ^__~ I would say that jnettop is more similar to nethogs than htop...

I know iotop exists, but I've never used it.

htop can do all (most?) of what iotop can.

Press S in htop and you can select to show i/o-related data, including number of bytes and number of operations in total and per second.

You will need to be root to look at most i/o related data.
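A minimal sketch of how such per-second I/O figures can be computed, assuming they come from the kernel's per-process `/proc/[pid]/io` counters (`rchar`, `wchar`, `syscr`, `syscw`, `read_bytes`, `write_bytes`, `cancelled_write_bytes`). Reading that file for another user's process typically needs elevated privileges, which fits the comment above. The helper names `parse_proc_io` and `io_rates` are hypothetical, and the sample values in the usage lines are made up.

```python
def parse_proc_io(text):
    """Parse /proc/[pid]/io ("key: value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            counters[key.strip()] = int(value)  # int() tolerates surrounding spaces
    return counters


def io_rates(before, after, interval_s):
    """Per-second deltas for every counter present in both samples."""
    return {k: (after[k] - before[k]) / interval_s for k in before.keys() & after.keys()}


# Two made-up samples taken 2 seconds apart (on Linux: read /proc/<pid>/io twice).
sample_a = parse_proc_io("rchar: 1000\nwchar: 500\nread_bytes: 4096\nwrite_bytes: 0\n")
sample_b = parse_proc_io("rchar: 3000\nwchar: 900\nread_bytes: 8192\nwrite_bytes: 0\n")
print(io_rates(sample_a, sample_b, 2.0))
```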

Have any more examples?

Focusing on information does not require ncurses. Try:

elinks http://cmplot.com/accessible-index.html

(the other parts require subscription)

Why is everything in scientific notation? This is about as sensible as the concept of picodollars.

The mantissa changes more from day to day than the exponent, allowing more information density for the relevant parts.

To me a browser for this seems like overkill, but I can understand the argument that "everyone already has a browser open", even if I don't think that it leads to good places.

It doesn't require, but damn, my life would be so bleak without ncurses.

htop is an excellent tool. I appreciate his valiant effort to explain what load average is; it's confused Unix users forever. His explanation is more or less right but I think misses a bit of context about the old days of Unix performance.

It used to be in the early 90s that Unix systems were quite frequently I/O bound; disks were slow and typically bottlenecked through one bus and lots of processes wanted disk access all the time. Swapping was also way more common. Load average is more or less a quick count of all processes that are either waiting on CPU or waiting on I/O. Either was likely in an old Unix system, so adding them all up was sensible.

In modern systems your workload is probably very specifically disk bound, or CPU bound, or network bound. It's much better to actually look at all three kinds of load separately. htop is a great tool for doing that! but the summary load average number is not so useful anymore.
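For reference, the Linux figure is not an instantaneous count: roughly every 5 seconds the kernel samples the number of runnable plus uninterruptible (typically I/O-waiting) tasks and folds it into exponentially damped moving averages over 1, 5 and 15 minutes. Below is a floating-point sketch of that update, simplified from the kernel's fixed-point `calc_load()`; the function names here are hypothetical.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # Linux updates the load averages roughly every 5 seconds


def decay_factor(window_minutes):
    """Per-sample decay e = exp(-5 s / window); the kernel keeps this in fixed point."""
    return math.exp(-SAMPLE_INTERVAL_S / (window_minutes * 60.0))


def update_load(load, active_tasks, window_minutes):
    """One sampling step: load = load * e + n * (1 - e)."""
    e = decay_factor(window_minutes)
    return load * e + active_tasks * (1.0 - e)


# Four tasks stay runnable for one minute on a previously idle machine.
loads = {1: 0.0, 5: 0.0, 15: 0.0}
for _ in range(12):  # 12 samples x 5 s = 60 s
    loads = {window: update_load(value, 4, window) for window, value in loads.items()}
print({window: round(value, 2) for window, value in loads.items()})
```

After that minute the sketch gives roughly 2.53, 0.73 and 0.26: the 1-minute figure reacts quickly, while the 15-minute one barely moves.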

> In modern systems your workload is probably very specifically disk bound, or CPU bound, or network bound. It's much better to actually look at all three kinds of load separately.

Linux recently got an interface called Pressure Stall Information that lets you collect accurate measures of CPU, I/O and memory pressure.
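The PSI files (`/proc/pressure/cpu`, `/proc/pressure/io`, `/proc/pressure/memory`) are plain text: one line per `some`/`full` category, holding the percentage of time stalled over 10 s, 60 s and 300 s windows plus a cumulative `total` stall time in microseconds. A minimal parser sketch, assuming that documented format; the function name `parse_psi` is hypothetical and the sample text is made up.

```python
def parse_psi(text):
    """Parse a /proc/pressure/* file into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *pairs = line.split()  # e.g. "some avg10=1.50 avg60=0.75 ..."
        fields = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # avg* are percentages; total is cumulative stall time in microseconds
            fields[key] = int(value) if key == "total" else float(value)
        result[kind] = fields
    return result


sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)
print(parse_psi(sample)["some"]["avg10"])
```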


PSI is a perfect fit for htop. I sent a PR to add it some time ago (https://github.com/hishamhm/htop/pull/890) but it hasn't been merged yet.

Kernel 4.20, so probably not yet available on many production systems.


- Horribly under-specced machine (8GB RAM) with way too many (~150) tabs open -> continuous swap stalls that last for minutes at a time and freeze the mouse cursor: 5-second load average of 12-20

- 6-year-old Android phone doing tasks in the background, generally feeling more than adequately performant, and lukewarm enough that you aren't sure if it's your hand or the phone generating the warmth: load average of 12-13

- Building Linux with -j64 to see what would happen: 5-second load average of 0.97... 0.99... 49.40... 127.21... ^C^C^C^C^C^C^C^C^C 180.66... ^C^C^C^C^C^CCCCCCC^^^CCcccCCcCC 251.22... ^C^C^C^C^C^C ^C^C^C^C ^C^C^C ^C^C^C^C^C^C ^^^CCCCC^C^C^C (mouse finally moves a few pixels, terminal notices first ^C) 245.55... 220.00... 205.42... 198.94...

- Resuming from hibernation with 20GB of data already in swap: 0.21... 0.15... 0.10... 251.50... 280.12... 301.69... 362.22... 389.91... 402.40... 308.56... 297.21... 260.66... 254.99... (etc; this one takes a while)

It should be noted that you shouldn't really do

> $ strace uptime 2>&1 | grep open

Instead, you should do `strace -e open uptime` to select that system call. Like Useless Use of Cat[0], this could be considered a Useless Use of Grep.

[0] https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_...

Edit: Heh, when I went back to continue reading the article this was mentioned on the following line. Oops :)

These are two different commands. "strace | grep foo" looks for any line containing foo. It will find "foobar" and "food" system calls. It will find the word "foo" in the (abridged) string data that it prints out (read(..., "foo...", ...)).

Meanwhile, strace's -e filter will find exact matches of syscalls that are named on the command line.
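The difference shows up on a few sample lines of strace output (made up here for illustration): a substring match for `open` also hits `openat` and any string argument that happens to contain "open", while an exact match on the syscall name, roughly what `-e` does, only hits `open` itself. The helper names are hypothetical, and this only emulates the filtering after the fact; real `strace -e` filters while tracing.

```python
import re

SAMPLE = [
    'openat(AT_FDCWD, "/proc/uptime", O_RDONLY) = 3',
    'open("/etc/hosts", O_RDONLY) = 4',
    'read(3, "reopened log\\n", 4096) = 13',
    'close(3) = 0',
]

SYSCALL_NAME = re.compile(r"^(\w+)\(")  # the syscall name starts each line


def grep_like(lines, needle):
    """What `strace ... 2>&1 | grep needle` keeps: any line containing the text."""
    return [line for line in lines if needle in line]


def exact_syscall(lines, names):
    """Roughly what `strace -e trace=NAMES` keeps: lines whose syscall name matches exactly."""
    kept = []
    for line in lines:
        m = SYSCALL_NAME.match(line)
        if m and m.group(1) in names:
            kept.append(line)
    return kept


print(len(grep_like(SAMPLE, "open")), "lines via substring match")
print(len(exact_syscall(SAMPLE, {"open"})), "line via exact name match")
```

The same exact matching explains why `strace -e open` can print nothing on a system whose libc only issues `openat`.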

Obviously the author wants the second one, but it is hardly useless to grep. And, once you know how to use grep, you know how to do this sort of search for every program ever written. It is nice that strace has its own budget string matching built in... but knowing how strace works only tells you how strace works. Knowing how grep works lets you do this sort of thing to any program that outputs text.

(A lot of work has been done to try and make something as good as "strace -e" but generically; for example, Powershell's Select-Object cmdlet. I have never managed to get that to do anything useful, but they did manage to break grep, erm ... Select-String, in the process. Great work!)

As I have mentioned in another reply, just because you know grep does not mean you should stop there. Especially when teaching others, you should find the optimal way and mention that you _could_ also use grep if you were in a rush.

You could always do things the quick-and-dirty way, but does that help you grow as a programmer? You could write Python code that looks like C, like many people do when they come from a C background, or you could learn how to write Pythonic code by reading the documentation and examples.

> Knowing how grep works lets you do this sort of thing to any program that outputs text.

It's worth noting that I could say the same about strace. Once you know strace, you could run it against any program that uses system calls, which by the way, is many. :-)

I disagree in this case, I think it's more Unix-style to use grep when appropriate. You should learn a lot of generally applicable tools, not the intricate details of a few specialized tools. The "useless cat" case can be represented as, instead of grep foo filename, grep foo < filename.

That's like saying you can just use `find . | grep ... | wc -l` and then learning the hard way that you can have newlines in filenames. While I agree you should learn lots of general tools, you should not stop there. If you have a particular need you should consult the manpage; it is one of the ways you become better. In the htop example it might be fine as a quick-and-dirty method, but when teaching others like through this particular blog post you should do so the right way.
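A small demonstration of that pitfall in Python, using a throwaway temporary directory with a made-up layout: joining paths with newlines and counting lines (the `find . | wc -l` approach) over-counts a file whose name contains a newline, and, like `find .` without `-type f`, counts directories and the starting directory too. Walking the tree and treating each name as an opaque string avoids both. The helper names are hypothetical; this is an illustration, not a replacement for `find -print0`-style tooling, and the demo tree assumes a filesystem that allows newlines in names (e.g. Linux or macOS).

```python
import os
import tempfile


def newline_split_count(root):
    """Mimic `find . | wc -l`: list root plus every entry, join with newlines, count lines."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths += [os.path.join(dirpath, n) for n in dirnames + filenames]
    return len("\n".join(paths).split("\n"))


def regular_file_count(root):
    """Count files only; os.walk already lists directories separately."""
    return sum(
        1
        for dirpath, _dirs, filenames in os.walk(root)
        for name in filenames
        if os.path.isfile(os.path.join(dirpath, name))
    )


def build_demo_tree(root):
    """Create root/subdir with one ordinary file and one file with a newline in its name."""
    os.mkdir(os.path.join(root, "subdir"))
    for name in ["plain.txt", "two\nlines.txt"]:
        open(os.path.join(root, "subdir", name), "w").close()


with tempfile.TemporaryDirectory() as demo_root:
    build_demo_tree(demo_root)
    print(newline_split_count(demo_root), regular_file_count(demo_root))
```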

If your pipeline is meant to count files then the bigger problem is that it counts directories too.

If you have filenames with newlines you may have other - less nasty - stuff too.

So you either get a reasonable answer in 1s (or 2s if you take a short look at the output before counting), or you spend 1h+ discussing requirements and carefully writing a program that gives a precise answer.

Things to consider:

- how to treat symlinks

- how to treat hardlinks

- what if files are added/removed to/from the directory tree while you scan

- how to react to missing read permissions

- is your regex on the whole path or just on the basename

- ...

Well, in Unix, everything is a file[0], even directories. So when I mentioned filenames, this includes directories. :-)

As a real-world example for those who are curious, running mkdir twice yields errno 17, EEXIST, which is "File exists"[1]:

  root@vbox:~# mkdir directory
  root@vbox:~# mkdir directory
  mkdir: cannot create directory ‘directory’: File exists
But sure, the fast way could be mostly right, and maybe your goal is to just get it done and not get better at these tools, in which case, sure.

[0] https://en.wikipedia.org/wiki/Everything_is_a_file

[1] https://elixir.bootlin.com/linux/latest/source/include/uapi/...
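The same errno surfaces outside the shell. For example, in Python a second `os.mkdir` on the same path raises `FileExistsError`, whose `errno` is `EEXIST` (17 on Linux), and `os.strerror` returns the same "File exists" text that mkdir printed above. A minimal sketch using a temporary directory; the helper name `mkdir_twice` is hypothetical.

```python
import errno
import os
import tempfile


def mkdir_twice(base):
    """Create base/directory twice; return (errno, message) from the second attempt."""
    path = os.path.join(base, "directory")
    os.mkdir(path)
    try:
        os.mkdir(path)  # same path again: the kernel reports EEXIST
    except FileExistsError as exc:
        return exc.errno, os.strerror(exc.errno)
    return None


with tempfile.TemporaryDirectory() as base:
    code, message = mkdir_twice(base)
    print(code == errno.EEXIST, code, message)
```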

> learning the hard way that you can have newlines in filenames

Speaking as someone who is using Unix-like systems since the days of Santa Cruz Operation and Linux on a 50 MB disk partition - not everything that is permissible is auspicious.

And most often that's because the permissions were too lax to begin with. That's what you get when a bunch of pot-smoking hippies (X) draw your OS specifications.

I avoid using even spaces in file names, for this specific reason.


(X) - and I say that in the best way possible, though some may be inclined to disagree.

"...he and brethren were attempting to make a small fortune for themselves by secreting away an item - a printed book - that was easy enough to get in 1640 but near impossible to get in Tristan's age, making it of great worth."

"So you're thieves and chancers," said I approvingly.

"No," he objected.

-- The Rise And Fall Of D.O.D.O. by Neal Stephenson and Nicole Galland

I'm sympathetic to this view, but the 'strace' flags are worth learning. Something like

    strace -e trace=process

is very handy; the equivalent grep pattern would be impossibly difficult to write, as well as OS-dependent and prone to obsolescence over time.

You might need to cover openat too. I think newer glibc uses the openat(AT_FDCWD,...) syscall for all open calls.

If the cat or grep makes the command more clear or easier to write then it's not useless.

Agreed, but you should also keep in mind that this means extra work. Often the difference is negligible. But if you, for example, strace a process that makes a lot of syscalls and pipe that into grep, strace still has to format and write out every line, only for grep to discard most of them, which may actually affect performance a lot; strace -e avoids this extra work.

I tried both, but the `-e open` option doesn't return anything

  $ strace uptime 2>&1 | grep open
  openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
  openat(AT_FDCWD, "/proc/uptime", O_RDONLY) = 3

  $ strace -e open uptime         
  10:12:14 up 2 days, 18:52,  1 user,  load average: 1.22, 1.30, 1.71
  +++ exited with 0 +++

  strace -- version 5.3
  Optional features enabled: stack-trace=libunwind stack-demangle m32-mpers mx32-mpers

After digging in the documentation, I finally found something working:

  $ strace -e trace=openat uptime

EDIT: got it. My system calls are openat, not open. Hence the following works:

  $ strace -e openat uptime

> strace -e file uptime

Is probably easier to remember. You can filter from there.

strace -fe file,network

This article is about htop but explains tons of useful Linux commands and idiosyncrasies along the way.

Yeah, honestly htop is just used as an excuse to talk about much much more. And I like how the author works their thought process along the way.

This is such a great article - I’ve worked with Linux for more than a decade and never really understood what “setuid root” actually meant, or that “kill” is a Bash builtin.

> I decided to look everything up and document it here.

You can be a hero, too! I find this inspiring. It's nice to see such an accessible and pragmatic way of making a contribution to the community. My very first thought on seeing that was "I could do that!"

Thank you.

I regret that neither top nor htop shows you an estimate of "IO Activity time" like the Windows 10 Task Manager does; I need to use a separate iostat to observe that.

I found the "IO Activity time", the percentage of time during which IO is in use, to be a really good machine-level indicator of IO load. Neither IO operations per second nor bandwidth tells you much about whether you're already using up all available IO. Load does not help here either, as the number of processes doing IO influences "load" more.
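On Linux, a comparable "percentage of time the device was busy" figure (what iostat reports as %util) can be computed from `/proc/diskstats`: the 13th whitespace-separated field on each line is the cumulative number of milliseconds the device spent doing I/O, so its delta divided by the elapsed wall-clock milliseconds approximates the busy fraction. A minimal sketch under that assumption; the sample lines are made up, the helper names are hypothetical, and the result is only an approximation of what Windows calls "active time".

```python
def io_busy_ms(diskstats_text, device):
    """Return the cumulative "milliseconds spent doing I/Os" for one device."""
    for line in diskstats_text.splitlines():
        parts = line.split()
        # parts[0:3] = major, minor, device name; parts[12] = ms spent doing I/O
        if len(parts) >= 13 and parts[2] == device:
            return int(parts[12])
    raise KeyError(device)


def util_percent(busy_before_ms, busy_after_ms, interval_ms):
    """Share of the interval during which the device had I/O in flight, capped at 100%."""
    return min(100.0, 100.0 * (busy_after_ms - busy_before_ms) / interval_ms)


# Made-up /proc/diskstats excerpt; on Linux, read the file twice, interval_ms apart.
sample = (
    "   8       0 sda 1000 20 50000 3000 800 40 60000 4000 0 5000 7000\n"
    "   8       1 sda1 900 10 40000 2500 700 30 50000 3500 0 4500 6000\n"
)
print(util_percent(io_busy_ms(sample, "sda"), 5750, 1000))
```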

To be fair it’s a far worse problem on Windows (although I guess a lot of my Linux machines are running from ram now so maybe I wouldn’t notice if it weren’t.)

You also need something like `nethogs` to view network IO, I wish there were a colored, htop-like utility for that as well.

In FreeBSD top(1) does show the IO statistics - just press “m”.

On Linux, that information is privileged, IIRC. so since most people use htop as an ordinary user, they wouldn't see that anyway.

Yeah, you need root or sudo to view iotop. But some information is (at least on some fresh ubuntu) available without elevation, like the per-disk output from iostat.

Where exactly can I find this in Windows' Task Manager? I don't see it as a column on the Details pane...

It's either on the Performance tab (per disk), where the big chart is "active time", or per process on the Processes tab, in the "Disk" column: the color reflects active time, while the bandwidth is shown as text.

I'm a fan of iotop for monitoring various IO stats in a top-like interface.

pidstat is also good at showing IO Activity, per process.

The one thing I don't get about htop is the progress bars... they never seem to behave the way I'd expect them to based on the percentages, and they've got some colour coding I'm not clear on either... surely there is something I'm missing.

The bit you’re missing is actually explained in the link. The colours in the bars refer to threads and their priority.

ah, thanks! I did a quick Ctrl+F for "Progress Bars" (probably the wrong word anyway), but not colour!

Thanks again, mystery solved.

Wait, which progress bars? Is it using SIGINFO or something?

Progress Bars was maybe the wrong term, I guess they're more "meters" since they're used for CPU and RAM usage.

memory and cpu usage methinks

>There will be a lot of output. We can grep for the open system call. But that will not really work since strace outputs everything to the standard error (stderr) stream. We can redirect the stderr to the standard output (stdout) stream with 2>&1.

Besides being a great explanation of htop, I like the way this article captures the way I - far from a shell guru - tend to think when putting together a few steps in the terminal. And even then it shows that it pays to read the man page too!

I am convinced that load average on a machine is one of the most misleading statistics you can view. It never means what you think it means and half the time it doesn't even indicate a real problem when it's high.

> One process runs for a bit of time, then it is suspended while the other processes waiting to run take turns running for a while. The time slice is usually a few milliseconds so you don't really notice it that much when your system is not under high load. (It'd be really interesting to find out how long time slices usually are in Linux.)

Isn't this the famous kernel HZ? It was originally 100 (interrupts/second), but nowadays often 250 or 1000:


It’s much more complex than that these days. With the CFS scheduler, a process runs for a slice somewhere between two bounds: the target latency (basically the period that the N processes waiting to be scheduled compete for; I think it defaulted to 20ms) as the upper bound, and the minimum granularity (the smallest slice that may be granted to a process being scheduled; I think it defaulted to 4ms) as the lower bound.

This is made more complex by the ability to configure the scheduler with high granularity, including the ability to schedule different processors and process groups with different schedulers (and the rules that govern how the schedulers can then preempt each other).

Still, no scheduler can operate on granularity lower than the fundamental kernel tick rate. Which is by default 4 ms (1000 ms / 250), as you said.
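A simplified sketch of the arithmetic in this sub-thread, using the recollected defaults above (20 ms target latency, 4 ms minimum granularity) as parameters rather than as authoritative kernel values: with equal weights, classic CFS keeps the period at the target latency until there are too many runnable tasks to give each at least the minimum granularity, then stretches the period instead; the tick period is simply 1000 / HZ milliseconds. Real CFS also weights slices by nice value, which this sketch ignores, and the function names are hypothetical.

```python
def tick_period_ms(hz):
    """Length of one scheduler tick in milliseconds."""
    return 1000.0 / hz


def cfs_slice_ms(nr_running, target_latency_ms=20.0, min_granularity_ms=4.0):
    """Equal-weight CFS time slice (simplified; nice-level weights ignored)."""
    nr_latency = target_latency_ms / min_granularity_ms  # tasks that fit in one period
    if nr_running > nr_latency:
        period = nr_running * min_granularity_ms  # stretch the period instead of shrinking slices
    else:
        period = target_latency_ms
    return period / nr_running


for hz in (100, 250, 1000):
    print(f"HZ={hz}: tick every {tick_period_ms(hz):g} ms")
for n in (1, 2, 5, 10):
    print(f"{n} runnable tasks: ~{cfs_slice_ms(n):g} ms each")
```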

Fascinating! I never knew I needed a htop t-shirt until I read this article!

Excellent htop tutorial - I'll be sharing the link with my students!

Awesome article, this HN thread should indicate it was originally published on HN in 2016. As mentioned on the front page "#1 on Hacker News on December 2, 2016"

Nice explanations.

Btw. additional recommendation (especially when a lot of CPUs and/or disks are involved + you want to keep an eye on multiple things at once): nmon

Is there any way to make it so that htop doesn't clear the screen when it exits?

Seeing the last page of htop's output after it exits is usually useful to me.

You could suspend the terminal with Ctrl-S and unsuspend with Ctrl-Q?

Excellent article and very well done in-depth work.

This is a great explanation of everything in the output of htop and related, but I suggest the author clean up the prose a little bit to make it a bit less conversational and easier to read.

another of my favorites is glances.

Just to say great. Thanks thanks

It's like a good book.

ooh... don't miss the t-shirt!

interesting article, thank you
