
Understanding /proc - fredrb
https://fredrb.github.io/2016/10/01/Understanding-proc/
======
aub3bhat
This is a great post; recently I have been trying to re-learn and understand
Linux (specifically Ubuntu) using monitoring tools. In my opinion, htop and
Facebook's osquery are the two best available tools for understanding how an
operating system and its processes work. The osquery approach of recording all
OS data in the form of relational tables (with PIDs as keys, etc.) is very
useful.

[https://hisham.hm/htop/](https://hisham.hm/htop/)

[https://osquery.io/](https://osquery.io/)

The osquery query packs are especially useful:
[https://osquery.io/docs/packs/](https://osquery.io/docs/packs/)

Here is an incomplete draft about a similar post:
[https://github.com/AKSHAYUBHAT/TopDownGuideToLinux](https://github.com/AKSHAYUBHAT/TopDownGuideToLinux)

~~~
dorfsmay
Isn't htop just top with colours?

For a quick global view I like atop, then if I need to drill in a subsystem,
iftop, vmstat, free, etc...

~~~
digi_owl
Not quite. You can access lsof and strace from inside htop. There is also a
process tree view, and you can select the process you want to manipulate via
arrow keys.

~~~
leetrout
And you can click on things in htop with your mouse. That feature never gets
old. I wish more command line tools supported that...

~~~
realusername
I feel stupid now, I've been using htop for years and never noticed that...

~~~
leetrout
Same here! I picked up htop ~2011 and learned about mouse support in ~2014. :D

Also- if you're on a Mac and you use tmux I just started trying out the tmux
integration with iterm2. Pros and cons but it's interesting to see first class
OS windows for tmux.

Edit: add missing iterm2 reference.
[https://gitlab.com/gnachman/iterm2/wikis/TmuxIntegration](https://gitlab.com/gnachman/iterm2/wikis/TmuxIntegration)

------
gbrayut
Very nice write-up, and a great way to dive deep into an interesting system!
But if you plan on maintaining a project like this long term I would recommend
using one of the many existing libraries like
[https://github.com/prometheus/procfs](https://github.com/prometheus/procfs)
or [http://pythonhosted.org/psutil/](http://pythonhosted.org/psutil/)

There can be a lot of edge cases, and inevitably things will change in the
future. Centralising the work of parsing /proc files goes a long way and helps
keep things sane for maintenance.

------
gmjosack
It's worth noting that `man proc` has pretty exhaustive, though incomplete,
documentation of the various files. It's a great read to learn about some of
the files available.

~~~
Rapzid
Best place to learn about /proc IMHO... and possibly the Linux header files...

------
stouset
I wrote something to do similar parsing of process state recently. It seems
nuts to me that you can't get this all in one call. The naïve way of
`fopen()`ing the files you need has a race condition if the PID is reused
between two calls to `fopen()`.

Admittedly, probably rare. But why route through calls to `fopen()` and
`read()` when you could just provide a function that returns OS-defined
structs?

~~~
cyphar
You can use openat() with a file descriptor corresponding to the /proc/pid
directory to avoid the race condition.
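
A minimal sketch of that fix in Python, whose `os.open(..., dir_fd=...)` maps onto `openat(2)`. The PID here is our own process so the example runs anywhere; for a real tool you'd substitute the target PID:

```python
import os

pid = os.getpid()  # our own PID, so the sketch is self-contained

# Open the /proc/<pid> directory once. The resulting fd stays bound to
# this process's proc entry even if the PID is later reused, so opens
# made relative to it cannot race with PID reuse.
dirfd = os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)
try:
    # openat(2): open "stat" relative to the pinned directory fd.
    statfd = os.open("stat", os.O_RDONLY, dir_fd=dirfd)
    try:
        stat_fields = os.read(statfd, 4096).decode().split()
    finally:
        os.close(statfd)
finally:
    os.close(dirfd)

print(stat_fields[0])  # the first field of /proc/<pid>/stat is the PID
```

If the process exits and its directory entry goes away, opens through the pinned fd simply fail rather than silently reading some other process's files.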

~~~
stouset
Yep, just saying the "obvious" approach is wrong, which is (IMHO) always a
crappy way to design an API.

~~~
jerf
Based on what I learned a few months ago trying to deal with all these
problems, the conclusion I've come to is that basically the entire "standard"
set of POSIX calls is wrong. Correct handling requires an entire parallel set
of calls like "openat". However, this parallel set postdates the more
conventional calls by a couple of decades and, in some cases we found, may
still not be entirely complete. (I'd guess they probably are by now; we were a
couple of versions back on the kernel.)

The problem is that those old calls have immense, immense inertia. They are
how people think of files. They are how almost all, if not actually all,
higher-level languages interact with files by default, saving the "correct"
calls for external modules, if you even get that. In fact, many higher-level
languages are actively inimical to correct file handling by trying to abstract
away the "file handle" so you only have to deal with file names. For correct
handling, you really need to consider the file handle the _real_ file and the
file name merely a transient means of obtaining a file handle, never to be
used again once you have the handle.

Bear in mind the "crappy" API in question is probably older than you are, so
it's not really that surprising that it has needed some work as our world has
changed.

~~~
to3m
Even if you fix the PID reuse problem, you'll still have difficulty parsing
things like /proc/PID/maps. Unless your pseudo-file consists of fixed-size
records - and as I recall, /proc/PID/maps doesn't, as each line has a
variable-width path in it - your best option is just to read the entire file
in at once and hope the read ended up being atomic. (Obviously the system
can't block operations that affect the maps file...)

signalfd gets this bit right.

It didn't take much POSIX programming before I started to look at Windows in a
whole new light...
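
For what it's worth, that read-it-all-at-once approach can be sketched in Python. The `maxsplit` keeps the variable-width pathname (spaces and all) in one piece; the single large `read()` is the hedge against the file changing mid-parse, and the 1 MiB buffer size is an assumption that may be too small for huge processes:

```python
import os

def read_maps(pid="self"):
    """Read /proc/<pid>/maps in one read() and parse each line.

    Each line is: address perms offset dev inode [pathname]
    The pathname is variable-width and may contain spaces, so split
    at most 5 times and keep the remainder intact.
    """
    fd = os.open(f"/proc/{pid}/maps", os.O_RDONLY)
    try:
        data = os.read(fd, 1 << 20).decode()  # one big read, fingers crossed
    finally:
        os.close(fd)

    entries = []
    for line in data.splitlines():
        fields = line.split(maxsplit=5)
        addr, perms, offset, dev, inode = fields[:5]
        path = fields[5] if len(fields) > 5 else ""  # anonymous mappings have no path
        entries.append({"addr": addr, "perms": perms, "path": path})
    return entries

for e in read_maps()[:3]:
    print(e["addr"], e["perms"], e["path"])
```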

~~~
tcoppi
I agree that parsing /proc/pid/maps is a complete nightmare, but I'm pretty
sure that, the way pseudo-files on Linux work, the contents are guaranteed not
to change out from under you while you are reading them.

~~~
cyphar
Not all pseudo-files are like that. For example,
/sys/fs/cgroup/.../cgroup.procs will only have consistent content if you read
everything in a single page. Which is kinda dumb IMO.

~~~
tcoppi
Interesting... good to know.

------
gshrikant
Minor nitpick: calling it a clone of the Unix 'ps' wouldn't be exactly right,
since I understand /proc is Linux-specific. On the other hand, how did the
Unix 'ps', or the 'ps' in other Unix clones, work? Is there an alternative
method to expose process data to userspace instead of using a VFS like
procfs?

~~~
binarycrusader
/proc is not Linux-specific; my late, great colleague Roger Faulkner did a lot
of work on it in Solaris:

[https://www.usenix.org/memoriam-roger-faulkner](https://www.usenix.org/memoriam-roger-faulkner)

...and there's a general history here:

[https://blogs.oracle.com/eschrock/entry/the_power_of_proc](https://blogs.oracle.com/eschrock/entry/the_power_of_proc)

The part that may be specific to Linux is that Linux provides a text-based
interface, whereas systems like Solaris provide a binary interface.
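
The text interface is easy to see: on Linux, a file like /proc/&lt;pid&gt;/status is just `Key: value` lines you parse with ordinary string handling rather than a binary struct. A quick sketch, using our own process:

```python
# /proc/<pid>/status on Linux is plain "Key:\tvalue" text, one pair
# per line, rather than a binary struct as on systems like Solaris.
status = {}
with open("/proc/self/status") as f:
    for line in f:
        key, _, value = line.partition(":")
        status[key] = value.strip()

print(status["Name"], status["Pid"])
```

The price of the text interface is exactly the parsing work discussed elsewhere in this thread; the benefit is that `cat` and a line of shell are often all you need.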

~~~
gshrikant
That link is really interesting! Thanks for that. I should have probably said
procfs is Linux-specific. Anyway, learned (rather unlearned) something new
today.

Also interesting is that early Unix systems (before v8) used `ptrace()` for
gathering process information - the same system call programs like
strace/ltrace use today.

~~~
binarycrusader
No, procfs is not Linux-specific either ;-) /proc _is_ a filesystem, so most
of us refer to it as 'procfs' for short. In fact, the header file you include
in a C program on Solaris to use it is '<procfs.h>'.

As I said before, the only thing that's really Linux-specific is Linux chose
to represent it as text instead of something machine-parsable.

~~~
pietromenna
Which other OS uses procfs?

~~~
konstmonst
Here is the history:
[https://en.wikipedia.org/wiki/Procfs](https://en.wikipedia.org/wiki/Procfs)
I think the one in Linux was inspired by Plan 9's implementation.

~~~
pietromenna
Thank you!

------
keeperofdakeys
I wrote a small ps clone as a side project, and "man proc" was invaluable in
understanding what everything meant.
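
A toy version of such a clone is only a few lines: walk /proc for numeric directory names and read each process's command name from /proc/&lt;pid&gt;/comm. (A sketch only; a real ps also parses /proc/&lt;pid&gt;/stat, status, and more, as described in `man proc`.)

```python
import os

def list_processes():
    """Minimal ps-like listing from /proc: numeric directory names
    are PIDs; /proc/<pid>/comm holds the command name."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except FileNotFoundError:
            continue  # the process exited between listdir() and open()
        procs.append((int(entry), comm))
    return procs

for pid, comm in sorted(list_processes())[:5]:
    print(pid, comm)
```

The FileNotFoundError handling matters: processes can exit between the directory listing and the open, so a robust tool has to tolerate entries vanishing mid-scan.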

There was interesting work happening on a proposed newer API, though:
"task_diag"
[https://lwn.net/Articles/685791/](https://lwn.net/Articles/685791/)
[https://criu.org/Task-diag](https://criu.org/Task-diag).

------
skun
Thank you for writing this. This is a wonderfully insightful post :)

------
TorKlingberg
Is there something similar for /sys and /run?

