
The everything-is-a-file principle – Linus Torvalds - majke
https://yarchive.net/comp/linux/everything_is_file.html
======
Animats
In QNX, everything is a message, including files. The basic primitive is
MsgSend, sent to another process which has a MsgRecv outstanding. The other
process sends a reply back with a MsgReply, which unblocks the MsgSend and
returns data. Any amount of data can be sent. POSIX file primitives,
open/close/read/write, are small functions that make MsgSend calls.

MsgSend is a more useful primitive than a file. It's a subroutine call across
processes, a better fit to anything that isn't stream I/O. In most OSs, you
want a subroutine call, but the OS gives you an I/O operation. Linux has to
stand on its head for operations such as getting all the properties of a
pluggable device, because returning a variable length array of fixed-format
records as an atomic operation isn't a Linux primitive.

Microkernels have a bad reputation because interprocess communication in Mach
was botched. Mach was built on BSD. To do this right, you have to design CPU
scheduling and message passing together, and they need to be tightly coupled.
The key to interprocess communication is arranging the normal path for a
message pass to throw control from one process to another without a trip
through the scheduler. Get that wrong and your microkernel will be sluggish.
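
A minimal sketch of the send/receive/reply pattern described above, using Python threads and queues in place of real processes. The `Channel` class, the `msg_send`/`msg_receive`/`msg_reply` method names, and the `FILES` table are hypothetical stand-ins for illustration; this is not the QNX API, and it does not model the scheduler hand-off the comment calls critical.

```python
import queue
import threading


class Channel:
    """Toy stand-in for a message channel (hypothetical, not the QNX API)."""

    def __init__(self):
        self._requests = queue.Queue()

    def msg_send(self, payload):
        """Block the caller until the server replies, like a cross-process call."""
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((reply_box, payload))
        return reply_box.get()  # unblocked by msg_reply()

    def msg_receive(self):
        """Block the server until a request arrives; returns (reply_box, payload)."""
        return self._requests.get()

    @staticmethod
    def msg_reply(reply_box, result):
        """Unblock the sender and hand it the result."""
        reply_box.put(result)


FILES = {"/etc/motd": b"hello from the file server\n"}  # made-up contents


def file_server(channel):
    """Serve ("read", path, offset, length) requests until told to stop."""
    while True:
        reply_box, request = channel.msg_receive()
        if request == "stop":
            Channel.msg_reply(reply_box, None)
            return
        op, path, offset, length = request
        data = FILES.get(path, b"")[offset:offset + length] if op == "read" else b""
        Channel.msg_reply(reply_box, data)


def read(channel, path, offset, length):
    """A POSIX-looking read() that is only a thin wrapper around msg_send()."""
    return channel.msg_send(("read", path, offset, length))


channel = Channel()
server = threading.Thread(target=file_server, args=(channel,))
server.start()
first_bytes = read(channel, "/etc/motd", 0, 5)
channel.msg_send("stop")
server.join()
print(first_bytes)
```

The point of the sketch is the layering: `read()` is an ordinary function on top of the message primitive, rather than the message primitive being squeezed through a file interface.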

~~~
thom
What current work (that might have a hope of becoming mainstream for some
subset of devices) do you see that does this well?

~~~
akavel
Not sure what API approach they use, but in the area of microkernels:

\- GenodeOS is reportedly used in production by some commercial clients,
though mostly undisclosed IIRC; it is a higher-level layer (~OS) compatible
with numerous microkernels

\- Fuchsia OS - in dev; it's a recent development by Google; as much as Google
is officially silent about it (I understand they're not sure how the
experiment will work out for them), observers assume it's most probably hoped
to be used as a successor to Android

\- Redox OS - in dev; no concrete news of mainstream usage plans I know of,
but has some mindshare among developers

\- Minix 3 - used in production; infamously by Intel in their IME

~~~
roblabla
The Nintendo Switch's Kernel, Horizon/NX , is an example of a microkernel,
tailored for their specific use-case, and is in wide use[0]. It is, sadly,
closed source, however it has been reverse engineered. Their IPC API is, I
believe, pretty smart and elegant:

\- There is a per-thread IPC zone of 0x100 bytes. When doing IPC, the request
is serialized and put into this. If bigger data than 0x100 bytes is necessary,
pointers are passed around, and the Kernel maps it into the process servicing
the call.

\- svcSendSyncRequest is used to call an IPC. It is a synchronous
API that will block until the process servicing the call replies to it.

\- svcReplyAndReceive is used to receive an IPC request and reply to a request,
and then wait until a new one is received. The syscalls are "merged" into a
single one to avoid the syscall overhead: Almost all svcReply will be followed
by an svcReceive, so merging them into a single call makes a lot of sense.

You can find more information about the SVCs at [1] and the IPC layout at [2].

[0] Preempting the "I thought the switch used BSD": wikipedia was wrong, the
OS is completely custom, and the kernel is tailor-made.

[1]
[http://switchbrew.org/index.php?title=SVC](http://switchbrew.org/index.php?title=SVC)

[2]
[http://switchbrew.org/index.php?title=IPC_Marshalling](http://switchbrew.org/index.php?title=IPC_Marshalling)
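
The merged reply-and-receive idea can be sketched roughly as below. This is a toy model with Python threads, assuming nothing about the real Horizon kernel: the `Port` class, its method names, the `server_calls` counter and the `None` shutdown marker are all hypothetical, and the 0x100-byte per-thread buffer is not modelled.

```python
import queue
import threading


class Port:
    """Toy IPC port; names echo the comment, behaviour is an assumption."""

    def __init__(self):
        self._inbox = queue.Queue()
        self.server_calls = 0  # counts simulated server-side "syscalls"

    def send_sync_request(self, request):
        """Client side: block until the server replies."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((reply_box, request))
        return reply_box.get()

    def reply_and_receive(self, pending, reply):
        """Server side: answer the previous request (if any), then wait for the next."""
        self.server_calls += 1
        if pending is not None:
            pending.put(reply)
        return self._inbox.get()


def serve(port, handler):
    """Server loop: one combined call per request instead of reply + receive."""
    pending, reply = None, None
    while True:
        pending, request = port.reply_and_receive(pending, reply)
        if request is None:  # shutdown marker, an assumption of this sketch
            pending.put(None)
            return
        reply = handler(request)


port = Port()
worker = threading.Thread(target=serve, args=(port, lambda n: n * 2))
worker.start()
results = [port.send_sync_request(n) for n in (1, 2, 3)]
port.send_sync_request(None)
worker.join()
print(results, port.server_calls)
```

In this model the server enters the "kernel" once per incoming request, because each reply rides along with the wait for the next request.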

~~~
beefhash
> and the kernel is tailor-made

A little note about this: Horizon/NX actually traces back to the Horizon OS on
the Nintendo 3DS. The IPC marshalling was _significantly_ simpler back
then[1]. In all honesty, I'm not sure what made Nintendo think the IPC
marshalling on the NX was a good idea; to me it just looks like a hastily-
designed mess (cf. "This one is packed even worse than A, they inserted the
bit38-36 of the address on top of the counter field." on the [2] page linked
by the parent comment). However, the NX incarnation was designed with
"naturally" wrapping C++ methods in mind and it does a fairly decent job at
that.

[1]
[https://www.3dbrew.org/wiki/IPC#Message_Structure](https://www.3dbrew.org/wiki/IPC#Message_Structure)

(Shoutouts to 3dbrew actually realizing that HTTPS is a thing that exists and
maybe should be deployed.)

------
DonHopkins
I prefer the "everything is a computer" API. (Which happens to be Alan Kay's
model of object oriented programming as message passing.)

[https://softwareengineering.stackexchange.com/questions/4659...](https://softwareengineering.stackexchange.com/questions/46592/so-
what-did-alan-kay-really-mean-by-the-term-object-oriented)

>"I thought of objects being like biological cells and/or individual computers
on a network, only able to communicate with messages (so messaging came at the
very beginning -- it took a while to see how to do messaging in a programming
language efficiently enough to be useful)." -Alan Kay

NeFS -- aka NFS 3.0 -- used a PostScript interpreter as the file system API.

[https://news.ycombinator.com/item?id=17061967](https://news.ycombinator.com/item?id=17061967)

~~~
nine_k
If we invoke "OOP" here, why not invoke classes? Or rather "interfaces" in
modern OOP-speak, or "traits" in other parts of the landscape.

There's a common trait among many objects: you can read a sequence of bytes
from it. Disk files, sockets, various input devices, random number generators,
etc. There's another common trait, writing a stream of bytes. Another, even
more common pair of traits, is "opening" and then "closing" something.

All of them can be described using interfaces / traits to clearly communicate
which sets of operations are applicable to which objects.

This e.g. can be neatly used to describe the "many things are a file" idea:

    
    
        File passwords_file = Filesystem.make("/etc/passwd");
        Readable password_readable = passwords_file.openRead();  // Could be openWrite().
        Process zcat_process = Processes.make("zcat /tmp/something.gz");
        Readable zcat_stdout = zcat_process.stdout.openRead();  // Can only be openRead().
        ClientSocket sock = Network.makeClientSocket(host, port, options);  // Unlike a file.
        Readable sock_readable = sock.openRead();
        for (Readable r in [password_readable, zcat_stdout, sock_readable]) {
          byte b = r.read();  // Read one byte from each, no matter what it is.
        }
    

But this would require a very different language, way more powerful than C
(but also _not_ C++ please). Unlike in 1969, now we have languages like that,
e.g. Rust and ATS-lang. Likely we need another 10-15 years for a viable,
somehow widely used OS kernel written in such a language to emerge.
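
For comparison, here is a small runnable sketch of the same "read one byte from each, no matter what it is" loop using plain file descriptors on a Unix-like system. It assumes Linux or similar; the pipe stands in for the zcat process's stdout and the socket pair for the network client, so no external program or host is needed.

```python
import os
import socket
import tempfile

# A regular file on disk.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"regular file")
tmp.close()
file_fd = os.open(tmp.name, os.O_RDONLY)

# A pipe, standing in for the stdout of a child process such as zcat.
pipe_read_fd, pipe_write_fd = os.pipe()
os.write(pipe_write_fd, b"pipe")

# A connected socket pair, standing in for a network client socket.
sock_a, sock_b = socket.socketpair()
sock_b.sendall(b"socket")

# The same os.read() call works on all three descriptors.
first_bytes = [os.read(fd, 1) for fd in (file_fd, pipe_read_fd, sock_a.fileno())]
print(first_bytes)

# Clean up everything the sketch created.
for fd in (file_fd, pipe_read_fd, pipe_write_fd):
    os.close(fd)
sock_a.close()
sock_b.close()
os.unlink(tmp.name)
```

What the descriptor version lacks, compared with the typed sketch above, is any compile-time statement of which operations a given descriptor supports.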

~~~
aidanlister
You are a level too high in the stack - what is happening behind the scenes
when you execute any of those?

~~~
nine_k
As long as this gets type-checked at compile time, the types info can be
erased during compilation and be absent at runtime.

Since traits / interfaces can't have their methods overridden, you don't need
the dynamic dispatch, per-instance VMT, and such; you can have fixed offsets
in a method table per class, both in userland and in kernel. (Some of the
indirect calls can probably be made direct if the class is exactly known at
compile time.)

------
zvrba
Windows does it nicely: everything is a securable (with ACL) object. Process,
thread, file, socket, console, service... is an object, in a namespace. The
only exception I've encountered so far is network management stuff, like
routing tables. WinObj is a nice tool to inspect how Windows handles this
([https://docs.microsoft.com/en-
us/sysinternals/downloads/wino...](https://docs.microsoft.com/en-
us/sysinternals/downloads/winobj))

> In Windows, you have 15 different versions of "read()" with sockets and
> files and pipes all having strange special cases and special system calls.

That's not correct. ReadFile can be used for all kinds of objects, including
sockets: [https://docs.microsoft.com/en-
us/windows/desktop/api/fileapi...](https://docs.microsoft.com/en-
us/windows/desktop/api/fileapi/nf-fileapi-readfile) Yes, it behaves
differently depending on the type of the object, but so does read(2). (E.g.,
you'll never get SIGCHLD when reading from a file.)

~~~
forapurpose
Remember that Linus wrote that in 2007. Let's say he was talking about XP;
might things have changed since then?

~~~
zvrba
The concepts of objects and ACLs are in the NT kernel itself. So things have
worked like this on all Windows versions built on the NT kernel: from WinNT
3.1, to Win2k, ... to this day. The Windows TCP/IP stack has improved since then,
but I'd be _very_ surprised if ReadFile didn't work on sockets on XP (or
anything NT-based, actually).

In any case, he's very rash to call out Windows when Linux has its share of
special-purpose syscalls that do the "same" thing (e.g.: send, sendv, sendmsg,
sendfile, sendmmsg, …) with myriads of options to alter their behavior.
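
As a small illustration of several calls doing the "same" thing, the sketch below pushes data into one Unix socket through write(2), send(2) and sendmsg(2), via Python's wrappers. It assumes a Unix-like system; a datagram socket pair is used only so that each call arrives as a separate message.

```python
import os
import socket

# Datagram-style Unix socket pair so each call arrives as one message.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

os.write(left.fileno(), b"via write")   # generic write(2) on the descriptor
left.send(b"via send")                  # socket-specific send(2)
left.sendmsg([b"via ", b"sendmsg"])     # sendmsg(2), gathering two buffers

received = [right.recv(64) for _ in range(3)]
print(received)

left.close()
right.close()
```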

------
Nokinside
> The whole point with "everything is a file" is not that you have some random
> filename (indeed, sockets and pipes show that "file" and "filename" have
> nothing to do with each other), but the fact that you can use common tools
> to operate on different things.

~~~
pjmlp
Try to apply the file concept to graphics programming.

You won't get Crysis running out of it.

~~~
coldtea
If I am not mistaken, this is not unlike how all game graphics were done in
the 80s... directly writing to some frame buffer...

~~~
AnimalMuppet
Yes, you directly wrote to a frame buffer, _but not as a file_. You wrote to
it as a block of raw memory at a fixed address.

~~~
chrisseaton
The frame buffer is a file on Linux isn't it? /dev/fb

~~~
pjmlp
Yes if you want 2D MS-DOS style performance.

------
mpweiher
Linus wrote (2002):

"But what would you _do_ with them? What would be the advantage as compared to
the current situation?"

Here is one answer (2008):

"This paper presents PipesFS, an I/O architecture for Linux 2.6 that increases
I/O throughput and adds support for heterogeneous parallel processors by (1)
collapsing many I/O interfaces onto one: the Unix pipeline, (2) increasing
pipe efficiency and (3) exploiting pipeline modularity to spread computation
across all available processors. PipesFS extends the pipeline model to kernel
I/O and communicates with applications through a Linux virtual filesystem
(VFS), where directory nodes represent operations and pipe nodes export live
kernel data. Users can thus interact with kernel I/O through existing calls
like mkdir, tools like grep, most languages and even shell scripts. To support
performance critical tasks, PipesFS improves pipe throughput through copy,
context switch and cache miss avoidance. To integrate heterogeneous processors
(e.g., the Cell) it transparently moves operations to the most efficient type
of core"

Sounds pretty good to me!

[https://research.vu.nl/en/publications/pipesfs-fast-linux-
io...](https://research.vu.nl/en/publications/pipesfs-fast-linux-io-in-the-
unix-tradition)

------
fooker
I am glad Linux chooses pragmatism over what I call 'aesthetic engineering',
where new interfaces and abstractions do not buy anything.

------
mabynogy
Rob Pike designed a nice windowing system around files:
[http://doc.cat-v.org/plan_9/3rd_edition/rio/rio_slides.pdf](http://doc.cat-v.org/plan_9/3rd_edition/rio/rio_slides.pdf)

~~~
kchr
I first learned of Rob Pike when starting to dabble with golang. Now I notice
he's been involved in more or less everything I already use or stumble over.
Busy guy!

------
FrozenVoid
Having everything as X isn't as flexible and performant as specialized
types. "When all you have is a hammer".

This results in systems being built on top of a narrow file-as-everything
service, providing a wider interface to real
data (events, sockets, pipes, messages, async IO).

Of course, people don't like it and build libraries to bypass it (and even the
kernel layer itself: [https://blog.cloudflare.com/kernel-
bypass/](https://blog.cloudflare.com/kernel-bypass/)) just because the
interface is inherently limited and inflexible.

------
DonHopkins
I always wanted /dev/zero, which is used to mmap zeros into memory, to be more
general and use the device minor number to define which byte gets mapped, so
you could mknod /dev/seven with a minor number of 7, to provide an infinite
source of beeps!

~~~
greglindahl
Unfortunately, device minor numbers don't go high enough to support
/dev/U+1F4A9, which is an infinite stream of ...

~~~
DonHopkins
Don't they have a special purpose USB keyboard with one big easy to press
squishy sculpted and iconically colored button that generates just that
character?

~~~
greglindahl
Autorepeat on a keyboard just can't compare to the computational power behind
/dev/zero!

~~~
DonHopkins
Plus a virtual memory system that provides crap on demand, and poopy on write!

------
vbezhenar
"Everything is a file" is like "Everything is a REST". Sounds good but doesn't
work in practice.

~~~
w8rbt
Has worked great on Unix for how many years now? Keep it simple is all that
Linus is saying.

~~~
vbezhenar
There are many abstractions in Unix that are not files. I don't even know
of any modern Unix where a directory is a file.

~~~
lazyant
A directory is just a special type of file that contains a list of file names
and their corresponding inodes, how is a directory NOT a file?

~~~
ioquatix
That's a reasonable point at an abstract level. But in reality, while you use
`open`, `read`, `write` and `close` for files, the equivalent for directories
is `opendir`, `readdir`, `closedir`.
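
A short sketch of that split, assuming Linux behaviour: a directory can be opened like a file, but read(2) on it is refused, and the entries come back through a separate directory-listing interface. Here `os.listdir()` on the descriptor plays the role of the opendir/readdir family.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(directory, name), "w").close()

    # open(2) on a directory succeeds...
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.read(dir_fd, 64)  # ...but a plain read(2) is rejected.
        read_error = None
    except OSError as exc:
        read_error = exc.errno

    # Listing the entries goes through a directory-specific interface.
    names = sorted(os.listdir(dir_fd))
    os.close(dir_fd)

print(errno.errorcode.get(read_error), names)
```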

~~~
elderK
I don't see how providing specialized open/read/close calls for directories
breaks the file abstraction.

Imagine byte-level access to a directory. You'd have to have some library in
userspace that would correctly be able to manipulate that directory's
metadata. Now imagine doing that for various filesystems.

Plan 9's namespaces and file-servers were pretty awesome and flexible. It
really opened my eyes to the generality of the file abstraction.

~~~
DonHopkins
I do see how having a specialized set of calls for directories breaks the
abstraction that directories are files. Isn't that the very definition of
breaking the abstraction?

~~~
elderK
:) Care to elaborate?

I mean, as far as I see it, the directory is still a file, of a specific
format.

How is that any different than an image file, say, of a particular format? You
still need specialized programs in order to manipulate them in a meaningful
way. But they're just bytes. Just like the directory is just bytes.

The difference, of course, is that allowing the user to arbitrarily manipulate
a directory entry at the byte-level could lead to filesystem corruption.

I'm aware I may be missing something really obvious here. Heck, even
contradicting myself :)

Educate me :)

~~~
DonHopkins
Well, you could unify the interface to files and directories by removing all
the system calls to deal with opening and closing and reading and writing
files, and then removing all the system calls to deal with opening and closing
and reading and writing directories, and then simply using ioctl() for
everything!

------
adamnemecek
I think that "everything is a file" is superseded by "everything is a
reference". Memory for example isn't a file. Neither is a GPU.

I believe that Windows' HANDLE is closer to the correct abstraction.

~~~
vbezhenar
Abstraction presumes some common operations, otherwise it's not abstraction,
it's just some opaque bytes. You can open/read/write/close files. Maybe seek.
What common operations can you do with some random Windows HANDLEs? AFAIK you
can't even close that handle generically; you need to call a specific function.

~~~
deathanatos
Typically, I've found, the common operation is not so much the file/HANDLE
itself, as that the file/HANDLE can generate _operations_ that you need to
wait on. But select() set the stage by waiting on the file descriptor, and
poll/epoll mostly follow suit. You register interest in events on an FD.
(Instead of, say, starting an operation and saying "tell me when _this
operation_ finishes"; e.g., "futures" as exposed by a number of languages.)
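
A minimal sketch of that readiness model, assuming a Unix-like system: interest is registered on descriptors (here a pipe and a socket), and `select()` reports which of them can be read, without reference to any particular pending operation.

```python
import os
import select
import socket

pipe_read_fd, pipe_write_fd = os.pipe()
sock_a, sock_b = socket.socketpair()
sock_a_fd = sock_a.fileno()
watched = [pipe_read_fd, sock_a_fd]

# Nothing written yet: with a zero timeout, no descriptor is reported readable.
idle, _, _ = select.select(watched, [], [], 0)

os.write(pipe_write_fd, b"x")
sock_b.sendall(b"y")

# Both descriptors now have data; select() names the descriptors, not the operations.
ready, _, _ = select.select(watched, [], [], 1.0)
print(idle, sorted(ready) == sorted(watched))

os.close(pipe_read_fd)
os.close(pipe_write_fd)
sock_a.close()
sock_b.close()
```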

Aside from waiting, I'd say passing/referring to the objects being operated on
is also a commonality. Passing a FD to a child process, for example, or giving
it (the object, be that a file/pipe/socket/timer/process/etc.) a name in the
FS s.t. other things can refer to them by name. (I personally wish though that
fork()/exec() had forced you to specify _exactly_ the set of objects
(files/etc.) for the child to avoid the entire multithreading/close-on-
exec/atomic flag setting hell that exists today.)
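
Some higher-level APIs approximate the "specify exactly what the child gets" model on top of fork()/exec(). As a hedged illustration, Python's `subprocess` closes everything except stdin/stdout/stderr by default on POSIX and only keeps the descriptors listed in `pass_fds`; the child program below is made up for the example.

```python
import os
import subprocess
import sys

read_fd, write_fd = os.pipe()
os.write(write_fd, b"handed over explicitly")
os.close(write_fd)

# Child program: read from the descriptor number given on its command line.
child_code = "import os, sys; sys.stdout.write(os.read(int(sys.argv[1]), 64).decode())"

result = subprocess.run(
    [sys.executable, "-c", child_code, str(read_fd)],
    pass_fds=(read_fd,),  # only this descriptor (plus stdin/stdout/stderr) is inherited
    stdout=subprocess.PIPE,
    check=True,
)
os.close(read_fd)
child_output = result.stdout.decode()
print(child_output)
```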

I don't know that read/write are actually abstract at the level of a "file
descriptor" though. Eventfds and timerfds both support them, but it feels
forced. Pipes are one-way, so one of read/write doesn't make sense, and seek on
a pipe doesn't either. I think _some_ file descriptors (child classes of a
generic "object"/FD/HANDLE, such as files) support read/write, but not
necessarily _all_ FDs support read/write/seek.

~~~
vbezhenar
My point is that HANDLE is too general an abstraction. Can you wait on a HANDLE
that results from a CreateHeap call? Can you wait on a HANDLE that results from
a CreateWindow call (HWND is defined as HANDLE)? What's common between a file,
a window and a heap object?

~~~
bhk
Closing/destroying and granting (to another process) are generally applicable
to about any resource held by a process. UNIXes, by contrast, expose multiple
"close" system calls for non-file things, and have a largely incomplete
mishmash of ways to share non-file things.

------
ggm
I liked the idea of filerefs for everything precisely because I hate the
socket() semantics. But most of this is about the awful pain of the ioctl()
stuff you either have to call as setup magic on the FD, or pass via
setsockopt(), because you can't coerce enough into the limited modality of
the file-like moment of opening the (pseudo) file.

Really it feels more like 'what is the hierarchical structure of my namespace',
because once you nail that down, the file semantics become clearer. If it's
async io under the class of io, it's open("/io/async/my-thing", ...).

So it's one of those "yes... but it's so hard to nail it down" moments.

------
lisper
Wow, Linus had me all the way to the very end:

> If you can read on it, it's a file.

Um, no. If you can SEEK on it, it's a file. If you can't, it's a socket or a
pipe.
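
The distinction drawn here can be checked directly. A minimal sketch, assuming Linux or a similar Unix: lseek(2) succeeds on a regular file and fails with ESPIPE on a pipe or a socket. The `can_seek` helper is a name invented for this example.

```python
import errno
import os
import socket
import tempfile


def can_seek(fd):
    """Return True if lseek(2) works on fd, False if the kernel answers ESPIPE."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise


with tempfile.TemporaryFile() as regular:
    pipe_r, pipe_w = os.pipe()
    sock_a, sock_b = socket.socketpair()
    seekable = {
        "regular file": can_seek(regular.fileno()),
        "pipe": can_seek(pipe_r),
        "socket": can_seek(sock_a.fileno()),
    }
    os.close(pipe_r)
    os.close(pipe_w)
    sock_a.close()
    sock_b.close()

print(seekable)
```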

~~~
fjsolwmv
I think Linus understands Linux files. Reread the message:

> In UNIX, a file descriptor is pretty much anything. You could say that
> sockets aren't remotely file-like, and you'd be right. What's your point?

The fact that you can't seek on a socket is irrelevant to the thread, which is
about _accessing_ files, not navigating them.

~~~
lisper
> I think Linus understands Linux files.

Sure, but that doesn't mean he doesn't occasionally make a mistake.

> In that message, he's talking about...

Yes, I know. But the larger context is Linus debunking (correctly) the unix
philosophy that "everything is a file". Linus is correct: it is not the case,
never has been the case, nor should it be the case, that everything is a file.
But then he undermines his own argument with the (mistaken) claim that "if you
can read it, it's a file." No. There's a salient difference between files on
the one hand, and pipes and sockets on the other, and it has nothing to do
with whether or not they have names. A named pipe is still a pipe, not a file.
"Files" in /dev are not files, neither are "files" in /proc, despite the fact
that they have names.

It's a detail, but an important one in the larger context IMHO.

~~~
fjsolwmv
This quibbling semantics. That's like saying a ball isn't really a ball
because it's not curved at a molecular level. It's true and it's interesting
in its own sense, but not at all relevant to the real world discussion at
hand, and not a refutation of anything in context ("had me until the end...").

Yes, the wording around "file" is a bit ambiguous, because Unix has a major
design philosophy that said you can treat non-files as files and get a lot of
mileage, but didn't call those things "fauxles" or something instead.

~~~
lisper
> This quibbling semantics

No, it's quibbling pedagogy.

> Yes, the wording around "file" is a bit ambiguous

Exactly. And this can be a real problem when people are first learning about
unix.

In fact, it can be a real problem even after that. Just the other day (true
story) I was trying to debug some server latency issues and I asked one of our
sysadmins for help. He suggested that I run lsof to help debug the problem. I
told him I was pretty sure that wouldn't help because all the evidence
indicated that the problem was with a rogue process, not a file, and he said,
"This is unix. Everything is a file."

------
dozzie
Everything is a file... except processes (you can't just delete a process),
network interfaces (try setting an IP address with a write() or ioctl()), and
plenty of other things. Unix never had "everything is a file" as its mantra,
and Torvalds never introduced it.

~~~
bo1024
My understanding of the mini-slogan "everything is a file" is that it was
never meant to refer to the entire API of the object, but just reading/writing
data.

~~~
dozzie
And then you hit ioctls, which are necessary even by the very file-like way of
working with terminals. The slogan in the unix world never matched anything
well, or at least never since the '90s.

~~~
eadmund
> The slogan in the unix world never matched anything well

It matched Plan 9 (which is really Unix V9) really well …

------
gsaga
>But there's absolutely no point in opening /dev/futex from a shell script or
similar, because you don't get anything from it.

The same can be said for files in '/dev/input'. Why are mouse and touchpad
input exposed as files?

~~~
LukeShu
No, you can't say the same thing. You open it from a shell script, you
literally do get something from it.

In the email you quote, Linus gives two reasons why futex and sockets should
not be files:

 _> there's absolutely no point in opening /dev/futex from a shell script or
similar, because you don't get anything from it._

 _> Perhaps because you cannot enumerate sockets and pipes?_

It stands to reason that the converse of those _are_ reasons to make things a
file:

I can open /dev/input/$X and get something from it. The normal thing to do is
open a file under /dev/input/ and read data from it. No, we don't usually do
that directly, because libinput does it for us; but that's what it's doing. As
an exception, if code deals with joysticks, it seems to me that it's common to
open /dev/input/js$X yourself.

I can list the entries in /dev/input/ and enumerate the input devices attached
to the system.
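
To make the joystick case concrete, here is a hedged sketch of what "open /dev/input/js$X and read data from it" looks like. It assumes the 8-byte `struct js_event` layout from `<linux/joystick.h>` (u32 time, s16 value, u8 type, u8 number); the helper names are invented, and the demonstration parses a synthetic event so it runs without a device attached.

```python
import struct

# struct js_event: u32 time (ms), s16 value, u8 type, u8 number (native byte order).
JS_EVENT_FORMAT = "IhBB"
JS_EVENT_SIZE = struct.calcsize(JS_EVENT_FORMAT)
JS_EVENT_BUTTON = 0x01
JS_EVENT_AXIS = 0x02
JS_EVENT_INIT = 0x80  # flag OR-ed into type for initial-state events


def parse_js_event(raw):
    """Decode one raw event record into a small dictionary."""
    time_ms, value, event_type, number = struct.unpack(JS_EVENT_FORMAT, raw)
    kinds = {JS_EVENT_BUTTON: "button", JS_EVENT_AXIS: "axis"}
    return {
        "time_ms": time_ms,
        "kind": kinds.get(event_type & ~JS_EVENT_INIT, "unknown"),
        "number": number,
        "value": value,
        "initial": bool(event_type & JS_EVENT_INIT),
    }


def read_events(path, count):
    """Read `count` events from a joystick node; the path is supplied by the caller."""
    with open(path, "rb", buffering=0) as device:
        return [parse_js_event(device.read(JS_EVENT_SIZE)) for _ in range(count)]


# Synthetic record: button 0 pressed at t=1234 ms (no hardware needed).
sample = struct.pack(JS_EVENT_FORMAT, 1234, 1, JS_EVENT_BUTTON, 0)
event = parse_js_event(sample)
print(event)
```

On a machine with a joystick attached, something like `read_events("/dev/input/js0", 4)` would be the real use, and `os.listdir("/dev/input")` is the enumeration step mentioned above.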

------
theshadowknows
It must be very frustrating being Linus

------
_bxg1
I would be terrified to ever try and contribute anything to Linux, lest Linus
yell at me and call me stupid. Seems like he does that in every excerpt that
gets posted here.

~~~
_bxg1
For the record I was mostly making a joke, everybody

~~~
B1FF_PSUVM
Yeah, and when Linus gets put to pasture and betrization [] becomes mandatory,
we can all stop laughing. Probably laughing won't be allowed either, anyway,
too aggressive.

[]
[https://en.wikipedia.org/wiki/Return_from_the_Stars](https://en.wikipedia.org/wiki/Return_from_the_Stars)

------
osrec
I appreciate the work Linus has done a great deal. Clearly he's a gifted
individual, but he does come across a bit aggressive in his communication at
times! It can be funny, but also somewhat offensive.

"... and a black star for being stupid"

~~~
vortico
As a BDFL, it's the only way to survive, unless you're on sedatives your whole
life. You very frequently come across ideas that are so inexplicably stupid
that the only way to fix them is to stop them in their tracks. Opening a
friendly debate about the advantages and disadvantages is just a huge waste of
multiple hours, because you'll never end up convincing people that their pet
ideas/projects should go back to square one (or square zero in programming).
His aggression has turned contributors away from the project and soured
relations, but the end goal is to maintain the Linux operating system, and his
aggression, strictness, and blows to self esteem have proven 10x more
effective than soft discussions.

Think about it this way: A dictator has to be extremely assertive because if
he said "No, I don't want that in Linux" with no basis, he'd eventually lose
his dictatorship power as people would form their own "unions" to try to
change things, making several directionless forks of Linux. If he instead
said, "This is a completely idiotic approach and anyone that thinks it's okay
is braindead," nobody would question it and most people would just take his
word for it. Most heated debates have an equal number of supporters for each
side, and it takes someone with a strong voice to remind them that one
decision must be chosen regardless.

~~~
fjsolwmv
You skipped the middle ground between "no basis" and "ad hominem is the only
basis".

Lying about things and people is a bad way to force people to accept
decisions. Calling someone or something stupid isn't a technical argument,
it's a poorly worded conclusion.

------
21
Let's not forget the massive pain that /dev/urandom being a file was, that
they had to add a syscall for it: getrandom()

~~~
Aaron1011
I thought that the issue with /dev/urandom was that it wouldn't block when
read shortly after startup (when the entropy pool wasn't yet initialized).

~~~
mmebane
That's part of it. Another concern was ensuring that fetching a random number
couldn't fail due to file descriptor exhaustion. [1]

[1]: [https://lwn.net/Articles/605828/](https://lwn.net/Articles/605828/)
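
The two access paths can be put side by side. A minimal sketch, assuming Linux: the file-based version needs a free descriptor and a visible /dev/urandom, while the syscall version needs neither. The function names are made up for the example, and `os.getrandom` is only available on Linux, hence the fallback.

```python
import os


def random_bytes_from_file(count):
    """File-style access: can fail if no descriptor is free or /dev is missing."""
    fd = os.open("/dev/urandom", os.O_RDONLY)
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


def random_bytes_from_syscall(count):
    """Syscall-style access: no descriptor and no device node involved."""
    if hasattr(os, "getrandom"):
        return os.getrandom(count)
    return os.urandom(count)  # portable fallback for non-Linux systems


from_syscall = random_bytes_from_syscall(16)
from_file = random_bytes_from_file(16) if os.path.exists("/dev/urandom") else None
print(len(from_syscall), None if from_file is None else len(from_file))
```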

------
rurban
He really has to be awarded an Anti-UNIX prize somewhen.

------
popee
He is right, but I feel fat now

------
tobyhinloopen
Linus is an asshole. But if you have made some nice things, you are allowed to
be an ass, it seems.

------
rossdavidh
Hypothesis: the reason Linus Torvalds is still BDFL for Linux, but Guido van
Rossum recently announced he is stepping down as BDFL of Python, is that GvR
is just a lot nicer as a human being. Being a BDFL either requires getting a
lot of abuse and not responding in kind, or being such an irascible boor that
it doesn't bother you.

Just a hypothesis; I've never met either one and never been a BDFL either.

------
stevebmark
I've always wondered what sad life Linus leads to make him communicate like
every developer I've worked with that I don't want to work with. Clearly he
has deeper issues that lead to his blooming sarcasm and condescension, that
our community already struggles with. This behavior is not normal, nor
sustainable.

~~~
megaman22
Yikes, I'm disturbed that occasional outbursts of righteous anger would make
someone start assuming serious psychological problems. That's concerning to me
- and I do not think I would be comfortable working with people who are so
deeply offended by an angry email.

God help you if you'd ever grown up working on farm equipment.

~~~
stevebmark
We were all 26 once. Professionalism builds successful teams long term.
Calling out toxic behavior that worsens things, like others' imposter syndrome,
is part of how we help things get better.

