
Privileged Ports Are Expensive (2016) - phaer
http://adamierymenko.com/privileged-ports-are-causing-climate-change/
======
btrask
In terms of security/isolation, processes, users, containers and
virtualization are all essentially the same thing. I wish the people working
on these things would step back and notice the forest for the trees.

Whatever the ultimate "isolation unit" ends up being, it needs to be
recursive. That means being able to run processes _within_ your process
(essentially as libraries), create users _within_ your user account (for true
first-class multi-tenancy), or VMs _within_ your VM (without compounding
overhead).

It turns out that this author also wrote "Docker: Not Even a Linker"[1] which
was also deeply insightful about unconscious/accidental architecture
decisions. I'm impressed by his insight and disturbed that most people don't
seem to understand it.

[1]
[https://news.ycombinator.com/item?id=9809912](https://news.ycombinator.com/item?id=9809912)

~~~
catern
Let me take that one step further:

>processes, users, containers and virtualization are all essentially the same
thing.

...and so are modules/objects/whatever your language of choice calls them.
Abstraction boundaries, to be precise. Abstraction, security, and type-safety,
are all _very_ closely related.

These language-specific mechanisms for isolation _are_ recursive - trivially
so. And language runtimes and compilers make security cheap - so cheap that
it's ubiquitous.

Processes, users, containers and virtualization all rely on an operating
system for security, which in turn relies on hardware features. Specifically,
virtual memory and privileged instructions. And those hardware features are
slow, and more importantly: they're not recursive!

But hardware-based isolation does have one key advantage over language-based
isolation: It works for arbitrary languages, and indeed, arbitrary code.

I completely agree that recursive isolation is necessary. We need to figure
out rich enough hardware primitives and get them implemented; or we need to
migrate everything to a single language runtime, like the JVM.

~~~
btrask
Great point. The JVM tried for this position and failed, IMHO (I think it
abstracted too much). Now the browser is slowly homing in on it, and it might
succeed (mostly due to sheer inertia). As opposed to the JVM, I like to call
the ultimate goal the "C Virtual Machine" (just process isolation++).

I think moving isolation out of hardware is really important (both to make it
recursive and portable). NaCl is an interesting step in that direction. If you
could use something like it to protect kernelspace (instead of ring 0),
syscalls could be much, much faster.

There's another problem with language-based isolation: it makes your
language/compiler/runtime security-critical. Conversely, NaCl has a tiny,
formally proven verifier that works regardless of how the code was actually
generated, which seems like a much saner approach.

I'll also say that I don't think it's reasonable to expect every
object/module/whatever within a complex program to be fully isolated (in
mainstream languages at least). There's no need for it, and it will have too
much overhead (in a world where objects in many languages already have too
much overhead). Better to start relatively coarse-grained (today the state of
the art is basically QubesOS), and gradually improve.

~~~
nickpsecurity
"NaCl is an interesting step in that direction. If you could use something
like it to protect kernelspace (instead of ring 0), syscalls could be much,
much faster."

It's actually partly inspired by how old security kernels work, mixed with
SFI. The first secure kernels used a combination of rings, segments, tiny
stuff in kernel space, limited manipulation of pointers, and a ton of
verification. Here are the original ones:

[http://www.cse.psu.edu/~trj1/cse443-s12/docs/ch6.pdf](http://www.cse.psu.edu/~trj1/cse443-s12/docs/ch6.pdf)

A Burroughs guy who worked with Schell et al on GEMSOS and other projects was
the Intel guy who added the hardware isolation mechanisms. They were
originally uninterested in that. Imagine the world if we were stuck on legacy
code doing the tricks no isolation allows. Glad it didn't happen. :)

Eventually, that crowd went with separation kernels to run the VMs and such
that the market was demanding. They run security-critical components directly
on the tiny kernel.

[https://os.inf.tu-dresden.de/papers_ps/nizza.pdf](https://os.inf.tu-dresden.de/papers_ps/nizza.pdf)

The SFI people continued doing their thing. The brighter ones realized it
wasn't working. They started trying to make compiler or hardware assisted
safety checking cost less with clever designs. One, like NaCl and older
kernels, used segments to augment SFI. Others started looking at data flow
more. So, here's some good work from that crowd:

[http://dslab.epfl.ch/pubs/cpi.pdf](http://dslab.epfl.ch/pubs/cpi.pdf)

[https://www.cs.rutgers.edu/~santosh.nagarakatte/softbound/](https://www.cs.rutgers.edu/~santosh.nagarakatte/softbound/)

[https://www.microsoft.com/en-us/research/wp-content/uploads/2006/11/dfiOSDI.pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2006/11/dfiOSDI.pdf)

So, have fun with those. :)

------
roblabla
One of my biggest frustrations while running a multi-tenant system on NixOS
(which is great, btw: every user can install stuff in their home without
sudo) is that HTTP is bound to port 80. And beyond the whole privileged-port
jazz, there's another problem: only one program can listen on myip:80.

Ideally, HTTP would use SRV DNS records. For the uninitiated, those are
records that contain both a host and a port, so instead of having a "default
port" of 80 (which is completely arbitrary), you get to define what port the
service is running on. Then I could just assign each user of my multi-tenant
system a range of ports (with 1,000 ports each, we could fit 65 users; maybe
fewer due to ephemeral ports, but that's already more than I need anyway).
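For illustration, a hypothetical SRV record in zone-file syntax (the names,
TTL, and port here are made up) that would let a client find the HTTP service
for example.com on port 8080 instead of assuming 80:

```
; _service._proto.name  TTL  class SRV priority weight port target
_http._tcp.example.com. 3600 IN    SRV 10       5      8080 www.example.com.
```

Priority and weight (per RFC 2782) let you spread load across several targets; the port field is the part that would free us from the "default port" convention.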

There are other solutions; for example, machines connected via IPv6 can get
millions of IP addresses essentially for free. But IPv6 coverage is still
spotty. And in my case, the system was running on a Kimsufi box, which gives
exactly _one_ IPv6 address per machine. (And let me rant for a moment and say
that this is _really_ stupid, for multiple reasons, such as IPv6 blacklists
covering whole blocks anyway.)

I wonder if it's too late to get SRV records into, say, HTTP/2. Or even into
HTTP/1 for that matter, as an amendment RFC. Because it would truly be awesome.

~~~
slaymaker1907
A reverse proxy can act as a nice hack around this by multiplexing on the
Host header (which is unencrypted even with HTTPS due to the way that the SSL
handshake is done).

~~~
lsaferite
The HTTPS Host header is not unencrypted. You are talking about SNI which is
an extension to TLS that adds the hostname as part of the handshake. The Host
header is still encrypted when using SNI.

------
zlynx
In Linux the thing the author wants is
/proc/sys/net/ipv4/ip_unprivileged_port_start which defaults to 1024 but can
be set to anything you like. Such as 0.

Edit: I didn't realize how new that was; it's in kernels 4.11+ only. I think
some people were using this on custom patched kernels, though, because I've
been seeing it around. It was committed in January.

[https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4548b683b78137f8eadeb312b94e20bb0d4a7141)
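On such a kernel, flipping it is a one-liner as root; a sketch (the sysctl name is real, the `.conf` filename is just an example):

```
# Linux 4.11+: let unprivileged users bind ports below 1024
sysctl -w net.ipv4.ip_unprivileged_port_start=0

# or persist it across reboots
echo 'net.ipv4.ip_unprivileged_port_start = 0' > /etc/sysctl.d/99-unpriv-ports.conf
```

Setting it to some value between 0 and 1024 gives you the finer-grained split the author wants (e.g. 100 would free up 80/443 while keeping the very low ports privileged, though the knob is a single threshold, not per-port).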

~~~
mindslight
No; the things that would have obviated the author's specific example were (a
finer-grained version of) ip_unprivileged_port_start and SRV records _back in
the 90s_, when the customs actually developed.

~~~
api
That's the whole point about path dependence. We went down this _massively_
more complex and expensive path because we didn't do a few _teeny weeny little
things_ to network interfaces and permissions back in the 90s.

------
michaelhoffman
> Reproductive organs are what evolutionary biologists would call a highly
> conserved system, meaning they don't tend to change often.

No. Reproductive genes are well-known for what evolutionary biologists (such
as myself) call _positive selection_ , meaning they tend to change more often
than you would expect.

See, for example,
[https://www.nature.com/nature/journal/v403/n6767/full/403304...](https://www.nature.com/nature/journal/v403/n6767/full/403304a0.html)

~~~
sillysaurus3
Would anyone mind posting the PDF for those of us who can't afford it but
would like to study it?

~~~
garaetjjte
from scihub: [http://moscow.sci-hub.cc/7d2cc404e2efff26c37a6dd544258956/wu2000.pdf](http://moscow.sci-hub.cc/7d2cc404e2efff26c37a6dd544258956/wu2000.pdf)

~~~
sillysaurus3
Thanks!

------
zielmicha
> Step two: extend user and group permissions and ownership into the realm of
> network resources, allowing UID/GID read/write/bind (bind in place of
> execute) permission masks to be set for devices and IPs. By default a bind()
> call that does not specify an address (0.0.0.0 or ::0) will listen for
> packets or connections on all interfaces that the current user has the
> appropriate permission to bind and outgoing connections will default to the
> first interface the user owns.

I think that this can be achieved with network namespaces and some iptables
magic, entirely in userspace (and without the whole burden of containers).
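As a rough sketch of that approach (device and namespace names are made up, the setup itself still needs root, and binding port 80 inside the namespace still needs CAP_NET_BIND_SERVICE unless combined with user namespaces or the unprivileged-port sysctl):

```
# give user alice her own network namespace with a dedicated IP
ip netns add alice
ip link add veth0 type veth peer name veth1
ip link set veth1 netns alice
ip addr add 10.0.0.1/24 dev veth0 && ip link set veth0 up
ip netns exec alice ip addr add 10.0.0.2/24 dev veth1
ip netns exec alice ip link set veth1 up

# host-side iptables can then forward outside traffic to alice's IP
iptables -t nat -A PREROUTING -d 203.0.113.5 -p tcp --dport 80 \
         -j DNAT --to-destination 10.0.0.2:80
```

Everything alice runs via `ip netns exec alice ...` (or after being placed in the namespace at login) sees its own interfaces and its own port space, with no container image involved.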

Anyway, IMO the point of containers is not network virtualization, but rather
isolation of dependencies. It's much easier to just pack everything into a
container than to invent a proper non-root package manager (like Nix).

~~~
marcosdumay
The point of directories is to provide proper isolation of dependencies. But
that followed the same path.

Which leads to... it's better to add non-root mounting to the author's list.
I don't think there's even anything that needs to change; the permissions are
already there.

------
salmo
It's not just ports, resource contention in general is largely the pain of
multi-user systems.

If you've ever managed a host with thousands of users, you'll remember. Sure,
many of these have workarounds now, but I've seen users (or have personally)
run a machine out of:

- /tmp space

- processes

- open file handles

- inodes

- shared memory segments

- ephemeral ports

- etc., etc.

Have you ever had a root shell on a box that's out of processes? Luckily
modern shells have built-ins, so you can at least `ls` and such. But when you
can't run, say, `lsof`, what do you do? That used to be a classic interview
question.

And then there's the traditional resource contention folks think of like
saturating links, i/o channels, disk space, cpu, memory, and so on.

But nowadays it's virtual machines all the way down. My favorite is to think
of something like Puppet server (I like to pick on it) running in Docker.
Count the abstractions in the chain:

- JRuby interpreter

- JVM

- Process

- Container

- OS

- Virtual Machine

- Hypervisor process

- Hypervisor OS

- even UEFI to an extent

- Hardware

You could say things like the process and OS are redundant, and you could
probably add more if you consider the subcomponents of some of these. But
each one is there to segment and/or simplify, and then we add all kinds of
holes for each to directly access something up its parental chain.

It is an evolved system where we are doomed to "reinvent Unix badly". But I'll
be curious to see where we end up in 10-15 years...

------
hamandcheese
Somewhat related, but I always thought it would make sense for browsers to use
DNS-based service discovery to find the correct HTTP port, and then fall back
to 80/443. That alone would make it easier for users in a multi-tenant
environment to self-host, IMO, and eliminate some of the need for SNI.

~~~
manacit
This is what SRV records were (largely) created for. There have been a couple
of attempts at making this a reality:

[https://tools.ietf.org/html/draft-andrews-http-srv-01](https://tools.ietf.org/html/draft-andrews-http-srv-01)

[https://tools.ietf.org/html/draft-jennings-http-srv-05](https://tools.ietf.org/html/draft-jennings-http-srv-05)

Unfortunately, HTTP is just too widespread of a protocol - you would end up
having to listen for legacy clients on 80/443 forever, making it a nonstarter.

~~~
dsparkman
The reality is that this change is not as infeasible as everyone makes it out
to be. It is simply a change to the DNS resolution logic for HTTP/HTTPS. The
major browser vendors could make the change in relatively short order. The
main hurdle is that it would increase the number of DNS queries needed to
resolve a website.

XMPP is an example of a protocol that currently has this correct, in that it
uses SRV records to control ports for a given host.

~~~
LgWoodenBadger
Why would it involve extra DNS queries? Why wouldn't the SRV record come back
as part of the original response to the host-name resolution request?

~~~
zAy0LfpBZLC8mAC
There is no such thing as a "host-name resolution request". A DNS query
specifies a domain name and a record type, and gets as the result a list of
all records of that type under that name. Record types would be A (IPv4
address), AAAA (IPv6 address), MX (mail exchanger name), SRV (server name and
port for a particular service), TXT (free-form text), and many others, most of
them nowadays unused.
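On the wire this is quite literal: a query carries a name plus a 16-bit type code. A minimal sketch in Python that builds such a packet (without sending it; the transaction ID is arbitrary):

```python
import struct

# Record types are just numbers: A=1, MX=15, TXT=16, AAAA=28, SRV=33
def build_dns_query(name: str, rtype: int) -> bytes:
    """Build a minimal DNS query: a 12-byte header plus one question."""
    # id=0x1234, flags=0x0100 (recursion desired), 1 question, 0 answers
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # the name is encoded as length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", rtype, 1)  # qtype, qclass=IN
    return header + question

srv_query = build_dns_query("_http._tcp.example.com", 33)  # ask for SRV
```

Sending `srv_query` over UDP to port 53 of a resolver would get back every SRV record under that name; asking for type 1 instead would get the A records, and so on.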

~~~
icebraining
You can specify "ANY" as the type in the query and get all results. Try
running "host -a ycombinator.com". It doesn't recursively resolve CNAMEs like
an A query does, though. Also, at least Cloudflare refuses those requests, to
reduce DNS reflection attacks.

~~~
JdeBP
"any" does not mean "all". People regularly make this mistake. One cannot do
these sorts of tricks with ANY queries.

* [http://jdebp.eu/Softwares/djbwares/qmail-patches.html#any-to...](http://jdebp.eu/Softwares/djbwares/qmail-patches.html#any-to-cname)

------
api
I'm the original author.

I wrote that a while back. My opinion hasn't changed too much, though I have
to say that article could stand a rewrite. Don't really have time right now.

The real crux of the article is less about privileged ports and Unix
permissions than about path dependence and how it leads to complexity
explosions in systems. Instead of building a fix for X, maybe we should first
question whether X really has to be that way and if there exists some simpler
path to achieving our goals that involve some amount of change but far less
complexity.

Sometimes you can't do that, but sometimes you can. I think privileged ports
are a case where we could have easily eliminated a lot of complexity and
headaches by just eliminating an obsolete feature.

~~~
rsync
Please search this HN comment thread for "jail" and my own comment - I am
curious about your thoughts ...

~~~
api
Jails are closer. They are perhaps a way of achieving some of this. The
problem (unless I'm wrong) is that you need root to create one, so you are
back to needing root for everything.

~~~
rsync
Yes, of course the base system's root is necessary to create the jail, but
then the jail has its own root user as well as its own /etc/passwd (and
/etc/everythingelse).

For many, many purposes (almost all?) a FreeBSD jail is, to the root user,
indistinguishable from a bare-metal server.

------
eru
> How did we end up with the nested complexity explosion of
> OS->VM->containers->... instead of the simplicity of multi-user operating
> systems?

> [...], it pushes about 5-10 megabits of traffic most of the time but the
> software it runs only needs about 10mb of RAM and less than a megabyte (!!!)
> of disk space. It also utilizes less than 1% of available CPU.

> The virtual machine it runs on, however, occupies about 8gb of disk and at
> least 768mb of RAM. All that space is to store an entire CentOS Linux base
> installation, [...]

The exo-kernel and uni-kernel people have the opposite idea: go all the way to
virtualization, and eliminate the traditional operating system.

You can still have all the conveniences of the programming models you like,
but eg file-systems and network protocols will be implemented in user level
libraries.

~~~
ezrast
I can easily see a world where history repeats itself here. VMs get thinner,
hypervisors get fatter to make up the difference, all the other layers of
abstraction get squished out of the system, and what we're left with is just
another process model. It's a model built on different assumptions than the
one we're used to, with stronger guarantees about isolation than we might be
comfortable with at first, but one that has an appeal even for desktop
computing.

Later, once everyone has a hypervisor in their pocket, we'll tell our children
about how cloud infrastructure used to be so expensive that only big
corporations could maintain it, and it came in server racks the size of
refrigerators.

~~~
eru
No, the idea is not to make hypervisors fatter. Just the opposite: put the
abstractions into userspace.

------
justinsaccount
This is one of the reasons why the kubernetes network model[0] is kinda neat.
Every service running in each pod can bind to port 80 without any conflicts
because every pod gets a dedicated ip address.

[0] [https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model)

~~~
lend000
Right, I read this article as "IPv4 is causing climate change." If you have
enough IP addresses, dedicated port conventions don't bother me at all.

~~~
ryukafalz
Yeah, but even if each user has their own dedicated IP, they can't bind to a
privileged port unless they're root. So it's still a problem, no?

~~~
phil21
No, not at all. This problem was solved decades ago via a myriad of different
workarounds.

You can simply turn off privileged ports if you feel like it. Or do things
like setuid, as in how Apache starts as root but switches to an unprivileged
user immediately.

It's not ideal, but to say the privileged port thing is an issue is bizarre to
me. It's the least interesting item the author brings up in the article, and
certainly was _not_ the primary driver behind multi-tenancy virtualization.

Better yet is using reverse proxies - service providers have been doing this
for ages. A single shared HA cluster of haproxy, that maintains records for
each individual application/tenant that lives wherever it likes. This is the
model I prefer, since it allows me (as a sysadmin) to protect the developers
from themselves by being able to easily filter at layer 7 in front of their
application as needed. Also lets you direct traffic easily, and is generally
the model used by any container orchestration service.

Very easy to expose all that via various APIs and tooling so users can self-
service.

------
ChuckMcM
It was a fun read, and there is a clever thought there which I'll get to in a
minute. But first ...

The author gets a lot wrong; that isn't uncommon, because most software people
see a "server" as a set of numbers: "A" bytes of memory, "B" bytes of disk
space, and "C" hertz of clock speed.

Computers aren't numbers of course, they are systems, and the systems are a
collection of moving parts that are connected by linkages which constrain and
amplify the various parts. In system design there is a concept of 'balance'
which discusses how the linkages enable and constrain such that an
understanding of the amount of "work" that can be done by the system is
understood.

There is a great example in the article comparing a Raspberry Pi to a
SPARCstation 10. Most of the software the author used "back in the day" is
still around in one form or another, and a RasPi is cheap, so it's easy to
create a modest recreation of the environment from that time and to build
simulated users who can access the "server". Just two or three users and the
Pi will "choke". Understanding how a system with so many "bigger" numbers is
less capable of doing the same "work" as one with much smaller numbers is a
useful exercise to run if you are into systems analysis. The system the Pi
was _designed_ to run, and the one for which it is fairly well balanced, is a
smartphone: something the SPARC 10 would truly suck at.

The clever idea, however, is making network resources just another resource,
like disks. Interestingly enough, with IPv6 that is much easier than it was
before (to the point about how things were done in networking before, vs.
now). It is straightforward to create a model of network interfaces that is
equivalent to serial lines. Have the kernel set up an IPv6 subnet with a
block of 32K addresses and let user programs open /dev/net/<1> through
/dev/net/<32766>; each "net" device gets its own IP and comes with its own
set of port numbers, etc. IPC is just networking, and a cleverly coded kernel
running on a machine with a 64-bit virtual address space could do direct data
placement. No need for even any page-table mashing.

~~~
ScottBurson
> Just two or three users and the Pi will "choke".

Do you really know this? What's the limiting factor?

I've never played with a Pi and I don't really know, but I don't see offhand
why it would be so bad. Interrupt-handling and context-switching times should
be much better. There's plenty of main memory. The file system should be
faster, running on flash. I don't see where the bottleneck would be.

~~~
dfox
Probably the only limiting factor in the hardware is that the R-Pi does not
have any meaningful high-speed, low-latency I/O.

But on the software side there is one huge problem: today's Linux kernel and
userspace are incredibly bloated in comparison to early-90s SunOS. On the
other hand, said SunOS probably did not support many things that everybody
expects today, like scalability to large-ish SMP systems (i.e. >2 CPUs),
shared libraries, loadable kernel modules, threads...

~~~
ScottBurson
SunOS definitely had shared libraries -- I think those go back at least to
4.3BSD.

Anyway, how much I/O do you need to support people running terminal apps over
Ethernet?

~~~
dfox
What I meant by the I/O is that the aforementioned SPARCstation had an
Ethernet card connected to SBus, which was capable of directly producing
interrupts and doing DMA, and all the modems were connected through some
multiport serial card/concentrator which tried very hard to offload work from
the CPU. That is a completely different situation from an R-Pi with
everything hanging off USB, with its "novel" approach to interrupts.

------
vinceguidry
I just use iptables to forward all traffic on port 80 to port 8080. You can
just replace your /etc/rc.local with this:

    iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
    iptables -t nat -I OUTPUT -p tcp -d 127.0.0.1 --dport 80 -j REDIRECT --to-ports 8080
    exit 0

Simple enough to add to a configuration management workflow. I used to
maintain a configurable reverse proxy, until I decided that was way too much
work and to only put one web facing app per server.

edit: rc.local may not be the best place to put these rules, check this link:

[http://bencane.com/2011/12/30/when-its-ok-and-not-ok-to-use-rc-local/](http://bencane.com/2011/12/30/when-its-ok-and-not-ok-to-use-rc-local/)

~~~
jerf
Unprivileged users can't use iptables, so that doesn't change anything about
the blog post. And I'd suggest that if it's so easy, yet still doesn't get
done, that also leaves the blog post intact. In a way it doesn't help that
there are a thousand ways to do something, so none of them get done.

This seems to be a common misunderstanding of the blog post. Yes, there's a
ton of ways for root to create privileged ports, and a ton of ways to delegate
it in various ways. But all of them require actions by root, and sysadmining,
and aren't as secure as a system designed to work this way from the beginning
would be, so nobody uses them, so in terms of addressing his discussion
points, they might as well not exist.

~~~
salgernon
But surely, creating the user requires sysadmin privileges, and as part of
setting up the user, the appropriate work could be done to allow whatever
hole-punching is required. The reality, though, is that if Mary, Joe, and Tim
all want to listen on port 80, there's going to be a mess; hence using a
common server (on a privileged port!) to demux incoming requests.

------
haikuginger
Even if port 80 were unprivileged, wouldn't you still run into issues if
multiple users attempted to bind to the same port?

Really, what would the problem be with just deploying an nginx instance that
listens on 80 and routes by domain?
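A minimal sketch of that nginx setup, for the record (hostnames and backend ports here are made up; each user's app listens on its own high port):

```
server {
    listen 80;
    server_name alice.example.com;
    location / { proxy_pass http://127.0.0.1:8081; }
}
server {
    listen 80;
    server_name bob.example.com;
    location / { proxy_pass http://127.0.0.1:8082; }
}
```

Only the proxy needs the privileged bind; adding a tenant is one more `server` block.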

~~~
mac01021
Well, someone has to administer the nginx routing config, or there needs to
be a mechanism for each webapp sitting behind nginx to register itself as a
forwarding target.

What I want to know is why can't we get all the major web browsers to look for
(and honor) SRV records, so that www.mydomain.com can transparently have the
browser connect to port 30338 (or whatever port my app is listening on). Then
we dispense with the need for proxies altogether.
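The fallback logic such a browser would need is simple enough to sketch (the `lookup_srv` parameter is hypothetical, standing in for a real SRV resolver):

```python
def resolve_http_endpoint(host, lookup_srv):
    """Hypothetical browser logic: try an _http._tcp SRV lookup,
    fall back to port 80 on the bare hostname if none exists.

    lookup_srv(name) returns a list of (priority, weight, port, target)
    tuples, or an empty list when there is no SRV record.
    """
    records = lookup_srv(f"_http._tcp.{host}")
    if not records:
        return host, 80  # legacy behavior
    # RFC 2782: lowest priority wins; higher weight preferred within it
    records.sort(key=lambda r: (r[0], -r[1]))
    priority, weight, port, target = records[0]
    return target, port
```

With a record like `_http._tcp.www.mydomain.com -> (10, 5, 30338, "app.mydomain.com")`, the browser would connect to app.mydomain.com:30338; without one, it behaves exactly as today.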

~~~
eximius
All major protocols should, IMO.

------
nicoburns
Isn't this exactly what shared hosting is? I bet companies like dreamhost
regularly put 50 clients on a box. And Webfaction[0] will even give you pretty
open shell access to a shared environment.

[0] [https://www.webfaction.com/](https://www.webfaction.com/)

~~~
varenc
Indeed this is exactly what webfaction does!

They get around the port issue by using a nginx frontend that uses the Host to
route each request to the correct apache instance each user is running out of
their home directory. All these apache instances are bound to higher
unprivileged ports.

~~~
Spivak
Imagine a world where you don't need a router! A world where the browser would
look up a http/s service record for the host and connect directly to the
unprivileged port.

------
agentultra
There are plenty of trade-offs but network resource ownership?

For a datacenter isn't it more efficient to pack workloads across dense
machines than to over-provision idle ones?

While a factor, maybe, I don't think privileged ports are causing excess CO2
production.

I have been interested, though, in low-power edge networking with
uni-kernels; it would be an interesting project to work on (think micro-DCs
with homogeneous ARM-based boards at the edge, and a dynamic system to boot
up uni-kernels on demand closest to the requesting user... a system that
mostly stays off unless it's needed).

------
klodolph
The biggest change that would have made this much easier, in my mind, is
adding port numbers to DNS. Instead of having HTTP at port 80, have the port
number for HTTP returned as part of the DNS query. You could allocate a block
of ports to each tenant on a host and let them use them as they see fit; you
wouldn't need the low ports for anything.

~~~
throwaway2048
[https://en.wikipedia.org/wiki/SRV_record](https://en.wikipedia.org/wiki/SRV_record)

SRV solves this problem nicely, unfortunately SRV support for HTTP 2.0 was
rejected.

------
rch
I'm pretty sure most or all of this port/sudo nonsense was fixed with Plan 9.

~~~
dullgiulio
Well, in Plan 9 ports are just files and have an owner, group, and
permissions. As simple as that.

------
mikegerwitz
> alter package managers to allow installation of packages into a subtree of
> the user's home directory if the user is not root, etc.

Guix and Nix. I haven't used Nix, but Guix even allows ad-hoc containers
running only specific programs and their dependencies:

[https://www.gnu.org/software/guix/manual/html_node/Invoking-guix-environment.html](https://www.gnu.org/software/guix/manual/html_node/Invoking-guix-environment.html)

That doesn't help with the privileged port problem, but per-user services can
be dealt with.

------
foxhop
This work was already done in other "Unix" environments like Solaris and BSD
(zones and jails).

My personal favorite is SmartOS (based on illumos) it runs zones on bare metal
and even has support for running Linux containers (docker)! It does this by
wrapping the Linux syscall APIs and translating them. Magic stuff!

Launch a Linux native container and run 'ps' and you will only see processes
that you own!

reference:
[https://wiki.smartos.org/display/DOC/Home](https://wiki.smartos.org/display/DOC/Home)

~~~
rsync
I came here to say just that ...

A FreeBSD jail does not emulate or create a virtual machine; it's just a
fancy chroot mechanism that contains only the Unix processes that actually
get run inside the jail.

A jailed httpd does not take up any more resources than the exact same httpd
run on the base system.

This makes it extremely efficient and, in fact, allows you to create an _even
richer multi-user platform_ than the original one the op has nostalgia for: a
multi user unix system where everyone gets to be root.

------
cwp
I've long thought that a combination of NixOS and illumos would be incredibly
useful. Software would be installed into the nix store in the global zone, and
mapped into dedicated zones for each user. IP and port virtualization via
Crossbow.

User zones would be extremely light, with your typical database-driven web app
taking _no_ space in the zone, and very little space in the nix store.

------
pjc50
The "privileged port" model was basically dead at the time of the Morris worm
30 years ago. It should have been sorted out in that time but there was never
the momentum.

Everything in UNIX is a file, apart from the things that aren't. If ports
_were_ files, you'd be able to chown them or put them in a group for
delegation purposes.

------
davexunit

        Container solutions like Docker get us part of the way there. In a
        sense you could argue that containerization is precisely the
        multi-tenancy solution I'm heading toward, except that it borrows
        heavily from the legacy path of virtualization by treating system
        images like giant statically linked binaries.
    

Nailed it. This is why Docker does not excite me at all, and why I think
there's room for other container systems to improve upon the Docker model by
solving this problem. I made my initial attempt a while ago by adding basic
container support to a package manager that allows users to use a virtualenv-
like tool to create containers to hack in:

[https://www.gnu.org/software/guix/news/container-provisioning-with-guix.html](https://www.gnu.org/software/guix/news/container-provisioning-with-guix.html)

~~~
lowbloodsugar
"Container solutions of the kind that are like docker" vs "Container solutions
in general of which docker is an example". If he is using docker to write-off
all container solutions, then that's a mistake, since it sounds like using the
underlying container solution (on which docker then adds the ability to manage
system images) is exactly what he wants.

~~~
davexunit
Yeah, perhaps namespaces will satisfy them, but from what I've seen network
namespaces leave something to be desired. I found that it wasn't too hard to
roll my own container implementation and hook it up to a tool that was already
providing software environments to unprivileged users, so I think there's
plenty of room for other people to take the primitives that Linux provides and
run with them.

------
bjt
His proposed solution actually looks a lot like one early flavor of LXC
implementations, where instead of unzipping a big tarball with your own
environment, you'd just bind-mount system folders into the container. I
suspect LXC devs are reading this post and thinking "we already have this!"

------
slaymaker1907
The issue with apt install has more to do with a poor packaging mechanism on
Debian's part than with networking. Windows installers, for once, are
actually superior in this regard, since they generally provide an option to
install for either the system or only the current user.

Also, to provide multi-tenant hosting, you could just create some sort of root
level reverse proxy (possibly Nginx or Apache) with some sort of registration
system for domains. You just use the Host header to multiplex the system's
port 80/443. For localhost stuff, I think you could set up something with DNS
to be [http://{username}.localhost](http://{username}.localhost). It would be
kind of an interesting idea to do as a happy medium between serverless and
separate VMs.
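The Host-header multiplexing described above can be sketched as a small dispatch function. This is purely illustrative: the tenant table and backend ports are invented, and a real deployment would use Nginx or Apache as the comment suggests.

```python
# Minimal sketch of Host-header dispatch for a shared front-end proxy.
# The tenant map and port numbers below are hypothetical.

def backend_for(host, table, default=None):
    """Map an HTTP Host header to a tenant's high-numbered backend port."""
    name = host.split(":")[0].lower()   # strip an explicit ":443" etc., normalize case
    return table.get(name, default)

tenants = {
    "alice.example.com": 8001,  # alice's app on an unprivileged port
    "bob.example.com": 8002,
}

print(backend_for("Alice.example.com:443", tenants))  # -> 8001
```

The same lookup would work for a hypothetical `{username}.localhost` scheme: the proxy strips the subdomain and routes to that user's registered port.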

~~~
api
Apple had the right idea with .app bundles. There should be no such thing as
"installation." Android and iOS kind of get it right too.

~~~
mod50ack
Problem with Apple's .apps is that they still manage to mess up your
~/Library. You can often find gigabytes of data left in there on your average
Mac.

------
colanderman
Easy solution: multiple IP addresses (one per user), and setuid wrapper to
open listening socket on port 80 of the user's IP address and pass it to the
web server after dropping privileges. No containers, VMs, or redesign of Unix
needed.
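The bind-then-drop-privileges pattern described here can be sketched in a few lines. This is a minimal illustration, not a hardened wrapper: the uid/gid are placeholders, and binding below port 1024 requires root (or CAP_NET_BIND_SERVICE on Linux).

```python
import os
import socket

def bind_then_drop(port, uid, gid):
    """Open a listening TCP socket while still privileged, then
    permanently drop to an unprivileged user before serving."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", port))  # needs privilege when port < 1024
    s.listen(128)
    os.setgid(gid)  # drop gid before uid (a real wrapper would also call os.setgroups([]))
    os.setuid(uid)  # irreversible: no way back to root after this
    return s        # hand the bound socket to the actual web server
```

After the drop, the real server can inherit the already-bound file descriptor across an exec, which is essentially what inetd-style and systemd socket activation do.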

Or put them all on different ports and front it with a proxy. Run only one
MySQL instance.

Of course, containers and VMs aren't meant to solve this problem anyway.
Containers are about deployment, and VMs are about virtualization/migration.
Your customers would do well to use the former (for ease of deployment), and
you the latter (for ease of maintenance).

Privileged ports really have nothing to do with this problem.

------
mrmattyboy
Great article... But, forgetting the (apparently GBs) of size, isn't this
really what docker and such like are trying to achieve? (maybe I'm missing
something). Especially when it comes to only storing a single instance of the
base image if all 50+ users are sharing the same image? So it will only store
the differences in each container. It sorts out your network issues, user can
be root in the container and bind to whatever port they like and IP addresses
are assigned to each container... What other overheads are there? (this is a
rhetorical question - you seem much more knowledgeable than myself) :)

~~~
mrmattyboy
Just a quick apology - that was meant to read "this isn't a rhetorical
question".. I was truly asking :D

------
FeepingCreature
Why not just run a single web server and set it up to serve public_html from
people's home folders?

The difference to that multiuser system is that we want people to manage their
own infrastructure. That's the real reason for the bloat - no matter how much
that VM server costs, I bet you it costs less than paying people to administer
a shared infrastructure. It has nothing to do with ports.

~~~
catern
>Why not just run a single web server and set it up to serve public_html from
people's home folders?

Doesn't work with dynamic content beyond CGI, which is too slow and usually
not supported by modern web frameworks.

~~~
pmoriarty
Could you elaborate on why it doesn't work with dynamic content beyond CGI? Is
this some kind of insurmountable limit, or is it just the web frameworks don't
care to support it?

~~~
catern
It's not an insurmountable limit; it's not the web framework's fault either.
It's that there's not a way for an individual user to indicate to a shared
system-wide webserver, "Here is the FastCGI (or whatever) socket that you
should connect to, to generate dynamic content for my home directory." But
certainly a way to express that could be created.
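One hypothetical way to express it, if the shared webserver were Nginx and each user registered a socket path by convention (the username and paths below are invented for illustration):

```nginx
# Hypothetical convention: ~alice's dynamic content comes from a FastCGI
# process she runs herself, listening on a socket in her home directory.
location ~ ^/~alice/ {
    include fastcgi_params;
    fastcgi_pass unix:/home/alice/.fcgi/app.sock;
}
```

The missing piece is exactly what the comment says: a mechanism for users to register such mappings dynamically, rather than a root-edited config file.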

------
exabrial
Service location in the TCP stack is a bad idea. Why port 80 _every_ time for
http? Put that into DNS SRV! Boom! IPv4 crisis solved (for now)
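For reference, an SRV record in a zone file looks roughly like this (names and port are hypothetical); a client would look up the service name and connect to whatever port the record advertises:

```
; _service._proto.name   TTL  class type priority weight port target
_http._tcp.example.com.  3600 IN    SRV  10       5      8080 www.example.com.
```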

------
rnhmjoj
I'm happy to know I'm not the only one who wants the death of privileged
ports. When I proposed this for NixOS I wasn't exactly well received:
[https://github.com/NixOS/nixpkgs/issues/11908#issuecomment-250979432](https://github.com/NixOS/nixpkgs/issues/11908#issuecomment-250979432)

~~~
api
"systemd will handle that"

No, no, no: handling with complex logic what can be achieved instead with
simple design is how Windows got its suckage. Sigh.

------
sinxoveretothex
I was a little bit annoyed (enough to comment) by the use of the wrong units.
'mb' is millibit, perhaps millibyte, but certainly not megabyte (MB or MiB).

Similarly:

> it pushes about 5-10 megabits of traffic most of the time

Bits are a unit of information, not flow. Probably the author meant Mib/s.

~~~
CydeWeys
This annoyed me too. The author does not seem to be aware that case is highly
important when dealing with SI/binary prefixes as well as unit abbreviations.
If you don't even know (or care) that "b" is a bit and "B" is a byte, and you
use them incorrectly, how am I supposed to trust your technical knowledge
about the rest of the stuff you're talking about?

~~~
eximius
The author is the creator of ZeroTier. He is likely aware and didn't care
because it's extremely pedantic and most people know what he means in informal
settings, which this is. I expect he'd use the correct case in an RFC or
something.

~~~
CydeWeys
Getting abbreviations for bits/bytes correct is not "extremely pedantic", it's
about communicating correctly. In a network context, which this _is_ , "b"
means bits, but he was using it to refer to bytes. Also I've never heard of
ZeroTier, but even if I had, I probably would not have made the connection
that he was the author of it, so being correct about these things is important
for establishing credibility with new audiences.

~~~
eximius
IIRC, most if not all uses of 'mb' or 'gb' were about disk space or RAM, so
not really a network context - this is about bloat, ports are just the
whipping boy.

And I'm not against pedantry in the right context, but this is just a casual,
relatively nontechnical rant. Pedantry is really not needed.

------
adrianratnapala
The original post seems a bit unix-centric. Much of what he is complaining
about comes from the nature of TCP/IP networking, Unix can only be blamed to
the extent that it doesn't abstract things like ports away.

The general thrust of his post is to fix Unix so that its (ohh so well
respected!) permission model can extend into the cloud. This will run into
friction when more than one OS is involved.

That said: anything that unifies all the different kinds of virtualization as
the OP wants would need to be an OS. And the only plausible candidate for an
OS lingua-franca in this world is Unix.

So long live unixcentrism?

------
apenwarr
Since we're here anyway: if we would just change web browsers to connect to a
default port number obtained via DNS instead of always using 80/443 by
default, then we could offer web servers through NAT gateways (by hosting up
to 65535 web servers on a single IPv4 address). We'd need some way to tell the
NAT we want to expose a port (some variant of uPnP or some less-dumb
protocol), but other than that, it would be easy. And then we'd have
effectively 65535x as many IPv4 addresses, which would be enough for everyone,
permanently, and IPv6 wouldn't be needed.

~~~
colechristensen
What? This isn't necessary at all. HTTP includes the requested hostname in the
request so you can have as many DNS names as you want point to a single IP. No
NAT or other magic necessary. The entire HTTP Internet could be hosted on a
single IP.

~~~
rictic
True but if the sites were served by different processes then there would have
to be a trusted reverse proxy process to dispatch.

------
rocqua
I'd point out that currently, access to port 80 for a domain is access to a
certificate for that domain.

This means that, stupid or not, HTTPS essentially is as secure as privileged
ports are.

~~~
nly
Slightly more secure, since you can have multiple A records on a domain and
your domain verification should really check them all.

It's more fair to say HTTPS is basically as secure as DNS. If someone hijacks
your DNS for 5 minutes they can have a TLS cert from LetsEncrypt for your
domain for 90 days

------
mindslight
> _(USB is slowly replacing the lighter plug, but most cars still have them.)_

And yet, that's just trading for different path dependence annoyances. 5V,
500mA limit (or proprietary higher-voltage hackland), and that feeling of
connecting your valuable phone to a low-bid switcher? Airports, cars, home
outlets - I'll choose the appropriate third party device every time! I just
wish that car power socket wasn't so dildonic so that more outlets would come
already built in.

------
rythie
Ultimately the issue is that IPv4 addresses are expensive. Allowing multiple
users to bind to port 80 on different IP addresses assumes you can have
multiple IP addresses on that host. Unless they are private IPs, that's going
to be the main issue.

Any practical solution would need a system wide nginx to proxy to all of the
tenants, who would run their apache/nginx/node etc. on a high numbered port.

------
nailer
> How did we end up with the nested complexity explosion of OS->VM->containers

If someone is putting containers into VMs they've eliminated the performance
benefits of containers and added an additional layer of complexity for no
reason: ie they don't know what they're doing and wanted to run Containers on
their existing VM only cloud platform.

------
leeoniya
does [https://github.com/redox-os/redox](https://github.com/redox-os/redox)
address any of these pain points?

i asked: [https://github.com/redox-os/redox/issues/987](https://github.com/redox-os/redox/issues/987)

------
mioelnir
Well, FreeBSD's mac_portacl(4) has been around since 5.1-R (June 2003) and
allows per-user ACLs on privileged ports. Although the permission is for all
IP addresses, not a specific one. But one could create a virtual network
device per user and assign it a mac_mls policy to restrict that interface to
that user... hmmmmm...
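For illustration, a mac_portacl setup along these lines might look like the following; uid 1001 is a placeholder, and mac_portacl(4) documents the exact rule grammar:

```sh
# Load the policy module and allow uid 1001 to bind TCP ports 80 and 443.
kldload mac_portacl
sysctl security.mac.portacl.rules="uid:1001:tcp:80,uid:1001:tcp:443"
# The stack's own reserved-port check must also be relaxed,
# or the ACL never gets consulted.
sysctl net.inet.ip.portrange.reservedhigh=0
```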

------
mugsie
We also currently have the technology, and could have allowed HTTP/2.0 to
support this, if we just used SRV records.

If we had SRV records, you could use whatever damn port you want, and it would
be invisible to users.

It would also allow us to not have to get load balancers for most (definitely
not all) setups - but that is a different rant.

------
garaetjjte
This wouldn't change anything, because you cannot run multiple service
instances on the same port, so users would still have to specify the port in
the URL. An SRV record in DNS could help, if anything supported it. But it is
not needed now, because with IPv6 we have enough addresses to assign each
process a unique IP.

------
csours
Good enough + soon enough beats "best possible", or even "better later".

In this case Ethernet + TCP/IP beat OSI.

Also, JavaScript beat any number of sensible scripting languages. Houses are
frequently constructed such that they are not very serviceable.

~~~
rjsw
I'm not sure your example matches your mantra. OSI was usable before ISPs
became a thing; you ran OSI over X.25.

~~~
csours
X.25 isn't the whole networking stack. Elements of OSI were available, and are
still available, but most people use something else because it's easier: good
enough, soon enough.

~~~
rjsw
I was developing for OSI at a time when it was not possible to do equivalent
things on a WAN in Europe using TCP/IP.

TCP/IP didn't win by being first.

~~~
csours
How did TCP/IP win?

~~~
rjsw
I'm not sure that the OSI people even realized that they were in competition
with anything else. I think they basically neglected the low end of the
market.

At the time there were multiple LAN protocols that were mostly used for file
sharing, Netware, Appletalk, NetBEUI, etc. You had NFS for TCP/IP but it
wasn't really used for anything other than UNIX workstations.

Probably it would have required somebody to write a cut-down OSI stack for MS-
DOS that could be linked to a particular killer application (whatever that
was, maybe FTAM).

This would still have had to compete with the way that TCP/IP was able to
swallow up the other LAN protocols, though we wouldn't have had to go through
the IPv4 to IPv6 migration.

------
bsder
Umm, the real problem is that I can't discover what _port_ you are on.

So, in the article's instance, the machine owner needs to set up a web server
on default port 80 that redirects to the correct web server on the correct
non-privileged port.

~~~
Spivak
Sure you can, this is the entire point of service discovery. We've built so
much crap to deal with the fact that web browsers don't resolve SRV records
(and, at least for HTTP/1.1, are prohibited by RFC from doing so).

------
CydeWeys
The article does not mention backups or standardized images at all. These are
huge reasons why virtualized machines are attractive, and which multi-tenant
OSes don't provide.

------
z3t4
In the wordpress example you could use named pipes/sockets, but it's rather
complex to manage. A better idea is to hand out ipv6 addresses and let users
bind to them.

------
woliveirajr
Why was the original title ("Privileged Ports Cause Climate Change") changed ?

~~~
dullgiulio
I guess because it's quite silly, grossly imprecise and click-bait-y.

------
gaius
If DNS returned IP:port for hostname, that would sort it too.

www.foo.com --> sharedserver:8000, www.bar.com --> sharedserver:8001 and so
on.

~~~
aaronmdjones
It can, with SRV records, which is what things like ActiveDirectory and XMPP
use to discover the correct server and port for a given name.

Its use for HTTP (and most other L7 protocols in general) just never really
caught on.

~~~
gaius
I believe Consul can do this and it's a pretty nice capability but the effort
of retrofitting it onto things that just assume well known ports would be
phenomenal.

I mean even if you know you want http the "right" thing to do is look up the
port number in /etc/services but who ever does that? We all just hardcode 80.
So that's another potential technique that we can't use...

------
pishpash
This article makes no sense. Virtualization's first use was to run a guest OS
on a different host OS. Containerization's first use was to achieve
reproducible configuration at a lower cost than virtualization. Neither were
trying to solve a multi-tenant hosting problem where everybody is running the
same system -- were it that simple!

~~~
nickpsecurity
The first use of virtualization that I can remember was VM/370. It did the
things you said plus hosted systems for many users connecting via terminals.
Supporting legacy systems and getting higher utilization with multiple
workloads were also advertised benefits of VMware on x86. IBM and NUMA vendors
touted isolation of multiple workloads or users with things like LPARS. The
cloud providers now claim a lot of this stuff.

So, it seems like it was intended to do these things going back to the 70's.
They also designed insecure, bloated solutions that INFOSEC founders like Paul
Karger called out. That resulted in better designs like KVM/370, KeyKOS,
VAX VMM, separation kernels, and recently mCertiKOS. Most peddling
virtualization for multiple users still use bloated, untrustworthy components
though.

