
Misadventures in process containment - yorwba
https://apenwarr.ca/log/20190111
======
sgentle
Is it just me or is the cloud era of developer tooling fundamentally weird? It
seems like there used to be this tacit assumption that a good tool is one that
starts small with you and grows as your needs grow. How do I log in to a
remote machine? ssh. How do I log in to hundreds of remote machines? ssh. How
do I reverse tunnel my local webserver through a firewall that only allows
HTTPS traffic? ssh.

Of course, there were always systems that asked a lot from you before they
gave anything back. I think everyone's first experience of SQL was probably
"what?" followed by "no, really, what?" Kneel at the altar of the Cartesian
product, my child, that you may be reborn into the kingdom of relational
algebra. But, hey, if apostasy is your bag there's still nothing stopping you
from putting all your data in one big table. If you're good enough at it you
might even snag a job at Google.

But the Dockerverse is really something else. No shade on the technology –
it's really cool stuff – but boy does it love to tell you how big and complex
your problems are. You want to build images on one machine and run them on
another? Sounds like what you really need is a Secure Container Registry! Want
to have services with dependencies? Forget init^H^H^H^Hsystemd, try Docker
Compose! Just kidding, I mean Docker Swarm! Just kidding, I mean Kubernetes!
Actually you kind of want all of them maybe!

I don't think this is a weakness of the software itself; you can find simple
ways to use containers if you try hard enough. But that difficulty really
speaks to the motivations of the cloud services industry. Using Docker for
simple problems is asking a car salesman for whatever will get you from A to
B. Hey, buddy, you don't know what you're missing. Everyone thinks they want
something simple until they see our amazing upgrade options. And what about
safety? Think about what would happen to your family if you were caught in a
tragic Byzantine Fault.

And, yes, absolutely, if you have Big Cloud Problems, the cost of finding the
right cloud-native service discovery mesh is near-zero when amortised over
thousands of servers worth of existential dread. But I just can't help but
feel like I'm preparing for cloud scale like I'm preparing for a zombie
apocalypse. It's an interesting problem space, and a great excuse to hang out
on the tactical flashlight subreddit, but I'm not sure these tomorrow
solutions actually help me with my today problems. The people selling the gear
sure are doing well, though.

~~~
lkrubner
And you have to go double-or-nothing at each stage, which is the real risk.
Docker didn't solve everything? Try Kubernetes! But wait, Kubernetes didn't
solve everything? Maybe try it with Rancher! Or give up and go to Mesos?

I've worked with startups that have invested more and more money in an attempt
to get the assumed benefits of containers. The double-or-nothing aspect is
what had me worried when I wrote "Docker is the dangerous gamble which we will
regret":

[http://www.smashcompany.com/technology/docker-is-a-dangerous...](http://www.smashcompany.com/technology/docker-is-a-dangerous-gamble-which-we-will-regret)

------
JNRowe
Great read about containers aside, I think the most interesting point for me
is that `redo` is getting some love. 150 commits in the past couple of months
after a six-year hiatus, including some great documentation improvements.

--

Anyone familiar with the mentioned `bupdate` tool and zsync¹? I'd love to read
a comparison, as I've only used the latter.

1. [http://zsync.moria.org.uk/](http://zsync.moria.org.uk/)

~~~
apenwarr
I'm the author of the article in question. I didn't know about zsync! Thanks,
I'll link to it from the footnote.

Flipping quickly through the zsync docs, it looks very well done. I'm not sure
if the way they adapted rsync is better or worse than the bup/bupdate way of
doing it; it looks like they take more time to do the initial encoding of the
index file, but that's not very important since you only do it once,
especially if it saves bytes.

They also have a (complex and potentially error-prone) way to look into .gz
files and sync them extremely efficiently even without gzip --rsyncable.
That's really cool, but risky, and of course only works with gzip, not other
compressors. Not sure if that's a good idea or a bad idea, but nobody forces
you to use it.

tl;dr zsync has actual documentation and an actual release, so you should
probably use it instead of bupdate.
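To make the block-sync idea concrete, here's a toy sketch (not zsync's or bupdate's actual wire format, and block size chosen arbitrarily) of the approach both tools share: the server publishes a checksum index of the latest file, and the client compares it against what it already has to decide which blocks to fetch.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative; real tools tune this carefully

def block_index(data: bytes) -> list[str]:
    """Server side: checksum each fixed-size block of the published file."""
    return [
        hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]

def blocks_to_fetch(local: bytes, remote_index: list[str]) -> list[int]:
    """Client side: which remote block numbers are missing locally."""
    have = {hashlib.sha1(local[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(local), BLOCK_SIZE)}
    return [n for n, csum in enumerate(remote_index) if csum not in have]
```

The real tools go further than this sketch: they use a cheap rolling checksum so matches are found at any byte offset rather than only at aligned block boundaries, which is the part zsync and bup/bupdate adapted from rsync in different ways.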

(People should feel free to ask me any questions about redo, bupdate, the redo
container builder, etc in the comments here if you like.)

~~~
JNRowe
> They also have a (complex and potentially error-prone) way to look into .gz
> files and sync them extremely efficiently even without gzip --rsyncable.

FWIW, the last release of `zsync` predates `gzip --rsyncable` by about six
years¹.

> People should feel free to ask me any questions

"Thanks" isn't a question, but thanks for creating `redo` and the great
articles. I always seem to come out of them having learnt something new, and
often end up rabbit holed in the interesting side topics that are raised too.

1. [http://git.savannah.gnu.org/cgit/gzip.git/tree/NEWS#n74](http://git.savannah.gnu.org/cgit/gzip.git/tree/NEWS#n74)

~~~
apenwarr
You're welcome :)

The zsync docs clearly talk about "gzip --rsync", which I guess must have been
an earlier version of "gzip --rsyncable".

------
zimbatm
The only good thing about the Dockerfile format is that most developers can
write one easily. Here are a few of the issues:

* Dependencies are best represented as a tree. Dockerfiles force you to linearize that tree.

* It's not possible to compose two or more Dockerfiles together.

* If `COPY` or `ADD` instructions are used, all the files are sent by the client to the daemon, including timestamps and UIDs. This breaks caching badly, as two different users building from identical sources will produce different images.

* In general, Dockerfiles are not bit-reproducible; two developers building the same Dockerfile will get different outputs.

The underlying v2 image format has content-addressable layers which is great.
It means that in theory it's possible to upgrade the base OS layer without
rebuilding the rest.

PS: Nix's dockerTools.buildLayeredImage fixes all those issues and can take
advantage of the CAS format.
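A toy model of why the content-addressable layer format helps (the layer contents and two-layer image here are made up, and real v2 manifests are JSON documents with media types, not bare digest lists):

```python
import hashlib

def digest(layer: bytes) -> str:
    """Content address of a layer blob, v2-image style."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()

def manifest(layers: list[bytes]) -> list[str]:
    """An image is, conceptually, an ordered list of layer digests."""
    return [digest(layer) for layer in layers]

base_v1 = b"debian:9 rootfs"    # hypothetical base OS layer
base_v2 = b"debian:10 rootfs"   # upgraded base OS layer
app     = b"my app files"       # unchanged application layer

old_image = manifest([base_v1, app])
new_image = manifest([base_v2, app])
# Only the base layer's digest differs; the app layer keeps the same
# digest, so its stored blob is reused without a rebuild or re-upload.
```

In practice this only works cleanly when the upper layers don't bake in anything from the base layer, which is part of why it stays "in theory" for ordinary Dockerfile builds.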

------
peterwwillis
Wow, I had no idea about _gzip --rsyncable_. I haven't used rsync in years, so
I guess it never became necessary. (Also, I stopped using gzip years ago...)

------
ryanpetrich
Odd that apt-get was mentioned as a tool that could use incremental downloads:
it has supported them since 2006.

~~~
apenwarr
As far as I know, it does this by having a base file and a bunch of delta
files alongside it on the server. Applying these deltas is time-consuming, you
have to choose how many levels of delta to keep around, and you end up
downloading a lot of redundant content (i.e. information about old packages,
only to replace it with the delta content).

bupdate avoids that by just letting you post only the latest file (and .fidx),
then the client figures out which blocks it doesn't yet have.
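The delta-chain cost is easy to see in a sketch (this uses a made-up (offset, bytes) patch format, not apt's actual pdiff encoding):

```python
def apply_delta(data: bytes, delta: list[tuple[int, bytes]]) -> bytes:
    """Apply one delta: each entry overwrites bytes at an offset."""
    out = bytearray(data)
    for offset, patch in delta:
        out[offset:offset + len(patch)] = patch
    return bytes(out)

def catch_up(old: bytes, chain: list[list[tuple[int, bytes]]]) -> bytes:
    """Delta-chain style: a client that is k versions behind must
    download and apply all k intermediate deltas, in order."""
    for delta in chain:
        old = apply_delta(old, delta)
    return old
```

With the index approach, by contrast, the amount of work is a single comparison against the latest index no matter how many versions behind the client is, and the server never has to decide how long a delta chain to keep.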

------
je42
Hmm, I wonder why he doesn't mention multi-stage Docker builds. They're
clearly more limited than redo, but for small and simple dependency trees
they're pretty OK-ish.

