
Fragile narrow laggy asynchronous mismatched pipes kill productivity - trishume
https://thume.ca/2020/05/17/pipes-kill-productivity/
======
btreecat
So I have been thinking about software projects lately, and I have come to the
conclusion that a lot of these tools/solutions exist to "build houses" when
most of us are just throwing together lean-to sheds and dog houses.

Software projects today are naturally more complex and have more complex
tooling the same way building a house today requires more knowledge and skill
than it did 50 years ago.

Then there are some folks/organizations building cathedrals, and the
associated tooling (react, angular, maven, etc) and all the rest of us look up
in awe and think "well I guess if I want to be that good I need to use those
tools on this dog house."

But your dog house doesn't have the need to host parties, provide security, or
even real weather protection other than a roof to keep the rain and sun out.
Yet we all try to build our dog houses in ways that might be better if they
are one day converted into proper living quarters, but they will likely never
need running water or windows.

~~~
colechristensen
French cooking's _mise en place_ or _first order retrievability_ by Adam
Savage or _Lean Engineering_ or whatever.

There's a common thread among them, having what you need, when you need it,
where you need it, and understanding how to use it.

This is what is lacking. The original Unix philosophy, built by greybeards and
university linguistics and language professors, had this at its core. The "do
one thing well" combined with how the shell worked being driven by people who
really _really_ understood language and came up with one that made sense to
them for interacting with computers... was, is still somewhat, _wonderful_.

What's missing today is exactly what you mention. People designing tool sheds
with materials often shoddily designed for cathedrals.

Complexity. This is the enemy; the second enemy is bad attempts to reduce
complexity, which often end up adding more complexity than they take away,
just harder to find.

The favourite example of this - perhaps apocryphal, but entirely believable -
is replacing dozens of nodes using fancy big data tools with one node
running sed/awk, etc.

One thing is clear: nowhere I've been has had the tools readily available, the
documentation clear and forthcoming, and the scale in the right range for
projects.

I found myself recently solving a problem with Hashicorp Vault using GCP to
verify identity of machines wanting secrets. It was a stretch goal which had
been on my plate for six months, every once in a while I would try to go back
and figure out how to make it work, and months and months and months after
trying, I put it together and it worked perfectly. The documentation to lead
to this understanding had to be read out of order on several different pages
with some lucky guesses to arrive at the solution, which in the end was just a
few steps easily explained. _Afterwards_ the documentation was fine and made
perfect sense. _Before_ I grokked the issue the documentation just seemed like
a bunch of nonsense, which led me to believe what I wanted to do wasn't
possible in a constrained security environment.

That is the kind of problem I solve all the time as someone with a decade of
DevOps, SysAdmin, whatever experience behind me. Not using knowledge and tools
to amplify what I do, but spending 60% of my time confused as hell about
something which should be obvious and is only obvious afterwards, 20% trying
to convince people of things they're often reluctant to believe, and 20%
actually using built up knowledge and tools to do many many things very
quickly. It's frustrating.

~~~
zrm
> Not using knowledge and tools to amplify what I do, but spending 60% of my
> time confused as hell about something which should be obvious and is only
> obvious afterwards, 20% trying to convince people of things they're often
> reluctant to believe, and 20% actually using built up knowledge and tools to
> do many many things very quickly. It's frustrating.

It also creates some nasty perverse incentives. Because if you spend the hours
to find the best way to do something that should be simple, it often turns out
to actually be simple. API function #487 does just exactly what you need, if
only it hadn't taken seven hours to find it. So then you spend a day writing
five lines of code that should have taken half an hour, but in the end you
have a good solution.

But there is an alternative to that. You can reinvent the wheel. Instead of
spending your time understanding the unnecessarily complicated thing with poor
documentation, write some new code to do just your thing.

And who looks more productive to the boss? The one who spent all day to write
five lines of code, or the one who added a complex new feature that clearly
took a lot of time and effort (but will now take ten times more effort to
maintain)?

This is especially nasty because it's not always an obvious answer. Sometimes
the existing wheel is made of glass and is shaped like a triangle and
reinventing it is totally worth it, and then you spend all day discovering
that and still have to spend tomorrow doing what a lazier person did
yesterday.

Whereas without the unnecessary complexity the right answer would have been
obvious much sooner.

~~~
rkangel
This is why good documentation is such a wonderful feature. I like Elixir for
lots of reasons but the documentation for the core language and particularly
for Phoenix (the primary web framework) is the best technical documentation I
have seen in any context, ever.

It makes _such_ a difference to your ability to get things done with
unfamiliar libraries etc.

------
lostcolony
Failure is such a fun thing to think about, and it gets handwaved away so
often. So many devs, architects, product owners, etc., just focus on the happy
path, leave failure unspecced and unhandled, and just hope it never happens.
And then boast about 99% uptime, but once you start questioning them you find
out they get weekly pages they have to go investigate (and really the system
is behaving weirdly a solid 10% of the time, but they don't know what to do
about it and it eventually resolves itself, and they don't count "pageable
weirdness" in their failure metric).

It's actually one of the things I love about Erlang, and how it's changed my
thinking. Think about failures. Or rather: don't. Assume they'll happen, in
ways you can't plan for. Instead think about what acceptable degraded behavior
looks like, how to best ensure it in the event of failure, and how to
automatically recover.
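
For anyone who hasn't used OTP: the core move is to stop anticipating
individual failures and instead put workers under a supervisor that restarts
them, falling back to a pre-decided degraded mode when restarts don't help. A
toy analogue in Python (names made up for illustration; real OTP supervisors
are far richer):

    import time

    def serve_degraded():
        # Placeholder: whatever "acceptable degraded behavior" means here.
        print("running in degraded mode: serving cached reads only")

    def supervise(worker, max_restarts=5, backoff=1.0):
        """Restart `worker` whenever it dies; past the limit, degrade."""
        restarts = 0
        while restarts <= max_restarts:
            try:
                worker()               # runs until it crashes or returns
                return                 # clean exit: nothing to recover
            except Exception as exc:
                restarts += 1
                print(f"worker died ({exc!r}); restart #{restarts}")
                time.sleep(backoff * restarts)   # crude linear backoff
        serve_degraded()               # decided up front, not at 3am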

~~~
jodrellblank
On the subject of failures, you might like this blog post
[https://danluu.com/postmortem-lessons/](https://danluu.com/postmortem-lessons/) if you haven't seen it before.

------
chubot
I think git is a good model for what would otherwise be "laggy async and
mismatched" distributed systems.

It has a fast sync algorithm, and after you sync, everything works locally on
a fast file system. You explicitly know when you're hitting the network,
rather than hitting it ALL THE TIME.
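
A sketch of that pattern in Python (paths are illustrative): one explicit
sync step that owns all the network access, with everything else a fast
local operation.

    import pathlib, subprocess

    REPO = pathlib.Path.home() / "src" / "project"   # local mirror

    def sync():
        """The only place the network is touched -- explicitly, on demand."""
        subprocess.run(["git", "-C", str(REPO), "fetch", "--all"], check=True)

    def search(needle):
        """After sync(), everything is a local filesystem operation."""
        return [str(p) for p in REPO.rglob("*.py")
                if needle in p.read_text(errors="ignore")]

    sync()                   # hit the network once, on purpose
    print(search("TODO"))    # then work at local-disk speed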

-----

I would like to use something like git to store the source code to every piece
of software I use, and the binaries. That is, most of a whole Linux distro.

I have been loosely following some "git for binary data" projects for a number
of years. I looked at IPFS like 5 years ago but it seems to have gone off the
rails. The dat project seems to have morphed into something else?

Are there any new storage projects in that vein? I think the OP is identifying
a real problem -- distributed systems are unreliable, and you can get a lot
done on a single machine. But we are missing some primitives that would enable
that. Every application is littered with buggy and difficult network logic,
rather than having a single tool like git (or rsync) which would handle the
problem in a focused and fast way.

It would be like if Vim/Emacs and GCC/Clang all were "network-enabled"... that
doesn't really make sense. Instead they all use the file system, and the file
system can be sync'd as an orthogonal issue.

Sort of related, a fast distro I'm looking at:
[https://michael.stapelberg.ch/posts/2019-08-17-introducing-d...](https://michael.stapelberg.ch/posts/2019-08-17-introducing-distri/)

~~~
dimatura
I have been interested in "git for binary data" for a while, mostly for
ML/computer vision purposes.

I've tried quite a few systems. Of course, there's git-lfs (which keeps
"pointer" files and blobs in a cache), which I do use sometimes - but it has a
quite few things I don't like. It doesn't give you a lot of control on where
the files are stored and how the storage is managed on the remote side. The
way it works means there'll be two copies of your data, which is not great for
huge datasets.

Git-annex ([https://git-annex.branchable.com/](https://git-annex.branchable.com/)) is pretty great, and ticks almost every checkbox I
want. Unlike git-lfs, it uses symlinks instead of pointer files (by default)
and gives you a lot of control in managing multiple remote repositories. On
the other hand, using it outside of Linux (e.g., MacOS) has always been a bit
painful, especially when trying to collaborate with less technical users. I
also get the impression that the main developer doesn't have much time for it
(understandably - I don't think he makes any money off it, even if there were
some early attempts).

My current solution is DVC ([https://dvc.org/](https://dvc.org/)). It's
explicitly made with ML in mind, and implements a bunch of stuff beyond binary
versioning. It does lack a few of the features of git-annex, but has the ones
I do care about most - namely, a fair amount of flexibility on how the remote
storage is implemented. And the one thing I like the most is that it can work
either like git-lfs (with pointer files), like git-annex (with soft- or hard-
links), or -- my favorite -- using reflinks, when running on filesystems that
support them (e.g. APFS, btrfs). It is also being actively developed by a team
at a company, though so far there don't seem to be any paid features or
services around it.
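
For what it's worth, DVC also ships a small Python API for reading versioned
data back out of a repo; a minimal sketch, assuming a hypothetical repo URL,
file path, and tag:

    import dvc.api

    # Read one versioned file from a DVC-tracked repo (URL/path/rev are
    # placeholders, not a real project).
    with dvc.api.open(
        "data/train.csv",
        repo="https://github.com/example/project",
        rev="v1.0",              # any git ref: tag, branch, or commit
    ) as f:
        header = f.readline()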

Pachyderm ([https://www.pachyderm.com](https://www.pachyderm.com)) also seems
quite interesting, and pretty ideal for some workflows. Unfortunately it's
also more opinionated, in that it requires using docker for the filesystem, as
far as I can tell.

Edit: a rather different alternative I've resorted to in the past -- which of
course lacks a lot of the features of "git for binary data" -- is simply to
do regular backups of data with either borg or restic, which are pretty good
deduplicating backup systems. Both allow you to mount past snapshots with
FUSE, which is a nice way of accessing earlier versions of your data (read-
only, of course). These days, this kind of thing can also be done with ZFS or
btrfs snapshots.

~~~
kortex
+1 for DVC. Setting up the backing store can be some extra work if you are
doing that yourself, but after that it's a breeze.

What do you use for the backing store?

Git-lfs has been a pain in my seat since my first use of it. Most of the
issues stem from the pointer files that have to be filtered/smudged pre/post
commit.

Haven't used git-annex myself, but I have heard from coworkers that cross-OS
is a pain.

~~~
dimatura
Mostly S3. I used to do SSH, but these days I can afford to keep the data in
the cloud. I do appreciate the possibility of migrating to other stores if
needed in the future, though - might have to soon, for $reasons.

------
adrianmonk
See also Peter Deutsch's "Fallacies of Distributed Computing" list
([https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...](https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing)).

There's some overlap, but also some new stuff. In particular, "pipes" isn't
covered by the Fallacies list, and it's consistently a pain point you always
face in some way. "Asynchronous" isn't covered by the Fallacies list either.

------
smitty1e
> Sometimes a distributed system is unavoidable, such as if you want extreme
> availability or computing power, but other times it’s totally avoidable.

But so much of our sales pitch involves these shiny cloud systems.

Who ever sold business by telling the customer: "Your use-case really isn't
exciting, and a boring batch-driven process is completely appropriate"?

~~~
TomMarius
I do it like that, and it's always met with excitement like "oh wow, all these
other companies were telling us how hard and costly and lengthy it will be,
thank you"

~~~
smitty1e
Do you ever have the experience of doing a prototype, and the customer looks
at it and says: "Great work. Put it into production"?

~~~
dylan604
every. damn. time. i'm a freelancer, so there's not a dev team behind me. the
client likes what i made, and wants to start using it. realizing this, i've
started spending much more time on the UI/UX (even though i'm not one of those
guys) to at least make the tool usable in a dogfooding way.

~~~
MauranKilom
Maybe it's not 100% relevant to your case, and maybe you've already seen it,
but just in case:

[https://www.joelonsoftware.com/2002/02/13/the-iceberg-secret...](https://www.joelonsoftware.com/2002/02/13/the-iceberg-secret-revealed/)

~~~
ChrisMarshallNY
Joel Spolsky is my hero. There are so many nails he’s been hitting on the head
for decades.

Engineering (to me) is a _craft_; not just a vocation.

I also think that the software industry’s Ponce de Leon obsession might be
part of the problem.

Younger folks tend to “dream big,” which I think may be a factor behind the
ageism.

 _“They haven’t had some grizzled old fart tell them that it can’t be done!”_
is the argument that I’ve heard.

Tru dat, but one of the things that older folks have is _experience getting
things done_, often highly optimized, polished, and of excellent quality.
That tends to require a lot of tenacity and patience.

Shipping is boring.

Ever watch a building go up? A good prefab looks complete after three months,
but doesn’t open for another nine months. It looks awesome and shiny, but is
still behind a rent-a-fence. What gives?

That’s because all that interior work: the finish carpentry, the drywall, the
painting, etc., takes _forever_, and these are the parts of the building that
see everyday use, so they need to be absolutely spot-on. The outside is mostly
a pigeon toilet. It doesn’t need to be as complete; a solid frame and
watertight is sufficient. They just needed it to keep the rain out, while the
really skilled craftspeople got their jobs done.

I like to make stuff polished, tested and _complete_. I don’t like making
pigeon toilets.

------
tlarkworthy
I love the categorization. But decent software should be distributed. I
dislike single teams that beget tens of microservices, but you should buy,
not build, features from specialized 3rd parties. Thus a decent modern
installation should be leaning on a ton of 3rd-party services (e.g. identity
providers, databases, caches) because they all do a better job than the hand-
rolled local one. It's how you outsource expertise.

The vision of the service mesh is to make unreliability and security no longer
the job of the application binary. Even without a service mesh, you can put a
lot of common functionality into a reverse proxy. Personally I am loving
OpenResty for the simplicity of writing adapters and oauth at the proxy layer
with good performance.

~~~
trishume
I think it should be possible to buy or use software from third parties. One
thing I'm disappointed about and think we need better tools to avoid is the
fact that third parties provide their tools as services rather than libraries.
There are reasons they do that (deploying a library that all your customers
can easily use is hard right now), but there's no reason it has to be that way.

Some things can't easily be libraries like databases, but other things like
some caches, miscellaneous operations like image resizing (depending on an
architecture that can handle the load on those servers), and a bunch of other
things could just be libraries.

~~~
tlarkworthy
The world is distributed. Money is distributed. You need services to interact
with the world. A large portion of useful business services cannot be
encapsulated into a hermetic library.

If you think that shipping distributed-service clients as libraries is enough
to solve the issue of distributed computing, that is incorrect: most of the
same distributed-systems crap will still happen.

~~~
kortex
Totally agree. I have found that all of the pain points the OP article mentions
are things that still crop up in monoliths, and by hardening against them the
app is more antifragile. Ex:

Fragile: writing fault-tolerant code (because the endpoint is unreliable)
hedges against the case when unforeseen conditions throw a spanner into the
works.

Async/Laggy: some operations just take longer. Being able to gracefully route
control flow around blockages improves performance.

Pipes: I've found it's helpful to have ser/de interfaces throughout the
program, as it forces you to use immutable data structures - aka functional
programming. This helps reduce the state space.

Put together, I find having various "shear lines" in a monolith codebase,
where I could split into a service if I wanted, greatly improves robustness;
a minimal sketch of one such boundary is below.
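
By "shear line" I mean a boundary that only exchanges immutable, serialized
messages, so the pieces on either side could later be split into services
without rewriting callers. A minimal Python sketch (the resize example is
invented):

    import json
    from dataclasses import asdict, dataclass

    @dataclass(frozen=True)          # immutable message crossing the boundary
    class ResizeRequest:
        image_id: str
        width: int
        height: int

    def handle_resize(raw: str) -> str:
        """Far side of the shear line: only ever sees serialized input."""
        req = ResizeRequest(**json.loads(raw))
        # ... actual resizing work would happen here ...
        return json.dumps({"image_id": req.image_id, "status": "ok"})

    # Today this is an in-process call; swapping it for an HTTP request later
    # doesn't change the caller, because the boundary is already ser/de.
    request = json.dumps(asdict(ResizeRequest("img1", 640, 480)))
    reply = json.loads(handle_resize(request))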

------
ChrisMarshallNY
This article has a point.

But, as in all things software, "it depends."

It depends on what the tools are, and what we are writing.

In my own case, I have the luxury of writing fully native Swift code for Apple
devices. I don't need to work with anything off the device, except for fairly
direct interfaces, like USB, TCP/IP or Bluetooth.

Usually.

I have written "full stack" systems that included self-authored SDKs in Swift,
and self-authored servers in PHP/JS.

I avoid dependencies like the plague. Some of them are excellent, and well
worth the effort, but I have encountered very few that really make my life as
an Apple device programmer that much easier. The rare ones I do use (like
SOAPEngine, or ffmpeg, for instance), are local to the development
environment, and usually quite well-written and supported.

If I were writing an app that reflected server-provided utility on a local
device, then there's a really good chance that I'd use an SDK/dependency with
network connectivity, like GraphQL, or MapBox. These are great services, but
ones that I don't use (at the moment).

I'm skeptical of a lot of "Big Social Media" SDKs. I believe that we just had
an issue with the FB SDK.

That said, if I were writing an app that leveraged FB services, I don't see
how I could avoid their SDK.

So I write fully native software with Swift, and avoid dependencies. That
seems to make my life easier.

But Xcode is still a really crashy toolbox.

------
perfunctory
Just reading the title I assumed it was a post about business processes and
communication between teams. Because this is how working for a big corp
sometimes feels.

~~~
snazz
It's a bit of Conway's law: the software structure mirrors the organization
structure.

------
lowbloodsugar
Really like the curiosity and thought behind this article. Couple of thoughts:

>probably upwards of 80% of my time is spent on things I wouldn’t need to do
if it weren’t distributed.

Sure. Do it on one giant machine. Then you'll be spending 80% of your time
doing things you wouldn't need to do if it weren't monolithic.

At the end of the day, if your customer is on the other end of the internet,
then all of those complaints apply. If you solve that by running an app on
their device, then oh boy are you going to have fun testing.

I prefer scaling out. The stackoverflow peeps prefer scaling up. There are
some great write-ups about how they scale. I found this [1] one after some
quick googling, but I am certain there are more. So it's really about choosing
your poison.

>I think people should be more willing to try and write performance-sensitive
code as a (potentially multi-threaded) process on one machine in a fast
language if it’ll fit rather than try and distribute a slower implementation
over multiple machines.

Sure. I once replaced a system that ran on ten 32-core machines with one that
ran on four cores on a single machine and did the work in the same time.
Another time I had 96 cores and even more threads, and I replaced it with one
that had three threads and was faster.

But both of those solutions were evolutionary dead-ends. The tasks were very
specific, and not subject to change. The first one was a single C file. The
latter was actually Java, but with hand-rolled hash tables and optimistic
locks. I doubt I could follow the first one now.

My point is, you can have understandable systems that good people (as opposed
to geniuses) can work on, evolve and adapt, and that have well understood
failure modes and scaling cliffs. Or you can have bonkers code that everyone
is afraid to touch, and which fails in production when it hits a cliff you
didn't know about, and now your site is dead for eight days.

If you can strike a good balance, then you'll probably have some combination
of distributed, and brute force.

[1] [http://highscalability.com/stack-overflow-architecture](http://highscalability.com/stack-overflow-architecture)

~~~
naniwaduni
> Sure. Do it on one giant machine. Then you'll be spending 80% of your time
> doing things you wouldn't need to do if it weren't monolithic.

Deeply false equivalence.

------
evadne
Do you have a moment to talk about our Lord and Saviour, Erlang/OTP?

~~~
shijie
My thoughts exactly! As the Elixir code at my company continues to grow at a
rapid rate, writing OTP services for everything has allowed me to never have
to think about entire classes of bugs and edge cases simply by virtue of the
patterns inherent in OTP and Elixir/Erlang.

Right tool for the right job.

------
crazygringo
Of course they do. But there's no alternative.

No matter how fast or beefy your server is, these days if your product becomes
a success, 99% of the time it will outgrow what's possible on a single server.
(Not to mention needs for redundancy, geographic latency, etc.) And by the
time you see the trend heading upwards so you can predict what day that will
happen, you already won't have the time for the massive rewrite necessary.

So yes, it's tons slower to write distributed servers/systems. But what other
choice do you usually have?

Though, as much as possible, you can _try_ to avoid the microservices route,
and integrate everything as much as possible into monolithic replicable "full-
stack servers" that never talk to _each other_ , but rather rely entirely on
things like cloud storage and cloud database. Where you're paying your cloud
provider $$$ to not fail, rather than handling it yourself. Sometimes this will
work for you, sometimes it won't.

~~~
nixpulvis
I've seen a handful of applications that attempt to "scale" by going down the
micro-service route in a completely flawed way, only to end up with a tangled
mess that's impossible to reliably debug. All progress halts.

There's nothing inherently wrong with your statement; it's just that it's
still far too easy to write a shitty distributed system, and so much easier to
push that complexity off onto the OS or even the network layer itself.

Why should _I_ the application developer care about the way my user's data
enters my DB? This should be tightly abstracted away, and traced/logged
accordingly. Leave it to _me_ the systems developer to get the details right,
and share the fruits of my labor with everyone.

I can imagine no system more deserving of shared resources than network
technology. Try and imagine a world without TCP/IP, do you not end up with
something similar?

------
nitwit005
I've found people have these problems inside their own datacenter, where there
is reliable low-latency bandwidth, but where things might be rebooted due to
upgrades or maintenance.

A common example is data being pushed between systems with HTTP. Take the
simplest case of propagating a boolean value. You toggle some setting in the
UI, and it sends an update to another system with an HTTP request, retrying on
a delay if it can't connect. This has two problems. The first is that if the
user toggles a setting on and then off, you can have two sets of retries
going, producing a random result when the far end can be connected to again.
The second, is that the machine doing the retries might get rebooted, and
people often fail to persist the fact that a change needs to be pushed to the
other system.

I've seen this issue between two processes on the same machine, so technically
you don't even need a network.
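
One fix for both problems is to stop retrying individual toggles and instead
persist the latest desired state with a version, then have a single reconcile
loop push until the far end matches. A rough Python sketch (the file path and
endpoint are placeholders):

    import json, time, urllib.request

    STATE_FILE = "desired_state.json"          # survives reboots

    def load_state():
        try:
            with open(STATE_FILE) as f:
                return json.load(f)
        except FileNotFoundError:
            return {"value": False, "version": 0, "pushed": True}

    def set_flag(value):
        """Called from the UI: record only the *latest* desired state."""
        state = load_state()
        new = {"value": value, "version": state["version"] + 1,
               "pushed": False}
        with open(STATE_FILE, "w") as f:
            json.dump(new, f)                  # persist before acking the user

    def push(state):
        req = urllib.request.Request(
            "http://other-system.example/flag",   # placeholder endpoint
            data=json.dumps(state).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT")
        urllib.request.urlopen(req, timeout=10)

    def reconcile_loop():
        """Single pusher: retries can't race, and reboots just resume here."""
        while True:
            state = load_state()
            if not state["pushed"]:
                try:
                    push(state)
                    latest = load_state()
                    if latest["version"] == state["version"]:  # nothing newer
                        latest["pushed"] = True
                        with open(STATE_FILE, "w") as f:
                            json.dump(latest, f)
                except OSError:
                    pass                       # network down; retry next tick
            time.sleep(5)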

~~~
jxcl
If you're dealing with multiple processes persisting data to different data
stores, you're dealing with a distributed database problem, which has many
more failure modes than the ones you've described here.

If there's anything I've learned across my career, it's to avoid distributed
databases unless absolutely necessary, and if it is necessary, then spend a
bunch of time trying to make sure you got it right. And then even after that
you probably got it wrong.

------
jancsika
> Untrusted: If you don’t want everything to be taken down by one malfunction
> you need to defend against invalid inputs and being overwhelmed. Sometimes
> you also need to defend against actual attackers.

Sorry, but unless your centralized alternative is only used internally by
troglodytes you have to at least defend against invalid inputs.

------
robbrown451
The title reminded me of the turboencabulator.
[https://www.thechiefstoryteller.com/2014/07/16/turbo-encabul...](https://www.thechiefstoryteller.com/2014/07/16/turbo-encabulator-best-worst-jargon/)

------
lazyjones
Networking and distributed software are well understood nowadays; the lower
productivity of web companies vs. SpaceX etc. comes from all the unstable
(both ever-changing and buggy) software they need to use. Most modern software
is affected by this, but the web has it worse because of security issues and
because of the way browsers are evolving (on purpose, one has to add, because
it's a cartel of large players on the web trying to stifle competition).
SpaceX doesn't get some innovative new alloy they didn't order every couple of
weeks and even the games industry has fewer obstructing external dependencies
(hardware vendors and their drivers being one).

At least that's my experience from about 20 years of web development (15
professionally).

------
at_a_remove
Yes, I have a little Python library for managing network shares in Windows.

It has things like automatic retries that "back off" slowly, switching to
cached IPs in case DNS is down, and checking to see if all of the drive-
letters are full and either re-using a letter or creating a "letter-less"
share. I had to develop it during a period of great instability within our
network. It's ... large and over-engineered, but it just keeps on truckin'.
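
For anyone who wants the flavor of it, the two tricks mentioned are tiny on
their own; a stripped-down Python sketch (not the actual library, which does
much more):

    import random, socket, time

    def with_backoff(op, attempts=6, base=0.5, cap=30.0):
        """Retry `op` on network errors, backing off (with jitter)."""
        for i in range(attempts):
            try:
                return op()
            except OSError:
                if i == attempts - 1:
                    raise
                time.sleep(min(cap, base * 2 ** i) * (0.5 + random.random()))

    def resolve(host, _cache={}):
        """Resolve a hostname, falling back to the last-known IP if DNS
        is down."""
        try:
            _cache[host] = socket.gethostbyname(host)
        except OSError:
            if host not in _cache:
                raise                  # no cached answer to fall back on
        return _cache[host]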

On the other hand, it has been quite useful going forward, so that's a plus.

I tend to program fairly defensively, in layers, right down to the much
maligned Pokemon exception handling. The results don't have the, ah, velocity
that is so often praised but they'll be there ticking along years later.

------
hn_throwaway_99
Hah, I saw the title "Fragile narrow laggy asynchronous mismatched pipes kill
productivity" and thought it was about the pitfalls of trying to coordinate
remote teams across disparate time zones.

~~~
acjohnson55
Yeah, my guess was it was a Slack takedown :)

------
carapace
> I hope this leads you to think about the ways that your work could be more
> productive if you had better tools to deal with distributed systems, and
> what those might be.

We have tools. The Promela/SPIN model checker is one just off the top of my head.

------
Andaith
Just some fun with the English language:

> probably upwards of 80% of my time is spent on things I wouldn’t need to do
> if it weren’t distributed.

If you don't ever plan on distributing your software you can save a _lot_ of
time :)

------
wpietri
Funnily, I thought the headline was talking about development process, as that
also describes how a lot of places (mis-)handle the flow of what gets worked
on.

------
_bxg1
> I also think all these costs mean you should try really hard to avoid making
> your system distributed if you don’t have to.

There's a point here about microservices.

------
emmelaich
It's perfectly applicable to people too; one stickler for the rules, or a slow
worker in a critical role (say, security officer or change-board chair), can
kill productivity.

------
ahh
I feel so attacked right now.

------
RandyRanderson
TLDR he means microservices. We now have a generation that doesn't even recall
a non-MS world.

Tristan, the generation between us has created the IT world you describe.
You'll probably spend the next 20 years of your career dealing with that mess.
Sorry about that.

------
gfxgirl
doh! I thought this was going to be about remote work.

