
Thoughts on Conway's Law and the Software Stack - rbanffy
https://blog.jessfraz.com/post/thoughts-on-conways-law-and-the-software-stack/
======
sarcasmic
Not sure if the comparison to Conway's Law works. The key goal of layering is
abstraction, so that one can be productive without having to know details
about the layers below, but much of optimization is about exploiting details
in the layers below for gain. Clearly, these goals are in conflict.

After posing a hypothesis, the post talks about security and process
isolation. But the problems raised aren't in line with the hypothesis: the
challenge in these cases isn't "insufficient communication" between the levels
of the stack, but rather a discrepancy between the abstraction's actual
behavior vs. a human's desires and expectations about big-picture topics.

These abstractions are often compromised by information leakage through side
effects that executing code can observe or deduce, leading to a class of
vulnerabilities that has long been around but has received far more
attention since Spectre.

Protecting against timing attacks and other side-channel attacks requires that
the system's observable state not vary due to execution in a different
security domain. Timing attacks are particularly frustrating, because
processes can estimate their execution time even without external timers, so
they can compare the time taken by different calls. Cryptographic
operations often take special care to avoid leaking information through
timing, but the same discipline isn't commonplace in system calls or userland
code. Shared caches, for their part, leak information through timing but
greatly improve performance.
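To make the timing point concrete, here is a minimal Python sketch (not from the post or the thread) that contrasts two byte comparisons. The first exits early, so its work depends on how many leading bytes of a secret match. The second always examines every byte. The helper names `naive_equal` and `constant_time_equal` are hypothetical, and they return a step count rather than measuring time, so the difference is deterministic. Pure Python can't actually guarantee constant-time execution. In practice you'd use the standard library's `hmac.compare_digest`, which is designed for this.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> tuple[bool, int]:
    """Early-exit comparison; also returns how many byte pairs were examined."""
    if len(a) != len(b):
        return False, 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            # Stops at the first mismatch, so the work done leaks the match length.
            return False, steps
    return True, steps


def constant_time_equal(a: bytes, b: bytes) -> tuple[bool, int]:
    """Examines every byte pair no matter where mismatches occur."""
    if len(a) != len(b):
        return False, 0
    diff = 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        diff |= x ^ y  # accumulate differences without branching on them
    return diff == 0, steps


secret = b"s3cr3t-token"
for guess in (b"x3cr3t-token", b"s3cr3t-tokex"):
    print(naive_equal(secret, guess), constant_time_equal(secret, guess))

# Real code should rely on the stdlib helper instead of hand-rolled loops.
print(hmac.compare_digest(secret, b"s3cr3t-token"))
```

Consider a guess that is wrong in its first byte and another that is wrong only in its last. The early-exit version examines 1 byte for the first guess and 12 for the second. That gap is the signal a timing attacker measures. The accumulating version examines all 12 bytes for both.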

Hardware isolation is an effective solution for curbing timing attacks for
systems that don't communicate over a network. It's not sufficient in the case
of networked systems, because the network and its connections form another
source of observable state that's likely full of unrelated side-effects.

~~~
laughinghan
> the challenge in these cases isn't "insufficient communication" between the
> levels of the stack, but rather a discrepancy between the abstraction's
> actual behavior vs. a human's desires and expectations about big-picture
> topics

Isn't the thesis of the post that that exact discrepancy is due to
insufficient communication between the people at different levels of the
stack?

An abstraction is an interface between two levels of the stack, right? I think
the thesis of the post is that the exact discrepancy you describe, between the
abstraction's actual behavior and the desires and expectations of the people
one level up, is due to insufficient communication between the people one
level up and the people implementing the abstraction's actual behavior, which
due to Conway's Law are separate groups of people.

You mention "information leakage", but that's leakage between _software
components_ at different levels of the stack; perhaps you have that confused
with the "insufficient communication" referred to by the post, which is
between _groups of people_ at different levels of the stack?

------
mjw1007
This idea seems surprising: « you’d be crazy to think hardware was ever
intended to be used for isolating multiple users safely »

Surely the era where it was common for multiple users to be logging into one
computer was long enough, and central enough, that it still informs a great
deal of current hardware design.

~~~
wmf
I thought the consensus was more like: multi-user security is a solved problem
except for side channels which are everywhere, there's nothing you can do
about them, and they're impractical to exploit. That last part turned out to
be wrong.

~~~
rbanffy
You can have physical partitioning on very high-end machines.

~~~
GauntletWizard
You can have physical partitioning on low-end machines, too. The cpuset cgroup
allows you to partition processes so that they only run on certain cores, and
then you can prevent them from accessing memory through other cores, and and
and... And you'll still find side channel attacks that you didn't think of,
because the system is complex.
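As a rough illustration of pinning work to specific cores, here is a Linux-only Python sketch. It uses per-process CPU affinity (`os.sched_setaffinity`, which wraps sched_setaffinity(2)) rather than the cpuset cgroup itself. The cgroup is enforced by the kernel for a whole group of processes, whereas a process can change its own affinity within whatever CPUs its cpuset allows. The `pin_to_cores` helper and the one-core "tenant" split are hypothetical. Pinning alone also doesn't partition shared caches or memory bandwidth, which is part of why side channels survive it.

```python
import os


def pin_to_cores(cores: set[int]) -> set[int]:
    """Restrict the calling process to `cores` and return the CPU set now in effect.

    Linux-only. pid 0 means "the calling process".
    """
    allowed = os.sched_getaffinity(0)
    if not cores or not cores <= allowed:
        raise ValueError(f"cores must be a non-empty subset of {sorted(allowed)}")
    os.sched_setaffinity(0, cores)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    available = sorted(os.sched_getaffinity(0))
    # Hypothetical split: this "tenant" gets only the first core we're allowed to use.
    print("restricted to:", sorted(pin_to_cores({available[0]})))
```

A process pinned this way still shares the last-level cache with its neighbors unless the hardware partitions it, so this is scheduling isolation, not side-channel isolation.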

Or you can buy a dozen cheap systems and run each of your customers on one of
those. There's still the opportunity for side channels there: we were finding
timing exploits in crypto libraries long before VMs.

It's an arms race, and we're neither winning nor losing, but as the market
grows the spread gets wider.

------
di4na
The answer is to accept it. Problems happen. We are dealing with systems that
are inherently complex.

You can try to alleviate some problems, but in the end this lack of
communication is also what creates the success. It allows for "slack". It
builds learning.

We have "incidents". This is good. This is where we can learn. What we need is
to understand that and learn better.

I suppose Jessie knows John Allspaw...

~~~
jopsen
We can try not to build abstractions that are easy to misunderstand.

But yeah, communication won't scale.

------
perfunctory
> Or is the answer simply, own all the layers of the stack yourself?

Like Apple?

~~~
discodave
I think AWS and the other cloud providers owning the whole stack is what
Jessie is thinking of with that statement.

AWS has its own ARM processor (Graviton) and hardware designs, all
completely proprietary. GCP has TPUs, and so on.

~~~
rbanffy
This is also a nice way to differentiate and not compete in price/performance
alone.

