
Distributed Systems Are a UX Problem - platz
http://bravenewgeek.com/distributed-systems-are-a-ux-problem/
======
marknadal
I work on distributed systems, and I thought this was a nice post and echoes
some of my own sentiments.

The hardest problems in distributed transactions (banks, inventory, etc.) are
often easier solved with human psychology (UX) than algorithms.

Hi, I am an ATM. Yes you have money, yes I am offline and can't check, yes you
can withdraw, so you as the customer are happy with high availability. BUT I
know who you are, and if you cheat me I will punish you when I find out!

Hi, I am a shopping cart. Why yes we have one of those in stock, but I am
offline so I can't check. I'll take your money now and have it 2 day delivery
for you. Oops, I just found out we don't have it in stock but I already have
your money, this will take a few weeks now but we'll give you $20 off your
next purchase - or do you want a refund?

This is the better approach: change your business model to prioritize
customer satisfaction (UX). Trying to build a globally consistent system
instead either has to break the laws of physics (the speed of light) or make
your customers wait - and if they have to wait, they probably won't be your
customers anymore. Only one of those options is possible, and it is bad for
your business, therefore use a distributed system with good UX.
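The "accept now, reconcile later" pattern in the ATM and shopping-cart stories can be sketched in a few lines. Everything here (class names, the $20 figure) is illustrative, not taken from any real system:

```python
# Hypothetical sketch: accept the order optimistically while offline,
# then reconcile against reality and compensate if we were wrong.

class Order:
    def __init__(self, item, amount):
        self.item = item
        self.amount = amount
        self.status = "accepted"   # accepted optimistically while offline
        self.compensation = 0

def reconcile(order, actual_stock):
    """Once we're back online, check reality and compensate if needed."""
    if actual_stock > 0:
        order.status = "confirmed"
    else:
        # We already took the money; apologize with a credit (or a refund).
        order.status = "backordered"
        order.compensation = 20    # e.g. $20 off the next purchase

order = Order("widget", 50)
reconcile(order, actual_stock=0)
print(order.status, order.compensation)  # backordered 20
```

The point of the sketch is that availability is preserved at write time and the inconsistency is resolved later by a business-level action rather than a distributed algorithm.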

~~~
learnstats2
> Why yes we have one of those in stock, but I am offline so I can't check.

Assuming the "I am offline" part is redacted, I view this as deliberate lying
- deliberate in the "we already have your money" sense that you identified.

If I had made a decision to order with you because you had claimed something
was in stock, but it wasn't, I will withdraw my patronage from you and I will
complain loudly and vigorously.

This type of behaviour/UX has permanently harmed my relationship with several
large retailers.

~~~
jrochkind1
I think to call it lying is based on the assumption that there is some perfect
answer. There usually isn't.

I work at a university library. Our inventory is fairly small on internet
scale (~3 million items in 5 or 6 separate 'warehouses'; although we usually
only have 1-3 copies of each 'item' in inventory), so we don't really have
these large scale/distributed problems. The biggest problem we have with the
system saying an item is 'in stock' when it isn't is -- the item has been
stolen or lost, and we haven't noticed yet and recorded it as such.

Is it "lying" if our system says it's on the shelf, when in fact it's been
stolen or lost and we haven't noticed yet?

There are obviously ways we could improve our 'loss reduction'. But there will
ALWAYS be cases where the system's knowledge is an imperfect representation of
the real world, in any system.

"I've been offline for 10 minutes so the last information I have is as of 10
minutes ago" is just one more.

You can spend more money to try to make the information more accurate, but it
will never reach 100% (even before you add in distributed computing, which
adds some of its own issues). So as with everything, it's cost-benefit: how
much does the customer care, what can we afford to do, at what point is our
information good enough to keep them happy -- and, like the OP says, how do we
properly design the UX to keep them happy despite information that's not 100%
accurate, which it NEVER will be.

~~~
mason55
I think the problem here is that these systems are frequently set up to look
authoritative. "Hurry up! Only 1 left!" is a common sight on Amazon.

The user doesn't care about the challenges of a globally consistent
distributed database, all they know is you said there was one left so they
bought from you and now you're telling them you were wrong. You set
expectations and then failed to meet them, and that upsets people.

If your system is not quite perfect, especially around something that can
drive a purchasing decision, then make it clear to the user. "Hey, we're low
on stock, we think we have 1 left but we might be out". Maybe you can even
give a confidence interval, like "This item sells very quickly so we're
probably out by now and don't realize it" vs. "we sell two of these a year and
know that as of 5 minutes ago there was one left so we probably still have
it". Now the user can make an informed decision.
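A rough sketch of what such a confidence-aware message could look like, combining sales velocity with how stale the inventory reading is. The thresholds and wording are made up for illustration:

```python
# Hypothetical sketch: estimate how many units likely sold since our last
# accurate reading, and hedge the message when that estimate swamps the count.

def stock_message(count, sales_per_day, minutes_stale):
    # Expected units sold since the last accurate inventory reading.
    expected_sold = sales_per_day * (minutes_stale / (24 * 60))
    if count <= 0:
        return "Out of stock"
    if expected_sold >= count:
        return "This item sells quickly; we may already be out."
    return f"As of {minutes_stale} min ago we had {count} left."

# Fast seller, hour-old data: hedge.
print(stock_message(1, sales_per_day=50, minutes_stale=60))
# Slow seller, fresh data: state it plainly.
print(stock_message(1, sales_per_day=2 / 365, minutes_stale=5))
```

The user-facing copy would need real product work, but the shape of the decision (confidence derived from velocity and staleness) is what the comment is asking for.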

~~~
nostrademons
That's where the "compensation" part of the original article comes in. If a
company's doing this strategy right, they make it up to the customer in some
generous way - "We'll give you a full refund and your next purchase is free"
if the shopping cart says something is in stock and it isn't, "We'll pay for
your remodel" if someone trashes your AirBnB, "We'll give you a free ticket"
if you get bumped from a flight.

What some of the smarter big companies have realized is that emotions are
fungible, and they work on a "last writer wins" basis. If you do something
really nice for the customer after inconveniencing them (and it has to be more
"nice" than the initial problem was "nasty"), they remember you making it up
to them, not the initial problem. That shifts the cost of compensation back
onto the company, which gives them an incentive to improve their systems, but
also lets them trade-off occasional hefty compensation charges against getting
100% consistency & availability, which is impossible.

------
tylertreat
There seems to be a bit of confusion over just what exactly I'm trying to get
at here. I'll be the first to admit, the article might not do it justice. This
is a comment I posted on it which hopefully helps clarify:

The point is there are certain realities inherent in distributed systems which
can’t be papered over, and those realities often manifest themselves at all
levels in the stack. Sure, we can try to build abstractions which hide those
problems, but they often leak or simply eschew the problems in dangerously
subtle ways.

Frontend and UX folks need to understand what they are building on. They need
to understand the potential pitfalls of these abstractions and why they are as
such. Sometimes there is a good business case for apologizing, sometimes
there’s not. These things are never black and white, but if everyone can
understand how systems work, we can build better ones and make better choices.

~~~
AnEngineer
And my reply:

I agree that there are complexities (as you state, “certain realities”)
inherent in distributed systems. If there is one thing I believe we could both
agree upon, it is that distributed systems increase the surface area of
complexity exponentially when compared to the relatively simpler stack of
clients communicating with one system.

Failures in a distributed environment must be expected. Embraced even, since
treating them as “an exception” is tantamount to the proverbial ostrich poking
its head into the sand.

However, failure in a distributed system does not necessitate “Frontend and UX
folks” having to address failures which are not directly a result of client
communication with the next tier. For example, should the UX be prepared to
handle when the primary provider of a service is down and fail over to a
secondary system? If the service is the one which the client is directly
interacting with, then likely that makes sense.

But what of the case when the primary system has issues interacting with its
collaborators? Such as its persistent store. Or a strategic partner providing
required functionality for the workflow. Is it your position that this type of
failure be handled by the UX?

Why wouldn’t a server maintain separation of concerns and only communicate to
a client when its options have been exhausted? IOW, why assume what “the other
side of the connection” is going to do in order to satisfy a request placed on
it?

In summary, my position is that when the collaborators in a distributed system
have well-defined software contracts (protocol, expectations, etc.) and all
parties adhere to them (without assumptions in implementation), then this
allows the system(s) involved to provide resilience in depth. However, the
fashionable trend of replicating a PowerBuilder environment does not lend
itself to this separation of concerns.

Foisting complexity to either end of the spectrum is where truly nasty
problems are bred.
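One way to read "only communicate to a client when its options have been exhausted" is a server-side failover wrapper along these lines. This is a hypothetical sketch; the backend functions and error type are invented for illustration:

```python
# Hypothetical sketch: the server exhausts its own options (here, a list of
# interchangeable backends) before the failure ever reaches the UX.

def call_with_failover(backends, request):
    """Try each backend in turn; raise only once every option is exhausted."""
    last_error = None
    for backend in backends:
        try:
            return backend(request)
        except ConnectionError as e:
            last_error = e          # remember it, try the next collaborator
    # Only now does the client (and therefore the UX) see a failure.
    raise last_error

def primary(req):
    raise ConnectionError("primary store down")

def secondary(req):
    return f"handled {req} via secondary"

print(call_with_failover([primary, secondary], "order-42"))
# handled order-42 via secondary
```

Under this design the UX only needs a story for total failure of the service it talks to, not for each collaborator behind it.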

~~~
huslage
In a distributed system, there is no "primary" system. All systems are
distributed and decoupled. The UX sends a request and waits for a response
that may itself be composed of other requests and responses from multiple
other systems. The failures have to be assumed and mitigated at the UX level,
not at some back-end level.

~~~
AnEngineer
By "primary system", I meant the initial system(s) interacted with in a
distributed environment. Assuming a connection based distributed model (e.g.
TCP/IP), there is a system on the other end of the connection to which a
client is communicating. Even for connectionless models, in my mind the
"primary system" would be defined as the set of systems which could possibly
respond to a client's request.

Perhaps the term "primary" is not the best fit for this role. It just seemed
less verbose than saying something like "the set of systems for which a client
is directly aware and may interact with."

------
hinkley
> A distributed system is one in which the failure of a computer you didn't
> even know existed can render your own computer unusable.

-- Leslie Lamport

------
panic
Running this argument in the other direction: if you want an uncompromised
user experience, you should use or build a centralized system rather than a
distributed one.

~~~
prewett
Reading your comment made me think of Apple and Microsoft. Microsoft makes an
operating system that runs on a distributed hardware ecosystem and they have
the unenviable task of making the user with crappy (but cheap) hardware
actually work. Apple is completely vertically integrated, so they can make
sure everything works smoothly in their vision, but you only get Apple's
vision.

Then you have Linux, which is a distributed software ecosystem on top of a
distributed hardware ecosystem. UX is terrible, but flexibility is through the
roof. You can pretty much have exactly what you want, although you might have
to do parts of it yourself.

~~~
ArkyBeagle
UX doesn't have to be terrible. It's just a long twisty maze of interlocking
decisions. Depends on the U, too - I feel better with command line tool output
and filters than with fancy GUI furniture.

~~~
prewett
Depends on what you want out of your UI. I ran Linux for years and years on my
laptops. But getting a consistent look was difficult: Firefox, OpenOffice (at
the time), Emacs, Inkscape, Gimp, the KDE CD burner I used, etc. all had
different toolkits so theming never really looked right. Every time I upgraded
Ubuntu something else broke: first sleep, then wifi, then audio (pulseaudio
and jack, grrr). Getting on the network after an install was usually a bit of
work. Maybe DHCP worked fine, ... and maybe it didn't. Configuring X
(resolution, mousewheel, etc.) was a chore, although I think that's no longer
necessary. I never did get my USB thumb drives to automount, although I heard
tell it was possible.

That long, twisty maze of interlocking decisions, and the fact that every
decision could and did go wrong, usually necessitating several hours of
troubleshooting, that's bad UX.

Don't get me wrong, I love the idea of Linux. But Mac OS X is the real Unix on
the Desktop.

------
shockzzz
If you like the article, you might like CockroachDB. They just got funded by
Google Ventures.

[https://github.com/cockroachdb/cockroach#design](https://github.com/cockroachdb/cockroach#design)

[https://twitter.com/GoogleVentures/status/606534505332154369](https://twitter.com/GoogleVentures/status/606534505332154369)

------
uuilly
I really enjoyed the analog world analogies like the work order.

------
einarvollset
At trivial scale, sure. But consider a multicore CPU or GFS. Both of these
are distributed systems, but ones where UX offers no solutions.

~~~
Terr_
That's only because you've drawn such tight boundaries around your definition
of "distributed system" that "User Experience" is actually "API design".

The same principle applies: The system can't do everything magically, so it
exposes some of that through the API. I mean, I doubt the GFS API is simply a
read() and write() method that are always guaranteed to work instantly.

Instead you start posing the same kind of "Hey, I need you to make a decision
here" tradeoffs that the author is talking about:

"Sometimes X will happen, if so you need to check Y and try to recover with Z,
unless you didn't really care about W. If you don't want to recover and would
rather discard, make sure you call Q, otherwise we'll eventually destroy it
after T minutes."
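That "call Q or we destroy it after T minutes" contract reads like a lease. Here is a hypothetical sketch of an API in that spirit (not GFS's actual API; all names and the TTL behavior are invented for illustration):

```python
# Hypothetical sketch: a write returns work held under a lease. The caller
# must confirm or discard it; unconfirmed work is garbage-collected after
# a TTL instead of the system pretending the tradeoff doesn't exist.

import time

class LeasedStore:
    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self.pending = {}    # lease_id -> (value, deadline)
        self.committed = {}

    def write(self, lease_id, value):
        # The write is not durable yet; the caller holds a lease on it.
        self.pending[lease_id] = (value, time.monotonic() + self.ttl)
        return lease_id

    def confirm(self, lease_id):
        value, _ = self.pending.pop(lease_id)
        self.committed[lease_id] = value

    def discard(self, lease_id):
        self.pending.pop(lease_id, None)   # the "call Q" escape hatch

    def gc(self):
        # Destroy anything unconfirmed past its deadline ("after T minutes").
        now = time.monotonic()
        self.pending = {k: v for k, v in self.pending.items() if v[1] > now}

store = LeasedStore()
store.write("op1", "data")
store.confirm("op1")
print(store.committed)  # {'op1': 'data'}
```

The API doesn't hide the failure mode; it names it and hands the caller the decision, which is exactly the "UX as API design" point.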

~~~
einarvollset
I would have to disagree. One of the main benefits of distributed systems is
that you _don't_ need to know that in reality your file system is sharded
onto 5 different disks at 2 different colocs. I understand that sometimes you
need to manually recover, but OP sweepingly states that distributed systems
are a UX problem, which I think is a ridiculous reduction and equivalent to
saying "well this shit is hard so let's just spill the guts of the system
internals up to the user"

------
jamesblonde
Please don't read this article. I wasted 2 mins of my life on it....

~~~
vog
Would you care to elaborate on that?

~~~
cies
I myself didn't think it was that bad, but the article is not very concise.
The first comment on the article (below it, by "An Engineer") sums up my
sentiments pretty well.

