
Uber Using Driver Phones as a Backup Datacenter - antman
http://highscalability.com/blog/2015/9/21/uber-goes-unconventional-using-driver-phones-as-a-backup-dat.html
======
kirk21
Off topic, but the fact that Uber requires you to have the latest version of
its app to book rides is ridiculous. Not everyone has the newest smartphone
with tons of space to update apps.

~~~
jimminy
I actually got stuck out at Fort Mason one day during the Launch Conference
last spring because of this issue. I used an Uber to get there, but when I
went to leave the app required an update and I couldn't get a decent enough
connection to download the update near the venue.

~~~
brazzledazzle
They really should let you use the N-1 version at the very least. That should
prevent these same day/trip issues.

~~~
jimminy
I have a feeling they probably do.

I hadn't updated in a week or two, just out of laziness about checking for
updates every day. So it's possible I had missed more than one update, and
that caused the issue.

~~~
mahyarm
I heard the force-upgrade window is 3 months. To verify, just decline updates
until the app stops letting you use Uber. It might also be that necessary
fixes push up the minimum version.

------
lemevi
> Have to assume driver phones can be compromised which means the data must be
> made tamper proof. So all the data is encrypted on the phone.

> In the background the Replication Service encrypts the data and sends it to
> the Messaging Service.

> The Messenger Service sends the backup to the phone.

As long as the phone can't ever read the encrypted data this is probably
secure, but shifting costs onto the driver through using their bandwidth to
store encrypted backup blobs seems unethical. The uber app on the phone isn't
using this data to the benefit of the user, it can't as it is encrypted.

I don't think you should use a person's device for anything other than serving
that person. This is the kind of thing though I would totally expect from
Uber. Based on what the media has reported in the past, the whole company's
culture seems toxic and dishonest from top to bottom.

~~~
dankohn1
> I don't think you should use a person's device for anything other than
> serving that person.

It's not the person's device. Uber requires all drivers to have a separate
phone just for Uber usage. I had always assumed it was so that drivers
wouldn't hit on passengers by calling them after dropping off. The distributed
data backup seems like another advantage.

~~~
smt88
> assumed it was so that drivers wouldn't hit on passengers

To my knowledge, Uber relays calls between drivers and passengers through its
own phone network, which effectively anonymizes each end.

For example, an Uber driver once called me, and then my phone lost service. I
tried to call back with a friend's phone, and an automated Uber message said
that there were no active rides associated with my friend's number.

~~~
terinjokes
When I took Uber outside the US, this wasn't true. The app faithfully showed
my real phone number on the driver's phone (the driver took the +1 as a cue
to not call me and instead drive slowly in hopes of finding me).

~~~
smt88
Are you sure it was your real phone number, or was it just an American number
that Uber owned? That's what they use in the US -- a pool of numbers that they
bought to function as masks.

------
manibatra
" Updating a stored trip happens like: set(“trip1, version2”, “yyu”);
delete(“trip1, version1”). The advantage is if there’s a failure between the
set and delete there will be two values stored instead of nothing stored."

So simple yet could be overlooked. Great read overall.
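
A minimal sketch of the set-then-delete ordering quoted above, assuming a plain in-memory key-value store. The `PhoneStore` class, the `update_trip` helper, the `crash_between` flag, and the placeholder value "xxt" are hypothetical names for illustration, not anything from the article:

```python
# Hypothetical in-memory stand-in for the phone's key-value store.
class PhoneStore:
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def latest(self, trip_id):
        """Return (version, value) for the highest stored version, or None."""
        versions = [(ver, val) for (tid, ver), val in self._data.items()
                    if tid == trip_id]
        return max(versions, key=lambda pair: pair[0]) if versions else None


def update_trip(store, trip_id, old_version, new_version, value,
                crash_between=False):
    # Write the new version first...
    store.set((trip_id, new_version), value)
    if crash_between:
        raise RuntimeError("simulated failure between set and delete")
    # ...and only then remove the old one.
    store.delete((trip_id, old_version))


store = PhoneStore()
store.set(("trip1", 1), "xxt")  # placeholder value for version 1
try:
    update_trip(store, "trip1", 1, 2, "yyu", crash_between=True)
except RuntimeError:
    pass

# Both versions survive the failure; a reader simply picks the newest.
print(store.latest("trip1"))
```

If the order were reversed (delete, then set), the same failure would leave no copy of the trip at all.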

~~~
stingraycharles
This is Copy-On-Write in disguise: the different versions can actually be
considered different objects, in which case it's just a copy and an eventual
cleanup of the old object.

~~~
manibatra
Right. Just read up on what Copy-On-Write is. It follows the same basic
principle. I might be wrong, but doesn't Copy-On-Write formally refer to the
case of a shared resource? Though I think we could consider the trip
information to be shared between the phone and the data centre.

------
metaverse
I wonder why they didn't use something like CouchDB for this purpose and
instead engineered their own solution? I see the CouchDB replication protocol
+ conflict detection as perfect for their use case.

~~~
brazzledazzle
I often wonder why CouchDB didn't take off more. It wasn't a panacea but it
was clever enough and you could use it in mobile apps. Perhaps the decision of
the "host" company to merge with another company and head in the direction of
CouchBase made people wary. Or there's some technical reason I'm not aware of.
The latter seems more likely.

~~~
jchrisa
I am one of the original CouchDB folks, and a cofounder of Couchbase. We have
spent the last few years quietly building an enterprise-class suite of Couch
sync compatible databases. Everything we do is open source and Apache
licensed.

Couchbase Mobile is native on iOS, Android, and C#, and compatible with
PouchDB and Apache CouchDB.
[http://developer.couchbase.com/mobile](http://developer.couchbase.com/mobile)

Here is an example of how General Electric is using our open source platform
to power the Internet of things. [http://www.couchbase.com/nosql-
resources/presentations/offli...](http://www.couchbase.com/nosql-
resources/presentations/offline-first-and-how-ge-integrated-couchbase-mobile-
in-less-than-90-days.html)

[edit] we've been able to do this, funded by the success of Couchbase Server.
Which looks like a high-performance NoSQL database, not an offline mobile
database, but it uses many of the same data structures.

~~~
brazzledazzle
Thanks, looks like I have some research/reading to do.

Aside: This is why I love HN. Thanks for all of your hard work.

------
dmethvin
This seems like a great approach when the client devices are all friendly and
collaborating. What sort of countermeasures do you need when some of them
might be malicious attackers?

~~~
vemv
The tampering issue is described in the article.

~~~
ryandrake
> Have to assume driver phones can be compromised which means the data must be
> made tamper proof. So all the data is encrypted on the phone.

Their explanation leaves me wanting more. If the phone is compromised, surely
the encryption process might also be compromised.

~~~
maxerickson
The replication system encrypts the information before it is transmitted to
the phone. The phone never has access to the key.

~~~
ryandrake
Ahh, yes it's described in a different section. Cool!

------
ogrisel
Vector clocks and Conflict-free replicated data types should make it safer and
easier for more and more applications to follow a similar approach to
distributed data storage with delayed synchronizations to the reference
data center:

[https://en.wikipedia.org/wiki/Conflict-
free_replicated_data_...](https://en.wikipedia.org/wiki/Conflict-
free_replicated_data_type)
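
As a rough illustration of the idea, here is a sketch of one of the simplest CRDTs, a grow-only counter (G-Counter). It is not something the article says Uber uses, and the replica names "phone" and "dc" are made up for the example:

```python
class GCounter:
    """Grow-only counter: one slot per replica, merge takes the per-slot max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        # A replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-slot max is commutative, associative, and idempotent,
        # so replicas converge regardless of sync order or repeats.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


phone = GCounter("phone")
datacenter = GCounter("dc")
phone.increment(3)        # updates made while offline
datacenter.increment(2)   # concurrent updates on the other side

phone.merge(datacenter)   # delayed synchronization, in either direction
datacenter.merge(phone)
datacenter.merge(phone)   # repeating a merge changes nothing

print(phone.value(), datacenter.value())
```

Because merging never loses an increment, neither side needs to coordinate before accepting writes.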

------
bholdr
This looks like the prime example of FOG computing [1]:

"A network architecture that uses one or more end-user clients or near-user
edge devices to carry out a substantial amount of storage (rather than stored
primarily in cloud data centers), communication (rather than routed over
backbone networks), and control, configuration, measurement and management
(rather than controlled primarily by network gateways such as those in the LTE
core)."

[1] [http://fognetworks.org](http://fognetworks.org)

------
devit
Looks like there might be a surplus of engineering talent (or non-talent,
perhaps) at some companies, which results in these bizarre ad-hoc projects.

~~~
detaro
What's bizarre about this?

------
api
Wow... always makes me very happy to see any move toward pushing a bit of
intelligence out to the endpoints.

------
esMazer
What if, in the distant future, they could do away with most of their
datacenter with a solution like this (improved, of course)? Imagine the
savings... maybe this is their first step toward looking into it.

~~~
gaius
Unless phone storage and bandwidth become cheaper than data centres, there is
no saving, and even if there were, the cost hasn't vanished, just been hidden.

~~~
icebraining
That depends on the type of messages; if one device is sending the data, and
the other is receiving it, you could absolutely save bandwidth by implementing
a P2P system instead of having the data center as the middle man.

An example would be the rider seeing the position of the car they have hailed
- P2P would save DC bandwidth without having the devices spend more.

~~~
gaius
There's overhead to P2P; if there wasn't we'd never have had servers, the
desktops on the LAN would have just done it.

------
OJFord
Previous discussions:
[https://news.ycombinator.com/item?id=10428508](https://news.ycombinator.com/item?id=10428508)
[https://news.ycombinator.com/item?id=10253158](https://news.ycombinator.com/item?id=10253158)

~~~
dang
Threads aren't considered duplicates until the story has had significant
attention on HN.

[https://news.ycombinator.com/newsfaq.html](https://news.ycombinator.com/newsfaq.html)

~~~
wcarss
Just noticing:
[https://news.ycombinator.com/item?id=10428508](https://news.ycombinator.com/item?id=10428508)
is a 3-day old submission from the _same user_ , and had modest attention.

~~~
detaro
Same user is fine and might actually mean that the mods asked the user to
repost (although dang probably would have mentioned it if they did in this
case)

~~~
dang
No, you're right, it was one we invited the user to repost. I didn't see that
earlier.

------
thrownaway2424
This seems to poke an even bigger-than-usual hole in their fantasy that
drivers are contractors rather than employees.

------
newobj
So, they use the internet, which is generally considered "two nines" reliable,
to achieve 99.99% reliability? Oh, tell me more. <wonka-meme />

I skimmed through the youtube preso and saw nothing about results observed in
production, either through actual DC outages, or chaos monkey, or game days.

Seems highly overwrought architecture astronaut'ing and also very difficult to
get any telemetry on how well/if it works, to me.

~~~
encoderer
Do you experience 90 hours of Internet downtime every year? That's "two
nines".

~~~
newobj
Okay, good theory crafting...

From
[http://www.nytimes.com/2011/01/09/business/09digi.html?_r=0](http://www.nytimes.com/2011/01/09/business/09digi.html?_r=0):
"A home user’s Internet connection, with a laptop using Wi-Fi, would be
available about 99.8 percent of the time, estimates Mr. Hölzle at Google,
which equates to about 18 hours of cumulative downtime a year."

So yeah, two nines.

Also, here are the results of me pinging google.com from the network of the
"large technology company" that employs me.

652 packets transmitted, 646 packets received, 0.9% packet loss.

Two nines.
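
The figures traded in this exchange can be checked with a few lines of arithmetic. This is just a sketch; the function name `downtime_hours_per_year` is made up here, and a year is taken as 365 days:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.8, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available -> {hours:.2f} hours down per year")

# Ping result quoted above: 652 sent, 646 received.
packet_loss = (652 - 646) / 652
print(f"packet loss: {packet_loss:.1%}")
```

Two nines (99%) works out to roughly 87.6 hours a year, so "90 hours" is a rounded figure, and 99.8% gives about 17.5 hours, in line with the "about 18 hours" in the quote.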

------
benevol
They sure know how to offload their costs to society, while reaping the
benefits.

I'm looking forward to the day when this "fat client" strategy will have led
to a "decentralized Uber" (based on open-source technology, etc.).

~~~
travisp
Securely storing information about a driver's trip on that driver's phone is
hardly "offloading their costs to society." I assume the driver does not want
the information about the trip they are providing to be lost, and therefore
that they do not get paid, if an Uber data center goes down. This is not about
storing all of Uber's data on the driver's phones (this would make no sense),
only the information for that particular driver's activity.

Further, as the article describes, this may actually be more reliable than
separate backup data centers.

