
Rides of Glory – Uber Blog (2012) - t0dd
https://web.archive.org/web/20140827195715/http://blog.uber.com/ridesofglory
======
mmcclure
I gotta say, I'm not really seeing the creepy / cringey / evil / whatever-else
here...

Anyone (especially the HN crowd) should know they have the data, and if you
think they're not carefully analyzing it behind the scenes (like every other
tech company who has your data), I've got things to sell you. I personally
think a tiny peek like this into the data, much like the usage posts that
OKCupid, YouPorn, and others give, is neat.

~~~
striking
The problem here (for me personally, at least) is that Uber is not in the
business of selling dates/"encounters" and that people don't expect a
ridesharing company to go right for the sexual data. Even OKCupid is
straddling the line here with
[http://blog.okcupid.com/index.php/we-experiment-on-human-bei...](http://blog.okcupid.com/index.php/we-experiment-on-human-beings/)
noting that:

    
    
      To test this, we took pairs of bad matches (actual 30% match) and told them they were exceptionally good for each other (displaying a 90% match.)
    

That's really not something people like having done to them. And the "HN
crowd" shouldn't have an expectation of privacy and decency in data? Of course
they're analyzing data, but it's really the viewpoint from which they do it
that is unsettling. OKCupid says "no, duh, we're unethical. Deal with it."
Uber says "Check it out! We drew a line between social security checks and
prostitution!" (as waterlesscloud notes at
[https://news.ycombinator.com/item?id=8644138](https://news.ycombinator.com/item?id=8644138)
)

There are a million more beneficial ways that people could be using the data.
Fighting hunger, poverty, illiteracy, etc., to me, is a "good" use of Big
Data. Looking at sexual habits (when you're not selling sex) or openly
manipulating people to get data is, to me, a "bad" use.

~~~
gtremper
A little off-topic, but I don't see why OKCupid's actions here are unethical.
Their matching algorithm isn't perfect, so they shouldn't treat it as an
oracle of truth. How else would they discover false negatives in their
algorithm? Especially since, in this case, a false negative is worse than a
false positive (not meeting someone you'll like vs having one unsuccessful
date).

~~~
saidajigumi
> How else would they discover false negatives in their algorithm?

This is _exactly_ why research that deals with humans at Universities
invariably must pass a human subjects review process. "How else would we
discover X?" is certainly not reason to subject anyone to an unethical
experiment. Subjecting people to what you likely believe to be a bad date
should very definitely raise red flags, even if the details in practice would
pass a human subjects review.

And that's the trouble: there's a tremendous space of research that just isn't
ethical to carry out on actual living humans. As such, we have to find methods
to determine answers to those questions that don't breach ethical standards.
The burdens of discovery must lie squarely on the researchers, not on the
(often unwitting) experimental subjects.

~~~
tempestn
Do you think that giving someone an artificially inflated OKCupid match really
rises to the standard of an unethical experiment though? OKCupid doesn't tell
you who to go on a date with; they just suggest potentially good matches.
(Right? I'm married and don't tend to troll dating sites, but that's my
understanding.) You're free to read their profile, exchange messages, etc.,
before arranging a date. If it is indeed a bad match, then most likely you
would realize your incompatibility early in the process.

------
striking
To me, it's not even the use of data _per se_ that is most creepy about this
post. Really, the tone of the essay seems to revel in "having 'fun' with user
data," as if a sophomore at a university wrote it.

I mean, I found the idea behind the post interesting: of course you can
analyze trends in ridership to draw interesting conclusions. At the end of the
day, however, it's a horrible idea to say "Hey, we know which of you are being
'frisky' and where!"

Perhaps with a different motivation, this post wouldn't be nearly as ruinous.
How about ridership patterns of sick or socioeconomically disadvantaged
people? That's the kind of data that can change lives for the better.

~~~
simonster
The author is Bradley Voytek
([http://darb.ketyov.com](http://darb.ketyov.com)), who is now a professor of
neuroscience and should probably know better.

~~~
coke12
He acknowledges the questionable nature of his blog posts on his website:

> Between June 2011 and August 2011 I worked with my friends over at Uber as
> their data scientist, writing (what I thought were) amusing, data-driven
> blog posts (among other, more serious roles).

------
dil8
Stuff like this is one of the many reasons I love archive.org. I think it's
really important to capture historical artifacts for future analysis.

The service they provide doesn't allow the "Ministry of Truth"[1] to doctor
historical documents to meet their present day narrative.

[1]
[https://en.wikipedia.org/wiki/Ministry_of_Truth](https://en.wikipedia.org/wiki/Ministry_of_Truth)

~~~
nathanb
Sadly, it does.

archive.org respect the robots.txt of the current website owner. This can mean
that they have the data but choose not to give you access to them. I have seen
cases in the past where a website I once frequented became defunct, then the
domain expired, then someone parked a holding page on that domain including a
robots.txt that keeps archive.org from displaying the old data (which do not
even belong to the current owner of the domain!).

If they wanted to, there are a number of ways Uber could prevent archive.org
from displaying that blog post. Many of these ways are due to the good faith
under which archive.org operates (nobody is forcing them to respect
robots.txt), and some even involve resorting to legal methods. But history is
always mutable.

(Nothing but love on my end for archive.org, believe me! But I do want to
point out the lengths that some people will go to alter the historical
record).

~~~
TazeTSchnitzel
The Internet Archive should implement some sort of digital signature system to
allow website owners with foresight to prevent this.

~~~
toyg
They could just timestamp different versions of robots.txt (which they
probably do already), and respect it depending on date (which is more of a
hassle, because you have to build it in your UI logic).

~~~
nathanb
That would not solve the problem they're trying to solve.

Let's say I post something that I shouldn't have posted -- insider stock
information, nude photos, whatever. Perhaps something illegal for me to post.
I need to make it go away.

I need to be able to create a robots.txt _today_ which affects stuff I posted
_yesterday_.

This is why archive.org respects the current robots.txt for access to past
content.
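
The difference between the two policies in this subthread can be sketched in a few lines. This is only an illustration, not archive.org's actual logic: `RobotsVersion`, the helper names, and the robots.txt dates are all made up, and the only real date is the snapshot timestamp from the submitted URL.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RobotsVersion:
    """One crawled copy of a site's robots.txt (hypothetical record)."""
    fetched: date
    blocks_archiver: bool


def rule_in_effect(history, day):
    """Return the latest robots.txt version fetched on or before `day`, or None."""
    earlier = [v for v in history if v.fetched <= day]
    return max(earlier, key=lambda v: v.fetched) if earlier else None


def visible_timestamped(history, snapshot_day):
    """Proposed policy: honor the robots.txt that was live at snapshot time."""
    rule = rule_in_effect(history, snapshot_day)
    return rule is None or not rule.blocks_archiver


def visible_current(history, today):
    """Policy described above: the newest robots.txt governs every past snapshot."""
    rule = rule_in_effect(history, today)
    return rule is None or not rule.blocks_archiver


# Assumed history: the site allowed archiving in 2012, then blocked it in late 2014.
history = [
    RobotsVersion(date(2012, 1, 1), blocks_archiver=False),
    RobotsVersion(date(2014, 11, 20), blocks_archiver=True),
]
snapshot_day = date(2014, 8, 27)
today = date(2014, 11, 25)

print(visible_timestamped(history, snapshot_day))  # True: snapshot stays visible
print(visible_current(history, today))             # False: snapshot is hidden
```

Under the timestamped policy a robots.txt written today can never hide yesterday's snapshot, which is exactly the retroactive removal case the current-rule policy is meant to cover.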

------
chockablock
In another deleted post [0] the author talks about using a name-to-gender API
to look at ride locations by gender, which implies that these analyses were
_not_ done using anonymized data.

[0]
[https://web.archive.org/web/20140827195715/http://blog.uber....](https://web.archive.org/web/20140827195715/http://blog.uber.com/2012/01/09/uberdata-san-franciscomics/)

~~~
kogir
You have to start with the original data, which is obviously de-anonymized.
Full data -> [gender, time, origin neighborhood, destination neighborhood]
leaves you with a pretty anonymous dataset, and is all that would be required
for this analysis.

Internal metrics teams nearly always have access to complete data. The issue
is sharing non-anonymized data externally.
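
A rough sketch of that reduction step, under some assumptions: the `RawRide` record and its field names are hypothetical, gender is taken as already inferred upstream, and rounding the pickup time down to the hour is an extra coarsening choice of this sketch rather than something described in the post.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RawRide:
    """Hypothetical full ride record as an internal team might see it."""
    rider_name: str
    gender: str
    pickup_time: datetime
    origin_neighborhood: str
    destination_neighborhood: str


def reduce_ride(ride):
    """Keep only [gender, time, origin neighborhood, destination neighborhood].

    The rider's name is dropped, and the time is truncated to the hour
    (an assumption of this sketch, not part of the original description).
    """
    coarse_time = ride.pickup_time.replace(minute=0, second=0, microsecond=0)
    return (
        ride.gender,
        coarse_time,
        ride.origin_neighborhood,
        ride.destination_neighborhood,
    )


raw = RawRide("Alice Example", "F", datetime(2012, 3, 24, 23, 47), "Mission", "Marina")
print(reduce_ride(raw))
```

Whether a tuple like this is actually anonymous depends on how many riders share each combination of values; the sketch only shows the projection, not any re-identification check.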

~~~
chockablock
I agree it's possible that the name-to-gender mapping was done before the full
ride data was handed over to this analyst. (Though just removing real names
would still leave a lot to be desired in the anonymizing process).

However there's no mention in these posts of such safeguards, and subjectively
the post reads more like the analyst is just fishing around in the full raw
dataset of ride times, start and end locations, and names. To wit:

"What else can we learn? First, we can devise a way to statistically assess
whether there are more women or men in a neighborhood than we’d expect. [...]
We used Rapleaf’s Name to Gender API to assess the likelihood of a rider’s
gender given their name, only accepting a match if the probability was >=
95%."

And in the original post, he categorizes rides as possibly related to a late-
night hookup based on whether the destination and departure points for 2 rides
are within 0.1 mi of each other.
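
To make those two filters concrete, here is a minimal sketch. It assumes great-circle (haversine) distance for the 0.1 mi check, which the posts do not confirm, and it treats the name-to-gender probability as a plain number rather than calling any real API. All function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def endpoints_match(dropoff, pickup, max_miles=0.1):
    """True if the first ride's drop-off is within `max_miles` of the next pickup."""
    return haversine_miles(*dropoff, *pickup) <= max_miles


def accept_gender(probability, threshold=0.95):
    """Accept a name-to-gender guess only at or above the quoted 95% cutoff."""
    return probability >= threshold


# Made-up coordinates roughly 0.04 mi apart, then roughly 0.7 mi apart.
print(endpoints_match((37.7749, -122.4194), (37.7755, -122.4194)))  # True
print(endpoints_match((37.7749, -122.4194), (37.7849, -122.4194)))  # False
```

Note that both checks need per-rider coordinates and names as input, which is the point being made here about the analyst working from raw data.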

>Internal metrics teams nearly always have access to complete data. The issue
is sharing non-anonymized data externally.

I disagree pretty strongly with this. Do you think that your average Uber
rider would be OK with Uber employees analyzing their ride patterns (with
their real names attached) to try to figure out where and when they are having
sex? Do you think Uber should allow such access to its employees by policy?
(It seems we agree that writing a blog post about it is not a great idea.)

~~~
Jach
> Do you think that your average Uber rider would be OK with Uber employees
> analyzing their ride patterns (with their real names attached) to try to
> figure out where and when they are having sex?

Sure, as long as Uber isn't broadcasting that information with their name
attached. The average person really doesn't care about (or understand the
extent of) data analysis (from companies or the government) -- what they care
about is public disclosure which may mean personal embarrassment or a lawsuit
or other form of inconvenience. People who want to control all their data are
hoping for a fantasy world where observations and inferences by third parties
are magically made impossible. The reasonable thing to focus lawmaking efforts
on is limiting legal forms of disclosure and standardizing safe storage
requirements for the raw data -- indeed such laws already exist, with the
HIPAA privacy rule perhaps being the best known in the US.

~~~
chockablock
HIPAA's not a great example for you to use, since it does in fact limit
_access_ to protected information by employees (under a 'minimum necessary'
standard) [0]. You can even serve time in federal prison for a violation
without disclosing anything [1].

>People who want to control all their data are hoping for a fantasy world
where observations and inferences by third parties are magically made
impossible.

I think you are setting up a straw man here. What I suspect the average user
expects is for their sensitive personal data to be dealt with in a
professional and respectful way, with protections against abuse by rogue
employees. There are plenty of companies who deal with private data and
understand this well. Potatolicious had a comment on another Uber thread
detailing the hoops an Amazon employee has to go through to get access to
private customer data [2].

Scrubbing these posts suggests that Uber realizes that they have a real
problem, at least at the PR level. I wouldn't be surprised if they are also
getting more serious about controls on internal access to ride data.

[0]
[http://www.hhs.gov/ocr/privacy/hipaa/understanding/covereden...](http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/minimumnecessary.html)

[1] [http://dailybruin.com/2010/05/05/former-ucla-medical-center-...](http://dailybruin.com/2010/05/05/former-ucla-medical-center-employee-huping-zhou-se/)

[2]
[https://news.ycombinator.com/item?id=8624945](https://news.ycombinator.com/item?id=8624945)

~~~
Jach
I think HIPAA is a great example precisely because it goes beyond "don't
disclose this": it also regulates "safe storage requirements", whose purpose
is ultimately to make unwanted disclosure (through breaches, rogue employees,
etc.) less likely, of whatever scale. (e.g. my plaintext password for a
service shouldn't ever be disclosed to even a single person.) I think we're in
agreement about people generally expecting professionalism.

------
codezero
Given the time frames they looked at, couldn't a bunch of these be people
going to a pub, and then leaving after last call?

This would also explain the spike near the weekend, among other things.

~~~
dkural
that would certainly explain the Boston geographic map.

------
colinbartlett
Is it sad that in the realm of Uber blunders I find this relatively tame?

~~~
meowface
Yeah, frankly this doesn't seem all that bad compared to some of the words
coming out of executives' mouths.

One could do an analysis like this while still working with anonymized data.
Still a bit creepy, but not that different from reports and blog posts you see
from other startups and tech companies.

~~~
potatolicious
Uber is suffering from a lack of credibility built up by years of
mild-to-moderate asshole behavior.

Nothing they've done so far, in isolation, is IMO worth the pitchforks being
handed out in tech and mainstream consciousness right now, but taken as a
whole it's pretty easy to see why people aren't willing to cut Uber any slack
or give them the benefit of the doubt.

So yeah, this thing by itself isn't "that bad", but it's one piece of a large
puzzle of Uber's misbehavior.

~~~
meowface
Agreed.

------
sp332
Ahem [https://archive.org/donate](https://archive.org/donate)

------
danso
Off-topic: Because of situations like these, I'm surprised that part of the
checklist when launching a PR blog is not: "Block googlearchive/archive.org
robots"

There have been very, very few times when a company's webpage was down and I
needed to go to google-archive or archive.org to refer to some innocuous
information. However, the times that I've used those sites to gather evidence
of possible whitewashing? Many, many times, in comparison.

~~~
yuhong
What is funny is that using robots.txt would have been enough.

------
leeber
Comparing Uber to OKCupid is ridiculous.

OKCupid is a dating website which deliberately branded themselves as further
on the "edgy" and "hookup" side of dating websites. Then you have POF
somewhere in the middle, with eHarmony way on the other side, quite opposite
of OKCupid.

I'm not sure why Uber would want to put themselves anywhere on that same scale
(i.e. aligning your brand with notions of sex and one night stands). There's a
time and a place for everything, and for edgy data analysis like this -- that
"place" is edgy dating websites who want to be known for hooking up.

It's unprofessional and out of line with their brand image, obviously why the
post got deleted. IMO this further validates all the bad press the media has
been publishing about Uber.

------
sergiotapia
I can see why they deleted it, it's extremely cringey.

~~~
waterlesscloud
I think their post (also now deleted) inferring a connection between welfare
checks and spikes in prostitution might have been even more cringey.

[https://web.archive.org/web/20140827195709/http://blog.uber....](https://web.archive.org/web/20140827195709/http://blog.uber.com/2011/09/13/uberdata-how-prostitution-and-alcohol-make-uber-better/)

Note that both of these posts had been up for years and only disappeared in
the last few days.

~~~
wtbob
Cringey, or true?

~~~
potatolicious
It can be both, like the annoying coworker who won't shut up about how much
sex he had over the weekend, or the guy who raves about his favorite scotch at
an AA meeting.

A correct observation does not shield it from being inappropriate or in poor
taste.

------
bricestacey
The Boston anomaly and map are interesting. The requirements are a trip from
10pm-4am and a trip again 4-6 hours later. However, due to the MBTA shutting
down around 12:30am and bars closing at 2am, the criteria capture just about
any partygoer.

So if you took an uber to some bar/club/friends at 10-11pm and again after 2am
when all bars or the T is closed, you're likely counted. I doubt this
represents customers having one night stands and is likely just a heat map.
This is further explained by the small pocket in Somerville that is not
accessible by the train, but by bus where people may opt for an uber.
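
A quick sketch of the time criteria as summarized above shows how an ordinary night out gets counted. It deliberately ignores the location condition from the original post, and the function names and example timestamps are made up.

```python
from datetime import datetime, timedelta


def in_night_window(t):
    """True for pickups from 10pm up to (but not including) 4am."""
    return t.hour >= 22 or t.hour < 4


def matches_criteria(first, second):
    """First ride in the night window, second ride 4 to 6 hours after it."""
    gap = second - first
    return in_night_window(first) and timedelta(hours=4) <= gap <= timedelta(hours=6)


# Hypothetical bar-goer: ride out at 10:15pm, ride home shortly after last call.
to_bar = datetime(2012, 3, 23, 22, 15)
ride_home = datetime(2012, 3, 24, 2, 20)
print(matches_criteria(to_bar, ride_home))  # True

# A short outing (2 hour gap) is not counted.
early_home = datetime(2012, 3, 24, 0, 15)
print(matches_criteria(to_bar, early_home))  # False
```

On timing alone, anyone who heads out early in the window and leaves at closing falls inside the 4-6 hour gap, whatever they were actually doing.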

That's not to say that there are no rides of glory or whatever the hell kids
call it today.

------
leeber
One more thing--

Would google publish data that shows how searches for porn spike during
different times of the day or times of the year, as if it's some "cool and hip
and edgy!" insight?

I don't think so.

And for the same reason they don't (whatever reason that is), it would
probably also be wise for Uber not to post stuff like this.

I really don't care, nor am I offended. I'm just speculating that Uber doesn't
have the brightest team of execs and still has a lot of "growing up" to do.

~~~
nathanb
This is actually a really good point.

Google have been fighting a public relations war for a long time now to not
appear creepy or stalkerish. I can think of few things they could blog about
to make people consider not using Google more than "we know when you're
looking for porn".

Uber have not (yet?) been widely called out as being creepy the way Google
have. But Uber have data that can be every bit as personal as your search
history, and posts like these make it obvious that people at Uber are thinking
hard about putting those data to use.

There's a lot lurking under what at first glance appears to be merely a
poorly-considered sophomoric post.

------
joshvm
Curious to see how companies deal with this kind of data because, ignoring the
creep-factor, it is thoroughly interesting to see these sort of patterns
emerge and the only way you can find out this kind of stuff is to track
people.

It's the actions of the unscrupulous minority that ruin this for the rest of
us. I personally believe that most of the time when companies say "We simply
aren't that interested in you." they're probably telling the truth. Stats is
pointless if you look at single points. It only takes one person to snoop on
an ex or to blow everything up. Unfortunately you have to mitigate that risk,
but proper database sanitisation before handing over to the analysts should be
sufficient. Provided there is no overlap between the sensitive database and
the one the analysts have access to there shouldn't be a problem.
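
As a rough illustration of that kind of sanitisation step, assuming a keyed hash is an acceptable way to replace identities: the field names, `SENSITIVE_FIELDS`, and both functions are hypothetical, and the secret key would have to stay with whoever owns the sensitive database.

```python
import hashlib
import hmac

# Hypothetical columns that must never reach the analysts' copy.
SENSITIVE_FIELDS = {"rider_id", "name", "email"}


def pseudonym(rider_id, secret_key):
    """Stable token for a rider, derived with HMAC-SHA256 and truncated to 16 hex chars."""
    digest = hmac.new(secret_key, rider_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def sanitise(row, secret_key):
    """Copy a row without sensitive fields, adding a pseudonymous rider token."""
    clean = {k: v for k, v in row.items() if k not in SENSITIVE_FIELDS}
    clean["rider_token"] = pseudonym(row["rider_id"], secret_key)
    return clean


row = {
    "rider_id": "r-1001",
    "name": "Alice Example",
    "email": "alice@example.com",
    "pickup_hour": 23,
    "origin_neighborhood": "Mission",
}
print(sanitise(row, b"key-held-by-the-data-owner"))
```

The token lets analysts group rides by the same rider without seeing who that is. It does not remove what the trips themselves reveal, so pickup and drop-off detail would likely need coarsening as well.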

I guess it's a side effect of becoming 'big' that you can no longer run these
kind of public posts without looking extremely unprofessional.

~~~
yuhong
_I guess it's a side effect of becoming 'big' that you can no longer run
these kind of public posts without looking extremely unprofessional._

Does it really matter these days?

------
pistle
So they blog about it in aggregate. It's not like they would know exactly who
each of those riders were and would think about using that data for anything
other than the lulz. I'm sure this sort of data wouldn't be interesting for social
engineering purposes in the hands of 'others' as well.

------
mgalka
It is a bit unsettling that this information is out there, but I agree that it
is fairly obvious that they have this data. And as long as they are not
exposing the individuals, I don't see it as irresponsible to publish something
like this.

There was a related story published recently, NYC Taxicab Dataset Exposes
Strip Club Johns and Celebrity Trips

[http://research.neustar.biz/2014/09/15/riding-with-the-stars...](http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/)

------
EvanL
I think it's a lot of fun and they pulled some interesting data from it. There
are far more pressing concerns in the world right now, people have sex
sometimes NBD.

------
api
Big bro is watching you.

~~~
cbd1984
> Big bro is watching you.

We are watching them pretty close, aren't we?

------
almost_usual
"You people are fascinating."

Is this data fascinating? I guess the time of year patterns and holiday
anomalies are interesting but aside from that this behavior seems obvious?

------
jpeg_hero
blog article was 2.5 yrs ago, they were pulling out all the stops to get
traction.

now they have critical mass they can transition into "full boring corp speak"

HN don't throw stones, what boundaries are pushing to get traction right now?

~~~
teacup50
> _HN don't throw stones, what boundaries are pushing to get traction right
> now?_

We're not pushing ethical boundaries.

------
djtriptych
Anyone know what they used for the map visualizations? They're beautiful...

------
bren2013
Uber employees are the kind of people who kept a telescope in their bedroom
window to peep on girls down the street. I'll never understand why they're
still in business.

~~~
op00to
They're still in business because they are far better than the entrenched,
governmentally-appointed taxi companies in any locality I've been in, except
for perhaps Manhattan.

~~~
cddotdotslash
Manhattanite here. Taxis are still terrible for some things. While it's almost
never impossible to find one, they still do the same tricks of "oh, the credit
card machine is broken," and "I'll only drive uptown right now," and "I don't
drive to JFK in the morning." I can't stand a lot of Uber's ethical practices,
but I still prefer them over cabs.

~~~
potatolicious
Another Manhattanite here, I actually prefer cabs to Uber, though I prefer
Lyft over both.

Uber drivers in the last year have, without fail, become much worse at
pathfinding than cab drivers. I've had drivers completely miss major turns or
get lost while the meter's running. And more recently, I've noticed a pattern
of behavior where I'd call an Uber and the car wouldn't even begin moving for
more than 5 minutes.

I'm not sure what that's all about, maybe they're waiting for Surge to kick in
in the hopes of getting a fatter fare? Either way, I have not had an Uber
arrive within the estimated time for over a year.

Lyft drivers on the other hand start moving right away after they're assigned.

> _"oh, the credit card machine is broken," and "I'll only drive uptown right
> now,"_

It sucks that you have to deal with it, but the solution is really simple.
Just get in the cab, don't be a sucker and tell the driver where you're going
through the window. If they balk, take out your phone and take a picture of
their license at the back and tell them you're dialing 311. They will
immediately fold and take you where you're going.

Ditto credit card - if the credit card machine is broken they are _obligated_
to tell you at the beginning of the ride, and if they don't you can walk away
for free.

Ditto the JFK thing - a cab _cannot_ refuse a fare within city limits.

I've never seen a cabbie not fold like a house of cards when threatened with a
311 call. For all its warts the T&LC actually polices driver complaints pretty
hard.

