
How We Built Uber Engineering’s Highest Query per Second Service (2016) - dankohn1
http://eng.uber.com/go-geofence/
======
gfrangakis
A must-read followup that explores other data structures for geofencing:

[https://medium.com/@buckhx/unwinding-uber-s-most-
efficient-s...](https://medium.com/@buckhx/unwinding-uber-s-most-efficient-
service-406413c5871d)

~~~
willsalz
Great write up! I cringe whenever I see that blog post; other spatial indexing
systems at Uber were and are designed more thoughtfully, and with better
performance characteristics.

~~~
velox_io
The article didn't really go into detail about how this works within each
city.

I am surprised they're scanning each city linearly, even a crude index (such
as boxing), or a binary search would greatly improve the lookup time. I guess
if the data fits into the cpu cache the order these are scanned doesn't matter
so much?

------
Twirrim
2 years after the initial blog post, and we get this submitted twice in two
days?

See
[https://news.ycombinator.com/item?id=16084090](https://news.ycombinator.com/item?id=16084090)
for the brief discussion yesterday.

------
mrdmnd
"Complicated S2?" That's surprising, the S2 library
([https://github.com/google/s2geometry](https://github.com/google/s2geometry))
seems really clean and fairly intuitive to me.

~~~
Mr_P
Seriously. S2 is just quadtrees over a cubemap, with a nice mapping from
integers to cells.

So, instead of using a quadtree (because it's "complicated"), they chose to
make their own two-level tree requiring O(N) linear scans at both levels, as
opposed to O(1) with a simple trick.

------
jwatte
4500 queries per second per node doesn't sound like much.

"100s" of linear polygon scans could likely turn into a few, plus a quad tree
or spatial grid hash lookup. I don't quite understand why those approaches are
considered "complex?" Every simplistic game engine does this as soon as it
implements collision.

Finally, choosing languages for workloads is fine. Python and JavaScript are
famously non-threaded. But what about Java, C#, or C++? And why is the index
updated in process, instead of a new process spinning up with the new index,
and then switching over who takes requests?

Not saying what they built doesn't work, just curious about options.

~~~
quietbritishjim
> Finally, choosing languages for workloads is fine. Python and JavaScript are
> famously non-threaded.

Python has famously misunderstood threading support. It is certainly single
threaded sometimes (or rather multi threaded but only one can run at a time
even on a multi core machine). But it can be usefully multithreaded if most of
your computation is done with a C extension module that releases the GIL while
it's being called e.g. numpy. If such a wrapper for the C++ S3 library exists
(I haven't checked) it would be viable for this task.

I agree that choosing languages for work loads is fine though, and admittedly
even in the best case Python would still not be as efficient as Go.

------
jdcarr
The last time this was posted a few commenters mentioned using PostGIS instead
of Uber rolling this themselves.

If anyone's used PostGIS with a similar level of traffic/latency requirements
could you comment on your experience?

~~~
afeezaziz
I am learning PostGIS and PgRouting but there is not many tutorials/resources
out there. Do you have any recommendations?

~~~
SOLAR_FIELDS
Search Anita Graser, she has some good introductory materials.

~~~
afeezaziz
Thanks!

------
NelsonMinar
I imagine this is the service they used to implement greyballing.
[https://www.nytimes.com/2017/03/03/technology/uber-
greyball-...](https://www.nytimes.com/2017/03/03/technology/uber-greyball-
program-evade-authorities.html)

------
ianamartin
I know it's fun to hate on Uber here, and there are plenty of reasons to do
so. But this is surprising to me. I was at one of the Uber tech meetups in NYC
last year or maybe a year and a half ago, and it seemed like a total circus.
The guy who was championing golang was up there talking about how cool go is
and that they were rewriting all their microservices in it just because go is
cool.

He also talked about how many problems it had caused them and all that fun
stuff. I left that meetup thinking two things: 1. Who the fuck breaks stuff
just to use a cool new language? 2. Good lord, I really don't ever want to
work on this team. It's chaos.

------
SamPutnam
_Instead of indexing the geofences using R-tree or the complicated S2, we
chose a simpler route based on the observation that Uber’s business model is
city-centric; the business rules and the geofences used to define them are
typically associated with a city. This allows us to organize the geofences
into a two-level hierarchy where the first level is the city geofences
(geofences defining city boundaries), and the second level is the geofences
within each city._

What about Uber's business model impacted this choice of algorithm?

Edit: left only the unanswered question

~~~
chris11
I'm not quite sure what you are referring to by strict background checks. But
here's I think the service is used. A user initiates a call to the service
requesting a ride. The service takes in the user's location and first finds
the city (first tier), then the service searches and returns the georeferences
in that city for ones that contain that user location. These georeferences
might attached to drivers, and those drivers might be analyzed to find good
drivers to send that ride request to, or they could be something like
neighborhoods, where Uber would search that neighborhood for drivers. I'd
guess that city specific requirements could be associated with georeferences
in either tier.

------
stablemap
From early 2016.

------
ryanthedev
Excellent article! I love to see go more and more in big enterprise companies.

Have you guys looked at moving to rust for any services?

------
tintor
Why do you need 50ms for 99th percentile for such a simple and stateless
service?

------
beatpanda
Hey, really cool tech, good work. How's building a profitable business going?

~~~
foobar1962
A shame this comment was voted down. It is a pertinent question.

~~~
rosser
Disagree. It's _im_ pertinent, both in the sense of its being off-topic for
the _technical_ achievement in The Fine Article, and that of its being ill-
mannered.

I say that as someone who assiduously avoids using Uber's services, who finds
what I know of their culture to be odious, and who has strongly criticized
some of their other technical decisions, so please don't construe my comment
as fanboy defensiveness.

------
chx
Could we ban Uber from HN? There really needs to be consequences.

------
sideshowb
Sounds legit. I heard Uber does lots of queryable stuff per second.

