
How We Built Uber’s Highest Query per Second Service Using Go - dodders
https://eng.uber.com/go-geofence/
======
emrekzd
I don't think I've learned anything from this article. It's interesting how
quickly the keywords pick up on HN (Uber and Go).

More than 3 years ago we implemented a reverse geocoder web service that
indexed the complete Census TIGER dataset. The service handled over 15K
reqs/sec on a 2011 MacBook, doing exact in-polygon search (no approximations).
We implemented an R-tree data structure, optimized for in-polygon search, for
the lookups.

Assuming Uber's geofence lookups would rely on a much smaller dataset than
TIGER, I think you could have come up with a much more efficient
implementation requiring a fraction of the resources.

And again, the use of Go is irrelevant.

~~~
azth
> We implemented an optimized (for in polygon search) R-tree data structure
> for the lookups.

I wouldn't be surprised if, due to the lack of generics in Go, this approach
was less feasible.

~~~
nitwit005
Where would they even use generics? All their algorithm requires is arrays of
polygons, and a function to check if a polygon contains a point.
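For concreteness, that containment check is only a few lines; a minimal
even-odd ray-casting sketch in Go (types and names here are illustrative, not
Uber's code):

```go
package main

import "fmt"

type Point struct{ X, Y float64 }

// contains reports whether p lies inside the polygon, using the
// even-odd rule: cast a horizontal ray from p and count edge crossings.
// The closing edge from the last vertex back to the first is implied.
func contains(poly []Point, p Point) bool {
	inside := false
	j := len(poly) - 1
	for i := 0; i < len(poly); i++ {
		a, b := poly[i], poly[j]
		// Does the edge (a, b) straddle p's horizontal line?
		if (a.Y > p.Y) != (b.Y > p.Y) {
			// X coordinate where the edge crosses that line.
			x := a.X + (p.Y-a.Y)*(b.X-a.X)/(b.Y-a.Y)
			if p.X < x {
				inside = !inside
			}
		}
		j = i
	}
	return inside
}

func main() {
	square := []Point{{0, 0}, {4, 0}, {4, 4}, {0, 4}}
	fmt.Println(contains(square, Point{2, 2})) // true
	fmt.Println(contains(square, Point{5, 2})) // false
}
```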

~~~
azth
To store the RTree entries. Say you want to switch between 32 bit and 64 bit
floating point types in your implementation, without re-writing the entire
thing.

~~~
majewsky
sed -i 's/float32/float64/' *.go

------
StreamBright
This article is a little bit hand-wavy for me. Essentially they could have
used any language to implement this, and there is no proof of why channels
could not be used. A bit more detail would be essential to make this a good
read, at least for me.

~~~
throwaway6497
Glad I am not the only one who feels this way. Some benchmarks between the
LoadPointer/StorePointer approach and the read-write lock would be helpful.
Also, simplified code blocks for each of the approaches and more detail on the
pros/cons of each approach would have helped understanding.

~~~
harlanji
Agreed. CSP is one of the selling points of Go, in my view. I expect some
overhead to be the price of a simple and easy concurrency model. Some
conservative use of locking etc is okay in some cases, and as a non-Gopher
those are the cases I'm most interested in.

~~~
chimeracoder
> Agreed. CSP is one of the selling points of Go... and as a non-Gopher those
> are the cases I'm most interested in.

This makes sense, and when I first started writing Go, I felt the same way,
but almost four years later, concurrency is one of the least important reasons
I still write Go.

I wouldn't say all Gophers feel this way, but I know it's a very common
experience for Gophers - "you come for the concurrency, but stay for the
interfaces"[0].

[0] I've alternatively heard things like "readability", "tooling", and
"robustness" used in the place of "interfaces" in this quote.

~~~
hepta
Wait a minute, is this a troll post? Interfaces? I get the tooling, maybe, and
CSP out of the box, but I don't understand what's great about Go's interfaces
(compared to plain old boring Java, for instance).

~~~
stuartweir
Have you had much experience with Go? Go interfaces and Java interfaces are
different beasts... Go's interfaces are incredible...take a look at Reader and
Writer from the io package as an example - they're two of the most powerful
interfaces in the language, and extremely flexible.

~~~
chimeracoder
> Go's interfaces are incredible...take a look at Reader and Writer from the
> io package as an example - they're two of the most powerful interfaces in
> the language, and extremely flexible.

Yup. I literally just gave a talk at two Go conferences in the last two weeks
(GopherCon India and GopherCon Dubai) about the io.Reader/io.Writer interfaces
specifically. They're deceptively simple on the surface, but they're insanely
powerful once you 'get' them.

~~~
solidsnack9000
It would be interesting to hear more about the power you see in `io.Reader`
and `io.Writer`. They seem to say, effectively, that every type implements
deserialization from bytes and serialization to bytes. Generic serialization
is not so unusual, but maybe in the context of Go it can be so much more?

------
Animats
It's interesting that they're doing a spatial database without using a 2D
spatial data structure. MySQL and Postgres both have data structures for this.
But maybe they don't have that many geofences per city.

They have an N readers, one writer problem. It's possible to do that without
blocking the readers. There was a YC article on a lockless solution to that
yesterday. This is an easier problem. Each city's geofences can be updated by
making a copy, updating the copy, and atomically replacing the single
reference to the copy. Any searches still running on the old copy continue to
run on the old copy. Eventually, the GC deletes the old copy.
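A minimal Go sketch of that copy-on-write scheme, using sync/atomic's Value as
the single swappable reference (all names here are illustrative, not Uber's
code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Geofence struct{ Name string }

// index holds the current snapshot; readers never block.
var index atomic.Value // stores []Geofence

// lookup reads whatever snapshot is current when it starts.
func lookup() []Geofence {
	return index.Load().([]Geofence)
}

// update builds a fresh copy and atomically swaps the reference in.
// In-flight lookups keep using the old slice until the GC reclaims it.
func update(fences []Geofence) {
	copied := make([]Geofence, len(fences))
	copy(copied, fences)
	index.Store(copied)
}

func main() {
	update([]Geofence{{Name: "downtown"}})
	fmt.Println(len(lookup())) // 1
	update([]Geofence{{Name: "downtown"}, {Name: "airport"}})
	fmt.Println(len(lookup())) // 2
}
```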

~~~
spyspy
I believe that's exactly what they were doing with the
StorePointer/LoadPointer primitives. However, it is generally agreed in the Go
community that sync/atomic is to be avoided in favor of sync.RWMutex for this
type of problem. The RWMutex is designed for a 1-writer/N-readers scenario.
The code is more readable and you don't offload any work to the GC.
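For comparison, the RWMutex version of the same refresh pattern might look
like this (a sketch, not the service's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

type Index struct {
	mu     sync.RWMutex
	fences []string
}

// Lookup takes a read lock, so many readers can proceed in parallel;
// they only stall while a refresh holds the write lock.
func (ix *Index) Lookup() int {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	return len(ix.fences)
}

// Refresh replaces the data in place under the write lock:
// no copy, and nothing extra for the GC to collect.
func (ix *Index) Refresh(fences []string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.fences = fences
}

func main() {
	ix := &Index{}
	ix.Refresh([]string{"downtown", "airport"})
	fmt.Println(ix.Lookup()) // 2
}
```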

~~~
pcwalton
> However, it is generally agreed in the Go community that sync/atomic is to
> be avoided in favor of sync.RWMutex for this type of problem.

Why? Concurrent data structures are extremely useful.

> The code is more readable and you don't offload any work to the GC.

You shouldn't write concurrent data structures yourself; you should use a
library. And the amount of time spent in the GC for this is going to be
negligible assuming writes are infrequent compared to reads. It's rare in a
GC'd language that your write operation won't be creating garbage somewhere
along the way, so it ends up being amortized.

------
chris_overseas
"Instead of indexing the geofences using R-tree or the complicated S2, we
chose a simpler route..."

"While the runtime complexity of the solution remains O(N), this simple
technique reduced N from the order of 10,000s to the order of 100s."

It seems odd to me that they're posting about the performance of Go, yet they
deliberately chose a less optimal algorithm because the better ones were
"complicated". Or, if their simpler approach did in fact run faster than
R-trees or S2, it would have been nice to see some benchmarks and a clearer
explanation for why. Choosing the best algorithm seems more important than the
language in a case like this one.

~~~
brianwawok
Sometimes a worse big-O algo is way better in real life, due to things such as
memory alignment and page size. The more time you spend writing high
performance code, the more you learn that the kind of algorithm analysis you
learn in school is crap in the real world. If my O(N^2) algo is 3 lines and
smoking your 600-line O(N log N) algo in real benchmarks, guess which one I am
using?

~~~
vvanders
Yup, so much this.

I always have fun trotting out radix sort and watching fresh CS grads try to
understand how an O(kN) sort walks over O(N log N) by a factor of 10x+.
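For the curious, here is a least-significant-byte radix sort over uint32 keys
in Go, as a sketch of the O(kN) technique (not anyone's production code):

```go
package main

import "fmt"

// radixSort sorts uint32s least-significant-byte first: four
// counting-sort passes of 256 buckets each, i.e. O(kN) with k = 4,
// regardless of N.
func radixSort(a []uint32) {
	buf := make([]uint32, len(a))
	for shift := uint(0); shift < 32; shift += 8 {
		var count [256]int
		for _, v := range a {
			count[(v>>shift)&0xFF]++
		}
		// Prefix sums turn bucket counts into output positions.
		pos := 0
		for i := range count {
			c := count[i]
			count[i] = pos
			pos += c
		}
		// Stable scatter into the other buffer, then swap roles.
		for _, v := range a {
			b := (v >> shift) & 0xFF
			buf[count[b]] = v
			count[b]++
		}
		a, buf = buf, a
	}
	// Four passes is an even number of swaps, so the sorted data
	// ends up back in the caller's slice.
}

func main() {
	xs := []uint32{170, 45, 75, 90, 802, 24, 2, 66}
	radixSort(xs)
	fmt.Println(xs) // [2 24 45 66 75 90 170 802]
}
```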

~~~
brianwawok
I also have fun busting out O(N^2) and getting a death stare. Then I ask them
the biggest N could possibly be and get an answer like "16"...

~~~
vvanders
Haha yeah,

Fun bit about radix is it scales really well if your dataset fits in memory.
Last time I benchmarked my lame Java implementation it was about 2-5x faster
than the built-in Arrays.sort() on native values (float, int) and 20x faster
once you put it up against Comparable. It trended that way well up to 65k+
entries.

It gets even better if you can drop down to a proper native language that lets
you prefetch.

------
NolMan
I was surprised not to see PostGIS at least mentioned. I've been using PostGIS
professionally for the last few years and have only good things to say; the
toolchain is very mature.

We make pretty extensive use of R-tree indexes in PostGIS: tables with
10,000,000+ rows and shapes that roughly match streets, parcels, blocks, and
zip codes. While I don't have 99th-percentile data offhand, average query time
is <1ms. Perhaps there are slow responses at the 99th percentile, but I would
be very surprised by this.

------
williamsharkey
I would be interested in comparing the performance of Uber's first-pass "city-
fencing" vs. a precomputed 1D array.

E.g.: subdivide a flat projection of earth into n^2 squares. Create an array
of length n^2. Set the value of each element in the array to a list of
candidate geofences (those with area in that square).

Scale lat and long between 0 and 1. Then you can index directly into it with
PrecomputedArray[floor(lat*n)*n + floor(long*n)]

This is trading space for time; may as well choose space here.
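A Go sketch of that lookup, with the index computed as row*n + column, i.e.
floor(lat*n)*n + floor(lng*n) (the grid resolution and all names are made up
for illustration):

```go
package main

import "fmt"

const n = 256 // grid resolution; the flat projection maps to n*n cells

// grid[cell] holds the IDs of geofences overlapping that cell.
// (A real build step would rasterize each fence into every cell its
// area touches; that precomputation is elided here.)
var grid [n * n][]int

// cell maps a lat/lng pair, pre-scaled into [0, 1), to its bucket.
// The index is row*n + column, so both axes are subdivided.
func cell(lat, lng float64) int {
	return int(lat*n)*n + int(lng*n)
}

// candidates returns the fences worth running the exact
// point-in-polygon test against: a single array access.
func candidates(lat, lng float64) []int {
	return grid[cell(lat, lng)]
}

func main() {
	// Register hypothetical fence 7 in the cell covering (0.5, 0.5).
	grid[cell(0.5, 0.5)] = append(grid[cell(0.5, 0.5)], 7)
	fmt.Println(candidates(0.5, 0.5))  // [7]
	fmt.Println(candidates(0.25, 0.5)) // []
}
```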

~~~
jonaias
Classical example where algo > lang

Great approach!

------
artursapek
> While historically Uber has been mostly a Node.js and Python shop, the Go
> language is becoming the language of choice for building many of Uber
> Engineering’s new services.

It feels like this type of narrative is becoming more common.

~~~
hughw
I was surprised that they felt Node's performance was slower for computation,
given the effort V8 has put into optimization. Particularly he called out
"interpreted" and "dynamic-typed". But V8 compiles and optimizes frequently
used functions, and recompiles running functions when it detects that they are
being called with the same runtime types repeatedly. This runtime optimization
can't be quite as fast overall as Go, but I'd be surprised if well-designed JS
couldn't run 60-80% as fast. Maybe that's enough difference for them to make
the switch.

~~~
hughw
It's chickens*t to downvote my comment when you could argue a fact, or explain
something to me that I don't understand.

My question isn't silly. I just now wrote two short programs, one in Go and
one Javascript, to run a vector multiply and add on two length 1000 vectors, a
million times. The JS program is faster.

Result (in seconds):

      $ node vmadd.js
      0.984
      $ go run vmadd.go
      1.141044662

JS code:
[https://gist.github.com/hwinkler/62c9da0d9f2d8c7981b0](https://gist.github.com/hwinkler/62c9da0d9f2d8c7981b0)

Go code:
[https://gist.github.com/hwinkler/d234bb62b2c5fa081a50](https://gist.github.com/hwinkler/d234bb62b2c5fa081a50)

~~~
_ph_
`go run` first compiles the code and builds an executable, so for both systems
you are mostly benchmarking compilation time. Building an executable in Go
takes most of a second. A meaningful benchmark would compare the run times
alone.
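One way to exclude compile time and setup is the testing package's benchmark
machinery; a sketch with a stand-in vmadd kernel, since the gist's actual code
isn't reproduced here:

```go
package main

import (
	"fmt"
	"testing"
)

// vmadd is a stand-in kernel: c[i] = a[i]*b[i] + c[i].
func vmadd(a, b, c []float64) {
	for i := range a {
		c[i] = a[i]*b[i] + c[i]
	}
}

func main() {
	a := make([]float64, 1000)
	b := make([]float64, 1000)
	c := make([]float64, 1000)
	for i := range a {
		a[i], b[i] = float64(i), 2.0
	}
	// testing.Benchmark runs the loop until the timing stabilizes,
	// with compilation and setup excluded from the measurement.
	r := testing.Benchmark(func(bm *testing.B) {
		for i := 0; i < bm.N; i++ {
			vmadd(a, b, c)
		}
	})
	fmt.Println(r) // ns/op output varies by machine
}
```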

~~~
artursapek
Exactly. Build it and then benchmark the binary. Nobody uses `go run` in
production.

~~~
hughw

      $ go build vmadd.go
      $ ./vmadd
      1.106074715

This is a 2013 Core i7 15" MBP

~~~
hughw
I'd rather be right than have karma.

------
rosshemsley
Ridiculous article.

'Highest QPS?'

This is a trivial geometric problem, any sane implementation should be several
orders of magnitude faster than doing network IO. /rant

~~~
codesushi42
Agreed.

"Instead of indexing the geofences using R-tree or the complicated S2..."

"... we first find the desired city with a linear scan of all the city
geofences"

Why not use a spatial index? It's not hard, and you wouldn't need to worry
about rebuilding your index because your city geofences are not likely to
change frequently.

The bottleneck here isn't rooted in I/O; the fix is a better algorithmic
approach to the problem.

------
mbell
I'm not terribly familiar with geo algorithms, but I'm curious why it's
worthwhile taking the two-layer approach.

Can't you precompute the X and Y minima and maxima for each geofence and throw
out 99% of candidates extremely quickly? In other words, store the bounding
rectangle for each geofence. Even with 100,000s of entries, we're talking 4
primitive comparisons per entry to exclude all impossible candidates. I would
have a hard time believing this is slower than ray casting against city
geofences.
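A sketch of that bounding-box prefilter in Go (types and names are
illustrative):

```go
package main

import "fmt"

type BBox struct{ MinX, MinY, MaxX, MaxY float64 }

// contains is the cheap rejection test: four comparisons, no trig,
// no ray casting. Only fences that pass go on to the exact
// point-in-polygon check.
func (b BBox) contains(x, y float64) bool {
	return x >= b.MinX && x <= b.MaxX && y >= b.MinY && y <= b.MaxY
}

// prune returns the indexes of fences whose boxes contain the point.
func prune(boxes []BBox, x, y float64) []int {
	var hits []int
	for i, b := range boxes {
		if b.contains(x, y) {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	boxes := []BBox{
		{0, 0, 1, 1},       // overlaps the query point
		{10, 10, 11, 11},   // rejected in four comparisons
		{0.5, 0.5, 2, 2},   // also overlaps
	}
	fmt.Println(prune(boxes, 0.7, 0.7)) // [0 2]
}
```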

~~~
silenz
It would still be faster with a two-layer approach.

Check 100,000 entries (4 comparisons each) -> filter down to the 1% that need
expensive calculations.

Vs.

Check 25 cities (4 comparisons each) -> filter down to 1/25 of the entries.
Then do another easy pass over the filtered 4,000 and get the same areas
needing expensive calculations as above.

That's 25 + 4,000 easy calculations against 100,000.

But yeah, I'm not sure why they went the hard route from the beginning.

~~~
mbell
That assumes the cities have rectangular boundaries, at least the inference I
got from the article is that they are also polygonal geofences.

If you have a nested series of rectangular bounding boxes you're well on your
way to building an R-tree.

------
spacemanmatt
Why not PostgreSQL (PostGIS)?

~~~
throwaway449
Most things at Uber were first built in the natural "why don't you just" way.
Python/Flask monolith, Postgres, haproxy, etc. They are being replaced with
fancier distributed systems (microservices, Schemaless, Ringpop, etc) by
necessity.

~~~
TYPE_FASTER
Has the transaction rate surpassed what you can get out of Postgres? If not,
what is the necessity for the change? Just curious. Thanks for any insight.

------
bufo
_CPU intensive workload. Geofence lookups require CPU-intensive point-in-
polygon algorithms. While Node.js works great for our other services that are
I/O intensive, it’s not optimal in this use case due to Node’s interpreted
and dynamic-typed nature._

That's misleading, math in V8 can be VERY fast (aside from the current absence
of vectorization). See the benchmarks here:
[http://julialang.org](http://julialang.org)

~~~
TillE
Can be. Based on those particular benchmarks, LuaJIT is even better than V8,
and I've frequently run into situations where doing math in C++ is 5-10x
faster than in LuaJIT.

I suspect the lack of an integer primitive is a big deal in both Lua and
JavaScript.

~~~
pcwalton
> I suspect the lack of an integer primitive is a big deal in both Lua and
> JavaScript.

Not as much as you'd think. All JavaScript JITs speculate that numbers that
have been observed to be integers remain integers, and compile them to use
integer operations. As long as your code is hot enough to enter the optimizing
JIT, the overhead is solely in whatever overflow checks couldn't be optimized
out.

~~~
chrisseaton
Right - just because you don't see integer primitives when you're programming
doesn't mean they aren't there.

------
benbjohnson
How often does the fencing data change? If it's infrequent I wonder if you
could load the data on start up and simply restart nodes to refresh the data.
That would avoid a mutex lock entirely.

However, a mutex lock/unlock should take <100ns. It doesn't seem like that
should be a bottleneck. It'd be interesting to hear more about the data,
requirements, and response times (median, 90%, etc).

------
iamleppert
I've done something similar. I used the GPU (actually WebGL) and made several
images of my geofences, where each one was simply drawn as a bitmask (black
and white). Then I wrote a fragment shader to determine whether a point is
within a geofence by looking up the pixel value. I rounded the lat/long pairs
to a precision that matched the resolution of my image.

Can't remember the exact results but throughput was much higher than what I
was able to achieve using highly optimized PostGIS queries.

------
bsaul
Really interesting to see that Go channels weren't used due to performance
concerns. Channels are the often-advocated approach to concurrency in this
language, so having to fall back to mutexes seems like a downside. I would
have loved to see a benchmark in the article.

~~~
pcwalton
This is very common. Sharing memory is extremely important whenever you're
doing anything related to CPU parallelism/concurrency; often if you don't use
it you end up worse than the sequential algorithm.

~~~
haberman
It's surprising to hear you say this. I thought a design goal of Rust was
highly discouraging shared-state concurrency. Has this design philosophy
evolved somewhat over time?

~~~
pcwalton
> Has this design philosophy evolved somewhat over time?

Yes, it has. Post-1.0 Rust's design philosophy is about _taming_ shared state
concurrency: that is, disallowing data races statically, making sure you take
locks or use atomics where possible, and having a robust set of generic
concurrent data structures ready for use so you don't have to implement your
own.

This article, for example, rightly observes that atomics are difficult to get
right, making it not scalable to have every program implement concurrent data
structures from scratch. The benefit of having a rich set of generic
concurrent data structures readily available is that it mitigates this
problem.

~~~
haberman
Makes sense! It's definitely a compelling story to offer the same abstractions
that exist in other languages, but with static guarantees of correctness. A
mutex that you can't mess up, that's a great thing.

------
nevi-me
I built a route-finding algorithm in Node.js for one of my projects. I also do
some geofence checks, but I use a geo-index (MongoDB).

Similar to other comments, the article is a bit lacking in detail. Regardless
of the language used, I would have liked to see a comparison of query times
with and without a spatial index. From my limited experience with PostGIS,
some point-in-polygon queries perform poorly on large polygons, where MongoDB
has done better, and vice versa with linestrings.

------
yueq
"...service handled a peak load of 170k QPS with 40 machines running at 35%
CPU usage on NYE 2015. The response time was < 5 ms at 95th percentile, and <
50 ms at the 99th percentile."

It's ~4k QPS per instance. I would like to know what the R/W ratio is, and how
often the index is updated is the key (i.e., how often locking operations
happen); this should be mentioned in the article.

Also this article should give more comparison on other implementations, if
possible. At least compare with Uber's monolithic implementation.

The geo-datastructure part is very cool.

------
nailer
> Because node.js is single threaded

cluster has been around a while now, and is built into Node 5.x.

------
enitihas
Am I the only one not able to access the site?

~~~
EugeneOZ
Copy for you:
[https://rawgit.com/e-oz/6c563f372feccab78e8d/raw/65c091dd881...](https://rawgit.com/e-oz/6c563f372feccab78e8d/raw/65c091dd8812a695b73c864c825d7f799783dda1/snapshot.html)

------
rodionos
The article is about hiring a Go developer who will write a better article.
------
EwanG
If you need an explanation of how Go might help your organization, this
discusses how Uber does GeoFencing using the language.

------
w84t1me
A reliable runtime is listed in the three main benefits of using Go. Are the
runtimes for the alternatives really any less reliable?

I can buy that the language is easy to learn since the feature set is very
limited, but is productivity in the first week/month really what to optimise
for?

They also list performance as a benefit.

Go is a reasonable choice, but I doubt that the result would have been much
different if they'd used any other modern statically typed language such as
Rust, Scala, D, or even Nim.

------
matthewaveryusa
>Because Node.js is single threaded, background refreshing can tie up the CPU
for an extended period of time (e.g., for CPU-intensive JSON parsing work),
causing a spike in query response times.

I'm not the biggest Node fan, but that is utterly false. How do you think
asynchronous system calls get processed in libuv (the event loop that Node
uses)? Threads. Of course you'll park the background thread while you're
waiting in the case of I/O, but if it's CPU-intensive then you bet you can use
all cores for background work. You just need to know how to write C++ and read
the documentation on the bindings.

~~~
sciurus
Why should they write it in JavaScript _and_ C++ when they can write it in one
language instead?

~~~
matthewaveryusa
They already have Node, which is a mix of JavaScript, C++ (V8), and C (libuv).
Go is a fine language, and it's OK to want to write infrastructure code in Go,
but they shouldn't be giving false reasons.

~~~
sciurus
Sure, node.js is implemented in multiple languages. But when people talk about
writing code in node.js, they mean writing javascript. It's a javascript
runtime, after all; it says so right on the front page of
[https://nodejs.org/en/](https://nodejs.org/en/) and even has js in the name.
:-)

Saying "of course node.js supports multiple threads of concurrent computation,
you just have to write in c++ instead of javascript" is missing the point of
why people use it.

~~~
cdelsolar
Hahah I wasn't sure if parent was trolling but it seemed like he wasn't...

