
Handling 1M Requests per Minute with Go - mcastilho
http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/
======
peterwaller
There are two other solutions that spring to mind, which might require quite a
bit less code:

1) Take the original code, do the upload exactly in place in the original
request (not even spawning a goroutine). However: protect the upload with a
semaphore which only allows N-in-flight.

My reasoning is, well, if the system operates with low latency when operating
nominally, blocking the incoming request isn't too painful. The reason there
was a problem in the first place is that there were too many requests in
flight and the system hit a meta-stable state where no requests could
complete efficiently.

2) (or instead of (1)): If you're going to have a worker pool, why have that
complicated chan-chan-Job business? It seems that `func StartProcessor` was
close to being a viable solution. All you need is to start a few of those in
parallel, each reading from the same `Queue`. Was there a reason to introduce
the `WorkerPool chan chan Job`? That looks quite a bit more complicated than
it needs to be. The queues don't need to be separate per worker unless there
is some other substantial reason.

\--

The next thing one would need to take care of is ensuring that the whole
system doesn't stall due to a broken/laggy network: put timeouts on the S3
uploads, for example, so the system can return to a stable state on its own
once the thundering herd has passed.

~~~
ignoramous
Re: Semaphore: Not dropping the connection might mean we starve other
incoming msgs of resources (which might be a good thing) [0]. Also, releasing
the semaphore back into the pool in case of failure adds the complication of
having to deal with errors beyond one's control.

Re: Queue: Wouldn't the queue involve locking lest two workers end up trying
to work on the same request? To be completely concurrent, I guess one could
use a lock-free data structure instead (or implement one on top of something
like RocksDB)?

[0] [http://ferd.ca/queues-don-t-fix-overload.html](http://ferd.ca/queues-don-t-fix-overload.html)

[1] [http://engineering.voxer.com/2013/09/16/backpressure-in-nodejs/](http://engineering.voxer.com/2013/09/16/backpressure-in-nodejs/)

~~~
peterwaller
Semaphore, I'm suggesting:

    
    
      Block on N-Semaphore, with timeout
      Do timeout upload
      Replace N-Semaphore
    

If you don't _always_ replace the semaphore, that's a bug.

Queue: I'm just comparing to what the article does. It already has contention
on a queue (the chan); it's just chan-chan-Worker rather than chan-Job. In
practice, Go channels happily handle millions of messages contending across
multiple workers. Consider this test example, where you aren't even actually
burning any CPU to perform the work:

    
    
        package main

        func main() {
            q := make(chan int)
            for i := 0; i < 10; i++ {
                go func() {
                    for x := range q {
                        x = x * 10
                    }
                }()
            }

            for i := 0; i < 1000000; i++ {
                q <- i
            }
        }
    

On my laptop, it runs in 0.333s single core, and it's slightly slower when you
set GOMAXPROCS > 1. But not much slower: the total runtime goes to 0.4-0.5s
or so. (Measured with Go 1.4.) As soon as you do any actual work with the
messages you are passing around, the overhead of locking will be lost in the
noise.

------
Udo
I was confused by the numbers at first, so: that's 17k requests per second,
spread out over 4 dual core Xeon (Haswell) machines, which works out to just
over 4000 requests/s per machine. It's still a respectable number, but it's
much closer to what one would expect given the task.

Don't get me wrong, the most interesting part is definitely the implementation
and as a Go noob I found it very useful - it's just a bit misleading for the
headline to sum your request rate across all parallelized machines.

------
fasteo
>> But since the beginning, our team knew that we should do this in Go because
during the discussion phases we saw this could be potentially a very large
traffic system

I don't get this reasoning.

~~~
Ao7bei3s
Go is 1-2 orders of magnitude faster than Ruby, and much easier to write
concurrent code in.

It makes perfect sense to me. What would you have recommended them, for a
reasonably-high-performance server implementation? (Please don't say C.)

[https://benchmarksgame.alioth.debian.org/u64q/benchmark.php?...](https://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=all&lang=go&lang2=yarv&data=u64q)

~~~
pjmlp
> Go is 1-2 orders of magnitude faster than Ruby, and much easier to write
> concurrent code in.

Than MRI Ruby you mean.

The advantage wouldn't be as large if Ruby's designers had cared to add AOT
compilation to the canonical implementation, in the same vein as Dylan or
Common Lisp.

~~~
pselbert
That's exactly the space that Crystal [0] seems to be exploring: statically
type-checked and pre-compiled via LLVM. So it isn't the Ruby designers
themselves, but it's definitely Ruby-flavored.

0: [http://crystal-lang.org/](http://crystal-lang.org/)

------
thwd
Arguably, UploadToS3 should not be a method of *Payload.

Suggestion: Make an S3-uploader package with _internal_ connection pooling,
upload queueing and concurrency handling.

~~~
zimbatm
This. Each S3-uploader can then hold an http.Client instance that keeps the
connection open to S3 between uploads.

------
darksaints
Does anybody here have any experience with Go's garbage collection pauses with
large stack sizes? I've got a scala app that regularly consumes about 48G of
ram, and I'm very happy with the response times during heavy loads like this,
but the P99.5 is abysmal because of garbage collection. I've tried tuning it,
but it doesn't seem like anything I do helps. I'll probably end up using an
Azul JVM but I'm curious how other languages end up handling this problem.

~~~
cdelsolar
What is P99.5?

~~~
ihsw
99.5th percentile -- below that threshold, requests are generally fine, but
there's a small fraction of requests that take way too long to go through due
to GC kicking in.

------
bpicolo
Cool article, not necessarily because of the language specifics but because of
the thought process involved. Thanks!

------
wpeterson
If this system is merely decoding JSON and writing payloads to S3 for
asynchronous data processing, why not have your clients write directly to S3?

~~~
blakesmith
I'm not the author of the article, but if I had to guess: data encapsulation
and interface control. If all your clients are talking directly to S3 instead
of your encapsulated service interface, you can't inject any business logic
at all and must design around the fact that you don't control the service
interface; Amazon does.

------
mrfusion
It's interesting this came up today. I'm looking for a new language to migrate
my flask/uwsgi web service to. I'm having a terrible time making it scale.

Are there any tutorials/templates/best practices for writing a small web
service in Golang?

~~~
plydatbk
The only reason to write something like that in Go is if you're looking to
impress a recruiter at (Go). Advice: pick a better tool for your problem.

~~~
thequailman
This is terrible advice. Golang may have a flavor-of-the-week status in some
people's minds, but it's a language that really deserves more credit. It is
really easy to program and learn, and it does a lot of stuff other languages
rely on third-party programs for. With each release, nagging issues (mainly
around garbage collection) are getting resolved, and it's already more than
production-ready. To ignore Golang right now would be akin to ignoring Java
back in the early 2000s, in my opinion.

~~~
sbov
Golang has my attention, but I don't think it's anywhere near Java, at least
popularity-wise, in the early 2000s. By then most schools had already switched
their language of choice to Java - I'm not aware of any that have switched
theirs to Golang.

I do enjoy coding in Golang, but we use mostly Java where I work, and for us,
the benefits don't make up for the things we lose. This blog post is a great
example: the solution they had to find is the first thing you'd probably do in
Java, because Java has a standard package with all sorts of concurrency
patterns.

~~~
muraiki
Yeah, I remember wanting to do a fan-out pattern in Go and reading the Go
Pipelines and Cancellation article[0]. I saw the function merge() in the
article and thought, "Great, here's what I need!" I then proceeded to read
further and saw that I have to define this function myself based on the types
I'm using, which made me quite sad.

Go really needs a library for these patterns built in... I assume the lack of
generics prevents users from creating that themselves (I'm not trying to start
a language war here, seriously).

[0] [http://blog.golang.org/pipelines](http://blog.golang.org/pipelines)

------
AYBABTME
You can put up any large number of requests if you make the period in 'per
{{period}}' large enough.

------
eva1984
A little confused here. So the third solution mentioned in the post is just a
worker pool? I think if we just slightly modified the second solution, it
would totally work.

1) Initialize a job channel.

2) Initialize a set of workers that listen on this channel and pull jobs
indefinitely. In this case, just call go StartProcessor() a fixed number of
times.

What confuses me is that IMO workPoolChannel isn't necessary here. What is
the consideration behind using a channel of workers?

------
avitzurel
IMHO this solution is nice but wrong. The point is not just creating a "cool"
program with Go that will handle HTTP requests.

Without really knowing the company's needs, I am relying on this paragraph
from the post:

While working on a piece of our anonymous telemetry and analytics system, our
goal was to be able to handle a large amount of POST requests from millions of
endpoints. The web handler would receive a JSON document that may contain a
collection of many payloads that needed to be written to Amazon S3, in order
for our map-reduce systems to later operate on this data.

Knowing this, I would build it differently.

1. Clients post to S3 Directly

2. Lambda -> Overload business logic, private data, cleanup, spam control etc...

3. Prepare files (64M) for Hadoop

4. Hadoop

There's no reason to have that proxy in the middle, Amazon S3 will handle
those millions of requests with no real trouble, I wouldn't throw machines on
this process.

------
th0br0
Other than the worker/concurrency mechanism being part of the language, what's
the difference to a RabbitMQ-Worker architecture? You might even argue that
the lack of persistence (in the given example) is a potential source for data
loss.

~~~
fixxer
I'm guessing persistence is not a priority.

The difference is management of a simple process (behind elastic load
balancer, of course) vs a more complicated architecture with three distinct,
load balanced process types (webserver -> queue -> worker).

------
istvan__
Is this supposed to be great performance? I think Netty does ~30K/s
(1,800,000 req/min) out of the box. I thought Go had more out-of-the-box
performance; maybe I am missing something.

~~~
anonyfox
Oh, performance? Let's throw numbers around! Here, Elixir/Phoenix beats them
all!

[https://twitter.com/julianobs/status/614416512825323520](https://twitter.com/julianobs/status/614416512825323520)

Hey, and Elixir is already _way_ more expressive than Go and it's incredibly
easy to build fault-tolerant and distributed systems, not to mention the
productivity gains when using the phoenix framework!

Seriously, posting a requests/second metric without _any_ context about
hardware and sample code doesn't help anyone.

~~~
istvan__
Cool, it is only 10-16x slower than Aleph.
[https://github.com/ptaoussanis/clojure-web-server-benchmarks/tree/master/results/60k-keepalive#60k-keepalive](https://github.com/ptaoussanis/clojure-web-server-benchmarks/tree/master/results/60k-keepalive#60k-keepalive)

~~~
anonyfox
full blown mvc framework vs communication layer, seems legit :)

but hey, as long as stuff responds in microseconds with zero errors under
load, just use it ! (Also, clojure is a way better language than go, too.)

~~~
istvan__
Yes I like to throw meaningless numbers around as much as the other guy. :)

------
Spien
You can already do this with a single thread using epoll. In fact, even a
simple epoll implementation can handle 1M HTTP requests in ~30 seconds on a
single thread juggling 10k connections.

------
sinzone
Have you considered putting KONG [1], which is basically OpenResty (nginx),
in front? With LuaJIT, performance [2] is outstanding.

[1] [https://github.com/mashape/kong](https://github.com/mashape/kong)

[2]
[https://github.com/mashape/kong#benchmarks](https://github.com/mashape/kong#benchmarks)

------
mstump
16k requests per second isn't fast.

~~~
fixxer
16k requests per second is not fast if you're talking about fetching a page or
doing a minimal amount of I/O.

16k requests per second is worth writing about if you're talking about a
process with substantial side effects (S3 I/O).

~~~
mstump
It's just pass through. The limit in this instance is probably packets per
second of the legacy AWS network. Why is this difficult?

~~~
fixxer
Read the analysis. First attempt was "just a pass through".

~~~
mstump
No, it is just pass through, they just did a crappy job of handling
concurrency.

~~~
fixxer
Maybe they should hire somebody smart like you.

~~~
mstump
To be snide, maybe they should. I am an expert in this topic; this is what I
do all day, every day.

They're doing development without understanding how computers work, where the
bottlenecks are, or what the maximum theoretical throughput for the use-case
is. They ended up with something slightly better than the horrible situation
they were in, and are celebrating an inefficient solution as a technical
triumph.

~~~
fixxer
> inefficient solution as a technical triumph

They were able to solve their problem in a single process balanced over 4
boxes without ever having to hire someone like you, despite your expertise.

Could they have increased throughput? Absolutely. It would have involved a
different architecture with more complexity & time, and it also would have
relied on skills beyond what was immediately available. I'm guessing their
line count is around ~200 for the core functionality.

Can you share some actual technical points where they made an error? I would
really like to see you demonstrate expertise beyond these uninspiring
generalities.

~~~
Denzel
This type of simple I/O bound pass-through problem lends itself extremely well
to evented I/O. Conceptually, their first solution was closest to mimicking
the benefits of evented I/O, given how Go's runtime works. When a goroutine
submits a blocking I/O request, it will yield to another goroutine and wake up
later when it can work with the data. So what happened with their first
solution?

Well, Go's runtime allocates 8KB (last I recall) of growable stack space per
goroutine. Assuming that their first solution was deployed on the same
instance type as their final solution, c4.large (3.75 GB), they could handle
at most ~470,000 outstanding goroutines; and that assumes all RAM is used
only for goroutines, which is not realistic of course. So their server fell
over once it exhausted memory.

This type of memory exhaustion isn't a problem with evented I/O. You have a
single thread that responds to async events related to the I/O you're
performing.

So, due to the limitations of Go's runtime, they settled upon a worker-pool
that allows at most MAX_WORKERS outstanding requests to S3. Not the most
efficient solution for this problem. But it works for their use case, for now,
and that's what truly matters.

~~~
fixxer
Excellent, logic-driven critique.

------
meir_yanovich
Can you please explain why to use Go and not C/C++?

Forget about language syntax / compilation complexity.

Say I know both very well; now, why go with Go?

Thanks

~~~
nindalf
There are a few benefits of Go I can think of

* Fewer lines of code, fewer gotchas and hence easier to reason about and maintain.

* Powerful concurrency primitives (channels, select) built right into the language, rather than a library. A scalable producer-consumer implementation would probably be 100 lines of Go code.

* If your application isn't too latency-sensitive (game servers, high-frequency trading etc.) then the GC simplifies matters. It's guaranteed to run for a maximum of 10ms out of every 50ms, which is good enough for most applications (but it typically runs for around 1ms).

* Some of the tooling around the language is great. There are some great articles (I remember one posted to HN yesterday) about how people wrangled a lot more performance out of their code using the profile tool, for instance.

* Miscellaneous goodies like testing out of the box, an extensive standard library and being able to compile in 1/10th of the time.

An example of a service migrated from C++ to Go - dl.google.com -
[http://talks.golang.org/2013/oscon-dl.slide#1](http://talks.golang.org/2013/oscon-dl.slide#1)

These are the benefits I could think of if a programmer knows C++ and Go
equally well. However, suppose he has to work with fellow programmers who
aren't comfortable with either; then Go would be the superior choice. It
would take a week to learn most of Go and perhaps a month to grok it. I think
C++ takes much, much longer than that to learn properly.

~~~
meir_yanovich
Thank you for the reasoned reply.

~~~
nindalf
You're welcome :)

------
azth
It would have been much simpler and more straightforward to use a library
like Akka instead of manually coding all of this.

------
melling
What's the trick to resubmit without getting stuck going to the first post?
For example, I submitted this story 2 hours before this one:

[https://news.ycombinator.com/item?id=9844826](https://news.ycombinator.com/item?id=9844826)

In the past, when I try to submit a story, even if it's a couple days old, the
submission is ignored, and my vote is added to the original.

~~~
thwd
The slash at the end of the URL made the difference in this case.

------
mrfusion
Mods, can we change "go" in the title to "golang"? That's the official name,
and it makes it searchable by search engines.

~~~
f2f
I'm sorry but the official name is Go. You can search for "Golang" or "Go
Language" with no issue.

~~~
mrfusion
Would this thread come up under one of your suggested searches?

~~~
f2f
both the original article and the Reddit discussion come up in my results (as
1 and 2). the reddit discussion is only 16 hours old.

[https://www.google.com/search?q=golang%201%20million&rct=j](https://www.google.com/search?q=golang%201%20million&rct=j)

this HN discussion is too recent to show up in my results, however yesterday's
thread about Qihoo and golang does show up as number 4 or 5 when searching for
"golang qihoo".

no need to panic.

