
Rap Genius responds to Heroku's apology - tomlemon
http://rapgenius.com/Oren-teich-bamboo-routing-performance-lyrics
======
littledot5566
Not sure how this works, but... is the actual response in the green
highlights? Are all the green highlights from the same person? I think they
are. But then, who made the yellow highlights?

~~~
pseut
It looks like yellow highlights are external links.

------
alexdevkar
Love the use of their product to respond. Haha.

~~~
_neil
I noticed that on the original blog post as well. Very clever. Though it seems
from the blog posts that some people don't understand what rapgenius does
and end up saying things like 'WHAT ARE THESE BOXES WHY AM I HERE WHY DO I
NEED TO FILL THIS OUT???' in the context notes. But clever nonetheless.

------
foobar2k
The most interesting thing about this for me is the YC company vs. YC company
fighting.

~~~
akavi
YC invests widely enough that this isn't surprising in the slightest.

Hell, they've funded competing products at least a couple of times.

------
olgeni
"Your Recommended Daily Allowance of Drama."

------
kdsudac
Classless and pointless.

At this point I think rapgenius is just using this as a way to get
attention/traffic.

------
gavingmiller
> I’m convinced that the best path forward is for one of your developers to
> work closely with [redacted] to modernize and optimize your web stack. If
> you invest this time I think it’s very likely you’ll end up with an app that
> performs the way you want it to at a price within your budget

So... Heroku's CTO acknowledged there was a problem with their stack, and
offered to help RapGenius modernize and optimize it. And RapGenius quibbles on
the word "yesterday"?

While I do appreciate RapGenius raising the issue publicly to bring better
accountability to Heroku, their response to Heroku's response ought to have
been along the lines of: "Hurray! Everyone's boat is rising with the tide."
Not this.

~~~
tomlemon
> Their response to Heroku's response ought to have been along the lines of:
> "Hurray! Everyone's boat is rising with the tide." Not this.

If Heroku actually knew about the problem for a long time and yet didn't
officially respond or apologize until we published that post, doesn't that
make you take their apology less seriously?

~~~
gavingmiller
Not really, no. They offered to help you solve your problem. You received a
response from the CTO, one that is favorable to your position. How many other
CTOs would have done that? What more did you expect from them?

------
sherpa_derpa
The most amazing part of this situation, to me at least, is that a site
devoted to explaining rap lyrics is lucrative enough to pay $20k/month in
hosting. Imagine if they actually had a product!

~~~
adventured
I think it remains to be seen if it's lucrative enough to pay for itself as a
business.

The $15+ million they raised is currently paying for everything.

------
habosa
I don't think this was necessary. The first post by RapGenius was great and
was a really good way to point out an issue with a commonly used product.
However, now that Heroku has come to them hat in hand and promised some sort of
resolution, this seemed unnecessarily petty. Especially the one line about the
date of the report, that's just pedantic.

At this point in the issue RG and Heroku should be communicating privately,
not via blog post.

~~~
tomlemon
When Heroku's apology for misleading its customers is ITSELF misleading...
that's bad.

> Especially the one line about the date of the report, that's just pedantic.

What they claim to have known and when is very relevant to how much you can
trust them! Here's the timeline from my perspective:

1) In 2011, Tim Watson points out the problem:
<http://tiwatson.com/blog/2011-2-17-heroku-no-longer-using-a-global-request-queue>

2) Heroku responds but does nothing
(<https://groups.google.com/forum/?fromgroups=#!msg/heroku/8eOosLC5nrw/Xy2j7GapebIJ>)

3) On 2/8/2013, I send Heroku full details about the problem, including
simulations, etc.

4) On 2/11/2013 Heroku responds and says the best path forward is for me to
optimize Rap Genius and that Adam's done talking about it with me.

5) The "Heroku's Ugly Secret" article goes up on 2/13/2013.

6) Heroku releases a BIG apology on 2/14/2013 saying they only learned about
the problem one day ago.

Heroku knew about the problem at the LATEST on 2/11, and before we made a big
public stink, their response was to say "just make Rap Genius faster".

------
mikeocool
While RapGenius is right to point out that Heroku was incorrect in describing
how its load balancing works, it seems off to blame this exclusively for
their performance problems. Plenty of high traffic sites not on Heroku operate
just fine using nginx's upstream load balancing, which is simple round robin
load balancing that pays no attention to how many requests a backend is
currently handling.
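
To make the distinction concrete, here is a minimal Python sketch contrasting a
plain round-robin picker with a least-connections picker. The names
`RoundRobin` and `least_connections` are hypothetical and are not nginx
internals; nginx does offer a separate `least_conn` directive, but this code is
not its implementation.

```python
from typing import List


class RoundRobin:
    """Cycle through backends in order, ignoring how busy each one is."""

    def __init__(self, n_backends: int) -> None:
        self.n = n_backends
        self._next = 0

    def pick(self, in_flight: List[int]) -> int:
        # in_flight is accepted for a uniform interface but deliberately ignored.
        choice = self._next
        self._next = (self._next + 1) % self.n
        return choice


def least_connections(in_flight: List[int]) -> int:
    """Pick the backend with the fewest in-flight requests (lowest index on ties)."""
    return min(range(len(in_flight)), key=lambda i: in_flight[i])


if __name__ == "__main__":
    load = [3, 0, 1]  # requests currently being handled by each backend
    rr = RoundRobin(3)
    print([rr.pick(load) for _ in range(4)])  # [0, 1, 2, 0]
    print(least_connections(load))            # 1
```

Round robin keeps handing work to backend 0 even though it is the busiest;
that is harmless when every request is cheap and similar, and costly when some
requests are slow.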

What seems particularly odd is that rap genius appears to be using their
Heroku dynos to serve their css, javascript and certain images. So loading the
homepage appears to make about 12 requests that hit their dynos, rather than
1.

If you were looking for low hanging fruit to reduce load on the dynos for a
high traffic site, this seems like an obvious place to start, rather than
jumping straight to a 'smart' load balancing solution.

~~~
tomlemon
> It seems off to blame this exclusively for their performance problems

There are definitely ways to improve the performance, but you can't measure
them since Heroku gives you no way of determining how much time requests spend
in the in-Dyno queue. Though you can modify your app to get New Relic to display
this info: <http://rapgenius.com/1506509>
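
One common way to approximate queue time, sketched here as an assumption and
not as what Rap Genius actually did, is to have a proxy stamp each request with
its arrival time and subtract that from the moment the app starts handling it.
The header format assumed below (milliseconds since the epoch, optional `t=`
prefix) is hypothetical, as is the function name. A value measured this way
also includes any time spent in the router, and it is only as good as the
agreement between the proxy's and the dyno's clocks.

```python
import time
from typing import Optional


def queue_time_ms(header_value: Optional[str], now_ms: Optional[float] = None) -> Optional[float]:
    """Estimate how long a request waited before the app saw it.

    Assumes (hypothetically) the header holds the proxy's arrival time in
    milliseconds since the epoch, optionally prefixed with "t=".
    Returns None if the header is missing or unparsable.
    """
    if not header_value:
        return None
    raw = header_value.strip()
    if raw.startswith("t="):
        raw = raw[2:]
    try:
        start_ms = float(raw)
    except ValueError:
        return None
    if now_ms is None:
        now_ms = time.time() * 1000.0
    # Clamp at zero: clock skew between proxy and dyno can make this negative.
    return max(0.0, now_ms - start_ms)
```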

> What seems particularly odd is that rap genius appears to be using their
> Heroku dynos to serve their css, javascript and certain images. So loading
> the homepage appears to make about 12 requests that hit their dynos, rather
> than 1.

These assets are cached in Varnish. We serve them without hitting dynos at all.

~~~
mikeocool
> These assets are Cached in Varnish. We serve them without hitting dynos at
> all

Ahh, gotcha. I was under the mistaken impression that rapgenius.com pointed
directly at Heroku. Didn't realize there was a Varnish layer in between.

~~~
bad_user
With the Bamboo stack, there's a Varnish in front of your servers that you can
use to cache responses. So it's part of Heroku. Cedar no longer has Varnish.

------
zaidf
_Come on_ Rap Genius, please stop treating your blog posts like song lyrics.

~~~
mhp
But that's exactly what their product is trying to do! They are saying you can
use their tech to comment on and explain anything, not just song lyrics. I like
that they're eating their own dog food. (I can't say the same for their color
choices.)

~~~
zaidf
"Oren Teich – Bamboo Routing Performance"

What does that mean? Look, I love Rapgenius for the lyrics, and I am as much a
fan of eating your own dog food as anyone else. But hopefully they are going to
go full circle with the philosophy and also listen to user feedback. This user
is very confused.

~~~
tomlemon
It's definitely confusing; we've got a long way to go before Rap Genius is
the perfect platform for non-musical textual analysis! (But I still think this
is a good way to present our reply, since I want to comment on 2 of Heroku's
specific claims.)

------
6thSigma
So what is the consensus at this point? Stay away from Heroku if you are
running a Rails app?

~~~
teraflop
More like, stay away if you are running an app that has limited concurrency
per worker dyno (either because your framework is single-threaded, or because
your workers are CPU-bound) and you need predictable low latency.

Bear in mind that it's not like Heroku does anything particularly terrible in
that use case. As far as I can see, it works about as well as any standard
round-robin load balancer would. It's just that if those are your
requirements, you have a problem that Heroku can't magically solve for you.
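
This point can be illustrated with a small queueing simulation. It is a sketch
under simplifying assumptions that are not in the thread: Poisson arrivals,
exponentially distributed service times with mean 1.0, single-threaded dynos,
and uniformly random assignment standing in for a router that ignores dyno
load. It compares that against one shared queue that only hands work to a dyno
once it is idle, roughly what "Intelligent Routing" was described as doing.
All function names and parameters are hypothetical.

```python
import heapq
import random
from typing import List, Tuple


def make_workload(n_requests: int, arrival_rate: float, seed: int) -> List[Tuple[float, float]]:
    """(arrival_time, service_time) pairs: Poisson arrivals, exponential service (mean 1.0)."""
    rng = random.Random(seed)
    t = 0.0
    jobs = []
    for _ in range(n_requests):
        t += rng.expovariate(arrival_rate)
        jobs.append((t, rng.expovariate(1.0)))
    return jobs


def mean_wait_random(jobs: List[Tuple[float, float]], n_dynos: int, seed: int) -> float:
    """Each request goes to a uniformly random dyno and queues there (FIFO)."""
    rng = random.Random(seed)
    free_at = [0.0] * n_dynos  # time at which each dyno next becomes idle
    total_wait = 0.0
    for arrival, service in jobs:
        d = rng.randrange(n_dynos)
        start = max(arrival, free_at[d])
        total_wait += start - arrival
        free_at[d] = start + service
    return total_wait / len(jobs)


def mean_wait_global_queue(jobs: List[Tuple[float, float]], n_dynos: int) -> float:
    """Requests wait in one shared FIFO queue and go to whichever dyno frees up first."""
    free_at = [0.0] * n_dynos  # min-heap of times at which dynos become idle
    total_wait = 0.0
    for arrival, service in jobs:
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)
        total_wait += start - arrival
        heapq.heappush(free_at, start + service)
    return total_wait / len(jobs)


if __name__ == "__main__":
    jobs = make_workload(20_000, arrival_rate=7.0, seed=1)  # 10 dynos at ~70% load
    print("random routing:", mean_wait_random(jobs, 10, seed=2))
    print("global queue:  ", mean_wait_global_queue(jobs, 10))
```

For these parameters the textbook M/M/1 and M/M/c formulas predict mean waits
of roughly 2.3 and 0.07 service times respectively, so the same ten dynos
deliver very different latency depending only on how requests are assigned.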

------
danielpal
Not sure if this is called for. Heroku has a performance issue and their
documentation had a mistake. They accepted everything, apologized and are
working to resolve it. What am I missing here?

~~~
ChuckMcM
There are two ways to look at this, and depending on your point of view you
might be upset. The root of the dispute is how they scale and how that affects
latency.

According to these write-ups, Heroku scales performance by doing dynamic
scheduling on an array of identical servers (called 'dynos'). The
documentation talks about a feature named "Intelligent Routing" which only
sends work to a dyno which is available to do work.

That is a pretty ideal setup because in practice it means that you get linear
scaling by adding server instances, and since costs are based on total server
instances, capacity and cost both grow linearly.

However, there is a classic problem, first noted by Gene Amdahl, about the
cost of figuring out how best to parallelize a stream of requests versus the
rate at which you could satisfy those requests. It became known as "Amdahl's
Law"[1], and it limits the practical scalability of a lot of systems.
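
For reference, the standard form of Amdahl's Law: if a fraction P of the work
can be spread across N workers and the remaining (1-P) cannot, the best
achievable speedup is 1/((1-P) + P/N), which never exceeds 1/(1-P). Treating
the routing decision as the serial (1-P) part is this comment's analogy; the
helper below (a hypothetical name) just evaluates the formula.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Best-case speedup on n workers when a fraction p of the work is
    parallelizable and the remaining (1 - p) must run serially."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # Even with 95% of the work parallelizable, 1,000 workers give under 20x.
    print(round(amdahl_speedup(0.95, 1000), 2))  # 19.63
```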

So at some point Heroku got big enough that the cost of figuring out which
server instance wasn't busy was taking "too long" (that cost is the (1-P)
part of the Amdahl equation), so they decided to reduce the cost of making the
choice by replacing a "data driven choice" with a "statistics driven choice".
This too is a pretty well known way of doing things (Google and Blekko use it
to send search queries to a bunch of waiting backends). But unlike the
'idealized' case, which has every server instance handling at most one queue,
the value becomes a probability that the server instance is either not busy
or that its current transaction will finish quickly. This works well for
systems where the cost of every transaction is nearly identical (you just add
servers until the 90th percentile of requests hits your target), but poorly
for systems where each request has a variable amount of work it might do.
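
The uniform-versus-variable distinction shows up clearly in a toy simulation.
This is a sketch under assumptions not in the original: Poisson arrivals,
single-threaded dynos picked uniformly at random, 70% utilization, and a
made-up "mostly fast, occasionally slow" request mix that has the same mean
cost as the uniform one, so only the variability differs. Function names are
hypothetical.

```python
import random
from typing import Callable


def mean_wait_random_routing(
    service: Callable[[random.Random], float],
    n_dynos: int = 10,
    utilization: float = 0.7,
    n_requests: int = 20_000,
    seed: int = 0,
) -> float:
    """Mean queueing delay when each request goes to a uniformly random
    single-threaded dyno. `service` draws a service time (mean 1.0)."""
    rng = random.Random(seed)
    free_at = [0.0] * n_dynos  # time at which each dyno next becomes idle
    t = 0.0
    total_wait = 0.0
    for _ in range(n_requests):
        t += rng.expovariate(utilization * n_dynos)  # Poisson arrivals
        d = rng.randrange(n_dynos)                   # load-blind, statistical choice
        start = max(t, free_at[d])
        total_wait += start - t
        free_at[d] = start + service(rng)
    return total_wait / n_requests


def constant(_rng: random.Random) -> float:
    """Every request costs exactly one unit of work."""
    return 1.0


def mostly_fast(rng: random.Random) -> float:
    """Hypothetical mix: 90% take 0.1, 10% take 9.1; the mean is still 1.0."""
    return 0.1 if rng.random() < 0.9 else 9.1


if __name__ == "__main__":
    print("uniform cost: ", mean_wait_random_routing(constant))
    print("variable cost:", mean_wait_random_routing(mostly_fast))
```

For these parameters the Pollaczek-Khinchine formula for an M/G/1 queue
predicts mean waits of about 1.2 and 9.7 service times, so it is the slow tail,
not the average cost, that drives queueing under random assignment.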

I spent a number of years studying these sorts of systems while solving
scalability issues at Network Appliance. A file system built out of a
distributed set of nodes providing access to a single file system image needs
to know a priori the cost of each transaction flowing through it in order to
optimize scheduling. Similarly, RAID subsystems need to know which disk I/Os
are going to land in the cache or on the disk, and, if they land on the disk,
whether they will result in a seek. You end up with a directed graph of
weighted probabilities being shoved through a channel of fixed bandwidth. It's
all amazingly fun until someone says "I have to get data back from the disk in
no more than 10 ms every time" (databases would say stuff like that), and then,
as they say, you start trading dollars for milliseconds.

So Heroku changed their algorithm, didn't tell anyone, and the systemic
behavior changed in a very user-visible way for large users (in this case Rap
Genius). The folks at Rap Genius were pissed off that Heroku made this change
without informing them, and that they, Rap Genius, looked bad to their
customers because of it. Nobody in operations wants to say "Uh, I don't know
why your experience with our service is currently sucking."

I can see why Rap Genius is mad, and I can see that Heroku might not have
fully thought through the ramifications of their algorithm change.

[1] <http://en.wikipedia.org/wiki/Amdahl%27s_law>

~~~
chc
I think the explanation for the change is a bit simpler than that. Someone can
correct me if I've misunderstood, but AFAIK it isn't that "Intelligent
Routing" was too expensive, but that it depended on simplifying assumptions
that stopped being true.

Originally, Heroku was only for Ruby, and it depended on the assumption that a
server could only handle one request at a time. All the talk of Intelligent
Routing seems to date from this time, so it appears that Intelligent Routing
just meant they never routed more than one request at a time to a server. But
then Heroku wanted to add support for things like Java and Node.js, which can
support multiple requests per server. This meant the simplifying assumption of
"1 dyno = 1 request" baked into the old routing algorithm was no longer valid,
so they had to switch to something else or they'd be crippling pretty much
everything but Ruby.
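
The "1 dyno = 1 request" assumption can be sketched as a router that tracks
in-flight requests per dyno and refuses to exceed a fixed concurrency limit.
The class below is hypothetical and makes no claim about how Heroku's router
was actually built; it only shows why a limit hard-coded to 1 suits
single-threaded Ruby but would needlessly throttle runtimes that can serve
several requests at once.

```python
from typing import List, Optional


class CapacityRouter:
    """Route only to dynos with spare capacity; otherwise hold the request.

    max_concurrency=1 models single-threaded Ruby dynos; a larger value models
    a runtime (e.g. Node.js or Java) that serves several requests per dyno.
    """

    def __init__(self, n_dynos: int, max_concurrency: int) -> None:
        self.max_concurrency = max_concurrency
        self.in_flight: List[int] = [0] * n_dynos

    def dispatch(self) -> Optional[int]:
        """Return the chosen dyno index, or None if every dyno is full."""
        candidates = [i for i, n in enumerate(self.in_flight) if n < self.max_concurrency]
        if not candidates:
            return None  # caller keeps the request in a central queue
        # Among dynos with room, prefer the least loaded (lowest index on ties).
        choice = min(candidates, key=lambda i: self.in_flight[i])
        self.in_flight[choice] += 1
        return choice

    def complete(self, dyno: int) -> None:
        """Mark one request on `dyno` as finished."""
        if self.in_flight[dyno] == 0:
            raise ValueError("no request in flight on this dyno")
        self.in_flight[dyno] -= 1


if __name__ == "__main__":
    ruby = CapacityRouter(n_dynos=2, max_concurrency=1)
    print([ruby.dispatch() for _ in range(3)])  # [0, 1, None]
    node = CapacityRouter(n_dynos=2, max_concurrency=4)
    print([node.dispatch() for _ in range(3)])  # [0, 1, 0]
```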

~~~
ChuckMcM
That would make sense if Heroku didn't know which requests were Ruby requests
and which weren't. But it seems like they did, if only as a set of VIPs
(virtual IPs) landing on the router, tagged 'for ruby' or 'not ruby', that
could pick the appropriate routing algorithm.

Understand that keeping millisecond-accurate state on 10 machines is doable,
on 100 machines it's hard, and on a few thousand machines? It really starts to
break down. One way I've seen it done is that on ingress a request is wrapped
in a message for server X, which is taken out of the 'free' pool, and then
when server X returns the answer back through the router it gets added back
into the free pool. But the next-order effect is that different list
insertion/removal policies behave differently. If you push freed servers onto
the end of the list and pop them from the front (a round robin approach), you
get good distribution but sometimes send things 'far' away when they could be
served locally. If you push/pop things from the front, you get some really hot
servers and some really cold servers. Early on, Google played some games that
were designed to maximize the use of available network backbone bandwidth
(it's always oversubscribed from the server to the 'net'). Like any of the
more interesting problems, it starts off easy and then gets harder and harder.
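
The two free-pool policies can be compared with a deliberately simplified
sketch: requests arrive one at a time and each finishes before the next one
arrives, which is where the policies differ most. The function is hypothetical
and ignores locality and network topology, which are the other half of the
trade-off described above.

```python
from collections import deque
from typing import Deque, Dict


def route_sequential(n_servers: int, n_requests: int, lifo: bool) -> Dict[int, int]:
    """Send non-overlapping requests through a free pool; count hits per server.

    FIFO (lifo=False): freed servers rejoin at the back and requests take from
    the front, i.e. round robin. LIFO (lifo=True): freed servers rejoin at the
    front, so the most recently used server is chosen again.
    """
    free: Deque[int] = deque(range(n_servers))
    hits = {s: 0 for s in range(n_servers)}
    for _ in range(n_requests):
        server = free.popleft()        # take a server out of the free pool
        hits[server] += 1
        if lifo:
            free.appendleft(server)    # hot/cold split: same server reused
        else:
            free.append(server)        # even spread across all servers
    return hits


if __name__ == "__main__":
    print(route_sequential(4, 8, lifo=False))  # {0: 2, 1: 2, 2: 2, 3: 2}
    print(route_sequential(4, 8, lifo=True))   # {0: 8, 1: 0, 2: 0, 3: 0}
```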

------
MikeKusold
I'm confused by this blog format. This seems to be a posting of the apology,
not a response. Did I miss something?

~~~
_neil
Click on the yellow/green bits. Rapgenius is a lyrics site that allows people
to give context to lyrical meanings inline. See alexdevkar's comment above.

~~~
bitcartel
Aha... but it's not obvious. I thought the highlighted bits were links to
supporting articles.

~~~
alexdevkar
But now you know how Rap Genius works so their plan was successful.

~~~
dlss
I think UX success is not measured in how many people you've already inflicted
a bad design on, but instead by how many you avoid hurting in the first place.

Why not make the rap explanations _look_ different from the style lots of
sites use for regular links? (Perhaps a big square around the section or
something similar.)

~~~
_neil
There are green badges next to their added comments, and a big bold part at
the top that says "Click the green links below to see our responses", but I'm
not sure whether it was there before they saw these reactions or was added
after.

edit: I agree with you on the UX part. Just wanted to point out that there
were _some_ context clues.

