
How Art.sy survived a NYTimes.com homepage launch with Heroku + MongoHQ - carterac
http://success.heroku.com/artsy
======
minimax
No offense to the Art.sy team, but this is just Heroku marketing copy. It
lacks the detail we typically see on these sorts of high scalability "how we
did it" posts. Non-trivial applications scale non-trivially, and when someone
comes along claiming they have solved the scalability problem with the push of
a button, I am instantly skeptical.

I would really like to see more details about their architecture, especially
about the MongoHQ integration.

~~~
dblock
I wrote a blog post on our overall tech stack here:
<http://artsy.github.com/blog/2012/10/10/artsy-technology-stack/>

Our MongoHQ integration is very straightforward - MongoHQ provides us with a
replica set and we configure it as any other MongoDB database.

Feel free to ask any specific questions.

~~~
georgeorwell
I realize this is off-topic, but just as a bug report: I tried to search for
"cezanne watercolor" and it didn't understand the query, even though you have
watercolors by Cézanne. Also, it wasn't clear whether watercolors were best
found under "paintings" or "works on paper". I think you need to make it
easier for non-computer people to find specific types of image.

~~~
dblock
It's an interesting observation. We built a full-text semantic search in 2011
by reverse-indexing art search results from popular search engines and it
could do things like "Cezanne watercolors" with ease.

We showed it to people and we found that one could easily trick the system
with things like "Worst American Art Ever". That generates results, but shows
the limits of a general semantic search in a narrow context.

Happy to hear any suggestions about how to make something like this both
useful and not too easy to make look really silly.

~~~
georgeorwell
Well, surely each of them is tagged with both 'cezanne' and 'watercolor'.
Can't you just match search substrings to tags?
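
Something like this rough Python sketch is what I have in mind. It assumes
each work carries a set of tags; the catalogue, the `fold`/`stem`/`search`
helpers and the crude plural handling are all made up for illustration and
are not how Art.sy actually stores or searches anything. The accent folding
is what lets "cezanne" find "Cézanne".

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip accents so 'Cézanne' and 'cezanne' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def stem(word: str) -> str:
    """Very crude plural handling: 'watercolors' -> 'watercolor'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def matches(query: str, tags: set[str]) -> bool:
    """True if every word of the query corresponds to one of the work's tags."""
    folded_tags = {stem(fold(tag)) for tag in tags}
    return all(stem(fold(word)) in folded_tags for word in query.split())


# Hypothetical catalogue; titles and tags are invented for this example.
WORKS = {
    "Still Life with Apples": {"Cézanne", "watercolor", "works on paper"},
    "Mont Sainte-Victoire": {"Cézanne", "painting", "oil"},
    "Water Lilies": {"Monet", "painting", "oil"},
}


def search(query: str) -> list[str]:
    """Return the titles of all works whose tags cover every query word."""
    return sorted(title for title, tags in WORKS.items() if matches(query, tags))
```

With that, `search("cezanne watercolor")` and `search("Cézanne watercolors")`
both come back with the one watercolor, regardless of whether it is filed
under "paintings" or "works on paper".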

~~~
dblock
It only works in that case. What if there's another work that is called
"Cezanne's Favorite Watercolor"? And that's just the beginning of search
hell :)

------
jcampbell1
I'd take this with a grain of salt. I remember looking at art.sy via the
NYTimes link and thinking the site was terribly slow. I still find it
moderately slow. I am not sure they really need Mixpanel, Google Analytics,
and KISSmetrics on every click, and the API calls should take less than the
1-4 seconds I am seeing.

~~~
dblock
dblock from Art.sy here. The analytics comment is totally fair - we spend a
lot of time looking at all kinds of stats as we keep experimenting and it
needs to be trimmed down to 1 (or none :)).

Our average API response is 380ms. It's about 20x too long as far as I am
concerned. To be fair, it's not Heroku's fault: there's a mix of Ruby code,
database queries and, in some cases, some not-so-easy-to-optimize math. It's
definitely a work in progress.

~~~
bigiain
I suspect I'm well outside your target demographic, but I'm seeing ~22 whole
seconds to onload (according to Firebug) here in Sydney, Australia (52
requests comprising 1.6 MB of data).

That certainly seems excessive for a non-logged-in user on your main landing
page.

~~~
dblock
China and Australia are the worst regions right now in terms of content
delivery speed by a factor of 3.

I suspect that with a new AWS data center in Australia, CloudFront delivery
will become better there soon, and we're looking at Akamai medium term.

------
cjstewart88
I'm just curious: how much traffic did the NYTimes bring you guys? I'm also on
Heroku and my app (<http://www.tubalr.com>) "survived" several articles around
the web (Mashable, TechCrunch, The Verge, The Next Web, Japan LifeHacker, and
several others) within a very short timespan. My bill has never been over $25,
which includes no extra workers, just extra DB storage and an upgraded DNS
plugin.

Just curious, thanks!

~~~
abfabry
Art.sy was lucky to be the top article on NYTimes (the one with the large
image at the top of their homepage) from about 5pm to 11pm. It was also the #3
most emailed, and the NYT article was #3 on Hacker News. We went to 1500
concurrents almost instantly, and maintained it for several hours. This is
nothing for many large sites, but was the most we'd had. We had added some
extra dynos in preparation, and our API response time actually went down as
more obscure routes were hit and cached.

We prepared for the worst, though, and had built a failsafe way to
progressively shut off more demanding features on the site (for instance: our
related search results on artwork pages) in case we were getting overwhelmed.
Much better to have a reduced feature set during launch than a broken site.
Fortunately we didn't have to flip that switch!
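
A minimal sketch of that kind of switch, in Python. This is not the Art.sy
implementation: the `Failsafe` class, the shed order, and every feature name
other than related search results are assumptions for illustration. The idea
is just that each escalation turns off the next most expensive optional
feature while the core page keeps rendering.

```python
from dataclasses import dataclass, field

# Hypothetical feature names, ordered from first-to-shed to last-to-shed.
SHED_ORDER = ["related_search_results", "recommendations", "activity_feed"]


@dataclass
class Failsafe:
    """Turns off optional features one at a time as the shed level rises."""

    level: int = 0  # 0 = everything on; len(shed_order) = all optional features off
    shed_order: list[str] = field(default_factory=lambda: list(SHED_ORDER))

    def escalate(self) -> None:
        """Shed one more feature, up to the end of the list."""
        self.level = min(self.level + 1, len(self.shed_order))

    def relax(self) -> None:
        """Restore the most recently shed feature."""
        self.level = max(self.level - 1, 0)

    def enabled(self, feature: str) -> bool:
        # Features not in the shed list are core and always stay on.
        if feature not in self.shed_order:
            return True
        return self.shed_order.index(feature) >= self.level


failsafe = Failsafe()


def render_artwork_page(artwork_id: str) -> dict:
    """Build the page, skipping the expensive section when it has been shed."""
    page = {"artwork": artwork_id}
    if failsafe.enabled("related_search_results"):
        # Stand-in for the expensive related-search call.
        page["related"] = f"related results for {artwork_id}"
    return page
```

One `failsafe.escalate()` drops related results from artwork pages; a second
drops the next feature in the list, and `relax()` brings them back in reverse
order.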

~~~
cjstewart88
That's insane, congrats on staying up, and thanks for the response :)

------
triplesec
Ugh! This is just a puff piece from Heroku. Yes, I'm sure it's great tech, but
we have no idea whether it is accurate or not. This piece has no analysis, no
comparisons, and certainly no independence or objectivity.

------
_lex
I'd love to deploy on Heroku, but it's so dramatically more expensive than
AWS. For a startup, is it really worth that additional cost?

~~~
prezjordan
Heroku is really just AWS with convenience, isn't it? I'm not sure what else
they do - but I guess that "convenience" is worth it for some.

~~~
gridaphobe
Is it possible to launch using just a single dyno and heavy
caching/optimization?

~~~
oellegaard
They will shut down your instance if you don't use it for an hour or two,
and then it will take like 8-10 seconds to load it again. If you can live with
that, I guess that would work, yes.

~~~
gkop
There are very, very easy ways around this.

~~~
citizens
One being a service like <http://uptimerobot.com/> that pings the site every 5
minutes.
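
Or you can roll your own. A minimal Python sketch, assuming a plain HTTP GET
is enough to count as traffic; `keep_warm` and `fetch_status` are names made
up here, and the 300-second default just mirrors the 5 minutes above. It has
to run somewhere other than the dyno it is keeping awake.

```python
import time
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Issue a GET and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def keep_warm(
    url: str,
    interval_seconds: float = 300.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Optional[int]]:
    """Ping `url` forever, yielding each status (None when the request fails)."""
    while True:
        try:
            status: Optional[int] = fetch(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            status = None
        yield status
        sleep(interval_seconds)
```

Looping over `keep_warm(...)` with your app's URL and printing each status is
all the "service" there is; `fetch` and `sleep` are parameters only so the
loop can be exercised without touching the network.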

~~~
dblock
From what I see, when you have a single dyno, Heroku deploys you to a
different environment sandbox called "development". Typically that has less
uptime and takes lots of upgrades all the time, with interesting consequences.
I believe they promote upgrades to "production" applications much less
frequently.

------
thelarry
I think this article can be a little misleading. Sure, with cloud tech you can
easily spin up more instances. And yes, it is great to not have to worry
about configuring a load balancer (I guess). But just because adding more
instances sort of fixes a problem doesn't mean it is a good thing to do. Not
"having to do calculations" is a very bad attitude. You should know where the
bottlenecks are, and whether adding instances is actually necessary or just
some scotch tape. Bad architecture and coding can bite you in ways that adding
hardware cannot fix.

