How Art.sy survived a NYTimes.com homepage launch with Heroku + MongoHQ

minimax · on Nov 27, 2012

No offense to the Art.sy team, but this is just Heroku marketing copy. It lacks the detail we typically see on these sorts of high scalability "how we did it" posts. Non-trivial applications scale non-trivially, and when someone comes along claiming they have solved the scalability problem with the push of a button I am instantly skeptical.

I would really like to see more details about their architecture, especially about the MongoHQ integration.

dblock · on Nov 27, 2012

I wrote a blog post on our overall tech stack here: http://artsy.github.com/blog/2012/10/10/artsy-technology-sta...

Our MongoHQ integration is very straightforward - MongoHQ provides us with a replica set and we configure it as any other MongoDB database.

Feel free to ask any specific questions.

georgeorwell · on Nov 27, 2012

I realize this is off-topic, but just as a bug report: I tried to search for "cezanne watercolor" and it didn't understand, even though you have watercolors by Cézanne. Also, it wasn't clear whether watercolors were best found under "paintings" or "works on paper". I think you need to make it easier for non-computer people to find specific types of image.

dblock · on Nov 27, 2012

It's an interesting observation. We built a full-text semantic search in 2011 by reverse-indexing art search results from popular search engines, and it could handle things like "Cezanne watercolors" with ease.

We showed it to people and we found that one could easily trick the system with things like "Worst American Art Ever". That generates results, but shows the limits of a general semantic search in a narrow context.

Happy to hear any suggestions about how to make something like this both useful and not too easy to make look really silly.

georgeorwell · on Nov 28, 2012

Well surely each of them is tagged with both 'cezanne' and 'watercolor', can't you just match search substrings to tags?

dblock · on Nov 29, 2012

It only works in that case. What if there's another work called "Cezanne's Favorite Watercolor"? And we're just at the beginning of search hell :)
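A minimal sketch of the tag-matching idea and of where it starts to break down. The catalog, field names, and the naive plural stripping are all assumptions made up for illustration, not Art.sy's actual schema or search code.

```python
import unicodedata

def fold(text):
    """Lowercase and strip accents so 'Cézanne' matches 'cezanne'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def tokens(text):
    """Split on non-alphanumerics; strip a trailing 's' (naive, not a real stemmer)."""
    words = "".join(c if c.isalnum() else " " for c in fold(text)).split()
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}

# Hypothetical catalog: titles and tags are invented for this example.
WORKS = [
    {"title": "Mont Sainte-Victoire", "tags": ["Cézanne", "watercolor"]},
    {"title": "Cezanne's Favorite Watercolor", "tags": ["Smith", "oil"]},
    {"title": "The Card Players", "tags": ["Cézanne", "oil"]},
]

def search_by_tags(query, works=WORKS):
    """Return titles whose tags contain every query token."""
    wanted = tokens(query)
    return [w["title"] for w in works if wanted <= {fold(t) for t in w["tags"]}]

def search_by_tags_or_title(query, works=WORKS):
    """Same, but title words also count, which lets the ambiguous work in."""
    wanted = tokens(query)
    return [
        w["title"]
        for w in works
        if wanted <= ({fold(t) for t in w["tags"]} | tokens(w["title"]))
    ]
```

With tags only, "cezanne watercolor" finds the one tagged work. As soon as titles are searched too, the oil painting by someone else that merely mentions Cézanne in its title shows up alongside it, and ranking the two becomes the real problem.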

continuations · on Nov 27, 2012

> we use SendGrid and MailChimp to send e-mail.

Why do you use both SendGrid and MailChimp for email?

Can you talk about the different use cases that require using the different email services?

redguava · on Nov 27, 2012

I don't know about how they do it, but I use both currently.

SendGrid for one-off emails (e.g. when a user signs up for an account, their welcome email).

MailChimp for email marketing (we send newsletters via them).

MailChimp has since released Mandrill, which caters to the one-off emails, but we were already with SendGrid by then.

dblock · on Nov 27, 2012

Same thing for us at Art.sy. We have humans doing targeted emails, with MailChimp.

SendGrid is an SMTP relay with high deliverability. That's all it does. You can find who received what on the website, too. With MailChimp you have to set up lists and all that unnecessary stuff.

There's one big drawback to using both: we currently have to sync our users to MailChimp and sync MailChimp unsubscribes back. Hence we're going to get rid of MC eventually, once we can build a good enough UI to manage mass emails.
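A rough sketch of what that two-way sync amounts to, written as plain set arithmetic. No MailChimp API calls are made here; the function name, argument shapes, and the rule that provider-side unsubscribes win are assumptions for illustration only.

```python
def plan_sync(app_users, list_members, list_unsubscribed):
    """Work out what to push to the mailing list and what to pull back.

    app_users: dict of email -> True if our database says they want mail
    list_members: set of emails currently active on the mailing list
    list_unsubscribed: set of emails that unsubscribed on the provider side
    """
    wants_mail = {email for email, subscribed in app_users.items() if subscribed}

    # Assumption: an unsubscribe recorded by the provider overrides our flag.
    to_mark_unsubscribed = wants_mail & list_unsubscribed
    wants_mail -= to_mark_unsubscribed

    return {
        "add": sorted(wants_mail - list_members),
        # Also removes list members we no longer know about (a design choice).
        "remove": sorted(list_members - wants_mail),
        "mark_unsubscribed": sorted(to_mark_unsubscribed),
    }
```

Even in this toy form there are three separate outcomes to keep consistent on every run, which is the overhead of running two systems side by side.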

continuations · on Nov 27, 2012

I see. What makes SendGrid better for one-off emails and MailChimp better for email marketing?

jcampbell1 · on Nov 26, 2012

I'd read this with a grain of salt. I remember looking at art.sy via the NYTimes link and thinking the site was terribly slow. I still find it moderately slow. I am not sure they really need Mixpanel, Google Analytics, and KISSmetrics on every click, and the API calls should take less than the 1-4 seconds I am seeing.

dblock · on Nov 26, 2012

dblock from Art.sy here. The analytics comment is totally fair - we spend a lot of time looking at all kinds of stats as we keep experimenting, and it needs to be trimmed down to 1 (or none :)).

Our average API response is 380ms. It's about 20x too long as far as I am concerned. To be fair, it's not Heroku's fault: there's a mix of Ruby code, database queries and, in some cases, some not-so-easy-to-optimize math. It's definitely a work in progress.

bigiain · on Nov 27, 2012

I suspect I'm well outside your target demographic, but I'm seeing ~22 whole seconds to onload (according to Firebug) here in Sydney, Australia (52 requests totalling 1.6 MB of data).

That certainly seems excessive for a non-logged-in user on your main landing page.

dblock · on Nov 27, 2012

China and Australia are the worst regions right now in terms of content delivery speed by a factor of 3.

I suspect that with a new AWS data center in Australia, CloudFront delivery will become better there soon, and we're looking at Akamai in the medium term.

mh- · on Nov 27, 2012

my first (cold cache) load of the homepage took about 7-8 seconds to be usable but the rest feels fluid.

don't be discouraged, after the initial load it performs better than most sites that look this nice.

dblock · on Nov 27, 2012

Thanks for the kind words. We load 1GB of CSS and JS. OK, maybe I am exaggerating :) It's a constant struggle between "it looks and feels amazing" and "it's super fast".

cjstewart88 · on Nov 26, 2012

I'm just curious how much traffic NYTimes brought you guys. I'm also on Heroku and my app (http://www.tubalr.com) "survived" several articles around the web (Mashable, TechCrunch, The Verge, The Next Web, Japan LifeHacker, and several others) within a very short timespan. My bill has never been over $25, which includes no extra workers, just extra DB storage and an upgraded DNS plugin.

Just curious, thanks!

abfabry · on Nov 27, 2012

Art.sy was lucky to be the top article on NYTimes (the one with the large image at the top of their homepage) from about 5pm to 11pm. We were the #3 most emailed, and the NYT article was #3 on Hacker News. We went to 1500 concurrents almost instantly, and maintained it for several hours. This is nothing for many large sites, but was the most we'd had. We had added some extra dynos in preparation, and our API response time actually went down as more obscure routes were hit and cached.

We prepared for the worst, though, and had built a failsafe way to progressively shut off more demanding features on the site (for instance: our related search results on artwork pages) in case we were getting overwhelmed. Much better to have a reduced feature set during launch than a broken site. Fortunately we didn't have to flip that switch!
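For anyone curious what such a switch might look like, here is a minimal sketch of a progressive failsafe. The feature names (other than the related-search example mentioned above), the ordering, and the class itself are hypothetical; this is not Art.sy's implementation.

```python
# Features listed from most to least expendable (hypothetical names).
SHED_ORDER = ["related_search", "recommendations", "full_text_search"]

class Failsafe:
    """Turns off the first `level` features in shed_order; level 0 means all on."""

    def __init__(self, shed_order=SHED_ORDER):
        self.shed_order = list(shed_order)
        self.level = 0

    def set_level(self, level):
        # Clamp so an out-of-range level cannot raise or wrap around.
        self.level = max(0, min(level, len(self.shed_order)))

    def enabled(self, feature):
        # Features not listed in shed_order are never shed.
        return feature not in self.shed_order[: self.level]


failsafe = Failsafe()
failsafe.set_level(1)  # under pressure: drop only the most expensive feature
if failsafe.enabled("related_search"):
    pass  # render related results here; skipped at level 1 and above
```

Raising the level one step at a time sheds load gradually, so the core pages keep rendering while the expensive extras disappear first.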

cjstewart88 · on Nov 27, 2012

That's insane, congrats on staying up, and thanks for the response :)

triplesec · on Nov 27, 2012

Ugh! This is just a puff piece from Heroku. Yes, I'm sure it's great tech, but we have no idea whether it is accurate or not. This piece has no analysis, no comparisons, and certainly no independence or objectivity.

_lex · on Nov 26, 2012

I'd love to deploy on Heroku, but it's so dramatically more expensive than AWS. For a startup, is it really worth that additional cost?

dblock · on Nov 26, 2012

I think it's totally worth it. Right now it still costs us less than a developer to run on Heroku and not that long ago we were 3 of those devs. I think having our front-ends on EC2 along with setting up memcached/mongo/nginx/etc ourselves is a much higher upfront cost.

Then there're traffic spikes. Pretty hard to deploy to bare metal or EC2 instantly without building your own Heroku-like system.

taligent · on Nov 27, 2012

Well this just shows how little you know about deployments.

Using something like Hudson/Jenkins can make deployments simple, borderline trivial, and EC2 supports Elastic Beanstalk, which makes deployments a breeze.

nthj · on Nov 27, 2012

> Well this just shows how little you know about deployments.

Exactly.

Which means I know that much more about shipping working products that make my clients money.

oellegaard · on Nov 26, 2012

I use Heroku for prototyping, which it is great for. I would not use it for larger projects in production (disclaimer: while also being a developer, I'm the company sysadmin). If we measure downtime, Heroku had much more downtime than the servers we have internally - and I find Heroku extremely slow compared to what I can deliver with basic hardware. Their database offerings are awesome, though.

jordanscales · on Nov 26, 2012

Heroku is really just AWS with convenience, isn't it? I'm not sure what else they do - but I guess that "convenience" is worth it for some.

zrail · on Nov 26, 2012

The convenience is totally worth it for prototypes and very early stage startups. The less work I do at the system level, the more time I can spend on my startup.

Once you move past that, because your app is well structured, you can easily deploy on your own hardware.

jasonmccay · on Nov 27, 2012

Really? Do you work on your own car? Is it because you think you aren't smart enough to do it or because you think that spending your time on other efforts is a more valuable configuration?

AWS is the electric company. Heroku tends to be the circuitry, outlets and light switches for your house. You may take it for granted because it looks so simple, but it's the simplicity that makes it great.

As PaaS providers begin to mature and turn their attention to more worthy challenges (geo-agnostic deployments and easier scaling, for example), their value is only going to increase.

jordanscales · on Nov 27, 2012

Right, I was not disrespecting Heroku in any way - I truly do not know what product they sell. That being said, great analogy.

gridaphobe · on Nov 26, 2012

Is it possible to launch using just a single dyno and heavy caching/optimization?

thinkbohemian · on Nov 27, 2012

Heroku employee here: don't abuse the free tier. It's there for design/prototyping/staging, and as long as people don't abuse it, it will stay that way.

oellegaard · on Nov 26, 2012

They will shut down your instance if you don't use it for an hour or two - and then it will take like 8-10 seconds to load it again. If you can live with that, I guess that would work, yes.

CoffeeOnWrite · on Nov 26, 2012

There are very, very easy ways around this.

citizens · on Nov 27, 2012

One being a service like http://uptimerobot.com/ that pings the site every 5 minutes
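A rough sketch of what such a pinger does, using only the Python standard library. The function names and the 300-second default are assumptions for illustration, and the Heroku employee's request elsewhere in this thread not to abuse the free tier applies to anything like this.

```python
import time
import urllib.request

def ping(url, timeout=10):
    """Return the HTTP status of a GET request, or None if the request fails.

    HTTP error responses (4xx/5xx) raise HTTPError, an OSError subclass,
    so they are reported as None as well.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except OSError:
        return None

def keep_alive(url, interval_seconds=300, fetch=ping, sleep=time.sleep):
    """Yield one ping result, wait interval_seconds, and repeat forever."""
    while True:
        yield fetch(url)
        sleep(interval_seconds)
```

Iterating with `for status in keep_alive("https://example.com"): print(status)` would ping every 5 minutes until interrupted. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a network or a real delay.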

dblock · on Nov 27, 2012

From what I see, when you have a single dyno, Heroku deploys you to a different environment sandbox called "development". Typically that has less uptime and takes lots of upgrades all the time with interesting consequences. I believe they promote to "production" applications much less frequently.

adrianpike · on Nov 26, 2012

Depending on what you need to do to show off the product, absolutely. A static landing page should withstand just about anything you throw at it.

kmfrk · on Nov 27, 2012

You pretty much just described the concept of PaaS. :)

thelarry · on Nov 27, 2012

I think this article can be a little misleading. Sure, with cloud tech you can easily spin up more instances. And yes, it is great to not have to worry about configuring a load balancer (I guess). But just because adding more instances sort of fixes a problem doesn't mean it is a good thing to do. Not "having to do calculations" is a very bad attitude. You should know where the bottlenecks are, and whether adding instances is actually necessary or just some scotch tape. Bad architecture and coding can bite you in ways that adding hardware cannot fix.
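One low-tech way to find out where the time goes before reaching for more instances is to time each stage of a request. This is a minimal sketch; the section names and the module-level `timings` table are assumptions for illustration, not anything from the article.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per named section of the request path.
timings = defaultdict(float)

@contextmanager
def timed(section, clock=time.perf_counter):
    """Add the wall-clock time spent inside the with-block to timings[section]."""
    start = clock()
    try:
        yield
    finally:
        timings[section] += clock() - start

def slowest(totals):
    """Return the section name with the largest accumulated time."""
    return max(totals, key=totals.get)
```

Wrapping the database call in `with timed("db"):` and the template rendering in `with timed("render"):` and then calling `slowest(timings)` shows which part dominates. If it is the database, extra web instances will not help much.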