How shaving 0.001s from a function saved $400/mo on Amazon EC2 (benmilleare.com)
60 points by taylorbuley on Nov 26, 2014 | 41 comments

The article is light on all important details.

The most obvious questions:

What language are they using to run ExtractBot?

How did they identify the expensive function?

Where was this expensive function (CSS bot is mentioned, is this their code or did they use a lib in which the fix would be of interest to others)?

Is the ExtractBot home page demo form purposefully broken at present due to HN load, or just broken for me?

Without knowing those things, some guesses: It's a scraper in Node or Ruby, it uses a load of existing libs that were not written with performance in mind. Those libs pull apart HTML and extract text values which are returned in a JSON doc (or something). They wondered why they were running so hot, managed (intentionally? fortuitously?) to spot some loop that needn't exist, or an expensive function that could be avoided by some cheap caching.

Just for fun I tried to estimate the performance characteristics of this service.

My initial assumption is that the 2.2M pages per ~18h are the main workload. This is also supported by the chart at the bottom: outside of the 18h timespan there is hardly any baseload. The blog additionally gives the following facts: 18 c1.medium instances and ~60% utilization after the optimization (taken from the chart).

Now this allows us to calculate the time per page. First, the time for the total workload per day is num_machines * cpu_time_per_machine = 18 machines * (18h * 0.6) = 194h of processing per day.

On page level this is then 194h / 2.2M = 317ms per page.

This feels really slow, and should even be multiplied by two to get the time per CPU core (the machines have two CPU cores)! I would guess that the underlying architecture is probably either node.js or Ruby. Based on these performance characteristics, the minimum cost for this kind of analysis per day is $25. For customers this means that on average the value per 100k analyzed pages should be at least $1.13 ($25 spread over 2.2M pages). I think this is only possible with very selective and targeted scraping, given that this only includes extracting raw text/fragments from the webpages and does not include further processing.
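The estimate above can be reproduced with a few lines of Python. This is only a sketch using the figures quoted in this comment (18 machines, ~18 active hours, ~60% utilization, 2.2M pages, 2 cores per machine); the function names are made up for illustration. Without rounding to 194h first, it gives roughly 318 ms of machine time per page.

```python
# Back-of-the-envelope estimate using only the figures quoted in the comment.
MACHINES = 18
ACTIVE_HOURS_PER_DAY = 18
UTILIZATION = 0.6            # read off the chart, so approximate
PAGES_PER_DAY = 2_200_000
CORES_PER_MACHINE = 2


def machine_hours_per_day():
    """Busy machine-hours per day across the whole fleet."""
    return MACHINES * ACTIVE_HOURS_PER_DAY * UTILIZATION


def ms_per_page(cores=1):
    """Milliseconds of machine time (cores=1) or core time (cores=2) per page."""
    busy_seconds = machine_hours_per_day() * 3600 * cores
    return busy_seconds / PAGES_PER_DAY * 1000


if __name__ == "__main__":
    print(f"{machine_hours_per_day():.1f} machine-hours per day")
    print(f"{ms_per_page():.0f} ms of machine time per page")
    print(f"{ms_per_page(CORES_PER_MACHINE):.0f} ms of core time per page")
```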

>What language are they using to run ExtractBot?

" 18 c1.medium (2 cores x 2.5 units) = 3,700 per second"

90GHz = 3,700 per second

24 million cpu cycles per one parsing routine

sounds like bloated js, python, or server side php
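Spelled out, the arithmetic looks like this. It is a rough sketch that follows this comment's own simplification of treating 2.5 compute units per core as 2.5 GHz, which is an approximation rather than a real clock speed; the names are illustrative only.

```python
# Cycles-per-page estimate, following the comment's simplification
# that one EC2 compute unit is roughly 1 GHz (an assumption).
INSTANCES = 18
CORES_PER_INSTANCE = 2
GHZ_PER_CORE = 2.5
PAGES_PER_SECOND = 3_700


def aggregate_hz():
    """Total clock rate of the fleet in Hz under the assumption above."""
    return INSTANCES * CORES_PER_INSTANCE * GHZ_PER_CORE * 1e9


def cycles_per_page():
    """Approximate CPU cycles spent on one parsed page."""
    return aggregate_hz() / PAGES_PER_SECOND


if __name__ == "__main__":
    print(f"{aggregate_hz() / 1e9:.0f} GHz aggregate")
    print(f"{cycles_per_page() / 1e6:.1f} million cycles per page")
```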

Oh hai, OP here.

Firstly - my first HN front-page, yay!

So, this was a little unexpected to say the least. As has already been pointed out, this post was written about 14 months ago now, and yes details are a little light. I'll ignore the usual HN hospitality and answer a couple of the more pressing questions:

1. this was a very early MVP at the time, it was not a production-ready piece of enterprise software, so no it was not built for out and out speed in the first instance.

2. yes there were probably much better options than c1.medium on AWS, but see (1).

3. yes it uses off-the-shelf libs, see (1).

4. no the website is not meant to work, I never got round to finishing it up, see (1).

5. sadly, I don't have the original git commit to reference (I was at my private repo limit and removed it) but yes, it was essentially a simple 1-line optimisation. IIRC it was something being evaluated in a loop that didn't need to be. Very mundane indeed. No tools used to identify it, I just knew it was being called a lot.
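For readers wondering what "something being evaluated in a loop that didn't need to be" can look like, here is a purely hypothetical Python sketch. It is not the original code (which is not shown anywhere in this thread), and the function and parameter names are invented for illustration.

```python
def count_allowed_slow(tags, allowed_tags):
    """Counts tags in the allow-list, rebuilding the lookup set every iteration."""
    count = 0
    for tag in tags:
        # Loop-invariant work: this set does not depend on `tag`.
        allowed = {t.lower() for t in allowed_tags}
        if tag.lower() in allowed:
            count += 1
    return count


def count_allowed_fast(tags, allowed_tags):
    """Same result, with the invariant set built once outside the loop."""
    allowed = {t.lower() for t in allowed_tags}
    return sum(1 for tag in tags if tag.lower() in allowed)


if __name__ == "__main__":
    tags = ["DIV", "span", "script", "P"]
    allow = ["div", "p", "Span"]
    print(count_allowed_slow(tags, allow), count_allowed_fast(tags, allow))
```

The change is a one-line move, but the per-call saving is multiplied by every iteration of every call.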

Ironically, the post was meant as a linkbait to drum up a bit of interest in the tool and see if it was worth developing (hence the tabloid title). It didn't get any traction and so the project kinda halted.

Question about #5. Does the statement "I was at my private repo limit and removed it" mean you deleted all the history AND code for the project?

That seems sad if true.

No I still have the code, just no commit history.

...and yes I clearly should have written this in machine code and deployed to a Gibson. Thanks for the tips.

"We have lots of EC2 instances and a slow function called in a tight inner loop, so we saved a bunch of cash by making it slightly quicker" - summed up blog post in one sentence.

Some details about what tools you used to find the culprit, or how you optimised it, might be more interesting.
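As an example of the kind of detail meant here, a hot function can often be found with the profiler in Python's standard library. The sketch below assumes a Python codebase, which the article does not confirm; `parse_page` and `process_batch` are hypothetical stand-ins for the real workload.

```python
import cProfile
import io
import pstats


def parse_page(html):
    # Hypothetical stand-in for the real parsing routine.
    return [token.strip() for token in html.split("<") if token]


def process_batch(pages):
    return [parse_page(page) for page in pages]


def profile_report(func, *args, top=5):
    """Runs func under cProfile and returns the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    pages = ["<p>hello</p><a href='#'>x</a>"] * 10_000
    print(profile_report(process_batch, pages))
```

The call counts and cumulative times in the report point at whichever function dominates the run.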

>How shaving 0.001s from a function saved $400/mo on Amazon EC2

Without reading the article, I figured there was only one likely explanation—the function gets called a lot.

And ruby, python, js could focus on optimizing some frequently used core class methods (strings, arrays) and it'll save the earth billions of dollars, will help the environment and some national economies.

They do, but of course, it also needs to be generic enough.

But you can optimize their usage in your code.

They probably could save that amount of money by maybe trying different machines/providers.

I wonder what's the annual worldwide cost of parsing XML for example.

Or you can use C++ and not worry that your abstractions will cost you measurably in execution time. Of course, there's other disadvantages, but ...

Yeah... No thanks

Unless it's for something very specific, of course.

I figured the only likely explanation is that EC2 is overpriced ;/

You'd probably shave much more than $400 by using dedicated servers (rented monthly) rather than EC2 instances.

Example pricing (from OVH):


(You'd probably need to do some homework to find the most cost-effective platform for your workload. For CPU-intensive workloads, I usually start with https://www.cpubenchmark.net/ given that OVH gives the exact reference of the CPU they're using.)

OVH isn't really production grade hosting, so it's not quite fair to use them for comparison. On the flipside, SoftLayer has been overpriced for a while now, and the IBM acquisition really didn't help any.

Thankfully, there are a plethora of dedicated server providers out there, which can easily beat Amazon's uptime record while still being considerably more cost effective without going as cheap as OVH.

Can you tell us about the other dedicated server providers you mention? I am on the lookout for one.

Best place to look is the webhostingtalk.com forums. There are a ton of providers who are active there, and people also often post reviews and outage reports.

Just to clarify - I wasn't paying full instance prices. I was using spot instances which worked out around $7/mo each at the time.

That depends on how stable their server requirements are. EC2 comes into its own when it comes to scalability.

People tout EC2 scalability all the time, but the reality is that it only allows you to scale down. You can easily find dedicated servers more powerful than the hardware that EC2 runs on for vertical scaling, with hardware specifications tailored to your workload to boot. And if your infrastructure needs horizontal scaling, you may need more flexibility than Amazon's tools offer for generic use cases, or you can easily find alternatives for less than the cost of the EC2 premium anyhow.

When I worked at Facebook we would measure performance gains (or losses) in terms of metric tons of computers saved (or lost).


There was also a rule about always showing your work. ;) The details of the actual optimization are less important than the means by which one measured where the CPU time was going, and how the expected result was confirmed.
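One way to "show your work" is to time the old and new versions side by side and check that they still return the same result. The following is a minimal sketch with Python's `timeit`; the two `normalize_*` functions are invented examples, not anything from the article.

```python
import timeit


def normalize_before(values):
    # max(values) is re-evaluated for every element.
    return [v / max(values) for v in values]


def normalize_after(values):
    peak = max(values)  # evaluated once
    return [v / peak for v in values]


def seconds_per_call(func, *args, repeat=3, number=20):
    """Best observed time per call, in seconds, over several repeats."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    data = list(range(1, 501))
    # Confirm the optimization did not change the output.
    assert normalize_before(data) == normalize_after(data)
    slow = seconds_per_call(normalize_before, data)
    fast = seconds_per_call(normalize_after, data)
    print(f"before: {slow * 1e6:.1f} us/call, after: {fast * 1e6:.1f} us/call")
```

Taking the minimum over repeats reduces noise from other processes; multiplying the per-call difference by the call volume gives the expected saving to compare against the fleet's CPU graphs.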

c1.medium used to be the CPU workhorse on EC2, but with the new generation, they're almost never the ideal choice. For example, for this use case, what about c3.larges? They're cheaper and have more CPU [1]. And if you don't need to have a lot of storage, the smaller SSDs can even improve your throughput.

[1] http://www.ec2instances.info/

There's a nice AWS re:Invent 2014 presentation on performance tuning EC2 that starts talking about instance selection and moves onto more detailed topics (EC2 features, kernel tuning, observability).


This post was written a couple months before the c3s were released, I think.

Well, that's a problem you can see a lot these days. People just throw software together without thinking about efficiency anymore, because "We'll scale it in the cloud". I'm wondering how much energy and money could be saved by just having efficiency in mind again.

Recently I found out that in Rust you can not only mark a function as `#[test]`, but also as `#[bench]` out of the box (everything is included in the base distribution). The benchmarks can be run automatically on every cargo build, so people actually see the result (in time per execution and, optionally, data throughput). I think that's a great way of improving visibility of performance issues.

Many people are happy to play for internet points, whether they're SO points, number of patches submitted, bugs fixed, or something else. I'd be really happy if performance was one of the games they can play.

That isn't really a problem. The problem is that people never pause and reflect on it once they reach some critical adoption. OP's software might never have been a productive business if he micro-optimized some C++ code in pursuit of the best possible performance. Make it work, then make it fast if you need to, but only if you need to.

As Dewie also pointed out, it is about keeping performance and efficiency in mind when programming and not about squeezing out the last drop of performance before going live. Not thinking about it from the start will cause a lot of trouble in the long run and might make the transition to something more efficient quite painful or "impossible".

> OP's software might never have been a productive business if he micro-optimized some C++ code in pursuit of the best possible performance.

It's not an either-or. Maybe the best long-term approach would be to write it in a high-level style favouring readability, productivity and adaptiveness, while also being conscious not to use features that could lead to performance problems. Maybe you'd write it in a compiled (native/efficient VM) language[1] with garbage collection and be mindful of memory layout, instead of writing it in Ruby[2] and then having to rewrite some parts in another, generally faster language later. Then you also wouldn't have that "Ugh, now I need to bust out C++/C/Go/Whatever and rewrite that stuff... oh never mind it's probably fast enough" moment when you start to run into performance problems.

[1] More precisely an implementation of a language which is compiled...

[2] Though maybe Ruby is compiled in the canonical implementation, for all I know.

10/10, would agree again.

What Herb Sutter calls premature pessimization.

I'd like to know how you made the function 0.001s faster. Not that I think it's going to be a huge engineering feat, actually I'm assuming it didn't take much - but I could be wrong. Would be interesting if 1 line of code was costing you $400/mo.

Well, on one of the large UK job sites I found that a single misplaced rel=canonical tag cost half a million in less than a week :-)

Can you share with us the tools you used to find this function?

What an odd article - He basically said, "We have a function we call a lot, and so we reduced how often it gets called, and therefore saved money on EC2"

No Details, No how they tracked down the issue, No mention of how they captured the timing. It kind of read like the status reports I send each week to my boss. But lighter.

Interesting. Care to share with us how you made the function faster and what programming language you guys are using?

Moving off Amazon will save you even more

What software did you use to generate that graph?

I'm currently using munin, but would like something more cluster-oriented (as in, seeing the big picture, not individual servers).

That's from the AWS EC2 console

Thank you.
