
> Web applications are not CPU-bound, they're IO-bound.

Is that a fact? I'm not sure anyone has ever shown numbers to back it up. It would be an interesting piece of work to do. I can imagine it's true, but I'm not certain.

I think Rap Genius did a blog post ages ago about being hosted on Heroku, and they said that something like 70% of time was spent inside the Ruby interpreter, not waiting on IO. So modest improvements in the speed of Ruby will have an impact on their bottom line.




>Is that a fact? I'm not sure anyone has ever shown numbers to back it up.

What? I've seen tons of profiling posts and articles showing exactly that, and it matches what I've always seen in my own servers' stats. I'm obviously talking about the majority of web apps, the ones that use DBs and do some kind of CRUD work -- not video processing services and the like.


This is the blog post I was referring to http://genius.com/James-somers-herokus-ugly-secret-annotated and the key graph http://images.rapgenius.com/75aa2143d3e9bf6b769fc9066f6c40c8....

Is that atypical? Or am I misinterpreting the graph? (Genuine questions, I've never done any web app operations.)


That graph still shows time to process/respond, not CPU utilization.
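
To make the distinction concrete, here is a minimal sketch in Python (not Ruby, and not anything from the article) that times a handler twice: once by wall clock and once by CPU time charged to the process. The two handlers are hypothetical stand-ins, with `time.sleep` playing the role of a slow database call.

```python
import time

def measure(handler):
    """Return (wall_seconds, cpu_seconds) for one call to handler()."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    handler()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def io_like_handler():
    # Hypothetical stand-in for waiting on a database or remote service.
    time.sleep(0.2)

def cpu_like_handler():
    # Hypothetical stand-in for rendering work done in the interpreter.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

if __name__ == "__main__":
    for handler in (io_like_handler, cpu_like_handler):
        wall, cpu = measure(handler)
        print(f"{handler.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

Both handlers take a while to respond, but only the second one spends most of that time on the CPU. A response-time graph alone cannot tell the two apart.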

From the article, the slowness is when "a dyno is tied up due to a long-running request" -- so they are probably still talking about slow IO operations, and servers running on the dyno that are not async and so can't free it up in between.

From Heroku's own optimization page: "Most applications are not CPU-bound on the web server".

https://devcenter.heroku.com/articles/optimizing-dyno-usage

(And Heroku might be an outlier in having high real-cpu use too, as they tend to over-provision).


But the graph shows time waiting on web requests (dark green), time waiting on cache (teal), time waiting on the database (brown) and time queued before the interpreter gets the request (light green) separately from time spent running the Ruby interpreter. It's a cumulative graph.

So if the IO isn't web requests, cache, database or waiting in a queue, what can the interpreter be waiting on? Local files?


It varies a lot depending on the type of app. Backend dashboard full of reports? Odds are it will be waiting on complex JOINs to complete in the database. Lighter-weight apps with mostly simple keyed queries, plus things like complex views, can easily be CPU-bound.

In general it's a useful assumption that you're likely to be IO-bound, because most of the time when something in a new webapp is too slow, it will be the database (it may be a small fraction of overall requests, but they're the ones that hurt). That holds until you have optimized or cached away your database requests and are trying to scale further.

Basically, once you've solved the "naive"/early database bottleneck, CPU and memory very quickly become more likely to be a cost factor.


I take a bit of a No True Scotsman position on this. In general, web applications don't need to be CPU-bound: the amount of computation they need to do has a small cost compared to that of the IO they need to do. But not all web applications only do the computation they need to do - some do a lot of unnecessary computation.

For example, there's a painfully slow web application at my company that puts data on a page by doing a straightforward select from a database, and then running lots of nested loops over the data to render the page. It's definitely CPU-bound. If it was written to use a more powerful query, or to use more sophisticated algorithms, it would be a lot faster, and would probably be IO-bound. But it isn't.
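
A rough sketch of the kind of thing I mean, in Python rather than whatever that app is written in. The `customers` and `orders` shapes are made up for illustration, and I'm assuming customer ids are unique. The first function does the nested-loop matching in application code; the second builds an index once and does one lookup per order, which is roughly what a JOIN would have done in the database.

```python
def render_rows_nested(customers, orders):
    """O(len(customers) * len(orders)): scan every customer for every order."""
    rows = []
    for order in orders:
        for customer in customers:
            if customer["id"] == order["customer_id"]:
                rows.append((customer["name"], order["total"]))
    return rows

def render_rows_indexed(customers, orders):
    """O(len(customers) + len(orders)): build a dict once, then look up."""
    by_id = {c["id"]: c for c in customers}  # assumes ids are unique
    rows = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is not None:
            rows.append((customer["name"], order["total"]))
    return rows

if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
    orders = [
        {"customer_id": 2, "total": 30},
        {"customer_id": 1, "total": 12},
        {"customer_id": 9, "total": 5},  # no matching customer, skipped
    ]
    print(render_rows_nested(customers, orders))
    print(render_rows_indexed(customers, orders))
```

Both return the same rows, but with a few thousand rows on each side the nested version does millions of comparisons per page, and that is where the CPU time goes.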


I shouldn't have spoken in absolute terms, "primarily IO-bound" would have been a better choice of words.

That said, in my experience this has been the case when it comes to handling HTTP requests.



