9 million hits/day with 120 megs RAM (tumbledry.org)
303 points by verisimilitude on Aug 31, 2011 | 126 comments

Blogs really don't need PHP, Rails, or any backend at all. It's static content.

Here's how I think blogging should work:

1. Visit a web app where you create your blog post, add pictures, use your rich text editor, that sort of thing.

2. Click the "Publish" button, which generates static HTML and runs any other processing like tag generation or pngcrush.

3. Your static HTML gets pushed out to some server that does the hosting for you. It could even be one of those really cheap shared hosting providers.
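
The three steps above can be sketched roughly as follows. This is a minimal illustration, not any particular blogging tool: the `publish` and `slugify` names, the post dictionary shape, and the template are all assumptions made for the example, and the "push to a host" step is left out.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical page template; a real tool would use a template engine.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body>
</html>
"""


def slugify(title):
    """Turn a post title into a filesystem-friendly name."""
    cleaned = "".join(c.lower() if c.isalnum() else "-" for c in title)
    return "-".join(part for part in cleaned.split("-") if part)


def publish(posts, out_dir):
    """Render every post to a static .html file and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts:
        # Blank lines separate paragraphs; everything is HTML-escaped.
        paragraphs = "\n".join(
            "<p>{}</p>".format(html.escape(p)) for p in post["body"].split("\n\n")
        )
        page = PAGE_TEMPLATE.format(title=html.escape(post["title"]), body=paragraphs)
        path = out_dir / (slugify(post["title"]) + ".html")
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written


demo = [{"title": "Hello, World!", "body": "First paragraph.\n\nSecond one."}]
with tempfile.TemporaryDirectory() as tmp:
    for written_path in publish(demo, tmp):
        print(written_path.name)  # hello-world.html
```

Whatever lands in the output directory can then be copied to any host that serves plain files.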

If you really want comments, let someone like Disqus or Intense Debate handle it. Pretty much any dynamic feature you need can be outsourced.

That's pretty much how MovableType used to work.

Fun fact: it was terrible. Most bloggers I know hated it.

Wordpress is a bucket of poop too, but it has the quality that when you press 'publish', the site is updated in seconds, not minutes.

(I believe MT has changed since then).

It's how Greymatter and a few other systems used to work as well. The main problem was determining which pages should be updated, which usually resulted in the entire blog being rebuilt. This took several minutes if you had a few years of archives, and if anything went wrong part-way through the process you had to start it all again.

I've ruminated in the past about a better blogging system that updates its on-disk caches on POST (when content changes) rather than on GET (when it is read). I think everybody who watches Wordpress trip over and burst into flames when you so much as breathe on it has.

The main problem to solve is inter-page dependency. The rest is not that hard /famouswords.
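
One way to picture the inter-page dependency problem is an index from each post to the pages built from it, so that an edit rebuilds only those pages. The sketch below is an assumption about how such an index might look (the `DependencyIndex` class and the page paths are made up); it does not cover new posts, which also have to be matched to the index and archive pages they should appear on.

```python
from collections import defaultdict


class DependencyIndex:
    """Tracks which output pages were built from which posts."""

    def __init__(self):
        self._pages_by_post = defaultdict(set)

    def record(self, page, post_ids):
        """Remember that `page` was rendered from the given posts."""
        for post_id in post_ids:
            self._pages_by_post[post_id].add(page)

    def pages_to_rebuild(self, changed_post_ids):
        """Return every page that depends on at least one changed post."""
        dirty = set()
        for post_id in changed_post_ids:
            dirty |= self._pages_by_post.get(post_id, set())
        return dirty


index = DependencyIndex()
index.record("posts/1.html", [1])
index.record("posts/2.html", [2])
index.record("index.html", [1, 2])
index.record("tags/php.html", [2])
index.record("archive/2011-08.html", [1, 2])

# Editing post 2 touches four pages, not the whole archive.
print(sorted(index.pages_to_rebuild([2])))
# ['archive/2011-08.html', 'index.html', 'posts/2.html', 'tags/php.html']
```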

You can get too carried away with scalability though. I run half a dozen WordPress blogs, plus some static sites and email for multiple domains (with AV and spam filtering) within 500MB of RAM. I don't need to implement any caching because I already have sufficient resources.

Get this: http://wordpress.org/extend/plugins/w3-total-cache/

It's called a caching plugin, but it also compresses and minifies stuff. Any site can benefit from loading faster. You can turn it on and let it go. It has sane defaults.

Thanks for the info. After hearing people praise WordPress as the holy grail of blogging, I found it to be very, very slow. Hopefully this plugin will help.

WordPress is usually praised for usability, not performance. To make WordPress faster and more efficient, you need a caching plugin like W3 Total Cache. It caches your pages in static files, caches queries and results in memcache, and rewrites css, images, and js to CDN paths if you have a provider. In front of this, I would have Varnish cache as much as possible in memory, for even faster access. W3 Total Cache can even invalidate your Varnish cache when pages are updated.

Cannot give Varnish enough praise. It can turbocharge even the most expensive / slowest-loading pages and uses minimal resources.

I found that a major contributor to page generation times on Wordpress is the time taken to load and parse all the PHP files. I installed a PHP opcode cache (APC, as I recall), and it made a huge difference, because the parsed representation of all the Wordpress code is cached in memory (or on disk). I'd do that before installing any Wordpress caching plugins, though W3 Total Cache has other nice features, like minification. Plus it has three different levels of caching: you can cache entire generated pages, and/or objects, and/or DB results.

> I don't need to implement any caching because I already have sufficient resources.

I want you to print this out and stick it somewhere out of sight. When one day you are linked to on Reddit, Slashdot, a major news site or the like, pull it out for a hearty laugh.

There's a risk vs effort balance involved in these sorts of decisions. The risk of any of my sites getting Slashdotted is sufficiently low that it's not worth spending time preparing for such an eventuality, and I'm prepared to accept the consequences if it does happen.

Installing WP-super-cache or W3 Total Cache takes about 5 minutes and will do the most effective 80%.

Also, I think W3 Total Cache is an order of magnitude easier to install than WP Super Cache.

> I don't need to implement any caching because I already have sufficient resources.

Define "sufficient resources". It might be enough for your normal traffic, but without caching those blogs will crash from the /. effect (unexpected traffic spike).

Sufficient = enough for normal use. My point was that people tend to get worked up and spend time on caching strategies when there's no need for the vast majority of sites.

Even with caching, I suspect Apache would fall over if I got slashdotted.

I agree there is a balance between sane and insane efforts at caching, but I disagree that no caching is a viable choice. Apache and Wordpress with no caching on a smallish VPS will fall over really quickly. This is unacceptable when you can get huge benefits very easily by installing a widely available plugin like WP Super Cache or W3 Total Cache. It's almost no work to get that set up, and it will help you survive a mini-slashdotting. That's going to let you serve an order of magnitude more traffic than usual.

An easy solution like those plugins is such a big win for so little work, I just can't agree with "it's good enough for normal use". The last thing you want is for your blog to fail when you are getting a traffic spike.

I'm not sure I understand that.

Around 2002-2003, I built a gigantic, sub-optimal, regex-heavy Perl script for automatic updating of webcomic sites (it's still in use on several sites I know of, ugh).

Each day of the archive had its own separate HTML page generated from a template.

I never got around to the complexity of supporting selective updates, because brute-force regenerating the entire archive every time the script ran, even to several thousand comics, took just seconds on anything but Windows (whose file I/O was just too bloody slow, but even then the run only took a minute or so).

This was on a shared hosting platform, where the provider probably oversold capacity, and without access to the command line. I suspect part of the delay was waiting for the CGI script to send data to the browser.

I've been using MT since 2003 and never thought it was terrible. Full-site publishing was always slow, but publishing a single entry only took a few seconds. The new Melody project forked MT and updated the UI; it looks great now.

I used MovableType when rebuilding steves-digicams.com (not by choice), and it really slowed to a crawl when we did a full publish (30,000+ articles). Luckily we only had to do that a few times when setting up the site. I don't see the advantage of static files being worth the hassle of publishing when you can achieve the same or better results using a caching proxy like Varnish. We used Varnish in front of Apache on ultimatecoupons.com (all dynamic, built in the Symfony PHP framework), and once cached, pages load in 15ms (first byte in browser).

http://daringfireball.net/ uses Movable Type and gets a lot of visits, and I'm sure it can handle a lot more. Yes, static content is the key.

"the site is updated in seconds, not minutes."

That is an implementation issue. That problem is in no way inherent to static content.

The problem with saying "Wordpress is poop" - as a developer, that problem report has no info I can actually use to fix anything.

This is what you get with WordPress and W3 Total Cache (the caching plugin I use). It generates HTML files, then uses some fancy .htaccess magic to send requests to them. It can even push to a CDN. My blog survived being whacked by 3000 hits from HN over the course of 2 hours without blowing up.

Pre-gzipped pages are nice too. :)
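
Pre-gzipping means compressing each generated page once at publish time instead of on every request. A minimal sketch, assuming the compressed bytes are stored next to the original so the web server can hand them to clients that accept gzip (the `pregzip` name is made up for the example):

```python
import gzip


def pregzip(html_bytes):
    """Compress a rendered page once, at publish time."""
    return gzip.compress(html_bytes, compresslevel=9)


page = b"<html><body>" + b"<p>hello</p>" * 200 + b"</body></html>"
compressed = pregzip(page)
print(len(compressed) < len(page))  # True: repetitive HTML compresses well
```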

3000 hits over 2 hours is < 0.5 hits/second and nothing to write home about, sorry.

The raw number wasn't the point. WP is notorious for collapsing under a gentle load without caching.

The commenter also didn't take into account that the req/sec was probably not evenly distributed across 2 hours.

I can vouch for that. I've had vanilla WP die on me from the gentle load that comes with a link going on HN. I assumed it was performant out of the box ... learned my lesson!

W3 Total Cache really is a first-rate plugin. We've had great success throwing Varnish https://www.varnish-cache.org/ in front of WordPress.

W3TC integrates with varnish and supports a number of backend web servers.

It asks for permission changes because caching plugins need to create files on the server, and most servers are either poorly configured or make managing users and groups difficult. Temporarily asking users to loosen permissions so the plugin can create those files itself improves the installation experience: it lowers the technical skill needed and the number of steps required.

I don't recommend suPHP in practice, as it's quite slow. Similar if not better security can be had by using a reverse proxy and then running PHP in FastCGI mode, for example with a backend web server.

My concern with the WordPress caching plugins I've looked at is that they require the webserver to have write access to the webserver directory. This is particularly dangerous on shared webhosts, as a malicious customer on the same server could write PHP files in your cache directory.

I'm pretty sure that W3 Total Cache writes to /wp-content/uploads which you'd need to be writeable anyway.

That's what suPHP is for.

Blogger used to work this way. You used to be able to download an archive of your site and host it on your own server! The thing is, unless your site is very simple and small, regenerating static HTML becomes more of a headache than maintaining a system that serves the content out of a database.

That is how viaweb worked. All product pages were static, and once you finished editing, you would 'publish' and it would create the static files for your online store.

I used Jekyll to generate my static content a few months ago. Just type a text file, and then push it out to the server. I really thought that was cool.

"If you really want comments, let someone like Disqus or Intense Debate handle it. Pretty much any dynamic feature you need can be outsourced."

If you are fine with your comments residing somewhere else, yes. I usually prefer to keep them on my own site and have them indexable by search engines (that's free content created by users for me).

Good point. An unfortunate casualty. It might be possible to augment the statically generated blog posts with data via the comment providers' APIs periodically.

Vignette StoryServer (http://philip.greenspun.com/wtr/vignette-old) .. if you see commas in a .html URL, thank them for it. And TCL as a scripting language, whoo-hoo :)

I do this with WP's caching plugin and a caching plugin for Drupal. It bugs me that these aren't part of the official distribution and need to be plugins.

In the realm of CMSs, RedDot works this way. It is a "bake" CMS: when you publish, it creates the whole site as a set of files. You can use it to generate PHP, ASP, JSP, etc. pages, but it takes the data stored in a database and creates the files from the taxonomy stored there.

You must be talking about Stacey then as it does some of the described "magic". :)


staticmatic deploy amazon_s3


I guess my age is catching up with me-- my gut reaction on seeing the headline was: 120MB? That's a lot of RAM-- who has that? Oh, wait...

Indeed! I read it as:

9 million hits/day with 120 gigs RAM

Especially because of the scalability discussion around it. Only later in the article did it dawn on me that he is using a cheap cloud instance.

LOL... me too... I was thinking... who couldn't hit 9M hits a day with 120 gigs of RAM... just load everything in memory and walk away! 120 MB is certainly more impressive.

Cross-posting this from a comment I made on reddit: This is something I've actually worked extensively on solving, and it's not quite as easy as this article claims. In fact, this method has quite a few drawbacks for any site that isn't largely static or rarely updated.

* Whenever anything is updated the entire cache is invalidated and each item needs to be fetched again. This means you'll have some page loads being slow and others being fast. If you have a very dynamic website you will hardly ever even see a cached version.

* You can't cache things forever, primarily because when anything is updated the entire version namespace is invalidated. This means that if you have a site that isn't updated at all in a long time then the cache is still invalidated by the TTL and has to be updated. Of course, if you decide to cache forever and the version namespace is incremented then...

* You never know when your cache is full. Since the method of updating the cache isn't to invalidate keys but rather to just fill it with new keys, you will have a lot of stale data. This data will eventually have to get evicted from the cache. This means you don't reliably know when you need to upgrade your cache memory.

All that said, version namespacing your cache is better than not caching at all, and it's usually also better than having a lot of stale data as active keys. If you have a highly dynamic site and want to do proper cache invalidation, it's still possible, but it requires a lot more work. There's a reason for this famous quote: http://martinfowler.com/bliki/TwoHardThings.html
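
For readers who have not seen the technique, here is a minimal sketch of version namespacing, using a plain dictionary as a stand-in for memcached. The `VersionedCache` class and its key format are assumptions for illustration only; TTLs and eviction are left out.

```python
class VersionedCache:
    """In-memory stand-in for memcached with version-namespaced keys."""

    def __init__(self):
        self._store = {}
        self._version = 1

    def _key(self, name):
        # Every key carries the current version as a prefix.
        return "v{}:{}".format(self._version, name)

    def get(self, name):
        return self._store.get(self._key(name))

    def set(self, name, value):
        self._store[self._key(name)] = value

    def invalidate_all(self):
        """Bump the version; old keys become unreachable but are not deleted."""
        self._version += 1

    def stale_entries(self):
        """Count entries left behind under older versions."""
        prefix = "v{}:".format(self._version)
        return sum(1 for key in self._store if not key.startswith(prefix))


cache = VersionedCache()
cache.set("front_page", "<html>old</html>")
print(cache.get("front_page"))   # <html>old</html>
cache.invalidate_all()           # any update to the site
print(cache.get("front_page"))   # None: every page must be regenerated
print(cache.stale_entries())     # 1: the old entry still occupies memory
```

The last two lines show the drawbacks listed above: one update empties the whole cache from the reader's point of view, while the dead entries linger until the cache evicts them.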

This underscores how ridiculously overspecced modern servers are due to the bloat of a lot of modern software.

Virtual machines change the economics of hosting for the host. A given VM has a lower administrative overhead per customer than shared hosting does. This will eventually drive out shared hosting, IMO -- except for single-application purposes.

"for the host" is key, really. the customer has to do a lot more work on a VPS than on a shared host, so I think shared hosting isn't going away.

Now, I agree with you that the shared hosting market will continue to move more towards constraining the user to a particular language within that shared hosting environment, as that allows the host to provide better service, but this has been true for a while. For the last 10+ years, if a shared-hosting provider allowed you to run an arbitrary binary, it only allowed you to do so as a second-class citizen, behind several layers of setuid wrappers and other (usually slow) security. Back in the day, if you wanted to run at reasonable speed on a shared host, you'd write in php, and use the supplied MySQL database, which conceptually isn't that different from what many of the language specific cloud providers do today.

The interesting thing is that this means shared hosting, usually with a well-supported but language-constrained environment, is actually becoming the premium hosting option. VPSs, formerly the high-end option, are now the cheap stuff. Which makes sense; as a provider, sure, the VPS costs me more RAM, but I can get an 8GiB reg. ECC DDR3 RAM module for under a hundred bucks now. RAM is cheap. What costs me money is the support, and it is a lot easier to rent out unsupported VPSs than it is to rent out unsupported shared hosting.

If anything, with this new 'premium' image and advances in billing customs, I think we are seeing a renaissance in 'premium' shared hosting services.

> The interesting thing is that this means shared hosting, usually with a well-supported but language constrained environment, is actually becoming the premium hosting option. VPSs, formerly the high-end option, are now the cheap stuff.

Exactly this. I have been using shared hosting for around $100/yr, which was cheap 3 years ago when private servers were double or triple that. But recently I bought a VPS for $15/year. Sure, I have to do my own admin, but I'm fine with that. If the cost for the shared hosting doesn't come down, I'm going to have to move everything to my VPS.

$15/year? is it an OpenVZ (advanced chroot jail) VPS?

That's an interesting counterpoint to this discussion; advanced chroot jail hosts are still cheaper than Xen hosts, and OpenVZ (the common advanced chroot jail software on linux) is in between shared hosting and full virtualization with regards to how many resources are shared and how much host sysadmin effort is required.

With OpenVZ, all users share a kernel and share swap and pagecache; you can fit more OpenVZ guests on a particular bit of hardware than Xen guests, and generally speaking, more host sysadmin involvement is required when running an OpenVZ vs. a xen or kvm host.

The interesting part of the xen/OpenVZ dichotomy is that it goes the other direction; so far the market price for OpenVZ guests is much lower than the market price for Xen guests; OpenVZ mostly occupies the very low end of the VPS market.

> $15/year? is it an OpenVZ (advanced chroot jail) VPS?

As far as I can tell (from the kernel name), it is OpenVZ. I only discovered that last night, coincidentally.

It is tiny, only 256MB, bursting to 512MB, of RAM. But at that price it is fine for my needs; I even get my own static IP address (the full cost of the VPS is cheaper than just adding a static IP address on my shared site!).

I was reading up on OpenVZ vs Xen or KVM last night, and I take on board your comments above. The virtual server market seems to be in transition right now, with pricing yet to settle into a common range the way it had a few years ago.

Yeah, at $15/year you are getting quite a steal, OpenVZ or not. For a 256M Xen VPS I charge $76.80. Heck, even if your provider is being heavy on the swap and you are only getting 128M of actual RAM, I'm charging $57.60 for that. (My understanding is that OpenVZ 'RAM' numbers include swap, while under Xen I give you RAM and you can add swap if you like, so when directly comparing OpenVZ and Xen guests, you should compare a Xen guest with less RAM to an OpenVZ guest with more.) You haven't said, but if they are following industry norms they are giving you more disk than I am, and I primarily compete on price; I'm fairly low-end. Did you get a special, or is that the sort of thing you can order every day?

It was a special, but one that they offer regularly, in the sense that every so often they offer say 1,000 of them until the offer is filled up. Then it is closed off until the next round.

Disk space is 10GB and traffic is 500GB.

I found the vendor on www.lowendbox.com .

I predicted the 'doom' of conventional shared hosting in 2008[1], but I didn't foresee the rise of PaaS offerings like Heroku.

Interesting times.

[1] http://clubtroppo.com.au/2008/07/10/shared-hosting-is-doomed...

I agree with almost everything you said; I think the major flaw in your thinking was that it turns out that making a virtual appliance that can safely run on the open internet for 5 years without intervention from a SysAdmin is a whole lot more difficult than you'd think. If you could have predicted that, with the other information you had, I think that premium shared hosting (which is what PaaS is) with VPSs eating the low end, would have been the obvious conclusion.

I've thought quite a lot about this, in fact; for a while I was talking about starting a PaaS company that used the customer's equipment. As far as I can tell, the current model (where the PaaS company controls everything) is by far the easiest (thus, if sysadmin/programmer time is your constraint, the best) way to solve the problem.

The key points:

    1. Use caching.
    2. Use Nginx.
    3. Use PHP-FPM.

I think people are too quick to throw out Apache + mod_php. It remains the de facto deployment environment for most open source PHP apps, which means that it is the best understood and best supported. For example, if you deploy using Apache, you don't have to worry about converting the .htaccess rules that WordPress and some caching plugins create into an Nginx equivalent.

People bitch a lot about the memory consumption of apache, but it's often overstated. They add up the resident set for each individual apache process and come up with a huge number, missing the fact that a significant amount is actually shared between processes.

Apache's memory consumption problems have more to do with how it handles clients. Each client ties up a worker process for the entire duration of a request, or beyond if keepalives are enabled (as they should be). That means that the memory overhead of the PHP interpreter is locked up doing nothing after the page is generated while it's being fed to the client. Even worse, that overhead is incurred while transmitting a static file.

I use nginx as a reverse proxy to Apache. It lets me deploy PHP apps easily and efficiently. Apache returns the dynamic and static requests as quickly as it can and moves on to the next request. Nginx buffers the result from apache and efficiently manages returning the data to all the clients. I used to have Nginx serve the static files directly, which is more efficient, but added complexity to my config. I chose simplicity.

A PHP opcode cache, like APC, is also a big win, because it cuts the overhead of parsing and loading the PHP source files. I'm not convinced of the value of other caching for most uses. CPU time usually isn't the scarce resource, RAM is. The DB and filesystem cache are already trying to keep needed data in RAM. Adding more caching layers generally means more copies of the same data in different forms, which means less of the underlying data fits in RAM.

> Nginx buffers the result from apache and efficiently manages returning the data to all the clients. I used to have Nginx serve the static files directly, which is more efficient, but added complexity to my config. I chose simplicity.

Funny, I felt the same way about ditching Apache entirely. Just one more moving part I don't need.

So true. Apache + mod_php is a very reasonable choice. Anyone who advocates using FastCGI instead of mod_php to "save memory" just doesn't understand what the actual memory footprint of Apache + mod_php really is, or how to adjust the number of Apache processes.

In fact FastCGI still ties up an Apache process for the duration of the request: Apache hands the PHP request off to a FastCGI worker, then waits for that PHP worker to send back the output, so the Apache process is still blocked waiting on PHP in either scenario.

Also, the overhead of Apache serving static content is minuscule compared to the amount of work PHP does per dynamic request, unless the static content is very large, like large media files.

It's not just about steady state memory footprint. It's about the whole stack, how apache and mod_php and mysql (if you use it) interact.

Excessive traffic means a spike in simultaneously served connections, which means a spike in apache threads (assuming worker MPM, iirc there's 1 process per 25 threads by default). With the mod_php model, the per-thread php memory usage can be very expensive when you have dozens or hundreds of apache threads serving requests. A spike in running php instances leads to a spike in mysql connections for a typical web app. If you haven't tuned mysql carefully, which most typical environments have not, mysql memory usage will also skyrocket.

Then for the coup de grace you get stupid apps which think it's perfectly fine to issue long-running queries occasionally (occasionally meaning something like .1% to a few percent of page loads). When that happens, if you're using myisam tables which were the default with mysql < 5.5 (and which therefore dominate deployments, even if "everyone knows" you're supposed to be using innodb), then those infrequent long-running queries block the mysql thread queue, leading to an often catastrophically severe mysql thread backlog. Since php threads are issuing those queries, the apache+mod_php threads stack up as well, and they do not use trivial amounts of memory.

The result is that you have to severely over-engineer the machine with excess memory if you want to survive large traffic spikes. If you don't, you can easily hit swap which will kill your site temporarily, or worse, run out of swap too and have the oomkiller kill something... either your webserver or mysql.

The benefit to fastcgi is it takes the memory allocation of php out of apache's hands, so every new apache thread is more limited in how much bloat it adds to the system. With a limited pool of fastcgi processes, you can also limit the number of db connections which further improves the worst-case memory usage scenario.

The advantage of in-apache-process php is that it serves php faster when there are few parallel requests, but it's on the order of single-digit milliseconds difference (the extra overhead of sending requests through a fastcgi socket), which is dwarfed by network rtt times even if none of the above pathologies rear their heads.

The apache+mod_php model is to do php processing for all active connections in parallel. The fastcgi model is to do php processing for at most x connections where x is the php fastcgi pool size, leaving all other requests to wait for an open slot. It may intuitively seem like the fastcgi model is going to be slower because some requests have to wait for a fastcgi process to become free, but if you think about average php request time it's going to be better for high parallelism, because the limiting factors are cpu and i/o. The apache model ends up using ridiculous amounts of resources just so no php request has to wait to begin getting processed by php. The high contention particularly for i/o created by apache and mysql when they have large numbers of threads is what makes the fastcgi model superior.
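
The bounded-pool idea can be modelled in a few lines. This is only a toy simulation under stated assumptions: threads stand in for PHP workers, `time.sleep` stands in for PHP and database work, and `run_requests` is a made-up name. It shows that with a fixed pool, the number of requests doing PHP work at once never exceeds the pool size, no matter how many arrive.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_requests(num_requests, pool_size, work_seconds=0.01):
    """Run simulated PHP requests through a fixed pool; return peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def handle(_request_id):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(work_seconds)  # stand-in for PHP + database work
        with lock:
            state["active"] -= 1

    # Requests beyond pool_size wait in the executor's queue for a free worker.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(handle, range(num_requests)))
    return state["peak"]


# 50 simultaneous requests, but never more than 4 "PHP workers" busy at once.
print(run_requests(50, pool_size=4))
```

Because each busy worker typically holds one database connection, capping the pool also caps the worst-case connection count.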

Why use FPM over Fast-CGI? I know Rasmus uses FPM and I would like to know the benefits.

FPM stands for "FastCGI Process Manager".

I hope that answers your question.

I'm aware of that but if it's just a manager wouldn't it add overhead?

You have to use a managing script anyway even if you don't use FPM. FPM makes things easier by managing that part for you. In addition, FPM comes with some goodies that managing scripts don't have.

I'd like to know how much the Joyent Smartmachine contributed to this. They make some bold claims on their website, and really do seem like a great alternative to EC2 (disk IO that doesn't suck!) if they deliver. Anyone have any experience?

Some interesting techniques in here (e.g. Faking Dynamic Features Using Inline Caching), but otherwise it seems easy to scale to this level when the majority of page content can be cached.

The difference between Apache and Nginx is that out of the box, Nginx is built for speed. Both are capable of thousands of requests per second, but Nginx arguably does it better with its event-based architecture (as opposed to Apache's process-based model). The config syntax is also refreshingly simple, so converting .htaccess rules couldn't be easier.

We were recently paying a small fortune for hosting one of our websites. It was bumping up against memory limits even after a serious code rework and aggressive caching. Instead of upgrading we decided to test a new config using Nginx.

Now we run three sites, one fairly popular, on a 512MB Linode with Nginx, APC, FPM, Varnish and a CDN, and it can take an amazing amount of load. Varnish needs memory, but without Varnish we could run this setup on a box a fraction of the size.

This plan costs $19/month! I still can't believe we're paying so little.

Instead of focussing just on the server though, and like the TumbleDry article somewhat suggests, HTTP cache is probably the best place to start in terms of performance. Varnish, CDNs, etc all rely on intelligent HTTP caching. And if you do it right, you don't need to worry (too often) about cache invalidation.
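
As a rough illustration of what "intelligent HTTP caching" involves on the application side, the sketch below sets `Cache-Control` and `ETag` and answers a matching `If-None-Match` with 304. The header names are standard HTTP; the `respond` and `make_etag` functions and the 300-second lifetime are assumptions for the example, and real `If-None-Match` handling (lists of tags, weak validators) is more involved.

```python
import hashlib


def make_etag(body):
    """Derive an ETag from the response body bytes."""
    return '"{}"'.format(hashlib.sha256(body).hexdigest()[:16])


def respond(body, if_none_match=None, max_age=300):
    """Return (status, headers, payload) for a cacheable GET."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Lets browsers, Varnish and CDNs reuse the response for max_age seconds.
        "Cache-Control": "public, max-age={}".format(max_age),
    }
    if if_none_match == etag:
        # The client's copy is still current: send headers only.
        return 304, headers, b""
    return 200, headers, body


status, headers, payload = respond(b"<html>front page</html>")
print(status)  # 200
status, _, payload = respond(b"<html>front page</html>", if_none_match=headers["ETag"])
print(status, payload)  # 304 b''
```

With expiry handled by `max-age` and revalidation by the ETag, caches refresh themselves without anyone purging them by hand.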

What I'm really looking forward to is making use of ESI in Symfony2 and Varnish. That will mean setting different cache headers for portions of pages, which will further reduce the need to manually invalidate cache.

For now though, I'm loving Nginx + FPM + APC.

I agree that this should be the standard, not an exception.

I've been on /. once and on Digg twice, and my server crashed every time. I've improved it a lot since then, using different techniques, but I've never been hit by one of those again :(

New self-hosted blogs can't handle peak-time load, because new bloggers start a WordPress blog, install a lot of plugins on it, and never think about performance. I think most of us have made that mistake at some point.

Sure. But it's not a mistake; it's a correct choice given current expectations.

What I don't understand is why the various CMS's don't offer automatic on-the-fly reconfiguration. It's 2011. We should be able to have the best of both worlds:

--when load is light, your blog software hits the database 42 times with every page loaded, no problem

-- when site load shifts from 1 page per hour to ten pages per second, the CMS should automatically, with no user intervention, say "oh shit" (perhaps audibly in the server room) and then automatically generate a static front page, and static article pages for whatever pages are being linked to, turn off the bells, turn off the whistles, email and tweet at the server administrator, etc. The CMS should cheerfully weather a storm all by itself. And when the load dies down, the static page should revert to a dynamic, 42-queries-on-the-database page, again without any intervention from the server administrator.

Does this exist anywhere, out of the box?
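
To make the idea concrete, here is a toy sketch of a renderer that switches itself to frozen snapshots when the request rate crosses a threshold and back to dynamic rendering when it drops. Everything here is hypothetical: the `AdaptiveRenderer` class, the thresholds, and the injected clock exist only for illustration and do not describe any existing CMS.

```python
import time
from collections import deque


class AdaptiveRenderer:
    """Serves dynamic pages when quiet and frozen snapshots under load."""

    def __init__(self, render, threshold_per_sec=10.0, window_sec=5.0,
                 clock=time.monotonic):
        self._render = render
        self._threshold = threshold_per_sec
        self._window = window_sec
        self._clock = clock
        self._hits = deque()
        self._snapshots = {}

    def _rate(self, now):
        # Forget hits that fell out of the sliding window.
        while self._hits and now - self._hits[0] > self._window:
            self._hits.popleft()
        return len(self._hits) / self._window

    def get(self, path):
        now = self._clock()
        self._hits.append(now)
        if self._rate(now) > self._threshold:
            # Under load: serve a frozen copy, rendering it once if needed.
            if path not in self._snapshots:
                self._snapshots[path] = self._render(path)
            return self._snapshots[path]
        # Load is light again: drop snapshots and go back to dynamic pages.
        self._snapshots.clear()
        return self._render(path)


fake_now = [0.0]   # controllable clock so the demo is deterministic
renders = []


def render(path):
    renders.append(path)
    return "<html>{} (render #{})</html>".format(path, len(renders))


site = AdaptiveRenderer(render, threshold_per_sec=2.0, window_sec=1.0,
                        clock=lambda: fake_now[0])
for _ in range(5):
    site.get("/")      # burst of 5 hits in the same instant
print(len(renders))    # 3: two dynamic renders, then one snapshot render
fake_now[0] = 60.0
site.get("/")          # quiet again, so this one is rendered dynamically
print(len(renders))    # 4
```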

I don't know, but I do know that my architectural approach is to get any web app I write to only hit the database ONCE per page view. Ten calls is a lot for me. I've seen some that hit the DB 100 times on each view and I was able to reduce all 100 down to a single call.

I heard one specific Joomla site used 800 queries to render the front page. A dev needs natural talent to reach that point, I guess.

I worked on a site once that loaded at an okay rate for the live site, but the staging server would take about 15 minutes (seriously) to load once every other hour or so. I thought this was strange, same code and all, so I looked into it.

There was a loop on the page that instantiated objects with an id, the number of ids dependent on the results of a previous query. What did that object instantiation do under the hood? It performed a query to fetch state. I calculated that 5000 queries were being run, and it only cropped up every once in a while (and seemingly 'never' on the live site) because queries were automatically cached for a set amount of time.

I was new on the project, and in what is probably poor form, went around the whole office, letting my horror be fully known. People just shrugged though.

edit: I forgot to add, modifying it to only perform one query was trivial.
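
The pattern described here, one query per object inside a loop, and its single-query replacement can be sketched with an in-memory SQLite database. The table, column, and function names are invented for the example; the original site's schema is not known.

```python
import sqlite3


def make_db(n):
    """Create an in-memory table with n rows of hypothetical object state."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO items (id, state) VALUES (?, ?)",
        [(i, "state-{}".format(i)) for i in range(1, n + 1)],
    )
    return conn


def load_one_by_one(conn, ids):
    """The slow pattern: one query per object."""
    return {
        i: conn.execute("SELECT state FROM items WHERE id = ?", (i,)).fetchone()[0]
        for i in ids
    }


def load_in_one_query(conn, ids):
    """Same result from a single query using an IN list."""
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT id, state FROM items WHERE id IN ({})".format(placeholders),
        list(ids),
    )
    return dict(rows.fetchall())


conn = make_db(500)
wanted = list(range(1, 501))
print(load_one_by_one(conn, wanted) == load_in_one_query(conn, wanted))  # True
```

The list is kept to 500 ids because SQLite limits the number of bound parameters per statement; larger sets would need batching or a join against the earlier query.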

Fewer queries aren't always better. Sometimes a couple of small, fast queries (easily optimized by the DB server) are faster than a join (which could block several tables at once instead of just one).

This is true; you have to decide when that is. Of course, multiple queries across tables can be placed into a view and queried at once.

I was slashdotted at the beginning of the year. Spent quite a bit of time at #1 for https://grepular.com/Abusing_HTTP_Status_Codes_to_Expose_Pri...

I managed to have it up and running again in about half an hour by converting some content which didn't need to be dynamically generated for every request, into static content.

Once things calmed down, I spent some time optimising it further and making lots of content static, which didn't need to be dynamically generated for every request.

I'm glad I did, because I hit the #2 spot on Slashdot for most of a day this very weekend because of this article:


My lowly Linode VPS didn't even break a sweat this time.

It's not apparent if the 9 million+ daily hits number is taking into account that peak hours will be higher than off hours. It would take 100 reqs/sec if the traffic is even throughout the day, but 375 reqs/sec if 15% of the day's traffic is in the peak hour.
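
A quick sketch of the arithmetic behind those two figures. The 15% peak-hour share is the assumption stated above, not a number from the article, and the function names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

def average_rps(hits_per_day):
    """Requests per second if traffic were spread evenly over the day."""
    return hits_per_day / SECONDS_PER_DAY

def peak_hour_rps(hits_per_day, peak_hour_share):
    """Requests per second during the busiest hour.

    peak_hour_share is the fraction of the day's hits that land in that
    one hour (0.15 for the 15% assumed above).
    """
    return hits_per_day * peak_hour_share / SECONDS_PER_HOUR

if __name__ == "__main__":
    hits = 9_000_000
    print(f"even load:        {average_rps(hits):.0f} req/s")
    print(f"peak hour at 15%: {peak_hour_rps(hits, 0.15):.0f} req/s")
```

With 9 million hits this gives roughly 104 req/s for even traffic and 375 req/s for the peak hour, matching the numbers in the comment.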

Yeah, I see several different numbers being tossed about, but:

This caching (I’m using APC right now) got my page load times down to about 170 microseconds for most pages, and 400 microseconds

400 microseconds ~= 2,500 hits/sec; 170 microseconds ~= 5,880 hits/sec. Both seem reasonable for a simple PHP site on a single-core CPU @ 1+ GHz.
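
The conversion from time per page to hits per second is just a reciprocal. A minimal sketch, assuming requests are handled strictly one after another per worker and that the quoted time covers all the work done for a request (neither is confirmed by the article):

```python
def max_hits_per_second(seconds_per_request, workers=1):
    """Upper bound on throughput when each worker serves requests serially.

    workers is a hypothetical knob for the number of cores or processes;
    the comment above assumes a single core.
    """
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return workers / seconds_per_request

if __name__ == "__main__":
    for micros in (400, 170):
        rate = max_hits_per_second(micros / 1_000_000)
        print(f"{micros} microseconds per page: about {rate:,.0f} hits/sec")
```

This is a ceiling, not a forecast: network I/O, connection handling and cache misses all push the real number down.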

As a lot of you have said: Static content is the key to success. You can name it:

- Movable type

- Drupal + boost

- Wordpress + SuperCache

- Jekyll or other static website generators

It's better if Nginx serves those static files; LAMP can sit behind it, generating them.

I used that approach with Drupal + Boost for a long time and it worked.

The title says "120 megs of RAM", but I wonder if that's at all comparable to a real machine with 120 MB. I imagine that the "120 MB" VM is running on a beefy host with tens (or hundreds) of gigabytes of RAM shared between the guest VMs and also used for disk cache. It seems likely that accessing a guest's virtual disk would actually hit the host's disk cache a lot of the time (especially when that guest has been busy recently); that would improve the speed of disk access for that VM enough that it could make up for the lack of memory for disk cache within the guest.

This is purely speculation, but I would be interested to see if there is any actual research to back it up.

I suppose if the guest in this instance is not swapping very often, then this is fairly irrelevant, but the article didn't mention anything about swap.

Very nicely done. Using JS to give personalized experiences seems to be the way to go. I suppose you could generate a JSON list of id,date pairs to reduce page bloat, if it matters.

[Edit: This is 100 qps. It's a lot for a blog, but is not an unreasonable load by any means.]

He could just say "tl;dr\n<machine specs>\n<I use static pages>".

For static, high-traffic, small-size content, doesn't it make sense to load a minimal OS entirely to RAM and serve it from there? Has anybody tried this? (I guess this rules out VPSes...)

Note: This is a variation of a previous comment I made, but a variation nonetheless. Sorry to belabor the point.

Naive question: suppose that you serve a low-throughput site, say with a total of 3MB of data (probably text files). What's the simplest way to ensure that those 3MB of content (very little compared to 120MB) live always in RAM? By this I mean not giving the server a choice =).

Point the server at a ramdisk, and set up a cron job to sync it from the regular disk?
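
A rough sketch of the sync half of that idea, assuming Python 3.8+ and a RAM-backed mount that already exists (for example a tmpfs; on many Linux systems /dev/shm is one). The function name is made up for illustration.

```python
import shutil
from pathlib import Path

def sync_to_ramdisk(source_dir, ramdisk_dir):
    """Copy source_dir into ramdisk_dir, overwriting files already there.

    Returns the relative paths of all files now present in the target.
    Files deleted from the source are NOT removed from the target; a tool
    such as rsync with --delete would handle that case.
    """
    source = Path(source_dir)
    target = Path(ramdisk_dir)
    # dirs_exist_ok (Python 3.8+) lets repeated runs refresh an existing copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file()
    )
```

A cron job could call something like `sync_to_ramdisk("/var/www/site", "/dev/shm/site")` (both paths are placeholders), with the web server's document root pointed at the second path. The copy has to be redone after every reboot, since the ramdisk starts out empty.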

discovered blitz.io will keep me and my servers busy this weekend

Haha, looks like they could put some of the article's suggestions to good use there. Home page of blitz.io throws a 500 Internal Server Error right now...

Same error... I was going to check it out to see how much load my server handles.

sorry, we had a glitch this morning when one of our db clusters went offline - had to reroute all traffic to our other cluster - all good now.

Kinda expensive :(

I had a weekend hack project like blitz/loadimpact, and was thinking of charging by the test (i.e. you pay for what you use, which makes more sense than charging per calendar time).

Plan was to offer quick tests (ala loadimpact), whole-site test (you give it the url to start, and it'd hit the linked pages with some probability as well) and custom scenarios (a list of urls to hit in order, rinse, repeat), and API to trigger tests automatically (for integration in routine integration/regression testing).

Got the backend working, never finished the frontend/UI. If anyone reading this is interested, let me know in reply - might put up a quick working demo page for it.

I use http://loadimpact.com/

The free version can give you an idea of how your server will handle a small spike, and the other versions are per day so if you only need to test once or so a week as you finish a sprint then it wouldn't be a horrible cost.

It's odd that they let you just test any website without proving it's yours (e.g. by putting a special file in the root or something). Can't unauthorized testing be considered a DoS attack?

I can only speak for http://blitz.io. We generate an account-specific UUID which becomes a URL that you need to make available on your app (by adding a route in ruby/node.js or uploading a txt file). Before each load test we check that your UUID is available on the app. Even if your UUID is leaked, this is not a problem, since it's unique to your account. Unless, of course, your Google/Facebook account gets compromised. We currently do not support password logins. It's either OAuth/FB Connect or SSO through our partners.
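
For readers wondering what such a check might look like, here is a minimal sketch of the general idea. It is not blitz.io's actual implementation: the URL layout, the function names, and the "200 means verified" rule are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
import uuid

def new_account_token():
    """Generate a random per-account token (hypothetical format)."""
    return uuid.uuid4().hex

def verification_url(app_base_url, token):
    """URL the site owner must make reachable before a test may run."""
    return f"{app_base_url.rstrip('/')}/{token}"

def http_status(url, timeout=5.0):
    """Return the HTTP status for url, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:
        return None

def is_verified(app_base_url, token, fetch_status=http_status):
    """True if the token URL answers 200 on the target app.

    fetch_status is any callable mapping a URL to a status code, which
    keeps the check easy to exercise without real network access.
    """
    return fetch_status(verification_url(app_base_url, token)) == 200
```

The property that matters is the one described above: the token is tied to one account, so someone who learns it still cannot start tests from a different account.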

That's about 100 requests/sec, which isn't particularly amazing.

On average that means 1.2 MB of memory available per request, which isn't particularly boring either.

I would flip shit to get that many views. I'm sure I'm not the only one.

One thing to point out, based on my experience, is that you need about 10x or more peak throughput to handle a given average throughput. Spikes kill you.

Wonder how much faster it would be if PHP was taken out of the mix (looks like he's just serving static pages anyway).

But he is using the APC cache, which is part of PHP, so I'm not sure he could pull that off? PHP-FPM is the fastest way he could be using that interpreter anyhow, as it maintains a persistent bank of PHP instances in-memory, with nginx acting as a proxy to it via a Unix socket.

You wouldn't need APC if you weren't running PHP...

Yes, obviously I meant just serving static pages. You don't need to run every page through PHP in order to have comments, dude. (I didn't mean completely remove PHP, just not make everything run through it)

You are right, it would need less hardware for the same work. Two days ago I wrote this:


But the way his comments work and the way he has implemented the infinite scroll are great, and, though I'm no expert, I don't think that is possible with static pages.

What would you replace PHP with? Are you suggesting static files?

Quick Question. Would using Nginx as a front-end to Node improve performance in the same way it has done for serving PHP?

Putting any proxy cache in front of your app will help. You can 'simulate' static caching of dynamic content. You will have to take a look at your headers, however, as most caches follow the upstream caching headers closely, per the spec.
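
To illustrate why the headers matter, here is a deliberately simplified sketch of how a shared cache might read a Cache-Control header. Real proxies implement many more rules (heuristic freshness, Vary, Expires, revalidation); the function name and the reduced rule set are assumptions for illustration only.

```python
def shared_cache_ttl(cache_control):
    """Seconds a shared cache may reuse a response, or 0 if it should not.

    Simplified model: no-store, no-cache and private all mean "do not
    serve from the shared cache"; otherwise s-maxage wins over max-age.
    A response with no explicit lifetime is treated as uncacheable here.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0

    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return 0

if __name__ == "__main__":
    for header in ("public, max-age=300", "private, max-age=300", "no-store"):
        print(f"{header!r}: {shared_cache_ttl(header)} seconds")
```

An app whose responses carry `private` or `no-store` gets no benefit from a proxy cache in front of it, however fast that proxy is.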

By inlining the comments, he's reducing cpu time by...transferring extra data across the network?

love it. gives me faith that the archaic machines i have serving can still hold their own!

There was some post about a happy Win/IIS/CF guy - he definitely should read this.. ^_^

hits/day and megs ram are orthogonal.

I have to say I haven't tried that myself nor have I looked at the prices so my question is: All the fun of having your own server aside, why wouldn't I rather just run a site like that on something like amazon ec2 and stop worrying about hits and load even if it is just a personal blog?

I'm not too sure I'd go with a load testing company that can't even keep their own website up.


> Internal Server Error

It's always exactly the time when you need your service to be up the most that it goes down. What matters is how they respond to it.

thanks for the vote of confidence, will write up what happened soon.

Here's the blog on what we are learning as we scale: http://blog.mudynamics.com/2011/09/01/blitz-io-path-finding-...

how about just using a CDN service. There is no need to play around anymore.

Very impressive, it's amazing how the "LAMP" stack continues to evolve.

Looks more like a LNMP stack.

SNMP actually.

I'm using PHP-FPM myself and it is fantastic, a major increase in performance compared to mod_php. I still have not let go of Apache2 yet, but nginx is calling to me due to its non-blocking design...

It's pretty painless once you understand it. Plus it curbs most DDoS issues lately.

Not to mention that the folks in #nginx on freenode are really very helpful.
