Sysadmin mistakes start-ups make (cloudkick.com)
164 points by polvi on Oct 20, 2009 | 92 comments

Here's one we made recently: purchasing an array of hard drives (for storage servers) without making sure they weren't all from the same batch. Since they were made in the same batch, they had the same defects, and when they failed, they failed one after another within a very short interval. Since all of them failed, RAID didn't help; we had to restore the day-old offline backup.

I've thought this for a while. Ideally, you want to have drives of different ages, so they're at different points of the 'bathtub curve'.

If you're mirroring, you should only ever mirror a "fresh, unproven" disk with an "old stalwart" disk. Doing that also means that when your "old stalwart" becomes senile, it's paired with a younger disk.
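A minimal sketch of that pairing rule, assuming each disk is described by a hypothetical (name, age in months) tuple. It sorts by age and mirrors the oldest disk with the youngest, the second oldest with the second youngest, and so on. The disk names and ages are made up for illustration.

```python
def pair_old_with_new(disks):
    """Pair the oldest disk with the youngest, then work inward.

    `disks` is a list of (name, age_months) tuples. Both fields are
    illustrative and do not come from any real inventory tool.
    """
    by_age = sorted(disks, key=lambda d: d[1])
    pairs = []
    lo, hi = 0, len(by_age) - 1
    while lo < hi:
        # Each mirror is (old stalwart, fresh unproven).
        pairs.append((by_age[hi][0], by_age[lo][0]))
        lo += 1
        hi -= 1
    return pairs


mirrors = pair_old_with_new([("sda", 36), ("sdb", 1), ("sdc", 24), ("sdd", 2)])
print(mirrors)
```

With an odd number of disks the middle one is left unpaired, which is probably what you want for a hot spare.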

But doing that does mean rotating mirror sets when you buy a new tranche of disks, which puts load on them, which can trigger failure.

Anyone have a good plan for doing this?

(In theory, as I don't run a storage farm) RAID6 might help here since if the disk rotation triggered a failure you'd have a second parity disk to fall back on. However ideally all the disks in an array would be from different lots.

Or, I believe that on some RAID controllers it is possible to add a third disk to a RAID1 pair, which means you could build the new disk before removing either of the old ones, thus there is never a single point of failure even during the disk replacement operation.

Legend has it that one of the early ESS systems ran into something like this.

Telephone switching systems have functionally zero downtime. They're designed to be fully modular and entirely hot swappable, the kind of thing Erlang was built for, and they give Z-series a run for its money. So you have this hulking room-sized[1] brute of a telephone switch which cannot ever fail and if it does so must always do so gracefully and with plenty of warning.

And one day it just falls over. No warning, no graceful failover to a redundant system, just poof gone. After much wailing and gnashing of teeth the root cause is identified: n drives were capable of failing without issue, n+1 failed simultaneously.

At this point, stories differ: this was either the beginning of a "no two drives from the same manufacturer" policy or the end of the career of a PHB who vetoed said policy on grounds of excessive cost.

[1] http://www.montagar.com/~patj/phone-switches.htm

You frequently see RAID cascade failures.

Drive A fails. You pop it out, put in a new disk. Machine starts rebuilding RAID array. This involves a ton of reads from other disks. Under the increased strain, Drive B, which was already on the edge, also fails.

And so on.

I'm no longer a believer in redundant disks. Redundant machines would be better.

Good lesson that RAID != Backup!

RAID5 by chance?

ARRRRRHGH. That is NOT THE LESSON. I'm sorry, every time there is a discussion of hard disks and RAID, someone posts this same stupid comment. The GP comment didn't suggest that RAID == Backup. Not in any way. Your reply suggests that they did.

"RAID5 by chance?" How do you know they didn't use it? They might have had multiple failures too close together to rebuild the array.

Sorry for the caps, but I just snapped seeing this comment for the nth time and with so many upvotes.

That isn't something I'd ever thought about until now.

Would it make sense to use drives from more than one company as they would have very different failure characteristics?

It's a terrible idea to mix and match drive models, because they'll have drastically different performance characteristics, and if you're lucky you'll only get a little worse than the lowest common denominator on all axes.

What quality OEMs do is make sure they never ship you drives from the same manufacturing batch in one enclosure.

How do you make sure to buy from a quality OEM?

Possibly. IBM had the infamous bad run of DeskStar (or was it another model?) drives about 8-9 years ago... We got a batch of them - in that case it wouldn't have helped you to buy them from different distributors or otherwise tried to get a different batch as the number of problematic drives was huge. At least they were extremely good about replacing them no questions asked and we got them all replaced before we lost data.

DeskStar (or was it another model?)

DeskStar is right. They earned the nickname "DeathStar" because they failed so often.

Would it make sense to use drives from more than one company as they would have very different failure characteristics?

This is probably difficult to take advantage of unless you set up your RAID layout very carefully.

Fork is actually a very fast system call. It never blocks, and (on Linux), only involves copying a very small amount of bookkeeping information. If you exec right after the fork, there is basically no overhead.

However, forking a new shell to parse "mv foo bar" is more expensive than just using the rename system call. And it's easier to check for errors, and so on.
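To make the contrast concrete, here is a small sketch in Python (my choice of language, not the article's). It renames a scratch file directly through the rename system call, then does a second rename by shelling out to mv. The file names and the temporary directory are placeholders, and the shell-out assumes a POSIX system with mv on the PATH.

```python
import os
import subprocess
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "foo")
dst = os.path.join(workdir, "bar")
with open(src, "w") as f:
    f.write("data")

# Direct rename(2): no shell is started, and a failure surfaces as an
# OSError carrying the errno and the offending path.
try:
    os.rename(src, dst)
except OSError as err:
    print("rename failed:", err)

# Shell-out for comparison: this starts a shell, which parses the command
# line and then runs mv. A failure comes back only as an exit status plus
# whatever mv printed on stderr.
result = subprocess.run("mv bar baz", shell=True, cwd=workdir)
shell_ok = result.returncode == 0
```

The direct call is also easier to make safe: there is no quoting to get wrong if a file name contains spaces or shell metacharacters.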

SQLite is also not as slow as people think it is; you can easily handle 10s of millions of requests per day with it. If your application's semantics require table locks, MySQL and Postgres are not going to magically eliminate competition for locks. It's just that they both pick very weak locking levels by default. (They run fast, but make it easy to corrupt your data. Incidentally, I think they do this not for speed, but so that transactions never abort. Apparently that scares people, even though it's the whole point of transactions. </rant>.)

Most of my production apps are SQLite or BerkeleyDB, and they perform great. I am not Google, however.

Actually, PostgreSQL's MVCC architecture makes lock contention radically less likely than with most other DBMSes. For SELECTs, you're only ever taking "Access Share" locks on the table ("Hey, I'm using this table; you can't DROP it right now."). For DML queries (UPDATE, INSERT, DELETE), you'll see those, plus "Row Exclusive", which is just what it sounds like.

There are a few other lock types you'll run into as well, but they're typically only seen in narrow, specific cases -- when you're performing maintenance (vacuuming, clustering, &c), indexing, DDL changes, or when you explicitly LOCK a table for whatever reason.

> SQLite is also not as slow as people think it is; you can easily handle 10s of millions of requests per day with it

Scaling relational databases to the point of 10s of millions of requests is extremely non-trivial. Unless you can show me personally or can show me evidence otherwise, don't make this claim. You're doing a disservice to the people that have worked countless hours to eke every last millisecond of performance out of MySQL and Postgres.

10 million per day is about 100 per second. SQLite performs about this quickly. I wrote a test script and it did 125 (unindexed) lookups per second. Then I ran two of these tests at the same time, and the rate stayed about the same. I have 8 cores, so I made 8 processes, and it was the same. 125 requests/second * 8 * 86400 seconds/day = 86_400_000 requests per day.

I added another thread writing as quickly as possible to the mix (7 readers, 1 writer), and this brought the read rate down to about 45 reads per second per thread. Still more than 10 million per day, so technically I am right.
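For anyone who wants to check the arithmetic, here it is spelled out using only the figures quoted above (125 lookups per second across 8 processes, then 45 reads per second across 7 readers). The variable names are mine.

```python
SECONDS_PER_DAY = 86_400

# Average rate needed to serve 10 million requests in one day (roughly 116).
needed_per_second = 10_000_000 / SECONDS_PER_DAY

# Read-only test: 8 processes at about 125 lookups per second each.
read_only_per_day = 125 * 8 * SECONDS_PER_DAY

# Mixed test: 7 readers at about 45 reads per second each, plus 1 writer.
mixed_reads_per_day = 45 * 7 * SECONDS_PER_DAY

print(round(needed_per_second), read_only_per_day, mixed_reads_per_day)
```

These are daily averages; they say nothing about how the load is distributed over the day.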

Also, I don't doubt that MySQL and Postgres (and BDB) are both significantly faster than this. It's just that SQLite is not going to guarantee "instant failure" of your project, as the article implies.

(One thing to note -- every time you type a character in Firefox's address bar, you are doing an SQLite query. It is Fast Enough for many, many applications.)

I respect your research into the matter, but it frankly doesn't matter one bit. Here's why.

Your peak load/sec is never going to be close to your average load/sec. I've seen it be between 2x and 5x average load.

Read performance on simple selects is straightforward to scale. Most solutions to that problem just put the data in memory, via the database cache or memcache.

Part of the difficulty is in scaling writes. What if your 1000 queries/sec are all inserts on the same 500M row table? What if they are updates on a table that's 10M rows long? This is when you have to make hard decisions about sharding and the like.

I certainly believe that SQLite is "fast enough" for many applications. I also know that many applications are NOT doing 10s of millions of requests a day.

That -is- fast, but I still have trouble reconciling that deep down in my computer, a human readable SQL query gets built, and then another process parses that SQL. Seems so wasteful building and then parsing a human readable string for something that's happening on the same machine.

I know nothing of SQLite's internals, but wouldn't it make more sense to parse the query once and then store a compiled version of the query for subsequent lookups? Like you might do with a regexp?

Yes, this is known as a prepared statement. You compile a parametrized statement once, then execute it as many times as you like with different arguments.

Also, SQLite, unlike most other databases, is an embedded database which does everything in-process rather than invoking multiple processes.
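A short sketch of what that looks like from Python's standard sqlite3 module, using a throwaway in-memory database and a made-up users table. As far as I know, the module keeps a small cache of compiled statements keyed by the SQL text, so reusing the same parametrized string lets SQLite skip re-parsing it; treat that as my understanding of the module rather than a guarantee.

```python
import sqlite3

# In-process, in-memory database: no server and no second process involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("alice",), ("bob",), ("carol",)],
)

# One parametrized statement, executed several times with different
# arguments. Only the bound value changes between executions.
query = "SELECT name FROM users WHERE id = ?"
names = [conn.execute(query, (user_id,)).fetchone()[0] for user_id in (1, 2, 3)]
print(names)
```

Binding values with `?` instead of building the string by hand also avoids SQL injection, which is a separate benefit from the parsing cost.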

Scaling relational databases to the point of 10s of millions of requests is extremely non-trivial.

Simply not true. That level of database activity is bread-and-butter for many people these days. Hundreds of millions, non-trivial in the sense that it will cost you a pile of money, but these days, basically straightforward, especially if you've done it before. Thousands of millions, now that's where things get interesting.

Oh, and he's talking about requests. The trading systems I've worked on can and do handle hundreds of millions of commits per day.

The memory use is not accurate unless you take shared pages into account. Copy-on-write will make it look like each apache child is using 40MB, when really it's only 10MB private RSS. Use an RSS-calculating script (http://psydev.syw4e.info/new/misc/meminfo.pl) to determine the close-to-real memory use. If you don't calculate your maximum memory use correctly you will run into swap with traffic peaks. Also keep in mind that swap is a good thing. Is your app constantly cycling children? This isn't going to allow it to move unused/shared memory into swap. Don't ignore memory leaks by reducing your max requests per child.

The forking thing is more of the same. Copy-on-write means it's not going to balloon your memory unless some function turns that shared rss into private. It isn't something that you want to do a lot of, though.
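This is not the linked meminfo.pl script, just a rough sketch of the same idea in Python, assuming a Linux kernel that exposes /proc/<pid>/smaps. It sums only the Private_Clean and Private_Dirty fields, so pages shared between children through copy-on-write are not counted once per child. The sample text is invented to mirror the 40MB versus 10MB example.

```python
def private_kb(smaps_text):
    """Sum Private_Clean and Private_Dirty (in kB) from smaps-style text.

    Shared pages are deliberately left out, so the result approximates
    what a single child process costs on its own.
    """
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            total += int(line.split()[1])
    return total


# Invented sample: 40 MB resident, of which only 10 MB is private.
sample = """\
Rss:               40960 kB
Shared_Clean:      30720 kB
Private_Clean:      2048 kB
Private_Dirty:      8192 kB
"""
print(private_kb(sample))
```

On a real box you would feed it `open(f"/proc/{pid}/smaps").read()` for each child and add the results to one copy of the shared portion.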

This stood out to me as well. I like the script to actually calculate real usage. Modern operating systems are smarter than I am when it comes to memory management.

What you don't want is to have anything you use more than once a minute in swap, and preferably only the stuff you don't plan on using for an hour (i.e. not any time soon). That probably means you want your main application and web server in memory all the time. If there are pieces of it that are unused and you're hitting a resource cap then you have something mis-configured.

RAM is also dirt cheap right now, making it often easier to add RAM than to optimize slightly sloppy code.

One of the most common problems we see is DNS misconfiguration. It seems most folks just haven't read the grasshopper book. If you're doing anything on the Internet, you need a basic understanding of DNS.

Once you grasp the fundamentals, most DNS problems become completely transparent, but I've seen people spend weeks trying to solve DNS problems due to lack of understanding.

dns is the cause of many seemingly unrelated problems. some services (like sshd) do reverse dns lookups on connecting ips. a misconfigured dns server somewhere (or improper delegation) along the path can make this initial connection take up to 30 seconds while waiting for dns timeouts. it may look like an extremely slow/busy server, but in reality it's just sitting there doing nothing waiting for a dns reply.

tools like dnstracer (http://www.mavetju.org/unix/dnstracer.php) and dig are very useful for diagnosing these issues, but you really need to understand the fundamentals of how dns and delegation work.
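If you want a quick check before reaching for those tools, a sketch like the following times a single reverse lookup through the system resolver. The function name and the injectable `resolver` parameter are my own additions so it can be exercised without a network; by default it calls the standard `socket.gethostbyaddr`.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, seconds_elapsed) for a PTR lookup of `ip`."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return hostname, time.monotonic() - start
```

Calling `time_reverse_lookup("192.0.2.10")` with one of your own client addresses should return almost instantly; an answer that takes many seconds, or a None after a long wait, points at the kind of timeout described above.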

Yes, I should have been emphatic that part of the problem with not understanding DNS is that if you don't understand it, you might not even realize that your problem is DNS-related.

Whenever I hear, "X is slow!", my first response tends to be, "Are you sure it's not DNS instead of X?" About 50% of the time, DNS misconfiguration is at least a component of the problem if not the entirety of it.

DNS touches every service on the Internet. If you get it wrong, you break every service, sometimes in subtle ways.

Thanks, we just got a fixed ip at work and I was wondering why the initial connection to our servers took longer than usual. Our new ip doesn't have a dns entry.

I'm not sure which book you are talking about but my guess would be "DNS and Bind". http://oreilly.com/catalog/9780596100575/

Yes, it's one of the handful of books that built the O'Reilly reputation for stellar books about Open Source software. And it's one I would recommend for everyone building anything on the Internet. Unless you have a full-time system administrator, it should be considered required reading. You probably don't need the whole thing, but I can't think of any better source for the bits you do need to know.

Or, you can skip the O'Reilly book --- and most of the mistakes you could make with the simplest BIND setup --- and just use djbdns.

While I respect your opinion on most topics, I believe you're way out of your element here. I'm talking about fundamental misunderstandings of the protocol, not stupid typos in the configuration file: Glue records wrong at the registrar, incorrect NS records, non-existent MX records, misunderstanding of propagation/caching/TTL, no PTR, etc. These are the kinds of problems I see all the time.

djbdns doesn't make the person doing the configuration understand those things any better than BIND does.

I'm pretty sure somewhere in my resume there's something relevant to this conversation, but I can't remember.

Meanwhile, the reason I brought up djbdns is that it has automatic best-practice behavior for a lot of basic DNS config issues, like matching PTR records, or setting up reasonable TTLs and SOA field values. It's right there in the data format.

That's apart from where the dots and semicolons go in the database files. Which yeah is a mistake I managed to make a lot, even after writing a dynamic DNS server for our ISP, and is a problem that djbdns makes go away completely.

Sorry. You're going to have to add this to the list of topics I'm insufferable about. ;)

I'm pretty sure somewhere in my resume there's something relevant to this conversation

Hey, what a coincidence...me too!


I've also been a contributor on an alternative DNS server in the distant past...back when it was fashionable to hate BIND.

Meanwhile, the reason I brought up djbdns is that it has automatic best-practice behavior for a lot of basic DNS config issues, like matching PTR records, or setting up reasonable TTLs and SOA field values. It's right there in the data format.

BIND also has reasonable defaults, and there are tools (Webmin, for example, just to throw something out there completely and utterly at random) that make it easy to get all of the "reasonable default" things you've mentioned right.

You're going to have to add this to the list of topics I'm insufferable about.

OK. But you're still wrong.

Ok... but I didn't just say I was right, I offered a reason why. Why am I wrong about BIND? I'm not as up on BIND 9 as I am with 8 and 4, but last I remembered, BIND didn't automatically match PTR's to A's (one of your examples).

Because I'm not talking about BIND at all. I'm talking about DNS. I don't care what DNS server people are using (we also support PowerDNS, and we also get questions about djbdns, even though we don't support it). These are the mistakes people make, regardless...and it's due to fundamental misunderstandings of how DNS does what it does, and rarely has anything to do with syntax (most of the people I interact with are using Webmin, which hides the configuration file entirely, and so the semi-colons and dots are irrelevant because Webmin always gets them right; and Webmin can also automatically manage PTRs).

Ok, but I'm specifically not talking about the semicolons and dots. I'm talking about the semantic issues you brought up, like matching PTR's to A's, or the fact that "You may omit ttl; tinydns-data will use default cache times, carefully selected to work well in normal situations.", or the fact that you can't forget to bump serial numbers in djbdns files.

I'm not needling you, Joe. I just genuinely don't know what --- apart from picking up the delegation for your domain name --- the common DNS errors are that djbdns doesn't address. I'm not saying there aren't any; I'm just asking you to say what they are.

Also, shouldn't you just go ahead and support djbdns? It always seemed to me like Bernstein went way out of his way to make it easy to drive tinydns from other programs. Do you really get more requests for PowerDNS than djbdns?

I think what SwellJoe is getting at is that different software won't address misunderstandings and assumptions. E.g.: there's a lot of folks out there who aren't aware that they should incrementally drop a record's TTL prior to changing its data so as to control the amount of time the record is in flux. See http://search.twitter.com/search?q=dns+waiting+OR+propagate or similar.

OK, let's get specific:

PTR requires delegation from the authoritative server for the IP address, and this is wholly separate from the registrar process. Many folks don't understand how PTR works vs. how standard A records work. If you don't understand it, you don't know you need to talk to your hosting provider about this delegation.

Many folks also don't understand how the PTR is used in validating sending mail servers. And many folks have a hard time grasping that you only configure one PTR for each IP address (see? I'm talking fundamental misunderstandings of DNS here, and no DNS server or GUI can fix them, though gods know we've tried). Finally, you'd be surprised how many people intentionally configure a PTR that does not resolve in the other direction, thus breaking mail.

Most folks are dealing with many domains on a single IP address. This further confuses folks with regard to the PTR. Which name for the IP? Many folks are baffled. A basic understanding of DNS would resolve this (no pun intended). As I mentioned above, with an understanding of the way it works, most DNS problems are completely transparent.
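The "resolves in the other direction" check is easy to sketch. The function below is a hypothetical helper, not part of any mail server: it looks up the PTR name for an address, resolves that name forward, and reports whether the original address is among the results. It is IPv4 only, and the two resolver parameters exist so it can be tried without touching the network.

```python
import socket


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """True if the PTR name for `ip` resolves back to a set containing `ip`."""
    try:
        name = reverse(ip)[0]          # PTR lookup: address -> name
        addresses = forward(name)[2]   # A lookup: name -> list of addresses
    except OSError:
        return False
    return ip in addresses
```

Note that the forward lookup only has to include the address, not equal it: many names can point at one IP, but the single PTR name you choose has to point back.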

Default TTL doesn't solve a user not understanding that when they change IP addresses, it can take as much as days for the rest of the world to know about those changes (first world will know within hours; third world caches much longer and occasionally ignores TTL). Again, I'm talking about users fundamentally not understanding how DNS works and that it is a heavily cached protocol. If you don't understand those things, having a default TTL doesn't help you when it's time to migrate to a new data center. Webmin also provides default ttl values.

Serial numbers. I have never once mentioned serial numbers in this discussion. I can't remember the last time I've heard someone who had a problem with serial numbers...actually, now that I'm thinking of it, I do remember. A user (I think a recovering djbdns user, actually) who thought they knew more than they really did wanted to argue about RFCs and whether an incrementing number or a datestamp was valid (I don't remember which, but the form he was arguing was invalid was also provably valid, based on examples found within the RFC). But, this is the kind of silly stuff that I'm not talking about. Syntax is irrelevant to my suggestion that people read DNS and BIND. Oh, and Webmin handles serial numbers in BIND, in just about any format that is valid according to the RFC.

Some other issues that I see people make:

Incorrect NS, MX, etc. records. djbdns doesn't babysit the records to make sure they point to working servers. Users occasionally try to use IPs directly in these records. BIND will error on this, as expected, but it still confuses people who don't grasp basic concepts.

Performance. People who don't understand DNS often think they can get out of it by relying on someone else for their DNS service. Like the overworked DNS servers provided free with their registrar or hosting account. On top of this, these are also the same people that make all the dumb mistakes mentioned above, compounding their confusion into a perfect storm of "nothing was working, so I reinstalled my operating system and now nothing works!" stupidity.

Fundamental misunderstandings of the protocol and the way DNS does what it does are what I'm talking about. Some DNS servers are easier to configure than others...I'll concede that. Doesn't matter. When you set out on a journey, you kinda have to have a vague notion of where you're going, or you probably aren't going to get there, even if you have a GPS.

Also, shouldn't you just go ahead and support djbdns?

Why? All the security issues that plagued BIND 4, and to some degree BIND 8, have mostly been long resolved, and BIND 9 is, by far, the world's most popular DNS server. djbdns these days is obscure, at best. You and I remember the time when it was relevant...but I'm involved enough in this aspect of the industry to know that the ship has long since sailed for djbdns; new users are not adopting djbdns on any scale worth talking about, though plenty of cantankerous old-timers keep hanging on. We get maybe one person every six months to ask about djbdns. There are so many things ahead of djbdns in our queue that it will likely never make it to the head.

nginx is our next major endeavor (which gets requested every couple of days, and has real advantages over Apache for some of our real paying customers). nginx is actually a much bigger job than djbdns would be, since DNS is already fully abstracted out of Virtualmin because of the PowerDNS module, while web service with Apache is pretty deeply ingrained. But I can pretty clearly see an upside for us in supporting it that I do not see with djbdns.

Also, the historic license stupidity of djb software has guaranteed that all of it would fall into obscurity. I guess most of it is public domain now, but I just don't see much new interest in their use.

Do you really get more requests for PowerDNS than djbdns?

Yes, we did, but we also got money for PowerDNS support. It wouldn't have happened without it being fully sponsored by a hosting provider that wanted to use PowerDNS. We don't do contract work any more, but we could help you find a developer to add djbdns support, if you'd like to see it in Webmin and Virtualmin. Or, if you know Perl, we'd be happy to assist you with getting up to speed on the module API.

Actually, it looks like there is already a third party Webmin module, of some sort, for djbdns. Though it looks like it hasn't been updated in years, and doesn't have a lot of discussion on the web about it.

Just so you don't think we hate djb, we do fully support qmail, though I happen to prefer Postfix by a large margin (and I doubt we would spend significant time on qmail today...the userbase is a fraction of what it used to be, and we rarely get questions about it).

Anyway, I'm still not really interested in talking about the relative merits of DNS servers. I don't care. They all work acceptably well, at this point, and I rarely see a BIND configuration file (I don't have a problem with BIND configuration files, but why bother? I've got tools for that).

I just wanted people to understand the most fundamental building block of the Internet a little better, and I pointed to the best book on the subject. Complaining about DNS and BIND because it uses BIND configuration files for the examples is like complaining about TAoCP because it uses MIX rather than Python (or whatever) for the examples. There is no better book on the subject of DNS, or if there is, I have never seen it.

This was a great comment. I disagree with a bunch of things in it, but regardless of that, I feel vindicated in this thread for prying it out of you. =)

Serial numbers: I'm not sure if BIND 9 just made this problem go away, but back in the dark ages when I managed BIND for a couple thousand zones, you had to manually bump the serial for every change you made. AXFR relies on the serial in the SOA to decide whether to propagate a change.
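For reference, the manual bump is usually done with the date-based YYYYMMDDnn convention. Here is a minimal sketch of that convention, assuming the serial is handled as a plain integer; `next_serial` is a name I made up, and it ignores 32-bit serial wraparound entirely.

```python
from datetime import date


def next_serial(current, today=None):
    """Return a serial strictly greater than `current`, in YYYYMMDDnn form
    whenever today's date allows it."""
    today = today or date.today()
    base = int(today.strftime("%Y%m%d")) * 100   # first serial of the day
    # If the zone was already edited today, just count up from `current`.
    return max(current + 1, base)


print(next_serial(2009101905, date(2009, 10, 20)))
print(next_serial(2009102000, date(2009, 10, 20)))
```

The only property the secondaries care about is that the number goes up; the date prefix is there for the humans reading the zone file.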

Performance: I think people are way out of whack on the performance implications of DNS. I've spent years hammering DNS servers, and while it's true that BIND 8's terrible memory management will drag down the performance of the rest of a server, the actual request latency BIND adds answering queries is so low that you can use it to statistically detect whether sniffers are running, as a proxy for user->kernel latency (which shoots up when your ethernet device takes the hardware MAC address filters off to go into promiscuous mode). So, for whatever it's worth: I don't buy that there are serious performance problems with server selection.

Supporting djbdns: meh, I was curious, not challenging you. Obviously don't do it if your customers aren't asking you. You're wrong about the security implications of BIND, though.

Thanks again for replying in such detail.

I suspect our respective positions are coming from two very different sides of the industry. You're assuming people, on average, working in web applications are way better informed about the infrastructure of the Internet, or much better at intuiting how a system like DNS might work, than they actually are. I've been supporting non-technical users who are building things on the web for a dozen years now. I'm no longer surprised by the capability of people (even smart people who build cool things) to misapprehend how their systems are talking to the rest of the world and how others are finding them.

Performance:...I don't buy that there are serious performance problems with server selection.

I generally agree. DNS is an incredibly low demand task, and even a modest server can serve millions of queries per day. No argument there.

That said, DNS is a latency cost that echoes through every service. And some free DNS services are notably slower than a server you run yourself would be. Doubling the latency of DNS queries can add measurable latency to a first load (where you might look up a dozen names for images, media content, ads, etc.). People do care about shaving a second off of a page load time.

But, yeah, performance is mostly irrelevant. The bigger problem is just that we see folks using those kinds of services as a substitute for actually understanding DNS. We get a disproportionate number of queries from users using third party DNS services, and they tend to be of the really stupid, has no concept of DNS at all, variety.

You're wrong about the security implications of BIND, though.

I will certainly not argue with you on security questions, since it is not my area, and I have a lot of respect for your opinion on security issues.

But, I was unaware of any exploits in current BIND versions. According to the BIND security advisories page there have been two security advisories this year; one a DoS and the other was actually an issue in OpenSSL. And, most importantly, there have been no root or user-level access exploits. That seems to me to be a pretty good security record.

OpenSSH (which we all trust and consider "secure", I guess?) tends to have about one major security issue per year...so if OpenSSH is considered secure, then it seems fair to consider BIND pretty secure, as well. There are probably "more secure" DNS servers (and djbdns may be one of them), but I'm not really competent to make those kinds of judgements, so I trust my OS vendors to choose reasonable defaults for this kind of thing. And, BIND is the default DNS server on every OS I use. If it really had a poor security history, I would probably be spending time worrying about it, or contributing on an alternative DNS server project, as I did back when BIND did have a poor security record.

What security implications do you consider using BIND to have currently?

Could you name that book? Do you know of any other books people should read for sys-adminning?

There are very few generally useful books about system administration. DNS and BIND just happens to cover a subject that touches everything we do on the Internet; and covers topics that are often poorly understood, and difficult for most folks to get right through intuition and dumb luck. It doesn't hurt that BIND is damned near universal in usage, so odds are extremely high it's the DNS server you use.

The Frisch book is probably a great start for general concepts, though:


It's been ten or more years since I've read it, but it's been updated every few years, and is probably due for a new edition any day now actually. The concepts it covers (backups with standard UNIX tools, for example) are somewhat timeless. It's probably not required reading, though, if you don't actually want to be a system administrator.

Most books are just re-hashes of the documentation for a particular service, like Apache or Postfix or Sendmail or whatever, so I don't really have any strong opinions in that direction. When I had problems with Sendmail in the distant past, I found the O'Reilly book useful, but I've never needed third party docs for Postfix, which I've been using for the past eight years or so. Books about specific software are also often quickly dated by new versions of the software.

So, that's a long-winded way of saying, "Not really."

I'm amazed nobody has mentioned the "Practice of System and Network Administration" book by Limoncelli and Hogan.

Highly recommended.


DNS and Bind (now in its 3rd edition) by Liu, Albitz, and Loukides. It's an O'Reilly book.


Amazon shows it's actually in its 5th edition now:


Good call :) Too late to edit :(

In my experience, one of the most common mistakes is the failure to realise that on pretty much all Linux distros, services like Apache and MySQL come conservatively tuned. This is deliberate; it means a DoS or out-of-control process within one of those domains is unlikely to take out the entire server, because there's a hard limit on consumption of memory, CPU, child processes, threads, etc.

However, this default configuration needs to be tuned to allow you to take advantage of the hardware - if you have generous hardware. Otherwise, you will wonder why your web sites are extremely unresponsive, yet the server load stands at something relatively unimpressive.

I found this out the first time a blog post on one of my servers got digg'd.

I'd guess the real number one mistake is insufficient paranoia about backups.

I know lots of companies doing TDD but that have never done a full test restore from their backups.

The easiest way to do this is to make your backups the mechanism by which you refresh your Dev/QA environment from Production. It means your Ops team are very nearly doing a DR exercise every week.
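To show the shape of such a check, here is a toy version using SQLite as a stand-in for whatever database you actually run (that substitution is my assumption, not something from the article). It takes a backup with the standard `Connection.backup` call, reopens the copy as if it were the Dev/QA environment, and compares row counts. The table, its contents and the function name are invented.

```python
import os
import sqlite3
import tempfile


def backup_and_verify(prod, backup_path, table):
    """Copy `prod` to `backup_path`, reopen the copy, and compare row counts.

    `table` is interpolated into SQL, so it must be a trusted name.
    """
    dest = sqlite3.connect(backup_path)
    prod.backup(dest)                 # take the backup
    dest.close()

    restored = sqlite3.connect(backup_path)   # the "Dev/QA" refresh
    count_sql = f"SELECT COUNT(*) FROM {table}"
    same = (prod.execute(count_sql).fetchone()[0]
            == restored.execute(count_sql).fetchone()[0])
    restored.close()
    return same


prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
prod.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,)])
prod.commit()

ok = backup_and_verify(prod, os.path.join(tempfile.mkdtemp(), "backup.db"), "orders")
print(ok)
```

A row count is the weakest useful check; a real refresh would go on to run the application's own test suite against the restored copy.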

I'd never heard that advice before - sounds like a great idea.

It's a great idea, we were doing that at my old company a few years ago. It also means that you have fewer problems with migrations when pushing them out to production.

This was a great article, but I ended it wondering whether they either (a) knew what a system call was (until the end, I thought maybe they meant a system() shell-out) or (b) realized how many system calls a vanilla request/response cycle incurs.

I disagree with 1.3. "Serving static content is the easiest possible task for any web server." Yes, but keeping connections open for slow clients (esp with KeepAlive on) is not a good use of your 500MB Mongrel process' time. On the other hand, KeepAlive is a handy thing to have.

Using a proxy like nginx or varnish to serve static files (and even dynamic data) if you have the proper KeepAlive and Nagle bits flipped can save you a lot of server resources at the application layer.

It's almost always a bad idea to use anything other than a non-blocking/async server to handle static content.

I think it's simpler/easier (maybe faster) to serve content from a separate sub-domain (static.site.com or whatever). Using a reverse proxy works too, but unless you're caching dynamic content it's probably no benefit and it's less efficient.

A good reverse proxy will buffer client and server side so that your heavy app can be available to serve the next request whilst the light proxy feeds the page back to a slow client.

Under certain circumstances serving static files from separate hostnames can be beneficial as HTTP clients are supposed to limit the number of simultaneous connections per hostname.

The default KeepAliveTimeout setting for Apache is 15 seconds, which is too long. Many of our large customers are setting KeepAliveTimeout to 2 seconds which frees up that apache worker to process new requests fairly quickly. You'd be surprised how many people never change this value from the default.
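A rough way to see why the timeout matters, sketched in Python. It assumes a process-per-connection model where an idle keep-alive connection pins a worker for the full timeout (the worst case); the pool size and the 100 ms service time are hypothetical numbers, not measurements.

```python
def max_new_connections_per_sec(workers, service_secs, keepalive_secs):
    """Upper bound on new client connections a fixed worker pool can accept
    per second, assuming each connection holds one worker for the request
    plus the entire idle keep-alive timeout (worst case)."""
    return workers / (service_secs + keepalive_secs)


workers = 256    # hypothetical pool size
service = 0.1    # hypothetical 100 ms spent actually serving the request

for timeout in (15, 2):
    rate = max_new_connections_per_sec(workers, service, timeout)
    print(f"KeepAliveTimeout {timeout:>2}s -> ~{rate:.0f} new connections/sec")
```

Under those assumptions the pool tops out at roughly 17 new connections a second with a 15 second timeout and roughly 122 with a 2 second one, which is why the idle time dominates long before CPU does.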

Apache disables Nagle by default, which is what you want for small static files, but I'd love to see data showing that Nagle is actually a significant performance issue for a realistic load.

You're right about Nagle; I mention it only because lighty or one of the others does not turn it off by default.

Having a lightweight proxy that keeps connections alive on the client end but cuts them off between themselves and the application layer is the bigger win all round for many real-world web loads.

Yep, #1 happened to me the other day. We hit our Apache server limit of 256 and the site slowed to a crawl. I'm not really sure what was causing the load to be like 50-90, but requests were quite delayed waiting for an open process (keepalive was at 5 secs).

Indeed, my first idea was to install nginx for images really quick. However, I have no experience with nginx. Thankfully, we had a spare server and I offloaded the images to there for now... Throwing more hardware at the problem usually works.


However, sqlite should never be used in production. It is important to remember that sqlite is single flat file, which means any operation requires a global lock

I don't know jack about sqlite's locking architecture or scalability, but this statement is just silly. There are a conceptually infinite number of ways to make fine-grained locking work on a single file, whether within a single process, on a single host, or across a network. Maybe the author is thinking fcntl() locking is somehow the only option.

I guess the corollary to this article has to be "Don't let your startup's sysadmins diagnose development-side issues."

SQLite locking: http://www.sqlite.org/lockingv3.html

""" An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held. """

But compared to something like MySQL w/ InnoDB (or postgres, or Cassandra, or BerkeleyDB), which all have something closer to Row Level or Page Level locking, SQLite's concurrency for server side applications is a serious deficiency.

Yes, there are lots of ways to have fine grained locking, SQLite just doesn't do them.

Like many of SQLite's other quirks, this is because SQLite is designed to accommodate embedded usage.

I guess the corollary to this article has to be "Don't let your startup's sysadmins diagnose development-side issues."

You'll have to add "Make sure your developers can diagnose development-side issues" to the list as well. Most web app developers I have met do not know how to diagnose problems, or simply defer immediately to the sysadmins if there's no syntax errors or logs to refer to.

As someone else mentioned, Sqlite does indeed use a lock per database file. But of course that is not an argument for never using it in production; it is another argument for not letting sysadmins diagnose development-side issues. There are plenty of production-type scenarios where Sqlite's locking is perfectly fine (but scenarios that are write-heavy enough to cause lock contention are not among them). If you use sqlite you do need to understand the locking and what workloads it is unsuitable for, especially since some bindings will give an error rather than wait for the lock.

I read an interesting performance comparison a while back (which I can't find now) which had some surprising results for sqlite with concurrent access. As I recall, it turned out that it was much faster to close the db connection and open it again for each operation than to keep a connection open and rely on the file locking to mediate access.

I'm getting 10k uniques a month and SQLite is working fantastically, as the db behind one Sinatra process, behind one Thin.

I'm getting 10k uniques a month

I think I see your problem.

Also known as 10-15 hits an hour.

My microwave can probably serve that traffic.

"any operation requires a global lock" is incorrect. Only inserts/changes lock the database; there can be multiple selects: http://www.sqlite.org/faq.html#q5
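The reader/writer behaviour is easy to poke at with Python's built-in sqlite3 module. This is only a sketch, assuming the default rollback-journal mode and a throwaway database in a temp directory; the table and connection names are invented for the demo.

```python
import os
import sqlite3
import tempfile


def demo_sqlite_locking():
    """Show that a pending write lets a reader in but refuses a second writer."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE t (x INTEGER)")
    setup.execute("INSERT INTO t VALUES (1)")
    setup.commit()
    setup.close()

    def connect():
        # timeout=0: fail immediately instead of waiting for a lock.
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        return sqlite3.connect(path, timeout=0, isolation_level=None)

    writer, second_writer, reader = connect(), connect(), connect()

    writer.execute("BEGIN IMMEDIATE")           # take the write (RESERVED) lock
    writer.execute("INSERT INTO t VALUES (2)")  # pending, not yet committed

    # A reader is still allowed in and sees only committed data.
    rows_seen_by_reader = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

    # A second writer is refused while the first one holds the write lock.
    try:
        second_writer.execute("BEGIN IMMEDIATE")
        second_writer_blocked = False
    except sqlite3.OperationalError:  # typically "database is locked"
        second_writer_blocked = True

    writer.execute("COMMIT")

    # Once the first writer has committed, the second can go ahead.
    second_writer.execute("BEGIN IMMEDIATE")
    second_writer.execute("INSERT INTO t VALUES (3)")
    second_writer.execute("COMMIT")
    final_rows = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

    for conn in (writer, second_writer, reader):
        conn.close()
    return rows_seen_by_reader, second_writer_blocked, final_rows


print(demo_sqlite_locking())
```

With timeout=0 the second writer gets an error straight away, which is the "some bindings give an error rather than wait" behaviour; a non-zero timeout makes it wait and retry instead.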

How are the last two system administration problems?

This seems like an odd section of sysadmin mistakes - I would have thought there are some other ones being made more often.

Especially since the third one is a developer mistake that, as a sys admin and developer, I've had to point out to developers not to do -- but for security reasons, not because fork is oh-so-super expensive (even though it can be).

Also, there is no "system" system call. "system" is a library call that forks and execs a shell to evaluate and execute a string. Having a sys admin that doesn't know the difference may be the biggest sys admin mistake you could make. There are a lot of library wrappers for system calls, but these are documented in section 2 of the man pages as system calls.
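For what it's worth, the difference is easy to show in Python, where os.system() wraps the same library call. The sketch below is illustrative only (the helper names are made up): one helper does the rename with a direct system call and no child process at all, the other still forks and execs but hands the program an argument list so no shell ever parses the string.

```python
import os
import subprocess
import sys
import tempfile


def move_in_process(src, dst):
    """Rename a file via the rename() system call: no fork, no shell.
    Only works when src and dst are on the same filesystem."""
    os.replace(src, dst)


def run_without_shell(argv):
    """Run an external program from an argument list.

    This still forks and execs, but no shell parses the arguments, so
    characters like ';' or '$' are passed through literally.
    """
    done = subprocess.run(argv, check=True, capture_output=True, text=True)
    return done.stdout


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "foo")
    dst = os.path.join(workdir, "bar")
    open(src, "w").close()

    move_in_process(src, dst)            # instead of os.system("mv foo bar")
    print(os.path.exists(dst))

    # The hostile-looking argument arrives as one plain string.
    echo_arg = "import sys; print(sys.argv[1])"
    print(run_without_shell([sys.executable, "-c", echo_arg, "a; rm -rf b"]))
```

The second helper is the one that matters for the security point: nothing in the argument is ever interpreted as shell syntax.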

I'd say their biggest mistake is usually not hiring a sysadmin who also has development experience (or developers without sysadmin experience). I've found that my knowledge in both realms has been invaluable in determining how to design the infrastructure and how to write the code.

If you fork inside an app server, such as mod_python, you will fork the entire parent process (apache!). This could happen by calling something like os.system("mv foo bar") from a python application.

I nominate this post as the most distressingly important bit of information I've ever received at 2:43 AM in the morning.

Now the question: what can I do in Ruby to avoid the four calls a second or so I'm currently making to system(big_command_to_invoke_imagemagick) ?

Four forks per second is basically nothing. This article is blowing it all out of proportion. You can't sustain forking per web request on a really large site but at this scale it's not going to matter.

The author is being stupid: the size of the process that you're forking doesn't really matter (it might start to matter if you didn't call exec() or exit() right after you forked, but that's not the case: you're just execing another program, which replaces the current process in memory). VERY little is copied; fork is defined to have copy-on-write semantics for the process's address space.

I'd use something like DelayedJob and send_later the call to your image processing stuff, that way the forking happens out of the request path, at least.

You just described my exact setup. However, my understanding is that Delayed::Job's worker threads have a full Rails environment in them, and if this blog post is correct and I am indeed forking that entire Rails process for every call out to ImageMagick, my vague recollections of what a fork entails suggest to me that the Ghosts of C Programmers Past are going to visit a terrible vengeance upon me.

The fork+exec is efficient. The blog post compares things without units. Forks (principally page table copies w/copy-on-write in effect) are measured in microseconds and the exec is your standard binary startup time. While you don't want to put a synchronous fork/exec in the way of 5,000 reqs/sec, it will be a trivial part of your asynchronous imagemagick processing.

At scale, you might care about the imagemagick startup latency, but not the forking.
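If you would rather measure than argue, something like this Python sketch puts a unit on it. It is POSIX-only, times fork plus an immediate child exit plus the parent's wait, and the 64 MB of ballast is an arbitrary stand-in for a "big" app server process; the numbers will vary by machine, so none are quoted here.

```python
import os
import time


def average_fork_seconds(iterations=50):
    """Average wall-clock time for fork() + child exit + waitpid() (POSIX only)."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)         # child: exit at once without touching memory
        os.waitpid(pid, 0)      # parent: reap the child
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    small = average_fork_seconds()
    ballast = b"x" * (64 * 1024 * 1024)   # touch ~64 MB so the parent is "big"
    big = average_fork_seconds()
    print(f"small parent: {small * 1e6:.0f} microseconds per fork+wait")
    print(f"big parent:   {big * 1e6:.0f} microseconds per fork+wait")
```

Because the child exits without writing to memory, copy-on-write means the ballast pages themselves are never copied; only the page tables are.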

Only if you run out of memory. But with DJ at least you should be forking only one call at a time, rather than multiple, like you might from the controller itself. So although you'll end up using more memory, it'll only be one extra rails process, not 4.

May not still be ideal... interested to hear other people's ideas.

Use a queue. You should never be doing time consuming method calls inside a controller anyway.

Yes, yes, yes. Beanstalkd is an easy one to set up, for example, with good Rails integration.
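The shape of it, as a minimal in-process Python sketch. A real setup would use an out-of-process queue such as the ones mentioned above; the function names and the fake resize step here are purely illustrative.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def slow_resize(image_name):
    """Stand-in for the expensive work (e.g. shelling out to ImageMagick)."""
    return f"thumb_{image_name}"


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        image_name = jobs.get()
        if image_name is None:
            jobs.task_done()
            break
        results[image_name] = slow_resize(image_name)
        jobs.task_done()


def handle_upload(image_name):
    """What the controller does: enqueue the job and return immediately."""
    jobs.put(image_name)
    return "accepted"


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    print(handle_upload("cat.png"))   # returns without waiting for the resize
    jobs.join()                       # only the demo waits; a controller would not
    print(results)
    jobs.put(None)                    # tell the worker to stop
    thread.join()
```

The point is just that the request path does a put() and nothing else; the slow call happens wherever the worker lives.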

The solution is to use an image processing library such as RMagick, http://rmagick.rubyforge.org/

Calling into RMagick/ImageMagick from inside the request/response cycle is probably even worse than shelling out, because ImageMagick does grievous damage to your runtime.

I guess it all depends how you design it and what you are doing. I would have to agree with others, the out of request cycle image processing solutions are definitely the right way to go overall.

Last I tried RMagick, it leaked significantly. Definitely not something I want to use in a long-lived process. I remember having to fork to use it, to work around the memory leak. If you don't need the fancier operations, there are lighter image manipulation gems out there that do just the basics, but without leaking. e.g. ImageScience http://seattlerb.rubyforge.org/ImageScience.html

hmm, I have a hard time understanding why anyone would try to use sqlite in production unless they explicitly wanted to?

Simpler to use (no external program to start and monitor) and to backup (just copy the sqlite file)

Personally I don't think we would ever run into those issues. A. We don't have other servers to switch over to, B. we are using MySQL for testing and development, and C. we don't like what happens when we make system calls from within a web app. Forget about forking.

Take #1 and generalize it to the mistake of trying to fix a problem without really understanding what the problem is. This has to be the most common mistake I've seen in the sysadmin world.

Is this an example of the knowledge level of a modern sysadmin? If so, we're in trouble. =)

A sysadmin should be able to think in terms of data flows, which means memory management, data partitioning, and network stack usage, be able to put different types of data into different kinds of storage, and understand the role of cache and how data should be accessed.

Packages are just tools.
