If you're mirroring, you should only ever mirror a "fresh, unproven" disk with an "old stalwart" disk. Doing that also means that when your "old stalwart" becomes senile, it's paired with a younger disk.
But doing that does mean rotating mirror sets when you buy a new tranche of disks. Which does put load on the drives, which can trigger failure.
Anyone have a good plan for doing this?
Or, I believe that on some RAID controllers it is possible to add a third disk to a RAID1 pair, which means you could build the new disk before removing either of the old ones, thus there is never a single point of failure even during the disk replacement operation.
Telephone switching systems have functionally zero downtime. They're designed to be fully modular, entirely hot swappable, the kind of thing Erlang was built for and give Z-series a run for their money. So you have this hulking room-sized brute of a telephone switch which cannot ever fail and if it does so must always do so gracefully and with plenty of warning.
And one day it just falls over. No warning, no graceful failover to a redundant system, just poof gone. After much wailing and gnashing of teeth the root cause is identified: n drives were capable of failing without issue, n+1 failed simultaneously.
At this point, stories differ: this was either the beginning of a "no two drives from the same manufacturer" policy or the end of the career of a PHB who vetoed said policy on grounds of excessive cost.
Drive A fails. You pop it out, put in a new disk. Machine starts rebuilding RAID array. This involves a ton of reads from other disks. Under the increased strain, Drive B, which was already on the edge, also fails.
And so on.
I'm no longer a believer in redundant disks. Redundant machines would be better.
RAID5 by chance?
"RAID5 by chance?" How do you know they didn't use it? They might have had multiple failures too close together to rebuild the array.
Sorry for the caps, but I just snapped seeing this comment for the nth time and with so many upvotes.
Would it make sense to use drives from more than one company, as they would have very different failure characteristics?
What quality OEMs do is make sure they never ship you drives from the same manufacturing batch in one enclosure.
DeskStar is right. They earned the nickname "DeathStar" because they failed so often.
This is probably difficult to take advantage of unless you set up your RAID layout very carefully.
However, forking a new shell to parse "mv foo bar" is more expensive than just using the rename system call. And it's easier to check for errors, and so on.
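The difference is easy to see in any language that exposes the raw call; here's a minimal Python sketch (file names invented) contrasting spawning `mv` with calling rename directly:

```python
import os
import subprocess
import tempfile

# Scratch directory and file to move around.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "foo")
dst = os.path.join(tmpdir, "bar")
open(src, "w").close()

# Spawning mv: a whole fork+exec of an external program just to
# rename one file (os.system("mv foo bar") adds a shell on top).
subprocess.run(["mv", src, dst], check=True)

# Direct approach: one rename(2) system call, and failures surface
# immediately as an OSError instead of a shell exit status.
os.rename(dst, src)
```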
SQLite is also not as slow as people think it is; you can easily handle 10s of millions of requests per day with it. If your application's semantics require table locks, MySQL and Postgres are not going to magically eliminate competition for locks. It's just that they both pick very weak locking levels by default. (They run fast, but make it easy to corrupt your data. Incidentally, I think they do this not for speed, but so that transactions never abort. Apparently that scares people, even though it's the whole point of transactions. </rant>.)
Most of my production apps are SQLite or BerkeleyDB, and they perform great. I am not Google, however.
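A minimal sketch of the kind of workload SQLite handles comfortably, using Python's bundled sqlite3 module (table and column names invented). The one thing that matters for write speed is batching inserts into a transaction, since SQLite syncs per commit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a filename for an on-disk DB
conn.execute("CREATE TABLE hits (path TEXT, ts INTEGER)")

# Ten thousand inserts in a single transaction -- committing each
# insert separately is what makes naive SQLite benchmarks look slow.
with conn:
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        (("/index", i) for i in range(10_000)),
    )

count = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```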
There are a few other lock types you'll run into as well, but they're typically only seen in narrow, specific cases -- when you're performing maintenance (vacuuming, clustering, &c), indexing, DDL changes, or when you explicitly LOCK a table for whatever reason.
Scaling relational databases to the point of 10s of millions of requests is extremely non-trivial. Unless you can show me personally or can show me evidence otherwise, don't make this claim. You're doing a disservice to the people that have worked countless hours to eke every last millisecond of performance out of MySQL and Postgres.
I added another thread writing as quickly as possible to the mix (7 readers, 1 writer), and this brought the read rate down to about 45 reads per second per thread. Still more than 10 million per day, so technically I am right.
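The arithmetic behind that claim checks out:

```python
# 7 reader threads at ~45 reads/sec each, sustained over a day.
reads_per_day = 45 * 7 * 86_400
print(reads_per_day)  # 27216000 -- comfortably over 10 million
```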
Also, I don't doubt that MySQL and Postgres (and BDB) are both significantly faster than this. It's just that SQLite is not going to guarantee "instant failure" of your project, as the article implies.
(One thing to note -- every time you type a character in Firefox's address bar, you are doing an SQLite query. It is Fast Enough for many, many applications.)
Your peak load/sec is never going to be close to your average load/sec. I've seen it be between 2x and 5x the average load.
Read performance on simple selects is straightforward to scale. Most solutions to that problem just put the data in memory, via the database cache or memcache.
Part of the difficulty is in scaling writes. What if your 1000 queries/sec are all inserts on the same 500M row table? What if they are updates on a table that's 10M rows long? This is when you have to make hard decisions about sharding and the like.
I certainly believe that SQLite is "fast enough" for many applications. I also know that many applications are NOT doing 10s of millions of requests a day.
I know nothing of SQLite's internals, but wouldn't it make more sense to parse the query once and then store a compiled version of the query for subsequent lookups? Like you might do with a regexp?
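SQLite does support exactly this (prepared statements), and Python's sqlite3 wrapper keeps a per-connection cache of compiled statements, so reusing the same parameterized SQL string gets you the regexp-style reuse automatically. A small sketch (schema invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

# Same SQL text every iteration -> the compiled statement is reused
# from the connection's statement cache; only the bound value changes.
for uid in (1, 2):
    row = conn.execute("SELECT name FROM users WHERE id = ?",
                       (uid,)).fetchone()
```

The design point is the same as with regexps: compile once, bind different parameters many times, and never splice user data into the SQL string itself.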
Also, SQLite, unlike most other databases, is an embedded database which does everything in-process rather than talking to a separate server process.
Simply not true. That level of database activity is bread-and-butter for many people these days. Hundreds of millions, non-trivial in the sense that it will cost you a pile of money, but these days, basically straightforward, especially if you've done it before. Thousands of millions, now that's where things get interesting.
Oh, and he's talking about requests. The trading systems I've worked on can and do handle hundreds of millions of commits per day.
The forking thing is more of the same. Copy-on-write means it's not going to balloon your memory unless some function turns that shared rss into private. It isn't something that you want to do a lot of, though.
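A POSIX-only sketch of those copy-on-write semantics: the child's write copies only the pages it touches, and the parent's view is untouched, no matter how big the heap is.

```python
import os

data = list(range(1_000_000))  # a sizeable parent heap, shared at fork

pid = os.fork()
if pid == 0:
    # Child: this write copies only the affected pages; the parent
    # never sees the change.
    data[0] = -1
    os._exit(0)

os.waitpid(pid, 0)
parent_view = data[0]  # still 0 in the parent
```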
What you don't want is to have anything you use more than once a minute in swap, and preferably only the stuff you don't plan on using for an hour (i.e. not any time soon). That probably means you want your main application and web server in memory all the time. If there are pieces of it that are unused and you're hitting a resource cap then you have something mis-configured.
RAM is also dirt cheap right now, making it often easier to add RAM than to optimize slightly sloppy code.
Once you grasp the fundamentals, most DNS problems become completely transparent, but I've seen people spend weeks trying to solve DNS problems due to lack of understanding.
Tools like dnstracer (http://www.mavetju.org/unix/dnstracer.php) and dig are very useful for diagnosing these issues, but you really need to understand the fundamentals of how DNS and delegation work.
Whenever I hear, "X is slow!", my first response tends to be, "Are you sure it's not DNS instead of X?" About 50% of the time, DNS misconfiguration is at least a component of the problem if not the entirety of it.
DNS touches every service on the Internet. If you get it wrong, you break every service, sometimes in subtle ways.
djbdns doesn't make the person doing the configuration understand those things any better than BIND does.
Meanwhile, the reason I brought up djbdns is that it has automatic best-practice behavior for a lot of basic DNS config issues, like matching PTR records, or setting up reasonable TTLs and SOA field values. It's right there in the data format.
That's apart from where the dots and semicolons go in the database files. Which yeah is a mistake I managed to make a lot, even after writing a dynamic DNS server for our ISP, and is a problem that djbdns makes go away completely.
Sorry. You're going to have to add this to the list of topics I'm insufferable about. ;)
Hey, what a coincidence...me too!
I've also been a contributor on an alternative DNS server in the distant past...back when it was fashionable to hate BIND.
BIND also has reasonable defaults, and there are tools (Webmin, for example, just to throw something out there completely and utterly at random) that make it easy to get all of the "reasonable default" things you've mentioned right.
You're going to have to add this to the list of topics I'm insufferable about.
OK. But you're still wrong.
I'm not needling you, Joe. I just genuinely don't know what --- apart from picking up the delegation for your domain name --- the common DNS errors are that djbdns doesn't address. I'm not saying there aren't any; I'm just asking you to say what they are.
Also, shouldn't you just go ahead and support djbdns? It always seemed to me like Bernstein went way out of his way to make it easy to drive tinydns from other programs. Do you really get more requests for PowerDNS than djbdns?
PTR requires delegation from the authoritative server for the IP address, and this is wholly separate from the registrar process. Many folks don't understand how PTR works vs. how standard A records work. If you don't understand it, you don't know you need to talk to your hosting provider about this delegation.
Many folks also don't understand how the PTR is used in validating sending mail servers. And many folks have a hard time grasping that you only configure one PTR for each IP address (see? I'm talking fundamental misunderstandings of DNS here, and no DNS server or GUI can fix them, though gods know we've tried). Finally, you'd be surprised how many people intentionally configure a PTR that does not resolve in the other direction, thus breaking mail.
Most folks are dealing with many domains on a single IP address. This further confuses folks with regard to the PTR. Which name for the IP? Many folks are baffled. A basic understanding of DNS would resolve this (no pun intended). As I mentioned above, with an understanding of the way it works, most DNS problems are completely transparent.
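The mechanics underlying all of this: a PTR query reverses the octets and asks under in-addr.arpa, which is why delegation follows the IP block (and hence your hosting provider) rather than your domain's registrar. Python's stdlib can show the name that actually gets queried:

```python
import ipaddress

# The name looked up for a PTR query: octets reversed, rooted under
# in-addr.arpa -- so the zone belongs to whoever holds the address
# block, not to whoever holds your domain.
name = ipaddress.ip_address("192.0.2.1").reverse_pointer
print(name)  # 1.2.0.192.in-addr.arpa
```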
Default TTL doesn't solve a user not understanding that when they change IP addresses, it can take as much as days for the rest of the world to know about those changes (first world will know within hours; third world caches much longer and occasionally ignores TTL). Again, I'm talking about users fundamentally not understanding how DNS works and that it is a heavily cached protocol. If you don't understand those things, having a default TTL doesn't help you when it's time to migrate to a new data center. Webmin also provides default ttl values.
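A toy sketch of why changes propagate slowly: every resolver that has cached your record keeps serving the old answer until its own TTL clock runs out, regardless of what your authoritative server now says (all names and addresses here are invented):

```python
import time

class ToyResolverCache:
    """Caricature of a caching resolver: answers stay live until expiry."""

    def __init__(self):
        self._cache = {}  # name -> (address, expires_at)

    def lookup(self, name, authoritative, ttl=3600):
        entry = self._cache.get(name)
        if entry and entry[1] > time.monotonic():
            return entry[0]                      # cached, possibly stale
        addr = authoritative[name]               # "query" upstream
        self._cache[name] = (addr, time.monotonic() + ttl)
        return addr

zone = {"www.example.com": "198.51.100.1"}
cache = ToyResolverCache()
old = cache.lookup("www.example.com", zone)

zone["www.example.com"] = "203.0.113.9"        # you move data centers...
stale = cache.lookup("www.example.com", zone)  # ...but the cache answers
```

Lowering the TTL well before a planned migration shrinks this window; doing it after the move is too late, because the old record was cached with the old TTL.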
Serial numbers. I have never once mentioned serial numbers in this discussion. I can't remember the last time I've heard someone who had a problem with serial numbers...actually, now that I'm thinking of it, I do remember. A user (I think a recovering djbdns user, actually) who thought they knew more than they really did wanted to argue about RFCs and whether an incrementing number or a datestamp was valid (I don't remember which, but the form he was arguing was invalid was also provably valid, based on examples found within the RFC). But, this is the kind of silly stuff that I'm not talking about. Syntax is irrelevant to my suggestion that people read DNS and BIND. Oh, and Webmin handles serial numbers in BIND, in just about any format that is valid according to the RFC.
Some other mistakes I see people make:
Incorrect NS, MX, etc. records. djbdns doesn't babysit the records to make sure they point to working servers. Users occasionally try to use IPs directly in these records. BIND will error on this, as expected, but it still confuses people who don't grasp basic concepts.
Performance. People who don't understand DNS often think they can get out of it by relying on someone else for their DNS service. Like the overworked DNS servers provided free with their registrar or hosting account. On top of this, these are also the same people that make all the dumb mistakes mentioned above, compounding their confusion into a perfect storm of "nothing was working, so I reinstalled my operating system and now nothing works!" stupidity.
Fundamental misunderstandings of the protocol and the way DNS does what it does are what I'm talking about. Some DNS servers are easier to configure than others...I'll concede that. Doesn't matter. When you set out on a journey, you kinda have to have a vague notion of where you're going, or you probably aren't going to get there, even if you have a GPS.
Also, shouldn't you just go ahead and support djbdns?
Why? All the security issues that plagued BIND 4, and to some degree BIND 8, have mostly been long resolved, and BIND 9 is, by far, the world's most popular DNS server. djbdns these days is obscure, at best. You and I remember the time when it was relevant...but I'm involved enough in this aspect of the industry to know that the ship has long since sailed for djbdns; new users are not adopting djbdns on any scale worth talking about, though plenty of cantankerous old-timers keep hanging on. We get maybe one person every six months to ask about djbdns. There are so many things ahead of djbdns in our queue that it will likely never make it to the head.
nginx is our next major endeavor (which gets requested every couple of days, and has real advantages over Apache for some of our real paying customers). nginx is actually a much bigger job than djbdns would be, since DNS is already fully abstracted out of Virtualmin because of the PowerDNS module, while web service with Apache is pretty deeply ingrained. But I can pretty clearly see an upside for us in supporting it that I do not see with djbdns.
Also, the historic license stupidity of djb software has guaranteed that all of it would fall into obscurity. I guess most of it is public domain now, but I just don't see much new interest in their use.
Do you really get more requests for PowerDNS than djbdns?
Yes, we did, but we also got money for PowerDNS support. It wouldn't have happened without it being fully sponsored by a hosting provider that wanted to use PowerDNS. We don't do contract work any more, but we could help you find a developer to add djbdns support, if you'd like to see it in Webmin and Virtualmin. Or, if you know Perl, we'd be happy to assist you with getting up to speed on the module API.
Actually, it looks like there is already a third party Webmin module, of some sort, for djbdns. Though it looks like it hasn't been updated in years, and doesn't have a lot of discussion on the web about it.
Just so you don't think we hate djb, we do fully support qmail, though I happen to prefer Postfix by a large margin (and I doubt we would spend significant time on qmail today...the userbase is a fraction of what it used to be, and we rarely get questions about it).
Anyway, I'm still not really interested in talking about the relative merits of DNS servers. I don't care. They all work acceptably well, at this point, and I rarely see a BIND configuration file (I don't have a problem with BIND configuration files, but why bother? I've got tools for that).
I just wanted people to understand the most fundamental building block of the Internet a little better, and I pointed to the best book on the subject. Complaining about DNS and BIND because it uses BIND configuration files for the examples is like complaining about TAoCP because it uses MIX rather than Python (or whatever) for the examples. There is no better book on the subject of DNS, or if there is, I have never seen it.
Serial numbers: I'm not sure if BIND 9 just made this problem go away, but back in the dark ages when I managed BIND for a couple thousand zones, you had to manually bump the serial for every change you made. AXFR relies on the serial in the SOA to decide whether to propagate a change.
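Both conventions from that argument are easy to generate mechanically, which is how most tooling solved the bumping problem. A sketch of the popular datestamp form (YYYYMMDDnn), where the only invariant AXFR cares about is that the serial grows (`bump_serial` is a hypothetical helper):

```python
from datetime import date

def bump_serial(old_serial: int, today: date) -> int:
    """Return the next YYYYMMDDnn serial so secondaries see an increase."""
    base = int(today.strftime("%Y%m%d")) * 100  # e.g. 2009071600
    return base if base > old_serial else old_serial + 1

s1 = bump_serial(2009071501, date(2009, 7, 16))  # new day: date rolls over
s2 = bump_serial(s1, date(2009, 7, 16))          # same day: counter bumps
```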
Performance: I think people are way out of whack on the performance implications of DNS. I've spent years hammering DNS servers, and while it's true that BIND 8's terrible memory management will drag down the performance of the rest of a server, the actual request latency BIND adds answering queries is so low that you can use it to statistically detect whether sniffers are running, as a proxy for user->kernel latency (which shoots up when your ethernet device takes the hardware MAC address filters off to go into promiscuous mode). So, for whatever it's worth: I don't buy that there are serious performance problems with server selection.
Supporting djbdns: meh, I was curious, not challenging you. Obviously don't do it if your customers aren't asking you. You're wrong about the security implications of BIND, though.
Thanks again for replying in such detail.
Performance:...I don't buy that there are serious performance problems with server selection.
I generally agree. DNS is an incredibly low demand task, and even a modest server can serve millions of queries per day. No argument there.
That said, DNS is a latency cost that echoes through every service. And some free DNS services are notably slower than a server you run yourself would be. Doubling the latency of DNS queries can add measurable latency to a first load (where you might lookup a dozen names for images, media content, ads, etc.). People do care about shaving a second off of a page load time.
But, yeah, performance is mostly irrelevant. The bigger problem is just that we see folks using those kinds of services as a substitute for actually understanding DNS. We get a disproportionate number of queries from users using third party DNS services, and they tend to be of the really stupid, has no concept of DNS at all, variety.
You're wrong about the security implications of BIND, though.
I will certainly not argue with you on security questions, since it is not my area, and I have a lot of respect for your opinion on security issues.
But, I was unaware of any exploits in current BIND versions. According to the BIND security advisories page there have been two security advisories this year; one a DoS and the other was actually an issue in OpenSSL. And, most importantly, there have been no root or user-level access exploits. That seems to me to be a pretty good security record.
OpenSSH (which we all trust and consider "secure", I guess?) tends to have about one major security issue per year...so if OpenSSH is considered secure, then it seems fair to consider BIND pretty secure, as well. There are probably "more secure" DNS servers (and djbdns may be one of them), but I'm not really competent to make those kinds of judgements, so I trust my OS vendors to choose reasonable defaults for this kind of thing. And, BIND is the default DNS server on every OS I use. If it really had a poor security history, I would probably be spending time worrying about it, or contributing on an alternative DNS server project, as I did back when BIND did have a poor security record.
What security implications do you consider using BIND to have currently?
The Frisch book is probably a great start for general concepts, though:
It's been ten or more years since I've read it, but it's been updated every few years, and is probably due for a new edition any day now actually. The concepts it covers (backups with standard UNIX tools, for example) are somewhat timeless. It's probably not required reading, though, if you don't actually want to be a system administrator.
Most books are just re-hashes of the documentation for a particular service, like Apache or Postfix or Sendmail or whatever, so I don't really have any strong opinions in that direction. When I had problems with Sendmail in the distant past, I found the O'Reilly book useful, but I've never needed third party docs for Postfix, which I've been using for the past eight years or so. Books about specific software are also often quickly dated by new versions of the software.
So, that's a long-winded way of saying, "Not really."
However, this default configuration needs to be tuned to allow you to take advantage of the hardware - if you have generous hardware. Otherwise, you will wonder why your web sites are extremely unresponsive, yet the server load stands at something relatively unimpressive.
I found this out the first time a blog post on one of my servers got digg'd.
I know lots of companies doing TDD but that have never done a full test restore from their backups.
Using a proxy like nginx or varnish to serve static files (and even dynamic data) if you have the proper KeepAlive and Nagle bits flipped can save you a lot of server resources at the application layer.
I think it's simpler/easier (maybe faster) to serve content from a separate sub-domain (static.site.com or whatever). Using a reverse proxy works too, but unless you're caching dynamic content there's probably no benefit and it's less efficient.
Under certain circumstances serving static files from separate hostnames can be beneficial as HTTP clients are supposed to limit the number of simultaneous connections per hostname.
Having a lightweight proxy that keeps connections alive on the client end but cuts them off between themselves and the application layer is the bigger win all round for many real-world web loads.
Indeed, my first idea was to install nginx for images really quick. However, I have no experience with nginx. Thankfully, we had a spare server and I offloaded the images to there for now... Throwing more hardware at the problem usually works.
However, sqlite should never be used in production. It is important to remember that sqlite is a single flat file, which means any operation requires a global lock.
I don't know jack about sqlite's locking architecture or scalability, but this statement is just silly. There are a conceptually infinite number of ways to make fine-grained locking work on a single file, both within a single process, a single host, or across a network. Maybe the author is thinking fcntl() locking is somehow the only option.
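And even fcntl() locking is byte-range, not whole-file, so the "classic" mechanism already supports fine-grained locks within one file. A POSIX-only sketch locking two disjoint "pages" of the same file independently:

```python
import fcntl
import tempfile

tmp = tempfile.NamedTemporaryFile()
tmp.truncate(8192)
fd = tmp.fileno()

# Two non-overlapping byte ranges of one file, each locked on its
# own -- e.g. one lock per page, rather than one lock per database.
fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 0)     # bytes 0..4095
fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 4096)  # bytes 4096..8191
locked = True
```

(fcntl locks are per-process, so a real demo of contention needs two processes; this only shows that the granularity exists.)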
I guess the corollary to this article has to be "Don't let your startup's sysadmins diagnose development-side issues."
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
But compared to something like MySQL w/ InnoDB (or postgres, or Cassandra, or BerkeleyDB), which all have something closer to Row Level or Page Level locking, SQLite's concurrency for server side applications is a serious deficiency.
Yes, there are lots of ways to have fine grained locking, SQLite just doesn't do them.
You'll have to add "Make sure your developers can diagnose development-side issues" to the list as well. Most web app developers I have met do not know how to diagnose problems, or simply defer immediately to the sysadmins if there are no syntax errors or logs to refer to.
I think I see your problem.
My microwave can probably serve that traffic.
Also, there is no "system" system call. "system" is a library call that forks and execs a shell to evaluate and execute a string. Having a sys admin that doesn't know the difference may be the biggest sys admin mistake you could make. There are a lot of library wrappers for system calls, but these are documented in section 2 of the man pages as system calls.
I nominate this post as the most distressingly important bit of information I've ever received at 2:43 AM in the morning.
Now the question: what can I do in Ruby to avoid the four calls a second or so I'm currently making to system(big_command_to_invoke_imagemagick) ?
The author is being stupid: the size of the process that you're forking doesn't really matter (it might start to matter if you didn't call exec() or exit() right after you forked, but that's not the case: you're just execing another program, which replaces the current process in memory). VERY little is copied; fork is defined to have copy-on-write semantics for the process's address space.
At scale, you might care about the imagemagick startup latency, but not the forking.
May not still be ideal... interested to hear other people's ideas.
Sysadmins should be able to think in terms of data flows, which means memory management, data partitioning, and network stack usage; they should be able to put different types of data into different kinds of storage, and understand the role of caching and how data should be accessed.
Packages are just tools.