While generally good advice, sometimes it does make sense to put binary images in the database:
At Loopt, long before maps were included on mobile platforms, we served our own map tiles. These 64x64 pixel PNGs were often less than 100 bytes. Even though we only covered the US at the time, there were upwards of 20 million of these tiny files.
They caused file system destruction. We tried ext3, reiserfs, ufs, ntfs - they all choked. In the best of cases each file took as little as 512 bytes on disk, 5x larger than the data within. Performance was awful, no matter how we bucketed the files in directories or tuned the file systems. Reiser 4 trunk was looking promising, but then Hans was arrested.
The solution was to store the images in a database. They fit in-row without needing blob storage, and the DB indexes were much faster than any of the file systems we tried. Backups took less space and less time.
For one application I was designing, I planned to use Berkeley DB (key-value database) as a content-addressable data-store, ie. hash the binary blobs and use the hash as key. Then the hash would be stored at the application (SQL) database.
For very small files (such as your <100 byte PNGs) it probably wouldn't be worth it, but for larger blobs it be used to avoid the 'millions of files' problem.
I don't know if it would have worked, but I though it was a cute idea.
(note: I'm playing devil's advocate here for storing images/files in the database - this is not something I'd recommend doing without careful thought)
In terms of efficiency, you're only paying a small price on the first hit for an image before the cache comes into play. It won't be as efficient for you as directly serving from S3, but with a properly configured cache, the effect would be negligible.
In terms of complexity, you're looking at a single nginx configuration endpoint of a handful of lines to cache the images on-disk or in memcache for fast serving. In the case where you've got CloudFlare in front of your web app, it's free. Overall it can be less complex, because you can re-use your existing code for talking to the database, rather than adding code to talk to another service like S3.
The real argument against storing big binary data in the database should be based on the argument that databases aren't necessarily always tuned properly for storing things that aren't relational (which the article briefly addresses), and that storing your binary data alongside your relational data could have deleterious effects on the performance of your relational queries.
Incidentally, SQL Server 2008 has a FILESTREAM type which integrates with NTFS to give you streaming access to BLOBs (you can even work with them via Windows file handles). MSDN has a pretty thorough (and I thought interesting) technical article on them. http://msdn.microsoft.com/en-us/library/cc949109(v=sql.100)....
Additionally, there may be an argument for locality here. If every time you're serving an image, you need to look up a bunch of metadata for it, you can benefit from storing it in the same place. Then it's a matter of whether you rather encode the metadata in the image and store it as a file, or encode the image in the metadata and store it as a database entry.
Now if you throw in having to do a little bit of post-processing based on various conditions, then the scales start to tip.
The component to serve the files up may be slightly more complex, but that has to be balanced against the need to have separate high-availability, scalability and backup solutions for your data that is now in two kinds of places: a DB and a file-system.
Of course, it depends on what you are doing, but many databases make working with BLOBs pretty straightforward and I wouldn't rule out the benefits just to follow some over-general rules.
What about plain text or uniform-key, unsalted passwords? Granted, it's not a scalability or reliability concern, but I would put that at number one. It seems like every six months, some impressively large company gets in trouble for doing this.
Storing images in your database is a fantastic idea. You then add in a mod_rewrite rule which checks for the existence of an image in the requested size (with some accepted size format like original_filename500x200.jpg) and if it doesn't, produces and caches that size. Best of both worlds.
The "size of database backups" point has been countered several times on this page effectively so I won't bother but in terms of scaling your app, restoring backups, managing versioned backups, moving your app around and a whole host of other management and maintenance related tasks, being able to completely blow away your cache of image files and have them rebuild on demand is tremendously useful.
Also on the point of "storing thumbnails" this assumes that your thumbnail size requirements never change over time. Using the approach of storing the source image in a database and caching the files on demand means you can change your design and all you need to do is change the size parameters in your URLs, bam, new image thumbnail sizes.
>read/write to a DB is always slower than a filesystem
One shouldn't forget to mention that a filesystem is a DB, albeit one specifically engineered for the purpose of storing files. There's some overhead in doing so, because the filesystem stores a fair bit of metadata about the file. I don't think this argument always holds true.
There was a paper (a little dated now) from MS research which looked into this a little: http://research.microsoft.com/pubs/64525/tr-2006-45.doc. The research is not really conclusive, but it suggests files smaller than 256kb might be better stored in the DB, where anything larger the filesystem performs better. Would be nice to see some more research on the topic.
I suspect that the issue is too complicated for there to be an official threshold which everyone can rely on. It's not just a question of whether the filesystem or database can serve up the data faster. It's also the impact that those large chunks of binary data have on the database engine's cache, and what that's doing to performance globally. Presumably the specifics of that impact will depend heavily on the data access patterns of the application in question.
Not sure how "db security" factors in, but yes it's likely a premature optimization for many users, but these three are some of the most frequently abused RDBMS items I see on a regular basis. And often the major limiting factor in the client's performance.
While it is "raw performance of one part of the system", it's also the hardest aspect of a system to scale properly. Not doing these three things can often get a user months or years further down the line without having to worry about it at all.
It really depends on your situation. Point by point:
>read/write to a DB is always slower than a filesystem
Obviously this is true, but depending on the context, the difference may not be that important.
>your DB backups grow to be huge and more time consuming
This just doesn't hold up to scrutiny. You still have to back the files up, regardless of where you put them. If you keep them in the DB, then backing up the DB automatically backs up the files. If you keep them separately, you're adding complexity to the backup/restore process.
I guess there are times when you keep references to files in your DB but can live with having the files and DB get out of sync if you have to restore from a backup, but that's a design decision, not a hard and fast rule.
>access to the files now requires going through your app and DB layers
Again, there are times when this simply doesn't matter.
Let's not forget that one of the main reasons people use databases is that they simplify concepts like locking. If you start tying your database to some external structure within your file system, you not only have to implement those features yourself, but you also have to coordinate them with the DB's own implementation.
Blobs also beat files when transactional consistency is a bigger concern than performance.
There's also SQL Server's FILESTREAM attribute, which is rather interesting. It tells the engine to store the data as files on the back-end. In theory that combines the best of both worlds, though I haven't had many good chances to poke at it yet.
> You still have to back the files up, regardless of where you put them.
If they are actual files you can do an incremental backup. If they are in the database the backup is just one huge file that has to be backed up every time. Just dealing with the IO takes much longer, not to mention disk storage and possibly network bandwidth.
If you need to restore a single image you are going to need to create a whole database just to hold the backup before pulling out the one image - depending on how much data you have this can take hours. But with a file it's trivial.
>read/write to a DB is always slower than a filesystem
This isn't true and has never been true. If you're writing a lot of records, especially small ones a DB is often faster because the writes are more sequential and the database optimizes how pages are flushed to disk.
Well, I think what is meant here is that for large chunks of binary data, it's faster to go through the file system directly than through the DB. But I just think that it's an irrelevant point in a very significant number of situations.
The logging thing is really really dumb. Having logs in a real database lets you do incredible things (e.g. correlating user feedbacks to random problems you noticed). We use our db-driven logs many times a day to debug issues and find patterns.
Don't be afraid of putting logs in databases and then figuring out how to scale it. It's not that hard and it's very worth it.
Splunk, last I checked, used PostgreSQL for it's backend. While maybe badly worded, my point was more toward the users who have a two column log table with a timestamp and a syslog style line of text. The problem with "advice" posts like these is you can't cover every possible scenario without confusing the less experienced readers. Do I think loggly or splunk cloud should store their logs as flat files? Hell no, their app IS logging. But in the general case of application architecture I don't think it is appropriate or terribly useful for most. I know they say "never say never", but I felt it justified when I believe it's a bad idea in 99% of apps.
It's really more about using the right tool for the job, Splunk, Loggly, NoSQL solutions, etc. are a MUCH better fit than an RDBMS for this sort of work.
My time as a system administrator makes me rather fond of logging as text into files. They're simple, easy to scale, to rotate, to compress and to throw away. One variation on this theme that I read about recently is to produce log lines as JSON objects rather than more "traditional" printf-formatted text.
This gives machine readability but also makes for really simple tools for extracting whatever "traditional" log lines you like. (see: https://github.com/trentm/json) You can also easily add new fields without breaking existing log parsing code.
You can still store files (images, documents, etc...) in the database as a backup.
Let's imagine the following scenario. If your database crash and you lose everything there is good chance you can't run your application any more (if you don't have backups), but you still have your images! great! what do you do with them. If you lose your files but not the database, cool you still have the paths of all your files that have been lost in the database, so now what?!
The idea is to keep files (such as images) in the database to be able to pull them down on the file system and backup your database. Plus it's not always stored in the database, for example with MSSQL you can use file stream to store document.
We did this for one project because we wanted transactional semantics when it came to image files and other related information, and this made it easy. We didn't serve images from the db - instead we created image files on the web servers from what was in the database.
Article says that logging to a non-production db "isn't a HORRIBLE idea" indicating that it isn't really a good idea. I used to work with a system where we routinely made complex queries against the log database. Parsing the log line before inserting to the db, putting relevant values into indexed columns seemed to me like a pretty good idea and provided us with a really easy-to-use and a rather fast diagnostics tool that no one had to learn how to use.
EDIT: Of course you have to make sure you don't do any writes that block the main operation, e.g. by using message queues.
> read/write to a DB is always slower than a filesystem
I always thought it was the other way around. Databases can cache data and keep files opened (keep the file locked or something, I don't know).
Also my own benchmarks on an old laptop (MySQL versus FAT32) turned out that MySQL was actually faster, even if you still had to connect (no persistent connection). I should try this again on a decent server and use NTFS or Ext3 (see if that makes much difference), but in general I can't really see how a database can be slower--given that you use it for what a database should be used for. Of course if you read a movie from it once a day, it's not gonna cache that and the file handle is not going to help much either. Then MySQL is only an obstruction to the data which it has to read from the disk anyway. Hmm or is that what he meant here?
I thought the issues with blobs in databases is that when you pull a file out of a DB to serve it on a website, you get something like this:
1. Database pulls file off disk, or out of cache, reads into memory buffer
2. Memory buffer is copied to a new buffer in SQL client library (over a socket or whatever)
3. Memory buffer is copied to a new buffer in your scripting language
4. Which is then written to the web server's memory buffer, where it gets served to the client.
Which ends up being a whole lot of copying (and a whole lot of work even when the data is cached in the DB) compared to:
1. Web server mmap's file
2. Web server streams file out to client
Is this situation no longer true? Are there newer tools that handle "serve a file from a BLOB" more efficiently? Do they also handle things like seeking (resumes) in a non-wasteful manner? Or is memory performance so good and the blobs we're talking about so small (bytes, not kilobytes) that it doesn't matter?
I'm by no means a DB expert, but that was basically my understanding of the disadvantages as well. As other people have pointed out, you can use caching to fix a lot of those performance issues.
Of course, the author of the article also believes that every database is hooked to a web server that will be constantly pushing its data to hundreds of thousands/millions of users, that every 10 milliseconds lost means another 1% of your users abandon ship, and that if you claim to have some other circumstances, you are pleading "unique snowflake" status and should just get shut up and follow his advice anyway.
There's a very strong use case for binary data: bitmaps. You can implement a bloom filter or a set as a bitmap field on a relation, and then perform quick searching and joining using the bitwise comparison operators. This is my main reason for wanting binary field support in Django—at the moment Django attempts to decode every string as UTF-8.
"read/write to a DB is always slower than a filesystem"
This is patently false, reads/writes on a DB are often much faster which is why we use DB systems.
Even image data especially on write can be faster when written to a DB vs. flat file because of sequential writes. Small files are slow to write on filesystems which is why we have DBs.
In most cases with a high read / low write load it's certainly a better idea to put thumbnails on a CDN for a variety of reasons but to think that writing small chunks of data to a file system is always faster than a DB is to misunderstand what DBs do and when and why they are useful.
> reads/writes on a DB are often much faster which is why we use DB systems.
Not really. We use db's because file systems aren't ACID and we like to not lose data, and we like to be able to query data, and because we like multiple machines having access to that data. But the file system is generally much faster, especially when appending something like a log (yea, db's do that too, that's beside the point).
> but to think that writing small chunks of data to a file system is always faster than a DB is to misunderstand what DBs do and when and why they are useful.
There was a article published by Microsoft where they tested BLOB storage in a database with different size files. They found that there was actually an inflection point where the relational databases performed faster than the NTFS file system.
Write a program that creates 100,000 1k-8k byte files.
Write a program that creates 100,000 1k-8k byte files in a zip file with no compression.
Run multiple copies of these programs, see which writes files faster.
Keep running the program(s) until the drive is 90% full. (Put the files in different directories if you feel this is fair.)
Run the programs again.
You'll notice that on a decent 7200 RPM drive the ZIP code is writing about 40-50MB/sec on a good drive.
You'll probably be lucky to get 4-5MB/sec from the file code (unless you're on an SSD, even then it will probably be half)
If you feel like it also create a database and do the same with a file table like "CREATE TABLE data (id int NOT NULL PRIMARY KEY, data varbinary(8000))"
Inject 1000 to 2000 rows between commits.
The DB should also outperform the file code, but probably not the ZIP because it has to write the data twice.
The reason is that writing file to disk quickly produces a random IO pattern as the FS tries to allocate files all over the disk. Also, everytime you create a file you have to update a whole bunch of file descriptors, a modern file system is a B-tree and journal. Sounds awfully like a database right? Except this database is optimized for large files, at least 32-64kbytes and preferably in the hundreds of MB.
You're biassing the test towards small files and particular file systems that don't deal with them as well, and edge cases where you think the db is highly optimized; I'd imagine that's because you know outside those bounds the raw file system will kick its butt, especially given the tools that let you avoid the application entirely by streaming the file from disk.
edit: It's not that your point is wrong by any means, it's just that you're arguing against a point that he wasn't really emphasising. I don't think anyone, including him, would try to tell you that constant appending to a log file would be faster through a db than just a direct write to the file.
I have to disagree with the logfiles, I work for a really large domain hosting company so our logs are quite vast, dealing with spam issues can amount to a large amount of time being used just looking for the information you need.
we have greylog in place to help query these logs from our syslog server, it has helped us identify alot of different issues since we have put it in place, not to be rude to the poster but I dont think you have the experience with really big log files and how painful it can be (or how long it can take) when you have to grep through this information by hand, zcating an old compressed log file can take up to ten minutes before you even start seeing data let alone relevant data.
I'm trying to understand why storing your logs in the production database is such a bad idea. Is it because of potential disk contention? Lock issues? Could you be more specific about why it is a stupid idea?
Probably just disk contention - locking is unlikely to be a problem since you're just adding rows.
The disk contention issue could be a problem because writes are hard to scale, but depending on the regularity of logging and the indexing used (I'm assuming it's done by timestamp here) it's probably not actually the biggest of deals - you should end up mostly doing append-only writes. In oracle, for example, all that'll happen is that for each log transaction some stuff will get altered in memory (i.e. a block of the table will get appended to, and an append will occur on the index) and the roll forward log will get written to on disk. Every so often the table and index blocks will get written to disk, but that will be much less than once per transaction in an appending case.
That said, overall I'd still recommend having the logging done on a separate system, or at least having profiling set up on your log table/user/whatever such that logging can't eat your production server if someone decides to excessively ramp up the log levels.
This is absolutely a good thing to do, but it doesn't make you safe - the disks running the database's redo logs are the ones that get hammered in a logging scenario (since the writes to the table and indexes are largely appending, and don't have to be written to disk to commit a transaction). These logs are usually global to the DB instance, so you can't easily (to my knowledge) separate them out. This being the case, they can significantly affect the performance of the rest of your system if too much logging starts happening.
Database is a pretty broad term these days. Doesn't this depend on the database we're talking about?
Something like Big Table is pretty good at storing small blobs. It's also pretty good at scaling, so you should never really have to delete "usage statistics, metrics, GPS locations, session data" or anything really.
Regarding logs, I completely agree. Something like tornado + scribe + s3 works quite well. (check out https://github.com/gregbayer/logcatcher) But then EMR + Scribe can be used to access it like a db.
While I fully agree that storing log data in the database isn't usually a good idea, I would never recommend Logly to somebody. It may just be my experience, but the search facilities and stability left a lot to be desired - it is one of very few web apps that consistently face me the Chrome "oh snap" js crash.
IMO Papertrail is a much better service with the exact same MO.
I'm ashamed to say I used to use MySQL to store serialized PHP arrays. It seemed like a good idea until I needed to run a query using a string within the array. I "solved" the problem by writing a MySQL function similar to ExtractValue().
Ugh. This is so wrong, even though some of the arguments are valid criticisms.
Why is it wrong? Transactions and referential integrity. Without being able to guarantee transactions that update data in the database along with any other data, non-DB file-based or not, then you chance issues with referential integrity. Even if you have a transactional infrastructure that you would bet your life on, something could happen like a partial restore that restores the file system but not the database.
It isn't that it is impossible to have a working transactional infrastructure without storing these things in the database, it is that it is much easier to store it in the database, and sometimes making things easy can save your ass.
I used to do custom SharePoint development using SharePoint and MOSS 2007 from Microsoft. Fun developing experience. I was doing document management for some rather large companies. Guess where Microsoft put all of the images, word, excel,powerpoint presentations, email and whatever else! In a blob in an SQL 2005 server database. It was amazing how fast those hard drives could fill up with each document and all of it's revisions being stored there. There was talk that MS might eventually add external blob support but I quit long before that if it ever happened at all. If anyone asks me now I simply refer them to other document management servers.
Strongly disagree on logging to database being inferior to flat files or Loggly/Splunk. The first point is good: you shouldn't put them in the same place as your production data, agreed. Then he goes on to recommend not logging into a database at all, which is just silly.
Map/reduce on log data is absolutely wonderful and opens new worlds of answers that you can only dream of. A Hive workflow is fantastic for parsing log data. You can answer any question, and even GRAPH any answer from your logs. The insight gained from capturing and processing logs can do wonders for your performance problems, ideas for new uses of your code, and so on.
I hate any approach to engineering that says "you'll never need it, so don't do it"; do it, and discover the great things that some things unlock for you. Burying your head in the sand on this point just limits your ability to unlock unique insight and gain an edge.
I particularly disagree with this because of the flourish at the end of article, where he says there are no exceptions to his advice. Flatly wrong, and if you tell me "don't do it, trust me," I lose a bit of respect for you as a communicator.
I was kinda going there too, while I agree that long term long storage is much better done in a Hadoop style system I used 24hr log grabs to provide visibility into bad actor behavior over those periods. As a search engine its not easy to see someone doing a .05 query/sec scrape but over 24 hrs those 4,000 queries in alphabetical order really stand out :-)
So putting a section of time from a log into a data base for generating some reports and then dumping it, I don't have a problem with that.
Yeah I think a LOT of the criticism of my post would have been cleared up if I had specified RDBMS more clearly. In my experience, most people simply put a logline with a timestamp column in a table and then rarely if ever reference them except to tail/search through which is unnecessary workload on the DB in most cases.
Now I'm curious: Assume for the sake of argument that we're dealing with an app on a single physical machine, no separate DB server. If you wanted to have production data and logs both in, say, Postgres, would you run two separate Postgres daemons, or is there some saner way of handling this?
It's not a good idea to run more than one database on the same physical machine. Especially a well-tuned database. The database always thinks it is the only and most important process on your machine. If you run more than one, they will constantly be contending with each other for the same I/O bandwidth. You will not see 1/N performance with N databases on the same machine, you will see something much worse.
On top of that, it's a bad idea because, where are you going to put your database logs? In the database? What happens when you run out of disk? All your applications are going to be trying desperately to log that they're out of space, and your database won't have anywhere to put those messages. This is why /var/log is traditionally a separate filesystem in Unix. But if you log to the database, unless you go through some hoops (tablespaces) they're all going to the same partition and the same physical disk. Believe me, these scenarios happen in real life, and usually on a weekend. Even worse, some applications (though not Postgres, I think) will try and hold onto pending writes until there is free disk, so the moment you free up space, you lose it. You don't want to have to shutdown the production database just because your logs ate up a bunch of space.
The right thing to do is let your database have its own machine (yes, physical machine) and its own disks. Ideally, your Postgres install has separate physical disks for the data and the WAL, and I'd probably put the regular logs on another disk as well if I could afford it. You'll be shocked how much better it will perform, too. Don't fool yourself. A bunch of VPSes is not the same as real hardware running a real database like Postgres.
I don't see why two daemons are necessary, simply having a separate database (in the CREATE DATABASE sense) should be sufficient to not make a big mess. There are lots of fun optimizations you can do like move its storage location to a separate disk to reduce IO contention. You could also avoid making indices on live log tables to improve INSERT performance. You could add an async buffering layer and do bulk-inserts which have much better performance.
If you're sufficiently careful, I imagine you could find similar performance with PostgreSQL-based logging as you would with a rotating flatfile log. Except you wouldn't need to worry about consolidating and parsing them.
Logging isn't ideal in traditional databases, because then you have to scale writes. Scaling reads is fairly easy in a traditional database, but scaling writes is much harder compared to something like HBase or Cassandra
Image storage in a traditional database will always suffer compared to other options. Filesystems are faster and easier to use, and things like S3 match requirements better in some cases.
I'm not really seeing anyone argue that ephemeral data is better in a Database than something like redis.
Cassandra can do over a million writes per second because of a combination of its horizontal scaling and eventual consistency. I expect HBase could scale similarly.
By comparison, people write about scaling MySQL to 25000 writes per second
It's very hard to get a single server to scale multiple orders of magnitude higher (even with FusionIO or something). Getting around that single master server for writes is possible, but either complicated or expensive. Even then I doubt you will get to 1 million writes per second without sharding, and if you are going to shard you might as well go to a NoSQL solution built for the problem.
That is indeed extremely impressive! I hope I didn't come across as meaning that NoSQL solutions aren't appropriate - it's very hard for me to say this anything about these solutions, as I've got next to no experience with this technology.
I am very interested in the first article though. It mentions number of instances in an EC2 availability zone - what does this mean? Is this similar to a clustering/load balanced solutions in traditional RDBMSes?
The automated tooling that Netflix has developed lets us quickly deploy large scale Cassandra clusters, in this case a few clicks on a web page and about an hour to go from nothing to a very large Cassandra cluster consisting of 288 medium sized instances, with 96 instances in each of three EC2 availability zones in the US-East region. Using an additional 60 instances as clients running the stress program we ran a workload of 1.1 million client writes per second.
Edit: Argh! Sorry, I just didn't read far enough before I posted. They seem to be being powered by some fairly heavy duty hardware. Is there a comparitive study using Oracle RAC or something like this?