Follow on: Use DBMS or fs?

mattjaynes · on April 8, 2007

From pg's Viaweb FAQ:

http://www.paulgraham.com/vwfaq.html

"What database did you use?

We didn't use one. We just stored everything in files. The Unix file system is pretty good at not losing your data, especially if you put the files on a Netapp.

It is a common mistake to think of Web-based apps as interfaces to databases. Desktop apps aren't just interfaces to databases; why should Web-based apps be any different? The hard part is not where you store the data, but what the software does.

While we were doing Viaweb, we took a good deal of heat from pseudo-technical people like VCs and industry analysts for not using a database-- and for using cheap Intel boxes running FreeBSD as servers. But when we were getting bought by Yahoo, we found that they also just stored everything in files-- and all their servers were also cheap Intel boxes running FreeBSD.

(During the Bubble, Oracle used to run ads saying that Yahoo ran on Oracle software. I found this hard to believe, so I asked around. It turned out the Yahoo accounting department used Oracle.)"

felipe · on April 8, 2007

In Java, I have used Prevayler in a previous project (about 2 years ago), and I really liked it:

http://sourceforge.net/projects/prevayler/

The reason I'm not using it right now is that Hibernate makes OO mapping so easy and trivial that I honestly don't see the need. But it's still a good solution if using a DB is overkill.

I believe there's a Prevayler port for Ruby.

jaggederest · on April 8, 2007

Even more interesting, happs (http://happs.org) combines a number of things: a Prevayler-style in-memory persistence layer, Twisted-style event-driven programming, and SEDA-style non-blocking IO.

It's built in Haskell, which really is a beautiful language if you can wrap your head around it. Much like Lisp in that sense.

neilc · on April 8, 2007

http://madeleine.rubyforge.org/ is a Ruby implementation of similar ideas, I believe.
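
For readers who haven't met the pattern behind Prevayler and Madeleine: the general idea is to keep all state in RAM, append every change to a journal on disk before applying it, and replay the journal at startup. Below is a minimal Python sketch of that idea, not the API of either library. The class name `PrevalentStore`, the JSON command format, and the `set`/`delete` operations are assumptions made up for illustration, and snapshots are left out.

```python
import json
import os
import tempfile


class PrevalentStore:
    """Hypothetical in-memory store made durable by a command journal."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.state = {}
        # Rebuild the in-memory state by replaying every logged command.
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, cmd):
        if cmd["op"] == "set":
            self.state[cmd["key"]] = cmd["value"]
        elif cmd["op"] == "delete":
            self.state.pop(cmd["key"], None)

    def execute(self, cmd):
        # Journal first, then mutate RAM, so a crash never loses an
        # acknowledged change.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(cmd)


path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = PrevalentStore(path)
store.execute({"op": "set", "key": "cart:42", "value": ["book"]})
restarted = PrevalentStore(path)  # simulates a process restart
print(restarted.state)  # {'cart:42': ['book']}
```

Reads never touch the disk, which is the attraction. The cost is that all data must fit in memory and the journal grows until something compacts it.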

npk · on April 7, 2007

This post is a follow on to: http://news.ycombinator.com/comments?id=10001

Paul Buchheit at Startup School told us to put our small, heavily accessed data in hash tables kept in RAM. Paul Graham mentions that Viaweb used the FreeBSD fs as its database. It seems to me that these two applications are very different and cover a broad spectrum of requirements.

Is there somewhere I can read more about this subject? I mean, there is a whole industry developed around databases, they must add value... The typical response to such a query is: what are you trying to do? Well, given the two applications listed above, what would you have recommended? Me? I'd have recommended a database server. Why would you not have? How can I find out more? How do I achieve your level of enlightenment?

n

chris_l · on April 8, 2007

"I mean, there is a whole industry developed around databases, they must add value..."

The traditional RDBMS adds random access by computed criteria, transaction safety, multi-threadedness, a standard query language, ...

If you don't need these, you're likely to suffer a performance hit / programming overhead compared to a system that does not offer them. A filesystem only offers random access by one key (pathname).

Also, the more specialised your needs are, the further down the stack you should start your own code. For a simple web application with a bit of user data, I would probably go with a DB. For your own search engine, you probably want something a little more customized, building on top of a standard fs.
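
To make the "one key (pathname)" point concrete, here is a rough Python sketch of a file-per-record store. It is not how Viaweb stored its data; `FileStore` and its methods are hypothetical names, and keys are assumed to be safe filenames. Lookup by key is a single path access, while any other criterion means reading every file.

```python
import os
import tempfile
from pathlib import Path


class FileStore:
    """Toy key-value store: one file per key, addressed only by pathname."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, value):
        # Write to a temp file and rename it into place, so a reader never
        # sees a half-written value (os.replace is atomic within one fs).
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(value)
        os.replace(tmp, self.root / key)

    def get(self, key):
        return (self.root / key).read_text(encoding="utf-8")

    def find(self, predicate):
        # No secondary index: a query by anything but the key scans all files.
        return sorted(
            p.name
            for p in self.root.iterdir()
            if predicate(p.read_text(encoding="utf-8"))
        )


store = FileStore(tempfile.mkdtemp())
store.put("alice", "plan=pro")
store.put("bob", "plan=free")
print(store.get("alice"))                      # plan=pro
print(store.find(lambda v: v == "plan=free"))  # ['bob']
```

The `find` scan is the part an RDBMS replaces with indexes and a query planner; if the application only ever calls `get` and `put`, that machinery buys little.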

neilc · on April 8, 2007

Another reason not to use a DBMS for a search engine is that typical implementations of transaction-oriented SQL databases are a terrible fit for the performance requirements of a search engine. For example, search engines don't need concurrent writes, ACID transactions, or a SQL-like query language; search engines want to optimize for large-scale updates, not small, random writes; and typical DBMS index structures (B-trees) don't work well for search engine indices.

Eric Brewer has an interesting paper that lays out an architecture for a search engine that is consistent with DBMS design principles, but differs significantly in the implementation details:

http://www.cs.berkeley.edu/~brewer/papers/SearchDB.pdf
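
As a toy illustration of the "large-scale updates, not small random writes" point (and not the architecture from Brewer's paper), the Python sketch below rebuilds a whole inverted index in one batch pass and then serves read-only queries from it. The function names and sample documents are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Batch-build an inverted index: term -> sorted list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "cheap intel boxes",
    2: "intel servers running freebsd",
    3: "cheap freebsd servers",
}
index = build_index(docs)  # rebuilt wholesale rather than updated row by row
print(search(index, "cheap freebsd"))  # [3]
```

Nothing here needs transactions or concurrent writers: the index is produced offline and swapped in, so queries only ever read.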

ustrip · on April 7, 2007

If I had heavily accessed 'small' data, I'd keep it in an SQLite database.
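
A minimal sketch of what that could look like with Python's standard `sqlite3` module. The `sessions` table, its columns, and the helper functions are hypothetical; an in-memory database is used here so the example leaves no file behind.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (username TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
)


def record_hit(conn, username):
    # "with conn" wraps both statements in one transaction:
    # commit on success, rollback if either statement raises.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO sessions (username, hits) VALUES (?, 0)",
            (username,),
        )
        conn.execute(
            "UPDATE sessions SET hits = hits + 1 WHERE username = ?",
            (username,),
        )


def heavy_users(conn, min_hits):
    rows = conn.execute(
        "SELECT username FROM sessions WHERE hits >= ? ORDER BY username",
        (min_hits,),
    )
    return [row[0] for row in rows]


for name in ["alice", "alice", "alice", "bob"]:
    record_hit(conn, name)
print(heavy_users(conn, 2))  # ['alice']
```

Compared with plain files, this gets queries by computed criteria (`hits >= ?`) and transactions without running a separate database server.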

npk · on April 7, 2007

Suppose the application was Gmail? Obviously, one big SQLite DB is not going to cut it. Someone, somewhere on here, mentioned having an SQLite DB for each user. I can't imagine this is what Gmail does. It may have worked in the case of Viaweb, but I'm not sure.

mattjaynes · on April 8, 2007

Here's the discussion I think you were thinking of:

http://news.ycombinator.com/comments?id=10001

Based on this article:

http://blog.nanobeepers.com/2007/04/07/infinitely-scalable-framework-with-aws/

randallsquared · on April 8, 2007

If I were building a gmail, I'd be using a maildir-derivative.
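
For anyone unfamiliar with the format: a maildir stores each message as its own file under `tmp/`, `new/`, and `cur/` subdirectories, so delivery needs no shared lock. The sketch below uses Python's standard `mailbox` module to show the basic shape; the `deliver` helper, addresses, and directory layout are assumptions for illustration and say nothing about how Gmail actually works.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def deliver(maildir_path, sender, subject, body):
    """Add one message to the maildir and return its key."""
    box = mailbox.Maildir(maildir_path, create=True)
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    key = box.add(msg)  # written under tmp/, then moved into new/
    box.close()
    return key


# One maildir per user: the filesystem path is the only "index".
user_dir = os.path.join(tempfile.mkdtemp(), "npk")
key = deliver(user_dir, "pg@example.com", "files vs DBMS", "Just use files.")
print(len(os.listdir(os.path.join(user_dir, "new"))))  # 1
```

This is the same trade-off as the rest of the thread: adding and reading a message by key is cheap, but searching across messages needs an index built on top.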

npk · on April 8, 2007

All - I'm not posing my question properly; only chris_l and neilc seem to understand it. A lot of people present software implementations, but the choice of an implementation is step 5, and I'm still hung up on step 2 :)

I'm trying to figure out how people made the decision to use that implementation. What factors were weighed? Gmail makes the claim that they store a lot of information in RAM and don't use a DB server. Any good articles about Gmail's architecture?

thank you! n