
Follow on: Use DBMS or fs? - npk

======
mattjaynes
From pg's Viaweb FAQ:

<http://www.paulgraham.com/vwfaq.html>

"What database did you use?

We didn't use one. We just stored everything in files. The Unix file system is
pretty good at not losing your data, especially if you put the files on a
Netapp.

It is a common mistake to think of Web-based apps as interfaces to databases.
Desktop apps aren't just interfaces to databases; why should Web-based apps be
any different? The hard part is not where you store the data, but what the
software does.

While we were doing Viaweb, we took a good deal of heat from pseudo-technical
people like VCs and industry analysts for not using a database-- and for using
cheap Intel boxes running FreeBSD as servers. But when we were getting bought
by Yahoo, we found that they also just stored everything in files-- and all
their servers were also cheap Intel boxes running FreeBSD.

(During the Bubble, Oracle used to run ads saying that Yahoo ran on Oracle
software. I found this hard to believe, so I asked around. It turned out the
Yahoo accounting department used Oracle.)"
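The FAQ doesn't describe Viaweb's on-disk format. What follows is only a minimal Python sketch of the general "one file per record" approach, assuming JSON-serialized values and a hypothetical `FileStore` class. The write-to-a-temp-file-then-rename step is a standard POSIX idiom that keeps a crash from leaving a half-written file behind.

```python
import json
import os
import tempfile


class FileStore:
    """Hypothetical key/value store: one JSON file per key under `root`."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # The pathname is the only index. Keys are assumed to be plain,
        # trusted file names (no '/' or '..').
        return os.path.join(self.root, key + ".json")

    def put(self, key, value):
        # Write to a temp file in the same directory, force it to disk,
        # then atomically rename it over the old version. Readers see
        # either the old file or the new one, never a partial write.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(value, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self._path(key))
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

    def get(self, key, default=None):
        try:
            with open(self._path(key)) as f:
                return json.load(f)
        except FileNotFoundError:
            return default


if __name__ == "__main__":
    store = FileStore(tempfile.mkdtemp())
    store.put("shop-42", {"name": "Example Shop", "items": 3})
    print(store.get("shop-42"))  # {'name': 'Example Shop', 'items': 3}
```

For full crash durability you would also fsync the containing directory after the rename; that step is omitted here for brevity.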

------
felipe
In Java, I have used Prevayler in a previous project (about 2 years ago), and
I really liked it:

<http://sourceforge.net/projects/prevayler/>

The reason I'm not using it right now is that Hibernate makes object-relational
mapping so easy that I honestly don't see the need. But it's still a good
solution when a DB is overkill.

I believe there's a Prevayler port for Ruby.
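Prevayler's approach ("system prevalence") keeps the entire object model in RAM and makes it durable by journaling each state-changing command to disk before applying it. After a restart, the journal is replayed; Prevayler also takes periodic snapshots so that replay stays short. The Python sketch below shows only the journal-and-replay core. `PrevalentStore`, the JSON-lines journal format, and the `set`/`add` commands are hypothetical and are not Prevayler's API.

```python
import json
import os

ALLOWED_OPS = {"set", "add"}


class PrevalentStore:
    """Toy system-prevalence store: in-memory dict + append-only command log.

    Hypothetical illustration only; not Prevayler's (or any port's) API.
    """

    def __init__(self, journal_path):
        self.state = {}
        # Recovery: replay every logged command, in order, from empty state.
        if os.path.exists(journal_path):
            with open(journal_path) as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))
        self._journal = open(journal_path, "a")

    def _apply(self, cmd):
        # Commands must be deterministic so that replay rebuilds identical state.
        if cmd["op"] == "set":
            self.state[cmd["key"]] = cmd["value"]
        else:  # "add"
            self.state[cmd["key"]] = self.state.get(cmd["key"], 0) + cmd["amount"]

    def execute(self, cmd):
        # Reject bad commands before logging them, so replay never fails.
        if cmd.get("op") not in ALLOWED_OPS:
            raise ValueError("unknown op: %r" % cmd.get("op"))
        # Write-ahead: make the command durable first, then apply it in RAM.
        self._journal.write(json.dumps(cmd) + "\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())
        self._apply(cmd)

    def query(self, key, default=None):
        # Reads are plain dictionary lookups; no disk I/O.
        return self.state.get(key, default)

    def close(self):
        self._journal.close()
```

The trade-off is visible in the code: the whole data set must fit in memory, and every write pays for an fsync, while every read is just a dictionary lookup.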

~~~
jaggederest
Even more interesting, HAppS (<http://happs.org>) combines a number of things:
a Prevayler-style in-memory persistence layer, Twisted-style event-driven
programming, and SEDA-style non-blocking IO.

It's built in Haskell, which really is a beautiful language if you can wrap
your head around it. Much like Lisp in that sense.

------
npk
This post is a follow on to: <http://news.ycombinator.com/comments?id=10001>

Paul Buchheit at Startup School told us to put heavily accessed, small data in
hash tables kept in RAM. Paul Graham mentions that Viaweb used the FreeBSD file
system as its database. It seems to me that these two applications are very
different and cover a broad spectrum of requirements.
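The advice as relayed here doesn't say how the RAM tables relate to durable storage. One common arrangement, assumed for this Python sketch, is a read-through cache: a plain dict answers repeat lookups without I/O, and a caller-supplied `load` function (a hypothetical stand-in for the file system or database) is consulted only on a miss.

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """Dict in RAM in front of a slower lookup function (hypothetical)."""

    def __init__(self, load: Callable[[Hashable], Any]):
        self._load = load                 # slow path: disk, DB, network, ...
        self._table: Dict[Hashable, Any] = {}
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        try:
            return self._table[key]       # hash lookup, no I/O
        except KeyError:
            self.misses += 1
            value = self._load(key)
            self._table[key] = value
            return value

    def invalidate(self, key: Hashable) -> None:
        # Call after the underlying record changes, or readers see stale data.
        self._table.pop(key, None)


if __name__ == "__main__":
    cache = ReadThroughCache(lambda user_id: {"id": user_id})
    cache.get(7)
    cache.get(7)
    print(cache.misses)  # 1
```

This cache never evicts anything, which fits "small data" but not a working set larger than memory.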

Is there somewhere I can read more about this subject? I mean, there is a
whole industry developed around databases, they must add value... The typical
response to such a query is: what are you trying to do? Well, given the two
applications listed above, what would you have recommended? Me? I'd have
recommended a database server. Why would you not have? How can I find out
more? How do I achieve your level of enlightenment?

n

~~~
chris_l
"I mean, there is a whole industry developed around databases, they must add
value..."

The traditional RDBMS adds random access by computed criteria, transaction
safety, safe concurrent access, a standard query language, ...

If you don't need these, you're likely to pay a performance and programming
overhead compared to a system that does not offer them. A file system only
offers random access by one key (the pathname).
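To make the "one key" point concrete, here is a small Python comparison using the standard library's `sqlite3` module; the records and column names are made up. With key-only access, a question like "which US customers have more than five orders?" means reading every record in your own code. With SQL you state the criteria and the engine chooses how to find the rows.

```python
import sqlite3

# Made-up sample records.
records = [
    {"name": "alice", "country": "US", "orders": 12},
    {"name": "bob", "country": "DE", "orders": 3},
    {"name": "carol", "country": "US", "orders": 7},
]

# File-system style: the only index is the key (think one file per name).
by_key = {r["name"]: r for r in records}
bob = by_key["bob"]  # lookup by key: cheap

# Any other criterion means scanning every record in application code.
busy_us = sorted(
    r["name"] for r in by_key.values() if r["country"] == "US" and r["orders"] > 5
)

# RDBMS style: declare the criteria; the engine picks the access path
# (it may use the index on `country`).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, country TEXT, orders INTEGER)")
db.execute("CREATE INDEX customers_country ON customers (country)")
db.executemany("INSERT INTO customers VALUES (:name, :country, :orders)", records)
busy_us_sql = [
    name
    for (name,) in db.execute(
        "SELECT name FROM customers WHERE country = ? AND orders > ? ORDER BY name",
        ("US", 5),
    )
]

print(busy_us, busy_us_sql)  # ['alice', 'carol'] ['alice', 'carol']
```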

Also, the more specialised your needs are, the further down the stack you
should start your own code. For a simple web application with a bit of user
data, I would probably go with a DB. For your own search engine, you probably
want something a little more customized, built on top of a standard fs.

~~~
neilc
Another reason not to use a DBMS for a search engine is that typical
implementations of transaction-oriented SQL databases are a _terrible_ fit for
the performance requirements of a search engine. For example, search engines
don't need concurrent writes, ACID transactions, or an SQL-like query language;
they want to optimize for large-scale batch updates, not small random writes;
and typical DBMS index structures (B-trees) don't work well for search engine
indices.
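neilc doesn't name the structure search engines use instead; the usual one is an inverted index (each term mapped to a sorted list of document IDs), which is built or merged in large batches rather than updated one row at a time. Below is a toy, in-memory Python sketch with a deliberately crude tokenizer (an assumption) and made-up documents.

```python
import re
from collections import defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Crude tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Build term -> sorted doc-ID postings in one batch pass."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Freeze into sorted lists: easy to write sequentially and to merge.
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Linear merge of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index: Dict[str, List[int]], query: str) -> List[int]:
    """AND query: IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the shortest postings list to keep the merges cheap.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


docs = {1: "Unix file system", 2: "database file format", 3: "Unix database server"}
index = build_index(docs)
print(search(index, "unix database"))  # [3]
```

Adding documents here means rebuilding (or building a second index and merging), which is the batch-oriented update pattern neilc describes rather than the small random writes a B-tree is designed for.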

Eric Brewer has an interesting paper that lays out an architecture for a
search engine that is consistent with DBMS design principles, but differs
significantly in the implementation details:

<http://www.cs.berkeley.edu/~brewer/papers/SearchDB.pdf>

------
npk
All: I'm not posing my question properly; only chris_l and neilc seem to
understand it. A lot of people present software implementations, but choosing
an implementation is step 5, and I'm still hung up on step 2 :)

I'm trying to figure out how people decided to use a given implementation.
What factors were weighed? Gmail claims that it stores a lot of information in
RAM and doesn't use a DB server. Any good articles about Gmail's architecture?

thank you! n

