

Ask HN: Why do web frameworks always use a database? - prewett

I'm designing an information-heavy web site (static content, no user content), and have been wondering about the best way to do it. Drupal, Django, and RoR all get the information through a database. It seems to me like this has a number of problems:

* All those database lookups are going to be slow
* All your data is locked up in a binary file
* How do you update your site? It doesn't seem like you can easily merge two databases, or update just the change deltas.

Instead, if all the content lives in separate files, with other files telling how to put it together, then you can use a compiler (of sorts) to generate the site as a bunch of static pages. The advantages would seem to be:

* Fast to serve content
* Easy to redesign (edit files, re-run `make` on the static pages)
* Easy to update the server: cvs update -d -P; make
* If there is user content, CGI could add files (assuming availability of locking). Then a backup would be as simple as `cvs commit`.
* Rolling back after an update gone bad would be easy.

The drawbacks seem to be minimal:

* Requires extra disk space (cheap)
* CPU-intensive to change site layout (infrequent)
* CPU-intensive to add content (presumably rarer than reading content, which is fast)

So am I missing something? Why are databases so popular, given that they seem to be slower and harder to update/back up?

[Edit: slightly better formatting]
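
For what it's worth, here is a minimal sketch of the "compiler" idea in Python, assuming a content/ directory of page bodies, a template.html layout file, and a public/ output directory (all hypothetical names); re-running it after a `cvs update` plays the role of `make`:

    from pathlib import Path
    from string import Template

    # Hypothetical layout: content/*.txt holds page bodies, template.html holds the
    # shared layout, e.g. "<html><head><title>$title</title></head><body>$body</body></html>".
    template = Template(Path("template.html").read_text())
    out = Path("public")
    out.mkdir(exist_ok=True)

    # Regenerate every page as a static HTML file.
    for src in Path("content").glob("*.txt"):
        page = template.substitute(title=src.stem, body=src.read_text())
        (out / (src.stem + ".html")).write_text(page)

A real setup could let `make` rebuild only the pages whose source files changed, which keeps "CPU-intensive to add content" cheap in practice.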
======
yu
Taking 'information heavy' to mean large data size/volume: the file system can be
seen as a basic database. When data access paths are not pre-defined (e.g. URL,
slug, tag, etc.), a systematic way to organize and query the data is required.
Three examples: (a) since you don't know what a user may query, full-text
indexing today indexes almost every word; (b) given a predefined access path,
NNTP servers have mostly used plain files; (c) mail servers use both. IMO, for
addressing domain-specific requirements in general, database systems
(relational, logical, network...) are not slower or harder to update/back up.
As pointed out in another comment, disk I/O is slow; both file systems and
database systems offer various ways to speed that up. Understand the
requirements, then architect, design, and choose a solution stack carefully.
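
To make (a) concrete, here is a toy sketch of such an index, assuming a directory of plain-text files (the directory name and query word are made up for illustration). When you cannot predict the access path, you end up indexing essentially every word, which is the kind of structure a database or search engine maintains for you:

    from collections import defaultdict
    from pathlib import Path
    import re

    def build_index(content_dir):
        """Toy inverted index: map every word to the set of files containing it."""
        index = defaultdict(set)
        for path in Path(content_dir).glob("*.txt"):
            for word in re.findall(r"\w+", path.read_text().lower()):
                index[word].add(path.name)
        return index

    # Arbitrary queries become cheap lookups instead of a scan over every file.
    index = build_index("articles")  # "articles" is a hypothetical content directory
    print(sorted(index.get("database", set())))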

------
byoung2
_I'm designing an information-heavy web site (static content, no user
content), and have been wondering the best way to do it._

Look into Movable Type (<http://movabletype.com/>). It is written in Perl and
it writes static HTML files. I did the redesign of
<http://www.steves-digicams.com/> in Movable Type 4.23, and there are over
12,000 pages. Movable Type generates the static files at publish time, in
about as much time as WordPress takes.

A big advantage of using static files on Steve's Digicams is that it is very
easy to replicate across a load-balanced network of servers without the hassle
of having to load-balance MySQL with a master-slave configuration.

~~~
prewett
Thanks for pointing me to MT. I looked at it a bit, and it seems promising. It
looks like it stores everything in a database and generates the static files
from that. How do you deal with a major site update? Would you copy the
database to the dev machine, do the update, and then have some SQL magic to
merge in only the parts that have changed? (You can't simply copy it back,
since then you'd lose all the forum posts made since the copy to dev.)

------
ismarc
Disk I/O is a huge limiting factor in a filesystem-based approach to data
persistence. Using a database for persistence allows the DB to schedule
flushing state to disk independently of the actual requests. In the situation
you're specifically describing, where there's nothing but static pages,
there's no point in using a framework that's designed around building dynamic
pages. You could just as easily write (or use) a templating system that takes
content files and generates the HTML files from them, and run it whenever you
have new content.

~~~
prewett
Wouldn't disk I/O be equally bad for a database? I assume the database gets
cached in memory (as much as possible), but I would think that the file cache
would have the same effect.

~~~
ismarc
The database can make all of its disk I/O operations asynchronous because it
provides an interface on top of the filesystem itself. What you described
would require either your own implementation of similar functionality (in
which case you'd be better off using an existing system such as Berkeley DB)
or actual disk I/O on every request.
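
To illustrate what "scheduling the flush independently of the request" means, here is a toy model (not how any particular database is implemented): the request only touches memory, and a background thread batches pending writes to disk on its own schedule, so one disk write can cover many requests.

    import threading, time

    class WriteBehindLog:
        """Toy write-behind buffer: callers append in memory; a background
        thread periodically flushes the batch to disk."""

        def __init__(self, path, interval=1.0):
            self.path = path
            self.buffer = []
            self.lock = threading.Lock()
            threading.Thread(target=self._flusher, args=(interval,), daemon=True).start()

        def write(self, record):
            # Returns immediately; no disk I/O on the request path.
            with self.lock:
                self.buffer.append(record)

        def _flusher(self, interval):
            while True:
                time.sleep(interval)
                with self.lock:
                    pending, self.buffer = self.buffer, []
                if pending:
                    # One batched disk write covers many requests.
                    with open(self.path, "a") as f:
                        f.write("\n".join(pending) + "\n")

    # Usage sketch: log = WriteBehindLog("data.log"); log.write("new comment")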

