
Viaweb FAQ (2004) - tosh
http://www.paulgraham.com/vwfaq.html
======
tannhaeuser
> _What database did you use? We didn't use one. We just stored everything in
> files. The Unix file system is pretty good at not losing your data, [...] It
> is a common mistake to think of Web-based apps as interfaces to databases
> ..._

This. I mean, URLs were designed to encode Unix file names. Today you'd
probably say 'it's a common mistake to think of Web-based apps as interfaces
to "REST APIs"'.

Now the question of all questions is why Viaweb was recreated in a non-Lisp
language (though that has probably been discussed to death here) ...

~~~
lisper
The unix file system _is_ a database, just not a particularly capable one. It
is a hierarchical key-value database, with the file/directory name being the
key. If your data fits that schema you're golden, otherwise not so much.

~~~
tannhaeuser
True, but a bit reductionist. A Unix file system also gives certain
guarantees. For example, you can atomically rename a complete directory tree
while client programs keep accessing the "old" one, you have a large choice of
excellent SCMs for tracking files, you can have networked and/or distributed
file systems, uniform permissions or ACLs ... all of which come in handy if
you're running a web server.
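
The atomic-rename guarantee is the one web apps lean on most. A minimal Python sketch (the `atomic_write` helper is made up, not anything from Viaweb): on POSIX, rename() within one file system is atomic, so a reader sees either the old contents or the new ones, never a half-written file.

```python
import os
import tempfile

def atomic_write(path, data):
    # Hypothetical helper: write to a temp file in the same directory,
    # fsync it, then rename() over the target. rename() within one file
    # system is atomic on POSIX, so concurrent readers never observe a
    # partial write; processes holding the old file open keep reading it.
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```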

------
bluedino
>> We didn't use one. We just stored everything in files

Meaning what? What kind of file formats? What did you query the data with,
etc?

>> It turned out the Yahoo accounting department used Oracle.

I remember reading some Oracle advertisement (PC Magazine circa 2001) where a
company like Amazon switched to them 'in one day'. It was probably the
marketing department or something.

~~~
sillysaurusx
If it's anything like Hacker News, the way it works is as follows:

1. On startup, load all items into memory from the files.

2. Whenever an item is changed, save it to disk.

In modern times, you can store each item as a separate .json file, for
example.

With this technique, there is no risk of data corruption. There is a risk of
inconsistency; e.g. if I remember correctly, when you upvote an item, the vote
is saved to disk, then the author's karma is incremented, and finally the
author's profile is saved to disk. If the webserver dies between saving the
vote and saving the karma count, the karma will no longer be the proper value.
Stuff like that. Such things tend not to matter if you design it carefully,
though.

EDIT: I was curious what the order of operations was, so I pulled up HN's old
source code:

        (unless (or (author user i)
                    (and (is ip i!ip) (~editor user))
                    (is i!type 'pollopt))
          (++ (karma i!by) (case dir up 1 down -1))
          (save-prof i!by))
        (wipe (comment-cache* i!id)))
      (push vote i!votes)
      (save-item i)
      (push (list (seconds) i!id i!by (sitename i!url) dir)
            (uvar user votes))
      (= ((votes* user) i!id) vote)
      (save-votes user)
      (zap [firstn votewindow* _] (uvar user votes))
      (save-prof user)
      (push (cons i!id vote) recent-votes*))))

The user's karma is incremented / decremented, then the user's profile is
saved; the vote is added to an item's votes, then the item is saved; the vote
is stored in the global votes table, then the votes table is saved; the vote
is added to the user's "votes" list, then the user's profile is saved.

~~~
spenczar5
Do all servers share a file system, or is there some sort of copying done in
the background?

~~~
ubercow13
As far as I remember, HN runs on a single core of a single server. I wonder
whether this has changed more recently, though.

------
PaulDavisThe1st
> It is a common mistake to think of Web-based apps as interfaces to
> databases. Desktop apps aren't just interfaces to databases; why should Web-
> based apps be any different? The hard part is not where you store the data,
> but what the software does.

There's an alternate take on this point. Maybe the mistake is to _not_ think
of desktop apps as interfaces to databases (regardless of what the software
actually does).

------
wozer
Is one process per user still viable? It sounds like it would not scale / not
work well with cloud deployments.

~~~
spenczar5
Not viable for handling many users at once on a single unix box, no. Context
switching between the processes starts to dominate your time. It’s an old-
school approach that is really tidy at relatively low concurrency.

In the early 2000s, this area was sometimes called the “C10k problem” - can
you handle 10,000 concurrent connections on one machine? See
[https://en.m.wikipedia.org/wiki/C10k_problem](https://en.m.wikipedia.org/wiki/C10k_problem)

These days, most servers can blow way past that, even into the millions, but
none that I'm aware of do it with a process-per-connection model.

~~~
tannhaeuser
It depends. If you're caching ("weakly", e.g. by revalidating If-Modified-
Since against the file's mtime, or even "strongly", e.g. aggressively caching
responses), which you should be, you're creating processes only for a small
fraction of requests. The remaining major overhead is reparsing the dynamic-
language backend code, which you can further reduce by using native code.

~~~
spenczar5
The cache is served by an HTTP server still, though. What you are describing
is a sort of hybrid, where most requests are handled in threads (either OS or
green threads) but a small fraction get their own process. I think I agree
that that could work, but it sounds a bit different from creating a process
per connection.

I don’t think parsing code is the major overhead. It's not really about
starting the processes so much as switching between them when concurrently
handling a bunch of requests.

------
janvdberg
Are there any old screenshots around of what the editor or backend interface
for a Viaweb user looked like?

------
tosh
the faq reads like an essay on minimalism and first-principles thinking

