Ask PG: Database, flat files or other for YC News?

Zak · on Jan 16, 2008

This has been covered before. IIRC, it's all in memory. The data structures are saved to disk as s-expressions and read on application startup.

pg · on Jan 16, 2008

Right. Except now to make restarts faster I use lazy loading.
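
A minimal sketch of the load-on-demand idea described above, written in Python rather than Arc. It assumes one file per item and uses JSON in place of s-expressions; the `LazyStore` class, its methods, and the file layout are hypothetical, not taken from the News source.

```python
import json
import os
import tempfile


class LazyStore:
    """Keeps items in memory, reading each one from its own file on first use."""

    def __init__(self, directory):
        self.directory = directory
        self.cache = {}  # item id -> item already loaded into memory
        os.makedirs(directory, exist_ok=True)

    def _path(self, item_id):
        return os.path.join(self.directory, f"{item_id}.json")

    def save(self, item_id, item):
        # Update memory first, then serialize the item to its own file.
        self.cache[item_id] = item
        with open(self._path(item_id), "w") as f:
            json.dump(item, f)

    def get(self, item_id):
        # Startup reads nothing; the disk is touched only on a cache miss.
        if item_id not in self.cache:
            with open(self._path(item_id)) as f:
                self.cache[item_id] = json.load(f)
        return self.cache[item_id]


def demo():
    directory = tempfile.mkdtemp()
    LazyStore(directory).save(1, {"title": "Ask PG", "score": 10})
    restarted = LazyStore(directory)  # simulates a restart with an empty cache
    loaded_before = len(restarted.cache)
    item = restarted.get(1)
    return loaded_before, item, len(restarted.cache)
```

The restart costs nothing up front because `get` pays the read cost per item, the first time that item is requested.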

petermichaux · on Jan 16, 2008

Please write about how you do this and how you interact with the files (loading and serializing). I'm looking at alternatives to relational databases, and information about how to avoid data corruption (features analogous to transactions) is scarce. How would you convince the developer of a mission-critical site that this is safe?

ntoshev · on Jan 16, 2008

No idea how pg handles it, but here is an easy do-it-yourself way: http://armstrongonsoftware.blogspot.com/2006/09/pure-and-sim...

It doesn't matter that it is in Erlang; you can do it in any language. A Lisp that implements software transactional memory is Clojure (runs on the JVM): http://clojure.sourceforge.net/

pg · on Jan 16, 2008

I wrap code that changes things within a call to atomic, which prevents the thread from switching in the middle of it. That solves the problem of two threads trying to modify an object at the same time. I don't add any protections against e.g. the host machine's power being shut off in the middle of writing a file, though maybe MzScheme does.

The answer to your specific question, though, is that I wouldn't try. News isn't written like banking software.
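
A rough Python sketch of the two concerns raised above. A lock stands in for the `atomic` call, and a write-to-temp-then-rename step adds the power-loss protection that pg says he does not implement. The `atomic_update` function and the JSON format are assumptions for illustration only.

```python
import json
import os
import tempfile
import threading

_lock = threading.Lock()  # hypothetical stand-in for the `atomic` call


def atomic_update(path, state, change):
    """Apply change(state) and persist the result, one writer at a time."""
    with _lock:
        change(state)
        directory = os.path.dirname(os.path.abspath(path))
        # Write to a temp file in the same directory so the rename stays
        # on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())  # push the bytes to disk before renaming
            os.replace(tmp, path)  # readers see the old file or the new one
        except BaseException:
            os.unlink(tmp)
            raise


def demo():
    path = os.path.join(tempfile.mkdtemp(), "state.json")
    state = {"votes": 0}

    def upvote(s):
        s["votes"] += 1

    threads = [
        threading.Thread(target=atomic_update, args=(path, state, upvote))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(path) as f:
        return json.load(f)
```

The lock addresses two threads modifying an object at once; the rename addresses a crash in the middle of a write, since the old file stays intact until the new one is complete.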

andyn · on Jan 16, 2008

Perhaps you want to look into the object databases that are available for your language. For example ZODB[1] or Durus[2] for Python.

[1] http://www.zope.org/Products/StandaloneZODB

[2] http://www.mems-exchange.org/software/durus/

wlievens · on Jan 16, 2008

Transaction isolation would be handled as in any application that isn't backed by a database: use your language and/or library's native thread synchronization features.

As for persisting transactions, you could marshal the deltas for each transaction to a file, and at regular intervals apply them to the full image to create an up-to-date image.
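
The delta-and-image scheme described above can be sketched roughly as follows, in Python. This assumes the state is a flat dictionary and each delta is a set of key updates; `JournaledStore` and the file names are made up for the example.

```python
import json
import os
import tempfile


class JournaledStore:
    """In-memory dict persisted as a full image plus an append-only delta log."""

    def __init__(self, directory):
        self.image_path = os.path.join(directory, "image.json")
        self.log_path = os.path.join(directory, "deltas.log")
        self.data = {}
        self._recover()

    def _recover(self):
        # Load the last full image, then replay the deltas written after it.
        # A torn final log line (crash mid-append) is not handled here.
        if os.path.exists(self.image_path):
            with open(self.image_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    self.data.update(json.loads(line))

    def commit(self, delta):
        # Log the delta durably before applying it in memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(delta) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data.update(delta)

    def checkpoint(self):
        # Write a fresh image, swap it in, then empty the log.
        tmp = self.image_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.image_path)
        open(self.log_path, "w").close()


def demo():
    directory = tempfile.mkdtemp()
    store = JournaledStore(directory)
    store.commit({"a": 1})
    store.checkpoint()
    store.commit({"b": 2})  # exists only in the log at this point
    return JournaledStore(directory).data  # simulated restart
```

Because each delta only sets keys, replaying a log on top of an image that already includes it is harmless, so a crash between the image swap and the log truncation does not corrupt the state.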

euccastro · on Jan 16, 2008

> use your language and/or library's native thread synchronization features

You may try using a single thread and cooperative multitasking. It helps if your language makes this convenient, e.g., Stackless Python or Scheme:

http://news.ycombinator.com/item?id=45561

Just remember to yield every now and then if you do anything lengthy. Depending on your application, it may not be that bad, and if you don't need anything fancier in the way of scheduling fairness, it makes your life really simple.
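
For illustration, a small single-threaded round-robin scheduler in plain Python, using generators rather than Stackless Python or Scheme continuations. The `run` and `worker` names are hypothetical.

```python
from collections import deque


def run(tasks):
    """Run generator-based tasks round-robin on a single thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task until its next yield
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)  # otherwise send it to the back of the line


def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))  # no lock needed: nothing can preempt us here
        yield  # hand control back to the scheduler


def demo():
    log = []
    run([worker("a", 2, log), worker("b", 3, log)])
    return log
```

A task can only be interrupted at a `yield`, so every stretch of code between two yields is effectively a critical section.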

ralph · on Jan 16, 2008

Yep, the CSP style of channel communication is great. Those thinking Erlang is better than sliced bread need to make sure they're up on what came before and after: http://swtch.com/~rsc/thread/ Personally, I find Erlang too clunky as a language. It may be good for special-purpose telephone switching software, but for general programming the CSP style can be done in nicer ways than switching to a whole new language.
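
A rough approximation of the channel style in Python, using only the standard library. Note the assumption: a `queue.Queue` with `maxsize=1` is a one-slot buffer, not a true CSP rendezvous, and the `producer`, `squarer`, and `DONE` names are invented for this sketch.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out, n):
    for i in range(n):
        out.put(i)  # blocks while the one-slot channel is full
    out.put(DONE)


def squarer(inp, out):
    # A process that shares no state: it only reads one channel and writes another.
    while True:
        value = inp.get()
        if value is DONE:
            out.put(DONE)
            return
        out.put(value * value)


def demo():
    numbers = queue.Queue(maxsize=1)
    squares = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(numbers, 4)),
        threading.Thread(target=squarer, args=(numbers, squares)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        value = squares.get()
        if value is DONE:
            break
        results.append(value)
    for t in threads:
        t.join()
    return results
```

Each stage owns its data and communicates only through channels, so no locks appear in the stage code itself.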

jgrahamc · on Jan 16, 2008

Not often we hear about CSP here. I did my doctorate in it (Hoare was the head of department at the time). Are you using CSP for something real?

ralph · on Jan 16, 2008

Not CSP, no, but libraries that bolt onto existing languages, e.g. C, Python. I'd love to see it become more of a mainstream technique in Python than `import threading', etc.

For those that wonder what we're wittering on about, visit http://www.cs.kent.ac.uk/teaching/07/modules/CO/6/31/slides/ and read these PDF slides in this order.

    motivation.pdf -- just pages 1-39
    basics.pdf
    applying.pdf
    choice.pdf
    replicators.pdf
    protocol.pdf
    shared-etc.pdf

Ignore the crufty Occam (Transputer anyone?) syntax, just concentrate on the concepts. Although some of the PDFs seem to have many pages, often a page is the same as the previous with a minor change; they're slides!

lojic · on Jan 30, 2008

Would you mind elaborating on the persistence a little? What is the granularity of persistence? For example, does each submission get written to its own file with associated comments? Do you utilize a memory mapped file? Inquiring minds want to know :)

vdm · on Jan 18, 2008

> This has been covered before.

Link?