Ask PG: Database, flat files or other for YC News?
24 points by iamelgringo on Jan 16, 2008 | 13 comments
I know that you wrote that you used flat files for Viaweb. I was wondering whether you're still doing that for YC News, or whether you've broken down and started using a database or rolled your own data store with Arc.



This has been covered before. IIRC, it's all in memory. The data structures are saved to disk as s-expressions and read on application startup.
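A minimal Python sketch of that scheme, assuming one file per item in a directory. The layout and the names `save_item` and `load_all` are hypothetical, not how YC News actually stores things; Python's `repr` and `ast.literal_eval` stand in for Lisp's write and read of s-expressions:

```python
import ast
import os
import tempfile

# Hypothetical layout: one file per item, named by its integer id,
# holding the item as a Python literal (our stand-in for an s-expression).

def save_item(store_dir: str, item_id: int, item: dict) -> None:
    """Serialize one in-memory item to its own file."""
    os.makedirs(store_dir, exist_ok=True)
    with open(os.path.join(store_dir, str(item_id)), "w", encoding="utf-8") as f:
        f.write(repr(item))

def load_all(store_dir: str) -> dict:
    """At startup, read every item file back into one in-memory dict."""
    items = {}
    for name in os.listdir(store_dir):
        with open(os.path.join(store_dir, name), encoding="utf-8") as f:
            items[int(name)] = ast.literal_eval(f.read())
    return items

# Usage: write an item, then "restart" by reading everything back.
demo_dir = tempfile.mkdtemp()
save_item(demo_dir, 1, {"title": "Ask PG", "by": "iamelgringo", "kids": [2, 3]})
items = load_all(demo_dir)
```

All reads are then plain dictionary lookups; the disk is only touched on startup and on writes.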


Right. Except now to make restarts faster I use lazy loading.
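Lazy loading could look something like this in Python, assuming the same hypothetical one-literal-file-per-item layout. `LazyStore` is an illustrative name, not pg's code: startup reads nothing, and each item's file is read the first time that item is requested, then cached in memory:

```python
import ast
import os
import tempfile

class LazyStore:
    """Maps item id -> item, reading each item's file only on first access."""

    def __init__(self, store_dir: str):
        self.store_dir = store_dir
        self._cache = {}
        self.loads = 0  # number of disk reads so far, to make the laziness visible

    def __getitem__(self, item_id: int):
        if item_id not in self._cache:
            path = os.path.join(self.store_dir, str(item_id))
            with open(path, encoding="utf-8") as f:
                self._cache[item_id] = ast.literal_eval(f.read())
            self.loads += 1
        return self._cache[item_id]

# Usage: three items on disk, but only the one we ask for gets read.
demo_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(demo_dir, str(i)), "w", encoding="utf-8") as f:
        f.write(repr({"id": i}))

store = LazyStore(demo_dir)  # "restart": no files read yet
first = store[1]             # first access reads the file
again = store[1]             # second access is served from memory
```

The trade-off is that the first request for a cold item pays the disk read instead of the whole restart paying for every item.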


Please write about how you do this and how you interact with the files (loading and serializing). I'm looking at alternatives to relational databases, and information on avoiding data corruption (features analogous to transactions) is scarce. How would you convince a mission-critical site developer that this is safe?


No idea how pg handles it, but here is an easy do-it-yourself way: http://armstrongonsoftware.blogspot.com/2006/09/pure-and-sim...

It doesn't matter that it's in Erlang; you can do it in any language. A Lisp that implements software transactional memory is Clojure (runs on the JVM): http://clojure.sourceforge.net/
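As a rough illustration of the "any language" point, here is a Python sketch of optimistic transactions: read values along with version tags, compute, and commit only if none of the versions you read have changed, retrying otherwise. This is a generic technique in the spirit of that post, not a port of it; `VersionedStore` and `increment` are hypothetical names:

```python
import threading

class VersionedStore:
    """key -> (version, value); commits succeed only if the versions read are unchanged."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # makes each check-and-write step indivisible

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def commit(self, reads, writes):
        """reads: {key: version seen}; writes: {key: new value}. True on success."""
        with self._lock:
            for key, seen in reads.items():
                if self._data.get(key, (0, None))[0] != seen:
                    return False  # another transaction committed first
            for key, value in writes.items():
                version = self._data.get(key, (0, None))[0]
                self._data[key] = (version + 1, value)
            return True

def increment(store, key):
    """Retry loop: re-read and recompute until the commit goes through."""
    while True:
        version, value = store.read(key)
        if store.commit({key: version}, {key: (value or 0) + 1}):
            return

# Usage: four threads each add 1000 votes without losing any updates.
store = VersionedStore()

def bump_many():
    for _ in range(1000):
        increment(store, "votes")

workers = [threading.Thread(target=bump_many) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The lock is held only for the short validate-and-write step; the computation itself runs unlocked and is simply redone if it loses a race.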


I wrap code that changes things within a call to atomic, which prevents the thread from switching in the middle of it. That solves the problem of two threads trying to modify an object at the same time. I don't add any protections against e.g. the host machine's power being shut off in the middle of writing a file, though maybe MzScheme does.
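A Python approximation of both halves of that comment, under stated assumptions. `atomic` here is a hypothetical context manager around one global re-entrant lock; unlike Arc's atomic it does not stop thread switching, it only excludes other code that also uses `atomic()`. `write_file_safely` (also hypothetical) shows one common way to add the power-loss protection pg says he omits: write a temp file, fsync it, then rename it over the original:

```python
import os
import tempfile
import threading
from contextlib import contextmanager

_global_lock = threading.RLock()  # one lock shared by every atomic() block

@contextmanager
def atomic():
    """Run the block while no other atomic() block runs; nesting is allowed."""
    with _global_lock:
        yield

def write_file_safely(path: str, text: str) -> None:
    """Replace path's contents so a crash leaves either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem, so rename works
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
        # Strictly, POSIX durability also wants an fsync of the directory here.
    except BaseException:
        os.unlink(tmp)
        raise

# Usage: concurrent updates inside atomic(), then a crash-safe save.
votes = {"count": 0}

def vote_many():
    for _ in range(1000):
        with atomic():
            votes["count"] += 1

threads = [threading.Thread(target=vote_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

demo_path = os.path.join(tempfile.mkdtemp(), "votes")
with atomic():
    write_file_safely(demo_path, repr(votes))
```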

The answer to your specific question, though, is that I wouldn't try. News isn't written like banking software.


Perhaps you want to look into the object databases that are available for your language. For example ZODB[1] or Durus[2] for Python.

[1] http://www.zope.org/Products/StandaloneZODB

[2] http://www.mems-exchange.org/software/durus/


Transaction isolation would be handled like any non-database backed application: use your language and/or library's native thread synchronization features.

As for persisting transactions, you could marshal each transaction's delta to a file and, at regular intervals, apply the accumulated deltas to the full image to produce an up-to-date image.
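A minimal Python sketch of that snapshot-plus-deltas idea, assuming each delta is a simple "set key to value" (deletes aren't handled) and using Python literals as the on-disk format. `JournaledStore` and its file names are hypothetical. Every write is appended to a journal and fsynced; `compact` folds the journal into a fresh snapshot; on startup the snapshot is loaded and the journal replayed over it:

```python
import ast
import os
import tempfile

class JournaledStore:
    """In-memory dict persisted as a snapshot file plus an append-only journal."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot")
        self.journal_path = os.path.join(directory, "journal")
        self.data = {}
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = ast.literal_eval(f.read())
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as f:
                for line in f:  # replay deltas newer than the snapshot
                    key, value = ast.literal_eval(line)
                    self.data[key] = value
        self.journal = open(self.journal_path, "a", encoding="utf-8")

    def set(self, key, value):
        """Apply one transaction's delta in memory and append it to the journal."""
        self.data[key] = value
        self.journal.write(repr((key, value)) + "\n")  # repr escapes newlines
        self.journal.flush()
        os.fsync(self.journal.fileno())

    def compact(self):
        """Write the full image as a new snapshot, then start an empty journal."""
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(repr(self.data))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        self.journal.close()
        self.journal = open(self.journal_path, "w", encoding="utf-8")  # truncate

    def close(self):
        self.journal.close()

# Usage: write, restart, compact, write again, restart.
demo_dir = tempfile.mkdtemp()
s = JournaledStore(demo_dir)
s.set("item1", {"title": "Ask PG"})
s.set("item2", {"title": "Re: Ask PG"})
s.close()

reopened = JournaledStore(demo_dir)  # no snapshot yet; rebuilt from the journal
reopened.compact()
reopened.set("item1", {"title": "Ask PG (edited)"})
reopened.close()

final = JournaledStore(demo_dir)  # snapshot plus one replayed delta
final.close()
```

Because each delta overwrites a key, replaying a journal that was already folded into the snapshot (say, after a crash between the rename and the truncate) gives the same result.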


> use your language and/or library's native thread synchronization features

You may try using a single thread and cooperative multitasking. It helps if your language makes this convenient, e.g., Stackless Python or Scheme:

http://news.ycombinator.com/item?id=45561

Just remember to yield every now and then if you do anything lengthy. Depending on your application, it may not be that bad, and if you don't need anything fancier in the way of scheduling fairness, it makes your life really simple.
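Python generators are enough to sketch the idea. The round-robin `run` function below is a hypothetical toy scheduler, not Stackless's API: each task runs until its next `yield`, so code between yields needs no locks because nothing else can run there:

```python
from collections import deque

def run(tasks):
    """Round-robin: resume each task until its next yield, then requeue it."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; don't requeue it
        ready.append(task)

log = []

def handler(name, steps):
    for i in range(steps):
        log.append(f"{name}:{i}")  # safe without locks: no switch until we yield
        yield  # the "yield every now and then" point

run([handler("a", 2), handler("b", 3)])
# log is now ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

A task that does something lengthy without yielding stalls every other task, which is exactly the caveat above.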


Yep, the CSP style of channel communication is great. Those who think Erlang is better than sliced bread need to make sure they're up on what came before and after: http://swtch.com/~rsc/thread/ Personally, I find Erlang too clunky as a language. It may be good for special-purpose telephone switching software, but for general programming the CSP style can be done in nicer ways than switching to a whole new language.
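To show CSP-style channels bolted onto an existing language, here is a small Python sketch of an unbuffered (rendezvous) channel built from the standard library: `send` blocks until a receiver has taken the value, as in CSP. `Channel`, `producer`, `consumer` and the `None` end-of-stream marker are hypothetical, and there is no CSP choice (ALT) here:

```python
import queue
import threading

class Channel:
    """Unbuffered channel: send() returns only after a receiver has the value."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._taken = threading.Semaphore(0)
        self._send_lock = threading.Lock()  # one sender in the handoff at a time

    def send(self, value):
        with self._send_lock:
            self._slot.put(value)
            self._taken.acquire()  # wait for the receiver to take it

    def recv(self):
        value = self._slot.get()
        self._taken.release()  # let the waiting sender continue
        return value

def producer(out, n):
    for i in range(n):
        out.send(i * i)
    out.send(None)  # hypothetical end-of-stream marker

def consumer(inp, results):
    while (value := inp.recv()) is not None:
        results.append(value)

# Usage: two threads that share nothing but the channel.
ch = Channel()
results = []
t1 = threading.Thread(target=producer, args=(ch, 5))
t2 = threading.Thread(target=consumer, args=(ch, results))
t1.start()
t2.start()
t1.join()
t2.join()
```

The threads communicate only through the channel, which is the discipline the CSP style asks for.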


Not often we hear about CSP here. I did my doctorate in it (Hoare was the head of department at the time). Are you using CSP for something real?


Not CSP, no, but libraries that bolt onto existing languages, e.g. C, Python. I'd love to see it become more of a mainstream technique in Python than `import threading`, etc.

For those who wonder what we're wittering on about, visit http://www.cs.kent.ac.uk/teaching/07/modules/CO/6/31/slides/ and read these PDF slides in this order:

    motivation.pdf -- just pages 1-39
    basics.pdf
    applying.pdf
    choice.pdf
    replicators.pdf
    protocol.pdf
    shared-etc.pdf
Ignore the crufty Occam (Transputer, anyone?) syntax and concentrate on the concepts. Although some of the PDFs seem to have many pages, often a page is the same as the previous one with a minor change; they're slides!


Would you mind elaborating on the persistence a little? What is the granularity of persistence? For example, does each submission get written to its own file with associated comments? Do you utilize a memory mapped file? Inquiring minds want to know :)


>This has been covered before.

Link?





