"Object oriented models are fine if you write one application. But they tie functionality to data very closely and that makes repurposing of data for separate applications much more difficult. So how do they analyse their data or use it in more than one application?"

This was an extremely poorly-written and poorly-researched article. One problem is that the projects being discussed are not object databases in the sense of storing straight-up serialized representations of business objects.

Take CouchDB for example; the basic "thing" you store in it and get back from it is a JSON object, which is a set of key/value pairs. In this sense it's not too far removed from SQL-based DBs, because you still choose which bits of data you're going to store (the keys in the object). It departs from the SQL point of view in not requiring that all the records have the same schema, and in not having a representation (at the data-storage level) of relations between records.

This turns out to offer some big advantages: in my experience, the number-one cause of unwieldy SQL is a schema that's had to grow over time to accommodate ever more edge cases. Maybe it's been done through lots of nullable columns, maybe it's been done through lots of related tables or some other mechanism, but it frequently has to be done and ends up making the database painful to work with.

CouchDB throws that out the window: edge-case records simply go in like anything else, and if they don't have some particular field present in other records, so what? The "query" is actually hitting a map/reduce in which the "map" function can take whatever action it wants with a record that's missing some particular key. It can skip that record, it can spit out a default value for a missing key, it can do anything it likes.
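To make that concrete, here is a rough sketch of the idea in plain Python. It is an in-memory stand-in, not CouchDB's actual API (real CouchDB views are normally written as JavaScript functions); the names `docs`, `map_order_totals` and `run_view` are made up for illustration.

```python
# Hypothetical in-memory stand-in for a map/reduce view; NOT the CouchDB API.
docs = [
    {"_id": "a", "type": "order", "customer": "alice", "total": 30},
    {"_id": "b", "type": "order", "customer": "bob"},             # edge case: no "total"
    {"_id": "c", "type": "note", "text": "call back on Monday"},  # completely different shape
    {"_id": "d", "type": "order", "customer": "alice", "total": 12},
]

def map_order_totals(doc):
    """Emit (key, value) pairs; the map function decides how to treat odd records."""
    if doc.get("type") != "order":
        return  # skip records this view does not care about
    # Spit out a default value when the key is missing.
    yield doc["customer"], doc.get("total", 0)

def run_view(docs, map_fn, reduce_fn):
    """Apply map_fn to every document, group the emitted values by key, then reduce."""
    grouped = {}
    for doc in docs:
        for key, value in map_fn(doc):
            grouped.setdefault(key, []).append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}

totals = run_view(docs, map_order_totals, sum)
print(totals)
```

The record with no "total" and the unrelated "note" record both sit in the same store as the regular orders; only the map function has an opinion about them.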

"I'm convinced that data has a different life cycle than procedural code and therefore needs to be expressed in a simple, uniform, reductionist model independent of all application code."

Simple and independent, yes. Reductionist and uniform, no.

To run with CouchDB as the example, those assumptions are thrown out because CouchDB essentially adds a layer to the stack. Traditionally, you have application code over here, querying data over there. In CouchDB, you have application code which queries a CouchDB view which returns data, but the query doesn't necessarily know anything at all about what the actual data in the DB is, or how it's structured (or even if it is structured in any sensible way; maybe it's just a bunch of random key/value pairs). The view layer is the part which cares about that.

And views are not static (or mostly static) things like the schemas in relational DBs; views are free to evolve over time, you're free to add or remove views in response to changing needs, and the underlying data never has to change as a result. And so you don't need to agonize over the most efficient way to reduce your data to a uniform schema. You don't need to "migrate" your underlying data storage representation to change the types of things you can store or the types of queries you can run.
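A small sketch of what "views evolve, data doesn't" might look like, again as an in-memory Python simulation rather than real CouchDB code. The view names `by_city` and `by_lang` and the helper `query` are hypothetical.

```python
import copy

# Hypothetical stored documents with no shared schema.
docs = [
    {"_id": "1", "name": "Ada", "city": "London"},
    {"_id": "2", "name": "Grace"},                                   # no "city"
    {"_id": "3", "name": "Linus", "city": "Helsinki", "lang": "C"},  # extra "lang"
]

def by_city(doc):
    # Original view: only documents that have a city are indexed.
    if "city" in doc:
        yield doc["city"], doc["name"]

def by_lang(doc):
    # View added later: documents without "lang" fall under a default key.
    yield doc.get("lang", "unknown"), doc["name"]

def query(docs, view):
    """Build an index by running the view function over every stored document."""
    index = {}
    for doc in docs:
        for key, value in view(doc):
            index.setdefault(key, []).append(value)
    return index

views = {"by_city": by_city}
snapshot = copy.deepcopy(docs)   # remember what the stored data looked like

views["by_lang"] = by_lang       # a new need shows up; add a view, touch nothing else

city_index = query(docs, views["by_city"])
lang_index = query(docs, views["by_lang"])
print(city_index, lang_index, docs == snapshot)
```

Adding `by_lang` required no migration step: the stored documents are identical before and after, and the old view keeps working.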

I think data independence all but implies a reductionist data model (uniform probably isn't a very clear term). And when I say data model I don't mean a particular schema but the primitives that are used to model the data, like relations and attributes in the relational model or key/value pairs in Berkeley DB.

I cannot say anything useful about CouchDB as I don't know it nearly well enough. What I think doesn't work is to hide data behind a procedural API when it comes to read access (write access is a different matter).

A procedural API is a black box that you cannot reason about, and it has a very application-specific purpose that doesn't lend itself to analytics apps. Analytics apps should know as little as possible about particular applications. They cannot easily call arbitrary functions.

My experience with data-centric apps is that it's a good thing to have that situation where everything is a table and each of the few operations you have creates another table. Tables in, tables out. The same thing works with lists, key/value pairs, etc.
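A minimal sketch of what I mean by "tables in, tables out", using lists of dicts as tables. The three operations and the sample data are illustrative only and not taken from any particular database.

```python
# A "table" here is just a list of dict rows; every operation returns another table.

def select(table, predicate):
    """Keep the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """Keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in table]

def join(left, right, key):
    """Naive nested-loop equi-join on a shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Hypothetical sample data.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bo"}]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 1, "total": 12},
    {"user_id": 2, "total": 5},
]

# Because each result is itself a table, the operations compose freely.
big = project(
    select(join(users, orders, "user_id"), lambda r: r["total"] > 10),
    ["name", "total"],
)
print(big)
```

An analytics tool only needs to know these few operations; it never has to call anything specific to the application that produced the data.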

My battle cry would be "No application-specific APIs" (for data access).

