
There seem to be two quite contradictory arguments for NoSQL. One camp says, we already have what we need in the form of [programming language] objects, so why would we take everything apart and map it to something as reductionist as relational tables?

The other camp says, relational tables are soooo complicated, we need something even more reductionist and free form, so key-value pairs are just right.

I can understand the second argument, but the first is a step backwards unless your programming language is something like Lisp, which has a reductionist data model itself.

Object oriented models are fine if you write one application. But they tie functionality to data very closely and that makes repurposing of data for separate applications much more difficult. So how do they analyse their data or use it in more than one application?

They have to write a lot of code using an application-specific object oriented API instead of a general purpose reductionist data model. That's horrible, but just how horrible only shows a few years down the road. It's a disaster in the making. A whole army of programmers will be required to extract data through the big APIs of all those legacy apps.

I'm convinced that data has a different life cycle than procedural code and therefore needs to be expressed in a simple, uniform, reductionist model independent of all application code.




>>Object oriented models are fine if you write one application. But they tie functionality to data very closely and that makes repurposing of data for separate applications much more difficult.<<

I've taken a different tack in my research: create a generalized user object and then build application objects on top of it, with the base user object responsible for all persistent data. (These objects can each be mapped into an XML document, so there's a fair amount of flexibility.)

The work has gone very slowly, but here's the site: http://agilewiki.ning.com/. Mind you, I have mixed in some hype/BS, though it is mostly things I intend to build and have previously prototyped (I've been working on this for over 6 years now).
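The layering described above (one base object owning all persistent data, application objects adding behavior on top, each mappable to XML) can be sketched roughly as follows. This is a minimal illustration under my own assumptions: `UserObject`, `Invoice`, `to_xml`, and `with_tax` are hypothetical names, not AgileWiki's actual design or API.

```python
import xml.etree.ElementTree as ET


class UserObject:
    """Hypothetical base object: owns all persistent state as plain key/value fields."""

    def __init__(self, kind, **fields):
        self.kind = kind
        self.fields = dict(fields)  # every persistent value lives here, nowhere else

    def to_xml(self):
        # Serialize the persistent state as a flat XML document.
        root = ET.Element(self.kind)
        for key, value in self.fields.items():
            child = ET.SubElement(root, key)
            child.text = str(value)
        return ET.tostring(root, encoding="unicode")


class Invoice(UserObject):
    """Hypothetical application object: adds behavior but stores nothing of its own."""

    def __init__(self, number, amount):
        super().__init__("invoice", number=number, amount=amount)

    def with_tax(self, rate):
        return self.fields["amount"] * (1 + rate)


inv = Invoice(number=17, amount=100)
print(inv.to_xml())  # <invoice><number>17</number><amount>100</amount></invoice>
```

Because the persistent data sits only in the base object's fields, another application could read the XML without knowing anything about `Invoice`.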


"Object oriented models are fine if you write one application. But they tie functionality to data very closely and that makes repurposing of data for separate applications much more difficult. So how do they analyse their data or use it in more than one application?"

This was an extremely poorly written and poorly researched article. One problem is that the projects being discussed are not object databases in the sense of storing straight-up serialized representations of business objects.

Take CouchDB for example; the basic "thing" you store in it and get back from it is a JSON object, which is a set of key/value pairs. In this sense it's not too far removed from SQL-based DBs, because you still choose which bits of data you're going to store (the keys in the object). It departs from the SQL point of view in not requiring that all the records have the same schema, and in not having a representation (at the data-storage level) of relations between records.
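To make the "no shared schema" point concrete, here is a small sketch using Python's `json` module on two made-up documents (the field names are my own examples, not from any real CouchDB database). Each document is just a set of key/value pairs, and one of them carries a key the other lacks.

```python
import json

# Two documents from the same (hypothetical) database; they do not share a schema.
raw_docs = [
    '{"_id": "a1", "type": "post", "title": "Hello", "author": "ann"}',
    '{"_id": "a2", "type": "post", "title": "Edge case", "author": "bob", "tags": ["nosql"]}',
]

docs = [json.loads(d) for d in raw_docs]

# Keys present in at least one document vs. keys present in every document.
all_keys = set().union(*(d.keys() for d in docs))
common_keys = set.intersection(*(set(d) for d in docs))

print(sorted(all_keys - common_keys))  # ['tags']
```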

This turns out to offer some big advantages: in my experience, the number-one cause of unwieldy SQL is a schema that's had to grow over time to accommodate ever more edge cases. Maybe it's been done through lots of nullable columns, maybe it's been done through lots of related tables or some other mechanism, but it frequently has to be done and ends up making the database painful to work with.
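The "lots of nullable columns" pattern looks something like this in practice. The sketch uses Python's built-in `sqlite3` with an in-memory database; the `customer` table and its columns are hypothetical examples of edge cases piling up, not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (name) VALUES ('Ann')")

# Edge case 1: some customers have a VAT number, so add a nullable column.
conn.execute("ALTER TABLE customer ADD COLUMN vat_number TEXT")
conn.execute("INSERT INTO customer (name, vat_number) VALUES ('Bob GmbH', 'DE123')")

# Edge case 2: some customers belong to a parent company; another nullable column.
conn.execute("ALTER TABLE customer ADD COLUMN parent_company_id INTEGER")

rows = conn.execute(
    "SELECT name, vat_number, parent_company_id FROM customer ORDER BY id"
).fetchall()
print(rows)  # [('Ann', None, None), ('Bob GmbH', 'DE123', None)]
```

Every older row silently picks up NULLs for each new column, and every query has to cope with them.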

CouchDB throws that out the window: edge-case records simply go in like anything else, and if they don't have some particular field present in other records, so what? The "query" is actually hitting a map/reduce in which the "map" function can take whatever action it wants with a record that's missing some particular key. It can skip that record, it can spit out a default value for a missing key, it can do anything it likes.
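Real CouchDB view functions are written in JavaScript and call `emit(key, value)`. The sketch below is only a pure-Python simulation of the same idea, with a hypothetical `run_view` helper standing in for the view engine. The map function skips documents it doesn't care about and emits a default when a key is missing.

```python
from collections import defaultdict

docs = [
    {"_id": "a1", "type": "post", "author": "ann", "words": 120},
    {"_id": "a2", "type": "post", "author": "bob"},                 # no "words" key
    {"_id": "a3", "type": "comment", "author": "ann", "words": 15},
    {"_id": "a4", "type": "post", "words": 40},                     # no "author" key
]


def map_words_by_author(doc):
    # Skip documents that are not posts.
    if doc.get("type") != "post":
        return
    # Emit a default value for a missing key instead of failing.
    yield doc.get("author", "unknown"), doc.get("words", 0)


def reduce_sum(values):
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    # Group emitted values by key, then reduce each group.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


print(run_view(docs, map_words_by_author, reduce_sum))
# {'ann': 120, 'bob': 0, 'unknown': 40}
```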

"I'm convinced that data has a different life cycle than procedural code and therefore needs to be expressed in a simple, uniform, reductionist model independent of all application code."

Simple and independent, yes. Reductionist and uniform, no.

To run with CouchDB as the example, those assumptions are thrown out because CouchDB essentially adds a layer to the stack. Traditionally, you have application code over here, querying data over there. In CouchDB, you have application code which queries a CouchDB view which returns data, but the query doesn't necessarily know anything at all about what the actual data in the DB is, or how it's structured (or even if it is structured in any sensible way; maybe it's just a bunch of random key/value pairs). The view layer is the part which cares about that.

And views are not static (or mostly static) things like the schemas in relational DBs; views are free to evolve over time, you're free to add or remove views in response to changing needs, and the underlying data never has to change as a result. And so you don't need to agonize over the most efficient way to reduce your data to a uniform schema. You don't need to "migrate" your underlying data storage representation to change the types of things you can store or the types of queries you can run.
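A rough way to picture that extra layer: application code asks for a view by name, and a new view can be added later without touching stored documents. In CouchDB itself, views live in design documents; the `views` dictionary and `query` function below are hypothetical stand-ins for illustration, not CouchDB's API.

```python
# Stored documents; the newest one has a shape the older ones lack.
docs = [
    {"_id": "u1", "name": "Ann", "email": "ann@example.com"},
    {"_id": "u2", "name": "Bob"},
    {"_id": "u3", "name": "Cy", "email": "cy@example.com", "country": "NZ"},
]

# Hypothetical view registry: each view is a map function returning (key, id) pairs.
views = {
    "by_name": lambda doc: [(doc["name"], doc["_id"])],
}


def query(view_name):
    # Application code only knows the view name, not how documents are structured.
    map_fn = views[view_name]
    return sorted(pair for doc in docs for pair in map_fn(doc))


print(query("by_name"))  # [('Ann', 'u1'), ('Bob', 'u2'), ('Cy', 'u3')]

# A new need appears: add a view. No migration; the documents stay exactly as they are.
views["by_country"] = lambda doc: [(doc["country"], doc["_id"])] if "country" in doc else []
print(query("by_country"))  # [('NZ', 'u3')]
```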


I think data independence all but implies a reductionist data model (uniform probably isn't a very clear term). And when I say data model I don't mean a particular schema but the primitives that are used to model the data, like relations and attributes in the relational model or key value pairs in Berkeley DB.

I cannot say anything useful about CouchDB as I don't know it nearly well enough. What I think doesn't work is to hide data behind a procedural API when it comes to read access (write access is a different matter).

A procedural API is a black box that you cannot reason about and has a very application specific purpose that doesn't lend itself to analytics apps. Analytics apps should know as little as possible about particular applications. They cannot easily call arbitrary functions.
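Here is a small illustration of why a generic model helps analytics: a single aggregation function that works on any table of plain rows without knowing which application produced them. The tables and the `count_by` helper are hypothetical examples; the point is only that no application-specific call is needed.

```python
from collections import Counter

# Data exposed through a generic model: plain rows of attribute/value pairs.
orders = [
    {"order_id": 1, "customer": "ann", "status": "shipped"},
    {"order_id": 2, "customer": "bob", "status": "open"},
    {"order_id": 3, "customer": "ann", "status": "open"},
]
tickets = [
    {"ticket_id": 7, "status": "open"},
    {"ticket_id": 8, "status": "closed"},
]


def count_by(table, column):
    """Count rows per value of `column`; knows nothing about the app that wrote them."""
    return dict(Counter(row.get(column) for row in table))


print(count_by(orders, "status"))   # {'shipped': 1, 'open': 2}
print(count_by(tickets, "status"))  # {'open': 1, 'closed': 1}
```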

My experience with data centric apps is that it's a good thing to have that situation where everything is a table and each of the few operations you have creates another table. Tables in, tables out. The same thing works with lists, key/value pairs, etc.
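"Tables in, tables out" is the closure property: every operation takes tables and returns a table, so operations compose freely. A minimal sketch with tables as lists of dicts; `select`, `project`, and `natural_join` are my own simplified versions of the relational operators, not any particular library's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = List[Row]


def select(table: Table, predicate: Callable[[Row], bool]) -> Table:
    # Keep only the rows that satisfy the predicate.
    return [row for row in table if predicate(row)]


def project(table: Table, columns: List[str]) -> Table:
    # Keep only the named columns, dropping duplicate rows like a relation would.
    seen, result = set(), []
    for row in table:
        projected = {c: row[c] for c in columns}
        key = tuple(projected[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(projected)
    return result


def natural_join(left: Table, right: Table) -> Table:
    # Combine rows that agree on every column name they share.
    result = []
    for lrow in left:
        for rrow in right:
            shared = set(lrow) & set(rrow)
            if all(lrow[c] == rrow[c] for c in shared):
                result.append({**lrow, **rrow})
    return result


customers = [{"cust": "ann", "city": "Oslo"}, {"cust": "bob", "city": "Rome"}]
orders = [{"order": 1, "cust": "ann"}, {"order": 2, "cust": "ann"}, {"order": 3, "cust": "bob"}]

# Each step returns a table, so the output of one operation feeds straight into the next.
result = project(select(natural_join(orders, customers), lambda r: r["city"] == "Oslo"), ["cust", "city"])
print(result)  # [{'cust': 'ann', 'city': 'Oslo'}]
```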

My battle cry would be "No application-specific APIs" (for data access).



