

Writing a non-relational Django backend - wkornewald
http://www.allbuttonspressed.com/blog/django/2010/04/Writing-a-non-relational-Django-backend

======
kingkilr
I (author of multiple-database support for Django) am extremely uncomfortable
with the approach these guys are taking. The proper approach would be to work
with Django itself to make things more flexible (when you're subclassing
something named SQLCompiler for a non-SQL backend it should be a hint you've
done something wrong).

~~~
twanschik
If something named "SQLCompiler" also fits the needs of non-relational
databases, then naming it "SQLCompiler" was the mistake!

~~~
kingkilr
As I said on Reddit, the name isn't misleading. SQLCompiler is used for
generating SQL, and assumes relational database concepts like joins, table
aliases, and aggregation.

~~~
wkornewald
Now, here comes the surprise: That's exactly what we want to emulate with
Django nonrel. It's exactly the feature set we badly want to have. This is the
reason why the SQLQuery/SQLCompiler stuff fits so well (though, you're right,
we're only using a fraction of SQLCompiler's code, so we could actually
implement a simplified base class which is used by both, but we can discuss
this later on the django-developers group).
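wkornewald's idea of a simplified base class shared by the SQL and non-relational paths can be sketched in plain Python. Everything below (`SimpleQuery`, `BaseCompiler`, `SQLStyleCompiler`, `DictStoreCompiler`) is hypothetical and deliberately unrelated to Django's real `SQLCompiler` internals; it only shows one backend-neutral query description being compiled two different ways.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleQuery:
    """Hypothetical backend-neutral query: a table plus equality filters."""
    table: str
    filters: dict = field(default_factory=dict)  # column -> required value


class BaseCompiler:
    """Hypothetical shared base: stores the query; subclasses decide how to compile it."""

    def __init__(self, query):
        self.query = query

    def compile(self):
        raise NotImplementedError


class SQLStyleCompiler(BaseCompiler):
    """Compiles the query into a parameterized SQL string plus its parameters."""

    def compile(self):
        # Table and column names are trusted here; only values become parameters.
        sql = f"SELECT * FROM {self.query.table}"
        params = list(self.query.filters.values())
        if self.query.filters:
            clauses = [f"{column} = %s" for column in self.query.filters]
            sql += " WHERE " + " AND ".join(clauses)
        return sql, params


class DictStoreCompiler(BaseCompiler):
    """Compiles the same query into a function that scans an in-memory store."""

    def compile(self):
        table = self.query.table
        filters = dict(self.query.filters)

        def run(store):
            # `store` is assumed to be {table_name: [row_dict, ...]}.
            return [
                row for row in store.get(table, [])
                if all(row.get(k) == v for k, v in filters.items())
            ]

        return run


# Usage: one query description, two backends.
query = SimpleQuery("users", {"first": "Martha"})
sql, params = SQLStyleCompiler(query).compile()
store = {"users": [{"first": "Martha"}, {"first": "Bob"}]}
rows = DictStoreCompiler(query).compile()(store)
```

The design point of such a split is that the shared base knows only the query's logical content, so the non-SQL backend does not inherit SQL-specific machinery such as table aliases or JOIN generation.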

------
wkornewald
We need help to get official Django support for non-relational / NoSQL DBs.
Currently, we only have an App Engine backend and a half-finished MongoDB
backend. We have to demonstrate to the Django team that our approach works
with a wider variety of DBs. If we achieve that goal Django would have a way
to write portable code for a lot of DB types. We'd also have the foundation
for much more powerful features like automatic denormalization. Please help us
find backend contributors.

~~~
rbanffy
I have decided to give Cassandra a shot. It will take some work to get up to
speed, but you can count on me for that.

~~~
joshfinnie
I know this is off-topic, but what is the best resource to learn Cassandra? I
can't find anything good searching through Google.

~~~
kingkilr
There's a PDF from Digg from a NoSQL meetup (NoSQL East, I think), and it has a
link to a document someone there wrote up on understanding the Cassandra data
model. It's pretty good. You'll have to google around for it, though.

------
m0th87
This article raises the question "what for?"

* Document stores map pretty easily to object-oriented systems, and thus negate the need for an ORM

* Distributed hash tables don't have the feature sets to warrant an ORM abstraction

I guess it might work for Cassandra et al. But IMO ORMs barely work for SQL
databases. How well will they map on to column stores?
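m0th87's first bullet can be illustrated with a Python dataclass: a document converts to an object and back with no mapping layer beyond `User(**doc)` and `dataclasses.asdict`. The `User` class and its fields are made up for this sketch.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class User:
    first: str
    last: str
    # Nested data stays inside the document instead of living in another table.
    preferences: dict = field(default_factory=dict)


doc = {"first": "Martha", "last": "Jones", "preferences": {"theme": "dark"}}

user = User(**doc)         # document -> object
round_trip = asdict(user)  # object -> document (asdict copies nested dicts)
```

Note that no JOIN is involved: the nested preferences travel with the user document.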

~~~
wkornewald
It's for creating an abstraction layer that lets you write portable apps and
that provides a much more powerful API than boring hashmaps (as you might have
noticed, lots of APIs for those DBs try to get away from the hashmap
representation). How often have you written manual denormalization code or
manual in-memory JOIN code? How often did you manually keep track of counters?
What are you trying to achieve with that, anyway? You're "manualating"
features you get for free with SQL. That's waste. It's something you shouldn't
have to do. It can be automated and that's our long-term goal.
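For readers who haven't had to write it, the hand-rolled code described here looks roughly like this: a manual in-memory JOIN plus a counter maintained by hand on every write. The `users`/`posts` collections, the `post_count` field, and both helpers are hypothetical, with plain dicts standing in for the database.

```python
# Hypothetical collections; plain dicts/lists stand in for a key-value store.
users = {
    1: {"name": "Martha", "post_count": 0},
    2: {"name": "Bob", "post_count": 0},
}
posts = []


def add_post(author_id, title):
    """Insert a post and manually keep the author's counter in sync."""
    posts.append({"author_id": author_id, "title": title})
    # No COUNT(*) available: the counter must be updated by hand on every write.
    users[author_id]["post_count"] += 1


def posts_with_author_names():
    """Manual in-memory JOIN: resolve each post's author_id against `users`."""
    return [
        {"title": post["title"], "author": users[post["author_id"]]["name"]}
        for post in posts
    ]


add_post(1, "Hello")
add_post(1, "Again")
add_post(2, "Hi")
joined = posts_with_author_names()
```

In SQL both pieces would be a single `JOIN` and a `COUNT(*) ... GROUP BY`; automating this bookkeeping is the long-term goal stated above.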

Using these DBs directly is like writing in assembler just to get more speed
(or, in this case, scale). The point is, you can use a high-level language (here, Django's
ORM plus a few "optimization" specifications) to achieve the same result much
faster and in a portable way.

For anything that goes beyond the ORM's features we can still provide
MapReduce and other mechanisms, but again with a higher-level API. Do you
really believe 10 years from now we'll still be using hashmaps to access those
DBs? (just trying to provoke some thought; I don't really believe that you
think that way)
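As a rough picture of the MapReduce style mentioned above, here is a plain-Python, in-memory sketch that counts posts per author with a map step, a group-by-key step, and a reduce step. It is not any particular database's MapReduce API, and the documents are invented.

```python
from collections import defaultdict

documents = [
    {"author": "Martha", "title": "Hello"},
    {"author": "Bob", "title": "Hi"},
    {"author": "Martha", "title": "Again"},
]


def map_fn(doc):
    # Emit (key, value) pairs: one count per post, keyed by author.
    yield doc["author"], 1


def reduce_fn(key, values):
    # Combine all values emitted for a single key.
    return sum(values)


def map_reduce(docs, mapper, reducer):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)  # "shuffle": group emitted values by key
    return {key: reducer(key, values) for key, values in groups.items()}


counts = map_reduce(documents, map_fn, reduce_fn)
```

A higher-level API would generate `map_fn`/`reduce_fn` pairs like these from a query instead of having the developer write them.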

~~~
m0th87
But that is a defense of ORMs, not using ORMs for NoSQL solutions. The value
of document stores for me is that I _don't_ have to worry about JOINs or
normalization, and thus the layer of abstraction between the object-oriented
system and the document store is no longer necessary.

This is opposed to relational databases, where you generally either go with a
DAL if you want to be low-level or an ORM if you're willing to sacrifice some
scalability. And the loss in scalability is not comparable to that of going
from assembly to a higher-level language. ORMs often include unnecessary JOINs
and other logarithmic queries whose performance degrades with the size of the
data set. With programming languages, performance loss is usually "constant"
so it's a much less painful pill to swallow.

~~~
wkornewald
Well, instead of thinking about normalization you now have to think about
denormalization. It's just the other extreme. This isn't really a
philosophical decision. Either you want to have an abstraction layer that
takes care of these things for you or you want to write your code manually. I
prefer the abstraction because it gives me more time to think about the
problems that matter.
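Concretely, "thinking about denormalization" means choosing where to copy data and keeping the copies consistent on every update. A minimal sketch, assuming each post stores a copy of its author's name (the `author_name` field and `rename_user` helper are hypothetical):

```python
# Source of truth for the author's data.
users = {1: {"name": "Martha"}}

# Posts carry a denormalized copy of the author's name, so displaying a post
# needs no JOIN (or second lookup) against `users`.
posts = [
    {"id": 10, "author_id": 1, "author_name": "Martha", "title": "Hello"},
    {"id": 11, "author_id": 1, "author_name": "Martha", "title": "Again"},
]


def rename_user(user_id, new_name):
    """Update the user, then fan the change out to every denormalized copy.

    Returns the number of posts whose copy was rewritten.
    """
    users[user_id]["name"] = new_name
    updated = 0
    for post in posts:
        if post["author_id"] == user_id:
            post["author_name"] = new_name
            updated += 1
    return updated


changed = rename_user(1, "Marta")
```

Skipping the fan-out loop leaves stale copies behind, which is exactly the kind of bookkeeping an abstraction layer would take over.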

~~~
m0th87
Maybe I'm dense, but I still don't understand. I'm guessing denormalization
means going from relational data to non-relational data. When you're working
with a document store, data never has to be normalized, and as a consequence,
it never has to be denormalized either.

Take, for instance, getting a user with the first name 'Martha'. In a document
store, it might look like:

    store.get({'first': 'Martha'})

Whereas in Django's ORM, it might look like:

    User.objects.filter(first='Martha')

And for reference, the SQL might look like:

    query("SELECT * FROM User JOIN Preferences ON User.Id=Preferences.UserId WHERE First=%s", 'Martha')

The first two are obviously simpler than the third. But how is the Django ORM
inherently superior to the document store's? When your atomic unit is a
document, there's no longer a need for Django models.
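To make the comparison concrete, the sketch below runs this scenario both ways in memory: a relational schema in SQLite, where preferences sit in a separate table and need the JOIN, and a list of documents with the preferences embedded. The schema, field names, and the `find` helper are invented for illustration; `find` stands in for a document store's query-by-example call and is not any particular product's API.

```python
import sqlite3

# Relational version: preferences live in their own table, so reading a user
# together with their preferences needs a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE preferences (user_id INTEGER REFERENCES users(id), theme TEXT);
    INSERT INTO users (id, first_name) VALUES (1, 'Martha'), (2, 'Bob');
    INSERT INTO preferences (user_id, theme) VALUES (1, 'dark'), (2, 'light');
""")
sql_rows = conn.execute(
    "SELECT users.first_name, preferences.theme FROM users "
    "JOIN preferences ON users.id = preferences.user_id "
    "WHERE users.first_name = ?",
    ("Martha",),
).fetchall()
conn.close()

# Document version: preferences are embedded in the user document,
# so one match-by-example lookup returns everything.
documents = [
    {"first_name": "Martha", "preferences": {"theme": "dark"}},
    {"first_name": "Bob", "preferences": {"theme": "light"}},
]


def find(docs, example):
    """Return documents whose top-level fields equal those in `example`."""
    return [d for d in docs if all(d.get(k) == v for k, v in example.items())]


doc_rows = find(documents, {"first_name": "Martha"})
```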

------
urlwolf
If you use RDF stores/graph databases, this Python ORM is worth watching:
<http://code.google.com/p/surfrdf/> . It's like ActiveRDF in the Ruby world, but
with the advantage of having solid RDF libraries (RDFLib for Python is 3-4 years
old now), compared to the buzzing mess of new RDF libraries for Ruby, most of
them still very preliminary.

