
SQLAlchemy and You - megaman821
http://lucumr.pocoo.org/2011/7/19/sqlachemy-and-you/
======
espeed
This article is by Armin Ronacher, the author of Flask. I use Flask for most
of my projects because it's decoupled from the ORM and provides a clean way
for you to use whatever database tool is right for the job.

But lately I have moved away from using the relational database as the primary
datastore. When I do need a relational database, it's simple to use SQLAlchemy
in Flask.

But I have been finding graph databases as a more elegant alternative to
relational databases because you don't have to mess with tables or joins --
everything is explicitly joined.

Neo4j is one of the leading open-source graph databases, and it's pretty sweet
-- it can store 32 billion nodes while doing 2 million traversals per second
-- and you can use Gremlin with it (Gremlin is a graph query language, like
SQL for graphs).

The graph-database world is starting to emerge, but most graph databases are
Java based so non-JVM bindings are not always available.

However, there is a project called TinkerPop (<http://www.tinkerpop.com>) that
has made most of the open-source software stack for graph DBs. TinkerPop is
the project that developed Gremlin and Blueprints, and it released a REST
service called Rexster that is tied to Blueprints so it lets you connect to
any Blueprints-enabled graph database, including Neo4j, OrientDB, Dex,
OpenRDF, and an InfiniteGraph implementation is being released next month.

Because the graph-DB world lacked Python support, I decided to write a Python
persistence framework that connects to Rexster so you can now use any of those
graph DBs from Python, and binary bindings are in the works. The framework is
called Bulbflow, and you can find out more about it out at
<http://bulbflow.com/overview>.

In a few weeks I'm going to release Flask-based libraries that do
authentication and authorization in a graph-DB way.

~~~
zzzeek
You've been posting quite a bit about graph DBs lately including one the other
day which stated that "the relational DB is dying as the primary datastore" as
well as "Graphs are a much more elegant way of storing relational data".
(source: <http://news.ycombinator.com/item?id=2778945>).

I have to disagree with you here. Graph DBs are a much more elegant way to
store graph data, i.e. social media data, some forms of scientific data. But
for the vast majority of relational database usage, i.e. finance, accounting,
retail, medical records, human resources, academia, reservation databases,
etc. the truly vast amount of databases we interact with indirectly all day
long, these systems are firmly in the relational camp and that is not changing
anytime soon. SQLAlchemy would very much like Python to be prevalent
throughout all these industries, and is not just about social media
applications.

~~~
espeed
Yes, that was in the context of Django as well. Look at the trend for MongoDB
compared to PostgreSQL
([http://www.google.com/trends?q=mongodb%2C+postgresql&cta...](http://www.google.com/trends?q=mongodb%2C+postgresql&ctab=0&geo=all&date=all&sort=1)).

Because Flask is decoupled, it let's you use whatever tool is right for the
job. If you need to store financial data, it's easy to use SQLAlchemy, but
graph databases are much better fit for work working with stuff like the
modern-day social graph.

And don't get me wrong, I like SQLAlchemy -- I have used it for years and
continue to use it -- thank you for all the work you've put into it.

~~~
bretthoerner
> Because Flask is decoupled, it let's you use whatever tool is right for the
> job.

(Derail alert)

How is it any more difficult to use a graph database in Django? (Hint: if you
say "but then you can't use X" my next question is "are you implying Flask
includes X and you can automagically use X with a graph database?")

~~~
espeed
You absolutely can use a graph DB in Django. But if you're not using a
relational DB as your primary datastore (for stuff like authentication and
authorization) and not using the ORM, how much of Django would you really use?

Django was built around an ORM. Now you have more database options that have
less of an impedance mismatch, but because Django is so ORM-centric, forgoing
the RDBMS in favor of another option means you now have a framework mismatch.

Starting with Flask, which was designed to be decoupled, simplifies things.

~~~
simonw
"How much of Django would you really use?"

The URL mapper, the view abstraction, the middleware, the debugging tools, the
generic views concept, the httprequest and httpresponse classes, the
templating system, the caching framework, the syndication library and a bunch
of the third party ecosystem tools.

~~~
espeed
Ahh...what you describe sounds a lot like Flask.

~~~
parfe
It sounds like every halfway decent web framework ever written.

~~~
espeed
The point is why not just start with Flask if that's the feature set you're
going to end up with.

~~~
hello_moto
Becaus I'm more familiar and have experienced in using it. Because I decided
to use Django for the other part of my big web application. Because I want to
learn more and get better with my tool as opposed to learn the Python web app
framework of the week.

------
wulczer
SQLAlchemy supports multiple-column primary keys, Django ORM does not. What's
more to think about?

~~~
cturner
If I'm designing a schema, I always have a primary key column called "id" with
an autoincrement sequence against it. Sometimes I'll have a domain table, with
key varchar, description varchar.

Perhaps you deal with existing systems you didn't design. Aside from that -
where is the cause to use a multi-column primary key?

~~~
wulczer
There are strong opinions on it, but I firmly believe that "always have an
autoincremental id" is a very wrong approach.

Read [http://it.toolbox.com/blogs/database-soup/primary-keyvil-
par...](http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327)
(an often-quoted series of blog posts on why surrogate numeric keys are evil).

In general, if you have an entity with a well-defined natural key (users and
their logins, shipments and their tracking codes), why would you use an
autogenerated key that's meaningless?

Case in point, my current schema, designed from scratch. Users can have
multiple dashboards, each dashboard has a label. A user cannot have multiple
dashboards with the same label, but the two users can of course use the same
label for their dashboards (like "sales" or "mydashboard"). The dashboard's
table primary key is (username, label) - that pair uniquely identifies a
dashboard, so it's a perfect PK candidate.

Also, as already noted in one response, many-to-many relationships are
typically modelled by an intermediate table that has foreign keys to both
sides of the many-to-many, and its foreign key is the sum of these foreign
keys.

EDIT: added a concrete example

~~~
cturner
Thanks to you and others for respectful responses on what I didn't realise was
a hot-button issue.

You ask, "In general, if you have an entity with a well-defined natural key
(users and their logins, shipments and their tracking codes), why would you
use an autogenerated key that's meaningless?"

I like my approach because of reduced cog. overload. When I'm designing my
database, or writing queries that jump through several tables, I always know
that my foreign key in table Person [1] is going to be of the form
fk_organistaion_id. All I have to do is remember table names and
relationships. If I also had to remember arbitrary keys, that would increase
memory overload. When I'm doing consulting and working on several applications
in parallel that's less practical. Similarly, I always know I can store a
reference to an ID column and know I'm it will be there and uniquely identify
a row.

I don't think the many-to-many example really answers to my question. The
issue of whether it is commonly done is different to whether it is good
practice.

I can imagine there might be performance advantages to having a composite
primary key for a join table. But - I expect you could get equivalent
performance on id|fk_a_id|fk_b_id by adding an index. This comes back to the
principle of - do you write your code principally to be run, or to be read. I
have a memory like a sieve and write to be read.

I think composite primary keys are done for the wrong reason at times.

The problem in the example in the article linked to is not a poor use of keys,
it's a poor design of system. They're trying to enforce type at the schema
level. Though he doesn't spell it out, I expect the reason they're getting
bizarreness is because they either have people interacting with the database
at too-low a level, or because they have multiple applications hung off it.
More on this in a second. [2]

Similarly, I think your attempt to use the database to enforcing typing rules
will work at some levels but runs out. For example - imagine if a user was
only alowed to have three labels. Or that the label musn't have any spaces in
it.

While you can delve into triggers [3] I think it's misguided to think you can
enforce a general sense of business logic at the database level. That stuff
should be done by an application surrounding the database. Then all
interaction with the database should go through that one-and-only-one system
that owns the database.

Databases have some type information but it's very primitive and inadequate
for all but the most simple of scenarios. I've found that once you acknowledge
that you start designing schemas and the systems around them in a way that is
very different to what you'd learn in the Oracle course.

\--

[1] Another quirk of my style - singular table names - because it makes it
easier to wrap ORMs around it without having code that reads as bizarre

[2] The second example is riduculous. They assigned an int to the wrong place.
That's not a problem with schema design, that's just a stupid mistake.

[3] I've done plenty of this work on hairy enterprise systems

~~~
wulczer
I think I see the fundamental point where we disagree. I try to keep as much
business logic in the database as possible. That's more of a DBA perspective,
I guess: don't trust these pesky app developers, always assume they'll try to
put broken data in your DB.

It's often not possible to model all business rules in SQL, but the more you
manage to cram into the database, the less probable it is you'll end up with
inconsistent data.

Often there are dozens of applications (or independent modules) using the
database. Keeping the business rules (like "every user has to have a first
name" or "no two customers can rent the same car at the same time") in a
centralised place helps a lot with keeping everything in order. It's often not
possible to limit the interaction with the database to a one-and-only system
(imagine different teams writing different parts of the system, app changes,
outsourced integrations, etc)

So, if a label can't have spaces in it, that's easy - slap on a check
constraint that enforces it. Then you know that even if the new hire writes a
"change my label" funcionality and forgets to enforce the no spaces rule, she
won't screw up your data.

If a user can have at most three labels, things get more interesting. Offhand
I'm not sure how I'd model that and it actually is a difficult problem, best
solved with triggers IMHO, as you have better control over locking and can
guard yourself against concurrency issues.

Anyway, the discussion on "all logic in DB" vs "all logic in the app" is part
of the neverending struggle between DBAs and app programmers...

~~~
cturner
haha I can see why we disagree. Same problems, different solutions.

    
    
        Often there are dozens of applications (or independent
        modules) using the database
    

I know! I always assume that someone will try to pull this on me, some
shortsighted bugger always does, and I end up on crusde protecting my platform
against this!

It is fundamentally bad design to have more than one application dependent on
a database, and you're right, it's used everywhere. If I do a PhD one day, it
will be on this topic. It's the number one stupid design in enterprise
computing.

I've spent years trying to write a logic framework to surround schemas to make
it possible to do the functionality of pl/sql, but with stateful sessions.
Every eighteen months I have a new go at it and rewrite.

I have the ideas working, and use the patterns now, but haven't managed to
create a framework of it that would be usable to anyone else.

    
    
        Anyway, the discussion on "all logic in DB" vs "all
        logic in the app" is part of the neverending struggle
        between DBAs and app programmers...
    

Yeah. When I was doing a lot of relational work, I'd enforce these checks by
having business logic layers around my ORM.

------
j-kidd
I think SQLAlchemy is designed for people who are comfortable dealing with SQL
directly, while other ORMs want their users to be oblivious about SQL.

In Chinese, there are 3 stages of life:

1\. Looking at a mountain, it's a mountain - this is when you start learning
about SQL

2\. Looking at a mountain, it's not a mountain - this is when you start using
tools that shell you from SQL

3\. Looking at a mountain, it's still a mountain - this is when you start
using SQLAlchemy

------
sc68cal
_Many people's first experience with a Python ORM (or maybe an ORM altogether)
is the Django one._

Useless anecdote: I started out using SQLObject since I was importing data
from flat files into a relational database, then doing some analysis. When the
analysis became useful to others, that's when I started building a web front-
end with Django.

------
nerd_in_rage
SQLAlchemy is pretty much the only ORM I've used that DIDN'T suck. It's great,
especially after seeing Java / Hibernate hell.

