
Give PonyORM a chance - kozlovsky
http://jakeaustwick.me/why-you-should-give-ponyorm-a-chance
======
datashaman
I do not really understand the desire for ORMs to try and recreate the
experience of writing SQL. Why not just use SQL, then?

I understand that the ORM is trying to smooth over differences in
implementation, providing the possibility of change from one DB access layer
to another.

I have yet to see anyone do that on a real project, which makes me really
wonder at the point of using an ORM at all.

I cannot quantify the amount of time I've lost figuring out what the hell I
must do to generate a relatively simple SQL statement.

~~~
benselme
Well, ORMs are useless if you're of the opinion that string interpolation and
concatenation are a good way of composing logical units. Many of us prefer a
proper API for doing that, and good ORMs like SQLAlchemy provide us with one.

~~~
mantrax5
"foo > 123 or foo IS NULL and bar < 12"

Now write this in SQLAlchemy without strings, without concatenation, and let's
see which provides you with more clarity.

~~~
benselme
(FooBar.foo > 123) | (FooBar.foo == None) & (FooBar.bar < 12)

No strings, and I get an actual expression object instance that I can, for
example, store in a variable and reuse in any number of queries.
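For the curious, the trick here is plain Python operator overloading. Below is a minimal, self-contained sketch (not SQLAlchemy's actual code; the `Expr` and `Column` classes are invented for illustration) of how such an expression becomes a reusable object instead of a string:

```python
# Toy expression builder: comparisons on columns return Expr objects,
# and & / | compose them into larger SQL fragments.
class Expr:
    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        return Expr(f"({self.sql} AND {other.sql})")

    def __or__(self, other):
        return Expr(f"({self.sql} OR {other.sql})")


class Column:
    def __init__(self, name):
        self.name = name

    def __gt__(self, other):
        return Expr(f"{self.name} > {other!r}")

    def __lt__(self, other):
        return Expr(f"{self.name} < {other!r}")

    def __eq__(self, other):
        # "== None" is rendered as IS NULL, mirroring the idiom above.
        if other is None:
            return Expr(f"{self.name} IS NULL")
        return Expr(f"{self.name} = {other!r}")


foo, bar = Column("foo"), Column("bar")

# The criterion is an ordinary object: store it, pass it around, reuse it.
crit = (foo > 123) | (foo == None) & (bar < 12)
print(crit.sql)  # (foo > 123 OR (foo IS NULL AND bar < 12))
```

Note that Python's `&` binds tighter than `|`, just as SQL's `AND` binds tighter than `OR`, so the grouping comes out the same as in the original string.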

~~~
coleifer
Yes! Plus if you have a syntax error Python will catch it for you. Peewee
works the same way.

------
stana
Surprised by the number of negative comments about ORMs. Twice I have had to
migrate projects between different SQL database engines. Once the SQL was
embedded in the application; the second time Perl's DBIx ORM was used from the
beginning. Guess which project was easier. Why would you lock yourself into a
specific database engine unless you really had to? And how hard is it to embed
SQL for a few specific queries even when using an ORM? Lately I have been
heavily using Django's ORM and think it is great. And if your data structures
are so complex that an ORM does not do the job, maybe something is wrong with
your schema. I like my data layer simple if possible, with the complexity in
the application.

~~~
wldlyinaccurate
> And how hard is it to embed SQL for a few specific queries even when using
> ORM

This is the thing that many people seem to forget. Every ORM I've used has
allowed you to write your own SQL for those times when the DBAL is limited or
not performing well. In my experience, those times are rare. Some ORMs (like
Doctrine) even have their own DSL so that you can write complex queries and
still have the benefit of a DBAL.

I'm currently working with a team on a large project where all the SQL is
written by hand, and results are hydrated into plain ol' arrays. The
maintenance overhead for schema changes is _huge_ due to the number of queries
which need to be updated, and there are constant bugs caused by columns being
left out of SELECT statements. On top of that, the testability of the entire
app suffers because most methods talk directly to the DB. It's truly a
nightmare. I've never experienced these problems with an ORM.
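The "hydrated into plain ol' arrays" complaint can be made concrete with a small stdlib-only sketch (the `Article` model and table are hypothetical): mapping rows to named objects means consumers refer to fields by name, so adding a column doesn't silently shift array indexes.

```python
import sqlite3
from dataclasses import dataclass


# A named object instead of a positional array: consumers use .id / .title,
# so a later schema change can't silently shift their indexes.
@dataclass
class Article:
    id: int
    title: str


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO article (title) VALUES ('hello')")

row = conn.execute("SELECT * FROM article").fetchone()
article = Article(id=row["id"], title=row["title"])
print(article)  # Article(id=1, title='hello')
```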

------
sgift
No chance. Sorry Pony, you are probably a great ORM, but I have been burned
too often by ORMs and their "magic", which will "just work" (hahahahahahaha).
Hibernate destroyed my trust in ORMs forever, and even SQLAlchemy (which is
great) wasn't able to fully restore it. I will write my queries myself, so
that I know why they misbehave and can change them, instead of chasing through
obscure layers of magic, reading various logs (if logging is available) and
hoping that my changes in the ORM layer will translate to the SQL I want.

~~~
mistermann
If you can think of one of your most representative examples of a problem you
had I'd be interested to hear.

~~~
angrybits
(I'm not the OP, but I will butt in with my two cents.)

Anything that isn't CRUD or a simple aggregation. Real world applications tend
to have: oddball joins, subqueries, window functions, case statements and all
other sorts of crazy once you get into the value-add parts of the system. What
I really need is a clever way to project the value or set into a structure
that I can easily process or convert into UI elements.

On the subject of CRUD, I would be really impressed with an ORM that would
detect my constraints and enforce them in the application layer. I am a big
fan of DRY, but I am a bigger fan of bulletproof relational models that
prevent bad data from ever getting into the system. So rather than declaring
these sorts of things in the app code, where they won't be enforced for things
like stored procs or ad hoc SQL, I'd much rather have the database be the
single source of the data constraints. For really complicated things (e.g.
validation triggers), I'd want a way to communicate a violation back up to the
ORM layer.
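The "database as the single source of constraints" idea can be sketched with stdlib sqlite3 (the table and `create_account` helper are invented for illustration): the rules live only in the schema, and the application layer just catches the violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")


def create_account(email, balance):
    """Insert a row; surface DB-level violations as an app-level result."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO account (email, balance) VALUES (?, ?)",
                (email, balance),
            )
        return True
    except sqlite3.IntegrityError as exc:
        # UNIQUE and CHECK are declared once, in the schema; the app only
        # reports the violation instead of re-implementing the rules.
        print(f"rejected: {exc}")
        return False


create_account("a@example.com", 100)  # ok
create_account("a@example.com", 50)   # rejected: UNIQUE violation
create_account("b@example.com", -5)   # rejected: CHECK violation
```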

Not sure if this is what you wanted, but for some reason I felt compelled to
brain dump what I have been thinking on the subject as of late.

------
the_mitsuhiko
Why you should not: it's inferior to SQLAlchemy and has a dual licensing model
where one is the useless AGPL and the other is an expensive commercial
license.

Also, I find its implementation to be very questionable.

~~~
reitanqild
As much as the AGPL annoys me, the Pony developers seem to understand it and
to use it in a reasonable way.

What really annoys me is

1\. when people stamp an AGPL license on something (that was previously
licensed under the GPL or something even less restrictive) and claim that it
is still just GPL and everyone can continue using their system as before.

2\. when people make something brilliant and only offer it under AGPL without
providing a commercial licensing option.

~~~
thu
3\. when the code is AGPL, receives contributions, then the original author
offers a commercial license including said contributions without any
contributor agreement.

------
coleifer
I'm the author of a lightweight Python ORM (peewee) and have a healthy respect
for the work that goes into building this type of tool. PonyORM is quite a
feat of software engineering: its developer(s) have done a great job.

I would never use it, though, because decompiling Python generator expressions
just feels _so unpythonic_.
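For context, the "decompiling" in question starts from the generator's code object, which the stdlib `dis` module can inspect (this only shows the raw material Pony works from, not Pony's actual translation code):

```python
import dis

# A generator expression carries its compiled bytecode on gi_code; an ORM
# taking Pony's approach disassembles this and rebuilds an AST from it.
g = (x for x in [] if x > 0)
ops = [ins.opname for ins in dis.get_instructions(g.gi_code)]
print(ops)  # e.g. [..., 'FOR_ITER', ..., 'COMPARE_OP', ..., 'YIELD_VALUE', ...]
```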

~~~
Jake232
The process going on under the hood (the whole disassemble/AST thing) is
pretty unpythonic; however, I would argue it gives the developer the advantage
of a more Pythonic query interface to the database at a higher level.

Provided the things happening underneath don't break (and they haven't on me
yet), and it generates efficient SQL (which it does), I'll take the hidden
complexity in exchange for the higher-level generator syntax.

~~~
coleifer
Totally, and I know you are not alone. I think it's just a matter of
boundaries, and when it comes to the data access layer of my applications I
definitely want to feel very comfortable with the library I'm using. That's
why SQLAlchemy is so popular, IMO: the implementation is rock-freakin'-solid.
The API may be more verbose, but you can understand it, and in doing so trust
it to do the right thing.

------
jstsch
I still feel that RedBeanPHP ([http://redbeanphp.com](http://redbeanphp.com))
has found the right balance between direct object manipulation and SQL. I find
that abstracting away SQL too much gives more headaches than it's worth — SQL
is a valuable skill anyway, and often diving into SQL is much faster and
easier than diving into abstraction magic.

~~~
nnq
I love RedBean, it saved my butt once, but it gets a lot of its power from
doing things "the PHP way": it makes linting and editor/IDE autocompletion
impossible, and there is no single place you can look to find _the part of the
database schema relevant to your application_. You can't just go read a
models.php file and get a bird's-eye view of the database structure that
matters.

It's basically the opposite of "pythonic". I can't imagine something like
RedBean written in a language like Python, and maybe it's for the best :)

------
nnq
> It essentially decompiles the generator into its bytecode using the dis
> module, and then converts the bytecode into an AST.

Um, does anyone know what the performance costs of this are?

~~~
Jake232
Here's a comment on the Stack Overflow answer I linked to in the post:

Very performant: (1) Bytecode decompiling is very fast. (2) Since each query
has corresponding code object, this code object can be used as a cache key.
Because of this, Pony ORM translates each query only once, whereas Django and
SQLAlchemy have to translate the same query again and again. (3) As Pony ORM
uses IdentityMap pattern, it caches query results within the same transaction.
There is a post (in Russian) where the author states that Pony ORM turned out to
be 1.5-3 times faster than Django and SQLAlchemy even without query result
caching:
[http://www.google.com/translate?hl=en&ie=UTF8&sl=auto&tl=en&...](http://www.google.com/translate?hl=en&ie=UTF8&sl=auto&tl=en&u=http%3A%2F%2Fhabrahabr.ru%2Fpost%2F188842%2F)
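Point (2) above is worth a sketch: every evaluation of the same generator-expression source shares one code object, so that object can key a translation cache. This is a stdlib-only illustration of the idea, not Pony's actual implementation (the `select`/`run_query` helpers are invented):

```python
# Translation cache keyed on the generator's code object. Each call to
# run_query builds a NEW generator, but gi_code is the SAME code object
# every time, so the (expensive) translation runs only once.
translated = {}


def select(gen):
    key = gen.gi_code  # identical across calls for the same source line
    if key not in translated:
        # In a real ORM, bytecode -> AST -> SQL translation would go here.
        translated[key] = "SQL for %s:%d" % (key.co_filename, key.co_firstlineno)
    return translated[key]


def run_query(xs):
    return select(x for x in xs)


a = run_query([1, 2])
b = run_query([3, 4])
print(a is b, len(translated))  # True 1 -- translated once, cached thereafter
```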

~~~
zzzeek
the performance cost is going to be mostly in the kinds of queries it produces
and how well they will be interpreted by the query planner. My understanding
is that Pony leans very heavily on subqueries and correlated subqueries, and
the user is given very little control over the structure of the rendered
queries. Subqueries, and especially correlated subqueries, have the worst
performance of all, especially on less mature planners like MySQL's.

~~~
amalashkevich
Actually, Pony can transform subqueries into JOINs in most cases. But when it
translates the 'in' operator of a generator expression, it produces a subquery
with 'IN', because otherwise the programmer could be confused by the resulting
SQL looking too different from the Python code. Pony lets you use the 'JOIN'
hint to make it use JOINs instead of subqueries. In the example below, Pony
produces a subquery when it translates the `in` section of the generator:

    
    
        >>> from pony.orm.examples.estore import *
        >>> select(c for c in Customer if 'iPad' in c.orders.items.product.name)[:]
    
        SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
        FROM "Customer" "c"
        WHERE 'iPad' IN (
            SELECT "product-1"."name"
            FROM "Order" "order-1", "OrderItem" "orderitem-1", "Product" "product-1"
            WHERE "c"."id" = "order-1"."customer"
              AND "order-1"."id" = "orderitem-1"."order"
              AND "orderitem-1"."product" = "product-1"."id"
            )
    

But you can tell Pony to use JOIN instead of a subquery by wrapping the 'in'
section into a 'JOIN' hint:

    
    
        >>> select(c for c in Customer if JOIN('iPad' in c.orders.items.product.name))[:]
    	
        SELECT DISTINCT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
        FROM "Customer" "c", "Order" "order-1", "OrderItem" "orderitem-1", "Product" "product-1"
        WHERE "product-1"."name" = 'iPad'
          AND "c"."id" = "order-1"."customer"
          AND "order-1"."id" = "orderitem-1"."order"
          AND "orderitem-1"."product" = "product-1"."id"

------
mtford
I do a lot of heavy Django work, and at first glance I really like the syntax.
Much cleaner than what I'm used to. I will definitely give this a try at some
point and comment further.

Why the name PonyORM btw? I know it's superficial but I much prefer the name
SQLAlchemy - has more meaning.

~~~
xenonlight
I believe it's a reference to DjangoPony
([http://www.djangopony.com/](http://www.djangopony.com/)).

~~~
amalashkevich
Actually, we named our project approximately two years before the Django Pony
mascot image appeared.

------
megaman821
As someone who has had to maintain other people's code: please use an ORM.
Usually the biggest thing to fix is that someone didn't select related rows and
ended up doing hundreds of queries inside a for loop.
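That "hundreds of queries inside a for loop" failure mode is the classic N+1 problem. A stdlib sqlite3 sketch (the tables are hypothetical) of the loop versus the single JOIN that select-related-style ORM calls generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'X', 1), (2, 'Y', 1), (3, 'Z', 2);
""")

# N+1: one query for the books...
books = conn.execute("SELECT id, title, author_id FROM book").fetchall()
queries = 1
authors = {}
for book_id, title, author_id in books:
    # ...then one MORE round trip per book for its author.
    authors[book_id] = conn.execute(
        "SELECT name FROM author WHERE id = ?", (author_id,)
    ).fetchone()[0]
    queries += 1

print(queries)  # 4 round trips for 3 books

# What a select-related call (or a hand-written JOIN) does instead:
rows = conn.execute("""
    SELECT book.title, author.name
    FROM book JOIN author ON author.id = book.author_id
""").fetchall()
# one round trip, however many books there are
```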

Meanwhile, the hand-written-SQL people leave SQL injection possibilities. They
build up complex queries with crazy string concatenation. They have either no
data-mapping layer or a shitty one (I mean, I really enjoy having to look at
the database to figure out what fields `select * from articles` returns).

Obviously there are going to be queries outside of what any normal ORM can do,
but every ORM I have used gives you an escape hatch to just write raw SQL when
needed.

------
makmanalp
I'm just waiting for zzzeek to publish a blog post on how to implement this on
top of sqlalchemy core, like this post [1]. :) Seriously though, sqlalchemy
ORM layer already does this. Why not leverage all the existing features and
support it has? Also, it's an easier sell for a $100 piece of software: "If
you're using sqlalchemy, you can migrate to pony immediately."

[1] [http://techspot.zzzeek.org/2011/05/17/magic-a-new-orm/](http://techspot.zzzeek.org/2011/05/17/magic-a-new-orm/)

~~~
zzzeek
the AST-on-top-of-SQLAlchemy idea has been discussed for many years prior to
Pony's existence. Robert Brewer's Geniusql
[http://www.aminus.net/geniusql/chrome/common/doc/trunk/manag...](http://www.aminus.net/geniusql/chrome/common/doc/trunk/managing.html#select)
does the same thing (for Geniusql, you pass it lambdas; interpreting
generators directly was more of a "future" thing, but by "future" we're
talking, like four years ago :) ). He presented it at Pycon maybe in 2009,
2010 and right after I had the usual flurry of questions "Can SQLAlchemy do
this?" and I said "sure! just write an adapter, easy enough". I think Robert
was even interested in that.

At the end of the day the AST idea looks very nifty, but IMO it is too rigid
to translate to SQL in a flexible enough way, and it also works against the
main reason you use code structures to produce SQL, which is composability.
When I last saw the Pony creators give a talk, the approach seemed to be that
each time you have a given SELECT and you'd like to add some extra criterion
to it, it pretty much keeps producing subqueries of the original, because each
time you can only wrap the AST construct you already have. It similarly had no
ability to produce a JOIN - at that time at least, the only way to join things
together was by rendering inefficient correlated subqueries. This was asked
explicitly.

If they've found a way to resolve these issues while keeping true to the "AST
all the way" approach and not dropping into a SQLAlchemy-style approach, that
would be great. There's no reason SQLA ORM or Core couldn't be behind a
similar approach as well except that nobody's really had the interest in
producing it.

~~~
amalashkevich
Hi Mike,

Pony had the ability to produce JOINs from the very beginning, but during that
presentation we found that Pony produced subqueries for MySQL, and that was not
very performant, correct. Since then we've improved Pony, and it now has a
query optimizer which replaces subqueries with efficient JOINs where possible.
Here is the query from that presentation:

    
    
        >>> select(c for c in Customer if sum(c.orders.total_price) > 1000)[:]
    

The straightforward way is to use a subquery here, but Pony's optimizer
produces LEFT JOIN because such query usually has better performance:

    
    
        SELECT `c`.`id`
        FROM `customer` `c`
          LEFT JOIN `order` `order-1`
            ON `c`.`id` = `order-1`.`customer`
        GROUP BY `c`.`id`
        HAVING coalesce(SUM(`order-1`.`total_price`), 0) > 1000
    

In our opinion this is the main advantage of Pony ORM: the ability to perform
semantic transformations of a query in order to produce performant SQL while
keeping the text of the Python query as high-level as possible.

~~~
zzzeek
OK, here's something I should understand. Say we start with:

    
    
        myquery = select(c for c in Customer if sum(c.orders.total_price) > 1000)
    

I'm inside of a query builder. Based on conditional logic, I also want to
alter the above statement to include customer.name > 'G'. Intuitively, I'd do
this:

    
    
        mynewquery = select(c for c in myquery if c.name > 'G')
    

which will take the original SELECT, wrap it in a whole new SELECT. Right?

Given "myquery", how do I add, after the fact, a simple "WHERE customer.name >
'G'" to the SELECT? Just continuously wrapping in subqueries is obviously not
feasible.

~~~
amalashkevich
There is no need to wrap it with a query. You can just add a filter:

    
    
        mynewquery = myquery.filter(lambda c: c.name > 'G')
    

The new query will produce the following SQL:

    
    
        SELECT `c`.`id`
        FROM `customer` `c`
          LEFT JOIN `order` `order-1`
            ON `c`.`id` = `order-1`.`customer`
        WHERE `c`.`name` > 'G'
        GROUP BY `c`.`id`
        HAVING coalesce(SUM(`order-1`.`total_price`), 0) > 1000

~~~
zzzeek
Well right, this is exactly a SQLAlchemy-style syntax, except a tad more
verbose :). This is the "dropping into a SQLAlchemy-style approach" I referred
to.

As far as the "AST allows caching" advantage, over at
[https://bitbucket.org/zzzeek/sqlalchemy/issue/3054/new-idea-for-bake](https://bitbucket.org/zzzeek/sqlalchemy/issue/3054/new-idea-for-bake)
we're working out a way to give people access to the "lambda: <X>" ->
cached SQL in a similar way, if they want it.

------
GlennS
The lack of data migration tooling would be a critical problem for me.

The traditional problem that ORMs are supposed to solve is change. Changing
your database schema means you have to change all your queries. So instead you
have a system whereby the program that executes your queries also understands
your schema and can make that change for you.

Data migrations are similar to this, only much harder and more time consuming.

------
mantrax5
I think once a developer reaches a certain level of experience and maturity,
they realize they should stop giving any sort of ORM a chance.

~~~
anton_gogolev
Care to explain as to why?

~~~
datashaman
It's the difference between working on the problem, or working on the bugs /
implementation details of the ORM. I know which I'd rather be working on.

~~~
Demiurge
I've been using Django for 10 years, and I never experienced an ORM bug, and
when it couldn't do some complicated join, I just did a raw query. For
everything else, it saved me tons of time.

Is that long enough experience?

------
hirre
ORMs basically come from "lazy" programmers not wanting to learn SQL and
proper database schema design, and wanting something "that just works"
instantly with little thought or effort... Optimization and the handling of
large data sets are some of the problem areas for ORMs. However, in our case
we decided to use an ORM anyway, because each UPDATE/DELETE/INSERT had to be
signaled out to some low-level code communicating with hardware. If we hadn't
used an ORM (with overloaded save methods), the user would have had to keep
track of all changes himself and signal the hardware manually, so in our case
this was fine (we also didn't manage large data volumes)...
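The overloaded-save pattern described here can be sketched in a few lines (the `Model`, `Valve`, and `Hardware` classes are hypothetical, standing in for whatever ORM base class and hardware bridge were actually used): every write funnels through one hook, so the hardware is notified automatically.

```python
# Stand-in for the low-level component that must hear about every write.
class Hardware:
    def __init__(self):
        self.signals = []

    def notify(self, change):
        self.signals.append(change)


hw = Hardware()


# ORM-style base class: persistence (omitted here) plus a signalling hook.
class Model:
    def save(self):
        # ... persist the row via the ORM ...
        hw.notify((type(self).__name__, "save"))


class Valve(Model):
    pass


Valve().save()
print(hw.signals)  # [('Valve', 'save')]
```

The point is that callers never signal the hardware themselves; forgetting to do so becomes impossible because `save()` is the only write path.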

