Lessons learned defying Joel Spolsky with Django (speakerdeck.com)
138 points by benregn 1425 days ago | 149 comments

Bumping into ORM limitations + moving to Jinja for templates --> one word: Flask

...really, what advantage does Django provide at this point in this project anymore?

Completely replacing the template system with Jinja is silly (no idea if this is what they did) since it significantly reduces the value of the Django ecosystem. Far better to use Jinja for your own work, but leave the Django templates for Django and all the other apps you integrate. I do wish Django would relax its stance on no expressions in the templates. Sometimes it saves a lot of work to just be able to call a function or add two numbers together without needing to build a filter or tag every time.

I've been happy with django-jinja[1] for that purpose. It replaces the context processors so it will load jinja templates if they have a .jinja extension and Django if they are .html. It also includes the django filters in jinja-land.

The ORM is more problematic. I just started a project a couple of months ago and thought a lot about ditching the default ORM in favor of SQLAlchemy. I decided not to, for reasons of expedience and because, as TFA mentions, it already leaks. So, I stuck with the Django ORM and will drop to SQL directly if need be (need being defined by the ORM making the code confusing or there being performance hotspots).

1. https://github.com/niwibe/django-jinja

There are still a lot of Django apps that work fine in this scenario. Plus I don't think the slides imply that they are just blindly replacing ORM calls with raw SQL, just that they have profiled and replaced the hot spots.

Also Flask is a micro-framework and Django is a full-stack framework. Flask can be used as the backbone of a full-stack framework but figuring out a good project structure and finding out what third-party apps to use can be daunting for a person without Flask experience. If you really want to get people to switch, package up a Flask-based framework with the features of Django.

Middlewares, context processors, forms, (class based) views, and tons of third party applications.

The CTO of a startup where some friends are working thought the same as you did, and 2 years ago rewrote everything in Flask. Now they're going back to Django.

Django is much more than an ORM and templates.

Django's middleware is a poorly designed system. Flask has all of the things you mentioned except for forms. For that you can use Flask-WTF, but these libraries are becoming outdated anyway because they’re HTML based and don’t work so well for JSON validation.

I don't write much HTML anymore (it's all frontend templating and API calls), but I use Django Forms everywhere. For instance, tastypie supports using a Django Form to handle validation in your API[1].

[1] - http://django-tastypie.readthedocs.org/en/latest/validation....

I've found Django forms to work extremely well for validating non-form-based data - the API is a neat way of representing a set of validation rules and running them against a set of data.

Hi Simon - I greatly appreciate your work and opinion! The reason I think HTML forms libraries are insufficient for the problem domain: JSON is richer, both in datatypes and structure. We’re working on a forms library of sorts that is better suited for API serialization/validation and having one central place to define models for multiple internal DB’s.


I've found Flask-WTF to work quite well for validating JSON. It looks a little weird at first due to the naming scheme mismatch, but I haven't encountered any actual functional problems with it.

Could you please explain why do you think that? IMHO it's a really clever approach.

In addition to the other replies I have to add - the goddamn Admin. The amount of time it saves me is beyond belief.

If you aren't using the ORM, what does the admin bring to the table? I agree with the value of the admin, and it is one of the reasons I kept the Django ORM in my latest project, even though some coworkers felt it was limiting (it is, but it isn't all about a single component of the system).

"what advantage does Django provide at this point in this project anymore?"

Documentation. I use Django but without any ORM and with Jinja2, so it's basically just Flask but with more stack overflow threads and more third-party software.

Is there any actual advantage that I would be getting by using Flask instead?

Flask’s documentation is great, and the codebase is readable and small. The third party community is not as big as Django’s, but the 3rd party work is verified and approved by Armin (Flask and Jinja2 author). These guidelines encourage authors to publish higher quality code and better documentation. You couldn’t verify my claims, but this is my experience coming from Django a couple of years back. Django’s extensions are often very poor quality.

I’d also like to point out that Flask-Admin has come a long way with adapters for SQLAlchemy and at least one other ORM library. Very usable and extendible.

The advantages aren't really in using Flask, but being able to use a better ORM (or no ORM at all, just SQL abstraction) and template engine.

After working with Django ORM for a large project, I started to rethink the usefulness of models. It turns out just treating data as sets lends to a functional style and is simpler down the road.

> It turns out just treating data as sets lends to a functional style and is simpler down the road.

Could you elaborate on this or point in the direction of something that does? I'm pretty sure I get the gist but having some more meat to chew to make sure the perspective is fleshed out fully would be awesome.

The bottom line is, tying data to objects seems like an elegant idea, but in practice it sucks. ORMs introduce various problems, like statefulness (when the whole point of dealing with a database is atomicity) and mismatches (your data doesn't necessarily map to one row or one object, and it doesn't have to). I'll try to give two examples.

Consider how people often write update code on Django:

    instance = Model.objects.get(id=some_id)
    instance.foo = bar
    instance.save()

This sucks a lot. It's doing two queries, and you have concurrency issues. This would be optimal, since it's atomic:

But in this case, it's not instantiating any object nor triggering any signals, so what's the point of the ORM after all? We might as well just use a sane DB API.
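
The snippet after "since it's atomic" seems to be missing; presumably it was a queryset-level update along the lines of `Model.objects.filter(id=some_id).update(foo=bar)`, which does skip model instantiation and the `save()` signals. To show the concurrency problem without needing Django, here is a rough sketch using Python's built-in sqlite3. The `model` table, its `hits` column and the simulated "concurrent writer" are all made up for illustration, and the full-row write-back is only an assumption about what an ORM-style save does by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, foo TEXT, hits INTEGER)")
conn.execute("INSERT INTO model VALUES (1, 'old', 0)")


def current_hits():
    return conn.execute("SELECT hits FROM model WHERE id = 1").fetchone()[0]


# Object-style: load the whole row, change one attribute, write every column back.
row_id, foo, hits = conn.execute(
    "SELECT id, foo, hits FROM model WHERE id = 1"
).fetchone()
conn.execute("UPDATE model SET hits = hits + 1 WHERE id = 1")  # a concurrent writer sneaks in
foo = "bar"
conn.execute("UPDATE model SET foo = ?, hits = ? WHERE id = ?", (foo, hits, row_id))
hits_after_save = current_hits()  # the concurrent increment has been overwritten

# Set-style: one statement that touches only the column being changed.
conn.execute("UPDATE model SET hits = hits + 1 WHERE id = 1")  # concurrent writer again
conn.execute("UPDATE model SET foo = ? WHERE id = ?", ("baz", 1))
hits_after_update = current_hits()  # this time the increment survives
```

The first variant loses the other writer's change because it writes back a stale copy of the row; the second never reads the row at all.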

Here's another place where Django's ORM fails:

    Model.objects.annotate(Count('foos')).filter(foos__lt=10)

This won't work. Django will complain 'foos' is not a field on the model - even though 'foos' is a column in the result set, and it's perfectly valid to SELECT on it. I reported this as a bug, but because the ORM works with model definitions and the result set can contain any column, what the ORM is really supposed to do or not is murky, so it's WONTFIX. This is one instance where data doesn't map to an object and the ORM concept crumbles.

There are many other pain points with ORMs, and Django's in particular, but these are the highlights for me. For an elegant querying API, in my opinion, check out RethinkDB. It doesn't depend on schemas (therefore, ORMs) and it supports map/reduce semantics, which solves 100% of what you need to do with data.

You are mistaken about filtering on an annotation.

    > Model.objects.annotate(Count('foos')).filter(foos__lt=10)

The only reason this doesn't work is that the attribute annotate() creates isn't called `foos` by default. You just need to do this:

and Django now knows to add a `HAVING COUNT(foos) < 10` clause.
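
The snippet this reply refers to seems to have been lost. As far as I know, Django's default name for `Count('foos')` is `foos__count`, so either filtering on `foos__count__lt=10` or naming the annotation explicitly should work. Below is a small sketch of the shape of SQL such a filter ends up as, run through the stdlib sqlite3 module so it works without Django. The `model` and `foo` tables and the `num_foos` alias are invented for illustration.

```python
import sqlite3

# Django side (not executed here; `num_foos` is an illustrative alias):
#   Model.objects.annotate(num_foos=Count('foos')).filter(num_foos__lt=10)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (id INTEGER PRIMARY KEY);
    CREATE TABLE foo (
        id INTEGER PRIMARY KEY,
        model_id INTEGER REFERENCES model(id)
    );
    INSERT INTO model (id) VALUES (1), (2);
""")
# Model 1 gets 12 related rows, model 2 gets 3.
conn.executemany("INSERT INTO foo (model_id) VALUES (?)", [(1,)] * 12 + [(2,)] * 3)

# The filter on the aggregate becomes a HAVING clause, not a WHERE clause.
rows = conn.execute("""
    SELECT model.id, COUNT(foo.id) AS num_foos
    FROM model
    LEFT OUTER JOIN foo ON foo.model_id = model.id
    GROUP BY model.id
    HAVING COUNT(foo.id) < 10
""").fetchall()
```

Only the model with fewer than 10 related rows comes back, together with its count.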

I also use Django without any ORM and with Jinja2 and I'm taking baby steps towards completely abandoning Django and moving to Flask. For me, the primary benefit is that Flask being so much smaller and simpler than Django means you can understand the entire codebase without an extraordinary amount of effort.

I'm curious as to why you stick with Django (other than having projects already begun relying on it)? Without the ORM and the templates, there's not much I get out of Django.

"I'm curious as to why you stick with Django (other than having projects already begun relying on it)?"

I'm still a relatively new developer, so it was just easier to learn.

In addition to what everyone else has mentioned, the amount of libraries written for Django is staggering. You almost never need to roll your own app -- someone has already done it!

Also, geodjango.

It has an entire ecosystem and tool chain to sort building websites/webapps. When I used to use Flask, before moving to Django, I found myself essentially creating a whole bunch of stuff that Django already has, and is better written.

To throw a completely different wrench into the mix, I mostly just use Django to provide an API anymore, and for session handling.

I love and use Django, and have built large projects where I haven't really run into any limitations with it[1], but for the most part, nowadays my workflow is to pip install django, south, tastypie, then load in a template with Backbone and Marionette, then get to town.

Templates are either Mustache or Underscore, depending.

[1] - Yeah, it could be faster, but if you have a large, confusing database schema that you inherited, the Django ORM is great for getting things stood up, and then just tune the queries after the fact. It's still a huge timesaver vs. writing every query by hand.

Don't use just one ORM and then declare "ORMs are stupid". The "object = None" / "object_id = None" issue illustrated here is certainly not a mistake every ORM makes.

The SQL generated appears correct, unless the underlying database can guarantee that all foreign key constraints are met. That is, I consider this a failure of the user of the ORM to appreciate that the two invocations of filter are not identical.

In the former case, the query is verifying that the object_id field cannot be used to find a foreign object--regardless of the value of object_id. This is exactly what it is asked to do.

In the latter case, the query is simply verifying that object_id is NULL, which is exactly what it's asked to do.

"other_id" and "other" here refer to two different ways of referring to a many-to-one reference between "somemodel" and "othermodel". Asking for rows of "somemodel" where "object_id" is None is the exact same thing as asking for rows from "somemodel" where its reference to "object" is None. Both should produce the same query, the simple one without the JOIN. The query with the JOIN is completely wasteful and not at all correct - it does a LEFT OUTER JOIN to the remote table, only to filter on those rows where the remote table has no match; but this is already obvious from whether or not "somemodel.other_id" is NULL.

You're right in that they are largely the same action, but consider this case:

An Owner is deleted without setting the pets.owner_id to NULL. So now there's a row in Pets with an owner_id referring to an Owner that doesn't exist.

These two queries will (rightfully) return different things in this case.

One will return the Pets that have NULL for owner_id. The other will return the Pets whose owner_id is NULL AND those that do have an owner_id which doesn't reference an existing row in Owner.
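
For anyone who wants to see the difference, here is a small sketch with Python's sqlite3. Foreign key enforcement is switched off explicitly, which lets us create the orphan described above. The `owner`/`pet` schema and the names in it are invented to match the description, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # make the unenforced-FK assumption explicit
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pet (
        id INTEGER PRIMARY KEY,
        name TEXT,
        owner_id INTEGER REFERENCES owner(id)
    );
    INSERT INTO owner VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO pet VALUES (1, 'rex', 1), (2, 'tom', NULL), (3, 'fido', 2);
    -- Delete an owner without touching the pet row: fido is now an orphan.
    DELETE FROM owner WHERE id = 2;
""")

# "Attribute level": only looks at the column itself.
by_column = conn.execute(
    "SELECT name FROM pet WHERE owner_id IS NULL ORDER BY id"
).fetchall()

# "Object level": asks whether a matching owner row can actually be found.
by_join = conn.execute("""
    SELECT pet.name FROM pet
    LEFT OUTER JOIN owner ON pet.owner_id = owner.id
    WHERE owner.id IS NULL
    ORDER BY pet.id
""").fetchall()
```

The column check finds only `tom`; the join also finds `fido`, whose owner_id points at a row that no longer exists.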

OK, you're right in that owner_id can refer to a nonexistent entry, if the schema both doesn't make correct use of constraints and is also referentially corrupt. Designing the ORM to jump through huge hoops to suit the case that the user is using the relational database incorrectly, in such a way that enormous performance overhead is added to the use case as used correctly in the vast majority of cases, is IMO a poor design decision.

Great point, I didn't even think of that! The join is there for a reason, namely that you're getting the entire 'model', not a field. If someone thinks what the ORM does in this case is stupid, they probably think table constraints are stupid.

So the ORM is too stupid to know that in the case the other_id is NULL, it doesn't need to create a join subquery. But would you rather the PostgreSQL query planner figure this out or do it yourself in python? I would like to see the time difference both queries take, and I don't think I would want Django ORM to do this.

I think you should run EXPLAIN ANALYZE on both queries and see what you find.

I now understand that it won't be optimized away because it's doing something, and I think I still want it to do that something. I've run into plenty of ORM limitations and use custom SQL for procedures and custom joins, but I still wouldn't call Django ORM stupid for doing what it's doing, in this case.

Correct, these are semantically different. You’ve described it well. The ORM would be incorrect if it did not generate the SQL that it did.

Part of the issue is confusing object data with metadata. The id is an implementation detail of the data store – metadata. What the user is trying to do here is “talk database” and “talk model” at the same time.

Which is a perfectly good argument against an ORM. But one should commit to a metaphor, or not: http://clipperhouse.com/2012/02/29/suspension-of-disbelief/

If the schema is known to have a FOREIGN KEY constraint on sometable.other_id, then the SQL is as close as can be to wrong without actually returning the wrong answer. In the presence of the FK, the two queries have the same intent. But the first produces a vastly complex query plan for no reason, and this has nothing at all to do with the application/model layer.

Edit: plus! even if you want the same answer assuming that FKs are not in place and that bogus values might be present in other_id, the query is still far less efficient than it should be. You should be doing this:

    select * from sometable where not exists 
    (select 1 from othertable where othertable.id=sometable.other_id)

Compare the query plans on any reasonable database and see (and yes, SQLAlchemy produces the NOT EXISTS form when the relationship is a one-to-many versus many-to-one and you ask it for objects with empty collections).
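
A quick way to try this, sketched with Python's sqlite3 rather than a production database (so the plans you see are SQLite's, which is an assumption on my part about what counts as "reasonable"). The table names follow the query above; the three sample rows are made up. SQLite's `EXPLAIN QUERY PLAN` stands in for PostgreSQL's `EXPLAIN ANALYZE` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE othertable (id INTEGER PRIMARY KEY);
    CREATE TABLE sometable (id INTEGER PRIMARY KEY, other_id INTEGER);
    INSERT INTO othertable VALUES (1);
    -- row 2 has NULL, row 3 points at a row that does not exist
    INSERT INTO sometable VALUES (1, 1), (2, NULL), (3, 99);
""")

LEFT_JOIN = """
    SELECT sometable.id FROM sometable
    LEFT OUTER JOIN othertable ON othertable.id = sometable.other_id
    WHERE othertable.id IS NULL
    ORDER BY sometable.id
"""
NOT_EXISTS = """
    SELECT sometable.id FROM sometable
    WHERE NOT EXISTS
        (SELECT 1 FROM othertable WHERE othertable.id = sometable.other_id)
    ORDER BY sometable.id
"""

join_rows = conn.execute(LEFT_JOIN).fetchall()
exists_rows = conn.execute(NOT_EXISTS).fetchall()

# Plan text differs between engines and versions, so just print it.
for label, sql in (("LEFT JOIN", LEFT_JOIN), ("NOT EXISTS", NOT_EXISTS)):
    print(label)
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", step[-1])
```

Both forms return the same rows, including the one with the dangling reference; how much the plans differ is something to measure on your own data and database.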

“Wrong but works correctly” is a pretty novel definition of wrong.

The argument you are making is that this particular ORM needs optimizations. Which is true! The point stands that the semantics of the two expressions is different, and thus should do different things. The point also stands that the user is mixing metaphors.

If the ORM can be informed of the guarantee that the FK constraint provides, and optimize accordingly, that’s good too. But this doesn’t tell us much about ORMs, except that they can be improved.

Also, a database that has two different query plans for queries that are logically equivalent…needs improvement.

Yes, the first query is correct, and the second one will return incorrect results if you don't keep the consistency of your database in check.

No, the Django ORM is not correct in using the correct query by default. At least not on every database backend. That's because on any database that enforces constraints, it has already created the necessary checks, and although functionally they are equivalent, that line can mark the difference between a 5ms and a 45-minute runtime (that's what happens with my data, not a hypothetical case). There is more to a framework than mathematical correctness.

But then, it should certainly have an option to always use the correct query, and it is a lot of implementation work that I can understand quite well that Django developers don't want to do now. They probably have other priorities.

And by the way, as I said, this is a problem for me (but no, not important enough to make me fix it now, maybe later), but did I stop using an ORM just because of an abstraction leak? Of course not, I encapsulated a solution to this problem and moved on, saving lots of man-hours in the 99% of my code where Django's ORM doesn't leak. That's my main disagreement with the article. Yes, ORMs are stupid, but not using one just because of that is stupid.

And yes, if you let Django templates go, you can easily cut 9ms of your response time. That's great! I wonder how much you'll cut if you rewrite it all on assembly.

Obviously there are subtleties to things that may not be apparent, but his example makes sense to me and so do the SQL queries that are generated. The first query is operating at the "object level" and the second is operating at the "attribute level".

The first is retrieving the object and checking if it exists, and the second is just checking the parent's foreign key. Not sure that they are really the same query, if you have unenforced foreign keys.

This is not "dumb", this is analogous to checking if a pointer is null or that the contents of the pointer are null (which is a distinction that some people want to make).

Why not just write the SQL?

When ORMs first started becoming popular during the 2000s, especially within the Java world, their proponents drummed up a lot of animosity toward SQL.

While a lot of us who had started working with SQL in the 1980s, if not earlier, were perfectly fine with using it, many younger developers were scared away from it by these claims.

So we've had a generation of software developers who were essentially raised to hate SQL, and to embrace ORMs, even after it became clear that ORMs do come with some pretty serious trade-offs, and do not necessarily increase productivity.

Not having a solid grasp of SQL, a lot of these developers just don't realize what they're missing out on. I've seen this first-hand many times before. These developers will spend hours upon hours trying to get their ORM to perform a moderately complex query that could be easily written by hand within a few minutes, including any code necessary to perform the query and to retrieve the result. The time and effort expended on these sorts of queries will very quickly negate any time and effort the ORM may have saved for simpler queries. And these moderately-complex or complex queries always arise in real-world software.

I think that education is the only way to really solve this problem, but a lot of developers are quite set against this. Learning SQL isn't that much of an investment, but the returns it offers can be huge.

I came from that same time, and I think you are partially right. I think a self-perpetuating cycle has occurred where people know less about sql, so they use it less, so they know less about it.

But I also think that people don't like the boilerplate that comes with direct sql access. I think they don't like the impedance that comes in reading and understanding code. And I think they want to hand off annoying, but critical, things like caching to a lower level they don't have to think about.

SQL is an important tool, and any developer, especially one who is using a framework like Django or Rails, would be wise to learn it, but ORMs still have value and it isn't all about "I don't know sql".

I disagree. There's a lot of boilerplate on top of raw SQL that can and should be abstracted away. At some point you'll have to parse out the result and build an object graph anyway; it would be nice if it was done for you already. You can also plug in things like a caching engine easily and transparently.

Most ORM systems will give you options. My experience was with Hibernate, which let you do Object queries, Criteria queries, HQL queries, and finally raw SQL. You hardly ever needed to go down to raw SQL. It's nice not to worry about the particulars of the underlying SQL engine, and certainly you want serialization to be handled for you.

The OP is right though, you can't treat ORM framework as a total blackbox. You need to be aware of what it's doing else you can really get yourself in trouble.

hibernate lets you do object queries, criteria queries, hql queries, and finally raw SQL

So you have to learn

   Object Queries
   Criteria queries
   HQL Queries
   Raw SQL

And this has made your life easier, has it? Hibernate is massively complicated, and you're right, it doesn't insulate us from the database. Not even slightly. So the amount of shit I now have to know has quintupled, just to persist an object!

But hey, BOILERPLATE, right?

>And this has made your life easier has it?

Yes it has. Believe me, it has. And it isn't nearly as bad as you make it out, certainly better than the alternative. If you have relatively simple relational data, and query requirements, you can get away with just Object and Criteria queries. Your code will thank you. For more complex queries, HQL will get you down almost to the bare metal. Why do that in lieu of raw SQL? I mentioned several reasons. One is that the ORM layer does abstract the boilerplate of query to-and-fro serialization. More than that, it enforces constraints. Your DBMS doesn't give a shit whether ages should be in some valid range, or have some sort of valid format, or whatnot. Those constraints are in your object model, which is then automagically transferred to your SQL commands. Another reason is that you can now substitute SQL backends, trivially. Going from MySQL to Postgres is a one-liner. Another reason, caching is completely transparent. You can now plug in any kind of caching engine and strategy with a one-line config change, and it's all completely transparent to your application. Another reason, your framework (JEE or Spring) probably has hooks to your ORM, which makes the integration completely seamless.

The thing is, if you didn't go with an ORM framework, you'd probably roll your own abstraction layer to take care of some (all?) of the above mentioned use-cases. You don't want serialization code, or caching code, or constraint-enforcing code littering your business logic. So this abstraction is good. There may be reasons to forgo an ORM framework, but I sincerely believe the vast majority of use cases will benefit from it.

If you insist, because I really want the months of debugging work we've spent on Hibernate to pay dividends, I really really want the technical debt we've accumulated capitulating to the abstraction's limitations in our design to be paid, and I really really really want to believe that we haven't wasted our time on a silver bullet that fits only trivial cases. I want to believe that serialization, caching, and constraint enforcement are far bigger nightmares than the one I'm going through.

I just don't think that will happen.

"At some point you'll have to parse out result and build an object graph anyway"

This type of thinking has to stop! Sure, some times it may be necessary to extract data from SQL and turn it into some type of object. However, most of the time it's enough just to get the data and work with the data directly.

Be serious. I made this argument in another thread: I don't believe for a second that it's good practice to just work with SQL directly in your business logic. You don't want to litter your code with serialization logic. You don't want to litter your code with constraint checking every time you make a query. Even if you don't use an official ORM framework, you will write an abstraction layer that will duplicate some of the ORM functionality.

I worked with Redis extensively, and I got to the point where it was too dangerous to simply assume that none of the other guys on the team (or I) would put some garbage data in a field, because from Redis' perspective, every key looks the same and every value is as good as the next. We rolled our own abstraction layer, in which keys and values were wrapped in domain specific objects.

Programming languages, and databases are too general to be useful. If you don't 'constrain' them to your domain, you're going to get destroyed once your product or team scales to a certain size.
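
The comment doesn't show what that abstraction layer looked like, so this is only a guess at the general idea: keys can only be built through a domain object, and that object decides which values are acceptable. `UserAgeKey`, `Store` and the key layout are all hypothetical, and a plain dict stands in for the Redis client so the sketch runs without a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAgeKey:
    """Hypothetical domain key: the only way to produce 'user:<id>:age'."""
    user_id: int

    def __str__(self):
        return f"user:{self.user_id}:age"

    @staticmethod
    def validate(value):
        # The rule lives with the key, not with every caller.
        if not isinstance(value, int) or not 0 <= value <= 150:
            raise ValueError(f"not a plausible age: {value!r}")
        return value


class Store:
    """Thin wrapper; `backend` is a dict here, standing in for a Redis client."""

    def __init__(self, backend):
        self._backend = backend

    def set(self, key, value):
        # Values are stored as strings, as a key-value store would hold them.
        self._backend[str(key)] = str(key.validate(value))

    def get(self, key):
        raw = self._backend.get(str(key))
        return None if raw is None else key.validate(int(raw))


backend = {}
store = Store(backend)
store.set(UserAgeKey(7), 42)
```

With this in place, `store.set(UserAgeKey(7), "garbage")` raises instead of quietly writing a bad value, which is the kind of constraint the raw store won't give you.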

Not converting data into object graphs does not necessitate working with SQL in your domain model. I'm currently working on an application right now which favors simple data structures (hashes, arrays) over custom entities. It favors a business layer that operates on those data structures over composing dozens of "Model" based classes. That does not mean that persistence logic is littered within my domain.

* grabs OO sword and shield and prepares for battle * .

Ha! I have no problem with OO based programming. I do have a problem with OO based solutions for every problem. Sometimes it's worth pursuing alternative solutions, whether procedural or functional or a combination of what makes sense.

Not to mention, what many believe to be an OO solution is oftentimes far removed from anything remotely OO.

I don't mind writing queries, but I hate dealing with raw results. For complex queries I write SQL but have Django's ORM map relational data to objects for me.

The only problem I see, is we can't rewrite SQL.

I.e., in pseudocode:

    query = SQL("SELECT * FROM entities WHERE owner = ?", owner=me)
    if some_condition:
        query = query + SQL.WHERE("OR public = TRUE")
    if other_condition:
        query = query + SQL("LEFT JOIN things AS t"
                            " ON t.entity_id = entities.id") \
                      + SQL.WHERE("t.value > 0")
    my_nice_list_of_results = run(query + SQL("LIMIT ?", count))

This should be technically possible, but I haven't seen any library that does it.
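
For what it's worth, the core of the idea fits in a few lines if the query is kept as parts and only turned into a string at the end. This is a toy sketch, not an existing library: the `Select` class and its method names are invented, it only ANDs conditions together (an OR would have to be written inside a single condition), and it assumes join clauses carry no parameters of their own.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Select:
    """A SELECT kept as parts until compile() is called (hypothetical helper)."""
    table: str                     # trusted identifier, never user input
    joins: tuple = ()
    wheres: tuple = ()
    params: tuple = ()
    limit: Optional[int] = None

    def join(self, clause):
        return replace(self, joins=self.joins + (clause,))

    def where(self, condition, *params):
        return replace(self, wheres=self.wheres + (condition,),
                       params=self.params + params)

    def take(self, count):
        return replace(self, limit=count)

    def compile(self):
        parts = [f"SELECT {self.table}.* FROM {self.table}", *self.joins]
        if self.wheres:
            parts.append("WHERE " + " AND ".join(f"({w})" for w in self.wheres))
        params = self.params
        if self.limit is not None:
            parts.append("LIMIT ?")
            params = params + (self.limit,)
        return " ".join(parts), params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, owner TEXT);
    CREATE TABLE things (entity_id INTEGER, value INTEGER);
    INSERT INTO entities VALUES (1, 'me'), (2, 'me'), (3, 'you');
    INSERT INTO things VALUES (1, 5), (2, 0);
""")

other_condition = True
query = Select("entities").where("owner = ?", "me")
if other_condition:
    query = (query.join("LEFT JOIN things AS t ON t.entity_id = entities.id")
                  .where("t.value > ?", 0))
sql, params = query.take(10).compile()
results = conn.execute(sql, params).fetchall()
```

Each call returns a new object, so a base query can be shared and extended in different directions without one branch affecting another.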

SQLAlchemy lets you do precisely that. Give it a try, I haven't written a line of SQL since I've been using it.

Yes, I know and use it, but SQLAlchemy is not SQL. It's a completely different (although SQL-inspired and compiled-to-SQL) language, which raises the learning barrier. I find myself frequently thinking in SQL and then mentally transforming the queries to SQLAlchemy syntax, which is a somewhat pointless activity, considering that computers excel at transforming formal languages.

I believe that if someone took a SQL SELECT statement parser and built a library that generated SQLAlchemy query/statement objects from it, such a library would make development more productive.

this is more or less what HQL does (http://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html...). However the huge advantage to composing SQL as an actual object construct is that you get reusable constructs which serve as components to composing a larger structure. String-based SQL OTOH means you're going to be concatenating strings together which is error-prone, verbose, and even hazardous from a security point of view as it discourages the usage of bound parameters.
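
The security point is easy to demonstrate. A minimal sketch with Python's sqlite3; the `entities` table and the hostile input string are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, owner TEXT);
    INSERT INTO entities VALUES (1, 'me'), (2, 'you');
""")

owner = "nobody' OR '1'='1"  # hostile input

# String concatenation: the input becomes part of the SQL text itself.
concatenated = conn.execute(
    "SELECT id FROM entities WHERE owner = '" + owner + "' ORDER BY id"
).fetchall()

# Bound parameter: the input is only ever treated as a value.
bound = conn.execute(
    "SELECT id FROM entities WHERE owner = ? ORDER BY id", (owner,)
).fetchall()
```

The concatenated version returns every row because the quote in the input closes the string literal early; the bound version returns nothing, since no owner has that literal name.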

I have always wondered why people have an aversion to stored procedures. They are a programming language in themselves, so you could pass all of your parameters (owner, some_condition, other_condition) into a proc and, depending on the logic, return a different cursor to a different query.

The problem with stored procedures is managing them. You have to maintain another, separate codebase. You need to make sure those procedures are "installed" and up to date. Plus, DBMSs don't have a concept of "versions", and those procedures are treated as data. In short, it's hell.

That's one of the pain points RethinkDB is trying to solve, since you write your queries in whatever application language you use and it's parsed and executed in the cluster.

it's because the stored procedure development model lacks the tools in order to make integrating with an application-level domain model simple. You end up needing to write not just one persistence layer, that of marshaling your object model to and from SQL statements, but two - all the SQL statements behind your stored procedure layer, and a second to move all the data between the SPs and your object model. To make matters worse, the stored procedures must be written completely by hand without the benefit of any in-application schemas to help compose statements.

One reason for the variety of opinion on this is that different developers make more or less use of domain models in the first place. Those who are accustomed to writing all SQL completely by hand with no helpers at all, and not working with a domain model tend to view the stored procedure approach as equivalent. Those who are accustomed to having at least some simple marshaling layers like a construct that generates an INSERT statement given a list of column names see the SP approach as more tedious since simple techniques like that are usually not easily available, at least in more old school SP languages like TRANSACT-SQL and PL/SQL.

All of that said, I do think this is a problem that can possibly be solved. PostgreSQL allows SPs to be written in many languages, including Python. I have an ongoing curiosity about the potential to integrate a Python-based object relational system into a stored procedure system. But it might end up looking like EJBs.

This sounds like a situation where it'd just be better to use separate, yet similar, queries, with each handling a particular case or set of conditions.

Views, stored procedures and functions can be used to help isolate duplication, parameterize the queries, or otherwise hide the SQL.

Code like you've posted is the result of taking DRY too far, to the point where avoiding a small amount of repetition ends up bringing in far more complexity and problems than the repetition might cause.

CLSQL has a functional interface which can do this (http://clsql.b9.com/manual/).

"The only problem I see, is we can't rewrite SQL."

I believe that the phrase you're looking for is "lack of compositionality".

I fail to see why one can't manipulate parsed SQL's AST. SQLAlchemy shows that manipulating SQL-inspired objects is perfectly possible.

It's only the bridge between sqlparse and SQLAlchemy that's missing. I guess that's just because nobody has had the wish, will and time to finish and share one.

I fail to see why one can't manipulate parsed SQL's AST.

I think that this is the "every problem in CS can be solved by another layer of indirection" part. You're basically sidestepping the issue of SQL not providing the functionality in the first place.

There are libraries which build up a model of a query - selections, constraints, etc, and only turn it into a string at the point of executing it.

SQLAlchemy was already mentioned. In the JavaScript world, there's node-sql. https://github.com/brianc/node-sql

That kind of code often ends up being worse to deal with than the SQL it is replacing.

I've always found it kind of odd how there are some people who despise SQL merely for its syntax, yet they'll turn around and advocate the use of libraries which mimic a SQL-like syntax in some other programming language (but do an absolutely terrible job at it).

The node-sql examples are atrocious, for example. It's even more obvious with the SQL so close by. The SQL statements are clear and concise, while the JavaScript version is nowhere near as easy to read.

At least LINQ gives the option of not having to directly deal with the method calls, which makes it marginally nicer to work with. Anything less than that, like we see with basically all other systems, is far less usable.

> I've always found it kind of odd how there are some people who despise SQL merely for its syntax, yet they'll turn around and advocate the use of libraries which mimic a SQL-like syntax in some other programming language (but do an absolutely terrible job at it).

No, I use ORM because I love SQL. ORM doesn't replace SQL. ORM helps to generate the exact SQL I want with much less code.

I have seen applications with thousands of stored procedures, most of them boilerplate, that only support one particular flavor of RDBMS. I have seen too much hand-crafted SQL in the form of "@param_xxx IS NULL OR field_xxx = @param_xxx".
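
For readers who haven't met that pattern, here is a small sketch of it next to the generated alternative, using Python's sqlite3 with `:name` placeholders in place of `@param` variables. The `pet` table and both helper functions are invented for illustration; a real ORM does considerably more than `find_generated` does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pet (id INTEGER PRIMARY KEY, owner TEXT, species TEXT);
    INSERT INTO pet VALUES (1, 'alice', 'cat'), (2, 'bob', 'cat'), (3, 'alice', 'dog');
""")

# Hand-written catch-all: every optional filter is baked into one statement.
CATCH_ALL = """
    SELECT id FROM pet
    WHERE (:owner IS NULL OR owner = :owner)
      AND (:species IS NULL OR species = :species)
    ORDER BY id
"""

ALLOWED = {"owner", "species"}  # column names never come from user input


def find_catch_all(owner=None, species=None):
    return conn.execute(CATCH_ALL, {"owner": owner, "species": species}).fetchall()


def find_generated(**filters):
    """Emit only the conditions that were actually requested."""
    unknown = set(filters) - ALLOWED
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    active = {col: val for col, val in filters.items() if val is not None}
    where = " AND ".join(f"{col} = :{col}" for col in active) or "1 = 1"
    return conn.execute(
        f"SELECT id FROM pet WHERE {where} ORDER BY id", active
    ).fetchall()
```

Both return the same rows for the same filters; the difference is that the generated statement only mentions the columns in play, while the catch-all grows by one clause per optional parameter.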

I used to think that Tom Kyte was right, that everything should be in stored procedures. Now, I am thankful for ORM (more specifically, SQLAlchemy).

The complaint was not about SQL's syntax, but that SQL statements as strings are inflexible.

One often wants to have several variants of a SQL statement, beyond simple placeholders for arguments. I've seen several projects that grow a lame templating syntax on top of their SQL strings, to the point that the SQL then becomes incomprehensible.

If this really bugs you, perhaps the ultimate solution would be to actually parse SQL.

   query = sqlParse("SELECT foo FROM bar WHERE quux = 1")
   query2 = query.clone().constraint("quux = 2")

Some ORMs support the Query Builder pattern which does the exact thing you want. http://www.yiiframework.com/doc/guide/1.1/en/database.query-...

Just my opinion of course, but it seems that designers who think primarily in terms of the application will prefer to use an ORM to abstract away details of the storage layer; whereas a designer who thinks primarily in terms of data will prefer to write SQL, with the application being just one of many possible applications of that data.

Because (at least for the kind of queries common to most web applications) you can develop faster using an ORM. Less boilerplate, less repeated code, less time thinking about how to convert between SQL and application-level representations of your data - and most importantly, no time at all spent thinking about how to dynamically generate SQL queries using string concatenation (a sure-fire way to introduce bugs and security vulnerabilities in to your code).
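
A small illustration of the string-concatenation risk, using Python's built-in `sqlite3` module. The `users` table and the hostile input are made up for the example; the point is only the difference between splicing input into the SQL text and binding it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

user_input = "nobody' OR '1'='1"  # hostile input

# Concatenation: the input becomes part of the SQL text itself.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Bound parameter: the input is only ever treated as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
```

The concatenated query matches every row because the injected `OR '1'='1'` is always true, while the parameterized query matches none.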

Poor tool integration, nonexistent library ecosystem, and the advantages of a single-language codebase.

Because you end up writing the same basic SQL over and over and over again?

Do you have an example of an ORM which is not in some fundamental way "stupid"? I haven't found one yet, but I'd love to know one existed somewhere.

You should take a look at the ORMs that are considered a bit more at the top of their class: in the Python world, SQLAlchemy; in the Ruby world, Sequel. While you can't judge all ORMs by looking at a few, these do offer a better impression of what an ORM can do for you.

Until we've attained AI, all software can be said to be "stupid". Google is stupid. Relational databases are stupid, SQL is stupid. None of it works without human intelligence actively directing it to do work for us. Why single out ORMs?

That's called arguing for the sake of arguing ...

OP says one ORM is bad so all of them are, you say no, some are nicely made (which I agree with), parent asks for a specific example you would recommend, and you answer by nitpicking on a single word in his message, one that he even put in quotes himself. And then you finish with a question about something that parent didn't even say or imply.

Either you have an example and you provide it, or you don't and you say so, but your comment was unnecessary and unwanted.

(and so is mine, but I've seen so many of those on HN lately that I just broke and wrote that rant)

see my other comment to PommeDeTerre.

> Until we've attained AI, all software can be said to be "stupid" ... None of it works without human intelligence actively directing it to do work for us.

You've implicitly defined AI as a level of intelligence that doesn't need human oversight to function. That level doesn't exist yet, but that doesn't mean AI doesn't exist -- it just has a different definition than the one you're using.

Consider Watson (the Jeopardy contest computer lately in the news) -- it can beat the best Jeopardy players, but it's completely unable to function if given a different task or deprived of human oversight. Notwithstanding that limitation, most people will claim it's an example of AI.

Software is a tool, ORMs are tools. I don't need a tool to do my work for me, I just need it to work well for whatever job it's for. A "stupid" hammer would be one that is just a rock with a hole in it loosely fitted over a stick that fits in the hole with nothing but friction to hold it on. A "stupid" farm tractor would be one that gets 15 mpg most of the time except when it's used during a full moon between 10am and noon, in which case it gets 1 mpg.

Most ORMs are indeed stupid. They often produce completely surprising results at the least opportune moments. They are high-maintenance tools that require constant supervision to ensure that you haven't accidentally made a change which causes something crazy to happen in the ORM. It's about the principle of least surprise, which ORMs often fail horrifically at.

You're contradicting yourself. Your first comment indicates that it's wrong to label ORMs as "stupid". Yet in this latest comment, you've labeled all software (which includes ORMs, of course) as "stupid".

Like the other commenter requested, can you provide an example of an ORM that isn't "stupid"? Your initial comment makes it sound like you've worked with at least one that isn't. We're curious to learn about which these may be.

The word "stupid" here is ambiguous. It can be read as "it's stupid to use them" (I disagree), as "they don't necessarily understand your intent" (this applies to all software), or as "simple issues are made more complicated than they should be, to an unreasonable extent" (this is a problem that varies to a significant degree based on the ORM in use and cannot be generalized from the experience of just one ORM). Without the benefit of the speaker's words here it's hard to tell, though the only backing evidence given in these slides for "ORMs are stupid" is a trivial issue that not every ORM has, so that's IMO not a good argument.

View my profile for further detail.

The problem with ORMs seems to be that they're a leaky abstraction. I've been very happy using SQLAlchemy for quite a while now, but I'm intentionally only using the SQL Expression API. Based on my experience with Hibernate, there's always something you want to accomplish that necessitates circumventing your ORM.

Not only that, but typically you end up balancing between stuff being unavailable because of lazy-loading, and one pageview taking 30 seconds because the ORM collected all the dependencies.

> The problem with ORMs seems to be that they're a leaky abstraction.

I disagree it's a problem.

going back to Joel again (!) :

"All non-trivial abstractions, to some degree, are leaky."

I talk about this a lot in this particular talk: https://www.youtube.com/watch?v=E09qigk_hnY

Hibernate was a great influence on me but I like to think that it only introduced some ideas in rough form that we've all had many years to improve upon.

The ORM will of course introduce new issues to deal with but this is because it's taking care of a vast amount of persistence code you no longer have to write, and applies a consistency to that persistence logic that would be extremely difficult to achieve without using tools.

Look, I'm willing to believe that SQLAlchemy is the bad-assest ORM on the planet, but I'm not convinced I'll never need to circumvent it to get something done, and I'm free to consider that a problem (along with other commenters, it seems).

As I mentioned, I'm very happy using SQLAlchemy's lower level API. It's a helpful and elegant abstraction over queries and table definitions etc, and I've never needed to circumvent it yet. I'm also convinced that the delightfully flexible/powerful Mako is hands down the best templating library for Python. You, sir, Rock. But you come off as needlessly argumentative in this thread.

sorry, I did a whole talk inspired by the term "leaky abstraction" and thought it was relevant.

Is the talk's central idea that abstractions are bound to leak and we'll just have to deal with it? I did skim through the beginning of it, but then Youtube's player suddenly skipped to the end and stopped, and I moved on.

The thing is, SQLAlchemy's SQL Expression API is a suitable-level abstraction: high enough to be useful, but not high enough to guarantee leaks. I'm happily making queries with one-liners, and haven't had to circumvent it yet, but I bet I'd have run into trouble with any ORM already.

You know, you could've just directly mentioned SQLAlchemy earlier...

I don't like making my posts look like plugs for my own stuff, though I guess it's unavoidable....

In this type of discussion, it's not a plug -- it provides examples and evidence useful for teaching, which can light a path to a better way.

As Abelson and Sussman point out in SICP, "Computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute" (http://mitpress.mit.edu/sicp/front/node3.html).

The software we write is a codified expression of what we think, distilled into a working example that can elevate a discussion from theory-based to evidence-based. Those who have devoted time to think through an issue deeply and have codified their thinking into software would be doing a disservice by not referencing it.

There are several Micro ORMs for .NET: http://www.servicestack.net/benchmarks/#dapper-benchmarks

These don't try to manage hidden magic state, and they let you drop to raw SQL easily if you need to do complex queries. Many don't try to abstract anything and are simply extension methods over the underlying IDbConnection (so you never lose any flexibility), i.e. they simply exist to remove the tedious boilerplate of mapping RDBMS results back into POCOs.
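
For readers outside .NET, a rough Python analogue of that "raw SQL in, plain objects out" style might look like the sketch below. The `fetch_as` helper, the `Icon` dataclass and the `icons` table are all hypothetical and are not taken from any of the libraries mentioned; the dataclass plays the role of a POCO.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Icon:
    id: int
    name: str
    downloads: int


def fetch_as(conn, cls, sql, params=()):
    """Hypothetical helper: run raw SQL and map each row onto a dataclass by column name."""
    wanted = {f.name for f in fields(cls)}
    cursor = conn.execute(sql, params)
    names = [d[0] for d in cursor.description]  # column names of the result set
    return [
        cls(**{n: v for n, v in zip(names, row) if n in wanted})
        for row in cursor
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icons (id INTEGER PRIMARY KEY, name TEXT, downloads INTEGER)")
conn.executemany("INSERT INTO icons (name, downloads) VALUES (?, ?)",
                 [("home", 120), ("search", 45)])

# The SQL stays fully under the caller's control; only the mapping is automated.
popular = fetch_as(
    conn, Icon,
    "SELECT id, name, downloads FROM icons WHERE downloads > ?", (100,),
)
```

Nothing here tracks object state or generates queries, which is the trade-off these micro ORMs make.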

And if you want to take it to the "next step" and not rely on strings, having everything compiled and checked by the compiler, while staying low level: http://bltoolkit.net/ or https://github.com/linq2db/linq2db

SQLAlchemy has several explicit layers of abstraction, the first of which is an AST for all of SQL. The ORM is entirely optional.

Definitely agree.

Using an ORM should not exclude also using direct SQL. It should be both.

I believe this brings the best combination. Anytime there is major complexity just drop to normal SQL.

The best of both.

I think it's more of a case of the developer not having read the ORM documentation; this is a very newbie mistake (although very understandable, true).

Now obviously, some people would complain that it doesn't make sense to do the extra join, but then people would be complaining about magical or exceptional behavior. ORM behavior is very predictable about which fields are being queried.

I haven't checked this one either way, but I'm curious what the SQL would be if they used the 'right' ORM incantation.



True, but every ORM will make that kind of mistake. You can't just build an object model without paying attention to the SQL layer.

I was expecting this slidedeck to be a bit more focused on defying Spolsky, and whether or not that was a good decision.

FWIW, I've never bought into Spolsky's vision that re-writing code is poor strategy. Steve Jobs never thought twice about ripping something apart and starting over. If anything, code re-write can be an advantageous position -- you often have a greater understanding of the problems you're intending to solve. When well-executed, it can take the form of heavy refactoring, even when switching languages/platforms.

I'm not really sure why Steve Jobs is your model for code design.

Because he oversaw the creation of 3 of the most successful operating systems of all time? I know he didn't design them, but he was involved in managing the projects. Just playing devil's advocate.

Well, it doesn't mean he evaluated at the code level and decided that rewriting is better than refactoring...

Apple has/had a history of re-writing code over the years. The first iPods and their iterations were code re-writes, IIRC. Many of the onboard applications were completely re-written for the iPad.

Rewriting for new hardware is totally different from rewriting business logic. In the first case there are sometimes no other choices left.

Also, the code that usually goes into hardware microcode and firmwares is much more tightly coupled.

> Steve Jobs never thought twice about ripping something apart and starting over. If anything, code re-write can be an advantageous position -- you often have a greater understanding of the problems you're intending to solve.

It's interesting that you take Jobs (and by extension Apple) as an example here, as many new projects from them which might appear to be complete rewrites from the outside are in fact heavily derivative or dependent on other projects. iOS for example is an incremental revision of OS X, removing some of the UI layer and replacing it, but leaving almost all of the underlying OS intact, using the same dev language and many of the same APIs - it is by no means a clean-room rewrite. OS X itself was heavily based on NextStep, which of course was based on Mach/BSD, so none of these 'new' platforms started from a clean slate like BeOS for example, and the same tends to happen with APIs, though sometimes these have been rewritten (Quicktime comes to mind, and arguably UIKit is a significant rewrite of AppKit, though the two still exist in parallel just now).

Sometimes rewriting is the best solution, but it does tend to take a lot longer than expected, doesn't always leave you with a satisfactory replacement, and ends in failure more often than it ends in success, particularly on very large projects or ones with fuzzy scope. I think Spolsky was talking about projects on the level of Netscape and Excel, where a rewrite would be a significant challenge very likely to fail or be delayed so long that it falls short of its initial goals. The smaller the project, the more viable a rewrite becomes, and sometimes it is the best option if the existing product is not delivering and is difficult to extend/support.

>I've never bought into Spolsky's vision that re-writing code is poor strategy.

I think it's something that's true in general but false in some specific cases. Rewriting involves spending enormous time and resources to at best standstill, and at worst move backwards ( chances are your re-written product will be poorer in features, and initially buggier than your old, stable, battle-tested version). For smaller companies, it is a death knell.

Yes, very true. Although I'd like to add some context.

> Rewriting involves spending enormous time and resources to at best standstill, and at worst move backwards ( chances are your re-written product will be poorer in features, and initially buggier than your old, stable, battle-tested version).

There is plenty of evidence that this has happened in many places and with many companies. A very real scenario that's played out before.

I'd say those scenarios were not well-executed. If the outcome of a re-write is "standstill", the re-write is pointless. There is no justification for proceeding with it.

However, if "standstill" equates only to user-facing features, chances are the re-write is to address critical issues elsewhere (I get the impression that was the situation with the OP.) In that case, "standstill" doesn't apply. It's simply a matter of deciding whether or not the effort and risk justifies the reward.

To my main point, Spolsky's hard-line essentially says re-building your application from scratch is bad strategy. I think it is short-sighted to draw that line. I prefer to exercise judgment and draw on the resources at my disposal for the given situation. I presume many others do as well.

>Spolsky's hard-line essentially says re-building your application from scratch is bad strategy.

I don't think Spolsky's position is as "hard-line" as you think it is. I always took it as a very very strong "rule-of-thumb". Every time a rewrite is proposed, it should sound warning bells for all stakeholders. And we're talking about a clean-room rewrite, and not a sub-module rewrite. Spolsky has no problem with rewriting and re-architecting sub-pieces of an application.

Twitter is a great example of Spolsky's philosophy. They had a big architectural and technology problem with Ruby and RoR, but they didn't do a clean-room rewrite with Java. Instead they tackled one sub-component at a time, and now they have something that rocks, but still allowed them to keep the time/code investment they made originally. Just as well, it allowed them to incrementally 'upgrade' their service.

I think the big danger Spolsky warns about is the "stop working on the old code and spend a year doing a complete rewrite". Many very smart people have underestimated the risks in doing that and the time it will take, while overestimating the benefits.

A smarter approach is to rewrite parts of the code. Apple has done that with many apps (and even iOS is OSX with a new UI/API on top). Linux has had many patch-by-patch rewrites as well. With a webapp, a "complete rewrite" still might leave most of the heavy lifting to the web framework, and you might be able to keep most of your HTML, CSS and JS.

I think the well-executed rewrites usually include launching an entirely new product while still supporting the old one. Adobe did this with InDesign (replacing PageMaker), Apple did this with OSX (the initial releases included OS9) and Microsoft did it with NT while keeping Windows 9X around.

Yea, the WaSP was in 1998 petitioning Netscape to cancel Mariner, and unfortunately they caved.

My personal favorite is the MS OS/2 2.0 fiasco, where the project was abandoned by MS after the first SDK was already sent to developers: http://yuhongbao.blogspot.ca/2012/12/about-ms-os2-20-fiasco-...

I found I could follow along with the slides, though some of the icons and messages around third-party tools were lost on me. I'd definitely appreciate a video.

Edit: 9 minutes ago, Nick posted to Twitter: "we have the recording - just need the sound cleaned up. Expect it early next week ;)"

Video now live: http://vimeo.com/65057265 [Slides in sync with audio]

While I love seeing slides, this deck obviously could use the audio or transcript along with it.

Agreed, I for one was left feeling rather in the dark for the last few slides especially...

We are posting a version with audio on our blog on Monday: http://blog.iconfinder.com

I look forward to it!

Only audio, no video?

We will combine audio with the slides :-)

Video now live: http://vimeo.com/65057265

> 20ms with Jinja2 without auto-escaping

could this performance improvement back-fire if you end up with a security issue?

I'm not saying that it definitely would. If you know what you're doing / trust your data sources or sanitize them elsewhere, you should be fine. I'd be careful turning off such a feature completely though...

Of course. This is merely a tradeoff between performance and developer time. 99% of projects will never have HTML autoescaping as a performance pain point. Then again, you're going to need tens of hours to review all templates to make sure you're escaping everything. If your hardware budget is greater than what it costs to audit the code, it's the proper decision.
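
For anyone unsure what is being traded away, here is a tiny sketch of what auto-escaping does, using only the standard library's `html.escape`. The `render_comment` function is hypothetical and is not how Jinja2 renders templates; it just stands in for a template with the feature switched on or off.

```python
from html import escape


def render_comment(body, autoescape=True):
    """Hypothetical stand-in for a template; not Jinja2's actual rendering code."""
    if autoescape:
        body = escape(body)  # turns <, >, & and quotes into HTML entities
    return f"<p>{body}</p>"


hostile = "<script>alert(1)</script>"

escaped = render_comment(hostile)                 # markup is neutralized
raw = render_comment(hostile, autoescape=False)   # markup reaches the page as-is
```

With escaping off, every place that outputs user-supplied text has to be escaped by hand, which is the audit cost mentioned above.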

I wonder, is there a video of the talk available? Unfortunately, slides alone are rarely very informative.

We will post a video version soon (Monday) on http://blog.iconfinder.com

I found these tools mentioned by the author really helpful, I compiled a list of them:

http://jinja.pocoo.org/ https://www.getsentry.com/welcome/ http://graphite.wikidot.com/start https://opbeat.com/

is there a way though to use/test opbeat ?

Just send them a message via twitter or email. I'm sure they will help you get started.

You should give SQLAlchemy a try:


It gives you more control and requires you to be more explicit about your queries and relationships.

Does anyone know what tool was used to profile the django app? Looks cool.

If you're talking about the slide that says '87.91% of time spent rendering' then yes, I would like to know too.

Annoying that it doesn't have slide numbers to refer to.

it looks like kcachegrind or pycallgraph (https://github.com/gak/pycallgraph).

I don't think Spolsky's argument applies to startups... His argument basically boils down to "You think the code in front of you is a mess, but the reality is you're just having trouble reading somebody else's code which is probably good enough". But what if you wrote the code yourself? In that case, it's probably just a mess.

ORMs may be "stupid", yet Basecamp manages to handle 400-500 requests per second with Rails and MySQL just fine. https://twitter.com/dhh/status/287221705443774465

Iconfinder: are they profitable? Do they have business requirements? Joel wrote about the problems of a rewrite when you need to earn money, implement money-making features, and face heavy competition, while at the same time maintaining two platforms where one is a moving target. I don't think Iconfinder fits any of these constraints.

Incorrect. We fit in all of those constraints at the point where a rewrite was decided ;)

Thanks for your reply, from looking at Iconfinder it did not look like a large piece of software with lots of business requirements.

We have posted a video version of the talk (slides + audio): http://blog.iconfinder.com/staying-sane-while-defying-joel-s...

Did someone understand the last slides about the differences between transactions in tasks with and without celery ? I'm using celery, and i have been using django in the past, and I really didn't get the point.

He's trying to avoid the situation where you enqueue a background job inside a transaction and the worker gets started on that job before the transaction is committed. If the job needs to hit the database for any reason you're likely to get errors as the new data hasn't been committed yet.

It looks like the workaround he used is to cache the job queue locally and only flush it to the real job queue after the database commit, so you're guaranteed whatever data may be needed for the job has been committed to the database.

Basically if you modify data in a transaction, add a task to the queue that relies on that data, and the worker pulls the task off the queue BEFORE the transaction is committed, then the task tries to access data that doesn't exist in the database yet.

He's trying to fix/avoid a race condition caused by data created within a transaction not being committed yet.
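
A minimal sketch of that "buffer locally, flush after commit" idea, assuming nothing about the speaker's actual code. The `DeferredQueue` class and its methods are hypothetical, a plain list stands in for the real Celery broker, and `sqlite3` stands in for the application database.

```python
import sqlite3


class DeferredQueue:
    """Hypothetical wrapper: hold jobs locally until the transaction commits."""

    def __init__(self, conn, broker):
        self.conn = conn
        self.broker = broker  # stand-in for the real job queue
        self.pending = []

    def enqueue(self, job):
        self.pending.append(job)  # not visible to workers yet

    def commit(self):
        self.conn.commit()                # data becomes visible first...
        self.broker.extend(self.pending)  # ...then workers may see the jobs
        self.pending.clear()

    def rollback(self):
        self.conn.rollback()
        self.pending.clear()  # drop jobs that refer to rolled-back data


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, filename TEXT)")
conn.commit()

broker = []
queue = DeferredQueue(conn, broker)

cursor = conn.execute("INSERT INTO uploads (filename) VALUES (?)", ("icon.png",))
queue.enqueue(("make_thumbnail", cursor.lastrowid))
visible_before_commit = list(broker)  # a worker polling now finds nothing

queue.commit()
visible_after_commit = list(broker)   # the job appears only after the commit
```

Later Django releases ship `transaction.on_commit` for the same purpose, so a hand-rolled buffer like this is mostly of interest on older versions or outside Django.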

Interesting point on Celery and transactions, though I've never seen that happen in real life.

+100 for using "leaking abstraction" in a slide deck "defying Joel Spolsky"

Is there a (publicly available) video that accompanies this slide deck?

I didn't understand why ORMs are stupid. Can someone enlighten me?

The real issue with ORMs isn't the ORM itself, but over-reliance on the ORM to do everything the right way without validating that it actually does. ORMs are also often heavily leaned on by people who don't understand SQL and relational databases well enough and just want a data dumping ground (they would have been better off with a document database).

But, if properly validated, and knowing when to NOT use the ORM, a good ORM can help you get a lot of work done very efficiently. But I've also seen improperly used ORMs turn into MASSIVE time sinks where devs spend days just configuring the stupid thing (hello Hibernate/nhibernate).

I guess his point is that they sometimes create huge ugly and inefficient SQL queries. (Reminds me of the 90s sentiment of C compilers creating ugly assembly.)

If you know SQL, they're often frustrating. You know what you want to do, but there's some ridiculous arcane approach to get those results via the ORM. If you can do it at all.

ORMs make it easy to do things like run queries inside a loop without realizing it. I worked on a site where the front page ran something like 200 queries every time it was accessed thanks to ORM magic.
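
This is the classic "N+1 queries" pattern. A sketch with plain `sqlite3` shows what the ORM ends up doing behind an innocent-looking attribute access. The schema, the `run` helper and the `executed` log are made up for illustration and do not come from the site described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(1, "ann"), (2, "ben")])
conn.executemany("INSERT INTO posts (author_id, title) VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (1, "third")])

executed = []  # log of every statement sent through run()


def run(sql, params=()):
    """Hypothetical helper that records each query before executing it."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# What lazy loading amounts to: one query for the posts, then one more per post.
lazy = []
for author_id, title in run("SELECT author_id, title FROM posts ORDER BY id"):
    name = run("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0]
    lazy.append((title, name))
lazy_query_count = len(executed)

executed.clear()

# The same data fetched with a single JOIN.
joined = run(
    "SELECT p.title, a.name FROM posts p "
    "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
)
joined_query_count = len(executed)
```

With three posts the loop issues four queries against one for the JOIN; scale that to a front page with a couple of hundred rows and you get the situation described above.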

All abstractions tend to be leaky.

An ORM can abstract DB/SQL for you, but if you really ignore DB/SQL, then it can happily make some queries an order (or two) of magnitude slower than they should be.

So you must always think in explicit SQL-query terms anyway, and then it's just a balance for ease of coding: does the ease of the ORM's syntactic sugar outweigh the effort of double-checking that no ORM-built query accidentally does something stupidly slow?

ORMs pretend to give you full abstraction, but in reality you have to be aware of the underlying SQL layer when you build your object model.

I find that if you are aware of the SQL underneath, they are good for saving time.

Do you really want to write SQL to retrieve data and code to populate an object for every damn thing in your system?

I agree. Problems happen if you treat ORMs as a total blackbox.
