Hacker News new | comments | show | ask | jobs | submit login
Goodbye MongoDB, Hello PostgreSQL (olery.com)
802 points by YorickPeterse 955 days ago | hide | past | web | 374 comments | favorite

As a greying developer I am most amused by people discovering that 'old' technologies like SQL databases work really well.

The only useful piece of advice I can give a younger developer is... be careful when drinking the newtech koolaid.

And one more thing:

    star = Sequel.lit('*')

      .select_append { count(star).as(:amount) }
      .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
just makes me want to cry. Learn SQL rather than wrapping it.

It's not so much about not wanting to write/understand SQL (both are still very much required), but about composability. If you want to re-use bits of a SQL query written as a string literal your only option is string concatention or using some kind of string builder/template system. In both cases there's little validation of the query's correctness (syntax wise) until you actually run it.

While I agree that many ORMs go too far or even worse, not implement certain powerful features, Sequel (and similar tools I imagine) strikes a nice balance. The particular examples I gave on their own are not super useful, but we already have quite a few queries that are composed/re-used without having to concat strings.

So tl;dr: it's about composability, not being "lazy" or "ignorant" to SQL.

I once agreed with this, but now I don't. I just want to write SQL (dammit!).

I can never, ever remember the intricacies of the Sequel API or any one of these query builder APIs. I am always looking up something that is rather trivial because I am thinking in SQL, the language, and always have to convert back to Ruby or whatever language I am working in.

CTEs and SQL functions in PostgreSQL strike a good balance in the composibility department for me, while still being able to write SQL.

Let's say you have a product search screen in your application. There's a text field for filtering on product title (WHERE title LIKE), one for filtering on UPC (WHERE upc LIKE), a couple range filters for min/max prices (WHERE price <= and/or WHERE price >=), and then on the results screen the user sort on a few different columns (ORDER BY) as well as paginate and set number of results per-page (LIMIT + OFFSET).

How exactly are you going to just write SQL if the actual query statement needs to change based on the user input? Are you going to hardcode every possible permutation, or are you going to use a query builder? Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

edit: The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments. Your ad-hoc query builder implementations with SQL injection vulnerabilities are not relevant to what I'm asking.

How exactly are you going to "just write SQL" if the actual query statement needs to change based on the user input?

How about something like this:

  s = Select.new
  s.add "WHERE title LIKE #{title}" if title
  s.add "WHERE price <= #{price_range}" if price_range
  s.add "LIMIT #{limit}" if limit
  s.add "OFFSET #{offset}" if offset
  s.add "id, blah, boo FROM products"
Note how I deliberately shuffled the order and didn't bother with escaping.

Also note how anyone who knows SQL could immediately work with this, learning curve: 5 seconds.

Why is there no ORM that works like this?

Python SQLAlchemy works a bit like this, if you use it that way:

    query = session.query(Products)

    if limit:
        query = query.limit(limit_val)

    if offset:
        query = query.offset(offset_val)
Sort is a bit tricky, but as I'm a noob at SQL Alchemy, I had some not-so-pretty boilerplate code implemented for decorating queries with custom sorts. But essentially it boiled down to:

    if sort_type == SORT_REVERSE_CONST:
         sort_obj = sort_obj.desc()
         sort_obj = sort_obj.asc()

    query = query.order_by(sort_obj)
And if you want to search through dynamic tables, change the initial query instantiation to something along these lines:

    query = session.query(retrieve_db_object(table_name))
And to execute:

    return query.all()

Python SQLAlchemy works a bit like this

All ORMs work "a bit like this". I don't want "a bit like this". I want exactly like this.

Because with my proposed interface I could be productive immediately and permanently. I would never have to refer to any documentation. Not once.

With every other ORM this is a pipe dream. Their "fancy" chainable wrapper-APIs are nothing but a ball on a chain.

Well, no, they're more than a "ball on a chain" - they're a tradeoff - like everything in software.

They give you the ability to manipulate the query in interesting ways at any point before you execute it. They let you join different queries together, built by different parts of the system, in a safe way. They let you work with the native language you're working in instead of having to construct clauses in a foreign language, using strings. They let you post optimise your loading strategies.

You might want "exactly like that", but it's not going to be as flexible as a chainable wrapper system.

Hell, I could probably give you pretty close to that in python with a little work, but it's not something I'd want to use myself.

I get it. I like working in SQL too. I know it really really well, and I cringe when I see developers writing totally sub-optimal code because they don't understand the relational data model.

But there are other ways to do things, and what you're describing doesn't give you much more than raw SQL, so why not just use raw SQL? You've added some syntactic sugar that, in my language (Python), would be a bit of a horror show (strings access local variables implicitly, no thanks). What else do you gain?

This is more verbose, granted:

    q = session.query(Products)
    if title:
        q = q.filter(Products.title.like(title))
    if price_range:
        q = q.filter(Products.price.lte(price_range))
    if offset:
        q = q.offset(offset)
    if limit:
        q = q.limit(limit)
    products = q.all()
But then you get more stuff for free, like drilling down into the other tables:

    p = products[0]
But that's rubbish, because you'll be loading in a really inefficient way. That's ok though, tell the system how you're going to want to load the additional data.

    q = q.options(
See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

Use whatever makes sense for your use-case. Don't limit yourself because you'd have to learn something new. Honestly, before using SQLAlchemy I mostly felt the same as you do. Many ORMs get in the way, but that's not really a problem intrinsic to ORMs.

>See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

If you are fine with the new debugging this entails. Take the joined load example you gave. Almost every time I encounter SQLAlchemy code, the author didn't understand the lazy joining semantics and every attribute access on each record is emitting a new query and you now have O(n) queries. It's not obvious that is happening until it bites you once you have thousands of records.

For another example, I can count on one hand the number of developers I have encountered that can understand the expiration of the objects in the session or what the engine builder stuff is doing in the boilerplate they paste in to get it to work. It always requires experimentation, a trip back to the SQLAlchemy docs, and then finally logging all SQL queries to see if it's finally emitting the optimal queries with the transaction guarantees you are looking for.

My gripe about SQLAlchemy is that it separates you too much from what is happening underneath.

Yup. I'll give you most of that - there's a pretty big learning curve with SQLAlchemy.

But, taking the lazy joining example. Someone didn't understand how it worked and now you have to fix it, which is probably as simple as changing that loading strategy to something more suitable for the way you're using the data.

Let's rewind to the late 90s. If I wanted two tables of data to write into the page in an interlaced fashion I would look at how wide they were. I'd make a decision about using two queries or using one larger join. The two query approach was sometimes required due to the size of the data, but it meant cryptic output logic to track positions of cursors relative to each other. The single joined query was simpler to deal with, but still required tracking a last_parent_id so you could swap during the interlacing.

Other developers (those same ones that didn't understand lazy joining) would loop over the first query, running extra queries in the loop (I saw this a lot). Same bad performance as the lazy join.

When you discovered this issue in the code it was a total pain to fix. You're talking about rewriting whole load of code to the point of being unrecognisable from the code you started with.

Contrast that with how easy it is to fix in the SQLAlchemy case. I mostly don't worry about loading strategies now until I'm deep in development. Something's running a bit slow, take a look at some logs, tweak a couple of things and you're golden again. That's such a powerful abstraction.

Regarding the config of the engines etc, again, it's something you need to learn. But really, someone's starting an application from scratch and they just want to dump some code in to handle all their db interactions, but they don't want to know how it works? That's on them, either learn it or use the sensible defaults (in, for example, flask-sqlalchemy).

SQLAlchemy ORM separates you from what's underneath, but it's an ORM, that's kind of the point. If you need to be closer to the metal, use SQLAlchemy Core.

Good news! Almost every language offers some sort of string concatenation with no protection whatsoever against injection, which seems to be what you're asking for.

His proposal is not string concatenation, and it is not vulnerable to injection.

"WHERE price <= #{price_range}" looks like raw interpolation to me. How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

It was just a mockup. But you are right, in reality it would end up looking more like this (and use custom interpolation for escaping):

   s.add "WHERE foo > $(bar)"
Likewise a smart syntax for clause combining (AND/OR) and some kind of nesting would probably be needed.

I believe both of these problems should be solvable without compromising the simplicity of the approach.

You still need to supply the local variables to your ORM-like so that it can perform interpolation into the string (a feature which is actually best done by the database driver, NOT your ORM-like). Perhaps we can suggest this syntax:

s.add "WHERE foo > ?", bar

Or even this:

s.Where "foo > ?", bar

At that point, you've reinvented ActiveRecord or hundreds of other query builders, it also avoids you having to remember the ordering rules of sql as to which part of the query must be built first. There's a reason these query builders have converged on similar syntax.

I see where you're coming from - you don't want to learn two sets of syntax, but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL while avoiding sqli and statement order issues, they also usually have an sql() option which lets you just send SQL if you wish for a complex query.

There are very good reasons people keep inventing layers over SQL:

They help avoid sqli

They centralise db access so you can optimise in one place by say adding caching layers to queries

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

"You still need to supply the local variables to your ORM"

That can be done automatically in many languages. At least in the sense you can do:

"where value=$(bar)"

and not:

"where value = ?", bar

OpenACS was an old web framework for the Tcl language, but it even had that 20 years ago!. The company backing it failed in the .com era, and the lack of types was a turnoff for many - but it made SQL much much more readable. And of course the queries were turned into prepared statements. A query looked like this:

select foo from table where age>:min_age and name != :user_name

Things like this can also be done in node.js (disclaimer: I wrote the blog post below):


Perhaps we can suggest this syntax: s.add "WHERE foo > ?", bar

I'd say stick with keeping it a string and use:

  s.add "WHERE foo > $(bar)"
but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL

And that's the problem. Everyone "echos" SQL, just slightly differently, each with their own idiosyncraticies.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

It's much easier for humans to memorize a slight SQL dialect than a different god-awful, nested ORM API boilerplate for every language they're working with.

I can write this without thinking, for queries of almost any complexity:

  s.add "SELECT id from users WHERE"
  s.add "name == $(name)" unless name.nil?
  s.add "AND age > $(age)" unless age.nil?
  s.add "GROUP BY age"
A parser/query-builder doesn't need amazing smarts to infer exactly what I want from this sloppy syntax; e.g. drop the "AND" if the first condition was false and present me with meaningful error messages when something is ambiguous.

Ask me to formulate the same query in any ORM and I'll have to refer to the documentation. And almost every time I'll waste a stupendous amount of time on getting some irrelevant, syntactic detail just right. Only because the ORM insists on an autistic API.

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

After having worked with Hibernate, SQLAlchemy (the one-eyed amongst the blind), ActiveRecord, DataMapper, Sequel and a few others that I don't even remember, I conclude with passion: They have it backwards.

It's not the vagaries of SQL that I don't want to deal with. It's the vagaries of the ORM du jour that I don't want to deal with.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

They usually do more. SQL is not very pretty or friendly IMO, ORMs clean it up a bit (for example allowing arbitrary ordering, sqli, construction of multiple queries from one base). The example I gave was an arel query - very similar to yours if you move the WHERE out into the function name.

Your ORM proposal above looks fine (and very similar to existing ones but with more strings), but personally I find it uglier than the many which are available, and you haven't actually tackled the protection against sqli or explained how that would work. Normally you'd pass that through to the db driver as a $n param, rather than trying to escape a string at the ORM level. If your ORM is going to infer details from the sql, it basically has to parse that sql first and handle all the same things that current ORMs do by using named functions to add SQL - you'd find the same issues with consistency and edge cases I imagine, and of course have to be very careful about parameters and sqli.

So if you don't like ORMs, don't use them, but you do have to be sure that your own ad-hoc home-grown ORM that you're writing instead covers all the bases current ones do. I prefer the syntax of the ones you're denigrating for simple queries, even if for complex queries it might be better to write sql directly (via the functions they provide), but would be interested to see it in action - when are you going to start :)

Or you could just use Sequel and not have to build up your own library.

It's pseudocode for an implementation that doesn't exist. There is nothing in his proposal that requires this hypothetical implementation to copy the value of the price_range variable into the string at all, much less unescaped.

The strings don't even have to be sent to the DB at all. You've utterly missed his point. He wants the language to be intelligent about what the SQL means and do the right thing.

It's not raw SQL, it's an abstraction.

The psuedocode in question happens to be syntactically valid Ruby. If the goal was to demonstrate how simple query generation could be without abstraction layers like Sequel, it is a valid criticism to note that the example given is eliding a necessary feature. Especially since helping avoid injections is one of the primary advantages of such an abstraction layer.

The original argument was that if one could "just write SQL" one wouldn't need to refer to documentation to become productive, "not once". I don't personally think it's practical to use any database library without reading the documentation, but certainly the moment you get away from "the SQL I type in goes straight to the database" you're going to need documentation to tell you what's going to change.

What's being described here sounds like a library that parses SQL fragments, combines them into whole queries, and sends them off to the database. To implement this this without resorting to simple string concatenation needs an intermediate representation of SQL language components. That intermediate representation is going to look a lot like SQLAlchemy, or Sequel, or Arel. That means that instead of "just writing SQL" you're actually adding an extra layer of abstraction and another chunk of documentation to read.

All that said, the "just use SQL" approach is already fairly well served. Most languages have mature bindings to specific databases as well as at least one ODBC-style API to simplify access to those bindings. Nearly all of these have support for client-prepared statements, which handles the interpolation problem. If you want to build more complex queries on the fly, you can use string concatenation or CTEs or decide to use a SQLAlchemy/Sequel/Arel-type library for just that one case.

And, honestly, you don't need to give up SQL to make use of Sequel. I personally really dislike the query composition API demonstrated at the top of this thread, but you can do useful things with Sequel without it:

    db = Sequel.sqlite
    db.execute "create table foos (id int not null, name text)"
    db.execute "insert into foos (id, name) values (1, 'bar')"
    # or
    db[:foos].insert(id: 2, name: "baz")
    db["select * from foos where id = ?", 1].first # ==> {:id=>1, :name=>"bar"}
    # or
    db[:foos]["id = ?", 2] # ==> {:id=>2, :name=>"baz"}
No need to go any further than raw SQL, but increasing levels of abstraction available if you feel like it.

> The psuedocode in question happens to be syntactically valid Ruby.

So is everything else anyone types, code-like or not. He even said "Note how I deliberately shuffled the order and didn't bother with escaping.".

The response was flippant, intelligence-insulting, and obviously the result of failing to read thoroughly.

And speaking of intelligence-insulting, we all know you can run raw SQL through Sequel.

You're not having a useful dialogue, you're being combative, like the person I initially replied to.

I said the code snippet looked like raw interpolation to me, and I asked how it could be made not vulnerable to injection. It was an honest observation and a genuine question. No flippancy was involved. You're free to think I'm an idiot, but you are the one being insulting and combative here.

As everyone has repeatedly pointed out, once you deal with the obvious issues you quickly end up with something that looks like the gazillions of other ORMs on the market; sorry if I hurt anyone's feelings.

There is. MyBatis, which is a minimal ORM that aims to keep you as close to SQL as possible, has support for dynamic SQL: https://mybatis.github.io/mybatis-3/dynamic-sql.html

Edit [responding to moe, below]: that's a matter of taste. I prefer to have my SQL _outside_ my code. If I have to write a little XML to make it happen, so be it. Additionally, unlike your example, a strict separation of SQL and data ensures SQL injection is not possible. That's also a worthy goal.


Does MyBatis have an API similar to what I outlined above?

Because the page that you link to shows a horror that I can't even begin to describe:

  <select id="findActiveBlogWithTitleLike"
    WHERE state = ‘ACTIVE’
    <if test="title != null">
      AND title like #{title}
SQL mixed with proprietary XML soup? Talk about combining the worst from all worlds...

Get rid of the XML and you have a good start, I'd say.

> Why is there no ORM that works like this?

Because no one wants to write a SQL fragment parser. You would have write a parser that:

a) Could be started at an arbitrary place in a query and figure out what it needed to do.

b) Would be SQL-dialect aware. (Maybe -- it might be possible to skip this.)

Were it not for (a), I'd imagine you could quickly build this by using some kind of debug feature of the database -- sending it a query to parse and getting back a parse tree.

What you're describing isn't an ORM, it's a SQL query abstraction.

Knex is a very convenient, Promise-oriented query abstraction, and it has "raw" methods when only SQL snippets will do. http://knexjs.org/

Generally I'll go with something like this for these conditional search based queries

    select id, blah, boo
    from products
    where 1
    and (title like :title} or if(:title = '', false, true))
    and (price <= :price_range or if(:price_range = -1, false, true) 
    limit :limit
    offset :offset
Its a trade off, bit of extra complexity in the query for simpler code logic. Really, I avoid using sql for these types of "search" / find methods, they're probably going to be slow as hell as soon as you get a sizable data set.

I really despise orm's its essentially taking a black box (sql/database) and throwing it inside a much less tested, less optimized, less documented black box.

For what reason, I have no idea, nobody has ever been able to convince me there is a problem that needs solving. Almost every argument i've heard amounts to problems sql already solved a decade ago. I can only speculate but I'll hazard a guess it comes down to lack of enthusiasm to really learn sql properly (even though the person uses it / debugs it daily)

Microsoft's Entity Framework and LINQ work pretty much like that, only with nice IDE support.

I'll try to write an example like that when I get home.

Of course, they're not "just SQL", and I do understand where the above comments come from.

And yes, sometimes I have to go look up how to do things I'd do trivially on SQL.

> Why is there no ORM that works like this?

Because you're only showing a query builder, the "relational" not the "object mapper".

From an OOD point of view, if the end result of that query will be Product instances, why am I using a Select object to create them and why is it having to do some sort of string parsing to determine the objects I'm loading?

if the end result of that query will be Product instances, why am I using a Select object to create them

Because we can just infer the type to be returned via the FROM-clause of the query.

and why is it having to do some sort of string parsing to determine the objects I'm loading?

Because, to cite the immortal Larry Wall:

   The computer should be doing the hard work.
   That's what it's paid to do, after all.
   -- Larry Wall
It has to do the string parsing, escaping and intelligent validation so I, the human, can write

   s.add "count(*) / sum(count(*)) * 100 as percentage"
instead of

  .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
The difference is that most people can read and write the former without thinking.

The latter version may not look much more complicated at a glance. But as we all know these seemingly trivial fragments, more often than not, take ungodly amounts of very frustrating trial & error before they play along.

You don't have to pass a block to select_append. With Sequel, you can do this:

   # db is a Sequel::Database
   s = db[:foobars]
   s = s.select Sequel.lit("max(id) as best_id")
   s = s.select_append Sequel.lit("count(*) / sum(count(*)) * 100 as percentage")
   # ==> "SELECT max(id) as best_id, count(*) / sum(count(*)) * 100 as percentage FROM `foobars`"
The only real difference here is Sequel.lit, which is needed for security (any secure DB interface needs to somehow be notified that strings are safe to put into a query without escaping). If I'm writing code that leans heavily on Sequel to build queries, I'll make a private method #sql that is an alias for Sequel.lit.

It's entirely feasible to use Sequel this way. I've used it like that in production. I loathe the query building DSL, myself, but it's strictly optional.

The only real difference here is Sequel.lit

Well, no.

I repeat: The real difference is that most people can't write even this utterly trivial snippet without studying the Sequel documentation first.

Now what if I want a WHERE-clause? Do I have to use s.where? Or s.select_append("where ...")? What if I need to combine them with AND/OR?

It's not ok that we have to think about all this boilerplate that has nothing to do with our actual query.

We shouldn't have to translate our perfectly unambiguous request from english (SQL) to klingon (Sequel API) in order to have it processed.

I'm skeptical that it's possible to assemble a working SQL query of much complexity unambiguously from an arbitrarily-ordered set of clauses. CTEs, subselects, and boolean conditions would require a nesting construct. Boolean conditions, order clauses, and some joins need to be assembled in a meaningful order. Eventually you're going to find a level of complexity in the query where it's necessary for a developer to read the documentation, either to figure out the API or to understand the method the library uses to stitch queries together.

There might be a sweet spot yet unreached in terms of allowing developers to work with SQL on top of as minimal a native binding as possible. I'm certainly not claiming that Sequel/SQLAlchemy/etc. is at or near that sweet spot. It might be necessary to use a language with language-level support (or macros) to really reach it.

How often do you have a need for non-engineers to be able to alter your database queries? Why is this a requirement or even desirable? There seem to be few cases where a query change needs to be made in isolation, without having to touch any of the downstream consumers of that data.

So if we remove this non-requirement of "anybody can write it", we're back to my original question: Ok, we the coders know we want a Product instance, but to build it we create a Select instance and then ask a library do so some sort of parsing?

Possibly no ORM exists that meet your criteria because there are issues with the design goals you're trying to achieve?

Have you thought about JSON as a partial query placeholder, and all the native methods available for traversal, merging, plucking, etc.?

For example, here is :

{ type: 'select', columns: [{ expression: 'orgJobs.org_job_code', alias: 'job_code' }], joins: [{ table: 'org_jobs', alias: 'orgJobs' }], limit: 100, offset: 0 }

which yields:

select orgJobs.org_job_code as job_code from org_jobs as orgJobs limit 100 offset 0

and so its easy to do things like:

query.limit = 100;

and of course:


and use something like lodash/underscore (or equivalent for your language) to add collection/array/object helper methods so you never have to write for-loops for things like mapping and property plucking.

Now you have to be aware of SQL escaping to prevent SQL injection attacks.

Everyone who works with SQL should be aware of SQL escaping. There's nothing wrong using a library to do it for you, but what you should have an understanding of what the library is actually doing. Otherwise how could you possibly test it? Just by hoping it's right?

One of the things I like about EntityFramework is that you write LINQ, which odds are these days every C# dev should be familiar with. So, standby a few gotchas (which SQL has as well), the code you write to query an in memory collection is the same code you write to query the database. This also makes it fairly easy to test because you can create in memory representations of your data and check you filters, projections, orders etc.

Stuff like your example is where, in general I prefer a sproc, or something more like the following...


there is, it's called anorm. https://www.playframework.com/documentation/2.3.x/ScalaAnorm

You don’t need another DSL to access relational databases SQL is already the best DSL for accessing relational databases. We don’t need to invent something new. Moreover the SQL syntax and features can differ from one database vendor to another.

Micro ORMs usually works very much like this. There is a problem when you want sub records, they are handled differently in each library or not handled at all.

You've formulated the composibility reasonably in that sentence but it's important to remember that is not the only problem the world faces in data manipulation.

I think the irony is that much of the problems of sql are related to a lack of tooling.

The lack of tooling is result of large numbers of devs being distracted by ORM which moves that particular problem into the language ide/editor.

Essentially we lack great *.sql editing, ide, macro, refactoring because of the lack of focus on sql itself.

edit: as an aside, string concatenation in the language of choice is its own road to hell.

> Essentially we lack great *.sql editing, ide, macro, refactoring because of the lack of focus on sql itself.

Except that my other tools are just an editor like emacs and makefiles. So now to be productive I have to have and ide for sql, learn sql, learn macros.

Interesting that SQL came first. If it was so easy and obvious to use it we would not have seen any ORMs by now. But every other project that uses SQL databases ends up with some kind of ORM.

One of the reason is probably because there is an impedance mismatch. Code is already written dealing with objects, functions, pointers, data. But when it talks to the database it has to talk via a command line string interface so to speak.

There is another type of impedance mismatch and that is learning a new language. One can argue SQL is there to keep & manipulate the most precious things you have -- data, so one should spend time learning it well. But the other point is, SQL is there to keep & manipulate the most precious things you -- data, so don't mess with it if you don't know SQL very well. Let someone else learn it (the author of the ORM).

I am not arguing for one point or the other, just kind saying how it is interesting how we have been going in a circle the last 15 years or so.

I think it's a mixed bag... and depends on what you want to accomplish... a lot of the time, I will find myself bypassing an ORM about 20% of the time... the other 80%, the ORM model is simpler.

I find LINQ + Entity Framework to be a blessing in C. Though I use the monad/function syntax not the LINQ query sugar when I use it.

I also use emacs(evil-mode actually). An ide is not much different then a fully configured vim or emacs.

sql-mode for some databases(mysql and oracle iirc) reads the DDL of the tables you are typing about, why is this not better and more general?

Sql is so well established and general as a solution that not learning it is a poor choice.

You mean something like this? https://www.jetbrains.com/dbe/

Yes that was in my mind as I was typing it actually :)

Notice that its only a yearish old I think?

The IntelliJ IDEA Database plugin [1] on which 0xDBE is based on has existed for quite a while. I'm not sure when it become very useful as I've only used it for a few years, but it is quite powerful and useful.

The SQL code completion is quite good (and it caches your entire database schema so it is very quick). That's not particularly special on it's own, but what makes this especially useful is IDEA's "language injection" [2] feature. This allows you to, for example, get completion for SQL when it is contained in some other language, which could be anything (e.g. a Java String, Ruby String, XML, etc). It will also analyze and report errors in this SQL on the fly.

And SQL statements contained in concatenated strings are no longer a problem because you can edit those in a separate editor window where you are only editing the SQL, and it automatically gets placed into the concatenated string.

Not sure if it addresses all of your specific concerns, though.

[1] https://www.jetbrains.com/idea/features/database_tools.html

[2] https://www.jetbrains.com/idea/help/using-language-injection...

I am rather fond of Sqitch and pgTAP as tools for database development.

Sqitch problem domain I handle a different way.

Pgtap is on my list of things to poke around with.

edit: thanks btw

I would write a SQL function that takes all of those as optional parameters and includes a lot of these:

    WHERE (_title    IS NULL OR title LIKE _title)
    AND   (_minPrice IS NULL OR price > _minPrice)
    AND   (_maxPrice IS NULL OR price < _maxPrice)

That ends up being very inefficient as the optimizer will come up with one mediocre plan that works for all those criteria.

This actually answers the question, although I would imagine the ORDER BY handling will look pretty messy (CASE statement perhaps?).

And I don't know what it would look like if the requirements changed to allow ordering by multiple columns with different possible sort directions... that might get back into dynamic SQL using a RETURN QUERY EXECUTE type of thing, which is basically using a query builder in your query language.

Most SQL engines support ordering by column number.

select c1, c2, c3, ... order by 2;

would order by "c2"

> Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

I can't speak specifically to Postgres, but, in my experience, there is nearly always a way to do what you want with bound parameters. No advanced features are required. It often results in poor performance and redundant code that is hard on the eyes. You will get what you deserve, but sometimes you don't have a choice.

Here is an example of the horror:

    WHERE (:1 IS NULL OR :1 = item_id)
      AND item_num BETWEEN NVL(:2,0) AND NVL(:2,9999999999)
      AND item_date BETWEEN NVL(:3,'01-jan-1900') AND NVL(:3,'31-dec-4712')
      AND item_name LIKE NVL(:4,'%')
      CASE :5 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END,
      CASE :6 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END

Edit: I suppose you could also parameterize the ascending vs. descending sort, although I have never tried. My first thought is to duplicate each line in the ORDER BY clause: one bind parameter for ASC and another for DESC. Have each CASE return a constant if the bound value is NULL, and then bind NULL for the direction you do not want. Yuck.

I am not advocating any of this but am pointing out that bind parameters can be abused in surprising ways if you are backed into a corner.

You might be over thinking this... something like this might work out just fine. (I wouldn't necessarily do things this way, but rather keep a list of clauses and join them with " AND " to avoid keeping track of the "WHERE"s and "AND"s, but you get the point...)

    sql = 'SELECT * FROM products'
    args = []

    if title:
        sql += ' WHERE title LIKE %?%'

    if upc:
        if args:
            sql += ' AND'
            sql += ' WHERE'
        sql += ' upc LIKE %?%' 

    if price_min:
        if args:
            sql += ' AND'
            sql += ' WHERE'
        sql += ' price >= ?'

    if price_max:
        if args:
            sql += ' AND'
            sql += ' WHERE'
        sql += ' price <= ?'

    if order_by_name:
        sql += ' ORDER BY name'
    elif order_by_price_low:
        sql += ' ORDER BY price'
    elif order_by_price_high:
        sql += ' ORDER BY price DESC'

    conn.query(sql, args)

This is a query builder. If you're using your programming language to dynamically assemble the SQL statement fragments at runtime, then you're using a query builder regardless of if it is a library with a fancy DSL that assembles an in-memory SQL AST or some kind of ad-hoc string concatenation you rolled yourself like this.

The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments.

Yes, logically, it's a query builder. But it's one that doesn't force any extra dependencies in my project and is very easy to troubleshoot.

Seriously, how is this different from "just use SQL". SQL was never really intended to be used by itself (except for manually typing queries into a console). Yes, what I wrote is code that writes code, but it's far different from a larger ORM or querying framework.

Disregarding my edit and how your query builder is irrelevant to my question anyway, if I change the requirement to allow the user to sort by multiple columns instead of one, that case statement is going to either combinatorially explode or you'll also have to add some ad-hoc SQL escaping to guard against SQL injection since you can't use bound parameters in ORDER BY clauses.

It's easy to see how libraries like Sequel get invented to help people do this, and I don't really see what the problem is when the DSL stays close to SQL semantics.

Dealing with multiple ORDER BY clauses isn't any more difficult, but you'd have to figure out how to get the user input into some kind of list. That will require validation too... Just using an ORM doesn't let you ignore input validation. You wouldn't trust a user to use their own column names in a WHERE clause anymore than you'd trust them in an ORDER BY clause.

Trying to differentiate between a DSL that interfaces with SQL and SQL is nonsensical. SQL is the original DSL.

> SQL was never really intended to be used by itself.

Hmm I thought that is how it was intended to be used? If it didn't, it would look like datalog, lisp or some binary protocol with prefixed lengths and whatnot.

It was intended to be typed in by analysts at a console. Who would then print the report on the dot matrix printer and mail to the headquarters or something of that sort.

That data has to get into the RDMBS somehow...

I guess a better wording may have been to say that SQL was never intended to be used in isolation. It can be, but you've always had interfaces to other languages / platforms. SQL is a data manipulation and retrieval language - the rest of the business logic has to be performed somewhere else (excepting complex stored procedures).

I was told it was intended to be easy to use for non techies at University. It is quite wordy, kind of the opposite of some Perl.

SQL was often combined with Cobol (might still be where they have Cobol).

was? You sound like lots of paces have got rid of it.

I know this isn't central to your point, but I thought I'd say it anyway. These days I have taken to doing:

    sql = "SELECT * FROM products WHERE TRUE"
    if upc:
        sql += ' AND upc LIKE %?%'
This saves having to worry about "AND" and "WHERE", and the extra "WHERE TRUE" results in the same execution plan (thus the same performance).

I also add an extra column to my SELECTS:

I can just use "--" to comment out any line without having to manage the last comma. Especially useful when building aggregations.

Awesome. That's such a straight-forward solution to one of the crappy edge-cases that SQL is so bad about (ending lists without commas, like JSON requires too, being another).

Now how is this an improvement over the Sequel example?

I understand the impulse to "Just write SQL." But in practice, with all the string concatenation needed to generate actual queries, you can't really see what the SQL will be without running all the code in your head anyway.

In practice, I'd reduce this to a query building function in my own code, so the readability would be much better. The benefit is that you don't need to include another library for it to work. Well, that and it is a strategy that works for all languages, not just Ruby.

And the downside is now you need to spend effort maintaing and debugging your custom query builder.

Yeah but a lot of it is quite generic, so could be reused in a few places and therefore make that extra effort worthwhile. In fact, it might be handy in a few places so you could package it up and then have a single place you have to maintain the query building code for several projects.

Wait a minute...

I think you're over estimating the amount of effort that I spend on this.

I've done this quite a bit as well: string building is pretty well understood, and works remarkably well.

Dynamic prepared(NO ADHOC, THIS IS CRAP) TSQL is a common occurrence for me in production, it is often a better choice that writing a bunch of conditional logic in SQL (to get sargeable queries) or a bunch of really unperformant case statements in where clauses.

I have written TSQL that changes the entire query (which was on the order of 80 joins) dynamically based on about ~20 different parameters, only a few actually passed into the stored procedure, the rest based on settings and configuration. It took a few hours to grok how everything came together at first, but a few tricks that the optimizer will remove means you can add arbitrary conditionals in whatever order you please.

The example you give is VERY trivial for dynamic SQL. postgres - http://www.postgresql.org/docs/9.1/static/ecpg-dynamic.html mssql - https://msdn.microsoft.com/en-us/library/ms188001.aspx

That can be solved by writing a SQL function and the user input values are arguments to the function.

    query = sprintf("select count(*) from data where %s = ?", column)
    results = sql.execute(query, filter_value);
So long as you leave the "value" portion of the query as a '?' (or %s, or whatever your connector requires) and don't use direct user input for the column names, you're still safe from SQL injection.

Even with PL/pgSQL, there is no way to build a SQL statement dynamically _and_ safely -- it's all just string concatenation there, too. However, using `CASE`, `WITH` and `LATERAL` you can have a root `SELECT` that returns one of a few different queryable code paths, and `WITH`/`LATERAL` allow you to reuse definitions.

Use format() for safely constructing dynamic queries.


You end up doing a lot of string concatenation even with that, though. It's not quite the same thing as having some kind of literal or combinator based setup.

  select(from('tab'), where(and(like('name','Mo%'),not(is('id',NULL)))

You can get a similar syntax with format() by nesting format() calls such as:

  EXECUTE format('SELECT * FROM tab %s', format('WHERE %I IS NOT NULL', 'id'));

Answer : Learn SQL. It is very easy to pass variables.

I'll deal with the problem when I have to.

However, the point is that I'd rather use SQL, it's simple as that.

Orms are not about syntax, they're about things you can't natively "think about" in SQL, like inheritance, composition, references and graph navigation

I prefer them for taking away the grunt work of needing to code updates and inserts for everything.

I'd argue it's less about composability, and more about laziness (in the computational sense). As pointed out elsewhere, if you have all of the information needed to generate a dynamic query, it's often not a huge leap from an ORM to composing strings (especially given how relatively flexible SQL syntax can be).

However, sometimes I want one part of my program to be responsible for one bit of a query, and a separate piece to be responsible for something else. To take a trivial example, say I want one object/function to be responsible for doing the right sorting, another to be responsible for any pagination, and yet another to be able to group results when needed. In that case, having a programmatic abstraction over a query (whether it be an object or a datatype, doesn't matter) can be very useful.

I agree with you example, in that case it is useful. It's just that it is a pain in the ass in so many other cases.

I always wonder why developers (usually young and enthusiastic ones...) pick a set of the most challenging requirements they can think of and then use them to justify the usage of some library or pattern that makes life a nightmare in the simpler cases, which are obviously the majority of the use cases.

It's like using a tank for going to work, because the roads might possibly have been destroyed and there might be enemies shooting at you. Then ending up destroying and having to rebuild your own garage because the tank is too big to fit in. Complete nonsense.

> I agree with you example, in that case it is useful. It's just that it is a pain in the ass in so many other cases.

It's a pain, because what we really want in such cases is a datatype server, not a relational database. For all of SQL's merits, the NoSQL backlash was not without cause. For too long the SQL hammer has been wielded to pound in too many screws.

Of course, programmers being programmers, NoSQL turned into just another hammer used in just as many inappropriate situations. The example I outlined is not a theoretical one. I've dealt with such situations...but I've done so with Datomic, which is designed with datatype-composability in mind. Similarly, I've found a healthy mix of SQL, Redis, memcache, and Datomic all in moderation can go a long way.

...but I've still not found a solution where Mongo is the obvious answer.

The biggest challenge is to understand and find the simplest case and deal with the hard cases when they happen. I had to learn this over time.

Here's an example in C#. Imagine you're querying a database of products (here represented by integers). Users can enter filter parameters - you want to build your query dynamically based upon what they enter. With LINQ, you can do this kind of composing with no effort. You also get to run the same code on any kind of Queryable, so if you feel like doing some of the work in RAM and some using a DB, your query is usually going to be very similar, if not the same.

Yes, you still have to understand what you're querying against and how you should build your queries to make best use, but I'd much rather write this kind of code than try to concatenate SQL.

  IQueryable<int> Source()
    return Enumerable.Range(0, int.MaxValue).AsQueryable();

  class UserFilter
    public bool? EvensOnly { get; set; }
    public int? Minimum { get; set; }
    public int? Maximum { get; set; }

  IEnumerable<int> Search(IQueryable<int> source, int currentPage, int pageSize, UserFilter filter)
    var result = source;

    if (filter.EvensOnly.HasValue && filter.EvensOnly.Value)
      result = result.Where(i => i % 2 == 0);
    if (filter.Minimum.HasValue)
      result = result.Where(i => i >= filter.Minimum.Value);
    if (filter.Maximum.HasValue)
      result = result.Where(i => i <= filter.Maximum.Value);
    return result.Skip(currentPage * pageSize).Take(pageSize);
  void Main()
    var currentPage = 4;
    var pageSize = 5;
    var filter = new UserFilter {
      EvensOnly = true,
      Minimum = 1000
    var searchResults = Search(Source(), currentPage, pageSize, filter);

FYI, you can also write it this way, although you may find it less readable, it emits the exactly same query to the DB, and I personally find it much more pleasing;

  IEnumerable<int> Search(...) {
     return result
        .Where(i => !f.EvensOnly        || i % 2 == 0)
        .Where(i => !f.Minimum.HasValue || i >= f.Minimum)
        .Where(i => !f.Maximum.HasValue || i <= f.Maximum)
        .Skip(currentPage * pageSize).Take(pageSize); 
Nit: EvensOnly should not be nullable since it has only two states.

I find this more readable. In fact, the code here looks more like this:

Obviously the example is trivialised and the real filters are more complex, requiring joins, but the pattern gives a very readable way of writing an efficient query.

Unfortunately EF produces an unreadable query when there are a few dozen filters, but the LINQ code is readable, so there hasn't been any difficulty debugging.

Works great until you need some feature that is only implemented in SQL and in my experience that's unfortunately about 30% of the time.

How often do you really reuse bits of a SQL query? SQL is the language that an ORM will generate. The question is, how much work is done to avoid using SQL and is it worth it?

The only time that I've found ORMs useful for composing queries when I have to dynamically create a query at runtime based upon user input. And even in that case, today, I'd probably still just concatenate strings for a proper SQL statement.

I use this a lot for report-builder interfaces, and according to my personal experience SQLAlchemy in Python maps really well to raw SQL while adding a lot in terms of DRY, portability and security.

I guess the underlying question is how much your development style involves meta-programming (and how much someone enjoy reinventing the wheel with their own query builder engine instead of trusting a mature ORM).

With Sequel I reuse bits all the time, because you can generally very easily chain statements that return dataset's until you force evaluation of the query.

I think there is a problem of tooling in SQL, which means it is difficult to reuse it.

I spent the other day writing a big query, joining about 10 tables. So usually I start by joining the two main ones. Check that gives me the results I want. Add in another table or conditions, check again. Repeat until all tables are joined into the query and conditions are added.

Then I noticed the results of my GROUP_CONCAT were not as I expected. I had a couple of suspect joins that I tried removing. Same problem. In these situations, it is often easier to go through the same process from scratch of adding in one table at a time and ensure that it is working.

We need some kind of unit test equivalent for SQL.

Mature relational databases support views, pre-built queries that developers can define and other developers can then use as "tables" in higher-level queries, allowing some code reuse. This has been around a long time.

It happens a lot for me. Basically you have to choose between copy-pasting a lot of sql code or trying to reuse parts when the business logic has multiple cases that are similar but not exactly equal.

The composability requirement only justifies itself when... your queries need composability.

I'm absolutely in favour of using a query builder when it avoids the need to fiddle with strings, but at the same time any decent programmer can write in half a day a query builder that satisfies the vast majority of the use cases.

So the question is: once you have covered those cases, do you need to use the query builder to write static queries? No. Do you need an extremely complex query builder to build extremely complex queries? Again, no. SQL is great for writing very readable and optimized static queries, which are almost all the SQL queries you need when you're not dealing with search forms.

Bottom line: use your favourite query builder or write a tiny one, but use it only when the resulting code is simpler than the plain SQL + string joining.

We already have quite a bunch of query fragments we use to compose bigger queries, so I'd say in those cases it's more than justified.

> So tl;dr: it's about composability, not being "lazy" or "ignorant" to SQL.

If you want composability you should factor the common sql fragments out to parameterized views (table valued functions), or just regular views. Doing this across a large project can really improve performance as well.

> In both cases there's little validation of the query's > correctness (syntax wise) until you actually run it.

Yes? So? When is this really a problem? Even if you do generate queries dynamically. When does this become such a big problem that it is worthwhile killing readability and bogging down the system with lots of extra complexity?

As for ORMs: I've done software for 25 years. I still have not seen a single example of ORMs resulting in better software. Not a single time.

I agree with this 100%. SQL works incredibly well for purely declarive, relatively simple operations. However, the moment you start doing "Do this, then that, then that", it starts to look (literally the appearance) daunting.

I myself come from the R/Python background, and my hypothesis is this difficulty around composability (especially for dplyr/pandas crowd) is what deters data scientists from writing more SQL.

Use an IDE like JetBrain's IDEA that checks the SQL syntax against the database.

I use Entity Framework as an ORM and I have to say I like it. As much as there have been a lot of false-starts related to trying to reinvent SQL, I think Entity Framework hits many of the right notes for me:

1) All my code lives in Visual Studio with compile-time type-checking. No maintaining stored procedures outside of my main codebase, no mucking about with strings. And because Entity Framework puts the Select after the From clause, I even get good intellisense on column-names.

2) I can quickly shift code into being in-memory queries from database queries.

3) While it's a little muddy mixing SQL and C#, there are places where C#/EF's syntax is so much more brief and clean since it's hyper-aware of relationships. For example, instead of writing 2 joins to get the country of a user, I can say User.Centre.Country.

Of course, the Entity Framework is adding a layer of complexity to my work - you still have to know what's going on in SQL, and that mental overhead is a downside. But it pays off.

SQL is a brilliant language, and that's why it's stood the test of time in spite of its advanced age.

But seriously, it has a lot of bad flaws - it's often difficult to make reusable components in SQL like you can in other platforms.

The underlying relational algebra is brilliant. That's why I like ORMs - they admit that SQL is ideologically beautiful, but the SQL language itself could use some loving.

Yes. I pretty much agree with you and am happy for people to use ORMs (or raw SQL if that's their thing and they are into maintaining it). It's actually the underlying 'throw away all this old SQL database stuff' that is the biggest loss for people because they miss out on the power and optimization that's gone into those databases.

But doesn't the Sequel snippet just write SQL as Ruby and get Ruby objects out? It's still pretty raw SQL, just not SQL-as-embedded-strings-guess-the-syntax-error. And surely it's easier to extract, compose or otherwise manipulate reified queries than by concatenating floating bits of strings.

Yep, even better imo is that when you run a query it outputs the generated SQL. Often I'll have a psql console open next to vim while creating Sequel queries and just copy this output over to verify the generated SQL does what I expect.

When writing raw SQL, I tend to test my queries against the database first with sample queries, then carry them into code once all the logic has been figured out.

I will just say that no one should have inline SQL in their code. For security's sake, use a parameterized query!

Entity Framework still generates pretty crappy looking SQL. It's nice to get a project going, but every time I try to look at why we are having performance issues in production and see the horrible query EF is generating I cry.

While I agree the SQL it produces looks ugly, performance wise I haven't found too many issues (using EF 5 then 6). The few cases where it produced legitimately bad SQL (slow), was due to the developer failing to grasp how queries are built (in particular abusing "in" and having multiple layers of selects instead of just a flat query which could have done the same thing).

I will say EF requires extra training/knowledge. So it isn't a "free" framework. However being able to migrate your entire code base from one database to another only changing a single annotation in the Code First classes is pretty nice. As is being able to generate your databases via the initialisers.

Is it for every project? No. However for "classical" database projects where you're going to be writing a bunch of CRUD and shoving it into C# classes, it is very helpful.

No-one ever migrates their db unless they really are in some sort of serious shit. I'll also guess you've never done it because the EF + MySQL = a world of pain. MySQL does not like nested queries, the EF uses them like they're crack, one of the reasons the SQL it produces is so hard to read. I'd guess that the EF + [any db that's not MS SQL] probably suffers from similar "holy shit why did the DB just die, oh it's the EF" problem.

I'm also guessing John's problem with that query is that it's a simple aggregate query in SQL, there's no good reason to use the ORM, mainly because you have no idea what convoluted SQL it's going to spit out. There's only downsides with no upsides.

> No-one ever migrates their db unless they really are in some sort of serious shit.

I've been involved in several migrations. All of which were planned over years in some cases. There was no "serious shit" at any point in the process.

One popular reason for migrations is licensing costs (e.g. escaping Oracle or IBM), or due to mergers where the other company had a different database system and they wanted to consolidate both technology but also expertise.

> I'll also guess you've never done it because the EF + MySQL = a world of pain.

I've never done what? I've never done EF with MySQL. I never claimed I did however.

I've used EF with other database systems including Oracle 11g, MS SQL (several), and other integrated it with some smaller database systems that had EF providers available.

> MySQL does not like nested queries, the EF uses them like they're crack, one of the reasons the SQL it produces is so hard to read.

You, the developer, formulate the queries. If you're seeing excess nesting then examine how you're doing things.

> I'd guess that the EF + [any db that's not MS SQL] probably suffers from similar "holy shit why did the DB just die, oh it's the EF" problem.

That's not been my experience. While I have found some other EF providers "limited" compared to Microsoft's provider, most of the problems encountered were due to underlying bad configuration or bad entity layout rather than EF itself. I will say Code First is significantly better than EDMX-style modelling. In particular in larger projects (the Visual Studio designer kind of sucks).

> mainly because you have no idea what convoluted SQL it's going to spit out.

It is very easy to see exactly what EF spits out. You can even pipe that output all over the place (e.g. error system, performance logs, etc). There's no excuse for being ignorant of what EF is doing under the hood anymore.

Indeed for web development tools like MiniProfiler give you access to the queries in the page along with execution time and duplicate query detection. We use it on Oracle with the Devart provider and it's pretty snappy even with the somewhat bonkers looking queries it sometimes generates.

Because in my office the alternative is PL/SQL functions that dynamically build WHERE clauses, and I'll choose EF over that any day.

Given a best-case MySQL database that needs to be migrated elsewhere for data quality or performance reasons, it could be a very straightforward task that any qualified DBA could manage. That isn't the most common case, but it's one you can achieve by intention.

Don't try to read the generated SQL. Debug the query plan directly.

Yes, but using projections and generally avoiding descending deep down object graphs for queries of multiple objects generally yields acceptable performance.

> And because Entity Framework puts the Select > after the From clause

I think that you're confusing LINQ with the ORM.

LINQ is a language feature, Language INtegrated Query. It operates on IEnumerable<T>. They introduced a new interface, IQueryable<T> which extends IEnumerable<T> with Expression. Basically it exposes your "query" as an Expression Tree - this can then be used by QueryProviders to do cool stuff like generate SQL.

A number of ORMs implement LINQ QueryProviders; Entity Framework, NHibernate, BLToolkit are all good examples.

EntityFramework, like a lot of fully-featured ORMs suffers from all of the traditional ORM problems: runtime performance is sub-optimal, it's query heavy and if you use the visual designer it encourages bad development practices.

On the other hand it allows people with very little knowledge to be productive; for small-scale projects the bang-for-buck is hard to compete against.

EF has too much churn to be relied upon. They completely re-write it every few years. This time they are really taking an axe to the whole existing structure

I like EF well enough but if you don't know what you're doing it definitely makes it very easy to write code that works but runs really slowly when n is bigger than like, 10.

Sequel is the best ORM I've ever seen or used, but that is (probably IMO) not a good application of it.

It gives you total choice over what level of abstraction you want, so this is a particularly egregious use of it, being as nothing about that query is dynamic :)

You can use it purely to execute handwritten SQL queries loaded from files, or stored procedures, or any level of abstraction between raw SQL and the monstrosity you posted. The model layer is totally optional and is built cleanly on top of the query/relational layer, not inside of it.

The thing is a fucking work of art to be honest. The design shows impeccable taste, ruby's more exotic features are deployed only when really needed, everything is so clear, especially if youre used to activerecord.

BTW also the developer is super helpful and pops up everywhere to answer questions.

jeremy is incredible. he is more helpful and available (and certainly more knowledgeable) than most paid support people for premium products.

if you are learning sequel, stop by the irc channel and he'll probably answer your question.

seriously the man is a hero. I owe him about a brewerys worth of drinks.

EDIT: I just went to look at the github: 1821 stars, zero issues. And yes, github is the projects official bug tracker. There are 662 closed issues.

I must see if I can arrange for work to send him some money...

Or a brewery!

sequel + postgresql is pretty much a dream team. i really miss it now that i am on a python stack.

Agreed, but it's important to understand what drove the adoption of NoSQL and schema-less stuff as well as the related trend of dynamic languages like Ruby and JavaScript.

(1) SQL server software itself was clunky, hard to scale, complex to deploy in a clustered fashion, and generally "old" in a bad way. This made NoSQL seem like a breath of fresh air.

(2) Startups! Ship now! MVP! Fail fast! The whole industry has been operating on this ship-fast-fail-fast way of doing things that prioritizes development speed and agility over correctness. That's because half the time what you are developing will be thrown away anyway -- it will fail, or you'll "pivot" and rip it all apart and almost start over. No point in static languages, SQL schemas, or other explicit design elements in that world.

#2 is I think the more significant of the two for things like schema-less MongoDB or dynamic languages. Once you get product/market fit, you typically end up taking the crap you wrote with no types and schemas and re-doing it in a "real language" with a "real database." But as they say in Startupistan, that's a "good problem to have."

Where this line of reason goes completely off the rails is in thinking that schema-less databases are a hedge for future uncertainty. Yes they let you churn out code slightly faster, but then your data becomes a ball of mud just as quickly as your code base, except the former is much much worse because when you pivot you still need your data even if you decide to chuck out all the code that goes with it.

In fact, a traditional RDBMS is designed to allow for any kind of ad-hoc querying you desire with reasonable performance, and the ability to normalize/denormalize, index and materialize views in order to optimize unforeseen use cases. The excuse of poor scalability is just a rationalization that some kids who didn't understand SQL used to justify using something that has a shiny marketing page that they ignorantly found more viscerally appealing. The tradeoff was all wrong, because 99.9% of projects will never need to scale beyond a single DB server, and for those that do, the flexibility of a well-defined schema and ad-hoc queryability will give them an early advantage in what kind of pivots are reasonable—if they are lucky enough to have to scale then they can replace the RDBMS with the context of knowing what exactly their use case that they must scale is. And at that point you'll presumably have the resources to do it, which is much more effective than prematurely attempting to design a scalable infrastructure and discovering that you didn't have the first clue what the real bottlenecks or even the full use case would be.

The "schemaless" argument is nonsense as well. No schema enforcement at the database level means more checking at the application level (more work basically).

I imagine "whole industry" == Silicon Valley?

I am yet to do any project that doesn't use SQL servers for data storage.

Yeah, I mean "Greater Silicon Valley" by which I mean SV itself and its wider diaspora. Other names include Startupistan, Hackerdom, etc.

I assure you that whoever wrote that code knows SQL.

The builder syntax is convenient because it allows queries to be composable without ugly string-smashing. For instance, you could take the object built out of that "User.select..." and pass it around into other functions that append extra 'where' clauses, etc.

I may just be rehashing sibling arguments here, but to me that particular API looks very much what I think an SQL-wrapping library ought to be: a replacement for string concatenation and something that allows you to treat SQL queries as data. I don't know anything about Sequel, but that example still feels close enough to SQL.

My experience with ORMs has been that I eventually end up regretting using one if I try to model my data as objects because with ORMs it's easy to code yourself into a corner where you end up wishing that your design supported the relational model instead.

I do understand the desire for a good ORM, though. SQL is extremely powerful and a well-designed database is a joy to work with, but a sequence of tuples is often not the most convenient datastructure to process in most programming languages.

In my experience, it's great to start with ORMs, and they help you go really fast, but if the project is important then sooner or later you're going to find a case where you need to go around it and write custom SQL (probably for performance reasons).

And that's fine, but it does mean that you should pick an ORM that plays nicely with that workflow (or otherwise design your ORM to be compatible with it). Back when I was working with Perl I found that you could do some pretty good stuff with RoseDB -- it could load objects from your custom queries and they'd still work like normal ORM objects (i.e. you can load related objects from them). I've missed that capability on several occasions since.

This sums up my experience as well. Great for all the tedious insert update type stuff, but more hassle than they are worth for complex queries.

>As a greying developer

The problem with being a young person who wants to make his or her mark on the world is that if the people before you did an excellent job, you can only make things worse. In fact, I think this is one of our larger problems not only in technology but in society (see the recent move to extremism in many aspects of our political and religious life the people who grew up under more moderate times are pushing for).

Some things should be boring and stable. Build on top of those things. Redoing the foundation every six months is just sub-optimal. Accept that you may not make that big mark in the world, or if you do, it'll be in ways you completely did not expect, just like everyone else who made their mark.

I agree with you. Also, we should be developing products that last. And be proud of that. Why does everything always have to be new?

You sound like someone who thinks we thinks we should still be using Cobol and IBM Series mainframes.

Technology changes. It improves. It gets faster, easier and more responsive to business requirements. If you don't embrace change in the IT industry then get out. Because you simply won't survive.

Technology changes.


It improves.


It gets faster, easier and more responsive to business requirements.


There are a lot of ideas in technology that are widely accepted, particularly by younger, less experienced generations, as being The Best Way to do things. However, if challenged, most of those people don't really know why they believe that. They have never personally seen any hard evidence that The Best Way is actually better than known alternatives. They have never spent significant time trying those alternatives for themselves to gain first-hand experience of the pros and cons. In fact, their belief is usually nothing more than a matter of faith, a blind trust in the dogma of the day and the rhetoric they find on-line or hear from their peers or managers.

See also: OOP, design patterns, UML and round-trip development processes, frameworks, ORMs, XML, dynamically typed languages, more frameworks, TDD, NoSQL databases, and so on, many of which were either invented or popularised as a way to work around the artificial problems created by one or more of the preceding items.

Of course I'm being a little facetious here. But if you think I'm entirely kidding, just spend a few minutes with Google and see how much advocacy there has been for each of those ideas in their day and how much actual data there has ever been to justify that advocacy. And really I'm being kind here, because you talked about things that are supposedly "faster, easier and more responsive to business requirements" and I didn't even mention words like "agile" or "craftsmanship" in my list.

If you don't embrace change in the IT industry then get out. Because you simply won't survive.

I embrace using the right tool for the job. Whether it's new and shiny or old but tried-and-tested doesn't really matter, though if I had to pick one or the other for some bizarre reason, I'd pick tried-and-tested eight days a week. There have been only a few truly significant advances in the past decade or two, but very many technologies that were initially much-hyped but then failed to stand the test of time for exactly the reasons that critics outside the hype bubble had always predicted.

I've noticed this phenomenon as well, and it's very frustrating. Let's add async-and-future-everything to the list as well. I honestly don't understand how to convince people to make an honest assessment of the alternatives, especially when they're unable to articulate what exactly they're purchasing with the complexity they're adding.

Will our profession ever stop being fad-driven?

That's rather going to extremes with what I said. So, no, I don't think like that at all.

PostgreSQL is a very different beast to what it was 10 years ago.

So for a while it didn't support use cases as well as various NoSQL/NewSQL databases. However, PostgreSQL has also been built by great bunch of developers who are happy to adapt and implement the new, instead of harping on about the past.

So personally I adapt that advice to instead "use evidence to guide decisions".

The lack of a schema may sound interesting, and in some cases it can certainly have its benefits. However, for many the usage of a schemaless storage engine leads to the problem of implicit schemas.

As another greying developer, one thing that was seen with ORMs were implicit schemas, often motivated as bureaucracy shortcuts.

So, you need a dictionary of configuration data for your user, but the bureaucratic overhead of adding that to your schema is too much? Why, then take your fancy OO serialization technology and save that little dictionary as a binary blob and stick it into just one new column in the database.

Problem solved...until years later when information in one of those blobs becomes tied (by a maintenance programmer who didn't entirely understand the system architecture) to what is effectively an implicit schema.

And I thought I was the only one who'd encountered that...

I started with SQL in the 90s and completely agree with you. People who do not know SQL tend to reinvent it poorly in a language not designed to work with sets of data.

We have started using jOOQ and it strikes a great balance of raw sql/typed sql and result -> object mapping. I don't ever seen going back to a heavyweight ORM again. There is just too much magic and lost time spend figuring out performance issues for a minimal gain in ramp up for those who do not know SQL.

Gosh, no kidding. (I sorta disagree about ORMs though -- if you're selecting by ID it's monkey work to write those queries, but anything complex, sure, use SQL)

Right. A good ORM should deal with all your common cases (if not, your schema probably needs improving), and it saves you writing a lot of boilerplate code.

For the edge cases where you need complex joins and/or extra performance, don't try and bother with the ORM, just drop to SQL.

Agreed. Also it should be generating queries based on foreign keys/indexes (including joins).

Others should not be allowed in an interactive app anyway: developer should be warned/prevented using a "batch" query by accident.

I would like to say lets step back and not conflate SQL and relational databases together. Clearly SQL as the language the primary way most people interact with relational database.

In my my mind SQL as a language is a huge PITA. First, parsing of complex statements is expensive (there's workloads where SQL parsing takes more time then processing the results).

Second, as SQL exists today (SQL2011) it's a large, complex language that's not implemented uniformly. So I understand why people want to build programmable interfaces for generating queries versus writing giant string statements that expand to SQL.

I personally would wish that there was an alternative language for interacting with relational databases that isn't SQL. Just expose the whole relational expression tree to users say ala S expressions. It's not like the relational engine is going to optimize / re-order it anyways.

I mean something along the lines of:

    [ "name" ],
    [ [ SUM, "COUNT(*)" ],
      [ SUM, "cost" ]],
    [ JOIN, [ ==, "user_id", "customer_id"],
      [ FILTER, [
         [ &&, 
           [ >=, "age", "30" ],
           [ IN, "state", "NY", "CT", "NJ"]],
         [ TABLE, "customer" ]
      [ TABLE, "orders" ]]]]]
Is it more verbose, yes. But much easier to compose, parse and machine transform by software (code is data). Also, makes you think in terms of relational operations/expressions versus SQL the language.

Actually, some work on this has been done for CHICKEN Scheme in extensions "ssql" and "ssql-postgresql"[0]. I'm not too familiar with these particular eggs, but the flavor might be conveyed by this example:

    (ssql->sql #f '(select (columns (col actors name) 
                                    (count (col roles id)))
                     (from roles actors)
                     (where (= (col roles actor_id) (col actors id)))
                     (group (col actors name))))
translates to "SELECT actors.name, COUNT(roles.id) FROM roles, actors WHERE (roles.actor_id = actors.id) GROUP BY actors.name"

In the above sexpr, the SQL query is expressed as a Scheme/Lisp hierarchical list, which I think is a quite elegant notation. I'm sure other Scheme implementations provide variations on this theme of SQL generation. Using some macro magic, there would certainly be a lot of ways to organize it and probably it's not even all that hard to do.

[0] documentation at: http://wiki.call-cc.org/eggref/4/ssql

It's def a step in the right direction. At least in terms of letting you mechanically transform queries.

The downside (IMO) is that it's still reflecting SQL the language ... not the relational model.

You should learn SQL and understand relational databases. But using an abstraction (ORM) to cover 80% of the use cases is much better than writing tons of boilerplate SQL. http://java.dzone.com/articles/martin-fowler-orm-hate

Thank you. I searched to see if his article had been mentioned, and glad to find you did. The link he gives to Ted Neward's post is gone, but I found it in the Wayback Machine: https://web.archive.org/web/20141205230114/http://blogs.tedn...

Seems like this thread has gone down a very old, well worn rabbit hole.

In a compiled language like C#, "wrapping" SQL is a good idea, at the very least to catch bugs at compile time rather than runtime. Also makes it easier to switch from one DB provider to another if you need to.

You are less likely to suffer from SQL injection if you use at least some kind of wrapper, although a minimal wrapper that takes a SQL with placeholders and arguments would do for that.

My preferred way to wrap is to use EF with a dash of Linq, which is excellent and yields great readability and flexibility.

There is no way I am going to be willingly writing SQL statements in production code (unless I am forced to do so by some legacy problem). Worse case if the SQL is so complex that it is too much for Linq, then write a stored procedure.

Back in 2002 I had to write hand crafted SQL before mainstream ORMs and at the time I thought it sucked bad! I was looking for ways to generate the code from schema but didn't have time to go and back-fix all the legacy code ad-hoc SQL statments!

You are less likely to suffer from SQL injection if you use at least some kind of wrapper

Pretty much all of our coding guidelines on my team are just guidelines. The one absolute law is that all data going to the DB must be paramaterized, nothing goes in as string substitutions.

But virtually all of our DB access (like, at least 99%) is by stored proc anyway. We're just barely able to keep up with performance requirements by tuning things just right, and in my experience, EF isn't able to generate queries that are as efficient as we can by hand with some fiddling. I know that sounds like "I write in assembler, 'cause no compiler is as good as me", but that is what the reality seems like.

Out of interest, are there more types of SQL injection, or are they all a case of inserting a semicolon, and running a second query afterwards to get the information you want?

Surely that is something that could be disabled at the driver level for security? Make sure it only send one statement at a time.

Correct me if I have missed something.

That's what it boils down to, but limiting execution to a single command would only half fix the issue anyway. It would still be possible to put an early termination into a string and comment out everything else in the command, thereby completely changing what it meant.

Passing data properly through parameters isn't really very hard, there's no point in looking for incomplete work-arounds.

I personally prefer SQL + Dapper instead of ORM stuff if it is Linq-level SQL, and sprocs w/ Dapper otherwise.

The real issue isn't "newtech koolaid" it is the definition of the problem.

I find that people that don't understand the problem like MongoDB for the ability to be "flexible" in designing and modifying down the road. Though I like to argue you have flexibility in SQL also.

I agree, I've read a few 'screw Mongo, I'm going to SQL' type blogs, it really seems like they either didn't grasp how to architect Mongo correctly and tried to do things that don't work well with it, or they had a problem that was better solved by SQL in the first place.

SQL does many things very well that noSQL stuff won't. Also vice-versa.

This is really what he's driving at with the newtech koolaid. People jumped on MongoDB etc. bandwagon because they drank the koolaid of "Schema less is best!" and "It's so fast and shiny!" without really understanding the problem, let alone the right solution.

Unfortunately this seems to be a regular pattern in tech, a mistake (mostly) repeated by the young fresh developers coming in. Usually after you get burned once or twice you stop making that mistake.

I'm not a Comp Sci grad so can't speak from experience, but most of the syllabuses I've seen seem to focus on a very short term, single semester (or at most year) approach to projects. It would be good if they could start with a project and be required to re-write, adapt or improve some kind of project from beginning of the college time to the end. It would be a good introduction to technical debt, amongst other problems. (Heck if you really want to go for broke, maybe even mix things up a little, next semester you'll be working with someone elses code! That would be painful but might teach people the value of good comments, tests etc.)

> This is really what he's driving at with the newtech koolaid. People jumped on MongoDB etc. bandwagon because they drank the koolaid of "Schema less is best!" and "It's so fast and shiny!" without really understanding the problem, let alone the right solution.

Exactly. People report "X makes Y so easy!" and because of the breathy excited "look what I did in just an hour" blog posts the naive believe that using X for Y or similar things negates the need to properly design and think ahead.

If you take that approach then even if you are accidentally using the right tool doe the job, you may well be using it very wrongly and creating an unmaintainable and/or inefficient mess that you'll have to fix later.

A few days coding can save you a good hour or three of design and thinking beforehand.

I guess but college isn't a vocational school. The point is to understand the CS concepts and not necessarily to know all the engineering kind of stuff you'd need in a "real-world" project (which often won't be "challenging" in the CS department).

I honestly don't think I've ever seen a valid use case for Mongo. If you're going to query your data, you have to know what fields you're looking for, right? So why not create a schema that has those fields?

If you're doing a "tracer bullet"-style method of development where the requirements aren't quite set in stone, MongoDB and other schemaless databases can really save time when it comes to adding basic functionality quickly.

Another use case: if you have an ad-hoc group of developers working on a project in a decentralized fashion, they can each work on local codebases of the project without needing an up-to-date setup script or migration scripts.

I'd maybe even argue that a lot of these "goodbye MongoDB" developer blog posts are sort of unnecessary. MongoDB allows for fast deployment and the convenience of a flexible database at the cost of speed, so eventual migration to a more solid relational database could/should be part of a long-term plan when devs choose to use MongoDB. It's all about using the right tool for the job.

You can know a field or fields of the data while leaving the possibility of extra fields that you don't need to know. In practice if you're making a web application and dealing with JSON, it's typically very easy to work with JSON-based databases instead of using serialization functions. You can enforce the schema with various tools at the application level when needed.

MongoDB also has some performance benefits over many traditional databases, and has "native" support for binary data.

I personally prefer CouchDB over MongoDB for most application that are well suited to use a "NoSQL" database, but MongoDB is sometimes decent too.

Mongo doesn't stop you from using a schema, but you'll have to enforce it in your application code rather than the database itself.

> but you'll have to enforce it in your application code rather than the database itself.

You mean in every one of your application's code. You are using a schema no matter what, but it's a missing feature in mongo.

the only valid use case is when your data is so unstructured so that you don't care what the fields in it are and you treat it like a black box. So maybe some kind of user-generated markup that you just shift in and out.

those kinds of applications are not usually kinds of problems we have to solve.

I couldn't agree more with you. I love SQL for when I have a concrete data model, when I don't I love NoSQL. I could use both if I really wanted to, but you need to know when to force a square peg into a circle. Mongo has a good how to on it here. http://blog.mongodb.org/post/72874267152/transitioning-from-...

SQL databases had their imminent death coming. They were stuck. However, PostgreSQL is under more active development than most other databases, SQL or NoSQL. It is new and cool technology.

In the end, whether you speak SQL or use NoSQL construct matters as much as whether you write a program in C or Pascal. You can have endless debates about form and function and you might want to see and identify with a winner, but in the end whether the tools and the community surrounding a technology fits people's needs is the only thing that really matters.

Whenever I see someone shoehorning something into MongoDB or something else I ask the reasoning for not using a relational database. Its always something along the lines of "We don't do that now" or "schemaless" (which as the article points out means more work for the dev) something similar, with the "fast" added somewhere into the description.

The crazy thing is most of them are working on a single node, so no horizontal scaling. And they seem to think non-relational is the new way to do things.

If the API surrounding the ORM is well-designed, using an SQL wrapper like this eliminates the possibility of SQL injection. There are arguments for not always doing things the same way.

Using sprocs or parameterized queries also prevents SQL injection.

The advantage of this is that you can pass the relational dataset that's a result of this code around to other methods. Those methods can add additional filters, different ordering, etc. You can't do that with raw sql strings.

Agreed. And one might note that there's a really good reason Oracle has dominated for decades, and Larry Ellison (who read the SQL research paper, and formed a company to implement it nearing 40 years ago) is a billionaire.

Oracle's success is due to their aggressive expansion into business applications. Their database products have been taking a hammering as enterprises moved to data lakes and horizontally scaled databases like Casssandra.

Your argument would've made sense a few years ago but not now.

i think there is a line to be drawn.. If the ORM is saving you no effort, I fully agree. However, I suspect most ORMs in the short term boost early productivity quite a bit.

It's like writing an app from scratch in assembly, versus C.. The compiler will do quite a bit of the dirty work for you, and if you look at it in a disassembler no doubt you could find tons of improvements to be made (errmmm depending on the compiler I suppose). Doesn't mean either one is the wrong way to go about it. Now if all you do is stuff ASM code in your C app, yeah.. Why bother?

I agree with the commenters who argue for the value in a library that wraps strings that happen to be SQL statements, but none that I saw touched on this point: I think most users of such a library will have to know SQL. These libraries seem to have a one-to-one matching between concepts in them, and concepts in SQL. Hence, the point is not protecting developers from learning SQL (because you can't, for these needs), but in finding the least painful way of integrating SQL into the application logic.

Right. If you write SQL directly, you can also use commands such as EXPLAIN to see what the lookup strategy will be. If there's a full table scan of a large table involved, maybe you need a different query or a new index. If the SQL generation is hidden by some library, you can't do that.

Wikipedia runs on MySQL. They have replicated read-only copies of the database and ngnix caches which handle most read-type operations. Is your site busier than Wikipedia?

Wait... does preferring to write SQL (and knowing how) make me old now?


I think the problem is that NoSQL has been the cool new thing, which causes people to use it without really understanding why it is better/worse than other solutions.

Tools like MongoDB can do things that are extremely difficult or impossible with MySQL/PostgreSQL and they are a great choice for those situations. Using it simply out of laziness or misunderstanding, though, is probably going to create problems later.

I sincerely still prefer mongodb syntax because:

- Fits well with a programming language; {a: data.x} is better than 'WHERE A="' + data.x + '"' (sanitize?) or similar which are harder to read. SQL queries are good for direct input, mongodb queries styles are better to be used with a programming language;

If you're building queries in ANY language by concatenating strings you're doing it wrong.

In a sane language that query might look something like

cursor.execute("select field from tbl where long_obtuse_column_name=:foo and status != :status", foo=query, status='open')

All parameters are properly escaped by the api of course, so even if foo is "'; drop table audit; " you don't have any problems.

I've said similar because of this, so your comment is included in my statement.

That is not how you do parameterized queries. With any civilized database library, it would be something along the lines of `"WHERE a = ? AND b = ?", data.x, data.y`, so that the parameters are like function parameters.

I've said similar because of this, so your comment is included in my statement. That doesn't change much

Your users don't give two craps about the syntax. The technology choice is one of application requirements. If you chose Mongo over e.g. a SQL DB because you like your code to look pretty then I have to admit that I wouldn't trust your software at all.

>Your users don't give two craps about the syntax.

Of course not, but your developers yes. Use a complex syntax and more bugs will appear. Do you care about bugs? No, but your users will.

> If you chose Mongo over e.g. a SQL DB because you like your code to look pretty then I have to admit that I wouldn't trust your software at all.

This is not my only reason.

Your comment made me realize why I prefer to work in ORMs instead of raw SQL, even though I'm frequently frustrated by their limitations - it all comes down to composability.

The mongodb syntax looks like it supports composing statements much more readily than SQL. It makes me wonder if there's been any serious work done making a query language that's fully as general as SQL, but is designed to be safely and easily composable so that you can build queries up from parts.

Anyone know of such a thing? The mongodb approach of query-as-data-structure is as close as I've seen.

> The mongodb syntax looks like it supports composing statements much more readily than SQL. It makes me wonder if there's been any serious work done making a query language that's fully as general as SQL, but is designed to be safely and easily composable so that you can build queries up from parts.

Haven't you noticed that it's exactly what the Sequel snippet does? It's also what SQLAlchemy's Expression Language does. It's a reification of SQL queries in the host language, so it can/should give pretty much all the tools of SQL.

In some ways it does, but also remember it's a MUCH less rich query language. It's a lot easier to make a simple api mapping when you don't have such exotic things as "joins".

The ORMs I've used have always tried to map tables to classes (or whatever similar language structure is available). That works well almost all the time, but I get tripped up when the queries get to be more complex or specialized to a particular use.

I haven't used SQLAlchemy's expression language, but glancing over it now it looks like it might be just what I was looking for (at least for python), so thanks for the pointer.

If you really care about your data and you don't think a database should just be a bag where you put stuff but something that can be trusted , and efficient when it comes to storing data then you wouldn't be using MongoDB.

I agree, using ORMs and such hide the real SQL and make it much harder to optimize by using explains etc.

Just have Rails dump the query and stick "EXPLAIN" before it.

What I'd like to see is a universal SQL that can be translated to whatever dialect of SQL my current database is using. That way I won't have to relearn SQL every time I start a project with a different database engine.

You could use Teiid (https://github.com/teiid/teiid). The SQL dialect is similar to Postgres, and it has built-in translators to handle all of the popular relational databases (Postgres, Oracle, MySQL). Added bonus is that even some NoSQL databases are supported, and you can do things like join a table from a MySQL database against a collection in MongoDB. Full CRUD is supported for most translators.

If you're using Java (JDBC), there's an embedded kit so you don't have to run a standalone server. If you're not using Java, you can run the standalone server which also emulates the Postgres ODBC protocol so you can (for example) use `psql` to connect and run queries.

Very interesting, thanks!

The fact that you don't know what an ORM is is what makes me cry.

Stuart Sierra (@stuartsierra): "SQL is already a DSL"

This is a `negate` method in Arel (by Rails):

    class User < ActiveRecord::Base
      def self.negate(scope)
        where_values = scope.where_values

        scope.except(:where).where where_values.reduce(:and).not

      def self.suspended
        where.not(confirmed: true).where(arel_table[:created_at].lt 2.weeks.ago)

    unsuspended_users = User.negate User.suspended
Do this in SQL. Composability is the real boost, and you have composabiliy when you don't have to build a string in order to interact with the db.

unfortunately, where_values is also a private / unofficial API with no stability guarantees, so it can change at any time =/

True, but that's a problem related to Arel (which is poorly documented too... I think Rails is neglecting one of its greatest features).


There's quite a bit more I can do with Arel than I can do with raw SQL, especially if I'm already in Ruby. So... no thanks. :)

Instead of making a statement like that... educate me. I'm very happy to learn about new technologies. What extra, interesting things can you do with Arel that aren't possible in SQL?

Arel is just a syntax tree for SQL. It doesn't strictly give you any more power than SQL - I think your question is a little bit disingenuous, as you knew this already - but code manipulating syntax trees is better for query dynamism than code manipulating strings that don't have further typing.

I think your question is a little bit disingenuous, as you knew this already

No, it was quite genuine. I am not deeply familiar with Arel.


Not this argument again. I'm all for learning SQL (just like one should always learn foundational concepts), but for some reason there's this notion popping up that Real Programmers use Raw SQL, just like 20 years ago Real Programmers used Assembly. For the love of God, stop hating on basic software engineering principles like code-reuse and allow us to move up the abstraction hierarchy like we do in literally every other facet of software development. Here's a comment I made on the issue of ORMs vs Raw SQL a while back that goes into more detail: https://news.ycombinator.com/item?id=8134205

In the past year I've also grown very fond of type safety within my ORM coding as well, something that's hard to achieve at compile time with just raw SQL.

This post reflects an interesting technical narrative of companies switching off MongoDB to more traditional relational databases as they grow.

Importantly, I don't think that's an indictment of MongoDB. Instead, it highlights the key advantages of NoSQL: ease of use and rapid iteration.

When you're first working on a project, MongoDB is very easy to slap in. You don't even have to create tables/collections. As you iterate, you don't have to constantly be updating schemas as your product direction shifts. Thus, MongoDB is perfect for early startups.

As a company scales, of course it's warts start to show. The very attributes (like schemaless storage) which were an asset when just a few developers were hacking on the project become a drawback as the company grows in size. Reliability becomes more of a concern.

Naturally, people shift to a more traditional database. Personally, I'm almost ready to switch some of our core functionality from MongoDB to Postgres. This isn't an indictment of Mongo though.

Really, you can't spend the 10 minutes designing a table structure in a SQL database? And now you have to spend months re-inventing the wheel because you wanted an easy out?

This post reflects on developers being lazy, instead of doing it right the first time around. Oh no, you have to log in to the db and run a CREATE TABLE statement every few months when you need to scale. Cry some more.

And even then, 'lazy' is subjective. I don't find SQL that hard to implement. There's a library for practically every language you can think of. What's your excuse?

Actually, I think that's a great way to think about it: NoSQL is the "dynamic typing" of the database world.

Put another way, it's like "what? you couldn't spend 10 minutes declaring types everywhere?" - yeah, it's less robust, yet dynamically typed languages remain popular.

My excuse: When I'm just past the mock stage, and still playing with what UI functionality should be, sometimes I just want to get some JSON persisted. I'm changing the shape of the data a lot, discovering the schema as I build. The persistence is frankly a technical nuisance I wish I didn't have to think about.

Sometimes, the result is "good enough" and I don't need to go through the ceremony of glueing in an ORM.

I read this article and was just like "ya. types are important" and reflected on how I used to love Ruby, but now swear against it for serious projects. Glad to see I'm not the only one who made the analogy in their heads.

Right now though, I have to write a script to pull some crap out of an Excel file and into my database. I use Scala with the Play framework, but what a nightmare getting typed information out of an Excel sheet is. No thanks, I'll just pull it out magically with Ruby and construct the JSON requests to my real server to create the data. It's a smart business decision and I would defend it all day. I'm sure there's corollaries for "dynamic databases" such as MongoDB.

To be clear though, I'm with the grandparent. I really think decisions for MongoDB are made out of a combination of laziness and eagerness most of the time. I mean honestly, if your data is going to be relatable, why would you employ a scheme that forces you to make those relationships manually? So you go to a relational databases which ironically lacks relationships, but hey there's plenty of support and libraries that will do it for you. But really if you had to ask me my personal opinion? Graph databases where relationships are FIRST CLASS CITIZENS are typically the best for serious projects containing lots of relationships.

The difference there is that you enter types for thousands of lines of code. OTOH a schema is changed relatively infrequently. Maybe a few times a day during really experimental stuff.

Your given excuse makes you sound more like a naive amateur instead of a pragmatic architect

Just in case GP cares: To me it did not sound like this at all. I think making prototypes and starting to build the thing is the key to getting to a good architecture.

I guarantee you're not designing your databases correctly if they only take you 10 minutes and only a CREATE TABLE statement every few months.

You don't even need to design the schema, just add the bits you need as you go (updating the schema takes about 20 seconds with rails for instance).

I think you're understating the cost of updating the structure of an in-use database.

I'm not even sure if you're referring to an rdbms or Mongo, which is kind of the other side there. I _think_ you're referring to an rdbms. But I also think people seriously understate the cost of changing the way your data is stored in a large in-use MongoDb too.

Mostly because an RDBMS will ask you to explain, explicitly, in complete detail, how you propose to consolidate the entire schema and existing data with the new changes without introducing paradoxes or inconsistencies.

Mongo won't. It will take you at your word. Even if that word is wrong.

Definitely not about spending 10 minutes on SQL Schema. It is all about maintenance. MongoDB is right use case, if you know what it will be there in Table.

In some cases you don't even have to log into anything, just type one line to create migration and another line to run it.

No, this is bad. You shouldn't design a system that won't work from the outset. "Plan one to throw away" is about budgeting time, not about knowingly making big technical compromises when you write the first version.

But these systems do work - it's just that they don't necessarily scal as well as others.

It's silly to expect that every single software project requires the same level of correctness or maintainability. A rapidly-developed speculative SaaS for example may be able to trade long-term maintainability or scalability for faster development time (often more important if you have no idea whether or not your idea will even work).

Not quite--plan to throw away the internals, not the interface.

This lets you do things like writing shitty hyperlinear naive solutions to get all the pieces of a system in place, and then going back and optimizing each in turn. You don't spend a lot of time fussing over little details, and instead make rapid progress.

If you don't spend time on the interfaces between the pieces, though, you're absolutely screwed.

I'll say I agree with this corollary: the schema is the interface.

I had to look the quote up, because it didn't sound quite right.

"Plan to throw one away; you will anyway."

The quote is Fred Brooks from his book "Mythical Man Month." It seems to me the author's post reflects that quote. Granted they didn't throw the entirety of every application out, but arguably it's an entirely different application now. I would even wager there's months of refactorings ahead of them as they take advantage of Postgresql's features.

I think the author of the post has the right approach. They tried hard to build something that would work right, and since it didn't, they rewrote most of it. They had enough budget and willingness to rewrite it. The author learned from their mistakes, and probably won't be using MongoDB on a future project.

The author of the comment I'm replying to seems to advocate a different approach where you choose tech and techniques that are good for short-term but maybe not long-term. This way you have planned technical debt plus unplanned technical debt. Also the comment author seems to be missing one of the post author's main points, which is that MongoDB's schemalessness is awkward even in the early stages of the app, and this is worked around by a less-than-ideal technique of moving the schema from the database layer to the application layer.

And what most people here have missed is that PGSQL is working for them 'because' they now know what their data looks like, MongoDB got them to that point. I can't speak for the OP but some of his examples speak to a lack of experience, which, had they started with PGSQL would likely have resulted in an intractable schema and we would now be reading a 'How MongoDB freed us' post.

The author said they used Mongoose to define the schema, and since they needed to define the schema, they'd rather just do it in the database. So I don't think they'd have been limited by using SQL from the start. Either they learned how to evolve a schema or they didn't have a schema that evolves well even when they were using MongoDB.

I don't know about this, seems like a lot of technical debt to me.

Choosing mySQL over mongoDB to start with really can't add that much time to a project, especially if you're doing it in a language like Ruby. A couple hours extra here seems very worth it to avoid a multi-week migration process a couple years down the road.

OP will be back in three years talking about migrating off of Ruby...if they make it that far :-)

Unlikely, we've been running on Ruby for about 5 years and it's the one thing we feel safe relying on for our serious internet business. Actually, our entire setup is Ruby (safe from some shell scripts here and there).

> Another way of handling this is defining a schema in your models. For example, Mongoid, a popular MongoDB ODM for Ruby, lets you do just that. However, when defining a schema using such tools one should wonder why they aren’t defining the schema in the database itself.

Bah. It's like they didn't know that schema-free data stores mean "there is no schema; different objects may have different fields". This is the whole point of MongoDB: you assume the responsibility of managing the schema. That's a "feature": you get greater flexibility by assuming more responsibility. Whether it's a useful feature, I won't say.

Their second reason is much more valid: MongoDB is not consistent/durable, all that. It's good for a cache, but not for long term data.

Here's a third reason I'd give against it as your primary data store: it's expensive. You have to keep your entire dataset in RAM, but that's not always necessary. My favorite example is from Foursquare. They had every single check-in ever in MongoDB in RAM. That's absolutely unnecessary, and quite silly to do so. Old check-ins are archived data. You don't need them. No user ever wants to know when/where they checked in three years ago. This is why at the time they were paying for 68 GB RAM Amazon boxes instead of 4-8 GB boxes. (I have no idea what they do now. I remember chatting with them on HN after a catastrophic out of memory failure when they filled up the entire 68 GB's).

We have a dataset much larger than RAM (600gb vs 60gb RAM). It's the working set that really matters. Accessing data outside the working set can be slow - unless you have SSDs :)

Regarding consistent/durable: during the past four years, we've not had any problems on this front that weren't caused by us in. We've had an issue that was a misconfiguration on our part where we allowed writes to a server because we pulled it out of a replica set. We also ran out of space on the logging volume once and that caused downtime - but, we didn't have log rotation or anything setup.

In general, we've found the failover very reliable and new primaries have come online without any problems.

That said, schemaless is both a blessing and a curse. Now that Postgres and MySQL have online alter built-in I'd possibly choose one of them if we were starting everything again.

Well, SSD's aren't nearly as fast as RAM, even today.

I think we are in agreement about the working data vs all data. I am saying that in most applications your working dataset is much smaller than your total data set. So why pay for hardware capable of holding your entire dataset in RAM when you don't need it?

I am surprised you are able to do this with Mongo. Last I checked, it simply did not handle this case, and started failing miserably if it was not able to fit all data into RAM.

Re: durability: I am not talking about the server going down, coming back up. I am talking about whether there is an fsync() when writing data. Set up a test case where you are writing data very rapidly to MongoDB, then pull the plug on the box it's running on. It'll come back up, but the data it told you it just wrote won't be there because it didn't checkpoint. Did you check that all your writes succeeded when you had node failures in your cluster? Most applications don't have the machinery to do this because generally the state necessary to check this is stored in the database, yet it's the database you are testing. The only way to test this is to also write logs (also atomically), and then verify DB data against logs. Or, just use a database that guarantees durability.

Re: consistency: MongoDB doesn't support transactions [1]. That's enough to exclude it from a large number of applications. Anything to do with money, for example, is out since you really don't want double spending to be a thing.

[1] http://docs.mongodb.org/manual/core/write-operations-atomici...

Fair points - we don't allow failed servers back into replica sets and rely on writing to multiple nodes instead of the disk as source of truth.

This may not suit everyone and absolutely does not suit financial transactions. You can bend Mongo to do it using additional collections and money movement logs... but, why bother when it's simpler to use MySQL/Postgres?

I'm not a fan of MongoDB, but you don't know what you're talking about.

MongoDB is durable. While it doesn't quite support SQL transactions, it is durable. The data is journaled before being confirmed, and once confirmed will be written to disk. It can be consistent, but this sort of breaks the whole idea of scaling and distributing the load in Mongo, or causes massive performance problems, so that is something to consider when using it. Understanding and adopting eventual consistency is tough, but it's an issue with every distributed database, not just MongoDB.

And you don't "keep your entire data set in memory". You should make sure your indexes fit in memory. Your data can be as large as you like. Most people with terabytes of data in MongoDB don't keep terabytes of RAM in their servers.

Foursquare did NOT keep every single check-in in RAM. They kept an index of them in RAM, sure. But the problem was they had a sharded MongoDB deployment and one of the shards became unbalanced and exceeded the available RAM. If that happens (harder to do these days, but not impossible), it can be very difficult to recover.

MongoDB tries to bridge that gap between NoSQL and SQL. I think the MongoDB folks originally ignored decades of database research when developing MongoDB, but they've been forced to adopt it as the years have passed. Is it an Oracle killer? No. But it can be a useful and productive tool if you understand and apply it appropriately.

Please read the first answer to http://stackoverflow.com/questions/18488209/does-mongodb-jou... before assuming that the MongoDB journal is durable.

It comes down to this: there is no way to tell MongoDB to commit a write and not return until it is durable. You can tell it to fsync() it within at most 33 ms or so from when it returns, but not immediately. There is no control in it to say "fsync() this value because it is important, then return".

That is not as durable as other database engines, which do this based on number of commits that have not been fsynced to the journal (this value can be set to 1 causing every commit to be fsynced).

Consistency is a large topic, and I am sorry I mentioned it vs something more specific, such as:

MongoDB doesn't have transactions, so you can't make multiple updates truly atomic. That's annoying at least, and unusable at most. Two phase commits let you get about 50% there, but with no rollback your application code has to implement rollbacks. In lots of scenarios your application is not smart enough to rollback a "transaction", so your data will be screwed.

Eventual consistency is starting to come under fire precisely because it avoids the hard problem of database design and is offloading it on the application developer, who is typically not as versed in how to create a system where temporary inconsistencies are OK. At best, it means that the developer is now wasting productivity on addressing consistency issues.

You may be right about Foursquare keeping only an index of all the check-ins, not the entire dataset. The main point remains: why keep essentially archived data in RAM? MongoDB does not let you keep a partial index.

MongoDB tries to give up basic datastore guarantees to gain speed and flexibility. Once again, I am not arguing that it's a useless tool. It is useful. As a cache. Or in cases where data loss is acceptable. It is much less useful in cases where dataloss is catastrophic, such as financial transactions. It will also be much more expensive at scale, whereas other datastores optimize for the conditions where RAM is expensive and disk is cheap. You can still run into this case when you are talking about a terabytes to petabytes of data.

> This is the whole point of MongoDB: you assume the responsibility of managing the schema

Who/what is the 'you' there? Don't 'you' have the responsibility of managing the schema either way? It's a question of whether you want to manage the schema through an rdbms, or... just in your application logic, I guess?

Manage as in write all the code that ensures adherence to the schema. If you say that the field called "score" is an int in a schema-ful DB, then insert an array, the DB will throw an error. If you do that in a schema-less DB, it will not unless you add a check yourself. If you are not using some type of unified DB access layer, you must perform this check every time you write a value. You must also perform the check every time you read a value, since someone else might have written something different to the DB while you weren't looking. This you take on doing much more work, yet gain flexibility and possibly some space savings (as in, you are not storing empty cells if you don't want to).

> Manage as in write all the code that ensures adherence to the schema... ...This you take on doing much more work, yet gain flexibility

The flexibility you gain seems to mostly be... the flexibility to choose not to ensure adherence to the schema.

Which may be a valid choice in some cases, but I suspect a lot of people don't quite realize they are making it let alone analyze whether it is the right choice for them, and come to regret it.

Which is pretty much my point. Most people don't realize the hidden cost here. It's the same reason you wouldn't use a dict for everything in Python.

(solely toward your final point) Many RDBMS' will optimize that for you. PostgreSQL has a bitmap structure on each row that indicates which columns are null and which have data in them. In addition, it automatically compresses certain data types. For example, strings that overflow to the point where they need to be stored in a secondary table (called "toast" in pg) will be automatically compressed.

That's pretty cool. I didn't know Postgres actually did this, though I figured it was at least possible.

You are unlikely to gain space savings, as it stores the keys as well as the values to all records.


Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact