Hacker News new | comments | show | ask | jobs | submit login
Goodbye MongoDB, Hello PostgreSQL (olery.com)
802 points by YorickPeterse 1048 days ago | hide | past | web | favorite | 374 comments



As a greying developer I am most amused by people discovering that 'old' technologies like SQL databases work really well.

The only useful piece of advice I can give a younger developer is... be careful when drinking the newtech koolaid.

And one more thing:

    star = Sequel.lit('*')

    User.select(:locale)
      .select_append { count(star).as(:amount) }
      .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
      .group(:locale)
      .order(Sequel.desc(:percentage))
just makes me want to cry. Learn SQL rather than wrapping it.


It's not so much about not wanting to write/understand SQL (both are still very much required), but about composability. If you want to re-use bits of a SQL query written as a string literal your only option is string concatention or using some kind of string builder/template system. In both cases there's little validation of the query's correctness (syntax wise) until you actually run it.

While I agree that many ORMs go too far or even worse, not implement certain powerful features, Sequel (and similar tools I imagine) strikes a nice balance. The particular examples I gave on their own are not super useful, but we already have quite a few queries that are composed/re-used without having to concat strings.

So tl;dr: it's about composability, not being "lazy" or "ignorant" to SQL.


I once agreed with this, but now I don't. I just want to write SQL (dammit!).

I can never, ever remember the intricacies of the Sequel API or any one of these query builder APIs. I am always looking up something that is rather trivial because I am thinking in SQL, the language, and always have to convert back to Ruby or whatever language I am working in.

CTEs and SQL functions in PostgreSQL strike a good balance in the composibility department for me, while still being able to write SQL.


Let's say you have a product search screen in your application. There's a text field for filtering on product title (WHERE title LIKE), one for filtering on UPC (WHERE upc LIKE), a couple range filters for min/max prices (WHERE price <= and/or WHERE price >=), and then on the results screen the user sort on a few different columns (ORDER BY) as well as paginate and set number of results per-page (LIMIT + OFFSET).

How exactly are you going to just write SQL if the actual query statement needs to change based on the user input? Are you going to hardcode every possible permutation, or are you going to use a query builder? Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

edit: The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments. Your ad-hoc query builder implementations with SQL injection vulnerabilities are not relevant to what I'm asking.


How exactly are you going to "just write SQL" if the actual query statement needs to change based on the user input?

How about something like this:

  s = Select.new
  s.add "WHERE title LIKE #{title}" if title
  s.add "WHERE price <= #{price_range}" if price_range
  s.add "LIMIT #{limit}" if limit
  s.add "OFFSET #{offset}" if offset
  s.add "id, blah, boo FROM products"
  s.execute
Note how I deliberately shuffled the order and didn't bother with escaping.

Also note how anyone who knows SQL could immediately work with this, learning curve: 5 seconds.

Why is there no ORM that works like this?


Python SQLAlchemy works a bit like this, if you use it that way:

    query = session.query(Products)

    if limit:
        query = query.limit(limit_val)

    if offset:
        query = query.offset(offset_val)
Sort is a bit tricky, but as I'm a noob at SQL Alchemy, I had some not-so-pretty boilerplate code implemented for decorating queries with custom sorts. But essentially it boiled down to:

    if sort_type == SORT_REVERSE_CONST:
         sort_obj = sort_obj.desc()
    else:
         sort_obj = sort_obj.asc()

    query = query.order_by(sort_obj)
And if you want to search through dynamic tables, change the initial query instantiation to something along these lines:

    query = session.query(retrieve_db_object(table_name))
And to execute:

    return query.all()


Python SQLAlchemy works a bit like this

All ORMs work "a bit like this". I don't want "a bit like this". I want exactly like this.

Because with my proposed interface I could be productive immediately and permanently. I would never have to refer to any documentation. Not once.

With every other ORM this is a pipe dream. Their "fancy" chainable wrapper-APIs are nothing but a ball on a chain.


Well, no, they're more than a "ball on a chain" - they're a tradeoff - like everything in software.

They give you the ability to manipulate the query in interesting ways at any point before you execute it. They let you join different queries together, built by different parts of the system, in a safe way. They let you work with the native language you're working in instead of having to construct clauses in a foreign language, using strings. They let you post optimise your loading strategies.

You might want "exactly like that", but it's not going to be as flexible as a chainable wrapper system.

Hell, I could probably give you pretty close to that in python with a little work, but it's not something I'd want to use myself.

I get it. I like working in SQL too. I know it really really well, and I cringe when I see developers writing totally sub-optimal code because they don't understand the relational data model.

But there are other ways to do things, and what you're describing doesn't give you much more than raw SQL, so why not just use raw SQL? You've added some syntactic sugar that, in my language (Python), would be a bit of a horror show (strings access local variables implicitly, no thanks). What else do you gain?

This is more verbose, granted:

    q = session.query(Products)
    if title:
        q = q.filter(Products.title.like(title))
    if price_range:
        q = q.filter(Products.price.lte(price_range))
    if offset:
        q = q.offset(offset)
    if limit:
        q = q.limit(limit)
    products = q.all()
But then you get more stuff for free, like drilling down into the other tables:

    p = products[0]
    p.supplier.contracts[0]
But that's rubbish, because you'll be loading in a really inefficient way. That's ok though, tell the system how you're going to want to load the additional data.

    q = q.options(
        joinedload('supplier').
        subqueryload('contracts')
    )
See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

Use whatever makes sense for your use-case. Don't limit yourself because you'd have to learn something new. Honestly, before using SQLAlchemy I mostly felt the same as you do. Many ORMs get in the way, but that's not really a problem intrinsic to ORMs.


>See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

If you are fine with the new debugging this entails. Take the joined load example you gave. Almost every time I encounter SQLAlchemy code, the author didn't understand the lazy joining semantics and every attribute access on each record is emitting a new query and you now have O(n) queries. It's not obvious that is happening until it bites you once you have thousands of records.

For another example, I can count on one hand the number of developers I have encountered that can understand the expiration of the objects in the session or what the engine builder stuff is doing in the boilerplate they paste in to get it to work. It always requires experimentation, a trip back to the SQLAlchemy docs, and then finally logging all SQL queries to see if it's finally emitting the optimal queries with the transaction guarantees you are looking for.

My gripe about SQLAlchemy is that it separates you too much from what is happening underneath.


Yup. I'll give you most of that - there's a pretty big learning curve with SQLAlchemy.

But, taking the lazy joining example. Someone didn't understand how it worked and now you have to fix it, which is probably as simple as changing that loading strategy to something more suitable for the way you're using the data.

Let's rewind to the late 90s. If I wanted two tables of data to write into the page in an interlaced fashion I would look at how wide they were. I'd make a decision about using two queries or using one larger join. The two query approach was sometimes required due to the size of the data, but it meant cryptic output logic to track positions of cursors relative to each other. The single joined query was simpler to deal with, but still required tracking a last_parent_id so you could swap during the interlacing.

Other developers (those same ones that didn't understand lazy joining) would loop over the first query, running extra queries in the loop (I saw this a lot). Same bad performance as the lazy join.

When you discovered this issue in the code it was a total pain to fix. You're talking about rewriting whole load of code to the point of being unrecognisable from the code you started with.

Contrast that with how easy it is to fix in the SQLAlchemy case. I mostly don't worry about loading strategies now until I'm deep in development. Something's running a bit slow, take a look at some logs, tweak a couple of things and you're golden again. That's such a powerful abstraction.

Regarding the config of the engines etc, again, it's something you need to learn. But really, someone's starting an application from scratch and they just want to dump some code in to handle all their db interactions, but they don't want to know how it works? That's on them, either learn it or use the sensible defaults (in, for example, flask-sqlalchemy).

SQLAlchemy ORM separates you from what's underneath, but it's an ORM, that's kind of the point. If you need to be closer to the metal, use SQLAlchemy Core.


Good news! Almost every language offers some sort of string concatenation with no protection whatsoever against injection, which seems to be what you're asking for.


His proposal is not string concatenation, and it is not vulnerable to injection.


"WHERE price <= #{price_range}" looks like raw interpolation to me. How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?


How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

It was just a mockup. But you are right, in reality it would end up looking more like this (and use custom interpolation for escaping):

   s.add "WHERE foo > $(bar)"
Likewise a smart syntax for clause combining (AND/OR) and some kind of nesting would probably be needed.

I believe both of these problems should be solvable without compromising the simplicity of the approach.


You still need to supply the local variables to your ORM-like so that it can perform interpolation into the string (a feature which is actually best done by the database driver, NOT your ORM-like). Perhaps we can suggest this syntax:

s.add "WHERE foo > ?", bar

Or even this:

s.Where "foo > ?", bar

At that point, you've reinvented ActiveRecord or hundreds of other query builders, it also avoids you having to remember the ordering rules of sql as to which part of the query must be built first. There's a reason these query builders have converged on similar syntax.

I see where you're coming from - you don't want to learn two sets of syntax, but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL while avoiding sqli and statement order issues, they also usually have an sql() option which lets you just send SQL if you wish for a complex query.

There are very good reasons people keep inventing layers over SQL:

They help avoid sqli

They centralise db access so you can optimise in one place by say adding caching layers to queries

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to


"You still need to supply the local variables to your ORM"

That can be done automatically in many languages. At least in the sense you can do:

"where value=$(bar)"

and not:

"where value = ?", bar

OpenACS was an old web framework for the Tcl language, but it even had that 20 years ago!. The company backing it failed in the .com era, and the lack of types was a turnoff for many - but it made SQL much much more readable. And of course the queries were turned into prepared statements. A query looked like this:

select foo from table where age>:min_age and name != :user_name

Things like this can also be done in node.js (disclaimer: I wrote the blog post below):

http://ivc.com/blog/better-sql-strings-in-node-js/


Perhaps we can suggest this syntax: s.add "WHERE foo > ?", bar

I'd say stick with keeping it a string and use:

  s.add "WHERE foo > $(bar)"
but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL

And that's the problem. Everyone "echos" SQL, just slightly differently, each with their own idiosyncraticies.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

It's much easier for humans to memorize a slight SQL dialect than a different god-awful, nested ORM API boilerplate for every language they're working with.

I can write this without thinking, for queries of almost any complexity:

  s.add "SELECT id from users WHERE"
  s.add "name == $(name)" unless name.nil?
  s.add "AND age > $(age)" unless age.nil?
  s.add "GROUP BY age"
A parser/query-builder doesn't need amazing smarts to infer exactly what I want from this sloppy syntax; e.g. drop the "AND" if the first condition was false and present me with meaningful error messages when something is ambiguous.

Ask me to formulate the same query in any ORM and I'll have to refer to the documentation. And almost every time I'll waste a stupendous amount of time on getting some irrelevant, syntactic detail just right. Only because the ORM insists on an autistic API.

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

After having worked with Hibernate, SQLAlchemy (the one-eyed amongst the blind), ActiveRecord, DataMapper, Sequel and a few others that I don't even remember, I conclude with passion: They have it backwards.

It's not the vagaries of SQL that I don't want to deal with. It's the vagaries of the ORM du jour that I don't want to deal with.


If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

They usually do more. SQL is not very pretty or friendly IMO, ORMs clean it up a bit (for example allowing arbitrary ordering, sqli, construction of multiple queries from one base). The example I gave was an arel query - very similar to yours if you move the WHERE out into the function name.

Your ORM proposal above looks fine (and very similar to existing ones but with more strings), but personally I find it uglier than the many which are available, and you haven't actually tackled the protection against sqli or explained how that would work. Normally you'd pass that through to the db driver as a $n param, rather than trying to escape a string at the ORM level. If your ORM is going to infer details from the sql, it basically has to parse that sql first and handle all the same things that current ORMs do by using named functions to add SQL - you'd find the same issues with consistency and edge cases I imagine, and of course have to be very careful about parameters and sqli.

So if you don't like ORMs, don't use them, but you do have to be sure that your own ad-hoc home-grown ORM that you're writing instead covers all the bases current ones do. I prefer the syntax of the ones you're denigrating for simple queries, even if for complex queries it might be better to write sql directly (via the functions they provide), but would be interested to see it in action - when are you going to start :)


Or you could just use Sequel and not have to build up your own library.


It's pseudocode for an implementation that doesn't exist. There is nothing in his proposal that requires this hypothetical implementation to copy the value of the price_range variable into the string at all, much less unescaped.

The strings don't even have to be sent to the DB at all. You've utterly missed his point. He wants the language to be intelligent about what the SQL means and do the right thing.

It's not raw SQL, it's an abstraction.


The psuedocode in question happens to be syntactically valid Ruby. If the goal was to demonstrate how simple query generation could be without abstraction layers like Sequel, it is a valid criticism to note that the example given is eliding a necessary feature. Especially since helping avoid injections is one of the primary advantages of such an abstraction layer.

The original argument was that if one could "just write SQL" one wouldn't need to refer to documentation to become productive, "not once". I don't personally think it's practical to use any database library without reading the documentation, but certainly the moment you get away from "the SQL I type in goes straight to the database" you're going to need documentation to tell you what's going to change.

What's being described here sounds like a library that parses SQL fragments, combines them into whole queries, and sends them off to the database. To implement this this without resorting to simple string concatenation needs an intermediate representation of SQL language components. That intermediate representation is going to look a lot like SQLAlchemy, or Sequel, or Arel. That means that instead of "just writing SQL" you're actually adding an extra layer of abstraction and another chunk of documentation to read.

All that said, the "just use SQL" approach is already fairly well served. Most languages have mature bindings to specific databases as well as at least one ODBC-style API to simplify access to those bindings. Nearly all of these have support for client-prepared statements, which handles the interpolation problem. If you want to build more complex queries on the fly, you can use string concatenation or CTEs or decide to use a SQLAlchemy/Sequel/Arel-type library for just that one case.

And, honestly, you don't need to give up SQL to make use of Sequel. I personally really dislike the query composition API demonstrated at the top of this thread, but you can do useful things with Sequel without it:

    db = Sequel.sqlite
    db.execute "create table foos (id int not null, name text)"
    db.execute "insert into foos (id, name) values (1, 'bar')"
    # or
    db[:foos].insert(id: 2, name: "baz")
    db["select * from foos where id = ?", 1].first # ==> {:id=>1, :name=>"bar"}
    # or
    db[:foos]["id = ?", 2] # ==> {:id=>2, :name=>"baz"}
No need to go any further than raw SQL, but increasing levels of abstraction available if you feel like it.


> The psuedocode in question happens to be syntactically valid Ruby.

So is everything else anyone types, code-like or not. He even said "Note how I deliberately shuffled the order and didn't bother with escaping.".

The response was flippant, intelligence-insulting, and obviously the result of failing to read thoroughly.

And speaking of intelligence-insulting, we all know you can run raw SQL through Sequel.

You're not having a useful dialogue, you're being combative, like the person I initially replied to.


I said the code snippet looked like raw interpolation to me, and I asked how it could be made not vulnerable to injection. It was an honest observation and a genuine question. No flippancy was involved. You're free to think I'm an idiot, but you are the one being insulting and combative here.


As everyone has repeatedly pointed out, once you deal with the obvious issues you quickly end up with something that looks like the gazillions of other ORMs on the market; sorry if I hurt anyone's feelings.


There is. MyBatis, which is a minimal ORM that aims to keep you as close to SQL as possible, has support for dynamic SQL: https://mybatis.github.io/mybatis-3/dynamic-sql.html

Edit [responding to moe, below]: that's a matter of taste. I prefer to have my SQL _outside_ my code. If I have to write a little XML to make it happen, so be it. Additionally, unlike your example, a strict separation of SQL and data ensures SQL injection is not possible. That's also a worthy goal.


MyBatis

Does MyBatis have an API similar to what I outlined above?

Because the page that you link to shows a horror that I can't even begin to describe:

  <select id="findActiveBlogWithTitleLike"
       resultType="Blog">
    SELECT * FROM BLOG
    WHERE state = ‘ACTIVE’
    <if test="title != null">
      AND title like #{title}
    </if>
  </select>
SQL mixed with proprietary XML soup? Talk about combining the worst from all worlds...


Get rid of the XML and you have a good start, I'd say.


> Why is there no ORM that works like this?

Because no one wants to write a SQL fragment parser. You would have write a parser that:

a) Could be started at an arbitrary place in a query and figure out what it needed to do.

b) Would be SQL-dialect aware. (Maybe -- it might be possible to skip this.)

Were it not for (a), I'd imagine you could quickly build this by using some kind of debug feature of the database -- sending it a query to parse and getting back a parse tree.


What you're describing isn't an ORM, it's a SQL query abstraction.

Knex is a very convenient, Promise-oriented query abstraction, and it has "raw" methods when only SQL snippets will do. http://knexjs.org/


Generally I'll go with something like this for these conditional search based queries

    select id, blah, boo
    from products
    where 1
    and (title like :title} or if(:title = '', false, true))
    and (price <= :price_range or if(:price_range = -1, false, true) 
    limit :limit
    offset :offset
    
Its a trade off, bit of extra complexity in the query for simpler code logic. Really, I avoid using sql for these types of "search" / find methods, they're probably going to be slow as hell as soon as you get a sizable data set.

I really despise orm's its essentially taking a black box (sql/database) and throwing it inside a much less tested, less optimized, less documented black box.

For what reason, I have no idea, nobody has ever been able to convince me there is a problem that needs solving. Almost every argument i've heard amounts to problems sql already solved a decade ago. I can only speculate but I'll hazard a guess it comes down to lack of enthusiasm to really learn sql properly (even though the person uses it / debugs it daily)


Microsoft's Entity Framework and LINQ work pretty much like that, only with nice IDE support.

I'll try to write an example like that when I get home.

Of course, they're not "just SQL", and I do understand where the above comments come from.

And yes, sometimes I have to go look up how to do things I'd do trivially on SQL.


> Why is there no ORM that works like this?

Because you're only showing a query builder, the "relational" not the "object mapper".

From an OOD point of view, if the end result of that query will be Product instances, why am I using a Select object to create them and why is it having to do some sort of string parsing to determine the objects I'm loading?


if the end result of that query will be Product instances, why am I using a Select object to create them

Because we can just infer the type to be returned via the FROM-clause of the query.

and why is it having to do some sort of string parsing to determine the objects I'm loading?

Because, to cite the immortal Larry Wall:

   The computer should be doing the hard work.
   That's what it's paid to do, after all.
   -- Larry Wall
It has to do the string parsing, escaping and intelligent validation so I, the human, can write

   s.add "count(*) / sum(count(*)) * 100 as percentage"
instead of

  .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
The difference is that most people can read and write the former without thinking.

The latter version may not look much more complicated at a glance. But as we all know these seemingly trivial fragments, more often than not, take ungodly amounts of very frustrating trial & error before they play along.


You don't have to pass a block to select_append. With Sequel, you can do this:

   # db is a Sequel::Database
   s = db[:foobars]
   s = s.select Sequel.lit("max(id) as best_id")
   s = s.select_append Sequel.lit("count(*) / sum(count(*)) * 100 as percentage")
   s.sql
   # ==> "SELECT max(id) as best_id, count(*) / sum(count(*)) * 100 as percentage FROM `foobars`"
The only real difference here is Sequel.lit, which is needed for security (any secure DB interface needs to somehow be notified that strings are safe to put into a query without escaping). If I'm writing code that leans heavily on Sequel to build queries, I'll make a private method #sql that is an alias for Sequel.lit.

It's entirely feasible to use Sequel this way. I've used it like that in production. I loathe the query building DSL, myself, but it's strictly optional.


The only real difference here is Sequel.lit

Well, no.

I repeat: The real difference is that most people can't write even this utterly trivial snippet without studying the Sequel documentation first.

Now what if I want a WHERE-clause? Do I have to use s.where? Or s.select_append("where ...")? What if I need to combine them with AND/OR?

It's not ok that we have to think about all this boilerplate that has nothing to do with our actual query.

We shouldn't have to translate our perfectly unambiguous request from english (SQL) to klingon (Sequel API) in order to have it processed.


I'm skeptical that it's possible to assemble a working SQL query of much complexity unambiguously from an arbitrarily-ordered set of clauses. CTEs, subselects, and boolean conditions would require a nesting construct. Boolean conditions, order clauses, and some joins need to be assembled in a meaningful order. Eventually you're going to find a level of complexity in the query where it's necessary for a developer to read the documentation, either to figure out the API or to understand the method the library uses to stitch queries together.

There might be a sweet spot yet unreached in terms of allowing developers to work with SQL on top of as minimal a native binding as possible. I'm certainly not claiming that Sequel/SQLAlchemy/etc. is at or near that sweet spot. It might be necessary to use a language with language-level support (or macros) to really reach it.


How often do you have a need for non-engineers to be able to alter your database queries? Why is this a requirement or even desirable? There seem to be few cases where a query change needs to be made in isolation, without having to touch any of the downstream consumers of that data.

So if we remove this non-requirement of "anybody can write it", we're back to my original question: Ok, we the coders know we want a Product instance, but to build it we create a Select instance and then ask a library do so some sort of parsing?

Possibly no ORM exists that meet your criteria because there are issues with the design goals you're trying to achieve?


Have you thought about JSON as a partial query placeholder, and all the native methods available for traversal, merging, plucking, etc.?

For example, here is :

{ type: 'select', columns: [{ expression: 'orgJobs.org_job_code', alias: 'job_code' }], joins: [{ table: 'org_jobs', alias: 'orgJobs' }], limit: 100, offset: 0 }

which yields:

select orgJobs.org_job_code as job_code from org_jobs as orgJobs limit 100 offset 0

and so its easy to do things like:

query.limit = 100;

and of course:

query.columns.push(additionalColumn);

and use something like lodash/underscore (or equivalent for your language) to add collection/array/object helper methods so you never have to write for-loops for things like mapping and property plucking.


Now you have to be aware of SQL escaping to prevent SQL injection attacks.


Everyone who works with SQL should be aware of SQL escaping. There's nothing wrong using a library to do it for you, but what you should have an understanding of what the library is actually doing. Otherwise how could you possibly test it? Just by hoping it's right?


One of the things I like about EntityFramework is that you write LINQ, which odds are these days every C# dev should be familiar with. So, standby a few gotchas (which SQL has as well), the code you write to query an in memory collection is the same code you write to query the database. This also makes it fairly easy to test because you can create in memory representations of your data and check you filters, projections, orders etc.


Stuff like your example is where, in general I prefer a sproc, or something more like the following...

http://www.ivc.com/blog/better-sql-strings-in-io-js-nodejs-p...


there is, it's called anorm. https://www.playframework.com/documentation/2.3.x/ScalaAnorm

You don’t need another DSL to access relational databases SQL is already the best DSL for accessing relational databases. We don’t need to invent something new. Moreover the SQL syntax and features can differ from one database vendor to another.


Micro ORMs usually works very much like this. There is a problem when you want sub records, they are handled differently in each library or not handled at all.


You've formulated the composibility reasonably in that sentence but it's important to remember that is not the only problem the world faces in data manipulation.

I think the irony is that much of the problems of sql are related to a lack of tooling.

The lack of tooling is result of large numbers of devs being distracted by ORM which moves that particular problem into the language ide/editor.

Essentially we lack great *.sql editing, ide, macro, refactoring because of the lack of focus on sql itself.

edit: as an aside, string concatenation in the language of choice is its own road to hell.


> Essentially we lack great *.sql editing, ide, macro, refactoring because of the lack of focus on sql itself.

Except that my other tools are just an editor like emacs and makefiles. So now to be productive I have to have and ide for sql, learn sql, learn macros.

Interesting that SQL came first. If it was so easy and obvious to use it we would not have seen any ORMs by now. But every other project that uses SQL databases ends up with some kind of ORM.

One of the reason is probably because there is an impedance mismatch. Code is already written dealing with objects, functions, pointers, data. But when it talks to the database it has to talk via a command line string interface so to speak.

There is another type of impedance mismatch and that is learning a new language. One can argue SQL is there to keep & manipulate the most precious things you have -- data, so one should spend time learning it well. But the other point is, SQL is there to keep & manipulate the most precious things you -- data, so don't mess with it if you don't know SQL very well. Let someone else learn it (the author of the ORM).

I am not arguing for one point or the other, just kind saying how it is interesting how we have been going in a circle the last 15 years or so.


I think it's a mixed bag... and depends on what you want to accomplish... a lot of the time, I will find myself bypassing an ORM about 20% of the time... the other 80%, the ORM model is simpler.

I find LINQ + Entity Framework to be a blessing in C. Though I use the monad/function syntax not the LINQ query sugar when I use it.


I also use emacs(evil-mode actually). An ide is not much different then a fully configured vim or emacs.

sql-mode for some databases(mysql and oracle iirc) reads the DDL of the tables you are typing about, why is this not better and more general?

Sql is so well established and general as a solution that not learning it is a poor choice.


You mean something like this? https://www.jetbrains.com/dbe/


Yes that was in my mind as I was typing it actually :)

Notice that its only a yearish old I think?


The IntelliJ IDEA Database plugin [1] on which 0xDBE is based on has existed for quite a while. I'm not sure when it become very useful as I've only used it for a few years, but it is quite powerful and useful.

The SQL code completion is quite good (and it caches your entire database schema so it is very quick). That's not particularly special on it's own, but what makes this especially useful is IDEA's "language injection" [2] feature. This allows you to, for example, get completion for SQL when it is contained in some other language, which could be anything (e.g. a Java String, Ruby String, XML, etc). It will also analyze and report errors in this SQL on the fly.

And SQL statements contained in concatenated strings are no longer a problem because you can edit those in a separate editor window where you are only editing the SQL, and it automatically gets placed into the concatenated string.

Not sure if it addresses all of your specific concerns, though.

[1] https://www.jetbrains.com/idea/features/database_tools.html

[2] https://www.jetbrains.com/idea/help/using-language-injection...


I am rather fond of Sqitch and pgTAP as tools for database development.


Sqitch problem domain I handle a different way.

Pgtap is on my list of things to poke around with.

edit: thanks btw


I would write a SQL function that takes all of those as optional parameters and includes a lot of these:

    WHERE (_title    IS NULL OR title LIKE _title)
    AND   (_minPrice IS NULL OR price > _minPrice)
    AND   (_maxPrice IS NULL OR price < _maxPrice)


That ends up being very inefficient as the optimizer will come up with one mediocre plan that works for all those criteria.


This actually answers the question, although I would imagine the ORDER BY handling will look pretty messy (CASE statement perhaps?).

And I don't know what it would look like if the requirements changed to allow ordering by multiple columns with different possible sort directions... that might get back into dynamic SQL using a RETURN QUERY EXECUTE type of thing, which is basically using a query builder in your query language.


Most SQL engines support ordering by column number.

select c1, c2, c3, ... order by 2;

would order by "c2"


> Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

I can't speak specifically to Postgres, but, in my experience, there is nearly always a way to do what you want with bound parameters. No advanced features are required. It often results in poor performance and redundant code that is hard on the eyes. You will get what you deserve, but sometimes you don't have a choice.

Here is an example of the horror:

    WHERE (:1 IS NULL OR :1 = item_id)
      AND item_num BETWEEN NVL(:2,0) AND NVL(:2,9999999999)
      AND item_date BETWEEN NVL(:3,'01-jan-1900') AND NVL(:3,'31-dec-4712')
      AND item_name LIKE NVL(:4,'%')
    ORDER BY
      CASE :5 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END,
      CASE :6 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END

Edit: I suppose you could also parameterize the ascending vs. descending sort, although I have never tried. My first thought is to duplicate each line in the ORDER BY clause: one bind parameter for ASC and another for DESC. Have each CASE return a constant if the bound value is NULL, and then bind NULL for the direction you do not want. Yuck.

I am not advocating any of this but am pointing out that bind parameters can be abused in surprising ways if you are backed into a corner.


You might be over thinking this... something like this might work out just fine. (I wouldn't necessarily do things this way, but rather keep a list of clauses and join them with " AND " to avoid keeping track of the "WHERE"s and "AND"s, but you get the point...)

    sql = 'SELECT * FROM products'
    args = []

    if title:
        sql += ' WHERE title LIKE %?%'
        args.append(title)

    if upc:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' upc LIKE %?%' 
        args.append(upc)

    if price_min:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' price >= ?'
        args.append(price_min)

    if price_max:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' price <= ?'
        args.append(price_max)

    if order_by_name:
        sql += ' ORDER BY name'
    elif order_by_price_low:
        sql += ' ORDER BY price'
    elif order_by_price_high:
        sql += ' ORDER BY price DESC'


    conn.query(sql, args)


This is a query builder. If you're using your programming language to dynamically assemble the SQL statement fragments at runtime, then you're using a query builder regardless of if it is a library with a fancy DSL that assembles an in-memory SQL AST or some kind of ad-hoc string concatenation you rolled yourself like this.

The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments.


Yes, logically, it's a query builder. But it's one that doesn't force any extra dependencies in my project and is very easy to troubleshoot.

Seriously, how is this different from "just use SQL". SQL was never really intended to be used by itself (except for manually typing queries into a console). Yes, what I wrote is code that writes code, but it's far different from a larger ORM or querying framework.


Disregarding my edit and how your query builder is irrelevant to my question anyway, if I change the requirement to allow the user to sort by multiple columns instead of one, that case statement is going to either combinatorially explode or you'll also have to add some ad-hoc SQL escaping to guard against SQL injection since you can't use bound parameters in ORDER BY clauses.

It's easy to see how libraries like Sequel get invented to help people do this, and I don't really see what the problem is when the DSL stays close to SQL semantics.


Dealing with multiple ORDER BY clauses isn't any more difficult, but you'd have to figure out how to get the user input into some kind of list. That will require validation too... Just using an ORM doesn't let you ignore input validation. You wouldn't trust a user to use their own column names in a WHERE clause anymore than you'd trust them in an ORDER BY clause.

Trying to differentiate between a DSL that interfaces with SQL and SQL is nonsensical. SQL is the original DSL.


> SQL was never really intended to be used by itself.

Hmm I thought that is how it was intended to be used? If it didn't, it would look like datalog, lisp or some binary protocol with prefixed lengths and whatnot.

It was intended to be typed in by analysts at a console. Who would then print the report on the dot matrix printer and mail to the headquarters or something of that sort.


That data has to get into the RDMBS somehow...

I guess a better wording may have been to say that SQL was never intended to be used in isolation. It can be, but you've always had interfaces to other languages / platforms. SQL is a data manipulation and retrieval language - the rest of the business logic has to be performed somewhere else (excepting complex stored procedures).


I was told it was intended to be easy to use for non techies at University. It is quite wordy, kind of the opposite of some Perl.


SQL was often combined with Cobol (might still be where they have Cobol).


was? You sound like lots of paces have got rid of it.


I know this isn't central to your point, but I thought I'd say it anyway. These days I have taken to doing:

    sql = "SELECT * FROM products WHERE TRUE"
    ....
    if upc:
        sql += ' AND upc LIKE %?%'
        ....
This saves having to worry about "AND" and "WHERE", and the extra "WHERE TRUE" results in the same execution plan (thus the same performance).


I also add an extra column to my SELECTS:

  SELECT
    users.id,
    users.email,
    1
  FROM
    users
I can just use "--" to comment out any line without having to manage the last comma. Especially useful when building aggregations.


Awesome. That's such a straight-forward solution to one of the crappy edge-cases that SQL is so bad about (ending lists without commas, like JSON requires too, being another).


Now how is this an improvement over the Sequel example?

I understand the impulse to "Just write SQL." But in practice, with all the string concatenation needed to generate actual queries, you can't really see what the SQL will be without running all the code in your head anyway.


In practice, I'd reduce this to a query building function in my own code, so the readability would be much better. The benefit is that you don't need to include another library for it to work. Well, that and it is a strategy that works for all languages, not just Ruby.


And the downside is now you need to spend effort maintaing and debugging your custom query builder.


Yeah but a lot of it is quite generic, so could be reused in a few places and therefore make that extra effort worthwhile. In fact, it might be handy in a few places so you could package it up and then have a single place you have to maintain the query building code for several projects.

Wait a minute...


I think you're over estimating the amount of effort that I spend on this.


I've done this quite a bit as well: string building is pretty well understood, and works remarkably well.


Dynamic prepared(NO ADHOC, THIS IS CRAP) TSQL is a common occurrence for me in production, it is often a better choice that writing a bunch of conditional logic in SQL (to get sargeable queries) or a bunch of really unperformant case statements in where clauses.

I have written TSQL that changes the entire query (which was on the order of 80 joins) dynamically based on about ~20 different parameters, only a few actually passed into the stored procedure, the rest based on settings and configuration. It took a few hours to grok how everything came together at first, but a few tricks that the optimizer will remove means you can add arbitrary conditionals in whatever order you please.

The example you give is VERY trivial for dynamic SQL. postgres - http://www.postgresql.org/docs/9.1/static/ecpg-dynamic.html mssql - https://msdn.microsoft.com/en-us/library/ms188001.aspx


That can be solved by writing a SQL function and the user input values are arguments to the function.


    query = sprintf("select count(*) from data where %s = ?", column)
    results = sql.execute(query, filter_value);
So long as you leave the "value" portion of the query as a '?' (or %s, or whatever your connector requires) and don't use direct user input for the column names, you're still safe from SQL injection.


Even with PL/pgSQL, there is no way to build a SQL statement dynamically _and_ safely -- it's all just string concatenation there, too. However, using `CASE`, `WITH` and `LATERAL` you can have a root `SELECT` that returns one of a few different queryable code paths, and `WITH`/`LATERAL` allow you to reuse definitions.


Use format() for safely constructing dynamic queries.

http://www.postgresql.org/docs/9.4/static/functions-string.h...


You end up doing a lot of string concatenation even with that, though. It's not quite the same thing as having some kind of literal or combinator based setup.

  select(from('tab'), where(and(like('name','Mo%'),not(is('id',NULL)))


You can get a similar syntax with format() by nesting format() calls such as:

  EXECUTE format('SELECT * FROM tab %s', format('WHERE %I IS NOT NULL', 'id'));


Answer : Learn SQL. It is very easy to pass variables.


I'll deal with the problem when I have to.

However, the point is that I'd rather use SQL, it's simple as that.


Orms are not about syntax, they're about things you can't natively "think about" in SQL, like inheritance, composition, references and graph navigation


I prefer them for taking away the grunt work of needing to code updates and inserts for everything.


I'd argue it's less about composability, and more about laziness (in the computational sense). As pointed out elsewhere, if you have all of the information needed to generate a dynamic query, it's often not a huge leap from an ORM to composing strings (especially given how relatively flexible SQL syntax can be).

However, sometimes I want one part of my program to be responsible for one bit of a query, and a separate piece to be responsible for something else. To take a trivial example, say I want one object/function to be responsible for doing the right sorting, another to be responsible for any pagination, and yet another to be able to group results when needed. In that case, having a programmatic abstraction over a query (whether it be an object or a datatype, doesn't matter) can be very useful.


I agree with you example, in that case it is useful. It's just that it is a pain in the ass in so many other cases.

I always wonder why developers (usually young and enthusiastic ones...) pick a set of the most challenging requirements they can think of and then use them to justify the usage of some library or pattern that makes life a nightmare in the simpler cases, which are obviously the majority of the use cases.

It's like using a tank for going to work, because the roads might possibly have been destroyed and there might be enemies shooting at you. Then ending up destroying and having to rebuild your own garage because the tank is too big to fit in. Complete nonsense.


> I agree with you example, in that case it is useful. It's just that it is a pain in the ass in so many other cases.

It's a pain, because what we really want in such cases is a datatype server, not a relational database. For all of SQL's merits, the NoSQL backlash was not without cause. For too long the SQL hammer has been wielded to pound in too many screws.

Of course, programmers being programmers, NoSQL turned into just another hammer used in just as many inappropriate situations. The example I outlined is not a theoretical one. I've dealt with such situations...but I've done so with Datomic, which is designed with datatype-composability in mind. Similarly, I've found a healthy mix of SQL, Redis, memcache, and Datomic all in moderation can go a long way.

...but I've still not found a solution where Mongo is the obvious answer.


The biggest challenge is to understand and find the simplest case and deal with the hard cases when they happen. I had to learn this over time.


Here's an example in C#. Imagine you're querying a database of products (here represented by integers). Users can enter filter parameters - you want to build your query dynamically based upon what they enter. With LINQ, you can do this kind of composing with no effort. You also get to run the same code on any kind of Queryable, so if you feel like doing some of the work in RAM and some using a DB, your query is usually going to be very similar, if not the same.

Yes, you still have to understand what you're querying against and how you should build your queries to make best use, but I'd much rather write this kind of code than try to concatenate SQL.

  IQueryable<int> Source()
  {
    return Enumerable.Range(0, int.MaxValue).AsQueryable();
  }

  class UserFilter
  {
    public bool? EvensOnly { get; set; }
    public int? Minimum { get; set; }
    public int? Maximum { get; set; }
  }

  IEnumerable<int> Search(IQueryable<int> source, int currentPage, int pageSize, UserFilter filter)
  {
    var result = source;

    if (filter.EvensOnly.HasValue && filter.EvensOnly.Value)
    {
      result = result.Where(i => i % 2 == 0);
    }
		
    if (filter.Minimum.HasValue)
    {
      result = result.Where(i => i >= filter.Minimum.Value);
    }
		
    if (filter.Maximum.HasValue)
    {
      result = result.Where(i => i <= filter.Maximum.Value);
    }
		
    return result.Skip(currentPage * pageSize).Take(pageSize);
  }
	
  void Main()
  {
    var currentPage = 4;
    var pageSize = 5;
		
    var filter = new UserFilter {
      EvensOnly = true,
      Minimum = 1000
    };
		
    var searchResults = Search(Source(), currentPage, pageSize, filter);
  }


FYI, you can also write it this way, although you may find it less readable, it emits the exactly same query to the DB, and I personally find it much more pleasing;

  IEnumerable<int> Search(...) {
     return result
        .Where(i => !f.EvensOnly        || i % 2 == 0)
        .Where(i => !f.Minimum.HasValue || i >= f.Minimum)
        .Where(i => !f.Maximum.HasValue || i <= f.Maximum)
        .Skip(currentPage * pageSize).Take(pageSize); 
  }
Nit: EvensOnly should not be nullable since it has only two states.


I find this more readable. In fact, the code here looks more like this:

    .FilterByEvensOnly()
    .FilterByMinimum()
    .FilterByMaximum()
Obviously the example is trivialised and the real filters are more complex, requiring joins, but the pattern gives a very readable way of writing an efficient query.

Unfortunately EF produces an unreadable query when there are a few dozen filters, but the LINQ code is readable, so there hasn't been any difficulty debugging.


Works great until you need some feature that is only implemented in SQL and in my experience that's unfortunately about 30% of the time.


How often do you really reuse bits of a SQL query? SQL is the language that an ORM will generate. The question is, how much work is done to avoid using SQL and is it worth it?

The only time that I've found ORMs useful for composing queries when I have to dynamically create a query at runtime based upon user input. And even in that case, today, I'd probably still just concatenate strings for a proper SQL statement.


I use this a lot for report-builder interfaces, and according to my personal experience SQLAlchemy in Python maps really well to raw SQL while adding a lot in terms of DRY, portability and security.

I guess the underlying question is how much your development style involves meta-programming (and how much someone enjoy reinventing the wheel with their own query builder engine instead of trusting a mature ORM).


With Sequel I reuse bits all the time, because you can generally very easily chain statements that return dataset's until you force evaluation of the query.


I think there is a problem of tooling in SQL, which means it is difficult to reuse it.

I spent the other day writing a big query, joining about 10 tables. So usually I start by joining the two main ones. Check that gives me the results I want. Add in another table or conditions, check again. Repeat until all tables are joined into the query and conditions are added.

Then I noticed the results of my GROUP_CONCAT were not as I expected. I had a couple of suspect joins that I tried removing. Same problem. In these situations, it is often easier to go through the same process from scratch of adding in one table at a time and ensure that it is working.

We need some kind of unit test equivalent for SQL.


Mature relational databases support views, pre-built queries that developers can define and other developers can then use as "tables" in higher-level queries, allowing some code reuse. This has been around a long time.


It happens a lot for me. Basically you have to choose between copy-pasting a lot of sql code or trying to reuse parts when the business logic has multiple cases that are similar but not exactly equal.


The composability requirement only justifies itself when... your queries need composability.

I'm absolutely in favour of using a query builder when it avoids the need to fiddle with strings, but at the same time any decent programmer can write in half a day a query builder that satisfies the vast majority of the use cases.

So the question is: once you have covered those cases, do you need to use the query builder to write static queries? No. Do you need an extremely complex query builder to build extremely complex queries? Again, no. SQL is great for writing very readable and optimized static queries, which are almost all the SQL queries you need when you're not dealing with search forms.

Bottom line: use your favourite query builder or write a tiny one, but use it only when the resulting code is simpler than the plain SQL + string joining.


We already have quite a bunch of query fragments we use to compose bigger queries, so I'd say in those cases it's more than justified.


> So tl;dr: it's about composability, not being "lazy" or "ignorant" to SQL.

If you want composability you should factor the common sql fragments out to parameterized views (table valued functions), or just regular views. Doing this across a large project can really improve performance as well.


> In both cases there's little validation of the query's > correctness (syntax wise) until you actually run it.

Yes? So? When is this really a problem? Even if you do generate queries dynamically. When does this become such a big problem that it is worthwhile killing readability and bogging down the system with lots of extra complexity?

As for ORMs: I've done software for 25 years. I still have not seen a single example of ORMs resulting in better software. Not a single time.


I agree with this 100%. SQL works incredibly well for purely declarive, relatively simple operations. However, the moment you start doing "Do this, then that, then that", it starts to look (literally the appearance) daunting.

I myself come from the R/Python background, and my hypothesis is this difficulty around composability (especially for dplyr/pandas crowd) is what deters data scientists from writing more SQL.


Use an IDE like JetBrain's IDEA that checks the SQL syntax against the database.


I use Entity Framework as an ORM and I have to say I like it. As much as there have been a lot of false-starts related to trying to reinvent SQL, I think Entity Framework hits many of the right notes for me:

1) All my code lives in Visual Studio with compile-time type-checking. No maintaining stored procedures outside of my main codebase, no mucking about with strings. And because Entity Framework puts the Select after the From clause, I even get good intellisense on column-names.

2) I can quickly shift code into being in-memory queries from database queries.

3) While it's a little muddy mixing SQL and C#, there are places where C#/EF's syntax is so much more brief and clean since it's hyper-aware of relationships. For example, instead of writing 2 joins to get the country of a user, I can say User.Centre.Country.

Of course, the Entity Framework is adding a layer of complexity to my work - you still have to know what's going on in SQL, and that mental overhead is a downside. But it pays off.

SQL is a brilliant language, and that's why it's stood the test of time in spite of its advanced age.

But seriously, it has a lot of bad flaws - it's often difficult to make reusable components in SQL like you can in other platforms.

The underlying relational algebra is brilliant. That's why I like ORMs - they admit that SQL is ideologically beautiful, but the SQL language itself could use some loving.


Yes. I pretty much agree with you and am happy for people to use ORMs (or raw SQL if that's their thing and they are into maintaining it). It's actually the underlying 'throw away all this old SQL database stuff' that is the biggest loss for people because they miss out on the power and optimization that's gone into those databases.


But doesn't the Sequel snippet just write SQL as Ruby and get Ruby objects out? It's still pretty raw SQL, just not SQL-as-embedded-strings-guess-the-syntax-error. And surely it's easier to extract, compose or otherwise manipulate reified queries than by concatenating floating bits of strings.


Yep, even better imo is that when you run a query it outputs the generated SQL. Often I'll have a psql console open next to vim while creating Sequel queries and just copy this output over to verify the generated SQL does what I expect.


When writing raw SQL, I tend to test my queries against the database first with sample queries, then carry them into code once all the logic has been figured out.


I will just say that no one should have inline SQL in their code. For security's sake, use a parameterized query!


Entity Framework still generates pretty crappy looking SQL. It's nice to get a project going, but every time I try to look at why we are having performance issues in production and see the horrible query EF is generating I cry.


While I agree the SQL it produces looks ugly, performance wise I haven't found too many issues (using EF 5 then 6). The few cases where it produced legitimately bad SQL (slow), was due to the developer failing to grasp how queries are built (in particular abusing "in" and having multiple layers of selects instead of just a flat query which could have done the same thing).

I will say EF requires extra training/knowledge. So it isn't a "free" framework. However being able to migrate your entire code base from one database to another only changing a single annotation in the Code First classes is pretty nice. As is being able to generate your databases via the initialisers.

Is it for every project? No. However for "classical" database projects where you're going to be writing a bunch of CRUD and shoving it into C# classes, it is very helpful.


No-one ever migrates their db unless they really are in some sort of serious shit. I'll also guess you've never done it because the EF + MySQL = a world of pain. MySQL does not like nested queries, the EF uses them like they're crack, one of the reasons the SQL it produces is so hard to read. I'd guess that the EF + [any db that's not MS SQL] probably suffers from similar "holy shit why did the DB just die, oh it's the EF" problem.

I'm also guessing John's problem with that query is that it's a simple aggregate query in SQL, there's no good reason to use the ORM, mainly because you have no idea what convoluted SQL it's going to spit out. There's only downsides with no upsides.


> No-one ever migrates their db unless they really are in some sort of serious shit.

I've been involved in several migrations. All of which were planned over years in some cases. There was no "serious shit" at any point in the process.

One popular reason for migrations is licensing costs (e.g. escaping Oracle or IBM), or due to mergers where the other company had a different database system and they wanted to consolidate both technology but also expertise.

> I'll also guess you've never done it because the EF + MySQL = a world of pain.

I've never done what? I've never done EF with MySQL. I never claimed I did however.

I've used EF with other database systems including Oracle 11g, MS SQL (several), and other integrated it with some smaller database systems that had EF providers available.

> MySQL does not like nested queries, the EF uses them like they're crack, one of the reasons the SQL it produces is so hard to read.

You, the developer, formulate the queries. If you're seeing excess nesting then examine how you're doing things.

> I'd guess that the EF + [any db that's not MS SQL] probably suffers from similar "holy shit why did the DB just die, oh it's the EF" problem.

That's not been my experience. While I have found some other EF providers "limited" compared to Microsoft's provider, most of the problems encountered were due to underlying bad configuration or bad entity layout rather than EF itself. I will say Code First is significantly better than EDMX-style modelling. In particular in larger projects (the Visual Studio designer kind of sucks).

> mainly because you have no idea what convoluted SQL it's going to spit out.

It is very easy to see exactly what EF spits out. You can even pipe that output all over the place (e.g. error system, performance logs, etc). There's no excuse for being ignorant of what EF is doing under the hood anymore.


Indeed for web development tools like MiniProfiler give you access to the queries in the page along with execution time and duplicate query detection. We use it on Oracle with the Devart provider and it's pretty snappy even with the somewhat bonkers looking queries it sometimes generates.

Because in my office the alternative is PL/SQL functions that dynamically build WHERE clauses, and I'll choose EF over that any day.


Given a best-case MySQL database that needs to be migrated elsewhere for data quality or performance reasons, it could be a very straightforward task that any qualified DBA could manage. That isn't the most common case, but it's one you can achieve by intention.


Don't try to read the generated SQL. Debug the query plan directly.


Yes, but using projections and generally avoiding descending deep down object graphs for queries of multiple objects generally yields acceptable performance.


> And because Entity Framework puts the Select > after the From clause

I think that you're confusing LINQ with the ORM.

LINQ is a language feature, Language INtegrated Query. It operates on IEnumerable<T>. They introduced a new interface, IQueryable<T> which extends IEnumerable<T> with Expression. Basically it exposes your "query" as an Expression Tree - this can then be used by QueryProviders to do cool stuff like generate SQL.

A number of ORMs implement LINQ QueryProviders; Entity Framework, NHibernate, BLToolkit are all good examples.

EntityFramework, like a lot of fully-featured ORMs suffers from all of the traditional ORM problems: runtime performance is sub-optimal, it's query heavy and if you use the visual designer it encourages bad development practices.

On the other hand it allows people with very little knowledge to be productive; for small-scale projects the bang-for-buck is hard to compete against.


EF has too much churn to be relied upon. They completely re-write it every few years. This time they are really taking an axe to the whole existing structure


I like EF well enough but if you don't know what you're doing it definitely makes it very easy to write code that works but runs really slowly when n is bigger than like, 10.


Sequel is the best ORM I've ever seen or used, but that is (probably IMO) not a good application of it.

It gives you total choice over what level of abstraction you want, so this is a particularly egregious use of it, being as nothing about that query is dynamic :)

You can use it purely to execute handwritten SQL queries loaded from files, or stored procedures, or any level of abstraction between raw SQL and the monstrosity you posted. The model layer is totally optional and is built cleanly on top of the query/relational layer, not inside of it.

The thing is a fucking work of art to be honest. The design shows impeccable taste, ruby's more exotic features are deployed only when really needed, everything is so clear, especially if youre used to activerecord.

BTW also the developer is super helpful and pops up everywhere to answer questions.


jeremy is incredible. he is more helpful and available (and certainly more knowledgeable) than most paid support people for premium products.

if you are learning sequel, stop by the irc channel and he'll probably answer your question.


seriously the man is a hero. I owe him about a brewerys worth of drinks.

EDIT: I just went to look at the github: 1821 stars, zero issues. And yes, github is the projects official bug tracker. There are 662 closed issues.

I must see if I can arrange for work to send him some money...


Or a brewery!


sequel + postgresql is pretty much a dream team. i really miss it now that i am on a python stack.


Agreed, but it's important to understand what drove the adoption of NoSQL and schema-less stuff as well as the related trend of dynamic languages like Ruby and JavaScript.

(1) SQL server software itself was clunky, hard to scale, complex to deploy in a clustered fashion, and generally "old" in a bad way. This made NoSQL seem like a breath of fresh air.

(2) Startups! Ship now! MVP! Fail fast! The whole industry has been operating on this ship-fast-fail-fast way of doing things that prioritizes development speed and agility over correctness. That's because half the time what you are developing will be thrown away anyway -- it will fail, or you'll "pivot" and rip it all apart and almost start over. No point in static languages, SQL schemas, or other explicit design elements in that world.

#2 is I think the more significant of the two for things like schema-less MongoDB or dynamic languages. Once you get product/market fit, you typically end up taking the crap you wrote with no types and schemas and re-doing it in a "real language" with a "real database." But as they say in Startupistan, that's a "good problem to have."


Where this line of reason goes completely off the rails is in thinking that schema-less databases are a hedge for future uncertainty. Yes they let you churn out code slightly faster, but then your data becomes a ball of mud just as quickly as your code base, except the former is much much worse because when you pivot you still need your data even if you decide to chuck out all the code that goes with it.

In fact, a traditional RDBMS is designed to allow for any kind of ad-hoc querying you desire with reasonable performance, and the ability to normalize/denormalize, index and materialize views in order to optimize unforeseen use cases. The excuse of poor scalability is just a rationalization that some kids who didn't understand SQL used to justify using something that has a shiny marketing page that they ignorantly found more viscerally appealing. The tradeoff was all wrong, because 99.9% of projects will never need to scale beyond a single DB server, and for those that do, the flexibility of a well-defined schema and ad-hoc queryability will give them an early advantage in what kind of pivots are reasonable—if they are lucky enough to have to scale then they can replace the RDBMS with the context of knowing what exactly their use case that they must scale is. And at that point you'll presumably have the resources to do it, which is much more effective than prematurely attempting to design a scalable infrastructure and discovering that you didn't have the first clue what the real bottlenecks or even the full use case would be.


The "schemaless" argument is nonsense as well. No schema enforcement at the database level means more checking at the application level (more work basically).


I imagine "whole industry" == Silicon Valley?

I am yet to do any project that doesn't use SQL servers for data storage.


Yeah, I mean "Greater Silicon Valley" by which I mean SV itself and its wider diaspora. Other names include Startupistan, Hackerdom, etc.


I assure you that whoever wrote that code knows SQL.

The builder syntax is convenient because it allows queries to be composable without ugly string-smashing. For instance, you could take the object built out of that "User.select..." and pass it around into other functions that append extra 'where' clauses, etc.


I may just be rehashing sibling arguments here, but to me that particular API looks very much what I think an SQL-wrapping library ought to be: a replacement for string concatenation and something that allows you to treat SQL queries as data. I don't know anything about Sequel, but that example still feels close enough to SQL.

My experience with ORMs has been that I eventually end up regretting using one if I try to model my data as objects because with ORMs it's easy to code yourself into a corner where you end up wishing that your design supported the relational model instead.

I do understand the desire for a good ORM, though. SQL is extremely powerful and a well-designed database is a joy to work with, but a sequence of tuples is often not the most convenient datastructure to process in most programming languages.


In my experience, it's great to start with ORMs, and they help you go really fast, but if the project is important then sooner or later you're going to find a case where you need to go around it and write custom SQL (probably for performance reasons).

And that's fine, but it does mean that you should pick an ORM that plays nicely with that workflow (or otherwise design your ORM to be compatible with it). Back when I was working with Perl I found that you could do some pretty good stuff with RoseDB -- it could load objects from your custom queries and they'd still work like normal ORM objects (i.e. you can load related objects from them). I've missed that capability on several occasions since.


This sums up my experience as well. Great for all the tedious insert update type stuff, but more hassle than they are worth for complex queries.


>As a greying developer

The problem with being a young person who wants to make his or her mark on the world is that if the people before you did an excellent job, you can only make things worse. In fact, I think this is one of our larger problems not only in technology but in society (see the recent move to extremism in many aspects of our political and religious life the people who grew up under more moderate times are pushing for).

Some things should be boring and stable. Build on top of those things. Redoing the foundation every six months is just sub-optimal. Accept that you may not make that big mark in the world, or if you do, it'll be in ways you completely did not expect, just like everyone else who made their mark.


I agree with you. Also, we should be developing products that last. And be proud of that. Why does everything always have to be new?


You sound like someone who thinks we thinks we should still be using Cobol and IBM Series mainframes.

Technology changes. It improves. It gets faster, easier and more responsive to business requirements. If you don't embrace change in the IT industry then get out. Because you simply won't survive.


Technology changes.

Yes.

It improves.

Sometimes.

It gets faster, easier and more responsive to business requirements.

Occasionally.

There are a lot of ideas in technology that are widely accepted, particularly by younger, less experienced generations, as being The Best Way to do things. However, if challenged, most of those people don't really know why they believe that. They have never personally seen any hard evidence that The Best Way is actually better than known alternatives. They have never spent significant time trying those alternatives for themselves to gain first-hand experience of the pros and cons. In fact, their belief is usually nothing more than a matter of faith, a blind trust in the dogma of the day and the rhetoric they find on-line or hear from their peers or managers.

See also: OOP, design patterns, UML and round-trip development processes, frameworks, ORMs, XML, dynamically typed languages, more frameworks, TDD, NoSQL databases, and so on, many of which were either invented or popularised as a way to work around the artificial problems created by one or more of the preceding items.

Of course I'm being a little facetious here. But if you think I'm entirely kidding, just spend a few minutes with Google and see how much advocacy there has been for each of those ideas in their day and how much actual data there has ever been to justify that advocacy. And really I'm being kind here, because you talked about things that are supposedly "faster, easier and more responsive to business requirements" and I didn't even mention words like "agile" or "craftsmanship" in my list.

If you don't embrace change in the IT industry then get out. Because you simply won't survive.

I embrace using the right tool for the job. Whether it's new and shiny or old but tried-and-tested doesn't really matter, though if I had to pick one or the other for some bizarre reason, I'd pick tried-and-tested eight days a week. There have been only a few truly significant advances in the past decade or two, but very many technologies that were initially much-hyped but then failed to stand the test of time for exactly the reasons that critics outside the hype bubble had always predicted.


I've noticed this phenomenon as well, and it's very frustrating. Let's add async-and-future-everything to the list as well. I honestly don't understand how to convince people to make an honest assessment of the alternatives, especially when they're unable to articulate what exactly they're purchasing with the complexity they're adding.

Will our profession ever stop being fad-driven?


That's rather going to extremes with what I said. So, no, I don't think like that at all.


PostgreSQL is a very different beast to what it was 10 years ago.

So for a while it didn't support use cases as well as various NoSQL/NewSQL databases. However, PostgreSQL has also been built by great bunch of developers who are happy to adapt and implement the new, instead of harping on about the past.

So personally I adapt that advice to instead "use evidence to guide decisions".


The lack of a schema may sound interesting, and in some cases it can certainly have its benefits. However, for many the usage of a schemaless storage engine leads to the problem of implicit schemas.

As another greying developer, one thing that was seen with ORMs were implicit schemas, often motivated as bureaucracy shortcuts.

So, you need a dictionary of configuration data for your user, but the bureaucratic overhead of adding that to your schema is too much? Why, then take your fancy OO serialization technology and save that little dictionary as a binary blob and stick it into just one new column in the database.

Problem solved...until years later when information in one of those blobs becomes tied (by a maintenance programmer who didn't entirely understand the system architecture) to what is effectively an implicit schema.


And I thought I was the only one who'd encountered that...


I started with SQL in the 90s and completely agree with you. People who do not know SQL tend to reinvent it poorly in a language not designed to work with sets of data.

We have started using jOOQ and it strikes a great balance of raw sql/typed sql and result -> object mapping. I don't ever seen going back to a heavyweight ORM again. There is just too much magic and lost time spend figuring out performance issues for a minimal gain in ramp up for those who do not know SQL.


Gosh, no kidding. (I sorta disagree about ORMs though -- if you're selecting by ID it's monkey work to write those queries, but anything complex, sure, use SQL)


Right. A good ORM should deal with all your common cases (if not, your schema probably needs improving), and it saves you writing a lot of boilerplate code.

For the edge cases where you need complex joins and/or extra performance, don't try and bother with the ORM, just drop to SQL.


Agreed. Also it should be generating queries based on foreign keys/indexes (including joins).

Others should not be allowed in an interactive app anyway: developer should be warned/prevented using a "batch" query by accident.


I would like to say lets step back and not conflate SQL and relational databases together. Clearly SQL as the language the primary way most people interact with relational database.

In my my mind SQL as a language is a huge PITA. First, parsing of complex statements is expensive (there's workloads where SQL parsing takes more time then processing the results).

Second, as SQL exists today (SQL2011) it's a large, complex language that's not implemented uniformly. So I understand why people want to build programmable interfaces for generating queries versus writing giant string statements that expand to SQL.

I personally would wish that there was an alternative language for interacting with relational databases that isn't SQL. Just expose the whole relational expression tree to users say ala S expressions. It's not like the relational engine is going to optimize / re-order it anyways.

I mean something along the lines of:

  [ GROUP_AGGREGATE,
    [ "name" ],
    [ [ SUM, "COUNT(*)" ],
      [ SUM, "cost" ]],
    [ JOIN, [ ==, "user_id", "customer_id"],
      [ FILTER, [
         [ &&, 
           [ >=, "age", "30" ],
           [ IN, "state", "NY", "CT", "NJ"]],
         [ TABLE, "customer" ]
      [ TABLE, "orders" ]]]]]
Is it more verbose, yes. But much easier to compose, parse and machine transform by software (code is data). Also, makes you think in terms of relational operations/expressions versus SQL the language.


Actually, some work on this has been done for CHICKEN Scheme in extensions "ssql" and "ssql-postgresql"[0]. I'm not too familiar with these particular eggs, but the flavor might be conveyed by this example:

    (ssql->sql #f '(select (columns (col actors name) 
                                    (count (col roles id)))
                     (from roles actors)
                     (where (= (col roles actor_id) (col actors id)))
                     (group (col actors name))))
translates to "SELECT actors.name, COUNT(roles.id) FROM roles, actors WHERE (roles.actor_id = actors.id) GROUP BY actors.name"

In the above sexpr, the SQL query is expressed as a Scheme/Lisp hierarchical list, which I think is a quite elegant notation. I'm sure other Scheme implementations provide variations on this theme of SQL generation. Using some macro magic, there would certainly be a lot of ways to organize it and probably it's not even all that hard to do.

[0] documentation at: http://wiki.call-cc.org/eggref/4/ssql


It's def a step in the right direction. At least in terms of letting you mechanically transform queries.

The downside (IMO) is that it's still reflecting SQL the language ... not the relational model.


You should learn SQL and understand relational databases. But using an abstraction (ORM) to cover 80% of the use cases is much better than writing tons of boilerplate SQL. http://java.dzone.com/articles/martin-fowler-orm-hate


Thank you. I searched to see if his article had been mentioned, and glad to find you did. The link he gives to Ted Neward's post is gone, but I found it in the Wayback Machine: https://web.archive.org/web/20141205230114/http://blogs.tedn...

Seems like this thread has gone down a very old, well worn rabbit hole.


In a compiled language like C#, "wrapping" SQL is a good idea, at the very least to catch bugs at compile time rather than runtime. Also makes it easier to switch from one DB provider to another if you need to.

You are less likely to suffer from SQL injection if you use at least some kind of wrapper, although a minimal wrapper that takes a SQL with placeholders and arguments would do for that.

My preferred way to wrap is to use EF with a dash of Linq, which is excellent and yields great readability and flexibility.

There is no way I am going to be willingly writing SQL statements in production code (unless I am forced to do so by some legacy problem). Worse case if the SQL is so complex that it is too much for Linq, then write a stored procedure.

Back in 2002 I had to write hand crafted SQL before mainstream ORMs and at the time I thought it sucked bad! I was looking for ways to generate the code from schema but didn't have time to go and back-fix all the legacy code ad-hoc SQL statments!


You are less likely to suffer from SQL injection if you use at least some kind of wrapper

Pretty much all of our coding guidelines on my team are just guidelines. The one absolute law is that all data going to the DB must be paramaterized, nothing goes in as string substitutions.

But virtually all of our DB access (like, at least 99%) is by stored proc anyway. We're just barely able to keep up with performance requirements by tuning things just right, and in my experience, EF isn't able to generate queries that are as efficient as we can by hand with some fiddling. I know that sounds like "I write in assembler, 'cause no compiler is as good as me", but that is what the reality seems like.


Out of interest, are there more types of SQL injection, or are they all a case of inserting a semicolon, and running a second query afterwards to get the information you want?

Surely that is something that could be disabled at the driver level for security? Make sure it only send one statement at a time.

Correct me if I have missed something.


That's what it boils down to, but limiting execution to a single command would only half fix the issue anyway. It would still be possible to put an early termination into a string and comment out everything else in the command, thereby completely changing what it meant.

Passing data properly through parameters isn't really very hard, there's no point in looking for incomplete work-arounds.


I personally prefer SQL + Dapper instead of ORM stuff if it is Linq-level SQL, and sprocs w/ Dapper otherwise.


The real issue isn't "newtech koolaid" it is the definition of the problem.

I find that people that don't understand the problem like MongoDB for the ability to be "flexible" in designing and modifying down the road. Though I like to argue you have flexibility in SQL also.


I agree, I've read a few 'screw Mongo, I'm going to SQL' type blogs, it really seems like they either didn't grasp how to architect Mongo correctly and tried to do things that don't work well with it, or they had a problem that was better solved by SQL in the first place.

SQL does many things very well that noSQL stuff won't. Also vice-versa.


This is really what he's driving at with the newtech koolaid. People jumped on MongoDB etc. bandwagon because they drank the koolaid of "Schema less is best!" and "It's so fast and shiny!" without really understanding the problem, let alone the right solution.

Unfortunately this seems to be a regular pattern in tech, a mistake (mostly) repeated by the young fresh developers coming in. Usually after you get burned once or twice you stop making that mistake.

I'm not a Comp Sci grad so can't speak from experience, but most of the syllabuses I've seen seem to focus on a very short term, single semester (or at most year) approach to projects. It would be good if they could start with a project and be required to re-write, adapt or improve some kind of project from beginning of the college time to the end. It would be a good introduction to technical debt, amongst other problems. (Heck if you really want to go for broke, maybe even mix things up a little, next semester you'll be working with someone elses code! That would be painful but might teach people the value of good comments, tests etc.)


> This is really what he's driving at with the newtech koolaid. People jumped on MongoDB etc. bandwagon because they drank the koolaid of "Schema less is best!" and "It's so fast and shiny!" without really understanding the problem, let alone the right solution.

Exactly. People report "X makes Y so easy!" and because of the breathy excited "look what I did in just an hour" blog posts the naive believe that using X for Y or similar things negates the need to properly design and think ahead.

If you take that approach then even if you are accidentally using the right tool doe the job, you may well be using it very wrongly and creating an unmaintainable and/or inefficient mess that you'll have to fix later.

A few days coding can save you a good hour or three of design and thinking beforehand.


I guess but college isn't a vocational school. The point is to understand the CS concepts and not necessarily to know all the engineering kind of stuff you'd need in a "real-world" project (which often won't be "challenging" in the CS department).


I honestly don't think I've ever seen a valid use case for Mongo. If you're going to query your data, you have to know what fields you're looking for, right? So why not create a schema that has those fields?


If you're doing a "tracer bullet"-style method of development where the requirements aren't quite set in stone, MongoDB and other schemaless databases can really save time when it comes to adding basic functionality quickly.

Another use case: if you have an ad-hoc group of developers working on a project in a decentralized fashion, they can each work on local codebases of the project without needing an up-to-date setup script or migration scripts.

I'd maybe even argue that a lot of these "goodbye MongoDB" developer blog posts are sort of unnecessary. MongoDB allows for fast deployment and the convenience of a flexible database at the cost of speed, so eventual migration to a more solid relational database could/should be part of a long-term plan when devs choose to use MongoDB. It's all about using the right tool for the job.


You can know a field or fields of the data while leaving the possibility of extra fields that you don't need to know. In practice if you're making a web application and dealing with JSON, it's typically very easy to work with JSON-based databases instead of using serialization functions. You can enforce the schema with various tools at the application level when needed.

MongoDB also has some performance benefits over many traditional databases, and has "native" support for binary data.

I personally prefer CouchDB over MongoDB for most application that are well suited to use a "NoSQL" database, but MongoDB is sometimes decent too.


Mongo doesn't stop you from using a schema, but you'll have to enforce it in your application code rather than the database itself.


> but you'll have to enforce it in your application code rather than the database itself.

You mean in every one of your application's code. You are using a schema no matter what, but it's a missing feature in mongo.


the only valid use case is when your data is so unstructured so that you don't care what the fields in it are and you treat it like a black box. So maybe some kind of user-generated markup that you just shift in and out.

those kinds of applications are not usually kinds of problems we have to solve.


I couldn't agree more with you. I love SQL for when I have a concrete data model, when I don't I love NoSQL. I could use both if I really wanted to, but you need to know when to force a square peg into a circle. Mongo has a good how to on it here. http://blog.mongodb.org/post/72874267152/transitioning-from-...


SQL databases had their imminent death coming. They were stuck. However, PostgreSQL is under more active development than most other databases, SQL or NoSQL. It is new and cool technology.

In the end, whether you speak SQL or use NoSQL construct matters as much as whether you write a program in C or Pascal. You can have endless debates about form and function and you might want to see and identify with a winner, but in the end whether the tools and the community surrounding a technology fits people's needs is the only thing that really matters.


Whenever I see someone shoehorning something into MongoDB or something else I ask the reasoning for not using a relational database. Its always something along the lines of "We don't do that now" or "schemaless" (which as the article points out means more work for the dev) something similar, with the "fast" added somewhere into the description.

The crazy thing is most of them are working on a single node, so no horizontal scaling. And they seem to think non-relational is the new way to do things.


If the API surrounding the ORM is well-designed, using an SQL wrapper like this eliminates the possibility of SQL injection. There are arguments for not always doing things the same way.


Using sprocs or parameterized queries also prevents SQL injection.


The advantage of this is that you can pass the relational dataset that's a result of this code around to other methods. Those methods can add additional filters, different ordering, etc. You can't do that with raw sql strings.


Agreed. And one might note that there's a really good reason Oracle has dominated for decades, and Larry Ellison (who read the SQL research paper, and formed a company to implement it nearing 40 years ago) is a billionaire.


Oracle's success is due to their aggressive expansion into business applications. Their database products have been taking a hammering as enterprises moved to data lakes and horizontally scaled databases like Casssandra.

Your argument would've made sense a few years ago but not now.


i think there is a line to be drawn.. If the ORM is saving you no effort, I fully agree. However, I suspect most ORMs in the short term boost early productivity quite a bit.

It's like writing an app from scratch in assembly, versus C.. The compiler will do quite a bit of the dirty work for you, and if you look at it in a disassembler no doubt you could find tons of improvements to be made (errmmm depending on the compiler I suppose). Doesn't mean either one is the wrong way to go about it. Now if all you do is stuff ASM code in your C app, yeah.. Why bother?


I agree with the commenters who argue for the value in a library that wraps strings that happen to be SQL statements, but none that I saw touched on this point: I think most users of such a library will have to know SQL. These libraries seem to have a one-to-one matching between concepts in them, and concepts in SQL. Hence, the point is not protecting developers from learning SQL (because you can't, for these needs), but in finding the least painful way of integrating SQL into the application logic.


Right. If you write SQL directly, you can also use commands such as EXPLAIN to see what the lookup strategy will be. If there's a full table scan of a large table involved, maybe you need a different query or a new index. If the SQL generation is hidden by some library, you can't do that.

Wikipedia runs on MySQL. They have replicated read-only copies of the database and ngnix caches which handle most read-type operations. Is your site busier than Wikipedia?


Wait... does preferring to write SQL (and knowing how) make me old now?

Dang.


I think the problem is that NoSQL has been the cool new thing, which causes people to use it without really understanding why it is better/worse than other solutions.

Tools like MongoDB can do things that are extremely difficult or impossible with MySQL/PostgreSQL and they are a great choice for those situations. Using it simply out of laziness or misunderstanding, though, is probably going to create problems later.


I sincerely still prefer mongodb syntax because:

- Fits well with a programming language; {a: data.x} is better than 'WHERE A="' + data.x + '"' (sanitize?) or similar which are harder to read. SQL queries are good for direct input, mongodb queries styles are better to be used with a programming language;


If you're building queries in ANY language by concatenating strings you're doing it wrong.

In a sane language that query might look something like

cursor.execute("select field from tbl where long_obtuse_column_name=:foo and status != :status", foo=query, status='open')

All parameters are properly escaped by the api of course, so even if foo is "'; drop table audit; " you don't have any problems.


I've said similar because of this, so your comment is included in my statement.


That is not how you do parameterized queries. With any civilized database library, it would be something along the lines of `"WHERE a = ? AND b = ?", data.x, data.y`, so that the parameters are like function parameters.


I've said similar because of this, so your comment is included in my statement. That doesn't change much


Your users don't give two craps about the syntax. The technology choice is one of application requirements. If you chose Mongo over e.g. a SQL DB because you like your code to look pretty then I have to admit that I wouldn't trust your software at all.


>Your users don't give two craps about the syntax.

Of course not, but your developers yes. Use a complex syntax and more bugs will appear. Do you care about bugs? No, but your users will.

> If you chose Mongo over e.g. a SQL DB because you like your code to look pretty then I have to admit that I wouldn't trust your software at all.

This is not my only reason.


Your comment made me realize why I prefer to work in ORMs instead of raw SQL, even though I'm frequently frustrated by their limitations - it all comes down to composability.

The mongodb syntax looks like it supports composing statements much more readily than SQL. It makes me wonder if there's been any serious work done making a query language that's fully as general as SQL, but is designed to be safely and easily composable so that you can build queries up from parts.

Anyone know of such a thing? The mongodb approach of query-as-data-structure is as close as I've seen.


> The mongodb syntax looks like it supports composing statements much more readily than SQL. It makes me wonder if there's been any serious work done making a query language that's fully as general as SQL, but is designed to be safely and easily composable so that you can build queries up from parts.

Haven't you noticed that it's exactly what the Sequel snippet does? It's also what SQLAlchemy's Expression Language does. It's a reification of SQL queries in the host language, so it can/should give pretty much all the tools of SQL.


In some ways it does, but also remember it's a MUCH less rich query language. It's a lot easier to make a simple api mapping when you don't have such exotic things as "joins".


The ORMs I've used have always tried to map tables to classes (or whatever similar language structure is available). That works well almost all the time, but I get tripped up when the queries get to be more complex or specialized to a particular use.

I haven't used SQLAlchemy's expression language, but glancing over it now it looks like it might be just what I was looking for (at least for python), so thanks for the pointer.


If you really care about your data and you don't think a database should just be a bag where you put stuff but something that can be trusted , and efficient when it comes to storing data then you wouldn't be using MongoDB.


I agree, using ORMs and such hide the real SQL and make it much harder to optimize by using explains etc.


Just have Rails dump the query and stick "EXPLAIN" before it.


What I'd like to see is a universal SQL that can be translated to whatever dialect of SQL my current database is using. That way I won't have to relearn SQL every time I start a project with a different database engine.


You could use Teiid (https://github.com/teiid/teiid). The SQL dialect is similar to Postgres, and it has built-in translators to handle all of the popular relational databases (Postgres, Oracle, MySQL). Added bonus is that even some NoSQL databases are supported, and you can do things like join a table from a MySQL database against a collection in MongoDB. Full CRUD is supported for most translators.

If you're using Java (JDBC), there's an embedded kit so you don't have to run a standalone server. If you're not using Java, you can run the standalone server which also emulates the Postgres ODBC protocol so you can (for example) use `psql` to connect and run queries.


Very interesting, thanks!


The fact that you don't know what an ORM is is what makes me cry.



Stuart Sierra (@stuartsierra): "SQL is already a DSL"


This is a `negate` method in Arel (by Rails):

    class User < ActiveRecord::Base
      def self.negate(scope)
        where_values = scope.where_values

        scope.except(:where).where where_values.reduce(:and).not
      end

      def self.suspended
        where.not(confirmed: true).where(arel_table[:created_at].lt 2.weeks.ago)
      end
    end

    unsuspended_users = User.negate User.suspended
Do this in SQL. Composability is the real boost, and you have composabiliy when you don't have to build a string in order to interact with the db.


unfortunately, where_values is also a private / unofficial API with no stability guarantees, so it can change at any time =/


True, but that's a problem related to Arel (which is poorly documented too... I think Rails is neglecting one of its greatest features).


[deleted]


There's quite a bit more I can do with Arel than I can do with raw SQL, especially if I'm already in Ruby. So... no thanks. :)

Instead of making a statement like that... educate me. I'm very happy to learn about new technologies. What extra, interesting things can you do with Arel that aren't possible in SQL?


Arel is just a syntax tree for SQL. It doesn't strictly give you any more power than SQL - I think your question is a little bit disingenuous, as you knew this already - but code manipulating syntax trees is better for query dynamism than code manipulating strings that don't have further typing.


I think your question is a little bit disingenuous, as you knew this already

No, it was quite genuine. I am not deeply familiar with Arel.


sigh

Not this argument again. I'm all for learning SQL (just like one should always learn foundational concepts), but for some reason there's this notion popping up that Real Programmers use Raw SQL, just like 20 years ago Real Programmers used Assembly. For the love of God, stop hating on basic software engineering principles like code-reuse and allow us to move up the abstraction hierarchy like we do in literally every other facet of software development. Here's a comment I made on the issue of ORMs vs Raw SQL a while back that goes into more detail: https://news.ycombinator.com/item?id=8134205

In the past year I've also grown very fond of type safety within my ORM coding as well, something that's hard to achieve at compile time with just raw SQL.


This post reflects an interesting technical narrative of companies switching off MongoDB to more traditional relational databases as they grow.

Importantly, I don't think that's an indictment of MongoDB. Instead, it highlights the key advantages of NoSQL: ease of use and rapid iteration.

When you're first working on a project, MongoDB is very easy to slap in. You don't even have to create tables/collections. As you iterate, you don't have to constantly be updating schemas as your product direction shifts. Thus, MongoDB is perfect for early startups.

As a company scales, of course it's warts start to show. The very attributes (like schemaless storage) which were an asset when just a few developers were hacking on the project become a drawback as the company grows in size. Reliability becomes more of a concern.

Naturally, people shift to a more traditional database. Personally, I'm almost ready to switch some of our core functionality from MongoDB to Postgres. This isn't an indictment of Mongo though.


Really, you can't spend the 10 minutes designing a table structure in a SQL database? And now you have to spend months re-inventing the wheel because you wanted an easy out?

This post reflects on developers being lazy, instead of doing it right the first time around. Oh no, you have to log in to the db and run a CREATE TABLE statement every few months when you need to scale. Cry some more.

And even then, 'lazy' is subjective. I don't find SQL that hard to implement. There's a library for practically every language you can think of. What's your excuse?


Actually, I think that's a great way to think about it: NoSQL is the "dynamic typing" of the database world.

Put another way, it's like "what? you couldn't spend 10 minutes declaring types everywhere?" - yeah, it's less robust, yet dynamically typed languages remain popular.

My excuse: When I'm just past the mock stage, and still playing with what UI functionality should be, sometimes I just want to get some JSON persisted. I'm changing the shape of the data a lot, discovering the schema as I build. The persistence is frankly a technical nuisance I wish I didn't have to think about.

Sometimes, the result is "good enough" and I don't need to go through the ceremony of glueing in an ORM.


I read this article and was just like "ya. types are important" and reflected on how I used to love Ruby, but now swear against it for serious projects. Glad to see I'm not the only one who made the analogy in their heads.

Right now though, I have to write a script to pull some crap out of an Excel file and into my database. I use Scala with the Play framework, but what a nightmare getting typed information out of an Excel sheet is. No thanks, I'll just pull it out magically with Ruby and construct the JSON requests to my real server to create the data. It's a smart business decision and I would defend it all day. I'm sure there's corollaries for "dynamic databases" such as MongoDB.

To be clear though, I'm with the grandparent. I really think decisions for MongoDB are made out of a combination of laziness and eagerness most of the time. I mean honestly, if your data is going to be relatable, why would you employ a scheme that forces you to make those relationships manually? So you go to a relational databases which ironically lacks relationships, but hey there's plenty of support and libraries that will do it for you. But really if you had to ask me my personal opinion? Graph databases where relationships are FIRST CLASS CITIZENS are typically the best for serious projects containing lots of relationships.


The difference there is that you enter types for thousands of lines of code. OTOH a schema is changed relatively infrequently. Maybe a few times a day during really experimental stuff.


Your given excuse makes you sound more like a naive amateur instead of a pragmatic architect


Just in case GP cares: To me it did not sound like this at all. I think making prototypes and starting to build the thing is the key to getting to a good architecture.


I guarantee you're not designing your databases correctly if they only take you 10 minutes and only a CREATE TABLE statement every few months.


You don't even need to design the schema, just add the bits you need as you go (updating the schema takes about 20 seconds with rails for instance).


I think you're understating the cost of updating the structure of an in-use database.


I'm not even sure if you're referring to an rdbms or Mongo, which is kind of the other side there. I _think_ you're referring to an rdbms. But I also think people seriously understate the cost of changing the way your data is stored in a large in-use MongoDb too.


Mostly because an RDBMS will ask you to explain, explicitly, in complete detail, how you propose to consolidate the entire schema and existing data with the new changes without introducing paradoxes or inconsistencies.

Mongo won't. It will take you at your word. Even if that word is wrong.


Definitely not about spending 10 minutes on SQL Schema. It is all about maintenance. MongoDB is right use case, if you know what it will be there in Table.


In some cases you don't even have to log into anything, just type one line to create migration and another line to run it.


No, this is bad. You shouldn't design a system that won't work from the outset. "Plan one to throw away" is about budgeting time, not about knowingly making big technical compromises when you write the first version.


But these systems do work - it's just that they don't necessarily scal as well as others.

It's silly to expect that every single software project requires the same level of correctness or maintainability. A rapidly-developed speculative SaaS for example may be able to trade long-term maintainability or scalability for faster development time (often more important if you have no idea whether or not your idea will even work).


Not quite--plan to throw away the internals, not the interface.

This lets you do things like writing shitty hyperlinear naive solutions to get all the pieces of a system in place, and then going back and optimizing each in turn. You don't spend a lot of time fussing over little details, and instead make rapid progress.

If you don't spend time on the interfaces between the pieces, though, you're absolutely screwed.


I'll say I agree with this corollary: the schema is the interface.


I had to look the quote up, because it didn't sound quite right.

"Plan to throw one away; you will anyway."

The quote is Fred Brooks from his book "Mythical Man Month." It seems to me the author's post reflects that quote. Granted they didn't throw the entirety of every application out, but arguably it's an entirely different application now. I would even wager there's months of refactorings ahead of them as they take advantage of Postgresql's features.


I think the author of the post has the right approach. They tried hard to build something that would work right, and since it didn't, they rewrote most of it. They had enough budget and willingness to rewrite it. The author learned from their mistakes, and probably won't be using MongoDB on a future project.

The author of the comment I'm replying to seems to advocate a different approach where you choose tech and techniques that are good for short-term but maybe not long-term. This way you have planned technical debt plus unplanned technical debt. Also the comment author seems to be missing one of the post author's main points, which is that MongoDB's schemalessness is awkward even in the early stages of the app, and this is worked around by a less-than-ideal technique of moving the schema from the database layer to the application layer.


And what most people here have missed is that PGSQL is working for them 'because' they now know what their data looks like, MongoDB got them to that point. I can't speak for the OP but some of his examples speak to a lack of experience, which, had they started with PGSQL would likely have resulted in an intractable schema and we would now be reading a 'How MongoDB freed us' post.


The author said they used Mongoose to define the schema, and since they needed to define the schema, they'd rather just do it in the database. So I don't think they'd have been limited by using SQL from the start. Either they learned how to evolve a schema or they didn't have a schema that evolves well even when they were using MongoDB.


I don't know about this, seems like a lot of technical debt to me.

Choosing mySQL over mongoDB to start with really can't add that much time to a project, especially if you're doing it in a language like Ruby. A couple hours extra here seems very worth it to avoid a multi-week migration process a couple years down the road.


OP will be back in three years talking about migrating off of Ruby...if they make it that far :-)


Unlikely, we've been running on Ruby for about 5 years and it's the one thing we feel safe relying on for our serious internet business. Actually, our entire setup is Ruby (safe from some shell scripts here and there).


> Another way of handling this is defining a schema in your models. For example, Mongoid, a popular MongoDB ODM for Ruby, lets you do just that. However, when defining a schema using such tools one should wonder why they aren’t defining the schema in the database itself.

Bah. It's like they didn't know that schema-free data stores mean "there is no schema; different objects may have different fields". This is the whole point of MongoDB: you assume the responsibility of managing the schema. That's a "feature": you get greater flexibility by assuming more responsibility. Whether it's a useful feature, I won't say.

Their second reason is much more valid: MongoDB is not consistent/durable, all that. It's good for a cache, but not for long term data.

Here's a third reason I'd give against it as your primary data store: it's expensive. You have to keep your entire dataset in RAM, but that's not always necessary. My favorite example is from Foursquare. They had every single check-in ever in MongoDB in RAM. That's absolutely unnecessary, and quite silly to do so. Old check-ins are archived data. You don't need them. No user ever wants to know when/where they checked in three years ago. This is why at the time they were paying for 68 GB RAM Amazon boxes instead of 4-8 GB boxes. (I have no idea what they do now. I remember chatting with them on HN after a catastrophic out of memory failure when they filled up the entire 68 GB's).


We have a dataset much larger than RAM (600gb vs 60gb RAM). It's the working set that really matters. Accessing data outside the working set can be slow - unless you have SSDs :)

Regarding consistent/durable: during the past four years, we've not had any problems on this front that weren't caused by us in. We've had an issue that was a misconfiguration on our part where we allowed writes to a server because we pulled it out of a replica set. We also ran out of space on the logging volume once and that caused downtime - but, we didn't have log rotation or anything setup.

In general, we've found the failover very reliable and new primaries have come online without any problems.

That said, schemaless is both a blessing and a curse. Now that Postgres and MySQL have online alter built-in I'd possibly choose one of them if we were starting everything again.


Well, SSD's aren't nearly as fast as RAM, even today.

I think we are in agreement about the working data vs all data. I am saying that in most applications your working dataset is much smaller than your total data set. So why pay for hardware capable of holding your entire dataset in RAM when you don't need it?

I am surprised you are able to do this with Mongo. Last I checked, it simply did not handle this case, and started failing miserably if it was not able to fit all data into RAM.

Re: durability: I am not talking about the server going down, coming back up. I am talking about whether there is an fsync() when writing data. Set up a test case where you are writing data very rapidly to MongoDB, then pull the plug on the box it's running on. It'll come back up, but the data it told you it just wrote won't be there because it didn't checkpoint. Did you check that all your writes succeeded when you had node failures in your cluster? Most applications don't have the machinery to do this because generally the state necessary to check this is stored in the database, yet it's the database you are testing. The only way to test this is to also write logs (also atomically), and then verify DB data against logs. Or, just use a database that guarantees durability.

Re: consistency: MongoDB doesn't support transactions [1]. That's enough to exclude it from a large number of applications. Anything to do with money, for example, is out since you really don't want double spending to be a thing.

[1] http://docs.mongodb.org/manual/core/write-operations-atomici...


Fair points - we don't allow failed servers back into replica sets and rely on writing to multiple nodes instead of the disk as source of truth.

This may not suit everyone and absolutely does not suit financial transactions. You can bend Mongo to do it using additional collections and money movement logs... but, why bother when it's simpler to use MySQL/Postgres?


I'm not a fan of MongoDB, but you don't know what you're talking about.

MongoDB is durable. While it doesn't quite support SQL transactions, it is durable. The data is journaled before being confirmed, and once confirmed will be written to disk. It can be consistent, but this sort of breaks the whole idea of scaling and distributing the load in Mongo, or causes massive performance problems, so that is something to consider when using it. Understanding and adopting eventual consistency is tough, but it's an issue with every distributed database, not just MongoDB.

And you don't "keep your entire data set in memory". You should make sure your indexes fit in memory. Your data can be as large as you like. Most people with terabytes of data in MongoDB don't keep terabytes of RAM in their servers.

Foursquare did NOT keep every single check-in in RAM. They kept an index of them in RAM, sure. But the problem was they had a sharded MongoDB deployment and one of the shards became unbalanced and exceeded the available RAM. If that happens (harder to do these days, but not impossible), it can be very difficult to recover.

MongoDB tries to bridge that gap between NoSQL and SQL. I think the MongoDB folks originally ignored decades of database research when developing MongoDB, but they've been forced to adopt it as the years have passed. Is it an Oracle killer? No. But it can be a useful and productive tool if you understand and apply it appropriately.


Please read the first answer to http://stackoverflow.com/questions/18488209/does-mongodb-jou... before assuming that the MongoDB journal is durable.

It comes down to this: there is no way to tell MongoDB to commit a write and not return until it is durable. You can tell it to fsync() it within at most 33 ms or so from when it returns, but not immediately. There is no control in it to say "fsync() this value because it is important, then return".

That is not as durable as other database engines, which do this based on number of commits that have not been fsynced to the journal (this value can be set to 1 causing every commit to be fsynced).

Consistency is a large topic, and I am sorry I mentioned it vs something more specific, such as:

MongoDB doesn't have transactions, so you can't make multiple updates truly atomic. That's annoying at least, and unusable at most. Two phase commits let you get about 50% there, but with no rollback your application code has to implement rollbacks. In lots of scenarios your application is not smart enough to rollback a "transaction", so your data will be screwed.

Eventual consistency is starting to come under fire precisely because it avoids the hard problem of database design and is offloading it on the application developer, who is typically not as versed in how to create a system where temporary inconsistencies are OK. At best, it means that the developer is now wasting productivity on addressing consistency issues.

You may be right about Foursquare keeping only an index of all the check-ins, not the entire dataset. The main point remains: why keep essentially archived data in RAM? MongoDB does not let you keep a partial index.

MongoDB tries to give up basic datastore guarantees to gain speed and flexibility. Once again, I am not arguing that it's a useless tool. It is useful. As a cache. Or in cases where data loss is acceptable. It is much less useful in cases where dataloss is catastrophic, such as financial transactions. It will also be much more expensive at scale, whereas other datastores optimize for the conditions where RAM is expensive and disk is cheap. You can still run into this case when you are talking about a terabytes to petabytes of data.


> This is the whole point of MongoDB: you assume the responsibility of managing the schema

Who/what is the 'you' there? Don't 'you' have the responsibility of managing the schema either way? It's a question of whether you want to manage the schema through an rdbms, or... just in your application logic, I guess?


Manage as in write all the code that ensures adherence to the schema. If you say that the field called "score" is an int in a schema-ful DB, then insert an array, the DB will throw an error. If you do that in a schema-less DB, it will not unless you add a check yourself. If you are not using some type of unified DB access layer, you must perform this check every time you write a value. You must also perform the check every time you read a value, since someone else might have written something different to the DB while you weren't looking. This you take on doing much more work, yet gain flexibility and possibly some space savings (as in, you are not storing empty cells if you don't want to).


> Manage as in write all the code that ensures adherence to the schema... ...This you take on doing much more work, yet gain flexibility

The flexibility you gain seems to mostly be... the flexibility to choose not to ensure adherence to the schema.

Which may be a valid choice in some cases, but I suspect a lot of people don't quite realize they are making it let alone analyze whether it is the right choice for them, and come to regret it.


Which is pretty much my point. Most people don't realize the hidden cost here. It's the same reason you wouldn't use a dict for everything in Python.


(solely toward your final point) Many RDBMS' will optimize that for you. PostgreSQL has a bitmap structure on each row that indicates which columns are null and which have data in them. In addition, it automatically compresses certain data types. For example, strings that overflow to the point where they need to be stored in a secondary table (called "toast" in pg) will be automatically compressed.


That's pretty cool. I didn't know Postgres actually did this, though I figured it was at least possible.


You are unlikely to gain space savings, as it stores the keys as well as the values to all records.


There is a mistake in the article, due to the OP not knowing an arguably basic notion about MySQL.

> when defining a field as int(11) you can just happily insert textual data and MySQL will try to convert it.

this is dependent on the SQL Mode, which is quite flexible. for example, the STRICT_ALL_TABLES will prevent strings to be inserted in INT fields:

mysql> create table example ( `number` int(11) not null );

mysql> insert into example (number) values ('wat'); Query OK, 1 row affected, 1 warning (0,00 sec)

mysql> SET sql_mode = 'STRICT_ALL_TABLES';

mysql> insert into example (number) values ('wat'); ERROR 1366 (HY000): Incorrect integer value: 'wat' for column 'number' at row 1

There are certainly advantages in choosing PostgreSQL over MySQL... this is just not one of them :-)


Personally, I would not use a database which is 'lax' by default instead of 'strict'. What other choices have they made which I need to learn OR it will bite me big in Production?


It's not so easy.

MySQL has a large legacy of being used as a very-immediate-although-somewhat-toy database at its roots.

For example, in absolute terms, I would find much more troubling the usage of non-transactional tables, justified by meaningless microbenchmarks, which has been somewhat common for some time.

Nowadays MySQL is definitely reliable, and it has a much more expert surrounding culture than the past, so if a person/company is willing to put time and knowledge, it's a reasonable choice.

I don't find automatic conversion so damaging that people should stay away just because it's default.

When you reach some level, you definitely need to have a relatively intimate knowledge of your tools, and at such point, one is far from the "defaults".


Example: Inserting strings into a varchar() that are longer than the limit will silently truncate them.

Example: The TIMESTAMP type defaults to the current time instead of null.

There are more, but those are the two that have bitten me in the past (and no I do not use MySQL at all if I can help it)


See my other comment in this thread RE: string truncation (strict sql mode will default on in MySQL 5.6/5.7).

It is possible to change timestamps to default to Null. For backwards compatibility, the previous behavior defaults:

http://dev.mysql.com/doc/refman/5.7/en/server-system-variabl...


You can subtract two DATETIME columns in MySQL and it will not generate any warnings. It will give you an answer that probably is within the ballpark of the actual time difference but isn't the actual time difference.


If you feed MYSQL an invalid date for a date/timestamp field it will quietly insert 0000-00-00 and produce a warning, not an error.


In a previous life, the lax by default approach of MySQL bit us hard on a regular basis; a number of discrepancies vs. reasonable expectation were had. At the time I left, they were working on migrating toward Postgres, and reports suggest they are extremely happy with pg.


utf8 is not really utf8 and doesn't support 4 byte characters that exist outside the BMP (you'll need the separate utf8mb4 type for that, assuming you're not running on a legacy version that doesn't support that either).


This is indeed the case. To clarify a little:

- STRICT_TRANS_TABLES is on by default for "new installations" starting from MySQL 5.6 (2013). What this means is that the bundled config files all turn it on.

- Starting from MySQL 5.7, it is a compiled default, along with several other more "strict" options. Effectively making it default to ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_AUTO_CREATE_USER.

I have some sample configuration files to make 5.6 behave like 5.7's strictness. For example: https://github.com/morgo/mysql-compatibility-config/blob/mas...

It does make upgrades harder, so some applications may also need to use a whitelist/blacklist approach to transition: http://www.tocker.ca/2014/09/01/suggestions-for-transitionin...


When buying a car, would you prefer for it to be safe by default or only after you've enabled half a dozen features such as "air bags" and "seatbelts"?


Perhaps the author was aware, but found it unacceptable that clients could choose their own sanity ?


In the future we might also move our Rails applications over to Sequel, but considering Rails is so tightly coupled to ActiveRecord we’re not entirely sure yet if this is worth the time and effort.

Actually, with modern versions of Rails, using Sequel in place of ActiveRecord isn't bad at all. Nothing in Rails is really tied to ActiveRecord anymore. There are dependencies on ActiveModel, but you can easily make Sequel::Model objects conform to this interface. Sequel-rails helps with most of that: https://github.com/TalentBox/sequel-rails

It's really just the migration that's difficult, as that's tough to do piecemeal and requires good test coverage.

Here's a recent side project of mine that I switched from ActiveRecord to Sequel pretty quickly once I remembered how extremely limiting the querying capabilities of ActiveRecord can be: https://github.com/bgentry/portfolio-api

And if you really want to keep the option for schemaless data storage, Postgres can now do that with better performance than MongoDB, while keeping full indexing capabilities: http://blogs.enterprisedb.com/2014/09/24/postgres-outperform...


I've switched about 50% of our internal applications and all of my side projects completely from ActiveRecord to Sequel and it's an absolute joy to to work with once it's fully migrated.

The main reason we started using it the first place was composite primary keys (not supported in AR, at all), and we've never looked back!


While certainly possible we simply haven't really evaluated it yet in depth. In between releasing a bunch of upcoming features and upgrading Rails from 3.2 to 4.2 I'd rather wait with _also_ moving from ActiveRecord to Sequel for the time being.


yes, by all means upgrade yourself to the latest release of Rails before attempting to go "off the rails" :)


The article is sort of weird, why Document, no sql, no schema DB to be a schema DB? One fundamental fact that you liked MongoDB in early days (first 5 years) because it was damn easy to handle anything you throw at, you learned to build the business around it. In 5 years, you learned enough, business model matured, not much changes in the data model. Now development progress is matured, you have got team to work on, your worries relies on consistency. You might want to back to SQL. Now you know what schema/table you need, string length, data type, constraint, relationship etc, because you learned that in 5 years time. You don't need to deal with new developers screwing the Document DB (like storing Object ID as string in few places)

Developers, who reads the article beware of reality, you can look back 5 years now, not looking forward 5 years ahead. Don't waste time in dealing with database columns and schemas, instead build the business faster, I found Mongo DB or any Document DB is good fit for agility.

1000's of business move to Document DB because it is schema-less.

While working on Document DB, you should be master in writing stand-alone scripts in Python/Perl/Ruby to run every-time you break the structure, or fix inconsistency.

edit: fixed typo errors


Is there anybody here who has run MongoDB at moderate scale with good results?

As in a few terabytes of data, >10k ops/second territory.

I've been really disappointed with its reliability and performance in situations where I've been around that.


I came into a company that was attempting to use MongoDB for their soft-real time time-series data (I would consider it between low and medium scale) and it was atrocious, in too many ways.

It doesn't scale without tremendous effort and implicit schemas are a very dangerous thing to introduce into your application, they're insidious, and require enormous diligence in the application to codify the schemas (I would only feel comfortable using a loose document store with something like Haskell in which I can model the schema with strong types).

I've encountered MongoDB in three different companies / products and expended much effort to immediately move away from it in every case. In each case, the solution (which has been different each time) was far more appropriate to what was needed.

There's a sad inclination by developers to pick "one ring to rule them all" tools and MongoDB I believe even sells itself that way. It is not.


What were the more appropriate solutions?


Postgres for structured data, Riak for high-volume and flatter data, and TempoDB for high-volume time-series data.


I did, for nearly three years. MongoDB was consistently the number one reason for site downtime.


I've been using mongo on 39M+ records (tracking financial tick data) across 73 assets (collections) and my queries take anywhere between 2-3mins depending on complexity.

You can always run db.currentOp() in the mongo shell to see what process is taking forever as well.

Let me clarify why though, there methods of optimizing a query by adding another field, but since I have to traverse my records with the sort() cursor my queries take that long.


Jesus Christ!

I remember querying databases with 100 tables and millions of records in foxpro a century ago and it took less than a second.

What has happened to the world while I was in cryogenic state? Take me back to the nitrogen pool!


I don't get this. If any relational database choked with that small number of rows, it would be thrown away immediately. That's a couple of orders of magnitude too slow.


High scale public references and links to their presentations are maintained here: http://www.mongodb.com/mongodb-scale

I think scaling any system takes careful planning.

IMHO, this is an area where MongoDB has improved a lot in the past two years, especially with 3.0, but there is still a lot of work to do.


Yes. When implemented correctly Mongo can scale very well.


i think foursquare is, unless they have migrated off and did not publicized it


I had a business based on another company's API years ago, and they started changing a lot of their systems to work with mongo. The data lost a few useful points, like IDs and the speed and reliability didn't seem to be improved. My thought at the time it was, how about you guys just use postgresql? My comment on the mailing list about that seemed to be taken as an insult or naïveté. However, sure enough two or three years later they said they had all sorts of problems with Mongo and were switching to… Postgres.


The author's assertion that "Another problem with MySQL is that any table modification (e.g. adding a column) will result in the table being locked for both reading and writing. This means that any operation using such a table will have to wait until the modification has completed." is no longer correct as of Mysql 5.6:

http://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-o...

If you specify ALGORITHM=INPLACE,LOCK=NONE you can alter table without blocking reads and writes. We have used this method successfully in Amazon RDS when updating schemas.


Small clarification:

> If you specify ALGORITHM=INPLACE,LOCK=NONE you can alter table without blocking reads and writes. We have used this method successfully in Amazon RDS when updating schemas.

The use-case of ALGORITHM=INPLACE and LOCK=NONE is to produce an error if the modification you are attempting is not supported in this mode. i.e. even if you don't specify LOCK=NONE, that doesn't mean it will lock.

This is useful in preventing guessing games (i.e. you think its LOCK=NONE, but for some reason it's not compatible...)


It's not exactly a common operation either, so basing the choice of rdbms on it seems a bit arbitrary.


It happens a couple of times every month in our production databases. If it required locking tables we would have to schedule downtimes during the early mornings every time this happens would would have been a pain.


As soon as you have more then 1 app talking to your database its time to either create a common library that talks to your database that can be shared or wrap it in a service. This has nothing to do with NoSQL vs SQL. It has to do with architecture composition.

Plus your problem with checking if a field exists is because Ruby doesn't support property attributes. This is easily solved in C# using attributes. Which you can fake in Ruby. http://stackoverflow.com/questions/1085070/how-do-i-fake-c-s...

Data consistency is what your models are for. No data should be inserted before being assigned to a model.

Again it sounds like you needed a shared library or just a service wrapped around your database.


> ... we value the following: > Consistency

With all due respect, you should have had this list before selecting MongoDB. Consistency, as in "eventual consistency" and also in the way that you describe it is better supported outside the NoSQL group.

Also, I don't fully agree with the "schemaless" discussion. The moment you think about your data, you build a schema in your head, which, then, is translated into code etc. It's about how far you take this rigid model. Maybe you should see it as "flexible schema".


So let me ask a question. What should I use when I do need a schemaless database? Is NoSQL never the answer? I've got a project that needs to allow clients to create registration forms for different events that my company hosts. A lot of the registration data will have a defined shema ex: name, email, address. I feel like that stuff should go in a RDMS, but all the event specific stuff needs to be schemaless. I know I can do custom key/value tables in a RDMS, but that doesn't feel right either. Is MongoDB useless as a database, or are people being bitten for thinking it's a silver bullet and throwing it at every problem?


Riak is amazing and actually scales. But I would only use Riak for high-volume data storage (similar to S3).

FoundationDB looks great, I haven't used it yet but they appear to have their heads on right.

PostgreSQL, the newer versions, have indexable BJSON data types so you can get the same exact behavior from Postgres as you do from Mongo but with a true RDBMS along with it, a dependable storage engine, etc...

Postgres is harder to scale horizontally though - if you have really high-volume data writes, you should be using something else for that.

I typically use PostgreSQL for all of my highly "structured" data and Riak for high-volume and "flatter" data (Postgres also often serves as an index into those objects).


Awesome, Thanks for the reply. My use cause is barely out of the "toy app" range. We only do a handful of events each year and they only draw around 100 attendants. We're talking a very small amount of data. When you do the PostgreSQL and Riak combo is it ever on the same/related dataset. What are you using at the application layer? I'm building this in rails and I feel like it would be better to store the structured fields in MySQL and the variant data in something NoSQL. But I haven't read much into using active record with two different persistence layers. That's interesting about PostgreSQL with indexable BJSON. I've only ever used MySQL, I really need to check out Postgres.


It's easier to go from something more highly structured to something looser. Start with a relational db and let it grow then pay attention to what data gives you the most scale pain and try to move that out to Riak / Cassandra / etc...

Don't prematurely scale, just pay attention to your metrics, scale vertically first, then tackle the very specific pain points.


The addition of the json type to databases like Postgres has significantly limited the usefulness of JSON-datastores like MongoDB for me. It used to be that if you had to store truly schema-less information, and wanted to be able to query it, you needed Mongo. Now, Postgres can do that inside of an otherwise structured table.

That said, there are applications for other less-structured datastores like Redis. When you need to store data with an expiration quickly, and use common datastructures like sets, Redis can be fantastic.


> Is MongoDB useless as a database, or are people being bitten for thinking it's a silver bullet and throwing it at every problem?

A bit of both really.

NoSQL databases allow for rapid prototyping, as do weakly and dynamically typed languages. It's amazing if you want to just get a product out of the door. NoSQL is the short term answer. And MongoDB is the answer if writing your data to /dev/null feels like a good idea to you..

However, strong typing and regular databases offer consistency. You have no way of going wrong because it would have refused to compile twenty times before you even think about pushing your (wrong) code in production. Sure, you can force yourself to get such consistency in MongoDB. But first, it's pretty taxing mentally, and secondly if you're doing that, why not go the way of a relational DB which offer you tools to enforce that and is faster ?

KV columns in databases feel wrong to me too, I feel dirty using PGSQL's json storage because I feel like I'm throwing normal forms away. But at the end of the day, what matters is that your product works.


Yea, I'm going to check out PostgreSQL's json storage.

>But at the end of the day, what matters is that your product works.

With my requirements, I could just write to a flat file and be fine... I seem to like complicating things just enough that I no longer understand how what I'm building works. LOL.


SQLite is great for "I really just want a nicer flat file" use cases.


Mongo is rarely the answer for high performance, high transaction systems. I use it quite happily to prototype applications due to it's very low boilerplate overhead.

If you need schema-less data storage in a "real" database, use PostgreSQL's JSON type.

http://clarkdave.net/2013/06/what-can-you-do-with-postgresql...


But what about low performance, low transaction? Realistically the stuff I'd use it for wouldn't see much traffic. The big factor for me is schemaless. I don't want to create a new table each time there is an event with similar, but not exactly matching data between events. I mean, is MongoDB so bad that in any production setting the reliability is not there? Thanks for the link. I didn't know about the JSON type. Maybe I'll finally give PostgreSQL a try.


Most people who run into performance issues on Mongo are putting a lot of data into it. I've personally never had problems with it for side projects, but my tolerance for failure and data loss in those scenarios are quite a bit different than what most people expect out of production systems. It's definitely possible to use Mongo in production successfully, you just have to be aware of the tradeoffs and plan accordingly.


> is MongoDB so bad that in any production setting the reliability is not there?

Replication is for high-availability not for consistency. As long as you can live with that, the reliability is ok.


MongoDB is a great database and has a really good set of client tools. It has a learning curve and it has not been without problems but I have loved it at a past startup and would absolutely use it again. If you try it out and find you like it, I'd really suggest getting to one of their MongoDB seminar days. They tend to have good speakers and for sure you'll learn something new about databases and MongoDB.


Yea, from the good stuff I've read about MongoDB I really want to like it. There's just been so much more negative that I've read.


Every choice you make in software has pros/cons. This is true of all database technologies as well. You can probably solve your problem using any of the choices before you. If you decide to use something new (e.g. MongoDB, Riak, whatever) first make sure it lines up with your requirements then see how it goes. Always keep backups. In the worst case, you'll restore and migrate to something else. But that's kind of what we do as an industry anyways.


>"I feel like that stuff should go in a RDMS, but all the event specific stuff needs to be schemaless."

Okay, you lost me there. Why does it need to be schema-less?


I imagine that because if you handle different types of events there is an infinite number of possible registration options. From preferred food type to breed of your dog... and many more. It could be stored as a huge (event, user, key, value) table, but in practice that's just how you choose to store a schemaless hash of event attributes.


while i've not spent much time with the "nosql" products, its mostly because i've decided to stay SQL. The bottom line is that most humans think of data the same way SQL thinks about data. When you talk to people outside of engineering (i.e. mgmt or customers), they expect the product to be able to do things that SQL does naturally (normalized data for easy changes to things like 'user name', join and sort based on a variety of cross cutting properties, etc). You can always make both do the same thing eventually, but one will fight you more than the other....


I did a similar migration last year. Loving postgres.


Something that I need to call out here. In talking about how to handle the lack of fixed schemas within Mongo, the OP refers to using if/else blocks to get around field name changes. This shows a severe lack of understanding of how to work with Mongo at such a fundamental level. If using Mongoid (which he is) you simply need to run `Post.all.rename(title: :post_title)` to do the schema change. This may seem like a nitpick to some people, but honestly if that concept was lost on the developer then its clear that there is little to no credibility to be had in the reporting of all the other Mongo issues.

I'm glad that the OP is growing as a developer and starting to understand the merits of SQL and why having defined schemas is important. SQL is awesome and its not going to be replaced by NoSQL (at least not fully). However please make sure that you are building the foundation of your understanding by fully grasping the technologies that you are already using.


This post outlines some of the most expensive parts of deploying, operating, and maintaining an application that operates on "NoSQL" databases like Mongo, and in my experience, DynamoDB:

1) Implicit Schemas. We avoid this completely by doing production migrations and re-indexes on every new field we add. Its expensive, and we have to write/test/run scripts in production. At times I wish I could just write a Rails migration on an RDS instance and call it a day. N-1 compatibility isn't hard to accomplish with good code reviews.

2) Search. Want to search and join like in the old days? Good luck. Using a NoSQL DB as the primary authoritative store of record is great, but you'll need a secondary indexes for any searches you want to do. If you need to look up an object on a new field (or even one that already exists in all of your data) if you haven't built an index for it you will have to.

3) Serving Clients. Because of the schemaless-blobby nature of writing clients, things get real messy the minute you have multiple services or applications reading or writing to a DB. To get around this, you have to put a service in front of it and serve the data from some RPC technology, which is an extra step and requires more development and maintenance.

4) Administration. There aren't a long history yet of robust toolsets and "science" behind different schemaless NoSQL databases. Meaning which one you choose has a huge impact on your ability to fine tune it, debug consistency or other expectation issues, and do things like proper failover, backups, restores, etc. Knowledge between similar NoSQL DBs doesn't transfer as well so your mileage will vary more so than on SQL databases (MySQL vs PostgresSQL, for instance).

NoSQL has its place and purpose, but it is rarely as the "one database that rules them all" that many businesses end up with. I'd be interested in counter stories.


Yes if you have one MongoDB database that uses `title` and another one that uses `post_title` then you have to adjust your code for that.

Guess what. Same thing applies to SQL.


In SQL you can't have a table whose title field is called either `title` or `post_title` depending on which record you're looking at.


You easily could have this in SQL if you designed a poor schema or failed to migrate data from the previous field name.


I haven't used MongoDB in production (the comments regarding reliability have been around for awhile), but playing around with it, I do like the json format and query structure.

The issue I have with SQL (MS sql in the case of work) is the amount of cleverness involved in some queries I have seen. Among the old timers, it seems almost a badge of honer to develop the longest, most clever SQL query that does everything in one step. Inevitably, there are problems, and people have trouble figuring out why because they can't debug parts of the statement in isolation. In this case, it's as if they have written an entire program in one line and can't test parts in isolation. No doubt this is abuse of the language... and admittedly my SQL skills are not world class, but I keep scratching my head and wondering why they do this to themselves. If a query is so complex you can't tell what it is doing, and it misbehaves, perhaps you would have been better off with some smaller queries you join in the program... nothing against joins... one or two or three of them... but really....

I keep asking myself if I'm missing something, but I sort of doubt it.


That is then not a problem of the SQL-Language, it is more a problem of the database design.

When you need really complex queries, you should think about a new db structure.


The largest database is vendor provided so we are stuck (for now).

Thanks for the insight.


I switched a new product in planning from MongoDB to PostgreSql 9.4. With the new JSONB columns you get the best of both worlds.


Interesting to read both the article and comments. It's cool to hate Mongo nowadays but we use it successfully (for PEPS, open source: https://github.com/MLstate/PEPS).

The version of MongoDb used is not mentioned in the article. The changes between versions is a problem with Mongo but it improved much recently.

One problem clearly identified is the lack of model. There are solutions to this. For instance, with Opa (http://opalang.org) we use strong static typing to generate a database model and guarantee that the whole applications sticks to it. That leaves out model changes, but there are solutions for that.

Also, there is no need to have the whole collections in RAM, but clearly enough RAM for the "working set" helps.

In the end, Mongo is no magic so do SQL databases which have their share of problems too.


I made a little bet with myself when I saw the topic of this post that a good 80% or more of the top comments would be of the "Shouldn't jump on the MongoDB/NoSQL bandwagon, nothing wrong with SQL" variety combined with various arguments to support that position or arguments against NoSQL in general or MongoDB specifically. I won.

Here's the thing; anyone that claims one is superior over the other without adding a specific use case or context is doing something akin to religious preaching. Databases are tools. Pick the tool you need for the job. If you don't understand exactly what a hammer is for and how it does what it does put effort into gaining that understanding. If after that research you think the tool has flaws that are important to you then don't use it.


There is only one NoSQL database satisfies all following requirements: 1. easy to use 2. reliable 3. transaction support 4. easy to scale 5. eays to build indexes

It is BigTable.

Generally, if your data is not very large, you should use a SQL database. NoSQL is mainly used for easy scaling, not for its schema-less feature.


Interesting article, did you consider any of the newer NoSQL solutions that offer a data model more similar to MongoDB but more, often configurable, consistency guarantees (like ArangoDB or RethinkDB)? I could imagine that the effort of a migration would have been considerably smaller...


As the post explained, one of the big problems of migration off of MongoDB is finding out the "effective" schema of the data, and then migrating it (according to that schema). By "effective" I mean the resulting structure of the data stored in the database.

I think it would very helpful to use here ToroDB (https://github.com/torodb/torodb). While being MongoDB-compatible, it stores data structured into PostgreSQL tables, automatically identifying the schema of the data. Then, just by looking at the created tables, you very easily have the schema (and the data migrated to that schema). It would make migration easier.

Disclaimer: I am a ToroDB developer


The problem of schemaless database mentioned in the article is then replaced with painful migrations.

The trend to go back to SQL databases now is fueled by the "same" poor reasons why people were going for NoSQL databases. You can't have everything...


I have found that the more schema you have, i.e. the more structure you have for your data, the easier migrations become, because there are no surprises.


Diving in a cesspit makes one appreciate clean air.


It's very interesting to see the amount of negative articles about MongoDB, and at the same time see that its very popular on job boards. I'd honestly like to read an article about a success case


The successes tend to be quiet. And folks who are happy don't take to the internet to defend their choices.

I've been on two straight projects where Mongo has been fine, no better or worse than an SQL database, but certainly nowhere as bad as one would assume reading all the negativity around here. I have experience with it being stable in two different contexts in production (social games, a very common use case, being one). A search for "social games mongo" should show you some success stories.

The thing that makes it a good choice for me when starting a new project is that you don't really think about it, especially if you're following lean startup methodology where the goal is to get a product into the hands of the users as quickly as possible & you anticipate changes to your models. Mongo is pretty fantastic for this. And when things stabilize, moving data really isn't that bad or expensive if you modeled based on this assumption (which you should). And now that PostgreSQL has a pretty good JSON datatype, it presents a pretty painless path if and when you outgrow Mongo.


A related read, titled Why You Should Never Use MongoDB: https://news.ycombinator.com/item?id=6712703


Beware of "The Problem Of Schemaless" can occur with a relational DB when you start defining columns as nullable. You end up with the same problem.


I wouldn't call it the same problem. Something nullable is similar to an Optional type (i.e. in scala Option[String] would map to a nullable varchar). That is not the same as being schemaless, you at least still have types that are being enforced, even if they might be empty.

I would agree that having too many nullable fields could indicate a problem with the schema, i.e. you should break things up into more tables.


Agree. I'll rephrase to "similar" problem.


I also keep reading about how bad is MongoDB for lots of projects. I don't think MongoDB is worthless, but from my experience I also have concerns about it.

If so many people migrate to other technologies, why is MongoDB still so popular? Have a look at http://db-engines.com/en/ranking_trend, MongoDB has just passed PostgreSQL


They don't write if they looked at other databases. I think there are a lot of alternatives that might have worked better for them than MongoDb and even Mongo 3.0 might have worked better. Regarding the schemalessness for simple new properties they should probably have fixed that in the ORM layer, it can add default values if needed. That something is null can actually happen in sql also.


> For example, when defining a field as int(11) you can just happily insert textual data...

My eyes practically fell out of my head when reading this.


The section, "The Problem with Schemaless," blames the technology instead of whoever put the data in there in the first place.

If the code has to handle both page.title and page_title, this is a feature of using a schemaless technology.

Also, lots of the issues the author had with MongoDB are also to be found in MySQL, e.g. taking hours to recover from a corrupt database/datastore.


It's just that if you have a sizable team (or worse over a long time with low overlap) working on a product, you intrinsically get an accumulation of duplicates, deprecated, or both "keys" in your schema-less schema.

With a DB that enforces a schema, the overhead of modifying the schema tends to moderate that nature.

I once worked in a lab (Ph.D. students are atrocious programmers BTW) where I introduced a schema-less store (cheesy K/V store) to handle some mundane metadata caching on some medical imaging. It was intended to store 4-5 attributes per PK, I never touched it after setting it up for what I needed but showed it to colleagues.

Fast forward 10 years and there were something like 3000 attributes defined. Several hundreds of which were serialized blobs. Huge amount of overlap between the different attributes.

Almost all that because people didn't know what was already in there so they just did their own thing.


I'm also moving back to Postgres. Postgres was the first DB I used when I learned to code PHP. MySQL came later. Just recently, around a year or two ago, I got introduced to Mongo and all its surrounding hype. Mongo is good for small small apps that need no references and have a low requirement for speed and data integrity.


I've been thinking of the same changes for a few months. We are running on MongoDB with a few M entries, and around 10GB of data. We use Scala and ReactiveMongo on the application side. So, my main fear is we're going to have a lot of overhead when trying to migrate to something like "ReactivePostgres".


It's very tiring to read things like this. And that is not meant as support for Mongo, quite the contrary in fact.


It's interesting that if you search for MongoDB vs Postgres, you won't find any articles suggesting you should go from P to M.

Supposedly WiredTiger for M 3.0 will be more everything, including faster and better.

We'll see.


The concept of moving from one database technology to another, especially something like NoSQL to Postgres, sounds like a huge task.

Aside from redesigning the schema/model and changing all the code, what about new backup/restore procedures? Scaling and performance best practices? Did you need to hire a postgres expert?


We didn't hire any experts, instead we educated ourselves on the matter. For example, one of the first steps we took was to run some rough benchmarks on Pg to see how it behaved compared to MySQL (https://github.com/olery/rds-shootout). Followed by this was mainly discussing PostgreSQL vs MySQL with those who used either one (or both) in production for a somewhat serious workload.

Backup/restoring is handled by Amazon RDS in our case, in the past we had a custom backup system for Mongo that backed data up to Amazon S3.


Will you publish the results of the benchmark?


You probably had a replica set and maybe an off-site replica. What are you using now, a single PostgreSQL instance, a master-slave cluster or any other distributed setup? If positive, which one of the many psql distributed technologies are you using? Thanks.


Before (Mongo): 1 primary, 2 secondaries, 2 arbiters Now (Postgres): 1 primary

Most of our applications require write access in some shape or form, so at least the default replication of Amazon RDS doesn't cut it. Besides that we don't really need it so far, don't see the need for it in the coming months either.


One problem I have with postgres right now is that whenever I want to upgrade the cluster (say from 9.1 to 9.3) there's downtime while the cluster is upgraded.

Not even vanilla replication helps, right now. I think I'd have to use something like slony to replicate between different versions of postgresql, but I never tried it.

How does upgrading postgresql versions work in RDS?


Simple but sensible.


Wonder if they evaluated rethinkdb?


I briefly looked at it, but seeing how young it was (and still is) I'd prefer not to bet any money on it for now, I'd rather use something tried and proven.


I feel like I'm missing something. So many comments in the article and here along the lines of, "just issues a warning and not an error."

Am I the only one here who's thinking that this is correct behavior on the part of the DBMS? Three result codes from an operation: 1) everything is okay, 2) I'm sorry Dave, I can't do that (error) and 3) Okay, if you insist, but I'm going to change your data to make it work.

Am I the only one who thinks: the programmer should be aware of and respond appropriately to ALL THREE, not just 1 and 2.? That anything else is just laziness?

Or is that just me? Am I missing some subtle consideration here that results in my thought process being naive? If I'm being naive I do want to understand what I'm missing, because getting schemas right and having my code react when improper data types are being used is sometimes a pain, ORM or not. But I've always thought it was the right thing to do.


Personally I feel like 3 should not be an option. Either things are okay or not okay, I don't like the "sort of okay maybe" option because that implies lots of fuzzy definitions and boundaries that the developer just has to learn.

Schema says int? Give it an int or fail. Not "it's sort of okay if you give it a thing which could be coerced into an int according to the database's ideas of coercability".


Thank god for return of sanity.


Who doesn't want NoSQL with ACID properties? Can anyone tell me what is superior on MongoDB compared to PostgreSQL?

If anybody is going to talk about scalability, there are many good alternatives to scale PostgreSQL too.


Not that im excusing mysql, but what I do when adding columns to a +10mil row tables is to create a new table with the new column and then insert into newtable from oldtable.

That changes those few hours into few seconds.


For some additional thoughts about schemaless vs. schema-enforcing datastores see

  https://gist.github.com/neunhoef/9b6749089775e472d44c


What system is he using to graph his queries response times?


New Relic.


Care to share your experience in terms of performance of Postgresql's JSONB vs Mongo? for both selects/inserts? and how about nested search queries?


In general, NoSQL databases are good for storing unstructured data. They start to fall apart when you want to query that unstructured data. You will hate yourself if you want to query something that is nested inside something, especially if one of the nesting levels is of the array JSON type. This is because indexes are required (just like in a normal database). Many times, Map-Reduces are required since the queries don't run in reasonable runtime. At some point, because you are storing the entire JSON document, relational databases start winning the space game (they don't store the name of the column with every document, like NoSQL does).


Haven't looked into JSONB that much yet so I'm afraid I can't give any numbers.


Am I alone in thinking that if a programmer writes a code that allows a string to be sent to an integer field in the database the issue is not with the DBMS?


Yeah. And since they use ActiveRecord, how would they even go about doing that in the first place? Over all the years I've used MySql, that has never been an issue for me. Sure it has its quirks, but show me a mature piece of software that doesn't.



There are ever more use cases for which relational data models are a poor fit.

For everything else, there are relational DBMS.


Isn't this a little premature to announce a victory? After all, they've been using MongoDB for 5 years and it delivered for most of the time. Who knows if this new shiny SQL thing will be sufficient for next 5 years? I'd say it's quite likely after few rounds of excited development the database will be slow and crappy again ;)


Do anyone recognize the application used for the metrics screenshots?


New Relic


Correct, this is taken from New Relic's APM service.


I think strict mode makes the warning an error


My aversion to NoSQLs is derived from the readiness with which uninformed people dive into them, integrate them with their products, and create a web of complexity around something that should ideally be boring and reliable: the database.

There are so many things that I've heard you "can't do in SQL" that are false, at least pertaining to Postgres. Semi-structured data, full-text search, "web scale" programming, geographical indexes... all of these things, people say "you can't do in a relational database" and that's not true. Postgres is fucking powerful and can do a lot, very well.

Arguably, PostgreSQL isn't always a "relational database". You can use it as a key-value store. It just happens that you often want relational logic in a multi-purpose, long-lived data store. The relational database seems to be an attractor; the requirements that accrue to a typical in-house "we can do it better"/NIH non-relational database often converge on it.

NoSQLs have their place at very large scale (100+ TB) and there are plenty of specialized reasons to use alternative databases-- I doubt that Postgres's full-text search is competitive with Elasticsearch-- but I feel like most of the anti-SQL sentiment is against the language. And sure, it's an ugly and outmoded language, but the database is one place where I'd rather have an ugly language and rock-solid tech than the other way around.

Finally, fuck ORMs.


I think 100TB deployments is a very high bar you've set for postgres. I think the one place NoSQL databases continue to serve (and where My/Postgresql don't) are highly available, sharded deployments - and people are running into the issue at the 10s of TBs (I can't imagine you'd want 5TB+ of data served from a single node - AWS largest SSD instance would be 6.4TB in RAID0, and serving your database off a single large RAID0 already sounds like a bad idea).

Not to say people aren't running these types of Postgres clusters, but its something not easily done.

I think the anti-SQL sentiment is a very MongoDB-esque thing, in that MongoDB tried to replace everything. Most of the other popular NoSQL solutions have very clear and cut use cases, and are pretty upfront about selecting the right database.


Agree with all of your points. The thing that perhaps annoys me the most is that people seem to assume that nosql is magically fast and that sql is slow... Sure, a key-value pair lookup runs like shit off a shovel, but as soon as you want to run anything more complex than that, it's likely faster with a relational database and a few indexes...

Usually the same people that think running an application on a cloud platform magically makes it fast, when in reality a VPS would be cheaper and faster.


NoSQL is not a well-defined term - there are huge differences between various solutions all commonly called "NoSQL", so the comparison to NoSQL is bogus. There are different NoSQL things designed with different use-cases in mind. On one side you'll have things like MongoDB, which is easy to set up, but doesn't really scale once your dataset grows out of memory, on the other side things like Cassandra which are unmatched by any RDBMS in terms of performance, scalability and availability on real-time, transactional workloads. Of course, your RDBMS of choice might be better at complex JOINs than Cassandra (which doesn't have JOINS at all, btw), but will it stand the chance at performace competition with Apache Spark or Hadoop? I don't think so.


Note that just because something is possible with PostgreSQL doesn't mean it's as easy to do with PostgreSQL than with an alternative database. The question could be rephrased as "why use PostgreSQL when you can do it in x?".


> Finally, fuck ORMs.

Amen to that. Many times I wondered, geez I could've written this with a simple SQL query, here I am reading the docs.


No comment.. (facepalm) and bad PR.


MongoDB was never about its benchmarking, scaling or sharding abilities. It was about two things:

#1: No schema. If you're prototyping, schemas just slow you down. Maybe that changes as your application matures.

#2: Great drivers. The amount of pain to get a basic CRUD app running with a SQL backend is just too high: push the data from the browser to the server in JSON or XML, validate, convert into database schema, sanitize using prepared statements or stored procedures, and send it off. Compare that with MongoDB: it's just JSON™.

The one thing that killed MongoDB was lack of reliability. Even when you're a startup, you may not lose data. I think MongoDB could have easily dropped 50% of performance & scalability for data security and it would be well off now. It was never intended to be the safe, sane choice for big enterprises.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: