
I once agreed with this, but now I don't. I just want to write SQL (dammit!).

I can never, ever remember the intricacies of the Sequel API or any one of these query builder APIs. I am always looking up something that is rather trivial because I am thinking in SQL, the language, and always have to convert back to Ruby or whatever language I am working in.

CTEs and SQL functions in PostgreSQL strike a good balance in the composability department for me, while still being able to write SQL.

Let's say you have a product search screen in your application. There's a text field for filtering on product title (WHERE title LIKE), one for filtering on UPC (WHERE upc LIKE), a couple of range filters for min/max prices (WHERE price <= and/or WHERE price >=), and then on the results screen the user can sort on a few different columns (ORDER BY) as well as paginate and set the number of results per page (LIMIT + OFFSET).

How exactly are you going to just write SQL if the actual query statement needs to change based on the user input? Are you going to hardcode every possible permutation, or are you going to use a query builder? Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

edit: The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments. Your ad-hoc query builder implementations with SQL injection vulnerabilities are not relevant to what I'm asking.

How exactly are you going to "just write SQL" if the actual query statement needs to change based on the user input?

How about something like this:

  s = Select.new
  s.add "WHERE title LIKE #{title}" if title
  s.add "WHERE price <= #{price_range}" if price_range
  s.add "LIMIT #{limit}" if limit
  s.add "OFFSET #{offset}" if offset
  s.add "id, blah, boo FROM products"
Note how I deliberately shuffled the order and didn't bother with escaping.

Also note how anyone who knows SQL could immediately work with this, learning curve: 5 seconds.

Why is there no ORM that works like this?
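For what it's worth, the machinery such an interface needs is not huge. A toy sketch in Python (the class name, the clause-classification heuristic, and the canonical-order list are all my assumptions, not an existing library):

```python
import re

class Select:
    """Toy clause collector: accepts SQL fragments in any order,
    then assembles them in canonical SELECT order."""
    ORDER = ["SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY", "LIMIT", "OFFSET"]

    def __init__(self):
        self.clauses = {kw: [] for kw in self.ORDER}

    def add(self, fragment):
        frag = fragment.strip()
        upper = frag.upper()
        # classify the fragment by its leading keyword
        for kw in self.ORDER:
            if upper.startswith(kw):
                self.clauses[kw].append(frag[len(kw):].strip())
                return self
        # no keyword: treat "cols FROM table" as the SELECT + FROM parts
        parts = re.split(r"\bFROM\b", frag, flags=re.IGNORECASE)
        self.clauses["SELECT"].append(parts[0].strip())
        if len(parts) > 1:
            self.clauses["FROM"].append(parts[1].strip())
        return self

    def sql(self):
        out = []
        for kw in self.ORDER:
            vals = self.clauses[kw]
            if vals:
                joiner = " AND " if kw == "WHERE" else ", "
                out.append(f"{kw} {joiner.join(vals)}")
        return " ".join(out)

s = Select()
s.add("WHERE title LIKE :title")
s.add("LIMIT 10")
s.add("id, blah, boo FROM products")
print(s.sql())
# SELECT id, blah, boo FROM products WHERE title LIKE :title LIMIT 10
```

Escaping is deliberately ignored here, just like in the mockup above; in practice the values would go to the driver as bound parameters.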

Python SQLAlchemy works a bit like this, if you use it that way:

    query = session.query(Products)

    if limit:
        query = query.limit(limit)

    if offset:
        query = query.offset(offset)
Sort is a bit tricky, but as I'm a noob at SQLAlchemy, I had some not-so-pretty boilerplate code for decorating queries with custom sorts. But essentially it boiled down to:

    if sort_type == SORT_REVERSE_CONST:
        sort_obj = sort_obj.desc()
    else:
        sort_obj = sort_obj.asc()

    query = query.order_by(sort_obj)
And if you want to search through dynamic tables, change the initial query instantiation to something along these lines:

    query = session.query(retrieve_db_object(table_name))
And to execute:

    return query.all()

Python SQLAlchemy works a bit like this

All ORMs work "a bit like this". I don't want "a bit like this". I want exactly like this.

Because with my proposed interface I could be productive immediately and permanently. I would never have to refer to any documentation. Not once.

With every other ORM this is a pipe dream. Their "fancy" chainable wrapper-APIs are nothing but a ball on a chain.

Well, no, they're more than a "ball on a chain" - they're a tradeoff - like everything in software.

They give you the ability to manipulate the query in interesting ways at any point before you execute it. They let you join different queries together, built by different parts of the system, in a safe way. They let you work with the native language you're working in instead of having to construct clauses in a foreign language, using strings. They let you post optimise your loading strategies.

You might want "exactly like that", but it's not going to be as flexible as a chainable wrapper system.

Hell, I could probably give you pretty close to that in python with a little work, but it's not something I'd want to use myself.

I get it. I like working in SQL too. I know it really really well, and I cringe when I see developers writing totally sub-optimal code because they don't understand the relational data model.

But there are other ways to do things, and what you're describing doesn't give you much more than raw SQL, so why not just use raw SQL? You've added some syntactic sugar that, in my language (Python), would be a bit of a horror show (strings access local variables implicitly, no thanks). What else do you gain?

This is more verbose, granted:

    q = session.query(Products)
    if title:
        q = q.filter(Products.title.like(title))
    if price_range:
        q = q.filter(Products.price <= price_range)
    if offset:
        q = q.offset(offset)
    if limit:
        q = q.limit(limit)
    products = q.all()
But then you get more stuff for free, like drilling down into the other tables:

    p = products[0]
But that's rubbish, because you'll be loading in a really inefficient way. That's ok though, tell the system how you're going to want to load the additional data.

    q = q.options(joinedload(Products.category))  # relationship name is illustrative
See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

Use whatever makes sense for your use-case. Don't limit yourself because you'd have to learn something new. Honestly, before using SQLAlchemy I mostly felt the same as you do. Many ORMs get in the way, but that's not really a problem intrinsic to ORMs.

>See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

If you are fine with the new debugging this entails. Take the joined load example you gave. Almost every time I encounter SQLAlchemy code, the author didn't understand the lazy joining semantics, every attribute access on each record emits a new query, and you now have O(n) queries. It's not obvious that this is happening until it bites you once you have thousands of records.

For another example, I can count on one hand the number of developers I have encountered that can understand the expiration of the objects in the session or what the engine builder stuff is doing in the boilerplate they paste in to get it to work. It always requires experimentation, a trip back to the SQLAlchemy docs, and then finally logging all SQL queries to see if it's finally emitting the optimal queries with the transaction guarantees you are looking for.

My gripe about SQLAlchemy is that it separates you too much from what is happening underneath.

Yup. I'll give you most of that - there's a pretty big learning curve with SQLAlchemy.

But, taking the lazy joining example. Someone didn't understand how it worked and now you have to fix it, which is probably as simple as changing that loading strategy to something more suitable for the way you're using the data.

Let's rewind to the late 90s. If I wanted two tables of data to write into the page in an interlaced fashion I would look at how wide they were. I'd make a decision about using two queries or using one larger join. The two query approach was sometimes required due to the size of the data, but it meant cryptic output logic to track positions of cursors relative to each other. The single joined query was simpler to deal with, but still required tracking a last_parent_id so you could swap during the interlacing.

Other developers (those same ones that didn't understand lazy joining) would loop over the first query, running extra queries in the loop (I saw this a lot). Same bad performance as the lazy join.

When you discovered this issue in the code it was a total pain to fix. You're talking about rewriting a whole load of code, to the point of it being unrecognisable from the code you started with.

Contrast that with how easy it is to fix in the SQLAlchemy case. I mostly don't worry about loading strategies now until I'm deep in development. Something's running a bit slow, take a look at some logs, tweak a couple of things and you're golden again. That's such a powerful abstraction.

Regarding the config of the engines etc, again, it's something you need to learn. But really, someone's starting an application from scratch and they just want to dump some code in to handle all their db interactions, but they don't want to know how it works? That's on them, either learn it or use the sensible defaults (in, for example, flask-sqlalchemy).

SQLAlchemy ORM separates you from what's underneath, but it's an ORM, that's kind of the point. If you need to be closer to the metal, use SQLAlchemy Core.

Good news! Almost every language offers some sort of string concatenation with no protection whatsoever against injection, which seems to be what you're asking for.

His proposal is not string concatenation, and it is not vulnerable to injection.

"WHERE price <= #{price_range}" looks like raw interpolation to me. How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

It was just a mockup. But you are right, in reality it would end up looking more like this (and use custom interpolation for escaping):

   s.add "WHERE foo > $(bar)"
Likewise a smart syntax for clause combining (AND/OR) and some kind of nesting would probably be needed.

I believe both of these problems should be solvable without compromising the simplicity of the approach.
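One way the `$(bar)` escaping could work, sketched in Python: rewrite the markers into driver placeholders and collect the values, so quoting is the database driver's job, not the builder's (the function name and marker syntax are assumptions):

```python
import re

def bind(fragment, scope):
    """Rewrite $(name) markers into qmark placeholders, collecting the
    corresponding values from `scope` so the DB driver does the quoting."""
    params = []

    def repl(match):
        params.append(scope[match.group(1)])
        return "?"

    sql = re.sub(r"\$\((\w+)\)", repl, fragment)
    return sql, params

sql, params = bind("WHERE foo > $(bar)", {"bar": 42})
print(sql, params)  # WHERE foo > ? [42]
```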

You still need to supply the local variables to your ORM-like so that it can perform interpolation into the string (a feature which is actually best done by the database driver, NOT your ORM-like). Perhaps we can suggest this syntax:

s.add "WHERE foo > ?", bar

Or even this:

s.Where "foo > ?", bar

At that point, you've reinvented ActiveRecord or hundreds of other query builders. It also saves you having to remember the ordering rules of SQL, i.e. which part of the query must be built first. There's a reason these query builders have converged on similar syntax.

I see where you're coming from - you don't want to learn two sets of syntax - but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL while avoiding sqli and statement-order issues. They also usually have an sql() option which lets you just send SQL if you wish, for a complex query.

There are very good reasons people keep inventing layers over SQL:

They help avoid sqli

They centralise db access so you can optimise in one place by say adding caching layers to queries

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

"You still need to supply the local variables to your ORM"

That can be done automatically in many languages. At least in the sense you can do:

"where value=$(bar)"

and not:

"where value = ?", bar

OpenACS was an old web framework for the Tcl language, but it even had that 20 years ago! The company backing it failed in the .com era, and the lack of types was a turnoff for many - but it made SQL much, much more readable. And of course the queries were turned into prepared statements. A query looked like this:

select foo from table where age>:min_age and name != :user_name
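For what it's worth, Python's stdlib sqlite3 driver accepts this same named-parameter style today; a quick sketch (table and rows invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (age int, name text)")
conn.execute("insert into people values (30, 'alice'), (40, 'bob')")

# named-style parameters, much like the OpenACS example above
rows = conn.execute(
    "select name from people where age > :min_age and name != :user_name",
    {"min_age": 25, "user_name": "bob"},
).fetchall()
print(rows)  # [('alice',)]
```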

Things like this can also be done in node.js (disclaimer: I wrote the blog post below):


Perhaps we can suggest this syntax: s.add "WHERE foo > ?", bar

I'd say stick with keeping it a string and use:

  s.add "WHERE foo > $(bar)"
but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL

And that's the problem. Everyone "echoes" SQL, just slightly differently, each with their own idiosyncrasies.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

It's much easier for humans to memorize a slight SQL dialect than a different god-awful, nested ORM API boilerplate for every language they're working with.

I can write this without thinking, for queries of almost any complexity:

  s.add "SELECT id from users WHERE"
  s.add "name == $(name)" unless name.nil?
  s.add "AND age > $(age)" unless age.nil?
  s.add "GROUP BY age"
A parser/query-builder doesn't need amazing smarts to infer exactly what I want from this sloppy syntax; e.g. drop the "AND" if the first condition was false and present me with meaningful error messages when something is ambiguous.
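The smarts really can be modest. A sketch of such a fragment joiner in Python - the cleanup is just two regex passes (the function name, keyword list, and `:name` placeholder style are my assumptions):

```python
import re

def assemble(*fragments):
    """Naively join SQL fragments, then clean up: strip an AND/OR that
    directly follows WHERE, and drop a WHERE left with no condition."""
    sql = " ".join(f.strip() for f in fragments if f and f.strip())
    # "WHERE AND x" -> "WHERE x" (the first condition was skipped)
    sql = re.sub(r"\bWHERE\s+(AND|OR)\b", "WHERE", sql, flags=re.IGNORECASE)
    # "WHERE GROUP BY ..." or a trailing "WHERE" -> drop the WHERE entirely
    sql = re.sub(r"\bWHERE\s*(?=(GROUP|ORDER|LIMIT|OFFSET)\b|$)", "", sql,
                 flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", sql).strip()

name, age = None, 21
print(assemble(
    "SELECT id FROM users WHERE",
    "name = :name" if name is not None else "",
    "AND age > :age" if age is not None else "",
    "GROUP BY age",
))
# SELECT id FROM users WHERE age > :age GROUP BY age
```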

Ask me to formulate the same query in any ORM and I'll have to refer to the documentation. And almost every time I'll waste a stupendous amount of time on getting some irrelevant syntactic detail just right. Only because the ORM insists on a needlessly rigid API.

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

After having worked with Hibernate, SQLAlchemy (the one-eyed amongst the blind), ActiveRecord, DataMapper, Sequel and a few others that I don't even remember, I conclude with passion: They have it backwards.

It's not the vagaries of SQL that I don't want to deal with. It's the vagaries of the ORM du jour that I don't want to deal with.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

They usually do more. SQL is not very pretty or friendly IMO, ORMs clean it up a bit (for example allowing arbitrary ordering, sqli, construction of multiple queries from one base). The example I gave was an arel query - very similar to yours if you move the WHERE out into the function name.

Your ORM proposal above looks fine (and very similar to existing ones but with more strings), but personally I find it uglier than the many which are available, and you haven't actually tackled the protection against sqli or explained how that would work. Normally you'd pass that through to the db driver as a $n param, rather than trying to escape a string at the ORM level. If your ORM is going to infer details from the sql, it basically has to parse that sql first and handle all the same things that current ORMs do by using named functions to add SQL - you'd find the same issues with consistency and edge cases I imagine, and of course have to be very careful about parameters and sqli.

So if you don't like ORMs, don't use them, but you do have to be sure that your own ad-hoc home-grown ORM that you're writing instead covers all the bases current ones do. I prefer the syntax of the ones you're denigrating for simple queries, even if for complex queries it might be better to write sql directly (via the functions they provide), but would be interested to see it in action - when are you going to start :)

Or you could just use Sequel and not have to build up your own library.

It's pseudocode for an implementation that doesn't exist. There is nothing in his proposal that requires this hypothetical implementation to copy the value of the price_range variable into the string at all, much less unescaped.

The strings don't even have to be sent to the DB at all. You've utterly missed his point. He wants the language to be intelligent about what the SQL means and do the right thing.

It's not raw SQL, it's an abstraction.

The pseudocode in question happens to be syntactically valid Ruby. If the goal was to demonstrate how simple query generation could be without abstraction layers like Sequel, it is a valid criticism to note that the example given is eliding a necessary feature. Especially since helping avoid injections is one of the primary advantages of such an abstraction layer.

The original argument was that if one could "just write SQL" one wouldn't need to refer to documentation to become productive, "not once". I don't personally think it's practical to use any database library without reading the documentation, but certainly the moment you get away from "the SQL I type in goes straight to the database" you're going to need documentation to tell you what's going to change.

What's being described here sounds like a library that parses SQL fragments, combines them into whole queries, and sends them off to the database. To implement this without resorting to simple string concatenation needs an intermediate representation of SQL language components. That intermediate representation is going to look a lot like SQLAlchemy, or Sequel, or Arel. That means that instead of "just writing SQL" you're actually adding an extra layer of abstraction and another chunk of documentation to read.

All that said, the "just use SQL" approach is already fairly well served. Most languages have mature bindings to specific databases as well as at least one ODBC-style API to simplify access to those bindings. Nearly all of these have support for client-prepared statements, which handles the interpolation problem. If you want to build more complex queries on the fly, you can use string concatenation or CTEs or decide to use a SQLAlchemy/Sequel/Arel-type library for just that one case.

And, honestly, you don't need to give up SQL to make use of Sequel. I personally really dislike the query composition API demonstrated at the top of this thread, but you can do useful things with Sequel without it:

    db = Sequel.sqlite
    db.execute "create table foos (id int not null, name text)"
    db.execute "insert into foos (id, name) values (1, 'bar')"
    # or
    db[:foos].insert(id: 2, name: "baz")
    db["select * from foos where id = ?", 1].first # ==> {:id=>1, :name=>"bar"}
    # or
    db[:foos]["id = ?", 2] # ==> {:id=>2, :name=>"baz"}
No need to go any further than raw SQL, but increasing levels of abstraction available if you feel like it.

> The pseudocode in question happens to be syntactically valid Ruby.

So is everything else anyone types, code-like or not. He even said "Note how I deliberately shuffled the order and didn't bother with escaping.".

The response was flippant, intelligence-insulting, and obviously the result of failing to read thoroughly.

And speaking of intelligence-insulting, we all know you can run raw SQL through Sequel.

You're not having a useful dialogue, you're being combative, like the person I initially replied to.

I said the code snippet looked like raw interpolation to me, and I asked how it could be made not vulnerable to injection. It was an honest observation and a genuine question. No flippancy was involved. You're free to think I'm an idiot, but you are the one being insulting and combative here.

As everyone has repeatedly pointed out, once you deal with the obvious issues you quickly end up with something that looks like the gazillions of other ORMs on the market; sorry if I hurt anyone's feelings.

There is. MyBatis, which is a minimal ORM that aims to keep you as close to SQL as possible, has support for dynamic SQL: https://mybatis.github.io/mybatis-3/dynamic-sql.html

Edit [responding to moe, below]: that's a matter of taste. I prefer to have my SQL _outside_ my code. If I have to write a little XML to make it happen, so be it. Additionally, unlike your example, a strict separation of SQL and data ensures SQL injection is not possible. That's also a worthy goal.


Does MyBatis have an API similar to what I outlined above?

Because the page that you link to shows a horror that I can't even begin to describe:

  <select id="findActiveBlogWithTitleLike" resultType="Blog">
    SELECT * FROM blog
    WHERE state = 'ACTIVE'
    <if test="title != null">
      AND title like #{title}
    </if>
  </select>
SQL mixed with proprietary XML soup? Talk about combining the worst from all worlds...

Get rid of the XML and you have a good start, I'd say.

> Why is there no ORM that works like this?

Because no one wants to write a SQL fragment parser. You would have to write a parser that:

a) Could be started at an arbitrary place in a query and figure out what it needed to do.

b) Would be SQL-dialect aware. (Maybe -- it might be possible to skip this.)

Were it not for (a), I'd imagine you could quickly build this by using some kind of debug feature of the database -- sending it a query to parse and getting back a parse tree.

What you're describing isn't an ORM, it's a SQL query abstraction.

Knex is a very convenient, Promise-oriented query abstraction, and it has "raw" methods when only SQL snippets will do. http://knexjs.org/

Generally I'll go with something like this for these conditional search based queries

    select id, blah, boo
    from products
    where 1
    and (title like :title or :title = '')
    and (price <= :price_range or :price_range = -1)
    limit :limit
    offset :offset
It's a trade-off: a bit of extra complexity in the query for simpler code logic. Really, I avoid using SQL for these types of "search" / find methods; they're probably going to be slow as hell as soon as you get a sizable data set.

I really despise ORMs: it's essentially taking a black box (SQL/database) and throwing it inside a much less tested, less optimized, less documented black box.

For what reason, I have no idea; nobody has ever been able to convince me there is a problem that needs solving. Almost every argument I've heard amounts to problems SQL already solved a decade ago. I can only speculate, but I'll hazard a guess it comes down to a lack of enthusiasm to really learn SQL properly (even though the person uses it / debugs it daily).

Microsoft's Entity Framework and LINQ work pretty much like that, only with nice IDE support.

I'll try to write an example like that when I get home.

Of course, they're not "just SQL", and I do understand where the above comments come from.

And yes, sometimes I have to go look up how to do things I'd do trivially on SQL.

> Why is there no ORM that works like this?

Because you're only showing a query builder, the "relational" not the "object mapper".

From an OOD point of view, if the end result of that query will be Product instances, why am I using a Select object to create them and why is it having to do some sort of string parsing to determine the objects I'm loading?

if the end result of that query will be Product instances, why am I using a Select object to create them

Because we can just infer the type to be returned via the FROM-clause of the query.

and why is it having to do some sort of string parsing to determine the objects I'm loading?

Because, to cite the immortal Larry Wall:

   The computer should be doing the hard work.
   That's what it's paid to do, after all.
   -- Larry Wall
It has to do the string parsing, escaping and intelligent validation so I, the human, can write

   s.add "count(*) / sum(count(*)) * 100 as percentage"
instead of

  .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
The difference is that most people can read and write the former without thinking.

The latter version may not look much more complicated at a glance. But as we all know these seemingly trivial fragments, more often than not, take ungodly amounts of very frustrating trial & error before they play along.

You don't have to pass a block to select_append. With Sequel, you can do this:

   # db is a Sequel::Database
   s = db[:foobars]
   s = s.select Sequel.lit("max(id) as best_id")
   s = s.select_append Sequel.lit("count(*) / sum(count(*)) * 100 as percentage")
   # ==> "SELECT max(id) as best_id, count(*) / sum(count(*)) * 100 as percentage FROM `foobars`"
The only real difference here is Sequel.lit, which is needed for security (any secure DB interface needs to somehow be notified that strings are safe to put into a query without escaping). If I'm writing code that leans heavily on Sequel to build queries, I'll make a private method #sql that is an alias for Sequel.lit.

It's entirely feasible to use Sequel this way. I've used it like that in production. I loathe the query building DSL, myself, but it's strictly optional.

The only real difference here is Sequel.lit

Well, no.

I repeat: The real difference is that most people can't write even this utterly trivial snippet without studying the Sequel documentation first.

Now what if I want a WHERE-clause? Do I have to use s.where? Or s.select_append("where ...")? What if I need to combine them with AND/OR?

It's not ok that we have to think about all this boilerplate that has nothing to do with our actual query.

We shouldn't have to translate our perfectly unambiguous request from english (SQL) to klingon (Sequel API) in order to have it processed.

I'm skeptical that it's possible to assemble a working SQL query of much complexity unambiguously from an arbitrarily-ordered set of clauses. CTEs, subselects, and boolean conditions would require a nesting construct. Boolean conditions, order clauses, and some joins need to be assembled in a meaningful order. Eventually you're going to find a level of complexity in the query where it's necessary for a developer to read the documentation, either to figure out the API or to understand the method the library uses to stitch queries together.

There might be a sweet spot yet unreached in terms of allowing developers to work with SQL on top of as minimal a native binding as possible. I'm certainly not claiming that Sequel/SQLAlchemy/etc. is at or near that sweet spot. It might be necessary to use a language with language-level support (or macros) to really reach it.

How often do you have a need for non-engineers to be able to alter your database queries? Why is this a requirement or even desirable? There seem to be few cases where a query change needs to be made in isolation, without having to touch any of the downstream consumers of that data.

So if we remove this non-requirement of "anybody can write it", we're back to my original question: Ok, we the coders know we want a Product instance, but to build it we create a Select instance and then ask a library do so some sort of parsing?

Possibly no ORM exists that meet your criteria because there are issues with the design goals you're trying to achieve?

Have you thought about JSON as a partial query placeholder, and all the native methods available for traversal, merging, plucking, etc.?

For example, here is one such query object:

{ type: 'select', columns: [{ expression: 'orgJobs.org_job_code', alias: 'job_code' }], joins: [{ table: 'org_jobs', alias: 'orgJobs' }], limit: 100, offset: 0 }

which yields:

select orgJobs.org_job_code as job_code from org_jobs as orgJobs limit 100 offset 0
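Sketching the generator side in Python (handling only the clauses present in the example above; a real one would cover where/group/order too):

```python
def to_sql(q):
    """Render the JSON-ish query descriptor into a SQL string.
    Only the select/joins/limit/offset keys from the example are handled."""
    cols = ", ".join(
        f"{c['expression']} as {c['alias']}" if "alias" in c else c["expression"]
        for c in q["columns"]
    )
    tables = ", ".join(
        f"{j['table']} as {j['alias']}" if "alias" in j else j["table"]
        for j in q["joins"]
    )
    sql = f"select {cols} from {tables}"
    if "limit" in q:
        sql += f" limit {q['limit']}"
    if "offset" in q:
        sql += f" offset {q['offset']}"
    return sql

query = {
    "type": "select",
    "columns": [{"expression": "orgJobs.org_job_code", "alias": "job_code"}],
    "joins": [{"table": "org_jobs", "alias": "orgJobs"}],
    "limit": 100,
    "offset": 0,
}
print(to_sql(query))
# select orgJobs.org_job_code as job_code from org_jobs as orgJobs limit 100 offset 0
```

Manipulation before rendering is then ordinary data manipulation: set `query["limit"] = 50` and re-render.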

and so it's easy to do things like:

query.limit = 100;

and of course:


and use something like lodash/underscore (or equivalent for your language) to add collection/array/object helper methods so you never have to write for-loops for things like mapping and property plucking.

Now you have to be aware of SQL escaping to prevent SQL injection attacks.

Everyone who works with SQL should be aware of SQL escaping. There's nothing wrong with using a library to do it for you, but you should have an understanding of what the library is actually doing. Otherwise how could you possibly test it? Just by hoping it's right?

One of the things I like about Entity Framework is that you write LINQ, which odds are these days every C# dev should be familiar with. So, setting aside a few gotchas (which SQL has as well), the code you write to query an in-memory collection is the same code you write to query the database. This also makes it fairly easy to test, because you can create in-memory representations of your data and check your filters, projections, orders, etc.

Stuff like your example is where, in general I prefer a sproc, or something more like the following...


There is; it's called Anorm. https://www.playframework.com/documentation/2.3.x/ScalaAnorm

You don't need another DSL to access relational databases. SQL is already the best DSL for accessing relational databases. We don't need to invent something new. Moreover, the SQL syntax and features can differ from one database vendor to another.

Micro ORMs usually work very much like this. There is a problem when you want sub-records; they are handled differently in each library, or not handled at all.

You've formulated the composability problem reasonably in that sentence, but it's important to remember that is not the only problem the world faces in data manipulation.

I think the irony is that many of the problems of SQL are related to a lack of tooling.

The lack of tooling is a result of large numbers of devs being distracted by ORMs, which move that particular problem into the language IDE/editor.

Essentially we lack great *.sql editing, IDE, macro, and refactoring support because of the lack of focus on SQL itself.

edit: as an aside, string concatenation in the language of choice is its own road to hell.

> Essentially we lack great *.sql editing, IDE, macro, and refactoring support because of the lack of focus on SQL itself.

Except that my other tools are just an editor like emacs and makefiles. So now to be productive I have to have an IDE for SQL, learn SQL, learn macros.

Interesting that SQL came first. If it were so easy and obvious to use, we would not have seen any ORMs by now. But every other project that uses SQL databases ends up with some kind of ORM.

One of the reasons is probably that there is an impedance mismatch. Code is already written dealing with objects, functions, pointers, data. But when it talks to the database it has to talk via a command-line-string interface, so to speak.

There is another type of impedance mismatch, and that is learning a new language. One can argue SQL is there to keep & manipulate the most precious thing you have -- data -- so one should spend time learning it well. But the other point is: SQL is there to keep & manipulate the most precious thing you have -- data -- so don't mess with it if you don't know SQL very well. Let someone else learn it (the author of the ORM).

I am not arguing for one point or the other, just kind of noting how interesting it is that we have been going in circles for the last 15 years or so.

I think it's a mixed bag... it depends on what you want to accomplish... I find myself bypassing the ORM about 20% of the time... the other 80%, the ORM model is simpler.

I find LINQ + Entity Framework to be a blessing in C#. Though I use the monad/function syntax, not the LINQ query sugar, when I use it.

I also use emacs (evil-mode, actually). An IDE is not much different than a fully configured vim or emacs.

sql-mode for some databases (MySQL and Oracle, IIRC) reads the DDL of the tables you are typing about; why is this not better and more general?

SQL is so well established and general as a solution that not learning it is a poor choice.

You mean something like this? https://www.jetbrains.com/dbe/

Yes that was in my mind as I was typing it actually :)

Notice that it's only about a year old, I think?

The IntelliJ IDEA Database plugin [1] on which 0xDBE is based has existed for quite a while. I'm not sure when it became very useful as I've only used it for a few years, but it is quite powerful and useful.

The SQL code completion is quite good (and it caches your entire database schema so it is very quick). That's not particularly special on its own, but what makes this especially useful is IDEA's "language injection" [2] feature. This allows you to, for example, get completion for SQL when it is contained in some other language, which could be anything (e.g. a Java String, Ruby String, XML, etc). It will also analyze and report errors in this SQL on the fly.

And SQL statements contained in concatenated strings are no longer a problem because you can edit those in a separate editor window where you are only editing the SQL, and it automatically gets placed into the concatenated string.

Not sure if it addresses all of your specific concerns, though.

[1] https://www.jetbrains.com/idea/features/database_tools.html

[2] https://www.jetbrains.com/idea/help/using-language-injection...

I am rather fond of Sqitch and pgTAP as tools for database development.

Sqitch's problem domain I handle a different way.

pgTAP is on my list of things to poke around with.

edit: thanks btw

I would write a SQL function that takes all of those as optional parameters and includes a lot of these:

    WHERE (_title    IS NULL OR title LIKE _title)
    AND   (_minPrice IS NULL OR price >= _minPrice)
    AND   (_maxPrice IS NULL OR price <= _maxPrice)
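To make the pattern concrete, here's a hedged, runnable sketch of the same idea using SQLite via Python (table and column names are invented; NULL parameters simply disable their clause, so the statement itself never changes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("widget", 5.0), ("gadget", 25.0)])

# One static statement; a NULL parameter makes its clause a no-op.
SQL = """
SELECT title FROM products
WHERE (? IS NULL OR title LIKE ?)
  AND (? IS NULL OR price >= ?)
  AND (? IS NULL OR price <= ?)
"""

def search(title=None, min_price=None, max_price=None):
    args = (title, title, min_price, min_price, max_price, max_price)
    return [row[0] for row in conn.execute(SQL, args)]
```

Every filter combination goes through the same prepared statement; only the bound values change.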

That ends up being very inefficient as the optimizer will come up with one mediocre plan that works for all those criteria.

This actually answers the question, although I would imagine the ORDER BY handling will look pretty messy (CASE statement perhaps?).

And I don't know what it would look like if the requirements changed to allow ordering by multiple columns with different possible sort directions... that might get back into dynamic SQL using a RETURN QUERY EXECUTE type of thing, which is basically using a query builder in your query language.
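FWIW, the single-column version can stay a static statement by binding the sort-key name into a CASE. A rough sketch against SQLite (table and column names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("b", 2.0), ("a", 9.0)])

# The sort key is chosen by a bound parameter, not by splicing SQL text.
# A CASE that matches nothing yields NULL, i.e. a constant, no-op sort key.
SQL = """
SELECT title FROM products
ORDER BY CASE ? WHEN 'title' THEN title END,
         CASE ? WHEN 'price' THEN price END
"""

by_title = [row[0] for row in conn.execute(SQL, ("title", "title"))]
by_price = [row[0] for row in conn.execute(SQL, ("price", "price"))]
```

It gets messy fast, but it is still a single static statement taking only parameters.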

Most SQL engines support ordering by column number.

    select c1, c2, c3, ... order by 2;

would order by "c2"

> Or are there some advanced Postgres features that I'm not aware of that would allow all these combinations in some kind of prepared statement?

I can't speak specifically to Postgres, but, in my experience, there is nearly always a way to do what you want with bound parameters. No advanced features are required. It often results in poor performance and redundant code that is hard on the eyes. You will get what you deserve, but sometimes you don't have a choice.

Here is an example of the horror:

    WHERE (:1 IS NULL OR :1 = item_id)
      AND item_num BETWEEN NVL(:2,0) AND NVL(:2,9999999999)
      AND item_date BETWEEN NVL(:3,'01-jan-1900') AND NVL(:3,'31-dec-4712')
      AND item_name LIKE NVL(:4,'%')
    ORDER BY
      CASE :5 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END,
      CASE :6 WHEN 'NUM' THEN item_num WHEN 'DATE' THEN item_date ELSE item_name END

Edit: I suppose you could also parameterize the ascending vs. descending sort, although I have never tried. My first thought is to duplicate each line in the ORDER BY clause: one bind parameter for ASC and another for DESC. Have each CASE return a constant if the bound value is NULL, and then bind NULL for the direction you do not want. Yuck.

I am not advocating any of this but am pointing out that bind parameters can be abused in surprising ways if you are backed into a corner.
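Something like the duplicated-line idea does appear to work, at least in SQLite; a hedged sketch (table and column names invented, and it is every bit as ugly as promised):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("c",), ("b",)])

# Bind the column name to exactly one of the two parameters; the other,
# bound to NULL, makes its CASE a constant NULL and therefore a no-op key.
SQL = """
SELECT name FROM items
ORDER BY CASE ? WHEN 'name' THEN name END ASC,
         CASE ? WHEN 'name' THEN name END DESC
"""

ascending  = [row[0] for row in conn.execute(SQL, ("name", None))]
descending = [row[0] for row in conn.execute(SQL, (None, "name"))]
```

Yuck indeed, but the statement stays static and fully bound.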

You might be over thinking this... something like this might work out just fine. (I wouldn't necessarily do things this way, but rather keep a list of clauses and join them with " AND " to avoid keeping track of the "WHERE"s and "AND"s, but you get the point...)

    sql = 'SELECT * FROM products'
    args = []

    if title:
        sql += ' WHERE title LIKE ?'
        args.append('%' + title + '%')

    if upc:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' upc LIKE ?'
        args.append('%' + upc + '%')

    if price_min:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' price >= ?'
        args.append(price_min)

    if price_max:
        if args:
            sql += ' AND'
        else:
            sql += ' WHERE'
        sql += ' price <= ?'
        args.append(price_max)

    if order_by_name:
        sql += ' ORDER BY name'
    elif order_by_price_low:
        sql += ' ORDER BY price'
    elif order_by_price_high:
        sql += ' ORDER BY price DESC'

    conn.query(sql, args)

This is a query builder. If you're using your programming language to dynamically assemble the SQL statement fragments at runtime, then you're using a query builder regardless of if it is a library with a fancy DSL that assembles an in-memory SQL AST or some kind of ad-hoc string concatenation you rolled yourself like this.

The question I'm asking is pointed towards the people who are implying that you can "just use SQL" as static statements that are not dynamically assembled. Like a static function or prepared statement that takes some parameters, and at runtime you only pass in those parameters - not rejigger the actual SQL statement fragments.

Yes, logically, it's a query builder. But it's one that doesn't force any extra dependencies in my project and is very easy to troubleshoot.

Seriously, how is this different from "just use SQL". SQL was never really intended to be used by itself (except for manually typing queries into a console). Yes, what I wrote is code that writes code, but it's far different from a larger ORM or querying framework.

Disregarding my edit and how your query builder is irrelevant to my question anyway, if I change the requirement to allow the user to sort by multiple columns instead of one, that case statement is going to either combinatorially explode, or you'll have to add some ad-hoc SQL escaping to guard against SQL injection, since you can't bind column names in ORDER BY clauses.

It's easy to see how libraries like Sequel get invented to help people do this, and I don't really see what the problem is when the DSL stays close to SQL semantics.

Dealing with multiple ORDER BY columns isn't any more difficult, but you'd have to figure out how to get the user input into some kind of list. That will require validation too... Just using an ORM doesn't let you ignore input validation. You wouldn't trust a user to supply their own column names in a WHERE clause any more than you'd trust them in an ORDER BY clause.

Trying to differentiate between a DSL that interfaces with SQL and SQL is nonsensical. SQL is the original DSL.

> SQL was never really intended to be used by itself.

Hmm, I thought that is how it was intended to be used? If it weren't, it would look like Datalog, Lisp, or some binary protocol with prefixed lengths and whatnot.

It was intended to be typed in by analysts at a console. Who would then print the report on the dot matrix printer and mail to the headquarters or something of that sort.

That data has to get into the RDBMS somehow...

I guess a better wording may have been to say that SQL was never intended to be used in isolation. It can be, but you've always had interfaces to other languages / platforms. SQL is a data manipulation and retrieval language - the rest of the business logic has to be performed somewhere else (excepting complex stored procedures).

I was told it was intended to be easy to use for non techies at University. It is quite wordy, kind of the opposite of some Perl.

SQL was often combined with Cobol (might still be where they have Cobol).

was? You sound like lots of places have got rid of it.

I know this isn't central to your point, but I thought I'd say it anyway. These days I have taken to doing:

    sql = "SELECT * FROM products WHERE TRUE"
    if upc:
        sql += ' AND upc LIKE ?'
This saves having to worry about "AND" vs. "WHERE", and the extra "WHERE TRUE" results in the same execution plan (and thus the same performance).
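Fleshed out into a hedged, runnable sketch (SQLite via Python; table and parameter names invented; `WHERE TRUE` needs SQLite >= 3.23, otherwise `WHERE 1=1` does the same job):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (upc TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("0001", 5.0), ("0002", 25.0)])

def search(upc=None, max_price=None):
    # WHERE TRUE means every real condition is just another " AND ...".
    sql = "SELECT upc FROM products WHERE TRUE"
    args = []
    if upc:
        sql += " AND upc LIKE ?"
        args.append("%" + upc + "%")
    if max_price is not None:
        sql += " AND price <= ?"
        args.append(max_price)
    return [row[0] for row in conn.execute(sql, args)]
```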

I also add an extra column to my SELECTs:

I can just use "--" to comment out any line without having to manage the last comma. Especially useful when building aggregations.
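Something like this sketch (SQLite via Python; table and column names invented): a throwaway first column so every real column line starts with a leading comma:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT, upc TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('widget', '0001', 5.0)")

# The dummy first column means every real column line begins with a comma,
# so any of them can be "--"-commented out without breaking the list.
rows = conn.execute("""
    SELECT
        1 AS _
        , title
        -- , upc
        , price
    FROM products
""").fetchall()
```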

Awesome. That's such a straightforward solution to one of the crappy edge cases SQL is so bad about (lists that can't end with a trailing comma, a restriction JSON imposes too, being another).

Now how is this an improvement over the Sequel example?

I understand the impulse to "Just write SQL." But in practice, with all the string concatenation needed to generate actual queries, you can't really see what the SQL will be without running all the code in your head anyway.

In practice, I'd reduce this to a query building function in my own code, so the readability would be much better. The benefit is that you don't need to include another library for it to work. Well, that and it is a strategy that works for all languages, not just Ruby.

And the downside is now you need to spend effort maintaining and debugging your custom query builder.

Yeah but a lot of it is quite generic, so could be reused in a few places and therefore make that extra effort worthwhile. In fact, it might be handy in a few places so you could package it up and then have a single place you have to maintain the query building code for several projects.

Wait a minute...

I think you're over estimating the amount of effort that I spend on this.

I've done this quite a bit as well: string building is pretty well understood, and works remarkably well.

Dynamic prepared (NO AD HOC, THAT IS CRAP) T-SQL is a common occurrence for me in production; it is often a better choice than writing a bunch of conditional logic in SQL (to get sargable queries) or a bunch of really unperformant CASE statements in WHERE clauses.

I have written TSQL that changes the entire query (which was on the order of 80 joins) dynamically based on about ~20 different parameters, only a few actually passed into the stored procedure, the rest based on settings and configuration. It took a few hours to grok how everything came together at first, but a few tricks that the optimizer will remove means you can add arbitrary conditionals in whatever order you please.

The example you give is VERY trivial for dynamic SQL. Postgres: http://www.postgresql.org/docs/9.1/static/ecpg-dynamic.html MSSQL: https://msdn.microsoft.com/en-us/library/ms188001.aspx

That can be solved by writing a SQL function and the user input values are arguments to the function.

    query = sprintf("select count(*) from data where %s = ?", column)
    results = sql.execute(query, filter_value);
So long as you leave the "value" portion of the query as a '?' (or %s, or whatever your connector requires) and don't use direct user input for the column names, you're still safe from SQL injection.
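A hedged sketch of that whitelist idea (SQLite via Python; table and column names invented): user input never reaches the SQL text, only a string you already own does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (color TEXT, size TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [("red", "L"), ("red", "M"), ("blue", "M")])

# Only these exact strings can ever be spliced into the query.
ALLOWED_COLUMNS = {"color", "size"}

def count_where(column, value):
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unknown column: %r" % column)
    query = "SELECT count(*) FROM data WHERE %s = ?" % column
    return conn.execute(query, (value,)).fetchone()[0]
```

The filter value itself always travels as a bound parameter; only the whitelisted column name is interpolated.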

Even with PL/pgSQL, there is no way to build a SQL statement dynamically _and_ safely -- it's all just string concatenation there, too. However, using `CASE`, `WITH` and `LATERAL` you can have a root `SELECT` that returns one of a few different queryable code paths, and `WITH`/`LATERAL` allow you to reuse definitions.

Use format() for safely constructing dynamic queries.


You end up doing a lot of string concatenation even with that, though. It's not quite the same thing as having some kind of literal or combinator based setup.

  select(from('tab'), where(and(like('name','Mo%'), not(is('id', NULL)))))

You can get a similar syntax with format() by nesting format() calls such as:

  EXECUTE format('SELECT * FROM tab %s', format('WHERE %I IS NOT NULL', 'id'));

Answer: Learn SQL. It is very easy to pass variables.

I'll deal with the problem when I have to.

However, the point is that I'd rather use SQL, it's simple as that.

ORMs are not about syntax; they're about things you can't natively "think about" in SQL, like inheritance, composition, references, and graph navigation.

I prefer them for taking away the grunt work of needing to code updates and inserts for everything.
