Hacker News new | past | comments | ask | show | jobs | submit login

How exactly are you going to "just write SQL" if the actual query statement needs to change based on the user input?

How about something like this:

  s = Select.new
  s.add "WHERE title LIKE #{title}" if title
  s.add "WHERE price <= #{price_range}" if price_range
  s.add "LIMIT #{limit}" if limit
  s.add "OFFSET #{offset}" if offset
  s.add "id, blah, boo FROM products"
  s.execute
Note how I deliberately shuffled the order and didn't bother with escaping.

Also note how anyone who knows SQL could immediately work with this, learning curve: 5 seconds.

Why is there no ORM that works like this?




Python SQLAlchemy works a bit like this, if you use it that way:

    query = session.query(Products)

    if limit:
        query = query.limit(limit_val)

    if offset:
        query = query.offset(offset_val)
Sort is a bit tricky, but as I'm a noob at SQL Alchemy, I had some not-so-pretty boilerplate code implemented for decorating queries with custom sorts. But essentially it boiled down to:

    if sort_type == SORT_REVERSE_CONST:
         sort_obj = sort_obj.desc()
    else:
         sort_obj = sort_obj.asc()

    query = query.order_by(sort_obj)
And if you want to search through dynamic tables, change the initial query instantiation to something along these lines:

    query = session.query(retrieve_db_object(table_name))
And to execute:

    return query.all()


Python SQLAlchemy works a bit like this

All ORMs work "a bit like this". I don't want "a bit like this". I want exactly like this.

Because with my proposed interface I could be productive immediately and permanently. I would never have to refer to any documentation. Not once.

With every other ORM this is a pipe dream. Their "fancy" chainable wrapper-APIs are nothing but a ball on a chain.


Well, no, they're more than a "ball on a chain" - they're a tradeoff - like everything in software.

They give you the ability to manipulate the query in interesting ways at any point before you execute it. They let you join different queries together, built by different parts of the system, in a safe way. They let you work with the native language you're working in instead of having to construct clauses in a foreign language, using strings. They let you post optimise your loading strategies.

You might want "exactly like that", but it's not going to be as flexible as a chainable wrapper system.

Hell, I could probably give you pretty close to that in python with a little work, but it's not something I'd want to use myself.

I get it. I like working in SQL too. I know it really really well, and I cringe when I see developers writing totally sub-optimal code because they don't understand the relational data model.

But there are other ways to do things, and what you're describing doesn't give you much more than raw SQL, so why not just use raw SQL? You've added some syntactic sugar that, in my language (Python), would be a bit of a horror show (strings access local variables implicitly, no thanks). What else do you gain?

This is more verbose, granted:

    q = session.query(Products)
    if title:
        q = q.filter(Products.title.like(title))
    if price_range:
        q = q.filter(Products.price.lte(price_range))
    if offset:
        q = q.offset(offset)
    if limit:
        q = q.limit(limit)
    products = q.all()
But then you get more stuff for free, like drilling down into the other tables:

    p = products[0]
    p.supplier.contracts[0]
But that's rubbish, because you'll be loading in a really inefficient way. That's ok though, tell the system how you're going to want to load the additional data.

    q = q.options(
        joinedload('supplier').
        subqueryload('contracts')
    )
See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

Use whatever makes sense for your use-case. Don't limit yourself because you'd have to learn something new. Honestly, before using SQLAlchemy I mostly felt the same as you do. Many ORMs get in the way, but that's not really a problem intrinsic to ORMs.


>See what I got with my "ball and chain"? Turns out it was actually the anchor for the whole boat. Sure, you have to learn a new syntax, sure, it's not sql, but that doesn't make it bad or wrong.

If you are fine with the new debugging this entails. Take the joined load example you gave. Almost every time I encounter SQLAlchemy code, the author didn't understand the lazy joining semantics and every attribute access on each record is emitting a new query and you now have O(n) queries. It's not obvious that is happening until it bites you once you have thousands of records.

For another example, I can count on one hand the number of developers I have encountered that can understand the expiration of the objects in the session or what the engine builder stuff is doing in the boilerplate they paste in to get it to work. It always requires experimentation, a trip back to the SQLAlchemy docs, and then finally logging all SQL queries to see if it's finally emitting the optimal queries with the transaction guarantees you are looking for.

My gripe about SQLAlchemy is that it separates you too much from what is happening underneath.


Yup. I'll give you most of that - there's a pretty big learning curve with SQLAlchemy.

But, taking the lazy joining example. Someone didn't understand how it worked and now you have to fix it, which is probably as simple as changing that loading strategy to something more suitable for the way you're using the data.

Let's rewind to the late 90s. If I wanted two tables of data to write into the page in an interlaced fashion I would look at how wide they were. I'd make a decision about using two queries or using one larger join. The two query approach was sometimes required due to the size of the data, but it meant cryptic output logic to track positions of cursors relative to each other. The single joined query was simpler to deal with, but still required tracking a last_parent_id so you could swap during the interlacing.

Other developers (those same ones that didn't understand lazy joining) would loop over the first query, running extra queries in the loop (I saw this a lot). Same bad performance as the lazy join.

When you discovered this issue in the code it was a total pain to fix. You're talking about rewriting whole load of code to the point of being unrecognisable from the code you started with.

Contrast that with how easy it is to fix in the SQLAlchemy case. I mostly don't worry about loading strategies now until I'm deep in development. Something's running a bit slow, take a look at some logs, tweak a couple of things and you're golden again. That's such a powerful abstraction.

Regarding the config of the engines etc, again, it's something you need to learn. But really, someone's starting an application from scratch and they just want to dump some code in to handle all their db interactions, but they don't want to know how it works? That's on them, either learn it or use the sensible defaults (in, for example, flask-sqlalchemy).

SQLAlchemy ORM separates you from what's underneath, but it's an ORM, that's kind of the point. If you need to be closer to the metal, use SQLAlchemy Core.


Good news! Almost every language offers some sort of string concatenation with no protection whatsoever against injection, which seems to be what you're asking for.


His proposal is not string concatenation, and it is not vulnerable to injection.


"WHERE price <= #{price_range}" looks like raw interpolation to me. How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?


How do you make that not vulnerable to injection unless you're escaping all variables that might be used in a query?

It was just a mockup. But you are right, in reality it would end up looking more like this (and use custom interpolation for escaping):

   s.add "WHERE foo > $(bar)"
Likewise a smart syntax for clause combining (AND/OR) and some kind of nesting would probably be needed.

I believe both of these problems should be solvable without compromising the simplicity of the approach.


You still need to supply the local variables to your ORM-like so that it can perform interpolation into the string (a feature which is actually best done by the database driver, NOT your ORM-like). Perhaps we can suggest this syntax:

s.add "WHERE foo > ?", bar

Or even this:

s.Where "foo > ?", bar

At that point, you've reinvented ActiveRecord or hundreds of other query builders, it also avoids you having to remember the ordering rules of sql as to which part of the query must be built first. There's a reason these query builders have converged on similar syntax.

I see where you're coming from - you don't want to learn two sets of syntax, but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL while avoiding sqli and statement order issues, they also usually have an sql() option which lets you just send SQL if you wish for a complex query.

There are very good reasons people keep inventing layers over SQL:

They help avoid sqli

They centralise db access so you can optimise in one place by say adding caching layers to queries

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to


"You still need to supply the local variables to your ORM"

That can be done automatically in many languages. At least in the sense you can do:

"where value=$(bar)"

and not:

"where value = ?", bar

OpenACS was an old web framework for the Tcl language, but it even had that 20 years ago!. The company backing it failed in the .com era, and the lack of types was a turnoff for many - but it made SQL much much more readable. And of course the queries were turned into prepared statements. A query looked like this:

select foo from table where age>:min_age and name != :user_name

Things like this can also be done in node.js (disclaimer: I wrote the blog post below):

http://ivc.com/blog/better-sql-strings-in-node-js/


Perhaps we can suggest this syntax: s.add "WHERE foo > ?", bar

I'd say stick with keeping it a string and use:

  s.add "WHERE foo > $(bar)"
but many query builders nowadays are very well thought out, and have a simple syntax which just echoes SQL

And that's the problem. Everyone "echos" SQL, just slightly differently, each with their own idiosyncraticies.

If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

It's much easier for humans to memorize a slight SQL dialect than a different god-awful, nested ORM API boilerplate for every language they're working with.

I can write this without thinking, for queries of almost any complexity:

  s.add "SELECT id from users WHERE"
  s.add "name == $(name)" unless name.nil?
  s.add "AND age > $(age)" unless age.nil?
  s.add "GROUP BY age"
A parser/query-builder doesn't need amazing smarts to infer exactly what I want from this sloppy syntax; e.g. drop the "AND" if the first condition was false and present me with meaningful error messages when something is ambiguous.

Ask me to formulate the same query in any ORM and I'll have to refer to the documentation. And almost every time I'll waste a stupendous amount of time on getting some irrelevant, syntactic detail just right. Only because the ORM insists on an autistic API.

They mean you don't have to deal with the vagaries of SQL for simple queries unless you want or need to

After having worked with Hibernate, SQLAlchemy (the one-eyed amongst the blind), ActiveRecord, DataMapper, Sequel and a few others that I don't even remember, I conclude with passion: They have it backwards.

It's not the vagaries of SQL that I don't want to deal with. It's the vagaries of the ORM du jour that I don't want to deal with.


If all you do is echo anyway, why don't you just let me write plain SQL, for christ's sake?

They usually do more. SQL is not very pretty or friendly IMO, ORMs clean it up a bit (for example allowing arbitrary ordering, sqli, construction of multiple queries from one base). The example I gave was an arel query - very similar to yours if you move the WHERE out into the function name.

Your ORM proposal above looks fine (and very similar to existing ones but with more strings), but personally I find it uglier than the many which are available, and you haven't actually tackled the protection against sqli or explained how that would work. Normally you'd pass that through to the db driver as a $n param, rather than trying to escape a string at the ORM level. If your ORM is going to infer details from the sql, it basically has to parse that sql first and handle all the same things that current ORMs do by using named functions to add SQL - you'd find the same issues with consistency and edge cases I imagine, and of course have to be very careful about parameters and sqli.

So if you don't like ORMs, don't use them, but you do have to be sure that your own ad-hoc home-grown ORM that you're writing instead covers all the bases current ones do. I prefer the syntax of the ones you're denigrating for simple queries, even if for complex queries it might be better to write sql directly (via the functions they provide), but would be interested to see it in action - when are you going to start :)


Or you could just use Sequel and not have to build up your own library.


It's pseudocode for an implementation that doesn't exist. There is nothing in his proposal that requires this hypothetical implementation to copy the value of the price_range variable into the string at all, much less unescaped.

The strings don't even have to be sent to the DB at all. You've utterly missed his point. He wants the language to be intelligent about what the SQL means and do the right thing.

It's not raw SQL, it's an abstraction.


The psuedocode in question happens to be syntactically valid Ruby. If the goal was to demonstrate how simple query generation could be without abstraction layers like Sequel, it is a valid criticism to note that the example given is eliding a necessary feature. Especially since helping avoid injections is one of the primary advantages of such an abstraction layer.

The original argument was that if one could "just write SQL" one wouldn't need to refer to documentation to become productive, "not once". I don't personally think it's practical to use any database library without reading the documentation, but certainly the moment you get away from "the SQL I type in goes straight to the database" you're going to need documentation to tell you what's going to change.

What's being described here sounds like a library that parses SQL fragments, combines them into whole queries, and sends them off to the database. To implement this this without resorting to simple string concatenation needs an intermediate representation of SQL language components. That intermediate representation is going to look a lot like SQLAlchemy, or Sequel, or Arel. That means that instead of "just writing SQL" you're actually adding an extra layer of abstraction and another chunk of documentation to read.

All that said, the "just use SQL" approach is already fairly well served. Most languages have mature bindings to specific databases as well as at least one ODBC-style API to simplify access to those bindings. Nearly all of these have support for client-prepared statements, which handles the interpolation problem. If you want to build more complex queries on the fly, you can use string concatenation or CTEs or decide to use a SQLAlchemy/Sequel/Arel-type library for just that one case.

And, honestly, you don't need to give up SQL to make use of Sequel. I personally really dislike the query composition API demonstrated at the top of this thread, but you can do useful things with Sequel without it:

    db = Sequel.sqlite
    db.execute "create table foos (id int not null, name text)"
    db.execute "insert into foos (id, name) values (1, 'bar')"
    # or
    db[:foos].insert(id: 2, name: "baz")
    db["select * from foos where id = ?", 1].first # ==> {:id=>1, :name=>"bar"}
    # or
    db[:foos]["id = ?", 2] # ==> {:id=>2, :name=>"baz"}
No need to go any further than raw SQL, but increasing levels of abstraction available if you feel like it.


> The psuedocode in question happens to be syntactically valid Ruby.

So is everything else anyone types, code-like or not. He even said "Note how I deliberately shuffled the order and didn't bother with escaping.".

The response was flippant, intelligence-insulting, and obviously the result of failing to read thoroughly.

And speaking of intelligence-insulting, we all know you can run raw SQL through Sequel.

You're not having a useful dialogue, you're being combative, like the person I initially replied to.


I said the code snippet looked like raw interpolation to me, and I asked how it could be made not vulnerable to injection. It was an honest observation and a genuine question. No flippancy was involved. You're free to think I'm an idiot, but you are the one being insulting and combative here.


As everyone has repeatedly pointed out, once you deal with the obvious issues you quickly end up with something that looks like the gazillions of other ORMs on the market; sorry if I hurt anyone's feelings.


There is. MyBatis, which is a minimal ORM that aims to keep you as close to SQL as possible, has support for dynamic SQL: https://mybatis.github.io/mybatis-3/dynamic-sql.html

Edit [responding to moe, below]: that's a matter of taste. I prefer to have my SQL _outside_ my code. If I have to write a little XML to make it happen, so be it. Additionally, unlike your example, a strict separation of SQL and data ensures SQL injection is not possible. That's also a worthy goal.


MyBatis

Does MyBatis have an API similar to what I outlined above?

Because the page that you link to shows a horror that I can't even begin to describe:

  <select id="findActiveBlogWithTitleLike"
       resultType="Blog">
    SELECT * FROM BLOG
    WHERE state = ‘ACTIVE’
    <if test="title != null">
      AND title like #{title}
    </if>
  </select>
SQL mixed with proprietary XML soup? Talk about combining the worst from all worlds...


Get rid of the XML and you have a good start, I'd say.


> Why is there no ORM that works like this?

Because no one wants to write a SQL fragment parser. You would have write a parser that:

a) Could be started at an arbitrary place in a query and figure out what it needed to do.

b) Would be SQL-dialect aware. (Maybe -- it might be possible to skip this.)

Were it not for (a), I'd imagine you could quickly build this by using some kind of debug feature of the database -- sending it a query to parse and getting back a parse tree.


What you're describing isn't an ORM, it's a SQL query abstraction.

Knex is a very convenient, Promise-oriented query abstraction, and it has "raw" methods when only SQL snippets will do. http://knexjs.org/


Generally I'll go with something like this for these conditional search based queries

    select id, blah, boo
    from products
    where 1
    and (title like :title} or if(:title = '', false, true))
    and (price <= :price_range or if(:price_range = -1, false, true) 
    limit :limit
    offset :offset
    
Its a trade off, bit of extra complexity in the query for simpler code logic. Really, I avoid using sql for these types of "search" / find methods, they're probably going to be slow as hell as soon as you get a sizable data set.

I really despise orm's its essentially taking a black box (sql/database) and throwing it inside a much less tested, less optimized, less documented black box.

For what reason, I have no idea, nobody has ever been able to convince me there is a problem that needs solving. Almost every argument i've heard amounts to problems sql already solved a decade ago. I can only speculate but I'll hazard a guess it comes down to lack of enthusiasm to really learn sql properly (even though the person uses it / debugs it daily)


Microsoft's Entity Framework and LINQ work pretty much like that, only with nice IDE support.

I'll try to write an example like that when I get home.

Of course, they're not "just SQL", and I do understand where the above comments come from.

And yes, sometimes I have to go look up how to do things I'd do trivially on SQL.


> Why is there no ORM that works like this?

Because you're only showing a query builder, the "relational" not the "object mapper".

From an OOD point of view, if the end result of that query will be Product instances, why am I using a Select object to create them and why is it having to do some sort of string parsing to determine the objects I'm loading?


if the end result of that query will be Product instances, why am I using a Select object to create them

Because we can just infer the type to be returned via the FROM-clause of the query.

and why is it having to do some sort of string parsing to determine the objects I'm loading?

Because, to cite the immortal Larry Wall:

   The computer should be doing the hard work.
   That's what it's paid to do, after all.
   -- Larry Wall
It has to do the string parsing, escaping and intelligent validation so I, the human, can write

   s.add "count(*) / sum(count(*)) * 100 as percentage"
instead of

  .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
The difference is that most people can read and write the former without thinking.

The latter version may not look much more complicated at a glance. But as we all know these seemingly trivial fragments, more often than not, take ungodly amounts of very frustrating trial & error before they play along.


You don't have to pass a block to select_append. With Sequel, you can do this:

   # db is a Sequel::Database
   s = db[:foobars]
   s = s.select Sequel.lit("max(id) as best_id")
   s = s.select_append Sequel.lit("count(*) / sum(count(*)) * 100 as percentage")
   s.sql
   # ==> "SELECT max(id) as best_id, count(*) / sum(count(*)) * 100 as percentage FROM `foobars`"
The only real difference here is Sequel.lit, which is needed for security (any secure DB interface needs to somehow be notified that strings are safe to put into a query without escaping). If I'm writing code that leans heavily on Sequel to build queries, I'll make a private method #sql that is an alias for Sequel.lit.

It's entirely feasible to use Sequel this way. I've used it like that in production. I loathe the query building DSL, myself, but it's strictly optional.


The only real difference here is Sequel.lit

Well, no.

I repeat: The real difference is that most people can't write even this utterly trivial snippet without studying the Sequel documentation first.

Now what if I want a WHERE-clause? Do I have to use s.where? Or s.select_append("where ...")? What if I need to combine them with AND/OR?

It's not ok that we have to think about all this boilerplate that has nothing to do with our actual query.

We shouldn't have to translate our perfectly unambiguous request from english (SQL) to klingon (Sequel API) in order to have it processed.


I'm skeptical that it's possible to assemble a working SQL query of much complexity unambiguously from an arbitrarily-ordered set of clauses. CTEs, subselects, and boolean conditions would require a nesting construct. Boolean conditions, order clauses, and some joins need to be assembled in a meaningful order. Eventually you're going to find a level of complexity in the query where it's necessary for a developer to read the documentation, either to figure out the API or to understand the method the library uses to stitch queries together.

There might be a sweet spot yet unreached in terms of allowing developers to work with SQL on top of as minimal a native binding as possible. I'm certainly not claiming that Sequel/SQLAlchemy/etc. is at or near that sweet spot. It might be necessary to use a language with language-level support (or macros) to really reach it.


How often do you have a need for non-engineers to be able to alter your database queries? Why is this a requirement or even desirable? There seem to be few cases where a query change needs to be made in isolation, without having to touch any of the downstream consumers of that data.

So if we remove this non-requirement of "anybody can write it", we're back to my original question: Ok, we the coders know we want a Product instance, but to build it we create a Select instance and then ask a library do so some sort of parsing?

Possibly no ORM exists that meet your criteria because there are issues with the design goals you're trying to achieve?


Have you thought about JSON as a partial query placeholder, and all the native methods available for traversal, merging, plucking, etc.?

For example, here is :

{ type: 'select', columns: [{ expression: 'orgJobs.org_job_code', alias: 'job_code' }], joins: [{ table: 'org_jobs', alias: 'orgJobs' }], limit: 100, offset: 0 }

which yields:

select orgJobs.org_job_code as job_code from org_jobs as orgJobs limit 100 offset 0

and so its easy to do things like:

query.limit = 100;

and of course:

query.columns.push(additionalColumn);

and use something like lodash/underscore (or equivalent for your language) to add collection/array/object helper methods so you never have to write for-loops for things like mapping and property plucking.


Now you have to be aware of SQL escaping to prevent SQL injection attacks.


Everyone who works with SQL should be aware of SQL escaping. There's nothing wrong using a library to do it for you, but what you should have an understanding of what the library is actually doing. Otherwise how could you possibly test it? Just by hoping it's right?


One of the things I like about EntityFramework is that you write LINQ, which odds are these days every C# dev should be familiar with. So, standby a few gotchas (which SQL has as well), the code you write to query an in memory collection is the same code you write to query the database. This also makes it fairly easy to test because you can create in memory representations of your data and check you filters, projections, orders etc.


Stuff like your example is where, in general I prefer a sproc, or something more like the following...

http://www.ivc.com/blog/better-sql-strings-in-io-js-nodejs-p...


there is, it's called anorm. https://www.playframework.com/documentation/2.3.x/ScalaAnorm

You don’t need another DSL to access relational databases SQL is already the best DSL for accessing relational databases. We don’t need to invent something new. Moreover the SQL syntax and features can differ from one database vendor to another.


Micro ORMs usually works very much like this. There is a problem when you want sub records, they are handled differently in each library or not handled at all.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: