
Better SQL strings in io.js - qooleot
http://ivc.com/blog/better-sql-strings-in-node-js/
======
jdc0589
I don't get the point. I have always had a "sql" directory in my project with
named `.sql` files for all my queries. Then just load them up with a
`require.extensions["sql"]` override, or another mechanism. You should be
using sql parameters anyway, formatting/interpolation shouldn't even be a
consideration.

Plus, keeping queries in their own files makes the change history nicer.

~~~
qooleot
Hey you do you a middleware you use for this? I'd be interested in trying it
out for cases where I have large, semi-static queries and want the entire file
to be sql-language and not mixed with node code.

Also, how do you handle queries that are not pre-defined? I have a lot of
screens where the user can select 'categories' (a collect of join-statements
basically) and pick their own columns (including creating formulas so I can't
just select *), sort, group-by clauses, and "pivot" (using CTEs or the
crosstab feature) dynamically.

~~~
emidln
In clojure, I define any queries that are semi-complicated in their own .sql
files with simple ? parameters (using yesql to load them and send them across
parameterized). The really dynamic stuff (off an open-ended HTTP API for
example) tends to get built out on top of a SQL abstraction layer (not quite
an ORM, but close) called HoneySQL that turns data structures (easy to
manipulate) into SQL. Doing a similar thing with Python using SQLalchemy (the
other language I write DB code in sometimes) for the highly dynamic queries is
reasonable. I don't write db queries in other languages to be familiar enough
with their database libraries, but you should be able to accomplish something
similar.

------
femto113
Template strings will be awesome for lots of purposes, but the given example
is poorly conceived. The template in the original is separated from the bound
columns in all manners except for order (e.g. change the name of your
num_array variable and the SQL doesn't change at all). The second approach
creates a tight coupling between the query and the source of the values (i.e.
change a variable name and you must change the SQL source as well). If you
imagine a more mature system where the SQL is either generated (or stored
separately in hand-tuned files, or) you can see why this coupling is
problematic.

The specific gripe of "What if you want a new parameter near the beginning?"
is not an new problem: it has been addressed in SQL clients by implementing
named parameters. There is not a universal standard sadly but competent
solutions exist for node and postgres (cf.
[https://github.com/bwestergard/node-postgres-
named](https://github.com/bwestergard/node-postgres-named)).

------
rpedela
Cool! Does it automatically escape variables if a prepared statement cannot be
used such as variables within the SELECT clause?

A similar library which implements Postgres format() for Node:
[https://github.com/datalanche/node-pg-
format](https://github.com/datalanche/node-pg-format)

Disclaimer: I wrote it.

~~~
qooleot
Hey nice I actually looked at this library when I started. I really like the
sql identifier escaping, where most just do literals.

I'd thought about:

"select * from %I${my_table} where ..."

I think I'd probably feel more comfortable having a whitelist of allowed words
rather than just a blacklist, as there are probably ways around it. Either new
syntaxes in future Postgres versions, or mid-sql comments. Take this for
example:

 _Edit_ : sorry the * before the word "comment" below is being escaped by
hacker news. There is probably some hackernews-injection to get around this:}

select/ _comment_ / * from foo;

if "select/ _comment_ /" was the identifier, that's valid sql, but:

function isReserved(value) { if (reservedMap[value.toUpperCase()]) {

does a comparison on an exact match, not contains.

[https://github.com/datalanche/node-pg-
format/blob/master/lib...](https://github.com/datalanche/node-pg-
format/blob/master/lib/index.js)

~~~
rpedela
I like the whitelist idea. If you have time, maybe you could create an issue
or PR which goes into more detail on your thoughts.

As for your specific example, I am not sure I understand the issue. The SQL
keyword list (reserved list) is only used to determine if the identifier needs
to be quoted. So if your identifier is 'select', the library will quote that
because you can use SQL keywords as identifiers as long as they are quoted.
Your specific example produces this:

JS: format('%I * from foo;', 'select/comment/')

SQL: "select/comment/" * from foo;

Alternatively I could add an option to quote all identifiers regardless. I
might do that. :)

EDIT:

I didn't see your comment edit before responding. I would appreciate it you
created a Github issue where you can get the SQL formatting right. If the
library is doing something wrong, I want to fix it.

~~~
qooleot
An example of extending your reserved list would be the world "lateral" (new
type of join in 9.4). I'll do a PR if I get that far:}

RE: comments, yes it would be quoted and that would cause an error.

select/ _comment_ / 1; ?column? \---------- 1 (1 row)

# "select/ _comment_ /" 1; ERROR: syntax error at or near ""select/ _comment_
/"" LINE 1: "select/ _comment_ /" 1;

That's probably a silly example, and I was just trying to think through 'what
could go wrong with this?' before implementing. Thinking about it further,
thats probably not something you want the library to be responsible for
handling. I'd want to try out newline chars too since thats more common.

~~~
rpedela
Yeah I do need to go through the list of supported SQL keywords for 9.4 and
update the library. Thanks for the reminder.

The main goal of the library is to behave like Postgres format() plus some JS
sugar like handling an array of identifiers. If the library behaviour deviates
from Postgres format() then it is a bug.

BTW I personally like the %I${var} syntax you mentioned earlier.

------
ericclemmons
I'd still recommend using something like
[http://knexjs.org/](http://knexjs.org/) for prepared statements and the like.

Still, being able to `yield` queries was the motivation to use generators +
harmony for me.

~~~
je42
Nice. Looks like sqalchemy for js !

------
teddyuk
You really should use stored procedures to provide a layer of abstraction
between the code and database - you can't easily refactor the db schema + keep
your application online when your application access tables directly.

~~~
WorldWideWayne
I think you and I are probably spoiled by SQL Server :)

Postgre, MySQL and Oracle all have severely limited stored procedure
capabilities compared to SQL Server. Postgre can't even return multiple result
sets from a procedure or use transactions.

~~~
tracker1
Honestly, PostgreSQL is probably a bit more flexible for most use cases than
MS-SQL... and with plv8, even more so... I usually use the database as
relatively dumb storage anyways, so very rarely will create a view or proc
when needed.

Some bits in MS-SQL are somewhat painful... powerful, but painful.. dealing
with XML types is particularly nasty. I can't really compare that on
PostgreSQL's side... What gets me with MS-SQL is that I can setup and run with
very little need for digging into advanced configuration options. Replication
is relatively simple, and easy to do (though expensive).

On the other side, having played with PostgreSQL + plv8, it's _NICE_ ... if I
could get a full node environment inside PostgreSQL (node modules as function
libries), that would be amazing.

~~~
qooleot
You can actually load libraries in plv8. Its cumbersome, and not a simple:

var _ = require('lodash');

but it works. Part of the reason is plv8 is a _trusted_ language, meaning it
cannot read code from the file system. Basically, the libraries are loaded as
text into a table, and then evaluated by v8 using the library loader. Someone
could make a gulp task to make that a quick command line tool if they did it
often enough....

------
thom_nic
Is it just me or does the author ignore that his approach is vulnerable to SQL
injection unless the parameters are sanitized/ escaped?

Most SQL libraries in other languages that have interpolation seem to offer a
named param option where parameters can be passed as a hash, which gets around
the indexed-parameter issue while still properly escaping values.

That said, I'm thrilled that ES6 is finally getting string interpolation.

~~~
qooleot
Hey yes, sorry if not clear, but I definitely output a prepared statement and
do not homebrew any escaping! I'll take a look at adding a "TL;DR" section at
the top to point that out before the second page where I actually go through
how it works:

[http://ivc.com/blog/better-sql-strings-in-io-js-nodejs-
part-...](http://ivc.com/blog/better-sql-strings-in-io-js-nodejs-part-2/)

~~~
pfooti
Huh, wow. I hadn't realized template strings can be used that way in ES6. Here
I was ready to snark about sql injection. Instead I ended up learning that you
can really tweak how these template strings get compiled into code, in such a
way that you totally end up actually emitting $1 $2 stuff.

That's pretty awesome - I am using a SQL generation library (knex.js), which
I'm admittedly happy with, but primarily because I don't like dealing with $1
$2 $3 in my own code.

Now if you had a way to programmatically expand an array (for an IN ($1 $2 $3)
type of query where the array length isn't known ahead of time), I really
wouldn't need to use knex.

~~~
qooleot
RE: expand array for in-clause, its definitely not supported automatically by
node-postgres:

[http://stackoverflow.com/questions/10720420/node-postgres-
ho...](http://stackoverflow.com/questions/10720420/node-postgres-how-to-
execute-where-col-in-dynamic-value-list-query)
[https://github.com/brianc/node-
postgres/issues/431](https://github.com/brianc/node-postgres/issues/431)

which I confirmed:

promise catch err: { [error: invalid input syntax for integer:
"{"45","56","33"}"]

but I can make this work by altering my 'sql' tagged template string function,
and inspect for typeof <var> array and automatically expand. I wonder if there
are there other purposes for arrays other than the in-clause syntax that I
should check for.

~~~
qooleot
Followup on this. To avoid the in/not-in <array> expansion, the trick is
actually to use:

foo=any($1)

where $1 is an array instead of in ($1, $2, etc.). != any($1) is the same as
'not in'.

------
vkjv
I know it's a hack, but I always wrote multi-line strings as a function with a
comment. Super simple case:

var multiline = (function() {/*

I am

a multi-

line string.

*/}).toString().slice(16, -4);

Couple this with a proper parser for the syntax and a template library and it
has served me well enough.

------
failed_ideas
I wouldn't call that better. I'd much rather have a self escaping sql literal,
something like:

    
    
        var query = "select id, name from table where id = #{id:int}"

~~~
lucian1900
That won't use a parametrised query, though.

~~~
failed_ideas
Why not, I'm talking about a construct that doesn't exist. A SQL literal
string can be stored and auto parameterized prior to interpolation. You see
the :int isn't the name, it's the parameter/type safeyness that any good dev
would want.

------
netcraft
I was really hoping that the new string templates could clean up SQL from js,
but I hadn't researched it yet. This looks great.

------
eatonphil
Looks just like heredocs[0] in every other language. Any difference? How does
the implementation change from other node heredoc libs[1]? Syntax-level
support?

[0] [http://tldp.org/LDP/abs/html/here-
docs.html](http://tldp.org/LDP/abs/html/here-docs.html)

[1] [https://github.com/jden/heredoc](https://github.com/jden/heredoc)

~~~
jkrems
This is _not_ string interpolation. ES6 template strings can be used for
multiline strings and string interpolation but that's just the basic case.
What this article uses is "tagged template strings" \- template strings where
the interpolation is handled by a custom function. This way you can write
sql`where name = ${name}` and the name parameter can be automatically
escaped/sanitized.

~~~
mhd
Which seems only one step removed from calling a function with the
heredoc/template string as a parameter (namely a standard way to extract
template values).

Not saying that this isn't nice, I've always appreciated long string support
that in Perl/Python/Ruby.

------
andyfleming
What about backtick collisions with a SQL statement like the following?

sql` insert into `example table`...

~~~
qooleot
That is invalid in javascript, just like this is invalid:

var foo = "string "with quotes" done."

You can escape the inner backquotes \\` but then its just a backquote
character in the string. It doesn't do anything special like call the template
string again.

~~~
Arnavion
Backticks are used around names that aren't valid SQL identifiers. In GP's
example, he has a table with a space in the name ("example table") so it needs
the backticks around it. It's not his intention to have the string be
templatized twice.

~~~
mhd
Only in MySQL, as far as I know. And doesn't even that support the standard
way of using regular quotation marks?

------
mister_m
Are stored procedures not something that people are using any more?

~~~
untog
I used to work on a system that require a stored procedure for every database
call - it became very annoying. I understand that stored procedures used to be
the only way to ensure a query was optimised, but that isn't the case these
days.

~~~
blktiger
Don't most database libraries convert parameterized queries into stored
procedures at runtime? None of the overhead of manually managing stored
procedures, but all of the safety benefits.

~~~
towelguy
And the query stays in your version control instead of the server.

------
flockonus
just [http://knexjs.org/](http://knexjs.org/)

~~~
qooleot
I really like knex, and have projects that use it too!

Sometimes, and especially on an intranet/admin tool thats not meant for
widespread consumption, I start with a sql query (or an analyst hands me one)
and it just needs to be a report/chart on a web page and isn't mean to be part
of a product's API and flexible. Converting it to knex, and especially if the
analyst changes their mind and gives me a brand new sql query, isn't really
worth the effort (opportunity cost of working on that code vs something more
fruitful).

