
Sqlfmt: an opinionated online SQL formatter - mjibson
https://www.cockroachlabs.com/blog/sql-fmt-online-sql-formatter/
======
justin_oaks
Tangentially related. I'm of the opinion that SQL shouldn't be uppercased. I
don't want to hit shift or caps lock and I find lowercase easier to read.

Are there others out there like me? I assume I'm in the minority, but I don't
understand why. People stopped uppercasing HTML tags, why can't we do the same
with SQL?

~~~
vbezhenar
I never understood why people want to uppercase anything. It's like source
code from 1970, when even bold highlight was something unavailable and
developers used uppercase for keywords. I don't uppercase SQL, I prefer
software which does proper highlighting.

~~~
bagol
In the beginning, there was uppercase

~~~
labster
Originally, back in the Roman era, you had to delimit your VPDATE statements
with interpuncts.

------
mjibson
Author here. Based on feedback I am going to make a sqlfmt binary for easy
editor integration.

You can currently already do this by using the CockroachDB binary
([https://www.cockroachlabs.com/docs/releases/v2.1.0-beta.2018...](https://www.cockroachlabs.com/docs/releases/v2.1.0-beta.20180924.html))
but understandably people want a more minimal solution.

~~~
egorfine
Hey, thank you for the formatter! I really like the way it formats statements.
Although I have found two issues:

1\. It lowercases table and field names, rendering the formatted statement
useless for a project with not exclusively lowercased names. Could you please
fix this?

2\. I'd appreciate accepting common placeholder "?" as a valid part of the SQL
statement so that we can format queries with placeholders. Would that be
possible?

~~~
mjibson
The formatter adheres to CockroachDB rules which is why it acts like you
describe. Case is insensitive in names unless you double quote it. Postgres-
style (CockroachDB is one) uses "$1" for placeholders instead of "?". Adding
"?" would complicate our grammar enough that it's not worth adding it.

sqlfmt is not super useful for non-postgres grammars.

~~~
crooked-v
Consider those of us stuck with Java adapters to Postgres, which means
Postgres syntax but ? for placeholders.

~~~
mjibson
Is that a thing? The Java driver must actually parse those and convert them.
Java is crazy.
[https://github.com/mjibson/sqlfmt/issues/25](https://github.com/mjibson/sqlfmt/issues/25)
is the issue to track '?' as placeholders.

~~~
lilactown
It might be crazy, but having just gotten my hands dirty with some seriously
DB programming, JDBC has a hell of a lot of value in my view.

------
sbuttgereit
If you're working with something close to PostgreSQL syntax support, why
wouldn't you start with pgFormatter
([http://sqlformat.darold.net/](http://sqlformat.darold.net/))? It's been
around for awhile and I can't see anything new or substantially more advanced
in the proposed solution.

I have to say, I do most of my development work in the database, and I'm not
thrilled with any of the automatic formatters thus far. The ones I've seen can
do a good job in many cases, but if you start getting very advanced at all,
the weaknesses of an automatic formatting approach start to become more
evident. It may just be that anything other than relatively simple SQL, PL/SQL
are not well suited for automatic formatting.

I'm still working through a good set of formatting rules, and I always find I
end up making exceptions for readability.

~~~
coder543
pgFormatter formats

    
    
      SELECT 1, 2
    

as

    
    
      SELECT
          1,
          2
    

which is one of the core complaints the article addresses.

~~~
sbuttgereit
That actually seems like a relatively minor complaint on the order of
submitting a pull request to an existing fairly good project (as is
pgFormatter) rather than coming up with standard 15 for SQL formatting in a
new tool. An option where, if the column text doesn't overrun a column limit,
is single line and then split once it does overrun doesn't sound like a bad
feature, but it does sound like a small one.

(addition) More than anything that's wrong with all of these formatters for
SQL is that there real consensus for what good formatting is. There are
historical norms for sure, and that's likely a starting point, but
reconciliation between the communities around each database vendor (and
sometimes each company developing with that database) make that a difficult
challenge. So this new tool will become the CockroachDB formatter, pgFormatter
is as far as I know the most popular PostgreSQL formatter, other databases
your mileage will vary. Each not really an SQL formatter, but an SQL formatter
for (stick affinity here). For CockroachDB, I'm not sure how much payoff there
is for moving away from the more popular tool for Postgres if they're
stressing their similarity to PostgreSQL SQL.

~~~
kierenj
It sounds like this formatter is sufficiently different in its approach (based
on the prettier paper) that a PR would basically rewrite the existing one?

~~~
sbuttgereit
My point is why create something new on any basis, if there's a fairly good
solution to the problem that you can help improve?

Sure, once you've gone and created a brand new tool you've gone past the point
of a PR unless you were starting with a fork. My original question was, why
jump to a new tool out of the gate when the differences seem fairly minor.
Yes, I'm sure that pgFormatter probably didn't start with the prettier
paper... but so what? If you aren't coming up with something really different,
I'm not sure that matters.

Actually, thinking about it. What I need more than an SQL formatter is
something closer to a linter quite frankly. In complex queries it's easy to
miss something... I know, I know... I'm probably talking about something more
like an IDE since many of the errors are likely to be logical rather than
syntactic, but still... that would be a better problem to solve than a
formatter right now.

------
fefe23
Brief question, only tangentially related:

Is it me or is "opinionated" software popping up all over the place? It looks
to me like the author considers it a selling point, too.

Can someone explain to me why I would want an "opinionated" software over any
regular old software? Especially over a configurable software?

I'm not a native speaker, but I never encountered "opinionated" being used
with positive connotation. It has so far always been in the "well, uh, he's
very opinionated!" sense of apologizing for that old grumpy person who gets on
everybody's nerves.

~~~
abraae
The word does indeed have initially negative connotations.

Perhaps it resonates more positively now to people who have had to work with
"flexible" software such as Spring in the old days.

There were 100 ways to skin a cat with old Spring, and each way was a tortuous
devil's brew of XML. Googling for any solution was a nightmare, exacerbated by
hundreds of wannabe experts polluting the interwebs with their own strange
takes.

Spring Boot was introduced in response as an opinionated approach. You can
still skin the cat in 100 different ways, but at least you start with sensible
defaults that someone far more knowledgeable than yourself had deemed to be
good.

~~~
MaxBarraclough
That's a good point.

In programming, there's the idea that "The code shouldn't _merely_ work, it
should _clearly_ work". If your configuration system becomes so complex that
the same principle applies there, that's a bad sign.

------
gwbas1c
Great idea.

I work on a project that's grown in complexity and team size. Developers have
come and gone. Sometimes a bit of code that's grown over a few years needs its
style to evolve. This kind of formatter clearly realizes this.

What I tell developers that complain about this or that style is that it's
more important to have a consistent code style throughout the lifetime of a
project, then to adopt whatever bracing style someone happens to like at the
time.

I'd really like all of my code to be automatically formatted with a tool like
this, as part of the build process. This would help with situations where code
evolves, as it would reformat it when needed.

------
ianmcgowan
The problem with opinions is everyone has a different one.

I like [https://www.sqlinform.com/](https://www.sqlinform.com/) \- keep a copy
of notepad around just so I can use it. Wish there was a VSCode extension. The
free version was good enough for my needs, but it's so handy I upgraded to the
pro version just to show appreciation. The best part is the ability to really
customize the layout and save the settings. It's always step #1 when someone
sends me a massive blob of badly formatted SQL.

~~~
gregwebs
works for the mentioned case of SELECT 1, 2

Web site is kind of horrible.

------
clarkdave
This looks great! I notice you mentioned Prettier in the post — do you have
any plans to integrate sqlfmt with that?

Prettier actually has a beta Postgres plugin, but progress on it seems to have
stalled. Hooking it up with sqlfmt would be neat - then prettier could format
SQL inside other languages too, like code fences in Markdown or template
literals in JS.

~~~
joobus
I wish they didn't stop development: [https://github.com/benjie/prettier-
plugin-pg](https://github.com/benjie/prettier-plugin-pg)

I use plv8 very often, and this would be perfect.

------
JelteF
Looks really cool, but I have one suggestion. Have you considered using
leading commas? It makes big SQL statements much easier to edit and it
improves diffs. I normally use trailing commas for my code, but most sql
dialects don't support it so I fallback to leading ones there.

~~~
orthecreedence
Oh god, no. No leading commas, anywhere. Not in javascript, not in SQL. It's
like nails on a chalkboard.

Funny because I was reading other people's comments and thinking "wow, that
person is VERY opinionated on query formatting" (but now here I am).

~~~
stubish
> Funny because I was reading other people's comments and thinking "wow, that
> person is VERY opinionated on query formatting" (but now here I am).

There are opinions, and there are crimes against humanity (although that is a
matter of opinion).

------
llimllib
I'm sad this isn't available as a binary, it's not much use to me as a
website.

~~~
williamstein
I searched around and found
[https://github.com/jackc/sqlfmt](https://github.com/jackc/sqlfmt), which is
an older open source project also called sqlfmt, also written in Go. I don't
know anything about how they compare.

~~~
lopezator
It simply doesn't work with cockroachdb it throws syntax errors with valid
schemas/queries.

------
PaulJulius
I was just wishing for a better SQL formatter! I'm wasting too much time
pasting queries into a doc just to reformat them.

It would be great if the parser supported question marks as placeholders. The
pg_stat_statements table in Postgres saves denormalized queries, replacing the
literal values with '?' placeholders (e.g., `SELECT * FROM table_name WHERE id
= ?`).

------
blairanderson
I love this. Now to implement as:

\- javascript lib / (free atom plugin with this)

\- jetbrains plugin

\- vim plugin

------
fiatjaf
What I want: an SQL (actually, it could be just Postgres as I only use that)
parser that takes two schema declarations and calculates the commands needed
to go from one to the other, so we can have automatic migrations.

For example, if A is

    
    
      CREATE TABLE person (
        id serial PRIMARY KEY,
        name text
      )
    

and B is

    
    
      CREATE TABLE person (
        id serial PRIMARY KEY,
        name text,
        age int
      )
    

Then this tool would output

    
    
      ALTER TABLE person ADD COLUMN age int

------
DaiPlusPlus
I’d love it if StackOverflow would let me select code text and automatically
reformat it according to some standard spec (I don’t care which set of
opinions are used provided it’s readable!) - too many inexperienced (or
unqualified, it seems) people copy and paste ad-hoc written SQL that I always
have to manually reformat myself when I edit their questions for readability.

------
amake
I would love to package this (most likely from
[https://github.com/lopezator/sqlfmt](https://github.com/lopezator/sqlfmt))
for my package manager of choice. What is the license?

~~~
lopezator
Not very aware of license internals, as it uses CockroachDB codebase we'll
better ask @mjibson and add a LICENSE to "lopezator/sqlfmt" for better
understanding depending on their answer :)

------
weaksauce
that would be really cool to have as a local package you can plug into your
text editor instead of copy and paste. people using this at companies might
leak sensitive information or possible attack vectors to the opaque service.

~~~
mjibson
See my other comment. There are a few options for a local binary.

~~~
weaksauce
cool thanks!

------
kyberias
My opinion: text should never be aligned to right as the keywords here are.

~~~
lazzlazzlazz
That's exactly what makes this formatter so good.

The style is similar to
[https://www.sqlstyle.guide/](https://www.sqlstyle.guide/), which is
excellent.

------
gfody
I would challenge the decision to UPPER CASE keywords, it looks oldschool and
is unnecessary since syntax highlighting is ubiquitous. And right-justifying
is something that only looks clean for the simplest of queries and wreaks
havoc with editors that detect mixed tabs/spaces.

~~~
just_myles
I still prefer upper case and indentation. It's cleaner imo. Lower casing
makes everything blend together regardless of modern editors highlighting the
text for key commands.

------
qaq
standalone Query parser would be nice :)

~~~
lfittl
In case you're looking for something Postgres syntax compatible, and find cgo
acceptable:
[https://github.com/lfittl/pg_query_go](https://github.com/lfittl/pg_query_go)

(this uses the actual Postgres parser wrapped into a library)

~~~
qaq
Wow nice thank you for pointing it out!

------
keithnz
not sure what this offers over

[https://sqlformat.org/](https://sqlformat.org/)

or

[http://www.dpriver.com/pp/sqlformat.htm](http://www.dpriver.com/pp/sqlformat.htm)

both of which seem to give nicer output and handle more diverse sql without
erroring.

also not really sure why it considers itself "opinionated"

------
WorldMaker
Why all the SCREAMING? Is something wrong?

It's 2018, we aren't using teletypes with bad shift keys and we all have
syntax coloring IDEs, we don't need to scream out all our KEYWORDS like we
were still writing COBOL or FORTRAN in the 1960s.

It's become my biggest pet peeve with most SQL formatters. I think sentence
cased SQL is far more readable.

    
    
        Select * from sometable where SomeColumn > 3
    

But I realize I'm a minority with the opinion right now.

~~~
DKnoll
Yours is a fair opinion and I can't come up with any reason why it's wrong.

Personally I like it because it clearly separates keywords from objects
(tables, columns, databases, etc).

This is very much like the tabs vs. spaces debate.

~~~
WorldMaker
Syntax highlighting already clearly separates keywords from everything else,
and more flexibly to individual taste. (The keywords being both uppercase and
bright bold purple on the linked page here particularly caught my personal
distaste.)

Certainly there are still far too many places people are
writing/reading/interacting with SQL without the benefit of syntax
highlighting, but we have the technology to solve that (and in many cases need
to solve that; a lot of the places that SQL is embedded and the outer language
has no idea about are opportunities for SQL injection attacks).

> This is very much like the tabs vs. spaces debate.

It's sadly worse than that. It's something like 60 years of momentum versus
user/developer experience and readability concerns.

Despite my under-appreciated jokes that most SQL databases probably don't need
to be shouted out to do their jobs (Oracle maybe being an exception. ;), it
actually is a serious readability issue.

We have decades of research showing that people read/scan by word shape and
that lower-case and mixed-case is far faster and easier to scan/read than all
caps. 'from' has a more interesting and relatively more unique shape than
'FROM', making it easier to spot. (Especially in a monospace language where
the outermost shape of all four letter all caps words is essentially the same
width and height rectangle.) (If you try a syntax highlighting scheme that
does something more interesting than bolding keywords, italicizing them with a
good italic monospace font, the lower-case word 'from' can be almost beautiful
and very easy to find. A nice benefit to syntax highlighters is that sort of
variation is a personal option between you and your editors of choice.)

Typesetters have known for a long time that all caps is slower to read, and
should be used sparingly because it exhausts readers.

Sentence-casing SQL statements benefits from the readability of all the
English we read everywhere else.

I make jokes about shouting in our database queries, because it's sadly easier
than changing more than a half-century of momentum in the status quo, but I do
think that all-caps keywords in SQL statements is a huge papercut in dev
productivity/ergonomics that we have the technology to fix but apparently not
the interest/willpower.

------
datavirtue
"As my code gets more complicated or deeplier nested"

OMG will someone please tell college professors about Clean Code? Perhaps it
takes years of writing deeplier nested code before you even know what Uncle
Bob is talking about.

