Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is an aside, but a colleague years back showed me his preferred method formatting SQL statements, and I've always found it to be the best in terms of readability, I just wish there was more automated tool support for this format. The idea is to line up the first value from each clause. Visually it makes it extremely easy to "chunk" the statement by clause, e.g.:

  SELECT a.foo, b.bar, g.zed
    FROM alpha a
    JOIN beta b ON a.id = b.alpha_id
    LEFT JOIN gamma g ON b.id = g.beta_id
   WHERE a.val > 1
     AND b.col < 2
   ORDER BY a.foo


This is the style I've settled on lately, where all the major keywords are left-aligned and the clauses are consistently indented. It uses a bit more vertical space, but I find it easier to read than any other formatting style I've seen (in the wild or produced by formatters).

    select
      a.foo,
      b.bar,
      g.zed
    from
      alpha a
    join
      beta b
      on b.alpha_id = a.id
    left join
      gamma g
      on g.beta_id = b.id
    where
      a.val > 1
      and b.col < 2
    order by
      a.foo
It's really easy, for me anyway, to get an overview of the query with this style compared to styles that are more cramped or that are inconsistently wrapped/indented.

For simpler queries, I think this is okay too, but only if the clauses easily fit on a single line:

    select ...
    from ...
    join ...
    where ...


I do this style too, but think of each new row as basically a new table being added to the query (especially if it's a query, which may go over several rows and is indented):

    from
      alpha a
      inner join beta b on b.id = a.id
      left outer join gamma g on g.id = a.id
      left outer join (
         select z.id, count(\*) as cnt from zeta z
      ) delta on delta.id = a.id
    where
      ...


That is my preferred style.

If there are many joins, I could also add more indention to show what is being joined to what.


My company style guide explicitly specifies this.

https://github.com/republicwireless-open/sql-style-guide

I think Mozilla's does as well.


It's also easier if you want to comment out certain parts of your code during debugging.


On this, I like to also use

   Where 1 = 1
   And...
   And... 
Which makes it easy to comment out specific filters.


Yup, my code is littered with WHERE true AND ...


10 years messing with data across 5 major flavors of sql and this is the format I've settled on and advocated for my team. It seems to flow the best and be easiest for people to get used to.


I've found that using hanging indents and using prefixed binary operators, like you did here, is the most readable form and when adding additional clauses, makes for a readable diff.


I find it more readable as

  SELECT a.foo, b.bar, g.zed
  FROM alpha a
  JOIN beta b ON a.id = b.alpha_id
  LEFT JOIN gamma g ON b.id = g.beta_id
  WHERE a.val > 1
    AND b.col < 2
  ORDER BY a.foo
Usually only the conditions get deep and can also use extra indented parenthesized parts.

reminder: don't use 'OUTER' it's pure noise


We are doing something very similar, except we add extra indentation to the joins like you did for the second condition in the WHERE. This is because we strongly separate each block - SELECT, FROM, WHERE, <ORDER>, so everything that is in the FROM block is indented. Same for the SELECT if it so long it goes to a second or third row.


I haven't seen my formatting pattern listed, so here is the equivalent how I'd write it:

  SELECT
      a.foo, 
      b.bar, 
      g.zed
  FROM alpha a
  JOIN beta b ON a.id = b.alpha_id
  LEFT JOIN gamma g 
      ON b.id = g.beta_id
      AND a.id = g.alpha_id
  WHERE 
      a.val > 1
      AND b.col < 2
  ORDER BY 
      a.foo;


My problem with formatting any code like this is that it can make diffs painful. I agree that this looks better but I would say only marginally so. And I really have no problems reading code that isn't lined up like this. I don't really have a high care level, though. I'm happy to go with the team on this one.


I don't see why you think it would make diffs painful. If anything, in my experience it makes diffs easier because each chunk can be put on it's own, independent line so that if you change anything it is constrained to the relevant line.


It makes diffs harder because maintaining the indentation rule (sometimes, depending on what is on other lines) requires changing every line of the query if you go from “INNER JOIN” (equally, outer/right/cross join) to “LEFT JOIN” (equally, full join).


Nearly every diff tool has -w for this, though: main main annoyance with GitHub is that I can’t enable this as the default diff mode.


Yep—`-w` is the default when I blame in my editor but ya, github is really the problem.


At least GitHub now has a UI for enabling it. I remember the dark ages when you had to put ?w=1 on the URL like some sort of animal.


The indentation rules never change. Fortunately, "SELECT" at six letters is as long as the longest first word that starts a clause, which is why when doing, for example, a "LEFT JOIN" you line up the "LEFT" and not the "JOIN, e.g.

  SELECT a.foo
    FROM alpha a
    JOIN beta b ...
would become

  SELECT a.foo
    FROM alpha a
    LEFT JOIN beta b ...
Any diff tool correctly highlights the only change is the LEFT.


Ya, was thinking after I submitted that SELECT and DELETE are always going to be the longest anyway, so it always works for SQL!



Jetbrains tools - specifically, PyCharm & Intellij IDEA - support this (gutter alignment) via the "Joe Celko" code style. With that enabled, you can autoformat you SQL to follow that convention.

Oddly, DataGrip doesn't provide this option.


Would it be rude of me to ask you to open an item on https://youtrack.jetbrains.com/ for this please? I'm just a curious JetBrains DataGrip user.


https://www.sqlinform.com/ is an excellent SQL formatting tool, where you can tweak the settings to whatever extent you like. I keep it around in notepad++ just for formatting SQL the way I like, even when writing/testing the SQL in VSCode or a DB tool.


I've taken to using a similar format too, though some seem to dislike it significantly. Other things I like for clarity and editing ease are prefix commas and lining up like parts, using something like your example:

       SELECT a.foo
            , b.bar
            , g.zed
         FROM alpha a
         JOIN beta b  ON a.id = b.alpha_id AND a.another = b.thing
    LEFT JOIN gamma g ON b.id = g.beta_id 
        WHERE a.val > 1
          AND b.col < 2
        ORDER BY a.foo
or

       SELECT a.foo
            , b.bar
            , g.zed
         FROM alpha a
         JOIN beta b  
              ON  a.id      = b.alpha_id 
              AND a.another = b.thing
    LEFT JOIN gamma g 
              ON b.id = g.beta_id 
        WHERE a.val > 1
          AND b.col < 2
        ORDER BY a.foo
I'm not consistent with the layout of my joining predicates - I go for whatever seems clearer given the current circumstances and that varies due to several factors (number of parts, length of column names and/or functions, ...). How sub-queries and instances of CASE are broken into lines and indented is something I also vary on.


> though some seem to dislike it significantly

I can see why. The indentation of the whole statement is not determined by the first line, but by the 6th on the first and the 8th on the second on a `JOIN` clause. It's really arbitrary, and when you have that statement between other code, it's going to be weird how the start of the statement is much more indented than its preceding code. I really dislike it, too.

I prefer the use of indentation to signal what's inside another syntax structure. So, for example, I also dislike how you aligned `ON` and `AND` when the `AND` is inside the `ON` expression. It makes it seem like the two lines are on the same syntactic level.

Here's how I do it:

  SELECT a.foo
       , b.bar
       , g.zed
    FROM alpha a
    JOIN beta b  
      ON a.id = b.alpha_id
        AND a.another = b.thing
    LEFT JOIN gamma g 
      ON b.id = g.beta_id 
    WHERE a.val > 1
      AND b.col < 2
    ORDER BY a.foo
You might also notice that I removed the padding you used to align the `=` signs. I dislike big changes where the only thing that changed for a line is the whitespace padding. It obscures the real change. It might not seem like a big thing when you only have 2 lines in alignment, but it's a real bother when reading a diff that does that for more lines. You have to compare between the - and + lines to find what really changed instead of the diff telling you outright.


> and when you have that statement between other code

In many circumstances (chunks of SQL in SSIS steps and so forth) there is no code to be around. Within longer tracks of SQL I'll not let "LEFT OUTER JOIN" push the whole statement to the right, but will drop the table name to the next line and maintain the alignment from there on.

> removed the padding you used to align the `=` signs. I dislike big changes where the only thing that changed for a line is the whitespace padding. It obscures the real change

For modifying existing code, I will sometimes let the alignment slide to avoid unnecessary extra lines of change. Though a good visual diff tool will have the option to ignore white-space-only changes and I'm not concerned about patch sizes being a few lines bigger.

To fix the alignment there may be a subsequent check-in that tidies up non-functional elements of the code. I'm happy to have an extra item in history in order to maintain readability over a stream of unaligned text.

Something I've wanted for a long time in code editors is for tabs after the first non-whitespace character on a line to line up, that way such alignment could be managed without extra effort or extra lines in a diff. There would need to be some simple heuristics to control breaking the alignment (and these may need to vary between languages) and perhaps some way to control/override them (a directive in comments?) in edge cases.


Maintaining alignment in these queries seems a pain. I'd prefer the regular, newlines and fixed indentation; e.g.:

    SELECT a.foo, b.bar, g.zed
    FROM alpha a
    JOIN beta b ON a.id = b.alpha_id AND a.another = b.thing
      LEFT JOIN gamma g ON b.id = g.beta_id 
    WHERE a.val > 1
      AND b.col < 2
    ORDER BY a.foo
(bonus: "AND" got accidentally aligned with the end of "WHERE")


> Maintaining alignment in these queries seems a pain.

I use tabs.

    SELECT   t.foo, t.bar
    FROM     a_table t


This escalated quickly.


I like your second version. My own style, still evolving, is to write more lines and align further left

  select
    a.foo
    , b.bar
    , g.zed
  from alpha a
  inner join beta b on
    b.alpha_id = a.id
    and b.thing = a.another
  left  join gamma g on
    g.beta_id = b.id
  where
    a.val > 1
    and b.col < 2
  order by
    a.foo


I like the idea in general, I have tried something similar before. But I've never understood the appeal of leading commas. It screws up your alignment and just looks messy.


Personally I like leading commas for the ergonomics rather than the aesthetics. When I'm developing or debugging a query the first column is typically the one I'm least likely to change. I tend to build up the query from there, so the last columns are the ones I'm most likely to change or to comment out. Plus I find it easiest to interpret the result set when columns that I'm using as a temporary reference are at the very end of the row. So for the way I work, I've found that with leading commas I don't have to do as much futzing about with commas.

My job involves a lot of ELT pipelines though and the queries I'm writing are often to transform a client's data from whatever ill-conceived data model they've been using to a standard data model that we use for all clients. Those queries require a lot more "detective work" to get right than the queries that run against our standard data model. If I was just writing queries against the standard, I'm not sure I'd spend enough time developing/debugging to really notice any ergonomic benefit.


I hear this argument all the time but it makes no sense. Leading commas only help you comment out the last line. Any comma arrangement allows you to comment any intermediate line. It’s sacrificing readability and aesthetics for a tiny benefit on one row.


likelihood of commenting out last N lines >> .. first N lines

What happens is I want to comment out a join entirely, and all its columns. I rarely find myself commenting out the first column. I'd prefer trailing commas on every element, where allowed, to be sure! but leading isn't so bad once I got used to it.

Plus to scan for a missing leading comma is a linear (literally!) search whereas [missing] trailing commas don't line up.


You can easily add remove columns like this, just removing the entire line. If you add the comma the traditional way, you will change 2 lines (the end of previous line with comma, and the new line). It's nice for maintenance and diff'ing


That is only true of the first or last line.

  col1,
  col2,
  col3

  col1
  ,col2
  ,col3
You can remove col2 from either of those examples and have valid syntax.


Does that still look ok if you're selecting 10+ columns with functions, or would you split out the first line situationally?


Another commenter showed how this works:

  SELECT a.foo
       , b.bar
       , g.zed
    FROM ...
While the comma placement may seem weird, it makes this exactly identical to the "AND" or "OR" placement in WHERE clauses, and the primary benefit is that it's easy to comment out any column except the first.


It's always easy to comment out any column except the first or last. Leading commas make it easy to comment out the last column, trailing commas make it easy to comment out the first.

Personally I don't think that optimization is worth the price. Trailing commas look nicer visually so I prefer them.


> While the comma placement may seem weird

It's not completely unconventional. Haskell is typically styled with that kind of comma usage, too. For example,

  [ 1
  , 2
  ]

  { foo = 1
  , bar = 2
  }
Coincidentally, SQL and Haskell are the only languages I know that use `--` for comments.


FWIW, Lua and AppleScript (and HyperTalk!) use(d) `--` as well.

(Edit: I think I misread “know” as “know of”; whoops.)


> I think I misread “know” as “know of”; whoops.

Don't worry about it; you didn't misread. I wasn't aware of AppleScript and HyperTalk, and though I read a bit on Lua some years ago, I either forgot or never realized that it used `--` for comments.


With the curly braces this makes a lot more sense. In SQL it offends my eyes (personal preference) but here it seems more clear.


This is close to what I’ve settled on for 20 years. I’ll also indent again if there are a lot of joins/clauses in the on.


what about something like this? http://www.eslinstructor.net/vkbeautify/




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: