I really like SQL, and think its one of the most important languages a dev needs to learn (all you managers, you could learn some SQL too!). However to syntax is often very restrictive.
As an example why would WHERE clauses need to have a AND instead of multiple WHERE clauses. When working with big tables, its super annoying to select all but one column. So why not have syntax for this? Eg SELECT !username from users could expand to all columns without username.
I feel SQL has so many low hanging fruits from a user perspective, but its very slow moving, and the standard is even more slow to make progress.
So SQL with sugar that compiles down to vanilla SQL has a market imho. I will study PRQL more and see if i could use it on a daily basis.
> Eg SELECT !username from users could expand to all columns without username.
This should be landing in PRQL soon. See ongoing discussion here:
[an exclude clause, that would select all columns except whats specified #172](https://github.com/PRQL/prql/issues/172)
The syntax we seem to be converging to is like you suggested:
select ![foo,bar]
would select all columns except "foo" and "bar".
The aim is to support this at least for SQL dialects that support it like DuckDB, BigQuery, ... For other dialects we might just error out initially.
There is a goal of supporting working connected to a live database in which case the source table schema could be read and the appropriate SELECT clause generated. This would not be robust to schema changes such as future column additions though. At the moment the SQL is generated in an offline setting though so this is not possible.
If you are a PRQL dev i have a suggestion for you/your team.
Make PRQL language agnostic. In a real life scenario many people are writing SQL not in the app layer, but more ad-hoc for stats, reporting, analysis, examining the data etc.
What i mean by this is:
If i write an app that has embedded SQL i dont really care to much for the possible verbosity. This code will be used more than once, so im OK with taking the extra effort.
Also all languages has an query-builder / ORM so the benefit of something like PRQL is possibly not big enough to merit it as an additional dependency.
My suggestion:
Make PRQL a cli tool that can be used by allowing users to connect to a database in a similar fashion as something like usql (https://github.com/xo/usql),
I would find the most power in a tool like PRQL is this setting: Ad hoc SQL queries.
This would open the editor integration. With a tool like this i could open vim, and write PQRL in a window, and pipe it via something like Slime to a open PQRL session.
So something like a language server, but connected to a database and able to execute queries? That's an interesting idea.
While working on the compiler, I try to separate different components so things like this are possible. Currently, we are focusing on the language itself and also put some though into intermediate representation of the query (RQ - relational query). We are conservative about expanding the scope, because building database connectors and execution engines would take too much focus off the language and the compiler.
But I'd lie if I said that I don't want to dig into it.
But before that i would put some effort into the actual "repl/session" -like tool. Pipe a PRQL query to a PRQL session, and then the PRQL translates that to SQL, and returns the underlying response from the database.
If done right, this would open the possibility to build all sorts of cool editor plugins, that could directly interface with the PRQL session.
IMHO this is where a tool like PRQL would shine, as it would make writing SQL more enjoyable and way faster, but at the same time being both language and sql-dialect agnostic.
Seems a natural fit for a notebook UI. If a PRQL cell doesn't start with "from," just continue adding filters to the pipeline above. Would let you progressively build pipelines by adding filters and derivations, while previewing the data each step along the way. Split a cell to debug a pipeline at any point.
Thanks for the suggestion. I don't think I knew about usql. I completely agree with you and have been working on a cli tool called `prql-query` or `pq` at the command line:
In BigQuery you can specify columns to include as well as exclude. For example:
select s.*, t.* except (s.foo, s.bar, t.blah)
from s, t
where ...
I've found this extremely handy for avoiding duplicate column names when joining tables. For example, trying to join two tables that both have an "id" column can be really tedious without this syntax.
The column list for a table is available in the information schema, just build the query with the expanded column list, minus the ones passed in to ignore on the DBs that don't have native support. Using the Information schema makes it DB agnostic for "free"
SQL was meant to be like readable english, so two or more WHERE didn't made sense.
Also select all columns but one may not be future proof is column are added or removed, you really don't know what columns that query is going to read, useful for oneshot query, but not much more.
Meh, thats not a valid argument. What happens if i `select id, name from users` and later someone drops the name column. Similarly if i `select * from users` and use name the same problem/bug arises. Dropping a column without the application code refactor is obviously a developer issue, not a SQL one.
You're right that it's a "dev/dba problem", but the reason for not using `select !<column>` is exactly the same as not using `select *`. If you specify exactly what you want, there's less of a chance of your ORM-based app breaking because it received an extra or too few columns. I'm sure most people who work with SQL DBs have been bitten this before.
Also, while it could also be useful, over the years I've started moving away from things like using `!` to negate things. It makes it slightly harder to read and it ruins the symmetry of my code. And lord knows SQL doesn't need any more of that. It's probably the only language I use often where I have to place my commas at the start of the line; where I need to add a first filter as `1 = 1` etc just to make it easier to comment out things quickly.
What I personally want most is the ability to move the FROM clause to the top (this is a no-brainer) and better aliasing rules. The alias I used in my SELECT should be accessible everywhere in the statement.
Select * is something no one should be doing in real code.
Select [list of columns] should the way to query and doing so you know exactly what columns are read. That's one of the reasons I dislike ORM like hibernate.
I advocate the least surprises principle, select * or select !column are doors to surprises. What if someone add a geometry column with tons of data (real example happened to me)? You are going to read tons of data that probably you don't need.
The query itself does not matter. If i delete/rename a column the app WILL break no matter how much i would follow best practices.
I dont see `select !name from users` any worse than `select name from users`. In both scenarios the app will break if the name column is renamed/deleted.
To me the biggest thing against select * is readability. Basically you are obfuscating what you are doing and getting from the database. In trivial queries it does not matter, but when you read more complex procedures, it starts to be a bit annoying.
I think there are pros and cons to both approaches:
- Explicitly spelling out the column names has the advantage that you can search your code base for that name and see where it's used.
- When declaring a view which filters the rows in a table, using SELECT * would simplify maintenance, as you wouldn't need to update the view when new columns get added to the table.
But delete/rename columns are not the only operations, nor the most common. Adding columns is the most common, and there only one of those options breaks
DROP TABLE IF EXISTS and CREATE TABLE IF NOT EXISTS are counterexamples to that statement. Notice that suffix operator IS NULL is present in expression part, but these statements do not use that kind of grammar.
I was talking about original idea, current Sql evolved and in many ways is a monster, but multiple WHERE instead of AND operator make little sense to me
Sure, multiple ORMs and query builders allow this, but when writing pure SQL thats dynamic its annoying to have to check for some condition before knowing "Should i use WHERE or AND next". This is traditionally solved with an hard coded WHERE 1 = 1 in the beginning of the query. Still its a wart, even tho a small one.
Point being, there are many other small things like this that (can) make SQL a burden, eg the select all but <n+1> columns.
> why would WHERE clauses need to have a AND instead of multiple WHERE clauses
Semi related. Python's for comprehensions allow multiple if's.
some_list = [ x for x in things
if x % 2 == 0
if my_function(x) > 0
if some_other_global_predicate()
...
]
And this is shown in the python docs.
I've had PR's rejected for using this since they wanted "and". Drove me up a wall as this IMO reads as easily if not more easily than the "and" version, and (again, IMO) is more for humans where the "&&" is more for the computer.
> When working with big tables, its super annoying to select all but one column
What's the use case for that? In fact, I can hardly understand the benefit of projecting at all, except at the very end. Why would you want to do it? Is there a performance benefit?
I would love this addition in SQL. My work is largely exploratory with SQL. 99% of my queries will never get saved anywhere.
At the end of the day it's just syntactic sugar, but sometimes when writing reports you sometimes don't want to include a single column like a sequential key, because it's not relevant information to the end user or exploration.
Another reason is that some text columns are long enough that just absolutely destroy readability in things like psql. Often I just want to get a feel for what's in the data and removing those annoying text fields to make psql readable is a pain. You have to run a `select * from table limit 0`, copy that into a text file, then `s/ +\| +/, /g` the columns list, then remove the annoying field... then throw it back into your query. Takes about 45 seconds to do all that and that adds up!
Right, I understand why you'd want to project at the very end of the query, i.e. suppress some columns from being shown. Other than that, I don't see the point, unless there's a performance impact. Keeping columns around doesn't actually change the result of the query in any way.
You've definitely missed the point because you're concerned about the "very end of the query". I'm describing both a reporting and more importantly as far as I'm concerned an explorative process that's disrupted by the silly need to list out all columns. Performance is irrelevant.
Yes, I know I've missed the point. That's why I asked my question in the first place! My question was:
> What's the use case for that? In fact, I can hardly understand the benefit of projecting at all, except at the very end. Why would you want to do it? Is there a performance benefit?
and I meant it literally. It's not an oblique, rhetorical way of saying "you don't need to do this". It was a genuine question!
You've answered that it's convenient when doing exploratory work. Thank you. That helps me understand better.
As an example why would WHERE clauses need to have a AND instead of multiple WHERE clauses. When working with big tables, its super annoying to select all but one column. So why not have syntax for this? Eg SELECT !username from users could expand to all columns without username.
I feel SQL has so many low hanging fruits from a user perspective, but its very slow moving, and the standard is even more slow to make progress.
So SQL with sugar that compiles down to vanilla SQL has a market imho. I will study PRQL more and see if i could use it on a daily basis.