Hacker News
PRQL: Pipelined Relational Query Language (github.com/prql)
519 points by animal_spirits on July 25, 2023 | 209 comments



For me the examples on the website https://prql-lang.org/ are the biggest selling point for PRQL, in particular the SQL it generates. It looks clean, straightforward, something I would've written myself.

In general, I like this slightly more careful take on modern database development. 10-15 years ago, people would start a brand new database like Mongo, or Riak, or Influx, or whatever, and try to convince application developers to select it for new projects. But recently we've started getting more conservative options like EdgeDB, TimescaleDB, or even PRQL, which all expect us to run Postgres with some addons and / or query preprocessors. Tech like this is so much easier to adopt!

I'm really liking what the Edge folks are doing with schemas and migrations, but I do find PRQL syntax much more intuitive. My application code is littered with data transformation pipelines already: all these map / filter chains in TypeScript, iterators in Rust, enumerables in Ruby, streams in Java, LINQ in .NET, Rx in dozens of languages, etc. So the concept is very, very familiar. Getting nice SQL queries out of PRQL that I can store, inspect later, see the query plans for, add indices where necessary, is just great. It's such a good tool!


To me it seems quite nice, but really just trivially different from SQL - like if Ruby was 'friendlier syntax that transpiles to Python', meh? You'd use whichever you happened to learn first and not bother with the other. (That's often true even though it's more than that of course.)

The examples arbitrarily make SQL look more verbose:

    SELECT
      id,
      first_name,
      age
    FROM
      employees
    ORDER BY
      age
    LIMIT
      10
Yes! Of course I'd rather:

    from employees
    select {id, first_name, age}
    sort age
    take 10
..but wait, actually the SQL could've been:

    select id, first_name, age
    from employees
    order by age
    limit 10
and it's more verbose by a character or two... (no braces, but 'order by' vs 'sort')


It's not the tiny changes in the syntax/order that matter here. It's that if you want to limit the columns or do something else with them afterwards, in PRQL you append that text to the end. In SQL you'd have to wrap it around your original query, or inject joins in the middle. It doesn't read well anymore, because "how" you're doing things eclipses "what" you're doing.

For a simple select, this just doesn't matter. But you also wouldn't bother with PRQL for simple selects.


In python I use sqlalchemy core to build the SQL for me, but I manage the transactions myself. Sqlalchemy query builder offers a similar role to what you described, as in I can add another select or where field after I have already added a sort criteria. It’s clever enough to order these correctly based on the SQL dialect.
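A minimal sketch of that pattern (a toy builder with made-up names, not SQLAlchemy's actual API): clauses can be appended in any order, and the builder emits them in valid SQL order.

```python
# Toy query builder: clauses can be added in any order, SQL comes out in
# the right order. (Illustrative only -- not SQLAlchemy's real API.)
class Query:
    def __init__(self, table):
        self.table = table
        self.columns = ["*"]
        self.wheres = []
        self.order = None

    def select(self, *cols):
        self.columns = list(cols)
        return self

    def where(self, cond):
        self.wheres.append(cond)
        return self

    def order_by(self, col):
        self.order = col
        return self

    def sql(self):
        parts = [f"SELECT {', '.join(self.columns)} FROM {self.table}"]
        if self.wheres:
            parts.append("WHERE " + " AND ".join(self.wheres))
        if self.order:
            parts.append(f"ORDER BY {self.order}")
        return " ".join(parts)

# Sort criterion added first, filter and projection afterwards:
q = Query("employees").order_by("age").where("age > 30").select("id", "name")
print(q.sql())  # SELECT id, name FROM employees WHERE age > 30 ORDER BY age
```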


I use CTEs, window functions, and groupings all the time when I write reporting queries. These things tend to be much more verbose in raw SQL, and ORMs / Query Builders either do not support some of these features at all or do very poor job (like, force me to put raw SQL substrings in my code), or force to write DSLs that are even more verbose than raw SQL. Look at corresponding PRQL samples, and you may see an appeal.

Also, I agree, no one should write SQL like this - screaming keywords, superfluous newlines, etc. I don't think this style ever made sense, even in the 1970s.


While there's some stuff in C#/LINQ/EF that's more verbose (left joins are often a nightmare) or not-supported, I'll always say that I prefer writing queries in EF than in SQL, at least when dealing with SQL features that are supported by EF (which is a lot of them, it's a very expressive dialect).

But EF lets you start with FROM, lets you do whichever keywords you need in whichever order (instead of WHERE -> GROUP BY -> HAVING and you've got to CTE or Subquery if you want another GROUP BY). It also lets you access the members of a group because the objects are still treated as a graph instead of being pulverized into a flat table like SQL does. It also makes your FKs into first-class navigational properties of the table.

Like, if I have an addressID and I want to get its country code?

In MS SQL that's

    SELECT CountryCode
    FROM Country
    INNER JOIN ProvState ON ProvState.CountryID = Country.ID
    INNER JOIN Address ON Address.ProvStateID = ProvState.ID
    WHERE Address.ID = @AddressIDParam
In EF that's

    db.Addresses
      .Where(a => a.ID == addressIDParam)
      .Select(a => a.ProvState.Country.CountryCode)
EF has a hell of a lot of flaws, but linguistically I love it. Yes there's a lot of aliasing boilerplate in EF, but the ability to walk the foreign keys and the fact that you can put the select after the table name pays off so very well.

Also there's a dialect of LINQ that looks more like SQL but it's kind of weird and I don't love it so I prefer to use the lambda syntax above.

In that dialect, it's

    from a in db.Addresses
      where a.ID == addressIDParam
      select a.ProvState.Country.CountryCode
which is even more terse and SQL-y although I find it a weird linguistic mash-up.


The downside though is you have to grab the SQL it's generating somehow to try to optimize it, figure out what crazy cross apply madness it's doing, or figure out why it's blowing SQL Server's recursion limit.

I prefer to avoid linq syntax now. It’s a false economy.


In my experience, it quite rarely uses cross apply, and typically only for functions where you're doing complicated join filtering and trying to extract full EF entities.

If you don't use linq you can't get EF entities anyway, and if you don't need EF entities you can can still use linq to get a projection and avoid the cross apply! The worst case scenario often ends up being writing a linq query to get your primary keys, then loading entities just using the list of keys.

I've used linq for years and it makes the basic and intermediate stuff way simpler, and the complex stuff no more complex. In hundreds of thousands of lines of code, we have exactly TWO queries where it ended up simpler to hand write SQL, and EF did not prevent us from doing that!


EF for some time (since the "Core" reboot) has made things like "crazy cross apply" "opt-in only" and instead you get runtime exceptions for LINQ query shapes it thinks can only be done that way. Some of those crazy things early days EF silently supported still aren't even possible in today's EF, for mostly better (but sometimes rarely worse), let alone opt-in behaviors.

I think current EF, more than ever, generates SQL that looks straightforwardly like what you would expect given the LINQ you've written, with the only surprises being runtime exceptions before it ever tries to confuse a SQL database with a horribly deoptimized query.

Today's EF also has more tools than ever to optimize its queries. It logs query text by default in some debug contexts to visible debug consoles, and to certain telemetry providers when errors occur in certain release/Production contexts. It's easy to opt-in to such logging in even more contexts should you desire that. (Thanks to modern .NET's ubiquitous System-owned ILogger infrastructure and the modern "Generic Hosts" approach.) For advanced scenarios, intercepting specific LINQ queries and enhancing/optimizing their output based on domain knowledge of the queries or the specific databases is easier than ever with simple dependency injection.


I drew a different conclusion from similar experience. I avoid navigation properties and other advanced mapping features, so an entity maps flatly to one table.

The LINQ queries will be more verbose as you'll need to write the join and group clauses explicitly, but I find it much easier to predict the performance of queries since the generated SQL will look almost exactly the same as the LINQ syntax. It's also less likely to accidentally pull in half the database with `.Include()` this way.


Oh I agree the actual mapping framework where it converts the EF/LINQ is deeply flawed. But that's the underlying implementation.

I just mean on a linguistic level, I tremendously prefer EF/LINQ/C# to SQL.


I have some ideas for how to implement a high performance ORM (yes, that sounds strange, doesn't it?) that makes extensive use of CTEs, window functions and temporary tables. Writing reports in that ORM would be downright trivial.

From my perspective the JPA specification is downright harmful. There can never be a good JPA implementation. I don't understand how there are no Java ORMs that have actually learned anything from the mess that Hibernate etc. are.

Take a look at how nasty even some of the most trivial concepts, like calculated properties, are: https://blog.eyallupu.com/2009/07/hibernate-derived-properti...

In my hypothetical ORM design, calculated properties would be easy and fast by default.

Also, the CORBA style lazy loading based on proxies is ridiculous. As if people truly wanted a landmine like that to blow up on them.


At least we can all agree on leading commas though!


death before trailing commas


Well if it was after they wouldn't be trailing would they?


for the win


I agree that I think the difficulty with the uptake will be "it's really just less annoying SQL", and it's hard to overtake a technology that's so ubiquitous when your fixes are really "nice to haves" vs something truly transformational.

That said, it's not the succinctness that's an improvement, it's that the pipelined nature of PRQL really maps much better to how people should think about queries, and also how the server executes them! Something as trivially simple as putting the from clause first means I can now get much better typeahead/autocomplete support from dev tools. Heck, I already do this now in a slightly more annoying manner: I write "select * from some_table ..." first, then I go and type out the actual columns I want because once the from clause is correct my IDE will autocomplete/show errors for possible columns.


> the pipelined nature of PRQL really maps much better to how people should think about queries

I disagree. Database engines take SQL and transform it into an execution plan that takes into consideration database metadata (size, storage, index analytics, etc.). Queries should be thought of with a _set based_ instead of _procedural_ approach to maximize the benefits of this abstraction - diving into the implementation details to guide the execution plan formation only when necessary.

Also, the pipeline approach could be achieved with common table expressions (CTEs), right?
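For what it's worth, the CTE point is easy to make concrete: a pipeline can be spelled as a chain of CTEs, each step reading from the previous one. A small sketch using Python's stdlib sqlite3 with made-up data:

```python
# A pipeline spelled as a chain of CTEs, each step reading from the last.
# Stdlib sqlite3; the table and data are made up for the demo.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INT, name TEXT, age INT);
    INSERT INTO employees VALUES (1,'Ann',42),(2,'Bo',35),(3,'Cy',51),(4,'Di',29);
""")

rows = conn.execute("""
    WITH
    projected AS (SELECT id, name, age FROM employees),    -- select
    filtered  AS (SELECT * FROM projected WHERE age > 30)  -- filter
    SELECT * FROM filtered ORDER BY age LIMIT 2            -- sort + take
""").fetchall()
print(rows)  # [(2, 'Bo', 35), (1, 'Ann', 42)]
```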

That said, I think PRQL looks promising because it is a solid attempt to make RDBMS development more approachable. I also like that `from` comes before `select`: it is far more readable. A solid and modern IDE experience for PRQL could be a "killer app".


I disagree. I find it extremely hard to reason about large queries as set transformations, whereas it is much easier to break it down to "first this, then that". And this is long before I've even started writing my first line of SQL.

So let me write it procedurally and have the optimization engine fix it for me, just like how it fixes my SQL.

Even SQL queries are often better understood procedurally. Take this one [1]:

    SELECT article, dealer, price
    FROM   shop s1
    WHERE  price=(SELECT MAX(s2.price)
                  FROM shop s2
                  WHERE s1.article = s2.article)
    ORDER BY article;
That inner WHERE clause doesn't make sense in my opinion, unless you think of it procedurally: for each row in s1, do a search for the highest price amongst all rows that share its article number.

[1] https://dev.mysql.com/doc/refman/8.0/en/example-maximum-colu...
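Here's a runnable version of that query (stdlib sqlite3, with sample data mirroring the MySQL docs example), to make the "for each row in s1" reading concrete:

```python
# The correlated subquery above, run on stdlib sqlite3 with made-up shop data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (article INT, dealer TEXT, price REAL);
    INSERT INTO shop VALUES (1,'A',3.45),(1,'B',3.99),(2,'A',10.99),
                            (3,'B',1.45),(3,'C',1.69),(4,'D',19.95);
""")

rows = conn.execute("""
    SELECT article, dealer, price
    FROM   shop s1
    WHERE  price = (SELECT MAX(s2.price)
                    FROM shop s2
                    WHERE s1.article = s2.article)
    ORDER BY article
""").fetchall()
print(rows)  # [(1, 'B', 3.99), (2, 'A', 10.99), (3, 'C', 1.69), (4, 'D', 19.95)]
```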


Completely agree, thanks for putting it better than I could have, with an excellent example. Correlated subqueries like the example you give, or similarly lateral joins in postgres, fundamentally are treated like for loops by DB engines anyway.

Semi-related, but the example you give is also why I love Postgres' "DISTINCT ON" functionality (I don't know if other DBs have something similar) - it makes it so much easier to reason about these "give me the 'first' one from each group" type queries without having to resort to correlated subqueries.


queries like these are best suited for window functions, although I am not sure MySQL supports them:

  SELECT article, dealer, price FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY article ORDER BY price DESC) as rnk
    FROM   shop s1
  ) sub 
  WHERE sub.rnk=1
  ORDER BY article; 
this query will be a single pass over the table, without loops/joins
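For reference, here's that window-function version run on SQLite (which, like MySQL 8.0, supports ROW_NUMBER), with the same made-up shop data as above:

```python
# The window-function formulation, run on stdlib sqlite3 with made-up data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (article INT, dealer TEXT, price REAL);
    INSERT INTO shop VALUES (1,'A',3.45),(1,'B',3.99),(2,'A',10.99),
                            (3,'B',1.45),(3,'C',1.69),(4,'D',19.95);
""")

rows = conn.execute("""
    SELECT article, dealer, price FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY article ORDER BY price DESC) AS rnk
      FROM shop
    ) sub
    WHERE sub.rnk = 1
    ORDER BY article
""").fetchall()
print(rows)  # [(1, 'B', 3.99), (2, 'A', 10.99), (3, 'C', 1.69), (4, 'D', 19.95)]
```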


This is the "set based" approach for the MAX: there does not exist a bigger element:

  SELECT article, dealer, price
    FROM   shop s1
    WHERE  NOT EXISTS (SELECT 1 FROM shop s2 
                         WHERE s2.price > s1.price AND
                               s2.article = s1.article)
    ORDER BY article;


Unpopular opinion.

The uncorrelated example should have been rewritten with a CTE and should have been aliased as 'article_max_price' as if it was a computed property and where price = amp.price


That's a really simple example though. I think the real pitch for PRQL is that the syntax is way more regular- the group operator just runs a normal pipeline on each group and multiple derivations with different operations between them don't need to all be plotted out in advance, just as an example. SQL suffers because it's not really composable, especially once you get outside of specific modern versions.


Being able to pick the source (i.e. the FROM clause) first is useful in itself, as you then get the benefit of sensible auto-complete suggestions.


Even though a small example doesn't highlight the advantages well, I still prefer the PRQL syntax in this because that reads like map/filter code I'd write in some programming language.

So the mental model seems easier to me


"Order by" is a much better name, by the way. I'm permanently slightly annoyed by the fact that in programming it's a custom to call ordering "sorting" for some unimaginable reason.


I don't get that - to me the examples are much less readable than SQL and I don't understand why I should want to use this. Like, yes, you can reorder the query sections, which seems to be everyone's complaint about SQL, but then you also have multiple types of brackets, colons and other syntax for no reason, all while not really accomplishing anything SQL doesn't already do.

What's the attraction?


If you're happy with SQL then there isn't much point.

For the folks building and supporting PRQL, SQL just has a few too many warts and the popularity of tools like Pandas, dplyr, Polars, LINQ, ... shows that for analytical work we often like to work with our data differently. Other frameworks and languages feel that we should throw out Relational Algebra as well but we feel that's like throwing the baby out with the bathwater.

PRQL's core tenets are that Relational Algebra is foundational and a great abstraction but what's needed is a modern, ergonomic language that allows us to work with data the way most of us conceive of it - as pipelines of transformations.

Personally I feel much more productive in PRQL. I can start with a dataset and append one transformation at a time, writing from top to bottom as I go along, without having to jump back and forth between the SELECT and the WHERE and the GROUP BY clauses.

Also, if I want to take out a step, I can just comment out that line and the rest of the pipeline usually still works. This might seem like a minor thing but in my experience it's those kind of ergonomics that make the difference in actual day to day work rather than fancy theoretical features you only use once in a blue moon. It's therefore worth noting that this was an intentional design decision. You try and take out some steps from your SQL query and see how well the rest of your query holds up.
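That "comment out one step" ergonomic is easy to demonstrate outside of PRQL too: when a query is just a list of independent transformations, dropping one leaves the rest valid. A toy Python sketch (data and steps are made up):

```python
# Pipelines as a list of steps: commenting one out leaves the rest valid.
rows = [
    {"name": "Ann", "age": 42},
    {"name": "Bo", "age": 35},
    {"name": "Cy", "age": 51},
]

pipeline = [
    lambda rs: [r for r in rs if r["age"] > 30],    # filter
    lambda rs: sorted(rs, key=lambda r: r["age"]),  # sort
    # lambda rs: rs[:1],                            # take 1 -- removed, rest still works
]

for step in pipeline:
    rows = step(rows)
print([r["name"] for r in rows])  # ['Bo', 'Ann', 'Cy']
```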


For what it’s worth, it looks really readable to me. I have decades of sql experience at this point so consider myself pretty proficient but I can see the appeal of having a terser syntax for transformations. I especially like the “it just makes sql” approach.

Stepping through the second example on that page I know how I could do the same in sql, and I also know that it would be harder for most people to follow.

Question about that example though - why is there an aggregate without an assignment to a variable (average gross_salary) like the other examples?


I couldn't find that example right now but there is a similar line in the main example on the prql-lang.org homepage:

    aggregate {
        average total,
        ...
    }
I can't definitively say why it is there, other than perhaps just to show that you can specify aggregations without having to give them an alias. The column name won't be pretty but if you're just interactively trying something out and want to see the results then you probably won't care.

Does that help?


Gotcha. The thing that’s not immediately clear from the syntax is which columns I’m getting out as a final result of my query. I guess the last select you ran + any derived ones since then?


That's right. It can be a bit hard to keep track of but the rules are quite simple. You start with all the columns from a `from`, then any `derive` just adds columns. A `select` restricts the columns to those selected and a `group + aggregate` replaces the columns with those from the `group` + those from the `aggregate`.
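Those rules are simple enough to sketch in a few lines of Python (a hypothetical model of the column tracking, not the actual PRQL compiler; all names are made up):

```python
# Hypothetical model of the column-tracking rules described above:
# `derive` adds columns, `select` restricts them, and
# `group` + `aggregate` replaces them.
def columns_after(start, steps):
    cols = list(start)
    for op, names in steps:
        if op == "derive":
            cols = cols + names
        elif op == "select":
            cols = [c for c in cols if c in names]
        elif op == "aggregate":  # names = group keys + aggregate outputs
            cols = list(names)
    return cols

cols = columns_after(
    ["id", "name", "salary", "dept"],
    [("derive", ["tax"]),
     ("select", ["dept", "salary", "tax"]),
     ("aggregate", ["dept", "avg_salary"])],
)
print(cols)  # ['dept', 'avg_salary']
```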

I once tabled a proposal on whether each query should be forced to end with a `select` to make the final column list explicit; this could be generated by the compiler. It was felt that that wasn't necessary and would also be somewhat arbitrary, since you also need to know what columns are available at each previous step of the pipeline if you want to be able to make modifications. As the tooling improves, perhaps you could be shown the current list of columns as you hover over each line?


There’s nothing stopping people from using their own convention of selecting early to grab the stuff they need and selecting again right at the end to be explicit about what they want. Like select *, it’s better to just be explicit.


Disclaimer: I'm a PRQL contributor.


The attraction is that, especially for more complicated queries with complex joins, subqueries, aggregates, etc., that the structure of PRQL much more closely matches the "English", mental-model of the query. I can just read PRQL linearly, from top to bottom, and at each point I know what is being "pipelined" into the next phase of the query.

With SQL I have to look all over the place. And it's not just that FROM should come before SELECT, it's that if I'm doing, say, an aggregation with a group by, normal SQL doesn't really have anything to make me think about the ungrouped rows, and then merge them together to get the aggregate values. With PRQL I can just go top to bottom, and for me it's much easier to reason about (i.e. first get all the rows, then group by some specific columns, then take aggregates, etc.)

And I say this as someone who spends about half my days in SQL at present.


The downside is debugging what actual SQL this monstrosity spits out.


Two points:

1. The SQL that I've seen output by PRQL isn't monstrous, at least no more than it would be coding by hand.

2. I do agree that I don't think PRQL will have much uptake until it is baked in to DB systems, at least as a plugin. One reason (among many) that I hate ORM systems is that with native SQL it's very easy for me to just capture a SQL statement from my DB logs and then search my code for an offending statement. But given how PRQL is transpiled, I don't see any reason it couldn't be essentially part of the query planner, so that query logs etc. are reported in PRQL and not SQL. E.g. the query planner already takes text (SQL) -> Query Plan, it seems rather straightforward to go text (PRQL) -> text (SQL) -> Query Plan, with the SQL intermediate just being an implementation detail I don't have to care about as a DB user.


The attraction is something that was designed after decades of usage and PL research. Consistency of syntax is a big one for me. A favorite example of mine:

  SELECT substring('PostgreSQL' from 8 for 3);
  SELECT substring('PostgreSQL', 8, 3); -- PostgreSQL-only syntax
  SELECT trim(both from 'yxSQLxx', 'xyz');
  SELECT extract(day from timestamp '2001-02-16 20:38:40');
Taken from: https://www.edgedb.com/blog/we-can-do-better-than-sql

Maybe if SQL would give me that monumental ask of trailing commas, perhaps I would hate it less.


I think the main difference is how PRQL translates into the query execution plan: with SQL you need to read the entire query to get a rough understanding of the order of operations (especially if you join two or three tables and have some nested and lateral queries).

With PRQL I see that the order will be explicit and set up by the developer, so any footguns will be evident.

Things like predicate push-down, optimization fences, variable rewrites, etc. don't need to be relied upon, because the pipeline of the query is more explicit in PRQL.

Also, since it is a new language, it can be naturally extended into ETL-type data pipelines.

Also, because PRQL can be translated into a query execution plan, it can be converted into non-SQL targets like MongoDB or pandas / Spark / etc., eliminating SQL altogether for distributed NoSQL engines.


Is that true? Doesn’t this just effectively translate into sql to be executed on the db as usual if you’re running on an rdbms?

Looking at the examples and the sql generated in the playground you’re just going to have the query planner doing its thing to decide how to execute eventually anyway.


It transpiles into SQL currently, but my understanding is that that's not the long-term vision.

The language architecture allows it, in the future, to replace SQL for querying data.


Maybe you didn't see the more complex examples, but some of the SQL queries it abstracts are very unintuitive to write by hand. PRQL turns it into something more readable and closer to the expected mental model.

For people who can write error-free and complex portable SQL queries by heart this may not be really interesting. From experience, however, that is not a skill most developers have.


I don't use PRQL, but I absolutely get the appeal. Specifically on the readability part: some things that are easy in PRQL are just awful in SQL.

From the website for instance this is a nightmare to do in SQL:

    from employees group role (sort join_date take 1)


Unfortunately the linebreaks were lost and, as shown, that isn't a valid PRQL query. It would have to be either

    from employees
    group role (
        sort join_date 
        take 1
        )
or

    from employees | group role (sort join_date | take 1)
In English:

    Take the 1st employee 
    by (earliest) join_date 
    for each role 
    from the set of employees


ClickHouse:

SELECT * FROM employees ORDER BY join_date LIMIT 1 BY role


Isn't this a fairly simple way of doing this? That said, it is a bit non-obvious if you haven't seen it before.

    select earliest.* from employees as earliest
    left join employees as earlier on
      earlier.role = earliest.role
      and earlier.join_date < earliest.join_date
    where earlier.id is null
    order by earliest.join_date


What is the generated SQL of that expression?

This is indeed a sticky problem, one that usually requires a subselect or other workaround to address the non-determinism of group by + order by; i.e. one cannot simply "select * from employees group by role order by join_date limit 1" and be guaranteed to get the expected ordering.


What does it mean? Is it the same as "LIMIT BY" in ClickHouse?

https://clickhouse.com/docs/en/sql-reference/statements/sele...


It picks the first result grouped by role and sorted by join date. I believe this can be expressed with limit by in ClickHouse.


I mean if you can look at the "expressions" example on the homepage and say the SQL is more readable than PRQL, then more power to you. Hell, actually more power to you, that's extremely impressive. But I think for many people, especially programmers more used to parsing brackets and nesting than pseudo-english, PRQL is easier in the places where you're trying to compose.


Oh, I'm a big fan of raw SQL, too. There's very little out there that can bring me the same joy as a well-written 30-50 line SQL query! Like, I can spend an hour or two on it, but when it runs and produces the results in the desired form, it's sooo rewarding! And realizing that producing similar results in my main language (be it Rust or TypeScript, or whatever) would take me 3-4 days instead (and it would run much longer) makes me appreciate SQL programming more and more. In terms of "usefulness per minute spent" nothing comes close to SQL!

However, the syntax can be a bit clunky from time to time. I'm very fortunate to have used only Postgres for the past 5-7 years, so CTEs, and things like Timescale or JSONB, or PlV8 / PlRust for stored procedures, are often on the menu. Yet simple things still require repeating fragments of code from time to time, and complex grouping or window-function code often looks too convoluted.

And as I wrote in my GP comment: I (and many other developers) already use pipelines for data manipulation in code - beyond databases. Say, we got data from an API endpoint, or we mixing data from several data sources. SQL is familiar because I see it often, PRQL is familiar because I use similar pipelining concepts often in my no-SQL code. Would I use PRQL for super simple queries? Probably not. Would I be upset if one of my coworkers used it for simple queries? No, why would I? Would I try implementing complex reporting query using PRQL? Yeah, I would. Partially because I suspect I would get to my data in desired form quicker with pipelines, and partially because I know: if I get stuck I would convert my intermediary query to SQL and pick up from there.

For me PRQL looks better than most DSLs for ORMs or Query Builders in most languages. Adding a new tool to a project may be annoying, but depending on the project I'd rather pay that price once if it makes my less SQL-savvy teammates more comfortable. "Yet another syntax" is obviously the main point against, but from the application developer perspective ORMs / Query Builders often force one to learn "yet another syntax" anyway. "Don't use an ORM / Builder" is an often voiced opinion, and yet in practice we work with them all the time, too.

So, I view PRQL not as an alternative to SQL but as a third database communication dialect in addition to SQL and my ORM / Query Builder DSLs.


it seems like any SQL parser could let you put FROM first and solve a lot of the annoyances of SQL


The stiff syntax is a feature. The infamous 1000-line sales report query is already a bear to maintain. If you get a clever developer reordering syntax, it will only be worse.


It’s certainly not a feature. A lot of important SQL usage is ad-hoc queries, and they are more annoying to type than they should be.


I don’t understand how changing the order of the clauses makes a query easier to type.


If you have auto-complete it can see what table you are using and complete the column names. Starting with SELECT it could be any column in the database.
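A toy sketch of why table-first helps (hypothetical schema and function names):

```python
# Toy illustration: once the table is known, column suggestions can be
# scoped to it. Schema and names are made up.
SCHEMA = {
    "employees": ["id", "first_name", "age"],
    "orders": ["id", "total", "placed_at"],
}

def suggest_columns(table, prefix=""):
    return [c for c in SCHEMA.get(table, []) if c.startswith(prefix)]

# With `from employees` already typed, suggestions narrow to one table:
print(suggest_columns("employees", "fi"))  # ['first_name']
# Starting with SELECT instead, any column in the database is a candidate:
print(sorted({c for cols in SCHEMA.values() for c in cols}))
```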


Huh. I have been doing this for like 15 years and never have a problem. Text editors like Sublime Text suggest names based on the content of the file. SQL Developer, SQLWorkbench, and DataGrip all seem to handle it just fine.


It will suggest names that aren't in the table you are going to query unless it is psychic.


It is psychic in the sense that it makes educated guesses based on previous entries and the current database and schema. This is the same kind of educated guess it would make with a table name.


I see this complaint stated over and over again, how hard is it really to type SELECT * FROM x a and then go back?


Is there any downside at all to putting FROM first?


Disrupting decades of inertia to placate a small number of unskilled novices.


We have recently merged PRQL support into ClickHouse: https://github.com/ClickHouse/ClickHouse/pull/50686

It's currently more like an experiment - I'm not sure if it will be usable or useful. There are some concerns about Rust, although minor: https://github.com/ClickHouse/ClickHouse/issues/52053#issuec...


This is awesome! Thank you!

Would that mean it would also be able to use it in clickhouse-local?


Yes:

    clickhouse-local --dialect prql
or:

    $ clickhouse-local

    :) SET dialect = 'prql'


I really want this to take off and become a first party supported language for Postgres. Yes, yes, relational algebra is beautiful and all that jazz. SQL is a garbage, first pass design that should have been replaced decades ago. Even Codd has complaints about it. It is amazing what he invented, but we have learned a lot about PL design since then.


People really like to associate relational algebra with SQL, probably because they learned one alongside the other. But SQL is really a terrible relational language: it breaks a few core concepts in different places, like relations being unordered sets that you can nevertheless ORDER BY. This bubbles up as relations losing their ordering after being wrapped into a subquery, which is really unexpected.

PRQL has a data model very similar to the relational one. The only big difference is that relations are ordered - they are defined as arrays of tuples. So let's hope that PRQL gets to be known as "the relational language Mk II"


Seems like you'd have to break quite a few relational model concepts in order to retain compatibility with SQL, no?

Ordered tuples like you say, but also duplicate tuples. And null values.


Not quite what you're asking for but DuckDB has both PRQL [1] and Postgres [2] extensions, so you could probably query your Postgres database with PRQL from there.

There's also a DBeaver plugin [3] which we still need to document better and simplify the usage of but you could potentially also use that to query Postgres with PRQL.

Finally there is pyprql [4] with which you could query Postgres from a Jupyter notebook.

[1]: https://github.com/ywelsch/duckdb-prql

[2]: https://duckdb.org/docs/extensions/postgres_scanner.html

[3]: https://github.com/PRQL/prql/issues/1643

[4]: https://github.com/PRQL/pyprql

(Disclaimer: I'm a PRQL contributor.)


By first party I assume they mean using psql and postgres's own tooling.


Relational algebra is still great, it is the sql that made an unbelievable mess out of this beautiful idea.


Same. I first learnt SQL 20 years ago as a teenager and even back then I remember thinking how odd and, quite literally, backwards the language was. Back then I probably thought I just didn't understand enough to see why it had to be so. Now I know there is no reason. SQL has become more like a natural language. There's no arguing against it, you have to speak it even if it doesn't make sense. But it's not a natural language and we can do better.


It’s optimized for reading and use: the first time you see a query written by another developer, the interesting bit is what data it returns (the select part). Once you know that, you may be interested in the how, where does that data come from? Same for functions or methods, you first see the signature, input and output structures, before you see the implementation.


Good point, I didn't think of it like that and, oddly, I do often advocate for writing things "backwards" from highest level of abstraction to lowest. Maybe there is a good reason SQL has stuck around.


Yeah, I have never liked that you choose what you are querying before you select the source. I think the formatting here is so much more intuitive.


If the main complaint people have about SQL is that you can't swap SELECT, FROM and WHERE, then that's pretty good for a language designed in the 70s.

This, by contrast, looks like it has a bunch of random line noise for syntax. Why on earth should I like this:

`join side:left p=positions (p.id==employees.employee_id)`

better than this:

`LEFT JOIN positions AS p ON p.id = employees.employee_id` ?


The main issue with SQL is that you are stuck in a very strict way of writing things, which does not clearly match to how I think. The top-down way of writing PRQL where each step is simply a transformation of the previous one makes way more sense to me. SQL is something I'd need a reference manual for, PRQL is simply writing down what I want the query to do.

I do agree that PRQL's join syntax is extremely bad, though. They should've stuck to explicit "left join"-like keywords, and the alias & join column shorthand could be done better.


Despite it not at all being the right tool for most jobs, this is why I actually enjoy writing Spark. It reads top down, and is "SQL-ish" for most commands. Enough that you don't need to go out of your way to learn it if you already know SQL, you just think in terms of serial transformations.


I've had two real gripes with SQL. The rest of it has been, as you said, pretty good.

Complaint 1: Not being able to use selected columns later in the same select.

    SELECT
        gnarly_calculation AS some_value,
        some_value * 2 AS some_value_doubled
Instead:

    SELECT
        subquery.*,
        some_value * 2 AS some_value_doubled
    FROM (
        SELECT gnarly_calculation AS some_value
    ) AS subquery
Complaint 2: Not being able to specify all columns except. This combines with the above, where I have to pull some intermediate calculations forward from a subquery, but I don't need them in the final output. So I have to then enumerate all the output columns that I actually want, instead of being able to say something like `* EXCEPT some_value`.
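For what it's worth, some dialects have since added exactly this. Hedged examples, using BigQuery's `EXCEPT` and DuckDB's `EXCLUDE` spellings with an illustrative table name:

```sql
-- BigQuery
SELECT * EXCEPT (some_value) FROM wide_table;

-- DuckDB
SELECT * EXCLUDE (some_value) FROM wide_table;
```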


Both complaints are resolved by ClickHouse.

This video also covers many other advantages of ClickHouse's SQL dialect: https://www.youtube.com/watch?v=zhrOYQpgvkk

Some may find your first complaint questionable... But I specifically designed ClickHouse SQL to allow aliases to be used and referenced in every part of SQL query.
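For example (reusing the parent comment's hypothetical column names), ClickHouse accepts queries along these lines:

```sql
SELECT
    gnarly_calculation AS some_value,
    some_value * 2 AS some_value_doubled  -- alias reused in the same SELECT
FROM employees
WHERE some_value_doubled > 10             -- and referenced in WHERE as well
```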


So unsurprisingly based on my gripes, we are an OLAP shop. I've never heard of ClickHouse before, and I'll be looking into it. We're doing OK on PostgreSQL, but just OK. A lot of learning and unlearning.


Don't use *. It is a common source of all kinds of problems. Many query languages don't support it at all for that reason.


I'm quite aware of your advice regarding using `` against table. However, I'm talking about using `` against subqueries and CTEs where the pieces of the table have already been extracted.


This is supposed to read:

I'm quite aware of your advice regarding using `*` against tables. However, I'm talking about using `*` against subqueries and CTEs where the pieces of the table have already been extracted.

I was unable to edit this earlier due to HN being unavailable after I realized the formatter mangled it.


PRQL fixes this.

(Disclaimer: I'm a PRQL contributor.)


PRQL isn't the only one to fix this. It has been allowed in ClickHouse's SQL since its inception.


That's awesome! ClickHouse is a great system by all accounts and I've been meaning to try out ClickHouse Local.

I'm more familiar with DuckDB and they've also been doing some great innovation on the SQL front. I don't know offhand if they can also do the forward referencing thing but they allow putting the FROM first and having GROUP BY ALL etc..
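For reference, a sketch of those DuckDB "friendly SQL" features (illustrative table and column names):

```sql
-- FROM-first clause order, with GROUP BY ALL
FROM employees
SELECT department, avg(salary) AS avg_salary
GROUP BY ALL;  -- groups by every non-aggregated column in the SELECT
```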

It's great to see all this innovation happening in the SQL and Query Language space more generally at the moment.


It is strange to hear about innovation in DuckDB - I see that they are gradually re-implementing the stuff already existing in ClickHouse. Sometimes they do a better job at promoting it.


They also do a bunch of stuff that ClickHouse can't do, like correlated subqueries [1] or recursive CTEs.

I kinda wish for a database with ClickHouse storage and DuckDB optimizer. At least my experience with ClickHouse is that MergeTree is incredibly good at what it does, but the optimizer hurts.

[1]: https://duckdb.org/2023/05/26/correlated-subqueries-in-sql.h...


That's probably true, but the big differentiator was that DuckDB can run in your Python process, so there's very low friction to adopt it.

My impression of ClickHouse was that it was more like PostgreSQL in that regard, i.e. ClickHouse is to DuckDB in the OLAP world as Postgres is to SQLite in the OLTP world.

clickhouse-local may have closed the gap on that though. Can you embed it in Python as a library?


> Can you embed it in Python as a library?

https://github.com/chdb-io/chdb

    pip install chdb


Maybe it's my familiarity with R and the tidyverse paradigm, but I think in general this paradigm and syntax is pretty readable. The example you chose is perhaps a case where brackets and commas go a long way, but that aside it's really not too bad in context of the rest of the language. `:` seems to consistently be treated as keyword arguments, and `=` for aliasing.


Because now the type of join is an argument on the "join" operation, and join is the first word of that statement making it more obvious what the operation actually is? I also prefer foo(bar) over "bar to foo()" as well, which seems like the equivalent in a more general purpose function call in a language example.


The problem with SQL is the same as the problem with C-style variable declarations. It sounds slightly better than the alternatives when you say the code out loud, but in reality it causes problems with readability and parsing/processing the code


This is well timed as PRQL 0.9 was just released a few hours ago. Have a look at the release notes here:

https://github.com/PRQL/prql/releases/tag/0.9.0

There is a rather large breaking change in the syntax from `[]` to `{}` for tuples. This is because initially it seemed like these were lists but over time we've realised that they are actually tuples. As per the release notes, freeing up `[]` clears the way to start supporting arrays soon.
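Concretely, the breaking change looks like this (illustrative query):

```prql
# PRQL < 0.9
from employees
select [id, first_name]

# PRQL >= 0.9: tuples now use {}, freeing [] for future array support
from employees
select {id, first_name}
```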

Disclaimer: I'm a PRQL contributor.


The limitation of PRQL is that it only does SELECTs, by design. If you want to insert/update/delete data, you're back to SQL.

That means that your team's data scientist might give you a query written in PRQL, but if you want to actually incorporate it into the data pipeline, you'll need to translate it into SQL.

I wish that PRQL would support at least a limited ability to insert -- for example, maybe just the case of inserting into a new temp table. No update or ON CONFLICT logic to worry about. It could look like this:

    from tracks
    filter artist == "Bob Marley"
    save bob_marley_tmp


I don't see why PRQL can't support data mutation. Just have an insert/update/delete operator that must go at the end of a pipeline, and which takes a table name as an argument. An SQL query like this:

  UPDATE counters SET value = value + 1 WHERE name LIKE 'prefix.%'
Could then be written in PRQL as something like this:

  from counters
  filter startswith(name, 'prefix.')
  derive {
    new_value = value + 1
  }
  select {
    name, new_value
  }
  update counters


This feels way less intuitive than SQL:

UPDATE counters SET value = value + 1 WHERE name LIKE 'prefix.%'


I really hate that SQL doesn't let you do

   UPDATE counters WHERE name LIKE 'prefix.%' SET value = value + 1
In a SELECT it's fine, but in a mutation it's dangerous to make updating literally everything so easy if you leave off the tail.


Yep, this is where I lost interest.


This must not be missing here then: "I don't want to learn your garbage query language" [1]

[1] https://erikbern.com/2018/08/30/i-dont-want-to-learn-your-ga...


Big fan of this post! We link to it from PRQL website.

Our goal for PRQL is a great language to integrate with & build on. So we don't have N languages for N databases. Because PRQL will always be open-source, and won't ever have a commercial product, we think that's much more feasible than a DB-specific or product-specific language.

(PRQL dev here)


As evidenced from the poster's ORM rant, two languages for the same DB seem just as offensive to them as different languages for different DBs. I suspect you're still going to have an uphill battle converting them, unless they were hit with a blinding shaft of sunlight on the road to the SQL History Museum or something.



Previous Show HN: https://news.ycombinator.com/item?id=31897430

Show HN: PRQL 0.2 – a better SQL - 378 points by maximilianroos on June 27, 2022, 161 comments

Original post when it was conceived: https://news.ycombinator.com/item?id=30060784

PRQL – A proposal for a better SQL - 650 points by maximilianroos on Jan 24, 2022, 295 comments



KQL is excellent, I miss it whenever I write SQL.


The one joy I have while using Azure :)


Syntax doesn't matter much; what's in the optimizer does.

I think that right now a lot of SQL problems are baggage. Too many people are battle-taught to write hard-to-read, complex, deeply nested queries because older engines were bad at handling modern constructs (e.g. "look, somebody's got a common table expression (CTE) here, let's materialize it immediately"). So the older style was a mess of

     SELECT ... FROM (SELECT ... FROM (SELECT ... FROM A JOIN (SELECT ...) B ON A.x = B.x)) etc.
With things like improved CTE handling by engines, it is getting easier and easier to write understandable, composable queries structured like the example below. Add modern window functions that remove another layer of complexity, things like SELECT * EXCEPT, more complex TOP statements like TOP X FOLLOWING, etc., and SQL becomes even simpler.

    WITH A AS (
        ...
    ),
    B AS (
        ...
    ),
    C AS (
        SELECT y FROM A
    )
    SELECT result FROM C
    WHERE ...
and in this new-ish world, minor improvements of syntax shape (like PRQL's) do not really bring much benefit, from my POV. Yes, you can rewrite it as an imperative-looking pipeline, but the optimizer can throw away some of it, etc.

Yes, there are things that can and should be done better in SQL, but I have yet to see compelling case for cosmetic, syntactical differences.


There seems to be very little support for these SQL challengers despite many people saying SQL sucks and they want a redo. Why is that?

Some guesses:

- There is some popular non-SQL query language that has gained lots of momentum that I just don't know about.

- People are more effective with SQL because they know it so no new and especially existing databases will switch. This is definitely true to some extent, but if this is the whole reason then I guess SQL is pretty good since people rely on it to be effective.

- There is support for it but a few key decision makers are to blame (big data or the MYSQL lobbyists)

- People actually don't dislike SQL that much and/or it's not that bad and Hacker News is not representative

- These query languages look nice but have technical or performance issues compared to SQL


I think the main reason mainline SQLs continue to be popular is that, unlike general purpose programming languages, the implementation of the query runner is the most important aspect, not the language spec. Take Python for instance. The official implementation objectively sucks, yet it remains extremely popular because the language implementation is only a part of a language’s success. That won’t fly for a db engine.


The last one is definitely a big factor, combined with the fact that you still have to compile to SQL, i.e. you are limited by its limitations, and there isn't always a workaround.

The rest are also true to some degree. But I think that once a serious contender comes along, people will eventually move to it.


You have to love the honesty: "Why PRQL? For HackerNews enthusiasts: The PRQL compiler is written in Rust [and] We talk about “orthogonal language features” a lot" (https://prql-lang.org/)


There are about 17 of these "SQL but better if we start with the FROM statement at the top" languages.

The language-integrated query (Linq) feature of .NET works the same away:

    var studentsGroupByStandard = from s in studentList
                                  group s by s.StandardID into sg
                                  orderby sg.Key 
                                  select new { sg.Key, sg };
Another example is Microsoft's Kusto Query Language, which they use in Azure Log Analytics:

    StormEvents
    | where State == 'TEXAS' and EventType == 'Flood'
    | top 5 by DamageProperty desc
    | extend Duration = EndTime - StartTime


So did QUEL, as implemented in Ingres circa 1976. Perhaps this is what happens when you go back to basics and take a good long look at Codd's relational calculus (or, equivalently, relational algebra), to which SQL bears only a passing resemblance.


back in my day, i thought a WHERE clause on the "RANGE OF alias IS RELATION" would be useful. nowadays, a session level VIEW is the way to go.


>declarative

An admirable ideal, but declarative languages always seem to devolve towards some frankenstein imperative/declarative hybrid. We need to stop going down this path and embrace Pulumi's pattern: use existing general purpose imperative languages to generate a declarative structure. Instead, people try to take their not-mature declarative language and fit a weird general purpose language inside it.

EDIT>> I'm not suggesting that SQL needs to be declarative, only that if a problem space would benefit from declarative structures, generate them imperatively instead.


I've silently been a big fan of this approach for years but hadn't heard about it anywhere (most likely because I'm not familiar with Pulumi).

Could you provide a reference that elaborates on "Pulumi's pattern"?


This is the closest example I could find https://www.pulumi.com/docs/concepts/how-pulumi-works/

Pulumi serves as the strongest contender to Terraform when doing IaC (infrastructure as code). Terraform attempts to be a declarative markup language (HCL) but it has a lot of weird imperative quirks due to (understandably) trying to support common complex use cases. In the end they have a clunky custom language that tries to do what general programming languages have done well forever. Pulumi doesn't re-invent the wheel, and lets programming languages do what they do best. Pulumi only really cares that the programming language generates a declarative spec that can be used to compare with your infrastructure. It's the best of both worlds.


Thanks so much!

> Pulumi only really cares that the programming language generates a declarative spec that can be used to compare with your infrastructure. It's the best of both worlds.

Fully agree! I've called this approach "imperatively generating declarations".

Things might get complicated, though, if you try to nest this approach. E.g. if you imperatively generate the structure of your deployment pipeline, which in turn, contains imperative code to generate your infrastructure spec. :) Or something like that[0].

But that's probably just the nature of the problem.

[0]: I can't come up with a fully realistic scenario right now because it's late. Maybe tomorrow!


PRQL seems the most realistic evolution out of SQL. Changing the programming paradigm will never convince the SQL true believers.


I don't really know what you're saying, can you say it another way? "Most realistic evolution"... why is that needed? If the problem is different database engines implementing the SQL spec differently, that's not something that can be papered over with another abstraction without a lot of wrinkles.


SQL does not need to be replaced, in the same way that COBOL never needed to be replaced. It can get the job done, but we have superior tools today. Better abstractions enable more productivity.

That there are N flavors of SQL is annoying, but there are foundational design choices in the language which we are stuck with today. PRQL is quite readable to those with SQL experience and feels like a plausible next language in the space without reinventing paradigms.


To go back to my original post, my main beef is with trying to turn a declarative language into an imperative one. If you've seen enough of these types of languages degrade, you start to see a pattern.


The ergonomics could be better: https://pastila.nl/?01359244/9ac65f960385a02b3193778b4c6af10...

Significant newlines. Non-obvious way to specify database/schema.


There are two symbols for separating transforms in a pipeline, newline and `|`, so you can use `|` instead of a newline if you prefer.

So for your query from the linked gist, the following should work:

    from system.numbers | select number
Otherwise, if you want to be really strict:

    prql target:sql.clickhouse

    from `system.numbers` | select {number}
You can see here for more details: https://prql-lang.org/book/reference/syntax/keywords.html


This is tangential, but a new query language is inevitably based on the idea that SQL is deficient in some manner (hard, not ergonomic, whatever). More interestingly, it also implies that the countless alternatives aren't good enough either.

Is there an existing query language that anyone will argue is better than SQL? I have limited exposure on this, but if SQL is really not that good then I'd expect there to be a better one at this point. All the new entrants kinda suggest that maybe SQL isn't so bad after all. Even many noSQL db's use an SQL-like language (see PartiQL). I'm not that familiar with alternatives though.


I think of it in the same way as JavaScript. Clearly, it has issues, but if you want to work on the web, it is what you have available. If you want to speak to an Oracle/MySQL/SQL Server/Postgres database, you are writing SQL. New entrants could make a new language, but now they are fighting a battle on two fronts: the novel technology + query language.


> I think of it in the same way as JavaScript. Clearly, it has issues, but if you want to work on the web, it is what you have available.

And, interestingly, the approach they're taking here is similar to how folks have dealt with JS: introduce a transpiled language whose paradigms are close enough to the host language to feel familiar. Reminds me of CoffeeScript, actually (although if we're being honest, I couldn't stand CoffeeScript.)


I counter the argument that just because it is still around it is the best solution. Many things stick around just because of inertia, for example, the QWERTY keyboard.


You're comparing apples to oranges. Typing is much harder to relearn due to muscle memory. New well designed programming languages become quite popular quickly and go viral when they solve the right problems for developers.

SQL is a very sound querying language originally based on relational algebra and tuple relational calculus. Many attempts have been made in the past to come up with better syntax, but they don't go far, which might have to do with the fact that authors realize later that SQL is just a thin layer over the mathematical concepts required to retrieve relational data, making certain assumptions without breaking set-theory axioms.

One common mistake I see is that developers attempting to create an SQL replacement often approach it in the same mindset as creating yet another general purpose programming language.

Edit: The above is not in reference to PRQL. It's just my anecdotal experience working on several SQL engines, building ones from scratch and working in this domain for 15 yrs.


Then state the better solution.

Besides, I'm not saying it's the best because it's most common, or even that it's the best at all. It just makes it difficult to understand the argument that SQL sucks when it's so widely used and no one can agree on an alternative. Including in new databases, which could opt for these "new and improved" variants.

I grant that there is a first-mover advantage and it's not easy to shift paradigms. That's a big factor. That's why I am honestly - not rhetorically - asking which query language is clearly better than SQL? And why do people keep making new challengers instead of backing something that's already better than SQL? There seems to be very little support for these query languages despite many people saying SQL sucks and they want a redo.


Big keyboard is keeping DVORAK down! In a fair world we’d all be typing NewQL faster with AI from our flying electric cars in a communist utopia.


The cross-silo synergy is strong in this one.


What countless alternatives? I can’t think of many, and believe me, I’ve looked hard!


Last time this showed up on HN, I complained about it not having CASE WHEN equivalent, which I've seen heavily used in ETL usecases. Now I see they added it!

Pretty cool and responsive. I'll go take it for a spin.


Forwarding this to some of my team members right now.

We use SQL for a lot of things, almost all of them implementations of customized business rules. When attempting to develop a business rule that needs to manifest as a single SQL query, your brain is forced to work in a very particular crystalline way. Many business rules are defined in procedural terms at requirements time, so having a way to implement them in the same terms could be beneficial.


This looks similar in many ways to Kusto Query Language, one of my favourite tropes of Azure: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/...

I'm delighted to see an open alternative emerge <3


Note that Kusto Query language is open source see https://github.com/microsoft/Kusto-Query-Language


    BlaBla.io Better Tomorrow Today(tm)

    Fullstack Engineer (React, Typescript, PRQL) (100% Remote)

    What We're Looking For:

    You’ve got 5+ years of experience in designing, building and maintaining large apps written in React, Typescript and PRQL

    ...
I'll just add it to my experience just to be safe


Hot takes: SQL is good. Great, in fact.

Not just in what it's capable of doing, but in form. It has warts, yeah, but over the years I've been writing it I've realized most of them are there for a reason. To me it's more than useful, it's beautiful. It's the absolute last thing I want a replacement for.


How do we ever hope to differentiate between useful evangelism and Stockholm syndrome? Either way, I'm probably learning a lot of PostgreSQL this year.

I do really wish we had at least settled on one data format instead of having enough differences between PL data and DB data to start building ORMs.


Unsurprisingly, not everything is intuitive. There are language nuances that can surprise and frustrate users but are often well thought out. It’s exceedingly rare to have a SQL query written 20 years ago that couldn’t be run today, and one written today may be the same or very similar to one written long ago. It’s hard to say the same for other programming languages.


It's been really cool seeing PRQL come to life! I'm not involved with it, but I still remember the very first HN post about a year ago where it was just a proposal. And look at it now! It's really gaining steam, I'm really excited about it!


Except for niches, one is not going to unseat SQL as the primary de-facto standard relational query language unless the replacement is significantly better, not just slightly. Unless a fad storm kicks in, being slightly better is rarely enough to unseat an established standard.

As far as SQL competitor candidates, my personal favorite is SMEQL. It's more composable and uses tables for many operations instead of dedicated commands and syntax, making it more uniform where a handful of idioms do all the work instead of lots of specialized stuff for each task, like what SQL does.


Thanks. I hadn't heard of SMEQL before. To me it structurally actually looks very similar to PRQL.

PRQL just always has `srt` as an implicit parameter (now referenceable as `this` since 0.9) and a more linear flow whereas with SMEQL I guess you could have more complex chains. Actually you can do those with PRQL as well, you just need to break them out into separate `let` expressions. On the upside, PRQL can do away with a lot of the parentheses and some punctuation which makes it a bit nicer IMHO but YMMV.
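A sketch of such a `let` binding (table and column names invented; syntax per the PRQL book's `let` expressions):

```prql
let seniors = (
  from employees
  filter age > 50
)

from seniors
select {id, first_name}
```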

I agree with you that PRQL would need to be *significantly* better to justify the time investment in learning it. For some, that is already the case, but we'll continue working on it to try and hit that sweet spot for even more people.

All the best!

(Disclaimer: I'm a PRQL contributor.)


I frankly don't see many similarities to PRQL.


There is a point in moving the FROM ahead of SELECT and getting sensible auto-complete suggestions from your IDE as you type.

LINQ, however, attempts to solve a much bigger problem: Use consistent code to process data and don't worry whether it is in a remote database or in a local array.

If you manage to do it in a way that

- integrates seamlessly with e.g. Python IDE and I don't have to pass and parse strings

- allows me to access graph, document and relational data

- allows me to pull and process the data from REST APIs, [O/J]DBC and straight from my RAM

- and maybe even includes deductive (PROLOG/DATALOG) features

I will be your most loyal client


Hm, I just realized there are two similar projects with very similar names: this one, and

https://github.com/erezsh/Preql


Having spent more time in MongoDB aggregations than I'd like to lately, I really wish they'd support this. So much more sensible than the madness they've got going on.


This sounds interesting. Could you provide a link to the MongoDB aggregations that you are referring to, because they're not something I'm familiar with?

(Disclaimer: I'm a PRQL contributor.)


Here are examples of MongoDB aggregations: https://github.com/ClickHouse/ClickBench/blob/main/mongodb/q...

They are painful to write compared to SQL queries.

Although the commercial version of MongoDB has support for SQL, it's not available for general MongoDB users.


Thanks!

Those do look awful and more what I remember from the brief period when I used MongoDB in around 2013 or so.

I take it GP comment was just trolling then?


I was not trolling, I actually dislike MongoDB aggregation syntax and it seems like PRQL is something that could potentially replace it or go on top. Admittedly I only took a quick glance at PRQL though, so perhaps I misunderstood something or it's not applicable to this case.

There are examples on Mongo's page, eg https://www.mongodb.com/docs/manual/core/aggregation-pipelin...

eg:

  db.orders.aggregate( [
   // Stage 1: Filter pizza order documents by pizza size
   {
      $match: { size: "medium" }
   },
   // Stage 2: Group remaining documents by pizza name and calculate total quantity
   {
      $group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
   }
  ] )


PRQL is a breath of fresh air. Reporting languages generally lack built-in visualization and drill-down capabilities. An ideal reporting query should define not only how to seek, join, and aggregate data, but also how to visualize output and how to present details in reaction to user clicks. There are some limited efforts like in PowerBI and Splunk, but we need a standard. I wonder if the PRQL folks will address this need in the future.


Has anyone benchmarked it or the bindings? SQL tends to be embedded directly in software programs, so compilation to SQL would likely happen on the fly in real-world cases.


This is from the GitHub page

> PRQL is being actively developed by a growing community

You can see this at https://devboard.gitsense.com/PRQL which shows a good mix of new and not new contributors. Not sure why there was a spike in the last couple of days though. Maybe the maintainers that are in this thread can comment on this.

Full Disclosure: This is my tool


Great tool! Thanks for sharing that.

Regarding the activity, there's been a lot of preparation for the 0.9 release. As to the new contributors, I don't know the reasons for that but all contributions are very welcome. It's a very welcoming community and a great way to make an impact!

Full Disclosure: PRQL contributor


Thanks for the insights. You might also want to add a "good-first-issue" topic and/or labels to your project to signal how open you guys are to new contributors. In the near future, I'm going to start surfacing projects with these labels/topics.


I am currently building a SQL-only web framework [1], and I was contacted by the prql folks to include the language by default in sqlpage. I have to say, I have mixed feelings about it: on one hand, it indeed feels more natural to query data with a clean pipeline of operators. On the other hand, it makes me think of the famous xkcd comic about standards [2]: there were too many data querying languages, so we made a new one.

[1] SQLPage: https://sql.ophir.dev

[2] https://xkcd.com/927/


Just a heads up that Prefect integration link on the landing page doesn't work - 404. This is the one I was interested in!


Sorry that that was your first experience. I have opened an issue for this for you: https://github.com/PRQL/prql/issues/3074

The correct link is the following: https://prql-lang.org/book/project/integrations/prefect.html

TBH, that one is not much of an integration yet. It essentially boils down to:

    $ pip install prql-python
and then

    import prql_python
    PostgresExecute.run(..., query=prql_python.compile(prql))
I don't personally work with Prefect, so if you have any ideas about what you would like to see here, please comment on the issue or on Discord. We're also very open to Pull Requests and they usually get merged fast ;-)

Disclaimer: I'm a PRQL contributor.


Thanks! We just did a big refactoring so we do have some links failing in our tests. Just PR-ed a fix to this one: https://github.com/PRQL/prql/pull/3075

(PRQL dev here)


Naive question :

Isn't something like an F# computation expression or C# LINQ a simpler and more generic way of achieving the same effect?


How are joins handled? A relational db is largely about the relations, otherwise this is more of a document query language.


I had the same question. From the docs, it looks pretty elegant:

    from employees
    join side:left positions (employees.id==positions.employee_id)
which translates to

    SELECT
      employees.*,
      positions.*
    FROM
      employees
      LEFT JOIN positions ON employees.id = positions.employee_id


I like it too. You can also alias them easily

    from e=employees
    join p=positions (e.id==p.employee_id)


positions p

is a SQL alias. Was that too difficult?


It's less about it being difficult, and more about it being easy, obvious to read/write, and consistent within the new design. Was that not obvious?


Except if I wrote it by hand, it would be -

    SELECT *
    FROM employees
    LEFT JOIN positions ON employees.id = positions.employee_id
which isn't really that different.


I don't know the right words, but creating a dbt plugin for this would give it a lot of traction, I bet.


Totally agree. This was a high priority goal for us and @maximilianroos did a lot of work putting something together. In the end, I believe it was really fighting against how dbt is set up though as it is so tightly interwoven with Jinja2. Perhaps with the introduction of the Python dbt models, things might have changed now and it's worth taking another look.

https://github.com/PRQL/dbt-prql

(Disclaimer: I'm a PRQL contributor.)


I couldn't find a mention if this supports SQL transactions?


Have completions been implemented at all? The language looks like it is designed to take that into account, but I don't see that after a cursory search


This is insanely cool. It’ll be interesting to see if this gets decent adoption, and whether the query language ends up being easier to maintain than sql.


Looks interesting but I wonder if this abstraction doesn't leak and it doesn't just become a more limited version of vanilla python + vanilla SQL.


Vanilla python? Where do you get this impression?

The leak to SQL is through s-strings, which are chunks of SQL that get inlined into the resulting SQL. The long-term plan is to remove them completely, but they are needed as an escape hatch for now.


I see that PRQL is compiled to SQL. How do you deal with different dialects of SQL, i.e. Snowflake/Postgres/ClickHouse? Isn't it a potentially impossible abstraction, or at least won't there be places where you need to use the natively supported SQL dialect (unless of course databases adopt PRQL in the first place, which might be great)?


This.

If you are layering an abstraction on top of SQL to avoid the SQL, you fail that goal once something goes wrong.

The reason you fail the goal is that you end up having to have the skills to troubleshoot the SQL anyway. "Wrong" here can be: bad result, bad performance, or even just bad syntax. Unless you're ready to accept being helpless and rely on the knowledge of others, you'll want to look at what's actually hitting the database to know whether it's a database problem, a PRQL problem, or a problem in your own code. And to do that, you'd better be well versed in SQL, your database, and how your DB vendor has implemented their flavor of SQL. So to use this DSL truly competently, you need to know everything that you'd need to know to write plain SQL... and then PRQL on top, and how it might translate into the target SQL of your application. And taking PRQL on its own, it doesn't really seem any simpler than SQL from what I can see: what you gain in compositional simplicity you have to pay back in understanding its own quirks.

To be fair, this approach can be worth it, but I don't think for a standalone DSL like PRQL. For example, in .NET land there's C# and LINQ, where you get a query DSL... but you also get a more seamless integration into C# and its paradigms. The same could be said of Elixir and the Ecto query DSL; all the caveats above remain true, but the DSL brings the value of clean integration into the code base of your app.

Needless to say, I understand why they'd try something like this... but I'm skeptical: the marketing here seems to oversell what you end up really getting.


What made you think the goal is to not know SQL?


This feels like the perfect level of SQL abstraction. SQL is great in theory, but gets really hard to read once you throw in joins with subqueries.


Why not break these subqueries into their own common table expressions and prepend them to your join statements?


Or their own tables or views.


Adding tables doesn’t sound like a solution at all and views come with operational overheads that you need to be aware of.

CTEs should be the first thing you reach for when trying to clarify the intent of the way you’re chopping up your data.
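As a toy sketch (table and column names invented, SQLite standing in for a real database): the same aggregation written as a subquery buried in the FROM clause, and again lifted into a named CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
INSERT INTO orders VALUES (1, 10, 5.0), (2, 10, 7.0), (3, 20, 3.0);
""")

# Subquery buried in the FROM clause: the intent gets hard to follow
# once several of these pile up in one statement.
nested = """
SELECT c.customer_id, c.order_count
FROM (SELECT customer_id, COUNT(*) AS order_count
      FROM orders GROUP BY customer_id) AS c
WHERE c.order_count > 1
"""

# Same query with the subquery lifted into a named CTE, so the final
# SELECT reads top to bottom.
with_cte = """
WITH order_counts AS (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
)
SELECT customer_id, order_count
FROM order_counts
WHERE order_count > 1
"""

assert conn.execute(nested).fetchall() == conn.execute(with_cte).fetchall()
```

Both return the same rows; the CTE version just gives the intermediate result a name.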


Depends on the situation. Creating tables is a space/CPU tradeoff. Views can be managed the same way as the rest of your data pipeline.

CREATE TEMPORARY VIEW and CREATE TEMPORARY TABLE are both valid SQL.
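Both even work in SQLite, for a toy illustration (names invented): the temporary view is recomputed on each read, the temporary table is materialized once, and both vanish with the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 2.0)])

# Temporary view: just a named query, recomputed every time it's read.
conn.execute("""CREATE TEMPORARY VIEW region_totals AS
                SELECT region, SUM(amount) AS total FROM sales GROUP BY region""")

# Temporary table: materialized once, trading space for repeat-read speed.
conn.execute("""CREATE TEMPORARY TABLE region_totals_t AS
                SELECT region, SUM(amount) AS total FROM sales GROUP BY region""")

assert conn.execute(
    "SELECT total FROM region_totals WHERE region = 'east'").fetchone()[0] == 15.0
assert conn.execute(
    "SELECT total FROM region_totals_t WHERE region = 'west'").fetchone()[0] == 2.0
```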

> CTEs should be the first thing you reach for when trying to clarify the intent of the way you’re chopping up your data.

Sure but they aren't a silver bullet. Hence the "or" in my comment.

I use Redshift a lot and temporary tables are useful there for defining the sort and distribution keys when those are not favorable to the current query. Think joining several demographic tables into a single customer dimension, distributing it on the customer key then filtering the product_id distributed sales and re-distributing it on customer before joining. You can't do that with a CTE.

Views are handy when you have ugly CASE statements or other derived logic and need a convenience column that does something like string concatenation. I have a whole repository of them I deploy on top of the default system views for simplifying admin tasks.

If you are struggling to keep your SQL clean it's because you are doing too much at once. Take a step back and re-evaluate your data model. If your physical model is bad then no number of CTEs will save you.


CTEs are basically temporary views and tables.

I’d argue that the extra tables with duplicate data aren’t a space / cpu trade-off as much as they are a data management trade-off. There are cases for duplicating data in the system but I’d leave that to indexes and generated columns unless you really need to materialise it in a different structure.


> CTEs are basically temporary views and tables.

Not in any system I have ever used. CTEs aren't materialized so they're not tables.

Tables (or materialized views if you prefer) are literally space/CPU tradeoffs. I have a data pipeline that combines website logs into a table before I join it with product data. If you tried to do both in one query then the DB falls over.

You seem to be coming at this from an OLTP perspective, but in OLAP data "duplication" is the name of the game. What naively looks like wasteful duplication is actually creating coherent analytical models from disparate data sources. This leads to much simpler analytical queries.


I agree with your further example cases (they were added after my comment).

I was thinking more of the original post, where someone was complaining about readability when you have joins and subqueries, which is the use case for CTEs.


Yeah sorry I edit a lot. I'll bump up my delay.

Tables, views, and CTEs are all tools that can be used to make SQL more readable. They are all valid alternatives to subqueries in JOINs which is the only thing I would say you should "never" do.

I tend to use all of them. I create tables with an optimized (typically star) schema for my purposes. My date dimensions are almost always views on top of a list of dates. I also use views to create a ~"feature store" that pre-joins the underlying star schema. I can then write "simple" analytical queries that utilize CTEs. Those CTEs tend to not have joins and they tend to only apply relevant filters so the final select is clear and concise.
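That layering can be sketched in miniature (schema and names invented, SQLite standing in for the warehouse): a star-schema fact/dimension pair, a "feature store" view that pre-joins them, and a final query whose join-free CTE only applies filters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (date_id INTEGER, product_id INTEGER, amount REAL);
CREATE TABLE dim_product (product_id INTEGER, category TEXT);
INSERT INTO fact_sales VALUES (1, 100, 9.0), (1, 200, 4.0), (2, 100, 6.0);
INSERT INTO dim_product VALUES (100, 'tools'), (200, 'toys');

-- "Feature store" view: the star-schema join is written once, here.
CREATE VIEW sales_wide AS
SELECT f.date_id, f.amount, p.category
FROM fact_sales f JOIN dim_product p ON f.product_id = p.product_id;
""")

# Analytical query: the CTE has no joins and only applies the relevant
# filter, so the final SELECT stays short and clear.
rows = conn.execute("""
WITH tool_sales AS (
  SELECT date_id, amount FROM sales_wide WHERE category = 'tools'
)
SELECT date_id, SUM(amount) FROM tool_sales GROUP BY date_id ORDER BY date_id
""").fetchall()

assert rows == [(1, 9.0), (2, 6.0)]
```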


> CTEs aren't materialized so they're not tables.

You might well know this, but one thing that's worth pointing out is that in Postgres you _can_ materialise the CTEs (though it's more to trick the planner when your use case is about making sure everything is exploded and indexed in advance).

    with x as materialized (select ..) select ...


neat trick!

sometimes you absolutely need to materialize your CTE, especially if it is one big computation and you refer to it 4-5 times downstream.


According to the docs [1] this is not necessary:

> All queries in the WITH list are computed. These effectively serve as temporary tables that can be referenced in the FROM list. A WITH query that is referenced more than once in FROM is computed only once, unless specified otherwise with NOT MATERIALIZED.

[1]: https://www.postgresql.org/docs/15/sql-select.html


Yes poorly written code is hard to read. That flaw is not unique to SQL.


Sweet, it looks like the language M should have been (or should become)!

Granted, M would massively benefit from better tooling too.


Ah, nice to see a mention of M. I don't see it mentioned a lot on HN and get the feeling most folks here probably don't even really know about it because it's so embedded in PowerQuery and PowerBI. It's actually not a bad language, it's just quite verbose in terms of the keywords.

I actually draw on my experience with M a lot in terms of the input that I give to PRQL because for data munging M is quite a good language.

Until we did the [] to {} syntax swap in 0.9, PRQL actually started to look a lot like M because M is the only other language I know that has [] for tuples.

(Disclaimer: PRQL contributor here)


Yeah, it's a neat tool generally speaking.

What holds it back most of all I feel is the prison that is the Power Query editor (I have a sizable list of small and large papercuts in mind).

Now that PBI allows experimental export to pbip, change may be on the horizon. Hoping for more fine-grained tools, a better profiler and a better editor.

Will look at PRQL when I find the time. The planned column lineage feature and type checking e.g. seem like a godsend. Good work!


How does this compare to EdgeQL?


This seems like just a compiler from a better language (PRQL) to SQL, while EdgeDB is actually a database that comes with not just a better language (EdgeQL), but also other great features (I love their TypeScript query builder or how incredibly frictionless the migrations are).


Frictionless until you need to do standard stuff like a stored procedure or a view.


Why developing yet another language if we already have LINQ?


So, in principle, how is this different approach than DBT?


dbt is a tool for defining and orchestrating in-database transformation jobs. Typically you write SQL statements to define the transformations (plus some template language to pull in additional context if needed).

PRQL is an alternative language for writing queries which compiles to SQL. There is a dbt plugin which presumably allows you to define the transformations in PRQL instead of SQL, while everything is still orchestrated by dbt.


The sooner SQL is phased out in favor of something more akin to Hadley Wickham's dplyr the better. Don't particularly like the syntax of this but it's the right direction.


Has anyone compared PRQL and Malloy head to head?


I'm not aware of a head-to-head comparison. My impression is that they have slightly different aims.

PRQL intends to express analytical queries as a series of transformations which form a data pipeline.

My sense is that Malloy aims more to provide a kind of semantic layer as well as dealing with nested data and reports.

(Disclaimer: I'm a PRQL contributor.)


This is just tidyr?


Kind of. dplyr, Pandas, LINQ, Kusto, etc. are big inspirations. The fact that these are so popular and have reinvented the same workflows with slightly different syntaxes is, to me, a sign that they capture something fundamental about how humans like to think about data transformations.

PRQL is indeed very close to dplyr. In my (biased) opinion, PRQL is actually a bit cleaner than dplyr because it is its own language and doesn't have to work as a DSL inside R. The same goes for comparisons with Pandas and Polars, which have to work as DSLs inside Python.

Compare

    from mtcars
    filter cyl > 6
    select {cyl, mpg}
    sort {-mpg}
 
with

    mtcars %>%
    filter(cyl > 6) %>%
    select(cyl, mpg) %>%
    arrange(dplyr::desc(mpg))
Incidentally, I produced the dplyr code from the PRQL with

    install.packages("prqlr", repos = "https://eitsupi.r-universe.dev", lib="~/.local/R_libs/")
    library(prqlr, lib.loc="~/.local/R_libs/")
    library("tidyquery")
    "
    from mtcars
    filter cyl > 6
    select {cyl, mpg}
    sort {-mpg}
    " |> prql_to_sql() |> tidyquery::show_dplyr()
(Disclaimer: I'm a PRQL contributor.)


In a way, yes.



