
PartiQL: One query language for all your data - portmanteaufu
https://aws.amazon.com/blogs/opensource/announcing-partiql-one-query-language-for-all-your-data/
======
lwansbrough

        PartiQL> SELECT * FROM [1,2,3]
           | 
        ===' 
        <<
          {
            '_1': 1
          },
          {
            '_1': 2
          },
          {
            '_1': 3
          }
        >>
        --- 
        OK! (86 ms)
    

Jeez. 86ms for this query on this data set? Hope that's not representative of
the general performance!

~~~
justicezyx
You missed the point. Modern infrastructure's primary value is
scalability. This number is, of course, bad for data this small. But it will
be more impressive when the data is a million times bigger.

~~~
jiggawatts
You missed the point. Modern infrastructure's scalability is irrelevant if
even one user's experience is poor.

In the era of 64-core processors, scaling horizontally is meaningless for
99.9% of architecture designs. Latency matters to everyone, always.

Trivial queries taking nearly 1/10th of a second on modern kit is absolutely
atrocious, and shows a total lack of awareness of performance as a feature.

~~~
monsieurbanana
> Latency matters to everyone, always.

I used to do a lot of BigQuery for analytics. Latency in BigQuery is crap and
clearly not its selling point; we're not talking ms here, we're talking
seconds at a minimum. Yet it's a really nice database for its use cases.

~~~
vgt
We've improved our floor latency by a factor of 5 over the last two years,
we've introduced clustering, and the newly introduced BI Engine gets you into
the tens-of-milliseconds range, so give it a try again :)

(Product manager on BigQuery)

------
xpe
A common query language, while appealing, is unlikely to fully abstract over
different types of databases with different features and performance
trade-offs. It will be a leaky abstraction.

Now, in practice, perhaps with sufficient adoption and integration, PartiQL
might be good enough for 80% of use cases.

~~~
yannisGP
(I'm part of the PartiQL effort.) You are right about the challenge you point
out, and we are realistic about it. Thus this line in the charter:

> While the adopting query engines generally may not support all features of
> PartiQL, a database engine that “supports PartiQL” is expected to be
> consistent with the PartiQL specification in the syntax subset it supports.

------
jnordwick
'SQL’s ORDER BY orders the output data. Similarly, the PartiQL ORDER BY is
responsible for turning its input bag into an array.'

That is the most important thing for my uses. I deal mostly in time series
data, and SQL windowing queries are too slow. Turning the set into an array to
allow indexing and support easy time series queries is enough for me to use
it.
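
To illustrate the point about ordering, here is a minimal sketch in plain
Python (not PartiQL), assuming a bag can be modelled as an unordered
collection of rows and that sorting it yields an array whose positions are
meaningful. The names `readings`, `order_by`, and `deltas` are hypothetical and
exist only for this illustration.

```python
from collections import Counter

# A "bag": unordered rows, duplicates allowed, no positional index.
# Hypothetical time series readings as (timestamp, value) tuples.
readings = Counter([(3, 12.5), (1, 10.0), (2, 11.0), (4, 12.0)])


def order_by(bag, key):
    """Turn a bag into an array (a list) by sorting its elements."""
    return sorted(bag.elements(), key=key)


def deltas(rows):
    """Change in value between each row and the one before it, by position."""
    return [rows[i][1] - rows[i - 1][1] for i in range(1, len(rows))]


ordered = order_by(readings, key=lambda row: row[0])
print(ordered[0])       # earliest reading, reachable by index once ordered
print(deltas(ordered))  # "previous row" is just position i - 1
```

Once the rows are in an array, a "compare with the previous row" query needs
only positional access rather than a window function.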

~~~
dlurton
ORDER BY is still in the works:
[https://github.com/partiql/partiql-lang-kotlin/issues/47](https://github.com/partiql/partiql-lang-kotlin/issues/47)

------
manojlds
What does this offer over Hive SQL, which Spark also supports?

Below are the reasons given in the blog post; I am trying to compare them
with Hive SQL + Spark.

SQL compatibility - I need to check this as I am not a SQL expert, but Hive
SQL seems compatible

First-class nested data - supported

Optional schema and query stability - supported

Minimal extensions - feels like the same goal as Hive SQL

Format independence - yes

Data store independence - yes.

------
rdsubhas
There is one word that every vendor hates: "vendor agnostic". Minor
differences in SQL dialects are not a bug, they are features for most vendors.

Most customers running on Amazon (or any cloud) want to move from having to
maintain their own databases (which takes a lot of effort) to paying someone
else to do it. Amazon knows this.

This move looks like Amazon has everything to win and every other vendor has
everything to lose. Even if they say the opposite (you can switch from Amazon
to your own) - they know that extremely few customers have the will to
operationalize their own databases. So they know that only the opposite will
happen - customers will switch from self-hosted to Amazon services. They have
also been openly predatory towards other open source databases (e.g. AWS
Elasticsearch and Mongo). No wonder all Amazon services already support this.

In that context, who is the target audience and what is the deployment model
here? Are vendors going to integrate this directly into their databases? Or
do users have to run their own proxy instances? Or is it compiled into the
application as a library?

------
kodablah
Is there a specification in anything besides PDF easily available to link to?

~~~
throwawayoo
Here you go:
[https://partiql.org/assets/PartiQL-Specification.pdf](https://partiql.org/assets/PartiQL-Specification.pdf)

~~~
majewsky
OP asked for anything _besides PDF_.

------
AtlasBarfed
AWS is all-in on data lock-in.

This may be powerful and useful, but it is proprietary, nontransparent,
unstandardized, and nonportable.

I get that every database has some platform lock-in, but it's getting
ridiculous. At least Amazon's relational offerings need to adhere to binary
driver protocols.

------
zellyn
Anyone know how this compares to Presto and zetasql?

~~~
dlurton
One big difference is native support for nested data that's built right into
the syntax of the language. Most other SQL implementations allow support for
nested data through functions which have non-intuitive syntax.

~~~
cmollis
We generally build views to unnest the arrays, maps, and structs and query
from them (or build other tables from the views in Hive), but something like
this is certainly a bit easier.
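
As a rough illustration of what such unnesting views do, here is a minimal
Python sketch (not PartiQL or Hive), assuming rows that carry a nested array
of structs. The `employees` data and the `unnest` helper are hypothetical; the
sketch pairs each parent row with each element of its nested array, which is
the flattening a view would otherwise provide.

```python
# Hypothetical nested rows: each employee carries an array of project structs.
employees = [
    {"name": "Ana", "projects": [{"name": "etl"}, {"name": "search"}]},
    {"name": "Bo", "projects": [{"name": "billing"}]},
    {"name": "Cy", "projects": []},
]


def unnest(rows, field):
    """Yield (parent, child) pairs, one per element of the nested array."""
    for row in rows:
        for child in row[field]:
            yield row, child


# Flatten to one output row per (employee, project) combination.
flat = [
    {"employee": e["name"], "project": p["name"]}
    for e, p in unnest(employees, "projects")
]
for row in flat:
    print(row)
```

In this sketch, a parent whose array is empty produces no output rows, much
like an inner join.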

------
ahl
@dlurton since you seem to be speaking for the PartiQL team on this (congrats
on the launch!): The reference implementation is open source; what's the plan
for the language spec? Is that something that AWS is going to own and control?
The website references the PartiQL Steering Committee -- is that just AWS
folks or is the intention to make it more broadly composed of members of the
community you build?

I'm interested in adopting PartiQL for our product, but would we get to
participate in the evolution of the language or would we purely be downstream
of the decisions made to benefit AWS products and services?

~~~
yannisGP
Hi @ahl, I'm a member of PartiQL's steering committee and glad to see your
interest in participating in PartiQL's evolution. The language spec source
will be open-sourced as well, early next week (week of Aug 5). Overall, we
look forward to a community effort and to participants who are interested in
making significant investments to achieve the project's goals. Diverse
opinions and viewpoints, both on the language and on the process, are very
welcome.

At this point, the maintainers/committee consists only of Amazon members. As
PartiQL grows towards a diverse community, we expect to add
maintainers/committee members (for code and spec) with non-Amazon
affiliations and to explore more formalized methods of governance, as they
emerge from our community discussions.

Please email us at partiql-committee@amazon.com to further coordinate.

------
pawelduda
Love the codebase. I had never written any Kotlin (and very little Java) and
was able to (hopefully) complete a good first issue very quickly.

------
manigandham
This is pretty nice, if only because it lets you use SQL dotted syntax
seamlessly with JSON data.

~~~
smt88
Postgres has had this for years. It's arrows instead of dots, but that's the
only visual difference.

~~~
manigandham
It's not the same at all, and it gets much more verbose with minor complexity
and lacks functionality.

PG is working on adding SQL/JSON support for JSON Path queries for the next
version. It'll be a major improvement but still not as nice as what PartiQL
has here.

------
pushingice
Will this be integrated into AWS Athena? The blog post doesn't mention it.

~~~
manigandham
AWS Athena is basically managed Presto, so AWS will have to modify Presto to
support it. They might, and hopefully they will upstream the changes.

------
whoevercares
Awesome! I’d be very interested to see when DynamoDB supports this language
and a MongoDB-like query builder. Then I might sell all my MDB shares...

------
ohnoesjmr
I wonder how this deals with nested parquet data, and whether it's able to
optimise on the things parquet provides.

~~~
dlurton
It would be possible to integrate parquet data with PartiQL.

Here is an example of integrating PartiQL with CSV files:
[https://github.com/partiql/partiql-lang-kotlin/blob/master/e...](https://github.com/partiql/partiql-lang-kotlin/blob/master/examples/src/java/org/partiql/examples/CSVJavaExample.java#L36).
Integrating with Parquet would of course be more complex than that.

------
k__
Is this a GraphQL alternative or more for accessing DBs in the backend?

~~~
girvo
Correct me if I'm wrong, but I believe the answer is "both", at least as far
as I've read.

~~~
TheDong
GraphQL is intended for public APIs which are interacted with by arbitrary,
possibly malicious, queries.

This appears to be for known queries. Unless it is designed for arbitrary
queries, DoS is a likely problem.

------
agentultra
Interesting that they opted for a relational model rather than a categorical
one; the latter is proving to be more flexible [0].

[0] [https://www.categoricaldata.net/](https://www.categoricaldata.net/)

~~~
cheez
How is it proving to be more flexible?

~~~
wisnesky
I'd be happy to showcase our recent progress. But let's connect offline, so as
not to hijack the conversation. Feel free to drop me a line at
ryan@conexus.ai.

------
unnouinceput
Quote: "PartiQL requires the Java Runtime (JVM) to be installed on your
machine."

And that right there is where they lost me. Nooo thank you.

------
benburleson
[https://xkcd.com/927/](https://xkcd.com/927/)

------
mehh
So I assume this is a rebranding of some other open source project with the
amazon brand stuck on it, or is it actually something distinct?

~~~
danso
The posted article says it was designed and built in house, where it is
currently dogfooded, and the specification doc is dated today (2019-08-01):

[https://partiql.org/assets/PartiQL-Specification.pdf](https://partiql.org/assets/PartiQL-Specification.pdf)

------
breck
This is neat. Anyone want to add support for TreeBase/Tree Notation?
[http://treenotation.org/treeBase/](http://treenotation.org/treeBase/). It's
currently on the back burner to query TreeBases in SQL without first having to
convert the TreeBase to SQL. Seems like it would be relatively straightforward
to use this to do that.

~~~
topicseed
Most of your recent comments link to your project's website. Please stop with
the abusive promotion.

~~~
krapht
I have real problems with TreeNotation advertising itself as "a software-less
database system". It's basically abusing your filesystem to store a tree data
structure and using git to handle concurrent updates. That's great, except for
the part where they ask "does it scale"; and the answer is yes, but should be
no.

A database is so much more than just a schema and validator, but this is being
advertised as a database replacement.

And, I want to stress, this doesn't mean I think it isn't useful. I
bet there are lots of times where you want to enforce some sort of structure
on a bunch of folders with files in them. That's not a database though.

~~~
breck
Thanks for the feedback! I did not expect that confusion. I just made an
update (and will push shortly) to be more explicit that it scales for
_collaborative knowledge bases_. But I'm not talking about something like real
time transactional DBs, etc.

I have not used TreeBase for anything other than collaborative knowledge
bases. Haven't even thought much beyond that. Thanks for letting me know that
wasn't clear.

~~~
krapht
There's nothing special about collaborative knowledge bases that makes them
immune to scaling problems. You can't use TreeNotation to run Wikipedia.
Fundamentally it isn't a database, period. It's a schema for files in a file
system.

~~~
breck
> Fundamentally it isn't a database, period. It's a schema for files in a file
> system.

This is false. It is both.

A database is merely an application that provides an interface to structured
data on disk.

I know a thing or two about databases, having contributed to a few of the
larger open source ones.

There's a lot more to TreeBase than is on the website right now. As the
website says "We have been using TreeBase for over 2 years in systems with
millions of rows and dozens of collaborators." For all you know, you may have
actually used a website that is powered by TreeBase (well, a TreeBase
application written in a different host language, but the file system
semantics are the same).

~~~
rufugee
Not sure why you're getting negativity. Though I haven't gone back and
reviewed your past comments, I don't see any harm in mentioning it here... It
seems somewhat on topic and looks interesting for certain use cases.

