
Important database news from the last few months - MarkusWinand
https://winand.at/newsletter/2018-07/sql-over-the-cloud-fsyncgate
======
esaym
Anyone know a really good SQL book or tutorial? Every book I've ever bought is
usually way too simple. I'll flip to the "advanced" section thinking I'll
finally learn some tricks only to see the "advance" section just introduces
the join syntax...

I'd like a good book that goes over queries common in reporting, you know a 3
page sql query that joins to sub queries that themselves are joining to sub
queries. You know, the good stuff!

~~~
ivan78
If you are looking for something advanced, you should check books written by
C. J. Date. For example, his "An Introduction to Database Systems" considered
as one of the best book on database theory.

~~~
ris
_Theory_.

However he insists on living in his relational purist fantasy land without
nulls or outer joins or other pragmatic things that people who use databases
to get real work done rely on every day.

------
peatmoss
I recently bought the author’s book to brush some cobwebs off parts of my
SQLing. Over the years I’ve seen Markus’s blog posts and guides posted here on
HN. I’m always impressed when I see an individual focus on a topic and build
an identity / business like this. Thanks for all the blog posts, Markus!

------
rm999
SQL is by no means perfect. For one, t̶h̶e̶r̶e̶'̶s̶ ̶n̶o̶ ̶o̶f̶f̶i̶c̶i̶a̶l̶
̶s̶p̶e̶c̶i̶f̶i̶c̶a̶t̶i̶o̶n̶: some dialects are meaningfully different than
others, and even similar ones are often full of implementation details about
the underlying database. It has a ton of quirks, and isn't as powerful as I'd
always want it to be. And sometimes it can be really hard to read.

But I still haven't found a better universal "language" to talk about
extracting and manipulating data. The core set of functionality of SQL
(selects, joins, aggregations) is usually adequate to express powerful ideas.
Most importantly, a lot of people who work with data know it, and I've found
semi-technical people (including non-programmers) can learn it quickly. Anyone
who has been paying attention to trends in the data world in the last 5 years
knows SQL is here to stay for the long-term.

~~~
atombender
As for myself, I wish SQL's SELECT were more expressive. It's currently
organized almost exactly like a mirror of the underlying relational algebra
operators that a query engine has to execute internally.

But relational operators isn't what developers like to work with. SQL is about
"projecting" relations, putting them through some basic transforms, none of
which compose that well because of SQL's statement-oriented syntax. For
example, you can't group a group-by. A group-by takes a list of terms, but you
can't funnel the output through another one without wrapping the whole query
in a "SELECT FROM (...) AS ...".

I'd like a query language that more expressive, expression-based syntax where
the results of each expression can be "piped" into the next step to perform
transformations, joins, reductions and so on.

~~~
MarkusWinand
SQL uses nesting as you have shown instead of a "pipe" (think of it like
functional programming).

The difference is just the syntax, not the expressiveness.

~~~
atombender
Well, I am explicitly talking about syntax.

------
Scarbutt
Finally, an open source SQL database with history for data builtin:
[https://mariadb.com/kb/en/library/system-versioned-
tables/](https://mariadb.com/kb/en/library/system-versioned-tables/)

Always wondered why this feature has been ignored for so long.

~~~
Latteland
Thanks for pointing that out! They appear to only support one of the two time
models proposed by the research community. They support transaction time (when
you write it), but not the interesting and useful 'valid time'. The difference
can be explained as you are going to be paid a salary for working say from
July 1, 2018 to June 30, 2019 (this is valid time), but we write the
information into the db at a different time, say we write it on June 15 (this
is transaction time), then we realize a mistake and update your salary record
on June 20th. But your employment starts July 1.

~~~
MarkusWinand
Just FYI: The SQL standard supports both types. The other one (valid time) is
called "application versioning" in SQL.

Read this paper if you'd like to learn more about it:
[http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeat...](http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf)

~~~
Latteland
Cool I hadn't heard about that.

------
thanatos_dem
It would be interesting to see a "new and noteworthy" section in these types
of news recaps. There's always a lot of hype around new databases, and it'd be
nice to see an unbiased perspective from someone with such deep knowledge of
database systems, similar to Aphyr's deep dives into consistency and
durability.

This seems focused on only SQL systems, and only the largest ones, which is
okay, but I think there's a lot more to the story than that :)

~~~
jchanimal
If you are into global consistency but also want NoSQL flexibility, my
employer FaunaDB has ACID transactions, joins, indexes, and object level
access control.

~~~
atombender
I'm sure you guys are doing good business, but the only way we'd use FaunaDB
is if it's open-sourced.

I suspect you'll have a hard time competing against Google Spanner (or even
Google Cloud Datastore).

------
noncoml
From a _developer’s_ point of view, I absolutely hate SQL. Converting my data
from and to row format is a pain. Transactions don’t really tie well with the
rest of my application logic. And of course embedding multiline SQL statements
in my code looks like monstrosity.

~~~
tzahola
Why do you put SQL statements in the “core”? Just put them behind and
interface and call it a day.

SQL and relational data is the only option, unless someone comes up with an
equally powerful alternative with similar theoretical underpinnings.

~~~
noncoml
Sorry, it was a autocorrect shenanigans. s/core/code

------
olavk
Is this the final nail in the coffin for the NoSQL hype?

Although I'm not sure the road back to sanity is to add SQL interfaces to non-
relational database systems. SQL and the relational model are often conflated,
both in the NoSQL hype and apparently also in the backlash. I like things like
LINQ in .net which allows you to work with a relational model and use
relational algebra without having to use the SQL language

~~~
kimi
I am personally playing a bit with Apache Calcite - SQL without the data
storage - and it sounds very interesting. For most reporting jobs, a couple of
SQL queries are all that s needed and are easy to read and easy to write, even
if you are actually querying some in-memory structures.

------
Annatar
"According to the “shoot the messenger” principle, PostgreSQL has been heavily
criticized for fsyncgate. Indeed, PostgreSQL suffers from this problem more
than other databases as it doesn't offer direct IO."

Not a problem on any illumos based operating system, as fsync(2) and ZFS do
what they're supposed to do and do not lie about data being on stable storage
when it isn't. PostgreSQL is a good database, and if you want it to be bullet
proof, run it on an illumos based operating system.

~~~
corbet
I would dispute the claim that PostgreSQL has been so heavily criticized.
Instead, the project was invited to send a representative to the Linux
Storage, Filesystem, and Memory-Management Summit
([https://lwn.net/Articles/lsfmm2018/](https://lwn.net/Articles/lsfmm2018/))
to discuss solutions with the kernel development community; some of the
resulting kernel changes are finding their way toward the mainline now.

~~~
anarazel
Indeed (I was the PG guy at LSF/MM). I didn't feel ill-treated at all. Unless
you count attempted poisoning via adult beverages, that is.

------
spraak
Maybe the NOSQL hype is dying down, but it still seems like for caching for
fast reads it can be helpful. What do you think?

~~~
emperorcezar
NoSQL is a catch all term. There is a place for key/value, document, graph,
etc databases.

People got all excited cause they discovered that these exist. They've been
around for a long time.

Relational DBs are a good default, but there are plenty of cases in which
you'll use another type of DB.

Engineers need to stop trying to find a silver bullet.

~~~
mmt
They also exist as features within an (otherwise) relational DBMS. Perhaps the
trouble is that it's mostly been in the realm of the commercial ones and only
relatively recently has Postgres started adopting enough of them to make the
notion popular.

------
geekuillaume
On the database subject, I'm working on a new project called DBacked,
basically a simple and encrypted database backups as a service. I'm not ready
yet for a Show HN (more things to improve on the presentation website) but
happy to get your advices ;)

~~~
mxschumacher
easier if you show us the code

------
gfodor
"NoSQL" took the idea of a rejection of syntax (SQL) but also, in practice,
burned in with it a rejection of vast swathes of database theory that resulted
in the modern RDBMSes. It was a meme that was generally easy to ignore and
identify the "bad ideas" within if you understood the history of databases.
SQL obviously is just an incidental syntactic detail on top of a fairly sound,
generalized model for databases -- relational algebra combined with a clear
semantic model for transactions (ACID, etc.)

These two things I believe will stand the test of time. Everything else, I
think, is ultimately incidental complexity to meet operational goals (like
indexing, or caching methods, or sharding, or replication, or denormalization)
or conceptually useful models and interfaces (like graphs, or literate APIs,
or query langauges, or ORMs, etc) on top of these basic facilities that need
to be provided by a database system.

The trend of NoSQL databases I think can be summarized as people building
systems that tilted towards letting these bits of incidental complexity
dictate the design of the system as a whole. The overriding concern seemed to
be "scalability and performance" with "easy APIs" a close second and so that
resulted in specific data structure and API needs or access patterns dictating
everything about how the database system itself worked, and how data was
modeled in that system.

For example, MongoDB was hearalded as finally letting you access a
hierarchical data structure without SQL JOINs, but did so not by just
providing a nice abstraction on top of a sound relational schema that allowed
you to "re-project" your thinking and access patterns to a document-oriented
one when it was the right model contextually, but instead by burning that
denormalized, document-oriented data structure into the entire conceptual and
technical stack for the whole system!

The cost, of course, is self-evident since choosing to reject the relational
model also chooses to reject the benefits that were explicitly recognized as a
reason for the relational model to be a good choice for database systems:
normalized data allows a much more rich set of transformations and projections
on that data to be unambiguously expressed and fulfilled by the system. In
other words, it's much more future proofed since you now have the flexibility
to mix-and-match and analyze it in arbitrary ways, decoupled from your choice
of schema. So the effect of choosing MongoDB meant that you had minimal
future-proofing if you got your document structure wrong and suddenly needed
to re-project your data in a new way. The supported path, of course, was to
"just run map reduce on it" \-- ie, lets force every application developer to
do the job the database is supposed to do :(

It's nice to see things are moving forward on this again, hopefully the trend
doesn't just reverse now and we have a flood of "SQL access" APIs to poorly
grounded database systems.

------
zzzcpan
> The marketing term NoSQL, which was the hippest buzzword just a few years
> back, is slowly becoming a synonym for a defect. Without SQL, there is
> something missing.

I guess RDBMS companies are too afraid of NoSQL to attack it with such silly
propagandist statements. You can't win the market this way.

To me it increasingly looks like SQL is losing in distributed systems. And
NoSQL may never really gain SQL beyond niche usage after all.

~~~
Joeri
_To me it increasingly looks like SQL is losing in distributed systems._

I’m seeing the reverse trend. Google went from nosql to sql with cloud
spanner. The various flavors of SQL on hadoop are on the uptick. Cockroachdb
is looking really interesting.

~~~
zzzcpan
But do you see Spanner, CockroachDB, TiDB actually getting any ground?

------
mmaunder
Whew. Those years of enduring "SQL is dead" posts were not easy. Now if we can
just rebrand "cloud" to "someone else's computer", much will be right with the
world.

~~~
ebikelaw
The conflation of SQL with consistency in this blog post is dumb. Yes Google
offers SQL on Spanner but the query execution is built atop lower level
Spanner primitives that are also available to developers and are similar to
Bigtable concepts. Also Bigtable is still much, much larger and more important
within Google than is Spanner/F1, so the death of Bigtable is much overhyped.

Point is there are strong and weak consistency models with and without SQL
execution engines, and they’re all useful.

~~~
derefr
The SQL standard specifies explicit consistency semantics for statements and
transactions, though. So something claiming to adhere to e.g. SQL:2011, is
claiming that executing certain commands will have certain consistency
properties (and if you've set up your DBMS to have weaker consistency, then it
should be blaring a loud alarm that you've effectively put it in a non-SQL-
standard-conformant mode.)

Other generic DBMS query protocols _could_ specify their own consistency
semantics, but I'm not aware of any that do. (I guess, if you consider REST a
"DBMS query protocol", then CouchDB is an example of a DBMS "conforming" to
the consistency semantics specified in the REST protocol.)

~~~
ebikelaw
If that's your definition then it's even more dumb to hold up Google Cloud
Spanner as an example of how SQL gives you consistency. There's NO WAY to
mutate Spanner tables via SQL. All writes are performed via the lower-level
RPC interfaces. The only thing you can do with Spanner from SQL is query it.
The consistency provided by Spanner is clearly not provided by SQL.

~~~
derefr
Even if a DBMS only exposes read-only SQL support, that still has consistency-
model implications—namely that, if you can execute several SQL statements in a
transaction with a long delay between those statements, then all the
statements in the transaction should execute against the same MVCC "version"
of the affected tables.

So a write on that lower-level RPC interface _shouldn 't_, by the SQL spec, be
able to cause a sequence of SQL wire messages like this:

    
    
        BEGIN
    
        SELECT COUNT(*) FROM foo;
    
        -- 100s later
        SELECT COUNT(*) FROM foo;
        COMMIT;
    

...to return separate numbers for those two SELECTs.

If it could, under Spanner, then Spanner wouldn't be conformant to [the read-
only subset] of whatever SQL standard it claims to obey. But, according to the
Cloud Spanner docs ([https://cloud.google.com/spanner/docs/transactions#read-
only...](https://cloud.google.com/spanner/docs/transactions#read-
only_transactions)), the above code _will_ always return the same values for
the two queries.

And, if you're feeling argumentative, you could interpret the _reason_ for the
particular consistency semantics, as the fact that Cloud Spanner claims to
support SQL, and SQL mandates that consistency. You could suppose that,
without that claim, Cloud Spanner would never have bothered building a
transactional MVCC layer into their architecture.

~~~
ebikelaw
Unfortunately, no. Spanner may garbage collect the snapshot timestamp to which
your example query was bound, if the garbage collection policy was triggered
by additional writes at greater timestamps, or by the simple passage of time.
The result of the second query may be a permanent error indicating that your
data has disappeared.

Spanner had a transactional MVCC system in their original implementation
described in the 2012 paper which predates SQL support described in the 2017
paper. Therefore it is probably wrong to credit SQL with Spanner's MVCC
system.

~~~
derefr
> The result of the second query may be a permanent error indicating that your
> data has disappeared.

That's actually okay, per SQL. Any SQL transaction may fail, for any reason,
at the server's discretion. Because the user's permission to operate on the
table was revoked in the middle of their transaction, for example. Or because
the server is going down and wants to drop connections.

SQL clients have to be written to assume that any transaction could enter a
failed state "before" the execution of a statement, even if the previous
statement returned a success.

Helpfully, though, they know that 1. by the SQL standard, that failure means
that no part of the transaction applied; and 2. that this failure-state is a
failure of the transaction context to be valid, rather than a failure of their
statement to be valid. (They don't know that their statement _is_ valid, but
they know that they haven't been told that it isn't.)

------
amirouche
How is realistic this article without a word about FoundationDB?

