
SQLite Release 3.25.0 adds support for window functions - MarkusWinand
https://www.sqlite.org/draft/releaselog/current.html
======
nanimo
[https://www.windowfunctions.com](https://www.windowfunctions.com) is a good
introduction to window functions.

Besides that, the comprehensive testing and evaluation of SQLite never ceases
to amaze me. I'm usually hesitant to call software development "engineering",
but SQLite is definitely well-engineered.

~~~
twoquestions
This looks great, but I couldn't get through the first question on aggregate
functions. Are there any SQL books/tutorials that go over things like this?

A lot of material I've seen has been like the classic image of "How to draw an
owl. First draw two circles, then draw the rest of the owl", where they tell
you the super basic stuff, then assume you know everything.

~~~
AlisdairO
(self plug) try pgexercises.com. It's in a very similar vein to
windowfunctions - I understand it was one of the original inspirations - but
it covers a much wider range of material, with a gradual growth from very
basic stuff up to slightly advanced.

~~~
mercer
I'll vouch for this plug! I've spent quite a bit of time learning SQL (and
Postgres specific stuff), and this was one of my favorite resources.

I can also recommend Learning SQL by Alan Beaulieu, which was recommended in
some HN comment. After looking into various books, some of them Postgres
specific, I found this one to be the best for learning from scratch.

------
coleifer
For Python folks interested in using these features, you might be interested
in this post [0] which describes how to compile the latest SQLite and the
python sqlite3 driver. I've got a fork of the standard lib sqlite3 driver that
includes support for user-defined window functions in Python as well which may
interest you.

[0] [http://charlesleifer.com/blog/compiling-sqlite-for-use-
with-...](http://charlesleifer.com/blog/compiling-sqlite-for-use-with-python-
applications/)

[1]
[https://github.com/coleifer/pysqlite3](https://github.com/coleifer/pysqlite3)

------
molyss
The query optimizer improvements are pretty cool too.

Even though, I don't really understand that one : "The IN-early-out
optimization: When doing a look-up on a multi-column index and an IN operator
is used on a column other than the left-most column, then if no rows match
against the first IN value, check to make sure there exist rows that match the
columns to the right before continuing with the next IN value. ". I would
think there's no need to check the right column(s) if the leftmost one has no
match...

~~~
SQLite
Suppose your query is:

    
    
       SELECT * FROM tab WHERE key1=1 AND key2 IN (2,3,4,5);
    

SQLite starts by doing a single b-tree lookup on the index on (1,2) - composed
from the key1 field and the first possibility of the key2 field. If that
works, then it proceeds to look up (1,3), (1,4), and (1,5). But if the (1,2)
lookup fails, then it backs off and tries just (1, _) to see if that matches
anything at all. If (1,_ ) finds any record, the search proceeds with (1,3),
(1,4),etc. But if (1,*) fails, the search stops immediately.

The insight here is that a multi-column key value can be resolved using a
single binary search. It is not a sequence thing where we first look for the
key1=1 and then do a separate lookup in a subtree for key2. Both key1 and key2
are resolved in the same binary search.

~~~
irishsultan
This is what I expected the optimization to be, except I'm still not sure that
I understand the wording of "that match the columns to the right", I'd expect
that to be "that match the columns to the left", after all, you're checking
the existence of (1,* ), not of (* ,2) or (* ,3).

------
yread
A bit offtopic but has anyone tried to replicate sqlite databases? Using
rqlite [https://github.com/rqlite/rqlite](https://github.com/rqlite/rqlite) or
something else?

~~~
newusertoday
I am interested in this topic as well, although sqlite explicitly says that it
is not meant for client-server configuration. I still want to see if it is
feasible and someone is using it in production.

~~~
move-on-by
I got thrown into a legacy web project that used sqlite as the database. It
was a small internal-only app, I guess the original developer(s) figured it
was so small that sqlite would be plenty and it would reduce the environment
complexity.

Unsurprisingly they were wrong. It was small, but sqlite couldn't handle
multiple users. I believe this was before sqlite had WAL support, so reading
would lock the DB. The 'solution' was to split the sqlite database into many
smaller DBs that would allow users to use the site at the same time as long as
they were in different areas. This greatly added to the complexity of the
application. Some reports would need to access multiple databases to get what
it needed, so it would still lock out people. Complexity was much higher then
having all the data in a single postgresql/mysql database. All the users hated
the system and often ran into DB lock issues.

> sqlite explicitly says that it is not meant for client-server configuration

They are right and their advice should be heeded.

~~~
zip1234
Since 2010, SQLite has had Write-Ahead-Logging. Perhaps your project was using
an older version of SQLite?
[https://www.sqlite.org/wal.html](https://www.sqlite.org/wal.html)

------
virtualwhys
A bit off topic, but would be great to use SQLite in the browser instead of
IndexedDB.

I love relational databases, but you're almost forced into a NoSQL approach
when developing a SPA since the client (browser) only supports simple key ->
value storage. It would be a dream to use LINQ-to-SQL, or similar type safe
query DSLs like Slick or Quill (Scala), or Esqueleto (Haskell) in the browser.

Combine that with a single language driving the backend and frontend and
voila, no duplication of model, validation, etc. layers on server and client.

One can dream I guess, but the reality is NoSQL fits the modern web app like a
glove, for better or worse.

~~~
simonw
This almost happened. Ten years ago "WebSQL" was being considered as a
standard - it would have introduced SQLite-backed databases as a built-in
browser feature:
[https://en.wikipedia.org/wiki/Web_SQL_Database](https://en.wikipedia.org/wiki/Web_SQL_Database)

The W3C dropped it in November 2010 because they were uncomfortable supporting
a web standard which was entirely tied to a single implementation of SQL
(SQLite). I adore SQLite but I think the W3C made the right call there - web
standards need more than one compatible implementation.

~~~
mercer
> I adore SQLite but I think the W3C made the right call there - web standards
> need more than one compatible implementation.

could you elaborate on that? I get the impression that now, eight years later,
we're much worse off. I don't really see the upside of _not_ going for what
is, AFAIK, a relatively vanilla implementation of SQL.

~~~
simonw
A web standard needs to have more than one implementation. If you allow one
single implementation to define the standard, you get something like Flash.

I don't have a particularly strong practical argument here: for me this is
more of a fundamental principle for how the web should work.

~~~
tzs
Flash was a closed source plugin that was added to the browser as a binary
blob. It could be updated by the user, and because it frequently had security
bugs users were frequently urged to update. Those updates also often
introduced new features, which content providers started taking advantage of
once a significant number of people updated, putting pressure on the rest to
update too.

SQLite is open source that would have been compiled into the browser by the
browser vendor. A typical user would not have the means to update it
separately from the rest of the browser.

So why couldn't W3C pick a particular release of SQLite, list a specific
subset of its features, and declare that this, as described in the
documentation for that specific version, is the standard?

------
bertil
> Named window-defn

<3

I really prefer that syntax and I’m always a little sad when I have to copy-
paste windows across average, total, count, standard deviation, max, min… I
fully admit that it’s syntactic sugar but it’s the elegant kind.

------
silvestrov
I’d really really like if they improved “alter table” to include
dropping/renaming columns/constraints, even if it required rewriting the whole
table.

~~~
plq
There is nothing stopping you from doing it yourself. The Sqlite FAQ[1] even
has an entry on it.

Two things holding this back: 1. code being inside our outside sqlite would
not be much different. 2. the amount of additional test code would be
humongous for an operation of that level of complexity.

So, just do it yourself. It's not that hard.

[1]:
[https://www.sqlite.org/faq.html#q11](https://www.sqlite.org/faq.html#q11)

~~~
cryptonector
A lot of table alterations can be done without rebuilding the table, though
perhaps a table scan might be needed to validate constraints (though this
should optionally be a separate statement). To do this safely requires the
RDBMS to support the functionality.

------
fpgaminer
Unrelated to window functions, but I finally took the time to start digging
into SQLite's internals. People always sing its praises, so it was time to see
what all the fuss was about.

Someone else already mentioned that the vast majority of SQLite's codebase are
tests. Well, on top of that, of the real "working" codebase I'd say the
majority of it is comments. It's incredible. The source is more book than
code. If you have any when, why, or how question about SQLite, I guarantee
it's answered in the code comments (or at least one of the hundreds of superb
documents on their website).

Another surprise I discovered: SQLite has a virtual machine and its own
bytecode. All queries you execute against a SQLite database are compiled into
SQLite's own little bytecode and then executed on a VM designed for working
with SQLite's database. Go ahead, start `sqlite3 yourdb.sqlite` and then run
`explain select * from yourtable;`. It'll dump the bytecode for that
statement; or any statement you put after `explain`. So cool!

In hindsight, it makes a lot of sense, and a well built VM can be nearly as
efficient as any other alternative.

[https://sqlite.org/arch.html](https://sqlite.org/arch.html)
[https://sqlite.org/opcode.html](https://sqlite.org/opcode.html)

Fun bit of history. The VM used to be stack based, but now it's register
based. I guess they learned the same lessons the rest of the industry learned
over that time period :P

(N.B. the bytecode is for internal use only; it's not a public facing API. You
should never, ever use bytecode directly yourself.)

There are some painful parts of the codebase though. These aren't "cons" per
se. More like necessarily evils.

1) It is filled to the brim with backwards compatibility hacks that make the
code more complex than it strictly needs to be. (Most of these are the result
of various users of the library misusing the API. The SQLite devs are generous
enough to grandfather in the bugs that made those applications work. That's
excellent, but it definitely makes the code more "crusty".)

2) One of SQLite's big features is its flexible memory subsystem. It handles
OOM, and provides an API for completely customizing the memory subsystem. But
given that this is C and memory allocation and interaction is pervasive, the
code ends up littered with function calls and clauses. Handling OOM is no
small task, and often how to handle the OOM is different in different places.
So you can imagine the complexity that adds to the codebase.

Again, those are necessary evils, so its not something I'm "complaining"
about. But I thought they were worth mentioning for fellow adventures like me
who decide to dive in (which I highly recommend).

So, thanks to how well designed SQLite is overall, and their great
documentation, I was able to write a parser in Rust for the SQLite file format
in a handful of hours
([https://sqlite.org/fileformat2.html](https://sqlite.org/fileformat2.html)).
The file format is surprisingly simple. I'm now writing a Cursor to walk the
tables, which is a fun exercise of classic B-Tree algorithms.

~~~
pstuart
> It is filled to the brim with backwards compatibility hacks that make the
> code more complex than it strictly needs to be

They finally ended sqlite4 as an architectural misadventure, but perhaps they
should try again with the current codebase/features but all of the backwards
cruft jettisoned.

------
WorkLifeBalance
According to [1] windowing functions make SQL turing complete.

Does this make SQLite turing complete or has it been turing complete before?

[1]
[http://beza1e1.tuxen.de/articles/accidentally_turing_complet...](http://beza1e1.tuxen.de/articles/accidentally_turing_complete.html)

------
est
This is super cool.

Does anyone know how to upgrade python3's sqlite module to the latest version?

~~~
coleifer
You can use
[https://github.com/coleifer/pysqlite3](https://github.com/coleifer/pysqlite3)

It even supports user-defined window functions using the new sqlite apis.

~~~
simonw
Charles provided really great documentation on how to build this here:
[http://charlesleifer.com/blog/compiling-sqlite-for-use-
with-...](http://charlesleifer.com/blog/compiling-sqlite-for-use-with-python-
applications/)

Or if you're feeling lazy (like I was), there's a fork of his library at
[https://github.com/karlb/pysqlite3](https://github.com/karlb/pysqlite3) which
compiles the 3.25.0 amalgamation by default. This worked for me:

    
    
        $ python3 -mvirtualenv venv
        $ source venv/bin/activate
        $ pip install git+git://github.com/karlb/pysqlite3
        Collecting git+git://github.com/karlb/pysqlite3
        ...
        Installing collected packages: pysqlite3
        Successfully installed pysqlite3-0.2.0
        $ python
        Python 3.6.5 (default, Mar 30 2018, 06:41:53) 
        [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import pysqlite3
        >>> pysqlite3.connect(":memory:").execute("select sqlite_version()").fetchall()
        [('3.25.0',)]

------
anarchimedes
This is awesome! Does anyone know if this update will effect the sqldf package
in R?

------
Dinux
Not sure if SQLite is still a 'lite' database

~~~
MarkusWinand
I've just built the snapshot. The binary is still less than 2MB.

And it works:

    
    
      sqlite> with t(x) as (values (1), (2)) select sum(x) over (order by x) from t;
      1
      3

~~~
davidgould
I just checked a stripped copy of postgresql-10 built with all options on and
it is 7MB. That could be reduced a little by leaving out language support and
ssl support etc.

~~~
plq
While binary size could certainly be a factor in a good deal of embedded
environments, we also need to look at the resource requirements of the binary
in question as well.

Sqlite doesn't need too much more memory than what its binary needs whereas
with postgresql, you need all sorts of bells and whistles just to get the
database system to boot.

~~~
davidgould
Well, sure, sqllite is smaller than postgres, but database system performance
is largely dictated by the size of the buffer cache. For something like a
configuration database (a great use of sqllite) this does not matter. But for
more than a very modest amount of data more memory for buffers will benefit
both postgresql and sqllite.

Anyway, not trying to make the case that postgresql is small compared to
sqllite, it obviously isn't, just wanted to point out that it's not _that_ big
either.

------
usermac
I don't even know what that means but again, because it is SQLite, I up vote
automatically ^_^

~~~
dougmwne
There's a whole categories of analysis queries that are really easy to write
with window functions and very annoying to write without. They help eliminate
nested subqueries and self joins. This should make SQLlite a better choice for
data analysis and reporting.

------
usgroup
SQLite guys, please add FDW support ala Postgres and easy foreign function
support for Python and R, and you’ll corner most of analytics and data
science.

~~~
blattimwind
SQLite has had FDW since forever.
[https://sqlite.org/vtab.html](https://sqlite.org/vtab.html)

SQLite has had user-defined functions since forever.
[https://sqlite.org/c3ref/create_function.html](https://sqlite.org/c3ref/create_function.html)

Another underappreciated feature (while we're at it) would be WAL instead of
undo-journaling
([https://www.sqlite.org/wal.html](https://www.sqlite.org/wal.html)), which
enables concurrent reading and writing of SQLite databases. Has been available
for some ten years or so, but is off by default.

Functions are available through the Python bindings (which are not maintained
by the SQLite project), virtual tables I think, aren't. Alternate bindings
([https://rogerbinns.github.io/apsw/](https://rogerbinns.github.io/apsw/))
claim to achieve "Everything you can do from the SQLite C API" interop, which
would include virtual tables.

~~~
electroly
Having written a fairly significant amount of SQLite virtual table module
code, I think it isn't quite sophisticated enough to be useful. I regret
building my application atop that virtual table functionality. Many parts of
queries that are vital for efficient execution are not pushed down to the
virtual table provider; for instance there is no way for "SELECT COUNT(*) FROM
table" to do anything but retrieve every field of every row and return them to
the SQLite engine, because the interface does not push the "COUNT" down to the
virtual table provider. Even if your underlying data store can trivially
answer the question directly, you have to pull all the data and let SQLite
compute the answer. So on and so forth.

~~~
bobwaycott
Out of curiosity, why would you ever run a SELECT COUNT(*)? What use case
demands that versus, say, SELECT COUNT(col) where col is the name of a column
in the table?

~~~
yread
eh? if you want to know the number of rows in a table select count(*) is the
idiomatic way

~~~
bobwaycott
No. That’s the way one may see the first time they’re learning what a SELECT
statement is, but it’s most certainly not what you’d ever do in anything near
production code.

The parent made this statement:

> ... for instance there is no way for `SELECT COUNT(star) FROM table` to do
> anything but retrieve every field of every row and return them to the SQLite
> engine.”

The only response here is, “Of course! That’s what you told it to do! Why
would it do anything else—and why would you ever tell it to do that?!”

Edit: To be clear and not sound as though I’m just being dismissive, you can
of course use `COUNT(star)`, but the parent made it sound like this was
something non-trivial. There are better ways to write such a query when you
care about the exact count of rows and, assuming you can’t depend on grabbing
the max primary key due to deletions, you’re complaining about the database
wasting cycles pulling back irrelevant info. Get to know your database, as it
often has features to query such metadata in a much smarter and more
performant way.

PS: I replaced the asterisk with the word “star” because it’s making the
formatting of this comment atrocious. Apologies for all the edits. I give up
now.

~~~
electroly
I'm sorry, you've misunderstood. I'm fully aware that you can query it more
efficiently, and in fact that's the point. If you're writing a virtual table
provider, SQLite does not provide enough information about the actual user's
query for you to know that you can do it more efficiently. There's no way to
know that the user wants a count _at all_ ; the count function is not pushed
down into the virtual table provider interface. SQLite only tells you that it
needs you to retrieve every field in every row. You can't know that, in
actuality, SQLite only wants to count the rows.

~~~
bobwaycott
Ah. My apologies. I did misunderstand what you were calling attention to.

