Hacker News new | past | comments | ask | show | jobs | submit login
SQLite Plus: Missing SQLite functions (github.com/nalgeon)
214 points by tosh on April 3, 2021 | hide | past | favorite | 52 comments



SQLite doesn't compile it in by default, but there is a virtual table CSV implementation in the ./ext folder: https://www.sqlite.org/src/artifact?ci=trunk&filename=ext/mi...

Similar for regex: https://sqlite.org/src/file?name=ext/misc/regexp.c&ci=trunk

And percentiles: https://sqlite.org/src/file?name=ext/misc/percentile.c&ci=tr...


The VSV extension has additional arguments that may be useful in some situations:

https://github.com/nalgeon/sqlite-plus/blob/main/docs/vsv.md


How are these enabled if they are already compiled in? Or are they automatically enabled if they are compiled in?


The official binary build process flags them off, likely.


I'm still just absurdly happy that it now has a bunch of math functions in the latest release: https://sqlite.org/lang_mathfunc.html


Out of curiosity: what is your use case and why do you prefer to use them in the DB instead of the application itself?


If you can put the logic into the SQL and filter some data out, that's likely to be a performance win.

Whether the team benefits from tarting up the SQL beyond SELECT * is worth examining, however.


Generally I like to be able to ask questions of my database directly in a sql editor window. There's all sorts of cases where that is useful.


So you can filter and join on computed values. What if there is no application?


I never thought about this - how would you present the data to the user then if there's no application?


The "DB" guy at my workplace is regularly asked about certain stats from the production db, he runs some query, dumps it in a csv-formatted file and mails that to whoever asked for it.


Indexes over computed fields can be quite handy.


Without a comprehensive test suite, this set of extension libraries is rather a black box.

Sure, one can look over the actual code to make sense if the functions do what they should, but one of main SQLite's features is its in-depth testing.

It'd be nice to have the same testing approach extended to this set of utilities.


Related: https://duckdb.org/

It's OLAP, not OLTP like SQLite.


Does OLAP vs OLTP describe a real technological difference or just a description of the use cases a specific tool is trying to address?


Generally OLAP databases are columnar or otherwise optimized for batch and aggregations.


Yes it does describe a technological difference. Specifically, the kind of tasks that they are good for.

My very crude shorthand is that olap == optimized for time series (usually column oriented) and oltp == optimized for transactional processing (row oriented).


The "description of the uses cases" answer can also be a reasonable shorthand for databases which support both, which allows the DBA to optimize on a table-by-table (and even column-by-column) basis.


If a table is constructed WITHOUT ROWID, then its spatial locality will reflect the consecutive order implied by the primary key. Range-delimited queries are extremely fast.

In effect, SQLite already is both an OLTP and OLAP database.


As you pointed out, SQLite does implement clustered index, but it is not columnar.


[flagged]


...


A libicu free Unicode extension. First time I hear about that and just yesterday I was slightly bothered by the prospect of compiling sqlite with icu again. Thank you to whoever wrote the extension, I'll try it out soon.


There's also https://cgsql.dev/

This allows for stored procedures in a T-SQL variant that gets compiled to C. It is on my list to check out but I don't have any experience with it yet.


Some time ago I joined a project that was recording data logs in Protobuf format. I demonstrated that we could log the serialized Protobuf structures to a SQLite database and then use a plugin to query across the message fields: https://github.com/rgov/sqlite_protobuf

Learning how to use the SQLite extension API was a little mind-expanding but it worked like a charm.


Flatbuffers would be better as it doesn't need to deserialize at all (I think protobuf supports to partially deserialization nowadays?). I did something similar with https://github.com/liuliu/dflat too!


I attempted something similar with large rosbag files, but the performance gain wasn’t enough to make the initial processing worth it. That said, I’d like to try again with duckdb as that seems like it might be better suited to time series analysis.


One of the things I really like about SQLite is the fact that it is in the public domain.

But most third-party extensions, like this one, aren't. (This extension is MIT licensed.) If you think SQLite being in the public domain is a positive, then that's a positive these extensions lack.

(Or a public domain near-equivalent license like Unlicense or CC0, which contains an ultra-permissive copyright license for those jurisdictions which don't allow one to voluntarily put stuff in the public domain.)


MIT is already an extremely permissive license. Seems kinda mean to complain that something is merely 90% good because you are comparing to something 99% good.

What sort of problems do you envisage having with an MIT license that public domain would fix?


After reading SQLite's page on their public domain approach they seem to go through great lengths to make sure it stays public domain.

https://sqlite.org/copyright.html

Including having their closed contributors sign affidavits and stored in a physical location. This may be what OP is referring to.

Compare that with an MIT license with no "hardening" like affidavits so that it's easy to mutate to another license type, I can see their point.

Interesting stuff since I never knew this "public domain" approach before!


Their license text has always been my favorite (well, perhaps except for the original JSON license):

* The author disclaims copyright to this source code. In place of a legal notice, here is a blessing:

* May you do good and not evil.

* May you find forgiveness for yourself and forgive others.

* May you share freely, never taking more than you give.

edit: thanks for the correction about the jquery license.


do you have the link for the original JSON license? It looks remains only "The Software shall be used for Good, not Evil."


That's what I was talking about. Crawford said in a talk he had to make an amended licenses for corporations (namely, IBM), who had lawyers that needed to be able to get around the evil restriction.

https://www.youtube.com/watch?v=-hCimLnIsDA


What is the original JQuery license?


Likely referring to the [JSON license](https://www.json.org/license.html) and not the jQuery license. The JSON license has this problematic (for many businesses) phrase: The Software shall be used for Good, not Evil.


So, if you include N different MIT-licensed libraries in your app, you have to remember to ship N different third party copyright licenses in your binary (or documentation shipping with the binary). And you need to keep an eye out if they change – if you add or remove a third party library, or if a third party library adds the name of a new contributor to its copyright notice in a newer version.

It gets even worse when you consider that there are MIT-like licenses, which can have fundamentally the same terms as MIT, but with slightly different wording. Then you can have to ship multiple variant license texts too.

Public domain (and public domain-equivalent licenses) you don't have to worry about this, because there is no requirement to include copyright notices or copy of license text.

I'm not saying you shouldn't credit people whose contributions you use, but you can make that a non-legal expectation of courtesy rather than a make-work legal requirement.


Thanks for the post, this made me research SQLite's public domain approach and I never heard of it before! Something I'll keep in mind moving forward as I always thought MIT was the "end all be all"

https://sqlite.org/copyright.html


A simple mnemonic device to help remember: licenses are for losers.

#VivaLaPublicDomain


By itself it is wrong. Public domain is still a license. License let you do anything is a must as copyright is granted auto. You need to license. If the public domain does not license, there is no public domain (except those fall into it after certain number of years).

You may sense why the need of effort by SQLite to confirm its public domain status and left other not compiled in.

Another issue is the free vs open source debate. Does it have to be pure. Another time but SQLite seems allow both, just not in its core.


I fear this comment will come across as pedantic, but I think it's a distinction worth highlighting. I am not a lawyer, but I care about this stuff and have studied it.

My (somewhat educated) understanding is that public domain is a rejection of copyright, and is therefore the lack of a license. Rather that saying "I'm the owner, and you can do whatever you want", it instead says "I am not the owner", and indeed, even further "No one is the owner". Naturally, this can only be said by the person who would have held the copyright, so paperwork is still necessary to show this.

Stanford[0] has a page about this that's pretty interesting (among other things, it mentions public domain works can be copyrighted again if they are gathered into collections, which I find fascinating). The intro text on public domain says this:

> The term “public domain” refers to creative materials that are not protected by intellectual property laws such as copyright, trademark, or patent laws. The public owns these works, not an individual author or artist. Anyone can use a public domain work without obtaining permission, but no one can ever own it.

In the case of sqlite, I think it falls under "dedicated works"[1], which is one of the four ways they mention that leads to a work falling into the public domain.

[0]: https://fairuse.stanford.edu/overview/public-domain/welcome/

[1]: https://fairuse.stanford.edu/overview/public-domain/welcome/...


> Public domain is still a license

If you read the CC0 – https://creativecommons.org/publicdomain/zero/1.0/legalcode – it is structured into two parts, a waiver (in section 2) and a fallback license (in section 3). The license is certainly a license, but is the waiver a license? Especially since the license, by its terms, only applies if the waiver does not – if the waiver is judged valid in some legal jurisdiction, the license does not apply.

Some of the language in section 2 could even be read as settling a trust, with the general public as the beneficiary, the copyright as the trust property, and the copyright owner as both settlor and trustee.


You don't need a license to communicate. #LibertyNotLicenses


Take also a look at https://github.com/daitangio/sqlite_ora_functions

A similar project python-based


After CSV files as virtual tables, I'd like to see folders of JSON files as virtual tables. I know there's a whole virtual filesystem kit based around SQLite but what I want is simpler.


I don't think that a folder full of JSON files is actually simpler. There are SQLite drivers for nearly every language out there and querying SQLite is not only very easy and fast but it's also all contained in a single file you can copy around.

What do you want to gain by using JSON files specifically? Unless you want a document database but then SQLite is the wrong kind of tool anyways.


I read it as "I already have or can easily get a folder of JSON files (each representing a table); I want to open them up in SQLite with nearly zero effort."

That should be hackable with something like json2csv[1], but I imagine the intent is a desire for something supported, rather than awfully hacked up.

1 - https://github.com/zemirco/json2csv


That makes way more sense. GGP is probably meaning exactly that, specifically vsv from the OP that I missed initially.

https://github.com/nalgeon/sqlite-plus/blob/main/docs/vsv.md


Not poster, but also often have large numbers of json-ld files. Json events are really common, and saving events to files is really common, and wanting to query them is also common.


maybe storing files in mongodb instead of a filesystem can be solution?


This is a really good idea. I'm using SQLCipher right now but might swiitch over as I don't really use the encryption feature.


For me, the one thing I need to plug SQLite in at work us right outer join.

Kinda disappointed this doesn’t fix that.


I would be interested to know more - I have never used a right join. Why can't you just reverse the order of the join?


When you are dynamically joining 10s and 10s of tables in an decades old codebase with a public API, that’s easier said than done.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: