
SQLITE: JSON1 Extension - ashish01
https://www.sqlite.org/json1.html
======
jzelinskie
It should be made clear that this functionality is not equivalent to the
Postgres JSON support where you can create indexes based on the structure of
the JSON. This is an extension to the query language that adds support for
encoding/decoding within the context of a query:

    
    
        SELECT DISTINCT user.name
        FROM user, json_each(user.phone)
        WHERE json_each.value LIKE '704-%';
    

It's pretty neat considering they rolled it all on their own:
[https://www.sqlite.org/src/artifact/552a7d730863419e](https://www.sqlite.org/src/artifact/552a7d730863419e)

PS: If you haven't looked at the SQLite source before, check out their tests.

~~~
brotherjerky
Anything specific about their tests? I found this doc:
[http://www.sqlite.org/testing.html](http://www.sqlite.org/testing.html)

~~~
jncraton
SQLite is frequently cited for its incredible robustness, and this is
certainly related to its extraordinary test coverage.

Many of the stats on that page are impressive, but the one that always gets me
is that for 122 thousand lines of production code, the project has 90
_million_ lines of tests.

~~~
sAbakumoff
Though I am a huge fan of SQLite, I am not sure that the incredible test
coverage necessarily leads to success:

"Trying to improve software quality by increasing the amount of testing is
like trying to lose weight by weighing yourself more often. What you eat
before you step onto the scale determines how much you will weigh, and the
software-development techniques you use determine how many errors testing will
find. If you want to lose weight, don't buy a new scale; change your diet. If
you want to improve your software, don't just test more; develop
better."(McConnell, Steve (2009-11-30). Code Complete (Kindle Location 16276).
Microsoft Press. Kindle Edition)

Another unique aspect of the SQLite code base is its strict adherence to
KISS ([http://www.jarchitect.com/Blog/?p=2392](http://www.jarchitect.com/Blog/?p=2392)).

~~~
austinpray
That is not an apt analogy. Unit tests improve code quality: code that is
easily testable has high cohesion and low coupling. Tests can also serve as
valuable documentation.

~~~
Zach_the_Lizard
I have to work in a few codebases at work that require 100% line and branch
coverage.

There are many cases where someone will write tests that hit an endpoint
directly and then assert on the whole response, which in this case is quite
huge. They'll then do so for all branches.

Their library / service / etc. code is technically exercised, sure, but
doTheFooThing() isn't directly tested, so it could have a bug that is only
exposed from another caller with different parameters that would be caught
with direct testing. Extreme coupling happens all the time.

Now I'm slowly converting it to sanity, and my teammates are copying me.

To be fair, it was one of those "Get this out now because we're dying" kinds
of codebases, not due to a lack of skill.

But once code is written, it's hard to undo. Then the bad pattern becomes
"keeping the same style."

~~~
dagss
I am in favor of testing the way you complain about. The advantage is
primarily that you can refactor code without getting bogged down in having to
change tests. An API is a) more stable, so you are maintaining compatibility
for that surface anyway, and b) exercised under exactly the preconditions
that really matter.

If doTheFooThing() is called from somewhere else, then that somewhere else
should also have tests. So I find that more an argument from "purity" than a
practical consideration about bug probability.

Also, if you _only_ test doTheFooThing but not the API then you could
accidentally refactor yourself into breaking the API in a backwards-
incompatible way (or not be bug-compatible, which is sometimes required, or at
least you should detect it and check logs and warn consumers). So the API
tests are needed anyway.

There's a balance of course: if doTheFooThing() is an important internal
crossroads, or algorithmically non-trivial, or important for other reasons,
then it should be tested in isolation. But between only semi-integration
tests (hitting endpoints and checking responses) and only lots of small unit
tests that break or need rewriting for the simplest refactorings but don't
catch subtle API breakage, I'd want to work with the former any day. The
units of code are often trivial and rarely where mistakes are made; the
mistakes come when stringing units together, and then it is more difficult to
trust the human capacity to imagine failure scenarios than to just run the
real handlers.

~~~
lsaferite
I've gone through various stages on my feelings about full coverage unit
testing vs functional testing.

Having full unit coverage with full path coverage is a great ideal. The
reality is that for most companies the overhead of writing and maintaining
these tests is prohibitive from a business POV. Getting buy-in to spend more
than 50% of your time maintaining (writing, updating, verifying) tests is a
very hard sell. For companies with under-staffed/over-worked development
teams, it just doesn't happen.

At this point in my career I'm firmly in the camp that functional testing is
really what matters in most cases. It is a compromise between the realities of
business and the needs of engineering. I can test the end product of the
development work as a whole and identify that everything works as expected and
has no known strange side-effects. This also serves as a contract with your
business users as to functionality of the product. If bugs are discovered you
can add specific use-case tests to attempt to trigger the bug and prevent
regression. All of this does not preclude limited unit testing as well for
critical code paths. I find this to be a much more pragmatic approach.

------
Lxr
Wait, what? Isn't storing JSON data as text in a relational DB against all
kinds of normalisation rules? Under what circumstances should one do this?

~~~
singingfish
When you have arbitrary semi-structured data of limited scope to store. For
example, the stuff I was working on today is pluggable payment
infrastructure. The vendor response is stored as JSON (it comes down the wire
as XML or JSON; JSON is easier, and I refuse to anticipate the structure of
the data for future providers, but I want the whole response kept for
debugging purposes). The extra data it requires for transaction resolution is
also JSON. Again, I have no idea what this will look like for future payment
providers, and this data will not have consequences for other bits of the
database.

~~~
mrcarrot
Yep, I've done something very similar for payment processing (from multiple
providers) recently. Have a well-defined schema containing the columns that I
_know_ I need now for handling a transaction, but also include a jsonb column
that stores all of the data the payment provider supplies for the
transaction.

For one, this makes debugging easier, and it also means that should business
needs change in the future and some field that we've been receiving becomes
important for payment processing, it can be extracted from the json field and
promoted to a column in the table, without having to _now_ define a load of
columns for every possible field that every payment provider can ever supply.
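
In SQLite terms the same pattern might look roughly like this (we use
Postgres jsonb, but the idea maps directly; the table, columns, and JSON path
below are made up for illustration):

    
    
        CREATE TABLE payment (
            id            INTEGER PRIMARY KEY,
            amount_cents  INTEGER NOT NULL,  -- fields we know we need today
            provider      TEXT NOT NULL,
            raw_response  TEXT               -- full provider payload as JSON
        );
        
        -- Later, if a buried field becomes important, promote it to a column:
        ALTER TABLE payment ADD COLUMN card_country TEXT;
        UPDATE payment
           SET card_country = json_extract(raw_response, '$.card.country');
    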

------
nbevans
SQLite is a great database for microservices and other minimalist
architectures. You'll never get a "TCP socket failed/timeout/reset" from
SQLite - that's for sure.

------
vhost-
I always seem to be the black sheep in a group of people when I say that I
love SQLite. It's seriously so handy. If I need to aggregate or parse big CSV
sheets, I just toss them into a SQLite database and run actual SQL queries
against the data. It makes the process much faster, and I can give someone
their data in relational form on a thumb drive or in a tar.gz.
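
For anyone who hasn't tried it, the sqlite3 shell makes the CSV part nearly a
two-liner (the file and column names here are hypothetical):

    
    
        $ sqlite3 data.db
        sqlite> .mode csv
        sqlite> .import sales.csv sales
        sqlite> SELECT region, SUM(amount) FROM sales GROUP BY region;
    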

~~~
m_fayer
I thought everyone loved SQLite! It's tiny, no-fuss but full-featured,
performs well, and works great as a data-interchange format. I use it in all
my simple ad-hoc personal apps, and where would mobile be without it?

~~~
vhost-
Not to mention browsers. Many people are surprised to learn iMessage, Chrome
and Firefox all use SQLite.

~~~
rodgerd
More people would probably be surprised to discover Adobe Lightroom catalogues
are sqlite databases (which means you can pull all sorts of info out of your
catalogue, should you so desire, or fiddle with it in unsupported ways).

It's a shame I don't see Adobe on the list of sponsors/contributors, I have to
say. Lightroom makes a huge amount of money for them.

------
thinknlive
SQLite 'all the things!' Seriously, one of the best tools ever. So many data
and related performance challenges in almost any app can be solved
efficiently with this (for what it does) tiny little library.

------
mablap
Oooh! I just arrived at work and things are calm; this will be interesting
reading! We use SQLite for almost every tool we develop.

------
adsharma
It's not clear whether you can compose these functions. Flatbuffers over
RocksDB should be considered as an alternative. You can then use iterlib to
do similar things.

Plug:
[https://github.com/facebookincubator/iterlib](https://github.com/facebookincubator/iterlib)

~~~
vdm
Thank you for plugging iterlib here, I hadn't heard of it.

Have you heard of storehaus?
[https://github.com/twitter/storehaus](https://github.com/twitter/storehaus)

In terms of the fundamental abstraction offered, it seems comparable to
iterlib to me, but I'd love to hear your opinion.

(See also: [https://upscaledb.com/](https://upscaledb.com/))

------
ilitirit
I use this for a custom JSON query tool and browser I wrote for our company
(the C# client can load extensions). It's been available for a while. Is this
post to spread awareness, or have they added something new to the extension?

------
giancarlostoro
This is very interesting. I wonder what drove this extension and how things
will take shape for SQLite. I always loved that in Python I can 'just use'
SQLite quite easily.

------
nattaylor
>Experiments have been unable to find a binary encoding that is significantly
smaller or faster than a plain text encoding. (The present implementation
parses JSON text at over 300 MB/s.)

I understand that JSONB in Postgres is useful primarily for sorts and indexes.
Does SQLite work around this somehow, or is that just not included in their
experiments?

~~~
haldean
Looks like you can't index based on JSON in SQLite, so they might be
optimizing for different metrics?

~~~
lsaferite
My understanding is that yes, you can create an index using this JSON data.
You just have to create the index on an expression.
[http://sqlite.org/expridx.html](http://sqlite.org/expridx.html)
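
Something like this should work on SQLite 3.9.0+, which introduced both JSON1
and indexes on expressions (the schema mirrors the example up-thread; the
JSON path is illustrative):

    
    
        CREATE TABLE user (name TEXT, phone TEXT);  -- phone holds a JSON object
        
        CREATE INDEX user_phone_idx
            ON user (json_extract(phone, '$.phone'));
        
        -- A query that repeats the same expression can use the index:
        SELECT name
          FROM user
         WHERE json_extract(phone, '$.phone') = '704-5678';
    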

------
burrows
Here are instructions for building the extension as a shared library on OS X
[https://burrows.svbtle.com/build-sqlite-json1-extension-
as-s...](https://burrows.svbtle.com/build-sqlite-json1-extension-as-shared-
library-on-os-x).

I wasn't able to get a working build on Windows.

~~~
voltagex_
What happened on Windows?

~~~
burrows
I was able to build the DLL and load it into Python, but using any of the
json1 functions caused Python to crash.

------
tenken
Is there an Ubuntu PPA or Docker image of SQLite with the JSON1 extension
enabled?

I can't get the compiled json1.so to load on Ubuntu 14.04 LTS with the stock
SQLite.
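
For reference, this is roughly how I'm checking the version and trying to
load it (paths hypothetical):

    
    
        $ sqlite3
        sqlite> SELECT sqlite_version();  -- stock 14.04 reports 3.8.x
        sqlite> .load ./json1
        sqlite> SELECT json_valid('{"a":1}');
    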

~~~
onli
I think the version on 14.04 is too old. I found a statement that you need
3.9 for that (is that documented properly somewhere?).

For what it's worth, the sqlite3 version in Ubuntu 16.04 has the json1
extension loaded by default:

    
    
        sqlite> CREATE TABLE user (name TEXT, phone TEXT);
        sqlite> INSERT INTO user VALUES('tester', '{"phone":"704-56"}');
        sqlite> SELECT DISTINCT user.name FROM user, json_each(user.phone) WHERE json_each.value LIKE '704-%';
        tester
    

I'm now even happier that I upgraded my server to that version yesterday; the
server for that project was, like yours, on 14.04. That was also because of
SQLite: I wanted a version that supports the WITH clause. Upgrading to 16.04
(actually, I made a fresh install and moved the data over) seemed like the
easiest way to get that.

------
GrumpyNl
How does this perform on larger tables?

------
zxv
Is there any plan to support jsonb (or similar) which would speed processing
by eliminating the need to reparse?

------
mayli
The embedded version of mongodb?

~~~
Buttons840
> The json_object() function currently allows duplicate labels without
> complaint, though this might change in a future enhancement.

Just like Mongo. I was surprised to learn that Mongo can actually store an
object with two identical keys. Most drivers will prevent you from doing so,
and will fail to display the record fully if it does happen, but it is
possible.
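
For the curious, the SQLite side is easy to demo (output per my reading of
the current behavior, which the docs say may change):

    
    
        sqlite> SELECT json_object('a', 1, 'a', 2);
        {"a":1,"a":2}
    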

