Hacker News new | past | comments | ask | show | jobs | submit login
JSONB has landed (sqlite.org)
669 points by nalgeon 86 days ago | hide | past | favorite | 205 comments

Lots of confusion on what JSONB is.

To your application, using JSONB looks very similar to the JSON datatype. You still read and write JSON strings—Your application will never see the raw JSONB content. The same SQL functions are available, with a different prefix (jsonb_). Very little changes from the application's view.

The difference is that the JSON datatype is stored to disk as JSON, whereas the JSONB is stored in a special binary format. With the JSON datatype, the JSON must be parsed in full to perform any operation against the column. With the JSONB datatype, operations can be performed directly against the on-disk format, skipping the parsing step entirely.

If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick. If you're querying or manipulating the data using SQL, JSONB will be the best pick.

TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be. JSONB is a brilliant binary JSON format. What makes it brilliant is that arrays and objects (which when serialized are a sort of array) are encoded by having N-1 lengths of values then 1 offset to the Nth value, then N-1 lengths of values then... The idea is that lengths are going to be similar, while offsets never are, so using lengths makes JSONB more compressible, but offsets are needed to enable random access, so if you store mostly lengths and sometimes offsets you can get O(1)ish random access while still retaining compressibility.

Originally JSONB used only offsets, but that made it incompressible. That held up a PostgreSQL release for some time while they figured out what to do about it.

Ah, https://sqlite.org/draft/jsonb.html says:

> The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.

> TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be.

Definitely is not: Hipp states

> JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller)

Whereas in Postgres jsonb commonly takes 10~20% more space than json.

Link? My reading last time I looked at this was that the sqlite and postgres “jsonb” were different.

This is the first time that SQLite3 is getting JSONB. Idk if it's the same as PG's. TFA doesn't say. I assume they are the same or similar because I seriously doubt that D.R. Hipp is unaware of PG's JSONB, but then for that reason I am surprised that he didn't say in TFA.

https://sqlite.org/draft/jsonb.html references postgres:

> The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.

https://sqlite.org/draft/jsonb.html says they're NOT the same.

> If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

There is no json data type ! If you are just storing json blobs, then the BLOB or TEXT data types will be the best picks.

In context, what is clearly meant is, "If you're just reading and writing full JSON blobs, use `insert ... ($id,$json)`; if you're primarily querying the data, use `insert ... ($id, jsonb($json))`, which will convert the text into JSONB before storing it.

> There is no json data type

Why not? I feel like a database should allow for a JSON data type that stores only valid JSON or throws an exception. It would also be nice to be able to access subfields of the JSON.

SELECT userid, username, some_json_field["some_subfield"][0] FROM users where ...

Not sure where to give feature suggestions so I'm just leaving this here for future devs to find

SQLite has a surprisingly small number of types: null, integer, real, text, and blob. That’s it.


> It would also be nice to be able to access subfields of the JSON.

Not that it's not _useful_ sometimes, but it amuses me that this is a huge violation of 1NF and people are often ok with it. It really depends on whether you're treating the JSON object as an atomic unit on its own, regardless of contents, or using the JSON to store more fine-grained information.

I guess the same argument can be made for XML data types, and everyone's OK with it too.

The principles of relational algebra aren't always compatible with real applications.

They are successful beyond anything that I can imagine people expecting when creating them. But they are not that complete silver bullet that solves every problem humanity will ever need solved.

Database design is, unfortunately, a lost art. Up through the early 2010's I remember having design reviews for database schemas, etc. That isn't "agile"... so you just fix it in the next sprint, as you explain to someone what a unique constraint is and why a table is now full of duplicates.

1NF is a theory construct that makes no accommodation for real-world performance. I'm shuddering even thinking how many tables and joins I would need to store some of these 3rd party JSON things I need to import & refine.

The H2 database uses almost that exact syntax:

  create table foo(my_json json);
  insert into foo values ('{"a":{"b":{"c":"d"}}}' FORMAT JSON);
  select (my_json)."a"."b"."c" from foo; -- where the () around the field is mandatory

SQLite has a very few actual data types. There is a function that validates json and you can use it as a check on your `TEXT` or` `BLOB` column.

You can access subfields using `json_` and `jsonb_` functions.

Indeed. Those who use Postgres are already familiar with the difference.

Rule of thumb: if you are not sure, or do not have time for the nuances, just use JSONB.

There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.

This may be desired - it makes two effectively-equal objects have the same representation at rest!

But if you are storing human-written JSON - say, configs from an internal interface, where one might collocate a “__foo_comments” key above “foo” - their layout will be lost, and this may lead to someone visually seeing their changes scrambled on save.

> There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.

JSON never guarantees any ordering. From json.org:

> An object is an unordered set of name/value pairs.

If you depend on a given implementation doing so, you're depending on a quirk of that implementation, not on a JSON-defined behavior.

Using a function to query a fact about a string does not imply that the string should change, regardless of what guarantees are present within the format that the string is in. If you expect to get out the same string you put in, then there's nothing implementation-quirk about that.

> Using a function to query a fact about a string does not imply that the string should change, ... then there's nothing implementation-quirk about that.

Fair enough when considering only the string value. My ill-expressed point was more about JSON as a data exchange format - it may pass through any number of JSON parser/storage implementations, any of which may use an arbitrary order for the keys.

You're not storing a string, you're storing a JSON object. If you want to store a string use TEXT or VARCHAR.

That is what this post chain is about. The nuance between choosing to store your JSON as a string in a TEXT column or as JSONB in a BLOB column.

What about SQLite JSONB? Does it maintain order?

UPDATE: I think SQLite JSONB does maintain order. For example:

    select json(jsonb('{"a": 1, "b": 2, "c": 3, "d": 4}'));"
    -- {"a":1,"b":2,"c":3,"d":4}
And it does maintain order when adding new keys:

    select json(jsonb_insert(jsonb('{"a": 1, "b": 2, "c": 3, "d": 4}'), '$.e', 99));
    -- {"a":1,"b":2,"c":3,"d":4,"e":99}
(tested on https://codapi.org/sqlite/)

JSON processors are not required nor expected to retain object key ordering from either input or object construction order.

The question is whether people think of their database as a processor of data or a storage location for data, exactly as it was provided. Given that a lot of people use SQLite as the latter, it's a worthwhile caveat to make people aware of.

Additionally, Javascript environments (formalized in ES2015), your text editor, and your filesystem all guarantee that they'll preserve object key order when JSON is evaluated. It's not unreasonable that someone would expect this of their database!

If you're storing it exactly as provided then surely until you know it's actually valid json you want a string, not a json/jsonb column.

> The question is whether people think of their database as a processor of data or a storage location for data

If you use the database's JSON type(s) and/or functions, then yes, your database is a JSON processor.

And, yes, your database is not a dumb store, not if it's an RDBMS. The whole point of it is that it's not a dumb store.

> Javascript environments ... guarantee that they'll preserve object key order when JSON is evaluated

Yes, but only for non-integer keys.

Is there such a thing as a non-string JSON key to be evaluated? https://www.json.org/ suggests not.

Yes, in the context of JSON handling by JavaScript. I meant string keys that can be interpreted as non-negative numbers. Those are traversed first, the rest of the keys are coming in insertion order. For example:

    > JSON.stringify({'one': 1, '3': 3, 'two': 2, '2': 2, 'three': 3, '1': 1})

Thanks for explaining. That behaviour is... interesting!

That compounds really badly with another lack of specificity in JSON: parsers can treat duplicate keys in arbitrary ways, e.g. keep only the first or last value, combine values somehow, error out, etc.

When keys are not unique, their order can matter again, and can matter differently to different parsers.

Yes, but it's not impossible to achieve it in practice.

For example, JavaScript JSON.stringify and JSON.parse have well-defined behaviour:

> Properties are visited using the same algorithm as Object.keys(), which has a well-defined order and is stable across implementations


> The traversal order, as of modern ECMAScript specification, is well-defined and consistent across implementations. Within each component of the prototype chain, all non-negative integer keys (those that can be array indices) will be traversed first in ascending order by value, then other string keys in ascending chronological order of property creation.


Similarly, Python json module guarantees that "encoders and decoders preserve input and output order by default. Order is only lost if the underlying containers are unordered." Since 3.7 dict maintains insertion order.



Yes, JS and Python behaviour are not the same for all cases, however, non-integer keys do maintain order across Python and JavaScript.

> Yes, but it's not impossible to achieve it in practice.

For any one implementation. But there's a very large number of implementations. You just can't count on object key order being preserved, so don't.

There are situations where keeping order is useful. For example, human-editable json config files.

I also gave an example of two implementations that are compatible for a useful subset of keys. By the way, SQLite JSONB keeps object keys in insertion order, similar to Python: https://news.ycombinator.com/item?id=38547254

> There are situations where keeping order is useful. For example, human-editable json config files.

You might have to just normalize every time you want to edit that.

You can count on object key order being preserved when you control the implementation. Which is why it's useful to know whether a given implementation preserves key order.

Slight caveat. It seems that you should default to JSONB for all cases except for values directly returned in the query result. As IIUC you will get a JSONB blob back and will be responsible for parsing it on the client.

So use it for writes:

    UPDATE t SET col = jsonb_*(?)
Also use it for filters if applicable (although this seems like a niche use case, I can't actually think of an example).

But if returning values you probably want to use `json_*` functions

    SELECT json_extract(col1, '$.myProp') FROM t
Otherwise you will end up receiving the binary format.

Wait is it really true that if you use SQLite's JSONB format, `select my_json_column from foo` becomes unreadable to humans? That seems.. unacceptable. One would expect it to convert its internal format back to JSON for consumption.

Is there any downside to storing JSON-B even if you’re not planning to query it? For example, size on disk, read/write performance?

There’s processing to be done with JSONB on every read/write, which is wasted if you’re always reading/writing the full blob.

Which occurs with JSON as well (SQLite doesn't have a dedicated JSON nor JSONB type). The only actual cost would be the conversion between JSONB and JSON.

No, SQLite's "JSON" is just TEXT, there's no overhead with reading/writing a string.

That's what I said I think? "JSON" is a TEXT that is handled as a JSON string by `json_*` functions, while "JSONB" is a BLOB that is handled as an internal format by `jsonb_*` functions. You generally don't want JSONB in the application side though, so you do need a conversion for that.

Yes, but when you use the BLOB with jsonb functions, your application demands go from:

Read JSON from TEXT column.

Parse JSON into Internal Binary Format.

Run json_*() function on this format, which will Serialize Internal Binary Format to JSON as output.


Read JSONB from BLOB column.

Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output.


The json_* and jsonb_* all accept _either_ JSON or JSONB as their input. The difference is jsonb_* functions also produces it as output. So even in the above case, if your function output is just being used to feed back into another table as a BLOB, then you can use the jsonb_* version of the function and skip the serialization step entirely.

Oh, you are right! I later looked at the draft documentation and realized that `json(b)_` only determines the output type. That said, it is still true that you need one more function call `json(...)` to retrieve a textual form of JSON which you do often need instead of JSONB.

> Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output

I don’t think that’s always correct (it is within this thread, which stated (way up) “if you’re always reading/writing the full blob”, but I think this discussion is more general by now). Firstly, it need not convert the full jsonb blob (example: jsonb_extract(field,'$.a.b.c.d.e'))

Secondly, it need not convert to text at all, for example when calling a jsonb_* function that returns jsonb.

(Technically, whether any implementation is that smart is an implementation detail, but the article claims this is

“a rewrite of the SQLite JSON functions that, depending on usage patterns, could be several times faster than the original JSON functions”

so it can’t be completely dumb.

There’s no point using json_* functions if you’re always reading/writing the full blob.

Of course (and I never said that), but if you need to store a JSON value in the DB and use a JSONB-encoded BLOB as an optimization, you eventually read it back to a textual JSON. It's just like having a UNIX timestamp in your DB and converting back to a parsed date and time for application uses, except that applications may handle a UNIX timestamp directly and can't handle JSONB at all.

It doesnt validate that its valid json?

No, it doesn't. By default in sqlite3, you can store whatever you like in whatever field you want.

The "built-in validation mechanism" is invoking json_valid(X) [1] call within the CHECK [2] condition on a column.

[1] https://www.sqlite.org/json1.html#jvalid

[2] also assumes you didn't disable CHECKs with PRAGMA ignore_check_constraints https://www.sqlite.org/pragma.html#pragma_ignore_check_const...

If the order of items in the JSON blob matters then JSONB probably wouldn't preserve the order.

JSON is unordered. Nothing in your code should assume otherwise.

"An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array."

That’s exactly the kind of difference between json and jsonb that you gotta keep in mind. Object properties are unordered, but a json string is very much ordered. It’s the same sequence of characters and lines each time, unless you parse it and dump it again. So if you want to preserve an unmodified original json string for some (e.g. cosmetic) reasons, you probably want json.

I would expect you are threading on dangerous grounds to assume a type called JSON is going to preserve the data byte for byte. It might currently but I doubt that is in the API contract. You really want to use TEXT if that is your requirement

I don't know what SQLite does, but in JS the order is actually defined. JS is not Java, object is not HashMap, if anything, it's closer to LinkedHashMap, but even that is not correct because there are numeric slots which always go first. https://tc39.es/ecma262/#sec-ordinaryownpropertykeys

> but in JS the order is actually defined

Defined yes, but still arbitrary and may not be consistent between JSON values of the same schema, as per the document you linked to:

> in ascending chronological order of property creation

Also, while JSON came from JS sort-of, it is, for better or worse (better than XML!) a standard apart from JS with its own definitions and used in many other contexts. JSON as specified does not have a prescribed order for properties, so it is not safe to assume one. JS may generally impose a particular order, but other things may not when [re]creating a JSON string from their internal format (JSONB in this case), so by assuming a particular order will be preserved you would be relying on a behaviour that is undefined in the context of JSON.

Yes, I know, the point was not to say that it's safe to depend on this universally, but rather why it's safe in JS and why it's not elsewhere -> other languages use hash maps simply because authors were either lazy or unaware of the original behaviour. (which sucks, in my opinion, but nobody can fix it now)

> other languages use hash maps simply because authors were either lazy or unaware

Because it is idiomatic in that language, and you often don't need the higher overhead of tracking insertion order.

Before Python 3.7, json.loads used dict (undefined order) and you needed to explicitly need to override the load calls with the kwarg `object_pairs_hook=collections.OrderedDict` to accept ordered dictionaries.

Since Python 3.7 all `dict`s are effectively `collections.OrderedDict` because people now expect this kind of inefficient default behavior everywhere. ¯\_(ツ)_/¯

The same goes for maps in Go, which now explicitly randomizes map iteration with the range keyword to prevent developers from relying on a particular ordering. Neat trick.

That might get them to rely on the randomization, though :)

You make it sound like it's one of the laws of physics.

ON part of JSON doesn't know about objects. When serialized, object entries are just an array of key-value pairs with a weird syntax and a well-defined order. That's true for any serialization format actually.

It's the JS part of JSON that imposes non-duplicate keys with undefined order constraint.

You are the engineer, you can decide how you use your tools depending on your use case. Unless eg. you need interop with the rest of the world, it's your JSON, (mis)treat it to your heart's content.

> It's the JS part of JSON that imposes non-duplicate keys with undefined order constraint.

Actually since ES2015 the iteration order of object properties is fully defined: first integer keys, then string keys in insertion order, finally symbols in insertion order. (Of course, symbols cannot be represented in JSON.)

And duplicate properties in JSON are guaranteed to be treated the same way they are in object literals: a duplicate overwrites the previous value but doesn't change the order of the key.

Concretely that means if you write:

    Object.entries(JSON.parse('{"a":10, "1":20, "b":30, "a":40}'))
This is guaranteed to evaluate to:

    [['1', 20], ['a', 40], ['b', 30]]
(Note that '1' was moved to front, and 'a' comes before 'b' even though the associated value comes from the final entry in the JSON code.)

Python made a similar change in version 3.6 (officially since 3.7), both with regards to insertion order and later values overwriting earlier ones while preserving order. I think the only difference at this point is that Python doesn't move integer-like keys to the front, because unlike JavaScript, Python properly distinguishes between different key types.

All the keys in JSON are strings. I do not understand why string literal containing number would be moved first. Maybe in JS-object in runtimes.

> You make it sound like it's one of the laws of physics.

The text I quoted is from the RFC. json.org and ECMA-404 both agree. You are welcome to do whatever you want, but then it isn't JSON anymore.

That does not follow.

JSON formatted data remains JSON no matter if you use it incorrectly.

This happens all the time in the real world - applications unknowingly rely on undefined (but generally true) behavior. If you e.g. need to integrate with a legacy application where you're not completely sure how it handles JSON, then it's likely better to use plain text JSON.

You may take the risk as well, but then good luck explaining that those devs 10 years out of the company are responsible for the breakage happening after you've converted JSON to JSONB.

In some cases, insisting on ignoring the key order is too expensive luxury, since it basically forces you to parse the whole document first and only then process it. In case you have huge documents, you have to stream-read and this often implies relying on a particular key order (which isn't a problem since those same huge documents will likely be stream-written with a particular order too).

It is very much still JSON, and your code can very much assume keys are ordered if your JSON tools respect it. ECMA-404 agrees:

> The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

If you work in environments that respect JSON key order (like browser and I think also Python) then unordered behavior of JSONB would be the exception not the rule.

Assume you're writing an editor for JSON files. Don't think many users of that editor would be very happy if you change the order of the attributes in their json files, even though technically it's the same...

According to the JSON spec it's also legal to round all numbers to zero.

If you’re writing an editor, you will need to separate in progress text that user is editing from the parsed and validated data, and keep both.

I don't disagree, but people might still assume it. If you serialise a Map in Java, some Map implementations will maintain insertion order for example.

I think this is why JS and Python chose to make key ordering defined - the most popular implementations maintained insertion order anyway, so it was inevitable people would end up writing code relying on it

SQLite JSONB does maintain order: https://news.ycombinator.com/item?id=38547254

PG's JSONB is compressible yet also indexed (objects' key/value pairs are sorted on key and serialized array-like; arrays are indexed by integers; therefore binary searching large objects works). The history of PG's JSONB type is very interesting.

According to the linked announcement the data size is 5-10% smaller. So you are exchanging a small processing cost for smaller storage size. It will depend on your application but in many cases smaller disk reads and writes as well as more efficient cache utilization will make storing as JSONB a better choice even if you never manipulate it in SQL. Although the difference is likely small.

I haven’t delved into it, but you likely lose the ability to round-trip your json strings, just as in postgreSQL’s jsonb type.

  Jsonb_extract(jsonb(foo), '€')

- will remove/change white space in ‘foo’,

- does not guarantee to keep field order

- will fail if a dictionary in the ‘foo’ string has duplicate fields

- will fail if ‘foo’ doesn’t contain a valid json string

Not specifically JSONB, but I do recall that with MongoDB's equivalent, BSON, the sizes of the binary equivalent tend to be larger in practice, I would expect JSONB to have a similar trade off.

There'll also be a conversion cost if you ultimately want it back in JSON form.

From the post:

> JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller) so you might also see a modest reduction in your database size if you use a lot of JSON.

> Your application will never see the raw JSONB content.

That's not exactly right, as the jsonb_* functions return JSONB if you choose to use them.

And you can just read the BLOB column values out of the DB.

> If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

I don't know how the driver is written, but this is misleading if sqlite provides an api to read the jsonb data with a in-memory copy, an app can surely benefit from skipping the json string parsing.

If my application never sees a difference to normal JSON, everything is compatible and all there is are perf improvements, why is there a new set of functions to interact with it (jsonb_*)?

It seems that the JSON type is even able to contain JSONB. So why even use these functions, if the normal ones don't care?

As someone mentioned below, the order of keys is undefined in JSON spec, but applications may rely on it anyway, and thus conversion to JSONB may lead to breakage.

There are some other minor advantages of having exact representation of the original - e.g. hashing, signatures, equality comparison is much simpler on the JSON string (you need a strict key order, which is again undefined by the spec, but happens in the real world anyway).

The difference is the storage format, which is important for a SQL database since you define it in the schema. Also the performance characteristics are different.

`json_` functions return json, `jsonb_` functions return jsonb. Both take either as input.

If you're modifying something "in-place" then `jsonb_` functions would be better since they avoid conversion.

I don't know if it's true of SQLite, but you missed the most important point at least with PostgreSQL : you can build indexes directly against attributes inside JSONB which really turns it into a true NoSQL / relational hybrid that can let you have your cake and eat it too for some design problems that don't fit neatly into either pure relational or pure NoSQL approaches.

What would that SQL look like in practice ?

Here's an example i have running on a pi with a temperature sensor. All data is received as json messages in mqtt, then stored as is in a postgres table. The view turns it into relational data.

If i just need the messages between certain temperatures i can speed this up by adding an index on the 'temperature' field in json.

    create or replace view bme680_v(ts, message, temperature, humidity, pressure, gas, iaq) as
    SELECT mqtt_raw.created_at                                          AS ts
     , mqtt_raw.message
     , (mqtt_raw.message::jsonb ->> 'temperature'::text)::numeric    AS temperature
     , (mqtt_raw.message::jsonb ->> 'humidity'::text)::numeric       AS humidity
     , (mqtt_raw.message::jsonb ->> 'pressure'::text)::numeric       AS pressure
     , (mqtt_raw.message::jsonb ->> 'gas_resistance'::text)::numeric AS gas
     , (mqtt_raw.message::jsonb ->> 'IAQ'::text)::numeric            AS iaq
    FROM mqtt_raw
    WHERE mqtt_raw.topic::text = 'pi/bme680'::text
    ORDER BY mqtt_raw.created_at DESC;

You can do this with text JSON in SQLite as well, but JSONB could speed it up.

Because it seems to not be common knowledge on this thread: JSONB is a format offered by Postgres for a while now, and is recommended over plain JSON primarily for improved read performance.


Postgres's JSONB uses native numbers, which is faster but coerces values. It also removes duplicate keys but you shouldn't use those anyway.

Sqlite's JSONB keeps numbers as strings, which is slower but preserves your weird JSON (since there's no such thing as standard JSON). I'm not sure about duplicate keys.

> since there's no such thing as standard JSON

I-JSON is the most sensible JSON profile I know of: https://datatracker.ietf.org/doc/html/rfc7493. It says: UTF-8 only, prefer not to use numbers beyond IEEE 754-2008 binary64 precision, no duplicate keys, and a couple more things.

SQLite's approach is faster if you are not frequently extracting numerical values out of JSONB. It also makes much easier to convert JSONB back to JSON. I think SQLite JSONB reserved enough space to define a native number type (among others) if this assumption turned out to be false.

> Postgres's JSONB uses native numbers, which is faster but coerces values.

You would probably be string quoting your numbers in your data/model if this matters to you so sounds like the right call from PG implementation.

I don’t know about Sqlite’s implementation but in Postgres JSONB is not 100% transparent to the application. One caveat I’ve encountered while working on an application that stored large JSON objects in Postgres initially as JSONB is that it doesn’t preserve object key order, i.e. the order of keys in an object when you store it will not match the order of keys when you retrieve said object. While for most applications this is not an issue the one I was working on actually did rely on the order of keys (which I am aware is a bad practice but this is how the system was designed) and suddenly we noticed that the app started misbehaving. Changing the column type from JSONB to JSON fixed the problem.

Given that the order of the keys is specified as having no significance in regular JSON[1], this is out-of-spec usage.

If key order has to be preserved then a blob type would be a better fit, then you're guaranteed to get back what you wrote.

For example, SQLite says it stores JSON as regular text but MySQL converts it to an internal representation[2], so if you migrate you might be in trouble.

[1]: https://ecma-international.org/publications-and-standards/st...

[2]: https://dev.mysql.com/doc/refman/8.0/en/json.html

You can also add a property to all objects with an array of keys in the order you want (if you can guarantee it won't conflict with existing properties, or can escape those), or turn them into a {k, o} array where k is the key array and o the object or if you don't care about looking up keys then either a {k, v} array where v is a value array or an object where keys are prefixed with their position or putting the position in the value as an array of value and position.

Indeed! There is at least one other difference, which is that the presence of null characters will cause an error in `jsonb`, but not `json`:

# select '{ "id": "\u0000" }'::json;

-> { "id": "\u0000" }

# select '{ "id": "\u0000" }'::jsonb;

-> ERROR: unsupported Unicode escape sequence

The object key order thing is not a JSON spec compliance issue, but this one is. Either PG should store this escaped or use counted byte strings, but you can see why neither is desirable for a high-performance binary JSON format.

I suppose it is what it is, but if ordering matters, it is only JSON-like in appearance. json.org says:

>An object is an unordered set of name/value pairs.

Never ever count on object key ordering, or even on there being no duplicate keys in objects. If the JSON texts you're dealing with can be processed -or even just stored- by other code then you're in for pain if you make any assumptions about object key ordering / object key dups.

Once you accept this then it stops being a problem.

A warning to anyone who uses JSONB in Postgres. Past a certain size, the engine must go to the TOAST table off disk, even for indexed reads. If you store your JSONB in Postgres, don’t wait for it to grow out of hand or store a bunch of random stuff in it from an API call!

That JSONB is different from SQLite's JSONB, though.

Is it? TFA doesn't say. I suppose we could check the source code.

EDIT: https://sqlite.org/draft/jsonb.html does say that they are different.

I didnt understand the purposes of document stores until the past couple of years and they are fabulous for building POCs. Enhanced JSON support will help a lot for making sqlite a suitable document store.

I get full type support by serializing and deserializing protobuf messages from a db column and not making this column JSONB means i can filter this column too, instead of having to flatten the searchable data to other columns.

Yeah as long as you're reading and writing to the database with the same language, and that language has good type safety the benefits of your database schema effectively being defined by the same types as the rest of your code is pretty nice for a lot of use cases.

You just have to be vigilant about correctly migrating existing data to the current shape if you ever make breaking changes to types.

This. Would be nice if there was a framework (in go, or python pydantic) which would help me migrate data made with old structs to new structs. And also deal with the transaction.

For now i use sqlite to deal with transactions and only make backward compatible updates to structs. Brittle, but it is a toy app anyways.

(Normally use django to deal with models and migrations, but wanted to do something different)

Yeah migrations are the biggest issue for me. I really don't like not knowing what the actual shape of the document will be. Missing transactions, and not great relationship performance makes modelling some systems more hassle than it's worth.

I gave it a good go to use mongo and firestore for a few projects, but after a year or two of experimenting I'll be sticking to SQL based DBs unless there are super clear and obvious benefits to using a document based model.

There's a gradual approach there, where you start out with a JSONB column, and then as each piece of the data structure stabilizes* you move it out of json fields and into its own columns/tables.

* meaning, when there's enough code that depends on it that changing it would require some planning

This is the way I build all my apps now. Eventually the jsonb field stores nothing as it all gets moved to defined fields.

Alternatively, keep it in the JSON/JSONB column until you need to search/filter/query on it, in which case you pull it out into a column.

Even that may not be immediately necessary. I don't think SQLite has it yet, but Postgres can build partial indexes on JSONB fields.

Though most of the time, in that situation, I would pull it out.

Most commonly I see people use Alembic to migrate SQLAlchemy and Pydantic models.

But I tend to just use Django. Every time I try piecing together the parts (ex FastAPI, Pydantic, Alembic, etc) I reach a point where I realize I’m recreating a half baked Django, and kick myself for not starting with Django in the first place.

You can try JSONB in the pre-release snapshot [1] or live in the playground [2].

[1]: https://sqlite.org/download.html

[2]: https://codapi.org/sqlite

Anyone know what their release process is like, will this be in v3.45? (Downloads page has latest release as 3.44, and TFA says this is in pre-release snapshot.)

I don't use SQLite directly much, but I'd be keen to use this in Cloudflare's D1 & Fly.io. Having said that though, I'm not sure they publicise sqlite version (or even that it isn't customised) - double-checking now they currently only talk about importing SQLite3 dumps or compatible .sql files, not actually that it is SQLite.

So API changes like this actually break that promise don't they? Even though you wouldn't normally think of an addition (of `jsonb_*` functions) as being a major change (and I know it isn't semver anyway) they are for Cloudflare's promise of being able to import SQLite-compatible dumps/query files, which wouldn't previously have but henceforth might contain those functions.

I came here to ask the same thing. Btw Cloudflare D1 has a lot of limitations while it has been in a beta. In my tests, almost all D1 read/writes go to a single region. It is highly recommended to try to limit edge function usage of D1 to no more than a single read query or 1read+1write to reduce edge latency.

I'm familiar with MongoDB's BSON, but not JSONB. Here is an article I found that talks about the differences: https://blog.ferretdb.io/pjson-how-to-store-bson-in-jsonb/

AFAIK jsonb is not a specific format, it’s just a generic term for “a json equivalent binary representation”. The sqlite blurb says jsonb is generally smaller than json, but IIRC from when postgres added jsonb postgres’ is generally slightly larger.

PG's JSONB has a specification (well, inside PG, not a standard), and they won't change it except in backwards compatible ways (or, really, at all).

> PG's JSONB has a specification (well, inside PG, not a standard)

So does sqlite's. An implementation detail, even a necessarily stable one, does not a format make.

Next step is to go full Ouroboros and have embedded SQLite DBs as records.

> Next step is to go full Ouroboros

That was the previous step. Binary BLOBs have been supported in SQLite for some time.

Think of the children!

Despite internal format I see immediate external usage in applications. For example batch insertions in Python. Per row insert call has noticeable overhead. And JSONB could bring performance back with CTE:

    CREATE TABLE data(id, name, age);

    WITH ins AS (
        SELECT c1.value, c2.value, c3.value
        FROM json_each('["some", "uuid", "key"]') c1
        INNER JOIN json_each('["joe", "sam", "phil"]') c2 USING (id)
        INNER JOIN json_each('[10, 20, 30]') c3 USING (id)
    INSERT INTO data (id, name, age)
    SELECT * FROM ins
Each json_each could accept a bind parameter with JSONB BLOB from an app.

You don't need JSONB for this - doing this with plain JSON is simpler and already faster than individual inserts for most bindings in my experience.

I typically do bulk inserts using a single JSON argument like this:

    WITH ins AS (SELECT e.value ->> 'id', e.value ->> 'name', e.value ->> 'age' FROM json_each(?) e)
    INSERT INTO data (id, name, age)
    SELECT * FROM ins
The same approach can be used for bulk updates and deletes as well.

I have quite wide records (over 50 fields) and ->> performs not well with text keys. I did not try it with array indexing though:

    WITH ins AS (
        SELECT value ->> 0, value ->> 1, value ->> 2
        FROM json_each('[["some", "joe", 10], ["uuid", "sam", 20], ["key", "phil", 30]]')
    INSERT INTO data (id, name, value)
    SELECT * FROM ins

I have some cases where sometimes I want to store a vector of floats along with a data item. I usually don't need to match against the column, just store it. I know I could use BLOB but often I just use JSON for this kind of thing so that I don't have to deal with data types and conversion. It's wasteful and imprecise though due to the string conversion. Is JSONB a good middle ground option for these cases?

Edit: Sorry, just saw the comment below by o11c,

> Sqlite's JSONB keeps numbers as strings

which means the answer to my question is basically "no".

I wish there was some way to compress JSON data across multiple rows, I often have very similar blobs in each row. Or some other way to reduce amount of storage for a lot of similar blobs across many rows, for example 10kB JSON where only one or two keys are different.

This does not look at all usable. Your application is exposed to the "internal binary format" that SQLite has chosen:

    select jsonb_extract('{"foo": {"bar": 42}}', '$.foo');

    │ jsonb_extract('{"foo": {"bar": 42}}', '$.foo') │
    │ |7bar#42                                       │

"The central idea behind this JSONB specification is that each element begins with a header that includes the size and type of that element."

Why not add this size indication to JSON specification. Would reduce memory requirements for JSON processing.

1997: https://cr.yp.to/proto/netstrings.txt

NB. This header may be the "central idea" behind JSONB but JSONB has other differences from JSON. This comment refers only to the size indication not the other features.

Would be nice to have an even more compact csv-esque version of jsonb that knew how to store arrays of objects with the same keys in compressible column-major format, omitting the repeated keys (and even value types if possible), like a transposed csv file with a first column of headers + type info. And the ability to embed arrays of object encoded that way into arbitrary jsonb structures, as long as all the objects have the same keys and (simple enough) value types. Or is that what the (different?) Postgres jsonb format does?

JSONB has performance implications, many of you here are likely familiar with JSONB in Postgres.

I encourage you to view this talk from PGConf NYC 2021 - Understanding of Jsonb Performance by Oleg Bartunov [1]

Looking forward to a similar talk from the SQLite community on JSONB performance in the future.


I would prefer relational databases just have a compound hierarchal data type that can contain all the types it supports. Json is so anemic on types.

This way it can efficiently store numbers , dates, uuids or raw binary etc. and should really have some sort of key interning to efficiently store repeating key names.

Then just have functions to convert to/from json if thats what you want

> Json is so anemic on types.

So is SQLite3 though.

I do wish that SQLite3 could get a CREATE TYPE command so that one could get a measure of static typing. Under the covers there would still be only the types that SQLite3 has now, so a CREATE TYPE would have to specify which one of those types underlies the user-defined type. I think this should be doable.

Hm I googled and found this draft of the encoding - https://sqlite.org/draft/jsonb.html

It feels like it would be better to use a known binary encoding. I thought the MessagePack data model corresponded pretty much exactly to JSON ?

Edit: someone else mentioned BSON - https://bsonspec.org/

To be honest the wins (in this draft) don't seem that compelling

The advantage of JSONB over ordinary text RFC 8259 JSON is that JSONB is both slightly smaller (by between 5% and 10% in most cases) and can be processed in less than half the number of CPU cycles.

JSON has been optimized to death; it seems like you could get the 2x gain and avoid a new format with normal optimization, or perhaps compile-time options for SIMD JSON techniques


And this seems likely to confuse:

The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.


Any time data is serialized, SOMEBODY is going to read it. With something as popular as sqlite, that's true 10x over.

So to me, this seems suboptimal on 2 fronts.

SQLite says you should not attempt to access it: https://sqlite.org/draft/jsonb.html

> JSONB is not intended as an external format to be used by applications. JSONB is designed for internal use by SQLite only. Programmers do not need to understand the JSONB format in order to use it effectively. Applications should access JSONB only through the JSON SQL functions, not by looking at individual bytes of the BLOB.

> However, JSONB is intended to be portable and backwards compatible for all future versions of SQLite. In other words, you should not have to export and reimport your SQLite database files when you upgrade to a newer SQLite version. For that reason, the JSONB format needs to be well-defined.

If SQLite intends to own the format forever, I can believe that their requirements are such that leaning on an existing implementation is not worth the savings to implement.

Users don't read docs, and any visible implementation detail becomes set in stone:


It looks like they're aware of that ... it's probably fine -- not ideal, but fine

If users didn't care about what the blob format was, there wouldn't be JSON support in the first place! You would have started with something like JSONB

Also SQLite3 will have to remain backwards compatible with its JSONB forever, which means it's stable, which means there will be users who will feel comfortable using the raw JSONB from SQLite3.

> JSON has been optimized to death; it seems like you could get the 2x gain and avoid a new format with normal optimization

Either I'm experiencing a reading comprehension mishap or this is self contradictory. Where is a "2x gain" supposed to come from through "normal optimization" from after something has already been optimized "to death?"

> SIMD JSON techniques

Which are infeasible in key SQLite use cases.

> Credit to SQLite developers for adopting an extant binary JSON format, as opposed to inventing yet another one.

The comment you are replying to cites a statement saying explicitly that they are not adopting the Postgres JSONB binary format, only the name and the abstract concept. The API is not compatible with Postgres either.

I noted that and removed that bit prior to your reply.

Points off for a.) inventing yet another binary JSON and/or b.) using the same name as an existing binary JSON.

Those 2 points are exactly what my comment said

I just meant there are a zillion different known ways of optimizing JSON, that maybe could be applied here. But maybe they already optimized it to the degree they're willing and wanted a binary format

SQLite developers aren't in the habit of making poor technical choices. This choice certainly has precedent: Oracle, MySQL, MongoDB, Postgres and others have implemented some form of binary encoding for JSON. Perhaps they know something you don't and deserve the benefit of the doubt.

Re-using standards is a great idea, and should remain people's default, but I don't see the benefit here.

What's the advantage of re-using a format?

Ecosystem? That won't help SQLite here, who don't have dependencies.

Keep in mind that anyone trying to write a parser for this is also writing a parser for the entire SQLite file format (the only way to access bytes). And it's a spec simple enough to fit on one monitor.


BSON (and others?) seem to have different goals.

SQLite's format seems to minimise conversion/parsing (eg. it has multiple TEXT types depending on how much escaping is needed; BSON has one fully-parsed UTF-8 string type). BSON is more complex: includes many types not supported by json (dates, regexs, uuids...) and has a bunch of deprecated features already.

SQLite's on disk-format is something they intend to support "forever", and as with the rest of SQLite, they enjoy pragmatic simplicity.

> JSON has been optimized to death; [...]

Not the JSON format.

> [..] it seems like you could get the 2x gain and avoid a new format with normal optimization, or perhaps compile-time options for SIMD JSON techniques

SIMD isn't always appropriate, and there are still no SIMD incremental JSON parsers either. There are things that can't easily be done with SIMD and JSON.

It takes a lot of complex code to optimize a JSON parser. You don't need to optimize that hard to make an equally performant parser for binary serialization formats, and given SQLite's JSONB is purely an internal optimization, it doesn't have to be a well-known interchange format (which has much more concerns than SQLite).

> Any time data is serialized, SOMEBODY is going to read it. With something as popular as sqlite, that's true 10x over.

And this statement is equally true for the SQLite format itself. That doesn't mean that the SQLite format should be replaced with something more standard, of course.

I thought the point was that the DB could reach into JSONB for you (potentially indexed as well) instead of having to deserialize outside. So the exact serialization isn't maybe as interesting as the direct queryability.

Does SQLite make any promises on the internal implementation? I assumed that the only guarantee was at the SQL interface.

They do promise it'll be a stable format:

> The JSONB format is not intended as an interchange format. Nevertheless, JSONB is stored in database files which are intended to be readable and writable for many decades into the future. To that end, the JSONB format is well-defined and stable. The separate SQLite JSONB format document provides details of the JSONB format for the curious reader.

And indeed the functions jsonb() and json() will let you convert to and from jsonb.

CBOR should compress even better than BSON.

Unrelated question: what languages integrates best with sqlite? I am using it with go and cgo, but it is often advised to avoid cgo. Perhaps it doesn't matter so much, I can use it anyways, but would be interesting to hear about other experiences.

You have options that don't use cgo. Disclaimer, I made the second one.

https://modernc.org/sqlite https://github.com/ncruces/go-sqlite3

That's very cool, never considered a WASM option, thanks!

> what languages integrates best with sqlite?


Any language with bindings will be fine.

The C integration API to Sqlite is very simple and small, so a) almost all languages will have bindings, and b) any bindings that exist will be reasonable.

Define best

Little overhead, a "native" integration.

I'm surprised that TFA doesn't say that the JSONB it refers to is PostgreSQL's JSONB. I assume it must be because SQLite3 adopts a lot of things from PostgreSQL, so D.R. Hipp and crew must be familiar with PG's JSONB and so they wouldn't create a different JSONB. Plus the PG JSONB is brilliant, so it would be wise to copy it or at least take inspiration from it.

There's.. no need to interop with PG's JSONB encoding, I think. If so then SQLite3's could differ from PG's.

Anyways, it'd be nice if TFA was clearer on this point.

EDIT: https://sqlite.org/draft/jsonb.html does refer to PG's JSONB, and says that they are different.

So what will JSONB look like in a standalone DB browser (DBeaver, etc.) ?

It's just a BLOB.

I guess I'm thinking aloud about which (and when) DB browsers will auto-decode JSONB into JSON.

As the maintainer of sql.js and SQLPage, I am very excited about this:

- json is used a lot by sql.js users to interact with JavaScript.

- to generate more sophisticated web pages in SQLPage, json is crucial

I can't wait for the next release !

Interested to know what the Deno JavaScript/TypeScript project thinks of this addition given it has already has a key value store backed by the database and JSON is JavaScript friendly.

While this will be good for SQLite queries that look into JSON data I’m not sure it really changes much for Deno. V8’s JSON parser is very highly optimised so it may be tricky to make a faster SQLite JSONB parser.

Is this just a data type, or did SQLite put in a JSON interpreter?

What do you mean by a JSON interpreter? I think SQLite already has a full implementation of JSON including tree walkers, and this internal format (that is externally just another BLOB) makes them more efficient.

Oh, sorry, I was thinking that someone had put a Javascript interpreter in the database itself. Fortunately, no.

Despite the name, the JSON format doesn't really have much to do with JavaScript.

Pretty sure SQLite comes with a JSON parser. JSON is a very simple format for which it is easy to write a parser for...

You mean a JSON parser? I don't think you can execute anything with json.

SQLite has had pretty decent support for JSON for a while now:


It's a BLOB with a binary representation of JSON.

What are the use cases for storing and manipulating json at the db level like this - why not use a relational schema and query it in the normal way?

Imagine you need to attach to some records a bag of data which you always fully need or not. This bag may have a tree structure, which is expensive to deal with in a relational model. An example would be the JSON configuration of a component/client/whatever. In such a case, trying to fit this bag of data in a relational model would be inefficient.

Those bags of data are usually called "documents". And a lot of systems need a way to store them along relational data.

If you're just storing and retrieving those documents, without any query, you don't need JSONB, a simple blob is enough.

If you sometimes need to do some queries, especially free queries (you want all component whose configuration has some property), then JSONB is suitable as it lets you do the filtering in the database.

> If you sometimes need to do some queries, especially free queries (you want all component whose configuration has some property), then JSONB is suitable as it lets you do the filtering in the database.

This feels like a slippery slope into a denormalised mess though. Before you know it your whole client record is a document and you’re using Postgres as a NoSQL database

The PostgreSQL's devs are fine with that.

Such use cases may not be quite as optimized as relational usecases, but they should be possible. If you are doing that then perhaps a different database would be nicer or faster for that scenario, but again not PostgreSQL's concern.

If the PostgreSQL devs were relational purists they would never have added special support for querying JSON.

An application may be better served with a properly normalized schema, or it might not. That is a choice for the application developers to make.

In practice competent developers should quickly realize if they went too nosql when a relational approach would provide benefits (Like if multiple "documents" need to reference consistent shared data, or if json query performance is not good enough) and normalize as needed to get those advantages.

So long as the application does not contain random sql queries scattered everywhere (and doesnt treat its database as a sort of API for external access) then database refactoring is not impossible. Indeed the difficulty is often overestimated. It is seldom fun work, and tends to be a bit of a slog, and require more extensive testing before pushing to prod, but that happens.

Align with the rest of the system, when sqlite is used as a component in a system which mainly speaks json.

Performance, when records are complex and usually not accessed.

Ease, avoiding/postponing table design decisions.

Flexibility, for one of many corner-cases (since sqlite is a very broad tool).

Incremental enhancement, e.g. when starting with sqlite as replacement to an ndjson-file and incrementally taking advantage of transactions and indexes on fields [1,2,3].

For example, several of these could apply when doing structured logging to a sqlite database.

    [1]: https://www.sqlite.org/expridx.html
    [2]: https://www.sqlite.org/gencol.html
    [3]: https://antonz.org/json-virtual-columns/
    See also: https://www.sqlite.org/json1.html

There are good use cases, but from my personal experience, I've seen people use JSONB mainly to avoid to have too many columns when they inspect their tables, so they split their data into two groups, the ones supposedly important that have dedicated columns and the rest that is thrown into a single JSONB column.

Does anyone know if SQLite has a public roadmap? This combined with a gin or gin like index would be a killer combination

Does Sqlite support Sqlite database format in tables

What’s the difference between JSONB and BSON?

why not decode it as a struct or list of whatever are the native types in the database for arbitrarily nested objects?

sqlite doesn't have any native type for arbitrarily nested objects.

a good reason to move to DuckDB I guess?

Is there any best practices around jsonb?

There's all the rest of open source, and then there's SQLite. A public domain software that doesn't accept contributions from outsiders, and that happens to run much of the world.

And it just keeps getting better and better and better, and faster and faster and faster.

I don't know how these guys manage to succeed where almost all other projects fail, but I hope they keep going.

The patterns I've noticed, watching it evolve for nearly 20 years are:

- They draft behind others, and I mean this in a good way. They seem to eschew trailblazing new features, but keep a close eye on alternatives (esp. Postgres) and how use cases are emerging. When a sufficiently interesting concept has stabilized, they bring it to SQLite.

- Closely related to this, they seem to stay clear of technical and community distractions. They have their plans and execute.

- I don't know if D. Richard Hipp is considered a BDFL, but that's my impression and he seems good at it.

When put this way, it reminds me how Apple generally chooses to add features to the iPhone. They wait for use cases to be proven, usually by Samsung, and then add a very polished version to the phone.

Arguably the entire app eco system exists only to farm features. This is arguably the case whenever users are allowed to create things.

Go get a pro subscription with each of your competitors :)

And Java is similar nowadays

Sometimes you gotta keep the riff raff out in order to march forwards with your vision. Too many cooks in the kitchen can be deadly.

It's perhaps the best example of the cathedral model of open source.

Yeah, learned about it while reading the documentation of Fossil (from the same SQLite people). Their approach certainly has its own merits (and drawbacks).

Just wondering how they will transition once the original few people at the top of the hierarchy need to retire, eventually it will happen.

I guess they need to find younger trusted committers with the same dedication and spirit. That's not necessarily easy. But for a piece of software as important as SQLite, I have a feeling they will find those.

>Just wondering how they will transition once the original few people at the top of the hierarchy need to retire, eventually it will happen.

I had always viewed this as a "future worry", and then Bram Moolenaar passed away :(

The problem with “future worry”s is that the Universe doesn't always agree with us on when that future will be, and it often happens to be closer than expected, even more often closer than desired.

> I don't know how these guys manage to succeed where almost all other projects fail, [...].

They have a proprietary (not open source) test suite with 100% branch coverage. That makes it impossible to have credible forks of SQLite3. And it makes the gatekeepers valuable because SQLite3 is the most widely used piece of software ever built. So there's a SQLite Consortium, and all the big tech players that depend on SQLite3 are members, and that's how the SQLite team pays the bills.

Can anyone comment on the characteristics of the source code? Is the sqlite source worth reading for someone who is well-versed in C and x86 assembly, and has a rudimentary knowledge of databases?

"If you're looking into building your own database ...SQLite is amazing. It has very clean and readable code, so I'd suggest using it as a reference" https://news.ycombinator.com/item?id=21041833

"I suggest SQLite because the source code is superb (seriously, some of the most readable, most logically organized and best commented C code you'll ever see)" https://news.ycombinator.com/item?id=12559301

"it never hurts to look at a good open-source codebase written in C, for example the SQLite code is worth looking at (if a bit overwhelming)" https://news.ycombinator.com/item?id=33132772

Is it an excellent and unchecked attack surface like JSON is in MSSQL? Postgres is also known for SQL injection. And if this is based off of that...


This blog post is about using these operators to bypass WAFs. You don't need JSON operators for that. You can substitute `1==1` with `2==2`, or `1!=1`, or `true`, or a million other approaches. There are an infinite number of such strings.

This is a problem with WAFs, not databases. Postgres and SQL Server both provide prepared statements as an alternative to string concatenation, which addresses SQL injection. (Though some people may be stuck with legacy or vendor-contolled systems that they can't fix, and so WAFs are their only option.)

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact