The problem with SQLite is that it's not a standardized file format. It's well-d...

bane · on Sept 18, 2023

I'm not sure if the problem you are pointing out has to do with:

a) SQLite the file format - which is Public Domain and so well documented that parsers for it exist in numerous other languages even though it's almost pointless because...

b) SQLite, the Public Domain (and thus entirely source available) C implementation of the library that can operate on the file format -- and is documented to a level well above what most ISO standards shoot for. It's designed to be used in other software and has bindings for pretty much every major language.

c) Some notional OpenDocument stored in a SQLite file that's really just waiting for somebody to make and document.

ISO standards are great, but if we had to wait for ISO to define a file format we'd have pitifully little to work with.

btilly · on Sept 18, 2023

It is possible that the C implementation of SQLite is the single most commonly deployed software library ever. If not, then it is probably the second, after zlib.

https://www.sqlite.org/mostdeployed.html

Therefore I consider it a better supported format than most standardized formats.

lucideer · on Sept 18, 2023

That page makes the argument for zlib & sqlite, but Daniel Stenberg makes some good points here[0].

My guess would be zlib is still number 1 though, even accounting for Daniel's considerations.

[0] https://daniel.haxx.se/blog/2021/10/21/the-most-used-softwar...

pl4nty · on Sept 18, 2023

for one, it's bundled with consumer versions of Windows as winsqlite3.dll. not sure when this started though

sdeframond · on Sept 18, 2023

I think this has been discussed before about WebSQL.

> The [WebSQL] specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path.

https://www.w3.org/TR/webdatabase/

sdeframond · on Sept 18, 2023

The Chrome blog post about deprecating SQLite-based WebSQL makes an interesting point. I believe it applies to OpenDocument as well.

> The Web SQL specification cannot be implemented sustainably, which limits innovation and new functionality. The last version of the standard literally states "User agents must implement the SQL dialect supported by Sqlite 3.6.19". SQLite was not initially designed to run malicious SQL statements, yet implementing Web SQL means browsers have to do exactly this. The need to keep up with security and stability fixes dictates updating SQLite in Chromium. This comes in direct conflict with Web SQL's requirement of behaving exactly as SQLite 3.6.19.

https://developer.chrome.com/blog/deprecating-web-sql/

bane · on Sept 18, 2023

In other words, "all the implementors chose a standard, but we're the standard deciders so we're killing the whole idea".

dragonwriter · on Sept 18, 2023

One problem was the standard was bug-for-bug replication of a particular version of SQLite.

There’s very good reason for that not to be a standard. (Now, assuming the SQLite documentation is licensed in a way which supports this, copying the documentation of SQLite’s supported SQL as of that version into the standard might have been viable, but no one interested in having WebSQL proposed that or any other resolution.)

That relates to the cited issue of absence of independent implementations, which would have been a problem even with a spec that supported independent implementations and verification of their compliance independent of a particular reference implementation. Though I personally think the spec problem is a bigger real problem (even if not the decisive policy problem) than the “everyone is using the same underlying software to implement the spec” problem is in this case, where the shared implementation is a permissively licensed open source implementation sponsored by several of the browser vendors, among others.

bane · on Sept 19, 2023

hmmm...I appreciate the thoughtful reply. You bring up an interesting point. What is the SQLite documentation licensed as? I would assume PD like the rest of it, but I don't know that for sure.

anticensor · on Sept 19, 2023

SQLite itself is in the public domain.

sdeframond · on Sept 19, 2023

The standard deciders are the implementors. There is no point in opposing them. The W3C is actually made up of representatives of Google, Mozilla, Microsoft, Opera and so on.

zie · on Sept 18, 2023

100% agree and the Library of Congress loves it: https://www.loc.gov/preservation/digital/formats/fdd/fdd0004... and https://sqlite.org/locrsf.html

joshspankit · on Sept 18, 2023

Sounds like a solution is to use the C implementation to define the standard and have it canonized as an ISO standard.

cornstalks · on Sept 18, 2023

That's what Opus did. The RFC[1] has a base-64 encoded libopus.tar.gz appendix (Appendix A), which is the "primary normative part of this [Opus] specification." If the prose and source code disagree, the source code takes priority and "wins" when it comes to which is normative.

I have a love-hate relationship with this approach.

[1]: https://datatracker.ietf.org/doc/html/rfc6716

dfox · on Sept 18, 2023

That is common for codec standards, the normative part of many MPEG specifications is the parser/decoder in C-like pseudo-code. What is somewhat unique for Xiph is that their normative reference decoders are actually usable.

Eduard · on Sept 18, 2023

funny, the RFC even includes a shell command pipeline to extract the base64 out of the awkward RFC formatting.

Using the C source code still leaves room for ambiguities / under-specification, no? After all, the semantics rely on the particular gcc release used for compiling the code.

cornstalks · on Sept 18, 2023

There is still the possibility of a bug or under-specification, but that's always the case in any spec. At least with Opus they document what implementation-defined behavior they require, so assuming there aren't any hidden bugs then you should get consistent output across compilers.

Eduard · on Sept 19, 2023

but the semantics change depending on the build tool version and other factors.

belenos46 · on Sept 18, 2023

Yeah, a solution in search of a problem.

gwd · on Sept 18, 2023

> Writing an XML parser is not a trivial task, but it's still simpler than writing an SQL parser, query optimizer, compiler, bytecode VM, full-text search engine, and whatever else Sqlite offers, without any data corruption in the process.

Just to clarify: You don't actually need to implement all that for it to be a standardized file format, any more than you need to implement all the spreadsheet functionality to be able to read a LibreOffice spreadsheet. All you need to do is to be able to reconstruct the tables. There's no reason, having reconstructed the tables, you couldn't write your own imperative code in the language of your choice to go over them and get whatever information you wanted.
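
A minimal sketch of that idea in Python, under loud assumptions: the `cells` table and its columns are hypothetical, and the stock `sqlite3` module stands in for the "reconstruct the tables" step (a from-scratch reader would replace `load_tables`). After that step no SQL is involved; the aggregation is plain imperative code.

```python
import sqlite3


def load_tables(conn):
    """Return {table_name: [row dicts]} for every user table in the database."""
    conn.row_factory = sqlite3.Row
    names = [
        row["name"]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    tables = {}
    for name in names:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        tables[name] = [dict(row) for row in conn.execute(f"SELECT * FROM {quoted}")]
    return tables


# Hypothetical spreadsheet-like content, created only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cells (sheet TEXT, ref TEXT, value REAL)")
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?)",
    [("Sheet1", "A1", 1.5), ("Sheet1", "A2", 2.5), ("Sheet2", "A1", 10.0)],
)

tables = load_tables(conn)

# From here on it is ordinary imperative code over in-memory rows.
sheet1_total = 0.0
for cell in tables["cells"]:
    if cell["sheet"] == "Sheet1":
        sheet1_total += cell["value"]
```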

justin66 · on Sept 18, 2023

> This isn't a concern for most software.

It's not even a concern for the US Library of Congress, which defined SQLite as a recommended storage format for datasets alongside CSV, XML, and JSON.

nelgaard · on Sept 18, 2023

But those are completely different uses of a storage format.

The Library of Congress considers whether someone 100 years from now could write a new importer in whatever language/AI they might use by then.

Office documents are something you send in email attachments to people you often barely know, and expect them to read it in whatever office system they have. And if the recipient uses e.g., Microsoft Word, ODF/SQLite might not work.

justin66 · on Sept 18, 2023

It is true that it requires effort for the developers of a software program to support a given file format. Beyond that I'm not sure what your point is.

galangalalgol · on Sept 18, 2023

Not the op, but one point would be, why did we even pick xml, when we had latex and html? Why is a relational database the right tool for a document format?

cxr · on Sept 21, 2023

They're constrained by different requirements. The comment was clear enough:

"those are completely different uses"

It's not a hard concept to grasp. There is no riddle to decipher.

toast0 · on Sept 18, 2023

> Office documents are something you send in email attachments to people you often barely know, and expect them to read it in whatever office system they have.

Eh, if they're not running the same office system, down to patches, you can't really expect much.

sethev · on Sept 18, 2023

You seem to be mixing up the file format with how it's used. An application that uses SQLite's file format would use SQLite's library as part of the application. Yes, it would be quite a lot of work to replicate that library but in the same way that replicating the code that uses OpenDocument's file format would be.

The file format itself is pretty straightforward.

coliveira · on Sept 18, 2023

But you don't need a standard, because all interaction between applications and the document is made through SQL. And SQL is standardized (at least the parts that matter). If you have concerns about compatibility, make sure that the document can also be accessed through other databases (like mysql).

orra · on Sept 18, 2023

But other databases cannot access sqlite databases, because the file format is internal...

nojvek · on Sept 18, 2023

The SQLite file format is very well documented. In some universities it is an assignment to directly read and write SQLite files from disk and understand the page and block structure.

You don’t need sql for any of it.

https://www.sqlite.org/fileformat.html
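
As a rough illustration of how approachable the on-disk layout is, here is a sketch that reads a few fields of the 100-byte database header without going through SQL. The field offsets follow the file format page linked above; `read_header` and the temporary demo file are my own names, and `sqlite3` is used only to create a file to inspect.

```python
import os
import sqlite3
import struct
import tempfile

MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file


def read_header(path):
    """Parse a few fields from the 100-byte header at the start of the file."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or header[:16] != MAGIC:
        raise ValueError("not a SQLite 3 database file")
    (page_size,) = struct.unpack(">H", header[16:18])  # big-endian, 2 bytes
    if page_size == 1:
        page_size = 65536  # the value 1 is the documented encoding for 65536
    (page_count,) = struct.unpack(">I", header[28:32])  # database size in pages
    (text_encoding,) = struct.unpack(">I", header[56:60])  # 1 = UTF-8
    return {
        "page_size": page_size,
        "page_count": page_count,
        "text_encoding": text_encoding,
    }


# Create a small database file to inspect (demo only).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()
conn.close()

info = read_header(path)
print(info)
```

Walking the b-tree pages to recover rows is more work than this, but it follows the same pattern: fixed offsets, big-endian integers, and varints.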

dunham · on Sept 18, 2023

It's interesting that this is a classroom assignment. Like the sibling comment, I'd be curious which university / class this was. I did the read part (+ query planning) on my own as an exercise, but I haven't gotten around to implementing writing yet.

You do need to parse DDL to get the column names, they're stored as a "CREATE TABLE" string. But you don't have to if you want to dump the file without names.

https://github.com/dunhamsteve/sqljs
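
To make the DDL point concrete, a short sketch (the `paragraphs` table is a made-up example): the schema table keeps each table's definition as `CREATE TABLE` text in its `sql` column, so a standalone reader has to parse that text to learn column names, whereas a program that links the SQLite library can simply ask via `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paragraphs (id INTEGER PRIMARY KEY, style TEXT, body TEXT)"
)

# What a from-scratch reader finds in the schema table: the DDL as text.
rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
for name, sql in rows:
    print(name, "->", sql)

# What the library offers instead: column metadata without parsing DDL.
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(paragraphs)")]
print(columns)
```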

avinassh · on Sept 18, 2023

> In some universities it is an assignment to directly read and write SQLite files from disk and understand the page and block structure.

do you have any links?

bane · on Sept 18, 2023

https://github.com/pgspider/sqlite_fdw

orra · on Sept 18, 2023

I'll admit, that's a fantastic third party effort. But there definitely isn't the same level of first party support as there is for zip files.

hot_gril · on Sept 18, 2023

They can if they want to, using the standard SQLite lib or their own implementation.

marcinzm · on Sept 18, 2023

>at least the parts that matter

In my experience every part matters in non-trivial use cases since someone somewhere will use that part.

patapong · on Sept 18, 2023

This sounds exactly like the argument that killed WebSQL in 2010: https://en.wikipedia.org/wiki/Web_SQL_Database

I am still salty about this, as WebSQL would have made it much easier to build a certain class of web apps.

bb010g · on Sept 18, 2023

You can still use <https://github.com/jlongster/absurd-sql>. <https://jlongster.com/future-sql-web>

eternityforest · on Sept 19, 2023

It almost seems worth giving up ISO for SQLite, but I understand there are real concerns when you get into enterprisey stuff.

SQLite is kind of its own standard. It's public domain and they don't do breaking changes all day, and it's in C. As long as C is still viable, SQLite is usable on basically all non-embedded platforms, and nobody really needs to reimplement it, unless they want to port it to Rust or something.

Not that you'd need to, since it's already very reliable.

paulddraper · on Sept 18, 2023

> less capable engine

There wouldn't be another engine.

It would be SQLite. Period.

cryptonector · on Sept 18, 2023

This could have been used as an opportunity to standardize the SQLite3 DB file format.

hot_gril · on Sept 18, 2023

I've never seen this as a problem, since plenty of random things are distributed as sqlite files. All the remaining questions for ODF would be about the schema design.

jimbokun · on Sept 18, 2023

Just define the schema and the semantics of each column for each table.
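
For a sense of what that could look like, here is a purely hypothetical sketch: none of these table names, columns, or the application ID come from ODF or any real proposal. It documents column semantics in the DDL comments and uses SQLite's `application_id` and `user_version` pragmas to tag the file type and schema revision.

```python
import sqlite3

# Hypothetical 32-bit tag so tools can recognize the file type (not a registered ID).
APPLICATION_ID = 0x4F444631

# Hypothetical schema; the comments carry the per-column semantics.
SCHEMA = """
CREATE TABLE metadata (
    key   TEXT PRIMARY KEY,  -- e.g. 'title', 'author'
    value TEXT NOT NULL      -- UTF-8 text, meaning defined per key
);
CREATE TABLE paragraphs (
    position INTEGER PRIMARY KEY,          -- reading order, ascending
    style    TEXT NOT NULL DEFAULT 'body', -- named style, e.g. 'heading1'
    body     TEXT NOT NULL                 -- paragraph text, no markup
);
"""


def create_document(path):
    """Create an empty document database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.execute(f"PRAGMA application_id = {APPLICATION_ID}")
    conn.execute("PRAGMA user_version = 1")  # schema revision
    conn.commit()
    return conn


doc = create_document(":memory:")
doc.execute("INSERT INTO paragraphs (position, body) VALUES (1, 'Hello')")
app_id = doc.execute("PRAGMA application_id").fetchone()[0]
version = doc.execute("PRAGMA user_version").fetchone()[0]
style = doc.execute("SELECT style FROM paragraphs WHERE position = 1").fetchone()[0]
```

A real format would need far more than this (styles, embedded objects, versioning rules), which is where the actual specification effort would go.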