Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The problem with SQLite is that it's not a standardized file format. It's well-documented and pretty well understood for sure, but there's no ISO standard defining how to interpret an SQLite file in excruciating detail. Same goes for competing implementations, Zip and XML have a much smaller API surface than SQLite, whose API, apart from a bunch of C functions, is the SQL language itself. Writing an XML parser is not a trivial task, but it's still simpler than writing an SQL parser, query optimizer, compiler, bytecode VM, full-text search engine, and whatever else Sqlite offers, without any data corruption in the process. If Open Office used SQLite, its programmers would inevitably start using its more esoteric features and writing queries that a less-capable engine wouldn't be able to optimize too well.

This isn't a concern for most software. If you're writing a domain-specific, closed-source application where interoperability with other apps or ISO standardization isn't a concern, SQLite is a perfectly fine file format, but as far as I understand the situation, those concerns did exist for Open Office.



I'm not sure if the problem you are pointing out has to do with:

a) SQLite the file format - which is Public Domain and so well documented that parsers for it exist in numerous other languages even though it's almost pointless because...

b) SQLite, the Public Domain (and thus entirely source available) C implementation of the library that can operate on the file format -- and is documented to a level well above what most ISO standards shoot for. It's designed to be used in other software and has bindings for pretty much every major language.

c) Some notional OpenDocument stored in a SQLite file that's really just waiting for somebody to make and document.

ISO standards are great, but if we had to wait for ISO to define a file format we'd have pitifully little to work with.


It is possible that the C implementation of SQLite is the single most commonly deployed software library ever. If not, then it is probably the second, after zlib.

https://www.sqlite.org/mostdeployed.html

Therefore I consider it a better supported format than most standardized formats.


That page makes the argument for zlib & sqlite, but Daniel Stenberg makes some good points here[0].

My guess would be zlib is still number 1 though, even accounting for Daniel's considerations.

[0] https://daniel.haxx.se/blog/2021/10/21/the-most-used-softwar...


for one, it's bundled with consumer versions of Windows as winsqlite3.dll. not sure when this started though


I think this has been discussed before about WebSQL.

> The [WebSQL] specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path.

https://www.w3.org/TR/webdatabase/


The Chrome blog post about deprecating sqlite-based WebSQL makes an interesing point. I believe it applies to OpenDocument as well.

> The Web SQL specification cannot be implemented sustainably, which limits innovation and new functionality. The last version of the standard literally states "User agents must implement the SQL dialect supported by Sqlite 3.6.19". SQLite was not initially designed to run malicious SQL statements, yet implementing Web SQL means browsers have to do exactly this. The need to keep up with security and stability fixes dictates updating SQLite in Chromium. This comes in direct conflict with Web SQL's requirement of behaving exactly as SQLite 3.6.19.

https://developer.chrome.com/blog/deprecating-web-sql/


In other words, "all the implementors chose a standard, but we're the standard deciders so we're killing the whole idea".


One problem was the standard was bug-for-bug replication of a particular version of SQLite.

There’s very good reason for that not to be a standard. (Now, assuming the SQLite documentation is licensed in a way which supports this, copying the documentation of SQLite’s supported SQL as of that version into the standard might have been viable, but no one interested in having WebSQL proposed that or any other resolution.

That relates to the cited issue of absence of independent implementations, which would have been a problem even with a spec that supported independent implementations and verification of their compliance independent of a particular reference implementation. though I personally think the spec problem is a bigger real problem (even if not the decisive policy problem) than the “everyone is using the same underlying software to implement the spec” problem is in this case, where the shared implementation is a permissively licensed open source implementation sponsored by several of the browser vendors, among others.


hmmm...I appreciate the thoughtful reply. You bring up an interesting point. What is the SQLite documentation licensed as? I would assume PD like the rest of it, but I don't know that for sure.


SQLite itself is in the public domain.


The standard deciders are the implementors. There is no point in opposing them. The W3C is actually the representatives of Google, Mozilla, Microsoft, Opera and so on.



Sounds like a solution is to use the C implementation to define the standard and have it canonized in to an ISO.


That's what Opus did. The RFC[1] has a base-64 encoded libopus.tar.gz appendix (Appendix A), which is the "primary normative part of this [Opus] specification." If the prose and source code disagree, the source code takes priority and "wins" when it comes to which is normative.

I have a love-hate relationship with this approach.

[1]: https://datatracker.ietf.org/doc/html/rfc6716


That is common for codec standards, the normative part of many MPEG specifications is the parser/decoder in C-like pseudo-code. What is somewhat unique for Xiph is that their normative reference decoders are actually usable.


funny, the RFC even includes a shell command pipeline to extract the base64 out of the awkward RFC formatting.

Using the C source code still leaves room for ambiguities / under-specification, no? After all, the semantics rely on the particular gcc release used for compiling the code.


There is still the possibility of a bug or under-specification, but that's always the case in any spec. At least with Opus they document what implementation-defined behavior they require, so assuming there aren't any hidden bugs then you should get consistent output across compilers.


but the semantics change depending on the build tool version and other factors.


Yeah, a solution in search of a problem.


> Writing an XML parser is not a trivial task, but it's still simpler than writing an SQL parser, query optimizer, compiler, bytecode VM, full-text search engine, and whatever else Sqlite offers, without any data corruption in the process.

Just to clarify: You don't actually need to implement all that for it to be a standardized file format, any more than you need to implement all the spreadsheet functionality to be able to read a LibreOffice spreadsheet. All you need to do is to be able to reconstruct the tables. There's no reason, having reconstructed the tables, you couldn't write your own imperative code in the language of your choice to go over them and get whatever information you wanted.


> This isn't a concern for most software.

It's not even a concern for the US Library of Congress, which defined SQLite as a recommended storage format for datasets alongside CSV, XML, and JSON.


But those are completely different uses of a storage format.

Library of congress considers if someone a 100 years from now could write a new importer in whatever langauge/AI they might use by then.

Office documents are something you send in email attachments to people you often barely know, and expect them to read it in whatever office system they have. And if the recipient uses e.g., Microsoft Word, OFD/Sqlite might not work.


It is true that it requires effort for the developers of a software program to support a given file format. Beyond that I'm not sure what your point is.


Not the op, but one point would be, why did we even pick xml, when we had latex and html? Why is a relational database the right tool for a document format?


They're constrained by different requirements. The comment was clear enough:

"those are completely different uses"

It's not a hard concept to grasp. There is no riddle to decipher.


> Office documents are something you send in email attachments to people you often barely know, and expect them to read it in whatever office system they have.

Eh, if they're not running the same office system, down to patches, you can't really expect much.


You seem to be mixing up the file format with how it's used. An application that uses SQLite's file format would use SQLite's library as part of the application. Yes, it would be quite a lot of work to replicate that library but in the same way that replicating the code that uses OpenDocument's file format would be.

The file format itself is pretty straightforward.


But you don't need a standard, because all interaction between applications and the document is made through SQL. And SQL is standardized (at least the parts that matter). If you have concerns about compatibility, make sure that the document can also be accessed through other databases (like mysql).


But other databases cannot access sqlite databases, because the file format is internal...


SQL file format is very well documented. In some universities it is an assignment to directly read and write sqllite files from disk and understand the paged and blocks structure.

You don’t need sql for any of it.

https://www.sqlite.org/fileformat.html


It's interesting that this is a classroom assignment, like the sibling comment, I'd curious which university / class this was. I did the read part (+ query planning) on my own as an exercise, but I haven't gotten around to implementing writing yet.

You do need to parse DDL to get the column names, they're stored as a "CREATE TABLE" string. But you don't have to if you want to dump the file without names.

https://github.com/dunhamsteve/sqljs


> In some universities it is an assignment to directly read and write sqllite files from disk and understand the paged and blocks structure.

do you have any links?



I'll admit, that's a fantastic third party effort. But there definitely isn't the same level of first party support as there is for zip files.


They can if they want to, using the standard SQLite lib or their own implementation.


>at least the parts that matter

In my experience every part matters in non-trivial use cases since someone somewhere will use that part.


This sounds exactly like the argument that killed WebSQL in 2010: https://en.wikipedia.org/wiki/Web_SQL_Database

I am still salty about this, as WebSQL would have made it much easier to build a certain class of web apps.



It almost seems worth giving up ISO for SQLite, but I understand there are real concerns when you get into enterprisy stuff.

SQLite is kind of its own standard. It's public domain and they don't do breaking changes all day, and it's in C. As long as C is still viable, SQLite is usable on basically all non embedded platforms, and nobody really needs to reimplement it, unless they want to port it to Rust or something.

Not that you'd need to, since it's already very reliable.


> less capable engine

There wouldn't be another engine.

It would be SQLite. Period.


This could have been used as an opportunity to standardize the SQLite3 DB file format.


I've never seen this as a problem, since plenty of random things are distributed as sqlite files. All the remaining questions for ODF would be about the schema design.


Just define the schema and the semantics of each column for each table.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: