Hacker News new | past | comments | ask | show | jobs | submit login

The issue is incremental updates, and this is where things get complex. If you have a file embedded in the middle of a zip file that is 100 bytes, and you want to resize it to 150 bytes, how do you do that?

You can’t squeeze it in without moving everything else about, which disrupts other readers. You could append it to the end maybe, but you need to handle concurrent writers. Compression also is an issue here - I expect zip compression is applied to multiple files at once, rather than per file? So now you might need to update multiple seemingly unrelated files.

You need to step down from the concept of a whole file as a unit and move towards pages of data that can be incrementally updated/reused/freed, where each page might contain one, many or even only a part of a “unit” (file/row/whatever)

This makes things more complex for sure




So basically ODF loads everything into memory, relies heavily on in-memory structs for quick unsaved updates, and is crash-safe by writing the whole zip to a temp location during saves. Kinda similar to MS Office. The file structure also helps a little. This is good enough for small docs.

Many times have I encountered large docs, often spreadsheets, that push the limits here and become noticeably slow. If you want to get more sophisticated with the indexing and paging, SQLite is a very natural path. Anything else would be reinventing the same wheels SQLite has spent decades refining.


> Anything else would be reinventing the same wheels SQLite has spent decades refining.

Which is exactly the problem. They (I.e one dude?), and they alone have spent decades refining a single implementation.

Before we go and lock the entirety of the worlds documents into what’s essentially a proprietary format specific to a single implementation of a single library written by a single dude… we should double check if that’s a good idea or not, and if we can, collectively, solve some of these issues without reimplementing the whole of SQLite.

Because that’s complex. Perhaps more complex than it needs to be for most applications, which would benefit from the storage part more than the query part. And then we are back at the start of our discussion?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: