I love SQLite, and Fossil is very cool, but I don't see the fundamental difference between Git adding another file in the .git directory, and Fossil adding another table or index in the SQLite database.
This is about having a single, unified interface for all operations. This is all explained in great detail by the SQLite project itself at https://sqlite.org/appfileformat.html
> This is about having a single, unified interface for all operations.
Unless you're intending to run joins on git data, where exactly do you see any fundamental difference between running CRUD operations via an SQL interface and just importing/exporting a file?
That's the whole point: git data is highly relational. Retrieving a commit alone is completely useless to you, just as retrieving any of the core objects alone is. Every operation you do requires retrieving multiple, interconnected objects... which SQL excels at.
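To make the "interconnected objects" point concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `commits` and `parents` tables are a made-up schema for illustration only; this is not Fossil's actual schema, nor how Git stores anything. It only shows that walking a commit's history is a single recursive query once the parent links live in a table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, hypothetical commit graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commits (id TEXT PRIMARY KEY, message TEXT NOT NULL);
        CREATE TABLE parents (
            child  TEXT NOT NULL REFERENCES commits(id),
            parent TEXT NOT NULL REFERENCES commits(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?)",
        [
            ("c1", "initial import"),
            ("c2", "add parser"),
            ("c3", "fix parser bug"),
            ("c4", "unrelated branch"),
        ],
    )
    # c1 <- c2 <- c3, and c4 branches off c1.
    conn.executemany(
        "INSERT INTO parents VALUES (?, ?)",
        [("c2", "c1"), ("c3", "c2"), ("c4", "c1")],
    )
    return conn


def ancestors(conn, commit_id):
    """Return (id, message) for commit_id and every commit reachable from it."""
    return conn.execute(
        """
        WITH RECURSIVE history(id) AS (
            SELECT ?
            UNION
            SELECT parents.parent
            FROM parents JOIN history ON parents.child = history.id
        )
        SELECT commits.id, commits.message
        FROM history JOIN commits ON commits.id = history.id
        ORDER BY commits.id
        """,
        (commit_id,),
    ).fetchall()


if __name__ == "__main__":
    for row in ancestors(build_demo_db(), "c3"):
        print(row)
```

Asking for the history of `c3` returns `c1`, `c2` and `c3` but not `c4`, without any application-side graph traversal.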
As usual with the SQLite / Fossil developer argumentation, it just seems very biased and far-fetched. Just one example:
> Pile-of-Files Formats. Sometimes the application state is stored as a hierarchy of files. Git is a prime example of this, though the phenomenon occurs frequently in one-off and bespoke applications. A pile-of-files format essentially uses the filesystem as a key/value database, storing small chunks of information into separate files. This gives the advantage of making the content more accessible to common utility programs such as text editors or "awk" or "grep". But even if many of the files in a pile-of-files format are easily readable, there are usually some files that have their own custom format (example: Git "Packfiles") and are hence "opaque blobs" that are not readable or writable without specialized tools. It is also much less convenient to move a pile-of-files from one place or machine to another, than it is to move a single file. And it is hard to make a pile-of-files document into an email attachment, for example. Finally, a pile-of-files format breaks the "document metaphor": there is no one file that a user can point to that is "the document".
More precisely:
> But even if many of the files in a pile-of-files format are easily readable, there are usually some files that have their own custom format (example: Git "Packfiles") and are hence "opaque blobs" that are not readable or writable without specialized tools.
What is advocated here is to transform the pile-of-files into a single SQLite database accessed through SQL queries. So instead of having only a few binary blobs, transform everything into one binary blob and force the use of one specialized tool for everything.
> It is also much less convenient to move a pile-of-files from one place or machine to another, than it is to move a single file.
This is not true.
> And it is hard to make a pile-of-files document into an email attachment, for example.
I would not trust someone that had just sent his git repo over email.
> Finally, a pile-of-files format breaks the "document metaphor": there is no one file that a user can point to that is "the document".
A VCS will track source files. Maybe their argument is true for other applications, but for a VCS this is plain useless.
Indeed having only an SQL connector accessing a database is a unified interface to the file. But unifying this to the user means that you have to move the complexity further down, as explained:
> But an SQLite database is not limited to a simple key/value structure like a pile-of-files database. An SQLite database can have dozens or hundreds or thousands of different tables, with dozens or hundreds or thousands of fields per table, each with different datatypes and constraints and particular meanings, all cross-referencing each other, appropriately and automatically indexed for rapid retrieval, and all stored efficiently and compactly in a single disk file. And all of this structure is succinctly documented for humans by the SQL schema.
Yeah, and I don't want to have this complexity managed by a single "entity", I want to have several different tools available to do whichever kind of work I need to do. If I'm working on graphs and need to store them, I would prefer having the ability to read my file directly in my other tools for graph analysis / debugging without having to take the intermediate step of connecting to the SQL database, or redefining a way to work with the SQL paradigm to adapt my file format to the "dozens or hundreds or thousands of different tables, fields per table, each with different datatypes".
This point is even more salient regarding grep / awk. The author obviously prefers using the query language of his choice and disregards the variety of tools to work on text, but there are many, many tools available to do all kinds of work on it, and believing that
> An SQLite database file is not an opaque blob. It is true that command-line tools such as text editors or "grep" or "awk" are not useful on an SQLite database, but the SQL query language is a much more powerful and convenient way for examining the content, so the inability to use "grep" and "awk" and the like is not seen as a loss.
is just nonsense. Leaving aside file editing, which is conveniently swept under the rug, querying the text is usually only the beginning: people then want to parse the output and act upon it, maybe even write back some modified version (sed), and so on.
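As a rough illustration of that "query, then modify and write back" workflow on a pile of plain text files, here is a small Python sketch that does what a `grep` plus `sed -i` pipeline would do. The function name, the `*.txt` filter and the directory layout are all assumptions made up for the example.

```python
import re
from pathlib import Path


def rewrite_matching_lines(root, pattern, replacement):
    """Apply a regex substitution to every *.txt file under root.

    Returns the names of the files that were actually changed, in sorted
    path order. Files without a match are left untouched on disk.
    """
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(root).rglob("*.txt")):
        original = path.read_text(encoding="utf-8")
        updated = regex.sub(replacement, original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Any other text tool can then read the same files directly, with no export step in between.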
The author just seems close-minded and living in his own world, unable to imagine that other people might want to work differently.
This reminds me a lot of his rant against git and for fossil, with the exact same bad faith arguments and lack of knowledge about other ways to do things.
Your points are valid, especially considering that this page is explaining the benefits (for the author) of using SQLite as a generic application file format; however we're only talking about git here, and my usage of git is limited to "git some-command", sometimes "git some-command | grep foobar", and most of the time I'm in a GUI anyway. I'm not grepping the git objects directly, so whether I use a git subcommand or an SQL subcommand won't make any difference to me. The real advantages of using SQL subcommands for me are:
- I could probably plug that into something else with more ease than something that is git-specific
- I have more flexibility for querying out of the box, without learning the specifics of each subcommand. The full SQL language is there at my disposal for outputting exactly what I need
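For instance, an ad-hoc question like "who made the most commits mentioning a given word?" becomes one query instead of a pipeline of subcommands. The sketch below uses Python's standard `sqlite3` module against a hypothetical `commits(id, author, message)` table; the table and its columns are invented for the example and do not reflect any real tool's schema.

```python
import sqlite3


def commits_per_author(conn, keyword):
    """Count commits per author whose message contains keyword."""
    return conn.execute(
        """
        SELECT author, COUNT(*) AS n
        FROM commits
        WHERE message LIKE '%' || ? || '%'
        GROUP BY author
        ORDER BY n DESC, author
        """,
        (keyword,),
    ).fetchall()


def demo():
    """Build a tiny in-memory dataset and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commits (id TEXT PRIMARY KEY, author TEXT, message TEXT)"
    )
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?)",
        [
            ("c1", "alice", "initial import"),
            ("c2", "bob", "fix parser"),
            ("c3", "alice", "fix typo"),
            ("c4", "alice", "add docs"),
        ],
    )
    return commits_per_author(conn, "fix")


if __name__ == "__main__":
    print(demo())
```

The same connection could just as well be opened from any other language with an SQLite binding, which is the "plug that into something else" advantage.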