
VFS shim that allows a SQLite database to be appended to another file - blacksqr
https://sqlite.org/src/file/ext/misc/appendvfs.c
======
rkeene2
For a bit of context, in the Tcl ecosystem there's the concept of "Starkits"
which are executable scripts which contain a stub loader which mounts a
virtual filesystem archive (which contains all the files the actual script
needs to run).

Typically this has been either a Metakit database or a Zip file, but they both
have disadvantages when compared to SQLite and we've been wanting to move to
SQLite for a while but it didn't support being appended to things, so we
couldn't. At the most recent Tcl Conference DRH agreed to fix this and with
this change it looks like he has done it. Expect an SQLite-backed Starkit soon
!

Also, and unrelated, Starkits can be converted to "Starpacks" which replace
the script-based stub loader with a native executable, so you can distribute
Tcl-based scripts as a single file.

~~~
oblio
Tcl has so many amazing ideas. I just wish it had higher adoption and
everything about it wouldn't feel so... old. Look at the wiki for example. It
should really be an actual wiki à la Wikipedia not some sort of strange
Wikipedia discussion page masquerading as documentation.

~~~
simias
Do you think it's an actually good language though? I only tinkered around
with Tcl (long ago when writing eggdrop bots and more recently while messing
around with FPGA and ASIC build scripts) and while it's serviceable it never
struck me as a particularly interesting or elegant scripting language. When
given the choice I'll always pick Python or Perl over Tcl. Am I wrong? What
did I miss?

~~~
eridius
I think Tcl is incredibly elegant. It's got a set of very simple syntax rules
that compose together to govern everything. You can use Tcl to create your own
keywords that are used exactly like the built-in ones and your caller won't
ever tell the difference (well, unless they inspect an error backtrace). For
example, a try/catch syntax was proposed as TIP 89¹. And many many years ago,
as part of the MacPorts project, I implemented the entirety of TIP 89 in pure
Tcl. The TIP was proposing actual Tcl keywords, but I implemented exactly the
proposed syntax and semantics without a single keyword. I have no idea if that
code exists in any form anymore (I haven't been involved with the project in
something like a decade), but I'm pretty sure the only observable difference
between that and TIP 89 is error backtraces² that cross a frame involving
try/catch.

Also, virtually every type³ in Tcl can be expressed as a string, and so you
can model the entire language mentally as if every value _was_ in fact a
string. If you wanted to write your own Tcl interpreter it would be pretty
easy to actually implement it this way, though obviously performance would
suffer quite a lot.

¹Later superseded by TIP 329 in response to the error options dictionary in
Tcl 8.5.

²And FWIW I believe there's no actual spec for what error backtraces look like
anyway, I think it's implementation-defined, though don't quote me on that.

³The only type I know of offhand that can't be expressed this way is an array
(which is distinct from a list, that is a fundamental building block of Tcl
and can absolutely be expressed as a string)

~~~
convolvatron
anyone interested in starting to play around with languages would do well to
write out the 'everything is a string' version. its very short and
understandable.

plus the result is really amenable to inclusion in another program as an
extension language (tcl is too, but it has a much bigger footprint)

------
regularfry
Heh, this reminds me of a trick I was working on right about the ruby 1.8->1.9
transition: import the tree of .rb files (including gems) into a sqlite
database, mess with the `require` mechanism to read from it instead of the
actual filesystem, and make a static build of ruby with sqlite itself, the
ruby code database packaged as a .text area, and any binary extensions all
linked up into a single file. Ta-da, single file static ruby apps and trivial
deployments: a truly useful thing back when Capistrano was the latest and
greatest.

Unfortunately, 1.9 changed the ruby build system enough that I couldn't quite
get it working, and I've not got round to looking at it since.

~~~
gandreani
That is very neat! I always thought about expanding that idea to most
programming languages. Most of the startup time is probably spend stating and
reading files.

Just from a quick experiment

    
    
        time python -c ''; time python -S -c ''
    
        real	0m0.079s
        user	0m0.021s
        sys 	0m0.030s
    
        real	0m0.017s
        user	0m0.007s
        sys 	0m0.007s
    

From the manpage:

    
    
        -S     Disable the import of the module site and the site-dependent manipulations of sys.path that it entails.
    
    

I also considered doing the same thing for a linux distro. Just out of pure
curiousity to what would happen to the boot up times. Wish I could have time
to try it one day :(

~~~
namibj
What about going all the way and using an unikernel (or something approaching
one)? Statically link suitable readonly data/code into a single global address
space (to make use of optimizations possible by combining binaries, as
apparent in e.g. busybox, and remove the overhead of code duplication and
dynamic linking, if possible even skipping some overhead with syscalls, by
allowing the compiler to reduce spilling registers to stack in an effort to
prevent clobbering), and just make sensitive and modifiable data unavailable
to or at least not writable by other/unprivileged processes.

This would probably require some modification to GCC or LLVM, if the latter
even supports building a normal Linux.

~~~
gandreani
Madness! Or actually really similar to an idea that Joe Armstrong presented in
Strange Loop one time (just the idea no demo or anything)

[https://www.youtube.com/watch?v=lKXe3HUG2l4](https://www.youtube.com/watch?v=lKXe3HUG2l4)

~~~
ZephyrP
There exists an Erlang unikernel (of sorts):
[http://erlangonxen.org/](http://erlangonxen.org/)

~~~
namibj
Well yeah, but that's like saying there exists hardware that eats java
bytecode. Yes, it does, but it brings _large_ restrictions with it.

------
torstenvl
It seems like the intended use-case would be for the database to be appended
after the executable that is using it. But writing to a file that's being
executed seems (to my naive view - I'm not terribly experienced) like it could
cause issues.

What am I missing?

~~~
yellowapple
I suspect the intent would be an extension of the idea of using SQLite as the
base for a custom binary file format. One use case might be to embed non-code
resources (graphics, audio, localization strings, etc.) such that they can be
easily queried and loaded via SQLite instead of some other method.

~~~
SQLite
I confirm your suspicions.

People have been appending ZIP archives to executables, in order to hold non-
code resources, for time out of mind. The appendvfs extension makes the same
thing possible for SQLite databases. An SQLite database has advantages over a
ZIP archive in that SQLite supports a far richer data model, is far faster for
random access, and has a query language. Both ZIP and SQLite are well-defined
and very widely deployed formats. Many developers are more familiar with ZIP,
but there are more than 1 trillion SQLite database files in circulation, and
SQLite is a recommended storage format according to the US Library of Congress
([https://www.sqlite.org/locrsf.html](https://www.sqlite.org/locrsf.html)) so
SQLite should not be dismissed as being too unfamiliar.

Example use cases:

(1) We have experimented with (but not published) putting all the SQLite
documentation into an SQLite database and appending it to a special webserver
app. Download the "EXE" and double-click on it and the SQLite docs
automatically pop up in your web browser. This is better than a pile-of-HTML-
files in that it can use server-side computing for things like the "Search".

(2) Not yet published or documented, but you can do "make sqltclsh" from the
SQLite source tarball and generate a TCL interpreter with SQLite built in. If
you also append an SQLite database to this interpreter, it reads its scripts
from the database. Use this to build stand-alone Tcl/Tk/SQLite applications.

(3) By 3rd-party user request: the Fossil version control system allows a
Fossil repository (which is just an SQLite database) to the end of the
"fossil.exe" binary. This is being used (I am told) to provide a rich package
of read-only but versioned content to non-technical users. The non-techies
just put the "document.exe" file on there windows desktop and double-click,
and a webserver pops up showing the reports they need, with complete
historical versioning provided by Fossil.

~~~
yellowapple
"Not yet published or documented, but you can do "make sqltclsh" from the
SQLite source tarball and generate a TCL interpreter with SQLite built in. If
you also append an SQLite database to this interpreter, it reads its scripts
from the database. Use this to build stand-alone Tcl/Tk/SQLite applications."

Holy shit. This will be a godsend. I've been searching for an easy and robust
way to package Tcl scripts into executables independent of a full-blown Tcl
installation (I've tried a couple different approaches, but they all tend to
have issues on some platform or other). This answers my prayers rather
thoroughly, provided it can be documented and made relatively easy to do.

------
mhd
So, resource fork 2.0?

------
phaedrus
I could use this! While converting someone's Win32 C program to C++, it struck
me that most of the magic numbers, window class names, arrays of GUI control
data etc. which are defined in code (in both windows.h and the user source
code) could be defined in a read-only sqlite database instead. And with this
link here, it wouldn't even have to be a separate file.

~~~
roel_v
On the user side, most of that can be stored in the resource section (for PE
format executables, so Windows only for all practical intents and purposes).
It has annoyed me many times that no such thing exist in Unix land (ELF,
basically), and instead I have to rely on 'bin2hex' style tools and store that
data in the data section.

~~~
dmitrygr
Elf has the concept of sections, and you can make as many of them as you want.
The standard gnu Linker also automatically creates labels that you can use to
get ahold of those sections. For example for a section titled XYZ, it'll
automatically create to extern variables named __xyz_start and @__xyz_end
sorry some such thing. You can place things into sections from source code, or
add them later using objcopy

~~~
roel_v
Well yes, let me rephrase then: it's a shame there is no standard API to
categorize and access data in other sections.

------
usermac
Anything SQLite I upvote. ^_^

------
pdw
I encourage everybody to check out sqlite's ext/misc directory. There are lots
of interesting extensions there, and most people don't know about it.

------
hexmiles
that's so cool, I don't understand if this is available in the standard
distribution. it seem so, but when i tried to use the command-line tool i get
"no such VFS", maybe i am missing something. Is there a tutorial/introduction
on how to use it? is exactly what i needed for a couple of project

edit: found! i need to use the '\--append' option

~~~
dangerbird2
VFSes just a c object created as part of the sqlite runtime and are registered
and activated ^1 with the c api. You can recompile the sqlite command line to
open with a given vfs object, or enable the URI filenames feature ^2 which
allows opening a file with a given vfs. For the appendvfs in th original post,
you probably need to use a new version of sqlite with it in the sourcecode

^1
[https://www.sqlite.org/c3ref/vfs_find.html](https://www.sqlite.org/c3ref/vfs_find.html)

^2 [https://www.sqlite.org/uri.html](https://www.sqlite.org/uri.html)

------
_pmf_
That's the kind of amazing emergent use cases that you get by sticking to
known, old technologies and not doing anything supposedly clever.

