Edit: this also covers how to store SQLite in a Redis hash (rather than an on-disk file).
IIRC, the heart of the db engine is some sort of VM that processes the parsed SQL. Stepping through in a debugger will inevitably land you in the loop which executes the statements.
Of course, there are plenty of design docs, all on https://sqlite.org.
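You can see that VM's bytecode from any binding; a quick sketch using Python's sqlite3 module (EXPLAIN without QUERY PLAN dumps the compiled VDBE program for a statement):

```python
import sqlite3

# Every SQL statement compiles to a small program of VDBE opcodes;
# EXPLAIN prints that bytecode instead of running it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")

rows = conn.execute("EXPLAIN SELECT v FROM t WHERE k = 1").fetchall()
# Columns are: addr, opcode, p1, p2, p3, p4, p5, comment
for addr, opcode, *_ in rows:
    print(addr, opcode)
```

The loop you get trapped in while debugging is the interpreter that steps through exactly these opcodes.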
The internal design of DuckDB, however, is different and not directly related to SQLite. As you mentioned, it is entirely optimized for analytical workloads, whereas SQLite is optimized for point queries and point updates. The main things we share are that we also have a single-file storage format and offer an amalgamation (i.e. single-source-file compilation: duckdb.cpp and duckdb.hpp).
Not a dev so I'm a bit dumb with this stuff, but I was definitely looking for pivoting and such, which requires a lot of extra steps with SQLite.
Luckily, the SQLite source code is very well documented, and 'btreeInt.h' has a comment describing how the btree pointers are laid out:
I happen to use the SQLite btree directly, without the SQL layer (the VDBE), as a key-value store on top of a userspace filesystem (chrome cachefs), and even in this configuration it works wonderfully.
** The basic idea is that each page of the file contains N database
** entries and N+1 pointers to subpages.
** ----------------------------------------------------------------
** |  Ptr(0) | Key(0) | Ptr(1) | Key(1) | ... | Key(N-1) | Ptr(N) |
** ----------------------------------------------------------------
** All of the keys on the page that Ptr(0) points to have values less
** than Key(0). All of the keys on page Ptr(1) and its subpages have
** values greater than Key(0) and less than Key(1). All of the keys
** on Ptr(N) and its subpages have values greater than Key(N-1). And
** so forth.
** Finding a particular key requires reading O(log(M)) pages from the
** disk where M is the number of entries in the tree.
** In this implementation, a single file can hold one or more separate
** BTrees. Each BTree is identified by the index of its root page. The
** key and data for any entry are combined to form the "payload". A
** fixed amount of payload can be carried directly on the database
** page. If the payload is larger than the preset amount then surplus
** bytes are stored on overflow pages. The payload for an entry
** and the preceding pointer are combined to form a "Cell". Each
** page has a small header which contains the Ptr(N) pointer and other
** information such as the size of key and data.
After this, try it out with the SQL and see how it works.
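The Ptr/Key interleaving described above can be modeled as a toy search routine (a sketch only: `child_for` and the page numbers are made up, and real SQLite pages are packed binary cells, not Python lists):

```python
import bisect

# Toy model of an interior btree page: N keys and N+1 child pointers,
# laid out Ptr(0), Key(0), Ptr(1), Key(1), ..., Key(N-1), Ptr(N).
def child_for(keys, ptrs, search_key):
    """keys: sorted list of N keys; ptrs: list of N+1 child page numbers.
    Returns the child page to descend into for search_key."""
    assert len(ptrs) == len(keys) + 1
    # All keys under Ptr(i) are > Key(i-1) and < Key(i), so binary-search
    # for the first key >= search_key and descend into that slot.
    i = bisect.bisect_left(keys, search_key)
    return ptrs[i]

# Example: two keys partition the key space into three ranges.
keys = [10, 20]
ptrs = [101, 102, 103]            # hypothetical child page numbers
print(child_for(keys, ptrs, 5))   # below Key(0)    -> Ptr(0) = 101
print(child_for(keys, ptrs, 15))  # between keys    -> Ptr(1) = 102
print(child_for(keys, ptrs, 25))  # above Key(N-1)  -> Ptr(N) = 103
```

Repeating this descent once per level is what gives the O(log(M)) page reads mentioned above.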
Edit: just to add to this, this line here
> Each BTree is identified by the index of its root page
When you call 'sqliteBtreeCreateTable()' it returns the root page of the table it creates. In my experience it starts from 2, and my guess is that page 1 is probably some master root page (I bet Julia will dig into this later).
This is how I've managed to have "keyspaces" here. I use the first created table as a meta-table, keeping an index of keyspace names with their root-page-number counterparts, and on opening the db it's just a matter of recovering this info and putting it in a hash table (keyspace => page number) for later use.
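At the SQL level, those same root-page numbers are visible in the sqlite_master schema table, which is itself the btree rooted at page 1; a quick check with Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first (x)")
conn.execute("CREATE TABLE second (y)")

# sqlite_master (the schema table, rooted at page 1) records the
# root page number of every table and index in the file.
for name, rootpage in conn.execute(
        "SELECT name, rootpage FROM sqlite_master ORDER BY rootpage"):
    print(name, rootpage)
```

On a fresh database the first user table does indeed land on page 2, which matches the "starts from 2" observation: page 1 is taken by the schema btree.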
I don't know how SQLite scales for large data sets, but for small tasks it's fantastic, and can be trivially ported to something like Postgres.
> implementation details of the KV/heap layer with some meaningful transactional properties seem more interesting
There is undoubtedly a real beauty in LMDB's data structures, and it is impressive to see how Rockset have re-engineered RocksDB to become cloud-native …but my own feeling is that the SQL/whatever, distributed consistency and federation layers of a DBMS encompass some seriously hard and fascinating problems. What if the web could behave like a cohesive offline-first database for all of human knowledge after all?
Now I think about it, "interesting" has a pretty loose definition too!
SQLite is not as good at keeping big records on the server side; for that you go with PostgreSQL. And if you're Windows-only, then Access is faster for a local database. But if you want a good-enough local database that performs the same everywhere, then your only choice is SQLite.
> May you find forgiveness for yourself and forgive others.
> May you share freely, never taking more than you give.
If only more software engineers started out their projects with this mindset, the world would be a very different place.
Found the link from the following old HN discussion: https://news.ycombinator.com/item?id=5138866 - but had to use archive.org as the article is now a 404.
> If only more software engineers started out their projects with this mindset, the world would be a very different place.
Apparently this is a horrible thing to say?
"Please don't comment about the voting on comments. It never does any good, and it makes boring reading."
Or is only one agent at a time allowed to write to it?
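Effectively yes: SQLite allows many concurrent readers but only one write transaction at a time. A minimal sketch with Python's sqlite3 (the file path is arbitrary; isolation_level=None just turns off Python's implicit transaction handling so the explicit BEGINs go straight through):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

w1 = sqlite3.connect(path, isolation_level=None)
w1.execute("CREATE TABLE t (x)")

w1.execute("BEGIN IMMEDIATE")           # w1 takes the write lock
w1.execute("INSERT INTO t VALUES (1)")

w2 = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    w2.execute("BEGIN IMMEDIATE")       # second writer is refused
except sqlite3.OperationalError as e:
    print("second writer blocked:", e)  # "database is locked"

w1.execute("COMMIT")                    # lock released; w2 can write now
w2.execute("BEGIN IMMEDIATE")
w2.execute("INSERT INTO t VALUES (2)")
w2.execute("COMMIT")
```

With a nonzero timeout the second writer would simply wait for the lock instead of erroring out.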
That night I met Julia (she and a friend were, if I remember correctly, trying to boot a handwritten kernel in a VM, and figured out around 11pm that only the first 300 bytes were getting loaded into memory).
I met Andrew Kelley, who had recently written an NES emulator that translated the code to LLVM (if I remember correctly).
I briefly met Filippo Valsorda. I think he was the person who, after I said I wanted to make a bitcoin arbitrage bot, immediately came up with putting the prices into a matrix and then computing optimal paths.
The place just seemed... magical. I have a family and kids in Canada, but I thought a long time about trying to reorient my life to NYC for a bit. It was just so cool.
It reminded me of the joy of learning for the sake of learning, hacking on stuff because it feels like a magic superpower to bring ideas to life, and the joy of building things you have no intention of trying to make money off of.
I'm certain none of those people remember me but I remember the evening fondly whenever I see any of them show up on HN. Cheers :)
I applied for an internship there a couple years ago. After writing an essay on why I want to do RC, and an interview where, IIRC, I shared my camera but my interviewers did not, I was rejected with a boilerplate "not for us" explanation, and that was that.
I would've felt much better about it if they had at least invited me to visit, but this experience gave it a very corporate and impersonal feel.
I think, perhaps arrogantly, that RC would have benefited from my pollination.
We'll both be fine on our own, too.
RC has excellent marketing. There needs to be more places like it rather than everyone thinking that there can only be one place that everyone tries to get into.
Why there aren't any workplaces like the Recurse Center: no company will give you the total freedom to pursue your own programming interests 100% of the time, especially if they conflict with the larger goals of the company.
What do you mean by this?
RC is a company (a YC company actually) that leases expensive NYC real estate and bears all the other costs of running a company while providing its service for free. Unfortunately, it's not a hangout where anyone can join and thus they have to limit space.
As for "if you are not good enough"... all the tech interview questions (at least in 2016) are easier or simpler than real job interview questions. The important parts are the soft skills questions. I "studied" with PhDs, bootcamp grads, and college students. And personally, I'm a pretty meh developer. But just like any company, they don't always give the best feedback on rejections.
For what it's worth, RC has struggled with the question of whether to give personalized feedback to applicants who don't get in. They did it for a while, but stopped for a number of reasons. They explain it here: https://www.recurse.com/feedback
I was rejected the first time I applied to do a batch at RC. I applied again a few months later and was accepted.
Most people don't have the skills of Julia and the other people the original post mentioned. They're honestly the 1% of RC's over 1k (I think!) students over the years. Most of us have little projects that we work on for 12 weeks in a great environment, or learn new things. Most of us are normal people who are lucky enough to be able to study programming for 12 weeks in NYC.