Author here. I'm happy to answer any questions although this project was from 10+ years ago so I could be a little rusty.
Over the years I've been trying to find better ways to do this kind of visualization but for other CS topics. Moving to video is the most realistic option but using something like After Effects takes A LOT of time and energy for long-form visualizations. It also doesn't produce a readable output file format that could be shared, diff'd, & tweaked.
I spent some time on a project recently to build out an SVG-based video generation tool that can use a sidecar file for defining animations. It's still a work in progress but hopefully I can get it to a place where making this style of visualizations isn't so time intensive.
I just want you to know how much this visualization was appreciated. In my time working at AWS, I recommended this website to every one of our new hires to learn how distributed consensus works. Know that this has taught probably 50+ people. Thank you for what you’ve built.
Thanks so much for letting me know! It's always hard to tell when I put something out there if it just gets lost in the ether. I'm glad to hear it helped so many folks.
"Suppose Alice has hashgraph A and Bob hash hashgraph B. These hashgraphs may be slightly different at any given moment, but they will always be consistent. Consistent means that if A and B both contain event x, then they will both contain exactly the same set of ancestors for x, and will both contain exactly the same set of edges between those ancestors."
Consider UTXO-based events. There can be an event E1 that consumes UTXO1 and UTXO2, and an event E2 that consumes UTXO2 and UTXO3. Hashgraphs that contain only one of these events are each consistent, but their union is not. This can be used to do some Byzantine things; I can think of at least two: a double-spend and degradation of service.
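To make that concrete, here's a minimal sketch (event and UTXO names are hypothetical, not from the paper) of two views that are each internally valid but whose union double-spends UTXO2:

```python
# Each view is internally valid (no UTXO spent twice within it),
# but the union of the two views spends UTXO2 twice.

def spent_utxos(events):
    """Return the list of UTXOs consumed by a set of events."""
    spent = []
    for _, inputs in events:
        spent.extend(inputs)
    return spent

def is_valid(events):
    """A view is valid if no UTXO is consumed more than once."""
    spent = spent_utxos(events)
    return len(spent) == len(set(spent))

E1 = ("E1", ["UTXO1", "UTXO2"])
E2 = ("E2", ["UTXO2", "UTXO3"])

alice_view = [E1]      # hashgraph A contains only E1
bob_view = [E2]        # hashgraph B contains only E2
merged = [E1, E2]      # the union of the two views

print(is_valid(alice_view))  # True
print(is_valid(bob_view))    # True
print(is_valid(merged))      # False -- UTXO2 is spent twice
```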
This paper is a clear example of how to make a thing that has no obvious problems.
I haven't read that paper but it seems like it's fixing a different problem of Byzantine fault tolerance. Most consensus systems that are internal for an organization don't have the Byzantine issue so it simplifies the problem.
This is wonderful. Can I ask how you created it? Stack used and source code? I'd love to create something like this to help visualize things I'm working with currently.
It's all done with d3 and JavaScript. The visualizations aren't deterministic so I ended up writing a shitty Raft implementation in JS. Overall it was a terrible approach because it was so time consuming but I made it work. You can find all the source code in this repo: https://github.com/benbjohnson/thesecretlivesofdata
LiteFS author here. I don't disagree with any points in the article but perhaps a reframing could help. I previously wrote a tool called Litestream that would do disaster recovery for a single-node SQLite server and I still think it's a great default option for people starting new projects. Unless you're doing very database-specific things, most SQL will carry over between SQLite and Postgres and MySQL, especially if you add ORMs in the mix. Pick the one that gets you writing code the fastest and you can switch down the road if you need it.
Rather than a paradigm shift or hype, I see distributed SQLite as an extension of a path that devs can go down. With Litestream, the most common complaint I got was that devs were worried that they couldn't horizontally scale with SQLite and they'd be stuck. While you probably won't hit vertical scaling limits of SQLite on most projects, it still caused concern. So LiteFS became a "next step" that a dev could take if they ever got to that point. It doesn't need to be your starting point.
As for the "hacky" solution of txid, I'm not sure why that's hacky. Your application isn't required to use it or the optional built-in proxy but it's available if it fits your application's needs. It also works for plugging legacy applications into distributed SQLite without retrofitting the code. The proposed solution of caching seems orthogonal to the discussion of distributed application data. I don't think any database provider would suggest to avoid caching when it's appropriate but there's plenty of downsides of caching. Hell, it's one of the two hardest problems in computer science.
>most SQL will carry over between SQLite and Postgres and MySQL, especially if you add ORMs in the mix
I think this goes underappreciated, or rather the opposite is overstated.
Sure there are some edge cases that don't work the same, but most apps won't hit those.
My _biggest_ gripe with SQLite so far is the lack of column reordering like other DBs. And my simplistic understanding is that the others do it exactly the same way as you'd do it manually with SQLite - table gets _replaced_ with an identical table with the data correctly ordered and the data is shoved into the new table.
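For reference, a minimal sketch of that manual rebuild in SQLite, using Python's bundled driver (table and column names are made up; a real migration also has to recreate indexes, triggers, and views, and mind foreign keys):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT, email TEXT);
    INSERT INTO users (id, nickname, email) VALUES (1, 'ben', 'ben@example.com');

    -- "Reorder" by rebuilding: create a table with the desired column order,
    -- copy the rows across, then swap the tables.
    BEGIN;
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, email TEXT, nickname TEXT);
    INSERT INTO users_new (id, email, nickname)
        SELECT id, email, nickname FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
""")

print(con.execute("SELECT * FROM users").fetchall())
# [(1, 'ben@example.com', 'ben')]
```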
> Sure there are some edge cases that don't work the same, but most apps won't hit those.
That really depends on your modelling style. If you like things like types, SQL-side processing (e.g. using functions), or covering indexes, then you'll hit issues every five minutes in sqlite.
SQLite really wants the logic (including consistency logic) in the application, just compare the list of aggregate functions in postgres versus sqlite, or consider that you have to enable FKs on a per-connection basis.
Which I guess is why ORMs help a lot: they are generally built around application-side logic and a lowest-common-denominator set of database features.
I'm pretty sure SQLite has covering indexes. And the relatively new strict mode should enforce at least basic types (though if you want to enforce your own rules for things like dates you're still on your own).
I checked to be sure I had not missed it, and didn't find anything. You have expression indexes and partial (conditional) indexes, but no covering INCLUDE. Obviously you can kinda emulate it by adding the columns you want to cover to the key, but…
> though if you want to enforce your own rules for things like dates you're still on your own
That’s what I was talking about, having richer types, and the ability to create more (especially domains).
Strict tables provide the table stakes of actually enforcing the all-of-five types SQLite has built in. AFAIK a database-wide strict mode is something that's still being discussed, if it ever becomes reality.
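A small sketch of both points, strict typing and the per-connection foreign-key pragma, using Python's bundled SQLite (STRICT needs SQLite 3.37+; the schema is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Foreign keys are off by default and must be enabled per connection.
con.execute("PRAGMA foreign_keys = ON")

con.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY) STRICT;
    CREATE TABLE players (
        id        INTEGER PRIMARY KEY,
        team_id   INTEGER NOT NULL REFERENCES teams(id),
        jersey_no INTEGER NOT NULL
    ) STRICT;
    INSERT INTO teams (id) VALUES (1);
""")

# A STRICT table rejects values that can't be stored as the declared type...
try:
    con.execute("INSERT INTO players (team_id, jersey_no) VALUES (1, 'abc')")
except sqlite3.Error as e:
    print("datatype error:", e)

# ...and the pragma is what makes the FK actually enforced on this connection.
try:
    con.execute("INSERT INTO players (team_id, jersey_no) VALUES (999, 7)")
except sqlite3.Error as e:
    print("fk error:", e)
```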
Ya, if my reading is correct this is the poor man's covering index: if all the requested data is in the index key the query will not hit the table, so you can add additional fields at the end of the key to get index-only scans (at a storage cost, plus some flexibility cost, e.g. it doesn't work with unique indexes).
I guess it's less of an issue in sqlite than in databases with richer datatypes in the sense that all datatypes are ordered and thus indexable.
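A quick sketch of that key-padding trick with Python's sqlite3 (schema is made up); the query plan should show the read being satisfied from the index alone:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, season INTEGER, home TEXT, score INTEGER);
    -- "Covering" by padding the key: home and score are appended purely so
    -- the query below never has to touch the table itself.
    CREATE INDEX games_by_season ON games (season, home, score);
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT home, score FROM games WHERE season = 2023
""").fetchall()
print(plan)
# The detail column should read something like
# 'SEARCH games USING COVERING INDEX games_by_season (season=?)'.
```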
With a "proper" covering index (an INCLUDE clause in SQL Server or Postgres for example) you add data to the index value. This means it can be retrieved just by looking into the index but
- it's not constrained (e.g. to be orderable)
- it does not affect the behaviour of the index, so you can have covering data in a UNIQUE index, or in a PK constraint (although for the latter one might argue a clustered index is superior)
- it only takes space in leaf nodes, not interior nodes, so you get better occupancy of interior node pages, fewer pages to traverse during lookup, and better cache residency
- and finally the intent is clearer: when you put everything in the key it does not tell the reader what's what and why it's there, which makes it harder to evaluate changes
It's a workaround, but it bloats the interior pages of the index with the covering data, which increases the size of the index and makes lookups less efficient (they have to traverse more interior pages, and since there are more pages those are less likely to remain in cache).
I use SQLite in my personal projects, not professionally. I was wondering if you could elaborate on what you mean by 'consistency logic' in the context of SQLite.
One reason to reorder columns with SQLite is that if a column is usually null or has the default value, SQLite will not store the column at all if it is at the end of the row. It only saves a couple of bytes per column, but it is a reason to get these columns at the end.
"Missing values at the end of the record are filled in using the default value for the corresponding columns defined in the table schema."
If you have a table with 5 columns and you only insert the first 3 columns (based on create table column order) because the last 2 values are null or default, SQLite will only insert 3 type bytes in the header. However, if the first column (in create table order) is the one you omit, SQLite has to include its type byte, even if the value is null.
I think the bigger issue for many is that tooling, infra(provider), in-house knowledge/skill/experience as well as optimizations may differ quite a lot.
Of course, this will differ a lot between projects.
It is a fairly low-level abstraction, but one that does not require a verbose API. There is nothing error prone or hackish about what you have written; it will work for all inputs, it is just low level. You are just used to having other people write this code for you and give you a library. With newer versions of SQLite you could also write
`CAST(strftime('%Y', game_date) AS INTEGER)`
Which is somewhat higher level and less easily mistyped.
I agree it's less obviously correct, but I bet you could add the extension to SQLite if you feel strongly about it. As an aside, '%Y' is documented to only work in SQLite for years >= 0000 and <= 9999, so it would behave exactly the same as the code you wrote, especially because you already didn't have to worry about years less than 1000: the ISO 8601 format used for serializing dates in SQLite normalizes them with leading zeros.
For instance `select date(-50000000000, 'unixepoch');` returns `0385-07-25`.
Interestingly, %Y doesn't seem to handle negative dates either if you need to handle BC, so I guess that is one downside for both. This is one reason I sometimes prefer to use low-level code even when it is less obviously correct at a cursory glance, because abstractions may not mean what you think they mean, or even worse, may be lying to you. At least with low-level code I can reason about how it would behave under the edge cases I might care about.
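For anyone following along, a small sketch of the behaviours being discussed, run through Python's bundled SQLite (exact output can vary by SQLite version; the second result is taken from the comment above):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# The "higher level" year extraction mentioned above.
year = con.execute(
    "SELECT CAST(strftime('%Y', '2023-07-04') AS INTEGER)"
).fetchone()[0]
print(year)  # 2023

# Pre-1000 years come back zero-padded, so a substring-style approach
# would see '0385', not '385'.
print(con.execute("SELECT date(-50000000000, 'unixepoch')").fetchone()[0])
# '0385-07-25'

# Neither approach gives you BC dates: SQLite's date functions are only
# documented for 0000-01-01 through 9999-12-31.
```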
Someone I know worked at Lockheed and had every other Friday off and worked 9-5. Not quite a 4 day work week because half the smallish team needed to be on call Fridays.
A lot of government offices (and therefore their contractors) work '9 nines' to get every other Friday off, though in practice it really becomes '9 eights'.
Author here. The comparison was meant to be about how Postgres (or any client/server RDBMS) is typically deployed. Yes, you can deploy Postgres on the same machine but I wouldn't say it's common. Maybe I could have expanded more on that point or simply referenced client/server architecture rather than Postgres so it didn't seem like a straw man argument.
Author here. My goal in the comparison was only in terms of scope, not that Postgres folks should be penalized for having good documentation. I think Postgres is great and it makes sense to use it when it's called for. But I think it can be overkill for many projects.
Estimating the complexity of using a project can be really...complex. I think about systems I have used which make it easy to use a minimal set of features and where I don't have to reason about or be negatively impacted by aspects I do not benefit from, and other systems where things are less easily isolated and more challenging to reason about.
I do think the Postgres docs in particular seek to be a reference in addition to an operating manual and I for one really enjoy them. I think the point is well made that Postgres can be too much (or too much right now) for many projects.
Author here. The single-node restriction for Litestream was one of the main reasons we started LiteFS. There isn't a way to handle streaming backup from multiple nodes with Litestream & S3 as SQLite is a single-writer system and there aren't any coordination primitives available with S3.
I agree that many of the SQLite cloud offerings introduce the same network overhead. With LiteFS, the goal is to have the data on the application node so you can avoid the network latency for most requests. Writes still need to go to the primary so that's unavoidable but read requests can be served directly from the replica. The LiteFS HTTP proxy was introduced as an easy way to have LiteFS manage consistency transparently so you can get read-your-writes consistency on replicas and strict serializability on the primary. That level of consistency works for a lot of applications but if you need stronger guarantees then there's usually trade-offs to be made.
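To illustrate the idea (this is not LiteFS's actual implementation; the helper below is hypothetical and how you read the replica's applied TXID is deployment-specific), read-your-writes on a replica boils down to "remember the TXID of your last write and don't serve the read until the replica has replayed at least that far":

```python
import time

def current_txid() -> int:
    """Hypothetical helper: return the replication position (TXID) that this
    replica has applied so far."""
    raise NotImplementedError

def wait_for_txid(min_txid: int, timeout: float = 1.0) -> None:
    """Block until the replica has caught up to min_txid, or give up."""
    deadline = time.monotonic() + timeout
    while current_txid() < min_txid:
        if time.monotonic() > deadline:
            raise TimeoutError("replica is lagging behind the client's last write")
        time.sleep(0.01)

# Sketch of the request flow the proxy automates:
#  1. Writes go to the primary, which advances the TXID.
#  2. The client carries its last-seen TXID on later requests (e.g. a cookie).
#  3. Before serving a read, a replica calls wait_for_txid(client_txid),
#     which is what gives you read-your-writes on replicas.
```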
Author here. Cool to see the post make it up on HN again. I'm still as excited as ever about the SQLite space. So much great work going on from rqlite, cr-sqlite, & Turso, and we're still plugging away on LiteFS. I'm happy to answer any questions about the post.
Litestream definitely has a future. Our goal is to keep it as a simple single-node disaster recovery tool though so it won't see as much feature development as something like LiteFS. We've been focused a lot on LiteFS & LiteFS Cloud to get them in a good place but I'm looking forward to going back and updating Litestream more regularly.
Not much feature development is perfectly fine if it works! Things don't have to evolve.
Planning to use litestream as a library to dynamically swap in/out dozens of databases in a process. Looking at the code it'll easily allow that (super clean, kudos!).
So many thanks, it's going to enable a lot of new things!
SQLite has very little per-query overhead (as opposed to a database connection over a network) so I would think you could traverse a graph using multiple small queries rather than using a graph query language.
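As an illustration, a minimal BFS over an edges table that issues one small query per visited node (schema made up); with SQLite in-process, each query is roughly the cost of a function call rather than a network round trip:

```python
import sqlite3
from collections import deque

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT);
    CREATE INDEX edges_by_src ON edges (src);
    INSERT INTO edges VALUES ('a','b'), ('a','c'), ('b','d'), ('c','d'), ('d','e');
""")

def reachable(con, start):
    """Breadth-first traversal issuing one small query per visited node."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for (nxt,) in con.execute("SELECT dst FROM edges WHERE src = ?", (node,)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(reachable(con, "a"))  # {'a', 'b', 'c', 'd', 'e'}
```

SQLite's WITH RECURSIVE can also express this kind of traversal in a single statement if you'd rather keep it in SQL.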
Author here. My goal with the article was to write about a use of LiteFS that I found to be useful and to show the benefits and trade-offs. I don't think it's a general-purpose technique for everyone but I think it has its place in some cases. What was it about the post that oozed arrogance?