
Blogspot = expert

Geocities = deity

Communication is tough. :)

My intent was definitely to focus on the idea that you should not feel limited by your title, while also being a good, upstanding member of your org.


As far as I'm aware ZFS does not scale out.

https://unix.stackexchange.com/a/99218


Yea, it wasn’t designed to scale that way.

In principle you could use Fibre Channel to connect a really large number (2²⁴ iirc) of disks to a single server and then create a single ZFS pool using all of them. This lets you scale the _storage_ as high as you want.

But that still limits you to however many requests per second that your single server can handle. You can scale that pretty high too, but probably not by a factor of 2²⁴.
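Back-of-the-envelope (a sketch only; the per-disk capacity and IOPS figures below are illustrative assumptions, not ZFS or Fibre Channel specifics):

    # Storage scales with disk count; request throughput is capped by the
    # single head server. All per-disk numbers are assumptions.
    DEVICES = 2**24                 # upper bound on attached disks, per above
    TB_PER_DISK = 20                # assumed capacity per disk
    IOPS_PER_DISK = 200             # assumed random-read IOPS per spinning disk
    SERVER_IOPS_LIMIT = 1_000_000   # assumed request ceiling for one server

    print(f"raw capacity: {DEVICES * TB_PER_DISK / 1e6:.0f} EB")
    print(f"aggregate disk IOPS: {DEVICES * IOPS_PER_DISK:.2e}")
    print(f"single-server ceiling: {SERVER_IOPS_LIMIT:.2e}")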


> [rqlite and dqlite] are focused on increasing SQLite’s durability and availability through consensus and traditional replication. They are designed to scale across a set of stateful nodes that maintain connectivity to one another.

Little nitpick there, consensus anti-scales. You add more nodes and it gets slower. The rest of the section on rqlite and dqlite makes sense though, just not about "scale".


Hey Phil! Also you're 100% right. I should use a different word than scale. I meant scale in the sense that they "scale" durability and availability. But obviously it sounds like I'm saying they're scaling performance.

I've changed the wording to "They are designed to keep a set of stateful nodes that maintain connectivity to one another in sync.". Thank you!


Thank you!


I’ll nitpick you back: if done correctly, consensus groups can have quite a positive impact on tail latency. As the membership size gets bigger, the expectation on the tail latency of the committing quorum goes down, assuming independence and any sort of fat-tailed distribution for individual participants.
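A toy model of that tail-latency claim (a sketch, not a benchmark; the Pareto distribution and the group sizes are assumptions): each replica's ack latency is i.i.d. fat-tailed, a commit waits for a majority quorum, and the p99 of that quorum latency tightens as the group grows.

    # Per-replica ack latency ~ i.i.d. Pareto (fat-tailed, an assumption).
    # A commit completes at the k-th fastest ack, k = n // 2 + 1 (majority).
    import random

    def p99_commit_latency(n, trials=20_000, alpha=1.5):
        lat = []
        for _ in range(trials):
            acks = sorted(random.paretovariate(alpha) for _ in range(n))
            lat.append(acks[n // 2])   # (n//2 + 1)-th fastest ack, 0-indexed
        lat.sort()
        return lat[int(0.99 * trials)]

    for n in (3, 5, 9, 17, 33):
        print(f"n={n:>2}  p99 commit latency ≈ {p99_commit_latency(n):.2f}")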



dude literally has a BUNCH of repos I want to explore, now.



Here's my list (probably not all of them)

https://github.com/FireScroll/FireScroll/

https://github.com/danthegoodman1/DurableStreams (brand new)

https://github.com/danthegoodman1/ObjectKV

https://github.com/danthegoodman1/WriteAhead

https://github.com/danthegoodman1/icedb (most popular)

https://github.com/danthegoodman1/Percolators (kind of a DB on top of a DB, technically just transactions tho)

there are others that might not fit exactly what you're looking for


Wow, what great resources. Thanks for sharing bud :)


Nice, thanks! I didn't know there was a subreddit for that.


Hi Phil, I thought I'd find you here! Love your blog!


Glad to hear it. :)


I thought this was going to be about querying a Postgres table with a graph query language (SQL/PGQ).

Indeed this is coming to Postgres eventually.

https://www.postgresql.org/message-id/flat/a855795d-e697-4fa...

https://ashutoshpg.blogspot.com/2024/04/dbaag-with-sqlpgq.ht...


Like apache age (postgres with cypher support) https://age.apache.org/


What I'm talking about is going to be committed into Postgres.


Postgraphile[1] does this via GraphQL, or have I misunderstood what you're referring to?

1: https://www.graphile.org/postgraphile


Nope I'm talking about something getting committed to Postgres itself.


Ah I see, yes it'd be pretty awesome to have Postgres be even more of a Swiss army knife of databases.


Yes, reading this post (working around a database's concurrency control) made me raise an eyebrow. If you are ok with inconsistent data then that's fine. Or if you handle consistency at a higher level, that's fine too. But if either of these is the case, why would you be going through DuckDB? You could write out Parquet files directly.
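i.e. something like the following (a minimal pyarrow sketch; the schema and file name are made up for illustration):

    # Bypass the database: build an Arrow table and write a Parquet file.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({
        "event_id": [1, 2, 3],
        "payload": ["a", "b", "c"],
    })
    # One immutable file per batch; any consistency guarantees now come from
    # the writer/reader protocol around these files, not from an engine.
    pq.write_table(table, "batch-00001.parquet")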


cosmos/iavl is a Merkleized AVL tree.

https://github.com/cosmos/iavl :

> Merkleized IAVL+ Tree implementation in Go

> The purpose of this data structure is to provide persistent storage for key-value pairs (say to store account balances) such that a deterministic merkle root hash can be computed. The tree is balanced using a variant of the AVL algorithm so all operations are O(log(n)).
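The general shape of that, independent of the AVL balancing, is a deterministic hash over ordered key-value pairs; a minimal sketch (the node encoding here is illustrative, not iavl's actual hashing scheme):

    # Deterministic Merkle root over sorted key/value pairs.
    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(kv: dict[bytes, bytes]) -> bytes:
        leaves = [h(b"leaf" + k + v) for k, v in sorted(kv.items())]
        if not leaves:
            return h(b"empty")
        while len(leaves) > 1:
            if len(leaves) % 2:                 # duplicate last node on odd levels
                leaves.append(leaves[-1])
            leaves = [h(b"node" + leaves[i] + leaves[i + 1])
                      for i in range(0, len(leaves), 2)]
        return leaves[0]

    print(merkle_root({b"alice": b"100", b"bob": b"42"}).hex())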

Integer Vector clock or Merkle hashes?

Why shouldn't you store account balances in git, for example?

Or, why shouldn't you append to Parquet or Feather and LZ4 for strongly consistent transactional data?

Centralized databases can have Merkle hashes, too;

"How Postgres stores data on disk" https://news.ycombinator.com/item?id=41163785 :

> Those systems index Parquet. Can they also index Feather IPC, which an application might already have to journal and/or log, and checkpoint?

DLT applications for strong transactional consistency sign and synchronize block messages and transaction messages.

Public blockchains have average transaction times and costs.

Private blockchains also have TPS (Transactions Per Second) metrics, and unknown degrees of off-site redundancy for consistent storage with or without indexes.

Blockchain#Openness: https://en.wikipedia.org/wiki/Blockchain#Openness :

> An issue in this ongoing debate is whether a private system with verifiers tasked and authorized (permissioned) by a central authority should be considered a blockchain. [46][47][48][49][50] Proponents of permissioned or private chains argue that the term "blockchain" may be applied to any data structure that batches data into time-stamped blocks. These blockchains serve as a distributed version of multiversion concurrency control (MVCC) in databases. [51] Just as MVCC prevents two transactions from concurrently modifying a single object in a database, blockchains prevent two transactions from spending the same single output in a blockchain. [52]

> Opponents say that permissioned systems resemble traditional corporate databases, not supporting decentralized data verification, and that such systems are not hardened against operator tampering and revision. [46][48] Nikolai Hampton of Computerworld said that "many in-house blockchain solutions will be nothing more than cumbersome databases," and "without a clear security model, proprietary blockchains should be eyed with suspicion." [10][53]

Merkle Town: https://news.ycombinator.com/item?id=38829274 :

> How CT works > "How CT fits into the wider Web PKI ecosystem": https://certificate.transparency.dev/howctworks/

From "PostgreSQL Support for Certificate Transparency Logs Now Available" https://news.ycombinator.com/item?id=42628223 :

> Are there Merkle hashes between the rows in the PostgreSQL CT store like there are in the Trillian CT store?

> Sigstore Rekor also has centralized Merkle hashes.


I think you replied in the wrong post.


No, I just explained how the world does strongly consistent distributed databases for transactional data, which is the exact question here.

DuckDB does not yet handle strong consistency. Blockchains and SQL databases do.


Blockchains are a fantastic way to run things slowly ;-) More seriously: Making crypto fast does sound like a fun technical challenge, but well beyond what our finance/gov/cyber/ai etc customers want us to do.

For reference, our goal here is to run around 1 TB/s per server, and many times more on a beefier server. The same tech just landed at spot #3 on the Graph 500 on its first try.

To go even bigger & faster, we are looking for ~phd intern fellows to run on more than one server, if that's your thing: OSS GPU AI fellowship @ https://www.graphistry.com/careers

The flight perspective aligns with what we're doing. We skip the duckdb CPU indirections (why drink through a long twirly straw?) and go straight to arrow on GPU RAM. For our other work, if duckdb does give reasonable transactional guarantees here, that's interesting... hence my (in earnest) original question. AFAICT, the answers rest on operational practices & docs that don't connect to how we normally talk about databases giving you consistent vs. inconsistent views of data.


Do you think that blockchain engineers are incapable of developing high-throughput distributed systems due to engineering incapacity, or due to real limits on how fast a strongly consistent, sufficiently secured cryptographic distributed system can be? Are blockchain devs all just idiots, or have they dumbly prioritized data integrity because that doesn't matter, it's all about big data these days, nobody needs CAP?

From "Rediscovering Transaction Processing from History and First Principles" https://news.ycombinator.com/item?id=41064634 :

> metrics: Real-Time TPS (tx/s), Max Recorded TPS (tx/s), Max Theoretical TPS (tx/s), Block Time (s), Finality (s)

> Other metrics: FLOPS, FLOPS/WHr, TOPS, TOPS/WHr, $/OPS/WHr

TB/s in query processing of data already in RAM?

/? TB/s "hnlog"

- https://news.ycombinator.com/item?id=40423020 , [...] :

> The HBM3E Wikipedia article says 1.2TB/s.

> Latest PCIe 7 x16 says 512 GB/s:

fiber optics: 301 TB/s (2024-05)

Cerebras: https://en.wikipedia.org/wiki/Cerebras :

WSE-2 on-chip SRAM memory bandwidth: 20 PB/s / 220 PB/s

WSE-3: 21 PB/s

HBM > Technology: https://en.wikipedia.org/wiki/High_Bandwidth_Memory#Technolo... :

HBM3E: 9.8 Gbit/s , 1229 Gbyte/s (2023)

HBM4: 6.4 Gbit/s , 1638 Gbyte/s (2026)

LPDDR SDRAM > Generations: https://en.wikipedia.org/wiki/LPDDR#Generations :

LPDDR5X: 1,066.63 MB/s (2021)

GDDR7: https://en.m.wikipedia.org/wiki/GDDR7_SDRAM

GDDR7: 32 Gbps/pin - 48 Gbps/pin,[11] and chip capacities up to 64 Gbit, 192 GB/s

List of interface bit rates: https://en.wikipedia.org/wiki/List_of_interface_bit_rates :

PCIe7 x16: 1.936 Tbit/s 242 GB/s (2025)

800GBASE-X: 800 Gbps (2024)

DDR5-8800: 70.4 GB/s

Bit rate > In data communications: https://en.wikipedia.org/wiki/Bit_rate#In_data_communications ; Gross and Net bit rate, Information rate, Network throughput, Goodput

Re: TPUs, NPUs, TOPS: https://news.ycombinator.com/item?id=42318274 :

> How many TOPS/W and TFLOPS/W? (T [Float] Operations Per Second per Watt (hour ?))*

Top 500 > Green 500: https://www.top500.org/lists/green500/2024/11/ :

PFlop/s (Rmax)

Power (kW)

GFlops/watts (Energy Efficiency)

Performance per watt > FLOPS/watts: https://en.wikipedia.org/wiki/Performance_per_watt#FLOPS_per...

Electrons: 50%–99% of c, the speed of light ( Speed of electricity: https://en.wikipedia.org/wiki/Speed_of_electricity , Velocity factor of a CAT-7 cable: https://en.wikipedia.org/wiki/Velocity_factor#Typical_veloci... )

Photons: c (*)

Gravitational Waves: Even though both light and gravitational waves were generated by this event, and they both travel at the same speed, the gravitational waves stopped arriving 1.7 seconds before the first light was seen ( https://bigthink.com/starts-with-a-bang/light-gravitational-... )

But people don't do computation with gravitational waves.


To a reasonable rounding error... yes


How would you recommend that appends to Parquet files be distributedly synchronized with zero trust?

Raft, Paxos, BFT, ... /? hnlog paxos ... this about "50 years later, is two-phase locking the best we can do?" https://news.ycombinator.com/item?id=37712506

To have consensus about protocol revisions; to have data integrity and consensus about the merged sequence of data in database {rows, documents, named graphs, records}.


Do you turn on SQLite checksumming, or how else do you feel comfortable that data on disk keeps its integrity?


I work on PGD but I'm pretty sure we actively support open-source options for HA:

https://github.com/EnterpriseDB/repmgr


Repmgr is handy, but it is not a replication system per se; rather, it makes managing a bunch of other moving parts easier.

Personally I'd like BDR to be in the main tree, or something else equivalent to Galera:

* https://galeracluster.com

It's not perfect [1], but it'll get you a good way for many use cases.

[1] https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster

