In principle you could use Fibre Channel to connect a really large number of disks (2²⁴, IIRC) to a single server and then create a single ZFS pool using all of them. This lets you scale the _storage_ as high as you want.
But that still limits you to however many requests per second your single server can handle. You can scale that pretty high too, but probably not by a factor of 2²⁴.
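A quick back-of-envelope makes the asymmetry concrete (the per-disk capacity and per-server request ceiling below are my own assumed numbers, not from the thread):

    # Rough sketch; capacity and QPS figures are illustrative assumptions.
    disks = 2**24                      # ~16.8M devices addressable on the fabric
    tb_per_disk = 20                   # assumed capacity per disk
    print(f"raw capacity: ~{disks * tb_per_disk / 1e6:.0f} EB")   # grows with every disk added

    server_qps = 1_000_000             # assumed ceiling for the single head server
    print(f"request ceiling: ~{server_qps:,} req/s, regardless of disk count")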
> [rqlite and dqlite] are focused on increasing SQLite’s durability and availability through consensus and traditional replication. They are designed to scale across a set of stateful nodes that maintain connectivity to one another.
Little nitpick there: consensus anti-scales. You add more nodes and it gets slower. The rest of the section on rqlite and dqlite makes sense though, just not about "scale".
Hey Phil! Also, you're 100% right. I should use a different word than scale. I meant scale in the sense that they "scale" durability and availability, but obviously it sounds like I'm saying they scale performance.
I've changed the wording to "They are designed to keep a set of stateful nodes that maintain connectivity to one another in sync." Thank you!
I’ll nitpick you back: if done correctly, consensus groups can have quite a positive impact on tail latency. As the membership size gets bigger, the expected tail latency of the committing quorum goes down, assuming independence and any sort of fat-tailed distribution for individual participants.
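A rough Monte Carlo sketch of that effect (my own illustration, assuming i.i.d. Pareto-distributed per-replica response times and a majority quorum, so the commit latency is an order statistic):

    import random

    def commit_latency(n, alpha=1.5):
        # One consensus round: commit completes once a majority of the n members respond.
        samples = sorted(random.paretovariate(alpha) for _ in range(n))
        return samples[n // 2]            # the (n//2 + 1)-th fastest response

    def p99(n, rounds=100_000):
        lat = sorted(commit_latency(n) for _ in range(rounds))
        return lat[int(0.99 * rounds)]

    for n in (3, 5, 9, 17):
        print(n, round(p99(n), 2))        # tail commit latency tends to drop as n grows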
If folks would like to see more examples of databases built as a way to teach oneself, they get shared on the /r/databasedevelopment subreddit not infrequently.
Yes, reading this post (working around a database's concurrency control) made me raise an eyebrow. If you are OK with inconsistent data, then that's fine. Or if you handle consistency at a higher level, that's fine too. But if either of these is the case, why would you be going through DuckDB? You could write out Parquet files directly?
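For example, the "just write Parquet directly" path is a one-liner with pyarrow (file name and schema below are made up; this assumes consistency really is handled elsewhere):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Plain Parquet output with no database engine in between.
    table = pa.table({"account": ["alice", "bob"], "balance": [100, 250]})
    pq.write_table(table, "balances-2024-01-01.parquet", compression="zstd")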
> The purpose of this data structure is to provide persistent storage for key-value pairs (say to store account balances) such that a deterministic merkle root hash can be computed. The tree is balanced using a variant of the AVL algorithm so all operations are O(log(n)).
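Not the actual IAVL code, but a minimal sketch of the "deterministic Merkle root over key-value pairs" idea (AVL balancing omitted; keys are sorted so every node computes the same root):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(kv: dict) -> bytes:
        # Leaf = hash(key || value), in sorted key order for determinism.
        level = [h(k.encode() + b"\x00" + v.encode()) for k, v in sorted(kv.items())]
        if not level:
            return h(b"")
        while len(level) > 1:
            if len(level) % 2:                      # duplicate the last node on odd-sized levels
                level.append(level[-1])
            level = [h(a + b) for a, b in zip(level[::2], level[1::2])]
        return level[0]

    balances = {"alice": "100", "bob": "250"}
    print(merkle_root(balances).hex())              # changes if any single balance changes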
Integer Vector clock or Merkle hashes?
Why shouldn't you store account balances in git, for example?
Or, why shouldn't you append to Parquet, or to Feather with LZ4, for strongly consistent transactional data?
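Writing Feather with LZ4 via pyarrow is easy enough (a sketch with made-up data; the format gives you compression and fast IPC reads, not transactions -- that part you'd still have to layer on top):

    import pyarrow as pa
    import pyarrow.feather as feather

    table = pa.table({"account": ["alice", "bob"], "balance": [100, 250]})
    feather.write_feather(table, "ledger.feather", compression="lz4")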
Centralized databases can have Merkle hashes, too;
> Those systems index Parquet. Can they also index Feather IPC, which an application might already have to journal and/or log, and checkpoint?
DLT applications that need strong transactional consistency sign and synchronize block messages and transaction messages.
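For instance, the "sign transaction messages" part is just an ordinary public-key signature over the serialized message (sketch using the cryptography package; key handling and message format here are my own assumptions):

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    key = Ed25519PrivateKey.generate()
    tx = b'{"from": "alice", "to": "bob", "amount": 10}'
    signature = key.sign(tx)
    key.public_key().verify(signature, tx)   # raises InvalidSignature if tx was tampered with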
Public blockchains have average transaction times and costs.
Private blockchains also have TPS (Transactions Per Second) metrics, and unknown degrees of off-site redundancy for consistent storage, with or without indexes.
> An issue in this ongoing debate is whether a private system with verifiers tasked and authorized (permissioned) by a central authority should be considered a blockchain. [46][47][48][49][50] Proponents of permissioned or private chains argue that the term "blockchain" may be applied to any data structure that batches data into time-stamped blocks. These blockchains serve as a distributed version of multiversion concurrency control (MVCC) in databases. [51] Just as MVCC prevents two transactions from concurrently modifying a single object in a database, blockchains prevent two transactions from spending the same single output in a blockchain. [52]
> Opponents say that permissioned systems resemble traditional corporate databases, not supporting decentralized data verification, and that such systems are not hardened against operator tampering and revision. [46][48] Nikolai Hampton of Computerworld said that "many in-house blockchain solutions will be nothing more than cumbersome databases," and "without a clear security model, proprietary blockchains should be eyed with suspicion." [10][53]
Blockchains are a fantastic way to run things slowly ;-) More seriously: Making crypto fast does sound like a fun technical challenge, but well beyond what our finance/gov/cyber/ai etc customers want us to do.
For reference, our goal here is to run around 1 TB/s per server, and many times more on a beefier server. The same tech just landed at spot #3 on the Graph500 on its first try.
To go even bigger & faster, we are looking for ~PhD intern fellows to run on more than one server, if that's your thing: OSS GPU AI fellowship @ https://www.graphistry.com/careers
The Flight perspective aligns with what we're doing. We skip the DuckDB CPU indirections (why drink through a long twirly straw?) and go straight to Arrow on GPU RAM. For our other work, if DuckDB does give reasonable transactional guarantees here, that's interesting... hence my (in earnest) original question. AFAICT, the answers rest on operational details & docs that don't connect to how we normally talk about databases giving you consistent vs. inconsistent views of data.
Do you think that blockchain engineers are incapable of developing high-throughput distributed systems due to engineering incapacity, or due to real limits on how fast a strongly consistent, sufficiently secured cryptographic distributed system can be? Are blockchain devs all just idiots, or have they dumbly prioritized data integrity because that doesn't matter; it's all about big data these days and nobody needs CAP?
Bit rate > In data communications: https://en.wikipedia.org/wiki/Bit_rate#In_data_communications ; Gross and net bit rate, Information rate, Network throughput, Goodput
Gravitational Waves: Even though both light and gravitational waves were generated by this event, and they both travel at the same speed, the gravitational waves stopped arriving 1.7 seconds before the first light was seen ( https://bigthink.com/starts-with-a-bang/light-gravitational-... )
But people don't do computation with gravitational waves.
To have consensus about protocol revisions; To have data integrity and consensus about the merged sequence of data in database {rows, documents, named graphs, records}.