It's a great starting insight, but again it's a small dataset (100 GB) that almost fits in memory, and I think many details are missing (for example, ClickBench publishes all configs and queries, plus a more detailed report, so vendors can reproduce/optimize/dispute the results).
What counts as large or small varies a lot depending on the context of the conversation/analysis.
MotherDuck's "Big Data is Dead" post [0] sticks in my mind:
> The general feedback we got talking to folks in the industry was that 100 GB was the right order of magnitude for a data warehouse. This is where we focused a lot of our efforts in benchmarking.
Another point of reference is [1]:
> [...] Umbra achieves unprecedentedly low query latencies. On small data sets, it is even faster than interpreter engines like DuckDB