
It's a great starting insight, but again it's a small dataset (100 GB) that almost fits in memory, and I think many details are missing (for example, ClickBench publishes all configs and queries, plus a more detailed report, so vendors can reproduce/optimize/dispute the results).



> small dataset (100GB)

What counts as large or small definitely varies a lot depending on the context of the conversation/analysis.

MotherDuck's "Big Data is Dead" post [0] sticks in mind:

> The general feedback we got talking to folks in the industry was that 100 GB was the right order of magnitude for a data warehouse. This is where we focused a lot of our efforts in benchmarking.

Another point of reference is [1]:

> [...] Umbra achieves unprecedentedly low query latencies. On small data sets, it is even faster than interpreter engines like DuckDB

> TPC-H Small Dataset = 866k tuples, sf 0.1

[0] https://motherduck.com/blog/big-data-is-dead/

[1] https://db.in.tum.de/~kersten/Tidy%20Tuples%20and%20Flying%2...


> The general feedback we got talking to folks in the industry was that 100 GB

And then the user discovers that DuckDB is plagued with OOMs and dramatic performance degradation when their data is slightly larger than memory.
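
For anyone running into this, DuckDB does expose settings for capping memory and spilling to disk; whether that avoids the degradation described above is workload-dependent. A minimal sketch using the duckdb Python package (memory_limit, temp_directory, and threads are real DuckDB settings; the database name, spill path, and Parquet glob are placeholders):

```python
import duckdb

# Open an on-disk database so larger-than-memory operators can spill.
con = duckdb.connect("warehouse.db")

# Cap the memory DuckDB will use before it starts spilling to disk.
con.execute("SET memory_limit = '8GB'")

# Directory for temporary spill files used when operators exceed the limit.
con.execute("SET temp_directory = '/tmp/duckdb_spill'")

# Fewer threads can lower peak memory for large aggregations and joins.
con.execute("SET threads = 4")

# Example aggregation over a dataset that may not fit in memory.
result = con.execute("""
    SELECT l_returnflag, count(*), sum(l_extendedprice)
    FROM read_parquet('lineitem/*.parquet')
    GROUP BY l_returnflag
""").fetchall()
print(result)
```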



