Thank you all for your feedback. The benchmark was updated and the fastest tool is NOT written in Python. Here are the highlights:
* Added ClickHouse (written in C++) to the benchmark: I was unaware that the clickhouse-local tool would handle these tasks. ClickHouse is now the fastest (together with OctoSQL);
* OctoSQL (written in Go) was updated as a response to the benchmark: updates included switching to fastjson, short-circuiting LIMIT, and eagerly printing when outputting JSON and CSV. Now, OctoSQL is one of the fastest and memory is stable;
* SPyQL (written in Python) is now third: SPyQL leverages orjson (Rust) to parse JSONs, while the query engine is written in Python. When processing 1GB of input data, SPyQL takes 4x-5x more time than the best, while still achieving up to 2x higher performance than jq (written in C);
* I removed Pandas from the benchmark and focused on command-line tools. I am planning a separate benchmark on Python libs where Pandas, Polars and Modin (and eventually others) will be included.
* Added ClickHouse (written in C++) to the benchmark: I was unaware that the clickhouse-local tool would handle these tasks. ClickHouse is now the fastest (together with OctoSQL);
* OctoSQL (written in Go) was updated as a response to the benchmark: updates included switching to fastjson, short-circuiting LIMIT, and eagerly printing when outputting JSON and CSV. Now, OctoSQL is one of the fastest and memory is stable;
* SPyQL (written in Python) is now third: SPyQL leverages orjson (Rust) to parse JSONs, while the query engine is written in Python. When processing 1GB of input data, SPyQL takes 4x-5x more time than the best, while still achieving up to 2x higher performance than jq (written in C);
* I removed Pandas from the benchmark and focused on command-line tools. I am planning a separate benchmark on Python libs where Pandas, Polars and Modin (and eventually others) will be included.
This benchmark is a living document. If you are interested in receiving updates, please subscribe to the following issue: https://github.com/dcmoura/spyql/issues/72
Thank you!