
We let the generator code hardcode the weights into the generated source.
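
For context, a toy sketch of the idea in Python (the weights and the emitted function are made up for illustration, not our actual generator):

    # Toy generator that hardcodes weights into the emitted source.
    # The weights and predict() are hypothetical stand-ins.
    weights = [0.12, -0.5, 0.33]  # in reality, loaded from a trained model

    generated_source = f'''\
    # Auto-generated file; weights were baked in at generation time.
    WEIGHTS = {weights!r}

    def predict(x):
        return sum(w * xi for w, xi in zip(WEIGHTS, x))
    '''

    with open("generated_model.py", "w") as f:
        f.write(generated_source)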

GPU availability affects performance significantly, by as much as 20X. This project is only intended for cases where a GPU is not available or not desired due to cost or other constraints.


This is definitely something I intend to fix.

My initial intent was to use DuckDB for fast vectorized query execution, but I wasn't able to create a planner / execution hook that uses DuckDB internally. Will definitely check out pg_analytics / DataFusion to see if the same can be integrated here as well. Thanks for the pointers.
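
For anyone curious, this is roughly the vectorized execution I was after, shown standalone in Python (the Parquet file name and columns are assumptions):

    # Sketch: DuckDB's vectorized engine querying an exported Parquet
    # file directly, outside of Postgres.
    import duckdb

    con = duckdb.connect()
    rows = con.execute(
        "SELECT category, count(*) AS n "
        "FROM 'exported_table.parquet' GROUP BY category"
    ).fetchall()
    print(rows)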


Have you seen duckdb_fdw (https://github.com/alitrack/duckdb_fdw)? IIRC it's built based on sqlite_fdw, but points the outbound queries to DuckDB instead of SQLite, and it does handle running aggregations inside of DuckDB. Could be useful.
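
If it follows the sqlite_fdw conventions, setup would look roughly like this (the option names, paths, and table definition are my assumptions; check the README):

    # Rough sketch of wiring up duckdb_fdw via psycopg2, assuming it
    # keeps sqlite_fdw-style options ('database', 'table').
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")
    conn.autocommit = True
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS duckdb_fdw")
    cur.execute("""
        CREATE SERVER IF NOT EXISTS duckdb_srv
        FOREIGN DATA WRAPPER duckdb_fdw
        OPTIONS (database '/tmp/analytics.duckdb')
    """)
    cur.execute("""
        CREATE FOREIGN TABLE IF NOT EXISTS events_cold (
            id bigint,
            category text
        ) SERVER duckdb_srv OPTIONS (table 'events')
    """)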


This is great, thanks so much! I'll see if I can integrate this and how it compares to parquet_fdw.


I looked into pg_analytics and some other solutions, like Citus, before working on pg_analytica.

The key difference is that solutions like pg_analytics completely swap out the native Postgres row-based storage for columnar storage.

Not using the default Postgres storage engine means you miss out on a lot of battle-tested functionality like updating existing rows, deleting rows, transactional updates, etc. Columnar stores are not suited for transactions, updates, and deletes.

pg_analytica retains the existing Postgres storage and only exports a time-delayed version of the table in columnar format. This way developers get the benefits of transactional storage and fast analytics queries.
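
The pattern, sketched standalone with DuckDB's postgres extension (an illustration of the approach, not pg_analytica's actual internals; the connection string and names are assumptions):

    # Snapshot a Postgres row-store table into Parquet, then run
    # analytics against the columnar copy while the live table keeps
    # serving transactions.
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL postgres")
    con.execute("LOAD postgres")
    con.execute("ATTACH 'dbname=mydb host=localhost' AS pg (TYPE postgres, READ_ONLY)")

    # Full-table export into a columnar file.
    con.execute("COPY (SELECT * FROM pg.public.events) TO 'events.parquet' (FORMAT parquet)")

    # Analytics queries hit the time-delayed snapshot.
    print(con.execute("SELECT count(*) FROM 'events.parquet'").fetchone())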


Hey Fabian, there are indeed many trade-offs in making this possible.

The extension periodically schedules export runs for a table, exporting all the data each time (I know this is expensive, but I haven't found an alternative in Postgres yet, like a change pump for getting writes after time T). The export frequency can be configured by the user.

Regarding how long the delay will be, that depends on the export frequency and the machine running the Postgres instance. A fast machine could complete an entire table export every hour or so, resulting in data that is decently fresh. Slower / more resource-constrained machines would have to schedule exports less frequently.
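
Back-of-the-envelope, the staleness bound works out like this (all numbers made up):

    # Toy staleness math: with a full export every interval, query
    # results can lag the live table by up to interval + export time.
    export_interval_s = 3600   # hourly exports on a fast machine
    export_duration_s = 600    # one full-table export takes ~10 min

    worst_case_staleness_s = export_interval_s + export_duration_s
    print(f"worst-case staleness: ~{worst_case_staleness_s / 60:.0f} minutes")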


This lecture series on columnar storage formats and querying them is great. https://youtu.be/1hdynBJo3ew?si=5KfT_2qpUFQmy_uL


Thank you!


Introducing the pg_analytica extension for Postgres.

Tables can be exported to columnar format via this extension to massively speed up your analytics queries. Exported data is periodically refreshed to keep your query results fresh.


Would need some benchmarks against indexed data.


Regular indexes are not that useful for this kind of query as you need to read the entire table in most cases.
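
Easy to verify with EXPLAIN (the table, columns, and connection string here are hypothetical):

    # The planner picks a sequential scan for a whole-table aggregate,
    # with or without a B-tree index on these columns.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()
    cur.execute("EXPLAIN SELECT category, avg(price) FROM items GROUP BY category")
    for (line,) in cur.fetchall():
        print(line)  # expect a Seq Scan node over items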

