
> I have about the same amount of data in a Postgres database ...

I'm curious, how fast can one load data into Postgres? Is it possible to import data directly from CSV files?




> I'm curious, how fast can one load data into Postgres?

Hard to say, given how many variables affect it. pg_bulkload[0] quotes 18MB/s for parallel loading on DBT-2 (221s to load 4GB), and 12MB/s for the built-in COPY with post-indexing (that is, import all the data first, then enable and build the indexes).
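
As a sanity check, the 18MB/s figure follows from the 4GB and 221s quoted above. A minimal sketch, assuming 1GB = 1024MB (the pg_bulkload page may well use decimal units); the function name is only for illustration:

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average load rate in MB/s for size_mb megabytes loaded in `seconds`."""
    return size_mb / seconds


# 4GB in 221s, taking 1GB = 1024MB (an assumption about the units).
rate = throughput_mb_per_s(4 * 1024, 221)
print(f"{rate:.1f} MB/s")
```

With decimal units (4000MB) the result is slightly lower, but both ways of counting come out at roughly 18MB/s.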

> Is it possible to import data directly from CSV files?

Yes, the COPY command[1] can probably be configured to support whatever your *SV format is. There's also pg_bulkload (which should be faster but works offline).
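
For what it's worth, here is a rough sketch of driving COPY from Python. It assumes the psycopg2 driver and an already-open connection; the helper names `build_copy_sql` and `load_csv` and the table name `events` are made up for the example, and the table name is interpolated as-is, so it must come from trusted code:

```python
def build_copy_sql(table, delimiter=",", header=True):
    """Return a COPY ... FROM STDIN statement for a delimiter-separated file."""
    if len(delimiter) != 1:
        raise ValueError("COPY expects a single-character delimiter")
    # Double any single quote so the delimiter is a valid SQL string literal.
    quoted = "'" + delimiter.replace("'", "''") + "'"
    header_flag = "true" if header else "false"
    return (
        f"COPY {table} FROM STDIN "
        f"WITH (FORMAT csv, HEADER {header_flag}, DELIMITER {quoted})"
    )


def load_csv(conn, table, fileobj, delimiter=","):
    """Stream fileobj into table over an open psycopg2 connection (assumed)."""
    with conn.cursor() as cur:
        # copy_expert sends the file contents to the server as COPY's STDIN.
        cur.copy_expert(build_copy_sql(table, delimiter), fileobj)
    conn.commit()


print(build_copy_sql("events"))
print(build_copy_sql("events", delimiter="|", header=False))
```

Usage would be something like `load_csv(conn, "events", open("events.csv"))`. Changing `delimiter` covers most other *SV variants; quoting and NULL handling have their own COPY options, which this sketch leaves at the defaults.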

[0] http://ossc-db.github.io/pg_bulkload/index.html

[1] http://www.postgresql.org/docs/current/interactive/sql-copy....


18MB/s sounds rather low. The rate obviously depends on the source of the data, its format (e.g. lots of floating point columns load slower than large text fields), and whether parallelism is used. But you can relatively easily get around 300MB/s into an unindexed table, provided you have a decent storage system.
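
To put those rates side by side, here is a rough estimate of how long the 4GB DBT-2 load mentioned upthread would take at each one. This is plain arithmetic, assuming 1GB = 1024MB and a constant rate; `load_seconds` is a made-up helper:

```python
def load_seconds(size_mb, rate_mb_per_s):
    """Seconds needed to load size_mb megabytes at a constant rate."""
    return size_mb / rate_mb_per_s


size_mb = 4 * 1024  # 4GB, taking 1GB = 1024MB (assumption)

# Rates quoted in this thread: built-in COPY, pg_bulkload, unindexed table.
for rate in (12, 18, 300):
    print(f"{rate:>3} MB/s -> {load_seconds(size_mb, rate):.0f} s")
```

So the gap is between several minutes and well under half a minute for the same data, storage permitting.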


> Is it possible to import data directly from CSV files?

Yup! http://www.postgresql.org/docs/current/static/sql-copy.html


Our dataset is not loaded from an external source, it is generated by scanners.

But to answer your question: yes, Postgres can load data from CSV files: http://stackoverflow.com/questions/2987433/how-to-import-csv...



