
A list of PostgreSQL libraries, tools and resources - lauriswtf
https://github.com/dhamaniasad/awesome-postgres
======
Cieplak
What is the best way to ETL data from one Postgres cluster to another (by ETL
I mean things like data sanitization and denormalization)?

a) I think most people end up doing some kind of batch synchronization, but
I'm interested in streaming solutions.

b) A lot of folks use trigger based replication, but triggers have to be on
the primary/master node, and not just on the replicas.

c) Another common solution is to force the database client to write to a
message broker, but that opens the door to data discrepancies and
synchronization issues.

d) In theory I think the best way is to do something like bottledwater-pg or
pg_kafka [1] [2] [3], but I'm not sure how battle hardened these are. I think
logical replication of the WAL is the right approach, but there is still not
much tooling around this.

[1] [https://github.com/confluentinc/bottledwater-
pg](https://github.com/confluentinc/bottledwater-pg)

[2]
[https://github.com/xstevens/pg_kafka](https://github.com/xstevens/pg_kafka)

[3]
[https://github.com/xstevens/decoderbufs](https://github.com/xstevens/decoderbufs)

PS: There are a bunch of interesting MySQL solutions out there, such as
Zendesk's Maxwell:

[https://github.com/zendesk/maxwell](https://github.com/zendesk/maxwell)

~~~
AlterEgo20
Google "Postgres BDR". It is new, but it allows you to replicate changes on
master to another master. With some configuration it can even change
replicated data.

~~~
Cieplak
Thank you. I believe that bidirectional replication will solve the ETL
problem, if it can be configured to stream from a replica in the source/oltp
cluster (to minimize network IO on the source cluster master db) to the master
of the destination/olap cluster. The olap master can mirror the oltp data, but
also use triggers to denormalize oltp tables.

One thing I'd like to see is the ability to sanitize/munge data from the
source cluster before it is sent to the destination cluster, for masking
sensitive data in the source cluster (PCI, HIPAA, etc).

------
pilif
With regards to psql2csv: The default psql can already do this very nicely.
Just use \copy.

\copy (select whatever from whatever) to 'yourlocalfile.csv' with (format
'csv')

and if you want column headers, add a "header true" inside of the with clause.

Generally, \copy works just like COPY[1] but it does so from the remote server
to the local machine, whereas file names given to COPY are relative to the
server.

Yes. A dedicated tool might feel easier initially, but once you know how \copy
works, you can always get a CSV file from whatever database you're connected
to and no matter what machine you're on.

[1]: [http://www.postgresql.org/docs/current/static/sql-
copy.html](http://www.postgresql.org/docs/current/static/sql-copy.html)

~~~
fphilipe
psql2csv is just a wrapper around what you're describing. The point is that
you don't have to write the "boilerplate" when all you need is a single query.
Plus, it works nice with stdin and stdout, making it handy for queries stored
in a file, or for piping the CSV to another program.

------
rMBP
I found this talk by Christophe Pettus [1] very informative. The title is
somewhat misleading as most of the talk has little to do with Python, but it's
a good introduction to more advanced Postgres concepts. Also available in PDF
form [2].

[1]
[https://www.youtube.com/watch?v=0uCxLCmzaG4](https://www.youtube.com/watch?v=0uCxLCmzaG4)

[2]
[http://thebuild.com/presentations/pycon-2014-pppp.pdf](http://thebuild.com/presentations/pycon-2014-pppp.pdf)

------
jbranchaud
Does anyone know of sample PostgreSQL databases? Something akin to Northwind
([https://northwinddatabase.codeplex.com/](https://northwinddatabase.codeplex.com/))?

~~~
ellisv
Not sure if this is what you mean but PostgreSQL 9.5 introduced TABLESAMPLE.
[https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9....](https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#TABLESAMPLE)

~~~
jbranchaud
Sorry, no, I'm meaning example databases with 'fake' or sample data.

------
davidw
Can't it just be a 'list' of Postgres stuff? Do we really expect that it's
generated by an algorithm if someone doesn't write 'curated'? If it's a good
list, I'm going to find it interesting no matter how it was created.

~~~
dang
We took 'awesome' and 'curated' out the title, as per usual.

~~~
davidw
Thanks for your awesome curation of this site :-)

------
hvo
Good work. I am a big fan of PostgreSQL. It has made life easier for me.

------
ForHackernews
Amazon RDS doesn't make the list of "hosted Postgres"?

------
copperx
Is there a PostgreSQL that anyone here can recommend?

~~~
jmount
You can try Postgres "zero install" with Postgres.app for OSX, otherwise
docker. [http://www.win-vector.com/blog/2016/02/databases-in-
containe...](http://www.win-vector.com/blog/2016/02/databases-in-containers/)

~~~
copperx
Thank you! But I meant book, a PostgreSQL book; but somehow I didn't type that
:(

~~~
datawrangler
The PostgreSQL manual [1] is pretty good IMO. Have you looked at that?

[1]
[http://www.postgresql.org/docs/manuals](http://www.postgresql.org/docs/manuals)

~~~
thejosh
The PostresSQL manual & documentation are one of my favourite things about
PostresSQL, especially when I was first starting out :).

