
Fork a distributed Postgres database with Citus - DataChomp
https://www.citusdata.com/blog/2017/08/04/fork-your-distributed-postgres-with-citus-cloud/
======
_Codemonkeyism
"staging environment to experiment with, that is an exact copy of your
production database?"

This seems to be a problem under several compliance schemes. Most of them
require that developers, QA, etc. do not get access to production data.

~~~
craigkerstiens
Craig from Citus here. In situations where compliance is an issue, you would
absolutely want to obfuscate the data. You could do this after the fact in a
few ways, but even in those cases, having a data set that imitates the
distribution and size of your production database is still useful. Given that
only engineers with production access would be doing the fork, they could put
the right obfuscation tooling in place before handing the copy off.

All that said, it's helpful feedback and something we could definitely look at
building more into the product.

~~~
_Codemonkeyism
This would definitely help. Although I think SOX (my compliance days are long
gone) is mostly concerned with controls over changes to data, it might also
apply to access to production data. Not sure about PCI and HIPAA.

~~~
Artemis2
Definitely not good for PCI DSS. Requirement 6.4 reads:

      Examine policies and procedures to verify the following are defined:
      • Development/test environments are separate from production environments with access control in place to enforce separation.
      • A separation of duties between personnel assigned to the development/test environments and those assigned to the production environment.
      • Production data (live PANs) are not used for testing or development.
      • Test data and accounts are removed before a production system becomes active.
      • Change control procedures related to implementing security patches and software modifications are documented.

~~~
_Codemonkeyism
Thanks, now I remember from my PCI DSS days.

------
ing33k
It's a very useful feature to have.

If you use Heroku Postgres (production tier), it has a fork feature as well.

If you are running Postgres yourself with relatively small loads, it's quite
easy to create a copy of a database using a template:

    CREATE DATABASE new_db TEMPLATE = old_db;

[https://www.postgresql.org/docs/9.2/static/manage-ag-templat...](https://www.postgresql.org/docs/9.2/static/manage-ag-templatedbs.html)
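If you script the template copy from Python, the part worth being careful about is quoting the database names. A minimal sketch, assuming you build the statement by hand with only the standard library (a driver helper such as psycopg2's `sql.Identifier` does the same job); `quote_ident` and `template_copy_sql` are hypothetical names. Keep in mind that PostgreSQL will not run `CREATE DATABASE` inside a transaction block, so the connection executing it must be in autocommit mode, and the copy fails if any other session is connected to the source database.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap it in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def template_copy_sql(new_db: str, old_db: str) -> str:
    """Build the statement that clones old_db into new_db via TEMPLATE.

    Quoted identifiers are case-sensitive, so pass the names exactly as they
    exist in the cluster (usually lowercase).
    """
    if not new_db or not old_db:
        raise ValueError("database names must be non-empty")
    return f"CREATE DATABASE {quote_ident(new_db)} TEMPLATE = {quote_ident(old_db)};"


if __name__ == "__main__":
    # Execute the result on an autocommit connection while nobody else is
    # connected to old_db; here we only print it.
    print(template_copy_sql("new_db", "old_db"))
```

Quoting both names means a database called, say, `Reports` or `my-db` is copied as-is instead of being case-folded or rejected as a syntax error.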

While it's not the responsibility of a DB vendor, it would be nice if this
included some data anonymization/randomization option.

~~~
craigkerstiens
Great tip on the Postgres templates, they're a very underused feature.

Fully agreed on the data anonymization, and it's something we'll think about
in the future. Prior to Citus I ran product for Heroku Postgres for a number
of years, and the engineering team behind our database as a service is the
early team that built Heroku Postgres, so much of what we aimed to create here
is the same product. Anonymization absolutely makes sense; we just have to
figure out the right way to deliver it.

~~~
aquadrop
I think one viable approach would be to provide a way to insert hooks or
triggers so users can amend the data as they want. For some, though, the
ability to mark fields to be filled with auto-generated names/addresses might
be enough.
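The "mark fields to be filled with generated names/addresses" idea can be sketched in a few lines. This is a hypothetical illustration, not a Citus or Heroku feature: the word lists, `anonymize_row`, and the `marked` column mapping are made up for the example. It uses a keyed hash (HMAC-SHA256) so the same input always maps to the same fake value, which keeps repeated values consistent across rows and tables, and without the secret nobody can reproduce the mapping by hashing guessed values.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical word lists; a real tool would use much larger ones.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Casey", "Riley", "Morgan", "Jamie"]
LAST_NAMES = ["Smith", "Lee", "Garcia", "Chen", "Patel", "Brown", "Kim", "Novak"]
STREETS = ["Oak St", "Maple Ave", "Pine Rd", "Cedar Ln"]


def _keyed_hash(secret: bytes, kind: str, value: str) -> int:
    """HMAC-SHA256 of (kind, value), truncated to a 64-bit integer."""
    mac = hmac.new(secret, f"{kind}\x00{value}".encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def fake_name(h: int) -> str:
    first = FIRST_NAMES[h % len(FIRST_NAMES)]
    last = LAST_NAMES[(h // len(FIRST_NAMES)) % len(LAST_NAMES)]
    return f"{first} {last}"


def fake_address(h: int) -> str:
    number = h % 9000 + 100  # house number in 100..9099
    street = STREETS[(h // 9000) % len(STREETS)]
    return f"{number} {street}"


GENERATORS = {"name": fake_name, "address": fake_address}


def anonymize_row(row: Dict[str, object], marked: Dict[str, str], secret: bytes) -> Dict[str, object]:
    """Return a copy of `row` with every marked column replaced by a generated value.

    `marked` maps column name -> generator kind ("name" or "address").
    Unmarked columns and NULLs (None) pass through unchanged; `row` is not mutated.
    """
    out = dict(row)
    for column, kind in marked.items():
        value = out.get(column)
        if value is None:
            continue
        out[column] = GENERATORS[kind](_keyed_hash(secret, kind, str(value)))
    return out


if __name__ == "__main__":
    rows = [{"id": 1, "full_name": "Alice Jones", "street": "12 High St", "plan": "pro"}]
    marked = {"full_name": "name", "street": "address"}
    for r in rows:
        print(anonymize_row(r, marked, secret=b"keep-this-out-of-the-copy"))
```

With word lists this small, different inputs will often collide on the same fake value. That is fine for realistic-looking test data, but not for columns that must stay unique.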

------
yangyang
We use ZFS (on Linux) clones to instantly fork a read-only replica of our
production database into read/write copies. We also currently have these
copies upgraded in place to 9.6 for testing (we're still on 9.5).

Docker-compose and a makefile make it all very straightforward.
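A rough sketch of the ZFS side of this in Python, assuming the replica's entire data directory (including WAL) lives on a single dataset; `tank/pgdata` and `tank/pgdata-fork` are hypothetical names, and `fork` only prints the commands unless `dry_run=False`. Because a ZFS snapshot is atomic and the clone is copy-on-write, the clone appears almost instantly and only uses space as it diverges. It does not cover the Postgres side: a copy of a replica's data directory still has to be started on its own port and promoted (for example with `pg_ctl promote`) before it accepts writes, and the details depend on your setup.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from typing import List, Optional


def fork_commands(dataset: str, clone: str, snap_name: str) -> List[List[str]]:
    """Return the argv lists that snapshot `dataset` and clone that snapshot to `clone`."""
    snapshot = f"{dataset}@{snap_name}"
    return [
        ["zfs", "snapshot", snapshot],      # atomic, point-in-time snapshot
        ["zfs", "clone", snapshot, clone],  # writable copy-on-write clone of it
    ]


def fork(dataset: str, clone: str, snap_name: Optional[str] = None,
         dry_run: bool = True) -> List[List[str]]:
    """Snapshot and clone a dataset; only prints the commands when dry_run is True."""
    # Default to a UTC timestamp so repeated forks get distinct snapshot names.
    snap_name = snap_name or datetime.now(timezone.utc).strftime("fork-%Y%m%dT%H%M%SZ")
    cmds = fork_commands(dataset, clone, snap_name)
    for argv in cmds:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True raises before the clone step if the snapshot fails.
            subprocess.run(argv, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical dataset names; a dry run just prints the two zfs commands.
    fork("tank/pgdata", "tank/pgdata-fork")
```

Wrapping these in a Makefile target, as described above, is the same idea with less code.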

