
Ask HN: Cloning Prod DB for Acc-Testing Environment? - thecopy
Hello,

For acceptance testing in the new deployment process we're developing, I am thinking about cloning a production server into a virtual machine for each new release, to simulate the exact environment we have in production.

This would be easy if it weren't for the database size. It is 2 TB, which would probably take too long to copy and use too many resources.

Is there some way to create a proxy to the production database from within the acceptance-testing environment? I am thinking that all reads come from the production environment, and all writes go to a local instance. And if I try to read a row which has been touched, it is served from the local instance (which only contains the inserted or edited rows).

Is there something similar to this concept?
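The read-through/write-local idea described above is essentially a copy-on-write overlay. A minimal sketch of the concept (the `prod_reader` callable and the row shape are assumptions, not any real proxy product):

```python
class CopyOnWriteProxy:
    """Reads fall through to the production source unless a row has been
    locally touched; all writes and deletes go only to a local overlay."""

    def __init__(self, prod_reader):
        self.prod_reader = prod_reader  # read-only handle to production
        self.overlay = {}               # locally inserted/edited rows, by id
        self.deleted = set()            # ids deleted locally

    def read(self, row_id):
        if row_id in self.deleted:
            return None
        if row_id in self.overlay:
            return self.overlay[row_id]  # locally touched: serve local copy
        return self.prod_reader(row_id)  # untouched: fall through to prod

    def write(self, row_id, row):
        self.overlay[row_id] = row       # never writes to production
        self.deleted.discard(row_id)

    def delete(self, row_id):
        self.overlay.pop(row_id, None)
        self.deleted.add(row_id)
```

Real systems implement this at the storage layer (e.g. filesystem snapshots over a replica) rather than per row, but the semantics are the same.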
======
davismwfl
Overall, I do not like pointing to production ever for test/QA etc., even for
reads. Inevitably someone will mess up a config setting at some point and
you'll write test data into production. I have had it happen, even when the DB
user's access should have prevented the writes, but stuff happens.

I'm a big fan of taking a sample of production data, anonymizing what should
be anonymized, and then pushing that into test environments. I now always
write a script (or code) up front to automate the process before getting too
far into development. I also always include "broken" records or strange data
that has broken production in the past, so that we regression test against it.
I usually add the broken records as test pre-conditions, so that if someone
deleted those records during other tests they get recreated.
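A minimal sketch of that approach, using sqlite3 and an invented `users` schema (the column names and the particular "broken" rows are illustrative assumptions):

```python
import hashlib
import sqlite3

def anonymize_email(email: str) -> str:
    # Replace a real address with a deterministic fake one, so uniqueness
    # constraints still hold but no real contact data leaks into test.
    digest = hashlib.sha256(email.encode()).hexdigest()[:12]
    return f"user_{digest}@example.test"

# Rows that have broken production in the past, re-inserted as a test
# pre-condition in case an earlier test deleted them.
BROKEN_RECORDS = [
    (9001, "", "empty-name@example.test"),            # empty name field
    (9002, "O'Brien; DROP--", "quotes@example.test"), # quoting edge case
]

def build_test_db(sampled_rows):
    """Build a test DB from a sample of (id, name, email) production rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    for row_id, name, email in sampled_rows:
        db.execute("INSERT INTO users VALUES (?, ?, ?)",
                   (row_id, name, anonymize_email(email)))
    for record in BROKEN_RECORDS:
        db.execute("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", record)
    db.commit()
    return db
```

The deterministic hash matters: two copies of the same source row anonymize to the same fake email, so joins and unique indexes behave as they do in production.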

BTW -- depending on which DB you use, I have generally started to configure a
test DB that production replicates into, which then acts as a source. So
essentially I replicate a subset of production data to the test source DB all
the time, and all other copies are made from that source DB, which is never
itself touched (other than by the anonymizer). I only keep enough data in that
source DB to make the largest test valid. What this does is let a dev copy,
say, 500 records to his laptop to test with, while Integration or QA can have
the whole source DB.
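The "copy a bounded slice from the source DB" step might look like this sketch (sqlite3 stand-in for whatever DB is in play; the table layout is assumed, and in practice the replication itself would be the database's own mechanism):

```python
import sqlite3

def copy_subset(source_db, dest_db, table: str, limit: int) -> int:
    """Copy up to `limit` rows of `table` from the (already anonymized)
    source DB into a dev/QA copy. Returns the number of rows copied."""
    rows = source_db.execute(
        f"SELECT * FROM {table} LIMIT ?", (limit,)
    ).fetchall()
    if not rows:
        return 0
    placeholders = ",".join("?" * len(rows[0]))
    dest_db.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})", rows
    )
    dest_db.commit()
    return len(rows)
```

A dev laptop would call it with a small limit; an Integration environment would simply take the whole table.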

------
avitzurel
I remember those jolly days when you could just clone production DB and
restore it into your local machine.

These days I can't even do it for a single one of our average-size tables.

Anyway, counting on production data for testing is a mistake.

We do a couple of things

1\. Testing: for testing we use fixtures with a setup and teardown process --
basically, building the test data in setup and clearing the database in
teardown.

Language and framework are irrelevant here; every single one that I know of
has these features built in.

2\. Staging/Princess: for staging, we clone production periodically, but we
run a "whitening" process first. a. Remove all emails and all keys/tokens.
b. Delete tables for stats that only relate to production (orders, etc.).
c. Delete all push tokens, so no one from staging will ever get tweeted at,
sent a push notification, or anything like that.
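A rough sketch of such a whitening pass, again against sqlite3 with invented table and column names (`users`, `orders_stats`, `push_tokens` are placeholders for whatever the real schema has):

```python
import sqlite3

def whiten(db):
    # a. Blank out emails and secrets so staging holds no real contact info.
    db.execute(
        "UPDATE users SET email = 'user' || id || '@staging.test', "
        "api_token = NULL"
    )
    # b. Drop stats tables that only make sense against production data.
    db.execute("DROP TABLE IF EXISTS orders_stats")
    # c. Clear push tokens so staging can never notify a real device.
    db.execute("DELETE FROM push_tokens")
    db.commit()
```

The key property is that the script runs unconditionally on every clone, so a fresh copy is never usable until it has been whitened.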

I would NEVER point staging/test to production, not even for read only.
Staging/Test should be on a security group that doesn't even have access to
production DB.

------
giaour
I would discourage you from using production data in testing environments,
even if writes are blocked. If you have any sensitive customer information
anywhere in your production database, your testing environment would become an
additional attack target. Depending on your industry, that might mean that
your test workers will need to be compliant with HIPAA, PCI, or your
production SSP.

------
atomical
Why not use fixtures and seed the data you need? Otherwise your tests are
going to be coupled to production data that may change in the future.

