

Local development and huge databases? - mdomans

A question for anyone working on data analytics projects: how do you handle local development and feature work when the database you touch is huge? For some time now I've had the problem of working with huge datasets in relational databases, where keeping a local copy of the DB is simply not possible. I work around it by using either a mix of a local editor and remote iPython, or some mocked data. That, of course, causes performance problems, since it's easy to write slow code without noticing. Any ideas?

TL;DR: How do you develop code when your work requires a huge database (more than 40GB)?
======
chriskottom
There are two approaches I would consider. The first is simply to ask:
does your development work absolutely require the full data set, or could you
reduce its size through more precise selection? And more to the point, if that
is possible, can it be done with a reasonable amount of effort? This is such a
common problem that my guess is you're not the only one trying to solve it.
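Concretely, one way to build such a reduced copy is to sample each large table down to a fraction of its rows. The sketch below is a minimal illustration using SQLite in-memory databases as stand-ins for both the source and the dev copy; the table name `events` and the 1% sample rate are made-up placeholders, and against a real server you would do the sampling in SQL (e.g. PostgreSQL's `TABLESAMPLE`) rather than pulling all rows client-side.

```python
import sqlite3
import random

# Stand-in for the production DB: a SQLite file with one large table.
# In practice this would be a connection to the real database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
src.executemany("INSERT INTO events (payload) VALUES (?)",
                [(f"row-{i}",) for i in range(10_000)])

# Local development copy with the same schema, holding ~1% of the rows.
dev = sqlite3.connect(":memory:")
dev.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

random.seed(0)  # reproducible sample so the whole team sees the same data
rows = src.execute("SELECT id, payload FROM events").fetchall()
sample = [r for r in rows if random.random() < 0.01]
dev.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", sample)

count = dev.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # on the order of 100 rows instead of 10,000
```

The main caveat is referential integrity: random per-table sampling breaks foreign keys, so for related tables you'd sample the parent table first and then pull only the child rows that reference the sampled keys.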

If the size of the data set is essential to your development, then could you
set up a development database server on a dedicated machine (it could be an
old workstation or laptop) on the local network, acting as common
infrastructure for the whole team? This only works if your application
supports connecting to a remote database, but it solves the problem by
externalizing the database and sharing that one resource across the team,
rather than having each of you run a big DB locally.
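In practice, "supports connecting to a remote database" usually just means the connection details are configurable rather than hardcoded. A minimal sketch of one common pattern, assuming a Postgres-style connection URL; the hostname `db-dev.local`, credentials, and database name are placeholders, not anything from the thread:

```python
import os

def database_url(env=os.environ):
    """Resolve the DB connection string from the environment, falling
    back to the team's shared dev server on the local network
    (hostname and credentials here are assumed placeholders)."""
    return env.get(
        "DATABASE_URL",
        "postgresql://dev:dev@db-dev.local:5432/analytics",
    )

# Every developer's checkout points at the same shared box by default,
# and anyone can still override it locally via the environment:
print(database_url({}))
print(database_url({"DATABASE_URL": "postgresql://me@localhost/test"}))
```

The override path matters: it lets a developer swap in a small local copy or a mocked database for fast unit tests while feature work runs against the shared, full-sized instance.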

