

Lean, mean data science machine - jeroenjanssens
http://jeroenjanssens.com/2013/12/07/lean-mean-data-science-machine.html

======
gjreda
I'm super interested in the chapter on creating reusable command line tools.

I've found the command line to be ideal for performing a lot of simple,
memory-intensive tasks (filtering/munging/sorting/etc. a massive text file).

However, after data collection (and munging), data science is typically A LOT
of _exploratory_ analysis. I think it's extremely important that all
practitioners approach analysis with the mindset of making it easily
reproducible (and if possible, flexible - don't hard code date ranges, file
paths, etc.).
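
To make the "don't hard code" point concrete, here is a minimal sketch in Python. The flag names (`--input`, `--start`, `--end`) and the helper `in_range` are hypothetical, chosen only for illustration; the idea is simply that the date range and file path arrive as arguments instead of living in the script body.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Read the input path and date range from the command line.

    The flag names here are illustrative assumptions, not a convention.
    """
    parser = argparse.ArgumentParser(
        description="Filter dated records without hard-coded paths or dates."
    )
    parser.add_argument("--input", required=True, help="path to the input file")
    # ISO dates such as 2013-12-07; defaults cover every possible date.
    parser.add_argument("--start", type=date.fromisoformat, default=date.min)
    parser.add_argument("--end", type=date.fromisoformat, default=date.max)
    return parser.parse_args(argv)


def in_range(rows, start, end):
    """Keep (day, value) pairs whose day falls within [start, end]."""
    return [(day, value) for day, value in rows if start <= day <= end]
```

Rerunning the same analysis for a different period then means changing the arguments, not editing the code.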

I tend to stick with IPython Notebook (and highly recommend it). I fear that
heavy analysis at the command line would consist of too many one-liners and
thus be difficult to read and maintain.

~~~
jeroenjanssens
Thanks! I completely agree with you on making the analysis reproducible.
IPython Notebook is a wonderful tool, and I use it a lot myself. (You can even
run shell commands from IPython.) However, using the command line doesn't
necessarily mean that the analysis will be difficult to follow or maintain. On
the contrary, because you're making use of existing tools (and perhaps even
creating new ones), you are processing data at a higher level, which can
actually make it easier to read and maintain.

