
Computational reproducibility: IPython in the age of data-driven journalism - phreeza
http://blog.fperez.org/2013/04/literate-computing-and-computational.html
======
mistercow
Since CoffeeScript started supporting literate programming, I've enjoyed
dabbling with it quite a lot. I wrote a program to generate a recipe for a
variant on the "soylent" concept, using numeric.js to solve and optimize for
various constraints based on a target nutritional profile. The goal of that
was rather close to what TFA is discussing; I had previously computed the
recipe by hand, realized later that my calculations were wrong, and decided to
build the recipe as a well-documented reproducible computation instead of an
opaque final result.

But what I have not found yet (or discovered for myself) is good guidance on
how to document larger blocks of code. If a function is more than a few lines
long, I haven't found a satisfactory way to document it.

The first approach I tried was not to respect the atomicity of the function in
terms of formatting. That is, I would just write along literately about what
the function was doing, right inside the function. The problem is that in both
the code and the generated documentation, it becomes very difficult to discern
scope visually.

The second approach I tried was to give a brief overview description above the
function, then include ordinary comments inside it. That's fine and all, but
at that point, you're not gaining much over simple block comments above the
function.

The third approach I've tried is to either use an ordered list of steps
preceding the function, or to describe the function with numbered footnotes,
and then to add corresponding numeric comments in the code. This is the most
satisfactory solution I've found yet, and the documentation ends up looking
like a programming book. But the locality of reference is just terrible. It's
also just terrible for control flow statements. Using an outline format
instead of an ordered list can allow for a bit better control flow, but it's
still not great.
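
To make the third approach concrete, here is a minimal sketch in Python rather than literate CoffeeScript. The function `normalize_scores` is purely hypothetical and only there for illustration: the numbered steps stand in for the prose list, and the numeric comments in the code point back to them.

```python
# Steps, as they would appear in the prose preceding the function:
#   1. Drop missing entries.
#   2. Find the range of what is left.
#   3. Handle the degenerate case where every value is the same.
#   4. Rescale each remaining value into [0, 1].

def normalize_scores(scores):
    """Rescale a list of numbers (None means missing) into [0, 1]."""
    present = [s for s in scores if s is not None]        # (1)
    if not present:
        return []
    low, high = min(present), max(present)                # (2)
    if high == low:                                       # (3)
        return [0.0 for _ in present]
    return [(s - low) / (high - low) for s in present]    # (4)
```

Step (3) is where it gets awkward: the list reads as a straight line, but the code branches and returns early, which is the control flow problem mentioned above.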

~~~
datr
Documentation generator templates which split the comments and the code help
with the scope problem of solution 1. E.g.:

<http://reload.github.io/phing-drupal-template/>

The trade-off being that the vertical spacing of the code is now inconsistent.

~~~
mistercow
That does certainly help in terms of making the documentation readable, but it
doesn't help the readability of the source code.

------
arpineh
The stuff you can do with the current IPython notebook is already amazing and
immensely useful, as the examples mentioned in the posts demonstrate. The
browser is not a good place to manipulate gigabytes of data (the way Python
can), but it is the easiest to use, most versatile rendering engine for
visualizations.

I keep envisioning a marriage of Light Table with IPython. If this post is any
indication of Light Table's prowess, it could very well work with an IPython
backend: www.chris-granger.com/2012/05/21/the-future-is-specific/

I mourn the lack of a language to build these editors. Schema languages only
validate a document's structure and content. Browser-based editors either
abstract editing away to wiki-like syntax or hide it inside widgets that
emulate a word processor. Both come with a set of assumptions that makes them
hard to customize to suit the needs of the data.

It should be possible to create a declarative editing and visualization
language (a Domain Specific Language) that would drive this browser/data
structure server combination. You could describe the editing environment for
one particular domain (like exploring and visualizing a statistical dataset).
This should be data-driven, i.e. it builds the environment according to the
chosen dataset. The language should allow macros or modules in suitable
languages for
additional functionality, like complex validation, animations and transitions.
You can export your view of the data, complete with your code and command
history.

Tools for data visualization and editing need to look like they are built for
this data, for these particular visualizations. And they should be
customizable, right there in the same environment so the link between the data
and your knowledge of its domain stays in view. I think these tools hold the
promise of becoming toolmakers' tools, too.
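
As a rough sketch of the data-driven idea above, in Python: a declarative table maps column types to editing widgets, and the environment is derived from whichever dataset is chosen. Everything here (`WIDGET_RULES`, `build_environment`, the widget names, the sample row) is hypothetical and made up for illustration; it is not an existing tool or API.

```python
# Hypothetical declarative rules: which widget edits which kind of column.
WIDGET_RULES = {
    bool: "checkbox",
    int: "number_field",
    float: "slider",
    str: "text_field",
}

def build_environment(rows):
    """Derive an editor description from a dataset given as a list of dicts.

    The widget for each column is picked from the type of the value in the
    first row; unknown types fall back to a read-only "raw_view".
    """
    if not rows:
        return {}
    first = rows[0]
    return {
        column: WIDGET_RULES.get(type(value), "raw_view")
        for column, value in first.items()
    }

# Made-up sample row, only to show the shape of the result.
dataset = [{"name": "sample", "count": 3, "ratio": 0.25, "flag": True}]
environment = build_environment(dataset)
```

Swapping in a different dataset changes the resulting environment without touching the rules, which is the point of keeping the description declarative. Macros or modules for validation and transitions would be extra entries hanging off the same table.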

~~~
dr_doom
Have you checked out Continuum's Wakari project? [1] It is essentially IPython
with an online IDE.

<http://continuum.io/wakari.html>

~~~
arpineh
I first mistook this for their other in-browser data analysis environment:
<https://github.com/ContinuumIO/bokehjs>

I have yet to test this, since it looks like early days yet.

Wakari is a cloud service and somewhat expensive for my needs. Though it's hard
to say how big an instance my data would actually require. If my Pandas
experimentations don't pan out, I might have to give it a whirl.

------
minopret
Sage (sagemath.org) provides symbolic computing as well as numerical
computing. Symbolic computing makes it easier to explain the equations behind
the bottom-line results or diagrams and make that mathematical procedure open
to computer-assisted experimentation. Sage is built on top of IPython. It is
freely available both for download and as a web application (sagenb.org).

~~~
dschep
Is there any real benefit over IPython + SymPy?

~~~
minopret
If you're happy with SymPy, then Sage doesn't really give you anything. That
is, it gives you SymPy, which you already had.

Sage is many things intelligently integrated, of which SymPy is just one
example. Superficially Sage is a set of command-line and web user interfaces
with a preprocessor on top of the Python programming language. The deeper part
of Sage is a set of libraries that tie together many high-quality tried-and-
true open-source packages by providing a unified API and object model
(theoretically based in category theory, I think), plus a set of package-
specific APIs for when those are more desirable. The packages include Maxima
as the main tool for symbolic computation, SymPy as another, NetworkX for
graph-theoretic operations, PARI/GP for number theory, and many more:
<http://www.sagemath.org/links-components.html>

------
ianstallings
Inline twitter comments confuse the bejesus out of me.

