
Show HN: Pyexperiment - duerrp
https://github.com/duerrp/pyexperiment
======
ThePhysicist
During my PhD I wrote similar software to manage a large number of lab
instruments and perform experiments in a reproducible fashion. The code is
online on GitHub:

The general-purpose part:
[https://github.com/adewes/pyview](https://github.com/adewes/pyview)

Specific components for my experiments:
[https://github.com/adewes/python-qubit-setup](https://github.com/adewes/python-qubit-setup)

The idea was to create an MVC-like framework where you could create and
instantiate instruments, bundle them together into a system and use them to
perform measurements. The data from the measurements would be saved in a text-
based format and enriched with the meta-data about the state of the whole
system (in order to make it reproducible).
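
As a rough illustration of that last point, here is a minimal sketch of writing measurement data together with a snapshot of the system state in a text-based format. It is not code from pyview; the function name `dump_measurement`, the `system_state` key, and the instrument settings are all hypothetical, and JSON is just one possible text format.

```python
import json


def dump_measurement(data, instruments):
    """Serialize readings together with a snapshot of every instrument's settings.

    `instruments` maps a (hypothetical) instrument name to a dict of its
    current settings, so the record documents the state the data was taken in.
    """
    record = {
        "data": list(data),
        "system_state": {
            name: dict(settings) for name, settings in instruments.items()
        },
    }
    # A text-based format keeps the file readable and diff-friendly.
    return json.dumps(record, indent=2, sort_keys=True)
```

Reading the file back gives both the readings and the settings needed to repeat the measurement.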

Your work seems to go in the same direction and seems to be very interesting!
I think there is definitely a need for a system like this in Python, although
it is a difficult problem since the requirements vary quite a bit as a
function of the research field that you're in.

~~~
duerrp
Thanks, very interesting. I am reading through your code right now... maybe
there's something I can add to pyexperiment.

------
Osmium
I've yet to see an experiment software I've really liked. I've seen a few
people use and struggle with OpenSesame recently, which seems like an
incredibly featureful and useful project, but is also awkward because of it.
Pyexperiment seems like a nice lightweight solution, but it has the same
problem, which is that you need some scripting and computer skills.

There's an argument which says that a modern researcher needs to be able to
script/program, at least to a degree. But I don't like the idea of otherwise
very able and very skilled scientists struggling to do good research because
they're not great with computers.

I'm not sure what can be done about this. On a large scale, I'd love to see
proper investment into the UX and UI of existing scientific codes (and I have
a long list of where to start with _that_ ). On a personal level, I'd really
like to make an alternative to something like OpenSesame; not a replacement,
but something more lightweight for smaller scale/student studies, with a nice
GUI and easy to use. I wouldn't really know where to start though: what the
essential features are and what you can leave out, because I don't do that
sort of study myself.

~~~
semi-extrinsic
> I don't like the idea of otherwise very able and very skilled scientists
> struggling to do good research because they're not great with computers.

The solution already exists: skilled scientist hires grad student who's done
some programming + some electronics classes. Happens everywhere already, works
well.

Talking about UI and UX as a problem with scientific codes makes me giggle
though. We have much bigger issues to fix first: reproducibility, provenance,
mandating that codes be open source, even getting people to use a version
control system.

~~~
Osmium
Interesting, would love to hear some of your experiences! It sounds like the
issues we face are slightly different. In my field, version control is
pretty much a non-issue[1]; most of the widespread codes use it (typically
svn, but that's fine). Build systems are often an issue, as is incomplete (or
outdated) documentation, and a lack of proper changelogs. A lack of proper
tests too. Actually, maybe the issues we face aren't so different after all...

> The solution already exists: skilled scientist hires grad student who's done
> some programming + some electronics classes. Happens everywhere already,
> works well.

I've never actually seen this work well. Scripting an existing solution, sure.
But writing a new one from scratch... Most scientists I have met personally
who code, write awful awful code. Sometimes they just lack the time to do
something better (I'm as guilty of this as anyone), other times they're just
not very good at it. And it's almost never maintainable: when they leave the
project, you might as well re-write from scratch (a more cynical person would
say that's by design, but I actually don't believe that). There are many, many
exceptions to this, obviously, but as a rule...

I am optimistic though. It seems like things are changing for the better,
slowly.

Edit: [1] That said, I've been trying to persuade a colleague to use version
control for ages. I'm at a loss. Live and let live, I guess.

~~~
semi-extrinsic
(What's your field? Mine's computational physics, but I collaborate a lot with
experimentalists.) When I said version control, I meant for in-house projects,
which tend to live on someone's laptop and also be a horrible mess, as you say.
Most public codes are indeed on some VCS; git is very popular in my field. But
reproducibility is a whole other matter. There's been some interesting work on
provenance systems that can attach _all_ input needed to run a code for
reproduction to e.g. figures in a paper, but until we've got a good universal
one that journals start caring about, they're not going to see wide adoption.

As for non-computer-savvy profs using grad students for programming: I see a
lot of either LabView (for controlling experiments) or Python/Matlab in those
cases, and it's often "write-only" code, but it mostly gets the job done. I'm
also optimistic, but there are too few people working on and caring about the
software carpentry we need to back modern science in a good way.

------
imiric
Thanks for sharing!

Looks interesting, though I think the word "experiment" is what's throwing
people off, as it usually has a research/science connotation. As you've
explained it here, your project is mostly a "commons" library for reducing
regular boilerplate you've encountered in your area of work. As such, I would
consider renaming it to something more suitable.

Also, these types of projects aren't really reusable by the general public,
unless it fits your use case exactly. For example, I'd have no need for
matplotlib or NumPy, and would want to output JSON logs (with something like
[structlog][0]). That said, trying to accommodate everyone's use case is
impossible, so as long as it solves your problem, mission accomplished.

[0]:
[http://www.structlog.org/en/stable/](http://www.structlog.org/en/stable/)

~~~
duerrp
Thanks for your feedback. I see your point, but by now I am kind of
attached to the name - as a researcher many of my scripts are related to some
kind of experiment. This also explains the NumPy and matplotlib dependencies.
As for structlog, if I understand the documentation correctly, you could
easily add structlog to your project on top of pyexperiment, right?

~~~
sirclueless
I want to voice my support for keeping the driving use case "experiments".

I'm coming from the perspective of a software engineer here. To a software
engineer, a "program" is a collection of stateless routines and behavior. Data
is external and separate; the same program should be able to process a wide
range of data. "Reproducibility", as much as that matters, is having a tested
system that responds in a predictable and reliable way to inputs, and data is
one such input.

When I first worked extensively with a scientist on an experiment, I was
shocked how much common wisdom from computer science was turned on its head.
One is expected to load up a Matlab workspace with data and code all in the
same file? Scripts irreversibly mutate data, and often run exactly once? How
could one possibly keep track of such an environment? How does one fix bugs in
a series of commands typed into an interactive prompt? Reproducibility to a
scientist is a log of actions that could be repeated by another human, but the
environments used often just dropped such things on the floor, to be caught
only by the most diligent researcher with an unusually well-kept notebook.

I think there is definitely a happy medium somewhere. Reproducibility as a
scientist understands it; interactivity in a way that makes sense to a
scientist writing a one-off script. Program state stored easily so that the
scientist doesn't feel lost every time they restart their environment, as I
imagine they must do when editing python scripts in vim as a software engineer
might. But all this in a world where scripts can be maintained and versioned
and fixed without their hair catching fire.

~~~
duerrp
Thanks for the kind words. Until I left academia I used to work with matlab a
lot, and pyexperiment is probably a result of trying to get that experience
while making scripts that can easily be shared along with the data needed to
run them.

E.g., the issue with irreversible mutation of data is addressed in
pyexperiment with rotating state (and log) files. If you store the
state of your experiment in one run, and then change it in the next, you will
get a backup of the old state with a numerical extension (by default up to 5
backups are rotated). Moreover, pyexperiment by default comes with commands to
display stored state and configuration options (though they still need to be
improved), and both are stored in formats compatible with a host of other
software (including matlab).
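
To make the rotation idea concrete, here is a minimal sketch using only the standard library. It is not pyexperiment's actual implementation: the names `rotate_backups` and `save_state`, and the `.1` to `.5` naming of the backups, are assumptions for illustration.

```python
import os


def rotate_backups(path, max_backups=5):
    """Shift path -> path.1 -> path.2 ..., keeping at most max_backups copies."""
    if not os.path.exists(path):
        return
    # The oldest backup falls off the end of the rotation.
    oldest = f"{path}.{max_backups}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Move the remaining backups up by one, highest number first,
    # so that no file is overwritten before it has been moved.
    for i in range(max_backups - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    os.replace(path, f"{path}.1")


def save_state(path, text, max_backups=5):
    """Write new state, keeping the previous versions as numbered backups."""
    rotate_backups(path, max_backups)
    with open(path, "w") as f:
        f.write(text)
```

With this scheme, the most recent previous state is always in the file ending in `.1`, and anything older than `max_backups` runs is discarded.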

Btw., along the same lines, I love ipython notebooks, but the way I use them
makes them very hard to share, and compared to plain python scripts, version
control is a pain (even with the usual hacks to make diffs readable).

------
mynegation
That looks like it could be part of something more generic, like Yeoman
for Node.js. I like the idea of a Yeoman for Python (and for any
language/technology in general).

~~~
duerrp
Interesting, thanks. I didn't know yeoman (nor much javascript), but I am
looking into it right now.

------
duerrp
I'd love to hear some feedback...

~~~
jackmaney
I...I really don't understand what the primary use-cases are for this library.
By "experiment", do you mean some kind of scientific experiment? A double-
blind study on file input? That...doesn't make a whole lot of sense, but
that's the closest thing that I can glean from a skim of the first
documentation page that makes any sense at all...

    Motivating Example
    Let’s assume we need to write a quick and clean script that
    reads a couple of files with time series data and computes
    the average value. We also want to generate a plot of the
    data.

That...provides no motivation whatsoever. If I want to do that, I'll use
Pandas and matplotlib. This example sheds absolutely no light on what
pyexperiment does or why I would want to use it.

~~~
duerrp
Pandas and matplotlib are great, I use them all the time. Yet whenever I write
a python script for such experiments, beyond some point, I find myself writing
code that parses command line arguments, handles simple configuration options,
saves the results of my computation in a shareable way, etc.

Pyexperiment collects these bits and pieces into a library where I can just
write the relevant stuff - it's mainly solving my own pain-point and I thought
I could share it.
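
For anyone wondering what kind of boilerplate that means, here is a minimal sketch written with only the standard library. It does not use or show pyexperiment's API; the function names, the `experiment` section, and the `window` option are all made up for illustration.

```python
import argparse
import configparser
import json


def load_options(argv, config_text=""):
    """Merge built-in defaults, an INI-style config, and command-line overrides."""
    config = configparser.ConfigParser()
    config.read_dict({"experiment": {"window": "3"}})  # built-in defaults
    config.read_string(config_text)  # config values override the defaults
    parser = argparse.ArgumentParser(description="Average a time series")
    # The command line overrides whatever the config provided.
    parser.add_argument("--window", type=int,
                        default=config.getint("experiment", "window"))
    parser.add_argument("values", type=float, nargs="+")
    return parser.parse_args(argv)


def run(argv, config_text=""):
    """Average the last `window` values and return the result as JSON text."""
    opts = load_options(argv, config_text)
    recent = opts.values[-opts.window:]
    result = {"window": opts.window, "mean": sum(recent) / len(recent)}
    # Storing the options next to the result makes the output shareable.
    return json.dumps(result, sort_keys=True)
```

Only two lines of `run` are the actual computation; everything else is the part that tends to get rewritten for every new script.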

~~~
jackmaney
So...it's for configuration management? What do "experiments" have to do with
this?

What does this package do that isn't done by click, docopt, or argparse?

~~~
duerrp
Thanks for the comments, I just added another motivating example right at the
top of the README. Is it clearer now?

~~~
jackmaney
Not in the slightest. I still have no idea what this has to do with
experiments.

~~~
duerrp
Ok, I gave the motivating example another shot... not sure if it helps, but
please let me know if you have any suggestions.

