Getting a scientific prize for open-source software (gael-varoquaux.info)
142 points by gorpovitch 4 days ago | 24 comments





OP is right-on as to why this is a big deal in academia.

I want to see it spread beyond computer programming libraries into areas where sharing is harder, like open source scientific equipment and fully reproducible methods in chemistry experiments.


> OP is right-on as to why this is a big deal in academia.

The ACM (academic computer science group) has awarded prizes to these open source systems projects amongst others recently:

- Wireshark

- Jupyter

- GCC

- Mach

- Coq

- LLVM

- Eclipse

- make

- Java

- Tcl/Tk

The linked article is written as if the academy never works on or recognises open source software or implementation work, or as if using open licences were unusual. That's not true.


OP was talking about academia in general, and not just about CS-academia, which is of course a lot more sensitive to open source software.

In traditional (= non-CS) academia, proprietary software is still very much the norm, and as long as institutes get a free license for academic usage they also don't seem to care about open source too much. I don't know how much precedent there is, but such recognition from traditional academia still seems to be pretty rare and worthy of highlighting.


Yes. The self-congratulation was very off-putting.

> fully reproducible methods in chemistry experiments

Top-tier open source libraries for cheminformatics (or other natural science -informatics flavours) would already be a welcome start.


What do you think is missing in the current offering (OpenBabel, RDKit, maybe some others I am missing)?

Context: I do research in computational chemistry, and write an open source library for it that could be used for cheminformatics too. I don't really know what is needed for this though, since I have never touched cheminformatics.


I've dabbled a bit with OpenBabel and RDKit, but I found their interfaces especially for simple things (traversing atoms/bonds in a molecule) quite unwieldy. I suspect that a big part of this could "just" be missing documentation / tutorials to get into it.

Maybe I'm just not deep enough into it, but my impression so far is that, especially for application-level software (in contrast to specialized research), OEChem and similar closed source libraries seem to be the most widely used, with nothing quite comparable available as open source.

Context: Software engineer who is also currently a biochemistry undergraduate.


> I found their interfaces especially for simple things (traversing atoms/bonds in a molecule) quite unwieldy.

Somewhat the same for me, which is part of why I started my own project (http://chemfiles.org). I have the impression that for cheminformatics you want to see molecules as graphs. Is this true, or is a list of bonds enough for usual purposes?

I have heard of OEChem but never used it. I'll try to find some documentation to have a look.
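
To make the graph-versus-bond-list question concrete, here is a minimal sketch in plain Python, not taken from chemfiles, OpenBabel, RDKit, or OEChem. The molecule data and the function names (`build_adjacency`, `neighbours`, `connected_atoms`) are hypothetical and exist only for illustration. It assumes a bond list is just pairs of atom indices, and shows that a graph view is an adjacency map derived from that list, which makes neighbour lookup and traversal direct.

```python
from collections import defaultdict

# Hypothetical example data: the heavy atoms of ethanol (C-C-O).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]  # flat bond list: pairs of indices into `atoms`


def build_adjacency(bonds):
    """Turn a flat bond list into an adjacency map: atom index -> neighbours."""
    adjacency = defaultdict(list)
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    return adjacency


def neighbours(adjacency, atom_index):
    """Atoms bonded to `atom_index`; an empty list for an unbonded atom."""
    return adjacency.get(atom_index, [])


def connected_atoms(adjacency, start):
    """Depth-first traversal: every atom reachable from `start` through bonds."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        order.append(current)
        stack.extend(neighbours(adjacency, current))
    return order


adjacency = build_adjacency(bonds)
```

With only the bond list, finding the neighbours of one atom means scanning every bond. With the adjacency map, `neighbours(adjacency, 1)` is a single lookup, and questions such as "which atoms belong to the same fragment" reduce to a traversal like `connected_atoms`.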


> I have the impression that for cheminformatics you want to see molecules as graphs

Yeah, that was my thinking.

I've also seen your work on lumol, so you seem to be one of the few people working in the field with Rust! I just recently started writing a SMILES parser in Rust[0], as a first step towards an in-memory graph representation of molecules. I have a first draft of that locally, though it's very rough and changes weekly, as I'm basically learning the required theory at the same time :D

[0]: https://github.com/hobofan/smiles-parser


Published chemical syntheses are described in very great detail. Physical chemistry/spectroscopy papers likewise describe apparatus, collection, and analysis often down to the nuts and bolts. I don't see how to open source work requiring a femtosecond mid-infrared laser or a prep requiring a synthesis lab with all the reagents, labware, and safety equipment. Buried in the open source PR is the unshakeable underlying belief that science begins when the data are in the can and ready for analysis.

> I don't see how to open source work requiring a femtosecond mid-infrared laser or a prep requiring a synthesis lab with all the reagents, labware, and safety equipment.

You can put text documents on GitHub describing process, in the same way as you can code and data. If you have some setup with a femtosecond mid-infrared laser or prep requiring a synthesis lab with all the reagents, labware, and safety equipment you can open source the bill of parts, the build instructions and the lab book. It'd probably be very valuable to do that so please do!


Here are the freely available supplemental data to a paper in the Journal of the American Chemical Society blending organic synthesis, computation, and spectral characterization. 122 pages of exquisite details from a multi-lab collaboration. Lots more like it out there.

Note: I am not in any way affiliated with this research or the labs involved. This came out of a quick search.

https://pubs.acs.org/doi/suppl/10.1021/jacs.6b13031/suppl_fi...


Doesn’t that prove my point? I know people post their artefacts. I often review them. Not sure what you’re trying to say?

Reproducing that paper will be very difficult even though all the information is out there. There is a world of science outside of data processing.

I think most people who have worked in science long enough realize that publications are not even minimum-viable: they often omit absolutely necessary information. Sometimes this is intentional, but most of the time, it's just assumed that the reproducer is working in a world-class lab and gets advice/help to implement state-of-the-art work.

> There is a world of science outside of data processing.

Yes, but that's documented in lab books and procedure documents. Or at least it should be! If it isn't, how are they able to explain their own research? And those can be open sourced.


The ethos of documenting and describing research in detailed technical publications has been around for a very long time. That supplemental data I referenced has a phenomenal amount of detail and is only one example. They have done as good a job of open sourcing as is possible. Having read other papers from those lab chiefs (Zare, Houk, Baran, Grubbs, Stoltz) I know that they are scrupulous about the detail they publish.

Even so, very few people will be able to replicate that work without a very well funded laboratory or collaboration of their own. Lab notebooks contain a vast amount of tangential or irrelevant data which are distilled into the publication. What good is a process document for obtaining an x-ray structure if the diffractometer costs a fortune and is a shared departmental or even national resource? In your example, how deep does the bill of materials go? Is it sufficient to state that one needs a Bruker FTIR or Coherent optical parametric oscillator or do those have to be decomposed into the lowest-level components?


> Even so, very few people will be able to replicate that work without a very well funded laboratory or collaboration of their own.

That's not a reason to not be open about it!

> What good is a process document for obtaining an x-ray structure if the diffractometer costs a fortune and is a shared departmental or even national resource?

I think it's inherently good! Even if you think you can't use it right now, it's good to put it all out there for people looking, and into the archives to keep it for the future.

> In your example, how deep does the bill of materials go?

Well if bills of materials are available for your components themselves then you don't need to break them down yourself.


> That's not a reason to not be open about it!

We're talking past each other. Openness in methods has been around in the physical and biological sciences for a very long time.

> Well if bills of materials are available for your components themselves then you don't need to break them down yourself.

Reverse engineering a piece of purchased equipment to publish its BoM now becomes a required part of scientific publication? Or, expecting a manufacturer to provide it and authorize it for general release? I don't think that's realistic.


> We're talking past each other. Openness in methods has been around in the physical and biological sciences for a very long time.

You're listing a lot of reasons why you think it's not worth it!

> Reverse engineering a piece of purchased equipment to publish its BoM now becomes a required part of scientific publication?

No, you misread me. I said if it's an existing piece of equipment, just say you used that piece of equipment.


@chrisseaton: Can't reply at the correct indentation level but

> You're listing a lot of reasons why you think it's not worth it!

No, I am saying that it is not a new concept. I am all for it and tried to hew to that standard in the papers I've written.

And since I can't reply at the correct level, I'll take that as a hint that the site doesn't really want this and stop here.

Thanks for the discussion.


If you think that all software written on public money should be public, consider this petition: https://publiccode.eu

I think the budget used to make this software possible should also be public... yet we have a black budget



