
asoplata · on Jan 30, 2025

There's a small but growing number of "research software engineers": people who try to bring professional-level software development to scientific research and help scientists achieve their goals (example org: https://us-rse.org/ ). That's what I've been doing recently (see https://hnn.brown.edu/ ): I left my last post-doc and switched to full-time development on a scientific computing package in my old domain (computational neuroscience).

asoplata · on Jan 14, 2022

Absolutely, yes. The other comments here have some fantastic reasons for doing this, and several do a good job of weighing the pros vs cons.

The paper alone is almost never enough to fully reproduce the result. I've been bitten by this almost every time I've tried to implement someone else's computational model. Relying only on your paper to explain your code leaves a LOT of room for errors. I've experienced all of these when trying to implement someone else's computational work without their code being published:

    1. Despite your best efforts, you include fundamental, result-breaking typos in the equations you write up to explain the math of what you're doing. This WILL happen to you at some point in your career, and in my experience, it's a problem in >>50% of computational modeling papers.
    2. There are assumptions in the logic of the code that you don't include in the writeup, since they're obvious to you, but you don't realize that someone else trying to understand your paper won't necessarily be starting with those same assumptions. This happens frequently with neural models that use complicated synapse-computation schemes.
    3. Your codebase may be big enough that, from memory, you think code part X works a certain way, but you forget that you changed the logic late in the project to work differently.
    4. Publishing your code at the time of publication prevents "Which version did I use?" problems. It's very common for people to keep working on their science code for new work without bothering to save/tag the SPECIFIC version of the code that was used for the actual paper. As a result, even the author doesn't know what exact values were used for the results in the paper!

Any "competitive advantage" has to be weighed against "positive exposure". If your code is the primary research object (as opposed to the data), then it's technically possible that someone may grab your code, extend it to do the next interesting thing with it, and then scoop you before you can do it yourself. However, even if this happens (which it probably won't), consider the following:

    1. You can't build a successful career out of just small extensions to the same piece of code, so the main kernel of your career won't be that codebase, but rather your understanding of it.
    2. For every 1 person that tries to use that to scoop you, IMHO there's going to be at least 10 other people who see your code and reach out to you for help with it, or just to ask a question about it, or reach out for potential collaboration! In other words, depending on the field, if you publish the code, I think you're likely to gain new/future collaborators at a MUCH faster rate than people who compete against you. You'll be surprised at how many researchers on the other side of the planet are interested in your software!
    3. Even if someone scoops you with your own code, if they give any indication it came from you, you still get to count that as a publication that built off of your software work when you're applying to jobs :)
    4. At least with US federal government funding, it's gradually becoming required to do this anyway, and I believe/hope that it's going to become the standard very soon.

Finally, don't fret about polishing/cleaning/organizing the code, especially style. For others trying to reproduce your results or just investigating how you did things, the main thing that matters is that your code runs "correctly", i.e. the way you ran it to get the results that you did. One idea is to publish it "as is" for the CORRECTNESS of the paper, put a git tag on it indicating "original version", and THEN clean it up on GitHub/wherever. This helps prevent any new "organizing" of the code from potentially breaking something, which would be counterproductive. This way, when people go to your code page, the first thing they see is a nicely organized version, and you get time to test that it works the same. Honestly, if you care about this at all, then your code is probably significantly more organized than 95% of research code out there; the standards of code quality in science are VERY low, which is completely different from private sector software engineering.
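A minimal sketch of one way to sidestep the "Which version did I use?" problem, assuming the code lives in a git repository. Tagging the as-published state is a one-liner, e.g. `git tag -a paper-v1 -m "as published"` (the tag name `paper-v1` is just an example). The Python below goes one step further and writes the current commit hash next to the results. The function names (`git_output`, `provenance_record`, `save_provenance`) and the JSON layout are hypothetical, not taken from any particular project.

```python
import json
import subprocess
from datetime import datetime, timezone


def git_output(args, repo_dir="."):
    """Return the stripped stdout of a git command, or None if git is missing or fails."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=repo_dir, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def provenance_record(params, repo_dir="."):
    """Describe which code version and which parameter values produced a result."""
    commit = git_output(["rev-parse", "HEAD"], repo_dir)
    status = git_output(["status", "--porcelain"], repo_dir)
    return {
        "commit": commit,  # None when not inside a git repository
        # True means the working tree differs from the recorded commit
        "uncommitted_changes": None if status is None else bool(status),
        "params": params,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def save_provenance(path, params, repo_dir="."):
    """Write the provenance record as JSON next to the results it describes."""
    record = provenance_record(params, repo_dir)
    with open(path, "w") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return record
```

Calling something like `save_provenance("provenance.json", {"dt": 0.01, "seed": 42})` at the end of each simulation run would leave a small file recording the commit and parameters behind every figure, even if nobody remembers to tag.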

* edits are for markup

asoplata · on Dec 20, 2020

I recommend using isync (aka mbsync) https://isync.sourceforge.io/ to download an offline, IMAP-style copy of your email; it seems to be faster than offlineimap http://www.offlineimap.org/ which does the same thing. Many people have it run on a simple cron job. You can then use Thunderbird etc. on your full local copy of the mailbox.

nfriedly · on Dec 20, 2020

Thank you, I'll try that out!

asoplata · on Feb 9, 2020

Do you mind posting a link to your dissertation? That chapter sounds interesting!

selimthegrim · on Feb 9, 2020

Seconding. Is it on ProQuest?

scottlocklin · on Feb 9, 2020

I don't even know what that is. I graduated in 2004!

Not even sure where the PDF is at this point; if I dig it up I'll put it on my blog.

asoplata · on Aug 25, 2019

I've read probably every HN thread from the past 10 years about this, and the current consensus seems to be either Dell XPS 13 Developer Edition or Lenovo X1 Carbon.

asoplata · on Sept 4, 2018

Many of those jobs are done by the same academics who serve as authors and reviewers, and who likewise work pro bono or for very little money.

asoplata · on Aug 30, 2018

Why does working a Bullshit Job have to be about ritual or tribal identity or anything else, when the most important thing in that person's life is putting food on the table? Even if a job is bullshit, there isn't enough of a social safety net in most/all of the world to provide for one's needs without taking a job. For an employee who has no financial cushion and is trying to avoid homelessness/starvation, what the job does or contributes to society is secondary to the income that covers that person's basic necessities.

Similarly, if those basic needs were met a la Universal Basic Income or a Star Trek society, there'd probably be fewer bullshit jobs where people do something, ANYTHING, just to earn enough to live and provide for their family.

asoplata · on Oct 12, 2016

There's a free (legal!!!) HTML version at the "UC Press E-books Collection" http://publishing.cdlib.org/ucpressebooks/view?docId=ft4t1nb...

Generally speaking, I find that the more expensive a rare academic tome is, the more likely there are legal/etc. versions online (PDFs, HTML, DjVu, etc.), indexed by Google. That seems to be much more true for STEM texts than those of the humanities, though.

dekhn · on Oct 12, 2016

Thanks!

I'd love to see that be the case for this book https://www.amazon.com/Molecular-Vision-Life-Rockefeller-Fou...

selimthegrim · on Oct 12, 2016

Ah, yes that's where I'd first read it but forgotten the link.

asoplata · on June 5, 2016

Have you tried Pandoc [0]? It's the best, simplest solution I've been able to find for writing content in Markdown, and then almost trivially exporting that same content into PDF manuscripts, or PDF presentations, or HTML presentations, or more, without having to change the source content. It seems to be increasingly popular in academia for handling class notes/slides/papers all in one go. It handles citations well with pandoc-citeproc [1]. One of my favorite things about it is, if you know you're just going to be exporting to a specific filetype like HTML or LaTeX, you can just plain throw code for that language directly into the Markdown, and Pandoc will pass it through to the output.

(shameless plug alert) I wrote a super-simple Makefile script for using Pandoc, called acadoc [2]. It lets you call "make beamer", "make manuscript", "make html_presentation", etc., in a directory with Markdown files to turn them all into whatever presentation/manuscript/paper/etc. you want. The key thing, though, is that for LaTeX presentations/etc. it uses an intermediary "style/manuscript.tex" file to customize how you want your resulting LaTeX to look -- so if you want the same content, but decide the presentation looks better with white-on-black instead of black-on-white, you only need to make a few changes and create a "style/manuscript_dark.tex", copy-paste that style's recipe in the Makefile, and now you can call "make manuscript_dark". It's meant to be quick and easy to use, and theoretically easy to add new recipes to.
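For illustration only, here is a rough Python sketch of the "one source, many outputs" idea. It is not acadoc's actual Makefile: the target table, the output file names, and the choice of reveal.js for HTML slides are all assumptions. The Pandoc flags themselves (`-t beamer`, `-t revealjs`, `-s`, `-o`) are real, and PDF output additionally needs a LaTeX engine installed.

```python
import shlex

# Hypothetical targets, loosely mirroring the make recipe names mentioned above.
TARGETS = {
    "manuscript": {"output": "manuscript.pdf", "flags": []},
    "beamer": {"output": "slides.pdf", "flags": ["-t", "beamer"]},
    "html_presentation": {"output": "slides.html", "flags": ["-t", "revealjs", "-s"]},
}


def pandoc_command(target, sources):
    """Build the pandoc argument list that turns Markdown sources into one target."""
    spec = TARGETS[target]
    return ["pandoc", *sources, *spec["flags"], "-o", spec["output"]]


def describe_all(sources):
    """Return one shell-quoted command line per target, without running anything."""
    return [shlex.join(pandoc_command(name, sources)) for name in TARGETS]
```

Passing a list such as `pandoc_command("beamer", ["notes.md"])` to `subprocess.run` would do the actual conversion; adding a new output format is then just one more entry in `TARGETS`, which is the same spirit as adding a new recipe to the Makefile.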

If you want some more serious power for exporting to multiple document types, check out the links in the Acknowledgements section of acadoc, shown here:

- https://kieranhealy.org/blog/archives/2014/01/23/plain-text/
- http://jeromyanglim.blogspot.com/2012/07/beamer-pandoc-markd...
- https://github.com/timtylin/scholdoc
- http://scholarlymarkdown.com/

(Yes, I know LaTeX styles/stylesheets are a thing, but I've never taken the time to learn them, nor figure out if they can do all the custom formatting a regular .tex stylization file can do.)

[0] http://pandoc.org/

[1] https://github.com/jgm/pandoc-citeproc

[2] https://github.com/asoplata/acadoc

asoplata · on May 15, 2016

That ComputerLanguages repo, wow! This is really awesome! And far, far superior to my own store/collection of helpful links by language :(. I may just stop collecting helpful links altogether for languages I may never, ever use, when there are so many better lists and guides (like yours) out there that I can search if I ever need to. There are far more programming languages/ideas than there is time.