
Improving GitHub for science - codecondo
https://github.com/blog/1840-improving-github-for-science
======
timr
So, aside from DOIs (which anyone can create:
[http://www.doi.org/faq.html](http://www.doi.org/faq.html)), this post is
mostly about free private git repos.

But the thing is, _anyone_ can get as many free private git repos as they like
at bitbucket.org right now. You only have to pay when your team is over 5
people, and even then it's pretty cheap. So if you want hosted git but cost is
an object, just use Bitbucket. There's no reason to pay a premium for hosted
git. It's a commodity.

~~~
cben
Another tiny attraction of bitbucket for science is MathJax support in
reStructuredText [0]. Unfortunately not implemented yet for markdown [1].

There is an unfilled niche for version control with out-of-the-box rendering
of scientific formats:

- mathjax everywhere
- And of course full latex. This may be specialized enough to be better served
  by separate services, but now that sharelatex is open source a tighter
  integration is possible.
- bibliographies (rendered as publication lists?)
- Notebooks like IPython [2] and Sage; literate R.
- plots of various kinds
- tons of formats I never heard about (chemistry, bio, astronomy...)

Third-party web renderers exist for most formats (e.g. nbviewer.ipython.org)
but there would be value for doing them all in one place - making repos pretty
enough to serve as poor man's website (think how Github's up-front README
rendering was good enough for many projects).

The obvious direction is improving[3]/forking Gitlab. I hoped Banyan.co would
be that but just discovered they went down :-(. Authorea does some of this.
And cloud.sagemath.com exposes an impressive amount of tools under one roof.
Though it's focused on working more than browsing.

[0] [https://bitbucket.org/cbensf/test-math](https://bitbucket.org/cbensf/test-math)
For some reason not working at test-math/src view (reported
[https://bitbucket.org/site/master/issue/9483](https://bitbucket.org/site/master/issue/9483))

[1] [https://bitbucket.org/site/master/issue/7908/enable-mathjax-...](https://bitbucket.org/site/master/issue/7908/enable-mathjax-in-markdown-bb-9086)

[2] [http://nbviewer.ipython.org/urls/bitbucket.org/mforbes/paper...](http://nbviewer.ipython.org/urls/bitbucket.org/mforbes/paper_dvrvsho/raw/tip/docs/notebooks/DVR_Demo.ipynb)

[3] [http://feedback.gitlab.com/forums/176466-general/suggestions...](http://feedback.gitlab.com/forums/176466-general/suggestions/5569530)

BTW, Gitlab.com also has free private repos.

~~~
guynamedloren
We're on the same wavelength here. This is exactly the problem I'm trying to
tackle with Penflip ([https://www.penflip.com/](https://www.penflip.com/)).

    
    
      - Began as a fork of Gitlab
      - Hosts public and private writing projects that are backed by git repos
      - Git repos allow for local editing, remote access, and easy collaboration
      - In-browser markdown editor with syntax highlighting
      - MathJax support (as of yesterday - still testing and tweaking)
      - Extended markdown support for tables, footnotes, etc 
      - One-click downloads in PDF / HTML / ePub / Word format
    

Originally, Penflip was geared towards writers, but I am seeing an increased
demand for scientific and academic uses. I'm exploring this right now.

EDIT: wow, just realized you're the one behind mathdown.net, which I recently
discovered while researching MathJax. Excellent work!

~~~
cannam
Now that's an interesting site.

It would be cool if the free tier included one private project, so you could
test it without making a public project full of typical test blathering. (I
see the Discover page lists a few projects that are clearly only testing.)

The git import/export is very nice but looks a bit magicky -- am I right in
thinking text files called anything other than document.txt will be silently
ignored? I was looking at this with a view to working out how to import
existing documents, ideally with history: what's the best workflow for that?

A serif font option in the editor/preview would be welcome.

Looking forward to trying this for something more substantial, though. Nice.

------
bradgessler
... meanwhile Github doesn't support viewing rendered SVG files when you click
on a .svg file, yet they have viewers for 3D objects and maps that render in
SVG.

It's frustrating how difficult it is to get anybody at Github to work on small
things since there's "no manager" and everybody wants to focus on grandiose
projects.

~~~
jahewson
Browsers are really not good at rendering SVG. Both WebKit and Gecko are
riddled with bugs once you start using the non-trivial features of SVG. Worse
still, browsers don't even support the latest version of SVG, which suffered a
fate similar to ECMAScript 4. However, vector illustrating programs will
happily generate files which use the browser-incompatible newer versions.

So while it may be possible to target SVG as a backend for rendering basic
drawings, displaying arbitrary SVG files is, in practice, a lost cause.

~~~
TTPrograms
They don't have to send the straight svg for preview. They could either modify
it for compatibility or at least render it server-side to raster and send it
over. It's not impossible, at any rate.

~~~
nbody
It's not only a matter of compatibility but a multitude of security issues as
well.
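
To make the security concern concrete: SVG files can carry `<script>` elements
and event-handler attributes such as `onload`, so a host would at least have to
strip those before serving a user's file. Below is a deliberately incomplete
sketch using only Python's standard library. The function name is hypothetical,
and a real sanitizer would also need to deal with `javascript:` URLs,
`foreignObject`, external references and XML entity expansion (the standard
library's ElementTree is not hardened against maliciously constructed input).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def strip_active_content(svg_text):
    """Return svg_text with <script> elements and on* attributes removed.

    Illustrative only: this is NOT a complete SVG sanitizer.
    """
    root = ET.fromstring(svg_text)

    # Snapshot the element list first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in ("{%s}script" % SVG_NS, "script"):
                parent.remove(child)

    # Drop event-handler attributes (onload, onclick, ...).
    for element in root.iter():
        for name in list(element.attrib):
            if name.lower().startswith("on"):
                del element.attrib[name]

    return ET.tostring(root, encoding="unicode")
```

Rendering to a raster image on the server sidesteps most of this, at the cost
of losing scalability in the preview.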

------
privong
I like that the DOI association is with a tagged release, making it easy to
identify specific versions of the repository. In principle, this will make
duplication/checking of research results easier, as one can ensure use of the
same version of the software as the original work.

~~~
hrjet
Tags can be moved in Git. How much you want to trust the tags depends upon
your use-case.

~~~
trurl42
The DOI is linked to a zip archive on Zenodo / figshare, which will not
change, even if you change the tag.
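
If you want to confirm that a downloaded archive is byte-for-byte the snapshot
a paper used, comparing checksums is one simple option. A minimal sketch,
assuming you have recorded a SHA-256 digest alongside the citation (the function
names here are hypothetical and not part of any GitHub or archive API):

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, recorded_hex):
    """True if the file's digest equals the digest recorded with the citation."""
    return sha256_of_file(path) == recorded_hex.lower()
```

A moved tag would then show up as a mismatch between a fresh download of the
tag and the archived zip.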

~~~
sampo
So if you publish the DOI as a part of a paper, and then later improve the
code in the Github repo (but of course not in the archived snapshot), is there
some way that the people who follow the DOI could also find the improvements?

------
btn
Something they don't mention until _after_ you've signed up: the micro plan
only lasts for two years. I assume any private repositories will become locked
if you don't pay for a subscription after that (as with regular accounts).

In comparison with BitBucket (not to advocate, but they offer a comparable
service): the restrictions they waive for academic accounts are done so
permanently.

~~~
rxdazn
You can renew it after two years though (still free of charge).

~~~
btn
Is that spelt out somewhere? The only mention of it I can find is in the
confirmation email they sent: "will be free for the next two years".

~~~
rxdazn
I don't think so. I just filled the form and my request was approved (they did
check my school email).

------
diego898
I'm very excited that github is pushing forward with this. As a graduate
student I've been desperately trying to get my lab to switch over and use
github as opposed to myfile.m --> myfile_diego.m --> myfile_diego_changed.m
etc!

I want researchers to use it not just for code, but for latex files for paper
writing as well!

~~~
anonymousDan
Our department recently set up a self-hosted gitlab. Works great. As many
private repos as you like and you don't have to worry about github
disappearing off the map.

~~~
Xylakant
Seriously, the chances that github just disappears off the map are about the
same as the chances that the gitlab team decides to stop working on the OS
product. Non-zero, but still low. However, the impact is only moderate: It's
simple to keep a copy of the repo around and just as simple to keep a copy of
the wiki around: They're git repos too. The only thing you'd lose is the issue
history and your network of clones.

If that's what you're concerned about I'd be more afraid of your internal
gitlab server crashing down in flames because somebody fat-fingered a critical
update.

------
onalark
This is very good news for digital science, and has been a long time coming.
For those wondering why this is new and different from previous efforts, I
will try to summarize.

Up until now, there have been many places to permanently store your public
scientific research objects, so long as you were willing to use a Creative
Commons license. Similarly, there were many places to freely host your
software, so long as you didn't need a permanent store for your scientific
code.

The GitHub-ZENODO partnership is the first web application to bridge these two
needs. An earlier attempt with Figshare almost got it right, but failed to
support code licenses (everything at Figshare is CC BY). This isn't the only
way to host or cite your code, but it's definitely a good default.

~~~
hueving
Most universities already have systems to host research long-term. Most
researchers are just too lazy to use them. Also, I would place more money on a
university being around longer than Github if I was worried about longevity.

~~~
onalark
I agree that University archives present another opportunity, particularly
among the "first-tier" research institutions in the US. One problem is that
many University archives are paywalled or require restrictive licenses.
Another is that these archives generally do not accept submissions from
outside the University.

Let me turn this around on you. Given the availability of University preprint
servers, what's the value of a service like arxiv?

------
dnautics
I'm not so sure about allowing for private research repos. As a former
scientist, incentivising siloization seems to be the wrong direction; if they
want it, they should pay for it. A normal micro plan is cheap enough that it
won't break the wallet of a researcher.

~~~
eob
As long as society continues to reward individuals for generating new
research, individuals will, out of necessity, keep their research materials
private until they have finished extracting personal gain from it. Otherwise
they would go through all the pain of working on a problem only to have
someone swoop in 10% before it's complete and cross the finish line holding a
borrowed baton.

~~~
dnautics
If it's done in the open, it's pretty trivial to figure that out and the
person who swooped in runs the risk of being shamed. The few egregious cases
of this I can think of (Leo Paquette and Armando Cordova) were only able to
operate because of siloed work and secrecy; and still they got exposed.

------
sixbrx
I don't think their Markdown flavor supports any Latex extensions, that would
be a good start, IMO.

~~~
onalark
It used to, but MathJax does not play well with GitHub's Markdown parser.

------
alceufc
Is there any style or guidelines for citing code (e.g. as we have APA for
citing papers)?

Usually when I want to give credit to software or code that I have used I
cite the paper that describes that software or code.

Another point: if you attribute a DOI to your code, that could result in it
"stealing" the citations from a paper that describes that code.

~~~
mandalar12
Your last point can be resolved by explicitly asking authors to cite
the paper rather than linking to the website/github repo.

For instance I used Intel Pintools and they specify in their FAQ[1] : I used
Pin for my latest paper. What citation should I include? [actual paper here]

[1]: [https://software.intel.com/en-us/articles/pintool-faq](https://software.intel.com/en-us/articles/pintool-faq)

------
ylem
This is actually rather cool! There has been a movement in some fields towards
reproducible research
([http://reproducibleresearch.net/](http://reproducibleresearch.net/)). If
people were required to provide source code in their publications (by funding
agencies), it would be rather useful for those cases where there are questions
about the data analysis (for example:
[http://en.wikipedia.org/wiki/Anil_Potti](http://en.wikipedia.org/wiki/Anil_Potti)).
It would allow a serious referee to check to see if there were either obvious
or subtle mistakes in the analysis. Also, it would allow subsequent
researchers to see what previous researchers actually did in the reduction--
not just the final figure.

------
cannam
This seems like quite a big deal -- the parts were in place already, but
having a route promoted by Github will probably make a big difference in
academia. The idea of having a citable DOI for your work exerts a magic that
is possibly out of proportion to its practical advantages for research code.

I imagine the private project thing may be a blow to Bitbucket, which is quite
widely used by academics.

The pessimistic part of me finds it quite discouraging that this will only
centralise things more at Github. The optimistic part hopes for more
matter-of-course publication of research code and a happier relationship
between researchers and the software they write.

(Disclaimer: I run a subject-specific academic code repository,
[http://code.soundsoftware.ac.uk](http://code.soundsoftware.ac.uk))

~~~
onalark
And it's a very nice site! I might suggest making it clearer on one of your
help or about pages what licenses are acceptable for hosting. I am not a UK
academic or a sound analysis computational scientist, but it's something that
I would be concerned about when choosing a place to store my code.

------
dalek2point3
I've been thinking about this for a while, and thought github finally
understood the value of a "git for data". Github really needs to be a data
repository, with "Code" being only one type of data. But then they become a
hosting service for large data files, and that is a headache and a different
business. I don't see how academics who work with reasonably large datasets
would move to github as long as they have to manage their data on another
platform. Perhaps Github could have deep integration with AWS / DO and
collaborate to solve that pain point.

------
hyperion2010
I have been working on a little Python CLI project for electrophysiology. I will
write a paper about it but it is such a simple and small project that many
journals may not want to publish it, however it could be really useful for
other experimentalists. I personally don't care too much about giving credit,
but I want people to be able to clearly communicate what tools they used to do
their science and citing github repos or tags is not really done at the
moment. Kudos to all involved for getting this to work!

~~~
onalark
This is a perfect fit for the SciPy conference. There are also several open
journals that focus specifically on publishing openly licensed software used
in academic work.

Citing software really varies by community. It is very consistently done in
some communities, and is gaining broader acceptance in others, some with
surprising rapidity (ecology comes to mind).

It's an exciting time to be a scientist!

------
ISL
Anyone else having trouble verifying an existing primary .edu address (no
"Verify" button is apparent)?

~~~
sp332
Have you already verified it?

~~~
ISL
Checks old email... yes (thanks :) ).

------
dllthomas
So that's "Improving (GitHub for science)", not "(Improving GitHub) FOR
SCIENCE!".

