Hacker News
We need a GitHub of Science (marciovm.com)
246 points by marciovm123 2213 days ago | 62 comments

>- GitHub's success is not just about openness, but also a prestige economy that rewards valuable content producers with credit and attention

I don't think I can agree with that. GitHub's success, IMO, is based almost entirely on its openness. It has made contributing to open source software drop-dead easy; most projects would never be found, let alone contributed to, if they weren't open. And they keep making it easier. I've fixed a number of things from machines that don't have Git installed, simply because GitHub has an in-browser editor.

Imagine if GitHub were behind a paywall. Do you think it would still be the success it is today? And, I may be weird, but I very rarely look at the names associated with commit histories. The code should speak for itself.

The rest of it sounds about right: scientific publishing as a whole is massively backwards compared to GitHub, if you're looking at it from an "open" perspective. But I think a lot of that is because researchers tend to be insular compared to implementers (businesses guarding their IP aside; they're not really GitHub's target audience anyway). GitHub isn't used exclusively by comp-sci researchers to post their findings with code; it's more for people doing things with ideas others have contributed to.

There are experiments on GitHub, absolutely. I have a few myself. But the main thing that GitHub has done is to make final products easy to find, modify, and contribute to. I have significant doubts that it would fit a research workflow smoothly, without becoming something else entirely.

John Resig started a project to generate resumes from contributions on GitHub. I regularly see posts on HN/Reddit/etc. from users saying: "I built this and that, check it out!"

Personally, one motivation (among many) is indeed prestige: now I can show off my nice work.

Sure, openness started it all: Linus shared his VCS, which in turn sparked GitHub, which in turn inspired thousands of developers to share their code. But openness really isn't the sole driver for people to share their code any more; more incentives, of which prestige is an important one, drive the popularity of GitHub.

>"I built this and that, check it out!"

But people do that with other open source sites as well. Does GitHub provide this feature better than SourceForge or others? You still need to go to the user's page, they don't advertise it anywhere else. Do people go to GitHub to see information about person X, or project Y? And for the posts on social sites, are they more often about the creator or what they created?

> And for the posts on social sites, are they more often about the creator or what they created?

You hit the nail right on the head. GitHub is about sharing code. The code is the creation. GitHub's unique selling point is the web interface for showing code: it's the best-designed interface in the industry for viewing code in a browser.

> Does GitHub provide this feature better than SourceForge or others?

Various software is available for viewing code in the browser, but none works as well or is as polished as GitHub. The creation (the code) can therefore best be shared on GitHub. As such, people who want to check out the creation are, from the moment they arrive at GitHub, mostly busy doing exactly that: checking out the code. Every time I visit a repo whose code is rendered in HTML somewhere other than GitHub, I get irritated by the interface. Example: some interfaces require you to click on a file, after which a postback returns a page containing a list of revisions and new buttons to 'view' a revision. That means waiting for a postback, and clicking, twice per file I want to view. GitHub instead instantly shows the latest revision.

All these small tweaks add up to much better usability on GitHub compared to other sites.

I guess scientists need the same approach GitHub takes with programmers. For programmers it's about code; for scientists it's about data and the conclusions derived from it. Instead of showing code, show a paper with the ability to drill down into the data. The shared data can then be treated like code in a source repo, so the usual Git operations (branching, merging, pulling, pushing) can be applied to the paper plus its data.
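To make that concrete, here's a minimal sketch of applying the usual Git operations to a paper plus its data. All file names and contents are hypothetical:

```shell
# Treat a paper and its dataset like a source repo (names are made up).
mkdir paper-repo && cd paper-repo
git init -q
git config user.email "author@example.com"
git config user.name "Author"

echo "raw measurements v1" > data.csv
echo "Draft: conclusions drawn from data.csv" > paper.txt
git add data.csv paper.txt
git commit -qm "initial draft and data"
main=$(git symbolic-ref --short HEAD)  # default branch name, whatever it is

# A collaborator branches off to try a different analysis...
git checkout -qb alternative-analysis
echo "raw measurements v1, outliers removed" > data.csv
git commit -qam "re-analyze with outliers removed"

# ...and the result merges back into the paper, just like code.
git checkout -q "$main"
git merge -q alternative-analysis
```

The same history then records exactly which version of the data each draft's conclusions were based on.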

I'm a Ph.D. student working in theoretical high-energy physics. In this field we don't rely on peer-reviewed journals. Instead, when a researcher wants to "publish" a paper she uploads it to the arXiv, and the paper appears on the site within a day or two. The arXiv is open in the sense that almost anyone can publish there [1]. Researchers in the field catch up on new research by scanning the arXiv daily for interesting papers. No one I know reads peer-reviewed journals. I know that many papers are also published in journals, but I believe this is a formality that has more to do with obtaining grants and such than with actual communication within the community. As far as I know there's no reason to publish in a journal before you become a professor.

The result is similar to the GitHub situation in many ways. Because there are no barriers to publishing, everyone makes up their own mind about which papers are interesting. If your work is relevant, others will build on it and cite you. They will discuss it in their group meeting, and so on. A scientist's reputation is then directly related to the quality of their work, as judged by the community, with no artificial barriers. This means that a self-respecting scientist would not publish a sub-par paper even though it's technically possible to do so, because that would hurt her reputation.

So it seems to me that the situation in high-energy physics is close to ideal, with respect to ease of publishing and the social aspect of reputation. Having said that, there are certainly aspects of GitHub that I would love to see adopted.

For instance, when several researchers are writing a paper, generally no version control system is employed. Instead, at any point in time the draft is "locked" by one of the collaborators, and only that person can change it. Beyond the obvious inefficiency of this method, note that it is also difficult to track what changes were made in each lock cycle. I use diff for this purpose, but in my experience many scientists in the field aren't aware of such tools. So something that could really help is a simple way to collaborate on papers, just a basic source control system. Also, it must be possible to work on the paper in private within the collaboration, and only publish the end result.
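Even without full version control, plain `diff` covers the "what changed in this lock cycle" case; a sketch with made-up draft snapshots (for LaTeX sources, the latexdiff tool can additionally render the changes as a marked-up document):

```shell
# Each collaborator saves a snapshot of the draft at hand-off time.
printf '%s\n' 'We compute the amplitude at one loop.'  > draft_2011-03-01.tex
printf '%s\n' 'We compute the amplitude at two loops.' > draft_2011-03-08.tex

# Unified diff: '-' lines were removed, '+' lines were added.
# (diff exits with status 1 when the files differ, hence the || true.)
diff -u draft_2011-03-01.tex draft_2011-03-08.tex || true
```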

[1] The few barriers that exist are in place to keep out the crackpots, who reduce the signal-to-noise ratio and in that sense resemble spammers.

As a mathematics postdoc, I'd say there is one (and only one) reason to publish your work in peer-reviewed journals: to prove to people outside your immediate area that you've done good work.

They are completely superfluous for disseminating knowledge.

"In this field we don't rely on peer-reviewed journals...papers are also published in journals, but I believe this is a formality that has more to do with obtaining grants and such than with actual communication within the community."

So peer reviewed journals are only important for grants, but your community doesn't rely on peer reviewed journals. Do you not rely on grants? Who funds your work? Do you all work for free, in your spare time?

I guess I wasn't clear enough. What I mean is that the communication within the community, and the reputation of a researcher among her peers, do not rely on publications in reviewed journals. These are the things that can be compared with open-source development.

If researchers also need to publish their work in journals, write grant proposals etc., how is it relevant to the idea of applying the GitHub model to science? Of course raising money is part of the job for a professor, but thanks to the arXiv it's decoupled from the actual research work. It's at a point where I, as a Ph.D. student, have no reason to consider publishing in reviewed journals. This is in contrast to my friends in optics or condensed matter, for whom a publication in Nature or Science practically guarantees a good postdoc position.

Thanks for the perspective. My impression from reading about Grigori Perelman was that almost all papers in arXiv are eventually published in traditional journals, for grant purposes, as alluded to by mechanical_fish. Maybe arXiv just needs to hire Scott Chacon (VP of R&D at GitHub).

I'm a former biophysics postdoc myself. Now I work for an open-source software company.

This post strikes me as charmingly naive. You have to love this guy. And yet any essay that discusses the incentive structure of science but doesn't use the word "grant" until the last sentence is beating around the bush. Follow the money, my friends.

The publications are a side issue. To the extent that your count of top-tier publications matters when trying to get an academic job, it's because it's correlated with your ability to bring in money. (Money comes from peer review too, and what your peers want to read about is also what they want to fund.) What the hiring committees really want is grants. Grant money pays for labs and salaries. It pays for grad students and postdocs. And grant money literally buys prestige: Big projects come from big grants, and big grants require strong track records and a bunch of preliminary data, which in turn comes from smaller grants, or from the shared equipment that your neighbor bought with her grants.

The fact that there aren't that many top-tier peer-reviewed journals is a side effect of the limited number of top scientists, and the number of scientists is limited by available resources, not by lack of knowledge or connections or education. I could literally pick up the phone and reach a dozen Ivy-educated postdocs who would be full-time scientists if they could afford it.

Why can you find so much great software on Github? There are lots of reasons, but a fundamental one is: Moore's Law. Computer hardware has become so dirt cheap that you can be a programmer in your spare time. You can literally be a twelve-year-old kid with a $200 cast-off computer and yet do top-notch software work. If computers cost millions of dollars each, like they did in 1963, we wouldn't have Github. We'd have the drawer of a desk on the ninth floor of Tech Square. (After all, in the old days half the AI researchers in the world lived within a few miles of that drawer, and the others were just a phone call away.) That's how most advanced science works today: There's no need for more publishing infrastructure for scientific technique, because the available methods of getting the word out -- top journals, second-tier journals, email, the phone, bumping into people in the hallway at conferences -- scale well enough to meet the limited demand. Because just having the recipe for your very own scanning multiphoton microscope doesn't do you much good: You need a $150,000 laser, and a $200,000 microscope, and tens of thousands of dollars in lenses and filters and dyes, and a couple of trained optics experts to maintain the thing, and that's before you even have something to photograph.

I wish there were a magical way to turn everyone's suburban basement into a cancer research lab, the way Github has turned everyone's couch into a potential CS research lab, but there's no magic bullet. A few technologies, like DNA sequencing, are sufficiently generic, useful, and automatable to be amenable to Moore's-Law-based solutions, so we probably will soon be able to (e.g.) drop leaves into the hopper of a $1000 box and get a readout of the tree's genetics. But something like cancer research is never going to be cheap. To study cancer you must first have a creature that has cancer. Mice are as cheap as those get, and mice are not cheap, especially if you know what the word mycoplasma means.

I used to think this as well. But if you think about things which are ridiculously expensive -- like cars or airplanes or Google's datacenters -- all of them got their start in someone's garage.

After spending some time with the Open Wetware/DIYBio guys, I realized that no one (other than the DIYBio guys) has really spent time reducing the costs of the fundamental toolkit of the molecular biologist.

Now they are starting to do so:


That gel box is going to be about $200-$300 all in. You might not get a multiphoton microscope right off the bat, any more than you'd get access to all of Goog's servers when you were just starting out with your laptop... but reducing the price of entry for the hobbyist biologist/biochemist to, say, $5k or less is going to be really important.

That's because the role of the DIYBio community is set to wax in a big way. With the NIH budget cuts and the coming fiscal collapse of the US Government, the sun is about to set on a ~70 year period (1945-2015) of centralized US government research. The period before 1945 was more innovative by many measures (e.g. http://www.nytimes.com/2011/01/30/business/30view.html?_r=1), and potentially the period afterwards will be as well.

I agree that the DIYbio folks et al are incredibly exciting. But saying that those projects have to do with "building a Github of science" is putting the cart before the horse. To extend my metaphor for a moment, it's as if the engineers of 1963 had decided to popularize open source software by inventing and promoting Github, instead of inventing the personal computer and Ethernet first.

(As an aside, there are fascinating historical examples of people who did invent the web before inventing the personal computer, like this guy:


Of course, he ended up as many of these slightly-too-early visionaries did: His work lost all its funding and he was kind of sad.)

Build the gel box and then worry about the social network. Or better yet, don't worry about the social network at all: These gel box users will network themselves, no problem. It would be a challenge to stop them from finding each other online.

I don't know much about expensive lab equipment, but if you follow the DIY world, or the open source world, or even Indian/Chinese innovators, you see people building physical stuff at a fraction of the price of commercial equipment.

A few examples to illustrate my point:

1. Open farm tech: a tractor and a brick compressor at 1/3 to 1/10 of commercial prices. They plan a whole set of manufacturing equipment at those reduced prices.

2. Students took pictures of Earth from space using £90 in equipment.

3. A DIY electron microscope a guy built himself [1], probably costing a lot less than commercial ones.

There's a lot more extremely cheap equipment like this. Maybe the first place to open-source science is in support of open source tool development.

[1] http://blog.makezine.com/archive/2011/03/diy-scanning-electr...

I now see a terrible flaw in my essay of last night: I took the cheap and easy rhetorical route of emphasizing the costs of the fanciest equipment. Whereas the thing that kills you in science budgeting is actually the mundane stuff that you need in vast quantities.

Perhaps I should have talked about something boring, like absolutely clean, absolutely sterile containers. There is nothing sexy about containers and pipettes, and they are individually cheap. But they add up. The most reliable technique is to just buy quantities of disposable ones. You can try to save money by using recyclable containers instead, by washing dishes a lot, but washing dishes by hand is expensive even if you don't do it very patiently and carefully. And you need to be really careful, because if you move your cell culture from dish to dish to dish ten times, and even one of those dishes is contaminated, guess what? Your experiment might now be contaminated. And for every Alexander Fleming (a lucky guy whose contaminated experiment turned out to be a Nobel-Prize-winning medical breakthrough) there are a thousand experiments that have to be discarded because the data isn't reliable.

But even that argument is misleading, because the most expensive ingredient in science is not even a material good. It's time. Science is about patience and consistency. Doing an experiment once is not science. Doing it one thousand times and getting absolutely consistent results -- that is science. The work of being a scientist is about carefully building and debugging a reliable sequence of steps ("grow, filter, sort, lyse, plate, stain, image"): A sequence that can be repeated over and over to obtain thousands of data points that are extremely self-consistent.

The reason why professional scientists use such expensive equipment is that the equipment is actually cheap compared to the cost of spending two years taking data that turn out to be full of errors because your tools weren't reliable. Too much random error and you won't see your data amid the noise; too much systematic error and you might eventually have to throw out 100% of your work and start over. Trust me: If you want to experience soul-crushing misery [1], work sixty hours a week taking data for two years, then set the data on fire because it's unsalvageable. I have seen this happen many times. It has happened to me. It happens all the time in science, but you can't afford to have it happen too often.

So, yeah, you can take pictures of earth from space for $200, but can you take the same picture one thousand times, under consistent lighting and from consistent altitude and position? Yeah, you can build an electron microscope in your basement, but will it keep working every day for five years while you take your data? How much maintenance will it require? How much time will you waste waiting for it to pump down every time you change samples, or tweaking the knobs for hours every time to get a usable picture? Will it spread thin layers of carbon on your electronics, or go through a six-month phase where it can't focus well enough to see your samples?


[1] Fortunately, the misery is temporary. If it isn't, you're just not cut out to be an experimental scientist. Science requires reserves of forward-looking optimism! So what if yesterday was hell? It was a learning experience that will make tomorrow better! ;)

Thanks for a very detailed and interesting description of the scientific process.

From the way you describe it, it seems that doing science in your area is not the kind of science that would attract small hacker groups, because it's long and boring.

Hackers tend to contribute to the interesting parts of open source, i.e. languages like Python and Ruby, while organizations tend to do the big, boring heavy lifting: things like Android, Linux, etc.

By the way, what about technologies like combinatorial testing, automation, and miniaturization? What would their effect be on the way biology is done?

I wouldn't say it's boring. Not everyone has the patience science requires but for those that do, there's plenty of incentive to do disciplined data collection.

It's most likely just that the resources required to do it are still out of reach, unlike in the Python and Ruby communities, where your biggest expenses are a computer with a text editor and an internet connection.

Even some computing resources are out of reach for individuals, even if just barely. Amazon hosts public datasets, which is nice, but the 500,000 instance-hours of HPC compute time needed to do analysis is not going to be in the budget of a hacker-in-the-basement (i.e. someone with no funding, whether from grants, revenue, or investment).
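A rough back-of-the-envelope check (the hourly rate here is an illustrative assumption, not a quoted price):

```shell
# 500,000 instance-hours at an assumed ~$1.60 per HPC instance-hour.
hours=500000
cents_per_hour=160
total_dollars=$(( hours * cents_per_hour / 100 ))
echo "approx. \$${total_dollars}"  # prints: approx. $800000
```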

Also, as rapid prototyping picks up steam, this should also (eventually) help with sourcing equipment more cheaply. Lab on a chip (aka microfluidic tech) is a great example of Moore's law making it into labs.

Your "desk drawer in Tech Square" analogy is apt -- right now our publications are locked in a handful of those drawers, and folks can't even open the drawers unless they've bought keys from Springer/Elsevier/ACS/AIP/ACM/IEEE/etc. And how do we find out what's in the drawers? Searching the databases owned by ISI/Thompson-Reuters and friends, again for a fee.

Sure, a "Github of science" wouldn't turn anyone's basement into a cancer research lab, but it would mean that a lot of researchers at less-affluent universities would finally have full access to the literature of science.

I loathe the for-profit journals as much as anyone, but I'm deeply suspicious of any hypothesis of the form "the reason why journals still cost money is that we haven't yet invented the right electronic social network for sharing scientific information".

We've had the technology to publish science online for decades. We have tinkered with it dozens of times. The web was originally invented for exactly this purpose. Far older things, like TeX, were invented for this purpose. Nowadays we have everything from PLoS to arXiv to Google Scholar to custom in-house blogs to PDFs sent through email.

The continued existence of for-profit journals is an economic, political, and anthropological problem, not a technological one. PLoS and the like are slowly changing things, but I still suspect that the only way to free our journals within less than a generation or two is to lobby (e.g.) the NIH to require that their funded projects be published in free journals. When a grant agency talks, people listen. When postdocs talk, alas, it makes a very subtle sound. ;)

I agree with your statements, but perhaps the appeal of a github for science isn't the technology, it's the culture that comes with it. You can run your own git servers and submit patches over email, but when you can fork with the click of a button and do a pull request with another click, it encourages that much more sharing.

I've had this discussion with some of my professors, mostly just about open sourcing research code (I'm in Scientific Computing), and some of them won't do it because they want to squeeze a few publications out of one code and don't want anyone 'stealing' their publication. I find it disturbing, but it's ingrained in the culture. Changing it is important to me, but I don't see how yet.

The NIH is already actually doing that, at least sort of: any article reporting NIH-funded research has to be submitted to PubMed Central, from where anybody can view and download the full text, figures, etc. I believe there's an embargo exemption that allows journals to hold off on submitting to PMC for a few months or maybe up to a year, but after that it's public.

Right now, it's just NIH, but it's only a matter of time before AHRQ and the other biomedical funding agencies get in on the action, and there's no reason (in principle, anyway) why NSF or DOE couldn't also join in or do something similar on their own.

Where do Google Scholar and CiteSeer fit into this analogy?

Can't speak for other fields, but in CS any outlet worth submitting to will let you post a preprint on your own website, and from there Google Scholar will pick it up and place the link to your free copy right next to the paywall link. It's not a perfect system (it's like an extreme form of price discrimination, like when Microsoft turns a blind eye to piracy in emerging markets), but it does give access to those who can't pay, and it's not as dystopian as you make it out to be.

I think that CS is a lot more open than other fields, even putting preprints online and soliciting feedback before publication (that's unheard of in chemistry and biology). You're right that some authors put PDFs of their papers on their websites and Google Scholar finds them, but I frequently need papers that I can only get through interlibrary loan (and I work at MIT; even a relatively wealthy university can't afford everything).

Thanks for the love =).

I agree about funding being an ulterior motive for a hiring committee to look at publication history. In my defense, I brought it up about 2/3 of the way through with:

"We need trusted ways to quantify just how useful that API and associated code are to the scientific community, which can be listed on a scientist's profile and utilized by committees making hiring and funding decisions."

Science is constrained by resources, like everything else, but that doesn't mean scientists shouldn't try to optimize what can be done with the resources available.

Also, you imply that good science requires your own experimental data - and bearing associated costs - but that's only true when experimentalists aren't incentivized to share their data.

Shoulders of giants and all that.

If I have seen further than others, it is only by standing on the shoulders of grants.

Do you need exclusive access to your very own scanning multiphoton microscope? Seems like laboratory virtualization would go hand in hand with open source science.

You can share them. Many labs do. Indeed, most of these things are probably shared in one way or another.

And you're correct: The trick to encouraging open source science is not to focus on the social networking tech -- that will be ready when you need it -- but to first attack the problem of doing quality lab work on the cheap. That's where the bottleneck is.

The biggest problem with shared facilities is the tragedy of the commons. In engineering -- or machining or woodworking or cooking, for that matter -- you quickly learn the importance of having your own tools. It only takes seconds to ruin a good tool. It only takes seconds to contaminate your cell culture, or your neighbor's cell culture, or an entire room full of your department's laboratory mice.

And mailing your samples off to a distant "virtual" lab is fine if you're studying disposable samples, or inorganic samples, or samples that have been permanently fixed and preserved on a glass slide. But living cells ship poorly even when you're allowed to ship them at all, and animals ship even more poorly than that. So often you've got to live next door to the equipment you're trying to share, and that's still expensive.

The cost of maintaining your own tools is quite high, as is the expertise required to keep them in working order. It seems to me that the inexperience or lack of skill that leads to shared equipment getting ruined stems from extremely sensitive equipment being handled by students and junior scientists with little engineering background. The same argument could be made for the value of having your own servers. The scientific equivalent of a datacenter would necessarily entail staffing by highly qualified, experienced personnel. Such expertise would undoubtedly be costly, but it would be amortized over a large amount of equipment.

Staffed by whom? There are actually some user facilities for materials synthesis. The problem is that if someone is making someone else's material instead of working on something they're interested in, I don't think you're going to get the same level of productivity out. Also, materials synthesis takes time and running through lots of dead ends. When I was a grad student, a postdoc from a collaboration would drive in to our lab with powders she had made, try to grow a crystal for a week (sleeping maybe an hour or two a night on the floor of our office), and then go back to her home institution and return a few weeks later. That just doesn't work: you don't have the responsiveness to figure out which dead ends you're wandering down. Imagine if you were writing code and could only run it once every few weeks. Now imagine it was thousands of lines of nontrivial code and you were trying to debug it that way...

Staffed by scientists and engineers, I imagine. You can get some pretty sophisticated parts built by foundries. At one point, making a microchip was prohibitively expensive. Nowadays you create chips basically by coding them in high-level synthesis languages. Spinning a chip does take weeks. Obviously when a process takes longer to carry out, with high iteration cost, careful methodology is called for, and simulation and error-checking software becomes valuable. Heck, once upon a time computers themselves were massively expensive, among the most expensive machines built. Computing wasn't born cheap; mass production and decades of technological advances made it so. Machines for combinatorial science may someday be more effective than graduate students.

My lab has a SEM. It costs $50/hour to operate. I have been led to understand that most of that cost is to cover if something goes wrong (which means we can't lower the per hour cost by having more people use it).

Think how much more expensive coding would be if it cost $50 per compile (or hell, $10 per compile).

$50/hr seems quite reasonable. I doubt I'd need more than 100 hours a year of use, and even if I required several hundred hours, that's still readily affordable.

You may not need exclusive access to one, which is why a lot of institutions set up "centers" for this type of work (a center for visualization, for example). Then multiple people can share the equipment and cost, if you can recruit people with a similar research focus. So you may not need one exclusively, but you do need one locally available.

Seems like many samples could be prepared locally and shipped to the analysis center, a la Netflix. Another possibility would be making samples to order at a location near where the necessary equipment is located. For instance: I need a brain tissue sample from xxx type of mouse, infected with pathogen yyyy.

That is the kind of thinking that is holding the life sciences back: lack of openness, obsession with centuries-old journals, etc. Starting in computational neuroscience, and coming from a physics background, I was shocked by how little info neuroscientists share with each other. I can't even access most of the papers I need from home. There is no repository of data or even of computational models (well, there is ModelDB). There have been several attempts to make the life sciences more open on the web, but it's not working because there is no culture of openness. For crying out loud, most of the journals don't even have an online comments section. Big, ambitious projects are missing (think LHC-scale), and the pioneers there are private institutions like the Allen Brain Institute. Maybe it's not possible to make a GitHub for science, but it's certainly possible for a high-profile university to require its staff to publish in open access journals only. That would be a start.

Perhaps not at the academic level, at least not initially, but drawing more people into science by making it easier to ask questions and get answers couldn't hurt. http://area51.stackexchange.com/categories/7/science

Someday there might be 1,000,000 well-defined science/math questions, along with great answers.

There are science sections on Quora as well, but people tend to post too many general/pop-sci/naive questions.

My goal for http://bibdex.com is to be this. I based the software on a wiki (the original name was Bibwiki). The idea was to build literature reviews on topics that you could reuse and share with colleagues privately, or publicly with the world.

I realized after starting that scientific communication is more complex, or at least it tries to be for various reasons. I could use help learning what people want from such a system.

I am keen on feedback or insights to drive my development. Please, if you are interested, you can reach me at sunir at bibdex com.

I agree (http://www.quora.com/What-online-tools-do-scientists-wish-ex...). Why don't we just start using GitHub itself to do this and go from there? The pain points will suggest ways that a real science-focused github could improve on GitHub itself.

The problem right now is the GUI. There are no good GUIs for working with git. Windows and Mac OS X have some GUI tools that scratch the surface of what's possible with git, but none come close to opening up its full possibilities. Linux has a few very alpha, special-purpose GUI tools (e.g. for showing branches).

If we want non-programmers to use git, we need GUIs that instantly visualize the state, the available commands, and the other possibilities. No non-programmer is going to learn git using a CLI.

Perhaps, if it caught on enough. Plenty of people who were not programmers have had to learn LaTeX, which is IMHO not a trivial feat.

Not that I'm disagreeing with you, but making git point-and-clickable doesn't strike me as being very simple.

I agree.

I think GitHub is such a good tool for interacting with git repos that if they made a version that worked locally (the main difference being explicit handling of the index and the working tree), I'd use it to manage git projects in a heartbeat.

http://science.io was featured on HN recently. Not GitHub, but at least a place to discuss and sift research.

I tried to start a science network a few years ago, knowble.net, and I know this problem well. The main roadblock we faced was the "publish or perish" mentality. Luckily this mindset seems to be shifting & the idea of a 'GitHub of Science' is very powerful. Much more than a Science LinkedIn, which is what Knowble was.

The main unanswered questions for this idea are 1) Funding & 2) Maintenance. Knowble was a for-profit venture, but should have been a non-profit organization. So where can you/someone get the funding to build & maintain the site?

If you need a python hacker to help out - my email is emile.petrone (at) gmail.com

Thought of starting a science network myself... mind if I email you? I'd like to know more about what your roadblocks were.

One issue I see is what branch of science we are talking about. Physics largely seems to have this figured out via arXiv.org, but funding for molecular biology / medical research is heavily dependent on publication record. I'm not sure about computer science. But my point is that when someone says "Science" needs X or Y, no one is speaking the same language.

These comments are evidence enough of this. Some have already mentioned arXiv.org, and others Science.io, which seems to be specifically targeted at CS. When you add medical research, the needs of these branches are vastly different.

Good ideas, but I disagree that you need a Bill Gates to make it happen.

The way this will happen is that a grad student hacker who is avoiding working on her thesis will start coding it, and then create a Kickstarter asking for support to spend the summer working on it. If she's a credible engineer, she'll get the support easily, every subsequent Kickstarter grant will also be fulfilled, and it'll get built.

If you build it (right) they will come.

You may not want to believe it, but this is not an engineering challenge.

The programs on GitHub were written by amateurs. Professionals can do better: compare Python, PHP, and Gnuplot to Mozilla, Scheme, Haskell, Plan 9, and Mathematica. But evidently people can keep their day jobs and still write good programs.

Science is different. The amateurs are called cranks, and a small community of professionals does the good stuff. (There are exceptions, but few.) The basic issue is who will pay their living expenses, and buy the million dollar machines that they work on.

These days, almost all research money is spent by governments. They spend most of it rewarding people for publishing in prestigious journals. Scientists will keep packaging their research that way until someone starts buying it in a different package.

Most of the interesting projects on GitHub were written by professionals, and they are far better quality than the average "professional" day-job application, because they are a developer's passion and not constrained by business demands.

I do not believe open source solves all problems, but you dismiss the incredible value and quality of so much of it that I find it difficult to take the rest of your comment seriously.

It sounds more like "People with credentials (whether scientists or professional programmers) are the only people who can produce quality work. I have credentials. I'm part of the elite who can do quality work."

You're mixing language definitions and implementations. You're comparing software that was created by big companies with software created by small teams or single developers. I think good commenters can do better.

BTW, you're implying that governments are funding over-priced journals that people outside of prestigious institutions cannot afford. Even though that's true, it's not the way it should be.

We also need a GitHub of government / legislation.

The most pressing problem of our modern information society is the abundance of crap.

The service peer review provides is the filtering of crap, so that not everyone has to do it for himself. This makes science possible, as no one can be a master of all trades.

Publication without review is called "journalism".

As a side note, I believe that Elsevier has acquired an extreme market dominance in the scientific publishing sector and is abusing it in alarming ways.

Yes we do. Not just a replica of it with different content though, but a separate product tailored to the needs (and wishes) of science, sharing only some of the core ideas of GitHub. Sometimes I wonder if I should welcome the surfacing of ideas that have a large overlap with my own, or be anxious knowing that my lead has possibly been somewhat reduced.

I just met the brains behind Opani (http://opani.com) last night and they are a huge step in this direction.

Opani is actually a huge step in the right direction. I think Marcio's post is fantastic. I would in fact add an additional idea to it.

In addition to the open prestige inherent in GitHub, there is also the fact that one's work is vetted by a community. It becomes very difficult, if not impossible, to publish crap and claim that it is quality. In science this is not the case. The peer-review system is supposed to protect us against that. However, my understanding is that a surprising percentage of research in top-ten journals can't be reproduced, either because key details about the implementation are missing or because it is actually not reproducible.

A GitHub for science could also meaningfully move the ball forward in making science reproducible, as it should be easy to wrap one's scripts in a specification of an "environment" that can be readily set up, deployed, and run. A lot of work would be required to develop analogues for non-computational scientific domains, but it would be a hugely valuable effort, as discussed in the general reproducible research community (http://reproducibleresearch.net/).
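For the computational case at least, the "environment" idea can be sketched in a few lines of stdlib Python. This is only an illustration (the function names are hypothetical, not from any real tool): record the interpreter, OS, and installed package versions alongside every run's results, so a reader can diff their environment against yours before declaring a failure to reproduce.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Snapshot the interpreter, OS, and installed packages for a run."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
        },
    }


def run_with_provenance(experiment, out_path="results.json"):
    """Run an experiment and store its results together with an
    environment snapshot, so the run can later be audited."""
    record = {
        "environment": capture_environment(),
        "results": experiment(),
    }
    with open(out_path, "w") as f:
        json.dump(record, f, indent=2, default=str)
    return record


if __name__ == "__main__":
    # A toy "experiment": any callable returning JSON-serializable results.
    record = run_with_provenance(lambda: {"mean": 0.42, "n": 100})
    print(sorted(record))  # ['environment', 'results']
```

A real system would of course go further (pinning data files, random seeds, and hardware details), but even this much metadata attached to a published result would help with the "missing implementation details" problem.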


I think we need a GitHub for any kind of information.

Do you think something like http://pubcentral.net could be useful in that direction?

The biggest problem with CS academics has been their misinterpretation of computers. It's a very different field from traditional sciences like physics and chemistry. In the traditional sciences, we study the world, understand it, and express those ideas formally. With computers, what you can do is up to your imagination. We just get so lost in the depths of formalism that we forget that hacking and exploration are what break boundaries and enable people to make computers do what they couldn't before.

Funnily, academia harbours the most brilliant minds in CS, yet barely produces usable software. It's the people who identify problems and provide software and ideas who actually get things moving. GitHub, the blogosphere, etc. allow such solutions to emerge more efficiently by letting a lot of people look at them. In academia, a publication is taken as the end point of problem solving. There are no incentives to build real software or real systems.

If computer science wants to make a difference, it must move away from its publish or perish culture.

Computer Science is not about building real software or real systems; that's Software Engineering. Your comment strikes me as missing the whole point of CS. Maybe that's because of the misleading English name -- as someone said, Computer Science is no more about computers than astronomy is about telescopes -- but what kind of person would call astronomy "Telescope Science"?
