Ask HN: What can I do to accelerate scientific research?
156 points by mariushn on June 15, 2019 | 171 comments
I love science & tech and how these improve lives. As a software engineer/entrepreneur, in the last years I thought of starting/contributing to some projects which scientists would find useful. Now I'm ready to work full time on this.

Ideas revolve around

* indexing all open research with free unlimited access, similar to arxiv-sanity.com but better. Other projects exist though: Google Scholar, semanticscholar.org, academic.microsoft.com, https://www.chanzuckerberg.com/science/projects-meta

* generative design

* bioengineering (not sure exactly what, eg microbiota simulator)

* materials simulator (eg how can we get a material having a given set of properties)

I don't need immediate financial returns, but I do need the work to be used & have an impact in real life projects.

What ideas do you have on how one can accelerate scientific research?

This is a pipe dream of mine, but I would love a Wikipedia of null results: nullpedia! Nobody publishes null results because they're not exciting, but I think a lot of NSF money would be saved if "failed" experiments were aggregated somewhere in a searchable way.

There are lots of questions. How to organize it? How to encourage participation? How to maximize usefulness while minimizing volunteer effort? How to encourage discussion (suggesting changes for a better experimental design) rather than manipulation (stealing the seed of a bad experiment to publish at your better-funded lab)?

I don’t know how to do it, but I think if done right people would really like it.

I am not sure this would be as useful as the new method of "pre-registered studies" [0], where null findings appear automatically in the publications. Null findings are not the goal of this approach, but a side effect. What matters are the initial hypotheses, which pass peer review.

[0] https://www.nature.com/articles/d41586-018-07118-1 via https://news.ycombinator.com/item?id=18297724

The Series of Unsurprising Results (SURE) Journal is starting to do this, but only for economics papers: https://blogs.canterbury.ac.nz/surejournal/

Vox recently ran a good profile on them and the importance of null results in general: https://www.vox.com/future-perfect/2019/5/17/18624812/public...

Snopes and Mythbusters showed how you can actually make entertainment out of "This thing we thought might be true, actually isn't true / doesn't work at all".

As for the fairness issue, my first thought is to have it moderated by scientists but from different fields. It's like we do in other areas, like law. You need an expert on the general process, but if they have any personal connection to the topic at hand, they must recuse themselves.

In a sense, it's just a specific type of scientific journal, right? An online journal of only null results.

In a similar vein, I'd love to see some form of reward for replication studies, which are also hard to publish, since they're not novel and exciting. However, due to this focus on novelty, we got plenty of unreproducible results: https://en.wikipedia.org/wiki/Replication_crisis

In my ideal world, a scientific result would not be taken as meaningful until it's been replicated at least once.

Good idea, and great challenges indeed. Taking this further, I'd call it experimenpedia, with experiments published as they are thought out. They'd be open for comments/review, with related experiments shown, before the actual experiment is done. This might prevent potential failures and let owners tweak the planned experiment before actually running it. Then do the experiment and publish the results, whatever they are.

I guess secrecy would actually win though, and nobody would use this?

eLife is sorta like this. The experiment isn’t live, but the publishing process is.


It’s pretty interesting

To do this, it somehow needs to gain the reputation of a publication while also not encouraging people to do merely frivolous research (at least not on their own dime).

But I think this kind of thing would be INSANELY useful. Especially if data was attached. (This could possibly help with the above problem because reputations develop).

Great idea, would love to see it.

Related: there were journals like this at some point, e.g. the International Journal of Negative & Null Results, or the Journal of Articles in Support of the Null Hypothesis, and I think at least one more. Not sure how they fare, though.

Provide funding for permanent positions. I'm currently in the last few months of my first postdoc in condensed matter physics (think superconductors, quantum computers etc.) but will move into industry next year for lack of perspectives towards a permanent position in a reasonable place. As far as I can tell, my research so far was not substandard and the software I wrote has enabled quite a few projects which otherwise would have not been possible or taken much longer. Most people I talk to (both inside and outside academia) express some degree of disappointment over people like me leaving (after 4 years PhD + 2 years Post-doc) but none of them put their money where their mouth is.

To be clear, I can understand that a PhD candidacy justifies a temporary contract and I'm not even asking for a permanent position directly after a PhD (as would be standard in industry), I'm only asking for a reasonably safe perspective towards a permanent position reasonably soon after graduating. Can't exactly start a family if you don't have any kind of job security beyond the next couple months.

In the physics PhD version of “it gets better”, after a few years in a career all the past folks will fade away. You are making the right choice.

Your comment essentially echoes complaints from academia about the most capable people leaving for the private sector, which often pays many times more than what they'd get, if lucky, by staying. Somehow it has become the norm to bankroll grand buildings of little value instead of ensuring reasonably paid jobs that last more than just a year or so...

Excellent point. I can't afford to provide funding, but I can fund myself for 3 years to work on useful software.

> the software I wrote has enabled quite a few projects which otherwise would have not been possible or taken much longer

What's the common practice with such software? Is that published somewhere, open sourced? Or kept private in hopes of being monetized, with IP owned by the author/university?

At the moment it's "available within collaborations". My former supervisor has had some bad experiences with people using his open-sourced software without acknowledgement etc., which is of course not quite ideal if you actually want to build a career in academia for yourself. Monetisation is not really an option.

My toolkit is maybe a bit non-standard in that it has attracted a few external collaborators using it as well and I like to think I have taken better care of upholding coding standards, documentation etc.

Normally software in my field is kept within a group and dies after one or two PhD students have left.

This is a very sad state of affairs, therefore I would like to bring your attention to a petition towards open sourcing all scientific (and generally tax-paid) software:


This is a great initiative. It should not be limited to software: research papers, databases, and many other things that are currently either not available at all or are behind paywalls should be released.

For research papers, there's also this open access initiative which is gaining support: https://www.coalition-s.org/ https://en.wikipedia.org/wiki/Plan_S

Does this call for avionics software for the F-22 (and similar taxpayer funded software) to be open sourced?

There would definitely need to be some exceptions to such legislation, as usual.

Agreed. I just started my second postdoc job in physics, and 30% of my time goes into paperwork for registering in a new country while already applying for new positions. This is by far the biggest time waste I see.

The biggest problem scientists in academia face is that they can’t do actual research. They spend most of their time doing stuff that doesn’t matter. So probably the biggest leverage you have is to reduce the time they spend doing stuff that doesn’t matter.

The most effective way to do that is to establish an alternative form of institution that focuses only on research (and teaching if you like - teaching is not the problem). A tough challenge. One line of attack would be to contrast your costs with academia's full economic costing model.

Failing that, here’s some things you can do:

- develop a paper reference system that actually works well. Mendeley is the best there is and it’s (IMO) rubbish. Poorly designed.

- build a typesetting system that’s an alternative to latex but is WYSIWYG. Particularly important outside of CS.

- build a free conference organising / review tool that works. EasyChair is popular and utter garbage.

- build tools to automate the grant writing process. A step by step system to create a grant proposal, tailored to each grant scheme. Yes, this would potentially damage the grant application process. But it doesn’t make any sense anyway, so at least this would free up a few years where academics could churn the required proposals out fast until you were somehow disallowed.

- provide free slide materials, examples, exercises for all the main CS etc topics. Some kind of “piece it together” kit so lecturers could save time making slides and other materials that tick boxes. Diagrams, for instance, would be very useful. Pseudocode too.

NB you probably won’t make any money.

>develop a paper reference system that actually works well. Mendeley is the best there is and it’s (IMO) rubbish. Poorly designed.

Zotero works wonderfully for me. And it's open, which is a must.

I'll preface this by clearly stating that Zotero is absolutely indispensable; I wouldn't be nearly as organized without it. It's very important to me that such tools be open, and Mendeley in particular is a complete disaster in this regard (see their history with encrypting user data).

That being said, Zotero is very much a "least worst" tool in my opinion.

* Overly rigid in how it goes about modeling document types and metadata fields.

* Doesn't handle bookmarks, browsing history, and other various data types. At first glance it's easy to dismiss such things as out of scope, but I find that my typical workflow results in reams of such unsupported data being generated and manually tracked by me. The problem is that this unsupported data is often tightly coupled to the data I'm managing with Zotero, which is frustrating to say the least.

* An incredibly heavy and inefficient piece of software.

* It's far too difficult to set up and manage my own sync server (last time I checked, at least). I don't really want to share all my data with the developers, but it's very inconvenient not to do so.

More on topic with the broader discussion - knowledge and data management in general seems to be a largely unsolved problem, particularly in science and particularly regarding interrelations between and versioning of arbitrary pieces of data.

Why don’t data versioning tools like git lfs do the job? Is it lack of awareness or is the problem more complex than that?

Well I could just be unaware of some functionality they have, but all those tools do is version things. There's no integration with reference managers like Zotero (that I'm aware of), and no tracking of interrelations or metadata.

In contrast, Zotero (and other reference managers) don't do any versioning at all (at least that I'm aware of). Instead, they keep track of the metadata that's necessary to put together a works cited section for an academic paper.

... or at least that's what they started out doing. These days they also try to organize your papers into some sort of category structure, facilitate tagging and notes, provide synchronization between your devices, and probably a few other things that don't come to mind right now.

Feature creep? Sure, but all that stuff is central to the research and writing process. It's also all tightly coupled, so splitting it between multiple tools doesn't work very well. And that's the current problem - how to integrate, for example, a few of your browser bookmarks with your academic literature collection. Or how to track a list of all the papers cited by a particular paper. Or link a specific paper tracked by your reference management software against a specific version of a large data set, perhaps itself tracked by Git LFS.

Generalizing a bit, what about linking experimental notes (typically pen and paper) with data collection software (typically a binary), as well as the collected data (perhaps Git LFS), as well as a specific version of some data analysis scripts you wrote (perhaps Git). Now try to track everything as you work on multiple paper revisions with collaborators, each version of which adds (and sometimes removes) citations and could use a different (likely newer) revision of the collection software, data set, or analysis scripts.

Alternatively, for a data management scenario not directly involving writing papers consider molecular cloning using plasmids. You have a dozen semi-related tubes in a cryogenic freezer that you need to track over many years (ie long term inventory management), each of which has one or more pieces of sequencing data attached to it (so a small data set), they're all interrelated (you create a new one by physically modifying an old one), and each has the typical meta-links to experimental protocols, notes, academic literature, and other things.

I'm not aware of any software solutions that comprehensively address all of this stuff, so people still use pen and paper. But pen and paper is time consuming, it's error prone, it doesn't sync between devices, it's slow and tedious to cross reference - all the typical problems that software is good at addressing.

> - build a free conference organising / review tool that works. EasyChair is popular and utter garbage.

This is what we're doing at OpenReview.net. Obviously we still have a lot of work to do, but we're making progress and are actively looking for more developers that want to help make researchers' lives easier.

Slightly smaller-scale than most suggestions here, but for the average nonscientific HNer, the best way to help scientists is to improve their programming tools. In the Python ecosystem, for instance, numpy/scipy, scikit-learn, and matplotlib are widely used across dozens of disciplines, and are open-source projects relatively welcoming of new contributors. Julia is a whole new language for scientific computing, where all the fundamental tools are still being built and refined. Raspberry Pis, 3D printers and other “hobbyist maker” tools are appearing in research labs to help develop novel instrumentation, so hardware-oriented people can help by contributing to open-source efforts of that kind.

Somewhere we computing folk lost our way - nobody needs software just for the sake of it (with the exception of maybe games). Everything we do is supposed to be building tools for other people who do the real work to make their life easier. As opposed to how it currently is which is to provide something for free, then put up artificial barriers to certain parts of it and charge for their removal.


Come work with us at Plex Research. A huge problem our founder ran into while doing drug discovery research was that there's tons of data in the world, but no one is using it because it's all in thousands of different places. As a programmer, it was crazy to me that even the most advanced organizations in industry (Novartis, Sanofi) or academia (Harvard, Stanford, ...) are still keeping many important datasets in Excel.

We're pulling together all of the world's biomedical research data, structuring it as a graph, and allowing researchers to access all of it as easily as Google does the web:


> indexing all open research with free unlimited access, similar to arxiv-sanity.com but better

This space is pretty crowded, in my opinion.

I don’t know much about biology, but I can tell you that in materials, it’s all about data. The materials design problem and predicting new materials comes down to knowing properties of other materials. A lot of progress has been made by using datasets generated by quantum mechanical calculations by the Materials Project, OQMD, AFLOW and NOMAD, but materials design is tricky because what we want to predict are the outliers that we haven’t seen yet: materials with the highest strength, etc..

There’s value to be created for materials researchers by curating experimental data in a digital, usable form, since so much is locked up in papers. But you really need domain expertise for this, and there’s another problem: the experiments are so sparse and have so many features (chemistry, microstructure, thermal history, etc.) that people have really only been successful when focusing on particular classes of materials.


You might find this company interesting.

I actually know and work with several people there :)

Appreciate your insight!

Note: The following is an earnest suggestion and not just a recruiting plug, but it is also a recruiting plug. If this is not an appropriate place for this content then comment and I'll remove (I do not frequently HackerNews)!

Personally, I believe the highest leverage thing we can do to promote scientific research is to help connect the dots between related discoveries in the service of bringing new ideas to commercialization/impact as quickly and reliably as possible. I joined Citrine (https://citrine.io/platform/) a few years ago thinking that we would help accelerate research by direct support - building simulation, data, and machine learning tools for scientists. It became clear very quickly that the community was already in a pretty strong position (smart cookies, those research scientists) on that front. The biggest opportunity turned out to be scaling expert knowledge - bridging the no man's land from ideation to scale-up, integration, and manufacturing. At Citrine, we're building infrastructure that helps researchers contribute to an organization-wide knowledge graph capable of supporting inference in the scale-up or manufacturing context based on relationships learned in the R&D phase. This fundamentally changes the ROI calculus for basic research because it can plausibly support the entire product life cycle.

If this sounds exciting to you, then consider joining us! We just raised a Series B and are growing quickly.

If you have an applied math and software eng. background and want to help generalize and scale our property inference infrastructure, then I think you'd enjoy working with me and my team in SSE: https://citrine.io/careers/#scientific-software-engineer.

If you have a backend software eng background and want to build distributed services for scientists and engineers at some of the biggest materials and chemicals companies in the world, join us in engineering: https://citrine.io/careers/#sr-backend-software-engineer.

If you want to help software related AI research, start compiling massive high quality training data sets and giving it away for free. Easier said than done though.

No immediate financial return? I hope you can accept no financial return, period. In general the easiest way for an individual to accelerate general research is through generous funding. But even then it’s not like a slider in a game where you provide more funding and things get done faster. There’s diminishing returns after a point. Not that I’m trying to discourage you, but I hope you’re thinking about it the right way before you waste a lot of time and money.

I suggest talking to actual researchers and asking them what they really need, and give them that. Basically the same as a startup going out and talking to customers. The only research probably being done around here is largely software related, and probably not changing the world much in ways that actually matter.

I second creating a large high quality dataset to advance AI. It would be especially effective when combined with a competition - Imagenet competition jump started DL revolution in 2012, and currently there’s no obvious successor. What’s lacking in AI is models with “common sense” and a “world model”. A dataset/competition to develop such models could cause the next AI revolution.

> I suggest talking to actual researchers and asking them what they really need, and give them that.

Will start to do that, thanks!

> No immediate financial return? I hope you can accept no financial return, period.

That would be ok for the next 3 years.

> That would be ok for the next 3 years.

No, I don’t mean no returns for three years, I mean no returns ever. You must go into this with both eyes open, don’t find yourself crippled later because you gave all your time and money away and have nothing to show for it.

What if he sold models? A marketplace for trained models.

Some of this already happens. But you also have to realize that that is not in the spirit of science.

Got it. It's not, agreed, but nobody should work for free. In fact, I believe that doing what OP wants to do is more harmful than building a business around his intentions.

I agree, no one should have to work for free. But there are two competing aspects here. Science is about seeking out knowledge and advancing humanity. But unfortunately we need to buy food to eat.

I’d second this. Many labs, including ours, are happy to discuss challenges and opportunities with serious outside parties. If you emailed a few labs near you with a detailed explanation I’d expect you would get multiple responses.

Will do, thanks!

One of the big challenges for university researchers is trying to find talent and dedication in the enormous pool of undergrads and masters students. Nearly every professor relies heavily on the recommendation system or grabbing from the pool of students in a class. The problem with this is that it's often hit or miss with a net-neutral return on investment.

If I send a PhD student to spend a certain amount of time independently training two students, then I am investing that grad student's time into something that could be spent doing research. If one of those students is flaky and is mostly there to pad their resume, it's largely a lost investment (even more so if their work requires extra effort to fix). If the other graduates by the following semester, the gained productivity might not exceed the other's loss by the time they leave.

The best situation is when you're working with an undergrad who is destined to continue on to the PhD program because you know they'll be dedicated and possibly have a head-start on their thesis work. If you could figure out how to connect these students with advisors using metrics and private social networking, then you will amplify their productivity significantly. You'll improve the likelihood professors will take on undergrads and potentially push researchers into the field earlier.

How you achieve this, I'm not entirely sure. Perhaps you can make it easy to set up competitions which test the skills you need. The winners will be asked to join the lab to work on some project. If some lab on the other side of the country which does similar research also wants to do the same competition, make it easy for them to share and run it.

I mean, you could pay them more than approximately nothing...

I have a request. Somebody needs to make matplotlib, but with a C API and lots of language bindings instead of only a python API. For example when I'm writing rust, my best option is to use a Python interop library that calls matplotlib...

Matplotlib is better than nothing, but... as I understand it, it was inspired by Matlab's plotting - and is therefore a bit quirky when translated to the Python paradigm. I use it, though.

I think having a built-in web server via a visualization library that shows the graphs etc in a browser is 'optimal' -- because then, the whole crap of e.g. dealing with Tkinter to make your windows goes away. You achieve OS independence, and could even use lynx for text-only.

The Python visualization space is vast. I believe 'something better' will shake out in the next few years.

Locking it down to the web platform is IMO too restrictive - if it were instead a low-level library you could access it anywhere, including the browser (if you wrote the server on top of the library). Sometimes you want to use GTK, for example if you want lightning-quick startup times, or weird portability, or anything else like that.

Python is kind of the gold standard in many areas of research, no? Why would you use Rust, or even anything but Python?

I'd only touch Rust (or C/C++) when I need to implement some fast numerical computation that does not already exist in Numpy or Tensorflow, but still call it from Python.

I've sped up a simulation 100-fold by porting a 20-line function to Rust (it had a terrible access pattern for numpy), and could probably have sped it up another 10x (that part was closer to reasonable for numpy).

If the thing had been written in Rust in the first place, tons of time would have been saved on trying to optimize Python, on waiting for simulations to complete before (and to a lesser extent after) I ported a portion of it to Rust, and on dealing with language interop and build systems.

The main reason I can't suggest that for future similar problems to the person I did this for is the lack of libraries, plotting in particular (plotting is by far the most important one; numpy is second, but Rust comes a lot closer in that regard).
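As a hypothetical illustration of the kind of access pattern that's terrible for numpy: any loop where each step depends on the previous one (here an exponential moving average, purely as an example) can't be vectorized naively, so it stays a slow Python-level loop and becomes a candidate for a Rust/C port:

```python
import numpy as np

def ema(x, alpha=0.1):
    """Exponential moving average: out[i] = alpha*x[i] + (1-alpha)*out[i-1]."""
    out = np.empty(len(x), dtype=float)
    acc = x[0]
    for i, v in enumerate(x):  # sequential dependency: no simple numpy vectorization
        acc = alpha * v + (1 - alpha) * acc
        out[i] = acc
    return out
```

At a few million elements, a loop like this is typically orders of magnitude slower than a vectorized numpy expression, which is exactly when the interop gymnastics start.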

People who mostly do that weird numeric stuff find themselves wishing they could be free from that python interop, for example when 99% of your code is unique rust and at the end all you want is a graph. Another use case is when you want to write an application like ImageJ, to be used by (not necessarily computer-y) scientists other than yourself, where you want fast startup times and all-around good performance. Being able to do away with the python interop won't shatter the Earth but it would be nice. Besides, I love Python, everybody loves Python, but it's not the final end state of programming language evolution (isn't Haskell? ;).

Oh god yes how did I forget. Bury Matplotlib forever, please, someone, for the love of science.

Probably not what you want to hear, but putting a matplotlib REST service inside of a container would be a pretty good start. Apps can talk over a local IP, the interface should be usable from curl.
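For what it's worth, a service like that is only a few dozen lines in Python. A rough sketch, assuming Flask and matplotlib are installed (the `/plot` route and the JSON payload shape are made up for illustration):

```python
import io

from flask import Flask, request, send_file
import matplotlib
matplotlib.use("Agg")  # headless backend: no Tkinter, works in a container
import matplotlib.pyplot as plt

app = Flask(__name__)

@app.route("/plot", methods=["POST"])
def plot():
    payload = request.get_json()  # e.g. {"x": [1, 2, 3], "y": [1, 4, 9]}
    fig, ax = plt.subplots()
    ax.plot(payload["x"], payload["y"])
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure; matplotlib keeps them alive otherwise
    buf.seek(0)
    return send_file(buf, mimetype="image/png")

# run with e.g.: flask --app plot_service run --host 0.0.0.0 --port 8000
```

Then any language with an HTTP client (or curl) can get a PNG back: `curl -X POST -H 'Content-Type: application/json' -d '{"x":[1,2,3],"y":[1,4,9]}' http://localhost:8000/plot > out.png`.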

That's really close to how gnuplot is used.

Unix pipelines are the REST services of 1978.

Some interesting companies at the intersection of science/engineering that are hiring software devs:

- https://www.ginkgobioworks.com/ Synthetic biology engineering

- https://www.benchling.com/ Online LIMS

- https://neuralink.com/ Brain Machine Interface

- https://strateos.com/ Programmatic Cloud Lab (disclaimer, I work here)

Sci-Hub removes barriers to accessing scientific information. Their infrastructure is slow and has trouble. If you are looking for real impact, help them scale.

I don’t think this is a comprehensive answer, but if you want a nice summary of how climate change might be impacted by ML/DL applications, this is hot off the press: Tackling Climate Change with Machine Learning, https://arxiv.org/abs/1906.05433

There was a recent NOAA conference on this topic as well, you can view the slides here: https://www.star.nesdis.noaa.gov/star/meeting_2019AIWorkshop...

I think there's a lot of room to improve tools in bioinformatics. In practice, bioinformatics pipelines tend to be bundles of loosely organised Python scripts, and I've heard files on the order of a few GB described as Big Data because the processing times are so slow (days for stuff that could take milliseconds).

It would help to pair up with practicing scientists and explore what parts of their workflow can be improved

Definitely. A lot of work is done in R or python and could be sped up in a compiled language.

Yes. I’m not a scientist, but I think if there were good compiled Python/R alternatives, the scientific world would benefit greatly, if only for the reduced waiting times... Maybe a language with Go's simplicity and speed and Python's ease of use and appeal... It should have an almost real-time compile mode (with very little optimization) to enable interactive playgrounds, like Jupyter. Of course it’d also need strong optimization modes for final production code.

On the intersection of biology and machine learning, there is one of the holy grails of science: protein structure prediction.

I'd recommend starting reading about Google's AlphaFold, since this is currently considered state of the art in the field: https://deepmind.com/blog/alphafold/

> What ideas do you have on how one can accelerate scientific research?

I work in genetics (as a software engineer).

If there's a major flaw in current scientific research (that involves software), it's that most labs care more about getting published than they do about the reproducibility and validation of their work. This means most of the software written in research is ad-hoc, write-once, and often never looked at again. It was put together for the sole purpose of producing some output that could be put in a paper, and then lost to time.

A current "holy grail" of software in research would be to fix that: empower other labs to validate and reuse the software written and reproduce the work of other labs with different data sets. And it is actively being worked on in a couple places (that I know of, perhaps more):

* https://genepattern-notebook.org/

* https://app.terra.bio/

* https://software.broadinstitute.org/wdl/

Some of these are just about giving the community a common framework to use for their software (CWL, WDL, Jupyter), others are about data storage and making it easily accessible for others to use in the cloud for reproducing results.

If you want to have an impact, joining one of these groups would probably put you in a much better position to do that.

If you just wanted to work on something in your spare time that would be incredibly valuable, then might I suggest this:

It's amazing how much work is done in the scientific community using CSV/TSV files (usually gzipped). And most of that work is done via perl, sed, and awk. And often these files are huge: I'm working with a VCF file (TSV) right now that's 2 TB in size ZIPPED! It's crazy. Researchers often don't have the time, resources, or know-how to put together a simple Spark cluster and use it.

A command line tool that allowed someone to run SQL (or SQL-like) commands on a gzipped CSV file FAST would be invaluable. And if it could JOIN across CSV files ... wow!

Thanks so much, mussung, for your excellent practical feedback! Both alternatives that you listed are very tempting.

May I please ask you some followup questions? There's no email in your profile. My email is marius.andreiana@gmail.com

> A command line tool that allowed someone to run SQL (or SQL-like) commands on a gzipped CSV file FAST would be invaluable. And if it could JOIN across CSV files ... wow!

What prevents one from importing each CSV into a postgres db as tables, creating indexes, and then running queries? Disk space availability? (My local drive is only 1TB)

mariushn, I've sent you an email.

> What prevents one importing each CSV in a postgres db as tables, creating indexes and then start running queries?

There are many reasons:

* Experience/knowledge. Many labs don't have anyone experienced with databases.

* Security. Without proper dev ops a local DB is often out of the question. And when dealing with PII genetic data, [cloud] security can be a major concern.

* Funding. Machines cost money. AWS RDS instance cost money. Maintaining them costs money. Dev ops costs money. etc.

* Oftentimes the queries being done are simple. For example, you may have a giant CSV with cross-ethnic trait data, but only need samples of African descent with a beta value > 0.1. Sure, you could spin up a database, load the entire thing into a table (O(N) + disk space + time), then index it (now it's O(2N) + more disk space + more time), and then finally run your query. Or you can just run over the CSV once in O(N) and output the results with no extra disk space or "wasted" (in the perception of the researcher) time.
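That one-pass pattern is worth spelling out, because it is so small. A sketch in Python - the file name and the "ancestry"/"beta" column names are made up to match the example above:

```python
import csv
import gzip

def stream_filter(path, predicate, delimiter="\t"):
    """Single O(N) pass over a gzipped CSV/TSV: yield only the rows the
    predicate accepts. No database, no index, no extra disk space -
    each row is parsed, tested, and either yielded or forgotten."""
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh, delimiter=delimiter):
            if predicate(row):
                yield row

# Hypothetical usage for the example above:
# hits = stream_filter("traits.tsv.gz",
#                      lambda r: r["ancestry"] == "AFR" and float(r["beta"]) > 0.1)
```

Because it is a generator, memory use stays flat no matter how large the file is - which is exactly why researchers reach for this instead of a database when the query only runs once.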

Finally, don't fool yourself about the capabilities of researchers. Many are code-savvy, but lack experience. Writing a SQL query is easy. Loading multiple TB of data into a relational database, indexed properly, and done in a manner that won't take days of time is a level higher.

Two things that might be of use

1. SQLite can easily import multiple GB of CSV data and process it in memory or persist it to disk; it's a great tool for analyzing datasets up to the low 10s of gigs.

2. XSV, https://github.com/BurntSushi/xsv

Is anyone using Athena on AWS for tasks like this? It goes without saying that the documentation is hit or miss, but SQL-ish queries of flat files on S3 (even gzipped) can be a nice way to get the same result without managing Spark instances.

Obviously, I can't speak for "anyone", but all the groups I work with do not. There are a myriad of reasons for this, and reasonable people could effectively argue the merits of those reasons.

It's also important to remember that some of the reasons AWS, Google, and other cloud services are often NOT used are legal in nature. For example, some EU laws prohibit any personally identifiable (genetic) data from studies being put in the cloud. So, even if summary statistics - or data with PII data removed - can be put in the cloud, work has to be done on that data to remove it.

DBeaver has some kind of engine which allows you to write SQL queries against CSV files. I think MySQL also has a CSV storage engine which you may find useful to hack into something.

This will sound harsh, but it isn’t meant to be.

Scientists code better than you do science.

This is simply a consequence of a weeding-out mechanism for those who have no coding skills. The only people who get away with no coding skills are important professors with grad students to do the coding.

This isn’t to say that our skills are great, but a generic programmer’s (i.e. a CS major’s) science abilities are approximately zero (come on, no thermo in an “eng” undergrad???)

So what can you do?

Since you mentioned science and not engineering, I’d ignore the AI advice. Science needs models based on mechanistic understanding of the underlying phenomena. A model that merely predicts is useful for engineers, not scientists.

“materials simulator (eg how can we get a material having a given set of properties)“

This is already done, but of limited usefulness. First, the materials simulators are far from perfect. Then there is the problem of actually synthesizing the materials. These simulations are more typically done to weed out bad candidates.

“No immediate financial return”

Wrong attitude. Only an attitude of “no financial return” helps science. That’s not to say you won’t make money off of it, but that can never be a goal, since (true) science advances freely (again, see the Gaussian jerk vs. Einstein or Landau - who contributed more?)

Instead, focus on making the programming tools scientists use better, easier to use, and GPL. GPL is important because an MIT license by itself allows a scientist to use others' work while blocking others (see Gaussian).

For example, making python (or Julia?) better would be one of the most important contributions you could make. The matplotlib guy was deeply mourned in science.

The two cents of a physical sciences researcher who once flirted with the Valley.

I actually strongly disagree with the message in the lead-in. All scientists can “code” better than OP can do science, yes, but most woefully lack any software engineering experience. Most postdoc research code would earn a failing grade from an undergraduate software engineering professor, or get them fired from a real-world programming job. Think spaghetti code, lack of testing or continuous integration, noncompliance with standards, etc.

I would say a cheap win for a coder would be to attack some domain where only rough research code exists and make it more reliable, scalable, better documented, interoperable, etc. a complete rewrite is probably required in many cases, but you have the working old version to compare against.

> or get themselves fired from a real world programming job. Think spaghetti code, lack of testing or continuous integration, non compliance with standards, etc.

Perhaps, but doing all those things would get them fired from the job they are currently in.

> matplotlib

And now a problem that would be great to solve: having vector images and being able to change things after the fact - my fonts are the wrong size, lines are too thin, I changed the paper format, whatever. I can't express how often I've had to redo plots simply because they don't look right in a paper.

I also cannot express the beauty of LaTeX - nor the absolute horror it is to create TikZ images. They are beautiful, but it is definitely an art that one can never master. I want to do it with code; I don't want dumb GUI interfaces that only work on certain machines and never work as expected.

If a nicer version of TikZ could be made that had a lot of power under the hood, it would help a lot of people. That's why matplotlib is so great: doing basic things is extremely straightforward, but if you want to do extremely complex things you also have that power. (Even something as simple as a matplotlib for LaTeX - which results in vector images - would be incredibly helpful.)

Check out matplotlib's PGF backend: https://matplotlib.org/users/pgf.html
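For anyone curious, the setup is a small rcParams fragment. This assumes a working LaTeX installation, since the PGF backend shells out to it; treat the exact parameter values as a starting point rather than gospel:

```python
import matplotlib
matplotlib.use("pgf")  # select the backend before importing pyplot
import matplotlib.pyplot as plt

matplotlib.rcParams.update({
    "pgf.texsystem": "pdflatex",  # or lualatex / xelatex
    "font.family": "serif",
    "text.usetex": True,          # let LaTeX lay out all text
    "pgf.rcfonts": False,         # inherit fonts from the document, not rcParams
})

# fig, ax = plt.subplots(figsize=(3.5, 2.5))  # inches; match your \columnwidth
# ax.plot([0, 1], [0, 1])
# fig.savefig("figure.pgf")  # then \input{figure.pgf} in the .tex source
```

Because the labels are typeset by the same LaTeX run as the paper, fonts and sizes match the document automatically, which addresses the replotting complaint above.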

This is pretty nice and I will likely be using it from now on. Thanks.

But I do want something a little more native to LaTeX. The major issue is that sometimes font sizes, axes, titles, even line thickness don't look right in a paper, and when you have a large plot you have to replot to fix these things. But vector images will help.

> Science needs models based on mechanistic understanding of the underlying phenomena. A model that merely predicts is useful for engineers, not scientists

I'm not sure I agree. I'm aware of quite a bit of supercomputing time that is spent doing lattice QCD calculations (which apparently some scientists find useful), and though I'm no quantum physicist I'm pretty sure there is not much of a "mechanistic understanding" in QCD. I think your claim also doesn't apply to a lot of social science - psychology has a lot of functional models, but I don't think there are many mechanisms described.

I'll also state that modern science that doesn't require any engineering is pretty rare nowadays, so if a predictive model helps engineers that can then help scientists, the model has been helpful to scientists.

Ohm's law existed long before there was a mechanistic description behind it, and though it is mostly used for "engineering," I feel confident that a lot of scientists in the 19th century found it useful.

From https://www.olcf.ornl.gov/leadership-science/physics/:

"New Frontiers for Material Modeling via Machine Learning Techniques" - 40,000 hours allocated on Summit

"Large scale deep neural network optimization for neutrino physics" - 58,000,000 hours allocated on Summit.

Supercomputers typically do not allocate 58 million hours to things which are not useful.

I work with the DOE and was at ORNL before Summit was released (I got to play on Summit-dev). When making these models there is A LOT of exploration happening. There's a whole class of visualization techniques called "in situ" that visualize data as it comes off the press (memory is then dumped, because there's neither enough storage space nor can we write to disk fast enough). I'll tell you that there will be a lot of restarting those simulations, because the scientists need to explore the data as it is generated. Going in the wrong direction? Made a small mistake that causes cells to explode? Realize you're not looking at the right region of interest? You restart the sim (thank god for restart files, right?). Exploration is one of the most important things in research and it is getting more and more difficult. I believe this is what the gp is after. Having these understandings helps you explore the data better. Creating these tools is hard work and takes a lot of collaboration too.

I guess "Mechanistic understanding" was meant to contrast against machine learning, not quantum mechanics. To elaborate, machine learning means fitting of a bunch of data by a given model. In science (eg lattice QCD) one often tries to theoretically (or computationally) explore regimes where data is not yet available. As a (former?) theoretical physicist, I am more than happy to admit that this is not immediately useful, though it will hopefully become useful in the long run.

Thanks, it makes sense.

> focus on making the programming tools scientists use better, easier to use and GPL

That sounds like the most natural path to take going forward. Besides looking at existing GPL software and how it can be improved, would you have a recommendation on where to find scientists/researchers open to discussing their needs that could be solved by software? I'll release the software under GPL, but need to know I'm building something useful.

As a computational physicist (meaning I do science, but most of my time is spent programming), I agree with everything the parent said, but perhaps I can add some more specifics. The matplotlib example is a very good one in the sense that it's a piece of vital infrastructure almost everyone has used at some point. It works well enough for performance insensitive (meaning non-realtime and small-ish datasets) 2D visualization, has a lot of features and is easy to use. Other niches are less fortunate - for instance, for general purpose performant 3D visualization there's the bloated monstrosity that is VTK and little else, so I mostly just write OpenGL code by hand. That's annoying, but I haven't found anything that isn't outright terrible.

Other scientists, depending on their interests, will readily give you similar examples of obvious general purpose libraries that are lacking or non-existent, but there's a simple reason for this - it's hard, unrewarding work that's very hard to commercialize. Most of the large scale projects that exist have grown out of academic grants and often struggle for funding or are abandoned entirely. If you are really considering this as a career move that's eventually supposed to put food on the table, you need to have a pretty good idea of how your project is realistically going to earn money, because the correlation between funding and general usefulness is very weak in this space. Since academic funding isn't on the table for you, the common alternative involves things like biotech startups and venture capital.

>> Other scientists, depending on their interests, will readily give you similar examples of obvious general purpose libraries that are lacking or non-existent

I'd love to hear some of these, if folks on the thread can share more. Added 3d visualization to my list...

Symbolic computation libraries for Python are lacking - there's SymPy, but it can throw a NotImplementedError at you if you try processing some hairier formulas... It would be great if SymPy were improved.

Have you looked into SageMath? (Technically it's "Python-based" rather than "just" Python.) I'm also curious if you're specifically looking for something in Python, or if doing symbolic computation/computer algebra in other languages would work for you, barring license costs.

I needed a Python function which computes a (symbolic) derivative of a certain known formula. I could've just computed the derivative in, say, Mathematica, and then written the Python function manually based on the derived formula, but the derivative had hundreds of terms and there was no way I'd transpile it manually without introducing errors.

In the end, I used a hacky Mathematica script which converts the resulting Mathematica formula into Python code (which I then pasted into my program). But if SymPy were better, I could do all this in just Python.

BTW, according to Wikipedia, SageMath is just using SymPy for calculus.
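For the derivative-to-Python-function case specifically, SymPy can already close the loop when it doesn't hit an unimplemented corner. Roughly (with a toy expression standing in for the real formula):

```python
import sympy as sp

x = sp.symbols("x")
expr = sp.sin(x) * sp.exp(x)   # stand-in for the "certain known formula"

deriv = sp.diff(expr, x)       # symbolic derivative, however many terms

# Option 1: emit plain Python source you can paste into a program
source = sp.pycode(deriv)      # generates math.exp/math.sin style code

# Option 2: skip the copy-paste step entirely and get a callable
f_prime = sp.lambdify(x, deriv)
```

lambdify avoids exactly the hand-transpilation step that introduces errors; whether it survives a given hairy formula is, as the parent says, the gamble.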

I agree with this; scientific computing per se is best left to scientists and cannot be effectively done without the proper training. But there is a huge need in science for well-designed software. I'm talking about bread-and-butter CSE topics like basic UI design and documentation. This is where OP should concentrate their efforts in my opinion. A lot of research quality code is shockingly buggy and difficult to run. To give an example, if you are a biologist trying to run a shiny new machine learning method on your data, you are SOL in most cases unless the original authors went out of their way to enable that. For this reason a few PIs, really rich ones with big bio labs and f.u. grant money, employ full-time software developers, but this not possible for most people.

The trend in science is away from GUIs, if that is what you are thinking. Improved (although far from perfect) programming skills in the sciences and the move towards reproducible research have heralded a move away from GUIs. Documentation is another matter, although that is becoming increasingly automated as well.

> science abilities are approximately zero

I’d say it really depends on your program and what you mean by science. I minored in BioEngineering, I also double majored in math. At least one of my CS final projects has a citation (which I recently discovered after looking at Google scholar).

My point is, what makes “science” skills may not match your expectations, but I’d argue many people have said skill set.

> Only an attitude of “no financial return” helps science.

I also take issue with this. Arguably all financial investments are a way of directing research. All research needs funds. How do we get most of the drugs we have today? It's typical that some research is done publicly, but the last “mile”, so to speak, is done by private companies.

> generic programmers

I would argue that most would agree that a "generic programmer" does not have a degree in math or a minor in bioengineering. There are a lot of programmers who never studied any STEM outside of a CS curriculum, which usually has ~no science and rarely requires advanced math (i.e. requires only linear algebra).

I saw this list linked yesterday from another HN article, might give some useful jumping off points: https://github.com/kakoni/awesome-healthcare

One related thing I found from there was a list of projects for magnetic resonance imaging specifically: https://ismrm.github.io/mrhub/

I'd assume trying to contribute to those projects would hopefully give greater ROI than building a new thing (without a very specific idea of what to build and the market for it)?


Find any way to make a lot of money, gain a lot of political power or gain influence over those who have money or power and use it to fund research and make it more appealing culturally.

No, the most significant advances in science were done on a shoestring budget (or no budget at all) - just by THINKING. It doesn't require lots of money to support people who do that kind of work, and they are easy to find, especially in theoretical physics and mathematics.

Maybe it was true a hundred years ago, but it is not true today, unfortunately. Science has become much more complicated.


It is still true. Before the tau neutrino was confirmed in a lab, it had been theorized 25 or so years earlier, which required nothing but pencil and paper. Modern quantum field theory doesn't require expensive gear. Math doesn't require any gear at all, and there is a whole pile of crucially important unsolved problems.

The fact that science became complicated is a sign of thought stagnation, not a sign of progress.

As a faculty member running a lab in human genomics/genetics, I would say either join a lab which needs your skills or work for a university in support of their IT and high-performance computing needs. People with computational skills, an interest in doing/enabling research, and a willingness to accept a sub-market salary are obviously rare. IT departments at universities are under-staffed and over red-taped, but most researchers depend on them to actually do work.

I would go talk to scientists and try to understand deeply what they’re working on, how they do it, and what systems (social/technical/political) allow them to do it. I’ve done this a few times and it’s always illuminating and inspiring.

Correct. How do you connect with scientists when you're not working at a university like you do? Getting an email from a nobody offering help for free does sound like a scam.

One answer would be "Enroll in a university". Others?

I just emailed them and told them I was doing research. Lots of people, especially students, love talking about their work.

Get an introduction. You almost certainly know some grad students, or have an acquaintance who does.

My dream would be to have a place I could look at any paper (even if not on a campus internet connection), be able to look at raw data, code, and have a forum to facilitate discussions between researchers.

I know the first won't happen for a while, but it is a dream (open science).

The second is direly needed but in some cases not practical. But I think there could be a lot to gain from having even a small portion of this. It could help verify results significantly. Also imagine if people could research things without having all the fancy instruments. Some of this already happens, but I think it is hard to find and not always easy to sort through. It isn't connected to papers and research. Just having a paper and a link to the data would be a tremendous help.

Similarly with code. I don't know a researcher who hasn't made an accidental bug that changes results (some slight, some major). I think we need to get over it: WE ALL write hacky code. Hacky code is better than a vague description in a paper, because you don't have enough room to write an accurate description of your model. Science is supposed to be replicated! Even linking papers with a GitHub account would be a tremendous help. Some people don't want to share code, and I think this is a shame and anti-scientific (especially if you are using public money), but that's a rant of its own.

Researchers email one another all the time. Some of these discussions should be public. Papers leave a lot of gaps. A place where researchers could add extra notes that couldn't fit in the page limit, where collaboration can happen, or where people can just ask questions would be great. Replicating results can be hard, and we should be learning from one another's hurdles. That's the point of science after all: to push the progress of humanity. Lack of ability to communicate should not be a gate.

Along with these things, it would be nice to encourage putting up null results. Alongside a paper, I would love to know what challenges a researcher fought; that's where most of the work is. It is funny: we constantly talk about failure being 80-90% of research - whole projects failing, or just banging your head on the desk because you can't figure out why something isn't working. Let's open this up. Let's help one another. Let's talk about what went wrong and how we fixed things to get to our successes. I can't think of anything that would help science more than this.

I wonder if people are reluctant to do this because they would be scrutinized more, and perhaps their PhD, grant, etc. might be at risk?

If so, is the only solution to give "credit" towards the PhD, grant, etc. in the form of hours worked towards pushing knowledge acquisition, and not strict results?

I think it is partially that (but minor). Partially embarrassment (frequently code is rushed). Partially people view code/data as a trade secret. The latter I find anti-scientific, especially since a lot of funding comes from public money. I'm okay with holding on to it for a small period of time (because we live in an unfortunate world where sharing knowledge can't come first and people need to secure funding), but I don't think this should be a default mode that people do. Luckily it does seem that many are turning to GitHub/BitBucket/GitLab to make their code available.

And everyone that's in a PhD or has one knows that the majority of work you do is failing (but you learn from that failure).

Side note: I wonder if imposter syndrome would decrease if we were aware of one another's failures and didn't only see the accomplishments.

You might be interested in OpenReview, which is doing something like this, mainly in the ML/AI academic community.

(Disclaimer: I’m one of the developers)


This is different from what I'm seeking. I'm actually in favor of blind reviews. I don't think they are just good; I think they are essential.

Would you please send me an email to discuss further? marius.andreiana@gmail.com

There's no contact in your profile. Thanks!

Having learned a lot about how science actually works and how flawed it is, I came to the conclusion that the biggest boost to scientific progress you could ever achieve is eliminating science waste and doing more good science.

There are a few tiny steps in the right direction, but it's frustratingly slow. Once you understand a phenomenon like publication bias, it's hard to swallow that empirical studies are still published without preregistration. There are so many studies published with such low quality that they're a complete waste, because the likelihood that they're some statistical fluke is much higher than the likelihood that they tell the truth.

Though the problems have been known for a while, little has changed: https://journals.plos.org/plosmedicine/article?id=10.1371/jo...

That is a very click-baity publication (yes, it happens). The title suggests this phenomenon is independent of field; the content talks about high p-values. Not every field uses this metric. So this publication is aimed pretty much at medicine.

While I think the p-value thing needs to be rethought, it is bold to claim "Most Published Research Findings Are False". Honestly, it would be more accurate to say "Errors matter and we can't take findings at face value and ignore error bars" - which is pretty much something every self-respecting scientist should know in the first place.

So, as someone working in the application of medicines, I'd agree with the comments that say those of us doing these things have learned to make code work, but are not software engineers (and would not claim to be). This means that if you can improve the tools that are used (and don't care about making a return), the leverage would be massive.

I imagine it isn't trivial (otherwise it would have been done by now), but imagine if R did parallel processing automatically; seldom is the code in my field worth figuring out how to parallelize, but if it happened automatically, then a hell of a lot of time would be saved! I know there are two packages that kind of make it work, but so far I've yet to hit a threshold making it worth it (I just leave things running overnight).

Abstract from that to projects like Zotero and you can see how you could have an impact on a lot of people by enabling them to do what they do.

I felt the same, and decided to learn how to do that. I followed this one guy I admired a lot. The way he does it is to merge software engineering practices into science (in this case AI and robotics) and reduce the cost of iteration.

For us it means: 1) one-click binary deployments; 2) safer iterations that allow for making mistakes, so that they can do tens at the same time instead of one really safe one; 3) logging, visualization, and whatnot on a unified infra.

We are not scientists, but we know damn well how to scale software and business. It applies everywhere. Think of TensorFlow: before, most people couldn't do AI themselves; now it is damn easy, and more things will happen as a result.

This way they can concentrate on science while we concentrate on scaling it. We are betting on a couple of breakthroughs as a result of the increased entropy.

You can get a job at a company doing research. Improve their tooling and thus help them research faster. SRI comes to mind in this regard.

Inside the company you can also push them to try to open-source the tools. That's unlikely to work if they don't do it already; however, this skill set will eventually let you contribute more in the future. After some time on the job, start a personal open source project or start a company directed at some of the issues you saw.

It’s general advice and it’s the long game, but will likely help you have more impact.

The advice above may be more useful for engineers earlier in their career, but you can accomplish that in a handful of years.

Forget science: as an “army of one” you’re unlikely to make a contribution of sufficient magnitude that some needle might be moved in the right direction. Take advantage of the skills you do have: make money by doing what startup entrepreneurs do, then donate all proceeds to scientific research.

There may be the next Zuckerberg hiding in you. Imagine what would be possible if you could reach that level of wealth and use all that money to propel science forward. I’m not talking about some chickenshit foundation; real impact is made by committing all of your financial means to helping science. That’s how a real difference is made.

A lot, if not most, of the important scientific discoveries were the result of the work of "an army of one". From biosciences to physics, you can do amazing stuff. We don't need any new Zuckerbergs who siphon all the great minds out there into showing more ads.

The individual scientific discovery was common in the early history of science, but has grown less common over time. Looking at this list [1], for example:

- Prior to roughly WWII, it was mostly individuals, and so we named things after them (Kelvin, Doppler, Joule, Ohm, etc).

- From WWII until the late 20th century, it was a lot of small teams (transistor, 3; DNA, 4; pulsars, 2; etc).

- Since then, team sizes have grown so that individuals aren't even named (cloning, an Institute; Top Quark, a Lab; Tau neutrino, a Collaboration; etc).

In fact, in the past 35 years, the only individually named contributors on that list were mathematicians who constructed proofs of long-standing unsolved problems in mathematics.

[1] https://en.wikipedia.org/wiki/Timeline_of_scientific_discove...

Side note: I am the co-founder of Aidlab - a device and a platform that is widely used by biomedical scientists and students in their research (https://www.youtube.com/watch?v=wY0YPOKNk88)

In my free time I am working on my next project: an open-sourced platform where everyone can contribute to help fight death. The platform allows uploading anonymized, structured health records - publicly available. Why? Dying is the #1 problem, and it should be solved together.

OP, if interested, drop me an email: jdomaszewicz (at) aidlab.com

I presume your ultimate goal is to increase the average life span, but phrasing it as “fighting” or “solving” death sounds like some mad scientist stuff.

> The platform allows uploading anonymized, structured health records - publicly available.

Hi, I also started something similar, but there was no interest: https://juvmed.com/plan

Would you please share more details publicly about your plan?

...why should death be solved ASAP?

We haven't got the space or resources for exponential growth of population

Death has very little to do with the exponential growth of the population. You need to look at the other end.

Regarding the last point: try applying libraries like Quantum ESPRESSO [1] or CP2K [2] to real-world problems, and apply machine learning to the solutions they provide for a given problem. There's a tremendous amount of academic research being done in this direction, but try to take these libraries and make them useful for real-world applications.

[1] https://github.com/QEF/q-e

[2] https://github.com/cp2k/cp2k

As someone working in this field, I disagree. The state of these codes is that if you put garbage in, you get garbage out. Without a strong theoretical background, you’re generating junk. Of course, if he’s willing to spend several years developing his knowledge of the field, there is for sure room to help. If not in your suggested machine learning area, instead perhaps in improving the quality and usability of the code.

But instead I would recommend he start from something like the NOMAD database, where the calculations have already been run by more knowledgeable people. Then he can focus on the analysis side.

Of course you need to develop domain knowledge. But there's a tremendous opportunity lying in these libraries; you just have to spend the time looking for it.

Thanks! Interesting, I'll look into both

There are a lot of citizen science initiatives you can participate in, including Folding@home and so on. Your best bet is going to be to find a field you love and find experts in the field to work with and learn from - there's definitely a lack of software talent in some areas where you'd be able to make a dent. But outside of citizen science initiatives, you need to start by understanding the problems that need to be solved, which can be much more technical and difficult to understand than they ultimately are to solve. Good luck!

Thank you!

Provide funding for high school students to spend a summer working in a research lab? The amount of funding doesn't need to be high, and it may pique early interest in the next generation.

>>> Indexing all open research

I'd also check out analytics platforms such as VIVO. The end goal is a universal workflow for all research, discovery, and collaboration. Solving this problem will have immediate impact - for example, computational epidemiology and containing Ebola outbreaks in "hot zones" hundreds of miles apart.

Web of Science


If you have the money to work full time on a personal project you have the time to sign up for a conference or three on the kind of scientific topic you like. Ideally, pick one near your home/in the closest metropolis.

Once there you can apply the following no-social-skills-needed guide to making contacts at a conference.

1. Watch the keynote.

2. After the keynote, walk up to pretty much anybody. Ask the following questions: "Do you think (keynote title) is a major area for (conference subject matter)?" and, almost regardless of what they answer, you can follow up with "Oh, really? Is what you're studying related to (keynote subject)?". Those two questions are enough to make virtually any academic launch into a paragraph-long exposition. Usually by the 6th conversation you have along those lines, you'll have a good summary of which subjects are considered important right now. If the person looks like they like you or are invested in talking about their topic, you can follow up by giving your contact info and saying you're interested in them sending you a paper on the subject they talked about (one of theirs if applicable).

3. There will be designated poster sessions. The posters are giant sheets of paper with young people in front of them. Walk up to a young person standing in front of a poster that doesn't look that slick but whose subject matter interests you (slick posters are from bigger labs where you are less likely to have an impact). Ask the person what they do, how long they've been doing it, how big their lab is, and what's the most time-consuming step in their research right now.

4. If anyone asks you what you do, say you're an expert in (your computing area, web apps, cloud, data analysis, whatever) and interested in the intersection of (your area) and (conference name). Some of the people will say they think (your area) would be great for (their subfield) because of (super niche stuff you wouldn't have thought of). Grab their contact info so you can have a 1:1 meeting with them later where they find you a research subject.

Follow up by having email conversations with a few of the people. If a grad student looks like they can use your help, ask for an intro to the PI. Eventually you'll walk your way into a (possibly paid) research project. It's that simple. Thank you for caring about your world and its future.

Build tools to make research more efficient and reproducible. The research community is a small market and has little money. Building tools is often not regarded as scientific activity and can lead to dead ends if publication is pursued.

So, if you do not have to rely on income anymore and want to contribute to science, build tools for the community. Or better, come up with a way to help scientists to build their own tools fast and efficiently.

Thanks! Any other details / specifics would be welcome, since it seems you already have some knowledge in this area.

With my background in computer science and ML, I worked with geneticists to automate some of their observations.

A lot (almost everything) of what they do routinely is still done manually. In genetics there are about 10 model organisms (I forget the exact number) that most work is done on. Examples I have come in touch with are C. elegans, D. melanogaster, and mice. A huge amount of work is done on these organisms, and a huge amount of time is spent by grad students and postdocs repeating the same boring tasks day in and day out. This is a big factor in deciding the breadth of studies (you can only compare as many conditions as you can evaluate). Some things biologists do are subjective or prone to bias (even objective tasks like counting become biased after fatigue sets in, if you have to count a lot of things very often).

If you can automate their tasks, you will enable them to increase the breadth of their studies drastically and potentially produce more consistent measurements as well.
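As a toy illustration of the kind of automation meant here (my own sketch, not something from this thread): a common manual chore like counting separated objects in a microscope image reduces to thresholding plus connected-component labeling, e.g. with scipy on a synthetic image:

```python
import numpy as np
from scipy import ndimage

def count_blobs(image, threshold):
    """Count connected bright regions (e.g., worms or colonies) in a grayscale image."""
    binary = image > threshold          # segment foreground from background
    labeled, n = ndimage.label(binary)  # connected-component labeling
    return n

# Synthetic 8x8 "plate" with three well-separated bright blobs.
img = np.zeros((8, 8))
img[1:3, 1:3] = 1.0
img[5:7, 1:3] = 1.0
img[2:4, 5:7] = 1.0
print(count_blobs(img, 0.5))  # 3
```

Real assays need segmentation tuned to the organism and imaging setup, but even this level of automation removes the fatigue bias mentioned above.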

Software tools to help with scientific discovery / hypothesis generation. I'm thinking specifically about Literature Based Discovery[1] (LBD), but there are probably other useful approaches in that domain.

I have a little project[2] related to LBD, which I'm working on here and there. There's still a long way to go, but I'm optimistic there's something to be accomplished in this space that could help.

And then you've got things like Robot Scientist[3] which is crazy interesting.

If AI, in the AGI sense, interests you, I might mention the possibility of contributing to / working with one of the popular "Cognitive Architecture" systems like ACT-R[4], SOAR[5], or OpenCog[6].

[1]: https://en.wikipedia.org/wiki/Literature-based_discovery

[2]: https://github.com/fogbeam/Valmont-F

[3]: https://en.wikipedia.org/wiki/Robot_Scientist

[4]: https://en.wikipedia.org/wiki/ACT-R

[5]: https://en.wikipedia.org/wiki/Soar_(cognitive_architecture)

[6]: https://en.wikipedia.org/wiki/OpenCog

Solve the following problem:

"I want to find all research labs concerned with topic X in this area"

=> Create an (open) crunchbase/patreon for university research groups

Right now, each university has their own website, with their own layout and varying degrees of information. General scraping is impossible. Finding research groups with certain interests without trawling through an endless amount of publications in related journals or randomly meeting people at conferences is difficult. The only comprehensive lists are university ranking sites, but they are not detailed down to the research group level.

- create a crowdsourced, public index of running projects

- help labs find each other and collaborate on work

- help people apply to labs with open positions

- let donors and investors find and support projects more efficiently

- make science journalism easier, by connecting reporters straight to the source

The main difficulty is creating the network effect which is why, I suppose, no one has done it yet.
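To make the proposal concrete, here is a hypothetical sketch of what one crowdsourced index record and the "find all labs on topic X" query could look like (schema and field names are my invention, not any existing platform):

```python
from dataclasses import dataclass, field

@dataclass
class LabEntry:
    """One crowdsourced record in the proposed research-group index."""
    name: str
    university: str
    topics: list = field(default_factory=list)
    open_positions: bool = False
    contact: str = ""

def find_labs(index, topic):
    """Answer 'which labs work on topic X?' with a simple tag match."""
    return [lab for lab in index if topic in lab.topics]

index = [
    LabEntry("Microbiome Lab", "Example U", topics=["microbiota", "genomics"]),
    LabEntry("Materials Group", "Example Tech", topics=["materials"], open_positions=True),
]
print([lab.name for lab in find_labs(index, "materials")])  # ['Materials Group']
```

The hard part, as noted above, isn't the data model but getting labs to contribute and keep their entries current.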

> help labs find each other and collaborate on work

Ideally, that would be awesome. But... do they want to? Will they put science & results above all, or will they put funding, egos, and credit first?

Science is great, but from my experience, those other things are even more important for individuals: they support a career & family.

The ideas published in journals all depend on humans.

But to an outsider, these connections are completely invisible. Conferences cost thousands of dollars to attend. Subject matter experts are hard to find, so many connections are simply not made.

Which is a good reason to support such a platform, because it may make such views possible and existing deficits more visible and quantifiable :)

Small and medium-size businesses outside academia could also use it to improve their technology and connections.

Contribute to TaxonWorks http://taxonworks.org/. Software from a small endowed group that builds tools that support those who describe Earth's species. Software is completely open source, many opportunities to improve what's going on there.

I've helped one-on-one a few times by working with someone who had a problem and needed some code (sometimes getting paid, sometimes for free for a friend). I think that's higher impact than most of the remaining low-hanging fruit. These were all one-coder projects.

There are also larger coding projects in the sciences with teams. Running the database (and doing whatever else needs doing in code) for a telescope for instance. I don't have personal experience with them, but search around and you can probably find them.

The only thing I can think of that would have helped across the projects I've worked on would be better graphing/data visualization tools (more convenient ones; I don't care much how the output looks as long as it conveys the information).

Re: materials simulator, you should definitely check out the Materials Project [1]. I'd guess they are open to getting some help in their GitHub repos.

[1] https://materialsproject.org/about


I think the best way to accelerate scientific research is to recreate Bell Labs in a modern setting, free from the pressure to succeed.

Get a huge endowment fund (the hard part), hire a bunch of smart and promising people, and tell them you want them to change the world.

I work on a project called OpenReview (https://openreview.net) that aims to improve scientific peer review by encouraging transparency and research on the peer review process itself. Right now most of our activity is within the machine learning academic community, but we have plans to spread into other areas of computer science and more.

We’re looking for developers, if you’re interested: https://codeforscience.org/jobs?job=OpenReview-Developer

Let me know if you want to learn more and we can find a way to connect.

I’m the engineering director at Quartzy (YC S11). We’re focused on making research more efficient, starting with the laboratory supply room. We help labs avoid waste through a group buying experience, and offer discounts on their contract prices by working directly with manufacturers.

This isn’t the most obvious way to make an impact, but we’ve effectively saved many labs thousands of dollars on equipment and consumables that can then be reallocated toward their research. Our mission is to increase the efficiency of scientific research — the supply room is just the beginning.

Would be happy to chat - tristan@quartzy.com

>increase the efficiency of scientific research — the supply room is just the beginning.

Efficiency is often compromised institutionally, so I know this is an obstacle.

I've got a lifetime of business models for scientific software, but that wasn't the question above.

One way to accelerate many types of laboratory research would be to offer a workflow alternative where very high-performance individuals can choose to spend almost all their working time at the bench rather than being distracted or sitting at desks or office & lab computers, while still maintaining overall leadership of the organization. This requires a different type of team structure as it scales.

That way someone who can really invent a lot doesn't get bogged down by the bureaucracy of their earlier inventions.

A shoutout for asking “what can I do” as opposed to “why is research so slow?”

A lot of journals (including the one I am an editor at) would probably consider leaving the fangs of Elsevier and co. and going fully open access. However, there are a few obstacles, such as the lack of open and reliable editorial and publishing platforms, contacts with libraries, etc.

In my opinion providing these things and actively encouraging journals to make the jump to open access would be a huge service to academia. Also, I wouldn’t be surprised if one could find funding for such a venture.

Donate money to researchers. Any of the situations below may or may not have a useful outcome, but direct investment would be the easiest one, and one you can leverage.

The least/most you can do in this space is likely contribute actively in something you believe in.

Industry-wide limitations are largely down to policy & regulations.

Make programming accessible to scientists.

Today, running any compute- or memory-intensive experiment requires working through so many tools and terminologies (e.g. AWS, SSH, Python, dependencies, virtualenvs). Scientists really just need a Jupyter-notebook-like experience that's collaborative, reproducible, and powerful (i.e. runs on any hardware with one click).
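One small piece of the reproducibility story can be sketched without any platform at all: record the execution environment alongside every result, so an experiment can later be re-run on the same stack. A minimal sketch (my own, not any product's implementation):

```python
import json
import platform
import subprocess
import sys

def run_context():
    """Snapshot of the execution environment, to be saved next to results."""
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()
    except Exception:
        commit = None  # not in a git repo, or git unavailable
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "git_commit": commit,
    }

print(json.dumps(run_context(), indent=2))
```

A real platform layers containerized dependencies and hardware provisioning on top, but even this habit makes results far easier to reproduce.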

Check out https://www.dominodatalab.com/ - you've pretty much described their product.

> how one can accelerate scientific research

Easy peasy. Help (for example) this guy find funds for continuing his research: https://scholar.google.com/citations?hl=en&user=Q0w_e84AAAAJ

Edit: there are many like him.

You mention meta.org, you know you could work with us on it https://chanzuckerberg.com/join-us/openings/?team=engineerin... (search "meta" on that page)

Shameless plug: come work for Primity Bio. My team builds a bioinformatics web application that (among other things) is several orders of magnitude faster than the other tools, which fundamentally changes the scope of analysis that researchers can feasibly do. Email in bio.

To me, the biggest hurdle is the pretentiousness of the research world and the way papers are written. If someone could rewrite papers in plain English, removing all jargon and talking more about intuitions, that would help encourage more people to get involved in scientific research.

Here's a journal for publishing null results: http://www.jasnh.com/

Probably not the answer you would expect, but one very simple action you could do is to sign/share this petition: https://publiccode.eu.

I think you might enjoy this interview with Elon Musk:

Elon Musk - How to Build the Future


The job role for you might be as a "Research software engineer"

Surprised this comment isn't higher.

OP does indeed sound like the discipline they're looking for is Research Software Engineering.

Some reading: https://en.wikipedia.org/wiki/Research_software_engineering "Research software engineering is the use of software engineering practices in research applications. The term started to be used in United Kingdom in 2012, when it was needed to define the type of software development needed in research. This focuses on reproducibility, reusability, and accuracy of data analysis and applications created for research."



I just retired from 20 years as a Senior Research Programmer at an Ivy League college. It was a very rare job position; there were maybe 5-10 people doing things similar to me. I supported a few specific research projects in computer science. My position was grant funded, which meant if the grant ended, the job ended, and I had a year-to-year contract, so no job security at all.

Much of my job over the years was doing things like technical purchasing, administrative support, proofreading papers, and web research: mostly stuff that nobody wanted to do because it's more fun to write software or come up with ideas. I also wrote software for the more complicated parts of research projects that students or postdocs often didn't have the time or experience to deal with, for example embedded systems, which most CS people have little experience with or understanding of. I think my work made the rest of the team more productive by leaving them time to work on the key innovations.

The work I did is very different from what most software engineers do, though. We built many systems as proofs of concept, leaving out the functionality that would have made them working systems, which can be frustrating for an engineer. Quality assurance, user interface design, reliability, and more were usually ignored in the interest of speed to publication. The most important goal was to publish papers, not to build software that works well. The scientific goals were also set to match the publication goal, not to solve some important problem; we often spent months trying to come up with a problem our solution solved, to make it seem more attractive in a publication.

So one thing that would help advance science is to encourage more basic research, and more careful research, as well as more research directed at real-world problems.
Currently the granting agencies and the researchers decide what the important problems to solve are, and this often seems to have little relationship to the problems extant in the world. Perhaps going into politics with the goal of changing how science is funded, and how that funding is directed for specific purposes, would have the most impact. Right now science essentially uses a seniority system, where people who have already made their reputation get to decide who gets funding and for what, and there is a big push for quick results. That arguably means that people who are generally behind the times and short-sighted are determining what science focuses on.

Another issue is that there is almost no follow-up in science. A grad student and professor decide on a project, in part based on whether they think it will get published/funded, they work on it for years, and then that's it; nothing more is done. It's up to someone else to take it further if they feel like it. There's little overall coordination of effort; it's a big free-for-all, except perhaps in medicine, although a lot of that research seems to be somewhat directionless too. There are fads in science as well (Internet of Things is a current fad, because there is funding for it). I think people imagine that science is doing Manhattan Projects to solve the big problems; it's not like that at all. It's more like a series of pet projects, done on a whim. I'm sure not all science is like that, but a lot of it is.

Maybe one thing that would help is a multidimensional index of problems that need solving, science problems but also world problems, with dimensions including cost, estimated complexity, projected benefits, potential risks (cold fusion would be great but could also destabilize economies and shift power balances), time required, etc. Many students have a difficult time finding an idea to work on (they've spent their whole school life doing exercises and now they are asked to innovate).
Research needs to be much more cross-disciplinary too. So look at your skills, identify problem areas, and think about what kind of change you could encourage. At a minimum, you could contribute time to supporting specific projects.

Getting paid for that is a problem, though: your salary competition is students who work for "free" and underpaid postdocs hoping for a tenure-track position. My pay was much lower than current salaries in industry. As I mentioned, the work is very different from what a software engineer usually does, which can be difficult to adjust to; we made some attempts to hire software engineers, and many of them quickly left when they understood the nature of the work. It's kind of like working at a startup, except there are no stock options, little chance of the "company" becoming "successful" (maybe a patent), your job is always at risk (Congress could change NSF funding and eliminate it), work is always short term, and you rarely get to finish anything, plus you have to do a lot of work that would usually fall to an administrative assistant.

Put yourself in survival conditions and use science to overcome it

Well, nature did this to me, and the commitment to keep a wounded laboratory operating at an accelerated pace (before it could be lost altogether) yielded, IMHO, better progress on the same scientific instrumentation than leading PhDs in my field achieved.

I wouldn't recommend it for everyone seeking commercialization, because of the accompanying roadblocks: the system puts financial pressure on individuals operating from survival conditions.

But it is good knowing what this gear can really do.

Run BOINC, Folding@home, or a similar distributed computing project. Many of those projects are run by educational institutions.

Sign up for a prospective health cohort such as AllOfUs in the US or UK Biobank or equivalents in other countries.

Thanks! Since I'm in EU, I'll also contact biobank and ask if I can help with anything software-related.

An idea: create quality libraries for working with genome data.
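As a minimal sketch of where such a library might start (my example, assuming plain FASTA input): even parsing the most common sequence format cleanly, with multi-line records handled correctly, is a small but real building block.

```python
def parse_fasta(text):
    """Parse FASTA-formatted sequence data into {header: sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:          # flush the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)             # sequences may span many lines
    if header is not None:
        records[header] = "".join(chunks)
    return records

fasta = """>seq1 test gene
ATGCGT
ACGT
>seq2
GGGCCC"""
print(parse_fasta(fasta))  # {'seq1 test gene': 'ATGCGTACGT', 'seq2': 'GGGCCC'}
```

Libraries like Biopython already cover this ground; the gap is usually in performance, ergonomics, and coverage of newer formats.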

Start a business

If you can find a problem and solve it, a business is a fantastic way to make an impact!
