AlphaFold2 at CASP14: “It feels like one’s child has left home.” (moalquraishi.wordpress.com)
138 points by _Microft on Dec 12, 2020 | 37 comments



"Now we finally get to something interesting: how AF2 actually works! Alas, I will be able to say a lot less than I had hoped for, and here I have to do something which I very much dislike to do but feel that I must—call out DeepMind for falling short of the standards of academic communication. What was presented on AF2 at CASP14 barely resembled a methods talk. It was exceedingly high-level, heavy on ideas and insinuations but almost entirely devoid of detail"

I agree 100% with this paragraph. I watched the AlphaFold presentation and was as disappointed as Mohammed. DeepMind kept all the interesting parts of their method secret, and Jumper dodged one question about the method after the next. I am disappointed because DeepMind would not have had any chance of solving this problem without the CASP community and the advances shared over more than 20 years.


I fully agree with this as well. This is why I think it’s important for research to be carried out by academia: they have no incentive to hide their secret sauce.


Yep. This is soooo sad. The mighty power of Google allows it to, once again, be the winner who takes it all, and unfortunately to get into a position where the results could be protected. So Google will probably leverage its commercial advantage for some years before enough details surface that one can rebuild a solution that can be shared freely.

Note that this protein folding problem was not solved by other mighty giants such as Johnson & Johnson, Pfizer, etc.

So basically, Google has even more power than before.

But to be honest, energy production knowledge (nuclear stuff) is not in the hands of governments but in the hands of the military-industrial complex... so it's history repeating...


The civilizational-struggle lens feels a bit overwrought. The OP notes this was on the competition's timeline, not AlphaFold's, and DeepMind is rushing out a paper to follow soon.


I don't quite agree with this. I work in this field, and it is quite rare to get the code along with a paper from people in academia. In fact, it is often the big companies that try to replicate published work and release the code/model, as with, say, the TF Model Garden.


It depends on what you mean by "the field". Do you mean machine learning? That might be the case, given the corporate dominance of that technique. But in bioinformatics (the domain to which AlphaFold2 applies ML) it is generally expected that all data and code be made available for a tool. Otherwise what's the point? If people can't build upon it, science hasn't been done.


You're overstating the situation. The previous top-performing protein folding method is essentially impossible to reproduce from public literature, and available only via licensing agreement with the lab that created it.

More generally, while it's far more common to find open-source implementations of lab research than it used to be, it's still quite common for code to be closed, and important methodological details to be secret.

Even in the "open" fields of ML, today, I routinely find papers where essential bits of code, parameter sets, etc. are not made available.


In bioinformatics and academic protein folding in particular, there has been a strong tradition of sharing work and building on others' work.

That's arguably why CASP exists, as part of a recognition that the field benefits from collaborative effort to solve the problem - e.g. via public identification of what approaches have worked.


How come the next best protein folding software is proprietary, then?


Lots of AI competitions require that competitors be open source; see for example SAT Race ( http://sat-race-2019.ciirc.cvut.cz/index.php ). This seems to be less the norm in deep learning areas, which is unfortunate.


There's plenty of incentive if you are at risk of getting 'scooped', especially if your livelihood depends on a publish-or-perish model.


What is "in" and what is "out" of academia. Plenty of important research has been conducted in industry, for centuries, from the invention of student's t test through everything that irving langmuir discovered and nyquist sampling theorem. Some things have been discovered in "basically not academia", too, like the chemiosmotic effect.


I've seen many academics hide the secret sauce. Getting tenure is competitive, and any paper you can publish that gets lots of cites is important. Leaving out the tiny details needed to replicate work is common.


DeepMind should re-read their own policies: https://deepmind.com/about/ethics-and-society


Which part specifically? A quick read doesn't indicate anything about being open about the details of their methods. Mostly the policies are about the ethics of how AI is built and used.


> We start from the belief that AI should be used for socially beneficial purposes and always remain under meaningful human control. Understanding what this means in practice is essential.

> We embrace scientific values like transparency, freedom of thought, and the equality of access...


That's wildly different from making everything publicly available.


Not if you claim it is essential to "understand what it means in practice"?


But it is not. Full disclosure is essential for doing the same thing without spending any effort yourself; it's not essential for understanding the science.


Is it a scientific result if it isn't reproducible? Would the result have been accepted if it had been submitted by an unknown?


The reproducible part is that they can consistently fold proteins - which they can.

The fact that they're not telling anybody else how they did it stops it from being public science, but doesn't stop it from being science.


There are two terms that may be conflated in this particular discussion.

Reproducible: Can it be done again?

Replicable: Can someone else do it?

AF2 appears to be reproducible, but not yet replicable.

https://phys.org/news/2019-05-replicability-science.html


> Can someone else do it?

> but not yet replicable.

Why not? I don't see a blocker except for the money to create a team and the computational resources. I think AlphaGo/Zero was harder to replicate than this, and still Leela's team managed to get it working.


They mentioned that they'd publish a paper soon. Why is it a problem, then?


This is an interesting view on DeepMind’s protein folding result, from the perspective of a researcher working in the same field.

I liked in particular the section where he compares research work in the industry with academia.

Advantages of research work in the industry:

- team structure favors focused work: no administrative chores, fund-seeking etc;

- stable teams of experienced professionals, instead of high-churn labs staffed by less experienced grad students;

- large groups working together for the same goal in a “fast and focused” approach.

He also points out that AF2 has a large team of 18 coauthors, which is something very rarely seen in academia, where teams are almost always small.

I have seen the same in many CS fields. Major papers published by FAANGs always have very long author lists, in addition to dozens of other engineers credited inside. It's becoming very difficult for university labs to compete with these kinds of groups, since it takes a lot of effort to bring together even a small group spanning 3-4 different professors' labs.

The author points out that there are important drawbacks caused by research work shifting towards industry:

- little visibility into how they reach their results: there are no lessons learned about what was tried and failed, and why, at least not outside the company;

- it’s often not clear which parts of their solution are important and why, and why certain parts were fine tuned or combined in a certain way;

- dissemination of knowledge outside the company is often minimal;

- focusing on results and metrics minimizes exploration of related ideas that are not directly important for the current objective and metric.

On top of that I would add the risk of accepting misleading results due to limited peer review (see Theranos).

I think these points are spot on. While on one hand it feels like we are making major technological progress, on the other hand a lot of the knowledge, and the ability to use that knowledge effectively, may become locked inside a small group of companies. While their current intentions are noble, I think it would be better for society if major research work continued to be carried out by institutions that operate in the public interest.


> He also points out that AF2 has a large team of 18 coauthors, which is something very rarely seen in academia, where teams are almost always small.

In physics, we routinely have author lists of hundreds to thousands of authors on the big experiments. When resources are pooled to make a measurement, author lists can be huge.

University groups can absolutely compete with teams from industry, especially in consortia. The big difference I see between industry and academia in this context at the moment is that academics have to submit a grant proposal and compete for 6-12 months with scientists nationwide for $1M in compute-credits, whereas someone inside a company looking to flex its ML muscles may be able to requisition the same with an email.


If computation is the bottleneck, one might imagine that crowdsourced computing like Folding@Home and BOINC could offer an opportunity for less resourced teams to compete.


Such resources are only really good for a particular kind of distributed computing: mostly the kind which is easily verifiable, and easily broken down into small embarrassingly parallel chunks which need a lot of computation done relative to their size.

Training large machine-learning models is almost the opposite of this: the datasets are huge and don't split up easily, and iterations (which touch the entire model and a chunk of the dataset) are relatively fast. Even with dedicated hardware, a lot of the challenge in getting these systems to work effectively is in moving the data around between compute nodes. Something like folding@home just will not function for such a workload.
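
To make that concrete, here's a rough back-of-envelope sketch in Python. The model size, step count, and upload speed are made-up but plausible numbers (AF2's actual training setup isn't public), just to show why per-step synchronization is the killer:

    # Hypothetical numbers, purely illustrative -- not AF2's real figures.
    PARAMS = 100_000_000        # assume a 100M-parameter model
    BYTES_PER_PARAM = 4         # fp32 gradients
    UPLOAD_MBIT_S = 20          # assume a typical home upload link
    STEPS = 100_000             # assume ~100k optimizer steps

    # Every synchronous step, each volunteer must ship a full set of
    # gradients upstream (and pull fresh parameters back down).
    grad_mbit = PARAMS * BYTES_PER_PARAM * 8 / 1e6   # ~3,200 Mbit per step
    seconds_per_step = grad_mbit / UPLOAD_MBIT_S     # ~160 s of upload alone
    total_days = seconds_per_step * STEPS / 86_400   # ~185 days, ignoring compute

    print(f"{seconds_per_step:.0f} s/step of upload, ~{total_days:.0f} days total")

A dedicated cluster does the same exchange over a fast local interconnect instead of home broadband, which is roughly the whole argument.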


While interesting, I think this model is only viable for a certain subset of computational workloads.

AF2 took like two weeks of computation on 200 TPUs to train. I’m not sure if they used a data parallel or model parallel distributed training approach, but I don’t imagine either would scale particularly well to a globally distributed network of worker machines in a folding@home analogue...
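
For reference, here is a toy sketch of what the data-parallel flavour looks like (plain numpy, made-up sizes, nothing to do with AF2's real code): every worker holds the full model, and every single step ends with a model-sized gradient exchange, which is exactly the traffic pattern a globally distributed network of volunteers struggles with:

    # Toy synchronous data-parallel SGD on a linear model (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    n_workers, dim, lr = 4, 1_000, 0.01
    w = rng.normal(size=dim)               # the full model lives on every worker
    X = rng.normal(size=(64, dim))         # toy regression data
    y = X @ rng.normal(size=dim)

    for step in range(200):
        shards = np.array_split(np.arange(len(X)), n_workers)
        grads = []
        for idx in shards:                 # each worker: local forward/backward on its shard
            err = X[idx] @ w - y[idx]
            grads.append(X[idx].T @ err / len(idx))
        g = np.mean(grads, axis=0)         # the all-reduce: model-sized traffic every step
        w -= lr * g                        # identical update applied on every worker

    print("final loss:", float(np.mean((X @ w - y) ** 2)))

Model parallelism splits the network itself across workers instead, but then activations have to cross worker boundaries on every forward and backward pass, which is even less forgiving of slow links.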


The problem of experienced professionals vs grad students seems to be a real problem for science. Academia could really do with a build-out of experienced research scientists able to make careers out of consistently building experience and knowledge.

Instead most labs are oriented around a PI that has little time to look into details of most projects due to grant-writing pressures, post-docs who are not expected to stay long-term in lab but whose modal career advancement expectation is the incredibly difficult "land a tenure-track position" option, and graduate students who leave as soon as they've accrued enough scientific knowledge to be truly useful.

It's purely a cost-structure decision on a per-lab basis, but on a societal level it's costly to have most science spearheaded by inexperienced graduate students (who mess up experiments in ways more experienced researchers would not), and costly to train so many people in niche science fields only for them to leave and join unrelated tech companies. Salary may play a part in this decision, but many scientists are passionate about their field of study, and it seems like the real deal-breaker is the lack of career options past the "researcher-in-training" phase.


There are plenty of grad students that do their internships in companies rather than academia. (YMMV depending on the field of course...)


Not just 18 coauthors, but 18 co-first-authors, implying everyone on the large team contributed equally. Large teams of coauthors are not unheard of even in biology, but you rarely have more than 2-3 co-firsts at most.


This seems in character for DeepMind. Last time I saw their org chart it was perfectly flat with absolutely everyone reporting directly to the CEO.


I'm not sure when you saw their org chart, but it's definitely not true anymore.


Coming from a software generalist perspective, I found this phrase very familiar: "experienced hands, high coordination, and focused research objectives—are great for answering questions but not for asking them".

The best engineering teams are excellent at execution, but you need something more to choose the right problems.


I appreciate the criticism, and many of the other commenters here make other great points. I'm left wondering, though, whether they may still make the effort to break their research down and publish the details everyone is looking for at some point in the future. Is it not possible that they just haven't gotten there yet, in the mad rush to polish the software and get ready for the competition?


"requiring trial and error that is not dissimilar from how we used to build bridges before the advent of civil engineering"

How did we?



