Hacker News
AlphaFold: Using AI for scientific discovery (deepmind.com)
363 points by sytelus on Dec 3, 2018 | 123 comments

This is a fantastic achievement. About 8 years ago I had an interesting conversation with a friend about how a material with almost any desired properties can be constructed using proteins. You can build something as hard as a turtle's shell or as soft as a jellyfish. You can build liquids that dissolve plastics, or you can build the most flexible fibers known in the universe. One way to think about proteins is as generalized parametrized materials. If you knew the inverse function from properties -> protein structure, it would change the world far beyond the dollar value of the invention. The DNA mechanism is evolution's best effort yet to build exactly that. This was such a beautiful insight that I remember rushing to Amazon and getting a few books to understand the basics of protein folding. The subject has some extremely beautiful foundational simplicity that is easy to understand, but it quickly gets complex enough that it would be hard to navigate without interdisciplinary mind melds. With this new progress, the hope is that protein engineering through AI will get a huge boost in community attention and more accelerated progress!

This is...rather exaggerated. Scores based on residue distance-matrices have been around for a long time and don't perform especially well, so it was surprising to me that the method described on the webpage would trounce the competition. Then I read the abstract....

While the description of the method (page 11 of this paper: http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf) is pretty vague, they make clear that much of their scoring and structure refinement uses the scoring function from Rosetta. That's a tell that the neural-network part of the method probably isn't sufficient on its own to pick out good structures. The AI, in this case, is generating fragments (which is not exceptionally different from what Rosetta already does) and computing a beta-carbon-distance score.

Basically, the machine-learning part is generating protein fragments and quickly stack-ranking structures created by Monte Carlo search. Everything else is done by a much more complicated physical model that has little or nothing to do with AI.
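For readers unfamiliar with the workflow being described, here is a minimal toy sketch of fragment assembly via simulated annealing. The fragment sets and the `score` function are stand-ins invented for illustration (in the real pipeline the score would be something like Rosetta's energy plus a learned distance term); this is not DeepMind's actual code.

```python
import math
import random

def fragment_assembly(fragments, score, n_steps=10000, t_start=2.0, t_end=0.1):
    """Toy simulated-annealing fragment assembly.

    `fragments` maps a chain position to a list of candidate fragments
    (opaque objects here); `score` is any scoring function over the full
    conformation, where lower is better.
    """
    # Start from an arbitrary conformation: one fragment per position.
    conformation = {pos: random.choice(frags) for pos, frags in fragments.items()}
    best, best_score = dict(conformation), score(conformation)
    current_score = best_score
    for step in range(n_steps):
        # Exponential cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / n_steps)
        # Propose a move: swap in a new fragment at a random position.
        pos = random.choice(list(fragments))
        old = conformation[pos]
        conformation[pos] = random.choice(fragments[pos])
        new_score = score(conformation)
        # Metropolis criterion: accept improvements, sometimes accept worse.
        if new_score <= current_score or random.random() < math.exp((current_score - new_score) / t):
            current_score = new_score
            if new_score < best_score:
                best, best_score = dict(conformation), new_score
        else:
            conformation[pos] = old  # reject the move
    return best, best_score
```

The point of the neural network in this picture is only to propose better fragments and to rank conformations quickly; the search loop itself is plain Monte Carlo.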

How far are we from this technology and what can we do to get there?

Forget clunky mechanical robots; Boston Dynamics can just engineer a fleshy, bulletproof, self-healing skin system. Think skinned dogs.

That is our job at Serotiny. We take plain-language requests for proteins with novel functions (not just "binds more tightly" or "produces this enzymatic product over that one"), and we use the catalog of existing natural protein components to build you a protein that has those desired capabilities.

Imagine any natural input that life can read (light, heat, glucose levels, hormone levels, force, etc.) and any natural output that life can produce (temperature, colors, fluorescence, electrical impulses, etc.). For many of those combinations, we can design a novel protein that links that input to that output.

However, our approach to the problem is very much not like AlphaFold's - we don't try to scan the 20^600 space by changing individual amino acids. Rather, we don't worry about folding or structure (too much) and instead play around with discrete functional modules that already exist in nature. Our approach is a bit more sociological than it is a simulation of physics/chemistry. But it works.
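To put the 20^600 figure in perspective, a quick back-of-the-envelope calculation shows why nobody scans that space residue by residue (plain Python, nothing from the comment above):

```python
import math

# Number of possible sequences for a 600-residue protein drawn from
# the 20 standard amino acids, expressed as a power of ten.
n_residues = 600
log10_space = n_residues * math.log10(20)
print(f"20^600 ~= 10^{log10_space:.0f} possible sequences")  # ~10^781
```

For comparison, the observable universe is usually estimated to contain around 10^80 atoms, so exhaustive search is off the table by hundreds of orders of magnitude.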

Optogenetics tools, CARs, SynNotches, BaseEditors are all curious examples, and there are many more coming online right now.

> How far are we from this technology and what can we do to get there?

Closer than you think; more funding ;)

I know that's a bit cheeky, but the ability to understand and properly manipulate the chemistry of life (not just the D/RNA coding) is finally coming online. Efforts such as the OP's are along those lines.

Many still do not understand (myself included) how revolutionary CRISPR and other such techniques are. The world of fine-tuned, genetically based, auto-organizing chemistry is within sight. And man, it is going to get weird.

Academician Prokhor Zakharov was wrong: we will be able to put an elephant's nose on a giraffe [0].

[0] https://en.wikiquote.org/wiki/Sid_Meier%27s_Alpha_Centauri#U...

Or 'Skin jobs'. Blade runner, here we come!

Any particularly good books you can recommend that you came across on your Amazon binge?

I highly recommend Engines 2.0 rather than the original. Drexler's conception of nanobots, before he had done the heavy lifting of working out the designs that showed up in Nanosystems, probably isn't very practical compared to more factory-assembly-line structures. Thermal motion is just so much easier to deal with in an enclosed tube than at the end of a wobbly arm.

Essential Cell Biology is a pretty good one to start with (Ch 2-7): https://www.amazon.com/Essential-Cell-Biology-Bruce-Alberts/.... There are numerous books like Machinery of Life that don't go into enough detail. Most books specifically on the subject of protein structure, dynamics, and folding are advanced textbooks like Introduction to Protein Structure by Branden and Tooze. My experience has been that following all the details is very difficult without taking formal classes in these subjects, and there is an immense amount of detail that is probably not relevant to our purpose of computational prediction. The hope is that someone soon writes Protein Folding for Programmers :).

Machinery of Life is a fantastic introduction to an intuition for what proteins are and the mechanics at their scale.


One of the big reasons I love this place ... always find great book recommendations. Thanks for that and the insight!

Aka, Polymer Science with a different name.

Polymers are a simple monomer repeated many times. Proteins can be arbitrary non-repeating sequences. These are not the same.

Not so.

There are, of course, 'uniform' polymers made from a single repeating subunit. However, natural non-uniform ones like proteins are also polymers, as are synthetic ones with two repeating units (EVA, for example).

From wikipedia:

> Polymers that contain only a single type of repeat unit are known as homopolymers, while polymers containing two or more types of repeat units are known as copolymers.

Trivially, there are protein homopolymers like polyglycine. They are definitely proteins.

This is technically correct, but it doesn't really capture the spirit of what I was saying - an arbitrarily long keyboard mash of the 20 amino acids is a protein, while most things called "copolymers" are substantially simpler. Perhaps it is useful to differentiate between (simple) polymers and (arbitrarily complex polymers called) proteins, perhaps it is not. I can see why, if you dealt with polyglycine a lot, it would be annoying :)

You are quite right that the properties of most proteins are very different to that of even copolymers.

In fact, simple structural proteins (like keratin, collagen, etc.) are much more like synthetic homopolymers than the complex nanomachines such as ATP synthase or the proteasome.

Proteins are polymers, just not homopolymers (those are simple repeats).

Not really, this is more of a route to nanotechnology via biology. It’s more than just stringing together amino acids.

This is probably the next step for human societies: small-scale (both in production rate and in tooling size) natural hacking. It would avoid large chemical plants that may or may not process waste properly, and would also bring people back into a closer relationship with nature, hopefully saving energy along the way. You can't brand and market proteins much.

The blog is light on details but it appears Alpha is being used for branding here. AlphaFold seems to depart in a significant way from the perfect information game playing agent's architecture. A few questions spring to mind:

1) What is the architecture of the generative network and where exactly does it fit in the pipeline?

2) What is the interaction with the database? Is there an encoder being trained with real sequences further augmented with variations using the generative network?

3) What is the structure of the neural network that encodes the sequence? Is it a graph network, LSTM or simple conv-net?

4) The gradient descent step is very vague. Is it a physically based differentiable model (not a neural network) whose parameters are being optimized with gradient descent using automatic differentiation? Or something else? In short, there's some detail on scoring but how are the proposals being generated?

Questions aside, the results speak for themselves and are head and shoulders above all the other showings. I wonder what it feels like for someone who's been in the field for years.

Despite the high score, there's still a long way to go before results reach real world utility. It's also worth keeping in mind that from a systems biology perspective, protein folding is only a small part of what makes getting clinically useful results difficult.

I might have missed something, but I could not find any indication of an intention to publish further details. That would be disappointing if it were indeed the case.

I'm not really "in the field" but I did some computational work on protein folding in the past.

I have a hunch on how they are doing this.

From the blog post, it appears that the network uses angles and distances of amino acids in a given sequence, learned from known structures, to predict good starting point(s) for regular molecular-dynamics-based structural optimization (what is called the "gradient descent step" in the post).

If I'm wrong then I'll just have to try this approach myself one day...

Yes, it seems you're very much on the right track according to page 12, direct structure optimization: http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf

The document also answers some of my questions. Some of the confusion comes from the fact that they are using multiple methods in parallel. The scoring networks are ResNets. The database was used in training one of the networks.

The generative network section is still unclear, but it looks like it was used in one of their methods: in the fragment assembly step, the network architecture is DRAW (trained on the database). Network-generated fragments are inserted using simulated annealing and ranked either by just their conv-net or by both existing methods and the conv-net. The simulated annealing hyperparameters were optimized with evolutionary search.

Their third approach is the most similar to what you describe. It looks like they combine lots of approaches, including features, conv-net scores, and predictions from other methods. A lot of detail still needs to be filled in, such as how they integrate memory and how gradient descent is actually done directly on protein chain structures.
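As a rough illustration of what "gradient descent directly on the chain structure" could look like, here is a toy sketch that treats network-predicted pairwise distances as harmonic restraints and descends on the 3D coordinates. The harmonic functional form and the plain gradient-descent loop are assumptions for illustration, not AlphaFold's actual potential or optimizer.

```python
import numpy as np

def distance_potential(coords, pred_dist, weight=1.0):
    """Harmonic restraint potential: penalize deviation of pairwise
    distances between points `coords` (an (N, 3) array) from the
    predicted distance matrix `pred_dist` (an (N, N) array).
    Returns (energy, gradient w.r.t. coords)."""
    diff = coords[:, None, :] - coords[None, :, :]      # (N, N, 3)
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)         # (N, N)
    err = dist - pred_dist
    np.fill_diagonal(err, 0.0)                          # no self-restraints
    energy = weight * (err ** 2).sum() / 2
    # dE/dx_i = 2 * w * sum_j err_ij * (x_i - x_j) / d_ij
    grad = 2 * weight * ((err / dist)[:, :, None] * diff).sum(axis=1)
    return energy, grad

def minimize(coords, pred_dist, lr=0.01, steps=500):
    """Plain gradient descent on the restraint potential."""
    for _ in range(steps):
        _, grad = distance_potential(coords, pred_dist)
        coords = coords - lr * grad
    return coords
```

Because the restraints are differentiable in the coordinates, a predicted distance matrix can be turned into a structure directly, without fragment search; this is presumably the spirit of the "direct structure optimization" method on page 12.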

It's possible their computational advantage made a large difference in being able to arrive at this result; existing researchers need not feel too bad.

gradient descent != molecular dynamics; the latter is simulating forces on atoms, not attempting to optimize a target function. You can use molecular dynamics for optimization too - it can be very powerful, just slow - but I don't think that's what they're describing here.

I read it as they are using a molecular dynamics package to define a function (potential energy of the protein structure) and are using gradient descent to (slowly) reach stable extrema (static configurations of the protein).

That makes more sense, but the energy function isn't really a feature of molecular dynamics, anything that does molecular optimization or modeling will have this information built in.

You're absolutely right. My bad, mixed too many ideas at once :). Gradient descent (or similar techniques) can be used for structural optimization with the same atomistic models used in molecular dynamics.

I mean if you can have gradient descent minimize the free energy of the system then it should work.
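As a concrete toy example of that idea, gradient descent on a single Lennard-Jones pair recovers the known minimum-energy separation of 2^(1/6) times sigma. This is a generic molecular-mechanics illustration of "descend on an energy function to reach a static configuration", not anything DeepMind describes.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair potential."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force on the pair, i.e. -dE/dr of the potential above."""
    sr6 = (sigma / r) ** 6
    return 24 * epsilon * (2 * sr6 ** 2 - sr6) / r

def minimize_distance(r=1.5, lr=0.01, steps=2000):
    """Gradient descent on the interatomic distance: step downhill
    along -dE/dr until the pair reaches its minimum-energy separation."""
    for _ in range(steps):
        r += lr * lj_force(r)  # force is -dE/dr, so this is descent
    return r
```

A real protein has thousands of coupled terms like this (bonds, angles, torsions, electrostatics), but the optimization idea is the same; the hard part is that the landscape is riddled with local minima.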

It looks like some details might be published in the CASP13 conference.


For such a breakthrough, there is a good chance they are writing another article for Nature or Science, even aiming for another cover feature.

I would question them more bluntly on the accountability issue though: the scientific community needs full reproducibility of exceptional results these days more than ever, so they should just make an effort on that front imho. They do not need to open source their code and their data in full, just planning a peer reviewed, open demonstration may suffice?

The contest IS an open demonstration? You get unlabeled data, need to guess the labels, and after you submit the labels the external committee tells you how good you were. At this point it might be unreproducible, but the point is that the result has been verified and demonstrated.

The name of the company is Alphabet. Of course the prefix Alpha is a branding label.

That makes it sound like they're betting the company on "deep learning" ("Alpha").

Ooh. Interesting connection! You have to wonder if that wasn’t in the back of someone’s mind during the restructure.

According to Page at least [1], it was partially due to how 'alpha' is used in the trading world - to denote active returns, ie. returns above a benchmark

[1] http://uk.businessinsider.com/why-googles-new-name-is-alphab...

Presumably this was timed to go with the start of the NIPS conference. Maybe they will present something there.

Not necessarily. The CASP13 meeting where the results of the CASP competition are released is also taking place just now (Dec 1-4).

Regarding gradient descent optimization over the global free energy of the entire protein structure, the CASP13 animated gif paints quite a picture. Have we ever seen anything with that degree of fidelity before?

The ability to do this at scale could mean protein folding is effectively "solved", and we can move on to the next phases of computational design in systems biology: namely protein interaction prediction and the design of antibodies, vaccines, novel assemblies, and on and on.

There are >100K molecules in the Protein Data Bank, with new ones added every day. But folding and interaction data are opaque. If immediate predictions could be made, that would in turn create a feedback loop informing design.

I'm actually pretty excited about the implications for zero-g protein factories. Pharmaceutical companies could be a commercial driver for space-based protein crystal fabs.

The coming of age of de novo protein design


Many researchers have already moved on to the next phases, because those are ultimately far more useful problems than the pure guesswork of CASP contests. Of course better prediction tools will mean better designs too, but the existing tools were already powerful enough to yield some pretty impressive non-natural designs.

The ranking of the methods is here: http://predictioncenter.org/casp13/zscores_final.cgi?formula... - the improvement is very similar to the ImageNet improvement when deep neural networks were first successfully used in image recognition, compared to 'traditional' methods.

> Our team focused specifically on the hard problem of modelling target shapes from scratch, without using previously solved proteins as templates.

I have a feeling they’re going after the biologics market with this. Predict structure directly from DNA sequences, simulate affinities, then make a batch and test in-vitro. Throw in a loop to feed back data to make a better DNA sequence. Definitely heading down the road to automated protein design.

Predicting protein structure de novo is still a long way from actual drug discovery. You can generate antibodies much faster if all you want is some protein that binds your target anyway.

That’s just minimising the accomplishment; predicting structure need not be the end of the process for it to be a significant advance.

No, you're exaggerating. I work in the field. Ab initio structure prediction (what this is) is an interesting technical challenge, but has little to no direct impact on biologics (or any other kind of) drug discovery.

The tools and technologies sometimes end up translating but that's a long-term process.

I think you’re suffering from a lack of imagination. It’ll be interesting to see where closed loop directed evolution[0], which is what I was alluding to above, ends up in five years with these kinds of advances.


Ab initio structure prediction has nothing to do with directed evolution. It doesn't enable it or make it better. Directed evolution is a laboratory technique that depends on scale and cycle speed -- huge numbers of variants are screened rapidly against an assay during each round. Ab initio structure prediction adds nothing to the process.

You might argue that you could then predict the structure of the best variants, but predicted structures are all but useless for drug discovery.

Think of this: You have a digital library of sequences that are fed into an AI that predicts structure. The structures are fed into an AI that predicts the desired function based on that structure. You use an optimisation algorithm to identify candidates. You synthesise the most promising candidates and use those as the basis for generating a physical library, and then screen as usual.

You now have a way of navigating sequence space far more effectively, so you can explore more of it. You could also potentially use the results to feed back into the system regarding function, so it could become smarter over time.

You're describing what protein design researchers have already been doing for years. Except for predicting the desired function, which AlphaFold doesn't do either - as the structural genomics projects of the 2000s found out, having the protein structure doesn't magically tell you what it does in vivo.

> You're describing what protein design researchers have already been doing for years.

If that’s so, can you link to any supplementary material about it? Particularly with respect to how machine learning is being used, how the candidate selection process works etc. I’m curious about the subject.

> Except for predicting the desired function, which AlphaFold doesn't do either - as the structural genomics projects of the 2000s found out, having the protein structure doesn't magically tell you what it does in vivo.

Protein function prediction is a real thing, and it requires knowing the structure. Good structure prediction is a step towards this.

Sorry, to be clear, there usually isn't any machine learning involved (at least not in the examples I'm familiar with), but the rest of the process is very similar. My point is just that it's not a new workflow and it's not something that the existing tools can't do; better predictions can reduce the search space and/or the number of iterations, but unless they're suddenly an order of magnitude more accurate, it's still an incremental improvement. It's difficult to guess from the CASP results how well this approach will reduce the number of false positives, which as I understand it is a big bottleneck in the design process - IMHO that's a much more interesting problem to solve than ab initio prediction, although they're closely related.

> Sorry, to be clear, there usually isn't any machine learning involved (at least not in the examples I'm familiar with), but the rest of the process is very similar.

No problem, shame though!

It seems to me there must be scope for using AI to improve this process, given the results it achieves in other domains, and the AlphaFold result is very encouraging. Maybe that order-of-magnitude improvement will eventually be possible.

To add to the GP, there are at least two more fundamental problems with the computational approach:

- chaperones. Not all proteins fold by themselves; quite a few bind to an additional protein that helps them fold into the desired shape. This means that the final state is impossible to reach from the "initial" state by gradient descent alone.

- proteins don't necessarily exist in the minimum-potential-energy state. Moreover, sometimes the state flips on addition of a ligand (e.g. myosin's relationship with ATP), and that's crucial for the protein's function.

So static folding only gets you so far. Unfortunately, nature is hideously complicated and "entangled", so there is a tremendous gap between even perfect protein folding and real in vitro results.

In molecular modeling, at some point you're limited either by quantity or quality of input data (for example known protein structures), by the accuracy of your energy function, or by the inability to simulate at long timescales. AI alone won't magically solve the problem without fixing the others too. (AI combined with quantum computing, maybe?)

I am much more excited about the application of AI to more complex problems like metabolic engineering/synthetic biology, literature mining, and genome-wide association studies. It's a shame the training data are such an incomplete mess, but that'll improve slowly.

Protein function prediction does not require knowing the structure. In fact, nearly all function prediction is done on protein sequence alone, using alignment algorithms.

When I look up the subject, I see several structure-based function prediction methods. That being so, I don't accept that having a better structure does not assist in predicting function, at least when using those approaches.

Are there structure-based methods? Yes. Does function prediction require structure? (i.e. what you said, above.) No.

It's not minimizing the accomplishment - it's an impressive technical achievement, but every new advance in computational modeling for the last several decades has been touted as a revolutionary step forward for drug discovery, and every time it turns out to be an incremental improvement at best. Actually producing a marketable drug is far more difficult than simulating how two molecules fit together.

It seems like this AI makes probabilistic guesses based on empirical data, so you need to validate the guess by comparing it with reality. Sometimes reality is not available, and from my ignorant perspective, a much better way would be to discover the mathematical principles behind protein folding and deductively know the exact folding with certainty. Focusing resources on a deductive solution to the problem is much better than creating a statistical guessing machine. AI solutions often seem to shift focus from deductive solutions to statistical ones, and I don't think that is progress in the long term.

Grounding science in mathematical models is the ultimate goal. But usually, technological advancement and discoveries precede scientific discovery and modeling.

We think that science comes first and that from science we derive technology. But it is usually the other way around: technology comes first, and from technology we derive science and even mathematics.

A canonical example is the telescope. Its invention led to Newtonian physics and calculus. Another is the microscope, which led to much of modern biology.

If anyone is interested in the history of technology and science, I recommend

"Science and Technology in World History: An Introduction" by McLellan and Dorn

Also, if you have Netflix, they have a great documentary called AlphaGo. I don't play Go, but I was able to appreciate the documentary. If you play chess like I do, there are lots of YouTube videos of AlphaZero's games against Stockfish.

If DeepMind's system is as general as claimed and as competent in other fields as it was in Go and Chess, I think it's fair to say that it could give us insights into current and new science/mathematics.

It’s the only progress we’re likely to get on mathematically intractable problems, until maybe a few generations into quantum computing.

The hopes pinned on quantum computing seem to be like hoping holistic medicine cures cancer.

In this case we are talking about simulating physical systems. Most everyone in the quantum computing community recognize that this is likely the most impactful application of quantum computers currently known. As another sibling comment mentions, I do think you're being a little overzealous with your criticism in this case.

Correct me if I’m wrong, but that reads to me as thoughtless eye rolling cynicism. Is there a basis to it? Does quantum computing not offer our best shot at solving currently unsolvable problems?

> Is there a basis to it?

Yes. "Quantum" is used in so many wrong instances that the default reaction to a stranger using it probably should be eyerolling.

>Does quantum computing not offer our best shot at solving currently unsolvable problems?

Not in general; there are very few problems where Shor's algorithm or Grover's algorithm apply. There are many more intractable problems that are out of reach regardless of computing power. There are, though, a lot of 'unsolvable' problems today that are just a matter of more (non-quantum) hardware and software.

If you want to rid yourself of quantum computing delusions in particular, try reading Aaronson: https://www.scottaaronson.com/democritus/ and maybe, for a primer on the sort of problem Shor's algorithm can help with: https://www.scottaaronson.com/blog/?p=208
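To make the "very few problems" point concrete: Grover's algorithm gives only a quadratic speedup for unstructured search, which is easy to quantify as a query-count comparison (a standard textbook fact, sketched here in plain Python):

```python
import math

def classical_queries(n):
    """Expected oracle queries for classical unstructured search over n items."""
    return n / 2

def grover_queries(n):
    """Approximate optimal number of Grover iterations, (pi/4) * sqrt(n)."""
    return math.floor((math.pi / 4) * math.sqrt(n))

# Quadratic speedup: substantial, but nothing like the exponential
# speedup Shor's algorithm gives for factoring.
for n in (10 ** 6, 10 ** 12):
    print(n, classical_queries(n), grover_queries(n))
```

So a search over 10^12 items drops from ~5 * 10^11 classical queries to under a million Grover iterations; helpful, but it turns exponential-time problems into slightly smaller exponential-time problems, not tractable ones.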

I can't really add anything to this reply, it hits the nail on the head with some references I've not seen before.

Thanks Jach.

Very well. Point taken.

It isn't even clear whether the class of problems quantum computers can efficiently solve is a strict superset of the class of problems classical computers can efficiently solve, before you even consider whether we can build quantum computers with any reasonable number of logical qubits: https://en.wikipedia.org/wiki/Quantum_supremacy

A quantum protein folding paper was released a month ago.

I think it's a better fit than AI.

A long-running joke at CASP was that eventually a neural network would be trained that could predict structures as well as Alexey Murzin. Alexey is the guy who would routinely "win" CASP by having memorized the structure 6 months before from a preliminary structure poster at some random conference (he also helped create SCOP).

I imagine there are hundreds, if not thousands, of scientific problems that DeepMind could help with. It's just a matter of getting people working on the problems.

I wonder if they're gonna collab with Neuralink. Among all of Musk's startups those two are poised to synergize the most.

DeepMind is owned by Alphabet, not Elon Musk.

Mixed up Deepmind and OpenAI again, my bad.

Reading the description of the submission process is pretty interesting: http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf. There were multiple submissions and manual synthesis of different model results in some cases.

I’m curious how this competition works exactly — it seems like a set of label predictions are submitted and some form of accuracy result feedback is provided (a single accuracy score for the whole prediction set?). And that there are a certain number of allowed submissions ...? How much of the ultimate strategy for playing this game at a high-level ends up being around optimizing for receiving as much leaked information from the test set as possible — is best guess at this point that this result is likely to be a good indicator of a true increase in prediction capability ...?

The targets are completely unknown. They were experimentally solved, but not yet deposited in the Protein Data Bank. You basically get a target (meaning a sequence of amino acids) every day, and then, depending on the category, you have three days to predict it (I think up to two weeks for "human" servers). In the modeling category they participated in, you can submit 5 models. A model is a fully predicted 3D structure of the protein.

The targets are mostly independent. Some targets were split over a few days, but in principle, a target on day X has no connection to a target on day Y. The targets have varying difficulty. There are two categories: TBM (template-based modeling) and FM (free modeling). You don't know which protein corresponds to which category; you can only guess by looking at the available template data. They focused on FM targets, meaning there are no homologs available.

It's hard to say how good an indicator the results are. Looking at the contact prediction results, many methods are getting very good at constructing MSAs (gathering similar sequences). We already saw this at CASP12 - I think the FM targets are getting "easier" in that sense. There is basically zero feedback throughout the whole competition. Some targets are released after the deadline (because of publications), but in general, you don't know anything until the CASP meeting, which is currently taking place. The competition ended in August.

There are multiple neural network models which look like they are aiming to do the same thing. I wonder whether DeepMind won just because of computation power and expertise in training neural networks, or whether they came up with a much better and different model than the other participants.

This was a pretty well written article. It was slightly technical, but still accessible to the average lay-person.

Tangential, but great interview with Demis Hassabis here:


(skip to 01:30 for real start)

It's great that computational protein design is finally reaching a point where our computational and algorithmic capabilities have made it possible to actually start implementing legitimate real potential products. This is what people like D.E. Shaw Research, David Baker's Lab, Vijay Pande's Lab, and countless others have spent the better part of their lives on, and I'm incredibly excited for what we can and will achieve with this technology.

I wonder if team “Zhang” is angry. They were way ahead of the rest of the entrants and would have had an impressive result if they hadn’t been blown out of the water by A7D.

I have written an intro to the problem for the lay-person, to the best of my ability. I hope this will be helpful for some of you [1].

[1] https://blog.nilbot.net/2018/12/pipeline-protein-structure-p...

Is there something equivalent but for medicine? Essentially AI-assisted medicine generation. If not, does anyone know what the field is called (I thought it was molecular biology, but I might be wrong)?

You might be interested in computer-aided drug design [1], in particular computational target prediction. It’s a fairly big field (both in academic research and in commercial application), and not exactly new. This is technically a subfield of molecular biology but that term is extremely broad.

[1] https://en.wikipedia.org/wiki/Drug_design#Computer-aided_dru...

That kind of thing is generally thought of as machine-learning, or AI assisted "Drug Discovery", but there's not a great name for the field at large. Because there are so many different kinds of 'medicine' at this point, there are a number of companies that use similar tools for various definitions of medicine:

Small Molecules: http://www.twoxar.com/ & http://atomwise.com/

Antibodies: http://www.antiverse.io/

Multi-Domain proteins: https://serotiny.bio (my company)

Yes. People are interested in protein-drug interactions (protein-ligand in general) as well as drug-drug interactions... and small molecule structures (now helped by a fast cryo-EM procedure). I bet the fast cryo-EM techniques will be adapted to proteins too, so we can get very good deep NN ML models in the very near future.


> But protein folding is far from a solved problem, fear not. XKCD’s take on this remains accurate! It’s going to be very interesting indeed to see the progress over the next few years in this area, but that progress is not going to be the discovery of some general solution. It’s going to be a mixture (as mentioned above) of better understanding of the physical processes involved, larger databases of reliable experimental data covering more structural classes, and faster/more efficient ways for searching through all these (both the possible structures and the real ones) and generalizing rules to tell us when we’re closing in on something accurate.

Is this able to solve all pending Folding@Home tasks?

The majority of ongoing Folding@Home tasks are not aimed at structure determination, but rather at simulating the conformational dynamics of folded proteins (exploring the energy landscape rather than searching for the global minimum). Very few of the CASP algorithms are well-suited for this problem.

Possibly not, as in no new FahCores. But they might be useful in conjunction with each other: AlphaFold is useful for protein structure prediction, and Folding@home can confirm the predicted structure through simulation.

Is the confirmation significantly faster or is the same amount of work needed?

How does it compare with Foldit's top entries? I thought a semi-guided approach was by far the best way to solve this.

What's the path from protein folding to 'new potential within drug discovery?'

The path is most likely through reliable structure prediction of drug targets. That would open up rational drug design projects that may have previously been impossible. The only problem is that experimental structure determination is so good in pharma, that it's hard to compete. For example, on a structure-enabled project, it may be possible to experimentally solve multiple high-resolution 3D models per week with an order of magnitude higher accuracy than predicted models. Once you can routinely get structures, there's still the rest of the drug discovery pipeline left to go.

Ah, interesting. How do you experimentally verify that you've solved the 3D model? My first guess is that you can have an agent bind to targets with that structure and then figure out how much binding occurred. This is a very uneducated stab, though.

> experimentally solve multiple high-resolution 3D models per week

I thought this was only true if you already have a structure; otherwise you typically run into the [phase problem](https://en.wikipedia.org/wiki/Phase_problem), which is often a significant hurdle. But I haven't done much biochemistry in years, so there might be better approaches than MAD/MIR/etc. that make the phase problem a non-issue.
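For readers unfamiliar with it, the phase problem in one equation (standard crystallography background, nothing specific to this thread): reconstructing the electron density requires both the amplitudes and the phases of the structure factors, but a diffraction experiment measures only intensities.

```latex
% Electron density as a Fourier sum over structure factors F_{hkl}:
\rho(x,y,z) = \frac{1}{V} \sum_{hkl} \lvert F_{hkl} \rvert \,
              e^{i\varphi_{hkl}} \, e^{-2\pi i (hx + ky + lz)}
% The experiment measures only intensities, I_{hkl} \propto |F_{hkl}|^2,
% so the phases \varphi_{hkl} are lost and must be recovered indirectly
% (e.g. MAD/MIR, or molecular replacement from a known similar structure).
```

This is why already having a related structure helps so much: molecular replacement supplies approximate phases.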

If you already have the structures, what is there left to solve?

It’s a fairly direct path, since computational drug discovery relies heavily on protein folding predictions.

How about they scale this up to predict human phenotypes from DNA sequences.

Deriving phenotype from genomics is partly a data problem. To give a computer the best chance of definitively correlating phenotype with sequence, you need millions (or tens or hundreds of millions) of DNA sequences from unrelated individuals across every possible genetic lineage. You'd also need objective analysis of phenotypes without cultural influence; e.g., someone who is boisterous in one culture would be considered normal in another, independent of the actual genetics.

Computationally, this is statistical analysis, and I doubt that AI would be able to offer anything unique. Protein folding prediction, on the other hand, is more of a question of "where do you start to arrive at the answer most efficiently", and AI is well suited for this, and it would be much better than humans at prediction using methodologies and correlations far outside of human brain capability.
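To illustrate what "statistical analysis" means here: the core of a genotype-phenotype association test can be as simple as a chi-square test on allele counts between cases and controls, done per variant across the genome. All counts below are invented for the example.

```python
# Toy GWAS-style association test for a single SNP: do allele frequencies
# differ between people with a phenotype (cases) and without (controls)?
cases = {"A": 340, "a": 160}      # allele counts, phenotype present
controls = {"A": 260, "a": 240}   # allele counts, phenotype absent

def chi_square_2x2(row1, row2):
    """Pearson chi-square statistic for a 2x2 table of counts."""
    total = sum(row1.values()) + sum(row2.values())
    stat = 0.0
    for allele in ("A", "a"):
        col_total = row1[allele] + row2[allele]
        for row in (row1, row2):
            expected = sum(row.values()) * col_total / total
            stat += (row[allele] - expected) ** 2 / expected
    return stat

stat = chi_square_2x2(cases, controls)
print(f"chi-square = {stat:.1f}")  # -> chi-square = 26.7 (large => association)
```

A real study repeats this (or a regression variant with covariates) for millions of SNPs and corrects for multiple testing, which is why sample sizes in the millions matter so much.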

OK, but how about just faces? I wonder if it is possible to find enough data for that.

It is hard enough finding the similarities between children and parents. The assumption of face phenotyping is that you would be able to roughly guess what someone will look like based on their grandparents, which seems unlikely for an AI to accomplish; humans, after all, evolved specifically to read subtleties in faces and to tell friends and family from foes.

Yes, some people do this e.g. extract a plausible face from DNA.

Is there any chance this could be used for materials science instead of drug discovery? I personally doubt the world needs more drugs.


Scary. I never thought I'd hate to see protein folding being solved.

It's not solved.

It's being solved. Given how important this is, we'll have very good models fairly soon.

why is it scary?

It's weird that people seem incapable of being optimistic about scientific and technological progress, unlike, say, in the '50s, when it seemed many genuinely believed we would explore space, have robots, etc.

Now people just try to malign AI, genetic engineering, nuclear power etc.

It’s not unreasonable to be cautious. Nuclear accidents did happen, which is what led people to fear nuclear power, even if it is safer than the public realises. Not barging full speed ahead into unknown technologies with large downside potential is only sensible.

The public is also afraid of nuclear technology because of the nuclear weapons that destroyed the cities of Hiroshima and Nagasaki, each with a single bomb.

I would argue that the weaponization of nuclear power, and a generation growing up with nuclear bomb drills, was what drove the fear of nuclear. Before that people were putting it in their toothpaste.

There seems to be a trend of indulging in oppressive fantasy.

Maybe it balances out any irrational exuberance about the arrival of new technological frontiers. Or maybe some folks are just grumpy.

The danger is when greed takes over, e.g. as can be seen with social networks: a useful tool being milked in a way that makes societies crack at their foundations. Custom design of proteins could be used to treat and cure most diseases, but curing doesn't earn as much money as treating.

New science is always scary, and the better it works the scarier it is.

Gene technology has the potential to develop truly targeted weapons. What's worse is that such a weapon could spread through anyone while only attacking its intended target(s).

> Gene technology has the potential to develop truly targeted weapons. What's worse is that such a weapon could spread through anyone while only attacking its intended target(s).

So, no collateral damage? Targeted assassinations? This is the CIA's wet dream.

It is scary for legitimate reasons. It doesn't mean we should stop the advancement of science, but ignoring human tendencies to leverage the best of science to amplify the destructive abilities of worst in human nature is a bit on the naive side.

Whether we are wise enough to handle our intelligence is still an open question.

Is this science, or machine learning disguised as science to try and appear more useful?

Any data available to create an open source version?

Their methods don't sound particularly groundbreaking; the achievement seems to be in the implementation. I hope people don't jump to the conclusion that AI can do science, because there isn't actually any experimentation here, just a statistical analysis of a few key amino acid properties.

I hope that they will be able to abstract some formulas or rules about protein folding from the mess of statistics. I imagine that having a rulebook would be much more efficient than using AI, because protein folding isn't so much a game of chance as it is an extremely convoluted puzzle.

The rulebook approach is basically what groups like David Baker's have been doing for years. However, actually applying the rulebook is a difficult problem; I expect AI would prove much more efficient in the end, especially with optimized hardware.

This is my major gripe as well. Brute forcing isn't science. It's data analysis and machine learning. Not science. I don't know if I will ever trust an AI to do 'science' for me, to be quite honest. This just seems like a marketing ploy to get more venture funding. Hey, now we aren't just playing games, we are doing real science. Please donate 5 billion dollars so we can keep running up our AWS bill. These science models won't train themselves!!?!
