AlphaProteo generates novel proteins for biology and health research (deepmind.google)
304 points by meetpateltech 4 days ago | 98 comments





In my humble opinion, this work is not that innovative: de novo protein binders have been done to death, either by AI approaches or otherwise. Check out the work by David Baker’s group, for instance. They have a myriad of examples already.

That being said, as others have commented, my hopes are that all these advancements lead finally to reliable design methods for novel biocatalysts, an area that has been stalling for decades, compared to protein folds and binders.


Agreed on the hopes that these methods lead to novel biocatalysts (but they aren’t quite there yet).

David Baker’s lab has recently published on using their own diffusion model (RFdiffusion) to design novel biocatalysts that perform hydrolysis using a catalytic triad of serine, aspartic acid, and histidine, as well as an oxyanion hole, which is much more complex than the binders designed by AlphaProteo [1].

It gives me hope that we’ll soon be able to design biocatalysts as good as natural ones, but for any problem we care about.

1. https://alexcarlin.bearblog.dev/novel-enzymes-from-a-diffusi...


Yeah, I checked the preprint. Engineering serine hydrolases is kind of a low-hanging fruit, but a great starting point nevertheless!

Yeah, I agree. Honestly I found AlphaFold 3 to be way overblown as well. Engineering new biocatalysts is exactly what my group is working on rn.

Maybe this is in the supplement of the whitepaper [0], but I would have loved to see more analysis of how novel the designed proteins really are.

In the whitepaper they mention that they are novel compared to other in silico design techniques, but to my knowledge other binders to VEGF and the Covid spike protein exist and would already be found in the PDB, which DeepMind trained the model on.

This is not to minimize the results: if the history of ML is anything to go by, even if AlphaProteo does not currently beat the best affinity found by in vitro screens, I do not doubt that it soon will!

[0] - https://storage.googleapis.com/deepmind-media/DeepMind.com/B...


Might depend on what your measure of 'novelty' is in protein structure. A single residue change (for example) would not normally be considered a novel structure - it's just a mutation.

However, a new fold - that is, the shape that the backbone folds into - would be novel. Potentially also novel would be 'chimeric' structures with parts from other structures, as with chimeric domain swaps.

There was a structure designed by the Baker lab called 'Top7' (https://pubmed.ncbi.nlm.nih.gov/14631033/) that I remember as groundbreaking at the time :) (in the ancient days of 2003, it seems ...)


Exactly. If the proteins suggested in this paper are very similar to known good binders in PDB then I am much less impressed by the results. You could argue they are generating a structure from the training set.

I want more info about how novel these proteins are.


They must be somewhat novel in that the wet lab work verified up to 10x stronger binding as predicted. I agree it would be interesting to see how they compare to known binding proteins

We've been able to design tight binders for quite some time now. The issue with synthetic designs is that they tend to bind a little too tightly. You want to have a reasonable off-rate, and the ligand protein should do more than just bind: it needs to effect some sort of response from the bound protein.

When you look at these synthetics they often maximize for interactions of hydrophobic areas on the surface.
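To put rough numbers on the off-rate point, here is a minimal sketch using the textbook relations K_d = k_off / k_on and residence time = 1 / k_off. The on-rate and the two off-rates below are made-up illustrative values, not measurements from AlphaProteo or any real binder.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant K_d = k_off / k_on, in M."""
    return k_off / k_on


def residence_time(k_off: float) -> float:
    """Mean lifetime of the bound complex, 1 / k_off, in seconds."""
    return 1.0 / k_off


def half_life(k_off: float) -> float:
    """Time for half of the complexes to dissociate, ln(2) / k_off, in seconds."""
    return math.log(2) / k_off


# Hypothetical binders sharing one on-rate (assumed) but differing in off-rate.
K_ON = 1e6  # 1/(M*s), illustrative assumption
hypothetical_binders = {"moderate": 1e-2, "very_tight": 1e-5}  # k_off in 1/s

for name, k_off in hypothetical_binders.items():
    kd = dissociation_constant(K_ON, k_off)
    print(f"{name}: K_d = {kd:.1e} M, residence time = {residence_time(k_off):.0f} s")
```

With the same on-rate, a 1000x lower off-rate gives a 1000x lower K_d but also keeps the complex bound 1000x longer, which is the trade-off being described.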


Two minute papers video on the subject: https://www.youtube.com/watch?v=lI3EoCjWC2E

I studied molecular biology and I couldn't contain my excitement when it was able to bind to another protein. I don't think HN realizes how huge this is. With this level of accuracy, we can understand the full mysteries not only of ourselves but of literally any biological entity.

With that level of understanding, its easy to fabricate special medicines that target specific biochem pathways, but more exciting is that we can literally "code in 3d world". We'll be able to print and grow organs in mass. We'll be able to design structures that will bind to target proteins responsible for certain traits. The potential boon to human medicine will be enormous.

I got like goosebumps after watching that video because I understood the implications of being able to predict folds and now generate proteins that will bind to any protein we choose!!!!

We just might have discovered a panacea of sorts and Demis and his team should receive the Nobel Prize.

I'm just ecstatic that we'll see so much drastic improvement in human medicine and importantly how accessible they will be with this new discovery.


> We'll be able to print and grow organs in mass.

I'm a PhD candidate doing my thesis work on stem cell models and tissue engineering for organ transplant...I think this technology is certainly a large leap forward but I think you are a little overzealous with this claim.


Engineering biology is much much harder than simply knowing approximately how to fold proteins. Obviously folding proteins is an important first step.

Arc Institute and others are already looking at whole cell models, i.e. systems biology. It's a totally different level of abstraction.


How do you feel about the potential bioterrorism alternate angle of this capability?

If you haven't heard the phrase "utility-scale molecular sensing" or given it any thought, please prepare to update every opinion you might have about bioterrorism, and ask Keltar how the MR1 is our greatest plausible non-authoritarian line of defense. www.molecularreality.com Keltar is the little green guy in the upper left.

I don't see it as much different from what we have today.

Nowadays I am pretty sure that a bad actor with enough money can get their hands on strains of really bad stuff, since many labs created those to do research. With enough $$$ I bet you can get your hands on them and then release them.

Perhaps the only difference I see is that it could give you the possibility (at least in the beginning) to somewhat target the spread. However, given enough mutations it is very likely you go back to an uncontrolled pandemic, and so there is no difference from what someone could achieve today.

Additionally, IMHO, getting the blueprint for how to build your protein and then making it so that an existing virus/bacteria can produce it, carry it, ... sounds harder and costlier than bribing someone to get something out of said labs.


Oh and don't get an account it's broken right now, I'm building the site and hardly anyone has been there yet and the Auth0 is fucked up. But just give your email to Keltar if you wanna followup.

LMAO this is some low effort LLM

How could you say that when you didn't chat with it at all?

There are already easier ways to do bioterrorism if you are thinking of a virus that kills a lot of people.

it would be like committing terrorism using silicon wafers

you would have to infiltrate an extremely guarded facility

you would somehow have to bypass QA

it's not like somebody on the assembly line for a new protein drug sprinkles a dose of PCP

one potential dual use could be somebody modifying a fruit popular with birds and then dropping seeds at the local organic farm fair

and then when those seeds are consumed by birds they produce poop dangerous for other animals to consume

you could absolutely screw around with the ecosystem, like whoever has access to this programmable "bio-wafer" will be able to play god totally undetected.

the problem is that "bio-wafer" manufacturing process will be very tough and regulated like the CNC machines used to manufacture jet engines depriving certain countries from being able to churn out their own jet engines


It's not regulated if a government does it.

> With that level of understanding, its easy to fabricate special medicines that target specific biochem pathways

The problem is to find the right target or pathway in the first place. Just go to opentargets.org. There are lots of potential targets by different metrics, but for many diseases we haven't identified that single target that lets us improve the life of, say, 20% of patients for disease X.


Let’s see a real drug coming out of this then we can talk

In 2035 everybody can live forever.

That channel is just a hype machine. Like half the stuff he says is barely comprehensible if you know the actual science.

When he started the video with "I had the honor of having an exclusive look at it" I knew it was all marketing.

Interesting work, but there's a huge sector they're missing - industrial enzyme and catalysis design. Most of this field is concerned with small molecule binding - methane, carbon dioxide, ammonia, methanol, acetic acid, etc. Binding is often just the first step, as you're typically trying to do highly specific chemistry, e.g. attaching a single oxygen to methane or a single hydrogen to carbon dioxide, etc.

Working in this area might also be a good test of their technological approach, as small-molecule binding can be somewhat challenging, and even evolved biological systems can struggle to achieve high specificity.


I want to mention an interesting industrial enzyme project. If you ever saw the laundry detergent commercial "Protein gets out protein", this is referring to an industrial enzyme in laundry detergent. Many years ago, Genentech had built up a significant capability in proteases, which are proteins that cut other proteins into pieces. In the course of optimizing proteases, they made a thermostable, thermoactive protease. Although it wasn't super useful for Genentech in a drug discovery context, it was recognized that you could put an inactive enzyme into laundry detergent that would be activated when the hot laundry water hit the detergent, and the resulting protease would be good at cleaning stains (many stains are composed of protein: blood, food, etc.).

Genentech set up a subsidiary with Corning (the glass company) that owns the IP for this protease and then licensed it to laundry detergent manufacturers; many billions of dollars in revenue. I think this is one of the original patents: https://patentimages.storage.googleapis.com/d9/ca/6f/2fb89ff...


My guess is that this area is much harder to break into. Enzymes facilitate challenging chemical transformations by stabilizing high-energy transition states in chemical reactions. These states are usually highly transient and therefore much harder to capture using the structural biology techniques that generate the structural data that AlphaFold and similar methods are trained on. Even though there are many structures of enzymes in the absence of their substrate, I would imagine that the small number of structures for states that represent actual catalytic intermediates would make it difficult for a model to internalize the features that distinguish a good enzyme/catalyst from a bad one.

Another consideration is that most protein structure prediction methods only generate the backbone, and the sidechains are modeled in afterwards. Enzyme efficiency requires sub-Å level structural precision in the sidechains that are actually doing the chemistry involved in catalysis, so it could also be the case that the current backbone-centric methods aren't good enough to predict these fine-tuned interactions.


Interested observer here, not an expert: My understanding is that they are using another model called FermiNet for chemistry research https://deepmind.google/discover/blog/ferminet-quantum-physi...

What is Google actually doing with these systems? Are they using it to develop new drugs themselves? Or licensing it to the pharmaceutical industry?

IIUC most of the commercialization is done through Isomorphic (https://www.isomorphiclabs.com/). My guess is that Google Research/DM itself wants to stay at the front of the field rather than develop drugs (of which protein design is really just a tiny contribution).

When I worked at Google I made a case for doing protein design/preliminary drug discovery using Google infrastructure and it was well received by the leadership. The leadership at Google is mostly computer scientists who know about, but can't actually do, leading-edge life sciences research, and they want to contribute some amount of Google's resources to advancing the state of the art. That's the only reason Exacycle was permitted: Urs thought we could maybe help save the world with protein design (and it wasn't a good approach because it wasted enormous amounts of power on unbiased sampling of large proteins).

Honestly I don't think Google proper is really a good place for this work to be applied, though. Their attention is easily diverted, they repeatedly fail to commercialize, and most importantly, potential partners are scared Google will steal their data, and replace their business.


I wonder why things seem to work well with Waymo? Google was never in the auto industry, but they were able to create a subsidiary that has become a leader in automated driving systems.

Yeah, Waymo is a bit of an outlier and I think it's got to be a directive from Larry to spend some amount of money/engineering effort to move it forward, with the expectation that it will transform the world into a better place (rather than generate a lot of revenue for Google).

So DeepMind is doing more of the fundamental stuff and Isomorphic the commercialization? I thought Isomorphic would have been carrying the protein projects now, since Google Brain/DeepMind, now Google DeepMind, is focusing on catching up with OpenAI and stuff and less on fundamental research.

I know that you can actually use AlphaFold at least. My wife, a microbiologist, told me she's used it a couple of times at work. I don't know what their monetization model is, or whether her lab had to get a license or anything. But I know scientists are using it.

I play Go recreationally. I don't think I can use AlphaGo (or its successors) directly, but the published research on AlphaGo has inspired other strong Go AIs. Online Go platforms integrate them to offer AI matches as well as analyze games between humans. I also know that professionally ranked players are adopting things learned from AI into their own play, and a lot of traditional joseki (analogous to chess openings) are being rethought based on insights from AI play.


It's licensed through Google Cloud as one hosted option, but also open sourced.

Alphabet has a medical division - https://en.wikipedia.org/wiki/Verily . My somewhat cynical take is that most extremely wealthy individuals would like to live longer.

But more immediately this is an interesting and relevant problem to solve, so it serves as a tool to benchmark and improve AI... and the current theory is that at some point AI will {generate unlimited amount of wealth | lead humanity into the post-scarcity society | solve all human problems by eliminating humans}.


They penned a 3B deal with Novartis and Eli Lilly I believe

Cynical take - mostly to appear like a diverse tech company with lots of different products and services so they don't get regulated for their strength in the search advertising market.

I think occam's razor might suggest they're trying to diversify their revenue so if search declines, they have fallbacks.

could be both

This is being done by DeepMind, and DeepMind's main founder has always said he wants to use AI to advance basic science. I think that is maybe why they fell behind on chatbots for a bit.

As a Google engineer, I think it's two things:

- Great for recruitment: You're the most talented $SKILL in the world? Come to the team that is pushing humanity forward in all the ways that matter.

- Larry and Sergey actually care about humanity, and being a billionaire is kind of a side-effect of what they would have done anyway.


It's not like Google is the first company to have a will R&D department. Xerox invented the mouse. AT&T invented Unix and C. Etc.

Not sure if you know this, but neither Larry nor Sergey runs Google right now.

I have a question that hopefully a molecular biologist can answer. Can tools like this potentially create protein structures that specifically bind in certain cells? Or is this more about a way of being able to create proteins for genes / structures we haven't been able to before?

I'm very interested in my research at the moment in pleiotropy, namely mapping pleiotropic effects in as many *omics/QTL measurements and complex traits as possible. This is really helpful for determining which genes / proteins to focus on for drug development.

The problem with drugs is in fact pleiotropy! A single protein can do quite a lot of things in your body, either through a causal downstream mechanism (vertical pleiotropy), or seemingly independent processes (horizontal). This limits a lot of possible drug targets, as the side-effect / detrimental effect may be too large.

So, if these tools can create ultra specific protein structures that somehow only bind in the areas of interest, then that would be a truly massive breakthrough.


For anyone who would like to know more about designing proteins with a certain function, target, or structure in mind, the term to search for is "rational design."

https://en.m.wikipedia.org/wiki/Rational_design


Thank you for this, terms of art are the silent gatekeepers...

As an aside, learning the precise terms for concepts in fields in which I'm a layperson (or simply have some cobwebs to shake loose)--and then exploring those terms more--is something that I've found LLMs extraordinarily useful for.

Also "off target effects".

This research is focused on modeling individual protein binding sites. Pleiotropic effects and off-target side effects are caused by interactions beyond the individual binding sites. So I don't think this tool by itself will be able to design a protein that acts in the way you describe (and that's putting aside the delivery concerns - how do you get the protein to the right compartment inside the cell?).

But novel binding domain design could be combined with other tools to achieve this effect. You could imagine engineering a lipid nanoparticle coated in antibodies specific to cell types that express particular surface proteins. So you might use this tool to design both the antibody binding domain on the vector and also the protein encoded by the payload mRNA. Not all cell types can be reached and addressed this way, but many can.


Yes, in principle but there are huge limitations and challenges to using a protein as a drug in living organisms. It has to be injected to avoid digestion, and a protein can't just pass into a cell, it needs to get in somehow. Current peptide drugs like insulin are identical to, or closely mimic natural small peptide hormones that bind to receptors on the outside of a cell. However, there is a possibility of using gene therapy to directly express a novel protein drug inside of the cell. A novel protein is also likely to trigger an immune response- so that type of gene therapy is mostly useful when that is actually desired, e.g. as a vaccine.

While they can generate proteins that bind to specific structures with high accuracy, achieving true cell-specificity and avoiding unwanted pleiotropic effects involves many more variables beyond just protein-protein interactions. These tools are more about expanding our ability to target previously "undruggable" proteins than about solving the cell-specificity problem outright. However, they could be valuable components in developing more targeted therapies when combined with comprehensive research on pleiotropic effects across multiple omics levels. The real breakthrough will come from integrating these protein design capabilities with a deeper understanding of complex biological systems and developing strategies for precise delivery and regulation of these novel proteins in vivo.

Not an expert, but you could imagine a protein with two receptors that are required for activation. One of them binds to a protein that is only present in the cells of interest, and the other one binds to the actual target.
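A back-of-the-envelope sketch of that two-site "AND gate" idea, assuming simple single-site equilibrium occupancy, [L] / ([L] + K_d), and assuming the two sites bind independently. All concentrations and K_d values are invented for illustration, and the function names are hypothetical.

```python
def fraction_bound(conc_nm: float, kd_nm: float) -> float:
    """Equilibrium occupancy of one binding site: [L] / ([L] + K_d)."""
    return conc_nm / (conc_nm + kd_nm)


def and_gate_activation(marker_nm: float, marker_kd_nm: float,
                        target_nm: float, target_kd_nm: float) -> float:
    """Fraction of molecules with both sites occupied (independence assumed)."""
    return fraction_bound(marker_nm, marker_kd_nm) * fraction_bound(target_nm, target_kd_nm)


# Hypothetical numbers: the target is present everywhere at 100 nM, but the
# cell-type marker is abundant only in the cells of interest.
KD_NM = 10.0
on_target = and_gate_activation(marker_nm=100.0, marker_kd_nm=KD_NM,
                                target_nm=100.0, target_kd_nm=KD_NM)
off_target = and_gate_activation(marker_nm=0.1, marker_kd_nm=KD_NM,
                                 target_nm=100.0, target_kd_nm=KD_NM)

print(f"cells of interest: {on_target:.3f}")
print(f"other cells:       {off_target:.4f}")
```

Under these assumptions, requiring the marker site suppresses activation in cells that lack the marker even though the actual target is present at the same level in both. Real systems would also depend on cooperativity, avidity, and delivery, none of which is modeled here.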

[flagged]


What LLM wrote this??

So I write a completely defensible rant full of truly interesting and well-informed perspective, get downvoted, and then get accused of being an LLM.

A perspective I'm sure the down-voters have zero cred to doubt. Style issues aside, I don't think there's a serious molecular biologist on the planet who would take issue with the actual gist of what I said.

Pleiotropy: a thing happening can cause more than one other thing to happen. We really need jargon to keep that in mind?

An LLM? This is what I get for writing with passion? Creatively? Daring to play with words? I'm an LLM? For writing anything that doesn't fit your norms? Wow.

How do I get MORE downvotes? They seem like badges of honor in this case.


Question for bio folks here, and not to steal from the joy of this article, but I've recently been curious how far we are from engineering something like a virus that targets a subset of the population (e.g. via specific genetic markers). This sort of tech being commoditized feels much, much scarier than the LLM safety talk, by a mile.

Making proteins is nothing like designing life or viruses. It's barely even related.

I'm going to play the devil's advocate and disagree.

The viral life-cycle comprises attachment/entry, replication and maturation/release. These stages are generally well understood to the point where 'disarmed' (replication-incompetent) viruses are routinely used as a delivery vehicle in molecular biology.

The first part, attachment/entry is directly related to protein-protein interactions (between the envelope protein of the virus and the entry receptor of the host cell). This particular interaction determines the tropism of the virus, that is, its capability to infect a particular type of cell. Examples include the interaction of gp120 protein of the HIV virus and CD4 of a helper T cell, or the spike protein of SARS-CoV-2 and ACE2 of nasal ciliated cells.

The parent specifically asked about targeting a group of people - designing an envelope protein (or proteins) targeting a specific HLA haplotype would probably get you halfway there (this is not advice).


Yeah I figured as much. Hence the "[t]his sort of tech" -- I imagine progress would be made soon on those fronts as well? Or am I mistaken?

This sort of tech is like inventing a new type of screwdriver and asking how it will affect car production.

Not helpful.

I saw a recent video about the errors of AlphaFold 3.

https://www.youtube.com/watch?v=E61wJXlENoE


That video is saying that AlphaFold isn't good at predicting how proteins bind to each other, but is actually pretty good at predicting the structure of the individual proteins. Which is exactly what you would expect, because AlphaFold wasn't designed to predict antibody/antigen bindings. That's what AlphaProteo is now trying to solve. Also, as the video points out at the end, AlphaFold is well aware that it's unable to do this task and is communicating that fact via its ultra low confidences in the accuracy of the positioning of the proteins relative to each other. So I am not really sure what this video is trying to prove.

This is interesting work but I think something has been intentionally overlooked. Creating proteins is difficult and it's also unclear how many of these sequences folded into the predicted 3d structure. Small molecule synthesis is still easier, cheaper, and more scalable than protein therapeutics. I think this would've been more impactful had they focused on improving on the SOTA small molecule - protein interaction models.

> it's also unclear how many of these sequences folded into the predicted 3d structure

The whitepaper depicts some successful cases, determined by X-ray crystallography or cryo-EM.


> Trained on vast amounts of protein data from the Protein Data Bank (PDB) and more than 100 million predicted structures from AlphaFold, AlphaProteo has learned the myriad ways molecules bind to each other. Given the structure of a target molecule and a set of preferred binding locations on that molecule, AlphaProteo generates a candidate protein that binds to the target at those locations.

I wonder how many prions will be accidentally created by this, or if it can even predict if a particular protein will have prion-like effects

It would be essentially impossible to create a new prion disease by accident: generating random-ish new things with methods like this would pale in comparison to the massive number of weird random-ish things natural biology is already creating in the wild.

However, this category of technologies could potentially be used to develop new prion diseases on purpose. As well as to develop cures for prion diseases that disrupt the misfolding.


>As well as to develop cures for prion diseases that disrupt the misfolding.

That seems quite plausible actually. You'd need something that can target misfolded PrP and bind it up so it can't do anything and then hopefully your targeting protein leaves normal PrP alone. A bit like an antibody.


The problem, from what I understand as a dabbler in protein research, is that PrP binds into these large, very very stable semi-crystalline fibers (I visualize them looking like thick extruded complicated pasta shapes, where the 2D cross-section is kinda the shape of the outline of a single PrP). It makes it really hard to learn about the structure, actually, because x-ray crystallography requires repeated crystalline structures, and these are more like 3D polymer threads that bunch up and make things hard to image (though there are some more modern imaging techniques that are making headway). It turns out that these are very very stable configurations, unfortunately, and have very few ways to attach anything, and that's the precise problem with building binders. Plus, even worse, it turns out PrP might even be biologically necessary for mammals, and we don't usually want to get rid of it wholesale [https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-01...]

The context here is that prions are misfolded proteins that replicate by causing other proteins to change their configuration into the misfolded form of the prion. Diseases caused by prions include Mad Cow disease, Creutzfeldt-Jakob disease, and Chronic Wasting disease. All prion diseases are incurable and 100% fatal.

If the protein is novel, it does not matter, because it has no normal variants in nature.

Couldn’t the underlying tech be applied to non-novel proteins by a bad actor?

Someone could fine-tune a model on pairs of existing proteins and their misfolded prions and then ask the system to come up with new prions for other proteins. ChatGPT found these 4 companies that will produce proteins for you just based on digital DNA that you send them:

- Genewiz (Azenta Life Sciences)

- Thermo Fisher Scientific (GeneArt)

- Tierra Biosciences

- NovoPro Labs


Whelp, time to move to a small island in the middle of the Pacific.

One of the few cases where Mars actually is a decent planet B.

Oh this would be 100x worse than the covid lab leak

One question is how specific the binding is -- what's the level of off-target effects, etc.

I am sorry for my naivety, but what are the practical benefits of this?

It varies, but as the article says it can be used for things like drug discovery. Imagine there's a new virus running rampant. It works by using a very specific protein to latch onto cells so it can pull itself in. You would like to develop a drug to stop it doing that, and one way to do that is to find a protein that wants to strongly latch on to an important part of the virus. If it's holding onto the virus, the virus probably won't be able to penetrate cells because your engineered protein will get in the way. This is part of how antibodies work to stop viral infections naturally.

I only comment on hacker news posts about biology because I'm a voice crying in the wilderness about the most important goddamn startup on Earth, I think, maybe. www.molecularReality.com

This could go wrong, in many directions

what kind of model architecture was used for this? is it safe to assume they used a transformer model or a variant of it?

any resources to self learn biotech and how to use ML in biotech ?

yeah yeah whatever another protein discovered oh wow... When are we going to see actual results? Hurry up Deepmind!

(not to be confused with AlphaProto, which helps with Google's core business of turning protocol buffers into differenter protocol buffers.)

It generates novel candidates; it doesn’t actually generate proteins, and none of these proteins have actually been generated to validate whether these candidates are shit or not

Did you read it?

Safety is the new gatekeeping.

new?

This is the equivalent of ChatGPT generating novel code that we didn’t run. It probably works though.

Terrible take. The article details independent lab verification with researchers listed by name and a quote from them.

Making a comment without reading the article? Who would do such a thing?

It's extremely refreshing that DeepMind is still working on using AI to solve hard problems instead of attempting to put creatives out of work.

I wonder if the backlash they received from inventing transformers and then allowing OpenAI to eat their lunch has changed their attitude towards how they'll commercialize future inventions.


