In my humble opinion, this work is not that innovative: de novo protein binders have been done to death, either by AI approaches or otherwise. Check out the work by David Baker’s group, for instance. They have a myriad of examples already.
That being said, as others have commented, my hope is that all these advancements finally lead to reliable design methods for novel biocatalysts, an area that has been stalling for decades compared to protein folds and binders.
Agreed on the hopes that these methods lead to novel biocatalysts (but they aren’t quite there yet).
David Baker’s lab has recently published on using their own diffusion model (RFdiffusion) to design novel biocatalysts that perform hydrolysis using a catalytic triad of serine, aspartic acid, and histidine, as well as an oxyanion hole, which is much more complex than the binders designed by AlphaProteo [1].
It gives me hope that we’ll soon be able to design biocatalysts as good as natural ones, but for any problem we care about.
Maybe this is in the supplement of the whitepaper [0], but I would have loved to see more analysis of how novel the designed proteins really are.
In the whitepaper they mention that they are novel compared to other in silico design techniques, but to my knowledge other binders to VEGF and the Covid spike protein exist and would already be in the PDB, which DeepMind trained the model on.
This is not to minimize the results: if the history of ML is anything to go by, even if AlphaProteo does not currently beat the best affinity found by in vitro screens, I do not doubt that it soon will!
Might depend on what your measure of 'novelty' is in protein structure. A single residue change (for example) would not normally be considered a novel structure - it's just a mutation.
However, a new fold - that is, the shape that the backbone folds into - would be novel. Potentially also novel would be 'chimeric' structures with parts from other structures, as with chimeric domain swaps.
There was a structure designed by the Baker lab called 'Top7' (https://pubmed.ncbi.nlm.nih.gov/14631033/) that I remember as groundbreaking at the time :) (in the ancient days of 2003, it seems ...)
Exactly. If the proteins suggested in this paper are very similar to known good binders in PDB then I am much less impressed by the results. You could argue they are generating a structure from the training set.
I want more info about how novel these proteins are.
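One crude way to put a number on "how novel" is percent sequence identity against known binders. A minimal sketch, assuming the two sequences have already been aligned to the same length (with '-' marking gaps); the function name and the example sequences are made up for illustration and do not come from the whitepaper. Sequence identity also says nothing about fold novelty, which would need a structural comparison.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Only compare alignment columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical sequences: a "design" that differs from a "known binder"
# by a single residue, i.e. a point mutation rather than a novel protein.
design = "MKTAYIAKQR"
known = "MKTAYIAKQL"
print(f"{percent_identity(design, known):.1f}% identical")
```

A design that scores near 100% against something already in the PDB is closer to the "just a mutation" case than to a new fold.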
They must be somewhat novel in that the wet lab work verified binding up to 10x stronger, as predicted. I agree it would be interesting to see how they compare to known binding proteins.
We've been able to design tight binders for quite some time now. The issue with synthetic designs is that they tend to bind a little too tightly. You want to have a reasonable off-rate, and the ligand protein should do more than just bind: it needs to effect some sort of response from the bound protein.
When you look at these synthetics they often maximize for interactions of hydrophobic areas on the surface.
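For reference, the link between affinity and off-rate for simple 1:1 binding is Kd = k_off / k_on, and the mean lifetime of the complex is 1 / k_off. A minimal sketch with made-up rate constants (hypothetical numbers, not measurements of any real binder) showing how a very low off-rate gives a tight Kd but also a complex that barely lets go:

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Kd in M, from k_on (1/(M*s)) and k_off (1/s), assuming simple 1:1 binding."""
    return k_off / k_on


def residence_time(k_off: float) -> float:
    """Mean lifetime of the bound complex in seconds (1 / k_off)."""
    return 1.0 / k_off


# Hypothetical rate constants, chosen only for illustration.
k_on = 1e6  # 1/(M*s), same association rate for both binders
for label, k_off in [("moderate binder", 1e-2), ("very tight binder", 1e-5)]:
    kd = dissociation_constant(k_on, k_off)
    tau = residence_time(k_off)
    print(f"{label}: Kd = {kd:.1e} M, residence time = {tau:.0f} s")
```

With the same on-rate, the only way to push Kd lower is to slow the off-rate, which is why maximizing affinity alone can overshoot what you want biologically.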
I studied molecular biology and I couldn't contain my excitement when it was able to bind to another protein. I don't think HN realizes how huge this is. With this level of accuracy, not only can we understand the full mysteries of ourselves but literally any biological entity.
With that level of understanding, it's easy to fabricate special medicines that target specific biochem pathways, but more exciting is that we can literally "code in 3d world". We'll be able to print and grow organs en masse. We'll be able to design structures that will bind to target proteins responsible for certain traits. The potential boon to human medicine will be enormous.
I got like goosebumps after watching that video because I understood the implications of being able to predict folds and now generate proteins that will bind to any protein we choose!!!!
We just might have discovered a panacea of sorts and Demis and his team should receive the Nobel Prize.
I'm just ecstatic that we'll see so much drastic improvement in human medicine and, importantly, how accessible it will be with this new discovery.
I'm a PhD candidate doing my thesis work on stem cell models and tissue engineering for organ transplant...I think this technology is certainly a large leap forward but I think you are a little overzealous with this claim.
If you haven't heard the phrase "utility-scale molecular sensing" or given it any thought, please prepare to update every opinion you might have about bioterrorism, and ask Keltar how the MR1 is our greatest plausible non-authoritarian line of defense. www.molecularreality.com
Keltar is the little green guy in the upper left.
I don't see it as much different from what we have today.
Nowadays I am pretty sure that a bad actor with enough money can get their hands on strains of really bad stuff, since many labs created those to do research. With enough $$$ I bet you can get your hands on them and then release them.
Perhaps the only difference I see is that it could give you the possibility (at least in the beginning) to somewhat target the spread. However, given enough mutations it is very likely you go back to an uncontrolled pandemic, and so there is no difference from what someone could achieve today.
Additionally, IMHO, getting the blueprint for how to build your protein and then making it so that an existing virus or bacterium can produce it, carry it, ... sounds harder and costlier than bribing someone to get something out of said labs.
Oh and don't get an account it's broken right now, I'm building the site and hardly anyone has been there yet and the Auth0 is fucked up. But just give your email to Keltar if you wanna followup.
it would be like committing terrorism using silicon wafers
you would have to infiltrate an extremely guarded facility
you would somehow have to bypass QA
it's not like somebody on the assembly line for a new protein drug sprinkles a dose of PCP
one potential dual use could be somebody modifying a fruit that is popular with birds and then dropping seeds at the local organic farm fair
and then when those seeds are consumed by birds they produce poop dangerous for other animals to consume
you could absolutely screw around with the ecosystem, like whoever has access to this programmable "bio-wafer" will be able to play god totally undetected.
the problem is that "bio-wafer" manufacturing process will be very tough and regulated like the CNC machines used to manufacture jet engines depriving certain countries from being able to churn out their own jet engines
> With that level of understanding, its easy to fabricate special medicines that target specific biochem pathways
The problem is to find the right target or pathway in the first place. Just go to opentargets.org. There are lots of potential targets by different metrics, but for many diseases we haven't identified that single target that lets us improve the life of, say, 20% of patients for disease X.
Interesting work, but there's a huge sector they're missing - industrial enzyme and catalysis design. Most of this field is concerned with small molecule binding - methane, carbon dioxide, ammonia, methanol, acetic acid, etc. Binding is often just the first step, as you're typically trying to do highly specific chemistry, e.g. attaching a single oxygen to methane or a single hydrogen to carbon dioxide, etc.
Working in this area might also be a good test of their technological approach, as small-molecule binding can be somewhat challenging, and even evolved biological systems can struggle to achieve high specificity.
I want to mention an interesting industrial enzyme project. If you ever saw the laundry detergent commercial "Protein gets out protein", this is referring to an industrial enzyme in laundry detergent. Many years ago, Genentech had built up a significant capability in proteases, which are proteins that cut other proteins into pieces. In the course of optimizing proteases, they made a thermostable, thermoactive protease. Although it wasn't super useful for Genentech in a drug discovery context, it was recognized that you could put an inactive enzyme into laundry detergent that would be activated when the hot laundry water hit the detergent, and the resulting protease would be good at cleaning stains (many stains are composed of protein- blood, food, etc).
Genentech set up a subsidiary with Corning (the glass company) that owns the IP for this protease and then licensed it to laundry detergent manufacturers; many billions of dollars in revenue. I think this is one of the original patents: https://patentimages.storage.googleapis.com/d9/ca/6f/2fb89ff...
My guess is that this area is much harder to break into: enzymes facilitate challenging chemical transformations by stabilizing high-energy transition states in chemical reactions. These states are usually highly transient and therefore much harder to capture using the structural biology techniques that generate the structural data that AlphaFold and similar methods are trained on. Even though there are many structures of enzymes in the absence of their substrate, I would imagine that the small number of structures for states that represent actual catalytic intermediates would make it difficult for a model to internalize the features that distinguish a good enzyme/catalyst from a bad one.
Another consideration is that most protein structure prediction methods only generate the backbone, and the sidechains are modeled in afterwards. Enzyme efficiency requires sub-Å-level structural precision in the sidechains that are actually doing the chemistry involved in catalysis, so it could also be the case that the current backbone-centric methods aren't good enough to predict these fine-tuned interactions.
IIUC most of the commercialization is done through Isomorphic (https://www.isomorphiclabs.com/). My guess is that Google Research/DM itself wants to stay at the front of the field rather than develop drugs (of which protein design is really just a tiny contribution).
When I worked at Google I made a case for doing protein design/preliminary drug discovery using Google infrastructure and it was well received by the leadership. The leadership at Google is mostly computer scientists who know about, but can't actually do, leading-edge life sciences research, and they want to contribute some amount of Google's resources to advancing the state of the art. That's the only reason Exacycle was permitted: Urs thought we could maybe help save the world with protein design (and it wasn't a good approach because it wasted enormous amounts of power on unbiased sampling of large proteins).
Honestly I don't think Google proper is really a good place for this work to be applied, though. Their attention is easily diverted, they repeatedly fail to commercialize, and most importantly, potential partners are scared Google will steal their data, and replace their business.
I wonder why things seem to work well with Waymo? Google was never in the auto industry, but they were able to create a subsidiary that has become a leader in automated driving systems.
Yeah, Waymo is a bit of an outlier and I think it's got to be a directive from Larry to spend some amount of money/engineering effort to move it forward, with the expectation that it will transform the world into a better place (rather than generate a lot of revenue for Google).
So DeepMind is doing more of the fundamental stuff and Isomorphic the commercialization? I thought Isomorphic would have been carrying the protein projects now, since Google Brain/DeepMind, now Google DeepMind, is focusing on catching up with OpenAI and stuff and less on fundamental research.
I know that you can actually use AlphaFold at least. My wife, a microbiologist, told me she's used it a couple of times at work. I don't know what their monetization model is, or if her lab had to get a license or anything. But I know scientists are using it.
I play Go recreationally. I don't think I can use AlphaGo (or its successors) directly, but the published research on AlphaGo has inspired other strong Go AIs. Online Go platforms integrate them to offer AI matches as well as analyze games between humans. I also know that professionally ranked players are adopting things learned from AI into their own play, and a lot of traditional joseki (analogous to chess openings) are being rethought based on insights from AI play.
Alphabet has a medical division - https://en.wikipedia.org/wiki/Verily . My somewhat cynical take is that most extremely wealthy individuals would like to live longer.
But more immediately this is an interesting and relevant problem to solve, so it serves as a tool to benchmark and improve AI... and the current theory is that at some point AI will {generate an unlimited amount of wealth | lead humanity into the post-scarcity society | solve all human problems by eliminating humans}.
Cynical take - mostly to appear like a diverse tech company with lots of different products and services so they don't get regulated for their strength in the search advertising market.
This is being done by DeepMind, and DeepMind's main founder has always said he wants to use AI to advance basic science. I think that is maybe why they fell behind on chatbots for a bit.
I have a question that hopefully a molecular biologist can answer. Can tools like this potentially create protein structures that specifically bind in certain cells? Or is this more about a way of being able to create proteins for genes / structures we haven't been able to before?
I'm very interested in my research at the moment in pleiotropy, namely mapping pleiotropic effects in as many *omics/QTL measurements and complex traits as possible. This is really helpful for determining which genes / proteins to focus on for drug development.
The problem with drugs is in fact pleiotropy! A single protein can do quite a lot of things in your body, either through a causal downstream mechanism (vertical pleiotropy) or through seemingly independent processes (horizontal pleiotropy). This rules out a lot of possible drug targets, as the side effects / detrimental effects may be too large.
So, if these tools can create ultra specific protein structures that somehow only bind in the areas of interest, then that would be a truly massive breakthrough.
For anyone who would like to know more about designing proteins with a certain function, target, or structure in mind, the term to search for is "rational design."
As an aside, learning the precise terms for concepts in fields in which I'm a layperson (or simply have some cobwebs to shake loose)--and then exploring those terms more--is something that I've found LLMs extraordinarily useful for.
This research is focused on modeling individual protein binding sites. Pleiotropic effects and off-target side effects are caused by interactions beyond the individual binding sites. So I don't think this tool by itself will be able to design a protein that acts in the way you describe (and that's putting aside the delivery concerns - how do you get the protein to the right compartment inside the cell?).
But novel binding domain design could be combined with other tools to achieve this effect. You could imagine engineering a lipid nanoparticle coated in antibodies specific to cell types that express particular surface proteins. So you might use this tool to design both the antibody binding domain on the vector and also the protein encoded by the payload mRNA. Not all cell types can be reached and addressed this way, but many can.
Yes, in principle, but there are huge limitations and challenges to using a protein as a drug in living organisms. It has to be injected to avoid digestion, and a protein can't just pass into a cell, it needs to get in somehow. Current peptide drugs like insulin are identical to, or closely mimic, natural small peptide hormones that bind to receptors on the outside of a cell. However, there is a possibility of using gene therapy to directly express a novel protein drug inside of the cell. A novel protein is also likely to trigger an immune response, so that type of gene therapy is mostly useful when that is actually desired, e.g. as a vaccine.
While they can generate proteins that bind to specific structures with high accuracy, achieving true cell-specificity and avoiding unwanted pleiotropic effects involves many more variables beyond just protein-protein interactions. These tools are more about expanding our ability to target previously "undruggable" proteins rather than solving the cell-specificity problem outright. However, they could be valuable components in developing more targeted therapies when combined with comprehensive research on pleiotropic effects across multiple omics levels. The real breakthrough will come from integrating these protein design capabilities with a deeper understanding of complex biological systems and developing strategies for precise delivery and regulation of these novel proteins in vivo.
Not an expert, but you could imagine a protein with two receptors that are required for activation. One of them binds to a protein that is only present in the cells of interest, and the other one binds to the actual target.
So I write a completely defensible rant full of truly interesting and well-informed perspective, get downvoted, and then get accused of being an LLM.
A perspective I'm sure the down-voters have zero cred to doubt. Style issues aside, I don't think there's a serious molecular biologist on the planet who would take issue with the actual gist of what I said.
Pleiotropy: a thing happening can cause more than one other thing to happen. We really need jargon to keep that in mind?
An LLM? This is what I get for writing with passion? Creatively? Daring to play with words? I'm an LLM? For writing anything that doesn't fit your norms? Wow.
How do I get MORE downvotes? They seem like badges of honor in this case.
Question for bio folks here, and not to steal from the joy of this article, but I've recently been curious how far we are from engineering something like a virus that targets a subset of the population (e.g. via specific genetic markers). This sort of tech being commoditized feels much, much scarier than the LLM safety talk, by a mile.
I'm going to play the devil's advocate and disagree.
The viral life-cycle comprises attachment/entry, replication and maturation/release. These stages are generally well understood to the point where 'disarmed' (replication-incompetent) viruses are routinely used as a delivery vehicle in molecular biology.
The first part, attachment/entry is directly related to protein-protein interactions (between the envelope protein of the virus and the entry receptor of the host cell). This particular interaction determines the tropism of the virus, that is, its capability to infect a particular type of cell. Examples include the interaction of gp120 protein of the HIV virus and CD4 of a helper T cell, or the spike protein of SARS-CoV-2 and ACE2 of nasal ciliated cells.
The parent specifically asked about targeting a group of people - designing an envelope protein (or proteins) targeting a specific HLA haplotype would probably get you halfway there (this is not advice).
That video is saying that AlphaFold isn't good at predicting how proteins bind to each other, but is actually pretty good at predicting the structure of the individual proteins. Which is exactly what you would expect, because AlphaFold wasn't designed to predict antibody/antigen bindings. That's what AlphaProteo is now trying to solve. Also, as the video points out at the end, AlphaFold is well aware that it's unable to do this task and is communicating that fact via its ultra low confidences in the accuracy of the positioning of the proteins relative to each other. So I am not really sure what this video is trying to prove.
This is interesting work but I think something has been intentionally overlooked. Creating proteins is difficult and it's also unclear how many of these sequences folded into the predicted 3d structure. Small molecule synthesis is still easier, cheaper, and more scalable than protein therapeutics. I think this would've been more impactful had they focused on improving on the SOTA small molecule - protein interaction models.
> Trained on vast amounts of protein data from the Protein Data Bank (PDB) and more than 100 million predicted structures from AlphaFold, AlphaProteo has learned the myriad ways molecules bind to each other. Given the structure of a target molecule and a set of preferred binding locations on that molecule, AlphaProteo generates a candidate protein that binds to the target at those locations.
It would be essentially impossible to create a new prion disease by accident- generating random-ish new things with methods like this would pale in comparison to the massive number of weird random-ish things natural biology is already creating in the wild.
However, this category of technologies could potentially be used to develop new prion diseases on purpose. As well as to develop cures for prion diseases that disrupt the misfolding.
>As well as to develop cures for prion diseases that disrupt the misfolding.
That seems quite plausible actually. You'd need something that can target misfolded PrP and bind it up so it can't do anything and then hopefully your targeting protein leaves normal PrP alone. A bit like an antibody.
The problem, from what I understand as a dabbler in protein research, is that PrP binds into these large, very very stable semi-crystalline fibers (I visualize them looking like thick extruded complicated pasta shapes, where the 2d cross-section is kinda the shape of the outline of a single PrP). It makes it really hard to learn about the structure, actually, because x-ray crystallography requires repeated crystalline structures, and these are more like 3d polymer threads that bunch up and make things hard to image (though there are some more modern imaging techniques that are making headway). It turns out that these are very very stable configurations, unfortunately, and have very few ways to attach anything, and that's the precise problem with building binders. Plus, even worse, it turns out PrP might even be biologically necessary for mammals and we don't usually want to get rid of it wholesale [https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-01...]
The context here is that prions are misfolded proteins that replicate by causing other proteins to change their configuration into the misfolded form of the prion. Diseases caused by prions include Mad Cow disease, Creutzfeldt-Jakob disease, and Chronic Wasting disease. All prion diseases are incurable and 100% fatal.
Someone could fine-tune a model on pairs of existing proteins and their misfolded prions and then ask the system to come up with new prions for other proteins.
ChatGPT found these 4 companies that will produce proteins for you just based on digital DNA that you send them:
It varies, but as the article says it can be used for things like drug discovery. Imagine there's a new virus running rampant. It works by using a very specific protein to latch onto a cell so it can pull itself in. You would like to develop a drug to stop it doing that, and one way to do that is to find a protein that wants to strongly latch on to an important part of the virus. If it's holding onto the virus, the virus probably won't be able to penetrate cells because your engineered protein will get in the way. This is part of how antibodies work to stop viral infections naturally.
I only comment on hacker news posts about biology because I'm a voice crying in the wilderness about the most important goddamn startup on Earth, I think, maybe. www.molecularReality.com
It generates novel candidates; it doesn’t actually generate proteins, and none of these proteins have actually been generated to validate whether these candidates are shit or not.
I wonder if the backlash they received from inventing transformers and then allowing OpenAI to eat their lunch has changed their attitude towards how they'll commercialize future inventions.