A review on protein language models (apoorva-srinivasan.com)
138 points by apoorva26 33 days ago | 27 comments



I'm excited to see where this, and Alphafold go.

I'm also interested in a different direction for modeling proteins: ab-initio. I am curious if we can get a good-enough simulation of charge density around atoms to simulate folding using electric force models. It seems that chemists are using very computationally intensive models (HF, Kohn-Sham DFT, etc.) where the wave function (and therefore charge density) is modeled as a product etc. This does not scale well as electron count goes up. And Valence Bond and Molecular Orbital theories of bonding both seem like not-great approximations.

I suspect we can come up with reasonable models of charge density for atoms and simple molecules without this complexity, by solving the Schrödinger equation: the relation between psi'', psi, V, and E. And then perhaps interpolating various solved solutions, incorporating ambient water molecules etc. I can, unfortunately, find very little about this approach, but am making progress on a program that will probably go nowhere, but I still feel is worth building. Incorporating spin is proving tricky. It is possible we need to model using spinors, but I suspect there may be shortcuts where we accept Fermi-Dirac statistics as an axiom, without using exchange terms.
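For a single electron in one dimension, the relation between psi'', psi, V, and E can be sketched with a shooting method. This is only an illustrative toy, not the program described in this comment. It assumes atomic units (hbar = m = 1), a harmonic potential chosen because its exact energies are known, and hypothetical function names `shoot` and `find_energy`.

```python
def shoot(energy, potential, x_min=-6.0, x_max=6.0, n_steps=2000):
    """Integrate psi'' = 2 * (V(x) - E) * psi from x_min; return psi(x_max).

    Assumes atomic units, so the equation is -1/2 psi'' + V psi = E psi.
    """
    dx = (x_max - x_min) / n_steps
    # Boundary condition psi(x_min) = 0, plus a small arbitrary first step.
    psi_prev, psi = 0.0, 1e-6
    for i in range(1, n_steps):
        x = x_min + i * dx
        # Central-difference form of psi'' = 2 (V - E) psi.
        psi_next = 2.0 * psi - psi_prev + dx * dx * 2.0 * (potential(x) - energy) * psi
        psi_prev, psi = psi, psi_next
    return psi


def find_energy(potential, e_low, e_high, tol=1e-8):
    """Bisect on E until psi(x_max) changes sign, i.e. psi stays bounded.

    Assumes the bracket [e_low, e_high] contains exactly one eigenvalue.
    """
    f_low = shoot(e_low, potential)
    while e_high - e_low > tol:
        e_mid = 0.5 * (e_low + e_high)
        f_mid = shoot(e_mid, potential)
        if (f_mid > 0) == (f_low > 0):
            e_low, f_low = e_mid, f_mid
        else:
            e_high = e_mid
    return 0.5 * (e_low + e_high)


def harmonic(x):
    # Hypothetical test potential V(x) = x^2 / 2; exact energies are n + 1/2.
    return 0.5 * x * x


ground = find_energy(harmonic, 0.0, 1.0)
first_excited = find_energy(harmonic, 1.0, 2.0)
print(ground, first_excited)
```

The two printed energies should land close to 0.5 and 1.5. The same sign-change test is what makes the equation a verifier: a trial E either keeps psi bounded at the far boundary or it does not.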

Of course, modelling the electron-electron interaction is tough because you need to integrate over 3D space, and there is a feedback loop between psi, charge, V, which then feeds back into psi for the other electrons, and vice versa!

This is fringe, and probably intractable, but... seems worth trying. One of the key challenges is the big challenge with modelling in general: We have differential equations that can verify a solution, but not come up with the solution... This, at least, gives a validation for work on this topic.

Even more fringe, but relevant: I wonder if the rules of quantum mechanics are intrinsically tied to nature's fondness for differential equations, and are key to how nature "solves" them.


It's still unclear to me that using QM to simulate protein folding or enzymatic activity is a worthwhile endeavor. Even highly approximate QM methods don't seem significantly better than classical force fields for recapitulating folding dynamics, and the actual amount of computational effort required would be astronomical. I would recommend against it simply because we know of better, more economical methods to get at the solutions we need.


QM gives a lot of important solutions that are real that simpler models miss. But nature is of course unrestricted.


The interatomic potential is a promising scale, and there have been some interesting recent developments using equivariant graph neural networks: https://www.nature.com/articles/s41467-022-29939-5 https://arxiv.org/pdf/2206.07697


I don't think DFTs will get better anytime soon. But here's something wild: Maybe it doesn't matter.

Run a DFT simulation of a protein with known structure melting. Time-reverse it. Train some sort of 3D convnet on the deltas at every point in the (wrong) melting curve.

Who cares if the DFT is wrong! The ML model will learn the rules of this fictional universe that uses the wrong rules to get the right thing.


I’m happily impressed that non-experts find this area intriguing. I hope more people start to work at the interface of biology, chemistry, and AI.


Dan Gusfield wrote a book in 1997 entitled "Algorithms on strings, trees and sequences: Computer Science and Computational Biology". We (what was then the Center for Advanced Study of Language at the University of Maryland) found some of the techniques described there to be useful in (natural) language analysis, particularly morphology and for comparing words in related languages.
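As an illustration of the kind of string technique that carries over from sequence comparison to word comparison, here is a minimal edit-distance sketch. Treating edit distance as representative is an assumption: the comment does not say which techniques were used. The function name `edit_distance` and the example word pairs are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical cognate pairs from related languages.
for left, right in [("nacht", "night"), ("water", "wasser")]:
    print(left, right, edit_distance(left, right))
```

The same dynamic-programming table underlies basic sequence alignment, which is why it works equally well on protein strings and on words.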


The intersection between biology and computer science is by far my favorite topic. I wish I had gone into bioinformatics after my CS master's degree. Both sciences really get the best out of the other.


Most bioinformaticians are biologists.


It depends on what type (and also whether you distinguish computational biology from bioinformatics). The people who create new algorithms for sequence assembly, protein folding, etc. tend to be computer scientists who got into biology. On the other hand, the people who analyze biological data computationally tend to be biologists who got into computing.


They are arguably computer scientists who do “bioinformatics” by building tools. Different skill set. Also not a large number of them.

Most bioinformatics jobs need domain specific bio knowledge.

Things will change even more as DNNs take over.


I appreciated this article. These analogies always fall apart at some point (and the article points this out) because these are complex processes, but I appreciate these attempts to turn biology into a human problem, because at the very least, it'll have the effect of getting more people up the learning curve.


It bums me out that the way to get rich via AI is to throw an OpenAI API key at a SaaS app, not solving fundamental hard science problems.


Honestly, I wish I had learned more biology in college instead of cruising through a CS degree for a generic software engineering career. Now that I’ve gained more interest in bio, the doors have closed.


> Unlike proteins, most human languages include uniform punctuation and stop words, with clearly separable structures.

Latin has entered the chat.


Greek plays (as far as I understand) also used minimal punctuation, if any. Even the points where the speaker changes are not always clear.


Colloquium Latinum iungere


(Submission title has a typo that article has not: languaUge models)


Interesting, a (u)racil mutation. We'll see how the repair system reacts. And if not, who knows what will come of it.


I’ve applied a Cas9 protein to the title to snip out the U. Thanks.


I am gobsmacked by how little programmers know about the underpinnings of actual human language. This guy sounds like a SoCal 1960s burnout going on about being one with the language of the forest.


To be fair there is a disclaimer about the limits of metaphor, plus the article is really about protein folding, not natural language. Still, I'm curious as to which "underpinnings of actual human language" you think he's misunderstanding or misrepresenting?


Unrelated to the conversation, but Apoorva is a female name. So it's probably she. :)


Actually, while it's generally female, it's not always! It depends on the region of the Indian subcontinent / ethnic group, as the final "-a" in many Sanskrit roots may or may not be dropped depending on the modern language of derivation. For example, I know several Telugu, Marathi, and Bengali males named Apoorva/Apurva.

That being said, it does appear that this instance of Apoorva is indeed a female name (and in general, Tamil names like this one don't keep the aforementioned "-a" in the masculine).


I am, in fact, a 'she' :)


[flagged]


The irony in this exchange is too good!


Providing the first example yourself here



