
Protein Linguistics - superfx
http://moalquraishi.wordpress.com/2018/02/15/protein-linguistics/
======
cs702
> _The basic idea is as follows: there is evidence that today's proteins
> emerged out of an ancient peptidic soup, one that may have left its mark on
> the evolutionary record. I.e., the proteins we see today may in some sense
> be formed out of primordial peptides. As proteins grew in size and
> complexity, it would have been advantageous to reuse existing components, to
> build bigger proteins from existing protein parts. We already know this is
> true on the level of protein domains, in that larger proteins are often
> comprised from chaining together smaller globular domains. But the
> phenomenon of reuse may go further, where even smaller protein fragments
> (handful of residues to dozens) may reflect an underlying evolutionary
> pressure to reuse working parts, fragments that fold in tried-and-tested
> ways (from the perspective of evolution.) If this is the case, then the
> space of naturally occurring proteins may occupy a very special "manifold",
> one that exhibits a hierarchical organization spanning small fragments to
> entire domains. Other evolutionary pressures could further drive the reuse
> phenomenon. For example, once a protein-protein or protein-DNA interface is
> established, presumably through some sort of structural motif, reusing that
> motif would present an efficient way for the cell to rewire its cellular
> circuitry. The end result of all this would be the emergence of something
> resembling a linguistic structure, a grammar that defines the reusable parts
> and how these parts can be combined to form larger assemblies. Given that
> this is biology, it’s unlikely to be rigid or minimal. It would be messy and
> hacky, with many exceptions and ad hoc evolutionary optimizations. But the
> manifold would be there, potentially discoverable and learnable._

Instead of characters -> 'byte-pair-encoding'-like sequences -> words ->
sentences, think primordial peptides -> simple protein parts -> more
complicated protein components -> proteins. If this "protein linguistic
hypothesis" is correct, I see no reason why the manifold wouldn't be
discoverable and learnable with modern SGD techniques.
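
To make the byte-pair-encoding analogy concrete, here is a minimal sketch of
greedy BPE-style merging applied to amino-acid strings. It only illustrates
the "characters -> reusable fragments" step, not anything from the article:
the names `learn_merges` and `merge_pair` and the toy sequences are made up
for this example, and a real attempt would run on an actual sequence database
with a learned model on top.

```python
from collections import Counter


def merge_pair(tokens, left, right):
    """Replace every adjacent (left, right) pair in tokens with one fused token."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_merges(sequences, num_merges):
    """Greedy BPE: repeatedly fuse the most frequent adjacent token pair.

    Returns the ordered list of merges and the re-tokenized corpus.
    Ties are broken deterministically by comparing the pairs themselves.
    """
    # Start from single residues (one-letter amino-acid codes).
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if count < 2:
            # Nothing is reused more than once, so stop merging.
            break
        merges.append((left, right))
        corpus = [merge_pair(tokens, left, right) for tokens in corpus]
    return merges, corpus


# Toy, hypothetical sequences; not real proteins.
toy_sequences = ["ACDACD", "ACDE"]
toy_merges, toy_corpus = learn_merges(toy_sequences, num_merges=5)
print(toy_merges)
print(toy_corpus)
```

The learned merges play the role of "simple protein parts": fragments that
recur often enough to be treated as single units. Whether such frequency-based
fragments correspond to structurally meaningful pieces is exactly the open
question the hypothesis raises.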

~~~
zbyte64
> So are RGNs a panacea? Not at all. This is very much a 1.0 release. They are
> raw and unpolished. Training them can be quite challenging, like I already
> mentioned. They do comparatively well on novel protein topologies, but
> that’s because everyone else does so poorly. They do silly things like
> predict pretty awful secondary structure, and their predictions can have
> steric clashes and the like.

If we accept his comparison to other results, it seems RGNs are unreasonably
effective on novel topologies...

------
tritium
I'm sure there's a predictable set of interactions, with a minimum, finite set
of required loops to support cellular life as we know it. Above the minimum
set of operations and repeatable cycles, there are almost certainly specialty
routines, and perhaps no fixed limits on diversity of optional interactions,
at the cellular/chemical level.

But for sure, there is also a boundary layer, for interactions _between_
cells. This would have to represent an almost entirely different set of
chemical interaction rules for signaling, with its own constraints, minimum
requirements, and optional expressions.

So it's useful to conceptualize things this way, but problems solved within
the context of intracellular operations will only offer clues about tissue
organization. Indeed, tissue requirements may drive the optional
intracellular interactions more often than the reverse. In cases where
intracellular interactions drive extracellular organization, it's essentially
leaky abstractions dictating the details of higher-level implementation.

------
gilleain
> the proteins we see today may in some sense be formed out of primordial
> peptides.

This seems reasonable, but another possibility is that modern proteins
(domains) are 'carved' from larger proteins that had a looser structure.

In other words, primordial proteins could have been badly folded and mutations
gradually improved them to smaller, better folded structures.

