
Are machine-learning packages like genomes? - Ecalpal
Biologist here.<p>I'm trying to learn a bit about machine learning just to keep my general knowledge fresh.<p>Reading about machine-learning model/package development, I've realised that it must be easy to allow such models to degenerate: to patch and glue and create hidden dependencies that can be hard to reverse-engineer.<p>This reminds me very much of a genome, where functionality has been added over billions of years using whatever inputs were available at the time and producing something good enough at the time.<p>I'm not sure how relevant such analogies are to ML, but it feels like this must be the natural way of things: the code wants to degenerate (path of least resistance), but for the model to be clear--and generalisable--this must be resisted.<p>Do you feel this is fair/accurate?<p>Again, I'm a biologist, not a technical expert. I just found this similarity intriguing and it would be interesting to hear your thoughts on the challenges/opportunities of allowing code to be more like code (and less like a genome, since genomes are notoriously hard to understand or reverse-engineer).
======
madhadron
When applied to the models themselves, it's not a particularly helpful analogy.
Machine learning models are largely curve fitting in high-dimensional spaces.
Overfitting, overspecialization, and the like are problems, and you could
relate them to ecological notions and selection, but it's not terribly helpful
in practice.
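
To make "curve fitting" and "overfitting" concrete, here is a minimal sketch in
plain Python. Everything in it is an assumption made up for illustration: the
data (a straight line plus Gaussian noise), the helper names, and the choice of
a 1-nearest-neighbour "memoriser" as the overfit model. The memoriser
reproduces its training points exactly, noise included, while a fitted line
only captures the trend.

```python
import random

def make_data(n, rng):
    # Hypothetical data: y = 2x plus Gaussian noise.
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 0.3)) for x in xs]

def predict_1nn(train, x):
    # "Memoriser": return the y of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def fit_line(train):
    # Ordinary least squares for a straight line y = slope * x + intercept.
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(predict, data):
    # Mean squared error of a prediction function on (x, y) pairs.
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train, test = make_data(30, rng), make_data(300, rng)
slope, intercept = fit_line(train)

def memoriser(x):
    return predict_1nn(train, x)

def line(x):
    return slope * x + intercept

print("1-NN train/test MSE:", mse(memoriser, train), mse(memoriser, test))
print("line train/test MSE:", mse(line, train), mse(line, test))
```

The memoriser's training error is zero because it has stored the noise along
with the signal, and its error on fresh points is no longer zero. On data like
this one would typically expect the simple line to do better on the test set,
which is the overfitting problem in miniature.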

Where it _does_ apply is to the cascade of dependencies among data sets, code
that generates them, and sources of signal that you see in large data
platforms.
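
As a rough illustration of such a cascade, the sketch below models a data
platform as a graph and asks what is affected when one artifact changes. The
artifact names and the `DOWNSTREAM` table are entirely hypothetical; a real
platform would have to recover this graph from job configs or lineage
metadata, which is often the hard part.

```python
from collections import deque

# Hypothetical platform: each artifact maps to the artifacts built from it.
DOWNSTREAM = {
    "raw_clicks": ["sessions", "spam_filter_model"],
    "spam_filter_model": ["clean_clicks"],
    "sessions": ["engagement_features"],
    "clean_clicks": ["engagement_features", "ranking_model"],
    "engagement_features": ["ranking_model"],
    "ranking_model": [],
}

def affected_by(changed, graph):
    """Return every artifact transitively downstream of `changed`."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(affected_by("spam_filter_model", DOWNSTREAM)))
# ['clean_clicks', 'engagement_features', 'ranking_model']
```

Retraining one model quietly changes a data set, the features derived from it,
and another model two steps away. Nobody designed that chain as a whole, which
is where the resemblance to a genome comes in.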

~~~
Ecalpal
Thank you for articulating my point better than I could myself about the
'cascade of dependencies'.

It seems to me that these platforms/packages are essentially trying to build
something almost as complex as a genome from scratch, and that trying to avoid
those cascades is paramount. However, is this possible in practice, and if so,
what would help streamline the process?

Would a generalisable package solution help with this, and what's limiting
development of such a solution? Is it because we don't know what such a
solution should/ought to look like?

~~~
madhadron
> Would a generalisable package solution help with this

Not really. It's more what happens when hundreds or thousands of people try to
get lots of different things done using shared resources over a period of
years.

