EDIT: I would specify that I'm not saying that experimental biology "doesn't work" or doesn't get meaningful results before the bigger new technology arrives. I think the Slate Star Codex article overstates the helplessness of single-gene theories, which did explain a bunch of diseases and simple attributes, and had significant medical impacts in understanding things like cancer and chemotherapy effectiveness. It just failed to accomplish a set of other things that people wanted, like explaining intelligence or height. Each new advance (cheap genome sequencing, epigenetic readouts, hugely longitudinal metabolomics, molecule-level microscopy, and the hundred-plus advances in this direction we haven't even conceived of yet) expands the territory of what we can address through molecular biology.
Scientists are often really optimistic about whether topics of interest lie inside or outside this territory at a given moment, which I think has more to do with the incentives of grant writing than anything else.
At least until we get an "AI-complete" approach, we can't expect to suddenly have a means to "debug" a system crudely analogous to a "million lines of spaghetti code" (and yes, only crudely analogous, but the ways that analogy breaks down only make the whole thing harder to understand).
Take the genome for example. I really think the notion of gene sequences hobbles our understanding. The genome is not a sequence. It can be sequenced, but that's not what it actually is. A genome is a molecule. Every part of the surface of that molecule is constantly interacting with other molecules. In programming terminology it's like a program where every instruction executes in parallel, always, and in real time. That means that trying to read the genome sequentially like a program or a book is missing the point entirely. We're holding it wrong.
I truly think we are a long way from being able to really "grok" these systems. It took us thousands of years to develop the math, logic, and science to understand complex systems with discrete components and discrete logic. In many ways the digital computer is the apex product of our understanding of discrete systems and discrete logic.
Now we get to figure out concurrent systems and concurrent n^n combinatorial "logic." It might take a lot less than thousands of years because there are more of us and we have a lot more knowledge to work with, but it's not going to be overnight.
In order to make sense of and manipulate things like genetics we will need to develop machines that can do those things for us. While that's unsatisfying because we generally like the feeling of fully understanding things, such machines will still yield progress and results, which is all we can really hope for here.
I've thought for many years that there are new intellectual tools waiting to be discovered here that will be as big as arithmetic, calculus, or logic. There was a time when humanity had no idea what mathematics -- the whole field -- was, and today there are probably whole analogues to mathematics waiting to be discovered.
Unfortunately we are still in the phase of trying to attack this problem with old ways of thinking. We probably won't even try until we finally come to terms with the fact that the tools we have at our disposal right now do not work to truly understand the genome. This will take a while as humans become emotionally attached to their tools and cling to them. Try debating a programmer on OSes, languages, or editors to see a simple example. :)
Bonus is that once we understand the genome we'll probably understand a lot of other unknown unknowns we didn't even realize we didn't understand. Maybe this is why physics seems stuck. Maybe the cognitive tools we have right now are simply not up to the task of understanding the whole thing.
I actually think Stephen Wolfram's doorstop A New Kind of Science was groping in this direction. The book was problematic because of Wolfram's almost comical narcissism (Wolfram sort of tries to take credit for a lot of things he didn't invent), and the techniques it discusses don't seem to have delivered much fruit in and of themselves. Nevertheless at the "meta" level the notion of trying to invent fundamentally new intellectual tools is absolutely what we should be doing. We will of course fail a lot, but that's what happens when you try to do something new.
I'm pretty sure they could learn to write programs. They had algorithms.
We do not have aliens or time travelers to walk us through genomes and fill in the missing pieces of our understanding.
But, if you are suggesting a smart human from 500 B.C. couldn't grasp 'The Art of Computer Programming,' I'd respectfully disagree. Logic and reasoning haven't changed in recorded history. The ancients were no more or less intelligent than the moderns.
Whenever I'm tempted to think otherwise, I sit down with my copy of Euclid's Elements and see how far into it I can get before I reach the "WTF... how did he figure THAT out!?" moment.
An even better cure -- although more recent -- is to see how far you can get through Newton's Principia.
Everything seems obvious and easy in hindsight because we are viewing it with those intellectual tools deeply embedded into our understanding. They are all over our culture and we pick up bits of them as children through osmosis even before we study them formally.
I think getting an ancient Greek or Roman intellectual to understand a large integer factoring algorithm, a proof of work block chain, or an OS kernel would be pretty painful. It would take a lot of tutoring to first teach a lot of things that were not understood in that time at all.
You can sometimes see this today when you see older people in rapidly developing nations trying to learn advanced concepts. They can do it but it takes a while.
My point is that all this assumes a tutor who knows and can explain. For levels of understanding not yet reached by any human, there is no tutor to teach us how to think about the problem.
In short, you are confusing reasoning ability with algorithms designed for a particular form of technology.
Take neural nets. Would someone from the 1970s understand the benefits of a convolutional net vs. a simpler architecture? Perhaps. But without the technology to perform millions of training calculations, the lack of comprehension would come from your pupil wondering what the point is of trying to understand an algorithm that, with his technology, could never be demonstrated, used, or tested.
While cybernetics or the theories of emergence in systems theory are a step in the right direction, I think the reality of how "something" "works" is far weirder than we expect.
It's much less about how different components superficially "interact" with each other and much more about mind-numbingly complex emergent properties that somehow "happen" between objects, emerging through complex space-time interactions, probably with even weirder quantum factors or "fields" that are not apparent in n-dimensional modelling.
When you really begin to dig deep into these problems it becomes an increasingly weird mixture of philosophy, physics and mathematics, revealing endless paradoxes as we increase the resolution.
Basically it becomes a chicken-and-egg problem: can the complexity and life-force of a cell or its components be "deworlded" in any way, or can the cell and other complex organizations simply not be understood without the context in which they exist -- in other words, the entire universe?
We won't be able to "understand" by looking at the "thing", because no thing is actually at the center; it's not "doing" anything and doesn't even exist outside of our own simple taxonomical concepts. In that sense the genome can't really be "sequenced", because a sequence is simply a way to label, divide, or reduce a "thing".
Feed a neural net enough extremely high-resolution data from a large enough space and I could imagine extremely strange patterns emerging.
How we would be able to capture such data from reality is the big problem because we simply don't know how many factors are at play. Protein folding is already an incredibly demanding calculation and it's very simple compared to even the tiniest mechanisms of a cell.
I wonder how much that's just a technicality, in the same way you could say the inverse square law for gravitation is wrong because really every massive particle has some influence on every other particle, etc.
So maybe it's the case that every gene is involved in every trait, but maybe there's a handful that account for 99% of what we care about in that trait? (Then again, I can imagine for something like intelligence that most of the genome is really involved—height though?)
EDIT: I had some remarks about the term 'gene' here that were incorrect and turned into a useless diversion.
Generally speaking, most genes individually make such a small difference that it takes a huge dataset of genomes to find them.
A fundamental problem with genetic analysis is that gene variants are categorical variables that often have non-linear effects, but predictive power goes down drastically as soon as you start looking for multi-factor effects. You have to narrow down the multi-factor search space, possibly by narrowing it down to genes that have a linear effect, but that could still be too large and you could easily miss genes that on average have no effect.
I wrote a critique of a paper using this method. I proposed studying the distribution of variant data's residual distance from a single-factor linear fit. Consider a variant that has a positive effect on height when combined with another variant, but a negative effect otherwise. The single-factor linear fit will assign it 0 effect, but residuals involving the variant will be unusually large: positive when the second variant is present and negative when it is not. My critique found several variants fitting this description (after a Bonferroni correction), but I didn't have the compute power or time to rerun the second-order interactions with my findings.
But hey, I got a B on that paper, so maybe it's a terrible idea.
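For what it's worth, the residual trick is easy to simulate. Here's a minimal sketch (the variants, effect sizes, and sample size are all invented for illustration) of a variant that a single-factor scan scores as ~0 effect, but whose residuals split cleanly by the status of a second variant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two binary variants. v1 alone has no average effect on the trait:
# it adds +2 when v2 is present and -2 when v2 is absent.
v1 = rng.integers(0, 2, n)
v2 = rng.integers(0, 2, n)
trait = 2.0 * v1 * np.where(v2 == 1, 1.0, -1.0) + rng.normal(0, 1, n)

# Single-factor linear fit for v1: the slope comes out near zero, so a
# one-variant-at-a-time scan would discard v1 entirely.
slope = np.cov(v1, trait)[0, 1] / np.var(v1)
print(f"single-factor effect of v1: {slope:.3f}")

# But the residuals from that fit split cleanly by v2 status whenever
# v1 is present: strongly positive with v2, strongly negative without.
residuals = trait - slope * v1
carriers = v1 == 1
print(f"mean residual (v1 & v2):   {residuals[carriers & (v2 == 1)].mean():.2f}")
print(f"mean residual (v1, no v2): {residuals[carriers & (v2 == 0)].mean():.2f}")
```

The second-order search space problem is visible even here: with real data you'd have to test every pair, which is exactly where the compute blows up.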
On the other hand, given that height is closely linked to metabolic conditions, and both the transcription/translation rate and functioning of every cellular component relies on metabolism, it doesn't take long for changes to propagate to simple traits like height.
It's not; the author's statement seems to use what I understand to be the standard definition used in the field, where a gene is specifically a sequence which codes for the synthesis of a particular protein or nucleic acid sequence, which is essentially the lowest-level "simple" trait. This is perfectly consistent with large and overlapping sets of genes being involved in determining each of the complex traits that humans mostly care about, and does not make the term "gene" meaningless.
If you assume the effects of individual causes combine linearly, you can still look at one cause at a time. But programming languages interact with the problem domain, library availability, team preference and experience in non-linear ways.
I think that's a much more measured claim than you're giving credit for; "it works" is claiming to explain 10% of variance. Do you have specific complaints about the cited article? Or are you just arguing by analogy that since people were wrong in the past, they are probably wrong now?
You can always decompose a function of two variables into three parts:

f(a, b) = g(a) + h(b) + interaction(a, b)

where E[interaction(a, b)] = 0 in some statistical sense.

If the interaction term is zero, then great, your function is trivially decomposable. For our problem, take a = genetics and b = environment; that means you can precisely talk about someone's 'genetic score for intelligence' and someone's 'environmental score for intelligence', and never have to consider them together.
But I strongly suspect that the interaction between genes and environment is very, very high; the latest effort to map genes to intelligence accounts for only 10% of variance not because the environment determines the other 90%, but probably because of the interaction term, which can be complicated and highly nonlinear.
Another thing: these variance attributions are only defined relative to the distributions of your inputs, so the relative importance of genes vs. environment won't be stable. If somehow the world became completely uniform (every single child in the world received the exact same education and upbringing), you'd expect genetic variation to account for everything, just by definition.
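That last point is easy to demonstrate with a toy additive model (all numbers invented): hold genetic variation fixed and shrink the environmental spread, and the fraction of trait variance "explained by genes" climbs toward 100% purely by definition:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
genes = rng.normal(0, 1, n)  # fixed genetic variation across all scenarios

def genetic_share(env_sd):
    """Fraction of trait variance attributable to genes in a purely
    additive model, as the spread of environments changes."""
    env = rng.normal(0, env_sd, n)
    trait = genes + env
    return np.var(genes) / np.var(trait)

for env_sd in (3.0, 1.0, 0.1):
    print(f"env sd = {env_sd}: genes explain {genetic_share(env_sd):.0%} of variance")
```

Same genes every time; only the environmental spread changes, and the "heritability" figure swings from roughly 10% to nearly 100%.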
f(a, b) = a/b
In this case the system is contained entirely within the interaction term, and since you don't know the distributions of a and b there's not much motivation to go further. If you had the distributions of a and b you might be able to do something a little less trivial by skewing the interaction term to have an expected value of zero, potentially like:
a/b = a + b + (a/b - a - b) = g(a) + h(b) + (interaction term with E[term] = 0)
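One concrete way to do that skewing, assuming you can sample from the distributions of a and b (uniform distributions here, chosen purely for illustration): use centered conditional means as the main effects, so the leftover interaction term has expected value ~0 and the decomposition reconstructs a/b exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
a = rng.uniform(1, 3, n)  # kept away from zero so a/b behaves
b = rng.uniform(1, 3, n)
f = a / b

# Skew the naive split a/b = a + b + (a/b - a - b): replace the raw a
# and b terms with centered main effects so the leftover interaction
# term has expected value ~0. Uses independence of a and b.
mu = f.mean()
g = a * (1 / b).mean() - mu   # main effect of a: E[a/b | a] - mu
h = (1 / b) * a.mean() - mu   # main effect of b: E[a/b | b] - mu
interaction = f - mu - g - h

print(f"mean(interaction) = {interaction.mean():.4f}")
# The decomposition is exact by construction:
print(np.allclose(f, mu + g + h + interaction))  # True
```

This only works because we fixed distributions for a and b up front, which is exactly the catch: with genes and environment, nobody hands you those distributions.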
I’m naively confident that we’ll find gene patterns for height soon.
The corollary is that successful products can litter the world with unintended consequences - as can isolated discoveries.