The more I study biology and microbiology in college, the more I realize that taxonomy and phylogeny classification schemes are essentially bullshit. Bioinformatics and evolution research have shown "life" to be much more ambiguous and mysterious than we could have ever imagined. Modern classification schemes are so arbitrary and error-prone as to be essentially just a painful way to satisfy scientists' compulsion to shove things into neat little cubbies and has the negative side-effect of scaring students away from an otherwise fascinating subject. The professors I've spoken with tend to agree.
Edit: added "classification schemes" after taxonomy and phylogeny to clear up confusion
There are a great many biologists and bioinformaticians who don't "get" phylogenetics. Please, please don't be one of them. I worked with some world class evolutionary biologists in the drug industry early in my career and I was blown away by how useful this often ignored corner of biology has become when married with modern bioinformatics techniques.
Here's an example: let's say you're trying to develop a new drug against a virus. You can isolate huge numbers of the virus from infected individuals and sequence the DNA very cheaply. Using selective pressure analysis you can look for regions of DNA that are being selected for over multiple generations of the virus. Why? Because these might be important for survival. If your drug can target those regions of the protein, the virus will have an uphill battle mutating and creating resistance to your drug.
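A rough sketch of the idea in Python, under heavy assumptions: the sequences are already aligned, the toy alignment is made up, and per-column Shannon entropy stands in for a real selective-pressure analysis (which would normally compare synonymous and non-synonymous substitution rates over codons). Columns with low entropy are the conserved candidates.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; zero means fully conserved."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conserved_columns(aligned_seqs, max_entropy=0.0):
    """Return 0-based indices of columns whose entropy is <= max_entropy."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i in range(length)
        if column_entropy([s[i] for s in aligned_seqs]) <= max_entropy
    ]

# Hypothetical toy alignment of one viral region from four isolates.
isolates = [
    "ATGCCGTA",
    "ATGACGTA",
    "ATGTCGTT",
    "ATGGCGTC",
]
print(conserved_columns(isolates))
```

Raising `max_entropy` loosens the definition of "conserved"; with real data you would want many more isolates per column before trusting any of it.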
Here's one more: You can use phylogenetic techniques on individual proteins to see how closely related they are. Why do this? Because if you're developing an antibiotic, you want to know that the bacterial version has a sufficiently large evolutionary distance from the human version. Now you can do this at scale for huge numbers of bacterial proteins and identify the ones with the biggest distance from their human orthologs. This allows you to identify bad antibiotic targets upfront before pouring all sorts of money into experiments (or at the very least, going into said experiments with your eyes wide open).
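To make the ranking step concrete, here is a minimal Python sketch. It assumes each bacterial protein has already been aligned to its human ortholog, uses the simple p-distance (fraction of differing aligned positions) rather than a proper evolutionary model, and the target names and sequences are invented for illustration.

```python
def p_distance(aligned_a, aligned_b):
    """Fraction of differing positions, skipping columns where either side has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def rank_targets(alignments):
    """Sort target names by distance from the human ortholog, most distant first."""
    distances = {
        name: p_distance(bacterial, human)
        for name, (bacterial, human) in alignments.items()
    }
    return sorted(distances.items(), key=lambda item: item[1], reverse=True)

# Hypothetical (bacterial, human) pairwise alignments; names and sequences are made up.
alignments = {
    "targetA": ("MKTLLVAG", "MKTLLVAG"),
    "targetB": ("MKSLIVAG", "MKTLLVAG"),
    "targetC": ("MRQW-PHD", "MKTLLVAG"),
}
for name, dist in rank_targets(alignments):
    print(f"{name}: {dist:.2f}")
```

Targets at the top of the list are furthest from the human version; anything near zero is the kind of target you would de-prioritize before spending money on experiments.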
Beautiful examples in theory, but blown out of the water in practice.
In the current age of bioinformatics, practical modern antivirals and antibacterials are not being identified, not at all[0].
As such, I find it awkward and outdated to claim that novel drug discovery is a great testament to phylogenetics and bioinformatics.
Can anyone offer a practical example of how bioinformatics has genuinely been (pref. statistically) useful in successful drug discovery? I would also be interested in evidence that existing antibiotics could have been discovered more effectively, using new methods ab initio.
>Beautiful examples in theory, but blown out of the water in practice
I will admit that this work was largely done 10 years ago, so it's certainly a bit outdated. I will also concede that the current state of affairs in antimicrobials research is dreadful.
>Can anyone offer a practical example of how bioinformatics has genuinely been (pref. statistically) useful in successful drug discovery?
I can offer no data that stands up to a rigorous statistical analysis that you seek as far as bioinformatics impact on drug discovery is concerned. This is the primary reason most of us with bioinformatics experience who used to work in the pharma industry don't work there anymore (and for a great many of us this was not by choice).
>As such, I find it awkward and outdated to claim that novel drug discovery is a great testament to phylogenetics and bioinformatics.
Where did I make this claim? I chose the examples I did (which are real and actual examples by the way) of the way one can use phylogenetics to point out to the parent poster that it's not all crusty old geezers arguing about whether two almost indistinguishable variants of mice are part of the same taxon (which is the view a great many people who have never worked in the field often have). I chose these examples because they are practical examples of phylogenetic techniques that are easy to explain in a forum such as this in a few paragraphs.
I can't speak to the viral one, as I was only peripherally involved in that work, but my very first job was taking bacterial sequences, making alignments, and looking at the phylogenetic trees to see whether or not they were too close to eukaryotes for comfort. This was the policy of the company where I worked: the antimicrobials team wanted to ensure both spectrum (that targets they were going after were actually present in medically important bugs) and specificity (that they weren't sloppy targets that you might accidentally hit a human enzyme with). There were a great many targets that I personally de-prioritized because they were no good purely from an informatics perspective.
I understood you were using the example of antibacterial drug discovery to defend traditional phylogenetic models (against the parent commenter). I wanted to illustrate that the lack of new results in these fields doesn't offer good evidence that traditional phylogenetic models are adequate for this specific purpose.
Sincere apologies if I went too far and incorrectly put words into your mouth. Sorry.
I also don't work in bioinformatics any more for similar reasons.
Could I bother you for a couple of specifics for why you left the field? As I mentioned earlier I'm exploring bioinformatics, and I've previously spoken to JunkDNA at length about his own experience, so it would be nice to hear another perspective. My email is pvnick [at] gmail.com if you don't want to discuss it here.
Can anyone explain why ronaldx is being downmodded for this comment?
As someone who knows nothing of bioinformatics or their relation to drug discovery, I see nothing wrong with his post. Perhaps what he is saying is incorrect, but can someone explain why?
I actually agree with you regarding bioinformatics, and the subject you describe is one which I'm exploring as a career path. I was describing an indictment of classification schemes, not bioinformatics, and it seems there are plenty of respondents here that have misunderstood me. Which I suppose is either an indictment of the care with which I constructed my original comment or a lack of attention to the other comments I've made in this thread.
If you find phylogeny to be "bullshit", your biology and microbiology professors have truly failed you. Evolution by natural selection has produced the wonderful diversity that is life. Phylogenetic reconstruction is the tool biologists use to understand the history of all life on the planet. The fact that some living organisms engage in lateral gene transfer, or that there is no definition of species that perfectly applies to all living things is no reason to abandon perhaps the most fundamental and central pursuit of biology. Who we are as humans is inextricably tied to our mammalian, primate, vertebrate and multicellular ancestry, and we know this by virtue of the wonders of phylogenetics.
Well, they kind of are bullshit, in the same way that the Bohr model is bullshit, and that quantum mechanics is bullshit, too. It's not REALLY what's going on, but rather a simplification that gets your feeble brain to the point where it can do something useful. You'll eventually learn a model that's less bullshitty, that gets you doing one slightly more useful thing, but it's still bullshit.
Still no. You'd be hard pressed to find any biologists left that would argue that discrete taxa exist, but whether or not humans are more closely related to chimps or gorillas is superbly knowable and certainly not "bullshit". Likewise, whether or not birds and dinosaurs constitute a monophyletic clade is not bullshit. Even with all the "exotic" stuff like endogenous retroviruses or whatever "exception" you think you can raise, the "Tree of Life" model still works very very well, it just complicates how the branching works.
I work with hydrogenases, and the amount of horizontal gene transfer is incredible. There are so many mobile elements in the environment... E. coli, the humble gut bacterium, seems to have four copies of hydrogenase. Two come from the broad class of "formate hydrogen lyase"; the other two are "uptake hydrogenases". Almost certainly they come from horizontal gene transfer, because other gammaproteobacteria may or may not have some or all of these hydrogenases; and you will find uptake hydrogenases scattered through bacterial (and archaeal) phyla.
The hydrogenase that I work with is part of a mobile element; it's found in our organism in a gene island, but also straight up in the genome of another organism that was isolated 5000 kilometers away, in a totally different ecotype. Both of these guys are found in the ocean, but none of the other members in the broad family that it comes from are oceanic: they come from terrestrial volcanic mats.
So obviously species-wise discrete or fuzzy taxa do not work for me; but even protein phylogenetics can be muddied by things like random gene fusions and convergent evolution. I'm not sure that some of the trees I generate are actually taxonomic; they may instead be the result of selective pressure against a highly conserved scaffold.
Of course, they're still useful, even if they're bullshit... Because I don't care about phylogenetics, I care about function. If the tree that I make is an unfaithful representation of the historical record, no big deal, as long as it faithfully clusters function.
My undergraduate senior project was on the interaction between breast milk and microbial ecology in infants, so I can appreciate the difficulty of protein phylogenetics in non-eukaryotic organisms. I think it's still important to underscore that simply because whatever tree I get phylip to spit out doesn't reflect the true evolutionary history of the organism/gene/whatever, doesn't mean that 1) tracing the evolutionary history of an organism/gene/whatever is "bullshit" (whatever you mean by that) or that 2) the evolutionary history cannot be elucidated by other means. Phylogenetics can be used to discover gene fusion events and convergent evolution.
I think another way to put this is, we've got many years of experience using various forms of evidence to make trees of life (originally character states such as morphogenic properties but now also including gene data and other molecular details).
And even though we've been able to build a great story and a great tree structure, those of us who have studied deep biology know that the tree structure is really muddied by some inconvenient phenomena; for example, using some tiny subset of genes to compute phylogeny generates a very different result from using all genes, or combining all the gene data with all the morphogenic data. Understanding the underlying phenomena which help explain the oddballs that don't fit into the Dogma is almost always useful, because it helps us go back and refine the Dogma.
It's really more of a graph of life when you include horizontal gene transfer; whether that graph is very much non-tree-like is still an open question.
That's rather harsh, don't you think? All models have their limits and scopes, but that doesn't preclude them being useful in their own right. For instance, Newtonian physics simply can't cope with molecular-scale events, but it still gets used every day for the cases where quantum or relativistic effects don't matter. Likewise, phylogenetic classification is still a hugely useful tool within biology. The fact that it has assumptions and limitations merely defines it, rather than detracting from it.
Just because it's bullshit does not mean it's useless.
The tree of life model is almost useless for understanding single-celled organisms, which vastly outnumber their multicellular counterparts. It also creates a lot of confusion when people assume the model matches reality instead of poorly mapping to it. Consider that species is a surprisingly vague concept: A and B may be compatible, and A and C may be compatible, but B and C are not. The point is that it's based on inaccurate assumptions; if you understand how and why the model breaks down, you can extract a lot of value from F=MA or any other such approximation.
You seem to have allowed the intricacies of the subject to distract you from my meaning. All of that is true, and there are obvious macrostructures, such as those you mentioned, which are credible. However, the worship of phylogenetic classification schemes that attempt to disambiguate even the most minute details of living organisms is not useful except maybe in the nichest of fields and to boost bioinformaticists' egos.
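The A/B/C case is exactly what stops "same species" from being an equivalence relation. A tiny Python sketch, using a made-up compatibility table for three hypothetical populations, shows the check:

```python
from itertools import permutations

# Hypothetical populations; a pair in this set means the two can interbreed.
compatible = {frozenset("AB"), frozenset("AC")}

def can_interbreed(x, y):
    """Compatibility is symmetric, and every population is compatible with itself."""
    return x == y or frozenset((x, y)) in compatible

def transitivity_violations(populations):
    """Triples (x, y, z) where x~y and y~z hold but x~z does not."""
    return [
        (x, y, z)
        for x, y, z in permutations(populations, 3)
        if can_interbreed(x, y) and can_interbreed(y, z) and not can_interbreed(x, z)
    ]

print(transitivity_violations("ABC"))
```

Any non-empty result means the populations cannot be split into clean, non-overlapping "species" using interbreeding alone.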
I'm still not sure what the argument is that you're making, or who you are accusing, and of what. Minute details tend only to be relevant in niche fields. Isn't the study of minute details what makes them niche fields? When you compute the area of a circle, though you may only use a few digits of pi, surely you can appreciate how those that need to precisely calculate the area of a circle might use more digits than you do. The fact that pi is irrational, that neither of you are using the "real" value of pi, doesn't make what either of you are doing "bullshit" though.
There's no harm in defining and categorizing things. The harm would be in forgetting that categories should be based on reality, but don't define reality (a Platonic mistake).
When new information blasts away an existing categorization scheme, then a new one does need to be made.
I agree, and that's why I said "Modern classification schemes." The current system doesn't work, and bioinformatics is blasting it away at such a speed that it can't seem to keep up. The current response seems to be to just bandage it up, like a novice PHP developer's first web app devolving into spaghetti code because he doesn't know what an MVC framework is.
Are you studying those topics as a biology major, or as a student of some other major subject? There actually is a lot of criticism among biologists about how far behind the textbook representation of those topics is compared to the professional literature of biology. If "cladistics" is not a familiar term to you from your undergraduate education, that would be diagnostic of a poorly designed biology course, for example. See
Perhaps the answer lies in tagging. Rather than a hierarchical categorization, everything is just tagged and thus searchable. That way you don't have to keep trying to fit everything into preexisting categories... Seems to be the direction most other things are heading as well.
To my understanding the classifications tend to be "lies we tell to children", much like the simplified atomic models sold to first-year chemistry classes. If you're a biology major, you never really need to get the Chemistry PhD's version of atoms, right?
Very broad classifications tend to be useful, in that by definition they group similar organisms. Beyond that level of detail they're useless, yes, but beyond that level of detail you're already going into the level of expertise where you're expected to recognize their uselessness. Cell Biology PhDs may or may not realize it, but they're working on tyrosine kinases, not classifications.
Growing up it seemed crazy having to memorize these arbitrary taxonomy schemes when things intuitively didn't seem so black and white. I guess every model has its shortcomings.
> Modern classification schemes are so arbitrary and error-prone as to be essentially just a painful way to satisfy scientists' compulsion to shove things into neat little cubbies and has the negative side-effect of scaring students away from an otherwise fascinating subject.
What do you mean by "modern classification schemes"? If anything, molecular phylogenetics helped clarify earlier attempts at classification which used common morphological features.
Are you asking "how do you define life?" or "how do you separate organisms from each other?"? Regarding either question, I don't think anybody has a good answer, and that's what's so fascinating, because it really seems to be a metaphysical question rather than a scientific one.
I'm sure the author means that they dislike the current system for trying 'too hard' to classify life, not that they dislike classification entirely; otherwise out goes our ability to describe life with words at all, which I think you'll agree is a pretty important faculty.
I think the underlying point is that while humans have an extremely strong (and undeniably useful) urge to classify things into clean groups, not everything lends itself to being so classified. The human urge to classify things is what makes possible paradoxes like the sorites paradox.
Do we really need to classify it down to 8 taxonomic levels? At the last few levels I bet it tends to get really arbitrary. Or, we could have a variable-depth tree. Make it more specific only when you need to.
It is variable depth - the main ranks also have ranks between them, with some denoted by super- and sub- prefixes to the major categories, and others of unnamed rank defined only by their position between two adjacent levels.
One of those things is not like the others - the abstracted information and constructed envelope of a virus seems qualitatively different to me, since that structure doesn't depend on external factors for its basic coherence. +1 for mentioning Benard cells, did not know of those.
It's not "bullshit", it just doesn't come with the same kind of guarantees that religion does. Our knowledge is refined and expanded over time. This is perfectly natural.
The problem is when people are taught in an authoritarian manner, and then they run into places that method breaks down, and so throw the baby out with the bathwater. As you have done.
Discoveries such as this just go to show how arbitrary the definition of "life" is.
One might argue that viruses are not life because they are obligate intracellular parasites - but many species of bacteria are too. It used to be that size was the distinction: viruses as small, simple particles and cells much bigger, but large complex viruses also blow this argument out of the water.
The only thing left that really distinguishes the "life" of cells and the "maybe-not-maybe-life" of viruses is the presence of ribosomes to synthesize proteins. Maybe one day we'll find a virus-type replicator that also contains ribosomes - and what then?
Then there is also the issue of simulating life in silico - at which point does it stop becoming just a simulation and can be considered 'actually' alive?
Personally I think these questions are best debated in the context of philosophy and ethics rather than taken as something that the domain of science ought to provide conclusions to.
My favorite exploration of this topic is found in Maturana and Varela's "Autopoiesis and cognition: the realization of the living". Cells have to "work" to preserve their ability to go on preserving their ability to ... (etc. recursively). Viruses don't.
> Maybe one day we'll find a virus-type replicator that also contains ribosomes - and what then?
Mimivirus, another giant virus, has its own gene for an aminoacyl-tRNA synthetase, an enzyme that loads an amino acid onto a transfer RNA to be used in making proteins. That's awfully close to a ribosome component.
"Each about one micron—a thousandth of a millimeter—in length, the newfound genus Pandoravirus dwarfs other viruses, which range in size from about 50 nanometers up to 100 nanometers."
I thought the same thing exactly. I usually convert everything on molecular scales in terms of proportions to the nearest tangible entity. (I find nm too abstract and harder to remember)
A single hydrogen atom has a diameter of about 1e-10 meters.
A single nanometer is 1e-9 meters (i.e. about 10 hydrogens stacked).
Viruses are on the order of 50-100 nanometers => ~500-1000 hydrogens across.
These viruses are 1e-6 meters, or 10,000 hydrogens across.
That's too many to imagine and hard to remember, so other useful dimensions I sometimes use are
diameter of DNA helix is about 2nm (20 hydrogens across)
the size of X chromosome, which is roughly 7um.
the size of cells in our bodies varies around 10-100um.
So, the best way to get a sense of scale for this particular virus (imo) is to imagine it as 1/7th the size of an X chromosome, or as 10x bigger than an average virus.
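The conversions are easy to check in a few lines of Python. This is only a sketch: it takes a hydrogen atom as roughly 1e-10 m across and reuses the approximate sizes quoted in this comment, so treat every number as order-of-magnitude.

```python
# Approximate sizes in meters, using the rough figures quoted in this comment.
HYDROGEN_ATOM = 1e-10

sizes_m = {
    "DNA helix diameter": 2e-9,
    "typical virus (low end)": 50e-9,
    "typical virus (high end)": 100e-9,
    "Pandoravirus": 1e-6,
    "X chromosome": 7e-6,
}

def in_hydrogens(size_m):
    """Express a length as a whole number of hydrogen-atom diameters."""
    return round(size_m / HYDROGEN_ATOM)

for name, size in sizes_m.items():
    print(f"{name}: ~{in_hydrogens(size):,} hydrogens across")

# Ratio of the X chromosome to the new virus (about 7, hence "1/7th the size").
print(round(sizes_m["X chromosome"] / sizes_m["Pandoravirus"]))
```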
No it doesn't imply anything about origins of life - but the lack of any homologous sequences in known species (including other viruses) that have been sequenced does illustrate the vast gaps in the totality of genetic data collected and indexed so far.
Absolutely, there are 2,500 genes that we didn't know about.
Now it could be that if any one of these genes shows up in any of the "known" lineages (as it might if a retro-virus carrying it injected it) that it causes the host to die before reproducing. It could be that they are pandemics waiting to happen, it could be that they are code for additional eyeballs.
The challenge is that we do not yet (as far as I can ascertain) have a way of looking at a gene and identifying all of the effects that gene has on a cell or an organism. What we have are organisms with genes that we are 'debugging by printf', by essentially commenting them out and seeing what happens.
Once our knowledge base flips, and we understand genetics at an information/programmatic level, we would be able to evaluate these 2500 genes and see if there is anything useful here.
That is actually very interesting. How well would our (in)ability to parse what code (SW) does without actually running it translate to genes? Static analysis for genes?
Many types of viruses[1] are like this by virtue of their high replication rate and RNA's poor error correction while copying (in comparison to DNA). It's one of the reasons why anti-viral therapy can be so tricky in viruses like HIV, as it's such a rapidly moving target.
As an analogy to computer viruses and other malware, the effect is very much like the polymorphic program code found in some of these, albeit not going as far as to derive new functionality as biological mutations can, through the combination of mutation and selection.
Maybe it was once a unicellular predator that hunted bacteria, which during evolution gained the "idea" of using prey cells' machinery to do some of its biochemistry.
So it didn't eat bacteria, but rather plundered them.
Then during evolution it lost every other piece of its own biochemistry, eventually becoming a virus.
I read about the mimivirus it mentions in the article a few years ago, and it mentioned that was one hypothesis. Another is that it went the other way around; before cells were invented life was just increasingly complicated self-replicating molecules and cellular life evolved out of them. In that case the virus-like ancestors would have evolved to prey on their self-sustaining descendants.
I think the uniqueness of this virus' genes suggests that whichever direction it happened, it happened very early in the history of life.
Whenever I read about findings like this, I'm reminded of just how much our requirement that microbes need to grow on a piece of glass distorts our view of them. If we could observe them as easily as animals in the wild, that would open whole new avenues of research.
It's a wall of protein with nucleic acid inside that reproduces by exploiting other organisms' cells. Don't get blinded by the differences, the similarities are much bigger.