What a Newfound Kingdom Means for the Tree of Life (quantamagazine.org)
262 points by dnetesn on Dec 18, 2018 | 69 comments

I'm currently reading Nick Lane's excellent book, The Vital Question, which explores what we know about the split between archaea and eukaryotes, so this was an exciting article. This discovery is still one level below that split, within the eukaryotes, but the ongoing discoveries of more and more ancient branches suggest the tantalising possibility that we may one day find something even more ancient still swimming around somewhere: a missing link from billions of years ago.

What strikes me about both this article and Nick Lane's book is that when I was in school the tree of life was presented as this thing that people had pretty much figured out. Since then it's been altered almost beyond recognition using genomics. I hope there are people still in school who read articles like this one and get inspired to get into this amazing area of research.

A small digression on Nick Lane's book: I gifted myself The Vital Question in December 2015. While Lane writes quite effectively without any mind-numbing jargon, the book still has quite a bit of technical chemistry (understandably). After the excellent first 80 pages, it took me a lot more willpower to plough through. :-) (I paused at page 112 and never got back. Thanks to this reminder, I'll resume again.)

When I was reading the book on a plane, a seasoned biologist happened to be sitting next to me. When I said that it's the first book of Nick Lane's that I picked up, he said: "I'd rather suggest you pick up Lane's other book, Life Ascending, and then get back to The Vital Question."

(Edit: Oops, bad me, just noticed that there's already discussion about the book's accessibility in this thread.)

> pick up Lane's other book, Life Ascending, and then get back to The Vital Question

As an ex-biochemist who has read both books, this is excellent advice.

If I know nothing beyond high school biology about taxonomy and nothing beyond a reasonably educated layperson about biology generally, is "The Vital Question" accessible? If not, do you have an example of a well written but quick introduction to the subject matter?

I occasionally read pop-culture articles about, e.g. the discovery of giraffe subspecies, and think about how I'd like to know more about both the human-focused taxonomy side and the evo bio speciation side, but I don't know where to start.

If you have a background in IT, and are modestly comfortable with statistics, the following foundational paper on algorithms to reconstruct the tree of life from genetic data should actually be understandable: http://science.sciencemag.org/content/311/5765/1283

The statistics really only ever amount to Occam’s Razor, i.e. fewer differences in the genome mean a closer relationship.

Edit: actually free full-text link: http://bioinformatics.bio.uu.nl/pdf/Ciccarelli.s06-311.pdf

I am a statistician (although as I confessed nowhere near computational biology), so, uh, yeah this is up my alley.

One more thing: Jared Diamond (of “Guns, Germs, and Steel” fame) transferred the exact same method to linguistics, using it to discover the ancestry of Pacific Islanders using features of their respective languages instead of letters of DNA. The result is both a tree of their (cultural) ancestry, and a map of their migration/expansion from island to island.

Yet another thing: Julien d’Huy uses these methods to analyze folk literature and myths. Very, very interesting research.

Excellent! I believe I remember a far longer version of the paper, and interactive widgets to explore the tree. Couldn’t find those right now and am on the road, unfortunately.

In any case, the process consists of basically two steps:

“Align” the DNA of all the species you have sequenced. That’s done using algorithms such as Smith-Waterman, which minimize the number of edits needed to go from one species’ version of a gene to another.

That number defines a metric that measures all the distances between the species. So, orangutan to Homo sapiens may be, say, “450”, while Homo sapiens to Ficus benjamina is, say, “12321”.

(The difference may also be measured on the level of protein sequences. The process is basically the same, only that edits may be assigned different distance scores, because some result in more functional differences, while others are functionally “silent”. Evolutionary pressure would make the first rarer than the latter)

The metric is unitless (i.e. it doesn’t allow translation into, say, “years since species diverged”), but it should fulfill the triangle inequality.

Once you have a (triangular) matrix with all the differences, all that’s needed is to reconstruct a binary tree that maximizes plausibility, i.e. minimizes the sum of differences at each branching point.
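The two steps above can be sketched in a few lines of Python. This is only a toy under loud assumptions: plain Levenshtein edit distance stands in for a real alignment algorithm such as Smith-Waterman, and UPGMA (simple average-linkage clustering) stands in for the tree-building step. Neither is claimed to be what the linked paper uses, and the sequences and the names `edit_distance` and `upgma` are made up for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-letter edits from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete a letter from a
                            curr[j - 1] + 1,       # insert a letter into a
                            prev[j - 1] + cost))   # substitute (or match)
        prev = curr
    return prev[-1]


def upgma(seqs: dict[str, str]):
    """Build a binary tree (nested tuples) by repeatedly merging the closest pair."""
    names = list(seqs)
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    # Step 1: the triangular matrix of pairwise distances, keyed by (i, j) with i < j.
    dist = {(i, j): float(edit_distance(seqs[names[i]], seqs[names[j]]))
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    # Step 2: merge the two closest clusters until one tree is left.
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)
        new_dist = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted average distance from the merged cluster to cluster k.
            new_dist[(k, next_id)] = (
                (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
            )
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Hypothetical toy "genes", invented for this example only.
toy_seqs = {
    "human": "ACGTACGTAC",
    "orangutan": "ACGTACGTTC",
    "ficus": "TTGCACGGAA",
}
print(upgma(toy_seqs))
```

With these toy inputs the two primates are merged first because they differ by a single letter, and the ficus joins last. Real pipelines use many genes, weighted substitution scores, and more careful tree-building methods.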

Richard Dawkins’ The Ancestor’s Tale is a great layman’s introduction, on par with Bill Bryson’s, as mentioned a few posts up. I’ve read both. Both cover similar subject matter, but approach it in opposite chronological orders.

Bill Bryson's "A Short History of Nearly Everything" is an amazing layman start.

It's a little dated (the first edition came out in 2004!) but to be honest, if you're interested in a "human-focused" intro and how the concepts fit together, things haven't really changed that much.

I really, really, really loathe Bill Bryson, because every time he touches on something I actually know about, there are invariably glaring, awful mistakes, including repetitions of urban legends and just the sort of 'facts' you'd expect to find on a "AMAZING TRUE FACTS" diner placemat.

And so I'm real reluctant to read what he has to say about other fields, because I've got to assume it's all that bad.

You've got to understand the purpose of those books.

You, dmd, are not going to do anything with the knowledge gleaned from those books, except possibly one of two things: pursue further study, or inspire others (usually children) to be interested in the field. The book opens up the field, the basic concepts of the field, and a bit of history of the field to you. You now have the tools to go off and either learn more or inspire others. You can even get into conversations and debates with those well-versed in the field.

The book is a gateway into a field, not a path to follow in the field.

Can you list some examples (or are examples collected somewhere?)

Many of the errors were corrected in later editions (it's been in print for 14 years!).

You can see the original errata here:


I leave it to your judgement whether this makes you "really, really, really loathe" the author :-) Note the book had 20 pages of just bibliography.

You know this is a problem for the layman like me. I need to get educated on the general aspects of so many things in life. But with science and knowledge always progressing at a fast pace, the "truth" is always evolving. And this makes it a problem because I cannot be 24/7 invested in every single area of my interest, which are many. So I appreciate Jared Diamond and Bill Bryson because they make available to me (random person with a lot of interests but limited time) a quick overview of things, which I gladly accept at a high enough accuracy. It's sort of like Wikipedia, yes not the be-all-end-all reference but gets you started.

Only bio class I took was in high school. The Vital Question is tricky in some places (I don't pretend to understand all of the details), but accessible, interesting, and one of my favorite books of the last few years. Of course, YMMV, but don't let that hold you back.

I would say it's accessible, yes. My background is CS and math, and I have managed to follow most of it, though I have occasionally got lost on some of the chemistry.

"If I know nothing beyond high school biology about taxonomy and nothing beyond a reasonably educated layperson about biology generally, is "The Vital Question" accessible?"

I have not yet read _The Vital Question_ but I have read his previous books and I highly, highly recommend:

_Power, Sex, Suicide_ and _Oxygen_

If I had to pick just one, it would be PSS.

I'd avoid it because it is a rather misleading book whether or not it is accessible. Many people (but especially Carl Woese) led to our modern understanding of molecular phylogeny. Lane's main contact for his book was Bill Martin, who was a rival to Woese, which is fine, but the result is a story in which Martin made all the major discoveries and Woese literally gets one sentence. I'd strongly recommend Quammen's "The Tangled Tree" instead, which is a more balanced book.

IIRC, Woese gets a fair bit of credit in TVQ. Certainly far more than a sentence.


Why is it still taught as a tree, then? That seems like a pretty big discrepancy to just be a matter of "simplifying". Would also be more inspiring to students to know there is so much more to discover, as you point out.

Because it is a tree. Each branching point represents the last common ancestor between two species.

The only confounding factor is that among bacteria, there is a process called “horizontal gene transfer”, where one bacterium inadvertently (?) acquires some genetic information from another when they’re really just out for a quick snack. Plus viruses, which sometimes just get inserted into a host’s genome and become part of it, sometimes even functional. The way animals acquired mitochondria is also a fascinating divergence from the standard “tree” model, although it’s somewhat derivative of how plants acquired photosynthesis.

(That’s how everything in biology works. It’s the worst case of spaghetti coding, ever, full of self-modifying compilers, state saved in the JIT’s buffer, and programs only working by exploiting CPU flaws depending on the Hall effect, but only on workdays (according to the Julian calendar), fan speed, and your database server’s support of OpenGL).

But it’s really just a tree.

What’s far less of a “real” concept, is that of species. That makes essentially no sense in non-sexually reproducing organisms.

It's definitely a tree for species with sexual procreation. Each individual has a mother and a father, and species are pretty well defined.

For life forms that procreate by individuals splitting themselves into two copies, while also getting DNA from each other in various random or intentional ways, it gets a lot messier. We still talk about "species", but it's a quite different thing.

No, it's not a tree. It's more like a graph. Some branches split and merge again later. Species are not well defined either (or, if you prefer, there are plenty of competing definitions).

What's an example of species merging?

They really don't, at least not in a way where their genomes would mix in equal parts, or be concatenated.

The closest to merging is probably mitochondria, which probably were single-cell creatures ingested by some very early (single-celled) ancestor of us and, instead of being devoured, ended up in a symbiotic relationship within its host. The mitochondria to this day have their own genome and function a lot like a cell-within-a-cell. Same for plants and photosynthesis.

Just a small note: lateral gene transfer (= horizontal gene transfer) also exists in eukaryotes, although it’s much rarer than in bacteria. Nevertheless, the human genome is actually full of genetic elements that were originally transferred laterally from other organisms (virtually all of them via viruses). A “contemporary” example (i.e. a gene transfer that most people experience in their lifetime) is herpesvirus infection: Herpes simplex are retroviruses that introduce their own genetic material into the host’s.

It’s important to note that such viral infections generally do not infect germline cells and are thus not automatically inherited (though, like in the case of genital herpes, are easily transmitted to the offspring during birth).

Yeah... I started with “the only...” then noticed the errors of my ways (or: got sucked into biology’s usual vortex of more complications) and hinted at viral DNA in the sentence starting “Plus viruses...”

That’s a second point, and I should have corrected the start (to “three-ish”, since the point on mitochondria is yet another).

If we just look at the genotype and use some sort of similarity metric, how treelike does it look, without presuming a tree?

As I mentioned in another post, the triangle inequality roughly holds, with some counter examples in proportion to what one would expect from a stochastic process.

If you’re asking specifically about instances of non-treelike gene transfer, the answer is twofold:

First, and I may be mistaken here (I’ve been out of the field for a few years) I think that any metric fulfilling the triangle inequality has a corresponding, consistent tree. So you’ll get a tree, totally. It may just not be the correct one.

Second, if you are asking for our ability to quantify divergence from that model, I’ll give a short explanation of the codon usage method mentioned before as an example:

The genetic code has 4 letters (ACGT) that translate to a different sequence consisting of 21 amino acids (a chain of amino acids is a protein, such as the enzymes catalysing all the fancy chemical reactions in your body).

Each three letters of DNA code one “letter” of the protein sequence, meaning one amino acid. Since 4^3 = 64 > 21, the code is redundant rather than optimal.

There are some three-letter sequences that contain meta instructions like start/stop, and some that don’t have any meaning. But there are also instances of two or more three-letter sequences translating to the same amino acid. They are functionally identical. You can replace them in the lab, without any change in phenotype.

For some reason, some species (or even branches in the tree of life) still show preferences for using one or the other three-letter code for a given amino acid.

If you plot which of the (functionally identical) codes are used within a bacterial genome, you can find sudden changes, where the preference switches dramatically for some length, then returns to the previous preference.

That’s indicative of a piece of DNA that jumped across the tree. It’s really not very subtle when you know to look for it.
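A rough sketch of that codon usage scan, in Python. Everything here is an illustrative assumption: it looks only at the six leucine codons of the standard genetic code, uses fixed non-overlapping windows, and flags a window when its favourite leucine codon differs from the genome-wide favourite. The function names and the toy genome are hypothetical, and real methods use proper statistics over all amino acids.

```python
from collections import Counter

# The six synonymous codons for leucine in the standard genetic code.
LEUCINE_CODONS = frozenset({"CTT", "CTC", "CTA", "CTG", "TTA", "TTG"})


def codons(dna: str) -> list[str]:
    """Split a coding sequence into consecutive three-letter codons (reading frame 0)."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def preferred_codon_by_window(dna: str, window: int = 30,
                              synonyms=LEUCINE_CODONS) -> list:
    """For each non-overlapping window of `window` codons, return the most used synonym."""
    cs = codons(dna)
    preferred = []
    for start in range(0, len(cs), window):
        counts = Counter(c for c in cs[start:start + window] if c in synonyms)
        preferred.append(counts.most_common(1)[0][0] if counts else None)
    return preferred


def flag_foreign_windows(dna: str, window: int = 30,
                         synonyms=LEUCINE_CODONS) -> list[int]:
    """Indices of windows whose preferred synonym differs from the genome-wide one."""
    overall = Counter(c for c in codons(dna) if c in synonyms)
    if not overall:
        return []
    genome_favourite = overall.most_common(1)[0][0]
    per_window = preferred_codon_by_window(dna, window, synonyms)
    return [i for i, c in enumerate(per_window)
            if c is not None and c != genome_favourite]


# Hypothetical toy genome: the host prefers CTG, the middle stretch prefers TTA.
host = "ATGCTGGCT" * 10
foreign = "ATGTTAGCT" * 10
toy_genome = host + foreign + host

print(preferred_codon_by_window(toy_genome))
print(flag_foreign_windows(toy_genome))
```

In the toy genome the middle window is the only one flagged, which is the "sudden change, then back to the previous preference" pattern described above.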

Another, even more obvious, example is viral DNA: this tends to end up as mangled, non-coding (“junk”) DNA, containing (fragments of) genes that often have nothing in common with the rest of the DNA, but have long stretches nearly identical to, say, a known gene for some viral coat.

In terms of quantification I’m really at the limits of my memory (and, unfortunately, in-flight internet), but I’ll take a stab: the former mechanism can be found in 5-10% of bacterial genomes, and usually amounts to less than 8% of the genome, with maybe one or two exceptional cases with 20% or so (some bacterial species may have developed a tendency to exploit this sort of buffer overflow to cheat at evolution).

Phylogenists (the biologists working in trees, but not those “trees”) do consider all this (and much more). Where they encounter “known unknowns”, you will often see trees with nodes branching into more than two branches. That’s essentially what cartographers would label “here be dragons”. Or, prosaically: it isn’t quite clear which split happened first in evolutionary history.

Thanks, more specifically what I'm asking is say we sequenced computer code. If we assume a tree we can make any collection of projects look like a tree. However, if we don't assume a tree they'll form some other sort of graph.

Have we done the same sort of experiment with genetic code to verify whether it actually does form a tree, or if that's just an assumption we are forcing on the data?
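One concrete way to ask "is this distance data tree-like?" without assuming a tree is the four-point condition: a metric fits some tree exactly if and only if, for every four items, the two largest of the three pairwise sums d(a,b)+d(c,e), d(a,c)+d(b,e), d(a,e)+d(b,c) are equal. This is stricter than the triangle inequality. The Python below is a minimal sketch with made-up distances; `make_metric` and `treelikeness` are hypothetical helper names.

```python
from itertools import combinations


def make_metric(pairs: dict):
    """Turn {("A", "B"): 2, ...} into a symmetric distance function."""
    table = {frozenset(k): v for k, v in pairs.items()}
    return lambda x, y: 0 if x == y else table[frozenset((x, y))]


def four_point_violation(d, a, b, c, e):
    """Gap between the two largest pairwise sums; zero for an exact tree metric."""
    sums = sorted([d(a, b) + d(c, e), d(a, c) + d(b, e), d(a, e) + d(b, c)])
    return sums[2] - sums[1]


def treelikeness(names, d):
    """Largest four-point violation over all quartets (0 means exactly tree-like)."""
    return max((four_point_violation(d, *q) for q in combinations(names, 4)),
               default=0.0)


# Distances read off a real tree: A and B are siblings, C and D are siblings,
# every leaf edge has length 1 and the internal edge has length 2.
tree_d = make_metric({("A", "B"): 2, ("C", "D"): 2,
                      ("A", "C"): 4, ("A", "D"): 4,
                      ("B", "C"): 4, ("B", "D"): 4})

# Distances around a 4-cycle A-B-C-D-A: a valid metric that satisfies the
# triangle inequality but cannot be drawn as a tree.
cycle_d = make_metric({("A", "B"): 1, ("B", "C"): 1,
                       ("C", "D"): 1, ("A", "D"): 1,
                       ("A", "C"): 2, ("B", "D"): 2})

print(treelikeness("ABCD", tree_d))
print(treelikeness("ABCD", cycle_d))
```

On noisy real data the violation is never exactly zero, so the interesting question becomes how large it is relative to the distances themselves.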

Well, chunks of it are tree-like and chunks of it have less well-defined structure. The tree still works very well. We still teach the Bohr model of the atom, too.

I never understood why we were taught the Bohr model. If it was a history of science class then maybe it would make sense, but IMHO it just lets students develop incorrect intuitions.

Science is not an accumulation of knowledge, science is a set of techniques for discovering new knowledge and testing it. So if a chemistry class just taught you the most accurate world view that teachers had at their disposal, it would be a failure. Understanding the history of science—experiments are done, new models are proposed, old models are overturned—is critical to understanding how to do science in a way that is perhaps less critical in some other fields.

Besides, every other model lets students develop incorrect intuitions. Incorrect intuitions are still useful, and by letting students develop intuitions based on more obviously incorrect models like the Bohr model, you are teaching them the same skills they will need to use when they discard Lewis dot diagrams for molecular orbitals, or any other point in the future when established models are replaced. If you can't do that, you’re not doing science, and if you can’t teach people how to discard theories, you’re not doing a good job at teaching science.

There is a tendency among the older generation (any older generation, at any given time) to complain about how their children are being taught wrong. Parents will pass onto their children a bunch of facts and skills, only to find out that schools teach something completely different. This is natural, of course. You can’t freeze a curriculum in time and still make it useful, you have to throw some parts of it out. But if you take students through the process of throwing something out, like the Bohr model, they’ll be more prepared for a changing future.

Fair enough, but I feel like I've learned more about actual atoms by reading Ars Technica as an adult than I did during high school chemistry class - I may be remembering incorrectly, but my recollection is that we learned antiquated models of the atom as a history lesson / collection of historical and no longer true facts. I'm sure that the science was mentioned, but we didn't focus on any more generalizable scientific skills.

In my own field (statistics), I think that this is quite common. We commonly teach ANOVA and linear regression separately because it's always been taught that way, despite the fact that a good appreciation of linear models is far more general, useful, and intuitive.

This assumes a tabula rasa state of mind, where children have not formed any intuitions about the parts of the world they have not formally been taught about. However, when we learn about the sciences, we always discard incorrect intuitions (although probably often only nebulously defined). IMO, there is no need to explicitly teach something incorrect only to be able to un-teach the very same thing later. At best you alienate students because they feel either confused, lied to, or patronized. At worst, you do form incorrect intuitions that are even harder to get rid of than normally because they were taught as scientific gospel earlier.

It's clear that you have a very different idea of what I am talking about than I do. The comment about "tabula rasa state of mind" makes no sense to me. First of all, I want to focus on this part of your comment:

> IMO, there is no need to explicitly teach something incorrect only to be able to un-teach the very same thing later. At best you alienate students because they feel either confused, lied to, or patronized.

This is an impossible demand for science education, because every model of the atom we could possibly teach is "incorrect". You could teach students only the most recent, most accurate theories of the world, but at this point in time that means starting with something like relativistic field theory and that approach is obviously wrong.

If there is a scientific gospel, the gospel is "test your theories", and you can accomplish that by teaching students about the Bohr model. "Here is a theory about how atoms work, the theory is useful because it opened the door to quantum mechanics, it explained phenomena like the hydrogen spectra given by Rydberg, but it was known to have certain shortcomings." This gives students a case study for the scientific method. We can put an electric current through hydrogen gas and look at the spectrum with a prism or diffraction grating. We can calculate a formula for the spacing of the spectral lines. Then, we can come up with a model for how the atom works which explains those lines. Finally, this leads to more observations which contradict the model, and the process repeats.

This is the scientific process, and if you are not teaching the scientific process, you are not teaching science.

The reason you might use the Bohr model specifically is because many of the related experiments can be replicated in a school laboratory by students of relatively modest mathematical ability... just algebra is enough.
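To show how little math that takes, here is the Rydberg formula for the visible (Balmer) lines of hydrogen, 1/λ = R_H (1/2² − 1/n²), as a few lines of Python. The constant is the approximate Rydberg constant for hydrogen; the function name is just made up for this sketch, and the results are vacuum wavelengths.

```python
# Approximate Rydberg constant for hydrogen, in 1/m.
RYDBERG_H = 1.0967758e7


def balmer_wavelength_nm(n: int) -> float:
    """Vacuum wavelength (nm) of the hydrogen line for a jump from level n down to level 2."""
    if n <= 2:
        raise ValueError("the Balmer series needs an upper level n > 2")
    inverse_wavelength = RYDBERG_H * (1 / 2**2 - 1 / n**2)  # in 1/m
    return 1e9 / inverse_wavelength


# The first four lines are the ones visible through a prism or grating.
for n in range(3, 7):
    print(n, round(balmer_wavelength_nm(n), 1))
```

The n = 3 line comes out in the red near 656 nm and the n = 4 line in the blue-green near 486 nm, which is what students can compare against their own spectrum.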

And to be clear here, we are not teaching the Bohr model as fact. It's not going to be "un-taught" or "un-learned". Other theories you would often teach in a chemistry class include phlogiston theory, again, because the experiments disproving it are easy enough to run in a typical high school chemistry class, not because phlogiston theory is particularly useful.

Because it captures important truths about the physics and gets the right answer; all using just high-school mathematics and very little quantum mechanics.

The electron really is a wave-like thing confined by a Coulomb potential and thus has a discrete set of solutions. Moreover the discrete indexes of that solution map onto mechanical properties like angular momentum and energy. The Bohr model captures all of that, and even gets the right equations for the energy (at least in simple cases).

Of course the real wave equation that needs solving is much more complicated than the 1-d cartoon in the Bohr model, but that cartoon works because it captures many essential properties of the real system.

I apparently did not appreciate this when I was in high school.

I was never "taught" the Bohr model... there was a passing mention of it and a statement that it is obsolete, followed by an "electron cloud" model with an explanation that quantum theory is needed to explain it. (I'm remembering my high school classes from early 2000's.)

But all my peers remember the Bohr model regardless. Because it is simpler, it is easier to remember. People will develop the wrong intuition regardless of what you do. If you start with the complicated ideas, they just give up on learning instead, substituting superstition.

In any case, you can't really do anything with a model of the atom unless you're using equations, and you only get the quantum versions because those are the only ones that work. Everyone else doesn't have any need for a true model of the atom, and thus no incentive to learn it. (Most of the time, you'd rather need a model of molecules.)

It still more or less works as a model for certain systems, just like Newtonian gravity works just fine at certain scales.

For the same reason we teach and use Newtonian mechanics, even though it's "wrong".

Imagine if there is something really ancient out there but it's dying because of global warming or something. That possibility makes me sad.

Over the last 1+ billion years the Earth has gone through many shocking events. Not just large changes in global temperature through ice-age and warming periods, but the atmosphere composition, asteroids, super volcanoes, and so on. If the ancient life form is still alive today after all of that, I doubt global warming would cause it to die.

Survivorship bias.

Lots and lots and lots of organisms died out during each of those changes. That things have survived so long is a matter of probability not inevitability.

My point was that a 2 degree change in the atmosphere now isn't so extreme for a 1+ billion year old organism as all the other history it has lived through.

I'm not trying to brush off the impact of global warming, I'm just placing it in the context of 1+ billion years of Earth history.

That's the point?

Anything that has survived for a billion years in a form that could still be considered roughly the same organism or a close descendant must have a relatively stable biochemistry and gene pool that is capable of surviving an extreme range of environmental changes - as close as you can get to a holy grail in natural selection.

Obviously its survival would not be inevitable; not even stars can survive their own age.

You have the math backwards. If there are events that kill 99% of everything and a million such events occur, then the survivors must be hardy in the face of shock. Global warming won't kill them; they've already been through incidents of far worse warming and cooling.

There may be something in the soil of colder climates, desperately clinging on to existence, needing just a few degrees more heat to thrive. Either way, I find these articles about new life discoveries fascinating.

There are some bacteria in the Russian permafrost that have been hanging on for a few hundred thousand years at this point (they aren't dividing, just repairing themselves while waiting for a better environment; they're the same cells that got frozen all that long time ago).

Possibly some simpler and more robust organism may lurk under arctic ice, waiting for a better time.

School should do this with other areas of science, I remember being taught about atoms and how 'absolutely true' it was that they were the smallest possible things, despite the fact it was known at the time that was not the case in other areas of research.

It would have piqued my interest much more at a young age if we were taught 'this is what we have found out... so far!'. There's still so much left to discover!

This is what excites me about science - the discovery of the previously unknown and the revision of what we previously thought we knew.

I wish our education system promoted this idea - "what we know about science will change in your life time. It's likely that in the future, most of what you're learning today will be outdated. It doesn't mean what we're teaching you is 'wrong,' but rather, it's the best of what we know now. You're about to enter on a glorious adventure of discovery that will not end at graduation. Science is about hypothesis, iteration, and testing...and the discovery of unanticipated results."

Instead, what's taught is a static version of reality: What we know today is all that is worth knowing for the rest of your life.

In that regard I always admired Eliezer Yudkowsky's idea of working to become "less wrong" as you progress. It kind of sums that up perfectly.

We haven't got it all figured out yet, but we're less wrong than we used to be.

Or before Yudkowsky, Asimov's idea of "the relativity of wrong":


"Naturally, the theories we now have might be considered wrong in the simplistic sense of my English Lit correspondent, but in a much truer and subtler sense, they need only be considered incomplete."

That link omits some parts of the essay. Complete version here: http://hermiene.net/essays-trans/relativity_of_wrong.html

> Science is about hypothesis, iteration, and testing

Maybe in the past. Today science is about getting grants, publishing as much as possible even if the paper is trivial or you p-hacked your way to heaven.

It's also about finding a niche where there is no competition so that you can build a name for yourself. If any sort of competition appears, quickly run away to a more fertile ground (see the recent article about DeepMind and protein folding)

You're talking about science, the industry. I believe the person to whom you responded is talking about science, the school subject.

Actually it's materially important. To think the politics of science doesn't impact what results come out of the scientific industrial complex is a view sharply divorced from reality.

Politics is a deep vein that runs through absolutely everything and it has a big impact and oftentimes it's not obvious what the impact is.

Science is as much about the politics of science as it is about an empirical quest to find the truth.

Look at the history of Bayes' theorem as a relatively good example.

Even in science ideas don't win purely on merit. They require marketing and political support too.

Maybe the comment had its wires crossed a little but still it's a valuable thing for people to keep in mind.

Agreed. And politics should always be minimized so we get less of this:

>Even in science ideas don't win purely on merit. They require marketing and political support too.

Ideally the world would be a meritocratic place, while still offering both sympathy and dignity to the weakest members. Does anyone disagree with that? But still, politics and cheating always weasel their way into power structures of society. And it should be seen as morally reprehensible within science.

For people interested in this kind of work, I recommend The Tangled Tree, the biography of Carl Woese, who discovered Archaea: https://www.nytimes.com/2018/08/13/books/review/david-quamme...

Evolution is interesting up and down from this level of the tree of life, too. I also loved reading "The Beak of the Finch" to learn about evolution in macroscopic metazoans.

For those interested in going deep on this, I highly recommend this essay I was fond of in grad school by W. Ford Doolittle, who writes a lot on this subject: "Pattern pluralism and the Tree of Life hypothesis", https://www.pnas.org/content/104/7/2043

For a while now amongst evolutionary biologists it has been clear that the "root" of the evolutionary tree, or the so-called "LUCA" (last universal common ancestor) may not exist, mostly because horizontal gene transfer is so common amongst unicellular lifeforms that it ruins the orderly notions of descent-with-modification that we have formed following Darwin's studies of higher life.

I imagine this is disheartening to most people, and the orderly notion of a "tree of life" with a simple root persists because we want an uncomplicated picture of the world; human needs for ontological clarity trump the disgusting ball of muck and sputum that is actual life.

It seems completely obvious to me that we continue to find new types of life, given that "more than eighty percent of our ocean is unmapped, unobserved, and unexplored". I could not find a number that represents unexplored mud and muck, where this specimen was found. My guess is something very close to 100.

How big is that critter?

Answering my own question: other articles suggest it is about 15 μm long.

I wonder if we will discover that this tree is "hairy" - that there are thousands of isolated branches that departed from the main branch hundreds of millions of years ago. They wouldn't be "kingdoms", since they'd only contain a few (or even one) species.

It may be genetically distinct from animalia, but it’s definitely acting like an animal. It moves independently, senses external stimuli and hunts prey.

Actually, it’s probably the closest thing to an actual alien life form that we’ve yet encountered. This is what that kind of convergent evolution would look like.

As it turns out, not quite a guy with pointed ears wishing us long life and prosperity.

Humor aside, it is a very interesting idea to find convergent evolution far away on the tree of life to conjecture about exobiology.

I, for one, welcome our new hemimastigote overlords.
