

The increase of genetic complexity follows Moore’s law (2013) [pdf] - dgudkov
http://arxiv.org/ftp/arxiv/papers/1304/1304.3381.pdf

======
dekhn
OK, this paper is bunk for many reasons.

First: Moore's law is just exponential growth (with a doubling time t). Drop
the Moore's Law terminology from the paper.

Second: the C-value paradox isn't one. It's based on false assumptions.
There are frogs and trees with genomes 100X the size of those of their
nearest species neighbors (likely due to some completely boring mistakes),
and the extra data is just copies upon copies of the same DNA (including
duplications of genes, duplications of regulatory regions, and duplications
of hanger-on DNA). Calling it an enigma (in the cited paper:
[http://www.ncbi.nlm.nih.gov/pubmed/19216716](http://www.ncbi.nlm.nih.gov/pubmed/19216716))
is a bit better, because if we understood how genomes can tolerate this
sort of expansion, that would be great.

In short: the authors recognize some of the important problems, but go off in
the wrong direction with it.

~~~
rational-future
Your two arguments are extremely weak and bunk ;-) You are only objecting
to names and have nothing to say about the content.

I'm not an expert in biology and have no idea how good this paper is. I
was getting a PhD in math from West Virginia University in the early 90s.
Sharov was visiting at the time and had built a reputation as a pure
genius.

~~~
dekhn
It's not worth the time to write a full rebuttal to the paper. It doesn't
even pass basic tests.

------
sytelus
This is a very interesting and readable paper with astonishing
consequences. Here's a summary:

- Genetic complexity roughly increases 10X every billion years.

- This implies life started about 9 billion years ago (see the
back-of-the-envelope sketch after this list).

- Adjusting for potential hyperexponential effects, the origin of life
could be close to the start of the universe itself.

- It took 5 billion years to get to the complexity of bacteria.

- Life had started long before Earth came into existence, but it was very
primitive (i.e., panspermia).

- If you follow this theory, there were no intelligent species in the
universe that existed before us. This means Earth wasn't seeded by other
intelligent aliens.
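A minimal back-of-the-envelope version of that extrapolation (Python; the
present-day figure is an illustrative round number, not the authors'
actual regression input):

    import math

    orders_per_gyr = 1.0   # complexity grows ~10X per billion years
    present_bp = 1e9       # assumed functional, non-redundant bp today
    origin_gya = math.log10(present_bp) / orders_per_gyr
    print(origin_gya)      # 9.0 -> an origin ~9 billion years ago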

There are some sweeping propositions, e.g. that Kurzweil is wrong about
the whole singularity concept. There is also a mention that, under this
theory, the Drake equation is likely the wrong way to calculate the number
of civilizations in the galaxy.

In a way, I think this paper actually points to many more possible places
with intelligent life, because it proposes that primitive life had been
floating through the universe long before planets like Earth got seeded
with it. Unless we assume Earth had the best possible conditions in the
entire galaxy, this means intelligent life has to exist somewhere else in
a large enough sample of planets. The only catch is that this life isn't
going to be far superior to us (or, more accurately, won't exceed us in
genetic complexity).

------
tomaskazemekas
One of the most interesting points of the article is that humans are
likely the most intelligent species in the universe, and that even if
there is another life form out there, it is of similar intelligence.

~~~
givan
Making such assumptions about life in the universe based on our
knowledge, or more specifically our lack of it, is like saying the Earth
is flat because that is how we see it from here.

~~~
zxcdw
Why so? Of course it is fair to make an assumption -- it would be a
different story if a claim were made based on that assumption, as in your
comparison. In other words, you made up a silly case and attacked it to
argue against the original point -- you constructed a straw man argument.

~~~
givan
It is silly to make bold statements about something we know nothing
about; I was trying to make a comparison. From here we don't even know
whether life exists anywhere else in the universe; we are still at the
flat-earth stage on this one.

There is no point there, and I see no logic in that statement. As I said,
we know nothing about the subject; this is pure speculation.

------
hyp0
I really like this - what would the RNA world be like? What would even
simpler life be like? (It probably can't exist here now, because other
life would eat it.)

Some interesting comments from last year (mostly critical):
[https://news.ycombinator.com/item?id=5580334](https://news.ycombinator.com/item?id=5580334),
[https://news.ycombinator.com/item?id=5552381](https://news.ycombinator.com/item?id=5552381)

------
gwern
This was heavily criticized back when it came out, and the criticisms struck
me as valid and the paper not useful.

------
eli_gottlieb
Surprising and intriguing and therefore requiring extraordinary evidence. It
doesn't quite help that Sharov is citing himself all over the place, but then
again, most lines of research do that.

>Thus we stick to the suggestion to measure genetic complexity by the
length of functional and non-redundant DNA sequence rather than by total
DNA length (Adami et al., 2000; Sharov, 2006).

Please God tell me they've at least _tried_ compressing the DNA to account for
actual fundamental string complexity rather than just length.

>There is no consensus among biologists on the question how variable are
the rates of evolution.

Well, yes; I don't see any sign that their model isn't blindly averaging
over punctuated equilibria.

Overall, a nice line of research to pursue, though. Deducing how much time
evolution actually takes to optimize from lifeless particulate soup to the
first life-forms to animals to people would certainly be an achievement.

~~~
ssdfsdf
Why would you want to compress the string? For this to be useful you
would need to find the algorithmic complexity of the string, not merely
how compressible it happens to be with, say, Huffman coding. Finding the
algorithmic complexity of the string is intractable. I would reason in
the following way:

There is selective pressure to reduce the length of the DNA within an
organism, since maintaining DNA is costly for the organism. So one might
suppose that, on average over time, an organism will contain only the
length of DNA which is necessary, give or take.

Of course it might be that the encoding of the information changes over
time, so that the organism is able to store more information in the same
length. Or possibly less information in the same length, for reasons of
benefit to the organism.

This needs careful reasoning; I am not convinced that either my approach
or yours is enough.

~~~
rgejman
The idea that DNA length is costly to the organism is a venerable but
increasingly controversial theory. The existence of disproportionately
huge genomes belies the hypothesis. While it's possible that some
organisms derive benefit from having larger genomes, it's also reasonable
to infer that most eukaryotic cells are not limited by energy. So much of
eukaryotic cellular activity seems energetically "wasteful," yet cells
just don't seem to care.

~~~
ssdfsdf
Hmm, yes, this may be true, even on average.

I would still caution against using any old encoding technique on a string
representation of the genome and using the compressed length as any sort of
meaningful measure of the inherent information contained within it.

~~~
eli_gottlieb
Yes, but _some_ compression algorithms are still nice ways to approximate, for
ordering or measurement purposes, the algorithmic complexity of a string.

~~~
ssdfsdf
I'm not sure that is true. Take, for instance, the first 100 primes
printed one after another in a string. The string is long and apparently
random, yet contains little algorithmic complexity, since the machine
which prints out the numbers is fairly simple. A standard compression
algorithm will not be able to compress the string very effectively.
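A minimal sketch of the point (Python, with the standard-library zlib
standing in for "a standard compression algorithm"):

    import zlib

    def primes(n):
        # Return the first n primes by trial division.
        found = []
        candidate = 2
        while len(found) < n:
            if all(candidate % p for p in found):
                found.append(candidate)
            candidate += 1
        return found

    # 271 digits generated by the tiny program above...
    s = "".join(str(p) for p in primes(100)).encode()
    # ...yet a general-purpose compressor saves comparatively little.
    print(len(s), len(zlib.compress(s, 9)))

The compressed size stays on the same order as the raw length, while the
true algorithmic complexity is bounded by the few lines of the generator.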

Therefore I am not sure that compressing the string is likely to give you
a sense of the information contained within it, at least information in
the sense in which we are interested.

~~~
gwern
Compressing with something like gzip, xz, or zpaq gives you an upper
bound on the complexity. This upper bound is pretty loose, but it works
surprisingly well in a wide range of domains (usually not beating
special-purpose algorithms, of course, but that something like gzip can
be used to estimate, e.g., phylogenetic trees at all is surprising).

See for example
[http://www.illc.uva.nl/Research/Publications/Dissertations/D...](http://www.illc.uva.nl/Research/Publications/Dissertations/DS-2007-01.text.pdf)
'Statistical Inference Through Data Compression'.
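For concreteness, a minimal sketch of the normalized compression distance
from that line of work (Python; zlib stands in for the ideal compressor,
and the sequences are made-up toy data):

    import zlib

    def c(data):
        # Compressed length: a crude upper bound on complexity.
        return len(zlib.compress(data, 9))

    def ncd(x, y):
        # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
        cx, cy = c(x), c(y)
        return (c(x + y) - min(cx, cy)) / max(cx, cy)

    a = b"ACGT" * 200
    mutated = b"ACGT" * 190 + b"TTGA" * 10   # slightly altered copy of a
    unrelated = bytes(range(256)) * 4
    print(ncd(a, mutated), ncd(a, unrelated))  # related pair scores lower

Clustering a matrix of pairwise NCD values is roughly how those
compression-based phylogenetic trees get built.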

~~~
ssdfsdf
I am aware of the interesting duality between compression and learning. I have
spent quite some time thinking about it.

However, I am still not convinced that this upper bound will provide the
information that we need in this case. What we are looking for is a
relative measure of the complexity of each genome. The upper bound will
not necessarily give us this relative measure, because the compressor may
not be able to compress the genetic code of different organisms by the
same factor. The compressibility of a particular genome by a specific
algorithm will depend on the method of encoding of information used by
the organism. For instance, the organism may repeat codes for redundancy
but permute the letters of the copy in a predictable way, for its own
reasons. The compression algorithm used will not pick up on this.
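A toy demonstration of that failure mode (Python; a simple reversal
stands in for the hypothetical predictable permutation):

    import random, zlib

    random.seed(0)
    s = bytes(random.choice(b"ACGT") for _ in range(10000))
    literal = s + s           # redundant copy stored verbatim
    permuted = s + s[::-1]    # same redundancy, copy stored reversed

    print(len(zlib.compress(literal, 9)))   # second copy is nearly free
    print(len(zlib.compress(permuted, 9)))  # roughly twice as large

Both strings carry the same information to a decoder that knows the
permutation, yet the general-purpose compressor only exploits the literal
repeat.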

It is useful to use compressibility by a range of algorithms as the input
to a machine learning algorithm, or as part of the model in AIXI, but it
is not useful for estimating algorithmic complexity (as far as I am
concerned; I am open-minded, but not convinced yet).

~~~
gwern
> What we are looking for is a relative measure of the complexity of each
> genome. The upper bound will not necessarily give us this relative
> measure, because the compressor may not be able to compress the genetic
> code of different organisms by the same factor. The compressibility of a
> particular genome by a specific algorithm will depend on the method of
> encoding of information used by the organism. For instance, the organism
> may repeat codes for redundancy but permute the letters of the copy in a
> predictable way, for its own reasons. The compression algorithm used
> will not pick up on this.

It may or may not. But an upper bound is still an upper bound, and turns out
to be usable for many purposes.

------
yohanatan
Ray Kurzweil has been predicting this result for years.

