
> the entire source code for the human brain has to be strictly less than 700 MB, because the fully sequenced human genome which obviously encodes the full human mind is less than 700 MB uncompressed

Actually those 700MB are compressed, in a way so sophisticated that we don't really know how to uncompress it yet - or whether it's even possible without the external resources our planet provides. And keep in mind that those 700MB only describe how to prepare the basic concept of the brain, whose memory is then packed with information we get from the culture.

What I mean by uncompressed is that you're reading each base, cytosine (C), guanine (G), adenine (A), or thymine (T), as two bits (1 of 4 possibilities). There are 3 billion base pairs, which is 6 billion bits; divide by 8 to get bytes and you get 750 MB.
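That arithmetic can be sanity-checked in a few lines (the 3-billion figure is the usual rough count for the haploid human genome):

```python
# Back-of-the-envelope size of the human genome at 2 bits per base.
base_pairs = 3_000_000_000          # ~3 billion base pairs (rough figure)
bits = base_pairs * 2               # A/C/G/T = 1 of 4 = 2 bits per base
megabytes = bits / 8 / 1_000_000    # bits -> bytes -> megabytes

print(megabytes)  # 750.0
```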

The human genome is somewhat redundant and can be compressed further. That is, the string of "ones and zeros" (ACGT) could be run through whatever compression algorithm you wanted.
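As a quick illustration, even a generic compressor gets well below one byte per base on ASCII-encoded sequence text; this sketch uses a random ACGT string as a stand-in for real genome data:

```python
import gzip
import random

# A random ACGT string as a stand-in for genome text. Random 4-symbol
# text is the worst case (2 bits/char of entropy); real DNA is more
# redundant and compresses even better.
random.seed(0)
sequence = "".join(random.choice("ACGT") for _ in range(100_000)).encode()

compressed = gzip.compress(sequence)
print(len(sequence), len(compressed))  # compressed is well under half the size
```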

But don't take my word for it:

>"When the 4 bases are packed into one byte ( .2bit format) the size is 770M (hg18.2bit) , but you'll need an extra tool to decypher the data." [1]
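The packing that quote refers to is straightforward; here is a minimal sketch (the real .2bit format also has a header and masks for runs of N bases, which this toy version ignores):

```python
# Toy version of 2-bit packing: four bases per byte.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack(bases: str) -> bytes:
    """Pack a base string (length a multiple of 4, no N's) into bytes."""
    out = bytearray()
    for i in range(0, len(bases), 4):
        byte = 0
        for b in bases[i:i + 4]:
            byte = (byte << 2) | CODE[b]
        out.append(byte)
    return bytes(out)

print(len(pack("ACGT" * 1000)))  # 1000 bytes for 4000 bases
```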


You raise an important point:

>And keep in mind that those 700MB only describe how to prepare the basic concept of the brain, whose memory is then packed with information we get from the culture.

Yes, absolutely. I simply called it an upper bound on how complex a brain's architecture could be. DNA obviously encodes the brain's architecture, since humans all have human brains. Beyond that, there is a very large variation in people's mental capabilities and brains, and the largest variation of all comes from culture.

But culture could be given to a virtualized brain (called training).

Bear in mind that when human brains receive culture, it takes them years of all-day training before they're even able to speak. So full 1x human brains take a long time to train.

When you see results out of neural nets that are similar to what very young toddlers can do, you should be awed. We have the computational power in server farms to do what full brains do -- if not now, then soon.

This isn't some sci-fi pipe dream. Go ahead and look at the facts.

[1] https://www.biostars.org/p/5514/

What your link is describing is the size of the genome encoded into bits. In other words, that's the size of the file that stores the raw base-pair encoding on disk as digital data. That is definitively not a measure of the amount of information encoded in the DNA.

Nor can you guess at that encoding by modeling DNA base pairs as 2 bits. DNA base pairs aren't bits. They don't define op codes or memory registers. They are read and executed in a complex way that we don't fully understand. They interact with each other in complex ways that we don't fully understand.

There's way more information encoded in those base pairs than a simple 2 bits. You can't simply model them that way and declare that there are X bytes in a genome. DNA isn't digital.

you are right - see my comments under here (1 down):


however, that just concerns the DNA argument. We know roughly how many neurons there are in humans and their connectedness.

I think that your "700MB uncompressed" fails to take into account that the construction, development, and maturation of the brain relies heavily on cellular and molecular mechanisms. I think it is a little disingenuous to hide the enormous wealth of information necessary to create a brain, much less understand and utilize one, inside of your compiler.

>hide... inside of your compiler.

Not to mention the rest of the ecosystem. Look what happens to humans when they grow up without being properly embedded in the family, with its hundreds of thousands of years of historical contingency: https://en.wikipedia.org/wiki/Genie_(feral_child)

We can't even begin to describe the amount of information encoded there.

Uh, you can begin to describe the amount of information. If someone grew up with sensory input limited to HD video, meaning something like 6-10 GB/hour, they would be handicapped versus other humans, but not drastically so. In 6 years there are 52,560 hours. A large library, sure, but we already have all this digitized anyway.
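Spelling out that estimate (the 6-10 GB/hour figure is the HD-video rate assumed above):

```python
# Rough volume of a child's "training data" capped at HD-video rates.
hours = 6 * 365 * 24        # six years of round-the-clock input
gb_low = 6 * hours          # at 6 GB/hour
gb_high = 10 * hours        # at 10 GB/hour

print(hours)                          # 52560
print(gb_low / 1000, gb_high / 1000)  # roughly 315 to 526 TB
```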

I'm cutting a lot of corners, and the human substrate in which our brains are embedded is finicky - you can't just leave a toddler in a roomful of DVDs with food and get a fully functioning adult after 48 months of unsupervised training.

But it's not "can't even begin to describe the amount of information encoded there" either. 50,000 Blu-ray discs' worth oughta do it.

I'm not saying we've figured out any of the other stuff - it's just that computationally (horsepower) we're there; the later training set for unsupervised learning is also there; etc.

The missing parts might well seem insurmountable - but every result that shows AI performing at the level of a 2 or 3 year old is wonderful. This is it. This is beginning of the turning point. It can happen at any moment.

Someone at this very moment could be setting up a neural net that after 48 months of training can deduce its own status in the world, make novel and correct sentences, maintain a coherent world-view, and be trained on the entirety of the Internet at 10x the speed of adult brains. (The limit is 1,000,000x speed of adult brains - because the silicon substrate we're using today propagates signals literally a million times faster - today.)
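For what it's worth, the "million times" figure checks out as an order of magnitude if you compare rough published signal speeds (both numbers below are ballpark assumptions, not from the thread):

```python
# Ballpark comparison of signal propagation speeds.
electrical = 2e8   # m/s, roughly 2/3 the speed of light in a conductor
axon = 120         # m/s, fast myelinated nerve fibers

print(electrical / axon)  # on the order of a million
```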

We're there. It's all there. It's "just" 86,000,000,000 neurons (with 7k connections each) and 12 years of supervised training to get to the level of a 12-year-old. 3 pounds. 20 watts. This is happening. Amazon's and Google's server farms blow it out of the water computationally compared to a brain, today. We might not come up with the same architecture, but what we are coming up with is making breathtaking progress.
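Multiplying those two numbers out gives the scale being claimed:

```python
# Scale of the network under discussion.
neurons = 86_000_000_000        # ~86 billion neurons
synapses = neurons * 7_000      # ~7k connections each

print(f"{synapses:.1e}")  # 6.0e+14 synapses
```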


EDIT: I've been submitting too much, but I agree with JonnieCache's thoughts below. However, as a thought exercise there is no reason we couldn't train AI interactively, putting it in a VR room and literally talking to it and correcting it, etc, like a pet. Granted this is not a normal approach to take but since we're discussing the theoretical limitations you can certainly envision it. Obviously nobody is trying to do that - we're not trying to come up with sentience using an approach like this and have no idea what steps humans go through exactly to get there. But it's not computational power that keeps us from getting there - and we could be surprised at any time.

>50,000 Blu-ray discs' worth oughta do it.

By this logic, the handful of megabytes of Unicode making up War and Peace in the original Russian should be enough for a non-Russian speaker to fully grasp it and all its meaning and implications. It isn't even enough for a native Russian speaker to do so.

Humans aren't raised by simply looking at their surroundings, they're raised by interacting with people, who were in turn raised by interacting with people, going back for the whole history of humanity, or arguably mammals. That information isn't all in the genome, although natural selection has put some of it in there. The bit that we don't know how to describe information-theoretically is the bit that isn't in the genome, because we don't know how it's encoded.

I don't see how you'd even try to put bounds on it: this is essentially the problem posed by post-modernism/post-structuralism/literary theory, once you strip away the Marxism. Science's response has understandably been to reject it, but it can't do so forever if it wants to create AGI.

Or maybe I've misunderstood your point.

I might be persuaded that a 2 year old could come sooner than we think, via brute computational force as you describe, but I'd argue that a 2 year old with the capacity to become anything more than a 2 year old is much farther away than we think.

EDIT: if, as you claim, we are close to having the computational power to simulate human children, then why aren't we already successfully simulating much simpler animals? IIRC the best we can do is a tiny chunk of a rat, or the whole of various kinds of microscopic worms, and those are just computational models, not Turing-test-passing replicants.

One good analogy I've seen for this (I think expressed by @cstross) is that the DNA isn't the source code. It's the config file for a much, much larger process that only exists as a running blob of object code (i.e., all those cellular and molecular mechanisms, plus the environment, plus culture) and is never serialized anywhere.

I don't think you are correct from an information computation point of view. When looking at the computation done by neurons in the brain it is sufficient to abstract away the lower-level substrate in which it occurs.

You and other posters are all correct regarding the huge volume of information on which human minds are trained. It's hardly unsupervised learning either :)


EDIT: In response to your comment, I've given it further reflection. DNA as source code may be misleading as an "upper bound". After all, suppose we knew for a fact (assume there were a mathematical proof, or just assume it axiomatically) that a one hundred megabyte source code file completely described a deterministic Universe (contained every physical law, etc.), and that if you ran it on enough computation to fully simulate a hundred billion galaxies with a hundred billion stars each, one of those stars would have a planet, that planet would contain humans, and the humans at some point in the simulation would deduce the same one hundred megabyte source code file for their Universe. (This is a bit of a stretch, as it's not possible to deduce the laws of the universe rigorously.)

Anyway, under that assumption, you could argue that the "upper bound" on the amount of entropy it takes to produce human intelligence is "just" a hundred megabytes, since that source code can deterministically model the entire Universe. But practically that is useless, and the humans in that simulation would have to do something quite different from modeling their universe if they wanted to come up with AI to help them with whatever computational tasks they had.

In the same way, perhaps DNA is a red herring, as there are a vast number of cells in the human body (as in, tens of trillions) doing an incredible amount of work. So starting out with DNA is the "wrong level" of emulation, just as starting out with a 100 MB source code file for the universe would be the "wrong level" of emulation, even if we posit axiomatically that it fully describes our entire Universe from the big bang through intelligent humans.

So I will concede that it is misleading.

All that said, I think that emulating or considering the computation on the level of neurons is sufficient - so it is sufficient to look at how many neurons are in the human brain and the kind of connections they have.

As for the efficacy of this approach - that's the very thing that is being shown in the story we're replying to and many places elsewhere. It works. We're getting extremely strong results, that in some cases beat humans.

I believe that emulating or comparing to humans at the neural level should be sufficient for extremely strong results. We do not need to emulate every atom or anything like that. I consider it out of the question that we would discover that human minds form a biochemical pathway into some ethereal soul-plane, connecting with our souls in a way that can't be emulated by emulating neurons and the like, with the souls being where intellect happens and brains just acting as "radio antennas" for them. Instead, I think the approaches we're seeing will achieve results computationally similar in many ways to what human brains produce - a much higher level of abstraction is sufficient for the results that are sought.

I will confess to not being an expert, but I disagree: I don't think it's sufficient to abstract away the lower-level substrate when the OP was referring to DNA as source code, which absolutely depends on that level of detail to both construct the system (the brain) and to enable the continued development and maturation of that system (a physical, real brain).

I was not referring to the huge volume of information necessary, as I acknowledge that as being "outside the system" for purposes of this discussion, so my apologies for any confusion I might have caused.

It may be possible that (and it is my belief that) there is a higher-level abstraction for the computations taking place in the brain, even if it is on the neuron-level, but at that point I don't think you can claim that the source code for that is going to fit under 700MB by using DNA as a baseline.

you are right, I went too far. see my other comment:


however, that just concerns the DNA argument. We know roughly how many neurons there are in humans and their connectedness.

> (This is a bit of a stretch as it's not possible to deduce the laws of the universe rigorously.)

Not sure I agree. This is known as the problem of induction, and Solomonoff tackles it quite well I think.

Certainly it is not possible to deduce every law governing the Universe! For example, posit there is source code for the Universe; then assume a second Universe that is identical except its source code differs in one respect: after 35 billion years, suddenly something happens. Since there is no way to investigate the laws directly, nobody inside could possibly "deduce" the complete set of laws - there is no way to know that the source code contains a check that gets tripped after 35 billion years. Only one of the two versions is the one running the universe, but for the first 35 billion years the two universes are bit-for-bit identical, so it is not possible to determine which; therefore the inhabitants of the Universe cannot deduce the laws that govern them.

So if I wanted to assume there is a one hundred megabyte source code that somehow was actually exactly the source code governing the entire Universe, I'd have to assume that axiomatically. We could never "deduce" it and know for certain that it is the only possible source code and our Universe must be following exactly it and nothing else.

At least, this is my thinking...

You absolutely can deduce those laws, which is what Solomonoff induction does [1]. It's been formally proven to converge on reproducing the correct function in the limit.

> Since only one of the two versions is the one running the universe, but it is not possible to determine which one, therefore the inhabitants of the Universe cannot deduce the laws that govern them: for the first 35 bn years the two universes are bit for bit identical.

Correct, until they're not, at which point you can distinguish them. Until then, you prefer the laws whose axiomatic complexity is lower. It's basically a formalization of Occam's razor.
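A toy sketch of that preference (the "laws" and observations here are invented for illustration; real Solomonoff induction weights every computable hypothesis by 2^-length, which this drastically simplifies by just picking the shortest consistent description):

```python
# Occam-style hypothesis selection: among laws consistent with all
# observations so far, prefer the shortest description.
hypotheses = {
    "state(t) = H always":
        lambda t: "H",
    "state(t) = H, but N after year 35e9":
        lambda t: "H" if t < 35e9 else "N",
}

observations = [(1e9, "H"), (10e9, "H")]  # everything seen so far

consistent = [name for name, law in hypotheses.items()
              if all(law(t) == s for t, s in observations)]

best = min(consistent, key=len)  # both fit; the razor picks the shorter
print(best)
```

Until the year-35-billion check trips, both laws explain every observation, and only the description length breaks the tie.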

[1] https://en.wikipedia.org/wiki/Solomonoff's_theory_of_inducti...

I was using deduce as 'arrive at the actual set of laws (the laws 'deduced' perfectly matches reality indefinitely going forward) and know correctly that they must be the actual set currently running the universe.'

In your phrasing, it seems that you would assume the following sentence does not contain a contradiction.

> Alice in the year 23,435,342,435.03453725 correctly deduced all laws governing her Universe, and through a clever mathematical trick was able to correctly deduce the state of the universe in the year 23,435,342,450.03453725 (15 years later, i.e. '50 instead of '35) with absolute precision, without having to model the entire Universe. Having deduced the correct set of laws, she knew for certain that the state at '50 would include x; let's say atom #349388747374123454984598323423 in that Universe must be a hydrogen atom. She knew this with certainty, because she had deduced the laws governing her Universe and knew that they could not be any other set of laws. The laws were also deterministic. She was also correct in her calculations. Therefore, since she made no mistake in her calculations and had deduced the correct set of laws which was actually running the Universe, she was correct that atom #349388747374123454984598323423 would be a hydrogen atom. One thing to note, however, is that the universe happened to have an "if year == 23,435,342,449 then all atoms turn into nitrogen atoms wherever they may be" clause which someone had added to the source code as part of an experiment. Therefore, although she had correctly deduced that atom #349388747374123454984598323423 in the year '50 would be a hydrogen atom, in fact when that year rolled around it was not one. That doesn't stop her deduction from being correct and proper, or mean that she had not deduced the laws with certitude or been correct. In effect, it is possible for me to say with complete and justified certitude that "next year x" as long as I have completely rigorously deduced it, and I will be correct in my rigorous deduction and in thinking that I have a 100% chance of being right, even if next year not x.

you see my struggle?

For me, if you "deduce" that something "must" be exactly some way, you can't possibly have been right in your deduction if it turns out that you are wrong.

Nobody can deduce that Pi cannot equal 4, if, in fact, there is any chance that it is indeed 4 (under that axiomatic system). That's not a deduction: it's a logical fallacy.

So you are using an extremely different definition of "deduction" than I am. I can deduce that within standard axiomatic mathematics, Pi is not equal to 4. After making that correct deduction, there is no conceivable Universe in which I am wrong. (Seriously - it's not possible that in another universe someone exploring ZFC+ would find that Pi is four there.)

Held to such rigor, there is nothing that I can deduce with certainty about the laws that govern our Universe - so that it is impossible that some other laws or a larger set of laws might in fact govern it.

However, that is exactly the state that I wanted to assume as axiomatic for my argument. (I wanted you to assume as axiomatic that the complete set of laws or source code could be 'deduced' and as guaranteed to be correct as we have a guarantee that under ZFC+ pi isn't equal to 4.)

If we knew that "these 100 megabytes exactly model our Universe, deterministically - if you run it on enough hardware to simulate a hundred billion galaxies with a hundred billion stars, one of them has a human around year 12 billion," and we were guaranteed that it is exactly the laws of our universe with the same certitude that we are guaranteed Pi isn't equal to 4 under ZFC+ - well, that is not the level of certitude that Physics is able to confer :)

Imagine a very compact, compressed representation of the most complicated CPU on the planet. How many MB do you think that would be?
