Hacker News
Artificial Intelligence Generates Christmas Song from Holiday Image (nvidia.com)
136 points by kebinappies on Dec 1, 2016 | 111 comments

The melody is all over the place, and the rhythm is hard to tap a foot to, but one thing is certain: it's absolutely convinced that Christmas trees get decorated with flowers. Lots and lots and lots of them.

I found the naïveté and belief that flowers are put on Christmas trees truly touching. Such a pure reaction to something that can be so cynical. For that I'll forgive the tone-deaf singing.

I could have mistaken it for a track off The Shaggs'[0] lost Christmas album. I like to believe Frank Zappa would have liked this, too.

[0] https://en.wikipedia.org/wiki/The_Shaggs

Oh man, The Shaggs are something else.

And it just so happens that I live two towns from Fremont, NH, where they're from.

Sounds like "Friday" by Rebecca Black.

One of the cofounders of the Echonest (acquired by Spotify) created this back in 2004:

    "A Singular Christmas" was composed and rendered in 2004. It is the automatic statistical distillation of hundreds of Christmas songs; the 16-song answer to the question asked of a bank of computers: "What is Christmas Music, really?"


Very interesting reference! Deep learning is statistical, so this is sorta one of its spiritual predecessors.

By my understanding, yes, in the sense that certain parts of each algorithm are trying to minimize something. The former model used principal component analysis, which is linear: you apply transforms that pick out the least-correlated pieces of a huge chunk of data. Neural networks, by contrast, use a combination of linear and non-linear layers, picked by the user, to minimize "errors."

What's interesting is that the former model sounds so much "better." I wonder if anyone could chime in about how our ears and auditory nerves, or perhaps auditory cognition, work, and whether they are somehow more "principal-component-analysis-y" than "error-minimization-y", or something relating to the actual math, which might explain why this new neural-network Christmas song sounds like absolute crap to us, whereas the older version sounds pretty amazing. Also, whether my understanding of the underlying math is correct or not.
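For what it's worth, here's a minimal sketch of that linear-vs-non-linear distinction, with made-up toy data standing in for song features (nothing here comes from either project):

```python
import numpy as np

# Toy "dataset": rows are songs, columns are numeric features
# (say, pitch-class histograms). Purely illustrative numbers.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))

# PCA via SVD: center the data, then keep the top components --
# the orthogonal directions of maximum variance (the
# "least-correlated pieces" of the data).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
top2 = Vt[:2]  # the first two principal components

# Reconstructing from only those two components is the best
# *linear* low-rank approximation in a squared-error sense.
# A neural net would instead push the data through non-linear
# layers and iteratively minimize its "errors".
X_hat = Xc @ top2.T @ top2 + X.mean(axis=0)
reconstruction_error = np.linalg.norm(X - X_hat)
```

PCA has a closed-form answer (the SVD), while a neural net reaches its answer by iterative error minimization; that is exactly the contrast described above.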

Is it just me, or does it really sound like a randomly generated chord progression that barely makes any musical sense?

"Musical sense" is a pretty subjective thing, which I suspect is one of the primary issues with artificial creativity.

That said, assuming the training data comes from music with progressions that could be broadly classified as 'popular music,' you would expect to find some regression to the mean (to abuse a phrase) with more production and deeper training data.

One other issue that I think will come up is how insular and unevolving artificial creativity will be if it's based on present music for training data. What has historically moved creative trends is disruption; sometimes it's a slow burn and sometimes it's a few catalysts. But experimentation in artificial creativity will be hard to come by early on, and quickly needed if it's to supplant human creativity.

The statistical approach is painfully naive and doesn't work - as is obvious from the example.

It's like feeding a net with the complete works of Shakespeare and expecting it to produce a genius-level original play. It's simply not going to happen.

The issue is not with the statistical approach but with the parameters around the output and the organization of the training data.

I think that your assumption is that the genius of Shakespeare's plays can be statistically reproduced through sufficiently-clever organisation. That is not obviously true to me.

Some things are just art, capable of being truly understood only by a creature with a head and heart, arms & legs, love & hate, emotions, experiences — in short, a man.

Art is about perception, not creation. If autonomously created art produces the same perception and evocative response, it sufficiently passes that test.

An autonomous agent doesn't need to understand the underlying emotion, it just needs to mimic it.

An autonomous car doesn't know why it shouldn't hit a child that jumps into its path, or why it is making any decisions at all, despite those being some of the most important and fundamental decisions humans make. It just needs to reproduce the actions of a human who does understand those things.

Yes, I believe that artificial creativity will produce art that is indistinguishable from art made by humans.

So you're saying humans are not just cleverly organised statistical automata? If so, science would disagree with that position.

Oh, that's just because the input image wasn't good enough.

Sounds a lot like Songsmith to me...


It kind of feels like a five-year-old trying to make a song. Which seems good - now they only need to improve the mechanisms and who knows, maybe we'll get to the point where it's a ten-year-old?

Instead of five, don't you mean "two to three"? And under that comparison, isn't it scary how much it does sound that way? Like a kid who hears words together but doesn't know what they mean yet? (And doesn't really get the cultural things around them.) To me it sounds like a quite musical 2-3 year old stringing words together. Doesn't it strike anyone else that way?

These things are going to grow up very, very soon! We know that. It's scary.

You're watching a two-year old. It's interesting to think about what it will be like when it's ten, sure.

But what will really blow your mind is what it'll be like when it's 23. This is happening right now, before our eyes.

I cannot overemphasize that any large server farm at Google or Amazon is doing more, and much faster, processing than a human brain's neural net. The human brain has 86 billion neurons with an average of 7,000 synaptic connections each. That is a huge number. But they fire at 15-300 hertz (because they run at biological speed, instead of near-lightspeed like our CPUs), which is about 7 orders of magnitude slower than our silicon. Our brain weighs about 3 pounds (1,300-1,400 g) and uses some 20 watts.
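Restating those figures as arithmetic (same rough estimates as above, nothing new):

```python
import math

# Back-of-envelope check of the figures above (all rough estimates).
neurons = 86e9               # ~86 billion neurons
synapses_per_neuron = 7_000  # average connections per neuron
total_connections = neurons * synapses_per_neuron  # ~6e14

neuron_rate_hz = 300         # upper end of biological firing rates
cpu_clock_hz = 3e9           # a typical ~3 GHz silicon clock

# Ratio of "clock speeds": 1e7, i.e. about 7 orders of magnitude.
orders_of_magnitude = math.log10(cpu_clock_hz / neuron_rate_hz)
```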

It's not a question of "if" a server farm will have as powerful neural nets. It's a question of "when".

(Also although we won't be using it, the entire source code for the human brain has to be strictly less than 700 MB, because the fully sequenced human genome which obviously encodes the full human mind is less than 700 MB uncompressed.)

Guys, we are at an incredible pivot point in human history. We are coming up with computerized brains whose architecture is in some ways comparable to ours, and they are doing human activities.

Today, in 2016, there are thousands, perhaps tens of thousands, of server rooms all over the world that have more than enough computational power to do in real time what a human adult brain does - but we lack the software.

When we see advances like this in artificial intelligence, it's scary.

We're essentially looking at the intellectual output of a two-year-old in the field of music.

Every single day, AI results are astounding. This is it.

> the entire source code for the human brain has to be strictly less than 700 MB, because the fully sequenced human genome which obviously encodes the full human mind is less than 700 MB uncompressed

Actually those 700MB are compressed, in a way so sophisticated that we don't really know how to uncompress it yet - or whether it's even possible without the external resources our planet provides. And keep in mind that those 700MB only describe how to prepare the basic concept of the brain, whose memory is then packed with information we get from the culture.

What I mean by uncompressed is that you're reading cytosine (C), guanine (G), adenine (A), or thymine (T) as two bits per base pair (1 of 4 possibilities). There are 3 billion base pairs, which is 6 billion bits; divided by 8 to get bytes, you get 750 MB.

The human genome is somewhat redundant and can be further compressed. That is, the string of "ones and zeros" (ACGT) could be run through whatever compression algorithm you wanted.

But don't take my word for it:

>"When the 4 bases are packed into one byte ( .2bit format) the size is 770M (hg18.2bit) , but you'll need an extra tool to decypher the data." [1]
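For completeness, the 2-bits-per-base arithmetic spelled out (just the comment's own numbers):

```python
# Each base is one of four letters (A, C, G, T), i.e. 2 bits,
# and the genome has roughly 3 billion base pairs.
base_pairs = 3_000_000_000
bits = base_pairs * 2         # 6 billion bits
megabytes = bits / 8 / 1e6    # 750.0 MB, before any compression
```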


You raise an important point:

>And keep in mind that those 700MB only describe how to prepare the basic concept of the brain, whose memory is then packed with information we get from the culture.

Yes, absolutely. I simply called it an upper bound on how complex a brain's architecture could be. DNA obviously encodes the brain's architecture, since humans all have human brains. Beyond that, there is a very large variation in people's mental capabilities and brains, and the largest variation of all comes from culture.

But culture could be given to a virtualized brain (called training).

Bear in mind that when human brains receive culture, it takes them years of all-day training before they're even able to speak. So full 1x human brains take a long time to train.

When you see results out of neural nets that are similar to what very young toddlers can do, you should be awed. We have the computational power in server farms to do what full brains do -- if not now, then soon.

This isn't some sci-fi pipe dream. Go ahead and look at the facts.

[1] https://www.biostars.org/p/5514/

What your link is describing is the size of the Genome encoded into bits. In other words, that's the size of the file containing an entire genome compressed into base pairs in order to store the raw base pair encoding on disk as digital data. That is definitively not the measure of the amount of information encoded into the DNA.

Nor can you guess at that encoding by modeling DNA base pairs as 2 bits. DNA base pairs aren't bits. They don't define op codes or memory registers. They are read and executed in a complex way that we don't fully understand. They interact with each other in complex ways that we don't fully understand.

There's way more information encoded in those base pairs than a simple 2 bits. You can't simply model them that way and declare that there are X Bytes in a genome. DNA isn't digital.

You are right - see my comments under here (1 down):


However, that just concerns the DNA argument. We know roughly how many neurons there are in humans and their connectedness.

I think that your "700MB uncompressed" fails to take into account that the construction, development, and maturation of the brain relies heavily on cellular and molecular mechanisms. I think it is a little disingenuous to hide the enormous wealth of information necessary to create a brain, much less understand and utilize one, inside of your compiler.

>hide... inside of your compiler.

Not to mention the rest of the ecosystem. Look what happens to humans when they grow up without being properly embedded in the family, with its hundreds of thousands of years of historical contingency: https://en.wikipedia.org/wiki/Genie_(feral_child)

We can't even begin to describe the amount of information encoded there.

Uh, you can begin to describe the amount of information. If someone grew up with sensory input limited to HD video, meaning something like 6-10 GB/hour, they would be handicapped versus other humans, but not drastically so. In six years there are 52,560 hours. A large library, sure, but we already have all this digitized anyway.
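The same estimate as quick arithmetic, using the comment's own ballpark figures:

```python
# Six years of round-the-clock "HD video" sensory input,
# at the generous end of the bitrate range above.
hours = 6 * 365 * 24            # 52,560 hours
gb_per_hour = 10                # upper ballpark for HD video
total_gb = hours * gb_per_hour  # 525,600 GB, i.e. ~0.5 petabytes
```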

I'm cutting a lot of corners, and the human substrate in which our brains are embedded is finicky - you can't just leave a toddler in a roomful of DVDs with food and get a fully functioning adult after 48 months of unsupervised training.

But it's not "can't even begin to describe the amount of information encoded there" either. 50,000 Blu-ray discs' worth oughta do it.

I'm not saying we've figured out any of the other stuff - it's just that computationally (horsepower) we're there; the training set for later unsupervised learning is also there; etc.

The missing parts might well seem insurmountable - but every result that shows AI performing at the level of a 2 or 3 year old is wonderful. This is it. This is the beginning of the turning point. It can happen at any moment.

Someone at this very moment could be setting up a neural net that after 48 months of training can deduce its own status in the world, make novel and correct sentences, maintain a coherent world-view, and be trained on the entirety of the Internet at 10x the speed of adult brains. (The limit is 1,000,000x the speed of adult brains, because the silicon substrate we're using today propagates signals literally a million times faster - today.)

We're there. It's all there. It's "just" 86,000,000,000 neurons (with 7k connections each) and 12 years of supervised training to get to the level of a 12-year-old. 3 pounds. 20 watts. This is happening. Amazon's and Google's server farms blow a brain out of the water computationally, today. We might not come up with the same architecture, but what we are coming up with is making breathtaking progress.


EDIT: I've been submitting too much, but I agree with JonnieCache's thoughts below. However, as a thought exercise there is no reason we couldn't train AI interactively, putting it in a VR room and literally talking to it and correcting it, etc, like a pet. Granted this is not a normal approach to take but since we're discussing the theoretical limitations you can certainly envision it. Obviously nobody is trying to do that - we're not trying to come up with sentience using an approach like this and have no idea what steps humans go through exactly to get there. But it's not computational power that keeps us from getting there - and we could be surprised at any time.

>50,000 Blu-ray discs' worth oughta do it.

By this logic, the handful of megabytes of Unicode making up War and Peace in the original Russian should be enough for a non-Russian speaker to fully grasp it and all its meaning and implications. It isn't even enough for a native Russian speaker to do so.

Humans aren't raised by simply looking at their surroundings, they're raised by interacting with people, who were in turn raised by interacting with people, going back for the whole history of humanity, or arguably mammals. That information isn't all in the genome, although natural selection has put some of it in there. The bit that we don't know how to describe information-theoretically is the bit that isn't in the genome, because we don't know how it's encoded.

I don't see how you'd even try to put bounds on it: this is essentially the problem posed by post-modernism/post-structuralism/literary theory, once you strip away the Marxism. Science's response has understandably been to reject it, but it can't do so forever if it wants to create AGI.

Or maybe I've misunderstood your point.

I might be persuaded that a 2 year old could come sooner than we think, via brute computational force as you describe, but I'd argue that a 2 year old with the capacity to become anything more than a 2 year old is much farther away than we think.

EDIT: If, as you claim, we are close to having the computational power to simulate human children, then why aren't we already successfully simulating much simpler animals? IIRC the best we can do is a tiny chunk of a rat, or the whole of various kinds of microscopic worms, and those are just computational models, not Turing-test-passing replicants.

One good analogy I've seen for this (I think expressed by @cstross) is that the DNA isn't the source code. It's the config file for a much, much larger process that only exists as a running blob of object code (i.e., all those cellular and molecular mechanisms, plus the environment, plus culture) and is never serialized anywhere.

I don't think you are correct from an information computation point of view. When looking at the computation done by neurons in the brain it is sufficient to abstract away the lower-level substrate in which it occurs.

You and other posters are all correct regarding the huge volume of information on which human minds are trained. It's hardly unsupervised learning either :)


EDIT: In response to your comment, I've given it further reflection. DNA as source code may be misleading as an "upper bound". After all, suppose we knew for a fact (assume there were a mathematical proof, or just assume it axiomatically) that a one-hundred-megabyte source code file completely described a deterministic Universe (contained every physical law, etc.), and that if you ran it on enough computation to fully describe a hundred billion galaxies with a hundred billion stars each, one of those stars would have a planet, that planet would contain humans, and the humans at some point in the simulation would deduce the same one-hundred-megabyte source code file for their Universe. (This is a bit of a stretch, as it's not possible to deduce the laws of the universe rigorously.)

Anyway, under that assumption, you could argue that the "upper bound" on the amount of entropy it takes to produce human intelligence is "just" a hundred megabytes, since that source code can deterministically model the entire Universe. But practically that is useless, and the humans in that simulation would have to do something quite different from modeling their universe if they wanted to come up with AI to help them with whatever computational tasks they wanted to do.

In the same way, perhaps DNA is a red herring, as there are a vast number of cells in the human body (as in, tens of trillions) doing an incredible amount of work. So starting out with DNA is the "wrong level" of emulation, just as starting out with a 100 MB source code file for the universe would be the "wrong level" of emulation, even if we posit axiomatically that it fully describes our entire Universe from the big bang through intelligent humans.

So I will concede that it is misleading.

All that said, I think that emulating or considering the computation on the level of neurons is sufficient - so it is sufficient to look at how many neurons are in the human brain and the kind of connections they have.

As for the efficacy of this approach - that's the very thing that is being shown in the story we're replying to and many places elsewhere. It works. We're getting extremely strong results, that in some cases beat humans.

I believe that emulating or comparing to humans at the neural level should probably be sufficient for extremely strong results. We do not need to emulate every atom or anything like that. I consider it out of the question that we would discover that human minds form a biochemical pathway into another ethereal soul-plane and connect with our souls in a way that you can't emulate by emulating neurons and the like, and that the souls are where intellect happens and brains are just like "radio antennas" for them. Instead, I think that the approaches we're seeing will achieve in many ways similar results to what humans brains produce computationally - a much higher level of abstraction is sufficient for the results that are sought.

I will confess to not being an expert, but I disagree: I don't think it's sufficient to abstract away the lower-level substrate when the OP was referring to DNA as source code, which absolutely depends on that level of detail to both construct the system (the brain) and to enable the continued development and maturation of that system (a physical, real brain).

I was not referring to the huge volume of information necessary, as I acknowledge that as being "outside the system" for purposes of this discussion, so my apologies for any confusion I might have caused.

It may be possible that (and it is my belief that) there is a higher-level abstraction for the computations taking place in the brain, even if it is on the neuron-level, but at that point I don't think you can claim that the source code for that is going to fit under 700MB by using DNA as a baseline.

You are right, I went too far. See my other comment:


However, that just concerns the DNA argument. We know roughly how many neurons there are in humans and their connectedness.

> (This is a bit of a stretch as it's not possible to deduce the laws of the universe rigorously.)

Not sure I agree. This is known as the problem of induction, and Solomonoff tackles it quite well I think.

Certainly it is not possible to deduce every law governing the Universe! For example, posit there is source code for the Universe; then assume a second Universe that is identical except its source code differs in that after 35 billion years suddenly something happens. Since there is no way to investigate the laws directly, nobody inside could possibly "deduce" the complete set of laws; there is no way to know that the source code contains a check that gets tripped after 35 billion years. Only one of the two versions is the one running the universe, but it is not possible to determine which, so the inhabitants of the Universe cannot deduce the laws that govern them: for the first 35 billion years the two universes are bit-for-bit identical.

So if I wanted to assume there is a one hundred megabyte source code that somehow was actually exactly the source code governing the entire Universe, I'd have to assume that axiomatically. We could never "deduce" it and know for certain that it is the only possible source code and our Universe must be following exactly it and nothing else.

At least, this is my thinking...

You absolutely can deduce those laws, which is what Solomonoff induction does [1]. It's been formally proven to converge on reproducing the correct function in the limit.

> Since only one of the two versions is the one running the universe, but it is not possible to determine which one, therefore the inhabitants of the Universe cannot deduce the laws that govern them: for the first 35 bn years the two universes are bit for bit identical.

Correct, until they're not, at which point you can distinguish them. Until then, you prefer the laws whose axiomatic complexity is lower. It's basically a formalization of Occam's razor.

[1] https://en.wikipedia.org/wiki/Solomonoff's_theory_of_inducti...

I was using deduce as 'arrive at the actual set of laws (the laws 'deduced' perfectly matches reality indefinitely going forward) and know correctly that they must be the actual set currently running the universe.'

In your phrasing, it seems that you would say the following sentence does not contain a contradiction.

> Alice in the year 23,435,342,435.03453725 correctly deduced all laws governing her Universe, and through a clever mathematical trick was able to correctly deduce the state of the universe in the year 23,435,342,450.03453725 (15 years later, i.e. '50 instead of '35) with absolute precision, without having to model the entire Universe. Having deduced the correct set of laws, she knew for certain that the state at '50 would include x; let's say atom #349388747374123454984598323423 in that Universe must be a hydrogen atom. She knew this with certainty, because she had deduced the laws governing her Universe and knew they could not be any other set of laws. The laws were also deterministic. She was also correct in her calculations. Therefore, since she made no mistake in her calculations and had deduced the correct set of laws actually running the Universe, she was correct that atom #349388747374123454984598323423 would be a hydrogen atom. One thing to note, however, is that the universe happened to have an "if year == 23,435,342,449 then all atoms turn into nitrogen atoms wherever they may be" check, which someone had added to the source code as part of an experiment. Therefore, although she had correctly deduced that atom #349388747374123454984598323423 in the year '50 would be a hydrogen atom, in fact when that year rolled around it was not one. That doesn't stop her deduction from being correct and proper, or mean that she had not deduced the laws with certitude or been correct. In effect, it is possible for me to say with complete and justified certitude that "next year x" as long as I have completely rigorously deduced it, and I will be correct in my rigorous deduction and in thinking that I have a 100% chance of being right, even if next year not-x.

You see my struggle?

For me, if you "deduce" that something "must" be exactly some way, you can't possibly have been right in your deduction if it turns out that you are wrong.

Nobody can deduce that Pi cannot equal 4, if, in fact, there is any chance that it is indeed 4 (under that axiomatic system). That's not a deduction: it's a logical fallacy.

So you are using an extremely different definition of "deduction" than I am. I can deduce that within standard axiomatic mathematics, Pi is not equal to 4. After making that correct deduction, there is no conceivable Universe in which I am wrong. (Seriously - it's not possible that in another universe someone exploring ZFC+ would find that Pi is four there.)

Held to such rigor, there is nothing I can deduce with certainty about the laws that govern our Universe - nothing such that it would be impossible for some other laws, or a larger set of laws, to in fact govern it.

However, that is exactly the state that I wanted to assume as axiomatic for my argument. (I wanted you to assume as axiomatic that the complete set of laws or source code could be 'deduced' and is as guaranteed to be correct as our guarantee that under ZFC+ Pi isn't equal to 4.)

If we knew that "these 100 megabytes exactly model our Universe, and deterministically at that - if you run it on enough hardware to simulate a hundred billion galaxies with a hundred billion stars, one of them has a human around year 12 billion" - and we were guaranteed that it is exactly the laws of our universe with the same certitude that we are guaranteed Pi isn't equal to 4 under ZFC+ - well, that is not the level of certitude that Physics is able to confer :)

Imagine a very compact, compressed representation of the most complicated CPU on the planet. How many MB do you think that would be?

> Instead of five, don't you mean "two to three"? And under that comparison, isn't it scary how much it does sound that way? Like a kid who hears words together but don't know what they mean yet?

In a sense, that's expected, and it's a (perhaps weak) indication that we're moving in the right direction.

> You're watching a two-year old. It's interesting to think about what it will be like when it's ten, sure. But what will really blow your mind is what it'll be like when it's 23.

Or 23000. Because, you know, it's a machine.

> the entire source code for the human brain has to be strictly less than 700 MB, because the fully sequenced human genome which obviously encodes the full human mind is less than 700 MB uncompressed

I've had this half-baked argument sloshing around in my mind for a while now: the total source code is greater than that, because it also includes constants, rules, and facts built into the fabric of the Universe itself. In other words, that 700 MB is code for "build a carbon-based intelligence in the context of such-and-such universe" - where "such-and-such" may turn out to be a surprisingly long and intricate description, effectively extending the code by a large amount. Kind of like a fixed ROM added to the variable NVRAM of all humans.

In a different universe the same code will produce a very different creature - and that's at best. In most other universes, the process would terminate early with a critical error, or would not even start. Or would not even make sense at all.

> we are at an incredible pivot point in human history

I think we are at the cusp of an explosive change without precedent in the history of the Earth - and not just due to AI. There's a long list of things that are about to hit tipping points in the next years / few decades. It will be... "interesting". Buckle up.

> Or 23000. Because, you know, it's a machine.

Deep Thought[1] instantly came to mind.

[1] http://www.bbc.co.uk/cult/hitchhikers/guide/deepthought.shtm...

By your own argument, any program that a single human being can write can also contain only 700MB of source code, including graphics and videos etc, which is obviously not true.

You are right, I went too far. But the size of DNA is a good indication of how complex the high-level wiring and topology of neurons likely is - that is, before that neural net has been trained at all.

It takes years to train a human neural net to even be able to speak (toddlers). But regarding the architecture, there's good reason to think that the wiring for it is not infinitely complex, taking trillions of terabytes or anywhere close to that.

I think that on some level it is fair to say that the wiring that makes people's brains human is in some sense encoded in DNA.

However, I will grant that this might not be helpful. After all, if we posit or assume as axiomatic that the whole universe is deterministic and that these 100 megabytes of source code can model it, and if you ran that with 10^100 bytes and FLOPS, then after a few days of running it you'd find humans inside - well, then, sure, in some sense 100 megabytes "describes" the entropy inside human brains, but in another sense that is not useful in any way.

So perhaps DNA is the same, and the "knowledge" that 700 MB or so can produce a human - if you run it in a biological substrate, carry it out in a womb, and then raise it with human culture for 72 months, it will speak just fine - is not necessarily an indication of what kind of information we would need to give a neural net, or how much coding it would take, to get the same behavior. So while to me personally it remains an indication, I agree I went too far.

I honestly disagree, because in that case any evolution of anything would be impossible. Take a look at Wolfram's A New Kind of Science for an analysis of how massive complexity arises from simple beginnings.

I will go further and tell you that the state of AI so far is nowhere near as advanced as you think. It is not a flexible intelligence that can figure out connections between concepts (look up Cyc for the closest we've got). It's PARAMETRIC MODELS made by HUMANS, where a neural net simply runs an algorithm written by HUMANS to optimize something. It's just an optimization problem.

You know where all the ingenuity and progress in machines comes from currently? The ability to make PERFECT COPIES AND SEARCH QUICKLY. You see, learning takes a long time and copies information imperfectly. Books could be copied, but search was slow. Now with computers we can try algorithms out, and the ones that work better are SELECTED FOR AND COPIED, and anyone around the world can contribute. That is also why open source outcompetes closed solutions.

You will find that the huge innovation in technology and software from the 40s and 50s till now has all been because HUMANS added to the snowball and replicated good ideas. Generating new ideas still requires HUMANS.

Show me one AI today that can form connections between concepts to teach itself to solve arbitrary problems, instead of just optimizing some parametric pattern recognizer.

It would be a massive breakthrough if such an AI existed today, and it doesn't.

Due to the amount of hardware we have to throw at the problem, it may, however, happen at any moment. I am saying we have more than enough hardware for general intelligence to arise out of server farms - but, yes, the millions of years of evolution that gave rise to humans haven't occurred in them. It would be a breakthrough, yes.

> These things are going to grow up very, very soon! We know that. It's scary.

I'll believe it when I see it.

Your post has just a bit too much futurologist science fantasy wishful thinking in it.

You sound like some of the kids I teach. All 17 - 18 year old kids have phones and check them at least every 5 minutes. I mentioned that it'll be weird when they are wearing some sort of augmented reality glasses and teachers won't be able to tell if students are concentrating or reading reddit.

The kids all said "That'll never happen," as if technology is stuck where we are today. They were astounded when I told them that ten years ago, I never saw a phone in a classroom. So from my perspective we went from no phones, to big phones, to little dumb phones, to little connected computers... it's no stretch to think that soon they will be glasses-mounted or project some sort of hologram into the eyeball. Version 2 of Microsoft HoloLens will be cool, v3 will be tiny, v4 will be mounted inside glasses for sure.

The same with AI. A few years ago I couldn't talk to my computer. Today I do it all the time. Last week I had to remember passwords, today my computer recognises me and logs me in.

AI is here and it's getting better and from my point of view it is getting better much faster.

It's one thing to say "VR will be commonplace in 10 years" or "Self driving cars will be commonplace in 10 years". That, I can believe because we already have prototypes and I've messed with them and could imagine advancing them.

It is quite another thing to claim that human adult level strong AI is coming "very very soon" and could happen overnight.

The latter is just science fiction wishful thinking. I have no reason to believe we will ever have truly thinking, sentient computers, let alone "very very soon." Sure, our Siris and Alexas and whatnot will get better and better at responding to our queries how we want, but that's way different from an adult level human intelligence AI. Machine Learning has limits and will not yield conscious machines anytime soon, if ever.

> I have no reason to believe we will ever have truly thinking, sentient computers

Unless you think human brains work on magical pixie dust, I'd say you have quite a few reasons.

I'm not even convinced that the human brain can be modeled in terms of a Turing machine. How does the brain achieve free will - the ability to "decide" between two arbitrary, equally-weighted things? RNG? And yet, it doesn't "feel" random.

Firstly, free will isn't what you think it is. A random choice might seem "free" but it's not "willed" in any meaningful sense. So let's leave aside the free will question because that term is pretty much undefined at this point.

What your post sort of hints is what's known as the hard problem of consciousness. I recommend reading the Wikipedia pages on it and qualia if you're interested.

Suffice it to say, given our current understanding of physics, we are no better than finite state automata (see the Bekenstein Bound). The only escape from this inevitability is if we collectively decide that the hard problem of consciousness is irreducible, and then something like panpsychism becomes preferable.

This is unlikely though, and we've been through this once before in the debate over how living matter differs from non-living matter. Must there be some "secret sauce" added to non-living matter to bring it to life? This was the proposal of vitalism, but eventually biology came to prominence and all of those who insisted living matter had to be different just died off and we were left once again with a reducible, mechanistic understanding of living matter. So it will be with consciousness (see [1] for an example of how this might work).

[1] http://journal.frontiersin.org/article/10.3389/fpsyg.2015.00...

This presumes free will, which is itself contentious. I think it's clear we have the appearance of free will, but it's not clear that consciousness actually makes decisions rather than an emergent appearance of agency, claiming decisions as made for actions that are already in process. If you're going to make a claim about the feasibility of AI that relies on free will, you'll have to prove free will first.

Of course, I have no way of knowing how soon "very soon" is.

However, although my post may have wishful thinking in it, I think if you investigate the reality of the cells that contribute to human thinking, you will conclude that it is all but inevitable that this or some similar system will be modelled in computers within the next hundred years. You can look at the processing involved yourself. Just remember that the whole is the sum of the parts, and the parts are doing slow, biological things; it's not silicon etchings that work at practically the speed of light. Neural signals propagate at something like 100 meters per second at the fastest[1], while light propagates at 299,792,458 meters per second; if you look at our 4 GHz CPUs, you will see that in one clock cycle light travels only about 7.5 cm.[2] Light goes fast enough to make many thousands of trips across a server room in the time it takes neurons to fire. While staggeringly complex, the fact is, our server rooms are doing more.
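The arithmetic behind this comparison is easy to sanity-check. A few lines using only the constants quoted above show that the per-clock-cycle distance is closer to 7.5 cm than 10 cm, and that light outruns the fastest neural signal by a factor of about three million:

```python
# Back-of-envelope check of the speed comparison above.
C = 299_792_458   # speed of light, m/s
NEURON = 100      # fast myelinated axon signal speed, m/s (upper bound)
CLOCK_HZ = 4e9    # a 4 GHz CPU

# Distance light covers in one clock cycle (~7.5 cm)
per_cycle = C / CLOCK_HZ

# How many times faster light is than the fastest neural signal
ratio = C / NEURON

print(f"light per 4 GHz cycle: {per_cycle * 100:.1f} cm")
print(f"light vs. fastest neural signal: {ratio:.1e}x")
```

(How many round trips that buys you depends on the room size and the neuron's firing interval, so the exact multiplier is softer than the speed ratio itself.)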

This is like heavier-than-air flight. Let's say the fuels and engines are there - but we don't know how to assemble them.

When we see neural nets produce results that are similar to what young children produce, this is amazing. I cannot overemphasize that at any moment we can have a breakthrough. We have so much more computational power than required: 3 pounds! 20 watts! 86 billion neurons with 7000 connections! And at speeds of around 300 hertz.

The training time for a human brain to get to intelligence even close to adulthood is, let's say, 12 to 14 years even in the most precocious.

We're looking at the output of neural nets that receive perhaps months of training.

This stuff is happening right now. It's absolutely astounding and there is no limit to what can be unlocked overnight. Any server farm at Amazon or Google has the hardware today to go through 10 years of human brain training in about 10 months, and then end up with a network that thinks at ten times the speed of human thought. It's there, today. Three pounds. Twenty watts. Three pounds. Twenty watts. Three pounds. Twenty watts.

It's not miraculous. It's happening.

[1] http://biology.stackexchange.com/questions/21790/how-long-do...

[2] https://www.google.com/search?q=c+%2F+4+hgz

> This is like heavier-than-air flight.

Yes, this is like heavier-than-air flight. It would be like if we didn't really understand how flight works so we are just imitating birds by gluing feathers to cardboard wings, flapping them, and hoping flight just sort of happens.

In a similar way, we don't really understand how the human brain works, so we are just imitating neurons and hoping sentience will just sort of happen.

How long did it take to go from really silly and obviously impractical flying machine designs to working, practical ones?

Thousands of years? Who knows how long people had been attempting flight before the Wright Brothers?

At least 500 years of relatively serious attempts, if we start with Leonardo.

ythn, you should start where we had the combustion engine, since it was a source of huge amounts of power (the power density of the fuel) that was already effectively being turned into motion.

What I'm saying is that the server farms @ Google, Amazon, and elsewhere, are at the stage of the "combustion engine" -- not just 'powerful enough', but more powerful than necessary. I base this on a count of the number of neurons in the human brain, their rough topology and connectedness, how quickly and often they fire, etc.

In that sense, I think you will find it almost impossible to conclude that our server farms today do not have sufficient power to simulate 86 billion neurons with 7k connections each, at a lowly few hundred hertz. They are not even more connected than we are able to simulate, precisely because we can saturate links over thousands of round trips before we miss the brain's realtime behavior.
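Taking the thread's figures at face value (86 billion neurons, 7,000 connections each, a generous 300 Hz), the implied synaptic-event budget can be sketched in a few lines. The per-device throughput below is a hypothetical round number, not a measured spec:

```python
# Rough synaptic-event budget implied by the figures above.
NEURONS = 86e9    # neurons in a human brain
SYNAPSES = 7_000  # connections per neuron
RATE_HZ = 300     # generous upper bound on firing rate

# Assuming one operation per synaptic event (a big simplification:
# real synapses are not single multiply-adds)
events_per_sec = NEURONS * SYNAPSES * RATE_HZ   # ~1.8e17

# With a hypothetical ~10 TFLOPS accelerator, how many devices
# would real-time parity naively require?
DEVICE_FLOPS = 10e12
devices = events_per_sec / DEVICE_FLOPS

print(f"{events_per_sec:.1e} synaptic events/s ~= {devices:,.0f} devices")
```

That lands in the tens of thousands of accelerators, which is roughly "large server farm" territory, though whether one op per synapse is a fair exchange rate is exactly the kind of assumption being debated here.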

So perhaps "we are just imitating birds by gluing feathers to cardboard wings, flapping them, and hoping flight just sort of happens" -- but while having an engine attached that's at least an order of magnitude more powerful than is required to achieve flight.

In that context, the story we're replying to is like a video of someone achieving a 2 second flight - with a lot of feathers flying everywhere. It might not seem like much, but given what we know an 86-billion-neuron neural net is capable of, it is exciting.

The results of AI based on neural nets that I see being posted every single day are absolutely astounding.

this is happening right in front of your eyes. you're witnessing the birth of flight.

no, the engines won't fly the way birds do -- but they're more than powerful enough to start flying, and you are seeing this every single day.

By the way, you know there's not a particular push toward trying to make neural nets that are self-conscious or feel pain, etc? This just isn't a primary goal of researchers at this very second.

Just as flight engineers don't really try to make ornithopters. We have better and more powerful things to be doing.

this is an absolutely miraculous time. And you're not seeing it - which is sad.

This isn't some kind of pipe dream. These are results coming out every single day.

Here's a neural net designed to make horror art out of normal photos:


And that's just one thing out of many. Nuance's Dragon systems have practically eliminated the entire profession of medical transcriptionist. (Just google 'medical transcription dead' without quotes to find people from that industry reporting on its demise.) Becoming a medical transcriptionist today is like going to typewriter repair school. Because speech recognition techniques based on machine learning have gotten that good.

Go, an incredibly nuanced game with a staggering possibility space, has fallen to machine learning competing with a lifetime of competitive human mastery.

It doesn't matter in what order or how the next breakthroughs come that go toward sentient or at least very intelligent interaction. We know what the limits are. And we know based on the topology and computation involved, that we have massively more than enough horsepower.

so while you might point to flying feathers and a crashing airplane and deride it, I think about the jet engine behind those flying feathers and my heart skips a beat when I see it sustain flight for 2.3 seconds before producing "A hundred and half hour ago" like a human toddler blabbering incoherently and without even understanding those words.

Because I know what else AI has been accomplishing, and I know the horsepower behind it. you need to expand your thinking and realize that our algorithms and machine learning techniques are playing catchup with hardware that has been sitting around being dumb.

That's right: computers have just been sittin' around, bein' dumb, while they have all the computational power necessary to surpass humans in every realm of neural computation. mark my words, ythn. no pipe dream involved.

I don't doubt we will be able to do great things with machine learning in the coming years.

What I do doubt is that machine learning will become generic anytime soon. I predict machine learning will always need some degree of specialization - we aren't reaching general intelligence/learning within our lifetimes. A machine that is awesome at playing Go will suck at translating languages.

I also predict machine learning will never be able to surpass humans in terms of creative ability. A top notch machine-written book/poem will always be inferior to a top notch human-written book/poem, for example. Humans can invent new things, machines seem only capable of rehashing existing things. For example, at some point a human writer invented the concept of an unreliable narrator. If you "teach" a machine how to write by feeding it thousands of books, but you exclude books that have unreliable narrators, will the machine ever write a book whose narrator is unreliable? I think not.

I'll happily admit you were right all along if AGI does come about within even the next 20 years, but I think you are grossly oversimplifying things in order to embrace the sci-fi fantasy you wish were real.

> we aren't reaching general intelligence/learning within our lifetimes

Almost certainly false.

> I also predict machine learning will never be able to surpass humans in terms of creative ability

Algorithms are already churning out papers that are accepted to journals, and they can compose crude music. This is a mere 10-15 years after such study first began. I give it maybe 20 years before a computer generated song will appear on one of the top charts. These will likely still be domain specific algorithms.

> Humans can invent new things, machines seem only capable of rehashing existing things

So you think human brains run on magical pixie dust? "Things" that humans invent can all be described by finite bit strings, which means generating "new things" is a fiction.

We discover these compositions just like a computer would. The secret sauce that we have but don't yet know how to describe algorithmically, is discerning those bit strings that have more value to us than others, like a clever turn of phrase is more valued than a dry, factual delivery.

> If you "teach" a machine how to write by feeding it thousands of books, but you exclude books that have unreliable narrators, will the machine ever write a book whose narrator is unreliable? I think not.

I don't see why not, even if we stick to domain-specific novel generation, but it depends on how you train the system based on the inputs. Random evolution is hardly a new concept in this domain.

I'm curious whether this is in spite of your being on board with "sure, server farms do more than enough computation for parity with the human brain," or whether you don't consider neural nets in relation to human neural nets (biological brains)?

If you do consider and admit that computationally there seems to be enough horsepower there, where does your skepticism come from that anybody would figure it out?

Alternatively, did you happen to completely ignore the argument about how much computation the human brain does? (Which isn't that much compared with server farms). I mean on a neural level, using the same neural network topology or an approximation of it, actual neural networks.

I guess I'm perplexed at your skepticism.

I'm skeptical because you are promising the moon, and when I look and weigh the tech for myself it seems many orders of magnitude less advanced than your hype leads one to believe.

I am basing the promise bottom-up, based on how many neurons are in the human brain, their connectedness and speed, and amount of computation those 3 pounds can be doing using the synaptic mechanisms we know about.

You are basing your skepticism top-down based on the results the science of artificial intelligence has shown to date.

It's a fair source of skepticism. There are 15,000+ species of just mammals, all of which have neural nets and exactly one of which has higher abstract reasoning communicated in a language with very strong innate grammatical rules - and that is humans.

However, we have 7 billion working specimens, a huge digital corpus of their cultural output, and their complete digitized source code which can be explored or modified biologically.

For me bottom-up wins. We can just try things until it works - which may be sudden/overnight.

At the moment I see a jet engine, feathers flying everywhere, and no flight. But looking at that jet engine, I just can't imagine it will take long.

> Also although we won't be using it, the entire source code for the human brain has to be strictly less than 700 MB, because the fully sequenced human genome which obviously encodes the full human mind is less than 700 MB uncompressed.

There's also the training set consisting of all sensory and chemical inputs from conception on. I think it's fair to say that the former inputs are only sensible to an organism with the same sensory organs, and the latter inputs are only sensible to an organism with the same physical structure & organs as we have.

I don't think a silicon neural net with the neurons of a human being would be the same thing after two dozen years of training as a human being after 24.

You're also assuming that there's nothing more to a mind than its neurons.

> uses some 20 watts

Daniel Lemire has an observation here: our brains don't use all the connections, all the time (when they do, it's called epilepsy); it would be interesting if we built some computers that also worked like that - with only small parts being active at a certain time, but switching quickly from one to another. That would help with the heating problem and it might allow processors a lot more complex than we have now.

if you're trying to achieve parity with human brain calculation the most important thing is how slow neural connections are. they're snail-paced compared with gigabit links and CPUs at 3 GHz+. There's quite a lot of memory that is needed, but if you're allowed to make thousands of round trips across the room before you miss your realtime deadline, it's kind of silly to think that it won't happen. The fact you point out (about not all parts firing at once) could be useful because it might result in these links not being saturated by data anyway.

From my understanding, this is already present in modern architectures and systems, and will keep increasing. Instead of building only general-purpose circuits (e.g. something like a CPU), you can build lots of special-purpose circuits (multipliers, CRC, FFT, AES, hashing, h.264,...).

I've seen more and more little A.I.-generated ditties like this recently and their reception tends to be the same: that they're interesting and funny but don't sound that great.

The output would probably be more compelling if A.I. were adopted more as an instrument by individual artists/composers to automate some of their more tedious tasks by learning their own particular styles rather than a magical music box that churns out top hits.

"The best Christmas present in the world is a blessing."

This algorithm is throwing down some wisdom

I think generating fortune cookies is a really low hanging fruit for current AI. Someone could put it together in a weekend.

I am the main developer. You are literally right on the time spent...

Our original focus was writing a research paper on hierarchical music generation. Composing a song from an image is just one of the "fun applications" that we spent a little time on, to promote the interestingness of our method. I started Saturday afternoon, and was basically done by Sunday night.

Well, your project is quite a bit more complex than just making fortune cookie messages. Still amazing how much can be accomplished quickly with current AI tech.

Thumbs up for the cool project.

I once developed a hierarchical Markov chain, and I decided to use it to generate fortune cookie wisdom because I thought people would be more willing to overlook grammatical mistakes or be willing to interpret it as an expression of deep truth. http://www.kylem.net/stuff/fortunes.html (You might need to increase the "sense" parameter.)
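A minimal, flat word-level Markov chain of the general kind described above fits in a few lines (the linked project is hierarchical and much more involved; the two-line "fortune" corpus here is made up for illustration):

```python
import random
from collections import defaultdict

def train(lines):
    """Build a first-order word-level Markov chain from text lines."""
    chain = defaultdict(list)
    for line in lines:
        words = ["<s>"] + line.split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            chain[a].append(b)  # duplicates encode transition frequency
    return chain

def generate(chain, seed=None):
    """Random-walk the chain from <s> until the end token."""
    rng = random.Random(seed)
    word, out = "<s>", []
    while True:
        word = rng.choice(chain[word])
        if word == "</s>":
            return " ".join(out)
        out.append(word)

# Hypothetical corpus; shared words let the chains cross over
fortunes = [
    "a journey of wisdom begins with a single step",
    "a single kindness begins the best journey",
]
chain = train(fortunes)
print(generate(chain, seed=1))
```

Because "a", "single", "begins", and "journey" appear in both lines, the walk can hop between them mid-sentence, which is where the accidental "wisdom" comes from.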

"The supreme happiness in life is simply to serve as a warning to others."

Oh my god, I'm in stitches. What a fun project, thanks for sharing!

"If at first you don't succeed, destroy all evidence that you cannot change, and you will feel better."

When I think that this month I'm going to hear "Last Christmas" by Wham! in every store I set foot into, this new masterpiece doesn't sound so bad.

I am pretty amazed at the effort that nVidia is putting into its corporate rebranding effort. I wonder if, in the not too distant future, they will be the AI company that also makes graphics cards sometimes.

The other thing I find really amazing about it, coming from IBM, is that IBM has invested a ton of money in IBM Watson but they sold off their foundry business (could have made massively parallel AI machines) and their systems business is a fraction of what it was.

Looking at what can be done when you're leading versus when you are following is really sobering to me.

I'm not sure what I should be impressed by. Maybe there's some real technical feat happening here, but I feel like a basic mad-libs style algorithm could produce something better.

I'm not very familiar with mad-libs, so correct me if I'm wrong. I think generating a lyrics passage (zero hard-coded rules on content or grammar or anything) from an image would not be something you can do with mad-libs.

One step closer to GLaDOS.

this was a triumph! im making a note here: huge success

It's hard to overstate my satisfaction.

it is scary, sounds like GLAdOS singing "still alive"!

It sounds more like a corrupt core.

"Cave here. It's Christmas time, and you know what that means: Christmas bonuses have been suspended until further notice. We've gotta pay the judgement on that pesky class-action with something. But don't let that get you out of the Christmas spirit. The lab boys have come up with a way to stay festive by hooking up the Christmas Core to the lab's PA system. So enjoy free, continuous, computer-generated Christmas music from now until January 5!"

"Cave again. Apparently the Christmas music has been causing some employees severe emotional and psychological distress. We've had reports of people sticking their heads into active particle accelerators and drinking Repulsion Gel to get away from the sound. So until a full investigation has been conducted and the Christmas Core thoroughly debugged, we are discontinuing the Christmas music. We do not need another class-action on our hands, folks."

Music for people who hate music.

Auto-synthesis of music has been a topic of academic interest since the 1950s, when the first mainframe scribbled out code on paper tape to be translated into sheet music and performed. UToronto's work here is the latest expression of this desire.

The huge gap between our culture's actual music and these synthetic projects can to an extent be described through "receptivity" or the phenomenology of music, in other words, how it's experienced. The following fun, short talk does a great job of introducing the concept through its analysis of "vaporwave."


That video is fantastic; I watched it a week or so ago.

Your explanation also explains why computer-performed music is so off. It still has that uncanny valley effect. So when Sony had a computer generate a "Beatles-esque pop song", they still had a human perform and produce it. But at that point there's so much creativity and human-added value on top of it that I don't think it's fair to call it computer generated, imho.

yes. I can tell you a little more about that, too, since I used to research this stuff and think about it a lot still.

One of my models of music is an external model of a regulated system that parallels and trains our own habits and responses. E.g. a song demonstrates tension and release similar to our own lives. The level of tension in a song before release occurs can inform us how much tension we should accept before performing some release activity.

Music's rhythms also inform the pace of our work. E.g. verse-chorus-verse represents switching between two different activities. Even the pitch of a single note acts as a reference for the amount of intensity of a sensation we should use in our own lives. E.g. thrash metal listeners enjoy sudden shifts into massive intensity and hold it there. Dubstep listeners are training themselves for unusual, but rather intense aesthetics leading up to disproportionate release. Classical music tends to be for "long-chain thinkers" tumbling ideas over from various perspectives, e.g. writers and politicians, doctors, not factory workers.

With that as a background, consider that a live instrument is also a physical system with a human controlling it interactively. The live system is a bit different every time. Here's the critical part: the human must listen and provide instantaneous feedback to a varying system in order to present the piece of music as a proper response model of a regulated system. If the player fails to do this, the model communicated by the performance is different.

In open-loop systems, such as a sequencer, there is no (or limited) interaction between the player and the sound, so an incidental model emerges. That incidental model represents an unintended and therefore most likely irrelevant model of how to interact with reality. e.g. it relieves tension where no relief was needed. It lingers too long on an idea, long after a human novelty-seeking circuit has starved.

Some people, e.g. in discussions of unstable filters like the TB-303, chalk up the variations as being different at every performance because the instrument is random... However, they're missing the closed loop portion of the performance, in which the performer reacts to the unpredictability of the instrument in order to maintain the model. In other words, the score and notes are not the music, but the performer's response to the environment the score sets up is the music.

To revivify your uncanny valley observation, the "unstable filter creates variations" crowd has a parallel in Perlin noise used to subtly animate human models to make them not look so dead. However, it's incomplete because they don't use (short-term) feedback to determine when the movement suffices to be convincing. That feedback is the essence of performance.

In theory, computer scientists could implement these feedback models in performance to make the sounds more realistic. They could be used in synthesis, but the playback would still require observation of the listener! Which is possible. Personally, I just prefer playing electronic instruments live over using sequencers. It's only the sounds of electronic music I like, the zaps, peowms, zizzes, pews, and poonshes, etc. I don't care for electronics/computers to perform for me.

If you like this hypothesis, you can find more references on my wiki at: http://www.diydsp.com/index.php?title=Computer_Music_Isolati...

Did I just hear a new classic?


This is definitely now my favorite Christmas song. While obviously not a masterpiece, it's incredible how far this tech has come. It's almost got a Dadaist feel to it. Can't wait to see where this ends up in ten years! I can foresee music labels buying a few of these AIs, getting some pretty people with decent voices and sending them on tour.

Earlier this year, an AI-written script was made into a short film. It's got the same sort of absurd vibe.


Is pop music not cheap and disposable enough for you already?

Pop is still recognisably human. This sounds like logic gates roasting by an open fire.

That was a brilliantly seasonal analogy.

While this is hilarious, it doesn't seem like a huge achievement to me.

The only thing (kind of) working well is feature/topic detection in the image (tree, Christmas, etc.), but that isn't really cutting edge.

The core part, learning and creating music, only produced melody and lyrics that seem not much different from accumulating random sentence and chordal fragments.

I'm not sure that Christmas songs generally use the blues scale, I wonder what made them choose that for the melody?

I suspect it's an easy way to get something that won't sound completely dissonant, especially because you can use the same blues scale over multiple chords.
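The "same scale over multiple chords" point is easy to make concrete. Below is the standard hexatonic minor blues scale as pitch classes and a quick membership check; this is just the textbook scale, not necessarily the exact set the project used:

```python
# Minor blues scale as semitone offsets from the root:
# root, b3, 4, b5, 5, b7
BLUES = {0, 3, 5, 6, 7, 10}

def in_blues_scale(midi_notes, root=0):
    """True if every note's pitch class lies in the blues scale on root."""
    return all((n - root) % 12 in BLUES for n in midi_notes)

# C blues fragment: C, Eb, F, Gb, G, Bb
print(in_blues_scale([60, 63, 65, 66, 67, 70], root=60))  # True
# An E natural (major third) falls outside the set
print(in_blues_scale([60, 64], root=60))                  # False
```

Since membership only depends on the pitch class relative to one fixed root, the melody never has to be re-checked against each chord change, which is presumably the appeal for a generator.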

I can see the logic behind that, but without any sort of tension/release between the melody and chords, it still sounds just as dissonant to my ears.

Granted, it will never win any awards. Part of the choice behind it might have been the desire to "ship" it before Christmas.

It brings this recent story to mind: https://news.ycombinator.com/item?id=13033299

Better than The Christmas Shoes.

This is not even a proper song.

We have a very long way to go it seems. That was almost nonsensical.

This was a triumph!

I'm making a note here:

Huge success!


Show me the same algorithm generating songs in a different genre from different images and I'll be impressed.

I'm from the project team. This is a very interesting point.

While it is easy to crawl many songs from the internet, it is a little harder to gather the same amount but with proper genre/style/etc labels, although it is not impossible.

For now there's only one genre, which we call "the genre of whatever is on the internet". So whatever music files are on there, many of them quite "crappy", were used to train the model. There are also many other open problems in how to better structure and flavor the composition.

This is just a very early-stage attempt, as a CS student's fun side project. We are working with people with real musical talent now and hoping to make better songs in the next version.

Great job! What sort of resources did you use (Time, Processing power etc) to train this?

Edit: Found it in the article.

I mean, where does the Christmas element come from? The image alone, the music it was trained with, or is it somehow hardcoded in the algorithm?

The Christmas element comes from 1. the image, and 2. a 4800-dimensional RNN sentence encoding bias generated from ~30 Christmas songs.

Not sure how to hardcode this.

I'm interested that you used some Christmas songs as training (which wasn't obvious from what I read of the paper). Were they pop songs, traditional, or a mix?

Further to my comment up there[0] - and I don't wish to sound a grinch because this is a really cool project - but would I be right in thinking you spent more time on the image description than the music?

I saw that you specify a scale for the melody. Would it be possible either to use a mode to generate the accompaniment around, so that the melody can move diatonically without risking too many clashes, or to allow the melody to follow the chord sequence somehow?

Again, sorry if I sound too critical. It's a really awesome thing you've done, and I'm just a guy that listens to the music instead of the lyrics.

[0] https://news.ycombinator.com/item?id=13079355

Thanks for the comments! Are you asking about the lyrics or the music generation?

For lyrics, we actually didn't train on Christmas songs. Training data was a large collection of romance novels. (See neural-storyteller by Jamie Kiros). The "Christmas trick" we did was applying a "style shifting" after image captioning and before lyrics generation, where the shifting vector was obtained from ~30 Christmas songs.
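The style-shifting step described here, in the spirit of neural-storyteller, amounts to subtracting the mean encoding of the source-style corpus and adding the mean encoding of the target-style corpus. A sketch (the 4800 dimensions match the figure mentioned in the thread, but the data and function name are made up for illustration):

```python
import numpy as np

D = 4800  # skip-thought-style sentence encoding size from the thread

def style_shift(caption_vec, caption_corpus, target_corpus):
    """Move an encoding from the caption style toward the target style.

    caption_vec: (D,) encoding of one image caption.
    caption_corpus, target_corpus: (n, D) arrays of sentence encodings.
    """
    return caption_vec - caption_corpus.mean(axis=0) + target_corpus.mean(axis=0)

# Synthetic stand-ins for real encodings
rng = np.random.default_rng(0)
caption_vec = rng.normal(size=D)
captions = rng.normal(size=(100, D))            # e.g. caption-style encodings
christmas = rng.normal(loc=0.1, size=(30, D))   # e.g. ~30 Christmas-song encodings

shifted = style_shift(caption_vec, captions, christmas)
```

The shifted vector would then be decoded by the lyrics generator, so the output keeps the caption's content but drifts toward the target corpus's style.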

For the music generation: although we are aware of some basic music performance rules, such as the melody following the chord, we actually didn't add these kinds of rules.

For the blues scale, here's the thing. I didn't really know much about music, so I spent several hours reading things like basicmusictheory.com. It happened to introduce blues, so we just used it. But you're right on the relevance between blues and pop: after we ran the scale-checking code, only a very small percentage of our pop music collection turned out to be blues.

Thanks for the reply! I was concentrating on the music specifically. I thought the lyrics generation was really enjoyable.

I was asking more if you'd used any traditional carols, as they can have a more definitively "christmassy" sound than a pop song with sleighbells laid over the top.

Overall I meant that I think the music would be more convincing either following the chords in the melody, or sticking to a single mode for both melody and accompaniment.
