Hacker News new | past | comments | ask | show | jobs | submit login
Inceptionism: Going Deeper into Neural Networks (googleresearch.blogspot.com)
797 points by neurologic on June 18, 2015 | hide | past | web | favorite | 156 comments



Worth reading the comments too.

One from Vincent Vanhoucke: "This is the most fun we've had in the office in a while. We've even made some of those 'Inceptionistic' art pieces into giant posters. Beyond the eye candy, there is actually something deeply interesting in this line of work: neural networks have a bad reputation for being strange black boxes that that are opaque to inspection. I have never understood those charges: any other model (GMM, SVM, Random Forests) of any sufficient complexity for a real task is completely opaque for very fundamental reasons: their non-linear structure makes it hard to project back the function they represent into their input space and make sense of it. Not so with backprop, as this blog post shows eloquently: you can query the model and ask what it believes it is seeing or 'wants' to see simply by following gradients. This 'guided hallucination' technique is very powerful and the gorgeous visualizations it generates are very evocative of what's really going on in the network."


That's not really fair though, since any deterministic function can be "back-propagated" using the chain rule (or even automatic differentiation), even though it's not really necessary for simpler models such as GMM and SVM since there are much easier ways of inspecting them. Also, I don't feel single input/output pairs really describe the function itself -- knowing cos(0) = 1 doesn't reveal much about the cosine function, even though it's a local maximum. Maybe one could extend the technique to show transitions (morphing) between classes as video?



hey, where'd you get that?


Here's it applied to each frame of a video https://plus.google.com/photos/+MikeJurney/albums/6161722239...


I would absolutely love access to this, but it's showing as locked.


There is now a live stream of it producing video: http://www.twitch.tv/317070


Looks like they didn't intend to release it to the public. Should have saved it at the time. It was a reaction gif where each frame they applied that algorithm. It was super creepy, with eyeballs appearing and disappearing everywhere.

Perhaps they want to keep it private because they plan to announce more later. So we might see it again.


I can't really put my finger on it, but this is the most exciting thing that I've seen on HN in years.


Perhaps the argument should be steelmanned in that we should generally avoid using algorithms which are so complex that they aren't glass boxes. I doubt the idea to "simply follow gradients" can prove neural networks to be glass boxes because the output of that is still too complex. And we are clearly onto something here. If we can generate artificially hallucinated pictures today, it is not unreasonable to assume that computers will be able to hallucinate entire action sequences (including motor programs and all kinds of modalities) in a decade or two. Combining such a hallucination technique with reinforcement learning might be a key to general intelligence. I think it is highly unethical that there is almost no democratic control over what is being developed at Google, Facebook et al. in secrecy. The most recent XKCD comic is quite relevant: http://xkcd.com/1539/


> I think it is highly unethical that there is almost no democratic control over what is being developed at Google, Facebook et al. in secrecy. The most recent XKCD comic is quite relevant: http://xkcd.com/1539/

I consider myself to be very Left of center, but, I can't imagine what form of 'democratic control' you think is necessary over the research that Google and Facebook does.

I do not fault Google or Facebook for planning on time-scales longer than most governments. Governments ought to be doing this level of long-term planning, but are not (at least publicly)


It's a tricky ethical area. The Google post cites several research papers that seemt o provide more than enough information to replicate these results or get similar ones, which is good, because I think everyone should be able to explore these tools - I stick by my view from yesterday that this may be a scientific breakthrough.

At the same time, I can see the basis for some anxiety, because it's not hard to imagine proprietary research going a few steps farther and developing some sort of general intelligence or even a limited but extremely high-powered intelligence that would confer an overwhelming commercial advantage, and/or a political one. Suppose, as an exercise, that one developed an algorithm to maximize persuasiveness by first leading readers/listeners into a quiescent, semi-hypnotic state and then making your commercial or political pitch. There's certainly a potential for abuse.

In Europe this sort of thing tends to bring up the precautionary principle, the idea that you shouldn't do something without oversight and demonstrated minimization of risk. I think that's highly limiting, but expect some pushback against Google over this. Of course, I don't think democracy is all that wonderful either but then I'm a bit of a misanthrope.


> The Google post cites several research papers that seem to provide more than enough information to replicate these results or get similar ones, which is good

I agree that it is good, but even though the scientific theories and algorithms seem to be "open", having access to both the computing power and data-sets of Google, is not.

So one could replicate these experiments, but not quite on the scale that Google does. I'm not at all sure if it's practically possible for a single (really clever) person with a high-end CPU/GPU machine (and possibly some $$$ for Cloud Computing instances), to replicate something similar to the results in this blogpost.

The recognition nets used in the blogpost seem to be trained on a tremendously high number of training examples, to give the ability to "hallucinate" (or classify) such a great variety of animal species, for instance.


I'm not at all sure if it's practically possible for a single (really clever) person with a high-end CPU/GPU machine (and possibly some $$$ for Cloud Computing instances), to replicate something similar to the results in this blogpost.

It's very possible.

GoogLeNet[1] is an example in Caffe: BVLC GoogLeNet in models/bvlc_googlenet: GoogLeNet trained on ILSVRC 2012, almost exactly as described in Going Deeper with Convolutions by Szegedy et al. in ILSVRC 2014. (Trained by Sergio Guadarrama @sguada)

[1] http://caffe.berkeleyvision.org/model_zoo.html


I'm just questioning whether an autopilot with a profit maximization heuristic is the best tool to guide technological progress. With democratic control I don't necessarily mean our current democratic systems but any kind of decentralization of decision making by voting. Yeah, I know that's vague, but given what appears be at stake it seems unreasonable not to consider alternatives.


I'm reading Rationality: From AI to Zombies, and it goes through exactly this argument. Here's the original post:

http://lesswrong.com/lw/jb/applause_lights/


Fine, my suggestion to solve this problem democratically was an applause light. It was an unfinished thought and a call for action. I agree that this didn't convey any new information, I just wanted to express my distrust towards these kinds of appeasement statements from people who are working on these technologies. Being able to peak at different layers of a CNN doesn't recover NNs from the fact that they are in many regards opaque to us (and will possibly always be due to their complexity). Statements of the sort "I have never understood those charges" makes it sound like they are pretty much ignorant of the potential risks associated with not knowing exactly what your program does (they can perhaps be with regards to current technology, but I could imagine that more advanced systems can potentially arrive sooner than overall anticipated).


Well, the problem with "avoid using algorithms which are so complex that they aren't glass boxes" is that for quite a few problems, the simple choice is to either use a machine learning solution that will essentially give you a black-box or to use an understandable algorithm that doesn't get nowhere near state-of-art accuracy and is practically useless. Speech recognition, for example.


> Combining such a hallucination technique with reinforcement learning might be a key to general intelligence.

Knowing that the most common parallel effect of induced hallucination via psychotropics is ego-loss (complete loss of subjective self-identity) [0], maybe they need to try completely inverse processes in order to create a sense of ego in a machine... Because what's real intelligence but one's sense of self?

[0] https://en.wikipedia.org/wiki/Ego_death


I would argue that a sense of experience is a necessary precursor for that, and also that intelligence and consciousness are two different things, although (if I read you right) the latter certainly informs the former. Barry Sanders' A is for OX has many well-sourced musings on the emergence of consciousness as a product of literary capability vs. a purely oral tradition which you might find interesting, and of course I think everyone needs to read Jaynes, Dennett, Hofstatder on these topics.


Great more "ZOMG I'm skirred of AI" FUD. Stop being so afraid of the future.


There are plenty of reasons to be concerned about AI. Stop dismissing arguments because you don't like their conclusions. There is no law of the universe that the future can't suck.


The reason they look so 'fractal-like' (e.g. trippy!) is because they actually are fractals!

In the same way a normal fractal is a recursive application of some drawing function, this is a recursive application of different generation or "recognition -> generation" drawing functions built on top of the CNN.

So I believe that, given a random noise image, these networks don't generate the crazy trippy fractal patterns directly. Instead, that happens by feeding the generated image back to the network over and over again (with e.g. zooming in between).

Think of it a bit like a Rorschach test. But instead of ink blots, we'd use random noise and an artificial neural network. And instead of switching to the next Rorschach card after someone thinks they see a pattern, you continuously move the ink blot around until it looks more and more like the image the person thinks they see.

But because we're dealing with ink, and we're just randomly scattering it around, you'd start to see more and more of your original guess, or other recognized patterns, throughout the different parts of the scattered ink. Repeat this over and over again and you have these amazing fractals!


Functional iteration is actually a fun way to draw fractal images in the plane. It goes like this (for anyone interested):

1) Pick a function f: R^2 ==> R^2

2) Pick a region of R^2 (this could be the unit square for instance).

3) For each point in the region do the following:

  a) Plug the point into f. Then plug f(x) into f. Then plug f(f(x)) into f, etc....

  b) The norm of f(f(...f(x)...)) will either run off to infinite or stay bounded.

  c) Record for the original point, x, how many iterations it took the process to run off to infinite (or the maximum if the sequence stayed bounded).
4) Paint by number after assigning a unique color to each possible number of iterations.

Here's the result of this process for the function:

f(x,y) = ( exp(x) * cos(y), exp(x) * sin(y) )

http://i.imgur.com/LZKavio.png


That's really cool.

The trippiness is further compounded by the rainbow-ish colour effect produced by the recursive function, which mimics the "shimmering" rainbow effect you commonly get around lights when tripping on LSD.

And also, when under the influence of various drugs you tend to see patterns, particularly faces, where there aren't any.


> The reason they look so 'fractal-like' (e.g. trippy!) is because they actually are fractals!

While I agree with your idea about fractals (though you're a bit vague on the math details to know for sure), I also believe that a large reason the images look so "trippy" is because there is some local contrasting effect at work, generating high-saturation rainbow fringes at the edges of details and features. You get loads of that on psychedelics as well.

I bet there's a pretty straightforward reason to explain these rainbow fringes, if one were to dig into it, though.

Another (unrelated) observation I had was the feeling that the neural net seemed to be reproducing JPEG-artifact type fringes in the images? Though it could be that I was just looking at scaled versions of already JPEG-compressed output images, the article doesn't provide details (if only they had been PNGs ...).


> The reason they look so 'fractal-like' (e.g. trippy!) is because they actually are fractals!

Do they exhibit self-similarity at different zoom levels?


I believe they do (in the sense that if you take one of these images, zoom it in, and run it through the algorithm again, it'll take the micro-features of the animals it hallucinated and hallucinate more animals on top of them).


Yes, you can see zoomed-in sections of the last image in the article, about halfway down this gallery https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliat...


These hair features that get re-interpreted as legs are particularly uncanny.


Tweak image urls for bigger images:

Ibis: http://3.bp.blogspot.com/-4Uj3hPFupok/VYIT6s_c9OI/AAAAAAAAAl... Seurat: http://4.bp.blogspot.com/-PK_bEYY91cw/VYIVBYw63uI/AAAAAAAAAl... Clouds: http://4.bp.blogspot.com/-FPDgxlc-WPU/VYIV1bK50HI/AAAAAAAAAl... Buildings: http://1.bp.blogspot.com/-XZ0i0zXOhQk/VYIXdyIL9kI/AAAAAAAAAm...

I'd love to experiment with this and video. I predict a nerdy music video soon, and a pop video appropriation soon after.


As linked in the last figure caption, there's a Google Photos gallery with high-resolution downloadable versions: https://goo.gl/photos/fFcivHZ2CDhqCkZdA



Some of them are truly beautiful


Had me sitting down. Felt mesmerizing, like a weird resonnance with my mind. This is how I imagined my brain working, patching bits of stimulus to recreate complex shapes fractally... Seeing it in pictures is ... just amazing.


This appears to be the source of the mysterious image that showed up on Reddit's /r/machinelearning the other day too:

https://www.reddit.com/r/MachineLearning/comments/3a1ebc/ima...


It reminds me of that parrot image that was said to crash human brains, only even more intense. I certainly experienced some effect, as while looking at it and trying to figure out what exactly it was, I felt my head heating up --- probably increased blood flow.


In case anyone hasn't read the story this is referring to: "BLIT" by David Langford. http://www.infinityplus.co.uk/stories/blit.htm


Ah, so it's Monty Python's funniest joke in the world in visual form: https://m.youtube.com/watch?v=ienp4J3pW7U


There's also a few sequels:

http://ansible.uk/writing/c-b-faq.html

http://www.lightspeedmagazine.com/fiction/different-kinds-of...

And What Happened at Cambridge IV, which I can't find online.


Here's What Happened at Cambridge IV:

https://books.google.co.uk/books?id=5d9hHvD-T7gC&lpg=PA264&o...

This last story makes the ML images even more disturbing!

Highly recommend these stories. They'd make a great black mirror episode.


Google Books previews appears to block pages randomly per user --- I don't see the entire text, unfortunately.


Wow. People thought the weirdest stuff about logic and brains back in the day.


I stopped after a few seconds of ocular recursion. A new category of warning label :)


That image was so striking and appeared to come out of nowhere. Was it some kind of marketing ploy do you think? I'm glad to have found the source anyway.


If I was part of the Google marketing machine and a developer wanted to make that nightmare image public, I'd veto them in a Mountain View minute.


Ya the PR team would have probably gone with this one:

https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliat...


That image is very unpleasant.

I get creeps from fractals, but now my whole body is itching. Humans are weird.


Two remarks

1) Captain obvious says: the "tripiness" of these images is hardly coincidental, these networks are inspired by the visual cortex.

2) They had to put a prior on the low level pixels to get some sort of image out. This is because the system is trained as a discriminative classifier, and it never needed to learn this structure, since it was always present in the training set. This also means that the algorithm is going to be ignoring all sort of structures which are relevant to generation, but not relevant for discrimination, like the precise count and positioning of body parts for instance.

This makes for some cool nightmarish animals, but fully generative training could yield even more impressive results.


This is brilliant! I did something similar when I was trying to learn about neural networks a long long time ago. The results were fascinating.

I was writing a neural network trainer - to recognize simple 2D images. This was on a 300MHz desktop PC(!) so the network had to be pretty small. Which implied that the input images were just compositions of simple geometric shapes - a circle within a rectangle, two circles intersecting, etc.

When I tried "recalling" the learnt image after every few X epochs of training, I noticed the neural network was "inventing" more complex curves to better fit the image. Initially, only random dots would show up. Then it would have invented straight lines and would try to compose the target image out of one and more straight lines.

What was absolute fun to watch was, at some point, it would stop trying to compose a circle with multiple lines and just invent the circle. And then proceed to deform the circle as needed.

During different runs, I could even see how it got stuck into various local minima. To compose a rectangle, mostly the net would create four lines - but having the lines terminate was obviously difficult. As an alternative, sometimes the net would instead try a circle, which it would gradually elongate, straighten out the circumference, slowly to look more and more like a rectangle.

I was only an undergrad then, and was mostly doing this for fun - I do believe I should have written it up then. I do not even have the code anymore.

But good to know googlers do the same kinda goofy stuff :-)


I would love to see what would come out of a network trained to recognize pornographic images using this technique. :)


I'm sure that would be extremely disturbing. It would probably look like H.R. Giger's art.


That's a really worthwhile question, actually, considering the degree to which the human form and stylished portrayals of sexuality underpin so much of art and design.


I think it would look like a Picasso: https://en.wikipedia.org/wiki/Les_Demoiselles_d'Avignon


I think doing the same with videos would be far more interesting... and probably creepier.


Does anyone have a good sense of what exactly they mean here:

>Instead of exactly prescribing which feature we want the network to amplify, we can also let the network make that decision. In this case we simply feed the network an arbitrary image or photo and let the network analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of features we generate depends on which layer we choose to enhance. For example, lower layers tend to produce strokes or simple ornament-like patterns, because those layers are sensitive to basic features such as edges and their orientations.

Specifically, what does "we then pick a layer and ask the network to enhance whatever it detected" mean?

I understand that different layers deal with features at different levels of abstraction and how that corresponds with the different kinds of hallucinations shown, but how does it actually work? You choose the output of one layer, but what does it mean to ask the network to enhance it?


The detection layer will detect very faint random signals. For example, if you have a unit that's supposed to detect dogs, it might be very faintly activated if by random chance there is a doggish quality to some part of the image. What they do is pick up that faint, random, signal and amplify it.

They say: oh you think that cloud is a tiny bit dog-like? Ok, well then find me a small modification to the image that would make it a little more dog like, then a little more, and so on.

Think of it as semantic contrast enhancement


So in concrete terms, does this mean that we show the network an image, choose one layer's output vector, and then back-propagate gradients to the image such that the direction of that vector stays the same, but the magnitude increases?


That is my understanding of the blog post, yes.

That plus a prior on the input pixels to keep it image-like.


They've written though that they have chosen a particular layer in the network, which reads like "independent of the output layer". Features in such a layer correlate with certain classes, but I don't think they have dealt with classes at all. If that's the case, then the question is how they've amplified the detected features.


Yes they play with various layers. Layers closer to the input act more like edge enhancers, while higher layers emphasize whole objects ("animal" enhancers). You get increasingly less syntactical and increasingly more semantic as you go deeper in the network.


So "lower" layers are closer to the input, while "higher" layers are closer to the output?


My understanding: when you're doing standard gradient descent, you push the error down through the layers, modifying the weights at each layer. Now, in "normal" NN training you stop at the input layer; it makes no sense to tweak the error at the input layer, right?

But what if you did the following: flow the error down from the outputs to the layer you're interested in, but don't modify the weights of any of the layers above it; just modify the values of this layer in accordance with the error gradient.

Added later: I think we should wait till @akarpathy comes along and ELI5's it to us.


My thought is that they basically run gradient descent on the image where the loss is the magnitude of one output plane in one of the layers of the neural network. Probably using gradient descent to push up the magnitude of one of the output plane layers or something like that.


The fractal nature of many of the "hallucinated" images is kind of fascinating. The parallels to psychedelic drug-induced hallucinations are striking.


>If we choose higher-level layers, which identify more sophisticated features in images, complex features or even whole objects tend to emerge. Again, we just start with an existing image and give it to our neural net. We ask the network: “Whatever you see there, I want more of it!” This creates a feedback loop: if a cloud looks a little bit like a bird, the network will make it look more like a bird. This in turn will make the network recognize the bird even more strongly on the next pass and so forth, until a highly detailed bird appears, seemingly out of nowhere.

I would say that's a pretty good description of how many psychedelic hallucinations unfold. They start off as noise in your vision which turn into loose forms which turn into geometric patterns which turn into etc...


Some of these neural nets need to drink some orange juice and lie down for a few hours.


I'm reminded of Alex Grey's visionary art https://caudallure.files.wordpress.com/2011/09/1217891347094....


Link's broken.



OT, but this is originally a "3D" image.

It can be found in the cover art of "10000 Days", an album from American metal band Tool. The original box comes with two magnifying lenses like this:

http://s21.photobucket.com/user/Stonergrunge/media/Mis%20cos...

This (and others in the cover) look stunning through these lenses.


You can see the 3D effect of this subset of the image here: https://imgur.com/04yBHN4


Yes, this is basically artificial pareidolia. The interesting thing is that this technique would work on any kind of similarly structured neural net.


If we want to progress AI, I seriously recommend these researchers try acid or magic mushrooms.


I remember reading that drugs block certain parts of the optical nerve, which itself is a bit like fractal.

Also, personally, these pictures looked 'trippy' due to color palette that reminds hippy raves, rather than fractals.


I'll repeat what I posted on facebook because I thought it was clever: "Yes, but only if we tell them to dream about electric sheep."

So, tell the machine to think about bananas, and it will conjure up a mental image of bananas. Tell it to imagine a fish-dog and it'll do its best. What happens if/when we have enough storage to supply it a 24/7 video feed (aka eyes), give a robot some navigational logic (or strap it to someone's head), and give it the ability to ask questions, say, below some confidence interval (and us the ability to supply it answers)? What would this represent? What would come out on the other side? A fraction of a human being? Or perhaps just an artificial representation of "the human experience".

...what if we fed it books?


Neural networks are a relatively simple mathematical model. They don't actually "think" or have a conscience. Neural networks are also regularly fed books, in order to model some properties of natural language.

Here's a good introduction: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...


Neurons are also relatively simple, at least in comparison to the mind. I don't think the simplicity or complexity of the underlying model has much bearing on the higher-level properties of the network.

Now, this isn't to say that the kinds of neural networks we build today are conscious, but I don't think that's because they're based on a simple mathematical model; I think that's because they don't have the network-level properties that conscious humans do, for example, a self-representation.


It would have some kind of intelligence, at least able to recall information and form associations between things. But there's no reason to think that it would come out looking human. I mean you can show a dog lots and lots of images and it doesn't turn human.


Some comments seem to be appreciating (or getting disgusted by) the aesthetics but I think the "inceptionism" part should not be ignored:

We're essentially peeking inside a very rudimentary form of consciousness: a consciousness that is very fragile, very dependent, very underdeveloped, and full of "genetic errors". Once you have a functioning deep learning neural network, you have the assembly language of consciousness. Then you start playing with it (as this paper did), you create a hello world program, you solve the factorial function recursively, and so on. Somewhere in that universe of possible programs, is hidden a program (or a set of programs) that will be able to perform the thinking process a lot more accurately.


We're essentially peeking inside a very rudimentary form of consciousness

Blatant sensationalism. There is absolutely nothing here that would suggest consciousness. If you have a mask for matching images, you can reverse that mask and imprint it as an image. What we're seeing here is a more complicated version of the same process. Heck, look more closely. Some of those "building" images have obvious chunks of pedestrians embedded, probably because the algorithm was trained on tourist photos.

Is it interesting? Yes, from algorithmic point of view. Cool as hell. However, this has nothing to do with consciousness.

If anything, some of those images are just a more elaborate version of a kaleidoscope. It's not like they run a network and got a drawing. They were looking for a particular result, did post processing, did pre-processing and tweaked the intermediate steps (by running them multiple times until the image looked interesting). Finally, we as viewers do our share of pattern matching, similar to how we see patterns in Rorschach inkblots. And there are captions that frame what we see and "guide" us to recognizing the right objects.


> Blatant sensationalism. There is absolutely nothing here that would suggest consciousness.

Putting biological consciousness on a pedestal might be blatant sensationalism itself. By consciousness, I specifically mean the behavioral capacity of general intelligence, nothing more. If by consciousness you mean subjective character of experience then yes, there are serious issues with resolving the mind-body problem. But functionally speaking, our brains exist in a physical universe, are massively parallel, and do stochastic computations. Deep learning systems share all three of these traits except that the scale is about 3-5 order of magnitudes smaller (things like incorporating time, biological impulses are missing but if someone claims those features are going to be the dealbreaker then maybe we can have a discussion). And the scale difference is shrinking at lightning speed.

I'm not claiming DNN's are the end-all-be-all of an upcoming general electronic intelligence. But they seem to be doing mind-blowing stuff every few weeks, and it seems we've stumbled upon a radically new aspect of computation.


You did not address a single specific points I made.

I specifically mean the behavioral capacity of general intelligence

Any image-matching algorithm can be used to generate images. A simple color-matcher can be used to create very impressive things if plugged into a genetic algorithm, but no one claims it's conscious or intelligent. None of what you wrote here points out some fundamental differences between this and other image-generating techniques used before. You're simply trivializing what it means to be conscious or intelligent to the point that word is no longer useful.

I will consider an AI to be "generic" when it is able to apply training from one domain to an entirely different domain without any manual "mapping" from humans. For example, being able to decently play checkers after learning chess and being given a description of checker's rules. Applying training in image domain to image domain with tons of manual tweaking might be interesting and useful, but it's hardly qualifies as "mind-blowing".


> will consider an AI to be "generic" when it is able to apply training from one domain to an entirely different domain without any manual "mapping" from humans. For example, being able to decently play checkers after learning chess and being given a description of checker's rules.

Want to butt in to this discussion, and post this link here for anybody who hasn't seen/heard about this amazing development from a while ago: http://robohub.org/artificial-general-intelligence-that-play...


If anything, some of those images are just a more elaborate version of a kaleidoscope.

I put it to you that your 'self' is not much more complicated than any other elaborately iterated strange loop.


This is one of the most astounding things I've ever seen. Some of these images look positively like art. And not just art, but good art.


This may sound ridiculous, but I think this has the potential to be a development as foundation-shaking as Modernism itself. There has been plenty of algorithmically-derived art over the past 30 years, but generative pieces inevitably look like math – they are interesting curiosities, sometimes quite beautiful, but they don’t challenge the mind like any of the major movements of the past 150 years.

This is different because, while still just math, it’s modeled on the processes of human perception. And when successfully executed, it plays on human perception in ways that were formerly the exclusive domain of humans – Chagall, DiChirico, Picasso – gifted with some sort of insight into that perception.

Future iterations of this kind of processing, with even higher-order symbol management could get really weird, really fast.


I'm blown away by this "guided hallucination" technique. It's not a big oversimplification to describe to the layperson as: enter images into neural network; receive as output artwork representing the essence of the images.


I felt the same. I think the main aspect about these images that makes me like them is how everything feels connected, which, is what the AI is trying to find: connections. Honestly, can anyone tell me where I could order large prints of some of these?


Agreed. These are just amazing. Someone linked above to the source images on Google Photos, but even those aren't especially high-res. Would be awesome if Google released the originals.


> And not just art, but good art

Makes me wonder what passes as good art nowadays. But yeah some of the renderings were particularly aesthetic.


These images are remarkably similar to chemically-enhanced mammalian neural processing in both form and content. I feel comfortable saying that this is the Real Deal and Google has made a scientifically and historically significant discovery here. I'm also getting an intense burst of nostalgia.


The level of resemblance with a psychotropics' trip is simply fascinating. It's definitely really close to how our brain reacts when is flooded with dopamine + serotonin.

I wonder if the engineers at Google can make the same experiment with audio... It'll be funny to listen the results.


Might be interesting, although I've always liked the visuals* of psychedelica a lot more than the audio effects (which in my experience, mostly tends to make sounds be perceived really "loud" and "close", rather than "trippy"--unless that's what you associate with "trippy" audio, of course). Dunno if my experience is typical, obviously.

* also the particular mind-altering effects, which are hard to describe


I'd be interested to see if the results end up looking something like DIPT which is the only known mainly audio hallucinogenics.


I'm starting to come around to sama's way of thinking on AI. This stuff is going to be scary powerful in 5-10 years. And it will continue to get more powerful at an exponential rate.


You are not alone in your fears. Others have been ringing the alarm for some time. Nick Bostrom's Superintelligence is a good reference.


Facial-recognition neural nets can also generate creepy spectral faces. For example:

https://www.youtube.com/watch?v=XNZIN7Jh3Sg

https://www.youtube.com/watch?v=ogBPFG6qGLM

(Or if you want to put them full-screen on infinite loop in a darkened room: http://www.infinitelooper.com/?v=XNZIN7Jh3Sg&p=n http://www.infinitelooper.com/?v=ogBPFG6qGLM&p=n )

The code for the 1st is available in a Gist linked from its comments; the creator of the 2nd has a few other videos animating grid 'fantasies' of digit-recognition neural-nets.


The one generated after looking at completely random noise on the bottom row, second from the right:

http://googleresearch.blogspot.co.uk/2015/06/inceptionism-go...

Reminds me very heavily of The Starry Night https://www.google.com/culturalinstitute/asset-viewer/the-st...

Lovely imagery.

I never had much luck with generative networks. I did some work putting RBMs on a GPU partly because I'd seen Hinton talk showing starting with a low level description and feeding it forwards, but always ended up with highly unstable networks myself.



That's great, thank you.


Neural networks are notoriously difficult to train due to the large number of hyper-parameters that need to be tuned. If your network never converged, it's possible your learning rate was too high, so it kept overshooting the minima of the loss function.


Quite possibly. The classification results were great, just wasn't good when trying to run things back through the network repeatedly. I did have issues that the learning rates reported in some of the original papers didn't match the ones in the released code.


I'll be the first to say it. It looks like an acid/shroom trip.


Maybe there's something to do with how our brains interpret information differently when under the influence of psychoactive drugs.

I've been looking at Aldous Huxley's "Doors of Perception" and other psychonautic works recently and he hypothesizes that these sorts of drugs filter out the usual signals from the CNS that shut out the parts of perception that are not important for you to receive for survival.

It might be some great leap of armchair psychology, but I think we're due for another psychedelic revival, especially considering the new advances in synthetic psychedelics, legalization of more harmless recreational drugs, new tests in medical research using MDMA/LSD/Psilocybin, and the cultural shift away from the 'War on drugs'.


Really cool. You could generate all kinds of interesting art with this.

I can't help but think of people who report seeing faces in their toast. Humans are biased towards seeing faces in randomness. A neural network trained on millions of puppy pictures will see dogs in clouds.


That's essentially precisely what's happening here. You can see in the different pictures where different sets of training data were used---buildings, faces, animals.

Give the machine millions of reference images to work from and then tell it to find those images in noise, and it will succeed (because it literally can't "imagine" anything else for the noise to be).


Now I'm thinking about all those google cars, quietly resting in dark garages, dreaming about streets.


I'd really like to see what an Electric Sheep looks like. Maybe if they did a collaboration with the Android team?



Don't you mean the Tyrell corporation?


Request for startup: Neural Network on demand artist.

E.g. SaaS that takes your images and use neural network transformations. Can you make a portrait of my that I look like king.


Understanding what is going on in a neural network (or any other kind of machine learning mechanism) when it makes a decision can be important in real world applications.

For example, suppose you are a bank and you have used built a neural network to decide if credit applications should be approved. The lending laws in the US require that if you reject someone you tell them why.

Your neural network just gives a yes/no. It doesn't give a reason. What do you tell the applicant?

I have an idea how to deal with that, but I have no idea if it would satisfy the law. My approach is to run their application through multiple times, tweaking various items, until you get one that would be approved. You can then tell them it was that item that sunk them. For instance, suppose that if you raise their income by $5k, you get approval. You can tell them they were rejected for having income that is too low.


I have an idea for a company related to this concept, but for hiring and job training.


While I think this is beautiful, conceptually, I really am a bit terrified of the potential of this in reverse (the neural network for processing/understanding an image). With Google releasing their 'Photos' app, this network is about to get a direct pipeline for machine learning imagery to accelerate everything – my main fear would be the potential for this technology to be employed by weaponized drones able to scan a scene (with, eventually, incredibly high resolution cameras and microphones that far surpass human capability) and identify every single object/person in realtime (also at a rate that humans are incapable of).

Of course, there is great utility to be had as well, it just scares me to think about what could be done with this technology, in a mature form, if used for violent purposes.


This will happen for sure. Such super-perceptive computers will oversee our every movement.

Computers can already understand our emotions in writing, voice and from the expression on our faces, they can also estimate pose and understand your movements. They can label thousands of kinds of objects. And they're just starting.

They can also build neural nets 10x smaller by compressing a larger neural net while maintaining most of accuracy. That means once a problem such as vision or speech has been solved with a huge net, it can be transferred in a smaller, more efficient net.


> They can also build neural nets 10x smaller by compressing a larger neural net while maintaining most of accuracy. That means once a problem such as vision or speech has been solved with a huge net, it can be transferred in a smaller, more efficient net.

This is known as "dark knowledge". Slides from Geoff Hinton: http://www.ttic.edu/dl/dark14.pdf


Am I the only one who found those images somewhat disturbing? I wonder if they're triggering something similar to http://www.reddit.com/r/trypophobia


This unpublished one is incredibly creepy. https://i.imgur.com/6ocuQsZ.jpg



Definitely this thing likes dogs =)


I'm only mildly trypophobic but those images did have a minor effect - and I have a possible hypothesis for why it happens: what these images and trypophobia-triggering ones all have in common is a huge number of edges of various shapes and sizes, and it's this "edge overload" stimulating many more neurons than usual that's causing the disturbance. I find that the repetitive, but not-quite-the-same patterns like (organic) holes or other curvy shapes have the greatest effect; in contrast, straight lines don't do much. This makes sense since straight lines probably only trigger neurons that detect one direction, but curves have many "directions" to them.


I've always wondered about this! Back when I was a teenager I remember playing around with one of the Kai's Power Tools to generate fractal-like images (I forget exactly which one it was, but it had a bunch of presets that would generate cell-like textures).

Thing was, I found them MASSIVELY anxiety-provoking and have never been able to figure out why. They'd literally make me panicky.

These images are doing the same. Even now, just thinking about them, my stomach is fluttering. It's something about the way they're organic, but I don't know what it is. It's definitely nothing rational.

I've never heard of trypophobia; this does seem like it. I wonder if it's closely related to the feeling of disgust; somehow an evolved response to keep us away from rotten food perhaps? Things like bacteria growing on bread, or beehives. Or any food that's started decomposing.

There's a cool documentary I remember seeing called "How Art Made The World". One of the things it talks about is how we're driven to make more-than-perfect representations of things in our art. Say we find something in the real world aesthetically pleasing. With art, we can take that aesthetically pleasing stimulus and exaggerate it, resulting in the art being more pleasing than anything in the real world.

I wonder if that's what happens here, with the almost-organic images somehow being 'hyper-disgusting' as they coincidentally line up with a hyper-exaggerated version of a stimulus that on the scale of disgusting might be 'slightly unappealing' in the form that we'd encounter it in the real world.


I found an image which stimulates the edge detectors in people's brains, far more than a natural image. It tends to cause people to feel weird and not want to look at it, in a way they can't quite describe. And making the image flash and rotate rapidly made it far worse.

The image: https://i.imgur.com/Zihujue.gif

And it flashing: http://makeagif.com/i/hHzAiq


My wife found them somewhat too intense to take in rapidly. If viewing these makes you uncomfortable you should probably steer clear of psychedelic drugs, which tend to induce this sort of imagery for hours on end; as you can imagine this would be mentally tiring at the best of times.


These paintings remind me of Louis Wain's work when he was mentally ill.

Which makes me wonder, are these sophisticated neural nets mentally ill, and what would a course of therapy for them be like?


This sort of 'illness' should be supported, rather than treated.


Am I the only person who is not entirely happy about the overuse of the pop-culture term 'inception' for everything that is remotely nested, recursive or strange-loop-like?

   In this paper, we will focus on an efficient deep neural network 
   architecture for computer vision, codenamed Inception, which derives 
   its name from the Network in network paper by Lin et al [12]
   in conjunction with the famous “we need to go deeper” internet meme [1]


I haven't personally noticed any buzz-wordiness about that term lately. Maybe I'm just not looking at the same stuff as you.


In other words, you haven't trained your personal neural net with enough Inception-meme instances to be finding it everywhere in the noise? ;)


There is at least one major neuroscientific paper with similar title http://rstb.royalsocietypublishing.org/content/369/1633/2013...


Do computers dream about fractal antilopes ?http://i.imgur.com/jZtbz7f.png


This was recently posted to HN, http://tjake.github.io/blog/2013/02/18/resurgence-in-artific...

Which mentions running the NN in reverse, quote

    By far the most interesting thing I’ve learned about Deep Belief 
    Networks is their generative properties. Meaning you can look 
    inside the ‘mind’ of a DBN and see what it’s imagining. Since a 
    deep belief networks are two-way like restricted boltzmann 
    machines you can make hidden inputs generate valid visual 
    inputs. Continuing with our handwritten digit example you can 
    start with the label input say a ‘3’ label and activate it then 
    go reverse through the DBN and out the other end will pop out a 
    picture of a ‘3’ based on the features of the inner layers. This 
    is equivalent to our ability to visualize things using words, go 
    ahead imagine a ‘3’, now rotate it.


It's worth pointing out that a naive bayes or k-nearest-neighbor classifier can similarly generate examples of valid inputs.


I understand the theory behind neural networks quite well, but am not so clear on how you feed them with images, eg how do you build a network that can process megapixel images of random aspect ratios or audio files of predictable length?

I', trying to get a sense of how much effort would be involved to replicate these results if Google isn't inclined to share its internal tools, to do a neural network version of Fractint as it were, which one could train oneself. I have no clue which of the 30-40 deep learning libraries I found would be best to start with, or whether my basic instinct (to develop a node-based tool in ab image/video compositing package) is completely harebrained.

Essentially I'm more interested in experimenting with tools to do this sort of thing by trying out different connections and coefficients than in writing the underlying code. Any suggestions?


You could try Torch libraries. There are a few examples on how to (almost) replicate some of Google's neural network models on Imagenet.

Check https://github.com/torch/torch7/wiki/Cheatsheet#demos.


Does this work well enough on a modern desktop PC without having access to Google's computing resources?


No idea, but I'm sure their enormous infrastructure helps a lot. Maybe it could be done with some sort of BOINC-type platform.


Very cool. I wonder if there's some example code on Github to generate images like this?


Pretty interesting and beautiful. If I was still at my old job I'd love to try and see how helpful this is in teaching NN. My first instinct is that it would be really valuable because they tend to be blackboxy/hard to conceptualize.


They are doing nothing but starting with random noise, and then learning a representation of an image that will maximize the probability in the output layer (by suggesting to the network that this noise should have actually been recognized as a banana or what have you) and back propagating changes into the input layer. Essentially, this has been happening since 2003 in the natural language processing world where we learn 'distributed representations' of words by starting with random representations of words, and learning them by context by back propagating changes into the input layer. Very cool though.


This is fascinating. And important. We need better ways to see what neural nets are doing. At least for visual processing, we now have some.

This might be usable on music. Train a net to recognize a type of music, then run it backwards to see what comes out.

Run on the neural nets that do face popout (face/non face, not face recognition), some generic face should emerge. Run on nets for text recognition, letter forms should emerge. Run on financial data vs results ... optimal business strategies?

But calling it "inceptionism" is silly. (Could be worse, as with "polyfill", though.)


Run on the neural nets that do face popout (face/non face, not face recognition), some generic face should emerge

https://en.wikipedia.org/wiki/Eigenface


I don't understand what kind of NN they used on the painting and the photo of the antelopes(?). What was it pre-trained to recognise?

EDIT: in clarification, to pick out abstract features of an image, it must obviously be trained on many images. I'm curious about how it picked out seemingly unique characteristics of the painting, and what images it was trained on to get there.


This story has been picked up by The Guardian: http://www.theguardian.com/technology/2015/jun/18/google-ima...


Has anyone seen an explanation for the why the images end up with that colour palette?


Probably an artifact of the color space used.


Really nice. I'd be interested in seeing a more in-depth scientific description of how these images were actually generated. Are there any other publications related to this work?


There's four papers linked in the article. The last three (see below) were pretty good, haven't read the first.

http://arxiv.org/pdf/1412.0035v1.pdf

http://arxiv.org/pdf/1506.02753.pdf

http://arxiv.org/pdf/1312.6034v2.pdf


It would be interesting to know what happens if instead of tweaking it to better match a banana, they tweaked it to better match a banana and NOT match everything else.


Are the coefficients of the neurons inside the layers to "trigger" just multiplied by some constant? Not cited in the original article apparently.


Wow. Funny how those images look like dreamscapes when the trained neural nets process random noise...

Kinda make me contemplate more own conscious experience :)


a really early version of this: http://draves.org/fuse/ published as open source in the early 90s. not NN but does have the same image matching/searching.


Fascinating link, but that was arguably different: In the case of the Google links, the neural network was built for other uses, and the "image fusion" is only a side effect... It is a sort of "proof" that some really interesting things are happening behind the scenes.

In the older approaches, the image fusion was the primary intent of the system. Still very cool, but much less impressive IMHO.


I am curious what do they see if feed with a screenshot of their own code.


If anyone wants to send this to their non-dev friends, here's the write-up I sent to mine!

https://medium.com/@stripenight/seeing-how-computers-might-t...

---

tldr: To figure out how computers "think", Google asked one of its artificial intelligence algorithms to look at clouds and draw the things it saw!

There's this complex Artificial Intelligence algorithm called a neural network ( https://en.wikipedia.org/wiki/Artificial_neural_network ). It's essentially code which tries to simulate the neurons in a brain.

Over the last few years, there have been some really cool results, like using neural networks to read people's handwriting, or to figure what objects are in a picture.

To start your neural network, you give it a bunch of pictures of dogs, and tell it that those pictures contain dogs. Then you give it pictures of airplanes, and say those are airplanes, etc. Like a child learning for the first time, the neural network updates its neurons to recognize what makes up a dog or an airplane.

Afterwords, you can give it a picture and ask if the pic contains a dog or an airplane.

The problem is that WE DON'T NOW HOW IT KNOWS! It could be using the shape of a dog, or the color, or the distance between it's legs. We don't know! We just can't see what the neurons are doing. Like a brain, we don't quite know how it recognize things.

Google had a big neural network to figure out what's in an image, and they wanted to know what it did. So, they gave the neural net a picture, but stopped the neural net at different points, before it could finish deciding. When, they stopped it, they asked it to "enhance" what is just recognized. Eg. if it just saw the outline of a dog, the net would return the picture with the outline a bit thicker. Or, if it saw the colors similar to a banana, it would return the picture with those colors looking more like a banana's colors.

This seems like a simple idea, but it's actually really complex, and really insightful! Amazing images here - https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliat...

Original article - http://googleresearch.blogspot.com/2015/06/inceptionism-goin...




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: