One from Vincent Vanhoucke: "This is the most fun we've had in the office in a while. We've even made some of those 'Inceptionistic' art pieces into giant posters. Beyond the eye candy, there is actually something deeply interesting in this line of work: neural networks have a bad reputation for being strange black boxes that are opaque to inspection. I have never understood those charges: any other model (GMM, SVM, Random Forests) of any sufficient complexity for a real task is completely opaque for very fundamental reasons: their non-linear structure makes it hard to project the function they represent back into their input space and make sense of it. Not so with backprop, as this blog post shows eloquently: you can query the model and ask what it believes it is seeing or 'wants' to see simply by following gradients. This 'guided hallucination' technique is very powerful and the gorgeous visualizations it generates are very evocative of what's really going on in the network."
Perhaps they want to keep it private because they plan to announce more later. So we might see it again.
I consider myself to be very left of center, but I can't imagine what form of 'democratic control' you think is necessary over the research that Google and Facebook do.
I do not fault Google or Facebook for planning on time-scales longer than most governments. Governments ought to be doing this level of long-term planning, but are not (at least not publicly).
At the same time, I can see the basis for some anxiety, because it's not hard to imagine proprietary research going a few steps farther and developing some sort of general intelligence or even a limited but extremely high-powered intelligence that would confer an overwhelming commercial advantage, and/or a political one. Suppose, as an exercise, that one developed an algorithm to maximize persuasiveness by first leading readers/listeners into a quiescent, semi-hypnotic state and then making your commercial or political pitch. There's certainly a potential for abuse.
In Europe this sort of thing tends to bring up the precautionary principle, the idea that you shouldn't do something without oversight and demonstrated minimization of risk. I think that's highly limiting, but expect some pushback against Google over this. Of course, I don't think democracy is all that wonderful either but then I'm a bit of a misanthrope.
I agree that it is good, but even though the scientific theories and algorithms seem to be "open", access to both the computing power and data-sets of Google is not.
So one could replicate these experiments, but not quite on the scale that Google does. I'm not at all sure if it's practically possible for a single (really clever) person with a high-end CPU/GPU machine (and possibly some $$$ for Cloud Computing instances), to replicate something similar to the results in this blogpost.
The recognition nets used in the blogpost seem to be trained on a tremendously high number of training examples, to give the ability to "hallucinate" (or classify) such a great variety of animal species, for instance.
It's very possible.
GoogLeNet is an example in Caffe: BVLC GoogLeNet in models/bvlc_googlenet: GoogLeNet trained on ILSVRC 2012, almost exactly as described in Going Deeper with Convolutions by Szegedy et al. in ILSVRC 2014. (Trained by Sergio Guadarrama @sguada)
Knowing that the most common parallel effect of induced hallucination via psychotropics is ego-loss (complete loss of subjective self-identity), maybe they need to try the completely inverse process in order to create a sense of ego in a machine... Because what's real intelligence but one's sense of self?
In the same way a normal fractal is a recursive application of some drawing function, this is a recursive application of different generation or "recognition -> generation" drawing functions built on top of the CNN.
So I believe that, given a random noise image, these networks don't generate the crazy trippy fractal patterns directly. Instead, that happens by feeding the generated image back to the network over and over again (with e.g. zooming in between).
Think of it a bit like a Rorschach test. But instead of ink blots, we'd use random noise and an artificial neural network. And instead of switching to the next Rorschach card after someone thinks they see a pattern, you continuously move the ink blot around until it looks more and more like the image the person thinks they see.
But because we're dealing with ink, and we're just randomly scattering it around, you'd start to see more and more of your original guess, or other recognized patterns, throughout the different parts of the scattered ink. Repeat this over and over again and you have these amazing fractals!
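The loop described above can be sketched in a few lines. This is a toy stand-in, not Google's code: the network's "enhance what you see" step is replaced by a hypothetical amplify() that just exaggerates deviations from the mean, and the zoom is a plain center-crop-and-upsample:

```python
import numpy as np

def amplify(img):
    """Stand-in for the network's 'enhance' step. Here it just
    exaggerates deviations from the mean; the real system follows
    the CNN's gradients instead."""
    return np.clip(img + 0.5 * (img - img.mean()), 0.0, 1.0)

def zoom(img, factor=0.9):
    """Crop the center and upsample back (nearest-neighbor), so each
    iteration 'dives into' the previous output."""
    h, w = img.shape
    ch, cw = int(h * factor), int(w * factor)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = img[y0:y0 + ch, x0:x0 + cw]
    ys = (np.arange(h) * ch / h).astype(int)
    xs = (np.arange(w) * cw / w).astype(int)
    return crop[np.ix_(ys, xs)]

rng = np.random.default_rng(0)
img = rng.random((64, 64))        # start from pure noise
for _ in range(20):               # feed the output back in, over and over
    img = zoom(amplify(img))
```

The structure of the loop (enhance, zoom, feed the result back in) is what produces the fractal-like self-similarity, regardless of what the enhancement step actually is.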
1) Pick a function f: R^2 ==> R^2
2) Pick a region of R^2 (this could be the unit square for instance).
3) For each point in the region do the following:
a) Plug the point into f. Then plug f(x) into f. Then plug f(f(x)) into f, etc....
b) The norm of f(f(...f(x)...)) will either run off to infinity or stay bounded.
c) Record, for the original point x, how many iterations it took the process to run off to infinity (or the maximum if the sequence stayed bounded).
Here's the result of this process for the function:
f(x,y) = ( exp(x) * cos(y), exp(x) * sin(y) )
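For anyone who wants to try it, here's a small runnable version of steps 1-3 using that f; the grid bounds, bailout radius, and iteration cap are arbitrary choices:

```python
import math

def f(x, y):
    # step 1: the example map f(x, y) = (e^x * cos(y), e^x * sin(y))
    ex = math.exp(x)
    return ex * math.cos(y), ex * math.sin(y)

def escape_time(x, y, max_iter=30, bailout=100.0):
    """Steps 3a-3c: iterate f and count how many iterations the orbit
    takes to leave the bailout radius (max_iter if it never does)."""
    for i in range(max_iter):
        if math.hypot(x, y) > bailout:
            return i
        x, y = f(x, y)
    return max_iter

# step 2: a grid of points over the square [-2, 2] x [-2, 2]
n = 40
grid = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
counts = [[escape_time(x, y) for x in grid] for y in grid]
```

Coloring each point by its count in `counts` gives the familiar escape-time fractal picture.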
The trippiness is further compounded by the rainbow-ish colour effect produced by the recursive function, which mimics the "shimmering" rainbow effect you commonly get around lights when tripping on LSD.
And also, when under the influence of various drugs you tend to see patterns, particularly faces, where there aren't any.
While I agree with your idea about fractals (though you're a bit vague on the math details to know for sure), I also believe that a large reason the images look so "trippy" is because there is some local contrasting effect at work, generating high-saturation rainbow fringes at the edges of details and features. You get loads of that on psychedelics as well.
I bet there's a pretty straightforward reason to explain these rainbow fringes, if one were to dig into it, though.
Another (unrelated) observation I had was the feeling that the neural net seemed to be reproducing JPEG-artifact type fringes in the images? Though it could be that I was just looking at scaled versions of already JPEG-compressed output images, the article doesn't provide details (if only they had been PNGs ...).
Do they exhibit self-similarity at different zoom levels?
I'd love to experiment with this and video. I predict a nerdy music video soon, and a pop video appropriation soon after.
And What Happened at Cambridge IV, which I can't find online.
This last story makes the ML images even more disturbing!
Highly recommend these stories. They'd make a great black mirror episode.
I get creeps from fractals, but now my whole body is itching. Humans are weird.
1) Captain obvious says: the "trippiness" of these images is hardly coincidental; these networks are inspired by the visual cortex.
2) They had to put a prior on the low-level pixels to get some sort of image out. This is because the system is trained as a discriminative classifier, and it never needed to learn this structure, since it was always present in the training set. This also means that the algorithm is going to ignore all sorts of structures which are relevant to generation but not to discrimination, like the precise count and positioning of body parts, for instance.
This makes for some cool nightmarish animals, but fully generative training could yield even more impressive results.
I was writing a neural network trainer - to recognize simple 2D images. This was on a 300MHz desktop PC(!), so the network had to be pretty small, which meant the input images were just compositions of simple geometric shapes - a circle within a rectangle, two circles intersecting, etc.
When I tried "recalling" the learnt image after every X epochs of training, I noticed the neural network was "inventing" more complex curves to better fit the image. Initially, only random dots would show up. Then it would invent straight lines and try to compose the target image out of one or more straight lines.
What was absolute fun to watch was, at some point, it would stop trying to compose a circle with multiple lines and just invent the circle. And then proceed to deform the circle as needed.
During different runs, I could even see how it got stuck in various local minima. To compose a rectangle, the net would mostly create four lines - but getting the lines to terminate was obviously difficult. As an alternative, sometimes the net would instead try a circle, which it would gradually elongate and straighten out, slowly coming to look more and more like a rectangle.
I was only an undergrad then, and was mostly doing this for fun - I do believe I should have written it up then. I do not even have the code anymore.
But good to know googlers do the same kinda goofy stuff :-)
>Instead of exactly prescribing which feature we want the network to amplify, we can also let the network make that decision. In this case we simply feed the network an arbitrary image or photo and let the network analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of features we generate depends on which layer we choose to enhance. For example, lower layers tend to produce strokes or simple ornament-like patterns, because those layers are sensitive to basic features such as edges and their orientations.
Specifically, what does "we then pick a layer and ask the network to enhance whatever it detected" mean?
I understand that different layers deal with features at different levels of abstraction and how that corresponds with the different kinds of hallucinations shown, but how does it actually work? You choose the output of one layer, but what does it mean to ask the network to enhance it?
They say: oh you think that cloud is a tiny bit dog-like? Ok, well then find me a small modification to the image that would make it a little more dog like, then a little more, and so on.
Think of it as semantic contrast enhancement
That plus a prior on the input pixels to keep it image-like.
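A minimal sketch of that idea, assuming a toy one-layer "network" so the gradient can be written by hand (the real system backpropagates through a chosen layer of a trained CNN such as GoogLeNet):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "a layer of the network": a fixed random linear map
# followed by ReLU. The real system uses a conv layer of a trained CNN,
# but the principle is identical.
W = rng.standard_normal((16, 64))

def layer(img):
    return np.maximum(W @ img, 0.0)       # ReLU activations

def objective(img):
    a = layer(img)
    return 0.5 * np.sum(a * a)            # "enhance whatever it detected"

def grad(img):
    # gradient of 0.5 * ||relu(W img)||^2 w.r.t. the pixels;
    # ReLU zeroes inactive units, so this is just W^T times the activations
    return W.T @ layer(img)

img = 0.1 * rng.standard_normal(64)       # start from a faint "image"
before = objective(img)
for _ in range(100):
    img += 0.01 * grad(img)               # gradient *ascent* on the pixels
after = objective(img)
```

Each step nudges the pixels in the direction that most increases the chosen layer's activations, which is the "enhance whatever it detected" operation; the pixel prior (smoothness, jitter) that keeps the result image-like is omitted here.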
But what if you did the following: flow the error down from the outputs to the layer you're interested in, but don't modify the weights of any of the layers above it; just modify the values of this layer in accordance with the error gradient.
Added later: I think we should wait till @akarpathy comes along and ELI5's it to us.
I would say that's a pretty good description of how many psychedelic hallucinations unfold. They start off as noise in your vision which turn into loose forms which turn into geometric patterns which turn into etc...
It can be found in the cover art of "10000 Days", an album from American metal band Tool. The original box comes with two magnifying lenses like this:
This (and others in the cover) look stunning through these lenses.
Also, personally, these pictures looked 'trippy' to me because of a color palette reminiscent of hippie raves, rather than because of fractals.
So, tell the machine to think about bananas, and it will conjure up a mental image of bananas. Tell it to imagine a fish-dog and it'll do its best. What happens if/when we have enough storage to supply it a 24/7 video feed (aka eyes), give a robot some navigational logic (or strap it to someone's head), and give it the ability to ask questions, say, below some confidence interval (and us the ability to supply it answers)? What would this represent? What would come out on the other side? A fraction of a human being? Or perhaps just an artificial representation of "the human experience".
...what if we fed it books?
Here's a good introduction: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...
Now, this isn't to say that the kinds of neural networks we build today are conscious, but I don't think that's because they're based on a simple mathematical model; I think that's because they don't have the network-level properties that conscious humans do, for example, a self-representation.
We're essentially peeking inside a very rudimentary form of consciousness: a consciousness that is very fragile, very dependent, very underdeveloped, and full of "genetic errors". Once you have a functioning deep learning neural network, you have the assembly language of consciousness. Then you start playing with it (as this paper did), you create a hello world program, you solve the factorial function recursively, and so on. Somewhere in that universe of possible programs, is hidden a program (or a set of programs) that will be able to perform the thinking process a lot more accurately.
Blatant sensationalism. There is absolutely nothing here that would suggest consciousness. If you have a mask for matching images, you can reverse that mask and imprint it as an image. What we're seeing here is a more complicated version of the same process. Heck, look more closely. Some of those "building" images have obvious chunks of pedestrians embedded, probably because the algorithm was trained on tourist photos.
Is it interesting? Yes, from an algorithmic point of view. Cool as hell. However, this has nothing to do with consciousness.
If anything, some of those images are just a more elaborate version of a kaleidoscope. It's not like they ran a network and got a drawing. They were looking for a particular result, did post-processing, did pre-processing, and tweaked the intermediate steps (by running them multiple times until the image looked interesting). Finally, we as viewers do our share of pattern matching, similar to how we see patterns in Rorschach inkblots. And there are captions that frame what we see and "guide" us to recognizing the right objects.
Putting biological consciousness on a pedestal might be blatant sensationalism itself. By consciousness, I specifically mean the behavioral capacity of general intelligence, nothing more. If by consciousness you mean subjective character of experience then yes, there are serious issues with resolving the mind-body problem. But functionally speaking, our brains exist in a physical universe, are massively parallel, and do stochastic computations. Deep learning systems share all three of these traits except that the scale is about 3-5 orders of magnitude smaller (things like incorporating time and biological impulses are missing, but if someone claims those features are going to be the dealbreaker then maybe we can have a discussion). And the scale difference is shrinking at lightning speed.
I'm not claiming DNN's are the end-all-be-all of an upcoming general electronic intelligence. But they seem to be doing mind-blowing stuff every few weeks, and it seems we've stumbled upon a radically new aspect of computation.
I specifically mean the behavioral capacity of general intelligence
Any image-matching algorithm can be used to generate images. A simple color-matcher can be used to create very impressive things if plugged into a genetic algorithm, but no one claims it's conscious or intelligent. None of what you wrote here points out some fundamental differences between this and other image-generating techniques used before. You're simply trivializing what it means to be conscious or intelligent to the point that word is no longer useful.
I will consider an AI to be "generic" when it is able to apply training from one domain to an entirely different domain without any manual "mapping" from humans. For example, being able to decently play checkers after learning chess and being given a description of checkers' rules. Applying training in the image domain to the image domain with tons of manual tweaking might be interesting and useful, but it hardly qualifies as "mind-blowing".
Want to butt in to this discussion, and post this link here for anybody who hasn't seen/heard about this amazing development from a while ago: http://robohub.org/artificial-general-intelligence-that-play...
I put it to you that your 'self' is not much more complicated than any other elaborately iterated strange loop.
This is different because, while still just math, it’s modeled on the processes of human perception. And when successfully executed, it plays on human perception in ways that were formerly the exclusive domain of humans – Chagall, De Chirico, Picasso – gifted with some sort of insight into that perception.
Future iterations of this kind of processing, with even higher-order symbol management could get really weird, really fast.
Makes me wonder what passes as good art nowadays. But yeah some of the renderings were particularly aesthetic.
I wonder if the engineers at Google could run the same experiment with audio... It'd be fun to listen to the results.
* also the particular mind-altering effects, which are hard to describe
(Or if you want to put them full-screen on infinite loop in a darkened room: http://www.infinitelooper.com/?v=XNZIN7Jh3Sg&p=n http://www.infinitelooper.com/?v=ogBPFG6qGLM&p=n )
The code for the 1st is available in a Gist linked from its comments; the creator of the 2nd has a few other videos animating grid 'fantasies' of digit-recognition neural-nets.
Reminds me very heavily of The Starry Night https://www.google.com/culturalinstitute/asset-viewer/the-st...
I never had much luck with generative networks. I did some work putting RBMs on a GPU, partly because I'd seen a Hinton talk showing starting with a low-level description and feeding it forwards, but I always ended up with highly unstable networks myself.
full resolution image
I've been reading Aldous Huxley's "Doors of Perception" and other psychonautic works recently, and he hypothesizes that these sorts of drugs suppress the usual CNS filtering that shuts out the parts of perception that aren't important for survival.
It might be some great leap of armchair psychology, but I think we're due for another psychedelic revival, especially considering the new advances in synthetic psychedelics, legalization of more harmless recreational drugs, new tests in medical research using MDMA/LSD/Psilocybin, and the cultural shift away from the 'War on drugs'.
I can't help but think of people who report seeing faces in their toast. Humans are biased towards seeing faces in randomness. A neural network trained on millions of puppy pictures will see dogs in clouds.
Give the machine millions of reference images to work from and then tell it to find those images in noise, and it will succeed (because it literally can't "imagine" anything else for the noise to be).
E.g. a SaaS that takes your images and applies neural network transformations. "Can you make a portrait of me so that I look like a king?"
For example, suppose you are a bank and you have built a neural network to decide if credit applications should be approved. The lending laws in the US require that if you reject someone you tell them why.
Your neural network just gives a yes/no. It doesn't give a reason. What do you tell the applicant?
I have an idea how to deal with that, but I have no idea if it would satisfy the law. My approach is to run their application through multiple times, tweaking various items, until you get one that would be approved. You can then tell them it was that item that sunk them. For instance, suppose that if you raise their income by $5k, you get approval. You can tell them they were rejected for having income that is too low.
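That perturb-and-retest idea can be sketched like this; approve() is an invented stand-in for the bank's black-box model, and its weights and threshold are made up purely for illustration (whether this would satisfy the law is, as noted, a separate question):

```python
# approve() is a hypothetical black-box model; the scoring rule
# inside it is invented purely for illustration.
def approve(income, debt, years_employed):
    score = 0.4 * income / 1000 - 0.6 * debt / 1000 + 2.0 * years_employed
    return score > 11.0

def explain_rejection(applicant, tweaks):
    """Re-run the model with one feature tweaked at a time and report
    the first tweak that flips the decision to approved."""
    if approve(**applicant):
        return None                      # nothing to explain
    for feature, delta in tweaks:
        changed = dict(applicant)
        changed[feature] += delta
        if approve(**changed):
            return feature, delta
    return None                          # no single tweak was enough

applicant = {"income": 40_000, "debt": 20_000, "years_employed": 3}
tweaks = [("income", 5_000), ("debt", -10_000), ("years_employed", 2)]
reason = explain_rejection(applicant, tweaks)   # ("income", 5000)
```

Here the applicant is rejected, but raising income by $5k flips the decision, so "income too low" is the reported reason.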
Of course, there is great utility to be had as well, it just scares me to think about what could be done with this technology, in a mature form, if used for violent purposes.
Computers can already understand our emotions in writing, in voice, and from the expressions on our faces; they can also estimate pose and understand your movements. They can label thousands of kinds of objects. And they're just getting started.
They can also build neural nets 10x smaller by compressing a larger neural net while maintaining most of its accuracy. That means once a problem such as vision or speech has been solved with a huge net, it can be transferred to a smaller, more efficient net.
This is known as "dark knowledge". Slides from Geoff Hinton: http://www.ttic.edu/dl/dark14.pdf
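The core trick in those slides is training the small net on temperature-softened outputs of the big net rather than on hard labels. A minimal sketch of the soft-target part (the logits here are invented):

```python
import numpy as np

def softmax_T(logits, T=1.0):
    """Softmax at temperature T. Higher T softens the distribution,
    exposing the relative probabilities of the 'wrong' classes
    (the dark knowledge)."""
    z = logits / T
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# invented logits from a big net, for classes, say, (dog, wolf, car)
big_net_logits = np.array([9.0, 4.0, 1.0])

hard = softmax_T(big_net_logits, T=1.0)   # nearly one-hot
soft = softmax_T(big_net_logits, T=5.0)   # keeps the dog~wolf similarity
```

The small net is then trained with cross-entropy against `soft` at the same temperature and deployed at T=1, inheriting the big net's knowledge of which wrong answers are almost right.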
Thing was, I found them MASSIVELY anxiety-provoking and have never been able to figure out why. They'd literally make me panicky.
These images are doing the same. Even now, just thinking about them, my stomach is fluttering. It's something about the way they're organic, but I don't know what it is. It's definitely nothing rational.
I've never heard of trypophobia; this does seem like it. I wonder if it's closely related to the feeling of disgust; somehow an evolved response to keep us away from rotten food perhaps? Things like bacteria growing on bread, or beehives. Or any food that's started decomposing.
There's a cool documentary I remember seeing called "How Art Made The World". One of the things it talks about is how we're driven to make more-than-perfect representations of things in our art. Say we find something in the real world aesthetically pleasing. With art, we can take that aesthetically pleasing stimulus and exaggerate it, resulting in the art being more pleasing than anything in the real world.
I wonder if that's what happens here, with the almost-organic images somehow being 'hyper-disgusting' as they coincidentally line up with a hyper-exaggerated version of a stimulus that on the scale of disgusting might be 'slightly unappealing' in the form that we'd encounter it in the real world.
The image: https://i.imgur.com/Zihujue.gif
And it flashing: http://makeagif.com/i/hHzAiq
Which makes me wonder, are these sophisticated neural nets mentally ill, and what would a course of therapy for them be like?
In this paper, we will focus on an efficient deep neural network architecture for computer vision, codenamed Inception, which derives its name from the Network in Network paper by Lin et al. in conjunction with the famous “we need to go deeper” internet meme.
Which mentions running the NN in reverse; quote:
By far the most interesting thing I’ve learned about Deep Belief Networks is their generative properties. Meaning you can look inside the ‘mind’ of a DBN and see what it’s imagining. Since deep belief networks are two-way like restricted Boltzmann machines, you can make hidden inputs generate valid visual inputs. Continuing with our handwritten digit example, you can start with the label input, say a ‘3’ label, activate it, then go in reverse through the DBN, and out the other end will pop a picture of a ‘3’ based on the features of the inner layers. This is equivalent to our ability to visualize things using words: go ahead, imagine a ‘3’, now rotate it.
I'm trying to get a sense of how much effort would be involved to replicate these results if Google isn't inclined to share its internal tools - to do a neural network version of Fractint, as it were, which one could train oneself. I have no clue which of the 30-40 deep learning libraries I found would be best to start with, or whether my basic instinct (to develop a node-based tool in an image/video compositing package) is completely harebrained.
Essentially I'm more interested in experimenting with tools to do this sort of thing by trying out different connections and coefficients than in writing the underlying code. Any suggestions?
This might be usable on music. Train a net to recognize a type of music, then run it backwards to see what comes out.
Run on the neural nets that do face popout (face/non face, not face recognition), some generic face should emerge. Run on nets for text recognition, letter forms should emerge. Run on financial data vs results ... optimal business strategies?
But calling it "inceptionism" is silly. (Could be worse, as with "polyfill", though.)
EDIT: in clarification, to pick out abstract features of an image, it must obviously be trained on many images. I'm curious about how it picked out seemingly unique characteristics of the painting, and what images it was trained on to get there.
Kinda makes me contemplate my own conscious experience :)
In the older approaches, the image fusion was the primary intent of the system. Still very cool, but much less impressive IMHO.
tldr: To figure out how computers "think", Google asked one of its artificial intelligence algorithms to look at clouds and draw the things it saw!
There's this complex Artificial Intelligence algorithm called a neural network ( https://en.wikipedia.org/wiki/Artificial_neural_network ). It's essentially code which tries to simulate the neurons in a brain.
Over the last few years, there have been some really cool results, like using neural networks to read people's handwriting, or to figure what objects are in a picture.
To start your neural network, you give it a bunch of pictures of dogs, and tell it that those pictures contain dogs. Then you give it pictures of airplanes, and say those are airplanes, etc. Like a child learning for the first time, the neural network updates its neurons to recognize what makes up a dog or an airplane.
Afterwards, you can give it a picture and ask if the pic contains a dog or an airplane.
The problem is that WE DON'T KNOW HOW IT KNOWS! It could be using the shape of a dog, or the color, or the distance between its legs. We don't know! We just can't see what the neurons are doing. Like a brain, we don't quite know how it recognizes things.
Google had a big neural network to figure out what's in an image, and they wanted to know what it did. So, they gave the neural net a picture, but stopped the neural net at different points, before it could finish deciding. When they stopped it, they asked it to "enhance" what it had just recognized. E.g. if it just saw the outline of a dog, the net would return the picture with the outline a bit thicker. Or, if it saw colors similar to a banana, it would return the picture with those colors looking more like a banana's colors.
This seems like a simple idea, but it's actually really complex, and really insightful! Amazing images here - https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliat...
Original article - http://googleresearch.blogspot.com/2015/06/inceptionism-goin...