Deep image prior 'learns' on just one image (dmitryulyanov.github.io)
782 points by singularity2001 9 months ago | 223 comments



Wow:

"In this work, we show that, contrary to expectations, a great deal of image statistics are captured by the structure of a convolutional image generator rather than by any learned capability. This is particularly true for the statistics required to solve various image restoration problems, where the image prior is required to integrate information lost in the degradation processes.

To show this, we apply untrained ConvNets to the solution of several such problems. Instead of following the common paradigm of training a ConvNet on a large dataset of example images, we fit a generator network to a single degraded image. In this scheme, the network weights serve as a parametrization of the restored image. The weights are randomly initialized and fitted to maximize their likelihood given a specific degraded image and a task-dependent observation model.

We show that this very simple formulation is very competitive for standard image processing problems such as denoising, inpainting and super-resolution. This is particularly remarkable because no aspect of the network is learned from data; instead, the weights of the network are always randomly initialized, so that the only prior information is in the structure of the network itself. To the best of our knowledge, this is the first study that directly investigates the prior captured by deep convolutional generative networks independently of learning the network parameters from images."

PS. This makes me wonder whether and to what degree the structure of the brain's connectome might be a necessary prior for AGI.


This is great work. It shows that some of the "amazing" results of deep learning are not as deep and don't even require learning!

In my view this also sheds some light on GANs and their ability to generate "real looking images". Perhaps there is much less to generating "real looking images" than everyone attributes. E.g. in this work, the network clearly knows exactly nothing about the world and still generates good-looking in-painting.


"the network clearly knows exactly nothing about the world"

Well, except the network structure itself.


The results are amazing, at least to me!


> PS. This makes me wonder whether and to what degree the structure of the brain's connectome might be a necessary prior for AGI.

I've seen an analogy in hardware that is potentially interesting to explore.

There is a technique in A/D converters (ADC) where input signal noise can be reduced by RMS averaging the sequence of digital outputs. This increases the effective resolution of the ADC, while lowering the effective sampling frequency of the input signal.

In the case that the input noise of a precision ADC is very low, the digital averaging has no effect on effective resolution. Surprisingly, in this case you can actually inject a "hand-crafted" noise source into the input, and still realize the resolution improvements from digital averaging. The key is to find the right amount of noise to randomize the quantization noise.

In the case of the ADC, the structure of the quantizing hardware is known, as are the characteristics of the quantization noise. This allows Gaussian noise to be injected with the input source (might be analogous to a "prior"). From there, one can balance the input injection noise with the output digital averaging to get an optimal increase in output resolution.

This technique is explained well in this paper: http://www.analog.com/en/analog-dialogue/articles/adc-input-...
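
For intuition, here's a quick numpy sketch of the effect (ideal 8-bit quantizer, numbers picked arbitrarily; the practical considerations are in the article above):

  import numpy as np

  rng = np.random.default_rng(0)
  lsb = 1.0 / 2**8                  # step size of an ideal 8-bit ADC spanning [0, 1)
  signal = 0.3                      # a DC level that sits ~0.2 LSB away from the nearest code

  def adc(x):
      return np.round(x / lsb) * lsb    # ideal quantizer, no noise of its own

  # Without dither every conversion returns the same code, so averaging
  # 256 of them buys nothing: the error stays stuck at ~0.2 LSB.
  plain = adc(np.full(256, signal)).mean()

  # With ~1 LSB rms of injected noise the code toggles between neighbouring
  # levels in proportion to where the signal sits, and averaging 256
  # conversions typically recovers the level far more closely.
  dithered = adc(signal + lsb * rng.standard_normal(256)).mean()

  print(abs(plain - signal), abs(dithered - signal))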


I have been reading about fractal image compression (don't sue me patent trolls, I haven't implemented anything I promise!) lately and so this doesn't surprise me very much. I also wouldn't be surprised if it turned out that what is happening inside the network is essentially similar in many respects to how fractal image compression works.

Fractal image compression is inherently pretty simple in concept: Find portions of the image that are similar to transformed versions of other portions of the image. Store the things needed to transform one portion into the other, and patch together enough of those to cover the whole thing. Then you can throw out the image itself. You only need the ways in which the parts can be transformed into the others (affine transformations along with brightness/contrast shifts usually). Once you have those, you can literally start with _any_ source image, and iterating the application of the transformations is guaranteed to result in something very close to the original image.
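
In case it helps make that concrete, here's a toy numpy sketch of the encode/decode loop (grayscale, square image with side a multiple of 8, brute-force search, and none of the rotations/flips or quadtree partitioning a real codec would use):

  import numpy as np

  def encode(img, rsize=4, dsize=8):
      """For each small 'range' block, find the larger 'domain' block plus the
      contrast/brightness map that reproduces it best."""
      h, w = img.shape
      domains = []
      for dy in range(0, h - dsize + 1, dsize):
          for dx in range(0, w - dsize + 1, dsize):
              d = img[dy:dy + dsize, dx:dx + dsize]
              # 2x2 average-pool so the domain matches the range block's size
              domains.append((dy, dx, d.reshape(rsize, 2, rsize, 2).mean(axis=(1, 3)).ravel()))
      code = []
      for ry in range(0, h, rsize):
          for rx in range(0, w, rsize):
              r = img[ry:ry + rsize, rx:rx + rsize].ravel()
              best = None
              for dy, dx, d in domains:
                  A = np.column_stack([d, np.ones_like(d)])
                  (s, o), *_ = np.linalg.lstsq(A, r, rcond=None)   # r ~= s*d + o
                  err = ((s * d + o - r) ** 2).sum()
                  if best is None or err < best[0]:
                      best = (err, dy, dx, s, o)
              code.append((ry, rx) + best[1:])
      return code

  def decode(code, shape, rsize=4, dsize=8, iters=12):
      """Start from ANY image and repeatedly apply the stored transforms."""
      img = np.random.rand(*shape)
      for _ in range(iters):
          out = np.empty(shape)
          for ry, rx, dy, dx, s, o in code:
              d = img[dy:dy + dsize, dx:dx + dsize]
              small = d.reshape(rsize, 2, rsize, 2).mean(axis=(1, 3))
              out[ry:ry + rsize, rx:rx + rsize] = s * small + o
          img = out
      return img

The point being: the decoder never sees the original pixels, it just iterates the stored block-to-block maps starting from an arbitrary image.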

A CNN 'rediscovering' this technique feels intuitively like a very natural thing to occur, and the iterative images presented there smack of the early iterations of a fractal image 'decoding' from a blank source image. The connection, of course, could be utterly specious and I am just guessing. I am intrigued, however, as I've been wanting to investigate using deep learning to perform VHS video capture cleanup as a side project for a while now.


But I see one problem. Say the image is a photo of a driveway. The gravel on the driveway will look like noise. Will the algorithm now smooth out the driveway, effectively removing the gravel?

At least a "learned", context-aware approach can prevent this from happening in theory.


I think you are right.

But as a brush in a photo editor program it would be great. Then a user can 'mark' what areas need to be smoothed and the result will be great.


In the examples they show an image of a woman next to a table. The table is covered with a table cloth. In the corrupted image the table cloth looks like all noise to the human eye; however, in the reconstructed image you see that the method is able to recover most of the pattern in the cloth.


That's something different: a demonstration of an "inpainting" problem. The black pixels were specifically marked as corrupt. Also, the pattern on the cloth is much less granular than noise.


Ah, correct. Since the image was in grey I thought it was added noise, not missing pixels. Thanks for pointing that out.


Still, the problem space is greatly reduced - all we need now is an NN to detect corrupted pixels.


We need more than just the information about where the corrupted pixels are.

For example, consider an image of a woman, severely blurred such that her individual fingers cannot be discerned but you can still see that it's a woman. A "learned", context-aware approach can now deduce that the "blob" that is her hand should be inpainted with fingers. A non-learned approach can't do that.


But the fingers it draws will be random fingers. Not her fingers.


true but it's still pretty amazing that those specific number fluctuations represent fingers (random or otherwise) without training


Since a convolutional network applies the same parameters (filters) evenly across different parts of an image, this approach still learns, except not from a corpus of images but from the image itself (i.e. one part of the network learns from different parts of the image).

> This makes me wonder whether and to what degree the structure of the brain's connectome might be a necessary prior for AGI.

Convnets were invented to speed up learning (by reusing parameters). The human brain (afaik) doesn't contain such a speedup. Instead it relies on massive parallelism of neurons, and thus trains all "filters" separately instead of reusing them. Therefore, I suppose this approach would not apply to humans.


Another way to think of it:

Convnets are a method to speed up learning by sharing filter coefficients. In other words, one could just as well train the network by replacing the convolutional layers with fully-connected layers, except that of course training would take a lot longer.

If one views convnets as purely a computational optimization, then the approach of the paper can be considered a "hack" that only works because of this optimization.
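
To make the weight-sharing point concrete, here's a tiny PyTorch check (layer sizes are arbitrary, just for illustration): a convolution computes exactly what a fully-connected layer would compute if its weight matrix were constrained to reuse one small filter at every spatial position.

  import torch
  import torch.nn.functional as F

  x = torch.randn(1, 1, 5, 5)
  conv = torch.nn.Conv2d(1, 1, kernel_size=3, bias=False)

  # The same computation as one matrix product in which every output pixel
  # reuses the SAME 9 filter coefficients over its local window.
  patches = F.unfold(x, kernel_size=3)    # (1, 9, 9): nine 3x3 windows, flattened
  w = conv.weight.view(1, -1)             # (1, 9): the single shared filter
  out = (w @ patches).view(1, 1, 3, 3)

  print(torch.allclose(out, conv(x), atol=1e-6))   # True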


Can someone explain this in a way so that an ordinary mortal computer scientist can understand it?


Here's a pseudocode rundown of exactly what it does:

  network = new NeuralNetwork()
  
  targetOutput = readFile('./images/zebra-500x500-corrupted.jpg')
  input = generateNoise(500, 500)

  while(iterationCount < overfitThreshold) {
    networkOutput = network.getOutput(input)
    loss = getLoss(targetOutput, networkOutput)
    network.backPropagate(loss)

    iterationCount++
  }

  writeFile('./images/zebra-500x500-denoised.jpg', networkOutput)
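
For anyone who wants something closer to runnable, here is a rough PyTorch sketch of the same loop. The real paper uses a deep encoder-decoder with skip connections and a task-specific observation model; the architecture, sizes and iteration count below are placeholders, not the paper's.

  import torch
  import torch.nn as nn

  # Toy stand-in for the generator; the paper's network is a much deeper
  # encoder-decoder with skip connections.
  net = nn.Sequential(
      nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
      nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
      nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
  )

  corrupted = torch.rand(1, 3, 128, 128)   # stands in for loading the degraded image
  z = torch.randn(1, 32, 128, 128)         # fixed random input; it is never updated

  opt = torch.optim.Adam(net.parameters(), lr=0.01)
  for step in range(1000):                 # stop early, before the net also fits the noise
      opt.zero_grad()
      loss = ((net(z) - corrupted) ** 2).mean()   # task-dependent loss; plain MSE for denoising
      loss.backward()
      opt.step()

  restored = net(z).detach()               # the restored image is just the network's output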


> Can someone explain this in a way so that an ordinary mortal computer scientist can understand it?

I'll try.

Instead of the common approach that searches over image pixels to minimize e.g. a denoising objective, they realize that they can instead search over the weights of an image generator network such that the generated image minimizes the objective.

They argue that the structure of the network then constitutes some prior knowledge over what a natural image should look like.
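
Roughly, in symbols (E is the task-dependent data term, x0 the degraded image, z a fixed random input):

  x*     = argmin_x      E(x; x0) + R(x)    # classic: optimize pixels; R(x) is a hand-designed prior
  theta* = argmin_theta  E(f_theta(z); x0)  # this paper: optimize generator weights; z is fixed noise
  x*     = f_theta*(z)                      # the restored image is whatever the fitted network outputs

So the only prior left is whatever the generator's architecture finds easy to produce; that plays the role R(x) used to play.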

My (probably wrong) interpretation: since a convolutional neural network essentially works by looking for spatial patterns at different resolutions, their optimization process boils down to finding the high- and mid-resolution patterns that best match the input image, and then re-using those patterns to fill in the missing information (or to replace "noise" that does not match the extracted patterns).


What do they mean by 'the structure of the network'? Is this the network topology?


There's a great explanation of the idea here in the comments : https://news.ycombinator.com/item?id=15820994


There exist neural network structures that can solve the image processing problems shown when weights are randomly applied.

This means that for these problems the solution is encoded entirely in the network structure and not the weights.

We could use linear regression as an analogy: y = b f(x).

This is akin to saying that for some class of problems, you can get as good a solution from the function you pick alone (x^2, log(x)) as you could from picking a function and then computing the best coefficient. Note this isn't true for linear regression on any meaningful problems, but I feel like it's a decent analogy.


> PS. This makes me wonder whether and to what degree the structure of the brain's connectome is a necessary prior for AGI.

Well, I wouldn't mix up AGI and AGI-by-deep-learning, and more importantly I would emphasise that this is a good prior for images. The fundamental insight in CNNs, and eventually in this work, is that there is a correlation between pairs of nearby pixels. We have something similar for video and audio processing, but nothing remotely similar for other, more abstract intelligence tasks.


Thanks. I'm not mixing them up! I'm just wondering whether and to what degree architecture, i.e., network structure, will prove important for other, more advanced AI tasks, including up to AGI.


To throw a dissenting voice into the mix:

I do not think intelligence is a consequence of structure. Transparency is a consequence of how light interacts with objects. There is no "transparent gold". And being transparent is not something we can program gold to do.

Programming is the application of an electric field across a silicon surface: this can no more transmute silicon into gold than it can into a nervous system.

Consciousness is a biological activity. It is not something wood, silicon, sand, metal, glass, water, etc. can do by virtue of moving around in an interesting pattern. To me, this is akin to believing in magic, or to thinking that the issue with Bugs Bunny's consciousness is that the ink wasn't laid out in the right way. Rather, there isn't any.

As scientists we should begin very clearly by identifying the target phenomenon. Line up everything in the known universe that is conscious and you will find that it engages in a highly specific biochemical activity that requires a whole set of highly specific chemical interactions.

To believe these can be achieved by applying an electric field across some silicon seems like the Victorian superstition that a person could be raised from the dead by doing likewise.

Rather, all we are able to do is imitate in the weakest sense. People fooled by chatbots are no more enlightened than the early Victorians who were shocked to watch the movie of a train coming at them. Fooling people is a very easy thing to do, especially in the area of consciousness, where "alertness to things that might be conscious" seems a deep psychological preoccupation of almost every animal to the point of absurdity (e.g. being scared of the wind).


> Transparency is a consequence of how light interacts with objects. There is no "transparent gold". And being transparent is not something we can program gold to do.

Your remark made me research this, and apparently transparent gold exists, we can create it, and you can buy it [1]. The trick appears to be making the gold thin enough.

> Line up everything in the known universe that is conscious and you will find that it engages in a highly specific biochemical activity that requires a whole set of highly specific chemical interactions.

Line up everything in the known universe that engages in this highly specific biochemical activity, and you'll find a whole lot of things that most people wouldn't call "conscious". So the biochemical activity can't be enough on its own to create "consciousness". If you compare those "unconscious" organisms with "conscious" ones, you'll find that the "conscious" organisms have a large number of cells with this biochemical activity arranged in an intricate network structure.

Now why do you think it's not this structure that creates "consciousness"? Why do the details of the biochemical activity matter beyond being a particular physical realization of some differential equations? Why would a machine made out of silicon, when it precisely emulates the behavior of that biochemical activity, be unable to replicate its large-scale properties (including "consciousness")?

[1] http://www.reynardcorp.com/products/optical-components/coati...


Sure, I meant we cannot impart the property of transparency to a gold bar by "programming" it. Programming isn't a magical spell that can rewrite the causal interactions that take place in the universe.

> and you'll find a whole lot of things that most people wouldn't call "conscious"

I disagree. I cannot think of anything with a nervous system engaging in the particular neurochemical reaction I'm talking about not being conscious. No example comes to mind?

I don't mean any old reaction, such as the digestion of wheat. I mean the specific kind that defines neurological systems.

> why do the details of the biochemical activity matter beyond being a particular physical realization of some differential equations

Because the "details" we're talking about are the causal effects.

> precisely emulates the behavior of that biochemical activity

For the same reason you cannot program a gold bar into transparency. Or program a table into an elephant.

Or to put it another way, a program which "precisely emulates the behavior of gold" doesn't turn a machine into gold.

A program which "precisely emulates the behavior of digestion" does not digest pizza.

A program which "precisely emulates the behavior of" consciousness isnt conscious.

By "emulation" you mean, "imitation in form". Since it is only the FORMS the program and the nervous system share (or, gold, whatever). Ie., that they can be both described to an extreme level of abstraction in the same way. But the universe isnt abstract. It isnt a form.

A program simulating a gold bar may be an instance of some equation that a gold bar is also an instance of -- but it possesses none of the properties that make it gold.

A machine which imitates some highly abstract equational description of thought is as close to thinking as a bird is to an aeroplane. The bird's heart will burn as much jet fuel as your machine will think.

The universe isn't taking place in the abstract, it's taking place in the concrete. You have no soul. Your mind is not a program. Your consciousness is not ideal. It isn't a number, a pattern, a structure, or an equation. These are descriptions. Your mind is something your body is doing -- and so it is for every known thing that has a mind.

To speak as if the mind could be abstracted into a description that may be realized in silicon is to believe in an almost magical power of electricity.


> Or to put it another way, a program which "precisely emulates the behavior of gold" doesn't turn a machine into gold.

> A program which "precisely emulates the behavior of digestion" does not digest pizza.

> A program which "precisely emulates the behavior of" consciousness isnt conscious.

If my goal is probing the behavior of gold, I don't care if it's real or emulated.

If my goal is having a list of chemicals pizza turns into, I don't care if the digestion is real or emulated.

If my goal is having a conversation, I don't care if they're a p-zombie.

Tell me if I'm missing anything here:

Either a computer can perfectly imitate a neuron's behavior, or it can't.

If it can, then a giant computer should be able to make either a conscious being or a P-zombie. (Do you make a distinction there? If you do, can you justify it?)

If you say it can't, then I accuse you of magical thinking. There is no evidence of physical interactions that cannot be emulated.

> To speak as if the mind could be abstracted enough to a description that may be realized in silicon is to believe in an almost magical power of electric.

Here's the thing. The idea of emulating a person does not depend on abstracting the mind. The 'proof of concept' is just building a computer so big that you emulate an entire nervous system, fully intact, every single atom.


A simulation of digestion does not digest pizza.

> Either a computer can perfectly imitate digestion, or it can't.

Yeah, it can't.

What you seem to think is that "imitation in form" means being the same as. A photograph of me is an imitation of the form of me; it isn't me. It doesn't have a pulse.

The universe is not an idealised abstract world of Forms in which anything with the same abstract pattern is equivalent.

Abstract patterns exist only from our point of view. They are our way of describing things.

To say a computer imitates anything is only to say that when we look at it, we can use it to inform ourselves. Simulating digestion is a means of learning about digestion because we design it to correlate across systems.

Without a point-of-view to impose correspondence, the two systems are alike in no respect. "Being a simulation of" is like Beauty. It's not a process in the universe, it's not a property of objects.

A piece of silicon with an oscillating electric field across it remains the same regardless of whether it is running the "digestion simulation" or the "consciousness simulation". It is only a matter of electrical frequencies and amplitudes.

This acquires meaning to us when we look at them. In the case of digestion one 01010 pattern means "too acidic", in the case of a lifelike chat bot "01010" means "hi!".

The cinema screen is not a train. It's a movie. It's just light hitting a canvas. It won't hurt you. 01010 isn't acid. It isn't thought. It's just a voltage fluctuation; it can't talk to you.


> The cinema screen is not a train. It's a movie.

Ooh, this is a good example!

When I watch a movie, I don't care if the camera was pointed at real events, or if it was CGI.

I care about the actual movie experience. It's the same either way.

When I have a telephone call with someone, I don't care if the microphone is pointed at flesh. I care about the conversation.

> What you seem to think is that "imitation in form" means being the same as. A photograph of me is an imitation of the form of me; it isn't me. It doesn't have a pulse.

Actually my thought process is not that at all. My goal is to create an identical photograph with my computer. Or more specifically, 60 of them per second, plus audio.

Do you think that's possible or impossible?

If it's possible, then I have something that claims to be real, that claims to feel, that responds as intelligently as a human in every way... That's good enough for a lot of people to call it conscious. It's definitely good enough to drive a car, or translate a book, or fill out a captcha, or do whatever "AGI" is supposed to do.

If it's not possible, then why? If everything is a purely physical process, can't a computer use math to figure out where atoms would go, and where photons would bounce?


> When I watch a movie, I don't care if the camera was pointed at real events, or if it was CGI.

We call something conscious because it actually thinks, not because it can fool human beings into believing it does.

The foolishness of a human audience is not the measure of all things. Whether Neo exists or not matters.

We have moral obligations to thinking things; we have none to non-thinking things, etc. We have none to fictional things.


What evidence do you have that you are conscious?

What if your neurons are running a biological program that merely emulates thinking, and fools humans into believing that it thinks?

But all of that is philosophy, right?

In terms of external behavior, you're admitting this machine could match a human?

Because that's enough for AGI. You don't need "genuine consciousness".


They are not "running a biological program". Acid melting metal isn't a "program" it's a chemical reaction.

Consciousness is what I am doing. That's my definition of the term. It's incoherent to say "what's my evidence for it".

> In terms of external behavior, you're admitting this machine could match a human?

No. I am only saying that a stupid human being, using only stupid human faculties of speaking to something, might be fooled, by their stupidity, into thinking they were speaking to something as equally stupid as they are.

A dog which observes a toy dog walking around may bark at it as if it were another dog. But that toy dog isn't: it doesn't think, it has no emotion, experiences no pain, has no goals, no interests, no skills, etc.

No piece of silicon will ever amount to more than a toy dog. Neuroscientists will not be fooled by it. Increasingly sophisticated tests of cognition and goal-directed behaviour won't be fooled by it.

People on the street might be. But their foolishness is no guide to the temperature of the sun, the cause of the tides, nor whether the toy dog thinks.


So you think no silicon computer could ever correctly calculate the way atoms move, and photons bounce?

Because that's the only way for it to fail cognition tests! It has to calculate different photons than you'd get from a real body.

And I feel like the idea that some physical processes just can't be simulated is magical thinking.


Does simulating a gold bar turn the simulation into gold?

I feel I'm going round in circles here. A description of a system, even if that description is "active" (i.e., it evolves over time), is not the same as the system.

I can take a photograph of a mountain. My hand has no mountain in it, but a photograph.

I can describe any system with mathematics. I can give every atom in your body a number for all of its properties, and a location with respect to the centre of the earth and then append all those numbers together to form a massive number X.

Over time, because your position and properties change, the number which describes you changes too: X2 differs from X.

If I start by writing X down, then X2, then X3, etc. then the ink on the page describes you as a changing system.

The ink isn't you. It isn't alive. It's ink.

That is all a machine is. We have replaced ink with electricity and paper with silicon, but it is only correlated with a changing number.

WE perform the translation from this number to something else in the world. The person who reads the ink on the paper is the person who knows what X becoming X2 means as it pertains to you. The paper and ink do not mean anything alone, they are not you and share none of your properties.

When a chat bot says "hi" it is just a change to an LCD screen caused by an electrical fluctuation caused by a prior electrical fluctuation. It is just a new line of ink on the page.

There is nothing thinking, saying, speaking, understanding, "hi". It is just a change to some liquid crystal which fools you as you read it. YOU impart meaning on it when you look at it. It isn't a person, it's just ink.

The most ambitious we can be about machines is that they fool people into thinking they are having a conversation with someone. I.e., that paper shreds with ink phrases on them tumble out of a folder in just-the-right-order as if a conversation was happening. But it's a tumbling of electricity; there is no conversation.


> If I start by writing X down, then X2, then X3, etc. then the ink on the page describes you as a changing system.

> The ink isn't you. It isn't alive. It's ink.

> That is all a machine is. We have replaced ink with electricity and paper with silicon, but it is only correlated with a changing number.

Okay, good. Now here is the important part.

I am not making the claim that this machine is a real person.

I am making the claim that if you video-call the machine, it will produce lights and sounds that are identical to video calling a real person.

If the simulation is accurate, isn't that a straightforward consequence?

You keep saying that the machine won't be real AND that it will fail sophisticated cognition tests.

Those are not the same issue.

I'm not arguing that it's real. I'm arguing that it will pass every cognition test you can throw at it.

Is there a flaw in my very short chain of logic?


There are tests to distinguish gold from silicon. And so there will be for however your machine works. I cannot say now what those tests will be, any more than you can make such a machine.

Something not being the thing it is simulating entails there will be tests to distinguish it.

The tests cannot be "cognitive" because there is no cognition going on.

A simulation of an animal does not live in the world, experience and understand it -- nor render the world intelligible to itself and others. A simulation of an animal is just a current flowing around a wire.

I would guess the tests will always, actually, be quite simple -- perhaps not expressible as a single conversation -- because somehow like a chess game it has every infinite sequence of conversations available to it.

Perhaps, in a novel social situation adapting to evolving minds as they respond to each other. Suppose this is recorded today: https://www.youtube.com/watch?v=99IitWYZ0aU#t=60s

and your machine is sat in the audience.

The question is then: "What does Kenneth think about the audience's laughter?"


> There are tests to distinguish gold from silicon. And so there will be for however your machine works.

Unless something was coded wrong, gold inside the simulation will give the same numbers as gold outside, so you can't tell which is which.

Let me be clear here. You don't touch the machine directly, you get the numbers the simulation spits out. The 'proof of concept' interaction is that you are on a video call with the atoms and photons inside the simulation. The video looks like a person, and your job is finding any difference between it and a video call with a real person.

> Something not being the thing it is simulating entails there will be tests to distinguish it.

That implies that the atoms and photons are being simulated incorrectly. Are you saying it's not possible to have an accurate simulation of atoms and photons?

> A simulation of an animal does not live in the world

The simulated atoms interact with a simulated cage, so they should give the same results despite being mere electricity.

> I would guess the tests will always, actually, be quite simple -- perhaps not expressible as a single conversation -- because somehow like a chess game it has every infinite sequence of conversations available to it.

> Perhaps, in a novel social situation adapting to evolving minds as they respond to each other.

> The question is then: "What does Kenneth think about the audience's laughter?"

The simulated atoms will change simulated orientation the same way real atoms would. If you look for physical evidence of learning you'll find the same molecules moving the same way. Why would the simulation fail this test?


> You don't touch the machine directly, you get the numbers the simulation spits out.

So you're restricting the use of this machine to a situation designed to fool human beings?

The criterion for general intelligence is actual intelligence, not putting it in ideal conditions and seeing if people are fooled.

> Why would the simulation fail this test?

It seems like you think this machine is going to simulate the entire universe, evolve its model of the universe and therefore perfectly predict its next state -- and on the basis of this prediction provide an answer.

Sure, perhaps I will concede: if you are able to simulate the universe in infinite detail you might be able to perfectly predict its next state.

This isn't the ambition of anyone, however. And it has nothing to do with AI. If the precondition of AI is "a perfect simulation of everything", then that's close enough -- for me -- to call it impossible.

Even so, in this sense, we do not "simulate gold". What it means to "simulate gold" is to take some small number of aspects, model them with equations, and run those equations.

A video game that allowed you to perform mass spectrometry on any possible compound, along with everything else you could possibly do to everything, would, I think, be a video game which requires a whole other universe to exist.

And so, it seems your argument is that "when scientists can model the universe in infinite detail so as to perfectly predict its next state, we will have AI!" (and, as far as the quantum state of brains goes, it's close enough to infinity to model all of that).

OK, sure. I don't know how scientists are going to build a universe simulator without a "theory of everything" and how, even with such a theory, a machine can predict the next-state of a large system in sub-infinite times. Processing merely some particle collisions in the LHC takes months.

I cannot see how a machine is going to actually track the evolving entangled state of an audience of human beings.

"Infinitely precise information about the universe" I think actually requires you to actually be the universe. That's maybe a speculation however, but I would be surprised if the universe could be described in less volume than it occupies. And if an infinity of precision (ie., perfect parity in every simulated result) is actually possible without the target system.

A machine can only simulate what is known. The actual behaviour of the universe is much larger than what is known. As soon as we discover something new, then we have a test to prove the machine is a machine.


> The criterion for general intelligence is actual intelligence, not putting it in ideal conditions and seeing if people are fooled.

Having to do it across a wire is "ideal conditions"?

The initial comment was about "artificial general intelligence". Every single one of those problems can be done across a wire.

Every word that has ever been spoken, every gesture that has ever been made, you can do across a wire.

It's good enough to solve any practical problem in the world. It just won't be "real".

> It seems like you think this machine is going to simulate the entire universe, evolve its model of the universe and therefore perfectly predict its next state -- and on the basis of this prediction provide an answer.

No, it's going to simulate a tiny cubicle with a person inside.

I guess you could call it a simulated universe, but the universe is only two cubic meters.

> I cannot see how a machine is going to actually track the evolving entangled state of an audience of human beings.

Put cameras in the seat in the theater. One copy of the feed goes to a real person, the other goes to the machine. Both can track the evolving state of the audience fine. Neither one should be expected to perfectly simulate the rest of the audience.

> I would be surprised if the universe could be described in less volume than it occupies.

Don't worry, I don't expect the machine to be smaller than two cubic meters!

> A machine can only simulate what is known. The actual behaviour of the universe is much larger than what is known. As soon as we discover something new, then we have a test to prove the machine is a machine.

That's fair. So version 1.0 will have slightly-wrong physics. Do you think that will necessarily make the simulation go awry? Do you think we'll never know enough about physics to simulate a small box with a person in it?


> Do you think we'll never know enough about physics to simulate a small box with a person in it?

In the sense you mean simulation, i.e., describing some system in all required detail -- we can barely simulate a few atoms, let alone a room with a human being in it.

I'm not sure this is even a question of knowing the physics. The problem is that even an atom has an infinite density of "descriptive information"... i.e., in order to describe it in toto we would be calculating forever.

This is not what anyone in AI is even trying to do, by the way. This isn't machine learning. This isn't AI.

I'm not convinced simulation at this depth will ever be achieved; I cannot imagine it could ever be performant. Every single causal interaction taking place over a second is an entire universe in itself. To have this second alone described in simulation is a vast undertaking, let alone a conversation.

Maybe I would agree that such a system would be "good enough": if it could predict an appropriate response by simulating a target human being to this depth... all the way down to how dopamine bonds to receptors in the frontal lobe, etc. -- then sure, I could see that it would be close enough.

However, this isn't what anyone means when we say something is "simulated". They mean that a single aspect alone is idealised into a single equation and treated under perfect conditions without any other factor being relevant, and then a calculation involving this equation is run.

People in AI are not even considering animal consciousness as being a relevant thing to simulate (even though that's what consciousness is). They think it is just a matter of some idealized formal structure.

If they realised that it would require an electronic system to calculate every descriptive quantity regarding every particle of some animal, computational general-AI research projects would be binned for the next millennia at least.

In the case of AI, no one is trying to "simulate a human being" in the sense you describe. They are trying to find an extremely simplified highly idealized equation to describe thinking.

They are trying to model intelligence as if the salient features of animal consciousness were not biological but equational. "Good Bye" follows "Hello" because insert program...

No, "Good Bye" follows "Hello" because people who speak english have lived a life of speaking it in which experiences have been acquired in response to the world ie., their brains have developed under sociolinguistic conditions: with light and sound bouncing off their bodies and the bodies of those around them such that their neurological structure as evolved to causally associate "hello" with akind of social circumstane and "goodbye" with likewise.

There is nothing apart from this connected social-neurological system that constitutes why "goodbye" follows "hello". That is how it comes to be. Any rule or system which appeals to an equation that isn't modelling this entire process to its full depth is just "accidentally correlated" with English -- and will be trivially easy to expose.

And so on for every aspect of consciousness.


>> and you'll find a whole lot of things that most people wouldn't call "conscious"

> I disagree. I cannot think of anything with a nervous system engaging in the particular neurochemical reaction I'm talking about not being conscious. No example comes to mind?

Is a jellyfish conscious? Does it have the particular neurochemical reaction you are talking about?

> A machine which imitates some highly abstract equational description of thought is as close to thinking as a bird is to an aeroplane. The bird's heart will burn as much jet fuel as your machine will think.

This is actually a good analogy for our disagreement. Your definition of "thought" seems to inherently depend on the implementing substrate; and if it doesn't burn jet fuel, a bird doesn't really "fly".

But for me the substrate is irrelevant; I don't care whether a machine "really thinks", so long as it can solve any problem which I might have to "think" about otherwise.


> I don't care whether a machine "really thinks", so long as it can solve any problem which I might have to "think" about otherwise.

OK, well then your calculator satisfies your definition of "thinking".

I'm concerned to know whether a machine is doing what my dog is or I am. And mostly when people become hysterical or tedtalky (which is the same thing most of the time) about AI they are presenting an "I, Robot" future where androids dream of electric sheep.

> and if it doesn't burn jet fuel, a bird doesn't really "fly"

When I think, "I'd like my pen" and subsequently my arm moves to get my pen, my thinking is causally connected to my arm moving. My arm moving is some chemical my muscles do, in order to be connected at all with my thinking, my thinking has to be something broadly chemical too.

The plane doesn't move air out of the way because it's flying. It does that because it's burning fuel (etc.). "Flying" as a description of what the bird and the aeroplane are both doing isn't actually any physical process at all. It's a pattern they both very abstractly follow that we have invented. In this sense nothing in the universe actually flies: the bird does its thing, the aeroplane does its thing -- and from our point of view, they are both abstractly similar.

It's our point of view which makes them similar, though. The airplane isn't distressed to burn too much. The bird is.


The problem with your suggestion is that it is merely affirming the consequent: brains are conscious and robots are not brains, therefore robots are not conscious. Actually the real question is: why are brains conscious? If we could answer that, then we could gather conscious things together (human brains, dog brains, slime mold, not trees, not grass) -- you see, it's not sufficient to just say "brain" or "organic matter". The diversity of life (e.g. bird brain vs squid brain, while both creatures demonstrate self-awareness) suggests a compelling hypothesis: multiple realizability. If multiple realizability is true then perhaps the explanation is that it is not the substrate that is relevant but the arrangement, the programming, whatever... Point being, I'm guessing you have a bio background and so you're a skeptic about virtual minds, and in the more strict sense that is a very valid point -- most in the field of AI do not have any bio background at all, and as you say, the only thing known for certain to be conscious being your own mind, you'd think understanding the bio would yield some insights. Well, actually it has, e.g. Hinton's work is very much inspired by actual biology (neurology). /end rant


Who said anything about consciousness? Intelligence does not equal consciousness.


Line up all the things in the universe that are intelligent. They are all conscious.

Unless you mean "imitating intelligence", but that's not what AGI (artificial general intelligence) is targeting.

An AGI robot would be one to include in any future line-up of "all the things that are intelligent". My view is that unless an AGI robot has a nervous system, it will never belong in that category.

It is fashionable in tech today to speak metaphorically with abandon and to call whatever we like "intelligent" (and to have machine "learning", etc.). These systems are no more intelligent than an abacus: upon inventing a wood-to-LCD converter, we could run Win95 on a few million wooden beads.

Metaphorically, these are intelligent. They fill the role of prior uses of genuine intelligence -- ie., they can help us calculate and therefore substitute our calculative conscious behaviours for non-conscious substitutes. They are tools.

My hammer is no more intelligent than my laptop, however. Neither possesses thoughts, nor can either conceptualize or understand anything. A concept is a (biochemical) technique an animal acquires through a biochemical interaction with the world: an electric field cannot have concepts.

The argument that machines-which-seem-intelligent are intelligent is only a fallacy of ambiguity. What we mean, of machines, is a tool-which-helps-conceptualizing. What we mean of general intelligence is conceptualizing. The former and the latter are not the same. In a universe only of tools, nothing thinks to use them.


> Line up all the things in the universe that are intelligent. They are all conscious.

The words “intelligent” and “conscious” are not sufficiently well defined to make that claim. By “intelligence”, do you mean:

1) “the ability to learn or understand or deal with new or trying situations”? Even current AI can do that.

2) “the ability to apply knowledge to manipulate one’s environment”? That’s another rabbit hole itself.

3) “think abstractly as measured by [tests]”? I think current AI fails at this.

4) the thing which is separate to “body” in Cartesian Dualism? I don’t believe in that (i.e. souls) any more, so I can’t claim any AI would pass this test (nor any human).

And “conscious”? Is that:

1) Opposite of unconscious? I’d say they are.

2) Opposite of subconscious?

2)a) As in, not just autonomous functions like breathing or the equivalent for a robot? I’d say they are.

2)b) As in, “System 2 thinking”? I don’t know.

3) Critical self-evaluation? Generative adversarial networks pass, other than that I think AI fail this test.

4) Mirror test? Pass, but in a special case I don’t know the details of and which might well be newspaper fluff rather than proper AI.


> Even current AI can do that.

No it can't. A machine's output is not deterministic from its input and can be sensitive to conditions not anticipated at-programming-time. That isn't understanding.

A spinning top may spin on many surfaces not anticipated by the designer and acquire all sorts of interesting behaviors by doing so.

"The opposite of consciousness" is not a meaningful criterion of interest. I'm not even sure it's a meaningful term. It seems to commit the buddhist fallacy that causal processes at work in the universe are differentiated into opposites. That whatever causes rocks to fall must be the opposite of the thing that causes fire to rise. It isnt -- there isnt really anyway to define "opposite" with respect to what causes what.

Many of these tests you're outlining aren't relevant to the question "does this robot have what we are interested in". I.e., is this piece of lead actually gold? Not "is it shiny with a brass coating" -- but can it participate in all the causal interactions gold can? I'm not concerned with how good the tool is, or how close we are to fooling people; I'm concerned with whether the robot can think.

Does the robot possess any concept? Any idea? Any understanding?

No, only metaphorically. It seems as-if it does to people who use it to aid in their understanding. It is only a trick, no more than an ancient human being being scared of a spinning top and wondering how it is so well able to navigate around the grain of the wood -- better than any beetle!

Intelligence, as it is possessed by the relevant subset of animals we're targeting, involves possessing concepts. They are biochemically connected to their environment. They understand it. The dog's finding its bone is NOT the same as the spinning top finding its groove. Only by an extremely confused metaphor.

The dog puts to use thoughts, concepts, ideas, imagination (and many other things besides) that are about its environment. That it has acquired in its direct understanding of its environment. The spinning top merely topples towards its final point as-if it understood.

Machines are rivers of electrical current that topple toward an outcome that is sensitive to their current state, like a top spinning about a board. They have no active, navigating, biochemically motivated, concernful, goal-directed action.

My view is that they never will, since on all the best evidence, skillful concernful action is a neuro-bio-chemical process.


> "The opposite of consciousness" is not a meaningful criterion of interest. I'm not even sure it's a meaningful term.

Then let me rephrase: The state that you are in when you are not asleep nor anesthetised nor in a coma. (I’m assuming you wrote consciousness rather than unconsciousness as an autocomplete typo not as a misreading, I’m having to edit a lot of comments for that reason today).

(Also, strange example with fire and rocks, given there is one axis on which fire does rise for the opposite cause of rocks falling: buoyancy)

> Many of these tests you're outlining arent relevant to the question "does this robot have what we are interested in"

I listed them because it was not clear what you are interested in when you say “intelligence”. You have improved one step by saying “understanding”, but that has nine meanings of its own, half of which point back to “intelligence” without adding anything useful to my mental model of what you might be trying to describe.

Now, with regards to your dogs-vs.-spinning-tops comparison. I totally accept that spinning tops are not intelligent. I do not understand how you decided to fit spinning tops against definition 1. I believe dogs are intelligent. I cannot prove dogs are intelligent by definitions 2, 3, or 4, only by definition 1 — can you? Can you demonstrate that a dog has any of “thoughts, concepts, ideas, imagination”? Again, I believe they do, but I cannot prove any of those things and I am aware of both the risk of anthropomorphism and of dehumanising (ironic word in this context, but it fits) their minds.

Finally, why do you believe that neuro-biochemical processes are fundamentally capable of things that silicon cannot do? What makes them special?


> Even current ("real") intelligence can do that.

No it can't. A biological creature's output is not deterministic from its input and can be sensitive to conditions not anticipated at-programming-time. That isn't understanding.

A spinning top may spin on many surfaces not anticipated by the designer and acquire all sorts of interesting behaviors by doing so.

[...]

Many of these tests you're outlining aren't relevant to the question "does this biological creature have what we are interested in". Ie., "is this piece of lead actually gold". Not "is it shiny with a brass coating" -- but can it participate in all the causal interactions gold can. I'm not concerned with how good the tool is, or how close we are to fooling people, i'm concerned with whether the biological creature can think.

Does the biological creature possess any concept? Any idea? Any understanding?

No, only metaphorically. It seems as-if it does to people who use it to aid in their understanding. It is only a trick, no more than the sun being ascribed agency by ancient human beings.

Intelligences which we're targeting do not possess concepts. They are not meaningfully connected to their environment. They don't understand it. The dog's finding its bone is just the same as the spinning top finding its groove. The spinning is only much more intricate, and the nature of and interaction with the surface much less easily understandable.

The dog experiences illusions of thoughts, concepts, ideas, imagination (and many other things besides) that are about its environment. Those have been caused by the nature of its environment. The spinning top topples towards its final point as-if it understood, just like the dog.

Biological creatures are rivers of biochemical and electrical currents that topple toward an outcome that is sensitive to their current state, like a top spinning about a board. They have no active, navigating, motivated, concernful, goal-directed action.

My view is that they never will, since on all the best evidence, skillful concernful action does not exist. It is merely an illusion that emerges from chemical and electrical interactions.

It is not obviously clear whether your argument is any more valid than the above (or the other way around).

To expand, if a robot in every imaginable way behaves exactly like another human would, how can you know that one possesses "intelligence" (or rather, consciousness), while the other doesn't?

Why would it be possible to construct the high level structures from which intelligence emerges on top of one set of primitives (electrical/chemical in biological context), and not the other (electrical-based logic on silicon).

The argument you pose has many signs of being an appeal to the ghost in the machine.

For anyone interested in exploring ideas around (self)consciousness and the mind, The Mind's I by Douglas Hofstadter and Daniel C. Dennett is a good read.


> The dog experiences illusions of thoughts, concepts, ideas, imagination (and many other things besides) that are about its environment. Those have been caused by the nature of its environment. The spinning top topples towards its final point as-if it understood, just like the dog.

I think it's the same case with humans. We wouldn't differ from monkeys without the ability to store information and imagination/abstraction (I think animals might possess those as well). Biologically, we are striving for the same goals, just in a different environment (influenced/created by us) and with more means (possible actions).


What you've done here is expose an epistemological problem but not the ontological one under consideration.

My ontological premises are: consciousness exists and it is a neurochemical process.

From this follows: machines which are not instances of this process are not conscious.

Now you have basically said: but we do not know for certain that it is this neurobiochemical process that accounts for consciousness. Isn't it reasonable to suppose that we might encounter things not actually conscious and think they are? YES. Isn't it reasonable, therefore, to suppose that "consciousness" isn't actually ontologically the same thing as this neurobiochemical process? NO!!

The magnitude of human foolishness is no guide to what exists and what it is like: our being able to be fooled by anything tells us little.

Yes, we can reverse the picture for the spinning-top and dog. But this is an epistemic reversal: we can suppose we are being fooled by the dog, but the spinning top is the truth!

This seems highly unlikely for reasons basically summarized as, "science works".

The spinning top isn't thinking and the dog is. That it is possible to doubt this claim, ie., that it fails to be certain, tells us nothing. Almost all scientific claims fail to be certain, that's neither here nor there as to whether they are accurate.

And I am appealing to no ghosts to draw distinctions between dogs and spinning tops. I am appealing to a reasonable scientific inference: let us modify the dog's behaviour and let us modify the spinning top's. Cocaine might well be involved in the former, or neurosurgery -- and in the latter, wood carving.

The distinction between a chisel and cocaine is hopefully clear enough: they act on their target objects in extremely different ways. The way that cocaine acts informs what stuff I take to comprise "consciousness" -- that is how the behaviour of the dog is modified.

> skillful concernful action does not exist. It is merely illusions

I don't know what you mean by "illusion" here. I don't see why the observation that skillful action is a bodily process somehow diminishes its reality. Skillful action is just what people do in order to achieve their goals; we have discovered all of these things are biochemical, but that doesn't make them fake.

If we observe the target phenomenon "skillful action" we discover it is biological. This rules out silicon and electricity doing it. Or, in other words, to modify the behaviour of a machine I cannot use cocaine. It has no thoughts to disrupt. I'd have more luck with a chisel.


No, the issue is with the very premise, then.

It is not clear that consciousness necessarily only emerges from a neurochemical process.

What was basically said was "We do not know for certain that it is this neurochemical process that accounts for consciousness. Isn't it reasonable to suppose that we might encounter things that actually are conscious, but whose consciousness is not accounted for by the same (neurochemical) property of the underlying process?"

I'm not sure that stated premise is very useful, since it borders on the tautological.

The dog (or even a human doing some task) is akin to an intricate state machine whose next state depends on the current state and its environment. Just like the spinning top. For each of those we modify the lower level mechanisms to effect a different high level behavior. Changing the thing in the former case (Cocaine/neurosurgery) or its environment (steal the bone). Changing the thing in the latter case (cutting out part of the spinning-top) or its environment (carving the surface it spins on).

The difference in the two cases being the number of intermediate steps (or abstraction layers, if you will) between the high level behavior and the low level mechanisms from which it emerges, and the complexity of the emergent behavior.

Illusion: the low level mechanisms (biochemical or otherwise) that, using the current state and the environment, transition to the next state, and in the process "present" an experience that we interpret as ourselves thinking, making decisions, taking skillful actions and so on.

If we observe the target phenomenon "skillful action" we discover that all known occurrences are biological. This doesn't really preclude the possibility of other mechanisms producing it.

To modify the behavior of a machine, you cannot use cocaine. That's because the machine has no receptors for the comprising molecules - not because it has no thoughts. You could instead modify the logic gates it possesses by applying a certain pattern of electromagnetic radiation which would cause interference, just like the cocaine interferes with the normal workings of the brain.


Scientific claims are not necessities. I'm not saying I can prove consciousness is a biological process, only that it's overwhelmingly reasonable to suppose so.

Emergence is a result of causal interactions between parts of a system being different than the internal causal interactions within one part. It doesn't mean "complexity" and it really has nothing to do with a machine.

The oscillating electric field acquires no new causal interactions as the program complexity increases. Adding more H2O to a single H2O creates new causal interactions (e.g. wetness).

> That's because the machine has no receptors for the comprising molecules - not because it has no thoughts

Right, so you're supposing a contrary entirely bizarre ontological view: that thoughts are something independent of a biological process.

Of all the known things in the universe which think, to remove their nerves is to destroy their capacity to think. I cannot see any reason to suppose thinking is not merely their activity.


> Of all the known things in the universe which think, to remove their nerves is to destroy their capacity to think. I cannot see any reason to suppose thinking is not merely their activity.

You appear to be using circular reasoning. You assert that only biological-neuron entities are intelligent, use this assertion to create the set of intelligent entities, and then say that this is valid because to remove the neurons in those entities also removes their intelligence.

Indeed, it does — but then I get to assert that only silicon-logic-gate entities are intelligent, because their ability to process sensory inputs and translate this into signal outputs goes away when you remove their doped silicon wafers. It doesn’t help.


It would be circular if it were an argument. I haven't made an argument for it, only offered it as the start of our scientific investigation. All observations of brute fact, phrased as arguments, are circular -- because the universe goes unargued for and merely exists.

What I mean is this:

You and I are having a conversation about consciousness. To do this scientifically we're going to have to point out those things in the universe that we're talking about. (We cannot begin, as Socrates thought, with definitions, because we don't know them yet.)

So I shall collect for you all the things we have been talking about when we have said "this is conscious!". And you do the same. And my claim is that everything in this group is in this group... because ... it has a nervous system.

That is a hypothesis. My view is that this hypothesis is true and extremely well-evidenced. My view is further that the only thing you can add to this group without a nervous system is something of pure imagination -- a cartoon character.

This is possible for any group: I draw a golden rabbit speaking to a silver duck. There are no such things, because ducks cannot be both alive and made of gold -- it is a brute fact about our universe that, at its base, causal interactions only play out in certain ways.

To believe that an electrified piece of metal could ever belong in the group of things united by their common feature "consciousness" is profoundly bizarre to me: what exactly is that thing meant to possess that I have?

What do I have, when I am hungry and think of food, that a piece of silicon may also have? I don't know what that would be, but it seems to throw away all known neuroscience to suppose it exists.

That it is not merely a cartoon fantasy that, really, the thing that makes me conscious is some peculiar abstract property of me that current running around a wire can also instance.

I have grave doubts that neuroscientists will ever find that I possess this "structure", not least because modifications to the way I think are easily made by insufflating cocaine (or whatever else): a drug which operates biochemically and yet modifies thought.

I'm not sure how a chemical modification to thought makes sense if the latter is an abstract property.


I don’t think I follow you.

> So I shall collect for you all the things we have been talking about when we have said "this is conscious!". And you do the same. And my claim is that everything in this group is in this group... because ... it has a nervous system.

That would be great if we were in 1930 and asking which pre-existing creatures are conscious, but you are asserting that no member of a group created to implement all the forms of intelligence that have yet been made quantifiable (as opposed to qualitative judgements of intelligence) belongs in your set.

I assert that you have a list, and that you have merely defined your words to be a shorthand for that list, rather than made a hypothesis that those words are descriptive properties that allow us to ask whether other things can be in that list, or whether all members of that list truly belong there. (I.e. "is a dog intelligent?")

For example, you now assert the list is synonymous with “nervous system” (previously “conscious”, previously “intelligent”) without explaining why a digital- or semiconductor-based nervous system would fail your test.

> To believe that an electrified piece of metal could ever belong in the group of things united by their common feature "consciousness" is profoundly bizarre to me: what exactly is that thing meant to possess that I have?

That’s my question, too. What is that thing which you are meant to possess which supposedly cannot exist on artificial substrates? Why is a biological neuron fundamentally better at thinking than a computer simulation of a biological neuron?

Still, I’m not sure I actually follow what you’re trying to say, because your last three paragraphs seem to be distorted by either autocomplete or google translate. Either way I just cannot extract your point from them.


I do not find it bizarre to pose that it could be possible to use another set of primitives (than biochemical ones) to create something analogous to the higher level structure in a human brain that produces thoughts.


> The distinction between a chisel and cocaine is hopefully clear enough: they act on their target objects in extremely different ways.

No. Both change the target's state. And when "running" (spinning, living), it will act differently.


Where do you think human consciousness comes from?


I think all consciousness (a feature of most animals) is a specific kind of neuro-bio-chemical process.

To put it another way, to improve your focus I can give you Modafinil. To stabilize your mood lithium. To give you hallucinations, lsd. To distort your visual perception, ketamine. To make you more empathetic, trusting and less hostile: mdma. To speed up your thinking, cocaine. To make you aggressive, too much cocaine.

And this applies across the animal kingdom, in large part.

Where in this list is a program?

A program source is a description. A running program is an electrical field across a piece of silicon. A description has no causal effect (the mere act of writing a description down does nothing). And an electrical field across a piece of silicon has effects, but none meaningful for consciousness.

Consciousness is not hiding somewhere. It takes only a baseball bat to deprive someone of it. And its various parameters are easily enough mapped about by chemical manipulation.

The modern (general/strong) AI sort follows in the tradition of Descartes, who imagined that consciousness was its own sort of substance apart from the body; that its effects and structure were idealizable. Whether we call this idealization a "soul" or a "program" makes no difference to the mistake being made...

Consciousness is something animal bodies do. It is not a pattern in the sand, or a current in a wire. And certainly not a soul, a number, a program (which is only a number) or any other idealized abstraction.

To be even more explicit: no, you will not be able to upload your consciousness. The suggestion, and its homology to heaven, ought to concern any modern, scientifically minded thinker.

A description of water (H2O) is not water. I can no more upload a drink to Amazon than I can upload a thought you are thinking. A thought is a biochemical reaction. A description of a thought, even a very accurate one, doesn't think. The internet may one day hold very detailed descriptions of people's brains. These will sit like textbooks and tomes, though, and not care about, wish for, consider, want, desire, or understand anything.


If it's a purely physical process then it can be simulated. If it is a process then it has some state. You can save that state.

> The internet may one day hold very detailed descriptions of people's brains. These will sit like textbooks and tomes, though, and not care about, wish for, consider, want, desire, or understand anything.

The same would be true if you saved the state of the biological world without running (simulating) it.


> And certainly not a soul

OK we agree on the basics - that consciousness is not due to some mystical external factor but that it is contained within a brain.

However, your description of consciousness amounts to a particular combination of electrical fields (NB that those chemicals act by changing how electricity is transmitted)...


People are conscious (actually self-aware?) because we say we are. Which isn't actually evidence. I'm doubtful there's anything to explain.


Lots of other animals are self-aware (they can identify themselves for example). This isn't just a human thing, but an advanced brain thing.


Although probably not sufficient for AGI, network architecture is essentially guaranteed to be important: there is both ample empirical evidence of the importance of architectures and ample reason, from facts about numerics, to believe that it matters.

In the first category (empirical evidence),

- The discrete leap from non-LSTM RNN to LSTM network performance on NLP was essentially due to a "better factoring of the problem": breaking out the primitive operations that equate to an RNN having "memory" had a substantial effect on how well it "remembered."

- The leap in NMT from LSTM seq2seq to attention-based methods (the Transformer by Google) is another example. Long-distance correlations made yet another leap because they are simply modeled more directly by the architecture than in the LSTM.

- The relation network by DeepMind is another excellent example of a drop-in, "pure" architectural intuition-motivated replacement that increased accuracy from the 66% range to the 90% range on various tasks. Again, this was through directly modeling and weight-tying relation vectors through the architecture of the network.

- The capsule network for image recognition is yet another example. By shifting the focus of the architecture from arbitrarily guaranteeing only positional invariance to guaranteeing other sorts, the network was able to do much better at overlapping MNIST. Again, a better factoring of the problem.

These developments all illustrate that picking the architecture and the numerical guarantees baked into the "factoring" of the architecture (for example, weight tying, orthogonality, invariance, etc.) can have and has had a profound effect on performance. There is no reason to believe this trend won't continue.

In fact, there are some very interesting ways to think about the principles behind network structure -- I can't say for sure that it has any predictive power yet, but types are one intuitively appealing way to look at it: http://colah.github.io/posts/2015-09-NN-Types-FP/


Thanks. I agree. The anecdotal evidence suggests that architecture is indeed important.

This paper is the only direct evidence I've seen of it, though.

Great work. Compelling.



Yes, AGI as a discipline distinguishes itself in large part from machine learning by its focus on architecture over training or data.


Well, there is correlation between 'nearby' things of all types. An apple is an 'apple' because we encountered the word and the thing in close proximity to one another, temporally and perceptually, many times during our life. All knowledge is this. Tremendously overlapping and filled with mistakes (spurious correlations induced by coincidence and happenstance), but fundamentally just that same 'insight'. Our brains are association machines. Once a sufficient quantity of associations exists in a brain, in a body, living in a world we recognize, we call the property that results 'consciousness'.

Unfortunately there is no reason to think that you can subtract the 'in a body living in a world we recognize' part and end up with anything like the same property. All those associations come from perceptual inputs and result in motor outputs, with the biofeedback that occurs being of profoundly fundamental importance (heck, hold a pen in your mouth in a way that makes your mouth 'smile' and you will 'feel happier'). While there is definitely a possibility that a machine-based intelligence could be conscious, it would doubtless have to be very different from anything we could recognize, derived as it would be from its own inputs, outputs, and the feedback between the two.


Probably more than structure (which I took to mean connectivity) -- there are a lot of things that affect (biological) neuronal computation in addition to connectivity, such as synaptic weights, distribution & types of receptors for different neurotransmitters, and the (sometimes very different) timescales associated with different neurotransmitters. Not easy things to measure.


Does this mean huge datasets are no longer a prerequisite for this type of computing? Leveling the playing field for smaller teams who may no longer have to rely on Google- or FB-sized datasets?


I don't think so. They are basically reducing entropy by going from the corrupted image to the original one, which is more predictable.

I don't think you can do other things, like labeling, using the same method.


The brain probably has a lot of hard wired networks to recognize faces and to not like being cold for example. I think even feral children have these capabilities.


I don't understand their editorializing

> contrary to expectations,

Ignore the weasel words.

> a great deal of image statistics are captured by the structure of a convolutional image generator rather than by any learned capability.

> To show this, we apply untrained ConvNets to the solution of several such problems. Instead of following the common paradigm of training a ConvNet on a large dataset of example images, we fit a generator network to a single degraded image.

How is that _untrained_, if they are training it on an image? Is a generator network different from a ConvNet?

> This is particularly remarkable because no aspect of the network is learned from data;

but they just said "we fit a generator network to a single degraded image"

As others have commented, they appear to be training on all the parts of a single complex image, and using that training to repair local deviations from the global average. This may be better than classical denoising algorithms because the NN can model the image structure better than other approaches, but this doesn't seem like a novel use of NNs.


The point is that a fully connected network would not get you the same results. I.e. the convolutional structure allows it to learn interesting things from a single example because by choosing a convolutional architecture, you are implicitly imposing a prior.


How is a network structure different from a fully connected network with 0 weights?


Agreed. The network is trained, in a sense, "online" on the target image and directly applied to it. I don't see much impact of this work outside image denoising or the tasks already presented in the paper.


Does the “task dependent observation model” also come out of thin air, or is it trained with lots of data?


The observation model is trivial and data-free. For example, just mean squared error for denoising, mean squared error of the downscaled image for super-resolution, etc.
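
For the curious, roughly what those data-free terms look like (a sketch in PyTorch-style code, my paraphrase rather than the authors' implementation; the bilinear downsampling is an assumption):

  import torch.nn.functional as F

  def denoising_E(x, x0):
      # E(x; x0) = ||x - x0||^2: the restored image should stay close to the noisy one.
      return F.mse_loss(x, x0)

  def super_resolution_E(x, x0_lowres, factor=4):
      # E(x; x0) = ||downsample(x) - x0||^2: the restored image, once downscaled,
      # should match the observed low-resolution image.
      x_down = F.interpolate(x, scale_factor=1.0 / factor, mode='bilinear', align_corners=False)
      return F.mse_loss(x_down, x0_lowres)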


I'm fairly certain that's not the case. Where in the paper does it talk about this other network, generally called the discriminator?


There is no discriminator in this work. Read the paper.


Does this also explain - on a higher level - why AlphaGo Zero was possible and so successful?


Alpha Go Zero is actually more similar to a GAN: https://arxiv.org/abs/1711.09091


So are they saying that the topology of a deep-net is intrinsic to "reality" ... somewhat analogous to something like the Fibonacci ratio for organic forms?


The Fibonacci thing is mostly a myth perpetuated by confirmation bias, though.


Maybe more like a fourier transform.


Remember fractal image compression?

https://youtu.be/AjdogjBxfco?t=260


This can also be looked at as the original source of patch based denoising, etc. In the end it's about capturing the scaling properties and self similarity of natural images. This is also, for example, why wavelets were so effective as a basis.

David Mumford particularly did some great work on this sort of thing a couple of decades ago, along with many others. I hope when people are rushing around trying to apply convolutional nets to everything they aren't losing these insights.


The benefit of CNNs is like the benefits of SVMs -- they generalize all the great old techniques so you don't have to understand them all, you just throw more CPU at the optimization problem.


I don't think that's true particularly in this case.

This paper is pointing out that you can encode a structural prior in a CNN - but knowing the "great old techniques" will help you design the right network architecture to do that.

SVMs were a surprise when they came out, not so much a generalization as a challenge (at least at first).


Thx for that!


Upvoted you both. I normally really don't like videos but this was worth the few minutes it took. (The link points to the middle of a 9 min video.)


Huh. This seems to boil down to 'noise is higher information entropy than realistic content; partial learning will learn realistic content before learning noise' or something like that.


I think that can be mainly attributed to the fact that the last few deconvolutional features are overfitted to features in the image and are somewhat robust to noise. The network does not even learn features to produce e.g. white noise as output. This is probably much less magical than the paper makes it seem to be.


Interesting. Then you might achieve similar results by compressing the image as a JPEG with low quality settings.



Actually... How cool would it be to have an NN that could extend an image's background with plausible scenery? Not just photoshop 'smart' fill, but for example if it detected a building on the right side of an image, it could draw the rest of it? :)


This exists.

This approach takes other photos of the same scene to extend a cropped pic (see also MS PhotoSynth): http://grail.cs.washington.edu/projects/sq_photo_uncrop/

This uses a GAN to fill in missing parts of a pic. Those parts could be on the edge of the picture (although the paper doesn't explore that): http://hi.cs.waseda.ac.jp/~iizuka/projects/completion/data/c...


Here's another earlier example of un-crop, that uses only the source image (doesn't need gps or an internet database of photos), and does both out-painting and in-painting.

http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung....


Wow, so just the weight sharing architecture does so much already? I am wondering if the same could be done with LSTMs on sequences or CNNs on voice...


I'm wondering the same thing too.

Note also that this finding strongly suggests that neural net architecture actually is quite important, possibly even more important than having more data -- which contradicts the conventional wisdom!


There is some pretty strong evidence for this: all the toddlers in the world. You only need to show them something once and they'll immediately be able to recognize more examples of the same thing from different angles, and even when it is partially hidden. All they have to guide them is the structure of their brains, not the quantity of data they have been exposed to.


"All they have to guide them is the structure of their brains, not the quantity of data they have been exposed."

A typical toddler (say 12 months old) has spent 4000-5000 hours with open eyes. Even if you assume a low frame rate (10 fps), resolution (1080p), and a 1000:1 compression ratio, that's still about 1 TB of training data.
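
A quick back-of-the-envelope check of those numbers (my arithmetic, using the assumptions above):

  hours = 4500
  frames = hours * 3600 * 10                 # 10 fps
  bytes_per_frame = 1920 * 1080 * 3          # uncompressed 1080p RGB
  total = frames * bytes_per_frame / 1000    # 1000:1 compression
  print(total / 1e12)                        # ~1.0, i.e. roughly a terabyte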



Also, find me a 12 month old that can recognize an object after seeing it once.


Seems like this view gets told every once in a while by someone who clearly hasn't been around any 0-2 year olds.


This is HN, what did you expect? Real people with wife and kids?


Women are real people too.


Are you suggesting that women cannot have a wife and kids?


Try not to be an asshole. Thank you.


Your comment is hilariously wrong. Please do not make assumptions like this, you're typically going to be embarrassed.


Please show me a child that recognizes a new object after one look.


Shall I mail them to you?


Certainly not true... reading takes ages, for instance. Associating objects with words takes forever. Perhaps this is true in another sense, but not in the sense I described.


Reading is a lot more complex than object recognition.


It's not about neural network architecture. CNNs are taught by presenting them with overlapping pieces of an image. To speed things up and keep things organized, this is not done sequentially but in parallel, making multiple neurons share weights, but this is just a trick.

So what makes this result possible is not the architecture of the NN in the CNN but rather the architecture of the C. That allows us to get multiple samples from a single image. The rest is just that the actual content of the image is easier to learn than the noise.

The brain is almost nothing like a CNN.


I think it's both C and NN. Don't forget each new C layer groups information from previous layers; using just a single C layer won't do you much good. It might not reflect the brain much, but it kind of resembles what retina/visual cortex neurons do; CNNs were actually inspired by visual field maps found in the visual cortex, and somebody had the idea that C is the most similar CV operation we have and put them together. To everyone's surprise it worked nicely.


The next layers use exactly the same trick as the first one. I don't quite buy that it resembles the visual cortex.


It's probably just a very rough "resemblance" :D It is said CNNs were "inspired" by visual field maps; I am 100% sure we know very little about how that part of the brain works, and maybe somebody just took a look at the main/thickest connections between neurons there and tried to assemble them in a NN to see if it helps.


I'm thinking exactly the same thing.


Don't Echo State Networks already do that?


This shouldn't really be surprising. Machine learning is specifically not magic. The reason CNNs have seen so much success is precisely because they build in translation-invariance, which massively cuts down on parameters while forcing the final function to have the desired structure regardless of wherever gradient descent takes the weights.

That's also why most papers in deep learning are network architecture innovations.


One more relevant note: Olshausen and Field (1997) showed that the filters employed by V1 simple cells could be learned using some simple assumptions about sparse coding and a single image. Translation invariance was built in by way of the sampling scheme: small patches of the image.

The filters learned by the first layer of CNNs are usually of the same type: Gabor filters. Not a coincidence.

That was twenty years ago. What's old is new?


What do Gabor filters have to do with this?


Parent is saying Gabor filters are typically effectively recapitulated by the first layer of the network anyway, as they are a natural representation.


But what does that have to do with the smoothness and translation invariance this paper is a demonstration of? You can even learn Gabor filters with local connectivity and without spatial weight sharing.


Deep learning researchers rediscover Compressed Sensing. News at 11.


Can someone break this down for this layman?


Not an expert so take this with a grain of salt; I could be misinterpreting the paper.

It seems that the current accepted method is to train a network with distorted images as the input and the correct undistorted images as the targets. Then after training you can feed a new distorted image into the trained network and get the estimated "fixed" image.

However, this team actually uses the distorted image as both the input and the target of the net. So if they were to let the training go on for too long the network will produce an exact copy of the distorted input image. But for some reason, the structure of the network means that the estimated output learns realistic features first, and then overfits to the noise afterwards. So if you stop the training early, you get an image that incorporates realistic features from the distorted image, but hasn't had time to "learn" the noisy features.
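
A minimal sketch of that procedure (my paraphrase, not the authors' code; `net` stands for some convolutional generator whose output shape matches the image, and the iteration count is the early-stopping knob):

  import torch
  import torch.nn.functional as F

  def deep_image_prior(net, x0, num_iters=2000, lr=0.01):
      # x0: the degraded image, shape (1, C, H, W); net maps a random code z to an image.
      z = torch.randn(1, 32, x0.shape[2], x0.shape[3])  # fixed random input, never optimized
      opt = torch.optim.Adam(net.parameters(), lr=lr)
      for _ in range(num_iters):                        # stopping early acts as the regularizer
          opt.zero_grad()
          loss = F.mse_loss(net(z), x0)                 # fit the degraded image itself
          loss.backward()
          opt.step()
      return net(z).detach()

Let it run for far too many iterations and net(z) starts reproducing the noise as well, which is exactly the overfitting described above.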


This is fascinating because I've been running into something similar with sequence to sequence models translating natural language into Python code. I got better results stopping "early" when the perplexity was still quite high, I thought it was a little crazy.



Fascinating! Could you share more on the details of your experiment?


> So if they were to let the training go on for too long the network will produce an exact copy of the distorted input image.

It won't. The objective min_x E(x, x0) + R(x) they are trying to optimize (over output images x) amounts to a combination of:

- The output image x should "look like" the input image x0, this is the error term E(x, x0);

- The output image x should be "regular", this is the regularization term R(x).

The latter term prevents overfitting: R should be chosen such that noisy images for example are considered irregular (high R(x)).


Ah you're totally right about it not producing an exact copy, they even mention how they used different error functions for the different classes of distortion, my mistake.

But w.r.t. the regularization term, what I thought they meant by "we replace the regularizer R(x) with the implicit prior captured by the neural network" at the end of pg 2 was that they let the natural behavior of a neural network during optimization serve as the regularization, without need for an explicit regularization term. Not entirely sure though.


> for some reason, the structure of the network means that the estimated output learns realistic features first, and then overfits to the noise afterwards.

Why does this happen? What characteristics does the network structure have that cause this effect?


I think the reason this paper is so interesting is that no one had any idea why this happens


Wow, thanks for the explanation. I wish they would just use what you wrote for the abstract.

At first, I thought the machine learning algorithm was able to just bridge the gap in images from nothing...

CSI time: Enhance ... enhance... enhance!


The structure of convolutional neural nets specifies much of the prior knowledge necessary for learning. In other words, the design of these neural nets makes a lot of correct assumptions about the nature of images (stationarity of pixel statistics, locality of pixel dependencies, and so on).


By structure, we are simply referring to the number of layers, the number of neurons in each layer, and the specific connections between neurons in each pair of neighboring layers, right?

So in this paper, they carefully chose a certain structure, set the weights randomly, and then what happened after that? I understand that they did not then train it with a training data set, but I'm not quite getting what they did with the single distorted image.


Well given that it's CNNs, you're leaving out weight sharing.

So by structure you should also include the demand that the prediction of any NxN patch of the image should be roughly equal to the prediction of any other NxN patch of the image.


is the structure of these CNNs learned or designed?

Do they run some kind of optimizer to learn the optimal CNN structure or does some person sit down and pick structures to include in it?


Convolutional layers are designed, by and large, and they're mostly the same everywhere. Yann LeCun came up with them in the late 80s and early 90s, but their academic origins go back to at least the 50s and 60s.


Then it's sort of surprising that the results of this paper are that good, if the structure is just relatively old, hand-crafted and traditional.


Handcrafted, but auto-optimizing architectures are a hot research topic right now with DeepMind et al. It needs to be evolved at a deeper level than what they're doing now, so that architecture is discovered, not just optimized.


By structure, do you mean the architecture of the NN?

Meaning number of hidden layers, nodes per layer, and their connectivity with each other and with the input and output layers?


No, the paper means primarily the weight sharing in each kernel filter within each convolutional layer, and the stacking of these layers in deep networks.


Which in turn relies on older work.

I wonder if anyone has looked at what Mumford-Shah implications would look like projected onto a CNN?


Neural networks are powerful without even training them. By merely designing the structure of them you are creating something.


Truly, an occasion for this koan:

Sussman attains enlightenment

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

“What are you doing?”, asked Minsky.

“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

“Why is the net wired randomly?”, asked Minsky.

“I do not want it to have any preconceptions of how to play”, Sussman said.

Minsky then shut his eyes.

“Why do you close your eyes?”, Sussman asked his teacher.

“So that the room will be empty.”

At that moment, Sussman was enlightened.

http://www.catb.org/jargon/html/koans.html


If there ever was one this is it.


Okay, I admit it, I'm not enlightened. Will somebody please Explain Like I'm 5?


Initializing a neural net with random weights =/= making a neural net with no preconceptions on how to play. It still has those preconceptions, you just don't know what they are.

Closing your eyes so you can't see the contents of a room =/= the room is empty. Not knowing what something is, is different from it not existing


Thanks!


This kind of reminds me of what I took away from Gerald Edelman's work ([0]).

The gist as I remember it was that he theorized that the cortex is composed of assemblies of neurons formed in development (in the womb, potentially with epigenetic factors). Each assembly, just by chance of its structure, is likely good at something. Over time, assemblies take up the tasks they happen to be good at and further specialize with experience.

[0] - https://en.wikipedia.org/wiki/Neural_Darwinism


Creating is perhaps better stated as "asserting".


Could be wrong too, ML is not my field.

Basically, a generator neural network has two things which affect its output: an input and some parameters (weights).

Let's use the setting of the denoising task. They use a network whose output is the denoised image, and compare it to the noisy image to get a score for how good the denoised output is.

Now the strange thing is, for the input of the network, they just use random garbage. The only thing they move around to try get a good denoising score are the parameters of the network (not the input).

They find that by adjusting only the parameters, even with a fixed random input, finding the parameter setting that best fits the noisy image still gives great-looking results.

This suggests that the networks they tested this method with (architectures from other researchers' work that do well on this task) owe much more to the inherent structure of the network than to the refinement that comes from training on thousands of images, since even given random input they generate good results, as long as the parameters are tuned to the noisy image.


Instead of learning patterns from other images (training data), it starts with noise and deforms that noise based on patterns found in the image itself, favoring deformations closer to the input image; eventually it reaches something close to the input image without the noise (because the noise is pattern-less, or at least weak enough to lose out to the stronger patterns).


Does this say more about CNNs fundamentally or about our design process for their structure? What sort of assumptions might one make when "structuring" a CNN?


This is incredible! Can't help but wonder if the brain does something similar to fill up "gaps" in reality. e.g how we fill up our perception (not only vision, but general mental intuition) based on just context.


That's exactly what happens when the retina is damaged. The brain fills in the void imperceptibly so you aren't distracted by the deficit. But I think biology doesn't fill in the void synthetically. It's more likely that the brain "turns a blind eye" toward the missing 'pixels' and directs your attention elsewhere, perhaps to those remaining regions with high saliency (useful detail).


Not exactly. The brain has prior knowledge (deep learning). Let's take a picture of a car as an example. A license plate has letters/numbers, so even if the lower half is cut off you will try to guess by fitting the most likely symbol combinations. The approach from the article has no such prior knowledge; it wouldn't be able to inpaint meaningful letters, nor guess that most cars have wheels and bumpers, drive on the road, etc.


Yes it does.


As I'm not an expert in the field: what exactly does the term

  min_x E(x; x0) + R(x)
mean?

I thought that E(x; x0) would denote the error/difference between original and corrupted images, and R(x) be the (searched-for) correction. But this doesn't seem to make sense with the next parts of their explanation.


x is actually the generated image they are testing against x0. A lower E(x; x0) means an image which fits the objective based on the original image well (what that means depends on the task). The paper gives some examples. For example, for the task of image denoising, E(x; x0) is just the squared distance of the generated (denoised candidate) image x to the pixels of the original image x0. Obviously you would want this to be low, since the generated version should still look close to x0.

R(x) is a regularization term to avoid overfitting. For example, in the denoising example, it could be a measure of the variation in color of x. Clearly, just taking x = x0, the squared error (E(x; x0)) is 0, but it will have high R(x) because of all the noise. That's why they try to minimize both quantities combined, so we get min_x E(x; x0) + R(x), to get close to the objective but also not overfit.

https://en.wikipedia.org/wiki/Regularization_(mathematics)
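
For contrast, here is a sketch of the classical version of that objective, with total variation standing in for R(x) (my illustrative example, not from the paper):

  import torch
  import torch.nn.functional as F

  def total_variation(x):
      # Penalizes large differences between neighbouring pixels, i.e. noisy images score high.
      dh = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
      dw = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
      return dh + dw

  def denoise_classical(x0, lam=0.1, steps=500, lr=0.05):
      x = x0.clone().requires_grad_(True)        # optimize the image pixels directly
      opt = torch.optim.Adam([x], lr=lr)
      for _ in range(steps):
          opt.zero_grad()
          loss = F.mse_loss(x, x0) + lam * total_variation(x)   # E(x; x0) + R(x)
          loss.backward()
          opt.step()
      return x.detach()

The paper's move is to drop the explicit R(x) and instead optimize the weights of a ConvNet that outputs x, so the regularization comes from the architecture (plus early stopping).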


You're right, and here, they're including the prior R(x) in the model architecture. So R(x) = 0 here,

and they pose x = f_theta(z), where f_theta is a neural net.

So they can avoid overfitting by stopping the training at the right moment.

So they're not really trying to minimize E(x, x0), but rather to get a small enough value, so that they obtain good results. A too small value means overfitting.


I think x_0 refers to the corrupted image and x is the (denoised/upscaled/inpainted) image the optimization algorithm tries to come up with. Without R(x) the optimal solution is simply x = x0. So we need a good R(x) that can measure how good or natural the solution image is. In this work the R(x) is constraining the valid x to those that can be represented by a CNN.


Would a plausible explanation of this be that one part of an image tells you a hell of a lot about another part of it? And that ConvNet structure captures that really well?


Yes


How can it possibly know what was in the white areas of the library? Is there a residual image?

Seems impossible that it guesses correctly.


It cannot know, but it can extend what’s around it to something that “looks believable”. I don’t fundamentally think it is different from asking an artist to complete the image. You’ll get an image, but no actual information about what was under the white.


It doesn't guess "correctly" at all. Zoom in on the image, and focus on the filled-in areas, they look really blurry. It just doesn't look very bad from a birds-eye view.


It seems like it should be similar in capability to a wavelet-based approach to e.g. image inpainting. In other words the neural network architecture essentially establishes a basis of features onto which the image is projected. Gabor wavelets are known to occur in the human visual system as the 'image elements' in a similar setup -- Gabor wavelets are kind of optimal, but clearly the features that this nn architecture uses are pretty effective too.


Somewhat similar to content aware fill in Photoshop [0]. The untrained network can latch onto frequent patterns and match them to holes in the data.

Why doesn’t it paint everything white? Are these actually transparent images or are they somehow tagged?

[0]https://helpx.adobe.com/photoshop/using/content-aware-patch-...


Yes, for the inpainting, the parts to be painted (big white deleted areas) are supplied as masks, so it doesn't try to match them.


But it's generating unique content in those areas...


Yes, every "run" of the network is generating pixels in those areas, but they're not being compared against the white (deleted) pixels. On the final run of the network, they're still not being compared against anything, except by us, visually, when we look at those pixels.


Probably this is a consequence of least-squares training; the 'average' solution is extremely unlikely.


It doesn't; it repeats/generates pixels that minimize E, so the behaviour heavily depends on the choice of E. For inpainting, E is the difference between their generated x and the true x, but they mask out the error in the white areas. The model gets good results by just copying nearby pixels.
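
Roughly, the masked data term being described (a sketch, assuming a mask that is 0 inside the white holes and 1 elsewhere):

  import torch.nn.functional as F

  def inpainting_E(x, x0, mask):
      # Pixels inside the holes contribute nothing to the loss, so what the network
      # paints there is constrained only by its structure and the surrounding pixels.
      return F.mse_loss(x * mask, x0 * mask)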


Because your brain runs on the same kind of "CNN software", i.e. it does pattern matching, not image matching.


Maybe it's the presentation of the restoration process, but I'm particularly impressed with the inpainting sample.

The idea of not running the optimization `past` the realistic interpretation and using that result makes sense, but the results are way beyond what I would have expected!

Great work on the write up.


I feel like this is related to the information bottleneck idea that's been floating around for some time [1]. I only really understand both of these at a superficial level, but from what I think I understand one thing that they observed is that there are two phases when training a deep learning model: a phase which maximizes the mutual information (?) between the input and the output, and a compression phase which compresses the learned representation. In that light this work makes sense, since artifacts are essentially noise that the network would filter out in the fitting process.

Very cool work though.

[1]: https://youtu.be/bLqJHjXihK8


I don't see how any choice of a function g(theta) could have the property they desire, ie could eliminate R(g(theta)). Can anyone explain?


It's expressed somewhat awkwardly, but what's going on is that R(x) is zero if x is in range of g, and infinite otherwise. Choice of g is such that natural images are in range of g, and non-natural images aren't.
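
Written out (my notation, using the paper's f_theta for the generator):

  R(x) = \begin{cases} 0 & \text{if } x = f_\theta(z) \text{ for some } \theta \\ +\infty & \text{otherwise} \end{cases}

  \min_x \; E(x; x_0) + R(x) \quad\Longleftrightarrow\quad \min_\theta \; E(f_\theta(z); x_0)

i.e. the search is restricted to images the generator can produce, which is the implicit prior.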


Sorry I still don't understand. They require g to be surjective. Edit: in their paper they call it f_theta and it's explicitly not surjective. Dunno why their writeup is so confused.


Question to anyone that knows this area in depth.

I'm assuming that this impressive feat has a disadvantage (over the traditional example intensive technique of training a CNN with something like ImageNet) in the form of taking a long time to generate the corrected image.

With that assumption in mind, could this new technique be reversed? As in feeding in sharp images, getting back out corrupted ones, for the purposes of generating data sets where there isn't much data to begin with?

You could then take that data and use it to train a more traditional CNN to sort of amortize the results of the technique in the paper and have the process happening faster.


Do you mean generative adversarial networks?

https://en.m.wikipedia.org/wiki/Generative_adversarial_netwo...


No not really, this isn't generation for the sake of classification. It's generation in order to generate a dataset to train a new network that exhibits the characteristics of the network in the paper that can be used quickly and in a more general fashion.


Meaning that your assumption is that there is an optimal, generalized network structure for generating a neural network as described in the paper, and that as such, what is missing from the research is an optimal way to generate a malformed source/target seed.

Is that correct?


I'm not suggesting it's missing from the research as I don't have anywhere near the area of expertise to make that call.

But you're correct in your summary: I'm wondering if that's a possible follow up.


Isn't it simply hardcore overfitting of the network?

I haven't read the paper yet but it seems to me that it shows you can use overfitting for output interpolation with great results.

The same should be possible, for example, for interpolation of video frames instead of pixels.


So what kind of generative stuff does it "learn" from really really sparse data, or from random noise? Trained generative models can produce some crazy things, I wonder if something similar could work here.

Recursive geometric patterns perhaps?


An interesting question this raises is--what determines when the training has overfitted to the noise/error in the input image? If that's another NN that provides a perceptual model of quality, how was it trained? ;)


This is as close to magic as humanly possible.


Safe to say that Arthur C Clarke would count this as 'sufficiently advanced technology' then I guess :)



I'm finding it hard to put into words what I find wrong with this paper, but ... here goes nothing.

So, the novel thing here is that an encoder-decoder network applied to an image can learn enough from a source image to be useful. In some ways that's obvious, but the effectiveness of it on reconstruction tasks is certainly surprising.

I have two problems, though. One is that I would take the reconstruction results with a grain of salt. The examples are clearly lab queens, where the occluded regions are not particularly interesting/challenging.

Two is the conclusion the authors reach. Somehow the authors go from the novel discovery I describe above, to saying that somehow the architecture of the network is a prior.

Well ... I mean, yeah a network's architecture _is_ a prior. But it's not actually significant.

See, in the dark ages of machine learning we had only fully connected networks. They sucked. They'd always overfit and underperform or were impossible to train. Then we finally got convolutional networks, and suddenly a whole slew of machine learning problems became easier and that hurdled us into the current renaissance.

But, you see, convolutional networks weren't the _only_ reason for the dawn of this new age. Rather it was three major things: 1) Convolutional layers, 2) more data, 3) more computing power.

Some time after the "discovery" of convolutional layers we found out that, hey, our old fully connected networks actually _do_ work. If you give them enough data and enough computational power, you can get them to perform as well as state of the art convolution networks. The great thing about fully connected networks is that they assume nothing. That means A) you can theoretically get better results and B) you don't have to spend time designing an architecture.

So we already know that architecture isn't ultimately important. You can have a giant, fully connected network, and it _will_ work, if you feed it enough data and have the computational power necessary to train such a beast.

Convolutional layers are just simplifications which make training easier. They are priors in the sense that we know a fully connected layer in image applications would just devolve into a convolutional layer anyway, so we might as well start with a convolution layer. That "design" is the prior. But it's not mandatory; the network would still function without that "prior".
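
To put rough, illustrative numbers on that (my arithmetic, not from the paper):

  h, w, c_in, c_out, k = 224, 224, 3, 64, 3
  conv_params = c_out * c_in * k * k + c_out    # 1,792 weights, independent of image size
  fc_params = (h * w * c_in) * (h * w * c_out)  # ~4.8e11 weights for the same input/output shape

The convolutional version is the fully connected one with almost all weights tied or zeroed out; that constraint is the "prior".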

So ... I'm not sure how the authors are taking their research and using it to come to the conclusion that their results are because of some magical property imbued into the network by the "priors" of the architecture.

They apparently tried other architectures and got poor results, and so they use that to claim that architecture is the only reason their technique works.

That's like if you started with ResNet for a classification problem, tried other architectures, saw that they performed worse, and then published a paper saying that Residual Networks somehow embody the fundamental forces of natural images in their architecture, and that's why they work. When the truth is that ResNets aren't special, they are just easier to train.

Another example from the annals of machine learning history: time and time again when there is a breakthrough in architectures, it's usually followed a few years later by a simplification of the architecture. For example we started with networks like VGG which are these big, hand crafted architectures. Slowly over time architectures have become _less_ exotic, instead opting to simply define a basic building block repeated N times.

The reason for this is that in the intervening years we gather more training data, better training techniques, and more computational power. So we can instead use a more homogeneous architecture which has _fewer_ assumptions (fewer priors), and at the end of the day we get _better_ results.

I'll repeat that. We put _fewer_ priors into our networks and we get better results.

So on the one hand we have _all_ of machine learning history telling us that priors in architectures are _bad_. On the other hand we have this paper which makes some really weird logical leap from "we tried a few architectures, they were worse, so architecture is _key_ to machine learning and it's important because we need good priors built into the architecture."

Anyone remember hand crafted feature vectors? I do. Those were priors. Guess what happened when we got rid of them and used generic networks feeding directly from the raw data? Oh right, all of modern machine learning...


> Convolutional layers are just simplifications which make training easier. They are priors in the sense that we know a fully connected layer in image applications would just devolve into a convolutional layer anyway, so we might as well start with a convolution layer. That "design" is the prior. But it's not mandatory; the network would still function without that "prior".

As far as I know this is incorrect. Can you point to a paper that shows this? If by "easier to train" you mean that the models do not overfit training data, then that's the whole point of using correct priors / hypothesis classes.

I'm not sure what bugs you in this paper, but the point is that they decouple the prior architecture from the training/optimization mechanism, and that seems interesting.


>Another example from the annals of machine learning history: time and time again when there is a breakthrough in architectures, it's usually followed a few years later by a simplification of the architecture. For example we started with networks like VGG which are these big, hand crafted architectures. Slowly over time architectures have become _less_ exotic, instead opting to simply define a basic building block repeated N times.

>The reason for this is that in the intervening years we gather more training data, better training techniques, and more computational power. So we can instead use a more homogeneous architecture which has _fewer_ assumptions (fewer priors), and at the end of the day we get _better_ results.

>I'll repeat that. We put _fewer_ priors into our networks and we get better results.

Is the move away from VGG-like architectures a question of using fewer priors, or just a move to different priors that work better?

In addition, I'd claim that the choice of training method also imbues the whole ML system with prior knowledge. If we add a dropout layer, we wish to enforce a particular regularization scheme on the model because we know that we need to regularize the model to avoid overfitting. If we choose to use a particular optimizer instead of another, we use it because we know it has desirable properties for a particular objective function.

>So we already know that architecture isn't ultimately important. You can have a giant, fully connected network, and it _will_ work, if you feed it enough data and have the computational power necessary to train such a beast.

I'm slightly unclear about your argument here. What is the point of training the fully connected network for a long time and with massive datasets to get equivalent results and structure as convnets, if we already know that a convolutional structure with weight sharing is a good fit for natural images? Deep learning has been cast as a magical black box that produces impressive results "yet the scientists don't know why!" by the press. On the contrary, these results suggest there might be less magic in neural net models than the recent hype would suggest.

> The examples are clearly lab queens, where the occluded regions are not particularly interesting/challenging.

To me they look quite like the standard examples used in inpainting and denoising articles.


This is unbelievable that it works so well.


That Zebra one looks like a "before and after" image with really good MSAA. I realize the techniques are quite different, it's still interesting how the result seems so similar on a superficial level.


This is interesting news to neuroscientists as well. Even without plasticity cortical networks might be performing useful functions. Plasticity may then perform some different function.


I tried to run their ipynb but it wants something called skip. I don't think it's the "skip" on PyPi (that doesn't work anyway) - so what could they be using?


skip.py is in the repository under the models directory.


ok.. this is revolutionary. Using the architecture as a way to capture an image prior hints at how network structure and captured invariance are related. By analogy it leads to thinking of brain areas as both hard coded prior knowledge through their natural arrangement and plastic learning structure. Turning the problem on its head shines a new light to how we could conceive network architectures.


Is there a straightforward explanation of how to run this anywhere, for those not steeped in the arcane ways of Jupyter?


Cool but I think a learned method is more interesting, especially how it can be used for image compression.


Can someone put their results and explanation in laymen's terms please?


how do I get the code running easily? I'm familiar with python but not with ipynb at all. Maybe somebody can give me some easy to follow instructions, thx.


I find it impressive how it placed a lamp over the library window. After that I was expecting a vase with flowers to be placed on the table in the next, palace shot :)


Could you highlight which part this is? I think it can only fill in patterns that already exist in the image. I see a big artifact over the window but no lamp.


Well, probably it is that deep (I hope) neural network inside my skull that interpreted the artifact as a lamp.


I wonder if you could do something similar with audio


And turn crappy music into good music.


This is amazing.


Can someone please summarize this for us luddites? Also, wasn't there an HN summary project out there?


Am I missing something here? How does this apply to DNNs that try to recognize images? Does it mean we can train a net with just one image now?


Truly congratulations! Amazing.


CSI:Miami...I'm sorry, I take it all back.


The intel and espionage communities are going to be all over this. This makes the Soviet photo retouching look like child's play.

Let's say you want to start a war, and need some evidence of chemical weapons. Now, you can drop in some images of chemical weapons and claim a GAN found them. Sample press releases:

"We believe this photo was retouched to hide the chemical weapons. Using a GAN, we recovered clear photographic proof of the chemical weapons."

"We believe they are using this tin-roofed building to hide the chemical weapons. Using a GAN on our own satellite images, we recovered clear photographic proof of the chemical weapons."


Can't see "we used a GAN" as an explanation that would fly. To counter it you can simply say "well, the GAN could be wrong."

The technique outlined above is useful because we as humans can eyeball the results and say "the improvement on that blurry zebra photo is great!", but we know that it's an improvement based on a network's heuristics that's good enough for us, not an exact replica of the information lost through noise and compression.


I'll grant you that it makes a good scapegoat, but frankly this doesn't really matter... the US went to war with Iraq with bad intel before the machine learning boom.



