Open a NIPS paper from 2010 or so, and you'll see extremely dense mathematics: nonparametrics, variational approximation, sampling theory, Riemannian geometry. But from my (admittedly small) sampling of the convnet/RNN literature, there really doesn't seem to be much maths there. The typical paper seems to run along the lines of "We tried this, and it worked".
I'm not sure whether there's anything to learn from this observation, but I think it's striking all the same.
What this created, though, was a clean break: from a field with the kind of theoretical foundation we usually expect in CS to one with an empirical foundation, like what we see in psychology, sociology, medicine, etc. The theory will eventually catch up, but for now the fact that "it works" trumps having a full understanding of why it works.
Maybe... but the emergent behavior in a complicated system (and these networks are only getting more complicated, not less) is likely to quickly become more complex than the human mind can reasonably be expected to understand, given any amount of time.
We actually know a lot less about biology, for example, than your typical "biology 101" course would lead you to believe. It's pretty cool that we figured out how DNA translates into proteins, but actually figuring out how a cell decides which DNA to express at any given time, much less what any arbitrary protein actually does, takes you right back into guess-and-check territory. There are heuristics you can use to eliminate some guesses more quickly than others, but we're so far away from a unified explanatory theory of biology that I'm doubtful one will ever be developed.
I wouldn't argue that it means we don't know much. By that standard you could argue we don't know anything, since all empirical sciences rest on abstractions built atop yet more imperceptible abstractions, until you (don't) reach the bottom of the stack of turtles.
Effectively, deep nets are good at tricking the human visual cortex, so I'm not sure there's a deeper mathematical reason they work well, in the same way that the reason mp3 compression works so well is that it was tested against human hearing.
But yes, all deep learning is based on gradient descent, which is a greedy heuristic algorithm, and it's not a priori apparent that it should do anything interesting. But it does.
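The greedy, local character of gradient descent fits in a few lines. Here's a minimal sketch minimizing a toy one-variable function; the function, learning rate, and step count are made up for illustration:

```python
# Minimal gradient descent on f(x) = (x - 3)^2: at each step, greedily
# move against the local gradient, with no view of the global picture.
def grad_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # greedy local update
    return x

# f'(x) = 2 * (x - 3); the minimum is at x = 3
x_min = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 3))  # 3.0
```

For this convex toy problem the greedy steps happen to find the global minimum; the surprising empirical fact is that the same procedure does something useful on wildly non-convex deep-net losses.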
But there are deeper reasons that mp3 works well, besides "tested against humans". It's just that those reasons are about psychoacoustics. The most obvious example is that humans are bad at discerning amplitude and phase differences at high frequencies.
I expect that as we get deeper into the theory around why particular algorithms work, we'll find similar answers. We'll be able to say things like "when an image has the following statistical properties, we're likely to interpret it as a cat". Of course, right now we can trivially say that if by "the following statistical properties" we mean "activates this specifically trained network in a particular way", but the point is that we can probably boil that down into deeper (and simpler) statements about how human image recognition works.
I can't find a link about it now, but I remember reading that for their Translate app, Google created a way to lossily compress their character recognition neural net. That kind of research seems likely to me to lead to a better understanding of how to boil down the insights that these learning algorithms are gleaning.
This Two Minute Papers video has some thinking along the same lines: https://www.youtube.com/watch?v=ZBWTD2aNb_o
But I'd also point out that the human visual cortex is itself a bunch of models about the real world. It's gotten that way through a combination of evolution and learning. When engineers train neural networks to do tricks like TFA shows, those networks are effectively modeling the real world in two different ways: directly, and by proxy in modeling human visual systems. So the insights we can find by studying them are likely to tell us not only about our own vision, but about the world itself.
Finally, it's worth noting that graphics algorithms have always been about human visual systems and the real physical world, as opposed to deep mathematical truths. Look at this paper for example: http://www.jiansun.org/papers/Dehaze_CVPR2009.pdf
The oversimplified plain English insight those researchers had is "For a photo that doesn't contain haze, you usually can find a fully saturated pixel within some reasonably small neighborhood of any point in the image." There's no deep mathematical insight there, but there is an important insight into the statistical nature of natural images – a fact about the physical world. And they use that insight to great effect: they can remove haze from photographs, and even use the haze as a way of estimating a depth map.
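The prior described in that paper (the authors call it the "dark channel prior") is also cheap to compute. A rough sketch, assuming a small float RGB array; the patch size and test image here are made up:

```python
import numpy as np

def dark_channel(img, patch=3):
    # img: H x W x 3 float array. For each pixel, take the minimum over
    # the color channels, then the minimum over a local patch.
    min_rgb = img.min(axis=2)
    h, w = min_rgb.shape
    r = patch // 2
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = min_rgb[max(0, i - r):i + r + 1,
                                max(0, j - r):j + r + 1].min()
    return out

# A fully saturated red pixel has zero green/blue channels,
# so its dark channel value is 0.
img = np.ones((5, 5, 3))
img[2, 2] = [1.0, 0.0, 0.0]
print(dark_channel(img)[2, 2])  # 0.0
```

In a hazy image those local minima get lifted toward the airlight color, and that deviation is what the paper exploits to estimate haze thickness (and hence depth).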
So sure, while you're right that there probably aren't many deep mathematical insights to find from the results of deep learning algorithms (which is generally true of graphics algorithms anyway), I wouldn't gloss over the deep practical insights that they encode.
In some respects this is good, in other respects it's awful.
This sounds like science to me. Is "we tried this, and it worked" something to shy away from for some reason? Of course, it's super important to also publish "we tried this, and it didn't work." But that's another topic altogether...
If less math really is a trend, do you see it as a bad thing? You didn't state it strongly, but you've hinted at that, and it seems like the replies assume that's what you're suggesting.
I love math and mathy papers, and I would still welcome a trend toward less math in papers in return for more effort spent on making simple ideas plain and easily understood. But academics don't always operate that way. Math in papers is often used to obscure simple ideas, sometimes on purpose, and sometimes it's an indicator that the author doesn't understand the domain clearly enough but still wants to sound smart. Sometimes a paper really requires dense math, but not very often. Dense math almost always makes a paper more difficult to reproduce. Either way, it is harder, even for experts, to evaluate the quality of dense math than of expository writing that strives for simplicity and clarity.
Neural networks are really simple math under the hood: well-understood algorithms and basic linear algebra. Why not write great papers that work and, rather than re-hashing the math, focus on clarity, reproducibility, and results?
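To illustrate the point: the forward pass of a small network really is just a couple of matrix multiplies plus an elementwise nonlinearity. A toy sketch (the layer sizes and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network; the sizes (4 -> 8 -> 3) are made up.
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)  # linear map + ReLU
    return h @ W2 + b2                # another linear map

y = forward(rng.standard_normal((2, 4)))
print(y.shape)  # (2, 3)
```

Everything deep learning adds on top of this (backprop, optimizers, layer types) is likewise built from well-understood pieces; the mystery is in why the trained compositions work so well, not in the math itself.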
That means current research is operating via trial and error. One man's trial and error is another's blind search. Without maths to point researchers in the right direction, neural net research could easily hit a wall once the low-hanging fruit has been picked.
I'm not a web programmer, but I imagine few developers could remember the mathematics of the sorting algorithms that are fundamental underpinnings of their work (if they ever learned them at all). Yet I'm not sure it matters, even to great developers. The same thing will probably ultimately be true of machine learning. Honestly, you need not know what a convolution is to build a perfectly usable convnet. (And ultimately you may not even need to build your own if you can use a nifty Amazon API.)
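For anyone who does want to peek under the hood anyway: a convolution is just a sliding dot product (strictly speaking the sketch below is a cross-correlation, which is what convnet layers actually compute). The kernel and image here are made up:

```python
import numpy as np

def conv2d(img, kernel):
    # Valid-mode 2D cross-correlation: slide the kernel over the image
    # and take a dot product at every position.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A tiny horizontal edge detector: responds where brightness jumps.
img = np.array([[0, 0, 1, 1]] * 4, dtype=float)
edge = conv2d(img, np.array([[-1.0, 1.0]]))
print(edge)  # each row is [0., 1., 0.]
```

A convnet just learns the kernel values instead of hand-picking them, and stacks many such layers.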
Whether NIPS should care or not is a separate story. It seems a little sad - I took all this hard pure math as an undergrad, and it doesn't seem to be important if all I'm doing is changing a few layer parameters (even if the change is ingenious).
Mathematics as in proofs of correctness for sorting, proofs of bounds on space/time complexity, etc.
If you want your knowledge to be held as useful, then you need to find a use case for it. Simple as that. This is true not only for math but for all other techniques as well. Otherwise, so-called knowledge is yet another self-indulgent toy, disconnected even further from being useful.
Contrary to what the OP states here, the recent development of WGAN and LSGAN was pretty much math-driven, and it led to very useful real-world extensions of the original model that improve it quite a bit.
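To make that concrete: the WGAN change is mathematically motivated but small in code. Roughly, the critic drops the sigmoid and log of the original GAN discriminator and just maximizes the gap between mean scores on real and fake samples. A sketch of the two losses on made-up score vectors (not a full training loop):

```python
import numpy as np

def gan_discriminator_loss(d_real, d_fake):
    # Original GAN: binary cross-entropy on sigmoid discriminator outputs.
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -(np.log(sig(d_real)).mean() + np.log(1.0 - sig(d_fake)).mean())

def wgan_critic_loss(c_real, c_fake):
    # WGAN: the critic maximizes the gap between mean scores on real and
    # fake samples (no sigmoid, no log), approximating a Wasserstein
    # distance when the critic is kept Lipschitz (e.g. via weight clipping).
    return -(c_real.mean() - c_fake.mean())

real = np.array([2.0, 1.5])   # made-up critic scores on real samples
fake = np.array([-1.0, -0.5]) # made-up critic scores on generated samples
print(wgan_critic_loss(real, fake))  # -(1.75 - (-0.75)) = -2.5
```

The practical payoff claimed in the WGAN paper is that this loss correlates with sample quality and avoids the saturating gradients of the log-loss, which is the kind of improvement that came straight from the theory.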
We don't have much math I'm aware of that can describe the capabilities of different network configurations from first principles. Even though we constructed the network it feels like we're back to the beginnings of science with this one.
Let's change this and see what happens.
I think it is our duty to explore all the possibilities of neural networks. They are still not fully understood. Theory will catch up once we practically understand the beast. It might not be pure math, but it is a necessary exploration nonetheless.
Too bad, it is already there :P
Disclaimer: I am the author
This shows that computers will soon have the ability to fool our senses so well that we may not even believe reality when it is right in front of us. Some of the pictures, when I was just viewing them (before reading the captions or titles), looked real. I was astounded to see that they were derived from paintings.
The implications are significant, not just in things like gaming or finance, but especially in psychology, where the delicate aspects of the mind may be easily disrupted. I expect there will be significant growth in neuroses over the coming decades. Technology will have surpassed natural evolution by such a margin that it could be difficult to recover.
Current link: https://junyanz.github.io/CycleGAN/
Previous title: Berkeley's software turns paintings into photos, horses into zebras, and more
Previous link: https://github.com/junyanz/CycleGAN/
The previous title, which was based on the repo description of the previous link, was much more informative to me.
It'll be interesting to watch how copyright law treats training data. Suppose I want to make an animation in the style of Disney or Studio Ghibli or Kyoto Animation, and so I use their entire body of work as training data to generate output in the same style from my own sketches. Is that now a derivative work? Is it different when a human copies a style while drawing by hand (requiring a degree of effort and skill that few possess) versus if a computer does most of the work (requiring low effort and ordinary artistic skills)? Would animation be treated any differently than, say, training an AI to write songs like Bob Dylan or write stories like J.R.R. Tolkien or host a radio show like Garrison Keillor?
I wish they had more samples of that. I'd be interested to see where the threshold is, in terms of detail, for getting something vaguely photorealistic.
edit: in the other post there's a link to the paper with some good examples. I think the amount of detail you'd need would be pretty prohibitive in terms of artistic effort. On the other hand, it would be interesting to use this on something like a rough CG rendering to try to correct some of the shading imbalance and such.
Yet there really is something disturbing about seeing a computer resurrect so much of the mind of an artist who's been dead for nearly a century. I know the GPU doesn't understand what it's doing, but did Monet?
When I paint, I don't really understand what I'm doing. There are occasional moments of clarity, and I like to think that I make deliberate choices based on the emotions and thoughts within me that I'd like to reflect to the people experiencing my art, but... honestly, am I so different from a neural network?
I am a neural network. Much of what goes into creating a piece of art is based on intuition, on experience and practice, pathways eroded into my mind from years of thoughts traversing the same landscape. The art I create is unique to me, not in that it cannot be reproduced but in that every action I take is a reflection and an echo of all the moments remembered and forgotten that create me.
How much of Monet's mind is in a GPU in Berkeley?
That said, I am amazed by the results even so.
I also want to note here, that I don't see any reason why machine intelligence could not produce meaningful works of art but it will require a new way of looking at it.
Very little, arguably none. Imitation is far removed from creation. If you see an art student reproduce Monet's paintings or redo an existing image in Monet's style, would you ask "how much of Monet's mind is in the art student?"
Sadly, I didn't work through the books like I had planned at all. I lacked the discipline to come home and work through them at the time. I really regret it now that things are blowing up and, as you said, it seems like every week there is something new and interesting.
I wonder if it could start with a normal portrait of you as input, some celebrity of your gender as the style, and output what you would look like rendered in that style? It doesn't seem beyond the examples shown at your link...
It would bring a whole new meaning to the word "filter" (instagram, etc.) I particularly like that the original is very much present in the output: it would still be "you".
But maybe there are subtle problems that I don't notice because I don't look at the subject matter as carefully. People pay a lot of attention to recognizing each other; perhaps the effect would not transfer as well as in the examples presented here.
Matlab needing a commercial license is at least one barrier.
- In the past, you needed to have a pianist at home to perform a song for you; with the music box, and then the phonograph, you no longer need to hire anyone. It's probably not as good as a live performance (maybe?), but it's good enough for many people, and much much cheaper, faster, and available.
- You needed advanced knowledge and equipment at home to produce magazine-style tri-fold leaflets or wedding invitations; with modern word processors you can use a template and be alright. It's probably not as good as a professionally customized design (maybe?), but it's good enough for many people, and much much cheaper, faster, and available.
- You used to hire a photographer or an artist to have your portrait photographed/painted, now you can do with your NVIDIA card at home. It's probably not as good as a professionally painted one, but it's good enough for many people, and much much cheaper, faster, and available.
...in fact, I know you can already get photos printed to canvas - but taking it to the next stage of texture would be amazing - right now, I think the best you can get is to have a trained person "highlight" areas of the canvas with paint. From what I understand, there's a whole "village" or small city in China that specializes in custom painted images (which a lot of online places use); I would be surprised if there isn't an effort to automate this work.
By way of analogy: I own some furniture from Ikea and some nice antique furniture. I don't know or care much about the provenance of the antique furniture. It's made up of bargains from yard/estate sales. If a big box store could sell me an inexpensive antique-alike bureau that was indistinguishable from my existing one (to unaided human senses), I'd happily buy it. I want the thing more than the story behind the thing.
Why bother catching politicians doing something when you can just draw them doing it? This will wreak havoc on societies with weak political/reporting cultures.
Journalist: "Here is a compromising photo of a politician."
Politician: "Here are 1M photos of every politician doing every imaginable illegal act. Prove that your one photo is not similarly fabricated."
Iconic photos are crucial for telling news in a way people remember. For example:
* We remember Tiananmen Square because of [tank man](https://en.wikipedia.org/wiki/Tank_Man)
* We remember napalm in Vietnam because of [Phan Thi Kim Phuc](https://en.wikipedia.org/wiki/Phan_Thi_Kim_Phuc)
* There are too many pictures from the civil rights movement, so I'll [just link to Getty's gallery](http://www.gettyimages.com/event/the-american-souths-troubli...). These pictures tell the story of violence against Black Americans far better than any article ever could.
errr.... I hate to break it to you but... https://github.com/phillipi/pix2pix/blob/master/imgs/example...
Of course, it relied heavily on everybody knowing everybody else in much smaller societies, but we could achieve that too by doing an AR face lookup on every person we ever interact with.
Of course, eyewitness testimony is notoriously unreliable even when not intentionally deceptive, and testimony is much easier to falsify even without any technical aid than any other form of evidence, so relying solely on personal testimony wouldn't really help to avoid fabricated evidence, anyway.
...or a drug-usage scene.
...or any other thing you could think up. It's even possible this could be pushed into a generated movie. You could even work the "voices" in (have the person say virtually anything you want, nearly perfectly, synched perfectly with the facial expressions/movements/body language).
There's a lot of scary, yet interesting possibilities here!
Hmm - think of the possibilities for CGI effects? Rather than having to build a 3D model, just have the software imagine the scene (I know - much easier said than done - but I think the possibility is there, now).
Another idea: could this kind of software take a movie or still image and "imagine" a 3D model from it? I don't see why not...
...and yes, I understand this software to be a particular form of neural network, and that any of this takes massive compute power in the form of multi-GPU time and a ton of memory. It likely isn't yet in the realm of doing it yourself at home except for simpler examples, which might still take days or weeks to train and a good amount of time to render. I just want to indicate that I have a certain minor level of understanding of what this software is about, and that I don't believe for a second that it is a basic "photoshop" filtering system.
Seriously, if this continues, I don't know how to keep up with this field. I spend at least an hour a day just reading about the work that has been done (i.e. reading the research).
>All DiscoGAN experiments are on 64x64, this is high resolution. I don't know whether this is an important difference though.
However, the kind of network used here (GAN) can be used in domains other than images (text, financial data).
Imagine if you trained a network to generate fraudulent financial data and another to become an expert at catching fraud, each feeding back into the other's skill. This is the concept of GANs at heart and definitely disruptive if correctly executed.
Could be really useful for those age-progressed photos used in missing persons listings, for instance.
If you look at the nature of the transformations achieved in this paper, you might note that they are changes in, well, style. That is, the presentations of the objects in an image are represented using a different _style_, but they remain the same object.
As an example, take a look at the horse/zebra transformation; the horse obtains a zebra's stripes, but it _structurally_ still looks like a horse. That is to say, a zebra and a horse have identifiably different bodily proportions, and the horse's bodily proportions are not changed by the style transfer. Similarly, the trees in the summer -> winter transformation do not have the slightly saggy branches that they would due to the weight of snow on them.
With that in mind, I would be surprised if the approach taken in this work, as-is, could change the gender or ethnicity of a person in a photo. There are structural differences between men's and women's faces, and similarly between ethnicities. I would imagine that an attempt at a race transfer, as I suppose you would call it, would largely amount to changing skin tone.
The age progression application, though, might work under the limitations I have speculated, at least for aging photos of a person that is already more or less mature. A person's facial structure does not significantly change once adulthood is reached, so simply transferring stylistic features might add wrinkles and other age-related changes in a realistic way.
Again, speculation. And even if the speculation is correct, that is not to say that some modifications to the approach would not be able to lift the constraints I made up.
Edit: reading through the rest of the paper behind this work, looks like I might not be far off:
>Although our method can achieve compelling results in many cases, the results are far from uniformly positive. Several typical failure cases are shown in Figure 17. On translation tasks that involve color and texture changes, like many of those reported above, the method often succeeds. We have also explored tasks that require geometric changes, with little success.
The quote is from Section 6 ("Limitations and Discussion"), and example "limitations" are given in Figure 17.
The full project can be seen here:
This is easily done in Photoshop:
- select the green pixels, smooth the selection a bit, then paint white over them
- apply a blue cast on it
I wonder if there is a way to fix this, possibly by stacking another GAN on top?
Maybe I'm out of the loop, but I haven't seen anything demonstrating results on "data" – the kinds of challenges that are actually valuable to businesses.
Why is that? Are those just less sexy / more proprietary in nature, or is there something about those challenges that make NN's less useful to them?
Second, there is also the peer-review problem. You are still trying to explain a very abstract concept to your peers in a paper which is usually limited to 6 or 8 pages. Text and images make for very graspable examples in such a short paper. That's the reason why other data with a spatial prior, like time series or EEG data, is not used as often.
So, there is a combination of those two elements at play.
The second reason is pretty bogus (text/images more graspable). It's valid if you're talking about mass media / popular press. But for research papers 1) images / large snippets of text are actually a negative since images take a lot of space and 2) the people doing peer review are expert scientists. They know the benchmarks and the theory.
Examples are the whole Predictive Maintenance sector, the medical sector (Computer Aided Diagnosis), or insurance companies, which use NNs for all kinds of analyses.
In fact, I remember talking to an ML guy with a PhD who was working on one of these types of problems, and I asked "why not try NNs on this problem?" He looked at me with disgust and said something akin to "it's provable that NNs can never do better than BOOST, so why use them?"
However, boosted decision trees don't work on image analysis at all, so these types of problems have become the standard for NNs.
It is also worth noting that people are more likely to try image problems if everyone else is trying image problems, because then it is easy to compare multiple algorithms together.