So those newly generated images are structurally very similar to the original sources. The neural net seems to be good at "reshuffling" the sources. That's probably how things like reflections on the water got there, even if they're not present in the doodles.
The algorithm can only reuse combinations of patterns that it knows about; it can extrapolate, but the result often ends up looking like a blend. However, you can give it multiple images and it will borrow the best features from each, for example drawing from all of Monet's work (there's a rough sketch of that idea below). This needs more optimization to work well though, as it takes a lot of time and memory.
As for the images, as long as the type of scene is roughly the same it'll work fine. The fact it can copy things "semantically" by understanding the content of the image makes it work much more reliably—at the cost of extra annotations from somewhere. With the original Deep Style Network it's very fragile to input conditions, and composition needs to match very well for it to work (or you pick an abstract style). That was part of the motivation for researching this over the past months.
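On the "multiple style images" point above: the usual way to sketch that is a weighted sum of per-style losses, so the optimizer pulls the result toward texture statistics borrowed from every source. A minimal illustration, where `style_loss` is a stand-in for whatever single-image style term the method uses (the helper name and weights are made up):

```python
def blended_style_loss(candidate, style_images, weights, style_loss):
    """Weighted sum of per-style losses: each extra style image adds its own
    feature extraction, which is where the extra time and memory goes."""
    return sum(w * style_loss(candidate, s)
               for w, s in zip(weights, style_images))

# e.g. drawing on several Monet paintings at once:
# loss = blended_style_loss(canvas, monet_paintings, [0.5, 0.3, 0.2], style_loss)
```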
That is, the final image, the one that looks the best, is the result of you tweaking the doodles to get something that the neural net can then fill in convincingly?
Or are these different runs of the same method on the same inputs, with some natural variability, and you selected the one that looked best?
Or are these progression steps in one run of the automated algorithm?
The language in the blog post is kinda ambiguous; I'm not sure which steps were done by the algorithm and which by a human being.
What still isn't clear to me is how exactly that "workflow" demo (and consequently the "money-shot" final generated images) happened.
There is a progression of generated images with increasing quality. Who did which steps in those iterations?
The blog post uses ambiguous language: "N-th image tries / removes / fixes", etc.
It's not clear though if it was:
1) algorithm steps (keep computing more till generated image looks good), or
2) human being tweaking inputs to fixed algorithm (keep painting new input/output doodles till generated image looks good), or
3) human being tweaking algorithm itself (change code till generated image looks good).
The output gets better because the glitches are removed incrementally through iteration, and it converges on a final painting that looks good!
I've done some experimentation with neural network-based style transfer (this one: https://github.com/jcjohnson/neural-style), and the results that I got pointed strongly to the same effect: it works well if the two images (source for style and source for content) are very similar in framing, composition and subject, and very badly if they're wildly different.
Having said that, this algorithm seems to be MUCH better than the one I tried at transferring style. I'd have expected those paintings to transfer to the doodles much worse than they did.
But don't expect to take a portrait doodle and a landscape source and have it come out well :)
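For context on what that experiment was optimizing: jcjohnson's neural-style follows Gatys et al., matching Gram-matrix statistics of the style image's features plus the raw features of the content image. Roughly, the two loss terms look like this (a NumPy sketch with the feature arrays assumed already extracted from a pretrained network; normalization constants differ in the real implementation):

```python
import numpy as np

def gram_matrix(features):
    """features: (channels, height, width) activations from one network layer."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)            # channel-to-channel correlations

def style_loss(generated_feats, style_feats):
    """Style term: match texture statistics (Gram matrices), layer by layer."""
    return sum(np.sum((gram_matrix(g) - gram_matrix(s)) ** 2)
               for g, s in zip(generated_feats, style_feats))

def content_loss(generated_feat, content_feat):
    """Content term: keep the generated image's features close to the photo's."""
    return np.sum((generated_feat - content_feat) ** 2)
```

Because the Gram matrices throw away all spatial layout, the style gets applied globally, which is part of why composition mismatches between the two images hurt so much.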
Representational art is all about modelling, highlighting and/or transforming the hinting, depending on the level of abstraction. E.g. if you look at portraits, the pen/brush strokes usually emphasise 3D structures.
This code does a little of that, but the model is extremely crude compared to the models the human brain uses.
For genuine semantic perception you'd have to duplicate - and maybe improve - the human model. I doubt you can do that in 2D, because the human model is trained by years of genuine 3D perception.
That's not to sound negative - I think this is very impressive visually. But it could be taken further.
Actually, the code does none of that ;-) All of the semantics are provided by the users: either as manual annotations or by plugging in an existing architecture for semantic segmentation / pixel labeling. It's designed to be independent of the source of the semantic maps, so we can continue to work on both problems separately.
It works for basic color segmentation already, and here are some of the papers we're integrating currently: http://gitxiv.com/search/?q=segmentation
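For the basic colour-segmentation case, turning a hand-painted annotation into per-class masks can be as simple as snapping each pixel to the nearest palette colour. A minimal sketch (the palette and class names here are made up for illustration):

```python
import numpy as np

# made-up palette: annotation colour -> semantic class
PALETTE = {
    (0, 0, 255): "sky",
    (0, 255, 0): "trees",
    (255, 255, 0): "sand",
    (128, 128, 128): "rock",
}

def semantic_map(doodle_rgb):
    """doodle_rgb: (H, W, 3) uint8 annotation painted in flat colours.
    Returns (H, W, K) one-hot masks, one channel per class, by nearest palette colour."""
    colours = np.array(list(PALETTE.keys()), dtype=np.float32)        # (K, 3)
    dist = np.linalg.norm(doodle_rgb[..., None, :].astype(np.float32) - colours, axis=-1)
    labels = dist.argmin(axis=-1)                                      # (H, W) class index per pixel
    return np.eye(len(PALETTE), dtype=np.float32)[labels]              # (H, W, K) one-hot masks
```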
For details, the research paper is linked on the GitHub page: http://arxiv.org/abs/1603.01768
For a video and higher-level overview see my article from yesterday: http://nucl.ai/blog/neural-doodles/
EDIT: Actually reading more closely I guess 10 minutes on a machine with a decent GPU is a lot of server load :|.
We'll try to integrate the idea of semantic style transfer into @DeepForger in the future, but this requires quite a bit of work to get it to reliably understand portraits or landscapes without anyone's intervention. The fact that it requires these semantic maps for all images makes it less straightforward to release as a service.
AMD is going to support CUDA somehow too; I think that's a sign they admitted defeat on OpenCL for this.
You could credibly put any words in the mouth of anyone.
I've even seen a demo converting one person's voice to another (without going through text) while trying to preserve the patterns (pauses, stresses, etc.). It was kinda cool, but you wouldn't genuinely mistake it for the other person.
NSA for notary?
Even for illustrative work, where it can give you a good base, it still sucks, because for actual painters this step (thumbnailing) is the quickest; most of the time-consuming painting process is 'finishing', or 'detailing', the rough.
However, where it's great is in giving inexperienced people the ability to paint well. The hard part of a painting is getting the lighting, color scheme, and perspective right, but the finishing process is quite mechanical. So it could ease the outsourcing of some art asset creation.
Of course the original craft for producing high-quality output is still needed - and just one image is not going to be sufficient anyway.
As you mentioned, I can also think of giving less experienced folks the ability to tinker with scene setup, dimensions, ratios, etc. and get faster 'final' results, although the 'old school' approach of getting those right before actually detailing something is quite important.
You can already do that in 3D if the overall concepts have been decided and some basic assets are there. Just switching the textures, lighting conditions, and some predefined building blocks does basically the same thing as this algorithm, except it's already part of the pipeline, and it gives you much more since it's 3D.
The other thing is that these algorithms look indecently good as thumbnails, but very bad if you look at them too closely. The cool 'concept art' pieces that look good are 90% of what people see, but they don't represent even 10% of the concept art work; most of it is boring detail: joints, how the blade is strapped to the costume, how windows open, stuff as unsexy as can be (and that you can't do with such algorithms).
 CNNMRF, Neural Style plus Markov Random Fields: https://github.com/chuanli11/CNNMRF
What I've found so far is that it takes a while to get good results, i.e. something that looks like its own creation instead of an overlap of pictures. There's no exact recipe for this. If you modify existing artwork it works well enough, since the source is already somewhat divorced from reality, but photos are difficult. When it works it's amazing, though.
First, the paper I call "Neural Patches" (Li, January 2016) makes it possible to apply context-sensitive style, so you have more control over how things map from one image to another. Second, we added extra annotations (which you can specify by hand or get from a segmentation algorithm) that let you control exactly how you want the styles to map. We call that "semantic style transfer" (Champandard, March 2016); there's a rough sketch of how the two ingredients combine below.
You're right about it being hard otherwise, it was for many months and that's what pushed this particular line of research! Try it and see ;-)
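Very roughly, the combination works like this: patches of the content's feature maps are matched to their nearest style patches, but only after the (downsampled) annotation channels have been concatenated onto the features, so "sky" patches can only borrow from "sky" patches. A simplified NumPy sketch, not the actual implementation (the 3x3 patch size and the semantic weight are illustrative):

```python
import numpy as np

def match_patches(content_feats, style_feats, content_sem, style_sem, sem_weight=10.0):
    """content_feats/style_feats: (C, H, W) feature maps from one network layer;
    content_sem/style_sem: (K, H, W) semantic masks downsampled to the same size.
    Returns, for every 3x3 content patch, the index of its nearest style patch."""
    def patches(x):                               # (C, H, W) -> (N, C*3*3)
        c, h, w = x.shape
        return np.array([x[:, i:i+3, j:j+3].ravel()
                         for i in range(h - 2) for j in range(w - 2)])

    # concatenating weighted semantic channels is what keeps "sky" matching "sky"
    a = patches(np.concatenate([content_feats, sem_weight * content_sem]))
    b = patches(np.concatenate([style_feats, sem_weight * style_sem]))

    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return (a @ b.T).argmax(axis=1)               # nearest neighbour by normalized cross-correlation
```

The weight on the semantic channels is what trades off "respect the annotations" against "find the best-looking texture match".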
You may remember the "Mazda" brand of bulbs
There's no additional training apart from that. The neural network is used to extract patterns (grain/texture/style) and a separate optimization tries to reproduce them as appropriate.
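To make the "no training" point concrete: the pretrained network's weights stay frozen, and the only thing gradient descent touches is the image being synthesized. A stand-in sketch in PyTorch (not the repo's actual code, and with a drastically simplified loss; real implementations tap several intermediate layers and use Gram- or patch-based terms):

```python
import torch
import torchvision

# pretrained feature extractor; its weights are never updated, so nothing is "trained"
vgg = torchvision.models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

style_img = torch.rand(1, 3, 256, 256)       # placeholders for the real style/content inputs
content_img = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    style_target = vgg(style_img)             # patterns are extracted once, up front
    content_target = vgg(content_img)

# the only free variable is the image being synthesized
canvas = content_img.clone().requires_grad_(True)
opt = torch.optim.LBFGS([canvas])

def closure():
    opt.zero_grad()
    feats = vgg(canvas)
    # stand-in objective: pull the canvas's features toward both targets
    loss = ((feats - style_target) ** 2).mean() + ((feats - content_target) ** 2).mean()
    loss.backward()
    return loss

for _ in range(30):                           # the separate optimization loop the comment refers to
    opt.step(closure)
```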
Just trying to get a bird's-eye view of the algorithm :)
 e.g. https://www.google.com/search?q=usgs+map&safe=active&client=...
These examples are in the paper above, direct link for convenience: