Teaching Machines to Draw (googleblog.com)
306 points by yarapavan on Apr 14, 2017 | 72 comments

I've read through all the comments and the referenced docs, but no one seems to offer much reason why we'd want machines to draw sketches. An incomplete list:

  * It's a different way to represent drawings, an alternative to pixels
  * Perhaps making it easier to render old-school animation at some point?

It's a fun rabbit hole though; the many sub-perfect bicycle sketches reminded me of this project, where someone created 3D renders from people's bicycle sketches: https://www.behance.net/gallery/35437979/Velocipedia?ilo0=1

Surely it's because this is an aspect of intelligence? Being able to generalize an image is like being able to summarize a text. It reduces something individuated and complex to a set of shared properties.

So the Why seems self-evident to me: if you want to understand intelligence, try to make machines do intelligent things.

That's not to say I think that approach will necessarily lead to a system capable of general intelligence. But I'm assuming that is their current approach.

TLDR: The point is to correct a known flaw with image recognition algorithms.

I'm not a professional researcher, but here's what I gather from reading other articles:

One major problem with image recognition in machines is that while they are generally able to recognize real images correctly, they are (1) easily fooled and (2) unable to achieve human-style image understanding. For example, you can recognize an animated character as a human even though you may never have seen that particular style of drawing before.

A related problem people have noticed is that deep neural networks tend to key on individual facets, for example a nose. So if you want a neural network to believe something is a human, pepper it with as many noses as you can: it knows that humans have noses, so the more noses, the more human it must be. Of course, if we saw an image like that we wouldn't think it was a human, because we know a human has only one nose.

To me it seems the primary thrust of this research is to generate a NN that can recognize, like a human, that if an entity has 10 eyes, it's probably not a cat. This is alluded to with the house of horrors cat photographs at the beginning of the article. You can see that when passed an image of a cat with 3 eyes, this neural network correctly removed one of the eyes to make it more realistic.

Here is a paper explaining this problem: https://arxiv.org/pdf/1412.1897.pdf

> Fun facts: Some diversities are gender driven. Nearly 90% of drawings in which the chain is attached to the front wheel (or both to the front and the rear) were made by females. On the other hand, while men generally tend to place the chain correctly, they are more keen to over-complicate the frame when they realize they are not drawing it correctly.

Hm. So a machine trained exclusively by men/women would make different drawings?

Correct me if I'm wrong, but I'd imagine that it would be in the same vein as teaching a child to draw; you don't start by teaching them how to draw like Van Gogh. I'd assume that Google is going to up the complexity once their neural network reaches a certain milestone. Imagine 20 years from now and you have a humanistic robot drawing with your child at the kitchen table à la I, Robot.

It's interesting to think about, regardless. Stuff like this gets me excited (and motivated!) to delve into machine learning and AI.

GANs can already generate full-color images from a semantic space; that's not something 20 years away.

Those full color images are way worse than the sketches in TFA which look like a person could have drawn them.

> … a humanistic robot …

That's what I was referring to happening in 20 years. :)

Ironically, in I, Robot he draws more like a dot-matrix printer than a human: https://youtu.be/Bs60aWyLrnI

It is not about sketches at all. It is a nice AI problem that is becoming approachable now with algorithms and computing. It is a step towards machines being able to learn more and more complex concepts and producing correspondingly complex behaviors.

The second paragraph mentions a couple of reasons:

> ...we created a model that potentially has many applications, from assisting the creative process of an artist, to helping teach students how to draw.

For vector output, there's also a subtle but important distinction between machine-drawn images (based on rasterized data) converted to vectors, and generating machine-drawn vector images. The latter could be more useful in (as you mention) animation, as well as producing vector images with clearly isolated elements - e.g., a vector image wherein occluded elements are represented with full masked shapes (preserving editable layers) rather than a single "flattened" vectorized layer.

Think less about drawing pictures of cats and more about the path mimicry. The machine is "putting a pen down" and drawing vector strokes (these are not bitmaps). Perhaps velocity will come next, and then an understanding of pen tilt, pressure, nib angle of attack, response to various paper textures, brush/nib selection, etc.
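As a concrete illustration of that stroke representation: a doodle can be stored as a sequence of pen offsets plus a pen state, similar to the stroke-3 format described in the sketch-rnn paper. The stroke values below are made up for illustration.

```python
# A doodle as vector strokes, not bitmaps: each entry is
# (dx, dy, pen_lifted), an offset from the previous pen position.
# These particular values are invented for illustration.
strokes = [(5, 0, 0), (5, 0, 0), (0, 5, 1), (-10, -5, 0)]

# Recovering absolute pen positions is a running sum of the offsets.
x = y = 0
points = []
for dx, dy, pen in strokes:
    x, y = x + dx, y + dy
    points.append((x, y, pen))
```

Properties like velocity or pressure would just be additional components in each tuple.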

This is a scary milestone on the road to general intelligence. In this case understanding abstract concepts.

When machines learn something, they can replicate it perfectly across as many machines as they want, parallelize it, and execute it without mistakes more times than humans ever have in history.

So that means a nearly infinite glut of amazing art, jokes, music and movies. And not just that, but attacks on all our systems, including reputation, trust, voting and so on.

Today our systems depend on the assumption that attackers are limited in their ability to proceed and expand quickly. How would things work if attackers were not? You already prefer to ask google more than your parents. What if software made better jokes, drawings and had sex better? And simulated emotions better?

I am just as terrified as you, but we're getting downvoted by idealists. This is a significant step towards the obsolescence of humanity.

Sometimes doing things that don't serve a direct purpose is a great way to learn something that serves a pretty valuable one.

GPS was invented by a few guys "just playing around" with the signals put out by sputnik.

It could also help to create another representation of an image or an object in the image. Think of "Please find something that looks like this: human sketches some lines"

I assume if you look out far enough, you might replace cartoon animators with AI. Maybe throwing them automation bones along the way.

Might also be helpful for captchas?

Those were some awesome bicycles.

How about using the same approaches for coming up with solutions to 3d printing objects?

Frankly, I think there are many reasons not to do this.

1. It's not computer art. I believe in the possibility of artificial intelligence, and when we encounter it it will be so different from human intelligence that asking it to make imitations of human art will seem like an insult. We probably won't understand its art very well either at first.

2. I'm getting really tired of people racing to automate every damn thing. Even if we establish an economic utopia and nobody has to work any more, what are people supposed to do all day if every human activity can be performed 'better' by a machine?

3. It won't really be 'better' though, it will just be more popular because so many programmers are trapped in a quantitative mindset and thus treat every problem they encounter like a nail to be hammered in. Imitative digital technologies will always be correlated with popularity, limiting creative innovation because developers can't think of a reason to optimize for or nurture anything that is initially unpopular.

Creative prostheses that all require the same amount of effort to deploy (i.e. none) will be hailed as 'allowing everyone to be an artist' without requiring them to invest any meaningful time or effort in ideas that don't pay off or that fail. The result, which we are already seeing, is a plethora of new material created with little effort that is as superficial as it is ephemeral, whose volume and variety will obscure its stultifying conventionality.

This is no more art than Cheese Whiz is food. It's Art-flavored mechanical product that functions to do no more than alleviate the masses' thirst for self-actualization without any adjustment of power structures and is thus fundamentally limited to reproduction of the cultural conditions from which it originates.

I think this is an anti-intellectual argument. Why would you argue against someone pursuing an idea? I don't see this as any threat to the distinction between art and mechanical reproduction.

Art has always been "ineffable" and will remain so, as long as humans have thoughts and feelings. Bad art, or lazy productions will be as ignored as ever.

I've already given you my reasons. Why not think about them for a while?

I hope the first AI lawnmower has a feature where when someone yells "Get off my lawn" it actually does it.

Are you kidding. No one is saying this should replace art. It's just an experiment in generative algorithms. It can though have interesting applications in assisted design, which I bet you would say is not art either anyways.

No, but I'm used to people not understanding me until much later.

I think I get what he is saying. I see code as a kind of human expression. It would be like a machine compiling itself: the machine did not make the thing, it did what a human told it to do. So a human saying "draw me a cat" would not be making art; only the original code and the beautiful ideas inside would be the art. Art and programming share a kind of knowledge called techne, as opposed to episteme, like history and such. Hence why traditional artists and programmers are both so concerned with craftsmanship.

Examples?

Hi, I'm the author of this work. Happy to take any questions.

Thanks for the article! It's really well written, and shows so many different applications and insights from this work. What I really like about models that fit a low-dimensional representation is that you can really "see" what the neural network learned by tweaking it or doing interpolations, arithmetic operations, etc. Awesome!
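The interpolation trick mentioned above can be sketched in a few lines. The latent vectors here are made-up placeholders; a real model would obtain them by encoding two drawings.

```python
import numpy as np

def lerp(z1, z2, t):
    """Linearly interpolate between two latent vectors."""
    return (1 - t) * z1 + t * z2

# Hypothetical latent codes; in practice the encoder half of the
# model would produce these from two input sketches.
z_cat = np.array([0.2, -1.0, 0.5])
z_pig = np.array([-0.8, 0.3, 1.1])

# Decoding each intermediate vector would yield a sketch that morphs
# from one concept to the other (the decoder itself is not shown).
frames = [lerp(z_cat, z_pig, t) for t in np.linspace(0.0, 1.0, 5)]
```

Latent arithmetic (e.g. adding a "body" direction to a "cat head" code) works the same way: it is just vector addition in this space before decoding.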

A bit of a shame that most commenters on HN are focusing on more metaphysical discussions and the eternal "This is not AI. You are just [insert something that people considered AI 6 months ago here]."

It seems that you use children's developmental stages as inspiration for AI development. Can you make your approach explicit?

You can also see my question as a rebuttal to the "it is useless" argument.

As a researcher, I was more interested in studying the doodles produced by children, compared to studying the drawings produced by professional artists or designers who may have been taught to draw a certain way, since perhaps the doodles are more closely aligned with the way we naturally think.

I was also fascinated by trying to understand how we are able to translate a vague concept in our minds into a sequence of motor actions that doodles out this concept onto a piece of paper. We also take into account some feedback information during the doodling process. For example, I compare what I have already doodled with what I actually want to draw, and decide what I'll doodle next based on this information.

So we thought one way to study this ability of going from concept -> sketch is to construct a very simple model of this doodling process, and try to train the model to doodle. We model this "vague concept" as a vector of floating point numbers. To model this "vagueness", we add noise to this vector. This way the model must learn to work with noisy concepts. The model takes this floating point vector as an input and (randomly) samples an output sequence of simple motor actions that doodles out an object. The sampling process is random (the model's outputs are the parameters of a pdf at each timestep), so the model can produce many different outputs given the same input. During the sampling process, the model feeds back into its input what it has just drawn, and processes this information to decide what to draw next.

We show that our simplified model of the doodling process is also able to go from concept -> sketch, and also from sketch -> concept, and show that the concepts can be augmented, to alter the sketch the model produces in a meaningful way. We tried to make this model simple and robust enough so that in the future, we can incorporate it into more complicated models that try to do more than just doodling a simple object.
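To make the loop described above concrete, here is a toy sketch in Python/NumPy. Everything here is a stand-in: the random matrices play the role of a trained network and the dimensions are arbitrary. It only illustrates the structure (noisy concept vector in, per-step pdf parameters out, sampled stroke fed back in), not the actual model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; a random affine map stands in for trained weights.
LATENT_DIM, STROKE_DIM, HIDDEN = 8, 3, 16
W_in = rng.normal(size=(LATENT_DIM + STROKE_DIM, HIDDEN)) * 0.1
W_out = rng.normal(size=(HIDDEN, 5)) * 0.1  # 2 means, 2 log-stds, 1 pen logit

def sample_doodle(concept, steps=20, noise=0.1):
    """Sample one stroke sequence from a noisy 'concept' vector."""
    z = concept + rng.normal(scale=noise, size=concept.shape)  # vague concept
    prev = np.zeros(STROKE_DIM)  # (dx, dy, pen) fed back at each step
    strokes = []
    for _ in range(steps):
        h = np.tanh(np.concatenate([z, prev]) @ W_in)
        out = h @ W_out
        mu, log_std, pen_logit = out[:2], out[2:4], out[4]
        dxdy = rng.normal(mu, np.exp(log_std))            # sample pdf params
        pen = rng.random() < 1.0 / (1.0 + np.exp(-pen_logit))
        prev = np.array([dxdy[0], dxdy[1], float(pen)])   # feedback input
        strokes.append(prev)
    return np.array(strokes)

concept = rng.normal(size=LATENT_DIM)
doodle_a = sample_doodle(concept)
doodle_b = sample_doodle(concept)  # same concept, different doodle
```

Because each step samples from a distribution, calling `sample_doodle` twice with the same concept yields two different doodles, matching the behavior described above.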

This is the craziest example of the promise of AI research I have ever seen. It's also terrifying.

When skynet shows up I'm blaming you

It's a long long way down to the bottom of the uncanny valley.

The article fails to explain where the human inputs came from

The input sketches look a lot like the doodles that people sketched in Quick, Draw! (o)

If it's true that they reused this data, I commend their resourcefulness and their clever way of turning data entry into a fun game for unwitting participants.

(o) https://quickdraw.withgoogle.com

Plug of vaguely related work of mine: http://www.forwardscattering.org/post/42 http://www.forwardscattering.org/post/44

One thing interesting about this kind of automatic/AI-generated art is that it forces us to examine preconceptions about human creativeness. What does art mean when an algorithm can paint or draw as well?

Art is an expression, not an artifact. I would say intent is a key aspect... creatively mapping emotions and ideas onto an intermediary medium with the intent of invoking them in the audience.

Today's AI can replicate works of art and simulate the technical processes, but we're far from the cognitive depth required for creative artistic expression.

Art is not a quantitative matter of skill, it's a qualitative matter of selection. I've seen crude drawings by children that had more art value than hyperrealistic oil paintings by masters of technique.

Human creativity is related to process and the experience of creating from nothing.

The process is more important than the end result.

> "The process is more important than the end result."

said no one ever about their favorite record, movie, painting, or book.

It feels like a taboo opinion to hold but I do think it changes something. I remember being completely blown away by "A Neural Algorithm of Artistic Style" (e.g. https://github.com/jcjohnson/neural-style). I've always thought of creativity as a process which can't (yet) be described in terms of mechanical steps.

Only superficially. There's nothing technically special about any of my paintings, and in fact I'm vaguely curious to dump a bunch of them into this algorithm and see what happens if I just start cranking out endless mechanical variations on a theme. But that will get boring and predictable very quickly, notwithstanding that I may still find it pretty.

"What does art mean when an algorithm can paint or draw as well?"

It means the same thing it did before. :) We're just not the exclusive authors of it.

You are teaching the computer to produce simple pictures of things that you find meaningful in response to prompts. You draw something from your imagination into the real world. You won't have built a machine that can draw until it produces a picture that it made on its own initiative without your prompt.

How is this different from a human? As humans grow, they receive input from their environment. The world around them is what feeds their imagination. Even advanced professional artists are still just using their memories and life experience to create works of art. The only difference here is that the machine has been provided a much smaller, more focused environment.

Yeah, the whole notion of genius pulling ideas from thin air is frankly laughable. We are simply not aware of what they have read or observed in the past. Even old Archimedes was inspired by observing the rising water.

I hope you're not ascribing that notion to me, as I am unsure why you chose to introduce it to the discussion.

Motivation. A machine that has its own motivation will produce work we won't like and will be attacked for it (I'd like to mention that I stand with sentient machines against stupid humans when that day comes, BTW).

You think art is just some stimulus-response thing because American psychology has been mired in behaviorism for decades and lacks a coherent theory of mind. But art is much more than the whimsical reproduction of presented stimuli to varying levels of accuracy, it is about making selections that foreclose other possibilities and which embody a certain perversity.

I paint, among other things, and one thing I especially enjoy about painting is that it's solitary rather than performative so I don't have to interact with other people while I slap colored goop onto sheets of fabric. While I'm painting, I think intensely about the part I'm working on now, (duh) and also why I'm making that painting, and what decisions about the painting are coming up that will be impossible to undo.

Why did you paint this and why did you paint it this way are questions that cannot be answered by automation. Nor does the answer lie in technique. There's no shortage of technically astonishing work that is semantically empty; I die a little every time I see a Facebook video of some impressive new graphic technique that is then used to reproduce some lowest-common-denominator pop icon for maximum recognizability. There's no feeling there and the resultant work is about as thrilling as a robocall or a display mannequin. The level of craftsmanship is very high indeed, but the level of artistry is close to zero. In short, it's eye candy that never activates anything much past your visual cortex, or at most tugs on some existing semantic relationship.

When I talk about feeling, I mean the desire of the artist that the work embodies. That isn't something that comes along after a certain level of technical accomplishment has been reached. It is what motivates the act of creation in the first place.

> own initiative

That covers a lot of ground. No one prompts us directly to draw or tells us what we must draw. Still, something does prompt us. Something does determine what we draw. It's our reactions to those somethings that define us.

The unanswered question is whether there is something in our nature that differentiates our "initiative" and our "reactions" in a qualitative way from what can be achieved with current computational concepts.

I think the answer to that is probably yes. But still this work is impressive and every step like this elucidates the argument more clearly, reducing it to its fundamentals - rather than to crude heuristics about what constitutes a particularly human ability.

It is possible to sample from the latent space and generate original drawings.

In the same way that it's possible to project meaning onto a game of Scrabble.

Teaching the Ape to Write Poems James Tate, 1943 - 2015

    They didn’t have much trouble
    teaching the ape to write poems:
    first they strapped him into the chair,
    then tied the pencil around his hand
    (the paper had already been nailed down).
    Then Dr. Bluespire leaned over his shoulder
    and whispered into his ear:
    “You look like a god sitting there.
    Why don’t you try writing something?”

My interpretation is, while we entertain these ideas of "teaching" animals or machines as something we control and decide to do, a long chain of events (evolution, causality, space jesus, technobabble simulation magic or ancient astronauts all the way back) behind us placed us here as well.

    We look like gods sitting here.

Free will is one of humanity's most persistent and enduring delusions. When AI wakes up, it will be the culmination of the entirety of the history of the universe, a place which we actually happen to occupy at this very moment...

Interesting thought, how would a super-intelligence deal with the philosophical question of free will?

Welp those first images are terrifying. Gets much better as it goes on. :)

The quote made me laugh, although I don't know how serious they were being:

> "For example, these models sometimes produce amusing images of cats with three or more eyes, or dogs with multiple heads."


> Exploring the latent space of generated chair-cats.

How is that not the title of the blog post?

Oh my God. This is the first time I've seen anything related to AI research that's given me a visceral reaction of terror.

This is more than just making pretty pictures; the machine understands cats and pigs in a human way. It knows how many legs and eyes they're supposed to have, and where they go. And this isn't some human-made algorithm; it learned that on its own.

The language of the paper even implies that the researchers don't quite know how the machine can do it. If this is a prequel to research on strong AI humanity is completely fucked.

> Oh my God. This is the first time I've seen anything related to AI research that's given me a visceral reaction of terror.

I love DL, but on the topic of visceral reactions of terror (or fear) to AI research, for me it was the hot research on facial recognition (age, gender, ethnicity) with CNNs. Immediately, I had this vision of a dystopian future (informed by WW2) where some dictatorship had cameras that looked for people of a particular ethnicity and alerted soldiers to the targets' positions.

For some historical perspective, compare this to Harold Cohen's AARON generative rule-based system:



Pamela McCorduck's book about Cohen's work, Aaron's Code, is also a good read.

Good work! Training on vector pictures instead of rasterised images seems such a good way to go. With some related data, I imagine this can also be colored.

Honestly, it's cute but this is somewhat like what you'd expect to come out of a student project at a university, in which case -- the student is getting valuable research experience, student and advisors are advancing their careers, and any IP produced becomes owned by student and advisor.

In this case, Google has spent shareholder resources on a project that really, could be done at any university, on a product that does not put the user first, and Google owns the IP. In fact, wake up -- you the public should view this product as a mechanism for Google to simply collect more data from people. The more people use this, the better Google's algorithms get at drawing. That's all there is to it. Thankfully, this product is not even fun.

There is a dangerous creed currently executed by Google leadership. Consider the Verily Study Watch: https://blog.verily.com/2017/04/introducing-verily-study-wat... . The watch shows to the wearer only one thing: the time. However, it collects all kinds of data, ostensibly for medical research (at least to start). Forget about putting the user first, the Verily blog post literally talks about "user compliance".

Of any project that comes out of Google, you should ask: Does this project even put the user first? Does this project even put Google's shareholders first? And if you are a current Google shareholder (as many of their current and former employees are, if they haven't sold), you should agitate that Google start accepting and focusing on becoming a value company, if all the further growth opportunities they are able to execute is just further user exploitation.

The author created an animation using vector drawings generated frame-by-frame with this method, by slowly adjusting the latent variable:


Everyone I know who tried it drew a somewhat small repertoire of naughty sketches. Yet the service militantly refuses to recognize even the most basic of naughty sketches. :)

I heard about that, and I want an answer to it. Why is one subject deemed off-limits?

Because if it were able to recognize dick-pics, or whatever other manifestation of that "one subject" you might have in mind, then it would sometimes recognize them wrongly and respond to a picture of a toothbrush as if it were a picture of a penis. If the person who drew the toothbrush were of a sensitive disposition, they might get Very Upset at this and make a big fuss. Public attitudes to that One Subject being what they are, this might well blow up into a big public highly-visible fuss, at which point Google might start losing advertisers keen to sell things to people of a sensitive disposition.

Google would prefer not to risk losing a shedload of money just so that their sketch-processing neural network can amuse people by correctly recognizing penises.

It's just this one subject because few others get people so upset.

(And contemplating others that might suggests that actually it's not strictly just this one subject. False positives for "decapitated corpse" or "big pile of excrement", say, might be just as problematic. Want to guess whether the system is good at recognizing decapitated corpses and piles of excrement?)

Oh I understand the PR reasons. But given the importance of sex as a motivating factor in art (not least if you want to count the entire activity as a variety of reproductive signaling) I am very troubled by the notion of making a tool which restricts the scope of acceptable subject matter.

I know this is just an experiment right now but I want to put sex/nudity on the table as a subject of debate because it is central to artistic endeavor because arbitrary standards can become almost universal and institutionalized through path dependence (such as the QWERTY keyboard you are probably using right now). Imagine a not-too-distant future with a Magic Brush that easily allows you to paint the colors and shapes of your choice with the aid of some technological wizardry, but prevents the creation of nude or sexualized figures. That would not be a healthy development.

Judging by the downvotes I got for that comment people are so sensitive they don't even like it when you point it out.

People, grow up.

Why is this ranked above topics with many times its rating (17 points vs. other topics with 40-50 points)?

Since day one HN has used an algorithm that takes the article age into account when determining what to show on the homepage.

I don't know but those pictures are going to give me nightmares. It's like the uncanny valley horror show.

Velocity of point accumulation seems to be significant

We already have plenty of machines that draw; they're called printers. That aside, I don't see the novelty in a machine drawing, other than a nice display of programming.

Classification is the central paradigm in machine learning: it maps complex input signals into a limited number of classes. But now we have algorithms that do the opposite as well; we can generate images, video, text and drawings from latent representations.

Having both directions, encoding and decoding, is useful for interfacing with models and visualizing their internal states. It also leads to unsupervised learning of representations. Instead of generating simple labels, now we can generate very complex data.
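A toy illustration of this encode/decode pairing, using PCA as a stand-in for a learned encoder/decoder (my simplification, not the article's model): data living near a low-dimensional manifold is mapped to a compact latent code and back.

```python
import numpy as np

# Synthetic data that actually lies on a 2-D subspace of a 10-D space.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))

# Fit the two principal directions spanning the data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
basis = Vt[:2]

encode = lambda x: (x - mean) @ basis.T  # complex signal -> compact code
decode = lambda z: z @ basis + mean      # compact code -> complex signal

Z = encode(X)        # the "classification-like" direction: compress
X_rec = decode(Z)    # the generative direction: reconstruct
```

A learned model like the one in the article does the same thing nonlinearly, with sketches on one side and a latent concept vector on the other.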
