
How we built the Waifu Vending Machine - gwern
https://waifulabs.com/blog/ax
======
weeb_throwaway
I am trying out the app at [https://waifulabs.com/](https://waifulabs.com/)
and the art style is kind of one-note. Most of the expressions are the same
and the face shape skews towards loli.

I am more into "disgusted anime girl that looks at you like you're trash" type
and I couldn't find a waifu (even with their refinement steps).

Really impressed that this is even possible though!

~~~
b_tterc_p
I've played with it a few times. It seems like they have you choose features
in the space in the order of base -> palette -> art style (loosely) -> pose,
but have locked some emotion-controlling vector to be happy. Probably a
reasonable step for their audience.

~~~
meruru
I think the large majority of anime art has happy expressions, so it might not
have been anything that they had to do.

Edit: "disgusted anime girl that looks at you like you're trash" is the theme
of a recent book that got adapted into anime, so it's a bit of a fad
currently. I thought that was worth mentioning.

------
9nGQluzmnq3M
The tech is interesting, but the concept and naming are pretty creepy. _Waifu_,
from "wife", is anime slang for female cartoon characters that people are
romantically attracted to.

[https://www.dictionary.com/e/fictional-characters/waifu/](https://www.dictionary.com/e/fictional-characters/waifu/)

~~~
flor1s
I guess it's creepy in a similar way to people shooting other people virtually
(in shooter video games), though making a waifu "vending machine" adds another
angle of objectification to it.

~~~
benatkin
I don't think the point is to objectify. I think it's the opposite: it's
turning objects into something that satisfies yearnings for human company,
sometimes replacing it and sometimes supplementing it. It's like how people
often listen to music because they want to hear human voices. And one big
name, Hatsune Miku, is a virtual pop star with a synthesized voice that is
based on female human voices, even though it isn't based on a single human
voice.

As globalization moves on there are more and more people without romantic
partners and/or close friends nearby and they'll use their imagination to
fulfill desires they're missing.

~~~
hatsunearu
Miku is absolutely based on one and only one person: Fujita Saki, though the
two don't sound like each other.

Also, just because you find it creepy doesn't mean everyone else in the world
finds it so. Besides, the usage in that booth was very tongue in cheek.

~~~
meruru
The creepy thing is funny, because nerds fawning over stylized drawings is
just about the most innocuous thing there is.

~~~
9nGQluzmnq3M
Until things like this happen:

[https://en.wikipedia.org/wiki/Tsutomu_Miyazaki](https://en.wikipedia.org/wiki/Tsutomu_Miyazaki)

~~~
mrpara
Ah, yes, surely this man killed and raped little girls because he was
influenced by cartoons, and not because he was born with a deformity into one
of the most collectivist societies on earth and then ostracized for his entire
life. We're so lucky that sexually normative people never rape or kill anyone.

Sarcasm aside, this is one of the many, many examples of choosing a scapegoat
to frame an entire sexuality, race, or any group of people with a common
interest as evil while completely ignoring any and all context. People are not
animals and possess some degree of responsibility and the ability to tell
reality from fiction. Unless someone presents some hard evidence that stylized
drawings lead to actual attacks against real children (and to my knowledge,
this simply is not true; in fact, it's easy to argue the opposite) we need to
stop with this puritan outrage like we stopped blaming computer games for any
and all violent crime back in the late 90s.

~~~
9nGQluzmnq3M
Yes, it's a textbook example of moral panic and the WP article says so. I was
primarily pointing out that to most people, especially in Japan, the image of
"nerds fawning over stylized drawings" sounds about as innocuous as "Catholic
priests fawning over little choir boys".

------
gwern
Also:
[https://www.reddit.com/r/MachineLearning/comments/ch0qms/p_decomposing_latent_space_to_generate_custom/](https://www.reddit.com/r/MachineLearning/comments/ch0qms/p_decomposing_latent_space_to_generate_custom/)

~~~
hanniabu
Looks like you're the creator, so as some feedback: it'd be nice to see a
simple outline of the steps, with "inactive" styling for upcoming steps, the
step you're currently on in bold (maybe with an arrow in front of it?), and a
checkmark appearing next to each previous step as you go through (with the
step possibly going back to "inactive" styling).

The steps could be something like this: (1) Select character, (2) Select
colors, (3) Select outfit, (4) Select pose

The reason I suggest this is that I was a little confused about what the
options were, what my future options would be, and how many steps were left.
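The stepper being suggested can be sketched as a tiny render function. The step labels and markers below are illustrative, not taken from the actual app:

```python
# Hypothetical step labels, following the (1)-(4) list above.
STEPS = ["Select character", "Select colors", "Select outfit", "Select pose"]

def render_stepper(current: int) -> str:
    """Render each step as completed (checkmark), current (highlighted), or inactive."""
    lines = []
    for i, step in enumerate(STEPS):
        if i < current:
            lines.append(f"[x] {step}")      # completed: checkmark appears
        elif i == current:
            lines.append(f"--> {step} <--")  # current: bold / arrowed
        else:
            lines.append(f"[ ] {step}")      # upcoming: "inactive" styling
    return "\n".join(lines)

print(render_stepper(1))
```

The same logic maps directly onto CSS classes in a web UI; the point is just that every step's state is derivable from a single `current` index.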

~~~
gwern
To clarify, I'm not the creator.

Sizigi Studios uses a dataset I put together
([https://www.gwern.net/Danbooru2018](https://www.gwern.net/Danbooru2018)) as
their primary training corpus, AFAIK, and they were partially inspired by my
application of GANs to anime art (as demonstrated on
[https://www.thiswaifudoesnotexist.net/](https://www.thiswaifudoesnotexist.net/)
and see [https://www.gwern.net/Faces](https://www.gwern.net/Faces) for much
more detail about every aspect of it), but I have never been involved with
them and don't know much about what they've done other than what you can read
in OP. It tickles me pink to see people following up on my anime GANs, though,
especially as a startup! The Great Work goes on.

~~~
kevinfrans
One of the creators here -- gathering the Danbooru data in one place was
definitely a big help! Anime's been a pretty nice space (both personally and
research-wise) especially due to the abundance of data online, and it's great
that we no longer have to manually scrape image hosts (which I've spent many
hours doing in the past @
[https://github.com/kvfrans/deepcolor](https://github.com/kvfrans/deepcolor),
[https://canvasdrawer.autodeskresearch.com/](https://canvasdrawer.autodeskresearch.com/)
etc).

------
userbinator
Random semi-useful idea: use an SSH public key as input, giving a very
memorable image for verification.

I could also see something like this having applications in
[https://en.wikipedia.org/wiki/Identicon](https://en.wikipedia.org/wiki/Identicon)
generation.
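A minimal sketch of that idea: hash the public key into a seed, then draw a deterministic latent vector to feed a face generator. The function name, dimensions, and the placeholder key string are all illustrative assumptions:

```python
import hashlib

import numpy as np

def key_to_latent(pubkey: bytes, dim: int = 512) -> np.ndarray:
    """Map a public key to a deterministic GAN latent vector.

    The key is hashed to a seed; the seed drives an RNG that draws the
    latent. The same key therefore always yields the same vector, so a
    face generator fed this z would render a stable, memorable avatar.
    """
    seed = int.from_bytes(hashlib.sha256(pubkey).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

# Placeholder key text for illustration; a real key would be read from
# something like ~/.ssh/id_ed25519.pub.
z = key_to_latent(b"ssh-ed25519 AAAA... example")
```

Because the mapping is deterministic, two parties can independently render the image and compare them visually, just as with randomart.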

~~~
fragmede
that's exactly the idea behind randomart (in ASCII, that is), e.g.

        The key fingerprint is:
        SHA256:s6N0OwlTDKjDez98kZRwUGZbTYaQUArv+EYC6sigFwA ben@eshwil
        The key's randomart image is:
        +---[RSA 2048]----+
        |E   ..o=*o.+o    |
        |.   .oo+oo...    |
        |....  o=..       |
        | o+. o  =        |
        |o .oo ooS.       |
        |* ...+o oo       |
        |oo.. o+o+o       |
        | .   o+o+o       |
        |      .o..       |
        +----[SHA256]-----+
    

(from [https://blog.benjojo.co.uk/post/ssh-randomart-how-does-it-work-art](https://blog.benjojo.co.uk/post/ssh-randomart-how-does-it-work-art) )
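For context, the board above is drawn by OpenSSH's "drunken bishop" walk: each pair of fingerprint bits moves a marker one diagonal step on a 17x9 grid, and how often a cell gets visited selects its glyph. A rough Python sketch of the walk, close to but not guaranteed byte-for-byte identical to OpenSSH's implementation:

```python
def randomart(fp: bytes, w: int = 17, h: int = 9) -> str:
    """Rough sketch of OpenSSH's "drunken bishop" fingerprint walk."""
    glyphs = " .o+=*BOX@%&#/^"        # glyph index = cell visit count
    field = [[0] * w for _ in range(h)]
    x, y = w // 2, h // 2             # walk starts at the board centre
    for byte in fp:
        for _ in range(4):            # two bits steer one diagonal step
            x += 1 if byte & 1 else -1
            y += 1 if byte & 2 else -1
            x = max(0, min(w - 1, x)) # clamp the bishop to the board
            y = max(0, min(h - 1, y))
            if field[y][x] < len(glyphs) - 1:
                field[y][x] += 1
            byte >>= 2
    field[h // 2][w // 2] = -2        # mark the start cell
    field[y][x] = -1                  # mark where the walk ended

    def cell(v: int) -> str:
        return "S" if v == -2 else "E" if v == -1 else glyphs[v]

    border = "+" + "-" * w + "+"
    rows = ["|" + "".join(cell(v) for v in row) + "|" for row in field]
    return "\n".join([border] + rows + [border])

print(randomart(bytes(range(8))))
```

Fed a real 32-byte SHA-256 fingerprint, this produces a board in the same style as the one quoted, with S marking the start of the walk and E the end.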

~~~
meruru
Why don't we combine those ideas and use ascii waifus?

(yes, that's a thing)

------
swsieber
I'm not sure how you're shipping your posters, but here's a tip:

When shipping posters, use triangle tubes, not circular tubes, it saves you
money.

[0] [https://www.linkedin.com/feed/update/urn:li:activity:6549769659112054784/](https://www.linkedin.com/feed/update/urn:li:activity:6549769659112054784/)

~~~
p1mrx
I suggest using the money saved to include some chocolate, so people aren't
disappointed.

------
mdorazio
I'm actually really surprised no one did this sooner. I also wish they had
posted revenue figures for the two days.

To other people doing this in the future: bring (or order) a fat battery pack
with an AC outlet for $100 so you don't have to keep swapping laptops and can
use a mobile hotspot all day.

~~~
chendragon
It looks like they did have these. In one of their pictures there was a stack
of Anker PowerHouse power banks/battery packs.

~~~
kevinfrans
Yeah the two batteries we lugged down to AX ended up saving the day a couple
of times!

------
b_tterc_p
I would love to replicate this. It looks like the dataset is open source.
[https://www.gwern.net/Danbooru2018](https://www.gwern.net/Danbooru2018)

I don't have a sense for hardware requirements though. Does anyone have a good
idea of how much time and money it would take to train such a model?

~~~
gwern
I'm glad you asked:
[https://www.gwern.net/Faces#compute](https://www.gwern.net/Faces#compute)
Figure a few GPU-weeks and a few hundred dollars if you want to go from
scratch.

~~~
weeb_throwaway
I saw the section about not training from scratch (via transfer learning) in
[https://www.gwern.net/Faces#transfer-learning](https://www.gwern.net/Faces#transfer-learning).
The Holo example is really impressive!

How expensive is it in terms of labelled data and compute? Do you know if
anyone tried this for just ahegao faces?

~~~
gwern
> How expensive is it in terms of labelled data and compute?

All the stuff you see on that page (except the BigGAN ones) is unconditional,
no labels. You just dump the images in and it figures it out. StyleGAN does
support labels via a one-hot embedding as I understand it, but I don't know
how to use it, so none of my experiments use it. A few people have mentioned
or used it, but there's no good documentation on how to make it work, so...
For unconditional samples, it depends on how many you have and how different
they are. You can see in the various examples transfer learning with a few
hundred to a few thousand images (with and without data augmentation).
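The one-hot labelling mentioned above is, in the generic conditional-GAN formulation, just appending a class vector to the latent input. A minimal numpy illustration (this is not StyleGAN's actual wiring, which handles label embeddings internally; treat the names and sizes as assumptions):

```python
import numpy as np

def conditional_input(z: np.ndarray, label: int, n_classes: int) -> np.ndarray:
    """Concatenate a latent vector with a one-hot class embedding.

    A generator trained on inputs like this learns to associate each class
    slot with a style of output, so the class can be chosen at sampling time
    instead of training one model per style.
    """
    one_hot = np.zeros(n_classes, dtype=z.dtype)
    one_hot[label] = 1.0
    return np.concatenate([z, one_hot])

# E.g. a 512-dim latent conditioned on one of 10 hypothetical expression classes.
z = np.random.default_rng(0).standard_normal(512)
x = conditional_input(z, label=3, n_classes=10)
```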

> Do you know if anyone tried this for just ahegao faces?

It's funny you ask that, because I was corresponding with an anonymous person
who was using it for just that (and ball gags). He'd run into some issues with
the encoder/editing functionality and wanted advice, but the regular transfer
learning worked fine. He'd compiled a small dataset of a few hundred to a few
thousand examples on his own, and it worked disturbingly well: sufficiently so
that I didn't want to write it up. (I try to keep my site SFW.)

------
kevinfrans
One of the creators here -- the team at Sizigi and I are glad to answer any
questions!

~~~
ve55
Great work! Do you have any thoughts to share on the future of this area?
Anything specific with this project, future projects you might work on, or
just the idea of profiting from AI-generated art to begin with?

~~~
liuru
Nothing specific so far. We did this because we thought it would be very cool
:)

------
meruru
All that drawing and character design practice for nothing I guess.

~~~
jchw
These things are pretty damn impressive, but I would guess like self driving
cars, we’re pretty far away from them displacing humans at the same task. It
does seem like technology in this vein could be used to help the creative
process, though on that note it’s only as good as its data set, which is of
course something a human has to handle for now.

Even if robots replace human illustration in short order, it will probably
never stop being a fun hobby, and I imagine neural networks were bound to be
at least involved in the process at some point. People still draw on paper
even though it’s hard to argue against the benefits of modern digital drawing.

~~~
meruru
>I would guess like self driving cars, we’re pretty far away from them
displacing humans at the same task

That does NOT make me happy. Self-driving cars are just around the corner!

------
sb057
PSA: I get a horrific 5 GB+ memory leak immediately upon opening the tool's
page.

------
FrozenVoid
Neat, but it lacks variety (same pose and template throughout) and has too few
selection steps, and the resolution of the final image is too low. I think the
same thing could be done with human images, if you made it 7-10 selection
steps (to pinpoint finer features).

------
Causality1
Reminds me of the Deep Learning work from StyleGAN.

[https://twitter.com/_Ryobot/status/1096565388165300225](https://twitter.com/_Ryobot/status/1096565388165300225)

~~~
gwern
One of the inspirations:
[https://news.ycombinator.com/item?id=20511824](https://news.ycombinator.com/item?id=20511824)

------
bdon
Neato. What are the copyright implications of commercializing this though?

~~~
gwern
Should be safe, modulo issues about software patents:
[https://www.gwern.net/Faces#faq](https://www.gwern.net/Faces#faq)

------
mc32
Lobster _is_ the neue Comic Sans... although maybe it's just carrying the
tongue-in-cheek cheese over further.

------
tofof
The dataset this is built from
([https://www.gwern.net/Danbooru2018](https://www.gwern.net/Danbooru2018)) is,
simply put, copyright-infringing on a gross scale. The vast majority of the
images uploaded to 'boorus' completely lack a compatible license or the
artist's express consent. The redistribution via torrent of 2.5 TB of some 3
million images only compounds this problem. None of this is ameliorated by the
$20 'generosity' of the dataset creator.

As a result, every single artist whose work was included in that dataset has a
clear, meaningful claim that each and every 'waifu' sold ($20, if customized,
or $5 if random) by Sizigi Studios is an infringing derivative work. Coupled
with at least one of the project authors' ready admissions -- in this very
comment section -- of scraping image sites himself, I would say that this team
is playing with fire. Even in the case that an algorithm's output is somehow
found to be 'creative' rather than mechanistic, AND this specific application
is found to be in all cases substantially transformative, there's STILL the
original massive 2.5 TB of copyright infringement up front to deal with.

All an enterprising lawyer would need to begin is to search the BigQuery
metadata for the 'artist' and 'copyright' tags on these images. Note, of
course, that the 'copyright' tag is widely misused on boorus and similar image
repositories to refer to the inspiring franchise; 'trademark' would be a much
more accurate descriptor.

EDIT: I do not mean to suggest that litigation from the use of the dataset in
this ML (as opposed to the original, clearly infringing, download &
redistribution) would in any way be an easy, one-sided case --- only that this
scenario would represent nearly the worst possible test case imaginable for
determining the future legality of ML, short of directly antagonizing the RIAA
or MPAA.

~~~
lawrenceyan
The model they use is simply a large matrix of numbers. What copyright
infringement is there?

~~~
throwaway99111
Can one make a one-to-one correspondence of such numbers to a copyrighted work
of art?

~~~
lawrenceyan
The way these types of networks function results in the generation of wholly
unique, never before seen images. I guarantee you will never find a single
copyrighted work of art that is generated by a model like this.

~~~
throwaway99111
It seems, then, that the closest analogy is that the network is a derived work.

~~~
DuskStar
Arguably it's a derived work in the same way that a person developing a taste
for art after looking at thousands of images and then going and painting their
own original creation is.

------
a2tech
..ick

------
thekevan
I don't care about the how, why the hell would you?

~~~
rootsudo
This is par for the course for anime subculture today.

------
garbre
Frankly, I refuse to believe that so many people on HN have this kind of
scruples about, or this lack of exposure to, this part of AmerOtaku culture.
As such, I believe many of the comments are simply second-degree trolling.

Also, the post title is misspelled. It's "building", not "builing".

~~~
weeb_throwaway
I think I refused to believe it too until reddit, a San Francisco company,
started banning people for posting fictional high school anime characters in
bikinis (not even nudes).

~~~
FearNotDaniel
I'm not quite sure what point you're trying to make here. But it sounds like
you're trying to say the sexual objectification of children is okay, as long
as they are fictional children with a tiny amount of body covering. Is that
what you meant, or did you mean something else?

~~~
claudiawerner
What do we mean by "okay"? How do we differentiate virtual murder (for
example) from virtual pedophilia in a sufficiently rigorous way? You can say
that rigor isn't required because Reddit can make whatever choices it likes,
but it's not at all clear that the necessary connection to sexual
objectification _of children_ is made when these images are posted - that
presupposes that the viewer sees the image and real children in the same
light, which the current evidence gathered from Japanese fan communities does
not support (see Galbraith and McLelland's work on this). This is why
researchers in the field are sometimes skeptical about calling this material
"child pornography".

Morally, one can differentiate between virtual murder and virtual pedophilia
and condemn virtual pedophilia while consistently enjoying games and other
media depicting murder - but as Gary Young pointed out in his piece on the
Gamer's Dilemma, it requires us to accept moral relativism.

------
hmahncke
> Walk up to the station, and you'll be greeted with a quick array of girls.
> After each step, the booth narrows your choices -- eventually leading to a
> final screen, where you can "adopt" the girl on the spot.

Your scientists were so preoccupied with whether or not they could, they
didn't stop to think if they should.

~~~
meruru
The world isn't going to end because of war or famine or anything like that,
but because humans will be too infatuated with their artificial partners to
bother reproducing.

~~~
judge2020
Not even an artificial problem:
[https://www.theguardian.com/world/2018/dec/27/japan-shrinking-as-birthrate-falls-to-lowest-level-in-history](https://www.theguardian.com/world/2018/dec/27/japan-shrinking-as-birthrate-falls-to-lowest-level-in-history)

~~~
jamesknelson
I want to point out that the linked article is hyperbolic (as most “Japanese
people aren’t having kids/sex/etc.” articles are), and its title is outright
misleading.

It claims that Japan has the lowest birth rate ever. And while it may have the
lowest _number_ of births, that is to be expected in any country with a birth
rate below replacement - i.e. basically the entire developed world.

In fact, as opposed to the US, UK, Canada, Australia and New Zealand, Japan’s
fertility rate has actually been _increasing_ for some years now. See details
here:
[https://fred.stlouisfed.org/series/SPDYNTFRTINJPN](https://fred.stlouisfed.org/series/SPDYNTFRTINJPN)

I really wish this meme would die, but it obviously drives clicks so people
keep publishing it.

~~~
meruru
Thanks for this.

It seems to have started going up in 2005. That's really interesting and
unexpected. Has anyone proposed an explanation? I know they've been trying to
encourage people to form families for a while. Maybe some of their measures
have worked?

------
karanlyons
A lot of these look like children. Nowhere in the article is it mentioned that
a lot of these look like children. No one in the comments has brought up that
a lot of these look like children. A lot of these look like children.

~~~
chrischen
You must be new to Japanese culture.

~~~
azernik
Note that this isn't actual Japanese culture; it's made by an SF-based game
studio [[https://sizigistudios.com/](https://sizigistudios.com/)] for an expo
in LA [[http://www.anime-expo.org/](http://www.anime-expo.org/)].

~~~
GolDDranks
However, much of the training data is likely to be of Japanese origin.

