Show HN: This Food Does Not Exist (nyx-ai.github.io)
248 points by MasterScrat on July 20, 2022 | 172 comments



OP: Forgive me if this is out of place. Also, please know that my question is genuine, not at all a reflection on the author/their project, and most certainly born out of my own ignorance:

Why are these kinds of things impressive?

I think part of my issue is that I don't really "get" these ML projects ("This X does not exist" or perhaps ML in general).

My understanding is that, in layman's terms, models are shown many, many examples of X and then are asked to "draw"/create X, which they then do. The closest analogy I can think of is if I were to draw over and over for a billion, billion years and each time a drawing "failed" to capture the essence of a prompt (as deemed by some outside entity), both my drawing and my memory of it were erased. At the end of that time, my skill in drawing X would be amazing.

_If_ that understanding is correct, it would seem unimpressive? It's not as though I can pass a prompt of "cookie" to an untrained generator and it pops out a drawing of one. And likewise, any cookie "drawing" generated by a trained model is simply an amalgam of every example cookie.

What am I missing?


For the longest time it was assumed that creativity was an almost magically human trait. The fact that somebody can, with a straight face, say "I don't get why it is impressive, I could draw these images too" is actually indicative of the wild change that has occurred over these last couple years.

I guess it is true that more than a couple demos like this have been shown, so some of the awe might have worn off, but it is still pretty shocking to lots of us that you can describe the general idea of something to a computer and it can figure out and produce "what you mean," fuzzy as that is.


> For the longest time it was assumed that creativity was an almost magically human trait. The fact that somebody can, with a straight face, say "I don't get why it is impressive, I could draw these images too" is actually indicative of the wild change that has occurred over these last couple years.

It's not creativity though. It's a program that arranges pixels in a way that is statistically similar to some training data set. It doesn't "draw" anything, it doesn't "figure out" anything. There is no thought or idea behind it.

The output is mildly interesting but there is no creative act at work, and there's certainly no revolution in the artistic world.

> For the longest time it was assumed that creativity was an almost magically human trait.

Maybe by some people. This is a human-centric perspective, a form of speciesism if I can call it that. Various primates have shown creativity, and various animals have shown the ability to solve problems creatively. Heck, even my cat figured out how to open doors by pulling down on the handle. Humans are likely not more creative than other animals with similar brain size, it's just that there's other factors at play that make it seem like that (such as opposable thumbs and the passing down of knowledge between generations using speech).


> It's not creativity though. It's a program that arranges pixels in a way that is statistically similar to some training data set. It doesn't "draw" anything

that's the exact argument made against chess engines - are they "really" playing chess?

Who's to say the brain doesn't work exactly like this, rather than how people imagine creativity works? That we think we are creative might be an illusion: the brain tricks you into thinking you creatively came up with an original idea, when in reality the idea came from a long list of training data you've been exposed to all your life.

And who's to say that brute force, and statistical methods of producing content is not creative?


> This is a human-centric perspective [...]

So is your perspective :) You essentially take a process we don't understand and can't quantify and say "because I can understand this, this can't be it". You don't know, claiming otherwise is disingenuous.


:) Yes, I haven't defined "creativity" and I haven't quantified anything. My assertion was empirical, based on the observation that what society usually considers creative comes into being due to a logical train of thought, such as "if X, then what if Y", or an impulse to do something that hasn't been done before. Imitation on its own would not be considered particularly creative.

Yes, you are right, I could be wrong, perhaps this form of statistical imitation is at the core of greater creativity, and not another hype wave in software development. Time will tell.


> It's not creativity though. It's a program that arranges pixels in a way that is statistically similar to some training data set. It doesn't "draw" anything, it doesn't "figure out" anything. There is no thought or idea behind it.

I think this is the crux of my thought on the subject. Thank you.

I absolutely see the kinds of uses it can have (guiding the repair of artwork that was damaged, seeing patterns in huge datasets, etc.), but I think the way it's marketed is that it's somehow coming up with new art, but it's really just "recreating" (on a very micro level) everything in the training set.

You could absolutely argue that creatives are just regurgitating things they've seen before, but I think the big separation between ML and human creativity is that it crosses "genres", if you will. For example, I could be influenced by my experience in a car crash in such a way that it causes me to create X (art, software, music, etc.) in such an abstract way, I'm not sure whether it's even possible to recreate artificially.


I agree with you but I can't get past this:

>> Heck, even my cat figured out how to open doors by pulling down on the handle.

... your cat can reach the doorknob? Maybe you should make sure to lock the door? O.o


It was a long time ago. It wasn't a knob, it was a handle. He would jump on it, his weight would pull down the handle, and then he would use his right leg to push against the frame and open the door. Not very graceful but it got the job done. He was a very smart cat.


Sorry - English is not my first language, so I thought "knob" and "handle" could be used interchangeably.

Anyway that's a smart cat.


> The fact that somebody can, with a straight face, say ...

To be clear, I'm not trying to devalue this at all; In fact, as I noted above, I am certain I'm missing something and that was what my comment was aimed at. In any case, thank you for taking the time to reply (seriously).


Probably the expression "with a straight face" has been used sarcastically too often, so maybe it looks sarcastic in my comment too. In that case I should have picked a more unambiguous phrase. I wasn't using it sarcastically or anything; "with a straight face" = in good faith/honest in this case.


I will say that the images included have not shown themselves to be particularly creative, unless I missed a wider galaxy of non-existent food items. It's not entirely convincing that the generated images aren't just glued-together pieces of other images with some fading between them.


Regardless of what people think of as impressive, or creative, or sentient, or intelligent, there is one thing that cannot be disputed: stock photos have value in the current economy. There are people being paid to produce these stock photos. And it now looks like AI can do it for cheaper. The most important thing to realize, imo, is that the value of human production is getting lower and lower relative to AI.


And think about on-demand ML, where a stock photo on a website might change based on characteristics it knows about you.

Let's say a website knows you love chocolate cake, well the website is now going to show you a chocolate cake instead and has generated a unique image of that for you.


I think the real value is that it can combine items. I can easily find a stock image of a cake. I can probably find a stock image of a circular saw. But a decent stock image of a cake in the shape of a circular saw maybe doesn't exist anywhere. But this could generate several for me to choose from.

And then there's the "in the style of Salvador Dali" aspect.


Even for something simple like a cake, if you want to use that image and perhaps license it exclusively so no one else has it, that would cost you a lot more than an ML generated cake photo.

Even just an image of whatever with a very specific color would be a huge benefit for designers. Because almost all stock photography is trying to appeal to the broadest audience possible, it all has a similar muted palette of colors. Which makes stock photography kind of look like stock photography. ML generated images or even ML manipulated images could change that.


Correct. But I just see simplistic pictures that could have been copied from any stock site. And maybe they are (by the software). They don't demonstrate what you say. It would be really impressive to see exactly that!


"AI" can do this for cheaper because it the model is built based on databases of stolen material. did they get any permission from all the people involved in creating the reference photos?


The process is not important, the idea is that a computer is able to generate new, never seen before, content that a human can understand.

This allows for incredible things, like having a world-class artist create a drawing based on your prompt (i.e. DALL-E) in one second.


It's a computer. That's the impressive bit.


I wonder how close the nearest match from the training data is. Was there a cheesecake that looked almost like these generated images?


Maybe the ML model effectively implements a lossy image database with minor randomization. :)


Since GANs are effectively one class of denoising auto-encoders, your summary is spot-on. This type of ML model learns to effectively compress and decompress natural images by representing them as a hierarchy of convolutional features = shape templates.


I don't think it's accurate at all to characterize GANs as denoising auto-encoders, they're not even superficially similar, unless you're talking about a very specific architecture of autoencoder-based GANs like AAEs.


The part of a GAN that people use to synthesize images has to be a denoising decoder. You put in 1x1px high-dimensional noise as the embedding and it'll gradually upscale and turn that noise into the 3-dimensional output image.

The part of a GAN that people use for control is based on an encoder. You put in your 3-dimensional source image and it gets converted to that 1x1px high-dimensional noise in such a way that putting the noise into the decoder will produce your image again.

So the network architecture/structure of a GAN is the exact same structure as you'd use for a denoising autoencoder. You might train that GAN with a different intent, but it's still a stack of upscalers and convolutions that perfectly matches the architecture inside the decoder part of a denoising auto-encoder.

BTW sometimes people also explicitly call this out. The VQ part in VQGAN stands for VQVAE, the "Vector-quantized Variational AutoEncoder". And VQGAN+CLIP is what the first open source DALL-E clones were based on.
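The decoder-shaped pipeline described above (1x1 noise gradually upscaled and convolved into an image) can be caricatured in a few lines of NumPy. This is a toy sketch with made-up layer sizes and 1x1 convolutions only, not StyleGAN's actual architecture:

  import numpy as np

  def upsample2x(x):
      # Nearest-neighbor upsampling: repeat each pixel along both spatial axes.
      return x.repeat(2, axis=0).repeat(2, axis=1)

  def conv1x1(x, w):
      # A 1x1 "convolution" is just a per-pixel linear map over channels.
      return x @ w

  rng = np.random.default_rng(0)

  # Start from a 1x1 "pixel" of high-dimensional noise (the latent embedding).
  x = rng.normal(size=(1, 1, 64))

  # Decoder-style stack: upsample, mix channels, apply a nonlinearity, repeat.
  channels = [64, 32, 16, 8, 3]
  for c_in, c_out in zip(channels, channels[1:]):
      w = rng.normal(size=(c_in, c_out)) / np.sqrt(c_in)
      x = np.maximum(conv1x1(upsample2x(x), w), 0)  # ReLU

  print(x.shape)  # the 1x1 latent has grown into a (16, 16, 3) "image"

Whether you train this stack adversarially or with a reconstruction loss is a separate question; the structural point is just that the generator is a stack of upscalers and convolutions.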


I thought that was the comparison you were drawing but it's misguided.

The denoising part of a denoising autoencoder refers to the noise applied to its input (the image), not to its latent space, as would be the case in the equivalence you establish between the noise distribution used as the input of a GAN's generator and the latent space of an auto-encoder (the input of the decoder).

Making a comparison between the generator and a decoder and their respective input spaces has some legs, but not many. An AE's latent space is constrained in an entirely different way, with a regularization term that tries to minimize the divergence between the wanted distribution and the learned distribution; that's not a problem you have with a GAN, since you simply draw from your wanted distribution every time. These differences lead to learned distributions in the latent space that are entirely different, with different failure modes wrt generalization and interpolation.

Beyond that the comparison just doesn't work, yes there are two networks but the discriminator doesn't play the role of the AE's encoder at all. In addition the learning signal is not the reconstruction of the input but the classification of the generated image. That's a completely different learning process, the learned distribution is different, the gradient path is different, the training is different (much more complex for a GAN), etc... I could go on.

The VQGAN would fall under the clause in my initial comment about GANs which specifically use an AE in their very specific architecture, like the AAE. In the AAE's case they use a discriminator to regularize the latent space of an AE (instead of classifying the generated image); in the VQGAN they use a full autoencoder as the generator, in addition to a third network for the discriminator. These do not fit the comparison you're making and do not function in general like most GANs do. So the general comparison of GANs to autoencoders remains inaccurate.


> The denoising part of a denoising autoencoder refers to the noise applied to its input

Agree, it converts a noisy image to a denoised image. But the odd thing is, when you put a noisy image into a StyleGAN2 encoder, you get latents which the decoder will turn into a de-noised image. So in practical use, you can take a trained StyleGAN2 encoder/decoder pair and use it as if it was a denoiser. For example https://arxiv.org/abs/2103.04192

> These differences lead to learned distributions in the latent space that are entirely different

I also agree there. The training for a denoising auto-encoder and for a GAN network is different, leading to different distributions which are sampled for generating the images. But the architecture is still very similar, meaning the limits of what can be learned should be the same.

> Beyond that the comparison just doesn't work, yes there are two networks but the discriminator doesn't play the role of the AE's encoder at all

Yes, the discriminator in a GAN won't work like an encoder. But if you look at how StyleGAN 1/2 are used in practice, people combine it with a so-called "projection", which is effectively an encoder to convert images to latents. So people use a pipeline of "image to latent encoder" + "latent to image decoder".

That whole pipeline is very similar to an auto-encoder. For example, here's an NVIDIA paper about how they round-trip from image to latent to image with StyleGAN: https://arxiv.org/abs/1912.04958 My interpretation of what they did in that paper is that they effectively trained a StyleGAN-like model with the image L2 loss typically used for training a denoising auto-encoder.
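That "projection" step (fit a latent so the generator reproduces a target image) boils down to optimizing the latent against an L2 reconstruction loss. Here's a toy sketch where the "generator" is just a fixed linear map rather than a deep network; the real StyleGAN projection runs the same kind of descent through the full model:

  import numpy as np

  rng = np.random.default_rng(1)

  # Toy stand-in for a trained generator: a fixed linear map, latent -> image.
  W = rng.normal(size=(8, 16))  # 8-dim latent, 16-pixel "image"

  def decode(z):
      return W.T @ z

  # An image the generator can actually produce (so projection can succeed).
  target = decode(rng.normal(size=8))

  # "Projection": start from a random latent and descend the L2 reconstruction
  # loss 0.5 * ||decode(z) - target||^2, whose gradient wrt z is W @ residual.
  z = rng.normal(size=8)
  lr = 1.0 / np.linalg.eigvalsh(W @ W.T).max()  # safe step size for this toy
  for _ in range(5000):
      z -= lr * (W @ (decode(z) - target))

  loss = np.sum((decode(z) - target) ** 2)
  print(f"reconstruction loss: {loss:.2e}")

Run image -> latent -> image like this and the pipeline behaves like an auto-encoder, even though the generator was never trained as one.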


100%. It would be interesting to take an image as input and get the closest prompt that generates that image. Later we could run the prompt to get the image back.

That could effectively be a lossy compression of the image, exchanging space (the image) with time (cpu).
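A degenerate illustration of that space-for-time trade: if the generator is deterministic, storing just the seed (or prompt) is enough to reproduce the whole image later. The generate function below is a made-up stand-in for a real model:

  import numpy as np

  def generate(seed, shape=(64, 64, 3)):
      # Deterministic "generator": the same seed always yields the same image.
      rng = np.random.default_rng(seed)
      return rng.integers(0, 256, size=shape, dtype=np.uint8)

  # "Compress": keep only the few bytes of the seed...
  seed = 12345
  image = generate(seed)

  # ...and "decompress" later by re-running the generator.
  assert np.array_equal(image, generate(seed))
  print(image.nbytes, "bytes of image recovered from an 8-byte seed")

The hard part the comment points at, of course, is the inverse direction: finding the prompt/seed that best reproduces an arbitrary input image.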


This feels like when I made a drawing in elementary school, and someone asks if I traced it. It just feels like looking for a way to downplay what was made by making an appeal to "creativity" or "originality".

But the tide never goes out on AI and computing. The capabilities will only grow more and more impressive and unassailable.

When the chatbot is completely convincing, is someone going to ask "I wonder how close the responses are to the training text" even though no one even blinks when fathers and sons act alike. No one demands children invent new languages to prove they aren't just "randomizing samplers"


> No one demands children invent new languages to prove they aren't just "randomizing samplers"

I sure as hell do. No son of mine will be comprehensible to other humans until he's at least two years old.


Is this just a search engine to find relevant content and remix it a bit, or can you actually create new content? These two things don't solve the same problem, and you may run into lots of copyright problems.


You know, if you've ever done anything creative, the brain's ability to take inspiration from something and remix it (without realizing it's not original and actually just a memory) is shocking. So are our brains really just search engines to find relevant content and remix it a bit? What really is creation?


DJs and music producers are exceptionally creative, but if they use existing work, they have to license it.


exactly.. where is the chocolate chip cheesecake?


Looks impressive, but I can't escape the notion that surely some of the generated images will be very close to some of the training images?

How am I to assess how original the generated results really are?


Yeah, StyleGAN is rarely this good on data as diverse as this


Image search, I guess. No results, it's original enough.


At least with DALL-E you can be sure the food has a name. For a moment I was worried this would produce vaguely food-like images where on closer look you realise you have no idea what you're looking at - like a lot of other "this X does not exist" projects seem to do.

Also a bit of cultural bias in the training is shown I think. The "pile of cookies" prompt seems to mostly generate American cookies, while e.g. a German user might be disappointed they didn't get this: https://groceryeshop.us/image/cache/data/new_image_2019/ABSB... :)



Nice, very alien, somewhat Giger, but not very sushi.


I don't think a German user writing "pile of cookies", in English, would be disappointed with "English" results. Is that any different than what you get on, say, Google?

Try prompting craiyon for "Ein Stapel Kekse"* :)

* Google-translated


I thought DALL-E uses a sentence-piece encoder for the text that goes into CLIP, which would suggest that you can recombine the syllables from existing words and it'll "understand" that.

So both "banana chocolate cookies" and "banacoochoconakieslade" should work.
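A rough sketch of how that kind of subword segmentation behaves: greedy longest-match over a made-up piece vocabulary. (Real BPE/SentencePiece vocabularies are learned from data, and DALL-E's actual tokenizer details may differ; this just shows why a nonsense word decomposes into familiar pieces.)

  def subword_tokenize(text, vocab):
      # Greedy longest-match segmentation, in the spirit of BPE/SentencePiece.
      tokens = []
      i = 0
      while i < len(text):
          for j in range(len(text), i, -1):  # try the longest piece first
              if text[i:j] in vocab:
                  tokens.append(text[i:j])
                  i = j
                  break
          else:
              tokens.append(text[i])  # unknown character: emit it as-is
              i += 1
      return tokens

  # Hypothetical pieces learned from words like "banana", "chocolate", "cookies".
  vocab = {"bana", "na", "coo", "cho", "co", "kies", "slade", "lade"}
  print(subword_tokenize("banacoochoconakieslade", vocab))
  # -> ['bana', 'coo', 'cho', 'co', 'na', 'kies', 'lade']

Since every piece was seen in training, the model has some embedding for each of them, which is the sense in which the mashed-up word can still "work".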


Is there a way to trigger a fresh image on demand? That's kind of what I expect when I see a does-not-exist site.



I was really hoping that this would be never-before-seen, AI-generated recipes or something similar ):


That would be the OpenAI Recipe creator (eat at your own risk) https://beta.openai.com/examples/default-recipe-generator


We have trained four StyleGAN2 image generation models and are releasing checkpoints and training code. We are exploring how to improve/scale up StyleGAN training, particularly when leveraging TPUs.

While everyone is excited about DALL·E/diffusion models, training those is currently out of reach for most practitioners. Craiyon (formerly DALL·E mega) has been training for months on a huge TPU 256 machine. In comparison our models were each trained in less than 10h on a machine 32x smaller. StyleGAN models also still offer unrivaled photorealism when trained on narrow domains (eg thispersondoesnotexist.com), even though diffusion models are catching up due to massive cash investments in that direction.


Nice work - can you share how large your training datasets are in terms of number of images? And did you train your models from scratch or fine-tune them from an existing model?


I don't suppose you have a way of converting these models into a pytorch usable version, do you?


Are you using the same model for cookies and cheesecakes? Do you get sometimes a cookiecake?


We currently train each model independently, ie we first gather a cookie dataset, train a cookie model then restart from scratch for the next one.

That's actually something we're investigating: can we train a single class-conditional model for multiple types of food? Or, can we finetune cheesecakes from cookies?


>> ie we first gather a cookie dataset,

Is there a chance your dataset provider makes a claim that they have derived data rights over your model generated images? Would you have sufficient confidence, say, to sell your images on a stock image site?


It is still somewhat unclear, but it seems that images generated by a machine learning model are not copyrightable (to quote the US Copyright Office, generated images "lack the human authorship necessary to support a copyright claim"). Whether the model itself is copyrightable is less clear to me, but [0] seems to suggest that it is. All of this depends on the country, but much of the world tends to eventually mimic US copyright law.

[0] https://law.stackexchange.com/questions/19981/who-can-claim-...


Well, now I want a cookiecake.


The dream of the 90s is alive in StyleGAN!


I think the 90s version would be icecreamcookiecake


I think both. I strongly remember those giant pizza sized cookies at the mall in the early 90s.


Darn! I was hoping for other-worldly foods that don't actually exist being generated from real food attributes. I suppose I should have known better.


The Cake is a Lie meme was never so relevant.


And the Science gets done and you make a neat gun.


I like the thought that, years from now, we're all eating weirdly-presented food and drinking weird cocktails because AI synthesized the images of drinks around the web and decided `cocktails always include fruit` and `all food must be piled high on plate`


Coming soon to the restaurant site generator of some large delivery service.

("Picture is only for illustration purposes")


You are killing Instagram influencers


You mean supplying. Imagine running a food IG that didn’t even need to make the food.


Hold my “beer”.


Now we just need to connect this to ffmpeg, add some fake recipe scripts, upload a video to YT, multiply by 100 videos and 100 channels, make about $2.00. Nice.


Imagine being one of millions, your food IG will look fake even if the photos are real



One tweet there made me happy: "This is going to break Pinterest"


Pinterest is great and useful and rad. Whoever's pushing them to chase KPIs and ruin search is not.


Seriously, won't this combined with GPT-3 flood the influencer market?


Yes. Images will lose all authenticity.


I think they have, some time ago. It seems like motion video is now on the chopping block.


It took effort to doctor and make fake images. This just makes it seamless.


I love cheesecake with strawraspcherries on top.


Are there any analysis techniques that can easily distinguish between these and real photographs? Do simple things like edge detections or histograms reveal any anomalies?


Neural networks can be trained to identify the difference, but I don't know how specific that is to the generating model. In fact, the GAN technique, at a high level, is two networks -- one trying to distinguish the difference and one trying to create images that cannot be distinguished. That is the "adversarial" aspect.

It is an interesting question that there may be some simple pre-processing techniques (edge detection, Fourier transform, etc) that more easily distinguish the image as a fake. Something like a shortcut from training a network to make the distinction.
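As an illustration of that idea (emphatically not a real detector), here's one crude frequency-domain statistic: the fraction of spectral energy outside a low-frequency disc. GAN upsampling artifacts have been reported to show up in exactly this kind of view; the two "images" below are synthetic stand-ins, a low-passed field versus white noise:

  import numpy as np

  def high_freq_ratio(img):
      # Fraction of spectral energy outside a low-frequency disc.
      f = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
      h, w = img.shape
      yy, xx = np.ogrid[:h, :w]
      dist = np.hypot(yy - h / 2, xx - w / 2)
      low = f[dist < min(h, w) / 8].sum()
      return 1.0 - low / f.sum()

  rng = np.random.default_rng(0)
  smooth = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), axis=0), axis=1)
  noisy = rng.normal(size=(64, 64))  # white-noise stand-in for artifacts

  # The smooth image concentrates its energy at low frequencies; the noisy
  # one spreads it evenly, so the ratio cleanly separates the two.
  print(high_freq_ratio(smooth), high_freq_ratio(noisy))

A statistic like this is exactly the kind of cheap pre-processing shortcut the comment imagines, though a determined generator could presumably be trained to match it.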


When will we see this as a contestant on Is It Cake?


The food looks great! I suppose these models could use some extra training with dishes, though. The plates and glasses look wobbly, which is an instant giveaway. Otherwise, I can see this being used by food posters! Maybe not as a primary source, but as a "filler" — for sure.


You can try out the model with this interactive Gradio demo: https://huggingface.co/spaces/nyx-ai/stylegan2-flax-tpu


I tried to use the linked Colab notebook to generate my own, and it appears to have been successful, but I don't see any way to view the generated images via the notebook interface. I'm not familiar with the notebook tool - have I missed something?


If the result is a standard NumPy 3D array, then matplotlib (or Pillow) should be able to display the images.

Something like

  from matplotlib import pyplot as plt
  plt.imshow(matrix)
  plt.show()


I ran it locally and it generated images as PNGs in the "generated_images" directory (named 0.png, 42.png etc. after the seeds provided to the script). If it works and does the same in the notebook, you should be able to click the folder icon in the menu on the left to open the file browser, expand the "generated_images" directory, then click the ellipses next to each file to select "Download".
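If you'd rather grab the paths programmatically in the notebook (assuming the same "generated_images" output directory and seed-named files), something like this collects them in seed order:

  from pathlib import Path

  # Collect the PNGs the script wrote, sorted by their numeric seed name
  # (0.png, 42.png, ...), ready to download or display in order.
  out_dir = Path("generated_images")
  images = sorted(out_dir.glob("*.png"), key=lambda p: int(p.stem))
  for p in images:
      print(p)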


I'm honestly surprised that they trained a StyleGAN. Recently, the Imagen architecture has been shown to be simpler in structure, easier to train, and even faster at producing good results. Combined with the "Elucidating" paper by NVIDIA's Tero Karras, you can train a 256px Imagen* to tolerable quality within an hour on an RTX 3090.

Here's a PyTorch implementation by the LAION people:

https://github.com/lucidrains/imagen-pytorch

And here are 2 images I sampled after training it for some hours, like 2 hours base model + 4 hours upscaler:

https://imgur.com/a/46EZsJo

* = Only the unconditional Imagen variant, meaning what they show off here. The variant with a T5 text embedding takes longer to train.


Or, since they are comparing to Craiyon, why not just finetune Craiyon itself? Craiyon already exists, just take it off the shelf, you don't need to retrain it from scratch, so the cost to train it from scratch on everything (which is indeed quite large) is not relevant to someone who just wants to generate great food photos.


We haven't experimented much with Imagen, but our initial conclusions were that:

- It's hard to train to a photorealistic quality (we'd be happy to be proven wrong!)

- There is no strong pretrained model available yet

Checking the LAION Discord, the situation doesn't seem to have evolved considerably.


This is the most disturbing “does not exist” yet. A food blog could write itself


They already pretty much are. Top recipe hits on Google seem to always be from "Southern Mama Cooking Tips" or something generic like that, and you have to scroll past 8 paragraphs of context for why this person is writing the recipe and why they like it so much (totally not to hit all the SEO sweet spots), plus the full life story of this "Southern Mama" that's totally not a guy in India or a robot scraping together blurbs of text from other websites.


It's not entirely SEO, that's part of it, but it's also copyright.

You can't copyright a recipe. It's information that can be freely shared. Anybody can steal it and set up their own site. You can copyright a recipe that comes with a life story. Copyright laws are weird and confusing, but it's really difficult to come up with a better solution.


Somewhere in that data set is found an Eigencookie. I want the recipe.


Aggregate "does not exist" website for anyone who's interested.

https://thisxdoesnotexist.com/


My partner is very impressionable when she sees food in a TV show. Immediately has a craving for it. This thing is like, limitless porn for her gluttony.


And here I was, hoping for new, never seen before dishes.


How big was your training dataset?


No hot dogs?

Nice work.


Everything looks delish!


and what was the licensing for the training data that you used?


This computer has pretty poor taste in cocktails.


should call that DeepCakes


We're looking at the complete collapse of the stock photography market.


Thinking bigger: I'm pretty sure the combo of a relatively free global Internet, liberal democracy on large (much bigger than city-state) scales, and cheap, customized, on-demand generation of totally fake text + photo + video propaganda based on simple prompts, cannot all co-exist. At least one of these isn't going to survive alongside the others. If we just let things keep going the way they are, I expect "liberal democracy on large scales" is the one we'll lose—and whatever follows probably won't let the fairly-free, global Internet keep existing, either, so we'll lose that too.


You can put letters in any order you want and make them say any damn lie.

This was not an impediment to liberal democracy.

I am as concerned as the next guy, but throwing in the towel already seems a bit premature?


> You can put letters in any order you want and make them say any damn lie.

You can run a web server by responding to every request by hand-typing the response, too. But you couldn't realistically run one-millionth of the modern Web that way. You can't have global-scale e-commerce that way, etc. Some things that technically could work that way, can't actually -- it's too slow, too expensive. This is very much one of those "quantity has a quality all its own" things. Increase the productivity of every astroturf-poster or propaganda-front-news-site manager a few hundred times and that's a big difference.

> I am as concerned as the next guy but throwing the towel already seems a bit premature?

Where'd you get throwing in the towel? I do think we're (especially the US) really unlikely to do what we need to in time, in part because measures that are probably necessary to defend against this are themselves risky and rather unappealing. But we might.


Humans are smarter than you give them credit for. They will adapt with increasing skepticism. I don't see why this is a threat to democracy.


That assumes that democracy can survive excessive skepticism.

If you don't trust anything, you ... don't trust anything.

How can institutions survive? We're already living through a moment where large swaths of the population cannot get a fix on the same reality as their neighbors.

I would argue that this is happening precisely because the gullibles are being led into playing skeptics.


> cannot get a fix on the same reality as their neighbors

I believe this phenomenon was way more severe in the past. The reason you notice it now is because these meme cones are rubbing up against each other in public forums. In the past, isolated communities wallowed in their own wacky shit totally unchallenged. Pick any major religious movement of the 19th century and tell me they had a firm grasp on reality?


As a counterpoint, most cults of modern day have found digital guidance and strength where before they would not have.

Take this site, even. Ok, it's less cult-y as more people learn of it, but in the past there wouldn't have been anything close to the sort of community you can run into here. Maybe I've run into one or two people in real life as "in the know" as what you typically find here...

> The reason you notice it now is because these meme cones are rubbing up against each other in public forums

Yeah, it is true the amplification effect makes crazy stuff like the church of Scientology public knowledge, but there's also just that there's too much noise to keep track of all the special little brands of cult-ish "crazy". It's really hard to gauge the effect here, unless you have a study perhaps you could link? That would be quite on topic here.


I have heard this before, but I have several reasons to think it is not going to be a problem above what already happens:

1. AI lets you generate an enormous number of lies, but what is really dangerous is one well-placed lie within a trusted stream of many truths. CNN will retain a power to mislead far in excess of Twitter bots.

2. Democracy averages over everyone's confusion, which means lies are only dangerous when large numbers of people believe them at the same time. Hordes of bots generating spurious lies won't move a democracy in any specific direction, but again, mass media will retain its power to mislead everyone at once in the same, effectual, direction.

3. People have never respected the veracity of random tweets. In the same way that trust in mainstream media outlets is reaching record lows due to their consistently biased reporting (they might not all have the same bias, but I can't think of any I'd consider free of bias), everyone will learn to adjust their incredulity to match the true quality of random tweets.

4. Companies like Twitter and Google are known to be shaping their results and algorithms according to their own "political views" (broadly construed) so at worst this would represent a partial shift of power from the old masters to new masters quite like them (social media companies). In many ways trimming the front page to reflect editorial opinion is echoed in the way Twitter trims its feeds to reflect their own editorial opinions.

All taken together, it seems like the media is afraid that equally large companies with similar business models (content, attention, advertising) might end up eclipsing them. The same old model where the TV station is afraid to upset its advertisers, thereby giving a voice to business interests, is well-recorded in the recent history of YouTube. Not so much will change, although seeing it in its old and newer forms might shed light on how it works.


> CNN will retain a power to mislead far in excess of Twitter bots.

The major mistake here is thinking that these two things are mutually exclusive competition. They're not.

In 2015, an internet rumor started spreading that Obama was going to invade Texas. This rumor was based on the fact that there was a routine military training exercise taking place in Bastrop called Jade Helm. It started on Facebook and was quickly picked up by the mass media. These rumors became so viral that the Texas governor activated the Texas State Guard "just in case".[1]

Later, the US government alleged that this was a planned disinformation operation carried out by the Russians and was a precursor to later operations.[0]

Twitter and Facebook started the conspiracy, but mass media laundered it and made it look real enough to get a governor to act.

0: https://www.kut.org/politics/2018-05-03/jade-helm-conspiracy...

1: https://www.texasmonthly.com/news-politics/russians-sowed-di...


What's your reasoning on this? Because I don't see why liberal democracy would cease to exist... life would go on if we all knew that all pictures can be fabricated. I think this is already the case without AI.


Apparently you missed the problem Deep Fakes posed...

If you cannot distinguish reality (well), and in fact it becomes possible that most things you see do not exist, then there is nothing to stop a bad actor from producing a fake version of events in which they are elected, control everything, etc.

So, democracy would cease to exist, because democracy relies ultimately on a choice - if you have no choice then you do not have democracy, only a dictatorship.


Photoshop has existed for years and humans have been manipulating photos for longer, what's the difference, really?

If I see a photo in the Guardian newspaper (or any other reputable news outfit) I'm going to presume it's real, and I expect journalists to verify that for me. If I see a random photo that doesn't look quite right on 4chan, I'm not going to immediately assume it's news.


> Photoshop has existed for years and humans have been manipulating photos for longer, what's the difference, really?

Scale, cost, and reach.


Reach is no different; bots and humans alike can post to social media. Cost is probably no different at the moment either, since AI isn't perfect and some human interaction is probably needed to make it believable, and because of that, scale is the same too. I think we're approaching all of those things, but it's probably still quite some time away until a machine can be trusted to manipulate the public on its own.


> For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.


Right but today we have that problem already. We know that a bad actor journalist can write a fake story. We therefore require sources. If deepfakes come along we will know videos can be fake and so we will be skeptical, as we are today, and look to proper sources. We will easily come up with some way to validate sources via cryptography or org reputation (e.g. we might trust the NY Times to not just fabricate things)


This is already barely holding together with mostly human actors doing the astroturfing and creating bullshit "news organizations" expressly to spread propaganda. Automation is going to overwhelm a system that's already teetering.


Yeah but we will get the word out that none of that is trustworthy, then. There will be countermeasures and reactions to this just like previous things. It will certainly be effective to some degree - propaganda is effective for sure - but it won't just be, oh, there are deepfakes, everyone will now just unthinkingly accept them.


1) This effective backlash/education-campaign has not already happened despite there already being significant problems with this kind of thing, and most of it not being that hard to spot, even, and 2) I think the more likely effect is the destruction of shared trust in any set of news sources—we're already pretty damn close to this being the case, in fact. "It's all lies anyway" is a sentiment that favors dictators more than it does democracy.


> This effective backlash/education-campaign has not already happened despite there already being significant problems with this kind of thing

I think it could be claimed that we're clearly in the process of learning to distrust. People distrust media more now than ever [1], because they've demonstrated, again and again, that they can't be trusted. Trust in institutions is down to new lows [2], because they've also shown that they're untrustworthy and/or incompetent. Everyone is shouting "disinformation", "misinformation", and "fake news" constantly, because nobody trusts anything anymore, and rightfully so in many cases.

1. https://news.gallup.com/poll/355526/americans-trust-media-di...

2. https://news.gallup.com/poll/394283/confidence-institutions-...


People already produce all kinds of fake news and doctored photos and false flags and all kinds of things. This has been going on since we developed language and photography I suspect.

People already have trouble telling propaganda from fact. That has been going on since forever.

At the end of the day I don't see this being a game changer. If anything, now video and photos are less evidence for/against something as the potential falseness becomes well known. Congressman X: "no, that wasn't me you saw leaving the hotel with the prostitute, my slimy opponent obviously is deep faking stuff".

And people will continue to believe what they want to believe, in spite of all evidence to the contrary, just like they do right now.


There seems to me a huge difference between a few organizations being able to produce & distribute a total of X amount of self-serving bullshit with some limited reach, and anyone with a bit of money being able to produce 100,000 × X amount of self-serving bullshit and deliver it to exactly the people most likely to respond to it the way they want, anywhere in the world (save, notably, China and North Korea and such) while making it very hard to tell who it's coming from.

An environment in which 90% of the information is adversarial is really bad. It's a severe problem and very challenging to navigate. An environment where 99.9999% of it is adversarial, and it's even harder than before to sort truth from fiction, functionally no longer has any flow of real information whatsoever.


Another thought:

Maybe liberal democracy is not the final outcome of human civilization. You like it and I like it (presumably we were both raised to believe this way) but perhaps it's not really true.

Just to question a base assumption here.

It seems to me that, if all the things that are claimed to threaten liberal democracy actually do, liberal democracy might be much less robust and long-lived than previously believed.


Oh, absolutely. I've even come around to thinking that's likely. But one can hope.

[EDIT] One thing I no longer think has any realistic future is the open, semi-anonymous Internet. We're either losing it because despots take over and definitely won't permit that threat to remain unfettered, or we're losing it (in perhaps a gentler-touch way) because we have to to prevent authoritarian take-over and vast civil strife. I don't think we're getting to keep that no matter what happens.


Yep I think you might be right. It's ultimately too much of a risk to all sorts of powers to have open unfettered real time communication and mass dissemination.

Even the "good guys" will call emergency that will never end.

Oh well, it was nice while it lasted. An intellectual Cambrian explosion. And all that porn!


> And all that porn!

On that note: I can't wait for the resulting proliferation of photorealistic tentacle hentai. Imagine the possibilities!


Take it a step further: Can you be arrested for having porn that would be illegal in your country if it was real, but instead it's a thousand generated images/videos? How blurred will those lines get?


I guess my first question would be how the creators of such porn put together the training set.


Eh, you're asking the wrong question. Training sets are not made of gold; it might be hard to make good ones, but faking a training set with a program like this is resource intensive, yet possible.


This is already possible today and we don't need AI generated stock photos to do it. A bad actor can already spin events to fit their narrative, suppress dissent and control their population. Dictators have been doing it for centuries and we're seeing it in real time in the form of Putin's Russia right now.


Sure, but being able to do the same thing at 100,000x the scale for the same price seems like a pretty big difference. Throw in the ability to target narrow constituencies with custom messages via modern ad networks, automation-assisted astroturfing, etc., and the whole thing looks like a powderkeg to me.


> 100,000x the scale

Then everyone quickly realizes it's all garbage. The bigger danger is when people think garbage isn't possible, which has been the case since Stalin erased people from photos 98 years ago.


If it were only Putin's Russia making stuff up we'd be in good shape.

Otherwise, I fully agree with your point.


indeed, it's just the most prominent example at the moment


I disagree completely. We've had this ability to create photorealistic fiction for decades now. You can go see how impressive the capability is by paying $12 at your local movie theater. The only thing that's changed is the ease and cost of doing it, but that has also been dropping for decades. Having people think that it's not possible right now is what is dangerous, because it's absolutely possible right now, no AI needed.

Even if this is released quickly, the internet would be flooded with crazy images, and people would quickly learn to have a critical eye toward the media that they see, which should already be the case, because photorealistic fiction is already possible, as everyone experiences with every blockbuster movie. Only a slight shift in perspective is required to realize that every image and video you see is no different from frames/clips of Thor: Love And Thunder.


Lol imagine if someone said this as a prediction when photoshop was invented.


It's way more than that.

Anyone can be an artist, musician, photographer, writer.

It's going to result in more content being created, which will change the economies of content. Rate, scale, and volume of production will increase by orders of magnitude.

Disney thinks IP is a war chest. That's an old way of thinking.

Star Wars won't be special to the new kids growing up that can generate "Space Samurai" and "Galaxy Brouhaha" in an afternoon.

We're going to hit a Cambrian explosion of content.


"It's going to result in more content being created"

Is it, though? This model took over a month, on extremely fit hardware, to even create.

Let's say, for a second, in some hypothetical future, that anyone can access/use/update these models (by anyone, I mean someone with both a low amount of resources and little to no programming skill): why are they creating content?

"Rate, scale, and volume of production will increase by orders of magnitude."

If by production you mean "paid creation", I'm not so sure about that. In this world where everyone creates content from thin air, 1) there is little to no monetary value to the content anymore (as monetary value correlates with scarcity), 2) so there is less incentive to create anything, because there is no monetary value in doing so.

In fact, by definition we can pretty much prove that not much of anything will happen in this regard, because content is already limited by budget - the budget has not gone up, and the return has only gotten worse (in this hypothetical scenario).

What I think is more likely to happen - a few, "blessed" individuals will have out-sized content creation capabilities, without much need to innovate. The rest of us will have almost no incentive to create anything as a result.

Disney will use these systems, and they will use them to churn out more garbage, faster, on average, most kids will not be generating any movies in an afternoon.


I don't think that's even remotely what will happen.

> This model took over a month, on extremely fit hardware, to even create.

In a year kids will be training them on colab. This technology is moving so fast, it's jaw dropping. If we went back in time just ten years and showed ourselves this stuff, we'd think it was magic.

> why are they creating content?

Why does anyone post anything at all anywhere ever? Why do people edit photos and videos? Because it's human and people want to communicate.

> there is little to no monetary value to the content anymore (as monetary value inversely correlates with scarcity)

Maybe, but there's still a "search" problem in finding content that activates a response (and attracts the algorithm and views). Look at TikTok and YouTube. High rates of content creation. Immense value in both platforms. Lots of crap, but astoundingly good content mixed in. Enough to keep the whole world glued to it.

> So there is less incentive to create anything, because there is no monetary value to doing so

Nope. TikTok, YouTube, kids making Minecraft mods as a labor of love and hobby.

> content is already limited by budget

Per unit cost goes down by orders of magnitude. I don't see this resulting in less production.

This is a Henry Ford moment.

> What I think is more likely to happen - a few, "blessed" individuals will have out-sized content creation capabilities, without much need to innovate.

The market would disrupt this so fast. There's no ways these tools won't find themselves in the hands of everyone.

> Disney will use these systems, and they will use them to churn out more garbage, faster, on average, most kids will not be generating any movies in an afternoon.

Kids already make their own narratives in VRChat, Minecraft, and Roblox. The next systems will be a holy grail for them.

I'm interested in looking back on this thread in ten years. I think my predictions are correct.


> Why does anyone post anything at all anywhere ever?

I think you'll find the majority of the artwork community still favors pen and paper, pencil (maybe oils). They do this because, at least in part, they appreciate the challenge; it makes them better as people, in some fashion.

I use a tablet, sure; some use 3D modeling, even. People don't create pencil drawings because they are good, or sell; they do it for personal reasons. I think you're assuming this is going to solve some problem, but it's really not: the people interested in content creation for content creation's sake are not likely to benefit.

> Nope. TikTok, YouTube, kids making Minecraft mods as a labor of love and hobby.

You're assuming that because content platforms like TikTok exist now, something similar must continue into the future... I see this as a flawed assumption. Regarding monetization, my point holds well here, though. For every "successful" YouTuber raking in millions, there are hundreds, thousands for whom it just didn't work out.

The content on these platforms is already often mundane; you don't see the promised plethora of content creators, because the system for the most part favors the few, not the many. Sure, everyone can upload, but that doesn't mean most people do.

> If we went back in time just ten years and showed ourselves this stuff, we'd think it was magic.

I remember ten+ years ago, there was content on the web. Now, it's extremely centralized, most of it is not that good. So yeah I can time travel a bit, and things got worse. The reasons for this aren't all AI, but as trends go creating a program to do photoshop or video automatically isn't going to make things better. You're not solving the right problem here, and the people who stand to benefit are not the people you seem to think will.


> I think you'll find the majority of the artwork community still favors, pen and paper, pencil (maybe oils).

A field that has existed for centuries versus something in its larval stage that is still growing legs.

Calculators reached more than slide rule users.

> They do this because, at least in part, they appreciate the challenge, it makes them better as people, in some fashion.

I would love to do art, but life is full of opportunity cost. You can't do everything.

This technology will enable so many people to create art without having to devote thousands of hours to practice. That's a good thing!

I'm sure many artists used to say the same to artists that utilized tablets when they first came about. Or C programmers joking about early Perl and Javascript programmers.

It rhymes a bit with the Luddite argument.

We're seeing the emergence of new workflows. Few people hand wash their clothes anymore, and we're not worse for it.

> the people interested in content creation for content creations' sake are not likely to benefit.

I'm telling you that I would directly benefit. I love creating content, but I can't draw. I don't have the time to learn. These tools will empower me, and they'll only get better with time.

> For every "successful" Youtuber raking in millions, there are hundreds, thousands for whom it just didn't work out

There are more people making money creating content now than at any time in history. These tools and platforms have democratized access. People now have a path where there wasn't one before.

These platforms and tools don't, however, remove the requirement to be good at what you do and to find, understand, cater to, and grow your audience.

> Sure, everyone can upload, that doesn't mean most people do.

90% rule. There are still more creators today than in the past, and the field will only grow from here.

> Now, it's extremely centralized

That's orthogonal. I agree with you about platforms and centralization being a bit of a problem, but for what it's worth, the platforms do make it easier for most people.


> I'm telling you that I would directly benefit. I love creating content, but I can't draw.

This is, as you put it, orthogonal. The question is not whether some small set of individuals, such as yourself, stand to gain or not. In fact, I surmise that only a very few will actually gain; on the whole this is not a boon.

Similar arguments could be made regarding the other orthogonal avenues you have brought up, e.g. Javascript being good (it's terrible for the environment, in part responsible for a large amount of e-waste).

Bringing the term Luddite into the equation sounds a bit sensationalist...

Being not OK with inappropriate uses of technology is not synonymous with being anti-technology. Tools have purpose, but if you murder someone with a knife and then claim people are Luddites for decrying your actions, because progress and being able to cut things...

>These tools and platforms have democratized access.

Or at least have created the illusion of such.

> I don't have the time to learn.

No pain, no gain, as the saying goes. I don't think this will be as useful to you as you think: you will, as you put it, still need to be good, and your unwillingness to put effort into something you otherwise claim to value indicates you are unlikely to benefit at all; your work is likely to be subpar compared to that of your peers.

It's OK to not be good at everything, and rely on others, that's the foundation of a society. But claiming you will benefit from something that a tool does is like claiming that you will become rich from hammers putting nails into wood - sure we benefit from not smashing the nails into wood ourselves, maybe society benefits from better constructed houses, even, but the hammer itself does not make you a carpenter or even a construction worker. It will not benefit you in that fashion.

Which is my point: the tool in question does not democratize making movies. I bring up relevant issues in the realm of current video distribution, and you want to claim they're orthogonal... I'm not sure you understand just how much it won't benefit you...

Another way to think about it: if something is a problem, e.g. "I cannot draw", think "maybe it's much worse than that", and you might begin to uncover the root causes of your issue. Acting prematurely, much like premature optimization, is the root of all evil, if you will.

To bring it back on topic, if you create something like an image AI without properly examining what it is you are trying to accomplish, you are liable to create more harm than good. A lot of tech is like this, knee-jerk scratch-an-itch, and it shows.

Not an Apple fan, but they are a good example in industry: they took the time to design well, and the result is a more stable, more power-efficient system. Android, as much as I love the platform, suffers from a million knee-jerk decisions across the industry, resulting in a much higher proportion of digital garbage, some virtual, some literal landfill.


I have some extremely detailed imaginary images and clips in my head that I just don't want to devote the thousands of hours it would require to become proficient enough in drawing and visual effects to create them.


Agreed. If I can even get close with some of these generators, and hand-modify from there, I'll be happy.


It's just that they don't "create" anything and pressing some buttons to get more and more of the same, biased crap quickly gets boring.


1-10% of the audience will be using it as a tool and combining it with other inputs to produce new content.

That's an astounding number of people making content at a level of sophistication they wouldn't have been able to pull off before.


Friggles and bop, produce nothing.


Partially, yes. I certainly predict that DALLE-like models will ruin the prices for some stock photos.

But on the other hand, Adobe is pushing their CAI hard:

https://helpx.adobe.com/photoshop/using/content-credentials....

And the core benefit of "authentic" content is that it can't be generated by an AI. Only humans can own copyright.


DALL-E is starting to give full commercial licenses. I think in the end both will converge: you will prompt something, the AI will make 100 prototypes, and you will improve the one you like with help from the AI. The line will blur; it's not against the machine but with the machine. The problem, maybe, is that it will probably take fewer people to do the same work. It's "fun": maybe it's better to be a construction worker than an artist, because the latter will become obsolete in most cases.


"Boardroom full of attractive business people gathered around a laptop with one of them pointing at the screen, all wearing suits, whiteboard in the background"


"Woman laughing alone with salad"


"Elderly man sitting at laptop looking puzzled, holding credit card"


"teen in hooded sweatshirt wearing sunglasses and gloves typing on a laptop in a dark room"


similarly, "criminal emerging from computer monitor holding crowbar"


"The feeling of drifting slowly through a field of moving vehicles."

"Once there were parking lots, now it's a peaceful oasis. This was a Pizza Hut, now it's all covered with daisies."

"Green grass grows around the backyard shit house. And that is where the sweetest flowers bloom."

"This ain't no party, this ain't no disco, this ain't no fooling around."


This will also democratize the market for comics. It used to be you needed to be able to draw to be a comic. Now you can just have ideas, and use This Comic Does Not Exist (which does not yet exist) to generate the imagery.


"Smiling happy family holding cell phones above their heads standing in a field of grass wearing all white"


Yields subtly nightmarish results:

[1] https://i.imgur.com/YqNkaGj.png

[2] https://i.imgur.com/EQL7pqw.png


Was Aphex Twin using DALLE two decades ago?!


That's actually a great example. Just think of the aggregated human-hours wasted to bring those people together, create that setting, photograph, edit, publish... All for a meaningless flyer or landing page.


To what extent are we ripping off the photographers? Weren't the models trained on their hard work?

Have we reached a point where we've bounded art within the data models are trained on?

Have we imposed a limit on ideas as a realm of "what came before" and implicitly decided that any "after" is a pointless exercise without knowing whether that's even true?


> To what extent are we ripping off the photographers?

Photographers in general, certainly, but no individual photographer enough that it could ever matter, legally or ethically.

> implicitly decided that any "after" is a pointless exercise...

Seems a little pessimistic. I believe that we will still be able to recognize and appreciate new art.


this argument is like saying it's fine if you kill 100,000 people: that's generally hurting a lot of people but no individual person, so no ethical problems m8


Bizarre comparison. Each individual in your example is killed. Having more killed doesn't make it better.

In the photographers example, each photographer is fractionally damaged, with the more photographers the less the damage.

Similarly, GPT-3 is ripping off the words of everyone who has ever written anything on the internet, but to such a minuscule degree that none of us could ever have cause for complaint.


Excellent questions, and I was thinking the same thing. In my opinion, AI-generated art or images are not as impressive as they might seem at first purely because there is no real imagination involved. It's an art simulacrum.

A more accurate title would be "This picture of food does not exist."


Pretty much every piece of art I've ever drawn was influenced by things I've drawn previously. How is that different?


For a generative model to hit the quality threshold you see on this site, it’s going to need so much training data that you can presume there’s already a ton of stock photos that match that description that it scraped from the internet.


Can these things be used for commercial purposes? Which ones?


Shrinking microstock rates already killed it.


I'll never trust stock photos again


Stock market crash!?



