OP: Forgive me if this is out of place. Also, please know that my question is genuine, not at all a reflection on the author/their project, and most certainly born out of my own ignorance:
Why are these kinds of things impressive?
I think part of my issue is that I don't really "get" these ML projects ("This X does not exist" or perhaps ML in general).
My understanding is that, in layman's terms, workers are shown many, many examples of X and then are asked to "draw"/create X, which they then do. The closest analogy I can think of is if I were to draw over and over for a billion, billion years and each time a drawing "failed" to capture the essence of a prompt (as deemed by some outside entity), both my drawing and my memory of it were erased. At the end of that time, my skill in drawing X would be amazing.
_If_ that understanding is correct, it would seem unimpressive? It's not as though I can pass a prompt of "cookie" to an untrained generator and have it pop out a drawing of one. And likewise, any cookie "drawing" generated by a trained model is simply an amalgam of every example cookie. What am I missing?
For the longest time it was assumed that creativity was an almost magically human trait. The fact that somebody can, with a straight face, say "I don't get why it is impressive, I could draw these images too" is actually indicative of the wild change that has occurred over these last couple years.
I guess it is true that more than a couple demos like this have been shown, so some of the awe might have worn off, but it is still pretty shocking to lots of us that you can describe the general idea of something to a computer and it can figure out and produce "what you mean," fuzzy as that is.
> For the longest time it was assumed that creativity was an almost magically human trait. The fact that somebody can, with a straight face, say "I don't get why it is impressive, I could draw these images too" is actually indicative of the wild change that has occurred over these last couple years.
It's not creativity though. It's a program that arranges pixels in a way that is statistically similar to some training data set. It doesn't "draw" anything, it doesn't "figure out" anything. There is no thought or idea behind it.
The output is mildly interesting but there is no creative act at work, and there's certainly no revolution in the artistic world.
> For the longest time it was assumed that creativity was an almost magically human trait.
Maybe by some people. This is a human-centric perspective, a form of speciesism if I can call it that. Various primates have shown creativity, and various animals have shown the ability to solve problems creatively. Heck, even my cat figured out how to open doors by pulling down on the handle. Humans are likely not more creative than other animals with similar brain size; it's just that there are other factors at play that make it seem like that (such as opposable thumbs and the passing down of knowledge between generations using speech).
> It's not creativity though. It's a program that arranges pixels in a way that is statistically similar to some training data set. It doesn't "draw" anything
that's the exact argument made against chess engines - are they "really" playing chess?
Who's to say the brain works any differently from what people imagine these models to be doing? That we think we are creative might be an illusion, because the brain is tricking you into thinking that you creatively came up with an original idea, when the reality is that the idea came from a long list of training data that you might have been exposed to all your life.
And who's to say that brute force and statistical methods of producing content are not creative?
So is your perspective :) You essentially take a process we don't understand and can't quantify and say "because I can understand this, this can't be it". You don't know, claiming otherwise is disingenuous.
:) Yes, I haven't defined "creativity" and I haven't quantified anything. My assertion was empirical, based on the observation that what society usually considers creative comes into being due to a logical train of thought, such as "if X, then what if Y", or an impulse to do something that hasn't been done before. Imitation on its own would not be considered particularly creative.
Yes, you are right, I could be wrong, perhaps this form of statistical imitation is at the core of greater creativity, and not another hype wave in software development. Time will tell.
> It's not creativity though. It's a program that arranges pixels in a way that is statistically similar to some training data set. It doesn't "draw" anything, it doesn't "figure out" anything. There is no thought or idea behind it.
I think this is the crux of my thought on the subject. Thank you.
I absolutely see the kinds of uses it can have (guiding the repair of artwork that was damaged, seeing patterns in huge datasets, etc.), but I think the way it's marketed is that it's somehow coming up with new art, but it's really just "recreating" (on a very micro level) everything in the training set.
You could absolutely argue that creatives are just regurgitating things they've seen before, but I think the big separation between ML and human creativity is that it crosses "genres", if you will. For example, I could be influenced by my experience in a car crash in such a way that it causes me to create X (art, software, music, etc.) in such an abstract way, I'm not sure whether it's even possible to recreate artificially.
It was a long time ago. It wasn't a knob, it was a handle. He would jump on it, his weight would pull down the handle, and then he would use his right leg to push against the frame and open the door. Not very graceful but it got the job done. He was a very smart cat.
> The fact that somebody can, with a straight face, say ...
To be clear, I'm not trying to devalue this at all; In fact, as I noted above, I am certain I'm missing something and that was what my comment was aimed at. In any case, thank you for taking the time to reply (seriously).
Probably the expression "with a straight face" has been used sarcastically too often, so maybe it looks sarcastic in my comment too. In that case I should have picked a more unambiguous phrase. I wasn't using it sarcastically or anything; "with a straight face" = in good faith/honest in this case.
I will say that the images included have not shown themselves to be particularly creative, unless I missed a wider galaxy of non-existent food items. It's not entirely convincing that the generated images aren't just glued-together pieces of other images with some fading between them.
Regardless of what people think as impressive, or creative, or sentient, or intelligent, there is one thing that cannot be disputed: stock photos have value in the current economy. There are people being paid to produce these stock photos. And it now looks like AI can do it for cheaper. That's the most important thing to realize imo, is that the value of human production is getting lower and lower relative to AI.
And think about on-demand ML, where a stock photo on a website might change based on characteristics it knows of you.
Let's say a website knows you love chocolate cake, well the website is now going to show you a chocolate cake instead and has generated a unique image of that for you.
I think the real value is that it can combine items. I can easily find a stock image of a cake. I can probably find a stock image of a circular saw. But a decent stock image of a cake in the shape of a circular saw maybe doesn't exist anywhere. But this could generate several for me to choose from.
And then there's the "in the style of Salvador Dali" aspect.
Even for something simple like a cake, if you want to use that image and perhaps license it exclusively so no one else has it, that would cost you a lot more than an ML generated cake photo.
Even just an image of whatever with a very specific color would be a huge benefit for designers. Because almost all stock photography is trying to appeal to the broadest audience possible, it all has a similar muted palette of colors, which makes stock photography kind of look like stock photography. ML-generated images, or even ML-manipulated images, could change that.
Correct. But I just see simplistic pictures that could just have been copied from any stock site. And maybe they were (by the software). They don't demonstrate what you say. It would be really impressive to see exactly that!
"AI" can do this for cheaper because it the model is built based on databases of stolen material. did they get any permission from all the people involved in creating the reference photos?
Since GANs are effectively one class of denoising auto-encoders, your summary is spot-on. This type of ML model learns to effectively compress and decompress natural images by representing them as a hierarchy of convolutional features, i.e. shape templates.
I don't think it's accurate at all to characterize GANs as denoising auto-encoders, they're not even superficially similar, unless you're talking about a very specific architecture of autoencoder-based GANs like AAEs.
The part of a GAN that people use to synthesize images has to be a denoising decoder. You put in 1x1px high-dimensional noise as the embedding and it'll gradually upscale and turn that noise into the 3-dimensional output image.
The part of a GAN that people use for control is based on an encoder. You put in your 3-dimensional source image and it gets converted to that 1x1px high-dimensional noise in such a way that putting the noise back into the decoder will produce your image again.
So the network architecture/structure of a GAN is the exact same structure as you'd use for a denoising autoencoder. You might train that GAN with a different intent, but it's still a stack of upscalers and convolutions that perfectly matches the architecture inside the decoder part of a denoising auto-encoder.
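To make that structural claim concrete, here's a minimal PyTorch sketch (not StyleGAN2's actual code; the layer sizes and counts are made up for illustration) of a generator that is literally a stack of upsamplers and convolutions turning a 1x1 high-dimensional latent into an image, i.e. the decoder half of an auto-encoder:

    import torch
    import torch.nn as nn

    class DecoderLikeGenerator(nn.Module):
        # Illustrative only: a GAN "generator" that is structurally just the decoder
        # half of an auto-encoder (upsample + conv, repeated).
        def __init__(self, latent_dim=512, channels=(256, 128, 64, 32)):
            super().__init__()
            layers, in_ch = [], latent_dim
            for out_ch in channels:
                layers += [
                    nn.Upsample(scale_factor=2, mode="nearest"),   # 1x1 -> 2x2 -> 4x4 -> ...
                    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                    nn.LeakyReLU(0.2),
                ]
                in_ch = out_ch
            layers += [nn.Conv2d(in_ch, 3, kernel_size=3, padding=1), nn.Tanh()]
            self.net = nn.Sequential(*layers)

        def forward(self, z):                      # z: (batch, latent_dim) noise vector
            x = z.view(z.size(0), -1, 1, 1)        # treat the noise as a 1x1px "embedding"
            return self.net(x)                     # (batch, 3, 16, 16) after 4 upsampling steps

    g = DecoderLikeGenerator()
    imgs = g(torch.randn(2, 512))                  # two random latents -> two tiny RGB images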
BTW sometimes people also explicitly call this out. The VQ part in VQGAN comes from the VQVAE, the "Vector Quantized Variational AutoEncoder". And VQGAN+CLIP is what the first open source DALL-E clones were based on.
I thought that was the comparison you were drawing but it's misguided.
The denoising part of a denoising autoencoder refers to the noise applied to its input, the image, not to its latent space, as would be the case in the equivalency you establish between the noise distribution used as input to a GAN's generator and the latent space of an auto-encoder (the input to its decoder).
Making a comparison between the generator and a decoder and their respective input spaces has some legs, but not many. An AE's latent space is constrained in an entirely different way, with a regularization term that tries to minimize the divergence between the wanted distribution and the learned distribution; that's not a problem you have with a GAN, since you simply draw from your wanted distribution every time. These differences lead to learned distributions in the latent space that are entirely different, with different failure modes wrt generalization and interpolation.
Beyond that, the comparison just doesn't work: yes, there are two networks, but the discriminator doesn't play the role of the AE's encoder at all. In addition, the learning signal is not the reconstruction of the input but the classification of the generated image. That's a completely different learning process; the learned distribution is different, the gradient path is different, the training is different (much more complex for a GAN), etc... I could go on.
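To make that loss-level difference concrete, here's a rough sketch (my own illustration, not any specific paper's formulation) of what each model actually trains on: the (variational) auto-encoder reconstructs its input with a divergence penalty on the latents, while the GAN generator only ever sees the discriminator's classification of its output:

    import torch
    import torch.nn.functional as F

    def vae_loss(x, x_recon, mu, logvar):
        # Reconstruction of the input + KL term pulling the latent distribution toward N(0, I)
        recon = F.mse_loss(x_recon, x, reduction="mean")
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl

    def gan_generator_loss(d_fake_logits):
        # Non-saturating GAN loss: push the discriminator to call the fakes "real".
        # The input image is never reconstructed here; only the generated one is classified.
        return F.binary_cross_entropy_with_logits(
            d_fake_logits, torch.ones_like(d_fake_logits)
        )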
The VQGAN would fall under the clause in my initial comment about GANs which specifically use an AE in their very specific architecture, like the AAE. In the AAE's case they use a discriminator to regularize the latent space of an AE (instead of classifying the generated image); in the VQGAN they use a full autoencoder as the generator, in addition to a third network for the discriminator. These do not fit the comparison you're making and do not function in general like most GANs do. So the general comparison of GANs to autoencoders remains inaccurate.
> The denoising part of a denoising autoencoder refers to the noise applied to its input
Agree, it converts a noisy image to a denoised image. But the odd thing is, when you put a noisy image into a StyleGAN2 encoder, you get latents which the decoder will turn into a de-noised image. So in practical use, you can take a trained StyleGAN2 encoder/decoder pair and use it as if it was a denoiser. For example https://arxiv.org/abs/2103.04192
> These differences lead to learned distributions in the latent space that are entirely different
I also agree there. The training for a denoising auto-encoder and for a GAN network is different, leading to different distributions which are sampled for generating the images. But the architecture is still very similar, meaning the limits of what can be learned should be the same.
> Beyond that the comparison just doesn't work, yes there are two networks but the discriminator doesn't play the role of the AE's encoder at all
Yes, the discriminator in a GAN won't work like an encoder. But if you look at how StyleGAN 1/2 are used in practice, people combine it with a so-called "projection", which is effectively an encoder to convert images to latents. So people use a pipeline of "image to latent encoder" + "latent to image decoder".
That whole pipeline is very similar to an auto-encoder. For example, here's an NVIDIA paper about how they round-trip from image to latent to image with StyleGAN: https://arxiv.org/abs/1912.04958 My interpretation of what they did in that paper is that they effectively trained a StyleGAN-like model with the image L2 loss typically used for training a denoising auto-encoder.
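As a rough sketch of what that projection step does (assuming some pretrained generator G mapping a latent to an image; the real StyleGAN projector also uses perceptual/LPIPS losses and noise regularization, so this is simplified):

    import torch

    def project(G, target, latent_dim=512, steps=500, lr=0.05):
        # Optimize a latent so that decoding it reproduces the target image;
        # with an image-space L2 loss this round trip behaves like an auto-encoder.
        w = torch.randn(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([w], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((G(w) - target) ** 2).mean()   # plain L2 reconstruction loss
            loss.backward()
            opt.step()
        return w.detach()                          # latent whose decoding approximates the target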
100%. It would be interesting to take an image as input and get the closest prompt that generates that image. Later we could run the prompt to get the image back.
That could effectively be a lossy compression of the image, exchanging space (the image) for time (CPU).
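A very crude sketch of the image-to-prompt direction, using CLIP to rank a hand-written list of candidate prompts (the candidates, filename, and model choice here are made up; a real system would search or optimize over prompts rather than rank a fixed list):

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    candidates = ["a pile of cookies", "a chocolate cake", "a bowl of ramen"]
    image = preprocess(Image.open("mystery_food.png")).unsqueeze(0).to(device)

    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(clip.tokenize(candidates).to(device))
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        scores = (img_feat @ txt_feat.T).squeeze(0)    # cosine similarity per prompt

    print(candidates[scores.argmax().item()])          # the "closest" prompt among the candidates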
This feels like when I made a drawing in elementary school, and someone asks if I traced it. It just feels like looking for a way to downplay what was made by making an appeal to "creativity" or "originality".
But the tide never goes out on AI and computing. The capabilities will only grow more and more impressive and unassailable.
When the chatbot is completely convincing, is someone going to ask, "I wonder how close the responses are to the training text," even though no one even blinks when fathers and sons act alike? No one demands children invent new languages to prove they aren't just "randomizing samplers".
Is this just a search engine to find relevant content and remix it a bit, or can you actually create new content? These two things don't solve the same problem, and you may run into lots of copyright problems.
You know, if you've ever done anything creative, the brain's ability to take inspiration from something and remix it (without realizing it's not original and actually just a memory) is shocking. So are our brains really just search engines to find relevant content and remix it a bit? What really is creation?
At least with DALL-E you can be sure the food has a name. For a moment I was worried this would produce vaguely food-like images where on closer look you realise you have no idea what you're looking at - like a lot of other "this X does not exist" projects seem to do.
Also, a bit of cultural bias in the training shows, I think. The "pile of cookies" prompt seems to mostly generate American cookies, while e.g. a German user might be disappointed they didn't get this: https://groceryeshop.us/image/cache/data/new_image_2019/ABSB... :)
I don't think a German user writing "pile of cookies", in English, would be disappointed with "English" results. Is that any different than what you get on, say, Google?
I thought DALL-E uses a sentence-piece encoder for the text that goes into CLIP, which would suggest that you can recombine the syllables from existing words and it'll "understand" that.
So both "banana chocolate cookies" and "banacoochoconakieslade" should work.
We have trained four StyleGAN2 image generation models and are releasing checkpoints and training code. We are exploring how to improve/scale up StyleGAN training, particularly when leveraging TPUs.
While everyone is excited about DALL·E/diffusion models, training those is currently out of reach for most practitioners. Craiyon (formerly DALL·E mega) has been training for months on a huge TPU 256 machine. In comparison our models were each trained in less than 10h on a machine 32x smaller. StyleGAN models also still offer unrivaled photorealism when trained on narrow domains (eg thispersondoesnotexist.com), even though diffusion models are catching up due to massive cash investments in that direction.
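Back-of-the-envelope, with the exact figures assumed rather than measured (Craiyon's "months" taken as ~2 months, our machine taken as 8 devices vs 256), the gap in accelerator-hours per model looks roughly like this:

    craiyon_device_hours = 256 * 60 * 24     # ~256 TPU devices for ~60 days (assumed)
    our_device_hours     = (256 // 32) * 10  # a 32x smaller machine (8 devices) for <10 h
    print(craiyon_device_hours, our_device_hours,
          craiyon_device_hours // our_device_hours)
    # ~368,640 vs ~80 device-hours, i.e. thousands of times less compute per model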
Nice work - can you share how large your training datasets are in terms of number of images? And did you train your models from scratch or fine-tune them from an existing model?
We currently train each model independently, i.e. we first gather a cookie dataset, train a cookie model, then restart from scratch for the next one.
That's actually something we're investigating: can we train a single class-conditional model for multiple types of food? Or, can we finetune cheesecakes from cookies?
Is there a chance your dataset provider makes a claim that they have derived data rights over your model generated images? Would you have sufficient confidence, say, to sell your images on a stock image site?
It is still somewhat unclear, but it seems that images generated by a machine learning model are not copyrightable (to quote the US Copyright Office, generated images "lack the human authorship necessary to support a copyright claim"). Whether the model itself is copyrightable is less clear to me, but [0] seems to suggest that it is. All of this depends on the country, but much of the world tends to eventually mimic US copyright law.
I like the thought that, years from now, we're all eating weirdly-presented food / drinking weird cocktails because AI synthesized the images of drinks around the web and decided `cocktails always include fruit` and `all food must be piled high on plate`
Now we just need to connect this to ffmpeg, add some fake recipe scripts, upload a video to YT, multiply by 100 videos and 100 channels, and make about $2.00. Nice.
Are there any analysis techniques that can easily distinguish between these and real photographs? Do simple things like edge detections or histograms reveal any anomalies?
Neural networks can be trained to identify the difference, but I don't know how specific that is to the generating model. In fact, the GAN technique, at a high level, is two networks: one trying to distinguish the difference and one trying to create images that cannot be distinguished. That is the "adversarial" aspect.
It is an interesting question that there may be some simple pre-processing techniques (edge detection, Fourier transform, etc) that more easily distinguish the image as a fake. Something like a shortcut from training a network to make the distinction.
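For example, a quick probe along those lines (filenames made up; whether this actually separates a given model's output from real photos is exactly the open question) is to compare log-magnitude Fourier spectra, since GAN upsampling sometimes leaves periodic artifacts in the high frequencies:

    import numpy as np
    from PIL import Image

    def log_spectrum(path, size=256):
        # Grayscale, fixed size, centred 2D FFT, log magnitude
        img = np.asarray(Image.open(path).convert("L").resize((size, size)), dtype=np.float32)
        f = np.fft.fftshift(np.fft.fft2(img))
        return np.log1p(np.abs(f))

    real = log_spectrum("real_food.jpg")
    fake = log_spectrum("generated_food.png")
    print(np.abs(real - fake).mean())   # crude distance; inspecting the spectra visually is more telling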
The food looks great! I suppose these models could use some extra training with dishes, though. The plates and glasses look wobbly, which is an instant giveaway. Otherwise, I can see this being used by food posters! Maybe not as a primary source, but as a "filler" — for sure.
I tried to use the linked Colab notebook to generate my own, and it appears to have been successful, but I don't see any way to view the generated images via the notebook interface. I'm not familiar with the notebook tool - have I missed something?
I ran it locally and it generated images as PNGs in the "generated_images" directory (named 0.png, 42.png etc. after the seeds provided to the script). If it works and does the same in the notebook, you should be able to click the folder icon in the menu on the left to open the file browser, expand the "generated_images" directory, then click the ellipses next to each file to select "Download".
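Alternatively, if it's easier than the file browser, a notebook cell like this (paths assumed to match the script's defaults) should display them inline:

    import glob
    from IPython.display import Image, display

    for path in sorted(glob.glob("generated_images/*.png")):
        print(path)
        display(Image(filename=path))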
I'm honestly surprised that they trained a StyleGAN. Recently, the Imagen architecture has been shown to be simpler in structure, easier to train, and even faster to produce good results. Combined with the "Elucidating" paper by NVIDIA's Tero Karras, you can train a 256px Imagen* to tolerable quality within an hour on an RTX 3090.
Here's a PyTorch implementation by the LAION people:
Or, since they are comparing to Craiyon, why not just finetune Craiyon itself? Craiyon already exists, just take it off the shelf, you don't need to retrain it from scratch, so the cost to train it from scratch on everything (which is indeed quite large) is not relevant to someone who just wants to generate great food photos.
They already pretty much are. Top recipe hits on Google seem to always be from something like "Southern Mama Cooking Tips" or something generic like that, and you have to scroll past 8 paragraphs of context for why this person is writing a recipe and why they like it so much, totally not to hit all the SEO sweet spots, plus the full life story of this "Southern Mama" that's totally not a guy in India or a robot scraping together blurbs of text from other websites.
It's not entirely SEO, that's part of it, but it's also copyright.
You can't copyright a recipe. It's information that can be freely shared. Anybody can steal it and set up their own site. You can copyright a recipe that comes with a life story. Copyright laws are weird and confusing, but it's really difficult to come up with a better solution.
My partner is very impressionable when she sees food in a TV show. Immediately has a craving for it. This thing is like, limitless porn for her gluttony.
Thinking bigger: I'm pretty sure the combo of a relatively free global Internet, liberal democracy on large (much bigger than city-state) scales, and cheap, customized, on-demand generation of totally fake text + photo + video propaganda based on simple prompts, cannot all co-exist. At least one of these isn't going to survive alongside the others. If we just let things keep going the way they are, I expect "liberal democracy on large scales" is the one we'll lose—and whatever follows probably won't let the fairly-free, global Internet keep existing, either, so we'll lose that too.
> You can put letters in any order you want and make them say any damn lie.
You can run a web server by responding to every request by hand-typing the response, too. But you couldn't realistically run one-one-millionth the modern Web that way. You can't have global-scale e-commerce that way, et c. Some things that technically could work that way, can't actually—it's too slow, too expensive. This is very much one of those "quantity has a quality all its own" things. Increase the productivity of every astroturf-poster or propaganda-front-news-site manager a few hundred times and that's a big difference.
> I am as concerned as the next guy but throwing the towel already seems a bit premature?
Where'd you get throwing in the towel? I do think we're (especially the US) really unlikely to do what we need to in time, in part because measures that are probably necessary to defend against this are themselves risky and rather unappealing. But we might.
That assumes that democracy can survive skepticism taken to the extreme.
If you don't trust anything, you ... don't trust anything.
How can institutions survive? We're already living through a moment where large swaths of the population cannot get a fix on the same reality as their neighbors.
I would argue that this is happening precisely because the gullibles are being led into playing skeptics.
> cannot get a fix on the same reality as their neighbors
I believe this phenomenon was way more severe in the past. The reason you notice it now is because these meme cones are rubbing up against each other in public forums. In the past, isolated communities wallowed in their own wacky shit totally unchallenged. Pick any major religious movement of the 19th century and tell me they had a firm grasp on reality?
As a counterpoint, most cults of modern day have found digital guidance and strength where before they would not have.
Take this site, even. OK, it's less cult-y as more people learn of it, but in the past there wouldn't have been anything close to the sort of community you can run into here. Maybe I've run into one or two people in real life as "in the know" as what you typically find here...
> The reason you notice it now is because these meme cones are rubbing up against each other in public forums
Yeah, it is true the amplification effect makes crazy stuff like the church of Scientology public knowledge, but there's also just that there's too much noise to keep track of all the special little brands of cult-ish "crazy". It's really hard to gauge the effect here, unless you have a study perhaps you could link? That would be quite on topic here.
I have heard this before, but I have several reasons to think it is not going to be a problem above what already happens:
1. AI lets you generate an enormous number of lies, but what is really dangerous is one well-placed lie within a trusted stream of many truths. CNN will retain a power to mislead far in excess of Twitter bots.
2. Democracy averages over everyone's confusion, which means lies are only dangerous when large numbers of people believe them at the same time. Hordes of bots generating spurious lies won't move a democracy in any specific direction, but again, mass media will retain its power to mislead everyone at once in the same, effectual, direction.
3. People have never respected the veracity of random tweets. In the same way that trust in mainstream media outlets is reaching record lows due to their consistently biased reporting (they might not all have the same bias, but I can't think of any I'd consider free of bias), everyone will learn to adjust their incredulity to match the true quality of random tweets.
4. Companies like Twitter and Google are known to be shaping their results and algorithms according to their own "political views" (broadly construed) so at worst this would represent a partial shift of power from the old masters to new masters quite like them (social media companies). In many ways trimming the front page to reflect editorial opinion is echoed in the way Twitter trims its feeds to reflect their own editorial opinions.
All taken together, it seems like the media is afraid that equally large companies with similar business models (content, attention, advertising) might end up eclipsing them. The same old model where the TV station is afraid to upset its advertisers, thereby giving a voice to business interests, is well-recorded in the recent history of YouTube. Not so much will change, although seeing it in its old and newer forms might shed light on how it works.
> CNN will retain a power to mislead far in excess of Twitter bots.
The major mistake here is thinking that these two things are mutually exclusive competition. They're not.
In 2015, an internet rumor started spreading that Obama was going to invade Texas. This rumor was based on the fact that there was a routine military training exercise taking place in Bastrop called Jade Helm. It started on Facebook and was quickly picked up by the mass media. These rumors became so viral that the Texas governor activated the Texas State Guard "just in case".[1]
Later, the US government alleged that this was a planned disinformation operation carried out by the Russians and was a precursor to later operations.[0]
Twitter and Facebook started the conspiracy, but mass media laundered it and made it look real enough to get a governor to act.
What's your reasoning on this? Because I don't see why liberal democracy would cease to exist... life would go on if we all knew that all pictures can be fabricated. I think this is already the case without AI.
Apparently you missed the problem Deep Fakes posed...
If you cannot distinguish reality (well), and in fact it becomes possible that most things you see do not exist, then there is nothing to stop a bad actor from producing a fake version of events in which they are elected, control everything, etc.
So, democracy would cease to exist, because democracy relies ultimately on a choice - if you have no choice then you do not have democracy, only a dictatorship.
Photoshop has existed for years and humans have been manipulating photos for longer, what's the difference, really?
If I see a photo in the Guardian newspaper (or any other reputable news outfit) I'm going to presume it's real, and I expect journalists to verify that for me. If I see a random photo that doesn't look quite right on 4chan, I'm not going to immediately assume it's news.
Reach is no different: bots and humans are both able to post to social media. Cost is probably no different at the moment either, since AI isn't perfect and some human interaction is probably needed to make it believable, and because of that, scale is the same too. I think we're approaching all of those things, but it's probably still quite some time away until a machine can be trusted to manipulate the public on its own.
> For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
Right, but today we have that problem already. We know that a bad actor journalist can write a fake story. We therefore require sources. If deepfakes come along we will know videos can be fake, and so we will be skeptical, as we are today, and look to proper sources. We will easily come up with some way to validate sources via cryptography or org reputation (e.g. we might trust the NY Times not to just fabricate things).
This is already barely holding together with mostly human actors doing the astroturfing and creating bullshit "news organizations" expressly to spread propaganda. Automation is going to overwhelm a system that's already teetering.
Yeah but we will get the word out that none of that is trustworthy, then. There will be countermeasures and reactions to this just like previous things. It will certainly be effective to some degree - propaganda is effective for sure - but it won't just be, oh, there are deepfakes, everyone will now just unthinkingly accept them.
1) This effective backlash/education-campaign has not already happened despite there already being significant problems with this kind of thing, and most of it not being that hard to spot, even, and 2) I think the more likely effect is the destruction of shared trust in any set of news sources—we're already pretty damn close to this being the case, in fact. "It's all lies anyway" is a sentiment that favors dictators more than it does democracy.
> This effective backlash/education-campaign has not already happened despite there already being significant problems with this kind of thing
I think it could be claimed that we're clearly in the process of learning to distrust. People distrust media more now than ever [1], because they've demonstrated, again and again, that they can't be trusted. Trust in institutions is down to new lows [2], because they've also shown that they're untrustworthy and/or incompetent. Everyone is shouting "disinformation", "misinformation", and "fake news" constantly, because nobody trusts anything anymore, and rightfully so in many cases.
People already produce all kinds of fake news and doctored photos and false flags and all kinds of things. This has been going on since we developed language and photography I suspect.
People already have trouble telling propaganda from fact. That has been going on since forever.
At the end of the day I don't see this being a game changer. If anything, now video and photos are less evidence for/against something as the potential falseness becomes well known. Congressman X: "no, that wasn't me you saw leaving the hotel with the prostitute, my slimy opponent obviously is deep faking stuff".
And people will continue to believe what they want to believe, in spite of all evidence to the contrary, just like they do right now.
There seems to me a huge difference between a few organizations being able to produce & distribute a total of X amount of self-serving bullshit with some limited reach, and anyone with a bit of money being able to produce 100,000 × X amount of self-serving bullshit and deliver it to exactly the people most likely to respond to it the way they want, anywhere in the world (save, notably, China and North Korea and such) while making it very hard to tell who it's coming from.
An environment in which 90% of the information is adversarial is really bad. It's a severe problem and very challenging to navigate. An environment where 99.9999% of it is adversarial, and where it's even harder than before to sort truth from fiction, functionally no longer has any flow of real information whatsoever.
Maybe liberal democracy is not the final outcome of human civilization. You like it and I like it (presumably we were both raised to believe this way) but perhaps it's not really true.
Just to question a base assumption here.
It seems to me, if all the things that are claimed to threaten liberal democracy actually do, liberal democracy might be much less robust and long-lived than previously believed.
Oh, absolutely. I've even come around to thinking that's likely. But one can hope.
[EDIT] One thing I no longer think has any realistic future is the open, semi-anonymous Internet. We're either losing it because despots take over and definitely won't permit that threat to remain unfettered, or we're losing it (in perhaps a gentler-touch way) because we have to to prevent authoritarian take-over and vast civil strife. I don't think we're getting to keep that no matter what happens.
Yep I think you might be right. It's ultimately too much of a risk to all sorts of powers to have open unfettered real time communication and mass dissemination.
Even the "good guys" will call emergency that will never end.
Oh well, it was nice while it lasted. An intellectual Cambrian explosion. And all that porn!
Take it a step further: Can you be arrested for having porn that would be illegal in your country if it was real, but instead it's a thousand generated images/videos? How blurred will those lines get?
Eh, you're asking the wrong question. Training sets are not made of gold; it might be hard to make good ones, but faking a training set with a program like this is resource-intensive yet possible.
This is already possible today and we don't need AI generated stock photos to do it. A bad actor can already spin events to fit their narrative, suppress dissent and control their population. Dictators have been doing it for centuries and we're seeing it in real time in the form of Putin's Russia right now.
Sure, but being able to do the same thing at 100,000x the scale for the same price seems like a pretty big difference. Throw in the ability to target narrow constituencies with custom messages via modern ad networks, automation-assisted astroturfing, et c., and the whole thing looks like a powderkeg to me.
Then everyone quickly realizes it's all garbage. The bigger danger is when people think garbage isn't possible, which has been the case since Stalin erased people from photos 98 years ago.
I disagree completely. We've had this ability to create photorealistic fiction for decades now. You can go see how impressive the capability is by paying $12 at your local movie theater. The only thing that's changed is the ease and cost of doing it, but that has also been dropping for decades. Having people think that it's not possible right now is what is dangerous, because it's absolutely possible right now, no AI needed.
Even if this is released quickly, the internet would be flooded with crazy images, and people would quickly learn to have a critical eye to the media that they see, which should already be the case, because photorealistic fiction is already possible, as everyone experiences with every blockbuster movie. Only a slight shift in perspective is required to realize that every image and video you see aren't any different than frames/clips of Thor: Love And Thunder.
Anyone can be an artist, musician, photographer, writer.
It's going to result in more content being created, which will change the economies of content. Rate, scale, and volume of production will increase by orders of magnitude.
Disney thinks IP is a war chest. That's an old way of thinking.
Star Wars won't be special to the new kids growing up that can generate "Space Samurai" and "Galaxy Brouhaha" in an afternoon.
We're going to hit a Cambrian explosion of content.
"It's going to result in more content being created"
Is it, though? This model took over a month, on extremely fit hardware, to even create.
Let's say for a second, in some hypothetical future, that anyone can access/use/update these models (by anyone, I mean someone with both a low amount of resources and little to no programming skill): why are they creating content?
"Rate, scale, and volume of production will increase by orders of magnitude."
If by production you mean "paid creation", I'm not so sure about that. In this world where everyone creates content from thin air 1) there is little to no monetary value to the content anymore (as monetary value inversely correlates with scarcity) 2) So there is less incentive to create anything, because there is no monetary value to doing so.
In fact, by definition we can pretty much prove that not much of anything will happen in this regard, because content is already limited by budget - the budget has not gone up, and the return has only gotten worse (in this hypothetical scenario).
What I think is more likely to happen - a few, "blessed" individuals will have out-sized content creation capabilities, without much need to innovate. The rest of us will have almost no incentive to create anything as a result.
Disney will use these systems, and they will use them to churn out more garbage, faster; on average, most kids will not be generating any movies in an afternoon.
I don't think that's even remotely what will happen.
> This model took over a month, on extremely fit hardware, to even create.
In a year, kids will be training them on Colab. This technology is moving so fast, it's jaw-dropping. If we went back in time just ten years and showed ourselves this stuff, we'd think it was magic.
> why are they creating content?
Why does anyone post anything at all anywhere ever? Why do people edit photos and videos? Because it's human and people want to communicate.
> there is little to no monetary value to the content anymore (as monetary value inversely correlates with scarcity)
Maybe, but there's still a "search" problem in finding content that activates a response (and attracts the algorithm and views). Look at TikTok and YouTube. High rates of content creation. Immense value in both platforms. Lots of crap, but astoundingly good content mixed in. Enough to keep the whole world glued to it.
> So there is less incentive to create anything, because there is no monetary value to doing so
Nope. TikTok, YouTube, kids making Minecraft mods as a labor of love and hobby.
> content is already limited by budget
Per unit cost goes down by orders of magnitude. I don't see this resulting in less production.
This is a Henry Ford moment.
> What I think is more likely to happen - a few, "blessed" individuals will have out-sized content creation capabilities, without much need to innovate.
The market would disrupt this so fast. There's no way these tools won't find themselves in the hands of everyone.
> Disney will use these systems, and they will use them to churn out more garbage, faster, on average, most kids will not be generating any movies in an afternoon.
Kids already make their own narratives in VRChat, Minecraft, and Roblox. The next systems will be a holy grail for them.
I'm interested in looking back on this thread in ten years. I think my predictions are correct.
> Why does anyone post anything at all anywhere ever?
I think you'll find the majority of the artwork community still favors pen and paper, pencil (maybe oils). They do this because, at least in part, they appreciate the challenge; it makes them better as people, in some fashion.
I use a tablet, sure; some even use 3D modeling. People don't create pencil drawings because they are good, or because they sell; they do it for personal reasons. I think you're assuming this is going to solve some problem, and it's really not - the people interested in content creation for content creation's sake are not likely to benefit.
> Nope. TikTok, YouTube, kids making Minecraft mods as a labor of love and hobby.
You're assuming that because content platforms like TikTok exist now, something similar must continue into the future... I see this as a flawed assumption. Regarding monetization, my point holds well here, though. For every "successful" YouTuber raking in millions, there are hundreds, thousands for whom it just didn't work out.
The content on these platforms is already often mundane, you don't see the promised plethora of content creators, because the system for the most part favors the few, not the many. Sure, everyone can upload, that doesn't mean most people do.
> If we went back in time just ten years and showed ourselves this stuff, we'd think it was magic.
I remember ten+ years ago, there was content on the web. Now, it's extremely centralized, most of it is not that good. So yeah I can time travel a bit, and things got worse. The reasons for this aren't all AI, but as trends go creating a program to do photoshop or video automatically isn't going to make things better. You're not solving the right problem here, and the people who stand to benefit are not the people you seem to think will.
> I think you'll find the majority of the artwork community still favors, pen and paper, pencil (maybe oils).
A field that has existed for centuries versus something in its larval stage that is still growing legs.
Calculators reached more than slide rule users.
> They do this because, at least in part, they appreciate the challenge, it makes them better as people, in some fashion.
I would love to do art, but life is full of opportunity cost. You can't do everything.
This technology will enable so many people to create art without having to devote thousands of hours to practice. That's a good thing!
I'm sure many artists used to say the same to artists that utilized tablets when they first came about. Or C programmers joking about early Perl and Javascript programmers.
It rhymes a bit with the Luddite argument.
We're seeing the emergence of new workflows. Few people hand wash their clothes anymore, and we're not worse for it.
> the people interested in content creation for content creations' sake are not likely to benefit.
I'm telling you that I would directly benefit. I love creating content, but I can't draw. I don't have the time to learn. These tools will empower me, and they'll only get better with time.
> For every "successful" Youtuber raking in millions, there are hundreds, thousands for whom it just didn't work out
There are more people making money creating content now than at any time in history. These tools and platforms have democratized access. People now have a path where there wasn't one before.
These platforms and tools don't, however, remove the requirement to be good at what you do and to find, understand, cater to, and grow your audience.
> Sure, everyone can upload, that doesn't mean most people do.
90% rule. There are still more creators today than in the past, and the field will only grow from here.
> Now, it's extremely centralized
That's orthogonal. I agree with you about platforms and centralization being a bit of a problem, but for what it's worth, the platforms do make it easier for most people.
> I'm telling you that I would directly benefit. I love creating content, but I can't draw.
This is, as you put it, orthogonal. The question is not whether some small set of individuals, such as yourself, stand to gain or not. That is in fact what I surmise: a very few will actually gain; on the whole this is not a boon.
Other orthogonal arguments could be made regarding other orthogonal avenues you have brought up, e.g. Javascript being good (terrible for the environment, in part responsible for a large amount of e-waste).
Bringing the term Luddite into the equation verges on the sensationalist....
Not being OK with inappropriate uses of technology is not synonymous with being anti-technology. Tools have purpose, but if you murder someone with a knife and then claim people are Luddites for decrying your actions, because progress and being able to cut things...
>These tools and platforms have democratized access.
Or at least have created the illusion of such.
> I don't have the time to learn.
No pain, no gain, as the saying goes. I don't think this will even be as useful to you as you think: you will, as you put it, still need to be good, but your unwillingness to put effort into something you otherwise claim to value indicates you are unlikely to benefit at all; your work is likely to be subpar compared to that of your peers.
It's OK to not be good at everything, and rely on others, that's the foundation of a society. But claiming you will benefit from something that a tool does is like claiming that you will become rich from hammers putting nails into wood - sure we benefit from not smashing the nails into wood ourselves, maybe society benefits from better constructed houses, even, but the hammer itself does not make you a carpenter or even a construction worker. It will not benefit you in that fashion.
Which is my point: the tool in question does not democratize making movies. I bring up relevant issues in the realm of current video distribution, and you want to claim they are orthogonal... I'm not sure you understand just how much it won't benefit you...
Another way to think about it: if something is a problem, e.g. "I cannot draw", think "maybe it's much worse than that", and you might begin to uncover the root causes of your issue. Acting prematurely, much like premature optimization, is the root of all evil, if you will. To bring it back to topic, if you create something like an image AI without properly examining what it is you are trying to accomplish, you are liable to create more harm than good. A lot of tech is like this, knee-jerk scratch-an-itch, and it shows. Not an Apple fan, but they are a good example in industry: they took the time to design well and the result is a more stable, more power-efficient system. Android, as much as I love the platform, suffers from a million knee-jerk decisions across industry, resulting in a much higher proportion of digital garbage, some virtual, some literal landfill.
I have some extremely detailed imaginary images and clips in my head that I just don't want to devote the thousands of hours it would require to become proficient enough in drawing and visual effects to create them.
DALL-E is starting to grant full commercial licenses. I think in the end both will converge: you will prompt something, the AI will make 100 prototypes, and you will improve the one you like with help from the AI.
The line will blur; it's not against the machine but with the machine.
The problem, maybe, is that it will probably take fewer people to do the same work.
The "fun" part: maybe it's better to be a construction worker than an artist, because the second will become obsolete for most cases.
"Boardroom full of attractive business people gathered around a laptop with one of them pointing at the screen, all wearing suits, whiteboard in the background"
This will also democratize the market for comics. It used to be you needed to be able to draw to be a comic. Now you can just have ideas, and use This Comic Does Not Exist (which does not yet exist) to generate the imagery.
That's actually a great example. Just think of the aggregated human-hours wasted to bring those people together, create that setting, photograph, edit, publish... All for a meaningless flyer or landing page.
To what extent are we ripping off the photographers? Weren't the models trained on their hard work?
Have we reached a point where we've bounded art within the data models are trained on?
Have we imposed a limit on ideas as a realm of "what came before" and implicitly decided that any "after" is a pointless exercise without knowing whether that's even true?
This argument is like saying you kill 100,000 people: fine, that's generally hurting a lot of people but no individual person, so no ethical problems m8
Bizarre comparison. Each individual in your example is killed. Having more killed doesn't make it better.
In the photographers example, each photographer is fractionally damaged, with the more photographers the less the damage.
Similarly, GPT-3 is ripping off the words of everyone who has ever written anything on the internet, but to such a minuscule degree that none of us could ever have cause for complaint.
Excellent questions, and I was thinking the same thing. In my opinion, AI-generated art or images are not as impressive as they might seem at first purely because there is no real imagination involved. It's an art simulacrum.
A more accurate title would be "This picture of food does not exist."
For a generative model to hit the quality threshold you see on this site, it's going to need so much training data that you can presume it already scraped a ton of stock photos matching that description from the internet.