Good to see LucidRains get the love he (rightly) deserves. He's a beast!
As a thank you to him -- he also does work for commission/etc; check his GitHub page for more info. I'm not fiscally or otherwise directly linked to him, I've just hung around a while and think he deserves far more credit than he gets. This is literally the smallest piece of the pie of what the man does across several subdisciplines, so send him a thank-you if possible!
Two reasons:
1) Even though it's all technically very impressive, so far there's not a huge amount of commercialization potential here. OpenAI is charging for its GPT-3 model, but its revenue is probably negligible next to the hardware costs (sunk + ongoing) of training it in the first place, let alone the researcher salaries they're paying.
2) Most of the stunning examples are cherry-picked. These things fail much more often than they're willing to admit, and they're probably assuming (correctly) that not enough people are willing to pay for something that only sorta-kinda works 1/3 of the time, and only when you're holding it the right way.
I'm currently working full-time on the AI-powered design suite Accomplice (https://accomplice.ai), and if you asked me on a good day I would tell you I do think there's already huge commercial potential. On a bad day, though ;)
My current approach is a "model marketplace" (https://accomplice.ai/models) where the most popular open source text-to-image models (VQGAN+CLIP, Disco Diffusion, DALL-E Mega coming soon…) sit alongside the most popular open source style transfer models, and then finally users have the ability to finetune their own models using a simple drag-and-drop tool (https://accomplice.ai/no-code-model-training).
So theoretically if there were a searchable marketplace of 100s of different finetuned models people could choose from, they would use it much like an iStockPhoto and be able to create the kind of images they want instead of just downloading them.
But it's of course a constant work in progress. Slowly growing though and lots of promising stuff ahead!
I meant commercialization potential for companies like Google, where anything less than a hundred million is probably a failure :) Hopefully for the non-Googles of the world (i.e. you), there's a good pathway forward!
I wanted to try your site, but after clicking on the link in the email, it tries to send me through some redirectingat.com link which is blocked by my ad blocker.
Update: Ah, found the "Tracking" setting in Sendgrid that I thought I had already turned off. Off for sure now. Thanks again for the heads up!
The confirmation link should only be going to accomplice.ai unless Sendgrid is doing some link tracking that I've just forgotten about. Could you forward that email to adam at accomplice dot ai if you get a chance? Thanks for letting me know!
That will get turned back on automatically in an "update".
Same thing happened to me multiple times across multiple platforms: SendGrid, Mandrill, MailJet, MailGun. I always turn off the tracking (enabled by default on all of them), but magically it's back on a few weeks/months later. I've given up finding a solution and just revisit my settings every few months to check on it.
Most of the blue eyed ones have the Eyes of Ibad from Dune. (And several of the ones that aren't blue eyed also have weird things going on in the sclerae.)
The two “african american” ones look South or maybe Southeast Asian (and the one of those that is a “young...girl” looks like an adult, possibly a young one.)
All the ones without a racial/ethnic prompt are white, and disproportionately blue eyed (again, including sclerae.)
(It says “diverse”, and yet all of the examples read as white or Asian, though the unlabeled darker-skinned male figure in the group of six at the top is ambiguous enough to plausibly be something else.)
The “beautiful woman with curly red hair” has rather radical facial asymmetry, and straight to slightly wavy hair.
Why would people do this when aspiring artists are practically giving away their real photos and paintings for free on places like DeviantArt? Why further commoditize something that's already been commoditized to practically free?
I'm an artist, and I'd absolutely love to use something like this to inspire me or to give me something to continue working with on my own.
In one sense it's kind of like a much "smarter" photoshop filter, where it can make your own art/photos look more like what you want (ex: Van Gogh, Dali, Picasso, or combinations of those, or something completely weird/new/different).
You could also train the models on your own work and have it generate art in your own style that could inspire you or could be useful to you either as a base to work from or that you could take interesting elements from to create new art.
Similar things can be done in music, by the way, and that would be really useful to musicians too.
Poets could use something like this to create poetry, novelists to write novels, etc.
This is really an improvement on the collaboration potential between humans and computers -- which is probably why it's called "Accomplice".
Etc etc. AI can make all this stuff easier. And you have a sense of ownership over what you create. All in one place where you can collaborate on all of it with your team. I feel like that's valuable. It's certainly a tool I've always wanted.
But, also, as a bit of an aside – if the Googles and OpenAIs of the world are just going to bite every artist's style anyway with a mostly black box service and training set… it feels like the option for an artist to train/finetune their own model, promote it and possibly make money off of that is worth trying.
Seems exactly false. DALL-E 2 looks set to end much of the illustrator industry, and if the endless array of Twitter posts from early adopters is any indication, it works great.
- being open is kind of just how things in ML generally work right now, it's in stark contrast to things like chemistry or physics where paywalls are pretty common
- it's a matter of clout, ML is moving ridiculously quickly, with work from just 5 years ago being considered outdated in terms of capability, if you don't publish, someone else will and they'll get the credit. This likely also matters for the researchers since they get credit too. In a sense this is just publish or perish culture from academia.
- it's also somewhat about hiring, which is related to the clout. By putting out this kind of research, they're attracting talented engineers to consider working for them. This of course is pretty relevant to the rest of their business, especially given how heavily Google leans on AI to handle moderation.
I think it is because the proprietary part for them is the data, not the particular algorithm. They benefit more from other people making advances on their technology because they have the data to get more benefit than anyone else. If they kept it to themselves, they would get no "free" advancement. So they trade-off the secret of the technique in the hope that others will advance the technique, making their data more valuable.
What would the product be? My experience is that most ML papers are very brittle, for every cool result/example you see there's a plethora of nonsense spit by the models.
You're saying it's Google who have done this research. In a way that's true. But really it is Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet and Mohammad Norouzi who did it, with material support from Google.
It's likely that some or all of these people would have refused to do the work they do if Google kept it all as their secret sauce.
And moreover, there are excellent reasons why they wouldn't want to. It's not just the obvious that if it all were secret, they wouldn't be able to use it in their non-Google career advancement. It's also that research without the freedom to talk is far more difficult and frustrating.
On paper, scientific papers are supposed to document the whole of the discovery/innovation. So you might think that an insider, who got to read all the secret Google research papers AND all the public ones, would have an advantage. But the problem is, even the best-written papers with full code and comments inevitably leave out things, especially of the "why this and not that" type.
If you're a researcher in the free world, you can just ask. Especially if you have a public track record of great papers yourself, they will WANT to talk to you. You can learn so much more from the interactive process of back and forth questions than you can from a static piece of information like a scientific paper.
If you work for a secretive and command-driven organization, you need to be careful about what you reveal of your own research when you ask. You can't talk freely. The thought of having to justify your communication to some old-school IBM lawyer type is going to chill even the most enthusiastic researcher. It's easier to just stay in your own corporate bubble and focus on the things your corporation does well, since at least you can talk freely to your colleagues (although in really paranoid organizations like the NSA or old IBM, even that may not be true). But then at best you specialize, at worst you fall behind.
Google started publishing for several reasons but the primary one was recruitment (showing off was a secondary goal). The mapreduce, GFS, and bigtable papers played an important role in attracting an early generation of distributed computing/high performance computing people from around the valley and CMU/MIT, who helped build the second really successful versions of the web search engine (retrieval and ranking), ads serving (the auction, the logs joining pipeline), etc.
The other reason is that the leaders at Google at the time believed that we would achieve the singularity faster if Jeff Dean periodically sent ideas back 10 years in time to Doug Cutting.
There is a dataset of 5 billion image-text pairs (LAION-5B) scraped by various parties. This can then be filtered and used to train these models. Cost is a bit of an issue, but there are orgs that have provided compute for open model training. And Imagen is nice because the text encoder part is already available and doesn't need more training, so it would just be the diffusion model components being trained. I'd guess we'll see a biggish training run starting in a few weeks.
Four or five figures I'd guess? I'm not clued up on costs/performance for TPU stuff to give a better estimate, but guessing at a week on a 256 TPU pod, call it $30k?
You are off by an order of magnitude at least.
256 TPU-v4 chips (not a full pod) would cost you around $20k/day.
They actually used 512 TPUs (256 for the base model + 128 for each of the two super-resolution models).
Assuming an average training time of 1 week as you said, that gives us about $280k.
It was also most likely trained for longer than a week; the base model for DALL-E 2 was trained for 100-200k GPU-hours, so 2-4x longer than that, and we can guess this is roughly similar.
You also never successfully train everything on the first try, so all in all, to replicate this work just from the paper, we are talking about at least $500k.
While that is what they did, they also used a batch size of 2048 while training. This is just to speed training up, not a hard requirement. It's easy for Google to justify more money on compute to save engineer iteration loops.
I'll have to read the paper for more details, but it would almost certainly cost less (and take longer) to train a model like this in a more resource-constrained situation than Google faces.
Increasing batch size does not increase the cost of your training. The opposite, actually: with a bigger batch size (to an extent), models tend to converge slightly faster, so you need fewer GPU-hours.
As for the rest, training for 1h with a batch size of 2048 on 8 TPUs, or training for 8h with a batch size of 256 on 1 TPU, has exactly the same cost; the cost is just spread over a longer time.
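The arithmetic in this subthread can be sketched out; note that the per-chip rate and durations below are the rough assumptions from the comments above, not official numbers:

```python
# Back-of-envelope training cost using the figures assumed in this thread:
# ~512 TPU-v4 chips (256 base + 2 x 128 super-resolution) and a guessed
# hourly rate derived from "~$20k/day for 256 chips". All inputs are rough.

def training_cost(chips: int, usd_per_chip_hour: float, hours: float) -> float:
    """Total cost = chips * rate * time; batch size doesn't appear at all."""
    return chips * usd_per_chip_hour * hours

# $20k/day for 256 chips works out to about $3.26 per chip-hour.
rate = 20_000 / (256 * 24)

one_week = 7 * 24
print(f"512 chips, 1 week: ${training_cost(512, rate, one_week):,.0f}")  # $280,000

# The batch-size point: total chip-hours are invariant to how you split them.
a = training_cost(10, rate, 1)   # 1 hour on 10 chips
b = training_cost(1, rate, 10)   # 10 hours on 1 chip
assert a == b
```

The $280k figure quoted above falls straight out of these assumptions; the "at least $500k" estimate then accounts for longer runs and failed attempts.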
Couldn't a bunch of us shell out $5000~$50,000 and do this ourselves? Create a non-profit shell corporation outside US jurisdiction, issue shares, raise funds and open source the result?
The shares would simply be votes towards future training dataset endeavors as no profit would be booked here. Say you buy 5000 out of 500,000 shares, that would give you 1% voting power in what dataset to train.
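The share math here is straightforward (the numbers are the hypothetical ones from the comment):

```python
def voting_power(shares_owned: int, shares_outstanding: int) -> float:
    """Fraction of dataset votes held, as a percentage."""
    return 100 * shares_owned / shares_outstanding

# Buying 5,000 of 500,000 shares:
print(voting_power(5_000, 500_000))  # 1.0, i.e. 1% voting power
```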
Are there prefiltered derivatives of Laion-5B available? I can imagine various contraindicated categories you might want to avoid entirely, as well as biases you might want to adjust for by balancing classes in the data (5 billion images gives you a lot of room to balance the dataset).
I feel like the fact that big porn hasn't poached talent and jumped all over this suggests at least tens of millions. That said, some for-profit, no-rules deepfake service for disinformation and illegal content has to be in the works.
There's a company in Montreal that makes that in a month and also has access to copious amounts of said data on their servers. It may or may not be that they already have engineers on it. We have no way of knowing since it's a private company.
In fact, this kind of reverses things, doesn't it?
Open source is built on the assumption that you can do more with source code than with binaries. In the case of AI models, the computed weights of models are what's valuable, and the source code used to achieve them is less useful.
> DALL·E would cost $131,604 to train on AWS, assuming a p3.16x-large at market rates. Could be as low as $40k if you already paid for reserved instances.
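As a sanity check on those figures -- assuming a p3.16xlarge on-demand rate of about $24.48/hr (us-east-1 pricing at the time; this rate is an assumption and varies by region):

```python
# Rough check of the quoted DALL-E training cost on a p3.16xlarge (8x V100).
on_demand_rate = 24.48   # assumed $/hr, on-demand
quoted_cost = 131_604

hours = quoted_cost / on_demand_rate
print(f"{hours:,.0f} instance-hours, ~{hours / 24:,.0f} days on one instance")

# At a hypothetical effective reserved rate of ~$7.50/hr, the same number of
# hours lands near the "$40k with reserved instances" figure quoted above.
print(f"reserved estimate: ${hours * 7.50:,.0f}")
```

So the two quoted numbers are mutually consistent: roughly 5,400 instance-hours, with the spread coming entirely from the hourly rate.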
I did the pip installs and installed Cuda.
I changed prompt and ran the sample code. It ran to completion. How do I save the image from trainer.sample?
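In the lucidrains repos, `trainer.sample` typically returns a float tensor of shape (batch, channels, height, width) with values in [0, 1] -- that layout is an assumption, so check your version. Assuming so, a minimal sketch for saving it with PIL (`torchvision.utils.save_image(images, 'sample.png')` is the usual one-liner alternative):

```python
import numpy as np
from PIL import Image

def save_sample(images, path_prefix="sample"):
    """Save a (batch, channels, H, W) float array in [0, 1] as PNG files.

    For a torch tensor, pass images.detach().cpu().numpy() first.
    """
    arr = np.clip(np.asarray(images), 0.0, 1.0)
    for i, img in enumerate(arr):
        # (C, H, W) -> (H, W, C), then scale to 8-bit
        img = (img.transpose(1, 2, 0) * 255).astype(np.uint8)
        Image.fromarray(img).save(f"{path_prefix}_{i}.png")

# e.g. save_sample(trainer.sample(batch_size=4).detach().cpu().numpy())
```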
Couldn't you just scrape porn to get a copious amount of data? The scope would be narrower and thus require less classification, and in general it has common themes.
I'm more concerned about how expensive it will be to train it on GPU instances. We are looking at A6000s, right? That's like $15/hr.
You do realize web scraping has gotten very cheap and easy at scale now. It's an afterthought for me; I'm more concerned with the economics of training, it can't be cheap.
You don't realize how hard it is to make an inclusive, cleaned-up dataset. Take a look at the Notion page from BigScience detailing their workgroups. Three of them are related to preparing the dataset.
In my experience, scraping the data is the easy part. Once you've scraped it you've got to get rid of all the garbage, which is where the issues arise, especially if you're just blindly scraping everything you can find.
For example, in a generative model I'm working on, I have a dataset consisting of ~5M images just blindly scraped from a website. After filtering, this drops down to ~500k images, yet a model trained on that performs worse than one trained on a set of 30k curated images (picked based on a manually evaluated list of "known good" artists), where filtering brings it down to ~18k images. The larger dataset, while containing more information, also contains more errors, many of them pretty hard to filter out.
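A minimal sketch of the first pass of the kind of filtering described above, using only the standard library: exact-duplicate removal by content hash plus a simple file-size floor. (Real pipelines add perceptual hashing for near-duplicates, NSFW/quality classifiers, and caption checks; those are where the hard part lives.)

```python
import hashlib
from pathlib import Path

def filter_scraped(image_dir: str, min_bytes: int = 10_000):
    """Drop exact duplicates and tiny files from a blindly scraped folder.

    Returns the surviving paths. This is only a first pass; near-duplicate
    and content filtering need perceptual hashes / learned classifiers.
    """
    seen, kept = set(), []
    for path in sorted(Path(image_dir).glob("*")):
        data = path.read_bytes()
        if len(data) < min_bytes:        # thumbnails, error pages, stubs
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:               # exact re-uploads
            continue
        seen.add(digest)
        kept.append(path)
    return kept
```

Even this trivial pass typically removes a surprising fraction of a blind scrape, which is consistent with the 5M-to-500k drop described above.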
I was writing up a whole explanation of how that couldn't possibly work, but tbh you aren't really wrong about that; after all, captchas these days do similar work.
I guess the primary concern with a porn model would be the ethics of it, which might turn off any company from helping out on training resources (for example, Google's TPU Research Cloud requires you to follow their code of ethics on AI, which would be very difficult to do with a topic as sensitive to people as porn).
I'm not someone who thinks sex work is inherently exploitative, but in practice it usually is, and many of the efforts to stamp it out just exacerbate the power imbalances that lead to exploitation.
We're seeing a lot of garden variety capitalist exploitation happening with ordinary mainstream training data production and labeling work, so I would expect anything adult-adjacent to be correspondingly worse, even though it doesn't have to be.
You are going to need to figure out how to filter out copyrighted works and images for which you don't have permission (eg, someone uploading a picture of an ex partner).
If it's in the public domain, there is no implicit right to privacy. It's going to be super difficult to claim something is a derivative of copyrighted material going forward.
Training an AI requires high quality, clean/normalized data, which is very difficult to do at millions/billions of data points and is very frequently done incorrectly with silent failures.
There's a section of the internet where you can easily find billions of images, or they can be generated from moving pictures. Even upscaled. I really didn't expect to have to spell it out.
hint: they are all about one thing, and there are a lot of eager volunteers to help on those websites. It would be easy to "normalize/clean/classify" them because the pictures would have a consistent theme, thus reducing the number of parameters needed.
We are not trying to generate elephants getting railed on a SpaceX rocket flying in oil painting style (although I'm sure there are people into that, and it's not my place to judge); we are just trying to remove the human cost from this necessary evil.
I can't believe nobody is investing in "DALL-E-2-4-PORN". This sounds like an X amount of money thrown at a hugely sticky product that can be iterated (with the current trend in hardware) to the point where it literally generates billions of dollars in revenues for ages to come (no pun intended).
Does anyone actually want porn of fake, AI generated “people”? Seems like most of the demand would be deepfakes of real people, which is both highly unethical and a good way to get your business sued out of existence.
furry porn? vore? transformation? selfcest? there's a ton of niche fetishes that aren't really doable IRL, and only exist as art. generated images would be a game changer.
There’s also seemingly unlimited content of real people produced consensually by those people. What can DALL-E generate that you can’t find on Pornhub or OnlyFans?
Already does. It works by magic, we put images on one side and text on the other side and then power it up. Let it simmer for a few months at megawatt power levels.
Sure, you could use images you do not have the legal rights to if you do not release anything at the end / just leak the final result. But this is a lot of effort and resources invested for something that would effectively never be shared on the clear web.
Lol. Not the same thing obviously but your question just reminded me of @robotpornaddict, a neural net that watches porn and tries to describe what it sees.
They specifically mention in the Images documentation that one reason they haven't publicized this yet is due to porn or deep fakes risk. Assuming this works very well already.
To run we must learn to walk first. To walk we must learn to be erect. To be erect we must master crawling.
I don't think the leap is too crazy if we are talking short moving pictures without sound. However, when sound gets involved, this is where it would become very tricky.
Human faces were excluded on purpose from the DALL-E 2 training set in order to prevent misuse. I suppose the same will be the case here (or at least, in public-facing versions of the models).
Given how it renders dog faces, I don't see why it wouldn't be good at human faces too, if trained for it.
While the "it will never do as well as humans at this one thing I feel strongly about" bias is still a common one, even on hn, at this point I am fairly certain that we will soon all have to live with the fact that we are no longer all that special in regards to, well, absolutely everything.
One of my more interesting realisations over this development is that being human is apparently a religion to a lot of otherwise secular humans.
"we are no longer all that special in regards to, well, absolutely everything"
If/when general AI comes about, maybe so.
Until then all we've got are a bunch of highly specialized tools/helpers/slaves that may be good (in some sense) at one thing and awful at pretty much everything else.
You could argue that humans themselves are an ensemble of such highly specialized parts that are more than the sum of their parts in some ineffable way that's more of a "we know it when we see it" than something that's formalizable. Machines/computers lack that... so far.
Korea. Hardcore pornography is also illegal there.
Keep in mind this is a country where, if you leave a bad review after you get scammed, even with evidence, it's defamation. So not quite the leadership the world needs in this industry.
Really sad to see people on HN flagging all of my comments on this thread. I mean, it's not like you can't find celebrity deepfakes, including Kpop.
The cat is out of the bag, and it's only going to get better and faster from here, whether some cultures/jurisdictions take offense or not.