One thing I've not seen mentioned or asked about DALL-E 2, and which I'm really intrigued by, is what hardware it typically runs on and how long image generation takes. Maybe I've somehow missed it, but how long does it take on average to generate the eight images from a user's text input? I'm guessing it's really quick, but I honestly have no idea... How does this system scale? What's the typical cost per image generated?
In case anyone else is put off by the link referencing an answer that then links to something else with most likely higher hardware requirements that are not stated, the end of the rabbit hole seems to be here: https://github.com/openai/dalle-2-preview/issues/6#issuecomm...
TL;DR: A single NVIDIA A100 is most likely sufficient; with a lot of optimization and stepwise execution, a single 3090 Ti might also be within the realm of possibility.
How feasible is it to evaluate the quality of chosen parameters/settings without going through a full training run? I mean, <BIGNUM> GPU hours only to discover your results are meh, jesus.
(Small timer with absolutely no idea of practical ML here)
Also very interested in this. AFAIK, the best alternative to DALL-E-style generation is CLIP-guided generation (such as Disco Diffusion [1] and MidJourney [2]), which can take anywhere from 1 to 20 minutes on an RTX A5000.
I seem to remember once they finally found a way to wake up in the real alien world it was a bit different than the group had imagined it, which was a nice touch.
Certainly one of the creepier TNG episodes even if it was a one off meant to cash in on the popularity of alien abduction in media at the time.
The overlap between the hardware requirements for model training and web3 mining has me thinking: could there be a coin where the proof of work is training a model?
Unfortunately ML model training is too easy to 'cheat' at - i.e. look at the test data, fine-tune the model to overfit heavily, do really well at test time, but actually be useless for real-world stuff.
To work out, you'd need a way to keep some test data secret, which in turn kinda kills decentralisation.
You could do something like kaggle does, where you have some test data but the final score is determined by a more complete test data that you don't have access to.
AFAIK it works like this: you have test data you develop against and some secret (bigger) test data that you only get access to for the final score. While you are developing you can overfit if you want, but then you probably won't perform that well on the secret tests. What you are meant to do is perform well on the test data without overfitting. Even if it is not optimal, it mostly solves the overfitting problem. It might work for a cryptocurrency too.
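To make the mechanism concrete, here's a minimal toy sketch of that split in Python (all data and names are made up): the submitter tunes against the public holdout, but only the secret holdout score counts.

```python
# Toy sketch of a Kaggle-style evaluation split: a public holdout you can
# tune against, and a secret holdout that decides the final score.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

# 60% train, 20% public holdout, 20% secret holdout (kept off the submitter's machine)
train_idx, public_idx, secret_idx = np.split(rng.permutation(1000), [600, 800])

def accuracy(weights, idx):
    preds = (X[idx] @ weights > 0).astype(int)
    return (preds == y[idx]).mean()

# fit a simple linear "model" on the training split only
weights = np.linalg.lstsq(X[train_idx], 2 * y[train_idx] - 1, rcond=None)[0]

print("public score:", accuracy(weights, public_idx))   # what the submitter sees
print("secret score:", accuracy(weights, secret_idx))   # what actually decides
```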
You don't need to test that they perform well, just that they perform the same (for algorithms that should be bit for bit identical) / similarly (for ones that are less so). If multiple people train the same thing and the results are the same you could trust they have faithfully run the training as asked. That rewards "train as asked" rather than "train and get a good result".
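A minimal sketch of that "same result" check, assuming a fully deterministic training recipe (fixed seed, fixed data order, CPU-only here); the model and recipe are just placeholders, not any particular coin's design:

```python
# Verify "trained as asked" by checking that independent runs of the same
# deterministic recipe produce byte-identical weights.
import hashlib
import torch
import torch.nn as nn

def train_deterministically(seed: int = 0) -> nn.Module:
    torch.manual_seed(seed)                      # fixes model init and data
    model = nn.Linear(16, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    data = torch.randn(64, 16)
    target = torch.randn(64, 1)
    for _ in range(100):
        opt.zero_grad()
        loss = ((model(data) - target) ** 2).mean()
        loss.backward()
        opt.step()
    return model

def weights_fingerprint(model: nn.Module) -> str:
    blob = b"".join(p.detach().numpy().tobytes() for p in model.parameters())
    return hashlib.sha256(blob).hexdigest()

# Two "miners" running the same recipe should report the same fingerprint.
print(weights_fingerprint(train_deterministically()))
print(weights_fingerprint(train_deterministically()))
```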
Lack of trust there can be addressed with a few other things like staking, and a broad desire for all miners that people trust the system as a whole. Quite how to design those things is a complex problem but not insurmountable I think.
Disclaimer, this kind of decentralised more useful kind of work is something I'm investigating now.
Many coins are not really decentralized; there is some centralization anyway, as we recently saw with the Ronin bridge attack and many examples before that.
I don't think there's much overlap. Crypto stuff is all integer maths as far as I know (maybe there are some weird floating point coins?) but ML is all low precision floating point. ML training hardware has a ton of powerful FPUs but much less integer compute.
Interesting idea for proof of work: a large nonlinear function where you have to find weights so that f(x_0) = x_1 < x_0, or something like that, to exploit ML advances.
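If I'm reading that right, a toy version of the puzzle might look like this (entirely made up, just to illustrate the shape of "find weights so the function's output drops below the challenge"):

```python
# Toy "proof of work by weight search": given a challenge value x0 derived
# from a block header, search for weights of a small fixed nonlinear function
# whose output falls below a target fraction of x0. Purely illustrative.
import numpy as np

def f(weights, x0):
    # small fixed nonlinear function of the candidate weights and the challenge
    return float(np.abs(np.tanh(weights).sum()) * x0)

def solve(x0, target, seed):
    rng = np.random.default_rng(seed)
    for _ in range(100_000):
        w = rng.normal(size=8)
        if f(w, x0) < target:          # the "f(x_0) = x_1 < x_0" style condition
            return w
    return None

x0 = 1.0                               # pretend this comes from the block hash
w = solve(x0, target=0.01 * x0, seed=42)
print("solution found" if w is not None else "no solution", w)
```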
I meant the hardware, not the math. While there are new GPUs which disable mining, and there are GPUs and ASICs designed for mining, the fact is that for the last 5 years the GPU market has been hogged by mining.
Potentially, I was recently introduced to this: https://bws.xyz/
Not affiliated and haven't researched yet but looks interesting. There are a few other crypto mining companies offering general compute to manage changing power/price/hardware resources.
There's been some discussion below on OpenAI not being open. Contrary to most arguments, I have to share that I like their current approach. We can all easily imagine the sort of negative things that could come out of the misuse of their models, and it is their responsibility to ensure it drips into public domain rather than just putting it out there. It also allows them to more carefully consider any implications that they might have missed during development. And even when you do have access to the models, at least in my experience they make sure to properly caveat some uses even during runtime (e.g., in their playground for the DaVinci model). So while I'm not associated with OpenAI in any way, I would like to go out and say I broadly support their "closedness" for now, and I hope we can all get to a place in the future where the release of high-performance models does not need to be so carefully considered, because people would be more reasonable overall.
> We can all easily imagine the sort of negative things that could come out of the misuse of their models, and it is their responsibility to ensure it drips into public domain rather than just putting it out there.
Can we? I'm not sure about that - one issue being that they're reproducible, so other people can recreate them without constraints. It turns out not to be that hard.
I think a better explanation is there’s a bunch of spare “AI ethicists” hanging around because they read too much material from effective altruism cultists and decided they’re going to stop the world conquering AGI. But as OpenAI work doesn’t actually produce AGI, the best they can do is make up random things that sound ethical like “don’t put faces in the training output” and do that.
Btw, their GPT3 playground happily produces incredibly offensive/racist material in the same friendly voice InstructGPT uses for everything else, but there they decided to add a “this text looks kind of offensive, sorry” warning instead of not training it in at all. (And to be clear I think that's the right decision, but man some of the text is surprising when you get it.)
Derailing a little here but I have to say I'm not a fan of implying this kind of 1:1 link between having an interest or participating in effective altruism and the AI ethics scene. The Venn diagram is not a circle, and the overlapping part may well be a loud minority. I get that "cultists" may be talking about just that minority but it doesn't much read like that.
Well, GiveWell and GiveDirectly are good, I've given to them.
But every time I meet someone who talks about this/has related keywords in their twitter profile/is in a Vox explainer, they always have strange ideas about existential risks and how we're going to be enslaved by evil computers or hit by asteroids or reincarnated by future robot torture gods. I think as soon as you decide you're going to be "effective" using objective rational calculations, you're just opening yourself to adversarial attacks where someone tells you a story about an evil Russell's teapot and you assign it infinite badness points so now you need to spend infinite effort on it even though it's silly. You need a meta-rational step where you just use your brain normally, decide it's silly and ignore it without doing math.
> It also allows them to more carefully consider any implications that they might have missed during development.
Do you have some proof of this? That they follow a thoughtfully considered ethical framework for each project? Because I never see that addressed, at all…
This is a weird take in a thread whose very topic is an open source implementation of the model, less than a full month after it originally came out. So obviously an evil mastermind can already copy them, and anyone strapped for cash for the hardware can hire people on Fiverr to do the same thing.
I think the primary reason for the closedness is that it hypes up the research and makes it look more significant than it is. Same reason Palantir is allegedly happy about negative news coverage, it just makes them seem like James Bond villains.
Well, you most likely need a CUDA-compatible GPU with a non-negligible amount of VRAM to load the model. And a few hours of your time to mash your head against the keyboard in frustration because of the inevitable dependencies that won't resolve properly.
Most "open source" models don't actully upload the final models either, just the code they used to train them (not sure if this is the case here), so the next step after that is downloading 100G of data and running your workstation for a few days to train the actual model first.
I recall at one point I attempted to run BERT or GPT-J or one of those lighter language models locally, and it was going well until I realized I needed 24G of VRAM to even load the model, hah.
Do CUDA AI projects not containerize well? I understand that AI projects often have many, many dependencies, but this seems like something containers ("docker") would be ideally suited for. Something like `docker pull dall-e` and `docker run -it --name dalle dall-e`, probably with a local folder volume-mounted in? I'm curious what the hurdles are here for packaging something like this.
They do containerize, but it's not enough. You still need the right drivers and CUDA stack installed on the Docker host, which can be finicky to set up. Not to mention figuring out how to get Docker to actually pass the GPU through to the container.
I see this has not gotten any better in the last 5 years. For future reference, if you as a dev are interacting with a product that is this hard to use and there is no other option but to use it (CUDA), you should buy their stock.
I think OpenCL support has been getting better, and it may be possible to run a lot of models with it, but that just doubles the already frustrating number of hours spent trying to set the damn thing up.
> I'm curious what the hurdles are here for packaging something like this.
No hurdle, it's just that Docker is not part of the standard toolkit of ML people.
ML people all use conda (or virtualenv etc.) which already solves most of the dependency problems, making learning Docker not especially appealing.
But virtually all training/inference platforms (including the ones used by OpenAI) use Docker. It's not a technical limitation.
That's not the issue. You can already download weights of other models through BitTorrent or from large archive sites like Hugging Face or the-eye. OpenAI just doesn't want to release them, so that they can sell access to it through their cloud service.
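For what it's worth, once weights are up on Hugging Face, fetching them is only a few lines. The repo and file names here are hypothetical placeholders, not an actual DALL-E 2 release:

```python
# Sketch of pulling released weights from the Hugging Face Hub and loading
# them on CPU. repo_id and filename are hypothetical placeholders.
from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download(
    repo_id="some-org/some-dalle-like-model",  # hypothetical repo
    filename="model.pt",                       # hypothetical checkpoint file
)
state_dict = torch.load(checkpoint_path, map_location="cpu")
print(f"loaded {len(state_dict)} tensors from {checkpoint_path}")
```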
If you have a decent amount of VRAM, you can use it to start generating images with their pre-trained models. They're nowhere near as impressive as DALL-E 2, but they're still pretty damn cool. I don't know what the exact memory requirements are, but I've gotten it to run on a 1080 Ti with 11 GB.
EDIT: I also tried a 980 with 4 GB of VRAM a while back, but that failed... so you probably need more than that.
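If you want a rough up-front sanity check before downloading anything, something like this (PyTorch, weights-only estimate, so the real requirement with activations is higher) gives a ballpark:

```python
# Crude feasibility check: compare the card's total memory against what the
# weights alone need (4 bytes per fp32 parameter). Activations, optimizer
# state, and framework overhead all add to this, so treat it as a lower bound.
import torch

def weights_fit(n_params, device_index=0, bytes_per_param=4):
    props = torch.cuda.get_device_properties(device_index)
    need = n_params * bytes_per_param
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB total, "
          f"weights alone need ~{need / 2**30:.1f} GiB")
    return need < props.total_memory

if torch.cuda.is_available():
    weights_fit(1_000_000_000)   # e.g. a hypothetical 1B-parameter model
```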
This uses several models. One of them is already trained and released, which you can use as-is or train yourself. The other models need training. Training means having a labelled dataset and letting the model run for many iterations until the loss function reaches a good enough value to make the model usable. This code is quite complex, as it is not one model but several, and you can also make changes to generate bigger images and to use better models for some parts. Once you have everything trained (you might need a GPU or a cluster of GPUs to train it for several days/weeks/months), you can use it.
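In PyTorch terms, "let the model run many iterations until the loss reaches a good value" is essentially this loop (tiny stand-in model and random data, purely illustrative):

```python
# Minimal sketch of a supervised training loop: iterate over a labelled
# dataset, compute a loss, backpropagate, and step the optimizer until
# the loss is acceptable. Real runs do this for days on real data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(512, 32)              # stand-in for a labelled dataset
labels = torch.randint(0, 10, (512,))

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.3f}")
```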
I know this won't help you very much. But either you are willing to spend a lot of time messing around with the code and have access to good hardware to train the models, or you will have to wait until someone releases something already trained.
> Once you have everything trained (you might need a GPU or a cluster of GPUs to train it for several days/weeks/months), you can use it.
It takes a whole team to replicate an advanced model. Take a look at our open source working groups: Eleuther, LAION, BigScience. They worked for months on a single release and burned millions of dollars of GPU time (graciously donated by well-meaning sponsors).
The readme says they're working on a command line interface (CLI). You might want to wait for that if you don't want to learn PyTorch.
Once that's done, it might be as simple as: install the package, find a training dataset, run the training CLI (for days or weeks or more), and then run the image-generation CLI. Or even: just download an already-trained model and use the image-generation CLI immediately.
Is there a reason why the pretrained models are not included as well? Or perhaps this is coming at some point in the future? There are many users who simply want to play around with the functionality, who lack the time and hardware required to replicate the training steps.
As someone who has trained transformers with only ~5 layers on 1000 data point inputs for a decoder of length like 10 (so far fewer than the 12 billion parameters of DALL-E [1]), I learned two things:
1. Training is very finicky and time-consuming.
2. PyTorch model files are pretty large.
For point 1, when I was training a different autoencoder, for example, I would run it for a week on 4 GPUs, only to return and see that its results were subpar. For point 2, I would get 100 MB model files for a minuscule transformer (relative to DALL-E numbers). Combined with the strong dependence of the transformer on the training data, this can make it problematic to share pre-trained models for tasks the user doesn't specify a priori.
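The checkpoint sizes follow almost directly from the parameter count, at roughly 4 bytes per fp32 parameter. A back-of-the-envelope sketch with illustrative round numbers (mine, not DALL-E's exact ones):

```python
# Back-of-the-envelope checkpoint sizes: ~4 bytes per fp32 parameter
# (optimizer state, if saved, multiplies this further). Parameter counts
# below are illustrative round numbers.
def checkpoint_gib(n_params, bytes_per_param=4):
    return n_params * bytes_per_param / 2**30

for name, n in [("tiny transformer", 25_000_000),
                ("GPT-2 scale", 1_500_000_000),
                ("DALL-E 1 scale", 12_000_000_000)]:
    print(f"{name}: ~{checkpoint_gib(n):.1f} GiB on disk (fp32 weights only)")
```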
Large models generally require beefier GPUs and more RAM. Wide (not just deep) models require more GPU cores, while total model size drives the RAM requirement. Some models require multiple GPUs to train and/or run. Some require non-consumer server GPUs. These are non-trivial hardware investments if purchased, and non-trivial cloud bills if rented, even at preemptible spot pricing.
The main issue is that what neural nets are doing is just very fancy heuristic compression of vast amounts of data into a few gigs of vram (and a useful way to access it), so the smaller the model is the less it'll be able to do.
Most of the pretrained models I see for the stuff I'm working on (Coqui, ESRGAN, etc.) are really tiny in size. I'm sure the much better models are bigger but still fit under 8 GB of VRAM. It seems like there could be a market for models that fit into 8 GB of VRAM, but I'm not familiar enough with users of ML to say for sure.
Agreed. This feels like the next big step in open source machine learning. I would love to be able to dedicate CPU/GPU time towards churning out open-source GPT-3 / DALL-E 2 clones, so long as the trained weights are shared with the world afterwards.
That feels more long-term productive to me than crunching coins.
For what it's worth, the architecture for seti@home doesn't make much sense any more in the modern era for most types of data processing. It's simply too slow and error-prone to transfer large amounts of data between user machines. Modern radio telescope data analysis happens at datacenters with a lot of GPUs in them, just like modern AI model training.
No, I just mean you design your system expecting the data from machine 1462 to get analyzed at some point, but that machine belongs to some random person who uninstalled your software without you knowing about it, so you never get back an analysis. Or you have bugs that come from 100 different versions of your software running, because you can't simply upgrade the software on every machine the way you could in a datacenter.
With no discount, one A100 is about $3/hour, so we are talking about roughly $600K for a full training run. Probably closer to $1M, since in practice you never train successfully to the end on the first shot.
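For what it's worth, the arithmetic behind that (the GPU-hour count and the overhead factor are my own assumptions, back-derived from the ~$600K figure):

```python
# Rough cost arithmetic: ~200,000 A100-hours at ~$3/hour is ~$600K, and a
# realistic budget including failed runs lands closer to $1M. All numbers
# are back-of-the-envelope assumptions, not quoted figures.
a100_per_hour = 3.00
gpu_hours = 200_000                     # implied by the ~$600K estimate
one_clean_run = a100_per_hour * gpu_hours
with_failed_runs = one_clean_run * 1.5  # assume ~50% overhead for restarts

print(f"single clean run: ${one_clean_run:,.0f}")
print(f"with failed runs: ${with_failed_runs:,.0f}")
```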
He has started quite a few other reimplementations; see https://github.com/lucidrains . Before DeepMind open sourced AlphaFold2, his implementation was one of only two available.
Based on the constant pushes I get in Windows to use MS services, I believe the software is just an advertising platform, and the actual moneymakers are cloud services and selling users' data.
You can use it to infill subregions. Or you could add a "but <your requested edit>" to your prompt and run again. It's not super robust to prompts involving complicated composition, though, so that might not be a reliable strategy.
I agree; perhaps they are afraid of negative impacts of releasing their work to the public. Either way, I’m not sure I understand the name choice at this point.
Honestly, I think the main problem is that training large models like this, or GPT-anything, costs a lot of compute, which costs a lot of money. So if you're investing a lot of compute but giving everything away for free, that isn't a strategy that can be sustained.
So what we really need are ways to train models in a distributed fashion. Their initial goal was to 'democratize AI'. But democratizing these models doesn't just mean giving everyone access to your trained parameters; it also means that everyone should be able to put effort into improving them. I think a good solution for this would be huge.
The training is a one-time computation though; running the model still costs cycles and people have to pay for that (see sites like Hugging Face or GooseAI). What people are asking about is the code itself and the one-time-computed weights. They're not asking to run the trained model for free.
Anyway, getting cycles isn't too hard, expect CoreWeave or another compute-centric company to hop on board in due course.
> So what we really need are ways to train models in a distributed fashion.
How is the replication effort for this project handling that? They have over 600 people on their Discord right now, do they have some way for them to cooperate on training a single model?
GPT-J and GPT-Neo are previous open source reproductions of GPT, but distributed training is overrated. There’s way too much communications overhead. I’m not sure any of the various @home projects have ever accomplished much for that matter.
Instead they use donated time on a TPU or a GPU cloud provider.
An exception is Leela Zero since you can realistically have it play chess at home.
They could have chosen to work on smaller models and used their already substantial funding to be truly open. Instead they chose a path that demonstrates the Open part in OpenAI was just marketing.
That was their fear with GPT-3 because it would enable whole "fake news" articles to be written without much effort. At this point why should they release trained models? AI developers have a problem now, either show their work (locked down and limited to protect their image), or don't show it. If they do show it, it'll be reproduced. What's interesting to me is how we are approaching the free open source use of these systems. An open source DALL-E 2 could create some very abhorrent imagery.
I still think what happens is that they started out on a mission of "we're going to democratize AGI access!" and then shortly thereafter, the not-crazy people managed to convince them what an astoundingly terrible idea that would be.
What do you use the API for? I've not really seen any interesting commercial uses other than customer support bots that are slightly more convincing, but not something consumers actually want (they actually want the human who can solve their problem).
It's important to recognize the definition of "open" that sama and team embrace versus the definition of open source and "free as in beer." The early team gathered around the idea of achieving lift (as in marketing lift) from sharing research results and building upon open source. They do that really well, relative to MS Research and DeepMind. OpenAI was never going to focus primarily on open source or "free as in beer"; their goal was to market themselves so as to draw attention away from that. Well, I'm sure some of the team wanted that, but I'm also sure sama was more than willing to embrace his own definition of success versus that of the dev community.
ClosedAI is a bit harsh, but they have definitely both divided the community somewhat on the definition of “Open” and been very comfortable with focusing on their toys and ignoring the conversation. I mean sama, especially when it comes to Worldcoin, has been very direct about ignoring criticism in general.
How much does it matter? Everyone's practically forgotten that CompVis and Midjourney were just in the past few months; Sberbank has ruDALL-E, whose largest model is somewhere between DALL-E 1 and 2; FB has Make-A-Scene, which is somewhat below GLIDE/DALL-E 2; and Tsinghua's CogView2 paper was released just days ago and is claiming roughly DALL-E 2 parity. AI capabilities are advancing fast enough that whether or not OA releases something, there will be competitors soon. (Heck, you want a GPT-3 competitor? There's like a dozen of them of similar or better quality, several of which are public or accessible via API - BigScience, Aleph Alpha, & AI2 come to mind.)
They were supposed to be a non-profit that would help develop AI. The AI community is very open, and here you have someone outside OpenAI releasing an open source implementation of OpenAI's research (because they didn't release their code).
Years ago I made a shared tensor library[1] which should allow people to do training in a distributed fashion around the world. Even with relatively slow internet connections, training should still make good use of all the compute available because the whole lot runs asynchronously with highly compressed and approximate updates to shared weights.
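To give a flavour of what "highly compressed and approximate updates" can mean, here is a toy top-k sparsification sketch; it's my illustration of the general idea, not the linked library's actual interface:

```python
# Illustration of compressed, approximate weight updates: each worker sends
# only the largest-magnitude entries of its local update, and the shared
# weights absorb those sparse updates asynchronously as they arrive.
import numpy as np

def compress_topk(update, k):
    idx = np.argsort(np.abs(update))[-k:]        # keep the k largest entries
    return idx, update[idx]

def apply_sparse(shared_weights, idx, values):
    shared_weights[idx] += values                # asynchronous, lossy merge

rng = np.random.default_rng(0)
shared = np.zeros(10_000)

for worker in range(4):                          # pretend these run on 4 machines
    local_update = rng.normal(size=10_000) * 0.01
    idx, values = compress_topk(local_update, k=100)   # send ~1% of the update
    apply_sparse(shared, idx, values)

print("nonzero shared weights:", np.count_nonzero(shared))
```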
The end result is that every bit of computation added has some benefits.
Obviously for a real large scale effort, anti-cheat and anti-spam mechanisms would be needed to ensure nodes aren't deliberately sending bad data to hurt the group effort.
In a world where "no, mRNA vaccines don't change your DNA" has to be explained (often, unsuccessfully), I'm not sure I wanna try explaining digital signatures with elections on the line.
A peer of mine at work didn't take the shot because he doesn't believe in "new technologies". He's an engineer with an MSc and a PhD and has worked as a researcher. I mean, I'm not talking about someone totally outside of science.
People who deal with complexities understand that experience is the encounter with the disruptive unexpected. Similarly to the Socratic principle, ancient counterpart of the Dunning-Kruger effect. Trust is heavily "modulated" in the experienced, it's kept at a distance. It's normal.
So, back to the topic, we may have a few ways to build new shields against the new weapons, but this will be not trivial in technical and social matters, and their interactions - having the involved parts adequately understand the new realities: "a document may not be just trusted", "there is no boolean trust value but a 0..1 fraction", "signatures may not be a total warranty"...
To some people, "If you received an e-mail using that sender's name it does not mean he sent it; no, easily his account was not hacked (etc.)" is already a massive blow to their inner world, that makes them throw themselves in the chair finding it unbearable if taken seriously and incomprehensible if considered.
And, some people will find it "less painful" to distrust /you/ than to break the laws in their inner world (it's a phenomenon you may also meet in this very forum) - some minds observe a regimen of strict conservative low expenditure. Complexity involves social dangers.
> Similarly to the Socratic principle, ancient counterpart of the Dunning-Kruger effect.
I don't remember ever hearing of "the Socratic principle" – what do you mean by that?
I had a quick google. There are only 12 pages of results for the phrase, i.e. it's not commonly used, and every hit for "the Socratic principle" I find seems different. The first four I found: "Virtue is knowledge", "follow the argument wherever it leads", "Wherever a man posts himself on his own conviction that this is best or on orders from his commander, there, I do believe, he should remain", and "Whenever we must choose between exclusive and exhaustive alternatives which we have come to perceive as, respectively, just and unjust or, more generally, as virtuous and vicious, that very perception of them should decide our choice. Further deliberation would be useless, for none of the non-moral goods we might hope to gain, taken singly or in combination, could compensate us for the loss of a moral good." If I had to say what I thought Socrates' main principle was, it might be "study humans and ethics, not nature/physics or maths". It doesn't sound like you meant any of those, but I'm not sure.
Apologies for the unintended obscurity - I thought the meaning would be apparent from the context and that mention of the Dunning-Kruger as a posterior counterpart. I meant that "the more you investigate, the more you realize that the most solid knowledge you have is that of how little you know"
(The Pythia, the Oracle of Delphi, said that Socrates was the wisest man in Athens: Plato has him comment in the Apology that if there is one thing that makes him wise, it could be that he does not delude himself about his ignorance of all the things he does not know).
To such "Socratic principle on wisdom and knowledge", the wiser you get the more you see limits in your knowledge; to its counterpart the D-K, lack of experience has subjects underestimate their ignorance.
People experienced in dealing with problems experience a large amount of unexpected obstacles, unexpected in the mental framework they brought: after some long while, difficulties and complications are expected preemptively, they are part of the picture one forms. Experience brings diffidence.
> People experienced in dealing with problems experience a large amount of unexpected obstacles, unexpected in the mental framework they brought: after some long while, difficulties and complications are expected preemptively, they are part of the picture one forms.
If you have some command line experience, you can rent some compute and run it on the rented hardware. I am renting an A40 with 48 GB of VRAM for $0.40/hr on Vast.ai right now to make cool images. I am still using Big Dream, which I think is still based on DALL-E, not DALL-E 2. I couldn't figure out how to run this latest iteration (yet!).
Ballpark answer: ~5 years until this is running on local hardware owned by the average person. Hardware is always improving and the algorithms will get more efficient.
Until then, you'll likely be able to use an API to generate images within the next 6 months.
I expect all usages of this tool to border on fiction in some way. After all, the images don't depict anything real. They might accompany a nonfiction article as an "artist's impression," though, as so often happens in popular science articles about things we have no images of yet.
It's become increasingly common to start off articles with an image, even when the article is describing an abstract concept, just to avoid having a "wall of text." If that's what you need to do, it seems like a more interesting alternative to clip art.
I'd much prefer tech being open to everyone rather than kept to megacorporations that OpenAI deems worthy (and has received funding from). A neural network isn't a lethal weapon, the potential for malicious use is much more limited than its potential positive creative applications.
A is a mentally healthy person living a fulfilling life. Will watching a disturbing deepfake affect A severely enough to cause them to commit suicide? No.
B leads a life of extreme isolation and faces troubling financial difficulties. They come across a deepfake that pushes them over the edge at the wrong time and they give up on their life. Is the deepfake to blame? Or were there more important factors that we could've focused on to save B and potentially thousands of others?
And since we're discussing the unintended consequences of deepfakes, let's not forget that a carefully crafted one could, hypothetically, even save lives.
When I read the parent comment I was thinking a deepfake designed to cause extreme reputational damage (cheating on spouse, etc.) in which case even person A would be at risk of having their social circle destroyed.
I think it’s a lot easier to create a harmful deepfake than a helpful one. It’s easier to create harmful lies than helpful truths.
Imagine - you could convince a parent that their child has died. Show them photographic evidence. I don’t know how close to the edge they’d need to be.