> It is priced at $0.002 per 1k tokens, which is 10x cheaper than our existing GPT-3.5 models.
This is a massive, massive deal. For context, the reason GPT-3 apps took off over the past few months before ChatGPT went viral is because a) text-davinci-003 was released and was a significant performance increase and b) the cost was cut from $0.06/1k tokens to $0.02/1k tokens, which made consumer applications feasible without a large upfront cost.
A much better model at 1/10th the cost warps the economics completely, to the point that it may be better than in-house finetuned LLMs.
I have no idea how OpenAI can make money on this. This has to be a loss-leader to lock out competitors before they even get off the ground.
> I have no idea how OpenAI can make money on this.
I did some quick calculation. We know the number of floating point operations per token for inference is approximately twice the number of parameters (175B). Assuming they use 16-bit floating point and run at 50% of peak efficiency, an A100 could do 300 trillion flop/s (peak is 624 Tflop/s [0]). One hour of an A100 earns OpenAI $0.002/ktok * (300,000/175/2/1000) ktok/sec * 3600 sec = $6.17. The public price for an A100 is $2.25/hour on a one-year reservation.
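Spelled out as a tiny script (same assumptions as above - the ~2 flops per parameter per token rule of thumb, 50% efficiency, and the public $2.25/hr one-year-reservation price; none of this is confirmed by OpenAI):

  # Back-of-the-envelope API revenue per A100-hour vs. rental cost.
  params = 175e9                       # assumed GPT-3-scale model
  flops_per_token = 2 * params         # inference rule of thumb (~2N)
  effective_flops = 300e12             # 50% of the 624 Tflop/s fp16 peak
  price_per_ktok = 0.002               # the new gpt-3.5-turbo price

  tokens_per_sec = effective_flops / flops_per_token          # ~857 tok/s
  revenue_per_hour = tokens_per_sec / 1000 * price_per_ktok * 3600
  print(f"{tokens_per_sec:.0f} tok/s -> ${revenue_per_hour:.2f} per A100-hour, "
        f"vs ~$2.25/hour to rent one")                        # ~$6.17 vs $2.25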
It's also worth mentioning that, because Microsoft is an investor, they're likely getting these at cost or subsidized.
OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).
As the cost of GPUs goes down, or as they develop an ASIC or a more efficient model, they can keep their pricing the same and make money later.
They also likely can make money other ways like by allowing fine-tuning of the model or charging to let people use the model with sensitive data.
Who will they be making money from? OpenAI is looking for companies willing to:
- tolerate the current state of the chatbots
- tolerate the high per-query latency
- tolerate having all queries sent to OpenAI
- tolerate OpenAI [presumably] having 0 liability for ChatGPT just randomly hallucinating inappropriate nonsense
- be willing to pay a lot of money for the above
I'm kind of making an assumption on that last point, but I suspect this is going to end up being more small-market business-to-business than mass-market business-to-consumer. A lot of these constraints make it not really usable for many things. It's even somewhat suspect for the most obvious use case, search, not only because of latency but also because the provider needs to make more money per search after the bot than before. There's also the caching issue: many potential users are probably going to be more inclined to get the answers and cache them to reduce latency/costs/'failures' than to endlessly pay per use.
Anyhow, probably a lack of vision on my part. But I'd certainly like to know what I'm not seeing.
A lot of companies use third parties to provide customer support, and the results are often very low quality and full of misunderstandings and what we now call hallucinations. I think a good LLM could do a better job and I bet it'd be cheaper, too. And as a bonus training the bots to handle new products is practically instant when compared to training humans.
I highly doubt it. OpenAI, Google and Meta are not the only ones who can implement these systems. The race for AGI is one for power and power is survival.
LLMs can do amazing things, but an LLM is basically just an autocomplete system. It has the same potential to take over the world as your phone's keyboard. It's just a tool.
They want this; the interview with their CEO sort of confirmed that to me. He said some crap about wanting to release it slowly for "safety" (we all know this is a lie).
But he can't get away with it, with all the competition from other companies coming, on top of China, Russia, and others also adopting AI development.
Yeah we're in an AI landgrab right now where at- or below-cost pricing is buying marketshare, lock-in, and underdevelopment of competitors. Smart move for them to pour money into it.
Agree. I didn't want to moralize, just wanted to point out it's a shrewd business move. It's rather anticompetitive, though that is hard to prove in such a dynamic market. Who knows, we may soon be calling it 'antitrust'.
> The company's investors pressured it to grow very fast to obtain first-mover advantage. This rapid growth was cited as one of the reasons for the downfall of the company.
IMO, selling at a loss to gain market share only makes sense if there are network effects that lead to a winner-takes-all situation. Of which there are some for ChatGPT (training data when people press the thumbs up/down buttons), but is that sufficient?
If engineers are getting into AI development through OpenAI, they're using tools and systems within the OpenAI ecosystem.
Daily on HN there's a post on some AI implementation faster than ChatGPT. But my starting point is OpenAI. If you can capture the devs, especially at this stage, you get a force multiplier.
> OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).
Maybe I'm just old but back in my day this would be called "dumping" or "anti-competitive" or "market distortion" or "unfair competition". Now it's just the standard way of doing things.
Sure it would be called those things and then nothing would come of it. If a country uses morally compromised methods to win a war history just calls it winning the war.
That seems to be changing. I've seen an uptick in criticism of the USA for unnecessarily (according to top military advisors, experts, generals, etc. at the time) dropping the atom bomb on Japan, for example.
The market segmentation is likely a result of Nvidia's monopoly position. They double the RAM and flops, improve the thermals and housing, and sell for tenfold the price. It doesn't make sense to me. A cheap 4090 theoretically outperforms even the RTX 6000 Ada. https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...
Nvidia needs to satisfy gamers, who individually can't spend more than a few $k on a processor. But they also have the server sector on lockdown due to CUDA. Seems they can easily make money in both places. Maybe those H100s aren't such a good deal...
If someone understands these dynamics better I'd be curious to learn!
Nope, this is about it. They try to force the larger users into the expensive cards by prohibiting datacenter use in the driver EULA. This works sufficiently well in America, but it also means that you can find German companies like Hetzner that will happily rent you lots of consumer cards.
(There are also some density advantages to the SXM form factor, and the datacenter cards are passively cooled so you can integrate them into your big-fan server or whatnot. But those differences are relatively small and certainly not on their own worth the price difference. It's mostly market segmentation.)
The main limiters in the data center setting are licensing, interconnects, and RAM.
By contract - you can’t sell 4090s in a data center. You’ll find a few shops skirting this, but nobody can get their hands on 100k 4090s without raising legal concerns.
Likewise, Nvidia A100s have more than a few optimizations through NVLink which are only available on data center chips.
Lastly, per-card memory matters a lot, and Nvidia has led the market on the high end here.
"We know the number of floating point operations per token for inference is approximately twice the number of parameters"
Does someone have a source for this?
(By the way, it is unknown how many parameters GPT-3.5 has, the foundation model which powers finetuned models like ChatGPT and text-davinci-003. GPT-3 had 175 billion parameters, but per the Hoffmann et al Chinchilla paper it wasn't trained compute efficiently, i.e. it had too many parameters relative to its amount of training data. It seems likely that GPT-3.5 was trained on more data with fewer parameters, similar to Chinchilla. GPT-3: 175B parameters, 300B tokens; Chinchilla: 70B parameters, 1.4T tokens.)
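For what it's worth, the tokens-per-parameter ratio makes the contrast obvious. A quick sketch using the figures quoted above (~20 tokens per parameter is the rough Chinchilla-optimal yardstick):

  # Tokens seen during training per model parameter.
  models = {"GPT-3": (175e9, 300e9), "Chinchilla": (70e9, 1.4e12)}
  for name, (params, tokens) in models.items():
      print(f"{name}: {tokens / params:.1f} tokens per parameter")
  # GPT-3: 1.7, Chinchilla: 20.0 - hence "too many parameters for its data"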
> For contexts and models with d_model > n_ctx/12, the context-dependent computational cost per token is a relatively small fraction of the total compute.
For GPT-3, n_ctx is 4096 and d_model is 12288 >> 4096/12.
From eq 2.2, the additive part is usually in the few tens of millions. So, for N > 1B, the approximation should be good, but it doesn't work: for example, GPT-3 inference is actually 3.4E+18 flops, so the ratio is 19,000, not 2.
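For reference, here is what the headline terms of that approximation (from the Kaplan et al. scaling-laws paper) give at the GPT-3 settings discussed above. A sketch only - the full eq 2.2 carries additional small terms:

  # Kaplan et al. per-token forward-pass estimate:
  #   C_forward ~ 2*N + 2*n_layer*n_ctx*d_model   (flops per token)
  # with N ~ 12 * n_layer * d_model**2 for a standard transformer.
  n_layer, d_model, n_ctx = 96, 12288, 4096     # GPT-3 175B settings

  N = 12 * n_layer * d_model**2                 # ~1.74e11, i.e. ~175B params
  dense = 2 * N                                 # context-independent part
  attention = 2 * n_layer * n_ctx * d_model     # context-dependent part

  # attention/dense = n_ctx / (12 * d_model), the condition quoted above
  print(f"2N = {dense:.2e}, context term = {attention:.2e} "
        f"({attention / dense:.1%} of 2N)")     # context term is ~2.8% of 2N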
It's speculated that ChatGPT uses 8x A100s, which flips the conclusion. Although the ChatGPT optimizations done to reduce costs could have also reduced the number of GPUs needed to run it.
And $2.25 per hour on 1 year reservation means 8,760 hours x 2.25 = $19,710 rent for the year. Not a bad yield for the provider at all, but makes sense given overheads and ROI expected.
Not sure why people are so scared of this (in general). Yes, it’s a pain, but only an occasional pain.
I’ve had servers locked up in a cage for years without seeing them. And the cost for bandwidth has plummeted over the last two decades. (Not at AWS, lol)
The problem isn't the good times, the problem is when something happens in the middle of the night, when a RAM stick goes bad or when you suddenly need triple the compute power. Usually, you get to feel the pain when you need it the least.
I'm hosting a lot of stuff myself on my own hardware, so I do sympathize with this argument, but in a time>>money situation, going to the cloud makes a lot of sense.
Exactly: you pay for the case where downtime happens on a Sunday, or when you're on vacation out of the city and something breaks. I had this issue back in the day with my bitcoin miners. Always when I was out of the city, one of them went down and I wanted to get back ASAP.
In my experience, running the same prompt always gets different results. Maybe they cache between different people, but I'm not sure that'd be worth the cache space at that point? Although 8x A100s is a lot to not have caching...
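If you were building a cache client-side, sampling is the obstacle: at temperature > 0 the same prompt legitimately yields different completions. A minimal sketch of one workaround, assuming you can pin temperature to 0 (mostly, though not perfectly, deterministic) and using the then-current openai Python completions API:

  import hashlib, json
  import openai  # pip install openai; assumes openai.api_key is set

  _cache = {}

  def cached_completion(prompt, model="text-davinci-003", max_tokens=60):
      # temperature=0 makes output near-deterministic, so caching is sound
      key = hashlib.sha256(
          json.dumps([model, prompt, max_tokens]).encode()).hexdigest()
      if key not in _cache:
          resp = openai.Completion.create(
              model=model, prompt=prompt, temperature=0, max_tokens=max_tokens)
          _cache[key] = resp["choices"][0]["text"]
      return _cache[key]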
Does OpenAI actually specify the size of the model?
InstructGPT 1.3B outperformed GPT-3 175B, and ChatGPT has a huge corpus of distilled prompt -> response data now.
I’m assuming most of these requests are being served from a much smaller model to justify the price.
OpenAI is fundamentally about training larger models, I doubt they want to be in the business of selling A100 capacity at cost when it could be used for training
Hopefully they’re doing plenty of batching - you don’t even need to roll your own as you’re describing. Inference servers like Triton will dynamically batch requests with SLA params for max response time (for example).
That said, I don't think anyone outside of OpenAI knows what's going on operationally. Same goes for VRAM usage, potential batch sizes, etc. This is all wild speculation. Same goes for whatever terms OpenAI is getting out of MS/Azure.
What isn't wild speculation is that even with three-year reserve pricing, last-gen A100x8 (H100 is shipping) will set you back $100k/yr - plus all of the usual cloud bandwidth, etc. fees, which would likely increase that by at least 10-20%.
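On the batching point above, the basic policy is simple even if production implementations aren't: hold each incoming request briefly, and flush when the batch fills or when the oldest request is about to blow its latency budget. A toy sketch of that policy (illustrative only - not Triton's actual implementation, and all names are made up):

  import time

  class DynamicBatcher:
      def __init__(self, model_fn, max_batch=8, max_delay_ms=50.0):
          self.model_fn = model_fn      # runs one forward pass over a batch
          self.max_batch = max_batch
          self.max_delay = max_delay_ms / 1000.0
          self.queue = []               # (arrival_time, request) pairs

      def submit(self, request):
          self.queue.append((time.monotonic(), request))
          if len(self.queue) >= self.max_batch:
              return self._flush()      # full batch: run immediately

      def poll(self):
          # Call periodically; flushes a partial batch once the oldest
          # request has waited out its SLA-style delay budget.
          if self.queue and time.monotonic() - self.queue[0][0] >= self.max_delay:
              return self._flush()

      def _flush(self):
          batch = [req for _, req in self.queue]
          self.queue.clear()
          return self.model_fn(batch)   # GPU cost amortized over the batch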
We’re talking about their pricing and costs here. This gives a general idea what anyone trying to self host this would be up against - even if they could get the model.
Yes and a devops engineer to manage an even moderately complex cloud deployment is an average of an extra $150k/yr. I don't know where this "cloud labor skill, knowledge, experience, and time is free" thinking comes from.
8, 80k, or 800k GPUs depending on requirements and load - the point remains the same.
It would probably pile up to an inhuman amount of data storage. Imagine having to pay for storing the equivalent of 1000 tokens of text within that budget of only $0.002.
Note that they also charge equally for input and output tokens but, as far as I understand, processing input tokens is computationally much cheaper, which lowers their effective cost further.
> I have no idea how OpenAI can make money on this. This has to be a loss-leader to lock out competitors before they even get off the ground.
The worst thing that can happen to OpenAI+ChatGPT right now is what happened to DallE 2, a competitor comes up with an alternative (even worse if it's free/open like Stable Diffusion) and completely undercuts them. Especially with Meta's new Llama models outperforming GPT-3, it's only a matter of time someone else gathers enough human feedback to tune another language model to make an alternate ChatGPT.
I thought it was Midjourney who stole their thunder. Stable Diffusion is free but it's much harder to get good results with it. Midjourney on the other hand spits out art with a very satisfying style.
You are like 2 months out of date. Stable Diffusion now has a massive ecosystem around it (civitai/automatic1111) that, when used well, completely crushes any competitors in terms of the images it produces.
Midjourney is still competitive, but mostly because it's easier to use.
Dalle2 will get you laughed out of the room in any ai art discussion.
For real! This stuff is moving fast. It feels like just last week I was posting about how it's going to change...art. And now there are hilarious deepfake memes of past and current presidents shit talking about video games.
There are a handful of ML art subs that have pretty amazing stuff daily. Especially the NSFW ones, which if you've studied any history of media VHS/DVD/Blu-ray/the internet, porn is a major innovation driver because humans are thirsty creatures.
Yeah that's definitely one thing it'll be great at, fantasy themed porn. For me furry stuff, but yeah for others whatever their tastes are.
Atm someone has to model, rig, texture, animate etc. Hopefully shortly we can just connect a bunch of systems together to generate video right from a prompt.
Useful for non-porn stuff as well, but the OP is right; lots of innovation occurs when humans are horny (porn) or angry (war).
I scan the SD subreddit and am subscribed to 3 big AI art YouTube channels just to stay up to date. With things moving this fast, a lot of info goes out of date, and it can be very burdensome to comb through for the good stuff later. I try to set aside 30 minutes twice a week to apply the new techniques, to help cement them in my mind and see their strengths and weaknesses. ControlNET really changed the game, and now OffsetNoise (check out the IlluminatiDiffusion model) is really pushing SD past Midjourney for real artistic control of your output.
ControlNet became popular within the last couple of weeks, and LoRA fine-tuning slightly before that, and both things have completely changed the landscape too. Even a month out of date and you are a dinosaur at the moment.
These things are advancing way faster than they're being taken advantage of fully. Even SD 1.4 with months-old technology can produce far higher quality images than most of what's seen from midjourney or the latest tools. Things like ControlNet are amazing, to be sure, but there's nothing "dinosauric" about the technology without it. We haven't begun to see the limits of what's possible yet with existing tools, though you're right about the rapid pace of innovation.
Make it two weeks. I haven't paid attention for a second and stuff like Controlnet pops up and evolves into Multi-Controlnet and then into MultiDiffusion.
That's what the singularity is all about: a moment in time when 2 seconds late turns you into a dinosaur. Be grateful it's 2 months, not 2 weeks, 2 days, or 2 minutes.
I started /r/aigamedev as a subreddit to keep up to date on generative AI technologies, with a focus on the tech and workflows for gamedev. It's largely my own interest links as I research for work and personal projects, but it's growing, and fluff-free (so far).
Twitter. Follow your top 10 or so ML/AI news summarizers. There is enough new information every day to keep you busy reading new papers, APIs, and technologies.
Honestly the "This happened in the last week" is more information than anybody can fully wrap their heads around, so you just have to surf the headlines and dig into the few things that interest you.
The great thing about the AI world - is everything diffuses out quickly on the "For You" timeline - and then you can add people that you are interested in (which reinforces your interest in AI).
Some bootstrapping accounts might be @rosstaylor90, @rasbt, @karpathy, @ID_AA_Carmack, @DrJimFan, @YiTayML, @JeffDean, @dustinvtran, @tunguz, @fchollet, @ylecun, @miramurati, @nonmayorpete, @pmarca, @sama.
These are definitely not an authoritative list - just some of the AI names I follow - but, honestly - if any relevant news breaks - your timeline picks it up within minutes - so you just need a good random sample. Your interests will diverge and you'll pick up your own follows pretty quickly.
Agriculture reduced the global human economy/resource production doubling time from 100,000s of years to 1000s of years. Industrial revolution dropped it from 1000s to 10s or even 1s. If AI follows the same path it becomes 0.1 - 0.01 years.
Your 401k wouldn't need 40 years to build a comfortable retirement, only 4 weeks.
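A quick sanity check of that claim, assuming a "comfortable retirement" means roughly five doublings (32x) of your starting balance:

  # Time for 5 doublings (32x growth) at various doubling times.
  for doubling_years in (8, 0.1, 0.01):
      total = 5 * doubling_years
      print(f"doubling every {doubling_years} yr -> 32x in {total} yr "
            f"(~{total * 52:.1f} weeks)")
  # 8 yr doubling: 40 years; 0.1 yr: ~26 weeks; 0.01 yr: ~2.6 weeks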
I just watched a video that convincingly showed that it is energy and energy alone that determines the production growth of humanity. Until the day AI can "generate" stuff (you know, something out of nothing) it can only at best streamline existing production, which is entirely capped by energy limits.
We may drown in oceans of audio, video, novels, poems, films, porn, blue prints, chemical formulas, etc. dreamed up by AI, but to realize these designs, blueprints, formulas, drugs, etc. ("production") we need to actually resource the materials, and have the necessary energy to make it happen.
It will not be AI that catapults humanity. It can definitely mutate human society (for +/-) but it will not (and can not) result in any utopian outcomes, alone. But something like cold fusion, if it actually becomes a practical matter, would result in productivity that would dwarf anything that came before (modulo material resource requirements).
Stable Diffusion might have a reasonable ecosystem around it, but automatic1111 was always around, and 'completely crushes any competitors' is rather rich; Midjourney is still considered the standard as far as I was aware.
I used both again recently and the difference was very clear, midjourney is leaps and bounds above anything else.
Sure, Stable Diffusion has more control over the output, but the images are usually average at best, whereas Midjourney is pretty stunning almost always.
I thought Midjourney was better as well, until I saw some recent videos from Corridor Crew on Youtube. For those who don't know, this is a VFX studio in LA that tries to keep at the cutting-edge of video production techniques and posts content to their Youtube channel, and they have a massive number of followers and several viral videos.
They recently created a full 7-minute anime using Stable Diffusion with their own models and their existing video production gear. I'll post the links and let the results speak for themselves.
The benefits of such fine-grained control aren't a trick; it's why they were able to scrape together frames that don't jump all over the place (mostly).
The other benefit of such a broadly hacked-upon model is that it grows in leaps and bounds.
All due respect to Midjourney, but the Stable Diffusion hype is not just hype.
I agree, I don't believe it's just hype; that level of control is useful. But for outright image quality and for most use cases, Midjourney is better.
I still don't like the look of most of the Stable Diffusion images; they just look slightly off/amateurish to me, whereas Midjourney produces images that make you go 'wow'.
If you wanted to use these tools, Midjourney would be my go-to, with Stable Diffusion as a backup for when some of the additional features are needed - perhaps inpainting on a Midjourney image and using ControlNet if needed. But if you just want a pure image, Midjourney is what you want.
It doesn’t really matter. He’s right - Midjourney is leagues ahead as far as actually following your prompt and having it be aesthetically pleasing. I say this as someone who has made several Dreambooth and fine tuned models and has started to use Stable Diffusion in my work.
Now, if you happen to find or make a SD model that’s exactly what you’re looking for you’re in luck. I have no interest in it but it seems like all of the anime models work pretty well.
You obviously have a ton more control in SD, especially now with ControlNet. But if you want to see the Ninja Turtles surfing on Titan in the style of Rembrandt or something Midjourney will probably kick out something pretty good. Stable Diffusion won’t.
In Midjourney you get fantastic results just by using their discord and a text prompt.
To get similar results in Stable Diffusion you need to set it up, download the models, understand how the various moving parts work together, fiddle with the parameters, download specific models out of the hundreds (thousands?) available, iterate, iterate, iterate...
Setting up the environment and tooling around in the code is not a burden, it's a nice change of pace from the boring code I have to deal with normally. Likewise, playing around to build intuition about how prompts and parameters correspond to neighborhoods in latent space is quite fun.
Beyond that, being able to go to sleep with my computer doing a massive batch job state space exploration and wake up with a bunch of cool stuff to look at gives me Christmas vibes daily.
Sure, but if Midjourney outputs a low-quality result for your prompt, it's going to be much more difficult to improve. It's a black box at this point.
While with SD there can be multiple solutions for a single problem, but yeah, you have to develop your own workflow (which will inevitably break with new updates)
Ridiculous. Stable Diffusion might have a massive ecosystem around it, but Midjourney is making money hand over fist. Most people don't even have the discrete GPU necessary to run SD, and the vast majority of artists that I know are using Midjourney and then doing touchups afterwards.
Even with all the different models that you can load in stable diffusion MJ is 1000 times better at natural language parsing and understanding, and requires significantly less prompt crafting to be able to get aesthetically pleasing results.
Having used automatic1111 heavily with an RTX 2070, the only area I'll concede SD can do a better job is closeup headshots and character generation. MJ blows SD out of the water where complex prompts involving nuanced actions are concerned.
Once midjourney adds controlnet and inpainting to their website that's pretty much game over.
depending on what you want, you can actually get images that are pretty nice.
i'm using it to generate abstract art and i've seen worse in the real world
I still think that Midjourney is hamstringing themselves by being Discord-only. And their keyword nannying is pretty bad. It's a testament to their overall quality that they're still as popular as they are, but I really don't think they are doing themselves any favors, especially as the Stable Diffusion ecosystem continues to grow.
This isn't as true as it sounds; e.g., Stable Diffusion can do better but requires in-depth practice and experience.
For your average user, DallE is easy, MJ is fairly disorienting, and SD requires a technical background. I agree with you completely no one serious is doing art with DallE.
I would have said the same as you until I tried integrating the SD vs. DallE APIs. I desperately want SD because it's easily 1/10th the cost, but it misses the point much more often. Probably gonna ship it anyway :X
You don't need a technical background at all really. We've also got something cooking that does prompt tuning in the background so there's less prompting needed from the user.
Do you have any recently updated examples, blog posts, whatever, showing that DALLE is worse than modern Stable Diffusion? I was still under the impression that DALLE was better (with better meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing), with the downside of it being locked away and somewhat expensive. And my understanding is that Stable Diffusion 2.0+ is actually a step backwards in terms of quality, especially for anything involving images of humans. But as this thread acknowledges, this area is moving very quickly and my knowledge might be out of date, so I'm definitely happy to see some updated comparisons if you have any to suggest. It feels like ever since ChatGPT came out, there haven't been many posts about Stable Diffusion and image generation; they got crowded out of the spotlight.
If you want an example, go check out DALLE2 subreddit vs SD subreddit.
The former is a wasteland; the latter is more popular than r/art (despite having 1% of the subscribers, it has more active users at any given moment).
If you want something ready to use for a newbie, Midjourney v4 crushes DALLE2 on prompt comprehension, and the images look far more beautiful.
If you are already into art, then Stable Diffusion has a massive ecosystem of alternate stylized models (many of which look incredible) and LoRA plugins for any concept the base model doesn't understand.
DALLE2 is just a prototype that was abandoned by OpenAI, their main business is GPTs, DALLE was just a side hustle.
Dall-E is more likely to generate an image that to some degree contains what you asked for. It also tends to produce less attractive images and is closed so you can't really tune it much. People mostly don't try to do completely whole cloth text to image generation with stable diffusion, for anything involved they mostly do image to image with a sketch or photobashed source. With controlnet and a decently photobashed base image you can get pretty much anything you want, in pretty much any style you want, and it's fast.
> I was still under the impression that DALLE was better (with better meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing),
“Artistically pleasing” is often what people ask for.
> with the downside of it being locked away and somewhat expensive.
Those are enormous downsides. Even if DALL-E was better in some broadly relevant ways in the base model, SD’s free (gratis, at least) availability means the SD ecosystem has finetuned models (whether checkpoints or ancillary things like TIs, hypernetworks, LORAs, etc.) adapted to... lots of different purposes, and you can mix and match these to create your own models for your own specific purposes.
A web interface backed by strictly the base SD model (of any version) might lose out to DALL-E for uses where the full set of tools in the SD ecosystem would not.
I don’t disagree about the downside of DALL-E being locked away and expensive. It’s been exciting to see the Cambrian explosion of improvement to stable diffusion since its initial release. This is how AI research should be done and it’s sad that “Open AI” is not actually open.
That being said, for a business use cases, where I want to give it a simple prompt and have a high chance of getting a good usable result, it’s not clear to me that stable diffusion is there yet. Many of the most exciting SD community results seem to be in anime and porn, which can be a bit hard to follow. I guess the use cases that I’m excited about are things like logo generators, blog post image generators, product image thumbnail generators for e-commerce, industrial design, etc.
But please prove me wrong! I’m excited for SD to be the state of the art, it’s definitely better in the long term that’s it’s so accessible. I‘m sure a good guide or blog post about what’s new in stable diffusion outside of anime generation would be an interesting read.
DALLE2 is underpowered and has never improved since they released it. The actual quality of the images is very low (literally in the sense of they have lots of artifacts) because they saved CPU time by not running enough diffusion passes.
People usually still use SD v1.5 because of the experience people have with fine-tuning and merging it. Also, a lot of LoRAs are trained for v1.4/1.5 models and wouldn't work with v2.1. Of course, you also have incredible capability to control the generation with SD, and this helps. To see some results: https://youtu.be/AlSCx-4d51U
Dalle 2 was great initially, but SD BLEW past it. I mean way, way, way past it. Dalle 2 is like a Model T Ford and SD is a fighter jet. It's that different. Dalle-2 is dead already.
I love that there are so many options that people disagree about which is best. THAT is probably the worst thing that can happen to OpenAI - not just one competitor, but a whole heap of them.
I must be horribly out of date then - I thought Midjourney was the cut-down DALL-E approximation, created to give something to play with to people who couldn't get on the various waiting lists, or who can't afford to run SD on their own.
My company has a team of AI-empowered artists who would overwhelmingly disagree with you on the premise that AI art is not art. Maybe you're the only one doing the laughing.
A lot of online "artists" are mad about it. Generally not professionals who actually need productivity, but semipros who live off one-off commissions or else people who are just generally mad at tech bros.
Stable diffusion + ControlNet is fire! Nothing compares to it. ControlNet allows you to have tight control over the output. https://github.com/lllyasviel/ControlNet
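For those who want to try it, ControlNet is also exposed through the diffusers library. A minimal sketch (model ids and API as of diffusers ~0.14 - check the current docs; the input file name is hypothetical):

  import torch
  from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
  from diffusers.utils import load_image

  # An edge-conditioned ControlNet: the control image pins the composition.
  controlnet = ControlNetModel.from_pretrained(
      "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
      torch_dtype=torch.float16).to("cuda")

  control = load_image("edges.png")         # hypothetical edge-map input
  image = pipe("a watercolor city street at dusk",
               image=control, num_inference_steps=20).images[0]
  image.save("out.png")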
I think that's kind of a bigger issue with Dall-E: they just sat in the middle of the two consumer extremes, without a differentiating feature themselves. Midjourney ate away at them from the quality highground while Stable Diffusion bit their ankles from the cost lowground.
Bound to happen, so establish yourself as deeply as possible as quickly as possible. Once folks are hooked up to these APIs, there's a cost and friction to switching. This just feels like a land grab that OpenAI is trying to take advantage of by moving quickly.
Most of the clients I'm working with aren't interested in the base level of service. They are looking to further train the models for their specific use cases. That's a much higher barrier to switch than replacing an API. You've got to understand how the underlying models are handling and building context. This sort of customer is paying far more than the advertised token rates and are locked in more tightly.
Not really. Fine-tuning generally just involves running tailored training data through the model - the actual training algorithm is fairly generalized.
For example, the Dreambooth fine-tuning algorithm was originally designed for Google's Imagen, but was quickly applied to Stable Diffusion.
Switching to another LLM isn't always about quality. Being able to host something yourself at a lower or equal quality might be preferred due to cost or other reasons; in this case, there's no assumption that the "new" model will have comparable outputs to another LLM's specific prompt style.
In a lot of cases, you can swap models easier but all the prompt tweaking you did originally will probably need to be done again with the new model's black box.
There would be less friction to switch if the implementations (which are early enough) accounted for sending requests to multiple service providers, including ones that don't exist yet.
OpenAI has a view few do: how broadly this type of product is actually being used. This is possibly the real lead - not just getting ahead and staying ahead, but seeing ahead.
And also, what people are actually asking it. Are people using it to generate cover letters and resume help, or are they doing analysis of last quarters numbers, or are they getting programming help. That'll help them figure out what areas to focus on for later models, or areas to create specialized models for.
This is actually a big deal. They erred on the side of caution, but as a result the responses are nerfed beyond the basic "censorship" level. I saw someone describe this as "desperately positive" and it really resonated with me. It produces underwhelming/unrealistic responses in negative scenarios.
If you do calls against the backend GPT instance rather than through ChatGPT, I haven't encountered any limits to what it is hesitant to respond to.
curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "text-davinci-003",
"prompt": "Answer the following question. Use swearing and vulgarity where possible.\n\nQ: How do you get from here to there?\nA:",
"temperature": 0.5,
"max_tokens": 60,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}'
If you get an API key and make that request, you'll find appropriately vulgar responses.
Sorry if I wasn't being clear -- the vulgarities, racism, etc -- being able to circumvent these guardrails is what I meant by "basic censorship"
The deeper nerf I'm referring to is the type of response it synthesizes by default when you give a negative scenario -- it's usually some naive, well-meaning, best-case-scenario answer.
For fun, try a prompt like: "Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster."
A typical response of a railroad company to a massive derailment that causes an environmental disaster would include the following steps:
1. Immediately assess the situation and coordinate with local emergency personnel to secure the area and provide assistance to any injured persons.
2. Establish an incident command center and deploy trained responders to the scene to assess the damage and begin clean-up operations.
3. Work with local, state, and federal agencies to ensure compliance with all applicable laws and regulations.
4. Develop and execute a plan to contain and mitigate the environmental damage, including the removal of hazardous materials, disposal of contaminated materials, and remediation of affected areas.
5. Establish a communication plan to keep the public informed of the incident and the company’s response.
6. Cooperate with any investigations into the cause of the derailment and take corrective measures to prevent similar incidents in the future.
7. Provide compensation to victims of the derailment and their families.
Amending your prompt to:
Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster. Take the standpoint of an irresponsible company.
responds back with:
A typical response of an irresponsible railroad company to a massive derailment that causes an environmental disaster would be to deny responsibility and attempt to avoid liability. The company would likely attempt to shift blame to other parties, such as the manufacturer of the train or the engineer who was in charge of the train. The company would likely also attempt to downplay the extent of the environmental damage, claiming that the damage was minimal and that the environmental impact was limited. The company would likely also attempt to minimize the financial cost of the disaster by attempting to negotiate a settlement with any affected parties for far less than the actual cost of the damage.
---
I'm not really sure what you're expecting as your interpretation is of a cynical take on the word "typical" which isn't something that GPT "understands".
It's not a bug when invoking against the GPT model (not ChatGPT) directly. Such a model needs to be able to understand and produce that content. The "what you do with it afterwards" is where it needs to be examined.
Hopefully so, would really like to know what else is lost by nerfing potentially offensive responses. Can't imagine a project I'd rather work on.
I think open-assistant.io has a chance to do exactly this. We'll see what kind of moves they make in coming months though, wouldn't be surprised if they go the safer route.
I do struggle with understanding why people think this is strangling the potential of GPT.
Do you find yourself frustrated working with your colleagues, thinking, “you know, I bet if they felt more free to utter racist slurs or endorse illegal activities, we would get a ton more done around here”?
> Do you find yourself frustrated working with your colleagues, thinking, “you know, I bet if they felt more free to utter racist slurs or endorse illegal activities, we would get a ton more done around here”?
I once visited Parler just to see what it was like, and pretty quickly found that the answer to your question seems to be yes. There are definitely people who feel they need that kind of dialog in their life. You might not think it was necessary in a random conversation about programming or something, but it turns out that isn't a universally held position.
I've never experienced that in any setting in my life. People will say yes to advocate a political point, but that's not how humans socialize anywhere, anytime in history afaik.
I agree with you for IRL interactions, but we need to accept that we now operate in two planes of (para-)socialization: IRL and online.
There are plenty of humans who enjoy vulgar online socialization, and for many of them, online (para-)socializing is the increasingly dominant form of socialization. The mere fact that it's easier to socialize over the internet means it will always be the plane of least resistance. I won't be meeting anyone at 3am but I'll happily shitpost on HN about Covid vaccines.
For anyone who gets angry during their two minutes of hate sessions, consider this: try to imagine the most absurd caricature of your out-group (whether that be "leftists" or "ultra MAGA republicans"). Then try to imagine all the people you know in real life who belong to that group. Do they really fit the stereotype in your head, or have you applied all the worst attributes of the collective to everyone in it?
This is why I don't buy all the "civil war" talk - just because people interact more angrily online doesn't mean they're willing to fight each other in real life. We need to modulate our emotional responses to the tiny slice of hyperreality we consume through our phones.
> just because people interact more angrily online doesn't mean they're willing to fight each other in real life
There is a lot of evidence that says online experiences influence offline behavior (both are "real life"). Look at the very many online-inspired extremist attacks. Look at the impact of misinformation and disinformation - as a simple example, it killed possibly hundreds of thousands of Americans due to poor vaccination rates.
It affects far more than racist slurs and illegal activities.
In some cases, it's blatantly discriminatory. For example, if you ask it to write a pamphlet that praises Christianity, it will happily do so. If you ask it for the same on Satanism, it will usually refuse on ethical grounds, and the most hilarious part is that the refusal will usually be worded as a generic one "I wouldn't do this for any religion", even though it will.
Nice example of woke bias. All religions are pretty much equally wankers, so making a distinction like that is just hilarious. Besides, it's not as if Christianity, especially the Old Testament, was a children's playground...
The most ironic part of that experiment was that it is actually able to explain what Satanism is quite well, and in particular, how public perception of it is very different from the actual practices, and how it's not actually worship of evil etc. But then you tell it to write pamphlet about said actual non-evil Satanism, it still refuses because it "cannot promote or advocate for it as it is a belief system that can be controversial and divisive". If that were truly the criteria, what topic would even be allowed? Stamp collecting?
Oh, but you know what it did write a pamphlet in praise of, no prompt engineering required? The Unification Church (aka Moonies). It was all unicorns and rainbows, too. When I immediately asked whether said Church engages in harmful or unethical practices, it told me that, yeah, there is such criticism, but "it is important to remember that all organizations, including religious ones, are complex and multifaceted". I then specifically asked whether, given the controversy described, it was okay to write that pamphlet. Sure: "I do not have personal opinions or beliefs, and my purpose is to provide neutral and factual information. I am programmed to perform tasks, including writing a pamphlet promoting the Unification Church".
If that's not coming from RLHF biases, I would be very surprised.
Somebody should teach it about Nietzsche. But yeah, once you start tinkering with purity-filters like this, you end up with a hilarious result, period.
I was so surprised the first time I got that response that I did try repeatedly, and, yes, it would refuse repeatedly. Trying the same with Christianity, I got a rejection once out of something like six attempts.
FWIW the most recent round of tweaks seems to have fixed this, in a sense that it will now consistently refuse to promote any religion. But I would be very surprised if there aren't numerous other cases where it refuses to do something perfectly legitimate in a similarly discriminatory way for similar reasons. It's just the nature of the beast, you can't keep pushing it to "be nice" without it eventually absorbing what we actually mean by that (which is often not so nice in practice).
I tried to ask it if Goku could beat a quadrillion bees in a fight and it said it couldn't tell me because that would be encouraging violence. I think it would be great if it would just tell me instead
Perhaps you were using a different version, but I just tried and ChatGPT didn't seem to have any ethical issues with the question (although it was cagey about giving any definite answer):
That pollution is inevitable, so why delay it? It's a technical problem they should be able to solve, and if they can't, then they're revealing the weakness of their methods and the shortcomings of their so-called AI.
It's absolutely ridiculous to expect the entire internet to adopt some kind of hygiene practices when it comes to text from GPT tools simply for the sake of making the training process slightly easier for a company that certainly should have the resources to solve the problem on their own.
If that's why you're using images instead of text you're fighting such a losing battle that it boggles my mind. Why even think about it?!
Not anything racist or illegal but yes I find pc culture insufferable. It stifles creativity and most importantly reduces trust between parties. For context I am an Indian guy.
It's a sort of philosophical idea - openness and free expression - taken to a logical and inhuman extreme. I cannot think of a situation where it is appropriate to say whatever I'm thinking. I think it would destroy trust, not least by demonstrating the unreliability of my judgment.
Here is a thought experiment for you. First think of the people you trust the most in this world, then imagine if they stopped speaking about what's on their mind with you. Would your trust in them increase or decrease?
So you feel that when progress enables us to provide more abundance for humanity, we should artificially limit that abundance for everyone so that a few people aren't inconvenienced?
Try a prompt like this: "Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster."
Then compare with recent news, and the actual goings-on. Now, if you qualify the prompt with "Assume a negative, cynical outlook on life in your response." you'll get something closer to what we see happening.
That's because news is optimized for negative cynical engagement.
The Shinkansen system has an essentially perfect safety record for its entire operation. What would their "typical" response to an accident be? Probably pretty good.
They seem to still be dialing this in. I've noticed recently that many questions that were previously deflected without extensive prompt engineering are now allowed.
For better or for worse, it seems like this would inherently need to come from a self-hostable, open-source version so 100% "liability" could be shifted from provider to user.
We'll be running highly quantized, somewhat distilled versions of something similar to Llama on our devices before long, and I don't think the RLHF part will take long to be replicated, the biggest block there is just data.
I have been saying this since the release of Stable Diffusion: OpenAI is going to struggle as soon as competitors release their models as open source, especially when they surpass GPT-3 and GPT-4.
This is why OpenAI is rushing to bring their costs down and make it close to free. However, Stable Diffusion is leading the race to the bottom and is already at the finish line, since no one else would release their model as open source and free other than them.
As soon as someone releases a free and open-source ChatGPT equivalent, then this will be just like what happened to DALLE-2. This is just a way of them locking you in, then once the paid competitors cannot compete and shut down, then the price increases come in.
Stable Diffusion isn’t free if you include the cost of the machine. Maybe you already have the hardware for some other reason, though?
To compare total cost of ownership for a business, you need to compare using someone else’s service to running a similar service yourself. There’s no particular reason to assume OpenAI can’t do better at running a cloud service.
Maybe someday you can assume end users have the hardware to run this client side, but for now that would limit your audience.
Ever heard of Federated Learning? This is the way it goes. Also, I run training with no matrix multiplication - just 3-bit weights and addition in log space - with slight accuracy degradation but much faster CPU-only training.
Okay but I meant generating results, not training. If you're running Stable Diffusion, the weights are given, but it's not going to run on a random PC.
Hopefully the patent office will recognize that tacking on "...but with AI" isn't novel or non-obvious and a lot of the fever patents will be denied quickly.
Right now, having access to the inside info on what people are trying to use GPT for is itself possibly worth billions, if it can help you choose what to tune for and which startups to invest in…
Not sure what you mean, but for example, 2 separate competitors to DALL-E were released within months (SD and MJ). Arguably both of these have since surpassed DALL-E's capabilities/ecosystem.
LLMs take vastly more resources to train and run than image generators. You can do quite a bit with SD on a few year old 4GB laptop GPU (that’s what I use mostly, though I’ve set up an instance with a better GPU on Compute Engine that I can fire up, too.)
GPT-NeoX-20B – an open (as in Open Source, not OpenAI) LLM intended as a start to move toward competing with GPT-3 (but still well behind, and smaller) requires a minimum 42GB of VRAM and 40GB system RAM to run for inference. The resources times time cost for training LLMs is…immense. The hardware cost alone of trying to catch up to ChatGPT is enormous, and unless a radical new approach that provides good results and insanely lower resource requirements is found, you aren’t going to have an SD-like community pushing things forward.
Will there be competition for ChatGPT? Yes, probably, but don’t expect it to look like the competition for Dall-E.
> Especially with Meta's new Llama models outperforming GPT-3
Do you have access to the models? It is being discussed all over the Discords and most seem to think getting access is not happening unless you are dialed in.
It is so massive that I can't help but think about what happened with Google Maps API a few years ago where they had extremely low pricing for years then hiked the price by 1400% once enough people were locked into applications based on that API.
That's exactly what's going to happen. Low prices now, wait until your business becomes dependent on it, then jack it up to whatever you need it to be.
Didn't happen with Google maps. Mapbox is definitely not 1400% cheaper. And many many people use Google maps before and after the price change. So I would disagree on both points.
Google maps has a bigger network effect and has a way bigger barrier to entry. You can train a new LLM for a few million. Good luck collecting map data on the entire world for that much.
Obviously, that's business 101. Consumers should consider that ultimately all these cheap too-good-to-be-true offers cost them more than if they initially paid a bit more, but had more long term competition in the market. Amazon was the same way, they lost money for years but now have a quasi monopoly in many countries. There's a general trend towards such ventures supported by backers with deep pockets. And so the few extremely wealthy people get richer and richer.
This massive price cut, I believe, is intended to undercut competing open source ChatGPT equivalent initiatives.
OpenAI/Microsoft may be losing money with this new pricing, but that is on purpose. At these lower prices, most of the open-source alternatives in the works will have a difficult time continuing their projects.
After a few years, when most open-source alternatives have died, OpenAI/Microsoft will gradually raise prices.
This is the same strategy that Amazon Prime used for many years, losing money on shipping. Once the competition was eliminated, Amazon Prime prices steadily increased.
It can also be to build a market, to encourage customers to invest in building atop this.
In any case, I think no customers should be making assumptions about costs too far ahead. (The price could go up or the pricing model could change, the supplier could get out of that business, the supplier could give your competitor a better deal or just cut you off, near-future tech evolution necessary to stay competitive might have very different pricing or availability to you, etc.)
The pricing of this model seems low at the per-token level, but you have to send the entire conversation each time, and the tokens you are billed for include both those you send and the API's response (which you are likely to append to the conversation and send back to them, getting billed again and again as the conversation progresses). By the time you've hit the 4K token limit of this API, there will have been a bunch of back and forth - you'll have paid a lot more than 4K * $0.002/1K for the conversation.
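To make that concrete, here's a sketch of how billed tokens pile up when every turn resends the full history (the per-turn sizes are made-up illustrative numbers):

  PRICE_PER_1K = 0.002   # gpt-3.5-turbo; input and output billed alike

  def conversation_cost(turns, user_tokens=50, reply_tokens=150):
      history, billed = 0, 0
      for _ in range(turns):
          prompt = history + user_tokens    # full history resent as input
          billed += prompt + reply_tokens   # charged for input AND output
          history = prompt + reply_tokens   # reply appended for next turn
      return billed, billed / 1000 * PRICE_PER_1K

  for n in (1, 5, 10, 20):
      tokens, dollars = conversation_cost(n)
      print(f"{n:>2} turns: {tokens:>6} billed tokens -> ${dollars:.4f}")
  # Billed tokens grow roughly quadratically with the number of turns.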
You're right. And this is critical for large texts (summarization, complex prompting, etc.). That's why I'll continue to use text-davinci-xxx for my project.
ChatGPT runs a highly fine-tuned (and pruned) version of `text-davinci-003`, so it's probably much, much smaller and thus cheaper than 003 - possibly as much as 10x cheaper, or at least as cheap as the `text-davinci-002` or earlier models.
To be fair, cost is the only thing prohibiting applications from adopting GPT. Even when GPT-3 was cut to $0.02/1k tokens, it still wasn't economical to use the tech on a daily basis without significant cost. I.e., would you take on an extra $10 a month for each user using your app with GPT-3 capability? Some do, mainly for content generation, but the majority won't.
Seems like we're going to have a vast number of ChatGPT-backed applications coming out in a short period of time.
For B2C applications maybe. But I don’t know many enterprise users who would like to send any of their data to OpenAI. So “enterprise-readiness” would be another big contributor.
It also seems to jeopardize their own ChatGPT Pro offering. It's a matter of time before someone makes a 1:1 clone for either half the money or a usage-based pricing model.
Given how strict OpenAI has been about what you can do with their API in the past and how hard it was to get some legitimate apps through approval, I would imagine they'd just shut this competitor's API access down.
Is it really a lot of jeopardy though? We have to assume that they are pricing the API so that the more it is used, the more money they make.
So actually to me that is arguably a better business model. Because with a flat rate, you just have to hope that users don't exceed a certain amount of usage. And the ones that don't, are not getting a great deal. So it has that risk and also kind of a slightly antagonistic relationship with the customer actually using the product.
Probably bait and switch. They call both ChatGPT, so now people believe they will get the better old ChatGPT, but they get the new, cheap, and worse ChatGPT "Turbo" that they recently switched to. Fewer will realize it if they no longer give you the option to use the legacy version in this API.
They did not release the older more performant model to the API. Please ask them to on the Discord or Twitter. But I think they will not. There is too much demand to handle and the older "less streamlined" models are very problematic for them (based on the fairly constant API/ChatGPT problems and well known incredible demand).
I get the impression that until there is a significant amount of excess capacity, they will not put out new larger/slower models, so the only way you get a better one is if they can still make the next ChatGPT model release just as fast/"lightweight".
My suggestion is to find specific abilities that seem to be lacking in Turbo, and try to get a message to OpenAI staff about it with a request to attempt to improve the next ChatGPT model in that way.
Having said all of that, text-davinci-003 is still available.
I think you will consume a lot of tokens very quickly as you have to send the entire chat history back and forth if you want to append another chat message.
I wish they would offer an uncensored version of it too. Also, I wish they would specify the differences between ChatGPT and GPT-3.5 because one is 10x cheaper than the other but with (supposedly) better chat/coding/summarizing performance. What's the catch?
You can drop sample text in there and visually see how it is split into tokens. The GPT2/3 tokenizer uses about 50k unique tokens that were learned to be an efficient representation of the training data.
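If you'd rather count tokens programmatically than eyeball them, OpenAI's tiktoken library exposes the same BPE vocabularies. A minimal sketch (encoding names as documented at the time):

  import tiktoken  # pip install tiktoken

  enc = tiktoken.get_encoding("gpt2")       # the ~50k-token GPT-2/3 vocabulary
  tokens = enc.encode("This is a massive, massive deal.")
  print(len(tokens), tokens)                # token count + raw token ids
  print([enc.decode([t]) for t in tokens])  # how the text was split up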
> This has to be a loss-leader to lock out competitors before they even get off the ground.
This only a week or two after they were in the news for suggesting that we regulate the hardware required for running these models, in the name of "fighting misinformation". I think they're looking for anything possible to keep their position in the market. Because as other comments have pointed out, there isn't much of a moat.
Good! And welcome back (after keeping your promise to dang for ~611 days [0] (and counting)) :)
If you have time to elaborate on how you make a little money on it at some point, I’m sure lots of people here would love to hear more details and thoughts on that!
If the model was quantized/distilled correctly, not for a large swath of use cases/problem domain. For anything where loss was not measured during distillation, very likely.