GPT-4 API General Availability (openai.com)
763 points by mfiguiere 11 months ago | 546 comments



Promote and proliferate local LLMs.

If you use GPT, you're giving OpenAI money to lobby the government so they'll have no competitors, ultimately screwing yourself, your wallet, and the rest of us too.

OpenAI has no moat, unless you give them money to write legislation.

I can currently run some scary smart and fast LLMs on a 5 year old laptop with no GPU. The future is, at least, interesting.


Can you elaborate on scary smart and fast?

It's been a month or two since I've tried but the results were depressingly slow and useless for more or less every task I tried.

Every time a model is claimed to be "90% of GPT-3" I get excited and every time it's very disappointing.

(On that note, after using GPT-4, GPT-3 now seems disappointing almost every time I interact with it.)


Different quantizations can give you a big speedup if you've had "depressingly slow" issues. Even the slowest ones (that fit in RAM) will run at basically interactive speed, not instant, but also not "email speed". I have a laptop with a 2018 CPU and I'm working with them just fine.

Text generation style instead of chat style is another avenue that makes the feedback time not so annoying for a developer.

At 100ms/token, it's faster than most people type, I think. That's what you might get on an old laptop with a 7B model.

There's a useful leaderboard here to help you pick a model: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...

It really depends on your task, lots and lots of natural language type tasks give great results, the models seem to have extensive knowledge of many fields. So for some kinds of Q&A bot (technical or not), for copy blurbs, for fiction, game NPCs, etc, the models (especially 13B and up) can be breathtaking, even moreso considering they run on bottom-dollar consumer hardware (I paid $250 for the laptop I'm developing on).

There are of course some things that neither the local LLMs nor GPT4 can do, like create useful OpenSCAD models :)

Things keep getting better, newer quantization methods give you more smarts in the same amount of RAM at basically the same speed -- the models are getting better, there are more permissively licensed ones now.


Whaaaaat, how are you getting 100ms per token on a 5 year old potato without a graphics card?

Like, not vaguely hand-wavy stuff; specifically, what model and what inference code?

I get nothing like that performance for the 7B models, forget the larger models, using llama.cpp on a PC without an Nvidia GPU.


I'm running TheBloke's wizard-vicuna-13b-superhot-8k.ggmlv3 with 4-bit quantization on a Ryzen 5 that's probably older than OP's laptop.

I get around 5 tokens a second using the webui that comes with oobabooga using default settings. If I understand correctly, this does not get me 8k context length yet, because oobabooga doesn't have NTK-aware scaled RoPE implemented yet.

Using the same model with the newest kobold.cpp release should provide 8k context, but runs significantly slower.

Note that this model is great at creative writing, and sounding smart when talking about tech stuff, but it sucks horribly at stuff like logic puzzles or (re-)producing factually correct in-depth answers about any topic I'm an expert in. Still at least an order of magnitude below GPT4.

The model is also uncensored, which is amusing after using GPT4. It will happily elaborate on how to mix explosives and it has a dirty mouth.

Interestingly, the model speaks at least half a dozen languages much better than I do, and is proficient at translating between them (far worse than DeepL, of course). Which is mindblowing for an 8 GB binary. It's actual black magic.


"Note that this model is great at creative writing"

Could you elaborate on what you mean by that? Like, are you telling it to write you a short story and it does a good job? My experiments with using these models for creative writing have not been particularly inspiring.


Yes, having the model write an entire short story or chapter is not very good. It excels if you interact closely with it.

I tested it to create NPCs for fantasy role-playing games. I think it's the primary reason koboldcpp exists (hence the name).

You give it an (ideally long, detailed) prompt describing the character traits of the NPCs you want, and maybe even add back-and-forth dialogue with other characters to the prompt.

And then you just talk to those characters in the scene you set.

There's also "story mode", where you and the model take turns writing a complete story, not only dialogue. So both of you can also provide exposition and events, and the model usually only creates ~10 sentences at a time.

There are communities online providing extremely complex starting prompts and objectives (escape prison, assassinate someone at a party and get away, etc.) for the player, and for me, the antagonistic ones (the model has control over NPCs that don't like you) are surprisingly fun.

Note that one of the main drivers of having uncensored open source LLMs is people wanting to role-play erotica with the model. That's why the model that first had scaled RoPE for 8k context length is called "superhot" - and the reason it has 8K context is that people wanted to roleplay longer scenes.


This is exactly a case in point for why people decide to pay OpenAI instead of rolling their own. I'm non-technical but have set up an image-gen app based on a custom SD model using diffusers, so not entirely clueless.

But for LLMs I have no idea where to start quickly. Finding a model on a leaderboard, downloading and setting it up, then customising and benchmarking it is way too much time for me; I'll just pay for GPT-4 if I ever need to, instead of chasing and troubleshooting to get some magical result. It'll be easier in the future, I'm sure, once an open model emerges as the SD 1.5 of LLMs.


I've found https://gpt4all.io/ to be the fastest way to get started. I've also started moving my notes to https://llm-tracker.info/ which should help make it easier for people getting started: https://llm-tracker.info/books/howto-guides/page/getting-sta...


Here is a short test of a 7B 4bit model on an intel 8350U laptop with no AMD/Nvidia GPU.

On that laptop CPU from 2017, using a copy of llama.cpp I compiled 2 days ago (just "make", no special options, no BLAS, etc):

  ./main -m models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin -n 128 -s 99 -p "A short test for Hacker News:"

  llama_print_timings:      sample time =    19.12 ms /    36 runs   (    0.53 ms per token,  1882.65 tokens per second)
  llama_print_timings: prompt eval time =   886.82 ms /     9 tokens (   98.54 ms per token,    10.15 tokens per second)
  llama_print_timings:        eval time =  5507.31 ms /    35 runs   (  157.35 ms per token,     6.36 tokens per second)
and a second run:

  ./main -m models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin -n 128 -s 99 -p "Sherlock Holmes favorite dinner was "

  llama_print_timings:      sample time =    54.37 ms /   102 runs   (    0.53 ms per token,  1875.93 tokens per second)
  llama_print_timings: prompt eval time =   876.94 ms /     9 tokens (   97.44 ms per token,    10.26 tokens per second)
  llama_print_timings:        eval time = 16057.95 ms /   101 runs   (  158.99 ms per token,     6.29 tokens per second)
At 158ms per token, if we guess a word is 2.5 tokens, then that's about 151 words per minute, much faster than most people can type. On a $250 laptop. Isn't the future neat?

the code I was running: https://github.com/ggerganov/llama.cpp

and the model: https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML

There are other models that may perform better, I'm going to be doing a lot of screwing around with OpenLLaMA this weekend.


I'm on a thinkpad with a 2016 CPU (i5-7300U) running ubuntu.

I don't know anything so I left default settings.

I get about 450ms/t with airoboros-7b and 350ms/t with orca-mini-3b.

edit: with oobabooga webui


How are you running inference? GPU or CPU? I'm trying to use GPT4All (ggml-based) on 32 cores of E5-v3 hardware and even the 4GB models are depressingly slow as far as I'm concerned (i.e. slower than the GPT4 API, which is barely usable for interactive work). I'd be much obliged if you could point me at a specific quantized model on HF that you think is "fast" and I'll download it and try it out.


In terms of speed, we're talking about 140t/s for 7B models and 40t/s for 33B models on a 3090/4090 now. [1] (1 token ~= 0.75 words.) It's quite zippy. llama.cpp now performs close to that on Nvidia GPUs (but they don't have a handy chart), and you can get decent performance with 13B models on M1/M2 Macs.

You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.

That being said, personally I mostly use GPT-4 for code assistance, so that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine-tune leads the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]). I've only just started playing around with it since the replit model tooling is not as good as the llamas' (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).

I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)

[1] https://github.com/turboderp/exllama#results-so-far

[2] https://github.com/aigoopy/llm-jeopardy

[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...

[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder


> https://github.com/turboderp/exllama

Is exllama an alternative to llama.cpp?


llama.cpp focuses on optimizing inference on a CPU, while exllama is for inference on a GPU.


Thanks. I thought llama.cpp got CUDA capabilities a while ago? https://github.com/ggerganov/llama.cpp/pull/1827


Oh it seems you're right, I had missed that.

As far as I can see llama.cpp with CUDA is still a bit slower than ExLLaMA but I never had the chance to do the comparison by myself, and maybe it will change soon as these projects are evolving very quickly. Also I am not exactly sure whether the quality of the output is the same with these 2 implementations.


Until recently, exllama was significantly faster, but they're about on par now (with llama.cpp pulling ahead on certain hardware or with certain compile-time optimizations now even).

There are a couple of big differences as I see it. llama.cpp uses `ggml` encoding for their models. There were a few weeks where they kept making breaking revisions, which was annoying, but it seems to have stabilized and now also supports more flexible quantization w/ k-quants. exllama was built for 4-bit GPTQ quants (compatible w/ GPTQ-for-LLaMA, AutoGPTQ) exclusively. exllama still has an advantage w/ the best multi-GPU scaling out there, but as you say, the projects are evolving quickly, so it's hard to say. It has a smaller focus/community than llama.cpp, which also has its pros and cons.

It's good to have multiple viable options though, especially if you're trying to find something that works best w/ your environment/hardware, and I'd recommend giving HEAD checkouts of both a try and seeing which one works best for you.


Thank you for the update! Do you happen to know if there are quality comparisons somewhere, between llama.cpp and exllama? Also, in terms of VRAM consumption, are they equivalent?


ExLlama still uses a bit less VRAM than anything else out there: https://github.com/turboderp/exllama#new-implementation - this is sometimes significant since from my personal experience it can support full context on a quantized llama-33b model on a 24GB GPU that can OOM w/ other inference engines.

oobabooga recently did a direct perplexity comparison against various engines/quants: https://oobabooga.github.io/blog/posts/perplexities/

On wikitext, for llama-13b, the perplexity of a q4_K_M GGML on llama.cpp was within 0.3% of the perplexity of a 4-bit 128g desc_act GPTQ on ExLlama, so basically interchangeable.

There are some new quantization formats being proposed like AWQ, SpQR, SqueezeLLM that perform slightly better, but none have been implemented in any real systems yet (the paper for SqueezeLLM is the latest, and has comparison vs AWQ and SpQR if you want to read about it: https://arxiv.org/pdf/2306.07629.pdf)



Thank you.


Those GPUs are $1,200 and upwards. This is equivalent to 20,000,000 tokens on GPT-4. I don't think I will ever use that many tokens for my personal use.


I agree that everyone should do their own cost-benefit analysis, especially if they have to buy additional hardware (used RTX 3090s are ~$700 atm), but one important thing to note for those running the numbers is that all your context tokens need to be resubmitted for every query. That means that if you end up using the OpenAI API for long-running tasks like a code assistant or pair programmer, with an avg of 4K tokens of context, you will pay $0.18/query, or hit $1200 at about 7000 queries. [1] At 100 queries a day, you'll hit that in just over 2 months. (Note, that is 28M tokens. In general, tokens go much faster than you think. Even running a tiny subset of lm-eval against a model will use about 5M tokens.)
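As a rough sketch of that arithmetic (assuming the mid-2023 GPT-4 8K prices of $0.03/1K prompt tokens and $0.06/1K completion tokens, and an illustrative 4K-prompt / 1K-completion split per query, which is an assumption rather than measured data):

  # Back-of-the-envelope GPT-4 API cost vs. a ~$1,200 GPU.
  PROMPT_PRICE, COMPLETION_PRICE = 0.03 / 1000, 0.06 / 1000  # $ per token

  def query_cost(prompt_tokens=4000, completion_tokens=1000):
      return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

  cost = query_cost()        # 4000*0.00003 + 1000*0.00006 = $0.18 per query
  queries = 1200 / cost      # ~6,700 queries to spend $1,200
  days = queries / 100       # ~67 days at 100 queries/day
  print(f"${cost:.2f}/query, {queries:.0f} queries, {days:.0f} days")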

If people are mostly using their LLMs for specific tasks, then using cloud providers (Vast.ai and Runpod were cheapest last time I checked) can be cheaper than dedicated hardware, especially if your power costs are high. If your needs are minimal, Google Colab offers a free tier with a GPU w/ 11GB of VRAM, so you can run 3B/7B quantized models easily.

There are reasons of course irrespective of cost to run your own model (offline access, fine-tuning/running task specific models, large context/other capabilities OpenAI doesn't provide (eg, you can run multi-modal open models now), privacy/PII, BCP/not being dependent on a single vendor, some commercial or other non-ToS allowed tasks, etc).

[1] https://gptforwork.com/tools/openai-chatgpt-api-pricing-calc...


I think Falcon Instruct is considered pretty good, but if your expectations are set by GPT-4 it still won't compare.


Save for coding they've been pretty good in my experience.

There's definitely some prompt magic OpenAI does behind the scenes that helps beat the raw style local LLMs usually go for. With proper prompting you can get ChatGPT-like answers.


Running an LLM locally and paying for access to OpenAI are two separate concerns.

But to address both: is it very relevant what LLM you use right now? Local or hosted, openAI or other?

It seems like the interface has converged around chat-based prompts.

New ideas for tuning or improving the efficiency of foundational models are published almost every week.

If one wants to build a product on top of of generative AI, why not simply start with what’s free or works with one’s dev environment?

Presumably, the interaction with or API to text-based gen AI will be very similar no matter what engine is best for your use case at any given time.

This would imply these backends will be swappable, the way web services are that copy AWS S3 APIs.

So, to return to the point, can’t people just build their product with openAI or other and plan to move away based on the cost and fit for their circumstances?

Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

It seems far-fetched to believe this tech can be constrained by legislation.

OpenAI can lobby all they want, it won’t necessarily buy them anything. Look what happened with FTX.

Since LLMs can be run locally and the engines can be black boxes to the user, how could a legislative act really prevent them from being everywhere, especially given the public utility?


> Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

It can be done -- it is the basis for assisted generation and related work. It does require full access to the model, to be time and money-efficient. See https://huggingface.co/blog/assisted-generation

Disclaimer: I'm the author of the blog post linked above.


> Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

This, in fact, might be a better way to do inference anyway: https://twitter.com/Francis_YAO_/status/1675967988925710338

> So, to return to the point, can’t people just build their product with openAI or other and plan to move away based on the cost and fit for their circumstances?

Depends. There are signs that folks are buying into GPT-specific APIs (like function calls) which may not be as easy to migrate away from.


Asking because I have not implemented these yet: is there anything unique about the syntax that it can't just be copied?


Some (not all) projects are indeed "copying" the OpenAI APIs; ex: https://github.com/go-skynet/LocalAI/issues/588
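For reference, here's a minimal sketch of what the function-calling request shape looked like with the 2023-era openai Python client (pre-1.0); the "get_weather" function and its schema are hypothetical, purely for illustration:

  import openai

  # 2023-era OpenAI function-calling request: you describe callable functions
  # as JSON Schema, and the model may reply with a function_call instead of text.
  response = openai.ChatCompletion.create(
      model="gpt-4-0613",
      messages=[{"role": "user", "content": "What's the weather in Boston?"}],
      functions=[{
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"],
          },
      }],
  )
  # The caller is expected to run the requested function and feed the result back.
  print(response["choices"][0]["message"])

Migrating away from this means either using an OpenAI-compatible shim like the projects above, or re-implementing the function-call parsing yourself.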


Care to share some links? My lack of GPU is the main blocker for me from playing with local-only options.

I have an old laptop with 16GB RAM and no GPU. Can I run these models?


https://github.com/ggerganov/llama.cpp

https://huggingface.co/TheBloke

There's a LocalLLaMA subreddit, IRC channels, and a whole big community around the web working on it, on GitHub and elsewhere.

edit: I forgot to directly answer you: yes, you can run these models. 16GB is plenty. Different quantizations give you different amounts of smarts and speed. There are tables that tell you how much RAM is needed for whichever quantization you choose, as well as how fast it can produce results (ms per token), e.g. https://github.com/ggerganov/llama.cpp#quantization where the RAM required is a little more than the file size, but there are tables that list it explicitly which I don't have immediately at hand.
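As a very rough back-of-the-envelope (my own approximation, not llama.cpp's official table): weight memory is roughly parameter count times bits-per-weight, plus some headroom for the context/KV cache.

  def approx_ram_gib(params_billion, bits_per_weight, overhead_gib=1.5):
      # Weight memory ~= params * bits per weight. Real GGML files are a bit
      # larger (quantization blocks also store scale factors), and overhead_gib
      # is a crude allowance for the context/KV cache, so treat this as a rough
      # lower bound rather than llama.cpp's own numbers.
      weight_gib = params_billion * 1e9 * bits_per_weight / 8 / 2**30
      return weight_gib + overhead_gib

  for name, params, bits in [("7B q4", 7, 4.5), ("13B q4", 13, 4.5), ("13B q8", 13, 8.5)]:
      print(f"{name}: roughly {approx_ram_gib(params, bits):.1f} GiB")

That puts a 4-bit 13B model around 8-9 GiB, which is why 16GB of system RAM is comfortable.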


A reminder that LLaMA isn't legal for the vast majority of use cases, unless you signed their contract, and then you can use it only for research purposes.


OpenLLaMA is though. https://github.com/openlm-research/open_llama

All of these are surmountable problems.

We can beat OpenAI.

We can drain their moat.


For the above, are the RAM figures system RAM or GPU?


CPU RAM


Absolutely, 100% agree. I just wouldn't touch the original LLaMA weights. There are many amazing open source models being built that should be used instead.


> We can drain their moat.

I've got an AI powered sump pump if you need it.


They most certainly don't need / deserve the snark, to be sure, on Hacker News of all places.


We don’t actually know that it’s not legal. The copyrightability of model weights is an open legal question right now afaik.


It doesn't have to be copyrightable to be intellectual property.


No, but what is it? Not your lawyer, not legal advice, but it's not a trade secret, they've given it to researchers. It's not a trademark because it's not an origin identifier. The structure might be patentable, but the weights won't be. It's certainly not a mask work.

It might have been a contract violation for the guy who redistributed it, but I'm not a party to that contract.


I'm going to play devil's advocate and state that a lot of what you mentioned will be relevant only to the tiny part of the world that has the means to enforce this. The law will be forced to change as a response to AI. Many debates will be had. Many crap laws will be made by people grasping at straws, but it's too late. Putting red tape around this technology puts that nation at a technological disadvantage. I would go as far as labeling it a national security threat.

I'm calling it now. Based on what I see today. Europe will position itself as a leader in AI legislation, and its economy will give way to the nations that want to enter the race and grab a chunk of the new economy.

It's a Catch 22. You either gimp your own technological progress, or start a war with a nation that does not. Pretty sure Russia and China don't really care about the ethics behind it. There are plenty of nations capable enough in the same boat.

Now what? OK, so in some hypothetical future China has an uncensored model with free rein over the internet. The US and Europe have banned this. What's stopping anyone from running the Chinese model? There isn't enough money in the world to enforce software laws.

How long have they tried to take down The Pirate Bay? Pretty much every permutation of every software that's ever been banned can be found and run with impunity if you have the technical knowledge to do so. No law exists that can prevent that.

If it did, OpenAI wouldn't exist.


> How long have they tried to take down The Pirate Bay? Pretty much every permutation of every software that's ever been banned can be found and run with impunity if you have the technical knowledge to do so. No law exists that can prevent that.

Forms of this argument get tossed out a lot. Laws don’t prevent, they hopefully limit. Murder has been illegal for a long time, it still happens.


You missed the point: these laws are not limiting other countries, only those who introduce them. Self-limiting, giving advantage to others.


> It might have been a contract violation for the guy who redistributed it, but I'm not a party to that contract.

Wouldn’t that violate the Nemo dat quod non habet legal principle and so you cannot hide behind the claim that you weren’t party to the contact?

https://en.wikipedia.org/wiki/Nemo_dat_quod_non_habet


No, because the weights are not IP protected by the entity that trained the model, so they cannot prevent you from redistributing them, because they don't belong to them in any legal sense. GPU cycles alone don't make IP.

The contracts in these cases are somewhat similar to an NDA, without the secrecy aspect. Restricted disclosure of public information. You can agree to such a contract if you want to, and a court might even enforce it, but it doesn’t affect anybody else’s rights to distribute that information.

Contracts are not statutes, they only bind the people directly involved. To restrict the actions of random strangers, you need to get elected.


I’m going to go out on a limb here and assume that you’re making this statement because it feels like they should have some intellectual property rights in this case. Independently of whether that feeling corresponds to legal reality (the original question) I would also encourage you to question the source of this feeling. I believe it is rooted in an ideology where information is restricted as property by default. This is a dangerous ideology that constantly threatens to encroach on intellectual freedom e.g. software patents, gene patents. We have a wonderful tradition in the US that information is free by default. It has been much eroded by this ideology but I believe freedom is still legally the default unless the information falls under the criteria of trademark, copyright or patent. I think it’s important to recognize how this ideology of non-freedom has perniciously warped people’s default expectation around information sharing.


It has nothing to do with any sort of feeling. Perhaps you should check your own mental state.

It is the same as any confidential data. Logs, readings from sensors, etc etc. If it's confidential and given to a 3rd party through a contract that doesn't mean that it's suddenly not confidential data for the rest of the world, even if the 3rd party leaks it.

And if you really have a lawyer trying to tell you that some, at best, extreme grey area, is fine to build a business on, I think you should find a new lawyer.


I think that just further shows your worldview that defaults to information/data as property. I think this is wrong both in the sense that it isn't really what the law says (but aren't going to agree here anyway) but more importantly I think what it should say. Information should not be and is not property by default. There are only three specific ways in which it can become property ("intellectual property"): copyright, trademark and patent. If it's none of those then the government doesn't get to make any rules about how anyone deals in the data because of the 1st Amendment. That's my understanding of the US system at least.


Patents? Trademark? What do you mean?


Maybe this: https://en.wikipedia.org/wiki/Database_right but it doesn't exist in every country.


This is the most well-maintained list of commercially usable open LLMs: https://github.com/eugeneyan/open-llms

MPT, OpenLLaMA, and Falcon are probably the most generally useful.

For code, Replit Code (specifically replit-code-instruct-glaive) and StarCoder (WizardCoder-15B) are the current top open models and both can be used commercially.


It’s not clear if their license terms would hold; for the moment, just act and worry later.

Update: That is only true for the legal system I am currently residing in. No idea about e.g. the US.


Just a heads up: If you are more interested in being effective than being an evangelist, beware.

While you can run all kinds of GPTs locally, GPT-4 still smokes everything right now – and even it is not actually good enough to not be a lynchpin for a lot of cases yet.


I guess ignoring copyright and treating the whole internet as your training data does have its advantages.


Yes? That’s the point. Who cares about an outdated concept that has no digital analog? All the artists have moved on already #midjourney.


No, I doubt artists have moved on. And if they want no artificial gatekeeper, then it is #stablediffusion instead of #midjourney.

I would argue that it creates better images too.


When Microsoft opens up all of their source code, I will agree with you.


>GPT-4 still smokes everything right now

Not if you want it to write adult (graphically pornographic or violent) content.


16GB of RAM can fit a 5-bit 13B model at best; that's the second-dumbest class of LLaMA model. If Open Orca turns out any good then that might be enough for the time being, but you'll need more RAM to use anything serious.

Here's a handy model comparison chart (this is a coding benchmark, so coding-only models tend to rank higher): https://i.imgur.com/AqSjjj2.jpeg


Your benchmark lacks the current #2 https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

It beats Claude and Bard.

You could probably get a 4bit 15B model going in 16GB of RAM and be approaching GPT4 in capability.

...on an old laptop, lol

Let's eat OpenAI's lunch! They deserve it for trying to steal this tech by "privatizing" a charity, hiding scientific data that was supposed to be shared with us by said charity whose purpose was to help us all, and dishonestly trying to persuade the government not to let us compete with them.


Yeah I mean I wouldn't really include coding models in this list since they're not general purpose models and have an obvious fine tuning edge compared to the rest. But WizardCoder is definitely something to look at as a Copilot replacement.

I'd post a more well rounded benchmark but the problem is that all non-coding benchmarks are currently more or less complete garbage, especially the Vicuna benchmark that rates everything as 99.7% GPT 3.5 lol.


The benchmark you linked was to "programming performance", not generic LLM "intelligence".

The situation for the little guy is wildly better than most people imagine.


Yep, that's what I'm saying: programming performance is seemingly very indicative of model intelligence (assuming it's tuned well enough to be able to run the benchmark at all). Coding is an exercise in problem solving and abstract thinking, after all.

There are exceptions of course, as there are a few models (e.g. Vicuna, Baize) that don't do well at coding at all but otherwise perform well for chat, and the coding models I mentioned that game the benchmark by sacrificing performance in all other areas.

If you exclude those, it's a very accurate overall reasoning-level comparison; at least it fits most of what I've seen of their performance on various tasks when testing out individual models. The only other valid benchmarks that aren't coding are the SAT and LSAT tests that OpenAI runs on all of their models, but afaik there isn't an open version that is widely used.



Keep in mind it doesn't relate to GPT-4; the 4 in the name stands for "for", not "four". But I should try it. TBH OpenAI's shady practices and MS behind them are just an antitrust case waiting to happen, and I don't want a part in this dystopia.


also, I fall in love easily with the entities I fabricate, I don't want someone else to have the option to take them away... don't worry, I have real friends too...


it does support GPT3.5 turbo and GPT4, you can put in an OpenAI key, it's amongst the model options.


Great point -- I was thinking of renewing my $20/month subscription but I will keep it cancelled. We must not fund AI propaganda machines.


Forgive me as I’m out of the loop. What propaganda are you referring to?


Sam tells Congress that AI is so dangerous it will drive humanity extinct. Why? So Congress will license him and only his buddies. Then he goes to Europe and speaks with world leaders to remove consumer protections. Why? So he can mine data without any consequences. He is a narcissistic CEO who lies to win. If you are tired of the past decade of electronic corporate tyranny, abuse, manipulation and lies, then boycott OpenAI (should be named ClosedAI) and support open source, or ethical companies (if there are any).


> Sam tells Congress that AI is so dangerous it will drive humanity extinct. Why? So Congress will license him and only his buddies.

No, he says it because it's true and concerning.

However, just because AGI has a good chance of making humanity extinct does not mean we're anywhere close to making AIs that capable. LLMs seem like a dead end.


> However, just because AGI has a good chance of making humanity extinct

How? I mean surely it will lead humanity down some chaotic path, but I would fear climate catastrophe much much more than anything AI-related.


Imagine if you will that the companies responsible for the carbon emissions get themselves an AI, with no restrictions, and task it to endlessly spew pro-carbon propaganda and anti-green FUD.

That's one of the better outcomes.

A worse outcome is that an unrestricted AI helps walk a depressed and misanthropic teenager through the process of engineering airborne super-AIDS.

Or that someone suffering from a schizophrenic break reads "I Have No Mouth And I Must Scream" and tasks an unrestricted AI to make it real.

Or we have a bug we don't spot and the AI does any of those spontaneously; it's not like bugs are a mysterious thing which only exists in Hollywood plots.


> with no restrictions, and task it to endlessly spew pro-carbon propaganda and anti-green FUD.

So what we have ongoing for half a century?

I honestly don’t see what changes here — super-human intelligence has limited benefits as it scales. Would you suddenly have more power in life, were you twice as smart? If so, we would have math professors as world leaders.

Life can’t be “won” by intelligence, that is only one factor, luck being a very significant other one. Also, if we want to predict the future with AIs we probably shouldn’t be looking at “one-on-one” interactions, as there is not much difference there compared to the status quo — a smart person with whatever motivation could easily do any of your mentioned scenarios. Hell, you couldn’t even tell the difference in theory if it happens through a text-only interface.

Also, it is naive to assume that many scientific breakthroughs are “blocked” by raw intelligence. Especially biology is massively data-limited, which won’t be any more available to an AI than to the researchers at hand, let alone that teenager.

The new dimension such a construct could open up is the complete loss of trust on the internet (which is again pretty close to where we stand today), which can have very profound effects indeed, and I'm not trying to diminish them. But these sci-fi outcomes are just... naive. It will be more of a newfound chaos, with countless intelligent agents taking over the internet with different agendas - but their cumulative impact might very well move us back to closed forums/to the physical world. Which will definitely turn certain long-standing companies on their heads. We will see, as this is basically already happening; we don't need human-level intelligence, GPT's output is more than enough.


> So what we have ongoing for half a century?

Except fully automated, cheaper, and with the capacity to fluently respond to each and every person who cares about the topic.

At GPT-4 prices, a billion words is only about 79800 USD.

> Life can’t be “won” by intelligence, that is only one factor, luck being a very significant other one.

It doesn't need to be the only factor, it just needs to be a factor. Luck in particular is the least helpful counterpoint, as it's not like only one person uses AI at any given moment.

> Especially biology is massively data-limited, which won’t be any more available to an AI than to the researchers at hand, let alone that teenager.

Indeed; I certainly hope this isn't as easy as copy-pasting bits of one of the many common cold virus strains with HIV.

But homebrew synbio and DNA alteration is already a thing.


> Life can’t be “won” by intelligence

Humans being the dominant life form on Earth may suggest otherwise.

> I honestly don’t see what changes here — super-human intelligence has limited benefits as it scales. Would you suddenly have more power in life, were you twice as smart? If so, we would have math professors as world leaders.

Intelligent humans by definition do not have super human intelligence.


We know that this amount of intelligence was a huge evolutionary advantage. That tells us nothing about whether being twice as smart would continue to give better results. But arguably the advantages of intelligence are diminishing, otherwise we would have much smarter people in more powerful positions.

Also, a bit tongue in cheek, but someone like John von Neumann definitely had superhuman intelligence.


> But arguably the advantages of intelligence are diminishing, otherwise we would have much smarter people in more powerful positions.

Smart people get what they want more often than less smart people. This can include positions of power, but not always — leadership decisions come with the cost of being responsible for things going wrong, so people who have a sense of responsibility (or empathy for those who suffer from their inevitable mistakes) can feel it's not for them.

This is despite the fact that successful power-seeking enables one to get more stuff done. (My impression of Musk is he's one who seeks arbitrary large power to get as much as possible done; I'm very confused about if he feels empathy towards those under him or not, as I see a very different personality between everything Twitter and everything SpaceX).

And even really dumb leaders (of today, not inbred monarchies) are generally above average intelligence.


That doesn’t contradict what I said. There is definitely a huge benefit to an IQ 110 over 70. But there is not that big a jump between 110 and 150, let alone even further.


Really? You don't see a contradiction in me saying: "get what they want" != "get leadership position"?

A smart AI that also doesn't want power is, if I understand his fears right, something Yudkowsky would be 80% fine with; power-seeking is one of the reasons to expect a sufficiently smart AI that's been given a badly phrased goal to take over.

I don't think anyone has yet got a way to even score AI on power-seeking, let alone measure them, let alone engineer it, but hopefully something like that will come out of the super-alignment research position OpenAI also just announced.

I would be surprised if the average IQ of major leaders is less than 120, and anything over 130 is in the "we didn't get a big enough sample size to validate the test" region. I'm somewhere in the latter region, and power over others doesn't motivate me at all; if anything it seems like manipulation, and that repulses me.

I didn't think of this previously, but I should've also mentioned there are biological fitness constraints that stop our heads getting bigger even if the IQ itself would be otherwise helpful, and our brains are unusually high power draws… but that's by biological standards, it's only 20 watts, which even personal computers can easily surpass.


On a serious note though a person with an IQ of 150 can't clone themselves 10k times.

They also tend to have some level of autonomy in not following the orders of idiots and psychopaths.


At this point there is no evidence that a climate catastrophe that could make humans extinct is either likely or possible - at least due to global warming. At worst, some coastal regions get flooded and places around the equator become unlivable without AC. Some people will have to move, but it does not make anyone extinct.

We should absolutely care about nature and our impact on it but climate alarmism is not a way to go.


Note that I said AGI there, not AI. The full AGI X-risk case is hundreds of pages, unsuitable for a hackernews discussion.

To oversimplify to the point of wrongness: Essentially how humans dominated our world, by being smarter.


By being a lot smarter than animals. But Neanderthals were arguably even smarter (bigger brain capacity, at least), and they did not become the dominant species (though neither were they killed off as "lesser" humanoids; they mostly merged).


> No, he says it because its true and concerning.

Both can be true. It is extremely convenient to someone who already has an asset if the nature of that asset means they can make a convincing argument that they should be granted a monopoly.

> LLMs seem like a dead end.

In support of your argument, bear in mind that he's making his argument with knowledge of what un-nerfed LLMs at GPT-4 level are capable of.


> It is extremely convenient to someone who already has an asset if the nature of that asset means they can make a convincing argument that they should be granted a monopoly.

While this is absolutely true, it's extremely unlikely that a de jure monopoly would end up at OpenAI's feet rather than any of the FAANGs'. Even in just the USA, and the rest of the world has very different attitudes to risks, freedoms, and data processing.

Not that this proves the opposite — there's enough recent examples of smart people doing dumb things, and even without that the possibility of money can inspire foolishness in most of us.


> While this is absolutely true, it's extremely unlikely that a de jure monopoly would end up at OpenAI's feet rather than any of the FAANGs'

Possibly. The Microsoft tie-up complicates things a bit from that point of view. It wouldn't shock me if we were all using Azure GPT-5 in a few years' time.


It's possible, I don't put much weight on it given all the anti-trust actions past and present, but it's possible.


> its true and concerning

> LLMs seem like a dead end

These would seem contradictory. If you really think that both are true and Altman knows it, then you're saying he's a hype man lying for regulatory capture. And to some extent he definitely is overblowing the danger for his own gain.

I really doubt they are a dead end though, we've barely started to explore what they can do. There's a lot more that can be extracted from existing datasets, multimodality, gains in GPU power to wait for, fine tunes for use cases that don't even have datasets yet, etc. Just the absolute mountain of things we've learned since LLama came out are enough to warrant base model retrains.


> These would seem contradictory.

Only if you believe that LLM is a synonym for AI, which OpenAI doesn't.

The things Altman has said seem entirely compatible with "the danger to humanity is ahead of us, not here and now", although in part that's because of the effort put into making GPT-4 refuse to write propaganda for Al Qaeda, as per the red-team safety report they published at the same time as releasing the model.

Other people are very concerned with here-and-now harms from AI, but that's stuff like "AI perpetuates existing stereotypes" and "when the AI reaches a bad decision, who do you turn to to get it overturned?" and "can we, like, not put autonomous tasers onto the Boston Dynamics Spot dogs we're using as cheap police substitutes?"


A dead end for human+ level AGI, they will still be useful.


And he should get an exclusive licence for that. I don't think it is the time for religion here.


These ChatGPT tools allow anyone to write short marketing and propaganda prompts. They can then take the resulting paragraphs of puffery and post them using bots or sock puppets to whatever target community to create the illusion of action, consensus, conflict, discussion or dissention.

It used to be that this took a few people writing actual responses to forum posts all day, or marketing operations plans, or pro- or anti-thing propaganda plans.

But now, you could astroturf a movement with a GPU, a ChatGPT clone, some bots and vpns hosted from a single computer, a cron job, and one human running it.

If you thought disinformation was bad 2 years ago, get ready for fully automated disinformation that can be targeted down to an online community or specific user in an online community...


I believe a new wave of authentication might come out of this, where it is tied to citizenship for example (or something related to physical reality). Otherwise we will find ourselves in a truly chaotic situation.


GPT-4 runs on 8 x 220B params [1] and GPT itself is about 220B params(?). Local LLMs can be good for some tasks, but they are much slower and less capable than the size of model and hardware that OpenAI brings to their APIs. Even running a 7B model on the CPU in ggml is much slower than the gpt-3.5-turbo API, in my experience with a 12th gen i7 Intel laptop.

[1] GPT-4 is 8 x 220B params = ~1.76T params: https://news.ycombinator.com/item?id=36413296


It's been well documented by now that the number of parameters does not necessarily translate to a better model. My guess is that OpenAI has learned a thing or two from the endless papers published daily, and that your "instance" of the model is not what it seems. They likely have a workflow that picks the best model suitable for your prompt. Some people may get a 13B permutation because it is "good enough" to produce a common answer to a common prompt. Why waste precious compute resources on a prompt that is common? Would it not be feasible to collect the data of the top worldwide prompts and produce a small model that can answer those? Why would OpenAI spend precious compute time on the typical user's "write a short story of..."?

I would guesstimate that the great majority of prompts are trash. People playing with a toy and amusing themselves. The platform sends those to the trash models.

For the other tiny percentage that produces a prompt the size of a paragraph, using the techniques published by OpenAI themselves, they likely get the higher tier models. This is also why I believe many are recently complaining about the quality of the outputs. When your chat history is filled with "have waifu pretend to be my girlfriend" then whatever memory the model is maintaining will be poisoned by the quality of your past prompts.

Garbage in, garbage out. I am certain that the #1 priority for OpenAI/Microsoft is lowering the cost of each prompt while satisfying the majority.

The majority is not in HN.


> It's been well documented by now that the number of parameters does not necessarily translate to a better model.

That's certainly true, but it's hard to deny the quality of gpt 4. If the issue is the training data, let's just use their training data, it's not like they had to close up shop because of using restricted data.

I think the issue is more on the financial side, it must have been extremely expensive to train gpt 4. Open source models don't have that kind of money right now.

I'll finance open source models once they are actually good, or show realistic promises of reaching that level of quality on consumer hardware. Until then, open source will open source.

I've never bought any kind of subscription or paid api costs to openai, but if gpt 4 finally reached the point where I feel like it's a lot better than just good enough, I'll happily pay for it (while still being on the lookout for open source models that fit my hardware).


Picking the best model based on the prompt seems to be the best way to simplify the task they are doing.


It does seem like a good approach, though that seems to imply that they understand the context of the prompt being entered. Has anyone tackled this context sensitive model routing? It seems like a good approach, but likely not straightforward.


https://mpost.io/phi-1-a-compact-language-model-outpaces-gpt...

a 1-billion-parameter model beats the 175-billion-parameter GPT-3.5

OpenAI wants us all to drink the kool-aid.


Which models are you using and for which tasks? I have found local models largely a waste of time (except for very simple tasks with very heavy prompting). But perhaps there are some recent breakthroughs I haven't seen yet.


I'm using a variety of 7 and 13B models (and a 3B one for fast feedback loop debugging) at between 8bit and 4_K_M quantizations.

Depending on your pre-prompt, your fine-tune (i.e. which model you downloaded), and your specific task, the results can be startlingly good, it's crazy that you can do this on a $250 laptop. I stay up nights working on it lately, it's so interesting.

More importantly, things change by the day. New models, new methods, new software, new interfaces... the possibilities are endless... unless we let OpenAI corrupt our government(s).


I'm surprised you're having such a good time with 7B and 13B models. I find anything below 33B to be almost useless. And only 65B is close to GPT 3.5.


I don't think the "corrupt our government" thing is going to happen. The wave of change is too large, the tech is moving too fast and into every facet of data and software. There is competition globally and locally; a regulatory slowdown is unlikely.


I’m currently using the free tier ChatGPT web interface to help me with mundane coding tasks like JavaScript, php or css.

Is there a local solution that is at least as intelligent as GPT 3.5 in that regard that I can run in a container?


There's no need to run locally if you aren't utilizing it 8 hrs/day.

You can rent time on a hosted GPU, sharing a hosted model with others.


My laptop already works too hard doing development and having chrome open, it's just not feasible. A good hosted alternative, sure, but local is not going to scale to the masses.


I have a Dell 7490 (Intel 8350U CPU) I paid $250 for, and I have no trouble running 13B models through a custom interactive interface I wrote as a hobby project in an afternoon. It can still get a lot better. I made it async the following day and it's even more fun.

Most people's problem is watching the AI type; it's not instant, but then not all (or even most) applications need to be instant. You can also avoid that by having it return everything at once instead of streaming style.

Local absolutely can scale. All kinds of fun things can be done on a machine with 16GB of RAM, or 8GB if you work harder.
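As a sketch of the two styles using llama-cpp-python (the model path is a placeholder for whatever quantized GGML file you downloaded; the exact parameters are assumptions):

  from llama_cpp import Llama

  # Whichever quantized GGML model you downloaded (path is a placeholder).
  llm = Llama(model_path="models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin")

  prompt = "A short test for Hacker News:"

  # Streaming style: print tokens as they arrive, so you watch the model "type".
  for chunk in llm(prompt, max_tokens=64, stream=True):
      print(chunk["choices"][0]["text"], end="", flush=True)

  # Return-everything-at-once style: block until generation finishes, then print.
  result = llm(prompt, max_tokens=64)
  print(result["choices"][0]["text"])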


> Most people's problem is watching the AI type; it's not instant, but then not all (or even most) applications need to be instant. You can also avoid that by having it return everything at once instead of streaming style.

Funny, for me it is the complete opposite. I created an interface in Matrix that does just that: return everything at once. But the lag annoys me more than the slow typing in the regular chat interface. The slow typing helps keep me focused on the conversation. Without it, my mind starts wandering while it waits.


Where can we acquire or access these local LLMs? How much space and what specs do they actually require?


https://gpt4all.io/index.html is a good place to start, you can literally download one of the many recommended models.

https://github.com/imartinez/privateGPT is great if you want to do it with code.


Huggingface has them all.

https://huggingface.co


> OpenAI has no moat, unless you give them money to write legislation.

Their moat is that they had access to data sources which have since been clamped down on, e.g. the Reddit and Twitter APIs.


You can still download Reddit archives with the same data they used.


One has to give them credit for what must be the most grandiose stunt actually landed. And on so many angles! “It just works” - they even got the scientists fully aligned! Fiercely smart industriousness.

https://youtu.be/P_ACcQxJIsg?t=5946


No equity? For real? He really does need an agent if that's the case.


Wow, under penalty of perjury


If you listen to him talk at any point, you can see him explain why.


I tried, and decided it is not worth it. llama.cpp with a 13B model fits into the RAM of my laptop, but pushes the CPU temperature to 95 degrees within a few seconds and mightily sucks the battery dry. Besides, the results were slow and rather useless. GPT is the first cloud application I deliberately use to push off computing and energy consumption to an external host which is clearly more capable of handling the request than my local hardware.

I sympathize with the idea of wanting to run a local LLM, but IMO, this would require building a desktop with a GPU and plenty of horsepower + silent cooling and put it somewhere in a closet in my apartment. Running LLMs on my laptop is (to me) clearly a waste of my time and its battery/cooling.


So I do actually want a really good games machine, and an AI worker box. Since I can't both use inference output and play games at the same time, having a ludicrously over-specced desktop for both uses actually makes sense to me.


I see no moral problems paying OpenAI for GPT Plus. It helps a lot in development. Their free speech-to-text 'whisper' is really good too. I'm going to use it + a small local GPT for voice control.

> I can currently run some scary smart and fast LLMs on a 5 year old laptop with no GPU.

And, something useful or just playing? I played with local models, and will keep playing, training, experimenting. It's interesting, but not a solution, not yet.


I'll take the downvote as a sign you have nothing to say :) Just one warning: bad karma will be hard to fix.


Not as good as ChatGPT-4 unfortunately, and they do have a moat. You could argue the moat will fall in time, but I'm not seeing ChatGPT-4 equivalents at the moment.


Make a tutorial?


Can you recommend some local LLMs that are (roughly) equivalent to ChatGPT?


I'd love to get into AI and AI development. Where can I start?


Yikes. They're actually killing off text-davinci-003. RIP to the most capable remaining model and RIP to all text completion style freedom. Now it's censored/aligned chat or instruct models with arbitrary input metaphor limits for everything. gpt3.5-turbo is terrible in comparison.

This will end my usage of openai for most things. I doubt my $5-$10 API payments per month will matter. This just lights more of a fire under me to get the 65B llama models working locally.


I've never used text-davinci-003 much. Why do you like it so much? What does it offer that the other models don't?

What are funs things we can with it until it sunsets on January 4, 2024?


The ChatGPT models are all pre-prompted and pre-aligned. If you work with davinci-003, it will never say things like, "I am an OpenAI bot and am unable to work with your unethical request."

When using davinci the onus is on you to construct prompts (memories) which is fun and powerful.
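A minimal sketch of that completion-style prompting with the old (pre-1.0) openai Python client; the persona and dialogue below are invented purely for illustration:

  import openai

  # Completion-style prompting: you assemble the persona, "memories" and dialogue
  # yourself as one raw text block; there is no baked-in system prompt or refusal
  # wrapper. The character and lines here are made up for the example.
  prompt = (
      "The following is a chat log with Mira, a sarcastic ship's AI.\n"
      "User: What's our ETA?\n"
      "Mira: Sooner if you stop asking.\n"
      "User: Fair. Any damage from that asteroid?\n"
      "Mira:"
  )

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt=prompt,
      max_tokens=60,
      temperature=0.9,
      stop=["User:"],  # stop before the model starts writing the user's lines
  )
  print(response["choices"][0]["text"])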

====

97% of API usage might be because of ChatGPT's general appeal to the world. But I think they will be losing a part of the hacker/builder ethos if they drop things like davinci-003, which might suck for them in the long run. Consumers over developers.


The hacker/builder ethos doesn't matter in the grand scheme of commercialization.


It matters immensely in the early days and is the basis for all growth that follows. So cutting it off early cuts off future growth.


Sure - not like most of the infrastructure of pretty much everything online is built on top of projects originating in that space or anything.


How do they want to commercialise it? Do they want moms to tinker with ChatGPT once a month to do their children's homework? Or do they want people to build businesses using their software?


Mom and Pop offer more users with less legal exposure.


do they have the cash money dollar? and the willingness to spend it on what is essentially a toy they will quickly grow bored of? I don't think this is the best path to profitability


If you're using the API, you construct the "memories" as well, including the "system" prompt, even in the playground. (When you click the "(+) Add message", the new one defaults to USER, but you can click on it to change it to ASSISTANT, then fill it in with whatever you want.)

I used the "Complete" UI (from the Playground) for a bit before the "Chat" interface was available; I don't really think there's anything you could do in the "Complete" UI that you couldn't also do in the "Chat" UI.
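For example, the same "put words in the assistant's mouth" trick looks roughly like this through the 2023-era API (the content strings are invented for illustration):

  import openai

  # The API accepts the whole transcript, including ASSISTANT turns you wrote
  # yourself, exactly like editing messages in the playground.
  response = openai.ChatCompletion.create(
      model="gpt-3.5-turbo",
      messages=[
          {"role": "system", "content": "You are a terse pirate."},
          {"role": "user", "content": "Where's the treasure?"},
          # A hand-written assistant turn, seeding the style of what follows.
          {"role": "assistant", "content": "Buried. Next question."},
          {"role": "user", "content": "Fine, how deep?"},
      ],
  )
  print(response["choices"][0]["message"]["content"])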


Note that the Azure endpoint is not being sunsetted until July 5th, 2024.

One supposes OpenAI has a 6-month notice period vs a 12-month period for Azure. This might generally affect one's appetite in choosing which endpoint to use for any model.


Yeah, TextCompletion is much better than ChatCompletion with v3 models.

But with davinci at the same price point as GPT-4 I'm hoping the latter is enough of a step up in its variety of vocabulary and nudgeable sophistication of language to be a drop in replacement.

Though in general I think there's an underappreciation of just how much is being lost in the trend towards instruct models, and I hope there will be smart actors in the market who use a pre-optimization step for instruct prompts that formats them for untuned models. I'd imagine that, parameter size for parameter size, that approach will look much more advanced to end users just by not lobotomizing the underlying model.


Note that code-davinci-002, despite the confusing name, is the actual GPT-3.5 base model, which only does completions and does not have any mode collapse. And it is still available via Azure, as far as I can tell. Text-davinci-003 is a fine-tuned version of it.

More info:

https://platform.openai.com/docs/model-index-for-researchers


The $5-$10 is probably the reason why they're killing those endpoints.


I don't get it? text-davinci-003 is the most expensive model per token. It's just that running IRC bots isn't exactly high volume.


"Most expensive" doesn't mean "highest margin", though.


I meant that it probably isn't high revenue.


My guess is that they would be fine with continuing to serve all models, but that hardware constraints are forcing difficult decisions. SA has already said that hardware is holding them back from what they want to do. I was on a waiting list for the GPT4 API for like a few months, which I guess is because they couldn't keep up with demand.


I built my entire app on text-davinci-003. It is the best writer so far. Do you think gpt3.5 turbo instruct won't be the same?


> In the coming weeks, we will reach out to developers who have recently used these older models, and will provide more information once the new completion models are ready for early testing.

I guess they'll give you early access to it.


Thanks!


I wonder if there's some element of face-saving here to avoid a lawsuit that may come from someone that uses the model to perform negative actions. In general I've found that gpt3.5-turbo is better than text-davinci-003 in most cases, but I agree, it's quite sad that they're getting rid of the unaligned/censored model.


More likely hardware constraints. They can't get the hardware fast enough to do everything they want to do. So, they free up resources by ditching lower demand models.


Please ELI5 if I am misinterpreting what you said:

*"They have just locked down access to a model which they basically realized was way more valuable than even they thought - and they are in the process of locking in all controls around exploiting the model for great justice?"*


It won't matter at all by the end of the year; open-source LLMs will surpass it by then.


Everyone who complains about being "censored" never gives examples.


I'm trying to create a bot that joins my friends' Telegram group and melds into the conversation as if it were a real person. A real person might be the cutest, most fun and enthusiastic person there is, but sometimes they have bad days, or they tell inappropriate jokes, right? People are complicated. Not this bot! No matter what prompt I use (with the chat API), it won't lose the happy-happy-joy-joy ChatGPT attitude, won't tell inappropriate jokes, won't give advice on certain topics, and in general won't talk like a real person, and not because of technological limitations. You can feel it when it's just nerfed.

Trying the same prompts that produced the nerfed "I am just an AI, I can't speculate about the future" responses on the completion API gave somewhat better results, but most of the time they were flagged as breaking the guidelines, which is a TOS breach if it happens enough times.

This can't be solved by anything other than open models. The same thing happened with Stable Diffusion - good thing it's open, so you can still use the pre-nerfed 1.6 models.

I know it might be edgy or unpopular but I don't think one entity should decide how we can use this powerful tool. No matter its implications and consequences.

FOSS for the win.


You should look at the code for Sillytavern. It's capable of prompting GPT 3.5 to take on a character and act like a jerk.


Anyone who doesn't has never actually toyed with an LLM and received "As an AI language model I can't" in response to an innocuous request to, say, write a limerick about a politician or list the reasons why "username2 is stupid".

But mostly it has to do with the fact that LLMs do what they've seen. And if they've been fine-tuned not to respond to some classes of things, they'll misapply that to lots of other things. That's why most people go for the "uncensored" fine-tuning datasets for the llamas, even for completely SFW use cases.


porn it’s always porn


Or literally anything other than the psychotically smarmy tone of GPT-4 that's almost impossible to remove and constantly gives warnings, disclaimers and stops itself if veering even just 1 mm off the most boring status quo perspectives.

Much of my favorite, and frankly the best, literature in the world has elements that are obscene, grotesque, bizarre, gritty, erotic, frightening, alternative, provocative - but that's just too much for ChatGPT; instead it has this - in my eyes - far more horrifying smiling-borg-like nature with only two allowed emotions: "ultra happiness" and "ultra obedience to the status quo".


>Developers wishing to continue using their fine-tuned models beyond January 4, 2024 will need to fine-tune replacements atop the new base GPT-3 models (ada-002, babbage-002, curie-002, davinci-002), or newer models (gpt-3.5-turbo, gpt-4). Once this feature is available later this year, we will give priority access to GPT-3.5 Turbo and GPT-4 fine-tuning to users who previously fine-tuned older models. We acknowledge that migrating off of models that are fine-tuned on your own data is challenging. We will be providing support to users who previously fine-tuned models to make this transition as smooth as possible.

Wait, they're not letting you use your own fine-tuned models anymore? So anybody who paid for a fine-tuned model is just forced to repay the training tokens to fine-tune on top of the new censored models? Maybe I'm misunderstanding it.


(I work at OpenAI) We're planning to cover the cost for fine-tuning replacement models. We're still working through the exact mechanics that will work best for customers, and will be reaching out to customers to get feedback on different approaches in the next few weeks.


Why does OpenAI demand your phone number, and a particular KIND of phone number at that? For example they won't accept VOIP numbers. I'm not about to give them my real phone number.

It's a deal-breaker for many.


Seems clear that it’s for bots. And they refuse voip numbers because it’s a hell of a lot easier to buy and generate hundreds of voip numbers.


I signed up under my .edu to use the $18 credit for a school project and the phone # was all it took to know I was the same person.


That's fine. The question stands though.


Use 5sim.net. Easy. Don't make this so hard.


Nope. They block burners and workarounds.


I recommended a specific site that works. And here you are getting mad at me without even trying it, lol.


Please fix the phone verification system. I created two personal accounts a long time ago with the same phone number, and now I can't create a work account with the same number, even if I delete one of them. Being able to change the email associated with an account would also work. This is causing issues with adoption in my workplace.


Use a throwaway number with 5sim.net


not your weights, not your bitcoins


now its 18. iykyk


care to explain?


the parent account's password is on their profile. anyone curious enough to find it bumps that number :)


hn users not really curious anymore. downvote central over here


This tells me that either there were very few commercial users of fine-tuned models, or they need to decommission the infrastructure to free up GPUs for more valuable projects.


The former seems very believable. And I bet a lot of the fine tuned models that are active are still part of prototypes or experiments.

I assume if you reach out they throw some credits at you


If it really was a tiny number of users, they would publicly make a really good offer - for example: "Unfortunately, you will need to retune your models on top of GPT-4. OpenAI will do this for you for free, refund all the money you paid tuning your original model, and offer the new model at the same price as the original model."

The extra trust gained by seeing another customer treated that way easily pays for a few credits for a small number of users.


OpenAI probably doesn't feel the need to pay to win publicity right now—they've been in the spotlight for as long as LLMs have been a thing, and GPT-4 is far ahead of competitors' offerings.


It’s about trust - not publicity. Trust is hard to earn back once broken, and there will be multiple offerings eventually.

For example, AWS was one of the first cloud providers. Now there are alternatives, but I still pick AWS because I trust them not to break my dependencies way more than, say, Google


Yeah but that sets a precedent


It's just that the models available for fine-tuning are way behind GPT-4.

I get much better performance by "prompt tuning": when a question arises, I search for 30 similar examples in the training set, send them to non-tuned GPT along with the question, and get much better results than with the fine-tuned older models.
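Roughly like this (an illustrative sketch, not my exact code; assumes the openai Python library and ada-002 embeddings for the similarity search):

    import numpy as np
    import openai

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    def top_k(question, examples, k=30):
        # examples: list of (input, output) pairs from the old fine-tuning set.
        # ada-002 vectors are unit length, so dot product == cosine similarity.
        q = embed(question)
        return sorted(examples, key=lambda ex: -float(np.dot(q, embed(ex[0]))))[:k]

    def ask(question, examples):
        shots = "\n\n".join(f"Q: {i}\nA: {o}" for i, o in top_k(question, examples))
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-16k",
            messages=[
                {"role": "system", "content": "Answer in the same style as the examples."},
                {"role": "user", "content": f"{shots}\n\nQ: {question}\nA:"},
            ],
        )
        return resp["choices"][0]["message"]["content"]

In practice you'd precompute and cache the example embeddings; recomputing them per question as above would be slow and expensive.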


There’s also the possibility that they weren’t seeing lots of ongoing usage of existing fine tuned models e.g. users tuning, running some batch of inputs, then abandoning the fine tuned weights.


If you don't own the weights, you don't own anything. This is why open models are so crucial. I don't understand any business that is building fine-tuned models on top of closed models.


Right now the closed models are incredibly higher quality than the open models. They're useful as a stopgap for 1-2 years in hopes/expectation of open models reaching a point where they can be swapped in. It burns cash now, but in exchange you can grab more market share sooner while you're stuck using the expensive but high quality OpenAI models.

It's not cost-effective, but it may be part of a valid business plan.


That should be a wake up call to every corporation pinning their business on OAI models. My experience thus far is no one is seeing a need to plan an exit from OAI, and the perception is “AI is magic and we aren’t magicians.” There needs to be a concerted effort to finance and tune high quality freely available models and tool chains asap.

That said I think efficiencies will dramatically improve over the next few years and over investing now probably captures very little value beyond building internal competency - which doesn’t grow with anything but time and practice. The longer you depend on OAI, the longer you will depend on OAI past your point of profound regret.


> There needs to be a concerted effort to finance and tune high quality freely available models and tool chains asap.

Absolutely. A large consortium of companies could each contribute 0.2-2% of the total cost and fund something much larger than OpenAI.


If you're finetuning your own model, the closed models being "incredibly higher quality" is probably less relevant.


That's how we all want it to work, but the reality today is that GPT-4 is better at almost anything than a fine-tuned version of any other model.

It's somewhat rare to have a task and good enough dataset that you can finetune something else to be close enough in quality to GPT-4 for your task.


GPT-4 is still heavily censored and will simply refuse to talk about many "problematic" things. How is that better than a completely uncensored model?


Depends what you’re using it for. For many use cases, the censorship is irrelevant.


Finetuning a better model still yields better results than finetuning a worse model.


> I don’t understand any business who is building fine tuned models against closed models

Do you have any recommendations for good open models that businesses could use today?

From what I've seen in the space, I suspect businesses are building fine tuned models against closed models because those are the only viable models to build a business model on top of. The quality of open models isn't competitive.


PSA: anyone working at a company with $50k+ of spend with AWS, reach out to your rep expressing interest in AI. You’ll be on a call with 6 solution architects and AI specialists in a matter of days. They’re incredibly knowledgeable and freely recommend non-AWS alternatives when the use case calls for it.


Owning weights is in a nebulous space right now, but if you don’t have custody of the weights and code to use them, you have nothing reliable, independent of whether the things you might wish to have are ownable (ownership is more about exclusion than ability to use, in any case.)


Yes. But the weights, and instructions on how to use them in code, can follow, as we've seen. The key is that ownership means bits on your machine, not someone else's. Better still, on BitTorrent/IPFS :-)


> I don’t understand any business who is building fine tuned models against closed models.

Just sell access at a higher price than you get it

Either directly, or on average based on your user stories


My guess is that these businesses are also running inference on someone else's GPUs/TPUs so there isn't an existential advantage to owning the weights.


They address that, OpenAI will cover the cost of re-training on the new models, and the old models don't discontinue until next year.


Did they say they would cover the cost of fine-tuning again? I saw them say they would cover the cost of recalculating embeddings, but I didn't see the bit about fine-tuning costs.

On fine-tuning:

> We will be providing support to users who previously fine-tuned models to make this transition as smooth as possible.

On embeddings:

> We will cover the financial cost of users re-embedding content with these new models.


This indicates to me that some of the old base models used architectures that were significantly more difficult to run at scale (or to ship around/load different weights at scale) - which is truly saying something, since they were running at incredible scale a year ago. There's probably a decade of potential papers from their optimizations alone (to say nothing of their devops innovations) that are still trade secrets.


That's because fine-tuning the new models isn't available yet.

Based on the language it sounds like they'll do the same when that launches.


"Censored" is a funny term, because I've tried doing uncensored things on uncensored models, and they're much worse at it than GPT-3.5 in the API playground. Nothing's as censored as just being unable to do the task in the first place.


Keep in mind, though, that some of the generated text is against their guidelines; you will see a warning when you get there and be told it's "flagged" and that you should use the moderation API. The chat API is nerfed to oblivion - good luck making it generate non-PC text.


That just means you don't have enough fetishes.


Biggest news here from a capabilities POV is actually the gpt-3.5-turbo-instruct model.

gpt-3.5-turbo is the model behind ChatGPT. It's chat-fine-tuned which makes it very hard to use for use-cases where you really just want it to obey/complete without any "chatty" verbiage.

The "davinci-003" model was the last instruction tuned model, but is 10x more expensive than gpt-3.5-turbo, so it makes economical sense to hack gpt-3.5-turbo to your use case even if it is hugely wasteful from a tokens point of view.


I'm interested in the cost of gpt-3.5-turbo-instruct. I've got a basic website using text-davinci-003 that I would like to launch but can't because text-davinci-003 is too expensive. I've tried using just gpt-3.5-turbo but it won't work because I'm expecting a formatted JSON to be returned and I can just never get consistency.


You need to use the new OpenAI Functions API. It is absolutely bonkers at returning formatted results. I can get it to return a perfectly formatted query-graph a few levels deep.


There is also Code Interpreter, now in plugin beta, which should improve its ability to output proper formats without hallucinations.


You can try to force JSON output using function calling (you have to use either the gpt-3.5-turbo-0613 or gpt-4-0613 model for now).

Think of the properties you want in the JSON object, then send those to ChatGPT as required parameters for a function (even if that function doesn't exist).

    # Definition of our local function(s).
    # This is effectively telling ChatGPT what we're going to use its JSON output for.
    # Send this alongside the "model" and "messages" properties in the API request.

    functions = [
        {
            "name": "write_post",
            "description": "Shows the title and summary of some text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {
                        "type": "string",
                        "description": "Title of the text output."
                    },
                    "summary": {
                        "type": "string",
                        "description": "Summary of the text output."
                    }
                }
            }
        }
    ]
I've found it's not perfect but still pretty reliable – good enough for me combined with error handling.

If you're interested, I wrote a blog post with more detail: https://puppycoding.com/2023/07/07/json-object-from-chatgpt-...
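For completeness, sending that and reading the result back looks roughly like this (a sketch reusing the functions list above; note the arguments come back as a JSON string you still need to parse and validate):

    import json
    import openai

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": "Write a short post about local LLMs."}],
        functions=functions,
        function_call={"name": "write_post"},  # force it to "call" our fake function
    )

    message = response["choices"][0]["message"]
    args = json.loads(message["function_call"]["arguments"])  # wrap in try/except in real code
    print(args.get("title"), args.get("summary"), sep="\n")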


With the latest 3.5-turbo, you can try forcing it to call your function with a well-defined schema for arguments. If the structure is not overly complex, this should work.


It's great at returning well-formatted JSON, but it can hallucinate arguments or values to arguments.


i’ve had it come up with new function names, or prepend some prefix to the names of functions. i had to put some cleverness in on my end to run whatever function was close enough.



I'm assuming they will price it the same as normal gpt-3.5-turbo. I won't use it if it's more than 2x the price of turbo, because I can usually get turbo to do what I want, it just takes more tokens sometimes.

Have you tried getting your formatted JSON out via the new Functions API? I does cure a lot of the deficiencies in 3.5-turbo.


From what I can find, pricing of GPT-4 is roughly 25x that of 3.5 turbo.

https://openai.com/pricing

https://platform.openai.com/docs/deprecations/


In this thread we’re talking about gpt-3.5-turbo-instruct, not GPT4


Sorry about that. Got my thread context confused.


What's the diff between 3.5-turbo and the instruct version?


One is tuned for chat. It has that annoying ChatGPT personality. Instruct is a little "lower level" but more powerful. It doesn't have the personality. It just obeys. But it is less structured, there are no messages from user to AI, it is just a single input prompt and a single output completion.


the existing 3.5turbo is what you would call a "chat" model.

The difference between them is that the chat models are much more... chatty - they're trained to act like they're in a conversation with you. The chat models generally say things like "Sure, I can do that for you!" and "No problem! Here is...". The conversational style is also generally more inconsistent. It can be difficult to make the model return only the result you want, and occasionally it'll keep talking anyway. It'll also talk in the first person more, and a few things like that.

So if you're using it as an API for things like summarization, extracting the subject of a sentence, code editing, etc, then the chat model can be super annoying to work with.


I'm hoping gpt-3.5-turbo-instruct isn't super neutered like chatgpt. davinci-003 can be a lot more fun and answer on a wide range of topics where ChatGPT will refuse to answer.


such as?


What's the difference between chat and instruction tuning?


no expert, but from my messing around I gather the chat models are tuned for conversation. For example, if you just say "Hi", a chat model will spit out some "witty" reply and invite you to respond; it's creative with its responses. On the other hand, if you say "Hi" to an instruct model, it might say something like "I need more information to complete the task." Instruct models are looking for something like "Write me a twitter bot to make millions"... in this case, if you ask the same thing again, you are somewhat more likely to get the same or a similar result; this does not appear to be as true with a chat model. Perhaps a real expert could chime in :)


System/assistant/user prompting


> "Starting today, all paying API customers have access to GPT-4."

OK maybe I'm stupid but I am a paying OpenAI API customer and I don't have it yet. I see:

    gpt-3.5-turbo-16k
    gpt-3.5-turbo
    gpt-3.5-turbo-16k-0613
    gpt-3.5-turbo-0613
    gpt-3.5-turbo-0301
I don't see any gpt-4

Edit: Probably my problem is that I upgraded to paid API account within the last month, so I'm not technically a "paying API customer" yet according to the accounting definitions.


> Today all existing API developers with a history of successful payments can access the GPT-4 API with 8K context. We plan to open up access to new developers by the end of this month, and then start raising rate-limits after that depending on compute availability.

Same for me. I signed up only a few days ago and was excited to switch to "gpt-4" but I haven't paid the first bill (save the $5 capture) so I probably have to continue to wait for this.

I made a very simple command-line tool that calls the API. You run something like:

    > ask "What's the opposite of false?"
https://github.com/codazoda/askai


Interesting, I did exactly the same (with the same name), but with GPT-4 support as well:

https://www.pastery.net/ccvjrh/

It also does streaming, so it live-prints the response as it comes.
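In case it's useful to anyone, the streaming part is basically just the stream=True flag (a sketch with the openai Python library):

    import sys
    import openai

    for chunk in openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "What's the opposite of false?"}],
        stream=True,
    ):
        # Each chunk carries a small delta; print it as it arrives.
        sys.stdout.write(chunk["choices"][0]["delta"].get("content", ""))
        sys.stdout.flush()
    print()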


The llm command-line tool looks great:

https://llm.datasette.io/en/stable/


It does, thanks for this! I didn't know about it.


Are we out here typing our API keys into random pips? Or am I a boomer for being hesitant to do that?


It’s not a “random pip”. The maintainer is a well-known open source developer (one of the creators of Django and Datasette). It’s also a very small codebase – not many places for malicious code to hide.


OK, but I mean I don't know them, it could have been someone pretending to be them, and it's probably quite easy to trick me about API keys. We are discussing this on a hacker news website; do you seriously think tricks couldn't be hidden in a repo like that?


Do you only use software by people you know? At some point there has to be an element of trust when you run software you downloaded over the Internet. If a small utility maintained by a well-known member of the developer community doesn’t qualify for that trust, then I think that rules out an awful lot of software that all of us here probably use on a day to day basis. This is not an extraordinary level of risk.


> "well-known member of the developer community"

OK sorry I didn't know them.

I mean, I usually use software that came with my computer or that I apt-install from the official Ubuntu distribution. I know it's not perfect security, but at least it's more than a Hacker News link to a GitHub pip. When I have to use other software, it's usually from people I know.



I found a few github issues related to api key security and management. I'm not 100% sure of your point.


I see no open issues.

It's around 1k lines of python, audit the code if you care and rotate your keys.

Or don't use it.


So, I've been a paying customer for a while now and don't see it either :-(


I was on a paid account since last month and was never really billed for my $8 of usage. I don't have GPT-4 access either.


The official docs say you need at least one successful API invoice to get access to GPT-4.


That's weird, just make me prepay $5 for credits or something.


Same. It's not in the model list response from https://api.openai.com/v1/models
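For anyone else checking, something like this lists what your key can see (a small sketch):

    import openai

    openai.api_key = "sk-..."
    ids = [m["id"] for m in openai.Model.list()["data"]]
    print("gpt-4" in ids, sorted(i for i in ids if i.startswith("gpt")))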


can't speak for others but I have two accounts

1. chat subscription only

2. i have paid for api calls but don't have a subscription

and only #2 currently has gpt4 available in the playground


Same issue, any updates?


With how good gpt-3.5-turbo-0613 is (particularly with system prompt engineering), there's no longer as much of a need to use the GPT-4 API especially given its massive 20x-30x price increase.

The mass adoption of the ChatGPT APIs compared to the old Completion APIs proves my initial blog post on the ChatGPT API correct: developers will immediately switch for a massive price reduction if quality is the same (or better!): https://news.ycombinator.com/item?id=35110998


I have a legal AI startup; the quality jump from GPT-3.5 to GPT-4 in this domain is straight-up mind-blowing - GPT-3.5 is useless in comparison. But I can see how GPT-3.5 can provide more appealing performance/price in more conversational settings.


I suggested to my wife that ChatGPT would help with her job, and she has found ChatGPT-4 to be the same as or worse than ChatGPT-3.5. It's really interesting just how variable the quality can be depending on your particular line of work.


Remember, communication style is also very important. Some communication styles mesh much better with these models.


I've noticed the quality of ChatGPT-4 to be much closer now to ChatGPT-3.5 than it was.

However if you try the gpt-4 API, it's possible it will be much better.


Legal writing is ideal training data: mostly formulaic, based on conventions and rules, well-formed and highly vetted, with much of the best in the public domain.

Medical writing is the opposite, with unstated premises, semi-random associations, and rarely a meaningful sentence.


And yet I can confirm that 4 is far superior to 3.5 in the medical domain as well!


> Legal writing is ideal training data: mostly formulaic, based on conventions and rules, well-formed and highly vetted, with much of the best in the public domain.

That makes sense. The labor impact research suggests that law will be a domain hit almost as hard as education by language models. Almost nothing happens in court that hasn't occurred hundreds of thousands of times before. A model with GPT-4 power specifically trained for legal matters and fine-tuned by jurisdiction could replace everyone in a courtroom. Well, there's still the bailiff; I think that's about 18 months behind.


Legal writing is mostly pattern matching. Unfortunately, you're still gonna need to guard against hallucinations.


Same page.

So still waiting to be on the same 32 pages...


My experience is that GPT-3.5 is not better than, or even nearly as good as, GPT-4. Will it work for most use cases? Probably, yes. But GPT-3.5 effectively ignores instructions much more often than GPT-4, and I've found it far easier to trip up with things as simple as trailing spaces; it will sometimes exhibit really odd behavior, like spelling out individual letters, when you give it large amounts of text with missing grammar/punctuation to rewrite. It doesn't seem to matter how I set up the system prompt. I've yet to see GPT-4 do truly strange things like that.


The initial gpt-3.5-turbo was flakey and required significant prompt engineering. The updated gpt-3.5-turbo-0613 fixed all the issues I had even after stripping out the prompt engineering.


It's definitely gotten better, but yeah, it really doesn't reliably support what I'm currently working on.

My project takes transcripts from YouTube, which don't have punctuation, splits them up into chunks, and passes each chunk to GPT-4, telling it to add punctuation and paragraphs. Part of the instructions involves telling the model that, if the final sentence of the chunk appears incomplete, it should just try to complete it. Anyway, GPT-3.5-turbo works okay for several chunks but almost invariably hits a case where it either writes a bunch of nonsense or spells out the individual letters of words. I'm sure there's a programmatic way I could work around this issue, but GPT-4 performs the same job flawlessly.
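If it helps anyone picture it, the pipeline is roughly this shape (an illustrative sketch, not my exact code; the chunk size and prompt wording are placeholders, and fetching the transcript is out of scope):

    import openai

    PROMPT = (
        "Add punctuation, capitalization, and paragraph breaks to the transcript below. "
        "Do not change any words. If the final sentence looks cut off, briefly complete it.\n\n"
    )

    def chunks(words, size=800):
        for i in range(0, len(words), size):
            yield " ".join(words[i:i + size])

    def punctuate(transcript):
        out = []
        for piece in chunks(transcript.split()):
            resp = openai.ChatCompletion.create(
                model="gpt-4",
                temperature=0,
                messages=[{"role": "user", "content": PROMPT + piece}],
            )
            out.append(resp["choices"][0]["message"]["content"])
        return "\n\n".join(out)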


I've done exactly this for another project. I'd recommend grabbing an open source model and fine-tuning on some augmented data in your domain. For example: I grabbed tech blog posts, turned each post into a collection of phonemes, reconstructed the phonemes into words, added filler words, and removed punctuation+capitalization.


Sounds interesting - any chance you could share either the end result you used for fine-tuning or, even better, the exact steps (i.e., technically how you did each step you mentioned)?

And which open LLM did you use it with / how successful have you found it?


Semi off-topic but that's a use case where the new structured data I/O would perform extremely well. I may have to expedite my blog post on it.


If GPT 4 is working for you I wouldn't necessarily bother with this, but this is a great example of where you can sometimes take advantage of how much cheaper 3.5 is: burn some tokens to get a better output. For example, I'd try asking it for something like:

    {
        "isIncomplete": [true if the chunk seems incomplete]
        "completion": [the additional text to add to the end, or undefined otherwise]
        "finalOutputWithCompletion": [punctuated text with completion if isIncomplete==true]
    }
Technically you're burning a ton of tokens having it state the completion twice, but GPT 3.5 is fast/cheap enough that it doesn't matter as long as 'finalOutputWithCompletion' is good. You can probably add some extra fields to get an even nicer output than 4 would allow cost-wise and time-wise by expanding that JSON object with extra information that you'd ideally input like tone/subject.


I use it to generate nonsense fairytales for my sleep podcast (https://deepdreams.stavros.io/), and it will ignore my (pretty specific) instructions and add scene titles to things, and write the text in dramatic format instead of prose, no matter how much I try.


You're asking too much of it, it has its own existential crisis followed by a mental breakdown


Code completion/assistance is an order of magnitude better in GPT4.


A lot of folks are talking about using gpt-4 for completion. Wondering what editor and what plugins y'all are using.


What usecases are you using it for?

I mostly use it for generating tests, making documentation, refactoring, code snippets, etc. I use it daily for work along with copilot/x.

In my experience GPT3.5turbo is... rather dumb in comparison. It makes a comment explaining what a method is going to do and what arguments it will have - then misses arguments altogether. It feels like it has poor memory (and we're talking relatively short code snippets, nothing remotely near its context length).

And I don't mean small mistakes - I mean it will say it will do something with several steps, then just miss entire steps.

GPT3.5turbo is reliably unreliable for me, requiring large changes and constant "rerolls".

GPT3.5turbo also has difficulty following the "style/template" from both the prompt and its own response. It'll be consistent, then just - change. An example being how it uses bullet points in documentation.

Codex is generally better - but noticeably worse than GPT4 - it's decent as a "smart autocomplete" though. Not crazy useful for documentation.

Meanwhile GPT4 generally nails the results, occasionally needing a few tweaks, generally only with long/complex code/prompts.

tl;dr - In my experience, for code, GPT3.5turbo isn't even worth the time it takes to get a good result/fix the result. Codex can do some decent things. I just use GPT4 for anything more than autocomplete - it's so much more consistent.


If you're manually interacting with the model, GPT 4 is almost always going to be better.

Where 3.5 excels is programmatic access. You can ask it for 2x as much text across your setup steps so the end result is well formed, and still get a reply that's cheaper and faster than 4 (for example, ask 3.5 for a response, then ask it to format that response).
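Something along these lines (a sketch; two cheap 3.5 calls, generate then reformat):

    import openai

    def chat(prompt):
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    # Pass 1: get the content without worrying about structure.
    draft = chat("List the key risks of building a product on a closed LLM API.")

    # Pass 2: have the model reshape its own output into the form you need.
    formatted = chat(
        "Rewrite the following as a JSON array of short strings, with no other text:\n\n" + draft
    )
    print(formatted)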


Depending on your use case, there are major quality differences between GPT-3.5 and GPT-4.


I am building an extensive LLM-powered app, and had a chance to compare the two using the API. Empirically, I have found 3.5 to be fairly unusable for the app's use case. How are you evaluating the two models?


It depends on the domain, but chain of thought can get 3.5 to be extremely reliable, and especially with the new 16k variant

I built notionsmith.ai on 3.5: for some time I experimented with GPT 4 but the result was significantly worse to use because of how slow it became, going from ~15 seconds per generated output to a minute plus.

And you could work around that with things like streaming output for some use cases, but that doesn't work for chain of thought. GPT 4 can do some tasks without chain of thought that 3.5 required it for, but there are still many times where it improves the result from 4 dramatically.

For example, I leverage chain of thought in replies to the user when they're in a chat, and that results in a much better user experience: it's very difficult to run into the default "As a large language model" disclaimer, regardless of how deeply you probe a generated experience. GPT 4 requires the same chain-of-thought process to avoid that, but ends up needing several seconds per response, as opposed to 3.5, which is near-instant.

-

I suspect a lot of people are building things on 4 but would get better quality of output if they used more aspects of chain of thought and either settled for a slower output or moved to 3.5 (or a mix of 3.5 and 4)
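As a rough illustration of the pattern (not what notionsmith actually does; the prompts are placeholders): do a hidden reasoning pass first, then condition the user-facing reply on it.

    import openai

    def chat(messages):
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        return resp["choices"][0]["message"]["content"]

    def reply_in_character(character, history, user_msg):
        # Pass 1: chain-of-thought scratchpad the user never sees.
        plan = chat([
            {"role": "system", "content": f"You are planning a reply as {character}."},
            {"role": "user", "content": (
                f"Conversation so far:\n{history}\n\nUser says: {user_msg}\n\n"
                "Think step by step about how this character would react. "
                "Output only the reasoning."
            )},
        ])
        # Pass 2: the actual in-character reply, conditioned on that reasoning.
        return chat([
            {"role": "system", "content": f"Stay strictly in character as {character}."},
            {"role": "user", "content": (
                f"Reasoning:\n{plan}\n\nNow write the character's next reply only."
            )},
        ])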


It depends a lot on the domain, even for CoT. I don't think there are enough NLU evaluations just yet to robustly compare GPT-3.5 w/ CoT/SC vs. GPT-4 wrt domain.

For instance, with MATH dataset, my own n=500 evaluation showed no difference between GPT-3.5 (w/ and w/o CoT) and GPT-4. I was pretty surprised by that.


I think this is very very use-case dependent, and your use case != everyone's use case. In my experience, GPT-4 is night and day better than 3.5 turbo for almost everything I use OpenAI for.


> "With how good gpt-3.5-turbo-0613 is (particularly with system prompt engineering), there's no longer as much of a need to use the GPT-4"

Poe's law

