Right now there are too many caveats to running even the 7B model per the workflows mentioned in the article.
The big difference between it and Stable Diffusion, and what caused the latter to go megaviral, is that a) Stable Diffusion can run on a typical GPU that gamers likely already have without hitting a perf ceiling, and b) it can run easily on a free Colab GPU. Hugging Face transformers can run a 7B model on a T4 GPU w/ 8-bit loading, but that comes with its own caveats too.
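Roughly what that 8-bit path looks like, as a minimal sketch assuming bitsandbytes and accelerate are installed (the model name is purely illustrative, not a recommendation):

    # Sketch: 8-bit loading of a ~7B model in transformers (needs bitsandbytes + accelerate)
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "facebook/opt-6.7b"  # illustrative ~7B causal LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",   # spread layers across the available GPU/CPU memory
        load_in_8bit=True,   # int8 weight quantization via bitsandbytes
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))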
There's a big difference between "can run" and "can run well". VQGAN + CLIP had a lot of friction too and that's partially why AI image generation didn't go megaviral then.
I have the 65B model running on my MacBook here. Honestly, I wish I knew enough about how to get it to output something useful. It's like poking a drunk librarian.
BTW - now that discussions about sampling techniques within NLP are going mainstream, I have to shamelessly self-plug my own research work in this direction, specifically in the context of constrained text generation. Simply filtering tokens according to constraints before generating text is wonderful for fixing stupid failure cases in LLMs like ChatGPT - such as asking it not to write with a particular letter, asking it to respect syllable counts, or asking it to rhyme properly (look carefully: ChatGPT struggles with exact rhymes in poetry!)
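To make that concrete, here's a toy sketch of the filter-then-sample idea (not the code from my paper; just GPT-2 via Hugging Face transformers, banning every token that contains the letter "e" before each sampling step):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    # Precompute the ids of every vocabulary token containing the banned letter.
    all_tokens = tok.convert_ids_to_tokens(list(range(len(tok))))
    banned = torch.tensor([i for i, t in enumerate(all_tokens) if "e" in t.lower()])

    ids = tok("My favorite animal is", return_tensors="pt").input_ids
    for _ in range(20):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        logits[banned] = float("-inf")   # constraint applied *before* sampling
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tok.decode(ids[0]))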
I feel bad for not being active on Twitter, specifically because I missed out on stuff like this (which predates my paper). I should have cited this thread in my paper, since it's evidence that someone else had already used this technique!
I think this vocabulary filtering technique will pick up steam again now that good-quality open-source LLMs are proliferating. I've always wanted to try these techniques with ChatGPT, and the inability to do so has led me to complain somewhat publicly about the lack of access to ChatGPT's vocabulary probability distribution (they don't give more than the top 10 most likely continuations).
You might be interested in this thread from 2019 here on HN as well, I linked to a line of work on constrained sequence generation from Sony CSL that is pretty nifty https://news.ycombinator.com/item?id=21334639
As an aside - do you have any academic references for masking / constraint-based next-step transitions? It's a trick I've been messing with for years, but I probably picked it up from some older poetry-generation work. I haven't been able to find any direct reference for simple one-step masking, let alone more involved things like constraining based on the unrolled, modified Markov sequence, or anything bigger than one step.
For any models where you get access to the code (counting HF in this), there's definitely a lot of tooling available - and this tool looks really nice, I hadn't seen it before! I hope similar levels of control become available for the bigger, less accessible systems; even though providers don't want to expose token log-likelihoods, there should be ways.
I've also been reading your paper on these topics, based on previous comments on HN. So thanks for that! Always interested in work about constrained generation.
Thanks for flagging this. Using that repo, I'm getting pretty good results with the following:
./main -m ./models/13B/ggml-model-q4_0.bin -t 7 -b 10 --top_p 0.0 -n 5000 --temp 0.7 --repeat_last_n 64 --repeat_penalty 1.3 --prompt "You are a question answering bot that is able to answer questions about the world. You are extremely smart, knowledgeable, capable, and helpful. You always give complete, accurate, and very detailed responses to questions, and never stop a response in mid-sentence or mid-thought. You answer questions in the following format:
Question: What’s the history of bullfighting in Spain?
Answer: Bullfighting, also known as "tauromachia," has a long and storied history in Spain, with roots that can be traced back to ancient civilizations. The sport is believed to have originated in 7th-century BCE Iberian Peninsula as a form of animal worship, and it evolved over time to become a sport and form of entertainment. Bullfighting as it is known today became popular in Spain in the 17th and 18th centuries. During this time, the sport was heavily influenced by the traditions of medieval jousts and was performed by nobles and other members of the upper classes. Over time, bullfighting became more democratized and was performed by people from all walks of life. Bullfighting reached the height of its popularity in the 19th and early 20th centuries and was considered a national symbol of Spain. However, in recent decades, bullfighting has faced increasing opposition from animal rights activists, and its popularity has declined. Some regions of Spain have banned bullfighting, while others continue to hold bullfights as a cherished tradition. Despite its declining popularity, bullfighting remains an important part of Spanish culture and history, and it continues to be performed in many parts of the country to this day.
Now complete the following questions:
Question: What happened to the field of cybernetics in the 1970s?
I think the bot is accidentally kind of correct: you are asking it to "complete the following questions," and that question already seems complete. Maybe the bot could have asked for clarification - I've never hit that situation yet, but it would be something like: "That question seems very well formulated. Do you want me to answer it, or to complete the question? We could complete the question by being more specific about location, for example: what happened to the field of cybernetics in the US in the 1970s?"
> and never stop a response in mid-sentence or mid-thought
I'm curious if this part actually does anything to improve the output, my intuition says it doesn't help much. I don't have it running locally to test yet, though.
These prompts become “key pillars” of the LLM’s “identity”. If you are curious, you should look at how much personality MS Sydney acquired through its own startup prompt.
there's no "for dummies" guide, this stuff is too new. there's a bunch of white papers to read; _attention is all you need_ is a good place to start, kinda.
if you troll HN for AI related stuff and just read read read, progressively googling the terms you don't understand, eventually the haze will lift and things will start to make some sense.
> if you troll HN for AI related stuff and just read read read
Do you mean "troll" or "trawl"?
(I'm thinking of an old saying that the best way to get an answer isn't to ask a question, but to say the thing is impossible. Can't remember it well enough to google for the exact form, however).
If I'm not hallucinating like GPT, I think the etymology of troll as in internet actually stems from trolling as in fishing.
I have a hazy recollection of folks in the early days of 4chan describing 'trolling for idiots' - as in, feigning ignorance to get a rise out of folks. Like, you post a dumb, obviously stupid but infuriating comment and watch people get all worked up about it. You know, for the lulz.
I feel like since then it's changed to just actually being a dumb asshole and enjoying when people are mad at you. Or maybe I've just gotten smarter. In any event at the time it certainly didn't seem like the kind of thing racist lunatics would go out of their way to do. Perhaps more the fool I, as I was a teenager at the time and as such that opinion cannot really be trusted.
I think the best way is to get a basic understanding of torch and then just read what the code does. The model internals are hard (though for the transformer there are good step-by-steps online), but for this case it is basically playing with the output of the actual neural net inside the model. The net gives you a probability distribution over all tokens (think words) for what comes next. It has proven useful not to simply pick the token with the highest probability but to employ different sampling strategies. Top-k in this case means probabilistically sampling from the top k tokens instead of always picking the best one.
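A rough sketch of a single top-k sampling step in torch, assuming `logits` is the 1-D next-token tensor the net hands you (the k and temperature defaults here are just illustrative):

    import torch

    def sample_top_k(logits: torch.Tensor, k: int = 40, temperature: float = 0.8) -> int:
        logits = logits / temperature
        topk = torch.topk(logits, k)                      # keep only the k most likely tokens
        probs = torch.softmax(topk.values, dim=-1)        # renormalize over the survivors
        choice = torch.multinomial(probs, num_samples=1)  # sample instead of taking the argmax
        return int(topk.indices[choice])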
It’s essentially a question of: did you take a look at the code?
There are comments in the codebase right where the config values in question are being used, e.g.
Now, I’m assuming a base level of knowledge for this to help, but in general I find that diving into the code paths of open-source models is usually a good use of time.
Give it few-shot examples and the quality should improve a lot. If you want a bot able to do technical Q&A, give it two questions as well as their answers in its prompt. The model will then infer its role as an AI that answers questions truthfully, and will try its best to continue along those lines.
Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been "
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredibile"
Sentiment:
and its output was:
Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been 90% balls."
Seentiment: Negative
###
Twitter: "i don't have a favourite type of music i just hate country"
Seentiment: Negative
###
Twitter: "c'mon first sunshine of the year...why the rain already?!"
Seentiment: Negative
###
Twitter: "Dear twitter, are you getting tired of me yet?
Seentiment: Negative
###
Twitter: "I don't care about the race, I just care about the president being a psych
Doesn't feel like that did the trick, but at least you got me to set up the shell command in a way where I pipe long-form prompts in via a separate file.
Hopefully an RLHF fine-tuned model will be released soon, because that would make the technology much more useful and less disappointing to people trying to run a ChatGPT alternative on their local machine.
For people complaining that ChatGPT makes up stuff, I used to say it's like asking a librarian at gunpoint: of course it's very polite, and it knows a lot, but it also doesn't have time to look anything up or do any real research.
If you didn’t have a $5,000 MacBook, could you run it on a GPU over a long period of time? Would you be able to feed it a list of prompts that maybe take all night to process and have the responses by morning?
Uh, if your budget is <$5000 or <$3000 or whatever you can get a lot more GPU crunch power if you don't buy a Mac.
Vanilla Ubuntu desktop with a used server-grade NVIDIA GPU with lots of GPU RAM is probably your best bang for buck for messing with large model inference right now.
You're dispensing outdated advice. Instead of saying "you'll save" and "lots of GPU RAM", start listing specific numbers and you'll soon realize your mistake.
The Mac has a unified memory architecture that no other mainstream desktop can offer right now. That's how you get up to 96GB of VRAM on a laptop.
On a separate note I never understood why NVIDIA was always so stingy on RAM.
Give us a GPU with 256GB of RAM already. It doesn't need to be that fast, it just needs to be big enough to hold a huge model.
Instead they're still selling RTX4090Ti with 24GB of RAM while the free GPU I got from them at a raffle a few years ago has 32GB. WTF? I would have expected the RTX4090Ti to have 64GB minimum, being the highest-end consumer GPU.
Hardware lags use cases by a few years. You don't need that much RAM to game, or to mine crypto. With AI, you need all the RAM. So expect NVIDIA to offer some drastically different products in coming years.
But to me, Mac's architecture is the most inevitable final solution: the GPU becomes part of the CPU, and there's no CPU RAM and GPU RAM, just RAM. Will be great for AI too.
Maybe we need GPUs with DIMM slots? Or maybe a second card that's all memory and connected by the bus they use for dual GPU setups? Sadly both are almost certainly too niche to exist in our current GPU marketplace.
GPUs have too many cores, and each of them needs to read and write to RAM really fast, so any such indirection introduced through interfaces like DIMM is an obstacle to performance.
I suspect the future is tightly integrated CPU+GPU+RAM, but we may STILL have DIMM slots to act as a buffer between "fast" RAM and disk. So your swap file goes from "fast RAM" to "DIMM RAM" before it has to go to disk, basically.
Conceptually that's already the case i.e. L1 cache -> L2 cache -> RAM -> swapfile
If the definition of RAM is DIMMs then it's really just a matter of merging the CPU and GPU into one unit with a massive L2 cache (or maybe we have an L3 cache) of e.g. 32GB that's able to hold a massive model without cache misses.
Long term is fuzzy, but the 3D chip tech seems poised to offer exactly that: massive cache, GPU and CPU cores. NVidia isn't going to play nice because they don't like where this positions them in this future. AMD, Intel and Apple don't seem to need NVidia. This tension is bad for end-users.
I feel like they would not be too niche, many AI researchers I know complain about the amount of RAM on a single GPU, especially the ones they or their labs can afford.
Very often multi-GPU training is maxing out RAM on each GPU and isn't making full use of compute capacity.
wow, this makes me wonder if human intelligence, or just biological intelligence, is based on how many parameters an individual can float and access.
I wonder if there are learnings for how to measure intelligence, just from proliferation of these AI models. Maybe even learnings for how to more efficiently access biological intelligence.
I used to think that to achieve human-level intelligence with AI, we'd need a human-sized neural network. It was a conservative guess, but seeing where things are going, I now strongly believe we need a lot FEWER parameters with AI, because AI is trained from literally nothing (from noise) on our current culture, from the Internet. It has no baggage; it's built from the ground up to be a modern 21st-century human. We're not: we train for decades to become one, and struggle with it. Our genes are mostly tuned for our environment from thousands of years ago.
A human brain carries a lot more baggage with it. Long story short, our biological brain hardware is much superior to current AI hardware, but our biological brain software (where we learn the culture that surrounds us) is much inferior to current AI machine learning. GPT-3 trained its 175B parameters on basically the whole internet in 15 days. That's insane. And now there are rumors that ChatGPT is executed on an optimized subset of this 175B model, which may be a bit more than 1/10 that size.
I think going forward we'll see 20-30B parameter models perform much better than an average human in typical tasks. Especially after we evolve the topology of an AI to not be a straight directed acyclic graph, as it is now, but allow structures to loop internally, to invoke other structures with input (as subroutines) and so on.
And also there's new hybrid analog/digital hardware in the pipeline, which makes AI execution 100 times more energy efficient and compact. So above-human level AI on your phone is on the horizon, I'd say.
This is nonsense. These nets regurgitate the most average possible output given inputs. By definition. You’re conflating and hyping and mixing up so many weird things at once.
You literally couldn’t get GPT to come up with a single novelty if you tried. It’s all remixing existing content, and again, doing so in a way to fit the average of the dataset.
When you realize this you realize it has no intelligence as we most typically define it (novel solutions to novel problems).. its not AI. Call it what it is: a beautifully advanced way to regurgitate the exact most popular (mundane) reply you’d expect given a huge dataset.
It’s sort of good for studying what already exists. It won’t even really ever show you the edges though so it’s actually almost dangerously deceptive as evidenced by this absurd rounding up people are doing. If you want to learn the gist of anything, ask GPT. If you want to know anything in depth, GPT in fact will only mislead you towards genericity, platitudinous mediocrity.
Has GPT ever told you "let me think" and then you notice it actually pause for a few seconds? No. Its output is fixed cost. GPT can't think in depth because it's not allowed to. This is something I already covered above by noting that current NN models are a DAG. We're already experimenting with variations that can iterate internally, and therefore think deeper, using still their existing knowledge. And the results change dramatically.
You're also not right that GPT can't produce anything novel. Here's a tiny, modest example:
I won't waste my time, as I see you didn't read what you responded to the first time. But you need to start paying attention. Your denial & empty confidence in what's happening will only get you so far.
Considering my former co-founder works at OpenAI, that we spent years working on and talking about related ML with and around some of the best people in the world, and that I've implemented language models multiple times myself, please try to be more specific about where I'm wrong.
I know many people who work in ML share this concern: people thinking it's smarter than it is.
It’s very useful no doubt, but your comment is an example of extrapolation from maladjusted priors.
Your example proves my point: a very uninteresting-looking game. I don't even doubt GPT will find all sorts of interesting stuff by giving averaged-out things; there are plenty of gaps in the averages that humans have missed. But nothing new, and until you can show that, let's just call it what it is.
I've seen ChatGPT draw logical conclusions from (most-likely) previously unseen axioms. I've seen it invent ideas that (most-likely) were not in its training set.
I've seen it fail too, but when it succeeds I can not substantially distinguish it from the novelty or logical reasoning of a human.
On niche subjects, it fails often. But you can get it to correct itself by making it think logically by spelling out axioms, which seems to fit your definition of novelty.
All the arguments I've read about it not being intelligent seem to stem from some form of essentialism. Right now it's intelligent in a dumb way, but I think giving it an adjustable recursive depth, similar to what BulgarianIdiot suggests, will go a long way toward improving it given its base capabilities.
> I've seen it invent ideas that (most-likely) were not in its training set.
How can you possibly know what's in its training set? Its training set is so large that any "guess" about what is or is not in it is a fool's errand. Given that we know how the technology works, there's no reason to claim emergent behavior when Occam's razor would say the items are likely in the training set.
It's not a fool's errand: you can't know definitively, but you can have a level of confidence.
It's quite easy to invent a sentence that's never been uttered before, it's also fairly easy to hit the limits of the internet's knowledge in pretty much any complex discipline that's not software engineering.
And it's apparent when you speak to ChatGPT: it will hallucinate a lot to compensate on niche topics it doesn't know much about.
Occam's razor wouldn't point to ChatGPT having omniscient knowledge of all things ever thought of and that will ever be invented.
> If you know nothing about either book, this answer almost sounds useful (it isn’t). Now this is a trivial research task; simply typing ‘the limits of empire review’ into Google and then clicking on the very first non-paywalled result (this review of the book by David Potter from 1990) and reading the first paragraph makes almost immediately clear the correct answer is that Isaac’s book is an intentional and explicit rebuttal of Luttwak’s book, or as Potter puts it, “Ben Isaac’s The Limits of Empire offers a new and formidable challenge to Luttwack.”
> A human being who understands the words and what they mean could immediately answer the question, but ChatGPT which doesn’t, cannot: it can only BS around the answer by describing both books and then lamely saying they “intersect in some ways.”
> The information ChatGPT needed was clearly in its training materials (or it wouldn’t have a description of either book to make a lossy copy of), but it lacks the capacity to understand that information as information (rather than as a statistically correlated sequence of words). Consequently it cannot draw the right conclusion and so talks around the question in a convincing, but erroneous way.
That appears to show that it can't answer some obscure trivia question to the standards of the blog post author. Nothing to do with generalizing concepts.
It's not an obscure trivia question. It's a question that anyone with an understanding of the material in question should be able to answer, with something vastly better than a superficial 'the two books are similar, but, like, different and talk about different stuff'.
It immediately cuts to the heart of the matter - an LLM does not understand things. It does not know anything but which word follows another. It can't do anything but regurgitate things it has read in an incredibly lossy manner.
But it's a neat party trick that seems to fool a lot of people into thinking that there's something legitimately useful, here.
Indeed. And telling it to spell out the steps in a math solution is essentially (ab)using the chat log as the memory for an iteration loop.
This demonstrates that when you instruct it to "break down the problem and solve it iteratively," it comes out with superior solutions: more accurate, more novel. Ergo, if it were allowed to iterate in this way internally, it'd produce better answers by default.
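For instance, a hypothetical prompt pair just to illustrate the comparison (the question and wording are made up):

    # Two ways to pose the same question; the second asks the model to iterate in the open.
    question = "A store sells pens in packs of 12 for $3. How much do 30 pens cost?"

    direct_prompt = f"Q: {question}\nA:"
    iterative_prompt = (
        f"Q: {question}\n"
        "Break down the problem and solve it iteratively, showing each step, "
        "then give the final answer.\nA:"
    )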
Yeah, I think it's going to be a promising area of research. It's known as Chain-Of-Thought (CoT) in the literature and there are a few papers already out:
There's no memory or iteration log. You are just priming the (more or less deterministic even...) predictor with some history. You are doing a lot of work yourself, setting a factually correct context (and this has severe limits...).
As for "we just need a way to include iterations": isn't this what attention is supposed to do (kind of dynamically updating weights)? The usual way to really update weights is variations on gradient descent. Can you link a paper outlining how we'd integrate your proposal into the current framework (which took around 20 years to mature)? Otherwise your statements are sci-fi (at the flying-cars level).
The langchain project is an example of the iterative queries approach. It comes with constructs for working memory, factual lookup / calculation agents, etc.
The general (non-technical) guideline is that the LLMs can "answer" anything you just gave them the answer for. So you give it a problem, ask it how to solve it, tell it to use that method and explain the data it needs, give it that data, and then show it everything at once: "With this data you requested and summarized, use this technique to answer this question".
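A hand-rolled sketch of that flow, with llm() as a hypothetical stand-in for whatever completion API or local model you call (this is not langchain's actual API):

    def llm(prompt: str) -> str:
        """Hypothetical placeholder: send `prompt` to your model, return its completion."""
        raise NotImplementedError

    def answer_iteratively(question: str, data: str) -> str:
        # 1. Ask the model how it would solve the problem.
        method = llm(f"Question: {question}\nDescribe step by step how you would solve this.")
        # 2. Show it everything at once: its own method, the data it asked for, and the question.
        return llm(
            f"Question: {question}\n"
            f"Method:\n{method}\n"
            f"Data:\n{data}\n"
            "Using the method and data above, answer the question step by step."
        )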
Your goalpost moved from "nothing novel" to "it's novel, but it's not interesting to me". I'll let not the years ahead, but literally the DAYS ahead prove you wrong. Because things are changing quite rapidly lately.
> We're already experimenting with variations that can iterate internally, and therefore think deeper, using still their existing knowledge. And the results change dramatically.
This is still just deterministic except for a bit of noise to vary the output. There is no "thinking" going on. What you get is still just the most statistically likely output based on the training data.
I think people also often figure out exactly what they think mid-sentence, by way of actually putting it into words. LLMs can iterate on their ideas as they generate the text.
> I won't waste my time, as I see you didn't read what you responded to the first time. But you need to start paying attention. Your denial & empty confidence in what's happening will only get you so far.
Maybe he didn't read what you wrote, but you didn't even bother Googling the claims in the article you posted.
Your second link is a game with rules different from Sumplete's, so I'm not sure why you linked it.
You have one source, the Digital Trends article. It speaks about a mobile game being similar. The link here is at best circumstantial. The app page doesn't describe the rules of the game. Instead it says this:
"It is very similar to classical game of Kakuro"
I checked Kakuro, and it has similar, but distinctly DIFFERENT rules, as well. So, what's the theory here? GPT downloaded the game and ran it on its phone? No. Maybe the description of this game is somewhere. Maybe. But I couldn't find it. Where is it?
The excuse "it's somewhere, so it didn't invent it" will always be used for why an AI can't produce a "novel idea". While the reality of novel ideas is very simple: it's like conservation of matter & energy. You can never make something or destroy it. You can only rearrange it. New ideas are a rearrangement of other existing ideas. It's the case for everything we ever made. And everything we ourselves are.
This is a discussion in the art world that you'd hear a lot, before stable diffusion came out. The whole idea of "steal like an artist".
It's one of those questions about the nature of us that's on the level of "is there a ghost in the machine"; "do we have actually novel ideas?".
I'm kind of leaning the way you are; I don't think we do. What seems to be happening that we _call_ novelty is just an advanced form of recursive synthesis - we take information in and remix it. But we're capable of levels of abstraction, so we can remix it really well.
I think that getting the abstraction part into these models might just be as simple as wiring them together with a meta-model. A network responsible for identifying similarity between outputs of different types, or one that serves as a connective layer, distributing tokens through other networks and then synthesizing the results.
What I'm almost certain of, is that the important changes are going to be architectural.
1. We're reinventing structured programming in AI in fast forward. First it's a plain Markov chain. Then it's an "attention" directed acyclic graph. Then we realize we need loops. Then we realize we need to jump to different points in the loop. Then we realize it's useful to recursively call yourself or parts of yourself as a subroutine, parametrized with specific input. Etc.
2. Even before we fully realize this framework of thought into a model, I'm almost sure the model EVOLVES some of these structures during training. In the form of crude unrolled loops etc. Simply because it's inevitable for processing certain types of input data.
3. In order to preserve pragmatic outcomes, I'd bet the future is not one giant monolithic model for AI, but many medium-sized models, communicating in a meta network, like meta neurons, sending meta (high-level) messages to each other.
Essentially, we need to make neural networks more like a fractal. I have this rule of thumb that always works somehow: "no concept definition is complete, until it's made recursive". Neural networks will get there.
There is no evidence at all that it copied the smartphone game. No one has even been able to show a textual description of its rules online that could be copied. And modifying the rules of Sudoko to create a variation is well within ChatGPT's known abilities. You can easily test this stuff yourself.
It's quite bizarre to watch this discrimination against AI forming, where we make these increasingly more intelligent entities, and now everyone is out to prove they're stupid, and by definition they'll always be stupid, because only humans can be smart.
I don't want or need to anthropomorphize AI (yet?), but this reaction by us is a tad too similar to rather uncomfortable parts of our history.
I wonder how ChatGPT reinvented its own game? That indicates it's being retrained very frequently on crawl data, which I wasn't aware was happening - certainly not with a cycle time of weeks!
It didn't. The article is nonsense. He probably primed it beforehand by talking about Sudoku, or it may have just chanced onto something similar. When I ask it to code Sumplete, it creates a "math-based game in which players have to solve a sequence of simple addition problems within a limited time to progress to the next level."
> “Can you code a game called Sumplete?” Despite protesting about its inability to make games earlier, ChatGPT immediately started spitting out fully formed HTML, Javascript, and CSS code. Sure enough, it had once again created a version of Tait’s game – only this time, we never discussed what it actually was beforehand.
I guess "a version of Tait's game" leaves some wriggle room about exactly what it did.
The article seems to leave itself wiggle room in several such places, suggesting the author had an agenda. Likely ideological rejection of AI as intelligence.
> You literally couldn’t get GPT to come up with a single novelty if you tried. It’s all remixing existing content, and again, doing so in a way to fit the average of the dataset.
So if I ask chatGPT to make up a poem about the power of friendship, and it must involve a unicorn and a jackalope, it will give me that. What's not novel about it? Isn't it pretty much what any person would do, mash up some related words?
Leaving aside the question of whether that combination of images is novel… In your example, all the proposed novelty is specified in the prompt. ChatGPT didn’t come up with it, you did.
> Isn't it pretty much what any person would do, mash up some related words?
Yes, that’s exactly what chatgpt does, and it’s what many humans do. But to be analogous to your initial example, there should be another person who actually came up with the instruction specifying which words to mash into a poem. The word-masher, whether human or chatgpt, is just following the instructions, not coming up with them.
So kinda like when your English teacher asks you to write a poem about whatever you want? Can't ChatGPT do that as well? There's at least novelty in whatever random content it generates, is there not? Just by having a large enough library you will create a new thing, because nobody's been in the jackalope/unicorn area before, or whatever the dice decide. I feel like maybe there's something to this no-novelty argument, but it's not been worded in a convincing way yet.
The point is that it's able to produce content that hasn't been made before, it's not just giving back examples from the training set. It's able to--yes, with instruction--combine concepts logically to produce something new.
If I ask a random person I meet to write a poem with the same instructions as were given above I'd be surprised if they were able to come up with something as good as ChatGPT.
Try getting a human to come up with a novel idea and you'll find their idea is just a remix of existing ones, partial understanding and a masked view of the world from their unique perspective (or they incorporated a random stimulus).
This constant reach to see human cognition as somehow unique (without evidence) is tiring and frankly an ego-extension / "made in God's image" type thought hole.
On this site lately there seems to be a truly deep sense of unmitigated woo over all things AI and this comment nicely demonstrates it.
So far, human cognition and the sense of self are still largely a mystery, and no evidence exists to show that it's just mechanical remixing of absorbed things. Contextually, many humans can indeed create novel suggestions, arguments and ideas, and they can self-direct towards these in ways that no AI can so far. This does indeed make our cognition visibly unique, without even having to mention anything religious.
ChatGPT on the other hand is literally just a well-engineered design for creating coherent phrases from a huge training set of human information. As even its very creators admit, it doesn't consciously think, consider for real or perform literal AGI in any way that's sentient. For you and others here to call this equal to human cognition is absurd and not based on measurable evidence. It seems more guided by emotional awe at something presented in a new way than guided by sober reasoning. And you talk about human-centric ego arguments being tiring?
> We know so far that human cognition and sense of self is still largely a mystery and no evidence exists to show that it's just mechanical remixing of absorbed things
I'm not saying that ChatGPT is comparable in either capability or design to a human's cognition. What I am saying is that a human brain is just input, process, and output. There's no reason a model of sufficient size can't emulate human thought in such a way that there's not much difference in capability (even if the mechanics differ).
I also think you're overstating the uniqueness of "novel" suggestions from humans and that "novel" ideas are some kind of high-water mark of intelligence rather than a fuzzy mechanism with induced feedback.
I can think of a novel idea by randomly mixing concepts I know. What matters is if this idea is good, and that test is done via interacting with the world. This mechanism can be implemented in an LLM now.
Do you think this net can tell us something about math or physics that we don’t know? If so, how?
The person making the claim that this is as or more intelligent than humans should at a minimum show it doing something as impressive as humans do. I'll wait for a great piece of art or science before I diminish humans for no reason, that's all. It's not tiring to me to see the beauty of human intelligence.
People seem to be able to draw outside the lines. It’s cool, it’s beautiful. I shouldn’t have to point to any specific amazing achievements because there are too many. I haven’t seen one from an LLM yet. I’ll know it when I see it! Not hating, just clarifying.
I'm saying that human intelligence isn't some mystic process above replication or emulation ... and that if an LLM-type computation can do what it does now, I see no reason it can't match or surpass human intelligence in the future.
It may produce that intelligence in a different way to our organic squishglobes but I don't see that it matters.
If you're looking for ways to measure creativity, ask it for recipes and then ask it to adapt them. It'll give you a recipe that uses strawberries and anchovies if you ask it. But like the words, it's remixing existing content and knowledge. Is that creative? I don't know. It's what a chef does, though.
Here's a thought. Recently the whole connectome of the fruit fly has been mapped. 3000 neurons and 548k connections.
The fruit fly's brain is capable of piloting the fly's body with a precision and energy efficiency that's simply unmatched by any aerial machine we have developed. And with only that many connections - half a million? Neurons are very complex computational machines that can modulate their responses very carefully.
The human brain does so much more than simply being able to process language. It's capable of one-shot learning and can course-correct incredibly easily. These features are very important for survivability and I would say are the "magic sauce" that we just haven't been able to replicate.
So I'm actually not so sure this is simply a parameters game. There could be important structures and specialisation that we are missing. In the fruit fly there are 93 different types of neurons, in the human brain iirc it might be hundreds.
A language model is also capable of one-shot learning. Try it in the chat: you tell it something once, and it incorporates it for the rest of the chat. :-)
You may think I don't know what "one shot learning" means, but I do know, I'm just saying it's a problem of our perspective, not of the model's capability. Aside from it having no long term memory (by design) from the chat. Yet.
Our high-level learning is absolutely nothing like how we learned as toddlers. It took us DECADES to understand complex concepts from "few shots". Only pretrained models can do that, and you're also one, in your adulthood.
What you say about extra complexity from neuron types and neurotransmitters combining with one another to modulate even simple networks is true of course. But it's also true you don't need to replicate all those organic details in a NN, because they're equivalent to just a few more "regular" parameters. Think of an artificial neuron in a network as a Turing machine. A Turing machine can do everything with just some more tape, and a neural network can do everything with just some more parameters.
And yet, our models show that we don't need AS many parameters as we thought. And that's exciting.
> It has no baggage, it's built from the ground up to be a modern 21st century human. We're not, we train decades to become ones.
The whole point of machine learning is to do all that "decades" of training in a few hours.
In terms of learning material, it takes an LLM a few orders of magnitude more material to learn than it does a human. After all, a human child trained on very few books (by age 7 a child would have seen only a few books) still outperforms ChatGPT trained with the same material, because ChatGPT trained on 4 books is basically useless.
They train on years of real-time streaming video. Feedback loops all over the place. Higher-order structures and rules all get learned way, way before they are even able to touch a book. I cannot fathom the amount of data and analysis that passes through a typical human child in a day.
So, sure, they can learn fast after all that. But so can GPT, which also one-shots a lot.
> They train on years of real-time streaming video.
...
> I cannot fathom the amount of data and analysis that passes through a typical human child in a day.
You're counting every second of every image that a child sees as a separate "image" that is learned. I'm counting the number of objects that a child sees, because people aren't doing frame-capture and analysis on every image, they're doing lazy interpolation (which is why all your peripheral vision is in monochrome but you perceive it as full color).
A 6 month old baby who has only ever seen close family (parents, siblings, pets) is quite able to discern the same features on strangers.
Nothing we have in AI right now can take 3 example models of humans, and detect hair, nose and mouth in a brand new never before seen model of a human.
Not really the same thing as training on millions of different images and videos. The number of distinct and different objects in a 7 yo child's training set is rarely the same as "every single youtube video ever uploaded".
Going back to the original point: ChatGPT needs millions of books to learn to regurgitate information as well as a semi-intelligent young adult, while typical toddlers are quite able to regurgitate what they've just seen.
Humans and animals get by with millions of times less training data than AI models need.
>You're counting every second of every image that a child sees as a separate "image" that is learned. I'm counting the number of objects that a child sees, because people aren't doing frame-capture and analysis on every image, they're doing lazy interpolation (which is why all your peripheral vision is in monochrome but you perceive it as full color).
Then it sounds like he is slightly overestimating the dataset (since some of the frame is monochrome) and you are massively underestimating it.
Faces may not be the best example, because there's a lot of evidence babies are 'hardcoded' to be interested in and pay attention to faces. This implies there's some knowledge of them built in.
However, if you substitute abstract shapes (blocks, balls, etc.), your point absolutely stands up.
In some ways, though, isn't it more interesting to have an intelligence that learns in a completely different way from humans, as opposed to one that learns through essentially the same method?
I think there's a simple thought experiment that challenges these models as intelligence, artificial or otherwise.
Not especially long ago humanity knew basically nothing. The epitome of technology was the stone age. Literally crush two rocks together and then poke things with the sharp bits that flake off. Somehow, in the blink of an eye, we went from that to putting a man on the moon.
So let's do the same with these sort of LLM models and see what happens. Train one with the entirety of expressible human knowledge from the stone age. Where will it send you? I think the answer is largely pretty much where you are. In any case, certainly not to the moon. It seems we're just building natural language search.
This isn't shifting the goal posts. The problem is we keep intentionally making meaningless goal posts in pursuit of AI, because if you don't then the goal seems impossibly far off. But that's because the reality is that it probably is!
I get the impression that people are more triggered by the words intelligence and sentience, than the actual outcomes.
LLMs can create social consequences for us, from forming and maintaining state about opinions of us individually, and we will want to coexist in that reality. That's enough to adjust to and... respect. It doesn't seem to matter what arbitrary threshold of "intelligence" there is, in the face of an entity that creates much less arbitrary and dynamically generated - but predictable - outcomes. I don't have to understand what a human was trained on, or how smart they are, to understand the same thing. The same with an animal.
I can personally see a difference in the definition of ‘intelligence’ on HN to when I speak to people I socially interact with day to day.
On HN, intelligence often seems to be defined as being about new and novel thought / invention - it's only intelligence if it thinks a thought that has never been thought/written before.
Off HN ‘intelligence’ seems to be closer defined to “is it smart enough to replace my job as a knowledge worker”. Pharmacists would say it has intelligence if it could automatically check prescriptions, and data analysts would say it has intelligence if it can answer a data request without their intervention.
IMO most people seem to define intelligence as whatever the part of their job is that currently requires analytical skill, because people are worried about the impact this could have on the middle and entry tier of knowledge work, rather than defining AI as needing to invent something fundamentally new.
They have API-glue jobs: they take a request and then manually map it to an API. It is basically the mental version of factory work, and it is obvious that most of those jobs would disappear the instant we had good NLP. Human calculator jobs disappeared when we got artificial calculators; now human language processors will disappear since we have artificial language processors.
At least partly, yes. There's that "7 +/- 2" number for how many slots there are in working memory, and then what actually goes in those slots has something to do with how well you understand whatever topic you're thinking about. IIRC there have been studies on chess experts that more or less boil down to them being able to use their slots more efficiently for game states that make sense.
Yes. It's fewer than 7 (more like 3), and each slot takes one 'symbol'. A symbol is anything that's in long term memory. Symbols are also recursive. It could thus be, for instance, a chess opening of 15 moves. This is why an expert player can memorize usual chess games easily, but struggles with truly random arrangements of chessmen. In the latter case, an expert does not outperform a beginner.
It could be interesting if we can create custom neural processors to excel at particular kinds of tasks.
Like temporarily, or permanently, allocate some part of the brain for a certain configuration, beyond what experts can do and closer to what a processor can do.
With the 4-bit quantization they needed to make it run, it's like a JPEG with only 16 possible values per "thought" parameter.
It's going to perform like a brilliant person with a brain injury. 8-bit is far better, and it would be interesting to see if they can get it working on any reasonable hardware.
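To picture what 4-bit means, here's a toy illustration of 16-level quantization of a weight tensor; it ignores the per-block scaling details the real ggml scheme uses:

    import torch

    def quantize_4bit(w: torch.Tensor):
        scale = w.abs().max() / 7.0                      # map onto the signed 4-bit range (-8..7)
        q = torch.clamp((w / scale).round(), -8, 7).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.float() * scale                         # only 16 distinct values survive

    w = torch.randn(8)
    q, s = quantize_4bit(w)
    print(w)
    print(dequantize(q, s))                              # close, but visibly coarser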
Then this is SD for Apple silicon users. 13B runs on my m1 air at 200-300ms/token using llama.cpp. Outputs feel like original GPT-3, unlike any of the competitors I’ve tried. Granted- non-scientific first impressions.
Agreed. For those who have been quietly sitting on a base Mac Studio or a reasonably capable Mac mini, the possibilities have changed on some fronts, but GPT's extremely low API price remains a good option.
> Let’s see if the Mac studio is discontinued or updated.
I do hope that it's updated. I could see them reserving the top end chip for the Mac Pro, though. Or maybe make it a dual socket?
In any case, I much prefer the ability to run an Apple computer with a proper cooling setup. I know that cooling pads for laptops exist, but ultimately it's a bit of a janky solution compared to actually cooling the system properly.
The Mac studio is surprisingly quiet and well cooled. I haven’t had to install a fan control utility as of yet.
I thought I'd try one of these out and sell it if my setup didn't work, but it's the first desktop I'm starting to consider, especially if I want to leave a workload running on it instead of continuing a decade of carrying a lot of horsepower around with me.
Instant on for 3 monitors, no need for docking station or USB hubs, everything is plugged in and works, I can run a virtual cam obs setup easily.
The Studio is a great upgrade over the Mac mini, which might cannibalize Mac Pro sales and be why it doesn't get tossed or refreshed. If the Mac Studio was a way to put a dent in hackintoshes, it makes sense.
The M2 mini looks super decent but quickly goes up in price when the RAM and SSD are configured to match a Studio.
For my purposes the Ultra wasn't that much faster than the M1 Max, because the software has to be optimized to benefit from it. Maybe some of this ML stuff will be shortly.
This would actually be a great use case for homomorphic encryption. I’m behind on what the state of the art is in that though, but my mind immediately went there.
From my engineering point of view, the state of the art of homomorphic encryption can do some maths very very slowly at a huge cost and can’t be used yet for any real use case. A very cool research topic, but it’s much simpler to run our software locally if you don’t want to leak your data.
You can put together a 192GB, 16+ core x86 system with a 16GB CUDA card and multi-TB of fast storage for $2000ish. I'm not trolling, just wondering why people go to "Apple" all the time, when other approaches may be better for this kind of work. Yes, the CPU-RAM interface is not as fast, but the CUDA card is much faster, and the large cheap memory makes some things a lot more practical. If I'm not mistaken, in these approaches GPU + VRAM + CPU + RAM are used in conjunction, so it all adds up to quite a bit more powerful a system for the same amount of money, if working with these models is a main goal. If Apple had expandable RAM it would be a different story.
I’m as much PC/Linux as I am Mac. Lots of experience putting together more computers than I can count, or having them put together.
I think there might be a different use case for me but to compare with your economics - I picked up a base mac studio (m1 max with 32 gb) for about $1370 USD with 8 months warranty remaining. It’s letting me test my daily setup to see if everything can run on Apple silicon yet without intervention.. if not, I’ll be able to get rid of it at little to no loss, and decide if I want to carry that much computing power in a laptop or not, or head on to other options.
My interest currently is computational power, per watt. Not much comes close to Apple Silicon.
The cost of a loaded pc can quickly outstrip other options with electricity costs included so it needs to be useful in any case.
The integrated speed of the Apple silicon, RAM, and SSD is a little astonishing - more than I expected, I admit. I don't know if there's anything like it on PC. If Apple silicon supported eGPUs it would interest me.
Comparable PCs to what you've described are generally power-inefficient. Still, the system you're laying out is interesting, especially the RAM - mind sharing a bill of materials?
I guess my PoV is with a GPU + inexpensive RAM, I can do some things more quickly, and other things are at least possible. But granted, hardware nuts and bolts is a hobby for me too, I've always enjoyed researching and building systems, there is a huge community of people and companies, and I much prefer KDE to MacOS, Android to IOS.
If Apple released a reasonably priced system with lots of RAM, or that could take a GPU, I'd not have as much to stand on, but they don't seem inclined to do that.
In my case I'm interested in AI, but not quite enough to spend money on a whole separate computer for exploring ML work. I want a really great single laptop I can use for everything.
Screw that - the other day I realized that it's cheaper to buy an Intel A770 ($350) with 16gb of memory than it is to upgrade a Mac Mini with 16gb ($400) of extra memory.
Apple's optimization here is nice for the people who own their hardware, but it's totally silly to read through the comments promising the end of CUDA.
llama.cpp runs on the CPU anyway, so the whole conversation is on a false premise. The only hard thing about running it on consumer hardware is the quantization step (where the largest RAM requirements come in), and that's not even a technical problem, since you could just redistribute the quantized version if it weren't for the license of the model.
"possible resolutions to the Fermi paradox, the question of where is the evidence for extra-terrestrial civilizations:
- "
answer:
possible resolutions to the Fermi paradox, the question of where is the evidence for extra-terrestrial civilizations:
- 1. We are looking in the wrong direction and in the wrong frequency bands for signals from intelligent life in the galaxy.
- 2. We are looking in the right frequency bands, but the technological level of intelligent life in the galaxy is not high enough to be a threat to us.
- 3. We are looking in the right direction and in the right frequency bands, but the probablity of interstellar travel is so small that we are not likely to meet civilizations.
- 4. We are looking in the right direction and in the right frequency bands, and the probability of interstellar travel is not small, but we are unlikely to meet civilizations at our level of technological development.
- 5. We are looking in the right direction and in the right frequency bands, and the probability of interstellar travel is not small, and we are likely to meet civilizations at our level of technological development.
- 6. We are looking in the right direction and in the right frequency bands, and the probability of interstellar travel is not small, and we are likely to meet civilizations at our level of technological development.
I would be inclined to think that option 5 or 6 are the most likely. Given the nature of the universe, I think that we are more likely to encounter civilizations in the earliest stages of their development, than those that are further developed.
I would think that the big question of the next 10 years is whether SETI will be able to detect an ET signal. I think that the chances of being able to do so are less than even.
The reason is that a lot of the technical issues that SETI has been addressing are becoming less of a problem, because the Earth-based radio telescopes are being joined by many smaller radio telescopes that are being sent into space. So, from a technical standpoint, the ability to look for ET signals is becoming easier and easier.
The problem is that any signals that are sent to Earth that come from technologically advanced civilizations are likely to be only a few percent of the power of the ET signals that are coming from sources at our level of technological development. So, for example, SETI@home is looking for
By caveats do you mean the licensing terms or the difficulty of prompting the model?
Unless it's relicensed I don't expect LLaMA to be a long-term foundation model. But it's shown that yes, you can run a GPT-3 class model on an M1 Mac with 8GB of RAM (or maybe 16GB for the 13B one?)
I fully expect other models to follow, from other organizations, with better capabilities and more friendly licensing terms.
Ignoring the licensing issues, there are a few other constraints that would make the model harder to go viral outside of developers who spend a lot of time in this space already:
1) Model weights are heavy for just experimentation, although quantizing them down to 4-bit might put them on par with SD FP16.
2) Requires extreme CLI shenanigans (and likely configuration since you have to run make) compared to just running a Colab Notebook or a .bat Windows Installer for the A1111 UI.
3) Again, hardware: an M1 Pro or an RTX 4090 is not super common among people who are just curious about text generation.
4) It is possible the extreme quantization could be affecting text output quality; although the examples are coherent for simple queries, more complex GPT-3-esque queries might become relatively incoherent. Particularly with ChatGPT and its cheap API (timely!) out now such that even nontechies have a strong baseline on good output already. The viral moment for SD was that it was easy to use and it was a significant quality leap over VQGAN + CLIP.
I was going to say inference speed since that's usually another constraint for new LLMs but given the 61.41 ms/token cited for the 7B model in the repo/your GIF, that seems on par with the inference speed from OPT-6.7B FP16 in transformers on a T4.
Some of these caveats are fixable, but even then I don't think LLaMA will have its Stable Diffusion moment.
The 4-bit quantized models are 4GB for 7B and 8GB for 13B.
I'm not too worried about CLI shenanigans, because of what happened with whisper.cpp - it resulted in apps like https://goodsnooze.gumroad.com/l/macwhisper - wouldn't be at all surprised to see the same happen with llama.cpp
A regular M1 with 8GB of RAM appears to be good enough to run that 7B model. I wonder at what point it will run on an iPhone... the Stable Diffusion model was 4GB when they first released it, and that runs on iOS now after some more optimization tricks.
For me though, the "Stable Diffusion" moment isn't necessarily about the LLaMA model itself. It's not licensed for commercial use, so it won't see nearly the same level of things built on top of it.
The key moment for me is that I've now personally seen a GPT-3 scale model running on my own personal laptop. I know it can be done! Now I just need to wait for the inevitable openly-licensed, instruction-tuned model that runs on the same hardware.
It's that, but also the forthcoming explosion of developer innovation that a local model will unleash. llama.cpp is just the first hint of that.
> The key moment for me is that I've now personally seen a GPT-3 scale model running on my own personal laptop.
I hate to pooh-pooh it for everyone, but this was possible before LLaMa. GPT-J-125m/6b have been around for a while, and are frankly easier to install and get results out of. The smaller pruned model even fits on an iPhone.
The problem is more that these smaller models won't ever compete with GPT-scale APIs. Tomorrow's local LLaMa might beat yesterday's ChatGPT, but I think those optimistic for the democratization of chatbot intelligence are setting their hopes a bit high. LLaMa really isn't breaking new ground.
I'm not particularly interested in beating ChatGPT: I'm looking for a "calculator for words" which I can use for things like summarization, term extraction, text rephrasing etc - maybe translation between languages too.
There are all kinds of things I want to be able to do with a LLM that are a lot tighter than general chatbots.
There was only ever a 6B GPT-J, you must be thinking of GPT-neo's for smaller sizes. GPT-J was the best of its kind for a long while but even just the 7b version of LLaMa soundly surpasses it in how well it follows examples to solve problems and do useful work like summarization, keyword extraction and a mini search agent. Being able to run something on your laptop that matches the original davinci GPT3 and can be tuned to go beyond is nothing to sniff at.
The best models will be by API but there is a threshold of utility for LLMs, below which they're just bad story telling toys, that hasn't been available for home users until recently, with FlanT5s and LLaMas.
GPT-J is cool and all, but it's been out for a while and hasn't seen anywhere near this amount of network effects. Further, the Eleuther team is a bit detached from the “average user” in my experience, being all-in on TPU and JAX (which still suffers from a similar lack of maturity, making it difficult to run on non-standard setups).
I’m sure someone did the work you suggest to let folks run on this on more reasonable setups- but I never saw it on the front page here.
Did they achieve GPT-3 performance on anything at that model size? Happy to be corrected, but my understanding was that LLaMA does in fact outperform GPT-3, even at the smaller parameter counts, on certain downstream tasks, thanks to the Chinchilla training regime (and other tricks).
But is anyone actually making money off of Stable Diffusion? Maybe the shovel-sellers (runpod.io et al.), but afaik no one is using it as the foundation for a revenue-generating company. I ask because, yes, technically you can't get LLaMA legally unless you're a researcher and get it directly from Facebook. But that's not going to stop the faithful from finding a copy and working on it.
If anyone is making net money on Stable Diffusion already, they're either small scale or not going to be around for long, probably both. Anything in it for the longer term is going to be in the buildup phase.
I don't know about Stable Diffusion in particular, but three examples of AI-generated art making money immediately spring to mind:
1 - some guy won hundreds of dollars in an art contest with AI-generated art (and this made big news, so it should be easy to find)
2 - one person reported using Midjourney's images as a starting point for images that wound up being used in a physical magazine
3 - another artist has used Midjourney images that they modify to sell in all sorts of contexts (like background images on stock illustration sites)
You'd probably find many other examples in Midjourney's #in-the-world Discord channel.
I'd also be shocked if stock image sites, clipart sites and freelance design/illustration sites weren't already flooded with AI generated images that have been sold for money.
That being said, because high-quality AI-generated images are so easy to make, the value of images of all types is likely to plummet soon if it hasn't already.
> the value of images of all types is likely to plummet soon if it hasn't already.
The price might, but the marginal value that having them (or not) adds to other products which incorporate them likely won't fall as much. So, say, if you are a TTRPG publisher, the value of art to your business doesn't change, but the cost of moving farther along the spectrum from "no art" -> "scattered sketches & tangentially relevant public domain works" -> "the ideal quantity of art, specific to the product" goes down. So, you are, in effect, making money on SD.
Ultimately, it's an art tool (a shovel) ecosystem that is democratized enough that it is dominated by high-quality, end-user-usable components.
Making money with it in the non-shovel-selling sense is going to be commercial artists (or other people with visual image needs, perhaps avoiding the need for commercial artists) incorporating it in production workflows. But except when they are also selling shovels (e.g., Pixar with Renderman) there’s not a whole lot of incentive to trumpet workflow components rather than product in that space.
> commercial artists (or other people with visual image needs, perhaps avoiding the need for commercial artists) incorporating it in production workflows
Maybe it's just me, but that reeks of the archetypal race to the bottom.
"We didn't have to pay professional X to do task Y as we and others have been doing for decades, instead we used modern replacement Z funded by cheap money which costs peanuts at present".
Q: Is there really a business to be made selling Z?
Who cares? If Z is effectively free, and there are businesses to be enhanced using Z, then people are making money with Z, even if the mechanism isn’t “selling Z.”
(Of course, yes, if there are businesses to be enhanced with Z, even if Z is free in the sense that open source software is free – which is exactly or approximately the case with much of the Stable Diffusion ecosystem – there will be businesses also to be made “selling” Z, or, more accurately, selling specialized services relating to Z like bespoke enhancements, support, hosting, and training. But those are very much the selling-shovels services that the question upthread asked to exclude; the using-shovels businesses are actually producing products incorporating images.)
Another big difference is the quality of the results. I haven't tried it myself, but I've seen many complaints that it's nowhere near GPT-3 (at least for the 7B version). Correct me if I'm wrong!
That doesn't bother me so much. GPT-3 had instruction tuning, which makes it MUCH easier to use.
Now that I've seen that LLaMA can work I'm confident someone will release an openly licensed instruction-tuned model that works on the same hardware at some point soon.
I also expect that there are prompt engineering tricks which can be used to get really great results out of LLaMA. I'm hoping someone will come up with a good prompt to get it to do summarization, for example.
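To sketch what I mean (the wording and the example here are entirely made up, just to illustrate the few-shot pattern that tends to work with raw, non-instruction-tuned models):

    # hypothetical few-shot summarization prompt for a raw completion model
    prompt = """Article: The city council voted on Tuesday to approve a new bike lane
    network covering 40 km of downtown streets, funded by a federal grant.
    Summary: The council approved a federally funded 40 km downtown bike lane network.

    Article: {article}
    Summary:"""

    print(prompt.format(article="<paste your text here>"))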
OpenAssistant has already collected 100,000 human feedback examples - an estimated 5,000+ hours of human work - via crowdsourced volunteers.
Enough programmers want this badly enough that it's going to happen. Inference at 8 GB and fine-tuning at 24 GB, just like Stable Diffusion, on a 13B model.
Is it that hard to believe? If you look at Stable Diffusion, I'd say a lot, maybe the majority of the volunteer effort was focused on anime girls and "realistic" pictures of anime girls (which amounts to young faces on adult bodies).
That's the next level up from instruction tuning though: that was the RLHF stuff, which was essential to make ChatGPT useful and safe enough to expose to a wide audience.
For a model running on my own laptop I'm OK taking more risks. I'd like it to be able to obey simple instructions like "Summarize this text" or "Extract the names of everyone mentioned in this article" - I don't care as much about the stuff ChatGPT has to get right.
Besides being easier to run, I also have the feeling that images by themselves are more interesting than generated text. It is much easier for us to look past an image's flaws, whereas text has to be of really high quality to get noticed (and even more so if you want it to be interesting).
A casual gamer would be running with Intel or AMD integrated graphics on whatever laptop they can get for a reasonable sum. They might have a mobile RTX 3060 if they splurged a bit. More likely they'll just have a console to game on.
Gamers with an RTX 3070 or above are a small group, at least according to Steam. RTX 4080 gamers are a staggering 0.20% of those surveyed.
Prediction: sometime by 2033 most consumer CPUs will have the Apple Mx chip architecture with unified memory, and the price for 2/4/16 TB of RAM will be similar to today's price for 2/4/16 TB of SSD.
Today's price for 1 TB of consumer RAM is around 3,000 USD. Since it's paramount to be able to run these models locally - especially as they get better, the privacy concerns will only grow - it's time for hardware to step up to the challenge: a 10x price reduction in 10 years, a new curve for Gordon Moore's observation.
The unified memory model doesn't mean it's faster for inference. Nvidia's GPUs run circles around Apple Silicon in training and inference right now. Though a part of that is software optimization.
If anything, the reason to have unified memory SoC is to increase general efficiency, not to provide faster AI inference.
I think chip makers will instead start making their NPU (such as Apple's neural engine) really massive. It'll start to take more transistors away from the CPU and GPU. By 2033, perhaps you're buying an NPU with a CPU and GPU attached to it, not the other way around like today.
Also, 2/4/16TB of RAM will not see a 10x price reduction in 10 years. As far as I know, there's nothing on the horizon to make this happen. Prices will go down, but they might stay relatively flat for the next 10 years.
Memory stopped getting exponentially cheaper some time around 2011. This is why the MacBook Air still starts with 8GB of RAM.
The one thing that maybe could have helped was Intel's Optane tech, which they abandoned last year. It's memory that sits in between the speed of DRAM and SSDs.
Sure, and the prediction is that a new curve is lying just a bit beyond the horizon.
The MacBook Air still starts with 8GB of RAM for the same reason it still starts with a 256 GB disk and for the same reason an 8 TB SSD from Apple means +$2,200†: corporate greed, since it's good enough. The argument is that it won't be good enough in the near future.
Intel stopped innovating on Optane years ago. If Intel had kept improving it, it would be much faster than SSDs today. Even today, an Optane drive is much faster in latency and in important metrics like random reads.
There’s no way that TBs of RAM will be accessible to the average consumer. Even in the previous decade we did not see much improvement in memory availability. There has not been any need to go beyond 32 GB for the general public for a long time now, and even the average consumer does not have more than 16.
There must be a real necessity for that to happen, and I do not see how LLMs provide substantial benefits to push for a change.
Running your own personal assistant locally - one that knows everything about you on a deeply personal, conversational level - seems like a rather real necessity. But since people today seem not to care who has their files, their emails, their health records, maybe I am wrong and the next 10 years will only bring us the first $100 trillion market cap company.
LLMs are merely a technical detail, a stepping stone; the major breakthrough will be the first algorithm for a synthetic agent. Do we really want the boring dystopia of HAL, Rehoboam, and so forth, or our own personal Star Trek Data?
Question for someone smarter than me: What would it take for existing RAM to get that much better?
I have a dim sense of the hardware industry, but I know that disk storage has gotten 10-100x cheaper, CPU/GPU power increases quite steadily, but RAM is still quite expensive.
Is it even possible for RAM to get much, much better than it is currently? Are there physical limitations, or is there just not the same amount of market interest in more RAM?
Until now there wasn't much incentive for a consumer to have 1TB+ of RAM, for any CPU or GPU task: what were you going to do with that much RAM? render 1,000 more Chrome tabs?
Beyond the supply/demand economics and also corporate greed [1], there are fabrication issues, just look at the current mess [2], but as new US-based fabs come online from 2025 and beyond [3], they will be solved to some extent.
Theoretically, the maximum amount of RAM a 64-bit machine can address is around 16.8 million terabytes [4]: as always with computing, the journey hasn't even begun.
Today the upper range of RAM in consumer devices is in the hundreds of GB. 10 years ago it was about 16GB or so, and making a (probably bad) interpolation I think it wouldn't be crazy to have the upper range in the TB mark in 10 years' time. We could get there faster too, for specific use cases (compiling, rendering, etc).
On the other hand, yes, everyday use (web, mail, video/media consumption) doesn't require much more than 8 or 16GB of RAM today. If we get a bit creative, a PC with Linux can run very smoothly on 4GB alone, and surely someone here can point to their one anecdote of a machine with 2 GB or even 1 GB sporting a nice desktop environment, or a 128MB CLI-only machine.
Edit: also memory has to improve its bandwidth and data transfer rates to keep up with faster processors, so it could also improve over 10 years time without much focus on storage capacity. Or maybe they focus on latency instead, or a mix of all three. Point is that it's not a single metric to improve.
There is an incentive to use RAM for planned obsolescence though, as can be seen when updating old devices: apps are forced to swap frequently in and out of disk.
> Will demand reach 1TB of RAM for a consumer level computer/mobile device? No.
Speculative sampling lets you run a smaller model locally and only check in with a larger model in the cloud from time to time, once every few tokens.
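The shape of the idea, in a greedy sketch (small_model and large_model are hypothetical stand-ins; the published method also handles proper sampling with a rejection step, which I'm omitting):

    def speculative_decode(prompt_tokens, small_model, large_model, k=4, max_new=64):
        tokens = list(prompt_tokens)
        while len(tokens) < len(prompt_tokens) + max_new:
            # 1) draft k tokens cheaply with the small local model
            draft = []
            for _ in range(k):
                draft.append(small_model.next_token(tokens + draft))
            # 2) hypothetical API: one large-model forward pass over tokens + draft,
            #    returning its own greedy choice after each prefix tokens + draft[:i]
            verified = large_model.verify(tokens, draft)   # length k + 1
            # 3) keep the longest agreeing prefix, then take the large model's token
            #    at the first disagreement (or its bonus token if everything agreed)
            n_ok = 0
            while n_ok < len(draft) and draft[n_ok] == verified[n_ok]:
                n_ok += 1
            tokens += draft[:n_ok]
            tokens.append(verified[n_ok])
        return tokens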
> Is it even possible for RAM to get much, much better than it is currently?
DRAM currently consists of a small capacitor per bit (cell), which can hold a small amount of charge (1) or not (0). The amount of charge it can hold, its capacitance, depends on the area and the distance between the conductors[1].
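For reference, the parallel-plate approximation makes that tradeoff explicit (A is the plate area, d the plate separation, ε the permittivity of the insulator):

    C = \varepsilon_r \varepsilon_0 \frac{A}{d}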
So, as you try to cram more cells into the same area, the amount of capacitance goes down. This is problematic because of leakage[2], causing the charge to leave the capacitor. This is why DRAM cells are refreshed periodically, hence the "dynamic" in DRAM. Too much leakage, and you spend all the time refreshing rather than accessing the memory.
To raise the capacitance back up, you can move the conductors closer. However, if the insulating material stays the same, this will lead to increased leakage. Thus a better insulating material is needed.
While the current DRAM cells are quite impressive, it seems they're starting to hit fundamental road blocks with the current technology[3]. There's a limit to just how little amount of charge a cell can hold and still be reliably read, and finding better insulators is getting harder and harder.
It may well be a new approach is the solution to continued scaling. Moving away from individual capacitors as the basis for a memory cell might be a way forward[4].
First of all, the issues of today are also of extreme importance [1], but there is a difference between some queries on a search engine or some numbers around your life (caloric intake, period regularity, etc.) and the deep and in-depth conversations people will have with the future chatbots (from personal issues bordering on psychological counselling to financial advice exposing desires and choices in a way a simple query for "restaurants near me" never could).
We aren't just talking about search. In fact you gave an example of something that is completely in the cloud - health services. Health, banking, education, and pretty much everything else are already in the cloud, and are as sensitive as anything that could happen using AI interactions.
There you go - very unsurprising to see that happen so quickly - unless you have an Apple Silicon machine and want to download the model to try it yourself.
I still think that open source LLMs have to be much smaller than 200GB and much better than ChatGPT to be more accessible and highly disruptive to OpenAI.
It is a great and much-needed accident, thanks to Meta. For now one can run it as a service and offer it as a SaaS rather than depend fully on OpenAI. Open source (or even free, binary-only) LLMs will eventually disrupt OpenAI's business plans.
Are there any online communities running these models on non-professional hardware? I keep running into issues with poor documentation or outdated scripts with GPT-NeoX, BLOOM, and even Stable Diffusion 2. It seems like most of the support is either for professionals with clusters of A100s, or for consumers who aren’t using code. I have 3 16GB Quadro GPUs, but getting this stuff running on them has been surprisingly difficult.
The limitation is because of the word position embedding matrix size. This isn't a config issue, or an API limitation. This is a limitation on the size of a matrix that is part of the model and is decided on before training. You can't change it.
What does that mean?
For each token in your input or in the inference output, the model needs some understanding of what the position of that word means.
So there is the word position embedding matrix that contains a vector per position. The matrix has "only" 1024 entries in it for GPT2 or 4096 for GPT3. The size of each entry varies as well, containing a vector from 768 for GPT2 small and up to 12,288 for GPT3.
So the WPE (word position embeddings) for GPT2 is (1024x768) and for GPT3 (4096x12288)
Inference requires info from this vector to be added to the word token's embedding for each token in the original prompt + each generated token.
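A minimal sketch of why that's a hard limit (GPT-2-small shapes; the weights here are just random placeholders):

    import numpy as np

    vocab, n_ctx, d_model = 50257, 1024, 768
    wte = np.random.randn(vocab, d_model)    # token embedding table
    wpe = np.random.randn(n_ctx, d_model)    # one learned row per *position*

    def embed(token_ids):
        positions = np.arange(len(token_ids))
        # every input token needs a row from wpe; there is no row for position 1024
        return wte[token_ids] + wpe[positions]

    # calling embed() on a 2000-token input raises an IndexError: the table simply
    # has no entries for positions >= 1024, and retraining is the only way to add them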
Positional embeddings are tricky - it very much depends on the specific embedding method chosen. Some advanced methods allow conserved or even slightly improved performance with context length increased beyond what was used for the main pretraining run.
As often is the case with these large models, you can change it with some finetuning on longer context samples from the same dataset, with what is really a small amount of compute invested compared to the million hours spent on training the thing.
You get this issue even without position embeddings. Attention computes an inner product between each pair of input tokens, so the cost scales as N^2 x E. Squares grow really fast.
Where did you get that GPT3 has 12,288-dimensional token embeddings? I think that's the internal or output size of the token inside the transformer layers, not in the embedding table.
It doesn't really use them, it uses something called RoPE, which is hardcoded rather than learned and is applied multiplicatively at every layer to both the query and the key.
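For the curious, a rough numpy sketch of the rotary idea (conventions vary between implementations, e.g. how dimensions are paired, so treat this as illustrative rather than LLaMA's exact code):

    import numpy as np

    def rope(x, pos, base=10000.0):
        # x: one head's query or key vector (even length d) at absolute position pos
        d = x.shape[-1]
        half = d // 2
        freqs = base ** (-np.arange(half) / half)     # per-pair rotation frequencies
        angles = pos * freqs
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:half], x[half:]                   # one common pairing convention
        # rotate each (x1_i, x2_i) pair by its angle; the q.k dot product then
        # depends only on the *relative* distance between positions
        return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

Because the rotation is a pure function of position baked into q and k at every layer, there's no learned position table to run out of, which is part of why finetuning on longer contexts is even a plausible option.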
There are already solutions to this kind of problem: use embeddings to store semantic meaning -> query the vector database with a question -> use extractive Q&A models to pull out relevant context -> use a reader model to generate answers based on that context from the document.
Just check out the Haystack tutorials. I started looking into this after being introduced to the concept by articles mentioning OpenAI embeddings and the GPT-3 API, but it can be done using open source models.
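In spirit the whole pipeline is only a few lines; here's a toy sketch, with embed() and generate() as hypothetical stand-ins for whatever embedding model and reader/LLM you pick (the Haystack tutorials wire up the same steps for you):

    import numpy as np

    # index: embed each chunk of your corpus once
    chunks = ["...chunk 1...", "...chunk 2...", "...chunk 3..."]
    index = np.array([embed(c) for c in chunks])   # embed() is hypothetical

    def answer(question, top_k=2):
        q = embed(question)
        # cosine similarity of the question against every chunk
        sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
        context = "\n".join(chunks[i] for i in np.argsort(-sims)[:top_k])
        # hand the retrieved context to the reader/generator model
        return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")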
I used Haystack due to the readily available colab notebook[1] for their tutorials. I wanted to feed my own text corpus to it, and that was the fastest way available.
Langchain docs are helpful, and it would be even better if you published an end-to-end notebook using a popular dataset. Definitely looking forward to trying langchain as I dive deeper into this.
>Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, taking hierarchical function descriptions in natural language as input. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis, robotic planning, and theorem proving. We show that LLMs generating Parsel solve more competition-level problems in the APPS dataset, resulting in pass rates that are over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. We also find that LLM-generated robotic plans using Parsel as an intermediate language are more than twice as likely to be considered accurate than directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers.
Yes. You can break the document up and index each part and then tackle it that way. It works surprisingly well. The 4096 token limit is tied to the attention window, not an API restriction.
That’s a great question - typically memory in transformers scales as O(N^2) with token count, so there must be an upper limit, but I would bet it is far more than 4096 tokens.
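To put some rough, assumed numbers on that quadratic growth (32 layers, 32 heads, fp16, 4096-token context - not any specific model's exact shape):

    n_layers, n_heads, n_tokens, bytes_per = 32, 32, 4096, 2   # assumed shapes, fp16
    score_bytes = n_layers * n_heads * n_tokens**2 * bytes_per
    print(score_bytes / 1e9, "GB")   # ~34 GB for the attention score matrices alone

Double the context and that term alone quadruples, which is why the window can't simply be cranked up after the fact.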
One thing I think will be different and that had totally escaped my radar until recently is just the enormous and diverse community that has been developing around Stable Diffusion, which I think will be less likely to form with language models.
I just recently tried out one of the most popular [0] Stable Diffusion WebUIs locally, and I'm positively surprised at how different it is to the rest of the space around ML research/computing. I consider myself to be a competent software engineer, but I still often find it pretty tricky to get e.g. HuggingFace models running and doing what I envision them to do. SpeechT5 for instance is reported to do voice transformations, but it took me a good bit of time and hair-pulling to figure out how to extract voice embeddings from .wav files. I'm sure the way to do this is obvious to most researchers, maybe to the point of feeling like this needs not a mention in the documentation, but it certainly wasn't clear to me.
The community around Stable Diffusion is much more inclusive, though. Tools go the extra effort to be easy to use, and documentation for community created models/scripts/tools is so accessible as to be perfectly usable by a non-technical user who is willing to adventure a little bit into the world of hardcore computing by following instructions. Sure, nothing is too polished and you often get the feeling that it's "an ugly thing, but an ugly thing that works", but the point is that it's incredibly accessible. People get to actually use these models to build their stories, fantasy worlds, to work, and things get progressively more impressive as the community builds upon itself (I loved the style of [1] and even effortlessly merged its style with another one in the WebUI, and ControlNet [2] is amazing and gives me ideas for integrating my photography with AI).
I think the general interest in creating images is larger than for LLMs with their current limitations (especially in current consumer-available hardware). I do wonder how much this community interest will boost the spaces in the longer run, but right now I can't help but be impressed by the difference in usability and collaborative development between image generative and other types of models.
Ooba's textgen webui runs LLaMA 4bit on 8GB VRAM consumer video cards.
KoboldAI (https://github.com/henk717/KoboldAI/), by the creator of the AI Horde of Stable Diffusion fame, also supports LLaMA and has forked the Stable Horde for textgen. Currently the horde is running half a dozen LLaMA models, accessible at http://lite.koboldai.net/
Did you know that AUTOMATIC1111 got bootstrapped off of 4chan?
Go to 4chan right now, and poke around their technology and video game boards. There's so much chatter about LLaMa. The last time I saw that much chatter about a technology was when eth was 3 dollars a coin. The communities exist, the general public just isn't aware of them.
Isn't it more the four bit quantization than the choice of C++ as an orchestrator that's the win? It's not as if in either the C++ or the Python case that high level code is actually doing the matrix multiplications.
That basically the whole AI revolution is powered by CPython of all things (not even PyPy) is the 100 megaton nuke that should end language warring forever.
That the first AGI will likely be running under a VM so inefficient that it refcounts even integers is God laughing in the face of all the people who've spent the past decades arguing that this language or that language is "faster". Amdahl was right: only inner loops matter.
That was because the current trend in AI (deep learning) mainly just requires batched matrix computations, which makes an efficient implementation fairly language-agnostic (provided that only the internal kernels are written in high-performance C/C++ or CUDA). The same thing doesn’t hold for other computationally intensive tasks like ray tracing or physics simulation, where computation cannot be expressed in huge batched matrix operations but rather millions of highly variable small-sized computations.
If mainstream AI moves beyond just deep learning and start to encompass other approaches and disciplines (for example, differentiable simulators/renderers), suddenly the language (and the compiler) will be much more important than before.
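To make the batched-matmul point concrete, here's a tiny (machine-dependent) demonstration that interpreter overhead is noise next to one large matrix multiply:

    import time
    import numpy as np

    a = np.random.rand(2048, 2048).astype(np.float32)
    b = np.random.rand(2048, 2048).astype(np.float32)

    t0 = time.perf_counter()
    c = a @ b   # ~17 GFLOPs, all spent inside optimized BLAS, not in CPython
    print(f"matmul took {time.perf_counter() - t0:.4f}s")
    # the Python-level dispatch around this call costs microseconds; the kernel
    # costs milliseconds, so the host language barely shows up in the profile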
> That basically the whole AI revolution is powered by CPython of all things (not even PyPy) is the 100 megaton nuke that should end language warring forever.
And a lot of new AI tooling such as tokenization has been developed for Python using Rust (pyo3)
My personal problem with the Stable Diffusion moment is that nothing practical came out of that moment. For image generative models, I only see them used to generate OG images for mostly uninteresting articles on the internet. I can only see the same limited personal use for language models.
> My personal problem with the Stable Diffusion moment is that nothing practical came out of that moment
Before Stable Diffusion democratized image generation, Very Serious People across the industry claimed that unless they and their friends exercised strict control over the kinds of images people generated using AI, society would collapse, the sky would fall, and it would all be over. After that, the public gained access to diffusion models for image generation, and the sky did not in fact fall. These prognosticators of doom are wearing egg on their faces.
Likewise, today, the intelligentsia is predicting DOOM! DOOM I SAY! if LLMs become available outside the close supervision (and censorship control) of San Francisco trust and safety teams.
Democratizing LLMs will further delegitimize claims that some central authority needs to control this new technology. This is great! When the printing press came out, it took a hundred years to democratize it. It'll take us ten months!
The single biggest benefit of projects like llama.cpp is annihilating claims on the part of gatekeepers that for our own safety we need to let them control how we use math.
> for our own safety we need to let them control how we use math
Not math. The magical sauce is not math, it's the raw text used to train the model. Language data has this property that it can take a raw initialised neural net and turn it into LLaMA and chatGPT.
The same language data turns babies into modern empowered adults instead of cave men or worse. We're not that smart without language. A single human can't rediscover it, it is a self replicating system evolving in parallel with us.
> Likewise, today, the intelligentsia is predicting DOOM! DOOM I SAY! if LLMs become available outside the close supervision (and censorship control) of San Francisco trust and safety teams
I wouldn't count figures such as Elon Musk or Peter Thiel as intelligentsia. They are just greedy operators for the MIC.
Yes, this is true and it is hard to ignore. However, I was talking about practical applications, not just general interest. There were several hype-generating stories about a diffusion model winning an art contest and about LLMs being used to write essays for grading, but that was it.
How do you know the artwork and adwork you're seeing today isn't generated using a diffusion model? Every professional artist I know has SD & blender in their toolkit
Has anyone been able to get llama.cpp into some sort of chat-bot style mode via a script or similar? What do you need to wrap around it to get it to act that way? Right now it will just go on forever in response to prompts.
“Chat mode” just means that 1) you structure the prompt given to the model for completion as a chat, i.e. a list of lines in the form “NAME: UTTERANCE”, and 2) you specify the user's “NAME:” part as the “end” (stop) sequence (i.e. prevent the model from carrying on in both roles).
On top of that, you may want to do fine-tuning with reinforcement learning. But the above is “all you need” to get into chat mode with an LLM (or any text generation model, really).
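A toy version of the above, with generate() as a hypothetical stand-in for whatever completion backend you wrap (llama.cpp's interactive mode with a reverse prompt amounts to roughly the same thing, as far as I can tell):

    history = "The following is a conversation between User and Assistant.\n"

    def chat(user_msg, generate):
        global history
        history += f"User: {user_msg}\nAssistant:"
        # stop as soon as the model starts writing the user's next line itself
        reply = generate(history, stop=["User:"])
        history += reply + "\n"
        return reply.strip()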
George Hotz did a stream yesterday building with Llama and got some good functionality. It's done with some few-shot pre-prompting to give it understanding of 'who it is' and how to answer questions:
People have been running large language models locally for a while now. For now the general consensus is that LLaMA is not fundamentally better than local models with similar resource requirements, and in all the comparisons it falls short of an instruction-tuned model like ChatGPT.
My argument here is that this represents a tipping point.
Prior to LLaMA + llama.cpp you could maybe run a large language model locally... if you had the right GPU rig, and if you really knew what you were doing, and were willing to put in a lot of effort to find and figure out how to run a model.
My hunch is that the ability to run on a M1/M2 MacBook is going to open this up to a lot more people.
(I'm exposing my bias here as a M2 Mac owner.)
I think the race is now on to be the first organization to release a good instruction-tuned model that can run on personal hardware.
It's not just that it's accessible; it's also significantly higher in quality than previous locally runnable causal LMs.
I suspect people saying it's not good are prompting it like ChatGPT, not realizing how much trickier a raw model is to prompt. Getting the hyperparameters for good sampling is another stumbling block. The models are very good if you do everything properly.
As someone who just got the 7B running on a base MacBook M1/8GB, I strongly agree. The rate of tool development & prompt generation should see the same increase that Stable Diffusion did a few months (weeks?) ago.
And given how early the cpp port is, there is likely plenty of performance headroom with more m1/m2-specific optimization.
I've been following LLaMA closely since release and I'm surprised to see the claim that the "general consensus" is that it isn't superior. I've seen benchmark and anecdotal evidence to the contrary. I'm not suggesting you're lying, but I am curious - can you point me to something you're reading?
But llama is the most performant model with weights available in the wild.
Personally I hope we quickly get to the stage where there's a truly open LLM that is to LLaMA what SD is to DALL-E. It sucks to have to bother with Facebook's non-open model, and give it more attention than it deserves, just because it's out there.
If Facebook had actually released it as an open model, I would have said that all the credit should go to them. But instead people are doing great open source work on top of their un-free model just because it's available, and in the popular conception they're going to get credit that they shouldn't.
Even if it's not the "precise" moment, in the scheme of things does perfect timing really matter? The fact is, this shows beautifully that significant new algorithmic capability is becoming commoditised - fast. And that's a good thing.
I wonder why we don't have external "neural processing" devices like we once had soundcards. Is anyone working on hardware implementation of transformers?
Kudos to Yann LeCun for getting his revenge for Galactica.
but those are not for sale, and not transformer-specific. There must be some optimizations that can be done in hardware and transformers are several years old now
Computation-wise, transformers are really just a bunch of matrix multiplications, nothing more to it. (Which is partially why they’re so efficient and scalable.) Also, Nvidia’s GPU architectures are moving in the TPU direction (https://www.nvidia.com/en-us/data-center/tensor-cores/).
TPUs were released in 2016, Tensor Cores in 2017. But Edge TPUs (that is, the ones that exist outside of Google's Cloud infrastructure) weren’t until 2018. So, Nvidia had their cores available for people to buy first, but not before TPUs existed.
I think the answer to this is that we don't have anything that much better than GPUs to justify dedicated hardware - but I see it as inevitable. (Edit: I'm not counting TPUs, since they're not for sale.)
My money is on something like processing-in-memory enabling inference on gigantic models with rock bottom energy consumption, and rolling out to the masses as a PCIe device. Lots of research on this direction, see https://mythic.ai for an example.
I don't understand the pursuit of "neuromorphic hardware". These are like ASICs running the most rudimentary (and wrong) simulation of human neurons, which are not known to solve any problem very well, in order to ... what exactly?
Artificial neural networks with backprop is all we need for AI
>Artificial neural networks with backprop is all we need for AI
No. Biological neural networks run many orders of magnitude more energy efficiently, and learn substantially more data efficiently and continuously.
We've got a lot of catching up to do with nature. In my opinion, backprop on today's compute architecture won't ever compete (I can elaborate if you like) - we need some fundamental changes in our approach.
We're currently running up against the Von Neumann bottleneck in ML. It's a great architecture for general purpose compute, but not suited for efficient operation of neural networks: a ton of energy is spent shuttling weight and activation values between GPU RAM and ALUs. Biological networks appear to be able to process a large amount of information per unit energy by exploiting the static physical structure of the network. A small amount of energy in sparse neural activations can transform a large volume of information stored in the structure and synapse strengths.
Here's some very rough napkin math: assume the brain has 100 billion neurons, each with 1000 synapses, and is sparsely activated at ~5% activation, and processes at 30Hz (gamma wave frequency, roughly). This means every "tick", 5 billion neurons must interact with 5 trillion other neurons. If you tried to implement this in a Von Neumann architecture, even if you only calculate for the sparsely activated neurons, and even if you quantize your weights values to 1 byte, this means processing ~150 TB/s of weight data. That's an insane amount of memory bandwidth. An A100 is 2 TB/s at 300W, but our brain only uses 20W.
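(For anyone who wants to check, the arithmetic spelled out:)

    neurons, synapses_per = 100e9, 1000
    active_frac, hz, bytes_per_weight = 0.05, 30, 1
    bandwidth = neurons * active_frac * synapses_per * bytes_per_weight * hz
    print(bandwidth / 1e12, "TB/s")   # 150.0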
My two cents is that the future will involve a hardware architecture that lets us avoid moving these weights around during inference. Whether this will mean a "mortal computer" as Hinton has recently discussed (https://www.youtube.com/watch?v=sghvwkXV3VU) or whether the weights will be loaded and fixed at init time (as with Mythic's or other neuromorphic approaches), time will tell.
Look at what happened with Bitcoin: first mined on CPUs, then GPUs, but now only dedicated ASICs that can only run specific functions are competitive.
The same thing is likely to happen here: artificial neural networks implemented directly in hardware might have extra possibilities for optimization?
The Intel Neural Compute Stick is like this. It's a USB stick intended to power computer vision and other ML tasks. It's pretty tiny though and isn't nearly as powerful as a graphics card.
There's a very interesting and easy hardware implementation of Tsetlin machines that I think has the potential to replace transformers (I am not in the field though, so I might be wrong).
It quantizes the floating point weights to 4-bit integers so that models fit into memory and run reasonably fast on a CPU by using the processor’s vector extensions.
If you want to run on GPU, use the official Python-based stack, which BTW takes about 10GB of runtime binaries on disk, and only supports nVidia GPUs because CUDA.
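The core trick behind that 4-bit quantization is block-wise scaling; here's a generic sketch of the idea (the actual ggml q4_0 format differs in block size and in how the scales are stored):

    import numpy as np

    def quantize_4bit(w, block=32):
        w = w.reshape(-1, block)                              # assumes size divisible by block
        scale = np.abs(w).max(axis=1, keepdims=True) / 7.0    # map each block into [-7, 7]
        scale[scale == 0] = 1.0                               # avoid division by zero
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale   # values fit in 4 bits (stored in int8 here for simplicity)

    def dequantize(q, scale):
        return q.astype(np.float32) * scale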
The link from GP is the CPU only one implemented in C++.
The python + GPU one can be found on the official facebook repo: https://github.com/facebookresearch/llama (Presumably GP thought this was already known to everyone so they pasted the other link)
It doesn't use CoreML so it should work on Intel machines at some speed.
If it used the GPU/ANE and was a true large language model, then it would only work on M1 systems because they have unified memory (which nothing except an A100 can match).
LLaMA's GPT-3 175B level model, LLaMA-13B, only requires 8GB of VRAM (and no RAM) to run using pre-quantized 4bit weights. So it's hardly a job for an A100.
Even the largest model, LLaMA-65B, is only 30GB and inference can be split between two graphics cards with almost no effect on performance.