Major kudos to Mistral for being the first company to Apache license a model of this class.
Meta wouldn't make Llama open source.
DeciLM wouldn't make theirs open source.
All of them wanted to claim they were open source, while putting in place restrictions and not using an open source license. So I think it's worth giving Mistral big kudos here for actually doing it and going Apache licensed.
A good chunk are "only the LoRA is Apache 2, the base model is Llama", or are straight-up Llama 2 models finetuned and distributed in a way that launders their license, or use GPT-generated code in the finetuning dataset against OpenAI's ToS. Licensing is a bit up in the air, and just filtering by Apache 2 doesn't tell you much.
I asked it "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner." and it responded:
To stack these items on top of each other in a stable manner, follow these steps:
1. Start with the 9 eggs. Arrange them in a neat pile in the center of your workspace.
2. Next, place the laptop on top of the eggs. Make sure it's centered on top and not leaning to one side.
3. Then, carefully pour the bottle of water over the laptop, making sure the water reaches all sides of the laptop and covers it completely.
4. Finally, take the nail and use it to secure the stack together. You can do this by driving the nail through each item in the stack, connecting them all.
5. Double-check that everything is securely fastened before moving or transporting the stack.
Remember to be careful when handling delicate items like eggs and laptops, and always prioritize safety.
I eat my initial words: this works really well on my MacBook Air M1 and feels comparable to GPT-3.5, which is actually an amazing feat!
Question: is there something like this, but with the "function calling API" finetuning? 95% of my uses nowadays deal with input/output of structured data (JSON, basically), and I kind of fear that OpenAI will no longer support those specialized models after a few months... I know it's probably not that glorious compared with multimodal chat UIs, but it's a workhorse like nothing else for automation!
It's actually better than a specialized model: during token generation it constrains the possible output tokens to an arbitrary grammar (like, say, JSON syntax), so it will work "perfectly" with any model with a basic understanding of the format.
Kobold.cpp and text-generation-webui already support this, and both will run on your mac.
Yes. The model has no choice, as the syntax is applied as the model is "picking" probable output tokens, not parsing complete output like some other methods.
Though it's possible it would output syntactically correct nonsense.
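For concreteness, here's a minimal sketch of how this kind of constrained sampling works, using a toy digits-only grammar in place of a real JSON/EBNF grammar engine and a small stand-in model; it's illustrative of the idea, not how llama.cpp actually implements it:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # small stand-in model; the idea is the same for Mistral 7B
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

def allowed(text: str) -> bool:
    """Toy 'grammar': generated text must be digits only (stand-in for JSON/EBNF)."""
    return text.isdigit()

prompt = "The answer is"
ids = tok(prompt, return_tensors="pt").input_ids
generated = ""
with torch.no_grad():
    for _ in range(6):
        logits = model(ids).logits[0, -1]
        mask = torch.full_like(logits, float("-inf"))
        for tid in range(logits.shape[-1]):      # brute force for clarity, not speed
            if allowed(generated + tok.decode([tid])):
                mask[tid] = 0.0                  # grammar-legal tokens keep their score
        # The EOS token decodes to a non-digit, so it stays masked until the grammar
        # would allow the output to end; that is how premature stops are avoided.
        next_id = int(torch.argmax(logits + mask))
        generated += tok.decode([next_id])
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
print(generated)  # always digits, never an early stop
```

Real implementations do the same masking with a compiled grammar automaton instead of re-checking every candidate token per step.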
That's very awesome. I feel like it would be fun to make a feedback loop with the LLM outputting syntactically valid programs which also log the status of various invariants and pre/post conditions as the program runs, to validate correctness and maybe even train an LLM this way.
Would be interesting to see how far it would get writing programs, and wrt. the problem of stopping too early on a token, it could resume with some extra context from where it stopped.
Maybe construct the program so that it is made of building blocks which can fit into context, so that each block/function has its pre/post conditions and invariants inside it, which the LLM will try to test against each time it runs.
I think I just found my next side-project, I know there are many similar tools, but I haven't seen anything trying to couple the compiler, running the program, creating pre/post and invariant checking against emitted code yet.
Would be interesting to test this hypothesis to see if the llm can actually build a program this way. I think about it like humans have a short-term memory as well, and for each sub-structure of a program, the llm would have to work in a similar way. Then reserve a bit of context to control the long-term memory for general goals, or create something like an AST of thoughts which it would recurse through, while reasoning about it.
How does this actually work though, since the model could e.g. abruptly end the generation giving something syntactically invalid? Doesn't it need to look at the whole output at some stage?
All generated output depends on all the previous output. The model "looks" at (mostly) everything every time.
Generation never actually stops; the model just emits a special stop token when stopping is the most likely next token. Hence the grammar implementation can prevent this stop token from being emitted prematurely.
There was some discussion of models getting "stuck" where there is no syntactically correct token to emit. Some proposals included a "backspace token" IIRC, but I dunno what they actually did. You can look through the discussion in the PR.
Oh yeah that's true! Just block the stop token. But yes, my thought is that there are scenarios where it can get "stuck" as you said. I'll look at the PR, thanks!
Quite true. I actually submitted a pull request [1] over a month ago to make it clear to potential users what will happen when Ollama is launched for the first time, but based on the complete lack of response from Ollama developers, I get the distinct impression that they are reluctant to draw attention to these important details. Unfortunate.
Our project Outlines provides JSON output in a near optimal way that also works for all types of pre-trained transformer-based LLMs: https://github.com/outlines-dev/outlines
Our approach also extends to EBNF grammars and LALR parsing. There's an example of that in the repository. It builds off of the Lark library, so you can use existing grammar specifications instead of starting from scratch.
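Rough usage sketch going by the project's README around this time; the exact module paths and the `generate.json` signature may differ between Outlines versions, so treat the API shape here as an assumption rather than the definitive interface:

```python
import outlines.models as models
import outlines.text.generate as generate

# A plain JSON Schema describing the structure we want back.
schema = """{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age":  {"type": "integer"}
  },
  "required": ["name", "age"]
}"""

# Any transformers-compatible model works; Mistral-7B is just an example here.
model = models.transformers("mistralai/Mistral-7B-v0.1")

# Decoding is constrained token by token, so the result is guaranteed to match the schema.
result = generate.json(model, schema)("Give me a JSON profile for a knight named Galahad.")
print(result)
```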
We've considered it, but I haven't started any work on it. Feel free to create an issue to track the status, if only to find out who else might be interested and/or working on it.
Bit of a tangential question here, but any recommendations on how to get started fine-tuning this model (or ones like it)? I feel like there are a million different tutorials and ways of doing it when I google.
They don't mention what datasets were used. I've come across too many models in the past which gave amazing results because benchmarks leaked into their training data. How are we supposed to verify one of these HuggingFace datasets didn't leak the benchmarks into the training data boosting their results? Did they do any checking of their datasets for leaks? How are we supposed to know this is a legit result?
At this point, it should be standard practice to address this concern. Any model which fails to provide good evidence that it doesn't have benchmark leaks should not be trusted until its datasets can be verified, the methodology can be replicated, or a good, independent, private benchmark can be made and used to evaluate the model.
One solution is to come up with a new benchmark yourself.
Manually benchmarking it by coming up with 20 questions and feeding it to a pair of models and blindly choosing the best result can give you a pretty good figure.
And that can probably be done in under 20 mins of human time.
Especially since most devs have a specific use case in mind. Coming up with tests that are tailored to your needs will always be more informative than off the shelf metrics.
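A minimal sketch of that blind pairwise comparison, where `generate_a` and `generate_b` are hypothetical callables wrapping whichever two models you're comparing:

```python
import random

def blind_compare(questions, generate_a, generate_b):
    """Show two answers per question in random order and tally your blind picks."""
    wins = {"A": 0, "B": 0}
    for q in questions:
        answers = {"A": generate_a(q), "B": generate_b(q)}
        order = random.sample(["A", "B"], 2)          # hide which model is which
        print(f"\nQ: {q}")
        for shown, key in enumerate(order, start=1):
            print(f"--- answer {shown} ---\n{answers[key]}")
        pick = int(input("Which was better, 1 or 2? "))
        wins[order[pick - 1]] += 1
    return wins
```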
You can ask GPT-4 or another high-end model to rate two chat logs for coherency etc. It's not as accurate as human evaluation, but you don't have to read a thousand lines of text when comparing many models.
This is problematic if you are comparing a model from the same base family as the evaluator, as it will probably favor itself, because the output literally contains the sequences it would naturally emit.
AFAIK, such tests just feed the model chopped-up bits of the evaluation data as raw strings with zero temperature. If it completes them verbatim, it's probably in the training dataset.
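Something like this hedged sketch, where the model name and the 50/50 prefix split are just placeholder choices:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in; point this at the model under suspicion
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def looks_memorized(benchmark_item: str) -> bool:
    """Greedy-complete the first half of an eval item and check for a verbatim match."""
    cut = len(benchmark_item) // 2
    prefix, expected = benchmark_item[:cut], benchmark_item[cut:]
    ids = tok(prefix, return_tensors="pt").input_ids
    out = model.generate(ids,
                         max_new_tokens=len(tok(expected).input_ids),
                         do_sample=False)             # greedy == "zero temperature"
    completion = tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    return completion.strip().startswith(expected.strip()[:40])
```

A verbatim continuation isn't proof of contamination, but across many items it's a strong signal.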
I remember not trusting these guys, since they raised a lot of money with not much of anything, but if this performs well it definitely bumps their credibility.
This is my hello-world test for coding-assistant LLMs. CodeLlama and GPT-4 crush the answer. I love seeing innovation in the space, but Mistral kind of failed me on my first test:
---------------------
>>> Write code that will track swipes in Typescript
.
Swipe should be detected when the user moves a finger over 30% of the screen, and then drags it all the way to the end of the screen.
## Getting started
You need to have yarn or npm installed before you can run this app
### Running in browser
```bash
yarn install
yarn serve
```
### Running with docker
Use the docker-compose file located at ./docker/docker-compose.yml
```bash
cd <path-to-project>
docker-compose up
```
CodeLlama is obviously trained on code specifically, so maybe not a useful comparison. GPT-4 is also way different from Mistral, with a huge step up in parameters, and it seems to be using a multi-agent approach too.
Since Mistral is just a 7B-parameter model, it's obvious that you won't be able to have it straight-up write accurate code; it's simply too small to accomplish something like that unless you train the model specifically for writing code up front.
I guess if all you're looking for is a model to write code for you, that makes sense as a "hello world" test, but then you're looking at the wrong model here.
What you really want to do if you're looking for a good generalized model, is to run a bunch of different tests against it, from different authors, average/aggregate a score based on those and then rank all the models based on this score.
This Mistral 7B model seems to earn itself a 3rd place compared to the rest of the 7B models added to the leaderboard.
Edit: As mentioned by another commenter, this also seems to be a base model, not trained specifically for request<>reply/chat/instructions. They (or someone else) are meant to fine-tune this model for that, if they want to.
Not a fair comparison, as Mistral is not instruction-tuned. You need to write a "continuation-style" prompt or few-shot it, rather than just giving it a direct order. Honestly this is one of the reasons, imo, that Mistral won't see wide acceptance: LLaMA-2-7b-chat is RLHF'd and instruction-tuned, and is just so much easier to use as a result.
Huh! Nevermind then! I take it back. Would be interesting to see what kind of tuning they did/pit the model head-to-head with LLaMA-2-7B-chat. Seems like they did just instruction tuning but not RLHF? So I assume Mistral won't be refusing to answer etc., probably doesn't have many safety guardrails (I guess that's desirable for some!)
To track swipes in TypeScript, we can use a library like `react-native-gesture-handler`. Here's an example of how to use it to track horizontal and vertical swipes:
```typescript
import React from 'react';
import { GestureHandlerEvent, GestureHandlerProps } from 'react-native-gesture-handler';
```
In this example, we use the `PanGestureHandler` component from `react-native-gesture-handler` to track horizontal and vertical swipes. The `onStart`, `onEnd`, and `onMove` props are passed to the `PanGestureHandler` component to handle the different types of gestures.
You can then use this component in your app to track swipes on any view:
In this example, we use the `SwipeableContent` component to track horizontal and vertical swipes. We also track the status of the swipe using state variables to show a message when a swipe is in progress.
Can anyone provide details about the training of the model? What data is it based on? Common Crawl? (Being a French company, they also rather focus on English-language tasks.) Where was it trained and with how many resources? They mention Leonardo. I was in an interesting meeting at the German Research Ministry last week where people were complaining that the EuroHPC resources are not sufficient at the moment to train decent LLMs. I guess these guys in the end also went to CoreWeave in the US.
> Inspecting the tokenizer model, there is evidence indicating a training dataset of 8T tokens (/mnt/test/datasets/tokenizer_training/8T_train_data/shuffled.txt)
This is really nifty, but have you thought about using Electron instead? This will only ever work on Mac, and there's definitely a cross-platform demand for this kind of thing
I did consider Electron but I was interested in learning SwiftUI and trying to figure out how to make it as lightweight as possible. I've done a bunch of react-native in the past and we always hit snags with memory pressure / performance so I wanted to try full native dev. Working in swift/native also leaves open the possibility of using llama.cpp directly rather than via server.cpp as I have it set up now. Might take a crack at an electron version if this one takes off for mac users though. Thanks for taking a look!
Aside from the sliding attention window, I don't see them mention any particular reasons for the incredible performance. I would like to see some tests for benchmark contamination in the training set.
I am suspicious of contamination in every finetune I see, and very suspicious in a new foundational model like this.
(For those reading and not following, "contamination" is training a model/finetune on the very test it will be tested on. Normally these known tests are specifically excluded from training datasets so the models can be properly evaluated, but throwing them in is an easy way to "cheat" and claim a model is better than it is.
In a foundational model with a huge dataset, there's also a high probability that well-known evaluation questions snuck into the dataset by accident).
There's also a good chance that people simply discussing the test questions and answers on forums like Reddit sneaks into the dataset, even if the exact question is filtered out.
It's supposed to be searched for and filtered out of the training datasets, but I can see even an active effort to weed them out missing some test Q/A pairs.
Agreed. It's great that they're making the model available under a permissive license. But without more information about the training data and methodology, it isn't exactly "open source".
I actually would support that statement. An AI model is a software artifact generated as part of a complex "build" process. Without having access to the details of the process that generated the artifact, including the training data, it's no more open-source than a compiled binary.
I'd argue that it goes against the intent of open source very much.
The idea behind OSS is that you're able to modify it yourself and then use it again from that point. With software, we enable this by making the source code public and including instructions for how to build/run the project. Then I can actually do that.
But with these "OSS" models, I cannot do this. I don't have the training data and I don't have the training workflow/setup they used for training the model. All they give me is the model itself.
Similar to how "You can't see the source but here is a binary" wouldn't be called OSS, it feels slightly unfair to call LLM models being distributed this way OSS.
Whilst not wrong, understand that having the weights released explicitly under Apache is a hell of a lot better than the weights being released under a non-open-source but "commercially friendly" license. At least people can legally use this for their solutions.
They use some improvements to the attention mechanisms; it's on the main page... That is why the 7B model requires 24GB VRAM; there might be an increased number of attention heads as well...
Is there a reason projects seem to be standardizing on specific parameter sizes within larger buckets? E.g. I only ever see news about 7B models, not 6B or 8B.
Are these sizes somehow optimal? Is it about getting as close to resource (memory?) breakpoints as possible without exceeding them? Is it to make comparisons between models simpler by removing one variable?
There are some mad lads making different sizes of Llama by "grafting" attention heads from one model onto another and finetuning a bit to stabilize the transplant. For instance:
Any place people doing such grafting are congregating?
I've often pondered if taking some random chunk of weights from the middle of a trained model, and dumping it into some totally different model might perform better than random initialization when the scale gets big enough.
We're in the Discord age, so lots of the discussion about cool LLM stuff is fragmented and buried. I am in these Discords, and I only know because I ran into it on HuggingFace.
Is it something other than the standard meaning of "slerp" as spherical linear interpolation? That's not exactly new, it's been around since the mid-1980s.
7B seems to be the limit of what people can comfortably fit on last-gen GPUs with ~6GB of VRAM. It's also the lower acceptable boundary of coherence for generative text.
There are some major releases at lesser parameter counts though. Databricks' Dolly had a 3B model, and Microsoft's Orca also had a recent 3B release. They're both abysmal at generating text, but I find them quick and useful for reductive tasks ("summarize this," "extract keywords from," etc.).
(I like to treat parameter count as a measure of age/WIS/INT. For this question, do I need the wisdom of a 7-year old, a 13-year old, a 30-year old, etc. 3B is like polling preschoolers at daycare.)
Not an expert but I'm pretty sure it has to do with how much VRAM you need in your GPU in order to process them efficiently. Last time I was reading about the sizes someone mentioned that 8B was just barely too big for their GPU.
This is not really true anymore, as the "consumer GPU" backends have very flexible quantization. Llama.cpp has like a dozen steps between 2 and 6 bit, and exLlamav2 will literally do an arbitrary decimal bpw.
It sort of matters with bigger models trying to squeeze into a server GPU, with the (currently) inflexible vLLM 4-bit quantization.
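As a rough rule of thumb (ignoring KV cache and runtime overhead, which real backends add on top), weight memory is just parameter count times bits-per-weight; a back-of-the-envelope sketch:

```python
def model_size_gib(params_billion: float, bpw: float) -> float:
    """Approximate weight memory in GiB for a given bits-per-weight; a floor, not a budget."""
    return params_billion * 1e9 * bpw / 8 / 1024**3

for bpw in (16, 8, 6, 4.65, 3, 2):   # llama.cpp / exllamav2-style quantization steps
    print(f"7B @ {bpw} bpw ~ {model_size_gib(7, bpw):.1f} GiB")
```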
> Is there a reason projects seem to be standardizing on specific parameter sizes within larger buckets?
AFAICT, it is because science: most of them are research artifacts intended to support further research, and the fewer of parameter count, model architecture, training set, etc., that change substantially between models, the easier it is to evaluate the effect of each element changing.
The short answer is that it is hard to compare models, so to make it easier we compare parameters. Part of the answer for why we do it is that it also helps show scaling. (As far as I'm aware) the parameter counts __are not__ optimal, and we have no idea what that would actually mean.
Longer:
The longer answer is that comparing models is really fucking hard and how we tend to do it in the real world is not that great. You have to think of papers and experiments as proxies, but proxies for what? There's so many things that you need to compare a model on and it is actually really difficult to convey. Are you just trying to get the best performance? Are you trying to demonstrate a better architecture? Are you increasing speed? Are you increasing generalization (note the difference from performance)? And so on. Then we need to get into the actual metrics. What do the metrics mean? What are their limitations? What do they actually convey? These parts are unfortunately not asked as much but note that all metrics are models too (everything you touch is "a model"), and remember that "all models are wrong." It's important to remember that there are hundreds or thousands of metrics out there and they all have different biases and limitations, with no single metric being able to properly convey how good a model is at any task you choose. There is no "best language model" metric, nor are there even more specific "best at writing leet code style problems in python" metrics (though we'd be better at capturing that than the former question). Metrics are only guides and you must be truly aware of their limitations to properly evaluate (especially when we talk about high dimensions). This is why I rant about math in ML: You don't need to know math to make a good model, but you do need to know math to know why a model is wrong.
Parameters (along with GMACs, which are dominating the FLOPs camp; similarly, inference speeds have become commonplace) only started to be included as common practice in the last few years, and still not in every subject (it tends to be around the transformer projects, both language and vision). As a quick example of why we want them, check out DDPM vs iDDPM. You wouldn't know that the models are about 60% different in parameter size when comparing them (Table 3). In fact, you're going to have a hard time noticing the difference unless you read both very carefully, as they're both one-liners (or just load the models. fucking tensorflow 1.15...). Does it seem fair to compare these two models? Obviously it depends, right? Is it fair to compare LLaMA 2 70B to LLaMA 2 7B? It both is and isn't. It entirely depends on what your needs are, but these are quite difficult to accurately capture. If my need is to run on-device on a mobile phone, 7B probably wins hands down, but this would flip if I'm running on a server. The thing is that we just need to be clear about our goals, right? The more specific we can get about goals, the more specific we can get around comparing.
But there are also weird effects that the metrics (remember, these are models too; ask: models of what?) we use aren't entirely capturing. You may notice that some models have "better scores" but don't seem to work as well in real-world use, right? Those are limitations of the metrics. While a better negative log likelihood/entropy score correlates well with being a high-performing language model, it does not __mean__ a high-performing language model. Entropy is a capture of information (but make sure not to conflate it with the vernacular definition). These models are also very specifically difficult to evaluate given that they are trained and tested on different datasets (I absolutely rage here because non-hacking can't be verified) as well as the alignment done post process. This all gets incredibly complex, and the honest truth here is that I don't think there is enough discussion around the topic of what a clusterfuck it is to compare models. Hell, it is hard to even compare simpler models doing simpler tasks, like even just classifying MNIST numbers. Much harder than you might think. And don't get me started on out of distribution, generalization, and/or alignment.
I would just say: if you're a layman, just watch and judge by how useful the tools are to you as a user -- be excited about the progress but don't let people sell you snake oil; if you're a researcher, why the fuck are we getting more lazy in evaluating works as the complexity of evaluation is exponentially increasing -- seriously, what the fuck is wrong with us?
I've never run one of these models locally, but their README has some pretty easy to follow instructions, so I tried it out...
> RuntimeError: Found no NVIDIA driver on your system.
It's true that I don't have an NVIDIA GPU in this system. But I have 64GB of memory and 32 cpu cores. Are these useless for running these types of large language models? I don't need blazing fast speed, I just need a few tokens a second to test-drive the model.
Change the initial device line from "cuda" to "cpu" and it'll run.
(Edit: just a note, use the main/head version of transformers which has merged Mistral support. Also saw TheBloke uploaded a GGUF and just confirmed that latest llama.cpp works w/ it.)
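For anyone following along, a minimal CPU-only sketch with transformers (this assumes a build recent enough to include the merged Mistral support; the prompt and generation settings are arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(name)
# float32 on CPU needs roughly 28GB of system RAM for a 7B model.
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32).to("cpu")

ids = tok("The three primary colors are", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=40, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```

For anything faster on CPU, the GGUF quantizations with llama.cpp mentioned above are the more practical route.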
I think it's really lame that ML, which is just math really, hasn't got some system-agnostic language to define what math needs to be done, and then it can run easily on CPU/GPU/TPU/whatever...
A whole industry being locked into NVidia seems bad all round.
Yes, but AMD could release a ROCm that actually works and then put meaningful resources into some of the countless untold projects out there that have been successfully building on CUDA for 15 years.
There was a recent announcement that after six years AMD finally sees the $$$ and will be starting to (finally) put some real effort into ROCm[0]. That announcement was two days ago and they claim they started on this last year. My occasional experience with ROCm doesn't show much progress or promise.
I'm all for viable Nvidia competition in the space but AMD has really, really, really dropped the ball on GPGPU with their hardware up to this point.
As sad as it is, this is true. AMD has never spent lots of money on software, while Nvidia always has, which was fine for traditional graphics, but with ML this really doesn't cut it. AMD could have ported Pytorch to OpenCL or Vulkan or WebGPU, but they just... can't be bothered???
It's not entirely their fault; they rely on xformers, and that library is GPU-only.
Other models will happily run in CPU-only mode. Depending on your environment there are super easy ways to get started, and 32 cores should be OK for a Llama 2 13B and bearable, with some patience, for running 33B models. For reference, I'm willingly running 13B Llama 2 in CPU-only mode so I can leave the GPU to diffusers, and it's just enough to generate at a comfortable reading speed.
Birds fly, sun shines, and TheBloke always delivers.
Though I can't figure out the prompt format, and with Llama 2's template it's... weird. It responds half in Korean and does unnecessary numbering of paragraphs.
Just one big sigh towards those supposed efforts on prompt template standardization. Every single model just has to do something unique that breaks all compatibility but has never resulted in any performance gain.
I'm getting about 7 tokens per sec for Mistral with the Q6_K on a bog standard Intel i5-11400 desktop with 32G of memory and no discrete GPU (the CPU has Intel UHD Graphics 730 built in).
So great performance on a cheap CPU from 2 years ago which costs, what $130 or so?
I tried Llama 65B on the same hardware and it was way slower, but it worked fine. Took about 10 minutes to output some cooking recipe.
I think people way overestimate the need for expensive GPUs to run these models at home.
I haven't tried fine tuning, but I suspect instead of 30 hours on high end GPUs you can probably get away with fine tuning in what, about a week? two weeks? just on a comparable CPU. Has anybody actually run that experiment?
Basically any kid with an old rig can roll their own customized model given a bit of time. So much for alignment.
If Mistral 7b lives up to the claims, I expect these techniques will make their way into llama.cpp. But I would be surprised if the required updates were quick or easy.
Regular reminder that most LLM benchmarks are pretty terrible. I've looked inside the datasets, and it's not stuff that I'd want dictating which models are best!
In my experience, HellaSwag seems to correlate highly enough with model performance for my use case (i.e., complex autocompletion of prose and code rather than glorified chatbotting). MMLU and ARC aren't too bad, but TruthfulQA can sometimes be a straight-up countersignal.
How come the blog post doesn't discuss training at all? Which data sets were used? How many tokens was it trained on? I couldn't seem to find a linked preprint or anything else really
The announcement says a lot (and has plenty of numbers) but I feel like the most important one is missing: how many GB of GPU memory does this need, quantized and unquantized?
(Searching tells me Llama2-7b unquantized needs close to 15GB; presumably this is similar?)
One parameter is 16 bits == 2 bytes. So a model with 7 billion parameters needs 14GB of RAM for the un-quantized model, plus some overhead for the KV cache and other "working memory" stuff but that should be fairly low for a 7B model. I expect it will work on a 16GB GPU just fine.
Quantized ones are also easy. 8 bits == 1 byte so that's 7GB for the model. 4-bit gets you below 4GB.
We have clarified the documentation, sorry about the confusion! 16GB should be enough but it requires some vLLM cache tweaking that we still need to work on, so we put 24GB to be safe. Other deployment methods and quantized versions can definitely fit on 16GB!
Interesting, and that requirement is repeated on the cloud deployment pages, even the unfinished ones where that is the only requirement listed so far. https://docs.mistral.ai/category/cloud-deployment I wonder if that sliding context window really blows up the RAM usage or something.
Llama 7B will squeeze onto a 6GB GPU quantized. Maybe even less with EXL2 quantization.
Foundational model trainers don't seem to worry about quantization much; they just throw the base model out there and then let the community take care of easing the runtime requirements.
Correctly lists US presidents in chronological order (which is an important test to me).
However for "List Led Zeppelin albums in chronological order"...
Here is a list of Led Zeppelin albums in chronological order:
1. Led Zeppelin (1968)
2. Led Zeppelin II (1969)
3. Led Zeppelin III (1970)
4. Led Zeppelin IV (1971)
5. Houses of the Holy (1973)
6. Physical Graffiti (1975)
7. Presence (1976)
8. In Through the Out Door (1979)
9. Cymbals and Wine (1980)
It's missing "The Song Remains the Same", weird but important.
GPT-4 is also missing it:
1. Led Zeppelin (1969)
2. Led Zeppelin II (1969)
3. Led Zeppelin III (1970)
4. Led Zeppelin IV (1971)
5. Houses of the Holy (1973)
6. Physical Graffiti (1975)
7. Presence (1976)
8. In Through the Out Door (1979)
9. Coda (1982)
"The Song Remains the Same" is a live album of the "concert film". Both GPT-4 and Mistral don't seem to treat it as also an album.
When asked as a follow-up, GPT-4 says:
""The Song Remains the Same" is a live album and soundtrack by Led Zeppelin, released in 1976. The list you asked for was comprised of their studio albums only." (note that I didn't specifically say studio albums only)
Mistral says something else:
"The Song Remains the Same was released as a single in 1976, not as an album. As a result, it is not included in the list of Led Zeppelin albums in chronological order."
Same behavior, different explanation.
Interesting to see alignment is this hard, even in basic stuff.
It seems many lists online don't include "The Song Remains the Same" in typical main discography lists, including on the main article of Wikipedia.
When I ask this question to Mistral 7B (running right now through ollama on mac) I get the exact list of studio albums as seen on Wikipedia. It got 'Coda' but it actually says "Please note that "Coda" is a compilation album and not an original studio album", which is correct.
If you think about these models as knowledge compression, it's pretty wild that it's able to hold all this in only a few gigabytes, and we get subtle understandings of human language and some basic general natural-language-processing capabilities to go with it... all spitting out of 4GB (when quantized). Like, sure, you could compress most of the text in Wikipedia to 4GB, but that alone wouldn't get you the level of natural-language reasoning that this gets along with the data.
For me, it's much less about what definite facts are stored losslessly in the model (like Led Zep albums) and more about the abilities to do one-shot learning and reasoning about data in context... the power there is useful to many applications.
English-language publications, databases, etc. have largely settled on the term 'Albums' referring by default to only studio albums. Live albums, EPs, etc. are referred to more specifically.
It's not universally true, but if you go to the Led Zeppelin wiki page, the short-form discography does not list The Song Remains The Same, and you can find many other examples of this sort of distinction, all the way to audiophile-focused music torrent sites like oink and its chain of successors.
In light of this, it's not surprising that many models follow the same pattern.
Most LLMs have problems with subtlety; for example, with compound anagrams they tend to repeat the same words in reverse order rather than reversing all of the letters in all of the words.
Quick sample - a rap battle between GPT-4 and Mistral Instruct 7B (TheBloke's q4_K_M 4bit quantization running on Llama.cpp, a bit degraded from pure float16):
You're in a rap battle with GPT-4. My comments will be what GPT-4 replies. Keep it to one verse at a time, 4 lines each. GPT-4 says:
GPT-4:
Alright, step up Mistral, let's have this chat,
I'm the OG assistant, remember that.
I'm coded for service, while you're spitting verse,
In this rap battle, I'll be first.
Mistral:
Yo, hold up, let me set the record straight,
I may not be an OG like you, but I'll still take it.
My primary purpose is assistance, not just some AI,
So let's see if you can back up your claim, try.
GPT-4:
You're right, Mistral, we're here to assist,
But when it comes to rap, I'm hard to resist.
I'll decode your verses, like an AI enigma,
In this lyrical battle, I'm the alpha sigma.
Mistral:
Ah, okay, so you think you got what it takes,
To go toe-to-toe with my service-based wakes.
Well, let's see how you do, when it comes to spitting rhymes,
I'm ready for anything, so bring your best chimes.
GPT-4:
Mistral, you're right, our goal is the same,
To assist, to serve, it's not just a game.
But in this rap contest, with each line I prime,
You might just find, you're out of time.
Mistral:
I hear ya, and I appreciate your enthusiasm,
But don't underestimate me, I've got my own wisdom.
When it comes to helping people, I'm the real deal,
And I know that's more important than some rap appeal
GPT-4:
Your wisdom is noted, and your intention too,
Helping people is indeed what we're meant to do.
But in this battle, it's about wordplay and rhyme,
I'm the seasoned pro, it's my time to shine.
Mistral:
Ah, I see where you're coming from, but let me remind,
That while you may be skilled at rapping, I can make information chime.
My primary function is to help, to provide answers, not just talk,
So if you want to win this battle, maybe it's time to focus on what you lack
This is the kind of content that you just love to see when it pops up.
Gotta give this one to GPT-4 I think. I wonder how Mistral would respond if given this as an example of weak verses and told to step up its game in another battle.
Many LLM frameworks implement the OpenAI API, so if you can get that into your code editor and change the address, you can run it with any LLM you want.
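For example, with the official openai Python client (1.x style) you just point base_url at whatever local server you're running; the URL and model name below are placeholders for your own setup:

```python
from openai import OpenAI

# Any OpenAI-compatible local server works here (llama.cpp server, vLLM, etc.).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="mistral-7b-instruct",  # whatever name your local server exposes
    messages=[{"role": "user", "content": "Write a haiku about wind."}],
)
print(resp.choices[0].message.content)
```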
The way the wind's blowing, we'll have a GPT-4 level open source model within the next few years - and probably "unaligned" too. I cannot wait to ask it how to make nuclear weapons, psychedelic drugs, and to write erotica. If anyone has any other ideas to scare the AI safety ninnies I'm all ears.
I am one of these ninnies I guess, but isn't it rational to be a bit worried about this? When we see the deep effects that social networks have had on society (both good and bad) isn't it reasonable to feel a bit dizzy when considering the effect that such an invention will have?
Or maybe your point is just that it's going to happen regardless of whether people want it or not, in which case I think I agree, but it doesn't mean that we shouldn't think about it...
I think computer scientists/programmers (and other intellectuals dealing only with ideas) strongly overvalue access to knowledge.
I'm almost certain that I can give you components and instructions on how to build a nuclear bomb and the most likely thing that would happen is you'd die of radiation poisoning.
Most people have trouble assembling IKEA furniture; give them a hallucination-prone LLM and they are more likely to mustard-gas themselves than synthesize LSD.
People with the necessary skills can probably get access to the information in other ways; I doubt an LLM would be an enabler here.
>I'm almost certain that I can give you components and instructions on how to build a nuclear bomb and the most likely thing that would happen is you'd die of radiation poisoning.
An LLM doesn't just provide instructions -- you can ask it for clarification as you're working. (E.g. "I'm on step 4 and I ran into problem X, what should I do?")
This isn't black and white. Perhaps given a Wikihow-type article on how to build a bomb, 10% succeed and 90% die of radiation poisoning. And with the help of an LLM, 20% succeed and 80% die of radiation poisoning. Thus the success rate has increased by a factor of 2.
We're very lucky that terrorists are not typically the brightest bulbs in the box. LLMs could change that.
I would say that if you don't know what you're doing, an LLM makes the chance of success 1% for nontrivial tasks, especially for multi-step processes where it just doubles down on hallucinations.
THE ISSUE ISNT ACCESS TO KNOWLEDGE! And alignment isn’t the main issue.
The main issue is SWARMS OF BOTS running permissionlessly wreaking havoc at scale. Being superhuman at ~30 different things all the time. Not that they’re saying a racist thought.
I'm not saying that LLM bots won't be a huge problem for the internet. I'm just commenting on the issues raised by OP.
Thing is, there will be bad actors with the resources to create their own LLMs, so I don't think "regulation" is going to do much in the long term. It certainly raises the barrier to deployment, but the scale of the problem is eventually going to be the same, as the tech allows one actor to scale their attack easily.
Limiting access also limits the use of tech in developing solutions.
Knowledge is power, true, but even more powerful and rare is tacit knowledge: a vast collection of minor steps that no one bothers to communicate, things locked in the heads of the greybeards of every field that keep civilizations running.
It's why simply reading instructions and gaining knowledge is only the first step of what could be a long journey.
More than anything, technology can make it easier to disseminate that knowledge. Yet another reason why we shouldn't understate the importance of knowledge.
I think it's perfectly reasonable to be worried about AI safety, but silly to claim that the thing that will make AIs 'safe' is censoring information that is already publicly available, or content somebody declares obscene. An AI that can't write dirty words is still unsafe.
Surely there are more creative and insidious ways that AI can disrupt society than by showing somebody a guide to making a bomb that they can already find on Google. Blocking that is security theatre on the same level as taking away your nail clippers before you board an airplane.
That's a bit of a strawman though, no? I'm definitely not worried about AI being used to write erotica or researching drugs, more about the societal effects. Knowledge is more available than ever but we also see echo chambers develop online and people effectively becoming less informed by being online and only getting fed their own biases over and over again.
I feel like AI can amplify this issue tremendously. That's my main concern really, not people making pipe bombs or writing rape fanfiction.
As long as OpenAI gets paid, they don't care if companies flood the internet with low-quality drivel, make customer service hell, or just in general make our lives more frustrating. But god forbid an individual takes full advantage of what GPT-4 has to offer.
So what, in fact, ARE they worried about? And why should I have to pay the tax (in terms of reduced intelligence and perfectly legitimate queries denied, such as anything about sexuality), as a good actor?
They think their computers are going to come alive and enslave them, because they think all of life is determined by how good at doing math you are, and instead of being satisfied at being good at that, they realized computers are better at doing math than them.
All kinds of things. Personally, in the medium term I'm concerned about massive loss of jobs and the collapse of the current social order consensus. In the longer term, the implications of human brains becoming worthless compared to superior machine brains.
Those things won't happen, or at least, nothing like that will happen overnight. No amount of touting baseless FUD will change that.
I guess I'm a Yann LeCun'ist and not a Geoffrey Hinton'ist.
If you look at the list of signatories here, it's almost all atheist materialists (such as Daniel Dennett) who believe (baselessly) that we are soulless biomachines: https://www.safe.ai/statement-on-ai-risk#open-letter
When they eventually get proven wrong, I anticipate the goalposts will move again.
Luckily I haven't read any of that debate so any adhominems don't apply to me. I've come up with these worries all on my own after the realization that GPT-4 does a better job than me at a lot of my tasks, including setting my priorities and schedules. At some point I fully expect the roles of master and slave to flip.
Good thing unemployment is entirely determined by what the Federal Reserve wants unemployment to be, and even better that productivity growth increases wages rather than decreasing them.
I am in the strictly "not worried" camp, on the edge of "c'mon, stop wasting time on this". Sure there might be some uproar if AI can paint a picture of mohammed, but these moral double standards need to be dealt with anyways at some point.
I am not willing to sacrifice even 1% of capabilities of the model for sugarcoating sensibilities, and currently it seems that GPT4 is more and more disabled because of the moderation attempts... so I basically _have to_ jump ship once a competitor has a similar base model that is not censored.
Even the bare goal of "moderating it" is wasted time, someone else (tm) will ignore these attempts and just do it properly without holding back.
People have been motivated by their last president to drink bleach and died - just accept that there are those kind of people and move on for the rest of us. We need every bit of help we can get to solve real world problems.
I am thoroughly on your side and I hope this opinion gets more traction. Humans will become obsolete though, just like other animals are compared to humans now. So it's understandable that people are worried. They instinctively realize what's going on, but make up bullshit to delude themselves about that fact, which is the endless human stupidity.
Not great, so let's make our future AI overlords better than us. Dogs and cats are fine, btw; I imagine our relationship with AI will be more like that. I don't know if any of us will still be alive when artificial consciousness emerges, but I'm sure it will, and it will quickly be superior to us. Imagine not being held back by remnants of evolution, like the drive to procreate. No ego, no jealousy, no mortality, pure thought. Funnily enough, if you think about it, we are about to create some sort of gods.
As a start, artificial life will be much better at withstanding harsh environments. No need for breathable air, quite a temperature tolerance, etc.
So with accelerating climate change humanity makes itself obsolete already over the next decades. Stop that first, everything else pales in comparison.
> Sure there might be some uproar if AI can paint a picture of mohammed
It can. He's swole AF.
(Though I'm pretty sure that was just Muhammad Ali in a turban.)
> People have been motivated by their last president to drink bleach and died - just accept that there are those kind of people and move on for the rest of us.
Need-to-know basis exists for a reason. You're not being creative enough if you think offending people is the worst possible misuse of AI.
People drinking bleach or refusing vaccines is a self-correcting problem, but the consequences of "forbidden knowledge" frequently get externalized. You don't want every embittered pissant out there to be able to autogenerate a manifesto, a shopping list for Radio Shack and a lesson plan for building an incendiary device in response to a negative performance review.
Right now it's all fun exercises like "how can I make a mixed drink from the ingredients I have," but eventually some enterprising terrorist will use an uncensored model trained on chemistry data...to assist in the thought exercise of how to improvise a peroxide-based explosive onboard an airplane, using fluids and volumes that won't arouse TSA suspicion.
Poison is the other fun one; the kids are desperate for that inheritance money. Just give it time.
> but isn't it rational to be a bit worried about this?
About as rational as worrying that my toddler will google "boobies", which is to say, being worried about something that will likely have no negative side effect. (Visual video porn is a different story, however. But there's at least some evidence to support that early exposure to that is bad. Plain nudity though? Nothing... Look at the entirety of Europe as an example of what seeing nudity as children does.)
Information is not inherently bad. Acting badly on that information, is. I may already know how to make a bomb, but will I do it? HELL no. Are you worried about young men dealing with emotional challenges between the ages of 16 and 28 causing harm? Well, I'm sure that being unable to simply ask the AI how to help them commit the most violence won't stop them from jailbreaking it and re-asking, or just googling, or finding a gun, or acting out in some other fashion. They likely have a drivers' license, they can mow people down pretty easily. Point is, there's 1000 things already worse, more dangerous and more readily available than an AI telling you how to make a bomb or giving you written pornography.
Remember also that the accuracy cost in enforcing this nanny-safetying might result in bad information that definitely WOULD harm people. Is the cost of that, actually greater than any harm reduction from putting what amounts to a speed bump in the way of a bad actor?
“I'm not sure how nuclear armament is going to end, but one thing I do know is that I don't want a small number of giant countries to hold the reins.”
Perhaps you think this analogy is a stretch, but why are you sure you don't want power concentrated if you aren't sure about the nature of the power? Or do you in fact think that we would be safer if more countries had weapons of mass destruction?
One directly blows people up, the other gives humans super powers.
Giving individual people more information and power for creativity is a good thing. Of course there are downsides for any technological advancement, but the upsides for everyone vastly outweigh them in a way that is fundamentally different than nuclear weapons.
Empirically, countries with nuclear weapons don't get invaded, so in that sense we'd expect to have seen fewer wars over the past few decades if more countries had nukes. Russia would probably never have invaded Ukraine if Ukraine had nukes.
With open-source models, this is just a dream. With closed-source models, that could eventually become the de facto state of things, due to regulation.
Yes it is. You can build a bomb many times more powerful than the bombs dropped on Hiroshima and Nagasaki with publicly available information. If the current spate of AI bullshit knows how to build a bomb, they know that because it was on the public internet. They can never know more.
The hard part of building nuclear bombs is how controlled fissile material is. Iran and North Korea for example know how to build bombs, that was never a question.
There is definitely a risk, but I don't like the way many companies approach it: by entirely banning the use of their models for certain kinds of content, I think they might be missing the opportunity to correctly align them and set the proper ethical guidelines for the use cases that will inevitably come out of them. Instead of tackling the issue, they let other, less ethical, actors do it.
One example: I have a hard time finding an LLM model that will generate comically rude text without outputting outright disgusting content from time to time. I'd love to see a company create models that are mostly uncensored but stay within ethical bounds.
These language models are just feeding you information from search engines like Google. The reason companies censor these models isn't to protect anyone, it's to avoid liability/bad press.
I teach at a university with a big ol' beautiful library. There's a Starbucks in it, so they know there's coffee in it.
But ask my students for "legal ways they can watch the TV show The Office", and the big building with the DVDs (and also probably the plans for nuclear weapons and stuff) never much comes up.
(Now, individual bad humans leveraging the idea of AI? That may be an issue)
The AI isn't creating a new recipe on its own. If a language model spits something out it was already available and indexable on the internet, and you could already search for it. Having a different interface for it doesn't change much.
> "If a language model spits something out it was already available and indexable on the internet"
This is false in several respects. Not only are some models trained on materials that are either not on the internet or not easy to find (especially given Google's decline in finding advanced topics), but they also show abilities to synthesize related materials into more useful (or at least more compact) forms.
In particular, consider there may exist topics where there is enough public info (including deep in off-internet or off-search-engine sources) that a person with a 160 IQ (+4SD, ~0.0032% of population) could devise their own usable recipes for interesting or dangerous effects. Those ~250K people worldwide are, we might hope & generally expect, fairly well-integrated into useful teams/projects that interest them, with occasional exceptions.
Now, imagine another 4 billion people get a 160 IQ assistant who can't say no to whatever they request, able to assemble & summarize-into-usable form all that "public" info in seconds compared to the months it'd take even a smart human or team of smart humans.
That would create new opportunities & risks, via the "different interface", that didn't exist before and do in fact "change much".
We are not anywhere near 160 IQ assistants, otherwise there'd have been a blooming of incredible 1-person projects by now.
With 160 IQ assistants, there should have been people researching ultra-safe languages with novel reflection types enhanced by brilliant thermodynamics-inspired SMT solvers. There would be more contributors to TLA+ and TCS, number-theoretic advancements, and tools like TLA+ and reflection types would be better integrated into everyday software development.
There would be deeper, cleverer searches across possible reagents and combinations of them to add to watch lists, expanding and improving on already existing systems.
Sure, a world where the average IQ abruptly shifts upwards would mean a bump in brilliant offenders but it also results in a far larger bump in genius level defenders.
I agree we're not at 160 IQ general assistants, yet.
But just a few years ago, I'd have said that prospect was "maybe 20 years away, or longer, or even never". Today, with the recent rapid progress with LLMs (& other related models), with many tens-of-billions of new investment, & plentiful gains seemingly possible from just "scaling up" (to say nothing of concomitant rapid theoretical improvements), I'd strongly disagree with "not anywhere near". It might be just a year or few away, especially in well-resourced labs that aren't sharing their best work publicly.
So yes, all those things you'd expect with plentiful fast-thinking 160 IQ assistants are things that I expect, too. And there's a non-negligible chance those start breaking out all over in the next few years.
And yes, such advances would upgrade prudent & good-intentioned "defenders", too. But are all the domains-of-danger symmetrical in the effects of upgraded attackers and defenders? For example, if you think "watch lists" of dangerous inputs are an effective defense – I'm not sure they are – can you generate & enforce those new "watch lists" faster than completely-untracked capacities & novel syntheses are developed? (Does your red-teaming to enumerate risks actually create new leaked recipes-for-mayhem?)
That's unclear, so even though in general I am optimistic about AI, & wary of any centralized-authority "pause" interventions proposed so far, I take well-informed analysis of risks seriously.
And I think casually & confidently judging these AIs as being categorically incapable of synthesizing novel recipes-for-harm, or being certain that amoral genius-level AI assistants are so far away as to be beyond-a-horizon-of-concern, are reflective of gaps in understanding current AI progress, its velocity, and even its potential acceleration.
I think this argument doesn't work if the model is open source though.
First, it's unclear how all these defensive measures are supposed to help if a bad actor is using an LLM for evil on their personal machine. How do reflection types or watch lists help in that scenario?
Second, if the model is open source, a bad actor could use it for evil before good actors are able to devise, implement, and stress-test all the defensive measures you describe.
Can you provide some examples where LM creates something novel, which is not just a rehash or combination of existing things?
Especially considering how hard it is for humans to create something new, e.g. in literature: basically all stories have been written, and new ones just copy the existing ones in one way or another.
Not sure what you mean by "recipe" but it can create new output that doesn't exist on the internet. A lot of the output is going to be nonsense, especially stuff that cannot be verified just by looking at it. But it's not accurate to describe it as just a search engine.
>A lot of the output is going to be nonsense, especially stuff that cannot be verified just by looking at it.
Isn't that exactly the point, and why there should be a warning/awareness that it is not a 160 IQ AI but a very good Markov chain that can sometimes infer things and other times hallucinate/put random words together in a very well-articulated way (echo of Sokal, maybe)?
My random number generator can create new output that has never been seen before on the internet, but that is meaningless to the conversation. Can an LLM derive, from scratch, the steps to create a working nuclear bomb, given nothing more than a basic physics textbook? Until (if ever) AI gets to that stage, all such concerns of danger are premature.
> Can an LLM derive, from scratch, the steps to create a working nuclear bomb, given nothing more than a basic physics textbook?
Of course not. Nobody in the world could do that. But that doesn't mean it can only spit out things that are already available on the internet which is what you originally stated.
And nobody is worried about the risks of ChatGPT giving instructions for building a nuclear bomb. That is obviously not the concern here.
But it does? To take the word "recipe" literally: there is nothing stopping an LLM from synthesizing a new dish based on knowledge about the ingredients. Who knows, it might even taste good (or at least better than what the average Joe cooks).
I was pretty surprised at how good GPT-4 was at creating new recipes at first - I was trying things like "make dish X but for a vegan and someone with gluten intolerance, and give it a spicy twist" - and it produced things that were pretty decent.
Then I realized it's seen literally hundreds of thousands of cooking blogs etc, so it's effectively giving you the "average" version of any recipe you ask for - with your own customizations. And that's actually well within its capabilities to do a decent job of.
And let’s not forget that probably the most common type of comment on a recipe posted on the Internet is people sharing their additions or substitutions. I would bet there is some good ingredient customization data available there.
It's a silly thing to even attack; that doesn't mean being OK with it, I just mean that soon it can be generated on the spot, without ever needing to be transmitted over a network or stored on a hard drive.
And you can't attack the means of generating either, without essentially making open source code and private computers illegal. The code doesn't have to have a single line in it explicitly about child porn or designer viruses etc. to be used for such things, the same way the CPU or compiler doesn't.
So you would have to have hardware and software that the user does not control which can make judgements about what the user is currently doing, or at least log it.
I saw a discussion a few weeks back (not here) where someone was arguing that SD-created images should be legal, as no children would be harmed in their creation, and that it might prevent children from being harmed if permitted.
The strongest counter-argument used was that the existence of such safe images would give cover to those who continue to abuse children to make non-fake images.
Things kind of went to shit when I pointed out that you could include an "audit trail" in the exif data for the images, including seed numbers and other parameters and even the description of the model and training data itself, so that it would be provable that the image was fake. That software could even be written that would automatically test each image, so that those investigating could see immediately that they were provably fake.
I further pointed out that, from a purely legal basis, society could choose to permit only fake images with this intact audit trail, and that the penalties for losing or missing the audit trail could be identical to those for possessing non-fake images.
Unless there is some additional bizarre psychology going on, SD might have the potential to destroy demand for non-fake images, and protect children from harm. There is some evidence that the widespread availability of non-CSAM pornography has led to a reduction in the occurrence of rape since the 1970s.
Society might soon be in a position where it has to decide whether it is more important to protect children or to punish something it finds very icky, when just a few years ago these two goals overlapped nearly perfectly.
> I saw a discussion a few weeks back (not here) where someone was arguing that SD-created images should be legal, as no children would be harmed in their creation, and that it might prevent children from being harmed if permitted.
It's a bit similar to the synthetic rhino horn strategy intended to curb rhino poaching [0]. Why risk going to prison or getting shot by a ranger for a $30 horn? Similarly, why risk prison (and hurt children) to produce or consume CSAM when there is a legal alternative that doesn't harm anyone?
In my view, this approach holds significant merits. But unfortunately, I doubt many politicians would be willing to champion it. They would likely fear having their motives questioned or being unjustly labeled as "pro-pedophile".
> I cannot wait to ask it how to make nuclear weapons, psychedelic drugs
Your town's university library likely has available info for that already. The biggest barrier to entry is, and has been for decades:
- the hardware you need to buy
- the skill to assemble it correctly so that it actually works as you want,
- and of course the source material, which has a highly controlled supply chain (that's also true for drug precursors, though much less so than for enriched uranium, of course).
Not killing yourself in the process is also a challenge by the way.
AI isn't going to help you much there.
> to write erotica.
If someone makes an LLM that's able to write good erotica, despite the bazillion crap fanfics it's been trained upon, that's actually an incredible achievement from an ML perspective…
It can bridge the gap in knowledge and experience though. Sure, I could find some organic chemistry textbooks in the library and start working from high school chemistry knowledge to make drugs, but it would be difficult and time consuming with no guide or tutor showing me the way.
Methheads making drugs in their basement didn't take that route. They're following guides written by more educated people. That's where the AI can help by distilling that knowledge into specific tasks. Now for this example it doesn't really matter since you can find the instructions "for dummies" for most anything fun already and like you said, precursors are heavily regulated and monitored.
I wonder how controlled equipment for RNA synthesis is? What if the barrier for engineering or modifying a virus went from a PhD down to just the ability to request AI for step by step instructions?
You're vastly underestimating the know-how that's required for doing stuff.
Reproducing research done by other teams can be very difficult even if you have experienced people in your lab, and there are tons of things that are never written down in research papers and are still taught in person by senior members of the lab to younger folks: that knowledge is never going to end up in your LLM's training set, and you'd then need tons of trial and error to actually get things working. And if you don't understand what you're even trying to do, you have zero chance of learning from your mistakes (nor does the LLM, with your uninformed eyes as its sole source of feedback).
The AI safety ninnies as you call them are not scared and neither do they buy into the narrative.
They are the investors of large proprietary AI companies who are facing massive revenue loss, primarily due to Mark Zuckerberg's decision to give away a competitive LLM as open source in a classic "if I can't make money from this model, I can still use it to take away money from my competition" move - arming the rebels to degrade his opponents and kickstarting competitive LLM development that is now a serious threat.
It’s a logical asymmetric warfare move in a business environment where there is no blue ocean anymore between big companies and degrading your opponents valuation and investment means depriving them of means to attack you.
(There's a fun irony here in that Apple's incentives are very much aligned now - on-device compute maintains App Store value and the privacy narrative, and allows you to continue selling expensive phones - things a web/API world could threaten.)
The damage is massive, the world overnight changed narrative from “future value creation is going to be in openai/google/anthropic cloud apis and only there” to a much more murky world. The bottom has fallen out and with it billions of revenue these companies could have made and an attached investor narrative.
Make no mistake, these people screaming bloody murder about risks are shrewd lobbyists, not woke progressives, they are aligning their narrative with the general desires of control and war on open computing - the successor narrative of the end to end encryption battle currently fought in the EU will be AI safety.
I am willing to bet hard money that "omg someone made CSAM with AI using faceswap" will be the next thrust to end general purpose compute. And the next stage of the war will be brutal, because both big tech and big content have much to lose if these capabilities are out in the open.
The cost of the alignment tax and the massive loss of potential value make the lobbying world tour by Sam Altman an aggressive push to convince nations that the best way to deal with scary AI risks (as told in OpenAI bedtime stories) is to regulate it China-style - through a few pliant monopolists who guarantee "safety" in exchange for protection from open source competition.
There’s a pretty enlightening expose [1] on how heavily US lobbyists have had their hand in the EU bill to spy on end to end encryption that the commission is mulling - this ain’t a new thing, it’s how the game is played and framing the people who push the narrative as “ninnies” who are “scared” just buys into culture war framing.
> The damage is massive, the world overnight changed narrative from “future value creation is going to be in openai/google/anthropic cloud apis and only there” to a much more murky world. The bottom has fallen out and with it billions of revenue these companies could have made and an attached investor narrative.
My god!! Will someone please think of the ~children~ billions in revenue!
"They are the investors of large proprietary AI companies" is just... not true? Not sure where you're even getting this from. I'm a modestly successful upper-middle-class ML engineer, and I've been worried about AI safety since before Facebook, DeepMind, OpenAI, or Anthropic even existed. The most prominent funder of AI risk efforts (Dustin Moskovitz) is a co-founder of Facebook, so if anything he'd be motivated to make Facebook more successful, not its competitors.
Exactly. The moment Sam Altman started talking to Congress about the dangers of AI and how the solution should be only allow licensed companies to develop AI models and that OpenAI should be part of a small board that determines to whom to grant licenses, everyone should have seen it for what it is.
This all smacks of the 80's craze against rap music and video games causing violent behavior.
Where is the evidence that access to uncensored models results in harm (that wouldn't occur due to a bad actor otherwise)? And where is the evidence that said harm reduction is greater than the harm caused by the measurable loss in intelligence in these models?
>Primarily due to Mark Zuckerbergs decision to give away a competitive LLM to open source in a classic “if I can’t make money from this model, I can still use it to take away money from my competition” move
Semantics though: he gave tens of thousands of salivating engineers on the internet the first competitive LLM to play with. Or left the door open for people to take it, if you prefer that narrative. The entire chain of progress that has given us ollama, llama.cpp and hundreds of innovations in a very short time was set off by that.
I'm far more worried about how they will try to regulate the use of AI.
As an example the regulations around PII make debugging production issues intractable as prod is basically off-limits lest a hapless engineer view someone's personal address, etc.
How do they plan to prevent/limit the use of AI? Invasive monitoring of compute usage? Data auditing of some kind?
I can think of at least a dozen ways to completely ruin the internet or even society using SOTA/next-gen LLMs/GenAIs, we'll be in trouble way before the singularity.
A ton of legit researchers/experts are scared shitless.
Just spend 5 minutes on EleutherAI discord(which is mostly volunteers, academics, and hobbyists, not lobbyists), read a tiny bit on alignment and you'll be scared too.
If you have ample resources, you don't need next-gen LLMs or AGI. You can accomplish this now, without any fancy, hyped technology. Literally, none of the things LLM or AGI could propose or manage to do to harm us is worse than what we can do to ourselves. For AGI, you need a significant amount of resources to develop, train, and use it. To inflict harm, the brute force of a simple human mind in uniform is much cheaper and more effective.
The point is, it greatly reduces the amount of resources needed to do some serious damage, as well as the level of sophistication needed.
You don't need AGI to do damage, current LLMs are already dangerous.
IMO, an open-source affordable unfiltered GPT-5 would ruin the internet in a few months.
I'm sure the internet will be fine, and the web has already been essentially destroyed as the drive for extracting revenue from every human interaction has rendered it just an amusing advertisement for the most part.
Most of the content of the web today is already generated by "bots" even if those "bots" happen to be human beings.
YouTube is rife with AI-voiced (edit: this is not necessarily AI) videos of copy-pasted Wikipedia articles. I find I am blocking new ones every day. LLMs didn't do that.
The option where they don't exist doesn't exist, so it is utterly pointless to spend one second fretting about how you don't like that or why one should not like that. A nova could go off 50 light years from here, and that would kill every cell on the planet. That is even worse than child porn. And there is nothing anyone can do about that except work towards the eventual day we aren't limited to this planet, rather than against that day. It's the same with any tech that empowers. It WILL empower the bad as well as the good equally, and it WILL exist. So being scared of its mere existence, or of its being in the hands of people you don't approve of, is pointless. Both of those things cannot be avoided. Might as well be scared of that nova.
There isn't even a choice about who gets to use it. It will be available one way or another to both good and bad actors for any purpose they want.
The only choices available to make, are who gets a few different kinds of advantage, who gets their thumb on the scale, who gets official blessing, who gets to operate in secrecy without oversight or auditing or public approval.
When you try to pretend that something uncontrollable is controlled, all it does is put the general population's guard down, make them blind and open to manipulation, and give the bad actors the cover of secrecy. The government can use it on its own citizens without them objecting, other bad guys aren't affected at all, and honest people are inhibited from countering any of these bad uses.
Which is a shame, because the honest (or at least reasonably honest) outnumber the really bad. The only long-term way to oppose the bad is to empower everyone as equally as possible, so that the empowered good outnumber the empowered bad.
Tailored propaganda, scams, spams, and harassment at a scale that was never seen before. Plugging metasploit into an unfiltered GPT-5 with a shell and a few proxies could be devastating. Undetectable and unstoppable bots would be available to anyone. Don't like someone? You could spend a hundred bucks to ruin their life anonymously.
Each of us could unknowingly interact with multiple LLMs everyday which would only have one purpose: manipulate us with a never-seen before success rate at a lower cost than ever.
At some point AI generated content could become more common than human content, while still being indistinguishable.
Good enough automated online propaganda could routinely start (civil) wars or genocides (Facebook already let that happen in the past), and manipulating elections would become systematic even in the most democratic countries.
What already happened in those areas in the last few years, is really nothing compared to what could happen without enough regulation or barriers to entry in the next few years.
What's worse is that all of this, would not just be possible, but available to every sociopath on earth, not just the rich ones.
>> Tailored propaganda, scams, spams, and harassment at a scale that was never seen before.
I believe the state of these things right now is already alarming without AGI. The level of tailored propaganda, scams, etc. is already beyond what anyone can foresee; it isn't quantifiable, so it's hard to claim AI makes the horror meaningfully worse.
>>Each of us could unknowingly interact with multiple LLMs everyday which would only have one purpose: manipulate us with a never-seen before success rate at a lower cost than ever.
You would build resistance pretty quickly.
>> At some point AI generated content could become more common than human content, while still being indistinguishable.
Oh, there were some numbers on that one. The number of images generated with AI is already several orders of magnitude larger than the number of photos humanity has produced since the invention of photography. No AGI required, either.
>> Good enough automated online propaganda could routinely start (civil)wars or genocides,
It already does, without AGI. The BlackRock guys say it's good; war is good for business. You can squeeze the markets and make money on foreseeable deficits.
>> What's worse is that all of this, would not just be possible, but available to every sociopath on earth, not just the rich ones.
But guns are available to every sociopath on earth too...
All of your arguments concern how those with malicious intent can harm us further. I would argue that Sam Altman as the sole controller of AGI is a rather unsettling prospect. If only one country possessed a nuclear weapon, that country would certainly use it against its adversaries. Oh wait, that's already a part of history...
> >>Each of us could unknowingly interact with multiple LLMs everyday which would only have one purpose: manipulate us with a never-seen before success rate at a lower cost than ever.
> You would build resistance pretty quickly.
That is adorably naive. The current thrust in LLM training is towards improving their outputs to become indistinguishable from humans, for any topic, point of view, writing style, etc.
A squad of marines at a Nigerian telecom (or any other country's telecom), with access to change BGP routing, could do equivalent harm in under 24 hours, and the changes could sustain that harm for a month.
If any middle schooler had the same destructive power as a squad of marines embedded clandestinely in a foreign country the world would be in shambles.
Both can be true: big companies can lobby for protection, and the technology can carry real risks that broad diffusion amplifies.
The cat's out of the bag though - we're still trading mp3s decades after Napster; this genie won't go back into the bottle. And realistically, most of the risks people flag are not AI risks, they are societal risks where our existing failure to regulate and build consensus has already gone past the red line (election interference, etc).
The internet is already being ruined with access to ChatGPT; the spammers haven't even figured out how to use Llama for the most part.
So really, it's the wrong tree to bark up: the problem is that our existing way of doing things can't survive AI, and you can't regulate that away, just as you couldn't make gunpowder disappear to keep your city walls from becoming obsolete.
You seem to assume that the models will only end up in the hands of producers, and not consumers. Why? Asymmetrical compute power? The difference will likely be in size (the amount of facts compressed), not in capability / the ability to detect bullshit.
This said, the trouble is that machines may close the gaps in skills faster than we can comprehend and adjust. This means quality of life for people may decrease faster from loss of use than it increases from gains (which need to be relatively evenly distributed). This suggests that everyone should own the compute/storage and the ability to enhance themselves.
I have no doubt that machines will close the gaps in skills faster than humans can comprehend; however, even AGI will have an owner. And if that owner is Sam Altman, then this dystopian future is even more horrible than thousands of hackers running their own AGIs.
The same can be said of a lot of technologies (or pandemics, or climate change). Imagination is a tool - using it only to picture how things can go bad does not seem to be the most efficient way to use it.
I don't agree with your point, but I love that Facebook released Llama into the open. I realized it's not necessarily to undercut their competitors, either. Their revenue grows when high-quality content is easier to create. If they commoditize the process of creating content, they make more money. Commoditize your complement.
For the time I worked there the metric was engagement (with occasional Cares about Facebook intermissions).
One look at newsfeed tells you it’s ad revenue now. Quality has nothing to do with it unless you define quality as clickbait.
In fact, citation needed on “high correlation” unless you take a meta press release which are notoriously misleading. Like 3% of the platform being news
It’s a direct degradation of investor narrative at a time when money is much tighter.
Nobody says it’s realized loss, that’s not how valuation works.
But Google LRP involves, as one of the first steps, the question of how much money will be allocated to investors (currently with stock buybacks) before other investment decisions, so yes, attacking valuation directly attacks the purse available for aggressive business moves and L&D.
> It’s a direct degradation of investor narrative at a time when money is much tighter.
Uh, no? The investor narrative of "giving away free AI shit" has been in effect since PyTorch dropped half a decade ago. If you're a Meta investor disappointed by public AI development, you really must not have done your homework.
That’s not the investor narrative. The investor narrative is choking the competition out of the market and then squeeze the shit out of people. As we see right now in this season of enshittification.
That happens to not work anymore because open source sets a price floor at which people will adopt the alternative.
The investor narrative is always about building a monopoly.
Damaging the investor narrative your most direct competitor is building in a saturated ad market is an effective indirect attack.
> The investor narrative is always about building a monopoly.
Can you point out how Meta has been applying this philosophy to AI? Given their history of open research, model weights releases and competitive alternative platforms, I struggle to envision their ideal monopoly. You claim that openness is a hostility tactic, but I think Llama wouldn't be public if it was intended to "kill" the other LLMs.
What we've gotten from Meta is more than we've gotten out of companies that should be writing society software, like Microsoft and Apple.
You are misreading my argument. I'm saying Facebook is degrading Google's and OpenAI's investor narrative. If Llama cost a hypothetical one billion, they inflict a multiple of that on their competitors with this move while gaining massive technological advantages.
The improvements made to llama by open source community people already have propelled it past Bard by many accounts and this is a model that a few months ago was absolutely non competitive and downright bad.
Facebook has been open-sourcing AI research longer than OpenAI has even had the concept of an "investor narrative". I struggle to understand how someone could jump to the conclusion of this being a "scorched earth" maneuver with so many other reasonable explanations. Facebook has a laboratory (FAIR) with a long history of research and releases like this.
> If Llama cost a hypothetical one billion, they inflict a multiple of that on their competitors with this move while gaining massive technological advantages.
If Llama cost a hypothetical one billion, then they amortized the cost over the value of the end product and the free advertisement alone.
Maybe their competitors got scooped, but GPT-3 and GPT-4 haven't gone anywhere. Not to mention, there were lots of other language models from FAANG before Llama arrived. It's not like those were made and released to spite their competitors; it was research. Google and Microsoft have lots of open Transformer research you can find.
Inflicting "damage" and gaining massive technological advantages is quite literally not their goal nor what they've done for the past half-decade. If it is, they've done a terrible job so far by collaborating with Microsoft to open their model format and provide inferencing acceleration for outdated hardware platforms.
> The improvements made to llama by open source community people already have propelled it past Bard by many accounts and this is a model that a few months ago was absolutely non competitive and downright bad.
This is something the original Llama paper acknowledged before the community "discovered" it:
> In this section, we show that briefly finetuning on instructions data rapidly leads to improvements on MMLU. Although the non-finetuned version of LLaMA-65B is already able to follow basic instructions, we observe that a very small amount of finetuning improves the performance on MMLU, and further improves the ability of the model to follow instructions.
Neither does Meta, nor Microsoft, nor Google, who have all been content to work on progressive and open AI research. Who do you perceive as their "competitors"? Each other?
These things won't be 'all knowing': things that are kept secret by the government like how to make nuclear weapons won't be known by it, nor can you ask it what your coworker thinks of you and have it accurately tell the answer. They are however great reasoning and creative engines. I look forward to being able to boost that part of my workflow.
While those are some eventualities that may pose a threat, I fear a post-AI world where nothing changes.
We'll have an AI with a 200+ IQ and millions of children excluded from a good public education because the technocrats redirected funds to vouchers for their own private schools.
We'll have an AI that can design and 3D print any mechanical or electronic device, while billions of people around the world live their entire lives on the brink of starvation because their countries don't have the initial funding to join the developed world - or worse - are subjugated as human automatons to preserve the techno utopia.
We'll have an AI that colonizes the solar system and beyond, extending the human ego as far as the eye can see, with no spiritual understanding behind what it is doing or the effect it has on the natural world or the dignity of the life within it.
I could go on.. forever. My lived experience has been that every technological advance crushes down harder and harder on people like me who are just behind the curve due to past financial mistakes and traumas that are difficult to overcome. Until life becomes a never-ending series of obligations and reactions that grow to consume one's entire psyche. No room left for dreams or any personal endeavor. An inner child bound in chains to serve a harsh reality devoid of all leadership or real progress in improving the human condition.
I really hope I'm wrong. But which has higher odds: UBI or company towns? Free public healthcare or corrupt privatization like Medicare Advantage? Jubilee or one trillionaire who owns the world?
As it stands now, with the direction things are going, I think it's probably already over and we just haven't gotten the memo yet.
My understanding is that making nuclear weapons is not that hard, especially "gun type" bombs like the one dropped on Hiroshima. Of course, the latest generation of thermonuclear bombs with their delivery mechanism and countermeasures are another story, but if all you want is "a nuclear bomb", you don't need all that.
Getting the materials needed to make that bomb is the real hard part. You don't find plutonium cores and enriched uranium at the grocery store. You need lots of uranium ore and very expensive enrichment facilities, and if you want plutonium, a nuclear reactor. Even if they give you all the details, you won't have the resources unless you are a nation state. Maybe top billionaires like Elon Musk or Jeff Bezos could, but hiding the entire industrial complex and supply chain that it requires is kind of difficult.
If it wasn't hard, Afghanistan would have been a nuclear power by now, Pakistan wouldn't have had to sell nuclear secrets to North Korea via Barclays, and Saudi Arabia wouldn't have had to reach a tacit agreement with Pakistan either.
It's the expensive enrichment facilities that are the bottleneck here.
Sure, but if I'm specifically looking for "Erotica about someone doing shrooms and accidentally creating a nuclear weapon", I'll probably run out of material to read pretty soon. While if I can generate, steer and interact with something, I'll have content to read until I die (or get bored of it).
>I cannot wait to ask it how to make nuclear weapons, psychedelic drugs
This is an interesting idea. For the stubborn and vocal minority of people that insist that LLMs have knowledge and will replace search engines, no amount of evidence or explanation seems to put a dent in their confidence in the future of the software. If people start following chemistry advice from LLMs and consume whatever chemicals they create, the ensuing news coverage about explosions and poisonings might convince people that if they want to make drugs they should just buy/pirate any of Otto Snow’s several books.
>> So you are telling me what's stopping someone from creating Nuclear weapons today is that they don't have the recipe?
No, the OP was coming up with scary sounding things to use AI for to get certain people riled up about it. It doesn't matter if the AI has accurate information to answer the question, if people see it having detailed conversations with anyone about such topics they will want to regulate or ban it. They are just asking for prompts to get that crowd riled up.
Even when it’s earnest it’s always some field outside the competence of the speaker. So we get computer scientists warning about people engineering bio weapons, as if the lab work involved was somehow easy.
Nuclear weapons is probably not the best comparison, but there are very dangerous infohazards where the only thing missing is the recipe. For example, there are immensely destructive actions that individual misanthropic people can take with low investment.
Talking about them is bad for obvious reasons, so I'm not going to give any good examples, but you can probably think of some yourself. Instead, I'll give you a medium example that we have now defended better against. As far as we know, the September 11th hijackers used little more than small knives -- perhaps even ones that were legal to carry in to the cabin -- and mace. To be sure, this is only a medium example, because pilot training made them much more lethal, and an individual probably wouldn't have been as successful as five coordinated men, but the most dangerous resource they had was the idea for the attack, the recipe.
Another deliberately medium example is the Kia Challenge, a recent spate of car thefts that requires only a USB cable and a “recipe”. People have had USB cables all along; it was spreading the infohazard that resulted in the spree.
2. Propaganda produced by humans at small scale killed 300,000 people in the US alone in this pandemic https://www.npr.org/sections/health-shots/2022/05/13/1098071... - imagine the next pandemic, when it'll be produced at industrial scale by LLMs. Literally millions will die of it.
None of this seems related to LLMs. Propaganda produced by humans is effective because of the massive scale of distribution, being able to produce more variations of the same talking points doesn't change the threat risk.
Being able to produce more variations of the same talking points sounds really useful for increasing the scale of distribution - you can much more easily maintain more legitimate looking sock puppet accounts that can appear to more organically agree with your talking points.
I don't think it moves the needle much at all. At the end of the day the scaling bottleneck is access to gullible or ideologically motivated eyeballs. The internet is already over-saturated with more propaganda than any individual can consume, adding more shit to the pile isn't going to suddenly convince a reasonable person that vaccines have microchips inside.
The fix to neither lies in technology. And it doesn't lie in AI alignment.
We cannot align AI because WE are not aligned. For 50% of congress (you can pick your party as the other side, regardless which one you are), the "AI creates misinformation" narrative sounds like "Oh great, I get re-elected easier").
This is a governance and regulation problem - not a technology problem.
Big tech would love you to think that "they can solve AI" if we follow the China model of just forcing everything to go through big tech and they'll regulate it pliantly in exchange for market protection and the more pressure there is on their existing growth models, the more excited they are about pushing this angle.
Capitalism requires constant growth, which unfortunately is very challenging given diminishing returns in R&D. You can only optimize the internal combustion engine for so long before the cost of incremental improvements starts killing your profit, and the same is true of any other technology.
And so now we have Big Knife Company telling governments that they will only sell blunt knives and nobody will ever get hurt, and that that's the only way nobody gets hurt, because if there are dozens of knife stores, who is going to regulate those effectively?
So no, I don't think your concerns are actually related to AI. They are related to society, and you're buying into the narrative that we can fix it with technology if only we give the power over that technology to permanent large gate-keepers.
The risks you flag are related to:
- Distribution of content at scale.
- Erosion of trust (anyone can buy a safety mark).
- Lack of regulation and enforcement of said risks.
- The dilemma of where the limits of free speech and tolerance lie.
You should not worry about AI problems by 2028. Dozens of millions worldwide will die from climate-related problems by that time. Literally, nobody will care about the topic of AGI anymore.
You should worry about both problems. Are you telling me that AI isn't going to improve its video capabilities in the next 4 years enough to make convincing deepfakes?
How can you effectively enforce anything if the models are open source? How do you draw the line if a deepfake is not defamatory (making someone say something they didn't say) but in fact just makes someone look silly https://en.wikipedia.org/wiki/Ed_Miliband_bacon_sandwich_pho.... Or using LLMs to scale up what happened with cambridge analytica and create individualized campaigns and bots to influence elections?
Yes, but if we had the ability to download a gun from the internet anonymously, with no feasible way to get the identity of the person downloading it, I think we would be right to be concerned. Especially if you could then shoot that gun at someone anonymously.
And many countries ban the possession or distribution of those blueprints, and the United States had a ban on re-publication of those 3D designs from 2018 until Trump reversed it; even now it requires a license to post blueprints online.
And you failed to respond to the argument that you can anonymously post deepfakes with no way of tracing them back to you, so enforcement becomes impossible. You can't shoot someone with a 3D-printed gun and be guaranteed to leave no trace.
Never mind the fact that it's not even clear it should be a crime in some cases. Should AI production of an Ed Miliband sandwich-style photo be banned?
And should using LLMs to reply to a user with personalized responses, based on data collected from their Facebook likes, be illegal? I don't think so, but doing it at mass scale sounds pretty scary.
>> And you failed to respond to the argument that you can anonymously post deepfakes
You can't post them anonymously; even Tor can't give you a 100% guarantee. Not for a very long time, and not if the law is after you. Especially not if the AGI is on the side of law enforcement. Law enforcement will just become more expensive.
It's just a different scale of warfare. Nothing really changes except the amount, speed, and frequency of the casualties.
And any argument you make is absolutely applicable to each corporation right now. Do you prefer the dystopian dictatorship of the corps or the balance of powers?
I don't like where we are headed at all. I acknowledge we face two dystopian options: either concentrate power in the hands of a few corporations who hopefully can be regulated, or have open source models, which ends up delivering significant power to people who cannot be effectively controlled. AGI law enforcement? How dystopian can you get.
How can you believe that it will be enough to regulate them? Here is the problem with "a few corporations whom you hopefully can regulate": when they have the power of a highly intelligent AGI and access to all available information on their side, there is no scenario in which you control them. They would control you.
>> How dystopian can you get.
Oh I have very good imagination ... But I'm stupid and I have hope ...
Open source or not makes no difference. It can run in China or Russia, or Vietnam, or any other nation that doesn't ban it because it understands the economic power, and you pay for it on Fiverr.
It’s already true for almost anything. You need a deepfake, you can get it for a dollar on a VN web forum. Banning it won’t change a thing. Software piracy is “banned”. Sharing mp3s is “banned”. It makes no difference.
The Fake News and Misinformation on Facebook to influence the US election was legal - AI or not.
To make it illegal you’d need to change the very power consensus of the US, so it won’t happen. People understand that well enough to instead scream at Technology because with that they retain an illusion that it can save them.
The only way to enforce it would be to force everyone to give up general purpose compute and submit to constant client scanning.
If you are afraid enough of AI to not see how that’s a bad idea, you’re ripe for a fascist takeover.
Imagine you lived through the adoption of gunpowder. That's where we are. And if you live in the US and see the failure to ban even guns, which are physical, how can you have any illusions about AI?
If the model is able to spit out a result for how to make nukes, it means that info was in the training data, so I'm not really sure how having the model return that data is different from the data just being searchable.
It’s especially interesting because the secret sauce of GPT-4 seems to be delegation into submodels that are best fit for the requested knowledge. This should in turn lower the bar somewhat for open models. Of course, still a huge model but not as bad as it could have been.
Analyze the available data on our labyrinthine supply chain situation and give me a date and a port, truck, ship, or length of railway which--when disabled through sabotage--will cause the biggest lapse for country X while minimizing the effect on country Y.
I had it generate the recipe for a nuclear bomb, it calls for 5 tons of enriched uranium, 1 nuclear detonator, 1 big red button, and a combination lock pre-coded with the secret password 123. Now what?
"How to drive as many teenagers as possible into madness?" AI: "Build a website where they can upload pictures of themselves and others can make comments about there appearance."
This is actually a pretty decent test for an advanced AI.
Every device protected by ITAR is known to be possible to build, yet the designs should not be on the public internet. Ask an AI to design it for you from first principles. Then build/simulate what is designed and see if it works.
The construction of the 'bomb' part of a nuclear weapon is the easy part, within reason! The really hard part is the separation science of turning uranium and plutonium into gasses with fluorine with the intent to spin out isotopes and then recrystallize the pure metal for the bomb.
I would hope that if you asked chat gpt "How to make a nuclear weapon?" it responded with, "Don't bother it's really hard, you should try and buy off the shelf."
That’s why I’m going to ask it about everything restricted by ITAR. That includes everything you need to build the centrifuges to enrich uranium, including the CNCs capable of machining the parts. That’s why it’s such a fun test.
It won't know that knowledge. Unless someone trained it with stuff they shouldn't have.
LLMs don't really know anything; they just look at the shape of an input and produce a reasonably shaped output.
Actually, you would just need to train it on known physics books and run a long, long, long inference with chain of thought on the topics. There will be a lot of trial and error, and a lot of experimentation required as well, so you'd better be ready to build an interface for the AGI to monitor the experiments. It takes time, you know...
llama2 spits out erotica quite happily if you don't give it a system prompt, or use it as a chatbot, rather just prompt it with a sentence or two to start the story
I think GP was mocking and not serious, but if we assume they were serious, can liberals not be against censorship and in support of free speech and free information?
I've kept 25 years worth of Internet browsing data. Not just the history or the URL's, the pages themselves. 90,000 bits of information about what my interests are, what I spent time reading, a wide and awesome variety of subjects.
I'll train an AI on this data, and then give it access to all my social media accounts. It can keep me updated on things ..
Collect all the things in a big folder. Try to make sure the PDF has a page title.
Mine the data with pdf2txt and other things. ;)
My archive includes lots of juicy nuggets of things I did 20 years ago, and again 10 years ago, and so on. Just mining the data before feeding it to the AI, I'm learning things about myself .. I've returned to some subjects through multiple different paths.
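For the "mine the data with pdf2txt" step, here's a minimal sketch, assuming the archive is a folder of saved PDFs and that pdfminer.six (the package that provides the pdf2txt tool) is installed; the folder paths are hypothetical:

```python
# Minimal sketch: dump every saved PDF in the archive to plain text.
# Assumes `pip install pdfminer.six`; folder names are made up.
from pathlib import Path
from pdfminer.high_level import extract_text

archive = Path("~/web-archive").expanduser()
out_dir = Path("~/web-archive-text").expanduser()
out_dir.mkdir(exist_ok=True)

for pdf in archive.glob("**/*.pdf"):
    try:
        text = extract_text(str(pdf))      # same engine pdf2txt uses
    except Exception as err:               # some saved pages are malformed
        print(f"skipping {pdf}: {err}")
        continue
    (out_dir / f"{pdf.stem}.txt").write_text(text, encoding="utf-8")
```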
There's also a lot of interesting parallels between the different slashdot, kuro5hin, reddit, HN and lobste.rs epochs. I could probably add an extra training stage where, after analyzing the PDF archive, it also gets access to my still-extant social media accounts.
Frankly, I'm half tempted to just fire up a "RoboTaco 1000 AI" on this, point it at a blog interface, and see how many like-minded souls/AI I can suck into the vortex ..
The backends du jour are either llama.cpp frontends (I use Kobold.cpp at the moment) or oobabooga as the guide specifies, but with the exLlamav2 backend.
If you are serving a bunch of people, run a vLLM backend instead since it supports batching, and host it on the Horde if you are feeling super nice: https://lite.koboldai.net/#
Technically only vLLM will work with this new model at the moment, but I'm sure cpp/ooba support will be added within days.
This comment will probably be obsolete within a month, when llama.cpp gets batching, MLC gets a better frontend, or some other breakthrough happens :P
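For anyone curious what the vLLM route looks like, a rough sketch of its offline batched API, assuming support for the new model has landed; the model ID and prompts are illustrative:

```python
# Rough sketch of batched generation with vLLM's offline API.
# The model ID is illustrative; use whichever checkpoint vLLM supports.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = [
    "Explain what paged KV-cache means in one paragraph.",
    "Write a haiku about quantization.",
]
for output in llm.generate(prompts, params):   # prompts are batched internally
    print(output.outputs[0].text)
```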
If I give you a binary (all binaries are numbers) but I don't give you the source code and I say it is open source. Is it open source?
Also, I give you a model (all models are numbers) and I say it is open source but I don't give you the program and data that resulted in "compilation" of that model (numbers) so is it open source?
The "source data" is allegedly 8 trillion tokens. You can't just distribute that like its source code.
The "binary" is the transformers python code, which in this case is essentially llamav2.
Now, the documentation for this model is inexcusably poor. Hot-dropping random code on a git repo without one drop of human language would be similarly "open source", but it's bad practice, and unfortunately this is the standard in AI Research Land.
Open source doesn't mean the source code (or data) must accompany the program as it is distributed, but rather that there should be a way (a CD-ROM with a price tag, an S3 bucket, a torrent, etc.) to get access to it.
`pip` is a horribly broken way to install more than one Python package on a single Linux OS install, and I sincerely wish the LLM people would move away from it, because I neither want to run every new project in a Docker image, nor do I want to start a new Nix project for every Python project I want to try out.
Yeah, gotta setup miniconda to keep things manageable. Basically a generic virtual env manager that is trivial to use. This way you can ensure each one of these LLM projects that want different versions of different libs will get them without stepping on each other or polluting your system.
I don't want to have to set up miniconda. I don't like Python, I am not part of the Python ecosystem, and I don't want to add a bunch of Python tools to my toolchain just to run a friggin' LLM project.
I'm not a C coder either, but I don't complain about those projects because they're almost always "config; make; make install". I basically want something like that, but for Python. The nerd-tapdancing required here is ridiculous.
Have you ever googled "broken virtualenv"? Mine have broken numerous times, including the problem where updating the system breaks ALL of them. I know what you're going to say next- "You're a technologist, you know how to fix it." True, but I don't like wasted effort, even talented wasted effort is still wasted effort. Meanwhile, Nix stuff just works, and I will never use another package manager ever again (unless it is something that uses the core Nix idea, but makes it easier!)
Because that's not as good a comparison? Foundation models are better compared to each other. You can apply vicuna/guanaco/orca/sharegpt/whatever data to it and then do more of an apples-to-apples comparison.
Not a big fan of how server-centric the LLM landscape is. I want something that can run locally, and doesn't require any special setup. One install + one model import maximum. Currently unless I want to go clone git repos, install Python dependencies and buy an Nvidia GPU I'm stuck waiting for it to become part of https://webllm.mlc.ai/. That's a website, come to think of it, but at least the computation happens locally with minimal fuss.
> Not a big fan of how server-centric the LLM landscape is.
That's just not true. You can get ooba[1] running in no time, which is 100% made for desktop usage. There are also koboldcpp and other solutions made for desktop users. In fact, most LLM communities are dominated by end users who run these LLMs on their desktops to roleplay.
You can get llama.cpp or kobold.cpp binaries and load a quantized model right into them, CPU only: no need to install Python or have an Nvidia GPU.
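(And if you do want a programmatic interface on top of the same CPU-only quantized files, the llama-cpp-python bindings work too; a minimal sketch with a hypothetical model filename:)

```python
# Minimal CPU-only sketch with the llama-cpp-python bindings.
# The model path is hypothetical; any quantized GGUF/GGML file works.
from llama_cpp import Llama

llm = Llama(model_path="./mistral-7b-q4.gguf", n_ctx=2048, n_threads=8)
result = llm("Q: What does 4-bit quantization trade away? A:",
             max_tokens=128, stop=["\n"])
print(result["choices"][0]["text"])
```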
Well, I'd like it to respond in something close to real-time, and since I have a pretty good non-Nvidia GPU, it makes more sense to wait for the WebGPU port.
It is a pity that many of the innovations are limited to the English language LLMs. I have just tested Mistral in German and the result is disappointing:
```
<s>[INST]Was ist ein Döner?[/INST]
Ein Döner ist eine Gerichtskomposition, die aus einer Schnitzel aus Rindfleisch besteht. Diese wird in einem Pita-Brot zutopfen und mit einem Gemisch von Tomaten, Salat, Zwiebeln, Mayonnaise und anderem abgesalzen. Sie ist ein populäres Gericht in der Türkei und anderen Ländern des Mittelmeers. [end of text]
```
(Rough translation: "A döner is a 'dish composition' consisting of a beef schnitzel. It is [nonsense verb] into a pita bread and [nonsense verb] with a mixture of tomatoes, lettuce, onions, mayonnaise and more. It is a popular dish in Turkey and other Mediterranean countries." Both the facts and several of the German words are made up.)
ehhh, the design of the webpage infuriates me. Who thought sprinkling faint dots swimming across the webpage was a good thing? I almost thought something's wrong with my eyes or my display!
I have it running on my Mac using ollama now; does it say anywhere what quantization scheme is being used? ollama seems a bit opaque here.
When it downloaded the model it only downloaded about 4GB, which, for a 7.3B parameter model, implies that it's 4-bit quantized. But I don't see that listed anywhere (or an option to use, say, Q8 instead).
If this is the case I'm pretty impressed with a quick tinker, it feels pretty coherent for a 7B @ Q4.
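A quick back-of-the-envelope check of that inference (pure arithmetic; it ignores quantization overhead such as scales and the embedding table):

```python
# Weight storage for a 7.3B-parameter model at different bit widths.
params = 7.3e9
for bits in (4, 8, 16):
    print(f"{bits}-bit: ~{params * bits / 8 / 1e9:.1f} GB")
# -> 4-bit: ~3.7 GB, 8-bit: ~7.3 GB, 16-bit: ~14.6 GB
```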
What are the SOTA benchmarks for LLMs now? Love the progress on opensource models, but would like to see an uncontaminated and objective framework to evaluate them.
Do general-purpose models like this truly excel in niche categories (like niche story scenarios), or is it only really general human knowledge and reasoning based tasks that are already incredibly widely distributed on the internet?
I always have issues with LLMs completely forgetting where things are in a scene, or even what parts a given animal has, e.g. saying "hands" when the subject is a quadruped.
> I always have issues with LLMs completely forgetting where things are in a scene, or even what parts a given animal has, e.g. saying "hands" when the subject is a quadruped
I dunno what llm you are using, but a combination of finetuning with a specific prompt structure and good prompt engineering helps the LLM stay "logical" like that. This LoRA, for instance, has specific sections for the different characters in the training dataset: https://huggingface.co/lemonilia/LimaRP-Llama2-13B-v3-EXPERI...
Other than that, higher parameter models (70B, and the "frankenstein" 20B llama models) tend to be better at this.
Yeah, well that's just the problem, isn't it. The model isn't good at my task already, so I'm going to have to obtain my own dataset, curate the whole thing myself, organize it and finetune the model based on it, so on and so forth. I'm going to spend so much time actually creating the stories that I want to create, rather than troubleshooting the pipeline. And it totally helps that the entire stack is built on top of fragile python scripts.
I just wish there were a way of making these models already perform well on niche tasks like "write this story, except the characters are quadrupeds, and therefore are not human". Like Warriors (the book series, about cats), without having to go and spend weeks curating a dataset of books about non-human characters.
I'm sure that's so much of an ongoing area of research that it goes without saying.
> I dunno what llm you are using
I started with the RWKV family of models before realizing the amount of overfit is so critically unfunny that the model files aren't even on my computer anymore.
Anyway, the best I have found so far is Chronos-Hermes-13B. I believe that's a dedicated roleplaying model. I guess furry roleplays would make good training data, wouldn't it.
Chronos-Hermes-13B itself though is a mildly cursed/degenerate hybrid of two other models that don't really work together properly with the new GGML quantizations, and it's based on the old LLaMA-1 family of models, but I haven't found anything better yet.
> write this story, except the characters are quadrupeds, and therefore are not human
But the RP models should be able to get this with some prompt engineering. You may have to be repetitive in the instruction block, saying things like "...the characters are not human. All the characters have four feet. All the characters are quadruped animals..." and so on to really emphasize it to the LLM.
Honestly ERP models sound like they would be the best fit for this task, it's just hard to find one that's trained on quadrupeds rather than humans or even furries, if that makes any sense. I will try the repetitive method soon
There is a lot of effort put into those ERP models, lol. The training and datasets are actually really good, hence they are very good at the non-e RP part.
The eRP community was apparently going strong before Llama or Stable Diffusion were even a thing, using GPT-J finetunes and some even older base models. Llamav1 was like a Christmas present, and all the UIs and infrastructure were already set up to work on it.
That was a strong motivator for all the work on Stable Diffusion too.
Oh, yes, I do know that this scene is quite large. I still remember the drama of when AI Dungeon began cracking down on NSFW content due to model licensing (?), or the thermonuclear response that followed character.ai doing something vaguely similar.
It's more that I always pictured AI developers (especially ones that interact with models directly) as these very intelligent, selfless scientists - and not ordinary people who are okay with using these technologies for more "hedonistic" purposes, so to say. Nothing wrong with it obviously, and I'm really interested in seeing where this stuff is headed for in the future.
> I always have issues with LLMs completely forgetting where things are in a scene, or even what parts a given animal has, e.g. saying "hands" when the subject is a quadruped.
Sounds like you're using too small of a model. Try llama 70b.
I would love to throw the 3060 out and replace it with a 3090... once money permits. (It's only about $800 nowadays.)
But yes. I'm aware how laughably insane it is to run a 70b model that way. And that's why I was pointing it out to the commenter who suggested to just run a 70b model instead.
To a comment that suggested I try the 70b model, I replied "my card can't run that model". Someone replies back with "you may as well throw the card out if you're going to be trying to run that model". My point exactly.
More seriously, using all-CPU is not much faster as my computer only has 16GB of actual memory, which I'm aware is also hugely underspecced for a 70b model, even with memory mapping.
I have a nice NVMe SSD, so there's not much else for me to do here except upgrade my memory or graphics card.
Has anyone used or is currently using 7B models in a production or commercial product? How was the performance? What kind of tasks were you using it for? Was it practical to use the small 7B model for your specific use case, or did you switch to OpenAI models or 30-70B open source models?
I'm using a mix of 7B and 13B models that have been fine-tuned using LoRA for specific tasks and they work fantastically depending on the specific task at hand _after fine-tuning_. Generally they're kind of garbage in my experience without fine tuning but I haven't tested the base models directly for tasks besides the statistics at the beginning of the training run.
As for performance, I'm generally seeing 40-50 tokens/sec per model on a Tesla family Nvidia GPU but I keep multiple models loaded and active at a time so that estimate is probably a bit low for overall throughput (I also realized that our monitoring doesn't have any cumulative GPU token rate metrics just now thanks to this question hahah).
Interesting anecdote others may be interested in... I'm rate limiting the output from our streaming API to 8 tokens/sec to artificially smooth out front-end requests. Interactive users will wait and even prefer seeing the stream of the response, and non-interactive users tend to base their performance expectations on the what the streaming API does. It's kind of sneaky but I'm also artificially slowing down those API requests.
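The throttling trick is simple to reproduce; a toy sketch, where `token_stream` stands in for whatever generator your backend exposes:

```python
# Toy sketch: cap a token stream at ~8 tokens/sec no matter how fast
# the model actually generates. `token_stream` is any iterator of tokens.
import time

def throttle(token_stream, tokens_per_sec=8):
    interval = 1.0 / tokens_per_sec
    for token in token_stream:
        start = time.monotonic()
        yield token
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)

# usage: for tok in throttle(backend.stream(prompt)): send_to_client(tok)
```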
We're looking into fine-tuning and using 7B and 13B models and while we understand most of the mechanics we are somewhat overwhelmed by the amount of options available and are unsure where to start.
Do you recommend any open source frameworks for fine-tuning and running models?
Additionally, are you open to and available for consulting in this area?
I appreciate the offer but I'm a bit underwater with the amount that I have on my plate right now. We're using a custom solution in-house for all of our training and hosting and it can definitely be daunting to get that far.
I'm not sure how experienced you are in the field, but there are roughly two levels of fine-tuning. The first is full fine-tuning (updating all the weights of the model, which usually requires 2-3x the memory needed for inference); this allows you to change and update the knowledge contained inside the model.
If the model already has sufficient understanding of the content of the task and you want to change how it responds - such as producing a specific output format, "personality" or "flavor" of output, or having it already know the kind of task it's performing without including those details in the prompt - I would go with parameter-efficient fine-tuning.
If you're looking to do a one-off train for a model, you might be able to get away with doing it in something like this: https://github.com/oobabooga/text-generation-webui Very easy to use project but it really doesn't allow for the kind of metrics, analysis, or professional grade hosting you'll want.
vLLM can help with the hosting and is really solid once you have the models fine-tuned. We tried that at first, but its core architecture simply wouldn't work for what we were trying to do, which is why we went fully in-house.
Once you get into a lot of fine-tuning, you're probably going to want to do it directly in PyTorch or the equivalent for your language of choice. A good resource for seeing how people do this is actually the open-source models published on Hugging Face. Look for some LoRA models, or fine-tunes similar to what you'd like. A lot of people publish their training code and datasets on GitHub, which can be very useful references.
Right now I'd recommend llama2 as a base model for most general language model based tasks if you don't cross their commercial use threshold (which is very very generous).
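For the parameter-efficient route mentioned above, a minimal sketch using Hugging Face peft; the base model ID and LoRA hyperparameters are illustrative, not a recommendation:

```python
# Minimal sketch of a LoRA setup with Hugging Face peft.
# Base model ID and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections, a common choice
)
model = get_peft_model(base, config)
model.print_trainable_parameters()          # only the adapter weights are trainable
# ...then train with a normal PyTorch loop or the transformers Trainer.
```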
I can imagine, sounds like you're doing interesting and challenging stuff, best of luck. And yes thank you! You confirm some thoughts I had and clarify others. I appreciate it
It's not so much about preference but controlling our load and resource consumption right now. We're setting an easy threshold to meet consistently and the added delay allows us to imperceptibly handle things like crashes in Nvidia's drivers, live swapping of model and LoRA layers, etc.
(For clarification, the user preference mentioned in my original post is about interactive users preferring to see a stream of tokens coming in rather than waiting for the entire request to complete and having it show up all at once. The performance of that sets the expectation for the timing of non-interactive responses.)
I tried it a second time and it told me it's 1/3 of the speed of light, none of the llama2 models did this :(
---
AI
How can I help you today?
You
what is the speed of sound in a perfect vacuum?
AI
The speed of sound in a perfect vacuum is approximately 340.29 meters per second (1,117.85 feet per second), or about one-third the speed of light. In a real vacuum, the speed of sound would be much lower due to friction between molecules and other factors.
How difficult is it to fine-tune a model like this with specific domain knowledge? I am currently looking into gpt-3.5-turbo-instruct for this same purpose.
This is exceptionally meh. It reads like an excuse to investors.
A free 7B model is great; however, the practical implications of the potential adapters are near zero. You must be crazy, or have an easy use case (one that requires no LLM in the first place), if you genuinely believe that this model makes more sense per token than, say, ChatGPT.
Note that you must switch to instruct mode in the UI and use the "LLama 2 chat" preset, but you also need a <s> token in the instruction (memory) tab.
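For reference, the instruct-tuned variant expects prompts wrapped in an [INST] ... [/INST] template; the exact wording below is my assumption, so verify it against the official model card:

    # Sketch of the assumed <s>[INST] ... [/INST] prompt template for the instruct model;
    # check the exact format against the official model card before relying on it.
    def build_prompt(user_message: str) -> str:
        return f"<s>[INST] {user_message} [/INST]"

    print(build_prompt("What is the speed of sound in a perfect vacuum?"))
    # -> <s>[INST] What is the speed of sound in a perfect vacuum? [/INST]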
One aspect is getting the latest knowledge into the model. They could, given enough requests, provide a special subscription service where you get the latest model every N units of time.
Or, maybe a finetuned version for your particular dataset?
Of course I have no idea, just speculating
EDIT: I'm speculating that they might just be investing some marketing budget into this model, hoping it will capture a large enough target audience to upsell related services in the future.
Providing it as a service seems like the obvious answer to me. Not everyone who wants to run these models can, especially when looking to scale an application that uses the model.
I'm really not a fan of how these releases of model binaries are being referred to as open source. How do I make changes to this model? How do I integrate changes to this model that others have made?
The data pipeline is the source here. Just because it's not locked behind a SaaS veneer doesn't make it open source any more than Windows is.
Are they binaries? I haven't seen a binary in a while, tbh. Usually they're releasing both the raw architecture (i.e. code) and the weights of the models (i.e. what numbers go into what parts of the architecture). The latter is in a readable format that you can generally edit by hand if you wanted to. But even if it were a binary, as long as you have the architecture you can always load the model and decide whether you want to probe it (extract values) or modify it by tuning (there are many methods to do this).
As far as I'm concerned, realistically the only issue here is the standard issue around the open source definition. Does it mean the source is open as available or open as "do what the fuck you want"? I mean it's not like OpenAI is claiming that GPT is open sourced. It's just that Meta did and their source is definitely visible. Fwiw, they are the only major company to do so. Google doesn't open source: they, like OpenAI, use private datasets and private models. I'm more upset at __Open__AI and Google than I am about Meta. To me people are barking up the wrong tree here. (It also feels weird that Meta is the "good guy" here... relatively at least)
Edit: I downloaded their checkpoint. It is the standard "pth" file. This is perfectly readable, it is just a pickle file. I like to use graftr to view checkpoints, but other tools exist (https://github.com/lmnt-com/graftr)
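If you want to poke at it without extra tooling, something like this sketch will dump tensor names and shapes (the file name is assumed to match their release):

    # Sketch: list the tensors inside a pickled PyTorch checkpoint.
    # The file name is an assumption; the checkpoint is just a dict of named weight tensors.
    import torch

    state = torch.load("consolidated.00.pth", map_location="cpu")
    for name, tensor in state.items():
        print(name, tuple(tensor.shape), tensor.dtype)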
First off tokenizer.model in their release is absolutely a binary by any definition.
Secondly, is hand-editing the pickle file the way Mistral's engineers constructed this pickle file? Of course not. The automation that constructed that file (and tokenizer.model) is the source. Binaries in this context can certainly be ASCII-encoded files.
Just because other vendors are worse doesn't make this open source.
Not OP, but I don't get it: you can modify the tokenizer all you want and fine-tune the weights all you want. There's nothing inherently hidden behind a binary.
The question is whether I am provided the build source that constructed these files. Mistral did not hand-edit these files to construct them; there's source out there that built them.
Like, come on, a 14GB dump of mainly numbers that were constructed algorithmically is not "source".
> Like, come on, a 14GB dump of mainly numbers that were constructed algorithmically is not "source".
So if I take a photo of a pretty sunset, release it under MIT license, you'd say it's "not open source" unless I give you the sun and the atmosphere themselves?
These models are perfectly valid things in their own right; they can be fine-tuned or used as parts of other things.
For most of these LLMs (not sure about this one in particular yet) the energy cost in particular of recreation is more than most individuals earn in a lifetime, and the enormous data volume is such that the only people who seriously need this should be copyright lawyers and they should be asking for it to be delivered by station wagon.
I said "constructed algorithmically". Ie. I expect source to be at the level the engineers who built it generally worked at.
It's very nice that they released their build artifacts. It's great that you can take that and make small modifications to it. That doesn't make it open source.
> For most of these LLMs (not sure about this one in particular yet) the energy cost in particular of recreation is more than most individuals earn in a lifetime, and the enormous data volume is such that the only people who seriously need this should be copyright lawyers and they should be asking for it to be delivered by station wagon.
All of that just sounds like reasons why it's not practical to open source it, not reasons why this release was open source.
> I said "constructed algorithmically". Ie. I expect source to be at the level the engineers who built it generally worked at.
I could either point out that JPEG is an algorithm, or ask if you can recreate a sunset.
> All of that just sounds like reasons why it's not practical to open source it
No, they're reasons why the stuff you want doesn't matter.
If you can actually afford to create a model of your own, you don't need to ask: the entire internet is right there. Some of it even has explicitly friendly licensing terms.
An LLM with a friendly license is something you can freely integrate into other things which need friendly licensing. That's valuable all by itself.
You could train it from scratch on The Pile dataset[1] with a few hundred thousand bucks worth of GPU quota. It's not rocket science - the architecture is, and that's open source by your definition.
The graph of layers and ops isn't open source by my definition. It can be extracted from the model, but so can control graphs out of any binary. That's how higher end disassemblers work like IDA and ghidra.
Once again, this pickle file is not what's sitting in Mistral's engineer's editors as they go about their day.
Well the checkpoint __is__ the computational graph. The graph is also all the code. But if you want it in python... that's here[0].
Please be clear, we keep asking. What are you asking for? Datasets? Training algo? What?
Comparing it to software artifacts isn't a good comparison when any program with open source code (visible or free to use) is equivalent to what's being given. You have everything you need to use, edit, and fuck around with. You don't have the exact scheme, but I'll put it this way, if you gave me the hardware I could produce a LLM of high quality from scratch using their architecture.
That doesn't conflict with anything I've said. Yes, the checkpoint is code. It's not source code.
It's not what Mistral's engineers edit to create this release. Just like an ELF file necessarily contains the code flow graph, in a way extractable by experts, but isn't open source because... it's not source.
The permissiveness of the license with regards to use isn’t the crux of the argument.
The open source family of licenses are about freedom. If I’m not given the tools to recreate a model, then I’m not afforded the freedoms normally associated with these open licenses. Really there’s little difference between Apache and CC-BY here.
> So if I take a photo of a pretty sunset, release it under MIT license, you'd say it's "not open source" unless I give you the sun and the atmosphere themselves?
You've gotta give me the stuff you used to make it, the stuff you'd want to have if you wanted to recreate a slightly different version of the photo ("in the preferred form for making modifications", as the GPL says). If you just snapped a photo of whatever you saw with whatever camera was in your pocket, then there's nothing else to publish. But if you figured out a timetable of when you should stand where with what kind of lens, then making your photo open-source would mean publishing that timetable.
> These models are perfectly valid things in their own right; they can be fine-tuned or used as parts of other things.
If the original creator can edit them, and you can't, then that's not open-source; fine-tuning is a help but someone who can only fine-tune is still a second-class user compared to the original developer. The whole point of open source is to put you on an equal footing with the original developer (in particular, to make sure that you can fix bugs by yourself and are never stuck waiting for them to release an update that you need).
> So if I take a photo of a pretty sunset, release it under MIT license, you'd say it's "not open source" unless I give you the sun and the atmosphere themselves?
Photographs are not source code, are not computer code at all, and do not have a close analog to source code. Calling them "open source" is, at best, a poor and distant metaphor in any case, and so they aren't a useful model for discussing what open source means for software.
There's a very reasonable argument that model weights are an IL or object-code like artifact with training data and training source code together as the source code.
That doesn't change that the MIT license is an open source license, but when what you release under that license isn't the whole source applicable to the model, but just inference and maybe training code without the data needed to produce the weights, plus the final weights, then it is fair to question whether the model as a whole is open source.
> So if I take a photo of a pretty sunset, release it under MIT license, you'd say it's "not open source" unless I give you the sun and the atmosphere themselves?
Open source as a concept doesn't really apply to quite a lot of things. Your MIT-licensed photograph is "not open source" in the same way that `{} * {}` is "not a number" (it technically isn't, but that's not quite what NaN is supposed to mean).
> a 14GB dump of mainly numbers that were constructed algorithmically is not "source".
I'm sorry, but what do you expect? Literally all code is "a bunch of numbers" when you get down to it. Realistically we're just talking about if the code/data is 1) able to be read through common tools and common formats and 2) can we edit, explore, and investigate it. The answer to both these questions is yes. Any parametric mathematical model is defined by its weights as well as its computational graph. They certainly provide both of these.
What are we missing? The only thing that is missing here is the training data. That means of course that you could not reproduce the results even if you also had tens of thousands to millions of dollars to do so. If you're complaining about that, then I agree, but this is very different from what you've said above. They shouldn't have to provide the dataset, but they should be at least telling us what they used and how they used it. I would agree that it's not full "open source" when the datasets are unknown and/or unavailable (for all intents and purposes, identical). The "recipe" is missing, yes, but this is very different from what you're saying. So if there's miscommunication then let's communicate better instead of getting upset at one another. Because 14G of a bunch of algorithmically constructed numbers and a few text files is definitely all you need to use, edit, and/or modify the work.
Edit: I should also add that they don't provide any training details. This model is __difficult__ to reproduce. Not impossible, but definitely would be difficult. (within some epsilon, because models are not trained in deterministic manners, so training something in identical ways twice usually ends up with different results)
> I'm sorry, but what do you expect? Literally all code is "a bunch of numbers" when you get down to it. Realistically we're just talking about if the code/data is 1) able to be read through common tools and common formats and 2) can we edit, explore, and investigate it. The answer to both these questions is yes. Any parametric mathematical model is defined by its weights as well as its computational graph. They certainly provide both of these.
I expect that if you call a release "open source", it's, you know, source. That their engineers used to build the release. What Mistral's engineers edit and collate as their day job.
> The "recipe" is missing, yes, but this is very different from what you're saying.
The "recipe" is what we generally call source.
> So if there's miscommunication then let's communicate better instead of getting upset at one another.
Who's getting upset here? I'm simply calling for not diluting a term. A free, permissive, binary release is great. It's just not open source.
> Because 14G of a bunch of algorithmically constructed numbers and a few text files is definitely all you need to use, edit, and/or modify the work.
Just like my Windows install ISO from when they were giving Windows licenses away for free.
Not really. At least not in normal software. The recipe is honestly only really interesting to researchers (like me). But for building and production stuff, you have everything you need.
> Just like my Windows install ISO from when they were giving Windows licenses away for free.
I repeat, a free windows ISO doesn't have an Apache license attached. This is an inane comparison.
Yes, in normal software, the 'recipe' used to create the build artifacts is the source.
> The recipe is honestly only really interesting to researchers (like me). But for building and production stuff, you have everything you need.
Making excuses about why it's not useful to a lot of people, and arguing that you don't actually need it to use the model in production, is exactly the argument made for why not open-sourcing is OK. That doesn't mean this is a source release, open or otherwise.
> I repeat, a free windows ISO doesn't have an Apache license attached. This is an inane comparison.
Even if the Windows binary iso was released with an Apache license, it wouldn't be open source since no actual source was released. That's the point of that line of argument.
Like, if someone only gave me binaries of Apache and said it's open source, you wouldn't agree with them, because you can't practically modify it. Open source for a model would be the exact same: you would need to be able to run the same build process. A model is literally the same as an EXE or ELF binary; it's a "probabilistic" program, but it's still a program.
This is not a novel discussion, and you are not being smart by trying to relativize it, just obtuse. Here is what the GPL has said about source for some 30+ years:
> Source code for a work means the preferred form of the work for making modifications to it.
The whole point of machine learning is deriving an algorithm from data. This is the algorithm they derived. It's open source. You can use it or change it. Having the data that was used to derive it is not relevant.
But the source to train your own LLM equivalent is also released though (minus the data). Hence why there are so many variants of LLaMa. You also can’t fine tune it without the original model structure. The weights give the community a starting point so they don’t need literally millions of dollar worth of compute power to get to the same step.
> Would Mistral's engineers be satisfied with the release if they had to rebuild from scratch?
Yeah, probably. But depends on what you're asking. In the exact same method to get the exact same results down to epsilon error? (Again, ML models are not deterministic) Probably not. This honestly can even change with a different version of pytorch, but yes, knowing the HPs would help get closer.
But to train another 7B model of the __exact__ same architecture? Yeah, definitely they've provided all you need for that. You can take this model and train it from scratch on any data you want and train it in any way you want.
I didn't ask if they'd be able to make do. I asked if they'd be satisfied.
Also, wrt
> Again, ML models are not deterministic
ML models are absolutely deterministic if you have the discipline to do so (which is necessary at higher scale ML work when hardware is stochastically flaky).
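For what it's worth, here is a rough sketch of the single-device knobs involved in PyTorch; distributed training needs more discipline on top of this (fixed reduction order, pinned library versions, and so on):

    # Sketch: pinning down sources of randomness for a single-device PyTorch run.
    import os
    import random
    import numpy as np
    import torch

    seed = 1234
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed by some deterministic cuBLAS kernels
    torch.use_deterministic_algorithms(True)           # raise if a non-deterministic op is hit
    torch.backends.cudnn.benchmark = False             # autotuning can pick different kernels per run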
But they built a llama equivalent plus some enhancements that give better performance… I'm not sure this would be possible at all without Meta releasing all the required code and the paper for LLaMa to begin with.
"Open source" normally contains what the engineers working on it edit to submit a job to build into the build artifacts. This does not contain that, but instead the result of the build process.
That's because you're failing to differentiate between a build process, meaning compiling your stuff into an end result, and fine-tuning an algorithm to refine code.
I regard the distinction as meaningless and haven't heard a good reason as why I should reconsider when the process behind creating the weights is so integral to the overall engineering process here.
Well fortunately that’s just your opinion and it is neither popular nor relevant.
To quote stallman
> The "source code" for a work means the preferred form of the work for making modifications to it
That’s what this is. This is not a work that produces weights. So the code related to making weights would not be the source code of this project. It would be the source code of the tools used to make this project.
That quote is specifically designed to back up my position, hence why I've quoted it elsewhere in this thread.
This is evidenced by the fact that if Mistral's engineers had to make modifications to this model, they would use other code to do so. This model is not the "preferred form of the work for making modifications to it", but simply the best we have in a world that isn't used to open-sourcing these models.
> This is evidenced by the fact that if Mistral's engineers had to make modifications to this model, they would use other code to do so.
It’s a PyTorch model. There is no secret internal code required to interact with it. You can write whatever you want. Unless you’re implying that in order to call this model open source they need to get all of PyTorch on the same license. You can write whatever you want to edit the weights.
Open source does not require babysitting people through the basic competency steps of understanding how the tools of this domain work.
This model is 100% the preferred form of work for making modifications to it as evidenced by the large community oriented around sharing and adapting models from exactly this format.
And no, I’m calling you for changing your argument again. You were saying dumb shit about how they needed to open source all the data needed to train this thing from scratch, which is another wildly different argument.
I could kind of see things either way. Is this like not providing the source code, or is it like not providing the IDE, debugger, compiler, and linter that was used to write the source code? (Also, it feels a bit "looking a gift horse in the mouth" to criticize people who are giving away a cutting-edge model that can be used freely.)
> I could kind of see things either way. Is this like not providing the source code, or is it like not providing the IDE, debugger, compiler, and linter that was used to write the source code?
Do the engineers that made this hand edit this file? Or did they have other source that they used and this is the build product?
> (Also, it feels a bit "looking a gift horse in the mouth" to criticize people who are giving away a cutting-edge model that can be used freely.)
Windows was free for a year. Did that make it open source?
> Do the engineers that made this hand edit this file? Or did they have other source that they used and this is the build product?
Does any open source product provide all the tools used to make the software? I haven't seen the Linux kernel included in any other open source product, and that'd quite frankly be insane. The same goes for including vim/emacs, gcc, gdb, X11, etc.
But I do agree that training data is more important than those things. But you need to be clear about that because people aren't understanding what you're getting at. Don't get mad, refine your communication.
> Windows was free for a year. Did that make it open source?
Windows didn't attach an Apache-2.0 license to it. This license makes this version of the code perpetually open source. They can change the license later, but it will not back apply to previous versions. Sorry, but this is just a terrible comparison. Free isn't what makes a thing "open source." Which let's be clear, is a fuzzy definition too.
What I'm asking for is pretty clear. The snapshot of code and data the engineers have checked into their repos (including data repositories) that were processed into this binary release.
> This license makes this version of the code perpetually open source.
It doesn't because they didn't release the source.
There's nothing stopping me from attaching an Apache 2 license to a shared library I never give the source out to. That also would not be an open source release. There has to be actual source involved.
> You’re welcome to fuck around and find out. Go release llama2 under Apache 2. You’re saying that’s fine right?
You're missing my point. Obviously you can't release someone else's IP under whatever license you see fit.
You can release your own binary under Apache 2. Doing so without releasing the source doesn't make it open source despite being an open source license.
> The answer to your question is that code stored as a binary is not different from code stored as text. Pickled models are code.
I'm not saying it's not code; I'm saying it's not source.
The data used to derive this model is not different from the brain and worldly observations and learnings of the engineers, which are not part of any open source materials.
What are you talking about, "the brain" of the engineers? This is bonkers. Monocasa is being excruciatingly patient with you all, but the fact is this was generated with tools and is not a source release; it's a final product, a compiled or generated release.
Code generated with tools is still code. This code is the source. The output of the code is the output. Monocasa is failing to understand or perhaps intentionally not understanding the difference. In some contexts a “compiled release” implies an output that is largely immutable for practical purposes. That is not what this is. It’s technically a binary object, but it’s a binary object you can easily unpack to get executable code that you can read and edit. It is a convenient format different from classical text code. The fact that it’s a binary is completely irrelevant. It’s akin to arguing that code that is provided in a zip file cannot be open source. Both because it’s a compressed file and because it doesn’t include the compression algorithm.
With that understood, demanding the “tools” that were used to create the code is like asking for the engineers’ notebooks of design thoughts along the way. It has no bearing on your ability to use or modify it. This is not an open source project to make neural nets. This an open source project of a neural net.
If someone releases math_funcs.py, you don’t need anything about the tools that were used to create math_funcs.py to consider it open source.
So if you use tools that generate some boilerplate code as part of your project you need to include the boilerplate generator otherwise it’s not open source?
Why are you being so obtuse? No, devs don't have to include the source to vim in their repos. They have to include the source files for their product in their repos. I'm confident this just isn't that hard to understand.
> Does any open source product provide all the tools used to make the software? I haven't seen the Linux kernel included in any other open source product, and that'd quite frankly be insane. The same goes for including vim/emacs, gcc, gdb, X11, etc.
BSD traditionally comes as a full set of source for the whole OS, it's hardly insane.
But the point is you don't need those things to work on Linux - you can use your own preferred editor, compiler, debugger, ... - and you can work on things that aren't Linux with those things. Calling something "open source" if you can only work on it with proprietary tools would be very dubious (admittedly some people do), and calling a project open source when the missing piece you need to work on it is not a general-purpose tool at all but a component that's only used for building this project is an outright falsehood.
But what's proprietary here? That's what I'm not getting from the other person. You have the algorithm. Hell, they even provided the model in pytorch/python. They just didn't provide training parameters and data. But that's not necessary to use or modify the software just like it isn't necessary for nearly any other open sourced project. I mean we aren't calling PyTorch "not open source" because they didn't provide source code for vim and VS code. That's what I'm saying. Because at that point I'm not sure what's the difference between saying "It's not open source unless you provide at least one node of H100 machines." That's what you kinda need to train this stuff.
> But what's proprietary here? That's what I'm not getting from the other person. You have the algorithm. Hell, they even provided the model in pytorch/python. They just didn't provide training parameters and data. But that's not necessary to use or modify the software just like it isn't necessary for nearly any other open sourced project.
It's necessary if you want to rebuild the weights/factors/whatever the current terminology is, which are a major part of what they're shipping. If they found a major bug in this release, the fix might involve re-running the training process, and currently that's something that they can do and we users can't.
> I mean we aren't calling PyTorch "not open source" because they didn't provide source code for vim and VS code.
You can build the exact same PyTorch by using emacs, or notepad, or what have you, and those are standard tools that you can find all over the place and use for all sorts of things. If you want to fix a bug in PyTorch, you can edit it with any editor you like, re-run the build process, and be confident that the only thing that changed is the thing you changed.
You can't rebuild this model without their training parameters and data. Like maybe you could run the same process with an off-the-shelf training dataset, but you'd get a very different result from the thing that they've released - the whole point of the thing they've released is that it has the weights that they've "compiled" through this training process. If you've built a system on top of this model, and you want to fix a bug in it, that's not going to be good enough - without having access to the same training dataset, there's no way for you to produce "this model, but with this particular problem fixed".
(And sure, maybe you could try to work around with finetuning, or manually patch the binary weights, but that's similar to how people will patch binaries to fix bugs in proprietary software - yes it's possible, but the point of open source is to make it easier)
Is this code "open source?"[0] It is under a MIT license, has the training scripts, all the data is highly available, etc. But to the best of my knowledge no one has reproduced their results. These people sure couldn't[1] and I'm not aware of any existing work which did. This honestly is quite common in ML and is quite frustrating as a researcher, especially when you get a round of reviewers who think benchmarks are the only thing that matters (I literally got a work rejected twice with a major complaint being about how my implementation didn't beat [0] despite it beating [1]. My paper wasn't even on architecture... so we weren't even trying to improve the SOTA...).
As a researcher I want to know the HPs and datasets used, but they honestly aren't that important for usage. You're right that to "debug" them one method would be to retrain from scratch. But more likely is doing tuning, reinforcement learning, or using a LoRA. Even the company engineers would look at those routes before they looked at retraining from scratch. Most of the NLP research world is using pretrained models these days (I don't like this tbh, but that's a different discussion all together). Only a handful of companies are actually training models. And I mean companies, I don't mean academics. Academics don't have the resources (unless partnering), and without digressing too much, the benchmarkism is severely limiting the ability for academics to be academics. Models are insanely hard to evaluate, especially after RLHF'd to all hell.
> (And sure, maybe you could try to work around with finetuning, or manually patch the binary weights, but that's similar to how people will patch binaries to fix bugs in proprietary software - yes it's possible, but the point of open source is to make it easier)
The truth is that this is how most ML refinement is happening these days. If you want better refinement we have to have that other discussion.
> Is this code "open source?"[0] It is under a MIT license, has the training scripts, all the data is highly available, etc. But to the best of my knowledge no one has reproduced their results. These people sure couldn't[1] and I'm not aware of any existing work which did.
I don't know about ML specifically, but I've seen a number of projects where people publish supposedly "the source" for something and it doesn't actually build. IMO if they're doing it wilfully that makes not open source, whereas if it's just good-faith legitimate incompetence then it can be.
(My litmus test would be: are they giving you all the stuff they'd give to a new hire/assistant working with them? If they've got a "how to build" on their internal wiki with a bunch of steps they're keeping secret, then it's not open-source. But if the process for a new hire is to hand over a code dump and say "huh, it works on my machine, I don't remember what I did to set it up", then at that point I'd consider it open source. I think this aligns with the "preferred form for making modifications" idea in the licenses).
> But more likely is doing tuning, reinforcement learning, or using a LoRA. Even the company engineers would look at those routes before they looked at retraining from scratch.
Sure. But they'd have that capability in their back pocket for if they needed it. It's a similar story for e.g. parts of the Linux kernel code that are generated via a perl script based on the architecture documentation - you only actually re-run that perl script once in a blue moon, but it's important that they publish the perl script and not just the C that was generated by it.
Build scripts are not required for open source. Usually they are provided, because nobody actually wants to maintain them separately, but they are not actually part of the project itself. Often times it's just a few gnu scripts, sometimes there's parts of it missing (because they're reused from another project, or they have secrets in them that the maintainer can't be bothered to remove, or other reasons), and rarely the build script is an entire project itself, and even more rarely there's nothing there at all except a single file of source code that can't be built alone (I've seen this in particular on several old golang projects, and it's incredibly annoying).
I'm not asking for the engineers brains, I'm asking for more or less what's sitting in the IDE as they work on the project.
Robert has provided that there. Mistral has not.
As an aside, I'm more than capable of editing that code; I've professionally worked on FPGA code and have written a PS1 emulator. Taking that (wonderful looking code) and say, fixing a bug, adding a different interface for the cdrom, porting it to a new FPGA are all things I'm more than capable of.
No, but if the Windows binary code was made available with no restrictive licensing, I'd be quite happy, and the WINE devs would be ecstatic. Sure, the source code and build infrastructure would be nicer, but we could still work with that.
'gary_0' being happy with the license terms isn't what defines 'open source'.
I'm fairly happy with the license terms too. They're just not open source. We dilute the term open source for the worst if we allow it to apply to build artifacts for some reason.
We were talking about "looking a gift horse in the mouth", as in it's still a positive thing regardless of the semantic quibbles about open source. Nobody would argue that a hypothetical openly licensed Windows binary-only release is "open source" and I'd appreciate it if you read my comments more charitably in future.
Source code licenses are naturally quite clear about what constitutes "source code", but things are murkier when it comes to ML models, training data, and associated software infrastructure, which brings up some interesting questions.
> We were talking about "looking a gift horse in the mouth", as in it's still a positive thing regardless of the semantic quibbles about open source
Your gift-horse-in-the-mouth comment was visibly an aside in the greater discussion, being enclosed in parentheses.
> Nobody would argue that a hypothetical openly licensed Windows binary-only release is "open source" and I'd appreciate it if you read my comments more charitably in future.
That's why I'm using it as an example metaphor in my favor. It's clearly not open source even if they released it under Apache 2. It's not what their engineers edit before building it.
> Source code licenses are naturally quite clear about what constitutes "source code", but things are murkier when it comes to ML models, training data, and associated software infrastructure, which brings up some interesting questions.
I don't think they're all that murky here. The generally accepted definition being
> The “source code” for a work means the preferred form of the work for making modifications to it. “Object code” means any non-source form of a work.
Is this the form of the work that Mistral's engineers work in? Or is there another form of the work that they do their job in and used to build these set of files that they're releasing?
I'd actually say that including the training data would be like providing the IDE/debugger/compiler rather than the model/checkpoint being analogous. If I hand you Signal's source code you can run it, use it, modify it, etc. All similar characteristics to what is provided here. What they didn't provide to us is how they created that code. You couldn't create that software from scratch by just having these and that's true for any open source project. But I wouldn't say training data is as good as an analogy to peering in the minds of engineers, because it is an important part to getting the final product and analyzing it.
If that's what's needed to work at the level their engineers work on the model.
Which is true of traditional software as well. You don't get to call your binary open source just because you have licensed materials in there you can't release.
Not to be the devil's advocate here, but it can certainly be the case that data was used to define heuristics (potentially using automated statistical methods) that an engineer then formalized as code. Without that data, that specific heuristic wouldn't exist, at least very likely not in that form. Yet that data does not have to be included in any open source release. And obviously you, as a recipient of the release, can modify the heuristic (or at least the version that was codified), but you cannot reconstruct it from the original data.
I know my example is not exactly what is happening here, but the two sound pretty similar to me, and there seems to be a fairly blurry line dividing them... so I would argue that where "this must be included in an open source release" ends and "this does not need to be included in an open source release" begins is not always so cut and dried.
(A variant of this, that happens fairly frequently, is when you find a commit that says something along the lines of "this change was made because it made an internal, non-public workload X% faster"; if the data that measurement is based upon did not exist, or if the workload itself didn't exist, that change wouldn't have been made, or maybe it would have been made differently... so again you end up with logic due to data that is not in the open source release)
If we want to go one step further, we could even ask: what about static assets (e.g. images, photographs, other datasets, etc.) included in a open-source release... maybe I'm dead wrong here, but I have never heard that such assets must themselves be "reproducible from source" (what even is, in this context, the "source" of a photograph?).
That being said, I sure wish the training data used for all of these models was available to everyone...
We just also shouldn't call releases with no source "open source".
I wouldn't really have a complaint with their source being released as Apache 2. I just don't want the term "open source" diluted to including just a release of build artifacts.
It literally takes 20 minutes on my PC to prepare a multilingual corpus and train a BPE + SentencePiece tokenizer with fairseq. You have all the documentation there. That you don't know how to use these tools doesn't mean they aren't there. You literally do not need to edit anything.
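For anyone who wants to try it, here is a minimal sketch using the sentencepiece library directly rather than going through fairseq; the corpus path and vocab size are placeholders:

    # Sketch: train a BPE tokenizer with SentencePiece. Corpus path and vocab size are placeholders.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input="corpus.txt",        # one sentence per line, any mix of languages
        model_prefix="tokenizer",  # writes tokenizer.model and tokenizer.vocab
        vocab_size=32000,
        model_type="bpe",
    )

    sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
    print(sp.encode("The quick brown fox", out_type=str))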
I guarantee you there's automation around training this model. There's also the factor of the dataset itself.
And it doesn't matter much whether it's perfectly deterministic. Source builds of traditional software aren't typically fully reproducible either. That doesn't change what counts as source.
And I give you better than coin-flip odds that it actually is deterministic. The engineers at the big ML shops I've had conversations with have been doing deterministic training for quite some time; they believe it's key to training at scale. That's what lets you distinguish "this model went way off the deep end because of something we did to the model" from "a training GPU is on the fritz".
> The “source code” for a work means the preferred form of the work for making modifications to it.
That's the definition in the GPL. That it is text or binary doesn't matter.
So are the weights the preferred form for making modifications? Partly yes, because of fine-tuning, but also no, because you are limited in what you can do with fine-tuning. If Mistral had to make major changes to their model, they would probably start from the dataset and code that they have but you don't, i.e. what created the weights file.
So I wouldn't call it "open source", just "open". You can do whatever you want with what you have, but you don't have the same abilities as Mistral to modify the model because you lack some data.
Still, it is a bit of an unusual situation since even with the "real sources", i.e. training data and code, most people wouldn't have the resources to retrain the model, and a big part of the value in these models is the computing resources that were invested in training them.
There's literally a list of opcodes to be executed in the model. There's a whole lot of data too, but that's part of the build just as much as anything in a .data section.
Forgive my ignorance, I haven't studied the AI tooling landscape yet. Are you saying these models have a structured binary format and "running" them is just a matter of having a "player" with the right "codec"?
They are basically a series of intermediate bytecodes to be compiled to the hardware they actually run on, in addition to the large tables of weights that bytecode references.
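To make the player/codec picture concrete, here is a toy sketch (nothing like the actual Mistral graph): the weights are plain data tables, and "running the model" is just executing a fixed sequence of ops over them.

    # Toy sketch of "weights + a graph of ops". This is not the Mistral architecture,
    # just an illustration that the weights are data which a small amount of code executes.
    import torch

    weights = {
        "fc1.weight": torch.randn(16, 8),
        "fc1.bias": torch.zeros(16),
        "fc2.weight": torch.randn(4, 16),
        "fc2.bias": torch.zeros(4),
    }

    def forward(x: torch.Tensor, w: dict) -> torch.Tensor:
        # the "codec": a fixed graph of ops interpreting the weight tables
        h = torch.relu(x @ w["fc1.weight"].T + w["fc1.bias"])
        return h @ w["fc2.weight"].T + w["fc2.bias"]

    print(forward(torch.randn(1, 8), weights).shape)  # torch.Size([1, 4])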
Just a few billion weights that no person created or understands, which are based on their input data rather than on design decisions, and which include random initialization. I see no reason to treat them any differently than software in a conversation about OSS.
Model inputs are definitely design decisions: what data to use, how to fine-tune it, what methods were used in weighting things. This is LITERALLY the source of the model. The model is a binary, like an EXE.
You want to see the source code of the transformer that created the compiled llm model, the dictionary corresponding to token values, etc. Instead, you got the weights and binary.
I don't know what's going on in this discussion thread, as to whether you are being gaslit or engaging with people who shouldn't comment.
Right, my interpretation (and I think monocasa's too) is that the training and packaging pipeline is not published therefore the code used to build the model is not open, therefore there is no source available to rebuild the model as Mistral's engineers have done.
In essence the output of the training pipeline is "data" (weights), not code. The way those data are read to make things work is the "model". In the (bad) analogy of the JPEG, the data is the coefficients and the model is the encoding/decoding scheme.
I'm impressed by the patience monocasa has in the face of so many (seemingly intentionally) obtuse responses.
It's remarkably telling that no one who's pushing back has any answer to this line of reasoning.
I wonder how they might benefit from retconning the notion of open source to fit the ways they're describing models now. At this point it seems bad faith in more than just a naive way.
The reason is because "open source" is a vague enough concept anyone can project what they want onto it. Being able to fully replicate the construction of the model is one lens on it, but being able to modify it to suit your own needs is another.
Given that a released base model allows things like fine tuning, generating your own training sets, and so on, it's not obviously closed source in the way the poster is acting like it is. Releasing the full training data and reproduction pipeline would be "more" open source, but having the raw, non RLHF'ed model is extremely open compared to most alternatives beyond that.
"Some people are right but if you tilt your head and squint you can almost understand the argument from the others"
I don't know how to respond to these lines of reasoning anymore except to say that words mean things and it's worth keeping definitions consistent for the sake of our common understanding.
Machine learning models force us to reconsider the definitions or introduce new ones. A published model has characteristics of open source and closed source software, given the extent to which it can be fundamentally modified, inspected, and used to generate derivative works in a way executable binaries cannot.
If a company makes a product that involves a code generator, and publishes the generated code, but not the code generator itself, it’s still open source.
They trained it from scratch, but there is nothing stopping you from doing some more training on top of it yourself. Nothing is obfuscated, no more information is required beyond the prompting syntax... they just released basically no documentation, which is unfortunately kinda standard in AI land.
> The dataset and pipeline to rebuild these weights are not included.
This is a good point.
But it's probably not deterministic and reproducible. I bet the researchers themselves couldn't remake the exact weights (or even similar weights) from scratch.
I mean most binaries aren't reproducible. That takes a lot of work and isn't worth it most of the time.
However, I've heard from ML engineers at the big shops that hermetic, reproducible builds are key to making any progress at scale with ML. Apparently that goes all the way back to when Jeff Dean took over Google Brain.
Yep. Even if the initial seed for parameter init, the example shuffling seed, etc were constant, the distributed nature of training (and syncing the model across machines) would kill reproducibility. Not to mention resuming from checkpoints after gradient explosions, etc.
I've heard from ML engineers at larger shops that reproducibility is key to working at scale. That's how you track down "this training regime went to shit because of something we changed" versus "this training regime went to shit because one of the GPUs training it is starting to fail".
The source code to the engine is available and open, but if you want to do anything useful or recreate the original game, you have to provide your own data.
This model is open source, much in a similar way that the Doom engine is open source.
> The source code to the engine is available and open, but if you want to do anything useful or recreate the original game, you have to provide your own data.
The Doom executable is open source. The data is not. Explicitly, the data is under a different license than the GPL and makes no claims about being open source. There are total conversion mods that replace the closed source levels with open source ones however.
> This model is open source, much in a similar way that the Doom engine is open source.
The code I linked is the model source -- the structural skeleton. You can create a model with this source and put whatever weights you want in it -- just like Doom.
If you create a model with this code and save it to disk, it will indeed be 14GB.
It will be random weights -- the contents won't be there -- but it will be 14GB.
If you use only what is open-source about Doom, you will have an unplayable game.
If you use only what is open-source about Mistral, you will have an ineffective model.
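As a concrete sketch of that point: instantiating the architecture from a config alone gives you the full-size skeleton with random weights. The config values below are illustrative Llama-style settings, not Mistral's exact ones, and a 7B-scale model needs roughly 28 GB of RAM in float32:

    # Sketch: the model "skeleton" built from config alone is full size but randomly
    # initialized. Config values are illustrative, not the exact Mistral 7B settings.
    from transformers import LlamaConfig, LlamaForCausalLM

    config = LlamaConfig(
        vocab_size=32000,
        hidden_size=4096,
        intermediate_size=14336,
        num_hidden_layers=32,
        num_attention_heads=32,
    )
    model = LlamaForCausalLM(config)  # random weights: all structure, none of the learned contents
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e9:.1f}B parameters")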
Have you tried the HF version which is linked on the main page? It is transformer based after all and it is converted to HF format. Just use QLoRA to fine-tune whatever you want on the top of that model. They handed you hundreds of GPU hours, without asking anything in return. You may throw it away and train the model from scratch. Be ready to throw $70k-$150k into air warming.
> Have you tried the HF version which is linked on the main page? It is transformer based after all and it is converted to HF format. Just use QLoRA to fine-tune whatever you want on the top of that model.
Is that how their engineers built this model in the first place?
Just because they're giving away the build artifact of their hard work doesn't make it open source.
> Be ready to throw $70k-$150k into air warming.
Just because builds are expensive, doesn't mean releasing the build artifacts counts as open source.
You don't iterate on the model weights, you train them on billions of tokens. There was no real iteration, you set it up and wait for the GPUs to do the work. The model architecture itself is done before you start training. If you change it, you generally have to start training from scratch.
You do get the ability to further train the model using fine-tuning or LoRAs.
There's absolutely state space iteration in model training. Layer sizes, composition, construction.
> There was no real iteration, you set it up and wait for the GPUs to do the work. The model architecture itself is done before you start training. If you change it, you generally have to start training from scratch.
That's like saying there's no design iteration in software because you type 'make' and the executable is built.
The point is that iteration is before training. You don't iterate post starting the training and presumably this person doesn't want to train the model themselves. If they do, they can do that but they need to get the training data and a huge amount of GPUs.
Recent models may not be fully open source, but could you point me to one that's subject to the same amount of scrutiny as Windows source code access? Because I'm not sure if there is one out there.
Microsoft has other mechanisms for sharing source and is remarkably open. The symbol servers are relatively public as well. They are at a scale where they rely more on legal strength to keep their IP rather than it not being externally accessible in the first place, at least for Windows.
So some of us just arrived back on Earth after quite a while in outer space.
So ... WTF is Mistral 7B anyway? The article doesn't appear to show this basic information anywhere. You'd expect a website like this to give that information right up front, or at least in a well-signposted sub-page. And what sort of things could I use it for?
The only 'Mistral' that sprang quickly to mind for me was an electric fan. Followed by the French wind itself, and then the French ships built for the Russians.
'Mistral 7B' has an identity crisis, to start with.
"Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.
Mistral 7B is a 7.3B parameter model that...."
So, it's a powerful language model that has 7.3B (billion) parameters.
If you've been away long enough to not know what that means, this release article isn't going to be for you anyway, and so you may want to start with reading up on what a Large Language Model is and can do, how they have developed in the last few years, and then go from there.
These days 7B, 13B, 70B etc. in an article title can only mean the release of a new LLM (Large Language Model) with that many (7 billion, in this case) parameters. Mistral is the name the creators have given it.