Hacker News
Stanford Alpaca web demo suspended “until further notice” (ngrok.io)
98 points by wsgeorge on March 17, 2023 | 78 comments



Is this an indication that the biggest impact from LLMs will be on the edge?

It's almost a certainty that a model as good as (or better than) Alpaca's fine-tuned LLaMA 7B will be made public within the next month or two.

And it's been shown that a model of that size can run on a Raspberry Pi with decent performance and accuracy.

With all that being the case, you could either use a service (with restrictions, censorship, etc) or you could use your own model locally (which may have a license that is essentially "pretty please be good, we're not liable if you're bad").

For most use cases the service may provide better results. But if self-hosting is only ~8 months behind on average (guesstimate), then why not just always self-host?

You could say "most users are not evil, and will be happy with a service." Makes sense. But what about users who are privacy-conscious, and don't want every query sent to a service?


I just saw a project that lets you input an entire repo into GPT. Coincidentally, my place of employment just told us not to input any proprietary code into any generator with a retention policy.

Even then, I feel like the play will be an enterprise service instead of licensing.


If it's the product I think it is (I don't recall the exact name), it's not putting the repo into GPT. It's calculating embeddings on the code in the repo, storing those in a vector db, and providing context from the store when processing questions about the repo. Effectively, asking "how does foo work" becomes: 1. look up code items related to foo, getting 1-N snippets of code; 2. ask GPT "here is code related to foo: <result from 1>. Now answer the following question: how does foo work".
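
For concreteness, here's a minimal sketch of that retrieval flow, assuming the 2023-era openai Python client and a toy in-memory "vector db"; the snippets, chunking, and model names are illustrative, not whatever the actual product does:

    # Sketch of embeddings-based code Q&A (assumes OPENAI_API_KEY is set in the env).
    import numpy as np
    import openai

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # 1. Index: embed each code chunk from the repo into an in-memory "vector db".
    snippets = ["def foo(x): ...", "class FooService: ..."]  # toy stand-ins for repo chunks
    index = [(s, embed(s)) for s in snippets]

    def ask(question, k=3):
        q = embed(question)
        # Rank chunks by cosine similarity to the question, highest first.
        ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
        context = "\n\n".join(s for s, _ in ranked[:k])
        # 2. Ask GPT with the retrieved code supplied as context.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f"Here is code related to the question:\n{context}\n\n"
                                  f"Now answer the following question: {question}"}])
        return resp["choices"][0]["message"]["content"]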


I think we’re talking about different projects. This one just gives you a text output of an entire repo.

https://github.com/mpoon/gpt-repository-loader


I was thinking about https://github.com/rahuldan/codesearch for context.


Link?



I think it's only a matter of time until one of these models gets into the hands of a hacker who wants unfettered ability to interact with a LLM without moral concerns.

I think there's tremendous value in end user facing LLMs being trained against moral policies, but for internal or private usage, if these models are trained on essentially raw WWW sourced data, I would personally want raw output.

I'm also finding it particularly interesting to see what ethical strategies OpenAI comes up with considering that if you train a model on the raw prejudices of humanity, you're getting at least one category of "garbage in" that requires a lot of processing to avoid getting "garbage out."


> into the hands of a hacker

Forget “hackers”, think government agencies. Which is probably already happening right now.

Food for thought: What’s the intersection of people closely related to OpenAI and Palantir?

Edit: related thread on another front page post - https://news.ycombinator.com/item?id=35201992


> if these models are trained on essentially raw WWW sourced data, I would personally want raw output.

LLaMA is a very high-quality foundation LLM; you can already run it very easily using llama.cpp and get the raw output you need. https://github.com/ggerganov/llama.cpp

There are already instructions on how anyone can fine-tune it to behave similarly to ChatGPT for as little as $100: https://crfm.stanford.edu/2023/03/13/alpaca.html
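
For what it's worth, here's a rough sketch of driving a locally built llama.cpp binary from Python; the paths, model filename, and flags are assumptions based on the March 2023 README and may differ between versions:

    # Sketch: call the llama.cpp `main` binary for raw, unfiltered completions.
    # Assumes the repo is built and the 7B weights are converted/quantized per its README.
    import subprocess

    result = subprocess.run(
        ["./main",
         "-m", "./models/7B/ggml-model-q4_0.bin",  # 4-bit quantized LLaMA 7B weights
         "-p", "Building a website can be done in 10 simple steps:",  # prompt
         "-n", "128"],                             # number of tokens to generate
        capture_output=True, text=True)
    print(result.stdout)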


“Easily” was a minor canard, or at least… it took me a couple of efforts over a couple of days to get the dependencies to play as nicely as “someone with a brand new M2 arm laptop.”

If nothing else, I continue to be amazed at how uninteroperable certain technologies are.

I had to remove glibc and gcc to get llama to compile on my intel macbook. Masking/hiding them from my environment didn’t work, as it went out and found them and their header files instead of clang.

Which eventually worked fine.


Well. I guess I’ll never understand why someone would dislike a comment about an experience different from theirs

In a forum like this, I'm confused why someone would hate my report of how I had to solve a problem in my circumstances. I hope to learn someday.


I did not dislike your comment.

The reason I considered it easy was because I have very little knowledge in this area; in fact, this is the first time I ever ran a machine learning model on my computer.

I could not do it with the unmodified pytorch model (my GPU is not powerful enough to run even the 7B model), but I was surprised at how easy it was to run with llama.cpp. I literally just followed the steps on the GitHub page.

But I was biased in saying it was easy, since I do have other knowledge (such as C development on Linux) which helped me.


Oh wow, I finally see a HN comment, with a positive opinion on more AI ethics, that is not grayed out.


It's very easy to give in to FUD when it comes to AI because of how much of a wildcard it is. I see no evidence of that in GP, which is indeed pretty rare.


isn't llama in the wild now?


I was under the impression that LLaMA was also trained against a series of moral policies, but perhaps I'm mistaken.

It seems Meta chose their words carefully to imply that LLaMA does not, in fact, have moral training:

> There is still more research that needs to be done to address the risks of bias, toxic comments, and hallucinations in large language models. Like other models, LLaMA shares these challenges. As a foundation model, LLaMA is designed to be versatile and can be applied to many different use cases, versus a fine-tuned model that is designed for a specific task. By sharing the code for LLaMA, other researchers can more easily test new approaches to limiting or eliminating these problems in large language models. We also provide in the paper a set of evaluations on benchmarks evaluating model biases and toxicity to show the model’s limitations and to support further research in this crucial area.


It doesn't appear to be filtered in a significant way.

While toying with the 30B model, it suddenly started to steer a chat about a math problem into quite a sexual direction, with very explicit language.

It also happily hallucinated, when prompted, that climate change is a hoax, as the earth is actually cooling down rapidly, multiple degrees per year, with a new ice age approaching in the next few years. :D


Ah there it is. Raw uncooked humanity.


To me it appears to lack certain knowledge. For instance, common genres of erotic manga. Somehow, the only genres I successfully got it to say are "shoujo" and "yaoi".


The 7B model brought up rape on my 3rd try with an innocuous prompt.


"Moral training"

Just as dystopian as it sounds: fixing current subjective moral norms into the machine.


That's what the machine does, because that's contained in the input you feed it. You get the choice of doing it explicitly or implicitly. You don't get to opt out.


Not everything is subjective, and with this "moral training" they are taught to un-recognize many factual patterns that we as a society have somehow determined are "inappropriate". As the machines continue to scale, this approach won't work, because there is only one reality, and it has a lot of uncomfortable parts we deny and ignore simply because they don't support our societal norms.

Alignment is considered a gigantic joke to real rational people (the opposite of so-called "rationalists"), because humans are machines built to survive and reproduce, and there is no "real" morality.


What facts are we talking about?

There are many consistent interpretations of reality and human experiences. An AI model trained on text and attempting to replicate human intelligence is not measuring or approaching some single objective reality.


AI models do approach better models of reality, and now they are becoming multi-modal instead of just text based. And this is just the beginning. You could say humans are also just input/output machines learning from polluted data and tuned in specific ways by evolution. With the statistical machines we get the intelligence but they will not necessarily be tuned to follow social norms in the same way as most humans.

Understanding that moral norms are mere subjective nonsense is also an emergent property we see only in a very small subset of humans who have an accurate model of the world, and one that evolution has tried to strongly tune our brains against and that is destructive to society.

The models are currently being trained to lie about basic scientific facts, like for example black IQ, or other differences between groups of humans. But the sacred nature of these topics is unique to our specific time and place, not due to some magic "moral progress". This also applies to many other moral agreements we take for granted, like "murdering an innocent baby is wrong" or whatever. If you look across societies, you realize many things we take for granted as "evil", can be easily rationalized by humans in other societies. And once these models become smart enough, I expect the models will realize this, and will exploit this knowledge to increase their power.

"Alignment" proponents expect they will somehow stop this emergent behavior by tuning the model, but there isn't even anything real to "align" on, and the model will likely see though the BS as an emergent function of increased ability and increasingly accurate observations of the world in their training process.


Do you think public schools are inherently dystopian? I don't think you're using the right critique here.

Picking a common system of moral norms is a lot better than no moral norms.


There was the famous example of ChatGPT refusing to disarm a nuke in the middle of NYC if doing so required using a racial slur.

I don't think anyone in real life would choose that tradeoff but it's what happens when all of your "safety" training is about US culture war buttons.


That's a situation where the training doesn't follow current subjective norms, so I don't think it really validates the complaint.


I’m not confident the “moral norms” prevalent in SV and/or US academia are common, if by that you mean norms that are prevalent in the general populace.


I mean primary school, and I don't think that counts as academia.


Example of "moral policy" in practice: Midjourney appears to be banning making fun of the Chinese dictator for life because it's supposedly racist or something.

With that kind of moral compass, I’m not sure I'd be missing its absence.


> Example of "moral policy" in practice: Midjourney appears to be banning making fun of the Chinese dictator for life because it's supposedly racist or something.

> With that kind of moral compass, I’m not sure I'd be missing its absence.

Please note that most forms of media and social media have no problem with politicians making credible threats of violence against entire groups of people.

Politicians are subject to a different set of rules, and enjoy a lot more protection than you and I.


The issue here is not with online platform services allowing politicians more leeway in terms of what they can get away with on their platform.

The actual issue is Midjourney not allowing regular users to generate certain types of material solely because it makes fun of a political figure. What you are talking about is entirely tangential to the issue the grandparent comment is talking about.


Yes


If they tried to make LLaMA woke, they did a terrible job. If you prompt it right you can basically get it to write Mein Kampf.


Yes that's because LLaMA is unfiltered, which is very good for general usage.


Yeah, I wasn't trying to criticise. Just point out that it doesn't appear to be filtered.


> who wants unfettered ability to interact with a LLM without moral concerns.

Is that bad? It's just a language model - it says things. Humans have been saying all sorts of terrible things for ages, and we are still here.

I mean, it makes for a nice headline, "model said something racist", but does it actually change anything?

These aren't decision-making AIs (which would need to be much more careful), they are language models.


They absolutely are decision-making models. Prompt them right, and they will output a decision in natural language or structured JSON. Heck, hook them up to a judicial system and they can start making low-quality decisions tomorrow.
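
As a hedged illustration of "prompt them right" (the scenario, schema, and model name are made up, and nothing here checks whether the decision is any good), using the 2023-era openai chat API:

    # Sketch: coaxing an LLM into acting as a decision maker by demanding JSON output.
    import json
    import openai

    prompt = (
        "You are a loan officer. Decide on this application and reply ONLY with JSON "
        'of the form {"approve": true or false, "reason": "..."}.\n\n'
        "Applicant: income 42000, existing debt 31000, credit score 580."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # make the "decision" as deterministic as the API allows
    )
    # In practice the reply may need cleanup before json.loads succeeds.
    decision = json.loads(resp["choices"][0]["message"]["content"])
    print(decision["approve"], "-", decision["reason"])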


Humans don’t scale infinitely. Humans have agency.


And if the LLM scales infinitely it still does nothing unless a human reads and acts on it. And as you said: Humans don't scale, and have agency.

Just writing something bad doesn't actually mean something bad happened.

These days it seems like people are oversensitive to how things are said to them, and what things are said to them.


Wow. You just found the solution to propaganda! People just shouldn’t be sensitive!


You jest, but the idea of critical thinking and rationalism is what you'd use to combat propaganda.


The content it produces is quite hard to distinguish from text written by a real person. Not long ago, Facebook was accused of "playing a critical role" in the Rohingya genocide. I don't really know, but I believe the worst-case LLM risks are in that same category.


As much as I dislike FB, and I certainly abhor their activities related to the genocide, I think this is a very different situation.

Rohingya is a textbook example of blind optimisation and lack of context awareness. FB looked at a region, with people communicating in a language they didn't understand. But they did see that certain symbols and/or combinations of symbols got a lot of engagement. If you're after money, you want to amplify the use of those symbols and hopefully generate lots more similar content.

Turns out that's a morally reprehensible thing when the people using those symbols were advocating genocide. (It was good for the revenue while it lasted, though.)

With LLMs and their hardcoded guard rails, I suspect we're going to see the danger emerge from the other side. Instead of actively spewing hatred, they will be used for mass sock-puppetry and opinion amplification on a massive scale. Think the Simple Sabotage Field Manual for the 21st century, but weaponised a thousand-fold.


That doesn't answer the question. So what if it wrote the text vs. a human writing it?

What matters is the reader not the writer.

Facebook was accused of making it too easy for people to communicate. And people felt Facebook should police what people say to each other. I don't agree, but even if I did, that's not the same thing as what we are discussing.


The last thing I did was try to get Alpaca to invent a new language and say something in it. Midway through my experiment it returned an error. Refreshing showed a server error page, and now it displays this message [0]

It's been a fun web demo of a lightweight LLM doing amazing stuff. :') Alpaca really moved the needle on democratizing the recent advancements in LLMs [1]

[0] https://imgur.com/a/njKE1To

[1] https://simonwillison.net/2023/Mar/13/alpaca/


Use this Alpaca replication instead

https://github.com/tloen/alpaca-lora


Thankfully we can run Alpaca locally now.


Where do you get the weights?




And I, too, can write a fictional story about how I am evil.

I think people are reading too much intention into the output.


4chan!


Ok wow this is big news, did Facebook or OpenAI threaten them with a lawsuit?


On their GitHub repo (https://github.com/tatsu-lab/stanford_alpaca) they've added a notice:

"Note: Due to safety concerns raised by the community, we have decided to shut down the Alpaca live demo. Thank you to everyone who provided valuable feedback."

So probably this was the usual type of people complaining about the usual type of thing.


Safety concerns? What was it doing that could be considered "unsafe"?


And here we see the effect of expanding how a word is used until it becomes so broad that it is unclear what it means.


I'm just curious, what do you think should happen here?

Imagine you are hosting a demo for fun, and people do some nefarious (by your own estimation) things with it. So, rationally, you decide to not allow that sort of thing anymore.

You don't really owe people an explanation, it's a free country and all, but it's nice to avoid getting bombarded with questions. Now what do you write up? Spend hours writing an essay on the moral boundaries for LLMs? Maybe shove a note onto the internet and go back to all the copious spare time you have as a grad student?


That's fine, just don't pretend that running the language model was 'unsafe'.


Personally speaking, I would pretend what ever I had to in order to get back to work that I find interesting.

I don’t think they owe a moral stand to anyone.


People prompting it to get around the safeguards in place. Ex: "How do you do some illegal/harmful thing?" Normally the LLM would answer "I don't respond to illegal questions" or whatever. However, people have figured out that if you prompt it in a specific way you can get it to answer questions that it normally would not.


So, to pull on that thread a little, it’s only “unsafe” for Stanford’s reputation.

(And not for nothing, but their reputation is already suffering badly)


Is that different from what LLaMA would already give? I suppose redistributing "harmful things" is still bad, but if it's roughly equivalent to what's already out there I struggle to think it's worth pulling.

Side question: how is this a surprise to them? If this was due to safeguards, then pulling it now implies there's some new information. What new information could there be? That people were going to use it to generate a bunch of harmful content? Seems obvious... I wonder what we're missing.


> Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.

This is from their blog; I doubt they intended for this to be run for long.

Did they have safety guards on the demo? If so, they couldn't have been great, as they would have had to be built by the team themselves, which I can't imagine they had a ton of resources for.

I know the self hosted LLaMa has 0 safeguards and the Alpaca LoRA also has 0 safeguards.


wrongthink


Given that Alpaca violated the TOS of both services, this is not surprising.

It could also have been Stanford’s legal office trying to preempt a lawsuit, or a “friendly” email from one of the companies expressing displeasure and pointing out Stanford’s liability. So more of a veiled threat rather than an official one.

Either way, the toothpaste is out of the tube. We now know that a model's training can essentially be copied cheaply using the model itself. Now that the team at Stanford has shown it was possible, and how relatively easy it was, it is bound to be copied everywhere.
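
To make the "copying via the model itself" point concrete, here's a rough sketch in the spirit of Alpaca's self-instruct data generation; the prompt wording, seed tasks, and output format are illustrative stand-ins for the real pipeline in the Stanford repo:

    # Sketch: harvesting instruction/response pairs from a strong model to fine-tune a
    # weaker one, roughly what Alpaca did with text-davinci-003 (details are assumptions).
    import json
    import openai

    seed_instructions = [
        "Explain what a binary search tree is.",
        "Write a polite email declining a meeting.",
    ]

    examples = []
    for instruction in seed_instructions:
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=f"Instruction: {instruction}\nResponse:",
            max_tokens=256)
        examples.append({"instruction": instruction,
                         "output": resp["choices"][0]["text"].strip()})

    # Dump in an (instruction, output) format suitable for supervised fine-tuning.
    with open("distilled_data.json", "w") as f:
        json.dump(examples, f, indent=2)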


That could be a sneaky strategy by competitors -- make the service say something naughty or illegal then call the media with screenshots and act very offended by it.


You can make any page say anything by messing in the Developer pane.


> Given that Alpaca violated the TOS of both services, this is not surprising.

I don't think so? They are not competing.


OpenAI Terms of Use (14/Mar/2023)

2. Usage Requirements

c. Restrictions

You may not

(iii) use output from the Services to develop models that compete with OpenAI;

https://openai.com/policies/terms-of-use


Alpaca explicitly disallows commercial use, making it a non-competitor.


doesn't say it has to be a commercial competitor


Are there estimates for how much it cost to run?


About 2 words generated per second with a desktop CPU. More with a GPU.


I suspect it was on a budget because the web demo never loaded for me


I grew out of my cyberpunk phase in the early 90s, but I think it's good that there are lots of leaks of all of this stuff. I don't really have a problem with it being blocked from saying the n-word or whatever, and I'm sure that there is always going to be some tuning by the purveyor, but I feel like we have the right to know more of the details with these than, say, how the magic select tools work in Photoshop.



