This reminds me of a recent chat I had with Claude, trying to identify what looked like an unusual fossil. The responses included things along the lines of "What a neat find!" or "That's fascinating! I'd love it if you could share more details." They were normal, nice things to hear from a friend, but I found them pretty off-putting coming from a computer that of course couldn't care less and isn't a living thing I have a relationship with.
This sort of thing worries me quite a bit. The modern internet has already sparked an awful lot of pseudo or para-social relationships through social media, OnlyFans and the like, with serious mental health and social cohesion costs. I think we're heading into a world where a lot of the remaining normal healthy social behavior gets subsumed by LLMs pretending to be your friend or romantic interest.
Trying to find the silver lining in this makes me God's advocate I guess?
I was able to reflect a lot on my upbringing by reading reddit threads. Advice columns, relationships, parenting advice, just dealing with people. It was great to finally have a normalized, standardized world view to bounce my own concepts off. It was like an advice column in an old magazine, but infinitely big. In my early 20s I must have spent entire days on there.
I guess LLMs are the modern, ultra personalized version of that. Internet average, westernized culture, infinite supply, instantly. Just add water and enjoy a normal view of the world, no matter your surroundings or how you grew up. This is going to help out so many kids.
And they're not evil yet. Host your own LLMs before you tell them your secrets, people.
> It was great to finally have a normalized, standardized world view to bounce my own concepts off. It was like an advice column in an old magazine, but infinitely big. In my early 20s I must have spent entire days on there.
> I guess LLMs are the modern, ultra personalized version of that. Internet average, westernized culture, infinite supply, instantly.
That's a really interesting way to put it, and it actually made me look back at my own heavily internet-influenced upbringing. Setting healthy personal boundaries? Mindfulness for emotional management? Elevated respect for all types of people and ways of life beyond what my parents were exposed to? Yes. These were not automatically taught to me by my inherited culture or family. I would not have heard about them in a transformative way without the internet. Maybe passively, as something "those weird rich people" do, but not enough to become embedded in my mental operating system. Not to disparage the old culture. I still borrow a lot from it, but yeah, I like westernized internet-average culture.
I'm in the same boat. And judging by the people I've met at Google and FB (before it was Meta), a lot of us are refugees from conservative-minded, illiberal cultures within North America, Asia, and Europe. Memes are our currency. A lot of the internal cultures of these two companies are steeped in the formative memes of those born in the mid-80s who only had the internet to find their people in the early 2000s.
Although I agree with you and GP, there are cynics who will say: "Ha! You think that the totality of Reddit posts is some kind of normalized, standardized, Internet-average world view? HA!" There are people deep in ideological bubbles who think Reddit is too liberal! or Reddit is too young! or Reddit is too atheist! or who have other complaints that amount to "The average Internet-Person doesn't match what I think the average should be!" and they would not be interested in using that ISO Standard World View for anything.
I have a feeling if there is a market for this kind of LLM sounding board, the software writers will need to come up with many different models that differ ideologically, have different priors, and even know different facts and truths, in order to actually be acceptable to a broad swath of users. At the limit, you'd have a different model, tailored to each individual.
I was also quick to dive into early internet forums and feel like I got a lot out of them, but LLMs just seem different. Forums were a novel medium, but it was still real people interacting and connecting with each other, often over a shared interest. With LLMs, none of the social interactions are genuine, and they will always be shallow.
I'm sure some nerds will continue to host their own models but I would bet that 99.9% of social-type LLM interactions will be with corporate hosted models that can and will be tweaked and weighted in whatever ways the host company thinks will make it the most money.
It all reminds me a lot of algorithmic social media feeds. The issues were foreseen very early on even if we couldn't predict the exact details, and it's an unsurprising disappointment that all of the major sites have greatly deemphasized organic interactions with friends and family in favor of ads and outrage bait. LLMs are still in their honeymoon phase, but with the amount of money being plowed into them I don't expect that to last much longer.
> I think we're heading into a world where a lot of the remaining normal healthy social behavior gets subsumed by LLMs pretending to be your friend or romantic interest.
I loved it when Gemini called out what I thought was a very niche problem as a classic. I think there are very few people attempting this stack, to the point where the vendor's documentation is incorrect and hasn't been updated in two years.
"Ah, the "SSL connection already established" error! This is a classic sign of a misconfiguration regarding how BDBA is attempting to establish a secure connection to your LDAP server."
I spent a good half hour "talking" to 4 mini about why Picard never had a family, and about the nature of the crew as his family despite the professional distance required. It really praised me when I brought up the holodeck scene where Data plays King Henry walking among his men; I felt pretty smart, and then realized I hadn't actually garnered the admiration of anyone or anything.
I think there's a similar trap when you're using it for feedback on an idea or to brainstorm features and it gives you effusive praise. That's not a paying customer or even a real person. It's like those people you quickly learn aren't worth seeking out for feedback because they rave about everything just to be nice.
> “…I found them pretty off-putting coming from a computer that of course couldn't care less and isn't a living thing I have a relationship with.”
I’ve prolly complained about it here, but Spectrum cable’s pay by phone line in NYC has an automated assistant with a few emotive quirks.
I'm shocked how angry that robot voice makes me feel. I'm not a violent person, but getting played by a robot sends me over the edge in my workday.
Reminds me of a BoingBoing story from years ago about greeter robots being attacked in Japan. Japan has a tradition of verbally greeting customers as they enter the building, and large department stores will have dedicated human greeters stationed at the entrance. IIRC this was a large store that replaced its human greeters with these robots. Rando customers were attacking these robots. I now know how they feel.
After giving me continuously wrong answers, ChatGPT decided it would have me indulge it in a "learning opportunity" instead.
> I completely understand your frustration, and I genuinely appreciate you pushing me to be more accurate. You clearly know your way around Rust, and I should have been more precise from the start.
> If you've already figured out the best approach, I'd love to hear it! Otherwise, I'm happy to keep digging until we find the exact method that works for your case. Either way, I appreciate the learning opportunity.
Which reveals strong reasons to suspect "parroting" qualities; qualities that should have been fought in the implementation since day zero.
I always laughed when ChatGPT would reply with the same emoji I typed, regardless of context. Not sure if that's parroting exactly, but I assumed it would understand the meaning (if not the context) of emoji?
It's fun to pick it apart sometimes and get it to correct itself, but you often would never know unless you had direct or deep-cut knowledge to interrogate it with.
And we tackled that issue long ago, with education.
What are you trying to imply? Imitating fools or foolery is not a goal; replicating the unintelligent is not intelligence, it is strictly undesirable.
I'm wondering how small of a model can be "generally intelligent" (as in LLM intelligent, not AGI). Like there must be a size too small to hold "all the information" in.
And I also wonder at what point we'll see specialized small models. Like if I want help coding, it's probably ok if the model doesn't know who directed "Jaws". I suspect that is the future: many small, specialized models.
But maybe training compute will just get to the point where we can run a full-featured model on our desktop (or phone)?
> Like there must be a size too small to hold "all the information" in.
We're already there. If you run Mistral-Large-2411 and Mistral-Small-2409 locally, you'll find the larger model is able to recall more specific details about works of fiction. And DeepSeek-R1 is aware of a lot more.
Then you ask one of the Qwen2.5 coding models, and they won't even be aware of it, because they're:
> small, specialized models.
> But maybe training compute will just get to the point where we can run a full-featured model on our desktop (or phone)?
Training-time compute won't allow the model to do anything out of distribution. You can test this yourself if you run one of the "R1 Distill" models. E.g., if you run the Qwen R1 distill and ask it about niche fiction, no matter how long you let it <think>, it can't tell you something the original Qwen didn't know.
I suppose we could eventually get to a super-MoE architecture. Models are limited to 4-16GB in size, but you could have hundreds of models on various topics. Load from storage to RAM and unload as needed. Should be able to load up any 4-16GB model in a few seconds. Maybe as well as a 4GB "Resident LLM" that is always ready to figure out which expert to load.
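A minimal sketch of how that "resident router plus on-demand experts" idea could look, using llama-cpp-python; every path, model size, and topic here is a made-up placeholder, and the routing prompt is deliberately naive:

    from llama_cpp import Llama

    # Small "resident" model that stays in RAM and only routes queries.
    # (Hypothetical filenames throughout.)
    router = Llama(model_path="models/router-4b.gguf", n_ctx=2048, verbose=False)

    # Catalogue of specialist GGUFs kept on disk, not in RAM.
    EXPERTS = {
        "coding": "models/coder-12b.gguf",
        "fiction": "models/fiction-12b.gguf",
        "general": "models/general-12b.gguf",
    }

    def pick_expert(question: str) -> str:
        """Ask the resident model which specialist should handle the question."""
        prompt = (
            "Classify the question into exactly one of: "
            + ", ".join(EXPERTS)
            + f".\nQuestion: {question}\nCategory:"
        )
        out = router(prompt, max_tokens=5, temperature=0)
        label = out["choices"][0]["text"].strip().lower()
        return label if label in EXPERTS else "general"

    def answer(question: str) -> str:
        # Load the chosen expert from storage (a few seconds for a 4-16GB file
        # on fast storage), answer, then let it be garbage-collected.
        expert = Llama(
            model_path=EXPERTS[pick_expert(question)],
            n_ctx=4096,
            verbose=False,
        )
        out = expert(f"Q: {question}\nA:", max_tokens=256)
        return out["choices"][0]["text"]

    print(answer("Who directed Jaws?"))

The trade-off is that few seconds of load time on every topic switch, which is exactly the latency the comment above proposes to accept.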
> We're already there. If you run Mistral-Large-2411 and Mistral-Small-2409 locally, you'll find the larger model is able to recall more specific details about works of fiction.
Oh, for sure. I guess what I'm wondering is if we know the Small model (in this case) is too small -- or if we just haven't figured out how to train well enough?
Like, have we hit the limit already -- or, in (say) a year, will the Small model be able to recall everything the Big model does (say, as of today)?
It's a sliding scale based on what you consider "generally intelligent," but they're getting smaller and smaller. This 27B model is comparable to 400B models from not much over a year ago. But we'll start to see limits on how far that can go, maybe soon.
You can try different sizes of gemma3 models, though. The biggest one can answer a lot of things factually, while the smallest one is a hilarious hallucination factory, and the others are different levels in between.
> It comes in sizes from 1 billion to 27 billion parameters (to be precise: 1B, 4B, 12B, 27B), with the 27B version notably competing with much larger models, such as those with 400B or 600B parameters by Llama and DeepSeek.
Maybe Llama 3.3 70B doesn't count as running on "one GPU", but it certainly runs just fine on one Mac, and in my tests it's far better at holding onto concepts over a longer conversation than Gemma 3 is, which starts getting confused after about 4000 tokens.
Gemma 3 is a lot better at writing for sure, compared to 2, but the big improvement is I can actually use a 32k+ context window and not have it start flipping out with random garbage.
Technically, the 1.58-bit Unsloth quant of DeepSeek R1 runs on a single GPU+128GB of system RAM. It performs amazingly well, but you'd better not be in a hurry.
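For anyone who wants to try something similar, a minimal sketch of partial GPU offload with llama-cpp-python looks roughly like this; the filename and layer count are placeholders, and the right n_gpu_layers value depends entirely on your VRAM and which quant you actually downloaded:

    from llama_cpp import Llama

    # Offload only as many transformer layers as fit in VRAM; everything else
    # stays in system RAM, which is exactly why it's slow.
    llm = Llama(
        model_path="models/deepseek-r1-1.58bit.gguf",  # placeholder filename
        n_gpu_layers=20,   # tune to your GPU; -1 would try to offload everything
        n_ctx=4096,
        verbose=False,
    )

    out = llm("Explain why partial offload is slow:", max_tokens=128)
    print(out["choices"][0]["text"])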
I've looked into it and the only sane answer right now is still "If it flies, floats, or infers, rent it." You need crazy high memory bandwidth for good inference speed, and that means GPUs which are subject to steep monopoly pricing. That doesn't look to be changing anytime soon.
Second place is said to be the latest Macs with lots of unified memory, but it's a distant second place.
The recently announced hardware from nvidia is either underpowered, overpriced, or both, so there's not much point waiting for it.
... then this generation of RTX Pro hardware sounds better to me. At the end of the day I don't know anything I didn't see on YouTube or /r/LocalLLaMA, though.
My instinct is that it would be cheaper overall to buy API credits when needed, compared with buying a top-of-the-line GPU which sits idle for most of the day. That also opens up access to larger models.
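As a rough illustration of that instinct, here is a back-of-envelope comparison; every number below is an assumption made up for the arithmetic, not a real price or usage figure:

    # Hypothetical numbers only: GPU price, API price, and usage are all assumptions.
    gpu_cost = 2000.0            # one-off cost of a top-end consumer GPU
    api_price_per_mtok = 3.0     # blended $ per million tokens
    tokens_per_day = 200_000     # personal daily usage

    daily_api_cost = api_price_per_mtok * tokens_per_day / 1e6
    days_to_break_even = gpu_cost / daily_api_cost
    print(f"Break-even after ~{days_to_break_even:.0f} days of daily use")
    # With these made-up numbers the break-even point is on the order of years,
    # before counting electricity, and the API also gives access to larger models.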
It's a choice. Running locally means personal safety and privacy. It could also mean easier compliance for any enterprise that doesn't want to share data.
This agrees with my own experience. I have a 4070 Super, which of course is nothing to brag about, but tps using a quantized 27B model is miserable. I could go down to 12B or even smaller, but that would sacrifice quality. I could also upgrade my gear, but I realize that however much I spend, the experience is not going to be as smooth as off-the-shelf LLM products, and it's definitely not worth the cost.
Of course it is nice to have an LLM running locally where nobody gets to know your data. But I don't see the point in spending thousands of dollars to do that.
Does anyone use GoogleAI? For an AI company with an AI CEO using AI language translation, I think their actual GPT products are all terrible and have a terrible rep. And who wants their private conversations shipped back to Google for spying?
Right, this thread is about which models are better than Gemma-3-27B.
I'm a fan of Mistral Small 3 personally, but I've not spent enough time with it, Gemma, and the new Mistral Small 3.1 to have an opinion on which of those is the "best" model.
If I ask QwQ-32B anything that is even slightly complicated, it will ramble until it exceeds the context window, then forget my question. That's with a Q4k quant, which is all that fits (with context) on a 3090.
Gemma 3 27B gives me a rapid one-shot response, and it actually works really well as the kind of rubber-duck brainstorming partner I often need.
Try giving it a river crossing puzzle with substitutions. QwQ can take a lot of time but it will solve it. Gemma will just confidently give you a wrong answer, and will keep giving you wrong answers if you point out the mistakes.
Now, yes, QwQ will take a lot of tokens to get there (in one case it took it over 5 minutes running on Mac Studio M1 Ultra). Nevertheless, at least it can solve it.
Yeah, but how many river crossing puzzles and murder mystery games was it trained on, and how many times do I actually need to solve a river crossing puzzle?
Anything that requires reasoning rather than regurgitating. For a simple example, try the classic river crossing puzzle with non-trivial substitutions. Gemma can't solve it even if you keep pointing out where it fucks up. To be fair, this also goes for all non-CoT 70B models, so it's not surprising. But QwQ can solve it through sheer persistence because it can actually fairly reliably detect when it's wrong, and it just keeps hammering until it gets it done.
It's an example of a problem that requires actual reasoning to solve, and also an example of a "looks similar therefore must use similar solution" trap that LLMs are so prone to.
Translating this to code, for example, it means that Gemma is that much more likely to pretend to solve a more complicated problem that you give it by "simplifying" it to something it already knows how to solve.
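For reference, a "non-trivial substitution" of the kind being described might look something like the snippet below; the objects are invented, but the constraint structure deliberately mirrors the classic wolf/goat/cabbage puzzle, which is exactly the "looks similar, therefore must use a similar solution" bait a pattern-matching model falls for:

    # Hypothetical substituted river-crossing prompt: same constraint graph as
    # wolf/goat/cabbage (magnet~wolf, robot~goat, battery~cabbage), new surface details.
    puzzle = (
        "A courier must ferry a robot, a battery, and a magnet across a river. "
        "The boat holds only the courier and one item. Left alone together, the "
        "robot will drain the battery, and the magnet will wipe the robot. "
        "How does the courier get all three across intact?"
    )
    print(puzzle)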