
Can you imagine if Excel just quietly adjusted formulas in the background, and you didn't know the numbers weren't right?

Or if Excel just said, Sorry, you can't use that formula with this formula? Or with these types of numbers, or this shape of data, etc?




They implemented both those things, but only apologized for the first. They’re doubling down on the second.

My limited experience with fable over the last few days suggests (1) I can’t see any improvement in output, and (2) it is useless for writing secure software because it constantly hits safety walls if you ask it to close security holes.

I’m definitely shopping around for other LLM providers next week, and testing them against local models (target: a 128GB Strix Halo; any war stories?)


With a 128 GB Strix Halo, you can't run as big a model as you would think. You can go larger than with a single graphics card, of course, but that 128 GB cannot all be dedicated to the model. Remember, the context (KV cache) needs memory too, and at long context lengths it can be larger than the model itself. I got an EVO X2, and I don't regret it, but by my current calculations, it will take 8 years to recoup the cost compared with just using equivalent paid commercial options.

A key consideration in favor of running your local LLM despite all the trouble: The commercial serving endpoint may not exist tomorrow, or at least not at the same price.

My current rule of thumb is 1GB gets you 1B parameters with a big context. (Qwen 32B fits in 32GB with 200K+ contexts)

That’s with heavy compression of the weights and the context, of course.

I haven’t gone through model evaluation + shoehorning at 128GiB yet.
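
For anyone who wants to sanity-check that rule of thumb, here is a rough back-of-the-envelope sketch. The layer count, KV head count, and head dimension are assumed illustrative values for a hypothetical 32B model, not taken from any model card, and "heavy compression" is assumed to mean 4 bits per weight and 4 bits per KV cache value.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization level."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bits_per_value: float) -> float:
    """Memory for the KV cache: one key and one value vector
    per layer, per KV head, per token (hence the factor of 2)."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_tokens * bits_per_value / 8)


# Hypothetical 32B-parameter model; the shape numbers below are assumptions.
weights = weight_bytes(32e9, bits_per_weight=4)
cache = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128,
                       context_tokens=200_000, bits_per_value=4)
total_gb = (weights + cache) / 1e9

print(f"weights {weights / 1e9:.1f} GB + KV cache {cache / 1e9:.1f} GB "
      f"= {total_gb:.1f} GB")
# prints: weights 16.0 GB + KV cache 13.1 GB = 29.1 GB
```

Under those assumptions the total lands just under 32 GB, which is roughly where the 1GB-per-1B figure comes from. It ignores activations and runtime overhead, and an uncompressed 16-bit cache would be four times larger.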


the output is definitely better. and i find it crazy how every time a new model comes out people trip over themselves to say how much worse it is than previous models, when in fact that is basically an impossibility. like, they've got the numbers, man-- you only release a new model when the numbers get gooder. the burden of proof is on the "didn't get better" side, not the "prove that it's better" side, because the architecture itself (1) only works because of how giant the training data / eval / etc. sets are and (2) has a fractal property of becoming strictly deeper and more thoughtful when you just click and drag the edge up and to the right (obviously AI research is harder than this, but that doesn't make the general point untrue). i say this especially because the scuttlebutt is that this model genuinely is a shift-click-expand more so than any sort of architectural "new science" or anything.

this is exactly why hypotheses come before the experiment in the scientific method.


You're wrong in lots of ways.

Some model cards do show regressions on benchmarks for newer models on specific tasks: https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...

This wasn't a new model, but it shows that updates backed by better numbers can still make a model worse: https://openai.com/index/sycophancy-in-gpt-4o/

The slight increases in performance/benchmarks may be just noise: https://arxiv.org/pdf/2602.07150


That analogy is... not inappropriate, but I think it could confuse by being compatible with two different problems, where only one is the target of today's controversy.

1. The sloppy/unpredictable behavior of LLMs as a general class of algorithm: how you shouldn't use document generation for calculating budgets, and how you shouldn't trust it to leave alone the things you "asked" it not to alter.

2. Vendors of thing-as-a-service (not necessarily only LLMs) putting in traps and sabotage to prioritize their own business model or economic incentives.


Can you imagine if printers just refused to print something because a few circles are arranged in this shape?

https://en.wikipedia.org/wiki/EURion_constellation


I would say it's more like if Excel, instead of failing when you divide by 0, secretly changed the divisor to a value like 0.0001.

Have you ever sent your Excel file to someone who uses a different locale?

Not really, the purpose of Excel is pretty clear cut and the scope is small.

Preventing a human-like general purpose textbot from engaging in certain discussions and performing certain tasks seems like a natural thing to do given the massive scope of its capabilities. None of these tools are sold with free license to do whatever with them anyway.


No. Excel is a general purpose tool that can be used for calculating tasks that are good, neutral, or evil things. It's a fancy calculator.

> the purpose of Excel is pretty clear cut and the scope is small.

That has to be the understatement of the century.


I don’t think Excel can give me the instructions for building a house or how to cook a particular meal or write my emails for me. The potential output of LLMs is quite obviously broader than Excel's.

What’s the point, when they will remove those guardrails once the competition reaches their level? It shows that they don’t really care about “safety” at all.

You invest billions of dollars and many months of work just to let everyone distill your model?

>be me

>anthropic

>mine the internet for data, blasting millions of blogs with scrapers

>a few have to shut down, but that's just the price to pay

>finally, the chatbot is ready

>learn that there are EVIL cretins out there trying to scrape automated output from OUR product to build their chatbot

>build in safeguards to new model to stop this

>the users are mad, now the model accuses users of being bioterrorists if they so much as mention they have a cold

>mfw


Seriously... the gall of people just scraping a model for free data!

You wouldn't download an LLM for free, would you?

It's the game. Because consumers reject it otherwise.

Why go to bat for anti-consumer behaviors unless you are a shareholder?

Their billions are not my problem, but the money I pay them and the service I get in return are. And if they can't provide, I will shop elsewhere (and do).


You invest billions of dollars in hosting and benefit from hundreds of millions of man hours of human output, just so everyone trains on "your" data?

Science can be expensive. New findings that get released to the public for free sometimes have taken billions of dollars of investment to get.

That might be an indication that the business is not sustainable because there is not any technical or practical differentiator besides scale. Harming your customers to maintain that differentiation isn't sustainable either.

No intellectual labor is sustainable if anyone can copy your data. Why have Microsoft, if you can just copy Windows and run it?

Have you copied Windows and tried to run it? I would love to see the plain text source code that you claim to have. We all would.

Half of the developing world did. Guess what slowed the trend a bit? Protection.

Did it really? Here in my <large 3rd world country> at least, afaik no one's stopped pirating. The tools to activate may have changed but haven't gone away.

There is a difference between being able to validate a Windows license and copying Windows from source code.

If we are talking about distillation vs building from scratch, none of these are congruent to Windows. I can build my own LLM [0] and then distill off of Claude, but that is not the same as a 1:1 copy of an operating system because there was the ability to crack how licensing works. We are not seeing Windows clones, at the source level, for that reason.

Also, Linux exists. Anyone can copy that. Why doesn't that count?

[0] https://huggingface.co/docs/transformers/quicktour



