IanCal's comments

> I think the ability to swap out APIs just isn't the bottleneck.. like ever

It's a massive pain in the arse for testing though. Checking which of X options performs best for your use case is quite annoying if you have to maintain X implementations. Having one setup where you just swap out keys and a few vars makes this massively easier.
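For what it's worth, here's a minimal sketch of what that can look like, assuming the providers you want to compare all expose an OpenAI-compatible chat completions endpoint; the base URLs and model names are just illustrative placeholders:

    # One code path, N providers, assuming each exposes an OpenAI-compatible endpoint.
    # URLs and model names below are illustrative, not a recommendation.
    from openai import OpenAI

    PROVIDERS = {
        "openai": {"base_url": None, "model": "gpt-4o-mini"},
        "local": {"base_url": "http://localhost:8080/v1", "model": "llama-3.1-8b"},  # e.g. a local server
    }

    def ask(provider: str, prompt: str, api_key: str) -> str:
        cfg = PROVIDERS[provider]
        client = OpenAI(api_key=api_key, base_url=cfg["base_url"])
        resp = client.chat.completions.create(
            model=cfg["model"],
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Comparing X providers on a test set is then one loop, not X implementations.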


A really important thing is the distinction between performance and utility.

Performance can improve linearly while utility can be massively jumpy. For some people/tasks, performance may have been improving steadily, but it stays "interesting but pointless" until it hits some threshold and then suddenly you can do things with it.


Are you sure you're using the same models? G2.5P updated almost exactly a week ago.

G2.5P might've updated, but that's not the model where I noticed a difference. o3 seemed noticeably dumber in isolation, not just compared to G2.5P.

But yes, perhaps the answer is that about a week ago I started asking subconsciously harder questions, and G2.5P handled them better because it had just been improved, while o3 had not so it seemed worse. Or perhaps G2.5P has always had more capacity than o3, and I wasn't asking hard enough questions to notice a difference before.


Ish - it always depends on how deep in the weeds you need to get. Tokenisation impacts performance, both speed and results, so the details can be important.

I maintain a llama.cpp wrapper, on everything from web to Android, and I can't quite work out whether you'd get any more info from individual token IDs in the API beyond what you'd get from wall-clock time and checking their vocab.

I don't really see a need for token IDs alone, but you absolutely need per-token logprob vectors if you're trying to do constrained decoding.

Interesting point. My first reaction was "why do you need logprobs? We use constrained decoding for tool calls and don't need them"... which is actually false! We mask out the logprobs of the tokens that don't meet the constraints and then pick the highest logprob among the ones that do.

Haha yeah. I've seen you mention the llama.cpp wrapper elsewhere, it sounds cool! I've worked enough with vLLM and sglang to get angry at xgrammar, which I believe has some common ancestry with the GGML stack (GBNF if I'm not mistaken, which I may be). The constrained decoding part is as simple as you'd expect: it just applies a bitmask to the logprobs during the "logit processing" step and continues as normal.
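A minimal sketch of that masking step, assuming you have a NumPy array of next-token logprobs and a set of token IDs the grammar currently allows (both made up here):

    import numpy as np

    def constrained_pick(logprobs: np.ndarray, allowed_ids: set[int]) -> int:
        # Pick the highest-logprob token among those the constraint allows.
        masked = np.full_like(logprobs, -np.inf)   # disallow everything...
        idx = np.fromiter(allowed_ids, dtype=int)
        masked[idx] = logprobs[idx]                # ...except the allowed token IDs
        return int(np.argmax(masked))              # greedy pick among allowed tokens

    # Toy example: vocab of 5 tokens, grammar only allows IDs 1 and 3.
    lp = np.log(np.array([0.4, 0.1, 0.3, 0.15, 0.05]))
    print(constrained_pick(lp, {1, 3}))            # -> 3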

Do we have the vocab? That's part of the point here. Does it take images? How are they tokenised?

>If entire commit is generated by AI then it is obvious what created it - it’s AI.

Whether it's committed or not is irrelevant to the conclusion there; the question is what the input was.


For something like a compiler, where the output is mostly deterministic[0], I agree. For an AI trained on an unknown corpus, where that corpus changes over time, the output is much less deterministic, and I would say you lose the human element needed for copyright claims.

If it can be shown that the same prompt, run through the AI several times over perhaps a year, results in the same output, then I will change my mind. Or if the AI achieves personhood.

[0] Allowances for register & loop optimization, etc.


> So it can save calendar events if you type something into a prompt

They didn't type it. They hit a button, it suggested adding it to the calendar and they approved it.

You can still find some of the flow forced, but I think you've missed the starting point here.


> OpenAI CEO Sam Altman has voiced worries about the use of AI to influence elections, but he says the threat will go away once “everyone gets used to it.”

Then he's lying or a complete moron.

People have been able to fake things for ages: you can fabricate any text just by typing it, the same as you can pass on any rumour by speaking it.

People are fundamentally aware of this. Nobody is confused about whether or not you can make up "X said Y".

*AND YET* people fall for this stuff all the time. Humans are bad at this, and the ways in which we are bad at it are extensively documented.

The idea that once you can very quickly and cheaply generate fake images, people will somehow treat them with drastically more scepticism than text or speech is insane.

Frankly, the outcome I see as more likely is what's in the article: just as real reporting is dismissed as fake news, legitimate images will be decried as AI if they don't fit someone's narrative. It's a super easy get-out clause mentally. We see this now with people commenting that someone else's comment simply cannot be from a real person because they used the word "delve", or structured things, or used an em dash. Hank Green has a video I can't find now where people looked at a SpaceX explosion and said it was fake, AI, CGI, because it was filmed well with a drone, so it looked just like fake content.


> - something that can be stored for the cost of less than a day of this industry's work

Far, far less, even. You can grab a 1TB external SSD from a good name for less than a day's work at minimum wage in the UK.

I keep getting surprised at just how cheap large storage is every time I need to update stuff.


Good find - the first video is a frequency sweep, video 2 has some music.

Edit - I'm not sure that's the same thing? The release talks about pixel-based sound; the linked paper is about sticking an array of piezoelectric speakers to the back of a display.

Edit 2 - You're right, though the press release is pretty poor at explaining this. It is not the pixels emitting the sound. It's an array of something like 25 speakers arranged like pixels.

https://www.eurekalert.org/news-releases/1084704


This is the article that should be the main link - though it's still an incredibly misleadingly named technology.

But the current one is just wrong.


This is where having a disconnect between an ML team and the product team is so broken. Same for SE, to be fair.

Accuracy rates, F1, anything - they're all just rough guides. The company cares about making money, and some errors are much bigger than others.

We'd manually review changes for updates to our algos and models. Even with a golden set, breaking one case to fix five could be awesome or terrible.

I've given talks about this; my classic example is this somewhat imagined scenario (because it's unfair of me to accuse people of not checking at all):

It's 2015. You get an update to your classification model. Accuracy rates go up on a classic dataset, hooray! Let's deploy.

Your boss's boss's boss gets a call at 2am because you're in the news.

https://www.bbc.co.uk/news/technology-33347866

Ah. It turns out classification of types of dogs improved, but... that wasn't as important as this.

Issues and errors must be understood in the context of the business. If your ML team is chucking models over the fence, at best you're going to move slowly. At worst you're leaving yourself open to this kind of problem.
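A toy illustration of why a single accuracy number hides this (the numbers, labels, and costs below are entirely made up, loosely echoing the incident above):

    # Two hypothetical models with identical accuracy but very different business cost.
    ERROR_COST = {("person", "gorilla"): 1000.0}   # cost assigned by the business to this error
    DEFAULT_COST = 1.0                             # any other misclassification

    # (true_label, predicted_label) pairs over a made-up 100-example test set
    MODEL_A = [("dog", "dog")] * 95 + [("person", "gorilla")] * 5   # rare but catastrophic errors
    MODEL_B = [("dog", "cat")] * 5 + [("dog", "dog")] * 95          # equally many, but benign, errors

    def accuracy(pairs):
        return sum(t == p for t, p in pairs) / len(pairs)

    def business_cost(pairs):
        return sum(ERROR_COST.get((t, p), DEFAULT_COST) for t, p in pairs if t != p)

    for name, pairs in [("A", MODEL_A), ("B", MODEL_B)]:
        print(name, f"accuracy={accuracy(pairs):.2f}", f"cost={business_cost(pairs):.0f}")
    # Both score 0.95 accuracy; A's cost is 5000, B's is 5.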

