binarymax's comments

Nonsense bills get introduced all the time. I’m not saying this shouldn’t be taken seriously, but this eventually getting codified is a long shot.

There are so many issues with how this can work in practice. Best case it just asks how old you are like a website that shows mature content, and the user lies. So from a liability perspective that shifts it to the user who gave false information. Beyond that there’s no practical way to actually verify someone’s age at the OS level.


>Beyond that there’s no practical way to actually verify someone’s age at the OS level.

KYC for Windows, macOS, iOS, and Android, with internet players like Google and Cloudflare being forced to block unverified devices.

There probably would still be a way around it, but it would be a headache for most people.


Age verification in general rolled out so fast, I'm more inclined to think that nonsense gets passed more easily if it enables going after drastically more control.

You're assuming that politicians are competent, thorough, and consider all the implications before writing a bill or voting on a bill.

That is a _very_ dangerous assumption.


The government doesn't care if it's nonsensical and contradictory and enforcement would be a dumpster fire because all that essentially grants their buddies (if not now then in the future) in the executive more power.

In a multiline text box, enter should NOT submit the form. Chat interfaces violate this rule and it results in lots of premature chat submissions.
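A minimal sketch of that rule, assuming a hypothetical helper named `shouldSubmit` and assuming Ctrl+Enter or Cmd+Enter as the explicit submit shortcut (the shortcut is an illustration, not part of the rule itself):

```typescript
// Hypothetical helper: decides whether a keydown should submit the form.
// The Ctrl/Cmd+Enter chord is an assumed convention for explicit submission.
interface KeyInfo {
  key: string;
  ctrlKey?: boolean;
  metaKey?: boolean;
}

function shouldSubmit(e: KeyInfo, multiline: boolean): boolean {
  if (e.key !== "Enter") return false;
  // Single-line inputs: plain Enter submits, as users expect.
  if (!multiline) return true;
  // Multiline inputs: plain Enter inserts a newline; only the chord submits.
  return Boolean(e.ctrlKey || e.metaKey);
}

// Possible browser wiring (assumes a <textarea id="msg"> inside a <form>):
// const box = document.querySelector<HTMLTextAreaElement>("#msg")!;
// box.addEventListener("keydown", (ev) => {
//   if (shouldSubmit(ev, true)) {
//     ev.preventDefault();
//     box.form?.requestSubmit();
//   }
// });
```

Keeping the decision in a pure function makes it easy to check without a browser; the event listener only has to act on the result.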

Precisely. 'member CUA?

Codex switched to paid API tokens only. Not to mention their alignment with the department of war.

> Codex switched to paid API tokens only.

They’re still doing subscriptions: https://developers.openai.com/codex/pricing


I'm happy I invested in setting up Codex CLI and getting it to work with ollama. For the toughest jobs I can use Github Copilot (free as an academic) or Gemini CLI. If we see the per token price increase 5x or 10x as these companies move to focusing on revenue, local models will be the way to go, so long as stuff like Gemma 4 keeps getting released.

That's not true.

There was a headline saying they were, and the actual article showed they were doing nothing of the sort.

If you read HN headlines, and don't even bother to click into the comments and see everyone calling out the headline as bogus, you might think something like your statement is true.


Can you give context for the API thing?

Edit: Looks like it still works with subs, they just measure usage per token instead of per message.


Why do you think the author of this piece, to whom you originally replied, has any control over this?

Given how bad Windows has become since Windows 7, I’ve been wondering. Does Bill Gates still use Windows? Does he put up with the horribleness?

The guy getting STDs from Russian girls who Warren Buffett doesn’t talk to anymore.

That Bill Gates?

Seems he might have other priorities these days?


I think Bill pretty much chilled out since he stepped down?

I just looked into this a bit because I thought he still had some kind of role at Microsoft even after leaving as CEO/chairman, but it turns out that in 2020 he left any and all positions at Microsoft as it was investigating him over inappropriate sexual relationships he had with Microsoft employees.

Before that he had a role as a technical advisor and sat on the board of directors.

I also found it interesting that Steve Ballmer owns considerably more of Microsoft than Bill Gates (4% for Steve Ballmer while Bill Gates owns less than 1%).


Yeah of course. He has nothing to do with Microsoft operations or strategy. But does he still use the products?

He still visits Microsoft occasionally. A friend showed me a picture of him visiting Microsoft in Beijing a few months ago (he was excited about BillG visiting). So my guess is that he still has an interest in Microsoft products.

I couldn’t know, but generally speaking, older billionaires don’t typically interact with the world in the same way most of us do (well, those without a social media addiction anyway). The device is someone else’s problem.

He’s still around as a part time advisor, he has to officially step back or no one would take Satya seriously, but on important stuff like AI he is a bit more active.

He's mostly been hanging with Epstein and asking for people to buy him STD medication due to his endless trysts

No, I'm sure he uses Linux or OS X. Everyone has used those since Windows Phone was killed.

If he doesn't use Windows, you won't hear about it. And if you hear that he uses Windows, it might not be true. He loses nothing by denying it. If it worked for his friendship with Epstein, it will work here.

It’s wrong. It made large mistakes on my code literally yesterday.


Wrong context


No. Aside from just making an algorithm that didn’t even run, it refused to use an MCP that it had registered in the same context session.


Ah, the eternal handwave for anything the AI doesn't do well - it must be user error.


Making a whole movie just to retcon the parsec misuse in Ep IV was a choice


They made a movie to make money. I doubt anyone holding the purse strings cared one iota if that bit were corrected or not. It’s not really a retcon either because they didn’t change anything.


I disagree that evaluation is always a coding task. Evaluation is scrutiny for the person who wants the thing. It’s subjective. So, unless you’re evaluating something purely objective, such as an algorithm, I don’t see how a self-contained, self-“improving” agent accomplishes the subjectivity constraint, as by design you are leaving out the subject.


Sure. There will always be subjective tasks where the person who asks for something needs to give feedback. But even there we could come up with ways to make it easier / faster / better ux. (one example I saw my frontend colleagues do is use a fast model to create 9 versions of a component, in a grid. And they "at a glance" decide which one is "better", and use that going forwards).

OTOH, there's loads you can do for evaluation before a human even sees the artifact. Things like does the site load, does it behave the same, did anything major change on the happy path, etc etc. There's a recent-ish paper where instead of classic "LLM as a judge" they used LLMs to come up with rubrics, and other instances check original prompt + rubrics on a binary scale. Saw improvements in a lot of evaluations.
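A rough sketch of that rubric idea, not the paper's actual method: every name here (`Judge`, `scoreWithRubric`, `keywordJudge`) is hypothetical, and the judge is a synchronous stand-in where a real setup would presumably call a model once per criterion.

```typescript
// Hypothetical types: none of these names come from the paper mentioned above.
// A judge gives a binary verdict for one rubric criterion.
type Judge = (prompt: string, artifact: string, criterion: string) => boolean;

interface RubricResult {
  passed: string[];
  failed: string[];
  score: number; // fraction of criteria passed, between 0 and 1
}

function scoreWithRubric(
  prompt: string,
  artifact: string,
  rubric: string[],
  judge: Judge
): RubricResult {
  const passed: string[] = [];
  const failed: string[] = [];
  for (const criterion of rubric) {
    // Each criterion is checked independently against the original prompt.
    (judge(prompt, artifact, criterion) ? passed : failed).push(criterion);
  }
  // An empty rubric scores 0 rather than dividing by zero.
  const score = rubric.length === 0 ? 0 : passed.length / rubric.length;
  return { passed, failed, score };
}

// Stand-in judge for demonstration only: passes a criterion when the
// artifact contains it as a substring. A real judge would be an LLM call.
const keywordJudge: Judge = (_prompt, artifact, criterion) =>
  artifact.toLowerCase().includes(criterion.toLowerCase());
```

The binary verdicts keep each check simple to audit; the list of failed criteria is arguably more useful as feedback than the aggregate score.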

Then there's "evaluate by having an agent do it" for any documentation tracking. Say you have a project, you implement a feature, and document the changes. Then you can have an agent take that documentation and "try it out". Should give you much faster feedback loops.


> Things like does the site load, does it behave the same, did anything major change on the happy path, etc etc.

I asked Claude to build a web app to run locally, polling data from the LAN. It fought me for four rounds of me telling it that the data from the API wasn’t rendered on the page. It created tests with mock data, it validated the API, it tested that the page loaded. It kept gaslighting me, telling me that everything worked every time I told it that it didn’t work. I had to tell it to inspect the DOM and take screenshots with Playwright to make it stop effing around. I don’t think it ever would have found the right response on its own.

Even after deliberate intervention, it regressed a few rounds later and stopped caring that tests failed. Whatever, I don’t treat it as anything more than a sometimes-correct random output machine.


The thing you're missing is harness engineering.


In science there are ways to turn something subjective (which cannot be counted directly) into observable, quantized phenomena. Take opinion polls for instance: "approval" of a political figure can mean many things and is subjective, but experts in the field make "approval" into a number through scientific methods. These methods are just an approximation and come with many IFs; they're not perfect (and for presidential campaign analysis in particular they've been failing for reasons I won't clarify here), but they're useful nonetheless.

Another thing that gets quantized is video preferences to maximize engagement.


I want to echo the top comment in that post. Apple removing the headphone jack from iPhones was absolutely criminal.


Why not use skills? They follow a three-tier loading approach, and you can stick an MCP as part of the toolset for the skills, so it will only load it when the skill is selected.

See the progressive disclosure section in the skills docs: https://agentskills.io/what-are-skills


Three tiers? I thought it was two?


Discovery, Activation, Execution as per the linked doc

