Nonsense bills get introduced all the time. I’m not saying this shouldn’t be taken seriously, but this eventually getting codified is a long shot.
There are so many issues with how this could work in practice. Best case, it just asks how old you are, like a website that shows mature content, and the user lies. From a liability perspective, that shifts responsibility to the user who gave false information. Beyond that, there’s no practical way to actually verify someone’s age at the OS level.
Age verification in general rolled out so fast that I'm more inclined to think nonsense gets passed more easily when it enables drastically more control.
The government doesn't care if it's nonsensical and contradictory, or if enforcement would be a dumpster fire, because all of that essentially grants their buddies in the executive (if not now, then in the future) more power.
I'm happy I invested in setting up Codex CLI and getting it to work with Ollama. For the toughest jobs I can use GitHub Copilot (free as an academic) or Gemini CLI. If per-token prices increase 5x or 10x as these companies shift their focus to revenue, local models will be the way to go, so long as models like Gemma 4 keep getting released.
There was a headline saying they were, and the actual article showed they were doing nothing of the sort.
If you read HN headlines and don't even bother to click into the comments to see everyone calling out the headline as bogus, you might think something like your statement is true.
I just looked into this a bit because I thought he still had some kind of role at Microsoft even after stepping down as CEO and chairman. It turns out that in 2020 he left any and all positions at Microsoft while the company was investigating him over inappropriate sexual relationships he had with Microsoft employees.
Before that he had a role as a technical advisor and sat on the board of directors.
I also found it interesting that Steve Ballmer owns considerably more of Microsoft than Bill Gates (4% for Steve Ballmer while Bill Gates owns less than 1%).
He still visits Microsoft occasionally. A friend showed me a picture of him visiting Microsoft in Beijing a few months ago (my friend was excited about BillG visiting). So my guess is that he still has an interest in Microsoft products.
I can’t know for sure, but generally speaking, older billionaires don’t interact with the world the same way most of us do (well, those of us without a social media addiction, anyway). The device is someone else’s problem.
He’s still around as a part-time advisor. He had to officially step back or no one would take Satya seriously, but on important stuff like AI he is a bit more active.
If he doesn't use Windows, you won't hear about it. And if you hear that he uses Windows, it might not be true. He loses nothing by denying it. If it worked for his friendship with Epstein, it will work here.
They made a movie to make money. I doubt anyone holding the purse strings cared one iota if that bit were corrected or not. It’s not really a retcon either because they didn’t change anything.
I disagree that evaluation is always a coding task. Evaluation is scrutiny by the person who wants the thing. It’s subjective. So unless you’re evaluating something purely objective, such as an algorithm, I don’t see how a self-contained, self-“improving” agent satisfies the subjectivity constraint, since by design you are leaving out the subject.
Sure. There will always be subjective tasks where the person who asks for something needs to give feedback. But even there we can come up with ways to make it easier, faster, or a better UX. (One example I saw my frontend colleagues use: have a fast model create 9 versions of a component, laid out in a grid. They decide "at a glance" which one is "better" and use that going forward.)
OTOH, there's loads you can do for evaluation before a human even sees the artifact: does the site load, does it behave the same, did anything major change on the happy path, etc. There's a recent-ish paper where, instead of classic "LLM as a judge", they used LLMs to come up with rubrics, and other instances then checked the output against the original prompt plus the rubrics on a binary scale. They saw improvements in a lot of evaluations.
Then there's "evaluate by having an agent do it" for anything you document. Say you have a project, you implement a feature, and you document the changes. Then you can have an agent take that documentation and "try it out". That should give you much faster feedback loops.
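A minimal sketch of the "9 variants in a grid" review page, assuming each variant arrives as a trusted raw HTML snippet. `render_variant_grid` and `fake_generate_variant` are hypothetical names; the second only stands in for whatever fast-model call actually produces the variants.

```python
def fake_generate_variant(seed: int) -> str:
    # Hypothetical stand-in for a fast-model call; each "variant" differs only in padding.
    return f'<button style="padding:{4 + seed}px">Buy</button>'


def render_variant_grid(variants: list[str], columns: int = 3) -> str:
    """Lay out candidate component snippets in a CSS grid, each labelled #1..#n."""
    cells = [
        f'<figure style="border:1px solid #ccc;padding:8px">'
        f"<figcaption>#{i}</figcaption>{snippet}</figure>"
        for i, snippet in enumerate(variants, start=1)
    ]
    return (
        f'<div style="display:grid;grid-template-columns:repeat({columns},1fr);gap:12px">'
        + "".join(cells)
        + "</div>"
    )


page = render_variant_grid([fake_generate_variant(i) for i in range(9)])
# Write `page` into an .html file and open it in a browser to pick a winner "at a glance".
```

The numbered captions let the reviewer answer with a single label ("go with #4"), which keeps the human part of the loop as cheap as possible.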
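A rough sketch of the "rubric plus binary check" scoring step, assuming the rubric has already been produced (the paper reportedly had an LLM write it; here it is written by hand). `RubricItem`, `grade_with_rubric` and the judge signature are hypothetical, and `keyword_judge` is a toy stand-in for a separate LLM instance that answers strictly yes or no.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RubricItem:
    criterion: str  # one yes/no question, e.g. "Mentions humidity"


# judge(prompt, output, criterion) -> True/False; hypothetical signature.
Judge = Callable[[str, str, str], bool]


def grade_with_rubric(prompt: str, output: str, rubric: list[RubricItem], judge: Judge) -> float:
    """Score an output as the fraction of rubric items a binary judge says it meets."""
    if not rubric:
        raise ValueError("rubric must not be empty")
    passed = sum(judge(prompt, output, item.criterion) for item in rubric)
    return passed / len(rubric)


def keyword_judge(prompt: str, output: str, criterion: str) -> bool:
    # Toy stand-in for a separate LLM instance answering yes/no:
    # passes if the criterion's last word appears in the output.
    return criterion.split()[-1].lower() in output.lower()


rubric = [RubricItem("Mentions temperature"), RubricItem("Mentions humidity")]
score = grade_with_rubric("Report the sensor readings", "Temperature: 21C", rubric, keyword_judge)
print(score)  # 0.5
```

Asking many narrow yes/no questions instead of one open-ended "rate this 1-10" is the design choice that makes the judgments easier to audit.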
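A much narrower, deterministic cousin of "have something try out the docs" already exists in Python's standard library: `doctest`, which executes the examples embedded in docstrings and reports mismatches. This is a swapped-in technique, not the agent workflow described above, and it only covers examples literally written as `>>>` lines; `slugify` and `check_docs` are hypothetical names.

```python
import doctest


def slugify(title: str) -> str:
    """Turn a page title into a URL slug.

    >>> slugify("Getting Started")
    'getting-started'
    """
    return "-".join(title.lower().split())


def check_docs(obj) -> doctest.TestResults:
    """Run the examples in obj's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(obj, globs={obj.__name__: obj}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = check_docs(slugify)
print(results.failed, results.attempted)  # 0 1
```

If the implementation drifts from what the docs promise, the example fails and `results.failed` goes above zero, which is the same fast feedback loop, just without the agent.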
> Things like does the site load, does it behave the same, did anything major change on the happy path, etc etc.
I asked Claude to build a web app that runs locally and polls data from the LAN. It fought me for four rounds of me telling it that the data from the API wasn’t rendered on the page. It created tests with mock data, it validated the API, it tested that the page loaded. It kept gaslighting me, telling me everything worked every time I told it that it didn’t. I had to tell it to inspect the DOM and take screenshots with Playwright to make it stop effing around. I don’t think it would ever have found the right answer on its own.
Even after deliberate intervention, it regressed a few rounds later and stopped caring that tests failed. Whatever, I don’t treat it as anything more than a sometimes-correct random output machine.
In science there are ways to turn subjective things (which cannot be counted directly) into observable, quantified phenomena. Take opinion polls: "approval" of a political figure can mean many things and is subjective, but experts in the field turn "approval" into a number through scientific methods. These methods are just an approximation and come with many ifs; they're not perfect (and for presidential campaign analysis in particular they've been failing, for reasons I won't go into here), but they're useful nonetheless.
Another thing that gets quantized is video preferences, to maximize engagement.
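The simplest version of turning "approval" into a number is a sample proportion with a margin of error. A minimal sketch, assuming a simple random sample and the normal approximation (real pollsters also weight responses, which this ignores); the 520-of-1000 figures are made up for illustration and `approval_estimate` is a hypothetical name.

```python
import math


def approval_estimate(approve: int, respondents: int, z: float = 1.96) -> tuple[float, float]:
    """Return (share approving, margin of error); z = 1.96 is roughly a 95% confidence level."""
    if respondents <= 0 or not 0 <= approve <= respondents:
        raise ValueError("need respondents > 0 and 0 <= approve <= respondents")
    p = approve / respondents
    moe = z * math.sqrt(p * (1 - p) / respondents)  # standard error of a proportion, scaled by z
    return p, moe


share, moe = approval_estimate(520, 1000)  # illustrative numbers, not real poll data
print(f"approval {share:.1%} ± {moe:.1%}")  # approval 52.0% ± 3.1%
```

The margin shrinks only with the square root of the sample size, which is one reason polls carry so many caveats.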
Why not use skills? They follow a three-tier loading approach, and you can stick an MCP as part of the toolset for the skills, so it will only load it when the skill is selected.
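A toy sketch of the three-tier idea, assuming tiers of roughly "short description always visible, full instructions on selection, heavier resources such as tools only on first use". This does not use any vendor's actual Skills API: `Skill`, `SkillRegistry` and `sales_db_tools` are hypothetical, and the MCP server connection is replaced by a stub function that just records when it is called.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str                      # tier 1: always shown to the model
    load_instructions: Callable[[], str]  # tier 2: read only when the skill is selected
    load_tools: Callable[[], list[str]]   # tier 3: e.g. tool list from an MCP server


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)
    _tool_cache: dict[str, list[str]] = field(default_factory=dict)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def catalog(self) -> str:
        # Tier 1 only: one line per skill, cheap enough to keep in every prompt.
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def select(self, name: str) -> str:
        # Tier 2: full instructions enter context once the skill is picked.
        return self.skills[name].load_instructions()

    def tools_for(self, name: str) -> list[str]:
        # Tier 3: tools are fetched on first use, then cached.
        if name not in self._tool_cache:
            self._tool_cache[name] = self.skills[name].load_tools()
        return self._tool_cache[name]


loads: list[str] = []


def sales_db_tools() -> list[str]:
    loads.append("mcp")  # records when the (hypothetical) MCP server is contacted
    return ["query_sales_db"]


registry = SkillRegistry()
registry.register(Skill("sales-report", "Summarise weekly sales",
                        lambda: "Run query_sales_db, then chart the totals.", sales_db_tools))
print(registry.catalog())
instructions = registry.select("sales-report")  # MCP server still untouched here
```

The point of the cache and the late `tools_for` call is that unrelated conversations never pay the context cost of the MCP tool definitions.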