Hacker News | lurkshark's comments

I assume we’ll end up with proof-of-identity attestation as a part of public posting (e.g. Worldcoin), which doesn’t necessarily solve the issue but will at least surface patterns more likely to be LLMs (e.g. a firehose of posts at all hours of the day from one identity). Then we’ll enter the dystopia of mandated real identity on the internet.

I agree. I think that ultimately it will be governments providing services to attest humanity.

They already do to a certain extent via passports. I built a little human verifier using those at https://onlyhumanhub.com


There are a few “updating” benchmarks out there. I periodically take a look at these two:

https://swe-rebench.com/

https://livebench.ai/


Things like MemGPT/Letta, ToM-SWE, and Voltropy have made long context documentation pretty manageable. You could probably build some specialized tooling/prompts for development artifacts specifically too. But I’ll be the first to admit this is basically “Throw more agents at the problem”


I agree with this, I like spec-driven-development tooling partially for this reason. That being said, what I’ve found is often that I don’t include enough of the “why” in my prompt artifacts. The “what” and “how” are pretty well covered but sometimes I find myself looking back at them thinking “Why did I do this?” I’ve started including it but it does sometimes feel weird because I feel like “Why would the LLM ‘care’ about this story?”


My theory is that Anthropic has been wanting to make this change and doing it now while they’re making a (leaked to the) public stand in the name of ethics was a good opportunity.


Honest question: why have an elaborate theory with no evidence when the simple facts support a much simpler conclusion?

Anthropic is free to do what they want. I can’t imagine the board meeting where they planned this triple bank shot of goading the government into threatening the company into doing what they want.


I don't think it's that elaborate. I didn't mean to suggest they intentionally goaded the government into this confrontation. I figure it's a simpler "Oh look, we now have a good opportunity to make that announcement that we were worried about." Considering it's probably the same high-level decision makers on both choices it doesn't need a board meeting. And yes they're absolutely free to do what they want, but they're also not blind to how the public will view their decisions.


Seems like the two main threats are Defense Production Act and Supply Chain Risk. I'd assume Anthropic would sue if either were invoked. I could imagine Supply Chain Risk being easier to push back on because it's pretty clearly being used punitively rather than because of an actual risk. DPA might be a bit harder to push back on if the banned functionality (i.e. mass surveillance and autonomous weapons) exists in the LLM itself and it's just a matter of disabling external checks. If the banned functionality is baked into the training data/weights directly they could probably push back on the DPA by saying the functionality isn't something they can reasonably create.

The only other precedent I can think of for pushback failing is Lavabit with Edward Snowden's email, but I feel like Anthropic is too big to "fail" the way Lavabit did to avoid complying. The penalty for refusing to comply with the Defense Production Act is $10k and/or a year in prison, but I think if the government actually pursued that they would burn a bunch of bridges and Amodei would be a folk hero.


I'm wondering exactly how they expect the DPA to help them with what is essentially a SaaS product. It's still going to refuse to do things it refuses to do.


My thought was that if the refusal to service some requests is implemented as an external guard model, the Pentagon could try to require them to drop the guard model. This would be similar to saying "we're asking for a 'product' you already 'manufacture'" in the way the DPA is often understood. But if the refusal is baked into the model itself, that argument is dead. Not saying I agree with this; I think it turns into the same kind of problem we saw with the Apple v. FBI conflict and the All Writs Act, but the government doesn't always act in the most sane ways.
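To make the distinction concrete, here's a toy sketch of the two architectures. Everything here is invented for illustration (function names, topics, pipeline shape); it is not Anthropic's actual stack, just the structural difference a DPA order would care about:

```python
# Hypothetical sketch: refusal as a removable external component vs.
# refusal trained into the model itself. All names are stand-ins.

BLOCKED_TOPICS = {"mass surveillance", "autonomous weapons"}

def guard_model_flags(prompt: str) -> bool:
    """Stand-in for a separate classifier run before the base model."""
    return any(topic in prompt.lower() for topic in BLOCKED_TOPICS)

def base_model_generate(prompt: str) -> str:
    """Stand-in for an unrestricted base model."""
    return f"<completion for: {prompt}>"

def external_guard_pipeline(prompt: str) -> str:
    # Refusal is a distinct component in front of the model: the
    # "product" arguably already exists with the guard removed.
    if guard_model_flags(prompt):
        return "Refused."
    return base_model_generate(prompt)

def baked_in_pipeline(prompt: str) -> str:
    # Refusal lives in the weights themselves: there is no separate
    # check to drop, so "remove the guard" has nothing to point at.
    return base_model_generate(prompt)  # imagine refusal inside this call
```

In the first pipeline there is an obvious switch to demand be flipped; in the second, producing an unrestricted model would mean new training, which is closer to compelling manufacture of a new product.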


Guidance and alignment are usually handled by RLHF, which actually rewires the weights so that it becomes near-impossible for the model to have certain kinds of 'thoughts'. This is baked in; it's not something you can just extract or turn off.


Yeah this standoff is worth at least 10 Super Bowl ads in good publicity. The Pentagon is saying "Claude is the best so we need to use it but you need to stop acting ethically". I'm almost wondering if someone in the administration has a stake in Anthropic because this is such a boost.

Their threat to label it a supply chain risk also feels toothless because they've basically admitted that using Claude is a benefit, so by their own logic they'd be shooting themselves in the foot by banning contractors from using it.


Yes, I agree, and this is a moment to prove they aren’t full of it - and it also seems like a very good move when the rest of the world seems increasingly wary of tech that even whiffs of US govt involvement.

I am not at all a skeptic anymore on this stuff and the science is well beyond me, but from what I think I know about alignment issues, and Anthropic’s intense focus on solving them, it would not surprise me at all if we learn that catering to US whims on AI safety results in the model actually getting worse, or causes intense 2nd- and 3rd-order unintended consequences. I’m not saying I believe a Terminator sequence of events is happening, but if I did believe that, the headlines right now look exactly like what that would look like.

Alignment is the biggest issue for me - in terms of getting these things to actually behave in an environment where it is absolutely necessary that they behave. If I had to guess, that’s probably why the military prefers to use it. Claude tooling is the only thing I have used in this hype cycle that I can actually get to behave how I want and that obeys (arguably, and often to a fault).

However I also believe we’re in the worst possible timeline so the moment we get a taste of something that works as promised, it’ll be ripped away because the govt decides to do something stupid or build a moat around its use in a way to make it less useful, and favor other more “compliant” competitors.

Either way I bet there are some wild board room discussions going on at Anthropic right now.


My favorite moment of the past year was when Grok was too woke, so they changed it and it became stupid, which they fixed, resulting in it getting woke again (and identifying Musk as 'one of the people most deserving of the death penalty'[0]).

It's almost as if contextual awareness and consideration are cornerstones of intelligence.

0 - https://www.theverge.com/news/617799/elon-musk-grok-ai-donal...


I feel like a lot of this would go away if they made a different API for the “only for use with our client” subscriptions. A different API from the generic one, one that moved some of the client behaviors up to the server, seems like it would solve most of this. People would still reverse engineer it to use it in other tools, but it would be less useful (due to the forced scaffolding instead of an entirely generic completions API) and would also ease the burden on their inference compute.

I’m sure they went with reusing the generic completions API to iterate faster and make it easier to support both subscription and pay-per-token users in the same client, but it feels like they’re burning trust/goodwill when a technical solution could at least be attempted.


> I feel like a lot of this would go away if they made a different API for the “only for use with our client” subscriptions.

They literally did exactly that. That's what gets cut off (Antigravity access, i.e. the private "only for use with our client" subscription - not the whole account, btw.) for people who do "reverse engineer to use it in other tools".

Nothing here is new or surprising; the problem has been the same since Anthropic released Claude Code and the Max subscriptions. The first thing people did then was try to auth regular use with Claude Code tokens, so they wouldn't have to pay the API prices they were supposed to.


What I was getting at is that the current API is still a generic inference endpoint, just with OAuth instead of an API key. What I'm suggesting is that they move some of the client logic up to the OAuth endpoint so it's no longer a generic inference endpoint (e.g. the system prompt is static, context management is done on the server, etc). I assume they could get it to a point where it's no longer useful for a general purpose client like OpenClaw.
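A rough sketch of the difference in request shapes (field names and endpoint behavior are entirely hypothetical, not any real Anthropic API): a generic completions request carries everything a third-party client needs to reuse it, while a scaffolded per-client request carries only the user's turn.

```python
# Hypothetical request shapes illustrating the generic-vs-scaffolded
# API idea from the comment above. No real endpoint is implied.

generic_completions_request = {
    # Generic inference: the client controls the system prompt and
    # full history, so any tool can replay this shape.
    "model": "some-model",
    "system": "arbitrary client-chosen system prompt",
    "messages": [{"role": "user", "content": "do the thing"}],
}

scaffolded_client_request = {
    # Client-specific endpoint: system prompt and context management
    # live on the server, so the payload is just the user's turn.
    "session_id": "abc123",
    "user_turn": "do the thing",
}

def is_general_purpose(request: dict) -> bool:
    """A request shape is reusable by third-party clients if it
    carries its own system prompt and message history."""
    return "system" in request and "messages" in request
```

The point is that the second shape is only useful to a client that speaks the server's scaffolding, which is exactly what makes it less attractive to reverse engineer.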


I’ve thought an interesting outcome might be that there isn’t even a binary generated. It’s just user input -> machine code LLM -> CPU. The only binary would be the LLM itself, essentially mimicking software live. The paper “Diffusion as a Model of Environment Dreams” (DIAMOND) is close to what I’m thinking: they have a diffusion model generate frames of a game, updating with user input, but there’s no actual “game” code, just the model.

https://diamond-wm.github.io/

Like you’d have a machine code LLM that behaves like software, but instead of a static binary being executed it’s just the LLM itself “executing” on inputs and previous state. I’m horrible at communicating this idea but hopefully the gist is there.
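The gist might be easier to see as a loop. This is a toy sketch where `model_step` is a stand-in for the LLM (here just a trivial function so the loop runs): the "program" is nothing but the model repeatedly mapping (previous state, user input) to (new state, output), with no compiled binary anywhere.

```python
# Toy sketch of the "no binary, just the model" idea: the model acts
# as a live interpreter of (state, input), as DIAMOND does for frames.

def model_step(state: dict, user_input: str) -> tuple[dict, str]:
    """Stand-in for an LLM 'executing' software behavior directly."""
    count = state.get("count", 0) + 1
    return {"count": count}, f"frame {count}: responded to {user_input!r}"

def run(inputs: list[str]) -> list[str]:
    state: dict = {}  # previous state fed back in on every step
    outputs = []
    for user_input in inputs:
        state, out = model_step(state, user_input)  # no static binary anywhere
        outputs.append(out)
    return outputs
```

Swap the trivial `model_step` for a real model and the loop is the whole "software": state lives in the model's inputs and outputs rather than in executable code.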


