Anthropic: Expanding Access to Claude for Government (anthropic.com)
113 points by Luuucas on June 26, 2024 | 95 comments


There's no doubt that LLMs massively expand the ability of agencies like the NSA to perform large-scale surveillance at a higher quality. I wonder if Anthropic (or other LLM providers) ever push back or restrict these kinds of use cases? Or is that too risky for them?


That ship has probably sailed. If Llama3 is performing on par with GPT-3.5, then there is no real benefit for companies to restrict access to slightly better proprietary models.


GPT-4 is “holy shit, this actually works, could be better but it’s so good I almost can’t believe it” while GPT-3.5 is “when it works it’s pretty great, just a pity it almost never does”.

So I would assume that three letter agencies would love to take something like GPT-4 and fine tune it based on all the data they have about existing terrorists.


I'm still dealing with hallucinations nearly every time I use it.


I get maybe one hallucination per twenty chats with gpt4.


I haven't tried more than a handful of queries, but I think I've gotten a 100% rate of hallucinations or generic, useless responses to my specific questions.


Can I try your question? Just curious.


It is very easy to find such questions. A very recent example is a thread about AIs not having a concept of correlation:

https://news.ycombinator.com/item?id=40751756

In that thread multiple people posted wrong answers from GPT-4o but assumed that the answers were correct and praised the AI.

This matches my experience that anything that deviates from an encyclopedia lookup or web search is very likely to be wrong.


What does this have to do with hallucinations?


I don't remember exactly, but they were broadly "How can I do $WEIRD_NICHE_THING with $GENERAL_FEATURE of gradle / some java library?"


You might need to help it out with some more context. I find that LLMs act a lot like humans because they are trained on data that is mostly produced by humans. Sometimes having a bit of a conversation with it based on the general theme of your question first will help it focus on that part of its knowledge.

I’ve started using the chat feature in GitHub Copilot in IntelliJ. I wanted it to add some logging to my code for me, since it was a tedious task. I started off with a few relevant files and an explanation of what I wanted. Naturally it didn’t get it right on the first try; I don’t think any human would either. But I could continue the conversation, explaining what I thought was wrong and how I wanted it to actually be. I even realised that I didn’t know exactly what I wanted before I had seen some of the suggestions.

Once I was happy with the result I added another file to the chat and asked it to do the same with this file. I had a handful of files that were structured very similarly and all needed the same kind of logging. It did a great job and I could use the response without further editing. I tried to add more files but realised that the replies got slower and slower, so instead I reverted the conversation back to the state where I had initially been happy with the results and asked it to do the same thing but this time to a different file.

I find that it takes some practice to get good at getting the best results from LLMs. One great place to start is the prompt engineering guide by OpenAI: https://platform.openai.com/docs/guides/prompt-engineering

When using something like GPT-4 for developing I try to think of it as a junior developer or a grad student. With a search engine you need to include the correct keywords to get the best results. For LLMs you need to set the right mood by writing a good prompt and holding a conversation before getting to the point. I also find that GPT-4 is fairly good at answering factual questions, but it’s much more useful and powerful when used to create things or discuss an approach.
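
To make that concrete, here's a rough sketch of what I mean by setting the scene before asking the niche question (Python with the OpenAI client; the model name, file, and questions are just placeholders, not a recipe):

    # Sketch only: warm the conversation up with project context before the niche question.
    # Assumes the official `openai` package and an API key in the environment; the model
    # name, file path, and questions are placeholders.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {"role": "system",
         "content": "You are a senior Java developer helping with a Gradle build."},
        {"role": "user",
         "content": "Here is the relevant part of my build file:\n" + open("build.gradle").read()},
        # Warm-up turn: anchor the model in the actual project first.
        {"role": "user", "content": "Summarise what this build file does."},
    ]
    summary = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append({"role": "assistant", "content": summary.choices[0].message.content})

    # Only now ask the niche question, with the context already established.
    messages.append({"role": "user",
                     "content": "Given that setup, how would I add per-module logging configuration?"})
    answer = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(answer.choices[0].message.content)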


"do a web search to validate what you've told me"


Do you mean 3.5? While I still face issues with GPT 4, I can't even remember the last time it hallucinated. I'm not saying it can't. But, yeah, that's crazy that they're specifically targeting your IP address like that.


NSA should be training their own GPT-4 or better model as we speak and should have been doing it for a long while now. Anything else is borderline incompetence.


NSA can't hire the right talent capable of producing that product for the same reason they have trouble finding white-hat security people to hire: you can't work for the government and do drugs in your personal time. Enough of the elite researchers are into wacky mind-bending substances that it's a real recruitment problem.


Also, imagine the public shitstorm when people see headlines that the NSA has overturned its policy against microdosing. They're not gonna understand what tf is going on, and trying to explain it away sure af isn't gonna happen, because they'll always believe that all drugs are bad and that defending zero-tolerance policies is the hallmark of being one of "the good guys".


You don't think compensation is the bigger issue?


And given the volume of data they likely sift through, I'd also expect them to want very small, high-throughput models for identifying targets for larger models to examine.

On the flip side, LLMs must give the NSA a new challenge: a flood of garbage text generated by no-one in particular. Perhaps there will be more effort to put surveillance directly on-device as tapping networks yields more noise.


I’d expect they’re using huge models to train many small ones, one for each threat actor. Those small models could decide whether their actor is detected, or it’s time to slot in a different one.


On the grasping side, they are probably in the best position to train a GPT-5, given the amount and type of data they're presumed to have.


Will it really though? So far I've found most of the "revolutionise" claims to be mainly hot air and marketing.

It's possible that LLMs will suddenly make a leap in reliability and usability (e.g. much higher context window without corresponding massive increases in memory usage). But I have yet to see it.

So far it's great at some specific use cases: interacting with humans, rewriting or making up text, summarising. Hit and miss at everything else.

Don't get me wrong, I love AI tech and I'm heavily experimenting with it (both at work and at home with local models). But as with most hyped technologies I find the benefits far overblown in marketing stories.

Our leadership jumped on Microsoft Copilot (the one for Office 365 because they have tens of different copilots :) ) like a pack of hungry wolves afraid to miss the boat. And the result was.... kinda meh. It's kinda promising and impresses with simple play school stuff ("make me a presentation about home safety") and totally and utterly fails when you try to do anything serious work related. Sooo many times I get "Sorry I can't do this right now", "Sorry I need more training for this", "I can't do this for you but this is how you can do it yourself!" or it does something but like totally wrong.

Meanwhile we have a bunch of MS training people running around evangelising and telling us how great everything is and making excuses for everything that goes wrong :) You can almost see them breathe a sigh of relief every time something works as it should. That's not what we were promised.

Maybe it will get there, but I don't see it happening tomorrow, to be honest. LLMs were an impressive leap, but their Achilles' heels have become clear and it's proving difficult to overcome them.

I'm really enjoying surfing the knife's edge of technology (as I was and still am with metaverse) but I don't yet see this as a game changer except in a few specific industries. People editing text for a living certainly have a need to worry.

I also wonder what will happen with future AI training. Now that more and more websites are filled with AI-generated content that is often at best "mediocre", and considering future AI models will be trained on that, will they be able to improve their accuracy or struggle to maintain it?


I use LLMs extensively in my field to automate all sorts of tasks. Need to classify a million PDF documents for cheap? Write a prompt and submit a batch job. Need to read 30,000 drilling reports to automatically scan for hazards? Done in 60 minutes.

These are tasks that would have taken months of development or millions of dollars in manual effort before. It's not just hype.
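
For what it's worth, a stripped-down sketch of what one of those classification batch jobs can look like (I'm assuming the OpenAI Batch API and pypdf here purely for illustration; the prompt, model, and file layout are made up):

    # Illustrative sketch, not the actual pipeline: classify a folder of PDFs with an
    # asynchronous batch job. Assumes the `openai` and `pypdf` packages; the prompt,
    # model name, and paths are invented for the example.
    import glob
    import json
    from openai import OpenAI
    from pypdf import PdfReader

    client = OpenAI()
    PROMPT = ("Classify this drilling report as HAZARD or NO_HAZARD. "
              "Answer with exactly one word.")

    # One request per document, written as a line of JSONL.
    with open("requests.jsonl", "w") as out:
        for path in glob.glob("reports/*.pdf"):
            text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
            out.write(json.dumps({
                "custom_id": path,
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",
                    "messages": [{"role": "system", "content": PROMPT},
                                 {"role": "user", "content": text[:100_000]}],
                },
            }) + "\n")

    # Upload the requests and submit them as a batch; results come back asynchronously.
    batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
    batch = client.batches.create(input_file_id=batch_file.id,
                                  endpoint="/v1/chat/completions",
                                  completion_window="24h")
    print(batch.id, batch.status)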


Boy, I can’t wait for the foundation of my house to disappear because the LLM mis-classified a drilling report as non-hazardous.

What’s the deal here with liability and accountability? That’s a serious problem when considering using these for anything other than toy problems.


You don't actually think the LLM is reviewing those 30k documents do you? You tell it to write a program (which is easy to audit) to pull the info from the PDFs or whatever. I don't get why this crowd is so goddamn unimaginative with LLMs.


> You tell it to write a program (which is easy to audit) to pull the info from the PDFs

Wherein you discover that, unless you ask it to consider the fact that PDFs are ... very hard to parse [1] [2], you get something that misses whole blocks of text or turns them into something they aren't, and the rest of the program misses chunks of the document.

[1]: https://news.ycombinator.com/item?id=22473263 [2]: https://web.archive.org/web/20200303102734/https://www.filin...


Why are you expecting they are all very different? They're all likely very similar.


Because presuming that all of them are produced by the same utility is a _presumption_. They could be - but they could also be produced by many different vendors using many different methods all of them simply conforming to the specification "a PDF with HIGH LEVEL DESCRIPTION OF THE DATA".


Because I've heard of enough lazy uses of LLMs to be suspicious. Auditing the program means being sure that the info pulled from those documents is reviewed properly. Also, a complete lack of regard for other people's privacy.


No idea where privacy enters in here.


>Boy, I can’t wait for the foundation of my house to disappear because the LLM mis-classified a drilling report as non-hazardous

LMAO! It's so hilarious that people like you forget that the alternative is relying on bureaucracies managed by people that get things wrong more often and are both too lazy and too stubborn to process your application to review your drilling report again.

If using both human-level and AI-level analysis is cheaper and much more accurate (but still imperfect), I'm willing to settle for a better system rather than oppose all change and die holding out for a perfect one.


What are 'people like me'? It's not like I know nothing about large language models, I just think using them for civil engineering is a bad idea...


One thing I've struggled with while applying LLMs to business problems is how others have dealt with identifying and managing system failures.

Let's say some of your drilling reports contain a pattern that indicates balrog activity, which the LLM misses. The legal or insurance context requires you to monitor and address potential balrog activity. How do you plan for these failures?

In almost every case I've seen, the plan is to not have a plan, which is another way of saying that the data doesn't matter so long as no one complains about the results.


Same way you manage human failures?


The way we manage human failures is with rules, checklists, and accountability. LLMs struggle with all of these, and I get the sense that spending six months developing long lists of rules isn't what the parent comment has in mind with "just write a prompt".


I think that for low-risk classification tasks and similar, something like an LLM is a great tool, and I can absolutely see it being extremely useful for intelligence work where sifting through stuff is very hard. However, I would not at all trust AI to make actually important decisions independently.


A genuine question and not meant as a snipe: as hallucinations are an inherent “feature” of LLMs, how can you be sure of the accuracy of the model’s interpretation of those 30,000 drilling report hazards? Or what is the acceptable level of risk?


You have it write a program to analyze it. I think a lot of people fail to understand that you don't always need the LLM to do the thing; have it write a program to do the thing for you.
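
As a hypothetical example, the kind of small, auditable program the LLM might write for you (pypdf and the field name here are assumptions on my part) looks something like this:

    # Hypothetical example of an LLM-written helper: pull one labelled field out of
    # each PDF with pypdf and a regex, and flag anything it can't parse for a human,
    # instead of having the model read every document itself.
    import csv
    import glob
    import re
    from pypdf import PdfReader

    FIELD = re.compile(r"Hazard assessment:\s*(?P<value>.+)", re.IGNORECASE)

    with open("hazards.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "hazard_assessment"])
        for path in glob.glob("reports/*.pdf"):
            text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
            match = FIELD.search(text)
            # Anything the regex can't handle gets flagged rather than guessed at.
            writer.writerow([path, match.group("value").strip() if match else "NEEDS REVIEW"])

The point being that the extraction logic is a page of code you can actually read and audit, not 30k opaque model calls.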


That's not very likely to succeed, is it? LLMs can do a lot of things, but writing software that not only parses semi-proprietary file formats but also analyzes unstructured data sounds more than a little bit far-fetched. I'd be impressed if just the first, and by far the easiest, part of that could be accomplished.


It's extremely likely to succeed because there is a documented format. I can't believe how pessimistic this site is about this stuff. Yeah, you're not going to one shot it with a prompt. If that's your expectation, you're confused.


Give it a go, then! No one would be more happy than me if you would prove me wrong.

Until then, I'd have to side with said pessimists here.


Okay, but you still need to debug the program. If your program must give correct results you still need to check the program output against every case. There's no free lunch there.


Speaking generally: The program doesn't always have to give correct results. The program just needs to reduce 30k documents down to 200 documents for human review.

You're comparing LLMs to a hypothetical alternative where a human reviews all 30k documents in detail. But the real alternative is often just a worse quality sieve where more errors blunder their way through the existing flawed processes. LLMs can improve on that.


The epistemology problem never goes away. How should I have any confidence that it's correctly flagging things for review? I need to go through 28800 documents to see if it missed anything.

You're right, I am comparing it to that alternative. There are fields and applications where this is necessary. I do not know if drilling reports are one of them. If you can tolerate a large false negative rate then great. But if you need to be catching 99.99% of problems then IMO you should at least be able to show your work. Taking black box output and throwing it over the wall sounds so sketchy in engineering contexts.
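
One way to at least show your work without reading everything: have humans review a random sample of the documents the sieve cleared and put a bound on the miss rate. A rough sketch (the sample size, the 95% figures, and the review function are all just for illustration, not something from this thread):

    # Rough sketch: estimate how much the sieve misses by having humans review a
    # random sample of the documents it cleared. Sample size and the 95% bounds
    # below are illustrative assumptions.
    import math
    import random

    def spot_check(cleared_docs, human_review, sample_size=500, z=1.96):
        """human_review(doc) returns True if the human finds a hazard the model cleared."""
        sample = random.sample(cleared_docs, min(sample_size, len(cleared_docs)))
        misses = sum(1 for doc in sample if human_review(doc))
        rate = misses / len(sample)
        if misses == 0:
            # "Rule of three": zero misses in n samples bounds the true rate below ~3/n at 95%.
            upper = 3 / len(sample)
        else:
            # Normal-approximation 95% upper bound on the true miss rate.
            upper = rate + z * math.sqrt(rate * (1 - rate) / len(sample))
        return rate, upper

E.g. zero misses in 500 sampled reports bounds the miss rate at roughly 0.6% with 95% confidence, which is at least a number you can show; whether that's good enough is the engineering call.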


You can't have confidence, but my point is you often don't need confidence. All you need is an improvement on the flawed status quo.


Yeah I mean I had to move some big folders from server to server last week, maybe about 400. It was too random to script (would take longer to write the script) and I, as a human, doing it manually, still fucked up about 10%. 30k to 200 is exactly the stuff I'm talking about. The other people's existential dread is showing in this thread.


You're right. That's why to be sure I don't use software. All paper and pencil. So I can be sure. I have no idea what your point is.


I'm fine with writing software. I do so for a living. Usually when I'm responsible for a piece of software being correct, I'm the one who wrote it and not a black box. I use AI to autocomplete my code all the time and it very frequently suggests the wrong thing and attempts to insert random bugs.

So if my ass was on the line for the output of an AI-written program being correct for 30k cases of parsing unstructured or mixed data I would be extremely careful. That is my point.


Autocomplete is not in the same ballpark as intentionally prompting software.


Both processes produce bugs. And at any rate, LLMs are our best model for reading unstructured text. What program could an LLM possibly produce to read thousands of comments in natural language that would outperform, well, an LLM?


How can you be sure with humans doing the work?


That's where the law comes in. You can prosecute a human for negligence. What about an AI?


Would you trust your LLM to file your taxes for you?


Yes because without an LLM I don't do it.


How did you do your taxes a couple years ago?


I never did. Had to pay fines.


I hope with all the time and money being saved, you're having humans check the results.


Yes but that is one of those niche tasks I meant.

Once again, they are selling it as something that's for everyone right now. This is the problem. The same with the metaverse: it has some really great use cases, but they made it out like next year we would all ditch our phones and work exclusively in a VR headset. Obviously that didn't happen, as the tech was nowhere near that, and probably people don't want it either.

Also, if you really need to be sure that those 30,000 drilling reports really didn't contain any hazards, you still have to go through them all yourself. Don't forget LLMs aren't reproducible.

But no, my point was exactly that it's not just hype. There are genuinely useful use cases, I totally agree.

As there were for the metaverse, and probably even for blockchain (NFTs not so sure tho :) I always thought they were really a solution looking for a problem). The key thing about a hype is that the potential benefits get overblown way too much, though. I see this happening here once again.


They're pretty clear about being pro-safety to the extreme, and mass surveillance to protect American interests and to police abuse of LLM tech (e.g. open-source misuse) probably falls within the umbrella of the ends-justify-the-means logic Anthropic employs.


When you see the kinds of things that are developed in the name of "defense" it's easy to see how AI "safety" could become a similar sort of doublespeak.


AI safety already is doublespeak. The primary meaning is "safety" for investors who don't want to be associated with something distasteful. The other meaning is basically a thin cover.


Well you can look forward to worse, give it another decade and Lockheed Martin will be extolling their commitment to AI safety while announcing their new generation of fully autonomous kill drones. For defense, of course.


Dumb question: I understand that LLMs can be used for disinformation, since they can generate text and images at scale, but can you explain how they can do large-scale surveillance?


LLMs can be fed a conversation and understand the intent of its participants, even if no particular keywords are used. Before this, surveillance was limited by how many human agents you could have sifting through recorded data.

Put another way: most people only get charged with a crime if it's worth a law-enforcement officer's time to catch them; many small violations are ignored in favor of higher priorities. We may have to contemplate a future where AI is clever enough to notice everything that can be construed as a violation of some law and put it on a prosecutor's backlog.

Schneier talks about this as well: https://www.schneier.com/blog/archives/2023/12/ai-and-mass-s...


I wouldn't say that they can be used to do large-scale surveillance, but they can definitely facilitate it, especially with CV integration. I think one can easily imagine the following scenario: you feed an LLM photos of people (taken from a public camera, for instance), and it finds the closest matches (via a web search, for instance, as Gemini does). From there, you can easily gather the most essential information: first and last name, age, usernames... And then use this information to structure even more precise prompts and find even more potentially interesting data: posts on forums, relatives... And with this data, you can create an exhaustive database with a plethora of information about these people.

That's what any good stalker or person experienced with social engineering is able to do right now, but it takes a lot of time and energy. Resorting to LLMs would considerably decrease both. And it gets easier the more people you have information about.


Specifically, vision transformers (ViTs) outperforming established CNNs.


I can imagine that for many government tasks, there would be a need for a reduced-censorship version of the AI model. It's pretty easy running into the guardrails on ChatGPT and friends when you talk about violence or other spicy topics.

This then raises the question of what level of censorship reduction to apply. Should government employees be allowed to, e.g., war-game a mass murder with an AI? What about discussing how to erode civil rights?


Sure. Everyone, including government employees, should be allowed to discuss anything with AI. The problem is actually doing illegal things, which is... already illegal.


I find all of the virtue signalling from AI companies exhausting.


It's interesting how we dismiss anyone caring about things other than profit as virtue signalling. Anthropic was founded by people who seemed to legitimately care about AI safety. It's possible the current conglomeration that is the company doesn't, but I wouldn't be so quick to assume that.


Calling something virtue signaling is not a dismissal of people caring about things other than profit. It's a statement of belief that they don't actually care about things other than profit and are just pretending that they do. There's a pretty clear financial incentive to virtue signal since getting the benefit of the doubt from society lets them make more profit with less scrutiny. There's also no benefit for society to give them the benefit of the doubt and assume they're being genuine.


Yes, I agree that's the intent, but I think it can often be used by people who think the thing being "virtue" signaled about isn't important. People who think that corporations aren't going far enough in support of their social issue sometimes accuse them of virtue signaling, but people opposed to viewing the issue as important do it all the time.

The prevailing wisdom on Hacker News now is that people who say they care about AI safety are mostly lying and any roadblocks they try to put up are motivated by greed. It feels very much like accusing a company that talks about responsible forestry and advocates for higher standards in the forest-cutting business of virtue signaling.


The danger there is that the phenomenon of throwing up roadblocks that are purely motivated by greed but are claimed to be for good reasons is common enough that we've got a phrase for it - regulatory capture. If the belief is that these companies are actually simply not motivated by anything other than pure greed, which I don't think is unreasonable, then it's also not unreasonable to be skeptical of any roadblocks they propose.

Responsible forestry is a pretty great example actually - not that there's regulatory capture there, but in terms of it being an industry that pretends that it's planting forests and restoring ecosystems while actually taking rich, dynamic ecosystems and turning them into biodiversity-free monoculture tree farms. But the branding of 'forest' means people think it's something other than miles of high-density monoculture agriculture of neat, soulless rows of trees. And if those companies began talking about 'higher standards' in forest cutting, I'd pretty much immediately want to dig into who they're trying to lock out of the market, whether it was some sort of protectionist thing, etc.


If Anthropic truly cared about AI safety, then they wouldn't put customers in states of uncertainty about monopolistic legal terms buried amidst verbose documents commanding customers to enforce Anthropic legal terms for them AFTER they leak harmful information.

Not only is Anthropic anti-open-source, they're also anti-open-output.

Saying "Hey, try our product! It can do everything!" while ALSO saying, "Sorry, you're not allowed to use our general intelligence product to compete with general intelligence..." just evidences no upper IQ bound on Dunning-Kruger


It's possible that they have the best intentions but are woefully misguided, like pretty much the average sv techno-optimist.


The market will decide if that's right.


I rest my case


Are you assuming that anyone claiming to be doing good things must be lying? And then getting angry at them both for not doing the good thing and for the lying?

That does sound exhausting.


If they're assuming they're lying, and then the good thing doesn't happen proving they've lied, then that might be exhausting, but they'd also be correct lol. You're allowed to be angry at things that suck about the world.


I think "AI safety" deserves an eye-roll. AI doesn't allow people to break the laws of physics or travel in time. Safety as presented by Anthropic and OpenAI, is just AI companies playing favorites with which industry gets to use some powerful software tools.

You're defending a company that makes "safe" usage part of its brand, and every press release mentions how much they care about safety. Then one day, they announce they are making some compromises to their safety policy so they can get large new customers (government), but don't worry, all they care about is safety. It's comical how predictable this was.


They have to do it for recruitment's sake. Enough of the talent they need lives in Ivory Tower land, where the base world-model / philosophy is far more progressive than that of the Americans who work in industry.


So, basically all "confidential" information, if you are a subject "of interest", will be in the cloud and used to train models that can spit it out again. And the models will confabulate stories about you.

They can call themselves "sonnet", "bard", "open", and a whole plethora of other positive things. What remains is that they are going in the direction of Palantir, and the rest is just marketing.


That's not at all evidenced by the link. The link simply says that their language model will be available on AWS GovCloud, and that they've created these specific exceptions to their usage policy.

https://support.anthropic.com/en/articles/9528712-exceptions...

The things you're allowing yourself to imagine don't exist in the reality of the information we're discussing here.


> Claude offers a wide range of potential applications for government agencies, both in the present and looking toward the future. Government agencies can use Claude to provide improved citizen services, streamline document review and preparation, enhance policymaking with data-driven insights, and create realistic training scenarios. In the near future, AI could assist in disaster response coordination, enhance public health initiatives, or optimize energy grids for sustainability. Used responsibly, AI has the potential to transform how elected governments serve their constituents and promote peace and security.

> For example, we have crafted a set of contractual exceptions to our general Usage Policy that are carefully calibrated to enable beneficial uses by carefully selected government agencies. These allow Claude to be used for legally authorized foreign intelligence analysis, such as combating human trafficking, identifying covert influence or sabotage campaigns, and providing warning in advance of potential military activities, opening a window for diplomacy to prevent or deter them.

Sometimes I wonder if this is cynicism or if they actually drank their own Kool-Aid.


It's possible that you may have misunderstood what happened.

Firstly, Anthropic made an LLM, exposed it to the internet, and provided these terms of acceptable use:

https://www.anthropic.com/legal/archive/4903a61b-037c-4293-9...

There was no need for cynicism or Kool-Aid at this stage.

Later on, presumably now-ish, Anthropic changed the usage policy to add an exception:

https://support.anthropic.com/en/articles/9528712-exceptions...

> Exceptions to our Usage Policy

> Updated today

The exception is that, starting from now,

> Anthropic may enter into contracts with government customers that tailor use restrictions to that customer’s public mission and legal authorities if, in Anthropic’s judgment, the contractual use restrictions and applicable safeguards are adequate to mitigate the potential harms addressed by this Usage Policy.

I don't think any Kool-Aid or cynicism is needed.

The change is that, if Anthropic thinks the client's use case meets the listed humanitarian goals, then the client may use the LLM.


Modern version of "Do no evil"? Come on, no one believes that sort of thing any longer.


They changed their policy; it now effectively says, "if the government client can convince Anthropic that it needs to use Claude for purposes such as fighting human trafficking, then Anthropic will allow the government client to use Claude."

It doesn't require you to believe anything. It is not a modern version of any slogan. It's simply that Anthropic will allow government clients to use Claude if Anthropic is convinced it's for certain, listed purposes. You are looking at this and approaching it ass-backwards.

It used to be a forbidden use case, or at least not an expressly permitted one, for governments to use Claude to fight human trafficking.

Now it is expressly permitted. You don't have to drink Kool-Aid to understand this.

The Kansas City FBI field office's human trafficking task force can now get some API keys, as long as they can convince Anthropic that it's for a "catch a predator" sting.


Is the announcement just that they're on the AWS Marketplace for GovCloud? Do people ever actually make use of the AWS Marketplace? It just seems like a way to skirt procurement.


I wonder if they really intend to control ethics of Sonnet's use in government or if it's just a nice thing to say.


Meanwhile, the best models with sensible OSI-approved licenses are from China.

What are the security implications if American corpos like Google DeepMind, Microsoft GitHub, Anthropic and “Open”AI have explicitly anticompetitive / noncommercial licenses for greed/fear, so the only models people can use without fear of legal repercussions are Chinese?

Surely, Capitalism wouldn’t lead us to make a tremendous unforced error at societal scale?

Every AI is a sleeper agent risk if nobody has the balls and / or capacity to verify their inputs. Guess who wrote about that? https://arxiv.org/abs/2401.05566


Is there really anyone who thinks this is a good idea? AI systems routinely spit out false information. Why would a system like that be anywhere near a Government?

Perhaps (optimistically) this is just a credibility-grab from Anthropic, with no basis in fact.


From the link,

> Government agencies can use Claude to provide improved citizen services, streamline document review and preparation, enhance policymaking with data-driven insights, and create realistic training scenarios. In the near future, AI could assist in disaster response coordination, enhance public health initiatives, or optimize energy grids for sustainability.


Yeah, I read it. That just says what they want to do, and nothing about why it's a good idea. You would have to have your brain plugged in upside-down to even consider using an LLM to "enhance policymaking".


Going forward, be very, very wary of inputting sensitive information into Anthropic or OpenAI products, especially if you work for a foreign government or corporation.

Listen to Edward Snowden. This guy is not fucking around.


Edward Snowden presented sales decks for half-baked programs as if they were fully realized, for shock value. To sell his narrative. His claims, like everyone else's, should be approached critically.


you think they can’t already get all of that from the average Joe by accessing backdoors in the cell carriers, cloud providers, etc?

very optimistic of you :-)



