
It’s interesting that both Anthropic and Google are developing agentic models that are supposed to do anything a human can do on a computer without human intervention, but at the same time, if you plug one program into another of their programs or APIs in a way that wasn’t preapproved, you may be blocked or banned.

To be charitable, maybe they’re expecting AI agents to eventually start reading the ToS docs


I wonder if you can use lower-quality models (or some other non-LLM process) to inject more "noise" into the text between stages. Of course it wouldn't help retain uniqueness from the original source text; it would just add more in between.
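
Roughly what I have in mind, as a hypothetical sketch (the model name, prompt, and pipeline variables are placeholders, and it assumes an OpenAI-compatible API):

    # Hypothetical: reword text with a cheap model at high temperature to
    # inject lexical "noise" between stages of a longer pipeline.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def add_noise(text: str, temperature: float = 1.2) -> str:
        """Loosely reword text, keeping the meaning but shuffling the wording."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder: any cheap, lower-quality model
            temperature=temperature,
            messages=[
                {"role": "system",
                 "content": "Loosely reword the user's text. Keep the meaning."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content

    stage_one_output = "Text produced by the previous stage."  # placeholder
    stage_two_input = add_noise(stage_one_output)  # stage two sees a noisy paraphrase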

The vibes around the self-driving car hype (maybe 10 years ago?) felt very similar to me, but on a smaller scale. There were a lot of statements like "You might like driving your car and having a steering wheel, but if you do, you're a Luddite who will soon be forced to ride around in our featureless rented robot pods", and there was that one AI scientist who was quoted saying we should just change the laws about how humans are allowed to interact with streets to protect self-driving cars.

Not all of it was like that. Oddly enough, I think it was Tesla, or just Elon Musk, claiming you'd soon be able to take a nap in your car on your morning commute through some sort of Jetsons tube, or that you could let your car earn money on the side while you weren't using it, which might actually be appealing to the average person. But a lot of it felt like the self-driving car companies wanted you to feel that they were out to disrupt your life and take your things away.


I see a number of uploaded skills on the site with Bash and Python scripts. No idea what runs them.


Oh god... I guess I haven't gotten that deep into the crap yet.


I've triggered similar conversation-level safety blocks on a personal Claude account by using an instance of DeepSeek to read Claude's output and then write instructions that I copied back over to Claude (there wasn't any real utility to this; it was just an experiment). That sounds kind of similar to this. I couldn't work out what the heuristic was trying to guard against, but I think it's related to concerns about prompt injection and users impersonating Claude's responses. I'm also surprised the same safeguards would exist in either the API or the coding subscription.
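
I did the loop by hand in the chat UI, but expressed as API calls the experiment was shaped roughly like this (model names, prompts, and the stopping condition are simplified placeholders; DeepSeek is reached through its OpenAI-compatible API):

    # Rough shape of the experiment: Claude's output goes to DeepSeek, which
    # writes the next instruction, which gets pasted back into Claude.
    import os
    import anthropic
    from openai import OpenAI

    claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    deepseek = OpenAI(base_url="https://api.deepseek.com",
                      api_key=os.environ["DEEPSEEK_API_KEY"])

    def ask_claude(prompt: str) -> str:
        msg = claude.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    def next_instruction(claude_output: str) -> str:
        resp = deepseek.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "system",
                 "content": "Read this assistant's reply and write its next instruction."},
                {"role": "user", "content": claude_output},
            ],
        )
        return resp.choices[0].message.content

    out = ask_claude("Pick any topic and start explaining it.")
    for _ in range(5):  # arbitrary number of round trips
        out = ask_claude(next_instruction(out))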


I look at products like Hershey's chocolate or Reese's more as their own category of processed food, kind of like Spam. They bear a close but not exact resemblance to "normal" chocolate or peanut butter, but they're also something of an acquired taste, and I think their customers would be upset if Reese's Peanut Butter Cups suddenly tasted like the Trader Joe's versions (with real peanut butter instead of a mysterious chalky peanut-flavored substance), or if Hershey's stopped using the butyric acid process that makes their chocolate taste like vomit to non-Americans.


I haven't actually had that much luck having them output boring API boilerplate in large Java projects. Like "I need to create a new BarOperation that has to go in a different set of classes and files and API prefixes than all the FooOperations and I don't feel like copy-pasting all the YAML and Java classes", but the AI has problems following this. Maybe they work better in small projects.

I actually like LLMs better for creative thinking because they work like a very powerful search engine that can combine unrelated results and pull in adjacent material I would never personally think of.


> Like "I need to create a new BarOperation that has to go in a different set of classes and files and API prefixes than all the FooOperations and I don't feel like copy pasting all the yaml and Java classes" but the AI has problems following this.

To be fair, I also have problems following this.


One thing I suspect is that leadership at tech companies, which previously would have been working from direct experience with technical processes even when no longer working directly on their own codebases, is pretty clueless about AI coding because it's so new. All they have to go on is what they read, or sales pitches, or their experience dabbling with Cursor to build simple Python utilities (which AI tools handle pretty well most of the time), and they don't see what these tools can and can't do on a real codebase.


These are people who are spooked by the stock market. I'd be looking at reducing your exposure from index funds, or, if you were stupid enough to invest in tech stocks directly, cashing out now.


I recently ran across this toaster-in-dishwasher article [1] again and was disappointed that the LLMs I have access to couldn't replicate the "hairdryer-in-aquarium" breakthrough (or the toaster-in-dishwasher scenario, although I haven't explored that one as much), which has made me a bit skeptical of the ability of LLMs to do novel research. Maybe the new OpenAI research AI is smart enough to figure it out?

[1] https://jdstillwater.blogspot.com/2012/05/i-put-toaster-in-d...


Do you mean they sided with the incorrect common wisdom all the people in the article were using?


This looks like my experiments getting R1 to write fiction, and I think it’s worse than what you get from OpenAI. For instance, it uses very colorful language to describe a place that’s somehow both a remote fishing village on the edge of a cliff hours before dawn and a bustling wharf with chattering laborers and large ships anchored in the distance. It also starts by saying the protagonist wakes up with his mouth tasting like blood, that he was screaming, and that his throat is hoarse from holding back from screaming. It’s very colorful, but it’s very confusing to read.

I suspect you can update the prompt to make the setting more consistent, but it will still throw in a lot of detail that doesn’t fit. I’m only nitpicking because my initial reaction was that it’s very vivid but difficult to follow, and I wanted to explain why.


I agree that it felt hard to read. It also doesn't make sense that they're fishing in a storm. But from a prose perspective I don't think it's cringe, which is an improvement on my expectations. I'd share some of the writing I think is terrible, but I don't like to pick on people.

