Hacker News | idopmstuff's comments

I am currently using a Claude skill that I have been building out over the last few days that runs through my Amazon PPC campaigns and does a full audit. Suggestions of bid adjustments, new search terms and products to advertise against and adjustment to campaign structures. It goes through all of the analytics Amazon provides, which are surprisingly extensive, to find every search term where my product shows up, gets added to cart and purchased.

It's the kind of thing that would be hours of tedious work, then even more time to actually make all the changes to the account. Instead I just say "yeah do all of that" and it is done. Magic stuff. Thousands of lines of Python to hit the Amazon APIs that I've never even looked at.
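To make the idea concrete, the audit step might boil down to logic like this. This is a hypothetical sketch, not Amazon's actual API schema: the report fields (`term`, `orders`), the `targeted_terms` set, and the `min_orders` threshold are all assumptions for illustration.

```python
# Hypothetical sketch of a PPC audit pass: given search-term report rows,
# flag converting terms that aren't yet explicit targets.

def audit_search_terms(report_rows, targeted_terms, min_orders=2):
    """Suggest new exact-match targets for terms that already convert."""
    suggestions = []
    for row in report_rows:
        converts = row["orders"] >= min_orders
        untargeted = row["term"] not in targeted_terms
        if converts and untargeted:
            suggestions.append({
                "term": row["term"],
                "action": "add as exact-match target",
                "orders": row["orders"],
            })
    return suggestions

rows = [
    {"term": "camping mug", "orders": 5},
    {"term": "steel cup", "orders": 0},
    {"term": "travel mug", "orders": 3},
]
print(audit_search_terms(rows, targeted_terms={"travel mug"}))
# -> suggests "camping mug" only: it converts but isn't targeted yet
```

The real version would pull these rows from the advertising reports API; the interesting part is that the decision rules themselves are just a few lines once the data is in hand.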


And it doesn't freak you out that you're relying on thousands of lines of code that you've never looked at? How do you verify the end result?

I wouldn't trust thousands of lines of code from one of my co-workers without testing


> And it doesn't freak you out that you're relying on thousands of lines of code that you've never looked at?

I was a product manager for 15 years. I helped sell products to customers who paid thousands or millions of dollars for them. I never looked at the code. Customers never looked at the code. The overwhelming majority of people in the world are constantly relying on code they've never looked at. It's mostly fine.

> How do you verify the end result?

That's the better question, and the answer is a few things. First, when it makes changes to my ad accounts, I spot check them in the UI. Second, I look at ad reporting pretty often, since it's a core part of running my business. If there were suddenly some enormous spike in spend, it wouldn't take me long to catch it.


It's thousands of lines of variation on my own hand-tooling, run through tests I designed, automated by the sort of onboarding docs I should have been writing years ago.

I've been doing agentic work for companies for the past year and, first of all, error rates have dropped to 1-2% with the leading Q3 and Q4 models... 2026's Q1 models are blowing those out of the water while being cheaper in some ways

but second of all, even when error rates were 20%, the time savings still meant A Viable Business. a much more viable business, actually - a scarily, crazily viable business with many annoyed customers getting slop of some sort, and a human in the loop correcting things from the LLM before they went out to consumers

agentic LLM coders are better than your co-workers. they can also write tests. they can do stress testing, load testing, end to end testing, and in my experience that's not even what course-corrects LLMs that well, so we shouldn't even be trying to replicate processes made for humans with them. like a human, the LLM is prone to just "fix" a failing test by deciding it relies on a deprecated assumption, rather than recognizing that a product change broke the test and revealed a regression.

in my experience, type errors, compiler errors, logs on deployment and database entries have made the LLM correct its approach more than tests. Devops and Data science, more than QA.
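the "let real errors steer the model" loop above can be sketched in a few lines: run a checker, capture its diagnostics, and feed them into the next model turn. this is a minimal stand-in, not any particular agent framework - the checker command and the idea of passing `diagnostics` back to an LLM are assumptions for illustration.

```python
# Minimal sketch: run a compiler/type-checker/linter and capture its output
# so an agent loop can feed the diagnostics back to the model.

import subprocess
import sys

def check(cmd):
    """Run a checker command and return (ok, diagnostics)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

# Using the Python interpreter itself as a stand-in checker:
ok, diagnostics = check([sys.executable, "-c", "print(1 + 1)"])
print(ok)  # True - nothing to feed back to the model this turn
```

in a real loop you'd swap in `mypy`, a compiler, or deployment logs as the command, and append `diagnostics` to the prompt when `ok` is false.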


Why wouldn't you test? That sounds like a bad thing.

Me? I use AI to write tests just as I use it to write everything else. I pay a lot of attention to what's being done including code quality but I am no more insecure about trusting those thousands of tested lines than I am about trusting the byte code generated from the 'strings of code'.

We have just moved up another level of abstraction, as we have done many times before. It will take time to perfect but it's already amazing.


So people don't look at the code, or the tests.

So they don't know if it has the right behavior to begin with, or even if the tests are testing the right behavior.

This is what people are talking about. This is why nobody responsible wants to uberscale a serious app this way. It's ridiculous to see so much hype in this thread, people claiming they've built entire businesses without looking at any code. Keep your business away from me, then.


Do you trust the assembly your compiler puts out? The machine code your assembler puts out? The virtual machine it runs on? Thousands of lines of code you've never looked at...

None of that is generated by an LLM prone to hallucination, and it's perfectly deterministic unless there's a hardware problem.

And yes, I have occasionally run into compiler bugs in my career. That's one reason we test.


> None of that is generated by an LLM

How did you verify that?

> prone to hallucination

You know humans can hallucinate?

> is perfectly deterministic

We agree then that you can verify, test, and trust the deterministic code an LLM produces without ever looking at it.

> That's one reason we test

That's one way we can trust and verify code produced by an LLM. You can't stop doing all the other things that aren't coding.

I get there's a difference. Shitty code can be produced by LLMs or humans. LLMs really can pump out the shitty code. I just think the argument that you can't trust code you haven't viewed is not a good argument. I very much trust a lot of code I've never seen, and yes, I've been bitten by it too.

Not trying to be an ass, more trying to figure out how I'm going to deal for the next decade before retirement age. It's going to be a lot of testing and verification, I guess.


> How did you verify that?

The compiler works without an internet connection and requires too few resources to be secretly running a local model. (Also, you can inspect the source code.)

> You know humans can hallucinate?

We are talking about compilers…

> We agree then that you can verify, test, and trust the deterministic code an LLM produces without ever looking at it.

Unlike a compiler, an LLM does not produce code in a deterministic way, so it’s not guaranteed to do what the input tells it to.


It is for me because the LLM makes my ability to evaluate super, too.

Compiler theory and implementation are based on mathematical and logical principles, and hence much more provable and trustworthy than an LLM that's stitching together pieces of text based on 'training'.

"Trust"? God no. That's why I have a debugger

Also you really do have to know how the underlying assembly integer operations work or you can get yourself into a world of hurt. Do they not still teach that in CS classes?

It's also worth noting that the "our" in that sentence is just SWEs, who are a pretty small group in the grand scheme of things. I recognize that's a lot of HN, but it still bears considering in terms of the broader impact outside of that group.

I'm a small business owner, and AI has drastically increased my agency. I can do so much more - I've built so many internal tools and automated so many processes that allow me to spend my time on things I care about (both within the business but also spending time with my kids).

It is, fortunately, and unfortunately, the nature of a lot of technology to disempower some people while making lives better for others. The internet disempowered librarians.


> It's also worth noting that the "our" in that sentence is just SWEs

It isn't; it's just a matter of seeing ahead of the curve. Delegating stuff to AI and agents by necessity leads to atrophy of the skills that are being delegated. Using AI to write code leads to reduced capability to write code (among people). Using AI for decision-making reduces capability for making decisions. Using AI for math reduces capability for doing math. Using AI to formulate opinions reduces capability to formulate opinions. Using AI to write summaries reduces capability to summarize. And so on. And, by nature, less capability means less agency.

Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them

Not to mention utilizing AI for control, spying, invigilation and coercion. Do I need to explain how control is opposed to agency?


I'll grant that it does extend beyond SWEs, but whether AI atrophies skills is entirely up to the user.

I used to use a bookkeeper, but I got Claude a QuickBooks API key and have had it doing my books since then. I give it the same inputs and it generates all the various journal entries, etc. that I need. The difference between using it and my bookkeeper is I can ask it all kinds of questions about why it's doing things and how bookkeeping conventions work. It's much better at explaining than my bookkeeper and also doesn't charge me by the hour to answer. I've learned more about bookkeeping in the past month than in my entire life prior - very much the opposite of skill atrophy.

Claude does a bunch of low-skill tasks in my business, like copying numbers from reports in different systems into a centralized Google Sheet. My muscle memory at running reports and pulling out the info I want has certainly atrophied, but who cares? It was a skill I used because I needed the outcome, not because the skill was useful.

You say that using AI reduces all these skills as though that's an unavoidable outcome over which people have no control, but it's not. You can mindlessly hand tasks off to AI, or you can engage with it as an expert and learn something. In many cases the former is fine. Before AI ever existed, you saw the same thing as people progressed in their careers. The investment banking analyst gets promoted a few times and suddenly her skill at making slide decks has atrophied, because she's delegating that to analysts. That's a desirable outcome, not a tragedy.

Less capability doesn't necessarily mean less agency. If you choose to delegate a task you don't want to do so you can focus on other things, then you are becoming less capable at that skill precisely because you are exercising agency.

Now in fairness I get that I am very lucky in that I have full control of when and how I use AI, while others are going to be forced to use it in order to keep up with peers. But that's the way technology has always been - people who decided they didn't want to move from a typewriter to a word processor couldn't keep up and got left behind. The world changes, and we're forced to adapt to it. You can't go back, but within the current technological paradigm there remains plenty of agency to be had.


> but whether AI atrophies skills is entirely up to the user

The thing with society is that we cannot simply rely on the self-discipline and self-control of individuals. For the same reason, we have a universal and legally enforced education system. We would still live in a mostly illiterate society if people were not forced to learn or to send their children to school.

Analogies to past inventions are limited because AI doesn't automate physical labor, hard or light - it automates, or at least its overlords claim it automates, a lot of cognitive and creative labor. Thinking itself, at least in some of its aspects.

From a sociological and political perspective, there is a huge difference between the majority of the population losing the capability to forge swords or sew dresses by hand and losing the capability to formulate coherent opinions and communicate them.


I use Claude code in a number of different parts of my business - coding internal applications, acting as a direct interface to SaaS via APIs and just general internal use.

I find there is a virtuous cycle here where the more I use it, the more helpful it is. I fired my bookkeeper and have been using Claude with a QBO API key instead, and because it already had that context (along with other related business context), when I gave it the tax docs I gave to my CPA for 2024's taxes plus my return, and asked it to find mistakes, it determined that he did not depreciate goodwill from an acquisition. CPA confirmed this was his error and is amending my return.

Then I thought it'd be fun to see how it would do on constructing my 2024 return just from the same source docs my CPA had. The first time I did it, it worked for an hour, then said it had generated the return, checked it against the 2024 numbers and found they were the same. I had removed the 2024 return before having it do this, to avoid poisoning the context with the answers, but it turned out it had a worksheet .md file that it was using on prior questions that I had not erased (and then it admitted that it had started from the correct numbers).

In order to make sure I wouldn't have that issue again, I tried the 2024 return again, completely devoid of any historical context in a folder totally outside of my usual Claude Code folder tree. It actually got my return almost entirely correct, but it missed the very same deduction that it had caught my CPA missing earlier.

So for me, the buildup of context over time is fantastic and really leads to better results.


> it can price compare and then ask the user for confirmation

Sure, but that's explicitly not what the Citrini article said. It said: "The part that should have unsettled investors more than it did was that these agents didn’t wait to be asked. They ran in the background according to the user’s preferences. Commerce stopped being a series of discrete human decisions and became a continuous optimization process, running 24/7 on behalf of every connected consumer."


There are people already doing that today. Why do you think it will not increase in usage?

That's sort of beside the point. You both claim an extreme; the truth is in between.


I do think the models themselves will get commoditized, but I've come around to the opinion that there's still plenty of moat to be had.

On the user side, memory and context, especially as continual learning is developed, is pretty valuable. I use Claude Code to help run a lot of parts of my business, and it has so much context about what I do and the different products I sell that it would be annoying to switch at this point. I just used it to help me close my books for the year, and the fact that it was looking at my QuickBooks transactions with an understanding of my business definitely saved me a lot of time explaining.

On the enterprise side, I think businesses are going to be hesitant to swap models in and out, especially when they're used for core product functionality. It's annoying to change deterministic software, and switching probabilistic models seems much more fraught.


If you embezzled money at your last company, I shouldn't be able to decline to hire you on my finance team on that basis?


In many sane countries, companies can ask you to provide a legal certificate that you did not commit X category of crime. This certificate will then either say that you did not do any crimes in that category, or it will say that you did commit one or more of them. The exact crimes aren't mentioned.

Coincidentally these same countries tend to have a much much lower recidivism rate than other countries.


This doesn't seem better?

I'm an employer and I want to make sure you haven't committed any serious crimes, so I ask for a certificate saying you haven't committed violent crimes. I get a certificate saying you have. It was a fistfight from a couple of decades ago when you were 20, but I don't know if it's that or if you tortured someone to death. Gotta take a pass on hiring you, sorry.

Seems like the people this benefits relative to a system in which a company can find out the specific charges you were convicted of would be the people who have committed the most heinous crimes in a given category.


At least where I live, a fistfight from decades ago wouldn’t be on the certificate. In your example you want to know about serious crimes, but ask for violent crimes, why are you surprised that the answer you get won’t be useful to make a decision?

As in many things judicial, it only works if the rest of the system is designed to make it work.


No, because even if they're not sold as new (which as others have commented is often not the case), they're still competing with you for sales. Someone who would have paid full price for a new one instead gets a version with a slight issue at 25% off. That's fine if you're the one selling it at a discount, but here you've lost money on the production and are now losing even more money because you've lost a sale of a full price unit.


I think the spirit of that regulation is that you, as the producer, see this as an incentive to better manage production so there is no need to discard/burn 10% of everything.


I'm with you. I own a business and have created multiple tools for myself that collectively save me hours every month. What were boring, tedious tasks now just get done. I understand that the large-scale economic data are much less clear about productivity benefits, in my individual case they could not be more apparent.


I'm thirding this sentiment!

I run an eComm business and have built multiple software tools that each save the business $1000+ per month, in measurable wage savings/reductions in misfires.

What used to take a month or so can now be spat out in less than a week, and the tools are absolutely fit for purpose.

It's arguably more than that, since I used to have to spread that month of work over 3-6 months (working part time while also doing daily tasks at the warehouse), but now can just take a week WFH and come back with a notable productivity gain.

I will say, to give credit to the anti-AI-hype crowd, that I make sure to roll the critical parts of the software by hand (things like the actual calculations that tell us what price to list an item at, for example). I did try to vibecode too much once and it backfired.

But things like UIs, task managers for web apps, simple API calls to print a courier label, all done with vibes.


Understanding when to make something deterministic and when not to is critical. Taste and judgement are critical.


This is the one that gets me - sometimes you're forced to work with systems that do annoying things that you have to accommodate. It's annoying, but it's more important to do the thing that prevents your users from having issues than it is to be theoretically right about whether something's required by a standard.

I've dealt with many worse cases than this, where the systems I was integrating with were doing things that weren't even close to reasonable, but they had the market power so I sucked it up and dealt with it for the sake of my users. Maybe Google's wrong here, but how do you not just implement the solution anyway?


> Maybe Google's wrong here, but how do you not just implement the solution anyway?

But they just did (make it work). The logical assumption is that most people did the same, or just used another email provider. Why would viva care? (Same as Google - why would Google care?)




Well, apparently it's not even an issue for gmail users:

    To unblock myself, I switched to a personal @gmail.com address for the account. Gmail's own receiving infrastructure is apparently more lenient with messages, or perhaps routes them differently. The verification email came through.
So it's only an issue for people paying for Google's hosted email—a much smaller set!


Also product manager here.

Not at all cynically, this is classic product management - simplify by removing information that is useful to some users but not others.

We shouldn't be over it by now. It's good to think carefully about how you're using space in your UI and what you're presenting to the user.

You're saying it's bad because they removed useful information, but then why isn't Anthropic's suggestion of using verbose mode a good solution? Presumably the answer is because in addition to containing useful information, it also clutters the UI with a bunch of information the user doesn't want.

Same thing's true here - there are people who want to see the level of detail that the author wants and others for whom it's not useful and just takes up space.

> It requires deep understanding of customer usage in order not to make this mistake.

It requires deep understanding of customer usage to know whether it's a mistake at all, though. Anthropic has a lot deeper understanding of the usage of Claude Code than you or I or the author. I can't say for sure that they're using that information well, but since you're a PM I have to imagine that there's been some time when you made a decision that some subset of users didn't like but was right for the product, because you had a better understanding of the full scope of usage by your entire userbase than they did. Why not at least entertain the idea that the same thing is true here?


Simplification can be good - but they've removed the wrong half here!

The notifications act as an overall progress bar and give you a general sense of what Claude Code is doing: is it looking in the relevant part of your codebase, or has it gotten distracted by some unused, vendored-in code?

"Read 2 files" is fine as a progress indicator but is too vague for anything else. "Read foo.cpp and bar.h" takes almost the same amount of visual space, but fulfills both purposes. You might want to fold long lists of files (5? 15?) but that seems like the perfect place for a user-settable option.
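Something like the following sketch, where the fold threshold is the user-settable option suggested above (the function name and default are made up for illustration, not anything Claude Code actually exposes):

```python
# Sketch of the proposed display: list filenames up to a fold threshold,
# then collapse the remainder into a count.

def describe_reads(files, fold_after=5):
    """Render a 'Read ...' progress line, folding long file lists."""
    if len(files) <= fold_after:
        return "Read " + ", ".join(files)
    shown = ", ".join(files[:fold_after])
    return f"Read {shown} and {len(files) - fold_after} more files"

print(describe_reads(["foo.cpp", "bar.h"]))
# -> Read foo.cpp, bar.h
print(describe_reads([f"f{i}.c" for i in range(8)], fold_after=3))
# -> Read f0.c, f1.c, f2.c and 5 more files
```

Short lists keep their full information; only genuinely long ones get summarized, which serves both the progress-bar and the "is it looking in the right place?" use cases.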


> "Read 2 files" is fine as a progress indicator but is too vague for anything else. "Read foo.cpp and bar.h" takes almost the same amount of visual space, but fulfills both purposes.

Now this is a good, thoughtful response! Totally agree that if you can convey more information using basically the same amount of space, that's likely a better solution regardless of who's using the product.


> It requires deep understanding of customer usage to know whether it's a mistake at all

Software developers like customizable tools.

That's why IDEs still have "vim keybindings" and many other options.

Your user is highly skilled - let him decide what he wants to see.


There are a lot of Claude Code users who aren't software developers. Maybe they've decided that group is the one they want to cater to? I recognize that won't be a popular decision with the HN crowd, but that doesn't mean it's the wrong one.


I fully agree with you on almost everything you wrote in this thread, but I'm not sure this is the right answer. I myself currently spend a lot of time with CC and belong to that group of developers who don't care about this problem. It's likely that I'm not alone. So it doesn't have to be the least professional audience they serve with this update. It's possible that Anthropic knows what they are doing (e.g. reducing the level of detail to simplify the task of finding something more important in the output), and it's also possible that they are simply making stupid product decisions because they have a cowboy PM who attacks some OKR screaming 'yahoo'. We don't know. In the end, having multiple verbosity levels, configurable with granularity similar to Java loggers, would be nice.


Oh totally - I'm definitely not saying that they made the decision to cater to non-dev users, just that it's a possibility. Totally agree with you that at the end of the day, we haven't the foggiest idea.


Yeah, I made a similar point about the tone of ChatGPT responses; I can't imagine why someone would want less information when working with and tuning an AI model. However, something tells me they actually have hard evidence that users respond better to less information, regardless of what the loud minority say online, and are following that.


100%. Metrics don't lie. I've A/B tested this a lot. Attention is a rare commodity and users will zone out and leave your product. I really dislike this fact


> Metrics don't lie

Metrics definitely lie, but generally in a different way than users/others do. It's important not to let the metric become the goal, which is what often happens in a metric-heavy environment (certainly at Google & FB; not sure about the rest of big tech).


Then why is the suggestion to use verbose mode treated as another mistake?

The user is highly skilled; let them filter out what is important

This should be better than adding an indeterminate number of toggles and settings, no?


does claude code let me control what's output when?

verbose i think puts it on the TUI, and i can't particularly grep or sed on the TUI


> You're saying it's bad because they removed useful information, but then why isn't Anthropic's suggestion of using verbose mode a good solution?

Because reading through hundreds of lines verbose output is not a solution to the problem of "I used to be able to see _at a glance_ what files were being touched and what search patterns were being used but now I can't".


Right, I understand why people prefer this. The point was that the post I was responding to was making pretty broad claims about how removing information is bad but then ignoring the fact that they in fact prefer a solution that removes a lot of information.


They know what people type into their tools, but they don't know what in the output users read and focus on unless they're convening a user study or focus group.

I personally love that the model tells me what file it has read because I know whether or not it's headed in the generally right direction that I intended. Anthropic has no way of knowing I feel this way.


But you have no idea whether they've convened user studies or focus groups, right?

I'll just reiterate my initial point that the author of the post and the people commenting here have no idea what information Anthropic is working with. I'm not saying they've made the right decision, but I am saying that people ought to give them the slightest bit of credit here instead of treating them like idiots.


Developer> This is important information and most developers want to see it.

PM1> Looks like a PM who is out of touch with what the developers want. Easy mistake to make.

PM2> Anthropic knows better than this developer. The developer is probably wrong.

I don't know for sure what the best decision is here, I've barely used CC. Neither does PM1 nor PM2, but PM2 is being awfully dismissive of the opinion of a user in the target audience. PM1 is probably putting a bit too much weight on Developer's opinion, but I fully agree with "All of us... have seen UIs where this has occurred." Yes, we have. I personally greatly appreciate a PM who listens and responds quickly to negative feedback on changes like this, especially "streamlining" and "reducing clutter" type changes since they're so easy to get wrong (as PM1 says).

> It's good to think carefully about how you're using space in your UI and what you're presenting to the user.

I agree. It's also good to have the humility to know that your subjective opinion as someone not in the target audience even if you're designing the product is less informed in many ways than that of your users.

----

Personally, I get creeped out by how many things CC is doing and tokens it's burning in the background. It has a strong "trust me bro" vibe that I dislike. That's probably common to all agent systems; I haven't used enough to know.


> Personally, I get creeped out by how many things CC is doing and tokens it's burning in the background. It has a strong "trust me bro" vibe that I dislike.

100% this.

It might be convenient to hide information from non-technical users; but software engineers need to know what is happening. If it is not visible by default, it should be configurable via dotfiles.


> PM2> Anthropic knows better than this developer. The developer is probably wrong.

Nope! Not what I said. I specifically said that I don't know if Anthropic is using the information they have well. Please at least have the courtesy not to misrepresent what I'm saying. There's plenty of room to criticize without doing that.

> It's also good to have the humility to know that your subjective opinion as someone not in the target audience even if you're designing the product is less informed in many ways than that of your users.

Ah, but you don't know I'm not the target audience. Claude Code is increasingly seeing non-developer users, and perhaps Anthropic has made a strategic decision to make the product friendlier to them, because they see that as a larger userbase to target?

I agree that it's important to have humility. Here's mine: I don't know why Anthropic made this decision. I know they have much more information than me about the product usage, its roadmap and their overall business strategy.

I understand that you may not like what they're doing here and that the lack of information creeps you out. That's totally valid. My point isn't that you're wrong to have that opinion, it's that folks here are wrong to assume that Anthropic made this decision because they don't understand what they're doing.


I'm sure the goal is that reading files is something you debug, not monitor, like individual network requests in a browser.

