Hacker News | majormajor's comments

> I foresee a wave of entrepreneurship coming. AI will empower more people to provide useful services directly to other people, with less middlemen and menial work, and more direct problem solving.

Why doesn't this extend one step further? Many of those "service provider" people are no longer needed? If you're a consultant for domain X, and you used to work at Big Consulting, and they fire you to replace you with AI... soon the customer will hire neither your new provider nor Big Consulting, and just use AI directly, if it's that good.

Certain professions have legal/regulatory protections, but thesis (a) "entrepreneurs replace big incumbent service providers" doesn't seem necessarily more stable than thesis (b) "the people who need the knowledge have their AI do it themselves". For (a) to be true without (b), the AI tools have to be good enough to make the concentration of specialized knowledge and institutional expertise/history no longer critical, but not so good that the would-be-entrepreneurial-middleman's own specialized knowledge can also be replaced.


> If you're a consultant for domain X, and you used to work at Big Consulting, and they fire you to replace you with AI... soon the customer will hire neither your new provider nor Big Consulting, and just use AI directly, if it's that good.

Most consulting is not some flashy 25-year-old Ivy grad putting together a slide deck that says “fire people.” It usually involves gathering (or providing) extensive domain knowledge, much of which is in forms not legible to AI (or at least in forms that can’t be encapsulated at a context level that doesn’t cause unacceptable quality drops). Often there are compliance mandates involved that have real teeth. So again, I think there will still be plenty of humans involved.


> Right, but it's already doing that, and runs just fine, from what I understand. The developers don't have to sit there pounding the enter key on their keyboards over and over all day to keep the messages flowing.

> Is the user count and message rate growing so quickly that people are constantly needing to make architectural changes and performance improvements in order to keep it scaling up? Does adding new capacity need constant human intervention?

> Or are they adding new crazy features all the time that are genuinely challenging to implement?

> As a software developer who has worked on big distributed systems, I'm well aware that things take a lot more work than they often seem from the outside, but this strains belief.

IMO based on working on not-that-large-or-high-revenue systems, but ones where these things already applied, a bunch of it is probably a combination of three things:

* You're doing enough total revenue that a couple million a year to fund a team of engineers to try to make tiny marginal improvements in ad revenue through targeting, or new features on how to present ads, etc, can still easily pay for itself.

* You're running at a high enough scale / spending enough on resources that you can similarly justify spending millions on teams to knock more millions off your infra costs.

* You've got enough usage/users that making tiny improvements in bug rates/crashes/etc similarly results in more usage that more-than-pays-for-itself. (And the list of bugs to squash is possibly never-ending if those other groups keep changing things!)

"Why make 30M profit on 100M revenue when you can make 35M profit on 115M revenue" sorta thing.


IMO with multiple devs the focus on integration (module-level) instead of unit (lower-level/helper-function-level) is even more important than on single-dev projects.

I've gotten a lot of value out of LLM tools but without extensive feedback and direction I've found even the newer version of Opus pretty bad at writing good tests. First drafts are full of tests with some of these characteristics:

- good test, wrong layer, would turn into a mess of "wait why'd tests over there blow up for changes over here"

- mostly-good test, subtle issue (yes, this is the status quo with most human-written tests, but the risk of not being careful is that you (or the agent you throw at crap) are now overconfident in your future changes)

- weird silly test like "assert that calling the six statements in this method in the right order does the right thing" without... actually calling the method itself... so it doesn't protect against changes to the method?
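
A minimal sketch of that last pattern, using a made-up `Account` class (every name here is hypothetical, not taken from any real codebase). The first test replays the statements of `withdraw` by hand and so never notices if `withdraw` itself changes; the second calls the method directly.

```python
# Hypothetical example; all names are illustrative only.
class Account:
    def __init__(self, balance):
        self.balance = balance
        self.log = []

    def validate(self, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount")

    def debit(self, amount):
        self.balance -= amount

    def record(self, amount):
        self.log.append(amount)

    def withdraw(self, amount):
        # The method whose behavior we actually care about.
        self.validate(amount)
        self.debit(amount)
        self.record(amount)


def weak_test():
    # Re-runs the statements from withdraw() by hand; never calls withdraw().
    acct = Account(100)
    acct.validate(30)
    acct.debit(30)
    acct.record(30)
    assert acct.balance == 70 and acct.log == [30]


def better_test():
    # Exercises the real method, so a dropped or reordered step inside it fails here.
    acct = Account(100)
    acct.withdraw(30)
    assert acct.balance == 70 and acct.log == [30]
```

If someone later drops or reorders a step inside `withdraw`, `weak_test` still passes while `better_test` fails, which is the protection you actually wanted.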

For non-greenfield work I've recently been much happier with their out-of-the-box code change quality than their attempts at adding coverage for those changes!

I think this is an area that has remained hard because putting directives in CLAUDE.md or whatnot for tests is generally gonna be so generic as to be useless, like "put tests in the right place" without more module-specific context. Whereas if I'm making a non-greenfield change, I'm thinking much more in my prompt about constraints on the code itself, and much less about the current shape/state/organization of the tests or what to direct it on.

Properly used it's great. Definitely improved my test coverage a lot.

But it's entirely still in the world of "people who'd care to write good code before will write good code faster; other people will just write mediocre code faster."


> For some reason, Apple's ideal desktop experience is tailored around focusing on one application at a time. Which is certainly true for some workflows, but that's not me.

This is a very weird-sounding take to someone who has used Macs for three decades and recalls that for most of that time they never even had a full-screen mode.

Apple's desktop experience DNA is still, for better or worse, deeply anchored to spatial arrangement of partially-overlapping windows (or non-overlapping, if screen is big enough and window small enough), driven by mouse (Expose hot corners back in 2004 were basically the end-game after which they haven't made any new significant changes to this, and haven't had to). Their full-screen/single-app modes are IMO a weird half-baked Windows-maximize alternative.

But yes, it's a very mouse-oriented, single-desktop spatially-organized-and-layered world.


Not one window, but one application. Which is, yeah, about the worst of both worlds.

>> For some reason, Apple's ideal desktop experience is tailored around focusing on one application at a time. Which is certainly true for some workflows, but that's not me.

> This is a very weird-sounding take to someone who has used Macs for three decades and recalls that for most of that time they never even had a full-screen mode.

Sorry about that. I should've clarified better. What I meant was that Apple's opinion of an ideal desktop is closely matching a cluttered desk where only the owner knows the position of something and the focus shifts back and forth from one primary task to another task/interruption.

Edit: typos


Not sure I agree with this considering they have the double whammy of maximising giving you a new desktop, and also their default behaviour of shuffling your desktops to make sure you're disoriented.

The ideal desktop is a cluttered desk, where only the desk knows where it has stuck your tasks.


Last time I returned anything to amazon it was a two-minute-or-less task that didn't involve anything like a warranty service request email?

I buy sous vide cookers that die in about 3 months (all brands I've found do when run 24x7). That's outside of the normal return window and requires a warranty claim to get a replacement. For that you go through the maker, not Amazon. This basically doubles the useful life of the purchase, but I'd rather find one that lasts.

"Honesty" seems like unnecessary (and annoying) anthropomorphism there. I don't think there's any intent of fraud or deception in outputs from these things, just overreaching of prediction. Based on the latter part of the paragraph, I wish they'd just say something like "less likely to skip steps or overemphasize thin evidence" in the first place.

Don't play to the sci-fi "this thing's trying to outsmart me" tropes.


Using words people understand is more important than this strange fixation on not anthropomorphizing things.

I think "honesty" is not a particularly good descriptor, independent of anthropomorphism. The previous commenter's suggestion was much more understandable to me.

Being that can be understood is language. The previous commenter is making a particular argument for how we can improve this understanding. They didn't suggest we should use less familiar words, but different familiar words. Why is this strange?

Anthropomorphizing is a shorthand for a powerful and poorly defined set of metaphors. There are tradeoffs going both ways but trying to dismiss it as merely "strange fixation" shows your own weakness.

To be clear, this is about anthropomorphizing large language models, not the general category of "things". Also, we should be evaluating these constructs using well-defined and measurable criteria; evaluating "honesty" fails to achieve both goals.

I think Honesty can be evaluated. Does the model push back when it knows the user is wrong? How often does the model hallucinate data vs. say it doesn't know? Provide a prompt with contradictions or other issues and see if the model corrects you.

Here is an article by Anthropic that explains what they do and mean in more detail: https://alignment.anthropic.com/2025/honesty-elicitation/


Just swap 'Honesty' with 'correctness in its claims' and you'll get what you need out of this aspect of the model description.

Honesty and correctness are not the same thing, even when talking about LLMs. Sometimes an LLM says a false thing and you don't know whether it's being dishonest or merely incorrect. Sometimes, however, you can see in the CoT that the model does know the true fact and is reasoning about how to deceive the user. That's lying, not just being incorrect.

Fair points. I notice it's not hiding as much from me as earlier versions. It's telling me exactly where it has gaps, where someone might be critical of what it did. Then it's easy for me to adjust. Before, it used to lie or just not tell me. Feels like it is acting more like a senior that has enough game and credibility to just tell it like it is. It's noticeable in only a few long prompts so far.

People get so wrapped around the axle with "anthropomorphizing". For regular folks with no technical background, sure maybe a bit of caveat sprinkled here or there is useful to help them understand what is or isn't true, but on HN it would seem to me that the bar is high enough that we can just use shared language to generally talk about capabilities.

When they say "Honesty" I don't think to myself, "Goodness, does this model have moral understanding?" No, I understand they mean it's less likely to directly bullshit me, which models frequently do.

I don't feel like this level of pedantry around language is useful for people who more or less know what's going on with LLMs. (Again, I concede that perhaps with a less technical audience, there's more need for it.)


I agree. In connection with LLMs we also shouldn't use the words intelligent, smart, reasoning, thinking, chat, conversation, etc.

I think if the dream of semantic search from vector embeddings had worked out as well as people had hoped then "grep over a bunch of text" would have some significant disadvantages.

But in practice I never saw anyone crack the embedding-generation-and-comparison problems well enough to actually get better results than grep for things like "find similar code and see what it does."

(You also don't need that advanced a model to use "grep over a pile of files", but the models today can run MUCH faster than GPT 3.5/4 were running over the APIs back then, making "summarize all five hundred of these matches from those files" much more usable.)
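
For a sense of how little machinery "grep over a pile of files" needs, here is a rough sketch in Python. The function name `grep_files`, its parameters, and the default `*.py` glob are my own assumptions for illustration; an agent could just as well shell out to grep itself.

```python
import re
from pathlib import Path


def grep_files(root, pattern, glob="*.py"):
    """Return (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    hits = []
    # Walk the tree in a stable order so results are reproducible.
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

The list of hits is what you would then hand to a model to summarize; nothing here needs an embedding index.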


I’ve had very good luck having my system search for available tool functions with natural language (ultimately against Qdrant). I’m surprised to hear that people are trying to grep files, instead.

People? No, that's what AI agents themselves do.

There are theoretical gains from using a vector search engine in an agentic loop, but grep is the lowest common denominator of agentic search.


The US still largely believes everything that Reagan Republicans preached about the "evils" of taxation and regulation of oligarchy, despite the US economy overall (and the "average joe") doing quite well in the era that followed "soak the rich" taxes being passed.

So many claims about how it would lead to far better lives for everyone, but the working conditions and general affordability have basically gone down for 40 years. Imagine bringing back the white collar work in the 80s, with a private office with a door, and people whose jobs were to help coordinate and schedule things even if you weren't an exec, instead of you just having a phone to answer all hours of the day.


Are we really so sure that reducing working hours can't, itself, lead to improved economic health? Such as by increasing distribution of income flows, and increasing time available for economic consumption?

One of the greatest tricks of the modern era in the US has been to convince everyone that making the slice of pie bigger for the richest people is necessary to grow the economy.


In fact, we now know, with a fairly high degree of certainty, that it can.

There have been numerous experiments with four-day work weeks or six-hour work days that have almost uniformly shown increases in overall productivity.

Not just productivity per hour at work. Overall productivity.

The resistance to this is clearly not based in financial concerns, but rather in control, and classism.


> and increasing time available for economic consumption?

Where is that additional money going to come from? I think you’re missing some important factors in your analysis.


1. Not everyone is spending all their "spending money" every month, and 2. more free time allows people to get more value for their money (e.g. by comparing more alternative options).
