Hacker News | gertop's comments

I've not heard many people claim that LLMs don't hallucinate; however, I have seen people (whom I previously believed to be smart):

1. Believe LLMs outright even knowing they are frequently wrong

2. Claim that LLMs making shit up is caused by the user not prompting it correctly. I suppose in the same way that C is memory safe and only bad programmers make it not so.


AnonC doesn't seem to be upset that the journalist was fired. The disappointment comes from Ars trying to brush this entire situation away by deleting articles, comments, and making no statement on their website.

My understanding is that AnonC is upset at Ars not taking the mature approach by allowing this to become a learning moment for the employee and using it to double down and confirm their stance on AI generated content. There's strength in maturity. But I am doing some reading between the lines, and I'm possibly reading a bit too much into "There’s something to be said about the value of owning up to issues"

Reminds me of a story I was told as an intern deploying infra changes to prod for the first time. Some guy had accidentally caused hours of downtime, and was expecting to be fired, only for his boss to say "Those hours of downtime are the price we pay to train our staff (you) to be careful. If we fire you, we throw the investment out the window."


"Make sure quotes in your article are things the subject actually said to you" is not something that should need a "learning moment".

Accidentally taking down production should not lead to firing. It should lead to improved process.

Making up quotes for an article, with technology or not, should lead to firing.


"should lead to firing..."

... and, also, improved processes. There should be no way an individual writer can damage the brand to this extent with absolutely no checks or oversight. This was just an error, but a bad actor could've put something far, far worse out there.

Even an automated quote-checker might have helped in this case.
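For what it's worth, a crude version of such a checker is not much code. This is only a sketch under my own assumptions: the function names are made up, it assumes the writer keeps source material (transcripts, emails) as plain text, and it only catches quotes that do not appear verbatim in any source. It says nothing about paraphrase or context.

```python
import re


def normalize(text: str) -> str:
    # Fold curly quotes to straight ones, collapse whitespace, lowercase.
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(article: str, sources: list[str]) -> list[str]:
    """Return quoted passages (normalized) that appear in none of the sources."""
    norm_sources = [normalize(s) for s in sources]
    # Pull out everything between straight double quotes.
    quotes = re.findall(r'"([^"]+)"', normalize(article))
    # Drop the punctuation that journalistic style tucks inside the quote marks.
    quotes = [q.strip(" .,!?;:") for q in quotes]
    return [q for q in quotes if q and not any(q in s for s in norm_sources)]


if __name__ == "__main__":
    article = 'The maintainer said "I never approved that change," and added "AI wrote it."'
    sources = ["Honestly, I never approved that change. Ask the team."]
    print(unverified_quotes(article, sources))
```

Anything it returns would still need a human to look at it, but a non-empty list before publishing is a cheap red flag.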


Fact checking is a vital part of the editorial process and clearly that process failed here. Tech people often have a double standard when it comes to journalism--rules for thee but not for me. However the structure is fairly analogous, in that both professions ship under lots of time pressure where mistakes can be costly. I'm not sure, honestly, who is most at fault here or why only the reporter was terminated. But my comment above was to highlight that there shouldn't be a double standard--if you think a journalist should be fired for this kind of error it would be inconsistent to believe a software engineer shouldn't.

There is a difference between an error and totally misunderstanding your actual task. I have absolutely no sympathy for journalists getting caught producing hallucinated articles. That's an absolute no-go, and should always result in that person being fired.

Same goes for engineers reviewing vibeslop. If you let that shit through code review, and a customer impacting outage results, that should be instant termination. But it won't be, because as an engineer you are supposed to be held "blameless" right?

Hence why software engineers aren't actual licensed professional engineers.

I love vibe coding but you are absolutely right. We're at the stage where vibe coding is a fun way to produce sloppy software and that's fine if the intended user is just yourself and you're fully informed about what you're getting into. But actually shipping vibe coded slop to other people is wacky; anybody doing that needs to be manually reviewing every commit very carefully and needs to be prepared to accept personal responsibility for anything that slips by.

The problem is that reviewing code for correctness is harder than writing correct code. So these things will always slip through review. I'm a little bit divided here whether we can (or should) blame a reviewer too harshly for letting broken code through review whether it's LLM or human generated.

I've worked on teams with a rubber stamp review culture where you're seen as a problem if you "slow things down" too much with thorough review. I've also worked on teams that see value in correctness and rigor. I've never worked on a team where a reviewer is putting their job on the line every time they click "Approve". And culturally, I'm not sure I'd want to.

That said I think it's pretty clear we need mechanisms that better hold engineers to account for signing off on things they shouldn't have. In some engineering domains you can lose your license for this kind of thing, and I feel like we need some analogous structure for the profession of software engineering.


The journalist's job was not to review AI slop. That is a rather crucial difference.

The irony here is that news.ycombinator.com has a 1 second TTL. One DNS query per page load and they don't care, yay!


Joke's on them, because I use NextDNS with caching, so every TTL is 3600s.
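To illustrate the idea (not how NextDNS actually implements it, which I haven't looked at): a caching resolver can simply refuse to honor TTLs below some floor. A minimal sketch, where `ClampedTTLCache`, `upstream`, and the injectable `clock` are all hypothetical names of mine:

```python
import time
from typing import Callable, Dict, Tuple


class ClampedTTLCache:
    """Caches lookups and raises any upstream TTL below min_ttl up to min_ttl."""

    def __init__(
        self,
        upstream: Callable[[str], Tuple[str, int]],  # returns (address, ttl_seconds)
        min_ttl: int = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upstream = upstream
        self._min_ttl = min_ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh, no upstream query
        address, ttl = self._upstream(name)
        # A 1 second TTL from the authoritative server becomes min_ttl here.
        self._entries[name] = (address, now + max(ttl, self._min_ttl))
        return address
```

With a floor of 3600, a record published with a 1 second TTL costs one upstream query per hour instead of one per page load. The trade-off is that you keep a stale address for up to an hour if the record changes.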


It's trying to prevent the server from caching the search. Thousands of different searches will cause high CPU load, and WordPress might decide to suspend the blog.


Good news then because taxing the wealthy wouldn't increase what you pay in the slightest.


The answer is that it's not okay and never was. Do you really think you're pulling a gotcha here?

Photoshopping nudes of your coworkers was always seen poorly and would get you fired if the right people heard about it. It's just that most people don't have the skill to do it so it never became a common enough issue for the zeitgeist to care.


I am not trying to pull a gotcha and I made no claim that it is okay or not okay. Don't suggest otherwise. I also wasn't talking about coworkers or any other particular group.

My argument is that it is either okay or not, regardless of the tools used.


Another explanation for the lack of faces online could be that most of us in the 90s simply didn't have an easy way of getting our photos online.

Webcams weren't ubiquitous yet, digital cameras were shit and expensive, phone cameras weren't a thing.


True for the images, but users not using real names when posting on forums was the norm.


Those modern terminal projects have weird defaults and quirky behaviors just to be different.

So to me it's easy to believe that a user expects something to work a certain way, does minimal or no research about it, and goes directly to reporting a bug when in reality it's intended behavior.


To be fair it also stinks of arrogance to think that you have the skills to know "what is clearly a bug" 100% of the time in projects you don't own.


> to know "what is clearly a bug" 100% of the time in projects you don't own

Owning a project is counter-productive for QA. If it’s your project, you know where to click and where to not click.

OTOH, you don’t need to know anything about a project to conclude that a crash with access violation, or hang with 100% CPU usage, are clearly bugs.


More than 90% of the time it's pretty clear if something is a bug, especially if there are log files full of errors.


Fwiw, you might have misinterpreted the idiomatic expression "all the time" as meaning "100% of the time". It just means "often" or "commonly". The parent is just saying they often find bugs, they know they're bugs through experience.

Of course anyone can make a mistake. Maybe you prefer the 'discussions' route because only then is it seemingly possible for a project's own devs to make a mistake in creating an issue.


Have you confirmed that the new feature works without an account or is that speculation?

The account requirement for nearby share never made sense yet they still did it the way...


The account requirement for nearby share is, as I understand it, to enable "contacts only" mode, which is how you prevent people from receiving random dickpics the second they try out the protocol and permanently turn the feature off afterwards. I think NS also has some kind of cloud transfer backup connection in case local transfers don't work (using Samsung's cloud), but I'm not 100% sure if that's related.

The account requirement can already be avoided using existing implementations of standard QuickShare (i.e. https://henriqueclaranhan.github.io/rquickshare/) but those are limited to devices sharing the same WiFi connection. However, as there is no contact sharing between iOS and Android, interoperability basically forces Google to pick between "Google account optional" and "doesn't work with iOS".

