Hacker News | wenbin's comments

HN is full of tech-savvy people. Yet an LLM slop article gets upvoted to the front page of HN...

Imagine how deceptive LLM slop content is to the general population.


Same for podcasts (and other types of online content) -

Here's a dataset of 26,000+ AI-generated "podcasts":

https://www.kaggle.com/datasets/listennotes/ai-generated-fak...


If you use microfeed.org, you can use JSON Feed, e.g., https://www.microfeed.org/json/
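
To make that concrete, here is a minimal sketch of fetching and reading such a feed. It assumes the standard JSON Feed fields ("title", "items", and per-item "title"/"url"); any particular feed may populate these differently.

    import json
    import urllib.request

    FEED_URL = "https://www.microfeed.org/json/"

    # Fetch the JSON Feed and print its title and item links.
    with urllib.request.urlopen(FEED_URL) as resp:
        feed = json.load(resp)

    print(feed.get("title", "(untitled feed)"))
    for item in feed.get("items", []):
        print("-", item.get("title", "(untitled)"), item.get("url", ""))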


It's similarly insulting to listen to your AI-generated fake podcasts[0]. Ten minutes spent on them is ten minutes wasted.

[0] AI-generated fake podcasts (mostly via NotebookLM) https://www.kaggle.com/datasets/listennotes/ai-generated-fak...


AI will create ever more AI-generated synthetic content because current systems still can't determine with 100% certainty whether a piece of content was produced by AI. And AIs will, intentionally or unintentionally, train on synthetic content produced by other AIs.

AI generators don't have a strong incentive to add watermarks to synthetic content. They also don't provide reliable AI-detection tools (or any tools at all) to help others detect content generated by them.


I’d be kind of surprised if they don’t watermark the content they generate. Just so they don’t train on their own slop.


Maybe some of them already embed some simple, secret marker to identify their own generated content. But people outside the organization wouldn’t know. And this still can’t prevent other companies from training models on synthetic data.

Once synthetic data becomes pervasive, it’s inevitable that some of it will end up in the training process. Then it’ll be interesting to see how the information world evolves: AI-generated content built on synthetic data produced by other AIs. Over time, people may trust AI-generated content less and less.
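
As a purely hypothetical illustration of that "simple, secret marker" idea (not something any provider is known to do): a generator could store a keyed digest next to each piece of output and check it before ingesting content into a training set. It only recognizes exact, unmodified copies, which is part of why real watermarking is much harder.

    import hashlib
    import hmac

    SECRET_KEY = b"internal-secret-key"  # hypothetical; never published

    def tag(content: str) -> str:
        # Marker the generator stores alongside the content it emits.
        return hmac.new(SECRET_KEY, content.encode("utf-8"), hashlib.sha256).hexdigest()

    def is_own_output(content: str, marker: str) -> bool:
        # Check a (content, marker) pair before adding it to a training set.
        return hmac.compare_digest(tag(content), marker)

    generated = "Example of model-generated text."
    marker = tag(generated)
    assert is_own_output(generated, marker)
    assert not is_own_output(generated + " edited", marker)  # breaks on any edit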


I really hope SynthID becomes a widely adopted standard - at the very least, Google should implement it across its own products like NotebookLM.

The problem is becoming urgent: more and more so-called “podcasts” are entirely fake, generated by NotebookLM and pushed to every major platform purely to farm backlinks and run blackhat SEO campaigns.

Beyond SynthID or similar watermarking standards, we also need models trained specifically to detect AI-generated audio [0]. Otherwise, the damage compounds - people might waste 30 minutes listening to a meaningless AI-generated podcast, or worse, absorb and believe misleading or outright harmful information.

[0] 15,000+ AI-generated fake podcasts https://www.kaggle.com/datasets/listennotes/ai-generated-fak...
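
For illustration only, here is a minimal sketch of the kind of detector meant above: MFCC summary features plus logistic regression over clips labeled AI-generated vs. human. The file list is a placeholder you would assemble yourself (e.g., from a dataset like [0] plus human-recorded episodes), and a production detector would need far stronger features and models.

    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def features(path: str) -> np.ndarray:
        # Summarize a clip as per-coefficient MFCC mean and std over time.
        y, sr = librosa.load(path, sr=16000, mono=True, duration=30.0)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # Placeholder (path, label) pairs: 1 = AI-generated, 0 = human.
    dataset = [("clips/ai_0001.mp3", 1), ("clips/human_0001.mp3", 0)]  # ...

    X = np.stack([features(p) for p, _ in dataset])
    y = np.array([label for _, label in dataset])

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))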


Given there is "misleading or outright harmful" information generated by humans, why is it more pressing that we track such content generated by AI?


I suppose efficiency? It's easier to filter out petabytes of AI slop than to determine which human-generated content is harmful.


I miss the good old days of telnet BBSes and newsgroups :)


Earlier this year, we at Listen Notes switched to Better Stack [0], replacing both Datadog and PagerDuty, and we couldn't be happier :) Datadog offers a rich set of features, and as a public company it makes sense for them to keep expanding their product and pushing larger contracts. But as a small team, we don't have a strong need for constant new features. By switching to Better Stack, we cut our monitoring and alerting costs by 90%, while keeping essentially the same functionality we had been using in Datadog.

[0] https://www.listennotes.com/blog/use-betterstack-to-replace-...
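
(Not from the post above, just to illustrate the simplest piece of alerting we rely on: heartbeat-style checks, where a scheduled job pings a monitoring URL after each successful run and the service alerts when pings stop arriving. The URL here is a made-up placeholder; the real one comes from the monitoring provider's dashboard.)

    import urllib.request

    # Hypothetical placeholder; use the heartbeat URL from your monitoring dashboard.
    HEARTBEAT_URL = "https://example-monitoring.invalid/heartbeat/abc123"

    def run_nightly_job() -> None:
        ...  # the actual work (backups, reindexing, etc.)

    if __name__ == "__main__":
        run_nightly_job()
        # Only report success if the job finished without raising.
        urllib.request.urlopen(HEARTBEAT_URL, timeout=10)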



Let me guess - the keyword here is "Section 174", just from the title alone :)

Dealing with Section 174 amortization in those first one to three years is a real headache (and your tax bill ends up higher than if it didn't apply). Once your startup survives the first few years under Section 174, things do get easier... but, sadly, most don't make it that far.
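
A back-of-the-envelope illustration of why year one stings (made-up numbers, not tax advice): domestic R&E is amortized over five years starting at the year's midpoint, so only 10% of it is deductible in the first year.

    revenue = 1_000_000
    r_and_e = 800_000  # e.g., engineer salaries treated as R&E

    # Old treatment: fully deductible in the year incurred.
    old_taxable_income = revenue - r_and_e            # 200,000

    # Current Section 174, year 1: half-year of a 5-year schedule = 10%.
    year1_deduction = r_and_e * 0.10                  # 80,000
    new_taxable_income = revenue - year1_deduction    # 920,000

    print(old_taxable_income, new_taxable_income)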

