
The idea is pretty cool, but it doesn't work super well.
1. I imagine most major news outlets don't have RSS feeds these days.
2. A lot of stuff originates from news agencies, so it doesn't spread from website to website but radiates out from the agency.
3. Most of the included sources are pretty small. To draw meaningful conclusions we would need information like popularity, political leaning, nation of origin, etc.
4. The similarity check doesn't appear to do translation, so when news spreads from one country to another we lose the thread.




Yes. For example, this story about Ukraine [1] is credited to WNYT as the first source, but the story itself credits the Associated Press. This problem is worth solving because it's something search engines should be doing.

[1] https://wnyt.com/ap-top-news/rubio-says-us-ukraine-talks-on-...


Yeah, what I'm currently doing is a pretty simple check on the published-at date from the RSS feed (with some small validation checks)... but it's causing issues because that date can be wrong and mess everything up...
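A minimal sketch of that kind of check, assuming RSS 2.0 `<pubDate>` strings in RFC 822 format and a crawler that records its own timezone-aware fetch time. The function name `parse_pub_date` and the tolerances `MAX_FUTURE_SKEW` and `MAX_AGE` are hypothetical; treating a date with no zone as UTC is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Hypothetical tolerances; tune for your own crawler.
MAX_FUTURE_SKEW = timedelta(minutes=10)
MAX_AGE = timedelta(days=7)

def parse_pub_date(raw: Optional[str], fetched_at: datetime) -> Optional[datetime]:
    """Parse an RSS 2.0 <pubDate> and reject implausible values.

    `fetched_at` must be timezone-aware. Returns a UTC datetime, or None if
    the date is missing, unparseable, in the future, or too old.
    """
    if not raw:
        return None
    try:
        dt = parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):  # older Pythons raise TypeError on garbage
        return None
    if dt.tzinfo is None:
        # No usable zone in the string: assume UTC (an assumption).
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    if dt > fetched_at + MAX_FUTURE_SKEW:
        return None  # claims to be published after we fetched it
    if dt < fetched_at - MAX_AGE:
        return None  # stale or misdated item
    return dt
```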

I think checking the source credited in the story itself is the next step...
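One possible shape for that step, as a sketch: scan the top of the article body for a wire-service credit such as the "(AP)" dateline convention. The agency list, the regexes, the 600-character window, and the name `credited_wire` are illustrative assumptions, not an exhaustive or tested rule set.

```python
import re
from typing import Optional

# Hypothetical, non-exhaustive map of wire-service credit patterns.
WIRE_PATTERNS = {
    "Associated Press": re.compile(r"\(AP\)|\bAssociated Press\b"),
    "Reuters": re.compile(r"\bReuters\b"),
    "AFP": re.compile(r"\(AFP\)|\bAgence France-Presse\b"),
}

def credited_wire(text: str, lead_chars: int = 600) -> Optional[str]:
    """Return the wire service credited earliest near the top of an article.

    Only the first `lead_chars` characters are scanned, since datelines and
    bylines usually sit at the start; a mention deeper in the body is more
    likely a reference than a credit.
    """
    lead = text[:lead_chars]
    best, best_pos = None, len(lead) + 1
    for agency, pattern in WIRE_PATTERNS.items():
        m = pattern.search(lead)
        if m and m.start() < best_pos:
            best, best_pos = agency, m.start()
    return best
```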


Treating the Associated Press as a special case might be worthwhile. Its stories will appear in hundreds of places, some with a little alteration and some fully intact.
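A sketch of one way to do that special-casing, assuming plain article text is available: compare word shingles (overlapping 5-word windows) with Jaccard similarity, so lightly altered and fully intact copies of the same wire story land in one group. The `k=5` window, the 0.6 threshold, and the helper names are hypothetical and would need tuning.

```python
import re

def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Set of k-word shingles (overlapping word windows) from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_syndicated_copy(text_a: str, text_b: str, threshold: float = 0.6) -> bool:
    """Treat two articles as copies of one wire story when their shingle
    sets overlap heavily, even if one of them was lightly edited."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

Within such a group, credit could then go to the agency named in the text rather than to whichever copy happens to have the earliest feed timestamp.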

The devil really is always in the details.

Being consistent in message framing even when it's not in the public's best interest should not reasonably be considered "news" =3

https://en.wikipedia.org/wiki/Sinclair_Broadcast_Group

https://www.youtube.com/watch?v=GvtNyOzGogc


Yeah, not all major outlets have RSS feeds, but it seems like the majority still do.

No translation yet.

I think the biggest problem is that I'm relying too much on the published date from the news source itself, and it's wrong sometimes... not super often, but if 1 out of 100 sources gets it wrong, it can steal credit for being the source article when it's not.
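One possible mitigation, sketched under assumptions: record when the crawler first saw each item and only trust a publisher's claimed date when it is consistent with that observation. `Sighting`, `effective_time`, `credited_origin`, and the two-hour `MAX_LEAD` window are hypothetical names and values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Sighting:
    source: str
    claimed: Optional[datetime]  # publisher's own date from the feed (may be wrong)
    first_seen: datetime         # when our crawler first saw the item

# Hypothetical bound: how far before first_seen a claimed date may plausibly be,
# e.g. roughly one polling interval plus some publisher-side delay.
MAX_LEAD = timedelta(hours=2)

def effective_time(s: Sighting) -> datetime:
    """Use the claimed date only when it is consistent with what we observed."""
    if s.claimed is None:
        return s.first_seen
    if s.claimed > s.first_seen:
        return s.first_seen  # can't have been published after we saw it
    if s.first_seen - s.claimed > MAX_LEAD:
        return s.first_seen  # claims to be much older than our observation
    return s.claimed

def credited_origin(cluster: list[Sighting]) -> Sighting:
    """Pick the earliest sighting by effective time rather than raw claimed date."""
    return min(cluster, key=effective_time)
```

The trade-off is that a source whose feed was polled late (newly added, or temporarily unreachable) loses credit it may have deserved, so `MAX_LEAD` should be at least the polling interval.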


Also, not all information spreads through public channels, and might not even be/become publicly known. But that doesn't mean news refraction based on textual similarity isn't worthwhile to pursue, as it can reveal a lot about the self-organising principles by which the media operate.

> the similarity check doesn't appear to do translation

This surprises me. The system is based on embeddings. AFAIK embeddings cluster the same concept in different languages in roughly the same place? Maybe it depends on the model (or maybe it's not exact and the clustering cutoff loses it).
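To illustrate how a clustering cutoff could lose cross-language matches, here is a toy sketch with made-up 3-dimensional vectors standing in for real embeddings; whether a particular model actually places a story and its translation this close (or closer) is model-dependent and untested here. The greedy single-pass clustering and the cutoff values are assumptions, not the system's actual method.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def threshold_clusters(vectors: list[list[float]], cutoff: float) -> list[list[int]]:
    """Greedy single-pass clustering: each vector joins the first cluster whose
    seed (first member) it matches at >= cutoff, else it starts a new cluster."""
    clusters: list[list[int]] = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, vectors[members[0]]) >= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

en = [1.0, 0.0, 0.0]     # hypothetical embedding of an English story
de = [0.8, 0.6, 0.0]     # hypothetical embedding of its German translation
other = [0.0, 0.0, 1.0]  # an unrelated story
print(threshold_clusters([en, de, other], cutoff=0.9))   # [[0], [1], [2]]
print(threshold_clusters([en, de, other], cutoff=0.75))  # [[0, 1], [2]]
```

With a cutoff of 0.9 the English and German versions (cosine similarity 0.8) end up in separate clusters; at 0.75 they merge, at the cost of admitting more loosely related stories elsewhere.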


I'm basically throwing away non-English articles for now... I'll probably get them in later, but I want to get English right first before trying to move to other languages...

The embeddings themselves will probably cluster OK across languages (but I have not tested this yet).


> I imagine most major news outlets don't have RSS feeds these days

I’m not aware of any that don’t. RSS is alive and well.
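For checking that claim site by site: many sites advertise their feed through RSS autodiscovery, a `<link rel="alternate" type="application/rss+xml">` tag in the page head. A minimal stdlib-only sketch follows; fetching the HTML is left out, and `FeedLinkFinder` and `find_feeds` are hypothetical helper names.

```python
from html.parser import HTMLParser
from typing import Optional

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" type="...+xml">."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}  # HTMLParser lowercases attr names
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])

def find_feeds(html_text: str) -> list[str]:
    """Return every advertised feed href, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.feeds
```

A missing tag doesn't prove there is no feed, and `href` values can be relative, so they would need resolving with `urllib.parse.urljoin` against the page URL.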



