
Returning the “killed” RSS of Reuters from the dead - artembugara
https://codarium.substack.com/p/returning-the-killed-rss-of-reuters
======
k1m
I work on a project which helps you produce RSS feeds from web pages that
don't offer their own using the page URL as input and simple selectors to
identify the web page elements to be used in the feed.

Here's what it can produce for the Reuters World News mobile page:

[https://createfeed.fivefilters.org/index.php?url=https%3A%2F...](https://createfeed.fivefilters.org/index.php?url=https%3A%2F%2Fmobile.reuters.com%2Fnews%2Fworld&in_id_or_class=article-heading)

Here's what it can do for the main site:

[http://createfeed.fivefilters.org/index.php?url=https%3A%2F%...](http://createfeed.fivefilters.org/index.php?url=https%3A%2F%2Fwww.reuters.com%2Fnews%2Fworld&item=.story-content&item_desc=p)

The downside is that certain changes to the HTML structure (e.g. the site
renaming or removing class attribute values used as selectors) could cause the
feeds to break.
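
For readers curious how selector-based feed generation can work in principle, here is a minimal sketch. It is not the fivefilters implementation: the `ClassTextExtractor` and `build_rss` names are hypothetical, and it assumes headlines can be found by a single class name such as `article-heading` (the value used in the first link above).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0     # > 0 while inside a matching element
        self.items = []    # one cleaned text snippet per matching element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a match
        elif self.wanted in classes:
            self.depth = 1           # entering a new match
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace before storing the headline.
            self.items.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def build_rss(title, link, headlines):
    """Wrap the extracted headlines in a bare-bones RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Generated from " + link
    for headline in headlines:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = headline
    return ET.tostring(rss, encoding="unicode")
```

The sketch also shows why such feeds are fragile: if the site renames the class, `items` simply comes back empty.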

~~~
simlevesque
As a dev I really like your website. Ultimate no bullshit product and I got
all the information I need in a single page. Congrats.

~~~
k1m
Thank you! Glad you liked it. :)

~~~
advaita
+1 to GP. Seeing this kind of product makes me happy in general. (Or at least
it appeals to my confirmation bias about how products should be built.)

FYI, you _probably_ have a typo on pricing page "Term Extraction" should be
"Term Extension"?

------
bscphil
A major problem with Reuters' RSS feeds while they lasted is that Reuters
pushes new URLs with updates to existing stories and kills or redirects the
previous URLs. So for a major developing story you'd see the same article in
your feed 5+ times, since the feed was just a dumb push of every URL added to
whatever category you were subscribed to. Still better than nothing, I guess.

The issue with this solution is that there doesn't seem to be any way to
specialize it at all. I was subscribed to Reuters' politics feed [1]
specifically, and I got other types of news from other sources. But I don't
see any way to do that with this method. The articles unfortunately do not
have the category in the URL.

[1]
[http://feeds.reuters.com/Reuters/PoliticsNews](http://feeds.reuters.com/Reuters/PoliticsNews)

~~~
Animats
Printing each story exactly once is tough. The RSS feed serial number thing
never worked. Sites with multiple RSS servers and a load balancer would return
a different serial number. I ended up taking the MD5 of the title and
description fields, with HTML markup deleted, and discarding new feed items
with a duplicated MD5.
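
A rough sketch of that kind of deduplication, for illustration only. The function names are hypothetical, the regex tag stripper is a simplification rather than a real HTML parser, and the whitespace normalization is an extra assumption not mentioned in the comment.

```python
import hashlib
import re

# Crude markup stripper: removes anything that looks like <...>.
TAG_RE = re.compile(r"<[^>]+>")


def item_fingerprint(title, description):
    """MD5 of title + description with HTML markup removed."""
    text = TAG_RE.sub("", title + "\n" + description)
    text = " ".join(text.split())  # assumption: ignore whitespace differences
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def drop_duplicates(items, seen):
    """Return only items whose fingerprint is new; `seen` is updated in place."""
    fresh = []
    for item in items:
        fp = item_fingerprint(item["title"], item["description"])
        if fp not in seen:
            seen.add(fp)
            fresh.append(item)
    return fresh
```

Keeping `seen` outside the function lets the same set persist across polls, so an item served again by a different backend server is still recognized.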

~~~
bscphil
That would be a start, but unfortunately Reuters frequently changes their
article titles too.

------
ekpyrotic
Artem, do you know if there is a parameter for ordering the RSS results by
time for the Google News RSS results? This is very helpful.

Finally, I pay $2000+ annually for a competitor to NewsCatcherAPI. It may be
worth connecting. I signed up for a trial earlier and the two issues for me
would be (i) the range and depth of publications and (ii) not being able to
track mentions / references in the body of the article.

I may not be your target audience but one of your competitors is pulling in
10m articles with the full article content per day. I use the API for timely
alerts for PR monitoring -- my priority is that I pick up mentions of
companies / individuals wherever they happen quickly.

Your pricing is a lot more competitive. I just wonder whether you are looking
to move in the direction of range and depth in the future, or whether you're
targeting a different market segment.

~~~
kashprime
What competitor is this, if you don't mind me asking? Could use a service like
that for my own project.

~~~
artembugara
Could you connect with me at artem@newscatcherapi.com?

------
hombre_fatal
Just curious, what are reasons businesses would pay for a NewsCatcher API
subscription? What are they using it for?

Fun API, though.

~~~
artembugara
Integrate some kind of news feed to their platforms.

Analyzing PR campaigns and the market.

Building their own news aggregators (usually theme specific).

~~~
greenice
Would I be able to build a niche topic focused site (e.g. video gaming news)
based on the NewsCatcher API?

~~~
artembugara
Yes, that is exactly how our service works.

Ping me at artem@newscatcherapi.com, or go to a live chat on our website

------
hadrien01
I have a question about NewsCatcherAPI: what is the legal framework around
indexing copyrighted articles and providing a paid API to search them?

Edit: particularly when Reuters' business is selling its newswire feed

~~~
artembugara
Short answer: it seems OK as long as you do not resell the full body text.

Long answer: Google does the same thing every day. Each country has its own
laws. Usually, news is in a special category that is less protected by
copyright.

~~~
cortesoft
Yeah, and Google was sued and settled by agreeing to pay a fee to the news
wire:

[https://www.reuters.com/article/us-google-afp/afp-google-new...](https://www.reuters.com/article/us-google-afp/afp-google-news-settle-lawsuit-over-google-news-idUSN0728115420070407)

~~~
hombre_fatal
To an organization in France. Doesn't sound like a concerning precedent for a
startup that doesn't have much in common with Google.

------
NmVJCo
I was worried they had removed it completely. I'm using this for a simple web
page which shows RSS feeds from multiple news sources across the political
spectrum, and the Reuters feeds have been the essential pivot for me. It is
the only source I significantly trust to be neutral in these trying times.

Thank you so incredibly much for this very simple solution. I was worried I'd
have to spend many hours on some complicated fix. I'm glad I no longer have to
(unless Google kills off their news RSS).

~~~
tracker1
I have to agree on Reuters... They seem to make the effort to keep the context
with unbiased framing.

------
Animats
I have an antique Teletype set up to print news from the Reuters news feed,
and it's stopped working because of this. So I tried this new approach via
Google. All you get is the RSS feed titles, not the content. The "description"
is just a link to content elsewhere, with the title as link text. The real
Reuters RSS feed had a few sentences of copy for each story, roughly what
radio stations would read.

Associated Press seems to have dropped their RSS feeds too, or hidden them
well.

The New York Times still has a usable RSS feed at:
[https://rss.nytimes.com/services/xml/rss/nyt/World.xml](https://rss.nytimes.com/services/xml/rss/nyt/World.xml)

This gets me, in Teletype format:

    
    
        CHINA SLAMS TRUMP OVER UIGHUR LAW AMID BOLTON ACCUSATIONS
        (JUNE 18TH, 7:16 PM)
        A NEW LAW AIMED AT PUNISHING CHINESE OFFICIALS INVOLVED IN MASS
        INTERNMENTS OF UIGHURS AND OTHER MINORITIES IN XINJIANG CAME AS
        JOHN BOLTON ACCUSED PRESIDENT TRUMP OF SUPPORTING BEIJING?S
        CRACKDOWN.

~~~
totetsu
Do you ever tear off the latest breaking headline and rush into another room
and slam it on a desk and say "you're gunna want to see this!"?

~~~
Animats
It's at home.[1] This was from my steampunk period.

[1]
[http://www.aetherltd.com/images/tty14ro/printerathome1.jpg](http://www.aetherltd.com/images/tty14ro/printerathome1.jpg)

------
antpls
I read RSS feeds with an app on Android called Aggregator. Reuters feeds
included a short description of the story inside the feed, but Google News
doesn't have them. The descriptions in the feed allowed me to precisely filter
and label the entries based on keywords.
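
A small sketch of what such keyword labelling might look like. The `label_entry` function and the rule format are hypothetical, not how Aggregator works. It also shows why a missing description hurts: there is simply less text to match against.

```python
def label_entry(entry, rules):
    """Return the sorted labels whose keywords appear in the title or description.

    `entry` is a dict with optional "title" and "description" keys;
    `rules` maps a label to a list of keywords (case-insensitive substrings).
    """
    text = (entry.get("title", "") + " " + entry.get("description", "")).lower()
    return sorted(
        label
        for label, keywords in rules.items()
        if any(keyword.lower() in text for keyword in keywords)
    )


rules = {"politics": ["election", "senate"], "tech": ["chip", "software"]}

with_description = {
    "title": "Lawmakers reach late-night deal",
    "description": "The Senate voted on new chip subsidies.",
}
title_only = {"title": "Lawmakers reach late-night deal"}

labels_full = label_entry(with_description, rules)   # matches via the description
labels_title = label_entry(title_only, rules)        # nothing left to match
```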

Anyway, I also use the Google News RSS trick described in the article, as
replacement for now. Not sure how long it will last, however.

------
_puk
I've always wondered how the copyright for the Google RSS works..

As far as I am aware it is an undocumented API that can be called without
authentication or acceptance of any EULA.

The response includes an explicit copyright field.

Does that mean the feed can not be reproduced? Can a derivative work be
created from it?

~~~
_curious_
"Can a derivative work be created from it?"

Grey area, lived in it for a while and made some money there, but at the end
of the day it depends on how the original creator feels about your derivative
work - or you personally :)

------
stijnsanders
Does this trick work for apnews.com as well?

~~~
banana_giraffe
Yep. It works for most news sites, it's just filtering the recent news from
Google News based off the URL. Oddly, it doesn't work for CNN. Never
understood why. Maybe "cnn" is a stop word to this search engine and ignored.
Dunno.

And, of course, it works today. It's a Google product, so enjoy it while it
lasts.
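
To make the idea concrete, here is a sketch of building such a feed URL for an arbitrary site. Everything about the URL is an assumption: the endpoint is undocumented, and the `when:` and `allinurl:` operators and the `hl`/`gl`/`ceid` parameters are taken from how the article's trick appears to work, so they may change or stop working. The `google_news_feed_url` helper is hypothetical.

```python
from urllib.parse import urlencode


def google_news_feed_url(site, window="24h"):
    """Build a Google News RSS search URL limited to URLs containing `site`.

    Assumed, undocumented query format: "when:<window> allinurl:<site>".
    """
    query = f"when:{window} allinurl:{site}"
    params = {"q": query, "hl": "en-US", "gl": "US", "ceid": "US:en"}
    return "https://news.google.com/rss/search?" + urlencode(params)


# Hypothetical usage for the site asked about above.
ap_feed = google_news_feed_url("apnews.com")
```

The resulting string can be pasted into any feed reader; whether a given site actually returns results has to be checked by hand.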

------
afrcnc
Brilliant. Thank you!

------
mobilio
Probably someone there read HN?

------
solarkraft
> Did it work? Consider subscribing to my newsletter to get more useful
> content like that. It’s free: (...) I am a co-founder of NewsCatcherAPI —
> ultra-fast API to find news articles by any topic, country, language,
> website, or keyword. ...

It looks like an unfortunately automatically placed ad, making me look for a
continuation of the actual content (a direct answer to the question) that never
came.

Weird writing style.

