Show HN: Python package to collect normalized news from almost any website (github.com/kotartemiy)
420 points by kizy25 11 months ago | 194 comments

> Programmatically collect normalized news from (almost) any website.

By this you mean ... Fetch RSS feeds from the websites that we've hardcoded into a Python library?

It seems like the work here was in collecting and cataloging a lot of RSS feeds. You don't seem to be able to arbitrarily "collect news from any website".
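To make the "hardcoded RSS feeds" reading concrete, here is a minimal sketch of what that approach could look like. It is not the package's actual code: the `FEEDS` catalog, the `normalize_rss` helper, and the example URLs are all hypothetical, and it uses only the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: site name -> RSS feed URL (placeholder, not a real entry).
FEEDS = {"example-news": "https://news.example.com/rss.xml"}


def normalize_rss(xml_text, source):
    """Turn one RSS 2.0 document into a list of uniform article dicts."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "source": source,
            # findtext returns None for a missing element, so fall back to "".
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return articles


# Inline sample so the sketch runs without network access.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title> Story one </title><link>https://news.example.com/1</link>
<pubDate>Mon, 01 Jun 2020 10:00:00 GMT</pubDate></item>
<item><title>Story two</title><link>https://news.example.com/2</link></item>
</channel></rss>"""

articles = normalize_rss(SAMPLE, "example-news")
```

With a real feed, the XML would come from something like `urllib.request.urlopen(FEEDS["example-news"]).read()`. Either way, coverage is limited to whatever is in the catalog, which is the point being made above.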

For any news junkies here, I've built https://maagnit.com which gathers both Left and Right leaning sources for any story and displays them all together on one page.

My approach has been, if we can't get neutral/objective coverage, getting comprehensive, 360-degree coverage is a good alternative.

These days, news bias is not only in the way a story is covered, but also which stories are covered. So, maagnit automatically collects stories from the left and right.

Seems like a terrible idea to me. The two sides are at war, and if the way you sample them is by going 50-50, that encourages them to be more extreme because that will skew the average in their direction.

We already see this in the media. You have the host and two people. Let's hear what person A says, let's hear what person B says. Who's right? Impossible to tell, otherwise you must be biased. And no mention that person A drew conclusions on claims which were mostly verifiable facts, whereas person B drew conclusions on entirely made up claims.

In fact, this is a known fallacy and has a name: argument to moderation.


Sometimes person A is 100% right and person B is 0% right. The thing is that to be able to tell the difference takes effort in educating yourself. And the minute you try to outsource this to someone else, they will use that trust against you. But calling it 50-50 is not a solution, it's not even an approximate solution.

To be honest I don't find "verifiable fact" to be worth much these days. If coronavirus has done anything it has highlighted how easy it is to do wrong.

Journalists and politicians alike are drawing incorrect insights from data.

Meanwhile so little of the data surrounding coronavirus is comparable. You have countries only recording cases if the patient was hospitalised (not if they tested positive) and cause of death reporting is a minefield.

So "verifiable fact" is just as easily weaponised / politicised as bullshit. At least bullshit is easier to debunk...

We had this almighty Imperial Model projecting 500k dead meanwhile Sweden just did their own thing.

I don't really have a leaning to the policy setting but it feels that once you wrap some information up with numbers, maths and scientific authority you can parade subjectivity as fact anyway too.

I guess I'm a little jaded after seeing politics and business pedestalise spurious data which really amounts to numerology when you start peeling back the layers and thinking critically.

> meanwhile Sweden just did their own thing

I'm not sure if you are just throwing this out there or if your intent is to hold up Sweden as an example that contradicted expected results with respect to models/social distancing requirements/etc.

If the latter, then I think Sweden is not a good example of such a contradiction: with comparably minor-to-moderate social distancing, they have roughly 2.5x the cases per capita of their two neighbors Finland and Norway, and roughly 6x to 7x the deaths per capita.

As you said, the data from country to country is hard to compare, but at the very least Sweden should not be the poster-child for low social-distancing mandates.

The point of the early models, which were predicated on no behavior change, was to change behavior. And they certainly did. Well before the official stay-at-home orders, people here in SF were starting to act differently. And when those orders came, people took them seriously. So I don't think we can blame early models for doing exactly what they were supposed to do, which was telling us what could happen if we didn't take the disease seriously.

Antibody studies from Spain and elsewhere show only ~5% of population of these 'hard-hit' nations has had the virus. This confirms a mortality rate of about 1% when the national epidemic has a bad first wave but limited medical system overwhelm, followed by a severe lockdown.

Best guess given herd immunity kicks in at about 85% of pop infected is about 260,000 deaths with a 1% mortality rate. With these figures 500,000 deaths is a 1.9% mortality rate - high but still possible.

Contextualise everything with this 5% figure - the US, and everywhere else, is much closer to the beginning of this than to the end.
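The arithmetic behind the 260,000 and 500,000 figures can be checked with a short sketch. The comment does not name a population, so the code backs one out from its own numbers; the function names are made up for illustration.

```python
def implied_population(deaths, infected_share, mortality_rate):
    """Population consistent with a death toll, infected share, and mortality rate."""
    return deaths / (infected_share * mortality_rate)


def implied_mortality(deaths, population, infected_share):
    """Mortality rate among the infected that would produce a given death toll."""
    return deaths / (population * infected_share)


HERD_SHARE = 0.85  # share of the population infected, as assumed in the comment

# 260,000 deaths at 1% mortality and 85% infected implies this population.
population = implied_population(260_000, HERD_SHARE, 0.01)

# Holding that population fixed, what mortality rate gives 500,000 deaths?
rate_for_500k = implied_mortality(500_000, population, HERD_SHARE)

print(f"implied population: {population:,.0f}")
print(f"mortality rate for 500,000 deaths: {rate_for_500k:.1%}")
```

The second figure comes out at about 1.9%, matching the comment, and the implied population is a little over 30 million. That is only what the stated numbers require, not a figure the comment itself gives.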

You say that verifiable facts don't count for much and then as an example point to a model? A model isn't a fact.

That someone made assumptions and drew conclusions might be a fact. But the assumptions might or might not be facts, and the conclusions certainly aren't facts. As a general rule, if it's in the future, it's not a fact.

You're completely right - but I guess my point is that anything that has a number attached to it is often treated as fact in the media.

In the very strictest sense of verifiable fact - boolean true/false statements with irrefutable proof, then yes I think fact trumps all.

Unfortunately I think that in reality, very little of what we think of as fact falls under that very narrow definition. And my example was merely trying to illustrate that the entire field of medical statistics and epidemiology is filled with pitfalls, given that many of the numbers that are reported themselves have a tonne of nuance and context baked into them.

And I suppose any discussion which focuses solely on the numbers associated with the current pandemic is going to be extremely lossy as the numbers are a very low fidelity expression of the "truth" (in the idealistic sense).

What does this have to do with facts? I guess I feel that culturally we attribute the trust to things we consider "facts" at quite a superficial level without digging deeper.

Just like bias by choosing which stories to cover, you can choose which facts to confirm and debunk. Snopes used to be where I went to debunk urban legends. It seems they’ve taken a more political bent now. Their fact verification is still true, but it does seem like they fact check with an agenda now.

That's like saying you should never collect opinions from more than one person on a topic.

If you read garbage publications, of course you will get garbage arguments and opinions.

US news channels are a horrible example, they pick someone to represent the most extreme views and have them fight on TV to entertain viewers.

But reading two articles on the same topic by two sources on different ends of the political spectrum, both of decent quality, can be very enlightening. (emphasis on quality)

This reminds me, how does a fallacy get discovered or made official? I mean, I followed the link above and saw it has a Latin name and everything, yet it feels very new and contemporary. Not that I disagree, except that to avoid this fallacy, the 'verifiable facts' have to be known by all in advance.

Traditionally, they come from ancient Greek philosophers. This one seems to be a modern variant of https://en.m.wikipedia.org/wiki/Sorites_paradox so I guess it's not "official".

Seems like a perfectly good idea to me; I doubt one website permitting comparison is going to suddenly instigate a dialogue extremism arms-race (more than normal). In fact if anything it's bringing us closer to the original idea of debate rather than echo chambers.

And it seems like your argument falls under plenty of its own fallacies.

Perfect solution fallacy https://en.wikipedia.org/wiki/Nirvana_fallacy

And some might argue straw-man fallacy given that OP never suggested that truth would lie 50-50 between the two.

Debate allows us to see two opposing views and draw our own conclusions; not perfect but a whole lot better than nothing.

Educating yourself about every topic out there is absolutely impossible. That's precisely why news businesses exist in the first place.

> That's precisely why news businesses exist in the first place.

Nah. My view is that news exists for entertainment. That might not be its stated goal, but it's why people consume it.

I do not claim they educate, but that they benefit from mainstream ignorance about a vast variety of topics.

News categorically doesn't educate.

The Guardian (yes, ironically a provider of "News") published an extract from the writings of the author of "The Art of Thinking Clearly", with some decent enough bullet points on this that are difficult to argue with:


I've always found myself to very much agree with Jefferson's views [1] on the topic of news, and I agree with this piece as well. I observe in myself that I'm generally happier and more productive the less I expose myself to "the news".

I think there's a threshold of distance and size where it stops being useful to the average person. Too far away? Can't do anything about it? Better not weigh down your mind by getting stressed about it.

[1] http://press-pubs.uchicago.edu/founders/documents/amendI_spe...

And the better alternative is...? What you say is correct but also worthless unless you provide a better way to do it.

> And the better alternative is...?

Actively researching both sources and stories, and not being a passive consumer who expects someone else to spoon-feed you a solution.

You have the time to dredge up sources and check their veracity every time you read an article? Often sources are private but verified by the journalist. This is why journalism exists and has a code of ethics. You can’t just look every claim up on Wikipedia.

I'm talking about your sources (i.e., media outlets), not an article's sources.

Definitely not worthless. Also OP makes a good point at the end:

> The thing is that to be able to tell the difference takes effort in educating yourself.

When I was a kid CNN used to do this: they would invite a single climate scientist and a single climate change skeptic and have them make their points on equal time. CNN would then say “who’s right? You decide.” And the program would end.

This was extremely harmful to the overall mission of informing people and created a false balance between sides which are not equally valid. Today, this is almost universally seen as a failure of journalism.

Given the lessons learned, I wouldn’t be so quick to replicate that experience with a website.

Sorry, I have to respectfully disagree. This is the model that PBS Nightly News takes: let both sides of an issue talk and treat everyone respectfully.

Compare this to MSNBC and Fox News whose business models are really the same: cater to their viewers’ dogma/bubble and reinforce biases, and don’t try to introduce new ideas because it is bad for their business.

I don’t live in the US, so please take what I’m saying with a grain of salt. I imagine you are a smart and decent person on the other side of this screen.

Usually print media is considered higher quality journalism than TV media. There are numerous studies that show reading print media leads to people being more informed than just watching TV news. Whether this is correlation or causation I can’t really say.

My understanding of both Fox and MSNBC is they are entertainment programs, not so much news programs. I’m not sure it’s fair to compare those to anything in print, nor do I think it’s entirely possible.

A good news article might present both sides of a story, but try to fact-check claims by both sides, and bring in supporting evidence. That’s the key differentiating factor of quality journalism - helping bring facts to the forefront rather than let people on either side spin the facts for their purposes.

Print media is akin to software released on DVDs - a lot more testing (fact checking), approvals (Editorial decisions) need to happen due to the perceived finality of the medium.

TV on the other hand is like SaaS - continuous integration and deployment. Release first and apologize later if needed.

I think you've got a witty metaphor here, but after mulling it over:

Doesn't CI rely on robust testing to ensure buggy software doesn't make it into production? I'm not sure the fire-and-forget of the 24/7 news cycle lends itself to a comparison to continuous deployment, as there is a strong bias to 'deliver now' given competition to break the news first, and it's hard to build in 'automated testing' (vetting?) of content.

CI doesn't imply CI with testing.

I think the world would be better off if no one ever referred to anything they watched on TV as “news” any more and kept it that news is something you read that can be cited and referenced.

>> “ Fox and MSNBC is they are entertainment programs”

No, maybe to you, but to vast amounts of people, millions of people, they watch a singular source for news and to them, that is the news.

In the US, since you’re referring to them, the news legally used to have to allow all sides of an issue to present their views. In the 1980s, the laws changed, and that is how news became so one-sided.

They are news entertainment not just to him, pretty much all the mainstream sources are news entertainment. They're not objective. This argument seems like calling breakfast cereal breakfast whereas it's marketed as part of a complete breakfast. The differences are subtle, like the WHO is a political organization that covers medical topics. A decent write up of the broadcast transition: https://www.medialit.org/reading-room/whatever-happened-news

I would argue mainstream media is literally propaganda, or at least the US government is legally allowed to broadcast propaganda [1] — either directly via direct government intervention— or like what is now known to be the case in Turkey, for example, where parties buy out the media to gain favor from a given political group.

My point is that the majority of people actually watching these news outlets see it as news, not entertainment or propaganda.

[1] https://en.wikipedia.org/wiki/Voice_of_America#Smith–Mundt_A...

—- EDIT: Reviewed your CML link from spring of 1990, interesting read, but really fails to do critical analysis of the specific regulatory changes by name, what impact changes had, etc. Beyond that, stating obvious, it fails to cover all the material events since 1990.

Ah, thanks for the clarification. I see and completely agree with your points. Everybody believing something that's wrong doesn't make it right, critical thinking is down and I believe that there's far too many distractions for people to stay on top of things (it's a full time job following politics).

+1 thanks for the great reply.

As misleading as a particular spin of a story might be, those perspectives are still valid in the sense they connect with what is important to their readers.

I'd like to see a news app that has the facts and sensationalism separated, and then a reference to a more relevant fact/story that relates more directly to what a particular spin was getting at. So I guess feed their bubble with more accurate stories.

I disagree, since these days MSM (and especially TV-based MSM) has taken it upon itself to tell readers what should be important to them.

The problem OP describes is real though.

The question is 'How far can you go?'

For example does it make sense to present advocates for science and pseudo-science on the same level, with equal weight?

Putting a Biologist who explains Evolution on the same pedestal on equal time as a Priest who believes in Intelligent Design creates a false sense of balance between these viewpoints, even though these viewpoints are fundamentally different in their very basic quality.

You can spin this further to illustrate this point: racism, religion, false promises, lies, mental illnesses.

We can spin it the other way too: Should people be fed only with information approved by some TV boards or government committees? Are we adults expected to vote, but not allowed and/or trusted anymore to make up our own minds about what we believe in?

And even more importantly, do you realize that in such a world it's far more likely that only a priest will get invited to a TV show, not the biologist? Because that's the "how far" that I'm afraid of, fundamentalism that believes that the only way is to force "the truth" upon people, and it's of course always their truth.

There is a middle ground between presenting one-and-only-one viewpoint or presenting every possible viewpoint with implied equal validity.

For example: Given two commentators, if one is arguing that TV boards and government committees are supposed to be populated by and representative of the general population and the other arguing that TV boards and government committees are populated by crab people secretly supporting the illuminati; the discussion should be about how effectively TV boards and government committees are being filled by the general population and how effective they are at articulating societal consensus related to important issues.

In the "gotta hear both sides" world, rational discussion is dragged off the table by the notion that every issue has two equally valid sides no matter how absurd one "side" might be.

I think that crazy stuff is taking over the TV more because it's purposefully pushed by TV stations in a run to compete with all the reality shows than because of giving voice to both sides. Just look at the History channel that has nothing to do with history anymore, but it's all about aliens, Illuminati, Masons, Hitler's secret weapons (actually that one was interesting), etc. It's a lite fantasy entertainment that is cheap to make and it sells well, TV equivalent of pulp fiction, and channels are pushing it very intentionally.

I agree with your sentiment. There are two sides/extremes to this, both of which are undesirable or even harmful.

But we're revealing a more fundamental issue here: Where are the checks and balances of the media?

As it stands now, we've got two possible solutions:

1. Censorship, which I find dangerous and shortsighted.

2. The media itself, which results in a 'who yells the loudest' kind of culture.

This issue bleeds into all kinds of problems with propaganda, advertising, fake-news, bias, ideology, lies, scams, click-bait and other kinds of bullshit.

In this discussion we've been talking as if there was some kind of sane regulation of these things. But there isn't really.

All of this stuff erodes trust and creates trenches. Sometimes it feels like it is getting worse. People have been talking about how Google Search is getting weaker on this site. There is so much more noise and bullshit today than there ever was, because we're accreting information w/o distinction.

There are projects which try to make fact-checking easier, for example, and new kinds of platforms and ideas to foster real debate. But those things are still on the fringe.

This is a massive, important and unsolved problem I think.

A number of European public broadcasters have some sort of balance rule along these lines. It can get pretty weird.

A memorable example, in Ireland, during the campaign for the referendum to legalise same-sex marriage, was a radio news article where they were talking to someone in the hotel industry about the potential impact on the industry from wedding bookings. Because of the balance rule, they then had to have on one of the three available complete lunatics they had for all discussions of same-sex marriage to rant about how the gays were going to transgender all the unborn babies or something (due to all political parties supporting the amendment, and the Catholic Church staying out of it, there were a very small number of public figures available to go on the radio/tv for 'balance', and most of them were conspiracy theorists).

Some broadcasters have recently changed the rules around certain subjects; for instance the BBC no longer invites climate change denialists or creationists for 'balance' in most contexts.

I'm uncertain how I feel about it. It can produce truly bizarre results, like the marriage one I mentioned, but it does seem useful in some cases. A major concern I'd have about it is that it certainly _is_ used to spread misinformation and conspiracy theories, especially where one side of the argument is "basically everyone" and the other side is a weird fringe, or where the natural proponents of one side are sitting it out (common on social issues in Ireland, where the Catholic Church is skittish about opposing liberalisation too visibly).

"Of course there are reasons why a gubernatorial election should not be decided by a ski race, but are there also reasons why it should? For the sake of fairness, we’ve brought in two experts with opposite opinions, who will now have equal time to just say those opinions, because that’s what news is."


I mean honestly I think the person advocating for the ski-race should probably get more time, since that's the stance I, and presumably the audience, am least familiar with.

There are some good faith and sound mind tests that should be applied here but if someone wants to genuinely make a case for choosing a political leader via athletic competition (which has no basis in history at all, nope) then why not hear them out? What are you afraid of?

Honestly I'm not convinced. What approach is better then, to invite only the side that you like? If you invite only one side you'll get to hear only one side of the story, it's that simple - and there are far too many subjects that are genuinely still worth a debate - it's not all about climate change and 5G conspiracies. Journalists are just not proficient enough in the various subjects discussed on TV to be able to counter their guests' theories in a live talk show. So you invite the expert from the other side and expect your guests to do the live fact-checking of the other side's claims for you. It works very well for politicians during the election campaigns, why wouldn't it work for science or any other subject?

It is definitely possible to have reasonable debate where both sides are presented equally.

But ruling out absurd ideas is incredibly important.

If you want to present a discussion on "Selecting sides for the news: How much restriction is too much?" Should the sides be "Industry consensus vs societal consensus" or should you make sure that the "Any censorship is intellectual theft and every news outlet must comply with individual requests for people to state their positions on the topic" side is treated the same as the first two?

The approach needs follow through. If you have both sides just say their sides, and not have arbiters that probe each side, then it’s pointless.

Have them both on, and then probe, and even make a decision. The viewers can still choose, but it is much more of an investigation than politely taking turns.

The current status quo on the internet is infinitely worse than that though.

The internet and social media allow you to ignore the correct conclusion because it makes a person uncomfortable with the idea that they could be wrong when everything they read reinforces that incorrect conclusion.

Hard to say.

By far the worst actor I've seen is (old media) Rush Limbaugh. It's hard to believe that Rush could believe that America faces any threats (e.g. war, crime, economic, environmental) other than the "Democrat party." You'd imagine that he wants to see a system more like Japan (liberalism with one party) or China (one party, no rights, no dissent.)

More generally I see the problem of "spamming the agenda", combined with the "if it bleeds it leads" tendency of the media.

Before Trump ran for president CNN's ratings were in the toilet, and it lumbered on between school shootings and MH370. To report on things like that you might have to take the Redeye from Atlanta to a flyover state, but today you can sit your ass in Atlanta or Washington DC and just cover 'what trump said', and 'what somebody said about what trump said'.

Since the election, the news has seemed to be "broken" in the way that a "clock" breaks. Arstechnica's Dealmaster always has a sale on Amazon Fire devices, Anker power banks, Xbox Gold, and whatever craptop Lenovo has a surplus of. The banner on CNN often seems to go unchanged for weeks. (e.g. "TRUMP WANTED TO FIRE MUELLER")

COVID-19 was the first real news in a long time, but there is an obsession about "what trump said" compared to the other 300 million Americans. Trump doesn't really care if it is good or bad news, he just wants to be in the news, and if he held a press conference and nobody came, that would bring him to tears.

Similarly there is always some article pushed by a right wing group that says there is too much occupational licensing in the U.S. (maybe true, but the same editorial turns up every month as if it was fresh) or that there is a trade-off between economic efficiency and inequality. (e.g. the evidence is that if you have too much inequality people tilt the scales, rich people buy T-bills depressing interest rates, and try to keep the action for themselves and their children... One of those reasons why Stalin invented the Purge)

> This was extremely harmful to the overall mission of informing people and created a false balance between sides which are not equally valid. Today, this is almost universally seen as a failure of journalism.

Is there data that show CNN debates on climate change correlate with a decrease in belief in climate change? This sounds like folklore rather than reality to me, but I could be wrong.

You're questioning the conclusion rather than the premise(s). It's probably more helpful to look at the premises rather than extrapolating into an argument about the impacts of a particular debate style on popular opinion.

I didn't want to extend his comment to saying more than it was, or debate his beliefs on what good news is. That's an ethical debate and is always situational on what the news is about, who's watching the news, and when they're watching the news.

I'm amazed how many people responded to you with this broken idea of "let's hear everybody out."

Broken because the talent of listeners to fully evaluate every statement is not distributed evenly.

Germany and Europe learned this the very hard way after WWI and they drew consequences after WWII.

Yes, I fully agree with you: not all opinions are created equal. Because some are scientifically accepted while others are scientifically rejected. Some are philanthropic, some are antisocial.

And we have to take a stance if we want a more humane world, and media have to preselect if we don't want war.

Keep in mind in the old days people were probably arguing this exact point when saying we shouldn't invite people arguing to free slaves or let women into higher education, because "they are so obviously wrong it's harmful to even listen to them, African Americans do not have the mental acumen to survive in the wild, slavery is compassionate! You're arguing for their deaths!"

Anyone arguing for an authoritarian, dictator-style hard line of "DO NOT QUESTION THIS EVER" should remember that they might end up on the other side of the fence.

We should always listen to the other side. History has shown people can believe ridiculous things for a long time and treat it as common sense.

I am showing my age when I remember PBS having an entire show where two sides discussed an important issue of domestic or international concern. It was called "The Advocates"[0]. It ran for five years, and included a moderator and two teams that basically had a debate as a TV show. Michael Dukakis, who later was the Democratic nominee for president, was one of the moderators. (Dukakis' Wikipedia page has no mention that he was on national TV for years before running for president, which is pretty odd.) My understanding is that The Advocates was a result of the fairness doctrine, where television licensees had to show both sides of a subject. When the fairness doctrine went away, so did that show. At the end of every debate, viewers were encouraged to write in about which side they thought won. One reason I read for the demise of the show was that the letters received pretty consistently favored the right wing argument.

The modern spin on The Advocates is Intelligence Squared[2].

[0] http://openvault.wgbh.org/collections/advocates/full-program...

[1] https://www.imdb.com/title/tt11014804/

[2] https://www.intelligencesquaredus.org/

I used to watch some of these IQ debates in college, they can be really good.

The longer format and strong moderation really lend themselves to interesting discussions, and you can learn a lot. One of my favourite debates of this format (though not IQ) is Peter Thiel vs Eric Schmidt (then executive chairman of Google) arguing about whether Google was still capable of innovation [2012]:


This is literally what news is supposed to do: that is, enable multiple unbiased perspectives on a topic and, if necessary, give equal time to biased, opposing views.

Otherwise you end up with the “ministry of truth” — and we all know how that worked out.

Are you implying there is only one side one must be allowed to take on climate change?

Are you implying that "climate change is real" and "climate change is a hoax" must be treated as two equally valid opposing sides?

There are many things in between your 2 options...

People can have different opinions on how much of it is caused by people.

For example, moving to biofuels is being considered a big fuckup, and this was done by the biggest supporters of imminent climate catastrophe.

There are reasons it is no longer called global warming but climate change.

More progressive people tend to move faster to progress, but at the same time they make more errors while doing it.

Thanks for posting this. Mainly because of the discussion it brought.

Basically: airing of unscrutinised claims is propaganda, not news, even if done with both sides.

I wish we had more of this going on right now. TV now is pretty much "spoonfeed nation" and reality.

I would prefer to be persuaded than told what to think. Otherwise, journalists can tell us whatever they want to without any accountability.

> Today, this is almost universally seen as a failure of journalism.

That is a shame.

The problem is that for many people, someone appearing persuasive is orthogonal to them actually being correct.

Unfortunately, the alternative to persuasion is belief and trust.

I wonder if it’s interesting to collect a ton of articles and compare them for similarity and where they deviate. So you get an article that shows agreement and deviation.
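One rough way to prototype that idea is sentence-level fuzzy matching with the standard library. This is only a sketch: the `compare_articles` helper, the naive sentence splitter, and the 0.8 threshold are all assumptions, and real articles would need a proper tokenizer and probably semantic rather than character-level similarity.

```python
from difflib import SequenceMatcher


def split_sentences(text):
    # Naive splitter: breaks on periods only.
    return [s.strip() for s in text.split(".") if s.strip()]


def compare_articles(a, b, threshold=0.8):
    """Return (agreements, only_in_a, only_in_b) at sentence level."""
    sents_a, sents_b = split_sentences(a), split_sentences(b)
    agreements, only_a, matched_b = [], [], set()
    for sa in sents_a:
        best_j, best_score = None, 0.0
        for j, sb in enumerate(sents_b):
            if j in matched_b:
                continue  # each sentence in b can be paired at most once
            score = SequenceMatcher(None, sa.lower(), sb.lower()).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            agreements.append((sa, sents_b[best_j]))
            matched_b.add(best_j)
        else:
            only_a.append(sa)
    only_b = [sb for j, sb in enumerate(sents_b) if j not in matched_b]
    return agreements, only_a, only_b


left = "The council approved the budget on Monday. Critics called the plan reckless."
right = "The council approved the budget on Monday. Turnout at the meeting was low."
agreements, only_left, only_right = compare_articles(left, right)
```

Here the shared opening sentence lands in `agreements`, while the sentence unique to each article ends up in `only_left` and `only_right`, which is roughly the "agreement and deviation" view described above.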


> Go fuck yourself, and learn how to think

Ouch. Accounts that post like this get banned here regardless of how wrong someone else is or you feel they are. Please review https://news.ycombinator.com/newsguidelines.html and please don't ever do this here again. (We had to warn you about personal attacks once before.)

So they present the facts and let them speak for themselves? Sounds like they had it right. Are you saying if they don't present their opinion it's a failure of journalism? I think this new way of defining journalism is a bad standard.

Picking two people on the opposite extremes of a topic with a moderator to stop the conversation, often unsuccessfully, from devolving into a shouting match is not the same thing as presenting the facts and letting them speak for themselves. In fact a 10-minute TV segment is not a great venue for any complex presentation of views.

The level of knowledge needed to adequately parse an expert's opinion, evaluate it, check its assumptions, research the evidence supporting those assumptions, follow up with research on the validity of that research... well, it takes a lot longer than a TV segment. In a TV segment like this, the "expert" with the better grasp on rhetoric and rhetorical devices "wins" in terms of audience agreement, and factual & verifiable basis of opinions is a distant second in terms of influence on audience agreement.

Or strike that: it's probably a distant 3rd: 1st place influence is whether or not it agrees with a viewer's current opinion. 2nd would be rhetorical ability, 3rd would be any actual evidence.

So, by all means let the facts speak for themselves-- just don't let yourself believe that is readily possible in a TV segment like those referenced.

People don’t know what is fact and what is fiction. Giving 50/50 air time to climate scientists and climate change deniers is on one side presenting facts and on the other total bullshit and saying “hey these are both valid opinions- you decide”.

This isn't climate scientists vs deniers; this is major news organisations on both sides, both of which are guilty of presenting total bullshit as facts. Allowing them to be viewed side-by-side permits users to get a better image of who is willing to mislead on which subjects and to get facts that have been omitted by the other side.

Just because one side might be more willing to mislead doesn't mean that their stance on a topic is more likely to be the incorrect one.

Showing which news sources employ the most underhanded rhetorical devices may be a positive goal in itself, but it doesn't, by itself, help the audience make their own determination on an issue. Even more of an issue is that a viewer's determination of which source is more willing to mislead or omit relevant details is much more likely to be influenced by prior opinion than by the content of either source.

Basically, the problem isn't, in itself, biased news sources; it's that the format is fundamentally ill-suited to giving individuals enough information to come to a reasonably well-supported position on just about any topic of moderate complexity. Further, take any topic that appears to be simple and scratch the surface a bit, and there's a decent chance it will turn out to be not so simple.

Respectfully; it does. Not 100%, but for the most part it does. The moment you have to lie to make your point you concede that your argument never had a grounding in reality.

Even if putting that aside, the utility of bringing to light underhand tactics isn't meant to be used in and by itself but instead serves as one of many aspects of debate to help decide what is right/wrong true/false.

Regarding the poorly suited nature of news for getting the full story across to the reader, I fully agree, but again, just because a tool isn't perfect doesn't mean it gets cast aside; more perspectives (and these are mainstream organisations) on a subject don't hurt at all.

I agree that the side that is more deceptive & manipulative in their persuasive tactics will tend to be the one with less potential substance; my point was only that such a scenario isn't necessarily the case. Even an "honest" person can find themselves coming to the correct conclusions through wrong, faulty reasoning. In such cases they are only accidentally correct. In the hands of someone who understands that facts don't win arguments, but nonetheless believes their "side" is correct, it is all too easy to justify sensational, emotional arguments, rhetorical flourishes, etc. in an "ends justify the means" sort of way.

I don't have an answer on the issue of news organizations being poorly suited here. On the one hand, there is an appeal to the idea you convey that something is better than nothing. However, that status quo is also what has led us to the current situation. There is a correlation between the rise of 24-hour news networks and the internet and the increasingly vitriolic, polarizing, and propagandist tone of things. The need to fill air time was a big part of that. I don't wholly think that was the cause. There was some trend in that direction already:

Note to readers: This next part is not intended to cast blame only in one direction. It is simply one concrete example of the type of things that became commonplace.

Around 1990 Newt Gingrich penned a memo titled "Language: A Key Mechanism for Control". It went on to explain how language could be used to manipulate people, complete with a guide for how to use demonizing, dehumanizing language against political opponents. Over the years it was systematically disseminated through his party, and when Newt became House Speaker around 1995 he literally made it required reading. Shortly after is around the time that the term "liberal" went from being a fairly neutral description like "conservative" to being a hated moniker for a political opponent. (Though right-wing, alt-right, etc., fill that purpose now for the other side.)

It’s not about presenting an opinion, it’s about moderating a debate with facts to inform the public.

A journalist’s job is to try to present the truth, not create false equivalence.

The US really needs to change their political system so that it's no longer "the winner takes it all" politics where only two parties can effectively rival. This is so poisonous for everything. No nuances in the discussion. Truth comes under the wheels. Stupid slogans win. No chance for larger green/ecological movements besides the overwhelming topic of economics. Ridiculous gerrymandering. It's just very sad to see.

So something like the super-dysfunctional political systems of Israel or Belgium or Italy or South Africa?

Fixing first past the post voting is not some magic cure-all.

You don't have to select (from your point of view) dysfunctional systems to make a point.

I consider myself a world citizen, but I am German; I know the system that was built after the bitter lessons of our history. Proportional representation (with an x% threshold to limit the number of governing parties) is really key to preventing two large parties from dominating everything and becoming the only rivals, with the resulting polarisation of the political discourse.

(Free higher education would then be the next step.)

We use the MMP system in NZ and it seems to work quite well, though needs a bit of fine tuning as currently you need 5% to get into a coalition and that is slightly too high.

Since New Zealand passed MMP 30 years ago, every Prime Minister has come from the same two parties that have dominated NZ politics since 1935.

Two party systems also are built on coalitions. It isn't as if "Democrat" or "Labor" is a unified ideology.

I'm all for MMP but I don't think it will fix any problems in American politics.

Just introducing MMP would add another veto point in an American political system that already struggles to actually accomplish anything.

*Accomplish anything good.


It's better than what you've got.

I’m sure this is not an original comment, but it’s interesting to see what you’ve classified as left/right. It must be difficult given that there is not really one axis of left/right and that “the centre” is highly relative. To me, seeing the BBC and Euronews in the “left” section is pretty funny, but I guess it’s true relative to US politics.

Is there anything you’ve learned about “the left media” and “the right media” from doing this? Do you think your sources are equidistant from “the centre”?

You are describing the https://en.wikipedia.org/wiki/Overton_window - and in the US it is very narrow and entirely shifted into the authoritarian right side.

The fact that US politics calls them "left" and "right" is even meaningless, given how right-shifted US politics is (by EU standards, only US extreme progressives are actually in the European "Left").

Not to mention Americans have demonized "the center" as some "if you're somehow trying to consider all the facts you're a coward who can't decide" type of thing. The two sides being "at war" drives TV/website engagement and that's all that matters to the people writing the headlines.

It just so happens that currently, one of the two "sides" relies heavily on disinformation; so anything that tries to fight disinformation (including remaining impartial) is that side's enemy. So those things end up being considered "left-wing".

Truly, the United States has four political parties: the Media Left, the Media Right, the Political Left, and the Political Right. Nearly every American you know is part of the first two; the last two don't make for good TV.

The "wisdom of crowds" thing is supposed to work like those fairground games of guessing the number of sweets in the jar; everyone has their own answer, but you'd expect the answers to be roughly in a normal distribution around the true mean.

Suppose you have a bunch of sincere answers of 1867, 1957, 2101, 2057 sweets, and then someone comes along with an answer of two million. They get a bunch of astroturf accounts, a few celebrity retweets, and suddenly thousands of people are claiming the answer to be two million, skewing the average up. They can even start using the fact that the other answers are different from each other and the average as evidence of their bias!

It is extremely difficult to determine honest effort vs dishonest spin, but it's absolutely vital to the process because you have to exclude the spin guys and all their sockpuppets before you do any kind of averaging or consensus.
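To put numbers on the sweets-in-a-jar example, here is a small sketch using the four sincere guesses from above plus 3000 astroturf guesses of two million. The `origin` labels are hypothetical: the code simply assumes someone has already worked out which accounts belong together, which is exactly the hard part described above.

```python
from statistics import mean, median

# (origin, guess) pairs. "origin" is a hypothetical label for who is really
# behind an answer; identifying it in practice is the difficult step.
sincere = [("alice", 1867), ("bob", 1957), ("carol", 2101), ("dave", 2057)]
astroturf = [("campaign-x", 2_000_000)] * 3000  # assumed sockpuppet count
guesses = sincere + astroturf


def one_vote_per_origin(pairs):
    """Collapse repeated answers from the same origin into a single guess."""
    seen = {}
    for origin, guess in pairs:
        seen.setdefault(origin, guess)
    return list(seen.values())


raw = [guess for _, guess in guesses]
deduped = one_vote_per_origin(guesses)

print("raw mean:      ", mean(raw))
print("raw median:    ", median(raw))
print("deduped mean:  ", mean(deduped))
print("deduped median:", median(deduped))
```

Once the sockpuppets are the majority, even the median of the raw answers lands on two million, so a robust statistic alone does not help. After collapsing them to one voice, the median goes back to a sincere guess, while the mean is still dragged upward by the single remaining outlier.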

How are you deciding whether a source is right or left leaning? Is the rating a manual process based on some set of factors, or are you using an external rating source?

IMO, Media Bias Fact Check [0] is one of the better external rating sources if you haven't come across them before.

[0] https://mediabiasfactcheck.com/

Thanks! Yes, I use this a lot for determining if a source is Left or Right. It's great!

Since people here seem to be digging your site (well, those that can access it right now anyway), I'll mention https://www.allsides.com as well. They also add a "center" viewpoint, which yours may do too, but it's getting hugged too hard for me to see right now.

I used allsides.com for my site too. I really like how transparent their process for assigning bias is, especially compared to other sites that try to place news on a spectrum. They essentially try to assign a bias by having someone with each political bias assign a rating to the news and then collect how strongly the crowd agrees or disagrees with that rating.

The thing about which stories to cover isn’t only about Left/Right. It’s mainly about what draws most attention.

I find most media biased towards conflict in that regard. The primary goal is to trigger emotions like rage and fear.

So for myself I find I’m happier not consuming “news” at all.

I like the phrase "transparency is the new objectivity".

No human is truly objective, so being transparent and honest about biases, and making good-faith efforts to be fair-minded, is likely as good as it gets.

That’s great.

Nothing is ever neutral, so it’s best to show different sides of issues and let readers figure it out themselves (as easily as possible, but not as opinionated as popular news outlets).

There are a bunch of fallacies at work here.

There aren't usually 2 sides to a debate. The number of sides varies wildly and some times is even 1.

All sides in a debate are not equal. There are perfectly fine reasons to honestly disagree about issues, but there are plenty of dishonest ones, and there is way too much dishonest coverage of things on the news; there are plenty of uninformed people with opinions that don't hold any water; and there are plenty of people just not interested in truth.

Always putting 2 sides on equal footing in a debate is dishonest. It's made to have the appearance of an honest debate, while suffering from the same problems as a single-sided view.

Wow this is very similar to what I just started working on: https://bifocalnews.com

My strategy for picking headlines is to let it happen democratically on Reddit where the bias is explicit in the community rules of the subreddit. I have not yet found a great "center" news source or community.

I really like your news layout and the scrolling headlines at the bottom!

FYI, I'm not sure what the site is doing but it ended up using up a ton of memory and I had to kill firefox to unlock my desktop (FF 76, Arch Linux).

That is a brilliant idea!

Left, Right & Center podcast also tries to address this. They pick a topic and try to get both sides of the argument.


To me, picking a single topic and trying to get to the bottom of it seems like the approach that journalism should take (and advertises itself as taking, I'm quite sure), but the reality is anything but (in my opinion).

I was excited when @oceanbreeze83 said:

> For any news junkies heres, I've built https://maagnit.com which gathers both Left and Right leaning sources [for any story] and displays them altogether on one page.

But unfortunately, I see nothing that does that on a per-topic basis.

> Left, Right & Center podcast also tries to address this. They pick a topic and try to get both sides of the argument. https://www.kcrw.com/news/shows/left-right-center

It looks like these guys are actually taking a shot at it, but then if I pick a topic that I know is subject to extremely propagandized (framed) reporting, ObamaGate, something that I happen to know a fair amount of lower-level detail about (but far from everything, it is an insanely complicated topic), and read their overview:

>> https://www.kcrw.com/news/shows/left-right-center/obamagate-...

>> President Trump is very upset about Obamagate. It seems to have to do with his former national security adviser, Michael Flynn — who the president fired after he lied to Vice President Pence and the FBI, and who pleaded guilty to charges that the Department of Justice is now seeking to drop. Is this a really important political issue? Or is this just President Trump’s effort to talk about anything besides the pandemic.

...I don't get a very strong feeling that what follows is going to be a sincere effort at truly "getting to the bottom of it", as much as is possible.

But this is just an intuition, I'd have to listen to the whole 55 minute talk before forming a tentative conclusion.

I find /r/NeutralPolitics to often be quite good on many topics, due to their very well thought out approach:

>> What is Neutral Politics?

>> Neutral Politics is a community dedicated to evenhanded, empirical discussion of political issues. It is a space to discuss policy and the tone of political debate.

>> Is this a subreddit for people who are politically neutral?

>> No - in fact we welcome and encourage any viewpoint to engage in discussion. The idea behind r/NeutralPolitics is to set up a neutral space where those of differing opinions can come together and rationally lay out their respective arguments. We are neutral in that no political opinion is favored here - only facts and logic. Your post or comment will be judged not by its perspective, but by its style, rationale, and informational content.

>> Neutral Politics is strictly moderated. Our full guidelines are here: https://www.reddit.com/r/NeutralPolitics/wiki/guidelines <--- Very much worth a read for those who are genuinely interested in learning more about how a "purely rational" society/organization should approach controversial topics. I would say that this is "how Journalism should be done".

Unfortunately, they seem to have only one post on the ObamaGate topic.


The submitter's question (see link for specifics) seems like an excellent way of approaching the question, in that he provides an example of how each side is framing/spinning the story, and then proceeds to ask a fairly awesome 5 part question on the matter. The post doesn't have much in the way of comments unfortunately - it's only 21 hours old, so maybe some will roll in eventually, but I suspect we'd have seen something by now.

Regardless, the fact that someone recognizes the true problem and is trying to do something about it, and there are 295,123 subscribers to the subreddit, is quite encouraging. But I wouldn't get too excited about the notion that this grassroots effort subreddit will get enough traction and subsequent publicity to make any serious change in the world. For that, I think any initiative needs help from famous people who will repeatedly promote it via their social media channels. But my intuition tells me that most people are so unconsciously biased, that they would be reluctant to do this, as (I speculate) the mind will sense significant risk in promoting an unbiased platform, and therefore decide against it.

I think this is the very same underlying subconscious phenomenon that @TulliusCicero and @dang are talking about in this post, but from a bit of a different perspective (italicized emphasis mine):


>>> (@dang) The underlying phenomenon, I think, is that people feel insecure in an internet forum, especially a large one, because the sheer quantity that shows up there is bewildering and our wiring did not evolve to process anything like that. Instead of seeing it as a statistical cloud produced by thousands of people (which is what it really is), we interpret it as the productions of a small group of individuals (which is how we're wired to see the world). Since that's such a distorted interpretation, those imaginary individuals seem weird and dramatic in our imagination. Either they seem super smart (because of all the information we had no idea of), which makes us feel dumb, or they seem monstrous (because of all the views that seem outrageously wrong or offensive), which makes us feel surrounded by enemies, if not demons—and so on. Because these feelings are uncomfortable, we end up creating an image of the community that we can diss in order to restore our sense of equilibrium towards it. The problem is that if everybody's doing this (and I think we all do it to some extent), it makes community really hard.

>>> (@dang) There's a long sequence of past comments on a related mechanism here: (see link above for two searches he references).

And also, while /r/NeutralPolitics have wrung about as much value as possible out of the generic Reddit technical platform, I suspect doing this "right" is going to need a completely new platform, designed from the ground up specifically for this purpose. I assert that this is the most complicated problem mankind has ever had to address; it makes flight, putting a man on the moon, or splitting the atom look like a walk in the park. But if we ever hope to solve it, people will first have to realize the magnitude and complexity of the problem they are dealing with.

If we ever want to get the current state of affairs on this planet sorted out, I think some truly(!) independent organization, and platform, that is completely controlled by "the people" is an absolute pre-requisite. But I am not terribly optimistic that we will ever get one, at all, or that gets enough traction to make a difference. Just taking many of the comments in this thread as an example, it seems clear to me that there is significant intuitive opposition to a purely fact-based, freedom of speech approach - by this I am referring to how so many people frame/conceptualize this approach as a ~"false equivalence", that it gives the two sides equal "respect", something that does have some truth to it, but the manner in which people describe that problem is typically by picking the most extreme strawman example they can conjure up (exposing their subconscious bias in the process, imho).

Just a few examples (chosen at random, not the worst of the worst by any means):

> When I was a kid CNN used to do this: they would invite a single climate scientist and a single climate change skeptic and have them make their points on equal time. CNN would then say “who’s right? You decide.” And the program would end. This was extremely harmful to the overall mission of informing people and created a false balance between sides which are not equally valid. Today, this is almost universally seen as a failure of journalism.

> "Of course there are reasons why a gubernatorial election should not be decided by a ski race, but are there also reasons why it should? For the sake of fairness, we’ve brought in two experts with opposite opinions, who will now have equal time to just say those opinions, because that’s what news is."

> For example does it make sense to present advocates for science and pseudo-science on the same level, with equal weight? Putting a Biologist who explains Evolution on the same pedestal on equal time as a Priest who believes in Intelligent Design creates a false sense of balance between these viewpoints, even though these viewpoints are fundamentally different in their very basic quality.

Looks like it’s getting hugged to death now, but I’ve been looking for an alternative to Google News for a while now. I want an aggregator that doesn’t “learn my preferences” and tries to collect balanced coverage. Thanks for building this and I can’t wait to check it out.

Cool - always thought of building exactly that. Do you just have a list of left/right sources, or some kind of "AI" to classify each item?

I have been a paying subscriber to Ground News for that reason, the app shows me bias very transparently. Glad to hear there are others doing this

a piece of feedback: please move the 'close' button on the pop-up video player to the top-right, where users usually expect it to be (ideally I would prefer no pop-up player at all, but I guess you had your reasons).

This is a really good suggestion. I will make this change this weekend, thank you!

I like it. I would like to know how you did it!

I think the biggest problem in US Society is the idea that there is just Left and Right. Often they are both idiots.

The idea that our options for Covid are: 1) open up the economy and let people die until we get herd immunity 2) Keep every one home until we starve or get a vaccine.

The correct answer may be complex like Taiwan's 124 point plan https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Taiwan

This is literally just a sqlite db with rss feeds and a python script with a very misleading description.

The real purpose of this post is to get traffic for their paid news api.

Also this closed issue is hilarious! https://github.com/kotartemiy/newscatcher/issues/3

Depending on what you're trying to do, there's also newspaper3k: https://github.com/codelucas/newspaper

It's quite easy to get "good" extraction for large numbers of outlets/articles without a massive amount of special-casing, as news articles are nearly universally marked up with RDF metadata (partly for Google News's benefit). Article discovery, and perfect parsing, is quite a bit harder. I ended up rolling a new Scrapy project with site-specific parsing code for an academic project as I had quite specific requirements.

Hey, I used this for my newsbetting site - https://www.rashomonnews.com/

I haven't pulled any articles in a while, so it's a little outdated but I love newspaper3k.

Is your code on github? I'm actually working on something very similar, and would love to get some ideas from what you've done.

This doesn't "collect" anything, it's just a python package wrapped on top of a sqlite database. If you choose an arbitrary website that doesn't have information in said database you just get "website not supported". There's not even any logic to guess a website's RSS feed URL... And in the end it's just an RSS feed reader?

It's just a python package wrapped on top of a sqlite database. Yes, it is!

Though thousands of ppl found it useful

What would the logic to guess the RSS feed URL look like? I suppose it is easy for wordpress sites, might not be so simple for others?

The common way is to set a link tag in the HTML with an RSS type, so others can discover the feed for the current URL. If this is not set, chances are slim that there is a proper feed available. Not that you can't still try guessing and googling for it...
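For reference, a minimal sketch of that kind of autodiscovery using only the standard library. It looks for `<link rel="alternate">` tags with an RSS or Atom type; the sample page and the `example.com` URLs are made up, and fetching the page itself is left out:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used for feed autodiscovery.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        link_type = attr.get("type", "").lower()
        if "alternate" in rel and link_type in FEED_TYPES and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the feed URLs a page advertises, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


# Hypothetical page used only to exercise the parser.
sample_page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="News" href="/feed.xml">
</head><body></body></html>"""

print(discover_feeds(sample_page, "https://example.com/news/"))
```

If the list comes back empty, you are down to guessing common paths, which is much less reliable.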

This looks like a cool library, but I have a question about the newscatcher API. How does the licensing work for the content? Seems odd (but great) that I can just read the news in my terminal from NYT but not pay a subscription or see ads. I read in some of the comments it's an RSS feed, is that freely available all the time? Surely even the RSS feed is protected with copyright and has restrictions on republishing?

If that's not the case, the larger implication here is that the news is free if it's in a format that is not as widely used (RSS) compared to what the general populace uses (mobile browsers/apps).

Cool library, thanks for sharing!

Hi. Co-founder is here. Short answer. We do not know if it is legal!

It seems a little risky to build a paid service that you're not sure is legal.

Also, the Terms of Service, Privacy Policy, and GDPR Policy links in the footer of your site don't work. They all have empty hrefs.

Will be updated once we start to sell

Why is this being downvoted? Seems like a viable question.

I'm genuinely curious how this works but maybe it's being perceived as "I'm entitled to free stuff on the internet" (I am definitely not one of those people).

FWIW I had the same question, and the way you asked it seems legit to me.

Why did you choose to ship the list of RSS feeds as an SQL database? This makes it hard to keep it up to date and submit pull requests with additional sources. Would it not be better to keep that info in a json file or a dict / list in a python module?

Hey. Yes. You are right. We will change that!

yeah, git isn't really meant to handle binary files
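A sketch of what such an export could look like, so the list lives in a diffable text file instead of a binary blob. The `feeds` table, its columns, and the sample rows are assumptions for illustration, not the package's actual schema:

```python
import json
import sqlite3

# Stand-in database with a hypothetical schema; the real package's tables
# and column names may differ.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE feeds (site TEXT, topic TEXT, language TEXT, rss_url TEXT)"
)
conn.executemany(
    "INSERT INTO feeds VALUES (?, ?, ?, ?)",
    [
        ("example.org", "tech", "en", "https://example.org/tech/rss"),
        ("example.com", "news", "en", "https://example.com/feed.xml"),
    ],
)


def export_feeds(connection):
    """Dump the feed list as stable, sorted JSON so diffs stay small."""
    rows = connection.execute(
        "SELECT site, topic, language, rss_url FROM feeds ORDER BY site, topic"
    )
    return json.dumps([dict(row) for row in rows], indent=2, sort_keys=True)


feeds_json = export_feeds(conn)
print(feeds_json)
```

Sorting the rows and the keys keeps a pull request that adds one source down to a few changed lines.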

The metadata of language, topic, rss url, etc, is nice work.

For use outside of python, here's a gist with a sorted list of sites and a sqlite dump of the site data: https://gist.github.com/tyingq/8e921eed10bf2ecf9c40ebdd70ff1...

Hi, looks interesting and useful! What are the differences between newscatcher (python) and newscatcherapi.com? Is there any limit on using newscatcher (python) as shown in the pricing pages?

Also, I was looking at https://newsapi.org. How does your python API compare? I see that newsapi also gets old articles; do you implement similar features?

Hey. Newscatcher (python) is an open-sourced package that we developed for users' side projects. There are no limits. You can even modify it for your own needs.

Newscatcher API is a product that allows you to not only get the latest articles but also search by keywords, topic, country etc. Basically, the main feature is that you search for articles that contain a specific word or phrase. This can have an added value for professionals and companies. There will be a free plan for developers with limits, and paid plans for heavier usage.

Compared to newsapi, we are less expensive for the content. We are not able to provide old articles as of today, since we only began storing data around the release.

Hope I answered all of your questions.

Do you guys have any plans to create webhooks that let you know when a feed/search is updated with a new article in real time?

Hey. Not yet. Let us think about that.

It's probably not worth it. Anyone needing near-real-time feed webhook updates will probably build their own scraper.

There seem to be a lot of news crawler APIs lately; a quick Google search found these:

[1] https://currentsapi.services/en/product/price

[2] https://aylien.com/news-api/

[3] https://www.cityfalcon.com/products/api/financial-news

Also, newscatcherapi.com has real support for 10 languages. And yeah, we are like 10 times cheaper

Disclaimer: I am a co-founder

How complete are the articles that are returned? Last time I looked into RSS news feeds a lot of sources would just put abbreviated / teaser content in and then try to get you to click through to the story on their site. That obviously didn't make for the best experience with an RSS downloader though.

I work on Full-Text RSS which can help convert abbreviated feeds into full-text versions. The idea is you'll get a new feed URL from Full-Text RSS to use instead of the original partial feed in your news reader or application.

Free to try here: http://ftr.fivefilters.org/ and code for a slightly older version available here: https://bitbucket.org/fivefilters/full-text-rss/src/master/

Your git repo contains the dist folder even though it's in .gitignore; might be good to remove that. No need to check in generated artifacts.

Thx. Will check.

As others have noted, this doesn't seem to collect the full article text, just stuff that you would get from an RSS feed. From the title, I expected something more like newspaper3k[0]. I used that for an NLP class during undergrad to collect full-text news articles, in conjunction with Selenium (many mainstream sites don't work with just plain wget or requests).

Lately I've started using EpubPress[1] to grab full-text articles and generate an ePub, which happens every night via cron. Then I can get a full digest on my iPad over sftp at my leisure. Sadly EpubPress is not very sophisticated; sites like Bloomberg or ArsTechnica return "are you a robot" challenges which it can't bypass.

I wish there was some kind of community driven library for retrieving full-text articles from common sites. In my vision of how that would work, users would contribute hand-crafted Selenium scripts to download and extract the article text, bypassing the bot-detection for each site. Then something like EpubPress would work a lot better.

The "modern web" just has too much junk to be interesting any more. Sometimes news sites publish articles I would like to read, but I'm not interested in dealing with 1000 different implementations of crappy mobile UIs, advertisements, animations, etc. I know reader view exists, but you still have to wait for the page to load, and it doesn't work very well for some sites. For the sites it doesn't break with, the experience with EpubPress is much better.

0 - https://github.com/codelucas/newspaper

1 - https://epub.press/

Neat! Pair this with some sketchy ad tech, and you can start raking in the bucks.


I would be more interested in a ready-to-use server which I can self-host and to which I can throw any script for collecting data, filtering and processing it, and which also handles errors and storage.

At the moment the only real solution for this seems to be using your own RSS reader (Tiny Tiny RSS for me at the moment), which has the disadvantage of being limited to RSS sources, and also of not allowing much filtering and processing. But with more and more sources moving away from RSS, I want something which can fetch alternative sources and integrate them into a unified interface.

In the best case it would even work with any language, be it a shell script or python, ruby or even java.

Well, that is quite similar to what we are doing at newscatcherapi.com.

We collect tons of news data and let you query it with an API

Not self-hosted, is it? By which I mean it costs money.

Yeah. Though it will have a free plan. And our hosting costs quite a lot, so (imho) it’s almost always much cheaper to buy a ready solution.

This seems really cool!

How does it work internally? Is it downloading the news from a RSS or is it crawling the content of the website? Or is the content coming from an external service?

How are the feeds selected? Can we add more? who is maintaining them (in case the data is crawled)?


I saw this at the bottom of the README:

The package itself is nothing more than a SQLite database with RSS feed endpoints for each website and some basic wrapper of feedparser.

Hey, there is a sqlite DB that stores the RSS endpoints. Then we use the feedparser python package to parse them.
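For anyone trying to picture that flow, here is a rough standard-library sketch of "look up the endpoint in SQLite, then parse the feed". The `rss_endpoints` table and the sample feed are invented for illustration, and `xml.etree.ElementTree` stands in for feedparser so the snippet runs without extra installs; it is not the package's actual code:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for the bundled database; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rss_endpoints (site TEXT PRIMARY KEY, rss_url TEXT)")
conn.execute(
    "INSERT INTO rss_endpoints VALUES (?, ?)",
    ("example.com", "https://example.com/feed.xml"),
)


def lookup_feed_url(connection, site):
    """Return the stored RSS endpoint for a site, or fail if it is unknown."""
    row = connection.execute(
        "SELECT rss_url FROM rss_endpoints WHERE site = ?", (site,)
    ).fetchone()
    if row is None:
        raise LookupError(f"website not supported: {site}")
    return row[0]


def parse_rss(xml_text):
    """Extract title and link from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


# Made-up feed body; a real run would download it from the looked-up URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><title>First story</title><link>https://example.com/1</link></item>
<item><title>Second story</title><link>https://example.com/2</link></item>
</channel></rss>"""

print(lookup_feed_url(conn, "example.com"))
for article in parse_rss(SAMPLE_RSS):
    print(article["title"], "->", article["link"])
```

Anything missing from the table fails at the lookup step, which matches the "website not supported" behaviour mentioned elsewhere in the thread.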

Then why should I use this instead of a real Feedreader? What advantage has this?

I think you mean feedparser. It knows how to parse the feed. So you have to give it the feed’s url

No, I mean feedreader. A feedreader is a service which collects and parses RSS files, then makes them accessible in an interface for the user. Similar to a mail client, but for RSS.

This package seems to do the first part, collecting and parsing, but lacks the interface. So what is the point of it?

And if this is maintaining it's own config for sources, can I even add my own sources? Or is this just an elaborated OPML-file with attatched business-logic?

Hey. Alright. Thanks for letting us know. I will check it tonight.

I see, thanks for the answer and thanks for sharing!

Seems like a repost of previous Show HN [1].

[1] https://news.ycombinator.com/item?id=22407835

What does "normalized news" mean? Being able to query from a predefined RSS feed? Because I was expecting news coverage from different partisan sources pointing to the same news event.

Neat project! Definitely useful to have this, folks can build "headline-edit trackers" or services that collect news using other filters more easily using this.

This looks great!

I read your blog post about how open-sourcing helped you find testers, which is a great step for such tools.

* Shameless plug *: Our web service, Feedity - https://feedity.com, helps create a custom feed for any news webpage, via a point-and-select feed builder and REST API.

I've been experimenting with a side project: www.glancereport.com, which pulls the top headline from a variety of news sources. You can also sort these headlines by neutral, progressive, or conservative sources.

I'm still learning how to play with Python and Redis, and it's not perfect, but would love feedback.

I see that the Neutral/Progressive/Conservative filter is most applicable to Politics news, but I'd be interested in seeing if you can filter by topic like Business, Sports, Lifestyle etc.

There is a similar library for this called newspaper, which I used in my undergrad thesis. Not dismissing this work, but I am curious what it offers on top of that? It doesn't offer a comparison to newspaper, at least not on the GitHub page.

The demo gif is really long (a few minutes), but it's well worth watching and summarizes the capabilities of the library well. It'd be cool if you could register additional sources via a registration API or some passed in configuration.

This would be very cool - a bit like how youtube-dl handles non-YouTube websites.

Self-Plug: https://www.hvper.com (Official Successor of popurls which more or less started the single page aggregator craze.)

That pink bar takes up almost half of my screen on my iPhone 11 and makes the website de facto unusable.

Not to be unkind, but you should fix that. Instead of being incentivized to pay you, my only instinct was to close that page asap.

That's by design — It's not a charity service.

How about you switch to desktop mode on your browser instead of forcing sites to be mobile friendly so you can consume it on a tiny screen. Half the web is broken because of "responsiveness".

To be fair, it's not broken, but coming up with clever viewport-dependent layout changes takes up enough resources to make it a low priority for some business cases.

How about I just close it and never open again?

Looks good! How do I get rid of the BIG pink bar? I would rather see ads (to support the free version) than that BIG PINK BAR.

You can either cover my monthly 5 digit server bill or upgrade to a paid version. Sorry to be the bearer of bad news ;)

My initial reaction is very positive. I believe there is a market for a tool like this. Let's see how it handles MarketWatch.

First, kudos to the OP. If you don't mind, would you compare and contrast it to newspaper3k?

Hey, looks cool. Which news sources are available for the Italian market/language?

Hey, you can list the news sources for the `it` language and check.


How do subscriptions work with this? Or is it treated like anonymous browsing?

Perhaps nice to show news headlines as an alternative to /etc/motd

What is "normalized news"? What does that even mean?

This looks amazing.


I'm not sure how to parse the following sentence:

By newscatcherapi.com (this package is fully self-sufficient, you can just use it. No dependency on external services/API)

I interpret this as

Created by <api provider> but you do not need anything from us or from anyone else to get the software going, it just works out of the box.

Well the 'anyone else' part is wrong, someone has to provide the news.

I am going to put it in the README. Thanks a lot!

We tried to make sure that everyone understands that it does not depend on newscatcherapi.com products/services.

Will this work for fetching financial data from the public bloomberg.com website?

I see your downvotes. It means I am on to something. I will try.

> "Almost any website"

Is there a list of supported websites?



So, the centralized, paid version of RSS?

It's a Python package to collect news data. Nothing paid
