Media outlets should be required to publish a correction of a story at least as visible and for at least as long as the initial incorrect version.
This would prevent a site from first claiming something and then burying the correction somewhere where nobody finds it.
I would very much love it if media outlets agreed on self-imposed "best practices for corrections". In the end, that should benefit those who comply, because they can advertise it and be more believable.
I'm curious, how would you punish those who did not comply?
Publishers would just silently edit the article, forcing the federal government to monitor all news outlets' content for changes. In the face of this daunting task, some naive senator would suggest that all news first be submitted to the fed for review.
The only solution to this problem is the nationalization of the news media, which would be a very bad thing. Sometimes the only way to win is not to play.
They're like Vox, minus the self-back-patting about intellectual honesty and plus the _actual_ intellectual honesty.
Parts of Loudoun are very sparsely populated; indeed, it has the most unpaved roads of any county in VA. But since 1980, the population has grown from 60,000 to 385,000. Almost all of that growth has been in Dulles/Leesburg/Ashburn, etc. So in terms of the percentage of households with broadband access, the other parts of the county don't have much of an impact on the overall number.
But for what it's worth, a lot of US DSL still wouldn't meet the broadband definition, even using your lower speed threshold. AT&T DSL (the non-FTTN version) in Michigan maxes out around 6 Mbps down for most people.
Also, the selection of left-leaning people doing publicly accessible research is, AFAICT, a self-selection. The problem isn't that our institutions are biased, but that conservatives (in general) choose not to pursue scientific careers in the same numbers as liberals. This suggests a fundamental limit to your proposed solution because there aren't enough "heterodox" collaborators to ensure viewpoint diversity. One of the studies published on the website you mentioned even suggests this underlying cause.
Also, I'll note that in the vast majority of academia, internal politics (what research methodologies, maths, etc. someone likes to use) are far more of a problem than external politics (of the sort you'd typically call "politics" in general conversation). E.g. I'm not at all worried about "liberal bias" in physics or chemistry or CS; if it's a problem at all, then it's a minor one compared to other problems.
As for left v right, it's true that adjunct faculty who teach English are rabid leftists because they are working four jobs that are 60 miles apart, have no health insurance, qualify for food stamps, etc.
One reason you see so few rightists in conventional academia is that they can go work for places like the Manhattan Institute, Hoover Institution, etc., get paid much more, not have to teach, and skip the rat race. There is a continuous drumbeat of industry-funded reports on how there is no broadband problem in the U.S. These don't get much attention outside the trade press, because normal people know that students camp outside the school some nights because they don't have broadband at home.
The provider (or the provider's CSR) is being lazy. They absolutely do have the data.
In AT&T's case, they have a Java software interface ("CPSOS") into which you can plug an address and get back the distance in thousands of feet from that address to the nearest RT or CO (major network interconnection points). That distance gives you an estimate of the maximum DSL speed that can likely be supported; if I remember right, you can guarantee up to 6 Mbps down at up to 5k feet, and it tapers off from there to 10k feet, at which point you start saying, "we might be able to get you 512kbps but we're not sure, we'd have to try installing it and see what happens."
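A rough sketch of that distance-based estimate. The thresholds are the rule of thumb above; the linear taper between them is my assumption, since the actual engineering tables surely differ:

```python
def estimate_max_dsl_mbps(loop_length_feet):
    """Estimate the maximum supportable ADSL downstream speed from the
    length of the copper loop to the nearest RT or CO.

    Rule of thumb: ~6 Mbps is safe out to 5,000 ft, tapering (here,
    linearly -- an assumption) to ~0.512 Mbps at 10,000 ft. Beyond that,
    no guarantee at all.
    """
    if loop_length_feet <= 5_000:
        return 6.0
    if loop_length_feet > 10_000:
        return None  # "we'd have to try installing it and see what happens"
    fraction = (loop_length_feet - 5_000) / 5_000
    return round(6.0 - fraction * (6.0 - 0.512), 3)
```

Plug in a 7,500 ft loop and you get roughly 3.3 Mbps, which matches the kind of mid-range quote these tools produce.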
The sense you get, both as a consumer and as a service provider that gets a little more of an inside view into these companies, is that they absolutely do have the data and the capability, but that isn't where their priorities are. They just want to make a buck with as little capital investment as possible, and everything that they do reflects that.
The degree to which self-selection plays a role is an interesting question. Firstly, there is a certain amount of overt discrimination: for example, professors in the humanities and social sciences openly admit that they would be inclined to reject a candidate if they knew he or she was conservative. There are also plenty of other data, anecdotal and quantitative. Much more is needed, however.
Importantly, the proportion of conservatives to liberals has been falling since the 90s, so something is driving the trend. It's also possible that conservatives are selecting out of a career that they (evidently correctly) perceive to be hostile to them or otherwise corrupt (more interested in furthering an agenda than seeking truth). The point is that even if it is self-selection, we may be able to improve the ratio enough to improve the science. And it's not like the ratio needs to be 50:50--probably 30:70 would suffice. Right now many fields are less than 10:90, and intuition suggests that the effect is nonlinear. If only 10% of faculty are conservative, intimidation causes them to suppress their already marginal voice, but 30% might be enough to allow them to feel secure in providing the sort of criticism those fields seem to desperately need. Of course, I'm picking on the academy, but the same probably holds true for journalism.
Again, the point isn't to assign blame, but to improve the science.
I'm sort of making two only vaguely related claims here.
Claim 1: In this particular case, I don't think bias is even a problem, because we shouldn't even be doing this leg work on behalf of ISPs in the first place.
My point is that ISPs choose not to contribute to the quality of publicly available information. When entire industries (or governments) willfully hide data that's necessary for a populace to make an informed decision, I default to a worst-case assumption on that data. Not because I think that's most likely, but out of contempt for the lack of transparency and as a forcing function for greater transparency.
In other words, these "biased" researchers are already a hell of a lot more charitable toward the ISPs' position than I think we ought to be. If ISPs aren't willing to share granular data on network accessibility, then assume it's a major problem until they muster up the will to share that data.
>... the humanities and social sciences...
Claim 2: On this issue, I think it's helpful to discuss "critical studies" and related departments separately from the rest of "academia" and separately from the natural and mathematical sciences in particular. I don't think that the liberal bias in natural/math sciences has the same underlying causes OR the same long-term effects.
I.e., I don't think that chemistry departments are hiring based on political beliefs, and I also don't think that political homogeneity affects the quality of chemistry research.
You also have to explain why political diversity is even important in the face of other forms of homogeneity. E.g., I'd rather a very liberal, very theoretical physics department hire a flaming liberal experimentalist than yet another theorist who happens to be conservative. The importance of political diversity in humanities and social sciences is obvious, but I don't see why science would suffer if all algebraists were libertarians...
My assertion is that if you want a politically diverse chemistry department, you have to 1) justify why that diversity is even necessary / more significant than other major problems; and then 2) probably also solve different underlying causes.
> Of course, I'm picking on the academy, but the same probably holds true for journalism.
And also literally every other profession, from clergy to CEOs to career criminals. There's even a trite quote about it.
I respect your opinion that ISPs have a social responsibility to disclose this information so that think tanks and academics don't have to dig it up. I'm curious whether anyone thought to ask them, but that's neither here nor there. I don't generally expect businesses to provide data beyond what's required by law, but I'll concede the point since it's unrelated to my claims.
> Claim 2: On this issue, I think it's helpful to discuss "critical studies" and related departments separately from the rest of "academia" and separately from the natural and mathematical sciences in particular. I don't think that the liberal bias in natural/math sciences has the same underlying causes OR the same long-term effects.
I agree, and I intended to. I should have done better, but I quickly authored that comment from my phone in the waiting room at the vet clinic.
> You also have to explain why political diversity is even important in the face of other forms of homogeneity. E.g., I'd rather a very liberal, very theoretical physics department hire a flaming liberal experimentalist than yet another theorist who happens to be conservative. The importance of political diversity in humanities and social sciences is obvious, but I don't see why science would suffer if all algebraists were libertarians...
Again, I don't mean to apply my "political diversity" quip to apolitical fields.
> My assertion is that if you want a politically diverse chemistry department, you have to 1) justify why that diversity is even necessary / more significant than other major problems; and then 2) probably also solve different underlying causes.
I agree, and I don't think political diversity is important in chemistry, since few chemistry topics are politically polarized.
> "Of course, I'm picking on the academy, but the same probably holds true for journalism."
And also literally every other profession, from clergy to CEOs to career criminals. There's even a trite quote about it.
Probably some degree of political diversity is good for all fields, but it's uniquely critical for politically polarized epistemological fields.
1) How do they define "Access"? Does it mean actual subscriptions? Does it mean the building/home is connected? Or does it mean a line passes the household, but there's actually no way to connect to it? (Look up New York City's lawsuit against Verizon's FIOS rollout.)
2) How do they define "Broadband"? In 2010 the FCC defined it as 4 Mbit/s down, 1 Mbit/s up. In 2015 they redefined it as 25 Mbit/s down, 3 Mbit/s up. Currently I have 50Mbit/s down and 1Mbit/s up which Comcast absolutely defines as "Broadband", but it doesn't meet the FCC definition.
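For what it's worth, the two FCC thresholds quoted above are easy to encode, and a connection like the 50/1 one described is "broadband" under the 2010 definition but not the 2015 one. A quick illustrative sketch:

```python
# FCC broadband thresholds (Mbps down, Mbps up), by year of definition,
# as quoted above.
FCC_DEFINITIONS = {
    2010: (4, 1),
    2015: (25, 3),
}

def is_broadband(down_mbps, up_mbps, definition_year):
    """True if the connection meets the FCC definition from that year."""
    min_down, min_up = FCC_DEFINITIONS[definition_year]
    return down_mbps >= min_down and up_mbps >= min_up

# The 50 down / 1 up connection described above:
print(is_broadband(50, 1, 2010))  # True
print(is_broadband(50, 1, 2015))  # False: 1 Mbps up is under the 3 Mbps floor
```

Note it's the upload side that fails, which is easy to miss when plans are marketed on download speed alone.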
As illustration: The UK has some community owned projects which supply Internet access to otherwise under-served rural areas. The government ensures commercial suppliers don't run "spoiler" projects (e.g. when a community project announces they're coming to some village, the incumbent Telco can't suddenly remember they decided to run ultra-fast Internet to that village with a free introductory offer)
B4RN (Broadband For the Rural North, but pronounced "Barn") is the most famous. They run dedicated fibre near all the properties in a rural area; if you own some cottage in a little country village they serve, the fibre probably runs in the grass outside your fence.
But people (probably volunteers since it's B4RN) need to come dig up your lawn and run cable to your cottage before you can actually use the service. B4RN charges a fair amount of money for this setup, although it's waived if you instead pay even more money to become a shareholder in their corporate entity. But still, the cottage would have "Access" once that fibre is outside, because B4RN have a commitment to do the install if you pay for it.
It's silly to say one cottage in a row doesn't have "Access" to B4RN's 1000Mbps symmetric Internet just because the old lady who lives there doesn't need the Internet. If you bought that cottage you could fork over the price of a Nintendo Switch and get Gigabit within a week or so.
Subscription _could_ be a good metric if we were arguing that Internet service is unacceptably expensive in specific regions and thus prices out the lower middle classes or working classes, but that's clearly not the focus of Vox's investigations.
You're right that it's important to clarify what "Broadband" means and give some idea of how fast that actually is. Certainly that's something that needs to be in the footnotes of such an article.
Affordability is actually the main point of 538's second article: "Lots Of People In Cities Still Can’t Afford Broadband" ( https://fivethirtyeight.com/features/lots-of-people-in-citie... )
However, that article is centered around the data for Washington D.C. and their retraction specifically talks about the data for D.C being incorrect.
I admit that "a few billion" (which is clearly beyond what I was suggesting) might open doors, but not just by signing up somewhere.
For example, the cable TV company offers Internet service to many people in my city, and indeed on my road. They put unsolicited advertisements through my door every week. But despite the insistence that I could have "Super fast fibre-based Internet" tomorrow, in fact I can't: my building isn't connected, and the owners of the building (an opaque off-shore property holding corporation) would need to sign off on the work to change that, which they won't do even if I wanted it.
Now sure, I could buy all the flats in the building, then leverage control of those flats to force the current owner to sell me the building, and then I'd have the right to get that work done. That means now I'm paying maybe £2-5M to get their merely "super-fast" broadband to my flat in a major city. Short of your "billions" but not by so many orders of magnitude...
Or another example, for a few years the incumbent telco technically offered FTTP to any customer who could get their existing FTTC product, for a fee they'd work out what it cost them to run the extra cables and so on, add a percentage and if you agreed you'd pay that to get FTTP. The service was weakly advertised but immediately over-subscribed and it was effectively cancelled altogether without more than a handful of deployments. A few billion wouldn't do much about that I think, you'd need tens of billions to buy that incumbent telco and "change their minds" about the priority of such bespoke services that way.
> I admit that "a few billion" (which is clearly beyond what I was suggesting) might open doors, but not just by signing up somewhere.
Well, but what does "sign up somewhere" really mean? If you called a major telco and made it clear that you have a few billion to spend and you wanted a gigabit internet link, you don't think they would find a way? I mean, think Bill Gates calling ... you think he would have to do much more than order what he wants to get it delivered in a reasonable time frame?
> Now sure, I could buy all the flats in the building, then leverage control of those flats to force the current owner to sell me the building, and then I'd have the right to get that work done. That means now I'm paying maybe £2-5M to get their merely "super-fast" broadband to my flat in a major city. Short of your "billions" but not by so many orders of magnitude...
And really, you wouldn't have to do any of that. For one, you don't have to buy the building; you simply tell them you'll pay a billion bucks to sign off on things. And secondly, the telco will happily do that for you.
But also, they wouldn't have to sign off on anything. The telco could just as well install an LTE cell just for you, or a laser link from across the street, ...
> A few billion wouldn't do much about that I think, you'd need tens of billions to buy that incumbent telco and "change their minds" about the priority of such bespoke services that way.
Erm ... what's the yearly profits of that telco? What's a few billion in comparison to that? You don't think they'd take a substantial increase in profits for a few days of work for some of their technicians?
There's a footnote in the article which says that the FCC data uses the 2016 redefinition of broadband back down to 10 Mbps:
> The FCC has two data sets on broadband connections. For our analysis, we used the data specifying connections with at least 10 Mbps downstream and 1 Mbps upstream.
I recently signed with Comcast (it was that or DSL) for their 150Mbit/s down plan, and didn't even check the "up" speed, like a fool.
I just assumed around 50Mbit/s, maybe 25Mbit/s at the worst.
It's 5Mbit/s... so I'm downloading gigabytes of huge datasets in a minute, making a small change, then uploading... and uploading... and uploading... better get a coffee....
I recently got a 75/15 package with Comcast Business for $150 monthly. The guaranteed bandwidth seems to be worth it.
Yes, that's why they can use band pass filters to enable and disable specific services and packages. i.e., specialty channels that are part of a premium package are grouped together in certain frequency ranges, and they use equipment to block or allow signals at varying frequencies on each consumer connection.
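A toy model of that frequency-based gating. All channel frequencies and passbands below are invented for illustration:

```python
def receivable_channels(channels, passbands):
    """Return the channels whose carrier frequency falls inside one of the
    frequency ranges the filter on a subscriber's line lets through.

    channels:  {name: carrier frequency in MHz}
    passbands: list of (low_mhz, high_mhz) ranges the filter passes
    """
    return {
        name for name, freq in channels.items()
        if any(low <= freq <= high for low, high in passbands)
    }

# Hypothetical lineup: the premium movie channels are grouped at 550-600 MHz,
# so a single band-stop/band-pass choice gates the whole package.
lineup = {
    "local-news": 61.25,
    "sports": 451.25,
    "movies-1": 553.25,
    "movies-2": 559.25,
}
basic_filter = [(50, 550)]               # premium group blocked
premium_filter = [(50, 550), (550, 600)]  # premium group passed

print(receivable_channels(lineup, basic_filter))   # no movie channels
print(receivable_channels(lineup, premium_filter)) # everything
```

Upgrading a subscriber to the premium tier is then just widening (or removing) the filter on their drop, not touching the signal itself.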
How can that be? That's the same time we achieved net neutrality. That's not net neutral. That's fast lane on download, slow lane on upload. Asymmetric bandwidth is a prime reason why we all can't host our own clouds and social networks.
Not really, because the options exist at different tiers. If you have 5/1, 10/3, 25/5, and 100/20 as options, and you notice that people choose 10/3 and saturate their download but only use a third of their upload capacity, then turning that into a symmetric system would result in them having to pay more for the same usage.
On the other hand, if you noticed that people were choosing the 100/20 but only peaking at 15 down and 15 up, then yes, it would be clear that customers would prefer a symmetric system.
As it stands, P2P is a pretty rare use case for residential broadband, even when the capacity exists. Access to symmetric speeds is unlikely to increase P2P usage, because those aren't the main blockers.
There are not so many options where I live, none in practice.
Anyway, those tiers are, after all, commercial options. Is the cost for providers really different? If there isn't a substantial difference, then the upsell theory stands.
If people don't want upload, why throttle it? It would be a nice selling point: you offer a bonus that looks nice and costs you nothing.
> Access to symmetric speeds is unlikely to increase P2P usage, because those aren't the main blockers.
That's backwards. What they're actually doing is discouraging usage: there is more usage than they want (even if it's uncommon), so measures are put in place to shrink it further.
The measure does not affect most users so they can get away with it: "only those few freeloaders want more".
Edit -> also consider this: the upload limitation in P2P isn't obviously negative. You can still download at max speed, so you might think of the limitation as a bad thing only for others. You have to think for a while to understand why it's also bad for you.
Network neutrality means if I purchase a pipe with x Mbps of potential bandwidth, my communication on that pipe to any destination should have equal priority.
In other words, an ISP shouldn't be able to add rules that say, "When our pipe is congested and we need to slow or drop packets, Google has the highest priority, followed by Amazon, then x, y, z." You cannot shape or throttle traffic based on where it comes from or where it goes.
You could still have congestion. Maybe the route to AWS has a shorter round trip than the route to Digital Ocean because of which backbones the ISP has decided to pay for and connect to. But the point of Network Neutrality is that ISPs cannot prioritize one packet type over another. They must try to deliver each payload from each customer evenly and, if they can't, slow or block those payloads evenly and without prejudice.
The one exception to this is if the customer is on IPv6 and uses the prioritization available in the protocol. But even then, the customer is sending an indication to the ISP saying "this packet is not as important" or "this packet is very important."
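To make the congestion rule concrete, here's a toy sketch (destination names are made up): a neutral link drops excess packets blind to destination, while a priority map lets favoured destinations crowd everyone else out.

```python
import random

def transmit(packets, capacity, priority=None):
    """Return the packets that fit within `capacity`; the rest are dropped.

    packets: list of destination names, one per packet.
    With no priority map, the choice is blind to destination (neutral).
    With a priority map, favoured destinations are sent first (non-neutral).
    """
    if priority is None:
        packets = random.sample(packets, len(packets))  # shuffle: no preference
    else:
        packets = sorted(packets, key=lambda dst: priority.get(dst, 0), reverse=True)
    return packets[:capacity]

# Congested link: 10 packets queued, room for only 5.
queue = ["bigco"] * 5 + ["smallhost"] * 5

neutral = transmit(queue, capacity=5)
favoured = transmit(queue, capacity=5, priority={"bigco": 1})
print(favoured)  # all five "bigco" packets survive; "smallhost" is starved
```

Under the neutral policy each destination loses roughly half its packets on average; under the priority map one destination loses nothing and the other loses everything, which is exactly the behaviour the rule forbids.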
1) US Census, which is based on surveying households "do you have broadband, Y/N"
2) FCC data, which is based on ISPs self-reporting (In a footnote the article says they're using Pai's new slower definition of broadband, 10Mbps, not the 2015 definition of 25Mbps.)
3) ASU/Iowa, which depends on a derived variable in commercially-purchased data which "denotes interest in ‘high tech’ products and/or services [including] personal computers and internet service providers" as a proxy for broadband ownership
...and the first two roughly match each other, while the third doesn't. The academics claim the company that sold them the data told them it was a reasonable proxy for broadband, the company says they didn't say that.
Just for context, if it's not obvious: I work with data, both putting it together and analyzing it. One of my chief frustrations with academia (and one of my biggest lessons for the people I advise) is a kind of "cultural reverence for the data set".
Just because data is collected in no way assures that it's right or suitable, even if a reputable name says it is.
Be skeptical. Private suppliers have an incentive to sell you data. Private industries have incentives to keep data from you (it constitutes competitive advantage). Government data is subject to political interference over what gets collected, even if you're lucky enough to live in a world where the actual collection is independent and rigorous. Reports and responses to surveys and interviews may be inaccurate even when people thought they were being honest, and on socially contentious topics you often don't even get that honesty.
And even if you managed to avoid all that, it doesn't mean your data isn't problematic. Our census in my country, for example, is done in the winter time. How good is that at tracking information in seasonal towns?
Proper data collection is some of the hardest work you can do, and proper analysis comes from measuring, corroborating, justifying, hypothesising on the data you have. It does not involve just calculating a stat or, god forbid, just testing things for statistical significance just because it's on your data set.
For all those reasons, I highly commend this article. We need more of it.
1. Go to a large Internet services provider (Amazon, Google, Akamai, Netflix).
2. Ask them to statistically sample the TCP flow rate observed in client traffic, by source IP address.
3. Get a data set that geolocates IP addresses to ZIP code (Amazon for example has this data).
4. Join the two.
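Sketching steps 2-4, with synthetic flow samples and a synthetic IP-to-ZIP table standing in for data you'd have to obtain from the providers above:

```python
from collections import defaultdict
from statistics import median

def speed_by_zip(flow_samples, ip_to_zip):
    """Join sampled per-IP TCP flow rates with an IP->ZIP geolocation
    table, and report the median observed rate per ZIP code.

    flow_samples: iterable of (ip, observed_mbps) pairs
    ip_to_zip:    {ip: zip_code} geolocation mapping
    """
    rates = defaultdict(list)
    for ip, mbps in flow_samples:
        zip_code = ip_to_zip.get(ip)
        if zip_code is not None:  # skip IPs we can't geolocate
            rates[zip_code].append(mbps)
    return {z: median(r) for z, r in rates.items()}

# Entirely synthetic example data (TEST-NET addresses):
samples = [("198.51.100.7", 42.0), ("198.51.100.9", 3.1), ("203.0.113.4", 5.5)]
geo = {"198.51.100.7": "20001", "198.51.100.9": "20001"}
print(speed_by_zip(samples, geo))  # {'20001': 22.55}
```

Even this toy version surfaces one of the objections raised below: the un-geolocatable IP simply vanishes from the result, so coverage gaps in either input silently bias the output.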
One flaw I can already identify with your data set is that it doesn't differentiate traffic originating from workplaces, or cellphone traffic. A source like Netflix could be a heavily biased set since it relies on self-selected subscribers; there's no skipping the research into whether their user base provides an adequate sample at the county level. Getting a huge company like Google to pay attention is another challenge.
The article writes about how the FCC does have more granular data, but "the commission is wary of “one carrier learning about another carrier’s market share or where their customers are,” Rosenberg said." So you are at the mercy of competitive businesses willingly releasing data that competitors might find useful.
You can get reasonable rates at a larger scale using survey approaches (like OxIS, the British survey series from the Oxford Internet Institute) but these aren't granular enough to compare small areas. Otherwise you're using some kind of proxy data.
Your approach has two issues. First, people without Internet access are not in the sampling frame at all, which is a critical problem if you're interested in network access. Secondly, the geolocation in step 3 is also not particularly reliable (mine tends to think I'm 100 miles away from where I am) so you'll get the same kinds of problems they found here in that the data look kinda plausible but aren't actually reliable enough to draw defensible conclusions from.
Getting good data at nationwide scale is never as easy as it sounds in your head, unfortunately.
1. Typically a single IP address represents a single connection. Yes there are providers who NAT multiple subscribers onto one IP but they are rare (because if they do that then they have to maintain NAT logs in order to identify criminal subscribers to law enforcement -- easier to just have 1:1 IP to subscriber).
2. Residential addresses are in fact not really dynamic. Yes they can change from time to time but for the most part they don't (see #1).
3. Cellular traffic can be identified because the cell carriers use specific identifiable netblocks.
4. It doesn't matter if not everyone uses the sampling service because that's the point of sampling.
Notice we applaud the careful report on a research report in exactly the same way we applaud the post-outage report.
A monitor of exactly how much traffic is used by ads vs content.
So if I load a page, say an article that is just text: what percentage of the bandwidth consumed is the content I'm interested in, versus the ads surrounding it?
The reason why this number is important is Mobile.
So a user signs up for "3 gigs of data" - how much of that 3 gigs is consumed by ads and shit they don't want/need?
Actually - it would be good to have a standard on reporting for any given page "this page weighs in at 50KB for content and 500KB for ads..."
Does this exist?
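Not as a standard, as far as I know, but a crude version is easy to build yourself: take the page's request log (e.g. from a HAR file), classify each request's hostname against an ad/tracker blocklist like the ones ad blockers maintain, and total the bytes. The domains and sizes below are made up.

```python
from urllib.parse import urlparse

# Stand-in for a real blocklist (e.g. the domains an ad blocker ships with).
AD_DOMAINS = {"ads.example.net", "tracker.example.com"}

def page_weight_report(requests):
    """Split a page's transferred bytes into content vs. ads.

    requests: list of (url, bytes_transferred) for every request the
    page triggered.
    """
    content = ads = 0
    for url, size in requests:
        if urlparse(url).hostname in AD_DOMAINS:
            ads += size
        else:
            content += size
    return {"content_bytes": content, "ad_bytes": ads}

page = [
    ("https://news.example.org/article", 50_000),
    ("https://ads.example.net/banner.js", 350_000),
    ("https://tracker.example.com/pixel.gif", 150_000),
]
print(page_weight_report(page))  # {'content_bytes': 50000, 'ad_bytes': 500000}
```

That hypothetical page is the "50KB of content, 500KB of ads" case described above, and on a metered mobile plan the ad share comes straight out of the user's 3 gigs.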
Previously they had to show state and fed gov't this info. Now they get to concentrate on providing access to the most profitable while ignoring the less profitable.
FiveThirtyEight's biggest mistake seems to be trusting an academic dataset when they had no idea how it was collected. This is understandable, especially when the data was published on Arizona State University's Center for Policy Informatics data portal. (You can go there right now and download the bad data - scroll to CATALIST DATA here https://policyinformatics.asu.edu/broadband-data-portal/data...) A university should be a trusted source. But FiveThirtyEight took an unbelievable outlier from this dataset and wrote an entire post about it (https://fivethirtyeight.com/features/lots-of-people-in-citie...). The dataset claims that only 29% of Washington D.C.'s adults have broadband. (The real number, according to the other datasets FiveThirtyEight looked at in the new post, is closer to 70%.) They even call out how extreme the Washington D.C. datapoint is on the histogram in the article, as the only large county with such a low percentage. That should have been a clue to question the data.
What I find worse is that the academic researchers published this dataset. They bought behavioral marketing data and trusted a salesperson that the variable HTIA (“Denotes interest in ‘high tech’ products and/or services as reported via Share Force. This would include personal computers and internet service providers. Blended with modeled data.”) was a good proxy for broadband access. To be clear, HTIA includes modeled data, which means they took demographics, voting records, and whatever other individual data they could grab (maybe they have records of your purchases, I'm just guessing), and predicted whether each adult in the US was interested in tech. This is the kind of data companies buy for ad campaigns, figuring that if they advertise to these adults, it might be better than random. There's no reason to think the aggregates of these numbers would be accurate or calibrated correctly, especially for an entirely different purpose (broadband vs high tech).
It's disturbing that these sort of datasets are floating out there in academia and really makes you wonder what other bad data is being blindly trusted to write blog posts, research papers, and news articles.
"After further reporting, we can no longer vouch for the academics’ data set. The preponderance of evidence we’ve collected has led us to conclude that it is fundamentally flawed.... The idea behind the stories was to demonstrate that broadband is not ubiquitous in the U.S. today, even as more of our lives and the economy go online. We stand by this sentiment and the on-the-ground reporting in the two stories even though we have lost confidence in the data set."
If the data you used to reach a conclusion is fundamentally flawed, it's pretty disingenuous to claim you stand by the sentiment. So they started with a conclusion, set out to prove it, later found the data they used to prove it was flawed, but still believe it's true.
The second thing I don't like is that readers seem very confused between access and usage, and the authors' sloppy wording often conflates the two. It appears they were studying usage (actual subscriptions), not access (availability of a high speed connection).
Lastly, they also seem to disregard an LTE wireless connection as usage of broadband, when I would have assumed it would clearly be counted. If LTE wireless is more commonly used as a form of access to broadband internet in certain areas (i.e., rural areas where density can't justify running fiber, or dense metro areas where the LTE is so good there's no need for a wire), then it's not surprising you'll find broadband "usage" is low in those areas, even if those households are absolutely using broadband internet through an LTE hotspot.
Nate Silver and 538 are fairly hardcore Bayesians, and this is how pretty much all Bayesian thinking works.
You start out with some prior "sentiment" (a.k.a. prior belief), and then use data to update your "sentiment".
In turn, invalid data means you revert to your original prior sentiment, and when new data arrives you start the Bayesian inference once again.
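A minimal Beta-Binomial sketch of that update-then-revert loop (all the numbers are invented for illustration): the prior encodes the "sentiment", data moves the belief, and throwing the data out just puts you back at the prior.

```python
def posterior(prior_alpha, prior_beta, successes, failures):
    """Beta-Binomial conjugate update:
    Beta(a, b) prior + observed counts -> Beta(a + successes, b + failures)."""
    return prior_alpha + successes, prior_beta + failures

def mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Prior sentiment: broadband coverage is probably low, around 30%.
a0, b0 = 3, 7
print(mean(a0, b0))            # 0.3

# A data set arrives claiming 28 of 100 sampled households are covered.
a1, b1 = posterior(a0, b0, 28, 72)
print(round(mean(a1, b1), 3))  # 0.282 -- belief shifts toward the data

# The data set turns out to be flawed: discard it, back to the prior.
print(mean(a0, b0))            # 0.3
```

That last line is the whole point of the argument above: losing confidence in the data doesn't force you to abandon the sentiment, it just returns you to the belief you held before the data arrived.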
Edit: Looking at the on-the-ground investigative reporting and the other sources and studies they've cited in the related articles, I actually agree that they have decent evidence to support their belief without these data sets.
I mean, ignoring the data sets, would you argue against the idea that many people in cities can't afford broadband or that many places in rural America have crummy internet infrastructure? Certainly, I'm less confident than I would be with the additional data, but my confidence is still relatively high without it.
I do agree with your other points. Mobile internet access--with or without hotspots--is unreasonably ignored. The conflation between access and usage was a lesser concern since I managed to navigate the articles well enough.
It's just a sentiment. I have a strong sentiment that LTE reception in the middle of the Sahara is terrible. I haven't collected any data whatsoever to support that sentiment, yet I fully stand by it. There's nothing disingenuous about that.
You do make a reasonable point about usage and access. Which is more relevant depends on what you're trying to show. Access is whether you can get it at all while usage could well bring in economics as well.
It's a fair point about LTE. They don't seem to have included satellite either, which definitely gets used in rural areas where nothing else is available. I suspect though that neither of these would affect overall conclusions that much.
Although, in some ways, what FiveThirtyEight is saying is worse than the Bellow quotation. They have not just "not considered," but have re-considered and decided to disregard the facts, while affirming what they originally wanted those facts to demonstrate. Just take their example of D.C., where they used a number that said there was 28.8% coverage, when in reality the coverage is 70%+. How can they then justify the sentiment that even metropolitan places like D.C. lack coverage?
The discrepancy, in their words, comes from totally different definitions: "We looked into it and found that the data set we used had a fundamentally different understanding of broadband access than other sources did." That FiveThirtyEight would stand by a sentiment derived from one definition of broadband coverage, and apply it to a definition that yields entirely different, indeed contradictory, data, seems irresponsible.
Good on them for writing this; it's important to admit when you're wrong. However, I feel this outlet has a larger responsibility to be actual data analysts as well as journalists (compared with, say, a more traditional journalist at a more traditional news outlet). As such, why was the analysis done in the postmortem article not done prior to publishing the original articles? A good analyst is one you can trust, and that trust is built by drawing conclusions from highly defensible data, meaning data that has undergone the analyst's severe scrutiny before conclusions are drawn, not after. Also, they should probably update the now erroneous articles with a disclaimer indicating that much of the research is invalid.
If you think of them as a news organization that uses data as its gimmick to sell page views, you will be less surprised by events like this (disappointed, yes; surprised, no). They have the same incentive to sensationalize things as a regular news organization. Their mission is not to increase the knowledge of the human race; their mission is to bring in page views to sell ads and make money.
It may sound like I'm coming down harshly on FiveThirtyEight. I genuinely enjoy reading some of their articles, but I make sure to remember what kind of organization they actually are and don't fall into the trap of thinking they're a think tank staffed with postdocs.
You basically end up arguing that it's all about money, and that real journalism cannot happen inside profit-seeking organizations. That also trivializes journalism's big challenge right now: how to do real journalism when the big tech players have vacuumed up the ad revenue.
I also wouldn't treat 538 as the same as CNN writing about a thing Trump tweeted. They actively talk about and discuss their journalistic goals. They try to be openly self-critical about what and how they cover topics. They are trying to compete by not doing the same thing as other organizations.
I'm a statistician, and we always work with investigators who are experts in their field (unless we're researching statistical methods, in which case we act as our own experts). The statisticians handle the data analysis and make sure the investigators don't make silly data mistakes. The investigators handle the reasoning and mechanisms behind the research. When they work together, they can collect good data that they're familiar with and know how to interpret correctly. When they work separately, they're prone to mistakes.
FiveThirtyEight (and also the ASU researchers) fell into this trap. They weren't involved in the data collection, so they didn't really know what each variable meant, and just took the word of someone who (back to the economic motives) has an incentive to sell their dataset rather than tell the researchers to go look elsewhere.
I'll admit I was too harsh in comparing FiveThirtyEight to an article about Trump's tweet; 538 typically does decent investigative journalism. However, I maintain that it isn't on the level of peer-reviewed academic research. Articles on 538 don't go through peer review. They aren't submitted and don't languish for months of revisions and answering follow-up questions. I'm not saying this mistake would necessarily have been caught by a referee, but in the peer-review process a referee could have caught it by asking questions about the analysis process and whether it's valid to use a variable named "X" as a proxy for a variable named "broadband access."
It's much, much worse than this. I was once interviewed about my research by an NPR reporter. He had already decided what his story was about and tried a variety of tricks to get me to say some pithy quote he had devised so he could use it on air. The problem was that my research actually debunked, rather than supported, the story he wanted to run, and the pithy quote was scientifically unsupportable.
I often wonder whether some of the quotes I hear on NPR are edited versions of the interviewees saying "no it wouldn't be accurate to say x" cut down to "x".
Also, they did update the erroneous articles:
Ah so they did, my mistake.
I think the idea is that we all make mistakes.
I am in total agreement that we all make mistakes, and it's refreshing when people own up to them. I guess my main question still stands, though. As a top-tier data journalism outlet, why was data QA not performed prior to analysis, or at least before publishing the results? And if data QA was performed, where did it fail? It just seems they got bitten by the time crunch to publish content amid the FCC repeal, and sloppy analysis was the result.