You are no longer providing value to our users. You will be quickly replaced with something that provides more value to our users.
Likely, what percentage of users come back to search and keep trying other results. From a simplistic point of view, the value proposition to users can be derived from what percentage had their query answered by that source. If WSJ is taking some number of visitors from Google who used to get a page that made them stop searching, and is now serving a page that does not, it is providing a less useful resource overall for Google users.
WSJ is providing objectively worse results on average for Google users than they were previously. It makes sense that would cause their ranking to drop.
With news content it's free versus a few hundred dollars a year (more if you want multiple paid sources). Unless you find that all of the free sources, including BBC, AP, etc., are so bad that they provide negative value and just waste your time, it's in your best interest (an infinite value-for-money ratio) to get by with those free sources; the information in most news articles is by and large equivalent. If Google can raise the higher-quality free sources to the top of the results, that provides much better value to users than suggesting paid sources. Most users, on finding a paid source, will just click the back button and pick another source, the paid source having wasted their time completely.
So, like the GP said, that's "a broken definition of 'value'" on the face!
Wouldn't you at least concede that if some sources do in fact provide negative value, it might be preferred to connect those sources that do consistently provide value, directly with even some small reward (like a subscription, maybe a per-article cost) in exchange for consistently providing positive value?
How do content creators make a return on their investments? Is it meant to be indirect, through advertising, or is it some other way I haven't thought of that will recoup their costs?
This is all nonsense anyway and I don't believe any of it. Information has to be free, and it's either all free or none of it is. The paywall is wrong; if it seeks to prevent us from sharing information then it will fail, and Google, as part of the system of freedom, is properly set up to route around the damage of censorship inflicted by asserters of copyrights. Paywalls that don't provide free information got moved down or delisted in the rankings, just like I think should happen.
If growth in your business model depends on broadening the subscriber-ship and thus reach of your own information, but also on limiting the proliferation of your own information, then it is wrong too. I don't know what this means for news media companies that have to turn a profit for their shareholders; I guess I can safely say their concerns are not my concerns.
Now if you'll excuse me, I'm going to go watch another episode of Black Mirror.
I don't think it's right that they carve out this valley between their legally afforded copyright protections and my fair-use rights, one designed to ensure I can never exercise those fair-use rights even after their copyrights have long expired (as if it were even possible for a copyright to expire anymore).
Yes, I am a little off-topic from the WSJ paywall, but it's all one discussion. How do content creators get paid in my ideal version of reality? At the pleasure of content consumers. What can content creators do when that's demonstrated not to be working? I don't know, but not restrictive digital rights management schemes, though.
There are no easy answers that would satisfy me as either a content consumer or creator.
You can get most of what WSJ writes somewhere else for free. You'll probably say that's not true because the value you get from WSJ is not news but their commentary, but even in this case that particular "value" is very subjective and for most people it's not valuable enough.
I for one don't care about whatever elite content they write. And I definitely don't care for a mere website wasting my time by making me click through just to find out I can't read it, over and over again.
But to build that loyalty, you need to get readers to come to your site in the first place. Policies that harm your performance in search results would seem to be contraindicated.
It's not even funny how you go from my argument, to "this makes it not valuable", to "a rejection of the concept of IP in different words".
Here, let me spell it out for you by copy and pasting the same comment:
> You can get most of what WSJ writes somewhere else for free. You'll probably say that's not true because the value you get from WSJ is not news but their commentary, but even in this case that particular "value" is very subjective and for most people it's not valuable enough.
- It is fact that you can get most of what WSJ writes somewhere else for free.
- And I said the "value" is very subjective, and it's not "valuable enough" for most people. Yet you go on and say I said "it's not valuable", then somehow go from that to accusing me of rejecting the concept of IP. I don't even know where to start.
"Strawman" is not a catchall for "I don't see all the steps in your argument for how my position X implied Y." I'd be glad to spell those out more explicitly.
>...It is fact that you can get most of what WSJ writes somewhere else for free.
It is a fact because people copy it. Therefore, you're saying it's not valuable because people can get it elsewhere because it was copied. So, the copy-ability made it valueless, exactly as I inferred from your argument (rather than misattributing it).
(late edit: Also, the fact that you insist on "atom-based" and "bit-based" being incomparable doesn't help your case that you're not rejecting IP: "Comparing the economics of a pure bit-based product (media) with a pure atom-based product (camera)".)
Okay, perhaps that is not what you meant and therefore it was a strawman -- but that's the only self-consistent, plausible reading I saw.
The only other meaning is that,
"You can get all the interesting facts contained in this story from different sources, without just copy-pasting."
Is that what you meant? If so, it's implausible on its face: why are people trying to circumvent the paywall if they didn't want the WSJ's article specifically? Why can't they just go somewhere else? Why do they load HN discussion with complaints about a paywall rather than "here's the free, independent version that's just as good"?
>And I said the "value" is very subjective. and it's not "valuable enough" for most people. Yet you go on and say I said "it's not valuable". Then somehow go from that to accusing me of rejecting the concept of IP. I don't even know where to start.
I said that because you dismissed the value on the grounds of it being "subjective", which was close enough in this context to saying "oh, I can't quite put a hard value on it, so I don't have to care about this journalism going away". As above, why don't people just find another non-copied source? Because they non-subjectively do want to look at this specific source.
With that said, I do agree that it may not be obvious how your position is tantamount to rejecting IP. But I was deriving that as an implication, not misattributing anything to you. Whether or not you recognize this position as implicitly rejecting of all IP, there is certainly a clear logical chain for how it has such an implication.
Edit: I know we're not supposed to talk about downvotes, could they at least wait the 60 seconds necessary to read this?
Because people come to HN, click on a link, see it's paywalled and leave a comment complaining about it, then move on to a different post. Most people who complain about the paywall likely aren't invested enough in the headline to find another source for the same information.
Personally I mostly skim HN for news. If an article is paywalled and isn't profoundly interesting to me, I can spend the time it would cost me to look up alternate sources for the same story just reading something else instead. I don't read WSJ articles because they're better, I read them because they're there. In fact, I actually prefer other news outlets.
You're misattributing motives to what boils down to mere laziness. And even in doing that you don't actually have a case because many news stories have alternative sources in the HN comments, especially when they're paywalled or too superficial.
So, yes, you're misattributing motives, which is literally how you just defined strawmen.
Social objects. The actual value of a particular article isn't the article itself, but the fact that you and other people in the thread have read the same article. In HN discussions, people will go around paywalls not because they can't find another source, but because they want to read the same source as everyone else.
With the exception of more creative editorial pieces, most news falls into this bucket.
If traditional media journalism was truly appreciated, then they wouldn't even have this problem. And honestly they have lost a lot of respect from people in the recent years because they have been doing a lot of things to undermine their own journalistic principles in order to make more money.
See also the sibling reply.
- WSJ subscriber, where WSJ results should be present and maybe even get a ranking boost.
- User happy with paywall results, these results could be surfaced, with UI treatment indicating paywall.
- User uninterested in paywall results, where these results would be essentially removed.
The ranking between the latter two of these user groups could be a bit of a sliding scale too, rather than two distinct groups.
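The three buckets and the sliding scale above could be sketched roughly as follows. This is a toy illustration only; the function, field names, affinity score (0.0 = never wants paywalled results, 1.0 = always fine with them), and multipliers are all made up, not anything Google actually does:

```python
def adjust_score(base_score, is_paywalled, paywall_affinity,
                 subscribed_domains=(), domain=""):
    """Scale a result's relevance score by the user's paywall tolerance.

    base_score: the query-relevance score before personalization.
    paywall_affinity: 0.0..1.0 sliding scale for non-subscribers.
    subscribed_domains: sites this user is known to subscribe to.
    """
    if not is_paywalled:
        return base_score
    if domain in subscribed_domains:
        # Subscriber case: keep the result, maybe even boost it.
        return base_score * 1.2
    # Everyone else: scale down smoothly; 0.0 effectively removes it.
    return base_score * paywall_affinity
```

The point of the continuous `paywall_affinity` is exactly the comment's sliding scale: there's no hard line between "happy with paywalls" and "uninterested", just a per-user weight.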
Setting "don't be evil" aside, Google's only responsibility is to Google.
But here's the rub. Google's product is its users' attention. If they bias their search results enough, (maybe) they lose product inventory.
How much is "enough"? Probably much more than filtering WSJ.
But IMO bias in a search engine – a presumed oracle of truth – is evil. So there's that.
Google itself is fueled by online ad revenue. Facebook has ads too, and it's trying to explore every angle (video, etc.) to get people to click.
Both companies know about your likes and interests when you are logged in (Facebook because you declared them, Google because of your past searches) and those of your friends (GMail contacts and GPlus / Hangouts connections in Google's case).
Facebook is more blatant about trying to monetize its ads and still does worse, because people use Google for search, and that's when their intent lines up with actually buying.
Thing is, ads are a commodity in a race to the bottom. You can see every company struggling with this, not just Wall Street Journal but Google. They keep going down. They try new things. Then the ad revenues drop again.
In the real world, you buy a product or a service and you pay for it. This doesn't go down nearly as much in price (although automation and outsourcing may make some impact on wages to locals). If you get paid a commission PER SALE, you've got a sustainable business.
David Heinemeier Hansson has been saying for years that you should be building $AAS and charging for it, instead of making everything free and slapping ads on it.
Another acquaintance of mine, Albert Wenger from USV, has a whole book (worldaftercapital.com) about how digital goods are essentially free to copy perfectly, leading to a post-scarcity world in digital goods, with only artificial constraints such as copyright enforcement propping up their price. In it he describes humanity's progression from hunter-gatherer societies through today, and explores what the economy will look like later.
So, if you're in the business of creating content, you're now competing in a GLOBAL market. WSJ and other publishers got to enjoy the distribution of their newspapers for decades with copyright protections, as did the music industry. But at the end of the day, when the means of copying information make it cheap to free, you've got to figure out other business models. The freemium one is a good trade-off. But really, why would I pay for WSJ after my free 3 articles a month are up? When there are so many other outlets reporting the same things? Because I like or want to support the WSJ specifically. Their brand. I did pay for a NYT subscription after all.
You know what I really value far more than WSJ and NYT? Wikipedia.
And how much does it cost to run it? Not much (aside from hosting) because people collectively build on each other's work and editors check new contributions to achieve a result that is quite good. People cooperate more than they compete, because the rules of the game are such.
I see such things as being superior to capitalist competition. Look at open source (Linux, Firefox) vs closed source (Windows, IE). The former runs on toasters, the latter doesn't and still over time is less stable. The former overtakes it in quality.
Why not as a society embrace WikiNews and other such sites? Why do we even need the old model of journalism? Because it will appeal to certain niches, just like newspapers used to circulate to small communities and not sold all over the world the next day. Now they'll be sold all over the world but their readership will shrink.
The information economy is, and has always been, different. Digital ads for digital websites are far different from commission sales of real-world items. 3D printing may one day bring the two worlds closer in some areas, but we are all still a long way away from that.
I'm not sure we should put the morality in Google's hands. If you want top journalism, you ought to pay for it, but it makes sense search engines provide the quickest path to information, and that's usually not a paywall.
Google shareholders value
If publishers want to attract people to get them to subscribe, they need to find other ways that don't violate Google's search policies that have been in place since seemingly forever around cloaking.
Generally speaking, paid journalism is not a sustainable business model for most entities. WSJ might be an exception, but they won't be getting any special Google treatment, nor should they.
Personally, I believe that the business model of most journalism sites will switch to a combination of sponsored content and the sale of Facebook and Google custom audiences. With retargeting, it's possible for an adveriser, say Microsoft, to tell a site like WSJ "we want to be able to advertise on Facebook/Google to people who read to the bottom of this article you wrote about cloud services" and pay WSJ for being able to use that custom audience. This kind of retargeting is already possible on Facebook and Google, but currently only limited to people that have visited your own site(s). Having a custom audience marketplace would be amazing for many advertisers and deliver badly needed revenue to publications.
There is specific discussion of Google as an example in that article, but the whole article is worth reading.
If News Corp wants search ads for its paid service, it can buy them like anyone else instead of expecting Google to treat them specially, which is what they are really asking for.
That could be interesting, though; a storefront with a model somewhere between Steam/Amazon and Netflix/Spotify. Somewhere to both collate the offerings which you can purchase, and highlight content from those outfits at the same time.
Maybe the market is big enough to accommodate that business model, maybe they're unique enough to do so, but many others have tried and failed.
In fact, that's a known winning strategy in a lot of industries, not just journalism: let the suckers rip each other to pieces in the race to the bottom, while you deliver enough value to have fat margins. That's what Apple does, for example.
I'd expect to see more paywalls going forward, not fewer. I think there's increasing recognition that the economics are better.
Google is built to optimize relevance against what people are searching for at a heuristic level (increasing the utility of their search engine based on each immediate choice people make, as opposed to a model like Facebook that tries to increase overall relevance of experience to get more time spent).
Most people who end up on WSJ are searching for quick, accurate, free information. The landing page (a full article) provided that, albeit in an unsustainable business model.
The vast majority of internet users are not looking to subscribe — which has become the main function of the landing page now. That means that the site is, on average, less relevant at a heuristic level.
But AIUI this is not what WSJ is requesting here. They want free ads for their product ranked high in the search results.
WSJ is slanted, I dunno, recently I feel like anything Murdoch touches gets kinda ruined.
If anyone controls the NWO, it's Rupert.
More to the point, though, if you have an affinity for WSJ and similar content, yes, Google will probably pick that up over time, though getting a big enough boost to outweigh the cloaking penalty completely may be difficult.
OTOH, if the WSJ is useful to you even with a hard paywall—that is, if you are a paying subscriber—you'll presumably have it bookmarked and it will be one of your go-to direct sources for news; you won't need discovery through Google to find content there very often.
You can check it out yourself by searching for "SGML" and compare my site http://sgmljs.net/docs/sgmlrefman.html.
- You have no/little sites or forums linking to your content.
- Your page titles are uninformative. Biggest offender is probably the homepage, with a page title of "index". But even for your reference page, it is just "Syntax Reference" (way too general), and Google actually uses your page headings to repair this to "SGML Syntax Reference". Try inverse breadcrumb style "SGML Syntax Reference | Docs | SGML.js". BTW: you ranked 2nd for "SGML Syntax Reference".
- Suspicion: content that isn't visible (like that in the content slider) is ranked lower than always-visible content; Chrome's headless crawler can detect this. Add the slider content as regular text to your homepage, and also try to expand the content there. Include links to your latest blog posts.
- I prefer hierarchical headings, not just sections with <h1> for everything. This is because hierarchical headings cannot hurt, but non-hierarchical headings could.
- Finally, SGML being a standard, there are simply a lot of competitors for this keyword. These competitors are not commercial competitors, but authoritative websites with lots of informative content. Exactly the sites that Google likes to rank high. If you want to rank for SGML, you may be fighting an uphill battle.
I'm aware of some of the issues you mentioned, but don't you think my site, with the depth of information provided, deserves at least a mention among the other ~200 ones? I'll try and fix the heading issues first, then see if search results improve.
It seems like your complaint is that adding one page of reference info wasn't enough to serve as an ad for your business. It doesn't seem like a valid complaint to me.
Couldn't you argue that if the free "ripoff" has the same information and is more easily accessible it should be considered more relevant?
If users don't have access to info because it is restricted to paying users, google won't have access either.
In the video the guy explains that while you are not allowed to treat Googlebot's requests in a way no other visitor is treated, it is okay to differentiate between "boxes" of users. In their example the box is a country (the USA), but if you define a "Google user" box and let all users coming from Google into it, it is okay to bundle Googlebot with those. Grey area for sure, but might makes right.
"So geolocation, that is, looking at the IP address and reacting to that-- is totally, fine, as long as you're not reacting specifically to the IP of just Googlebot, just that very narrow range".
Also, they will crawl you from an unusual IP using a user-agent that doesn't say it's Google. And when that happens, and you deny access to undercover-Googlebot, but allow Googlebot in full uniform, you'll be penalized for cloaking.
They would have to react to just gbot when constructing the special google urls when the bot is crawling the site, tho.
This is how they were originally handling it, before February. The article would display if you visited the link from Google, or set your referer so it looks like you did (this is why HN has the "web" link under articles), even if you weren't a subscriber. It's allowed, because regular users coming from Google see the same thing as Googlebot.
WSJ have since changed that, so only subscribers can view articles, and you no longer get a "free click" coming from Google Search as Google calls it. They now show a short snippet, and are following guidelines to be labeled a “subscription” service by Google Search. This caused their rankings to drop below being a "free" news source though. But it's not nearly as bad as if they had cloaked Google.
This way WSJ would be showing the same (www.wsj.com/Bezos-with-hair) to every user.
*based on the video
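A minimal sketch of that pre-February "first click free" behavior, based only on the description above: anyone arriving with a Google referer saw the full article, so Google users and Googlebot were treated identically. The function and the exact domain matching are my own illustration, not WSJ's actual code:

```python
from urllib.parse import urlparse

def show_full_article(referer, is_subscriber):
    """Decide whether to serve the full article or the teaser.

    referer: the HTTP Referer header value (may be None or empty).
    is_subscriber: whether the visitor has a paid account.
    """
    if is_subscriber:
        return True
    host = urlparse(referer or "").netloc.lower()
    # Treat any arrival from a Google search domain as a free click.
    # This is exactly why spoofing the referer also worked.
    return host == "www.google.com" or host.endswith(".google.com")
```

Note how the spoofability is built in: the server only sees the Referer header, which any client can set, which is what the HN "web" link exploited.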
Google's rules and help documents are spread all over, but here's some from Google News about subscriptions:
"If you prefer this option, please display a snippet of your article that is at least 80 words long and includes either an excerpt or a summary of the specific article. Since we do not permit "cloaking" -- the practice of showing Googlebot a full version of your article while showing users the subscription or registration version -- we will only crawl and display your content based on the article snippets you provide."
edit, forgot link: https://support.google.com/news/publisher/answer/40543?hl=en
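The 80-word rule from that quoted help page is mechanically simple: publish roughly the first N words of the article as the crawlable snippet. A trivial sketch, with a function name and ellipsis marker of my own choosing:

```python
def make_snippet(article_text, min_words=80):
    """Return the first `min_words` words of an article as a snippet,
    appending an ellipsis when the article was truncated."""
    words = article_text.split()
    snippet = " ".join(words[:min_words])
    if len(words) > min_words:
        snippet += "…"
    return snippet
```

In practice a publisher would also want to cut at a sentence boundary, but the quoted guideline only sets a word-count floor.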
That's a new one. Buy a subscription to the WSJ now for a song.
For instance, if you happened to be a logged-in WSJ user, then Google could show you the result based on your cookies.
I did not say that.
I said google could display the results to which you have access, based on your cookies. Assuming they had access to crawl the whole article.
It is not possible for Google to get access to cookies for other sites, anyway. This is a pretty fundamentally important security restriction that browsers implement to protect you from nefarious sites. So it isn't possible for Google to know which sites you have paid accounts with unless you explicitly tell it.
How would that work? Would Google create or be given a login for WSJ? That would benefit WSJ as an incumbent news provider at the expense of startups too new or small to get special treatment from Google.
How would a new website indicate to all search crawlers (so Google doesn't benefit at the expense of other search engines) how to get access to its pay-for content in such a way that end users cannot also pretend to be search crawlers and get access to the same content?
Sure, some people could set up GCP instances and proxy it, but that's a very tiny percentage of people.
DNS lookups are far more expensive to perform than an IP filter, and couldn't be done in real time. So WSJ would have to set up a system where they regularly find all rejected requests with a Googlebot user-agent in their logs, do DNS lookups on the IPs, and add any that are valid to a whitelist so that they won't get rejected again. This would cause new Googlebot IPs to get rejected until the whitelist is updated, hurting indexing and ranking. The WSJ would also have to go through their whitelist regularly and do DNS lookups to verify that all of those IPs are still valid Googlebot IPs, removing any that aren't. That opens a window for invalid IPs to continue to get access, which may or may not be a problem depending on how often IPs change and where they get reassigned to.
The IP whitelist would need to be distributed to WSJ's webserver farms and used to update firewall rules, in an automated way that may or may not integrate with how that stuff is currently managed. (Generally, those rules would be tightly controlled in a big org like the WSJ.) The HTTP access log gathering from the farms and their analysis would also need to be automated, which again might be a management issue if the logs contain anything sensitive. (Like, I don't know, records of particular individuals reading particular stories which certain government agencies might be interested in acquiring without the hassle of a warrant.)
So yeah, there's a way to find out if an IP belongs to Googlebot. That's a long way from a manageable filtering solution at the WSJ's scale, even if Google wouldn't penalize them for doing it, which they would.
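For reference, the verification step itself is small; as the comments above say, the real burden is operating it at scale. Google's documented check is: reverse-DNS the IP, confirm the name ends in googlebot.com or google.com, then forward-resolve that name and confirm it maps back to the same IP. The sketch below injects the lookup functions so the logic can be shown without live DNS; in production you'd wrap `socket.gethostbyaddr` / `socket.gethostbyname` and handle `socket.herror`:

```python
def is_googlebot(ip, reverse_lookup, forward_lookup):
    """Verify a claimed Googlebot IP via reverse + forward DNS.

    reverse_lookup(ip) -> hostname, forward_lookup(hostname) -> ip.
    Injected as parameters so the check is testable offline.
    """
    host = reverse_lookup(ip)
    # Suffix check: the reverse name must be under Google's crawler domains.
    if not host or not host.endswith((".googlebot.com", ".google.com")):
        return False
    # Forward-confirm: the hostname must resolve back to the original IP,
    # otherwise anyone controlling reverse DNS for their IP could spoof it.
    return forward_lookup(host) == ip
```

The forward-confirmation step is the important one: reverse DNS alone is attacker-controlled, since whoever owns the IP block can make it say anything.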
After the Journal’s free articles went behind a paywall, Google’s bot only saw the first few paragraphs and started ranking them lower, limiting the Journal’s viewership.
Yes, of course technically that is an accurate description.
But although I am a user of Google, I don't like the idea of them thinking of me as their asset, even though I obviously am.
And irrationally because Google feels like such an omni-present utility, my intuitive expectation is that they index the web in a non preferential way.
Which still doesn't make any sense, because the entire utility of their search results is because they weight what they index.
There's some cognitive dissonance here, but I can't put my finger on exactly what it is.
It's the cake... you can't have your cake, and eat it too.
The commoditization of search can't come soon enough. Google shouldn't be able to monopolize this space.
To go out in the streets and talk to people, dig deep into obfuscated government archives, and actually make sense of the world takes real work. It's often work that results in content that isn't always just what their readers want to see.
That's not how it works.
That's not how anything in life works.
Allow people interested in paying for content to pay for content, and then use the discovery mechanisms the content owner makes availlable for that content.
If the content owner wants to promote content that isn't available to the general public via search engines, they can buy search ads like anyone else.
They don't need a subsidy at Google's (and, in terms of time, Google users') expense by way of organic search results for content that most search users will not be able to access when they click the result.
> The commoditization of search can't come soon enough. Google shouldn't be able to monopolize this space.
Commoditization of search (which means heavy competition focussed on minimizing costs) isn't going to make it any more likely that competing public search providers are going to subsidize paywalled content that their users can't use.
Are you sure it's true?
Are you sure it's not the opposite?
Could either or both be true?
We must critically examine the structural incentives we put in place that support shallow outrage porn over deep, well-sourced, investigative journalism.
Killing search isn't going to add more "real", "substantial" content online.
Without good search, there's a discovery problem. How will people find quality content easily? Where'd you get "killing search?" Search clearly provides value.
From the fact that I'm searching for e.g. a black-and-white old Western movie, it doesn't follow that I only want movies I can view on the internet right this second. Edit: I could very well be satisfied with "here are the names of popular ones, but you'd have to go to their publishers since they're long out of print."
Update those periodically (hours / days / weeks). The adverts don't change particularly quickly.
On the flip side, some people can't change their IP addresses easily, and getting IP banned (even if rare because of the reasons you stated) is actually a major hassle when it actually happens for those people. :/
If only they all did this. So many sites I get to and they're a blank page or an absolute disaster....
However, I don't see cache links on Google :(
Edit: Oops, I'm wrong. The article does say that the Google bot only sees the first paragraph or so.
"The reason: Google search results are based on an algorithm that scans the internet for free content. After the Journal’s free articles went behind a paywall, Google’s bot only saw the first few paragraphs and started ranking them lower, limiting the Journal’s viewership."
So maybe this is why there's no Google cache.
Also, if Google can only index the first few paragraphs, the results are much less comprehensive.
Of course, the problem with nuclear is collateral damage. Drop the bomb and ads don't work, but neither does a lot of other stuff. E.g., the site shows a blank screen, images are invisible or blurry, drop-down menus don't drop. And, of course, the deal-breaker: videos don't play.
You can completely change how a site presents. E.g., turn a slide show stuck in a static slide window, one that barely moves due to the background ad-tech load, into a set of `divs` that roll upwards as your finger swipes.
It's a hobby at best. Disabling ad-tech components by origin is the practical option.
I used to play around with filtering sites to make them less antisocial, but find that slog less entertaining these days. So now when confronted with a site that's useless without JS, eh, there's almost always another site out there that doesn't mind the terms I demand for my attention.
Sometimes I do a Google Image search because I found something interesting but don't know what it is, so I'm hoping to land on a page that describes what I was looking at. Pinterest shows up as a result, but with no backstory, nor does it lead me back to a source, so it's worthless as a search result. It's the ExpertsExchange of image searches.
Endy's advice of using `-inurl:pinterest` seems invaluable and I'll be adding that for all image searches in the future.
Pinterest needs more competition.
In the context of Google being the alternative, this is sad and funny at the same time.
Hacker News in the abstract: The web needs more decentralization.
Hacker News IRL: Let Google own every vertical.
That said, there aren't many players in a position to provide actual competition to Pinterest like Google can. The obvious concern is where you draw the line at anti-competitive if Google tried to do the equivalent of what they did with Yelp ratings.
Would also love to have one that rewrites URLs in search results to avoid the frequent 5-second pause when Google's redirector gets its head stuck up its ass or whatever the problem is.
Note that it doesn't block some spam domains that sneakily use certain special characters in their domain names. Unfortunately Google hasn't fixed this issue forever.
And for Firefox: https://addons.mozilla.org/en-US/firefox/addon/personal-bloc...
I want a world that supports business models other than web advertising. It negatively impacts journalistic integrity and freedom while further exacerbating the race to the bottom search engines and other content aggregators create.
WSJ is responding poorly to a bad situation. I suspect it'll cost them.
http://brave.com/ is one approach.
Eg, have newspapers _ever_ had integrity, and if they _did_, what's different?
They did, somewhat, before the massive corporate consolidation starting in, IIRC, the late 1970s, when newsrooms started getting axed and the major dailies progressively became skins over wire services and lightly rewritten press releases.
The internet often gets the blame, but it providing actual competition was decades after the terminal quality and subscribership decline of American newspapers began.
It actually was the internet competition (both wire services becoming directly available to readers and the loss of advertising) that got some of them talking about building up newsrooms, rebooting investigative journalism, and relying more on subscription income. (Paid subscriptions never paid the bills before; they were pursued as a key metric advertisers used in determining how much it was worth to advertise in a paper.)
Multiple revenue streams (sales, classifieds) would make them less beholden to advertisers.
Of course they would still have to write material that sold!
I don't like to use John Oliver as a source, but there's some decent content in this:
If Steve Jobs were still alive, I'd bet Apple would be working on a competing search engine with some of these features.
krschultz's comment is really relevant as well. In a complete system, search should actually know about what you subscribe to already, and not penalise those results for you.
I don't want search services to know what I subscribe to. That's private information.
Steve Jobs reinvented PCs, reinvented mobile. I think "the next Steve Jobs" could do the same thing for search. I'm less and less certain of Google's monopoly on that space going forward. It's still built around Web 1.0 tech, has hacks into Web 2.0, but there's a Web 3.0 it's not ready for.
Free zero-revenue startup idea: there'll be an IP-over-ham-radio or something to preserve "internet classic". (Largest use will be bitcoin-for-pornography).
There's got to be a startup idea in there somewhere.
To put this in perspective: there are millions of sites out there and just one major search engine. Yeah, I think I know who has leverage here.
It specifically mentions cloaking.
> Webspam pages try to get better placement in Google's search results by using various tricks such as hidden text, doorway pages, cloaking, or sneaky redirects. These techniques attempt to compromise the quality of our results and degrade the search experience for everyone.
Facebook doesn't even need Google, their users just visit the site directly.
I guess Mark Zuckerberg doesn't lose sleep over this.
- hidden text,
- doorway pages,
- cloaking, or
- sneaky redirects
They just show a big popover nagging you to log in. But you can click this away.
If certain Facebook content pages rank low, or do not rank at all, it is because Facebook actively blocks Googlebot from accessing the content, not because Facebook is trying to deceive Google (or the user).
Though Facebook does not need Google, it could get quite a lot more visitors if it lowered the wall of its garden a bit. As is, Facebook is an inaccessible social echo chamber, and I don't lose any sleep over this.
For the record, if anybody needs a draggable WSJ paywall bypass bookmarklet, I put one up here:
WSJ subscribers, online and print combined, are in the low single digit millions. Google search monthly unique users are about three orders of magnitude greater. WSJ online subscribers are close enough to 0% of Google's users as to make no difference.
So, that "some" is essentially all.
If we take just the population of the US in 2017 (326.5M), and assume every American searches via Google for their news, we're looking at give or take 1% of the US with the WSJ subscriber estimate you provided (~3M).
We can refine these numbers further...
19.4% of the US population is 14 years of age or younger (a cutoff of 18 would be better, but I couldn't find that figure) - so that gives ~263M potential American news readers
The Pew Research Center http://www.journalism.org/2016/07/07/pathways-to-news/ states that about 38% of adults in 2016 often get news online (~99M)
That means WSJ subscribers make up roughly 3% of US online news readers.
So that's not a tiny number (yes the number could be adjusted for worldwide English speaking news googlers - but I think I've made my point).
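The estimate above can be checked with a quick back-of-envelope calculation. All figures come from the parent comments (population, under-15 share, the Pew online-news share, and the ~3M subscriber estimate); treat them as the commenter's assumptions rather than verified data.

```javascript
// Back-of-envelope check of the parent comment's estimate.
const usPopulation = 326.5e6;      // US population, 2017 (from the comment)
const under15Share = 0.194;        // share aged 14 or younger
const adults = usPopulation * (1 - under15Share);   // ~263M potential readers
const onlineNewsShare = 0.38;      // adults who often get news online (Pew, 2016)
const onlineNewsReaders = adults * onlineNewsShare; // ~100M
const wsjSubscribers = 3e6;        // "low single digit millions" estimate
const share = wsjSubscribers / onlineNewsReaders;
console.log((share * 100).toFixed(1) + "%");        // ≈ 3.0%
```

So the ~3% figure holds up arithmetically, given those inputs.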
How many of these subscribers are wealthy and coveted by advertisers?
What if more news publishers follow WSJ and you happen to be a subscriber of that content?
With the amount of information that Google has on its users, I don't see why it can't adjust search results based on whether or not you subscribe - and bring value to whichever side of the paywall you reside on.
There's an interesting difference between overlay paywalls like Wired uses, and content-not-loaded walls like WSJ uses. In the Wired case, the text is sent to you but they try to stop you from reading it. In the WSJ case, they don't even send you the text of the page you supposedly clicked on.
Since we're in the second case, this isn't even a decision by Google. The WSJ actually isn't sending you the data in the search result snippet, so the crawler rightly says "wow, nothing useful here". The complex, ideal solution might let me tell Google "search as though I'm a WSJ member", but short of that they're accurately assessing what content is actually available.
I could imagine Google adding some sort of "content is locked behind a paywall" indicator on search results, but if I'm searching for something on the web, a link to blocked content is not very helpful most of the time.
At least being able to know it exists can help consumers decide if they should pay for an article/subscription.
That's called advertising, but the WSJ will have to buy ads like everyone else.
Not sure if they track this but whatever the Googlebot's view of the site's content, if it's constantly bouncing users back to the search page it should get hit with a hard penalty.
For example, Google scholar will search papers that are behind a paywall. By blocking them, I wouldn't even know what to purchase.
This is Google foisting its business model on us.
Just append site:twitter.com to your Google search, click through, and voila.
Clicking the bookmarklet when you're on a WSJ article will shunt the url through Facebook's redirect service, which will allow you to view the article.
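For the curious, a minimal sketch of how such a bookmarklet works: it wraps the current article URL in Facebook's link-redirect endpoint so the site sees a Facebook referrer. The `l.facebook.com/l.php?u=...` endpoint name is an assumption based on how Facebook's link shim worked at the time; the actual bookmarklet linked above may differ in details.

```javascript
// Build a Facebook link-shim URL for a given article URL.
// Opening this URL makes the request arrive with a Facebook referrer,
// which some paywalls (at the time) treated as a free social visit.
function facebookRedirectUrl(articleUrl) {
  return "https://l.facebook.com/l.php?u=" + encodeURIComponent(articleUrl);
}

// As a bookmarklet, the same idea is a one-liner saved as a bookmark URL:
// javascript:location.href='https://l.facebook.com/l.php?u='+encodeURIComponent(location.href);
```

The `encodeURIComponent` call matters: without it, query strings in the article URL would be interpreted as parameters of the redirect URL itself.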
Similar to how software companies release free software to augment what makes them money, Bloomberg is able to spend a lot of money on producing content that is sponsored by their terminal subscriptions.
The WSJ might be in a unique situation where their primary audience will pay, often because companies foot the bill for employees. So perhaps they can be one of the few news-producing companies that doesn't have to depend on Google for traffic, since their primary audience loads up their front page multiple times a day just to see what's there.
I wouldn't be surprised if they did a deal with Bloomberg to provide their content on terminals to further strengthen their ties to their core audience.
To be honest, that's the past and present of most news as well.
Newspapers would never have survived on advertising alone. The cover price usually paid for printing and delivery, and classifieds made up the bulk of the revenue. Now that they're decoupled and both news and classifieds are pretty much free, it's no wonder that news is struggling.
Hard news has (almost) never been a wildly profitable endeavor in and of itself.
No idea how we get there, though. You basically need to persuade those who understand how critical journalism is to freedom to care to fund it, and to have a centralized platform to fund journalism that itself is not corruptible by monied interests trying to push propaganda.
Why is a centralized platform necessary? Why isn't, e.g. Patreon, for individual journalists or individual teams of journalists sufficient?
We got Fox/CNN etc because journalism in service to advertisers = clickbait.
At this point, I think we can codify this: Journalism + Ad-Supported-Model = Clickbait.
Good journalism coming from firms which advertise is by accident. Like a broken clock being right twice a day.
I love subscription journalism because you're not playing the clickbait game.
There is no future for journalism in ad supported models. Plenty of future for infotainment, heck, infotainment masquerading as journalism might just take it all over, but journalism will certainly not be a part of that fold.
I am just surprised that sites don't do this already. It's par for the course with a lot of software, so why not news or similar?