WSJ Ends Google Users' Free Ride, Then Fades in Search Results (bloomberg.com)
469 points by mudil on June 5, 2017 | hide | past | favorite | 387 comments

Makes sense from Google's point of view.

You are no longer providing value to our users. You will be quickly replaced with something that provides more value to our users.

But what if you are the kind of person that wants to pay for good journalism? Will Google figure this out from your history, and rank WSJ higher?

Then go to that source, not to Google. If you go to Google, expect to find content that provides Google users value.

As a Google user, I value it because it finds things. Some of those things cost me money. To make this visceral: what differentiates Google finding me an expensive camera from B&H that matches my search criteria and helps me get my work done from Google finding me an expensive article from the Wall Street Journal that does the same? If Google wants to optimize for "lowest price", it should make that a non-default criterion, as otherwise it is just helping me find cheap, low-quality crap: if that is providing you value, you probably have a broken definition of "value" :/.

> To make this visceral: what differentiates Google finding me an expensive camera from B&H that matches my search criteria and helps me get my work done from Google finding me an expensive article from the Wall Street Journal that does the same?

Likely, what percentage of users come back to search and keep trying other results. From a simplistic point of view, the value proposition to users can be derived from what percentage had their query answered by that source. If WSJ used to give visitors from Google a page that caused them to stop searching, and now provides a page that does not, it is a less useful resource overall for Google users.

WSJ is providing objectively worse results on average for Google users than they were previously. It makes sense that would cause their ranking to drop.
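The bounce-back signal described above can be sketched as a per-source "pogo-stick" rate over a click log. This is purely illustrative (the field names and figures are invented, and this is in no way Google's actual ranking code):

```python
from collections import defaultdict

def pogo_stick_rates(click_log):
    """For each result source, compute the fraction of clicks that were
    followed by a return to the results page (i.e. the page did not
    answer the query). Lower is better for the source."""
    clicks = defaultdict(int)
    returns = defaultdict(int)
    for source, returned_to_results in click_log:
        clicks[source] += 1
        if returned_to_results:
            returns[source] += 1
    return {source: returns[source] / clicks[source] for source in clicks}

# Toy log: (source clicked, did the user come back and keep searching?)
log = [("wsj.com", True), ("wsj.com", True), ("wsj.com", False),
       ("bbc.com", False), ("bbc.com", False), ("bbc.com", True)]
rates = pogo_stick_rates(log)
```

A paywalled page that stops answering queries would see its rate climb toward 1.0, and a ranker using such a signal would demote it accordingly.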

You nailed it. It's in Google's interest to provide the best user experience and a large part of that is optimizing for the "best" result. Whether or not a user returns to search results, or executes a subsequent search are some of the many signals used to determine this.

It's a great metric for a lot of things, but definitely not for determining the value of a piece of journalistic writing. Here we're saying that the article SHOULD be free in order to have value. So basically we're saying: it's ok to attribute no value to it if Google thinks so. I'm not a fan of hard paywalls, but this is definitely a fallacy in the definition of value, and Google shouldn't have it so easy.

Google isn't determining the value of paywalled articles, the general population is. Google is just facilitating the desires of users. The obstructive thing for Google to do would be to put WSJ near the top because Google thinks it's valuable, even though the users find it useless.

The problem with your assumption there is that both of those things cost money. The benefit ratio for a cheap, but crappy camera to an expensive one might be $250 to $1000, about a 4x ratio, that's a fairly low, finite and often measurable improvement on that camera. That ratio might be acceptable for someone who wants a high quality camera.

With news content it's free vs. a few hundred dollars a year, more if you want multiple paid sources. Unless you find that all of the free sources, including BBC, AP, etc., are so bad that they provide negative value and just waste your time, it's in your best interest (an infinite ratio) to get by with those free sources; ultimately the information in most news articles is by and large equivalent. If Google is able to raise the higher-quality free sources to the top of results, that provides much better value to users than suggesting paid sources - most users, on finding a paid source, will just click the back button and pick another source, the paid source having wasted their time completely.

> Unless you find that all of the free sources, including BBC, AP, etc., are so bad that they provide negative value and just waste your time, it's in your best interest (an infinite ratio) to get by with those free sources

So, like the GP said, that's "a broken definition of 'value'" on the face!

Wouldn't you at least concede that if some sources do in fact provide negative value, it might be preferable to connect the sources that do consistently provide value directly with even some small reward (like a subscription, maybe a per-article cost) in exchange for consistently providing positive value?

How do content creators make a return on their investments? Is it meant to be indirect, through advertising, or is it some other way I haven't thought of that will recoup their costs?

This is all nonsense anyway and I don't believe any of it. Information has to be free, and it's either all free or none of it is. The paywall is wrong; if it seeks to prevent us from sharing information, it will fail, and Google, as part of the system of freedom, is properly set up to route around the damage of censorship inflicted by asserters of copyrights. Paywalls that don't provide free information should be moved down or delisted in the rankings, which is what I think should happen.

If growth in your business model depends on broadening the subscriber-ship and thus reach of your own information, but also on limiting the proliferation of your own information, then it is wrong too. I don't know what this means for news media companies that have to turn a profit for their shareholders; I guess I can safely say their concerns are not my concerns.

Now if you'll excuse me, I'm going to go watch another episode of Black Mirror.

People are downvoting but I don't know if you disagree with the first half of what I said, or the second half. It would be helpful if you responded!

I think people might be downvoting because you're sort of all over the place and expressing 2 different opinions which seem contradictory.

I think that content creators should get paid, but I don't want them to try to put DRM chips in our brains to make sure we're all paying for whatever information we consume.

I don't think it's right that they carve out this valley between their legally afforded copyright protections and my fair-use rights, a valley designed to ensure I can never use my fair-use rights even after their copyrights are long expired (as if it were even possible for a copyright to expire anymore).

Yes, I am a little off-topic from the WSJ paywall, but it's all one discussion. How do content creators get paid in my ideal version of reality? At the pleasure of content consumers. What can content creators do when that's demonstrated not to be working? I don't know, but not restrictive digital rights management schemes.

There are no easy answers that would satisfy me as either a content consumer or creator.

Comparing the economics of a pure bit-based product (media) with a pure atom-based product (camera) is not the most convincing way to make an argument in my opinion.

You can get most of what WSJ writes somewhere else for free. You'll probably say that's not true because the value you get from WSJ is not news but their commentary, but even in this case that particular "value" is very subjective and for most people it's not valuable enough.

I for one don't care about whatever elite content they write. And I definitely don't care for a mere website wasting my time by making me click through just to find out I can't read it, over and over again.

That's probably the biggest concern for the WSJ and other news sources that might follow their lead. Most news coverage--excluding content like features, initial exclusives, and commentary--can be considered a substitute good. You'll find it on other sites. Unless you've built a sort of loyalty or trust with a reader, they'll just go elsewhere.

But to build that loyalty, you need to get readers to come to your site in the first place. Policies that harm your performance in search results would seem to be contraindicated.

So you believe that there is such a thing as a news report without commentary?


So, because your product is easy to copy, that makes it not valuable. A rejection of the concept of IP with different words.

I don't know if you have heard of the "strawman fallacy", but you and the other guy below saying "So you believe that there is such a thing as a news report without commentary?" have fallen into that trap.

It's not even funny how you go from my argument to "This makes it not valuable" to "A rejection of the concept of IP with different words".

Here, let me spell it out for you by copy and pasting the same comment:

> You can get most of what WSJ writes somewhere else for free. You'll probably say that's not true because the value you get from WSJ is not news but their commentary, but even in this case that particular "value" is very subjective and for most people it's not valuable enough.

- It is a fact that you can get most of what WSJ writes somewhere else for free.

- And I said the "value" is very subjective, and that it's not "valuable enough" for most people. Yet you go on to say I said "it's not valuable". Then you somehow go from that to accusing me of rejecting the concept of IP. I don't even know where to start.

The strawman fallacy is when you misattribute a position to someone.

"Strawman" is not a catchall for "I don't see all the steps in your argument for how my position X implied Y." I'd be glad to spell those steps out more explicitly.

>...It is a fact that you can get most of what WSJ writes somewhere else for free.

It is a fact because people copy it. Therefore, you're saying it's not valuable because people can get it elsewhere because it was copied. So, the copy-ability made it valueless, exactly as I inferred from your argument (rather than misattributing it).

(late edit: Also, the fact that you insist on "atom-based" and "bit-based" being incomparable doesn't help your case that you're not rejecting IP: "Comparing the economics of a pure bit-based product (media) with a pure atom-based product (camera)".)

Okay, perhaps that is not what you meant and therefore it was a strawman -- but that's the only self-consistent, plausible reading I saw.

The only other meaning is that,

"You can get all the interesting facts contained in this story from different sources, without just copy-pasting."

Is that what you meant? If so, it's implausible on its face: why are people trying to circumvent the paywall if they didn't want the WSJ's article specifically? Why can't they just go somewhere else? Why do they load HN discussion with complaints about a paywall rather than "here's the free, independent version that's just as good"?

>And I said the "value" is very subjective. and it's not "valuable enough" for most people. Yet you go on and say I said "it's not valuable". Then somehow go from that to accusing me of rejecting the concept of IP. I don't even know where to start.

I said that because you dismissed the value on the grounds of it being "subjective", which was close enough in this context to saying "oh, I can't quite put a hard value on it, so I don't have to care about this journalism going away". As above, why don't people just find another non-copied source? Because they non-subjectively do want to look at this specific source.

With that said, I do agree that it may not be obvious how your position is tantamount to rejecting IP. But I was deriving that as an implication, not misattributing anything to you. Whether or not you recognize this position as implicitly rejecting of all IP, there is certainly a clear logical chain for how it has such an implication.

Edit: I know we're not supposed to talk about downvotes, could they at least wait the 60 seconds necessary to read this?

> why are people trying to circumvent the paywall if they didn't want the WSJ's article specifically? Why can't they just go somewhere else? Why do they load HN discussion with complaints about a paywall rather than "here's the free, independent version that's just as good"?

Because people come to HN, click on a link, see it's paywalled and leave a comment complaining about it, then move on to a different post. Most people who complain about the paywall likely aren't invested enough in the headline to find another source for the same information.

Personally I mostly skim HN for news. If an article is paywalled and isn't profoundly interesting to me, I can spend the time it would cost me to look up alternate sources for the same story just reading something else instead. I don't read WSJ articles because they're better, I read them because they're there. In fact, I actually prefer other news outlets.

You're misattributing motives to what boils down to mere laziness. And even in doing that you don't actually have a case because many news stories have alternative sources in the HN comments, especially when they're paywalled or too superficial.

So, yes, you're misattributing motives, which is literally how you just defined strawmen.

> If so, it's implausible on its face: why are people trying to circumvent the paywall if they didn't want the WSJ's article specifically? Why can't they just go somewhere else? Why do they load HN discussion with complaints about a paywall rather than "here's the free, independent version that's just as good"?

Social objects. The actual value of a particular article isn't the article itself, but the fact that you and other people in the thread have read the same article. In HN discussions, people will go around paywalls not because they can't find another source, but because they want to read the same source as everyone else.

It's not just that the product itself is easy to copy; it's that the facts themselves are not copyrightable, only the writing is. That means a source without that restriction provides the value without the cost.

With the exception of more creative editorial pieces, most news falls into this bucket.

You're still saying that the work involved in collecting those facts -- journalism -- is without value, and that if no one bothers to do it professionally because their work will just be instacopied, so be it.

You can be the person who just complains about how people don't appreciate journalism, or you can acknowledge the reality and think in more productive ways. The ones who make a difference are generally the ones who think more productively instead of just blaming the people who "don't get it".

If traditional media journalism was truly appreciated, then they wouldn't even have this problem. And honestly they have lost a lot of respect from people in the recent years because they have been doing a lot of things to undermine their own journalistic principles in order to make more money.

The fact that someone prefers to cheap out and read the copy-pasted version doesn't mean they don't appreciate the paywalled version; it just means they want it for free. For all we know, it was high-quality journalism.

See also the sibling reply.

It would be very interesting if Search could differentiate between the three users:

- WSJ subscriber, where WSJ results should be present and maybe even get a ranking boost.

- User happy with paywall results, these results could be surfaced, with UI treatment indicating paywall.

- User uninterested in paywall results, where these results would be essentially removed.

The ranking between the latter 2 of these users could be a bit of a sliding scale too, rather than two distinct groups.

Why is it Google's responsibility to administer all this? They have a published algorithm that is intended to maximize fair play on the web. If you don't like that algorithm find a way to buy off the other search engines for paid placement.

>> Why is it Google's responsibility to administer all this?

Setting "don't be evil" aside, Google's only responsibility is to Google.

But here's the rub. Google's product is its users' attention. If they bias their search results enough, (maybe) they lose product inventory.

How much is "enough"? Probably much more than filtering WSJ.

But IMO bias in a search engine – a presumed oracle of truth – is evil. So there's that.

If you think that bias is a bad idea, then Google's current approach seems okay. They don't bias against WSJ, but they don't bias for WSJ either - the Google bot has less material to work with, so there is less content compared to other sources, so it gets a worse ranking. No bias.

There are no unique WSJ results anymore if they're all behind a paywall.

They could apply a subtle background (like they do for sponsored results) to results that are behind a paywall, behind a registration-wall, or otherwise inaccessible.

Or they could just demote them, since being inaccessible by any of those means makes a result less likely to be useful to a searcher.

Alternatively, WSJ could sponsor those results and jump to the top with their paywalled content.

I assume the rule that a site must show searchers what it shows Google's bots exists because that contract with searchers is a key component of the search user experience. I think Google has become enough of a monopoly that much of its behavior is negative for users as a whole, but not this. Imagine a search engine where cloaking is allowed...

You hit upon the main difference between the digital economy and the real world economy. People keep trying to equate the two.

Google itself is fueled by online ad revenue. Facebook has ads too, and it's trying to explore every angle (video, etc.) to get people to click.

Both companies know about your likes and interests when you are logged in (Facebook because you declared them, Google because of your past searches) and those of your friends (GMail contacts and GPlus / Hangouts connections in Google's case).

Facebook is more blatant about trying to monetize its ads and still does worse, because people use Google for search, and that's when their intent lines up with actually buying.

Thing is, ads are a commodity in a race to the bottom. You can see every company struggling with this, not just the Wall Street Journal but Google too. Revenues keep going down. They try new things. Then the ad revenues drop again.

In the real world, you buy a product or a service and you pay for it. This doesn't go down nearly as much in price (although automation and outsourcing may make some impact on wages to locals). If you get paid a commission PER SALE, you've got a sustainable business.

David Heinemeier Hansson has talked for years about how you should be building SaaS and charging for it, instead of making everything free and slapping ads on it.

Another acquaintance of mine, Albert Wenger from USV, has written about how digital goods are essentially free to copy perfectly, leading to a post-scarcity world in digital goods, with only artificial constraints such as copyright enforcement propping up their price. His book, World After Capital (worldaftercapital.com), describes humanity's progression from hunter-gatherer societies through today and explores what the economy will look like later.

So, if you're in the business of creating content, you're now competing in a GLOBAL market. WSJ and other publishers got to enjoy the distribution of their newspapers for decades with copyright protections, as did the music industry. But at the end of the day, when the means of copying information make it cheap to free, you've got to figure out other business models. The freemium one is a good trade-off. But really, why would I pay for WSJ after my 3 free articles a month are up? When there are so many other outlets reporting the same things? Because I like or want to support the WSJ specifically. Their brand. I did pay for a NYT subscription, after all.

You know what I really value far more than WSJ and NYT? Wikipedia.

And how much does it cost to run it? Not much (aside from hosting) because people collectively build on each other's work and editors check new contributions to achieve a result that is quite good. People cooperate more than they compete, because the rules of the game are such.

I see such things as being superior to capitalist competition. Look at open source (Linux, Firefox) vs closed source (Windows, IE). The former runs on toasters, the latter doesn't and still over time is less stable. The former overtakes it in quality.

Why not as a society embrace WikiNews and other such sites? Why do we even need the old model of journalism? Because it will appeal to certain niches, just like newspapers used to circulate in small communities and weren't sold all over the world the next day. Now they'll be sold all over the world, but their readership will shrink.

The information economy is, and has always been, different. Digital ads for digital websites are far different from commission sales of real-world items. 3D printing may one day bring the two worlds closer in some areas, but we are all still a long way from that.

well put

Because information wants to be free and you can't copy a camera? Surely most people would not just buy an expensive camera from the place it costs most, even though the service might be better? You would probably compare prices and unless you have a strong alternative argument (moral?), you'd go for one of the cheaper places.

I'm not sure we should put the morality in Google's hands. If you want top journalism, you ought to pay for it, but it makes sense search engines provide the quickest path to information, and that's usually not a paywall.

The difference is that Google makes money if the WSJ uses Google's ad network to monetize, and doesn't if the content is paywalled.

Very well said. I have this image of an alternate universe where people are whining that when they google for that camera, it should give them the name of a fence (as in, a seller of stolen goods) that offers home delivery.

Provides Google value

Not necessarily mutually exclusive

The whole point of a search engine is to make O(1) queries that iterate through O(n) sources; if I have to individually query all O(n) sources, that defeats the purpose.

> provides Google users value

Google shareholders value

That's great if you want to pay for good journalism. But that doesn't excuse publishers who show one page if you are Google and another if you are a non-logged-in user.

If publishers want to attract people to get them to subscribe, they need to find other ways that don't violate Google's search policies that have been in place since seemingly forever around cloaking[1].

[1] https://support.google.com/webmasters/answer/66355?hl=en

It would be nice to have that option. But such a small percentage of the web as a whole chooses to pay for such journalism that it is likely hard to justify implementing that feature. You can search WSJ.com by using the site:wsj.com argument after your keywords, but that isn't really what you're asking for here.

Generally speaking, paid journalism is not a sustainable business model for most entities. WSJ might be an exception, but they won't be getting any special Google treatment, nor should they.

Personally, I believe that the business model of most journalism sites will switch to a combination of sponsored content and the sale of Facebook and Google custom audiences. With retargeting, it's possible for an advertiser, say Microsoft, to tell a site like WSJ "we want to be able to advertise on Facebook/Google to people who read to the bottom of this article you wrote about cloud services" and pay WSJ for being able to use that custom audience. This kind of retargeting is already possible on Facebook and Google, but currently limited to people who have visited your own site(s). Having a custom audience marketplace would be amazing for many advertisers and deliver badly needed revenue to publications.

If you're paying for good journalism why are you using Google search to find it?

Because a) google is better at indexing a given source, and b) every user shouldn't have to build their own search aggregator across the n sites they have privileged access to; that's the role of a search engine (except for it covering all sources not just a few).

See also Aggregation Theory: https://stratechery.com/2015/aggregation-theory/

There is specific discussion of Google as an example in that article, but the whole article is worth reading.

Maybe there's an opportunity for Google to partner with subscription sites to index their content for users who own a subscription -- but what should Google do when there's a ~0% chance a site contains useful information for a non-subscriber?

If you're running a top-notch search engine, why are you excluding the results I care about most?

Because Google can only index what they can see. As noted in the article, the Googlebot only gets to see the first few paragraphs of WSJ articles, so those pages are less likely to rank on searches.

Indexing more of the content (which would be possible by providing the full content to web crawlers) seems to violate Google's cloaking guideline as well: https://support.google.com/webmasters/answer/66355

It's not excluded. They present less content to Google users and are ranked based on the actual presented content.

If News Corp wants search ads for its paid service, it can buy them like anyone else instead of expecting Google to treat them specially, which is what they are really asking for.

Probably not; unfortunately, we don't seem to have a good digital marketplace for journalism.

That could be interesting, though; a storefront with a model somewhere between Steam/Amazon and Netflix/Spotify. Somewhere to both collate the offerings which you can purchase, and highlight content from those outfits at the same time.

Ultimately I don't think a market exists for it anymore. Information like that is just assumed to be free on the internet. I don't particularly care whether it's AP, WSJ, BBC, or CNN; as long as it's reasonably informative, I'd rather just read a few articles from different sources on topics that interest me than pay a single penny for content. As long as free sources of reasonably good content exist, they'll continue to dominate the market.

The WSJ makes most of their money from subscribers, so you're not correct wrt their readers.

I'm sure they do, but their subscribers are still a minority of the news consuming market so that has absolutely no relevance to what I'm saying.

Maybe the market is big enough to accommodate that business model, maybe they're unique enough to do so, but many others have tried and failed.

The market is properly weighted by dollars, not by number of people. 100,000 people each willing to pay $200 a year for a subscription are a larger part of the market than 5 million people who will collectively click on 20 million ads in a year that generate an average of $0.001 per click.
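Running the figures above through the arithmetic (the numbers are the comment's own hypotheticals, not real WSJ data):

```python
# Subscription side: 100,000 subscribers at $200/year.
subscription_revenue = 100_000 * 200        # dollars per year

# Ad side: 5 million readers generating 20 million clicks at $0.001 each.
ad_revenue = 20_000_000 * 0.001             # dollars per year

# The smaller paying audience is worth 1000x more in dollar terms.
ratio = subscription_revenue / ad_revenue
```

So $20M in subscriptions dwarfs $20K in ad clicks, despite the ad audience being 50x larger in headcount.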

If people are willing to pay you a premium, you don't need to capture "a majority of the news consuming market", you just need enough of those people to be profitable. Ben Thompson at Stratechery says he has a little over 2,000 paying subscribers (a tiny fraction of WSJ's paying subscribers, never mind the people reading free content) and he's doing fine.

In fact, that's a known winning strategy in a lot of industries, not just journalism: let the suckers rip each other to pieces in the race to the bottom, while you deliver enough value to have fat margins. That's what Apple does, for example.

I think it is relevant, the WSJ has a lot of digital subscribers and they're profitable I think in large part due to their paywall. FT is also profitable and employs aggressive paywalling. I've noticed smaller local publications tightening up their paywalls, as well.

I'd expect to see more paywalls going forward, not fewer. I think there's increasing recognition that the economics are better.

I wish I could use OAuth or similar to give Google permission to index the papers and magazines I subscribe to. If I haven't OAuth'd and the paper requires a login, rank it lower. If I have, rank it higher. Google always talks about wanting to personalize the search experience. This seems like a good way to accomplish it.

I don't think it's feasible for Google to index the web separately for every person!

The answer is yes. If everyone went to this site with the intention of paying for good journalism, Google would rank it higher. Now it might penalize the site if the first thing that it displays is something that looks like an interstitial popup rather than content, based on the assumption that this type of experience is a poor-performing feature on other sites.

Google is built to optimize relevance against what people are searching for at a heuristic level (increasing the utility of their search engine based on each immediate choice people make, as opposed to a model like Facebook that tries to increase overall relevance of experience to get more time spent).

Most people who end up on WSJ are searching for quick, accurate, free information. The landing page (a full article) provided that, albeit in an unsustainable business model.

The vast majority of internet users are not looking to subscribe — which has become the main function of the landing page now. That means that the site is, on average, less relevant at a heuristic level.


It would be fair if that were what WSJ was asking for. I agree that if I already pay for a subscription, the results should be ranked accordingly for me. There is probably a simple, technical way to implement that.

But AIUI this is not what WSJ is requesting here. They want free ads for their product ranked high in the search results.

I subscribe to The Economist and WSJ and have pretty much stopped visiting Google News, except when there is important breaking news. Google News really isn't for someone who subscribes to news journals. Apple News is much better for this purpose.

Coincidentally I just ended my WSJ subscription because they were publishing fewer and fewer articles. It was down to maybe 5-6 a day that were even worth reading.

I highly recommend you try the FT if you are a former WSJ reader. Great depth of content updated throughout the day via their website and their app. I have been a subscriber for ~12 years or so.

Presumably they will have software allowing you to search their private archive. Why would Google want to be this software?

I use Blendle, which shows me excerpts of articles from the Wall Street Journal, The Economist, NY Times, etc., and for a small fee I can read any article with no advertisement. I really like the service, and I like pay-as-you-go things on the web so I can support stuff I find interesting.

Sure if google and certain users are willing to give up their history and privacy for it. I doubt many people would like that. I wouldn't like giving up my privacy to this severe degree.

WSJ is slanted. I dunno about recently, but I feel like Murdoch kinda ruined it, like anything he touches.

Maybe WSJ hasn't heard of all that AdSense money.

Then you wouldn't be visiting News Corp sites to begin with.

If anyone controls the NWO, it's Rupert.

If you want to pay for good journalism, then ranking WSJ higher hasn't made sense since several years before the News Corp. takeover.

More to the point, though, if you have an affinity for WSJ and similar content, yes, Google will probably pick that up over time, though getting a big enough boost to outweigh the cloaking penalty completely may be difficult.

OTOH, if the WSJ is useful to you even with a hard paywall - that is, if you are a paying subscriber - you'll presumably have it bookmarked and it will be one of your go-to direct sources for news; you won't need discovery through Google to find content there very often.

I don't know about Google search being as great and flawless as you're implying here. I run a website dedicated to one particular topic, and Google won't list it at all (no more results after ~198 hits of outdated and superficial content), even though the site is registered via Google Search Console. On other topics, I get pointless "123000 matches in 0.23 ms" results. Not impressed at all; a naive keyword count as a ranking criterion would do better.

You can check it out yourself by searching for "SGML" and comparing the results with my site http://sgmljs.net/docs/sgmlrefman.html.

- There are other websites with way more authority talking about SGML. (Like the W3C website).

- You have no/little sites or forums linking to your content.

- Your page titles are uninformative. The biggest offender is probably the homepage, with a page title of "index". But even your reference page is just "Syntax Reference" (way too general), and Google actually uses your page headings to repair this to "SGML Syntax Reference". Try inverse breadcrumb style: "SGML Syntax Reference | Docs | SGML.js". BTW: you ranked 2nd for "SGML Syntax Reference".

- Suspicion: content that is not visible (like the content in the slider) is ranked lower than always-visible content. The Chrome headless crawler can detect this. Add the slider content as regular text to your homepage, and also try to expand the content there. Include links to your latest blog posts.

- I prefer hierarchical headings, not just sections with <h1> for everything. This is because hierarchical headings cannot hurt, but non-hierarchical headings could.

- Finally, SGML being a standard, there are simply a lot of competitors for this keyword. These competitors are not commercial competitors, but authoritative websites with lots of informative content. Exactly the sites that Google likes to rank high. If you want to rank for SGML, you may be fighting an uphill battle.

Many thanks for the tips (also the other guys).

I'm aware of some of the issues you mentioned, but don't you think my site, with the depth of information provided, deserves at least a mention among the other ~200 ones? I'll try and fix the heading issues first, then see if search results improve.

Your site shows up fine if I search for SGML Syntax Reference or SGML syntax. Meanwhile, it doesn't appear to generally be about learning basic SGML concepts, so it seems quite reasonable that it doesn't show up when just searching for SGML. And why blame Google when Bing does the same?

It seems like your complaint is that adding one page of reference info wasn't enough to serve as an ad for your business. It doesn't seem like a valid complaint to me.

While your point still stands, don't you think there are things you could do to help your site index better on that topic? It has no metadata like keywords and descriptions, no robots.txt/sitemap.xml, the links to the /docs pages are hidden under an overlay that requires JavaScript to show, that URL has 10 different <h1> tags, etc.
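
For reference, a minimal robots.txt that allows crawling and advertises a sitemap could look like this (the sitemap URL is hypothetical; point it at wherever the file actually lives):

```
User-agent: *
Allow: /

Sitemap: http://sgmljs.net/sitemap.xml
```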

Google's job, IMHO, is to find relevant information, not judge it based on cost. How often do we get second-hand news where an article is simply parroting what another site wrote? I'd rather get information as close to the source as possible in most cases. Perhaps that means paying for it, but that choice should be mine to make. Put the most relevant result at the top, and if people want to read a free second-hand ripoff of some news they can select a lower-ranked search result.

> Put the most relevant result at the top, and if people want to read a free second-hand ripoff of some news they can select a lower-ranked search result.

Couldn't you argue that if the free "ripoff" has the same information and is more easily accessible it should be considered more relevant?

Now how would Google find the relevant info (and index it) while at the same time a 'regular' user is restricted by the paywall? That might be a solution you can implement technically, but will it also work for your users?

If users don't have access to info because it is restricted to paying users, google won't have access either.

It is technically possible via web cloaking. It is the same technique that tricked GoogleBot into thinking a site was legitimate content while users got served pharma ads for big blue pills.

It also makes sense because they can't crawl a page that is behind a paywall.

Of course they let in google's crawler.

I guess what this in fact states is that what they were doing was okay.

In the video the guy explains that while you are not allowed to treat gbot's requests in a way no other users' requests are treated, it is okay to differentiate between "boxes" of users. In their example the box is the country (USA), but if you define a "Google user country" and let all users coming from Google into it, it is OK to bundle the gbot with those. Grey area for sure, but might makes right.

From the video

"So geolocation, that is, looking at the IP address and reacting to that -- is totally fine, as long as you're not reacting specifically to the IP of just Googlebot, just that very narrow range".

Also, they will crawl you from an unusual IP using a user-agent that doesn't say it's Google. And when that happens, and you deny access to undercover-Googlebot, but allow Googlebot in full uniform, you'll be penalized for cloaking.

I guess in that case they could argue about the wording, if they don't react JUST to gbot, but to gbot and anyone coming from the Google homepage.

They would have to react to just gbot when constructing the special Google URLs while the bot is crawling the site, though.

> but gbot and anyone coming from the google homepage.

This is how they were originally handling it, before February. The article would display if you visited the link from Google, or set your referer so it looked like you did (this is why HN has the "web" link under articles), even if you weren't a subscriber. It's allowed, because regular users coming from Google do see the same thing as Googlebot.

WSJ has since changed that, so only subscribers can view articles, and you no longer get a "free click" coming from Google Search, as Google calls it. They now show a short snippet and are following the guidelines to be labeled a "subscription" source by Google Search. This caused their rankings to drop below those of "free" news sources, though. But it's not nearly as bad as if they had cloaked Google.
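
Mimicking that referer check outside a browser was trivial, too. A rough Python sketch of what such a client looked like (the helper name is made up; "Referer", with that historical misspelling, is the actual header name):

```python
import urllib.request

GOOGLE_REFERER = "https://www.google.com/"

def google_referred_request(url: str) -> urllib.request.Request:
    # Claim that the click came from a Google search results page
    return urllib.request.Request(url, headers={"Referer": GOOGLE_REFERER})

req = google_referred_request("https://www.wsj.com/articles/example")
```

The server only sees what the client sends, which is why referer-based paywalls were so easy to bypass.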

I went through the whole thing and it seems, from the video, that they could avoid cloaking by redirecting non-paid users to the sing-up page.

This way WSJ would be showing the same (www.wsj.com/Bezos-with-hair) to every user.

*based on the video

In the video, he stresses pretty hard that all forms of cloaking are disallowed, even if not malicious or deceiving. Unless the Googlebot has a paid subscription to WSJ that's still cloaking, as you're showing Googlebot a different page than a regular user.

Google's rules and help documents are spread all over, but here's some from Google News about subscriptions:

"If you prefer this option, please display a snippet of your article that is at least 80 words long and includes either an excerpt or a summary of the specific article. Since we do not permit "cloaking" -- the practice of showing Googlebot a full version of your article while showing users the subscription or registration version -- we will only crawl and display your content based on the article snippets you provide."

edit, forgot link: https://support.google.com/news/publisher/answer/40543?hl=en

> sing-up page.

That's a new one. Buy a subscription to the WSJ now for a song.

You just say bingo.

The whole purpose of making a webcrawler for a search engine is that you are crawling the content that a user will see upon clicking a link in search results.

Even if it's behind a paywall, there is benefit in crawling it.

For instance, if you happened to be a logged-in WSJ user, then Google could show you the result based on your cookies.

Googlebot does not crawl the web separately for each user with a copy of that user's credentials borrowed from their browser. Aside from the privacy and security issues, it would require Google to multiply its search resources by the number of users.

> Googlebot does not crawl the web separately for each user with a copy of that user's credentials borrowed from their browser.

I did not say that.

I said google could display the results to which you have access, based on your cookies. Assuming they had access to crawl the whole article.

That would require Google to be in the business of knowing every site that you have a membership in.

It is not possible for Google to get access to cookies for other sites, anyway. This is a pretty fundamentally important security restriction that browsers implement to protect you from nefarious sites. So it isn't possible for Google to know which sites you have paid accounts with unless you explicitly tell it.

> Even if it's behind a paywall, there is benefit in crawling it.

How would that work? Would Google create or be given a login for WSJ? That would benefit WSJ as an incumbent news provider at the expense of startups too new or small to get special treatment from Google.

How would a new website indicate to all search crawlers (so Google doesn't benefit at the expense of other search engines) how to get access to its pay-for content, in such a way that end users cannot also pretend to be search crawlers and get access to the same content?

Yep, that's literally https://en.m.wikipedia.org/wiki/Cloaking . If they were a smaller site, they'd probably get outright banned by google.

Possible, but seems unlikely. To set that up, the WSJ website would have to allow Googlebot access while denying others. Any filtering based on the URL or HTTP headers would be discovered and abused by others. An approach based on a security token or IP filter could work, but would be unmanageable on Google's side because of the scale of their spidering operation. It would be much more effective for them to use their position to force the WSJ to be an open website, or to accept that their paywalled content does not get indexed.

Just allow access from all of Google's IP space, as long as the user-agent contains "googlebot". It's pretty trivial to do...

Sure, some people could set up GCP instances and proxy it, but that's a very tiny percentage of people.
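
A sketch of that check (the CIDR block is one range Googlebot has historically crawled from, used purely for illustration; Google doesn't publish a guaranteed list, which is part of the problem):

```python
from ipaddress import ip_address, ip_network

# Illustrative only: one /19 that Googlebot has historically crawled from.
GOOGLE_RANGES = [ip_network("66.249.64.0/19")]

def allow_full_article(ip: str, user_agent: str) -> bool:
    """Serve the full article only to clients that both claim to be
    Googlebot and connect from a known Google range."""
    if "googlebot" not in user_agent.lower():
        return False
    return any(ip_address(ip) in net for net in GOOGLE_RANGES)
```

Anyone outside those ranges gets the paywalled view, even with a spoofed user-agent.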

It might start out as a tiny percentage, but all it takes is one person setting it up and letting the world know about it. Then pretty soon the WSJ is faced with millions of people getting free content again. They're complaining about people accessing their articles via a Google search and then clearing cookies to reset their counters. That's hardly mainstream; browsers have been burying the clear-cookies functionality deeper and deeper over the years because it's seen as an advanced-user-only kind of thing. And yet the WSJ has millions of people doing it, enough to make an impact on their bottom line.

I'd bet it's browsing in incognito mode rather than actually clearing cookies.

You can verify if an IP belongs to Googlebot or not, no need to whitelist GCP.


"Google doesn't post a public list of IP addresses for webmasters to whitelist. This is because these IP address ranges can change, causing problems for any webmasters who have hard-coded them, so you must run a DNS lookup as described next."
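
The procedure Google describes (reverse-resolve the IP, check the domain, then forward-confirm) can be sketched like this; the suffix check is split out so it's obvious that a host like googlebot.com.evil.com doesn't pass:

```python
import socket

def looks_like_google_host(host: str) -> bool:
    # Must be a subdomain of googlebot.com or google.com,
    # not merely contain the string somewhere
    return host.endswith(".googlebot.com") or host.endswith(".google.com")

def is_googlebot_ip(ip: str) -> bool:
    """Reverse-resolve the IP, check the hostname, then forward-resolve
    that hostname and confirm it maps back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False
    if not looks_like_google_host(host):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward confirm
    except OSError:
        return False
    return ip in addrs
```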

DNS lookups are far more expensive to perform than an IP filter, and couldn't be done in real time. So WSJ would have to set up a system where they regularly find all rejected requests claiming to be Googlebot in their logs, do DNS lookups on the IPs, and add any that were valid to a whitelist so that they won't get rejected again. This will cause new Googlebot IPs to get rejected until the whitelist is updated, hurting indexing and ranking. The WSJ would also have to go through their whitelist regularly and do DNS lookups to verify that all of those IPs are still valid Googlebot IPs, removing any that aren't valid anymore. That opens a window for invalid IPs to continue to get access, which may or may not be a problem depending on how often IPs change and where they get reassigned to.

The IP whitelist would need to be distributed to WSJ's webserver farms and used to update firewall rules, in an automated way that may or may not integrate with how that stuff is currently managed. (Generally, those rules would be tightly controlled in a big org like the WSJ.) The HTTP access log gathering from the farms and their analysis would also need to be automated, which again might be a management issue if the logs contain anything sensitive. (Like, I don't know, records of particular individuals reading particular stories which certain government agencies might be interested in acquiring without the hassle of a warrant.)

So yeah, there's a way to find out if an IP belongs to Googlebot. That's a long way from a manageable filtering solution at the WSJ's scale, even if Google wouldn't penalize them for doing it, which they would.

There are commercial solutions [1][2] to this which are widely used for cloaking.

[1] https://my.bseolized.com/products/ipgrabber

[2] http://wpcloaker.com/

Googlebot's IP range and GCP's IP ranges are disjoint.

There used to be a lot of sites that did that, and the result was that people would set their user-agents to match google's and get all the paywalled content for free.

Not sure why you're being downvoted... It says in the article:

After the Journal’s free articles went behind a paywall, Google’s bot only saw the first few paragraphs and started ranking them lower, limiting the Journal’s viewership.

I find the turn of phrase interesting: "Google's" users.

Yes, of course technically that is an accurate description.

But although I am a user of Google, I don't like the idea of them thinking of me as their asset, even though I obviously am.

And, irrationally, because Google feels like such an omnipresent utility, my intuitive expectation is that they index the web in a non-preferential way.

Which still doesn't make any sense, because the entire utility of their search results is because they weight what they index.

There's some cognitive dissonance here, but I can't put my finger on exactly what it is.

The possessive in English doesn't necessarily indicate actual or presumed ownership. It can also indicate looser affiliations and even relationships where the subject is not the controlling party. It's my shovel and my horse, sure, but it's also my country, my boss, and my God.

>> There's some cognitive dissonance here, but I can't put my finger on exactly what it is.

It's the cake... you can't have your cake, and eat it too.

The cake is a lie.

Agreed. Seems simple to me: if you stop providing the content those users are coming for, they stop coming. Why is this hard for the WSJ to understand?

How do you plan on balancing this overly strict definition of value with a sustainable model of funding (critically necessary) investigative journalism? All this helps is the proliferation of ideology-reinforcing propaganda and clickbait. Are we going to shut out real, substantial content from the online space?

The commoditization of search can't come soon enough. Google shouldn't be able to monopolize this space.

That's WSJ's job to figure out, not the search provider's. From Google's perspective, a search which leads users to a page that doesn't load, which the user immediately abandons, is a bad result. WSJ isn't entitled to traffic just because they're an old organization.

It's not about "entitlement"; it's about the structural incentives put in place by the online media playing field.

To go out in the streets and talk to people, dig deep into obfuscated government archives, and actually make sense of the world takes real work. It's often work that results in content that isn't always just what their readers want to see.

None of that influences the way Google's spiders see their site, bad results are bad results. WSJ could leave their content freely available and sell banner ads, vs. locking themselves into the subscription model they're familiar with. It's not Google's fault WSJ doesn't want to change.

Or they could seek patronage without a paywall, like the Guardian. If they want subscriber exclusivity, they shouldn't be surprised that search engines serving people that are mostly not WSJ subscribers rank their content based on what is visible to non-subscribers.

It's entitled to traffic because it puts in work.

That's not how it works.

That's not how anything in life works.

> How do you plan on balancing this overly strict definition of value with a sustainable model of funding (critically necessary) investigative journalism?

Allow people interested in paying for content to pay for content, and then use the discovery mechanisms the content owner makes availlable for that content.

If the content owner wants to promote content that isn't available to the general public via search engines, they can buy search ads like anyone else.

They don't need a subsidy at Google's (and, in terms of time, Google users') expense by way of organic search results for content that most search users will not be able to access when they click the result.

> The commoditization of search can't come soon enough. Google shouldn't be able to monopolize this space.

Commoditization of search (which means heavy competition focussed on minimizing costs) isn't going to make it any more likely that competing public search providers are going to subsidize paywalled content that their users can't use.

So the implication, then, is that real and substantial content comes from those who get paid, and ideology-reinforcing propaganda and clickbait come from those who don't?

Are you sure it's true? Are you sure it's not the opposite? Could either or both be true?

Substantial content takes more work to produce than clickbait.

We must critically examine the structural incentives we put in place that support shallow outrage porn over deep, well-sourced, investigative journalism.

The problem is that this is false. Ads (and the clickbait they encourage) certainly are a monetization strategy, but to suggest it's either free rubbish or paywalled quality is patently false. Even ignoring state-sponsored media (NPR, BBC, etc.), groups like News Deeply [0] offer a counterexample to your assumption.

Killing search isn't going to add more "real", "substantial" content online.

[0] https://www.newsdeeply.com/

NPR isn't anywhere close to "completely state sponsored media"; NPR gets basically no direct government funding, what it does get is indirect through its member stations, who get less from government than from corporate sponsors, and less from sponsors than listener donations.

Please don't frame your response as a logical argument while ignoring the ambiguity introduced by your use of "that" to refer to my entire statement.

Without good search, there's a discovery problem. How will people find quality content easily? Where'd you get "killing search?" Search clearly provides value.

If the monetization strategy is, as WSJ, to make their content inaccessible, it is them shutting themselves out of online spaces, not Google. Walled gardens break the web.

Because every Google user lacks a WSJ subscription and will only be satisfied with news articles they can read immediately for free?

From the fact that I'm searching for e.g. a black & white old Western movie, it doesn't follow that I only want movies I can view on the internet right this second. Edit: I could very well be satisfied with "here are the names of popular ones, but you'd have to go to their publishers since they're long out of print."

Which is the reason Google still shows WSJ in its search results. So, you can decide to still go there even though it is a subscription website. But Google only works with things its bot can see. What you suggest amounts to boosting WSJs search position despite only seeing a small part of their content, because "we know that they are good". Doesn't sound like a great idea.

If you click a search result and end up seeing something completely different than what you expected based on the search result snippet, it shouldn't matter if you're the WSJ or a scam site trying to hack your Google rank. It's deceiving the user and inflating your search ranking at the expense of more deserving listings.

This. I installed the Personal Blocklist extension for Chrome just to blacklist Quora results because of this. I wish Google would actually punish them more. I click through on something which looks relevant, and the result is a useless login screen. LinkedIn is the same.

You can bypass the Quora login screen by appending ?share=1 to the URL. There's also a userscript which does this automatically: https://github.com/sindresorhus/quora-unblocker-userscript
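
If you'd rather not install the userscript, the rewrite it performs amounts to appending one query parameter; a small sketch (assuming the trick still works on Quora's end):

```python
from urllib.parse import urlsplit, urlunsplit

def add_share_param(url: str) -> str:
    """Append share=1 to a Quora URL, preserving any existing query string."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    query = f"{query}&share=1" if query else "share=1"
    return urlunsplit((scheme, netloc, path, query, fragment))
```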

If you change your browser's UserAgent string to Googlebot, then your client will be treated as a first-class citizen, by many of these sites. Google always wins, so let's all be Google.
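
Outside the browser this is a one-header change. A Python sketch using the desktop Googlebot user-agent string Google documents (note the replies below: sites that verify the source IP won't be fooled by the UA alone):

```python
import urllib.request

# Googlebot's desktop user-agent string as documented by Google
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def googlebot_request(url: str) -> urllib.request.Request:
    # Present ourselves to the server as Googlebot
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

req = googlebot_request("https://example.com/article")
```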

Works great until you stumble on one of the big sites who will auto-ban you for not having a valid Google IP address.

Reverse DNS records. Webmasters shouldn't be verifying Google's bots by hard-coding IP addresses.


Shouldn't != never happens

CIDR blocks and ASN advertisements are cheap.

Update those periodically (hours / days / weeks). The adverts don't change particularly quickly.

It's extremely rare to be IP-blocked by any website just for using Google's user agent from a non-Google IP range. IP's get re-used and you can switch to a new one easily, so it's really not common or good practice for this to happen.

> IP's get re-used and you can switch to a new one easily, so it's really not common or good practice for this to happen.

On the flip side, some people can't change their IP addresses easily, and getting IP banned (even if rare because of the reasons you stated) is actually a major hassle when it actually happens for those people. :/

Is that really a thing? That must be such a hazard for their developers. I usually have a test for sites that I work on that scrapes a few URLs as Googlebot, to verify that they get an optimized view (no JS, structural-only CSS).

God. That's the reason so many sites look great in the results and are confusing interactive messes when I get into them.

If it's any consolation, the site is well tested in Noscript mode.

You sir are a god among Web developers.

If only they all did this. So many sites I get to and they're a blank page or an absolute disaster....

Yes. Googlebot only crawls from legit addresses (even when their developers are trying new things) so it's an easy scraper/scammer signal to key off of.

No. Most websites don't do this.

I just tried this, doesn't work on WSJ. Tried the user agents listed here: https://support.google.com/webmasters/answer/1061943?hl=en

Isn't that what the entire article is about? That WSJ no longer gives access to Google users (and bots)?

No. It allows Google bots to see full articles, but shows only the first paragraph or so to non-subscribers. Even if they're coming from Google search results.

However, I don't see cache links on Google :(

Edit: Oops, I'm wrong. The article does say that the Google bot only sees the first paragraph or so.

No. You're wrong, as it states in the article:

"The reason: Google search results are based on an algorithm that scans the internet for free content. After the Journal’s free articles went behind a paywall, Google’s bot only saw the first few paragraphs and started ranking them lower, limiting the Journal’s viewership."

Yes, you're right. I got confused by all the discussion about Google checking for cloaking by comparing results using different user agents.

So maybe this is why there's no Google cache.

Also, if Google can only index the first few paragraphs, the results are much less comprehensive.

Or just turn off Javascript. Makes lots of sites better.

I call that enacting the nuclear option. It's almost guaranteed to win the war with ad-tech! It should be enacted for sites with runaway ad engines that spin up your CPU fans and make scrolling laggy.

Of course, the problem with nuclear is collateral damage. Drop the bomb and ads don't work, but neither does a lot of other stuff. E.g., the site shows a blank screen, images are invisible or blurry, drop-down menus don't drop. And, of course, the deal-breaker: videos don't play.

The remedy for killing JavaScript is more JavaScript (and CSS). But supplied inside a Chrome extension targeted at the offending site. An injected stylesheet makes `<body>` visible again, hides assorted useless junk, and styles injected UI elements. Your content scripts load the missing images, drop the menus down, and play the unplayable videos in button-activated pop-over windows displayed at superior resolution.

Of course, the problem is, there are a lot of sites out there, and they change unpredictably, requiring your extension library to change in response. That argues for crowd-sourcing the extension library, but the crowd needs to be proficient in HTML, JavaScript, and CSS and know the ins and outs of browser extensions and care and have time.

You can completely change how a site presents. E.g., change a slide-show in a static slide window that barely moves (due to the background ad-tech load) into a set of `divs` that roll upwards as your finger swipes.

It's a hobby at best. Disabling ad-tech components by origin is the practical option.

Call me Dr. Strangelove, then. I usually browse with JS off, enabling it on occasion. And there are some whitelisted sites.

I used to play around with filtering sites to make them less antisocial, but find that slog less entertaining these days. So now when confronted with a site that's useless without JS, eh, there's almost always another site out there that doesn't mind the terms I demand for my attention.

Ah, I made a similar (but simpler) one years ago: https://github.com/kevmo/greasemonkey/blob/master/quora_upvo...

Pinterest too.

The worst thing about Pinterest results, is they aren't primary sites -- I'd rather get a link back to the source page the image came from.

You could always use Pinterest directly if that's what you want.

Do you need an account to actually make Pinterest useful?

Sometimes I do a Google Image search because I found something interesting but don't know what it is, so I'm hoping to land on a page that describes what I was looking at. Pinterest shows up as a result, but with no backstory, nor does it lead me back to a source, so it's worthless as a search result. It's the ExpertsExchange of image searches.

Endy's advice of using `-inurl:pinterest` seems invaluable and I'll be adding that for all image searches in the future.

You can use Pinterest directly without signing up - https://www.pinterest.com/search/

pinterest has ruined image search results, it's infuriating

Would kill for a Google Image Search extension that forwards my browser to the original source page rather than the Pinterest landing page. Any recommendations?

would you pay for it?

One-time payment of a few bucks, sure.

Agree. Pinterest search results are spam.

You know, even in image searches "-inurl:pinterest" works pretty well.

Google images feel like it's nothing but Pinterest now.

Frankly I'd love Google to turn Google Images into much more of a true Pinterest competitor. Image search volume is pretty large, and they are already testing monetizing it. Now they just need to add in features that let you store and organize that information (and build a better profile of you in the process while also crowdsourcing tagging).

Pinterest needs more competition.

> X needs more competition.

In the context of Google being the alternative, this is sad and funny at the same time.

Hacker News in the abstract: The web needs more decentralization.

Hacker News IRL: Let Google own every vertical.

I certainly recognize the irony of suggesting that Google might help improve competition.

That said, there aren't many players in a position to provide actual competition to Pinterest like Google can. The obvious concern is where you draw the line at anti-competitive if Google tried to do the equivalent of what they did with Yelp ratings.

If they did that I would switch to Bing's image search immediately. Definitely not what I want when I'm searching for an image.

Bing is a lot better than Google for searching images. It also has better tools built in to the results page (similar images, different sizes, multiple source pages).

Could you PLEASE PLEASE PLEASE link to this extension so I can install it too? I used to have one but lost it and couldn't find one that worked last time I checked. (Would love one for both Chrome and FF if it isn't cross-platform.)

Would also love to have one that rewrites URLs in search results to avoid the frequent 5-second pause when Google's redirector gets its head stuck up its ass or whatever the problem is.

Personal Blocklist by Google https://chrome.google.com/webstore/detail/personal-blocklist...

Note that it doesn't block some spam domains that sneakily use certain special characters in their domain names. Unfortunately, Google hasn't fixed this issue in forever.

Have never signed up to quora and I don't get these nag screens :-s

What's your problem with Quora? They provide great answers most of the time, and their digests are very inspiring.

Quora is one of the biggest offenders of growth tactics at the cost of user experience. They make it so easy to create an account without intending to that I probably have 15 of them.

The Google search result looks like you'll see the content when you open the link, but instead you have to login first. OP doesn't want to login.

This 2013 article sums up nicely why Quora is frustrating to be directed to while seeking enlightenment, and as others have pointed out, they use some unpleasant anti-patterns to lure in (potential) users.


You're right, this is wrong in our current paradigm. But our current paradigm is wrong, too.

I want a world that supports business models other than web advertising. It negatively impacts journalistic integrity and freedom while further exacerbating the race to the bottom search engines and other content aggregators create.

WSJ is responding poorly to a bad situation. I suspect it'll cost them.

http://brave.com/ is one approach.

Out of curiosity, do you think journalistic integrity was also impacted when newspapers used traditional paper advertisements as their source of revenue?

Eg, have newspapers _ever_ had integrity, and if they _did_, what's different?

The papers lost classified ads. They were a large revenue source which did not dictate an agenda.

> Eg, have newspapers _ever_ had integrity, and if they _did_, what's different?

They did, somewhat, before the massive corporate consolidation starting in, IIRC, the late 1970s, when newsrooms started getting axed and the major dailies progressively became skins over wire services and lightly rewritten press releases.

The internet often gets the blame, but it providing actual competition was decades after the terminal quality and subscribership decline of American newspapers began.

It was actually internet competition (both wire services becoming directly available to readers and the loss of advertising) that got some of them talking about building up newsrooms, rebooting investigative journalism, and relying more on subscription income (paid subscription was never paying the bills before; it was pursued as a key metric advertisers used in determining how much it was worth to advertise in a paper).

That's a very interesting thing to try to quantify. I would say they had _more_ integrity, but risks always existed. Reuters used to have a corporate structure that would prevent it from capture, and even that didn't guarantee impartiality.

Multiple revenue streams (sales, classifieds) would make them less beholden to advertisers.

Of course they would still have to write material that sold!

I don't like to use John Oliver as a source, but there's some decent content in this:


I'm totally with you, but what would you propose as a pragmatic solution? I really can't think of anything myself.

This actually sounds like a really interesting/profitable problem to solve. I'm sure there are tons of sites that want to be pay sites, but also want articles to show up in google search results. But users don't want to get inaccessible results. Google (or whomever) needs some solution to handle non-free results intelligently. Allow users to filter out non-free results, to configure which journals they do have subscriptions to, etc. Even an easy way to make micropayments.

If Steve Jobs was still alive, I'd bet Apple would be working on a competing search engine with some of these features.

I agree. Advertising is not a great business model on its own, particularly with the proliferation of adblockers. If there was a way to get people supporting more paid content then IMO the quality of content could also improve. But who knows how all this will go in the end. There's always been a lot of high quality free content on the Internet as well, just because people are willing to share their knowledge with each other.

krschultz's comment is really relevant as well.[1] In a complete system, search should actually know about what you subscribe to already, and not penalise those results for you.

[1] https://news.ycombinator.com/item?id=14492870

> search should actually know about what you subscribe to already

I don't want search services to know what I subscribe to. That's private information.

Again, Steve Jobs. Under Steve Jobs, I think Apple would figure out a way to make the user experience terrific, without the end game of "Apple hoards every piece of your existence" like Google.

Steve Jobs reinvented PCs, reinvented mobile. I think "the next Steve Jobs" could do the same thing for search. I'm less and less certain of Google's monopoly on that space going forward. It's still built around Web 1.0 tech, has hacks into Web 2.0, but there's a Web 3.0 it's not ready for.

That is a step that tends towards a walled-in closed Internet...

It's probably inevitable. TV was free airwaves for so long. Now it's cable subscriptions. (And there's still advertising). Internet is already controlled by your ISP, so I have a hard time seeing that being "open" for much longer. Browsers, content providers, etc will all be regulated / commercialized / sandboxed.

Free zero-revenue startup idea: there'll be an IP-over-ham-radio or something to preserve "internet classic". (Largest use will be bitcoin-for-pornography).

I wish I could just subscribe to news like I subscribe to Hulu. I'd happily pay $15/month ($22 ad-free), and then the papers I get access to can figure out a fair way to divvy it up.

There's got to be a startup idea in there somewhere.

I'd happily pay, but not a flat fee. I'd maintain an account with some third party, with ~$50 balance. Browsing ad-driven sites, the agent would bid whatever it took to get all ad slots, with a limit of $0.10 or whatever. Over that, I'd get a quote, and could accept or decline. Browsing paywalled sites, I'd just get the quote.

So the company that needs this is Google? I doubt they care whether WSJ paywall results are missing or not. Content in many industries is a commodity, and plentiful.

If the future involves a domino effect where major content providers start following WSJ's lead, then yes, it's an indication that the current model isn't working and Google needs to either evolve or be replaced.

That's a big if. Of course, if that happens then you're right, but it's not likely at all.

Put this in perspective: there are millions of sites out there and just one major search engine. Yeah, I think I know who has leverage here.

Yup. We used to call it "cloaking", I don't know if that's still a term in use in the SEO world.

They are known to crawl using human-like user agents instead of the typical Googlebot one precisely to counter this (weak) effort at playing the system. I'm surprised WSJ is surprised by the outcome here.

They're in no way surprised; they're trying to marshal public outrage or lean on anti-trust efforts in order to force google to rank them higher.

I wish google would give us somewhere to report sites that are doing this.

It's here: https://www.google.com/webmasters/tools/spamreportform?hl=en

It specifically mentions cloaking.


> Webspam pages try to get better placement in Google's search results by using various tricks such as hidden text, doorway pages, cloaking, or sneaky redirects. These techniques attempt to compromise the quality of our results and degrade the search experience for everyone.

According to this, Facebook should rank very low.

Facebook doesn't even need Google, their users just visit the site directly.

I guess Mark Zuckerberg doesn't lose sleep over this.

Google does not penalize Facebook for web spam. Facebook, to my knowledge, does not do:

- hidden text,

- doorway pages,

- cloaking, or

- sneaky redirects

They just show a big popover nagging you to log-in. But you can click this away.

If certain Facebook content pages rank low, or do not rank at all, it is because Facebook actively blocks Googlebot from accessing the content, not because Facebook is trying to deceive Google (or the user).

Though Facebook does not need Google, it could get quite a lot more visitors if it lowered the wall of its garden a bit. As is, Facebook is an inaccessible social echo chamber, and I don't lose any sleep over this.

But many Google users do see the content. I wish Google knew that I paid for subscriptions to WaPo, WSJ, & NYT and ranked things accordingly. I never want to open an ad supported story when the same thing is covered by a publisher I subscribe to.

I think it's not far-fetched to imagine that someone with a paid subscription subscribes in order to use it habitually, as opposed to just having access whenever they click on a link via Google.

Right, but let's say I search for an old topic (not current news). It won't be on the home page of the WSJ, NYTimes, or WaPo. But I'd like Google to surface that for me, since I'm a subscriber and pay to have access to journalism like that.

100% agree. I've actually reported WSJ to Google several times for cloaking (that's what Google calls this bait-and-switch). The penalty for this is supposed to be deindexing of the entire site from all Google results, but apparently WSJ gets a pass on that. At least they're being punished to some degree.

For the record, if anybody needs a draggable WSJ paywall bypass bookmarklet, I put one up here:


WSJ can in principle counter it by offering a summary for the unregistered logins (and bots).

If Google doesn't differentiate between "the WSJ or a scam site", its value to me is severely curtailed.

Since WSJ objectively is a scam site in the relevant dimension, offering different content to Googlebot than to search users, it absolutely should be treated exactly like other scam sites.

Different content to some search users. All WSJ subscribers see the same thing as the Googlebot.

> Different content to some search users. All WSJ subscribers see the same thing as the Googlebot.

WSJ subscribers, online and print combined, are in the low single digit millions. Google search monthly unique users are about three orders of magnitude greater. WSJ online subscribers are close enough to 0% of Google's users as to make no difference.

So, that "some" is essentially all.

The vast majority of Google searches are certainly not searching for things that would lead them to WSJ.

If we take just the population of the US in 2017 (326.5M) and assume every American searches via Google for their news, we're looking at, give or take, 1% of the US with the WSJ subscriber estimate you provided (~3M).

We can refine these numbers further...

19.4% of the US population is 14 years of age or younger (18 would be a better cutoff, but I couldn't find that figure), so that gives 263M potential American news readers.

The Pew Research Center (http://www.journalism.org/2016/07/07/pathways-to-news/) states that about 38% of adults in 2016 often got news online (~99M).

That means at least 3% of US Google news searchers are potential WSJ subscribers.

So that's not a tiny number (yes the number could be adjusted for worldwide English speaking news googlers - but I think I've made my point).
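A quick back-of-envelope check of the arithmetic above; all inputs are the thread's own rough estimates, so treat the figures as assumptions rather than verified data:

```python
# Back-of-envelope check of the WSJ subscriber-share estimate.
us_pop = 326.5e6          # US population, 2017 (per the comment above)
adult_share = 1 - 0.194   # fraction aged 15+ (Census estimate cited above)
online_news_share = 0.38  # Pew: adults who often get news online
wsj_subscribers = 3e6     # rough WSJ print + online subscriber estimate

potential_readers = us_pop * adult_share                      # ~263M
online_news_readers = potential_readers * online_news_share   # ~100M
share = wsj_subscribers / online_news_readers

print(f"{potential_readers / 1e6:.0f}M potential readers")
print(f"{online_news_readers / 1e6:.0f}M online news readers")
print(f"{share:.1%} of US online news readers could be WSJ subscribers")
# → 263M potential readers
# → 100M online news readers
# → 3.0% of US online news readers could be WSJ subscribers
```

The result matches the ~3% figure claimed above (and would only grow if worldwide English-speaking news googlers were excluded from the denominator less aggressively).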

How many of these subscribers are wealthy and coveted by advertisers?

What if more news publishers follow WSJ and you happen to be a subscriber of that content?

With the amount of information that Google has on its users, I don't see why it can't adjust search results based on whether or not you subscribe - and bring value to whichever side of the paywall you reside on.

Then the WSJ shouldn't cloak content like a scam site.

They're making Google's search engine less valuable in order to get free advertising. From Google's perspective, how are they not a scam site?

It can be both the WSJ and a scam site at the same time.

I think this is not what WSJ is after. I think they are trying to pressure Google into a policy more suitable to WSJ by putting Google users behind the paywall while allowing free visits from social media. They are probably laying the groundwork for future battles; sorry, I meant talks.


There's an interesting difference between overlay paywalls like Wired uses, and content-not-loaded walls like WSJ uses. In the Wired case, they text is sent to you but they try to stop you from reading it. In the WSJ case, they don't even send you the text of the page you supposedly clicked on.

Since we're in the second case, this isn't even a decision by Google. The WSJ actually isn't sending you the data in the search result snippet, so the crawler rightly says "wow, nothing useful here". The complex, ideal solution might let me tell Google "search as though I'm a WSJ member", but short of that they're accurately assessing what content is actually available.

Why can't WSJ provide full indexing access to Google crawler robot but keep it paid for users? That way at least the scam problem will be solved?

They'd probably be happy to do that, but my understanding is that Google generally frowns strongly upon sites that display different content (or no content) to users when they tell Google those pages have content.

I could imagine Google adding some sort of "content is locked behind a paywall" indicator on search results, but if I'm searching for something on the web, a link to blocked content is not very helpful most of the time.

Yep. Imagine a world where most of the top search results are subscription-only. It's a terrible UX if a Google searcher has to go past page 1 to actually reach accessible content. Google definitely doesn't want that, and it's also anti-open-web.

I actually would find it useful if Google had an indicator like "3 pay-walled results omitted, show them?"

At least being able to know it exists can help consumers decide if they should pay for an article/subscription.

If Google is going to accommodate pay-walled results like that, you can expect this indicator won't just show/hide 3 high-quality pay-walled results, but also 30 shitty ones that want to get in on the potential pay-wall business.

Google could charge paywalled sites to make such a feature, but as a user I never want to see that.

> a link to blocked content is not very helpful most of the time

That's called advertising, but the WSJ will have to buy ads like everyone else.

Why would Google know who is and who isn't a WSJ subscriber? Would WSJ subscribers want Google to know that information?

+1, this is already pretty bad for a lot of searches with so many sites erecting paywalls, pushing required logins, or anti-ad-block.

Not sure if they track this, but whatever Googlebot's view of the site's content, if it's constantly bouncing users back to the search page it should get hit with a hard penalty.

My thoughts exactly. If WSJ wants to increase monetization through paywalls, it has every right to do so, but it should suffer in the SERPs accordingly, as a paywalled article is generally not what users are looking for.

It isn't necessarily deceiving. In many situations, I'd rather know where the information is rather than Google pretend it doesn't exist.

For example, Google scholar will search papers that are behind a paywall. By blocking them, I wouldn't even know what to purchase.

This is google foisting their business model down our throats.

Note that the Twitter workaround still works:

Just append site:twitter.com to your Google search, click through, and voila.

WSJ also still serves articles to folks coming from Facebook. There's a Chrome extension to redirect all WSJ URLs through Facebook to bypass the paywall, works for now.

I made a bookmarklet for it in chrome:

javascript:(()=>{window.location = "https://l.facebook.com/l.php?u=" + encodeURIComponent(window.location)})();

Clicking the bookmarklet when you're on a WSJ article will shunt the url through Facebook's redirect service, which will allow you to view the article.

Can you explain this in more depth? I can't make it work. Thanks

I believe they are saying: if you find the tweet announcing the article, you can click through from the tweet and view the article.

This doesn't work for me. Do I need javascript enabled for it to work?

Yes you do, unfortunately. I had to allow scripts for t.co in NoScript for it to work.

Confirmed. Clicking on any article from @wsj Twitter feed works.

Watching this very closely. I pay for a WSJ subscription because I think their content is better than most, and also because I get sent a lot of links to their content. Something about this latter point feels like the argument people make about using Office because people still send them Excel and Word docs.

Similar to how software companies release free software to augment what makes them money, Bloomberg is able to spend a lot of money on producing content that is sponsored by their terminal subscriptions.

The WSJ might be in a unique situation where their primary audience will pay, often because companies foot the bill for employees. So perhaps they can be one of the few news-producing companies that doesn't have to depend on Google for traffic, in that their primary audience loads up their front page multiple times a day just to see what's there.

I wouldn't be surprised if they did a deal with Bloomberg to provide their content on terminals to further strengthen their ties to their core audience.

That's a really good point. Maybe the future of news survival is to pair it with a company that makes money, to support the journalism. Then again, I feel like that's how we got CNN, MSNBC, Fox News, etc. Maybe not a great idea. I really like Bloomberg and, to some extent, WSJ. I hope they can maintain their integrity.

> Maybe the future of news survival is to pair it with a company that makes money, to support the journalism

To be honest, that's the past and present of most news as well.

Newspapers would never have survived on advertising alone. The cover price usually paid for printing and distribution, and classifieds made up the bulk of the revenue. Now that they're decoupled and both news and classifieds are pretty much free, it's no wonder that news is struggling.

Hard news has (almost) never been a wildly profitable endeavor in and of itself.

The future of news survival is independent journalists being funded by people who care to have an unbiased investigative news media.

No idea how we get there, though. You basically need to persuade those who understand how critical journalism is to freedom to care to fund it, and to have a centralized platform to fund journalism that itself is not corruptible by monied interests trying to push propaganda.

> have a centralized platform to fund journalism that itself is not corruptible by monied interests trying to push propaganda

Why is a centralized platform necessary? Why isn't, e.g. Patreon, for individual journalists or individual teams of journalists sufficient?

Eh if we had benevolent entities funding journalism at a loss we'd all be far, far better for it.

We got Fox/CNN etc because journalism in service to advertisers = clickbait.

At this point, I think we can codify this: Journalism + Ad-Supported-Model = Clickbait.

Good journalism coming from firms which advertise is by accident. Like a broken clock being right twice a day.

I love subscription journalism because you're not playing the clickbait game.

There is no future for journalism in ad supported models. Plenty of future for infotainment, heck, infotainment masquerading as journalism might just take it all over, but journalism will certainly not be a part of that fold.

The Economist does this. The EIU is about 1/3 of revenues.

Organizations like WSJ should really try to market corporate-level packages where specific IP ranges are cleared for free use. Keep the price low enough that it doesn't require more than one tier of approval, and they could see some real money.

I am just surprised that sites don't do this already. It's par for the course with a lot of software; why not news or similar?

The WSJ is owned by Dow Jones & Company, which is profitable.

Isn't that BuzzFeed's business model?
