
How paywalled sites permit access to visitors from social media sites and apps - kevinphy
https://elaineou.com/2017/01/19/how-the-twitter-app-bypasses-paywalls/
======
yalooze
This bookmarklet works too:
javascript:location.href='http://facebook.com/l.php?u='+location.href

~~~
dTal
In exchange for Facebook tracking, and potentially intercepting and modifying,
every page you look at this way.

~~~
lrem
Which, for many people, will be preferable to the obvious alternative way to
get through the paywall...

~~~
bigbugbag
Actually it is not, but most people lack the knowledge and understanding to
care enough.

------
NKCSS

        if (details.url.includes(url)) {
    

So, if I want to detect whether you have this plugin installed, I load an
image with ?plugin-test=wsj.com, since the check triggers if the string
appears anywhere in the URL.

Might want to improve this...
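
The substring check being criticized can be reproduced in isolation; `PAYWALLED` here is an illustrative stand-in for the extension's domain list, and the probe URL is hypothetical:

```javascript
// Illustrative stand-in for the extension's list of paywalled domains.
var PAYWALLED = ["wsj.com", "ft.com"];

// The extension's check: fires for ANY request URL that merely contains
// a listed domain as a substring, not just requests to that domain.
function matches(url) {
  return PAYWALLED.some(function(domain) { return url.includes(domain); });
}

// A third-party page can therefore probe for the extension:
console.log(matches("https://attacker.example/probe.png?plugin-test=wsj.com")); // true
console.log(matches("https://attacker.example/probe.png"));                     // false
```

Because the headers only get rewritten when the check matches, the server behind `attacker.example` can compare the two requests and infer whether the extension is installed.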

~~~
NKCSS
If you add this to the manifest "permissions" section and go to
[http://pbb.nkcss.com/](http://pbb.nkcss.com/) you can see it's easily
detected:

    
    
        "http://pbb.nkcss.com/*"
    

And if I can do it in a few minutes, I'm sure those who have a paywall can do
so as well.

------
chrismorgan
Some code review just in case anyone is interested. (I don’t expect it to make
it into the article as it was written six months ago.)

This pattern:

    
    
      if (details.url.includes(url)) {
        return true;
      }
      return false;
    

should be replaced by this:

    
    
      return details.url.includes(url);
    

This pattern:

    
    
      array.map(someFunction).reduce(function(a, b) { return a || b; }, false)
    

should be replaced by this:

    
    
      array.some(someFunction)
    

(Note the semantics are slightly different—`.some` will break early, so it’s
more efficient and equivalent provided there are no side-effects in the map
function.)
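
The early-exit difference can be observed directly by counting predicate calls (a toy illustration, not code from the article):

```javascript
// Count how many times the predicate runs under each approach.
var calls = 0;
var isBig = function(x) { calls++; return x > 1; };

// map visits every element before reduce runs.
[1, 2, 3].map(isBig).reduce(function(a, b) { return a || b; }, false);
var mapReduceCalls = calls; // 3

// .some stops as soon as the predicate returns true (at element 2).
calls = 0;
[1, 2, 3].some(isBig);
var someCalls = calls; // 2

console.log(mapReduceCalls, someCalls); // 3 2
```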

Taking both of these, the following:

    
    
      var useTwitter = VIA_TWITTER.map(function(url) {
        if (details.url.includes(url)) {
          return true;
        }
        return false;
      })
      .reduce(function(a, b) { return a || b}, false);
    

can be rewritten much more simply as:

    
    
      var useTwitter = VIA_TWITTER.some(function(url) {
        return details.url.includes(url);
      });
    

You could even do it thus if you desired:

    
    
      var useTwitter = VIA_TWITTER.some(details.url.includes.bind(details.url));
    

… but that’s probably harder to read. I will mention arrow functions, however,
which are pretty:

    
    
      var useTwitter = VIA_TWITTER.some(url => details.url.includes(url));
    

This part:

    
    
      details.requestHeaders.filter(function(header) {
    
        // block cookies by default
        if (header.name !== "Cookie") {
          return header;
        } 
    
      })
    

`.filter` only cares about truthiness in its return value—as this code does
it, undefined is false and an object is true. But you could simplify it:

    
    
      details.requestHeaders.filter(function(header) {
        // block cookies by default
        return header.name !== "Cookie";
      })
    

Also in the original code’s usage of map, it’s not actually changing the
values, only things inside them, so using `map` is wasteful (as it entails
allocating a new array). You could just use `forEach`:

    
    
      var reqHeaders = …;
      reqHeaders.forEach(function(header) {
        if (header.name === "Referer") {
          header.value = setRefer(useTwitter);
          foundReferer = true;
        }
        if (header.name === "User-Agent") {
          header.value = setUserAgent(useTwitter);
          foundUA = true;
        }
      });
    

A remark on fine-tuning performance: when you access properties inside an
object multiple times, it’s optimal to store it as a local variable to save
having to look it up multiple times. (This is especially the case if the
property is expensive to access.) Take the `blockCookies` method:

    
    
      function blockCookies(details) {
        for (var i = 0; i < details.responseHeaders.length; ++i) {
          if (details.responseHeaders[i].name === "Set-Cookie") {
            details.responseHeaders.splice(i, 1);
          }
        }
        return {responseHeaders: details.responseHeaders};
      }
    

This is accessing `details.responseHeaders` many times when it only needs to
access it once. It is also accessing its `length` member once per iteration,
rather than caching the length. Normally for that I’d say “store the length
once up front,” but in this case the code is changing the array length in the
loop, so that’d actually break things. On that note, the code as published is
actually missing some cookies, because it removes an item from the array and
then skips past the new element at that index. To fix that, you need to move
the `++i` into the loop so it can be skipped if you do splice the array. Also
in order to not need to access the length property many times you could
iterate in reverse instead of forwards. I might write the whole function like
this:

    
    
      function blockCookies(details) {
        var headers = details.responseHeaders;
        var i = headers.length - 1;
        while (i > -1) {
          if (headers[i].name === "Set-Cookie") {
            headers.splice(i, 1);
          } else {
            i--;
          }
        }
        return {responseHeaders: headers};
      }
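
For reference, the skipped-element bug described above can be demonstrated in isolation with a plain array of header names (a toy illustration, not code from the article):

```javascript
// Splice-then-increment skips the element that shifts into slot i:
// after removing index 0, the second "Set-Cookie" moves to index 0,
// but the loop advances to index 1 and never examines it.
var names = ["Set-Cookie", "Set-Cookie", "X-Other"];
for (var i = 0; i < names.length; ++i) {
  if (names[i] === "Set-Cookie") {
    names.splice(i, 1);
  }
}
console.log(names); // ["Set-Cookie", "X-Other"] — one cookie survived
```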

~~~
mattmanser
You can actually just do this:

    
    
        for (var i = 0, l = details.responseHeaders.length; i < l; ++i) {

~~~
chrismorgan
I would have done that, but because of the splicing you don’t actually want
++i every time. That’s why I rewrote it to a while loop.

Even then you could do it as a for loop with an empty increment clause, of
course. Or even use a one-liner for loop with no body:

    
    
      function blockCookies(details) {
        for (var headers = details.responseHeaders, i = headers.length - 1; i > -1; headers[i].name === "Set-Cookie" ? headers.splice(i, 1) : i--);
        return {responseHeaders: headers};
      }
    

But that’s crazy talk. Leave optimisations like that to the minifier if it
feels so disposed.

------
userbinator
I remember this trick many years ago was useful for bypassing download sites
that otherwise obliged you to use their adware-filled "download manager" to
get files or gave bonuses to such users (special user-agent). The words
"User-Agent: MEGAUPLOAD 2.0" might bring back interesting memories for some here.
;-)

It's trivial to do this with a filtering proxy, which means it works in all
browsers. On the other hand, I've unknowingly embedded images from image
hosting sites that didn't allow "hotlinking", only to be told by other users
that they couldn't see them, because of the referer headers (or lack thereof)
my usual configuration sends.
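
The rewrite rule such a filtering proxy applies can be sketched as a plain function over a header map; the spoofed values below are illustrative, not necessarily the exact ones a given site checks for:

```javascript
// Merge spoofed values over the client's outgoing headers, the way a
// filtering proxy would before forwarding the request upstream.
function rewriteHeaders(headers) {
  return Object.assign({}, headers, {
    "referer": "https://t.co/",       // pretend the click came via Twitter
    "user-agent": "Twitterbot/1.0",   // pretend to be Twitter's fetcher
  });
}

var out = rewriteHeaders({
  host: "example.com",
  referer: "https://news.ycombinator.com/",
});
console.log(out.referer);       // "https://t.co/"
console.log(out["user-agent"]); // "Twitterbot/1.0"
```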

IMHO things like this shouldn't be spread _too_ widely... much like the fight
between adblockers and anti-adblockers, it can only eventually lead to a more
hostile computing environment.

------
netcan
Paywalled sites (particularly news/media sites) are playing a messy game
here. It’s similar to the free-vs-paid music debate. In the radio-and-records
days, free radio play drove record sales revenue. Ideally they would sell
records to anyone willing to pay, but give music away to anyone willing to
listen. In the actual world, you can’t do this perfectly, so you need another
way for free & paid to co-exist. Once the medium changed to digital, all the
rules got thrown out. Attempts to force digital to play by radio-and-records’
rules have been a slog.

Many internet generations ago news sites were mad at Google for showing ads
alongside their headlines on news.google. They even got some lawmakers to
agree. Google offered to de-index them. Stalemate.

Similar issue here. News sites want paid subscriptions if they can get them,
so paywall. They also want the readers who won’t pay, so no paywall.

Overall, I think paywalling is a semi-dead end. I don’t mean that it won’t
work for any site, but it’ll probably be a niche revenue source like a print-
only publication. Most news sites want to be part of the greatest, most
relevant discussions. Those happen on the internet as a whole, not inside
walled gardens. This free-for-some, paid-for-others mess is too much of a
kludge to be _the_ model.

~~~
frandroid
It's fine talking about messes, but _someone_ has to pay for the content, and
display ads don't cut it. Paywalls will be the dominant form of monetization
for most mainstream news orgs in the next 5 years. The second most important
one will be the Guardian's supporter model, which is a paywall lite. It's
still not clear this model will work, as the Guardian is still bleeding money.

I think eventually you'll see a re-arranging of press wire cooperatives such
as AP and CP to limit dissemination of wire content for free without strings
attached, forcing the hand of many.

~~~
bigbugbag
The content is usually crap used to brainwash people, so let the shareholders
pay for it. Besides, once _someone_ has paid, no one needs to pay ever again,
because sharing and digital copies.

The business model is wrong and broken, and this failed business model is not
our problem.

------
JokerDan
I honestly prefer not to use paywalled websites. I also run adblock in
whitelist mode: sites I often visit, find entertaining, and consider worthy
of any money from my activity get whitelisted.

------
mola
I am curious how these sorts of guides are taken as innocuous fun reads while
a guide to, say, shoplifting would seem less than legitimate.

~~~
amelius
And how about a guide to illegally copying Spotify tracks without an account?

~~~
clarkenheim
link?

~~~
amelius
Just hypothetical. I'm curious about the perceived value of music versus news
articles.

------
donohoe
As someone who has worked on building two paywalls and is still involved with
them, please bear in mind a few things.

1\. Publishers have the ability to configure access on paywalls as they see fit.

Whether based on referrer, UA, and a whole host of other attributes, history
and so on.

2\. Publishers don't care about a degree of paywall evasion.

Studies show that people willing to pay will pay, and those who go to great
lengths to evade won't ever subscribe. The question, then, is: do you want to
waste expensive developer resources in an arms race against people who'll
never give you a cent, or do you want to spend that developer time enhancing
the experience for those who will subscribe?

3\. Full locked-down paywalls are known to be bad.

Publishers still want to ensure their content is in the public conversation,
and that means their content has to be accessible in some form - or you
strategically choose to follow a different business model. See:
[https://techcrunch.com/2010/11/02/times-paywall-4-million-
re...](https://techcrunch.com/2010/11/02/times-paywall-4-million-readers/)

------
cjg
This is the kind of thing that eager prosecutors will turn into a CFAA charge.

------
gnicholas
Another way to get free access to the WSJ, which doesn't require stepping into
a legally/ethically grey area, is to use one of the handful of apps/extensions
that have been granted free access.

The WSJ approached me and offered this access to a project I'm building, Read
Across The Aisle [1]. I've built an iOS app and Chrome extension, both of
which are free. I don't know exactly what the WSJ gets out of this deal (we
have not given them, or anyone else, any user data), but I think it's that
they want to be associated with post-filter bubble projects.

1: [http://www.readacrosstheaisle.com](http://www.readacrosstheaisle.com)

------
personjerry
So why isn't this on the Chrome store?

~~~
igravious
A Paywall Extension, hmmm? What if it were part of a dev tool bundle?

It's arguably unfair that users of Facebook and Twitter and other social media
sites should have access to more content than someone who has decided not to
use those platforms.

~~~
gms7777
Private websites have no duty to allow "fair access" to their content. Non-
users of Facebook are not a legally protected class.

------
digi_owl
Firefox and ModifyHeaders is nice for this.

------
whizzkid
While I think these kinds of posts are really fun to read, they should also
say a little more about why these sites are paywalled. People need to
understand that traditional newspapers are dying, and that in order to support
real journalists, they need to make money. I am neither working for any news
organisation nor talking about a specific news outlet.

As far as I can see, they are trying hard not to annoy people but at the same
time try to make some money for their work.

~~~
amelius
I'm wondering what keeps e.g. Google from implementing micro-payments. If I
can get access to an article or website for 20 cents by simply clicking a
(universal) payment button, I would do so in many cases.

~~~
saimiam
Like the old adage goes, they won't do this because their livelihood depends
on it.

A single user can be exposed to 3-4 (or more) ads per page. Each ad can earn
them some money depending on the relevance of the ad to the viewer. In this
case, the price they charge the advertiser is more guesswork/auction than
science. They don't have to/can't prove to anyone that the ad truly worked.
Advertisers tweak and A/B test ads to achieve higher engagements and better
results.

A fixed-rate model like you're suggesting limits the scalability of selling
the same pair of eyeballs to 3-4+ advertisers. It forces Google and content
creators to attach a hard number to each visitor, which is essentially
upper-bounded by people's spending habits - if people are feeling poor, they
can simply stop visiting websites to make ends meet!

For these reasons, I don't believe any advertising-based company is ever truly
going to get behind pay-as-you-go content. I've made my bet (see my profile
for deets aka FD) and I'm in the process of quitting my job to put my money
where my mouth is.

~~~
URSpider94
I don't think these two approaches are diametrically opposed. In the
traditional print journalism model, I pay for my copy of the
newspaper/magazine, and I get a lot of ads -- most magazines are > 50% ad
content by area. Even online, I am a subscriber to the NYTimes and WaPo -- I
still get ads when I view the site. Paying does not turn off ads.

I am sure I'm in the minority on this site, but I believe that it's possible
to do advertising-supported content in a tasteful way where the ads add to the
experience, or at least do not detract from it.

~~~
dreamcompiler
Likewise cable TV. You pay for a subscription and you also see a lot of ads
(more than on OTA TV, which is one of many reasons I cut the cord.) In the
early days of cable networks, many had no ads because it was thought they
could survive on subscription revenue. That idea didn't last long.

~~~
saimiam
As a whole, broadcast and cable TV is (used to be) a captive market. It was
very possible for channels to synchronize or nearly synchronize their
commercial breaks which meant that they could get away with charging for
content AND showing ads.

The proof is that the minute add-ons like DVRs and set top boxes like Roku
came along, the first thing people did was to bail on the old TV model.

------
Animats
This only works, of course, for sites that want links in Twitter feeds to
bypass their paywalls. I'd at least expect an ad at that point.

------
porker
Interestingly the FT's paywall is immune to this in my testing. Device
fingerprinting?

~~~
animeseinfeld
Not that I condone doing it, but the Googlebot trick still works for the FT.

------
bypasspaywalls
You can download my Firefox plugin to bypass the WSJ paywall (also bypasses FT
paywall): [https://addons.mozilla.org/en-
US/firefox/addon/bypasspaywall...](https://addons.mozilla.org/en-
US/firefox/addon/bypasspaywalls/)

And if you want the Chrome version you will need to manually download it:
[http://bypasspaywalls.weebly.com/](http://bypasspaywalls.weebly.com/)

------
allenleein
How Google’s Web Crawler Bypasses Paywalls

[https://elaineou.com/2016/02/19/how-to-use-chrome-
extensions...](https://elaineou.com/2016/02/19/how-to-use-chrome-extensions-
to-bypass-paywalls/)

~~~
chinathrow
That article links back to the one published on HN:

"Update: A newer version of the chrome extension is available here."

------
cupcakestand
tl;dr:

The Wall Street Journal stopped allowing search engines special access through
its paywall. By spoofing the Twitter app's Referer and User-Agent headers,
access is still possible; an included Chrome extension script implements this
idea.

~~~
Jaruzel
<rant> Personally, I'm finding the increased use of tl;dr here on HN annoying.
I feel that, like myself, HN readers are intelligent enough to read and
understand the articles without someone coming along and posting a summary
simply for up-votes.

HN is all about the articles, and then the discussion on top. If people find
the articles so hard to penetrate that they need a tl;dr summary, then maybe
HN isn't the site for them? </rant>

~~~
rovek
This is exactly the kind of article I expect a tl;dr on. I actually looked for
one to save myself opening the article, scanning for "referer", and then
closing it immediately (which I did, because the tl;dr was so far down the
page).

I think this is a backlash against the self-congratulatory, self-indulgent
tone Medium has perpetuated, not by design I'm sure.

Sometimes one just wants to validate what one expects/knows and move on;
personally, I don't need more than 200 words to elaborate on this headline.

edit: typo

------
Kenji
Is it just me or are websites that implement these kinds of selective paywalls
rarely worth visiting, let alone worth spending effort to get through the
paywall?

~~~
xU1ppskunDmy6oz
The Wall Street Journal is a very, very good newspaper that brought us such
amazing series as "What They Know", which I believe started the practice of
referring to the use of trackers as "spying" (much to the dismay of
advertisers), which has certainly helped deliver the message that tracking is
something users should worry about.

