

Ask HN: At Google, are you able to read every online scientific journal for free? - asciilifeform

Paywall-protected journal articles often appear in search results (in Google Scholar and elsewhere). Can anyone here who works for Google tell us whether all of these results load freely from the Google campus's IP range? This seems likely, given that the search engine can index them while they are blocked to everyone except paying subscribers. Google could easily put a stop to this abuse by declaring that it will publicly cache any page that looks different when loaded from its publicly known IP range than when loaded from a separate, secret IP.
======
Travis
I suspect the paywall-protected articles that Google runs across aren't
protected by IP address, but by User-Agent string. I've seen a few hacks for
this (look at the NYT User Agent hack), and the FF plugin that lets you alter
your UA string points the same way.

Put another way, I don't know if Google necessarily has registered all of its
crawling IP addresses. Sure, you could do a reverse IP lookup -- but you'd
have to do that on every request, which can get expensive.

No, the cheap-easy route is probably something as mundane as UA string.

~~~
asciilifeform
Faking the UA string does nothing on all of the journal sites I've tried.

------
almost
An important part of Google Scholar is that it lets you find (almost) any
article published. If they removed all the articles that are only available
behind a paywall, it would be a lot less useful.

Note how they even show articles that are not available online at all. At
least then you know they exist and can track them down at a library.

------
blinks
I work at Google (blinks AT). Our search results are no different, but we have
a corporate membership with ACM's Digital Library which is _insanely_ useful.

I'm sure most other major tech companies have the same deal.

------
ivank
This practice is explicitly allowed. Google wants paywall-protected content to
be "discoverable".

~~~
rjprins
And I can imagine the content owners also want it to be discoverable, as it
can only lead to more sales.

------
kierank
Chances are you won't have an IP address that's part of the "indexing range".

