Ask HN: At Google, are you able to read every online scientific journal for free?
8 points by asciilifeform on April 21, 2009 | 8 comments
Paywall-protected journal articles often appear in search results (in Google Scholar and elsewhere). Can anyone here who works for Google tell us whether all of these results load freely from the Google campus's IP range? This seems likely, given that they are indexable by the search engine but blocked for everyone except paying subscribers. Google could easily put a stop to this abuse by declaring that it will publicly cache any page that looks different when loaded from its publicly known IP range than from a separate, secret IP.
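To make the proposed check concrete, here is a minimal sketch of the comparison step only. It assumes the same URL has already been fetched twice (once from a known crawler address, once from an unannounced one) and that both responses are available as strings. The function name `looks_cloaked` and the 0.9 threshold are hypothetical choices for illustration, not anything Google is known to use.

```python
from difflib import SequenceMatcher


def looks_cloaked(crawler_view: str, visitor_view: str, threshold: float = 0.9) -> bool:
    """Return True when the two copies of a page differ substantially.

    crawler_view: body served to the publicly known crawler address.
    visitor_view: body served to the separate, unannounced address.
    threshold:    hypothetical similarity cutoff (1.0 means identical).
    """
    # ratio() is 1.0 for identical strings and approaches 0.0 as they diverge.
    similarity = SequenceMatcher(None, crawler_view, visitor_view).ratio()
    return similarity < threshold
```

A real check would also have to tolerate ordinary differences between two loads of the same page (ads, timestamps, session tokens), which is why the sketch uses a similarity cutoff rather than an exact match.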



I suspect the paywall-protected articles that Google runs across aren't protected by IP address, but by User-Agent string. I've seen a few hacks for this (look at the NYT User-Agent hack), and the Firefox add-on that lets you alter your UA string points the same way.

Put another way, I don't know if Google has necessarily registered all of its crawling IP addresses. Sure, you could do a reverse DNS lookup on the IP -- but you'd have to do that on every request, which can get expensive.

No, the cheap-easy route is probably something as mundane as UA string.
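For comparison, the more expensive route mentioned above can be sketched as a forward-confirmed reverse DNS check with a per-IP cache, so the lookup is not repeated on every request. This is only an illustration under stated assumptions: the hostname suffixes in `CRAWLER_SUFFIXES` and the function name `is_verified_crawler` are hypothetical, and nothing here is known to be what any publisher actually runs.

```python
import socket

# Assumed allow-list of crawler hostname suffixes (illustrative only).
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # PTR lookup: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # Forward lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(ip, reverse_lookup=_reverse, forward_lookup=_forward, cache=None):
    """Forward-confirmed reverse DNS check for a client IP.

    The lookups are injectable so the logic can be exercised without
    touching the network; cache is an optional dict keyed by IP.
    """
    if cache is not None and ip in cache:
        return cache[ip]
    try:
        host = reverse_lookup(ip)
        # The PTR name must be in an allowed domain AND resolve back to the
        # same IP; otherwise anyone controlling their own PTR record passes.
        ok = host.endswith(CRAWLER_SUFFIXES) and ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        ok = False
    if cache is not None:
        cache[ip] = ok
    return ok
```

Checking the User-Agent header, by contrast, is a single string comparison on data the client controls, which is why it is both cheaper and easier to fake.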


Faking the UA string does nothing on all of the journal sites I've tried.


There are not that many publishers and not all crawlers need access. My guess would be the publishers gave Google their own license with which a specific crawler could identify itself.


An important part of Google Scholar is that it lets you find (almost) any article published. If they removed all the articles that are only available behind a paywall, it would be a lot less useful.

Note how they even show articles that aren't available online at all, so at least you know they exist and can track them down at a library.


I work at Google (blinks AT). Our search results are no different, but we have a corporate membership with ACM's Digital Library which is insanely useful.

I'm sure most other major tech companies have the same deal.


This practice is explicitly allowed. Google wants paywall-protected content to be "discoverable".


And I can imagine the content owners also want it to be discoverable, as it can only lead to more sales.


Chances are you won't have an IP address that's part of the "indexing range".



