How are you going to do it without knowing the actual authentication key(s)? I don't trust anyone enough to hand over my credentials, so unless the site being scraped has some sort of OAuth support, how are you going to get any data?
Of course, if this were an offline or self-hosted product, that would solve the auth problem instantly.
Would there be any way to fake the beginning of an OAuth session with Facebook, Google or any other OAuth-authenticated site? Kind of like replaying cookies to hijack sessions?
The route of proxying the web page makes it very difficult to actually authenticate on Facebook's or Google's website through the proxied page without first rewriting most of their JavaScript and hijacking their Ajax calls on the fly.
The approach I took was to use the browser extension to capture the cookies from the browser once the user has signed in on, e.g., Facebook.
The route of proxying the website does in fact do away with the need to install any external third-party libraries.
The browser extension I built, coupled with the web service it's integrated with, does allow scraping of logged-in pages from Facebook, Google and LinkedIn as well.
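For anyone curious what the cookie approach could look like on the service side, here is a rough sketch, not the actual code of the extension or web service. It assumes the extension hands over the user's own cookies as a list of dicts with `name`, `value`, `domain`, `path` and `secure` keys (that export format is my assumption), and the helper names `make_cookie`, `jar_from_export` and `cookie_header_for` are made up for illustration. It only uses Python's standard `http.cookiejar` and `urllib.request`, and the example domain is a placeholder.

```python
from http.cookiejar import Cookie, CookieJar
import urllib.request


def make_cookie(name, value, domain, path="/", secure=True):
    """Build a Netscape-style (version 0) cookie from exported fields."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=secure, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_from_export(exported):
    """Load cookies exported by the (hypothetical) extension into a jar."""
    jar = CookieJar()
    for c in exported:
        jar.set_cookie(make_cookie(
            c["name"], c["value"], c["domain"],
            c.get("path", "/"), c.get("secure", True),
        ))
    return jar


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would send to url, or None."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)  # applies domain, path and secure matching
    return req.get_header("Cookie")


if __name__ == "__main__":
    exported = [
        {"name": "session_id", "value": "abc123", "domain": ".example.com"},
        {"name": "csrf", "value": "tok456", "domain": ".example.com"},
    ]
    jar = jar_from_export(exported)
    print(cookie_header_for(jar, "https://www.example.com/profile"))
```

To actually fetch a page you would wrap the jar in `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))` and call `open()` on it. The jar only attaches cookies to matching domains and, for secure cookies, only over HTTPS, so the session is not leaked to other hosts.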
Hah, I've been working on this recently with Facebook, on a TV set-top-box. It was painful and I ended up giving up. xd_arbiter.php is the key, I think.