We’ve always recommended that our customers use the escaped fragment protocol, so it will be a smooth transition as Google slowly stops crawling the ?_escaped_fragment_= URLs. No changes need to be made if you are currently using Prerender.io. Keep an eye on our Twitter (@prerender) and we’ll post updates on Google’s transition.
Assuming renders take 7-10 seconds at worst, 60m renders a month works out (if I've got my math right!) to roughly 23 renders per second (60,000,000 / 2,592,000 seconds in a 30-day month), which means you need somewhere between 162 (23 × 7) and 231 (23 × 10) renders in flight at any given moment just to keep up.
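Redoing the arithmetic carefully (a sketch; it assumes a 30-day month and perfectly even load, which real traffic never is):

```python
MONTHLY_RENDERS = 60_000_000
SECONDS_PER_MONTH = 30 * 86_400  # 2,592,000 seconds in a 30-day month

# Average render request rate.
renders_per_second = MONTHLY_RENDERS / SECONDS_PER_MONTH  # ~23.1/s

# At 7-10 seconds per render, this many renders are in flight at once,
# i.e. roughly how many Chrome instances must be running in parallel.
concurrent_low = renders_per_second * 7    # ~162
concurrent_high = renders_per_second * 10  # ~231
```

So the back-of-envelope answer is on the order of a couple hundred concurrent Chrome instances, plus whatever headroom peak traffic demands.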
Given that a single Chrome instance on my new-but-not-particularly-amazing i3 box can be sluggish at the best of times... I have no idea what sort of tolerance Xeon(?)-class hardware (possibly running Xen? :P) has for running multiple entire copies of Chromium. I initially wondered if you needed 1,000 compute instances, then I figured maybe you only needed 400, and now I honestly don't know at all.
I'm also curious how using Headless Chrome and PhantomJS is working out. As in, genuinely interested. As I understand it, PhantomJS has pretty much wound down, while Headless Chrome is just different enough from regular Chrome that a page can tell which one it's running on (https://news.ycombinator.com/item?id=14936025). I've been idly curious about "perfectly sandboxing" webpages so they honestly can't tell they're not in a "normal" PC/laptop/mobile environment, and my impression is that I'd have to start with a _very_ carefully configured copy of normal Chromium in order to do it.
I must admit I got curious about what 60m monthly renders looked like against the pricing structure... but couldn't really figure it out; it's not a simple enough exponential curve (and I can't math for nuts). Single-stepping through the pricing algorithm was very interesting though ($1522 for enterprise, huh, cool).
PS. The view-source link at the bottom is unfortunately broken; Chrome started blocking such URLs fairly recently. Fixing it will likely require, ironically, a little server-side rendering :)
EDIT: One last thing, note https://news.ycombinator.com/item?id=15882066 from this thread
Headless Chrome is great and we're super thankful that the Chromium team put the work in! PhantomJS is good... it just doesn't have all of the latest features, like ES6. So it was really helpful that headless Chrome came along right as people started using more ES6.
Yeah, Chrome did break the opening of view-source URLs a while back for our https://prerender.io/ buttons on the bottom of the homepage.
Prerender is almost exactly what I've been looking for for a while! I'm now having a great weekend!
I want to scrape a JSON object within a script tag in an AJAX heavy webpage - if prerender removes all script tags, does that mean I can't use prerender for my project?
Is there a way to tell prerender not to remove certain scripts?
PS: I intend to use prerender to replace scrapy-splash middleware.
Does your team have any plans to build a scrapy middleware?
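As an aside, if the JSON object is present in the page's initial HTML rather than injected by the AJAX calls, you may not need a renderer at all; a plain HTTP fetch plus a little parsing can pull it out. A rough sketch (the `window.__DATA__ =` marker is a made-up example, substitute whatever identifies the right script tag on the real page, and note the naive regex won't survive a `</script>` inside a string):

```python
import json
import re

def extract_script_json(html: str, marker: str) -> dict:
    """Pull a JSON object out of an inline <script> tag in raw HTML.

    `marker` is whatever prefixes the object, e.g. a (hypothetical)
    assignment like 'window.__DATA__ ='.
    """
    pattern = re.compile(re.escape(marker) + r"\s*(\{.*?\})\s*;?\s*</script>",
                        re.DOTALL)
    match = pattern.search(html)
    if match is None:
        raise ValueError("marker not found in any script tag")
    return json.loads(match.group(1))

page = '<html><script>window.__DATA__ = {"items": [1, 2, 3]};</script></html>'
data = extract_script_json(page, "window.__DATA__ =")  # {'items': [1, 2, 3]}
```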
Do you think Google's indexer also uses headless Chrome now to index pages?
I think they still use a plain HTTP request based crawler for "most" sites (mainly for speed), but then flip on Chrome-based crawling for popular sites and for sites that seem to be JS-heavy. I see no reason why, long term, Chrome wouldn't become the primary crawl/render engine for Google.
I wonder if all. the. security. patches. from every subsequent Chrome milestone regularly get backported to M41?
Obviously it's sandboxed to the hilt. Poking the sandbox and seeing what it was made of would be VERY interesting though.
"If you can only provide your content through a 'DOM-level equivalent' prerendered version served through dynamic serving to the appropriate clients (ed. note: e.g. Google bot), then that works for us too."
Not quite the same thing as abandoning AJAX crawling.
You could even go so far as to remove all unnecessary JS and CSS, but that'd require a bit more elbow grease.
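For simple pages that step is only a few lines. A naive sketch (regexes are a poor substitute for a real HTML parser here, and this ignores inline event handlers and external stylesheets):

```python
import re

def strip_scripts_and_styles(html: str) -> str:
    """Remove <script> and <style> blocks from a prerendered snapshot."""
    html = re.sub(r"<script\b[^>]*>.*?</script>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    html = re.sub(r"<style\b[^>]*>.*?</style>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    return html

snapshot = '<p>Rendered content</p><script src="/app.js"></script>'
print(strip_scripts_and_styles(snapshot))  # <p>Rendered content</p>
```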
Why would someone do that? I'm reminded of this exchange from the Hitchhiker's Guide to the Galaxy:
> “But the plans were on display…”
> “On display? I eventually had to go down to the cellar to find them.”
> “That’s the display department.”
> “With a flashlight.”
> “Ah, well, the lights had probably gone.”
> “So had the stairs.”
> “But look, you found the notice, didn’t you?”
> “Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”
 - https://scrapy.org
 - https://github.com/scrapy-plugins/scrapy-splash
A meta tag to autodetect the lack of JS and auto-redirect could be an interesting feature...
(The idea being the JS deletes the tag from the page)
I'd say that Google totally ruined Deja but if I remember correctly it had already declined before the acquisition.
Save for the endless cross-posting, out-of-control trolls, and the FTDSOJ thread.
What's it mean?