Reddit is a SPA though, right?

devjab · on June 1, 2023

Yes and no. The terrible attempts to build a new front-end are, but the old front-end, which runs on Python with Pylons (I believe) as its “front”, isn’t.

I like React, and I love TypeScript, but sites like Wikipedia, old.Reddit.com, Stack Overflow, Hacker News and so on are nice showcases of why you should never be afraid of the page reload. Your users won’t be, unless you’re building something that needs to update screen state without user input, like a mail client where a new message arrives and the user can’t just reload the page because whatever they had typed would be lost. I think this last part is the primary reason (along with mobile) that Reddit has been attempting to move to React, because the “social” parts like chats and private messages don’t instantly show up for users in the old front-end. Unfortunately they haven’t been very successful so far.

You can probably scrape their current or their new.Reddit front-ends, since you can scrape SPAs, but it’s much, much easier to scrape the old.Reddit front-end.
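
To illustrate why a server-rendered front-end is the easy case: the content is already in the HTML, so a plain parser is enough. This is only a rough sketch using Python's standard library. The sample markup and the idea that post links carry a `title` class are assumptions made for the example, not a description of the real page structure.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect (text, href) for every <a> whose class list contains "title"."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_title_links(html):
    """Return all (title, href) pairs found in an HTML string."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up markup standing in for a server-rendered listing page.
SAMPLE = """
<div class="thing"><a class="title may-blank" href="/r/a/1">First post</a></div>
<div class="thing"><a class="comments" href="/r/a/1/c">12 comments</a></div>
<div class="thing"><a class="title" href="https://example.com/x">Second &amp; last</a></div>
"""

print(extract_title_links(SAMPLE))
```

No JavaScript has to run for this to work, which is the whole appeal of the old front-end for scraping.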

matheusmoreira · on June 1, 2023

Don't know, to be honest. I assume new reddit is while old reddit isn't. Truth is, the job is even easier in the case of an SPA: you just need to use whatever internal APIs they built for themselves. They can't add idiotic restrictions to those without affecting the normal web service.
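
A rough sketch of that idea, assuming the SPA talks to some JSON endpoint: you skip the HTML entirely and read the same data the front-end reads. The payload shape here (`data.children[].data.title`) and the `fetch_json` helper are assumptions for illustration, not a documented contract of any particular site.

```python
import json
import urllib.request


def fetch_json(url, user_agent="example-script/0.1"):
    """GET a URL and decode the body as JSON (performs a real network request)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def titles_from_listing(payload):
    """Pull post titles out of a listing-shaped payload (assumed shape)."""
    children = payload.get("data", {}).get("children", [])
    return [c["data"]["title"] for c in children if "title" in c.get("data", {})]


# Hand-written sample standing in for what such an endpoint might return.
SAMPLE = json.loads(
    '{"data": {"children": ['
    '{"data": {"title": "Hello"}},'
    '{"data": {"title": "World"}},'
    '{"data": {"body": "a comment, no title"}}'
    ']}}'
)

print(titles_from_listing(SAMPLE))
```

In practice you would find the real endpoint and payload shape by watching the network tab while the SPA loads, then pass that URL to something like `fetch_json`.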

delfinom · on June 1, 2023

SPAs being a problem for scraping was last decade. Headless Chromium is pretty standard for scraping nowadays.
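
For what it's worth, a minimal sketch of the headless approach without any extra libraries: ask the browser to render the page and print the resulting DOM. It assumes a Chromium binary is installed and on the PATH as `chromium` (the name differs between systems), and the helper names are made up for the example.

```python
import subprocess


def build_dump_dom_command(url, binary="chromium"):
    """Command line that renders a page headlessly and prints the final DOM."""
    # --headless and --dump-dom are Chromium command-line switches;
    # the binary name is an assumption and varies by platform.
    return [binary, "--headless", "--dump-dom", url]


def rendered_html(url, binary="chromium", timeout=30):
    """Run the browser and return the rendered HTML (needs Chromium installed)."""
    result = subprocess.run(
        build_dump_dom_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


print(build_dump_dom_command("https://example.com"))
```

The output of `rendered_html` is ordinary HTML after scripts have run, so it can be fed to the same kind of parser you would use on a server-rendered page.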

VWWHFSfQ · on June 1, 2023

How does that work for clients, though? Should Apollo ship a headless Chromium in their mobile app?

rovr138 · on June 1, 2023

Can they launch a hidden browser view and scrape it? I have no idea if they can read from it.

moneywoes · on June 1, 2023

Isn’t that very expensive to run at scale, especially when mixing in residential IPs to avoid blocking?

ipaddr · on June 1, 2023

Each person would spider for their own needs, and most would use residential IPs.