I'd note, regarding storing data, that HTML compresses extremely well, often shrinking to under 10% of its original size. If you want to save it, you would be a fool to save it uncompressed.
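For anyone curious, it's easy to check the ratio yourself with Python's built-in gzip module. A minimal sketch (the sample markup here is made up, but it mimics the repetitive tags and attributes that make scraped HTML compress so well):

```python
import gzip

# Hypothetical scraped page: lots of repeated tags and attribute boilerplate,
# which is exactly what DEFLATE-style compression exploits.
html = ("<div class='row'><a href='/item/123'>Example item</a></div>\n" * 2000).encode()

compressed = gzip.compress(html, compresslevel=9)
ratio = len(compressed) / len(html)

print(f"raw: {len(html):,} bytes, gzipped: {len(compressed):,} bytes "
      f"({ratio:.1%} of original)")
```

Real pages won't be quite this redundant, but sub-10% is still common in my experience.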
Guess I'm a bit of a fool then because I was definitely saving uncompressed haha.
Maybe I'll give it a try at some point, but right now, I don't have any need for it. It'd be work without a clear payoff. I'm starting a new job at the end of March, and I'd rather concentrate on some features that help me run the site between now and then.
It does create a bit of work when you have to figure out which parts of the cURL command need to be ported to Python, and which can be safely omitted. Copying a request as cURL pulls in a lot of headers, many of which I still don't properly understand the purpose of.
I'll get there eventually. : ) In the meantime - thank god for whoever wrote "Copy as cURL request"!
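For what it's worth, most of what "Copy as cURL" dumps can usually be dropped. A rough sketch of the kind of trimming I mean before porting to Python; the header values and the allowlist are my own guesses, not a rule:

```python
# Hypothetical headers copied from the browser; most are noise for scraping.
copied_headers = {
    "User-Agent": "Mozilla/5.0 ...",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://example.com/",
    "Cookie": "session=abc123",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "navigate",
    "Upgrade-Insecure-Requests": "1",
}

# The ones that tend to matter: identity (User-Agent), auth state (Cookie),
# and sometimes Referer. Everything else can often be omitted first and only
# re-added one at a time if the site starts refusing requests.
KEEP = {"User-Agent", "Cookie", "Referer"}

def trim_headers(headers: dict) -> dict:
    """Drop browser-chrome headers, keeping only the likely-essential ones."""
    return {k: v for k, v in headers.items() if k in KEEP}

minimal = trim_headers(copied_headers)
print(minimal)
```

Some sites do check the Sec-Fetch-* or Accept headers, so this is a starting point, not a guarantee.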
Is there such a thing as an unscrapable site? I tried to open driver.uber.com with Pyppeteer and it fails. I'm guessing it's due to redirects, so what have you seen solve this problem?