How is this possible?
To summarize at a very high level...
A frequent allegation is that this is unauthorized access to computer systems. The scrapers argue that this is public data, so they are simply accessing it. Their access isn't meaningfully different from that of regular users, who are allowed. From their point of view, if the service doesn't want to share the data, it shouldn't make it publicly available.
Another common accusation is breaching the ToS. Generally the defense is that they didn't agree to any contract.
A last effort is some sort of copyright claim. Generally the scrapers will argue that the data can't be copyrighted, isn't owned by the service, or that some sort of license was given (back to the public data argument).
Of course every case is different and has different points but these are the common ones that I have seen.
In fact this is one of the downsides of the US legal system - litigation is so expensive that nobody dares to try it, even though it could set a legal precedent that would benefit society at large. This is IMO something a consumer-friendly regulatory environment (such as the EU) should settle in advance, as with the GDPR for example. But given that they can't even be bothered to enforce that effectively, I don't have much hope. (If they enforced it, it would actually remove a big use-case for scraping Instagram, as you would be able to use the official clients without compromising your privacy.)
The only way is for Instagram to restrict registration altogether, but that might create a black market where existing users sell their accounts, and it would cannibalize Instagram's own userbase (bad for Meta's stock price).
If there's demand for their service I don't see why it wouldn't scale. Get more phones and more SIM cards, and build automation around all this infrastructure to remove as much manual work as possible.
> Meta/Instagram have multiple teams dedicated to preventing this type of scraping
That's great, but ultimately they still have a weakness: they want people to be able to see their stuff - at least some of it - without logging in. As long as you can either simulate a normal device perfectly or, even better, use real devices or virtualize them, there isn't much that Facebook can do without impacting legitimate usage, which they don't want to do.
The most kosher way to get Instagram data is through CrowdTangle, which is owned by Meta, though it has its own caveats.
You think Instagram is going to get the FBI to bust doors in Mogadishu or wherever the operators are?
Might be an issue if you are in the US or the West, since it's behind a walled garden (you need to authenticate to access it). But you do not need to pay for it, nor are new registrants restricted (they have access to everything), so it's a public website that forces user accounts. The best Instagram can do is throttle or ban the accounts doing the scraping.
*Also, I'm not an Insta user, so in my mind it is just a thread of images and comments. Maybe my understanding of Insta is off?
(Remember when Facebook and Twitter had this built in?)
What do you guys do differently vs. somebody just making the leap, purchasing the 4G/LTE proxy hardware, and doing things themselves?
Where are you guys located?
You don't log in for 2 weeks and are suddenly focusing your comments on me after the last one under your other nick got flagged ;)
Yes, some new accounts are abusive, but we have ways of dealing with that (you can always email email@example.com to alert us). But it's very important to err on the side of welcoming people. There's a limit to how much damage new accounts can do anyhow.