Can anyone explain to me how these services are legal? I didn't read Instagram terms and conditions but I'm pretty sure there are tons of points against scraping, copying and distributing their data, in particular using them to make money.
I don't follow this topic closely but it is definitely in a legal grey area and under frequent debate (and lawsuits).
To summarize at a high level...
A frequent allegation is that this is unauthorized access to computer systems. The scrapers argue that this is public data, so they are just accessing it. Their access isn't meaningfully different from that of regular users, who are allowed. From their point of view, if the service doesn't want to share the data, it shouldn't make it available.
Another common accusation is breaching the ToS. Generally the defense is that they didn't agree to any contract.
A last effort is some sort of copyright claim. Generally the scrapers will argue that the data can't be copyrighted, isn't owned by the service, or that some sort of license was given (back to the public data argument).
Of course every case is different and has different points but these are the common ones that I have seen.
Yeah, the LinkedIn case gave the green light to scrape any publicly available information. Craigslist bullied scrapers via lawsuit (EFF covered it), but post-LinkedIn there have been zero grounds for Craigslist to use the DDoS argument (since the website is built to handle far more traffic than scrapers can generate).
Breach of ToS has nothing to do with legality. It's definitely a breach of ToS, but legality will depend on the local jurisdiction, and enforcement will depend on whether the user is in reach of a legal system that cares about it (good luck when the user is anonymous or based in Russia or another US-unfriendly country).
The simple answer is: this is not legal and also doesn't work at scale. Try running this type of scraping for a few thousand profiles - you will quickly be restricted.
It's definitely a breach of ToS, but I wouldn't be so fast at calling it illegal. It's a grey area that has yet to be properly litigated - I think the closest we've got is the LinkedIn scraping case and I don't remember whether that one even reached a conclusive answer.
In fact this is one of the downsides of the US legal system - litigation is so expensive that nobody dares trying it even though it could set a legal precedent that would benefit society at large. This is IMO something a consumer-friendly regulatory environment (such as the EU) should settle in advance like with the GDPR for example, but given they're not even bothered to enforce that effectively, I don't have much hope (if they enforced it, it would actually remove a big use-case for scraping Instagram, as you would be able to use the official clients without compromising your privacy).
You are wrong. This is not illegal. With a 4G/LTE proxy machine you can easily generate thousands of profiles rapidly and cheaply. They would be able to detect them at some point (it will be harder if you go slowly) but it wouldn't stop the scraping.
The only way is for Instagram to restrict registration altogether, but that might create a black market where existing users sell their accounts, and cannibalize its own userbase (bad for Meta's stock price).
I may be wrong about this being illegal (depending on the country you reside in), but it is certainly not an approach that scales. Meta/Instagram have multiple teams dedicated to preventing this type of scraping. Unless you're willing to invest an equivalent level of resources, any success in scraping Instagram data will be temporary.
If there's demand for their service I don't see why it wouldn't scale. Get more phones, more SIM cards, and have automation around all this infrastructure to automate away as much of the stuff as possible.
> Meta/Instagram have multiple teams dedicated to preventing this type of scraping
That's great but ultimately they still have a weakness: they want people to be able to see their stuff - at least some of it - without logging in. As long as you can either simulate a normal device perfectly, or even better, use real devices or virtualize them, there isn't much that Facebook can do without impacting legitimate usage which they don't want.
Instagram is one service that is very particular about enforcing their API usage. Anyone attempting to monetize Instagram data obtained outside of the developer program will get a C&D very quickly.
The most kosher way to get Instagram data is to get it through CrowdTangle which is owned by Meta but has its own caveats.
So? ToS is not the law! Nor can you use the CFAA here. It is not hacking. In addition, if the operator lives in a jurisdiction that does not respect an American corporation's C&Ds, what happens? Instagram has no legal grounds to seek extradition because somebody is scraping them lol.
You think Instagram is going to get FBI to bust doors in Mogadishu or wherever the operators are?
Might be an issue if you are in the US or the West since it's behind a walled garden (you need to authenticate to access), but you do not need to pay for it, nor are new registrants restricted (they have access to everything), so it's a public website that forces user accounts. The best Instagram can do is throttle or ban the accounts doing the scraping.
It’s a bit difficult to monetize all the data you get from Instagram if you don’t have an Instagram account. And Instagram will happily mess you up by requiring phone number confirmation, and by banning IP addresses or phone numbers.
The business model here is that they've streamlined the process to get phone numbers & IPs - Facebook can't do shit without impacting other, legitimate users on the same IP & number ranges.
What kind of things are people scraping Insta for? I have a hard time with scraping apps anyway, but at least some of it makes sense when making comparisons on prices or whatnot. But I'm just not imaginative enough to come up with why you'd want to scrape obviously copyrighted images.
*Also, I'm not an Insta user, so in my mind it is just a thread of images and comments. Maybe my understanding of Insta is off?
I scrape the data that public officials post (though not on Instagram yet) -- that has lots of utility in determining their positions on issues, where in their jurisdiction they visit most, who they're meeting with, etc.
Real estate agencies in my town post newly available properties on Insta. I'm looking for a place to rent so I'd like to scrape it so that I don't have to be checking my phone constantly.
Have we really gotten to the point that this is the only place they post the data? You have to be "cool" to know the listings are available rather than checking "lame" websites? If true, I weep for society.
If scraping Instagram was allowed or easy, there are a tonne of use cases. One example: detection of products and sentiment for marketing (e.g. a post: I love my new Apple Watch!)
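To make the marketing example concrete, here is a minimal sketch of product and sentiment detection on a single post's text. The product list, the word lists, and the `analyze_post` function are all made up for illustration; a real system would presumably use a trained model rather than keyword matching.

```python
import re

# Hypothetical product names and sentiment words, chosen only for illustration.
PRODUCTS = ["Apple Watch", "AirPods"]
POSITIVE = {"love", "great", "amazing"}
NEGATIVE = {"hate", "broken", "terrible"}


def analyze_post(text):
    """Return the products mentioned in a post and a crude sentiment label."""
    lowered = text.lower()

    # Case-insensitive substring match against the known product names.
    products = [p for p in PRODUCTS if p.lower() in lowered]

    # Split the post into lowercase words and count sentiment hits.
    words = set(re.findall(r"[a-z']+", lowered))
    score = len(words & POSITIVE) - len(words & NEGATIVE)

    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"

    return {"products": products, "sentiment": sentiment}


print(analyze_post("I love my new Apple Watch!"))
```

Keyword matching like this misses sarcasm, negation ("I don't love it") and misspellings, so treat it as a toy, not a pipeline.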
One is a bee while the other is a fish, so there are many differences. One lives in water while the other lives on land. One is tasty when filleted, the other vomits tasty goodness. I'm sure you can think of other differences. </kidding>
Yes, some new accounts are abusive, but we have ways of dealing with that (you can always email hn@ycombinator.com to alert us). But it's very important to err on the side of welcoming people. There's a limit to how much damage new accounts can do anyhow.
How is this possible?