
Scraping Tinder selfies to make a dataset for AI experiments - denzil_correa
https://techcrunch.com/2017/04/28/someone-scraped-40000-tinder-selfies-to-make-a-facial-dataset-for-ai-experiments/
======
danso
> _“I have often been disappointed,” he writes of other facial data sets. “The
> datasets tend to be extremely strict in their structure, and are usually too
> small. Tinder gives you access to thousands of people within miles of you.
> Why not leverage Tinder to build a better, larger facial dataset?”_

This guy sounds like a caricature of the exact kind of socially unaware techie
who blithely implements dystopian-powering software, because having data and
writing code is all that matters apparently. Privacy law admittedly isn't the
simplest concept but you have to wonder how blissfully solipsistic someone has
to be to write a simple script that scrapes proprietary data without asking
himself, "this is so easy, I wonder why no one else has had this idea?"

And then he released the data under CC0, just to add insult to injury by
making CC look unethical.

~~~
SkyMarshal
CC aside (I'm not sure he has the right to release Tinder's data under any
arbitrary license of his choosing, CC or otherwise), I'm actually ok with
security and privacy vulnerabilities being openly exposed in this way. Some
random hacker in his garage figures out how to do it and just does, and makes
it completely public. The general public is so complacent about this stuff, it
feels like events like this, or the fappening, motivate them far more than
most advocacy does, while doing far less harm than a govt agency using it in
secret could.

~~~
joshumax
Now I feel bad for doing this exact thing with Tinder to create a dataset for
determining age based on facial features...

At least I didn't publicize it afterward, and I removed it when I was done.

------
propman
This is a serious ethical violation in my opinion. Like the SJSU student in
that article, I am extremely uncomfortable with people of suspect moral
compass analyzing my face without my consent. As a frequent Tinder user in the
Bay Area, I do actually feel violated. Only one of my photos is public on my
profile, but with that you can find my name, my college, and then using basic
data engineering you can find my phone number and address. Using face
recognition you could theoretically scour the web for nudes or suspect
information and stalk, blackmail or worse. I hope someone drops the hammer on
this person to send a message and prevent future occurrences of this.

~~~
devrandomguy
I do not believe that it would be possible to prevent this sort of behavior by
force. Any script kiddie could pull this off, and there are more born every
day. Some of them barely even understand Western values, let alone have an
incentive to value them.

Rather than (edit: in addition to) relying on protection, we should rely on
personal safety practices. Using our own portraits as avatars may have been a
mistake, in hindsight. I will recommend to my friends that they teach their
children to use a photo of their favorite toy or something, instead of their
own face.

~~~
nojvek
Tinder is very cutthroat. The whole social network values people based on
their appearances. When I was single I wrote a bot against their v1 API, back
when Tinder first appeared on the App Store. I was curious about their
security and social dynamics. No intent to release anything in public.

I then created 5 different profiles with the same picture, with different
shades of skin color. Interesting results. I came to the conclusion that
Tinder wasn't for me. I had to try good ol' face to face interactions.

I'm sure Tinder and Snapchat are a trove of data to help study the science of
human mating. Releasing a dataset online is crazy, although I wonder what
would happen if Tinder got hacked and their data was published on BitTorrent,
like the fapgate with celebrities that happened a while ago.

A security disaster like that is just waiting to happen.

------
yoodenvranx
I am still waiting for someone to scrape all the images of naked people on
reddit/imgur and do some classification based on the subreddit from which the
image comes. (For those who don't know: a lot of porn subreddits are
_highly_ specific to a certain topic which might help for classification)

~~~
minimaxir
That is not the same case morally as Tinder because photos posted on Reddit
are _public_ and have no expectation of privacy. (Also, it's not technically
complex to get the image URLs from the Reddit data dumps, although getting the
images themselves _will_ likely hit a ToS violation.)

~~~
pjc50
Really? Does this apply even if the photo is posted without the consent of
the photographer or subject?

(because we already had that, and it was a fiasco)

~~~
minimaxir
I said not-the-same-case; the legality is still incredibly complicated.

------
dvt
Data set no longer exists, Tinder probably sent a C&D. I did something similar
with FB a while ago -- scraped all photos of people that "Liked" a certain
topic. It's no longer possible.

Definitely feels wrong having such easy access to people's pictures; it leads
to interesting questions: is your face PII? Do you have ownership rights to
your face? What about facial recognition algorithms that use pictures you
upload to FB, Instagram, Tinder, etc. to generate, e.g., Haar cascades?

~~~
TillE
Basic copyright protections mean you have zero rights to random photos you
grab from some internet service. It's probably ok to just download them, but
doing _anything_ with them creates a derivative work which you cannot
distribute without permission.

My point is, this kind of thing is already illegal.

~~~
dvt
What about services that use my face for things like facial recognition
(without explicit consent)?

------
kolbe
Question unrelated to the article, but related to that link: is anyone else
developing a knee-jerk reaction to just leaving a web site the moment they ask
you to turn ad-block off?

~~~
ben_w
I used to, but now I disable JavaScript by default and only enable it on
websites that demonstrably benefit from it. The result is, the ad-block
blocking scripts never run.

------
ben_jones
Let's not forget that pictures like this are used for social engineering
attacks in domains ranging from finance to prostitution to blackmail. I
honestly don't care about the researcher (assuming he doesn't release the data
set publicly). It's the responsibility of Facebook, Tinder, and similar sites
which profit from such images, to go to extreme measures to protect them even
if it incurs significant costs such as counter-scraping.

------
zitterbewegung
Nowadays scraping is the Uber of data collection. You will run into legal
problems with the dataset, and people will block you if you become a big
enough problem.

~~~
hkon
"you wouldn't download an image"

------
exabrial
Am I the only one who is greatly disturbed by the ethics of this? I don't
think he had those users' consent, nor permission from Tinder.

------
alexc05
Experiments without informed consent. While Tinder _might_ have something in
their T&C which requires that they're allowed to look at the data, it isn't
ethical for third parties to scrape and use that same data without informed
consent.

Interestingly though, I wouldn't think the same of an Instagram scrape.
Probably because insta is public by design, while Tinder is expected to be
local, requires a login and ostensibly is for the purpose of looking for
companionship.

~~~
denzil_correa
> Probably because insta is public by design

There's a difference between public and public domain. Just because something
is public, doesn't mean you are free to use it as you wish. You _may_ still
require permission to use it for tasks beyond the originally intended purpose.

~~~
alexc05
See, I agree and disagree with that. People on Tinder are probably reasonably
expecting that their pictures are used for meeting up with locals for the
purposes of dating. The debate about whether this is a reasonable expectation
isn't really what I'm on about. (Obviously, as techie types, we have a
different understanding of how private this stuff really is)

On instagram, it is a broadcasting and publishing platform and it is very
clear that by posting a photo, on a non-private-account the photo can be seen
and generally browsed by others.

A researcher who conducts a study of "looking at" 400k public Instagram photos
is doing a different thing than spoofing their location and auto-swiping
through 400k lonely hearts.

Obviously people disagree with this since I'm getting downvotes for my
position. This is certainly my opinion, but I do feel like running a study
against one isn't unethical while the other is.

Of course, that comes with the caveat that "republishing" the scraped data is
unreasonable in both cases (IMO). I also think things like this artist
[http://fortune.com/2015/05/26/instagram-copyright-art/](http://fortune.com/2015/05/26/instagram-copyright-art/)
are reprehensible... but that's art, not science, and doesn't actually have
the same expectation to abide by ethical principles.

~~~
denzil_correa
> On instagram, it is a broadcasting and publishing platform and it is very
> clear that by posting a photo, on a non-private-account the photo can be
> seen and generally browsed by others.

The keyword is published for "browsing" and other things which are codified.
You can't use it for mining data without consent.

------
unityByFreedom
This is an ugly thread. Just a heads up =)

------
ourcat
Pretty tasteless array name in there: `for hoe in hoes:` which exposes the
base mentality of the coder.

~~~
helthanatos
Well, it's probably more a sign of a sense of humor.

~~~
dvt
I'm pretty liberal about stuff like this and I honestly think it's pretty
tasteless also.

I've often said that the sexism issue in tech is overblown (due to various
agendas at work) but when you see shit like this (especially in the context of
Tinder, a dating app), it's just hard to deny :/

