
GitHub and Medium take down database of ICE employee LinkedIn accounts - nathanielks
https://www.theverge.com/2018/6/19/17480912/github-ice-linkedin-scraping-employees
======
donbright
Q "But don't we sell the same kind of data to our paying customers?"

X "Yes, but when we do it, it's not doxxing"

Q "Why not?"

X "Because we would never do it for any nefarious reason"

Q "What about that time we scraped the data of those groups that were linked
to that reporter, so that one company could do oppo research and prepare a PR
counterattack, targeting specific journalists?"

X "That wasn't nefarious. We made a lot of money doing that"

~~~
tinus_hn
‘Doxxing’ is such a useless term anyway. In this case all the database is is
the search results for ‘ICE’ on LinkedIn. That’s data these people filled in
themselves. But now that attention has been called to it it’s ‘doxxing’.

The phonebook is full of ‘doxxing’. Who cares.

~~~
erric
Not to mention that most govt. employees' records of employment are public info
(in the US, anyway). Being law enforcement, this may be a little different;
however, this info may be FOIA-able or even just sitting on the ICE web site.

------
allenz
Sam Lavigne is a performance artist. The context for this work is that ICE
plans to increase its surveillance of immigrants, including surveillance of
social media accounts. Sam argues that if this level of transparency is too
invasive even for our public figures, then why do we demand this for
prospective citizens?

After a public backlash, ICE recently suspended its Extreme Vetting
Initiative, which would scan social media history and automatically flag
people for deportation based on the exact criteria from the original Muslim
ban. The Brennan Center discusses why this is bad:
[https://www.brennancenter.org/analysis/ice-extreme-vetting-i...](https://www.brennancenter.org/analysis/ice-extreme-vetting-initiative-resource-page)

ICE will still require five years of social media history:
[https://www.cnn.com/2018/03/29/politics/immigrants-social-me...](https://www.cnn.com/2018/03/29/politics/immigrants-social-media-information/index.html)

------
simula67
> “I think that’s a totally valid question to bring up,” Lavigne said earlier
> today about whether the database could be used for targeted harassment, “but
> I think that the information is already out there, and if people want to
> embark on individual campaigns of harassment, then they’re going to be doing
> that no matter what.”

That does not mean you have to make it easier.

~~~
qop
There can sometimes be a fine line between "enabling" and "meddling"

I mean, surely you can't expect a site like GitHub with thousands of projects
happening every day to be able to keep track of each one and make sure that
something bad isn't being done with the project.

~~~
txcwpalpha
What? Yes you surely can expect that. That's what moderation is. That's what
every site on the internet that allows user-submitted content does. That's what
Twitter does, YouTube does, Facebook does, Reddit does, and HN does. That's
what GitHub does. GitHub is ultimately responsible for everything that is
hosted on their site, and so _of course_ they track each one and monitor them
for violations of their policies. If they couldn't do that, they wouldn't be
in business.

~~~
MichaelMoser123
Then how come GitLab is still hosting this content? Do they have a
different policy? Or did they just not notice?

~~~
txcwpalpha
GitLab isn't still hosting the content. They too have taken it down.

------
zaarn
> The database included information like job title, profile picture, and
> general location of work.

Ah, yes. The White Knight of Severely Violating People's Privacy because
You're Right and They Aren't. Truly the most moral person in the world.

In the current political situation in the US, as far as I can judge it, this will enable
harassers to seriously harm or even endanger the lives of these people for the
crime of having the wrong employer (they might not even be involved in any of
the bad crap you see on TV but who cares, wrong employer!)

In other countries or, for example, the EU, such behavior would be a crime,
end of story. And you'd be responsible for the damage that comes from doxing
people.

Saying "the information is already out there" is no excuse. It's like a
swatter saying "it was just a prank".

~~~
mad_tortoise
If you sign up to be in the police, or ICE, or any such governmental arm used
to jail the poor and protect the rich, and are passively accepting these
policies you are supporting them. As such by supporting these despicable
policies those enforcing them are scum in my books, and if they are doing
these things I see nothing wrong with publishing their information. Maybe now
they will be as afraid as the people they persecute.

~~~
exegete
What if you're a law enforcement agent and do your job ethically (refuse
unethical orders, etc.)? Or what if you're pressured by your supervisor to do
things that are unethical, and you know that the consequence of refusing is
having your and your family's lives ruined? Should your private info be
exposed so you can be harassed?

~~~
eucitizen
i’ll leave this here

[https://www.washingtonpost.com/archive/opinions/1979/10/21/t...](https://www.washingtonpost.com/archive/opinions/1979/10/21/they-were-just-following-orders/34d2eb42-daf6-49b9-af28-37a9582d0688/?noredirect=on&utm_term=.7bed9bdaada2)

------
staticelf
So if I get this correct; this person wrote a program to gather information
about individuals that may or may not be responsible for things related to the
government agency and published that list with the intent that people who are
angry about that should do what exactly? Contact them? Threaten them? Stalk
them? I see no other possible outcome of this action.

Either you:

1. Don't care

2. Use this information in a bad way, like harassing or stalking the
individuals that are doing their job (I'm assuming).

I don't understand people who do this kind of thing and also expect support
and sympathy. You have none from me, at least. It is great that Medium and
GitHub remove such databases to protect individuals who are probably
innocent.

~~~
tomaha
Sadly I think this is the new way of doing things. It gets more and more
acceptable to crucify people using social media. But at the same time I don't
agree with your conclusion that this makes it right for Medium and Github to
remove this. They should keep out of the judging/censoring business. Steam is
sadly the only good example of how it should be done at the moment.

Edit: Because otherwise they should really also remove the information from
LinkedIn (or make it non-searchable), since LinkedIn also makes it very easy to
get, and they don't do that.

~~~
staticelf
Yes I agree with you that sadly this is the way people do stuff today. An
allegation is more important than truth it seems sometimes.

That said, we must protect people whose information is being leaked with
nefarious intent. In Sweden, for example, you can take this information and
find out their addresses and social security numbers, since all of this is
basically public information.

Github and Medium simply don't want their platforms to be used to harass and
stalk people, which is a real problem that anyone with experience can attest
to.

------
azertyxxx
For the sake of completeness and putting moral questions aside, what is
currently the best way to publish similarly questionable information in a
hard-to-censor way?

~~~
TheDong
BitTorrent is fairly hard to censor.

Mega.co has an okay track record at this point, though it's centralized
obviously.

A Tor hidden service offering a link to the files, plus a BitTorrent magnet
link, may be the best option.

------
TravelTechGuy
So the author copied data from one Microsoft site (LinkedIn) to another
Microsoft site (GitHub).

Both sites use PII to target people and companies: how many annoying emails
have _you_ received from LinkedIn this week? And I mean the creepy ones,
suggesting contacts based on minute details from your profile, or encouraging
you to import all your contacts so they can be spammed?

But I guess if a user does it, it’s doxxing. I wonder if this is political, or
maybe Medium and GitHub are just trying to avoid a potential fight with a
federal agency.

~~~
Ntrails
> I mean the creepy ones, suggesting contacts based on minute details from
> your profile, or encouraging you to import all your contacts so they can be
> spammed?

None at all. The only emails I get from linkedin are friend requests and
message notifications. That's how I set the thing up.

I'm not sure that's super creepy?

------
Animats
And now the Verge article has been scrubbed of the link to the Gitlab copy of
the "ice-linkedin" repository.

The real casualty here is going to be LinkedIn. They don't publicize how
easily their data can be acquired in bulk.

~~~
luckydata
You mean legally?

~~~
Animats
No, I mean that this will probably cost LinkedIn its users who work for the
government or for government contractors.

~~~
tomnipotent
A user violated the TOS and wrote a screen scraper that pulled down public
profile data. Yeah, real "easy". You cannot protect from this sort of behavior
outside of completely disabling this functionality altogether. Anything a
human has access to, so does software.

~~~
johnnyfaehell
> A user violated the TOS and wrote a screen scraper that pulled down public
> profile data. Yeah, real "easy".

Yep, that's pretty easy to do. But I'm pretty sure this was even easier since
I don't think they wrote a screen-scraper. They just accessed a JSON endpoint.

> You cannot protect from this sort of behavior outside of completely
> disabling this functionality altogether. Anything a human has access to, so
> does software.

You say that like it changes anything. People don't care if it can be
protected against easily, people just care if it can happen at all.

------
King-Aaron
So, I have to ask as it's not noted anywhere in the article... What does the
acronym ICE stand for in this context?

~~~
seanhunter
Immigration and Customs Enforcement

------
cozzyd
I don't get it... the author has a web site of his own. If he wants to
distribute information, why not put it there instead of relying on third
parties? (not offering any opinion on whether or not the data should be
distributed, just questioning the means).

~~~
zdragnar
Throwing it in a git repository ensures that every person who clones it can
readily republish it. Hosting said repository on Github, when anticipating an
utterly massive spike in traffic, is an easy way to not have to pay for said
spike in traffic (either from provisioning or data transfer).

I didn't look at what format the "database" is in, or if the size would make
it (im)practical to simply zip it up and email it around, but if the format
isn't readily consumable by non-technical people, there wouldn't be any reason
to not utilize a tool like git anyway.

~~~
JetSpiegel
Why not host the repo on his site? At least a read-only copy.

`git clone --bare` is enough.

~~~
zdragnar
Git itself was a bit of a red herring, even though it was somewhat relevant to
that specific point. I know nothing about what hosting platform the author is
currently using, so to make a quick assumption:

- hosting on AWS is not free for super high traffic (assuming the free tier
  can't keep up)
- serving files from S3 is not free (though it's cheap enough at low read
  levels, it adds up)

At a typical level of traffic, the author's current host may be sufficiently
inexpensive. If the author was expecting many, many times the usual traffic
(even if everyone is kind enough to bare clone), self-hosting would be a
pointless expense.

Of course, third party hosting can take the content down... and this is where
git became relevant. Assuming the author was more interested in distributing
the content than the prestige of being the distributor, even though Github
etc. took down the repo, every person who has since cloned is now capable of
re-publishing to any new upstream repository of their choosing, on any server.

Assuming, again, that all of this was the goal, it probably made sense to
utilize the free, fast, scalable third party hosting as long as possible
rather than risk self-hosting slowing down or collapsing under traffic, or
creating a massive spike in cost.

That's a whole boat load of assumptions, any of which could be wrong. In the
realm of possible motivations, though, I think it's a fairly logical
conclusion.

------
dnautics
Is there a broad right to privacy if you work a job that is funded by the
taxpayer? I know the states of Maryland and California disclose the salaries
of all professors, postdocs, and grad students, and, I believe, of public
servants generally as a class.

