
GitHub sued for aiding hacking in Capital One breach - Ice_cream_suit
https://www.zdnet.com/article/github-sued-for-aiding-hacking-in-capital-one-breach/
======
noego
> _The lawsuit said GitHub had an obligation under California law and industry
> standards to keep off or remove the Social Security numbers and personal
> information from its site. The plaintiffs believe that because Social
> Security numbers had a fixed format, GitHub should have been able to
> identify and remove this data_

> _The lawsuit alleges that by allowing the hacker to store information on its
> servers, GitHub violated the federal Wiretap Act._

> _The lawsuit also makes a bold claim that "GitHub actively encourages (at
> least) friendly hacking." It then links to a GitHub repository named
> "Awesome Hacking." ... not associated with GitHub staff or management, but
> owned by a user who registered on the platform_

This lawsuit is a natural extension of the calls for internet platforms to
better police and accept liability for the content they are hosting. These
complaints are usually directed towards the big corporations like Google,
Facebook and Amazon. But if the rationale is accepted, it will need to be
applied universally to startups and SMBs as well. As someone who thinks the
world will be a far better place if we had decentralized dumb platforms, as
opposed to very centralized platforms with heavy-handed top-down censorship
and moderation, I sure hope this movement is turned back.

~~~
ForHackernews
> As someone who thinks the world will be a far better place if we had
> decentralized dumb platforms

Then you should be very much in favour of assigning expensive liability to
companies running these centralized platforms. If it becomes extremely
expensive or legally risky to maintain a big centralized database, that opens
a window for free, open source federated protocols to fill that gap.

Consider: You can sue Megaupload Ltd, but you can't sue BitTorrent-the-
protocol.

~~~
twunde
Ah, but you can sue anyone using BitTorrent the protocol and sue the creators
of the protocol

~~~
ForHackernews
Yeah, good luck with that. If you're lucky, you'll win a judgement for both
the defendant's Playstation _and_ his Xbox. I'm sure white shoe law firms will
be beating down your door to represent you in that suit.

~~~
dahfizz
So breaking the law is only bad if you have lots of money?

People don't care about principles anymore. Everything is a special case and
you have no concern for precedent.

------
nickjj
_> The plaintiffs believe that because Social Security numbers had a fixed
format, GitHub should have been able to identify and remove this data_

I don't see how they can expect to enforce this with 100% accuracy.

SSNs do have a fixed format but other things could potentially follow the same
format.

For example what if you had a library that lets you configure randomly
generated codes in a XXX-XX-XXXX format and it just so happens one of the
random codes matches a valid SSN pattern?

Is GitHub going to mess with your code? What if you have tests that match on
a hardcoded random number in an SSN-like format? If GitHub scrubs that then
suddenly your CI tests might not pass. Also how would it deal with modifying
your git history so others couldn't clone it with the potentially sensitive
data?

~~~
antpls
Detecting plain SSN numbers wouldn't be difficult with a combination of regex,
machine learning and human verification.

Even if hackers could just encode the SSN numbers, it would at least mitigate
the spreading of PII.

Edit : I don't care about the downvotes, I care about privacy. _Enough_ of the
argument "but wait, can't you imagine the cost?", well if you can't afford to
protect people's privacy, don't do business at all.

Edit 2 : people are totally missing my points. The goal is to not display any
plaintext SSN that would be scraped by bots. As I said, the hackers could just
encode the SSNs, but then the numbers won't be readable by scrapers

Edit 3 : once a project is reviewed and verified, it would stop triggering
alerts. This is trivial, but the HN mentality is just disrespectful regarding
people's data, until it's their own personal data that leaked

~~~
cgb223
Wait...

Given enough SSNs as training data, could someone make an ML model that churns
out mostly valid SSNs?

Because that could be really really bad

~~~
JumpCrisscross
> _Given enough SSNs as training data, could someone make an ML model that
> churns out mostly valid SSNs?_

Most SSNs are deterministically derived. Given a single valid SSN, you can
often generate millions of valid preceding values.

~~~
ageitgey
That changed in 2011. Now they are random. But it is still true for older
numbers.

[https://www.ssa.gov/employer/randomization.html](https://www.ssa.gov/employer/randomization.html)

------
brianzelip
‘The lawsuit also makes a bold claim that "GitHub actively encourages (at
least) friendly hacking." It then links to a GitHub repository named "Awesome
Hacking."[0]’

Oh brother.

[0] [https://github.com/Hack-with-Github/Awesome-Hacking](https://github.com/Hack-with-Github/Awesome-Hacking)

~~~
dfcmt
"A collection of awesome lists for hackers, pentesters & security
researchers."

They are not using "hacker" in the "Hacker News" sense of the word, they are
using it to mean breaking into some system. So no "oh brother" moment here.

~~~
stordoff
So? Trying to break into a system can be the only way to know it's reasonably
secure. This is like saying locksmiths are bad. Preventing this makes systems
_less_ secure, defeating the point of trying to ensure privacy.

It doesn't even appear to be an official GitHub page (Hack with GitHub -
location: Bangalore, India, email: hackwithgithub@gmail.com). Just because
someone creates an "X-with-Github" repository, it doesn't mean GitHub is
actively encouraging X.

------
beecat
> The lawsuit said GitHub had an obligation under California law and industry
> standards to keep off or remove the Social Security numbers and personal
> information from its site. The plaintiffs believe that because Social
> Security numbers had a fixed format, GitHub should have been able to
> identify and remove this data

I can't wait until I get to debug our first build that won't run because we
uploaded some data, however broadly that ends up being defined, with 9 digits
in a row...

------
vharuck
Barriers to posting SSN-like data would make it difficult for a lot of people
to do their job. Software that handles SSN info should have fake data for
tests.

~~~
RijilV
Also SSN isn’t that distinctive of a format. nnn nn nnnn. Check bits and
reserved prefixes were all removed decades ago when it became clear we’d run
out unless we use the whole name space (and even then that buys us to 2100).
\d{3}\s?\d{2}\s?\d{4} will match a surprising amount.

Detecting SSNs is hard without accepting a high false positive rate. Much
harder than phone numbers, credit card numbers, or cloud credentials.
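
A quick sketch of the false-positive problem with the loose pattern quoted above. The sample strings are made up for illustration and none of them is an SSN; the helper name `loose_matches` is hypothetical too.

```python
import re

# The loose pattern from the comment above: 3 digits, 2 digits, 4 digits,
# with optional whitespace between the groups.
LOOSE_SSN = re.compile(r"\d{3}\s?\d{2}\s?\d{4}")

def loose_matches(text):
    """Return every substring the loose pattern flags as SSN-like."""
    return LOOSE_SSN.findall(text)

# Invented examples of ordinary data that is not an SSN.
samples = [
    "order id 2019080512",   # 10-digit ID containing a 9-digit run
    "zip 941031234",         # ZIP+4 stored without the hyphen
    "timestamp 1564963200",  # Unix epoch seconds
    "version 1.2.3",         # nothing SSN-like here
]

for s in samples:
    print(f"{s!r:28} -> {loose_matches(s)}")
```

Three of the four samples get flagged, because any run of nine or more digits contains something the pattern accepts.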

~~~
dfcmt
\w\d{3}[\s\\-]?\d{2}[\s\\-]?\d{4}\w should not have many wrong results.

You can also try to guess if something is a list of SSNs from the context.
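
A rough sketch of both ideas, with two assumptions of mine: that the leading and trailing `\w` was meant as a boundary check (written as `\b` here), and that "context" can be approximated by counting lines that consist of nothing but an SSN-shaped token. The names `bounded_matches` and `looks_like_ssn_list` and the threshold of 3 are hypothetical.

```python
import re

# Assumption: word boundaries on both ends, optional space or hyphen
# between the three groups.
BOUNDED_SSN = re.compile(r"\b\d{3}[\s-]?\d{2}[\s-]?\d{4}\b")

def bounded_matches(text):
    """Return SSN-shaped tokens that are not part of a longer digit run."""
    return BOUNDED_SSN.findall(text)

def looks_like_ssn_list(text, min_hits=3):
    """Crude context check: are there several lines that are only an
    SSN-shaped token? The threshold is arbitrary."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    hits = sum(1 for ln in lines if BOUNDED_SSN.fullmatch(ln))
    return hits >= min_hits

print(bounded_matches("ssn 123-45-6789"))       # formatted value is found
print(bounded_matches("timestamp 1564963200"))  # 10-digit run is rejected
print(bounded_matches("zip 941031234"))         # bare 9-digit run still matches
```

The boundaries remove matches inside longer numbers, but a standalone nine-digit value is still indistinguishable from an unformatted SSN.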

~~~
marcinzm
I'd assume many systems would store SS numbers without spaces or dashes in the
backend so that rendering is up to the client. Which means you're looking for
9 digit strings. For example, full zip codes (xxxxx-xxxx) are also 9 digit
strings.

~~~
greggyb
I've posted elsewhere in this thread about this. There's really no reason to
expect SSNs as strings for internal use. 32-bit integers readily represent the
same, as the max SSN is just a 9-digit number. I've seen at least one client
store SSNs as INTs in a database and handle left-padding to 9 characters and
interposing hyphens in display code.

Any 9-digit integers are immediately suspect under this reasonable storage
choice.
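
For anyone who hasn't seen this pattern, here is a minimal sketch of the display side, assuming the value is stored as a plain integer. `format_ssn` is a hypothetical helper, not code from any real system.

```python
MAX_SSN = 999_999_999  # largest 9-digit value, fits in a signed 32-bit int

def format_ssn(value):
    """Left-pad an integer to 9 digits and interpose hyphens as 3-2-4."""
    if not 0 <= value <= MAX_SSN:
        raise ValueError("value does not fit in nine digits")
    digits = f"{value:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

print(format_ssn(1234567))    # stored without its leading zeros
print(format_ssn(123456789))
```

Stored this way, nothing in a dump distinguishes the column from any other integer column, which is why a text scanner has so little to go on.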

------
johnnyAghands
Honestly, it scares me that this was even filed. Even though we know how
ridiculous it is to include Github in this suit, I'm afraid we're going to be
left with some weird middle ground that shouldn't even exist to begin with
made by people who have no idea how things work trying to fix something that
isn't broken.

~~~
Arubis
This might be the first good, non-entitled argument I’ve run across for having
some form of software engineering licensure: having qualified people whose
technical testimony in court would carry more weight than Larry McSues-a-lot.

~~~
mindslight
Except that professional licensure tends to attract the lower end of the
spectrum, because obtaining that credential represents a better path to
success. So it would be easy to get a certified technical person to say the
exact opposite for their paycheck, despite the technical consensus being
"wtf".

The general problem you're referencing is one of stature, specifically that
someone whose core activity is forwarding emails without trimming the replies
is viewed as more-equal by the court because they've obtained a thick piece of
paper. Alternatively we could just remove professional licensure from the
field of Larry McSues-a-lot and diminish his stature.

------
fencepost
This is easy. You sue the companies with money.

[https://www.gocomics.com/bloomcounty/1986/06/22/](https://www.gocomics.com/bloomcounty/1986/06/22/)

~~~
DangitBobby
I found this Harvard study a while back about the volume of litigation in the
US versus other countries. Their conclusions are interesting.

PDF link:
[http://www.law.harvard.edu/programs/olin_center/papers/pdf/R...](http://www.law.harvard.edu/programs/olin_center/papers/pdf/Ramseyer_681.pdf&sa=U&ved=2ahUKEwjyuNyS2OnjAhXBSt8KHU-
WAEAQFjAEegQICRAB&usg=AOvVaw1zkpFlLgafNUlTIS0bg6Hl)

"Coffee spills, Pokemon class actions, tobacco settlements. American courts
have made a name for themselves as a wild lottery and a money machine for a
lucky few lawyers. At least in part, however, the reputation is unfounded.
American courts seem to handle routine contract and tort disputes as well as
their peers in other wealthy democracies.

"More generally, Americans do not file an unusually high number of law suits.
They do not employ large numbers of judges or lawyers. They do not pay more
than people in comparable countries to enforce contracts. And they do not pay
unusually high prices for insurance against routine torts.

"Instead, American courts have made the bad name for themselves by mishandling
a few peculiar categories of law suits. In this article, we use securities
class actions and mass torts to illustrate the phenomenon, but anyone who
reads a newspaper could suggest alternatives.

"The implications for reform are straightforward: focus not on the litigation
as a whole; focus on the specifically mishandled types of suits."

I don't know where I first heard this, but I have in my head the impression
that America has the reputation of being overly litigious because mis-behaving
companies think they benefit from creating that misconception.

~~~
hysan
Getting a _file not found_ for that link. Is it available anywhere else?

~~~
DangitBobby
My bad.

[https://www.google.com/url?q=http://www.law.harvard.edu/prog...](https://www.google.com/url?q=http://www.law.harvard.edu/programs/olin_center/papers/pdf/Ramseyer_681.pdf&sa=U&ved=2ahUKEwjfr8nuourjAhVBMt8KHfQpC4kQFjADegQIChAB&usg=AOvVaw32fv5Kmt48xM_zgBWo1rTh)

------
marcinzm
> The plaintiffs believe that because Social Security numbers had a fixed
> format, GitHub should have been able to identify and remove this data

No it's not, especially once you add binary files to the mix.

I once worked at a company that required everyone to run some sort of local
scanner to see if there's sensitive data on their laptops. My laptop with no
sensitive data had something like 10k+ matching files. I promptly ignored the
thing.

------
theshadowknows
I swear lawyers are a bunch of geniuses. How long until your personal computer
becomes a liability because some cached content in the browser becomes
knowingly hosting content? Had it on your phone and took a trip? Now you’re
transporting across state lines. Absolutely ridiculous. And as per my last
comment, this is 100% about increasing settlement size because the lawyers get
a percent. Never been through a class action before? Let me tell you how it
works: the opposing lawyer has a set amount of payout they want before they
even begin the process. They work to achieve that amount. Once it’s agreed on
then they are happy. It has absolutely nothing at all to do with enforcing
laws or protecting rights. It is all about buying a lawyer a new house.

------
ForHackernews
Microsoft's lawyers are going to have fun with this one.

~~~
Avamander
This might be the first time I'm happy about the acquisition if they actually do
win this and don't have to create a ridiculous censorship engine.

------
epylar
It would be trivial to post every SSN online. There are at most one billion
numbers fitting that format. Store each number in 30 bits, and that's about
3600 MiB of data.
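
The arithmetic behind that estimate, as a quick check. It assumes every nine-digit value from 000000000 to 999999999 is packed at a fixed width with no compression.

```python
COUNT = 10**9  # every nine-digit value, 000000000 through 999999999

# Bits needed to represent the largest value (999,999,999).
bits_each = (COUNT - 1).bit_length()

total_bytes = COUNT * bits_each // 8
total_mib = total_bytes / 2**20

print(bits_each)         # 30
print(total_bytes)       # 3750000000
print(round(total_mib))  # 3576
```

So the packed list comes to roughly 3576 MiB, in line with the ballpark figure above.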

~~~
colejohnson66
It would also be highly compressible. So maybe 1-2 GB after compression?

------
srmatto
We need to fix Social Security Numbers and the entire SSN system of
identification and stop with this insanity around protecting them.

------
usaphp
Is google liable for piracy if they show you a link to a torrent website?

------
nesadi
"GitHub actively encourages (at least) friendly hacking." Failing to see how
this is a bad thing.

------
dang
[https://news.ycombinator.com/item?id=20598138](https://news.ycombinator.com/item?id=20598138)

------
mebazaa
from the thank-god-for-section-230 dept...

~~~
OldHand2018
Obviously they are going to argue that section 230 doesn't apply.

Section 230 subsection d:

> (4) No effect on communications privacy law Nothing in this section shall be
> construed to limit the application of the Electronic Communications Privacy
> Act of 1986 or any of the amendments made by such Act, or any similar State
> law.

