GitHub censored my research data (gwillem.gitlab.io)
361 points by doctorshady on Oct 15, 2016 | 192 comments

GL sent me this statement. For the record, I didn't publish vulnerable systems, I published stores that have malware.



GitLab has opted to remove the list of servers that you posted in your snippet. GitLab views the exposure of the vulnerable systems as egregious and will not abide it. While GitLab reserves the right to take further action, up to and including termination (https://about.gitlab.com/terms/), we have chosen not to terminate or lock your account.

Please know this decision was not reached lightly and we appreciate your understanding on the matter.

Regards, GitLab

GitLab Support Team GitLab, Inc.

>For the record, I didn't publish vulnerable systems, I published stores that have malware.

This is a crucial point, because it shows GitLab is basically nonresponsive to the key issue; it's the difference between "Here's how to hack Giant Anchor Retailer" (unethical, possibly illegal) and "Giant Anchor Retailer has been hacked, estimated NNN cards may have been compromised" (of public interest, not illegal). In my case, I want to know if I used any of the retailers on the list!

For GitLab to call this "egregious" and that they "will not abide it" suggests that either GitLab is technically incompetent in security matters, or that they've received legal notices and decided that the shortest path to resolution is to throw their users under the nearest publicly-operated multiwheeled passenger conveyance. In either case, poor show, good reason to seriously consider moving off GH and GL.

And even if it were (a list of vulnerable systems, that is), why the fuck do they think that they should censor serious journalism? If you operate a public venue, then it is an important societal role of journalism to report on it if that public venue poses a risk to the public. Whether that might also have negative consequences for the people operating it is completely irrelevant.

You mistook "free and accessible" for "public".

You may exercise freedom of speech, but not on a server that belongs to a private company - it is their right to limit what kind of content they like.

But in essence you are right - companies should exist to benefit society, but that is not exactly how it works right now.

I really hate this trend of journalism leaking into services like Github. We have secure ways to share files with high redundancy, why put a service like Github/Gitlab in the line of fire when their primary goal is to enable open collaboration, vs open information.

What services are you referring to please?

Torrents are what I had in mind, really: low barrier of entry for viewing, and someone in an oppressed state who can get around a website block can probably get the torrent anonymously. To me, not being easy to edit is an upside; additional data should require the initial trusted party to share a new magnet.

Make a torrent?

Can't be easily updated

How about IPFS?

Is there support for managing multiple IPNS addresses with a single node yet?

Are they in the business of journalism?

Lots of people are saying "But the sites are already exploited" ... they are probably still exploitable further also, and GH/GL don't want to be at that party.

Would they be required to publish this story if they "were in the business of journalism"?

This is not about whether they are legally required to do anything, but whether what they are doing is responsible behavior.

No, they would not be required - they are a private business and set their own terms.

As for being responsible - that is their motivation.

Should I assume that now you have access to this list that you will be contacting the site owners to notify them their sites are infected & exploitable? Would that be responsible on your part?

There is a right of free speech in many countries (I assume you are in one of them), but that right does not force anyone else to distribute or publish your speech.

Their servers, and their decision on what data is on them.

Want to make it available for everyone to read? Run your own server and host it there.

tl;dr - you have the right to say what you want, but you can't force anyone to listen.

And that's why no one is talking about forcing GitHub and Gitlab to do anything. They're merely complaining. Just because someone complains about something doesn't mean they think it is illegal or ought to be illegal.

Yes, but one by one the word "censorship" loses its meaning. It used to mean preventing people from publishing their work. Now all it means is disagreeing about what should get shown prominently on social networks.

Anyone suppressing a work based on ethical judgments is practicing censorship. They could be acting on the authority of a state or religious institution, they could be removing immodest young adult fiction from the shelves at a children's library, or they could be moderating the content of a web site. Web sites are new, but the concept is the same as it's always been.

Sorry for our mistake Willem, we reinstated the snippet. Also see our blog post about this on https://about.gitlab.com/2016/10/15/gitlab-reinstates-list-o... TLDR; The owners of web stores have a responsibility to their users. And it is in the users' interest to have the list published so owners. We currently think that the interest of the user weighs heavier.

For reference the HN discussion of our blog post https://news.ycombinator.com/item?id=12715473

I am working on a partial solution to this kind of problem and plan to move from Alpha to Beta version later today.


This is not a VCS, so you would still need to run something like git locally on your own server, but the idea of self-hosted modules will solve for the censorship of central authorities.

So? Gitlab doesn't owe you anything.

Did you ask them for permission to publish a private communication? Probably not, bad of you!

Github/-lab is for projects imho and not a publishing platform. Why don't you publish it on your blog or something? All power to Github/-lab, kick out such stuff!

Given that they both offer to host web sites, then they are both (to some extent) publishing platforms.

https://pages.github.com/ https://pages.gitlab.io/

As you said: to some extent! Have a look at the "What is Github Pages" [1] and one clearly feels that Pages is meant for software, projects, manuals etc. And NOT to publish documents to shame 3rd party misbehaviour and hopefully attract publicity and quarrel. Such content should go to other places (imho).

[1] https://pages.github.com/

Why are you so angry? This is not about shaming, but protecting customers - first thing I did even before reading the full article was to click the link to see if I have given my CC to fraudsters, then I was gifted with a 404 from GitLab.

Yes you are right, I 'over-reacted'.

They are both offering hosting of websites. They are absolutely a publishing platform.

The linked article is his blog on gitlab.

Github/-lab is for files.

We at GitLab believe the author did not responsibly disclose this security information in a proper manner, and today we removed the list of hosts in accordance with our terms of service (https://about.gitlab.com/terms/).

The author says that he contacted "about 30 merchants directly", but the published list includes over 1000 merchants. Most merchants were neither informed nor given a chance to respond in a timely manner. We did not feel comfortable hosting information that could be construed as an open invitation for malicious users to exploit.

This is completely unacceptable. You're treating this as though the author was publishing a list of vulnerabilities about sites. That's not what the author did. The author published a list of sites that are already infected with malware and thus are dangerous for users to visit. This is a public service and there is zero expectation of "responsible disclosure" to the sites. The only thing that disclosing to the sites without telling the public does is protect the reputation of the site, but there's no expectation for anyone to try and protect the reputation of sites that are serving malware.

> We did not feel comfortable hosting information that could be construed as an open invitation for malicious users to exploit.

The sites were already exploited. That's the thing. At best I can see the argument you're making being "we don't want the sites to get exploited a second time", but that really shouldn't be a concern. The sites were already exploited, there's nothing left to protect. And publishing a list saying "this site has malware on it" doesn't actually tell anybody how the site was vulnerable anyway, unlike disclosing security vulnerabilities which by their very nature informs people how to take advantage of them.

I agree. GitLab's reputation has taken a pretty big hit in my opinion. I kind of expected GitHub's reaction, but had hoped for something better from GitLab (if nothing else, just to differentiate themselves from their opposition).

Definitely stuffed my opinion of them. I thought they were for the people and made good decisions.

Nope - corporate suits - and they give us a response which doesn't even make logical sense. Do they think we're idiots?

Couldn't agree more.

I'm not sure if GL is trying to protect themselves against something and is making up some excuse to justify it, but the reasons given for taking down the list don't hold water.

GL would be far better off stating the real reason for taking the info down (surely there must be one).

So far what they've done is seemingly to protect the merchants' reputation and perhaps protect GL from some imagined legal backlash? The backlash would have no basis in court, or other services like Google's own Safe Browsing would not be viable.

Are we to assume that GL cares more about that than users/visitors to these sites?

The statement from GP is frankly pathetic.

The real reason is almost certainly something along the lines of 'it's possible we could get sued for libel' or 'we have been threatened with a libel suit'. Not that such a case would ever go anywhere, but I imagine they figure that it's more trouble than it's worth.

They have safe harbor protection.

But maybe not any more - since they're now shown to be acting in an editorial capacity...

Is it possible that gitlab's position is that the presence of malware is proof that the site is vulnerable, not that the malware is the vulnerability?

I addressed that in the second half of my comment.

I'm not sure if responsible disclosure applies here. This isn't an unannounced 0-day that could in theory be in the hands of criminal elements. Responsible disclosure only works because there is a reasonable chance that an exploit is not already widespread.

> an open invitation for malicious users to exploit.

Malicious entities have already exploited these websites. This problem is widespread - over 1000 merchants. The author has not put the cart before the horse.

If I wanted to protect my family against this by installing uBlock Origin on their machines, could uBlock possibly be hosted somewhere where they wouldn't face this censorship? They have in the past temporarily blocked websites (e.g. Sourceforge adware) but have rapidly unblocked them when the issue is resolved; this has saved my bacon on numerous occasions.

I really appreciate the transparency here - kudos. You have your facts wrong.

Why are you putting the interests of merchants before the interest of users (of these merchants)?

Do you think Google should "responsibly" disclose and wait for the webmaster's response before putting websites onto the Safe Browsing list?

Can I host a project which includes a list similar to Google Safe Browsing, or an adware remover where I list software I consider to be adware?

What about the consumers who are being put at risk of being defrauded? Do they not have a right to protection? Malware infected ecommerce sites could be stealing credit card info and robbing consumers. Merchants who endanger consumers by failing to provide a secure platform for digital transactions do not have any right to be protected from having their negligence exposed.

A 'normal consumer' won't be helped by such a technical list on github/gitlab. Do you really believe they would look there? If they wanted protection they could have installed ad blockers etc. a long time ago already. (Or used more reputable shops.)

Lots of people google the name of a webshop to check if it's legit.

Not all, but some non-tech people do that. And lots of webshop owners google their own shop. Shaming sites that are hosting malware seems perfectly reasonable.

On topic: I assume github/gitlab both completely misunderstood what is going on, and thought this was a disclosure of security holes that could be exploited. I wouldn't be surprised if they do a lot of these.

Perhaps try to throw it up on a few different CDNs where you pay for the service and can contact support. Like S3 or dreamhost (they have decent support too). Arguably github/gitlab isn't the best hosting platform for misunderstood journalists.

How long is the author going to check and update that list of compromised websites? Right now they are broken but in 6 months when the site gets upgraded it will be a knock against them unless the author updates the list. This is the real problem.

Not many do and they're even less likely to be on the first page of google hits.

a lot of non-normal consumers can end up making a lot of noise, sometimes it's enough to cause change to happen

as was noted, 600 sites have already cleaned up their act

> Update Oct 14: 631 stores have been fixed, good work everybody!

So is it really as useless as you claim?

It's really maybe not so useless as I thought. I forgot that such a list may show up on the first page when doing a google search for a (new) shop.

The 631 stores have likely been fixed b/c of the publicity (thanks to kicking the list out;)

I think I just don't like when this shame & name business happens on github/gitlab servers. Somewhere else, it's fine.

You're saying that because normal consumers won't be helped by a "technical" list on github/gitlab, we shouldn't bother?

What about "technical" users?

So when GitLab finally bothers to respond they do so from a new account? Any proof this comes from GitLab?

Which terms specifically does it violate? The terms page you linked to is 10k words long.

As others have mentioned, this is not about responsible disclosure. If those merchants have the good of their customers at heart, they will act to clean up their sites and disclose the breach themselves. If not, they will move to censor this to avoid losing face, maybe not even bothering to remove the malware. And you're just helping them with this.

> So when GitLab finally bothers to respond they do so from a new account?

That does seem quite odd.

Don't you feel uncomfortable in making it harder for users to avoid websites with malicious software? It's definitely worth mentioning and explaining if you do.

Certainly yes, this is why he created a new account on HN.

According to the article, the stores were running malicious javascript which grabs people's credit card info. This obviously means they are vulnerable in some kind of way, but I fail to see how this is reasonably likely to be exploited. Even if it was, you also have to consider the benefit of warning the users.

I am not a security expert though, and I might be missing out on something.

The responsibility of GitLab and GitHub is not to investigate whether those 1000 sites are indeed running malware, how dangerous the malware on these sites is, and who could be harmed by it.

The responsibility of GitLab and GitHub is also not to judge if it's "more important" to protect the site owners' businesses or the people going to the sites.

If some sites are running malware, the site owners are responsible for fixing it and not harming the people using their sites, not GitLab or GitHub.

On the contrary, if site owners could be harmed by the name of their sites being on such a list on GitLab or GitHub, then GitLab or GitHub are responsible according to the DMCA.

So GitLab and GitHub are just acting on what they are held responsible for according to the law.

Disclaimer: I am working as a contractor for GitLab and I am not a lawyer. I took no part in GitLab's decision to censor the list and this is just my own opinion.

> On the contrary if site owners could be harmed by the name of their sites being on such list on GitLab or GitHub, then GitLab or GitHub are responsible according to the DMCA.

Nope. The DMCA is about copyright, and we have not gotten to the point where someone's URL is copyrighted.

According to https://en.wikipedia.org/wiki/Digital_Millennium_Copyright_A...:

> It criminalizes production and dissemination of technology, devices, or services intended to circumvent measures (commonly known as digital rights management or DRM) that control access to copyrighted works. It also criminalizes the act of circumventing an access control, whether or not there is actual infringement of copyright itself.

So no, the DMCA is not just about copyright.

These sites are actively participating in harming their users with their negligence. Users should have the ability to know which site is safe and which isn't.

I think responsible disclosure would be to the people who would most be affected

businesses who have insurance for things like this, and possibly have software to reduce any issues that they would be directly affected by

or users who could lose a significant (and damaging) amount of money if they aren't careful with checking their statements?

It seems silly to support the scammers in this case?

tptacek makes a good argument that responsible disclosure is not a requirement: https://news.ycombinator.com/item?id=12309035

I don't believe it is your place to do so. You have the right to, but you have no real reason to. You will suffer no consequences from removing that list.

Responsible disclosure works because companies are responsive to disclosure.

Full disclosure works because companies are negligent and they've been outed.

The author did contact said websites, it is their responsibility to fix the issue in a timely manner. Removing the content was akin to a newspaper removing stories relating to certain politicians to protect them. It is censorship.

This censorship shows that you don't even begin to understand the problem.

This list could prevent people from getting their card skimmed, and you take it down.

I'm moving away from gitlab.

To where?

I have to confess, when GH did something offensive I did cancel my paid account there but I still use it, so I guess I didn't care that strongly about it after all...

Someone who is able to exploit such vulnerabilities already has his very own and much much longer list.

I don't even believe that. One or more merchants on that list threatened you with legal action so you took it down.

Even if your cover story is true, you are basically throwing users and banks under the bus to help merchants with dirty card readers continue business as usual. Meanwhile users will continue to be defrauded, and banks will have to take the hit when users complain about charges they didn't make.

Why did you feel like you needed to respond through a throwaway account?

How can we even tell this is an official statement from GitLab?

You've completely screwed up.

You still screwed up handling things. Did you verify they are actually vulnerable? Or are you hoping you got it right? Cause if the latter, you really shouldn't remove something you don't know anything about.

Archive of the list on Gitlab which is now 404:


A pastebin link with spaces converted to newlines, apparently for purposes of copy/paste: http://pastebin.com/rYqEeuNm
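For anyone who wants to check their own shopping history against a copy of the list, a minimal Python sketch follows. It assumes the list is plain text with one host per whitespace-separated token (which matches the space or newline layout described above); the `.example` hostnames and the function names are made up for illustration and are not taken from the real list.

```python
from urllib.parse import urlparse


def normalize_host(entry):
    """Reduce a URL or bare hostname to a lowercase host without a leading 'www.'."""
    entry = entry.strip().lower()
    if "://" not in entry:
        entry = "//" + entry  # lets urlparse treat a bare host as a network location
    host = urlparse(entry).hostname or ""
    return host[4:] if host.startswith("www.") else host


def parse_list(text):
    """Split on any whitespace, so space- and newline-separated lists both work."""
    hosts = (normalize_host(token) for token in text.split())
    return {h for h in hosts if h}


def shops_on_list(my_shops, listed):
    """Return the shops from my_shops whose host appears in the listed set."""
    return sorted(h for h in map(normalize_host, my_shops) if h in listed)


# Hypothetical data: three listed hosts and two shops from a browser history.
sample_list = "shop-a.example www.shop-b.example\nshop-c.example"
listed = parse_list(sample_list)
print(shops_on_list(["https://www.shop-b.example/checkout", "other.example"], listed))
# prints ['shop-b.example']
```

Matching is exact per host after stripping `www.`, so a shop served from a different subdomain than the one listed would be missed; that is a limitation of this sketch, not a claim about how the list was built.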

I ran magereport.com on some of these and it's scary that they have multiple serious vulnerabilities still not patched.

Do this kind of thing on your own domain.

I have a list of major sites with currently active phishing pages.[1] This is basically a join of PhishTank and DMOZ. Nobody seems to be upset by that.

Google is at the top of the list because of their hosting business. It's not just Google Sites. You can put a web site in a Google Spreadsheet cell, which Google doesn't seem to check as a possible phishing site.

If you host for others, or offer a URL shortening service, you need automated checking against all available phishing lists or you will be exploited.

[1] http://sitetruth.com/reports/phishes.html
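The "join" described in this comment can be sketched roughly as below, in Python. This is only an illustration under stated assumptions: the phishing feed is taken to be a plain list of URLs and the directory a plain list of domains, the hostnames are invented, and nothing here reflects the actual PhishTank or DMOZ data formats or how sitetruth.com builds its report.

```python
from collections import defaultdict
from urllib.parse import urlparse


def join_phish_with_directory(phish_urls, directory_domains):
    """Group phishing URLs under the directory-listed domain that hosts them."""
    domains = {d.lower() for d in directory_domains}
    hits = defaultdict(list)
    for url in phish_urls:
        host = (urlparse(url).hostname or "").lower()
        labels = host.split(".")
        # Walk up the labels: a.b.example.com -> b.example.com -> example.com.
        # The loop stops before the bare top-level label.
        for i in range(len(labels) - 1):
            candidate = ".".join(labels[i:])
            if candidate in domains:
                hits[candidate].append(url)
                break
    return dict(hits)


# Hypothetical inputs standing in for a phishing feed and a site directory.
phish_feed = [
    "http://sites.bighost.example/login",
    "http://evil.test/x",
    "https://bighost.example/p",
]
directory = ["bighost.example", "shop.example"]
report = join_phish_with_directory(phish_feed, directory)
print(report)
```

A hosting provider or URL shortener could run the same lookup in the other direction, checking each newly submitted URL against the feed before serving it.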

Okay, so assume he hosted the list himself and is now DDoS'd.

Now what? I'll give you a budget of $100 a year.

Well there is Frantech / BuyVM. We had a VM we used as a proxy for about $15 a year + the $3 IP DDoS filtering. We had several proxies on top of our Limestone Networks server (they didn't offer protection at the time - still this is cheaper).

Game servers become hot targets for kids who just don't care and will do anything to get your server taken down, whether it's competition, a user who got banned, or just about any reason they can come up with to take your service down.


Forgot to mention the free CloudFlare option as well (unless they stopped doing this, haven't had to deal with these things in a few years, but I have a feeling it's still about the same, there's likely even more offers now out there).

Well, if it's true journalism, you sign up for project shield for free and laugh at the kiddiots banging their collective heads against the immovable mountain that is Google.


So your recommendation for people who have material that might incite a DDoS is let Github/Gitlab/another free hosting service deal with the expense/inconvenience?

I don't believe that gitlab's offered service is "DDoS Protection"

I wonder how big of a DDoS you can withstand (and remain operational) with Cloudflare Free/Pro.

> I understand that Github doesn’t have the resources to investigate each and every DMCA notice.

The DMCA as written really encourages no investigation whatsoever on the part of the service provider, and this is pretty much how everyone acts. File a counter-notice with the service provider if you don't think your content violates anyone's copyright.

In this case, if Github took it down because of a DMCA notice, I think Github actually behaved _better_ than Gitlab. Github is simply following the DMCA. If you file a counter-notice, they'll probably restore it -- if they don't, and say it's not an issue of copyright, it's just that they don't want to host your material, then at that point they'll be behaving similarly to Gitlab. Gitlab did not take it down because of a DMCA notice, they took it down because they decided it was 'egregious' and they just didn't want to host it.


I can't find any gitlab docs on filing a DMCA counter-notice. Their DMCA policy at https://about.gitlab.com/dmca/ is short and solely targeted at those claiming infringement; there is no description of how to file a counter-notice.

In this case, I think github wins. The terrible parts of github's counter-notice policy (10-14 days until your content comes back) are part of the DMCA law. Take it up with your congresspeople. http://io9.gizmodo.com/the-dmca-how-it-works-and-how-its-abu...

However, reading OP again -- it's not clear to me that Github took it down because of DMCA. They may simply be acting exactly like Gitlab, taking it down because they don't want to host it, unrelated to DMCA. But I wanted to clear up some things about the DMCA, since OP mentioned it.

>>>it's not clear to me that Github took it down because of DMCA.

Exactly, there seems to be a common misunderstanding that if anything is taken down then it is because of the DMCA. There are a number of ways content may be removed from a platform, be it GitHub, Facebook, YouTube, etc. Not all of it is DMCA.

In fact, for large platforms that offer takedown processes outside of the DMCA, I would say the vast majority is not. For example, anything taken down via ContentID on YouTube is NOT a DMCA takedown.

How exactly does publishing a list of malware-infected stores fall under the DMCA? I always thought DMCA was meant to be for copyright infringement cases.

I didn't see the list, but did it by any chance contain the logos of the online stores? If it did, the DMCA notices make sense.

The claims probably have little to do with DMCA violations themselves, but it has become an easy way to get content removed quickly from sites without very much review from the host service.

The claims are probably from the malware perpetrators as a way to censor discussion and keep the gravy train rolling. It's possible also that the embarrassed commerce sites would file a DMCA claim to keep the public in the dark about how insecure their sites are/were.

Indeed, this is one of many cases where the DMCA is almost an invitation for abuse - which was, if I remember correctly, also one of the main points of criticism when the DMCA was created.

It falls under the DMCA because the DMCA mandates a takedown-first, check-the-validity-later workflow. To be in compliance, a site must respond to any DMCA takedown notice as quickly as reasonably possible, regardless of how fraudulent it might be.

As long as you're okay being found guilty of perjury by a US court, or have no plans to enter a US jurisdiction in the near future, you can get any content you want taken down from any site that's required to follow the DMCA.

Good luck being rich and getting prosecuted with that.

DMCA is a copyright law. It requires a takedown for alleged copyright violation. That doesn't apply here. The stores don't have copyright over their domain names on a list.

Yes. So to get something taken down, all you have to do is allege copyright violation. If you're not concerned about perjury in the US courts, the claim doesn't have to have merit.

Service providers have a choice:

a) follow the DMCA takedown process and be shielded from all liability, whether or not the claim has actual merit

b) evaluate each takedown notice to decide if the material falls within the scope of copyright, ignore takedowns where in the service provider's opinion the material is outside the scope of copyright, and should it go to court, be forced to defend that position on its merits instead of having an automatic liability shield

What sensible service provider is going to choose option b?

Compilations can be copyrighted. There's a long legal history on the topic. Also, logos are generally a topic for trademark laws, not so much copyright.

> Compilations can be copyrighted. There's a long legal history on the topic.

This is true, but totally irrelevant as it misses the point of the question.

Actually, you're wrong. The DMCA was used as a premise for takedown on the basis of purported copyright infringement. I was pointing out that such a claim, however spurious you may think it is, would likely hinge on the claimant's exclusive rights in the compilation. I can't see any other theory that would support a DMCA takedown. Again, it's irrelevant to the question whether or not you think the claim had merit. Also, the question conflated trademark with copyright.

That's all well and good if this case ever goes to court, which it almost certainly won't. Most service providers will pull down any content that's included in a takedown notice. The alternative is for each service provider to make a legal decision about each takedown notice. That won't happen. If they don't respond in good time, they risk their safe harbor status.

It's smarter (under the seriously f'ed up DMCA system) to pull it down and wait for a counter notice, at which point they put it back up, and they aren't responsible any more; the person filing the counter notice is responsible, because a false counter claim counts as perjury.

At that point, the original issuer of the takedown notice can sue.

Of course, it almost never gets that far when the takedown notices are spurious, as in this case. That's what makes the DMCA such a bad idea — it's a perfect tool for censorship.

> I was pointing out that such a claim, however spurious you may think it is, would likely hinge on the claimant's exclusive rights in the compilation.

I don't think it is that simple. Yes, you may claim a copyright on the whole compilation. But you could as well claim a copyright on some parts within that compilation. Since the compilation is a single document, claiming copyright on some part of it would also be sufficient for a takedown.

Agreed. I think this is correct. If I recall correctly, the person filing has to bring a claim within N days.

Gitlab CEO just called me and apologized, will restore data shortly.

I am personally very sorry that GL got in a bad light here. They had misinterpreted my data and have acknowledged that. For comparison, I have heard nothing from GH over the last two days.

Gitlab, you rock.

You don't have to be sorry Willem. We just made a mistake and corrected it. Thanks for allowing us to correct it and for posting about us fixing it here and on Twitter. And thanks for making the internet a safer place.

FWIW, if GitLab sees this: Well done. We were planning on buying their hosted service (GitHost) for my employer next week and after this story initially broke I was going to table that and use something else. This has restored our faith in GitLab as a company.

Thanks Charlie, glad to hear that. For reference our response blog post is https://about.gitlab.com/2016/10/15/gitlab-reinstates-list-o...

They should be much more careful in how these types of reportings are done.

You were doing a public service by listing already-exploited machines that are snarfing credit card numbers. And that is of great importance for anyone who buys stuff online (Like... all of us).

It also goes to show that we need to further develop P2P technologies like IPFS and similar stacks, to rid ourselves of the dictatorial hands of monolithic companies. Because once they find something "unsuitable", you are no longer welcome. That's a problem, for all of us.

I had a similar problem with a VPS provider 2 weeks ago. I run IPFS on all my nodes, and comply with good netizen and compsec principles. They ToSsed me, because "my machine was scanning :4001 and they received a complaint". Bullshit. That is the chatter IPFS uses when maintaining the DHT. However, I was actually using a machine and what I paid for... and they didn't like it.

Fortunately, since all my data is via IPFS (and /ipfs and /ipns thanks to filesystem mounting), my data was already backed up and distributed. Filing a dispute with Paypal and purchasing another VPS provider was simple.

GitLab and GitHub are both pretty active on HN. I look forward to their response - where is it already?!

I'm especially disappointed by GL. GH is already too big to care about such things.

There is a service, http://www.cryptograffiti.info/, which can be used for posting sensitive information that should not be removed or hidden. It lets you write a message and store it in the Bitcoin blockchain as a transaction. It costs about 0.0015 BTC or $1 per kB. Large files can be posted as magnet links to torrents containing them.

Even if the service's site were shut down, anyone would still be able to obtain the transaction from its hash using any Bitcoin client or blockchain explorer, convert it to ASCII and read the text.

I'd like to note that it is worth signing all messages posted that way with GPG, in order to be able to post updates and verify authorship.
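The "convert it to ASCII" step can be sketched in Python as below. The thread does not say how the service encodes a message inside a transaction, so this assumes you have already copied the relevant bytes out of a client or explorer as a hex string; the function name and the sample payload are hypothetical.

```python
def hex_to_readable(hex_payload):
    """Decode a hex string and keep printable ASCII, replacing other bytes with '.'."""
    raw = bytes.fromhex(hex_payload)
    return "".join(
        chr(b) if 32 <= b < 127 or b == 10 else "."  # keep printable chars and newlines
        for b in raw
    )


# Hypothetical payload: "hello", one zero byte, then "world".
print(hex_to_readable("68656c6c6f00776f726c64"))
# prints hello.world
```

Replacing non-printable bytes instead of failing keeps the text readable even when the message is surrounded by binary transaction data.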

After briefly looking at https://github.com/github/dmca/tree/master/2016 there doesn't appear to be any DMCA request for gwillem's content, so maybe it wasn't removed through the DMCA process?

Definitely feels like a bad interpretation on Gitlab's part, but not done out of malice.

The person was not exposing sites that nobody previously knew about -- the sites were already compromised, there is nothing to compromise again except maybe having more than one attacker in your compromised account. The damage is already done, though.

These are likely web applications that were not kept up to date, so the responsible security disclosure already happened when it was reported for WordPress/Drupal/Joomla. It is the site owner's responsibility to pay attention to those security disclosures, which they likely failed to do.

And those compromised sites, in my experience, are usually attacking and infecting other sites and servers on the Internet. That makes them a public nuisance and so public disclosure is necessary so they can be appropriately blocked/isolated.

Discussion of origin of the list here


Article doesn't say that gitlab censored the list, though the gitlab link is a 404.

Also, are they actually using DMCA to get these lists taken down? If so, isn't there some penalty for filing a false DMCA?

The first comment mentions it was taken down about an hour ago.

Hmm. I didn't (still don't) see any comments. Maybe it's a mobile thing.

>If so, isn't there some penalty for filing a false DMCA?

Only if you get caught, which I don't think will be the case if they did anything to cover their tracks.

As I understand it, it's actually worse: the thing you swear under penalty of perjury is not "this content infringes a copyright", but rather "I am the copyright holder or authorized to act on the copyright holder's behalf". So even if you know beyond doubt something doesn't infringe, you're not committing perjury unless you misrepresent yourself as a copyright holder or authorized agent of a copyright holder.

For once, the Gitlab employees on HN don't comment on a Gitlab-related story.

It's a new account that offers no proof that they are legitimately speaking on behalf of Gitlab.

I missed it, plain and simple. But usually there's more.

Perhaps distributing this list is a use case for IPFS?


> I understand that Github doesn’t have the resources to investigate each and every DMCA notice. However, it still took me by surprise that Github censors data so easily.

Send a counter-notification asserting that the data is not under copyright and have it put back up. Assuming this is really DMCA.

So it seems the real bug here is that a site that is hosting malware is doing so because it's actually vulnerable to being hacked, was hacked, and had malware installed. So posting the site name identifies a vulnerable site (which is wrong), and not posting it stops informing people that those sites have malware on them (which is an issue as well).

That is quite the catch-22. And of course many of the site owners are clueless and don't even know how to patch or fix their systems.

My, isn't that a mess?

It's not wrong to expose negligence which endangers others.

Isn't pastebin the correct site to post lists like this?

Not if you want to update it.

Also, Pastebin has some shady practices. One example: they offer HTTPS support as a premium feature.

I am sorry, but I fail to see how offering https support as a premium feature could be considered "shady".

All web traffic should be encrypted, regardless of purpose. This increases the work that nation-state level adversaries must do to effectively spy on the population. And it's cheap and easy to do these days.

It's not a question of whether it's easy to do or not. Yes, I prefer an encrypted connection over one that is not.

BUT! The point here is that the OP claims it is a shady practice to provide HTTPS only to paying customers.

A shady practice is if they take your personal information and sell it to another without telling you. Shady is when companies lie to their customers.

And this situation is not.

Fair enough. I don't think this is a "shady" practice either. But I think it's an old, outdated practice to charge extra money for a feature that should be on by default.

Why not ask the archive.org guys to publish it somewhere? The research already relies on their data and they seem to be pretty good at defusing shady legal requests.

Would be fun to see if the list is censored on Google docs.

You should host the list and post a high visibility link (i.e. news article). You'll probably discover real quick the pressures that caused GitHub and GitLab to buckle so quickly.

Isn't it the whole point of GitLab that it's decentralized? As in, you can roll your own instance and stop worrying about censorship?

I'm pretty sure someone here has a GitLab instance that is willing to share for this purpose.

And get DDoSed by the malware guys, who don't want their victims to know they are on the list and fix their sites. It would be an altruistic offer, but I would think twice about it.

However it's a problem and we need a solution.

A torrent? Yes but Google won't index it.

A file on S3 wouldn't work because they could just download it as many times as needed to make the bill skyrocket. Better than a DMCA.

Anything that is unaffected by DDoS, costs no money, can be indexed?

Edit: one more requirement: not easy to censor.

Let someone selfhost who can just update the hardware firewall rules every few minutes to mitigate the DDoS?


DMCA, again. I forgot that constraint.

Blockchain maybe?


Not quite. IPNS.

IPFS links are immutable, whereas IPNS links are a mutable pointer to an IPFS hash. They can be updated at will, making them ideal for changing content with a stable hash.
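A toy model of that distinction, not the real IPFS/IPNS API: content is stored under the hash of its bytes, and a separate name table maps a stable name to whichever hash is current. The class, the name "skimmed-stores" and the example domains are all made up; real IPNS names are derived from a key pair rather than chosen freely.

```python
import hashlib

class ToyStore:
    """In-memory stand-in for a content-addressed store plus a name table."""

    def __init__(self):
        self.blocks = {}  # content hash -> bytes (never changes for a given hash)
        self.names = {}   # stable name -> current content hash (can be repointed)

    def add(self, data: bytes) -> str:
        """Store data under the SHA-256 of its bytes and return that hash."""
        digest = hashlib.sha256(data).hexdigest()
        self.blocks[digest] = data
        return digest

    def publish(self, name: str, digest: str) -> None:
        """Point a stable name at a content hash, replacing any older pointer."""
        self.names[name] = digest

    def resolve(self, name: str) -> bytes:
        """Follow the name to its current hash and return the content."""
        return self.blocks[self.names[name]]

store = ToyStore()
v1 = store.add(b"shop-a.example\n")
store.publish("skimmed-stores", v1)

# Updating the list yields a new hash; only the name pointer moves.
v2 = store.add(b"shop-a.example\nshop-b.example\n")
store.publish("skimmed-stores", v2)
```

The old hash `v1` still resolves to the old list, which is why a changing list needs the mutable name on top.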

Does the author need a git repo with a web UI, as github/gitlab provide, at all? Or were they just using github.com and gitlab.com as convenient free cloud-hosted publishing platforms?

Back to the original problem of skimming.

I think the fastest way to get sites fixed is to run a script that crawls the sites in the list, finds each site's Twitter account and posts a warning there with a link to the original article.
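A rough sketch of the parsing half of such a script, assuming the page HTML has already been fetched and that stores link to their profile with a plain twitter.com URL. The function names, the skipped paths and the wording of the warning are my own assumptions; actually posting the message would need the Twitter API and is left out.

```python
import re

# Matches profile links such as https://twitter.com/SomeShop, while skipping
# widget paths like /share and /intent that are not account names.
TWITTER_LINK = re.compile(
    r"https?://(?:www\.)?twitter\.com/(?!share\b|intent\b|home\b)([A-Za-z0-9_]{1,15})\b"
)

def find_twitter_handles(html: str) -> list[str]:
    """Return distinct handles (case-insensitive) in order of first appearance."""
    seen = set()
    handles = []
    for handle in TWITTER_LINK.findall(html):
        if handle.lower() not in seen:
            seen.add(handle.lower())
            handles.append(handle)
    return handles

def warning_text(handle: str, article_url: str) -> str:
    """Compose the warning message for one store account."""
    return (
        f"@{handle} your store appears on a list of sites with "
        f"card-skimming malware: {article_url}"
    )
```

A regex is enough for a first pass; stores that only embed a Twitter widget or use a link shortener would need extra handling.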

Can someone help with that?

And just like that, we discover how helpless the average Joe is against corporate money.

Let's crowdfund an AWS S3+CloudFront hosted site. DDoSing that is no easy feat, and if corps do try it, the logs can prove their complicity, which has legal implications, I presume.

It's an endless cycle. The Man keeps you down, so you throw together resources and collaborate to have a crowdfunded solution. It becomes successful, grows, hires some employees to maintain things, keeps growing, and becomes The Man.

@gwillem thanks for doing this important investigative work.

As soon as I read this I assumed it was as a libel prevention method.

I'd be curious whether Gitlab/hub could be held responsible for proving the accuracy of the claims? (That was my initial assumption as to the reason they were taken down.)

Why is a third party required to publish the list? Couldn't it be hosted on the blog post itself? That would have the advantage of being able to archive the whole thing on a single page.

To be fair, maybe it was some automated method, and the hub/lab owners have not vetted the data overall. Personally, I hope your list stays up and public as long as you are willing to take responsibility for its upkeep. I wish there was some format to submit this list (and your responsibility for keeping up with it) to vendors on the lookout for this kind of stuff (up to them to decide inclusion).

Please post it to pastebin.com or something like that, or put up a torrent! I want to know which sites, so I can steer clear of them.

What is the problem with publishing on your own site?

Or publish by BitTorrent?

It would probably get DDoSed just as quickly.

Just put it behind the free Cloudflare protection.

>Only limited DDoS protection and mitigation is provided to domains on a free or Pro plan through "I'm Under Attack" mode. If you are looking for advanced DDoS protection and mitigation and frequently suffer sizable DDoS attacks, please consider looking at our Business or Enterprise plans.


by whom?

I don't know enough about these kinds of situations yet to form a reasonable argument for-or-against. Is what was done considered a kind, favourable thing for the developers behind those sites or is it something that shouldn't have been displayed?

It doesn't matter what is kind to the devs; it matters that their website has malware causing their users' data to be compromised.

"Moderated", not "censored". Neither GitHub nor GitLab have stopped the message going out from outlets other than their own. Would we be comfortable calling the moderators here on HN "censors"?

I tried to pay in several of these shops. Most didn't even have the functionality to pay by card. The only one in my sample where I was able to get to the payment form redirected me to the proper payment gateway.

http://gogsys33repvmfz5.onion/ Free gog git server in Tor.

Also, it will ask for an email on registration, but it isn't verified and no email is sent.

I wonder how google acts, if you dump the list here:


That's covered in the article:

> I have, prior to publication, submitted all URLs and malware samples to Google’s Safe Browsing team. They have since only acted upon a small portion of the sites.

Money talks. Unless it's consumers' money being stolen, it seems.

Now I see. I thought this was a list of stores hacked through the current vulnerability. This is a list of stores hacked through 1-2y old vulnerabilities.

Post it on a WordPress.com blog or host it on an OVH box or put it behind CloudFlare. All of these are quite censorship resistant in my experience.

The article states the lists were taken down but does not say why. Perhaps there will be an explanation forthcoming from Github or the researcher.

The only way this will get fixed is if a script is written to take advantage of the vulnerabilities and clean the sites affected.

Post it to Pastebin.com or make a torrent!

Well, as an absolute last resort, you can use Freenet or Dat to store your list.

Have you thought about contacting Adblockers or even Browsers? They might be interested in this data to block the sites for the average Joe.

From the article:

7. I have, prior to publication, submitted all URLs and malware samples to Google’s Safe Browsing team. They have since only acted upon a small portion of the sites.

When I read that, I assumed it was the Google Search team, not the Google Chrome team.

I'm kind of with Gitlab on this one, just publishing a list of broken sites isn't going to help them get fixed. Most of the owners probably barely know the Googles from the Facebooks, so even if you email them saying 'you have this JavaScript thing that's bad' they won't understand and will blow you off.

OP doesn't go into details of how they check the stores, but I'd assume they have some sort of script as they checked 255k. If that's the case it would be trivial to send an automated email if malware is detected, and include links explaining how to fix it.
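A minimal sketch of what that automated notice could look like, using only the standard library to build the message. The function name, wording, addresses and help URL are placeholders I made up; how the OP's scanner reports a detection is not described in the article, and sending (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage

def build_notice(store_domain: str, contact: str, sender: str, help_url: str) -> EmailMessage:
    """Build (but do not send) a warning email for one affected store."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = contact
    msg["Subject"] = f"Card-skimming malware detected on {store_domain}"
    msg.set_content(
        "Hello,\n\n"
        f"An automated scan found JavaScript on {store_domain}\n"
        "that appears to capture payment card details.\n\n"
        f"How to clean and patch the store: {help_url}\n"
    )
    return msg

# Placeholder values for illustration only.
notice = build_notice(
    "shop.example",
    "owner@shop.example",
    "scanner@research.example",
    "https://research.example/fix",
)
print(notice["Subject"])
```

Finding a working contact address per store is the hard part in practice; the message itself is trivial.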

It won't resolve everything but it's a lot nicer than naming&shaming businesses who have effectively done nothing wrong. What I mean is they probably hired a developer or team to build their website, and assumed that they would build a secure website - they didn't go out purposely and find someone to build them a site that would be hacked.

> naming&shaming businesses who have effectively done nothing wrong.

Aren't they actively running malware as a result of their own laziness in not upgrading their Magento[1] site?

This malware is being used to steal customer credit cards (at the very least) - perhaps identity theft as well. They are the very agents of 3rd-party hackers. This list[2] should be sent to the proper authorities and have the stores closed immediately.

Customers who have made purchases at these stores can and should be looking at lawsuits.

If you're going to run a 3rd-party solution for your ecommerce needs and patches have long been made available, you patch. If you can't even do this one simple thing - you get run off the internet for criminal negligence.

[1] https://magento.com/ [2] https://www.magereport.com/page/about

> I'm kind of with Gitlab on this one, just publishing a list of broken sites isn't going to help them get fixed.

So the malware should be allowed to continue stealing credit card numbers just because the site owners don't know any better?

Is that really a position you wish to defend?

No I don't agree that it should be allowed to continue, but how is naming&shaming people going to fix anything? Nothing is going to come of this, other than maybe some other hackers will see them as weak targets. Do you expect this list to be read on prime time CNN or something? People who want to buy something online aren't going to search through GitLab to check if the site has been hacked (maybe they should though), they just look for the green padlock and assume it's safe.

As I said, and GitLab suggested, OP should at least contact them. If you contact them and they say they won't do anything, now that's a different story...

> So far, between Oct 10 and Oct 14, 631 stores have been fixed. [0]

So it sounds like it does a pretty good job of fixing things.

[0] https://gwillem.gitlab.io/2016/10/14/github-censored-researc...

> No I don't agree that it should be allowed to continue, but how is naming&shaming people going to fix anything?

Someone will make a browser extension that uses this list to warn users?
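The core of such an extension would be a lookup of the current page's host against the list. A sketch of that check, assuming the list is available as a set of bare domain names; the function name and the example domains are made up, and a real extension would be written in JavaScript against the browser's extension API.

```python
from urllib.parse import urlsplit

def is_listed(url: str, listed_domains: set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is on the list."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # www.shop.example is checked as www.shop.example, shop.example, example.
    return any(".".join(parts[i:]) in listed_domains for i in range(len(parts)))

# Placeholder list for illustration only.
listed = {"shop.example"}
print(is_listed("https://www.shop.example/checkout", listed))
```

Checking parent domains means a listing for `shop.example` also covers `www.shop.example` and other subdomains.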

Well, from the responses he's got it's pretty obvious some shop owners just don't care. I can imagine if they're being publicly shamed about it, they might start caring and actually do something about it. So yeah, it helps.

Sites will always continue to carry malware. Naming and shaming without looking at all the parties involved is a crude and ineffective way of changing things.

Change the browser, change the payment system, educate the user by using plugins, propose enhanced security methods in ECMAscript. Write about how easy it is to make missteps on the net. These are all alternatives which might help in a more permanent fashion.

That's like saying "corruption will always exist, investigative journalism is crude and ineffective - instead educate people to spot corruption and live with better morals".

There's nothing ineffective about uncovering a problem and using transparency to create an incentive to fix it. Crude, yes, but not ineffective at all.

What GL did might be considered complicit in skimming users, especially if they acted under pressure from fraudulent shops or even the skimmers themselves (how do we know?).

Did you even bother to read the article before commenting?

There is nothing a client can do when the server is compromised.

Actually, yes, I did read the article. When I read 'Javascript malware', I understand this to be ECMAScript which is executing on the client's machine but delivered by the origin site. The ECMAScript has access to the keyboard while the window has focus, so it could do a MITM. Like I said, there are various ways to prevent this from happening:

1. Instruct users never to enter card details directly into a website, but rely on a redirect to the card provider. This would change the origin. When properly set up, this should catch 99% of the problems.

2. Provide stricter browser controls, so third-party ECMAscript is not loaded into the browser by CORS. Again, instruct the user or have us make better browsers.

3. Lobby for better payment services. Here in the Netherlands, payment is done using iDEAL, on a separate origin (using redirects) and the payment is validated using a separate device.

The secondary problem is with the websites, the primary problem is with the supporting technology.
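On point 2, the browser mechanism that restricts which origins a page may load scripts from is the Content-Security-Policy response header rather than CORS. A sketch of building such a header value, with a made-up payment host; note that it is the server that sends this header, so it does not help once the server itself is fully compromised.

```python
def build_csp(allowed_script_hosts: list[str]) -> str:
    """Build a Content-Security-Policy value that only allows scripts from
    the page's own origin plus an explicit list of hosts."""
    sources = ["'self'", *allowed_script_hosts]
    return f"default-src 'self'; script-src {' '.join(sources)}"

# Hypothetical payment provider origin.
print("Content-Security-Policy:", build_csp(["https://pay.example"]))
```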

So, you don't think that if you operate something that is a risk to other people, it's your obligation to reduce the risk? You can just say that you are incompetent at what you are doing, and therefore you shouldn't be shamed for putting other people at risk?

I edited my post, but I don't think that's really fair. The business most likely outsourced the development of their site to someone who probably assured them that they would build a secure site. The business probably trusted them (maybe the developer was even recommended) yet here we are. The business didn't know enough about building a secure website, so hired someone they assumed did.

Edit - Poor analogy removed.

A solution to this is to hire a pentester for your site. You can find them for ~$5k, with followup tests for new features being around $2k. A professional, world-class pentest runs around $50k, but a lot of smaller sites can't afford that.

You can't really hire someone with the expectation that they'll develop secure code. Finding flaws in code people thought was secure is a pentester's job, and it's a completely different skillset.

I agree, but my point is the people who hired the developers probably don't even know what a pentester is.

Yes. A lot of them also respond with threats when you try to contact them to tell them about vulnerabilities. It's not really up to the author to shoulder that kind of responsibility.

https://news.ycombinator.com/item?id=12309035 is informative. For what it's worth, it changed my mind on the matter.

What's even your analogy here? Because sometimes people have bad luck and you can make up a completely unrealistic scenario of how bad luck someone possibly could have, we should not hold anyone responsible for anything?

To give you an idea of how unrealistic your scenario is: In reality, yes, the owner of the damaged property could force you to pay for the repair, but you could also force the plumber to reimburse you for that, and they in turn probably have insurance for that sort of thing that will reimburse them. No one would be kicked out of anything, except for the plumber by the insurer if they had that happen a bit too often.

> Most of the owners probably barely know the Googles from the Facebooks, so even if you email them saying 'you have this JavaScript thing that's bad' they won't understand and will blow you off.

So, because you expect the owners of the affected websites to be dumb, users should not be protected from malware?

But I'll give you an example of how such sites would get fixed: I used to own and work for a CSE, where thousands of stores are listed. If I had easy access to this list, I'd take a quick look at it for possible clients and ask my former company to contact them. They'd first delist the affected shops from the comparison shopping website, then spend enough time with the owners to make sure they understand and can fix the problem.

Github in their ignorance prevented this, but thankfully archive.org exists...

On the other hand, the safe browsing feature of modern browsers usually creates enough pressure on affected shops that they are fixed quickly - so if some shops don't seem to act quickly, they're probably fraudulent themselves.

> I'm kind of with Gitlab on this one, just publishing a list of broken sites isn't going to help them get fixed

This has been empirically proven false. As noted in an addendum to the post, 631 broken sites got fixed in just 2 days after the list got published.

> It won't resolve everything but it's a lot nicer than naming&shaming businesses who have effectively done nothing wrong.

They are putting their users at risk through negligence. Many would argue that's wrong.

>send an automated email if malware is detected, and include links explaining how to fix it

Twitter is better. Warned users are the best motivation to fix.

Whenever a business hires a contractor to do something for them, the customers will blame the business first if anything is wrong with their product or service. It is up to the business to pass the damages in reputation and lost business opportunities on to the contractor. Why should we handle this differently for software running online shops?

The malicious code on those websites isn't your business, guys. If their owners wish not to respond and not fix it -- they have a right to do so. Why are you all so anxious? It has nothing to do with you all. Just don't buy from them, that is simple.

Needless to say, I've seen most of those websites for the 1st time.

The world is unfair? No, it's fair and this proves that it is fair.

What a stupid comment. Here is a list of thousands of sites that people other than you do visit and buy from, causing their cards to be skimmed, and your response is 'stop being so uptight'?

Imagine if this were instead a list of real-world ATMs. The correct response is not "well I've never seen these ATMs before, just don't use them, duh", it's "how the fuck can this be allowed to happen and how do we fix it".
