Repositories held for ransom by using valid credentials (gitlab.com)
300 points by linuxbuzz 46 days ago | 154 comments

> We believe that no data has been lost, unless the [...] GitLab copy was the only one.

One difference between how GitLab and GitHub run their infrastructure is that GitLab doesn't keep reflogs, and uses git's default "gc" settings.

As a result, in many cases they won't have the data in question anymore[1]. Well, I don't know that 100% for sure, but it's the default configuration of their software, and I'm assuming they use it like that themselves.

Whereas GitHub does keep reflogs, and runs "git repack" with the "--keep-unreachable" option. They don't usually delete git data unless someone bothers to manually do it, and they usually have data to reconstruct repositories as they were at any given point in time.

GitHub doesn't expose that to users in any way, although perhaps they'd take pity on some of their users after such an incident.

This isn't a critique of GitLab, just trivia about the storage trade-offs different major Git hosting sites have made, which might be informative to some other people.

I'm surprised no major Git hosting site has opted to provide such a "we have a snapshot of every version ever" feature. People would probably pay for it, you could even make them opt to pay for access to backups you kept already if they screwed things up :)

1. Well, maybe as disaster backups or something. But those are harder to access...

Re GitHub keeping unreachable data: if I understand it right, isn't that GitHub painting a giant target on their back? Wouldn't that imply that every secret accidentally committed and then 'deleted' is still accessible, when one would expect it not to be? It's one thing to have your source code in the wild, but pairing it up with thought-to-be-deleted secrets would be an absolute disaster.

Certainly one should not ever keep using a secret once it has escaped into a Git repo, but I'm sure it happens quite frequently.

> Wouldn't that imply that every secret accidentally committed and then 'deleted' is still accessible

This should be a moot point because anyone (in IT) should realize that an accidentally committed secret is now 100% public for all eternity and needs to be rendered irrelevant to restore secure operations.

And a hundred times so for any public repos. There are bots feeding on the GitHub firehose, scavenging for accidentally committed credentials.

A few years back (2015 or so) the average time from push-to-repo to AWS account compromise was 6 minutes. Surely that time has only gone down, and the number of different credentials identified has gone up.

> the average time from push-to-repo to AWS account compromise was 6 minutes.

Wow, I didn't realize it had become so efficient, but I shouldn't be surprised. I never really understood the value in hosting non-public software in the public, and if it's open source, it shouldn't be getting anywhere near secrets that can be used to extract money from its developers.

I remember thinking, back when it became trendy for people to upload their personal dotfiles to Github, that it would be a source of endless suffering. Who knows what information you're leaking in your ".profile" or ".bashrc"? Is that risk justified by the dubious benefit of storing your dotfiles on the internet for everyone to see, forever?

I accidentally pushed an AWS credential a month or two ago. Within about a minute and a half, AWS had disabled the IAM user and automatically emailed me (as well as my entire org, how embarrassing!). When we went through the access logs, it looked like it had taken only another minute and a half for some other, presumably malicious, system to attempt to access my compromised user. Probably 2 or 3 minutes total. I'm not a huge Amazon fan, but props to AWS for saving my butt.

Why have credentials anywhere outside of the .aws directory in your home directory? When developing locally all of the SDKs will read them from there and when deploying to AWS, the SDK will get them from the attached role.
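For example (a sketch — the profile name and key values are placeholders, and the file is written to a temp path here; `AWS_SHARED_CREDENTIALS_FILE` is the standard SDK/CLI override for the default `~/.aws/credentials` location):

```shell
creds="$(mktemp -d)/credentials"
cat > "$creds" <<'EOF'
[default]
aws_access_key_id     = AKIAEXAMPLEEXAMPLE00
aws_secret_access_key = not-a-real-secret
EOF
# Every AWS SDK and the CLI read this file; nothing needs to be
# hard-coded in the repo:
#   AWS_SHARED_CREDENTIALS_FILE="$creds" aws s3 ls
```

Since the file lives outside the working tree, there is nothing for `git add` to accidentally pick up.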

I understand what the best practices are, it was a total mistake- I never even intended to push what I did to github.

`git diff` before making a commit; `git log` and `git show` before pushing to a remote.

These 2 simple things have saved me on more than one occasion.
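Spelled out, the habit looks something like this (all read-only review commands; the temp repo just makes the example self-contained):

```shell
set -e
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'token = "REDACTED"' > "$repo/config.ini"
git -C "$repo" add config.ini

# Before committing: review exactly what will go into the commit.
git -C "$repo" diff --cached

git -C "$repo" -c user.email=a@b -c user.name=a commit -qm 'add config'

# Before pushing: review the commits and their full content.
git -C "$repo" log --oneline
git -C "$repo" show HEAD
```

Anything that looks like a credential in the `diff`/`show` output is your last cheap chance to catch it.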

> A few years back (2015 or so) the average time from push-to-repo to AWS account compromise was 6 minutes. Surely that time has only gone down, and the number of different credentials identified has gone up.

I don't doubt that for a second, and I'd like to use that as a quote. I'd like to be prepared if someone doubts it, so: do you have a primary source for this?

This paper may be relevant to your interests: https://blog.acolyer.org/2019/04/08/how-bad-can-it-git-chara...

I'll need to find the talk I lifted it from. Not easy... but looks like downthread a sibling comment gives a relatively decent update about the current speed of compromise.

Answering myself: I think it was a BSides London talk. (Quite likely from 2017.) After doing a search, I don't think it was recorded.

Hence, I can't provide a primary source. Sorry.

I thought that AWS nowadays is also feeding on the firehose and auto-disabling any of its keys it can find in a commit?

Security isn't rendered in absolutes. We have to assume some sheepish new employee somewhere is scared of approaching management about a mistake they made committing a secret, so they reverse the commit and pretend nothing ever happened.

We have to try and mitigate damage from lapses in communication and protocol like that.

Math and cryptography don't care about a sheepish new employee (thankfully). The fact that leaked secrets will cause trouble is not mitigated by git forgetting a deleted commit. It is only mitigated by revoking that secret and creating a new one and not leaking it. So if a sheepish new employee fails to revoke them, why blame git or any other system? We have contracts, insurance and then criminal code for people who fail to follow protocols.

> So if a sheepish new employee fails to revoke them, why blame git or any other system? We have contracts, insurance and then criminal code for people who fail to follow protocols.

Because you won't know that a protocol isn't being followed. Your contracts, insurance, and criminal code won't cause you to realize that an employee caused an infosec incident if they don't tell you (and neither will your math and cryptography). And the more you threaten use of the criminal code, the less likely people are to admit that they made a mistake.

You can either build defense in depth (e.g., regular secret rotation, policies on use of GitHub in the first place or better yet automation that only pushes publicly after internal review, DLP via a corporate MITM, segregating your open source dev from your secret dev, etc.) or you can let your single defense get breached and have no idea.

The criminal code? I doubt there's anything in there that criminalizes a failure to follow an employer's secret revocation policies.

No, I wasn't referring specifically to this case. Generally, if people don't "follow the protocol", we have criminal code. If machines don't follow protocol, they end up with wrongly decrypted garbage data. It was to highlight the point that we have different measures to deal with people than we have with computer security, because math cannot prevent people from deviating from their protocols.

No need for any git-blame; the blame lies with GitHub for not making this feature optional and better known. So well known that a noobish employee is aware of it, so that they don't feel "safe" and conclude they no longer need to alert management about their error.

Contracts, insurance and criminal code are responsive measures, not preventative measures. Security is preventative, not responsive.

> the blame lies in GitHub not making this feature optional and more known.

Which feature are you meaning should be optional and more known?

In either case, the secret is already out whether the user wants to admit to it or not

But in one case, damage is mitigated because the sys admins didn't assume everyone is infallible and strictly adheres to protocol.

The correct way to deal with fallibility in this situation is to make it feasible to change secrets when they leak, not pretend they weren't leaked.

That doesn't prevent someone from not following protocol.

It's not their job to prevent that.

It is a sys admin's job to mitigate damage from security leaks and to introduce hardened, fault-tolerant security paradigms.

You are just hacking at the leaves. Once the secret is posted, it is public; there are multiple copies of it elsewhere on the internet. Even if you delete it, a copy is kept somewhere, and that's not an assumption. For example, IIRC, GitHub's data is dumped to Google every so often.

Hmmm, it seems like having layered security, where accidentally exposed credentials aren't automatically "game over" would be better than not.

Not sure how practical that is to implement for every technology, but for many it could probably be done.

Seems like a time+money vs risk trade off thing.

Is it mitigated? Once it's leaked you can't force everyone who may have captured it to delete it. So GitHub deleting it doesn't solve the problem.

The definition of mitigation is to make something less severe. Yes, GitHub making this policy as clear as possible and allowing controls to toggle it per-repository or per-account mitigates the problem.

I agree with the general statement about security absolutism (it's often very dumb and irrational), but in this case in particular, most keys are swept from GitHub within seconds of being pushed, so the additional harm of not pruning those commits is very low. Data loss concerns are probably a much larger source of harm to weigh against it.

Well now I would say that I'm not interested in most keys, and I am interested in figuring out how to mitigate damage from the rest of them. You only need one key to get inside.

99% coverage is not good enough from a security standpoint, not when we can achieve 100%.

Simply, this functionality should be transparent and toggleable.


I think you're confused about what absolutism means. Just because I want 100% coverage when achievable does not mean I am being absolutist.

Yes and this is why Github handicapped their search so much.

But at the end of the day, any secret you post publicly is compromised.

That's a very poor reason not to have good search functionality, considering the uses far outweigh the potential risk. I highly doubt it's the case.

I'm 90% sure it is.

GitHub had great search, then they took it down when they found people were scraping credentials with it, and then they had bad search. I'm connecting the dots.

Thankfully the guys over at Sourcegraph feel differently about how searchable projects should be.

You can permanently delete data from GitHub, but doing so requires a bit of work and a message to customer service: https://help.github.com/en/articles/removing-sensitive-data-...

If you ask nicely they'll run a one-off "gc expire" for you.
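For reference, the local equivalent of that one-off expire is roughly the following (a sketch; run it on a throwaway clone first, since it is irreversible):

```shell
set -e
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=a@b -c user.name=a commit -q --allow-empty -m base
git -C "$repo" checkout -qb leak
echo 'AWS_SECRET=oops' > "$repo/secrets.env"
git -C "$repo" add secrets.env
git -C "$repo" -c user.email=a@b -c user.name=a commit -qm 'oops'
sha=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" checkout -q -
git -C "$repo" branch -qD leak     # "deleted", but still recoverable...

# ...until the reflogs are expired and unreachable objects pruned:
git -C "$repo" reflog expire --expire=now --all
git -C "$repo" gc --prune=now
```

After the `gc`, the leaked commit is genuinely gone from this copy — which of course does nothing for any clone or scraper that already fetched it.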

It also requires an attacker to know at least a partial SHA-1 anyway. It's infeasible to brute-force that without being banned for DoSing them, and if you know what the SHA-1 is you probably had access to the data already.

But yeah. It definitely creates security caveats peculiar to git, e.g. a hostile actor guessing that a force push in an IRC commit announcement clobbered secret data, and then accessing the old commit in the web UI.

This is precisely why secret rotation mechanisms are essential. If you are regularly rotating your secrets, your window of vulnerability for an accidentally leaked secret reduces to the rotation window. With good automation, and in the context of secrets which don't need to be remembered or input by a human, your secrets should be rotating nearly constantly. Additionally, automation greatly reduces the risk of human intervention, which reduces the risk of a human writing secrets to files by hand, which reduces the risk of those secrets being committed to version control in the first place.

Of course, automatic secret rotation is hard. Vault is a great help, but it can't be grafted onto everything. Good DevSecOps engineers are worth their weight in gold.

They do keep disk snapshots for 2 weeks though (created twice a day).


Thanks for linking to that. I’m asking inside the company about the comment that the repositories are lost. I think it is a lot of work to restore individual repositories as opposed to restoring a disk, so maybe that is why we said that.

Also any commits pushed between the last snapshot and the deletion would be lost too.


Hey, since I'm having trouble getting a response from support/security and I know you're active on here, can you get someone to respond to this ticket (related to OP)? I don't usually try to publicly escalate something like this but it's urgent and the ticket's gone 24 hours+ without a response.


That is a crazy-ass quote. “We believe that no data has been lost... well, except for the data we keep. But you weren’t actually relying on us to save any data, right?”

I know, back up everything at least twice. But still, when somebody loses one of your copies, they don’t get to say “it’s cool, no data was lost, you have other copies, right?”

To provide additional context, on GitLab.com, we maintain two weeks' backups. The last time we restored a single project repo, it was a significant effort that utilized many hours of an SRE's time to complete.

What’s crazy isn’t the loss of data (that happens, and I don’t really expect most cloud services to be save me from explicit deletion requests) but rather framing it as no data being lost because your customers have it elsewhere.

"many hours" is too high. Regardless of this incident, if backups take too long to restore, you may as well not have them.

I agree, the longer it takes to recover the less valuable the backups are. However, in this context where we're restoring individual repo's (and all the metadata baggage involved) it's much different than a DB restore or other disaster recovery/prevention mechanisms.

The per-repo restore time today is not where it should be. We're working to speed this up so we can help users recover and get back to a productive state quickly.

Does support have the ability to restore keeparound refs? Internally (at least as of the late 10.x series), anything shown in the UI (i.e. merge requests) is also copied to refs/keeparound/<sha1>, which isn't presented to users.

I'm surprised github runs regular git. I'd always assumed they were emulating it, especially with the lag we've observed between github-api and github-git at $DAYJOB (update repo 1 via api, update repo 2 via api, fetch repo 1 and repo 2 via git, we've had cases where the repo 2 update was visible but not the repo 1).

There was a talk about "Scaling Git at GitHub" [1] a few years ago at Git Merge. While it may no longer be accurate, it still gives some insight into how GitHub is run.

[1]: https://youtube.com/watch?v=xK5yaWTt0R0

I think github runs https://github.com/libgit2/libgit2 which is not regular git.

Just out of curiosity but how do you know about this inner workings of Github and GitLab?

I contribute to git, and I read the mailing list, where people a lot smarter than me comment about this sort of thing.

I'm also on a team that runs an in-house enterprise GitLab instance for an S&P 100 company, so I have experience with it in that configuration, which I understand isn't different from what gitlab.com uses in this regard.

None of this is secret or some sort of insider knowledge. If you know how "git gc" works you can trivially observe most of the behavior of these hosting sites from the outside.

E.g. try pushing a commit and then viewing it at git{hub,lab}.com/YOU/PROJECT/commit/SHA-1. Then "push --delete" the branch that references it.

You'll find that you can still view it on both sites, even if when you clone the relevant repository you won't get that SHA-1. This is because it's expensive to do a reachability check before serving up the content, and the web frontends access the object store directly.

Then if you e.g. keep making pushes sufficient to trigger a "gc --auto", and it's been longer than the relevant git "gc.*Expire" time(s), you can deduce whether the site uses something close to git's default "gc" semantics. If you do this on GitHub.com you'll find you can access the data for longer than that, possibly "forever".

Which is actually a thing relevant to data recovery in this case. If those impacted by this security incident have lost their data, but have some of the SHA-1s involved (e.g. because they were pasted in IRC) they might find they can still view that content on gitlab.com if they were to browse it in the commit/tree/blob view, and painfully recover it that way. They won't be able to clone it since neither site turns on uploadpack.allowAnySHA1InWant=true.
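The probe described above can even be reproduced locally, with a bare repository standing in for the hosting site's server-side storage (a sketch; real hosts add their own plumbing on top of plain git):

```shell
set -e
server=$(mktemp -d); client=$(mktemp -d)
git init -q --bare "$server"
git -C "$client" init -q
git -C "$client" remote add origin "$server"
git -C "$client" -c user.email=a@b -c user.name=a commit -q --allow-empty -m probe
sha=$(git -C "$client" rev-parse HEAD)
git -C "$client" push -q origin HEAD:refs/heads/probe

# "push --delete" removes the ref on the server...
git -C "$client" push -q origin --delete probe

# ...but until a gc runs there, the object itself survives — which is
# why the commit page in the web UI keeps working:
git -C "$server" cat-file -e "$sha"
```

A fresh clone of `$server` would not contain `$sha`, since nothing reachable references it; only direct object access (what the web frontends do) still finds it.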

I know some git internals including gc, expiration, reflog etc but your description was still very interesting. Thanks for taking the time to write this!

I don't know about GitHub, but GitLab has an open source distribution, so you can assume they use the same configuration internally.

You can access unreachable refs by means of Pull Request force pushes; even if you force push, you can still view the state at the original commit, given that you have its hash. That, plus some internal knowledge of Git, allows the author to guess how GitHub works this way.

As for GitLab, it's open source, so you can easily check that.

> GitHub doesn't expose that to users in any way

Well, links to orphaned commits still work, and GitHub has recently started surfacing UI when you force push a branch.

> One difference between how GitLab and GitHub run their infrastructure is that GitLab doesn't keep reflogs, and uses git's default "gc" settings.

Does this also apply to self hosted GitLab CE/EE? Also how does Gogs/Gitea handle this?

What I outlined applies to self-hosted GitLab, it just uses git's default settings, and I'm assuming gitlab.com does the same.

Of course if you self-host you can simply change the defaults in /etc/gitconfig (which is in /opt/... if you're using the omnibus package).
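The relevant knobs look something like this (a sketch of the gc expiry settings, applied per-repository in a temp dir here; on a self-hosted instance you'd set the same keys system-wide in /etc/gitconfig — this is not a GitLab-blessed configuration):

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# Never expire reflog entries, and never prune unreachable objects.
git -C "$repo" config gc.reflogExpire never
git -C "$repo" config gc.reflogExpireUnreachable never
git -C "$repo" config gc.pruneExpire never
```

With these set, "git gc" will keep repacking but stops throwing history away, at the cost of ever-growing disk usage.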

That's really fascinating. Thanks for sharing. As someone who uses Gitlab (and likes them a lot), I definitely like Github's approach more here.

You are assuming mistakenly deleted data.

Circumventing this is trivial for an attacker.

This tendency of Hacker News users to want to monetize everything is sickening.

It's kind of inherent on a website that's a side project of a venture capital firm.

Besides, without monetisation, you're relying on the goodwill of a surprisingly small number of people. I like to call this "Postel decentralisation" - in the early days of the internet, before IANA was the bureaucracy it is today, a lot of functions which people might naively assume were decentralised were in fact done by hand by Jon Postel.

I wonder if one could monetize this tendency.

Monetyzer: The world is your oyster, it's about time you start collecting pearls.

Maybe this comment was inspired by WellDeserved, a truly underappreciated but still important app:


No, but that was incredible. Thanks for introducing me to Cultivated Wit.

This is curious to me. You either run a charity or a business. Is that sickening?

OSS is huge on HN, and a ton of HN users release OSS all the time. Yet, we all have bills to pay, and a lot of us look for ways to make money as well. Food and whatnot.

I'm not really sure what you're objecting to here? You make it sound like because a user talked about monetizing a feature to a hypothesized product that they're the same as a pharmaceutical company with life-needing medication forcing users to pay absurd amounts.

I agree that in certain scenarios how you monetize matters heavily. Yet, I can't help but feel that only applies to freedom and life-essentials. Things like basic internet access and medications.

But a git hosting service? In my view, you could open one and make it as colossally greedy as you like. It seems you disagree with this, can you voice your thoughts in more depth?

Thanks :)

The irony of people using fast Internet (often on a fast mobile network), using fast computers, on sites like HN/FB/Twitter, etc, typically with a full belly and in an air conditioned room, to speak about the evils of the capitalistic spirit, always amuses me.

Some of us have bills to pay. This generally isn’t a hobby, but a profession. Until rainbows and good vibes pay the rent, then yes, monetization is important.

To explain some of my sickening tendencies: I'm not associated with any such for-profit hosting site, so I have nothing to gain from this. I just use them.

Implementing such a feature would cost resources that someone would have to pay for. Storage costs would go up, it's not atypical that e.g. a repo that's 100MB on disk might be 1.5x or 2x that (or beyond) if you were keeping every version of every ref ever. Think e.g. accumulating throwaway topic branches with library imports you never ended up using.

So how do you pay for running such a thing, nevermind the initial development cost?

You could just make it "free", but then you'd need to roll the cost onto customers across the board. Or you could only enable such "backups" for opt-in paying customers, but most people aren't going to think to enable/pay for that, or think "I won't need this", until the day they do.

So wouldn't it be neat to have such a service on in the background, funded by high premiums to recover the data in case their backup version is your last option?

I've certainly permanently lost personal data by accident where I'd wished I could have paid hundreds of dollars to get back, nevermind someone for whom such a thing might be of critical business importance.

Think about it as being able to pay money after-the-fact to undo the car crash you just got into. With technology that becomes feasible in some cases, and in particular due to how git stores data & what people tend to store there it's relatively cheap compared to some other types of storage.

Hosting 10 million git repos is easy, getting users to pay for the service is hard.

Yeah. But it's not surprising considering the site is being run by a VC fund.

Host 10m, add a feature that 10k need. Charge each customer enough to cover the 999 who aren’t willing to pay.

Maybe so, but please don't post unsubstantive comments or get flamebaity here.


(Nearly all such generalizations about HN users are just sample bias anyhow.)

Because nothing is free. You don't have to pay for it if it has no value, or you can do it yourself.

Because software companies never charge for features?

Also GitHub users are affected. At the time of writing, 379 public GitHub repos have been compromised:


The "Global Association of Risk Professionals" got hit. That should be a fun meeting. https://github.com/GARPDev

I like this threat for open source software published on GH --

If we dont receive your payment in the next 10 Days, we will make your code public or use them otherwise.

(At least for public repos) Isn’t this almost a compliment? (Having another person host your code and use it?) xD

Based upon other comments the title has already been changed several times.

It still suggests Gitlab's infrastructure (internally) was compromised: "Suspicious git activity detected on Gitlab"

Something like "Gitlab users' repos held for ransom" seems more appropriate.

"Gitlab.com was compromised" is a bad title. _Accounts on_ GitLab, and the credentials to access them, were compromised, but the title suggests that the whole platform was affected, which doesn't seem to be the case.

Looks like they changed the title. But I would have to say that the ability to delete a full repo with the credentials is a bit of a vulnerability.

To me, it seems like a good measure would be to mark deleted repos as "delete requested" then notify the users involved and give them a week or two to undo a total delete. Especially if it is an older repo with lots of commits.

Gitlab was NOT compromised. Someone found passwords/tokens for Gitlab repositories exposed on the internet and held them for ransom.

Purging user data is one of the most common action attackers take when compromising an account. This makes it prudent for storage service providers to silently delay mass deletions to the extent allowed by their data deletion policy/GDPR to allow time to discover any breaches, or perhaps require second factor verification like a link sent via email.

There was a Docker Hub breach a few days ago, that's probably related.

I took a good look at how my personal tokens were used in Github and Gitlab.

- Enable 2FA.

- Enable commit signing with GPG. For the past 2-3 years, I have slowly moved to signing commits and tags. GPG keys take a lot of hygiene to work with (subkeys, revocation, etc.), but they can definitely help in a situation like this.

Git is a distributed VCS. If you have a repo cloned in a secure location (your server, Dev machine, etc), that is just as good as your Gitlab/hub hosted copy.
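The signing setup from the list above boils down to a few config switches (a sketch — the key id is a placeholder, and the settings are applied to a temp repo here rather than with `--global`):

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# Sign all commits and tags by default with the given GPG key.
git -C "$repo" config user.signingkey 0xDEADBEEFDEADBEEF
git -C "$repo" config commit.gpgsign true
git -C "$repo" config tag.gpgSign true

# Later, verification is e.g.:
#   git verify-commit HEAD
#   git log --show-signature
```

Signed commits don't stop an attacker with a stolen password from deleting refs, but they do make it evident which history was written by the real key holder.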

The ‘play with docker’ site used to make it pretty easy to see what others were up to and snag git creds if they left them around.

The current title "Gitlab.com Was Compromised" doesn't seem accurate. There's someone (or a group) currently attacking online repositories (GitLab is not the only affected provider) using passwords found in scans for files like .gitconfig and the like. Unless new information comes to light about GitLab specifically being compromised, I'd say this is more about individual private repos being in the sights of a targeted attack.

So git doesn't let you add `.git` itself to the index. Most reports I've seen mention that SourceTree was used as a git client. Is it possible that SourceTree committed .git and pushed it to remotes, which were then scraped?


Looks like someone was scraping for `.git/config`

I don't get the ransom thing: users of a git repository have a clone of the repo that contains the whole history, no? So isn't it trivial to recreate the repository?

The attacker is also threatening to make these private repos public, or misuse their access to the repos in other ways (likely additional types of breaches).

I do not have local clones of my old projects.

For some repositories the code may be of little concern if the hacker shares it with the world or deletes it.... but it could impact some users.

No, replication is not backup.

Weird aside question: I notice the article says "at approximately 10:00pm GMT". Can someone explain why GMT might be chosen as a reference point here? Is there something I'm missing about the usage of GMT (and not UTC). It just seems particularly odd given that GMT is not (to my knowledge) actually being used as a concrete time-zone at the minute (BST is in effect for daylight savings).

I usually attribute it to mild ignorance, not in a bad way.

For a long time GMT was a good reference point. Times have changed.

I used to work with a gentleman who would always schedule meetings on the phone as:

> Great, let's put that on the schedule for 2:00 o'clock Eastern Standard Time.

There was always a bit of officiousness to his tone and I think he just liked the idea of being precise.

And he certainly was precise. He was also off by an hour for half the year. Somehow no one ever missed a meeting, though.

I always sat on the other side of the room and ground my teeth.

And an additional observation. Many applications that allow a user to pick their time zone typically show offsets from UTC and a time zone name.

It bugs me to no end when I have to select something like "-5:00 Eastern Time (US/Canada)" in those dialogs. I think a lot of people just don't care enough to truly understand time zones and there is enough flexibility in human communication to just absorb the endless ream of off-by-one-time-zone errors.

Isn't that more likely to be an artifact of some framework or library? I have a less than zero interest in creating or maintaining any list of timezones myself, I can tell you that. Besides, if I'm not mistaken, Rails, for instance, is using TZInfo underneath, which is an IANA timezone database. I have to imagine that any other self-respecting web framework is going to also provide things like this out of the box.

Sure. It probably is. I'm not sure what the point is.

It's still something that I see on a regular basis, and it seems clear that I care more about it than others, because in my experience I talk about it more than others.

But the frameworks are not using standard IANA time zone names. Those look like "America/New_York".

The most recent time zone selection I made was installing OpenBSD on a new laptop yesterday. That had me choose a proper time zone name.
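The difference matters because an IANA zone name carries the DST rules, while a fixed offset doesn't. With GNU `date` and the tz database installed (an assumption about your environment), the same zone name yields different offsets across the year:

```shell
# Same zone, different halves of the year (dates are illustrative):
TZ=America/New_York date -d '2019-01-15 12:00 UTC' +'%Z %z'   # EST -0500
TZ=America/New_York date -d '2019-07-15 12:00 UTC' +'%Z %z'   # EDT -0400
```

A UI label like "-5:00 Eastern Time (US/Canada)" bakes in the winter offset and is simply wrong half the year.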

As best I can read your post you're implying that I am impugning the character of developers of applications I use. I have already noted very clearly that I think I just notice/care about this more.

You've also appealed to a couple sources of authority (framework maintainers and IANA). If I wanted to impugn the characters of those developers, I think I'd have good standing, as your authorities agree with me on proper time zone names. I don't want to do this, though. I don't think it's a big deal, because, as I've already mentioned, human communication offers much affordance for this type of technical incorrectness. I'm not confused. I doubt others are confused. I'm not frustrated. It just tickles the pedantic annoyance lever in my brain.

Wikipedia article: https://en.m.wikipedia.org/wiki/Tz_database

Tzinfo (note default examples using strings like I mentioned above): https://github.com/tzinfo/tzinfo/blob/master/README.md#examp...

> I'm not sure what the point is.

Believe me, I'm having the same reaction right now.

> The most recent time zone selection I made was installing OpenBSD on a new laptop yesterday. That had me choose a proper time zone name.

If you don't understand the difference between you selecting "America/Los_Angeles" in an OpenBSD installation and the average user being confronted with a list of country/city names vs. a timezone name and offset then I feel sorry for your users.

Luckily, I get to avoid building time zone UIs in my day job.

If I were to put it in a UI, I'd try to have locally understandable time zones as the labels, without incorrect offsets. I'd also probably try to give a better than a drop down selection of multiple dozens of options. I might not succeed, in which case I'd fall back to some lowest common denominator based on a survey of popular services. In any event, I agree with you that it is not a high priority for me, and would not be in any app I might develop.

Few people appreciate the difference between timezones and their UTC offset at a given date. That's because it's very unattractive to learn about DST. Without that, timezone offsets are relatively stable and it wouldn't matter (in the short term) which one you'd work with.

I'm just glad I don't need to have the "DST" talk more often. It takes people a night of sleep to process the level of time fuckery that is DST. Code reviews get delayed for a day when this happens. So yeah, when somebody refers to EST when they mean EDT I wouldn't give them "the talk". It works because people know what is meant.

Just a few days ago I had the understandable reaction of "what do you mean this won't work in India?"

I steer into the skid. I've acquired several clients by giving talks on implementing time intelligence on several analytics platforms.

More often I deal with fiscal calendars, rather than DST issues. The thing it takes them some time to realize is that their attempts to use date functions built around the standard calendar lead to huge pain when dealing with their weird fiscal calendar.

I guess people who aren't overly pedantic say "GMT" to mean "UTC", just like everyone says "SSL" when it's actually "TLS". Older but better sounding names stick around.

In the UK, GMT is often used to refer to "the current British time" both GMT/BST. I've seen the same in the US where people say EST but mean EDT.

Whenever someone says GMT/PST/EST during the summer, I need to ask them to confirm whether they mean BST/PDT/EDT.

Whatever I assume, I might be off by an hour.

Is it? I mean, I'm British and I'm not aware of this.

I've seen a few people do it, but it's also usually corrected as being wrong.

- Sidebar

For those who might not be aware: it's possible to configure a Git remote to push to multiple URLs at once.
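A minimal sketch (remote name and URLs are placeholders): once extra push URLs are added, a single `git push` updates every listed mirror, so an off-site copy exists even if one host is wiped.

```shell
# The first --add replaces the implicit push URL, so re-add the
# original explicitly, then add the mirror. (URLs are hypothetical.)
git remote set-url --add --push origin git@gitlab.com:me/project.git
git remote set-url --add --push origin git@github.com:me/project.git

git remote -v   # origin now lists one fetch URL and two push URLs
```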

While the fault lies with the users for not following security best practices, including enabling 2FA, there are things GitLab (or any site) can do to help defend against these sorts of attacks. Some suggestions: treat logins from datacenters as suspicious (in this case the IP block identified belongs to World Hosting Farm Limited); treat logins from a new/different ISP as suspicious; limit access to the account and verify the login via email. It's not foolproof, but as part of a defense-in-depth strategy it can be quite effective.

Thank you for your feedback and suggestions. Unfortunately, for each of these proposals, we're likely to have users asking us why we are restricting and/or blocking access.

A better defense-in-depth strategy would be to scan each public repo for credentials, and act accordingly when credentials are discovered in repos. We are working on this strategy, currently.
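A hedged sketch of what such scanning might look like (the patterns below are illustrative only, not GitLab's actual rules): run file contents through regexes for well-known credential formats and flag anything that matches.

```python
import re

# Illustrative patterns only; real secret scanners ship many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)\b(password|secret|api_key|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}

def find_secrets(text):
    """Return (rule_name, matched_text) pairs for anything credential-like."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group(0)))
    return hits

# AWS's documented example key, which should trip the first rule.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nusername = "alice"\n'
print(find_secrets(sample))
```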

That doesn’t help stop the attacks using breach lists that are even more prevalent.

You could start with email warnings of suspicious activity and fine tune the model parameters based on feedback from false positives. But generally a login from a device that has no previous cookie, from an ASN the account has never used before, especially if that ASN is a known data center, that then immediately attempts a destructive action, should be a pretty big warning flag.
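A toy sketch of that kind of scoring; every signal name, weight, and ASN below is invented for illustration:

```python
# Toy login-risk score; all weights and the ASN set are hypothetical.
DATACENTER_ASNS = {208666}  # stand-in for a known hosting provider's ASN

def login_risk(has_known_cookie, asn, seen_asns, destructive_action):
    score = 0
    if not has_known_cookie:
        score += 1                  # device never seen before
    if asn not in seen_asns:
        score += 1                  # ISP/ASN is new for this account
        if asn in DATACENTER_ASNS:
            score += 2              # interactive logins rarely come from datacenters
    if destructive_action:
        score += 2                  # e.g. force-push or branch deletion right away
    return score

# New device, new datacenter ASN, immediate destructive action: high risk.
print(login_risk(False, 208666, {3320}, True))  # 6
```

Anything above some threshold would trigger an email confirmation rather than a hard block, which addresses the false-positive concern above.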

A bit of a misleading headline. Somebody had passwords/tokens for certain repos, either from previous breaches at other services or those passwords being stored in plaintext as part of a deployment.

Since the threat is to make the code public, there is nothing more gitlab can do to shut down the attempted blackmail. It seems unlikely to be a real threat to most?

Odd threat in that paying the ransom doesn't assure they wouldn't make it public anyway.

Also, someone else noted the ransom email domain has no MX or A records, so the instructions to email them won't work. They seem to be hoping someone will blindly pay the ransom.

"...to wipe their Git repositories and hold them for ransom."

What an idiotic strategy to take with git repositories. Every local copy is a complete and fully-functioning copy of not just the code, but all history, etc. It's a non-centralized protocol.
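Because every clone is complete, recovery is usually just pushing an intact local copy back up. A sketch, assuming the remote is named `origin`:

```shell
# From any up-to-date local clone, repopulate the wiped remote:
git push origin --all     # restore every local branch
git push origin --tags    # restore tags
# If the attacker left a bogus commit on top of a branch, a force
# push of that branch may be needed, e.g.:
#   git push --force origin master
```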

A U2F token, which GitLab supports, would have stopped this.

A client of mine was hit by this. Makes me think I ought to go into security. I brought up many concerns, specifically about git access (using very weak mechanisms), a few months ago, to which I was told to "clean things up when I can" but that it wasn't a priority.

This is why I'm hosting GitLab myself, even if I'm the sole user of the instance. For one thing, I'm less of a target; for another, this is not the first major problem with the hosted GitLab. I have better uptime on my instance than the hosted version.

Mandated 2FA should really be a thing, especially on tech-oriented sites with such importance.

Both GitLab and GitHub allow organizations to require members to use 2FA. So the option is there.

Someone I know had one of their private repos on GitHub replaced despite having 2FA enabled, so it may have come from a personal access token leaked somewhere. What's odd is that this user has push access to multiple active private repos, yet only one was hit with the ransom.

I agree, but read that at least one of the users had 2FA enabled and still lost their repos. They said they received no notification emails either.

Out of all things to hold for ransom, git repos seem like a bad choice. Most of the time there are multiple clones lying around anyway. I agree that having the source code leak can be bad news, but the code itself being secret should not be a critical part of the business.

Probably not, but I suspect some people will still pay the ransom anyway. Keep in mind, many people also keep "secrets" used for authenticating to other services in their git repos, even though it's a terrible idea.

These secrets are now compromised anyway.

So you’re telling me that if, say, Google’s Search algorithm got leaked it wouldn’t seriously hinder their business? There’s definitely cases where leaking a business’ code can be pretty disastrous.

> So you’re telling me that if, say, Google’s Search algorithm got leaked

Google's search algorithm was public for a long time [0]. It has since been improved and is now kept more behind the curtain, but the answer is no.

Google wins not only by technology, but also by the size of its index.

[0] https://en.wikipedia.org/wiki/PageRank

I believe Google has also bolted enough ML to search that they couldn't really recreate current search with just code.

Also the code base is probably so big and esoteric that there is no way to replicate the services without knowledge external to the repo.

You can get bits and pieces, and it would be disastrous for Google, but you would need 1,000 engineers to reverse engineer the source code into some working search engine or whatever service you wanted to replicate.

I've been thinking about what would happen if competitors got hold of my employer's code base. I think it would require years of reverse engineering to get anything useful out of it for their product, and they could just as well put that effort into their product directly without espionage.

Kinda hoping "testing weak passwords against your existing users' passwords" becomes standard practice at some point.
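One common way to do this without storing plaintext is the k-anonymity scheme used by the Pwned Passwords API: hash the password with SHA-1, send only the first five hex characters, and compare the returned suffixes locally. A sketch of the client-side half (the leaked-suffix set below is a stand-in for a real API response):

```python
import hashlib

def sha1_prefix_suffix(password):
    """Split the SHA-1 hex digest into the 5-char prefix sent to the
    server and the suffix compared locally, so the full hash never
    leaves the client."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

# Stand-in for the suffix list the range API would return for this prefix.
leaked_suffixes = {"1E4C9B93F3F0682250B6CF8331B7EE68FD8"}  # SHA-1 of "password"

prefix, suffix = sha1_prefix_suffix("password")
print(prefix, suffix in leaked_suffixes)  # 5BAA6 True
```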

GitHub does that...it warns you with a banner.

The title on HN is clickbait, the article mentions Gitlab users storing their own Gitlab password/tokens insecurely. It doesn't look like "Gitlab was compromised" to me.

The original title is "Critical security announcement: Suspicious git activity detected".

Correct, users across GitLab and GitHub have been affected, and in all cases valid credentials were used. Also see https://www.bleepingcomputer.com/news/security/attackers-wip...

We are updating our title to better reflect what happened.

Sorry, but the title still doesn't reflect the actual issue, which per your link is: "Attackers Wiping GitHub and GitLab Repos, Leave Ransom Notes"

>The breaches seem to rely on the attacker having knowledge of the affected users passwords in order to wipe their Git repositories and hold them for ransom.

Yeah, until I go to my computer and use "git push" again. No?

Also gitsbackup.com is registered but has no A/MX records so...

"Also gitsbackup.com is registered but has no A/MX records so..."

Gitlab should really note that in their blog posts and emails to users. Just in case someone is thinking of paying the ransom.

We agree that paying a ransom doesn't guarantee any further actions on the part of the attackers. But in our blog post we want to stick to what we know and can influence and not talk about an external DNS record that can be added at any time.

> We believe that no data has been lost, unless the owner/maintainer of the repository did not have a local copy and the GitLab copy was the only one.

Too bad they don't make backups of users' repositories?

They do in fact create snapshots twice a day and keep them for two weeks.


GitLab is not a repository backup service. They are only required to hold multiple copies of what is the current version of the repository for their own hard drive fault tolerance purposes; they are not required to hold copies of the repository from a day, week, month, or year ago.

Was just a question, don't get me wrong here. Thought they would do this for their managed service.

You can see how GitLab handles repository backups on the following page https://about.gitlab.com/handbook/engineering/infrastructure...
