I don't like the advice he gives on just denying access to .git. I think the idea of cloning the repo in the htdocs folder is just wrong.
A much better approach (or at least, what I use) is to set up the repo somewhere private with --bare and add a receive hook that checks out HEAD into the htdocs folder. That way htdocs contains only the content, and you get the extra feature that you can sneak extra commands into the checkout step (such as building/minifying) without changing the original source.
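A minimal sketch of that setup, assuming hypothetical paths /srv/site.git for the bare repo and /var/www/htdocs for the web root:

```shell
# One-time setup: a bare repository outside the web root.
git init --bare /srv/site.git

# Install a post-receive hook that checks out HEAD into the web root
# on every push; no .git directory ever lands in htdocs.
cat > /srv/site.git/hooks/post-receive <<'EOF'
#!/bin/sh
GIT_WORK_TREE=/var/www/htdocs git checkout -f
# extra steps (building, minifying, etc.) could run here
EOF
chmod +x /srv/site.git/hooks/post-receive
```

Developers then push to /srv/site.git (e.g. over ssh), and the hook refreshes the deployed tree.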
It's actually kind of mentioned at the end of the article:
> Another approach is to use git’s --git-dir and --work-tree switches to move the git repository out of the document root.
Yet another option is to make the htdocs directory a worktree of the git repository, which doesn't require passing flags around or setting environment variables. Technically this still leaves a .git entry, but it's only a small file containing the path to the real .git directory.
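A rough sketch of the worktree variant, assuming the real repository lives at a hypothetical /srv/site.git:

```shell
# Create a worktree checkout inside the web root (with no branch given,
# a new branch named after the directory is created from HEAD).
git -C /srv/site.git worktree add /var/www/htdocs

# The .git entry in htdocs is a plain file, not a directory; it just
# points back to the real repository, e.g.:
#   gitdir: /srv/site.git/worktrees/htdocs
cat /var/www/htdocs/.git
```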
This sounds like an awesome approach. Do you have any resources you could share on a few of the details? Using --bare I get, but I've not yet played with hooks and such.
Git hooks are just arbitrary scripts. They need to exist in a specific location, and accept certain arguments depending on when in the lifecycle they are meant to execute. But they can be written to do basically anything. From linting and syntax checking, to complex deploy behaviors.
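As a sketch, a pre-commit hook is nothing more than an executable script at .git/hooks/pre-commit whose exit status decides whether the commit proceeds (the secret-matching pattern here is purely illustrative):

```shell
#!/bin/sh
# Save as .git/hooks/pre-commit and mark executable (chmod +x).
# Illustrative only: abort the commit if staged changes look like
# hard-coded secrets. A non-zero exit status blocks the commit.
if git diff --cached | grep -Ei '(password|secret_key)[[:space:]]*=' >/dev/null; then
    echo "Refusing to commit: possible hard-coded secret detected." >&2
    exit 1
fi
exit 0
```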
I feel like whenever possible, the answer is to stop storing sensitive information in source control. That solves a whole class of problems, including this one.
If your history has sensitive info, see about rewriting the history. If that's not possible, maybe fork the repo, remove the sensitive info, and get the team to switch to the fork. If that's not possible either, make the sensitive info meaningless (reset your DB password, revoke the API tokens, etc.).
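One hedged way to do the rewrite, using git filter-branch (git's own docs now point to the faster third-party git-filter-repo tool for this; `secrets.env` is a hypothetical filename):

```shell
# Strip a hypothetical secrets.env from every commit on every branch.
git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch secrets.env' \
    --prune-empty -- --all
```

Everyone with a clone then has to re-clone (or hard-reset), and the secrets should still be rotated anyway, since the old history may survive elsewhere.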
If you've got a static site hostable in htdocs directly, then that "sensitive source code" is also accessible in browser dev tools and View Source menus.
You're correct that secrets shouldn't be stored in source control, but it's a problem from management's perspective:
a) how do you search a repository's history, along all branches (including undeployed development branches, which may nonetheless contain production secrets) to find secrets mistakenly stored in version control?
b) what happens when somebody stores a secret which isn't easily rotated (because legacy systems have hard-coded the secrets etc.)? How do you deal with trying to rotate secrets which aren't well-managed, because the real problem isn't that your teams are storing secrets in source control but that you don't have proper secret management set up across your organization?
The simpler (and more correct!) way to deal with this is to stop using version control for deployments and start using proper package management and deployment tooling. Version control was not designed as a deployment tool and is a poor replacement for one; teams that treat every screw as a nail because all they have is a hammer need to learn that sometimes they'll have to go out and get a set of screwdrivers too.
This is really true only if you subscribe to the perimeter model of security rather than defense-in-depth, though. If your systems are (properly, to my mind) constructed, knowing what your infrastructure looks like doesn't provide significant value. That obscurity becomes a nice-to-have, rather than an essential, aspect of your security.
(When building systems for clients, this is something I stress. "We should be operating as if it is assumed that an attacker has a VPN into your network space and has nmapped all your stuff.")
Counterpoint: Knowledge of the infrastructure can be used in social engineering attacks, e.g. to increase the likelihood of success for password spearphishing.
If people with access to your infrastructure can be spearphished for passwords, I would venture to say that you're probably doing other stuff wrong that needs to be fixed first.
No, you need to delete the .git folder from your server entirely. Ideally, delete it before you deploy. In fact, don't even put a github deploy key on the server. Deploy binaries.
And don't just stop with .git: Delete any folder/file that's not required to operate the app in production.
Yup. I used to host .git on a server but it ended up being more complication than it was really worth. Now I just have a “deploy” script that uses sftp to push all the build artifacts (and nothing else) to the server.
> When deploying a web application, some administrators simply clone the repository.
Step one: stop randomly smearing crap around. Prod should only have files that came from a .deb or .rpm signed by the legit build process, because that's how you know your system is reproducible and has everything it should and nothing else.
Yes, this. Cloning a git repo is not a reproducible build and deploy strategy, especially once you scale past one developer. Building a package (RPM/DEB/Docker container/whatever) once, and promoting it from dev through to test and prod, is. You are guaranteeing that the same code you wrote and tested is what is finally being deployed to production.
If the deploy process is "git pull", you're praying that Joe McGee didn't push some untested crap to the relevant branch 5 seconds before you deployed.
I'll argue with "that's how you know your system is reproducible and has everything it should and nothing else", though. To get there, you need to look at immutable infrastructure, where you're building a new container or VM image for every deploy. Otherwise you might have libfoo installed on the app server, despite your app dropping support for foo 2 years ago.
This is yet another problem. Joe McGee shouldn't be able to push to your master branch anyway.
You can use git for deployment, but indeed you must have a clear commit policy to do that (and it's a good idea anyway to push only major, curated versions to master).
Woah, that's not very kind! Some of these sites are run on a shoestring by sysadmins who are also juggling other issues, and it's hard for them to justify a brand-new deployment pipeline when the existing 'fput ~/code/the_web_site ftp://hosting/our_great_web_site' deployment costs nothing to maintain and works fine (sales/conversions are coming in).
Solving the problem in one place (your CI/CD server) is a significantly less complex task than solving it in N places (every server on which your application is running). It removes the concerns around configuration drift (have all your machines been properly brought up to policy?) and enables easier reasoning about the whole thing.
I personally like to mount gitRepo volumes on Kubernetes, and have my CI pipeline automatically update the revision of the deployment whenever it validates tests. Then I have kubernetes roll out the update automatically.
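For reference, a gitRepo volume looks roughly like this (the repository URL and paths are hypothetical; note that gitRepo volumes have since been deprecated upstream in favor of init containers):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: site
spec:
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: site-src
      mountPath: /usr/share/nginx/html
  volumes:
  - name: site-src
    gitRepo:                      # deprecated upstream; shown for illustration
      repository: "https://git.example.com/site.git"
      revision: "main"            # typically pinned to a commit hash
      directory: "."
```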
".git" is only one of many checks performed by Nikto (an open source security scanner - https://cirt.net/Nikto2), but there are other checks and many other scanners.
Is this just hosted nikto2? It sounds very cool. Do you distribute scans across different cloud systems? I can tell from the site address that you're a pro, but I'd be worried about getting blacklisted quickly; I'm sure you're effective.
Not only nikto2, but other scanners (https://gauntlet.io/en/product/supported-scanners/) as well. It comes with the open source scanners installed, but integrates with commercial ones too. Currently each scan triggers a new virtual machine (and thus a new IP address), and all applications need to be verified before a scan can execute (e.g., via a Google Analytics metatag, file upload, or DNS record).
> It seemed like an accessible git repository was intended on some websites - mostly open source projects where the website’s sourcecode is available online.
Every so often, the Django security address gets an email from someone who wants to claim bug-bounty money because "Dear Django team, I have discovered source-code disclosure vulnerability in your web site..."
Even then though, they should be aware of the meta data that's stored in the repo and make sure it's appropriately sanitized.
> On the other side, we had to hold our breath when we noticed that more than 100 projects used HTTP-Authentication for server-client communication. That means, that the protocol://user:password@host/repository combination is saved in the .git/config file, giving attackers access to the users (companies) GitLab-instance or GitHub/BitBucket account. With a bit of luck an attacker gets access to the CI-Server and then runs malicious code to further compromise your infrastructure.
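The leak is easy to demonstrate locally; the credential in the remote URL below is a made-up example:

```shell
git init -q demo && cd demo
git remote add origin https://deploy:s3cret@git.example.com/site.git

# The credential sits in plaintext in .git/config:
git config remote.origin.url
# → https://deploy:s3cret@git.example.com/site.git
```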
It is possible to separate the work tree from the git repository files with the "--separate-git-dir" flag. .git is then a file whose contents point to the directory where the repository files reside. Any other command works as usual without specifying the directory, so it is just needed for clone or init.
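A quick sketch with hypothetical paths (/srv/site.git for the repository files, /var/www/htdocs for the work tree):

```shell
git init --separate-git-dir=/srv/site.git /var/www/htdocs

# .git in the work tree is now a one-line file, not a directory:
cat /var/www/htdocs/.git
# → gitdir: /srv/site.git
```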
If I remember my Apache config right, the two examples are switched. The 2.4 config should be 'Require all denied', and 'Order deny,allow' is the old 2.2 syntax.
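For reference, the corrected pairing would look like this:

```apache
# Apache 2.4:
<DirectoryMatch "/\.git">
    Require all denied
</DirectoryMatch>

# Apache 2.2 (old syntax):
<DirectoryMatch "/\.git">
    Order deny,allow
    Deny from all
</DirectoryMatch>
```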
An interesting thing to note is that .pack files give you some safety against this sort of disclosure. Loose git objects are very easy to access even with indexes disabled, because their name is their hash (so if you have access to the index or the current HEAD you can recreate the history). Pack files contain multiple objects, but their name is computed from a hash of the packed objects. This makes it quite difficult to figure out the path to the pack file (you have to brute-force the entire history and how it was packed in order to get a single .pack file's worth of data).
Not that you should have .git exposed on your public webserver anyway. I do remember participating in a CTF that had a problem like this a few years ago, it's possible that it was the same one the author mentioned.
Ah, that makes sense. I probably should've thought about it a bit more, because if there wasn't a mapping from object -> pack then git wouldn't be able to quickly look up objects in packfiles.
The real solution to this problem is to reference passwords, tokens, keys, and other "private" strings via environment variables or external config files, which are then excluded from the source control system. That way your super-secret stuff can never be extracted from git or svn. This approach also has a whole class of additional benefits relating to being able to run a system in different places (for example, setting up dev->test->prod staging servers).
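A small sketch of that pattern (all filenames here are hypothetical):

```shell
# Keep the real config out of version control...
echo "config.env" >> .gitignore

# ...and commit a template with placeholders instead.
cat > config.env.example <<'EOF'
DB_PASSWORD=changeme
API_TOKEN=changeme
EOF

# At runtime the app reads the real values from the environment, e.g.:
#   . /etc/myapp/config.env && exec ./app
```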
One of the blogpost's authors here. Although it has been a while since we published the blogpost, I'll try to answer any questions or listen to any suggestions.
> Bad people can use tools to download/restore the repository to gain access to your website’s sourcecode.
So if I post my website's sourcecode on github, I'm equally vulnerable? I could see problems if said checkout contained a credential cache, but that doesn't seem to be mentioned.
If you post your source code to github you know it will be public, and might remember to remove passwords from debug code etc. But if you expect it to be private while it actually isn't, you might be lazy and store passwords in plaintext in the code. A horrible thought for a programmer, yes, but human beings are lazy.
git clone https://username:password@github.com/... will end up in the git reflog, so yes, it's a problem, and there's no guarantee that some future feature of Git that you've never heard of won't make the problem worse
I don't understand. Do you mean, the remote also has a reflog? It sure does, it's local to the repository!
But entries recorded in my reflog are not pushed to or fetched from any remote I interact with. Instead, their reflog is updated when I push (to record that my commits were accepted & their branches/tags changed), and my reflog is updated when I fetch (to record that their commits were accepted & my branches/tags changed).
And what happens when Ansible checks out the Git repository for you? Or some equivalent shell script or deployment system, or some helpful developer hand-deploying something with 'git clone' on the server?
Or what about a helpful developer who checked in some secrets which are visible in the repository history but not the current checkout?
Or that stupid PHP thing where 'config.php' and its MySQL passwords are world-readable, but rely on the web server interpreting the file as a PHP script due to its extension to prevent secret leaks.. not so valid when a copy of the script is available as ".git/objects/00/cf74f2066b0c72a4c4b2a24ef116f1fd23df42".
But of course, even if these weren't problems, the original point still stands: there is no guarantee .git doesn't contain secret data (such as username:password) either now, or into the future, so exposing it is a bad idea.
That's too much commentary for me to extract from a single fake link to an NXDOMAIN.
> there is no guarantee .git doesn't contain secret data (such as username:password) either now, or into the future, so exposing it is a bad idea.
The same can be said of the HTML and images, so I don't find it a useful heuristic. Note that I was disputing your claim that a username+password used to fetch a repo over http would leak into the remote's reflog.
If you perform a Git checkout on a web server e.g. as part of an Ansible script, and you embedded secrets in the repo URL (common enough, believe me), then that secret is readable per above.
FWIW this isn't some unbelievable theory or hypothetical scenario, I've seen plenty of Ansible setups like this and found domains with this exact problem in the process of writing http://pythonsweetness.tumblr.com/post/52587443706/devs-plea... a few years back
You might not remember me. I'm the poster you're responding to. How have you been? Me, I'm all right.
I was just thinking of when we first spoke… it seems like so long ago! I remember it as clearly as yesterday: you had made a partially-coherent argument that the auth creds for a git URL could leak into a remote deployment's reflog! Oh, how we laughed, and our amusement doubled in size as you fancied an implausible situation where the read-only deployment credentials could be recovered from the very same repo they allowed access to!
It was much later when we crossed paths again, but your talent for sharing inventive tales had not waned in the slightest. For this next performance, you regaled us with the simple truth that no person can be certain that their commit history will not reveal their darkest secrets, and thus should strictly eschew sharing it in a public place; but that the contents of their index was above suspicion, and could be shouted to the world without a moment's thought! Many of us stumbled to determine what byzantine process made the working directory automatically scrub itself of secrets, before finally the jape dawned on them.
I eagerly anticipate our next encounter; what fresh new hilarity will you share with us?
I hope my restatement of my understanding of your position helps make my position clear,
Doesn't solve the issue of the data being available on a public server, though. Any piece of code run by the web server would most likely have access to those directories, as well as any other static content in the web root...
Isn't this a non issue (don't need to change any config to block .git) with a properly configured firewall and nginx proxy passing to localhost when the code does not live in a publicly visible location? Eg- https://www.digitalocean.com/community/tutorials/how-to-set-...
You could have worded it a little differently: if a folder is not accessible in the root directory of the web server, there is no need to modify the web server config to deny access to .git.
These type of snarky responses discourage newcomers to participate in discussions. I have seen this happen to many people, so please dial back the snark.
I see where you're coming from. From what I understand you're suggesting the same thing as Hamcha, who currently has the top post: make the web root a subfolder in version control, so the version control folder is above the web root. However, when I read it, it sounded like "if you have some uncommon setup with proxying to localhost [and then filtering out requests to .git?]" which indeed sounds like addressing the issue. Your second comment clarifies what you mean.
It's an issue, just not really specific to git, and it has been around for a long time. The issue is having source files, or any sensitive info, under the web root where it could get exposed by an incorrectly configured web server. This is why modern setups keep the source code somewhere else and use some sort of application server behind the web server, or a similar arrangement.
If someone being able to download your source code repository is opening yourself up to attacks, you're doing something wrong. Either you are relying on security through obscurity, or you checked keys into git. Both horrible practices.
I would do it, if I was certain I could avoid accidentally publishing the repo. I'd never describe it as 'bad', as that's extremist & it's easy for someone who does this to misinterpret you as saying, "you are bad for doing this and not following best practices".