
One in every 600 websites has .git exposed - jamiejin
http://www.jamiembrown.com/blog/one-in-every-600-websites-has-git-exposed/
======
phantom_oracle
Imagine you implement every possible type of security...

Keeping your entire server-stack up-to-date, making sure you have SSL, using
strong encryption for logging-in, hashing the passwords, making sure your
server can only be reached via SSH, adding firewalls, filters, etc. etc.

Then some hacker in Eastern Europe comes along (or some beginner at the
NSA/GCHQ) and finds out that your .git is exposed and somehow gains all vital
user-data and admin data.

Being bashed with a boulder repeatedly would probably be less painful than the
torture of knowing "I did it all, but they got me with an HTTP request...
because nobody thought of double-checking what our VCS is doing".

How many other glaringly obvious mistakes might be out there right now? I can
only imagine.

~~~
Dylan16807
Wrong lesson.

Don't put secret keys in your repository.

Someone getting a copy of your code should be a big annoyance at worst.

~~~
dangero
Where is the right place to store db passwords, api keys, etc? What is best
practice in this area?

~~~
cddotdotslash
I use AWS for a number of applications, so I've started doing the following:

1. Create a JSON file containing encrypted secrets (DB pass, etc.)

2. Upload the file to a secure S3 bucket with fine-tuned permissions and
server-side encryption

3. Grant the launching instance an IAM role that allows "s3:GetObject" on the
specific JSON file you uploaded.

4. Deliver decryption keys to the app in some manner (Chef, Ansible, etc.)

5. When the app starts, it downloads the JSON file and loads the environment
variables.
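Step 5 boils down to a few lines. Here's a rough Python sketch (the bucket and
key names, JSON shape, and `decrypt()` helper are made up for illustration;
the real fetch would use boto3's `get_object` and the key from step 4):

```python
import json
import os

def load_secrets_into_env(secrets_json: str) -> None:
    """Load a flat JSON object of decrypted secrets into environment variables."""
    for key, value in json.loads(secrets_json).items():
        os.environ[key] = str(value)

# In production the JSON would come from S3, e.g. (hypothetical names):
#   body = boto3.client("s3").get_object(
#       Bucket="my-secrets-bucket", Key="prod/secrets.json")["Body"].read()
#   load_secrets_into_env(decrypt(body))  # decrypt() per step 4
load_secrets_into_env('{"DB_PASSWORD": "s3cret", "API_KEY": "abc123"}')
```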

I wrote a blog post[1] about this with more detailed info and an NPM module if
you use Node.js.

[1] [http://blog.matthewdfuller.com/2015/01/using-iam-roles-
and-s...](http://blog.matthewdfuller.com/2015/01/using-iam-roles-and-s3-to-
securely-load.html)

~~~
err4nt
This system is truly beautiful; I think it's one of the best suggestions in
the thread (definitely the best self-rolled solution not relying on other tools).

I have a question about redundancy: what happens if your gatekeeper EC2
instance goes down? If you have multiple gatekeepers, could they be set up
this way:

- let's say you have five different web apps using a gatekeeper to hold their
secrets

- let's say you have n gatekeepers (say 3), and each of the apps knows the
address of all three gatekeepers.

- If the primary gatekeeper is unreachable, all five apps would try to
contact the secondary gatekeeper, but that gatekeeper would only (ever)
respond if it _also_ found the primary gatekeeper unreachable.

It's like a sleeper cell - at any given moment you have multiple replacement
gatekeepers ready and waiting to serve, but each of them is unable to respond
unless the one above it in the list stops responding. In this way you could
lose gatekeepers (even permanently) and build a little bit of resilience into
the apps depending on it while you're able to sort out what happened and
restore normal behaviour.

Is this a good idea?

~~~
cddotdotslash
I'm confused by your reference to "gatekeeper EC2 instances." In the scenario
I described, the secrets are housed on S3, not a separate EC2 instance. So,
theoretically, as long as the underlying instance running the application code
can access S3, and S3 doesn't go down (very unlikely), there shouldn't be any
issues.

------
markwakeford
For Apache,

    
    
        <Directorymatch "^/.*/\.git+/">
          Order deny,allow
          Deny from all
        </Directorymatch>
        <Files ~ "^\.git">
          Order allow,deny
          Deny from all
        </Files>
    

[https://serverfault.com/questions/128069/how-do-i-prevent-
ap...](https://serverfault.com/questions/128069/how-do-i-prevent-apache-from-
serving-the-git-directory)

~~~
TheDong
Probably better to link to the stackoverflow you quite possibly copied this
from so that people can see discussion / alternatives, etc:
[https://serverfault.com/questions/128069/how-do-i-prevent-
ap...](https://serverfault.com/questions/128069/how-do-i-prevent-apache-from-
serving-the-git-directory)

See also the nginx question: [https://stackoverflow.com/questions/2999353/how-
do-you-hide-...](https://stackoverflow.com/questions/2999353/how-do-you-hide-
git-project-directories)

Note, if you actually did take it from the stackoverflow, you just infringed
on someone's copyright; SO's user content is 'creative commons, attribution
required'.

Edit: Thanks for adding the attribution, all clear with copyright now :)

~~~
userbinator
_Note, if you actually did take it from the stackoverflow, you just infringed
on someone's copyright_

Does a simple access rule, which I can't see there being many sane ways to
express, meet the minimum level of creativity/originality to be eligible for
copyright...?

~~~
morninj
Probably not, but it depends on the court and the skill of the lawyers.

------
Nate75Sanders
Obviously you shouldn't be storing sensitive information in your codebase (I
hope everybody knows that), but the problem here is that you might have stored
it there _way back when you were prototyping_ and only later moved it out of
the codebase. It's really common to start a codebase just by hacking something
together with hardcoded secrets.

If you have proper secret segregation now, but you're deploying by doing a
git pull, you run the risk of those old secrets being exposed all over again,
because they're still in the history.
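A quick and crude way to check whether any old secrets are still buried in
the history is to grep the full patch log (the pattern list here is only
illustrative; dedicated scanners do this far more thoroughly):

```shell
# Run from inside the checkout: search the full patch history of every branch
# for likely credential patterns.
git log -p --all | grep -iE 'password|secret|api[_-]?key' || echo "no obvious secrets found"
```

A match means the secret is still reachable by anyone who can read the repo,
even if the current HEAD is clean.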

~~~
agumonkey
Growing a project out of one-shot prototyping is really problematic. Every
time, I wish I had started with a real project structure and design
philosophy.

~~~
nostrademons
And then every time I actually start a project with a real project structure
and design philosophy, it goes nowhere and I wish I hadn't wasted the time. Or
best case, it's used by a few people internal to whatever company is currently
employing me, and security doesn't really matter.

The tech industry is shaped like a funnel, with lots of raw, bad ideas at the
top and a few smash mega-hits at the bottom. 99% of the ideas at the top are
bad; investing more time than is necessary to prove them out is a mistake.
100% of the ideas that make it to the bottom wish that they'd spent more time
designing things at the top. But y'know, if they'd actually done that, they
wouldn't have made it to the bottom, they'd be outcompeted by the guy who got
a quick and dirty prototype up, made his users happy first, and then closed
the gaping security holes (hopefully!) before anyone noticed.

~~~
agumonkey
I can't deny that nature likes quick n dirty, but I wish I could just find a
balance between reckless and too slow.

~~~
nostrademons
The balance is whatever works, gets people using the product, and ideally
keeps them happy.

The balance is generally far more on the quick 'n dirty side than most
engineers (myself included) would prefer, but we could look at this as a
cognitive bias of engineers rather than a failing of nature.

------
hoodoof
Is there an automated "security as a service" offering that, had I subscribed
to it, would have told me that this is a problem on my websites?

It really annoys me randomly hearing about critical security issues through
tech news websites - there should be a more systematic way for "non-security
professionals" to ensure their sites are protected to best practice levels.

~~~
rasapetter
There's a service called Detectify that might fit the bill.
[https://detectify.com](https://detectify.com)

------
317070
It seems Google doesn't like people looking into the extent of this problem
[1].

When googling for "inurl:.git", it returns no results. And on top of that, I
need to enter a captcha first?

[1]
[https://www.google.be/search?q=inurl%3A%22.git%22](https://www.google.be/search?q=inurl%3A%22.git%22)

~~~
username
The .git directory wouldn't be crawled though, correct?

~~~
Dylan16807
Most of the time it wouldn't be crawled.

But zero results is blatantly wrong.

------
akerl_
It seems like if you're storing secrets and the like in your code's repo, the
solution is to not do that, rather than just putting a bandaid over it by
hiding the repo.

Deploy the secrets separately: they don't belong in your site's codebase.

~~~
bbcbasic
Hiding the repo is hardly a bandaid. It should never be exposed even if the
repo is perfectly secret-free.

Except in the rare cases where it is intentional, e.g. an open source repo
where you happen to want people to download it from the same domain rather
than from github or git.domain.com.

~~~
akerl_
Can you elaborate on why it shouldn't be exposed?

Presumably, for most commercial entities, the parts of the site that are
valuable are the assets, which are served from the site as part of it doing
the thing it's meant for. For a large percentage, they're running a CMS like
Wordpress or Drupal or whatever, where the codebase is public anyways. And for
even more, we're talking about a directory of hand-crafted HTML files, where
the version control is the HTML of the site plus some "damn, I always forget
to close my tags" commit messages.

~~~
bbcbasic
> Can you elaborate on why it shouldn't be exposed?

Because there is no need to expose it. If you expose it then you need to vet
not only its head but its entire history, and for what purpose?

> For a large percentage, they're running a CMS like Wordpress or Drupal or
> whatever

I doubt someone running Wordpress would be using version control anyway. Most
of the time it is WP + some standard plugins and no custom coding.

> And for even more, we're talking about a directory of hand-crafted HTML
> files

And anything else you happened to have in your directory. Your passwords
file? Even if you commit a deletion, it's still in the history!

------
nodesocket
In nginx, best to just not serve dot files:

    
    
        location ~ /\. {
            deny all;
            access_log off;
            log_not_found off;
        }

~~~
therealmarv
This returns 403, and in my opinion logging should not be turned off for
these requests. I would return 404 instead, so as not to reveal that you are
blocking dotfiles with your server. My suggestion, to put in each server
{ ... } block:

    
    
        location ~ /\.  { deny all; return 404; }

------
userbinator
More precisely, it's "one in every 600 websites _examined_"

Git is popular, but I find it hard to believe that 1/600 of all websites on
the Internet use it.

~~~
pessimizer
You find it hard to believe that 0.17% of all websites use git? I'm sure 10x
that many do; most just don't misuse git to deploy rather than using it
solely as a source code manager.

~~~
crindy
Most websites are made through Wordpress, Squarespace, Wix, and similar
products. I bet the number here is far lower than 1/600.

------
Zarel
Related question: Is there any risk to exposing .git if your Git repository is
already publicly available (e.g. on GitHub)?

~~~
viraptor
If it's exactly the same repository - no. If it contains some extra branches
with local changes, or potentially commits with private information /
passwords - definitely.

So in general - it's better not to have it in the first place, because it's
unlikely that the person doing the commits knows the whole deployment
strategy.

------
iiiggglll
Not going to name names, but there are mobile apps that have their .git
directories packaged up with them too.

------
jvehent
90% of security incidents are due to human error, not to some secretive
hacker group spending $10m to crack TLS. Doing system administration right
(e.g. no secrets in repos) has a lot more impact on security than
implementing all the other complex controls.

Of course, doing everything is much better.

------
aaronbrethorst
I wonder what would happen if you searched for .svn, too. I'm sure you'd run
into the same problem in many places. But would it be more or less likely to
occur?

~~~
viraptor
I think less likely. Svn actually had an `export` command, which allowed you
to do a checkout of a specific commit with no svn metadata. If someone was
actually using svn for deployment, they likely knew about it.
([http://svnbook.red-
bean.com/en/1.7/svn.ref.svn.c.export.html](http://svnbook.red-
bean.com/en/1.7/svn.ref.svn.c.export.html))

~~~
matt_kantor
git has `archive`, which is essentially the same thing:
[http://git-scm.com/docs/git-archive](http://git-scm.com/docs/git-archive)

~~~
viraptor
No, archive is very different. `svn export` could be used at the target side.
Basically you can export from remote repository to the chosen directory
without any extra operations, so .svn is never created.

`git archive` requires you to have a clone of the repo from which you can
create an archive. That means people are more likely to just do a local
checkout than play with archive on top of it.

Deploy with svn export:

    
    
        svn export url.of.repo destination/path
    

Deploy with git archive:

    
    
        git clone url.of.repo
        cd repo_name
        git archive --format=tar some_commit_or_branch | (cd destination/path && tar -xf -)

~~~
nraynaud
I completely agree. Every time I write a new deployment script for git, I
miss the old svn export command: I hate the idea of cloning on the deployment
machine, having to sync in the script, etc. By contrast, 'svn export' feels
stateless and is clean by construction.

------
quicksilver03
Some of the other commenters suggest adding git-dir and work-tree to the git
commands, but there's a better solution: use the --separate-git-dir option
when cloning the repository.

For example:

    
    
        git clone --separate-git-dir=<repo dir> <remote url> <working copy>
    

where <repo dir> is outside of any directory served by the web server and
<working copy> is the htdocs root.

This option makes <working copy>/.git a file whose content is:

    
    
        gitdir: <repo dir>
    

The advantage is that all git commands work as usual, without the need to set
git-dir and work-tree, and that there's nothing special to add to the web
server configuration.

~~~
raverbashing
I disagree

It may be possible that gitdir is still accessible through a misconfiguration
or security issue (and you're telling attackers exactly where to look).

Production servers have no business having the .git directory anywhere.

~~~
quicksilver03
> Production servers have no business having the .git directory anywhere.

I agree with you in principle, but in practice this is not always possible;
there are situations where having a git checkout in production is better than
nothing.

I've seen WordPress sites where a semi-technical administrator updates plugins
and themes directly in production: with that git checkout I would at least be
able to track the changes and pull them in a dev or staging environment.

This could be the first step to a saner deployment workflow for those sites,
where production gets changes that have been tested and validated elsewhere.

------
chdir
A comment mentions this deep below, but I think this deserves a bit more
attention:

If you're using a modern framework with URL routing, you don't need to worry
about hiding .git or .hg in your webserver config file.

------
georgerobinson
I once jumped-in on a PHP project where the previous developers had written:

    
    
        $page = $_GET['page'];
        include ($page.".php");
    

Whilst allow_url_include
([http://php.net/manual/en/filesystem.configuration.php#ini.al...](http://php.net/manual/en/filesystem.configuration.php#ini.allow-
url-include)) was set to false, I could still craft a URL like:

[http://example.com/?page=admin/index](http://example.com/?page=admin/index)

which expanded to
[http://example.com/index.php?page=admin/index](http://example.com/index.php?page=admin/index)
where the real admin was at
[http://example.com/admin/index.php](http://example.com/admin/index.php) and
offered complete access to the backend without authentication or authorization
- let alone other files in the file system.

In another project, I found that the server had register globals turned on,
and therefore could craft a URL like:

[http://example.com/admin?valid_user=1](http://example.com/admin?valid_user=1),
where valid_user was a PHP variable set to true iff their session cookie could
be authenticated in the database.

I think it's terrifying that these things still make it through to production
websites.

------
hoodoof
Someone on StackOverflow says this will tell nginx not to serve hidden files.

    
    
        location ~ /\. { return 403; }

My question - do I need to put this once at the top of my configuration file
and all is good, or does it need to go into multiple places in the nginx
config?

It would be great if there was a simple, universal way to say to nginx "don't
serve hidden files from anywhere under any circumstances".

~~~
therealmarv
It has to go in every server { ... } section. Also use "deny all;" to really
block. See my other answer here in the comments.
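Put together, a minimal sketch of what goes in each server block
(`server_name` and `root` here are placeholders):

```nginx
# Sketch only: server_name and root are placeholders.
server {
    listen 80;
    server_name example.com;
    root /var/www/example;

    # Deny every dotfile path; returning 404 avoids advertising the block.
    location ~ /\. {
        deny all;
        return 404;
    }
}
```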

------
kelyjames
Looks like this returns some results: inurl:.git intitle:index.of

------
blindhippo
Why are people serving web traffic from a folder with a .git folder in it
anyways? I thought it was basic deployment practice to export your code OUT
of the VCS before deploying... every shop I've worked at had this in place.

Other solutions just seem hackish to me, but every project is different I
suppose.

~~~
nmrm2
So that deployment is "just" git pull.

I don't get it either.

------
sarciszewski
I have a fake /.git on my personal website to troll would-be hackers into
wasting their time. (PROTIP: I don't run Laravel there.)

So far a few people have requested my .git/ directory but none have attempted
to plunder the riches they think they'll find within.

------
foobarbecue
What I find more interesting is that github is full of passwords and
credentials.

------
stcredzero
This is one example of why hierarchical directories are bad. They're not all
bad, but with all their power and flexibility, they carry some inherent flaws.
It's much the same as JSON, which is also versatile and highly useful.
However, both of these abstractions tend to lead to the ad-hoc creation of
more complexity and more details to remember, while having no clearly
delineated way to be self-describing.

(Is JSON an abstraction? Not really, as it's a concrete spec, but its general
kind of serialization format is an (incomplete) abstraction.)

------
garethsprice
It always felt unclean to have the source repo on production, so we use
deployhq.com (similar: circleci.com, Jenkins) to push changes up when code
changes are made, rather than pull them from git. We also use a clean-up
script that removes any SASS, Grunt, etc source files when deploying to
production.

------
therealmarv
How to check all your nginx access logs (including compressed ones) on Ubuntu
to see whether anybody has accessed your .git directory (example with root
access):

    
    
        apt-get install zutils
        zgrep -r "\.git" /var/log/nginx/

------
lugh
How can a project be accessed/downloaded with just the .git folder?

~~~
brandonwamboldt
Even with directory listings disabled, you could request
[http://example.com/.git/refs/heads/master](http://example.com/.git/refs/heads/master)
to get the sha of master, then request
http://example.com/.git/objects/&lt;first 2 chars of sha&gt;/&lt;rest of sha&gt;
to download the data for that one object, and from there you can get the rest.

Wouldn't be terribly difficult to write a script to crawl it. Git's format is
well known and documented.
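A minimal sketch of that crawl in Python (the URL is a placeholder, and a
real crawler would also need to zlib-decompress each loose object and handle
packfiles):

```python
import urllib.request

def loose_object_path(sha: str) -> str:
    """Map a 40-char git object id to its loose-object path under .git/objects/."""
    return ".git/objects/{}/{}".format(sha[:2], sha[2:])

def fetch(base_url: str, path: str) -> bytes:
    """Fetch one file from an exposed .git directory."""
    with urllib.request.urlopen(base_url + path) as resp:
        return resp.read()

# Usage sketch (not run here; example.com is a placeholder):
#   sha = fetch("http://example.com/", ".git/refs/heads/master").decode().strip()
#   commit = zlib.decompress(fetch("http://example.com/", loose_object_path(sha)))
#   ...parse the commit for its tree and parent ids, then repeat.
```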

------
aikah
It's clear the problem involves some PHP sites developed with git where,
instead of using a specific www directory inside the project, the server
points to the root folder of the project, thus exposing .git (and the rest).
Classic dumb error by PHP developers. I have a hard time believing one would
be able to expose the .git folder in a Rails, Spring or Django application,
since the public folder isn't the root folder of the project.

I wish servers were configured so they don't serve ^\..+$ files by default. I
wish servers would behave as securely as possible, and then it would be up to
the developer to whitelist features rather than the other way around.

~~~
realityking
Excluding anything that starts with a period also doesn't work - RFC 5785
specifies the folder .well-known with special meaning.

~~~
samuellb
True, but you can whitelist /.well-known/. I don't think anything else uses
dot-filenames in URLs, because not all operating systems and software even
allow such file names (for instance, the file browser in Windows forbids it
when creating a new file or folder).
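In nginx that whitelist can be sketched like so (a `^~` prefix location takes
precedence over regex locations, so /.well-known/ stays reachable while
every other dotfile path is denied):

```nginx
# Serve the RFC 5785 well-known directory...
location ^~ /.well-known/ {
    allow all;
}
# ...but refuse everything else that starts with a dot.
location ~ /\. {
    deny all;
    return 404;
}
```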

------
francisb07
There are configs within nginx that prevent this from happening. Also, I
symlink only the public files from a different directory into my
/var/www/html folder.

------
nmalaguti
The author doesn't give any suggestions for alternative ways to deploy.

What are the best practices here? What should operators that currently deploy
this way do instead?

~~~
orik
599 out of every 600 sites don't have the problem of a public facing .git
folder. I don't think alternative deployments need to be suggested as they are
commonplace.

It depends on what sort of platform and server you're running on but just do
what you have to do so you're not serving your .git.

~~~
ludamad
Doesn't change anything, but it's worth noting that these are not necessarily
599 git-using websites being compared.

------
jasonkester
I see a lot of discussion here about best practices to avoid this, but
nothing about the obvious question: why is the go-to version control system
for developers everywhere designed such that it puts a single directory at
the root that magically gives anybody who can see it the ability to download
all the source code in the repository?

That is:

    
    
      - completely non-obvious and unexpected
      - a terrible idea
    

Why, instead of trying to figure out how to avoid handing this magic file out
to everybody, are we not trying to fix it so that no such magic file need
exist?

~~~
grey-area
Where else would you suggest Git keeps the version control information for a
repository, if not at the root of that repository? SVN tried it in every
subdirectory, which is worse, using a separate db server would add unwelcome
external dependencies. It could be in a non-hidden folder, but that's annoying
for those who just want to see their files under VC, not an implementation
detail of the VC method. Also, there are hundreds of dot files scattered
around your computer, and none of those should go anywhere near your host -
.git is just one example.

The error here is uploading sensitive or hidden information to a web host into
a public directory, not how it is stored locally.

If you use the root of your app, including source code and _hidden files_, as
the public directory of your website, one permissions error means all sorts
of things might be exposed: other dotfiles, and potentially all of your
source code too, because you're relying on the web server to hide them in
every instance. That's the problem that needs fixing here (exposing the wrong
files to the public root), not that one particular hidden folder exists.

------
zimbatm
Imagine you find a typo or bug on some site that you use daily and it really
annoys you. How awesome would it be to just `git clone
http://www.microsoft.com/`, fix the issue and then either host your own
version of the site or submit a pull request. Seems like it's already
possible on 600 websites :) (well, they probably didn't run `git
update-server-info`).

------
bhaai
For all those using github pages (gh-pages) to host their website: don't
worry, they've got you covered :D

------
mailslot
I discovered almost the exact same problem in a very large production site
once, except it was the .svn directory.

------
mkhpalm
I'm not surprised in the slightest given the current trend to build everything
in production.

------
vvpan
Wait, do people store repo passwords in .git? Otherwise a simple remote
address means nothing.

~~~
rckclmbr
I think it's more they have access to your codebase, which could then be
analyzed to look for exploits.

------
emmelaich
git (and hg and svn and tar and cpio and pax and zip and ...) should set the
permission of the archive (vcs db) to the most restrictive of the permissions
of those added.

Also ... webservers should make it very hard to offer up .dotfiles as web
content.

------
NewsReader42
Not rocket science, block it with "deny" ...

------
herf
A Tor exit node was scanning my site at 6:20pm PDT.

~~~
samuellb
I found what seems to be a botnet scanning for /.gitattributes in my logs.
The first scan was on 2015-06-02.

------
marcosdumay
I'll go on a tangent here and blame PHP.

The model of exporting a Unix directory as the structure of a website is
barely good enough for static sites (you'll get into all kinds of problems
with URL management), and is completely unsuited for applications.

Now, of course, PHP was created as a tool to add a visitor counter at the
bottom of your pages. With a bit of caution, it's indeed secure enough for
that. Nowadays people create huge applications using the same security model,
and PHP developers don't even think about changing it.

