
Don't publicly expose .git (2015) - g4k
https://en.internetwache.org/dont-publicly-expose-git-or-how-we-downloaded-your-websites-sourcecode-an-analysis-of-alexas-1m-28-07-2015/
======
Hamcha
I don't like the advice he gives on just denying access to .git. I think the
idea of cloning the repo in the htdocs folder is just wrong.

A much better approach (or at least, what I use) would be to set up the repo
somewhere private with --bare and set a post-receive hook that checks out HEAD
into the htdocs folder. This way htdocs only has the content, and you get the
extra feature that you can sneak extra commands into the checkout (such as
building/minifying) without changing the original source.
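
A rough sketch of such a hook in Python (the same thing is often a two-line shell script). It assumes the bare repository lives outside the web root, that this file is saved as `hooks/post-receive` and made executable, and that `/var/www/htdocs` and the `master` branch are placeholders for your own docroot and deploy branch. Git feeds the hook one `<old-sha> <new-sha> <refname>` line per updated ref on stdin.

```python
#!/usr/bin/env python3
"""Sketch of a post-receive hook that deploys pushed commits to a docroot."""
import os
import subprocess
import sys
from typing import Iterable, List

# Assumed values for illustration only; adjust to your own layout.
DOCROOT = "/var/www/htdocs"   # hypothetical web root, never contains .git
BRANCH = "master"             # hypothetical branch that gets deployed


def should_deploy(lines: Iterable[str], branch: str = BRANCH) -> bool:
    """True if any "<old-sha> <new-sha> <refname>" line updates `branch`.

    A new-sha made only of zeros means the ref was deleted; that is skipped.
    """
    for line in lines:
        parts = line.split()
        if (len(parts) == 3 and parts[2] == "refs/heads/" + branch
                and set(parts[1]) != {"0"}):
            return True
    return False


def checkout_command(git_dir: str, work_tree: str, branch: str) -> List[str]:
    """Build `git --git-dir=... --work-tree=... checkout -f <branch>`."""
    return ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}",
            "checkout", "-f", branch]


def main() -> None:
    # Git exports GIT_DIR to hooks so that git commands can find the repo.
    git_dir = os.environ["GIT_DIR"]
    if should_deploy(sys.stdin):
        subprocess.run(checkout_command(git_dir, DOCROOT, BRANCH), check=True)
        # Extra build/minify steps could run against DOCROOT here.


# Only act when invoked by git as a hook, so the module stays importable.
if __name__ == "__main__" and "GIT_DIR" in os.environ:
    main()
```

Because the checkout target is a plain directory, htdocs ends up with the files of the pushed commit and nothing else from the repository.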

~~~
glandium
It's actually kind of mentioned at the end of the article:

> Another approach is to use git’s --git-dir and --work-tree switches to
> move the git repository out of the document root.

Yet another option is to make the htdocs directory a worktree of the git
repository, which doesn't require passing flags around or setting environment
variables. Technically, this still leaves a .git, but it's only a file
containing the location of the real .git directory.

https://git-scm.com/docs/git-worktree
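
With a linked worktree (and likewise with `--separate-git-dir`), the `.git` entry in the checkout is a one-line text file of the form `gitdir: <path>`. A sketch, assuming only that format, of how a tool could resolve it; if such a file is served by accident it discloses the repository's location, not its contents.

```python
from pathlib import Path
from typing import Optional


def resolve_git_dir(work_tree: Path) -> Optional[Path]:
    """Return the repository directory for work_tree, or None if there is no .git.

    `.git` is either a directory (ordinary clone) or a file containing
    "gitdir: <path>" (linked worktrees, --separate-git-dir).
    """
    dot_git = work_tree / ".git"
    if dot_git.is_dir():
        return dot_git
    if dot_git.is_file():
        lines = dot_git.read_text().splitlines()
        first_line = lines[0].strip() if lines else ""
        prefix = "gitdir:"
        if not first_line.startswith(prefix):
            raise ValueError(f"unexpected .git file contents: {first_line!r}")
        target = Path(first_line[len(prefix):].strip())
        # A relative gitdir is taken relative to the directory holding the file.
        return target if target.is_absolute() else work_tree / target
    return None
```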

~~~
paraknight
Or have a `www` directory and symlink it.

------
kleinsch
I feel like whenever possible, the answer is to stop storing sensitive
information in source control. That solves a whole class of problems,
including this one.

If your history has sensitive info, see about rewriting the history. If that's
not possible, maybe fork the repo, remove the sensitive info, and get the team
to switch to the fork. If that's not possible either, make the sensitive info
meaningless (reset your DB password, revoke the API tokens, etc.).

~~~
asattarmd
Source code is the sensitive information that he's talking about in the
article.

~~~
WorldMaker
If you've got a static site hostable in htdocs directly, then that "sensitive
source code" is also accessible in browser dev tools and View Source menus.

------
gonyea
No, you need to delete the .git folder from your server entirely. Ideally,
delete it before you deploy. In fact, don't even put a GitHub deploy key on
the server. Deploy binaries.

And don't just stop with .git: Delete any folder/file that's not required to
operate the app in production.
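
A minimal sketch of a staging step in that spirit, assuming a local `build/` directory that a separate upload step (sftp, rsync, or a package build) ships from. The exclusion list is illustrative, not exhaustive; the point is that the upload source is a fresh copy that never contained `.git`.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical exclusion list: VCS metadata and local secrets that should
# never reach a public server. Extend it for your own project.
EXCLUDE = shutil.ignore_patterns(".git", ".hg", ".svn", ".gitignore", ".env", "*.pyc")


def stage_release(build_dir: Path, staging_dir: Path) -> List[str]:
    """Copy build_dir into a fresh staging_dir, skipping EXCLUDE patterns.

    Returns the sorted relative paths that were staged, so a deploy script
    can log exactly what it is about to upload.
    """
    if staging_dir.exists():
        shutil.rmtree(staging_dir)
    shutil.copytree(build_dir, staging_dir, ignore=EXCLUDE)
    return sorted(p.relative_to(staging_dir).as_posix()
                  for p in staging_dir.rglob("*") if p.is_file())
```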

~~~
evincarofautumn
Yup. I used to host .git on a server but it ended up being more complication
than it was really worth. Now I just have a “deploy” script that uses sftp to
push all the build artifacts (and nothing else) to the server.

~~~
Silhouette
_Now I just have a “deploy” script that uses sftp to push all the build
artifacts (and nothing else) to the server._

We do something similar. "Clean and minimal" is a good strategy when you're
deploying assets to publicly accessible systems.

------
concede_pluto
> When deploying a web application, some administrators simply clone the
> repository.

Step one: stop randomly smearing crap around. Prod should only have files that
came from a .deb or .rpm signed by the legit build process, because that's how
you know your system is reproducible and has everything it should and nothing
else.

~~~
marcosdumay
And then you moved the problem from keeping your files in sync on the web
server to keeping your files in sync on the package source.

~~~
eropple
That's a significantly more straightforward and safer problem to have to
handle. You should prefer it across-the-board.

~~~
marcosdumay
That's exactly the same problem, with exactly the same failure modes and
consequences on failure.

It's also solved the exact same ways, by scripting your stuff on one level or
another.

~~~
eropple
Solving the problem in one place (your CI/CD server) is a significantly less
complex task than solving it in N places (every server on which your
application is running). It removes the concerns around configuration drift
(have all your machines been properly brought up to policy?) and enables
easier reasoning about the whole thing.

------
andersonmvd
".git" is only one of many checks performed by Nikto (an open source security
scanner, https://cirt.net/Nikto2), but there are other checks and many other
scanners.

shameless plug: I've developed a service that you run to check against
vulnerabilities in your apps/servers and it has a free plan
(https://my.gauntlet.io/registration.html) in case you're interested
(https://gauntlet.io).
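
For illustration, a rough Python version of this kind of check. It is not how Nikto or any particular scanner implements it; it simply requests `/.git/HEAD` and checks whether the body looks like a real HEAD file, because many sites answer unknown paths with a `200` error page.

```python
import re
import urllib.request

# A HEAD file is either a symbolic ref ("ref: refs/heads/master") or, when
# detached, a bare 40-hex SHA-1 object id.
_HEAD_RE = re.compile(r"ref: refs/\S+|[0-9a-f]{40}")


def looks_like_git_head(body: str) -> bool:
    """Heuristic: does an HTTP response body look like a .git/HEAD file?"""
    return _HEAD_RE.fullmatch(body.strip()) is not None


def git_head_exposed(base_url: str, timeout: float = 10.0) -> bool:
    """Fetch <base_url>/.git/HEAD and report whether it looks like a real HEAD."""
    url = base_url.rstrip("/") + "/.git/HEAD"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(4096).decode("utf-8", errors="replace")
    except (OSError, ValueError):  # HTTP errors, network errors, bad URLs
        return False
    return looks_like_git_head(body)
```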

~~~
616c
Is this just hosted nikto2? It sounds very cool. Do you distribute against
different cloud systems? I know you're a pro from the site address but I'd be
worried about quick blacklisting; I'm sure you're effective.

~~~
andersonmvd
Not only nikto2, but other scanners
(https://gauntlet.io/en/product/supported-scanners/) as well. It comes with
open source scanners installed, but integrates with commercial ones too.
Currently each scan triggers a new virtual machine, and thus a new IP address,
and all applications need to be verified before a scan can run (e.g., like
Google Analytics, via a meta tag, file upload, or DNS record).

------
libeclipse
The author fails to acknowledge a scenario where you wouldn't care, or where
you'd even actively want your source code to be public.

For example, static websites for open source projects, et al.

~~~
codezero
I think they did...

> It seemed like an accessible git repository was intended on some websites -
> mostly open source projects where the website’s sourcecode is available
> online.

~~~
ubernostrum
Every so often, the Django security address gets an email from someone who
wants to claim bug-bounty money because "Dear Django team, I have discovered
source-code disclosure vulnerability in your web site..."

------
mercora
It is possible to separate the work tree from the git repository files with
the "--separate-git-dir" flag. .git is then a file whose contents point to
the directory where the repository files reside. Every other command works as
usual without specifying the directory, so the flag is only needed for clone
or init.

------
mioelnir
If I remember my Apache config right, the two examples are switched. The 2.4
config should be 'Require all denied' and the 'Order deny,allow' the old 2.2
syntax.

~~~
gehaxelt
Hi,

I just rechecked with [1] and you seem to be right.

I'll update the blogpost in a second. Thanks for the hint!

[1] https://httpd.apache.org/docs/current/upgrading.html

------
cyphar
An interesting thing to note is that .pack files give you some safety against
this sort of disclosure. Bare git objects are very easy to access even with
indexes disabled because their name is their hash (and so if you have access
to the index or the current HEAD you can recreate the history). Pack files
contain multiple objects, but their name is computed from a hash of the packed
objects. This makes it quite difficult to figure out the path to the pack file
(you have to brute force the entire history and how it was packed in order to
get a single .pack file's worth of data).
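
To make the "name is the hash" point concrete: for a blob, git hashes the header `blob <size>`, a NUL byte, and the file contents, and a loose object is stored under the first two hex digits of that id. A sketch for SHA-1 repositories (the default):

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object id git assigns to a file's contents."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where a loose object lives: first two hex digits form the directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"
```

A downloader starts from `.git/HEAD`, follows the ref to a commit id, fetches that object, and repeats with every tree, blob and parent id it finds inside. Packed objects do not live at these predictable paths.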

Not that you should have .git exposed on your public webserver anyway. I do
remember participating in a CTF that had a problem like this a few years ago,
it's possible that it was the same one the author mentioned.

~~~
duskwuff
Not safe, just slightly more obscure. The .pack files are listed in
.git/objects/info/packs.
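
A sketch of how a downloader might use that file, assuming the dumb-HTTP layout where each line naming a pack reads `P pack-<id>.pack` and each pack has a matching `.idx`. The file is written by `git update-server-info`, so it can be stale or missing in repositories nobody prepared for dumb-HTTP serving.

```python
from typing import List


def pack_urls(info_packs: str, base_url: str) -> List[str]:
    """Turn .git/objects/info/packs contents into .pack/.idx URLs to fetch."""
    base = base_url.rstrip("/") + "/.git/objects/pack/"
    urls = []
    for line in info_packs.splitlines():
        line = line.strip()
        if not line.startswith("P "):
            continue  # ignore blank lines and anything that isn't a pack entry
        name = line[2:].strip()
        if name.endswith(".pack"):
            urls.append(base + name)
            urls.append(base + name[:-len(".pack")] + ".idx")
    return urls
```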

~~~
jwilk
According to
https://github.com/kost/dvcs-ripper/issues/6#issuecomment-117952742,
.git/objects/info/packs is not reliable.

------
auscompgeek
> A tool to discover, one to download and one to extract git repositories.

Hasn't dvcs-ripper [1] been around for longer? It supports other VCSes as
well.

Also, the article fails to mention that a simple `git clone` would usually
work as well, although that tends to be blocked in similar CTF challenges.

[1] https://github.com/kost/dvcs-ripper

~~~
gehaxelt
Hi, one of the authors here.

I knew about dvcs-ripper, but thought that implementing another variant might
be fun and let me learn about git internals.

Does a simple `git clone` really work? I just tested it and it failed:

    $> git clone http://x.domain.tld/
    fatal: repository 'http://x.domain.tld/' not found

    $> git clone http://x.domain.tld/.git/
    fatal: repository 'http://x.domain.tld/.git/' not found

And yes, the post's background is a CTF challenge that blocked a simple `git
clone`.

~~~
bitwave
I remember that it was the 9447 CTF in 2014. The challenges were bashful and
tumorous. See:
https://github.com/ctfs/write-ups-2014/tree/master/9447-ctf-2014/bashful

------
fergie
The real solution to this problem is to reference passwords, tokens, keys, and
other "private" strings through environment variables or external config
files, which are then excluded from the source control system. That way your
super-secret stuff can never be extracted from git or svn. This approach also
brings a whole class of additional benefits related to running the same
system in different places (for example, setting up dev -> test -> prod
staging servers).
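
A minimal sketch of the environment-variable approach. The variable name `DB_PASSWORD` is hypothetical; the optional `env` argument exists only so the function can be exercised without touching the real process environment.

```python
import os
from typing import Mapping, Optional


class MissingSetting(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Read a secret from the environment instead of from a tracked file."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise MissingSetting(f"set the {name} environment variable")
    return value


# Usage (hypothetical): password = get_secret("DB_PASSWORD")
```

The same code then runs unchanged in dev, test and prod; only the environment each process is started with differs.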

------
gehaxelt
Hello HN,

here's one of the blogpost's authors. Although it has been a while since we
published the blogpost, I'll try to answer any questions or listen to any
suggestions.

~~~
dmitrij
https://news.ycombinator.com/item?id=14534499

------
jldugger
> Bad people can use tools to download/restore the repository to gain access
> to your website’s sourcecode.

So if I post my website's sourcecode on github, I'm equally vulnerable? I
could see problems if said checkout contained a credential cache, but that
doesn't seem to be mentioned.

~~~
_wmd
git clone https://username:password@github.com/...
will end up in the git reflog, so yes, it's a problem, and there's no guarantee
that a future Git feature you've never heard of won't make the problem worse.

~~~
falsedan
reflog is local though

~~~
_wmd
http://your.broken.site/.git/logs/HEAD

~~~
falsedan
I don't understand. Do you mean, the remote also has a reflog? It sure does,
it's local to the repository!

But entries recorded in my reflog are not pushed to or fetched from any remote
I interact with. Instead, their reflog is updated when I push (to record that
my commits were accepted & their branches/tags changed), and my reflog is
updated when I fetch (to record that their commits were accepted & my
branches/tags changed).

~~~
_wmd
And what happens when Ansible checks out the Git repository for you? Or some
equivalent shell script or deployment system, or some helpful developer hand-
deploying something with 'git clone' on the server?

Or what about the helpful developer who checked in some secrets that are
visible in the repository history but not in the current checkout?

Or that stupid PHP thing where 'config.php' and its MySQL passwords are
world-readable, relying on the web server interpreting it as a PHP script due
to its file extension to prevent secret leaks. That no longer holds when a
copy of the script is available as
".git/objects/00/cf74f2066b0c72a4c4b2a24ef116f1fd23df42".

But of course, even if these weren't problems, the original point still
stands: there is no guarantee .git doesn't contain secret data (such as
username:password) either now, or into the future, so exposing it is a bad
idea.

https://en.wikipedia.org/wiki/Principle_of_least_privilege

~~~
falsedan
That's too much commentary for me to extract from a single fake link to an
NXDOMAIN.

> there is no guarantee .git doesn't contain secret data (such as
> username:password) either now, or into the future, so exposing it is a bad
> idea.

The same can be said of the HTML and images, so I don't find it a useful
heuristic. Note that I was disputing your claim that a username+password used
to fetch a repo over http would leak into the remote's reflog.

~~~
_wmd
I'm not "making a claim" or inventing a heuristic, you can test this
trivially:

    
    
        $ git clone https://....:x-oauth-basic@github.com/dw/csvmonkey.git
        Cloning into 'csvmonkey'...
        remote: Counting objects: 340, done.
        remote: Compressing objects: 100% (27/27), done.
        remote: Total 340 (delta 19), reused 27 (delta 10), pack-reused 303
        Receiving objects: 100% (340/340), 138.93 KiB | 0 bytes/s, done.
        Resolving deltas: 100% (212/212), done.
    
        $ cat csvmonkey/.git/logs/HEAD
        0000000000000000000000000000000000000000 c9d566bf167dcf3556008df58be37c4a27ff5062 David Wilson <dw@botanicus.net> 1497289486 +0100	clone: from https://....:x-oauth-basic@github.com/dw/csvmonkey.git
    

If you perform a Git checkout on a web server e.g. as part of an Ansible
script, and you embedded secrets in the repo URL (common enough, believe me),
then that secret is readable per above.
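
A small sketch for auditing a checkout for exactly this: it scans the text of a reflog such as `.git/logs/HEAD` for `scheme://user:password@` URLs and can redact them. It is a regex heuristic, not a URL parser, and it also reports harmless usernames such as `ssh://git@...`.

```python
import re
from typing import List

# Matches the userinfo part of a URL: scheme://user[:password]@host
_USERINFO_RE = re.compile(
    r"(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*://)(?P<userinfo>[^/@\s]+)@")


def find_leaked_userinfo(reflog_text: str) -> List[str]:
    """Return any user[:password] strings embedded in URLs in a reflog."""
    return [m.group("userinfo") for m in _USERINFO_RE.finditer(reflog_text)]


def redact(reflog_text: str) -> str:
    """Replace embedded credentials with '***' so the log can be shared."""
    return _USERINFO_RE.sub(lambda m: m.group("scheme") + "***@", reflog_text)
```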

FWIW this isn't some unbelievable theory or hypothetical scenario, I've seen
plenty of Ansible setups like this and found domains with this exact problem
in the process of writing
http://pythonsweetness.tumblr.com/post/52587443706/devs-please-stop-serving-git-to-the-outside
a few years back.

~~~
falsedan
Hi,

You might not remember me. I'm the poster you're responding to. How have you
been? Me, I'm all right.

I was just thinking of when we first spoke… it seems like so long ago! I
remember it as clearly as yesterday: you had made a partially coherent
argument that the auth creds for a git URL could leak into a remote
deployment's reflog! Oh, how we laughed, and our amusement doubled in size as
you fancied an implausible situation where the read-only deployment credentials
could be recovered from the very same repo they allowed access to!

It was much later when we crossed paths again, but your talent for sharing
inventive tales had not waned in the slightest. For this next performance, you
regaled us with the simple truth that no person can be certain that their
commit history will not reveal their darkest secrets, and thus should strictly
eschew sharing it in a public place; but that the contents of their index was
above suspicion, and could be shouted to the world without a moment's thought!
Many of us stumbled to determine what byzantine process made the working
directory automatically scrub itself of secrets, before finally the jape
dawned on them.

I eagerly anticipate our next encounter; what fresh new hilarity will you
share with us?

I hope my restatement of my understanding of your position helps make my
position clear,

--falsedan

~~~
_wmd
I guess we're both having a bad day. Let's break down the original statement:

> git clone https://username:password@github.com/...

This command produces a new git repository by cloning the supplied URL.

> will end up in git reflog

The newly generated repository's reflog will contain the credentials passed on
the command line.

> so yes, it's a problem

Assuming the newly generated repository also happens to be a static HTTP
server root, which is the subject of the thread in which you've been posting.

~~~
falsedan
Thanks, that makes it clear to me.

------
jeisc
Would it not be secure enough to put .git in the directory above the public
root, e.g.:

    /mysite/.git
    /mysite/mysiteroot/index.html
    /mysite/.gitignore
    /mysite/lib/common
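
Assuming the web server's document root is `/mysite/mysiteroot` and no alias or symlink maps back up to `/mysite`, the `.git` directory in that layout is not reachable over HTTP. A sketch of a sanity check a deploy script could run, using the hypothetical paths from the question:

```python
from pathlib import Path


def is_within(path: Path, root: Path) -> bool:
    """True if path resolves to root itself or somewhere beneath it."""
    path, root = path.resolve(), root.resolve()  # follows symlinks
    return path == root or root in path.parents


def git_dir_is_servable(repo_root: Path, docroot: Path) -> bool:
    """Would the web server's document root contain the repository's .git?"""
    return is_within(repo_root / ".git", docroot)
```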

------
wooptoo
For nginx:

    # deny access to HG and Git repositories
    location ~ /\.(hg|git)/ {
        deny all;
    }
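
To see exactly what that location block covers, here is the same regex exercised in Python. For a pattern this simple, Python's `re` and nginx's PCRE behave the same; the sketch assumes already-normalized URIs, since nginx matches locations against the decoded, normalized URI.

```python
import re

# The regex from the `location ~ /\.(hg|git)/` block; `~` is a
# case-sensitive regex match.
NGINX_DENY = re.compile(r"/\.(hg|git)/")


def denied(uri: str) -> bool:
    """Would the location block match (and so deny) this URI?"""
    return NGINX_DENY.search(uri) is not None
```

Note that `.gitignore`, `.svn/` and other dotfiles are not covered by this rule.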

~~~
kchr
Doesn't solve the issue of the data being available on a public server,
though. Any piece of code run by the web server would most likely have access
to those directories, as well as any other static content in the web root...

------
eliq
Isn't this a non-issue (no need to change any config to block .git) with a
properly configured firewall and nginx proxy-passing to localhost, when the
code does not live in a publicly visible location? E.g.
https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-16-04

~~~
bluetooth
Are you asking if this is a non-issue if you've... addressed the issue?

~~~
eliq
You could have worded it a little differently: if a folder is not accessible
in the root directory of the web server, there is no need to modify the web
server config to deny access to .git.

These types of snarky responses discourage newcomers from participating in
discussions. I have seen this happen to many people, so please dial back the
snark.

~~~
lucb1e
I see where you're coming from. From what I understand you're suggesting the
same thing as Hamcha, who currently has the top post: make the web root a
subfolder in version control, so the version control folder is above the web
root. However, when I read it, it sounded like "if you have some uncommon
setup with proxying to localhost [and then filtering out requests to .git?]"
which indeed sounds like addressing the issue. Your second comment clarifies
what you mean.

------
tutufan
Rationale for "make install" rediscovered. Film at 11... :-)

------
Kenji
If someone being able to download your source code repository opens you up
to attacks, you're doing something wrong. Either you are relying
on security through obscurity, or you checked keys into git. Both horrible
practices.

~~~
falsedan
> you checked keys into git. Both horrible practices.

Hey! That's not very kind to disparage everyone using a text file in a git
repo to manage their passwords/keys.

Putting keys into a git repo is fine! But be careful when publishing that
repo, as something you thought was private could suddenly become public.

~~~
zandor
It still is a pretty bad practice though.

~~~
falsedan
I would do it, if I was certain I could avoid accidentally publishing the
repo. I'd never describe it as 'bad', as that's extremist & it's easy for
someone who does this to misinterpret you as saying, "you are bad for doing
this and not following best practices".

------
partycoder
If the .git folder is exposed, you can download it, then do "git checkout" in
that folder and get the full working copy.
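
Roughly what the "extract" step of the download tools does for each fetched file: a loose object is a zlib-compressed `<type> <size>` header, a NUL byte, and the raw content. A minimal Python sketch that decodes one:

```python
import zlib
from typing import Tuple


def read_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decode a loose object file downloaded from .git/objects/xx/yyyy...

    Returns (type, content), e.g. ("blob", <file contents>).
    """
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content
```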

~~~
Spivak
Just like anyone can go to GitHub, GitLab, Bitbucket, etc. and get a full
working copy.

If your code is public then what does it matter? If it's not then you should
be protecting it like any other sensitive information.

~~~
partycoder
You make the assumption that code is public and in a git hosted service.
Neither of those are necessarily the case.

You can unintentionally expose your repository by deploying .git by mistake.

