One in every 600 websites has .git exposed (jamiembrown.com)
424 points by jamiejin on July 26, 2015 | 205 comments



Imagine you implement every type of possible security...

Keeping your entire server-stack up-to-date, making sure you have SSL, using strong encryption for logging-in, hashing the passwords, making sure your server can only be reached via SSH, adding firewalls, filters, etc. etc.

Then some hacker in Eastern Europe comes along (or some beginner at the NSA/GCHQ), finds out that your .git is exposed, and somehow gains access to all your vital user data and admin data.

Being bashed with a boulder repeatedly would probably be less painful than the torture of knowing "I did it all, but they got me with an HTTP request... because nobody thought of double-checking what our VCS is doing".

How many other glaringly obvious mistakes might be out there right now? I can only imagine.


You seem to imply this is a novel attack vector. But it's really just an instance of a very old mistake:

Don't use the root of your app as document root!

It's really as simple as that. Almost all modern apps have a subdirectory "public/" or similar. That one is meant to be used as document root. You only have to ensure there are no sensitive files in there.

If you fail to introduce such a directory, you'll have a game of cat-and-mouse, where you have to add extra webserver rules for each sensitive file: VCS, crypto secrets, private keys, and so on. In that setup it's easy and very likely to forget one. Of course, this then creates the feeling of "How can anybody keep track of this never ending list of security details?"

In the end, this is a blacklist vs. whitelist thing. Like with your firewall, you want one rule that blocks everything and allows only specific stuff. The alternative is to allow for everything, have rules to deny all sensitive stuff, and finally get in trouble for having forgotten one rule (e.g. probably because an additional service was introduced after the firewall rules have been written.)


Great point. How much do you want to bet most of these are PHP, where it takes special discipline not to make your top directory web-accessible?


It takes very little effort in php. You simply have your index.php and any public assets in the document root and then use index.php as a bootstrap to bring up your application. Everything else goes outside of the document root.


Yet, almost all well-known PHP applications don't have a public directory. And almost all non-PHP web applications do have one (e.g. practically all Python/Django and Ruby/Rails projects).

So the connection to PHP remains apparent to this day, although it may have more to do with crappy super-cheap hosting providers than with the language itself.


Very simple setup. You have:

    .git
    web/index.php
    src/*.php
Make the document root /full/path/to/web/, with index.php as the front controller.

Edit: Also use a deployment tool like Capistrano which removes the .git directory as well.


Yes, it is easily avoided. But it is still violated by lots of PHP projects, including well-known projects with large userbases:

    * Wordpress
    * Tiki
    * ... and so on.
I don't think this is by accident: this technique is too old and well-known to be ignored by large projects. Rather, they consciously don't do it, to cause less hassle for the occasional admin: those admins can simply dump the app into some directory and don't have to set up a proper docroot or anything. The extra work falls on those who want a more secure setup.

This is clearly a usability versus security issue, resolved in the unfortunate, usual way.


I don't have insight into the decision-making at WordPress, but they probably do this because they want to be compatible with as many web hosts as possible, and so many super-cheap shared hosts just give you a public directory that you're supposed to dump everything in.

It's possible to set it up properly on most of these hosts, but it's much more difficult, and if you ever have issues the support team says you're using a "non-supported configuration." At least WordPress lets you move your config file one level above the webroot and will find it automatically.


Drupal, MediaWiki, Laravel, Slim; it's a pretty long list, and it includes projects that target Serious Developers.


Also, this is really PHP specific.

Look at any larger Python (Django/Flask/...) or Ruby (Rails/...) project, and you will always find a proper "public" directory, although it may be named differently.

Not sure about Perl though ...


Does it? If you're on a shared host with just the one directory, maybe. But if you are configuring your own server, you still point it to htdocs and keep the config below it.


Wrong lesson.

Don't put secret keys in your repository.

Someone getting a copy of your code should be a big annoyance at worst.


Don't put secret keys in your repository is also the wrong lesson.

The right lesson is: Know where your secret keys are and take the appropriate steps to secure them. Whether that's in the codebase, a properties/ini/conf/whatever file, environment variables, whatever - know where they are and make sure you understand possible threats against them.

This story could just as easily have been written about how easy it is to download ALL_THE_SECRETS.txt. Don't feel smugly secure just because you don't store passwords in git.


Putting keys in a text file doesn't fit the narrative of a generally-careful user forgetting about side effects and metadata.

It's important to know where your keys are, but it's also important to not store your keys in certain ways that are easily overlooked.

A lesson of "don't put secret keys inside the web root" is also useful.

But a lesson of "know where your keys are and secure them" is a bit too short-sighted. You don't just want them to be secure right now, you want the mechanisms keeping them secure to be mistake-resistant.

Don't put them in the code, even if you promise to be super careful.


Is there a general algorithm that can tell you all possible threats against your secret keys?


My approach to security, when discussing things with our engineers:

1. Make a list of everything that absolutely positively cannot live without this data/access/permissions/etc.

2. Put the data somewhere where absolutely nothing whatsoever can ever read it (except root).

3. Figure out what one single change will resolve #2 so that the things in #1 can happen without any other things gaining access.

If you don't do #1, you don't understand your requirements/applications. If you don't do #2, then your data is probably vulnerable through some other mechanism. If you can't do #3 then you probably need to change something else (e.g. stop running all processes as the same user, stop running all services on the same box, stop trusting users, set up more granular sudoers rules, etc).

What I find is that when you come up with an idea for #3, and then come up with a list of side effects, you can actually find a lot of the kinds of issues I mentioned above, for example where the public website CMS (as 'daemon') and the accounting backend (as 'daemon') both have access to the same resources, and thus someone gaining access to the CMS can get the accounting DB user/pass and get access to your transaction records, user database, etc.


No, there are infinitely many things that shouldn't have access to your secret keys. So you have to take a default-deny approach, and ensure that only things that positively should have access to your secret keys do.


Where is the right place to store db passwords, api keys, etc? What is best practice in this area?


I use AWS for a number of applications, so I've started doing the following:

1. Create a JSON file containing encrypted secrets (DB pass, etc.)

2. Upload the file to a secure S3 bucket with fine-tuned permissions and server-side encryption

3. For the instance that is launching, include the permission in the IAM role that allows it to "S3:GetObject" on the specific JSON file you uploaded.

4. Deliver decryption keys to the app in some manner (Chef, Ansible, etc.)

5. When the app starts, it downloads the JSON file and loads the environment variables.

I wrote a blog post[1] about this with more detailed info and an NPM module if you use Node.js.

[1] http://blog.matthewdfuller.com/2015/01/using-iam-roles-and-s...
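A minimal sketch of step 5, assuming boto3 and hypothetical names (the bucket, key, and variable names are placeholders, and the per-value decryption with the key delivered in step 4 is left as a stub):

    import json
    import os
    import boto3

    def load_secrets_into_env(bucket="my-secrets-bucket", key="prod/secrets.json"):
        # Credentials come from the instance's IAM role; nothing is hardcoded.
        s3 = boto3.client("s3")
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        secrets = json.loads(body)  # e.g. {"DB_PASSWORD": "<encrypted blob>", ...}
        for name, value in secrets.items():
            # Decrypt `value` here with the key delivered via Chef/Ansible (omitted).
            os.environ[name] = value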


This system is truly beautiful; I think it's one of the best suggestions in the thread (definitely the best self-rolled solution not using other tools).

I have a question about redundancy, or "What happens if your gatekeeper EC2 instance goes down?" If you have multiple gatekeepers, could they be set up this way:

- let's say you have five different web apps using a gatekeeper to hold their secrets

- let's say you have n gatekeepers (let's say 3) and each of the apps knows the address of all three gatekeepers.

- If the primary gatekeeper is unreachable, all five apps would try to contact the secondary gatekeeper, but that gatekeeper would only (ever) respond if it, too, found the primary gatekeeper unreachable.

It's like a sleeper cell - at any given moment you have multiple replacement gatekeepers ready and waiting to serve, but each of them is unable to respond unless the one above it in the list stops responding. In this way you could lose gatekeepers (even permanently) and build a little bit of resilience into the apps depending on it while you're able to sort out what happened and restore normal behaviour.

Is this a good idea?


I'm confused by your reference to "gatekeeper EC2 instances." In the scenario I described, the secrets are housed on S3, not a separate EC2 instance. So, theoretically, as long as the underlying instance running the application code can access S3, and S3 doesn't go down (very unlikely), there shouldn't be any issues.


Yeah, we take this same approach. I wrote an ansible module that does this.


We (Shopify) use https://github.com/Shopify/ejson -- we store encrypted secrets in the repository, relying on the production server to have the decryption key.

It's relatively common to provision secrets with configuration management software like Chef/puppet/ansible/etc using, e.g. Chef's encrypted data bags.

Another slightly heavier-weight solution with some nice properties is to use a credential broker such as Vault: https://www.vaultproject.io/


For ansible, the built-in solution is: http://docs.ansible.com/ansible/playbooks_vault.html


Just wanted to +1 the suggestion for Vault - I've found it to be a really nice balance between usability and security.


Environment variables are the best and easiest way that I know of. You can supply those anyway you want to, and any programming language can easily get their values.
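For illustration, a minimal sketch of the consuming side in Python; DB_PASSWORD is a hypothetical variable name supplied by whatever starts the process:

    import os

    # Fail fast at startup rather than at the first DB call.
    db_password = os.environ.get("DB_PASSWORD")
    if db_password is None:
        raise RuntimeError("DB_PASSWORD is not set; refusing to start")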


Glad I don't work at your shop then. Environment variables are a terrible way to give your app secure information. There are well over a dozen reasons why you shouldn't do this in your apps, but one super obvious one is that way too many frameworks expose environment variables in their debug output if not properly configured. Think you'll never misconfigure a server? Guess again; pretty much every major site (Google, FB, Twitter, Yahoo, eBay, Microsoft, etc.) has done it at some point.


Please review HN's guidelines on civility.


Fair point, I potentially should've left off the first sentence. I stand behind the rest of the post, but the first sentence is a bit on the edge and I apologize.


Alright, well, I've never seen an application/framework spit out environment variables when it was misconfigured. But then again, I barely work with web-related stuff so maybe I just don't use the kind of software that does this. Could you provide some examples?


Many web frameworks do this when in "debug" mode.


Your comment sounded more colloquial than uncivil to me, but thanks for responding so respectfully.


[flagged]


I'm glad you made a new account named 'shutupbitch' just to tell me this. Thank you for your contribution.


Please don't feed trolls.


The "dump environment" problem is an issue for novice developers, but mature shops should have security-conscious frameworks for secrets handling that do things like clear the variable from the environment at initialization time.

What are your other 11 objections?


I'm surprised this isn't higher up. Are there any arguments against ENV variables in favor of something else?


Here's my caution to this. If low level processes can do "ps aux", and they see something like:

    DB_USER=scott DB_PASSWORD=b3withm3pl3aze /usr/bin/python webapp.py

That could be troublesome if an attacker figured out a way to run remote commands on your server even as an unprivileged user.


When I test that, the command line does not include the variable-setting. Is this a problem that depends on version or are you mistaken?

It would be kind of weird to include that considering how argv works.


Reading the environment of another process is a privileged operation.


If an attacker can get the process that's running webapp.py to exec some arbitrary bash command, that process has the ability to read its own /proc/$PID/environ. In general, you can read /proc/$PID/environ on processes that you own. At least I can do that on my Debian system:

    pikachu@POKEMONGYM ~ $ sleep 99 &
    [1] 21340
    pikachu@POKEMONGYM ~ $ cat /proc/21340/environ
    XDG_SESSION_ID=5COMP_WORDBREAKS= "'><;|&(:TERM=screenSHELL=/bin/bashXDG_SESSION_COOKIE=8571b679eed8952dd96ad28a54...<etc>

(I actually gave the wrong example in my previous comment. While it is true that giving the ENV on cmdline will show up in ps eaux, the more appropriate example is what I just explained in this comment.)


If you can get it to exec some arbitrary bash command (or otherwise access the environ of a process), you can also have it cat any file on the server, read the memory of running processes that belong to the same user as the exploited process, and execute network requests. So if you get that far, pretty much nothing will protect you.


Sure, but there are some shops that do their security from a point-of-view of "Attacker can run commands on your server as the user that started whatever-public-service/webapp/api", and go from there. I happen to think that's the best way to think about it.

Now, if an attacker manages to get root access then it's game over[1]. That just shouldn't happen. But nobody should be running their webserver as root. So whatever that user is should be low-powered, with only enough privileges to start the webserver and bind port 8080 (and use iptables or whatever to reroute connections from port 80 to 8080), and the whole setup should be designed so that this account won't be able to escalate things further if someone got a bash shell on it.

______

1. You should at least have some way of detecting that it happened and consider all data & files compromised and just wipe the whole machine & start over. Or take that machine offline for investigation into what happened and put a fresh new one in its place.


If an attacker can run an arbitrary command on your server, it's already time to rotate all the credentials in your system and let any data subjects whose data you hold know that you fucked up, big time. That's just the Linux model.


The example above is someone who has stupidly started a process with the environment variables exposed on the command line.


Ok, but that's not a problem caused by the exposure of the environment, it's caused by the exposure of the command line.


I agree - I was just explaining the issue the above commenter raised. It just means you should use a saner way of initializing your environment with sensitive values.


ENV variables are inherited by child processes by default, so please use care when using this approach.
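A small sketch of the inheritance issue and one way to limit it; DB_PASSWORD is a hypothetical name:

    import os
    import subprocess

    os.environ["DB_PASSWORD"] = "example-secret"

    # The child inherits the full environment, secret included.
    subprocess.run(["printenv", "DB_PASSWORD"])

    # Passing an explicit, stripped-down environment avoids that.
    clean_env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    subprocess.run(["printenv", "DB_PASSWORD"], env=clean_env)  # prints nothing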


My preferred solution currently is to use encrypted strings in config files that are not stored in VCS. The host machine encrypts and decrypts using host-specific keys, so if the file is copied off-server it is not immediately compromised. This is usually done via a Python script which rewrites the file. (BTW, pretty easy to do on Windows boxes with the MS API.) I've considered using encrypted folders on Windows in addition, but I'm not sure that really makes a difference.

Usually the base config is in VCS but without user/password/db strings. We then manually configure the file with the encrypted strings on the server (usually with the machine name in the filename, so that we can use the hostname in code to find it, and so it's clear the file is machine-specific). Not all tools make this easy, though, and it only works if you can add your own code in between. I also prefer files to environment variables, as files can be locked down more easily in my opinion and it's more obvious what is going on.

I like some of the other solutions that are using encrypted strings but with a keystore server and may consider for the future if they support both windows and linux.
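As a rough, non-Windows analogue of the approach above (not the commenter's actual script), encrypting and decrypting individual config values with a key that only lives on the host could look something like this, using the cryptography package's Fernet and a hypothetical key file path:

    from cryptography.fernet import Fernet

    KEY_PATH = "/etc/myapp/host.key"  # hypothetical location, readable only by the app user

    def _fernet():
        with open(KEY_PATH, "rb") as fh:
            return Fernet(fh.read())  # key generated once with Fernet.generate_key()

    def encrypt_value(plaintext: str) -> str:
        return _fernet().encrypt(plaintext.encode()).decode()

    def decrypt_value(token: str) -> str:
        return _fernet().decrypt(token.encode()).decode()

If the config file is copied off the server without the key file, the values stay opaque, which is the property the commenter is after.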


Stack Exchange's blackbox [1] is one solution. I haven't played with it personally and I'd love to hear other people's take on what's worked for them.

[1] https://github.com/StackExchange/blackbox


FWIW I have a /private directory in the root of all vhosts, so it looks like:

    /srv/www/domain.com/public_html/
            |--------->/private/
            |--------->/logs/
            |--------->/tmp/
Anything stored in /private/ is not publicly accessible by the web server process, but can be read or written by anything running under the user's username.

It's specifically for storing things like configuration files.

I think this should be standard practice.
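A minimal sketch of an app reading its config from that private directory (the filename is a placeholder; the point is just that the path sits outside the web server's document root):

    import json

    CONFIG_PATH = "/srv/www/domain.com/private/config.json"  # hypothetical file

    with open(CONFIG_PATH) as fh:
        config = json.load(fh)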


Thanks for the tip. How do you keep passwords and keys in sync amongst team members safely?


I only just recently had to figure that out. I opted for setting up a .kdb KeePass file in a private git repo and giving everyone ("everyone" = myself + one other) access to that. I'm pretty sure that's not a very good solution.


Why do you store such stuff under /public_html anyway? One level higher would be more appropriate I think.


It's not under public_html. It's under /srv/www/domain.com. It is one level higher.


Oh, that makes more sense. Sorry I got confused by the ascii tree.


http://12factor.net provides good guidance for this at a high level. In reality, all config should be separated from code.

There's a variety of mechanisms for loading this into your environment.


You can use a datastore like HashiCorp's vault: https://vaultproject.io


Windows has something similar in DPAPI - https://msdn.microsoft.com/en-us/library/ms995355.aspx


Config files that are not version controlled, or environment variables. I prefer config files because it's easier for me to communicate to other team members what needs to be present in their local development environment.

I typically handle this by versioning a `config.example` file, which includes all the necessary config keys an application expects. The example file defaults these attrs to various strings meant to show they are examples only. I include instructions to copy the `config.example` to a `config.yml` (or some other appropriate extension) and replace the values as necessary. The `config.yml` file is specifically excluded in the `.gitignore` file. The application will only load the `config.yml` file when started, so I also make sure to raise a descriptive error informing team members when they are missing a local `config.yml`.

This allows the `config.example` to also serve as a self-documenting config for the application, as comments can be included that identify and explain each of the config keys and their purposes.
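For illustration, a minimal sketch of that loader, assuming PyYAML and the config.example / config.yml pair described above:

    import os
    import yaml

    CONFIG_PATH = "config.yml"

    def load_config(path=CONFIG_PATH):
        if not os.path.exists(path):
            raise RuntimeError(
                "Missing %s. Copy config.example to %s and fill in real values." % (path, path)
            )
        with open(path) as fh:
            return yaml.safe_load(fh)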


Keywhiz is a good solution. See here for some background info: https://square.github.io/keywhiz/

Disclaimer: I worked on it.


I store dummy values in VC, then edit the real data on the production server. (And I obviously never check anything in from production; if you can, set the production VC user to read-only.) This has a nice side effect that if I edit the configuration file, the new stuff gets merged in without causing a mess.

Another way is a second file that overrides settings as needed. Although I have found that to be less maintainable if the configuration file changes. That file should be somewhere entirely out of the VC tree.

Either way, the file must be placed in a directory that is not served by the web server.

/include and /public are traditional. Only /public is exposed by the web server.




For me, I have connection credentials for a configuration database as environment variables... the config library will then connect to the configuration server with those credentials and get everything that application needs to connect to other services... I'd considered using etcd for this, but it was unstable for me at that time... I keep settings cached for 5 minutes, then the library will re-fetch, in case they changed.
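A minimal sketch of that pattern, with hypothetical CONFIG_URL/CONFIG_TOKEN variables and a made-up /settings endpoint standing in for the configuration server:

    import os
    import time
    import requests

    _cache = {"fetched_at": 0.0, "settings": {}}
    CACHE_TTL = 300  # five minutes, matching the comment above

    def get_settings():
        now = time.time()
        if now - _cache["fetched_at"] > CACHE_TTL:
            resp = requests.get(
                os.environ["CONFIG_URL"] + "/settings",
                headers={"Authorization": "Bearer " + os.environ["CONFIG_TOKEN"]},
                timeout=5,
            )
            resp.raise_for_status()
            _cache["settings"] = resp.json()
            _cache["fetched_at"] = now
        return _cache["settings"]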


I think that if you're using git,

    use git crypt[1][2]
    use git-dir and work-tree options/env vars
[1] https://www.agwa.name/projects/git-crypt/

[2] though you have to remember to git crypt lock


In a configuration file that is not version controlled, or even in environment variables, so that your application starts with the right values without them being in any config file at all.


How do you communicate that data amongst team members then?


As I detailed in my other response to your original question, use an example config file that is version controlled. It includes all the necessary config keys, but example-only values. All team members would then be able to easily create a local config file based on the example that works. You can even document the config with comments in the example file so devs know what is needed and what it's for.


I think at some point, if you have a shared password for a development DB, production DB, etc., then just keeping those in a pen-and-paper notebook is your best solution. Usually, for shared environments like that (although I hope the team can set up their own DBs for development!), the number of shared "secrets" is relatively small. Some secrets are best not stored electronically, especially if they can give away user data.


You can also try maintaining a separate repository of passwords and pulling it onto the server during deployment.


This is the asymmetric nature of security in general.

You only need to make a single mistake and you are hosed. Your attacker can fail an arbitrary number of times and only needs to succeed once.

If you are 99.9% likely to make the right call on anything that could have a security impact, then you only need to make 1000 decisions before you've probably screwed one up and have a hole.

Some would say this means true security is impossible.


As humans are fallible, it's inevitable that we will make mistakes even when carrying out what we intend to do, when we know the right thing and are actively trying to do it.

Sometimes our mistakes are not recognizing the right thing to be done, or not recognizing anything at all.

I think the view that true security is impossible is true if security is up to one person. What about a system with multiple layers of sign-off, or an automated system that can help test the security of what you're doing and alert on or prevent dangerous behaviour unless there's a specific reason to allow it?


A good practice is to disable features that you don't use. I don't think many people need their hidden files to be remotely accessible, so maybe they should either remove the permissions or set a flag in their server so it doesn't allow downloading them.


I did a conference talk at derbycon on exactly this, regarding startups. The number of obvious holes is incredible: founders not knowing what XSS is, writing bad PHP apps with obvious code execution vulns, or making glaring logic and auth mistakes that allow full account hijacks.

It's really bad out in AppSec land


Hell, I recently saw an application with unchecked input in a file-download feature: passing it a path like `../../somefoo-file` would take you outside of that application's path.


This is called either a Local File Inclusion or a Directory Traversal Vulnerability. The name depends on the details. It's really really common, and definitely something I see a lot of.

The OWASP Top 10 is deadly.


Link to your talk?


It was my first conference talk. If you would like to ask any questions, email me at the address in my profile.

https://m.youtube.com/watch?v=wzrVYyouQTk


This is what makes security hard: To attack successfully you only have to find one significant mistake, to defend successfully you can't make any mistakes.


What "all vital user-data and admin data"?


Well, that's why you either have a team of competent people making sure all your stuff is up to date, routinely performing pentests, etc., or you delegate as much as possible of those responsibilities to 3rd parties (e.g. Heroku).


Doesn't matter. Eventually, there is a 0-day, that no one knows about, that is used on you.


For Apache,

    <Directorymatch "^/.*/\.git+/">
      Order deny,allow
      Deny from all
    </Directorymatch>
    <Files ~ "^\.git">
      Order allow,deny
      Deny from all
    </Files>
https://serverfault.com/questions/128069/how-do-i-prevent-ap...


Would it be safer to not put .git within the reach of the webserver, and separate the development path from the production host path?


It certainly would. However, as with another comment I replied to ITT, security doesn't have to be either-or; it can be "do all the secure things".

I.e.

* No secrets in your repo

* Only copy to the server what you need (and automate this)

* Add conditions to your web server to not serve up .git, in-case the previous two checks failed.

When working in teams I think having additional checks and balances and not one 'perfect solution' is vital.


Extremely good point - we should always aim for defence in depth, and especially shouldn't rely on people not doing stupid things, because if something is possible then someone will do it at some point.


Don't you specifically have to configure Apache to allow access to dot directories in the first place? (disclaimer, I didn't read the article... but if access to dot directories were enabled.. ya, .git would be exposed, but I'm pretty sure it isn't the default Apache setting.)


As far as I know, .ht* files are the only ones not accessible by default.


Better yet:

    $ rm -rf .git/
It's way safer to delete the repo history from the production server than to rely on Apache rules copied from a forum.


Better better yet, don't use git to move code from test to prod. Use rsync, and exclude .git and other nuisance files.

Unfortunately I can't seem to convince anyone that this is good practice. :-(


Amen! Use

   rsync --checksum --exclude="..."
unless you have a damn good reason not to (--checksum is essential to prevent corruption/malicious modification, without it you are implicitly assuming the version on the remote machine is exactly how you left it: that assumption is why Linus built git around shasum in the first place).

rsync is even easier than SSHing to git pull, or opening up a pushable repo on a server. For once the simple approach is clearly better!


> (--checksum is essential to prevent corruption/malicious modification, without it you are implicitly assuming the version on the remote machine is exactly how you left it: that assumption is why Linus built git around shasum in the first place)

That's not true (though not completely wrong).

rsync is stateless. It does not assume the version on the remote machine is "exactly how you left it"; rather, it compares file size and file modification time. If either has changed, it will do a transfer -- an efficient delta transfer, usually -- which might be as little as 6 bytes if the contents are exactly the same.

--checksum makes it ignore file size and modification time, and compare file checksums instead, in order to decide whether a transfer (delta or not) is needed.

A malicious actor, or bad memory chips, might change your file's contents but keep the file size and time/date the same. In that case, --checksum will overwrite that file with your source version, and a --no-checksum run wouldn't. So it's not bad advice. Whether the cost in disk activity is worth it depends on your threat model, data size, and disk activity costs. (Though, if the corruption is due to bad memory, this is the least of your problems.)

However, a corruption because of a program error / incompetent edit to the file is very unlikely to leave both the size and modification date intact - and a standard rsync will figure that out as well.


> Whether the cost in disk activity is worth it

If the comparison is with using Git, then it is clear you're not so resource constrained that you can't countenance running MD5s, since Git would run shasum.

In our current laissez-faire climate w.r.t. security, I think recommending weaker security on the basis of saving a few cycles isn't very wise.

> It does not assume the version on the remote machine is "exactly how you left it"

I was ambiguous and sloppy, sorry. It doesn't check for changes in a secure way; it assumes that, as long as the metadata for the file matches, the content is as you left it.

When Linus built git, he specifically did so around sha1 to ensure that the data you think you have in the file is in fact the data you have. rsync --checksum is thus a reasonable replacement for git deployment, but rsync --no-checksum isn't, imho.

Sorry if I was vague or misleading, thanks for the clarification.


I'm personally a fan of using git-archive to make a tarball that can be deployed. These tarballs won't contain the .git directory and can be pushed/pulled instead.


One approach that I've found works well (YMMV, etc.) is deploying with Ansible. It has a Git module built in (so it's almost zero work to configure), and you can set up SSH agent forwarding so you never have to put keys on the server that have access to your source control, nor manually SSH in and pull.


Amen. Don't use git to deploy code. Use it to version code. Use a script on your CI to compile/test/minify/convert your code into a deployable tarball and stick that somewhere highly durable like S3, Swift, or your own company filestore.

Remember, git providers go down (e.g. a DDoS against GitHub or an internal failure at Bitbucket). Don't depend on git being up to deploy your code, or you'll look like a fool next time a DDoS at GH coincides with a deployment.


Probably better to link to the stackoverflow you quite possibly copied this from so that people can see discussion / alternatives, etc: https://serverfault.com/questions/128069/how-do-i-prevent-ap...

See also the nginx question: https://stackoverflow.com/questions/2999353/how-do-you-hide-...

Note, if you actually did take it from the stackoverflow, you just infringed on someone's copyright; SO's user content is 'creative commons, attribution required'.

Edit: Thanks for adding the attribution, all clear with copyright now :)


Note, if you actually did take it from the stackoverflow, you just infringed on someone's copyright

Does a simple access rule, which I can't see there being many sane ways to express, meet the minimum level of creativity/originality to be eligible for copyright...?


Probably not, but it depends on the court and the skill of the lawyers.


I was busy editing it, thanks though.

edit: sounded wayyy too snarky lol.


[deleted]


HN does not use standard Markdown. It uses a very simple markup language possibly inspired by Markdown, but with much more limited functionality.

On HN, only two spaces are necessary for code:

  example
And it only supports asterisks for italics, two blank lines for paragraphs, and turning URLs into links; it doesn't support any of the rest of Markdown.


Obviously you shouldn't be storing sensitive information in your codebase (I hope everybody knows that), but the problem here is that you might have been doing so way back when you were prototyping, and only moved it out of the codebase later. It's really common to start a codebase just by hacking something together with hardcoded secrets.

If you have proper secret segregation now, but you're deploying by doing a git pull, those old secrets are still in the history and you run the risk of not really having segregated secrets all over again.


You should probably revoke all your existing credentials and replace them with fresh ones as soon as you pull them out of the VCS. That way, even if attackers find the old credentials, they don't work anymore.


We've left test keys in our git repos. They don't work for anything except a virtual machine used for local development, but I always thought it would be amusing if a hacker grabbed them and got frustrated trying to use them.


So you leave them in as a kind of poisoned honeypot. They'll go right to the wrong info....

I'm sure you've daydreamed of the facial expression of the scriptkiddie the moment he stumbles across your fake keys illicitly, only to be disappointed hours of unfruitful hacking later :)


> Obviously you shouldn't be storing sensitive information in your codebase (I hope everybody knows that)

Sadly, in my experience hardcoding secrets such as (database) passwords and encryption private keys is not uncommon at all in web applications. I don’t like criticising other developers, but sometimes the people who get to make these decisions don’t necessarily have the perspective or experience to make the right calls.


Growing a project out of one-shot, prototyping-mindset code is really problematic. Every time, I wish I had started with a real project structure and design philosophy.


And then every time I actually start a project with a real project structure and design philosophy, it goes nowhere and I wish I hadn't wasted the time. Or best case, it's used by a few people internal to whatever company is currently employing me, and security doesn't really matter.

The tech industry is shaped like a funnel, with lots of raw, bad ideas at the top and a few smash mega-hits at the bottom. 99% of the ideas at the top are bad; investing more time than is necessary to prove them out is a mistake. 100% of the ideas that make it to the bottom wish that they'd spent more time designing things at the top. But y'know, if they'd actually done that, they wouldn't have made it to the bottom, they'd be outcompeted by the guy who got a quick and dirty prototype up, made his users happy first, and then closed the gaping security holes (hopefully!) before anyone noticed.


I can't deny that nature likes quick n dirty, but I wish I could just find a balance between reckless and too slow.


The balance is whatever works, gets people using the product, and ideally keeps them happy.

The balance is generally far more on the quick 'n dirty side than most engineers (myself included) would prefer, but we could look at this as a cognitive bias of engineers rather than a failing of nature.


Once it gets to a certain level of "no longer prototype", it can help if you then start VCS fresh by initing a new git repository. You lose the prototyping history, but you probably won't need it anyway.


Retroactively remove them from the commit history. Your sensitive secrets should not be on every developer machine.


Is there an automated "security as a service" service that if I subscribed to it, it would have told me that this is a problem on my websites?

It really annoys me randomly hearing about critical security issues through tech news websites - there should be a more systematic way for "non-security professionals" to ensure their sites are protected to best practice levels.


There's a service called Detectify that might fit the bill. https://detectify.com


Mail me at mdcrawford@gmail.com if you'd like to be my very first early adopter.


It seems Google doesn't like people looking into the extent of this problem [1].

When googling for "inurl:.git", it returns no results. And on top of that, I need to enter a captcha first?

[1] https://www.google.be/search?q=inurl%3A%22.git%22


The .git directory wouldn't be crawled though, correct?


Most of the time it wouldn't be crawled.

But zero results is blatantly wrong.


It looks like for some reason Google actually searches for the character "." (U+FF0E, "fullwidth full stop") when performing those sorts of queries, not "." (U+002E, "full stop").

For example, check the URLs of these search results: https://www.google.com/search?q=inurl:%22.hello%22


I've had more success with 'intitle: Index of /.git'



So google doesn't like me after doing that search.

I suddenly started getting a captcha for searches right after using that search. o_O


I often get captcha requests when doing any google search with inurl or intitle.


It seems like if you're storing secrets and the like in your code's repo, the solution is to not do that, rather than just putting a bandaid over it by hiding the repo.

Deploy the secrets separately: they don't belong in your site's codebase.


Hiding the repo is hardly a bandaid. It should never be exposed even if the repo is perfectly secret-free.

Except in the rare cases where it is intentional, e.g. an open source repo where you happen to want people to download it from the same domain rather than from GitHub or git.domain.com.


Can you elaborate on why it shouldn't be exposed?

Presumably, for most commercial entities, the parts of the site that are valuable are the assets, which are served from the site as part of it doing the thing it's meant for. For a large percentage, they're running a CMS like Wordpress or Drupal or whatever, where the codebase is public anyways. And for even more, we're talking about a directory of hand-crafted HTML files, where the version control is the HTML of the site plus some "damn, I always forget to close my tags" commit messages.


> Can you elaborate on why it shouldn't be exposed?

Because there is no need to expose it. If you expose it then you need to vet not only its head but its entire history, and for what purpose?

> For a large percentage, they're running a CMS like Wordpress or Drupal or whatever

I doubt someone running Wordpress would be using version control anyway. Most of the time it is WP + some standard plugins and no custom coding.

> And for even more, we're talking about a directory of hand-crafted HTML files

And anything else you happened to have in your directory. Your passwords file? Even if you commit a delete there is still history!


> Can you elaborate on why it shouldn't be exposed?

Because for a non-trivial number of websites their codebase is their IP and product and not something they want to be public.

I'm all for open sourcing as much as possible, but Google doesn't publish their search algorithms for a reason.


In nginx, best to just not serve dot files:

    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }


This returns 403, and in my opinion logs should not be turned off for that. I would return 404 so as not to expose that you are blocking dot files with your server. My suggestion is to put this in each server { ... } block:

    location ~ /\.  { deny all; return 404; }


More precisely, it's "one in every 600 websites examined"

Git is popular, but I find it hard to believe that 1/600 of all websites on the Internet use it.


You find it hard to believe that 0.17% of all websites use git? I'm sure 10x that many do; most of them just don't misuse git as a deployment tool rather than solely as a source code manager.


Most websites are made through Wordpress, Squarespace, Wix, and similar products. I bet the number here is far lower than 1/600.


I agree with your first point, but not your last.


Related question: Is there any risk to exposing .git if your Git repository is already publicly available (e.g. on GitHub)?


If it's exactly the same repository - no. If it contains some extra branches with local changes, or potentially commits with private information / passwords - definitely.

So in general - it's better not to have it in the first place, because it's unlikely that the person doing the commits knows the whole deployment strategy.


As long as you don't accidentally expose credentials in .git/config used to push changes to the repo (e.g. a GitHub username/password for HTTP auth), there is no risk.

Still - having a real deployment process is much better. It could be as easy as extracting the contents of a tarball generated by git archive.


No.


Not going to name names, but there are mobile apps that have their .git packaged up with them too.


90% of security incidents are due to human errors, not to some secretive hacker group spending $10m to crack TLS. Doing system administration right (eg. no secrets in repos) has a lot more impact on security than implementing all the other complex controls.

Of course, doing everything is much better.


I wonder what would happen if you searched for .svn, too. I'm sure you'd run into the same problem in many places. But would it be more or less likely to occur?


I think less likely. Svn actually had an `export` command, which allowed you to do a checkout of a specific commit with no svn metadata. If someone was actually using svn for deployment, they likely knew about it. (http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.export.html)


git has `archive`, which is essentially the same thing.

[1]: http://git-scm.com/docs/git-archive


No, archive is very different. `svn export` could be used at the target side. Basically you can export from remote repository to the chosen directory without any extra operations, so .svn is never created.

`git archive` requires you to have a clone of the repo from which you can create an archive. That means people are more likely to just do a local checkout than play with archive on top of it.

Deploy with svn export:

    svn export url.of.repo destination/path
Deploy with git archive:

    git clone url.of.repo
    cd repo_name
    git archive --format=tar some_commit_or_branch | (cd destination/path && tar -xf -)


I completely agree. Every time I write a new deployment script for git, I miss the old svn export command; I hate the idea of cloning on the deployment machine, having to sync in the script, etc. 'svn export' feels stateless and is clean by construction.


In svn's heyday, the standard way to install or update a popular app like WordPress was to download and extract a tarball. Only people who actually participated in the development of the app itself used svn.

Nowadays, lots of open-source projects encourage ordinary webmasters to clone a Github repo and run `git pull` to update.

So I suspect that public .svn folders will be less common.


That's problematic in itself. "git" has no business being installed on your public-facing web server. Nor should there be any compilers installed, nor any scripting languages not explicitly needed, etc. etc.


https://news.ycombinator.com/item?id=838981 - a similar breach was reported in 2009. It focused on the .ru part of the internet and caught a bunch of big names.


Some of the other commenters suggest adding git-dir and work-tree to the git commands, but there's a better solution: use the --separate-git-dir option when cloning the repository.

For example:

    git clone --separate-git-dir=<repo dir> <remote url> <working copy>
where <repo dir> is outside of any directory served by the web server and <working copy> is the htdocs root.

This option makes <working copy>/.git a file whose content is:

    gitdir: <repo dir>
The advantage is that all git commands work as usual, without the need to set git-dir and work-tree, and that there's nothing special to add to the web server configuration.


I disagree

It may be possible that the gitdir is still accessible through a misconfiguration or security issue (and you're telling attackers exactly where to look).

Production servers have no business having the .git directory anywhere.


> Production servers have no business having the .git directory anywhere.

I agree with you in principle, but in practice this is not always possible; there are situations where having a git checkout in production is better than nothing.

I've seen WordPress sites where a semi-technical administrator updates plugins and themes directly in production: with that git checkout I would at least be able to track the changes and pull them in a dev or staging environment.

This could be the first step to a saner deployment workflow for those sites, where production gets changes that have been tested and validated elsewhere.


A comment mentions this deep below, but I think this deserves a bit more attention:

If you're using a modern framework with url routing, you don't need to worry about hiding .git or .hg in your webserver config file.


I once jumped-in on a PHP project where the previous developers had written:

    $page = $_GET['page'];
    include ($page.".php");
Whilst allow_url_include (http://php.net/manual/en/filesystem.configuration.php#ini.al...) was set to false, I could still craft a URL like:

http://example.com/?page=admin/index

which expanded to http://example.com/index.php?page=admin/index where the real admin was at http://example.com/admin/index.php and offered complete access to the backend without authentication or authorization - let alone other files in the file system.

In another project, I found that the server had register globals turned on, and therefore could craft a URL like:

http://example.com/admin?valid_user=1, where valid_user was a PHP variable set to true iff their session cookie could be authenticated in the database.

I think it's terrifying that these things still make it through to production websites


Someone on StackOverflow says this will tell nginx not to serve hidden files.

    location ~ /\. { return 403; }

My question - do I need to put this once at the top of my configuration file and all it good or does it need to go into multiple places in the nginx config?

It would be great if there was a simple, universal way to say to nginx "don't serve hidden files from anywhere under any circumstances".


It has to go in every server { ... } section. Also use "deny all;" to really block. See my other answer here in the comments.


[deleted]


Best thing for nginx is do an include in each server {} block.

    # /etc/nginx/deny-dot-files.conf
    location ~ /\. {
       access_log off;
       log_not_found off;
       deny all;
    }

    server {
        include /etc/nginx/deny-dot-files.conf;
    }


Is there no way to set that universally in nginx?


Looks like this returns some results: inurl:.git intitle:"index.of"


Why are people serving web traffic from a folder with a .git folder in it anyway? I thought it was basic deployment practice to export your code OUT of the VCS before deploying... every shop I've worked at had this in place.

Other solutions just seem hackish to me, but every project is different I suppose.


So that deployment is "just" git pull.

I don't get it either.


I have a fake /.git on my personal website to troll would-be hackers into wasting their time. (PROTIP: I don't run Laravel there.)

So far a few people have requested my .git/ directory but none have attempted to plunder the riches they think they'll find within.


What I find more interesting is that github is full of passwords and credentials.


This is one example of why hierarchical directories are bad. They're not all bad, but with all their power and flexibility, they carry some inherent flaws. It's much the same as JSON, which is also versatile and highly useful. However, both of these abstractions tend to lead to the ad-hoc creation of more complexity and more details to remember, while having no clearly delineated way to be self-describing.

(Is JSON an abstraction? Not really, as it's a concrete spec, but its general kind of serialization format is an (incomplete) abstraction.)


It always felt unclean to have the source repo on production, so we use deployhq.com (similar: circleci.com, Jenkins) to push changes up when code changes are made, rather than pull them from git. We also use a clean-up script that removes any SASS, Grunt, etc source files when deploying to production.


How to check all your nginx access logs (also compressed) in Ubuntu if anybody has accessed your .git directory (example with root access):

    apt-get install zutils
    zgrep -r "\.git" /var/log/nginx/


How can a project be accessed/downloaded with just the .git folder?


Even with directory listings disabled, you could request http://example.com/.git/refs/heads/master to get the sha of master, then get http://example.com/.git/objects/<sha 2 char>/<rest of sha> to download the data for that one sha, then you can get the rest.

Wouldn't be terribly difficult to write a script to crawl it. Git's format is well known and documented.
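A minimal sketch of that crawl, assuming loose (unpacked) objects, a loose ref for master, and example.com as a placeholder:

    import zlib
    import requests

    BASE = "http://example.com/.git/"

    # The ref file contains the commit sha of master.
    sha = requests.get(BASE + "refs/heads/master").text.strip()

    # Loose objects live at objects/<first 2 chars>/<remaining 38> and are zlib-compressed.
    obj = requests.get(BASE + "objects/%s/%s" % (sha[:2], sha[2:])).content
    print(zlib.decompress(obj)[:200])  # starts with "commit <size>\0tree <sha>..."

A real downloader would also need to handle packed refs and packfiles; tools that do this already exist.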


It's clear the problem involves PHP sites developed with git where, instead of using a specific www directory inside the project, the server points at the root folder of the project, thus exposing .git (and the rest). Classic dumb error by PHP developers. I have a hard time believing one would be able to expose the .git folder in a Rails, Spring or Django application, since the public folder isn't the root folder of the project.

I wish servers were configured so they don't serve ^\..+$ files by default. I wish servers would behave as securely as possible out of the box; then it's up to the developer to whitelist features rather than the other way around.


Excluding anything that starts with a period also doesn't work - RFC 5785 specs the folder .well-known with special meaning.


True, but you can whitelist /.well-known/. I don't think anything else uses dot-filenames in URLs, because not all operating systems and software even allow such file names (for instance, the file browser in Windows forbids it when creating a new file or folder).


PHP? Static sites, ASP sites, CGI sites, these would also be vulnerable. Don't be so quick to laugh.


There are configs within nginx that prevent this from happening. Also, I symlink only the public files from a different directory into my /var/www/html folder.


The author doesn't give any suggestions for alternative ways to deploy.

What are the best practices here? What should operators that currently deploy this way do instead?


The author doesn't need to give such suggestions; that's out of scope of this article. There are many possibilities however.

The simplest solution is to add something like 'location ~ /\.git { deny all; }' to your nginx config (or just hide all paths starting with '.'; you rarely want to serve hidden files).

The most important practice for remediating this is to have a very clear disconnect between deployment artifacts and development repositories. You want to have your developers write code, and then some preprocessing step to distill that repository into the minimal deployment artifact. This artifact then goes through your integ tests on a gamma stage and onto prod. Whatever deployment system you use should know what git commit this artifact originated from, but the artifact should not have that information on its own.

If you want the really poor man's version of this, just have a build system in your repository which populates an "out" directory with everything you actually want to deploy, and then rsync that sucker around.

If you want a hip answer to this, Docker has the "Dockerfile" which will specify all artifacts that should be added into it, and should serve as a distillation of what's needed to run your application.

I'll point out that this issue won't affect many types of sites in the first place. For example, if you use any sane rails setup, you already have the "public" directory which you use as your static files root, and then you proxy to the unicorn/rack/whatever process which will never serve such files.

I would wager that the majority of these sites are poorly configured apache2 + php messes, since php made the massive mistake of having the filesystem act as routing; any web framework with good routing (e.g. rails, django, even sinatra, web.py) will not suffer from this unless you go a bit out of your way.


I use a separate work folder.

Like this:

  git --git-dir=/foo/bar.git --work-tree=/foo/bar.work init

  git --git-dir=/foo/bar.git config receive.denyCurrentBranch ignore
Then create (and chmod +x): /foo/bar.git/hooks/post-update

  #!/bin/sh
  work_tree=/foo/bar.work
  GIT_WORK_TREE=$work_tree git checkout -f
Then you just create a symlink to the work tree for your website root, or put the work tree there, or whatever, depending on preference. Can't say this is perfect, but it works pretty well for smaller projects.


I personally prefer to have my web directory be a subfolder of the project root. This not only solves the issue of the .git directory being accessible, but also helps prevent gotchas with exposing documentation or other data not intended for public consumption.


I simply use a 'checkout' process where all website assets to be deployed are copied into, for example, a /dst folder, and that folder is then rsync'd to the webroot (with rsync -ru --delete --chmod). This avoids clutter and still allows updating just the changed files. The actual page source is still under git.


599 out of every 600 sites don't have the problem of a public facing .git folder. I don't think alternative deployments need to be suggested as they are commonplace.

It depends on what sort of platform and server you're running on but just do what you have to do so you're not serving your .git.


Doesn't change anything, but worth noting, these are not necessarily 599 git-using websites being compared.


For fairly modest sites or projects

1. Configure your webserver to hide common directories and files.

2. Don't store passwords / credentials in version control.

2a. If you use a test suite, add a test to verify this.

3. Have a make step that builds the deployment content into a separate directory. (e.g. a gulp deploy task)

4. Rsync the deploy content to your destination server.

Instead of 4, more complex systems (with multiple servers, libraries, etc.) use Docker to build an image from your deploy content.



You can use git-archive (http://git-scm.com/docs/git-archive , examples provided at the bottom)


Besides what TheDong has suggested (hiding your .git folder from public view), you could also have a build server that in the end tarballs everything up and deploys it onto your live servers.


Perhaps you didn't read my response fully; after the initial 'simple solution', having a deployment artifact is exactly what I suggest.


Eh yep you're right. Should have read slower.


We deploy from git via "git archive". Wastes a bit of space, but there's no extra resources on the server.


Personally I would use standard practices of the open source world; package your app properly into a .deb/.rpm, deploy that and push configuration for it out using puppet/ansible/etc. aka, debops!

https://enricozini.org/2014/debian/debops/


I would dispute the idea that packaging your web-app (which is what the article is talking about) into a .deb or .rpm is a standard practice.


I see a lot of discussion here about best practices to avoid this, but nothing about the obvious question: why is it that the go-to version control system for developers everywhere is designed such that it puts a single directory at the root that magically gives anybody who can see it the ability to download all the source code in the repository?

That is:

  - completely non-obvious and unexpected
  - a terrible idea
Why, instead of trying to figure out how to avoid handing this magic file out to everybody, are we not trying to fix it so that no such magic file need exist?


Where else would you suggest Git keeps the version control information for a repository, if not at the root of that repository? SVN tried it in every subdirectory, which is worse, using a separate db server would add unwelcome external dependencies. It could be in a non-hidden folder, but that's annoying for those who just want to see their files under VC, not an implementation detail of the VC method. Also, there are hundreds of dot files scattered around your computer, and none of those should go anywhere near your host - .git is just one example.

The error here is uploading sensitive or hidden information to a web host into a public directory, not how it is stored locally.

If you use the root of your app, including source code and hidden files, as the public directory of your website, one permissions error means all sorts of things might be exposed: other dotfiles, and potentially all of your source code too, because you're relying on the web server to hide it in every instance. That's the problem that needs fixing here (exposing the wrong files under the public root), not that one particular hidden folder exists.


The terrible idea is using the root of your source as the root of your published application, which has been considered bad practice for as long as I can remember (and that is long before the existence of git).

The root of your project should contain nothing more than build documentation/scripts and other developer/user notes & scripts (which for a web application could be as simple as "expose the subdirectory called 'public' via your web server", but could be much more for more complex applications that have larger build requirements).

Ignoring this long-held recommended practice (which less experienced developers might not be aware of): git originally came from an environment where you couldn't simply expose your repository as your application. The Linux kernel and other projects needed building from source before they could be put into production. So this is in part due to people using a tool in a new context without sufficiently thinking about the possible implications (which the tool designer, thinking about other environments, might not have considered). Security requires a lot of "due diligence" like this, unfortunately: you can't expect the tool designer to be aware of all the potential security considerations in your environment; you have to deduce and mitigate them yourself.


And how would you expect it to work? Keep the repo in ~/.local/ or something like AppData? What happens then if you want to have two clones of the same repo? What if you want to move a clone around (e.g. to a different machine)? Of course it's possible to find solutions to these issues, but they will never be as simple and easy to use as the current model (btw: git supports having the magic directory external to the working tree).

It's non-obvious and unexpected if you have never used a VCS and didn't read a single page describing git. In which case almost everything in git will be unexpected and non-obvious (so will be any programming language or technology).

But I agree that the problem shouldn't be trying to avoid handing out the .git dir to everybody. The problem is that git clone in the document root of the webserver for website deployment is a terrible idea and was never a supported use case.

edit: so I came in third! Should try typing quicker.


It's actually not a problem for many web developers. If you're working in Rails, Django, etc., you're not going to have the DOCROOT pointed at your top level directory. And .git is only in the top level directory, which is already an improvement over .svn.

But let's say we agree it's an issue and should be fixed. What alternative solution would you propose? You can't just get rid of the "magic file" because it actually contains your version history, the thing you wanted to track in the first place.


The design of the version control system is not the problem here, is it?


Imagine you find a typo or bug on some site that you use daily and it really annoys you. How awesome would it be to be able to just `git clone http://www.microsoft.com/`, fix the issue, and then either host your own version of the site or submit a pull request. Seems like it's already possible on 600 websites :) (well, they probably didn't run `git update-server-info`).


For all those using github pages (gh-pages) to host their website, don't worry they've got you covered :D


I discovered almost the exact same problem in a very large production site once, except it was the .svn directory.


I'm not surprised in the slightest given the current trend to build everything in production.


Wait, do people store repo passwords in .git? Otherwise a simple remote address means nothing.


I think it's more they have access to your codebase, which could then be analyzed to look for exploits.


git (and hg and svn and tar and cpio and pax and zip and ...) should set the permission of the archive (vcs db) to the most restrictive of the permissions of those added.

Also ... webservers should make it very hard to offer up .dotfiles as webcontent.


Not rocket science, block it with "deny" ...


tor exit node scanning my site 6:20pm PDT.


I found what seems to be a botnet scanning for /.gitattributes in my logs. First scan was on 2015-06-02


I'll go on a tangent here and blame PHP.

The model of exporting a Unix directory as the structure of a website is barely good enough for static sites (you'll get into all kinds of problems with URL management), and is completely unsuited for applications.

Now, of course, PHP was created as a tool to add a visitor counter at the bottom of your pages. With a bit of caution, it's indeed secure enough for that. Nowadays people create huge applications using the same security model, and PHP developers don't even think about changing it.



