How to Host Your Own Private Git Repositories (eklitzke.org)
271 points by eklitzke 159 days ago | 108 comments



For those who aren't aware, Git is actually a fully decentralised system. It doesn't require a central server as such - though most teams use one, as it's a convenient setup for most projects.

Even if you are hosting on GitHub/Bitbucket et al., though, that repository is just one of many equals. You can push and pull from multiple peers as long as you have access set up appropriately.
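
For example, adding a second host as another remote takes one command per peer (hostnames here are hypothetical):

  git remote add github git@github.com:you/project.git
  git remote add homeserver you@home.example.com:project.git
  git push homeserver master    # push to your own box
  git pull github master        # pull from GitHub as usual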

I recommend the chapter on distributed workflows in Pro Git:

https://git-scm.com/book/en/v2/Distributed-Git-Distributed-W...

There's also an explanation of the different supported protocols here:

https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protoco...


> Even if you are hosting on GitHub/Bitbucket et al.

Normally, if you're hosting on one of those sites, you're using other features (wiki/issues), which may _not_ be decentralised.


I kept my office machine as an additional remote for when I was working at home, pushing over ssh to save an extra pull when I got in.


GitLab can also be run on your own server. It has an enterprise and an open source version. We've been using the open source one for a couple of years and it is really great - I recommend it with all my heart.

It has great instructions for installing directly from source, and you don't really need to be familiar with Ruby to install it. It requires some standard components (web server, database) which should exist on most servers; you follow the instructions and presto! You have your own GitLab! It also has great upgrade instructions, so you can always be up to date.

I know GitLab can also be used through the cloud version (and it even has free private repos), but some organizations feel better if the source code of their projects stays inside the organization.


GitLab is massive, though. I think it's a great solution for teams on dedicated hardware, but if you need something quick, low-maintenance and lightweight, GitLab might be a bit overkill.


DigitalOcean has so-called one-click apps that they can deploy and set up on any droplet, and GitLab is one of them: https://www.digitalocean.com/products/one-click-apps/gitlab/

I have tried it before and it just works; it took me less than 5 minutes to have a fully working GitLab instance on a $10/month droplet. It's probably not the cheapest option, but it's very low effort, so it's quite worth it.


You can't get much quicker or lower maintenance than GitLab, though it is definitely not lightweight.


Gogs? Gitolite?


Or just plain git, as in TFA.


This is a good setup if you are the only one accessing the repo.

If you need something a little bit more complex, I would highly recommend gitolite for managing repositories & users. Configuration is done via some INI/TOML-like files in a git repo. User public keys are stored in the same repo.


gitolite has amazingly powerful access control, not just per-repo but per-branch or per-directory within a repo. It has very flexible self-service features, if you are OK with remote commands over ssh. Lots of nice scripting hooks.
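
In gitolite.conf that looks roughly like this (repo, user, and group names are made up; per-branch rules use refexes and per-directory write rules use "NAME" VREFs):

  repo project
      RW+                 =   alice       # alice may rewind/force-push anywhere
      RW  dev/            =   bob         # bob may push branches under dev/
      -   VREF/NAME/docs/ =   bob         # ...but not pushes touching docs/
      R                   =   @readers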

Gitolite home page: http://gitolite.com/gitolite/

I run a multi-tenant gitolite setup for the University of Cambridge: https://git.csx.cam.ac.uk/


I've run a ~30-user, ~1000-repo gitolite instance for a few years and while it has done its job admirably, there are a couple of things to know about:

It has a pretty clever config management system where the configuration is actually committed to a git repo. It's great if you're managing configuration by hand, but it was difficult to automate via Chef.

In order to let users create their own repos, we enabled "wild repos". It's as simple as a `git clone` with the desired repo name, which is great except that users often make typos and end up accidentally creating typo'd repos. The only people who can delete a repo are the user who created it and the server admin (by deleting directories). Perhaps there are better features I should have used?


Wild repos are the self-service thing I referred to. You can turn off auto-create, so users have to explicitly run the `create` command - I did this, and typos have not been a problem for my users. The other niggle is the `perms` command is not as easy to use as I would like. But on the whole gitolite with wild repos has been really low-maintenance.
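
For reference, the explicit flow looks roughly like this, assuming the `create` command is enabled (host and repo names hypothetical):

  ssh git@githost create sandbox/alice/experiment
  git clone git@githost:sandbox/alice/experiment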


Could you point to the documentation regarding per-directory access control? I find the documentation very difficult to navigate and I can't find anything relevant. I'm also not sure how it could work.


> I'm also not sure how it could work

Git repos can have an "update hook" that is executed during a push, after the new commits have been uploaded, but before the branch is updated to include the new commits. The hook can inspect the new commits, compare them to the old ones, and decide to reject the update. (It's a shell script, so every imaginable condition is possible.)
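
A minimal sketch of such a hook - this one rejects force pushes to master (the policy is invented for illustration):

  #!/bin/sh
  # hooks/update in the bare repo receives: refname old-sha new-sha
  refname="$1" old="$2" new="$3"
  zero="0000000000000000000000000000000000000000"
  if [ "$refname" = "refs/heads/master" ] && [ "$old" != "$zero" ]; then
      # non-fast-forward: the old tip is not an ancestor of the new one
      if ! git merge-base --is-ancestor "$old" "$new"; then
          echo "rejected: non-fast-forward push to master" >&2
          exit 1
      fi
  fi
  exit 0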


But that cannot restrict read access on a per-directory basis. I appreciate now that's not what you were referring to.

I did find http://gitolite.com/gitolite/list-non-core/#partial-copy-sel... which seems to almost do what I wanted, but that'll quickly explode into many copies of copies for any non-trivial situation. I know git can't do what I'm asking, but every now and then something comes along and gives me some hope before reality sets in.


I do have a colleague using a partial copy, but if you need fine-grained read access control, you either need lots of git repos, or a different version control system.


...meanwhile, if your access control needs are less sophisticated, you can create a common group for all your git users and use "git init --bare --shared=group" to set repository permissions correctly.

(You'll have to create separate UNIX shell accounts for all your developers sharing a common group with this setup, and this may or may not be a good idea.)
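
A sketch of that setup (group and path names are examples):

  sudo groupadd gitusers
  sudo usermod -aG gitusers alice
  sudo mkdir -p /srv/git/project.git
  sudo chgrp gitusers /srv/git/project.git
  cd /srv/git/project.git
  sudo git init --bare --shared=group   # group-writable, setgid directories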


I second gitolite. It was great for preventing force push by newbies :)


Hehe, I just realised I've been using gitolite for about six and a half years now. It exists primarily for my personal endeavors, but at the moment it hosts about 50 (small) projects and 10 users.

While the hardware underneath had some major failures (2 SSDs died during that period), gitolite always survived. I like it :-)


What about Gitlab? Isn't that more popular than gitolite?


I think the point here is providing a simple, lightweight solution. GitLab encompasses much, MUCH more than repository hosting.


My impression is that GitLab is a web application. It does not really serve the same use case. Web access is good for open source projects where people may want to read the source without downloading the full repo. In the case of private projects, chances are that everyone involved will want to clone the whole repo anyways.


I don't mean to be pedantic, but Gitlab helps to host your own private git repos, which fits the title of the article. The article goes further, giving an example of how you can do some bare metal git hosting on a lightweight VM.

It's the same as running postfix/dovecot for configuring your own mail server (which could run on a lightweight VM), or using a turnkey solution such as Zimbra (which will include spam/virus filters, LDAP, calendars and much more).

Depends on your organisation: how often do you create accounts? who can do the sysadmin work? etc. I'm happy to sit back and let managers handle account management, and to be able to put Gitlab/Zimbra on a job description if we need to hire someone.


Tried GitLab: it's incredibly messy. Haven't tried gitolite. One of my criteria with git repos is the education of new programmers, and I believe starting with a clean, lean UI gives an initial positive impression that Git is simple (surprising, eh?). I won't advertise my own company's product, but there are a few good UIs out there.


Why do you say it's messy? Would love to get feedback on improvements we can make :)


How many UI designers do you have per developer?

Giving you feedback is a one-off; but the right density of UX designers in your teams and user interviews in your process will teach you much more.


I've been using Gogs happily for two years:

https://gogs.io/docs/installation


Eh. The method described in this post (i.e. bare repositories on a filesystem and ssh transport) is perfectly sufficient for my personal use if I'm hosting. The minute there's a MySQL dependency, I'm a hard pass...

I buy the need for the gitlabs of the world when multiple users show up, if only because managing credentials is a pain. But for single user use cases I wouldn't waste my time.


The instructions say that you can just use SQLite, though.

"Based on your choice, install one of supported databases or skip this step"


> I buy the need for the gitlabs of the world when multiple users show up, if only because managing credentials is a pain.

gitolite has worked well for small multiuser teams for me. No difficulty with managing credentials and no need for a heavy solution.


Both gogs and gitea (forked from gogs) support using sqlite instead of mysql. This means there's no external database server to deal with, and for a small set of private repositories it performs quite well.
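
Assuming I'm reading the config right, it's just a setting in app.ini (exact key names and paths may differ slightly between gogs and gitea):

  [database]
  DB_TYPE = sqlite3
  PATH    = data/gitea.db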


I moved to https://gitea.io/ myself, largely because it seems to be more frequently updated these days.


Definitely the best git service for small teams! Better than gitolite and manual hosting, lighter than GitLab - the best for me.


Do you do code review in your teams? That's the main reason that I am using Gitlab over Gogs/Gitea for my team.



Assuming your home directory on the server is already backed up and you have SSH access, a repository for a single user is as simple as

  mkdir project.git; cd project.git
  git init --bare
And to clone

  git clone user@example.com:project.git


"Doing this is cheaper than paying GitHub, and it will give you the satisfaction of being a True Hacker."

Or I could work on problems that really matter and leave the sysadmin job to someone who gets paid to do it. ;)

More seriously, when I was in college, I spent a lot of time doing things like running my own mailserver, selfhosting various projects, etc. I learned a lot. But in the Real World, I don't want to be responsible for more than I have to be; off the shelf products are just better for me, most of the time.

The fact that Bitbucket and GitHub will pay a guy to run a git server for me is amazing (even if it is evidence of some sort of irrational enthusiasm on the part of VC firms). Why would I not want to take advantage?


Except running a git server is little more than having a place to store a git repo and giving ssh access to it. There's really no maintenance if you already have the server. What github et al provide isn't repo hosting so much as fancy UI tools on top of that.

Edit: seriously, I wonder how many people who just automatically go to github have ever bothered to try the simple act of creating a git remote on a file server on their own network, or even just to a different host. It's really easy, and it really underscores how simple it is to have your repo distributed without any 3rd party infrastructure. Once you see that, you see that putting a copy on a shell account on your hosted VM is dirt simple and requires almost no administrative burden.


On the contrary, there's a ton of maintenance, and the hosting providers like GitHub and Microsoft pay teams of people to deal with the infrastructure. (I've worked on both.) This involves not just the physical infrastructure like the servers, though of course there's that, but also maintaining the bits on disk. Your repository will get duplicated across multiple disks on multiple machines, perhaps in different availability zones, and then of course they're backed up to yet another location.

So what companies like GitHub and Microsoft provide is - yes - the fancy tools on top but also teams of professionals ensuring that your repositories are available quickly.


GitHub etc. need "a ton of maintenance" because of the "fancy tools", which are an unusually complex and sophisticated constantly evolving web application with many users.

A private source repository is far less demanding: it's almost never upgraded, and for system administrators it's just another server to keep running and another file system to back up.


I'm not talking about the fancy tools. I'm talking about just serving Git repositories, not about web applications.

GitHub is distributing your Git repository across multiple servers in multiple racks in real time for reliability and availability and is the world's largest Git repository hosting provider. Some nice conference talks discuss this, like from Git Merge: https://www.youtube.com/watch?v=f7ecUqHxD7o and GitHub Universe: https://www.youtube.com/watch?v=DY0yNRNkYb0

Visual Studio Team Services is hosting your Git repository across Azure, and is hosting the world's largest Git repositories. https://arstechnica.co.uk/information-technology/2017/02/mic...

I have my own Git server as well, and I agree that its maintenance isn't very demanding. But I'm not putting production bits on it. My open source repositories go to GitHub and my private repositories go to VSTS - they're providing a level of service that I simply can't match by myself.


I'm not talking about putting production bits on my own git server, either. I'm talking about how easy it is to create a couple of personal remotes just as a second and third copy of whatever I happen to be working on on my laptop. One goes to my NAS and one goes to my rackspace VM that I keep around for random projects.

I have github and bitbucket accounts, but for little projects, I much prefer the simplicity of effectively just having dupes of my repos on other machines of mine. And it's really nice that the way I interact with them is precisely the way I'd interact with github or repositories at work, despite the fact that they're just directories sitting behind an ssh connection.


Yeah, this is what I used to say when I was in college, too. ;)

"Just another server to backup" is, indeed, not a big deal. It's strictly more work than not having another server to backup, however, and when it fails, it fails hard--in the sense of, "Hey, I just blew Saturday fixing my personal git server" or "Hey, I just lost data because I realized my backup cron was broken."

Running your own server is a bug, not a feature. The fact is, it's a very _minor_ bug because running your own is so damn easy. But it's still a bug. (The fact that you can run your own server, on the contrary, is a feature.)


GitHub, Bitbucket, GitLab, et al. are more than git servers. They are code review, community management, issue tracking, and literally hundreds of integrations for all parts of the software development lifecycle.

Anyone with more than a personal pet-project will understand the value created by these services the moment they have to stand up the supporting items that make SDLC possible.


The problem (for me) is that I have no clue how to effectively manage UNIX users. I don't want my git user to have access to the full filesystem, and maybe I want to add another user for a friend who can only see game.git but not taxes.git, etc.

Using SSH on a local server works for some use cases (and I do use it) but it doesn't scale at all.


But the fancy UI tools are the thing people want. In the real world, after setting up ssh on your file server, you end up installing GitLab etc., which means maintaining a database, web server, SSL certs, and so on.


Well, I've been pushing my personal code to bare git repos on a rackspace VM for years and I never felt the urge to install gitlab or any of that other fancy stuff, so ... I guess I'm strange? To me, it's very nice to just keep it simple.


I did, because I wanted to have the experience of understanding how a Git server works. But I still like using GitHub. Their web UI is useful, and I don't have to maintain a server.


In the "real world" people like to keep their private source code private, and they run version control servers on their own LAN with tape backups or the like.

I personally set up a remote-access git repository on Windows, with Windows domain authentication (Apache+mod_ldap+mod_authnz_ldap), and it wasn't any more difficult than any other Apache installation; a more sensible platform would require a negligible effort.


Why do this when http://bitbucket.com will host private git repos for free? They also have Large File Storage implemented.

I use the GitHub client with Bitbucket for repo hosting with LFS and it works great; no need to host anything.


> Why do this when http://bitbucket.com will host private git repos for free?

Because you never know when one of these online services will suffer an outage, suffer a security breach that leaks your private repos or email & credentials, or even lose your data entirely. The fact that the service is free doesn't mean it comes without potential issues.


But... you are using another online service to host the service yourself, no? And presumably they are also throwing in backups and redundancy of some sort... certainly it would take you a minute to set those things up, test them, and monitor that they're working... And isn't GitHub basically like everyone's resume these days?

I don't know... at some point you sort of just have to trust someone... be it the hosting provider, or the service provider. And I'm old... but I've had to untangle issues with CVS / VSS / SVN / etc... over the years I don't want anything to do with that crap -- if I can punt it to someone else to manage I'm OK paying some tiny subscription fee.

I think there's been one day in the last ~10 years when I couldn't use GitHub. To me that seems worth the $20 a month, or whatever they charge now.


It's especially sad because git is distributed, and every time there's a GitHub outage I hear that some teams are blocked for the day. But everyone has the whole repo - you're not screwed the way you are when your SVN or Perforce server goes down. Anyone could become the new "remote to push to / pull from" until the outage is resolved, or you could set up one of these bare repos somewhere pretty quickly. When the outage is resolved, someone just pushes to the original and you're all fine.
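
Standing up a stopgap remote really is only a few commands (host and path invented for the example):

  ssh user@spare-box 'git init --bare /srv/tmp/project.git'
  git remote add stopgap user@spare-box:/srv/tmp/project.git
  git push stopgap --all    # teammates pull/push here until GitHub is back
  git push stopgap --tags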


IME, the popular-remote-service-is-down panics mostly seem to apply in web development world, where apparently any cool project must have an insane number of dependencies, no local failover or caching, and production build/test/deployment processes that are intimately connected to those dependencies that have no local equivalents. GitHub goes down, their entire CI and deployment system falls apart, and no-one has any idea what any of their fancy cloud-hosted doodads was actually doing at the underlying level of the Git repos or which files need to go where on the servers, so they don't know where to start with recovery...


This is what happens when people cargo-cult git because it's what the cool kids are doing, instead of actually making an informed choice.


s/git/GitHub/

Fixed that for you.


Well, that too, but the issue here (thinking that a central repo being offline means you can't use your decentralized version control system) shows that they don't understand git in any form.


I think it's honestly that some people think Git === GitHub, or that GitHub has some magic secret sauce that makes things work.


My company uses GitHub, and it's been down enough to significantly impact work about a dozen times in the past 3 years. There were periods before that when GitHub would be down enough to cause CI or CD builds to fail on a daily basis for months. Still, it's probably worth it for what we pay and the service we get for us to continue using GitHub, even with outages. However, for my personal repos, I host the primary repo on my own infrastructure, on my own network, on my own property, using only software that is licensed under a free/OSS license. I mirror public repos to GitHub as a way of publishing the work, but I'm not going to get caught having to make a tough/expensive decision when GitHub changes their service plans again.


Why have pictures stored anywhere else when https://facebook.com will host private pictures for free?


Many people (including me) like the idea of self-hosting because we feel more in control.


> Why do this when http://bitbucket.com will host private git repos for free?

Not everything needs to be in the cloud, and using any online service run by someone else brings some degree of reliability, security, privacy and longevity risk. Some of us just prefer to avoid those risks, and usually will unless there's some compelling benefit that outweighs them.


Or gitlab.com which offers private repos and also allows you to self host your own gitlab instance if you need to.


Because maybe for security reasons you want to host it on a server internal to your organization's network.


Hosting it yourself means that you don't have to muck with LFS. Want to check in a 7GB file? Go right ahead.


LFS is a tool you use to avoid checking in 7 GB files. If you want to weigh down your repo with binary bloat, don't configure your repo to use LFS.


Have you tried setting up LFS on your self-hosted ssh git repo?

Turns out LFS is designed to authenticate over https and doesn't work at all with ssh credentials out of the box because f* you.

I'm still hoping for a native git feature for large files, so the git-lfs crap can die in a fire.


LFS was developed by GitHub, a company that depends on people thinking it's "too hard" to run their own DVCS repo hosting.

Compare this with the Mercurial LargeFiles extension, which needs nothing more than a line in the .hgrc on each end (client/server) to enable it.


Very interesting benchmark of gitbucket vs gitea vs gitlab on a raspberry pi ;)

https://gitbucket.github.io/gitbucket-news/gitbucket/2017/03...


Security question. Can `git-shell` restrict users to their remote home directory? Or if you give me a git shell, can I still do things like `git clone me@example.com:/home/you/secret-sauce` ?

This is only an issue if you're sharing the box and/or remote repositories with other people. For shared remote repositories I've been using the following setup:

1. Create a bare, shared repository at `/var/git/foo`. Configure unix group permissions and the setgid bit on the directory (sketch after this list).

2. Give alice access via a `/home/alice/foo -> /var/git/foo` symlink.

3. Set alice's shell to a patched version of the git shell I call `git-home-shell` that sanitizes the repository path argument and makes it relative to her home dir.
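
A sketch of steps 1 and 2, assuming a `gitdevs` group (the patched `git-home-shell` in step 3 is my own and not shown):

  # --shared=group makes the repo group-writable and sets the setgid
  # bit on directories so new files inherit the group
  sudo git init --bare --shared=group /var/git/foo
  sudo chgrp -R gitdevs /var/git/foo
  # expose it inside alice's home directory
  sudo ln -s /var/git/foo /home/alice/foo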

Is there a better way these days?


Why not just set standard file permissions (owner and group)?

You could create a group for each repository, and add and remove members as necessary.


This is precisely what I do.

But, I'd prefer it if git-shell didn't let users probe and read git repositories at any absolute path on the remote end. That's not great behavior for a restricted shell.


Remove the 'others' read/execute permission from user home directories.


Sure, but there might be git repos sitting around elsewhere. Why risk exposing a git repo literally anywhere in the filesystem to a restricted shell account?


Then use a chroot.


You could do that, but that means for a shared repo and N git shell accounts you've got N chroots, presumably using null or bind mounts.

That's a lot more work than a restricted shell which just...restricts.


True, it all depends what your requirements are.

Personally I'd just designate a path for shared repos (e.g. /srv/vcs/<project>{.git,.hg} etc), and give people write access using ACLs and group membership. If they create repos in their home directories, that's their business.


For multi-user setups, the best solutions are probably the scripts/systems that automate this whole thing, like gitosis and gitolite. Multiple users log in as the same system user, authenticated by their SSH keys, and administration is mostly performed by committing to a special git repository.


Thanks for that. The gitosis approach of a single shared unix account (typically `git`) with git-shell and a master authorized_keys file definitely seems simpler. You lose some of the auditing and security benefits of the one-unix-account-per-human approach, but that might be fine for some use cases.
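
For reference, the entries gitolite generates in ~git/.ssh/authorized_keys look roughly like this (path varies by distro; key shortened):

  command="/usr/share/gitolite3/gitolite-shell alice",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA... alice@laptop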


Quick note: what this article doesn't explicitly say is that as long as you have a shell account somewhere with a decent amount of disk space, you can host or mirror all the repositories you want.

If I may make a suggestion, I'd recommend the Super Dimensional Fortress Public Access UNIX System (https://sdf.org/).

They're NetBSD-based if I remember correctly, and for a low fee ($36 lifetime ARPA membership + $9/quarter) you can host most of the things you would like to host.

And you don't have to do system maintenance.


I thought the ARPA membership was yearly?


Why self-host your repo but store backups without encryption at google or amazon? If you want it to be private, just make it so.

edit: thanks anyway for your version!


Privacy isn't binary, and there's a large difference between "Amazon could potentially read this, but they'd be breaking their own ToS and some laws to do so" and "Public on the internet".


But what is the difference with BitBucket then?

Besides, if you worry that state actors are interested in your source code, I do not see how Amazon being law-abiding would be of any help here..


If you don't need code review and don't mind a hosted solution, you can get by with the AWS free tier and use IAM for all your access control.

    AWS CodeCommit:
    5 active users per month
    50 GB-month of storage per month
    10,000 Git requests per month
    Does not expire at the end of your 12 month AWS Free Tier term.
https://aws.amazon.com/s/dm/optimization/server-side-test/fr...
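
Getting a clone working follows AWS's standard credential-helper setup, something like this (region and repo name are placeholders):

  git config --global credential.helper '!aws codecommit credential-helper $@'
  git config --global credential.UseHttpPath true
  git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/myrepo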


I do self-host a few repositories as well, but I do not set up a separate user: I just create a git/ dir in the home dir of the account I want to host the repositories on and put the repositories there.

To simplify the initial setup, I created a handy shell script, reposetup [1]. It makes it easy to create repositories, push to them, and remember their URLs.

[1]: https://github.com/agateau/reposetup


If you’re using Google Cloud, you can already use Google Cloud Source Repositories. https://cloud.google.com/source-repositories/ It supports git and the Beta release of Cloud Source Repositories provides free hosting of up to 1GB of storage.
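
If I have the commands right, creating and cloning are handled by gcloud (repo name is an example):

  gcloud source repos create myrepo
  gcloud source repos clone myrepo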


Yup and it can mirror Bitbucket which is also free. So you can just create it in Bitbucket and get free backups via Google.


As others have already stated, this article is a great introduction for when you'll be the only one to access the repositories, as anyone able to authenticate for that account will have access to all repos.

There are some tools that restrict access with varying levels of granularity, but if you just want to restrict access on a per-repo-per-sshkey basis, one of my projects is a simple shell script that does just that:

https://github.com/cbdevnet/fugit

It originally came to be because I've found gitolite too big to maintain for simply sharing some repositories with a few other people. It has since served me well and is used in some business applications, too.


Does anyone know of a good AMI or Docker container that's got all this already set up, as far as it's possible?

(I know it doesn't look complicated, but if there's a decent "standard" already out there...)


Gitlab has their own docker images, and it includes pretty much anything you could ever want out of a git service.

https://hub.docker.com/r/gitlab/gitlab-ce/


I've been quite happy with gitea/gitea.


Cloudron.io is easily the best for docker deployments.


Has anyone had any experience in getting Passbolt (https://github.com/passbolt) working for authentication with Git?


http://gitblit.com/ is also worth looking into. It is more sophisticated than gitolite (which I also use) and less hairy than gitlab. It is Java based, but doesn't require a database.

Also, for backup, rather than tar up the ".git" directory, I use "git bundle create <backupfilename> --all", which creates a flat file with all branches included. This file can then be uploaded to GCS or S3.
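
Concretely, the backup/restore round trip looks like this (filenames are examples):

  git bundle create project.bundle --all     # one flat file, all refs
  git clone project.bundle restored-project  # restore by cloning the bundle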


If you want something lightweight and "run and done", then check out GitBucket [0]. You only need the JVM & git installed. When you run the program it sets everything up. Very easy, clean interface, and simple to back up (I just snapshot the entire folder).

[0] - https://github.com/gitbucket/gitbucket


Just do it in google drive or dropbox, get backups for free.

https://stackoverflow.com/questions/1960799/using-git-and-dr...


I do this often. One thing: if you want multiple users all working on the same repo, which I believe is a common use case, you have to take group permissions and umask into account. Works really nicely.


Kallithea is good enough for this use case. We are using it in a 15 member team, without any issue.


Anyone use AWS's CodeCommit as a mirror for your GitHub repository?


Mmm. Just use GitLab/Gogs on cloudron?


What this article doesn't cover is how to set up the things that make GitHub so useful, like issue tracking and pull requests. GitHub (and similar services like Bitbucket) are more than just Git.


pip install git-remote-dropbox

It's a git extension that allows Dropbox to be used as your remote. Works from the CLI. Been using it for two years.
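
Usage looks something like this, if I recall the URL scheme correctly:

  git remote add dropbox "dropbox:///projects/myrepo"
  git push dropbox master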


I use Gogs and a self-hosted sandstorm.io.


The gogs package is ancient. I would be very careful of running outdated stuff, much less recommending it to others.


RhodeCode is a good option for hosting, it has an advanced permission system, streaming push support and it scales well.


Thanks for the pointer. Your service looks interesting!

A little advice: it's probably best to post a disclaimer that you are the founder of RhodeCode whenever you post about it on HN. OTOH, you get credit for including that info in your HN profile.



