
I work for a non-profit open source organization that collaborates on github (https://github.com/edx/). We have lots of people who aren't employees, but have signed a contributor agreement with our organization and contribute changes to our software. Our bill will go up from $200/month to over $2000/month with this new pricing. We can afford it (it's still a small fraction of our AWS bill) but it will force us to look at other alternatives. Github's code review tools are already pretty mediocre compared to other tools like gerrit, and we've long since moved off of github issue tracking due to lack of features compared to JIRA.

I've used Bitbucket in the past. They charge per-user[1], and their pricing is significantly better: free for up to 5 users, then $1/user up to 100 users, and then $200 for unlimited users.

[1] - https://bitbucket.org/product/pricing
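
The tiers quoted above can be written out as a small function to make the break points explicit. This is only a sketch of the comment's description, not Bitbucket's actual billing logic. It assumes that once a team passes 5 users every user is billed $1 (not just the users beyond the free 5), and the function name is made up for illustration.

```python
def bitbucket_monthly_cost(users: int) -> int:
    """Monthly cost in USD under the tiers quoted in the comment.

    Assumption: past the free tier, every user is billed, not only
    the users beyond the first 5.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    if users <= 5:
        return 0        # free tier
    if users <= 100:
        return users    # $1 per user
    return 200          # flat rate for unlimited users


for n in (5, 6, 100, 101, 500):
    print(n, bitbucket_monthly_cost(n))
```

Under these assumptions the flat rate only kicks in at 101 users, so a 100-user team pays $100 and a 101-user team pays $200.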


I use Bitbucket for a variety of small, personal projects. And when I teach Git courses, I use Bitbucket to illustrate (and practice) pull, push, and pull requests (among other things). Sometimes people wonder why I'm not using GitHub, which has made itself synonymous with Git for many people, but after a short explanation, they understand.

However: GitHub has a huge network effect. Most developers with any sort of open-source connection have a GitHub account; many fewer have Bitbucket accounts. This isn't the end of the world, but it does mean that GitHub has a leg up on the competition as far as name recognition is concerned.

Plus, there are lots of tools that talk to the GitHub API.

So, while I personally endorse using BitBucket, I can understand why many would stick with GitHub. It'll be interesting to see what this price change does.

This. I use bitbucket too and I can't understand why people are still paying github when bitbucket does exactly the same and is free.

Github does seem to be the gold standard for open source projects.

Having said that, I used to pay for Github's personal plan and found the cap on private repos (5 when I subscribed) to be a little too limiting. I ended up canceling the subscription and using a Bitbucket free account for private repos and Github for public repos.

After using Bitbucket for a while, I think if I had to upgrade to a paid service, I'd just stick to Bitbucket.

I also use Bitbucket and am very pleased with it. It seems like their model is that bitbucket is well integrated and complements their paid products like JIRA and Teamcity, so they are not depending on free users to upgrade like github needs to, so it seems more likely to be sustainable.

I will tell you one reason: Bitbucket is unbelievably slow where I live (Tokyo). Using it is mostly a matter of gnashing your teeth and waiting.

GitHub is actually painfully slow too; just not as slow.

That's the main reason I pay GitHub and use them for those projects where I need something more than just a git server for collaboration.

People use github over bitbucket if they like the features that github provides, the diffs, issues etc. Not sure which features are unique to github alone, but each provider (including gitlab) has their own flavor. Many are just used to github flavor.

I use bitbucket at work, and it has many of the same features as github. The UI is a bit clunkier, and it does lack some of the flashy features such as automatically squashing commits when merging a pull request. However it does work fine, and it is well integrated with atlassian's other offerings which we also use. Mainly JIRA and hipchat, though bitbucket also has a per-repository issue tracker.

An aside, but I hate that squash feature more and more each day at my job.

Could you elaborate? Is it used for every pull request? That would be annoying, but sometimes a developer leaves a trail of "work in progress" commits, so I'll wind up squashing those anyway.

These services are not the same, not since I last checked. Someone forks your repo, makes a change and then submits a PR. I can't pull their changes down until they give me read access to their forked repo.

I found a lot of these little annoyances to be a big reason to switch to github. Not to mention their UI pales in comparison to github, which I think says a lot.

I've never understood why all the fork/change/commit/push/pull-request fol-de-rol is necessary. Why can't we just pull, hack, and submit a "push request"?

But I guess I've never understood why everyone else seems to be OK with a build/repo system where it is even possible to break the build at all; it's always seemed to me like it should be set up so you push to a testing stage, which merges to master if the build succeeds and the tests pass, but otherwise fails and sends you an error log. Never seen a build system set up that way, though, and the couple of times I've looked into the matter it seemed like I'd basically have to code it up myself, which was more effort than I was willing to put into it.

With such an architecture, the idea of a "push request" would just be a manual OK step in addition to the automated test validation step.
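
A rough sketch of the gate described above, assuming a repository can be modelled as a plain list of change descriptions. Every name here (`gated_push`, `GateResult`, `fake_build`) is hypothetical and stands in for a real build system. The point is only the ordering: test a candidate first, then optionally ask for a manual OK, and move master last.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GateResult:
    merged: bool
    log: str


def gated_push(
    master: List[str],
    change: str,
    build_and_test: Callable[[List[str]], Optional[str]],
    approve: Callable[[str], bool] = lambda change: True,
) -> GateResult:
    """Merge `change` into `master` only if the candidate passes the gate.

    `build_and_test` returns None on success or an error log on failure.
    `approve` is the optional manual OK step (the "push request").
    """
    candidate = master + [change]      # staging copy; master is untouched
    error_log = build_and_test(candidate)
    if error_log is not None:
        return GateResult(False, error_log)
    if not approve(change):
        return GateResult(False, "rejected by maintainer")
    master.append(change)              # only now does master move
    return GateResult(True, "merged")


def fake_build(changes: List[str]) -> Optional[str]:
    # Hypothetical stand-in for a real build: any change containing "bug" fails.
    broken = [c for c in changes if "bug" in c]
    return f"build failed: {broken}" if broken else None


master = ["initial commit"]
print(gated_push(master, "add feature", fake_build))
print(gated_push(master, "introduce bug", fake_build))
print(master)
```

Because the failing change is only ever applied to the candidate copy, master never contains it, which is the "impossible to break the build" property the comment asks for.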

It's not necessary, it's a waste of development time and easily the biggest big-picture collective failure of our software engineering profession of the last decade. It easily beats out anything from the $foo.js world and the ongoing low-level security nightmare of web application development, because git, and more importantly, unnecessarily complex and error-prone git workflows have seen adoption across all kinds of software. There are probably a dozen projects in the world (the kernel admittedly being one of them) that are justifiably a good fit for the complex git-native workflows that have become standard practice across the industry today.

But if you're doing it the old-fashioned way, you might as well use Subversion. Or mercurial, if you want all the local history, with the added bonus that unlike git it sensibly keeps the least-surprise semantics of 'commit', 'revert', and other commands that merely have a 30-year history of expectations that held true prior to git. But Mercurial was not authored by Linus, nor does it have the impenetrable, otherworldly data model that a first-time version-control-system author would unavoidably end up concocting in scratching their itch without consulting the existing, completely satisfactory, solutions that served us well for decades, which greatly reduces the number of interesting topics you can blog about for Mercurial.

And so, git sees the adoption, github gets the $2bil valuation, and even bitbucket ends up switching to git as its default. It won.

Software engineering, collectively, has a lot of maturing to do.

I have been using git for years and I still find its interface completely inscrutable. I have given up trying to learn it in a way that makes it make sense, and simply use a handful of everyday commands I've memorized by rote and look everything else up when I need it. I can't think of any other piece of software I actually use which has such a messy, non-predictable interface. Even 'make' eventually succumbed to rational analysis when I finally managed to suppress my nausea long enough to dig in and learn it as though it were a real language.

This. Even simple things like "undo last commit" have hard-to-remember syntax that I always have to google. I wish darcs was more successful.

> I have given up trying to learn it in a way that makes it make sense, and simply use a handful of everyday commands I've memorized by rote and look everything else up when I need it.

? Here are the commands that I have used and make sense to me:

branch, tag, log, diff, push, pull, fetch, commit, rebase (with or without -i), reset, add, rm, mv, stash, status, remote, bisect, reflog, blame, and fsck.

Is this set of commands more or less the same as the set of commands that you've learned by rote memorization?

Commonly: pull, push, commit -a, reset --hard HEAD, checkout $FILE, status, log, show $HASH.

Sometimes: commit --amend, rebase -i, add, rm, mv, stash [pop|apply].

Rarely: branch, revert.

Git's documentation uses such a wide and flagrantly inconsistent variety of terminology and maintains such a poor distinction between its interface and its implementation that trying to read it actually worsens my understanding and reduces my confidence. I get everything useful from stackoverflow and ignore the docs at this point, and have thus resigned myself to using git as a form of voodoo.

Subversion was so much clearer; I wish it had done a better job with merges and hadn't been so server-dependent. Mercurial seemed to actually care about its interface design, and the DVCS experience might suck less if it had won, but I've never had a chance to actually use it.

Thanks for the reply!

I agree that some of the commands have a very large array of options. I see many of these options as very specialized tools (and certainly don't claim to know all (or -in some cases- most) of them). I, too, find the "git config" command to be largely useless. Its value is in scripts or in Git frontends. Also, the git-config manpage has a complete listing of all valid git config options. So, there's that. :)

> Commit, checkout, and reset seem rather more complicated than necessary and don't do anything useful in their default forms...

When called with no args, commit records changes staged with add/rm/mv in the local repo's history. Checkout changes the tracked contents of the working copy to that of another point in the repo's history (so it is meaningless to call it without an argument), and git reset is destructive (and has no --force option), so it makes sense to require an argument. [0]

In regards to commit and checkout:

I came to git by way of Subversion. These two confused me for quite a while. What helped me to understand the logic behind them was to realize that -unlike SVN- git has

* The working copy, which is manipulated with a bunch of commands

* The area where changes that will be included in the next commit live, which is cleared out after every successful commit, and is manipulated by add, rm, and others

* Your local repo, which is manipulated with commit and checkout

* One or more remote repos, which are manipulated with push, pull, and merge

But maybe you already had this solidly in mind, and this explanation was a waste of your time. :(

> As always, trying to read the documentation leaves me with less understanding than I had before I started.

Have you familiarized yourself with a significant fraction of git's vocabulary? The man pages became much clearer once I did so. [1]

> ...for reasons I cannot comprehend they also get involved in merge resolution, where they perform tasks with no visible relationship to their names or their normal jobs.

Oh. That's because a merge operation adds a series of commits from one or more branches into another branch and effectively makes a new commit with the result of the operation. If conflicts can be automatically resolved, then they are. If they cannot, then it's up to you to stage the changes you want to see in the merge commit (using add and friends) just like you would do when preparing any other commit. Does that make sense?

> I have no idea what reflog would do; is there a flog command too?

Nah. It's a command for examining and manipulating the reflog, which is -effectively- where git makes a record of every change that happened to your repo. You pretty much never need to use the command, but I have used it to see just how git handled a set of complicated squash and commit reorder operations. From the man page:

  Reference logs, or "reflogs", record when the tips of branches and
  other references were updated in the local repository. Reflogs are
  useful in various Git commands, to specify the old value of a
  reference. For example, HEAD@{2} means "where HEAD used to be two moves
  ago", master@{one.week.ago} means "where master used to point to one
  week ago in this local repository", and so on. See gitrevisions(7) for
  more details.

If you've gone on a mad history rewriting spree and have confused yourself (or simply accidentally moved a branch a while back and can't remember where it used to point), you can use reflog to trawl through the change history to save yourself.
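
To make the `HEAD@{n}` idea concrete, here is a toy model of a reflog as a newest-first list of positions. It is an illustration only: the class name and the `c1`/`c2`/`c3` commit ids are made up, and real git stores much more per entry.

```python
class ToyReflog:
    """Toy record of where one ref (e.g. HEAD) has pointed, newest first."""

    def __init__(self):
        self.entries = []  # list of (commit_id, reason)

    def move(self, commit_id, reason):
        """Record that the ref now points at commit_id."""
        self.entries.insert(0, (commit_id, reason))

    def at(self, n):
        """Where the ref pointed n moves ago, in the spirit of HEAD@{n}."""
        return self.entries[n][0]


head = ToyReflog()
head.move("c1", "commit: first")
head.move("c2", "commit: second")
head.move("c3", "commit: third")
head.move("c1", "reset: moving to c1")  # the "accidental" rewrite

print(head.at(0))  # c1, the current position
print(head.at(1))  # c3, the position before the reset
print(head.at(2))  # c2, two moves ago
```

In this model the commit that the reset appeared to throw away is still reachable one move back, which is the kind of rescue described above.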

> I am generally more inclined to use stash or a second working directory than to deal with branches, since it's less busy-work.

I'm curious. What busy-work do you have to do? I typically just have to do: "git branch whatever; git checkout another-branch; git branch -D some-other-branch".

[0] Though -conceptually- a substantial portion of reset's functionality overlaps with checkout's functionality. So, that's silly and nonsensical.

[1] Not that I'm implying that such a thing is required to use git, mind.

Thanks for your thoughtful dig into these commands.

Perhaps my biggest point of confusion with git comes from that nebulous intermediate structure which sits between the real working directory and the real repository, which sort of acts like a repository and sort of acts like a working directory. It has many names and no clear purpose, and it doesn't fit into my mental model of the work to be done when working with a VCS.

Your explanation of merges makes more sense from that context. I don't think of add/mv/rm as operations on the nebulous repository, because I don't have any idea why one operates on the nebulous repository; what I'm trying to do is tell git to track a file, or stop tracking a file, or notice that I've moved a file from one place to another. The fact that these operations also kind of half-commit changes to this semi-repository is just... confusing, because I don't know why one would care.

If the pseudo-semi-repository thingy actually made sense, perhaps it would seem more natural that add/mv/rm do things to it during merges. I suspect the behavior of checkout, reset, and commit might also make more sense; as is, they seem to be needlessly complex, because I am never manipulating the semi-repository on purpose: I'm either trying to move my changes from the working directory into the local repository, or I'm trying to update my working directory to match the local repository, but in no case am I ever trying to half-update the intermediate state I can't actually see.

Given this somewhat confused explanation and the fact that you've done a great job of explaining what git is doing so far, can you point me at something not written by the git authors that explains what the hell is going on here and why? I would like to understand the tools I'm using instead of just blindly typing arcane rituals cribbed from the internet, but as I said before trying to read the git documentation just leaves me more confused than before I started.

Is the semi-repository thing you are talking about the index?

If so, I would describe the index as a sort of staging area for preparing your commit. You might not necessarily want to include every single change in your workspace in your next commit. The index allows you to pick which things you want to go into the commit then git-commit creates the commit from what's in the index. If you don't care for such behaviour, and you just want to commit all changes in tracked files in your workspace, git-commit -a does that.
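
A toy model may help show why the extra step exists. This sketch treats the working copy, the index, and the commit history as plain dictionaries. It is not how git stores anything, and `ToyRepo` and its methods are invented names. It only mirrors the behaviour described above: a commit is built from the index, and the `all_tracked` flag plays the role of `commit -a`.

```python
class ToyRepo:
    """Toy model: working copy -> index -> list of committed snapshots."""

    def __init__(self):
        self.working = {}   # path -> contents on disk
        self.index = {}     # path -> contents staged for the next commit
        self.commits = []   # committed snapshots, oldest first

    def add(self, path):
        """Stage the current contents of one path (like `add`)."""
        self.index[path] = self.working[path]

    def commit(self, all_tracked=False):
        """Snapshot the index; with all_tracked, restage tracked files first."""
        if all_tracked:
            for path in list(self.index):
                if path in self.working:
                    self.index[path] = self.working[path]
                else:
                    del self.index[path]   # tracked file was deleted
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.working = {"a.txt": "one", "b.txt": "one"}
repo.add("a.txt")
repo.add("b.txt")
repo.commit()

repo.working["a.txt"] = "two"
repo.working["b.txt"] = "two"
repo.add("a.txt")                       # stage only one of the two edits
print(repo.commit())                    # {'a.txt': 'two', 'b.txt': 'one'}
print(repo.commit(all_tracked=True))    # {'a.txt': 'two', 'b.txt': 'two'}
```

The second commit contains only the change that was staged, even though both files had been edited in the working copy.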

This[0] is one of the best tools I've seen for understanding git commands and even a bit of how git works. It's interactive, divides things into the different 'places' that content can be in git and then shows you how each command moves content between those places. Click on the workspace and it will show each command which does things to content in your workspace, the bar the command is written on shows what the other area it interacts with is and the direction that the command moves content. I hope it helps. [0] http://ndpsoftware.com/git-cheatsheet.html

paddyoloughlin's explanation of the index is a good explanation.

I might add that if you didn't have the index, then you could not make a commit that contained an add, a rename, and a deletion. If you think far too deeply about how Subversion handles these operations, it becomes clear that -conceptually- Subversion had an index/"staging area", too. Ferinstance, the output of 'svn help add' says

  add: Put files and directories under version control, scheduling
  them for addition to repository.  They will be added in next commit.
  usage: add PATH...

What's git's index but a list of changes that have been enqueued to be performed with the next commit?

The big conceptual distinction between SVN and git is that with SVN you tie exactly one repository to a given working copy. In git you can tie multiple repos to a given working copy and (by default) one of those repos is stored in the same place as the working copy.

Does that make sense, or is the index still somewhat-to-rather unclear and/or mystifying? (I mean, other than its kinda crappy name.)

> ...can you point me at something not written by the git authors that explains what the hell is going on here and why?

If "here" is "with git in general", I read the Git Book [0] ages ago, and combined what I learned from it with a fair amount of fucking around with my repo, and also with the contents of the gittutorial(7), gittutorial-2(7) and (parts of) gitglossary(7) man pages. [1]

From looking at the ToC of the Git Book, it looks like chapters 1, 2, 3, and 7 would be relevant to your interest. Chapters 5 and 10 might be relevant. I can't offer any guarantees, as I last read the Git Book ages ago, and this looks like it's a new version... the one I read didn't make any mention of Github.

Though, if you were asking about something more specific, I'm happy to take a stab at answering that question once I know what it is. :)

[0] https://git-scm.com/book/en/v2

[1] Even though you asked for things not written by the git guys, I got a fair bit of value from the official git tutorials. It's also possible that you've overlooked them, so I bring them up.

Very sorry to have rewritten my comment out from under you - I took a look at it, decided it was needlessly verbose, rewrote it, and promptly dropped into a subway tunnel... So my edit actually went through somewhat later. Now I wish I hadn't bothered!

You've nothing to apologize about! :)

I'm often overly verbose, but am typically too lazy to write shorter comments. I considered re-working my comment to address your edited comment, but I think that it covers both comments.

I agree, especially when all you really want to do is make a small change. (I'm less likely to contribute then - preferring to open an issue and nag someone on the team/who already has a fork to implement my one-liner!)

But I think it would look pretty radically different architecturally - where would such a 'push request' live? You don't have push rights on the repo; nor do you have your own fork.

It needs to work in a Gitty-way - if not GH would anger far more people than one-line contributors. And Git needs access to a repo to which to push.

I suppose a fairly nice solution might be something like:

- clone
- edit
- push --set-upstream origin gh-pr-<my-patch-name>

Github could then respond by mocking push-rights for the repo, but really creating a new 'hidden' repo; and mirroring commits on that branch to a PR opened on the original repo (to which you don't actually have anything beyond read-access).

When the PR merged they could delete the 'stealth repo'; of course if you wanted to maintain your own fork it could still work the way in which it does today.

A "push request" would just be a git-format-patch on the contributor's side and a git-am on the project maintainer's side. Github wouldn't have to create a separate repository - they'd just have to keep an incoming "mail" spool for each repo and show some UI letting the project maintainer review these patches and approve them or deny them. Maybe there actually would be an email address representing "push requests" to the repo, but more likely I'd imagine github would just offer an upload box on the project page where you could post a patch file.

This would be so much easier than the pull-request dance that I bet it would lead to a lot of simple fixes or improvements being contributed by people who otherwise wouldn't get involved at all. Those have value in themselves - but it would also go a long way toward helping recruit new project members, if people could dip their toes in easily before having to go through all the mumbo-jumbo of a private fork and configuring the upstream and all that.

> I've never understood why all the fork/change/commit/push/pull-request fol-de-rol is necessary. Why can't we just pull, hack, and submit a "push request"?

I've always wondered why GitHub doesn't automatically add an "upstream" remote when forking a repo. Once you fork a repo in the GitHub UI, there is no way to pull changes from the upstream repo other than manually creating your own "upstream" remote.

Thanks - I am unfamiliar with gitlab but will have to check it out.

Network effects make github more valuable, to an open source organization, than self hosting or hosting on another provider. Almost everyone has a github account and asking users to sign up for another account is a significant barrier to contribute.

Neither service charges for public repositories. We're talking about private ones here.

I'd be more careful with any offering from Atlassian. They recently significantly increased Crucible/FishEye prices. IIRC unlimited plan used to start at 200 users, now they added more tiers but kept the prices of the original tiers.

If you are a non-profit, your bill can be $0. You just need to sign up here[0] and send in the data requested.

[0] https://github.com/nonprofit

It doesn't seem this is valid for academic nonprofits (from the linked page) so I don't think EdX would qualify.

That's right. I didn't notice it was academic. GitHub has a similar deal for education[0]. The bottom of the page has a link to apply for a free educational org.

[0] https://education.github.com/

> We have lots of people who aren't employees, but have signed a contributor agreement with our organization and contribute changes to our software.

So you have volunteers, working on your proprietary, private software for free. The labor is free & now you're complaining that you'll have to pay a per-free-laborer fee for the infrastructure to manage all these free-laborers? I hope I'm missing something here...

The problem being that a minor volunteer who donates 4 hours of coding over the course of a year would now incur a $108 github bill for having that access. It's totally out of proportion with how much they're using the service.

Or say you have 80 very-part-time contributors who together match the output of 1 full-time employee, github is going to charge you as much as they would for 80 full-time employees.
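
The arithmetic behind these two examples, taking the $108 per seat per year figure at face value. Dividing it by 12 suggests a $9 monthly seat price, but that is inferred here rather than stated in the comment, and the function names are made up for illustration.

```python
ANNUAL_COST_PER_SEAT = 108  # USD per seat per year, the figure quoted above


def annual_seat_bill(seats: int) -> int:
    """Total yearly bill when every contributor fills a paid seat."""
    return seats * ANNUAL_COST_PER_SEAT


def cost_per_contributed_hour(hours_per_year: float) -> float:
    """Seat cost spread over the hours one contributor actually works."""
    return ANNUAL_COST_PER_SEAT / hours_per_year


print(ANNUAL_COST_PER_SEAT / 12)        # 9.0 per seat per month (inferred)
print(cost_per_contributed_hour(4))     # 27.0 per hour for the 4-hour volunteer
print(annual_seat_bill(80))             # 8640 for 80 very-part-time contributors
print(annual_seat_bill(1))              # 108 for one full-time employee
```

On these numbers, the 80 part-time contributors cost 80 times what the single full-time employee with the same output would.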

Any pricing structure is going to have some people who get a great deal and other people who get screwed, but it sucks when you've selected a platform, invested in getting set up on it, and then have the pricing rug pulled out from under you.

The software is AGPLv3'd, and run by hundreds of educational organizations around the world. Those organizations typically contribute changes back via Github. Non-employees don't contribute to our private repositories. We gain quite a bit from maintaining a large open source community, but it's not "free labor."

I apologize- I thought volunteers were contributing to your private repos. If I understand correctly, your issue is that you have (say) 10 employees accessing private repos and 100 contributors to public/Free ones, but you're to be charged for all users you add to your org? I can see how this would be frustrating.

I see two possible solutions that don't force you to switch vendors:

1. Have non-employees fork & submit pull requests.
2. Split your private stuff off to a different org & formally separate free stuff from proprietary, make the free stuff community managed.

If these are problems, I'd maintain that this is a "have your cake and eat it too" problem, on the one hand keeping ownership & control of the project and reaping the attendant benefit to the edx brand, and on the other hand getting people to hack on your stuff for free. But in any event this is a broader existential issue that exists across the OSS world right now (see express.js), so I'm probably reading too much into your case. :)

From the announcement:

"These users do not fill a seat:

Outside collaborators with access to only public repositories"

Ah, I didn't see that part of the announcement at all. That makes the new pricing much closer to what we were paying before. Thanks for pointing it out!

oh yeah. That's probably why I assumed "collaborators" must have had access to private edx repos.

You are jumping to a whole lot of conclusions there.

Why are you assuming that the labor is free? And regardless if it is or isn't, how is that even relevant? GitHub has no idea how much or how little a contributor is paid for their contributions.

If you are non profit and open source do you really need private repos?

We have about 100 private repos at the moment, including internal tools, branding related components, infrastructure code and pre-release stuff that we're not developing in the open. We may just move those to AWS code commit, gitlab or gogs and switch back to a free org.

You're very welcome to switch to GitLab.com, we are free forever https://about.gitlab.com/gitlab-com/#why-gitlab-com-will-be-... and it can import directly from GitHub.

GitLab CE (Community Edition) is free and is truly great; we use it in our own internal software development process.

However, bigger enterprises require more functionality; here is a comparison of the differences between the Community and Enterprise Editions.


There are several pricing options for EE, but essentially the base cost is $39/year per user.


gitlab.com runs EE though doesn't it? You only need to pay if you want self hosted EE.

It does, yes :)

Great to know, thank you! However, our requirements were offsite, which must be why I never knew about the free EE hosted on GitLab!

I've only ever seen the trial but I could be wrong!

You are replying to the CEO of Gitlab... :P

Which is very OK :) BTW we published a blog on SaaS git pricing a minute ago https://about.gitlab.com/2016/05/11/git-repository-pricing/

Is gitlab still using MySQL or its variants behind the scenes?

The gitlab import is really SLOW and there is no clear indication of the progress. Try to import https://github.com/torvalds/linux.git, it takes forever.

FWIW, I work for an academic non-profit and we use public repositories with an appropriate LICENSE. We like it because we don't need to manage read permissions.

What kind of bad things will happen if people are able to see what you are doing?

Your first comment stated the problem was your large number of non-employee contributors. You said in another comment that non-employees don't contribute to your private repos.

I don't see the problem here. Your employees will have access to your private repos, and the volunteers will not (thus you won't be paying for their seats).

"that we're not developing in the open"

Honest question... why?

I totally understand the mindset of "gotta go all secret squirrel to protect our profits" but if your org isn't in it for the profits there's not much to protect?

I have seen examples of people performing very naughty acts, like using private repos to hold plain text passwords, plain text cloud service keys, plain text corporate credit card numbers for expense payments, etc.

There are a ton of reasons not to develop in the open, no matter what your structure.

- You're experimenting
- You don't want comments from the peanut gallery while things are in progress
- It is not for external use, specific to an institution or project, or otherwise nobody else will care
- It deals with something sensitive
- You've made an agreement with someone else that requires it
- etc. etc. etc.

People seem to have weird notions about nonprofits. Your tax structure doesn't change the fact that you operate in a world of other human beings.

Mostly these reasons. We also want to make sure that code we open source is properly documented, has appropriate functional tests, and is useful outside of our organization. Our typical workflow is to build a POC, then an MVP, then build out documentation and unit tests.

Non-profit organizations still have a mission that they need to protect, and they almost always have revenue they care about, though not profit.

Private repos are a good way to review code for things like plaintext passwords and service keys before it's in production. If a developer commits something with a key, and code review goes "Oh, you shouldn't have put that there," and it was public, now you have to rekey. Private repos allow that code review step to take place.

(They're also pretty useful for legacy code where eliminating all the private keys is difficult and not an immediate priority, and for the rare but existent cases where including private keys in source is the right engineering tradeoff for new development.)

There's also no way to disable pull requests and other outside comments on your code, other than making a private repo. Having it private is a simple way to avoid inviting the public to have opinions all over your repo.

My first job out of university was at a not-for-profit, and it is a surprisingly cut-throat sector.

We had two main competitors in our space, and while the ultimate goal for everyone including our competitors was to do a common good, we were competing for a limited pool of donation dollars.

Because of that, sharing any intellectual property that made us better at what we did (i.e., raise more money, hire more staff, fund more initiatives) could result in a competitor using that same IP to put us out of business.

I get that in the big picture, it's not the way things should be done, but in the small picture, you're usually talking about individuals with their own agendas.

In a non-profit that I collaborate with [1] we use private repos to keep the server setup and some tickets that contain sensitive information (user data). All other code is open source. Obviously we don't want to keep api keys etc. in the public repos.

[1] https://github.com/sozialhelden/wheelmap

You shouldn't be keeping api keys or other sensitive information in git at all. And please note -- if you do remove it from git, it will be available in your git history so that needs to be taken care of as well (should the repo ever become public -- a common "exploit").
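
The point about history can be illustrated with a toy model in which a repository is just a list of snapshots, oldest first. This is not git's storage format, and the file name, key value, and function name are all made up. The sketch only shows why checking the latest state is not enough.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # path -> file contents at one commit


def commits_containing(history: List[Snapshot], needle: str) -> List[int]:
    """Indexes of every snapshot in which some file contains `needle`."""
    return [
        i
        for i, snapshot in enumerate(history)
        if any(needle in text for text in snapshot.values())
    ]


history = [
    {"settings.py": "API_KEY = 'example-not-a-real-key'"},  # key committed
    {"settings.py": "API_KEY = os.environ['API_KEY']"},     # key "removed"
]

# Looking only at the latest snapshot, the key appears to be gone.
print(commits_containing(history[-1:], "example-not-a-real-key"))  # []
# Looking at the whole history, it is still there in the first commit.
print(commits_containing(history, "example-not-a-real-key"))       # [0]
```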

Git is just a format for storing data with a record of how that data changed. Saying you shouldn't store it in git seems rather like saying you shouldn't store it in btrfs. It's true that if your btrfs disk image becomes public, the data is recoverable, and it's hard to reliably scrub deleted files from btrfs, but that doesn't mean it's the wrong tool for a filesystem (or a repo) that stays entirely internal.

Saying that you shouldn't keep it on GitHub is different, and I might be more inclined to agree with that, but it still seems like it's not a 100% rule.

> Saying that you shouldn't keep it on GitHub is different

I'd be willing to argue about that, but for my newrelic api key, a private github repo is sufficiently safe - even if I'd prefer if nobody starts having his servers report to my account.

A git repo is usually shared over multiple machines/developers, so the chance of someone publicizing it is larger. The entire history is also usually copied everywhere.

All of these machines and developers have legitimate access to the secret in question, though. Hence my framing of git as just a file storage format—any other mechanism provides the technical means for any of these machines or developers to publicize it. (And a few other simple mechanisms, like "scp the secret from another machine" or "copy/paste it with your terminal", have an increased risk of doing so by accident. Accidentally making a git repo public is generally unlikely.)

Why would that be a general rule? I track my personal passwords in git (using pass). I'm the only person with access to that repo. I just like to have a history - and the convenient way of moving files around and merging changes.

> Saying you shouldn't store it in git seems rather like saying you shouldn't store it in btrfs.

Best practice is to avoid storing secrets in plaintext, or sharing secrets between users/roles. Yours isn't an argument in favour of git, it is an argument against btrfs.

(I don't have any problem with storing passwords in that way, I'm just pointing out why it's not the best practice.)

> Best practice is to avoid storing secrets in plaintext

How do you store them, then? If they're encrypted with a password, how do you store that secret?

I'm pretty sure best practice is in fact to store things like SSL private keys, cookie HMAC secrets (e.g. Django's SECRET_KEY), and so forth on local disk unencrypted, protected by only filesystem permissions (and the host OS as a whole protected with standard means). In fact I'm not even sure it's possible to store OpenSSH private keys encrypted.

> or sharing secrets between users/roles.

There's only one role here: the application that has an API key. There are multiple developers of that application, and possibly multiple instances of that application, but it's a single role.
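To make the "unencrypted on local disk, protected only by filesystem permissions" idea concrete, here is a minimal sketch. It assumes a POSIX system, and the function names `write_secret` and `read_secret` are made up for illustration; nothing here is from a particular framework.

```python
import os
import stat


def write_secret(path: str, value: str) -> None:
    """Write a secret so that only the owning user can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(value)
    # The mode passed to os.open only applies when the file is created,
    # so enforce it explicitly in case the file already existed.
    os.chmod(path, 0o600)


def read_secret(path: str) -> str:
    """Refuse to load a secret that group or other users could access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0o600")
    with open(path) as f:
        return f.read().strip()
```

This is roughly the check the OpenSSH client applies to private key files: it refuses to use them if the permissions are too open.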

OpenSSH client private keys can be stored encrypted - that's what ssh-agent is for: it allows you to enter the key passphrase only once and then remember it for the rest of your desktop session.

OpenSSH server private keys, on the other hand - I don't think that makes a whole lot of sense. Unless you have a threat model that forces you to encrypt the entire server disk, but then adding private key encryption on top of that doesn't make much sense either.

Right, exactly. (I did mean to say "server", thanks.) It sounds like the secrets in question are essentially analogous to OpenSSH private keys: they allow a server / service to prove its own identity to others, and the servers should be able to launch automatically at boot so there's not a reasonable place to enter a passphrase.


It's just one role, but multiple users.

Do you have any tools you recommend for that? I love TPMs, but this seems wildly impractical for a small project with developers who aren't excited about becoming TPM experts.

Also, does this rule out hosting on clouds that don't offer vTPM support? (Are there any that do?)

There are dedicated discrete HSMs that can be installed. That's what I would do. Or, rather, wouldn't. I agree with you that it would be very impractical, unless the platform has a first-class API:

Chrome OS uses TPM heavily[1], and iOS has the Secure Enclave. The standard TPM API is PKCS#11, so any hardware that speaks it can be used with any software that speaks it.

Problem with TPM is that the whole hardware and software stack needs to be secure, which in practice means it needs to be designed top-down with awareness of the TPM, and audited. The secrets must not be cached, written to file system, kept in memory, leaked over network. There are implementations such as Trousers[2], but it's more or less just a proof of concept; it may provide additional security, but most likely you're just using a very complex lock, and leaving the key under the mat.

[1] https://www.chromium.org/developers/design-documents/tpm-usa...

[2] http://trousers.sourceforge.net/man/tpmtoken_setpasswd.1.htm...

care to explain why? I need to keep my API keys somewhere so I can roll them out to the machine. Keeping them in git is as good as any storage - what would you propose instead? A shared dropbox account?

The newfangled approach is something like HashiCorp's Vault, which is a dream when you're looking at more than half a dozen systems with similar roles. A different approach that I like to use for single or smaller cluster systems is Ansible's Vault and rolling out config files based on templates per environment. All actual config files are gitignored so I don't have to deal with conflicts on the server if I use a git-pull style deployment, and ansible itself can back up and version them whenever they change.
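As a rough illustration of the "template in git, rendered config gitignored" pattern, here is a minimal sketch in plain Python rather than Ansible. The template contents, environment names, and values are all hypothetical; in a real setup the per-environment values would come from an encrypted store such as an Ansible Vault file, not from source code.

```python
from string import Template

# Hypothetical template. This part is safe to commit: it holds no secrets.
CONFIG_TEMPLATE = Template(
    "[monitoring]\n"
    "app_name = $app_name\n"
    "license_key = $license_key\n"
)

# Hypothetical per-environment values, inlined only to keep the sketch
# self-contained. Real values would be decrypted at deploy time.
ENVIRONMENTS = {
    "staging": {"app_name": "myapp-staging", "license_key": "staging-key"},
    "production": {"app_name": "myapp", "license_key": "prod-key"},
}


def render_config(environment: str) -> str:
    """Fill the template for one environment; KeyError if a value is missing."""
    return CONFIG_TEMPLATE.substitute(ENVIRONMENTS[environment])
```

The rendered output is what gets written to the gitignored config file on each server, so the repository only ever sees the template.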

Additionally, git does keep that history (as it's supposed to), so if you just delete the key from a private repo as you're trying to make the repo public, it's trivial for someone to walk the commit history looking for historical API keys that might not have been rotated. In order to purge that information from git, you then have to go re-write the commit graph from the point of the key's insertion (with it removed) all the way to the present. It's not impossible to do, it's just a major pain.
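To show how little effort "walking the commit history" takes, here is a minimal sketch that scans the output of `git log --all -p` for secret-looking assignments. The regular expression and the function names are assumptions for illustration; real scanners use far larger rule sets.

```python
import re
import subprocess

# Hypothetical pattern: a secret-ish name, then ':' or '=', then a long token.
SECRET_PATTERN = re.compile(
    r"(api[_-]?key|secret|token|password)\s*[:=]\s*['\"]?([A-Za-z0-9_\-]{8,})",
    re.IGNORECASE,
)


def find_secrets_in_patch(patch_text: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs found on added or removed diff lines."""
    hits = []
    for line in patch_text.splitlines():
        is_change = line.startswith(("+", "-"))
        is_file_header = line.startswith(("+++", "---"))
        if is_change and not is_file_header:
            for match in SECRET_PATTERN.finditer(line):
                hits.append((match.group(1), match.group(2)))
    return hits


def scan_repo_history(repo_path: str) -> list[tuple[str, str]]:
    """Scan the patches of every commit reachable from any ref."""
    patch = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return find_secrets_in_patch(patch)
```

Note that removed lines are scanned too: a commit that deletes a key still carries the key in its diff, which is exactly the problem described above.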

I'm aware of the implications concerning the history, but sorry, the machine park is two machines. Setting up vault would just be total overkill. The people that have access to that repo change like once every few years. The repo will never go public. Let's keep the solution at least somewhat tailored to the problem.

Hey, I'm not arguing one way or the other. I like using Ansible for configuration in the way I work. I can trust that I can show my best friend and my worst enemy my project and they won't have the capability of making my life hell. Rock on though. Use the simplest solution for the problem at hand. If you're just managing two boxes though, I'd have a hard time coming up with an argument for adding more complexity to the setup to essentially make it unchanged.

It's all chef-based so we could be using encrypted databags, but as anybody with access to the repo has root on the machines anyways, there's little to gain there as well, especially given the very limited security implications. I'd be more worried that somebody adds his account to the sudoers list than about somebody stealing the secret data. But hey, things were that way when I joined and there are better places to spend my time to improve security.

All keys are tracked in git's history, so it's a possible attack vector for hackers. You could use https://github.com/sobolevn/git-secret to keep them encrypted. But beware: whenever someone's access is revoked, you should regenerate every secret stored in there, because that person may still hold an old copy.

What the hell? Are you defending a decision to keep keys unencrypted in a git repo?

It's a perfectly defensible decision. The standard cryptographer's reply at this point would be, what is your threat model?

If "A developer could have their GitHub account broken into" or "Someone could break into GitHub deeply enough that they could access private repos" are in your threat model, you shouldn't be using GitHub at all for anything, including code, because it would be straightforward to use that access to subvert your site in other ways. Which is to say, especially for small sites, that's not a useful threat model.

If "You might do a git commit to remove them, then push the repo somewhere" is in your threat model, then the answer is just "Don't do that" (or more precisely, "Make sure everyone on the team understands that can't be done without precautions"). The easiest way to don't-do-that is to have them in a separate git repo from your code. But either way, as projects grow, there's going to be stuff in your git history you don't want to be public (like, oh, git commit -m "Implementing this stupid feature because this customer is stupid") because human error happens sometimes. So if you want to publish a previously-private codebase, the only robust approach is to copy all the files into a fresh directory with no git history, initialize a new repo there, and make a single new commit.

And the other part of the cryptographer's reply is, where else are you going to store the secrets and what are its security properties?
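For the "publish a previously-private codebase without its history" case above, a minimal sketch might look like the following. The function names and the commit message are hypothetical, and it assumes `git` is installed with a committer identity configured.

```python
import shutil
import subprocess


def copy_tree_without_git(private_repo: str, public_dir: str) -> None:
    """Copy the working tree, skipping every entry named '.git'."""
    shutil.copytree(
        private_repo, public_dir, ignore=shutil.ignore_patterns(".git")
    )


def export_without_history(private_repo: str, public_dir: str) -> None:
    """Create a brand-new repo whose only commit is the current file contents."""
    copy_tree_without_git(private_repo, public_dir)
    subprocess.run(["git", "init", "--quiet", public_dir], check=True)
    subprocess.run(["git", "-C", public_dir, "add", "--all"], check=True)
    subprocess.run(
        ["git", "-C", public_dir, "commit", "--quiet",
         "-m", "Initial public release"],
        check=True,
    )
```

One caveat: this copies everything in the working tree, including untracked or gitignored files such as local config, so it is safest to run it against a clean checkout.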

Yes, indeed I am. I'm all in favor of keeping the tools used to a level where the effort makes sense to protect the value of the goods. I totally could lock up my newrelic api key in a bank safe, double encrypted with two people's 4096-bit GPG keys, but that would be a little overkill, wouldn't it? Do you do that? I'd be moderately annoyed if somebody started pushing false metrics to my NR account, but that's about all the damage they could do with the information in that repo. So what level of effort would you propose?

Agreed, we do this as well at some scale. The vast majority of application configuration falls into this category. The advantage of storing them in a git repo (we use a different git repo to the main codebase) is that you can re-use the same access control mechanisms (note that is not the same as giving the same people access to the different repos) and you get strong change history.

> we use private repos ... that contain sensitive information (user data).

Wait, what?

email addresses and (account) names of people reporting bugs in private. Some people prefer it that way. Nothing "sensitive sensitive". Sorry for being unclear.

Never store sensitive data like API keys in a repository. Or, if you do, encrypt it so that nobody who can view your repo (even if it's private) can use that data immediately. It's like storing passwords in plaintext in a DB. Every (DB) admin will tell you: don't ever do that.

I've also recently had to deal with this problem while doing server setup with a private repo. I'm using Ansible and Ansible Vault to encrypt sensitive data, and the encryption key itself is only accessible (via a password safe) to certain members of our team: http://docs.ansible.com/ansible/playbooks_vault.html

See, the whole repo is accessible to the members of the team that are allowed to see the secret - basically the two folks that have root on the machine anyways. There's very limited use in encrypting the repo. There are no SSL keys or any secrets that would require tight security. It's basically our newrelic and some other api keys for reporting services. Even if that repo were breached, you could only start sending fake data to those services.

I'm more concerned about someone hacking the machine than someone hacking github to access the repo and retrieve the newrelic key from there.

ok, agree. That is not that critical.

And you share the passwords with enough volunteers that per-user pricing becomes a problem?

I don't handle the account in that case, so I can't even say if it's free or not. I was just replying to the implied question "why would a nonprofit org with an OS project need private repos?"

Fix your security.

Or... else?

Nonprofits need CRUD apps just like any other organization.

"Being a nonprofit" doesn't mean "developing open source software".

At that scale of developers and cost, surely it makes sense to host your own instance of, say, GitLab?

Especially as you note AWS costs are much more - I'd have thought it would be much more economical to consolidate into AWS and run a VC server there. But I'm not trying to tell you what's good for you, I'm just a guy with no experience of responsibility for things at that scale who's curious ;)

It's a non-profit and open source? Why not just use public repos like Code.org and many other non-profit software teams. $0/mo is surely better for a non-profit.

For non-profits, you could reach out and see if they'd do something special for you. I used to work for a higher ed institution and they were gracious enough to give me a free account.

just use gogs. https://gogs.io
