Backups aren't as important, since each git repo is a full-blown copy of the code. If your local repo is destroyed, you still have the server copy. If your server blows up, you still have the local copy.
There are many other good reasons for a service like Github, like the excellent collaboration features, the really good repository and history browser or the good bugtracker.
If you don't need those (small team, working alone) but are concerned about uploading your intellectual property to a third party server in a potentially foreign country (depending on your location), then quickly setting up gitosis / gitweb / redmine might be enough for you.
In my personal case, I would really love to use github even for my small team, but I'm too concerned about the legal issues to go ahead with that (and the local installation is plain too expensive).
What legal issues/other issues from uploading your code to GitHub are you worried about?
I can't imagine that GitHub would steal your code. They've never heard of you, they have no reason to believe your code is worth anything to them, and one "I have pretty damn good evidence that GitHub stole my code" could ruin their entire business.
You mentioned legal issues. Are you afraid someone's going to ... subpoena your code or something? Because if that happens, you'd have to turn it over anyway.
They've got some pretty intense-looking security[1], and people like Twitter trust them with their code[2]. If they aren't worried, why are you?
[2] I don't know that that's officially known, but I saw Twitter commenting on the "GitHub now has Organizations" post complaining about the lack of the cheaper plan that they added the next day. So they definitely have some private repos on GitHub.
I don't live in the US. Our company isn't based in the US. While I'm somewhat familiar with US legislation from reading HN, I certainly don't feel comfortable uploading my code to US-based servers of a US company, as I plainly don't know their laws well enough to trust them with my company's intellectual property.
Of course, I could always trust them for now and instantly remove my stuff when there are signs of trouble, but I asked them (a year ago) whether deletions are instant and irreversible and they told me the usual thing: repositories are not instantly deleted so they could restore them in case of accidental deletions. In addition they stay around in backups for an indefinite time.
Legislation not known well enough and no control over the removal of my code from their machines - call me paranoid, but these are good reasons not to upload my code to them.
It could also happen that github loses data, and it's hard to evaluate the exact likelihood of you accidentally deleting two repos or github losing a fileserver and its backup.
Also, if github is down or your repo with them is corrupted, you have to go through their support. If your own server has a problem you can fix it instantly.
I'm not convinced that reliability is the correct reason to go GitHub. Features: Yes. Reliability: Not necessarily.
Sure, Github can lose data. And you can lose data. But the advantage is that you and Github are much less correlated; the odds that both of you will lose the data at the same time are fairly low. [1]
Data safety is all about fighting correlation. You don't back up one partition to another on the same spindle, because when the drive dies the whole spindle is lost. Paranoid people back up to two different drives, two different disk controllers, two different machines, two different datacenters, two different continents...
---
[1] But nonzero. It is worth thinking about the scenarios.
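To put rough numbers on the correlation argument, here is a minimal sketch. The 5% yearly loss rate is a made-up figure for illustration, and real failures are never perfectly independent, so treat the result as a best case rather than a prediction.

```python
from math import prod


def p_all_copies_lost(loss_probabilities):
    """Chance that every copy is lost, ASSUMING the failures are independent."""
    return prod(loss_probabilities)


# Hypothetical figure: each copy has a 5% chance of being lost in a given year.
yearly_loss = 0.05

# Two partitions on one drive are (almost) perfectly correlated:
# when the drive dies, both go, so the second copy buys you nothing.
same_drive = yearly_loss

# Your machine plus an unrelated host: roughly independent failures.
two_places = p_all_copies_lost([yearly_loss, yearly_loss])

# Add a third independent location and the odds shrink again.
three_places = p_all_copies_lost([yearly_loss] * 3)

print(f"same drive:   {same_drive:.6f}")
print(f"two places:   {two_places:.6f}")
print(f"three places: {three_places:.6f}")
```

Every extra copy only helps to the extent that its failures are unrelated to the failures of the copies you already have.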
As soon as bandwidth costs come down enough I have a great startup idea: backup copies stored on a different planet/planetoid. Mars. The Moon. Either one. Half-kidding!
Agreed. For that exact reason my main dev machine now has hourly local backups through time machine, local HDD clone backups every 4 hours and a separate offsite backup with Mozy.
In addition to the remote repos on Github and my normal local copies...
Agree 100%. Furthermore, it's important not to conflate version control with data backup. Although they share some traits, they have different goals. For example, if I lose my local working copy of a repository before any commits, I've lost valuable work. In absence of a good backup strategy, the existence of the remote repository is of little consolation.
I agree and disagree with you in equal measure. Paying github is no protection against github messing up; in the end you are still responsible for your data, and any subsequent loss will be your problem, regardless of the cause.
So github can be a part of a backup strategy but it isn't a strategy by itself.
Likewise, there are plenty of parties that wouldn't dream of storing their data in a third-party repository: it could be compromised, there are at least 'n' github employees that now have access to your data, etc.
So there is a need for both options, one where you outsource your headache to github and keep a couple of local copies just in case, another where you do have your own repository that you control with the associated backup mechanisms and a number of off-site copies.
Fortunately github makes it easy to do the former and git itself can make it (relatively) easy to do the latter.
For plenty of people the first is enough. For me it wouldn't work, so I'm really happy this got posted.
My poor man's offsite backup script zips up the repository, encrypts it, and mails it to Gmail. The task is scheduled daily and works pretty well.
I did this with my university project source code. The only difficulty is that Gmail does not allow executables even within a zip file. You have to work pretty hard to avoid getting them into your repo.
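For anyone who wants to try the zip, encrypt, mail approach, here is a rough sketch of what such a script might look like. It is not the script described above. The function names, the `.zip.enc` file name, and the `encrypt` callable are all assumptions, and you would have to plug in a real encryption step yourself (for example a wrapper around whatever tool you already trust).

```python
import io
import smtplib
import zipfile
from email.message import EmailMessage
from pathlib import Path
from typing import Callable


def zip_repository(repo_dir: Path) -> bytes:
    """Pack every file under repo_dir (including .git) into an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(repo_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(repo_dir).as_posix())
    return buffer.getvalue()


def build_backup_mail(repo_dir: Path, encrypt: Callable[[bytes], bytes],
                      sender: str, recipient: str) -> EmailMessage:
    """Zip the repo, run it through the caller-supplied encrypt(), attach it."""
    payload = encrypt(zip_repository(repo_dir))
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Backup of {repo_dir.name}"
    message.set_content("Encrypted repository backup attached.")
    message.add_attachment(payload, maintype="application",
                           subtype="octet-stream",
                           filename=f"{repo_dir.name}.zip.enc")
    return message


def send(message: EmailMessage, host: str, port: int,
         user: str, password: str) -> None:
    """Deliver the message over SMTP with TLS; host and port are up to you."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(user, password)
        smtp.send_message(message)
```

Run it from cron (or whatever scheduler you use) once a day. Since the attachment is an opaque encrypted blob rather than a plain zip, the mail provider has nothing to inspect inside it, though attachment size limits still apply.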
Daily, weekly and snapshot backups with Linode are $5 on top of what I'm paying them anyway (and just a few clicks to set up). That's less than the cost of GitHub's micro plan and I can have as many private repos as I want.
I love GitHub and I'm sure they'll continue to do well, but running your own Git server is only going to get easier. If you don't need the social side of what they offer, hosting yourself makes sense.
Or even theft. I lost more than a week's work last year because a team member had his laptop, external hard drive, and desktop stolen from his house while he was visiting his parents for Thanksgiving. Whoops. Triple backups don't count if they're all in one apartment.
You should already have those things in place for your server. I don't think that the marginal cost of managing git is much if you're already managing a database, web server, application, mail server, and so on.
Backups are important, but not that important if it's just a small private Git server. After all, the full history is not only stored on the server, but also on each of the clients.
You have to have backups on the server anyway. And setting up offsite backups to Amazon S3 is like 30 minutes of work and maybe $1 per month in S3 costs. And this will back up not only your git repositories, but the whole server.
As I've said before, the problem is: Who is going to sell that device to the customer?
Google has no direct-to-consumer sales worth talking about. Phone carriers sell Android phones because they make money on the contracts; without the contracts they have little incentive to stock, market, or sell an Android equivalent of the iPod Touch. So who's going to step up to compete with the Apple Stores and the iTunes installed base? The same electronics companies that blew their years-long headstart in the personal music player business?
This article has to be read very carefully. Near the beginning it says:
UberCab contracts with black car services – mostly Towncars and Escalades
That suggests that they are probably doing some due diligence.
My friends the management consultants report that they use these black-car services all the time. You think a business traveler with a big expense account hails one of those yellow cabs, like me and the rest of the plebes? ;)
The follow-up idea of dispatching cab traffic to any random idiot with a Lincoln Town Car, an iPhone, and a possible drug problem seems to be Arrington's idea. It is, as you point out, a pretty lousy idea. [1] But don't blame UberCab.
---
[1] I should know. I had this idea five years ago. Then I conducted a thirty-minute thought experiment and decided that the due diligence would be annoying and not my cup of tea.
whether it's not programming that is complex, but many problems that exist in various fields of business or study
Unfortunately, it's both. ;)
While PHP is indeed a programming language (the term "scripting language" is a fairly meaningless label) when you work in PHP building web pages you're likely to spend most of your time working on things that are computationally tractable [1], but hard because it's just hard to translate the customer's problems into code within the available budget. Your customer has a problem, it's lots of work to map that problem onto code, it's hard to explain to the customer just how much work it is to turn the "simple" activities performed by (say) their administrative assistant into algorithms, and the result tends to be expensive to document, deploy, and maintain. So, yeah, it's the problems that seem to be hard, not the "programming" -- though, in fact, there is no hard-and-fast distinction there.
But then there are problems in programming that are difficult to impossible, all by themselves. The CS folks around here can point you at plenty of them, but here's a famous one:
One may suspect, of course, that this distinction I'm trying to draw between "difficult programming" and "difficult problems" is not real; it's just a matter of the degree of abstraction you use when describing the programming. And I think you'd be right to suspect that. Programming is programming, and programming is hard.
---
[1] Though it is very, very possible to put something that is computationally intractable into a "simple" web page. Web pages have no upper bound of complexity.
It's darkly amusing how people pretend this is about anonymity. As if there had never been a sexist jerk who had a name.
The first-order answer - the word that belongs in this headline - is moderation; reputation and identity are just tools to make the moderation slightly easier. But people try to avoid facing up to this. Moderation is a tedious task that we all really wish could be done by a machine. It can't.
You're getting all wound up over nothing :) I write a blog about anonymity, so I'm pointing out that anonymity is part of the problem here.
Your point that moderation is the larger goal and reputation/identity is only a tool is certainly valid. I guess we disagree about how effective a tool it is. For a visceral, deeply depressing account of how different the same person's behavior can be depending on whether or not they're anonymous, check out the story of the harassment of two female Yale Law students on AutoAdmit: http://www.portfolio.com/news-markets/national-news/portfoli...
I have a hypothesis: One secret to staying "young" in software is to avoid uttering statements of the form:
we chuck our experience, wholesale, every ten years or so... These are gratuitous reinventions.
Even when these statements seem to be true. Don't train yourself to think like this. It isn't going to help.
Yes, people -- especially the young, but the old as well -- go over the same ground a lot in software. That's called practice.
It's partly a generational thing: New generations need to learn from mistakes by making some of those mistakes; the process of having your elders lecture you about the mistakes is great, and can save you valuable time, but it is imperfect. Some things you just have to experience for yourself.
But it's mostly an evolutionary thing. We reinvent because the environment keeps changing, especially in computing, which has been rocked by epochal changes over the course of the last forty years. (Take out your iPhone and compare it to the machines and networks in use in 1970. Then compare the owners of modern smartphones to the owners of machines and networks in 1970.) The reinvented thing never turns out quite the same as the original, and those differences -- many of which look like accidents, some of which are in fact accidents -- are usually where the progress can be found. The new tool fits the new problem better than the old tool because it evolved in the presence of the new problem.
The Fields Medal cannot be awarded to mathematicians over the age of 40.
Say what?
Geez, you're right. Wikipedia:
The Medal also has an age limit: a recipient's 40th birthday must not occur before 1 January of the year in which the Fields Medal is awarded. As a result some great mathematicians have missed it by having done their best work (or having had their work recognized) too late in life.
Clearly someone needs to endow a better math prize. You know, one that is for the best math, not the best math by a specific sort of person.
That would be the Abel prize. Although I wouldn't call it a better or worse prize. The two awards simply have different goals they want to recognize. Abel is more of a lifetime achievement award and Fields is more of a recognition of brilliant younger mathematicians and an encouragement to go on to greater things.
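For the curious, the age rule quoted from Wikipedia above boils down to a one-line check. This is just my reading of that sentence, and the function name and the years below are made up for illustration. Since a 40th birthday always falls somewhere in the year `birth_year + 40`, it is "not before 1 January" of the award year exactly when that year is the award year or later.

```python
def fields_eligible(birth_year: int, award_year: int) -> bool:
    """True if the 40th birthday does not fall before 1 January of award_year."""
    return birth_year + 40 >= award_year


# Hypothetical candidates for a medal awarded in 2010:
print(fields_eligible(1970, 2010))  # turns 40 during 2010
print(fields_eligible(1969, 2010))  # turned 40 during 2009
```

Note that the day and month of birth drop out entirely; only the birth year matters.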
Okay, before we go anywhere I need a couple of answers.
A. Is Instant Personalization ("InP") built on an open protocol?
In other words, if I build foo.example.com and it uses InP to access Facebook's social graph, and later on I decide I also want to support Google's social graph, will Google be able to serve that graph using the same InP protocol? Or can I write a middleware service that grabs data with InP, massages the data (e.g. "filter out only the friends who live in a particular zip code") and reexports the data using the InP protocol?
Or will Facebook's lawyers be fighting such a move every step of the way?
B. Okay, we're playing in Facebook's world, the world where everyone has a public social graph and everything in their profile is public, public, public. Fine. But does this mean that, if I use Instant Personalization to grab a user's extremely-public social graph, I'm allowed to use it as I would any other public information that I stumbled across on the web? Can I, for example, index it and search it? Or can I write a tool that populates a second social network's database with a list of friends from Facebook?
Or will Facebook's InP terms of service require that I avoid doing that, because what "public" really means is "everyone can look at it, but only if they pay Facebook one cent every time they look and promise not to take any pictures or remember anything?"
Hey there. I'm Austin Haugen, a product manager on the Facebook platform team.
A/ Instant Personalization uses the same set of APIs as the rest of platform. The only thing that changes with Instant Personalization is that when you hit FB to see if the user is connected to your application, we will return 'connected' for all logged-in Facebook users who haven't opted out.
B/ Instant Personalization follows the same data policies as data you would get through a standard Facebook application. Details can be found here in section III: http://developers.facebook.com/policy/
Yes. I never encountered someone who actually could have (a) sold me a CS degree, then (b) coached me properly, until after the Web was invented.
The other point I'd make is that I went to grad school, and frankly in the technical subjects the professors are rarely the best coaches. Most of them have not been beginners for a long time, and their primary focus is not on how to relate to beginners. (Primary focus: How to relate to funding agencies. Secondary and tertiary focuses: How to relate to the rest of the department, and how to relate to grad students. Somewhere down near the bottom of the priority queue: Relating to undergraduates. Those profs who are primarily focused on teaching are absolute godsends, but are probably also having trouble getting tenure; such is the nature of undergraduate technical education at elite universities.)
The best coaches tend to be slightly-older fellow students. Ironically, the social structure of undergraduate classes is designed to segregate students at different levels of experience; the moral of that is: You must take steps to seek out older students who are willing to offer advice.
I started using this last week to refactor a pile of legacy Ruby code. Gemsets per app is a godsend for that. You can disentangle the giant set of installed gems on your machine, figure out which ones are actual dependencies and which ones are only there because they were part of some crazy experiment six months ago, which ones are only there because they are dependencies of some other gem...