Hacker News | mechanical_fish's comments

I decided to move all my private repositories to my own server.

When you do this, make sure that the server has continuous backups. Also, make sure you still have an offsite backup.

Once you figure out what these things are worth, you may realize that you should probably just keep paying Github.

32 points by pilif 1 day ago | link

The backups aren't as important, because each git repo is a full-blown copy. If your local repo is destroyed, you still have the server copy. If your server blows up, you still have the local copy.

There are many other good reasons for a service like Github, such as the excellent collaboration features, the really good repository and history browser, or the good bug tracker.

If you don't need those (small team, working alone) but are concerned about uploading your intellectual property to a third party server in a potentially foreign country (depending on your location), then quickly setting up gitosis / gitweb / redmine might be enough for you.
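For the self-hosted route, the piece that gitosis, gitweb, or Redmine sit on top of is just a bare repository on the server that your working copy pushes to. A minimal sketch, assuming git is installed on both ends; the function names, paths, and SSH URL are hypothetical, and access control and web browsing are not shown:

```shell
#!/bin/sh
# Minimal self-hosted remote: a bare repository that a working copy pushes to.
# Paths/URLs are hypothetical; on a real server the bare repo would live on the
# remote host and be addressed over SSH, e.g. user@example.com:/srv/git/app.git

# create_bare_repo DIR: initialise an empty bare repository at DIR
create_bare_repo() {
    git init --bare --quiet "$1"
}

# publish WORKTREE URL: register URL as "origin" and push every local branch to it
publish() {
    git -C "$1" remote add origin "$2" &&
        git -C "$1" push --quiet --all origin
}
```

On the server you would run something like `create_bare_repo /srv/git/app.git`, then locally `publish . user@example.com:/srv/git/app.git`; both paths are placeholders.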

In my personal case, I would really love to use github even for my small team, but I'm too concerned about the legal issues to go ahead with that (and the local installation is simply too expensive).

4 points by jackowayed 21 hours ago | link

What legal issues/other issues from uploading your code to GitHub are you worried about?

I can't imagine that GitHub would steal your code. They've never heard of you, they have no reason to believe your code is worth anything to them, and one "I have pretty damn good evidence that GitHub stole my code" could ruin their entire business.

You mentioned legal issues. Are you afraid someone's going to ... subpoena your code or something? Because if that happens, you'd have to turn it over anyway.

They've got some pretty intense-looking security[1], and people like Twitter trust them with their code[2]. If they aren't worried, why are you?

1: http://help.github.com/security/

2: I don't know that that's officially known, but I saw Twitter commenting on the "GitHub now has Organizations" post complaining about the lack of the cheaper plan that they added the next day. So they definitely have some private repos on GitHub.

4 points by patrickaljord 17 hours ago | link

> Are you afraid someone's going to ... subpoena your code or something? Because if that happens, you'd have to turn it over anyway.

Not if you don't live in the US.

1 point by pilif 16 hours ago | link

I don't live in the US. Our company isn't based in the US. While I'm somewhat familiar with US legislation from reading HN, I certainly don't feel comfortable uploading my code to US-based servers of a US company, as I plainly don't know their laws well enough to trust them with my company's intellectual property.

Of course, I could always trust them for now and instantly remove my stuff when there are signs of trouble, but I asked them (a year ago) whether deletions are instant and irreversible, and they told me the usual thing: repositories are not instantly deleted, so they can restore them in case of accidental deletion. In addition, they stay around in backups for an indefinite time.

Legislation not known well enough and no control over the removal of my code from their machines - call me paranoid, but these are good reasons not to upload my code to them.

1 point by jpablo 18 hours ago | link

http://www.businessinsider.com/guy-who-stole-goldmans-doomsd...

1 point by ErrantX 1 day ago | link

In 18 months, when a client's project gets deleted and you find the server version destroyed... well...

Might sound unlikely but it happens.

9 points by pilif 1 day ago | link

It could also happen that github loses data, and it's hard to evaluate the exact likelihood of you accidentally deleting two repos versus github losing a fileserver and its backup.

Also, if github is down or your repo with them is corrupted, you have to go through their support. If your own server has a problem you can fix it instantly.

I'm not convinced that reliability is the correct reason to go GitHub. Features: Yes. Reliability: Not necessarily.

5 points by mechanical_fish 1 day ago | link

Sure, Github can lose data. And you can lose data. But the advantage is that you and Github are much less correlated; the odds that both of you will lose the data at the same time are fairly low. [1]

Data safety is all about fighting correlation. You don't back up one partition to another on the same spindle, because when the drive dies the whole spindle is lost. Paranoid people back up to two different drives, two different disk controllers, two different machines, two different datacenters, two different continents...

---

[1] But nonzero. It is worth thinking about the scenarios.
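A rough way to put numbers on the correlation point (the probabilities are invented for illustration): let A be the event that you lose your copy and B the event that Github loses theirs over the same period, with indicator variables 1_A and 1_B. Then

$$P(A \cap B) = P(A)\,P(B) + \operatorname{Cov}(\mathbf{1}_A, \mathbf{1}_B)$$

If the two failures are independent, the covariance term is zero, so with P(A) = P(B) = 0.01 the chance of losing both is 0.01 × 0.01 = 0.0001. If they are perfectly correlated (same disk, same building), they happen together and P(A ∩ B) = 0.01, a hundred times worse. Separate drives, machines, datacenters, and continents are all ways of pushing that covariance term toward zero.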

1 point by mkramlich 12 hours ago | link

As soon as bandwidth costs come down enough I have a great startup idea: backup copies stored on a different planet/planetoid. Mars. The Moon. Either one. Half-kidding!

2 points by aarongough 23 hours ago | link

Agreed. For that exact reason my main dev machine now has hourly local backups through Time Machine, local HDD clone backups every 4 hours, and a separate offsite backup with Mozy.

In addition to the remote repos on Github and my normal local copies...

1 point by qjz 1 day ago | link

Agree 100%. Furthermore, it's important not to conflate version control with data backup. Although they share some traits, they have different goals. For example, if I lose my local working copy of a repository before any commits, I've lost valuable work. In absence of a good backup strategy, the existence of the remote repository is of little consolation.

0 points by ErrantX 1 day ago | link

I'm <s>guessing</s>sure Github does backups.

QED.

6 points by loewenskind 23 hours ago | link

Seeing "I'm guessing" paired with "QED" is... strange to say the least.

1 point by ErrantX 23 hours ago | link

Only guessing in the sense I haven't actually checked :)

10 points by jacquesm 1 day ago | link

I agree and disagree with you in equal measure. Paying github is no protection against github messing up; in the end you are still responsible for your data, and any subsequent loss will be your problem, regardless of the cause of the loss.

So github can be a part of a backup strategy but it isn't a strategy by itself.

Likewise, there are plenty of parties that wouldn't dream of storing their data in a third-party repository: it could be compromised, there are at least 'n' github employees who now have access to your data, etc.

So there is a need for both options, one where you outsource your headache to github and keep a couple of local copies just in case, another where you do have your own repository that you control with the associated backup mechanisms and a number of off-site copies.

Fortunately github makes it easy to do the former and git itself can make it (relatively) easy to do the latter.

For plenty of people the first is enough. For me it wouldn't work, so I'm really happy this got posted.

1 point by avar 1 day ago | link

I wrote this little shellscript to backup all my Git repositories on GitHub: http://github.com/avar/github-backup

It runs in cron and backs up all my data daily.
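The linked script isn't reproduced here. As an independent sketch of the same idea, `git clone --mirror` followed by a periodic `git fetch --prune` keeps a full copy of every branch and tag; the function name, crontab line, script name, and paths are all hypothetical:

```shell
#!/bin/sh
# Nightly mirror of a hosted repository (hypothetical paths and URLs).
# A crontab entry such as
#   0 3 * * * /usr/local/bin/mirror-repos.sh
# would run it once a day; the script name and schedule are assumptions.

# mirror_repo URL DEST: keep DEST as a bare mirror of every ref in URL.
# First run clones; later runs fetch new commits and prune deleted refs.
mirror_repo() {
    if [ -d "$2" ]; then
        git --git-dir="$2" fetch --quiet --prune origin
    else
        git clone --quiet --mirror "$1" "$2"
    fi
}
```

A mirror clone fetches with the refspec `+refs/*:refs/*`, so force-pushed or deleted branches on the host are reflected in the backup on the next run; keeping older dated copies as well would guard against that.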

5 points by ww520 22 hours ago | link

My poor man's offsite backup script zips up the repository, encrypts it, and mails it to Gmail. The task is scheduled daily and works pretty well.
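A sketch of the same idea, assuming GnuPG 2.1 or later is installed. It swaps the zip archive for `git bundle`, which packs every branch, tag, and the full history into one file that git can fetch or clone from; emailing the result is left to whatever mail tool you already use. The function names are hypothetical:

```shell
#!/bin/sh
# "Poor man's" offsite backup sketch: bundle the repo, then encrypt the bundle.
# Assumes GnuPG 2.1+; mailing the .gpg file is not shown.

# bundle_repo REPO OUT: write every ref of REPO into the single file OUT
# (OUT should be an absolute path, since git runs inside REPO)
bundle_repo() {
    git -C "$1" bundle create "$2" --all
}

# encrypt_file IN PASSFILE: symmetrically encrypt IN to IN.gpg,
# reading the passphrase from PASSFILE without prompting
encrypt_file() {
    gpg --batch --yes --pinentry-mode loopback \
        --passphrase-file "$2" --cipher-algo AES256 \
        --output "$1.gpg" --symmetric "$1"
}
```

Restoring is the reverse: decrypt with `gpg --decrypt`, then fetch or clone from the bundle file.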

2 points by VBprogrammer 15 hours ago | link

I did this with my university project source code. The only difficulty is that Gmail does not allow executables even within a zip file. You have to work pretty hard to avoid getting them into your repo.

1 point by koenigdavidmj 1 hour ago | link

    mv "$ZIP_FILE_NAME" "$ZIP_FILE_NAME.txt"

1 point by ramidarigaz 12 hours ago | link

If you encrypt the zip, it works fine. Took me ages to discover that :)

4 points by mcobrien 1 day ago | link

Daily, weekly and snapshot backups with Linode are $5 on top of what I'm paying them anyway (and just a few clicks to set up). That's less than the cost of GitHub's micro plan and I can have as many private repos as I want.

I love GitHub and I'm sure they'll continue to do well, but running your own Git server is only going to get easier. If you don't need the social side of what they offer, hosting yourself makes sense.

3 points by mike-cardwell 1 day ago | link

I thought the point of git was that it was decentralised. So even if the server died, he wouldn't lose anything would he?

You can also stick a git repo on top of Dropbox... There are several articles about how to do this if you do a quick google.

4 points by mechanical_fish 1 day ago | link

Git makes it trivial to replicate your repos to more than one directory, no matter where those directories are located.

Whether or not this makes you "decentralized" depends on where your directories live. Two machines in the same location? Not decentralized.
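One way to get that replication with a single command is to give a remote several push URLs, ideally pointing at machines in different places. A sketch; the function name and extra URLs are hypothetical, and `git remote get-url` needs git 2.7 or newer:

```shell
#!/bin/sh
# Make one `git push origin` update several copies of a repository.
# The extra URLs are hypothetical (e.g. a NAS at home and a VPS elsewhere).

# add_push_mirrors WORKTREE URL...: push "origin" to its own URL plus every URL given.
# Once any push URL is configured, git pushes only to the configured push URLs,
# so origin's fetch URL is added back as a push URL first.
add_push_mirrors() {
    repo=$1
    shift
    git -C "$repo" remote set-url --add --push origin \
        "$(git -C "$repo" remote get-url origin)"
    for url in "$@"; do
        git -C "$repo" remote set-url --add --push origin "$url"
    done
}
```

After `add_push_mirrors . ssh://nas.example/git/app.git ssh://vps.example/git/app.git`, a plain `git push origin` writes to all three locations, while fetches still come from the original URL.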

2 points by mike-cardwell 1 day ago | link

Obviously. You could also back up your files to a folder on the same disk, but you won't, will you?

2 points by intranation 1 day ago | link

As the author, I can respond to this:

Given the decentralised nature of Git, having continuous backups becomes less important unless all my computers (including the server) fail at once.

But yes, backups are important, and they are done.

2 points by nuclear_eclipse 1 day ago | link

If all your computers are in a single location, it's not decentralized in cases of fire, flood, earthquake, or other natural disasters...

4 points by krschultz 1 day ago | link

Or even theft. I lost more than a week's work last year because a team member had his laptop, external hard drive, and desktop stolen from his house while he was visiting his parents for Thanksgiving. Whoops. Triple backups don't count if they're all in one apartment.

1 point by davidw 1 day ago | link

You should already have those things in place for your server. I don't think that the marginal cost of managing git is much if you're already managing a database, web server, application, mail server, and so on.

3 points by gst 1 day ago | link

Backups are important, but not that important if it's just a small private Git server. After all, the full history is not only stored on the server, but also on each of the clients.

1 point by hippich 21 hours ago | link

You have to have backups on the server anyway. Setting up offsite backups to Amazon S3 is about 30 minutes of work and maybe $1 per month in S3 costs, and it will back up not only your git repositories but the whole server.
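A minimal sketch of that kind of whole-directory offsite backup, assuming the AWS CLI is installed and configured; the function names, bucket name, and paths are hypothetical, and a real setup would likely add retention and encryption:

```shell
#!/bin/sh
# Dated tarball of a server directory, then a copy to S3.
# Bucket name and paths are hypothetical; assumes a configured AWS CLI.

# archive_dir DIR OUTDIR: write DIR as OUTDIR/<name>-YYYYMMDD.tar.gz, print its path
archive_dir() {
    out="$2/$(basename "$1")-$(date +%Y%m%d).tar.gz"
    tar -czf "$out" -C "$(dirname "$1")" "$(basename "$1")" && echo "$out"
}

# upload_archive FILE: copy FILE into the (hypothetical) backup bucket
upload_archive() {
    aws s3 cp "$1" "s3://example-backup-bucket/$(basename "$1")"
}
```

A nightly cron job could then run `upload_archive "$(archive_dir /srv/git /var/backups)"`; pointing `archive_dir` at a wider directory covers more of the server, not just the repositories.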

1 point by yatsyk 1 day ago | link

The server most likely needs to be backed up anyway. Backups are not as important for git as for svn, because you clone the repository with its full history.


As I've said before, the problem is: Who is going to sell that device to the customer?

Google has no direct-to-consumer sales worth talking about. Phone carriers sell Android phones because they make money on the contracts; without the contracts they have little incentive to stock, market, or sell an Android equivalent of the iPod Touch. So who's going to step up to compete with the Apple Stores and the iTunes installed base? The same electronics companies that blew their years-long head start in the personal music player business?


This article has to be read very carefully. Near the beginning it says:

UberCab contracts with black car services – mostly Towncars and Escalades

That suggests that they are probably doing some due diligence.

My friends the management consultants report that they use these black-car services all the time. You think a business traveler with a big expense account hails one of those yellow cabs, like me and the rest of the plebes? ;)

The follow-up idea of dispatching cab traffic to any random idiot with a Lincoln Town Car, an iPhone, and a possible drug problem seems to be Arrington's idea. It is, as you point out, a pretty lousy idea. [1] But don't blame UberCab.

---

[1] I should know. I had this idea five years ago. Then I conducted a thirty-minute thought experiment and decided that the due diligence would be annoying and not my cup of tea.


whether it's not programming that is complex, but many problems that exist in various fields of business or study

Unfortunately, it's both. ;)

While PHP is indeed a programming language (the term "scripting language" is a fairly meaningless label), when you work in PHP building web pages you're likely to spend most of your time working on things that are computationally tractable [1], but hard because it's just hard to translate the customer's problems into code within the available budget. Your customer has a problem, it's lots of work to map that problem onto code, it's hard to explain to the customer just how much work it is to turn the "simple" activities performed by (say) their administrative assistant into algorithms, and the result tends to be expensive to document, deploy, and maintain. So, yeah, it's the problems that seem to be hard, not the "programming" -- though, in fact, there is no hard-and-fast distinction there.

But then there are problems in programming that are difficult to impossible, all by themselves. The CS folks around here can point you at plenty of them, but here's a famous one:

http://en.wikipedia.org/wiki/Travelling_salesman_problem

which is a member of an entire class of famously hard-to-compute problems:

http://en.wikipedia.org/wiki/List_of_NP-complete_problems

which (I believe) are not particularly rare, but which come up in various disguises, and which must be carefully worked around.
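For a sense of scale, here is a back-of-the-envelope count, assuming the symmetric version of the problem (a tour and its reverse cost the same): brute force must compare every distinct round trip through n cities, and that number grows factorially.

$$T(n) = \frac{(n-1)!}{2}, \qquad T(10) = \frac{9!}{2} = 181\,440, \qquad T(20) = \frac{19!}{2} \approx 6.1 \times 10^{16}$$

Exact algorithms such as Held-Karp dynamic programming bring this down to $O(n^2 2^n)$ time, which is still exponential; in practice people fall back on heuristics and approximations.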

On a somewhat more applied level, there are lots of difficult problems in code optimization that you can work on:

http://en.wikipedia.org/wiki/Low_Level_Virtual_Machine

Or you can spend your day exploring a giant set of data-storage possibilities, each of which is right in its own way, and wrong in its own way:

http://en.wikipedia.org/wiki/CAP_theorem

http://www.allthingsdistributed.com/2008/12/eventually_consi...

One may suspect, of course, that this distinction I'm trying to draw between "difficult programming" and "difficult problems" is not real; it's just a matter of the degree of abstraction you use when describing the programming. And I think you'd be right to suspect that. Programming is programming, and programming is hard.

---

[1] Though it is very, very possible to put something that is computationally intractable into a "simple" web page. Web pages have no upper bound of complexity.


It's darkly amusing how people pretend this is about anonymity. As if there had never been a sexist jerk who had a name.

The first-order answer - the word that belongs in this headline - is moderation; reputation and identity are just tools to make the moderation slightly easier. But people try to avoid facing up to this. Moderation is a tedious task that we all really wish could be done by a machine. It can't.

1 point by randomwalker 2 days ago | link

You're getting all wound up over nothing :) I write a blog about anonymity, so I'm pointing out that anonymity is part of the problem here.

Your point that moderation is the larger goal and reputation/identity is only a tool is certainly valid. I guess we disagree about how effective a tool it is. For a visceral, deeply depressing account of how different the same person's behavior can be depending on whether or not they're anonymous, check out the story of the harassment of two female Yale Law students on AutoAdmit: http://www.portfolio.com/news-markets/national-news/portfoli...


I have a hypothesis: One secret to staying "young" in software is to avoid uttering statements of the form:

we chuck our experience, wholesale, every ten years or so... These are gratuitous reinventions.

Even when these statements seem to be true. Don't train yourself to think like this. It isn't going to help.

Yes, people -- especially the young, but the old as well -- go over the same ground a lot in software. That's called practice.

It's partly a generational thing: New generations need to learn from mistakes by making some of those mistakes; the process of having your elders lecture you about the mistakes is great, and can save you valuable time, but it is imperfect. Some things you just have to experience for yourself.

But it's mostly an evolutionary thing. We reinvent because the environment keeps changing, especially in computing, which has been rocked by epochal changes over the course of the last forty years. (Take out your iPhone and compare it to the machines and networks in use in 1970. Then compare the owners of modern smartphones to the owners of machines and networks in 1970.) The reinvented thing never turns out quite the same as the original, and those differences -- many of which look like accidents, some of which are in fact accidents -- are usually where the progress can be found. The new tool fits the new problem better than the old tool because it evolved in the presence of the new problem.

1 point by wccrawford 3 days ago | link

Or to put it in a car analogy:

If we hadn't reinvented the wheel, we'd be using stone tires.

Reinventing is not always a mistake... And sometimes you have to make a lot of mistakes to make progress.


The Fields Medal cannot be awarded to mathematicians over the age of 40.

Say what?

Geez, you're right. Wikipedia:

The Medal also has an age limit: a recipient's 40th birthday must not occur before 1 January of the year in which the Fields Medal is awarded. As a result some great mathematicians have missed it by having done their best work (or having had their work recognized) too late in life.

Clearly someone needs to endow a better math prize. You know, one that is for the best math, not the best math by a specific sort of person.

3 points by dagw 3 days ago | link

That would be the Abel Prize. Although I wouldn't call it a better or worse prize. The two awards simply have different goals they want to recognize. Abel is more of a lifetime achievement award, and Fields is more of a recognition of brilliant younger mathematicians and an encouragement to go on to greater things.


Okay, before we go anywhere I need a couple of answers.

A. Is Instant Personalization ("InP") built on an open protocol?

In other words, if I build foo.example.com and it uses InP to access Facebook's social graph, and later on I decide I also want to support Google's social graph, will Google be able to serve that graph using the same InP protocol? Or can I write a middleware service that grabs data with InP, massages the data (e.g. "filter out only the friends who live in a particular zip code") and reexports the data using the InP protocol?

Or will Facebook's lawyers be fighting such a move every step of the way?

B. Okay, we're playing in Facebook's world, the world where everyone has a public social graph and everything in their profile is public, public, public. Fine. But does this mean that, if I use Instant Personalization to grab a user's extremely-public social graph, I'm allowed to use it as I would any other public information that I stumbled across on the web? Can I, for example, index it and search it? Or can I write a tool that populates a second social network's database with a list of friends from Facebook?

Or will Facebook's InP terms of service require that I avoid doing that, because what "public" really means is "everyone can look at it, but only if they pay Facebook one cent every time they look and promise not to take any pictures or remember anything?"

1 point by ahaugen 5 days ago | link

Hey there. I'm Austin Haugen, a product manager on the Facebook platform team.

A/ Instant Personalization uses the same set of APIs as the rest of the platform. The only thing that changes with Instant Personalization is that when you hit FB to see if the user is connected to your application, we will return 'connected' for all logged-in Facebook users who haven't opted out.

B/ Instant Personalization follows the same data policies as data you would get through a standard Facebook application. Details can be found here in section III: http://developers.facebook.com/policy/


Yes. I never encountered someone who actually could have (a) sold me a CS degree, then (b) coached me properly, until after the Web was invented.

The other point I'd make is that I went to grad school, and frankly in the technical subjects the professors are rarely the best coaches. Most of them have not been beginners for a long time, and their primary focus is not on how to relate to beginners. (Primary focus: How to relate to funding agencies. Secondary and tertiary focuses: How to relate to the rest of the department, and how to relate to grad students. Somewhere down near the bottom of the priority queue: Relating to undergraduates. Those profs who are primarily focused on teaching are absolute godsends, but are probably also having trouble getting tenure; such is the nature of undergraduate technical education at elite universities.)

The best coaches tend to be slightly-older fellow students. Ironically, the social structure of undergraduate classes is designed to segregate students at different levels of experience; the moral of that is: You must take steps to seek out older students who are willing to offer advice.


I started using this last week to refactor a pile of legacy Ruby code. Gemsets per app is a godsend for that. You can disentangle the giant set of installed gems on your machine, figure out which ones are actual dependencies and which ones are only there because they were part of some crazy experiment six months ago, which ones are only there because they are dependencies of some other gem...
