GitHub Trip Report (fossil-scm.org)
195 points by networked 5 days ago | 55 comments





This is a pretty interesting writeup, even though I don’t necessarily agree with everything. Makes you think, at least.

I’ve been out of version control for a bit, and GitHub itself for quite a bit longer, but this part was interesting to me:

> Because: It isn't really about Git anymore. GitHub used to be a Git repository hosting company. But now, they are about providing other software development and management infrastructure that augments the version control. They might as well change their name to Hub at this point.

Just for added color: starting around 2010 or 2011 or so (around when we added Subversion to GitHub), we had a pretty solid idea that version control wasn’t “the thing”. Mostly because Mercurial was a strong alternative, and there always felt like there might be something new in the wings that could dominate in the next few years. Version control felt really dynamic, and deeply susceptible to change. And if something was better, we’d totally ditch Git for the new thing. The name was a bit of a misnomer, from that perspective.

I think that changed over time — I had a hell of a time trying to get movement on basic version control improvements back in 2014 — and now they’re clearly much more about the ecosystem rather than version control itself. It’s where the money is, of course, but it’s also where the bigger innovation is, at least for the time being.

I think the author is right to say that Microsoft is targeting a different area than version control specifically, though you could argue whether the outcome of that is good or bad. It’s certainly different, though: they’re especially growing headcount right now, and the company makeup is wildly different than what many customers tend to think it is today, imo.


> Because: It isn't really about Git anymore. GitHub used to be a Git repository hosting company. But now, they are about providing other software development and management infrastructure that augments the version control. They might as well change their name to Hub at this point.

I agree fully, but the one "feature" missing from drh's analysis is GitHub's social aspect. Getting code out in front of people and allowing them to interact with it (directly, via commits and PRs, or indirectly, via issue comments) as a feature of a social network is their differentiator.

GitHub's choice of git wasn't germane to building a social network: GitHub could have been successful using svn, excepting that git was ascendant at the time.

Fossil's feature of an integrated web server is an anti-feature to GitHub; it defeats the purpose of having a centralised "Hub."


ChiselApp.com provides a similar "Hub" for Fossil and it still serves a useful function in the Fossil community -- somewhere to host your Fossil repos.

I've been thinking about adding CI/CD support and other tooling, but the CPU resources required would be cost prohibitive.


I would love to use Fossil, but git is too entrenched. The only thing that would allow me to use it right now is basically automatic, two-way migration between git and Fossil, so I can use it locally and other devs can use Git, and we can all still use GitLab for the CI, etc.

Unfortunately, this seems pretty hard to do, so I'm afraid git is too powerful. I absolutely detest its porcelain, but what can we do?


Fossil can be configured to do a bidirectional sync with git [0], with the exceptions noted here [1] -- but it may not be a good idea [2] and I've never set it up.

[0] http://fossil-scm.org/home/doc/trunk/www/inout.wiki

[1] http://fossil-scm.org/home/doc/trunk/www/mirrorlimitations.m...

[2] http://fossil-scm.org/home/doc/trunk/www/mirrortogithub.md "Notes" section


I guess we could write better porcelain on top of it…

Maybe i’m crazy but i find it completely unremarkable that a GitHub strategy session leaves core git completely alone and instead focuses on improving the ecosystem around it.

This doesn't surprise me in the slightest. Looking at GitHub product changes over the last few years (or even GitLab product changes), the core git components have been largely untouched. In fact I'd argue that the core git part of GitHub hasn't really changed since ~2-3 years into the company.

This makes sense, because GitHub doesn't add value in "improving" the git part. Anyone can run a git server, and git itself doesn't really have much else to it.

GitHub adds value in all the places around it. Collaboration, process, teams, communication, code review, security. These are things that most people using git will need, which are better when integrated or close to their version control system, and things that git doesn't address.

As a GitHub user, I'm glad that this was what they focused on.


I'd be less generous and say outright that GitHub benefits from git being a bad tool that NEEDS an external ecosystem to make it usable.

Improving git itself goes directly against their interests.


Many of the things that GitHub adds are not things I see being part of Git.

In case people are unaware, drh is Richard Hipp, aka the author of SQLite and Fossil.

https://en.m.wikipedia.org/wiki/D._Richard_Hipp



> Surely a better approach is to record the complete ancestry of every check-in but then fix the tool to show a "clean" history in those instances where a simplified display is desirable and edifying, but retain the option to show the real, complete, messy history for cases where detail and accuracy are more important.

This is one thing (but not the only thing) that I intensely dislike about git vs bzr. Loglevels are not particularly new and can be applied so directly to the VCS log that bzr log has a level option.


boy, does this guy's ethos for fossil impress me a lot more than the justifications i hear for the insane complexity of git.

to me, this kind of result is the most compelling argument that the world of high tech isn't nearly so much a meritocracy as it is made out to be.


> boy, does this guy's ethos for fossil impress me a lot more than the justifications i hear for the insane complexity of git.

git is not complex though. It just has an absolutely godawful "high-level" user interface.

Porcelain (high-level commands) are really shortcuts on sets of low-level operations usually performed together, which is why e.g. `checkout` is used simultaneously for reverting working copy changes and switching the entire thing to arbitrary commits or branches.

That also makes git extremely hard to learn "top down" beyond rote learning of a few commands: from that POV the CLI is completely incoherent so you can't really build an intuition for what command could do what operation. The terrible naming doesn't help either.

If you have the time and desire to start from the on-disk storage (ignoring packfiles) you can build your own in a few hours.
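To give a flavour of how small that storage layer is, here is a minimal Python sketch of how a loose object gets its id and file path. It assumes a SHA-1 repository and ignores packfiles; the function names are made up for illustration and are not part of git.

```python
import hashlib
import zlib
from typing import Tuple


def git_object_id(kind: str, payload: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of the given type."""
    # The hashed form is "<type> <size>\0<payload>".
    header = f"{kind} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def loose_object(kind: str, payload: bytes) -> Tuple[str, bytes]:
    """Return (path relative to .git/objects, zlib-compressed file content)."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\x00"
    oid = hashlib.sha1(header + payload).hexdigest()
    # The first two hex digits name the directory, the remaining 38 the file.
    return f"{oid[:2]}/{oid[2:]}", zlib.compress(header + payload)


# The empty blob has a well-known id in every SHA-1 git repository.
print(git_object_id("blob", b""))
```

Trees and commits are stored the same way, just with a different type name and payload layout, which is why a toy implementation is feasible in an afternoon.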


This is slowly improving, at least; `git switch` is now a thing, for example. https://git-scm.com/docs/git-switch/2.23.0

At a glance, it seems to do the same as `git checkout` and `git checkout -b`. Can someone give a TLDR about the difference?

Edit: gonna be fun to see how guides like https://www.atlassian.com/git/tutorials/using-branches/git-c... are becoming outdated in a way where switching/checking out doesn't mean the same thing anymore, but the article uses them interchangeably


These three invocations are not present in switch:

    git checkout [-f|--ours|--theirs|-m|--conflict=<style>] [<tree-ish>] [--] <paths>… 
    git checkout [<tree-ish>] [--] <pathspec>… 
    git checkout (-p|--patch) [<tree-ish>] [--] [<paths>… ]

That is, on top of the ability to switch to a specific branch or commit, checkout

* is used to revert working tree changes, possibly interactively

* is used to set working tree changes to either side of a merge (in case of conflict)

* is used to revert specific local paths to a historical version thereof, possibly interactively


Regarding the last point: some OpenBSD developers are currently writing their own front end ("got") using the same on-disk structure as git.

http://gameoftrees.org/goals.html


That looks interesting.

However, there have been lots of "porcelain replacement" efforts in the past, but most of them either were abandoned because by the time they'd built something complete enough the author had had to so deeply understand the underlying model they could use the official one just fine, or remained niche because they were extremely opinionated (and limited) with respect to the workflows they'd support.


The point of that project is to create a BSD-licensed Git clone, not build better porcelain.

Nah, the point is definitely not to create a Git clone. They're developing their own system designed around a more centralized structure than Git (closer to CVS/SVN), but remaining compatible with Git on disk.

I encourage you to read its source code, a mishmash of several dialects of tcl, html and js and sql inlined in c, etc. I made an effort to use fossil throughout the summer, but lack of tooling + a lot more git experience caused me to abandon it except for some existing repositories. Note, I have tremendous respect for the authors of fossil and I hope for its success in the long term.


The moralising about it doesn't make much sense. In other professions, it's not "lying" or "dishonorable" to present your work in the most clear and understandable way to others, it's a basic condition for entry. I don't see how software's any different.

We had a proposal around commit standards at work recently: we came to the conclusion that rebasing your private branches and squashing out irrelevant commits is the recommended flow, to make reviews easier.


> In other professions, it's not "lying" or "dishonorable" to present your work in the most clear and understandable way to others, it's a basic condition for entry. I don't see how software's any different.

That's because you joined the "it's a story" camp without noticing. If you instead view git history as a sort of gentlemen's audit log, then "refactoring" it is indeed both lying and dishonorable. And in no other profession would it be OK to mess with something used for review / audit purposes.

Personally, I'm in favor of the history/audit log view, because you can't predict today what information you may need in the future, and refactoring git history throws away a lot of historical context.


Not GP, but I'm certainly in the story camp. My commit history is not an audit log, it's documentation about the choices I made while writing my code, and a tool that helps me combine relevant parts.

In fact, I'm sure that you're lying too: every time you Ctrl+Z in between commits, you're removing parts of your audit log. Choosing when to commit is telling a story.

(Unless Fossil/whatever system you're using stores every character you ever type - I'm not familiar with it.)

Edit: To use an example from the article:

> Yet, sometimes we come upon a piece of code that we simply cannot understand. If you have never asked yourself, “What was this code’s developer thinking?” you haven’t been developing software for very long.

With my commit, I'm telling other developers/my future self what I was thinking, rather than having them try to figure that out by themselves from my code. The assumption there is that I'm better at explaining my thinking than my code is.


> With my commit, I'm telling other developers/my future self what I was thinking, rather than having them try to figure that out by themselves from my code.

Sure, that is the intention. But humans are very fallible in communication and communication is hard. Being later able to see what you did along with what you explained is of obvious utility.


> In fact, I'm sure that you're lying too: every time you Ctrl+Z in between commits, you're removing parts of your audit log. Choosing when to commit is telling a story.

And I disagree here. The way I view it, my editor is my sandbox, I keep playing in it until I have something that I want to enter into record. When I commit work, I enter it into record, with a commit message explaining what the piece of work is.

But to be honest, my repo clone is sort of my own sandbox too, I don't consider Git a fully append-only log, so I'd sometimes do commit editing on my local repo. But once published, I consider it immutable.

(Or rather I'd prefer to; the team in the main project I'm working on right now has a rebase-heavy workflow.)

On a practical note, I'm fine with history cleanup done on the spot. E.g. I've committed three things in the past hour, I squash them together. Or I rearrange stuff I made over the course of the day. But I don't like attempts at messing with history that's many days old (or more), because at that point the person doing the cleanup doesn't have the context of the work in their minds anymore, so history edits throw away valuable information.


> And I disagree here. The way I view it, my editor is my sandbox, I keep playing in it until I have something that I want to enter into record. When I commit work, I enter it into record, with a commit message explaining what the piece of work is.

The way I view it, my development branch is my sandbox. I keep playing in it until I have something that I want to enter into the record. When I merge work, I enter it into record with a merge message explaining what the piece of work is.

I don't see the point of immutably recording typos a reviewer noticed. I view a pull request (or whatever you call them) as a patch series. Something to be tinkered with, rewritten and resubmitted until it's considered good enough. If I want to record every stumble, issue and hare-brained idea, that can go into the patch messages.


So if I fix a typo and commit the fix in my feature branch within a few hours, it's ok to squash the fix, but if someone notices it on code review the next day, I should keep the typo fix as a separate commit?

What's the value of having this in your audit log? Is it more valuable than being able to revert the commits without having to first revert the typo fix, or do a git bisect without running into broken commits that exist only to preserve an audit log?


> And I disagree here. The way I view it, my editor is my sandbox, I keep playing in it until I have something that I want to enter into record. When I commit work, I enter it into record, with a commit message explaining what the piece of work is.

I guess I just don't see how that's not "telling a story". The commit is not a recording of the process you went through to get the code in a certain state, but a piece of work of which you decided it should be entered into record, with a message you wrote that explains it.


>> The way I view it, my editor is my sandbox

This position makes no sense to me. How can your editor be your sandbox but your branch before pushing is not? That’s an arbitrary distinction without merit.

No one rebases on a public branch, so both scenarios are about what you do before publishing your work.

I think the whole conversation about history is missing the point. The problem we’re trying to solve is complexity. Having a bunch of out of date commits floating about does nothing to reduce complexity.


I view any branch that only I work on as my sandbox as well. You should only ever rebase non-public branches in any case. So I don't see the problem.

Or you could record the original graph of commits during a rebase. When the actual content wasn't that important, you could choose to discard the old commit data but keep that metadata graph. (That way someone who builds on top of an old commit can use that graph to get the right merge logic.)

But then you would have reimplemented mercurial's obsolescence markers.


I can acknowledge that some settings exist where you might want a strict audit log, where you have pre-agreed the rules and legal requirements, and must stick to them.

I don't believe most software development work requires such a strict log. I believe that most teams should be free to use git in the most productive way for them. I believe the "PR-as-story" camp is more productive, and already have war stories where not having that hurt. Always happy to hear war stories from other camps.


For you to believe that "Rebasing is the same as lying" is mistaken, it isn't necessary for you to be in the "it's a story" camp: it's only necessary for you to believe that the "it's a story" camp exists.

I prefer to squash away commits that leave the code in a non-working state. Because it makes using git bisect easier later if you need to find a commit that broke something.
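A toy illustration of why, in Python. This is only a sketch of the binary search idea behind bisect, not git's actual implementation; the commit names and failure sets are made up. A commit that fails for an unrelated reason (it doesn't build) breaks the "good, good, ..., bad, bad" assumption the search relies on.

```python
from typing import Callable, Sequence, Set


def first_bad(commits: Sequence[str], fails: Callable[[str], bool]) -> str:
    """Binary-search for the first failing commit.

    Assumes commits run oldest to newest, the newest one fails, and every
    commit after the first failing one fails too.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(commits[mid]):
            hi = mid      # culprit is mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return commits[lo]


# Hypothetical history: "d-wip" does not build, "e-fix-wip" repairs it,
# and the regression being hunted arrives in "f-real-bug".
messy = ["a", "b", "c", "d-wip", "e-fix-wip", "f-real-bug", "g"]
messy_failures: Set[str] = {"d-wip", "f-real-bug", "g"}

# Same work with the WIP commit and its fix squashed into one commit.
squashed = ["a", "b", "c", "de-squashed", "f-real-bug", "g"]
squashed_failures: Set[str] = {"f-real-bug", "g"}

messy_culprit = first_bad(messy, lambda c: c in messy_failures)
squashed_culprit = first_bad(squashed, lambda c: c in squashed_failures)
print(messy_culprit, squashed_culprit)
```

With the messy history the search blames the WIP commit; with the squashed one it lands on the real regression. In real git you can work around this with `git bisect skip`, but you have to notice that the commit failed for the wrong reason first.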

A habit I learned when working with an SVN-versioned project is to never commit code in a non-working state. There's rarely ever a reason to, if you're thoughtful about your work. With completely new code, you just keep it disconnected from the main product and then you only have to ensure it compiles. With changes, you make them incrementally. Even large-scale changes can be done without breaking commits this way.

So don't commit code before it's gone through automated CI testing, code review and QA? Or never make mistakes that require any of those to catch them? (At least before CI, it might not even compile correctly everywhere)

Git can easily handle getting rid of or modifying commits, so why not make use of that ability in a controlled manner?

I.e. I'd typically not re-build the history of the master or another major branch just to fix a bug introduced a while back at the source (although I guess backporting fixes might be similar). But for work in progress, one or a few commits about to be merged, smaller fixes are better made at the source. Not to mention entirely irrelevant commits like "it's the end of the day, let's commit and push to a branch so it's not just on my machine".


[flagged]


> I edit emails before I hit send. Guess I should apologize for propagating massive campaigns of deception.

I'm not talking about that. But do you edit your diary to "refactor" the things you wrote a month or year earlier?

Related: people here seem to implicitly assume "never rebase published branches". I don't know how widely accepted that rule is, but in the teams I worked with, the workflow was "every now and then, rebase develop / feature branch on top of master, and then push --force".


Your "teams" violate standard best practice employed virtually everywhere I've worked and every open source project I've worked on.

In public debates about this people often defend rebasing feature branches especially if they feel nobody else will work on the same branch. My personal experience bears this out as well.

It's very easy to accidentally break your own rules, so this goes wrong regularly and the fallout becomes apparent only at some point down the pipeline.

It's an added cognitive burden to be vigilant about which git mode you are working in.


How do you tag releases? Wouldn’t the commit hash change out from under the commit you cared about?

People rarely tag releases from feature branches I think.

But if you did it, hopefully the tag would continue to refer to the pre rebase revision. Anyone know the answer? This is the kind of uncertainty that rebase introduces to the semantics!


Tags are immutable, the tag would still point to the original commit. There's no uncertainty at all.
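A toy model of what happens, in Python. This is a sketch, not git's real object format: here a commit id is just a hash over the parent id and the message, and the ref names are hypothetical. The point is that a rebase creates new commits and moves the branch ref, while the tag ref is simply not touched.

```python
import hashlib
from typing import Dict, List, Optional


def commit_id(parent: Optional[str], message: str) -> str:
    """Toy commit id: a hash over the parent id and the message."""
    return hashlib.sha1(f"parent {parent}\n{message}".encode()).hexdigest()


def build_chain(base: Optional[str], messages: List[str]) -> str:
    """Create one commit per message on top of base and return the tip id."""
    tip = base
    for message in messages:
        tip = commit_id(tip, message)
    return tip


root = commit_id(None, "initial")
refs: Dict[str, str] = {"master": build_chain(root, ["upstream work"])}

feature_messages = ["add parser", "fix typo"]
refs["feature"] = build_chain(root, feature_messages)
refs["v0.1-rc"] = refs["feature"]  # hypothetical tag on the feature tip

old_tip = refs["feature"]
# "Rebase": replay the same messages on top of master, then move the branch.
refs["feature"] = build_chain(refs["master"], feature_messages)

print(refs["v0.1-rc"] == old_tip, refs["feature"] == old_tip)
```

The tag keeps naming the pre-rebase commit, which is no longer on the rebased branch. So the answer is deterministic, but the tagged commit and the branch tip are now two different objects.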

It makes reviews easier, as well as debugging with things like bisect. If you leave commits with typos or syntax errors in your master branch history, determining whether a commit is good or bad for the right reason becomes a much more arduous task. If you do a rebase + squash on merge approach, the commit history is succinct, descriptive, and healthy, which I’ve found immensely beneficial.

(Disclosure: I currently work at GitHub, but these opinions are my own)


> The GitHub staff says that the four pillars of their organization are

> 1. DevOps

> 2. Security

> 3. Collaboration

> 4. Insights

Interesting to see "Security" as number two.


IDK who D. R. Hipp thinks agrees that rebasing is bad, but I sure as heck don't.

Rebasing is good on private branches: it lets you write clean linear history. Conversely, intra-project-branch history is of little or no interest whatsoever once pushed upstream. Rebasing published branches is not a good thing, of course, but fans of rebasing don't propose that.


Winners are not about who's best. Git is not better than hg, koalas are not better than giant sloths, Windows is not better than Linux. So? All major hosted SCMs dropped hg support. What OP tries to do seems emphatically Quijote-ish.

Best is a subjective term. When Windows won the desktop it was the best at running MS Office. I don't know enough about hg to say how git is better, but I'm sure there are areas where git excels. From what I've read hg has a better cli, but that alone may not be enough to be the 'best'.

I maintain that GitHub itself is actually an important reason that Git won out.

Isn't the English word Quixotic?

I don't know and welcome any correction to my grammar or spelling.

yes! it is although it's relatively obscure. (native English speaker here.)


