Move Fast and Fix Things (githubengineering.com)
538 points by samlambert on Dec 15, 2015 | 90 comments

I'll highlight something I've learned in both succeeding and failing at this metric: When rewriting something, you should generally strive for a drop-in replacement that does the same thing, in some cases, even matching bug-for-bug, or, as in the article, taking a very close look at the new vs. the old bugs.

It's tempting to throw away the old thing and write a brand new bright shiny thing with a new API and new data models and generally NEW ALL THE THINGS!, but that is a high-risk approach that usually comes without correspondingly high payoffs. The closer you can get to a drop-in replacement, the happier you will be. You can then separate the risks of deployment from the new shiny features/bug fixes you want to deploy, and since risks tend to multiply rather than add, anything you can do to cut risks into two halves is still almost always a big win, even if the "total risk" is in some sense the same.

Took me a lot of years to learn this. (Currently paying for the fact that I just sorta failed to do a correct drop-in replacement, because I was drop-in replacing a system with no test coverage, no official semantics, and not even agreement among all consumers about what it was and how it worked, let alone how it should work.)

This is probably very context dependent, because I've learned the opposite.

For example, I was rewriting/consolidating a corner of the local search logic for Google that was spread throughout multiple servers in the stack. Some of the implementation decisions were clearly made because of the convenience of doing so in a particular server. But when consolidating the code into a single server, the data structures and partial results available were not the same, so reproducing the exact same logic and behavior would have been hard. Realizing which parts of the initial implementation were there for convenience, and which were there for product concerns, let me implement something much simpler that still satisfied the product demands, even if the output was not bitwise identical.

I didn't read the parent comment as reproducing the exact same logic perfectly. More as a definition of the interface between the external code and the part to replace and matching that interface closely with the replacement.

This isn't always possible but seems like a reasonable objective given my experience.

You can break this down even more.

As we speak, I'm "replacing" old code by just writing a wrapper around it with the new API it should have.

Then I'll rewrite it without the wrapper, bug-for-bug.

And then I'll actually fix the bugs.
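A minimal Ruby sketch of that sequence, with entirely hypothetical names: step one wraps the legacy code behind the API it should have, and the later rewrite and bug fixes then happen behind that now-stable interface.

```ruby
# Hypothetical legacy code with an awkward interface: positional options array.
class LegacyParser
  def run(str, opts_array)
    { text: str, strict: opts_array.include?("strict") }
  end
end

# Step 1: the "replacement" is just a wrapper exposing the new API.
class Parser
  def initialize(strict: false)
    @strict = strict
    @legacy = LegacyParser.new
  end

  def parse(str)
    # Delegate to the old code for now. Step 2 later: inline LegacyParser#run
    # here, bug for bug. Step 3: fix the bugs behind the now-stable API.
    @legacy.run(str, @strict ? ["strict"] : [])
  end
end

puts Parser.new(strict: true).parse("x")[:strict]  # => true
```

Callers migrate to `Parser#parse` immediately, so the risky rewrite later doesn't have to be coordinated with an API change.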

There's a simpler method than this that provides even more surety, used by e.g. LibreSSL:

1. Start writing your new implementation (or heavily refactoring your old implementation, whichever), but in parallel, for each legacy function you remove, write an equivalent "legacy wrapper" function that implements the old API (and ABI; you have to return the same structs and all) in terms of the new API.

2. As you develop the new code, continue to run the old code's tests. (This shouldn't require any work; as far as the tests can tell, the codebase containing all of {the new code, what's left of the old code, and the legacy wrapper} presents exactly the same ABI as the old codebase.) The old tests should still all pass, at every step.

3. Once you're finished developing the new code, and all the old code's tests are passing, rewrite the tests in terms of the new API.

4. Split off all the legacy-wrapper code into a new, second library project; give it the new "core" library as a dependency. Copy all the old tests—from a commit before you rewrote them—into this project, too. This wrapper library can now be consumed in place of the original legacy library. Keeping this wrapper library up-to-date acts to ensure that your new code remains ABI-compatible; the old tests are now regression-tests on whether a change to the new "core" library breaks the legacy-ABI-wrapper library.

5. Document and release your new core as a separate, new library, and encourage devs to adopt it in place of the legacy library; release the legacy-wrapper (with its new-core dependency) as the next major version of the old library.

When all-or-nearly-all downstream devs have transitioned from the legacy wrapper to the new core, you can stop supporting/updating the legacy wrapper and stop worrying about your updates breaking it. You're free!

In LibreSSL, if you're wondering, the "new core" from above is called libtls, and the "legacy wrapper" from above is called libssl—which is, of course, the same linker name as OpenSSL's library, with a new major version.
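In Ruby rather than C, and with made-up names, steps 1-2 might look like this: each removed legacy function gets a wrapper that reproduces the old signature and return shape in terms of the new core API, so the old tests keep passing unmodified.

```ruby
# Hypothetical "new core" with a cleaner, keyword-based API.
module NewCore
  def self.connect(host:, port:, verify: true)
    { host: host, port: port, verify: verify }
  end
end

# Legacy wrapper: implements the old API in terms of the new one.
module LegacyWrapper
  # The old API took positional args and returned an array [host, port];
  # keep that shape exactly so the old test suite still passes.
  def self.legacy_connect(host, port)
    conn = NewCore.connect(host: host, port: port)
    [conn[:host], conn[:port]]
  end
end

puts LegacyWrapper.legacy_connect("example.com", 443).inspect  # => ["example.com", 443]
```

Once the wrapper module is split into its own library (step 4), the old tests become regression tests proving the new core hasn't broken the legacy ABI.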

This is a textbook example of 'How to Kill an OSS project'

#1, #4, and #5 are completely unnecessary.

If it's desirable to 'kill with fire' the old codebase, you could always create a fork and merge the breaking changes on the next major release.

Creating a second library is a terrible idea for 2 reasons...

You lose the legacy source control history which is (arguably) more valuable than the current source because it can be used to research solutions to old problems.

You split the community, which is devastating to the culture and survivability of an OSS project. Even something as simple as a name change will have massive negative impacts on the level of contributions. Only the most popular and actively developed projects can get away with forking the community.

LibreSSL will likely survive because everything in the world that requires crypto uses OpenSSL. Even then, that was absolutely the wrong way to go about things.

The only solid justification for a rename and complete rewrite, is if there are license/copyright issues.

> You lose the legacy source control history which is (arguably) more valuable than the current source because it can be used to research solutions to old problems.

No reason for that. Both projects—the wrapper and the new core—can branch off the original. Create a commit that removes one half of the files on one side, and the other half of the files on the other, and make each new commit "master" of its repo, and now you've got two projects with one ancestor. Project mitosis.

> You split the community

How so? I'm presuming a scenario here where either 1. you were the sole maintainer for the old code, and it's become such a Big Ball of Mud that nothing's getting done; or 2. the maintainer of the old code is someone else who is really, really bad at their job, and you're "forking to replace" with community buy-in (OpenSSL, Node.js leading to the io.js fork, gcc circa 1997 with the egcs fork, MySQL leading to MariaDB, etc.).

In both scenarios, development of the old code has already basically slowed to a halt. There is no active community contributing to it; or if there is, it is with great disgust and trepidation, mostly just engineers at large companies that have to fix upstream bugs to get their own code working (i.e. "I'm doing it because they pay me.") There are a lot of privately-maintained forks internal to companies, too, sharing around patches the upstream just won't accept for some reason. The ecosystem around the project is unhealthy†.

When you release the new legacy wrapper, it replaces the old library—the legacy wrapper is now the only supported "release" of the old library. It's there as a stopgap for Enterprise Clients with effectively-dead projects which have ossified around the old library's ABI, so these projects can continue to be kept current with security updates et al. It's not there for anyone to choose as a target for their new project! No new features will ever be added to the wrapper. It's a permanent Long-Term Support release, with (elegant, automatic) backporting of security updates, and that's it. Nobody starting a project would decide to build on it any more than they'd build on e.g. Apache 1.3, or Ubuntu 12.04.

> Even something as a simple name change will have massive negative impacts on the level of contributions.

Names are IP, obviously (so if you're a third party, you have to rename the project), but they're more than that—names are associated in our brains with reflexes and conventions for how we build things.

The reason Perl 6, Python 3, etc. have so much trouble with adoption is that people come into them expecting to be able to reuse the muscle-memory of the APIs of Perl 5/Python 2. They'd have been much better off marketed as completely new languages, that happened to be package-ecosystem-compatible with the previous language, like Elixir is to Erlang or Clojure is to Java.

If these releases were accompanied by their creators saying "Python/Perl is dead, long live _____!" then there'd have been a much more dramatic switchover to the new APIs. Managers understand "the upstream is dead and we have to switch" much more easily than they understand "the upstream has a new somewhat-incompatible major version with some great benefits."

One good example: there's a reason Swift wasn't released as "Objective-C 3.0". As it is, ObjC is "obviously dead" (even though Apple hasn't said anything to that effect!) and Swift is "the thing everyone will be using from here on, so we'd better move over to it." In a parallel reality, we'd have this very slow shift from ObjC2 to ObjC3 that would never fully complete.


† If the ecosystem were healthy, obviously you don't need the legacy wrapper. As you say, just release the new library as the new major version of the old library—or call the new library "foo2", as many projects have done—and tell people to switch, and they will.

It's easy to find healthy projects like this when you live close enough to the cutting-edge that all your downstream consumers are still in active development, possibly pre-1.0 development. The Node, Elixir, Go and Rust communities look a lot like this right now; any project can just "restart" and that doesn't trouble anybody. Everyone rewrites bits and pieces of their code all the time to track their upstreams' fresh-new-hotness APIs. That's a lot of what people mean when they talk about using a "hip new language": the fact that they won't have to deal with stupid APIs for very long, because stupid APIs get replaced.

But imagine trying to do the same thing to, say, C#, or Java, or any other language with Enterprise barnacles. Imagine trying to tell people consuming Java's DateTime library that "the version of DateTime in Java 9 is now JodaTime, and everyone has to rewrite their date-handling code to use the JodaTime API." While the end results would probably have 10x fewer bugs, because JodaTime is an excellent API whose UX makes the pertinent questions obvious and gives devs the right intuitions... a rewrite like that just ain't gonna happen. Java 9 needs a DateTime that looks and acts like DateTime.

Interesting! It seems almost as if the order doesn't matter, so long as each step is incremental and maintains the invariants of the previous step.

Sounds like a perfectly reasonable deprecation strategy.

To go a bit further...

Document and mark the old endpoints for deletion on the next major release.

Mark the old API endpoints as deprecated in minor releases when the new implementation is completed and the old API endpoint is wrapped.

Isn't that the whole purpose of SemVer?
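As a sketch of that deprecation step (names are hypothetical): the old endpoint survives as a thin shim that warns and delegates to the new implementation, and gets deleted in the next major release.

```ruby
# New implementation (hypothetical name).
def fetch_user_v2(id)
  { id: id, source: "v2" }
end

# Old endpoint, kept as a deprecated wrapper until the next major release.
def fetch_user(id)
  warn "[DEPRECATION] fetch_user is deprecated; use fetch_user_v2 (removal in 3.0)"
  fetch_user_v2(id)
end

puts fetch_user(7)[:source]  # => v2
```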

Yup, this is how I do it too: generally there's an existing implementation that needs both a new API and a new implementation.

I've generally found that doing the smallest possible thing that results in an improvement is the best way forward.

There's always more use cases for the existing code than you think there are.

There's always more corner cases handled by the existing code than you think there are.

There's always more bug fixes in the existing code than you think there are.

Combine all of those, and writing a replacement is always much harder than you expect it to be.

> ...that is a high-risk approach that is usually without correspondingly high payoffs

That's probably true, but it's also true that over a long enough timescale (100 years, to trigger a reductio ad absurdum) there is a very high risk that not replacing or rewriting that code will sink your technology and possibly your organization.

Just because the risk will be realized in the long run doesn't mean it's not a risk. And if the worst-case scenario is death of the entire organization, then the math could very well add up to a full rewrite. Most business managers are not prepared to think strategically about long-term technical debt. It's the duty of engineers to let them know the difference between "not now" and "never". And the difference between "urgent" and "low priority".

I think jerf was trying to say that when re-implementing something, re-implement it first then improve it, don't do both at once.


That means even if your new pretty-shiny is going to eventually have a new API, it is usually worth it to still offer a shim with the old interface. It may seem like extra useless work, but it reduces risk.

Far be it from me to suggest rewrites are never useful. Half my career could be characterized as "rewriting the things nobody thought could be rewritten because they were too messy and entrenched". It's personally a bit risky (you end up owning not only the bugs you created, but the bugs you failed to reproduce... double whammy! better be good at unit testing so at least the new code is defensible), but the payoff is pretty significant, too, both for your career and for your organization.

> re-implement it first then improve it

Right. I'm saying (poorly, I guess) that that doesn't always work. If the technical debt is sprinkled throughout your system or your organization is sunk deeply into a broken paradigm, you're in trouble. Sometimes you need to provide a completely new technology that serves many of the same use cases in a different way.

I'm not saying we should reach for rewriting things as our first instinct.

I am saying that your COM interface will need to be replaced some day. Or your COBOL business logic will become a liability. Or your frames-based web API will cost you business.

Incremental change leads you to a new local maximum, but sometimes that local maximum stunts your growth or even proves deadly.

One strategy that I've seen work for this kind of deep architectural change is to write the new system, then write a shim that provides a compatibility layer (however hacky and ugly) to emulate the old system. This lets you test the new system without then having to also test everything that the system interacts with. And then, start replacing usages of the shim with direct interaction with the newer prettier system.

Think the win32 to NPAPI transition.

Yup! Loved that article when I first saw it on HN - I felt it described well a lot of upgrade projects undertaken at my then-employer.

This koan also keeps your change history coherent which will likely be useful for the person that has to fix it later.

It boils down to "don't change the implementation and the interface at the same time". It doesn't say "don't ever change the interface".

>but it's also true that over a long enough timescale (100 years, to trigger a reductio ad absurdum) there is a very high risk that not replacing or rewriting that code will sink your technology and possibly your organization.

Could you elaborate on what basis you claim this as a truth?

It's largely conjecture, admittedly. But there are very few pieces of software that last 40 years. Most companies don't plan to be out of business in the next 40 years. And many tech companies have gone out of business in even the last ten years because they weren't agile enough to adapt quickly to new technological changes.

I don't think that argument holds. Usually it's true that if something hasn't lasted a long time, it likely won't last a long time in the future. But we live at a quasi-special time, close to a boundary condition.

There exists no software that has existed for 100 years, because software was invented sometime in the past hundred years. You can reductio ad absurdum that argument by saying, the week after the first software was written, that no pieces of software have lasted more than a week, and therefore very few pieces of software will last more than a week.

So very few pieces of software have lasted 40 years, because very few pieces of software were written at least 40 years ago.

While I agree that in the abstract, technical debt can catch up to you, I'm just not sure that its impact is necessarily such that it cannot be contained or mitigated.

I mean, UNIX has been around for 30-odd years and it hasn't sunk. It carries tremendous technical debt, in terms of bad design, in terms of implementation bugs that need to be carried forward, etc.

My sense is that the more organizations that depend on a piece of tech, the more chance there is that it is going to age well, warts and all.

I've learned a variation on this theme, which is more specific:

You have some code that, for whatever reason, you think is not very good. There is a reason why you have ended up with code that is not very good.

If your action is to sit down and write it all again, you should anticipate getting a very similar result (and this has been the outcome of every "big rewrite" effort I've ever seen: they successfully reproduced all the major problems of the old system).

The reasons why this happens probably have something to do with the way you're doing development work, and not with the codebase that you're stuck with. Until you learn how to address those problems, you should not anticipate a better outcome. Once you have learned how to address those problems, you are likely to be able to correct them without doing a "big rewrite" (most commonly by fixing them one piece at a time).

Sometimes I see people attempt a "big rewrite" after replacing all of the people, thinking that they can do a better job. The outcome of this appears to me to invariably be that the second team who tried to build the system with no real experience end up following a very similar path to the first team that did the same thing (guided by the map that the first team left them, and again reproducing all the same problems).

From these observations I draw one key conclusion: the important thing you get from taking smaller steps is that you amplify your ability to learn from things that have already been done, and avoid repeating the mistakes that were made the last time. The smaller the step, the easier it becomes to really understand how this went wrong last time and what to do instead. Yes, the old codebase is terrible, but it still contains vitally important knowledge: how not to do that again. Neither writing new code from scratch without touching the old, nor petting the old code for years without attempting to fix it, is an effective way to extract that knowledge. The only approach I've ever really seen work is some form of "take it apart, one piece at a time, understand it, and then change it".

I think what works even better is to have "permission" to gradually change both the old and the new. It can drastically simplify the process of creating a replacement, if you're only replacing a slightly more sane version of the original instead of the actual original.

The biggest issue with new things is unknown unknowns.

This is as true as it can get. I'd like to re-emphasize the bug-for-bug part. Sometimes that's the only way you can drop-in replace a system.

The strategy of proxying real usage to a second code path is incredibly effective. For months before the relaunch of theguardian.com, we ran traffic to the old site against the new stack to understand how it could be expected to perform in the real world. Later of course we moved real users, as incrementally as we possibly could.

The hardest risk to mitigate is that users just won't like your new thing. But taking bugs and performance bottlenecks out of the picture ahead of time certainly ups your chances.

Out of curiosity - when you've done this type of proxy test, what do you do about write operations? Do you proxy to a test DB, or do you have your code neatly factored to avoid writing on the test path (I guess most code I've worked on that needed a rewrite also wasn't neatly factored :) ).

> The hardest risk to mitigate is that users just won't like your new thing.

Do they ever? Why change the part users are used to?

This is tangential, but given the increasing functionality and maturity of libgit2, I wonder if it would yet be feasible to replace the Git command-line program with a new one based on libgit2, and written to be as portable as libgit2. Then there would be just one Git implementation, across the command line, GUIs, and web-based services like GitHub. Also, the new CLI could run natively on Windows, without MSYS.

While I think the libgit2 initiative is fantastic, I don't think there needs to be just one Git implementation.

One of my favourite things about git is that the underlying storage and protocol is really simple and straight-forward to implement. You could do a lot of it in shell scripts, if you wanted to.

The stateless storage is simple and consistent, but the thing that does vary is the various operating algorithms: diff, merge, garbage collection, etc.

This creates a really interesting ecosystem where you could potentially have third-party tools that have some secret sauce producing more efficient diffs or fewer conflicting merges but still base it entirely on the open git ecosystem and remain completely backwards-compatible with all the other tooling.

No matter how simple and straight-forward it is, someone is going to fuck it up. And they're going to do so in a way that isn't immediately detectable, but screws the rest of the company because now they have to support something using the screwed up implementation.

Git has a bunch of tests [0]. I expect that not all of them are git-the-official-CLI-specific.

> And they're going to do so in a way that isn't immediately detectable, but screws the rest of the company because now they have to support something using the screwed up implementation.

eeeeeeeeeh. Sensible companies will figure out how to massage the broken data into the correct form, then go on using the correct software.

[0] https://github.com/git/git/tree/master/t

That is my dream. Right now, many Windows GUIs (SmartGit/SourceTree) use 'git.exe' to manage the actual Git repositories.

If libgit2 is fully mature, I can imagine more GUIs/tools will be built to manage/analyze the git repositories.

> Right now, many Windows GUIs (SmartGit/SourceTree) use 'git.exe' to manage the actual Git repositories.

For what it's worth, all of the Git functionality in Visual Studio uses libgit2.

I believe the GitHub Windows client is going that route: https://github.com/blog/1127-github-for-windows.

I also remember some blog post going into sync/async details of git.exe vs libgit2 stuff. Will try to google it.

This is the article you speak of: http://githubengineering.com/git-concurrency-in-github-deskt...

The AsyncReaderWriterLock mentioned in blog does not directly show up in Google, but it appears to be based on the one in this blog post: http://blogs.msdn.com/b/pfxteam/archive/2012/02/12/building-...

At GitLab we're very grateful for all the work that has been done on libgit2 by GitHub and others, and we plan to move to it completely.

How does Scientist work with code that produces side effects? In the example, presumably both the new and old each create a merge commit. Maybe these two merge commits are done in in-memory copies of the repo so that the test result can just be discarded, but what about in the general case where a function produces an output file or some other external effect?

I would think that the operation would either (a) have to be pure or (b) be executed in two different environments. I think going for (a) is the easier approach. If you produce an output file, make a pure operation that generates the contents, then write it as a subsequent operation. Now you can test the contents against each other, but only actually write one of them.

Basically, create an intermediary object that represents your state change and test those. Then "commit" the change from control and discard the one from the experiment.
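A toy Ruby sketch of that intermediary-object idea (`render_report_v1`/`render_report_v2` are hypothetical stand-ins): both paths produce a value describing the change, the values get compared, and only the control's value is ever committed.

```ruby
require "tmpdir"

# Old pure generation step: builds the file contents without writing anything.
def render_report_v1(rows)
  rows.map { |r| r.join(",") }.join("\n")
end

# Rewritten path under test; also pure, so running it has no side effects.
def render_report_v2(rows)
  rows.map { |r| r.join(",") }.join("\n")
end

rows      = [[1, "ok"], [2, "fail"]]
control   = render_report_v1(rows)
candidate = render_report_v2(rows)
warn "report experiment mismatch" unless candidate == control

path = File.join(Dir.tmpdir, "report.csv")
File.write(path, control)  # the write happens once, always from the control
```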

You'd only want to use pure functions in this manner. If external state is being modified, you can use a monad, or similar, to contain it.

I am trying to understand why the new merge method needed to be tested online via experiment. Both correctness and performance of the new merge method could have been tested offline working with snapshots (backups) of repos. Could a github engineer shed more light here?

Author here. 5 years ago I would have agreed with you and logged e.g. 10 million merge requests to replay them offline. But one thing I've found over the years (which may seem obvious in retrospect) is that staging environments are not identical to production. Particularly not when it comes to finding sneaky bugs and performance regressions -- the code doesn't run in the same exact environment it will run in once deployed (it has different input behaviors, and most importantly, different load and performance characteristics).

The question then becomes "why would you run these experiments offline when you can run them online?". So we simply do. I personally feel it's a game changer.

> why would you run these experiments offline when you can run them online?

It probably doesn't apply in this case, but many bugs cannot be replicated in the production environment. At least, you don't want to break things in production to see if your software does the right thing in adverse scenarios.

Some would say that's a really good strategy. Keeps you on your toes...


Couldn't agree more. It is almost impossible to have a staging environment which is EXACTLY like the production environment. The slightest difference can introduce a bug in production that gets missed in staging. And if you can run your experiment in production without any downside, why not?

TL;DR: In theory, theory and practice are the same. In practice, they are not.

If you read about what they're doing, they basically are doing that. The tests are run independently of the production code, and production is just providing the test cases.

That is harder than just running the experiment online.

Speculation, but if they already have the infrastructure to run the test online then it was probably easier than building one-time-use tools to test backups.

maybe they just do everything in production. "move fast and fix things"

Seems like the biggest takeaway is "have good tooling and instrumentation". I'm working with a complicated legacy production system, trying to rebuild pieces of it, and we have little or no instrumentation. Even _introducing_ such tooling is a potentially breaking change to production systems. A pity.

Very cool. I like this parallel execution of the original version and the update, with comparisons between the two. They use a Ruby package developed in house that has been made open source, Scientist. Does anyone know if there is a similar type of package for Python (preferably 2.7) development? It seems like an interesting area in between unit tests and A/B tests.
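For anyone who hasn't seen it, the pattern Scientist implements can be sketched in a few lines of plain Ruby. Note this is a toy re-implementation of the idea, not the gem's actual API:

```ruby
# Run control and candidate, compare results, but always return the control's
# value so production behavior never changes while the experiment runs.
def science(name)
  exp = {}
  yield exp
  control   = exp[:use].call
  candidate =
    begin
      exp[:try].call
    rescue => e
      e  # a raising candidate counts as a mismatch, never breaks production
    end
  warn "#{name}: mismatch, got #{candidate.inspect}" unless candidate == control
  control
end

result = science("merge-sort") do |e|
  e[:use] = -> { [3, 1, 2].sort }               # control: old implementation
  e[:try] = -> { [3, 1, 2].sort_by { |x| x } }  # candidate: new implementation
end
puts result.inspect  # => [1, 2, 3]
```

The real gem adds sampling, timing, and publishing of mismatches, but the core contract is the same: the caller always gets the control's result.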

> Finally, we removed the old implementation — which frankly is the most gratifying part of this whole process.

On average, I get much more satisfaction from removing code than I do from adding new code. Admittedly, on occasion I'm very satisfied with new code, but on average, it's the removing that wins my heart.

I've long been dreaming of adding the following tagline to my resume: "Fixing bugs by removing code since 2006"

TIL that github used to merge files differently than git because it used its own merge implementation based on git's code, to make it work on bare repos. Showcases a benefit of open formats and open source, showcases a downside as well (I'd never guess it might merge differently.)

It's a good thing nobody contributes to my GitHub repos, since no one has had the chance to run into the issue...

I wish they would add the ability to fast-forward merge from pull requests. I know many large projects (including Django) accept pull requests but don't merge them on Github simply because of the mess it makes of the history.

In my team's projects, code review in PRs made a mess of the history (certain devs in particular :P). We switched to a squash merge based workflow to address it, git reflow is our particular poison: https://github.com/reenhanced/gitreflow

This is inspiring reading. One may not actually need the ability to deploy 60 times a day in order to refactor and experiment this effectively, but it's clearly a culture that will keep velocity high for the long-term.

In the pursuit of getting things done, we forget fundamentals, and more often than not it's the sound fundamentals that come in handy once your product has grown beyond the original one- or two-person tech team.

For operations that don't have any side effects, I can definitely see how you could use the Science library.

I'm curious though if there are any strategies folks use for experiments that do have side effects like updating a database or modifying files on disk.

The first thing to do is to try to minimize the scope of mutating operations: e.g, decompose a monolithic read-write operation into independent load, transform, and store. You can then easily test as much as possible (the load and the store) side-by-side. The parts that must have side effects are still hard, but at least there are fewer of them. (One variant of this would be to build your code such that you can always intercept side-effects, and then block the new code's side effects and compare with the old code)
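That decomposition might look like this (a hedged sketch with invented names): the transform is pure, so it can safely run twice and be compared, and only one write ever happens.

```ruby
Record = Struct.new(:id, :balance)

# Load: the read side, shared by both paths.
def load_record(store, id)
  store.fetch(id)
end

# Transform: pure, so it's safe to run once per implementation and compare.
def apply_interest(record, rate)
  Record.new(record.id, (record.balance * (1 + rate)).round(2))
end

# Store: the single mutating step, executed exactly once.
def store_record(store, record)
  store[record.id] = record
end

store      = { 1 => Record.new(1, 100.0) }
old_result = apply_interest(load_record(store, 1), 0.05)
new_result = apply_interest(load_record(store, 1), 0.05)  # stand-in for the rewritten transform
store_record(store, old_result) if old_result == new_result
puts store[1].balance  # => 105.0
```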

The next thing is to try to take advantage of idempotence (and, by extension, try to make as many of your mutating operations idempotent as possible): there's nothing completely risk free, but you can at least verify idempotence for either ordering of new and old code, and if you're factored right, you can run both paths on the same input and verify they have the same side-effects and output.

Finally, making the observation that the new code must in general be backwards-compatible with the old code, and both versions need to be able to run concurrently (because this situation will always exist during deployment): in the worst case, you can always start with a limited deployment of the new path, which limits the amount of damage done if the new code is bad.

Point the control to the real database, and the experiment to an unused database. After each request through Scientist, the two databases should be identical.

Or, make the output be the SQL command used to mutate the database. If the two outputs are different, then you've found something that needs investigation.
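A toy sketch of that second approach (names invented): instead of executing writes, both code paths emit the SQL they would run, and the experiment compares the strings.

```ruby
# Old path: builds the statement in one go.
def old_update_sql(user_id, name)
  "UPDATE users SET name = '#{name}' WHERE id = #{user_id}"
end

# New path: builds the same statement a different way.
def new_update_sql(user_id, name)
  ["UPDATE users SET name = ", "'#{name}'", " WHERE id = #{user_id}"].join
end

control   = old_update_sql(42, "Ada")
candidate = new_update_sql(42, "Ada")
puts(control == candidate ? "match" : "MISMATCH: #{candidate.inspect}")  # => match
```

Only the control's SQL would actually be executed; any string mismatch flags the new path for investigation before it ever touches the database.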

Github sounds like a great place to work.

Wow, strange that people weren't reporting these merge issues when they were clearly impacting people.

My read of the article implies that they were running the new method on the side, and comparing the results to the old method that was still running in production. They got to 100% before they actually pulled the lever on what customers would use.

Edit: Ah, I see - talking about the Git bugs, not the differences. I'm actually not surprised that "256 (or a multiple) merge conflicts" was never noticed (or at least root-caused and fixed) by the entire git community.

Wonderful ability to use a large userbase as a giant fuzzer.

I think OP was talking about the issue in git itself that caused merges with mod-256 conflicts to go through and be committed, including all the merge error markers.

This happened on their live system (and would have happened on the command line for local git users), so OP (and incidentally, me too) was wondering how that wasn't noticed and wasn't causing support issues (it probably was, which might have been another reason for this refactoring).

I think it is a combination of two things:

- the mod-256 conflict bug is exceedingly rare. Keep in mind that this is mod-256 individual hunk conflicts in a _single file_. Most files in a merge with conflicts have a handful of hunks. Over all of the testing at GitHub, only a single merge triggered this bug, and it was on a long repetitive file with automated changes.

- the failure case was to quietly accept the merge. The result was obviously bogus, but didn't look any different than a user accidentally checking in the merge conflict markers. So if it was happening, I'd suspect that it went undiscovered either because the merge results were never used (e.g., it was a test-merge to feed the PR "Merge" button status) or the users simply scratched their head and fixed it.
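One plausible mechanism for this kind of wraparound (an illustration of mod-256 truncation in general, not necessarily git's exact code path): if a conflict count is ever surfaced through a Unix exit status, it gets truncated to 8 bits, so 256 conflicts is indistinguishable from zero.

```ruby
require "rbconfig"

# Unix exit statuses are a single byte, so a process "returning" 256
# is indistinguishable from one returning 0 (success).
def observed_status(code)
  system(RbConfig.ruby, "-e", "exit #{code}")
  $?.exitstatus
end

puts observed_status(1)    # reported as 1: "one conflict"
puts observed_status(256)  # reported as 0: looks like "no conflicts"
```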

I would guess abritishguy was talking about the two git command line bugs they found.

I haven't read the whole article yet, so I might have missed something; but how do we know that people weren't reporting these issues?

I've always had to report issues to GitHub via email as they do not have a public issue tracker (something I've always found a bit ironic).

It's interesting that git is the same. [EDIT:] ...in that all issues and PRs are emailed rather than entered into a web app.

I'm not familiar with the Git project's inner workings, but their website git-scm.com tells me they are hosted on GitHub, which has a public issue tracker: https://github.com/git/git-scm.com/issues

GitHub, however, only allows issues to be reported via private mail. I'm not aware of a 'public' issue tracker for GitHub anywhere (even if a mailing list).

This is the web application for the git-scm.com site. It is meant to be the first place a person new to Git will land and download or learn about the Git SCM system.

This app is written in Ruby on Rails and deployed on Heroku.

git remains the same, indeed :)

Nothing really to contribute or ask, other than to say that I really enjoyed the writeup. Although I have nothing coming up that would use the code, the new library sounds really neat. Kudos!

Very interesting, definitely gonna try this out as I have seen similar use-cases.

Any chance GitHub is at any time going to show the specific merge conflicts for a PR that cannot be merged?

Humans will always reverberate around truths like this.

The emphasis shift on breaking vs fixing looks like a good example of how fashion trends in tech create artificial struggles that help new people understand the "boundaries" of $things.

Fashion's like a tool for teaching via discussion

Edit: I'm just commenting on what I perceive as a fashionable title, not the article.

When running with Scientist enabled, doesn't that mean you will pay the combined runtime of the old and new implementations instead of just one?

I could see this being ok in most cases where speed is not a concern, but I wonder what we can do if we do care about speed?

We could go look at the implementation/documentation of Scientist to confirm, but nothing says the comparison has to be happening on the response thread. You could fork a new thread to run the unused implementation without impacting response time, as long as you have the server capacity.
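A sketch of that idea (hand-rolled, not the Scientist gem's actual behavior): run the control inline, kick the candidate onto a background thread, and compare results off the request path, so the response time only pays for the old implementation.

```ruby
# Run the control synchronously; run the candidate on a background thread
# so the response only pays for the old path. The block is called with
# both values if they disagree.
def experiment(control:, candidate:, &on_mismatch)
  value = control.call
  Thread.new do
    begin
      candidate_value = candidate.call
      on_mismatch.call(value, candidate_value) if candidate_value != value
    rescue => e
      # A crashing candidate must never take down the request.
      warn "candidate raised: #{e.message}"
    end
  end
  value # the caller only ever sees the control's result
end

result = experiment(
  control:   -> { 2 + 2 },
  candidate: -> { 2 * 2 }
) { |old, new| warn "mismatch: #{old} vs #{new}" }
# result is 4, computed from the control path alone
```

One caveat with offloading: as I understand it, the real Scientist gem runs both branches inline (in random order) partly so it can measure and compare their timings, so moving the candidate to a thread trades away that timing data.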

The article mentions running both code paths in parallel, so it becomes a matter of whether they have enough capacity for an extra cost for a (rather specific) path of their code. For this specifically I would imagine the cost is negligible in the big picture, considering regular activity far outnumbers pull/merge requests.

Does anyone know what an "O(n) issue" is? I can think of a few possible meanings in the usage here, but I've never heard it before and they all seem wrong.

Using a linked list where the actual access pattern is random and a hash table is more suitable would be the most obvious, especially since this is C code.

Similar culprits include C-style string functions like strlen() that require iterating over an array of unknown length. Caching (or, better, avoiding!) this work can save a lot of computing time. Example from libgit2: https://github.com/libgit2/libgit2/commit/7132150ddf7a883c1f...

In fact, C-style strings are a rich source of O(n) performance issues. And git is full of strings like filenames.

To me, I would have called that an "O(n^2)" issue (or whatever the proper non-linear order is). The issue isn't that the algorithm is O(n), but that it's not O(n) when it could be! The other interpretations by andrewaylett and daveguy seem more accurate (and also agree with my own thinking). But I could also see it being used for "the asymptotic complexity was hiding a huge constant".

Have you heard your interpretation used in the wild?

It's an issue where processing something took O(n) time, when an algorithm to process it asymptotically more quickly (say O(log n)) would be more appropriate.

The old code probably works properly for the average case, but will really blow up on pathological cases. My best personal example was an optimisation pass that turned out to be exponential in the worst case. On average `n` was small, so no-one cared, until a pathological case came up where compile time ballooned to over half-an-hour. Working out that the operation could be done in a linear way only shaved a fraction of a second off the average case, but brought the half-hour case down to a few seconds.

"O(n) issue" is generic for some asymptotic optimization issue. It doesn't literally mean it was running in O(n) and there was a better O(n log n) solution. It could be O(n^2) and there was an O(n) solution or O(n) when there was an O(log n) solution or O(n!) when there was an O(n^2) solution. It just means there was a poor algorithm choice / design decision for some part of the code when a more efficient option is available.

It's notation denoting the performance of an algorithm. See https://en.wikipedia.org/wiki/Big_O_notation


The word "debt" is not just a financial term. There are debts of gratitude, debts to society, debts of honour, and so there are also technical debts.

Objecting to the name "technical debt" on the basis that it is not the correct financial use of the term is like objecting to the name "work day" on the basis that it isn't measured in joules. It's a category error.


Yet you have not given one example of a better term. If you're not part of the solution, you're part of the precipitate... here at the bottom of the page.
