This has so far been working great.
As John Carmack wrote, it enables us to be a little more fearless in introducing new capabilities and putting them on trial.
For example, my recent trial implementation of disallowing more than one mutable pointer to the same variable being passed to a function:
It's hard to tell in advance what effect this will have on the existing D user code base.
When refactoring or doing performance work, it's very helpful to create two variant implementations: for instance, the simple, obviously correct version and the new, optimized version. One can then send the output from a property-based tester or a fuzzer into both implementations and test: assert forall x . referenceVersion(x) == optimizedVersion(x). A good property-based tester or fuzzer will give one high confidence that the behavior has been both understood and reproduced. Since changes to legacy code and bug fixes are among the primary causes of defect introduction, these testing techniques usually quickly bring to one's attention how fallible one is.
For non-code artifacts like configurations and build systems, where output is assembled or generated, the Unix diff utility can be useful. One assembles or generates the artifacts in two directories: one directory representative of the project before the change, and the second after the change. Anything that shows up in the directory diff should correspond exactly to what one expected to see as a change. Since the full actions of package assembly and generation are often opaque, this technique provides assurance in parts of the system development process where there are often few other safety nets.
You can even do that in a production environment, assuming the "optimised version" is properly protected so it can't take down the entire system if there's an error in it.
It seems he's talking more about refactoring/optimizing/debugging working code rather than adding new features. Then the old version of the code becomes a point of reference against which you test your new implementation.
Once you are very confident that your new implementation is no worse than the old implementation you have two choices.
1. Leave your old implementation around as a reference implementation, but mainly use your new implementation. This is useful if your old implementation is functional and readable but slow and your new implementation is less readable but faster. This helps other developers understand your new implementation.
2. Remove your old implementation altogether and just keep your new implementation in your codebase.
Either way the thing Carmack is advocating for is having an extended period of time where the two implementations coexist, even though one of the implementations is "redundant." He is explicitly advocating this despite the fact that you could in theory just retrieve the redundant implementation from version control and not have to "pollute" your codebase with a redundant implementation.
> If the task you are working on can be expressed as a pure function that simply processes input parameters into a return structure, it is easy to switch it out for different implementations. If it is a system that maintains internal state or has multiple entry points, you have to be a bit more careful about switching it in and out. If it is a gnarly mess with lots of internal callouts to other systems to maintain parallel state changes, then you have some cleanup to do before trying a parallel implementation.
If you have clean interface boundaries, it's easy to have two versions of something coexisting. If you don't, you can't just change the component in question, you have to change the whole program, which means you can't go this route. It's not just about deciding to do two implementations of something, it's about writing your whole program in ways that facilitate this style of development (no side effects, etc.)
The less AAA/polished version is to do this in code instead of in data. Maybe you comment out old.h in favor of new.h, or have a #if 0 ... #else ... #endif chunk in your .cpp files. This can be a lot quicker than adding conditional logic to hundreds of call sites, or implementing hundreds of forwarding stubs, if you don't have a nice dynamic module boundary already. You can make a define, two build variants, and side-by-side compare things that way too. Not quite as nice as runtime switching, but potentially a lot easier, if you've kept things from spilling into common headers too much and have reasonable link times.
Another alternative is to do this on your second computer, carefully synced to the same code/config/debug settings/feature flags/???, and hopefully with the same specs. This manages to work even if things spilled out into headers, but can require some painful coordination to keep in sync.
You can opt into using the reference implementation of a renderer with a flag.
You can opt into using a debug implementation of an allocator with a flag.
You can opt into your experimental hopefully faster renderer with a flag.
There's very little that's fundamentally different in these cases - maybe some get removed from some builds at compile time - and maybe your plans for the future are a little different.
But if plans change - perhaps maintenance of the reference implementation ends up being more costly and less beneficial than expected - you may very well end up deprecating, and then deleting, your reference implementation. To an outside observer not privy to your plans, this is identical to classic "I'm making my new work opt-in, then opt-out, then the only choice as development progresses" temporary feature flags.
EDIT: Heck, https://martinfowler.com/articles/feature-toggles.html even lists plenty of long-term and even static build-time configuration stuff under the category of feature toggles, so I really can't see what's hard to square, unless you're using a much narrower definition of "feature toggles" than is being used there.
But arguing over definitions is not particularly enlightening or fun, so let's take feature toggles in their full generality as DI frameworks.
I still don't think that's what Carmack is advocating for.
Indeed Carmack specifically calls out:
> Code fearlessly on the copy, while the original remains fully functional and unmolested. It is often tempting to shortcut this by passing in some kind of option flag to existing code, rather than enabling a full parallel implementation. It is a grey area, but I have been tending to find the extra path complexity with the flag approach often leads to messing up both versions as you work, and you usually compromise both implementations to some degree.
Now Carmack isn't talking about feature flags quite in the same way you are (I think the shortcut he's talking about here is not doing a full re-implementation and instead interleaving parts of the old code with new code at runtime).
Nonetheless in one important way he is talking about feature flags in the more general case, which is that his approach should lead to ideally no change to your original codebase (the only exception he lists is if your original code relies heavily on global state). This is not true for feature flags (or DI). You need to write new code just to support it. Indeed your link calls this out as the "carrying cost" of feature flags because of the additional "abstractions or conditional logic" they require.
Hence feature flags are not the AAA form of what Carmack is talking about here (although they are one choice of implementation for the particular functionality Carmack is getting at, if not necessarily for his philosophy of doing no violence to the original code). There are several other candidates, but the one I personally gravitate towards is hot code reloading. You write your separate code, then hot swap it in. And hot swap your old code back as necessary (either for educational purposes or for testing purposes). You don't need to decide up-front which parts of your code need to be designed around feature flags; it's all hot-swappable. You don't need to write extra logic to enable feature flags; the hot-swapping makes it all come for free. You don't even need to shut down your process and restart it to read a new configuration!
> What I try to do nowadays is to implement new ideas in parallel with the old ones, rather than mutating the existing code. This allows easy and honest comparison between them, and makes it trivial to go back to the old reliable path when the spiffy new one starts showing flaws. The difference between changing a console variable to get a different behavior versus running an old exe, let alone reverting code changes and rebuilding, is significant.
This is great advice: always being able to compare and easily switch between parallel implementations is key to maintaining systems long into the future and avoiding the v2 rewrites every 6 months.
Creating a solid system that doesn't have breaking changes, but new paths/flows to use at will is key to debugging between the two and maintaining a live app/game/system/infrastructure.
Parallel implementations sometimes take more work and more care with signatures and surfaces/facades in the app, but the benefits in debugging, in comparing multiple implementations, and in easing into new systems when they are fully ready, not rushed, are worth it. This is iterative knowledge from the trenches of the shippers, learned through pain.
It allows you to perform black-box testing, recording and replaying test data (gathered from production).
From there you can refactor your code and even have the library run both new/old codepaths in production, raising errors if there is a mismatch.
To your point about constant rewrites, I think using a library like this while continuously refactoring existing code is a pretty exciting idea.
Too bad I need a Java version (maybe a good idea for a side-project).
Very appealing, but one problem I've found with this general idea is equivalent but non-identical results. A simply-solved example is a serialized set: different orderings differ, but are equivalent. You can get more complex ones, such as ASTs ax+bx and (a+b)x.
One would expect such cases to be pretty rare, but they come up for me all the time.
It's kinda like surgery. You can't just rip open your patient, take out all their organs, then put them back after. You need to keep the patient alive. If you refactor but end up stuck halfway with a bunch of broken code, you either have to heroically finish the refactor and have it introduce minimal regressions and minimal issues or, more probably, you'll run out of time/people will get impatient and the refactor will be abandoned.
I call it “holistic” because instead of focusing on details and interfaces first, the possible problems kind of serendipitously find obvious solutions, progressively emerging from a bigger whole as I move forward. It feels like dropping a Rubik’s Cube from chest height and having it solve itself by the time it hits the ground.
See how the picture “emerges” in this timelapse. The overall thing is always present, and things get increasingly precise as the artist progresses, adjusting details from the overall design. What you don’t see in the timelapse is the artist zooming in and out every other second to make sure the emerging details fit in the overall piece, like you’d zoom in and out of a fractal where everything influences the rest at every level (but note that this does not mean coupling! See how the boat gets scrapped at some fairly advanced point and mostly redrawn to be a better fit in the overall composition). The good thing is you can basically stop at any time and still have a working system, one that will be very easy to improve over time.
This is also how I approach an existing codebase, looking at the overall composition, and will often resort to a parallel implementation of my own, however rough yet capturing its essence, to understand it better. Refactoring the original piece is made much easier then, and depending on its quality, possibly incrementally rewritten in a Ship of Theseus way to gradually eat the original away and make it converge towards a future-proof system.
> Rolling back code and rebuilding to run a test is a pain
“Bisecting with distributed version control” isn’t that hard, though, and the rest of the article doesn’t definitively place it in any era. If I’m reading this correctly, he’s essentially advocating for A/B(/C/D ...) testing and feature flags?
So I'm interested to know when this idea occurred to him, and to read this HN discussion.
Regardless, the idea (“competing” or alternate implementations in the same build) is compelling.
Feature flags and A/B testing are used more for testing new features or functionality. This is directed at evaluating completely different implementations while trying to keep the same functionality. Particularly regarding performance, he touches on the fact that if not all features are implemented in the new version, you can’t trust the metrics.
I've fallen into this trap, with a horrible mess of conditions.
But you still need some kind of flag, right? To switch which implementation to run. He's talking about modules, not entirely separate standalone programs, so I guess he means the flag is outside the implementations chosen between (and never passed into any of them).
EDIT He's using console/environment flags, so it just needs a little code to read flags, and switch implementations.
> The difference between changing a console variable to get a different behavior versus running an old exe, let alone reverting code changes and rebuilding, is significant.
Production code necessarily accumulates features and efficiencies, obfuscating the basic idea.
Keeping the barest possible version around (which you probably wrote anyway, early on) documents what is basically going on.
Write templates or interfaces. Integrate existing code against those, but link the correct module implementation to a dedicated binary for each approach.
C++ templates and shared objects or Dagger multibinding in Java are good ways to do this.
I’ve not tried his way, but he’s pretty explicit about what The Way is.
Modularize your code. Then only rebuild modules that have changed.
Instead, bind different modules to different binary build targets. If you're smart about your module size, compilation time is a non-issue. It will be way more maintainable. And by forcing yourself to do this, you force yourself to modularize your code in an extensible way instead of placing flag hacks everywhere.
In a way they do. Game development/engine development is more possible for this but it happens in all software, especially widely successful software that must iterate while existing implementations are available along with the new one.
A big-tech version might be old versus new Reddit using the same data store to verify how things work, allowing people to change over slowly and on demand with a URL switch. Digg did not do this and famously alienated everyone.
Or when Facebook or others swap out a new version of the API, the old ones run for a time based on the version switch, as do the apps that integrate it and the libraries built for it.
Or some A/B testing in terms of software flow or usability/presentation.
Or in Unity, for instance: they kept their old GUI available while the new Unity UI rolled out, allowing people to switch. Same with the move from the legacy animation system to Mecanim, and with their two particle systems. They have to roll in all new features like this over time now that so much is built on the engine.
When you use Unity, an example of a parallel implementation might be which particle system you use at runtime: maybe both are integrated and you flip between the two to see what looks or works best. Or flipping between the legacy animation system and Mecanim. We have the ability to switch between UI libraries, particle systems, and animation libraries on the fly because of all the different games/implementations we have to support; these need to be baseline support across all apps/games. Same with utilities: they can flip between these systems at runtime to check differences.
Doing parallel implementations can lead to cleaner internals to more easily plug in, and it can prevent the 'version 2' disease of hard cutting over legacy that ends up missing a bunch of features.
Also you're talking about two different things. There is feature experimentation, then there is development experimentation.
Yes, of course you need runtime flags to enable A/B testing. Maintaining two implementations for development purposes has nothing to do with that.
No, it is not basic or even recommended. Experimental feature flags should be used sparingly, especially those that take effect at runtime.
The worst codebases I have worked in were littered with runtime feature flags all over the place, which rendered many tests worthless by creating a combinatorial explosion of runtime complexity.
It is the greenest of developers who use experimental flags liberally.
For example let's say a component needs a rewrite. They would scrap the function signatures for the old one, even if the problem being solved and expectations from a callee perspective haven't changed and they would apply equally well to the new thing.
And they would pretty much never use an "interface" in languages that have them, or a wrapper, everything is a direct call into their explicit dependency.
Can be extended to things like web services. Every rewrite renames identical parameters for no particular reason.
This is a lighter weight version of creating a feature branch and attempting to keep that up to date or doing AB testing. That kind of thing typically involves a lot of project bureaucracy which can make a change more controversial. This way, until you are ready to enable the new path, there is no risk and you can keep on committing/merging changes. The only downside is temporary code clutter.
What he’s doing is building the new version in the same branch as the old, with a simple code switch to use one or the other
This is very similar to A/B testing, but not for user engagement but for accurate replication of the original results.
When you decide to do this up front, it can be fairly easily done and is very valuable for debugging the new version
Ye people like you kinda suck.
Unfortunately, you’re just now learning that maybe the new implementation isn’t “better” but that it’s just...new.
And now it’s slowed down the team too. I know it’s John Carmack, but still, it’s not very collaborative or empathetic towards your coworkers to plow through code like you’re the only one working on it.