
Test && commit || revert - grzm
https://medium.com/@kentbeck_7670/test-commit-revert-870bbd756864
======
chowells
One release of the GHC Haskell compiler, long ago, had a bug where it put all
your source files in the list of build artifacts it had created. When GHC ran
into a type error, it deleted all its build artifacts from the current run.

There are stories that a few people argued for keeping it around as a feature,
for training in doing mental type checking.

------
jrockway
Do people not typically modify the tests in the change where they introduce a
feature?

I have never worked anywhere where I would check in a change without the tests
passing. If you are making a huge refactoring, you do it a step at a time,
with each incremental change still having passing tests.

When I was at Google I believe that changes were automatically rolled back if
they broke enough tests globally. I don't recall anyone ever complaining
except that the rollback was slower than they would have preferred. You simply
can't check in a library change that breaks half the company, after all.

If people are writing adequate tests and consider commits to be "change lists"
instead of just a random thing they want to back up from their local
machine... this is a no-brainer. But it does require the culture of having
your code reviewer check that they think there's enough code coverage.
Otherwise "passing tests" doesn't mean anything.

(Google's tools happily showed lines of code that were covered by tests in the
review UI, so you have objective and subjective data as a code reviewer to
look at before approving.)

~~~
derefr
Note that this post is talking about the idea as a follow-on to the idea of
automatically running tests and then _generating a commit_ on every
filesystem-observable change ([https://medium.com/@kentbeck_7670/limbo-on-the-
cheap-e4cfae8...](https://medium.com/@kentbeck_7670/limbo-on-the-
cheap-e4cfae840330)).

I'm sure your (manual) commits keep together the changes to the code and the
matching changes to the test that keep the test green—but your Ctrl+S'es
probably don't. In this strange workflow, the tests are required to be green
every time you press Ctrl+S; and any time you press Ctrl+S and the tests
aren't green, everything you did since the last time the tests _were_ green
disappears from the worktree.
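A minimal sketch of that save-triggered cycle, assuming a hypothetical
`./test.sh` test runner (the linked post's actual script is just `test && git
commit -am working`):

```shell
#!/bin/sh
# One cycle of test && commit || revert (a sketch, not the author's exact
# script): a green save becomes a commit, a red save makes the changes vanish.
tcr_cycle() {
  if ./test.sh; then
    git add -A && git commit -m "tcr: working"   # tests green: snapshot it
  else
    git reset --hard HEAD                        # tests red: changes gone
  fi
}

# Wiring it to every save could look like this (hypothetical; needs
# inotify-tools and a source directory to watch):
#   while inotifywait -qq -e modify -r src/; do tcr_cycle; done
```

The `reset --hard` is exactly the "everything you did since the last green
state disappears" part of the workflow described above.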

~~~
danShumway
I just can't imagine how that would scale in any significant way.

One of the many points of automated testing is that it's automated. I don't
have to mentally run all of my tests in the back of my head to tell whether or
not my code is correct. With this setup, I do. I become scared to even hit
Ctrl+S in case I made a typo somewhere. Or perhaps even worse, what if I made
a typo in my test and just committed a half-finished implementation? I've
taken automated testing, one of the most brilliant time-saving inventions in
all of programming, and turned it into a chore.

And as a programmer, that's supposed to encourage me to do _more_ TDD?

I understand that the article is specifically about addressing those kinds of
doubts, but... I dunno. I can't imagine ever trying this for any program
bigger than a couple thousand lines of code.

If the goal is to have tiny changes, why not just count the lines in the diff
and revert it if anyone ever tries to commit more than 25 lines of code? It
seems like that would be just as effective, and potentially less annoying.

------
ryanmarsh
This is the exact method I use to teach TDD and it works. It works really
well.

The article mentions the sunk cost fallacy. This is real. In the workshop I
give you tougher and tougher parameters and shorter and shorter time frames.
If your tests aren’t passing, you reset to the last commit. You can’t commit
without passing tests. People learn, a) write smaller simpler tests, b) code
is cheap. The third time you write something you’ll write it faster and
better. Better to do that in a tight loop than over a prolonged dev-test-
refactor cycle that could last days to years.

You also learn how to refactor in the green. Boy is that a good skill to have.

~~~
microtherion
But doesn't the classical TDD process start by writing a test and making sure
it fails before the code is changed?

~~~
sdenton4
Generally if I make a change and nothing fails, I assume something has gone
even more wrong than usual...

------
LukeShu
I don't like lumping "add test & pass" together. It's very easy to
accidentally write a test that doesn't test what you think it tests, and
actually passes even without the "fix" applied. I do think it's very important
to verify that the test fails without the associated change applied.

This breaks `git bisect`, so maybe they shouldn't be separate commits in the
public commit log, but they should be separate commits while you're working on
it.
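One way to get both properties, sketched below in a throwaway repo (file
names, contents, and messages are all made up): commit the failing test on its
own, commit the fix as a fixup, then autosquash before publishing so `git
bisect` never lands on a red commit.

```shell
# Throwaway repo purely for the demonstration.
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.email dev@example.com && git config user.name dev
echo 'initial code' > src.txt && git add -A && git commit -qm "base"

# 1. Commit the test by itself; this is the commit that proves it goes red
#    without the fix applied.
echo 'regression test' > test.txt && git add test.txt
git commit -qm "test: reproduces the bug (red)"

# 2. Commit the fix, marked for squashing into the test commit.
echo 'fixed code' > src.txt
git commit -qa --fixup HEAD

# 3. Before publishing, squash the pair into one green commit.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2
```

After the rebase, the public history has a single commit containing both the
test and the fix, so bisecting stays safe.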

------
zacherates
... except seeing the test fail is how I know my test might be working.

It's trivial to write passing tests.

~~~
l0b0
That's why the red-green-refactor style of TDD is so useful. You start by
writing a _failing_ test, then make it pass with the simplest possible change,
and then refactor the code while keeping everything passing. You'd be hard
pressed to write a test which fails and _then_ passes by accident. It leads to
extremely clean code IME.
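A toy illustration of the loop in shell (the `slugify` function and its test
are entirely made up): the test fails against a stub, the simplest change
makes it green, and only then do you refactor.

```shell
# A hypothetical unit under test, started as a stub so the test goes red.
slugify() { :; }

# The test, written first.
test_slugify() { [ "$(slugify 'Hello World')" = "hello-world" ]; }

test_slugify && echo green || echo red   # red: the stub produces nothing

# The simplest possible change that makes the test pass.
slugify() { echo "$1" | tr 'A-Z ' 'a-z-'; }

test_slugify && echo green || echo red   # green: now refactor freely
```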

~~~
mlthoughts2018
In my experience this approach offers nothing, because the whole challenge is
in writing the right tests, and you almost never know what the test is
supposed to be like until after writing code for a while. There’s a symbiosis
between knowing what test needs to be written and knowing what code needs to
be written.

No part of specs, TDD, BDD, etc., actually ever captures the reality of this,
despite every Name Brand Workflow always claiming to.

~~~
l0b0
> There’s a symbiosis between knowing what test needs to be written and
> knowing what code needs to be written.

This should only be the case while unfamiliar with the language or project at
hand. In my experience, once you're familiar with the project and the
language, mulling over a requirement for a short time should be enough to
split it into smaller features which are simple to test in isolation.

~~~
mlthoughts2018
It has pretty much nothing to do with your experience level, your familiarity
with the project or the language or tooling. When those things matter, they
amplify this phenomenon, but the baseline effect is already high.

It’s a function of requirements being ambiguous by design. Non-engineering
stakeholders have a severe need for requirements and specs to be fungible and
ambiguous, such that whether or not the state of the project satisfies
requirements is fundamentally not codified in any objective document and no
test can reflect useful objective measures of it. Tests can only loosely
approximate it and the ways to do so are always shifting and changing and up
to a subjective interpretation of some non-technical people.

Perhaps the only objective measure that non-tech people will agree to be
specific about is financial cost, and even then usually only after it becomes
a problem.

That’s every project in a product-oriented tech company period, and anyone
claiming otherwise is full of it.

There can be isolated, tiny pockets of maintenance work or last-stage dev
work once you pass a point of no return, when business people can no longer
sustainably lobby for requirements to continue to be ambiguous and fungible,
and finally you can start proving things with accurately scoped tests, but
it’s so late in the game that the idea you could do something like TDD at that
point is comical.

Instead, becoming a good product-oriented developer means in part learning how
to write well-factored tests on the fly at the same time you are writing the
code to be tested, and keeping test code clean, backed by fixtures, and set up
to run in highly automated ways, so that when you’re inevitably ripping
stuff out of the implementation for the 18th time due to yet another priority
pivot from sales, you can nimbly update the tests as you go.

~~~
jdlshore
I've been using TDD for years (since 2000!) and my experience doesn't match
yours. I've used it on countless projects, big and small.

I think the disconnect is that you're assuming the tests are about
requirements, when in fact the tests people write with TDD are much more
tightly focused.

~~~
mlthoughts2018
> “in fact the tests people write with TDD are much more tightly focused.”

This usually doesn’t help and sometimes hurts, because if the tests are
tightly scoped yet the requirements are ambiguous, it usually means the tests
are very related to the current implementation details, which are going to
rapidly change due to scope pivots.

TDD simply doesn’t work when you don’t know what tests to write or how to
structure tests to accommodate scope and priority changes, which is all the
time. Starting out by writing a test then coding to pass that test is baking
in an assumption about that test which is almost surely already wrong, and
just creates rework to undo a tightly focused test.

~~~
bethly
When doing TDD, the tests are going to change a lot, just like the code you
say you're already writing. I recommend the book _Refactoring_ for the tools, so you
can make the changes without breaking the test, then clean up the now-
unimportant tests, and then clean up the unused code. That way the tests stay
green even when making major, breaking changes.

Being able to slice projects that way, in terms of functional behavior, is
absolutely a skill that requires practice. You are already thinking about what
your code needs to do: the practice is to get good at expressing that in terms
the computer understands. Once you are in the habit, the tests make it easier
to evolve the code rather than harder.

It is possible to be a good developer without that skill, but learning and
practicing the skill will make you better.

------
wink
This sounds highly useful for changes you can do in a few hours. Or for toy
projects where I can keep the whole application in my head.

I have no idea how this should work on a big refactoring where you touch every
single one of your, say, 200 files and add something to every single method.
(Don't ask, assume a change that moves every single primary key to a composite
key, so every getX(id) would become getX(id1, id2)).

I mean, we were multiple people - we split out the work because a lot of it
wasn't even very error prone, but this took days (ok, weeks). Are you supposed
to just not commit or even show the stuff you did to the others? (Yes, there
were many, many reviews and pair programming sessions in between). Of course
this was on a feature branch. Yes, it lived too long, yes there were still
bugs at the end when we fixed the tests. I don't think it would've gone better
or quicker if we had meticulously broken down the exact same change (add an id
to every getter and setter on controller and db-layer level). Actually we
tried, and as expected it took a deep dive through all the layers.

------
OJFord
I don't think the contrast to TDD (made in the image more than the text) is
right: why not split 'add test & pass' into 'add test' & 'pass', and then you
still have TDD.

Sure, you have one commit where the test was broken. But the product was
broken anyway, and now you have a commit proving the test works.

------
hnruss
The author describes the script in [https://medium.com/@kentbeck_7670/limbo-
on-the-cheap-e4cfae8...](https://medium.com/@kentbeck_7670/limbo-on-the-
cheap-e4cfae840330)

"test && git commit -am working"

This seems like a recipe for a useless commit history.

~~~
derefr
The workflow is using Git as a realtime state-synchronization primitive (like
Operational Transformations in Google Docs), not an SCM. The point is to push
the commits over to the other person as soon as they're created, and for the
other person to pull them and rebase against them as soon as they're created.

Unlike character-by-character OT sync, though, each synced Git commit _does_
represent a working codebase. You could theoretically retroactively grab huge
numbers of these commits and squash them down and describe the result—though
multiple independent things might have happened in them that would be very
difficult to untangle, given that, in this workflow, it's entirely possible
that two separate people are working collaboratively on two separate features,
in the same file, at the same time.
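Retroactively collapsing such a run of machine-generated commits might look
like this (a sketch; `BASE` stands in for the last published commit):

```shell
# Collapse every commit after BASE into a single describable commit.
# The final tree is kept exactly as-is; only the micro-history disappears.
squash_since() {
  BASE=$1
  git reset --soft "$BASE" &&
    git commit -m "feature: describe the net change here"
}
```

Untangling two interleaved features would still mean picking hunks apart by
hand, though, which is exactly the difficulty noted above.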

------
bilalq
This brings Vigil lang back to mind:

[https://github.com/munificent/vigil](https://github.com/munificent/vigil)

It's a programming language that erases any code that fails to meet contracts
or throws an exception.

------
l0b0
We did this _in CI_ at an earlier job, and what we found is that it works if
and only if you have a good test suite. In our case we had too many false
positives because of badly written legacy tests, so it was not a usable
approach.

------
supermatt
The problem with this is making changes TOO small: then you can find you are
testing the wrong behaviour just to make your code committable. I think there
is a balance in finding the right size of increment while still keeping
direction.

------
thechao
This is how I’ve developed code for almost a decade. For a given test we have
the following behavior triple: desired result, expected result, and observed
result. A test fails if its expected result differs from the observed result.
The set of development work is every place the desired & expected results
differ. The CI is passing when _no_ test changes behavior. There’s a revert
“window” where a change-in-state can be blessed.

~~~
Something1234
What's the difference between a desired and expected result?

~~~
thechao
Desired is what you want to happen, expected is what you expect to happen, and
observed is what happened with the current run.

~~~
XorNot
From the perspective of a test that only passes or fails, this doesn't gain
you anything.

I suppose we could imagine a test system which includes more metadata, and
allows for placeholder values to let tests pass while producing an annotation
that the product is not production ready.

------
city41
What am I missing? If commit is a possible action, then we have a unit of code
not yet in the repo. So if that is the case, won't the revert be reverting a
previous, unrelated commit? If we can't commit unless the tests pass, then why
would we ever revert?

I believe what I'm missing is that I'm taking the word `revert` too literally.
I suspect it's essentially `test && commit || checkout .`.

~~~
derefr
It's:

    
    
        ./test.sh && (git add . && git commit -m "auto") || (git reset --hard HEAD)

------
ridiculous_fish
How do high-quality narrative comments get written in this workflow?

Either I write the comment up-front and risk it being deleted, or I write the
comments after the tests pass and thereby propagate uncommented code.

~~~
inerte
Commit your comments. Your tests should pass.

~~~
ridiculous_fish
The idea of this workflow is that saving runs the tests, and then either
commits or deletes all changes. I think in this workflow I would write the
code, get the tests to pass (thereby creating a commit), and then create a new
commit with good comments.

~~~
inerte
Sure. Or, write your comments. Run the script, tests pass, comments are
committed.

------
arithma
This sounds like a sure way to enforce a local optimum. If you can't make
mistakes and break enough things at a time to change the architecture, in
order to fix the architecture itself, it means your architecture is going to
be extremely sticky, and there will be no chance that design mistakes will be
corrected.

~~~
jasonpeacock
Owning production systems means you can't make mistakes and break things.
People are already doing what you're saying can't be done, everyday, when they
change production systems.

He's saying you should figure out the path from A to B and get there in small
steps. And make sure your tests pass at each step. Don't make big changes that
are hard to propagate and merge with others' changes.

Basically, avoid the classic "nobody change anything, I'm doing a merge
today!" situation.

~~~
lifthrasiir
> People are already doing what you're saying can't be done, everyday, when
> they change production systems.

Only seemingly so. People do break tests when they are writing (or removing)
code, and even intermediate commits often don't pass tests (they thus have to
be squashed before merging; they only exist to make reviewers' jobs easier,
and keeping them risks a faulty bisect, for example). Development and
deployment are separate things.

------
boffinism
Test && (commit || revert)?

Or (test && commit) || revert?

~~~
adrianmonk
I'm thinking they mean the Bourne(-style) shell, where this syntax means that
test is run, and then either commit or revert is run depending on its exit
status. That is, it is shorthand for:

    
    
      if test
      then
        commit
      else
        revert
      fi
    

Presumably the point is to encourage you to keep your code changes and tests
so small that you are OK with losing work.

~~~
mikelward
It's actually equivalent to

    
    
      if test; then
        commit || revert
      else
        revert
      fi
    

Or in another language:

    
    
      try {
        test
        commit
      } catch {
        revert
      }
    

This is because

    
    
      test && commit
    

is evaluated, and the resulting value is the left side of

    
    
      expr || revert
    

Not sure if that was intentional, but it's a very nice one-liner, assuming
either your commit never fails, or your commit runs extra tests and you want
to revert if those fail.

------
eterm
I feel this comes from a frustrating kind of purism, like XP: a nice thought
experiment, but completely unworkable in the real world.

The real-world consequence of this would be people _not writing tests_ until
after they are feature-complete so they don't risk losing work.

~~~
henryfjordan
You could couple this with code-coverage requirements, like requiring coverage
to always increase or stay level with every commit. That can still be gamed,
but at least it's harder and more obvious to other devs.

~~~
groestl
People would just write tests which can never fail, and (in the best case) add
the assertions in the end, one by one.

~~~
was_boring
Speaking of tests that never fail... I've seen contractors hand over code
claiming they'd met the 75% test coverage required by Salesforce's platform
while never actually asserting anything.

------
jb3689
Thank god for buffers :)

~~~
jessaustin
My habit of CTRL-Z'ing out of multiple running vims has saved the day a few
times.

~~~
jasonm23
You can easily time travel in Vim.

[https://coderwall.com/p/twr_bw/time-traveling-in-
vim](https://coderwall.com/p/twr_bw/time-traveling-in-vim)

~~~
jessaustin
Sure, but what I meant was that a deleted file that's still in a vim buffer
may be rewritten.

