On a more serious note, the author is describing a scenario where mocks are generally not useful, ML or not: never mock the code that is under your control if you can help it.
Also, any test that calls out to another "function" (not necessarily a programming language function) is more than a unit test, and is usually considered an "integration" test (it tests that the code that calls out to something else is written properly).
In general, an integration point is sufficiently well covered if the logic for the integration is tested. If you properly apply DI (Dependency Inversion/Injection), replacing the external function with a fake/mock/stub implementation allows the integration point to be sufficiently tested, depending on the quality of the fake/mock/stub.
If you really want to test unpredictable output (this also applies to e.g. performance testing), you want to introduce an acceptable range (error deltas), and limit the test to exactly the point that's unpredictable by structuring the code appropriately. All the other code and tests should be able to trust that this bit of unpredictable behaviour is tested elsewhere, and be free to test against different outputs.
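To make that concrete, here's a minimal pytest-style sketch of an acceptable-range assertion; `noisy_score` is a made-up stand-in for the single unpredictable piece of code:

```python
import random

import pytest


def noisy_score(text: str) -> float:
    # Made-up stand-in for the one unpredictable bit (e.g. a model call).
    return 0.87 + random.uniform(-0.02, 0.02)


def test_score_within_acceptable_range():
    # Assert an error delta rather than an exact value; everything else in
    # the suite can trust that this behaviour is pinned down here.
    assert noisy_score("some input") == pytest.approx(0.87, abs=0.05)
```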
I consider the distinction between unit and integration tests useless. The important part is that tests give me confidence that my system is working, or, if it's not working, that I can quickly find the problem. I don't care if the bug is in my code or in some third-party code, I care that there is a bug. More than once the bug was because the third-party code works differently from what I expected, but mocks only tell me that if the code works as I expect then my code works - which isn't very good.
Fast tests are important. However I find that most integration tests run very fast if I select the right data. I can write unit tests that run very slow if I select the wrong data. I find that the local filesystem is very fast and so I don't worry about testing it (I place my data in a temporary directory - the problems with temporary directories are about security, which I don't worry about in unit tests). Likewise I find that testing with a real database is fast enough (my DB is sqlite, which supports in-memory databases - not everyone can work with a real database in that way). Which is to say: you should challenge your assumptions about what makes a test slow and see if you can work around them.
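As a rough illustration of both points (a temporary directory plus an in-memory sqlite database), assuming pytest as the runner:

```python
import sqlite3
import tempfile
from pathlib import Path


def test_write_and_read_back_file():
    # The local filesystem, in a throwaway directory, is usually fast enough.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.txt"
        path.write_text("hello")
        assert path.read_text() == "hello"


def test_query_against_real_database():
    conn = sqlite3.connect(":memory:")  # a real database, no disk involved
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.execute("INSERT INTO items VALUES ('widget')")
    assert conn.execute("SELECT count(*) FROM items").fetchone()[0] == 1
```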
You call both tests (which they both are) but if one specifically uses the word "unit test" then they should try using it appropriately, since there is a reason why the extra word is there.
We don't need a global definition of what a unit is, we just need to know what unit is being tested by a given unit test. This is important because it's how we understand coverage:
1. We need to understand what's within the unit (covered by the test) so we don't write redundant tests, and...
2. ...we need to understand what's not within the unit (not covered by the test) so we can write separate tests to cover that unit.
The unit can be anything, as long as we know what unit we're testing when we read/write/run the test.
An integration test also needs to know what it's testing, but now you're looking at it from an outside perspective. You no longer care which specific areas of code (units) are within the integration, you just care what features are within the integration. Sometimes a refactor will mean that feature A no longer causes class B to call class C: that probably means changes to B's unit tests, because the code unit changed, but it doesn't mean changes to the integration test because the feature didn't change.
The important thing is to understand what's being tested by a given test.
For example, a project I'm working on has integration tests which are written with Selenium and run in a headless browser. Most of these tests involve users, but most of these tests aren't testing user creation, so I don't care how the users get created. Instead of going through the signup interface using Selenium, I'm just injecting the users into the database directly because that's faster and easier. The one exception is I'm not injecting the users into the database in my signup/login integration tests, because whether the user gets created is part of the feature under test.
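A rough sketch of that setup decision (the schema and helper are invented for illustration; the real project would use its own models and a Selenium fixture where the comment sits):

```python
import sqlite3


def make_test_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, password_hash TEXT)")
    return conn


def inject_user(conn: sqlite3.Connection, email: str) -> None:
    # Create the user directly in the database, skipping the signup UI entirely.
    conn.execute("INSERT INTO users VALUES (?, ?)", (email, "fake-hash"))


def test_most_features_start_from_an_injected_user():
    conn = make_test_db()
    inject_user(conn, "alice@example.com")
    # ...drive the feature under test with Selenium here; user creation itself
    # is only exercised end-to-end in the signup/login tests.
    assert conn.execute("SELECT count(*) FROM users").fetchone()[0] == 1
```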
Of course there are significant overlaps between these two categories, that is to say, there are some unit tests that are also integration tests.
It certainly doesn’t bother me, but considering the length of your rant, which I won’t bother to read after your first sentence, you certainly seem to have quite a few hangups.
At Google they draw the distinction based on i/o. Anything that does i/o wouldn't be considered a unit test by them. Unit tests are usually for algorithms, math functions, data structure classes, and stuff like that. Most developers have never needed to write any of those things because they're abstracted by modern languages, which is why I suspect the terminology confuses some people.
Unit testing in the realm of software originated from the Smalltalk world, where a unit described the test suite for a class. That, with some liberty around the use of class in languages that are not class-centric, reflects the vast, vast, vast majority of tests I see out there in the wild.
The reality is that unit testing swept the world and has become synonymous with testing. The days of yore when things like capture and replay testing were it are behind us. Today, it is uncommon for anyone to write anything other than unit tests (as originally defined).
And that's why "unit testing" has become confusing. Its original meaning no longer serves a purpose. Today, we just call it "testing", leaving those who wish to continue to use "unit" for the sake of fashion to invent pet definitions; definitions not shared by anyone else.
That's a useful distinction for creating a shared language within Google, and it might be a useful distinction for your team too.
But it's definitely not the only way to divide things up. The important thing is that the terminology your team uses is useful for your team.
One project I'm working on has two kinds of tests: unit tests run within the code and test classes, state transitions, etc. by calling the code directly, while integration tests are run in Selenium in a headless browser from the perspective of the user. Some of the unit tests on models do hit the DB (i.e. they do I/O).
I think it's actually quite simple to come to a common language for unit and integration tests if you start from "integration" instead: if there is any sort of integration (two things interacting: your test setup can tell you that's happening), it's an integration test.
If there are no integrations, it's a unit test.
As in, it's all in the name. How one fits a whole "service" in there (if we are talking about something like a networking service such as a web API server, and not a DDD-style service) really beats me, but I am happy to hear you out.
I'd note that things like setting up a client and then configuring it to access a service already fails that test, even if you hide that complexity into fixtures or "external" test setup.
Edit: for avoidance of any doubt, these are terms without clear, widely accepted definitions. I am mostly sharing how I am making what I think is a useful distinction to drive me to write properly scoped tests and structured code.
> if there is any sort of integration (two things interacting: your test setup can tell you that's happening), it's an integration test.
Two things of what? There are people who claim that as soon as more than one method is involved, it's an integration test, and, more commonly, that as soon as more than one class is involved, it is. This leads to absurd mocking of utility classes to make sure that only one thing is tested at a time. And then, of course, the tests become very fragile.
And while there are situations when it is useful to only test one specific thing, what we really care about is the behaviour of the whole system of many parts. We mock (fake or stub) to make the tests faster, repeatable and reliable. That is, what we don't want in our tests are slow calls over slow networks, or lengthy calculations. We want the calls to be repeatable, so we don't want to call, say, a database where the state changes between test runs. And we don't want to make calls to unreliable services; unreliable because they are under development or because they are over an unreliable network.
I would be in the camp that says two methods calling each other are an integration, but I avoid mocking as much as possible.
But if you need to use mocking heavily there, your code is not structured well: look toward a functional style and a directed acyclic graph structure of calls.
The whole point of unit tests is that to write them you have to break the system up into units, and that decision is an important one that depends on the problem being solved. If the units are not well defined, then unit tests force you to define them.
I appreciate your historical account but words have acquired or changed meaning frequently in the past ("hacker" or "cloud" just to name a couple of obvious ones).
There is a large number of people using terms like unit, integration, system or end-to-end to differentiate between different types of automated tests run in an xUnit-style framework: why insist on not allowing the term to gain a new meaning?
It's funny you bring up speed — people have different ideas of fast: if you need to run 50k tests, the filesystem or a database through an ORM is usually not fast (you won't have those tests complete in sub-9s, if that's the not-lose-focus time).
Having 50k tests complete in that time when backed by memory on a modern CPU is trivial (easily achievable with proper unit and integration tests).
I've been on one too many projects where tests were slow exactly because developers didn't care about testing just-enough when unit- or integration-testing — not to mention becoming paralyzed by any tiny change breaking a ton of unrelated tests.
What you are talking about are tests that cover even more than just integration points, and are usually described as end-to-end or system tests (depending on the scope and context of the code under test). I suggest having as few of those as possible, but in general people structure their code badly, so sometimes it's hard to avoid (most web frameworks don't help, for instance).
I have tools like make (not actually make, but I expect you to know what make is, unlike my tools) that ensure that, while I in theory have more than 100k tests and running them all would take tens of minutes, in practice I'm never running more than a few hundred at a time, so the total time is a couple of seconds and I don't have to worry.
There is more than one way to solve the problem; you suggest one alternative. I have found other answers that mean I don't need to worry so much.
> On a more serious note, the author is describing a scenario where mocks are generally not useful, ML or not: never mock the code that is under your control if you can help it.
When I'm deciding what to mock, I don't use "code under my control" as the determining factor. For me the question is, "is this the code I'm trying to test with this test?". If not, it's often a good idea to mock it even if it's your code, because you want your test failures to indicate where the error is.
Of course, you probably shouldn't be testing code that isn't yours, so that does mean you can mock code that isn't under your control.
For example, let's say you have two components, A and B, where tests of A end up triggering calls to B. In an ideal world, you have tests of B in isolation, so when writing tests of A, you don't need to test B again. If you don't mock B when you test A, then if something fails in B, you get failures in both A's tests and B's tests, and that makes it harder to figure out what's wrong. Instead, if you mock out B in A's tests, when something fails in B, you only get failures in B, and that pinpoints where the issue is.
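A hedged sketch of that shape with `unittest.mock` (A and B here are toy classes, not anyone's real code):

```python
from unittest import mock


class B:
    def lookup(self, key: str) -> int:
        # Imagine something expensive or failure-prone here.
        return len(key)


class A:
    def __init__(self, b: B):
        self.b = b

    def double_lookup(self, key: str) -> int:
        return 2 * self.b.lookup(key)


def test_a_in_isolation():
    # B is mocked, so a bug in B only fails B's own tests, not this one.
    fake_b = mock.create_autospec(B, instance=True)
    fake_b.lookup.return_value = 5
    assert A(fake_b).double_lookup("key") == 10
    fake_b.lookup.assert_called_once_with("key")
```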
There are caveats galore. The first caveat that comes to mind: If you're on a project with poor test coverage, B might not have any tests, and it may be worthwhile to write tests against A that don't mock B, so that you get some testing of B "for free". You can mock out B in A's tests once you have tests against B, but that may not be worth the effort. Yes, that's sort of blurring the lines between "unit test" and "integration test", but whether or not the code is tested at all is far more important than whether it's unit tested or integration tested.
[...]
LLMs are a prime candidate for mocking, because they're non-deterministic. Non-deterministic code is terrible for unit tests, because it will fail randomly, and over time that slowly normalizes ignoring test failures. You can make probabilistic assertions on non-deterministic code, but there are two problems with this:
1. The more "wiggle room" you allow in your probabilistic assertions, the less useful the assertion is--3rd-standard-deviation results are sometimes bugs, not just unusual results, and you don't want to ignore them, and...
2. Probabilistic never equals deterministic; if you're running your tests often (and you should) then you'll get test failures on probabilistic tests eventually. Broadening probabilistic assertions only decreases the chance of failure, it doesn't remove it.
Probabilistic tests create an impossible problem: if you tighten your assertions you struggle with meaningless failing tests and normalize ignoring test failures, while if you loosen your assertions your tests aren't really asserting anything any more. And there isn't a happy balance: if you go somewhere in the middle, you end up having both problems.
Mocking the LLM removes the non-determinism problem when the LLM is not the unit under test. That is to say, if you're testing code that calls the LLM, it may be a good idea to mock the LLM in those tests. Obviously, don't mock the LLM if your intent is to test the LLM.
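For example, a minimal sketch of mocking the LLM behind the code under test; the `complete()` method is an assumed client interface, not any particular SDK's API:

```python
from unittest import mock


def summarize(client, text: str) -> str:
    # Code under test: it calls the LLM, then post-processes the reply.
    reply = client.complete(prompt=f"Summarize: {text}")
    return reply.strip().splitlines()[0]


def test_summarize_returns_first_line_only():
    llm = mock.Mock()
    llm.complete.return_value = "  First line.\nSecond line.\n"
    # Deterministic assertion on the post-processing, no real model involved.
    assert summarize(llm, "anything") == "First line."
    llm.complete.assert_called_once()
```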
Another possibility for removing non-determinism, which is probably more appropriate when you are testing the LLM itself, is to make it deterministic. You can do this by abstracting the RNG out of the LLM. Then when you want to test the LLM, you pass in an RNG that is seeded with a constant, or even mock the RNG (gasp! You can do that!!?).
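A tiny sketch of that idea (the sampling function is a toy stand-in for a model's sampling step, not real model code):

```python
import random


def sample_reply(candidates: list[str], rng: random.Random) -> str:
    # The RNG is injected rather than taken from the global random state.
    return rng.choice(candidates)


def test_sampling_is_reproducible_with_a_seeded_rng():
    # Seeding with a constant makes the otherwise non-deterministic path repeatable.
    assert sample_reply(["a", "b", "c"], random.Random(42)) == \
           sample_reply(["a", "b", "c"], random.Random(42))
```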
There are no silver bullets, and this approach has its downsides, because non-breaking changes to the LLM will likely result in changes to the now-deterministic outputs, meaning that a non-breaking change to the LLM code will still result in a bunch of breaking tests. This is why I would prefer mocking the LLM for tests that aren't intended for testing the LLM.
> Counterpoint when using mocks - if Bs behavior changes, one may not remember to update A’s test which would be falsely passing.
> This problem is exacerbated if B is a popular object used by many components.
This is a great addition to the caveats I mentioned.
> IMO if you own A and B, never use mock.
I'm going to take issue with your use of the word "never" here.
If you said something like, "If you own A and B, don't use mocks the vast majority of the time", I'd even agree--but there are important exceptions (and I'd argue that non-deterministic B is one such exception).
> Possibly write a Fake B if B is non deterministic or slow. Then write a parity test for B and Fake B
Ensuring that your Fakes stay correct takes work. An easy way that I’ve found to do that is to run the tests against the “real” component and the Fake component with the exact same set of assertions and setup. If that test breaks, then you know that the consuming code should also break.
Okay, I think I understand. In Fowler's terminology it appears I've been using stubs and mocks and calling them both mocks (shrug).
I haven't used fakes so take these critiques with a grain of salt, but I have two concerns with fakes:
1. The process you're describing to verify fakes does take work, and sounds suspiciously like it veers into testing your tests. That's probably worth it for applications like the space shuttle or an X-ray machine that need an extreme degree of reliability, but seems pretty overkill for the kinds of applications that make up most of the work in the software industry.
2. More often than not, I can't think of a useful fake for most things I'd use mocks for. The in-memory database mentioned by Fowler is an exception: you're skipping writing to disk, which saves a lot of time. But in most cases, the fastest implementation of a unit is... the unit, because if it wasn't the fastest you'd use the fake instead of the unit.
And all this sort of sets aside the larger issue, which is that there are plenty of cases where the unit you want to test isn't the interface between A and B, so having a real B in that test just adds unwanted dependencies. The ideal here is that if you break the interface between A and B, one test fails--the test of that feature of the interface between A and B--which tells you exactly where the bug is. With an LLM, for example, this could look like calling the real LLM and just testing that the call doesn't raise any errors and returns syntactically valid output (even if the output is unpredictable, you can verify the syntax). If you're using fakes everywhere, then a failure of the interface between A and B is going to cause failures in a bunch of A tests, which makes debugging harder, not easier.
The case of the in-memory database makes sense because a general purpose fake (the in-memory database) can be written by a third-party and act as an effective mock for all the places you're using the DB--it's easier to use the in-memory database than to mock all the places where you call the DB (and notably, I've never seen anyone write a bunch of tests to test the in-memory database as you describe being a good idea for fakes). And in most cases, I'm using mocks because it's not easier to use the real thing.
The last point of that comment is the core of "never use mocks": implementing a fake that behaves exactly as the real implementation does means that you can run a test that passes for both the fake B and the real B.
E.g. a simple example is having a FakeSentimentAnalysis that behaves exactly like SentimentAnalysis for the classes it can return. You can then have an identical (parametrized) test that works on both, and from there on, you can trust that this test will break when the fake diverges from the real implementation, without having to worry about mocks littered through the code.
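Roughly, using the class names from this comment (both bodies here are toy stand-ins just to keep the sketch runnable; the real class would call the model):

```python
import pytest


class SentimentAnalysis:
    def classify(self, text: str) -> str:
        # Imagine a real model call here; this toy version keeps the example runnable.
        return "positive" if "good" in text else "negative"


class FakeSentimentAnalysis:
    def classify(self, text: str) -> str:
        return "positive" if "good" in text else "negative"


@pytest.mark.parametrize("impl", [SentimentAnalysis(), FakeSentimentAnalysis()])
def test_classify_parity(impl):
    # The same assertions run against both; if the fake drifts, this breaks.
    assert impl.classify("good movie") == "positive"
    assert impl.classify("dull movie") == "negative"
```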
I like to push that even further and implement fakes inside the code/library/service that implements the real thing too: this ensures interfaces are really identical, and you only need to structure the code using the same approach.
As this is all code under your control, that is an ideal which is not hard to achieve if the entire team accepts that mindset.
But more often than not, particularly when interfacing with remote services, you don't want 'FakeSentimentAnalysis' to behave exactly like SentimentAnalysis. You want it to do crazy and unexpected things exactly unlike SentimentAnalysis to ensure that you properly handle the failure conditions that shouldn't, but theoretically could, occur when using SentimentAnalysis. You almost don't even need to worry about the cases where SentimentAnalysis is working as expected. It's the failure states that are of top concern.
As we are talking about "code under your control", I don't see a conflict there: you seem to be assuming that I was suggesting the Fake should only implement the happy path, but on the contrary, it's quite easy to have the fake exhibit failing behaviour. It might, however, be harder to have the same test work against both the fake and the real implementation in that case (e.g. simulating a network connection issue in a fake is easier than with a real implementation).
I generally do that by having a test that works both against a fake and against the actual implementation, a bunch of others that only use fakes, and a few system/e2e tests covering the whole thing.
With not much effort, you get increased trust in the code you write.
But most notably, it makes you write testable code, which I think is the most maintainable and readable code to write.
> you seem to be assuming that I was suggesting the Fake should only implement the happy path
There is no such assumption. The assumption is that if you rely on a fake to service all your testing needs it cannot be tested against the real implementation as, in many cases, you do not want it to work the same way.
I don't know if a network connection failure is the greatest example, but let's run with it. Why bother adding simulated network failure into your fake, which, due to the problems you point out, won't be touched by your double-duty test suite anyway, when you can just create an additional mock that does nothing but simulate network failure? Why add needless complexity to the fake? You haven't gained a testability advantage.
That's not to say that a fake isn't also useful, but you haven't made a case for why it has to be the be all and end all. You can use both.
Let's imagine you hard-code a network failure in your fake for a particular domain, say "this-domain-fails.com", but it otherwise "works" for all the other domains. While your double-duty test can't confirm that your real implementation handles the failure properly, it will confirm that your fake otherwise works quite similar to the real implementation for other domains. And you'd test the failure condition with a separate test with the fake set up in exactly the same way as in the double duty test (eg. with a fixture).
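A sketch of that hard-coded failure case (the names and the `endpoint` parameter are invented; the real fake would mirror whatever interface the real implementation exposes):

```python
import pytest


class NetworkError(Exception):
    pass


class FakeSentimentAnalysis:
    """Fake that 'works' for every domain except one hard-coded failure case."""

    def classify(self, text: str, endpoint: str = "api.example.com") -> str:
        if endpoint == "this-domain-fails.com":
            # Fail roughly where the real implementation would: at the network call.
            raise NetworkError("simulated connection failure")
        return "positive" if "good" in text else "negative"


def test_failure_path_with_the_fake():
    # Separate test, same fixture/setup style as the double-duty test.
    with pytest.raises(NetworkError):
        FakeSentimentAnalysis().classify("good movie", endpoint="this-domain-fails.com")
```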
And sure, this does not gain any testability advantage compared to a mock, but if your test for the failure uses as much as possible of the same code paths as the real implementation, only substituting the fake in, you increase trustworthiness — if the APIs between a fake and a real implementation diverge (a common problem with mocks as tests continue to pass), it's likely to be caught by the double-duty test, and as you adapt your fake to match the new reality, you'll likely start getting the network-failure test to fail too.
In the above example, the only bits you can't fully trust, since it's not automatically tested for both implementations, is your "emulation" of the failure: you want to be careful about how you implement that so it really happens in comparable circumstances to the real implementation (eg. it's ok to throw an error where you would otherwise be calling out over network and returning data).
A lot of it depends on how you structure the code. In memory database fakes are the easiest examples, because it's clear to most people how you can structure the code to have a facade API that's used everywhere, and only have the final fake/real implementations that either do stuff in-memory or on the actual database. But you can generally do that with anything.
In general, testing is never equivalent to proving code works correctly, but I think this is the closest you can get (with a healthy dose of fuzzing on top).
However, I've found that most software engineers don't believe it's possible, or doable without much effort, to achieve this level of trust in the code. But "showing the code" has managed to convince most — it does require a shift in mindset, but it's quite similar to accepting that real TDD is possible for anything but toy problems (I don't think it's the most efficient way to develop, but I think it is possible and teaches people to write testable code).
A better phrasing would be that ML models are better suited to integration testing than unit testing, since the test is no longer running in isolation.
I don't do any fancy research but for my simple stuff, I've mostly given up on the idea of unit tests. I still use them for some things and they totally help in places where the logic is wonky or unintuitive but I see my unit tests as living documentation of requirements more than actual tests. Things like make sure you get new tokens if your current ones will expire in five minutes or less.
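That kind of living-documentation test can be tiny; a hypothetical sketch of the five-minute rule (the `needs_refresh` helper is invented for illustration):

```python
from datetime import datetime, timedelta


def needs_refresh(expires_at: datetime, now: datetime) -> bool:
    # Illustrative rule: refresh when the token expires in five minutes or less.
    return expires_at - now <= timedelta(minutes=5)


def test_refresh_when_token_expires_within_five_minutes():
    now = datetime(2024, 1, 1, 12, 0)
    assert needs_refresh(now + timedelta(minutes=4), now)
    assert not needs_refresh(now + timedelta(minutes=10), now)
```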
> Don’t test external libraries. We can assume that external libraries work. Thus, no need to test data loaders, tokenizers, optimizers, etc.
I disagree with this. At $work I don't have all day to write perfect code. Neither does anyone else. I don't mock/substitute http anymore. I directly call my dependencies. If they fail, I try things out manually. If something goes wrong, I send them a message or go through their code if necessary.
Life is too short to be dogmatic about tests. Do what works for your (dysfunctional) organization.
Assume but verify is, I think, reasonable for external dependencies. Meaning: do not test them explicitly, but have tests that exercise the external dependencies such that they will uncover at least major breaking changes in the dependency. Because such things do happen from time to time. And it is highly beneficial to be able to update dependencies and be confident that nothing broke - much easier to stay up to date.
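One way to read "assume but verify" is a thin contract test that pins down only the behaviour you rely on; the standard library stands in here for any third-party dependency:

```python
import json


def test_dependency_contract_we_rely_on():
    # Not a test of json itself, but of the one behaviour our code assumes:
    # that round-tripping preserves key order. If an upgrade breaks this,
    # we find out here rather than in production.
    data = {"b": 1, "a": 2}
    assert list(json.loads(json.dumps(data)).keys()) == ["b", "a"]
```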
The original argument for unit tests was on semantic units (do a thing) and not code units (a function). The interface could be a class or a set of functions, but the idea was in part not to break it down further than needed or combine separate requirements into one tests.
I think code units / syntactic units became popular as a simplification of the original idea, people copying the style from examples and seeing it as a solution to their problems without actually understanding what the example was describing. Then it gets repeated over and over as "test individual functions" until the original idea is lost and most people haven't even heard of it.
Nowadays the original idea is partially described by black box testing.
If you are writing the standard library for a programming language then unit tests are useful. If you are implementing an algorithm for any other reason, you should ask why it isn't in a standard library already. Sometimes your company will have good reasons for making their own standard library; sometimes you will bring in a third-party algorithm library. Most coding, though, isn't making an algorithm; it is using existing, known algorithms to massage data. As such, most code shouldn't be unit tested.
If you are making a standard library then you should be writing a lot of unit tests. However odds are that isn't your job.
I think the problem with the article is that the author talks about testing the actual machine learning processes. In that case, you aren't really doing unit tests, yeah.
If you're just using a ready-made model in an app, for unit tests mocking the model out is fine. For integration tests, you probably want to use the actual model, of course (maybe with low temperature, to reduce flakiness?).
Let me stop you right there. No logic is learned in this process.
[edit] Also, the LLM is inductive, not deductive. That is, it can only generalize based on observable facts, not universalize based on logical conditions. This also goes to the question of whether a logical statement itself can ever be arrived at by induction, such as whether the absence of life in the observable universe is a problem of our ability to observe or a generally applicable phenomenon. But for the purpose of LLMs we have to conclude that no, it can't find logic by reducing a set of outcomes, regardless of the size of the set. All it can do is find a set of incomprehensible equations that seem to fit the set in every example you throw at it. That's not logic, it's a lens.
I was under the impression that there's a branch of models that use Logic gate networks and in that case the "logic" part seems sensible. The way I think about it is that a machine learning model is a hairball of NAND gates and spaghetti connections that nobody can explain, but work, and in that sense it'd be "logic" in one sense but not in the human logic thinking sense. Like if you pressed "random" on an FPGA millions of times until it replied with sensible outputs.
You can think of a neural network as millions of NAND gates, but those gates are not conforming to the problem if you ask it what 1+1 is. They're conforming to the known answers of 1+1, 1+2, 2+2, etc. They are in other words limited by the size of their memorization of patterns of tokens. (Which is enormous). They have no handle on any underlying logic.
They can explain addition to you but they can't perform it without calling out to an external API.
In dealing with complicated code, this is a very insidious problem, because they can often write functions that compile and run, but which miss edge cases because there is no actual logic behind them.
[edit] I should add that a sufficiently large NN may be able to create an entire computer inside itself that can run code, but even that would not be reliable because it will have arrived at the gating by inference rather than starting from a deterministic root.
While I agree in general, I would venture to say that even humans don't really use logic to say what 1 + 1 is.
Best evidence for this is the construction of natural numbers in traditional math* with the `successor` function (A. There is a one (1). B. There exists a successor of one (succ(1)) called 2.) — everything else follows from that using formal logic, and that's why everything else is generally out of reach for generative ML models (e.g. summation can be defined in multiple ways, one of them being successive application of the successor function the right number of times to one of the operands).
* Modern approach is to use set cardinality instead, but the essence is the same: something has to be memorized (axioms), and then logic is applied.
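For reference, the "apply the successor the right number of times" definition of addition alluded to above looks like this (S is the successor function; written in the usual zero-based Peano form, while the sketch above starts from one):

```latex
a + 0 = a \qquad a + S(b) = S(a + b)
```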
Hm. Well I agree that humans probably arrive at 1+1 by induction first, but quickly progress to the application of formal logic. Reasoning is essential to human development.
Take a 3-year-old who's holding two pieces of candy. Say, "I'm going to take these and give you two pieces back." Then give him one piece back. He's going to start crying. Perhaps he can't explain why, exactly. But if you ask him why it's not fair, he's going to tell you that two is more than one. Ask why is two more than one: It doesn't require succession. It's a fact of the reality we inhabit. We can measure it with our hands, our eyes, its weight. We can hear two sounds, two voices at once and differentiate them (non-blocking, nonlinear processing). AIs don't inhabit any reality grounded in the physical constraints they purport to understand, therefore they can never arrive at the conclusion that 2 > 1. They just appear to "know" it because it's part of the data set.
But once you understand that 2 > 1, then 3 must be > 2. If acquisition is hardwired into our brains, then so is the capacity for logically explaining why Y > X.
Yeah I'd agree, we learn addition/multiplication/etc. as processes of smaller problems. If you gave an LLM a prompting framework to do addition I'm sure the results would be better (add the units, add the tens, and so on).
Food for thought: would a savant use the same process? Or have they somehow recruited more of their brain to the point where they can memorize much larger problems?
So first of all, prompting and re-prompting an LLM is basically forcing it to deduce rather than induce; using millions of gates to get from 1+1 to 1+2. That's what our brain does, too (uses millions of gates for dumb stuff), but we designed computers to do that using 4 bits, so it's ironic that we're now trying to write scripts to force something with 60 billion parameters to do the same thing.
I think savants usually solve problems in the most deductive way, using reasoning that leads to other reasoning. I went to an elementary school in the 80s where more than half the kids would now be labeled autistic... some got into math programs at colleges by the age of 12. I believe it's all pure reasoning, not like some magical calculator that spat out answers they didn't understand the reasons for.
[edit] If you meant: Do savants solve problems by recursively breaking problems into smaller and smaller problems, then yes. But the breaking-apart-of-problems is actually the hard problem, not the solving.
>But for the purpose of LLMs we have to conclude that no, it can't find logic by reducing a set of outcomes, regardless of the size of the set.
An ML model is probably not going to converge strictly on a formal logic system in practice, and there's also the question of whether formal logic even underpins the result of the dataset you're feeding it, but that's entirely different from saying it cannot in principle.
> Avoid loading CSVs or Parquet files as sample data. (It’s fine for evals but not unit tests.) Define sample data directly in unit test code to test key functionality
How does it matter whether I inline my test data inside the unit test code, or have my unit test code load that same data from a checked-in file instead?
It makes tests self-contained and easier to reason about. As a side-effect, random tests won’t accidentally break whenever you change some seemingly unrelated csv file.
As a rule of thumb, I also only assert on input/output values that are explicitly defined as part of the test body. Saves a ton of time chasing down fixture definitions.
Don't make the files "seemingly unrelated". Have a very clear relationship between test data and the tests. And keep the mapping simple, ideally 1-1 data file to test. Or 1 file to 1 group of tests.
Of course it depends on the amount of data. If < 50 lines, inlining is practical.
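For example, a small sketch of inlined sample data (the `average_rating` function is invented for illustration):

```python
def average_rating(rows: list[dict]) -> float:
    return sum(r["rating"] for r in rows) / len(rows)


def test_average_rating_with_inline_sample_data():
    # The sample data lives in the test itself, not in some ratings.csv,
    # so the test stays self-contained and won't break when that file changes.
    rows = [
        {"user": "a", "rating": 4},
        {"user": "b", "rating": 2},
    ]
    assert average_rating(rows) == 3.0
```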
The problem with mocks happens when all your unit tests pass and the program fails on integration. The mocks are a pristine place where your library's unit tests work like a champ. Bad mocks or bad library? Or both. Developers are then sent to debug the unit tests... overhead.
I don't know much about ML, but I would think that they should follow some rules resembling judicial rules of precedence and witness cross-examination techniques.
Depends on the model, honestly. If you include a GPT model in your unit tests, be prepared to run them over and over again until you get a pass, or to chase your own shadow debugging non-errors.
Not intended as a comment on the current TFA, but based on observing many conversations on the topic of unit testing in the past, I believe this to be a true statement:
"If you're ever lost in a wilderness setting, far from civilization, and need to be rescued, just start talking about unit testing. Somebody will immediately show up to tell you that you're doing it wrong."