When writing a class, as a rule of thumb I'd posit there's a general advantage to outsourcing units of behaviour or complexity to other modules. The class could construct those modules itself, but that tends to make it concerned with their construction details. One of the simplest alternatives is to ask for those modules to be provided to the class on construction.
This sounds like a great idea until somebody comes along and calls it "dependency injection" and a bunch of us lose our minds.
> So if Martin Fowler says that it is possible to use a service locator instead of DI in unit testing, then who are you to argue otherwise?
This argument is the same as "It is possible to use a global variable instead of an argument when unit testing a procedure." That's really what a service locator is: a facade around global variables. Unless, of course, you use DI to inject the service locator, in which case you've not really gained anything; you've just inserted another layer of indirection.
And that brings us back to why we even use techniques which are now labeled "DI" in the first place - they're basically there to avoid the use of globals (and hence, tight coupling). Interfaces are in place to keep implementations decoupled while providing everything necessary for them to interact.
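As a sketch of the contrast being drawn here (all class names are invented for illustration): the same consumer, first reaching for a global, then receiving its dependency explicitly through its constructor.

```python
class SmtpMailer:
    def send(self, to, body):
        return f"sent to {to}: {body}"

# Global-style: the class quietly depends on module-level state.
MAILER = SmtpMailer()

class ReportGlobal:
    def deliver(self, to):
        return MAILER.send(to, "report")  # hidden coupling to the global

# Injected: the dependency is visible in the constructor, and a test
# can pass in any object with a compatible send() method.
class ReportInjected:
    def __init__(self, mailer):
        self.mailer = mailer

    def deliver(self, to):
        return self.mailer.send(to, "report")

class FakeMailer:
    """Test double that records what was sent instead of sending it."""
    def __init__(self):
        self.sent = []
    def send(self, to, body):
        self.sent.append((to, body))
        return "fake"

report = ReportInjected(FakeMailer())
report.deliver("alice@example.com")
```

The injected version is decoupled through nothing more than the implicit interface of `send()`; no framework is required.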
There are good arguments against a service locator - one of them is presented here:
Another argument against the service locator pattern is this: if you ask the locator for a service which has dependencies, you have to resolve those dependencies yourself. So you need to know about the specific implementation of this service interface, which defeats the purpose. If you work around that, you end up with something that is pretty close to a DI container.
> they're basically there to avoid the use of globals (and hence, tight coupling)
One could say that you merely delegate the management of the globals to the DI framework, just as you do with a ServiceLocator.
Singletons aren't global variables. Neither are service locators. Nor databases. Nor screens. Nor keyboards. Global variables are global variables.
Unless they all are. And if you're going that way, you have a single main function, somewhere.
A database is a good example. One might have some class "Person" with some business related logic in it. If your goal is to simply test this business logic of a person, then why would your test be concerned about whether it can establish a connection to a database? By definition, this is no longer a unit test, but a systems test, and the test surface is much larger because now you need to be concerned about whether or not a connection can be established and a myriad of other possible problems, all of which could be tested separately. By knowing which tests succeed/fail, we can have immediate feedback on where some problems might exist, rather than having to debug an entire system to find an issue.
In the case of the service locator - you can't really perform "unit tests" on individual blocks of code which have a dependency on the SL, because the SL is mutated in arbitrary places throughout the codebase. The SL acts to increase coupling, because now instead of depending on just a specific interface, you depend on the whole runtime data of your application.
Strange, I thought it was terminology for global variables.
> What I mean by "globals" in the scope of testing is "side-effects"
Which is a whole different kettle of mackerel. Minimising side-effects is, of course, excellent advice, and can help with testing. But global variables and SL are not the same thing as side-effects.
Your example (of testing a database) seems very confused to me. You're talking about coupling now. Not global variables. Why would you suddenly need a database connection? Why does the existence of global mutable state mean that nothing in your code can be tested independently? You seem to be imagining the worst case of coupling as an argument against service lookup.
The last paragraph seems blatantly false. You're again straw manning this version of an SL that is mutated in arbitrary places through the code-base and therefore can't be tested in isolation, and neither can the components that use it. That is a strange version of SL you have here, and one that DI wouldn't help with.
They're examples of side effects. It's not good enough to set the values of a global variable or register some service purely for the purpose of a test, because the test then does not reflect the runtime behavior of the code. The benefit of a unit test is to assert that code behaves the same way all the time - not just for specific values you use at the time of testing.
> Your example (of testing a database) seems very confused to me. You're talking about coupling now. Not global variables. Why would you suddenly need a database connection? Why does the existence of global mutable state mean that nothing in your code can be tested independently? You seem to have a strange idea of how software works.
Global variables increase coupling - code which consumes a global variable now has a dependency on all of the code which mutates it. You simply cannot test the consuming code in isolation without regard for the code mutating the variable, unless your test is exhaustive of every possible value which the global variable may contain.
My example was not of testing a database; it was about testing algorithms or logic that might exist inside some class named "Person", but which has a data dependency on an actual person (held in a database). If one wants to test the logic only, then mock data must be supplied instead of the real data from the database - else you're not testing only the person, but also testing that the database is connected and querying it is successful. The correct way to test this is to decouple Person from the database, usually by means of a mock object, or by passing the mock data into the person directly. Either way, it seems the blog author does not do such unit tests, as he doesn't use mock objects.
> I do not use mock objects when building my application, and I do not see the sense in using mock objects when testing. If I am going to deliver a real object to my customer then I want to test that real object and not a reasonable facsimile. This is because it would be so easy to put code in the mock object that passes a particular test, but when the same conditions are encountered in the real object in the customer's application the results are something else entirely. You should be testing the code that you will be delivering to your customers, not the code which exists only in the test suite.
The problem with the author's philosophy is that it means when problems do arise in his applications, he must perform whole system testing/debugging to find them. He is missing perhaps the main benefit of unit tests - which is that, when a bug arises, you can quickly eliminate many possible causes because unit tests against those parts of code have succeeded (unless your unit tests were wrong to begin with, which will more or less be the case if they're testing against code which depends on globals).
Testing with a mock object implies that the mock object can generate all the required output that the real object can generate that might have some effect on the consuming code. Not only that, but it assumes that the mock object generates the correct data in ways that cannot generate false positives in the test. This doesn't mean you're only testing the client logic. You're now testing the client logic using services that are ad-hoc and aren't guaranteed to behave like the real thing. You're testing a fantasy.
It is far better to test against the real database. Using a fixture, or a transaction, or some way to use the actual system with representative data. Mocks have their place in very complex services where this is practically impossible. But they don't suddenly make things better for testing, or more atomic. IMHO, when you have to use a mock, it should be as a last resort, when you have to sacrifice fidelity for tractability. Your code is coupled in behavior to the services it uses, pretending it isn't is just fooling yourself.
I have very much the same problem with people who write unit tests against, say, SQLite databases, rather than the full DBMS. The complexity of 'making sure the database is connected and can be queried' is pretty trivial compared to the complexity of mocking a whole RDBMS interface. Good software engineering will, of course, limit the number of places the database interfaces with (I'm not suggesting code with SQL statements in strings everywhere, that's a straw man). But I'd not accept mocked tests that exist just to avoid a database connection or because the developer doesn't understand how to write a transaction.
So I don't understand. Either you're advocating a very bizarre, and seemingly pathological development style, or you're consistently muddying the waters by comparing good programming in your chosen methodology with bad programming in mine, which just misses the point.
Here's an example then. Take your Person object, on a platform with reasonable transaction/fixture support (like Django). Is it better to write your unit test using a mocked ORM layer, or a fixture with the test data in it?
> He is missing perhaps the main benefit of unit tests - which is that, when a bug arises, you can quickly eliminate many possible causes because unit tests against those parts of code have succeeded
I've no idea why this is somehow impossible. I write unit tests at various levels of abstraction. If I have module A, calling module B which calls module C, then I need tests for C, B(+C) and A(+B+C). If I get a failure in A, I make sure that there is a test in B that corresponds to the way A is using B, if so, it is a problem with A, not B. If B and C were mocked, I'd have no way of knowing if the problem was with the mock logic without having to test C, C-mock, B+C-mock, B-mock, A+B-mock.
> now has a dependency on all of the code which mutates it
This seems a bizarre claim. Does your code have a dependency on everything else that can possibly change what's on the screen? If so, how do you deal with that?
That's why pretending 'global variables' = 'all central resources' seems foolish to me.
If I were testing a salary calculation which takes values from a database, and I named my test "Test_salary_calculation_correct", where instead of using some sample data which could easily cover the range of values I need to test against, I instead relied on a database connection, and this test failed because the database was not accessible - I've only confused the developer who picks up my shit where "Test_salary_calculation_correct" fails, and he thinks there's a problem with my calculation rather than a misconfigured firewall somewhere else. The firewall has nothing to do with my salary calculation - why should it have any effect on the test passing?
The way I see unit tests is this: if you write a test and it passes on your machine, then some other developer takes your code and the same test fails - it's a fuckup on your behalf. Unit tests should not depend on the environment in any way. Actually, by definition, a unit test is a test of a single "unit" - including database access in this is well beyond the scope of unit testing; it falls into integration testing.
To me it seems you're skipping unit testing and just going straight to integration testing with your unit testing framework. I'm not sure what you've observed or where, but I can tell you it's certainly not standard or best practice in the industry. It might possibly tell you something about your own code style though - are you writing units which can be treated in isolation? (Certainly not if you depend on an SL, which is a global context of services with no clear boundary.)
Ideally a codebase should be designed to maximize unit-testability and reduce the need for integration testing to as little as possible - since this is where most of the "unexpected", or "out of my control" problems are most likely to occur. This testing is more a case of "am I handling all the relevant exceptions" than getting green lights to pass in a unit testing framework. It doesn't really help to make unit tests against code which is expected to fail out in the wild due to whatever circumstance - what matters here is that your code is prepared for the worst and knows how to recover.
It's these cases where mock classes are particularly useful - because you can forcefully simulate any behavior from the external service and make sure your code is working correctly for all the potential circumstances. Having to rely on divine intervention to trigger some event that may only happen 1% of the time in the real-world situation is hardly practical. Unfortunately testing in the wild is often like this - everything works fine 99% of the time.
Even for cases where you're arguing for a fixture with real test data in (from a database), then the reasonable thing to do is extract this data beforehand and encode it into the unit testing language (which is fairly trivial to do). Now you have a reliable test which will continue to work as you update the code. Testing against live data is giving a false sense of security to begin with anyway. Imagine the scenario where you have a bunch of data in the database, you run your unit test against it with all green flags - then after deployment, somebody inserts into the database a value which your code doesn't expect. The unit test shouldn't be testing against real world data, but against data representative of the possible values it should accept (i.e., include all the obvious edge cases which should fail too, but are not likely to exist in the real world DB).
Singletons can be this, service locators can be this, databases can be this. This can be bad, notably when assumptions are made about the state without taking into account that you're not the only one that could be changing it. Depending on what's changing, this can lead to very confusing / seemingly unpredictable behavior.
Another way to put it is that the nature of the issue with "global variables" is often present in the things you mention.
Thus, if you have a program that has an immutable global constant it is likely in the same situation. How do you ever change that value in any way without retesting everything to see what the effect was?
I don't accept your premise.
Firstly, constants are used for all kinds of things, from hard mathematical data as in your example to configuring connectivity with external services via particular database details or URLs.
Secondly, while some programs really do only have one main purpose and so some of the background/context data really is almost globally applicable, this is certainly not always the case. In particular, the larger a software system becomes, the less likely this is to be true.
> How do you ever change that value in any way without retesting everything to see what the effect was?
If the value doesn't have global scope, you don't have to retest everything, only parts of the system that can be affected by the change.
I don't think you do either, although if it's a globally accessible constant throughout a run, problems with inconsistencies only tend to crop up if the value is being used to determine behavior -- which you might need to check for, as you mention.
I'm not sure if global mutable state is the same as the problem you point out though -- they do share a similar nature in the framing of dependencies, but they cause problems for different reasons.
Also, mutability tends to become a real problem in systems much smaller than those where problems from abuse of global constants start to rear their heads.
I agree, there are (at least) two distinct problems here.
One is the existence of any implicit dependency because of global (or other broad) scope. This can create surprising spatial interactions in the code, making it harder to maintain.
Another is the existence of mutable state. This can create surprising temporal interactions in the code, also making it harder to maintain.
Loosely speaking, the danger from these effects multiplies.
And use of an SL doesn't create a dependency that isn't there otherwise. The dependency will be there anyway, because bits of code do depend on one another. The point is, do you want those dependencies to be managed by another piece of code and another piece of configuration? Or is it better to say what you need in the code where you need it?
I'm not religious about it, there are times when being implicit is better, but often it is dramatically more complex for no obvious benefit.
Because if we have module A depending on module B via some clearly defined interface, but in fact the behaviour of module B also depends on global module C that is set up elsewhere, then B's interface no longer fully describes what A can expect.
If C's scope were limited to what's happening within B anyway then this would just be an implementation detail. If C were given to B as some form of explicit dependency by A, then it would be a specified part of the interface. However, if C effectively has global scope by any mechanism then there is now an implicit interface to change B's behaviour that A doesn't know about. Developers reading or maintaining the code for A, C, and anywhere else that can affect C if it's mutable, then need to be aware of what each other are doing, and changes to any of these parts of the code potentially affect any of the others.
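A tiny illustration of that A/B/C situation (all names invented): a module-level "C" silently changes what B returns, so B's visible interface no longer tells A the whole story.

```python
C_ROUNDING = 2  # "module C": global configuration set up elsewhere

def b_format_price(price):
    # B's visible interface takes only a price...
    return round(price, C_ROUNDING)  # ...but its behaviour reads C

def a_total(prices):
    # A depends only on B's interface, or so it believes.
    return sum(b_format_price(p) for p in prices)

assert a_total([1.25, 2.25]) == 3.5

# Some distant code mutates C, and A's results change with no
# change to A, to B, or to the interface between them:
C_ROUNDING = 0
assert a_total([1.25, 2.25]) == 3.0
```

Passing the rounding precision to `b_format_price` explicitly would move C into the specified interface and make the behaviour visible to A again.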
(I'm not going to address your other point, because I didn't say anything about service locators in the first place.)
Since then I keep hearing about "Dependency Injection" so my impression is that it's gaining in popularity. But my kneejerk reaction is always that if someone is talking about this subject, they probably are not a very good programmer, just like if someone is talking about how important UML diagrams are. It is maybe a hasty conclusion but that is where my brain goes.
That said, I absolutely despise autowiring and annotation-based DI...
DI lets you push all that stuff to configuration rather than wiring it up in code; it reduces drag during development because you can always say, ah, I don't need to think about how or where the micro-ORM boots up, I can tune it later... I can even share the connection pool across five different consumers that are not aware of each other and share no compile-time dependency.