I haven't had a chance to try it yet, but IntegreSQL[0] looks like this on steroids. It allows you to create a template (run migrations and seed data), and then uses Postgres's built in cloning functionality to maintain a pool of fresh databases. They claim 500ms to clone a database without the pool, and that the pool pretty much hides the latency entirely.
If the numbers hold true, you could actually use a separate database for each test, which could be hugely beneficial. It's way too easy to cause some unintended interaction between tests when sharing a db.
On the other hand, I've seen quite a few bugs caught in testing precisely because of the lack of isolation between tests; the same bugs would have shown up in production...
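For anyone curious what the underlying Postgres feature looks like without IntegreSQL, here's a minimal Python sketch (psycopg2 assumed; the template name and credentials are placeholders, and this is not IntegreSQL's API):

  import uuid
  import psycopg2

  def clone_template_db(template="app_template"):
      # Create a throwaway database cloned from a pre-migrated/seeded template.
      name = f"test_{uuid.uuid4().hex}"
      conn = psycopg2.connect(dbname="postgres", user="postgres", host="localhost")
      conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction block
      with conn.cursor() as cur:
          cur.execute(f'CREATE DATABASE "{name}" TEMPLATE "{template}"')
      conn.close()
      return name

  def drop_db(name):
      conn = psycopg2.connect(dbname="postgres", user="postgres", host="localhost")
      conn.autocommit = True
      with conn.cursor() as cur:
          cur.execute(f'DROP DATABASE IF EXISTS "{name}"')
      conn.close()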
We make tests robust in the face of existing data. For example, each name is "GUID-ified", we only compare specific subsets of rows that we know will be isolated from the rest etc...
We still use transaction rollback for most of our tests, but all tests are written with the assumption of existing data, to accommodate for the few tests that must commit their transactions (such as concurrency tests).
One approach we used was to wrap the connections in a proxy (easy in Python), and flag if the code called the commit() method at any point. You do need to trust that the code doesn't call COMMIT explicitly via the execute() method, and flag tests running subprocesses that might dirty the database. If the test is dirty, rebuild the test database from the template instead of relying on rolling back the transactions.
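Roughly, the proxy looks something like this (a sketch of the idea, not our actual code; class and attribute names are made up):

  class CommitFlaggingConnection:
      """Delegates to a real DB-API connection but records whether commit() was called."""

      def __init__(self, real_conn):
          self._conn = real_conn
          self.committed = False  # the "dirty" flag the harness checks after each test

      def commit(self):
          self.committed = True
          return self._conn.commit()

      def __getattr__(self, name):
          # cursor(), rollback(), close(), ... all pass through untouched
          return getattr(self._conn, name)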
However, if you add this behavior to an existing test suite, you will likely have to fix a lot of tests that assume sequences will return predictable numbers, such as automatically generated primary keys. Technically, those tests are already broken, even if it is unlikely you'll ever see them fail.
Typically the approach I take is monkey patch out the transaction methods (begin, commit, rollback) in the test harnesses and wrap each test in a real transaction. Some test runners have this built in.
This is easy in dynamic languages, really hard in static languages.
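For example, with pytest and SQLAlchemy it can be as small as this (a rough sketch under those assumptions; a full harness would also patch begin/rollback, and the DSN and names are placeholders):

  import pytest
  from sqlalchemy import create_engine
  from sqlalchemy.orm import sessionmaker

  engine = create_engine("postgresql://postgres@localhost/test_db")  # placeholder DSN

  @pytest.fixture
  def db_session(monkeypatch):
      conn = engine.connect()
      outer = conn.begin()  # the real transaction we roll back after the test
      session = sessionmaker(bind=conn)()
      # Patch commit so "committed" data stays inside the outer transaction
      # and vanishes on rollback at the end of the test.
      monkeypatch.setattr(session, "commit", session.flush)
      yield session
      session.close()
      outer.rollback()
      conn.close()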
So you're testing, against a database, some process that you expect to fail, and you would like to check whether the rollback is done properly.
And because you abstracted all your transaction logic away for the tests, your test result is not worth anything.
IMHO, there are only 2 ways to achieve proper database testing:
1) a new db for each test (very slow), or deleting all data from all tables before each test (OK for smaller projects)
2) tests have absolutely no impact on other tests through the data they add/modify/drop (very hard to achieve).
I seem to recall savepoints differ in some ways that can make this difficult - but without remembering the specific problem I hesitate to say anything concrete about that.
Not sure exactly how Django handles it, but it works really well and transparently. The only issue is when the code you are testing rolls back a nested transaction; I seem to remember that caused issues, but there is a workaround at least.
It was a huge monolith schema. As for the size of the data being created during the tests: quite small, since they would test individual pieces. Very data-intensive tests seem like an outlier for most types of tests I've seen.
Your description sounds cool, I just really wish the project's README told me what it does and how to use it instead of giving a 10-part story with no concise "and here's how the new solution works, all wrapped up!" summary.
The concept rocks, so I really hope it will work and become popular.
A few remarks:
- the `go get` install in the README is broken. I opened a ticket.
- since Go cross-compiles, maybe you could provide binaries for Linux/Windows/Mac? No need to `go get`, no need to install Docker. That would open the door to including the tool in web frameworks.
I use `entr` _constantly_, I can't imagine not having it available.
I study Feedback Loops, and the incredible ease of saying "run this program when those files change" has been revolutionary. Auto-deploy when a Terraform file changes, or auto-test when a Python file changes. Wonderful!
- it's Unix only, which, when using a multi-platform language like Python, is limiting
- it can't be used programmatically without subprocess hacks
If you use Python, watchdog is an interesting alternative:
- it works where Python works: pretty much everywhere
- it uses the best API for the job (e.g. inotify on Linux), but will fall back to manual scanning in a thread if the underlying OS has no special file system watcher API
- it's a lib that can be used programmatically
- it features the command "watchmedo", which is basically what entr does, but with an optional rich file format and plugin system
- for simple use cases, all that stuff stays super simple (see the sketch below)
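A tiny sketch of the programmatic use (the watched patterns and the command are placeholders):

  import subprocess
  import time
  from watchdog.observers import Observer
  from watchdog.events import PatternMatchingEventHandler

  class RerunOnChange(PatternMatchingEventHandler):
      def __init__(self, command):
          super().__init__(patterns=["*.py"], ignore_directories=True)
          self.command = command

      def on_any_event(self, event):
          subprocess.run(self.command, shell=True)  # rerun the command on any change

  observer = Observer()
  observer.schedule(RerunOnChange("pytest -q"), path=".", recursive=True)
  observer.start()
  try:
      while True:
          time.sleep(1)
  except KeyboardInterrupt:
      observer.stop()
  observer.join()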
My workflow is hugely reliant on entr. I use it to rerun a command whenever I save a file.
1. Configure vim to touch a well-known file on save.
2. Create a command `w` which reruns its arguments whenever that file is touched.
I must run hundreds of reruns under this setup every day. Basically every language, every tool, everything runs on this. Changing my NixOS config? `w sudo nixos-rebuild test`. Writing Rust? `w cargo test`. Running Terraform? `terraform init && w terraform plan`.
With the advent of easy to use docker containers, this is really easy to do for most backing services, right on your machine during tests (and in CI).
If all your app needs is rabbitmq (or some queue), redis, and postgres, you can run all of them in the background and give out databases/exchanges to your app as you see fit. This also does something interesting -- it turns you away from tools that do badly in containerized environments or that are hard to set up, which is a benefit in my mind. It also turns you away from proprietary cloud solutions that you can't run yourself (which isn't always possible, but meh).
I trust my E2E tests a lot more when they're not built on mocking. I'm almost tempted to say that mocking is pointless in this world -- just test the real thing.
[EDIT] Since most people will know how to run redis/postgres/rabbitmq easily, one of the high value things is SMTP testing. Here are a few projects for that:
Remember, for a lot of these projects, don't be worried if you don't see updates for years -- once you've written a reasonably working SMTP server, it may not need to change much.
I might take one of these and try to build a MailTrap[0] competitor eventually. $10/month is twice the cost of a digital ocean droplet running one of these pieces of software (and I know that I can run the instances for even cheaper).
I agree with most of what you write, and I use test fixtures that spin up real services with docker for running integration tests. However, I'd argue that mocking will always have its uses, as it goes hand in hand with unit tests. Integration tests confirm the behaviour for the code branching you are able to reach that way, and more closely reflect reality and normal operations. Mocking allows you to more easily test the edge cases that the software unit might encounter but that are hard to create conditions for otherwise.
In fact, I've also been tempted by what you write (i.e. why mock, when you can automatically spin up a docker service). I've done this too, and my conclusion is that it is more trouble than it's worth, although that may differ for other systems, I suppose. The only realistic approach is for a test fixture to spin services up once, not per test/unit, so it introduces concerns about correct cleanup and concurrency limits that wouldn't otherwise exist. It also has unexpected issues with wildly varying startup times depending on whether or not the container image is cached.
All in all, it works well enough now, but if I were to do it all over somewhere else, I'd be tempted to keep a clean distinction between deploying temporary services for running integration tests and regular unit tests, without the in-between.
TL;DR: mocking isn't just for "I need service A in my test", but also "I need A to behave like X", and you don't get that with external services.
Just want to make sure, but it seems like your reasons for mocking can be summed up as:
- spinning up the service is hard
- startup times can be inconsistent
I may not have read deeply enough, but I don't see much explanation around the "I need A to behave like X" -- could you give a more concrete example? Let's say you had two queue implementations and one used Redis and one used RabbitMQ (i.e. `GenericQueue<Redis|Rabbit>`); it seems ridiculous to need one of them to act like the other -- I'd love an example if you have one in mind.
Optimizing spinning up the service and making startup times consistent are basically solved in half a day of work (that pays off forever), in my experience. Here's an excerpt from a Gitlab CI config I carry around from project-to-project:
  e2e:
    stage: test
    only:
      - merge_requests
    services:
      - docker:dind
    variables:
      DOCKER_HOST: tcp://docker:2376  # TLS enabled (2375 for disabled)
      DOCKER_TLS_CERTDIR: '/certs'
      DOCKER_TLS_VERIFY: 1
      DOCKER_CERT_PATH: '/certs/client'
    script:
      - make ensure-images setup build
      - make test-e2e
The "ensure-images" make target actually just runs a `docker pull` before the tests:
Which actually reminds me, another cool thing you can mock if you do this -- it's not 100% parity but there's a cool stripe replacement called localstripe[0]. There are bugs/unimplemented features (expecting a random F/OSS project without a ton of members to replicate stripe's API is a lot) but it exists and isn't too hard to use, so I use it in test/local/dev environments, use stripe test API keys in staging, and the real keys in prod.
Another thing worth thinking about is that how you optimize each piece is different -- it's often pretty fast to make new exchanges/databases, and you can even just TRUNCATE tables if you're really worried about it. Here's an example from some code I wrote:
  public resetForTest(): Promise<this> {
    if (!this.conn) { return Promise.reject(new DatabaseDisconnectedError()); }
    // Do not truncate if database is <production db name>
    // for example when process.env.USE_LOCAL_DB_FOR_TEST is set
    if (this.getDatabaseName() === "<production db name>") {
      return Promise.resolve(this);
    }
    return this.getConnection()
      .then(c => c.query(`
        TRUNCATE TABLE <some table> CASCADE;
        TRUNCATE TABLE <another table> CASCADE;
        ... more ...
      `))
      .then(() => this);
  }
And this pattern extends itself to anything that is testable, so I've enshrined it:
Completely separate from all this -- some of the time you can literally just.. run these dependencies in memory[1]. Doesn't get much faster than that. I haven't run into a mock that didn't take time and effort to maintain/alter whenever code or the underlying systems changed. Time spent maintaining mocks is time wasted.
Anyway, I won't go as far as saying that mocks are never useful, sometimes a mock is the easiest and best way -- but I can say that I spend most of my time writing E2E tests, and I trust them a lot and they give me trust in my codebase much more than mocks or unit tests do. I barely write unit tests anymore to check edge cases, because if I really cared I'd just generate them -- and I don't really have to care because I use strong type systems where possible (Typescript for JS, I avoid PHP/Python/Ruby, Rust, Haskell).
Here comes the hot-take -- unit tests (and their rise in popularity/necessity) are actually a direct reflection of the adoption of non-compile-time-type-checked, dynamically interpreted languages and burdensome class-based "type systems". Correct usage of good, expressive and concise (where possible) type systems encourages you to make invalid/nonsensical states impossible, but people got excited about how fast you could churn out code (compared to Java most things feel pretty productive) and they started having numbers where they thought they had strings, and negative numbers where they thought they had natural numbers. Good type systems make it easy to make these cases impossible. As for the business stuff (never store decimals for currency, make sure your amounts are natural numbers, etc.), you have to learn that with intuition built over time -- you don't know to write that unit test unless it's bitten you before (though sitting down to write unit tests might tease it out of you).
I basically write unit tests as regression tests now, but I find I rarely have to do that, since most of the time when a weird case has gone through it's an indicator that I was too loose with the types.
There's a place for e2e tests that run the actual services, and a place for tests that use mock servers. As a full stack developer, I know for a fact that spinning up a service, even if it's dockerized, is more often than not a hassle. Maybe because of configuration hell, maybe because setting up the database (with the migrations, fixture data, etc.) is never a straightforward job even if it's "automated". And if the architecture requires service A to talk to service B, which in turn needs to talk to service C, you end up having to spin up way too much infrastructure (each with their own configuration and database setup hells) for just a simple test. It's not worth it, and often it's not feasible since you may not even have enough memory in your computer.
For frontend developers or data scientists who couldn't care less about the backend, going through this hassle is frustrating and there's no reason why they should go through it.
Mocks are also a fundamental part of building integrations, since they allow both backend and frontend teams to work in parallel against a specification/contract.
That said, for the real integration test, you do need to run the real services and have them talking to each other. If you can do that in your own machine, great. But often the only feasible way to do this is in the cloud.
> There's a place for e2e tests that run the actual services, and a place for tests that use mock servers. As a full stack developer, I know for a fact that spinning up a service, even if it's dockerized, is more often than not a hassle. Maybe because of configuration hell, maybe because setting up the database (with the migrations, fixture data, etc.) is never a straightforward job even if it's "automated".
Configuration hell happens in the cloud as well, and again I noted earlier -- it should be easy for your application to automatically apply and make sure a given database is migrated. Applications shouldn't start if they're not running on the version of database they expect. Maybe producing fixture data is hard but this is stuff you would have had to do with a mock, anyway. Good mocks also need fixture data because they need to act on that data just like the thing they're mocking would.
As a side note this is literally what tooling/infrastructure engineers get paid to do. They build good systems for application developers so that it will feel like magic. It will be easier for everyone, forever, and you're going to reap the rewards in meaningful, impactful tests that are closer to your production infrastructure. People are doing things these days like "testing in production" because mocks just don't match production enough.
If this is like.. what you should do with 2 engineers at the very beginning of a startup, then sure, write mocks to your heart's content. Or even better, don't mock anything just write the basic E2E tests (against test cloud resources or your production cloud resources), and get to shipping even faster. If resources are that tight, no one has time to sit around writing mocks in a code-base with a high churn rate right?
> And if the architecture requires service A to talk to service B, which in turn needs to talk to service C, you end up having to spin up way too much infrastructure (each with their own configuration and database setup hells) for just a simple test. It's not worth it, and often it's not feasible since you may not even have enough memory in your computer.
People are running around with $3000 Macbooks with very fast cores and lots of memory, but running their cloud instances with ASGs of t2.micros/smalls/larges/xlarges initially. Half the time all you need for most apps is a database (postgres), cache (redis), message queue (redis), and S3 (minio), and something to send emails. You can run local versions of these requirements on <4 hyper threads (let's say 2/4 cores). You're not going to be able to test BigQuery or DynamoDB (actually, you can run ScyllaDB but that's besides the point) or some cloud-specific stuff, but most apps just don't need that stuff. It's not that hard to run all 5 of these services on a modern dev machine, nevermind if companies spent half the price of the laptop and got you a desktop machine (like the "thin client" business machines out there these days[0]).
But again, all I can hear is that the experience is lacking -- you've just never seen it done simply, doesn't mean it can't be done or it isn't a good idea -- you've just never run into a place with the right resources/time to build proper developer tooling.
I'd argue it is worth it because you're going to build a mock that approximates those services to test them, are you not? Or are you just going to mock that the request came back correctly? I already mentioned that if you want to just not spin the thing up at all, then yeah write a mock for the function call or whatever, but if you're going to write a mock for that thing, you're basically writing technical debt that will be wrong/subtly broken as soon as the underlying thing changes. Why do that when you can just... use the actual thing?
> For frontend developers or data scientists who couldn't care less about the backend, going through this hassle is frustrating and there's no reason why they should go through it.
Sure but this is the same point as before -- it's hard only when you haven't dedicated any engineering time or the right engineers to it. If it's crucial to you all moving faster, then take the time, build the shared machinery, and it will pay dividends. Maintaining mocks is not free, and the costs are externalized/hidden -- mocks do not maintain parity with the actual implementation, they are a reflection of how you think the actual implementation works.
As a side note, if you want to accelerate frontend developers or data scientists there are many other good options -- they can develop against an API instance that's in the cloud completely (if you have preview apps for Heroku, or a similar setup elsewhere for example) -- data scientists could use hosted notebooks with access to the appropriate data lakes. There are lots of other things you can do there for a fully streamlined experience. I'm going to assume that a fully streamlined experience wasn't the goal here, but rather easy deployment of this software on their machines -- again, this is a matter of building a good experience. If it's hard, make it easy -- write good software and it will look like magic.
> Mocks are also a fundamental part of building integrations, since they allow both backend and frontend teams to work in parallel against a specification/contract.
Mocks are not a fundamental part of building integrations -- the actual backing service is the only fundamental part. This is why people usually just... don't worry about writing tests, or don't mock when they first start out. Writing and maintaining good mocks that mock what you want and reflect the functionality well is non trivial.
Specifying a contract for the software you're producing is a good practice, but it is not tied to mocking; it is there to afford you flexibility in writing integrations (instances of the interface). You have chosen to write an integration that doesn't run anywhere else except in tests. Writing a Queue<LocalMemory> is drastically less useful than writing a Queue<Elasticache> and Queue<Redis> and Queue<Kinesis>, though they are all implementations, because Queue<LocalMemory> is never used anywhere else. Bugs and under-engineering/discrepancies in Queue<LocalMemory> will never be seen in production and are a waste to work on, when you can spin up Redis and use Queue<Redis> locally. Queue<Redis> might actually see use in real life, and we know that Queue<Elasticache> and Queue<Redis> are actually kind of the same thing (definitely more so than Queue<LocalMemory>).
> That said, for the real integration test, you do need to run the real services and have them talking to each other. If you can do that in your own machine, great. But often the only feasible way to do this is in the cloud.
??? This isn't really true. You can absolutely run software on your own machine and fully integration test a system. In fact, as long as your app's database (which is normally the limiting factor, size-wise) is less than 100GB, you can run most complete apps (imagine a simple one like Laravel/Rails/Django) on your own machine.
If your app can run on <10 t2 instances (let's assume 3 of those are RDS) and doesn't use more than 500GB of data in total (again, hard drives are bigger than this these days), you can very likely reproduce the whole thing locally.
Look, we don't have to agree, diversity of opinion is great -- but none of these arguments are doing it for me. They all just amount to "it's hard" with the assumption being that mocks are free. Mocks are not free, and while they may have their place, I would rather just do the work of making it easy for me to spin up the infrastructure where possible. You don't have to be an expert -- hire someone else who is to make sure your developers are productive. If it's <5 devs then OK, maybe no one has that time or expertise, doesn't mean it's not a good idea.
Apologies for not getting to read and reply to your full post, but I wanted to quickly respond to the request for an example regarding "I need A to behave like X". I don't think the example is too contrived:
Let's say A is a REST API service. And you wish to check how your code behaves when a client receives a 5XX response. There might be reliable ways to cause A to reply with such a failure, however, if what you intend to verify is this particular behavior, then simply forcing A to reply with a 5XX through mocking will probably get you there quicker, and have it be decoupled from A's implementation (e.g. the conditions that triggered 5XX no longer apply after a version bump, etc)
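For what it's worth, in Python that kind of forced failure is tiny with something like the `responses` library (the URL and the assertions are placeholders for your own client code):

  import requests
  import responses

  @responses.activate
  def test_client_handles_5xx():
      # Force service A's endpoint to return a 503 without touching A at all.
      responses.add(responses.GET, "https://service-a.example/api/items", status=503)
      resp = requests.get("https://service-a.example/api/items")
      assert resp.status_code == 503
      # ...assert your client's retry/fallback behaviour here...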
Edit: I took the time to read the full post :). In broad terms, I agree with you. I think however that the approach to software testing allows a bit more nuance, and the usefulness of any approach must be determined more on a case-by-case. In very broad strokes, they follow the pros-and-cons associated with unit- and integration-tests. Are you interested in testing the behavior as end-to-end, then integration tests will get you there more reliably than any complicated setup of unit tests will ever do. Containerized services for temporary deployments is also a great way to spin up such dependencies. However, unit tests serve a different purpose, which deliberately try to be as decoupled from any dependency as possible, and as cohesive as possible. It then follows that if you zoom in to this software unit, and value it for its decoupled nature, and you wish to test how it behaves given certain conditions, mocking is a better and easier way to go about it.
> Let's say A is a REST API service. And you wish to check how your code behaves when a client receives a 5XX response. There might be reliable ways to cause A to reply with such a failure, however, if what you intend to verify is this particular behavior, then simply forcing A to reply with a 5XX through mocking will probably get you there quicker, and have it be decoupled from A's implementation (e.g. the conditions that triggered 5XX no longer apply after a version bump, etc)
Ahh thank you for clarifying what you meant, I see what you mean now, I was imagining something else in my head that was incorrect -- that case makes absolute sense to mock.
> Edit: I took the time to read the full post :). In broad terms, I agree with you. I think however that the approach to software testing allows a bit more nuance, and the usefulness of any approach must be determined more on a case-by-case. In very broad strokes, they follow the pros-and-cons associated with unit- and integration-tests. Are you interested in testing the behavior as end-to-end, then integration tests will get you there more reliably than any complicated setup of unit tests will ever do. Containerized services for temporary deployments is also a great way to spin up such dependencies. However, unit tests serve a different purpose, which deliberately try to be as decoupled from any dependency as possible, and as cohesive as possible. It then follows that if you zoom in to this software unit, and value it for its decoupled nature, and you wish to test how it behaves given certain conditions, mocking is a better and easier way to go about it.
Fully agreed here! Case by case is the way to decide when to use either tool, and I think that kind of timing (what level of "zoom" you're at in the codebase) is pretty reasonable as a metric.
The reason I use mocks is not primarily to avoid latency or setting things up, it's because it's the best way to test a permutation space in an isolated fashion.
Type systems are great and indeed many unit tests are a poor replacement for it, but for any moderately complex piece of logic, types can only make trivial guarantees.
Also, clever use of types might enforce something nicely, but they can still be really hard to work with. Error messages are often cryptic and don't describe anything semantic about the problem. A properly constructed unit test can easily tell you "You ended up returning a date that was set in the past compared to the expected outcome" which gives you much more information.
> The reason I use mocks is not primarily to avoid latency or setting things up, it's because it's the best way to test a permutation space in an isolated fashion.
So I think I've been talking past people on this point and not getting what people mean -- when I think of a "mock" I think of like a completely re-implemented piece of infrastructure. I am totally on board with the idea of mocking out individual methods and/or functions that indicate a certain point in the state permutation space.
What I was trying to disagree with was the idea that people should be maintaining mocks/approximations of pieces of simple infrastructure (ex. Redis) they could just spin up. So in my opinion, never write an in-memory cache component (with redis operations like get/set/...) for a local integration test when you can just run redis. If you want to just test how some piece functions when Cache<Redis>.set(...) fails, then that totally makes sense to just mock out (as in make .set() throw/return an exception/error or whatever).
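Concretely, something like this (pytest-style sketch; `redis_cache` and `order_service` are hypothetical fixtures, not a real library's API):

  from unittest import mock

  def test_order_created_even_if_cache_set_fails(redis_cache, order_service):
      # Real Redis everywhere else; only .set() is forced to fail for this test.
      with mock.patch.object(redis_cache, "set", side_effect=ConnectionError("redis down")):
          order = order_service.create_order(item_id=42)
      assert order.id is not None  # the write path must not depend on the cache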
> Type systems are great and indeed many unit tests are a poor replacement for it, but for any moderately complex piece of logic, types can only make trivial guarantees.
> Also, clever use of types might enforce something nicely, but they can still be really hard to work with. Error messages are often cryptic and don't describe anything semantic about the problem. A properly constructed unit test can easily tell you "You ended up returning a date that was set in the past compared to the expected outcome" which gives you much more information.
Can I introduce you to my friends Haskell and Rust? It really depends on the type system, but to be fair there's also quite a productivity trough early on working with these languages and trying to really take advantage of the expressiveness of their type systems.
In the end you can't model out really complex logic -- but you can often avoid writing the kind of things you would have checked for with unit tests, and at least some integration tests. Agreed on the second point, but the languages mentioned above have made great strides in that respect; Rust was well known from the start for its error message readability and Haskell is pretty decent at it, though the sticky situations you can get into in Haskell are an order of magnitude stickier.
IMO that date question is a really good/easy problem for generated tests -- never producing a date that is in the past is a property that you're trying to maintain, and randomizing input is likely to be more effective in doing the fuzzing for the inputs. To be fair, that's more complexity than a simple unit test would add (especially in terms of simply preventing regression).
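Something like this with Hypothesis, for instance (a sketch; `next_billing_date` is a made-up function under test):

  from datetime import date, timedelta
  from hypothesis import given, strategies as st

  @given(start=st.dates(min_value=date(2000, 1, 1)),
         period_days=st.integers(min_value=1, max_value=365))
  def test_result_is_never_before_the_start_date(start, period_days):
      result = next_billing_date(start, timedelta(days=period_days))  # hypothetical
      assert result >= start  # the property: never return a date in the past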
Using Docker to run the services they mentioned really isn't that complex and hardly requires maintenance. Using Docker Compose you can literally spin up and link those services together in just a few lines of YAML. Not sure what maintenance you are talking about either aside from sometimes bumping the version number.
> Running a bunch of services creates all sort of failure modes unrelated to the tests you are running and requires maintenance in the long term.
Yes, and mocks are not free either -- the more complex the backend system you're dealing with, the more complex your mock of it must be, and that code will be brittle. I specifically said in the case where backing software is simple/easy to spin up, then it makes sense to spin it up. How much of Redis are you going to re-implement to get a mock with good parity that is in the end still missing the quirks of real redis?
> Adding docker to it only increases the overall complexity.
We are literally discussing a project that runs temporary isolated postgres with an extremely simple single executable experience. It's case-dependent but all you need is a different query string. It's not that hard -- if you think it's that hard to spin up a redis/postgres container on your machine in 2021 then I don't know what to tell you, a lot of things are going to be hard for you.
And again, what you want to do instead is write an approximation of postgres in your codebase that will never run anywhere else -- this kind of thing is only easy if it's done for you. As soon as you have a non-trivial interface (let's say one that would do a JOIN in production but you've mocked locally) then you're going to have a hard time writing that mock and maintaining it.
> There are good reasons for doing unit and functional testing with mocks.
Did I say there weren't? You've eviscerated the strawman you made.
Let me restate -- if you're mocking function calls/point-in-time calls with what they're supposed to return then that's fine. However, if you're maintaining a fake approximation of a redis server in your code that isn't a library (i.e. written by someone else), then that is stupid. Just run redis (even without a container). If you find it difficult to run a single extremely easy to administer (in the case of redis) external process (dockerized or not) in your tests, then I don't know what to tell you -- I just hope we never work together.
> The FAANGs that gave me offers seem to think otherwise...
If you judge your capability by which FAANGs give you offers I don't know what to tell you. The only FAANG I personally respect implicitly is Netflix because they have a well documented culture and they basically cast off a ton of their old stuff as open source and you can see how far ahead they are. Still this doesn't mean every engineer at Netflix is a good one or doesn't have things to learn/things they can't do.
BTW speaking of FAANGs, Google does their builds and tests with Bazel -- which they invented to suit their needs. Developer tooling is definitely something they invest in. I don't know where Bazel sits vs linux containers on the complexity scale, but it's certainly not simple.
> Guess I struck a sore spot here.
Nope I just... honestly mean that, with no particular malice directed at you. I find that when I work with/on teams that have developers who just can't tolerate anything outside their own realm (self-professed "frontend" engineers who hate backend/ops/anything else), it really holds everyone back.
For example, when a small squadron of developers is vehemently against containers just because they can't see the benefits (this was a lot more common ~5 years ago), without having spent time to understand the technology. Creating good UX for those developers is one thing, but restricting the solution space to something suboptimal just because people don't understand and choose not to research the technology is frustrating to me, so it's not a good place for me to work.
Wow, this is neat when one has a Postgres instance running out there.
A portable alternative which I find very useful is to start a database in a container. I use ory/dockertest for that. I've written more about it here: https://gruchalski.com/posts/2021-04-24-on-software-testing-.... It's in the context of golang and launching the Ory platform for integration testing. Edit / added: could be relevant here for some. ory/dockertest launches containers for tests. It's super easy to do this in any technology.
For those using Java, it's also possible to achieve this using Testcontainers[0]. It's nice to have the test DB setup co-located with your test code, and if you're using JUnit it's really easy to manage and align the DB lifecycle, e.g. fresh DB for each test, shared across class, etc.
It's very simple to use and works perfectly. The one problem, and it's a significant one, is that it only ships one version of PostgreSQL, and adding another was difficult enough that I didn't.
Unless you rely on transactions in your code. I’m going to take the schema route for a spin. You essentially utilize a schema as a namespace and destroy it when tests are complete (as opposed to the “public” default schema)
This way you don’t need to worry about provisioning a stand-alone db for each test run, but still get ephemerality. Plus the ability to run transactions.
> You essentially utilize a schema as a namespace and destroy it when tests are complete (as opposed to the “public” default schema)
Unless you rely on schemas as static namespaces in your code (even if the schemas your code expects include the "public" common schema).
I prefer using dedicated schemas for dedicated micro-apps/services/components. Say, one for the event log, one for the (business) metadata, one for the reports.
> This way you don’t need to worry about provisioning a stand-alone db for each test run
Practically, `CREATE DATABASE tmp_xxx; \c tmp_xxx` isn't much different from `CREATE SCHEMA tmp_xxx; SET search_path TO tmp_xxx, public;`. They both do essentially the same thing when the dust settles. But the latter requires migrations to be applied to the new schema, while the former allows you to short-circuit that with a simple `CREATE DATABASE tmp_xxx FROM TEMPLATE seeded_test_db`.
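For reference, the schema-as-namespace variant from Python looks roughly like this (psycopg2 assumed; DSN and names are placeholders), with the migration step included since that's the extra cost mentioned above:

  import uuid
  import psycopg2

  conn = psycopg2.connect("dbname=app_test user=postgres")  # placeholder DSN
  schema = f"tmp_{uuid.uuid4().hex}"
  with conn, conn.cursor() as cur:
      cur.execute(f'CREATE SCHEMA "{schema}"')
      cur.execute(f'SET search_path TO "{schema}", public')
      # run your migrations here so the fresh schema has the expected tables

  # ... run the tests against `conn` ...

  with conn, conn.cursor() as cur:
      cur.execute(f'DROP SCHEMA "{schema}" CASCADE')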
We modified our code base to give us nested transactions using postgres savepoints, came in handy for automatically wrapping tests in transactions too (though you can opt-out per test)
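For anyone wondering what that looks like, here's a bare-bones sketch on top of a raw DB-API connection (not our actual code; many ORMs already expose "nested" transactions built on savepoints):

  import contextlib
  import itertools

  _sp_ids = itertools.count()

  @contextlib.contextmanager
  def nested_transaction(conn):
      # Emulate a nested transaction with a Postgres SAVEPOINT.
      name = f"sp_{next(_sp_ids)}"
      cur = conn.cursor()
      cur.execute(f"SAVEPOINT {name}")
      try:
          yield
      except Exception:
          cur.execute(f"ROLLBACK TO SAVEPOINT {name}")
          raise
      else:
          cur.execute(f"RELEASE SAVEPOINT {name}")
      finally:
          cur.close()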
Unless your transactional code relies on deferred constraints. They don’t get checked until commit time so any test that rolls back upon “success” will not reflect an error.
Deferred constraints can be awesome, but as you point out, they can be a massive blind spot in tests that rely on transactions to keep the database clean.
However, I've had success with using the `SET CONSTRAINTS ALL IMMEDIATE`[0] statement prior to rolling back the transaction.
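In teardown terms, the sketch is just this (a raw psycopg2-style connection assumed, not a specific framework's API):

  def teardown_test_transaction(conn):
      try:
          with conn.cursor() as cur:
              # Check deferred constraints now, so violations fail the test
              # instead of being silently discarded by the rollback below.
              cur.execute("SET CONSTRAINTS ALL IMMEDIATE")
      finally:
          conn.rollback()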
There are test frameworks which automate this behavior -- both Django and Rails provide test-case libraries which run each test in a transaction, and roll back to reset. (NB they also have to provide ways to disable this -- in order to, e.g., test application code which itself can conditionally roll back -- so both can also reset database state "the hard way", at a performance penalty.)
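In Django, for example, that's the split between `TestCase` and `TransactionTestCase` (sketch; `myapp`/`MyModel` are placeholders):

  from django.test import TestCase, TransactionTestCase
  from myapp.models import MyModel  # hypothetical app/model

  class FastIsolatedTests(TestCase):
      # Each test runs inside a transaction that Django rolls back afterwards.
      def test_create(self):
          MyModel.objects.create(name="x")
          self.assertEqual(MyModel.objects.count(), 1)

  class CommitBehaviourTests(TransactionTestCase):
      # Real commits are allowed; Django resets state "the hard way" (flush),
      # which is slower but lets you test code that commits or rolls back itself.
      def test_code_that_manages_its_own_transactions(self):
          ...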
Thanks! I put it on pypi in 2011, when you could provide external http links to the tarballs. At some point they've changed all that, I've modernized it now so have another look if you're interested!
There are two things I would love to have in postgres -
1. The ability to set a point in time to roll back to, revert everything back to that point. Would be useful in lots of development scenarios
2. The ability to set a point in time and then allow me to make changes however I like - direct sql, using an attached application, whatever - and then get the DDL/DML necessary to take it from the snapshot to the current state. I would love to be able to take a seed database, hook it up to an app and use the UI to set up scenarios that I could then use in my unit tests. This isn't bad in isolated cases, but when setting up a scenario takes coordination of tens of tables, this would be much easier. I could imagine it being useful in a lot of other scenarios as well.
I'm not sure if you could do #2 with something on top of logical replication - I've started to look at that in the past, but I would really like something that outputs SQL.
I think a practical alternative is to CREATE DATABASE .. TEMPLATE foo. Maybe it could be boosted up by supporting reflink copies many filesystems nowadays support. In fact reflink in context of PostgreSQL has been discussed earlier, I wonder whatever happened to it: https://www.postgresql-archive.org/file-cloning-in-pg-upgrad...
Full disclosure way up front: I'm one of the developers working on https://spawn.cc at Redgate and the two scenarios you describe are some examples of exactly what we've built Spawn for.
Not trying to sell you anything (since we are still in beta anyway!) but I'd encourage you to try it out and see if it does what you're looking for. We're in the early stages so any and all feedback is really helpful!
It's a hosted service where you can spin up copies of "data images" (a snapshot of schema and data in a database instance) almost instantly, regardless of the size. As an example, we created a copy of the 400GB public Stack Overflow DB for development in ~15s.
You can also snapshot the state of the DB at any time and move around those save points whenever you like. Creating a "data image" for later copies is also possible based off the current state of the database, so you can curate data through your web UI, save the state, and then make that dataset available for multiple copies across your dev team.
Unfortunately savepoints only work for a non-transactional single connection test. E.g. you can only set a savepoint within a transaction. If the code under test, like an integration test, uses transactions itself, or multiple connections, savepoints won't help you. I would love to be able to globally revert all committed changes to a point in time.
  SUPER SAVEPOINT A;
  <NO HOLDS BARRED CODE USING MULTIPLE CONNECTIONS, TRANSACTIONS, NORMAL SAVEPOINTS ETC>
  SUPER RESTORE A;
It could be implemented by playing the WAL logs backwards, I imagine.
Alternatively, just making CREATE DATABASE WITH TEMPLATE fast would be nice. Even on tmpfs and no fsync this operation takes about 300 ms on my (fast) workstation. Then you could consider the template database a savepoint in itself.
I would love to see something like dbaclone [1] but for postgres which in principle creates a new database and redirects reads to the source db and writes to your clone.
This works very well in unit testing for us but there are situations where we use postgres and not sql server.
> redirects reads to the source db and writes to your clone.
Looks like you can do almost that kind of thing with pgpool[1] (although it does look like you'd need a weird replication setup first which might be a deal-breaker.)
This is a great little package. I added it to one of our back-end services' test suites and it definitely improved issues around tests interfering with one another.
The challenge we had was that, if a test crapped out because one of our CI/CD runners had issues, it would leave the shared database for that runner in an undefined state.
pg_tmp incurs little or no overhead, and the temporary databases just magically disappear at the end of a test run.
Super cool and useful. The page also links to "A guide for accessing the features of PostgreSQL in test-driven development", which is nice:
https://www.youtube.com/watch?v=4Vigf4NG6-s
Are other people not already doing this? You just need a shell script and Docker and you can run your own copies of a database, modify them, export them, hell, even diff both DML and DDL, just with the diff command and stock backup/restore tools. Pop it into a CI/CD pipeline and you don't need to write custom SQL migrations anymore.
You could run MySQL in a container? Start a container per test-run and create a database per test or test suite. If your code is written well, you'll be able to change the database that your DAO/ORM/Query-layer runs on quite simply, and execute/ensure migrations from code.
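As a sketch, a session-scoped pytest fixture can handle the "container per test run" part (docker CLI on PATH assumed; image, ports and credentials are placeholders, and a real fixture would poll for readiness instead of sleeping):

  import subprocess
  import time
  import pytest

  @pytest.fixture(scope="session")
  def mysql_url():
      # Start a disposable MySQL container for the whole test run.
      cid = subprocess.check_output([
          "docker", "run", "-d", "--rm",
          "-e", "MYSQL_ROOT_PASSWORD=test",
          "-e", "MYSQL_DATABASE=app_test",
          "-p", "33060:3306",
          "mysql:8",
      ]).decode().strip()
      try:
          time.sleep(20)  # crude; poll the port until the server accepts connections
          yield "mysql://root:test@127.0.0.1:33060/app_test"
      finally:
          subprocess.run(["docker", "stop", cid], check=False)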
Unlikely: MySQL does not have transactional DDL (statements like CREATE TABLE implicitly commit).
It's not a big loss when managing production databases in practice because with large tables you roll forward, but it would be nice for dev/qa environments.
We do this on every build: start a fresh postgres instance, run all of our DB migrations to get us to the latest DB schema, and then run our test suite.
It seems to me that my comment you're replying to applies equally well in response to this...
I agree it's nice, painless, etc. I just wanted to be clear that it's not some novel ground-breaking GitHub feature; it's a generic CI/CD SaaS feature. The point isn't "switch to/use GH Actions", it's "do this/have CI".
Yeah, was going to comment something similar. For local testing I have some docker-compose files set up that create an ephemeral db that I can test against by migrating up to the current schema revision and then running tests. Saves time in pushing to the cloud and needing to wait for GitHub/Lab to build, pull and provision containers.
[0]: https://github.com/allaboutapps/integresql