
A Two Month Debugging Story - craigkerstiens
https://kev.inburke.com/kevin/a-two-month-debugging-story/
======
rraval
> To ensure each test has a clean slate, we clear the database between each
> test.

The proposed solution to manually nuke the database state seems crazy to me.
Some alternatives:

1. Run the entire test in a transaction, do flushes and assert as normal. At
the end, ROLLBACK instead of COMMIT and now you have a pristine database
again. [1]

2. Set up pristine DB state once and then use `CREATE DATABASE ... WITH
TEMPLATE ...` to create a temporary database. Not sure what the perf hit is,
but it's probably worth trying. [2]

[1] http://alextechrants.blogspot.ca/2013/08/unit-testing-sqlalchemy-apps.html

[2] https://www.postgresql.org/docs/9.4/static/manage-ag-templatedbs.html
("CREATE DATABASE actually works by copying an existing database. By default,
it copies the standard system database named template1.")
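
A minimal sketch of option 1, using a stand-in client that records SQL instead
of talking to a real Postgres connection (the node-postgres-style `query`
method and the table name are assumptions, purely for illustration):

```javascript
// Wrap a test body in BEGIN ... ROLLBACK so the database is pristine after.
// `client` here just records the SQL it is given; a real test harness would
// hand the test a live connection instead.
const issued = [];
const client = { query: (sql) => issued.push(sql) };

function runTestInTransaction(client, testBody) {
  client.query('BEGIN');
  try {
    testBody(client);
  } finally {
    // ROLLBACK instead of COMMIT: every write the test made disappears.
    client.query('ROLLBACK');
  }
}

runTestInTransaction(client, (c) => {
  c.query("INSERT INTO shipments (status) VALUES ('pending')");
  // ...flushes and assertions would go here...
});
```

The key property is that ROLLBACK runs in a `finally` block, so even a test
that throws leaves a clean slate.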

~~~
kevinburke
Hi, I wrote the post. We'd love to use transactions but our ORM doesn't
support them. More info here:
https://kev.inburke.com/kevin/dont-use-sails-or-waterline/

We wrote our own transaction library and have been shifting queries over to
it and to a saner ORM: https://github.com/shyp/pg-transactions

I've never heard of database templates, I'll have to take a look.

~~~
agf
If your ORM doesn't know about transactions, that doesn't necessarily prevent
you from wrapping your test runs in transactions. I was in a situation where I
needed to start / rollback a transaction without ActiveRecord knowing about
it, and was able to hack AR slightly to make that work -- so you can likely do
the same.

You even mention in the Waterline section of that blog post that you've
written your own transaction library, so this shouldn't be too hard.

~~~
jwatte
Or run your database on a RAM disk with snapshots (LVM, FS, block device, or
virtual host based)

~~~
meshko
I would imagine resetting the whole db even on a RAM disk after each of 6000
tests wouldn't be particularly fast.

~~~
jwatte
With snapshots, it should be instant.

------
mattbee
Oof - seems like you're running on a managed platform that can't be debugged
in a reasonable level of detail?

If it were me I'd want to take the problem somewhere I _could_ get root access
and debug it properly. So I'd be interested to know what value this (unnamed)
CI platform provides to make it worth a wild goose chase.

~~~
mattbee
Additional: Maybe it's just me but not being able to debug drives me CRAZY,
sorry that I'm not being more helpful :) I've done bits of embedded systems
work where you have to chisel out your own debug tools from glue and LEDs and
serial ports and I basically hated it, or rather learned a bunch of ways of
making it "right first time".

Test technology surely doesn't involve so much secret sauce that you need a
hosted service - after two months I'd definitely want another way of doing it.

------
joshribakoff
I'm not saying that e2e testing is without value, it can be useful... but I've
found it's just not worth it. I write unit tests only, which don't touch the
database. My litmus test is: am I testing an "algorithm" -- something that has
outputs solely based on its inputs?

If you insist on tests that touch the DB, the speed can be improved by
chaining tests instead of resetting the DB every time (this has its
trade-offs).

If I need to test a piece of code that generates a dynamic SQL statement, I
would simply assert that the correct SQL is generated. I would not need to
actually execute the SQL to see that it is correct. The point is to test that
my logic generated the correct SQL, not to test that my database vendor
implements SQL correctly. The latter would just be caught in manual testing. I
like the BDD school of thought that you are writing specs, not tests.
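
A sketch of that style of spec: assert on the SQL string a builder produces,
never executing it. `buildStatusQuery` is a made-up helper, not anything from
the article or a real library:

```javascript
// Hypothetical query builder: the unit under test is the string it emits.
function buildStatusQuery(table, statuses) {
  // One numbered placeholder per status value: $1, $2, ...
  const placeholders = statuses.map((_, i) => `$${i + 1}`).join(', ');
  return {
    text: `SELECT * FROM ${table} WHERE status IN (${placeholders})`,
    values: statuses,
  };
}

// The "spec" checks the generated SQL directly -- no database involved.
const q = buildStatusQuery('pickups', ['pending', 'assigned']);
```

Such tests run in microseconds and fail only when your logic changes, not when
the data underneath does.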

~~~
primitivesuave
I second this. I worked at a research company that had terabyte-scale SQL
databases of curated data, and the tests would take 5 - 8 hours to run because
they were verifying query results rather than the queries themselves. This
also meant tests would fail because of changes in the data rather than just
changes in the code. I rewrote the entire test suite to match SQL statements
against a symbolic SQL description, which turned out to be a much more robust
and extensible solution.

------
overgard
At risk of invoking the wrath of test fanatics -- aren't tests supposed to
save time? If your tests are both unreliable and so hard to debug that you
haven't been able to fix them for two months, I'd think you'd be better off
just turning them off and figuring out a better testing strategy.

~~~
kevinburke
Hi - we definitely weren't looking at this full time for two months, just off
and on as it failed various test runs, and I had time to look at it.

We have a large suite of tests that integrate with Postgres. Maybe one out of
50,000 database queries would fail. It was hard/impossible to predict which
query would fail, since the problem wasn't with any individual query. By your
logic, we should throw out our entire test suite. I'm not sure what we're
supposed to replace it with.

~~~
overgard
> By your logic, we should throw out our entire test suite.

That's why I figured I was invoking the wrath of the testing fanatics... that
is where my logic leads. :-) I have no idea what your tests look like, maybe
they're super useful, but I've worked at places in the past where there were
thousands of tests that were frankly pretty useless, and were actually a net
negative for a variety of reasons, but to point that out got you labeled as a
"cowboy". (To be clear I'm not against automated testing, just automated
testing done badly.) If you're running tests in a context where you can't
debug them, at the very least I would move them in-house onto a machine where
you can hook up a debugger.

~~~
kevinburke
Our tests are pretty useful; they check things like "when you ask the API to
submit a pickup, it gets assigned to a driver", and so on. Especially in
JavaScript, where anything can take any value, they've been extremely helpful.

~~~
mattmanser
That's another argument I've never understood.

I've been working on applications with javascript for over 10 years.

Not once can I remember a javascript bug where someone passed a value as the
wrong argument and it made it into production.

Maybe it has happened, but it's just not a common bug. I just don't remember
it.

Like, you can easily write it as a mistake as you're developing, but it's
obvious as soon as you run it. And if that bug gets into production, your dev
didn't even bother to run the code to see if it worked as intended.

------
voiper1
First thought that came to mind was to just blow away the database and create
a new one, see how long that takes.

I found Waterline's load process did a LOT of things I hadn't expected, and
took a long time... I'm just using knex now. Every time I try for magical
solutions, I keep regretting it and end up using less magic.

~~~
jdmichal
Magic is fine, but there should always be a path to do things sans pixie dust.
For something like an ORM, that would be a path to retrieve the raw DB
connection to do things around / between queries. You even see the same
pattern play out in programming languages; C# has the `dynamic` keyword that
disables type checking on a variable.
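
A toy illustration of that escape-hatch pattern (everything here is invented;
it's only meant to show the shape, not any real ORM's API):

```javascript
// A wrapper that normally builds queries for you, but always exposes the
// underlying driver so you can work around / between its queries.
class TinyOrm {
  constructor(driver) { this.driver = driver; }
  find(table) { return this.driver.query(`SELECT * FROM ${table}`); }
  raw() { return this.driver; } // the path around the pixie dust
}

const log = [];
const driver = { query: (sql) => { log.push(sql); return sql; } };
const orm = new TinyOrm(driver);

orm.find('users');          // the magic path
orm.raw().query('BEGIN');   // bypass the ORM entirely, e.g. for transactions
```

The point is that the raw path exists by design, so test harnesses can issue
BEGIN/ROLLBACK even if the ORM itself has no transaction support.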

------
jwatte
What we do when we detect a failure is freeze the test runner instance, and
allocate the same failed test to another runner. If the second runner
succeeds, we okay the build, but we put the test and the frozen runner in a
queue for investigation, and some engineer will be responsible for diagnosing
and fixing this intermittent test. This queue is worked every day on a
rotating schedule.

We run our own CI/CD infrastructure, on top of virtualized infrastructure; if
you use a third party that doesn't give you that, you might want to look for
alternatives.
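
The triage flow described above can be sketched roughly like this (all names
invented; the real system would also freeze the failing runner instance):

```javascript
// Rerun a failed test once. If the rerun passes, the build is green but the
// test goes into an investigation queue for a human on rotation.
function triage(testName, firstRun, rerun) {
  if (firstRun.passed) return { build: 'ok', investigate: [] };
  if (rerun.passed) return { build: 'ok', investigate: [testName] };
  // Failed twice: treat it as a genuine failure.
  return { build: 'failed', investigate: [] };
}

const flaky = triage('db-timeout-test', { passed: false }, { passed: true });
const broken = triage('other-test', { passed: false }, { passed: false });
```
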

~~~
akiselev
Has this strategy worked perfectly so far? I would imagine a best-two-of-three
method would work better, like those used for integrity in high-radiation
environments. However, I'm guessing code is significantly more deterministic,
and the danger of false positives (tests passing when they shouldn't) is much
smaller.

~~~
jwatte
This has worked well for many years. Most of the intermittently failing tests
are older, from before we had robust dependency injection, and they get
incrementally improved this way.

False positives are also super costly for us, as everyone works on trunk (by
design, to avoid skew) and deploys all checkins directly to production. "Best
out of two" is sufficient for the old tests, and if someone creates new
intermittent tests, we follow up with education so that isn't a persistent
problem.

------
loftsy
I think best practice here is to run the whole test in a transaction and then
roll back the transaction at the end. This is the approach Django uses.

~~~
kevinburke
We'd love to do this but our ORM doesn't support it.
https://kev.inburke.com/kevin/dont-use-sails-or-waterline/

------
switchbak
I usually try to minimize the number of tests that rely on this (shared) DB
state. So I'll focus more on unit and integration tests. Functional tests
still do have value though. See:
[http://xunitpatterns.com/Testing%20With%20Databases.html](http://xunitpatterns.com/Testing%20With%20Databases.html)

Where possible, I like to have data that can exist independent of the other
data on the system. I can make a separate 'tenant' for that test - and just
ensure it's wiped before I proceed. Sort of a multi-tenant approach. Works
great. I don't bother with a 'teardown', but do any cleanup before the test
runs. I also ensure the tests are written to not make assumptions about global
state.

Instead of dropping all constraints as the article suggested (that sounds
hacky), I use ON DELETE CASCADE constraints. If I miss some, the tests fail.
Seems easy enough to maintain.

With the above approach, DB testing is approachable and still pretty quick.
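
The per-test tenant cleanup might look roughly like this (an in-memory array
stands in for the database here, so the pattern is visible without Postgres;
column and tenant names are made up):

```javascript
// Each test owns a tenant id; before it runs, it deletes only its own rows.
// In real code this would be: DELETE FROM <table> WHERE tenant_id = $1
// (relying on ON DELETE CASCADE for child tables).
const rows = [
  { tenant: 'test-42', id: 1 },
  { tenant: 'test-42', id: 2 },
  { tenant: 'other-test', id: 3 },
];

function cleanTenant(table, tenant) {
  return table.filter((r) => r.tenant !== tenant);
}

// Cleanup happens before the test, not after, so a crashed previous run
// can't poison this one.
const fresh = cleanTenant(rows, 'test-42');
```
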

------
fapjacks
They are using CircleCI, btw. I really like Circle, but they could sure be
more helpful with the SSH access, as you mention. Also a personal nitpick,
they (still) have no way to test a circle.yml file except for pushing a change
and seeing if it works. Anybody that's used the service will know exactly what
I'm talking about. You end up pushing four times just to get the syntax right.
They know about that (iirc there was a support forum post requesting the
feature some years ago), too. But besides that, I heartily recommend Circle.

------
markbnj
I always like to read stories of other teams' debugging adventures. One thing
that occurred to me as I got toward the end of OP's (ongoing) narrative was
that perhaps it's time to fall back and consider other ways of setting up the
data environment for each test. If the data each test depends on is a small
enough subset, perhaps it makes sense to create a separate database for each
test or class of tests, which can just be dropped and recreated before the
test run?

------
utternerd
_disabled autovacuum by default_

I realize this wasn't their solution, but it's worth noting that generally
speaking, disabling autovacuum isn't recommended. Even if you do, Postgres
will still force vacuum jobs to prevent transaction ID wraparound.

* https://www.postgresql.org/docs/9.5/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

~~~
kevinburke
Agreed, but if you are running a CI provider and recycling containers, seems
easy enough to do it as part of the recycle step.

~~~
utternerd
I agree it'd be a nice option for them to provide, but I don't believe making
it the default for every customer is a laudable goal.

If vacuums are hanging/causing locks and these truly are just test tables, it
might be beneficial to use temporary tables instead as the autovacuum daemon
ignores them. Additionally, unlogged tables that reside on a memory disk can
be insanely fast - might cut some additional time down depending on your data.

------
blakzer0
I thought instead of telling you to drop your test suite, switch your ORM, or
do something equally crazy, I would rather just empathize. I'm glad you guys
figured it out. We all have these kinds of bugs that drive us insane, the
difficult-to-test ones that can't be reliably reproduced. The feeling when you
finally solve them is amazing. So congrats on finding that crazy bug!

------
gerbilly
With Hibernate, we used to connect to an in-memory DB just for the tests and
to a real DB for production.

It was easy and quick to drop schemas.

~~~
kevinburke
I agree that would be faster, but we rely a lot on Postgres's database
constraints, and we've previously had code problems go undiagnosed when the
test schema didn't match production. Six of one, half a dozen of the other, I
guess.

~~~
jwatte
Why does your test infrastructure allow a schema that doesn't match
production? Seems like low hanging fruit there!

~~~
kevinburke
Sorry - that was a few months ago; we replaced it as soon as feasible. We had
to upgrade a lot of tests to make them work with our prod schema.

~~~
jwatte
Right. In the best of worlds, your tests just apply the production schema
update on start, and similarly you apply checked in schemas ASAP to production
after passing tests!

------
karmakaze
> We could draw a dependency graph between our 60 tables and issue the DELETEs
> in the correct order

If the above works, automate it.

~~~
meshko
Creating a dependency graph between tables is not exactly a 20-minute task.
Certainly doable, but getting it right would take time. Anyway, there's no
need to do it; you can just do the deletes in a transaction.
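
If you did want to automate the ordering, a topological sort over the
foreign-key graph is enough. A sketch (the `deps` map, child table to the
parent tables it references, is invented for illustration):

```javascript
// deps: each table maps to the tables its foreign keys reference.
const deps = {
  pickups: ['drivers', 'users'], // pickups has FKs to drivers and users
  drivers: [],
  users: [],
};

// Produce a DELETE order: every referencing (child) table comes before the
// tables it points at, so no FK constraint is ever violated.
function deleteOrder(deps) {
  // Invert the graph: parent -> children that reference it.
  const children = {};
  for (const [child, parents] of Object.entries(deps)) {
    for (const p of parents) (children[p] = children[p] || []).push(child);
  }
  const order = [];
  const done = new Set();
  function visit(table) {
    if (done.has(table)) return;
    done.add(table);
    for (const c of children[table] || []) visit(c); // children deleted first
    order.push(table);
  }
  Object.keys(deps).forEach(visit);
  return order;
}

const order = deleteOrder(deps);
```

In Postgres the `deps` map could be built once from `information_schema`
foreign-key metadata rather than maintained by hand.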

------
dreamdu5t
Writing 6000 integration and unit tests can be avoided by using purely
functional statically-typed languages. There are some that even compile to JS,
such as PureScript. Most of what typical JS tests check can be expressed with
pure functions and types.

------
polskibus
Have you thought about starting up multiple replicas of your database in
parallel and then pointing each test suite to a different one? This way you
may be able to parallelize them very well.

~~~
kevinburke
This is an interesting idea!

~~~
polskibus
Let me know how it works out for you.

------
meshko
You can just do all the deletes in the same transaction and then the
constraints wouldn't need to be disabled.

------
breakingcups
This might be a weird question, but surely you run your tests locally in a
loop?

