Show HN: I built a tool to get instant test results (github.com/nabaz-io)
96 points by _cfl0 on Nov 16, 2022 | 63 comments
I got sick of the old software development loop: Change code -> Run tests -> wait -> wait some more -> look at failures.

I decided to build a tool that will enable you to: Change code -> look at failures.

No wait time, no explicit test running.

Under the hood:

  - Runs the whole test suite and collects code coverage per test.

  - On each file save (including autosave), analyzes which tests the changes affect.

  - Runs changed tests in the background.

  - Displays results; the loop time from change to test results is approximately 250ms.

Instead of: code -> alt+tab -> arrow up -> rerun all the tests -> wait ... -> test results

code -> alt+tab -> test results!
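
To make the selection step concrete, here's a toy Python sketch of the idea (the names and data are invented; this is not the actual nabaz implementation): keep the line coverage of each test from the full run, then on every save intersect the changed lines with each test's recorded coverage.

  # Hypothetical per-test line coverage collected from one full run:
  # test name -> set of (file, line) pairs that test executed.
  coverage = {
      "test_login":    {("auth.py", 10), ("auth.py", 11), ("db.py", 3)},
      "test_logout":   {("auth.py", 20), ("auth.py", 21)},
      "test_checkout": {("cart.py", 5), ("db.py", 3)},
  }

  def affected_tests(changed_lines):
      # Rerun only tests whose coverage overlaps the lines just edited.
      return [name for name, lines in coverage.items() if lines & changed_lines]

  # A save touching auth.py line 11 triggers just one test:
  print(affected_tests({("auth.py", 11)}))   # ['test_login']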

Check it out! <https://github.com/nabaz-io/nabaz>




This sounds a lot like a couple of old JUnit test runners:

InfiniTest - https://infinitest.github.io/ and https://infinitest.github.io/doc/intellij#how-it-works

JUnit Max - https://web.archive.org/web/20090206151635/http://www.threer... partially surviving as https://junit.org/junit4/javadoc/4.12/org/junit/experimental...

Ruby has something similar:

autotest - https://github.com/grosser/autotest

I think it's a good idea, and I'm surprised it hasn't become a standard feature of IDEs and test runners.


Jest, probably the most popular JS test runner today, has also had it since forever.


My Jest setup can run only the changed files, but it still takes forever, much longer than 250ms.


Right, because out of 100 tests it will run close to that number. When you change code, most of the time only a handful of tests (or fewer) need to run.


Had what since forever? Does Jest analyze the changes you make and compare them to code coverage? (Hint: no)

The key innovation here is the speed that this extreme selectivity enables.


Ruby also has crystalball (https://toptal.github.io/crystalball/) which is similar, but not quite the same.


Main difference is that crystalball still runs all the tests, but runs the recently failed ones first.

Nabaz selects only the small subset of tests that are affected, reducing CPU usage and, more importantly, drastically improving speed!


I like the concept, and it seems that “run them all and see what changed” is a great model to avoid missing any side effects from a code change. However, if you’re running all the tests on every change, doesn’t that mean the responsiveness is necessarily worse than 250ms whenever your test suite takes longer than 250ms?

Regardless, running tests in the background to eliminate human delay is good. It would be nice if (estimated test time) * (likelihood the change affects the test) were used to order tests for the fastest relevant feedback.

A code->test change-likelihood map could also be used to reveal previously unexpected dependencies. Estimating test time could be difficult, though (see: the halting problem and Windows time-remaining estimates).
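
One way to read that heuristic, as a toy Python sketch (the durations and likelihoods below are invented, and this is not how nabaz actually schedules anything):

  # Prefer tests that are likely to be affected and quick to run:
  # a lower (estimated duration / change likelihood) ratio means faster relevant feedback.
  tests = [
      # (name, estimated duration in seconds, estimated chance the change affects it)
      ("test_parser",   2.0, 0.9),
      ("test_end2end", 30.0, 0.9),
      ("test_utils",    0.5, 0.1),
  ]

  ordered = sorted(tests, key=lambda t: t[1] / max(t[2], 1e-9))
  print([name for name, _, _ in ordered])
  # ['test_parser', 'test_utils', 'test_end2end']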

This is good work.


This tool doesn't run all the tests, only the ones affected by the code changes since the last run. It figures out which tests to run by initially running all tests, and storing the code coverage of each test.

It doesn't need to rely on "test change likelihood" - if a code change is outside of the code coverage of the test, it doesn't affect the test.


> If a code change is outside of the code coverage of the test, it doesn't affect the test.

This assertion is questionable; the extent to which it is true is language specific.

Here is a counterexample in Java:

  // foo.java 
  public class Foo {
    public static final int LENGTH = 10;
  }

  // bar.java 
  public class Bar {
    public int getFooLength() {
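      // javac inlines the compile-time constant 10 here; Foo is not referenced at runtime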
      return Foo.LENGTH;
    }
  }
Code coverage tools will not highlight Foo as being touched when Bar.getFooLength is invoked, even though a test asserting that getFooLength() is 10 will fail or succeed depending on Foo.LENGTH. The reason is that Foo.LENGTH is inlined into Bar. If, for example, you wanted to patch in a new version of the value, you would need to supply a new Bar.class to effect the change; patching Foo alone would do nothing.

C and C++ likewise do not satisfy this requirement because of macros.


You'd want to augment the code analysis with the dependency graph of object/class files (or use it instead of what the OP is using). I'm not sure if it's bullet-proof, but except for runtime dynamic linking, you should get a superset of all affected changes if you just use... whatever your IDE uses to determine which files need to be rebuilt when you make a change. Following that graph should give you a superset of all unit tests that need to be run.

Like, e.g. if I change a macro in a header file and press "rebuild" on the project, MSVC (or MSBuild driven by CMake) will figure out that the header was changed, chase down which translation units include it directly or transitively, and rebuild those, then link the output and... chase down everyone else who links to the output and relink them, etc.

I bet you could produce a counterexample that breaks this mechanism (C++ being what it is), but I don't expect to see it in an actual codebase.
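
As a sketch of that approach (toy file names, not tied to any particular build system), following the reverse-dependency graph transitively from the changed file yields a conservative superset of tests to rerun:

  # Toy reverse-dependency graph: file -> files that directly depend on it.
  reverse_deps = {
      "config.h": ["parser.c", "net.c"],
      "parser.c": ["test_parser.c"],
      "net.c":    ["test_net.c"],
  }

  def affected(changed_file):
      # Walk reverse dependencies transitively from the changed file.
      seen, stack = set(), [changed_file]
      while stack:
          for dep in reverse_deps.get(stack.pop(), []):
              if dep not in seen:
                  seen.add(dep)
                  stack.append(dep)
      return seen

  # Changing the header flags everything that transitively includes it:
  print(sorted(affected("config.h")))
  # ['net.c', 'parser.c', 'test_net.c', 'test_parser.c']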


Not bulletproof, but a-OK if you're running a full run just before the commit.


So... like Gradle?


The ordering feature is actually being built as we speak. Analyzing likelihood can still be beneficial to cut down time even more.

If I could get a break from HN comments I could finish it today lol ;)


This is a great idea! Even better: write a Playwright/Codegen[1] test and have that execute continuously (or put a button in the IDE to trigger it). That way you can iterate on multiple code files but see how it impacts one end-to-end test in real time! (I'm developing a feature right now where I'm doing that manually and it would be nice to automate)

[1] https://playwright.dev/docs/codegen-intro


We have built something along these lines: Crusher [1] (demo [2]) is an e2e framework where you can trigger a test with the click of a button. It's built on top of Playwright.

We’re thinking of adding a watch mode in the coming months - running tests continuously in real time. Sending you an email to exchange ideas.

P.S. - there might be a few typos. It’s a work in progress.

[1] https://www.youtube.com/watch?v=Nc-TlgeKBSE [2] https://github.com/crusherdev/crusher


collab w/ me ;)


I do something similar using entr [0][1]. I use it instead of `watch` because I would rather not re-run things if no relevant files have changed.

For example, I run:

  fd -e j2 -e py | entr -c python3 build.py
To rebuild a static site every time the build script[2] (py) or a Jinja2 template (j2) changes.

[0]: https://github.com/eradman/entr

[1]: https://jvns.ca/blog/2020/06/28/entr/

[2]: https://gist.github.com/polyrand/3bed83897658806bd490e1d44df...


I'm a bit confused how this differs from the existing watch modes that are built into test runners I'm familiar with, such as Jest. Is this something that I'm used to in TypeScript, but which just doesn't exist in most language's ecosystems?


Well, I would say the biggest difference is that the watch mode you described runs all the tests for every change.

hypertest only runs the tests that were impacted by the changed code, resulting in blazing fast performance.

I’d say it’s even faster than Apple’s alt+tab animation most of the time.


> Well, I would say the biggest difference is that the watch mode you described runs all the tests for every change.

This is not true though. Might depend on the test tooling used, but most of the ones I've used only run changed tests (either directly or through change detection via git, e.g. https://nx.dev/concepts/affected)


Let's say you have 100 tests that depend on the file being changed, but only 10 that depend on the method or line changed since the last run.

How many tests will your system run? That's the important question.


10.


Nx is clearly different. It isn’t run on every code change, but on commits.

You can’t use it live, and the analysis granularity is way too coarse compared to this.


> hypertest only runs the tests that were impacted by the changed code

With all due respect, this is also a solved problem with test watchers. Unless you mean this is smart enough to not just run the unit tests that are directly related to the changed file, but also to understand code changes from the perspective of downstream dependencies?


The second part is the aim here.

This is more of a code watcher than a test watcher specifically.


sbt ~test uses the dependency analysis of the Scala compiler to determine which tests to run.


Are you sure that's actually true, all the way down to the method or line level?

https://www.scala-sbt.org/1.x/docs/Testing.html


very cool!


I'm confused too. You can add regex patterns for both filenames and test names in Jest to run only the tests you want in watch mode. This tool just makes it so you don't have to input that configuration? Doesn't seem like a huge value add.


For folks using bazel, there is also [bazel-watcher][1]. Also worth checking out [watchman][2] for this style of workflow in general.

[1]: https://github.com/bazelbuild/bazel-watcher [2]: https://facebook.github.io/watchman/


Bazel is awesome, it just runs way too many tests to be fast enough


It sounds like your bazel packages might be too large then.

Bazel (and bazel watcher) really shines when you have lots of small packages.


--test_filter?


That's sending a human to do a robot's job.


I believe this is similar to how Jest works in watch mode, where it only runs the tests for code that changed.


> hypertest is an open-source CLI tool designed to help you maintain focus on the current coding subtask, built specifically for the easily distracted.

I don’t quite get this proposition. If you’re easily distracted and want to stay focused, wouldn’t you rather run tests only on demand, and not be interrupted by failing tests after changing just a line or two while still not finished with the rest of your changes?

I can see the benefits of a fast feedback loop. But I’m not sure the pitch of keeping you focused is how I would see it working for me. YMMV.


If you are already running tests after every save and watching for the outcome, but the tests take too long, you might get distracted while waiting for them.


Sure. I would suggest a different workflow rather than trying to optimise the current workflow.


You aren't interrupted, you choose when to alt+tab to see your test results.


“Test impact analysis” is one name for this general approach. There are a few projects doing this in the Python ecosystem that don’t fully work (Smother, pytest-tia, and some others). Great to see another entry.

Can you cache the coverage DB so this can be run in a CI pipeline?


Foresight does this if you're using GitHub Actions for CI. https://www.runforesight.com/test-gap-analysis (disclaimer: one of the founders)


Sure, read docs.nabaz.io (CI solution w/ storage over MongoDB).


For Java devs, Quarkus has done this for a while: https://quarkus.io/guides/continuous-testing


1. Does this mean I have to disable IDE autosave so it doesn't attempt to run tests when the project is in an intermediate, invalid state?

2. How does it detect what tests need to be re-run when a given file changes?


1. No, if the build fails hypertest will just discard the result. Autosave is even preferred.

2. The first run and every subsequent one collect code coverage, which is saved in an SQLite DB. Every file change is compared to the code coverage.

Although this approach isn't perfect, the idea here is being fast. I always rerun the entire suite just before committing the code.
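
As a toy illustration of that lookup (a made-up schema and made-up data; not nabaz's actual SQLite layout):

  import sqlite3

  # One row per (test, file, line) covered, collected during a full run.
  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE coverage (test TEXT, file TEXT, line INTEGER)")
  db.executemany("INSERT INTO coverage VALUES (?, ?, ?)", [
      ("test_login", "auth.py", 10),
      ("test_login", "db.py", 3),
      ("test_checkout", "cart.py", 5),
  ])

  # On save: which tests cover a line that just changed?
  rows = db.execute(
      "SELECT DISTINCT test FROM coverage WHERE file = ? AND line = ?",
      ("auth.py", 10),
  ).fetchall()
  print([t for (t,) in rows])   # ['test_login']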


That's very clever, thanks for this!


I don't think #1 matters, because it runs again once the code is back in a valid state. Your test will fail while in an invalid state, but that's to be expected.

As the author explained in another comment, it initially generates the code coverage for all tests. When code is changed, only the tests that cover that code are rerun.


Furthermore, it will still display the build error + the previous failure list.

So you can always rely on that background terminal for actual test failure reasons.


Just for comparison "sbt ~test" reruns all affected tests whenever some source changes. Affected tests are determined by the dependency analysis of the scala/java compiler.


This reminds me of Wallaby (except not for Javascript).


I have always used:

  watch -n 1 run-test-tool
But I guess this depends on your test tool of choice not being too slow to boot.


Not only the time to boot: if you don’t run tests selectively, you’re already too slow.


Most test runners probably only support running tests that changed. It's a good idea to collect this in one tool. Good job.


Appreciated! Would love to hear your feedback after you run it too :)


This is great and really close to what I use myself:

  while [ 1 ]; do inotifywait *.{c,h} t/*.{c,h}; make && make check; done


The problem is that “make && make check” normally takes way too much time.


Nice! Got an upvote

Liked the name (נב״ז), good luck


Toda ;)


[deleted]


Slow is smooth, smooth is fast.

Though I understand the intent, I think it risks encouraging a run-hack-run-hack programming style.

I'm currently reading Code Complete, which advises against this practice. Instead, one should strive to write code that works from the start instead of hoping it will work eventually.


Love that mantra! Don’t really agree, though; it’s not intended for that.

I use it the same way I would normally run tests through a terminal. I built this because I hate running tests.


Got it.

Very much like the idea of checking code coverage to find out which tests you should run. First time I've seen it. Is it common in other tools?


I also have a CI solution I’m launching soon.

Other than that, most of this tech was forgotten in the late '90s and early 2000s.



