Launch HN: Meticulous (YC S21) – Catch JavaScript errors before they hit prod
122 points by Gabriel_h on May 2, 2022 | 40 comments
Hey HN, I'm Gabriel, founder of Meticulous (https://www.meticulous.ai). We're building an API for replay testing. That is, we enable developers to record sessions in their web apps, then replay those sessions against new frontend code, in order to catch regressions before the code is released.

I was inspired to start Meticulous from my time at Dropbox, where we had regular 'bug bashes' for our UX. Five or six engineers would go to a meeting room and click through different flows to try to break what we built. These were effective but time consuming—they required us to click through the same set of actions each time prior to a release.

This prompted me to start thinking about replaying sessions to automatically catch regressions. You can't replay against production since you might mutate production data or cause side effects. You could replay against staging, but a lot of companies don't have a staging environment that is representative of production. In addition, you need a mechanism to reset state after each replayed session (imagine replaying a user signing up to your web application).

We designed Meticulous with a focus on regressions, which I think are a particularly painful class of bug. They tend to occur in flows which users are actively using, and the number of regressions generally scales with the size and complexity of a codebase, which tends to always increase.

You can use Meticulous on any website, not just your own. For example, you can start recording a session, then go sign up to (say) amazon.com, then create a simple test which consists of replaying against amazon.com twice and comparing the resulting screenshots. You can also watch recordings and replays on the Meticulous dashboard. Of course, normally you would replay against the base commit and head commit of a PR, as opposed to the production site twice.

Our API is currently quite low-level. The Meticulous CLI allows you to do three things (a rough end-to-end sketch follows the list):

1) You can use 'yarn meticulous record' to open a browser which you can then use to record a session on a URL of your choice, like localhost. You can also inject our JS snippet onto staging, local, dev and QA environments if you want to capture a larger pool of sessions. This is intended for testing your own stuff! If you inject our snippet, please ask for the consent of your colleagues before recording their workflows. I would advise against production deployments, because our redaction is currently very basic.

2) You can use 'yarn meticulous replay' to replay a session against a URL of your choice. During replay, we spin up a browser and simulate click events with Puppeteer. A list of exceptions and network logs are written to disk. A screenshot is taken at the end of the replay and written to disk.

3) You can use 'yarn meticulous screenshot-diff' to diff two screenshots.
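
To make that concrete, a hypothetical end-to-end flow for a PR might look like the sketch below. The flag names are illustrative assumptions rather than documented options; see https://docs.meticulous.ai for the real ones.

  # 1) Record a session against a local dev build
  yarn meticulous record --url http://localhost:3000

  # 2) Replay that session against the base build and the head build of a PR;
  #    each replay writes exceptions, network logs and a screenshot to disk
  yarn meticulous replay --sessionId <session-id> --url http://localhost:3000
  yarn meticulous replay --sessionId <session-id> --url http://localhost:3001

  # 3) Diff the two end-of-replay screenshots
  yarn meticulous screenshot-diff --baseScreenshot <path> --headScreenshot <path>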

There are lots of potential use cases here. You could build a system on top of the screenshot diffing to detect major regressions with a UX flow. You could also try to diff exceptions encountered during replay to detect new uncaught JS exceptions. We plan to build a higher-level product which will provide some testing out of the box.
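
As a rough sketch of the exception-diffing idea (the file names and JSON shape here are assumptions for illustration, not the CLI's actual output format):

  // Fail CI if the head replay produced exceptions the base replay didn't.
  const fs = require('fs');

  const loadMessages = (path) =>
    new Set(JSON.parse(fs.readFileSync(path, 'utf8')).map((e) => e.message));

  const baseErrors = loadMessages('replays/base/exceptions.json'); // hypothetical paths
  const headErrors = loadMessages('replays/head/exceptions.json');

  const newErrors = [...headErrors].filter((msg) => !baseErrors.has(msg));
  if (newErrors.length > 0) {
    console.error('New uncaught exceptions introduced by this change:', newErrors);
    process.exit(1);
  }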

Meticulous captures network traffic at record-time and mocks out network calls at replay-time. This isolates the frontend and avoids causing any side effects. However, this approach does have a few problems. The first is that you can't test backend changes or integration changes, only frontend changes. (We are going to make network-stubbing optional, though, so that you can replay against a staging environment if you wish.) The second problem with our approach is that if your API significantly changes, you will need to record a new set of sessions to test against. A third problem is that we don't yet support web applications which rely heavily upon server-side rendering. However, we felt these trade-offs were worth it to make Meticulous agnostic of the backend environment.
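
To give a feel for the general idea, here is a conceptual sketch using Puppeteer's request interception. It is not our actual implementation, and recordedResponses is a hypothetical stand-in for the traffic captured at record-time.

  const puppeteer = require('puppeteer');

  // recordedResponses: hypothetical map of "METHOD url" -> { status, contentType, body }
  async function replayWithStubbedNetwork(url, recordedResponses) {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setRequestInterception(true);

    page.on('request', (request) => {
      // Let the page itself load from the target URL...
      if (request.isNavigationRequest()) {
        return request.continue();
      }
      const recorded = recordedResponses[`${request.method()} ${request.url()}`];
      if (recorded) {
        // ...but serve other calls from the responses captured at record-time.
        request.respond(recorded);
      } else {
        // No recording for this call: abort it so the replay stays side-effect free.
        request.abort();
      }
    });

    await page.goto(url);
    // ...simulate the recorded click events, collect exceptions, take a screenshot...
    await browser.close();
  }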

Meticulous is not going to replace all your testing, of course. I would recommend using it in conjunction with existing testing tools and practices, and viewing it as an additional layer of defense.

We have a free plan where you can replay 20 sessions per month. I've temporarily changed our limit to 250 for the HN launch. Our basic plan is $100/month. The CLI itself is open-source under ISC. We're actively discussing open sourcing the record+replay code.

I'd love for you to play around with Meticulous! You can try it out at https://docs.meticulous.ai. It's rough around the edges, but we wanted to get this out to HN as early as possible. Please let us know what you might want us to build on top of this (visual diffs? perf regressions? dead code analysis? preventing regressions?). We would also love to hear from people who have built any sort of replay testing out at their company. Thank you for reading and I look forward to the comments!




Looks like an impressive tool that makes a previously hard but useful process an order of magnitude more approachable.

With waldo.io and Checkly, it joins the list of QA force multipliers that would make my life, as the sole developer in a bootstrapped startup trying to punch above its weight, much easier. First they give me a taste with a free plan, then they hit me with production pricing we still can't justify.

If I have this right, 20 sessions is just a trial and 1000 sessions is very careful use.

If these scenarios are so easy to create, I would imagine you would make something like 50 (?) and run them against every deploy (10 a day?). That's 15,000 replays a month, right?

So the $100 plan is something like 10 scenarios at 3 deploys a day. That sounds too scaled back to get good use out of it. Or do I have the wrong idea about the intended use case?

I get it though: they all have reasonable pricing for a unique service that provides real, obvious value AND may actually be expensive to run. I'm just a little sad about not getting to use them.


Thanks so much for this feedback. You have exactly the right idea for the use case, so we might need to scale up our replay thresholds here.


For pricing that is both "small fish" friendly and not too cheap, I like what Vercel is doing, where the smallest paid plan is basically a discount package on the pay-as-you-go pricing beyond that, especially if you use just one seat.


That's really interesting. I'll take a deeper look at their pricing - thank you!


Pretty slick! I wish we had this a long time ago. At the time, our testing infrastructure was a bunch of very flaky Selenium tests that we would run through SauceLabs. The tests were super slow, mainly because we tried to reduce flakiness by buffering clicks/interactions with sleep() commands. All around, it was a painful experience that developers hated, which meant engineers did everything they could to avoid adding or modifying tests. It was the worst kind of vicious cycle.

Biggest concern I would have is portability. One benefit of testing suites, when done right, is they gain more coverage over time, especially against regression bugs. I would be very concerned about building up a large suite of tests for my most critical flows on proprietary tech that could be rendered worthless in an instant if the company goes bust, decides to pivot, etc.


> we tried to reduce flakiness by buffering clicks/interactions with sleep()...

You should have used waits instead of sleep. A wait polls until your DOM is in the state you are waiting for, unless the given timeout occurs first: https://www.selenium.dev/documentation/webdriver/waits/#expl...
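
For example, with the JavaScript bindings (the URL and selector are placeholders):

  const { Builder, By, until } = require('selenium-webdriver');

  (async () => {
    const driver = await new Builder().forBrowser('chrome').build();
    try {
      await driver.get('https://example.com/checkout'); // placeholder URL
      // Explicit wait: poll for up to 10s until the element exists and is
      // visible, instead of sleeping for a fixed amount of time.
      const submit = await driver.wait(
        until.elementLocated(By.css('[data-testid="submit"]')), // placeholder selector
        10000
      );
      await driver.wait(until.elementIsVisible(submit), 10000);
      await submit.click();
    } finally {
      await driver.quit();
    }
  })();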


It's been a while. I think we did use waits but they were still flakier than you might expect.


It works acceptably for me, with a caveat: I do have Retry(3) on UI tests, because yeah, they're somewhat flaky.

Especially when the system has to warm up.

In the end it is okay.


Ouch, that does sound like a painful cycle. It happens, and no one tests as much as they'd like to.

> Portability of the data.

I hear your concern here. The replay data and session data are also saved to disk, so you can store them somewhere yourself. Of course, that still leaves the risk around the record & replay tech itself. I think open sourcing it would solve the portability issue, and it's something we're actively talking about but haven't reached a conclusion on yet. Anecdotes and examples like this are incredibly helpful in making that decision, though.

Thank you for the feedback!


I worked at a couple of dev tool startups that required you to invest some effort to adopt and would require effort to move away from (e.g. if the company went under) - open sourcing ended up being basically a necessity to sell the product (enterprise sales during seed and A). YMMV, but I would definitely encourage open sourcing enough to make potential adopters feel comfortable that they won't have to do a big, urgent migration and/or lose a bunch of engineering investment if you go under or decide to pivot.


I used to work for Testim.io, which does this. I'm now a user at Microsoft (which was already using it before I joined).

Overall the idea works very well and Testim had/has a lot of customers using record/playback.


Good luck!

We tried and failed to create a “bug capture” offering in Testim.io - what helped us work with companies like Microsoft/Salesforce, and eventually make an exit and sell to a much larger player (Tricentis), was focusing on rock-solid AI for improving tests. The founder still believes the capture idea (QA capturing bugs for devs) has a lot of merit, but I think there are fundamental issues with anything that doesn't reproduce timing perfectly (some do, like Firefox's replay).

I’m not with Testim anymore, but I'm still very excited to see people tackling this problem, and I warmly recommend pinging Oren@testim.io (the founder, an engineer, a GDE and a nice guy) for pointers - he likes giving free advice and investing in new players in the space to cultivate the ecosystem (most companies currently have no e2e tests).


We are taking every scrap of luck we can get our hands on - thank you!

> Testim/QA Capture

Self-improving tests are a really interesting area. Timing is definitely an intricate issue. You probably have to layer a bunch of different and novel techniques on top of each other to get something with a good signal-to-noise ratio. We're still working on developing those :)

Oooh, thank you for the rec. I'll make sure to ping Oren after the launch. The space is enormous and my understanding is that the rate of growth for testing tooling will exceed the rate of growth for software, which leaves QA and testing companies in a good position.


> most companies currently have no e2e tests

I would be curious what percentage of corporate repos have any tests.


Around 15% had automated tests that run on deployments with coverage of the app. Of those, 95% were Selenium, and most of the rest used tools that wrap it. Tools like Playwright or Cypress, while interesting, are a very small percentage of automation currently. To be fair, Playwright is pretty new.


Heh, I've been waiting for this Show HN/Launch HN, as I applied to the company a few months back via workatastartup, and it seemed to me like this would be an awesome product.

Looks very promising, wish the team the best!

Catch those errors before hitting prod, sounds like the dream

PS: As for open sourcing the record+replay code, I'm sure that'd be awesome. For now I only have https://github.com/openreplay/openreplay on my radar as a FOSS alternative to FullStory/LogRocket.


Thank you for the wishes here! That is very kind of you.

> Open sourcing

OpenReplay is awesome, but we ended up building heavily on top of rrweb (https://github.com/rrweb-io/rrweb). Did you know they have their own documentary on the project? I only noticed that today.


Both rrweb and OpenReplay are very solid projects. I've spent the last year or so building a session recording tool, and I've scrutinised their code quite heavily. Conceptually, session recording is quite simple, but there are so many edge cases, security models (CSP, feature policies) and performance issues to overcome.

Performance is probably the area I've spent the most time thinking about: if you want to measure performance regressions in a page, instrumenting it with a session recorder is definitely a way to skew the results (for example, checking scroll position of elements during snapshotting will trigger a reflow).
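
A tiny illustration of that last point (the code is generic, not from any particular recorder):

  // Reading layout properties right after a DOM write forces a synchronous
  // layout pass ("forced reflow"), which is what a naive recorder does while
  // walking the DOM during a snapshot.
  const nodes = Array.from(document.querySelectorAll('*'));

  nodes[0].setAttribute('data-recorded', 'true'); // write: invalidates layout
  const offset = nodes[0].scrollTop;              // read: forces a reflow right here

  // A recorder can reduce the skew by batching all reads into one pass
  // (e.g. inside a single requestAnimationFrame) before doing any writes.
  requestAnimationFrame(() => {
    const scrollPositions = nodes.map((n) => ({ top: n.scrollTop, left: n.scrollLeft }));
    console.log('snapshot of scroll positions', scrollPositions);
  });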


Wow, that's impressive. Love the network request/response capturing for the replays.


Hi Gabriel, congratulations on the launch!

I think software development is due for a disruption and your take on testing is spot on.

As part of a dev tool belt we are developing, we are building a tool that translates user interactions into a Selenium script and runs that script on our server. Users get to take away the script so they don't get vendor lock-in. What is your approach to replaying a user session?

On a broader note, I think what you do has potential beyond QA, e.g. if run on production, CS could hop onto the same session as a troubled user.

Just checked your profile. It seems we are both based in London. Maybe we can grab a beer to discuss the potential.


How does it differ from the ages-old Selenium and its browser plug-in for recording tests? AFAIK the worst bit is the indeterminism introduced by network waits, etc., which makes e2e complicated and often not worth its upkeep.


Congrats on the launch!

I've been looking forward to this launch for a while; I've spent a lot of time experimenting with session recording and how it can work for regression testing, reproducing bugs, measuring performance, sharing feedback during development - there is so much potential here.

> We're actively discussing open sourcing the record+replay code.

I think open source is a good call - supporting an option to self-host would make a lot of sense, since session recording will inevitably slurp up PII or sensitive data which could put off some users.


When talking about "capturing network traffic", does that include SSE and WebSockets? If used for regression testing, how do you go about updating existing recordings?


It does not include SSE or WebSockets.

With regards to updating existing recordings, unfortunately we don't currently have good tooling & support for this, so you may need to record new sets of sessions as your application changes. I would suggest starting off by testing a few core flows.


Thanks for clarifying. The value proposition of automatic request/response recording brings something genuinely new to the area, but without WS support, not all projects can benefit fully.

This reminds me of the test suite for one of my projects where E2E tests only covered relatively simple scenarios because of subpar WS mocking support, leaving more important, complex interactions to manual (can't run in CI) or fully integrated (expensive to author and run often) testing. The situation changed only after we wrote a custom WS mocking layer over the HTTP mocking the framework provides, yielding a dramatic increase in coverage. Out of dozens of developers I interviewed, only a few solved this issue to some degree. Clearly, mainstream testing frameworks provide insufficient support for the use case.
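
For anyone curious, a heavily simplified sketch of the idea (the recordedFrames structure is made up for illustration, not our actual implementation) looks something like this:

  // Replace the page's WebSocket with a stub that replays captured frames
  // instead of opening a real connection. recordedFrames is assumed to be a
  // map of URL -> array of message payloads captured in an earlier session.
  function installWebSocketMock(recordedFrames) {
    class MockWebSocket {
      constructor(url) {
        this.url = url;
        this.readyState = MockWebSocket.OPEN;
        setTimeout(() => this.onopen && this.onopen({ type: 'open' }), 0);
        (recordedFrames[url] || []).forEach((frame, i) => {
          setTimeout(() => this.onmessage && this.onmessage({ data: frame }), (i + 1) * 10);
        });
      }
      send(data) { /* optionally assert outgoing frames match the recording */ }
      close() { this.onclose && this.onclose({ code: 1000 }); }
    }
    MockWebSocket.CONNECTING = 0;
    MockWebSocket.OPEN = 1;
    MockWebSocket.CLOSING = 2;
    MockWebSocket.CLOSED = 3;
    window.WebSocket = MockWebSocket;
  }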


I agree that WS replay could be very powerful, but I'm not sure it's straightforward. Once you get out of the realm of request/response and you are dealing with subscriptions, connection multiplexing, or frankly any other sort of pushed data that is triggered by whatever is on the other end, knowing how and when to play that back against new sessions is very hard. It seems very application specific on face value.

But I agree that it would enable replay for a lot of very interesting, complex projects. I've worked on some FX trading UIs that have been challenging to test without standing up a lot of backend services.


Congrats on the launch.

How do you deal with generating resilient selectors automatically? That's a class of problem which plagues this type of tool in my understanding.


Would love to know this too. Would it fare well against a very interactive, desktop-like web app that may not be built everywhere to be testable? We use Selenium, but it is always a trade-off due to the natural flakiness of web testing.

My other concern is that I would want it to be open source and self-hostable. That runs against being commercial, but serious companies will pay a licence for support. Failing that, a solid promise that the company open-sources the code if it shuts down, and allows closed-source self-hosting in the meantime.


What do you think the pros and cons are compared to playwright.dev? The top-level features of recording, replaying, and diff-ing seem very close in my understanding.


One key difference is that with Playwright you have to replay against some environment.

Meticulous captures network traffic at record-time and stubs in responses at replay-time, which removes the need for a backend environment to replay against. We also apply a little fuzziness in our replay, like how we pick and choose selectors (e.g. imagine CSS selectors with hashes in them; the 'same' selector will look very different between two builds).
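
To illustrate the kind of fuzziness I mean, here is a toy sketch, not our actual matching logic; the hash pattern is an assumption about CSS-modules-style class names.

  // Strip build-specific hash suffixes so "Button_submit__3xF9a" recorded on
  // one build can still match "Button_submit__9kQ2z" on the next build.
  function normalizeClass(cls) {
    return cls.replace(/__[A-Za-z0-9]{5,}$/, ''); // assumed hash pattern
  }

  function classesRoughlyMatch(recordedClasses, candidateClasses) {
    const candidate = new Set(candidateClasses.map(normalizeClass));
    return recordedClasses.map(normalizeClass).every((c) => candidate.has(c));
  }

  // classesRoughlyMatch(['Button_submit__3xF9a'], ['Button_submit__9kQ2z']) === true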

We have a long way to go in making this robust though.

Is there anything you wish was easier when writing tests with Playwright?


> Meticulous captures network traffic at record-time and stubs in responses at replay-time, which removes the need for a backend environment to replay against.

That's neat.

We've only just started using Playwright, but we're surprised at how easy it is to get going; but we're also all developers. We primarily use it to test large feature flows that are hard to mock in unit tests. For example, one test logs you in, uploads a file, waits for the result, clicks on the download button and makes sure the downloaded file is what we expected. We mainly want to ensure that we don't accidentally delete or hide the login widget or the download button while working on something tangentially related.
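
Roughly, the shape of that test (with placeholder URLs, selectors and file names rather than our real ones) is:

  const { test, expect } = require('@playwright/test');

  test('upload and download round-trip', async ({ page }) => {
    await page.goto('https://app.example.com/login');
    await page.fill('#email', 'test-user@example.com');
    await page.fill('#password', 'not-a-real-password');
    await page.click('button[type="submit"]');

    // Upload a fixture and wait for the processed result to appear.
    await page.setInputFiles('input[type="file"]', 'fixtures/report.csv');
    await expect(page.locator('text=Processing complete')).toBeVisible();

    // Click download and check that we actually got a file back.
    const [download] = await Promise.all([
      page.waitForEvent('download'),
      page.click('text=Download result'),
    ]);
    expect(download.suggestedFilename()).toContain('report');
  });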

In the example outlined above, we don't mind spinning up the backend locally, as this allows the test also to make sure the response is correct.

However, I see how being able to only test front-end code quickly and easily without the need for a backend can be helpful in many applications. Congrats on the launch, and good luck!


Evaluating technologies by analogy, how far off base am I?

jQuery, Angular, React

are analogous to:

Selenium, Puppeteer, Playwright

Is Selenium still worth considering for a brand new project, though primarily due to ecosystem instead of implementation (Ruby, IDE, etc.)?

To frame the question in context of the analogy, though now completely off-topic:

What is the React equivalent of datatables.net?


Best of luck!

Gabriel and the team are really awesome and this product is a genius idea - I'll definitely be using it at my company as it could save us a ton of work in setting up our testing pipelines.

- Alejandro :)


Congrats on the launch and good luck!

Reading through this just reminded me of Datadog browser tests. It's not exactly the same, but it might be interesting to check them out.


Could you explain to me the benefit this tool offers over something like Playwright + docker-compose (which I think also does stuff like this)?


It really depends on your use-case here.

If you’re able to spin up an environment via docker-compose and run Playwright against it, then I think that’s a good fit for that use case.

However, if you’re testing a flow that relies on some initial state, it can sometimes be tricky to seed that state or do so in a way which is representative.


Would like to know how it differs or compares with Cypress. Is it the mocking of the network calls that makes it different?


Congrats on the launch! Really exciting to see this coming together :)


Congrats on the launch! meticulous.ai is great and the team is A+


Wow



