Polly.js – Record, replay, and stub HTTP interactions (netflix.github.io)
471 points by zwentz 8 months ago | 86 comments

Looks well done (it uses unicode art, so it must be amazing), but I have a fundamental distrust/dislike of record/replay frameworks...it just seems like you're papering over an inherently bad testing approach.

E.g. sure, when replays work, they're great, but:

a) you have to do a manual recording to create them the first time (which means manually setting up test data in production that is just-right for your test case)

b) you have to manually re-record when they fail (which again means you have to manually go back and restore the test data in production that is just-right for your test case...and odds are you weren't the one who originally recorded this test, so good luck guessing exactly what that was).

In both cases, you've accepted that you can't easily setup specific, isolated test data in your upstream system, and are really just doing "slightly faster manual testing".

So, IMO, you should focus on solving the core issue: the uncontrollable upstream system.

Or, if you can't, decouple all of your automated tests from it fully, and just accept that cross-system tests against a datasource you can't control is not a fun/good way to write more than a handful of tests (e.g. ~5 smoke tests are fine, but ~100s of record/replay tests for minute boundary cases sounds terrible).

> you have to do a manual recording to create them the first time (which means manually setting up test data in production that is just-right for your test case)

Your test invokes the recorder. There isn't anything manual outside of writing & running your test.

> you have to manually re-record when they fail

Again, nothing manual. It just requires running your test again with Polly in record mode if you want to "refresh" the recording with a newer set of responses.
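The record-vs-replay flow being described can be sketched in a few lines of toy JavaScript (all names here are hypothetical; Polly's real API is different):

```javascript
// Toy sketch of record/replay: in "record" mode we hit the real backend
// and save the response; in "replay" mode we serve the saved response
// without ever touching the network.
function makeRecorder(mode, realFetch) {
  const recordings = new Map(); // keyed by "METHOD url"

  return {
    recordings,
    async fetch(method, url) {
      const key = `${method} ${url}`;
      if (mode === 'replay') {
        if (!recordings.has(key)) throw new Error(`no recording for ${key}`);
        return recordings.get(key);
      }
      const response = await realFetch(method, url); // record mode: go upstream
      recordings.set(key, response);
      return response;
    },
  };
}

// Usage: the first run records; later runs replay without hitting upstream.
async function demo() {
  let upstreamCalls = 0;
  const upstream = async () => { upstreamCalls++; return { status: 200, body: 'hello' }; };

  const recorder = makeRecorder('record', upstream);
  await recorder.fetch('GET', '/api/user');

  // Simulate persisted recordings being loaded for a replay-mode run.
  const replayer = makeRecorder('replay', upstream);
  replayer.recordings.set('GET /api/user', recorder.recordings.get('GET /api/user'));
  const replayed = await replayer.fetch('GET', '/api/user');
  return { upstreamCalls, replayed };
}
```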

> In both cases, you've accepted that you can't easily setup specific, isolated test data in your upstream system, and are really just doing "slightly faster manual testing".

This is by no means a replacement for E2E testing. It is a form of acceptance/integration testing where you're testing your application against a point in time at which you verified all systems were talking correctly with your application. E2E tests are much slower, difficult to debug, and intended to capture those breakages in contracts.

It's a tool for your toolbox; reach for it when needed. We plan to release a tutorial/talk that should clear up any misconceptions. There are also other applications for Polly, such as building features offline or giving a demo using faker to easily hide any confidential data.

> It's a tool for your toolbox, reach for it when needed

Sure, apologies for being negative about a tool you've worked on and are rightly proud of. I'm sure you already have more users than any open source project I've ever written. :-)

I struggle a bit at this point in my career, as I've made enough mistakes and seen enough mistakes, that I generally have strong gut opinions on "yeah, that's probably not going to work/scale/etc."

So, when observing new developers/teams starting to "make a mistake" that I've seen before, my gut says "no! bad idea!"...but I know I could be wrong, so it's tempting to say "well, sure, that didn't work for us, but go ahead and try again".

Because, who knows, maybe eventually someone will figure out an innovation that makes a previously-bad approach now tenable, and even best-practice.

But, realistically, that rarely happens, and so teams, orgs, the industry as a whole stumbles around re-making the same mistakes, and codebases/teams/etc. pay the cost.

I've thought a lot about micro-service testing at scale:


Basically there are no easy answers, short of some sort of huge, magical, up-front investment in testing infra that only someone like a top-5/top-10 tech company has the eng resources to do.

So, definitely appreciate needing to do "something else" in the mean time. ...record/replay is just not a "something else" I would go with. :-)

> Again, nothing manual

Yes, sorry for being inexact/overusing the term--I understand the tests drive the recording.

What I meant by manual is getting the e2e system into your test's initial state.

E.g. tests are invariably "world looks like X", "system under test does Y", "world looks like Z".

In record/replay, "world looks like X" is not coded, isolated, documented in your test, and is instead implicit in "whatever the upstream system looked like when I hit record".

Which is almost always "the developer manually clicked around a test account to make it look like X".

This is basically a giant global variable that will change, and come back to haunt you when recordings fail, b/c you have to a) re-divine what "world looks like X" was for this test, and then b) manually restore the upstream system to that state.

If no one has touched the upstream test data for this specific test case, you're good, but when you get into ~10s/100s of tests, it's tempting to share test accounts, someone accidentally changes it, or else you're testing mutations and your test explicitly changes it (so you need to undo the mutation to re-record), or you wrote the test 2 years ago and the upstream system aged off your data.

All of these lead to manually clicking around to re-setup "world looks like X", so yes, that is what I should have limited the "manual" term to.

But in the case we're talking about, where you're reliant on an external service that can change underneath you, "world looks like X" is genuinely not under your control. It feels like pretending that it is will lead to just as many failures as acknowledging its inherent volatility.

Agreed! And, to me, record/replay is still pretending like it's controllable, b/c even if you decouple for replays, records will always be a PITA.

My depressing solution is to just not even try to automate tests against the upstream system and instead invest in test builders/DSLs that make mocks/stubs on both sides as pleasant as possible.

And when bugs slip through, make sure to update your stubs/mocks on both sides to prevent the regression.

To me this gets the most agility and reliability, and will be a test suite that developers don't hate 1-2-5 years down the road.

Can you post the tutorial here? Thanks @jasonmit

I also don't like recording frameworks for TDD (and similarly dislike using fixtures). However, the place where a recording framework really pays for itself is in isolating changes in protocols. I've often had to interface with, as you put it, uncontrollable upstream systems. These are systems that are not mine -- they are upstream services from other companies that I have to interact with and I have no control over. Often these systems are badly built and they play fast and loose with the "protocols".

In these cases I like to have an adaptor layer and use a recording framework to "test" the adaptor. That way I can occasionally rerecord my scenarios and be notified if something important has changed. Normally what happens is that my service stops working for some unknown reason; I rerecord the adaptor scenarios, and usually the reason pops out very quickly. All the rest of my code is coded against the adaptor, and I stub it out in those tests (which I can do reasonably well because I control it).
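A rough sketch of that adaptor layer, with all names invented for illustration:

```javascript
// Only the adaptor knows the upstream wire format. Recording-based tests
// exercise the adaptor itself; everything else stubs the adaptor.
function makeFlightAdaptor(transport) {
  return {
    async search(from, to) {
      // The upstream plays fast and loose with its "protocol": odd field
      // names, prices as strings. Normalize everything right here.
      const raw = await transport(`/legacySearch?o=${from}&d=${to}`);
      return raw.RESULTS.map(r => ({ flight: r.FLT_NO, price: Number(r.PRC) }));
    },
  };
}

// The rest of the code is written against the adaptor's clean shape,
// so its tests can stub the adaptor trivially:
const stubFlightAdaptor = {
  async search() {
    return [{ flight: 'XY123', price: 99 }];
  },
};
```

If the upstream breaks, only the adaptor's recordings need re-recording; the stubbed tests keep passing because they never depend on the wire format.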

I've worked for a while with similar scenarios of needing to integrate with systems beyond my control (e.g. Stripe, Slack, Google), and though I still don't have a good setup for it, I've come to the conclusion that a two-pronged approach would be ideal: record and/or stub the calls and responses from the external services so your normal tests run entirely without external networking (like what Polly allows you to do), but also set up a server to periodically validate the responses from the external services against those that have been recorded, and alert you to any changes in their protocol/behavior. I've yet to see any middleware to tackle the latter (though granted, I haven't been looking too hard yet either).

> isolating changes in protocols

That makes sense.

Ideally protocols are declarative/documented/typed, e.g. Swagger/GRPC, so you can be more trusting and not need these, but often in REST+JSON that does not happen.

> All the rest of my code is coded against the adaptor

Nice, I like (the majority?) of tests being isolated via that abstraction.

Although, if "the protocol" is basically "all of the JSON API calls my webapp makes to the backend REST services", at that point do you end up with adaptor scenarios (record/replay recordings) for basically every use case anyway? Or do you limit it to only a few endpoints or only a few primary operations?

Yes, I like to have scenarios for all of the end points. It definitely doesn't reduce the number of tests :-) The advantage is in isolating the protocol from the operation of the application.

The main complaint I've seen for this approach is, "We shouldn't be testing the other person's system". There is wisdom in that advice, but it really depends on how much you depend on the 3rd party service and how much downtime you can tolerate. For example, I'm working in the travel industry right now and we often rely on small services that nobody has ever heard of. If we can't use the service then we can't sell anything and our site is essentially down. If it happens frequently (and with a lot of these travel services, they often break things weekly if not daily), then your site is not viable. In that case I'll exercise the protocol as much as I can. However, we also talk to marketing services, etc. If that breaks, and it takes a day or two to get it back up, then it's not a major problem -- our marketing effort might be a day late, which is unfortunate, but not game breaking. In that case I'll usually have a smoke test or two.

Also, recording raw HTTP requests makes tests difficult to organize, especially if what you want to mock are requests to a REST server. In that case, all the recorded HTTP headers aren't significant, editing the recorded resources in the responses when the API changes is a pain, and testing scenarios where several REST requests are related (e.g. fetching posts, then comments) is also a pain.

A better alternative IMO is to craft a list of resources in JSON, then use this data in a fake REST server that takes over fetch and XHR in the browser.

Something like:

    { posts: [{ id: 1, title: "foo" }, { id: 2, title: "bar" }], comments: [{ id: 1, post_id: 1, body: "lorem ipsum" }] }

Incidentally, that's the way [FakeRest](https://github.com/marmelab/FakeRest) has been working for years (disclaimer: I'm the author of this OSS package).
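The fake-REST-server idea can be sketched as a tiny in-memory resolver (a hypothetical helper, not FakeRest's actual API):

```javascript
// Resolve REST-style GET requests against an in-memory JSON dataset
// instead of replaying recorded HTTP traffic.
function makeFakeRest(data) {
  return function handleGet(url) {
    const match = url.match(/^\/(\w+)(?:\/(\d+))?$/) || [];
    const [, collection, id] = match;
    const records = data[collection];
    if (!records) return { status: 404, body: null };
    if (id === undefined) return { status: 200, body: records };  // list endpoint
    const record = records.find(r => r.id === Number(id));        // detail endpoint
    return record ? { status: 200, body: record } : { status: 404, body: null };
  };
}

// Seed it with the dataset above, then point your fetch/XHR stub at it:
const handleGet = makeFakeRest({
  posts: [{ id: 1, title: 'foo' }, { id: 2, title: 'bar' }],
  comments: [{ id: 1, post_id: 1, body: 'lorem ipsum' }],
});
```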

Insignificant things like HTTP headers and extra fields are insignificant until they're not. In my experience, manually assembling what you expect an HTTP response to look like often leads to bugs when an "irrelevant" detail suddenly becomes relevant, like when status codes change, or fields are added in a way that breaks a client, etc.

I think recording tools can be sharp tools and require care, but (as a starting point), if you have an automated library that can generate recorded fixtures in a repeatable, automated fashion, you can eliminate a lot of the pain points while still reaping all of the benefits. That's how we set it up where I am: responses and fixtures are generated as part of a full suite execution, but persist with individual test runs.

Not sure I understand what you are proposing as an alternative. It seems that you can either test against a 'real' system on the other end, which probably means one project pulling, building, and standing up possibly dozens of other services just to run its test suite. Or, you mock it in some way. I prefer the recording approach as it mocks at the lowest level possible, giving you the most test coverage possible.

I worked on a Ruby project with thousands of VCR recordings. Never again.

How is this distinct from other http stubbing libraries?

Polly records as well as exposes a stubbing API. So it's quite different from what I've seen of the others.

Ahh. Well, there's quite a history of recording as well as stubbing: see Ruby's VCR.

Came here to say the exact same thing: "Hey, look! It's VCR for JS. Yay!"

I would love to hear from people involved in projects like this: what kind of work, and how much, was done to get it ready for and approved to be open sourced by the company?

Especially at large corps like Netflix I'm sure there's a lot of hoops to jump through.

In my company (uber), it's actually not a whole lot of hoops. Basically a light legal review that checks the license, a code review to ensure there aren't references to closed source software and infrastructure, and approval from the team manager, who is usually already on board with the desire to open source.

I'm sure the first project took some time to set up, but Netflix has released dozens of projects since. So very low hoops.

Netflix seems to also have a really strong culture around this though, so I wouldn't be surprised if it's a lot less hoopy than you'd imagine.

Not to shamelessly plug but if you're in the Bay Area on June 28, we're giving a talk that's a bit about performance, a bit about Netflix engineering culture: https://jstalks2018.splashthat.com/.

If I’m a student can I sign up to attend?

Sure! (at least I don't see why you wouldn't be able to)

Exactly. They had engineers dedicated to creating an open source latency/fault tolerance library called Hystrix. Not too surprising they’re dedicating resources to other projects, too.

This is very cool, solving some issues that no doubt many people have when writing tests against a (fast-)moving target. I'll definitely give it a try in my next project.

I looked through the codebase, and noticed that this uses a custom data format to persist HTTP requests and responses in local storage. I'm not sure if it's technically possible in all circumstances, but I think it might be valuable to have requests and responses be stored as HAR 1.2 [1] when possible, so that the trace can be used by other tools [2] to aid in debugging, verifying and analyzing behaviour as well as perhaps automated creation of load/performance tests.

[1] - http://www.softwareishard.com/blog/har-12-spec/

[2] - e.g. https://toolbox.googleapps.com/apps/har_analyzer/

There is already a famous resilience/policy package for .NET with the same name


I used to use nock, which works very well in Node environments. But this works in the browser as well, so I guess this can be fairly helpful while writing tests post-development. If you are doing TDD, then recording/replaying doesn't fit anywhere in the development cycle.

I like the API of this library and the browser support that was missing in nock. So thanks Netflix! Although it would have been nice to see nock add this support, which is what I wonder: why not just contribute to existing libraries?

If you're looking for Nock but not just node, try Mockttp: https://github.com/pimterry/mockttp.

It lets you create & configure mock HTTP servers for JS testing, but with one API that works out of the box in Node and in browsers. This avoids the record/replay model too, so you can still take the TDD approach and define your API behaviour in the test itself.

(Disclaimer: I'm the author of Mockttp)

I love the name! "Polly" repeats everything... and wants a biscuit every now and then :).

And this polly would love some cookies too.

"cookie" sounds even better. I wasn't sure if it was "cracker" or "biscuit" :).

Related to that, is there anything that allows you to completely save the state of a modern website, with all of the fetch requests and websocket-related stuff it fired off?

I just want an ability to save and reopen exactly what I'm looking at. There are some cool websites which will eventually go down and I want to preserve an interactive snapshot of them.

This won't work for WebSockets; websites that use WebSockets generally require some interaction to generate the transmitted messages, which are often dependent on the server's response. Private websites, or websites that require a login, are hard, but it can be done. Would suggest HTTrack.

But it's not impossible to have some tool that records all of those interactions to reproduce later. A smart enough tool could record everything from when you open the site until you click save. It would not reproduce the functionality that is backend-dependent, but it sure can replicate the DOM, etc. Am I missing something?

Yeah - but there's a LOT of variables that come into play for something like that. It'd likely be easier to either record it with something like BugReplay.com or video.

This is part of a webtop I built called qKast (https://qkast.com). In fact, the Chrome extension https://chrome.google.com/webstore/detail/qkast/eliofljjghgd... lets you mix and match live components of webpages and make "living" snapshots. Further than that, though, they're not iframed, so you can use an assortment of widgets to modify the contents and look of the components, as well as broadcast the whole webtop live.

I haven't used it so I'm not sure that it does everything you want, but take a look at https://webrecorder.io/

Unrelated, but there are so many things called Polly that it gets confusing.

Yeah. I was going to bring up the library for .NET that provides policy-based retries.

I went to the slackbot that has a cute parrot logo. We at CodingBlocks love our Polly.

I thought of Amazon Polly.. converts text to lifelike speech.

So, the VCR gem for JavaScript. Great! Personally I stopped using the VCR gem a while back as it blocks edge cases. However, for larger projects where things can get unwieldy, this makes a lot of sense. Local test suites should never hit external APIs, so it's much better to have mocks/stubs than to have no tests at all.

However on smaller projects I've found that just clicking through to make sure things work and then letting my error reporting system catch bugs to be much more effective :)

It's a hard line to walk and I surely haven't perfected it. I'll give it a shot on a future project!

I personally replace all stubbed HTTP interactions between my services with contracts with good success. https://docs.pact.io/getting-started
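The contract idea can be sketched without any library (Pact's real workflow uses generated pact files and a broker; everything below is invented for illustration):

```javascript
// A contract pins down one interaction between consumer and provider.
const contract = {
  request: { method: 'GET', path: '/users/42' },
  response: { status: 200, body: { id: 42, name: 'Ada' } },
};

// Consumer side: a stub built straight from the contract, so consumer
// tests never hit the real provider.
function consumerStub(method, path) {
  const { request, response } = contract;
  return method === request.method && path === request.path
    ? response
    : { status: 404, body: null };
}

// Provider side: replay the contract's request against the real handler
// and check the response still matches what the consumer expects.
function verifyContract(handler) {
  const actual = handler(contract.request.method, contract.request.path);
  return actual.status === contract.response.status &&
    JSON.stringify(actual.body) === JSON.stringify(contract.response.body);
}
```

The payoff over plain stubs is that the same artifact drives both sides, so the stub can never silently drift from what the provider actually returns.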

Alternate URL (no Javascript required):


a little ironic, no?

I know this is slightly different, but I wish more people knew about Chrome / Safari / Firefox’s “network” console tab. Great for debugging. Can look at all requests, headers, responses, timespans, etc. Some will even let you copy a given network request as a cURL command, capturing all headers, body, query strings, etc.

And out of curiosity, what makes you think that people don't know about it? I've never met a web dev who didn't know about it in the past few years.

New people are introduced to web development every day. Assuming some things are just common knowledge is not very beginner friendly. https://xkcd.com/1053/

I'm curious what the application could be for load testing? Tools like locust and gatling are nice but are still synthetic. I'd love to capture X minutes of traffic, then dupe it Y times and replay it as a more accurate representation of traffic patterns for load testing. Is that a thing?

I haven't had a chance to properly try it yet, but https://goreplay.org/ does exactly what you are asking. Alternatively, in the container world, tools such as Istio (https://istio.io/) allow traffic shadowing: you can duplicate traffic and route it somewhere else.

I did something similar, but as an interim proxy (can record, replay, there are modifier hooks, can slow down requests). You have to point towards a backend api and on the frontend you use the proxy url instead of the original. But it's mostly for debugging, so the scope is much more limited.

Also check out https://github.com/code-mancers/interceptor, which also uses browser APIs to enable users to mock HTTP responses via a Chrome extension.

How does it hook into the browser APIs? I can't seem to find it. By what black magic would it know how to hook into my puppeteer instance? Or am I not understanding this?

Why another tshark/tcpdump? All this can be done with a simple script of a few lines. Today we need JavaScript recorders; traffic recorders are a kid's game, and using a certificate to touch HTTPS is a dangerous approach (but every project there is doing the same). tshark and SSLKEYLOGFILE is the only safe way... but I like this project, I don't know why! I feel something.

This seems like it will be especially useful with apollo-client for graphQL requests.

I've used mitmproxy + proxychains to do this. How is Polly different?


Why Polly?

Keeping fixtures and factories in parity with your APIs can be a time consuming process. Polly alleviates this by recording and maintaining actual server responses without foregoing flexibility.

* Record your test suite's HTTP interactions and replay them during future test runs for fast, deterministic, accurate tests.

* Use Polly's client-side server to modify or intercept requests and responses to simulate different application states (e.g. loading, error, etc.).
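The intercept idea can be sketched as a toy client-side router (hypothetical names, not Polly's actual server API): registered routes take priority, so a test can force a loading or error state on demand.

```javascript
// Tiny client-side "server": routes registered here intercept matching
// requests; everything else would fall through (e.g. to a recording).
function makeClientServer() {
  const routes = new Map();
  return {
    get(path, handler) { routes.set(`GET ${path}`, handler); },
    handle(method, path) {
      const handler = routes.get(`${method} ${path}`);
      return handler ? handler() : { status: 501, body: 'no intercept' };
    },
  };
}

// Force an error state for one test without touching the real backend:
const server = makeClientServer();
server.get('/api/posts', () => ({ status: 500, body: 'simulated outage' }));
```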

Not sure that tells me how it's different from replay or the other half dozen npm modules that do the same thing. It'd be nice for them to contrast their tool with existing ecosystem options considering some of them are pretty well established.

Can you share which libraries you know that achieve the same thing? I'm happy to go through and respond to the differences.

I only have experience with replay, but an npm search turns up:

replay, replayer, http-record, talkback, sepia, mitm-record, fetch-vcr, tape-nock, jest-playback, eight-track, axios-vcr, replayer, node-vcr, mocha-vcr, mockyeah, yakbak, nine-track, dkastner-replay, node-nock

At which point I stopped looking...recording http requests isn't exactly new territory.

I've personally used node-replay with great success. It has minimal configuration: https://github.com/assaf/node-replay

Interesting but would be more useful with support for streaming

On the roadmap, depending on your definition of "streaming" (e.g. buffer streams, websockets).

So this will make the actual HTTP request the first time, then keep a recording? I'm not entirely clear from the docs how this works.

What would a use case of this library look like?

So is this basically selenium in javascript with some neat features?

I think this is more of a complement to Selenium, where you can use Selenium to drive the browser to test the UI, with Polly providing recorded back-end responses. I need to look into it more, but this might address a need we have to make it easy (and quick) for our front-end developers to run our test suite locally during development, without having to spin up anything in the backend or rely on flaky non-Production environments.

EDIT: I am aware there are many other tools that can address this, we just haven't had the time yet to implement them. :)

Sounds more like the VCR gem from Ruby land.

Yes exactly, that's the idea anyway. Has a few nice features on top such as controlling the network latency and expiring recordings (useful when working on a project supported by a big team).

This isn't selenium. More like wiremock.

What's the core distinction between this/wiremock vs selenium?

Selenium is for behavior testing. Simulating clicks and form filling. This is for mocking http endpoints.

I still don't get why we do this [mocking http endpoints].

Sure this makes the problem of mocking the server less painful. Well done. But I'd take completely integrated tests over these any day. Sure, they're slower, but that's more or less irrelevant with feature toggling, staged roll-out, and continuous production monitoring.

It's totally possible to completely avoid mocking http endpoints thus making these tools completely obsolete.

See my comment above. This is not a replacement for E2E testing.

It's not always easy. Especially not if your API is stateful.

Or if it's not your API. I've been looking for something like this to make mocking OAuth flow a lot easier.

The example is form filling though:

    await fillIn('email', 'polly@netflix.com');
    await fillIn('password', '@pollyjs');

This is exactly like Selenium code I've written to log in. I struggle to see the difference in purpose.

In your selenium code, the browser was talking to a database.

But sometimes that database is down, or really slow.

Polly says "browser, don't talk to the database anymore, instead here's what the database said last time".

So, yes, both Selenium and Polly poke DOM elements, but Selenium stops there, whereas Polly does that as well as tricking the browser into going through the whole test without making a real call to the database (assuming it has a previous recording of "what the database said" for that test).

That's part of mocha or whatever. Polly is the server part.

> /* start: pseudo test code */

Also, what's up with the in-your-face hiring pitch right in the documentation?


I assume that devs read the documentation, and they want to hire devs, and it's their tool, so they put their hiring pitch in their documentation for their tool to try and hire devs

