Show HN: Open source JavaScript library to record and replay the web (rrweb.io)
353 points by yz-yu 6 months ago | 51 comments



Great work, yz-yu. I hope you've learned a lot. I've personally found the session replay space to be incredibly rewarding.

However, as a session replay industry competitor and a former security researcher for most industry players, I caution anyone thinking of using a side-project like this on production applications to proceed slowly with care.

Security and privacy are extremely hard to get right here. The tricky thing about session replay analytics is that attackers have a huge attack surface, and a compromise means gaining a treasure trove of all user data. Replay is, in a way, a form of XSS. Modern security features (like CSPs and the iframe sandbox attribute) help, but browser changes can cause issues.

Some of the challenges:

- CSPs can often be bypassed using Google API libraries, <Object/>, <SVG>

- Blacklisting <SCRIPT/> tags can often be bypassed with an XML namespace

- CSS-based data or password exfiltration

- Clickjacking, "data:" URLs, etc.

- Could you imagine a web request proxy server deploying Service Workers?

- postMessage() from further nested frames

Substantial work goes into sandboxing replay environments and limiting PII. Defense in depth is particularly important here. Enterprise level research, auditing, monitoring and care should be taken seriously.
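To make the blacklist point concrete, here is a minimal sketch of the opposite approach: an allowlist applied to a recorded node tree before it is replayed. The node shape ({ tag, attrs, children }), the tag and attribute lists, and the function names are all assumptions made for illustration; none of this is taken from rrweb or any commercial product, and it is nowhere near a complete defense on its own.

```javascript
// Hypothetical allowlists: anything not named here is dropped.
const ALLOWED_TAGS = new Set(["div", "span", "p", "a", "img", "ul", "li", "table", "tr", "td"]);
const ALLOWED_ATTRS = new Set(["class", "id", "href", "src", "alt"]);

// Only absolute http(s) URLs pass, so "data:" and "javascript:" URLs are rejected.
// (Relative URLs are rejected too; a real tool would have to resolve them first.)
function isSafeUrl(value) {
  return /^https?:\/\//i.test(value);
}

// Returns a sanitized copy of the node, or null if the tag is not allowed.
function sanitizeNode(node) {
  const tag = String(node.tag).toLowerCase();
  // Namespaced names such as "svg:script" never match the allowlist,
  // which is the failure mode a <script> blacklist can miss.
  if (!ALLOWED_TAGS.has(tag)) return null;

  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    const key = name.toLowerCase();
    if (!ALLOWED_ATTRS.has(key)) continue; // drops onclick, style, srcdoc, ...
    if ((key === "href" || key === "src") && !isSafeUrl(String(value))) continue;
    attrs[key] = value;
  }

  const children = (node.children || [])
    .map(sanitizeNode)
    .filter((child) => child !== null);
  return { tag, attrs, children };
}

const recordedTree = {
  tag: "div",
  attrs: { class: "card", onclick: "steal()" },
  children: [
    { tag: "svg:script", attrs: {}, children: [] },
    { tag: "img", attrs: { src: "data:image/svg+xml,...", alt: "logo" }, children: [] },
  ],
};
console.log(JSON.stringify(sanitizeNode(recordedTree), null, 2));
```

Even with a filter like this, the replay should still run inside a sandboxed iframe with a strict CSP, since the whole point above is defense in depth.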


I definitely learned a lot and enjoyed the process. Thank you for these really important suggestions.

Quote from my introduction blog post:

===

Today we already have some commercial session replay products like Logrocket, Fullstory, etc.

If you are just looking for a ready-to-use tool and would like to pay for its service, I would recommend you to use the above products, because they have well-tested backend services that can store the data for you and perform some higher order features.

===

So I don't think rrweb is a competitor of these commercial products.

Actually, I would like to see rrweb grow into a base for many commercial products in the future, which means it would handle most of the privacy and security issues, so that other developers can build many fancy projects based on it without spending time on the hard parts again and again.


(Sorry your account was rate-limited! New accounts are subject to that restriction but it's definitely not intended for cases like this. I've marked your account legit so it won't happen again.)


My current job has live, in house, QA testers.

One idea I've been toying with is a tool that records their movements through our site when testing, aggregates them, and then shows the hot and cold spots of our site that they hit on their full-site run-throughs.

I haven't really dug into your code yet, but it sounds like this might be a good base for that or am I way off base in thinking that?
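The aggregation step of that idea can be sketched independently of any recorder. This is a rough illustration only: the { x, y } click shape, the 50px cell size, and the function names are assumptions, not rrweb's event format.

```javascript
// Hypothetical helper: bucket recorded click positions into a coarse grid.
// Each point is assumed to be { x, y } in page pixels.
function buildHeatmap(points, cellSize = 50) {
  const counts = new Map();
  for (const { x, y } of points) {
    const key = `${Math.floor(x / cellSize)},${Math.floor(y / cellSize)}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Merge the clicks from several QA sessions and list the hottest cells first.
function hottestCells(sessions, cellSize = 50) {
  const heatmap = buildHeatmap(sessions.flat(), cellSize);
  return [...heatmap.entries()].sort((a, b) => b[1] - a[1]);
}

const qaSessions = [
  [{ x: 10, y: 20 }, { x: 30, y: 40 }],
  [{ x: 12, y: 22 }, { x: 400, y: 300 }],
];
console.log(hottestCells(qaSessions));
```

Cells that never show up in the result are the cold spots, so comparing the keys against the full grid for a page gives the untested areas.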


You are absolutely correct with regard to privacy and security for any public facing application. However I still think this is fantastic for very specific production use cases. For instance I am working on a very large, but entirely internal web application. The application includes trade secrets that we would never want exposed to a third party, but it is not the kind of application that would ever contain personal information. So this is pretty much perfect for our production use case since the recorded information never has to leave our control and the only users of our application would be employees. Any concern about a bad actor trying to harm our system or steal information is handled at an entirely different level.


To add to the key point about privacy, this research from Princeton is really illuminating and scary: https://freedom-to-tinker.com/2017/11/15/no-boundaries-exfil...


I might be alone on this one, but I feel the Freedom-to-Tinker report was unfair to the analytics providers. I know the folks in the industry work really hard towards privacy and security. They go out of their way to make it clear that not everything is automatically censored, and provide easy tools to limit data and visualize what is and isn't recorded. Holding PII and other sensitive data truly is a liability- nobody wants it.

Companies like Walgreens should be entirely to blame.

I really do appreciate how the author(s) of that report uncovered how those services were used in practice.

[I'm not with any party listed in the report]


I would love to read more about these kinds of security issues. Maybe you have a blog about this?

Or some great resources maybe?

I know of the obvious ones like OWASP; but that only scratches the surface.


I got one about bypassing GitHub's authentication using Unicode on the company blog: https://blog.getwisdom.io/hacking-github/

I've wanted to write a deep dive on JS defense for a while now. Lots of cool stuff learned I'd love to share- maybe in the next few weeks.


> I've wanted to write a deep dive on JS defense for a while now. Lots of cool stuff learned I'd love to share- maybe in the next few weeks.

Please do! :)


Hm, that's odd - ublock blocks access to the site :-/


Hi, hackers, the author here.

Since I've seen some people talking about the open source idea and comparing rrweb to some commercial products, I'd like to share a blog post about the vision of rrweb.

http://www.myriptide.com/rrweb-introduction/

The post also explains how rrweb works.


Thanks for open sourcing your work. I had already built something similar to this and other commercial products, but in jQuery.

Really interested to see how this compares.

I did see the IE11 issue. Are there any thoughts on what can be implemented for a fallback?


I know there are some shim libraries for the MutationObserver API (which rrweb uses to observe DOM updates), but I'm not sure about their impact on performance.

BTW, rrweb is also a project to explore the power of modern browsers, so IE issues may not be a high priority.


How does the replay work? Do you have to embed the player in the site being recorded or can you replay a recording on another page, like an analytics dashboard? Does it capture styles with the DOM snapshots?


A JSFiddle example would be handy. A record and replay example.


Amazing, I planned to work on something similar in 2019 for our SaaS, you just saved me a few days / weeks of work :) Thanks


May I know what open source license is the project under?



I'm not a lawyer, but three characters in package.json isn't sufficient. You need to include the license text - eg https://github.com/nodejs/node/blob/master/LICENSE.


Looks really cool, but I find myself thinking about the privacy implications of using this, especially by default. Even if the user gives consent, it still implies recording every single mouse movement and keystroke on the site.

Has this been normalised? Is this the new default?

Food for thought.


From what I understand rrweb is not introducing or hijacking any browser functionality -- it's just using what's there. Whatever boundaries are being crossed should be considered already crossed, because firms that want this data don't have to work that hard to get it (or they can just buy some off the shelf tool).

If the privacy implications make us uncomfortable we might want to start not sharing this kind of information from browsers by default (this seems unlikely) or at least introducing some sort of browser-level controls. Unfortunately, this represents a lot of work and worries about breaking backwards compatibility, contrasted with very little gain for browsers that don't pride themselves on being good for privacy (Chrome).


RMS has been writing about the issue of non-free session recording scripts in The JavaScript Trap[0]:

> In addition to being nonfree, many of these programs are malware because they snoop on the user. Even nastier, some sites use services which record all the user's actions while looking at the page.[1] The services supposedly “redact” the recordings to exclude some sensitive data that the web site shouldn't get. But even if that works reliably, the whole purpose of these services is to give the web site other personal data that it shouldn't get.

[0]: https://www.gnu.org/philosophy/javascript-trap.html

[1]: https://freedom-to-tinker.com/2017/11/15/no-boundaries-exfil...


I agree with the privacy concern, but calling analytics software "malware" is too extreme. It isn't mining for bitcoins on your hardware, or encrypting your documents to extort you.

Always using the most extreme terms just makes it easier to dismiss such views outright.


I think it depends on how the software is used.

A friend of mine got a suspicious tax returns email that had a link to a form asking for credit card information. Being careful and responsible, my friend of course asked me if the site looked legit before actually pressing 'submit'.

Of course it was a scam site, and using session recording, they could very well have gotten my friend's credit card details without my friend ever pressing 'submit'.

I think it's always the context that decides whether something is malware. Is a program that erases everything on your disk malware? Perhaps, but if it's a disk formatting tool and you asked it to do so, then it's not.


It's a good example, and something I often wonder about when I'm filling out a survey and give up part-way -- did they save the questions I had already answered?

FWIW you probably wouldn't need something as powerful or blunt as session recording to pull this off, though. You'd only need to listen for keystrokes on the relevant input (with document.addEventListener or similar), and send them to the server as they're typed. Same with partially-filled surveys. IIRC Facebook got in some heat a while ago for sending the partially-typed messages up to the server and to the other chat participant.


Yes, that's a fair point and I agree with you on both examples.


I agree. So while the project is technically super-cool, I prefer a browser extension for privacy reasons. With an extension (that does not "phone home") all data is stored locally on my machine. And if needed, the open-source kantu tool offers a way to embed recordings into your website, too:

https://a9t9.com/kantu/demo/runweb

Inside your local team you can of course share the recordings simply as JSON files, via GitHub and other services.

Another advantage of using browser extensions like kantu, Selenium IDE and iMacros is that they are more powerful by design, but that is another topic.


I totally agree with your concerns about privacy. Unfortunately, as I wrote in my blog post, some commercial products have already shipped features like this.

So another goal of rrweb is to teach people the 'power' of modern browsers, and I also hope rrweb gets a chance to raise the standard of web privacy.


I think this should be considered an easy-to-use library for doing things that are already being done anyway, or that anyone can do with some amount of work. Comparing this library to a surveillance tool distorts the idea of what it is trying to achieve.


Even though I do agree that the total recording of sessions is not nice, such tools can be extremely important during early stage testing with your web app's UX. Especially for solo founders who don't have colleagues to tell them that they can't figure out how to use your app :)

A nice way would be to record for a week or so, pick out the sessions that were obviously quite long but where the user didn't achieve anything, go through them, and try improving the UX so the user won't get stuck there the next time.
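The "long session, nothing achieved" filter is simple to sketch once each recording has a small summary next to it. The summary shape ({ id, durationMs, reachedGoal }) and the five-minute threshold are assumptions for illustration; how "achieved something" gets detected is app-specific and not shown here.

```javascript
// Hypothetical session summary shape: { id, durationMs, reachedGoal }.
// Returns long sessions that never reached the goal, longest first.
function findStuckSessions(summaries, minDurationMs = 5 * 60 * 1000) {
  return summaries
    .filter((s) => s.durationMs >= minDurationMs && !s.reachedGoal)
    .sort((a, b) => b.durationMs - a.durationMs);
}

const weekOfSessions = [
  { id: "a", durationMs: 12 * 60 * 1000, reachedGoal: false },
  { id: "b", durationMs: 2 * 60 * 1000, reachedGoal: false },
  { id: "c", durationMs: 20 * 60 * 1000, reachedGoal: true },
  { id: "d", durationMs: 30 * 60 * 1000, reachedGoal: false },
];
console.log(findStuckSessions(weekOfSessions).map((s) => s.id));
```

With the sample data this leaves sessions "d" and "a" to replay, which keeps the amount of footage a solo founder has to watch manageable.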


I played with the examples and I am extremely impressed. I couldn't tell at all that I was being recorded; I figured the examples would just be videos of people interacting with those pages. The speed-up feature is very neat too.

Very good work. I am actually a little surprised something like this is open source.


You should offer a commercial and open source version. The commercial service could provide a few extra features at a modest price point, but support development of the open source platform. Perhaps it could pay your bills and be a cheaper alternative to the existing expensive commercial offerings. I could see you taking a big bite out of their market if you keep it up.


This is so awesome that you made this open source. There are a bunch of companies that basically use this tech and are making lots of $$. We're actually working on a user testing service and are currently using a Chrome extension to record video:

https://www.userlook.co

May consider switching and using your library!

Thanks for sharing this :)


really nice!

A few years ago, I created something very similar when working for validately.com, a user testing company. The solution was tailored to our needs and was quite unique and rather sophisticated.

Below are a few main points:

- automatic injection of the recording script by proxying the original site / app via our domain (optionally, users could inject the script themselves)

- using iframe to serve the recorded page in order to preserve context and allow to display content on top of the page

- audio recording

- broadcasting in real-time

- storing all assets from a recording (images / stylesheets) to make playing back independent from original urls

Not everything was perfect and there was always something to improve. Some sites did not work at all due to technical limitations. But the technology was good enough that the company could grow and transition to a WebRTC-based solution.

I am very grateful for this rare opportunity, as the project taught me an insane amount of useful stuff. Would love to work on something similar again.
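The asset-storage point in the list above can be illustrated with a small sketch: once images and stylesheets have been copied somewhere stable, the recorded tree is rewritten to point at the copies. The node shape ({ tag, attrs, children }), the URL map, and the function name are assumptions for illustration, not how validately or rrweb actually do it.

```javascript
// Hypothetical: point src/href attributes at stored copies of the assets
// so that playback no longer depends on the original URLs being alive.
function rewriteAssets(node, storedCopies) {
  const attrs = { ...(node.attrs || {}) };
  for (const name of ["src", "href"]) {
    if (name in attrs && storedCopies.has(attrs[name])) {
      attrs[name] = storedCopies.get(attrs[name]);
    }
  }
  return {
    ...node,
    attrs,
    children: (node.children || []).map((child) => rewriteAssets(child, storedCopies)),
  };
}

// Map from original URL to the location of the archived copy (made-up URLs).
const archive = new Map([
  ["https://example.com/logo.png", "https://replay.example.org/assets/1.png"],
  ["https://example.com/site.css", "https://replay.example.org/assets/2.css"],
]);

const snapshot = {
  tag: "body",
  attrs: {},
  children: [
    { tag: "link", attrs: { rel: "stylesheet", href: "https://example.com/site.css" } },
    { tag: "img", attrs: { src: "https://example.com/logo.png" } },
  ],
};
console.log(JSON.stringify(rewriteAssets(snapshot, archive), null, 2));
```

URLs referenced from inside CSS (for example in url(...) values) would need the same treatment and are not handled in this sketch.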


Regarding security & privacy of products like these, I think it would be interesting to not capture any data not included within the app's codebase by default, instead of relying on any kind of redaction steps.

For example, when recording a table and interactions with it, the cells would just be filler elements. Form field data simply wouldn't be captured, and contents unique to any record on a page would be replaced with sample data.

That way you get to see how someone interacts with a page, but not any context/personal information.

I think relying on sensible defaults or redacting data is a lost cause and puts the trust/responsibility in the wrong hands. Some companies may care about redaction while others don't prioritize it.
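A minimal sketch of that "structure only" default, applied at capture time rather than as a later redaction step. The node shape ({ tag, text, value, children }) and the function name are assumptions for illustration; rrweb's real snapshot format is different.

```javascript
// Hypothetical capture-time masking: keep the page structure, drop the data.
function maskNode(node) {
  const masked = { tag: node.tag };

  // Text is replaced by same-length filler so the layout still looks right.
  if (typeof node.text === "string") {
    masked.text = node.text.replace(/\S/g, "x");
  }

  // Form field values (node.value) are deliberately never copied.

  if (Array.isArray(node.children)) {
    masked.children = node.children.map(maskNode);
  }
  return masked;
}

const row = {
  tag: "tr",
  children: [
    { tag: "td", text: "Jane Doe" },
    { tag: "td", children: [{ tag: "input", value: "4111 1111 1111 1111" }] },
  ],
};
console.log(JSON.stringify(maskNode(row)));
```

Because the sensitive values never enter the recording, nothing depends on a site owner remembering to configure redaction correctly.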


This is similar to Heap's Identify [1]. Main difference is Heap focuses on metrics while this looks to be for debugging and UX. Any plans for a heatmap feature built-in or third party?

[1]: https://docs.heapanalytics.com/docs/using-identify


How does this compare against something like Full Story?


you can basically build fullstory with this library!


Very cool. So much better than loading YouTube videos. It would be great if there were a way to annotate parts of it. Apologies if this is feature creep :(


Great idea! As I said on the landing page, demonstration is an interesting use case for rrweb, and an annotation feature would make it even better.


This looks awesome! Does anyone have recommendations for a tool that can do this but also let me save the video? Preferably, below 100MB per 5sec recording...

I mean, come on, how smooth is this library?


Will this work with MVC frameworks like AngularJS, Vue and ReactJS?

I checked the DOM of an AngularJS app, and when I enter something in the input field, it's not appearing in the DOM at all.


Very nice project. May I suggest adding "Chinese version" (in English) to the Chinese link in the Readme?


Replay does not work for opening select elements.


Interesting, seems like the core tech of smartlook / hotjar


Yeah, I'll stop my Hotjar subscription next month and use this. I'll save over 500 dollars every month.


Wait, call me dumb or whatever, but what is this tool used for? I can't think of any use cases. Any examples?


This is NOT open source in the way you think it is yet, because the repo doesn't have a LICENSE attached to it, so the owner retains all rights and you are not allowed to sell it or do whatever you want with it.


Correct - does fair use cover even running software without a license?

This wouldn't pass the first hurdle at my dayjob...


neat. include that blog link in the readme



