In section 3.2.2 they mention being able to handle obfuscated/minified scripts, but based on the description it doesn't look very robust. Any sort of anti-debug/anti-tampering check would break this, e.g. storing the value of window.localStorage somewhere, then comparing it against the value of window.localStorage when you try to access it. If the values differ, there's probably some debugging/tampering going on, and the site can hold the content hostage and demand you turn off the protections. I'm not sure why they don't just patch the JavaScript runtime environment (i.e. the implementation of window.localStorage itself). That would be much more robust and harder to detect. Plus, you don't have to mess around with rewriting scripts.
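A minimal sketch of the kind of check I mean (hypothetical; real anti-tamper code would be minified/obfuscated):

    // Capture a reference early, e.g. in a small inline script that runs
    // before any content-blocker shim can shadow it.
    const realStorage = window.localStorage;

    // ...later, at access time:
    function looksTampered() {
      // If something swapped in a mock, the identity check fails.
      return window.localStorage !== realStorage;
    }

    if (looksTampered()) {
      // Hold the content hostage until the protections are turned off.
      document.body.textContent = 'Please disable your content blocker to continue.';
    }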
Hey, author here! Kinda shocked to see this on HN.
Regarding anti-tampering: this work is in a "taking the Web as we found it" kind of model. We focused on improving the existing state of the art for content blocking and resource replacement rather than on adversarial environments deliberately trying to get around SugarCoat. There are already other ways that sites try to circumvent URL-based blocking anyway, so bypassing SugarCoat won't be the low-hanging fruit. We touch on this in the discussion section of the paper, but if a script makes itself too much of a problem, filter list authors can always opt to block it entirely.
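(For the unfamiliar, "block it entirely" just means an ordinary one-line network filter in ABP/uBlock Origin syntax, along the lines of the following, with a made-up hostname here:

    ||annoying-tracker.example^$script

No SugarCoat machinery involved at that point.)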
Regarding patching the runtime environment: other systems have done this, but they haven't been adopted. Deep engine modifications are hard to get upstreamed and, absent an actual standard, don't give you cross-browser compatibility. SugarCoat-generated scripts can be (and are!) deployed in existing content blocking systems today, and aren't locked to one particular browser.
I feel like the right solution to using the modern web is an allow-list-based approach. Let the user be in control of the details of every request and response and action taken by the browser. Then everyone can choose their own way to interact with the web.
Looking at the source code [1], I have to agree with you. This is very easily detectable by anti-tampering scripts. Using JS Proxies would have been a better approach, although that is detectable as well (see e.g. [2]). To be really undetectable, the mocks would have to be done at the V8 level.
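For illustration, roughly what a Proxy-based mock could look like (my sketch, not the repo's actual code, and incomplete: length and key() still hit the real store):

    // Route storage traffic into an in-memory map; fall through otherwise.
    const backing = new Map();
    const fake = new Proxy(window.localStorage, {
      get(target, prop) {
        if (prop === 'getItem')    return k => backing.get(String(k)) ?? null;
        if (prop === 'setItem')    return (k, v) => { backing.set(String(k), String(v)); };
        if (prop === 'removeItem') return k => { backing.delete(String(k)); };
        return Reflect.get(target, prop);
      },
      // Storage also allows plain property writes (localStorage.foo = 'x').
      set(target, prop, value) { backing.set(String(prop), String(value)); return true; }
    });
    Object.defineProperty(window, 'localStorage', { value: fake });

    // Still detectable, e.g. localStorage.getItem.toString() no longer
    // contains the '[native code]' marker.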
I’d really love to see this in Firefox, even though I already use uBlock Origin, Privacy Badger and Container Tabs. Even if this is added, I’d still not give up on these extensions.
Though Brave has been involved in (controversial?) work that’s tangential or unrelated to the core web, such as a substitute for advertising-based income for sites, a crypto wallet, etc., I do admire the relentless focus on creating features that help and protect users. It also seems to have a higher velocity of feature releases, perhaps because it can still rely a lot on the open source Chromium project (which it customizes), whereas the Firefox team has to maintain and improve Gecko/Servo as well as handle end-user-facing features.
Just FYI, Firefox is working on Total Cookie Protection and other features like SmartBlock to keep third party storage access blocked while not breaking web pages. It's definitely nice to see the anti-tracking space getting so active over the past couple of years.
Author here! Only the data-collection phase of SugarCoat that feeds the script-rewriting step is dependent on Brave (because it's built on PageGraph [0]). The resulting neutralized scripts can be deployed cross-browser in existing content-blocking tools, e.g., uBlock Origin. No reason why they couldn't be hooked into Firefox!
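For example, in uBlock Origin a filter-list entry can swap the original tracker for the neutralized version via the redirect= option; roughly (hostname and resource name made up here):

    ||tracker.example/analytics.js$script,redirect=sugarcoated-analytics.js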
No, it's not that one, but I'll check it out. I use Firefox Multi-Account Containers by Mozilla [1], which is what many other container related extensions depend on.
I also use Temporary Containers [2], Facebook Container by Mozilla [3], and Google Container [3]. There are also container extensions for Twitter, YouTube, etc.
I actually don't care that much when sites break because of my ad blockers. If a site requires my ad blockers to be disabled in order to work correctly, that site is what's broken in the first place.
Good attitude. The only thing that is "breaking" is a web developer's opinionated view of how to present information. As we all know, that view is not always a reasonable one. The web developer does not work for you, the user. She works for advertisers or advertiser-funded organisations. IME, pages that are almost wholly JSON can easily be "redesigned" on the client side by the user (me), to present the information in a format that is most pleasing to the user (me).
Having a company that has chosen online ads as its "business model" sponsor researchers to "improve privacy" is inherently flawed. If this company were serious about user privacy it would not show ads. Today's online ads imply data collection and as such are not compatible with privacy. Companies want (need) to know who looked at an ad and when. That conflicts with privacy. Solve the problem by not showing ads. Brave's customers are advertisers. Make users the customers, not the targets. Forget about ads.
Will not happen. As long as users are not willing to pay for whatever "service" the tech company offers, the privacy problem cannot be solved.
This publication is a nice bit of "submarine PR" as PG would call it.
> IME, pages that are almost wholly JSON can easily be "redesigned" on the client side by the user (me), to present the information in a format that is most pleasing to the user (me).
What's a page that's made of JSON? And isn't it less semantic and more dependent on JS to convert the page into a readable representation, than a pre-rendered static site?
YouTube, i.e., a video page or a search results page, is one example of a page that is mostly JSON.
These YT pages rely on automation. The browser runs Javascript to format the page and to make HTTP requests that send data back to Google (privacy violation, no user benefit). The browser loads thumbnail images, automatically. There are many steps that have been automated. The JS is of course not written by the user, but by Google to support its data mining and advertising business.
However, it is also possible to retrieve a web page and perform the necessary steps manually, without Google's "help". Instead of letting a browser do whatever the website's Javascript programmers want it to do (to suit advertising interests, not user interests), the user controls the process, performing the steps manually.
This is how I approach YouTube and other convoluted websites. I retrieve the page to memory (tmpfs) using a relatively simple TCP client + local TLS proxy (no gigantic web browser is needed for such a simple task). I do not retrieve the separate Google JavaScript files (which a JS-enabled browser will automatically request); there is no need for them, as they exist to manipulate the user in Google's interests. (I am not interested in commercial videos, nor am I interested in Google's JS "video player"; I do not use a mouse.) As the page is mostly JSON, it is not formatted to be easily readable on the screen. I reformat it manually, using tr and sed. Then I extract the bits I want from the text, i.e., playback URLs and various metadata such as video IDs, descriptions, durations, suggestions, views, likes, channels (if any), time since upload, thumbnail URLs, continuation token, etc. Then I make a subsequent HTTP request if I want something further.
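For readers who want a concrete picture, the extraction step amounts to something like this, sketched in JavaScript only for readability (I do it with tr/sed and no JS runtime; the "ytInitialData" variable name is what YouTube currently embeds and can change at any time):

    // Pull the embedded JSON blob out of a fetched results/watch page.
    const html = await (await fetch('https://www.youtube.com/results?search_query=example')).text();
    const blob = html.match(/var ytInitialData = (\{.*?\});<\/script>/s)[1];
    const data = JSON.parse(blob);
    // From here, walk the object for video IDs, durations, view counts,
    // thumbnail URLs, continuation tokens, etc.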
By contrast, using a "modern" JavaScript-enabled browser controlled by an advertising-funded organisation to retrieve a page from YouTube will result in all manner of privacy intrusion. Even when a page is just left open in the browser, without any interaction at all, the Google JS will trigger constant HTTP requests, some empty (zero benefit to the user; the user would never intentionally make such requests).
The amount of code needed for the fully automated JavaScript-enabled browser is gigantic. The program is a security nightmare. The amount of code needed for youtube-dl is also relatively large; IME the distributed binary can take over 7 seconds to start up. The amount of code I need is, by comparison, tiny. I only need sed, tr and a TCP client. Everything fits on a single page. Fast and reliable.
> These YT pages rely on automation. The browser runs Javascript to format the page and to make HTTP requests that send data back to Google (privacy violation, no user benefit). The browser loads thumbnail images, automatically. There are many steps that have been automated. The JS is of course not written by the user, but by Google to support its data mining and advertising business.
This is overblown.
Serious question ... how do you propose these companies that provide these services make money?
It's not free to buy yottabytes of space to archive every video we're uploading. Somebody has to send a paycheck to all the employees who make this happen.
If the response is "pay for it", the entire business model collapses because the vast majority won't pay for it.
And I'm no apologist here. I'm running uBlock Origin in Brave and similar things for a reason.
Is it really? I don't doubt it's fast, but in my experience it's very hard, if not impossible, to scrape these "mostly JSON" services without the whole thing failing spectacularly as soon as the site shuffles their data model a bit.
Maybe YouTube is more amenable to this than the services I'm thinking of -- I personally have no qualms deferring to youtube-dl so I wouldn't know -- but for things like my university's lecture recordings my efforts seem to consistently be mooted within a matter of months or weeks.
Originally the web browser was free (Mosaic). The project name "Mozilla" stood for "Mosaic killer". Then Netscape tried to charge for their web browser, something like $39 for businesses to use it. They also tried to license web server software. Microsoft supplied IE and IIS for free with Windows and the rest is history.

Yet people are still trying to find ways to make money from a browser. No one is going to pay for a web browser, so they try to sell out the users of the software to advertisers. Not much of a value proposition for users. Today, so much great software has already been written and is free. No advertising needed. The software for using the web has been written already, many times over; it does not need financing. The reason for all the surveillance, the ads and tracking, is greed.

Sadly, the "modern" web browsers have become tools of manipulation and surveillance, too large and complex for any web user to build themselves. It is a travesty. These gigantic kitchen-sink programs are written for advertisers, not users, because users, including businesses, will not pay for new web browsing software. What they have already works.
Brave's customers are users. The ad model is opt-in, users get 70% without privacy loss, and advertisers have to come because such users are valuable but off the grid due to blocking in all the browsers they use. We also have paid ad-free search coming fast, so you can put your money where your mouth is.
Agreed, but some sites are unavoidable. For example, I recently had a website break on me that was important for me to access (a hospital network website; I had to check test results). This approach potentially allows me to browse the web without a binary on/off switch for blocked ads and trackers.
My biggest friction is not ads but e-commerce. Anti-fraud/anti-bot detection goes red for me from time to time - presumably having somewhat successfully removed surface area for fingerprinting makes the AIs put me in the "shady" basket, so sites with a high sensitivity setting will not allow me to proceed.
PayPal is the absolute worst here and the process is horrendous, opaque and time-consuming. I've been blocked by Stripe as well. Sometimes I will abort a purchase when I see that the only payment option is PayPal.
At least with Paypal, I know it will work whereas with some random payment provider I do not. By default, I block all third party scripts which means if I haven't come across the payment provider before then it'll break.
I guess I'm just unlucky. Cursed by the algorithm or something. Maybe having moved a lot and having a very foreign name for my country of residence are factors as well.
I really like the sound of this but I don't trust Brave. I used Brave on iPhone as soon as it came out, always in private mode so as to not save any history or open tabs. A while back, after an update, I opened the app and it immediately opened dozens and dozens of tabs, all of which I recognized as tabs I had opened in the past. It almost seemed to be opening pages going back to when I first used the app. I obviously left a complaint in the reviews. The developers quickly pushed another update but never addressed how or why this was even possible.
How is this gaslighting? To be fair, I may just not be fully up to speed on the term but I thought it meant telling someone that their experience didn't happen.
hm. interesting. could be a nice feature for the mozilla vpn. rather than just redirecting all traffic to a clean pipe, redirect it into a special networking environment where tracking endpoints are mocked up to be benign.
even better would be if users could also analyze their own traffic, block suspicious things and contribute to the mock environment for firewalling personal data.
maybe the future of firewalls will be more about keeping user data in, rather than keeping malicious actors out...
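a toy sketch of the mock idea (hypothetical, not anything mozilla has announced): resolve known tracker hosts to a local service that swallows beacons and returns empty-but-valid responses, so pages keep working but nothing leaves:

    // answer anything aimed at a known tracking host with a harmless,
    // well-formed response so page scripts don't error out.
    const http = require('http');

    http.createServer((req, res) => {
      // the beacon/pixel payload is dropped here, never forwarded.
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end('{}');
    }).listen(8080);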
> even better would be if users could also analyze their own traffic, block suspicious things and contribute to the mock environment for firewalling personal data.
That's basically a MITM proxy --- and I've been running one for decades now, to adjust pages and block (as well as inject) content. But if Mozilla tries to do that with its VPN, the paranoia-spreading "security" industry (and we all know whose interests they really protect...) is going to roast them for it.
why? because it would cut into sales for local firewall style products? i always assumed most of the money in that world was in enterprise software, services and labor.
i don't think the personal tools for this are all that great anyhow. tools that would allow consumers to monitor their own devices and use oss tools to defeat software that doesn't serve them could very well be a hit.
what's the alternative? a closed platform like apple? there's gotta be a middle ground between having to run your own monitoring infra and handing the reins over for everything to a company like apple.
> SugarCoat is designed to be integrated into existing privacy-focused browsers like Brave, Firefox, and Tor, and browser extensions like uBlock Origin. SugarCoat is open source and is currently being integrated into the Brave browser.
Though it does not mention any plans for integration with Firefox, it seems like it would only be a matter of time.
On Firefox you're probably better off using container tabs + temporary containers. With that, you can basically have a separate browsing session (cookies/website data) per tab, which allows for isolation and doesn't require a whitelist to work (unlike the approach described in the OP).
Firefox already has Total Cookie Protection and other features like SmartBlock, which aren't exactly the same, but also work to block third party web-storage access while keeping web sites working.
It would definitely be interesting to compare and contrast the various approaches of different browsers.
Alright - so if the example they provide illustrates the gist of their approach, it's essentially "sandboxing" the scripts so that calls to localStorage succeed but are then effectively non-persistent.
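i.e., something shaped like this would stand in for the real thing (my sketch, not SugarCoat's actual generated code):

    // Same API surface as Storage, backed by a map that dies with the page.
    function makeFakeStorage() {
      const data = new Map();
      return {
        get length()  { return data.size; },
        key(i)        { return [...data.keys()][i] ?? null; },
        getItem(k)    { return data.has(String(k)) ? data.get(String(k)) : null; },
        setItem(k, v) { data.set(String(k), String(v)); },
        removeItem(k) { data.delete(String(k)); },
        clear()       { data.clear(); },
      };
    }
    // Tracker code calling setItem/getItem works normally, but nothing
    // ever reaches disk or other pages.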
That's right, it's essentially sandboxing the scripts. But I think the real innovation is an automated system they've created for writing the sandboxing code based on tracing the execution of the malicious/ad scripts in the browser.
Otherwise, what you're saying would be true, and this could be easy to break/bypass.
> SugarCoat replaces these scripts with scripts that have the same properties, minus the privacy-harming features
Depending on the scope of these replacement scripts, this may run into API patent & copyright issues. Additionally, the trackers can simply start using different tracker script URLs to avoid this type of implementation.
A better solution is to allow these scripts to load (without cookies) and patch all of their actual network emissions and storage access to follow consent rules.
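At its crudest that could be something like the following (illustrative only; real consent enforcement would also have to cover XHR, sendBeacon, image pixels, etc.):

    // Let the script load and run, but strip credentials from its requests.
    const realFetch = window.fetch;
    window.fetch = (input, init = {}) =>
      realFetch.call(window, input, { ...init, credentials: 'omit' });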
As the article describes, the lists in e.g. uBlock consider allowing a site to work with a privacy-harming script preferable to blocking the script and having a broken site. The only time you notice breakage now is when the block list is not up to date with changes on the site. This tool aims to run the functional part of such scripts while neutralizing the privacy-impacting part, instead of an all-or-nothing choice.
The Brave browser combined with SugarCoat: I wonder how this combination will turn out. Also, it would be great if SugarCoat could be integrated into other browsers that don't want to jump on the Brave train.