Neat URL cleans URLs, removing parameters such as 'fbclid' and 'utm' (github.com)
213 points by mot2ba 37 days ago | 69 comments



Google Analytics expert here.

This is missing some fundamental ones: gclid and dclid. Those are the parameters that identify a specific click from a specific user on a specific ad placement. They are the keys that Google uses on the back-end to join Google Analytics data with Google Ads (formerly AdWords) and DV360 (formerly DoubleClick) data.

The utm_ parameters are tame. They are very coarse-grained, usually representing "which budget did this ad come from" rather than anything about a user. They're ugly, which is enough reason to strip them, but gclid is also ugly, and much more identifying.

This is a bit of a fringe opinion, but I consider tools that block utm_ but leave gclid to actually be a decrease in privacy. A lot of people misconfigure their Google Analytics so that utm_ params break the gclid. Stripping the utm_ params allows that join to happen.
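For illustration, stripping gclid and dclid along with the utm_ family can be done with the standard URL API. This is just a sketch; the parameter list is not exhaustive:

```javascript
// Strip Google's click identifiers (gclid, dclid) together with the
// utm_ parameters. The blocklist here is illustrative, not complete.
function stripTrackingParams(href) {
  const url = new URL(href);
  const blocked = ["gclid", "dclid"];
  // Copy keys first, since deleting while iterating skips entries.
  for (const key of [...url.searchParams.keys()]) {
    if (blocked.includes(key) || key.startsWith("utm_")) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

// Example:
// stripTrackingParams("https://example.com/page?utm_source=news&gclid=abc123&id=42")
//   -> "https://example.com/page?id=42"
```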


> This is missing some fundamental ones: gclid and dclid.

I always preferred ClearURLs, as it has an auto-update feature that pulls down this JSON file https://gitlab.com/KevinRoebert/ClearUrls/raw/master/data/da...

https://gitlab.com/KevinRoebert/ClearUrls/-/wikis/Technical-...

ClearURLs does include both gclid and dclid. From memory, the other URL-cleaning add-ons required a new version of the add-on for that.


ClearURLs is my choice as well, it seems to catch a lot more stuff, and as you say it autoupdates definitions (like uBlock Origin etc.).


I mean, maybe you're in it for the privacy benefits, but I just want to auto-normalize URLs (i.e. to throw away anything that isn't necessary to get you to the page) for rendering permalinks in journal citations, in printed ad-copy, etc. I'd guess a lot of these libraries have the same goal. And in that context, they probably mostly seek to automate the stripping of the particular parameters that the author themselves has encountered, been annoyed by, and manually stripped before; rather than seeking to strip parameters based on the particular meaning or information they encode.
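For that normalization use case, a whitelist approach (keep only the parameters known to matter, drop everything else) can be sketched like this. The list of kept parameters is hypothetical and would vary per site:

```javascript
// Normalize a URL for a permalink by keeping only a whitelist of
// parameters assumed to affect page content (hypothetical list).
function normalizeForCitation(href, keep = ["id", "v", "p"]) {
  const url = new URL(href);
  for (const key of [...url.searchParams.keys()]) {
    if (!keep.includes(key)) url.searchParams.delete(key);
  }
  url.hash = ""; // drop the fragment too, if the citation doesn't need it
  return url.toString();
}
```

A whitelist is more aggressive than a blocklist: it also removes tracking parameters nobody has catalogued yet, at the risk of breaking pages that genuinely need an unlisted parameter.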


From a privacy perspective, that's even more horrifying.

You leave in the unique identifier, and then publish it under your real name, allowing all of those clicks to be attributed to your actual identity.


From a privacy perspective, is that really that bad? You've got a bunch of other people's clicks messing up your marketing profile. This might even be a good thing, because it pollutes the dataset that the adtech people are using, which makes the trackers less useful.

So here's a feature suggestion: instead of removing URL parameters, replace them with a randomly-generated value. Be careful to use the same length and character set (e.g. replace a hex id with only hex digits).

And also, set DNT=1.

That gives the digital marketing people an incentive to respect the DNT header, and an easy way to maintain data quality.

I think that this is a better strategy than blocking trackers and "cleaning" URLs. It shifts the economics of web tracking, which short of legislation, is the only way things are going to change.
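A minimal sketch of that poisoning idea, assuming a made-up list of identifier parameters: each character of the value is replaced with a random character from the same class (digit, lowercase, uppercase), so the result stays plausible to the back-end:

```javascript
// Replace tracking parameter values with random noise of the same
// length and character classes, instead of deleting them.
// NOISE_PARAMS is a hypothetical example list.
const NOISE_PARAMS = ["gclid", "dclid", "fbclid"];

function randomizeChar(ch) {
  const pick = (s) => s[Math.floor(Math.random() * s.length)];
  if (/[0-9]/.test(ch)) return pick("0123456789");
  if (/[a-z]/.test(ch)) return pick("abcdefghijklmnopqrstuvwxyz");
  if (/[A-Z]/.test(ch)) return pick("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
  return ch; // keep separators like '-' and '_' in place
}

function poisonParams(href) {
  const url = new URL(href);
  for (const key of NOISE_PARAMS) {
    const value = url.searchParams.get(key);
    if (value !== null) {
      url.searchParams.set(key, [...value].map(randomizeChar).join(""));
    }
  }
  return url.toString();
}
```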


I've added these parameters to my configuration, thanks!

Do you know of any other parameters like those, that can be safely removed? Maybe someone else here can list a few?


Most users of this would never actually click on an ad (they are probably already using an ad blocker, or are completely ad blind), so that omission doesn't seem that bad.


I've been using this extension for a couple years now. In fact, I submitted the pull request for the fbclid feature [1]. @Smile4ever merged and released the change the next day. It is a really great extension with a responsive and helpful maintainer. I'm glad it's getting some more visibility.

Extension Links:

Firefox: https://addons.mozilla.org/en-US/firefox/addon/neat-url/

Chrome: https://chrome.google.com/webstore/detail/neat-url/jchobbjgi...

---

[1] https://github.com/Smile4ever/Neat-URL/pull/163


Sounds good. I didn't know about Neat URL until now, as Canonical Url Detector is what I've been using for a long time. It is slightly smaller in size, but maybe not as effective: https://github.com/irok/CanonicalUrlDetector


Unfortunately it doesn't clean up things like AliExpress URLs fully, and PRs aren't moving.


This is a browser extension, I think. Nowhere does it actually say that.


Created a pull request that fixes this confusion.

https://github.com/Smile4ever/Neat-URL/pull/211


Very minor nitpick, could you change the commit message to be in imperative mood? Thanks for contributing.


Yeah that could definitely be made clearer. I bounced when I couldn’t quickly see if this was a browser extension or a library or something else.


Yeah, I was scrolling back and forth as well, asking myself "for what?" the whole time, until I looked at the files and saw background.js and manifest.json. Then it clicked.


Installing this browser extension requires granting the permission "can read and change all your data on the websites you visit". So basically a man-in-the-middle that tracks every site you visit and could potentially modify the content on pages you visit without you knowing.

As much as I enjoy privacy, IMO it's a much bigger violation / risk to give my complete browsing history to some extension, especially since it can "read all your data". That means it can look at the HTML response of any page you visit which means access to your banking details, email account and everything you ever browse. That's all personally identifiable data.

I wouldn't even consider utm tags a privacy violation since it's really nothing more than a slightly more useful referrer header that's completely anonymous.


Not sure if this would make you feel any better, but there's a similar extension called ClearURLs[1] (I'm not affiliated with it, but I do have it installed) for Firefox which is one of their "Recommended Extensions"[2], which have passed their reviews for security.

[1] https://addons.mozilla.org/en-US/firefox/addon/clearurls/

[2] https://support.mozilla.org/en-US/kb/recommended-extensions-...


> In order to install this browser extension it requires permissions of "can read and change all your data on the websites you visit".

Any plugin that rewrites URLs can do all that. It so happens that uBlock Origin also needs this capability in order to function[1]. So essentially, it boils down to a matter of trust. If I trust the extension developer enough, I would gladly install the plugin rather than do nothing to stop the rampant privacy abuses.

Fortunately, this plugin is small and made of easily inspectable javascript. It's open source, so I could even build the plugin myself if I was super-paranoid.

> As much as I enjoy privacy, IMO it's a much bigger violation / risk to give my complete browsing history to some extension, especially since it can "read all your data".

That's a serious allegation. I just hope you had enough proof before you started pointing fingers.

> I wouldn't even consider utm tags a privacy violation since it's really nothing more than a slightly more useful referrer header that's completely anonymous.

How are referrer headers not a privacy violation?

[1] https://github.com/gorhill/uBlock/wiki/Permissions#access-yo...


You can always copy the code and install it locally if you don't trust the publisher. That's the full benefit of open source. If it breaks in the future, just inspect and copy the new version again.


I believe the parent is asking for more granularity in browser permissions in general. On Firefox, permission granularity lags a bit behind other platforms, for example Android (which is already not perfect / too permissive).


I think the problem is that a plugin that can rewrite the URL can rewrite it to point anywhere else, which in effect becomes "read all your data". Firefox treats such side-channel attacks very rigorously, so almost any plugin doing anything remotely interesting will have this requirement.


What are your thoughts about it being an open source extension?


> What are your thoughts about it being an open source extension?

It being open source doesn't guarantee what you see on GitHub is the code that the extension uses.

I'm not saying this author is acting maliciously but a very common attack is to say something is open source, point to the repo but in reality the code running in the extension is unrelated to that repo.

This often happens with packages installed by popular package managers. The home page of the package will be linked to GitHub so it appears to be open source but the package itself has different code because most of these package hosting sites don't pull in code directly from GitHub. The package author can publish code from a private closed source copy of the code sitting on their dev box and no one would ever know unless they looked at the source code after installing the package.

Now, when it comes to Chrome extensions, I do believe there are ways to check out the source code of any extension you use, so you could double-check it there, but then you have to worry about the extension getting updated too.


>I'm not saying this author is acting maliciously but a very common attack is to say something is open source, point to the repo but in reality the code running in the extension is unrelated to that repo.

The extension is small enough that you can inspect it yourself. Also, AMO addons are code-inspected by reviewers, unlike the chrome store.

>Now, when it comes to Chrome extensions I do believe there's ways to check out the source code of any extension you use, so you could double check it there but then you have to worry about the extension getting updated too.

That's why I disable addon updates for "uncommon" addons.


> Also, AMO addons are code-inspected by reviewers, unlike the chrome store.

Are they still? Last I remember, they started auto-accepting add-ons that passed the automatic tests and only maybe review them at a later point, which could happen anytime between tomorrow and next year. In the meantime, the insecure add-on is floating around, endangering users.

There now is also this message: "This is not a Recommended Extension. Make sure you trust it before installing."

This kinda indicates the user is on their own with such extensions, and there is no review at all?


> The extension is small enough that you can inspect it yourself. Also, AMO addons are code-inspected by reviewers, unlike the chrome store.

I've never been involved with performing code reviews for Chrome or FF extensions but I'm not sure this type of attack would be detected by a reviewer.

Because if all they do is take the HTML response and send it over to some web back-end with an ajax request, that looks innocent enough to any reviewer. For example, under what grounds would a reviewer flag that ajax request as malicious and prevent the extension from being published? It's not possible for them to know what purpose that data has for the extension unless they are really doing a deep dive on each review and take the extension's purpose into account based on their opinion of what it "should" do based on its description.

I'd love to hear back from anyone who happens to review extensions for either browser.


>Because if all they do is take the HTML response and send it over to some web back-end with an ajax request, that looks innocent enough to any reviewer. For example, under what grounds would a reviewer flag that ajax request as malicious and prevent the extension from being published? It's not possible for them to know what purpose that data has for the extension unless they are really doing a deep dive on each review and take the extension's purpose into account based on their opinion of what it "should" do based on its description.

I think you're giving the reviewers too little credit. There's no plausible reason why you'd need to send urls to a server to perform such a trivial transformation. Also, a search of BMO[1] shows that addons are regularly being found to breach these policies and blacklisted.

[1] https://bugzilla.mozilla.org/buglist.cgi?product=Toolkit&com... control-f for "Add-ons collecting ancillary data".


> if all they do is take the HTML response and send it over to some web back-end with an ajax request

This extension does not do that. Most extensions with the "Access your data for all websites" permission also do not do that. The permission is required to scan data (in this case, links) in the websites visited by the browser, and does not mean that the data in the website would necessarily be sent to a server. Neat URL processes the links locally.

You can inspect the source code of any WebExtension you have installed by downloading the package, renaming it to the .zip extension, and unzipping it (as .xpi files are equivalent to .zip files). For Firefox add-ons, right-click the "Add to Firefox" button on the extension listing, and click "Save Link As...". The code is not minified or obfuscated.

https://addons.mozilla.org/en-US/firefox/addon/neat-url/


Small correction: you don't need to install the WebExtension to inspect it. You can just download and unzip it.


> Because if all they do is take the HTML response and send it over to some web back-end with an ajax request, that looks innocent enough to any reviewer.

That's extremely not innocent for ANY browser extension.


That was just a broad example.

What if it got a list of every link on a page and sent that and then claimed it did that to better improve the extension by figuring out which query params aren't necessary and claimed that these links help train their app / extension.

That seems reasonable on paper, but it's a wildly over the top violation of your privacy and is only slightly less invasive than an entire page response.

I don't think the above example would get denied by a reviewer and it still uses the same "can read and modify" permissions as the current extension in its current form.


Then it'd probably need a privacy policy and/or disclaimer in the addon description, which in turn would cause people to find out and downrate it to obscurity.


Are you saying it wouldn't get accepted during the review unless it had that in the description?

That seems like a dangerous rule to live by if an extension is allowed to collect all of that information and it's auto-opt-in based on it existing somewhere buried in a privacy policy or long description.

We really only ever notice the permission setting because the browser puts that in front of us before agreeing to install it and it's usually a 1 liner like "hey, this extension can access everything about your browsing history".


Yeah, that alone should be a huge red flag unless it's something the add-on explicitly advertises, like it's doing translation or something.


This is one of the worst GitHub readmes I've ever encountered!

What is this? A CLI tool? A website? A browser extension?

Had to look in the comments to tell – what a disaster.


Submit a pull request for a better one. Be a force for good in this world. Right now you're just dumping on work some volunteer does for free.


The mere fact that something is done voluntarily and for free does not mean it cannot be criticized if it is bad. Note that GP did not criticize the person that made the README, but the README itself. That is how it should be.


They probably expect the Mozilla/Chrome extension sites to be the discovery point for people.


Love this kind of decluttering. I hate gigantic URLs! I've been using ClearURLs for Firefox, seems to work well. Anyone know of any major differences?

https://gitlab.com/KevinRoebert/ClearUrls


I tried both for about 10 minutes. It seems ClearURLs has rules per domain[1], so it can do more than tracking prevention by removing some common but useless query strings in the URL (like 'keyword' on amazon.com), but there is no place for custom rules.

On the other hand, Neat URL's rules apply to all domains, so it won't shorten Amazon links, but you can add your own rules.

[1] https://kevinroebert.gitlab.io/ClearUrls/data/data.minify.js...
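The difference between the two rule models can be sketched like this. The rule lists below are made up for illustration, not taken from either extension's actual data:

```javascript
// Sketch of the two rule models: per-domain rules (ClearURLs-style)
// layered on top of one global rule list (Neat URL-style).
// All rules here are hypothetical examples.
const perDomainRules = {
  "www.amazon.com": [/^qid$/, /^keywords?$/],
};
const globalRules = [/^utm_/, /^fbclid$/, /^gclid$/];

function cleanUrl(href) {
  const url = new URL(href);
  const rules = [...globalRules, ...(perDomainRules[url.hostname] || [])];
  for (const key of [...url.searchParams.keys()]) {
    if (rules.some((re) => re.test(key))) url.searchParams.delete(key);
  }
  return url.toString();
}
```

With per-domain rules, site-specific clutter like Amazon's `qid` can be removed without risking breakage on other sites where a parameter of the same name might be meaningful.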


I've written a simple bookmarklet I call UrchinKiller that does the job:

    javascript:window.location=window.location.href.replace(/\?([^#]*)/,function(_,s){s=s.split('&').filter(function(v){return(!/^utm_/.test(v))}).join('&');return(s?'?'+s:'')});


Won't this only apply _after_ you've loaded the page? Doesn't really help prevent the tracking that most of those parameters are used for.


Could still be handy for people who don't mind loading the page with the big URL but might later want to copy-paste it somewhere in a cleaner format.


I generally use it just before I trigger the "post to News.YC" bookmarklet. ;-)

    javascript:window.location=%22http://news.ycombinator.com/submitlink?u=%22+encodeURIComponent(document.location)+%22&t=%22+encodeURIComponent(document.title)


On Firefox, I dropped this extension for the more versatile Request Control[1] as Neat URL failed to work for a long time; not sure about now.

[1]: https://addons.mozilla.org/en-US/firefox/addon/requestcontro...


I hate gigantic URLs, but not as much as I hate extensions that want to access my data for all websites.


Grab the source code, review it for AJAX or DOM manipulations and install it via Developer mode? Also, set reminders to review diffs and update it as new releases occur...


That seems like a lot of work for an extension that just makes URLs prettier.


Most users of this extension install it because it nullifies some forms of browser tracking, not because it makes URLs prettier.

I've shared instructions for inspecting the source code of a Firefox add-on elsewhere in this discussion:

https://news.ycombinator.com/item?id=22388603

WebExtensions like Neat URL continue to work even if you don't update them. You only have to inspect the extension code once (no developer mode needed) if you are skeptical, and you don't have to update it if you don't want to.


The paranoid in me says there’s no point in installing the web store version unless you download and inspect /it/. The source code published isn’t necessarily the version distributed, though obviously injecting code in the CI pipeline would be... excessive. This goes back to the trusting-trust problem. https://www.schneier.com/blog/archives/2006/01/countering_tr... If someone managed to slip an exploit into a release of webpack, well, there goes the Internet ;-)


I like to squeeze Amazon links down to https://amazon.com/dp/B0085NTQJK - removing the SEO inlining of the article name.

Removing those pesky GET parameters ain't enough to keep a BOM link list in line.
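That squeeze can be sketched as a small function. The assumption that ASINs are 10 alphanumeric characters under a `/dp/` or `/gp/product/` path segment is based on commonly observed Amazon URLs, not any official spec:

```javascript
// Squeeze an Amazon product URL down to its canonical /dp/ASIN form.
// Assumes ASINs are 10 alphanumeric characters (an observed convention).
function squeezeAmazonLink(href) {
  const url = new URL(href);
  const m = url.pathname.match(/\/(?:dp|gp\/product)\/([A-Z0-9]{10})/i);
  return m ? `${url.origin}/dp/${m[1]}` : href;
}

// Example:
// squeezeAmazonLink("https://www.amazon.com/Some-Product-Name/dp/B0085NTQJK/ref=sr_1_1?keywords=x")
//   -> "https://www.amazon.com/dp/B0085NTQJK"
```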


See, I actually like it when URLs contain human-readable text. Cryptic URLs like YouTube's are meaningless to me, requiring me to actually visit the page to see what it is. That's also why I dislike link shorteners.

I really like Stack Overflow's URLs because of how flexible they are. You can put anything in the title slug, since only the ID is used. When self-documenting links aren't useful, then you can omit the title slug completely.

For example, these four links all point to the same page (in a code block so they don't get truncated):

   https://stackoverflow.com/questions/37358364/rules-for-the-use-of-angle-brackets-in-typescript 
   https://stackoverflow.com/questions/37358364/typescript-angle-brackets 
   https://stackoverflow.com/questions/37358364
   https://stackoverflow.com/q/37358364
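Since only the numeric ID is significant in those URLs, canonicalizing them to the shortest form is a one-regex job. A sketch, assuming Stack Overflow's observed `/questions/<id>/<slug>` and `/q/<id>` path patterns:

```javascript
// Canonicalize a Stack Overflow question URL to its shortest form.
// Only the numeric question ID matters; the title slug is ignored.
function shortStackOverflowUrl(href) {
  const url = new URL(href);
  const m = url.pathname.match(/^\/(?:questions|q)\/(\d+)/);
  return m ? `${url.origin}/q/${m[1]}` : href;
}
```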


You can put anything you want, but if it's not what the page expects, it will cause a redirect, which will slow down navigation just a bit.


What I enjoy about Amazon links is that what comes after the product slug doesn't matter at all: https://www.amazon.com/dp/B0085NTQJK/Definitely-Not-Underpan...

I've had a few occasions to use this to good effect


It'd be cool to implement this as a browser extension, that either silently strips the parameters, or opens a dialogue upon a link click telling you what the extension is allowed to strip out before proceeding.



Ah, thanks! I couldn't tell that was the case from the readme.


In your defense, I don't see a link to the extension anywhere on the GitHub page.



As another approach, we could use a userscript with the Violentmonkey/Greasemonkey browser add-ons. This reminds me of the userscript era with Firefox versions below 57, when I could do almost anything with a userscript. Nowadays, there are some changes and limitations.

https://github.com/cloux/LinkSanitizer


I've been using Request Control (https://github.com/tumpio/requestcontrol) which does some of the same stuff, though it looks like Neat URL and Clean URL do more.


Cool. Works well on Firefox Mobile. I installed it successfully on Kiwi Browser (Chromium Mobile based) but couldn't edit the extension options; some glitch. Hope the author adds a feature like importing a config file.


I don't get this.

My solution is to use uBlock Origin and block tracking altogether.

This still allows GA to track you; it just removes some information from the tracking beacon.


Couldn't website owners still analyze their server logs to track users via tracking parameters?


Anyone know if there's an NPM library that does this? If not I may have to abstract this out into one for a project (thanks for making it!!).


Won't somebody please think of the advertisers?!


complaining from advertisement industry in 3...2...1...


It doesn't clean AMP URLs?



