Blocking website ads with a hosts file (debugandrelease.blogspot.com)
138 points by bobblywobbles 24 days ago | 108 comments



It is much easier to use a HOSTS file as a whitelist rather than some sort of blacklist.

HOSTS is useful but limited. For example, it does not allow for wildcards like DNS.

Unbound is included in many distributions nowadays and it has plenty of features now that can make it act like a HOSTS file or authoritative server. These work well for ad blocking.

Blocking ads is like blocking traffic using a firewall. Firewall rulesets often block everything by default and then lines are added to whitelist desired traffic. This can be easier to manage than allowing every domain by default and trying to come up with a list of all undesired domains. The same firewall-like approach has worked well for me in blocking ads. All domains blocked by default; desired domains are whitelisted.
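A minimal sketch of that default-deny idea, assuming the whitelist is a simple name-to-address table like a HOSTS file. The hostnames and the 192.0.2.x addresses are placeholders for illustration, not a real configuration:

```python
from typing import Dict, Optional

# Hypothetical whitelist: every name not listed here is treated as blocked.
# The addresses come from the 192.0.2.0/24 documentation range, not real sites.
WHITELIST: Dict[str, str] = {
    "news.example.org": "192.0.2.10",
    "mail.example.org": "192.0.2.20",
}


def resolve(hostname: str, whitelist: Dict[str, str] = WHITELIST) -> Optional[str]:
    """Return the whitelisted address, or None (blocked) for anything else."""
    # Normalise case and a trailing dot so "News.Example.org." still matches.
    return whitelist.get(hostname.lower().rstrip("."))


print(resolve("news.example.org"))  # listed, so it resolves
print(resolve("ads.example.net"))   # not listed, so it is blocked by default
```

The point is only that the blocked set never has to be enumerated; anything missing from the table fails to resolve.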

If you use Chrome browser, it will even help you formulate your whitelist. Go to chrome://site-engagement after some routine browsing.

You might find there are some shocking entries in those massive blocking HOSTS files popular on the internet if you ever choose to read one. Sites you will never, ever visit in your lifetime online. Grossly inefficient.

It also appears sections have been cut and pasted from a variety of disparate sources without any sort of verification.

I tried to read through one of these massive HOSTS files once and had to stop as I found it too repulsive. There were far too many dark corners of the web listed that the average web user will never visit. Makes one wonder how the authors even know about these domains.

People's browsing habits are not all the same. A "one-size fits all" HOSTS file seems inappropriate.


Sounds interesting, care to elaborate a bit? How do you deal with, eg: CDNs? Whitelist *.cloudfront.net, I suppose? How often do you revisit your whitelist?


I have found I can block cloudfront domains by default with almost no inconvenience.

Occasionally something like a download link, where the webmaster has chosen to use cloudfront for that specific resource, might require that I whitelist a cloudfront domain temporarily. If the domain has a unique subdomain and I am confident no ads are ever served from that subdomain, I might whitelist it permanently.

Every user is different and visits different websites. Each user's needs are to some extent unique. I think you have to find what works for you. No one can do this for you.

The more engaged you become in blocking ads, and the less you rely 100% on a third party to take care of it for you, the more familiar you become with exactly which domains you need to access to accomplish whatever it is you are doing on the web. That knowledge allows you to make your whitelist.

Meanwhile anyone using Chrome can tap into the built-in diagnostics via chrome://chrome-urls to get a very quick and easy analysis of what domains they are requesting and the ones they actually need:

    chrome://site-engagement

To answer the second question, if I am visiting new sites, then the whitelist is modified accordingly. Otherwise I have found the majority of IP addresses to be quite stable. If I am visiting many random websites, eventually I will find one or two that are changing their address either periodically or permanently.

Personally I like to know if websites are changing their IP address. I think there can be good and bad reasons for changing IP address. When one is using whitelisting instead of unrestricted recursive queries to a DNS cache then it becomes easy to identify websites that are changing IP address and to monitor the changes.
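One way to monitor such changes could look like the sketch below. It assumes the whitelist records a set of IPv4 addresses per hostname; the function names and the sample data are made up for illustration, and only `socket.gethostbyname_ex` is a real library call:

```python
import socket
from typing import Callable, Dict, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def changed_hosts(
    recorded: Dict[str, Set[str]],
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> Dict[str, Set[str]]:
    """Return only the hosts whose current addresses differ from the recorded ones."""
    changes = {}
    for hostname, known in recorded.items():
        current = resolver(hostname)
        if current != known:
            changes[hostname] = current
    return changes


# Demonstration with a fake resolver so that no network access is needed.
recorded = {"a.example.org": {"192.0.2.1"}, "b.example.org": {"192.0.2.2"}}
fake_answers = {"a.example.org": {"192.0.2.1"}, "b.example.org": {"192.0.2.99"}}
print(changed_hosts(recorded, fake_answers.__getitem__))
```

Run periodically with the real resolver, this would flag only the whitelisted names whose addresses moved.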


TIL... While I don't spend much time in Chrome's configs and settings, I liked peering into the results of my list when viewing chrome://site-engagement

Thanks for sharing!


I have been using this custom host file for a few months and it works like a charm. Just have to update it from time to time (but it can be automated).

https://github.com/StevenBlack/hosts

"This repository consolidates several reputable hosts files, and merges them into a unified hosts file with duplicates removed. A variety of tailored hosts files are provided."


That's very comprehensive.

I wonder if you could circumvent the hosts method by rotating through unique subdomains as your ads server. My understanding is that you can't wildcard the hosts file.


Yes, hosts files do not support wildcards. There are some solutions for blocking advertisers that use tricks like the one you suggest. A PiHole is able to do wildcard blocking. Also, uBlock Origin (which accepts hosts-formatted lists) will automatically block any subdomain of a blocked root. So as long as the parent domain is also blocked, any subdomains would also be included.
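The subdomain behaviour can be sketched roughly as follows. This is not uBlock Origin's or Pi-hole's actual matching code, just an illustration of the idea: a name counts as blocked if it, or any parent domain, appears in a hosts-formatted list. The function names and sample domains are assumptions:

```python
from typing import Set


def parse_blocked(hosts_text: str) -> Set[str]:
    """Collect the hostnames from hosts-formatted text, ignoring comments."""
    blocked = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; everything after it is a hostname.
        blocked.update(name.lower() for name in fields[1:])
    return blocked


def is_blocked(hostname: str, blocked: Set[str]) -> bool:
    """True if hostname or any of its parent domains is in the blocked set."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


blocked = parse_blocked("0.0.0.0 ads.example.com  # sample entry\n")
print(is_blocked("x9f3.ads.example.com", blocked))  # rotating subdomain of a blocked parent
print(is_blocked("example.com", blocked))           # the parent's parent is not listed
```

A plain hosts file only does the exact-match part, which is why rotating subdomains get past it.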


Any ideas on how to get it to work on MacOS? On High Sierra the browser seems to ignore the /etc/hosts file.


/etc/hosts is indeed used by all versions of MacOS.

On MacOS, try this in a terminal window to flush your cache.

    sudo dscacheutil -flushcache;sudo killall -HUP mDNSResponder


A good combination is uBlock Origin and Nano Defender (both correctly configured, there are steps you can follow online). uBlock Origin does a good job of blocking most stuff, and Nano Defender does a good job of stopping sites from detecting you have blocked their adverts, thus stopping the website from displaying a "Hey, you have an AdBlock, we need adverts to keep this site free. Disable your AdBlock and refresh to view this content".


Am I the only one that likes the "Hey, you have an AdBlock" popups?

They come up and I spend a few seconds deciding if it's important to me to read what is behind it, and 95% of the time that answer is "no". Saves me a TON of time. :-)


When I see them I make a mental note to not visit that site again. That's getting more difficult with more sites going that route, though.


I just add the site to my hosts file pointing to localhost.


Ahh, I've always wondered if there was a way to get around those detections. I have learned something new today. Thank you!


I love how the blog itself serves pixels and ads galore. Apparently it's ok if it does yield revenue for the right persons.


It's the perfect plan. A "how to block ads" guide is going to attract tons of users who aren't already blocking ads.


Haha. On a slight tangent, you could create a 'business' around - "The perfect home security system before you go on holiday". Enter your name, address, and last date you need it installed by for a quote.


I do serve ads in order to supplement my income to fund outreach efforts I am a part of. Like I said, the lifeblood of entrepreneurs :P


The money goes to google, not the author of the blog post


I disabled the ad-blocker on a Liverpool Echo page because I wanted to watch the video. 421 cookies and one reboot later I was able to watch the advert before the video and then the 50 second video clip.

I presume that the 421 cookies are tracking something, only a hundred or so go to the Liverpool Echo, the others go to 20 or so other places. Nonetheless there are not many people reading local papers online, it is too much effort wading through the junk that gets downloaded. 6 megabytes to display 15 sentences and a video embed is a bit much.

In the olden days the newspapers were read by many people. Nowadays the newspaper readers are 'read' by many people. It has gone back to front.

How often does anyone here see a link to a newspaper and think to jump straight to the comments in order to see if the article is worth reading? For me this does not happen if the link is to a blog or other site likely to be sensible with the inline spam.

The sooner this ad-spam business dies off the better.


My local weekly paper has very little content, and they want £2 for it, that’s too rich.

Oddly they put every story on Twitter. And email me about it. God knows where they get the money from.

I do weep for the lack of coverage of local democracy though. Where journalism dies, political manipulation and blatant lies run rife. All we have left is Private Eye to cover the most egregious cases.


Thanks for the note about coverage of local democracy correlating with corruption.

Perhaps you listened to the same Hidden Brain episode?

Starving The Watchdog: Who Foots The Bill When Newspapers Disappear?

https://www.npr.org/2018/12/09/675092808/starving-the-watchd...


>The sooner this ad-spam business dies off the better.

They have not yet. The internet and computers do give a lot more control to the user, though. Unfortunately, ads are not going anywhere.


Unless you're using a platform where you can't run an ad blocker (and I can't think of any), a hosts file (or a pihole) is a hamfisted approach compared to having ublock origin.


I used this before I switched to Pi-hole. It worked quite well in combination: hosts file + uBlock Origin + uMatrix. One thing though, more and more sites now serve ads and content from the same domain, meaning if you block ads at DNS level you'll block the content too.


I run Pi-hole too; it can handle much more than the hosts file of a Windows computer. It has been a while since I used the hosts file to block ads, but at that time the computer could lock up for quite a while now and then, and the problem disappeared when I cleared the hosts file again.

It's really neat to get automatic protection for all your devices at the same time with the Pi-hole.

Just add uBlock to the browser to remove the remaining ads and get a much smoother web experience without distractions :-)


also: hosts file can cause problems with things like Windows Update and other software that you might want to keep working. Pi-hole is easier to disable. I always forget that I installed some hosts file blocks with Blackbird ( https://www.getblackbird.net/ ) which is an otherwise pretty nice tool (aside from that and how it's unclear if you're enabling or disabling something, since the switches "toggle" something instead of expressing that you want to disable or enable it specifically).

+1 for pihole; rPis / odroids / SBCs / NUCs / home servers are easy enough to run that it's worth it.


Second Pi-hole + uBlock Origin. Just a bummer that the Safari fork of uBlock Origin seems to be EOL because of the new restrictions in Safari.


Same. My Macbook also took significantly longer to boot when using a large host file - like an approximately 45 seconds freeze...


It's kind of crazy how we've been playing cat-and-mouse games between ads and ad-blockers for over a decade and yet websites still serve ads from third party domains. If they started serving ads from their own domain and randomized the IDs of elements, then they would be much harder to block.



Wow. I automatically assumed that you could still circumvent the random div IDs by just matching against the text itself (https://github.com/gorhill/uBlock/wiki/Procedural-cosmetic-f...), but they even obfuscate that!


That's extreme!

I would have just made it an image with the text, from a random url.


Clever. I circumvented it by not going to Facebook.


Not enough people have ad blockers yet to change the industry I guess?

The ability to track and monitor internet users is very powerful and lucrative. They won't give it up so easily.


"0.0.0.0 is the invalid, un-routable address."

That's apparently a Windows-centric statement. In Linux, 0.0.0.0 is the same as 127.0.0.1, whereas 0.0.0.1 works as your invalid address.


It's true both in Windows and Linux that it's a non-routable address. It's false both in Windows and Linux that it's an invalid address. It's also false that 0.0.0.0 is "the same as 127.0.0.1" in general. That it's a valid but non-routable address makes it a good address for applications to assign a special purpose to. You'll find that in some cases 0.0.0.0 means localhost, but in other cases it has other meanings.

For example, you might be in for a nasty surprise if you assume that "nc -l 0.0.0.0 1234" is equivalent to "nc -l 127.0.0.1 1234".


By golly, Linux does map 0.0.0.0 to localhost. That produced a bunch of searches to try to find out why it does that. Nothing found. At this point I strongly suspect that Linux is simply exhibiting incorrect behaviour...

It does it for :: as well...


0.0.0.0 is the address programs will listen on to be able to respond to any IP address assigned to the system.

When you are setting up a socket to listen for connections on a particular port you would specify 0.0.0.0 so then things can connect from anywhere like localhost or on any of the many possible IP addresses assigned to the machine, or you can specify a particular IP address and only be able to get traffic from that. For example if you wanted a program only reachable from the same machine you could listen on localhost (127.0.0.1) and then nothing external could directly connect to that particular service.
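A small sketch of the two bind choices, using only the standard `socket` module. The helper names are made up for illustration. It can only show the loopback side on a single machine; the real difference (whether other hosts can connect) needs a second machine or a non-loopback address to observe:

```python
import socket


def open_listener(bind_host: str) -> socket.socket:
    """Listen on bind_host using a free TCP port chosen by the OS."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_host, 0))  # port 0 asks the OS to pick any free port
    server.listen(1)
    return server


def reachable(address: str, port: int) -> bool:
    """True if a TCP connection to (address, port) succeeds within 2 seconds."""
    try:
        with socket.create_connection((address, port), timeout=2):
            return True
    except OSError:
        return False


# Wildcard listener: accepts connections addressed to any local IPv4 address,
# which includes the loopback address.
with open_listener("0.0.0.0") as wildcard:
    print(wildcard.getsockname()[0], reachable("127.0.0.1", wildcard.getsockname()[1]))

# Loopback-only listener: the same local test passes, but connections arriving
# on the machine's other addresses would not be accepted.
with open_listener("127.0.0.1") as local_only:
    print(local_only.getsockname()[0], reachable("127.0.0.1", local_only.getsockname()[1]))
```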


I've used "telnet 0", for example, as a quick path to localhost on Linux for decades.


The very first thing I do when I buy a new Android phone is to unlock it in order to install AdAway https://adaway.org/



But unlike rooting and using Adaway, all the other options on Android act as a VPN, and prevent you from using any other VPN, which makes them less handy.


This is awesome, it's just called DNSfilter on F-Droid. :)


I use Blokada on my phone, which runs an adblocker as a local-device VPN; a neat trick to do this without needing to root the phone (although mine is also rooted).


Can you use the local-device VPN in combination with an actual VPN? Or it is automatically turned off when connecting to a different network?


Unfortunately, you can only use one VPN at a time on Android. I'm not sure how you would go about blocking ads on an unrooted phone while simultaneously using a VPN. Samsung phones do have a workaround using knox, but it requires re-generating a developer key every few months and is too much trouble for most people.


Never heard about that; I'll give it a try



Yes, and these aren't that easy to manage. It is still doable, though, and easier thanks to a really helpful and big community. Some tools allow for regex, wildcards and similar.

The bigger issue comes from the likes of Google/YouTube/Facebook who host their ads on the same domain as their main website. If you block the ad domains, you block the main domains as a whole. In this case, the only way to block ads is through an in-browser addon.


You might be interested in this ticket: Encrypted subdomains for routing ads https://github.com/StevenBlack/hosts/issues/801

A PiHole could do wildcard blocking for the subdomain - but as in the ticket where the content for the site is also served from the same encrypted subdomains - nothing can be done. uBlock origin filters also fail at blocking these requests. After some research, I found a potential solution is to block off of request headers, since the ad tool is using headers as a way to send data. Unfortunately I'm unaware of any browser based tool that is able to block requests based on header content.

It's very interesting that this encrypted subdomain tool is only enabled in Chrome and not Firefox. It will also detect if the developer tools are open or not. WebMD is a good example where this tool is being used.


There's a nice and maintained host file here which blackholes most ad sites: https://someonewhocares.org/hosts/. As a bonus, it blackholes some shock sites as well.


The zero[1] version of it works a little faster.

I am using the Unified hosts file[2] (mentioned in the article), it is a great way to combine many other hosts including Dan Pollock's list.

[1]https://someonewhocares.org/hosts/zero/

[2] https://github.com/StevenBlack/hosts
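Combining several hosts lists into one with duplicates removed can be sketched like this. It is not the Unified hosts project's actual code; the function name and sample entries are made up, and it assumes the inputs contain only block entries (a real merge would need to leave lines such as the localhost mapping alone):

```python
from typing import Iterable, List


def merge_hosts(sources: Iterable[str], sink: str = "0.0.0.0") -> List[str]:
    """Merge hosts-formatted texts into one deduplicated list of entries."""
    seen = set()
    merged = []
    for text in sources:
        for line in text.splitlines():
            fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
            for name in fields[1:]:                 # fields[0] is the address
                name = name.lower()
                if name not in seen:
                    seen.add(name)
                    merged.append(f"{sink} {name}")
    return merged


list_a = "127.0.0.1 ads.example.com  # from list A\n0.0.0.0 tracker.example.net\n"
list_b = "0.0.0.0 ADS.example.com\n# a comment line\n0.0.0.0 banner.example.org\n"
for entry in merge_hosts([list_a, list_b]):
    print(entry)
```

Rewriting every entry to the same sink address also covers the difference between the 127.0.0.1 and 0.0.0.0 variants of a list.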


How big would the list need to get before it starts affecting performance? There is obviously some kind of lookup for every HTTP request against the hosts file. I assume the hosts file is converted into some sort of hash list?


The problem is, some (poorly written) websites don't work without the ads. Sometimes you just don't care and close the tab, but sometimes you don't have a choice, and in that case disabling the hosts file is a bit of a hassle. I prefer simple extensions like uBlock Origin which do all the work for me and that I can enable/disable as needed.


I've seen a lot of websites that don't work without scripts in my time, but never one that doesn't work without ads.

It would be possible to make one like that by hosting your content and your ads on the same domain, that would trip up naive hostfile blockers, but of course if companies were doing this quite a lot of people who habitually block ads wouldn't mind them doing so, since one of the key complaints against ads is data harvesting by third party ad providers.


I've seen websites being broken because they load some ad JS, and when it fails it throws a JS error which prevents the rest of the script from working. Also some websites wrap their outside URLs in tracking URLs, and these break as well with ad blockers.


The ones that explicitly detect adblockers and refuse to show content are usually sites that deal with more... shady material. When I need something from one of those, I find that Google's text-only cache is often enough to get the content, and if not it's really a question of how much the content is worth to me --- the back button is only a click away. What I won't do is enable JS, however; I'd sooner reverse-engineer the script and figure out how to get the content it loads than let dubious arbitrary code run.

But like I said, the back button is effortless and if your content is not rare, I'm going elsewhere.


This is all well and good until Google decides to force the use of DNS-over-HTTPS and completely bypasses the host operating system. Browsers have also done this for certificate trust lists. This takes more and more power away from the users.


Good thing there are alternate browsers then! :)


Such as?

DNS over HTTPS isn't just a Chrome issue; Firefox are the ones who are actually shoving it down your throat.

I know most users here on HN are firefox users, but come on... It's an issue with all browsers, not individual.


Firefox allows me to select which server to connect to. I have no problem with DNS over HTTPS as a technology.


This is already happening with chrome async dns. There is a way to fix it for now http://ba.net/adblock/vpn/fix-dns-adblocker-chrome.html


Is Privoxy still a good go to for this stuff? Used to use it on everything back in the day, but haven’t really used it as much in recent years.

https://www.privoxy.org


Somewhat.

Privoxy can disable host requests, but for HTTPS traffic will no longer disable specific page elements.


One downside to this approach is that you still see where the banner was with an "address not found" block. I switched to uBlock origin some time ago which I prefer as 1) it collapses the ad blocks so you never realise they were there, and 2) it auto-updates the block lists for you.


I would say, that rather is an advantage, not a disadvantage. It is good to know after all that /something/ happened so you know that a page might be broken in some way instead of it failing silently behind your back.


That's true if you're prepared to update the file yourself all the time. In my experience there are a lot of URLs to maintain and I am ok with offloading that trust to a 3rd party like Easylist who will maintain the list for me.

Admittedly, I do occasionally have to turn the adblocker off to get a site to work, but this is maybe once a month.

This is the reason I haven't installed Pi-hole. I understand that a broken site may be because of the adblocker and can turn it off in my browser, but a less tech savvy user may not know this. And if they are on a Pi-hole network they won't know or be able to turn it off (I understand there is a whitelist but I believe this is only configured by an admin - could be wrong here).


I use something like this on a self-maintained VPN server which I access from my phone. It both reduces adverts and, crucially, reduces data usage.

I'd probably happily pay for a commercial VPN which had similar and better functionality.



I prefer to just use Pi-hole, but you could use many of its lists via the host file as well. I use many of the ones listed here: https://firebog.net


I feel this is also a good option in case you want to block ads network-wide.


I'm wondering whether having a huge hosts file could create any performance issues since it needs to get parsed regularly I assume.

Does anybody have experience in this regard? What about a basic version with ~100 entries?


I have like 30,000 entries in mine and there are no performance issues.

https://github.com/StevenBlack/hosts


Thanks!


I would imagine very little. As far as I can tell, the resolver is called first and it looks in /etc/hosts before querying DNS, so the hosts file takes precedence.


Next step is to do this at router level for the whole household at once.


Yup. pi-hole[1] works great for this and works for FireTV, AppleTV, Twitch and other Streaming services.

[1]https://pi-hole.net/


That's exactly where I wanted to take the article, but, one step at a time.


I too use Steven Black's hosts file. I can tell when I forget to implement it by the sound of my cpu fan. That said I'm fighting with one big limitation, and that's the fact that I do understand that some sites are ad supported and I'd like to support those sites. I wish there was a way with the hosts method to enable ads for just those sites without also enabling all the tracking that goes with it.


Yes, sadly ads and tracking have become the same. While I certainly don't enjoy ads and have reservations about the ethics of ads altogether, I'm 100% totally against tracking, profiling, and targeting. I allow ads on DuckDuckGo since they are related to what I'm actively searching, but other than that I block all ads since I know they are also tracking me.


There was a chrome plugin that allowed you to modify the hosts file from an extension window.

I wonder if there'd be something that let you allow ads on the current page and just removed it from the hosts file.


My browser being able to modify network configuration files. Now there's something I don't want.


If you like that, you'll love the 'all_urls' permission which most apps request.


It's an arms race.

If you block using a hosts file, the ads will be requested from an IP address, thus skipping a DNS lookup.

If certain IP addresses start getting blocked, they'll move to IPv6 and have an infinite dynamic supply, which are randomly picked as the web page is served.
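To put a number on "infinite": a quick calculation, assuming (hypothetically) that an ad network controls just one /64 prefix. The prefix below is from the 2001:db8::/32 documentation range, and the helper name is made up:

```python
import ipaddress
import random

# Hypothetical prefix from the IPv6 documentation range.
prefix = ipaddress.ip_network("2001:db8:0:1::/64")
print(prefix.num_addresses)  # 2**64 addresses in a single /64


def random_address(network: ipaddress.IPv6Network, rng: random.Random) -> ipaddress.IPv6Address:
    """Pick one address from the network uniformly at random."""
    return network[rng.randrange(network.num_addresses)]


print(random_address(prefix, random.Random()))
```

Not infinite, but far too many addresses to list one by one; blocking would have to happen per prefix rather than per address.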

It is an arms race.

Also: advertising ruins every medium it ever touches. There is no self policing or sense of restraint or any line that could be crossed leading to a feeling of shame.


It is an arms race, but it's worse than the progression you listed. Since the easy solution to IPv6-hosted ads is to just block IPv6 (plus anyone without IPv6 wouldn't see the ads), they just randomly generate div elements with random names, making it impossible to distinguish the ads from real content.


Blocking ads via DNS lists? How is that different from adguard.com? Or more specifically AdGuard DNS?


It's simply another way to block ads. Not better nor worse. A mental and technical exercise that's all.


I think the thought process on this is that you have control over this, but don't have control with AdGuard DNS.


When I see things like this I think of the average user who may not be technically savvy enough to implement this type of configuration. If Google blocks things like uBlock Origin and ABP, how can we help less tech-savvy users have an ad-free experience?


>how can we help less tech-savvy users have an ad-free experience

I'd advocate for encouraging adoption of more flexible browsers, if it comes to that. Firefox is far from perfect (see Looking Glass), but for less tech-savvy users it's probably the best option (and you'll get better results than a hosts file or DNS server since the ad blocker can actually fix the page layout and modify elements, too!).


Suggest installing a different browser (eg Firefox) which supports uBlock Origin?


I have this running on the VPS I run a SOCKS proxy on and surf through. Works great. It really speeds up sites with annoying ads or ads with scripts that hang.

The annoying thing is that it blocks direct links on deal sites etc.


Isn't that what a Pi-Hole does for you, just for the whole network, not just the computer with the hosts file?

https://pi-hole.net


"Host Flash is the ultimate Linux hosts file manager. "

https://host-flash.com/


Any way to use this on Android?

I already use Firefox + ublock origin and it is enough for my browsing.

However I am looking for something to block ads also on apps.


Take a look at https://blokada.org/

It blocks ads in news and other apps for me.


Has it been working fine for you? It quite often stops working for me and I have to restart the service.

DNS66[1] has been working great for me but for some reason it misses a few ads unlike Blokada which never missed a single ad for me.

[1] https://github.com/julian-klode/dns66


I've had to give up on both which is a pain for me. Reason being that for some reason no connections work when it's running. As soon as I disable it internet works again. Haven't bothered figuring out what the issue is yet.


Yeah, totally. Set myself a Pi-hole + PiVPN up on a small Vultr instance. Routing all Smartphone traffic through it. Costs me 3.50 € a month and blocks half of all DNS requests. Haven't seen a single external ad since then.


AdAway https://adaway.org/ (Only on F-Droid), requires root.


DNS66


another to toss on the pile: https://github.com/jakeogh/dnsgate


How can I make this work on macOS?

In my case: 10.14.2. Mojave


edit /etc/hosts


Install Gas Mask, easy way to paste file


This is essentially how ublock works


Does it work for intrusive trackers too?



