Adblock via /etc/hosts (github.com/stevenblack)
273 points by lpsz on Feb 12, 2016 | 139 comments



I did this for a while to block ads and to keep Google from collecting my web browsing history via Google Analytics. During that time I witnessed a strange phenomenon. Every time I added "127.0.0.1 www.google-analytics.com" to C:\Windows\System32\Drivers\etc\hosts, I saw the line removed from the file some hours later. Although I had added tens of lines, only the Google Analytics line was ever removed. IIRC I finally decided to figure out what caused the removal. I used Filemon to watch for file changes, but the line got removed again while I was watching the file and nothing appeared in the log. I suspected Ring-0 processes were secretly running and causing the removal, but I knew nothing about the Windows kernel, so I gave up there. To this day I wonder what the cause was.


Instead of Filemon I'd suggest firing up Process Monitor [1] with a filter of "path contains system32\drivers\etc\hosts" and then Filter -> Drop Filtered Events.

Let this run while you go about your normal work, then check back after you notice the change. Look through the Operation column for WriteFile or something similar, then see what Process Name did it. This'll let you figure out what's actually making the change and you can appropriately assign blame.

[1] https://technet.microsoft.com/en-us/sysinternals/processmoni...


On the subject of Sysinternals Process Monitor: did you know procmon.exe REQUIRES the Workstation service to be running in order to start? It uses it to enumerate something and will silently die without it. This is not documented anywhere and is pretty bogus.

Older versions worked fine without this service. The dependency was silently added somewhere between the Win7 and Win8 releases.


Hmm, interesting. No, I didn't... I'll poke at this and maybe open a Premier ticket on it.


Some security products (like Windows Defender) as well as some VPN applications are known to modify the hosts file. Might be worth checking out.

Also see: http://security.stackexchange.com/questions/6883/something-i...


Did you have local Google services running, like the Google Updater (which afaik also comes with Chrome)? Google also adds some entries to the Task Scheduler, so you can check there to see what is getting called.

Though I believe you should have seen something in Filemon.


I thought I had something similar. We use Junos Pulse and it rewrites your /etc/hosts file. I think it takes a backup at some point and rebuilds the file from that when it needs to. This means some local changes just disappear. Nothing sinister though, afaik.


Probably anti-virus software preventing malware from hijacking google analytics.


Hopefully all the child posts will see this too.

To edit the hosts file, you need to have admin privileges. That means closing whatever editor you're using, reopening it with 'run as administrator', and then opening the hosts file. You need to do this even if you are on an admin account.

Another way to do it is to open the hosts file with normal privileges, edit it, save it somewhere else, and paste it into the drivers\etc folder. The system will ask you if you want to run as admin, and you need to say 'yes'.

Nothing could (or should, I guess) be changing the hosts file otherwise (AFAIK, my source being many, many SO posts and random forums) without it being given explicit admin privileges when it attempts to change the file.


Just put the entries in the "hosts" file at router-level (e.g. using OpenWRT).


And duct tape said router to your laptop so you can take it with you everywhere you go?


Running a local dnsmasq server is effectively the same thing.


Yup. You could also use something like PeerBlock which gives you a little easier control.


In Win10, fontdrvhost.exe tries to modify Windows Firewall rules every single day to whitelist itself :o

```
A change was made to the Windows Firewall exception list. A rule was added.

Profile Changed: All

Added Rule:
    Rule ID: {59F33BF3-EAFF-424C-BB26-C2DF4A709398}
    Rule Name: Usermode Font Driver Host
```

Why would a simple Usermode Font Driver Host need internet access??!?!

binisoft.org Windows Firewall Control has an option to safeguard firewall rules and automagically deletes all unauthorized (by the only person that matters - ME) rules.


Wow, really? Can anyone back up a similar story? That's really interesting if so.


Mine is the reverse. I have this line added to /etc/hosts on my mac, a local A record I want to play with. Now, I am done with my project, and I wanted to remove the A record. I keep rebooting and that line persists. God knows why.


Honest question: why use 127.0.0.1 instead of 0.0.0.0?

EDIT: Now I've read the discussion below regarding this matter. No need to answer, I guess. I asked before reading all the comments, sorry.


Assuming you meant why not: "Using 0.0.0.0 is faster because you don't have to wait for a timeout. It also does not interfere with a web server that may be running on the local PC."


He said he used 127.0.0.1 for google analytics. I was asking why he used 127.0.0.1 and not 0.0.0.0, exactly for the reasons you wrote. Sorry if that wasn't clear.


Maybe ad-malware?


See, the thing is: what if a website is broken because of the hosts file? You can't easily re-enable ads for just the one website you need.

A situation we can all imagine ourselves in: you need to check Google Analytics for your website/company site. You can't, because it's blocked at the hosts level.

What solution would there be for this use case?


I've been using the hosts file method for years and have never had an issue with checking Google analytics. I use the "Someone who cares" link.

Aside from Google sponsored links and the odd ad-sponsored link on pseudo-news sites not working (due to them being tracking URLs), I can't say it ever gets in my way.

However, to answer your question: these days I run dnsmasq on my home server and have my DHCP server assign that as my primary DNS. So every device (phone, laptop, smart TV, etc.) gets its ads blocked as well, which is particularly good for my TV, as it's bad enough having regular adverts on TV without LG pushing out sponsored content too. If there were a rare occasion when I needed to turn off my ad blocking, I'd just change the DNS to 8.8.8.8 (Google DNS), then switch back to my dnsmasq server once I was done. The only complication is that I sometimes need to close and reopen the browser, because that particular application caches the DNS lookup.

The nice thing about using dnsmasq is that you can import those hosts files verbatim. Which means your update script can be simple.
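
A minimal sketch of what such an update script might do once the lists are downloaded, assuming every blocking line is in the usual `IP hostname` form. The name `merge_hosts` and the sample hostnames are made up for illustration; the download step and the dnsmasq reload are left out.

```python
def merge_hosts(sources):
    """Merge several hosts-format texts, keeping the first line seen per hostname."""
    seen = set()
    merged = []
    for text in sources:
        for raw in text.splitlines():
            # Drop trailing comments and surrounding whitespace.
            line = raw.split("#", 1)[0].strip()
            parts = line.split()
            if len(parts) < 2:
                continue  # blank, comment-only, or malformed line
            ip, hosts = parts[0], parts[1:]
            for host in hosts:
                key = host.lower()  # hostnames are case-insensitive
                if key not in seen:
                    seen.add(key)
                    merged.append(f"{ip} {key}")
    return merged
```

The result is still plain hosts format, so it can be written to a file and handed to dnsmasq unchanged.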


I used to block various domains that served TV ads for UK Channel 4, but they hosted the client and that not detect ad loading. Seems you could probably truncate the video stream, maybe using iptables, but their ad load has reduced, so I'm not motivated to try right now.


The weirdest side effect I've had was with the Sky HD box. If it was connected via ethernet then it wouldn't power up while my ad blocking was enabled. I was able to replicate this behavior by enabling and disabling my ad blocking, so my Sky box was definitely phoning home and failing to start if a specific domain was disabled. The weird thing is it would start up fine on WiFi or if the internet was disconnected completely. So I ended up just connecting it to WiFi as my wife was growing impatient by that point!

I did intend to reinvestigate the issue; throw wireshark on a promiscuous NIC and look for what domain Sky was trying to connect to and what data it was sending. I was thinking it might make an interesting article - depending on what I find. But in all honesty I had then forgotten about it until now.


If you're on a Mac I'd suggest Gasmask. It's lightweight freeware that lets you switch between multiple hosts files right from the menubar. I have a productivity hosts file I switch on every time I want to block social media, another one for development purposes, and a clean default file to revert to in case I don't want any hosts overhead. It's extremely useful.

https://github.com/2ndalpha/gasmask


A situation I've found myself in before: asking our devs why the site is broken, only to find it's my overzealous script blocking.


I bet your devs love you for this


That's not as bad as complaining about the speed of your site, only to realize you're running a torrent on your laptop.


https://github.com/jakeogh/dnsgate has a whitelist command for this situation. It's really necessary if you use the block-at-psl[1] config option.

[1] strips domains to the top level that the public can register using https://publicsuffix.org/

It also can quickly "dnsgate disable/enable". (dnsmasq only, quick enable/disable for /etc/hosts is not supported yet, patches appreciated)
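
To illustrate what stripping to the public suffix level means, here is a rough sketch. It is not dnsgate's code: the function name `registrable_domain` is hypothetical, the suffix set is a tiny hand-written stand-in for the real list at publicsuffix.org, and wildcard and exception rules are ignored.

```python
def registrable_domain(hostname, public_suffixes):
    """Return the public suffix plus one label, or None if there is no match."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" wins over "uk" when both are in the set.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in public_suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None
```

With this kind of stripping, blocking `ads.tracker.example.com` ends up blocking all of `example.com`, which is why a whitelist becomes necessary.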


http://shinnok.com/rants/2015/04/05/blocking-ads-and-tracker...

I've been using this simple script on OS X for quite some time now. It works like a charm and is using git for that exact reason, to be able to quickly disable/enable and also keep track of exceptions, changes, etc.


uMatrix chrome plugin is working well for me


$man mv


You're suggesting renaming the hosts file to something else temporarily? I guess that would work. How quick does the operating system pick up on this change, do I need to reboot my machine for it to reload the Hosts file?


No reboot needed. I have updated my hosts file regularly on Linux, Windows and Mac OS X without needing a reboot.

Generally your browser will pick it up quickly as well. It doesn't cache host file entries in the same way as dns lookups so the effect of adding or removing is pretty much instant. I use Chrome mostly, so I am not as sure about other browsers.


No need to reboot. As soon as you change the hosts file, the OS should immediately start to use the new version. The hosts file is the first thing the OS checks when resolving names, which is what enables this approach to work.


It takes immediate effect. I often have a terminal open and toggle commenting out an entry when testing something. As soon as you save it, it's good to go.
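
The toggling can be scripted too. A minimal sketch, assuming entries are in `IP hostname` form; `toggle_entry` is a made-up helper that only transforms text, so reading and writing the real hosts file (with admin rights) is left to the caller.

```python
import ipaddress


def _is_ip(token):
    """True if the token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def toggle_entry(hosts_text, hostname):
    """Comment out active entries for `hostname`, or uncomment commented ones."""
    out = []
    for line in hosts_text.splitlines():
        stripped = line.strip()
        commented = stripped.startswith("#")
        body = stripped.lstrip("#").strip() if commented else stripped
        fields = body.split("#", 1)[0].split()
        # Only touch lines that look like real entries: an IP followed by names.
        is_target = (
            len(fields) >= 2 and _is_ip(fields[0]) and hostname in fields[1:]
        )
        if not is_target:
            out.append(line)
        elif commented:
            out.append(body)
        else:
            out.append("# " + stripped)
    return "\n".join(out) + "\n"
```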


$sudo nscd -i hosts

also

$man nscd


Keep in mind that Firefox has its own DNS cache and that has confused me more than once.


So does Chrome. You can clear the internal cache by visiting chrome://net-internals/#dns


Hi Folks, this is my repo, thanks for all the comments.

I'm always looking for ways to improve things so I'm open to all suggestions.

EDIT: A couple of clarifications.

1) This isn't just for adblock. Your hosts file is useful for thwarting all sorts of malware. If a bot or trojan phones home with a domain, a vigilant hosts file will block it. If a bot or trojan phones home with an IP, then the hosts file can't help you but, then again, an IP can be physically located fairly quickly.

2) The key to a good hosts file is keeping it current. This hosts file amalgamates several well-curated sources. So your hosts file is only as good as your ability to keep it current. This repo helps with this.


Hi Steven,

Thank you for your contribution. What do you think about this setup?

I use two Digital Ocean servers in different datacenters, on which I have run this script (https://github.com/jlund/streisand).

I modified the script (https://github.com/jlund/streisand/tree/master/playbooks/gro...) before I ran it and pointed the upstream DNS servers to my two personal DNS servers that are hosted on different datacenters.

The DNS servers are running a script (https://github.com/Kolyunya/afdns) that pulls the hosts file daily from (https://github.com/StevenBlack/hosts).

It has been working great for a few weeks, but I'm curious about any improvements I could provide.


I'm translating it to Portuguese (github.com/muthdra/hosts-ptbr). I love it!


I've been running this for years.

Using 127.0.0.1, I have an httpd responding to every request with a 200. This avoids some anti-ad-block checks (such as "watch this ad before your video").

You can also configure your server to reply with a cat GIF. But who would want to see such an Internet?


Care to explain the software you use? I'm looking into it with nginx, can't find anything explaining "always return 204 whatever the URI" :(
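
Not an nginx answer, but the "always return 204 whatever the URI" behaviour can be sketched with Python's standard library alone. The names `NoContentHandler` and `make_server` are made up for this example, and it only speaks plain HTTP, so blocked HTTPS requests would still fail at the TLS step.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoContentHandler(BaseHTTPRequestHandler):
    """Answer every request with 204 No Content, regardless of the path."""

    def _no_content(self):
        self.send_response(204)
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Use the same reply for the common request methods.
    do_GET = do_POST = do_HEAD = _no_content

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), NoContentHandler)


# To run it until interrupted (binding port 80 usually needs root):
# make_server(port=8080).serve_forever()
```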


Cats for ads. Has a nice ring to it. I'm surprised no one built it yet.


They have. A little company called Google:

https://support.google.com/contributor/answer/6223848?hl=en&...



I guess I could plug a silly little script I wrote for GreaseMonkey which runs on facebook.com. I hated looking at the "Trending" and "Recommended Pages" sidebar so I axed them out in favor of random imgur images of your favorite subreddit. By default you'll get `/r/aww` but you can flip it with a fixed little box in the top right to whatever floats your boat.

Full disclaimer: I'm horrid at JavaScript, and the result below is mainly due to a lot of copy-paste from various internet sources.

https://gist.github.com/GrahamBlanshard/d7211436088e0159164a


Just gonna plug hostsman, which has been doing this on Windows since forever:

http://www.abelhadigital.com/hostsman

Lets you choose which lists to use, and automatically updates those lists. Also makes it easy to temporarily disable your rules if you need something that's blocked. Has a button for flushing the DNS cache.


(Full disclosure: I run a service that blocks and intercepts malware communication using DNS! https://strongarm.io)

Blocking via your hosts file has some great benefits; it works regardless of network and is relatively easy to update. Unfortunately, it doesn't scale easily to many systems or give you any insight into whether or not you are trying to connect to blocked domains.

Blocking via DNS is a good alternative and is suggested multiple times in this thread. You can easily protect a whole network by setting your recursive resolvers and it works across any system.

If you are interested in this and don't want to operate and maintain your own DNS (as well as pulling down various domain lists) check out https://strongarm.io. We manage DNS, aggregating lists of bad domains, and (most uniquely) will alert you if you try and talk to a blocked domain.

It's free for personal use. We are a growing startup and love feedback from HN. Feel free to contact me directly as well! stephen[at]strongarm.io


I set up a DNS server once that would replace ads with porn. Worked pretty well.


"This DNS server replaces ads with porn."

"Everything looks exactly the same."


I'm using my own VPN server and I set up an Unbound DNS server there. It's the only way for my old iPhone and iPad to browse the internet without ads. And it's really fast. I use https://pgl.yoyo.org/adservers/ for the ad server list and a little awk script to convert it to Unbound format.
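
For reference, a conversion of this kind could look roughly like the following in Python. It is a sketch, not the awk script mentioned above: it assumes the input is in hosts format and emits one `local-zone` / `local-data` pair per name, with a made-up function name and a configurable sink address.

```python
def hosts_to_unbound(hosts_text, sink="0.0.0.0"):
    """Turn hosts-format lines into Unbound local-zone/local-data directives."""
    skip = {"localhost", "localhost.localdomain", "broadcasthost"}
    seen = set()
    lines = []
    for raw in hosts_text.splitlines():
        parts = raw.split("#", 1)[0].split()
        if len(parts) < 2:
            continue  # blank or comment-only line
        for host in parts[1:]:
            host = host.lower()
            if host in skip or host in seen:
                continue
            seen.add(host)
            # "redirect" makes the zone and everything below it
            # answer with the local-data record.
            lines.append(f'local-zone: "{host}" redirect')
            lines.append(f'local-data: "{host} A {sink}"')
    return "\n".join(lines) + "\n"
```

The output is meant to be included from the `server:` section of unbound.conf.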


Yes, unbound is great—I run it on my local machine with a list of ad servers to block. I use this little script to download and convert the Someone Who Cares hosts list to unbound format[0] every few weeks. BTW, the pgl.yoyo.org list is available[1] in Unbound format since a few months ago[2].

[0] https://github.com/jodrell/unbound-block-hosts

[1] https://pgl.yoyo.org/adservers/serverlist.php?hostformat=unb...

[2] https://pgl.yoyo.org/adservers/news.php?#unbound


I use this script https://gitlab.com/Khaine/DNS-Unbound-Blocklist-Downloader to pull data from a bunch of different places and load into unbound


> Using 0.0.0.0 [instead of 127.0.0.1] is faster because you don't have to wait for a timeout.

I'm not going to argue that localhost is better than 0, but that specific argument they've raised is incorrect. You don't have to wait for a timeout on localhost either. It will either fail instantly due to no listening processes on that IP and port, or it will connect to whatever process you have open on that address (eg a local instance of a http daemon).
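
This is easy to check. A small sketch, assuming a Linux-like TCP stack (other systems may retry before reporting the refusal); `probe` and `unused_local_port` are names invented for the example.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Return (outcome, seconds) for a single TCP connection attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    return outcome, time.monotonic() - start


def unused_local_port():
    """Ask the OS for a free port, then release it so nothing is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Calling `probe("127.0.0.1", unused_local_port())` should report a refusal well before the timeout expires, since the local stack rejects the connection itself rather than waiting on the network.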


They should also add :: to also block on IPv6, see https://gist.github.com/teffalump/7227752

Although if you use dnsmasq on OpenWRT with these hosts files, beware that it can crash sometimes due to a bug that is now fixed in git: http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/201...


Can you describe more precisely why it is incorrect? Both implementations request data from the network stack, so aren't they very similar?


You should raise an issue on the issue tracker.


Or use the DNSCrypt proxy. It has a module to filter DNS responses based on their name (full name or using expressions such as sex), or on the IP addresses they resolve to. Instead of returning 127.0.0.1, which can make you vulnerable to rebinding attacks, it returns responses with the standard "REFUSED" response code. https://simplednscrypt.org https://dnscrypt.org


I've been doing this on a Streisand created VPN server : https://github.com/jlund/streisand and use most of the same lists (though I've had to remove a few things from "Someone that cares" and I've also had to add a few things - I target apps too so I nuke some specific to mobile apps that are likely not in those lists).

The reasons:

* Block adverts in native mobile apps

* Block adverts in mobile web browsing

* Create a single connection for the mobile (reduce exposure to latency of new connections to different servers)

* VPN connection keep-alive means I seldom reconnect

* Side effect of mitigating risk of my telco screwing with my traffic or excessively logging metadata

It works really, really well.

I'm sure someone will say "battery!" but the cost of mobile adverts on batteries far outweighs the cost of connecting to a VPN.

This is effectively adblock for mobile that works for all apps and websites.


Some of these map the domains to 127.0.0.1 which is wrong. It should be 0.0.0.0.

On second thought, you shouldn't be using the hosts file for this at all.


Either way I'm not personally comfortable with this method since I often run servers on my personal computers for development. So whether it's 127.0.0.1 or 0.0.0.0 the request will still reach my system and possibly be handled by any port depending on who makes the request.

I would much rather have a browser plugin for this.

Those servers could be anything from MySQL, redis to any web app.

I get that the hosts-method is meant to affect all apps but that's not a big problem for me running Mac OS and Fedora.

Last time I had to block ads this way was when Opera had them embedded and it was much less memory hungry than Phoenix on my 256M RAM laptop. Back then I blocked them in ipfw instead.


Can you explain the reasoning behind using 0.0.0.0 instead of 127.0.0.1? Genuinely interested.



Basically it's slower if you use 127 because it actually checks whether something is running (for example, a server). 0.0.0.0 explicitly means "there is zilch, nip, nada". It's a little bit faster.


You can have processes listen on 0 - which means all available IPv4 IPs (available to that machine).

And since you can have processes listen on 0, it means you can equally curl 0; just like you could with 127.0.0.1. Here's an example from my IRC server (the only process I run on 0):

    $ curl https://0:9997 -kis | head
    HTTP/1.1 200 OK
    Date: Fri, 12 Feb 2016 07:49:24 GMT
    Server: ZNC - http://znc.in
    Content-Length: 1878
    Content-Type: text/html
    Set-Cookie: 9997-SessionId=54245f15ba592bc691e09ac75e6778e6d4c33841fad71a8d6c56addc998e043f; path=/;
    Connection: Close

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
(just in case someone queries, 0 is just another notation of 0.0.0.0)
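
The same point as a self-contained sketch: a socket bound to 0.0.0.0 accepts connections arriving on the loopback address. The helper names are made up, and connecting to 0.0.0.0 directly (as curl does above) is OS-dependent, so the check below goes through 127.0.0.1.

```python
import socket


def wildcard_listener():
    """Listen on every local IPv4 address (0.0.0.0) on an OS-chosen port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 0))
    srv.listen(1)
    return srv


def reachable_via(address, port, timeout=2.0):
    """True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False
```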


NXDOMAIN is also an option if you use dnsmasq. I haven't tested whether 0.0.0.0 actually returns NXDOMAIN, but from your comment, it sounds like yes. Also see --dest-ip in "dnsgate config --help".


telnet 0.0.0.0 $port works fine on my machine, it connects to localhost, so besides the speed, I don't think there's any other difference at least on Linux.


"ssh 0" or "ssh ::" works fine too. Also "http://0" or "http://[::]".


I do some blocking with my hosts file, and I used to put 127.0.0.1. That hits my default nginx vhost, so I usually get 404 or 403 pages, which is not so bad since my default vhost is an empty docroot. But it is not very intuitive, and sometimes I think a server is down because of the nginx error page before I realize it's my hosts file. I considered making a custom error message for my default host to remind me what I'm hitting, but I'm lazy.


Security.

I suppose it would be possible to craft a url that attacks local web services sometimes found on developer machines. If someone can confirm this is indeed the case, I'll submit a pull request to their README.


This won't work, at least on Chrome. It blocks all cross-domain requests to localhost[1]. Even if the target is used with a domain that resolves to 127.0.0.1, or has CORS completely disabled with "Access-Control-Allow-Origin: *".

[1] https://code.google.com/p/chromium/issues/detail?id=67743


Which ones? Looking at [1], the only entries I can find for 127.0.0.1 are for localhost or are commented out

[1] https://github.com/StevenBlack/hosts/blob/master/hosts



I didn't check those, because the Python script rewrites them to target 0.0.0.0 when combining them.
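
A rewrite step of that kind might look like this. It is only a sketch of the idea, not the repo's actual script; `retarget` and the list of names left alone are assumptions.

```python
def retarget(line, old="127.0.0.1", new="0.0.0.0"):
    """Point a blocking entry at `new`, leaving loopback names and comments alone."""
    keep = {"localhost", "localhost.localdomain", "local", "broadcasthost"}
    body, sep, comment = line.partition("#")
    parts = body.split()
    if len(parts) < 2 or parts[0] != old:
        return line  # not an entry for the old target
    if any(host in keep for host in parts[1:]):
        return line  # real loopback entries must stay on 127.0.0.1
    rewritten = " ".join([new] + parts[1:])
    # Re-attach a trailing comment if the line had one.
    return rewritten + (" #" + comment if sep else "")
```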


While this is ok as an idea, I prefer Privoxy [1] to get my ad blocking outside of the browser. It has the benefit that I can turn it on and off (I use a proxy switcher). It also means that I can have other devices use it (via LAN, SSH tunnel or whatever).

[1] http://www.privoxy.org/


Privoxy is being hosted on Sourceforge, and Sourceforge is still hiding malware in the files. I checked yesterday.


How ironic. I just get it via apt-get in debian.


For whoever is using OpenWRT, this is a great script for blocking hosts: https://gist.github.com/teffalump/7227752. I guess it could be modified to use this source.


Could someone please explain why advertisers do not re-use functional domain names to defeat domain-based filtering? I always find it fascinating that they still use such obvious ad-only (sub)domains to host assets.


That's really the next step in the evolution here. Content creators/producers and publishers need to make money. They're not going to make all that content for free if ads (their revenue) suddenly disappeared tomorrow. Sure, you'll still have a few donation driven sites, and a few subscription-based websites, but 80% of the internet (probably more) relies on ads. Without that revenue stream, the content goes poof. Not to mention, all that innovation and the innovation yet to come. The next youtube, reddit or facebook would not get invented.

Anyways, yeah, some ad companies are starting to do exactly this. They're serving ads up from the domain/website the ads are displayed on. Host files are completely ineffective here unless you've already previously spotted the ads and manually blocked them (and provided they haven't changed the file name of the ad/picture since your last visit).

But the ad companies doing this are tiny. It's quite likely you've never been on a website that has enacted this method. Most ads on the 'net are served up from Google's adchoice/adsense program. Once Google themselves start doing it, it's game over for adblockers and for host blocking. I heard through the grapevine that they're actually working on this very issue (the issue of ad blockers).


Thanks for the tip! I would imagine that once HTTP/2 becomes more popular, serving ads from the same domain might even be preferred simply for the speed improvement.


Possibly because of the threats of getting hacked by XSRF, XSS etc.


I've recently switched from Windows to Linux, and I feel kind of "naked", because I don't know what is the Linux equivalent of Windows Firewall + NOD32 + Common Sense. I've got ufw and AppArmor installed so far. Is using such a huge hosts file a common practice? Also, what about Flash? I don't want to install it, but some websites insist on it still. What would you advise, folks? I'd appreciate any suggestions.


There is no equivalent, these precautions are simply not necessary under Linux. As for Flash, I'd install Chrome.


This already works quite well on my Android phone using AdAway.

Edit: AdAway uses an /etc/hosts file.


Wouldn't it be more efficient to add blocking on router level? Making all of your devices (at home) ad-free.

Anyone experienced in doing this?


I've installed some home routers with the OpenDNS IPs as DNS servers and a manual block list. Works quite well. It has its flaws of course.


There are a couple of products for this already: "AdTrap" and for the Raspberry Pi "pihole". (Content warning on https://pi-hole.net/ : contains piping curl into bash, slow-loading SPA site)


Pi-hole is great and I had no trouble setting it up, but I couldn't continue using it as it constantly triggered NoScript's ABE (Application Boundaries Enforcer) which blocks scripts served from a LAN address. This is for a good reason, and it may be possible to white-list the Pi in the ABE settings, but I didn't have a good enough understanding of the implications to feel comfortable just allowing it or disabling ABE.


I use the Pi-hole setup, but on my desktop computer instead of a RaspberryPi... just follow the instructions to "Setting It Up Manually" [1]

1. http://jacobsalmela.com/block-millions-ads-network-wide-with...


Yes, if you had access to every router you use. I travel a lot and have a laptop so I use the blocking via hosts method.


Should be very easy to do in OpenWrt and similar firmwares.


I loved using Gas Mask for multiple host file management on OSX. Not so much for Ads but a great app I had trouble discovering.


dnsgate[1] should use this as its default source. Fixing.

[1] https://github.com/jakeogh/dnsgate


This looks awesome, plus I am already running shorewall +dnsmasq.


I wanted a service for my laptop with custom blacklisting/whitelisting, blocking stats and a webserver to serve a blank HTML page for any domains in DNS list so I made:

https://github.com/jdoss/dockerhole

It was inspired by https://pi-hole.net/ and I am glad to see there are others making similar things to block Ads.


I made a little C program that converts AdBlock Plus filter lists to hosts file entries: https://github.com/wwalexander/hostsblock

There's a bit of an impedance mismatch since filter lists support some fairly advanced pattern matching while hosts file entries are obviously limited to specific domains, but it gets most domains.
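
To make the mismatch concrete, here is a rough Python sketch of the easy part of such a conversion. It is not how the C program above works: it only keeps plain `||domain^` rules and drops everything else (path patterns, element hiding, exceptions, rules with `$` options), and the function name is hypothetical.

```python
import re

# Matches only rules of the exact form ||domain^ with no options or paths.
_DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$")


def abp_to_hosts(filter_text, sink="0.0.0.0"):
    """Extract domain-only blocking rules as hosts-file entries."""
    entries = []
    for raw in filter_text.splitlines():
        rule = raw.strip().lower()
        # Skip blanks, comments (!), the [Adblock ...] header and exceptions (@@).
        if not rule or rule.startswith(("!", "[", "@@")):
            continue
        match = _DOMAIN_RULE.match(rule)
        if match:
            entries.append(f"{sink} {match.group(1)}")
    return entries
```

Rules such as `||example.net^$third-party` are skipped on purpose, since a hosts entry cannot express "only when loaded as a third party" and would block the domain outright.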


This technique works very well for blocking ads in Skype.

You can also block the BBC Breaking News banner this way by adding polling.bbc.co.uk. Or if you want to play a prank use 192.30.252.153 as the IP. GitHub pages don't check if you own the domain.

https://unop.uk/dev/breaking-the-news-blocking-the-bbc-news-...


For GNU/Linux also check hostsblock [1]. It's available on the AUR.

There is also a Pi-hole clone, notrack [2].

[1] https://gaenserich.github.io/hostsblock/ | [2] https://github.com/quidsup/notrack


I'm using http://pi-hole.net running on a Raspberry Pi. I use it as my home dns, it runs dnsmasq and points a list of a million ad hostnames to its own IP, answering every request with a blank HTML page.


How many websites does it break? I'm a bit hesitant to set up pihole since it's rather hard to enable ads for just one domain, as more and more websites just stop working with ads removed/blocked.


It's not breaking many of the websites I use regularly. The only really breaking effect I see is when clicking on forwarder links of affiliate networks. That happens mostly on bargain/deal websites.

And of course you'd better not need to work with GA or Flurry or other analytics services, because they are blocked by pi-hole.net. But you can easily whitelist via ssh.

I've been using it for 2 months now and am pretty satisfied. The traffic-saving effect is also nice, and it makes websites load faster as well.


Relevant: adblock with DNS Server https://hub.docker.com/r/kolyunya/afdns/

This is for people who cannot edit /etc/hosts, but can change DNS server.


"... people who cannot edit /etc/hosts, but can change DNS server."

e.g., "mobile" or "tablet" users who choose Apple iPhone, iPad, etc.

The idea of ARPA-networked devices that have no user-editable HOSTS file seems inferior to ones that do, i.e. the vast majority of ARPA-networked computers for three decades, but that's just my uninformed view.

The experts selling these things must know better.

DNS works very well for blocking ads. It allows for things that cannot be done with HOSTS alone.

But if you trust a third party for your DNS resolution needs (ad-supported search engine company "free" public DNS, ad-supported, corporate-sponsored browser, etc.), then all bets are off.

If and when advertisers complain and start to cut back on spending, then these third parties could remedy the situation, easily. In my opinion.

If the user is running her own DNS services, then it may be too much trouble for advertisers and the companies they prop up. It is a stretch to think that any ad-supported company could stop users from exchanging lists of names and numbers, whether through a HOSTS file, zone files, or some other mechanism.



Has anyone tried OpenDNS? It seems that a filtered DNS service is the way to go here.


This needs to be a larger effort, one hosts file updated hourly or any regular interval which we just configure to fetch and update through a cron job and forget.

Would love to see some project like that.


Is this reasonably small enough not to cause performance issues? I notice it mentioned trying to keep the size more reasonable.

Ad-blocking via hosts files can often lead to a noticeable performance hit.


My Linux system runs a 60k+ /etc/hosts file, plus dnsmasq for local resolution. Nice thing about dnsmasq is that it treats domain-level entries as domain-level assignments.

No measurable lag.

I'd used the various blockfiles used by uBlock Origin, as well as some additional entries of my own, de-duplicated. There are some overaggressive entries, I've commented those.

A nice plus: I found the dozen or so hosts/domains associated with autoplay video crap, added them, and have no more bother from that.


At the beginning (2006) dnsmasq used a very naive O(n^2) hosts parsing procedure. I was the first person ever to attempt using dnsmasq to block ads :-)

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/200...


Sweet. And if you had anything to do with the performance improvements, thank you very much.


I highly doubt it. I use AdAway for Android, and my current (/system)/etc/hosts file is 58626 lines with no noticeable lag on my Nexus 7 2012 and my Nexus 9.

I would assume local IO would usually beat network IO.


You can pay for a basic license of OpenDNS and "null route" (to a block page) ads at the device / network level. Those preferences get synced to anycasted DNS resolvers and performance hits are negligible.

Some ideas here, but it's pretty straightforward and functions just like their adult content blockers: http://iradar.blogspot.com/2011/07/useful-free-tool-use-open...


Worst case you can split up the hosts file into several different files, then use gas mask[1] to toggle them individually.

[1]: https://github.com/2ndalpha/gasmask


I remember reading a bit on how the kernel stores the hosts list, but I stopped before finding how it scales. All I remember is that it seemed like a linear list. If it was a trie it would be alright.


Compared to the overhead of a network request, the overhead of going through even 1000 items doesn't seem that much to me.


You're getting downvoted, but I remember this issue cropping up for Windows XP boxes. Other operating systems and more modern versions of Windows cope with large hosts lists more competently.


The problem isn't completely solved, though.

https://www.reddit.com/r/Windows10/comments/401h2o/hosts_fil...

In Windows 10, the DNS Client does something that is O(n^2).


Isn't the O(n^2) operation described in that thread happening only when the hosts file gets updated, and not with every DNS lookup? I would imagine even sequentially searching a local list of 350,000 entries, like the post describes, would still be extremely fast in the context of loading a web page.

But, that said, there can be a huge difference between how one might think things should work and what actually happens.
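
One way to check the intuition about sequential search rather than guess: a small Python timing sketch, assuming a synthetic list of 350,000 made-up hostnames and a lookup for a name that is not present (the worst case for a linear scan). It measures only an in-memory scan in Python and says nothing about what the Windows DNS Client actually does; the helper names are hypothetical.

```python
import time


def build_entries(n):
    """Generate n synthetic hostnames (placeholders, not a real blocklist)."""
    return [f"host{i}.example.com" for i in range(n)]


def linear_lookup(entries, name):
    """Return the index of name in entries via sequential scan, or -1."""
    for i, entry in enumerate(entries):
        if entry == name:
            return i
    return -1


if __name__ == "__main__":
    entries = build_entries(350_000)
    start = time.perf_counter()
    idx = linear_lookup(entries, "not-in-list.example.com")
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"scanned {len(entries)} entries in {elapsed_ms:.2f} ms (index {idx})")
```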


Yeah maybe it's mostly specific to Windows. Wouldn't surprise me if that's the case. I was speaking from a Windows perspective (I should have specified).


I've been using this method for months if not years. I haven't noticed anything unusual about browsing speed, except that it's increased due to fewer ads.


Does anyone have a convenient way to convert this to a bind9 config format? I would rather run this for the whole LAN than just one computer at a time.


> Does anyone have a convenient way to convert this to a bind9 config format?

If you use BIND RPZs, you can convert from /etc/hosts format to BIND zonefile format (or just pump the new entries to nsupdate), which should be pretty trivial. Some information and useful links are in this comment and subsequent commentary. [0]

IMO RPZs are substantially easier to manage than an ever-growing set of master blackhole zones, especially when you have slave DNS servers.

[0] https://news.ycombinator.com/item?id=11085521
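
A rough sketch of the hosts-to-RPZ conversion, in Python. It emits only the policy records, using the `CNAME .` (return NXDOMAIN) form, and leaves the zone's SOA/NS header and the named.conf wiring to you. The function name `hosts_to_rpz` and the `include_subdomains` option are made up for the example.

```python
def hosts_to_rpz(hosts_text, include_subdomains=False):
    """Turn hosts-format text into RPZ records that answer NXDOMAIN."""
    records = []
    seen = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        for host in fields[1:]:
            host = host.lower().rstrip(".")
            if host in seen or "." not in host:
                continue  # skip duplicates and bare names such as localhost
            seen.add(host)
            # Owner names have no trailing dot, so they are relative to the RPZ zone.
            records.append(f"{host} CNAME .")
            if include_subdomains:
                records.append(f"*.{host} CNAME .")
    return records


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n0.0.0.0 ads.example.com # tracker\n"
    print("\n".join(hosts_to_rpz(sample, include_subdomains=True)))
```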


You could use dnsmasq for your LAN. That way you can just use the hosts files as they are.

If it really has to be bind, check out this page for a tutorial on blocking using bind: http://www.malwaredomains.com/?page_id=6

Then just use awk to go from hosts file format to this: http://mirror2.malwaredomains.com/files/spywaredomains.zones

Hope that helps.
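
If awk isn't your thing, here is a Python sketch of the same conversion. It assumes the target is one `zone` stanza per blocked domain, all pointing at a single shared zone file; the default path below is a placeholder, and the exact stanza layout should be checked against the linked .zones file before use.

```python
def hosts_to_bind_zones(hosts_text, zone_file="/etc/namedb/blockeddomain.hosts"):
    """Emit one BIND master-zone stanza per hostname found in hosts-format text."""
    stanzas = []
    seen = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        for host in fields[1:]:
            host = host.lower().rstrip(".")
            if host in seen or "." not in host:
                continue  # skip duplicates and bare names such as localhost
            seen.add(host)
            # Doubled braces are literal braces inside an f-string.
            stanzas.append(f'zone "{host}" {{type master; file "{zone_file}";}};')
    return stanzas


if __name__ == "__main__":
    print("\n".join(hosts_to_bind_zones("0.0.0.0 ads.example.com\n")))
```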


I used to, but Windows tends to hang when the hosts file gets overly long, so I had to abandon it once the advertisers grew too numerous.

Has Windows 10 gotten better with that?



Another option is to stand up a DNS server that knows how to do something like BIND 9's Response Policy Zone [0][1].

Although figuring out how to propagate RPZ changes to views isn't exactly straightforward (more on this below), if you're using BIND, you can set up views that match certain clients and provide one mix of RPZs to one set of clients and a different mix to another.

On updating RPZs in a view (warning: BIND 9-specific instructions follow):

So, BIND has this nifty option for a zone called "in-view". This lets you say "The data for this particular zone lives in this other view, so when requests come in for this zone, in this view, use the data in this other view." It might sound complicated, but it's really just a pointer to a pre-existing zone definition. This lets you define your master zones in one big "zone definition" view, and have client-specific views refer back to those definitions.

However, you can't use in-view with RPZs. Why? Who knows? [2] But, what you can do is this:

* Create one unique RNDC key per view

* Add an allow-notify and match-clients entry in each view with that view's key

* In the appropriate views, add a slave zone definition for each relevant RPZ, with localhost as the master, and whatever is your usual domain xfer key as the key [3]

* Back up in your "zone definition" view, add to your also-notify list for each master RPZ definition an entry for localhost and each view key. [4] Having an ACL just for these RPZ slaves cleans up the RPZ definitions.

Now you have dynamically updatable host blocking that can be deployed on a per-host basis, if you like. It's initially a bit more work than managing a local hosts file, but you can easily apply host blocking lists to any set of machines on your LAN, and you can programmatically update the RPZ lists with tools like nsupdate.

[0] http://jpmens.net/2011/04/26/how-to-configure-your-bind-reso...

[1] http://www.zytrax.com/books/dns/ch7/rpz.html

[2] RPZs are handled just like regular zones in every other way except for this one. It's a bit frustrating.

[3] This is actually less burdensome than it sounds, as you can write these slave zone definitions once and include the files containing the definitions in whatever view needs them.

[4] That is, if you had three views, your also-notify list would have something like the following new entries: 127.0.0.1 key "view1-key"; 127.0.0.1 key "view2-key"; 127.0.0.1 key "view3-key"; You can have entries for just the views that use a given RPZ, but it doesn't hurt to have one ACL that notifies all views when any RPZ data changes.


Are you talking about the ability to NXDOMAIN *.example.com while also whitelisting this.example.com? AFAICT dnsmasq can't do this; I'm interested if bind can. I have a simple DNS request forwarder half-written to deal with rule trees.

Edit: https://github.com/paulchakravarti/dnslib looks interesting


> Are you talking about the ability to NXDOMAIN [star].example.com while also whitelisting this.example.com?

Yeah, you can totally do that! Details are here [0][1], but in your RPZ zone file, you use a CNAME with a value of . to return NXDOMAIN, and a CNAME with a value of rpz-passthru. to process the query normally:

  ;allow www.sinfest.net, but deny all others, including sinfest.net
  www.sinfest.net CNAME   rpz-passthru.
  sinfest.net CNAME       .
  *.sinfest.net CNAME     .

And to demonstrate:

  $ dig +short www.sinfest.net ; dig +short sinfest.net; \
  dig +short www.sinfest.net @8.8.8.8 ; dig +short sinfest.net @8.8.8.8
  64.29.145.9
  64.29.145.9
  64.29.145.9
  $

If you're interested in a complete but simple RPZ zone file, I can provide one. If you have more questions, feel free to ask, and I'll try to answer.

[0] http://www.zytrax.com/books/dns/ch7/rpz.html

[1] http://www.zytrax.com/books/dns/ch7/rpz.html#rpz-examples
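
For anyone writing their own forwarder with rule trees, the matching logic can be modelled in a few lines of Python. This is a simplified model and not BIND's implementation: it assumes an exact owner-name match wins over a wildcard, that wildcards are tried from most to least specific, and that a name with no matching rule is resolved normally. `parse_policy` and `decide` are hypothetical names.

```python
NXDOMAIN = "."  # CNAME target meaning "answer NXDOMAIN"
PASSTHRU = "rpz-passthru."  # CNAME target meaning "resolve normally"


def parse_policy(zone_text):
    """Read 'name CNAME target' lines into a dict, ignoring ';' comments."""
    policy = {}
    for raw in zone_text.splitlines():
        fields = raw.split(";", 1)[0].split()
        if len(fields) == 3 and fields[1].upper() == "CNAME":
            policy[fields[0].lower()] = fields[2]
    return policy


def decide(policy, qname):
    """Return 'nxdomain' or 'passthru' for a query name."""
    qname = qname.lower().rstrip(".")
    target = policy.get(qname)  # exact match first
    if target is None:
        labels = qname.split(".")
        # For a.b.c try *.b.c, then *.c (most specific wildcard first).
        for i in range(1, len(labels)):
            target = policy.get("*." + ".".join(labels[i:]))
            if target is not None:
                break
    if target == NXDOMAIN:
        return "nxdomain"
    return "passthru"  # explicit rpz-passthru. or no matching rule


if __name__ == "__main__":
    zone = """
      ;allow www.sinfest.net, but deny all others, including sinfest.net
      www.sinfest.net CNAME   rpz-passthru.
      sinfest.net CNAME       .
      *.sinfest.net CNAME     .
    """
    policy = parse_policy(zone)
    for name in ("www.sinfest.net", "sinfest.net", "forum.sinfest.net", "example.org"):
        print(name, decide(policy, name))
```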


Fantastic. I'll add bind output and optional integration asap.


Don't the empty <div>s just get left behind taking up space when using this method?


It's hard to do anything else - if you collapse them you can easily break the site layout. (Some sites may even design their layouts to break if the ads are removed).


It took me about a minute to find 12 false entries just by looking at which lines end in .de


Very cool. It would be interesting to see this built into a (maybe Raspberry Pi) router to get a more central point/policy for configuration, maybe together with caching (dnsmasq).


There is the Pi-Hole Project, which does that pretty nicely: http://jacobsalmela.com/block-millions-ads-network-wide-with...


Been doing this for a long time with various Facebook-related domains - upsets people if they borrow my laptop though.


Whilst I'm sure this is no longer the case, I used to do this back in the day, but it was soooo slow!



