I had been doing this until some time ago to block ads and to prevent Google from collecting my web browsing history via Google Analytics. During that time I witnessed a strange phenomenon. Every time I added "127.0.0.1 www.google-analytics.com" to C:\Windows\System32\Drivers\etc\hosts, I saw the line removed from the file some hours later. Although I had added tens of lines, only the Google Analytics line was ever removed. IIRC I finally decided to figure out what caused the removal. I used Filemon to watch file changes, but the line got removed again while I was watching the file and nothing appeared in the log. I suspected Ring-0 processes were secretly running and causing the removal, but I knew nothing about the Windows kernel, so I gave up there. To this day I wonder what the cause was.
Instead of Filemon I'd suggest firing up Process Monitor [1] with a filter of "path contains system32\drivers\etc\hosts" and then Filter -> Drop Filtered Events.
Let this run while you go about your normal work, then check back after you notice the change. Look through the Operation column for WriteFile or something similar, then see what Process Name did it. This'll let you figure out what's actually making the change and you can appropriately assign blame.
On the subject of Sysinternals Process Monitor: did you know procmon.exe REQUIRES the Workstation service to be running in order to start? It uses it to enumerate something and will silently die without it. This is not documented anywhere and is pretty bogus.
Older versions worked fine without this service. It was silently added somewhere between win7 and win8 releases.
Did you have local Google services running, like the Google Updater (afaik it also comes with Chrome)? Google also adds some entries to the Task Scheduler; you can check there to see what is getting called.
Though I believe you should have seen something in Filemon.
I thought I had something similar: we use Junos Pulse and it rewrites your /etc/hosts file. I think it takes a backup at some point and rebuilds the file from that when it needs to, which means some local changes just disappear. Nothing sinister though, afaik.
To edit the hosts file, you need admin privileges. That means closing whatever editor you're using, reopening it with 'Run as administrator', and then opening the hosts file. You need to do this even if you're logged in as an admin account.
Another way to do it is to open the hosts file with normal editing privileges, edit it, save it somewhere else, and paste it into the drivers folder. The system will ask whether you want to continue with admin privileges, and you need to say 'yes'.
Nothing could (or should, I guess) be changing the hosts file otherwise (AFAIK, my source being many, many SO posts and random forums) without being granted explicit admin privileges when it attempts to change the file.
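For what it's worth, a minimal sketch of making such an edit from an elevated prompt (the domain here is just an example):

    rem Run from a Command Prompt started via "Run as administrator"
    echo 0.0.0.0 www.google-analytics.com >> C:\Windows\System32\Drivers\etc\hosts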
Why would a simple Usermode Font Driver Host need internet access??!?!
binisoft.org Windows Firewall Control has an option to safeguard firewall rules and automagically delete all unauthorized (by the only person that matters - ME) rules.
Mine is the reverse. I added a line to /etc/hosts on my Mac, a local A record I wanted to play with. Now I'm done with my project and I want to remove the A record. I keep rebooting and that line persists. God knows why.
Assuming you meant why not: "Using 0.0.0.0 is faster because you don't have to wait for a timeout. It also does not interfere with a web server that may be running on the local PC."
He said he used 127.0.0.1 for google analytics. I was asking why he used 127.0.0.1 and not 0.0.0.0, exactly for the reasons you wrote. Sorry if that wasn't clear.
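For anyone following along, the two conventions side by side (you'd pick one per entry, of course; the domain is just an example):

    # both lines do the same job; the difference is what the blocked request hits
    127.0.0.1 www.google-analytics.com   # loops back, may be answered by a local web server
    0.0.0.0   www.google-analytics.com   # non-routable, so the connection typically fails at once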
See, the thing is, if a website is broken due to the hosts file, you can't easily re-enable ads for just that one website.
A situation we can all imagine ourselves in: you need to check the Google Analytics for your website/company site. You can't, because it's blocked at the hosts level.
I've been using the hosts file method for years and have never had an issue with checking Google analytics. I use the "Someone who cares" link.
Aside from Google sponsored links and the odd ad-sponsored link on pseudo-news sites not working (due to them being tracking URLs), I can't see it ever getting in my way.
However, to answer your question: these days I run dnsmasq on my home server and have my DHCP server assign that as the primary DNS. So every device (phone, laptop, smart TV, etc.) gets its ads blocked as well - which is particularly good for my TV, as it's bad enough having regular adverts on TV without LG pushing out sponsored content too. On the rare occasion I need to turn off my ad blocking, I just change my DNS to 8.8.8.8 (Google DNS), then switch back to my dnsmasq server once I'm done (the only complication being that I sometimes need to close and reopen the browser, due to that particular application caching the DNS lookup).
The nice thing about using dnsmasq is that you can import those hosts files verbatim. Which means your update script can be simple.
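For the record, the import really is a one-liner in dnsmasq.conf (the file path here is my assumption - it's wherever your update script saves the downloaded list):

    # /etc/dnsmasq.conf - read extra hosts-format files on top of /etc/hosts
    addn-hosts=/etc/dnsmasq.d/ad-hosts
    # dnsmasq re-reads addn-hosts files on SIGHUP, so the update script can
    # just download the new list and then: kill -HUP $(pidof dnsmasq)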
I used to block various domains that served TV ads for UK Channel 4, but they hosted the player client themselves and it would detect when the ads failed to load. It seems you could probably truncate the video stream instead, maybe using iptables, but their ad load has reduced, so I'm not motivated to try right now.
The weirdest side effect I've had was with the Sky HD box. If it was connected via ethernet then it wouldn't power up while my ad blocking was enabled. I was able to replicate this behavior by enabling and disabling my ad blocking, so my Sky box was definitely phoning home and failing to start if a specific domain was disabled. The weird thing is it would start up fine on WiFi or if the internet was disconnected completely. So I ended up just connecting it to WiFi as my wife was growing impatient by that point!
I did intend to reinvestigate the issue: throw Wireshark on a promiscuous NIC and look for what domain Sky was trying to connect to and what data it was sending. I was thinking it might make an interesting article, depending on what I found. But in all honesty I had forgotten about it until now.
If you're on a Mac I'd suggest Gasmask. It's lightweight freeware that lets you switch between multiple hosts files right from the menubar. I have a productivity hosts file I switch on every time I want to block social media, another one for development purposes, and a standard one to revert to in case I don't want any hosts overhead - just a clean default file. It's extremely useful.
I've been using this simple script on OS X for quite some time now. It works like a charm and uses git for that exact reason: to be able to quickly disable/enable and also keep track of exceptions, changes, etc.
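For those wondering what the idea looks like, here's a minimal sketch (repo path and file names are my assumptions, not the actual script):

    #!/bin/sh
    # keep two hosts variants in a git repo and swap them in place
    REPO="$HOME/.hosts-repo"        # assumed location of the repo
    case "$1" in
      on)  sudo cp "$REPO/hosts.blocking" /etc/hosts ;;
      off) sudo cp "$REPO/hosts.clean"    /etc/hosts ;;
      *)   echo "usage: $0 on|off" >&2; exit 1 ;;
    esac
    sudo dscacheutil -flushcache    # flush the resolver cache (exact command varies by OS X version)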
You're suggesting renaming the hosts file to something else temporarily? I guess that would work. How quickly does the operating system pick up on this change? Do I need to reboot my machine for it to reload the hosts file?
No reboot needed. I have updated my hosts file regularly on Linux, Windows and OS X without needing a reboot.
Generally your browser will pick it up quickly as well. It doesn't cache host file entries in the same way as dns lookups so the effect of adding or removing is pretty much instant. I use Chrome mostly, so I am not as sure about other browsers.
No need to reboot: as soon as you change the hosts file, the OS should immediately start using the new version. The hosts file is the first thing the OS checks when resolving names, which is what enables this approach to work.
It takes immediate effect. I often have a terminal open and toggle commenting out an entry when testing something. As soon as you save it, it's good to go.
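If you'd rather script the toggle than open an editor, something like this works (GNU sed shown; on OS X you'd use sed -i ''. The entry is just an example):

    # comment out the entry (stop blocking that host)
    sudo sed -i 's|^0\.0\.0\.0 www\.google-analytics\.com|#&|' /etc/hosts
    # uncomment it again (resume blocking)
    sudo sed -i 's|^#\(0\.0\.0\.0 www\.google-analytics\.com\)|\1|' /etc/hosts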
Hi Folks, this is my repo, thanks for all the comments.
I'm always looking for ways to improve things so I'm open to all suggestions.
EDIT: A couple of clarifications.
1) This isn't just for adblock. Your hosts file is useful for thwarting all sorts of malware. If a bot or trojan phones home with a domain, a vigilant hosts file will block it. If a bot or trojan phones home with an IP, then the hosts file can't help you, but then again, an IP can be physically located fairly quickly.
2) The key to a good hosts file is keeping it current. This hosts file amalgamates several well-curated sources, and your hosts file is only as good as your ability to keep it current. This repo helps with that.
I guess I could plug a silly little script I wrote for GreaseMonkey which runs on facebook.com. I hated looking at the "Trending" and "Recommended Pages" sidebar so I axed them out in favor of random imgur images of your favorite subreddit. By default you'll get `/r/aww` but you can flip it with a fixed little box in the top right to whatever floats your boat.
Full disclaimer: I'm horrid at JavaScript, and the result below is mainly due to a lot of copy-pasting from various internet sources.
Lets you choose which lists to use, and automatically updates those lists. Also makes it easy to temporarily disable your rules if you need something that's blocked. Has a button for flushing the DNS cache.
(Full disclosure: I run a service that blocks and intercepts malware communication using DNS! https://strongarm.io)
Blocking via your hosts file has some great benefits; it works regardless of network and is relatively easy to update. Unfortunately, it doesn't scale easily to many systems or give you any insight into whether or not you are trying to connect to blocked domains.
Blocking via DNS is a good alternative and is suggested multiple times in this thread. You can easily protect a whole network by setting your recursive resolvers and it works across any system.
If you are interested in this and don't want to operate and maintain your own DNS (as well as pull down various domain lists), check out https://strongarm.io. We manage the DNS, aggregate lists of bad domains, and (most uniquely) will alert you if you try to talk to a blocked domain.
It's free for personal use. We are a growing startup and love feedback from HN. Feel free to contact me directly as well! stephen[at]strongarm.io
I'm using my own VPN server and I set up an unbound DNS server there. It's the only way for my old iPhone and iPad to browse the internet without ads, and it's really fast. I use https://pgl.yoyo.org/adservers/ for the ad server list and a little awk script to convert it to unbound format.
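The conversion is only a couple of lines. A sketch of the idea (the exact yoyo query parameters and the output path are assumptions; it also assumes your unbound config includes that directory):

    # pull the hosts-format list and emit unbound local-zone directives
    curl -s 'https://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext' |
      awk '$1 == "127.0.0.1" { print "local-zone: \"" $2 "\" refuse" }' \
      > /etc/unbound/unbound.conf.d/adservers.conf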
Yes, unbound is great - I run it on my local machine with a list of ad servers to block. I use this little script to download and convert the Someone Who Cares hosts list to unbound format[0] every few weeks. BTW, the pgl.yoyo.org list has been available[1] in Unbound format for a few months now[2].
> Using 0.0.0.0 [instead of 127.0.0.1] is faster because you don't have to wait for a timeout.
I'm not going to argue that localhost is better than 0, but that specific argument they've raised is incorrect: you don't have to wait for a timeout on localhost either. It will either fail instantly, because nothing is listening on that IP and port, or connect to whatever process you have open on that address (e.g. a local instance of an http daemon).
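Easy to verify: with nothing listening on a port, the failure is immediate rather than a timeout (65500 is assumed to be an unused port; exact wording varies by curl version):

    $ curl http://127.0.0.1:65500/
    curl: (7) Failed to connect to 127.0.0.1 port 65500: Connection refused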
Or use the DNSCrypt proxy. It has a module to filter DNS responses based on their name (full name or using expressions such as sex), or on the IP addresses they resolve to.
Instead of returning 127.0.0.1, which can make you vulnerable to rebinding attacks, it returns responses with the standard "REFUSED" response code.
https://simplednscrypt.org
https://dnscrypt.org
I've been doing this on a Streisand-created VPN server: https://github.com/jlund/streisand and use most of the same lists (though I've had to remove a few things from "Someone who cares", and I've also had to add a few things - I target apps too, so I nuke some domains specific to mobile apps that are likely not in those lists).
The reasons:
* Block adverts in native mobile apps
* Block adverts in mobile web browsing
* Create a single connection for the mobile (reduce exposure to latency of new connections to different servers)
* VPN connection keep-alive means I seldom reconnect
* Side effect of mitigating risk of my telco screwing with my traffic or excessively logging metadata
It works really, really well.
I'm sure someone will say "battery!" but the cost of mobile adverts on batteries far outweighs the cost of connecting to a VPN.
This is effectively adblock for mobile that works for all apps and websites.
Either way, I'm not personally comfortable with this method, since I often run servers on my personal computers for development. So whether it's 127.0.0.1 or 0.0.0.0, the request will still reach my system and could be handled by whatever happens to be listening on that port.
I would much rather have a browser plugin for this.
Those servers could be anything from MySQL, redis to any web app.
I get that the hosts-method is meant to affect all apps but that's not a big problem for me running Mac OS and Fedora.
The last time I had to block ads this way was when Opera had them embedded and was much less memory-hungry than Phoenix on my 256MB RAM laptop. Back then I blocked them in ipfw instead.
Basically, it's slower if you use 127.0.0.1 because the system actually checks whether something is running there (for example a server or whatever). 0.0.0.0 explicitly means "there is zilch, nip, nada", so it's a little bit faster.
You can have processes listen on 0 - which means all available IPv4 IPs (available to that machine).
And since you can have processes listen on 0, it means you can equally curl 0; just like you could with 127.0.0.1. Here's an example from my IRC server (the only process I run on 0):
$ curl https://0:9997 -kis | head
HTTP/1.1 200 OK
Date: Fri, 12 Feb 2016 07:49:24 GMT
Server: ZNC - http://znc.in
Content-Length: 1878
Content-Type: text/html
Set-Cookie: 9997-SessionId=54245f15ba592bc691e09ac75e6778e6d4c33841fad71a8d6c56addc998e043f; path=/;
Connection: Close
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
(just in case someone queries, 0 is just another notation of 0.0.0.0)
NXDOMAIN is also an option if you use dnsmasq. I haven't tested whether 0.0.0.0 actually returns NXDOMAIN, but from your comment it sounds like yes. Also see --dest-ip in "dnsgate config --help".
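(For anyone curious, dnsmasq can hand back NXDOMAIN for a whole domain by declaring it local with no records - the domain below is just an example:)

    # /etc/dnsmasq.conf - answer this domain from /etc/hosts only; anything
    # not found there, including all its subdomains, gets NXDOMAIN
    local=/doubleclick.net/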
telnet 0.0.0.0 $port works fine on my machine; it connects to localhost. So besides the speed, I don't think there's any other difference, at least on Linux.
I do some blocking with my hosts file, and I used to use 127.0.0.1. That hits my default nginx vhost, so I usually get 404 or 403 pages, which is not so bad since my default vhost is an empty docroot. But it's not very intuitive, and sometimes I think a server is down because of the nginx error page before I realize it's my hosts file. I've considered making a custom error message for my default vhost to remind me what I'm hitting, but I'm lazy.
I suppose it would be possible to craft a url that attacks local web services sometimes found on developer machines. If someone can confirm this is indeed the case, I'll submit a pull request to their README.
This won't work, at least in Chrome, which blocks all cross-origin requests to localhost[1], even if the target is reached via a domain that resolves to 127.0.0.1, or has its CORS restrictions completely disabled with "Access-Control-Allow-Origin: *".
While this is OK as an idea, I prefer Privoxy [1] to get my ad blocking outside of the browser. It has the benefit that I can turn it on and off (I use a proxy switcher). It also means that I can have other devices use it, either via LAN or an SSH tunnel or whatever.
Could someone please explain why advertisers don't reuse functional domain names to defeat domain-based filtering? I always find it fascinating that they still use such obviously ad-only (sub)domains to host assets.
That's really the next step in the evolution here. Content creators/producers and publishers need to make money. They're not going to keep making all that content for free if ads (their revenue) suddenly disappear tomorrow. Sure, you'll still have a few donation-driven sites and a few subscription-based websites, but 80% of the internet (probably more) relies on ads. Without that revenue stream, the content goes poof. Not to mention all that innovation, and the innovation yet to come: the next YouTube, Reddit or Facebook would not get invented.
Anyway, yeah, some ad companies are starting to do exactly this: they're serving ads from the domain/website the ads are displayed on. Hosts files are completely ineffective here unless you've already spotted the ads and manually blocked them (and provided they haven't changed the file name of the ad/picture since your last visit).
But the ad companies doing this are tiny. It's quite likely you've never been on a website that uses this method. Most ads on the 'net are served from Google's AdChoices/AdSense program. Once Google themselves start doing it, it's game over for ad blockers and for hosts blocking. I heard through the grapevine that they're actually working on this very issue (the issue of ad blockers).
Thanks for the tip! I would imagine that once HTTP/2 becomes more popular, serving ads from the same domain might even be preferred, simply for the speed improvement.
I've recently switched from Windows to Linux, and I feel kind of "naked", because I don't know what the Linux equivalent of Windows Firewall + NOD32 + Common Sense is. I've got ufw and AppArmor installed so far. Is using such a huge hosts file common practice? Also, what about Flash? I don't want to install it, but some websites still insist on it. What would you advise, folks? I'd appreciate any suggestions.
There are a couple of products for this already: "AdTrap" and for the Raspberry Pi "pihole". (Content warning on https://pi-hole.net/ : contains piping curl into bash, slow-loading SPA site)
Pi-hole is great and I had no trouble setting it up, but I couldn't continue using it as it constantly triggered NoScript's ABE (Application Boundaries Enforcer) which blocks scripts served from a LAN address. This is for a good reason, and it may be possible to white-list the Pi in the ABE settings, but I didn't have a good enough understanding of the implications to feel comfortable just allowing it or disabling ABE.
I wanted a service for my laptop with custom blacklisting/whitelisting, blocking stats, and a webserver to serve a blank HTML page for any domain on the DNS blocklist, so I made:
There's a bit of an impedance mismatch since filter lists support some fairly advanced pattern matching while hosts file entries are obviously limited to specific domains, but it gets most domains.
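A rough sketch of the kind of extraction involved - only the plain ||domain^ rules survive the trip; anything with paths or cosmetic filters has to be dropped (GNU grep assumed for -P; file names are examples):

    # keep only whole-domain rules like "||ads.example.com^"
    grep -oP '^\|\|\K[a-zA-Z0-9.-]+(?=\^$)' filterlist.txt |
      sed 's/^/0.0.0.0 /' >> blocked-hosts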
This technique works very well for blocking ads in Skype.
You can also block the BBC Breaking News banner this way by adding polling.bbc.co.uk. Or, if you want to play a prank, use 192.30.252.153 as the IP - GitHub Pages doesn't check whether you own the domain.
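i.e. one of:

    0.0.0.0 polling.bbc.co.uk          # just block the banner
    192.30.252.153 polling.bbc.co.uk   # the prank: serve a GitHub page instead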
I'm using http://pi-hole.net running on a Raspberry Pi. I use it as my home DNS; it runs dnsmasq and points a list of a million ad hostnames at its own IP, answering every request with a blank HTML page.
How many websites does it break? I'm a bit hesitant to set up Pi-hole, since it's rather hard to enable ads for just one domain, and more and more websites just stop working with ads removed/blocked.
It's not breaking many of the websites I use regularly.
The only really breaking effect I see is when clicking forwarder links from affiliate networks, of course. That happens mostly on bargain/deal websites.
And of course it doesn't suit you if you have to work with GA or Flurry or other analytics services, because pi-hole.net blocks those too. But you can easily whitelist them via ssh.
I've been using it for 2 months now and am pretty satisfied. The traffic-saving effect is also nice, and it makes websites load faster as well.
"... people who cannot edit /etc/hosts, but can change DNS server."
e.g., "mobile" or "tablet" users who choose Apple iPhone, iPad, etc.
The idea of ARPA-networked devices that have no user-editable HOSTS file seems inferior to ones that do - i.e., to the vast majority of ARPA-networked computers for three decades - but that's just my uninformed view.
The experts selling these things must know better.
DNS works very well for blocking ads. It allows for things that cannot be done with HOSTS alone.
But if you trust a third party for your DNS resolution needs (ad-supported search engine company "free" public DNS, ad-supported, corporate-sponsored browser, etc.), then all bets are off.
If and when advertisers complain and start to cut back on spending, then these third parties could remedy the situation, easily. In my opinion.
If the user is running her own DNS services, then it may be too much trouble for advertisers and the companies they prop up. It is a stretch to think that any ad-supported company could stop users from exchanging lists of names and numbers, whether through a HOSTS file, zone files, or some other mechanism.
This needs to be a larger effort: one hosts file updated hourly or at some regular interval, which we just configure to fetch and update through a cron job, and forget.
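The fetch-and-forget part is already easy. A hedged crontab sketch (the URL is a placeholder for whichever amalgamated list you trust; this would go in root's crontab so it can write /etc/hosts):

    # refresh the hosts file at the top of every hour
    0 * * * *  curl -fsSL https://example.com/hosts -o /tmp/hosts.new && cp /tmp/hosts.new /etc/hosts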
My Linux system runs a 60k+ line /etc/hosts file, plus dnsmasq for local resolution. The nice thing about dnsmasq is that it treats domain-level entries as domain-level assignments (example below).
No measurable lag.
I'd used the various blocklists used by uBlock Origin, as well as some additional entries of my own, de-duplicated. There are some overaggressive entries; I've commented those out.
A nice plus: I found the dozen or so hosts/domains associated with autoplay video crap, added them, and have no more bother from that.
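The domain-level dnsmasq entries mentioned above look like this (the domain is just an example) - one line covers the domain and every subdomain, which a hosts file can't do:

    # /etc/dnsmasq.conf
    address=/doubleclick.net/0.0.0.0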
I highly doubt it. I use AdAway for Android, and my current (/system)/etc/hosts file is 58626 lines with no noticeable lag on my Nexus 7 2012 and my Nexus 9.
I would assume local IO would usually beat network IO.
You can pay for a basic license of OpenDNS and "null route" (to a block page) ads at the device / network level. Those preferences get synced to anycasted DNS resolvers and performance hits are negligible.
I remember reading a bit on how the kernel stores the hosts list, but I stopped before finding how it scales. All I remember is that it seemed like a linear list. If it was a trie it would be alright.
You're getting downvoted, but I remember this issue cropping up on Windows XP boxes. Other operating systems and more modern versions of Windows cope with large hosts lists more competently.
Isn't the O(n^2) operation described in that thread happening only when the hosts file gets updated, and not with every DNS lookup? I would imagine even sequentially searching a local list of 350,000 entries, like the post describes, would still be extremely fast in the context of loading a web page.
But, that said, there can be a huge difference between how one might think things should work and what actually happens.
Yeah maybe it's mostly specific to Windows. Wouldn't surprise me if that's the case. I was speaking from a Windows perspective (I should have specified).
I've been using this method for months if not years. I haven't noticed anything unusual about browsing speed, except that it's increased due to fewer ads.
Does anyone have a convenient way to convert this to a bind9 config format? I would rather run this for the whole LAN than just one computer at a time.
> Does anyone have a convenient way to convert this to a bind9 config format?
If you use BIND RPZs, you can convert from /etc/hosts format to BIND zonefile format (or just pump the new entries to nsupdate), which should be pretty trivial. Some information and useful links are in this comment and the subsequent commentary. [0]
IMO RPZs are substantially easier to manage than an ever-growing set of master blackhole zones, especially when you have slave DNS servers.
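The conversion really is trivial. A sketch of hosts-format to RPZ records (the file names are my choices, and you'd still prepend the usual $TTL/SOA/NS boilerplate to make it a valid zone):

    # emit RPZ entries that NXDOMAIN each blocked host and its subdomains
    awk '($1 == "0.0.0.0" || $1 == "127.0.0.1") && $2 != "localhost" {
           print $2 " CNAME .";
           print "*." $2 " CNAME ."
         }' /etc/hosts > blocked.rpz.db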
Another option is to stand up a DNS server that knows how to do something like BIND 9's Response Policy Zone [0][1].
Although figuring out how to propagate RPZ changes to clients isn't exactly straightforward (more on this below), if you're using BIND, you can set up views that match certain clients and provide one mix of RPZs to one set, and another mix to another set.
On updating RPZs in a view (warning: BIND 9-specific instructions follow):
So, BIND has this nifty option for a zone called "in-view". This lets you say "The data for this particular zone lives in this other view, so when requests come in for this zone, in this view, use the data in this other view.". It might sound complicated, but it's really just a pointer to a pre-existing zone definition. This lets you define your master zones in one big "zone definition" view, and have client-specific views refer back to those definitions.
However, you can't use in-view with RPZs. Why? Who knows? [2] But, what you can do is this:
* Create one unique RNDC key per view
* Add an allow-notify and match-clients entry in each view with that view's key
* In the appropriate views, add a slave zone definition for each relevant RPZ, with localhost as the master, and whatever is your usual domain xfer key as the key [3]
* Back up in your "zone definition" view, add an entry for localhost and each view key to the also-notify list of each master RPZ definition. [4] Having an ACL just for these RPZ slaves keeps the RPZ definitions clean.
Now you have dynamically updatable host blocking that can be deployed on a per-host basis, if you like. It's initially a bit more work than managing a local hosts file, but you can easily apply host blocking lists to any set of machines on your LAN, and you can programmatically update the RPZ lists with tools like nsupdate. (A rough named.conf sketch follows the footnotes below.)
[2] RPZs are handled just like regular zones in every other way except for this one. It's a bit frustrating.
[3] This is actually less burdensome than it sounds, as you can write these slave zone definitions once and include the files containing the definitions in whatever view needs them.
[4] That is, if you had three views, your also-notify list would have something like the following new entries: 127.0.0.1 key "view1-key"; 127.0.0.1 key "view2-key"; 127.0.0.1 key "view3-key"; You can have entries for just the views that use a given RPZ, but it doesn't hurt to have one ACL that notifies all views when any RPZ data changes.
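Putting the pieces above together, a rough named.conf sketch - view, zone, and key names are all made up for illustration, and a real config needs the key definitions and the rest of the boilerplate:

    view "definitions" {
        match-clients { none; };                 # holds the master zone data only
        zone "rpz.block" {
            type master;
            file "rpz.block.db";
            also-notify { 127.0.0.1 key "view1-key"; };
        };
    };
    view "view1" {
        match-clients { key "view1-key"; 192.168.1.0/24; };
        allow-notify { key "view1-key"; };
        response-policy { zone "rpz.block"; };
        zone "rpz.block" {                       # slave copy, fed from the view above
            type slave;
            masters { 127.0.0.1 key "xfer-key"; };
            file "rpz.block.view1.db";
        };
    };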
Are you talking about the ability to NXDOMAIN *.example.com while also whitelisting this.example.com? AFAICT dnsmasq can't do this; I'm interested in whether BIND can. I have a simple DNS request forwarder half-written to deal with rule trees.
> Are you talking about the ability to NXDOMAIN [star].example.com while also whitelisting this.example.com?
Yeah, you can totally do that! Details are here [0][1], but in your RPZ zone file, you use a CNAME with a value of . to return NXDOMAIN, and a CNAME with a value of rpz-passthru. to process the query normally:
;allow www.sinfest.net, but deny all others, including sinfest.net
www.sinfest.net CNAME rpz-passthru.
sinfest.net CNAME .
*.sinfest.net CNAME .
It's hard to do anything else - if you collapse them you can easily break the site layout. (Some sites may even design their layouts to break if the ads are removed).
Very cool. It would be interesting to see this built into a (maybe Raspberry Pi) router, to have a more central point/policy for configuration, maybe together with caching (dnsmasq).