Not the way I'd do it, since you can easily miss some new domain that belongs to Facebook (or some server that doesn't look like it belongs to Facebook at all, but is sitting in their assigned subnets).
If you really want to block all traffic from/to Facebook, look up the IP prefixes associated with their AS number (AS32934) and set up your firewall to block those. If you are using PF, tables are your friend. With netfilter, consider using ipset.
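For netfilter/ipset, a minimal sketch looks like this (the two prefixes here are only illustrative; generate the real list from AS32934's announcements, e.g. as shown further down in the thread):

  $ ipset create facebook hash:net
  $ ipset add facebook 31.13.24.0/21      # example Facebook prefix
  $ ipset add facebook 66.220.144.0/20    # example Facebook prefix
  $ iptables -I OUTPUT -m set --match-set facebook dst -j REJECT

The PF equivalent is a persistent table loaded from a file, along the lines of 'table <facebook> persist file "/etc/facebook-nets"' plus 'block drop quick to <facebook>'.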
Very interesting: after looking up the AS number [1] you provided, it includes domains that are not on that GitHub list, as you mentioned. I would have to agree that this is the better method for blocking Facebook.
[1] http://www.tcpiputils.com/browse/as/32934
* Note: there are more ads than average on this site and it's a bit plain, but in my brief searching it was the only one I could find that showed the IP prefixes associated with the AS number.
If anyone has a better lookup tool that'd be great!
edit: I should also note that since other users are mentioning FB might be under multiple AS numbers, maybe the ticket would be to set up a GitHub repo that collects these AS numbers for FB.
edit 2: found a less ad-filled site!
http://ipduh.com
Search for AS32934, then click "prefixes" at the top.
Because most people don't block outbound, and certainly not in a stateful way, which makes it a poor place for a blacklist. To get this to work outbound, you need to allow all other traffic out (fine, that's probably what you're doing already) or have a curated whitelist of other traffic allowed out. I assume this package doesn't want to make that assumption, so the safe thing to do is to make an inbound blacklist.
That doesn't make any sense. You can block outbound just fine by having your block rules followed by a default allow. You don't need anything to be stateful when you are blocking whole IP addresses.
Nothing prevents you from having explicit deny rules followed by explicit allow rules followed by default deny.
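In iptables terms, where the first matching rule in a chain wins, that ordering is simply (a hypothetical sketch; the prefix is just an example):

  $ iptables -A OUTPUT -d 31.13.24.0/21 -j REJECT     # explicit deny
  $ iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT   # explicit allow
  $ iptables -P OUTPUT DROP                           # default deny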
And if you're already doing outbound whitelisting (which is generally much more trouble than it's worth) then unless you put Facebook on the whitelist you don't need to do anything anyway.
Smart; even this list forgets national domain names belonging to Facebook. For example, facebook.no, which is owned by a rather anonymous company (lovellsnames.org / Low Gravity Limited), yet the domain appears to be used by FB, and God knows how many subdomains exist under it.
ARIN, et al, provide WHOIS servers that will map IP address information to ASNs. The list of all registered ASNs is also public. Also, there are routing registries which can provide even more information on which ASNs announce which prefixes.
Most of what you'd want is out there, but you might have to obtain it all and do a bit of work to combine/correlate it all.
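For example, Team Cymru runs a public WHOIS service that does the IP-to-ASN mapping in a single query; if memory serves, the quoted leading space and -v flag are part of its documented usage:

  $ # returns the origin ASN, announced BGP prefix, registry and AS name
  $ whois -h whois.cymru.com " -v 31.13.24.1"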
Completely agree. Unfortunately, dnsmasq, firewalls, running your own BIND, PF tables, or the other solutions you mention all require a level of technical expertise far, far beyond the norm, or even two standard deviations above the norm of Internet users' technical ability (for some of them, including me). I had maintained the Facebook blocklist for hosts files because it was super simple for me to use, and to share with others.
Definitely not ideal, not even complete, and it requires work, BUT nearly any Internet user can implement the solution this way.
It would be really interesting to autogenerate the domain lists by running background scripts against the AS numbers, polling reverse DNS for every IP in the range, and cataloging the domains by script (say, daily), then printing the list as a 0.0.0.0-prefixed hosts file; a rough sketch follows below. Thank you!
disclaimer: github user maintaining linked resource
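A rough shell sketch of that pipeline, assuming reverse DNS is populated for the addresses (it frequently isn't, and walking whole prefixes is slow), using a single example /24 purely as illustration:

  $ for i in $(seq 1 254); do
      name=$(dig +short -x 31.13.24.${i} | head -1)   # PTR lookup, first answer only
      [ -n "${name}" ] && echo "0.0.0.0 ${name%.}"    # strip trailing dot, hosts format
    done | sort -u >> hosts.facebook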
Most likely yes. There are only about 50k public autonomous systems on the entire Internet. The problem of course is already mentioned by another commenter: some FB assets are in other ASes.
An AS (identified by an ASN) is an autonomous system. It is composed of multiple CIDR blocks, contiguous regions of IP addresses. The network definition (by CIDR block) is fairly dynamic, as blocks can be added, deleted, or consolidated.
An autonomous system is a single administrative domain over public IP space. Essentially, autonomous systems are what the Internet is inter-networking between, via BGP (Border Gateway Protocol). BGP and ASes are what Cisco (and other router) gear are ultimately all about.
So yes: organisations typically have one AS. Exceptions are usually the result of corporate mergers (not uncommon) or government space (where the domains are large).
(Disclaimer: I'm not a networking bithead, don't muck with routers much, and have a rough knowledge of much of this, though it should be vaguely accurate.)
And if you end up with more than one AS due to a merger, you usually pick one of them as your primary and connect the others to it, then announce their prefixes from your primary, i.e. your primary AS becomes transit for the others. That way you only have to maintain one external border.
I was at a hackathon a few weekends ago, and to my surprise I needed to register a domain name and get an SSL certificate for it.
I thought I was hopelessly blocked for the duration of the hackathon, since DNS propagation disclaimers allow themselves 72 hours, and SSL certificates require who knows what.
I was able to get a completely new domain with Amazon Route 53 and Amazon's free SSL certificates in 20 minutes.
So yeah, I would say these blocklists are futile now, in OP's format.
It's inefficient to specify a large number of hosts in the facebook.com domain instead of blocking the whole domain.
For this, you can run dnsmasq and use the "--address" option, or the "address" directive in dnsmasq.conf:
$ man dnsmasq
[...]
-A, --address=/<domain>/[domain/]<ipaddr>
       Specify an IP address to return for any host in the given
       domains. Queries in the domains are never forwarded and always
       replied to with the specified IP address, which may be IPv4 or
       IPv6. To give both IPv4 and IPv6 addresses for a domain, use
       repeated -A flags. Note that /etc/hosts and DHCP leases
       override this for individual names. A common use of this is to
       redirect the entire doubleclick.net domain to some friendly
       local web server to avoid banner ads. The domain specification
       works in the same way as for --server, with the additional
       facility that /#/ matches any domain. Thus --address=/#/1.2.3.4
       will always return 1.2.3.4 for any query not answered from
       /etc/hosts or DHCP and not sent to an upstream nameserver by a
       more specific --server directive.
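So, as a sketch, blocking Facebook's main parent domains comes down to a few lines in dnsmasq.conf (this list of parent domains is illustrative, not exhaustive):

  address=/facebook.com/0.0.0.0
  address=/facebook.net/0.0.0.0
  address=/fbcdn.net/0.0.0.0
  address=/fbcdn.com/0.0.0.0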
Is there a concept of an "administratively prohibited" error in DNS? Then your resolver could return an error with that code rather than an incorrect result.
Sadly not. The closest match is probably SERVFAIL, but that covers all sorts of problems. (SERVFAIL is what a validating recursive server will return if its upstream tries to NXDOMAIN a signed domain.)
When a site embeds Facebook, it's using connect.facebook.*; the main bulk of the list is just first-party hostnames that only matter to Facebook users. This hosts file will break all of Facebook even if you visit it directly. The better option is to block Facebook only outside of Facebook itself.
Knowing corporate org charts for what they tend to be, reporting and analytics initiatives, and any server statistics therein, are considered revenue-generating information (leads), and thus subject to agreements for the exchange of mutually beneficial data sets.
Across my various jobs, I've had to write reports for departments, and open up permissions to internal people, to give read access for things they'd have no natural reason to care about.
If data is being collected at all, weird people will be looking at it. If not today, maybe tomorrow. But, no matter when, it's there for the looking whenever some internal lookie-loo decides it might be interesting.
I've been blocking facebook for years (nowhere near as comprehensive as this list though).
Many of the most unfortunate problems with these sites are social in nature rather than technical. For example, no matter how much I plead with people not to, they keep uploading information about me to these types of sites, including photographs with timestamps and GPS location metadata, in which they then "tag" my face as me.
I don't have any idea how much of this information is even out there, since these sites require signing up in order to find out. Maybe I should look into my rights under data protection legislation...
This question sounds a little too close to "nothing to hide, nothing to fear" to me, but in any case I think it's Facebook, attempting to build dossiers on billions of people for profit, who need to justify themselves; not me for wanting to remain undocumented.
As far as concrete reasons go, I've had to deal with far too much fallout from being incorrectly flagged by braindead processes trawling private databases which I didn't even know I was in. Since lots of these databases share information, but not necessarily updated corrections, I still run into the same mis-flagging every few years, across utilities, courts, credit agencies, banks, letting agents, etc.
As far as Facebook goes, being a citizen of the CCTV-riddled UK makes me acutely aware of the power, and potential abuse, that facial recognition technology can bring; having images of my face tagged and fed into a database does not sit well with me.
Since I don't use Facebook, I don't even get the meagre upside of whatever services they build on top of this database (some kind of gallery, I presume).
Is it possible to apply this at the network level? I want to update my home router easily so that all devices in my home can benefit, not just my own laptop (since most of these lists just update /etc/hosts).
If your router runs dnsmasq, it should be possible. It would depend upon your router, however, and how much control it allows you over the dnsmasq configuration.
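On firmware that lets you edit dnsmasq's configuration (OpenWrt and the like), a sketch might be as simple as the following; the file paths are assumptions that vary by firmware:

  # in /etc/dnsmasq.conf (or a snippet in /etc/dnsmasq.d/)
  addn-hosts=/etc/hosts.facebook    # extra hosts file served to every client
  # or use the --address form discussed above:
  # address=/facebook.com/0.0.0.0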
> Would wildcard support in hosts files be too heavy for the performance needed?
Can anyone explain to me why wildcard support for blocklists would require more performance?
Assuming it's just basic wildcards, not full regex.
Seems to me that 1) with wildcards you need to check fewer bytes per entry and 2) since the list itself will be shorter, you need to check fewer entries.
Do any of these filtering tools compile the list (offline) into a minimal state machine or a trie? That would probably maximize performance (and benefit from wildcards too).
I can understand the multiplication of subdomains, to be able to use multiple connections. But what's the rationale for the multiplication of domain names? Ad-blocker avoidance?
Also, they're literally separate domains in the sense that one team probably owns authoritative DNS for fbcdn.com and another probably owns facebook.com. With a big infrastructure, it would be negligent to give everyone permission to edit a domain like that. But you probably want to model permissions more like an org chart and less like a hand-curated list of people who have both a business reason to edit and steady hands/a full understanding of DNS.
Lots of big infrastructures are pretty much put together like the internet.
I just want to say "privacy matters, thank you" (even more so since FB decided to leverage their like/share button for a global ad network) :-) Non-tech-savvy folks may love a simple .sh/.bat to automatically add those entries to the hosts file on Windows & Unix.
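Something like this minimal sketch for the Unix side, say (the list URL is a placeholder, GNU sed is assumed, and note the next comment's caveat about piping remote content into your hosts file):

  #!/bin/sh
  # replace any previous copy of the blocklist between markers, then append a fresh one
  LIST_URL="https://example.com/facebook-hosts"   # placeholder, not a real list
  sed -i '/^# fb-block start/,/^# fb-block end/d' /etc/hosts
  { echo "# fb-block start"; curl -fsSL "$LIST_URL"; echo "# fb-block end"; } >> /etc/hosts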
Telling "non tech saavy guys" to pipe URLs into their shell (or hosts file in this instance) is a pretty bad idea. You're training them to engage in risky behaviour and be even more gullible.
This is probably a reaction to the news that Facebook is now officially tracking non-users to create shadow profiles and serve adverts to them off Facebook itself. I think it's the serve-adverts-off-Facebook-itself part that's the actual news; all of the moderately chilling tracking and profile construction was of course happening already.
And yet, Google started down a similar path in December 2009, when they introduced personalized search for non-logged-in users, and nobody tries to block them.
The problem with blocking all Google domains is the number of sites it would break. YouTube, Gmail, googleapis for JS libraries, Google's blog platform, maps based on Google Maps, and more would break.
Technically true, but blocking all Facebook domains breaks comments on many sites and the login feature on a few crucial ones too.
<sarcasm>It also 100% breaks your social life, but maybe there's little of that left to disrupt anyway, amongst the typical target audience for these lists :P</sarcasm>
It's actually the other way around. Normal people have managed their social lives without facebook for generations. It's only the recent crop or two who seem unable to do it.
No, "Normal people" who care about their social lives use whatever their friends use to get together at the time. Nowadays this is social networking sites, i.e. Facebook.
This is factually incorrect. Just because you haven't personally seen people block Google domains doesn't mean those people don't exist. I've blocked many Google domains for a long time. Not only was GA the first domain in my blacklist, it's also the domain that many of my non-technical friends/family wanted blocked.
You should do nothing. I can think of at least one good reason why I want to do that, though: to disable their tracking of non-users on my computer.
Snowden has made it clear that the government grants itself direct access, whatever the legal situation really is.
It's also become clear that the government lets itself get away with it. And that there is no resistance from the voters who voted the politicians in and are paying not only for the politicians' salaries but also for their own total surveillance.
If they regulate it, they will probably do it like in the EU, where you have to click some super-annoying "I agree to cookies bla bla bla" thing when entering any website, which just trains people to automatically agree.
That was really frustrating. As a UK Web developer at the time, I understood the ruling as preventing the use of client-side tracking technology without an opt-in; this would have included tracking cookies, supercookies, web beacons, etc., but wouldn't include non-tracking uses required for functionality, like "remember me" tickboxes.
It looked like a good first step to tackling rampant privacy violation, but at the last moment the Information Commissioner caved in to bullshit claims that the ruling would cause the collapse of all Web businesses. The enforcement was changed from "not allowed unless opted-in" into "visiting a site counts as opting in".
The end result is not only completely ineffective, as it basically changes nothing; it has also resulted in the proliferation of ridiculous "by using our site you agree to our use of cookies" messages, which just annoy without accomplishing anything.
With the help of a couple of prefix aggregation tools [1] [2], the Bash shell, and the RIPE database, it is straightforward to block any autonomous system, e.g.:

  $ ASN=32934; for IP in 4 6; do
      # ask RIPE RIS for the prefixes announced by the AS (!gas = IPv4, !6as = IPv6);
      # the space-separated prefix list is on line 2; aggregate/aggregate6 collapse it
      whois -h riswhois.ripe.net \!${IP/4/g}as${ASN} |
        sed -n '2 p' | tr ' ' '\n' | aggregate${IP/4/} |
        while read NET; do echo ip${IP/4/}tables -I OUTPUT -d ${NET} -j REJECT; done
    done

(Note this command uses echo, so it only prints the iptables/ip6tables rules it would add; drop the echo to apply them.)
There is no consistent way of defining the meaning of "all domains of a company". Who pays for the registration? Which email is listed as the technical contact? Who has the authority to change DNS records? Which email is listed in the DNS SOA record?
Aside from being easier to automate, getting IPs via the ASN lookup is also better for blocking HTTPS requests when you sit in the middle of the connection, since the HTTPS request will only contain the IP and not the FQDN.
Also, many firewalls do a one-time DNS lookup of a given FQDN to resolve a single IP address when an FQDN-based rule is created. This doesn't work well if you have an FQDN that can resolve to many different IPs, which is typical for cloud services.
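This is easy to check with a resolver query; run it from different networks, or a few minutes apart, and you will often see different addresses (a hypothetical check using the list's main domain):

  $ dig +short facebook.com    # repeat the query: the A records often differ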
TLS connections from browsers usually include the SNI extension, which carries the destination host name in clear text. It requires a TLS-specific blocker rather than IP firewalling, but is probably more flexible. You could also just block the names in DNS.
Ghostery only works within the web browser. Other apps remain free to embed such links or assets from those locations (and based on my own use of a hosts file, that's more common than I'd like to admit).
Slightly off topic: it would be nice to have some kind of Chrome extension that blocks all time-wasting websites with one click. Has anyone seen something like that?
I discovered that using https://chrome.google.com/webstore/detail/waitblock/kcnjfepp... to add a delay before opening the time-wasting website actually works better when trying to procrastinate less. Waiting 60 seconds before FB loads gives you enough time to think about whether you really want to visit it, but is also not so inconvenient that you'd disable it straight away when you actually want to visit the site.
It's really interesting. It kind of turns your impulsiveness against itself, so your monkey brain says "Ugh, waiting for Facebook is boring, let's do something else."
This extension is really awesome! Thanks for sharing.
I usually open some time-wasting website when I'm waiting on some other task to finish, and it's got so bad that I do it even if the wait is <30 sec; such is the addiction to these little useless rewards.
Now with a wait time on the time-wasting websites, maybe I could use this habit against itself and instead go do something else more productive while I wait.
Though hosts files can be fed to uBlock Origin ("uBO"), it will enforce their content differently.
With uBO, a "facebook.com" entry in a hosts file will cause all subdomains of "facebook.com" to be blocked as well, so there is no need to list every subdomain as is done here if the goal is to block "facebook.com" with uBO.
If one wants to block Facebook via uBO, I personally advise doing it through dynamic filtering [1]. This way one can always point-and-click to create exceptions on a per-site basis.
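If I remember uBO's "My rules" syntax correctly, dynamic-filtering rules take the form "source destination type action", so the blanket rules would look something like:

  * facebook.com * block
  * facebook.net * block
  * fbcdn.net * block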
Well, if my goal is to block all of Facebook's domains, I wouldn't complain if uBO happens also to block new.sub.domain.fbcdn.net even if it's not in the hosts file :)
Does uBO optimize these cases, though? E.g. if there's "apps.facebook.com", "connect.facebook.net" and plain "facebook.com", does it collapse to just 1 filter (facebook.com)? I see it says "880 used out of 881" which is the number of entries in the file.
Microsoft was an investor in Facebook (http://whoownsfacebook.com), and the two are planning an undersea cable between the US and Europe that will be used only by those two companies.
Honest question, because I seriously don't know: is Facebook really worse than Google when it comes to privacy?
I kind of wonder who exactly the people are who tell everyone to block Facebook everywhere, while everyone seems to collectively ignore Google.
Google and Facebook both seem to purposely ignore the known implications of their data collection programs. They have likely handed over data to the NSA, and we know they sell the data.
Not really. We know that they sell ads which can be targeted to users with specific characteristics. If you have discovered actual user data for sale from Google or Facebook, that's news.
> Is facebook really worse than google when it comes to privacy?
No. In my view Google (has the potential to) collect a lot more sensitive data than Facebook. All Facebook knows is who my friends are and stuff like what things I like and where I've been - mostly things that I wouldn't mind being public anyway. Google knows everything I search for, every email I receive and every web page I visit.