
BeGoneAds – A Python script that blocks ads by installing common hosts files - anned20
https://github.com/anned20/begoneads
======
AdmiralAsshat
Python scripts to modify system files make me a little skittish, even with
source code available. I think I would just as soon grab the hosts file from
[https://www.someonewhocares.org/](https://www.someonewhocares.org/) and drop
it in myself.

~~~
joyjoyjoy
Why?

~~~
LASR
Why not?

Modifying system hosts configuration requires privileged file system access.

The mindset here should be default deny.

~~~
sh-run
Yep, finding and modifying a script that runs with root privileges, but is
writable by non-root users is the oldest privesc trick in the book.

With the proper permissions something like this should be ok, but I'd tread
lightly. Especially with something that dynamically updates your hosts file.

~~~
anned20
It won't be able to run if the user that is running it doesn't have the proper
privileges. You could even protect the files by giving them other permissions
so only the root user can use them.

~~~
sh-run
Above you mentioned setting this up as a scheduled job. In this case the job
would need to run as root (or you'd need to assign the appropriate permissions
in the sudoers file, but people are lazy). If a non-root user had write
privileges to the file, they could modify the script and thereby gain root
code execution.

Naturally it's on the user to properly configure the permissions.

I'm not saying this isn't a worthy project, I'm just adding to the discussion
on why people should be cautious when running scripts with root permissions.
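
The permission check described above can be sketched in Python. This is only an illustration and is not taken from BeGoneAds: the helper names `mode_is_root_only` and `is_safe_for_root` are hypothetical, and the check looks at the script file alone, not at its parent directories (which would also need to be locked down).

```python
import os
import stat


def mode_is_root_only(uid: int, mode: int) -> bool:
    """Return True if the owner is root and neither group nor others can write.

    Hypothetical helper for illustration; `mode` is an st_mode value.
    """
    owned_by_root = uid == 0
    writable_by_non_owner = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owned_by_root and not writable_by_non_owner


def is_safe_for_root(path: str) -> bool:
    """Apply the check to a file on disk (parent directories are not inspected)."""
    st = os.stat(path)
    return mode_is_root_only(st.st_uid, st.st_mode)
```

A scheduled job could call `is_safe_for_root` on the script before running it with elevated privileges and refuse to continue if it returns `False`.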

------
daverobbins1
How is this different or better than Steven Black's project?

Repo:
[https://github.com/StevenBlack/hosts](https://github.com/StevenBlack/hosts)

~~~
anned20
To be completely fair with you, I didn't know this was out there. Then again,
I think my solution is more elegant, especially if all the todo's are
finished.

~~~
tempsolution
Patching a critical system file with python on a regular basis... What could
go wrong?

~~~
notatoad
I don't understand the aversion to python here. What is it about python that
makes it less reliable than any other piece of software?

Yes, downloading hosts files from 3rd-party sites is kind of sketchy. But
using python to do it is what you're worried about?

------
inlined
Since accepting Host files from someone on the internet can be dangerous I dug
into the code:

The list of hosts to exclude comes from several sites here:
[https://github.com/anned20/begoneads/blob/2c90fcee221edf71f8...](https://github.com/anned20/begoneads/blob/2c90fcee221edf71f870b1282bad3d25ac151488/begoneads/begoneads.py#L15)

The actual application of the hosts file is here:
[https://github.com/anned20/begoneads/blob/2c90fcee221edf71f8...](https://github.com/anned20/begoneads/blob/2c90fcee221edf71f870b1282bad3d25ac151488/begoneads/hostsmanager.py#L32)

I missed something though. Is a simple domain name per line enough to send
that content to /dev/null? I haven’t used that form in /etc/hosts.

My primary concern was that this technique could be used to send ad traffic to
a site that returns 404 but gathers metrics on the web regardless.
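
For reference, a hosts-file entry is an address followed by one or more hostnames, so a bare domain only has an effect once it is paired with an address. The sketch below shows one way a list of bare domains could be turned into such entries. It is not the code from the linked repository: the function name `to_hosts_entries` and the `sink` parameter are assumptions made for illustration.

```python
def to_hosts_entries(domains, sink="0.0.0.0"):
    """Render bare domain names as hosts-file lines: '<address> <hostname>'."""
    entries = []
    for raw in domains:
        domain = raw.strip().lower()
        # Skip blanks and comment lines, which carry no hostname.
        if not domain or domain.startswith("#"):
            continue
        entries.append(f"{sink} {domain}")
    return entries


if __name__ == "__main__":
    sample = ["ads.example.com", "# a comment", "Tracker.example.net"]
    print("\n".join(to_hosts_entries(sample)))
```

Because the address is chosen by whoever generates the file, checking which address appears in front of each hostname is the way to confirm that traffic is not being pointed at a third-party server.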

~~~
anned20
This is actually sent to the IP address 0.0.0.0, it roughly means that all the
traffic of the listed hosts is routed back to localhost

~~~
unfunco
In this context, it actually means a "non-routable meta-address used to
designate an invalid, unknown, or non-applicable target" [1] - 127.0.0.1 is
localhost, 0.0.0.0 is its own thing.

[1]: [https://www.howtogeek.com/225487/what-is-the-difference-betw...](https://www.howtogeek.com/225487/what-is-the-difference-between-127.0.0.1-and-0.0.0.0/)
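
The distinction can be checked with Python's standard `ipaddress` module, which classifies the two addresses differently. A minimal sketch; the helper name `classify` is hypothetical.

```python
import ipaddress


def classify(text: str) -> str:
    """Label an address as 'unspecified', 'loopback', or 'other'."""
    addr = ipaddress.ip_address(text)
    if addr.is_unspecified:
        return "unspecified"
    if addr.is_loopback:
        return "loopback"
    return "other"


if __name__ == "__main__":
    for text in ("0.0.0.0", "127.0.0.1"):
        print(text, "->", classify(text))
```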

~~~
anned20
Yes, you're completely right. Mixed those 2 up.

------
sherincall
I get that this is just someone's side project, I'm glad it exists and they're
free to write it in their favorite language/environment and all; but the
effort to actually run this is equivalent to actually copying the hosts files
manually, and I already have all the dependencies installed. I could never get
my non-techy parents to run this properly.

If the goal of the project is actual adoption, a native executable without
external dependencies would have been a much better option.

~~~
anned20
This is a todo. It's already on PyPI and I'm working on getting it packaged
for all the main Linux distros, Windows and macOS.

~~~
sherincall
I saw the todo item, but didn't realize it also included providing an
executable for Windows and macOS. Thanks!

------
barbecue_sauce
Anybody have a sense of the performance overhead of using hosts files versus a
detached hardware solution like a pihole?

~~~
NikolaNovak
My understanding is that the difference is in scope, not performance.

Hosts files will only affect the host (workstation/desktop/laptop etc) they're
installed on.

Things like piHole try to make it easy to apply the solution to all members of
your network - which even in household cases these days can number in dozens,
making it impractical to manage hosts files for all of them (this includes
items like phones, where editing the hosts file is typically infeasible).

~~~
ycombonator
It would be nice to see a performance hit based on the number of hosts
entries.

~~~
dredmorbius
About 3ms for 68k entries:
[https://news.ycombinator.com/item?id=20148457](https://news.ycombinator.com/item?id=20148457)

------
gregw2
I have cron jobs on my mac that update my hosts files (to block "addictive"
sites in my case (not ads)). It doesn't really work.

Browsers cache and use outside DNS servers despite the hosts files. Chrome and
sometimes Safari don't really honor the hosts files 100% of the time. Every
once in a while I google around to try and restore my control, try to tweak my
browser settings but I have yet to find anything that makes using hosts files
bulletproof.

~~~
mywittyname
I think firewall rules would be your next line of defense. I'm not sure how
configurable most home routers are though.

------
rafaelvasco
Reading the code one clearly sees why Python is so well suited for these kinds
of applications, one-shot script executables: Really nice string ops, regex,
file io etc. One of my favorite languages. The other is C# for everything
else, that Python is not that suitable for: Huge complex codebases, type
safeness, more strict performance requirements etc. Especially the static
typing. The dynamism and lack of type annotations of Python really bothered me
when I was developing a somewhat complex desktop app in it some years ago. I
guess I'm a static typing guy with optional dynamism kinda person.

~~~
misterdoubt
If you haven't checked back lately, type annotations in Python are getting
better and better. Built-in support via the typing module and a strong
community package in mypy.
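
As a small illustration of what annotated Python looks like, here is a sketch of a hosts-line parser using the `typing` module. The function `parse_hosts_line` is hypothetical and is not taken from BeGoneAds.

```python
from typing import List, Optional, Tuple


def parse_hosts_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Split one hosts-file line into (address, hostnames), or None if empty."""
    # Drop a trailing comment, then surrounding whitespace.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    address, *hostnames = content.split()
    return address, hostnames
```

With the `Optional` return type spelled out, a checker such as mypy would typically report callers that unpack the result without first handling the `None` case.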

------
bigend
If you let someone else manage the hosts your computer resolves, you are
trusting that someone as much as your ISP. A man in the middle.

------
ris
> You ran WHAT script on your machine?!

------
mehrdadn
Hosts files slow down the system as well as the browser itself. Get/create a
browser extension to actually block the request (at least while your browser
supports this) so you get immediate results.

------
firefoxd
I wish the hosts file could have an include directive. Since I regularly add
or remove entries, the file becomes a mess.
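
One workaround is to keep entries in separate fragment files and generate the hosts file from them. The sketch below assumes a hypothetical directory of `*.hosts` fragments; the function name `build_hosts` and the naming scheme are made up for illustration, and writing the result to the real hosts file (which needs elevated privileges) is left out.

```python
from pathlib import Path


def build_hosts(fragment_dir: str) -> str:
    """Concatenate *.hosts fragments in name order into one hosts-file body."""
    parts = []
    for fragment in sorted(Path(fragment_dir).glob("*.hosts")):
        text = fragment.read_text(encoding="utf-8").rstrip("\n")
        # Label each section so the generated file stays readable.
        parts.append(f"# --- {fragment.name} ---\n{text}\n")
    return "\n".join(parts)
```

Numeric prefixes such as `10-base.hosts` and `20-ads.hosts` give a predictable order, and removing a group of entries becomes a matter of deleting one fragment and regenerating.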

------
zactato
Serious question. We all realize that the economics of the internet is largely
fueled by ads, so why are we so keen to block them? It’s ad revenue that has
allowed technology to flourish so strongly over the last two decades.

~~~
dsswh
Not long ago, the economy was largely fueled by slavery. Yet we got rid of
that.

~~~
briandear
Not exactly Godwin’s law, but pretty close. Comparing advertising and
marketing to the ownership of human beings? Slavery infringed on the
inalienable rights of human beings; the existence of advertising doesn’t take away my
freedom or potentially subject me to beatings.

It’s a ridiculous comparison. I am not a friend to intrusive ad-tech, but
making a moral equivalence to slavery is to trivialize slavery. It’s like
comparing parking tickets to the death penalty.

~~~
dsswh
It's a valid comparison. Long ago it was ok to kill your enemy. Not long ago
it was ok to have slaves. Today either is a sure way to end up in prison.
Standards are rising. IT is a very new thing and the society and the laws are
behind a bit. Adtech uses this to extract profit while it can. But this will
end. Soon it will be a crime to store personal data: names, location, anything
like that. GDPR is just the beginning. Adtech will fight, but it will lose.
This business will disappear entirely, just like slave labor. In far future it
will be a crime to be intrusive: any unwanted ads; and mining personal data
will be seen like cannibalism today, i.e. even criminals will consider such
people as freaks. Right now we are in the era of wild west in IT.

------
joyjoyjoy
I use host flash: [http://host-flash.com/](http://host-flash.com/)

Does anyone know an up-to date list for blocking social networks?

~~~
DyslexicAtheist
Steven Black's hosts file ... just specify the "-e social" or "--extension
social" option, or use a "myhosts" file to name your own domains for a
subset (e.g. all of facebook or whatever)

[https://github.com/StevenBlack/hosts](https://github.com/StevenBlack/hosts)

------
appleflaxen
how does the list compare to the pi hole hostfile?

------
jakeogh
Nice to see projects using click!

Here's another one to toss on the pile (works, I use it, supports wildcards,
*nix only):
[https://github.com/jakeogh/dnsgate](https://github.com/jakeogh/dnsgate)

------
hlau
I'll be the first to admit that the existing advertising ecosystem is broken,
primarily due to misaligned incentives across the board. But, given a choice,
would you rather have a clearly labeled thing that you know is an ad
transparently trying to influence you or a sneaky human billboard, err
"influencer" coming up to you with an agenda along with tons of product
placement in whatever you watch/read/listen to?

~~~
harry8
There's no either/or decision to be made here. You get compromised, paid for
content with or without ads as well. Critical thinking is a requirement
always.

~~~
hlau
There definitely is an either/or because blocking of one channel will
naturally necessitate money/barter flowing to the other channel. One is at
least transparent and regulated, the murky world of influence peddling isn't
since it's hard for anyone to tell in the moment whether something is
"organic" or not.

~~~
harry8
Not when the other channel is already at capacity. And it is. Blocking ads has
no effect on that. You never agreed to being tracked either, so blocking that
is the right and proper thing to do. Blocking surveillance capitalism might
push businesses toward honesty, it's at least worth a shot.

~~~
hlau
If you think influencer marketing and product placement are already at
capacity you have no idea how much worse it's about to get if ad blocking gets
much worse. And the irony is that, by design, you won't know a good chunk of
the time and other times it'll just merely be implied without being explicitly
stated. Continued use of social networks, including this one, collects way
more identifiable data than what the non-Google/FB/Amazon ad market collects.
Ad blockers have had near 0 impact on FB's operations. Google and others have
paid to ensure that their search ads still make it through most ad blockers.

Blocking ads does not drive businesses to be more "honest". They'll just spend
more on PR and influencers. And given how hostile this community is to ads and
perhaps even marketing overall, (how YC ever backed a marketing or ad startup
is beyond me), companies already realize that getting a fawning TC article
purchased thru connections and favors and PR chicanery is going to be more
effective than an ad campaign even though the ad campaign is more honest, upfront
and transparent with its agenda.

