
A Raspberry Pi as a decent residential proxy - AlexITC
https://wiringbits.net/wiringbits/2020/06/07/a-raspberry-pi-as-a-decent-residential-proxy.html
======
tzs
> Don’t expose the proxy to the world or attackers will be able to interact
> with your home devices

Or use it to get stuff on the internet that they don't want to get directly
themselves. Once upon a time I accidentally enabled open proxying on the web
server on my Linux box. This was before smart phones and tablets and such,
when my Linux box was the only network client I had, and so I didn't have a
NAT router. Just modem straight to Linux box.

One day I checked my Apache logs to see if my extremely low traffic web site
had any visitors.

First impression: "What the hell is all this traffic? This is orders of
magnitude more than I should have!".

Second impression: "Why the hell is someone trying to get horse_fucking.mov
from my server?".

Third impression: "That didn't give a 404. Why is my server actually providing
horse_fucking.mov?".

(No, the orders of magnitude extra traffic were not all people getting
horse_fucking.mov. There was a whole plethora of bestiality titles, plus a lot
of other porn that you would not want to fetch directly from your own IP
address either out of embarrassment if you got caught or because it was almost
certainly illegal).

~~~
oefrha
I once accidentally left privoxy open on one of my Linux servers and
misconfigured the ACL. A couple of hours later it started getting hit with
loads of traffic. A Google search quickly revealed that my ip:port had ended up
on several “free proxy” lists. They probably found it on Shodan or something. I
quickly fixed the ACL to drop the traffic, but IIRC my server persisted on the
free proxy lists for a couple of days afterwards.

That was the first hand experience that confirmed my suspicions of what those
“free proxies” really are.

~~~
AlexITC
Actually, you are pointing out one advantage of the custom-made proxy: it's not
easily discoverable by bots, nor could it be used right away.

~~~
heavyset_go
Just using non-default settings for your proxy would be enough to thwart 99.9%
of automated attacks, and you wouldn't have to rely on a hand-rolled proxy.

~~~
Topgamer7
This is the same for ssh. If you want to decrease log size, change the port
used away from default.

------
iamphilrae
I run a number of news website scrapers for various projects. Originally I ran
them all on AWS EC2 instances and racked up sizeable bills. Recently though, I
pulled an unused Raspberry Pi from my desk drawer, installed them all onto the
Pi, and now run them for near free from my desk! Granted I need to be a little
bit more efficient with memory and processor cycles, but that’s a fun little
challenge. I find that if you respect the websites by not hitting them
multiple times a minute (I hit each only once an hour), you’ll very
seldom get banned.
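
A minimal sketch of that kind of per-site throttling, assuming a scraper that calls `wait(url)` before every fetch. The class name `HostThrottle` and the injectable `clock`/`sleep` parameters are illustrative, not taken from any of the scrapers mentioned here.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until the host of `url` may be hit again; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(host)
        delay = 0.0
        if last is not None:
            # Sleep only for whatever part of the interval has not elapsed yet.
            delay = max(0.0, self.min_interval_s - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_hit[host] = now + delay
        return delay
```

With `HostThrottle(3600)` each host is hit at most once an hour, while different hosts do not delay each other.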

~~~
city41
I did the same thing when I was running a scraper. I throttled the requests
quite low to be respectful so it took a while to gather all the data, but it
worked just fine.

For this article, does the OP feel they need the proxy due to scraping
rapidly?

~~~
AlexITC
The problem isn't the speed; if it were, I would certainly use something more
powerful than an old Pi.

The actual problem is that some websites block IPs from known cloud providers,
so, it doesn't matter which headers you use because what's blacklisted is the
IP address.
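
To illustrate why headers don't help, here is a rough sketch of the kind of check a site could run, assuming it keeps a list of cloud provider networks. The ranges below are reserved documentation networks (RFC 5737) used as stand-ins, and `is_cloud_ip` is a hypothetical name; how any particular store actually does this is not known.

```python
import ipaddress

# Hypothetical blocklist: reserved documentation ranges (RFC 5737) standing in
# for the published address ranges of real cloud providers.
CLOUD_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_cloud_ip(address, ranges=CLOUD_RANGES):
    """Return True if `address` falls inside any blocklisted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in ranges)
```

The decision depends only on the source address of the connection, so nothing in the request itself (user agent, cookies, other headers) can change the outcome.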

~~~
city41
The person I was replying to and I both ran our scrapers from our homes. One
benefit is that the IP address doesn’t get flagged. Curious why scraping from
your home network doesn’t work for you?

~~~
AlexITC
Actually, that's what I'm doing.

In the post I detailed that I run a service
([https://cazadescuentos.net](https://cazadescuentos.net)) which is a browser
extension to look for discounts.

Sometimes the service needs to look up the actual discounts, so it queries the
stores from the server that's on the cloud, but some stores don't let
it through.

So I set up a Pi at home that resolves only those problematic queries, and I
expose the Pi to my cloud server through an SSH tunnel.

~~~
city41
Yeah I saw that in your article. I’m wondering why bother with the cloud
server? Why not scrape all stores from home?

~~~
AlexITC
The latency with the Pi is too high to handle all stores; I could easily do
that if I were using a more powerful device.

~~~
city41
Ah I see, cool thanks for clarifying. That does make sense.

------
vivekf
Have you looked at running WireGuard on your server and connecting to it from
the Pi? Then you have a VPN between the server and the Pi over which you can
send data in any protocol.

~~~
bonyt
I do this to expose services to the internet in a limited way: a WireGuard link
between a Pi running, say, OctoPrint, and a VPS. The VPS then runs
nginx or Caddy as a reverse proxy over that WireGuard link, giving me HTTPS
access and even letting me add basic auth if I want another layer of
authentication.

~~~
kawsper
It's a shame that client certificates are implemented in such a clunky way in
most browsers and operating systems, because that could also be an easy way to
achieve this even without installing anything.

------
ysleepy
Why not just use the SSH dynamic SOCKS5 proxy?

Just run `ssh -D 12345 raspberry` and then use localhost:12345 as the scraper's
SOCKS5 proxy. No need to involve an HTTP proxy.
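
For reference, this is roughly what a client sends to such a SOCKS5 port, following RFC 1928 (no authentication, CONNECT to a domain name). It is only a sketch of the message layout; the function names are made up, and a real scraper would normally rely on a SOCKS-capable HTTP library instead.

```python
import struct


def socks5_greeting():
    """Version 5, one auth method offered, method 0x00 = no authentication."""
    return b"\x05\x01\x00"


def socks5_connect_request(host, port):
    """Build a CONNECT request that asks the proxy to resolve `host` itself."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("domain name too long for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then length-prefixed
    # name and the port as a 16-bit big-endian integer.
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)
```

Sending the host name rather than a resolved address means DNS resolution also happens on the proxy side.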

EDIT: Ah, the ssh port is not accessible from outside and the raspi does a
reverse connect.

~~~
AlexITC
That can certainly work.

The main reason for writing the custom-made proxy is to evolve it to support
several Pis; the tunnel is the simpler workaround for now.

Also, I'm a coder and wanted to try getting Scala running on my old pi.

------
donaldihunter
For more reliable ssh tunnels you can use autossh.

[https://www.harding.motd.ca/autossh/](https://www.harding.motd.ca/autossh/)

~~~
AlexITC
The first time I hit an issue, autossh was the first proposed solution but I
preferred to not include more dependencies. Also, I believed it should be
possible to handle this with plain ssh + systemd.
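
The restart-on-exit idea can be sketched in a few lines. This is neither autossh nor the systemd setup itself, just a small Python supervisor illustrating the same behaviour; the ports, the destination, and the function names are placeholders, while the `ssh` options used (`-N`, `-R`, `ServerAliveInterval`, `ServerAliveCountMax`, `ExitOnForwardFailure`) are standard OpenSSH ones.

```python
import subprocess
import time


def tunnel_command(remote_port, local_port, destination):
    """Build an argv for a reverse tunnel: destination:remote_port -> localhost:local_port."""
    return [
        "ssh", "-N",                       # no remote command, only forwarding
        "-o", "ServerAliveInterval=30",    # probe the server every 30 s
        "-o", "ServerAliveCountMax=3",     # exit after 3 unanswered probes
        "-o", "ExitOnForwardFailure=yes",  # exit if the forward cannot be set up
        "-R", f"{remote_port}:localhost:{local_port}",
        destination,
    ]


def backoff_delays(base_s=1.0, cap_s=60.0):
    """Yield restart delays that double each time, up to cap_s."""
    delay = base_s
    while True:
        yield delay
        delay = min(cap_s, delay * 2)


def keep_tunnel_alive(argv, run=subprocess.call, sleep=time.sleep, max_restarts=None):
    """Re-run ssh whenever it exits, waiting a little longer each time."""
    restarts = 0
    for delay in backoff_delays():
        run(argv)  # blocks for as long as the tunnel stays up
        restarts += 1
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        sleep(delay)
```

One simplification to be aware of: the delay never resets after a long healthy run, which a real supervisor would probably want to do.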

Do you have any experiences with autossh/ssh to share?

------
hjb
That is a very retro raspberry pi :-D

~~~
AlexITC
Indeed. I was planning to buy a new one until my wife complained that I had
already bought one years ago, so the old one was worth trying.

------
tails4e
Is there a solid solution to the SD card problem? Every RPi I've tried to use
as a server eventually kills its SD card, probably because running an OS on SD
is destined to fail from write wear. I'm guessing using SD for boot only and
then using a home NAS would be an option, but I've not come across a dummies
guide to making an RPi reliable long term.

~~~
deeblering4
Spending a little extra on good quality SD cards helps both durability and
performance.

Also making a backup image of the SD card once configured and using a NAS or
external disk for important info helps mitigate the data loss risk, and gives
you a quick path to restore onto a fresh SD.

With that said, I have a few raspberry pis that have been running for years
without SD failures.

Also, make sure swap is not enabled on the SD.

------
Snawoot
> Surprisingly, 256 MB in RAM have been enough to maintain the app running for
> months until now.

What is total RSS of all your proxy processes?

~~~
AlexITC

        $ free -h
                      total        used        free      shared  buff/cache   available
        Mem:          432Mi       194Mi        26Mi        20Mi       211Mi       164Mi
        Swap:          99Mi       2.0Mi        97Mi
    
    
    
        $ top
        Tasks:  69 total,   1 running,  68 sleeping,   0 stopped,   0 zombie
        %Cpu(s):  7.3 us,  4.3 sy,  0.0 ni, 88.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
        MiB Mem :    432.4 total,     26.2 free,    194.7 used,    211.5 buff/cache
        MiB Swap:    100.0 total,     98.0 free,      2.0 used.    164.7 avail Mem
    
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
         2915 pi        20   0  299.9m 173.8m  13.6m S   1.6  40.2 447:45.82 java
        32009 pi        20   0    7.5m   2.6m   2.2m R   1.0   0.6   0:00.31 top
        31991 pi        20   0   11.9m   3.8m   3.0m S   0.3   0.9   0:00.10 sshd
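
If you want a single number instead of reading it off `top`, a sketch like the following sums resident memory from `/proc` on Linux. The function names are illustrative, and matching on the command name `java` is an assumption based on the output above.

```python
import os
import re


def parse_vmrss_kib(status_text):
    """Extract VmRSS in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def total_rss_kib(command_name, proc_root="/proc"):
    """Sum VmRSS over all processes whose command name equals command_name."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() != command_name:
                    continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                # Kernel threads have no VmRSS line; count them as 0.
                total += parse_vmrss_kib(f.read()) or 0
        except OSError:
            continue  # the process exited while we were reading it
    return total
```

Calling `total_rss_kib("java")` on the Pi would then give the combined RSS of the JVM processes in KiB.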

~~~
boring_twenties
Don't forget to set gpu_mem in config.txt. It looks like you are running with
the default, which is 64MB for the GPU. I'm guessing you are not using the GPU
in this application and should reduce this value to the bare minimum (which I
don't recall, sorry)

~~~
AlexITC
Thanks for the hint. I would be surprised if this old Pi actually had a GPU.

~~~
boring_twenties
They all do, in fact the GPU is required to boot the CPU. In your free output
it is clear that 64MB is assigned to the GPU, as per default, because it says
you only have 432M total (available to the CPU).

You can gain most (not all) of that 64MB back for free, simply by running
`echo gpu_mem=1 | sudo tee -a /boot/config.txt && sudo reboot`

(Pretty sure that the minimum is actually more than 1, but also that it will
accept that and set it to the minimum... check the docs if you need to be
sure)

~~~
AlexITC
Thanks for the details.

------
Ralo
I set up an OpenVPN server on my LAN, with username/password credentials, so I
can access my LAN from anywhere. I have multiple VPN configurations for my
cell phone/laptop to route either only 192.x or all traffic
through the VPN. It's very useful, and seems like a much more secure option in
my opinion.

~~~
AlexITC
Can you please elaborate on why it's more secure?

------
amaccuish
I still keep a server running in my basement doing WireGuard since Netflix et
al. all block cloud IPs/ASNs

------
yozel
Why not use a normal HTTP proxy like Squid? It would be easier to use,
since almost all languages and clients have built-in HTTP proxy support.

~~~
AlexITC
I evaluated Squid, which was my first option, but given the existing code in
the project where I integrated the proxy, interacting with Squid wasn't that
straightforward.

I'm still thinking of using several Pis, and it's simpler to evolve the
custom-made proxy to support that; if you look at the GitHub repo, it's pretty
simple.

Lastly, getting a Scala project to run on the old Pi was a nice experiment
worth trying.

