
My home lab setup for highly-available Internet - bradfitz
https://github.com/bradfitz/homelab
======
mansilladev
I also have redundant WAN at my house, slightly less sophisticated. Comcast
(primary) and U-Verse (backup) on separate modems (wired only, no WiFi). When
an outage incident occurs, and it gets escalated, I receive a page (iMessage
from family member, "Dad, the WiFi is down!"). If I'm away from the NOC/DC, I
call the DC remote hands support line (call onsite family member), and have
them perform a hard cutover ("go to back of the device with the antenna
thingies, disconnect the BLUE cable and plug in the YELLOW cable").

I do have a UPS on the modems and main access point, but after reading this
post, I may invest in a diesel generator and a 5,000 gallon subterranean tank.

~~~
chrissnell
OP is using CenturyLink fiber. I don't know if things have improved in the two
years since I moved from Tacoma, WA where I had it, but it was dreadfully
unreliable back in 2016. The unreliability wasn't caused by the fiber drop
itself but rather, by a super shitty oversubscription issue up in their
Tukwila/Seattle exchange.

Their IPv6 situation was even worse. They used 6rd and I swear, the
translation box was probably a single router or Linux box with a 100 Mbit NIC
in a rack somewhere. If you bothered enabling 6rd, every v6 site would be
awfully slow. Even the browser projects to automate the selection of v6/v4
didn't help.

When I finally moved away and cancelled the service, I mailed my modem back as
directed. A few months later, they sent my account to a collections agency
over the cost of a modem, which their system claimed to have not received. I
spent hours on endless phone calls but ended up just paying them the $250 or
whatever to save my credit and stop the madness.

Seriously, they were the worst provider I ever had.

~~~
c22
I have CenturyLink fiber in Seattle and the internet experience has been
good. They do keep charging me for 2(!) modems, though--one that I mailed back
and one that I never had. Every six months or so I call them up and they
credit the erroneous charges back to my account and remove the modems.
Invariably the modem charges show back up 1-3 months later. I'm pretty sure
this is some sort of procedural dark pattern meant to rip off everyone
audacious enough to bring their own modem but not routinely check their bill.

~~~
chrissnell
I wish some attorney would initiate a class action over this. Attorneys file
class action lawsuits over all sorts of petty charges all of the time and win.
Reading these comments, this certainly feels like a pattern. The dollar
amounts involved are not small, either.

~~~
justinph
The Minnesota Attorney General did:
[http://www.startribune.com/minnesota-attorney-general-sues-c...](http://www.startribune.com/minnesota-attorney-general-sues-centurylink-over-billing/434082273/)

------
SpaethCo
Whenever I see solutions like this I think back to an org I worked at where a
high-visibility day-long database outage gained upper level management
attention. The response, after the managers talked to our vendor (IBM), was to
re-architect everything to use HACMP clusters for all of our production
databases company-wide.

That was followed by a couple years of 100+ hour/year cumulative outages due
to HACMP stability issues, and an environment that everyone was deathly afraid
to touch.

The hardcore network engineer in me appreciates the detail in these kinds of
solutions, but these days the practical side of me is satisfied with usability
and maintainability of SPOF cable access with a manual failover to mobile
hotspot on the rare occasions that drops offline.

~~~
larkeith
There's a reason Arthur C Clarke's short story Superiority was once required
reading at MIT [1].

[1]
[https://en.wikipedia.org/wiki/Superiority_(short_story)](https://en.wikipedia.org/wiki/Superiority_\(short_story\))

~~~
Aloha
[http://www.mayofamily.com/RLM/txt_Clarke_Superiority.html](http://www.mayofamily.com/RLM/txt_Clarke_Superiority.html)

Link to the story online.

~~~
ddalex
EU would like to have a word with you.

~~~
Aloha
Me, the person who put it online, or both of us?

------
fencepost
I look at that and all I want to do is raise my eyebrow. That's like water
cooling Celerons or heavy tweaking of Honda Civics - you're not doing all that
for redundancy, you're doing that as a hobby and redundancy (or speed) are an
excuse.

I've set up ISP redundancy on my home network before, I should probably test
to verify that it still works after my update some months back. It's a truly
high-tech solution: A Netgear WNDR3700v2 router (5x Gigabit, dual-band, circa
2011) running LEDE (previously OpenWRT).

It's not automatic, but I can set it to act as a wifi client, so if my regular
Internet goes down I can simply connect into the router, connect to a phone
hotspot, and continue providing internal network access. I don't recall if
it's able to act as both a client and an AP on the same frequency at the same
time, but since my wife's Kindle and Chumby are the only 2.4-only devices in
the house I'm not really that concerned about it either.

And yes, the Chumby does still work though it's just a clock these days.

~~~
scarejunba
It's clearly intended to be for enjoyment and practice.

Like the guys who make videos of sharpening a grocery store knife to an atom
width.

~~~
jeauxlb
that sounds interesting, link please?

~~~
scarejunba
With pleasure. The 'atom width' is hyperbole from me. Sharpening a $1 knife:
[https://www.youtube.com/watch?v=7dFFEBnY0Bo](https://www.youtube.com/watch?v=7dFFEBnY0Bo)

And maybe you'll find this interesting: Sharpening a wooden knife:
[https://www.youtube.com/watch?v=kKH63_r0OCA](https://www.youtube.com/watch?v=kKH63_r0OCA)

~~~
fencepost
If anyone is still reading this, the $1 knife video is from JunsKitchen and a
bunch of his other videos are great as well. I think I'd have to call them
foodie porn.

And his cats are remarkably well behaved.

------
isostatic
Good move on having not just two WANs, but two technologies. I've seen setups
before where people have had two WANs, from two different ISPs, but both
cables ran down the same duct in the road. A single digger took them both out.
It would be a pretty severe problem if fibre and wireless go at the same
time!

I assume you're not running a full BGP handoff to each ISP, so any existing
sessions will die should your WAN die (as your LAN gets NATted behind a
different IP address). Presumably your nat state will move over in the case of
router failure as it's a floating VM of some sort, so what's the failover time
for each component? How does it compare to using say VRRP?

How are you detecting ISP failures -- are you pinging beyond the next hop, or
are you assuming if you can ping/arp the upstream router, it's working? I've
had failure scenarios with ISPs where the next hop works, but nothing past
that.
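
To make the "next hop works, but nothing past that" case concrete, here is a minimal sketch of a check that only trusts hosts beyond the ISP's first router. It is not how the OP's setup detects failures (the thread doesn't say); the function names, the `min_ok` threshold, and the use of Linux iputils `ping` flags are all assumptions for illustration.

```python
import subprocess
from typing import Callable, Sequence


def ping_once(target: str, interface: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo out of a given interface.

    Assumes Linux iputils `ping` (-c count, -W timeout in seconds,
    -I source interface); other platforms use different flags.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", interface, target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wan_is_healthy(
    far_targets: Sequence[str],
    probe: Callable[[str], bool],
    min_ok: int = 1,
) -> bool:
    """Treat a WAN as up only if enough hosts *beyond* the ISP answer.

    The next hop is deliberately not in `far_targets`: it can keep
    answering while everything behind it is unreachable.
    """
    answered = sum(1 for target in far_targets if probe(target))
    return answered >= min_ok
```

A caller might use something like `wan_is_healthy(["1.1.1.1", "8.8.8.8"], lambda t: ping_once(t, "eth1"))`, where `eth1` is a hypothetical WAN interface name. Probing more than one far target avoids failing over just because a single remote host is down.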

What benefits are there of tcpproxy over something like nginx (for http/s) or
dst-nat (for other connections)?

It looks like all your traffic defaults to WAN1, and only uses WAN2 in certain
cases. Do you have the ability to send traffic for a given client to WAN2 by
default?

What type of queuing are you using -- can 1 client hog all the bandwidth?

And finally, what keyboard layout is 6 above N?

~~~
halbritt
IIRC, the Unifi stuff as well as Meraki will do multiple ISPs. They do
outbound NAT, and have a liveness check which is just a ping sent to the next
hop. Ping fails, or the interface goes down and the device simply sends the
traffic the other direction. Any established TCP sessions simply fail, but any
new traffic will failover just fine.

I'm using this setup in my office. Easier than finding a last-mile type ISP
that supports BGP.

~~~
isostatic
If you use the same ISP you can probably get routing working. But you're not
going to get your own AS for a home network, even if you find an appropriate ISP
to provide you transit.

Next hop checking isn't always good enough. I had a 7 minute outage on one
line last week, next hop was fine, but outside the ISP network it all fell
apart.

~~~
jlgaddis
> _But you're not going to get your own AS for a home network, even if you
> find an appropriate ISP to provide you transit._

ARIN, at least, will happily assign you an ASN assuming you 1) meet the multi-
homing requirements and 2) pay the bill for it.

~~~
isostatic
Presumably the requirement includes having a couple of ISPs advertising your
IP space, which I assume means having a /24. Can you still get those easily
from ARIN?

~~~
namibj
The European version of ARIN (RIPE NCC) allows IPv6-only networks. /24s cost
you about $3-6k each, depending on whether you can spare a month to get a good
price or need it announced tomorrow morning for your AS.

------
8_hours_ago
If you want to feel more inferior about your home lab,
[https://www.reddit.com/r/homelab](https://www.reddit.com/r/homelab) is a good
source of safe-for-work porn and information on over-engineered setups.

~~~
jlgaddis
~10 years ago, I had a completely full 42U cabinet in my house, along with
another 8U or so of gear and several devices that aren't measured in RU's
(access points, Cable and DSL modems, VoIP phones, etc.).

Most of the gear was used for lab scenarios and such for various (Cisco,
Juniper, et al) networking certs and was (mostly, but not completely) isolated
from my "real" network. IIRC, I had ~35 VLANs at one point.

My extremely over-engineered home lab certainly served its purpose but I think
I spent as much time maintaining it as I did actually _using_ it, although it
really came in handy for building out PoCs for projects I was handling at
$work (my test/lab network at $work wasn't nearly as well-equipped as my home
lab was!).

For the last several years, though, I've managed to get by with a single
subnet that is shared by everything -- a few laptops, a couple desktops, a
server hosting the handful of obligatory VMs, and, of course, the various
phones, tablets, and streaming devices that are ubiquitous in all of our homes
nowadays.

Just within the last few weeks, however, I've acquired a new server (2 x
10-core Xeons, 256 GB RAM, 4 "Enterprise" SSDs and 12 "Enterprise" HDDs (600
GB 15k SAS)), dug a couple switches out of storage in the garage, replaced my
Internet router with a small industrial box running OpenBSD, and started
building out a few more subnets for proper separation of various devices (I've
twice been offered a 42U cabinet recently but, thus far, managed to say no!).
Like probably most HN'ers, I've got a few VPSes spread out here and there as
well. Finally, I've got a decent (but over-built) 2U box in a rack at
$work ($work == ISP) that I am planning to use to tie all of this together
(using Wireguard, of course).

Yes, I'm fully aware that I'm in the beginning stages of a relapse. After
these upcoming changes, however, I don't intend to "grow" this lab much larger
(although this kinda stuff does just creep up on you sometimes).

~~~
Jaruzel
You are not alone my friend.

I used to also have a 42U cabinet in my garage for several years. It housed a
bunch of servers, mostly Dell PowerEdge but also some no-name boxes, plus some
switches and other miscellaneous gear.

The power draw was too strong for my poorly garage circuit and after any power
outage I had to power up the rack one device at a time - it was a massive
pain. I also spent WAY too much time tinkering with it all, instead of
actually using it in anger. Sure, it helped me immensely doing PoCs for work or
for my own learning, but it was always overkill. Funnily enough though, every
other tech-head that saw it was envious, until I started detailing the horror
stories of keeping it all running.

Thankfully Virtualisation became a usable and affordable platform for
tinkerers, and I migrated everything (via a streamlined custom P2V process) to
ESX, then later on migrated/rebuilt the VMs over to Hyper-V.

I now just run 2x Tower servers (HP 8xxx series workstations - dual Xeon
based) and run 20+ VMs on each. Plus a single NAS for file storage. Life is so
much easier... and the Garage is so much quieter.

------
gerdesj
Good stuff. However - only one Linux router (VM) which means that you can't
upgrade it and reboot without loss of service. The way around that is two VMs
and VRRP or similar and a lot of very complicated NAT and firewall rules.

Out of the box, pfSense can do multi WAN _and_ CARP (similar to VRRP)
clustering. At the office I have two older servers with lots of NICs and five
WANs. Inbound redundancy is provided by dynamic DNS and SRV records etc. Note
that to do CARP/VRRP, you do need at least a /29 IPv4 allocation. You need an
address per box plus the virtual one that is actually used by services.
PPPoA/E is harder to deal with than cable/leased line etc but it turns out
that the low-cost Billion 8800NLR2 can do external IPv4 pass through as well as
do the PPPoA/E. They will need an address as well from your range. You need
something like them in this case because only one device can be the PPPoA/E
dial up system at a time. Unless you have some very fancy secret sauce, your
clustered routers' pppd or whatever are going to get confused as to who does
what.
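
The /29 requirement follows from simple address counting, which can be sketched with Python's standard `ipaddress` module. This is only an illustration: the helper names are made up here, and `192.0.2.0` is a documentation prefix standing in for a real allocation.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Assignable addresses in an IPv4 block.

    For prefixes of /30 and shorter this excludes the network and
    broadcast addresses.
    """
    return len(list(ipaddress.ip_network(cidr).hosts()))


def smallest_prefix(addresses_needed: int) -> int:
    """Longest IPv4 prefix (smallest block) with enough usable hosts."""
    for prefix in range(30, 0, -1):
        if 2 ** (32 - prefix) - 2 >= addresses_needed:
            return prefix
    raise ValueError("no IPv4 block is large enough")


# Two routers + one shared CARP/VRRP address = 3 addresses;
# a pass-through modem that also takes an address makes it 4.
two_boxes_plus_vip = 2 + 1
with_modem = two_boxes_plus_vip + 1
```

A /30 only offers two usable addresses, so both `smallest_prefix(two_boxes_plus_vip)` and `smallest_prefix(with_modem)` land on a /29, which has six (`usable_hosts("192.0.2.0/29")`).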

I notice you have a cloud key. Unifi on an Ubuntu VM is easy, and much easier
to backup and snapshot before upgrades, so is safer. You can also front it
with HAProxy for simple URLs and perhaps Let's Encrypt. pfSense has a HAProxy
package with a GUI and I believe it is CARP friendly as well ...

~~~
chrissnell
Unfortunately, OP is using Centurylink fiber. It's been a few years since I
lived in Tacoma, WA and used this service but it's something like PPPoE over
VLAN. There was a FreeBSD bug a few years back where PPPoE was ridiculously
slow when running on top of a VLAN interface. OpenBSD did not have this
problem, which is why I ran that for a firewall instead of my preferred
pfSense.

~~~
gerdesj
I have four FTTC (PPPoE/A) WANs and a BT (UK) leased line at work. The FTTCs
are 80/20Mbs-1 and the leased line is symmetric 100Mbs-1. I've put all five
WANs down a separate 802.1q VLAN. Each of my routers has one physical NIC
(Intel 1Gb) dedicated to WANs. The other nine NICs, each, are for internal
VLANs.

I use Draytek 120 or 130 modems for single ADSL or FTTC connections but for
CARP clusters, I use Billion Bipac 8800NLR2, so I am not doing the PPPoA/E on
the pfSense boxes. The Billions are able to pass through bits of a /29 and do
the PPPoA/E themselves - the only cheap router (~£60) I've found to do this.

I've been running this thing for about four years now. PPPox is a complex
beast and there are a few things to look out for such as MTU. PPPoE imposes an
eight byte overhead (hence 1492) and back in the day some ill advised auth
mechanism required setting a 1458 byte MTU. Apparently, some BT kit supports
mini Jumbo frames of 1508 bytes which means that you could set your MTU to
1500 instead of 1492 - good luck with that as a rule of thumb. $DEITY only
knows what an ISP in WA has arbitrarily decided to mandate. Here in the UK we
have a near monopoly for the infrastructure but lots of providers that use it
and so it should be simple. To be fair, I bet you don't get docs like this:
[https://www.btplc.com/SINet/SINs/index.htm](https://www.btplc.com/SINet/SINs/index.htm)
(498 is FTTC)
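
The MTU numbers above are plain subtraction, sketched here for reference. The function names are illustrative only, and the MSS figure assumes IPv4 and TCP headers without options.

```python
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20    # no IP options
TCP_HEADER = 20     # no TCP options


def pppoe_mtu(ethernet_payload: int = 1500) -> int:
    """IP MTU left once PPPoE framing is taken out of the Ethernet payload."""
    return ethernet_payload - PPPOE_OVERHEAD


def tcp_mss(mtu: int) -> int:
    """Largest TCP segment that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


standard = pppoe_mtu(1500)    # ordinary Ethernet frames
baby_jumbo = pppoe_mtu(1508)  # kit that accepts 1508-byte payloads
```

With ordinary frames the PPPoE MTU is 1492 (MSS 1452); only if every device on the path accepts 1508-byte payloads does the full 1500 (MSS 1460) come back.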

Anyway, if you are happy maintaining your firewall rule set manually then
crack on but nowadays it is hard to do that. pfSense has a _lot_ of quite
vociferous users who kick the tyres on a regular basis. It even looks quite
pretty these days - all bootstrapped up and stuff, the red thing is long gone.

~~~
jlgaddis
> _The FTTCs are 80/20Mbs-1 and the leased line is symmetric 100Mbs-1._

What does the "-1" in "80/20Mbs-1" and "100Mbs-1" signify? I've never seen this
"syntax" or format used before but maybe it's an EU/UK thing (I'm in .us,
FWIW)?

~~~
F_r_k
OP has a scientific background.

-1 is meant as "to the power of -1". Thus, s-1 becomes 1/s, and the entire thing becomes Mb/s

Never seen that either

~~~
gerdesj
_OP has a scientific background._ - LOL - I have an HND (technician) in Civil
Engineering and I am now the MD of an IT consultancy (obvs). I picked up the
habit of using s-1 etc when studying Physics 'A' level (UK) many, many moons
ago. Not too sure why I persist with it these days but I dimly remember liking
the fact that you can use basic arithmetic on superscripts. To be fair I
should put s^-1 but s-1 is reasonably obvious.

------
starefossen
Only thing missing is a chaos monkey to randomly power down devices to make
sure everything still stays available.

~~~
jdboyd
There is a child present.

~~~
dopamean
The original chaos monkey.

------
halbritt
Nice setup, but we can all pretty much agree it's overkill for most. My ISP is
fairly reliable and outside of infant death, most network elements have a
pretty long MTBF.

I run a similar set of WiFi gear. I've a couple PoE powered Unifi UAP-AC-Pro
spread around the house, all connected to an 8-port Unifi PoE GigE switch.
Routing is done with an EdgeRouter lite, which as it turns out is capable of
line rate GigE.

I have a low power industrial computer with 4 cores and 8GB memory that runs
various services mostly via docker or vagrant. It consumes about 12w.

It's all powered by a 750VA APC SmartUPS. I get almost an hour of runtime on
the internal batteries. I may add some external batteries at some point, but
most power outages in my area don't last longer than 20-30 minutes.

~~~
saalweachter
Power outages are fairly common in my locale so that's what I've primarily
optimized for. Cable modem + WiFi hub on one UPS, desktop on another. Desktop
stays on through short (<15 minutes) outages, wifi+internet for 3-4 hours.
Power is still my primary point of failure, with probably 1-2 days of outages
longer than 3 hrs per year, although in many of those cases the cable will
also go out.

~~~
GordonS
> 1-2 days of outages longer than 3 hrs per year

Not trying to be a dick, but does that count as "fairly common"?

~~~
saalweachter
I mean, it's not common enough for me to spend the money on a backup
generator, but it's common enough that you need to at least consider how long
you can stay in a house without power, in what weather. Eg, my house will stay
warm enough that I don't need to worry about me or the pipes freezing after
1-2 days without power in the winter [although it gets quite nippy after ~18
hours]; if the power is off for more than 3 days in the summer I need to do
something or the chest freezer will defrost enough to spoil).

Shorter power outages are more common; 10-ish power interruptions of less than
~2 hours per year.

It's not like, developing or failing nation bad, but it's not great,
especially when the problem is always "a tree limb fell on a wire".

------
hackerpacker
Everyone has different needs of course,

My home setup:

hardwired all the desktops and a few access points via cheap 1gbit hardware
(literally found some at the thrift store/ebay), usually using tomato/shibby.

have a backup router.

battery backup on main routers/modem.

large external battery wire nutted to my desktop UPS.

NAS is an old laptop with battery intact, doubles as second display/machine.

use my phone via usb on my desktop if all else fails.

total cost, probably less than $100.

Oh, and I use a $5/month server for stuff that absolutely needs to be on full
time. Otherwise the only external access is me occasionally remoting into my
desktop and I am happy to stop and smell the flowers if that is interrupted
briefly.

~~~
chx
I have an even simpler setup: if my cable connection dies, I simply tether my
phone to replace it. There are no UPSes because both the laptop (TP25 w/ 24 +
72 Wh batteries) and the phone (it's a Moto Z Play with a battery mod) have
large enough batteries to last much longer than a domestic blackout in
downtown Vancouver.

My laptop is enough for me to stay productive (it's a ThinkPad 25! _very_
productive). Everything that needs to be online is on a Hetzner server I rent
for all sorts of purposes so the 51 EUR monthly bill kind of spreads out.

~~~
hackerpacker
I've been there, splurged on an alienware 17 a while ago, but mostly I only
use it on the road now.

I went with desktop because I wanted everyone in the house to have a decent
machine and I could get several I5s for less than $70 apiece (5 machines, one
in each bedroom) and wanted easy/cheap upgrades for some of them, and they are
all the same optiplex model, which makes my life easier.

I like my desktop setup a lot though, 3.3ghz I-5, 27" 1080, 16 gig ram, 1tb
ssd, 8tb in "cold storage", g402 mouse, gt710 vid, clicky keyboard, Nubwo N2
headset, decent posture, 100+ fps gaming. Probably threw $500 at it above the
initial $70 though, but most of the machines didn't get that treatment, but
their users aren't using it to make a living either.

~~~
chx
Posture wise I am normally using a Matias Ergo Pro mounted vertically and an
Evoluent VerticalMouse and of course an external monitor. But, in a pinch /
short travel I can just work on laptop. I tried desktop before but since
everything I work on needs to be on laptop too, the necessary sync becomes old
quick.

------
Johnny555
Fun solution, but seems like overkill for just about every home user.

I used to use a dual-WAN setup with cable modem + DSL backup. It worked well
with automatic failover. I use a pfSense APU based router and, with no moving
parts, it's been very reliable, nearly 4 years without any unscheduled
downtime.

Then I moved and only had a single ISP to choose from, so my backup is to
manually turn on a Wifi hotspot. I thought about using a cellular router with
ethernet or a wifi connection to the hotspot for auto-failover, but it just
wasn't worth the time and/or money to set it up -- if I'm home when the
internet goes down, I can just switch to the hotspot, if I'm not home, then
all I really lose is the ability to control the lights and thermostat
remotely, not exactly a critical function.

~~~
zf00002
> seems like overkill

I think that's quite the understatement. The thing that really stands out to
me is the claim that all of that is only drawing 220W at idle. I'm curious if
he means truly idle, like literally just booted up and not doing anything at
all, zero traffic, etc. Or if that's the draw with stuff actually being used.
Because 220W just for your home network is hilarious. I mean I feel dumb often
because my little pfsense box pulls about 15W.

------
kbenson
This was all fairly straightforward to implement a decade ago on cheap
hardware and cheap switches, running OpenBSD on a pair of ALIXs with a pair of
semi-cheap Netgear switches. Full firewall and VPN failover using pfsync and
sasyncd, IP failover with CARP.

You can do load balancing using PF as well, which is what we were mostly
offering, cheap fault tolerant hosting for colocated customers.

~~~
bradfitz
Much of this exercise was me playing with Ceph, which is pretty impressive.

Having VMs float around with shared storage makes complexity elsewhere go
away. i.e. I don't need to deal with CARP, VRRP, etc.

~~~
kbenson
Yeah, I noticed it was floating VMs, which is an interesting way to go. On one
hand, it's less parts to go kaput, on the other hand, those parts need to be
more robust.

The main thing that might make me shy away is the added exposure at the edge.
If the VM hosting is dedicated to just the network failover/firewall, it seems
wasteful, and if it isn't it seems unnecessarily exposed.

The only other thing I'm not sure of, since I'm not too familiar with all the
VM solutions nowadays, is whether an actual hardware failure of the active VM
hardware allows seamless failover (which you do get with what we were doing
back in the day).

Edit: although, it's not hard to emulate the stuff we were doing using some
OpenBSD virts on those two boxes, which even if they don't support full
hardware failure with the current setup they then would. Since you're playing
with this for fun, you might be interested in trying it. If you find OpenBSD
intimidating, you can use pfsense to do the same, which is a dedicated GUI
configured FreeBSD distro that offers much the same (there were some CARP
implementation differences/bugs in FreeBSD way back, but I think they got
fixed up long ago).

------
peterwwillis
Some alternatives:

* Cantenna/laser link to a house some blocks away to avoid local WAN link disruption

* For less performance-intense networks, remove the physical impediments: 2 routers, each with 1 APC, connected to 2 separate power circuits, connected to 2 WAN links, providing 2 radios each. No switch to go down or cables to trip over, redundancy of access point, redundancy of frequency/radio, redundancy of WAN link, redundancy of power. Hardware-wise this is pretty cheap and still highly available. If the routers are cheap, use a hardware watchdog.

------
pedrocr
I also thought having everything on UPS would allow me to keep an Internet
connection during a power outage. Turns out that when the power goes out so
does my ISP. Having a second ISP on LTE or Wifi like this setup may or may not
be enough to fix that.

------
comesee
Looks like a pretty resilient setup... But can it handle an Ethernet pause
frame broadcast flood
[https://github.com/nwholloway/mpcp](https://github.com/nwholloway/mpcp)

------
daxorid
Very cool configuration.

I attempted something similar to this in a 20U cabinet some time back. The
biggest issue is the fan noise that 1U form factor servers and network gear
produce, with their rather high RPMs. One can hear the noise across the other
side of the house.

We've since switched to fanless network gear and ATX form factor servers with
large diameter fans to keep the family happy. It definitely doesn't look as
nice, though.

~~~
isostatic
You can get pretty much the same result from a couple of fanless routers
(mikrotik, something running ddwrt, etc) -- resilient against hardware
failure, power failure, and wan failure.

Not as cool though, and clearly not running any servers, but that's what
things like AWS or Linode are for -- or for low power stuff, something like a
fitlet [0]

[0] [http://www.fit-pc.com/web/products/fitlet/](http://www.fit-pc.com/web/products/fitlet/)

~~~
chrisper
>but that's what things like AWS or Linode are for

If your home is directly connected to their datacenter...

Not everyone has 10 Gbit upload with best peering!

~~~
isostatic
Yes, for home server use yes, I was thinking of public facing servers.

I'm happy with a QNap as the only home server I need.

------
halbritt
> I love Ceph so much...

Clearly hasn't been bitten by it, yet.

I mean... I love Ceph, too, but I don't ever want to run it again.

~~~
Uberphallus
Can you elaborate?

~~~
halbritt
Sure. It's an extraordinarily complex system that's difficult to engineer
correctly. It provides extraordinary durability, but the radius of failure
isn't obvious. Pro tip, it's the entire cluster. As such, an issue with an OSD
in one pool could potentially cause the entire cluster to have issues.

Recovery is difficult and there's no support unless you have a subscription
from Redhat and also run RHEL plus their stable distro of Ceph (RedHat Storage
or whatever). IIRC, they quoted me $90k for a petabyte of raw disk.

I haven't messed with it much in the last couple of years. Bluestore looked
really promising. I've thought about taking a look at rook, but haven't yet.

If I were in a position to deploy a bunch of storage on bare metal again, I'd
likely go with ceph. I do know that $GLORIOUS_FORMER_EMPLOYER ended up making
the migration to ScaleIO and report being happy with it and having good
performance.

~~~
Uberphallus
That was insightful, thanks!

------
galeforcewinds
Have you also given yourself a mobile equivalent for those times when you are
traveling, or when your primary environment is unsuitable and you must work at
a place with public WiFi?

------
tvanantwerp
Neat! But to be honest, it's way more than I'd ever invest in a home setup. I
manage an entire office of ~30 people with much less redundancy than this!

------
llama052
Couldn’t all this complexity be replaced with a ubiquiti edgerouter or a
prosumer router that’ll balance the links for you?

This is more of a homelab tinkering setup to learn.

~~~
voltagex_
Heh. Ubiquiti is complexity - you really have to use all their kit to get the
benefits.

------
w8rbt
Awesome setup Brad. I wish I had a tenth of that speed. I have Verizon DSL
(1.5 Mbit Down and 700 Kbit up). They advertise it as 3 down and 1.5 up, but
I've never seen that. That's the best I can get in rural Virginia. I do use
SQM on a Ubiquiti Edge Router X to fix buffer bloat, so latency is very good.

And thanks for all the Go code. It's awesome! I'm building 1.10.3 on an old
Beagle Bone Black right now ;)

~~~
GordonS
It boggles my mind that I can get 80/20 fiber in semi-rural Scotland, and so
many Americans are stuck on really crappy DSL connections!

------
jradd
I've worked for a company that had similar storage and VE. ProxMox on MooseFS.
I would prefer Ceph, but they are both pretty sweet! Awesome Lab!

------
textmode
"Past failures

I used to use a Soekris net6501 as my home gateway, _but its CPU maxes out
NAT'ing about 300 Mbps_, sadly, so I started looking at alternatives when I got
Centurylink fiber.

I used to use a _UniFi_ Security Gateway Pro but it failed one day and
wouldn't power on any more. Dave had a backup for me handy, but _the Unifi
controller software_ wedged itself and wouldn't let me remove the old (dead)
one ..."

There is much adoration of Ubiquiti _hardware_ on forums and message boards. I
do not doubt for a moment it has been well-deserved.

However, I have a question about the _software_. I would like to use my own
kernel and custom utilities.

If I understand correctly, installing one's own choice of OS on Ubiquiti
hardware is not always possible and even if successful it carries a penalty in
terms of performance versus retaining the Ubiquiti pre-installed proprietary
OS.

Soekris made it easy for the user to install the OS of her choice. Tradeoff:
More user "control", but a slower router.

The question is: Are there other alternatives to Soekris that can exceed
300mbps and allow for user-chosen OS?

This is another line of (faster) routers where the vendor has allowed for easy
installation of user-chosen OS.

[https://protectli.com/product-comparison/](https://protectli.com/product-comparison/)

There are comments in some other forums and message boards about these
computers but I have not seen this company discussed on HN before.

Note the website claims models FW1, 2 and 4 have no Intel ME, SPS or TXE.

[https://protectli.com/kb/intel-management-engine-vulnerabili...](https://protectli.com/kb/intel-management-engine-vulnerability-update/)

~~~
voltagex_
Hey textmode. I'm still very very new to this - I jumped from the Turris Omnia
[0] to the whole kit and caboodle of Unifi gear.

I don't think Intel ME is at the top of my threat model - by the time
someone's using that kind of stuff on me I'm screwed anyway. I do, however,
pay insane prices for power (28-34 cents AUD per kWh). This has pretty much
meant I look for ARM and MIPS devices everywhere, but the latest gen Intel
stuff is looking good.

I hadn't seen those Protectli boards before and they look quite cool - I'll
keep them in mind. At full tilt, it'd cost me about $85 AUD per year to run.
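
For anyone wanting to redo that sum with their own tariff, a rough sketch of the arithmetic follows. It assumes the box runs 24/7 at a constant draw, which is a simplification; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly running cost of a device drawing `watts` continuously."""
    return watts / 1000 * HOURS_PER_YEAR * price_per_kwh


def watts_for_budget(cost_per_year: float, price_per_kwh: float) -> float:
    """Continuous draw that a given yearly spend pays for."""
    return cost_per_year / (price_per_kwh * HOURS_PER_YEAR) * 1000


# Draw implied by $85 AUD/year at each end of the 28-34 cent range.
draw_at_high_price = watts_for_budget(85, 0.34)
draw_at_low_price = watts_for_budget(85, 0.28)
```

Under that assumption, $85 AUD per year corresponds to a continuous draw of roughly 29 to 35 W, depending on which end of the price range applies.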

If Marvell ever open sources the switch drivers for the Espressobin [1] [2]
then that may be an option to exceed 300mbps.

0: [https://omnia.turris.cz/en/](https://omnia.turris.cz/en/)

1: [http://wiki.espressobin.net/tiki-index.php?page=Topaz+Switch](http://wiki.espressobin.net/tiki-index.php?page=Topaz+Switch)

2: [http://espressobin.net/](http://espressobin.net/)

------
kqr
I think the redundant outlink dwarfs all other improvements mentioned here.
All but one of the incidents in my home have been due to ISP or optical fibre
company issues. (Which is not surprising -- they have many more miles of
cabling to maintain than I do.)

------
kevin_b_er
This is a lot of expense toward high availability while only having 30-45min
of backup power.

------
abrookewood
Once upon a time I would have been very envious of this set up. Now I just
shudder to think of the hassle of maintaining all of this.

Don't get me wrong, I still have highly available Internet at my house - I
just tether my laptop to my phone and I'm done.

------
intrasight
Since my internet (Fios) is way more reliable than my power, I'd first need a
whole-network UPS before worrying about internet redundancy. When I do lose
internet - which almost never happens - I switch to using my smartphone as
hotspot.

------
jwbensley
It's great that you have documented this process, especially the failures
section, not enough people do this in my opinion. However, it really annoys me
when people make these blog style posts on GitHub. Sorry OP, I for one
disapprove.

~~~
mihaifm
What's the problem with posting on Github? I could see several benefits for
it: no ads, source control, easy edit from the web page, notifications for
your followers...

------
stevewilhelm
I just skimmed the original post, but I didn't see an off site data backup.

Maybe you missed the New Yorker article entitled 'The Really Big One' [1]

[1] [https://s831.us/2KyfcEw](https://s831.us/2KyfcEw)

~~~
bradfitz
OP here. I mirror all my data to Amazon S3 and Google Cloud Storage too. Or
rather, Perkeep ([https://perkeep.org](https://perkeep.org)) does this for me.

------
foobarbazetc
I have 3x Asus OnHubs running Google WiFi and they deliver GigE from WebPass
fairly easily.

When that fails I switch to my iPhone. :)

(On a more serious note, I’d like to see the basement or whatever with raised
floors. Come on Brad. ;)

------
linsomniac
I went to the page to read details about how he load balanced upstream
connections, or if he was using heartbeat or whatnot. I didn't find that, but
what I did find was a gratuitous amount of kit that made me happy my
infrastructure choice at home is much, much simpler.

My setup is Comcast going into a simple, reliable Surfboard modem, feeding a
Google Wifi setup. If it goes down, which it just really doesn't do, we can
use cellular data.

Complexity is the enemy of availability. Keep it as simple as possible, but no
simpler.

~~~
linsomniac
(But, then again, my favorite home router is a Bosch :-)

------
flyGuyOnTheSly
That's a beautiful setup, but I'm curious... do a lot of people around the
world still struggle with regular internet downtimes?

I can hardly remember the last time that my internet connection cut out... but
if I had to guess... it was probably during the peak of a 100 year storm we
had a few years back that put the entire area underwater for about 48 hours.

Transformers were blowing up all over the place, the power was out for days in
some areas, and yes the internet went out as well at that point.

I live in the GTA FYI.

------
amorousf00p
I look at this setup and say to myself that this is just the wrong way to do
it. A 'floating' vm to NAT and route? Ceph does look very nice but I have no
need for anything but file based storage.

Here is my top down take on a more traditional (cheaper) approach.

* 2 1G 5 port edge switches
* IDS
* vrrpd balanced cots NAT routers -w- RIPng + nginx as generic and web proxy.
* LAN 1G 12 port switches (1 hot, 1 cold)
* 2 synology NAS (redundant, manual failover).
* etc...

------
late2part
* The whole setup including all APs and switches draws about 220 watts idle. Power is pretty cheap in Seattle. Washington State (as of April 2018) has the cheapest electricity in the United States, at $0.0974/kWh.

[https://www.electricitylocal.com/states/washington/quincy/](https://www.electricitylocal.com/states/washington/quincy/)

The average residential electricity rate in Quincy is 4.85¢/kWh.[1]

4.85 << 9.74

------
sbr464
You should look into pfsense running as a vm on multiple hosts. You can sync
the configs with CARP. It's pretty solid, we use this setup in a couple of
data centers, few years with no downtime, and has failed over several times.

[https://www.netgate.com/docs/pfsense/highavailability/config...](https://www.netgate.com/docs/pfsense/highavailability/configuring-
high-availability.html)

------
lukeholder
In Australia I just got the new Telstra "Smart" modem. It has a built-in 4G SIM
as a fallback when ADSL is down. Doesn't cost any extra. Pretty sweet.

~~~
jeauxlb
sweet until a backhoe takes out a fibre and your entire exchange/SAM ends up
saturating the 4G network, resulting in negligible network connectivity, and
two heavily disrupted networks.

------
rapfaria
Is this an overkill setup for Twitch/Youtube streamers?

~~~
regnerba
This guy doesn't appear to be a Twitch streamer. Aside from his rack having
stickers for Go, Kubernetes, GitHub, and more, his Twitter description doesn't
say anything about that.

If you're asking about in general would this be a good thing for a Twitch
streamer... then I would say no. Mostly because most Twitch streamers are not
going to know how to maintain something like this and they don't need all the
servers.

If someone not so technical, Twitch streamers included, needed the redundant
internet I would recommend something more along the lines of two ISPs like
this guy (specifically over two technologies if possible: fiber and wifi, but
that comes down to bandwidth requirements) but instead of going into multiple
switches and having 3 servers running with VMs moving around just plug the two
ISPs into something like the Unifi Security Gateway (USG) or USG Pro.

~~~
packetslave
it's Brad Fitzpatrick: founder of Livejournal, Golang core team member,
original author of memcached, SWE at Google.

~~~
regnerba
Thanks for the info! :)

------
sajal83
I have a redundant 2 ISP setup, and use multipath TCP to use both of them at
the same time.

A very outdated post about my setup: [https://www.sajalkayan.com/post/fun-with-mptcp.html](https://www.sajalkayan.com/post/fun-with-mptcp.html)

I now have 2 broadband ISPs, and optionally I can hook in my phone's 4g into
the mix.

Multipath TCP allows me to "mix" bandwidth of both ISPs at the same time.

------
xupybd
We found that using a server as a router was not very robust. We were getting
strange problems all the time. The speed wasn't that great and finally we
replaced that with an off the shelf router and all the pain went away. I know
this was a software / configuration problem but we couldn't get it to work
well. Has anyone else encountered these sorts of issues? If so did you manage
to get it working well?

~~~
bradfitz
That's pretty vague. A server (no details) didn't work as well as hardware (no
details). Lot of missing info there.

------
ulyssesgrant
This makes my mediumly-available remote access home setup look even more like
child's play than it already does :)
[https://www.whoisdylan.com/sitdown/2018/05/31/connecting-
to-...](https://www.whoisdylan.com/sitdown/2018/05/31/connecting-to-a-home-
computer-with-a-dynamic-ip-address.html)

------
blago
This looks impressive but it doesn't seem to account for hung ISP modems. It's
a pretty common issue with consumer-grade service. If not handled properly
(e.g. power cycling) eventually both connections might end up inoperable.
Personally, I use a smart power switch that will cut off modem power for a
minute if pings start to fail.
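
For anyone wanting to try the same idea without a dedicated smart switch, here is a minimal sketch of that watchdog logic. It is an illustration only, not the commenter's actual device: `set_power` is a hypothetical hook for whatever smart plug or relay you have, the threshold of 5 consecutive failures is an assumption, and the `ping` flags assume Linux iputils.

```python
import subprocess
import time
from typing import Callable


def ping_ok(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo; True on reply. Flags assume Linux iputils ping."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class ModemWatchdog:
    """Power-cycle the modem after `max_failures` consecutive failed probes."""

    def __init__(
        self,
        set_power: Callable[[bool], None],  # hypothetical smart-plug hook
        probe: Callable[[str], bool] = ping_ok,
        max_failures: int = 5,              # assumed threshold
        off_seconds: float = 60.0,          # "cut off modem power for a minute"
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.set_power = set_power
        self.probe = probe
        self.max_failures = max_failures
        self.off_seconds = off_seconds
        self.sleep = sleep
        self.failures = 0

    def check(self, host: str) -> bool:
        """Probe once; return True if this call power-cycled the modem."""
        if self.probe(host):
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.max_failures:
            return False
        # Too many consecutive failures: power off, wait, power back on.
        self.set_power(False)
        self.sleep(self.off_seconds)
        self.set_power(True)
        self.failures = 0
        return True
```

You would call `check()` on a timer against some host beyond the modem. A real deployment would probably also want a grace period after the power cycle so the modem can finish booting before failures are counted again.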

------
ai_ia
Although, I don't understand the details, this is pretty impressive. But the
real question is why do you require it?

~~~
glitcher
> The primary goals of this project are...

> to have a highly-available home Internet setup, with no SPOF (Single Point
> of Failure)

> to learn and have fun.

------
weitzj
Nice that he is using Proxmox again.

I was researching and experimenting: What hypervisor is out there providing a
good file system (zfs) and also full disc encryption at the hypervisor level?

tldr: FreeNAS

And it came out that this is not that trivial.

You can buy a vSphere/ESXi license for encryption, but it (probably) doesn’t
have the same capabilities as ZFS.

You could use Hyper-V and have encryption but no ZFS.

On the other side there is Proxmox (Debian 9 stretch) which has an installer
which uses ZFS (but no encryption). You can jump through some hoops and make a
manual Debian 9 installation with ZFS and LUKS (for the encryption) and then
install Proxmox. Then you have to watch out to use the ZFS version Proxmox
uses (instead of the Debian version).

You could use OmniOS, SmartOS to get ZFS, but again no encryption out of the
box.

Solaris 11 has the ZFS and encryption part figured out, but the hypervisor
part is not clear to me.

So FreeBSD has ZFS and encryption (GELI) figured out as well. For the
hypervisor bhyve. Still there is manual work.

Then there is FreeNAS. It has ZFS, Encryption -and- hypervisor streamlined. :)

Some people use it as a VM guest inside Proxmox/ESXi, pass through their discs
and from FreeNAS Export either NFS or ZFS over iSCSI back to the hypervisor to
use as a storage pool.

Or as I found out, FreeNAS 11 has the bhyve hypervisor built in. You can have
FreeBSD jails for BSD and Linux, or full VM guests via bhyve like Windows or
Docker/Kubernetes.

FreeNAS ships with RancherOS as the minimal Linux VM, which can act as a
Docker host (if you don’t want to set up your own).

So for our use case of having a safe file system and full disc encryption and
be able to launch VMs, and to have this very easily installed on a USB stick
with minimal configuration and excellent documentation, I would recommend
trying it out.

Of course Proxmox has live migrations, which is not figured out here. Probably
Kubernetes would help.

Probably the other good way would be to have drives and a mainboard which
support encryption at the hardware-level. Or wait until zfs on Linux v.0.8 is
more in use. It contains encryption support.

[https://doc.freenas.org/11/vms.html](https://doc.freenas.org/11/vms.html)

------
jonotime
bradfitz, any idea why the soekris maxes out at 300 Mbps? I have been looking
for info on that since that's my gateway (pfSense) at home and I think it's
limiting my speed since I recently got gigabit fiber. I might replace it with
my espressobin running OpenWrt.

~~~
voltagex_
Hey, another Espressobin user! Did they ever fix the part where PCIe will
kernel panic the machine?

I think you'll still hit bottlenecks with the switch on the Espressobin -
Marvell hasn't enabled hardware acceleration, at least for the open source
parts.

------
EGreg
I wonder if one can multiplex over several connections (including wireless) to
get better throughput when they are all working, and then simply reduce to one
of them if the others fail?

Can someone write up exactly how to set something like that up, maybe show us
some urls?

------
teraflop
I'm curious about the redundant power setup. Does each server draw power from
both PDUs? Or do you have two servers on one PDU, and one on the other?

With three servers, if you have two power failures then the Ceph monitors will
no longer be able to achieve a quorum.
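
The quorum arithmetic behind this can be sketched in a few lines. Ceph monitors need a strict majority to form a quorum. The PDU layout below is a hypothetical example (single-corded servers, two on PDU "A" and one on "B"), not the OP's confirmed wiring.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def has_quorum(monitors: int, alive: int) -> bool:
    """True if `alive` monitors out of `monitors` still form a majority."""
    return alive >= quorum_size(monitors)


def survives_pdu_loss(layout: dict[str, int]) -> dict[str, bool]:
    """For each PDU, report whether quorum survives losing that PDU alone.

    `layout` maps a PDU name to the number of monitors fed only by it
    (hypothetical single-corded servers).
    """
    total = sum(layout.values())
    return {pdu: has_quorum(total, total - count) for pdu, count in layout.items()}


print(quorum_size(3))                       # 2
print(has_quorum(3, alive=1))               # False
print(survives_pdu_loss({"A": 2, "B": 1}))  # {'A': False, 'B': True}
```

With that layout, losing PDU "A" alone already breaks quorum, which is why it matters whether each server is fed from both PDUs.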

------
Tepix
I like the technical aspects. However: 220 watts idle power consumption? What
a waste of resources.

In practice using a Wifi-router with 4G fallback would achieve similar
availability at a fraction of the cost and power consumption.

------
fouc
We should be thinking about internet with large latencies, such as if you're
traveling in outer space. How would you design for that when you only have
intermittent connection?

What would you cache & how much?

~~~
zeth___
It's already been thought about:
[https://en.wikipedia.org/wiki/Interplanetary_Internet](https://en.wikipedia.org/wiki/Interplanetary_Internet)

~~~
fouc
Well, I perhaps shouldn't have said outer space. I also meant locally, such as
on a sailboat or in an area with inconsistent internet. Or maybe only connect
to the internet once a day. Would be cool to have a good setup for that.

~~~
zeth___
If you follow the links on the page: [https://en.wikipedia.org/wiki/Delay-tolerant_networking](https://en.wikipedia.org/wiki/Delay-tolerant_networking)

------
vonseel
Hrm. I have AT&T fiber. It does not go down. Ever.

OK, it went down once right after install but that was due to a tech
accidentally disconnecting me at the node while connecting a neighbor.

------
senojsitruc
> Washington State (as of April 2018) has the cheapest electricity in the
> United States, at $0.0974/kWh.

I'm in the Atlanta area. $0.07181/kWh.

~~~
endianswap
I believe Washington has certain areas that are the cheapest in the United
States, but it's not state-wide.

For example, Grant County PUD for residential customers: $0.04547 per kWh

~~~
waiseristy
Chelan county is the one with the cheapest electricity ~0.035/kWh

------
OutsmartDan
Just out of curiosity, how much does this all cost?

~~~
iv42
[https://twitter.com/bradfitz/status/1013880036334526464](https://twitter.com/bradfitz/status/1013880036334526464)

~~~
craftyguy
Too bad he doesn't detail monthly cost. It's likely to be more than $10k/yr.

~~~
regnerba
He basically does detail monthly costs. You're implying it costs him about
$833 a month to run.

His gig internet is $80 a month: [https://www.centurylink.com/fiber/plans-and-
pricing/seattle-...](https://www.centurylink.com/fiber/plans-and-
pricing/seattle-washington/)

His wifi backup internet is $40: [http://www.gigabitseattle.com/residential-services](http://www.gigabitseattle.com/residential-services)

He specifically states the setup draws 220 watts at idle and that his
electricity costs $0.0974/kWh. So 220 * 24 / 1000 * 0.0974 = 0.514272 per day, or
about $15.40 a month at idle.

So around $135 a month.
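
The same arithmetic as a small sketch, in case anyone wants to plug in their own wattage and rate. The 30-day month and the function name are assumptions for illustration. The $80 and $40 figures are the ISP prices quoted above.

```python
def monthly_power_cost(watts: float, rate_per_kwh: float, days: int = 30) -> float:
    """Cost of a constant load of `watts` over `days` days."""
    kwh_per_day = watts * 24 / 1000  # watt-hours per day, converted to kWh
    return kwh_per_day * rate_per_kwh * days


power = monthly_power_cost(220, 0.0974)
total = 80 + 40 + power  # fiber + wifi backup + idle power
print(f"power: ${power:.2f}/month, total: ${total:.2f}/month")
# power: $15.43/month, total: $135.43/month
```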

~~~
craftyguy
> 220 watts at idle

yea if it's idle the entire month, which is doubtful. but even if it's not,
it's not likely to be too much more than the $135 you calculated. I figured
the internet service would have been more, since the rest of us get screwed by
our ISPs on costs.

~~~
regnerba
Yeah, $80 for gig internet... I wish :(

~~~
pbarnes_1
WebPass in SF is $60/month for GigE. It's kind of amazing.

No modem, just an Ethernet drop into your home.

------
camdenlock
bradfitz deserves the accolades that went instead to Zuckerberg. He’s the real
genius hacker to admire.

------
justizin
def had myself set up to have wifi-only during power outages when i was a
student in my first apartment in SF, but the no-SPoF here is above and beyond.
i'm really curious about the switch configs, nothing like UniFi existed the
last time I tried to do network-HA.

~~~
halbritt
I love Unifi. They're the only consumer-grade access points that will do
roaming worth a damn. They're access points, though, and don't function as a
router.

I use the UAP-AC and an EdgeRouter. The EdgeRouter has relatively
sophisticated capabilities for a piece of consumer-grade network gear. I have
GigE fiber to the home and get ~900Mbps through the router (and ~400Mbps
through the access points).

I generally don't recommend the USG, which has similar functionality and is
integrated with the Unifi management platform.

~~~
pedrocr
>I love Unifi. They're the only consumer-grade access points that will do
roaming worth a damn.

What advantage do you get with that versus just running a bunch of APs with
the same SSID/password bridging to a single router? I do that with 3 cheap tp-
link routers (1 as router, 2 as APs) and LEDE and both my laptop and phone
work seamlessly. At one point I considered actually doing full 802.11r AP
roaming but the only actual use case I had for that was doing VoIP calls while
roaming between APs with no drops. Everything else works fine with the small
interruption of switching APs.

~~~
evil-olive
Unifi has a centralized controller that each AP talks to, and is able to
coordinate roaming. Gives you 802.11r-like functionality but with close to
zero setup beyond what's already needed to set up each AP (which is also
centralized at the controller, so adding an Nth AP to an existing site is
almost trivial).

> Note that UniFi Fast Roaming is not a direct implementation of 802.11r - it
> is a solution taking inspiration from 802.11r, with a few key proprietary
> differences. We've found that Fast Roaming provides about 90% of the roaming
> improvement offered by BSS Transition. However, Fast Roaming does not
> require client support, allowing backwards compatibility with all clients.

[https://help.ubnt.com/hc/en-
us/articles/115004662107-UniFi-F...](https://help.ubnt.com/hc/en-
us/articles/115004662107-UniFi-Fast-Roaming)

~~~
halbritt
Reiterating this: Management and updates are wonderful through the unifi app.
It functions similarly to Meraki, better in my experience and there's no
recurring fees.

The app can be self-hosted, run in the cloud, or on something they call a
"cloud key" that's not much more than a Raspberry Pi.

I've run it on a Pi3, and it's a little bit laggy, but tolerable. I prefer to
run it on my little x86 server.

Both the self-hosted and the cloud version can be managed remotely, which is
neat.

There are a host of other benefits, but given the price (around $100 per AP),
I see no reason to use the more commonplace consumer grade stuff. Check out
the unifi demo here:

[https://demo.ubnt.com/manage/site/outlets/dashboard](https://demo.ubnt.com/manage/site/outlets/dashboard)

~~~
pedrocr
For me LEDE/OpenWRT are easy enough to configure and fully open source. APs
are also about half the price you quote and the selection is much wider as it
works with most manufacturers (including UniFi's). So I see no reason to
depend on a more expensive closed-source solution that I never know when it
might go away.

~~~
halbritt
OpenWRT is pretty powerful, certainly. If you prefer an open source solution,
that's definitely the way to go. I've used OpenWRT which I liked quite a lot,
but I prefer the ease of having an integrated solution with the Unifi
products.

------
KiDD
I do this with pfSense with 3 different WAN connections. Fiber, Cable and
Cellular.

------
qrbLPHiKpiux
What’s the monthly cost?

------
post_break
He says 9.7c per kWh is the cheapest in the states. My 8.8c per kWh locked for
any usage in Texas would like to challenge that. (You can get down to like
4.5c if you get a plan with variable pricing based on usage too)

~~~
bradfitz
That's Washington's average. The actual rate is 5-6 cents or so for the first
block, and then 15 cents after some amount.

------
amelius
With such a setup, it must suck extra when AWS is down.

------
iosDrone
What's the point of this? Like, what can you do with this elaborate setup that
you could not do already with your laptop + internet connection?

------
melenaos
This is such overkill for a home setup

------
TomMckenny
>I have two ISPs

Ah, if only that were possible here.

------
smilbandit
an important, at least to me, point of data that's missing is the Decibel
level.

~~~
bradfitz
It was 55 dB a meter in front last time I checked, before the new rack which
if anything made it a tad bit quieter.

It's in my garage, though, so I don't care. But it's not annoyingly loud. I
used to have a 1U server in this same garage that was annoying... old Xeon
that drew 200+ watts idle with killer fans.

------
hossbeast
Nice job, Brad

------
preillyme
I love when @bradfitz says, "enterprisey".

------
poe876
Very impressive. Thank you for the incredible write-up. I got a whole bunch of
ideas for my own business's network architecture while going through your post
as most of my needs match up with that of yours, as you elaborated in your
post. Can't wait to post back the results post-implementation.

------
ythn
Needs satellite link as a backup-backup!

------
JAdamMoore
Brad, this is @DieLaughing. I just _know_ you're reading the Hacker News
comments. This is an amazing setup. I'm super jealous. ;)

------
thiagocsf
> Power is pretty cheap in Seattle.

This setup runs at about $50/mo, or $600/year, when idle. Do I have this
right?

Doesn’t sound cheap to me.

~~~
bradfitz
Could be worse. I probably wouldn't be doing this in, say, Hawaii.

