My home lab setup for highly-available Internet (github.com)
699 points by bradfitz 7 months ago | 287 comments



I also have redundant WAN at my house, slightly less sophisticated. Comcast (primary) and U-Verse (backup) on separate modems (wired only, no WiFi). When an outage incident occurs and gets escalated, I receive a page (iMessage from a family member, "Dad, the WiFi is down!"). If I'm away from the NOC/DC, I call the DC remote hands support line (call an onsite family member), and have them perform a hard cutover ("go to the back of the device with the antenna thingies, disconnect the BLUE cable and plug in the YELLOW cable").

I do have a UPS on the modems and main access point.. but after reading this post, I may invest in diesel generator and a 5,000 gallon subterranean tank.


OP is using CenturyLink fiber. I don't know if things have improved in the two years since I moved from Tacoma, WA, where I had it, but it was dreadfully unreliable back in 2016. The unreliability wasn't caused by the fiber drop itself but rather by a super shitty oversubscription issue up in their Tukwila/Seattle exchange.

Their IPv6 situation was even worse. They used 6rd and I swear, the translation box was probably a single router or Linux box with a 100 Mbit NIC in a rack somewhere. If you bothered enabling 6rd, every v6 site would be awfully slow. Even the browser projects to automate the selection of v6/v4 didn't help.

When I finally moved away and cancelled the service, I mailed my modem back as directed. A few months later, they sent my account to a collections agency over the cost of a modem, which their system claimed to have not received. I spent hours on endless phone calls but ended up just paying them the $250 or whatever to save my credit and stop the madness.

Seriously, they were the worst provider I ever had.


I have Century Link fiber in Seattle and the internet experience has been good. They do keep charging me for 2(!) modems, though--one that I mailed back and one that I never had. Every six months or so I call them up and they credit the erroneous charges back to my account and remove the modems. Invariably the modem charges show back up 1-3 months later. I'm pretty sure this is some sort of procedural dark pattern meant to rip off everyone audacious enough to bring their own modem but not routinely check their bill.


I wish some attorney would initiate a class action over this. Attorneys file class action lawsuits over all sorts of petty charges all of the time and win. Reading these comments, this certainly feels like a pattern. The dollar amounts involved are not small, either.



Same thing happened with me and Time Warner Cable in New York. Every six months they would mysteriously boost my rate. I would call into cancellations and they’d refund the erroneous charges, only to try again in six months. A letter to my Attorney General stopped that crap.


"I will be contacting the attorney general's office, consumer fraud division, what was your name again?" is a very powerful way to get results.


> is a very powerful way to get results

In my experience, nothing happens until you actually make contact. Legal isn’t usually pleasant. But they’re almost always competent.


You threatened a lawsuit to an $8-an-hour support rep. Do you think they really care? They just want you off the phone for metric reasons so they get a bigger raise.


Yep, I did a stint in one of Samsung's global escalation centers and all the reps always rooted for people to threaten shit like this. Once you go legal there's no going back and I'm no longer allowed to talk to you, only corporate counsel.

We all knew that 95% of all legal mumbo jumbo threats were all bark and no bite, and the other 5% was someone else's problem. So threaten away and know that you're making reps' days.


Those ubiquitous "Turn on Auto-pay" buttons, in bright blue, are surely a hook into just such a hustle. How many people would bother to check the charges every month once they've signed up? Only the providers know for sure ...


That's crazy. I had the exact same experience, except for me they said the service had been cancelled, but then sent me a bill 3 months later for 3 months of back payment I "owed". And in my case I called and complained, but ended up just giving up and not paying.

They still come by to sell me fiber a few times a year, and I explain I would try again if they "forgive" the money I owe, the salesperson says "no problem", they spend 20 minutes on the phone to HQ, then say it's impossible :)

Oh yeah, and I had the exact same oversubscription issue. The service was fast as hell until the middle schoolers got home at 2:30ish, then it slowed to almost unusable speeds.


My experience with them in Seattle has been relatively positive; no IPv6 slowness, good and consistent throughput overall. Moving inside of Seattle was mindbogglingly difficult for some reason and involved a 3 hour phone call with 10+ transfers and several layers of escalation just to confirm they serviced my new house... but once that was out of the way, no issues.


I have Century Link in Seattle. IPv6 is laggy as hell even though internet speed tests report >400Mbps. They also mysteriously doubled my service fee 10 months in. Comcast had better latency and fewer "billing anomalies".


When I moved between apartments in NYC, Time Warner Cable internal systems got very confused somehow. I had working internet at the new address, and was only being billed for the new address, but the old address was still under my name somehow, so the people who moved in could not get service. I was called by a TWC sales rep to ask me to call TWC support to clear this up for the new tenants. TWC support transferred me between accounts-management and network-tech multiple times, they were all confused. Took about 2 hours of active question/response. (I went through it to make sure I wouldn't get some mystery/impossible bill in the future.)


IPv6 is faster on Centurylink Fiber than IPv4, by a good 25 to 30ms (eg: local servers will be 2ms to 3ms over IPv6, or 27ms to 33ms via IPv4). This is primarily due to much more open peering policies for IPv6, they peer with Hurricane Electric in Seattle on IPv6 (but not via IPv4), HE has established itself as a critical peer for IPv6 (which benefits me greatly!).


Not sure if you'll see this, but - maybe you should see if you can find an IPv6 datacenter/hosting provider within short throw (tens to hundreds of miles) of the peering center, and VPN all your traffic!

(Hm. Bandwidth costs might throw that idea out the window, but it might be absolutely worth it for gaming - or maybe you could sign up for game server v6 alpha/beta testing, heh)


Heh, I do read most replies. To answer your question, I do VPN some latency sensitive stuff (and the free wifi I offer) to nearby servers as I have a few dozen TB of extra bandwidth included in my $30/month resource pool with 'em.


They have had persistent peering issues with YouTube, too, causing abnormally slow speeds.[1] Another tier 1 ISP making their customers suffer so they can shake down content providers for bandwidth fees.

Oh well, let's keep gifting them billions in CAF funding to build out more poorly-maintained private networks and allow more monopoly mergers with other tier 1 providers. I'm sure that'll help and not cause decades of stagnation.

[1]: https://www.dslreports.com/forum/r31539668-Awful-YouTube-Con...


As a primarily urban provider, CenturyLink almost certainly pays more into the CAF than it gets out of it. (CAF money is collected from all ILECs and redistributed to rural ILECs.)


Over IPv6 they have direct peering with Hurricane Electric in Seattle, which brings local latency down to 2 to 3ms. Performance for Youtube & Netflix is way better over IPv6 too, but in some edge cases I'll tunnel through a VPS in downtown to get close to filling my pipe when downloading stuff from overseas.

Reliability wise, I've had 1 outage in two years, which happened at 1am on a weekday for slightly under 2 hours due to a PPPoE Aggregator failing. Far better than Comcast or Wave ever were, and at our usage I don't think either of the cable providers would be viable. We hit 12TB last month iirc, not even a peep from Centurylink.


Fair enough. I've only had personal experience with CenturyLink's DSL service which in comparison to their fiber is abysmal. I'm probably letting my anger towards CenturyLink's poor stewardship of rural copper and the gov't rewarding them with free CAF money cloud my judgment.


Centurylink needs to get their worthless trash (aka copper) off the poles. The only way they'll gain and retain customers is with modern infrastructure (not ADSL like they've stranded so many areas on) and reasonable billing practices. VDSL2 is moderately competitive where it's deployed, but old non-serviced areas like my part of Seattle (which never saw DSL) or poorly serviced areas on ADSL need to see upgrades immediately if Centurylink is to retain profitability on its non-carrier services side of the business.


I am not hugely familiar with how wholesale traffic works on the internet but I am a bit surprised to hear about different routing between IPv4 and IPv6. Isn’t any network hardware manufactured in the past 10 or 15 years dual stack? Surely by now we must be close to 100% IPv6 ready on the backbones. Why would there be a different route for v6 packets?


12TB? Is this a business or a residential service? How do you use that much data?


That is on Residential service, mostly Sling & Netflix. When I was on Centurylink's Prism IPTV product (which is great BTW (besides pricing), best linear TV experience out there, instant channel switching and live view of the last 5 channels you watched) we would regularly use in excess of 15TB as the IPTV traffic was constant unless you turned the STB off on each TV.


Right, I mustn't have been thinking properly - that's only about 36 megabits per second sustained. If that's multicast or from a local-ish node (which IPTV probably is), it's nothing.
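A quick check of that figure, as a sketch that assumes decimal terabytes (10^12 bytes) and a 31-day month; neither is stated in the thread, and the function name is made up for illustration:

```python
def sustained_mbps(terabytes: float, days: float) -> float:
    """Average rate in Mbit/s needed to move `terabytes` within `days`."""
    bits = terabytes * 1e12 * 8       # decimal TB -> bits (assumption)
    seconds = days * 24 * 60 * 60
    return bits / seconds / 1e6       # bits per second -> Mbit/s


# 12 TB is the monthly usage quoted above; 15 TB is the Prism-era figure.
for tb in (12, 15):
    print(f"{tb} TB/month is about {sustained_mbps(tb, 31):.1f} Mbit/s sustained")
```

With a 30-day month the 12 TB figure comes out about 1 Mbit/s higher, so "about 36" holds either way.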


Well, it was UDP Multicast when we were with Centurylink's IPTV, but now all that bandwidth has to transit an IX or transit provider.

Speaking of UDP Multicast, PFSense dropped all support for it without warning before updating, which was the end of me using PFSense at home as it broke TV until I could replace it with OpenWRT.


I'm guessing he's downloading "Linux ISOs".


Nope, that is a minor portion of my traffic; nearly all of it is Sling with a bit of Netflix. Turns out high quality streaming video being left on 24/7 on multiple screens burns a ton of bandwidth :P


It seems wasteful to just leave on something streaming, just because you can.


Bandwidth is ephemeral: if you don't use it now, it's not as though it will pile up and be usable later. With fiber, there is no good reason to moderate usage. I'd be more concerned with the few watts the endpoint burns (which costs sub-$10 a year) than an extra TB of usage.
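The "sub-$10 a year" figure is easy to sanity check. This is only a sketch: the 5 W draw and the $0.12/kWh electricity price are assumptions for illustration, not numbers from the thread.

```python
def yearly_cost_usd(watts: float, usd_per_kwh: float) -> float:
    """Cost of running a constant load for one (365-day) year."""
    kwh_per_year = watts * 24 * 365 / 1000
    return kwh_per_year * usd_per_kwh


# Assumed: a 5 W streaming endpoint and electricity at $0.12 per kWh.
print(f"${yearly_cost_usd(5, 0.12):.2f} per year")
```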


I have this on occasion. Action cam footage which I back up to the cloud. If you use 4K it is a lot :)


I had the identical experience when they first launched and started heavily promoting their fiber to the home in Seattle a year or two back. Same terrible v6 site performance, same oversubscription issues, and same "we never received the modem" claim. This claim didn't show up till about 2 months after I sent it in, by which time I no longer had my UPS tracking number. Seems like a real pattern with them and felt scammy even then.


> ... I may invest in diesel generator and a 5,000 gallon subterranean tank.

I highly recommend propane-fueled gensets. Fuel storage is much less hassle, and propane doesn't go bad. You won't get the runtime of a huge diesel tank. But there's often little point in that, because an extended power outage will also take down the telecom infrastructure. As I recall, a ~7kW genset at ~70% capacity went through ~40kg propane per day. That was running well pump, sump pump, refrigerator, CFLs, microwave, fans, and a ~3kW UPS for several computers.

Edit: Make sure to get a UPS that accepts genset power. That usually means full online aka double conversion. And everything must be grounded properly. Plus at least a manual transfer switch, to avoid injuring utility workers.
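For anyone sizing a tank, the fuel figure recalled above can be turned into an implied efficiency. This is a rough sketch, and the propane energy content (about 12.9 kWh/kg, lower heating value) is an assumption brought in for the calculation, not something measured on that genset:

```python
PROPANE_KWH_PER_KG = 12.9  # assumed lower heating value, roughly 46 MJ/kg


def genset_day(rated_kw: float, load_fraction: float, propane_kg: float):
    """Return (electrical kWh per day, fuel-to-electric efficiency)."""
    electrical_kwh = rated_kw * load_fraction * 24
    fuel_kwh = propane_kg * PROPANE_KWH_PER_KG
    return electrical_kwh, electrical_kwh / fuel_kwh


# ~7 kW genset at ~70% load burning ~40 kg of propane per day.
electrical_kwh, efficiency = genset_day(7, 0.70, 40)
print(f"{electrical_kwh:.1f} kWh/day electrical, {efficiency:.0%} of fuel energy")
```

An efficiency in the low twenties of percent is plausible for a small genset, so the recalled numbers hang together.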


Diesel doesn't go bad, it can be stored for long periods of time. It's much cheaper to run a diesel generator than propane since diesel is much more energy dense but costs about the same.


How long? For people with decent electrical service, you need the genset for at most several hours, maybe once or twice a year. Your UPS should handle ~30 minutes, which is enough for most interruptions. So your fuel supply needs to be stable on the order of decades. Is diesel stable on that scale? And sure, propane can leak. So you need to check every few months.


I have one too, it's my cell phone with unlimited data. The hot spot can last for 4 hrs.


Ah, yes. I call that my "metered tertiary" because my "unlimited plan" isn't unlimited when it comes to tethering.


Look into modifying the TTL of your TCP requests.

IIRC that's how at least one wireless ISP measures hotspot data different than from-phone data.


Interesting. Is there any other way? Like, is there a flag in wireless data that tags tethered traffic? I was always curious how an ISP can tell.


Back in my Nexus 5 days on T-Mobile, all you had to do was tell the phone to use the same gateway IP for tethered traffic as for mobile traffic. One quick ADB command later: unlimited tethering.


On some Android devices they have the OS report it separately which is one reason they hate the ability for you to root your own device.


They could look for requests to things like Windows/MacOS update servers, URLs that phones will never be accessing basically.


But that would allow them to tell that tethering is being used, not to measure the traffic to make it contribute to a cap.


> Look into modifying the TTL of your TCP requests.

so what, increment by 1?


Or more precisely, use the default TTL of the phone (iOS has a different default TTL than Windows) + 1.
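The reasoning behind the "+ 1", sketched as a toy model: every device that forwards a packet decrements its TTL by one, and for a tethered laptop the phone is the first such hop. The default values below (64 for phones and most Unix-like systems, 128 for Windows) are common defaults and should be treated as assumptions; the function name is made up for illustration.

```python
PHONE_DEFAULT_TTL = 64     # assumed default for the phone's own traffic
WINDOWS_DEFAULT_TTL = 128  # assumed default for a tethered Windows laptop


def ttl_seen_by_carrier(initial_ttl: int, tethered: bool) -> int:
    """TTL of a packet as it leaves the phone toward the carrier."""
    # The phone forwards tethered packets, so it decrements their TTL once.
    return initial_ttl - 1 if tethered else initial_ttl


print(ttl_seen_by_carrier(PHONE_DEFAULT_TTL, tethered=False))      # phone itself
print(ttl_seen_by_carrier(WINDOWS_DEFAULT_TTL, tethered=True))     # stands out
print(ttl_seen_by_carrier(PHONE_DEFAULT_TTL + 1, tethered=True))   # matches phone
```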


Yeah basically


Same, but it's very easy to bypass. Just install a proxy on your phone and set your system proxy to the phone in your OS. Then all the connections look like they come from your phone, as the PC is using the phone as an HTTP proxy.


Ya, same here. With my two additional power banks, which cost $20 each, I am looking at 2 days of backup. And there is unlimited data (5 GB/day at high speed), which is enough for most uses. For any additional usage, like a big Xcode update, I can top up the data plan with an additional 5 GB for a few dollars.


Yup, I do the same. I always also carry a big portable battery just in case.

https://www.amazon.com/Omnicharge-Portable-Power-Bank-connec...


I have redundant wireless at my house: FiOS for primary and T-Mobile for backup. It's not a seamless handoff because I have to turn on the hotspot on my phone.


Went to the comments to say exactly this. It's a much simpler setup!


Well too late now but this is why you either hand it in to a human and get a receipt or you send it via certified or registered mail because that'll hold up in court.

It's unfortunate anyone has to go through so much trouble to prove to Century Link what their inventory system is probably telling them anyway but it's always best to protect yourself.


>I do have a UPS on the modems and main access point.. but after reading this post, I may invest in diesel generator and a 5,000 gallon subterranean tank.

I'm not sure if that's a joke or not, mainly because after reading it I'm thinking "It's a stupid idea, but ... no it's a really stupid idea, but ..."


Read the post again, starting from the top, and consider that maybe the entire thing is satire.


Oh :(


> 5,000 gallon subterranean tank.

...about 19,000 litres. That seems like tremendous overkill. Are you also planning on using that for home heating? If not, it seems like a very large maintenance burden for anything other than some kind of survival scenario.


Are you sure it's not a joke? ;)


Hey, how can I learn more about all this? I'd love to understand what's going on on the GitHub page. I'm following a bit but still want a better understanding, the way you and the rest of the commenters have. What are some good starting points?

Any resources, books, links, YouTube videos especially, that you can point me to?

Also, in his setup, there's no router? He says he's using a VM? What does that mean?


>Hey, how can I learn more about all this?

Go to your router's config page, and google all the words/acronyms you don't know. Read the Wikipedia pages too. That should put you in a position where you can ask better questions.

Router is a generic term for something that takes wired Ethernet and performs NAT and creates a wifi network (something an access point aka AP does). The NAT here is handled by a different device (the 'VM', which is a program that runs an entire OS by simulating a computer) and the AP here is multiple UniFi devices[1].

I would say get familiar with googling technical terms, because there won't always be someone willing to answer questions.

[1]:https://github.com/bradfitz/homelab#wi-fi-aps


> Router is a generic term for something that takes wired Ethernet and performs NAT and creates a wifi network

not to nitpick, but this is not actually true. for home use, perhaps it is conflated to mean this, but really it is one machine that reroutes traffic on behalf of another - NAT/wifi or other media transformation is not necessarily required here (though it definitely could be a part of it)


Well a router takes a packet from one of its interfaces and uses the Internet Protocol address encoded in the packet header to determine which of its interfaces to forward the packet to based on the preconfigured destination subnet for the interface. Most home routers only have two interfaces and two subnetworks.
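To make that forwarding decision concrete, here is a minimal sketch using Python's standard `ipaddress` module. The interface names and subnets are invented for illustration (a typical two-interface home router), and real routers do this lookup in the kernel or in hardware rather than like this:

```python
import ipaddress

# Hypothetical routing table: interface name -> destination subnet.
ROUTES = {
    "lan0": ipaddress.ip_network("192.168.1.0/24"),
    "wan0": ipaddress.ip_network("0.0.0.0/0"),  # default route
}


def pick_interface(dst: str) -> str:
    """Choose the outgoing interface by longest-prefix match on dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, name) for name, net in ROUTES.items() if addr in net]
    # The most specific (longest) matching prefix wins.
    return max(matches)[1]


print(pick_interface("192.168.1.20"))  # stays on the local network
print(pick_interface("8.8.8.8"))       # goes out via the default route
```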

Most people are familiar with home routers which do Network Address Translation and have a built in Wireless access point. Neither of those things are required for something to be a router. In fact routers with IPv6 support do not perform NAT between your local network and the Internet for IPv6.

Similarly, DHCP, DNS, and various other things that home routers do can actually be handled by totally separate hosts on the network. That's what he's doing.

If you're looking for more information about how networking works, I would highly recommend Computer Networks by Andrew Tanenbaum. It's more abstract than the typical "Understand TCP/IP in 600 pages" books that are available but it provides a good high-level overview of how networking works, what protocols matter, and how everything fits together.

Edit: When he says he's using VMs, he means that he's using Virtual Machines to run multiple operating systems on the same server. Each of these operating systems runs one or more servers for DHCP or DNS or other networking services. I assume that he's using his virtualization platform to mirror the virtual machines between his servers and provide hot spares, so that if one VM goes down another spare can step in.


It is a complex subject. pfSense is Open Source and IMNSHO the best Swiss Army knife of routers/firewalls.

Download this: https://www.pfsense.org/download/

Have a read of this: https://www.netgate.com/docs/pfsense/

Hang out here: https://forum.netgate.com/

The VM is a router. Provided you ensure that traffic has to go through the VM and that the VM is able to route etc then it is a router 8) OP is using Linux whereas pfSense is FreeBSD based but pfSense is pretty much the only (near enough) turnkey product that does multi WAN and CARP properly. I should mention OPNsense as well here for fairness.



Works fine in xen and esxi though.


I'm pretty sure this went over their head (don't intend to be mean).

I believe they asked what VM actually stands for and what it means, rather than what it's for.


Fair enough. For a laugh I searched for "vm" and got a wikipedia article on virtual machines. GP did ask for rather a lot of clarification but then you (we) still have to allow for those times when someone just does not get it, despite everything.

For example, I'm diving into Home Assistant, I don't think I'm daft but I ended up posting what turned out to be a really silly request for help because I was not a local and used to the scenery. I'd read all the docs, which I will soon subtly alter, but missed an implied (if you knew the system) point.

How the heck do you describe what a VM router does, quickly? 8)


Reddit. /r/homelab


But can you toss the connections from both providers into a switch and make both available for use all the time? Like an active-active setup?

Any reason not to do that?


And put my remote hands worker out of a job? No way.


Yeah, but they must cost you a bundle!


That, my friend, is a sunk cost.


So you are committed to paying for the remote hands and can't get out of it, or have already fully paid for lifetime support from the remote hands and can't get a refund :)

That doesn't mean that there can't be cost savings from automation. For example if it costs X in lost business due to a misunderstanding by the remote hands that extends the outage unnecessarily, then a certain number of times avoiding that X cost would pay for the investment in automation. You pay the "sunk" and the new cost but you avoid unnecessary costs in the long run.

It's all a matter of fully modeling your costs and benefits. Noting that certain costs are sunk is a partial model.


See: sunk cost fallacy


From 5 minutes of Wikipedia reading, it appears this requires special support from the router to enable “sticky sessions” which prevent out-of-order packets; or, the device OS itself can “stripe” packets across NICs with special (extra) software to enable that.


ECMP handles this just fine by default with Linux (it's per flow aka TCP connection, not per-packet).

You can of course make it load balance per-packet, but as you note, there are issues with that when you don't control both ends.
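A rough illustration of the per-flow idea, not of the kernel's actual hash: hash the flow's 5-tuple and use the result to pick a link, so every packet of one TCP connection takes the same path. The link names are hypothetical, and SHA-256 is used here only because it is deterministic and in the standard library.

```python
import hashlib

WAN_LINKS = ["wan_a", "wan_b"]  # hypothetical uplinks


def pick_link(src_ip, dst_ip, proto, src_port, dst_port, links=WAN_LINKS):
    """Map a flow (5-tuple) to one uplink; the same flow always maps the same way."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]


flow = ("192.168.1.20", "93.184.216.34", "tcp", 50123, 443)
# Every packet of this connection lands on the same link, so no reordering.
print(pick_link(*flow), pick_link(*flow))
```

Hashing fewer fields (for example only the source IP) trades balance for stickiness: all of a host's connections then leave via one uplink.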


You can use a hash of source and dest ip, protocol and port, but you will get confusing results and some sites won't be happy.

Using source ip to round robin on active wan connections is the safest.


that really depends on which fields are included in the ecmp hash and can break stuff in weird ways, like path mtu discovery


When would your onsite support assistant switch back to the blue cable?

Can't you keep both connected to a router and have a script do the switching instead?

Anyhow, still impressive.
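Something like the following could serve as the decision logic for such a script. It is only a sketch: the thresholds are made-up defaults, and the actual probing (ping, HTTP check) and route switching are left out. It cuts over to the backup after a few consecutive failed probes of the primary and switches back only after the primary has been healthy for a while, to avoid flapping.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    active: str = "primary"
    primary_failures: int = 0   # consecutive failed probes of the primary
    primary_successes: int = 0  # consecutive successful probes of the primary


def step(state: FailoverState, primary_ok: bool,
         fail_threshold: int = 3, recover_threshold: int = 5) -> str:
    """Feed one probe result; return which WAN should be active."""
    if primary_ok:
        state.primary_successes += 1
        state.primary_failures = 0
    else:
        state.primary_failures += 1
        state.primary_successes = 0

    if state.active == "primary" and state.primary_failures >= fail_threshold:
        state.active = "backup"    # the "plug in the YELLOW cable" moment
    elif state.active == "backup" and state.primary_successes >= recover_threshold:
        state.active = "primary"   # back to the BLUE cable
    return state.active


state = FailoverState()
probes = [False, False, False, True, True, True, True, True]
print([step(state, ok) for ok in probes])
```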


Whenever I see solutions like this I think back to an org I worked at where a high-visibility day-long database outage gained upper level management attention. The response, after the managers talked to our vendor (IBM), was to re-architect everything to use HACMP clusters for all of our production databases company-wide.

That was followed by a couple years of 100+ hour/year cumulative outages due to HACMP stability issues, and an environment that everyone was deathly afraid to touch.

The hardcore network engineer in me appreciates the detail in these kinds of solutions, but these days the practical side of me is satisfied with usability and maintainability of SPOF cable access with a manual failover to mobile hotspot on the rare occasions that drops offline.


Former network engineer here, can confirm. Time and again I've seen redundant systems create their own problems where without all that extra complexity things would have been fine.

Even ISPs and CDNs I worked with sometimes have surprisingly uncomplicated redundancy systems (sometimes just a handful of small routers they are very much ready to power down to cut over to backup paths or bring up new paths) and often they do not use the more complicated methods.

The catch with complicated redundancy is there is always a very close relationship or protocol or something between redundant components, be it storage systems, network systems, anything. Inevitably a system goes down or loses its mind and takes its redundant peers with it.... every new system you introduce is one more piece that could reach out and take everyone else with it. I saw it time and again, and again...


I’ve seen overengineered and undermaintained HA systems result in much lower uptimes than a simple system with multiple SPOFs. I’ve seen well built and maintained HA systems fail under “rare” edge cases.

I’ve also seen well built and maintained HA systems work exactly as desired.

As a general rule, the cost of building and operating a reliable HA solution is not 2x, but at least 10x. If the system being protected is not worth that, you’ll very likely find the MTTR acronym far easier to catch than the rather more slippery HA.


Completely agree.

My home network is built with Mikrotik kit which is priced where it's affordable to have spares. I have yet to encounter a failure, but could drop in a new router in a couple of minutes with the saved configs.

I have SNMP monitoring feeding from telegraf into influxdb on an RPI. Dashboard rendered with Grafana on PC. Also have telegraf pinging to all 24x7 devices and collecting data from electricity meter, smartplugs, and Nests. It's been fun to do.


What advantage does that offer over something like LibreNMS, which will do everything?


Would you consider doing a write-up of how you set this up?


then you're not building your redundant systems properly.

Web, Power, Internet, Network, Military systems at scale use reliable redundancy and work w/ very little downtime.


The key part of redundancy is that your "redundancy glue"[1] must be significantly more reliable than each component, including its software and implementation -- because often the glue failing in isolation itself can cause outages. So the probability of failure was simply P(single failure); now for 2x parallel redundant systems it is P(single failure)^2 + P(glue failure). If P(single failure)^2 ~ 0, we need P(glue failure) < P(single failure), at the very least.

[1] i.e. the systems that interconnect the multiple redundant system, detect failures, redirect traffic, etc.
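Plugging numbers into that approximation shows how quickly the glue dominates. The probabilities below are invented purely for illustration:

```python
def p_outage_redundant(p_single: float, p_glue: float) -> float:
    """Approximate outage probability of a 2x redundant pair plus its glue."""
    return p_single ** 2 + p_glue


p_single = 0.01  # assumed: each component is down 1% of the time

for p_glue in (0.02, 0.01, 0.001):
    p = p_outage_redundant(p_single, p_glue)
    verdict = "better" if p < p_single else "worse"
    print(f"glue={p_glue}: outage={p:.4f} ({verdict} than a single system)")
```

With flaky glue (2%) the "redundant" setup is about twice as bad as a single box; only once the glue is clearly more reliable than the components does the redundancy pay off.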


Very similar to the 'infrastructure as code' story, where you're still left with the construction and maintenance of the infrastructure that bootstraps the infrastructure as code systems.

Turtles all the way down, I guess.


> Turtles all the way down, I guess.

Indeed it is important in this case of course that this does not happen :) To see the increased reliability and P(glue failure)<P(single failure) you need to assure the glue systems are very simple and well built -- and preferably they need to be much smaller than the system you're protecting.

Another adequate expression to apply here is

"Who watches the watchmen?"

The answer again is the watchmen must watch themselves and be very reliable.

On this topic I recommend von Neumann's (the brilliant mathematician) book "The Computer and the Brain", where he explores how computing systems can be reliably interconnected and how those failure probabilities interact. He was interested in how the brain could be so robust to failure -- don't worry, there's no time spent speculating on how the brain works; instead he derives from first principles properties of reliable computing components, and possible reliable designs (the brain's unknown internal workings at the time, and now to a lesser extent, would follow as a special case). He used this same approach in analyzing the principles of life, where he came up with a self-replicating machine with a tape encoding of itself, predating the discovery of DNA -- it's a very inspiring and powerful approach. Unfortunately he could not complete "The Computer and the Brain"; he was in declining health due to cancer and died while writing it. What was left is still very interesting imo. He is one of those giants whose shoulders we can sit on to peek over the horizon :)


Thank you.

As a caution against tenanting the deployment tools in-band, I'm reminded of an incident I witnessed about five years back. Company was moving their compute from on-prem to colo datacenters. Pretty good, mature setup: Almost entirely virtualized, 10Gb iSCSI SAN, credentials managed via a dedicated COTS tool, etc. They got most things over-the-wire to the DC. But the final migration had to be done cold - Shut the last bits down that were keeping everything running, move them to the DC and power back on.

Everything went very well until the SAN wouldn't come up. To get into the SAN and troubleshoot they needed the domain, which wasn't available. They had a local account on the SAN; the key for that was safely stored in the password manager. Which was a virtual machine. On the hypervisors. That wouldn't come up until the SAN was booted. Oops!

OK, that's a very obvious foot-in-mouth, in hindsight. As a more likely example, how about the Amazon S3 outage a few years back that wasn't reported on the status page, because the images for the status page were stored on... S3 :D

>you need to assure the glue systems are very simple and well built -- and preferably they need to be much smaller than the system you're protecting.

Absolutely agree.


Certainly it's possible to build redundant systems properly. But it's expensive. All the well-built redundant systems you listed understand that and budget for it.

Most half-baked redundant systems I've seen are a result of "I want four nines, but I only want it to cost 20% more than a two or three nines solution" type thinking.
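For a sense of the gap being papered over, here is what each extra nine means in allowed downtime. A small sketch assuming a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (2 -> 99%)."""
    return HOURS_PER_YEAR * 10 ** (-nines)


for n in (2, 3, 4):
    hours = downtime_hours_per_year(n)
    print(f"{n} nines: {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

Going from two nines to four shrinks the budget from days to under an hour per year, which is why it rarely costs only 20% more.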


Reminds me of what my brother in law says: I don't want to be stuck doing tech support for my family.

With my luck, it would catastrophically fail while out of town, leaving the wife and kids without internet.

My dad set up a lot of complicated stuff like this. As people are prone to do, eventually he died, and it just made it difficult to troubleshoot technical problems for mom. So now the equipment sits in some corner, unused, because we replaced it all with something your average AT&T technician could troubleshoot.


> With my luck, it would catastrophically fail while out of town, leaving the wife and kids without internet.

Two ISPs, two networks. One called "main", one called "backup".

If "Main" fails, move over to "Backup", either with a cable, or on a different SSID.


Where in some cases, the "Backup" is tethering with a smart-phone.


Are you advocating buying internet service from two different companies and paying for both every month in case one fails for a brief period of time?


> Are you advocating buying internet service from two different companies and paying for both every month in case one fails for a brief period of time?

That's not an unreasonable solution, considering most people already pay two ISPs (one fixed, and another for their phone/tablet). When your home wifi goes down, you're going to fall-back to your mobile anyway. I'm thinking of getting an extra data SIM, an LTE modem and do auto-failover.

--edit--

My needs are somewhat unique - my traveling laptop is on its last legs (and will be replaced by a cheap chromebook. Desktops/servers get better bang for the buck compared to laptops. Go figure!), so I tunnel onto a server at home for heavy-lift computing. If the internet fails when I'm not home, I'd be left stranded (and this has happened).


In my case my Surface Book 2 gives me all the firepower I need to not miss my desktop, and it also has a PCIE SSD on it like my desktop. I do agree, sometimes tethering is highly useful, at least in my case on my laptop. I try to keep as many things as offline capable as possible.


That's literally what the author of the article describes.

From a practical point of view I think it's silly to do such a thing for a residential situation, but I can appreciate using it as a learning experience for building systems like this.


Depends how reliable your ISP is and how much it costs if it goes down.

3G is a good enough backup for me, but for the office we go for two routers, two ISPs, and VRRP on the LAN side, load balancing across the WANs with failover to the other one.


To be fair, mom probably will not be migrating VMs across three different supermicros and managing a ceph cluster to get a wifi connection.

I would not discount the possibility completely. But I judge it unlikely.


If I wanted a seamless non-SPOF network for my family, I'd put in two MikroTiks, with the primary on mains and the secondary on UPS: £120 for a pair to do routing at a decent (1gig) speed on the main, with built-in 4G on the reserve.

Then I'd put the primary router on the wired line, the other one on a 4G sim which did nothing but heartbeats unless the wired line went down. If the wired line shut down, traffic would reroute via 4G within 10 seconds or so. If the primary router went down, the backup router would take over in a similar time frame. Might put some capping on the 4G router to the netflix/etc boxes to keep bandwidth costs down.
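
The failover behaviour described above could be sketched roughly as follows. This is only an illustration, not what a MikroTik actually runs: the class name, the idea of probing the wired line once per second, and the thresholds (10 missed heartbeats to fail over, 5 good ones to fail back) are all assumptions picked to match the "within 10 seconds or so" figure.

```python
from dataclasses import dataclass


@dataclass
class WanFailover:
    """Tracks heartbeat results for the wired line and picks the active uplink.

    Hypothetical sketch: names and thresholds are illustrative assumptions.
    """
    fail_threshold: int = 10     # consecutive missed heartbeats before failing over
    recover_threshold: int = 5   # consecutive good heartbeats before failing back
    active: str = "wired"
    _misses: int = 0
    _hits: int = 0

    def record(self, wired_ok: bool) -> str:
        """Feed one heartbeat result for the wired line; return the uplink to use."""
        if wired_ok:
            self._hits += 1
            self._misses = 0
        else:
            self._misses += 1
            self._hits = 0

        if self.active == "wired" and self._misses >= self.fail_threshold:
            self.active = "4g"
        elif self.active == "4g" and self._hits >= self.recover_threshold:
            self.active = "wired"
        return self.active
```

Requiring several consecutive good heartbeats before failing back keeps a flapping wired line from bouncing traffic between the two uplinks.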

UPS would be about 10W, so £45 for a 4 hour one. Possibly look at renewable energy of some sort to keep the UPS going during an extended outage.

I'd then VRRP on the lan side with primary on the main router (which would have a backup route via the secondary router)
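
For anyone unfamiliar with VRRP, the election on the LAN side boils down to something like the sketch below: the live router with the highest priority owns the virtual gateway address, and ties go to the higher IP address. The router names, priorities, and addresses here are made up for illustration; a real implementation also deals with advertisement timers and preemption, which this leaves out.

```python
import ipaddress
from typing import NamedTuple, Optional, Sequence


class Router(NamedTuple):
    name: str
    priority: int      # VRRP priority; higher wins
    address: str       # router's own LAN address, used as the tie-breaker
    alive: bool = True


def elect_master(routers: Sequence[Router]) -> Optional[Router]:
    """Return the router that should own the virtual IP, or None if all are down."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    # Highest priority wins; ties go to the numerically higher IP address.
    return max(
        candidates,
        key=lambda r: (r.priority, int(ipaddress.ip_address(r.address))),
    )


# Hypothetical pair: the main router is preferred, the 4G router is standby.
main = Router("main", 200, "192.168.1.2")
backup = Router("backup", 100, "192.168.1.3")
```

With both routers up, `elect_master([main, backup])` picks `main`; mark `main` as not alive and the virtual address moves to `backup`.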

Cloud based VM to do monitoring/alerting and land outgoing openvpn tunnels from both routers to allow secure remote access.

£170, £10 a month plus main ISP, and an hour of config.

However, in reality, having an ISP-provided router and showing them how to tether when there's a problem works fine. OK, they lose their devices if the main circuit goes off, but running those over 4G can be pricey.


There's a reason Arthur C Clarke's short story Superiority was once required reading at MIT [1].

[1] https://en.wikipedia.org/wiki/Superiority_(short_story)



EU would like to have a word with you.


Me, the person who put it online, or both of us?


According to the Wikipedia article, it was required reading for a specific course, no?


I had never been exposed to this. great read. thanks


This was actually a case study from when Clarke was an MBA intern at Google.


I'm pretty sure the sci-fi writer Clarke was never an MBA intern at Google. He'd have been 73 in 2000. Plus he was a scientist, now a biz person.


I'm sure GP was joking.

>because of its own organizational flaws and its willingness to discard old technology without having fully perfected the new.


Maybe.


I meant "not a biz person", instead of 'now a biz person', but I can't edit the original posting.


I used to work for a company whose setup was super simple.

ADSL Modem > Firewall > Router > Web/DB servers

It was basic, but it worked. Our web servers were mission critical, but as a B2B business they, and the ADSL connection, didn't sustain a heavy load. The only issues we had over several years were with the ADSL modem. Everything else just worked.

When we moved office we moved our servers to a co-hosting centre with an upgraded network setup with all sorts of backup and redundancy. Every week something went wrong. Sometimes simple is best.


I worked at a place that hosted the servers in-house. They even built a special little air-conditioned room and put a generator on the roof. I never knew all the details but there was dual everything, 2 lines coming in, stuff to switch between them, nothing could possibly go wrong... until the day it did. Turns out someone has plugged all the machines into a single extension cable, and the fuse popped.


Even the big boys do that. In the big storm of '87 in the UK, Telecom Gold (an early online service) was quite proud that the UPS kicked in, only to realize that the modems that linked to the X.25 network were not on the UPS :-)


My anecdata: I used to admin a SWIFT cluster. It was built by the manuals on IBM hardware, that included HACMP with quorum determined by a shared disk.

Nobody understood exactly how the cluster worked, to the point that a correction my boss made to the physical connections made us lose a couple of million dollars in unprocessed transactions.

The funny part is, when the cluster was working fine, a takeover took at least 20 minutes. During that time nothing was "available". The thing is, no matter what, SWIFT Alliance took that time to properly close and open the DB.


I look at that and all I want to do is raise my eyebrow. That's like water cooling Celerons or heavy tweaking of Honda Civics - you're not doing all that for redundancy, you're doing that as a hobby and redundancy (or speed) are an excuse.

I've set up ISP redundancy on my home network before, I should probably test to verify that it still works after my update some months back. It's a truly high-tech solution: A Netgear WNDR3700v2 router (5x Gigabit, dual-band, circa 2011) running LEDE (previously OpenWRT).

It's not automatic, but I can set it to act as a wifi client, so if my regular Internet goes down I can simply connect into the router, connect to a phone hotspot, and continue providing internal network access. I don't recall if it's able to act as both a client and an AP on the same frequency at the same time, but since my wife's Kindle and Chumby are the only 2.4-only devices in the house I'm not really that concerned about it either.

And yes, the Chumby does still work though it's just a clock these days.


It's clearly intended to be for enjoyment and practice.

Like the guys who make videos of sharpening a grocery store knife to an atom width.


that sounds interesting, link please?


With pleasure. The 'atom width' is hyperbole from me. Sharpening a $1 knife: https://www.youtube.com/watch?v=7dFFEBnY0Bo

And maybe you'll find this interesting: Sharpening a wooden knife: https://www.youtube.com/watch?v=kKH63_r0OCA


If anyone is still reading this, the $1 knife video is from JunsKitchen and a bunch of his other videos are great as well. I think I'd have to call them foodie porn.

And his cats are remarkably well behaved.


That's also what I thought. All this for 45 minutes of internet when the power goes out, and twice a lifetime 1 day time saving when something crashes hard (like a hard drive) and you need to restore from the backup. It has to be for tinkering.


I like your 'water cooling Celerons' analogy.

It is hard to beat a stock, as supplied by the telco, router with a generic Android phone for maximum uptime. If one connection is wired and the other is wifi then the computer handles broadband difficulties with no problems.

If you are actually serious about 'single point of failure' then you just need to live with someone that is likely to not pay the bills for electricity or broadband. Being insufficiently creditworthy to have better than a pay as you go burner phone helps too as every byte costs $$$. Living in an area where any nice toys will get stolen/destroyed also 'helps' as a refurbished laptop running linux is then the only practical option. Congested wifi 'helps' too, a basic wifi booster with ethernet out becomes truly useful for 'blazing speeds', particularly if wanting your backup network to come from the local cafe or some neighbour with an easily Googlable password.

Having a local server for development and version control means that you are good to go when it comes to useful work even if there is no connectivity going.

For entertainment a regular FM radio works fine. Two refurbished laptops and a USB stick for bulk transfer of current project stuff makes it fully possible to pull an all-nighter even if there is no electricity due to bills-not-being paid reasons. A nice add is a Chromebook, those things designed for nine year olds with a battery that lasts 10 hours with no difficulty does the job with better wifi than any normal laptop, no fans and no thermal runaway.

Even better, the whole kit can be put in a modest backpack and a bit of couch-surfing later one can be back in business.

It is much more satisfying to do more with less, I would probably hate myself if I had a basement full of servers and only whiled away the hours on social media rather than do 'work'.

This budget ethos is an anti-pattern but why should it be? The carbon footprint of operating on low-power refurbished hardware is penguin friendly and cheap. If your apps are supposed to be compatible with regular consumer PCs then it doesn't really help to have a beast of a machine with a 4K screen, 32GB of RAM and some quad Xeon. Maybe a linux toolchain with no virtualisation is better for making one's code performant on target devices. Obviously an SSD helps.

The kids and the grandparents can read books together if the devices are down. They can also listen to the FM radio. What's not to like?

Thank goodness I don't do company IT. Yes it would consist of two refurbished laptops hidden under the floorboards, servicing 50-100 office workers without any difficulty.


Just googled "Chumby" and I'm very happy to report that a Chumby looks as cute as it sounds ( http://i.imgur.com/bKSgZPA.jpg )


All those stickers add 5 HP


lol Chumby.

One of those quirky little devices that existed in this weird span of time when computing power was small and cheap, but our phones had not yet come to rule everything in our lives. RIP 2007-2012


I'm still pissed Sony bricked my Dash clock that was based on Chumby tech.


Hey, how can I learn more about all this? I'd love to understand what's going on on the GitHub page. I'm following a bit, but I still want a better understanding, the way you and the rest of the commenters have. What are some good starting points?

Any resources, books, links, youtubes especially, that you can point me to?

Also, in his setup, there's no router? He says he's using a VM? What does that mean?


Good move on having not just two WANs, but two technologies. I've seen setups before where people have had two wans, from two different ISPs, but both cables ran down the same duct in the road. Single digger took them both out. It would be a pretty severe problem if fibre and wireless goes at the same time!

I assume you're not running a full BGP handoff to each ISP, so any existing sessions will die should your WAN die (as your LAN gets NATed behind a different IP address). Presumably your NAT state will move over in the case of router failure as it's a floating VM of some sort, so what's the failover time for each component? How does it compare to using, say, VRRP?

How are you detecting ISP failures -- are you pinging beyond the next hop, or are you assuming if you can ping/arp the upstream router, it's working? I've had failure scenarios with ISPs where the next hop works, but nothing past that.
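
To make the distinction concrete, here is a rough sketch of a check that looks past the next hop. The function name and the `probe` callback are assumptions for illustration; in practice `probe` would wrap a ping or a TCP connect, and the targets would be a few well-known hosts outside the ISP's network.

```python
from typing import Callable, Iterable


def wan_is_healthy(
    targets: Iterable[str],
    probe: Callable[[str], bool],
    min_successes: int = 1,
) -> bool:
    """Declare the WAN healthy if at least `min_successes` targets answer."""
    successes = 0
    for target in targets:
        if probe(target):
            successes += 1
            if successes >= min_successes:
                return True
    return False


# Simulated failure: only the ISP's next hop (a documentation address here)
# answers, nothing beyond it does.
reachable = {"192.0.2.1"}


def fake_probe(host: str) -> bool:
    return host in reachable


next_hop_only = wan_is_healthy(["192.0.2.1"], fake_probe)
beyond_isp = wan_is_healthy(["1.1.1.1", "8.8.8.8"], fake_probe)
```

In this simulated outage the next-hop-only check still reports the link as healthy, while the check against hosts beyond the ISP correctly reports it as down. Using more than one far-side target avoids failing over just because a single remote host is having a bad day.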

What benefits are there of tcpproxy over something like nginx (for http/s) or dst-nat (for other connections)?

It looks like all your traffic defaults to WAN1, and only uses WAN2 in certain cases. Do you have the ability to send traffic for a given client to WAN2 by default?

What type of queuing are you using -- can 1 client hog all the bandwidth?

And finally, what keyboard layout is 6 above N?


IIRC, the Unifi stuff as well as Meraki will do multiple ISPs. They do outbound NAT, and have a liveness check which is just a ping sent to the next hop. Ping fails, or the interface goes down and the device simply sends the traffic the other direction. Any established TCP sessions simply fail, but any new traffic will failover just fine.

I'm using this setup in my office. Easier than finding a last-mile type ISP that supports BGP.


Do you have any idea how the "upstream port" detection works on Unifi gear? While I'm waiting for the piece of Unifi kit that does PPPoE and DHCP, I've got their switch plugged into my old router - straight away the switch was able to work out that this was a WAN connection and none of my other traffic gets routed through that. To set the same thing up on LEDE took hours.


If you use the same ISP you can probably get routing working. But you're not going to get your own AS for a home network, even if you find an appropriate ISP to provide you transit.

Next hop checking isn't always good enough. I had a 7 minute outage on one line last week, next hop was fine, but outside the ISP network it all fell apart.


> But you're not going to get your own AS for a home network, even if you find an appropriate ISP to provide you transit.

ARIN, at least, will happily assign you an ASN assuming you 1) meet the multi-homing requirements and 2) pay the bill for it.


Presumably the requirement includes having a couple of ISPs advertising your IP space, which I assume means having a /24. Can you still get those easily from ARIN?


The European version of ARIN allows IPv6-only networks. /24 cost you about 3-6k$ each, depending on if you can spare a month to get a good price or need it announced tomorrow morning for your AS.


Then set the liveness check for something further upstream of the ISP.

Getting an AS is easy. Getting portable IPv4 address space or an LOA to readvertise is more tricky.


Here in the UK I found you couldn't get an AS number without a VAT number (i.e. being a company).

(Of course, you can start your own company for something like 13 quid/year. Now that I have one maybe I should revisit that.)


Last time I talked to a WISP they were trying to get an AS number; they had a /21, but were still struggling.

Seems like a lot of effort to ensure your ssh session doesn't drop


Look at a QWERTY keyboard. Start with your finger on the N key. Move it up to the key above, H. Move it up again to the Y key. Now once more, move it up to the 6.

"Above" here is kind of incorrect, it's actually "beyond". Colloquially we say the keys are above and below each other.


I just assumed he was referring to a (USA-specific?) telephone keypad (where "N" corresponds to "6").


If you want to feel more inferior about your home lab, https://www.reddit.com/r/homelab is a good source of safe-for-work porn and information on over-engineered setups.


~10 years ago, I had a completely full 42U cabinet in my house, along with another 8U or so of gear and several devices that aren't measured in RU's (access points, Cable and DSL modems, VoIP phones, etc.).

Most of the gear was used for lab scenarios and such for various (Cisco, Juniper, et al) networking certs and was (mostly, but not completely) isolated from my "real" network. IIRC, I had ~35 VLANs at one point.

My extremely over-engineered home lab certainly served its purpose but I think I spent as much time maintaining it as I did actually using it, although it really came in handy for building out PoCs for projects I was handling at $work (my test/lab network at $work wasn't nearly as well-equipped as my home lab was!).

For the last several years, though, I've managed to get by with a single subnet that is shared by everything -- a few laptops, a couple desktops, a server hosting the handful of obligatory VMs, and, of course, the various phones, tablets, and streaming devices that are ubiquitous in all of our homes nowadays.

Just within the last few weeks, however, I've acquired a new server (2 x 10-core Xeons, 256 GB RAM, 4 "Enterprise" SSDs and 12 "Enterprise" HDDs (600 GB 15k SAS)), dug a couple switches out of storage in the garage, replaced my Internet router with a small industrial box running OpenBSD, and started building out a few more subnets for proper separation of various devices (I've twice been offered a 42U cabinet recently but, thus far, managed to say no!). Like probably most HN'ers, I've got a few VPSes spread out here and there as well. Finally, I've got a decent (but was over-built) 2U box in a rack at $work ($work == ISP) that I am planning to use to tie all of this together (using Wireguard, of course).

Yes, I'm fully aware that I'm in the beginning stages of a relapse. After these upcoming changes, however, I don't intend to "grow" this lab much larger (although this kinda stuff does just creep up on you sometimes).


You are not alone my friend.

I used to also have a 42U cabinet in my garage for several years. It housed a bunch of servers, mostly Dell poweredge but also some no-name boxes, plus some switches and other miscellaneous gear.

The power draw was too much for my poor garage circuit and after any power outage I had to power up the rack one device at a time - it was a massive pain. I also spent WAY too much time tinkering with it all, instead of actually using it in anger. Sure, it helped me immensely doing PoCs for work or for my own learning, but it was always overkill. Funnily enough though, every other tech-head that saw it was envious, until I started detailing the horror stories of keeping it all running.

Thankfully Virtualisation became a usable and affordable platform for tinkerers, and I migrated everything (via a streamlined custom P2V process) to ESX, then later on migrated/rebuilt the VMs over to Hyper-V.

I now just run 2x Tower servers (HP 8xxx series workstations - dual Xeon based) and run 20+ VMs on each. Plus a single NAS for file storage. Life is so much easier... and the Garage is so much quieter.


" replaced my Internet router with a small industrial box running OpenBSD"

What box and how's it performing?



Weird... that link is blocked for me in the UK (redirecting to contentcontrol.vodafone.co.uk). Wonder what that's about, since it's basically the network topology of a home network, albeit a crazy one.


It's probably blanket blocking of imgur as opposed to the image itself.


Newer vodafone contracts have contentcontrol enabled by default (because the UK is now a nanny state), you have to call them up to get it turned off.


Is that the adult content filter?


I feel like this "article" should go there, not on HN. I mean, we all know what a server rack looks like?


Well, as of 11:42 on 4th July BST, 234 other HNs would disagree with you.


Good stuff. However - only one Linux router (VM) which means that you can't upgrade it and reboot without loss of service. The way around that is two VMs and VRRP or similar and a lot of very complicated NAT and firewall rules.

Out of the box, pfSense can do multi WAN and CARP (similar to VRRP) clustering. At the office I have two older servers with lots of NICs and five WANs. Inbound redundancy is provided by dynamic DNS and SRV records etc. Note that to do CARP/VRRP, you do need at least a /29 IPv4 allocation. You need an address per box plus the virtual one that is actually used by services. PPPoA/E is harder to deal with than cable/leased line etc but it turns out that low cost Billion 8800NLR2 can do external IPv4 pass through as well as do the PPPoA/E. They will need an address as well from your range. You need something like them in this case because only one device can be the PPPoA/E dial up system at a time. Unless you have some very fancy secret sauce, your clustered routers' pppd or whatever are going to get confused as to who does what.
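
The /29 requirement is just address counting, which a few lines with Python's `ipaddress` module can show. The prefix below is a documentation range standing in for a real allocation, and the role names are only labels for the addresses the comment lists (one per box, the virtual one, and one for the PPPoE modem).

```python
import ipaddress


def usable_hosts(prefix_len: int) -> int:
    """Count usable host addresses in a subnet of the given prefix length."""
    net = ipaddress.ip_network(f"203.0.113.0/{prefix_len}")
    return len(list(net.hosts()))  # hosts() skips network and broadcast addresses


# Addresses needed: one per router, the shared virtual IP, and the modem.
roles = ["router-a", "router-b", "carp-vip", "pppoe-modem"]

block = ipaddress.ip_network("203.0.113.0/29")  # placeholder allocation
usable = list(block.hosts())
plan = dict(zip(roles, usable))
spare = usable[len(roles):]
```

A /30 only has 2 usable addresses, which is not enough for the 4 roles, so the next size up, a /29 with 6 usable addresses, is the smallest block that fits and leaves 2 spare.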

I notice you have a cloud key. Unifi on an Ubuntu VM is easy, and much easier to backup and snapshot before upgrades, so is safer. You can also front it with HA Proxy for simple URLs and perhaps Lets Encrypt. pfSense has a HA Proxy package with a GUI and I believe it is CARP friendly as well ...


Unfortunately, OP is using Centurylink fiber. It's been a few years since I lived in Tacoma, WA and used this service but it's something like PPPoE over VLAN. There was a FreeBSD bug a few years back where PPPoE was ridiculously slow when running on top of a VLAN interface. OpenBSD did not have this problem, which is why I ran that for a firewall instead of my preferred pfSense.


I have four FTTC (PPPoE/A) WANs and a BT (UK) leased line at work. The FTTCs are 80/20Mbs-1 and the leased line is symmetric 100Mbs-1. I've put all five WANs down a separate 802.1q VLAN. Each of my routers has one physical NIC (Intel 1Gb) dedicated to WANs. The other nine NICs, each, are for internal VLANs.

I use Draytek 120 or 130 modems for single ADSL or FTTC connections but for CARP clusters, I use Billion Bipac 8800NLR2, so I am not doing the PPPoA/E on the pfSense boxes. The Billions are able to pass through bits of a /29 and do the PPPoA/E themselves - the only cheap router (~£60) I've found to do this.

I've been running this thing for about four years now. PPPox is a complex beast and there are a few things to look out for such as MTU. PPPoE imposes an eight byte overhead (hence 1492) and back in the day some ill advised auth mechanism required setting a 1458 byte MTU. Apparently, some BT kit supports mini Jumbo frames of 1508 bytes which means that you could set your MTU to 1500 instead of 1492 - good luck with that as a rule of thumb. $DEITY only knows what an ISP in WA has arbitrarily decided to mandate. Here in the UK we have a near monopoly for the infrastructure but lots of providers that use it and so it should be simple. To be fair, I bet you don't get docs like this: https://www.btplc.com/SINet/SINs/index.htm (498 is FTTC)
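
The MTU numbers above fall out of simple subtraction. As a small worked sketch (the function names are mine, and the TCP MSS line assumes plain IPv4 and TCP headers with no options):

```python
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20     # without options
TCP_HEADER = 20      # without options


def pppoe_mtu(ethernet_payload: int = 1500) -> int:
    """IP MTU available inside a PPPoE session for a given Ethernet payload size."""
    return ethernet_payload - PPPOE_OVERHEAD


def tcp_mss(mtu: int) -> int:
    """Largest TCP segment that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


standard = pppoe_mtu()        # 1492 on a normal 1500-byte Ethernet payload
baby_jumbo = pppoe_mtu(1508)  # 1500 if the path really carries 1508-byte frames
mss_on_pppoe = tcp_mss(standard)  # 1452
```

This is why MSS clamping to 1452 is a common workaround on PPPoE links when path MTU discovery is being blocked somewhere along the way.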

Anyway, if you are happy maintaining your firewall rule set manually then crack on but nowadays it is hard to do that. pfSense has a lot of quite vociferous users who kick the tyres on a regular basis. It even looks quite pretty these days - all bootstrapped up and stuff, the red thing is long gone.


> The FTTCs are 80/20Mbs-1 and the leased line is symmetric 100Mbs-1.

What does the "-1" in "80/20Mbs-1" and "100Mbs-1" signify? I've never seen this "syntax" or format used before but maybe it's an EU/UK thing (I'm in .us, FWIW)?


OP has a scientific background.

-1 is meant as "to the power of -1". Thus, s-1 becomes 1/s, and the entire thing Mb/s

Never seen that either


OP has a scientific background. - LOL - I have an HND (technician) in Civil Engineering and I am now the MD of an IT consultancy (obvs). I picked up the habit of using s-1 etc when studying Physics 'A' level (UK) many, many moons ago. Not too sure why I persist with it these days but I dimly remember liking the fact that you can use basic arithmetic on superscripts. To be fair I should put s^-1 but s-1 is reasonably obvious.


Sorry, "s-1" means "per second"


Only thing missing is a chaos monkey to randomly power down devices to make sure everything still stays available.


There is a child present.


The original chaos monkey.


... with water balloons.


Now we're getting into Chaos Gorilla territory.

For those that assume that was just a joke on escalating size: Netflix actually made it a real thing when they named the component that randomly shuts down not just services, but entire AWS availability zones of Netflix services.

Child with a water balloon? Hope you have multiple data-closets in your house...


Nice setup, but we can all pretty much agree it's overkill for most. My ISP is fairly reliable and outside of infant death, most network elements have a pretty long MTBF.

I run a similar set of WiFi gear. I've a couple PoE powered Unifi UAP-AC-Pro spread around the house, all connected to an 8-port Unifi PoE GigE switch. Routing is done with an EdgeRouter lite, which as it turns out is capable of line rate GigE.

I have a low power industrial computer with 4 cores and 8GB memory that runs various services mostly via docker or vagrant. It consumes about 12w.

It's all powered by a 750VA APC SmartUPS. I get almost an hour of runtime on the internal batteries. I may add some external batteries at some point, but most power outages in my area don't last longer than 20-30 minutes.
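
For a back-of-the-envelope runtime estimate, something like the following works. The battery capacity and inverter efficiency used here are placeholder assumptions, not APC specifications, and real lead-acid batteries deliver less than their rated capacity at high discharge rates, so treat the result as an upper bound.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.85) -> float:
    """Rough linear runtime estimate: usable battery energy divided by load."""
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return battery_wh * efficiency / load_w * 60


# Hypothetical figures: a 100 Wh battery pack feeding an 85 W load.
estimate = ups_runtime_minutes(battery_wh=100, load_w=85)
```

With those made-up numbers the estimate comes out to about an hour; halving the load roughly doubles it, which is why low-power gear matters more than a bigger UPS.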


Power outages are fairly common in my locale so that's what I've primarily optimized for. Cable modem + WiFi hub on one UPS, desktop on another. Desktop stays on through short (<15 minutes) outages, wifi+internet for 3-4 hours. Power is still my primary point of failure, with probably 1-2 days of outages longer than 3 hrs per year, although in many of those cases the cable will also go out.


> 1-2 days of outages longer than 3 hrs per year

Not trying to be a dick, but does that count as "fairly common"?


I mean, it's not common enough for me to spend the money on a backup generator, but it's common enough that you need to at least consider how long you can stay in a house without power, in what weather. (E.g., my house will stay warm enough that I don't need to worry about me or the pipes freezing after 1-2 days without power in the winter [although it gets quite nippy after ~18 hours]; if the power is off for more than 3 days in the summer I need to do something or the chest freezer will defrost enough to spoil.)

Shorter power outages are more common; 10-ish power interruptions of less than ~2 hours per year.

It's not like, developing or failing nation bad, but it's not great, especially when the problem is always "a tree limb fell on a wire".


I think so.


Everyone has different needs of course,

My home setup:

hardwired all the desktops and a few access points via cheap 1gbit hardware (literally found some at the thrift store/ebay), usually using tomato/shibby.

have a backup router.

battery backup on main routers/modem.

large external battery wire nutted to my desktop UPS.

NAS is an old laptop with battery intact, doubles as second display/machine.

use my phone via usb on my desktop if all else fails.

total cost, probably less than $100.

Oh, and I use a $5/month server for stuff that absolutely needs to be on full time. Otherwise the only external access is me occasionally remoting into my desktop and I am happy to stop and smell the flowers if that is interrupted briefly.


I have an even simpler setup: if my cable connection dies, I simply tether my phone to replace it. There are no UPSes because both the laptop (TP25 w/ 24 + 72 Wh batteries) and the phone (it's a Moto Z Play with a battery mod) have large enough batteries to last much longer than a domestic blackout in downtown Vancouver.

My laptop is enough for me to stay productive (it's a ThinkPad 25! very productive). Everything that needs to be online is on a Hetzner server I rent for all sorts of purposes so the 51 EUR monthly bill kind of spreads out.


I've been there, splurged on an alienware 17 a while ago, but mostly I only use it on the road now.

I went with desktop because I wanted everyone in the house to have a decent machine and I could get several I5s for less than $70 apiece (5 machines, one in each bedroom) and wanted easy/cheap upgrades for some of them, and they are all the same optiplex model, which makes my life easier.

I like my desktop setup a lot though, 3.3ghz I-5, 27" 1080, 16 gig ram, 1tb ssd, 8tb in "cold storage", g402 mouse, gt710 vid, clicky keyboard, Nubwo N2 headset, decent posture, 100+ fps gaming. Probably threw $500 at it above the initial $70 though, but most of the machines didn't get that treatment, but their users aren't using it to make a living either.


Posture wise I am normally using a Matias Ergo Pro mounted vertically and an Evoluent VerticalMouse and of course an external monitor. But, in a pinch / short travel I can just work on the laptop. I tried a desktop before but since everything I work on needs to be on the laptop too, the necessary sync gets old quick.


Fun solution, but seems like overkill for just about every home user.

I used to use a dual-WAN setup with cable modem + DSL backup. It worked well with automatic failover. I use a pfSense APU based router and, with no moving parts, it's been very reliable, nearly 4 years without any unscheduled downtime.

Then I moved and only had a single ISP to choose from, so my backup is to manually turn on a Wifi hotspot. I thought about using a cellular router with ethernet or a wifi connection to the hotspot for auto-failover, but it just wasn't worth the time and/or money to set it up -- if I'm home when the internet goes down, I can just switch to the hotspot, if I'm not home, then all I really lose is the ability to control the lights and thermostat remotely, not exactly a critical function.


> seems like overkill

I think that's quite the understatement. The thing that really stands out to me is the claim that all of that is only drawing 220W at idle. I'm curious if he means truly idle, like literally just booted up and not doing anything at all, zero traffic, etc. Or if that's the draw with stuff actually being used. Because 220W just for your home network is hilarious. I mean I feel dumb often because my little pfsense box pulls about 15W.
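
To put the two idle figures side by side, a quick sketch of the annual energy use. The electricity price is a made-up placeholder; plug in your own tariff.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_kwh(watts: float) -> float:
    """Energy used in a year by a constant load, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly running cost for a constant load at a flat tariff."""
    return annual_kwh(watts) * price_per_kwh


rack_kwh = annual_kwh(220)    # about 1927 kWh per year
pfsense_kwh = annual_kwh(15)  # about 131 kWh per year

ASSUMED_PRICE = 0.12  # hypothetical price per kWh, not a real quote
rack_cost = annual_cost(220, ASSUMED_PRICE)
```

So the 220W setup burns roughly 1,800 kWh a year more than a 15W box, assuming both really do sit at those figures around the clock.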


PfSense or something like an out of maintenance Fortigate is an easier solution


This was all fairly straightforward to implement a decade ago on cheap hardware: OpenBSD running on a pair of ALIXs and a pair of semi-cheap Netgear switches. Full firewall and VPN failover using pfsync and sasyncd, IP failover with CARP.

You can do load balancing using PF as well, which is what we were mostly offering, cheap fault tolerant hosting for colocated customers.


Much of this exercise was me playing with Ceph, which is pretty impressive.

Having VMs float around with shared storage makes complexity elsewhere go away. i.e. I don't need to deal with CARP, VRRP, etc.


Yeah, I noticed it was floating VMs, which is an interesting way to go. On one hand, it's less parts to go kaput, on the other hand, those parts need to be more robust.

The main thing that might make me shy away is the added exposure at the edge. If the VM hosting is dedicated to just the network failover/firewall, it seems wasteful, and if it isn't it seems unnecessary exposed.

The only other thing I'm not sure of, since I'm not too familiar with all the VM solutions nowadays, is whether an actual hardware failure of the active VM hardware allows seamless failover (which you do get with what we were doing back in the day).

Edit: although, it's not hard to emulate the stuff we were doing using some OpenBSD virts on those two boxes, which even if they don't support full hardware failure with the current setup they then would. Since you're playing with this for fun, you might be interested in trying it. If you find OpenBSD intimidating, you can use pfSense to do the same, which is a dedicated GUI-configured FreeBSD distro that offers much the same (there were some CARP implementation differences/bugs in FreeBSD way back, but I think they got fixed up long ago).


Some alternatives:

* Cantenna/laser link to a house some blocks away to avoid local WAN link disruption

* For less performance-intense networks, remove the physical impediments: 2 routers, each with 1 APC, connected to 2 separate power circuits, connected to 2 WAN links, providing 2 radios each. No switch to go down or cables to trip over, redundancy of access point, redundancy of frequency/radio, redundancy of WAN link, redundancy of power. Hardware-wise this is pretty cheap and still highly available. If the routers are cheap, use a hardware watchdog.
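
As a rough way to see why duplicating the whole chain helps, here is the standard availability arithmetic. The 99% figure per component is an arbitrary assumption, and the parallel formula only holds if the two chains fail independently, which is exactly what shared power circuits or shared ducts break.

```python
def series_availability(*availabilities: float) -> float:
    """Availability of components that must ALL work (power -> router -> WAN)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant paths where ANY one working is enough.

    Assumes failures are independent of each other.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


# Hypothetical: power, router, and WAN link are each up 99% of the time.
one_chain = series_availability(0.99, 0.99, 0.99)
two_chains = parallel_availability(one_chain, one_chain)
```

With these made-up numbers a single chain is up about 97% of the time, while two fully independent chains together are up about 99.9% of the time.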


I also thought having everything on UPS would allow me to keep an Internet connection during a power outage. Turns out that when the power goes out so does my ISP. Having a second ISP on LTE or Wifi like this setup may or may not be enough to fix that.


Looks like a pretty resilient setup... But can it handle an Ethernet pause frame broadcast flood https://github.com/nwholloway/mpcp


Very cool configuration.

I attempted something similar to this in a 20U cabinet some time back. The biggest issue is the fan noise that 1U form factor servers and network gear produce, with their rather high RPMs. One can hear the noise across the other side of the house.

We've since switched to fanless network gear and ATX form factor servers with large diameter fans to keep the family happy. It definitely doesn't look as nice, though.


You can get pretty much the same result from a couple of fanless routers (mikrotik, something running ddwrt, etc) -- resilient against hardware failure, power failure, and wan failure.

Not as cool though, and clearly not running any servers, but that's what things like AWS or Linode are for -- or for low power stuff, something like a fitlet [0]

[0] http://www.fit-pc.com/web/products/fitlet/


>but that's what things like AWS or Linode are for

If your home is directly connected to their datacenter...

Not everyone has 10 Gbit upload with best peering!


Yes, for home server use, sure. I was thinking of public-facing servers.

I'm happy with a QNap as the only home server I need.


> ATX form factor servers with large diameter fans to keep the family happy.

It's insane how quiet you can go with this approach, while remaining air-cooled. I know when my home server is running backup scripts because the noise increases at least tenfold when the hard drives spin. Fortunately, I have coordinated that to be only once a day -- the rest of the time the drives are in standby.


Modern servers can be decently quiet as well, assuming you don't run them at 100% load all the time. I've currently got 2xR210 II's, 1xR520, 1xR320, and a Juniper EX2200-48T running in a rack right behind me. It's audible while I'm in my office under normal load, but as soon as I leave the room and close the door you can't hear a thing.

It's not whisper silent, especially during the summer when the fans on the R320 speed up to around 6000RPM (and this is with a E5-2430L) - but that's mostly due to my office remaining closed from the rest of the house leaving the ambient intake temperature around 75-80F (rest of the house stays at 72F). I'm probably going to stick with 2U's (probably R520's) when I start expanding again to lower the noise at higher temps, since the more equipment I add the more heat gets trapped in the room.


> I love Ceph so much...

Clearly hasn't been bitten by it, yet.

I mean... I love Ceph, too, but I don't ever want to run it again.


Can you elaborate?


Sure. It's an extraordinarily complex system that's difficult to engineer correctly. It provides extraordinary durability, but the radius of failure isn't obvious. Pro tip, it's the entire cluster. As such, an issue with an OSD in one pool could potentially cause the entire cluster to have issues.

Recovery is difficult and there's no support unless you have a subscription from Redhat and also run RHEL plus their stable distro of Ceph (RedHat Storage or whatever). IIRC, they quoted me $90k for a petabyte of raw disk.

I haven't messed with it much in the last couple of years. Bluestore looked really promising. I've thought about taking a look at rook, but haven't yet.

If I were in a position to deploy a bunch of storage on bare metal again, I'd likely go with ceph. I do know that $GLORIOUS_FORMER_EMPLOYER ended up making the migration to ScaleIO and report being happy with it and having good performance.


That was insightful, thanks!


If you taunt it, even by accident, it has a habit of biting back. And as it has your data, you don't really feel too comfortable just nuking it in that case.


I lol'd while nodding vehemently in agreement.


Have you also given yourself a mobile equivalent for those times when you are traveling, or when your primary environment is unsuitable and you must work at a place with public WiFi?


Neat! But to be honest, it's way more than I'd ever invest in a home setup. I manage an entire office of ~30 people with much less redundancy than this!


Couldn’t all this complexity be replaced with a ubiquiti edgerouter or a prosumer router that’ll balance the links for you?

This is more of a homelab tinkering setup to learn.


Heh. Ubiquiti is complexity - you really have to use all their kit to get the benefits.


Awesome setup Brad. I wish I had a tenth of that speed. I have Verizon DSL (1.5 Mbit Down and 700 Kbit up). They advertise it as 3 down and 1.5 up, but I've never seen that. That's the best I can get in rural Virginia. I do use SQM on a Ubiquiti Edge Router X to fix buffer bloat, so latency is very good.

And thanks for all the Go code. It's awesome! I'm building 1.10.3 on an old Beagle Bone Black right now ;)


It boggles my mind that I can get 80/20 fiber in semi-rural Scotland, and so many Americans are stuck on really crappy DSL connections!


I've worked for a company that had similar storage and VE. ProxMox on MooseFS. I would prefer Ceph, but they are both pretty sweet! Awesome Lab!


"Past failures

I used to use a Soekris net6501 as my home gateway, but its CPU maxes out NAT'ing about 300 Mbps, sadly, so I started looking at alternatives when I got Centurylink fiber.

I used to use a UniFi Security Gateway Pro but it failed one day and wouldn't power on any more. Dave had a backup for me handy, but the Unifi controller software wedged itself and wouldn't let me remove the old (dead) one ..."

There is much adoration of Ubiquiti hardware on forums and message boards. I do not doubt for a moment it has been well-deserved.

However, I have a question about the software. I would like to use own kernel and custom utilities.

If I understand correctly, installing one's own choice of OS on Ubiquiti hardware is not always possible and even if successful it carries a penalty in terms of performance versus retaining the Ubiquiti pre-installed proprietary OS.

Soekris made it easy for the user to install the OS of her choice. Tradeoff: More user "control", but a slower router.

The question is: Are there other alternatives to Soekris that can exceed 300mbps and allow for user-chosen OS?

This is another line of (faster) routers where the vendor has allowed for easy installation of user-chosen OS.

https://protectli.com/product-comparison/

There are comments in some other forums and message boards about these computers but I have not seen this company discussed on HN before.

Note the website claims models FW1, 2 and 4 have no Intel ME, SPS or TXE.

https://protectli.com/kb/intel-management-engine-vulnerabili...


Hey textmode. I'm still very very new to this - I jumped from the Turris Omnia [0] to the whole kit and kaboodle of Unifi gear.

I don't think Intel ME is at the top of my threat model - by the time someone's using that kind of stuff on me I'm screwed anyway. I do, however, pay insane prices for power (28-34 cents AUD per kWh). This has pretty much meant I look for ARM and MIPS devices everywhere, but the latest gen Intel stuff is looking good.

I hadn't seen those Protectli boards before and they look quite cool - I'll keep them in mind. At full tilt, it'd cost me about $85 AUD per year to run.

If Marvell ever open sources the switch drivers for the Espressobin [1] [2] then that may be an option to exceed 300mbps.

0: https://omnia.turris.cz/en/

1: http://wiki.espressobin.net/tiki-index.php?page=Topaz+Switch

2: http://espressobin.net/


I think the redundant outlink dwarfs all other improvements mentioned here. All but one of the incidents in my home have been due to ISP or optical fibre company issues. (Which is not surprising -- they have many more miles of cabling to maintain than I do.)


This is a lot of expense toward high availability while only having 30-45min of backup power.


Once upon a time I would have been very envious of this set up. Now I just shudder to think of the hassle of maintaining all of this.

Don't get me wrong, I still have highly available Internet at my house - I just tether my laptop to my phone and I'm done.


Since my internet (Fios) is way more reliable than my power, I'd first need a whole-network UPS before worrying about internet redundancy. When I do lose internet - which almost never happens - I switch to using my smartphone as hotspot.


It's great that you have documented this process, especially the failures section, not enough people do this in my opinion. However, it really annoys me when people make these blog style posts on GitHub. Sorry OP, I for one disapprove.


What's the problem with posting on Github? I could see several benefits for it: no ads, source control, easy edit from the web page, notifications for your followers...


I just skimmed the original post, but I didn't see an off site data backup.

Maybe you missed the New Yorker article entitled 'The Really Big One' [1]

[1] https://s831.us/2KyfcEw


OP here. I mirror all my data to Amazon S3 and Google Cloud Storage too. Or rather, Perkeep (https://perkeep.org) does this for me.


I have 3x Asus OnHubs running Google WiFi and they deliver GigE from WebPass fairly easily.

When that fails I switch to my iPhone. :)

(On a more serious note, I’d like to see the basement or whatever with raised floors. Come on Brad. ;)


I went to the page to read details about how he load balanced upstream connections, or if he was using heartbeat or whatnot. I didn't find that, but what I did find was a gratuitous amount of kit that made me happy my infrastructure choice at home is much, much simpler.

My setup is Comcast going into a simple, reliable Surfboard modem, feeding a Google Wifi setup. If it goes down, which it just really doesn't do, we can use cellular data.

Complexity is the enemy of availability. Keep it as simple as possible, but no simpler.


(But, then again, my favorite home router is a Bosch :-)


That's a beautiful setup, but I'm curious... do a lot of people around the world still struggle with regular internet downtimes?

I can hardly remember the last time that my internet connection cut out... but if I had to guess... it was probably during the peak of a 100 year storm we had a few years back that put the entire area underwater for about 48 hours.

Transformers were blowing up all over the place, the power was out for days in some areas, and yes the internet went out as well at that point.

I live in the GTA FYI.


I look at this setup and say to myself that this is just the wrong way to do it. A 'floating' vm to NAT and route? Ceph does look very nice but I have no need for anything but file based storage.

Here is my top-down take on a more traditional (cheaper) approach:

* 2 1G 5 port edge switches

* IDS

* vrrpd balanced COTS NAT routers -w- RIPng + nginx as generic and web proxy

* LAN 1G 12 port switches (1 hot, 1 cold)

* 2 Synology NAS (redundant, manual failover)

* etc...


* The whole setup including all APs and switches draws about 220 watts idle. Power is pretty cheap in Seattle. Washington State (as of April 2018) has the cheapest electricity in the United States, at $0.0974/kWh.

https://www.electricitylocal.com/states/washington/quincy/

The average residential electricity rate in Quincy is 4.85¢/kWh.[1]

4.85 << 9.74


You should look into pfsense running as a vm on multiple hosts. You can sync the configs with CARP. It's pretty solid, we use this setup in a couple of data centers, few years with no downtime, and has failed over several times.

https://www.netgate.com/docs/pfsense/highavailability/config...


In Australia I just got the new Telstra "Smart" modem. It has a built-in 4G SIM as a fallback when ADSL is down. Doesn't cost any extra. Pretty sweet.


sweet until a backhoe takes out a fibre and your entire exchange/SAM ends up saturating the 4G network, resulting in negligible network connectivity, and two heavily disrupted networks.


Is this an overkill setup for Twitch/Youtube streamers?


This guy doesn't appear to be a Twitch streamer. Aside from his rack having stickers for Go, Kubernetes, GitHub, and more, his Twitter description doesn't say anything about that.

If you're asking about in general would this be a good thing for a Twitch streamer... then I would say no. Mostly because most Twitch streamers are not going to know how to maintain something like this and they don't need all the servers.

If someone not so technical, Twitch streamers included, needed the redundant internet I would recommend something more along the lines of two ISPs like this guy (specifically over two technologies if possible: fiber and wifi, but that comes down to bandwidth requirements) but instead of going into multiple switches and having 3 servers running with VMs moving around just plug the two ISPs into something like the Unifi Security Gateway (USG) or USG Pro.


it's Brad Fitzpatrick: founder of Livejournal, Golang core team member, original author of memcached, SWE at Google.


Thanks for the info! :)


I have a redundant 2 ISP setup, and use multipath TCP to use both of them at the same time.

A very outdated post about my setup : https://www.sajalkayan.com/post/fun-with-mptcp.html

I now have 2 broadband ISPs, and optionally I can hook in my phone's 4g into the mix.

Multipath TCP allows me to "mix" bandwidth of both ISPs at the same time.


We found that using a server as a router was not very robust. We were getting strange problems all the time. The speed wasn't that great and finally we replaced that with an off the shelf router and all the pain went away. I know this was a software / configuration problem but we couldn't get it to work well. Has anyone else encountered these sorts of issues? If so did you manage to get it working well?


That's pretty vague. A server (no details) didn't work as well as hardware (no details). Lot of missing info there.


I found that using an off the shelf router was not very robust. I was getting strange problems all the time. The speed wasn't that great and finally I replaced that with a server and all the pain went away.


> We found that using a server as a router was not very robust.

Yet plenty of folks (myself included) have been doing exactly that for well over two decades without any major issues to speak of.

You are likely correct that it was "a software / configuration problem" but the lack of any actual details or useful information makes it impossible to offer any potential insight; baseless speculation is the best you may hope to receive.


This makes my mediumly-available remote access home setup look even more like child's play than it already does :) https://www.whoisdylan.com/sitdown/2018/05/31/connecting-to-...


This looks impressive but it doesn't seem to account for hung ISP modems. It's a pretty common issue with consumer-grade service. If not handled properly (e.g. power cycling) eventually both connections might end up inoperable. Personally, I use a smart power switch that will cut off modem power for a minute if pings start to fail.
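
A rough sketch of that kind of ping watchdog, in Python. Everything here is an assumption for illustration, not the poster's actual setup: the probe hosts, the three-failure threshold, the Linux `ping` flags, and especially `set_modem_power`, which is a hypothetical hook you would replace with whatever your smart switch actually exposes.

```python
import subprocess
import time

PING_TARGETS = ["1.1.1.1", "8.8.8.8"]  # assumed probe hosts, pick your own
FAILURES_BEFORE_CYCLE = 3              # assumed threshold
OFF_SECONDS = 60                       # "cut off modem power for a minute"


def ping_ok(host, timeout_s=2):
    """Send one ICMP echo; the -c/-W flags are the Linux iputils ones."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def link_up():
    """The link counts as up if any probe host answers."""
    return any(ping_ok(host) for host in PING_TARGETS)


def should_power_cycle(history, threshold=FAILURES_BEFORE_CYCLE):
    """history holds one bool per probe round, newest last."""
    return len(history) >= threshold and not any(history[-threshold:])


def set_modem_power(on):
    """Hypothetical hook: replace with your smart switch's real control call."""
    print("modem power", "on" if on else "off")


def watch(probe, set_power, sleep=time.sleep, rounds=None, interval_s=30):
    """Probe forever (or for `rounds` rounds) and return the number of power cycles."""
    history = []
    cycles = 0
    done = 0
    while rounds is None or done < rounds:
        history.append(probe())
        if should_power_cycle(history):
            set_power(False)
            sleep(OFF_SECONDS)
            set_power(True)
            history.clear()  # start counting afresh after a power cycle
            cycles += 1
        sleep(interval_s)
        done += 1
    return cycles
```

Running `watch(link_up, set_modem_power)` would loop indefinitely. Requiring several consecutive failed rounds before cutting power avoids rebooting the modem over a single dropped ping.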


Although I don't understand the details, this is pretty impressive. But the real question is: why do you require it?


> The primary goals of this project are...

> to have a highly-available home Internet setup, with no SPOF (Single Point of Failure)

> to learn and have fun.


Nice that he is using Proxmox again.

I was researching and experimenting: What hypervisor is out there providing a good file system (ZFS) and also full disc encryption at the hypervisor level?

tldr: FreeNAS

And it came out that this is not that trivial.

You can buy a vSphere/ESXi license for encryption, but it (probably) doesn’t have the same capabilities as ZFS.

You could use Hyper-V and have encryption but no ZFS.

On the other side there is Proxmox (Debian 9 stretch), which has an installer that uses ZFS (but no encryption). You can jump through some hoops and do a manual Debian 9 installation with ZFS and LUKS (for the encryption) and then install Proxmox. Then you have to watch out to use the ZFS version Proxmox uses (instead of the Debian version).

You could use OmniOS, SmartOS to get ZFS, but again no encryption out of the box.

Solaris 11 has the ZFS and encryption part figured out, but the hypervisor part is not clear to me.

So FreeBSD has ZFS and encryption (GELI) figured out as well. For the hypervisor bhyve. Still there is manual work.

Then there is FreeNAS. It has ZFS, Encryption -and- hypervisor streamlined. :)

Some people use it as a VM guest inside Proxmox/ESXi, pass through their discs, and from FreeNAS export either NFS or ZFS over iSCSI back to the hypervisor to use as a storage pool.

Or as I found out, FreeNAS 11 has the bhyve hypervisor built in. You can have FreeBSD jails for BSD and Linux, or full VM guests via bhyve like Windows or Docker/Kubernetes.

FreeNAS ships with RancherOS as the minimal Linux VM, which can act as a Docker host (if you don’t want to set up your own).

So for our use case of having a safe file system and full disc encryption, being able to launch VMs, and having this very easily installed on a USB stick with minimal configuration and excellent documentation, I would recommend trying it out.

Of course Proxmox has live migrations, which is not figured out here. Probably Kubernetes would help.

Probably the other good way would be to have drives and a mainboard which support encryption at the hardware-level. Or wait until zfs on Linux v.0.8 is more in use. It contains encryption support.

https://doc.freenas.org/11/vms.html


bradfitz, any idea why the Soekris maxes out at 300 Mbps? I have been looking for info on that since that's my gateway (pfSense) at home and I think it's limiting my speed since I recently got gigabit fiber. I might replace it with my Espressobin running OpenWrt.


Hey, another Espressobin user! Did they ever fix the part where PCIe will kernel panic the machine?

I think you'll still hit bottlenecks with the switch on the Espressobin - Marvell hasn't enabled hardware acceleration, at least for the open source parts.


I wonder if one can multiplex over several connections (including wireless) to get better throughput when they are all working, and then simply reduce to one of them if the others fail?

Can someone write up exactly how to set something like that up, maybe show us some urls?


I'm curious about the redundant power setup. Does each server draw power from both PDUs? Or do you have two servers on one PDU, and one on the other?

With three servers, if you have two power failures then the Ceph monitors will no longer be able to achieve a quorum.
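
To make the quorum point concrete: Ceph monitors need a strict majority of the monitor set to form a quorum. The small sketch below assumes one monitor per server and single-corded servers (each plugged into exactly one PDU); the server and PDU names are made up.

```python
def quorum_size(total_monitors):
    """Strict majority of the monitor set."""
    return total_monitors // 2 + 1


def has_quorum(total_monitors, alive_monitors):
    return alive_monitors >= quorum_size(total_monitors)


def monitors_left(placement, failed_pdu):
    """placement maps server name -> the single PDU feeding it (assumed layout)."""
    return sum(1 for pdu in placement.values() if pdu != failed_pdu)


# Hypothetical layout: two servers on PDU "A", one on PDU "B".
placement = {"server1": "A", "server2": "A", "server3": "B"}

for pdu in ("A", "B"):
    alive = monitors_left(placement, pdu)
    print(f"PDU {pdu} fails: {alive}/3 monitors alive, "
          f"quorum={has_quorum(len(placement), alive)}")
```

Under those assumptions, losing the PDU that feeds two servers is enough to lose quorum, which is why it matters whether each server draws from both PDUs.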


I like the technical aspects. However: 220 watts idle power consumption? What a waste of resources.

In practice using a Wifi-router with 4G fallback would achieve similar availability at a fraction of the cost and power consumption.


We should be thinking about internet with large latencies, such as if you're traveling in outer space. How would you design for that when you only have intermittent connection?

What would you cache & how much?



Well, I perhaps shouldn't have said outerspace. I also meant locally, such as on a sailboat or in an area with inconsistent internet. Or maybe only connect to the internet once a day. Would be cool to have a good setup for that.


If you follow the links on the page: https://en.wikipedia.org/wiki/Delay-tolerant_networking


Hrm. I have AT&T fiber. It does not go down. Ever.

OK, it went down once right after install but that was due to a tech accidentally disconnecting me at the node while connecting a neighbor.


> Washington State (as of April 2018) has the cheapest electricity in the United States, at $0.0974/kWh.

I'm in the Atlanta area. $0.07181/kWh.


I believe Washington has certain areas that are the cheapest in the United States, but it's not state-wide.

For example, Grant County PUD for residential customers: $0.04547 per kWh


Chelan county is the one with the cheapest electricity ~0.035/kWh


Which company is that from, and did you include the surcharges? I'm on Georgia Power on the R-22 residential tariff, and I pay about twice as much as the "headline rate", once I've added them all in: http://www.psc.state.ga.us/calc/electric/GPcalc.asp


Just out of curiosity, how much does this all cost?



Too bad he doesn't detail monthly cost. It's likely to be more than $10k/yr.


He basically does detail monthly costs. You're implying it costs him about $833 a month to run.

His gig internet is $80 a month: https://www.centurylink.com/fiber/plans-and-pricing/seattle-...

His wifi backup internet is $40: http://www.gigabitseattle.com/residential-services

He specifically states the setup draws 220 watts at idle and that his electricity costs $0.0974/kWh. So 220 * 24 / 1000 * 0.0974 = 0.514272 per day, or about $15.40 a month at idle.

So around $135 a month.
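
The same arithmetic as a few lines of Python, in case anyone wants to plug in their own wattage and rate. The 30-day month and the assumption that the rack sits at idle draw all month are simplifications; the $80 and $40 figures are the ISP prices quoted above.

```python
def monthly_power_cost(idle_watts, usd_per_kwh, days=30):
    """Cost of running a constant load for `days` days."""
    kwh_per_day = idle_watts * 24 / 1000
    return kwh_per_day * usd_per_kwh * days


power = monthly_power_cost(220, 0.0974)  # figures from the original post
total = 80 + 40 + power                  # fiber + wireless backup + electricity

print(f"power: ${power:.2f}/month")  # power: $15.43/month
print(f"total: ${total:.2f}/month")  # total: $135.43/month
```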


> 220 watts at idle

yea if it's idle the entire month, which is doubtful. but even if it's not, it's not likely to be too much more than the $135 you calculated. I figured the internet service would have been more, since the rest of us get screwed by our ISPs on costs.


Yeah, $80 for gig internet... I wish :(


WebPass in SF is $60/month for GigE. It's kind of amazing.

No modem, just an Ethernet drop into your home.
