Self hosting is important (dataswamp.org)
566 points by hucste 55 days ago | 318 comments

As someone who self-hosts a lot of different apps, I can say self-hosting really is a slippery slope. Once you start enjoying control over the system and the data, you want to self-host everything.

The most important aspect is security, and you learn it by doing it.

All of my self-hosted apps sit behind a private VPN called Pritunl, which provides a self-hosted, corporate-style VPN setup where you can manage users and their access to servers.

I host these following apps/products right now:

- Pritunl (corporate-like VPN)

- Superset (Analytics)

- Bitwarden (Password Manager)

- OpenVPN with Pihole (Personal VPN with Adblock)

- Wireguard with Pihole (Personal VPN with Adblock)

- Drone.io (CI/CD)

- Posthog (Web Traffic Analytics)

- Papercups (Web chat support)
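Two of the services above pair a VPN with Pi-hole, which works as a DNS sinkhole: clients use it as their resolver, and queries for domains on a blocklist get an unroutable answer instead of the real address. A minimal Python sketch of that lookup logic, assuming a hypothetical in-memory blocklist and a stand-in upstream resolver. This is not Pi-hole's code, and matching subdomains of listed entries is an assumption made here for illustration:

```python
from __future__ import annotations

from typing import Callable

# Hypothetical blocklist; Pi-hole builds its own from configured adlists.
BLOCKLIST = {"ads.example.com", "tracker.example.net"}

# Answer returned for blocked names (0.0.0.0 is unroutable).
SINKHOLE_ADDRESS = "0.0.0.0"


def is_blocked(domain: str, blocklist: set[str]) -> bool:
    """Return True if the domain or any parent domain is on the blocklist."""
    labels = domain.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(domain: str, upstream: Callable[[str], str]) -> str:
    """Sinkhole blocked names; forward everything else to the upstream resolver."""
    if is_blocked(domain, BLOCKLIST):
        return SINKHOLE_ADDRESS
    return upstream(domain)


def fake_upstream(name: str) -> str:
    # Documentation address standing in for a real upstream DNS lookup.
    return "192.0.2.10"


if __name__ == "__main__":
    for name in ["cdn.ads.example.com", "example.org"]:
        print(name, "->", resolve(name, fake_upstream))
```

Because the blocking happens at the resolver, every device on the VPN gets it without installing anything locally.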

I think the team dynamic can be just as important from the “host everything” standpoint. Hosts generally have incentives to automate manual processes, and a diverse set of customers pushing to make that automation sane, for some value of sanity.

There’s a struggle against manual processes in self hosted environments, or aggressive automation with bespoke or otherwise incomplete tools. What you want is glue code holding together open source tools without too much abstraction over the top. You should always have a hint what’s going on underneath. I find myself having to spend way too much social capital on this.

While I much prefer self-hosting, there is a clear advantage to third parties inasmuch as you can bond over the stupid things their solutions do, instead of driving wedges between teams by engaging in that kind of catharsis.

Out of curiosity, what is the reason to host three different VPNs?

I have been running my own bootstrapped startup for a year and a half, and I manage its entire infrastructure.

My startup focuses on big data projects and is currently building a web-based Bloomberg Terminal alternative (https://quantale.io), which is an infrastructure-heavy project.

Here is why I use 3 VPNs:

1. Pritunl is used as an enterprise VPN setup. With this VPN, I give different users on my team and different clients access to different web apps/self-hosted services. For example, the staging version of Quantale is hosted on a server (with a ufw rule that only allows connections from the Pritunl VPN IP). I can give you access by creating a VPN config for you, and you will only be able to reach the staging server, not any of the other servers behind this VPN.

2. OpenVPN with Pi-hole: I use this as a personal VPN. It blocks ads and trackers using Pi-hole, and my self-hosted password manager is only accessible via this VPN.

3. Wirehole with Pi-hole: this is a backup VPN for accessing my password manager in case I lose access to my OpenVPN server.
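The staging-server setup in point 1 comes down to a default-deny firewall that only accepts traffic from the VPN server's address. A minimal Python sketch, assuming a hypothetical VPN address (10.8.0.1) and port list. It builds standard `ufw` commands and models the resulting allow/deny decision, but it does not run anything:

```python
from __future__ import annotations

import ipaddress

# Hypothetical values: the VPN server's address as seen by the staging box,
# and the TCP ports the staging server needs to expose.
VPN_SOURCE = ipaddress.ip_network("10.8.0.1/32")
APP_PORTS = (22, 443)


def ufw_commands(source: ipaddress.IPv4Network, ports: tuple[int, ...]) -> list[str]:
    """Build ufw commands: deny all inbound traffic, then allow only the VPN source."""
    cmds = ["ufw default deny incoming", "ufw default allow outgoing"]
    for port in ports:
        cmds.append(f"ufw allow from {source} to any port {port} proto tcp")
    # Enable last; doing this over SSH from a non-VPN address would cut you off.
    cmds.append("ufw enable")
    return cmds


def is_allowed(client_ip: str, port: int,
               source: ipaddress.IPv4Network = VPN_SOURCE,
               ports: tuple[int, ...] = APP_PORTS) -> bool:
    """Model the resulting decision for an inbound TCP connection."""
    return ipaddress.ip_address(client_ip) in source and port in ports


if __name__ == "__main__":
    for cmd in ufw_commands(VPN_SOURCE, APP_PORTS):
        print(cmd)
```

Access control then lives in one place: whoever holds a VPN config that routes to the staging server can reach it, and nobody else can.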

Nice, Posthog is just what I was looking for. Awesome!

Is this for personal use, or to supplement other activities? Probably to use with something else — I really don't see why you'd want analytics stuff for just yourself.

I have been running my own bootstrapped startup for a year and a half, and I manage its entire infrastructure.

My startup focuses on big data projects and is currently building a web-based Bloomberg Terminal alternative (https://quantale.io), which is an infrastructure-heavy project.

The self-hosted Superset instance is used to provide analytics services to a client in the manufacturing sector.

This article touches on something interesting: community hosting.

I'd like to explore that. Specifically, the idea of small communities where a group of people maintains the underlying tech, and - kicker here - everyone in the community knows more or less everyone else in the community.

That offers a bit more security/safety/continuity than just self-hosting everything, while still not ceding control to a faceless corporation.

Granted, there will always be other reliances outside of the community - like internet and electricity providers - but a line has to be drawn somewhere.

Glad to answer questions if you'd like. I started and helped run a bandwidth cooperative from 2000-2015 or so. I decommissioned my last box in the coop at the beginning of this year.

The basic story, though, is that before the dot-com crash, a lot of SF nerds kept their pet projects on work bandwidth. That became risky during the crash, so I and some pals rented a fractional cabinet in a colo provider and split the costs. I think we ended up using 4 providers over the years and peaked at a full cabinet, almost all 1U servers.

I was glad I did it and at the end I was glad to be done with it. A co-op is hard to wrangle and it's basically impossible to make sure that the workload is evenly spread, so you have to be comfortable with the fact that somebody, probably you, is going to be doing a bunch of unpaid work, even if it's only keeping track of what needs doing and herding people into doing it.

Eventually, I decided running physical hardware was more hassle than it was worth to me. Trying to solve mysteries like "Why does Google sometimes decide my email is spam?" was a multi-year effort that I never did solve, even though I knew people at Google. And I grew to dread the chance that something would break and I'd have to rush down to the colo, possibly having to return from vacation (or beg a friend to be remote hands). So eventually I shifted some of the stuff I was hosting off to service providers (yay Fastmail!) and the rest into Terraform-built slices of AWS.

I do sometimes miss the ability to fully run down a problem (e.g., by looking at mail server logs). But mostly it's a relief. I'm happy now to get my hardware kicks on things where uptime doesn't matter.

This is one of my main arguments for why we need to fill the gap of both turnkey disk arrays and data replication. Family photos, especially of kids, should not have to be on Instagram or Facebook if you have three members of the extended family with any basic technical chops at all. You should be able to self-host triply redundant copies of the family photos, complete with bandwidth aggregation.

People in my parents’ generation all have stories of some grandma’s house fire eating the family hoard of photos, including the only copies of Great Grandpa Frank as a child. We don’t want Uncle Steve losing those pictures just because his house is in the 100 year flood plain.
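Keeping "triply redundant" honest means periodically checking that every file really exists, intact, at all three sites. A minimal Python sketch, assuming each relative's copy is reachable as a local directory (for example a synced folder). It compares SHA-256 digests and only reports problems; it does not repair anything:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large videos don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def index(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def under_replicated(replicas: list[Path], required: int = 3) -> dict[str, int]:
    """Return files with fewer than `required` identical copies, and their good-copy count."""
    indexes = [index(r) for r in replicas]
    problems: dict[str, int] = {}
    for name in sorted(set().union(*indexes)):
        present = [idx[name] for idx in indexes if name in idx]
        # Treat the most common digest as the good version of this file.
        best = max(set(present), key=present.count)
        good = present.count(best)
        if good < required:
            problems[name] = good
    return problems
```

For example, `under_replicated([Path("/srv/photos"), Path("/mnt/steve/photos"), Path("/mnt/vps/photos")])` (hypothetical mount points) returns an empty dict when every photo exists identically in all three places, and otherwise lists each at-risk file with its number of good copies.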

I'd be happy with an E2EE solution that could be deployed on a cheap VPS. Unfortunately, all the photo and video hosting solutions I've come across lack E2EE. Building it out myself is a daunting prospect because it would mean putting together mobile apps for both Android and iOS that supported it as well.

It should be fairly trivial to deploy one of a number of existing FOSS photo hosting solutions to a server in your house, though. Syncthing should make replication between the local instance and one at a relative's house trivial. It recently gained support for E2EE (i.e., untrusted servers), so you could even put it on a cheap VPS as an extra backup and to allow for more reliable distribution between nodes. There's no bandwidth aggregation to be had here with current tooling, though.

What software would you use for incremental syncing of photos and videos from a phone, though? Client applications have been my greatest struggle. I use Nextcloud, but its Android app has plenty of flaws when the only goal is one big one-way rsync (no mirroring; I will delete photos from my phone!). All the other options I've found require foreground interaction or a full two-way directory sync, which requires all the data to stay on the phone too.
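The behaviour being asked for is a one-way, additive copy: upload anything new or changed, and never propagate deletions, so photos can be removed from the phone without disappearing from the server. A minimal Python sketch of that rule, assuming both sides are visible as local directories. It says nothing about how a particular phone app schedules background uploads, and comparing size plus modification time is a simplification:

```python
from __future__ import annotations

import shutil
from pathlib import Path


def push_new_files(src: Path, dst: Path) -> list[str]:
    """Copy new or changed files from src to dst. Never deletes anything in dst."""
    copied = []
    for s in src.rglob("*"):
        if not s.is_file():
            continue
        rel = s.relative_to(src)
        d = dst / rel
        st = s.stat()
        if d.exists():
            dt = d.stat()
            # Treat as unchanged if the size matches and the copy is not older.
            if dt.st_size == st.st_size and dt.st_mtime >= st.st_mtime:
                continue
        d.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(s, d)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return sorted(copied)
```

Running it repeatedly is cheap: a second run copies nothing, and a file deleted from the phone simply stops being considered while its server copy stays put.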

Add Video to that list. Family movies, etc.

Or how about accessing your personal library of data while outside of the house (like while visiting family)?

Backups and professional media work from home? Those need upload too.

Consumers need symmetrical data connections, or at least something much closer to symmetrical, than any ISP (in my area at least) has been willing to provide.

> everyone in the community knows more or less everyone else in the community

Sometimes "impersonal" is a feature, not a bug. I really don't want community sysadmins with access to logs of information about other community members. That has much more potential for abuse than a more impersonal service with a stricter expectation of privacy.

Create a good community encryption policy that protects members from each other. The kind of thing a larger org would never do because it might be legally prevented, or wouldn't want to completely exclude the possibility of future monetization opportunities.

That’s true, but consider the example of small towns: just like what you’re suggesting, there are no secrets.

It’s interesting to see what happens to social connections and expectations when we grow beyond the number of people we can meaningfully connect with.

> That’s true, but consider the example of small towns: just like what you’re suggesting, there are no secrets.

Are you suggesting that small towns where everyone knows everyone and people gossip a lot (for better or for worse) are equivalent to communities where a sysadmin knows everything everyone does on the internet?

I'd say that it's like a small town where the Sheriff knows just about everything that happens.

A lot of universities' computer science departments do something like this. They'll have a cluster of machines in a room somewhere for undergrads and grads to SSH into and use as they please. Those are usually run by IT, but grad students in ML fields will often have another set of machines with specific GPUs in their own offices that are completely student-run.

I've been into this for a long time. I think that 50-250 families can support their own sysadmin, someone who works directly for them and manages all of their tech and interactions with the internet.

True, but this would be ideological rather than economical, which is why it would never happen, unfortunately.

A sysadmin would commoditize the process and make it cheaper, faster, more convenient, transferable (when a family moves somewhere else), etc. This sysadmin's business would grow and kill the competition, and then we would be back at the current state where we've ceded computing/networking to corporations.

The client would not be the product.

I can't remember the name now, but I once picked up a flyer at FOSDEM for a place just like this based in Amsterdam.

It was very reasonably priced, you could have physical access, and it was more of an enthusiast club than a business.

Seriously considered it, but I don't live in Amsterdam and they recommended being able to speak Dutch to participate properly.

> It was very reasonably priced, you were able to have physical access and is more of an enthusiast club than a business.

Sounds like it might be one of the hackerspaces there: https://wiki.hackerspaces.org/Amsterdam

Reminds me of my time on TF2 servers when I was a kid. Everyone knew everyone on our main server, and it was a community. That's something Discord does a decent job of capturing today, but unfortunately it isn't self-hosted. Matrix is interesting, but I'm waiting for their new Go implementation (Dendrite?). Deploying Matrix feels heavier than it should be. I feel like I should be able to spin up a process, point to the port, and call it a day. Maybe the overhead comes from the authentication needed for federation, but I personally don't care about federation for my purposes.

This sounds like what you need: https://github.com/spantaleev/matrix-docker-ansible-deploy

I don't know anything about Ansible, or much about Docker or self-hosting, and I was able to set it up, and it's working quite well for my family and friends. You don't have to enable federation. Set federation_domain_whitelist to an empty list, and poof, federation is disabled.
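The empty-list trick works because of how an allowlist whose "unset" value means "everyone" behaves. A minimal Python model of the semantics the comment describes (option unset: federate with anyone; empty list: federate with no one). This is an illustration, not Synapse's implementation, and the domain names are hypothetical:

```python
from __future__ import annotations

from typing import Optional


def may_federate(remote_domain: str, whitelist: Optional[list[str]]) -> bool:
    """Model of a federation allowlist where None means 'allow every server'."""
    if whitelist is None:  # option not set: federate with everyone
        return True
    return remote_domain in whitelist  # an empty list allows nobody


if __name__ == "__main__":
    print(may_federate("matrix.org", None))  # default: open federation
    print(may_federate("matrix.org", []))    # empty list: federation off
```

The design point is that "not configured" and "configured as empty" are different states, which is why an empty list shuts federation off instead of falling back to the default.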

DNS settings are pretty easy too - especially if you can allow your instance to take control over an entire domain (and don't have to host other web services other than what the playbook supports). Don't need the SRV stuff here: https://github.com/spantaleev/matrix-docker-ansible-deploy/b...

If you just have a private server for < 100 users, 1 vCPU and 2GB RAM is enough. I also use it for bridging to IRC using heisenbridge (which the playbook supports) and it's no problem on the tiny server.

Updates are very easy, pull the latest playbook, and run setup again. Done.

> Matrix is interesting, but I'm waiting for their new Go implementation (dendrite?). The deployment for matrix feels like it's heavier than it should be. I feel like I should be able to spin up a process and point to the port and call it a day.

Sounds like you, like me, have had your brain broken by the ease of deploying Go programs :) The current Synapse server is written in Python, so it's a bit of a trial. That said, I run it on a tiny Linode instance and it Just Works after maybe an hour of fiddling around (I seem to remember something about DNS records being the fiddliest part to get right).

There are repos for Debian/Ubuntu, so it's just a matter of adding the repo and running apt install. You have to write a config for Synapse, but you'd have to do that anywhere. You can use SQLite instead of Postgres. It will go surprisingly far, and you don't have to install Postgres :).

The only time Synapse starts to get heavy is if you federate with lots of massive rooms. If you don't federate it should be absolutely fine.

Dendrite is never going to ship, and if it does, it will never really have parity with Synapse. Mark my words.

*Citation needed

But also, I am not sure that "parity with Synapse" is necessarily required or desirable. Sure, it needs to be fully functional and handle all the basic federated communications, but it's not like most small-scale self-hosters need or want all the enterprise features being packed into Synapse. I think it is better for the Matrix ecosystem to have various server implementations that target different segments but can all still communicate.

Dendrite shipped 0.4.0 2 weeks ago, and 0.4.1 ships today (with a 10x speed-up in state resolution performance). Meanwhile it passes 92% of the server-server matrix compliance test suite (sytest) and 61% of client-server.

I'd expect us to ship 1.0 once these numbers hit 100%, which at the current rate should be before end of the year.

Do you not think it's in the best interest of Element Matrix Services to ship Dendrite? Presumably it would reduce their cost of operations significantly ...

Cooperative business models were built exactly for this sort of thing. Farmers have been doing this since the dawn of time.

One fine line that the community will need to tread is that it needs to attract enough people with aligned interests to socialise the costs of paying people to do the sysadmin work (or find enough sufficiently-motivated volunteers to do so, and develop procedures allowing these people to hand over properly when they lose interest), but remain small enough to "know everyone" involved.

As a participant in a number of small, mostly-volunteer tech community groups, I think this might be a difficult endeavour.

Shared responsibility is no responsibility. The biggest challenge with community hosting is that there would be no accountability. In academia, many CS labs/universities have shared computing resources for running compute-intensive jobs. But it doesn't really pan out unless there are a few big incumbents doing most of the heavy lifting. I'd rather deal with AWS/GCP/Azure, etc. than with community-driven hosting.

This has been on my mind lately a lot with Nextdoor. I have really mixed feelings about Nextdoor which is a slightly separate issue, but it always seemed to me that something like Mastodon or maybe even SSB would be an ideal use case in the same space as Nextdoor. You could have local communities around local servers, that have some natural reason to organize about that (geography), but are still loosely federated.

I'm not sure where my thoughts are going, as I'm not exactly surprised Nextdoor has more use than a more decentralized system for this use, but it's salient to me as I'd think something like SSB or Mastodon would ideally occupy the space that Nextdoor is occupying. I'm not sure if it is highlighting the legwork that Nextdoor did to build up its userbase (physically mailing people in a community), or the lack of technical sophistication of users in general, or the relative infancy of Mastodon/SSB/etc, or something inherent about getting a foot in the door with decentralized stuff in terms of mindset, or some inherent limitations of decentralization (can you really just compel/convince people to use decentralized services? People just use them).

I'm trying to imagine, for example, local police posting to Mastodon about some local safety issue in the same way as on Nextdoor. With Nextdoor, it's something known nationally, the state probably gives them recommendations, they just post to Nextdoor. Nextdoor might have even reached out to them. With e.g., Mastodon, I suppose I could see it being recognized as a thing if use got up, but where are they posting? The local popular servers? Do they run their own police server? Some kind of city government server?

This isn't a criticism of decentralization -- I'd like to see everything more decentralized. I just think something like Nextdoor is an interesting case to me to think through these issues because Nextdoor is so localized, and it seems like that's kind of the ideal use case for decentralized services.

> Nextdoor is so localized, and it seems like that's kind of the ideal use case for decentralized services.

Ideal use cases for decentralized services are also ideal business opportunities. You want to find collective action problems, charge rent for solving them, then manipulate your users to make you even more money from whatever resource is being collectively managed.

edit: I can easily imagine an app started to organize and coordinate people who wanted to volunteer to pick up and clean public parks becoming, 10 years later, an app that was de facto required in order to visit a public park.

Nextdoor is getting huge. I have family members who are always on it.

The owners are in for a huge payday.

I don't get the allure of the site. The site seems to attract complainers. That is not my point though.

I am interested in decentralized sites, like Mastodon.

Does anyone know of a good site that would walk a developer through building a rough clone of Mastodon?

I know it uses Ruby on Rails, React, etc., but would like a detailed walkthrough.

I did a rough search, but didn't find much on the programming of a decentralized social website on the technical side.

> walk a developer through building a rough clone of Mastadon

There are already full featured clones that federate. Pleroma and Pixelfed for starters. There are also multiple alternative frontends for both Mastodon and Pleroma. Granted none of those are walkthroughs but they do provide full featured examples in multiple languages.
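For anyone who wants to build rather than read, one of the first pieces a Mastodon-compatible server needs is discovery. Mastodon resolves a handle like user@domain through WebFinger (RFC 7033) at `/.well-known/webfinger`, then typically follows the link whose `rel` is `self` and whose `type` is `application/activity+json` to the user's ActivityPub actor. A minimal Python sketch of the client side, assuming a hypothetical handle and a canned response instead of a live HTTP request:

```python
import json
from typing import Optional
from urllib.parse import urlencode

ACTIVITY_JSON = "application/activity+json"


def webfinger_url(handle: str) -> str:
    """Build the WebFinger query URL for a handle like '@alice@social.example'."""
    user, domain = handle.lstrip("@").split("@")
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def actor_url(jrd: dict) -> Optional[str]:
    """Return the ActivityPub actor link from a WebFinger (JRD) document, if any."""
    for link in jrd.get("links", []):
        if link.get("rel") == "self" and link.get("type") == ACTIVITY_JSON:
            return link.get("href")
    return None


# Canned response for the hypothetical handle; a real client would fetch it
# from webfinger_url(...) over HTTPS.
SAMPLE_JRD = json.loads("""
{
  "subject": "acct:alice@social.example",
  "links": [
    {"rel": "http://webfinger.net/rel/profile-page", "type": "text/html",
     "href": "https://social.example/@alice"},
    {"rel": "self", "type": "application/activity+json",
     "href": "https://social.example/users/alice"}
  ]
}
""")

if __name__ == "__main__":
    print(webfinger_url("@alice@social.example"))
    print(actor_url(SAMPLE_JRD))
```

From the actor document, a server learns the user's inbox and outbox URLs, which is where federation of posts actually happens.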

> The site seems to attract complainers.

Indeed. But what do you think (or suspect, or hope) would be different about a Mastodon instance playing the same role?

My hunch is that the characteristics of ND are caused by its perceived role, not the technology, hosting arrangements or (lack of) federalization/locality.

> I don't get the allure of the site. The site seems to attract complainers.

I think you get the allure of the site perfectly.

What is Nextdoor? Google showed a .com address, and the site asked for an address, but it didn’t seem to recognise any address in my country. The app isn’t in the App Store either. Is that the service you’ve mentioned?

Is it something US only? Is it a social network that starts with your address?

Yeah I'm super interested in that idea as well. I wonder if there are already some initiatives in the US doing stuff like it? Would it be as simple as setting up a server in someone's house and then splitting the cost of electricity and internet? Would a multiuser setup like that work under a personal internet line, or would most ISPs try to shut it down?

We used to do this in the late '90s (when there were fewer hosting options in general). A bunch of us at work wanted to run our own sites and experiments online, so we pooled our money and built a server that sat at one of our houses. At first we just got a static IP for the server, but eventually, as more people at work joined, the guy who had the server in his basement got a T1 line installed.

We all just split the cost of internet and server upgrades, etc, which may have come out to like $40 a year or something on average. We probably did this for a decade or so until the hardware got too old and there wasn't as much interest in maintaining it all.

While I just have a VPS now, I do miss that old server and all of us working on it, and literally being able to do whatever we wanted with it. All it takes is a few buddies to get together and try it out. Experiment and see what happens, let it grow organically.

I do this for some friends and their contacts. When I made a small presentation of what we offer, with pros and cons, I added the same point to both lists:

Pro: you know the sysadmin
Con: you know the sysadmin

So, as others have already mentioned, trusting someone you know can be a problem too, not just trusting some random company.

This sounds like an awesome idea, and for many communities it might be, but having to trust someone you know personally with things like these can also be a source of drama when your relationship goes bad for unrelated reasons.

aka federated systems like Mastodon?

The biggest most successful and best run organizations I've worked for have not worried about hosting their software. They're concerned with delivering value to customers and doing it quickly.

The most fractured and worst run companies have been concerned with self-hosting and frankly they sympathized with a lot of the stuff I'm seeing on here.

But I recognize we still need people self-hosting because it drives innovation and competition. I believe there are ways of doing things well while self-hosting, but I'm not sure what those ways are.

I don't think it's a yes-or-no, right-or-wrong kind of thing. For an independent/small developer, I think there's value in self-hosting your tools, but I think it would be crazy to self-host a customer-facing app. There's scaling and redundancy you can take advantage of that you could never come close to achieving on your own. However, for tooling like VCS and CI, I think maintaining control is extra important when you don't have any negotiating power.

For mid-sized companies where you're paying someone to maintain things, whether that's an employee or a 3rd party, I think you need to assess your tools in terms of your negotiating power and how important it is to maintain control of everything. What if GitHub bans you? I don't think I'd try to self host a customer facing app at this scale either.

I think you can be successful in the short term by throwing caution to the wind and using every shortcut available, but will the first-to-market advantage be enough to offset the price competition from people who aren't locked in to some proprietary API gateway or WAF? For example, what if I take extra time to build on OpenFaaS and you build on AWS everything? Who wins long term? You're faster, but I have better negotiating power (i.e., lower costs) by threatening to switch vendors.

Or is the idea of switching hosting vendors detached from reality at this point? Is hosting cost so negligible it doesn't matter? All I know is that everything looks crazy expensive from where I am.

> but I think it would be crazy to self host a customer facing app. There's scaling and redundancy that you can take advantage that you could never come close to hosting on your own

We see the “scaling” argument a lot, but honestly it's overrated for most businesses: for 95%+ of the businesses you can run online, the traffic can easily be handled by a single cheap machine running a PHP backend, as long as everything is set up properly (caching, a CDN for media, etc.). Modern computers are fast!
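A back-of-the-envelope check of the single-machine claim, assuming a hypothetical site with one million page views a month and a peak hour carrying ten times the average load (both numbers are made up for illustration):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def requests_per_second(monthly_requests: int, peak_factor: float = 1.0) -> float:
    """Average request rate over a month, optionally scaled up for peak traffic."""
    return monthly_requests / SECONDS_PER_MONTH * peak_factor


avg = requests_per_second(1_000_000)
peak = requests_per_second(1_000_000, peak_factor=10)
print(f"average: {avg:.2f} req/s, peak: {peak:.1f} req/s")
# prints "average: 0.39 req/s, peak: 3.9 req/s"
```

Even the peak figure is a handful of requests per second, which is modest for one server with caching in place; the multiplier would have to be far larger before a single machine became the bottleneck.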

Same for redundancy: your business is unlikely to fail because your website is down for 2h every month (heck, that's not even much worse than GitHub /s).
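The redundancy point is easy to quantify. Two hours of downtime in a 30-day month works out to roughly 99.7% availability:

```python
HOURS_PER_MONTH = 30 * 24  # 720 hours in a 30-day month


def availability(downtime_hours: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Fraction of the period during which the service was up."""
    return 1 - downtime_hours / period_hours


print(f"{availability(2):.2%}")  # prints "99.72%"
```

Going from there to "four nines" (99.99%) would mean cutting downtime to about four minutes a month, which is where redundant infrastructure starts to cost real money.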

Independent developers almost never need to care about scaling and redundancy.

> What if GitHub bans you?

This is one of those things that I was talking about them sympathizing with.

It's unrealistic. If you're getting banned from github or a cloud provider maybe you should reconsider what you're doing.

> You're faster, but I have better negotiating power (ie: less costs) by threatening to switch vendors.

Okay, but AWS isn't expensive as it is. If you build to the way these resources are meant to be built you can seriously minimize your costs.

I've seen plenty of lift-and-shift apps get their costs dramatically reduced once redesigned in a "cloud native" architecture. The lift-and-shift design cost thousands of dollars a month. Rebuilding around AWS resources brought the costs on this same app to almost literally nothing. I'm talking $5,000 a month down to $5 a month.

So I have a hard time following the premise that you'll do it cheaper with an on-prem or self-managed system.

Why is this getting downvoted?

Probably for this line:

> If you're getting banned from github or a cloud provider maybe you should reconsider what you're doing.

That’s a sentiment popular among Big Tech types (such as Eric Schmidt: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place”), and distinctly unpopular among those who consider the cloud providers’ decisionmaking arbitrary, capricious, or dangerous.

Exactly. I had to migrate a legal Canadian gun business's website from Shopify to Magento because Shopify changed their policies years ago.

There are still some topics that are taboo and that some businesses refuse to touch.

But you still went with a cloud hosted solution.

> The most fractured and worst run companies have been concerned with self-hosting and frankly they sympathized with a lot of the stuff I'm seeing on here.

Self-hosted organizations that I think of off the top of my head are Google, Facebook, Amazon, Backblaze, Microsoft, Stack Overflow, and possibly Wal-Mart & CloudFlare.

…and Dropbox.

They moved everything, including the hardware and the data, in-house, and the following year they doubled the provided storage (to 1TB) at no additional cost.

Most recently, they doubled the storage to 2TB and added some features for $20 more.

As a person who self-hosts a lot, I'd guess the advice in the article targets individuals and communities first and foremost, not commercial entities.

I agree. When my company moved from self-hosted Mattermost to Slack, it was a definite productivity boost.

Was the Mattermost instance badly managed?

Because for us it was different. We got rid of Slack and decided to just use basic Google Chat, which we already had with Google Mail. It was definitely a productivity boost and drastically decreased the number of times people messaged each other. It was helpful.

So what, in your case, did it have to do with self-hosting? Or did you mean it was just the magical effect of Slack being Slack?

There was a lot of friction and jank with Mattermost that just made it less smooth to use and more annoying, with the occasional missed notification and so on.

Slack has a lot more product thought and nice features that keep on getting added. Our competency is not managing mattermost, it's doing what the company is made to do.

If you want to reduce communication, well that is a different thing altogether and you can choose if you want chatroom software for your company or not.

> Our competency is not managing mattermost, it's doing what the company is made to do.

The thing, though, is that this argument can be used to outsource everything.

Office space, HR, recruitment, even making coffee.

At some point the externality restricts your maneuverability. In the case of office space, it's because you might not be able to just run a few power cables somewhere; for HR (imagining a scenario where it's outsourced), there are limitations on speed or, in the worst cases, on how you can even interface with the HR department.

Making coffee is one of the cheapest things you can do, outsourcing it may mean, though, that you lose choice of what beans you have, or what types of milks you want to stock, and there's a premium on the price because another company wants to make a profit supplying this service.

This is a contrived example, but overall my point is: there are times that in-sourcing your tools actually gives you the time and freedom later. It's a gamble you make.

Mattermost PM here, thanks for sharing and sorry to hear that your experience wasn't smooth when you tried the product.

We'd love to improve, are there one or two things top of mind for us to change? You mentioned occasionally missing notifications?

Your company has probably talked to our company a lot over the years. It's been a while, so I don't remember my specific complaints, but I do remember that push notifications would sometimes not be delivered, so I would miss messages. It would also go the other way. There are a lot of small, nice refinements in Slack & Discord that Mattermost just doesn't have.

One recent feature Slack added that I like is scheduled messages, but that is a pretty new thing.

Thank you! Really appreciate your feedback here.

It's depressing, because I can imagine a world where the government would access people's data, but only under extreme circumstances like investigating terrorist networks. Instead, they use it in outrageously corrupt ways, like collecting vast swathes of communications from everyone and storing them with poor limitations on access.

As Chomsky (who I don't agree with on everything by any means) has written, when the government talks about doing things for "security," that usually means security of the government from its own people.

For every article written about lock-in, hundreds aren't written because people get overwhelmed by technology.

Let’s face it, even with no explicit lock-in or proprietary features in use, changing providers is still a pain.

Especially if you're a company and you've integrated their services into yours. I was talking about a business idea with someone and they literally mentioned that once you get into a business relationship with a company you're super hard to remove and used the example of customer service outsourcing they experienced. That's not even tech related and they had serious trouble moving away from the provider.

If you’re going to half-ass the thing, outsourcing might be better. But if you intend to do it right, you have to do it yourself.

Doing it right means achieving multipliers. Using each thing you do to improve everything else you do.

How do you improve your product when you’ve outsourced customer support? How do you align it with your business philosophy?

Yeah, I’ve watched a lot of effort going in to avoiding vendor lock-in that seemed like it was basically a waste of everyone’s time.

Self-hosting is time-consuming and potentially dangerous with respect to security.

You need to know what you are doing.


Example: Dropbox is open to the world. You can share files with everyone. Can you properly secure a Nextcloud instance?

A VPN may not be an option, because you have to share files with others. Even then, you need a fair amount of knowledge about networking, protocols, security, current software, vulnerabilities, etc. Even with SSH, you need to be careful. And this is only the security part; I am not getting into a dozen other concerns.

Overall, as software complexity grows, self-hosting will be increasingly harder.

Encrypting client-side and using a managed solution is a compelling option.

> Self-hosting is time-consuming and potentially dangerous with respect to security.

When you see that large companies get hacked all the time, with your sensitive info and passwords released in the wild, it makes you think twice about "security" when your data is not in your hands. I'd say both are dangerous anyway, and certainly trusting a third party with any kind of data is a big gamble (plus, they may be spying on you as well).

It depends on what the third party is. The chances that your Google account gets hacked because of lax security practices on Google's part are probably orders of magnitude lower than the chances of your typical F500 company getting hacked because it forgot to patch its machines.

They just roll over all government requests for data, so that's a lot of APT that are neutralized.

This. I'm keenly aware of how time-consuming self-hosting is.

- A FreeBSD firewall (requires continuous patching)

- 6 DNS/NTP servers (don't ask!), most of which are in the cloud

- 2 VMware ESXi hosts

- 3 Ethernet switches (an 8-port 10GbE, 24-port 1GbE, 8-port 1GbE)

- 2 WiFi Access Points

- 12TB TrueNAS server

- 2 laptops, 1 desktop

- countless VLANs, countless VMs.

Effectively I run my own AWS. But it comes at a cost: countless evenings & weekends. Endless updates (OS, BIOS, firmware), periodic hardware failures.

Also, as pointed out, security. My unpatched DNS server was compromised, and the intruder managed to get root on my server (this was back in '99, before BIND was heavily re-vamped for security).

Self-hosting is a labor of love, but I'd be hard-pressed to recommend it to anyone who didn't enjoy it.

It is only time consuming if you let it be: I have been there too, hosting each service in a different OpenVZ jail (before containers were a thing) and doing hyper complex stuff...

Nowadays I simplify to the extreme (I refrain from running anything I do not need and always use the simplest solution) and it works pretty well for me: https://benou.fr/www/ben/14-years-of-self-hosting.html

One hack in '99 is not bad really. Looks like you're doing a great job.

Don't forget that the whole DIY thing is also incredibly educational. People tend to forget that when weighing the pros and cons.

It's not always directly teaching useful skills for work as most companies will just want you to know how to talk to AWS. But general computing and security knowledge is always useful IMO.

I like seeing people acknowledge the problems that come with self-hosting. I tried to self-host a few years back and only lasted a handful of months before going back to letting others host the services I use.

I didn't run into any specific issues, but instead I ended up realizing that I had to monitor the services myself to ensure that they were still functioning properly and that they had security patches applied. That's not a responsibility I want to deal with.

And as strange as it sounds, I also noticed that there actually were privacy advantages to not hosting stuff myself. Maintaining multiple identities when self-hosting is only possible with a separate domain per identity and without reusing the same machine for services across identities.

Wow, you really need to write a how-to book and sell it on leanpub. I would buy it!

The other side of this is that unless you're a very important individual nobody is going to blow zero days on your self-hosted server, and you're pretty unlikely to be targeted by individual human (non-automated) attention/exploitation.

I've been self hosting for over a decade with no intrusion to my knowledge, although I'm sure some state-level actor has access. On the flip side I've had many of my login credentials stolen over the years due to a wide range of companies getting hacked- haveibeenpwned currently lists 11 breaches for just one of my emails. It's probable I'll get owned eventually, but I've got some catching up to do.

I mostly agree with your post, except using a zero day on a small (especially self-hosted) server is very rarely blowing it. In fact I would bet the majority of self-hosted or small-time server operators wouldn't have the first clue how to figure out how you got in, let alone parse logs to find the exploit. Assuming they even log sufficiently, hiring a forensics expert is almost certainly out of the question financially.

I wanted to write exactly the same comment: it is a lot less likely to be targeted. The big company leaks happen often because A LOT of resources and human hours go into trying to find flaws in their security.

Not only that, but the reward is a lot smaller for the attacker and the overall damage is smaller for the community. If attackers get into Google Analytics/Tag Manager servers they will be able to find data and sensitive information about most of the websites in the world and be able to control them. If they get into your self-hosted analytics server they would only find out your stats which can't be used for much.

It's one thing to find the name and phone number of one person and another thing to find the names and phone numbers of millions of people.

You can use a self-host app like Pritunl[1] to host a private vpn server and put all the other self-host instances behind this vpn.

Hackers won't even know your self-hosted server exists. I self-host Bitwarden, and that's how I am able to sleep at night.

[1] https://github.com/pritunl/pritunl

What if your self-hosted app must be accessible on the web? (eg. a blog or analytics platform)

Would all that traffic still have to go through the VPN tunnel?

No, only traffic to the self-hosted servers whose IPs you whitelist on Pritunl goes through the VPN. The rest of your internet traffic works as usual.
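Split tunnelling like this boils down to a per-destination routing decision: addresses inside the ranges pushed as VPN routes go through the tunnel, and everything else takes the normal default route. A minimal Python sketch of that decision only; it is not how Pritunl implements it (real VPN clients install entries in the OS routing table), and the subnet and addresses below are hypothetical.

```python
import ipaddress

# Hypothetical routes pushed by the VPN server for the self-hosted machines.
VPN_ROUTES = [
    ipaddress.ip_network("10.8.0.0/24"),      # example private subnet for self-hosted apps
    ipaddress.ip_network("203.0.113.10/32"),  # example single whitelisted server (documentation range)
]

def route_for(destination: str) -> str:
    """Return which path traffic to `destination` takes in a split-tunnel setup."""
    addr = ipaddress.ip_address(destination)
    if any(addr in net for net in VPN_ROUTES):
        return "vpn"      # whitelisted self-hosted server: goes through the tunnel
    return "default"      # everything else: normal internet route

if __name__ == "__main__":
    for dest in ("10.8.0.5", "203.0.113.10", "93.184.216.34"):
        print(dest, "->", route_for(dest))
```

With this kind of setup a public blog would simply live outside the tunnelled ranges and stay reachable by everyone, while private tools stay on VPN-only addresses.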

This is my issue with self-hosting. I am so damn paranoid.

I'm not a sysadmin or a security expert.

I don't keep vital or sensitive stuff on anything I'm hosting, but it's still frightening.

> Overall, as software complexity grows, self-hosting will be increasingly harder.

Setting up self-hosting is not easy, except that it can be, as I see in the responses to this comment.

I am not sure I understand what "as software complexity grows" means. My observation is that "as software complexity grows" it eventually (and hopefully) fails, and we go back to simpler software, albeit using a few things we've learned along the way.

"As software complexity grows" is not a desirable trait. I hope that there is no need for such software, but I can't predict the future.

Most self hosted things don't need to be on the internet, the only things I have on the internet are a webserver, a game server or two, and an openvpn server.

The rest of my stuff is all local/vpn only.

This is my solution too. My server with private data is only accessible via my LAN. I'm home often enough that syncing isn't a problem. I kind of treat it like the old Palm desktop, where you had to sync regularly by USB. The nice thing is that the sync is automatic in this case. I know that kind of punctuated syncing wouldn't work for everyone, but it works for me.

My public server has a couple of ports open to the internet, but SSH, SFTP, etc., are only accessible on the LAN with access by key (no passwords). It does things like XMPP (hashed passwords, no locally-stored chat data), public websites, and the like.

Until we have self hosting as simple as app installation and without having to fiddle with security, it will be a niche thing.

Plenty of home/SMB NAS offer that. Plus there are projects like https://www.freedombox.org/.

On top of that, many hosting providers offer to set up popular open source projects for you.

Even if it's "as simple as an app installation", you still need to have a public IP address that isn't behind a NAT. How many residential ISPs offer that?

NAT isn't an issue, but CGNAT is a problem and becoming more common as IPv4 space gets more expensive.
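The difference matters because with ordinary NAT you can forward a port on your own router, while under carrier-grade NAT the public address is shared with other customers and you cannot. A rough check is the WAN address your router reports: CGNAT uses the shared range 100.64.0.0/10 reserved by RFC 6598. A minimal sketch, assuming you already have that WAN address as a string; the example addresses are illustrative.

```python
import ipaddress

# RFC 6598 shared address space, reserved for carrier-grade NAT.
CGNAT = ipaddress.ip_network("100.64.0.0/10")
# RFC 1918 private ranges; seeing one on the WAN side means another NAT is upstream.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def classify_wan_ipv4(wan_ip: str) -> str:
    """Classify the IPv4 address a home router reports on its WAN interface."""
    addr = ipaddress.IPv4Address(wan_ip)  # raises ValueError for anything that isn't IPv4
    if addr in CGNAT:
        return "cgnat"    # shared with other customers; port forwarding won't reach you
    if any(addr in net for net in RFC1918):
        return "private"  # double NAT: some other device upstream does the translation
    return "public"       # inbound connections can reach you unless the ISP filters ports

if __name__ == "__main__":
    for ip in ("100.72.3.4", "192.168.0.10", "8.8.8.8"):
        print(ip, classify_wan_ipv4(ip))
```

Comparing the router's WAN address with the address an external IP-lookup site shows you is another quick test: if they differ, something upstream is translating.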

It's not much more difficult. Many hosting companies provide installers in control panels such as cPanel that allow you to set up a Nextcloud instance within a minute.

Look at the Uniform Server, a complete WAMP stack pre-hardened for placement on a public server. Just run the installer, it is that easy.

That's the statement I wholeheartedly disagree with.

It is INCOMPARABLY more secure in a broad sense just because you control your infrastructure.

Yes, you need to know what you are doing, but that applies to everything, doesn't it? Of course, mindlessly subscribing to a bazillion services is much simpler, but it's plainly not professional.

On a side note, do you think Dropbox is any more secure than any other service, including self hosted? Or any other service?

After years of seeing how those companies are made from inside I am personally quite free from those illusions.

This is one reason I think urbit is cool - it makes self hosting way easier.

I run mine on DigitalOcean, but if you want to run it off your home network it’s basically just figuring out the VPN bit to safely get on your home network, and everything else is good to go. You can also use something like Tailscale or ZeroTier to skip the VPN part (but I know less about those things).

Hopefully in time even this will get easier with UI that guides you through the process.

> Even then, you need to have fair amount of knowledge about networking, protocols, security, current software, vulnerabilities, etc.


> Encrypting client-side and using a managed solution is a compelling option.

You need a similar amount of expert knowledge to properly configure your client-side encryption, ensure the algorithm wasn't cracked, the implementation you're using doesn't have any severe vulnerabilities, etc.

If we're in a situation where we can trust no one, not even ourselves, then we have a problem.

You can trust a Linux distribution to provide reasonably secure software out of the box, like Debian / Freedombox

Nothing you care about should have access to the open web. If your self-hosted services can be accessed by anyone with a web browser or curl, you're doing it wrong.

> VPN may not be applicable, because you have to share files with others.

You can use a self-host app like Pritunl[1] to host a private vpn server and put all the other self-host instances behind this vpn.

[1] https://github.com/pritunl/pritunl

"you need to know what you're doing" -Mr. Obvious

There are pre-packaged solutions such as the Uniform Server - a complete WAMP stack fully hardened for placement on a public server. This is an EXTREMELY COMMON PROBLEM and PEOPLE HAVE OPEN SOURCE PACKAGED SOLUTIONS.

This constant "it's too hard, waaa!" bullshit is just lies.

I love the sentiments in this blog. I don’t put them into active practice, but I like them. For example, I look at Twitter once every morning to see if there is any new tech I should look at or papers to put in my reading list; my Mastodon account languishes. I have small and free VPSs from both Google and Oracle, which I appreciate. I totally rely on the publishing platform https://leanpub.com/u/markwatson for writing and publishing the books I write.

What her blog triggered for me is that we can have a better digital life by being conscious and taking control of our assets, control over interactions with people and companies, etc.

I wouldn't call it self hosting. It is more associative hosting. Self hosting is when you do everything yourself, and that can be really cheap. It's more work and requires more competence, but you have minimal dependencies.

I had a bad experience with non-self hosting. I used Weebly for my blog because it was free and convenient. Without warning they disallowed free access. I can't modify my data and can't export it. That gives me an unpleasant feeling about Weebly and that type of free service.

I now do true self hosting as far as I can. I wouldn't even trust an association.

> Self hosting is when you do everything yourself and that can be really cheap

“Cheap”, only if you don’t value your own time.

This "only if your time is cheap" argument is fallacious.

Especially since it was originally used in the Linux desktop context.

If you have enough skill (or the willingness to learn) and initial investment of time, then the ROI on these DIY projects can be immense.

I am far more productive with a Linux desktop and self-hosted / managed "solutions" than their commercial alternatives.

For example: My media server setup far outperforms Netflix and Spotify in terms of ROI and /even/ convenience.

Similarly my Linux desktop PC is better for work and play compared to any off the shelf MacOS or Windows experience.

If you have the perseverance and initial time to invest, you end up over time saving so much time and money.

> If you have enough skill (or the willingness to learn) and initial investment of time, then the ROI on these DIY projects can be immense.

I self host a ton of stuff. Sometimes I feel like I'm wasting time that could be spent writing code, but, ultimately, I think having good sysadmin and network admin abilities makes a difference in the quality of software development.

Sometimes I see developers that barely seem to know how networks and DNS work.

And the whole argument about time spent is getting weaker. My stuff has gotten to the point where it's a bunch of Docker containers that I could auto-update if I wanted. The hardest part is picking containers that are maintained, but all the official ones are nowadays.

De-cloudification is a thing now: https://www.economist.com/business/2021/07/03/do-the-costs-o...

We're coming full circle. At work, we just installed a couple of massive 64-core Xeon machines. On prem. Like it is 2002.

> If you have enough skill (or the willingness to learn)

Building the skill requires an investment of time, which has to be compared against more productive (read: profitable) alternatives. Remember that all endeavors have opportunity costs.

> My media server setup far outperforms Netflix and Spotify

Every time I've done the math, this only comes out ahead financially if you already have a huge library or if you are willing to torrent.

Is there something I'm missing?

In the civilized world, we have 8-hour work days, some of the days of a week, and then we can do whatever we want with the rest. By which I mean, most people do not see the remaining hours as “potential money making time” but as “this is when I do something I like to do”.

> then we can do whatever we want with the rest

hmmm... Let's count: 8 hours of work = 8 hours + 1.5 hours traveling to work + 1 hour for the noon break. Then I sleep 7 hours. Then I need 1 hour to get ready in the morning. In the evening, it takes about 1.5 hours to cook (don't tell me it's my choice to spend time cooking instead of eating pre-made-full-of-sugar-and-fat food). Total = 20. So 4 hours left. But somehow, work is sometimes hard, so I need about an hour of rest. So in the end, 3 hours are left per weekday. On the weekend, I'll spend 2 hours doing groceries and 2 hours keeping the house clean and doing repairs. Unless you are alone, you'll spend time socializing, which is not exactly a choice either; you need it for your mental health. And if you do some sports, it's again because it's fun but also because, at some point, it's for your health (i.e. being able to use your non-working time in a useful way). So well, it's not like there's much left. And I don't even count the kids... (but that was a choice :-) )
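The weekday budget above adds up as follows. A minimal sketch that only reproduces the commenter's own figures; the category names are paraphrased.

```python
# Weekday hours, using the figures from the comment above.
weekday = {
    "work": 8.0,
    "commute": 1.5,
    "noon break": 1.0,
    "sleep": 7.0,
    "getting ready": 1.0,
    "cooking": 1.5,
}

committed = sum(weekday.values())  # hours already spoken for
free = 24 - committed              # what remains of the day
rest = 1.0                         # recovering from a hard day at work
left = free - rest                 # genuinely discretionary time

print(f"committed {committed:.1f} h, free {free:.1f} h, left after rest {left:.1f} h")
```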

Agreed, many people simply don’t have the time to do hosting as a hobby. Me neither - I chose a family and a music hobby. But that’s not really relevant to the GP’s argument “your time is money”, though. My point is, only my working time is money. My spare time is mine to spend on whatever I like.

Well there are degrees here, aren’t there? I might hack away on some software in my free time but there are some aspects of that I like more than others where I’d rather spend my time. Besides that, nothing about this article led me to believe it’s just about personal hobby projects.

I couldn’t think of anything worse than debugging mail delivery all evening in that time.

Me neither, I do enough such stuff in my work hours. But I’m sure some people get a kick out of getting it to work and learning all about email internals.

Or your own money. I did the math and I was spending more money on just electricity to run my home server than it would cost to pay for the services it provided. Not to mention the initial cost of the hardware you need to host it.

A raspberry pi is not sufficient for running things like nextcloud in any kind of performant way.

A box with an i5-4570 or similar and 8GB of RAM costs about $80 to buy, and uses ~25W or around $25-30 a year in power. A comparable VPS or Dedicated box is easily 10x the cost.

I think people see those ridiculous rack-mount servers some people run at home that suck down 300+ watts and assume that's just normal!
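The yearly power figure is simple arithmetic: watts times 8,760 hours, divided by 1,000, gives kWh per year, which you multiply by your tariff. A minimal sketch; the $0.12/kWh price is an assumption (the $25-30 range quoted above implies roughly $0.11-0.14/kWh), and real idle draw varies from machine to machine.

```python
# Annual electricity cost of a machine that draws a constant load.
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years

def annual_power_cost(watts: float, price_per_kwh: float) -> float:
    """Return the yearly cost of running `watts` continuously at `price_per_kwh`."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000  # W * h -> kWh
    return kwh_per_year * price_per_kwh

if __name__ == "__main__":
    PRICE = 0.12  # assumed electricity price in USD per kWh; substitute your own tariff
    for watts in (25, 75, 300):  # small desktop, box with several HDDs, rack-mount server
        print(f"{watts:>3} W -> ${annual_power_cost(watts, PRICE):.2f}/year")
```

At the assumed price, 25 W comes to about $26 a year, in line with the estimate above, while a 300 W rack server costs twelve times as much.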

I went for even lower power usage, with an i3-7100u box that uses about 2W most of the day and cost $75 plus some extra RAM.

Depends what services you need. I used to be doing a lot with my server but then it became just static web hosting and nextcloud which I replaced with the cheapest google storage plan and gitlab pages.

These days power usage might be workable with something like a Mac mini server. I did a test and my Ryzen 5 server with 3 HDDs was drawing 75 W minimum, and my area has quite expensive power, so it just didn't make sense to keep running it.

A VPS also comes with a lot of really useful advantages. You aren't tied down to the hardware. As your needs change, you can change the scale of the VPS. Right now I still have the homeserver sitting here waiting to be sold as well as some other previous machines which were not powerful enough.

A VPS is also relatively unaffected by things like power and internet outages. It just keeps working. It's more convenient when you move house, since you don't have downtime in the process. It has a dedicated fixed IP address and IPv6, with no fucking around with CGNAT or blocked ports.

Just buying a fixed IP address would cost an extra $5/month.

Once you consider every cost, a VPS can seem pretty good value in many cases.

> A box with an i5-4570 or similar and 8GB of RAM costs about $80 to buy, and uses ~25W or around $25-30 a year in power.

I'm guessing you're looking at the preowned market?

For those prices, people might consider themselves lucky to get an underpowered Celeron with BYO RAM and storage, brand new.

Yep! Not much point in buying new hardware for running basic services at home, especially since used business stuff is so cheap, it can cost 1/10th the amount for similar results of buying new.

I'm currently using a 2011 Mac mini that I got for free from work (it did not support newer Xcode and Mojave). I'm the only user and have Lychee, Jellyfin and Syncthing on it.

> on just electricity to run my home server than it would cost to pay for the services it provided. Not to mention the initial cost of the hardware you need to host it.

Most servers with enough GB of RAM and powerful processors can cost in the 50/100 USD range to rent per month. It's much cheaper to self host beyond a rock bottom VPS. Leaving a modern PC on the whole time will not cost that much in a month, and what you invest in hardware will pay for itself with the difference over time.
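Whether owned hardware "pays for itself" is a break-even calculation: the purchase price divided by the monthly difference between rent and electricity. A minimal sketch with hypothetical numbers; it deliberately leaves out admin time, replacement parts, a fixed IP and the other costs raised elsewhere in the thread.

```python
import math

def breakeven_months(hardware_cost: float, rent_per_month: float,
                     power_per_month: float) -> float:
    """Months until owned hardware becomes cheaper than renting a comparable server.

    Returns math.inf if the monthly electricity cost is at least the rent,
    i.e. buying never pays for itself on these inputs.
    """
    monthly_saving = rent_per_month - power_per_month
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving

if __name__ == "__main__":
    # Hypothetical: a $600 box vs. a $50/month rented server, about $5/month of power.
    print(f"{breakeven_months(600, 50, 5):.1f} months")
    # Hypothetical: a $5/month VPS vs. a box that costs $6/month to power never breaks even.
    print(breakeven_months(80, 5, 6))
```

The comparison flips depending on which rented tier you measure against, which is exactly what the replies below argue about.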

enough RAM for what? Without diving into the bargain bin, I get a 64 GB VPS or dedicated server for ~$50, that's quite a lot. (And I don't need it, so I pay ~11€ for a 16 GB VPS, and even that's overkill for me)

Where are you getting 64GB of ram on a dedicated server for $50/month? Even OVH and hetzner charge almost double that.

Hetzner EX42 and AX41 both start at 40.46 € (local price, so incl. 19% VAT), how is that almost $100?
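Since that Hetzner price is quoted including 19% VAT, the pre-tax price is the gross figure divided by 1.19. A minimal sketch; only the 40.46 € and 19% figures come from the comment, and no currency conversion is attempted.

```python
def net_of_vat(gross: float, vat_rate: float) -> float:
    """Strip VAT from a VAT-inclusive price (vat_rate as a fraction, e.g. 0.19)."""
    return gross / (1 + vat_rate)

if __name__ == "__main__":
    print(f"{net_of_vat(40.46, 0.19):.2f} EUR per month before VAT")
```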

If you need multiple GB of RAM, you're probably doing it wrong.

doing what wrong? There are applications that require several GB of RAM.

Do you mean gitlab? :)

Nifi, Kafka, etc...

> “Cheap”, only if you don’t value your own time.

That's a ridiculous take, because the skills you get through self-hosting are actually marketable afterwards.

It can be, but that depends entirely on what kind of career path one is interested in. Not everyone is interested in landing a SRE job.

You never know when or where the skills you picked up are going to be useful, no matter the career path or occupation.

Indeed! But for every topic one chooses to study deeper, one also has to reject some other topics, simply due to the fact that every person has limited time on earth. Thus, one needs to choose wisely. There’s nothing wrong with spending time on learning the skills needed to do self-hosting. But I don’t believe everyone has the same preferences here. Just as not everyone will learn to brew beer, make furniture, sew clothes, make pottery, build a house, etc etc.

As Chaucer would have it: “The lyf so short, the craft so longe to lerne”

And of course, everything gets changed in version n+1 in the churn-churn-churn world of web software so that the skills one picked up become dated fast unless constantly being refreshed. Not worth it for something that only might be useful if they're lucky.

I automated as much as possible with Ansible. I could upgrade my Debian system in a few hours. With Ansible I have a recovery plan ready in case of disaster. I could have used Docker containers, but I'm a bit old school. It's not much work. I do check logs every day though. It was significant work to set up, since I had to learn Ansible.

I don't self host anything, but I have the skills and experience to do so. I think I would enjoy using those skills more than the ones I use in my current job. Though my current job over-values my time by a lot.

Some properties of self-hosted infrastructure can't be had for love or money with commercial solutions. Or alternatively, are so costly that you can't justify the money for it when there's a mortgage to be paid.

Learning is valuable time.

It can also just be enjoyable and therefore not wasted time.

That said, the learned skills are only actually valuable if you can use what you learned later on in life. I've done my fair share of fiddling around with raspberry pis and kernel compiling when I was younger, but can't think of a single time in the last few years where I had to use that knowledge in my day job now that everything is containers+k8s+<some cloud hoster>. Maybe we can argue that it gave me a slight speedup when trying to grok the container execution model or something like that, but I could have gained that knowledge much more efficiently in other ways.

There are infinite things to learn. Why should I prioritize learning all the broken things that will allow me to self-host, and not, say, carpentry. Or knitting. Or the history and evolution of a non-y language. Or...

Because you enjoy that?

The original comment said nothing about enjoyment, or about enjoying spending time and learning this particular set of skills.

Here is my roadmap to the "metaverse" or the final medium if you like: The clients will be X86/Win and slowly migrate to ARM/Lin as electricity prices rise, right now only Jetson Nano is good enough, Raspberry 4 has half-float issues and the GPU is generally too weak.

On the backend you need to own the persistent data but not the real-time data, so you will distribute your database on 2x or more home hosted setups and the regional live servers (asia (AWS and GCP), central US (GCP and IONOS) and europe (here anything goes)) will connect to those.

You need 1Gb/s up+down fiber on two homes for this.

You also need a software/hardware stack that can saturate those 1Gb/s at very low wattage so you can have lead-acid backup power (make sure your apartment building has a UPS on the switch in the basement).

The real tricky part is the license you apply to all of this so that others are incentivized to fill the demand for you in the case that blows up!

I'm going to go with monthly payments in proportion to your revenues, starting at $20/month.

For end customers I'm thinking $10/year.

We're well on the path of universal, always connected, high-speed devices with cheap data. Software is also surging in sophistication and complexity. I really don't see why most applications cannot be truly self-hosted (which means your own device or hardware, not a VPS or colocated) these days, except for video.

I can only speculate that the abysmal state of self-hosted software for the general public is because there is not enough money to be made in terms of recurring subscriptions or constant inflow of data.

I have been selling self-hosted software since 2012, what can I say... Times change.

The problem with the general public (people at home) is that most of them really don't want to pay for such software anymore, and for the developer it means worrying constantly about things breaking, since the software runs on a plethora of different configs.

For my stuff, 90% of support issues are just bad permissions, badly mounted filesystems, somebody forgetting to run apt update, etc. People really think that all those issues are our responsibility, and just educating the customer is a waste of everybody's time.

What are you selling, if I may ask?

> I can only speculate that the abysmal state of self-hosted software for the general public is because there is not enough money to be made in terms of recurring subscriptions or constant inflow of data.

That's exactly what it is. Some software charges a subscription for self hosting. You maintain everything like a sysadmin and pay a huge per user per month subscription fee. It's insane.

Look at authentication systems to see how ridiculous the price discrimination / gouging has become. It costs $0.0055 per month for an AWS Cognito user or $0.00325 per month for an Azure AD External Identities user. However, as soon as you use Active Directory for employees it's several dollars per month per user. The P1 plans are $6 per user per month. What makes auth for an employee worth 184,000% more than it is for a customer?

I think big tech is absolutely scamming everyone, especially small businesses. They're taking "charge what the market will bear" to a whole new level and the only reason it's working is because anti-trust laws aren't being enforced. If we had fair competition the cost for a lot of tech would drop substantially IMO. There's a lot of room in a market with 2000x markup.
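The percentage can be checked from the quoted prices. A minimal sketch; the per-user prices are taken as stated in the comment and not re-verified against current AWS or Microsoft price lists. Against the $0.00325 External Identities price, $6 is about 1,846 times as much (roughly 184,500% more), which is where the 184,000% and ~2000x figures come from; against Cognito's $0.0055 it is about 1,091 times.

```python
from __future__ import annotations

# Per-user monthly prices as quoted in the comment (USD), not independently verified.
CUSTOMER_PRICES = {
    "AWS Cognito": 0.0055,
    "Azure AD External Identities": 0.00325,
}
EMPLOYEE_P1 = 6.00  # quoted P1 plan price per employee per month

def markup(employee_price: float, customer_price: float) -> tuple[float, float]:
    """Return (ratio, percent more) of the employee price over the customer price."""
    ratio = employee_price / customer_price
    percent_more = (ratio - 1) * 100
    return ratio, percent_more

if __name__ == "__main__":
    for name, price in CUSTOMER_PRICES.items():
        ratio, pct = markup(EMPLOYEE_P1, price)
        print(f"P1 vs {name}: {ratio:,.0f}x, {pct:,.0f}% more")
```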

because that "cheap data" is only cheap for download.

upload is aggressively throttled, filtered, sniffed, redirected, and otherwise treated as a hostile act by ISPs, to submit to the demands of the media industry and keep squeezing businesses for exorbitant rates for the same bloody service but with the filters turned off.

your average consumer ISP account where i live can't even run sshd without using complicated work arounds.

the system has de-democratized web hosting and monolithic services have rushed to fill the vacuum left by the death of the ISP hosting era.

... in the US, period. (Nota bene for old asymmetric standards used in other countries, and 5G is still asymmetrical unless there's a street-by-street deployment due to conservation of physics).

I'm in Finland and it's the same here.

A friend who was basically a freelancer and did tons of self-hosted IT shit for years has given up and is now doing contract work in fucking Photoshop, because you can't practically run your own hardware services anymore.

It's a joke, and it upsets me that it all seems to just have happened quietly under everyone's nose, and no one seems to be worried about it at all.

Not related to the discussion, but I need to say I didn’t know I could use NB (i.e., “nota bene”) in an English conversation. Thanks

Only if you spell it out. I've never seen it abbreviated in English, but I have seen it spelled out.

I only wish people knew what "e.g." meant -- "exempli gratia", or "for the sake of example". Folks on HN frequently use e.g. when they mean i.e., "id est" or "that is". When you know the Latin, it rankles every time.

However despite Latin, "data" is stuff just like "hair" in standard English. "The hair _is_ on the floor", not "The hair _are_ on the floor." And thus "the data is collected", not "the data _are_ collected". English isn't a slave to Latin, but some misuses are too egregious to be tolerated.

> I only wish people knew what "e.g." meant -- "exempli gratia", or "for the sake of example". Folks on HN frequently use e.g. when they mean i.e., "id est" or "that is". When you know the Latin, it rankles every time.

Just like you, I substitute "e.g." whenever I want to use "for example", and "i.e." for "that is".

I find the difference quite straightforward once I "get it".

Just as a side anec-note, I’ve seen “NB” far more frequently than _nota bene_ (in fact this thread may be one of the only times ever)

> Only if you spell it out. I've never seen it abbreviated in English, but I have seen it spelled out.

I've only ever seen it abbreviated. The only time I've seen it spelled out is when I looked up what it meant.

And the USA is asymmetric because housing is spread out. Same for Australia. But if services were hosted locally, one wouldn’t need the big submarine cables back to the USA.

I pay over $100/mo for 5 Mbps up-speeds. There are literally no other ISPs that will offer me anything different. So, I use a VPS. Turns out, that’s not only better bandwidth performance, it’s also cheaper than the electricity from self-hosting where I am as well.

The appeal of hosting is that somebody else takes care of infra, OS, application management for you. Until this can be meaningfully solved for self-hosted situations, self-hosting will always be at a disadvantage.

Why can it not be meaningfully solved right now, technically? Do we need AGI to solve easy to use self-hosted apps? I don't think so. I think the blockers are more economic than technical. Which is why in fact the field is heading the other way, towards the universal cloud and thin clients.

Self-hosting means that the person who does the administering has no control over the infrastructure, configuration, etc. There are millions of ways in which something could be configured. For all he knows, the user could have installed a kernel extension that panics every 2 days.

This means that the administrator must be skilled enough to handle anything that might come at him, which makes him expensive. This makes it economically hard to compete against hosting, where the administrators can be cheaper because the environments are more controlled.

One solution is for administrators to insist that the environment conform to some sort of standard. But no meaningful standard currently exists for this context.

Making the device/software resilient enough, is also very hard, and suffers the same problems as with human administrators. If you install a device in a network with a faulty router, then what is that device supposed to do? How does it even know the router is the culprit?

Why the downvotes? What exactly have I said that is controversial, untrue or misleading?

>I think the blockers are more economic than technical.

I don't see why it can't be both. I run an eCommerce website and pay $5/month to DigitalOcean for what is basically a VPS running Wordpress and Cyberpanel (free and good cPanel alternative).

The reason I'm happy to pay that $5 is because it makes a whole host of technical problems disappear. I don't have to worry about maintaining hardware or dealing with an outage if my (consumer) internet is down for maintenance. I don't have to configure my router or set up the CDN, and the bandwidth they have at these data centres is 10x what I have at home.

If these technical problems disappeared, I wouldn't need to outsource the hosting to an external provider and could save myself the hosting fee. On the other hand, if cloud hosting were significantly more expensive (or if I was just running a website as a hobby and didn't care about downtime), I'd definitely spend the time learning to self-host.

It was already the case for many years.

I have a more practical take on self hosting.

Developers should be talking to the Ops folk. It informs your architecture decisions with practical considerations, like physics, and how many NICs you can plug into a homogeneous switch before you have network hops screwing up your pretty but naive designs.

When you stop self hosting, the number and quality of those people goes away when they realize they should find someplace else to be. And when we need fewer of them, we stop making new ones.

I try to push architectures that allow for a degree of heterogeneity, where we have one data center we own, and use others we don’t for geographic redundancy and speed of light concerns.

For read-mostly systems, that may mean for instance that we keep the system of record (I’m doing just this to bootstrap a personal project that has a read-mostly information architecture) but distribute the UI out into the Someone Else’s Computers.

When you self-host, the government has to come to you for your data.

When you use third-party services, the government can go to them. The third party might not fight the request the same way you would. And, you might not even know it happened. The third party might be expressly forbidden from telling you it happened, in fact.

This was why Hillary Clinton wanted to host her personal email in her basement. A physical server that she owned, on property she owned; there was no legal way to request that data without going to her personally. If she had used the State Dept server for her personal email, Congress could have accessed all her personal emails simply by asking State to send them over.

That’s a controversial example, but the same principle is followed by many companies and organizations who have kept some portion of their data self-hosted. It’s often email or some core of file storage that they consider legally sensitive.

This is getting harder to do, though. Look at the recent revelation that the government tried to get newspaper email metadata from Proofpoint, a spam filter provider. Self-hosting a good spam/phishing filter seems almost impossible in 2021, because of the huge amounts of data needed to train filters well.

Spam filtering on your own mail server is easy. 99% of spam is generic automated email sent in bulk with lots of spoofed metadata (domain, sending address, date, etc.). I have an address on a domain that used to be hosted by a third party, and it got tons of spam. At some point I moved the domain to my own server with mailcow, and it blocked the vast majority of spam out of the box with no false positives. It uses rspamd; I'm not sure if they ship a tweaked config for it or something.

Generally I really like mailcow. It makes dealing with all the ugly parts of hosting email fairly simple.

I'm using mailinabox, very similar to mailcow. Before that, I did all the config myself.

Incoming spam is hardly a problem. SpamAssassin, rspamd and the like catch most of it; greylisting catches the rest. Once a year I see an uptick in spam and spend a few minutes diligently marking everything as spam/not spam, which the server then uses to retrain itself a little.

Spam filtering when self-hosting is hardly more work than on Gmail, Live, Proton and such.

Getting your outgoing mail past other people's spam filters, however, is an entirely different and tough problem.
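Greylisting, mentioned above, is simple enough to sketch. The snippet below is a minimal, hypothetical illustration (not mailcow's, rspamd's or mailinabox's actual implementation): it keys on the classic (client IP, envelope sender, envelope recipient) triplet, answers "defer" (a temporary 4xx in SMTP terms) to triplets it has not seen before, and accepts a retry once a minimum delay has passed. The 5-minute delay and 4-hour expiry are assumed illustrative values, not recommendations.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Greylist:
    # Assumed values: how long a new triplet must wait, and how long we remember it.
    min_delay: float = 300.0
    max_age: float = 4 * 3600.0
    _first_seen: dict = field(default_factory=dict)
    _passed: set = field(default_factory=set)

    def check(self, ip, sender, recipient, now=None):
        """Return 'accept' or 'defer' for one delivery attempt."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), recipient.lower())
        if key in self._passed:
            return "accept"
        first = self._first_seen.get(key)
        if first is None or now - first > self.max_age:
            # Unknown (or stale) triplet: remember it and ask the sender to retry later.
            self._first_seen[key] = now
            return "defer"
        if now - first >= self.min_delay:
            # A real MTA came back after the delay; many bulk-spam tools never do.
            self._passed.add(key)
            del self._first_seen[key]
            return "accept"
        # Retried too soon: keep the original timestamp and defer again.
        return "defer"
```

Legitimate MTAs queue and retry deferred mail, while many bulk senders fire once and move on, which is why this cheap trick catches a surprising amount. The cost is that first contact from a new sender arrives a few minutes late.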

In the country I live in, Brazil, the federal police have been breaking into people's homes/offices and taking away all digital devices at once: laptops, phones, thumb drives, etc.

That makes me wonder what kind of contingency I should have in place to stay minimally operational after such an event happens to me. A VPS somewhere with my work toolkit installed and files synced via Syncthing, for example? Maybe... but what if the police could get to the same VM via the confiscated devices? I don't know...

You can make authentication on the VPS strong enough: multiple factors, even IP block lists so they'd have to do it from your home.

Secondly, your local machine should encrypt itself if that's your threat model. They can take it while it's still on, but if that's actually a concern for you, you can figure out a way to trigger a lock or a shutdown if things change. If it's a stationary machine, it can be easy to notice your environment changing: maybe you can't find the MAC addresses of your switch any more, maybe the SSIDs of all 10 of your neighbors' networks are no longer visible. Perhaps lack of internet is good enough.

Phones are a lot harder because their environment changes a lot more, but you can still check things like "has my computer decided to lock itself?" In the end, if your threat model involves that kind of risk, you can set your devices up to brick themselves, or at least shut down and encrypt themselves.

Last, you'd probably want a spare device so that you can keep working: a phone and/or an old laptop with an OS already installed that you can retrieve.
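The environment check described above can be expressed as a small policy function. This is a sketch under stated assumptions: how you collect visible SSIDs, neighbour MAC addresses and connectivity is OS-specific and not shown, the `Snapshot`/`should_lock` names are hypothetical, the 50% threshold is an arbitrary example, and the actual lock/shutdown action is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    ssids: frozenset        # Wi-Fi network names currently visible
    switch_macs: frozenset  # MAC addresses seen on the local segment (e.g. switch, gateway)
    internet_up: bool


def fraction_still_visible(expected: frozenset, seen: frozenset) -> float:
    """Share of the expected identifiers that are still present."""
    if not expected:
        return 1.0  # nothing recorded, so nothing can go missing
    return len(expected & seen) / len(expected)


def should_lock(baseline: Snapshot, now: Snapshot, min_fraction: float = 0.5) -> bool:
    """Hypothetical policy: lock (and let disk encryption take over) when the
    machine no longer looks like it is where it is supposed to be."""
    if fraction_still_visible(baseline.switch_macs, now.switch_macs) < min_fraction:
        return True
    if fraction_still_visible(baseline.ssids, now.ssids) < min_fraction:
        return True
    # Losing connectivity on its own is treated as a trigger too.
    return baseline.internet_up and not now.internet_up
```

Comparing against a fraction rather than requiring every identifier tolerates a neighbour switching off their router, at the cost of being slower to notice a genuine move.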

That's an interesting opsec problem. Here is the solution that requires writing more software:

1. Find some friends or people you trust to not sell you out to the police. Ideally, these people should be in another country.

2. Place a server box on their property. This box will be a replica of your every-day home-server and devices.

3. However, in order to stop law enforcement from technically [1] finding this replica-box, you will need to use Tor. This ensures your home-server does not store the ip address or the physical location of the replica-box.

4. If your home-server is taken by law enforcement, you can buy another home-server and use memorized details (or call your friends on a burner phone) to restore a backup from the remote device [2].

[1] Please note that law enforcement can legally compel you with threats of jail time to reveal where these replica boxes are.

[2] Since you will probably be under surveillance, it's unlikely law enforcement will allow you to freely communicate on the internet with new devices and servers.

Regarding [1], do you know Brazilian law? I don't. In any case, the right to not incriminate yourself has been widely adopted, and in principle, could perhaps be invoked here, too.

> but what if the police could get to the same VM via the confiscated devices? I don't know...

This is usually what passwords are for: something you know that cannot be stolen (short of rubber hose cryptography).

Yes. For that I've been thinking of using VeraCrypt's hidden volumes. A volume inside another volume where an adversary cannot see their boundaries, which could allow some plausible deniability for passwords. I guess.

Manually rotated offline backups. Copy all your stuff to an external hard drive and stash it at your least technical friend's place. Go visit them once a week and swap the drive while you're there. You might lose up to a week's work but the bulk of your data will be safe.

If your server has full disk encryption it should be relatively safe against attacks where they just take the device, and so whatever you use to sync should be safe too?

It depends whether you want to preserve your work somewhere so that it cannot be wiped, or if you want to secure it so nobody has access.

In the first case I would set up an "append-only" system where you cannot delete anything, only append information. This could simply be an incremental backup system.
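As a rough illustration of the append-only idea, here is a toy in-memory, content-addressed store (a hypothetical class, not any particular backup tool): each snapshot records only file hashes, new content is stored once, and there is deliberately no method to overwrite or delete anything. In a real setup the append-only guarantee has to be enforced by the remote side that you don't administer; BorgBackup, for example, offers an append-only repository mode.

```python
import hashlib


class AppendOnlyStore:
    """Toy content-addressed store: data can be added, never changed or removed."""

    def __init__(self):
        self._objects = {}    # sha256 hex digest -> bytes
        self._snapshots = []  # list of {path: digest}; only ever appended to

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content is stored once; an existing object is never replaced.
        self._objects.setdefault(digest, data)
        return digest

    def snapshot(self, files: dict) -> int:
        """Record a snapshot of {path: bytes}; unchanged files add no new objects."""
        manifest = {path: self.put(data) for path, data in files.items()}
        self._snapshots.append(manifest)
        return len(self._snapshots) - 1

    def restore(self, index: int) -> dict:
        return {path: self._objects[d] for path, d in self._snapshots[index].items()}

    def object_count(self) -> int:
        return len(self._objects)
```

Because each snapshot only references hashes, older versions stay restorable even after the files are changed or deleted at the source.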

Have it managed by someone outside your country, you would just be a user.

In that case, if they grab everything, they cannot delete what you have there, and they cannot access it as administrator either.

If you want to protect against the second case, it gets much more complicated.

You need to encrypt the systems that hold the data and make it so that the encryption key is wiped from the systems if they are in a panic state. This can go as far as you want: no more Internet (the machine was disconnected), or the trigger on the door of your basement starts a countdown of a few seconds you can only stop by logging in - otherwise the system shuts down (or better, cuts the power).
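The door-trigger countdown can be modelled as a small state machine. This is a hypothetical sketch: `PanicTimer`, the grace period and the `on_expire` callback (which in practice would wipe the key from memory and cut power) are all assumptions, and the current time is passed in explicitly so the logic can be tested without real sensors.

```python
class PanicTimer:
    """Hypothetical dead-man's switch: a trigger starts a countdown that only a
    successful login before the deadline can cancel."""

    def __init__(self, grace_seconds, on_expire):
        self.grace = grace_seconds
        self.on_expire = on_expire  # e.g. wipe the in-memory key, then cut power (not shown)
        self.deadline = None
        self.fired = False

    def trigger(self, now):
        # Door sensor tripped or network vanished; an existing countdown is not extended.
        if self.deadline is None and not self.fired:
            self.deadline = now + self.grace

    def login_ok(self, now):
        # Only an authenticated login before the deadline cancels the countdown.
        if self.deadline is not None and now < self.deadline:
            self.deadline = None
            return True
        return False

    def tick(self, now):
        # Called periodically; fires the expiry action at most once.
        if self.deadline is not None and now >= self.deadline and not self.fired:
            self.fired = True
            self.deadline = None
            self.on_expire()
```

Not extending an existing countdown on repeated triggers matters: otherwise someone could keep the machine alive by repeatedly tripping the sensor.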

An extra complication is if you fear that you can be forced to provide decryption keys. In such a case you could go for dynamic keys that are provided to you by someone else outside your country, through a process that ensures that you are safe.

"you want to preserve your work somewhere so that it cannot be wiped"

This is my biggest concern. Confiscated devices are never returned to their owners.

Pick a VPS from a hosting provider outside of your home country (Brazil in this case). Use it to hold encrypted backups. Either a Syncthing instance configured as untrusted, or just btrfs send incremental snapshots and filter the stream through gpg on the way out.

I suppose your other issue is making sure that payments still get made in a timely manner if for some reason your home country freezes your assets or you get arrested. I guess it just depends on what you're up to and how paranoid you are. Personally I wouldn't worry about it too much. You could probably just deposit a portable hard drive with a friend or relative periodically if you're that concerned.

> When you use third-party services, the government can go to them. The third party might not fight the request the same way you would. And, you might not even know it happened. The third party might be expressly forbidden from telling you it happened, in fact.

I just read the LinkedIn Incident [1] from the Darknet Diaries, and it's scary how the FBI managed to get all that information about the Russian hacker.

[1] https://darknetdiaries.com/transcript/86/

I'm a bit astonished that LinkedIn's IT[0] needed the FBI to figure out that the person had a unique useragent. And that they don't have alerts for unknown IPs SSH'ing into their server.

[0] though this is before Microsoft acquiring them, so it was probably just the usual startup reckless abandon.

I recently had to hand-over a ton of data for a police investigation. The data had to come from off-site backups, I had to write manual SQL queries because of unique data requests that required cross references. All in all a lot of work that would be hard and time consuming to get if they bypassed me and accessed the raw data from my VPS provider. It would have saved me a ton of time though had they bypassed me.

Did you charge them fees? It can be possible to reasonably recover costs associated with these efforts.

No, I didn't. Not sure this is possible in the Netherlands. I had an hour long Teams call for them to know what data they could request. After the formal request came in it took a good part of the day to get everything they requested. Received some follow up requests so probably a full day "lost". If nothing else it was a good test of the backup system.

If it happens again, it may be worth asking. I don’t think this is the kind of thing agencies offer cash for out of hand.

I don’t know how law-enforcement is funded there, but presumably they do use contractors and service providers.

If their alternative is to pay a contractor who has no familiarity with your system, it would be preferable to simply pay the person who knows what they’re doing and be done with it.

It doesn't work that way. Government forces us to use their service for a fee, and forces us to provide services for free. Tax filing, and handling authorities requests are prime examples.

It does sometimes in the US.

Some agencies will pay companies to perform the digital forensic work necessary to provide data. Even administration of the forensic work, i.e. PM-level assistance, can be compensated.

This includes if a company is served a subpoena and a warrant to provide certain data.

Billing can be good, certainly valley-competitive.

IIRC, this has been well covered as a thing that happens at big tech companies, including FB, but it can also apply to small ones.

> When you self-host, the government has to come to you for your data.

Right, and your example of a literal server in a basement supports that, but if you are colocating or using a VPS they will almost definitely go to your provider first and probably won't even tell you.

Nope. If you colocate hardware which you own (which is what colocation means), then they can't just go get your hardware. Even if they break the law and nab your hardware, you'll know because it's down.

With VPSes, they can get your data and you might never know. It's an extremely important distinction.

To clarify this, the government has to go through certain procedures to seize your private property. If you own a hardware server, it is your property, even if it is sitting in someone else’s data center.

Supposedly they have to do that for safety deposit boxes too, but as recent events have shown in LA, that doesn't stop them from seizing everything including those boxes and then opening them up to take inventory. A judge objects, but it's too late. Now people are having to prove that they own whatever was in those boxes to get their stuff back, and if they can't -- everything is gone.

If you encrypt the disk, is a VPS provider going to bother going to the effort of trying to hook into the running machine via their hypervisor in a way that won't be evident to the owner of the server?

I'm not saying they can't; I just don't see that they would spend their time doing this when they can send the request to the server's owner, and then it's no longer their problem to deal with.

Unless you’re in an environment where you literally have to type or provide the decrypting key on each start, you are dealing with a situation where your provider has both the encrypted data and the encryption key.

> Unless you’re in an environment where you literally have to type or provide the decrypting key on each start

The OS may boot up, but one could have the data on a separate volume. Services won't start until that volume is mounted, which could be manual-only. Either LUKS-on-any-FS or encrypted ZFS would work.

With encrypted (Open)ZFS you can actually send encrypted bits remotely: the destination does not need the key to save the bit stream to disk, so you can have a secure cold storage copy of your data.

> There's an even more compelling reason to choose OpenZFS native encryption, though—something called "raw send." ZFS replication is ridiculously fast and efficient—frequently several orders of magnitude faster than filesystem-neutral tools like rsync—and raw send makes it possible not only to replicate encrypted datasets and zvols, but to do so without exposing the key to the remote system.

> This means that you can use ZFS replication to back up your data to an untrusted location, without concerns about your private data being read. With raw send, your data is replicated without ever being decrypted—and without the backup target ever being able to decrypt it at all. This means you can replicate your offsite backups to a friend's house or at a commercial service like rsync.net or zfs.rent without compromising your privacy, even if the service (or friend) is itself compromised.

* https://arstechnica.com/gadgets/2021/06/a-quick-start-guide-...

Nobody is arguing that it's not possible. We're just saying it's a huge hassle and that even being willing to go through the hassle on every boot is itself a red flag.

It's not a huge hassle, it's a mild hassle. I'm no ZFS expert, but LUKS is trivial.

How many times do your systems reboot?

But typing in the key at boot / mount time is the only setup when disk encryption makes any sense at all.

Full disk encryption with the key stored in a TPM or something makes sense as a way to enable a quick secure erase. If you clear the key from the TPM, the storage is useless; or if the storage gets removed for decommissioning, it's going to be hard to match it back up to the TPM, even if the TPM isn't cleared.

Dumping VM memory contents is pretty trivial.

AMD's SEV and Intel's SGX should protect from this. Of course, you still have to take the VPS provider's word that they've enabled them on their CPUs.

...which is approximately zero VPS providers. I haven't seen them advertised outside of specialty azure/aws instance types.

> you still have to take the VPS provider's word that they've enabled them

No, you don't. Both of those implementations provide hardware attestation via vendor keys securely embedded in the CPU. I have no idea if any providers currently make such features available though.

That is for applications specifically written to compute on the secure element, no?

The parent poster probably got his terminology confused. AFAIK SGX runs on the secure element, SEV is for isolating the VM from the host.

> Self-hosting a good spam/phishing filter seems almost impossible in 2021

No, it's very easy to filter spam locally. You don't need huge amounts of data, just your regular email, which makes the filter much better tuned to your data.

I've run my own email infrastructure for a long time, and filtering spam is a non-issue.

I self-host my personal mail server with stock Debian / Exim / SpamAssassin, without any tweaking, on a tiny A20 Olimex server. Spam filtering works better than that of the professional posteo.de service, which I also use for a club.
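To illustrate why your own mail is enough training data, here is a minimal multinomial naive Bayes classifier, the textbook technique behind "Bayesian" spam filtering. It is a hypothetical sketch, similar in spirit to the Bayes components of SpamAssassin or rspamd but not their actual algorithm; the tokenizer regex and add-one smoothing are simplifying assumptions.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9$']+")


def tokens(text):
    return TOKEN.findall(text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes trained only on your own ham and spam."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        toks = tokens(text)
        vocab = set(self.counts["spam"]) | set(self.counts["ham"]) | set(toks)
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            # Laplace (add-one) smoothing so an unseen word doesn't zero out a class.
            denom = sum(self.counts[label].values()) + len(vocab)
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in toks:
                score += math.log((self.counts[label][tok] + 1) / denom)
            log_score[label] = score
        # P(spam | text) = 1 / (1 + exp(log P(ham) - log P(spam))), clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Marking messages as spam/not spam, as described earlier in the thread, corresponds to calling `train` with the corrected label; the word statistics then reflect your own correspondence rather than anyone else's.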

How is your email deliverability though? My main issue was having my mail sent to spam even if my IP was clean. I resigned and moved to O365 and haven’t had issues. But I hate that I had to do that.

Not OP, but I have had deliverability problems with only one provider, and that is outlook.com. They seem to not care at all whether you have set up everything correctly (I pass all checks for reverse DNS, SPF, DKIM, etc., and I am not on any blacklists) but just have their own shitty whitelist of senders and throw everything else in spam. I had to throw in the towel and send through an SMTP proxy hosted by my VPS provider which solved all issues.

Please try to avoid using O365 as they literally are the main culprits that make self-hosting email a pain in the butt.

Wanted to say exactly the same thing.

I've set up everything according to best practices (SPF, DKIM, TLS, static IP for almost a year, reverse DNS, blacklist removal, spam checks).
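Of the checks listed above, reverse DNS is the easiest to verify yourself. The sketch below performs a forward-confirmed reverse DNS (FCrDNS) check: look up the PTR name for the sending IP, then confirm that the name resolves back to the same IP. It uses the standard-library resolver by default (`gethostbyname_ex` only returns IPv4 addresses); the injectable `reverse`/`forward` parameters are an assumption added so it can be tested offline. Passing it is necessary but not sufficient, since some receivers apply their own opaque reputation rules on top.

```python
import socket


def forward_confirmed_rdns(ip, reverse=None, forward=None):
    """Return the PTR hostname for `ip` if it resolves back to `ip`, else None."""
    if reverse is None:
        reverse = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward is None:
        forward = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse(ip)     # PTR lookup: IP -> hostname
        addrs = forward(host)  # A lookup: hostname -> list of IPv4 addresses
    except OSError:            # socket.herror and socket.gaierror subclass OSError
        return None
    return host if ip in addrs else None
```

A mismatch (PTR pointing at a name whose A record lists a different address) is one of the classic reasons a self-hosted server's mail gets marked as spam.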

I've also repeatedly contacted Microsoft support to get unblocked. All my requests to whitelist the IP in the last year or so have been ignored.

Microsoft is the sole bad actor I've encountered in more than a decade of self-hosting email.

On principle, I've decided not to use a different provider, and users on Microsoft services will not get emails from me or from my websites.

This will only change if enough people complain. As a paying O365 customer, I'd encourage you to open support tickets saying that you're not receiving emails from some of the smaller email servers, e.g. those hosted on DigitalOcean.

I've had Outlook suddenly start sending mail from family members' Gmail addresses to spam on multiple occasions. I truly don't understand what's going on there.

Same experience. I run a small ISP. We only take paid clients, so no one is a spammer (other than via lost passwords). No delivery issues elsewhere, yet Microsoft filters absolutely everything into spam... Emails to support just get a "your request has been denied as not applicable" or some such junk.

No problem at all. I do not host from home, because the IPs of residential cable providers are blacklisted in spam lists; I host from a colocation in a small data center.

Not the OP but I have a similar environment, and do not know of any deliverability problems. Early on I found mails to one or two providers, like Yahoo and some Canadian ISP were bouncing, but I got a new IP and those troubles went away.

> When you self-host, the government has to come to you for your data.

And better, if you catch wind they're after you, you can format your HD to zeroes, or (if you don't want even the physical drive around) throw it in a fire or something :).

Friendly reminder that if law enforcement asks you for data, you can fight it in court, but they can require you to preserve the data while you fight. Deleting data under such protection could end with you facing an obstruction of justice charge.

Better yet is for the government to not even know the data existed, non? (Of course, you'd better make damn sure that they had no way of knowing before.)

And when you do this, they go ahead and convict you for that instead, which is often much easier than whatever they were trying to get you for in the first place.

The person in the example got to use any criteria she wanted to distinguish personal from work email (which seemed to be sorting on keywords and phrases), do all this privately before turning the work emails over, and IIRC charge the government for the time it took. If she had co-located, I bet they could have carted that server away, and her person would have to do the same process in some office with officials in and out of the room and over their shoulder.

I agree keeping all or some portion of data self-hosted should be an important aspect of data storage for everyone, but the same does not hold true for email. You see, the problem with emails is that unless you are sending emails just within your organization and controlling where it lands (landing server), you cannot guarantee where it lands.

Email is communication with other people. If you are sending an email to a person using Gmail, your basement server gives you no protection over your email data as such. The government can easily request the email data from the recipient's Google account.

> if you are sending an email to a person using Gmail your basement server for email gives you no protection over your email data as such. Govt. can easily request email data from Google of the recipient's account.

It does give you protection in that they then need to know the recipients' emails and obtain multiple warrants to gather them if they are spread over multiple providers, which may or may not go through. Sure, it's easier considering that most people use a few US providers, but that's not always the case (even less so for government matters, which include foreign countries, and thus foreign providers too).

Another great reason for self hosting if you or an organization you work for will ever be at odds with a governmental power structure:


The most notable example of self hosting going right is CNN: they self-hosted their emails and were therefore able to fight the court order until it was narrowed and there was a change of leadership in the White House & DoJ.

If you aren't going to self host, write it into the contract that you must be informed (Google pushed back on the court order because it would have violated the contract with NYT).

Instances of data seizure that went unimpeded: Phone records (both work and personal) for all orgs. Emails for Politico, buzzfeed, the Times, a congressional staffer, and more. iCloud metadata for at least a dozen individuals associated with the House Intelligence Committee, and more.

Mail-in-a-box [0] has a very good mail filter. Junk mail is about at Gmail levels for me, with almost zero false positives and almost zero false negatives. Some of my accounts are fairly high volume and I have found its performance to be very acceptable.

The fact that I can host as many domains and accounts as I want with all kinds of filters and rules and forward them all to my main account as needed with rules is just gravy.

[0] https://mailinabox.email/

Hillary Clinton was being investigated for using her personal basement server to handle official emails containing classified or confidential information. Something which, if I had done it when I had a security clearance, I would have been not only fired on the spot, but escorted off the facility in handcuffs for doing.

Anyone can put classified information into your email account by forwarding the right news story or Wikipedia page to you in an email. There is a lot of classified information that is also publicly known. Federal law enforcement understands this and takes it into account when deciding to prosecute.

Note that Hillary Clinton was not prosecuted despite the subsequent administration basically running on a promise to do so.

Official business with classified information is never done via email, even if everyone is using the government email servers. There are separate networks, devices, and protocols for storing and operating with classified information.

Also, as has been said repeatedly, the US government doesn’t have a binary “classified” or “not classified”. There are many different levels and administrations introduce/adapt them as necessary[1], and there is a practice of retroactive classification.

[1] https://en.wikipedia.org/wiki/Classified_information_in_the_...

Though with TPM/full drive encryption you can have a box you own hosted by a third party but that third party cannot "open".

What was the most popular TPM chip for many years had a broken RSA key generator. It produced private keys that could be cracked with $76 worth of computing power:



It is really hard to see this as anything other than a bugdoor.

My laptop has this TPM chip. I am really glad I never used it, and even went so far as to disable support for it when I built my coreboot image.

Products sold with the buzzword "trusted" are a magnet for this sort of garbage. They've painted a "please bugdoor me" target on their back. The only thing you can hope to trust is general-purpose computing devices, with a large market, that obey their owner. Unfortunately it is increasingly difficult to find those.

As long as it's running, anyone can exfiltrate your key material from SDRAM. And even for a few minutes after it stops running, if they can chill the memory with LN2 quickly enough. There are kludgy schemes to make this harder, like Schneier's Boojum, but in the end your attacker just needs enough resources and patience.

Most FDE schemes don't run crypto ops on the TPM itself - key derivation occurs there, then the results are cached in RAM ( or sometimes, protected CPU registers, in which case they may be able to inject privileged code into the kernel address space? ).

LUKS on a colo will probably protect you if you're a fentanyl distributor or movie pirate. Probably not if you're a terrorist or a high-value nation-state target.

> When you self-host, the government has to come to you for your data.

Sure, but my ability to stop them is probably substantially smaller than, say, Amazon’s legal department’s capabilities.

What incentive does Amazon have to fight against the government on your behalf?

Years ago I left an open proxy on an OVH server by mistake, for 4 days. People found it and used it for fraud.

A few months later, all my personal Gmail accounts were seized, and I received an email (which I could read after changing my password) from a police department in some god-knows-where, middle-of-nowhere countryside, asking me for data on the proxy usage.

Sadly I had already cancelled the server subscription since I didn't need it anymore (and probably hadn't kept any logs anyway, since I was just playing around with a server), but I really, really wanted to help.

I mean, it's rare that the police would contact you over legitimate usage or for political suppression. They contact you over fraud that caused damage, and it's awful being responsible in a small part but unable to help... I wasn't mad they read all my emails; I was sorry someone lost money because of my mistake.

> left open a proxy .. People found it and used it for fraud

Maybe I haven't had enough coffee, but I'm failing to connect how leaving a proxy open was a major enabler for fraud. What kind of fraud?

The trust of their customers?

Afaik the US Government is a big Amazon customer.

I would imagine that particular customer would rather Amazon not quietly honor, say, a Russian subpoena for their data.

Does that mean you have to put your data into Yandex or Alibaba Cloud if you want to avoid the USG quietly getting it?

The problem for an Amazon hosted server is US subpoenas, not Russian or European or whatever...

Amazon has AWS regions in six continents.

Ha ha ha ha ha...

Amazon? Trust? People trust Amazon to exist and to bill. Providing services to those who pay the bills is almost incidental.

Any company's legal department is like HR: its role is to protect the company, not the employees and certainly not the customers.

Even more so for non-paying users, as in gmail or facebook.

Especially when the companies are already happily selling account metadata.

Getting a reputation for handing customer data over to the government without a fight seems like the sort of thing that would damage a hosting company.

>It didn't affect Experian.

You, as a consumer, don't really get to choose Experian or not.

>It didn't affect Yahoo.

Who says it didn't?

>It didn't affect Sony.

So a bunch of internal business documents got leaked. As a consumer I couldn't care less.

>It didn't affect AT&T.

If every provider was mandated to do this, then I wouldn't call it "poor data security reputation".

It’s probably better than you think. You’ll need a competent lawyer but beyond that you’ll depend on the court system, which attempts to put you and the government on equal footing.

Depending on the legal issue at stake, it might also be possible to access additional legal expertise pro bono, or through an organization like the ACLU.

Amazon probably won’t even try.

They’ve clearly and openly committed to trying for years. https://www.computerworld.com/article/2705826/amazon-web-ser...

Even Twitter doesn’t like to roll over, and they’ve got a lot less at stake. https://www.latimes.com/politics/story/2021-05-17/twitter-fi...

But your ability to delete the data is substantially higher than your ability to get Amazon to delete it.

If you are hosted on AWS, it is really easy to delete your data.

Also, you can encrypt it with keys that they will NOT use to decrypt.

The data will also NOT leave the region (or country) that you specify.

What guarantee do you have that Amazon will delete it when you tell them to, though? It doesn't even necessarily come down to whether you trust Amazon ethically and legally, but also whether you trust their internal processes.

Shredding the data on your own hard drive gives you a pretty good guarantee. Drilling a big gaping hole through it afterwards gives you an even better one.

Is the "NOT" due to process, or technical constraints? Because it's very easy to make an exception to normal process if the right people are asking.

Quite the opposite.

Ability and willingness are two different things.

> When you self-host, the government has to come to you for your data.

Yes, and rather than sending a letter to the hosting company, they can come to your house and confiscate all electronic equipment. (That's not a joke, btw: when local LE comes to your house, you can lose anything electronic, from laptop/server down to backup drives and iPod, possibly taking years to recover.) To me that doesn't sound like a good tradeoff.

> This was why Hillary Clinton wanted to host her personal email in her basement.

[citation needed]
