Self-Hosting Dozens of Web Applications and Services on a Single Server (cprimozic.net)
511 points by mattrighetti on Dec 31, 2021 | 280 comments



Easy setup with:

- traefik (reverse proxy with automatic Let's Encrypt certificates)

- portainer (docker container management)

- fail2ban (basic security)

- logwatch (server / security stats by mail)

- munin (server stats)

- restic (cloud backup)

- unattended-upgrades (auto install security updates)

- apticron (weekly info)

- n8n (automation, e.g. for quick notifications via Telegram if something stops working)

Run every app you want in its own container.


This is _very_ close to my exact setup. Though I've never used Portainer (I just manage one big docker-compose file). Is it worth it?


If you are comfortable with the CLI you won't need it at all. It overcomplicates things, in my opinion.


I also put everything in docker containers, and docker-compose for each thing is a must. It usually pains me to see when popular projects have no decent docker-compose files available and I have to make them.

For backups restic is a blessing. And so is Syncthing for getting things to the backup machine quickly (from mobile and non-unix machines).


> I also put everything in docker containers, and docker-compose for each thing is a must. It usually pains me to see when popular projects have no decent docker-compose files available and I have to make them.

On the bright side, it's a one-time cost and generally pretty simple, especially if they provide a sample `docker run` command. (Although yes, it would of course be better yet if projects would publish one so that it was easier)


Portainer does make certain tasks much easier. I use Portainer to clean up any Docker volumes, containers, and images I no longer need. Doing that for a lot of Docker resources using the command line gets old really fast.

I say this as someone who uses docker and docker-compose extensively and I'm very comfortable with a CLI.


Can't you:

  docker system prune
with perhaps a filtering flag attached?
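A minimal sketch of what such a filtered prune could look like, written in Python only so the command construction is easy to test. It assumes the `until` filter of `docker system prune`; the 168h cutoff and the helper name `prune_command` are arbitrary choices for illustration, not anything from the thread.

```python
import shlex


def prune_command(older_than="168h", include_unused_images=False):
    """Build a `docker system prune` argv limited to resources older than a cutoff."""
    argv = ["docker", "system", "prune", "--force",
            "--filter", f"until={older_than}"]
    if include_unused_images:
        # --all also removes unused (not just dangling) images.
        argv.append("--all")
    return argv


# Print the command instead of running it, so nothing is deleted by accident.
print(shlex.join(prune_command()))
```

The list can be handed to `subprocess.run` once the cutoff looks right. Volumes are deliberately left out of this sketch.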


How do you deal with backing up the data/“disaster” recovery? (I put disaster in quotes because we’re talking about home servers, so not mission-critical data, but still pretty annoying to lose.)


If you have OP's level of control over the hardware (i.e., you have a dedicated server on which you can access the hypervisor), then taking incremental backups of the entire VM is the best way to ensure you can hit the big undo button if anything goes wrong.

The most important, and neglected, part of backup & restore, is the restore. If resources permit, this is where I like to use a battle tested solution instead of rolling my own with scripts. For my self hosted servers I use Veeam, but there are many good alternatives.

It's nice having the option to restore the entire VM to a specific state, and to also be able to mount backups and pull files if only a few need to be restored. It's also handy to be able to spin up a backup into its own virtual machine.


> (...) taking incremental backups of the entire VM is the best way to ensure you can hit the big undo button if anything goes wrong.

What if your server is already compromised? Doesn't that approach ensure your server remains compromised after you pressed the big undo button?


Potentially, depends when it happened, which point you restore to, and if the vulnerability that got them in in the first place is still present. If one of my servers got hacked I personally wouldn't risk it, and would just nuke it and rebuild. If everything's in Docker containers then the services can be torn down and spun up easily, and databases can be exported from a backup VM and imported into the new server.


Not OP, but I've been tinkering with the idea of having a Raspberry Pi with an external HDD stashed at a friend's or my parents' place and doing something like an rsync over WireGuard for super important stuff.


I'm not doing it remotely, but have an Odroid HC2 set up as a sort of single-drive personal NAS on my home network. Restic backing up to there is absolutely stellar, including being able to mount a snapshot for selective recovery if needed.

The HC2 is discontinued now, but the HC4 looks like a nice, current option. https://www.hardkernel.com/shop/odroid-hc4-oled/ Maybe I should get one of these to take to my daughter's place.
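For readers who have not used restic, the backup-then-mount workflow described above might be sketched as follows. This is an illustration under assumptions: the SFTP repository address, the host name `nas.local`, and the paths are hypothetical, and the commands are only built and printed, not executed.

```python
def restic_commands(repo, source_dir, mountpoint):
    """Return argv lists for a backup run and for mounting the snapshots."""
    return {
        # Create a new snapshot of source_dir in the repository.
        "backup": ["restic", "-r", repo, "backup", source_dir],
        # Expose all snapshots as a read-only FUSE filesystem for selective recovery.
        "mount": ["restic", "-r", repo, "mount", mountpoint],
    }


# Hypothetical repository on a home NAS reachable over SFTP.
commands = restic_commands("sftp:backup@nas.local:/srv/restic", "/home", "/mnt/restic")
for name, argv in commands.items():
    print(name, "->", " ".join(argv))
```

Mounting relies on FUSE being available on the machine doing the restore.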


easy? Is this sarcasm?


I think it's just the typical evaluation of someone with expertise in doing something. It's easy if you're a real devops type: you just put together a bunch of things, write some config files, write a Makefile or two, and take two or three hours to do something that takes the rest of us a week. In the same way, I might set up a service to scrape XML files and fill up my Elasticsearch instance, and take a couple of hours to set up a working service that I can keep expanding, and other people might say: easy, is that a joke?


"Easy" is probably because it boils down to, install X, Y, Z, edit config for X, Y, Z, start daemons X, Y, Z. There's no complex math or thinking involved, just plugging stuff in.


Rebuilding a transmission is easy in the same way, undo some bolts, replace some parts, press in some seals, and screw some bolts back in!


It takes practice. I just started making smoothies in a blender, and there are a bunch of little things to know to make it a little easier on yourself. It’s not just “throw all your shit in a blender and push button”, at least not with the cheap blender I have access to.


sure but you have to spend lots of time reading documentation and know stuff to figure out you need to put these things together.


But once you have, it feels "easy" (and might even look easy to someone who doesn't know the work that was put into learning it)


which is exactly my point? He calls it easy because he can do that in a couple hours, and if problems arise later he can always just fix them with a quick 10-20 minutes. It's easy.

On a side note this is actually a problem, in my experience, if I find something easy I might not cover every edge case - I can cover those if and when they come up - but given enough potential inputs and users the edge cases always come up. And then what happens if I am unavailable at that moment, the edge cases need to be handled by people who will find the easy solution I built up incredibly hard and daunting.


As someone running something similar, I thought it was quite easy when I first set it up: I used a friend's similar setup as a baseline to get through the configuration. It took about 1 hour to set up the base things and have the infrastructure running.

It sounds more complicated than it is.


I always hear about the easy setups, but never about total (man-hours included) cost of ownership through a couple release cycles on each component.


I have been running my own personal servers in a similar setup for the last 10 years. Have turned on automatic updates, including automatic reboot, and everything runs in docker (using docker-compose).

I cannot remember a single time something bad or unexpected happened. Only the planned things: upgrading the distro every couple of years, and updating major versions of the things running in containers probably once every year or two. And maybe sometimes some unplanned updates if a particularly bad vulnerability gets disclosed in a popular piece of software or library. I am pretty sure I don't spend more than a few days per year managing it.

If I had opted for a cloud vendor managed alternative, it would have been so much more expensive. I have definitely saved thousands or tens of thousands over the last 10 years.

But then again, I know how to manage it and I planned it out so it would not cause too much trouble for me. Prior to this setup I endured many painful moments and that "wasted time" allowed me to think of a better way to manage it and avoid certain problems along the way. Also available tooling has improved a lot.

Then again - this is for my personal projects and I would do it somewhat differently for large projects.


> I always hear about the easy setups, but never about total (man-hours included) cost of ownership through a couple release cycles on each component.

I run about half a dozen web apps on a single node on Hetzner with Docker swarm mode + traefik ingress + whatever the web apps need.

Any app I have is deployed in seconds as a docker stack. I treat my Docker swarm node as cattle, and I have an Ansible script to be used in case of emergencies that deploys everything from scratch. The Ansible script takes, from start to finish, only a couple of minutes to get everything up and running. I can do this with zero downtime as I have an elastic IP I can point at any node at will.

If I wanted, I could optimize everything even further, but it's already quite fast. In fact, I can get a new deployment on my Hetzner setup up and running faster than I can get an EC2 instance available in AWS.

Proponents of big cloud providers as the only viable option typically have absolutely no idea what they are talking about regarding availability, redundancy, and disaster recovery. It's mostly resume-driven development seasoned with a dash of "you don't get fired for picking IBM".


Easy in comparison to building everything yourself and configuring every little service. That requires understanding the services and how they work, and it takes a lot of time.

After 20 years of doing this, my posted example is stupid simple, and it works for 80% of all requirements.

For the remaining 20% you need a good admin, or you book a service with a good admin.


It’s super easy. Like literally would take someone who’s worked in infra an evening to set this all up and then a Sunday morning to have it automated in Ansible.

It’s a single server running a few containers with config files. The complexity comes when you outgrow a single machine or need stronger availability guarantees but none of that matters for a single-ish user setup.


Not easy to set up. But perhaps easy to maintain


If you compare it to an out of the box service from cloud vendor, then yes, it does take a bit more time, but not that much more time.

For long-running projects it is certainly worth investing in a proper setup. It really is a lot cheaper in the long run.


Wow so easy, only 9 different services. Then there's the underlying OS, managing hardware & network setup. Also need to make sure your network provider actually allows you to host (commercially?) in your own home. And ensuring that you have a static ip address for your server.

So easy :')


> need to make sure your network provider actually allows you to host (commercially?) in your own home

If you're hosting something commercially, you should get a commercial ISP plan. If you can get it at home, why would the provider not allow you to host your services that way?

That said, why would you do that? It would be very hard to scale this operation, so unless you're planning to be a tiny (and likely lossy) operation forever, get yourself a cheap VPS to start with, then migrate as needed.

This post is about self-hosting services for yourself, and perhaps for a few close relatives and friends. Many of us do that (have a look at r/selfhosted and check out the Self Hosted podcast), and OP's setup is one of the simplest around.

> ensuring that you have a static ip address

There are many ways to access your services without one. A mesh network, like Tailscale, ZeroTier, Nebula, is my favourite, but a regular VPN also works, and so does dynamic DNS.


Yes it is. Install e.g. ubuntu, install the services, done. Every other mentioned problem is a problem of your provider, not of your setup.


I think I’ve had the same IP address from my cable company for more than a decade now. (Regular old cable modem user with a dynamic, but unchanging, IP address.)


Have you ever looked at nginx proxy manager? I find it much easier to use than traefik since everything can be done through a UI.

https://nginxproxymanager.com/


Yeah but once you figure out Traefik, it’s just 3 extra lines in your deployment files for every new service. And I inevitably have to redeploy again, and I hate doing the same boring thing twice, so it’s nice being able to bake complete orchestration into a repo.

(And it’s also nice for being able to try things because your repo has a complete full snapshot of your setup that you can diff.)
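To make the "3 extra lines" concrete, here is a sketch of the labels a service typically needs, assuming Traefik v2 with the Docker provider. The service name, hostname, and the resolver name `letsencrypt` are hypothetical; the resolver in particular has to match whatever is defined in your own Traefik configuration.

```python
def traefik_labels(service, hostname, cert_resolver="letsencrypt"):
    """Return the Docker labels that route a hostname to a service over TLS."""
    router = f"traefik.http.routers.{service}"
    return [
        # Needed when Traefik is configured not to expose containers by default.
        "traefik.enable=true",
        # Route requests for this hostname to the container.
        f"{router}.rule=Host(`{hostname}`)",
        # Ask the named certificate resolver for a certificate.
        f"{router}.tls.certresolver={cert_resolver}",
    ]


# Print in the list form used under `labels:` in a compose file.
for label in traefik_labels("wiki", "wiki.example.com"):
    print(f"- {label}")
```

Because the labels live next to the service definition, they end up in the same repo and the same diff as everything else.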


The thing I like about nginxproxymanager is that it's easy to add non-Docker hosts. There are some services that I route that I don't have in the same Docker cluster as everything else. With Traefik, that requires changes to its static files somewhere.


A few months ago, I made an offer of $100 in one of the freelancing websites, for someone to set-up something like your configuration on one of my Digital Ocean instances. I asked for a few more apps to be installed (git, svn, etc). There were no takers :-)

I think a web site/service which lets you choose "apps" and spawns a VPS instance would be very useful and profitable (Think "ninite for VPS"). I started to work on this but never had the time to continue. With an Ansible/Chef/Puppet recipe, this should be relatively easy to do.


I believe there were no takers because 100$ for such a setup is completely out of market.

I wouldn't be surprised if a freelancer would charge 100$ per hour to do this kind of work spanning multiple work days.


Seems like https://sandstorm.io/ could be what you are looking for.


Probably not. Sandstorm is abandoned, most of the packaged software is broken and unmaintained, the GitLab version is from 2016 and full of vulnerabilities, and the same goes for the WordPress version from 2018. The project is dead. I think the guy behind the project was hired by Cloudflare a few years ago.


I know that there was some work on reviving sandstorm after Kenton Varda joined Cloudflare, see e.g. https://sandstorm.io/news/2020-02-03-reviving-sandstorm, but it is very possible it never got anywhere. Sad but understandable.


A lot of VPS providers have a catalog of apps to install for you and dump you in a web management console. Sometimes it's handy but usually the security is awful. Back in the day, Webmin was the go-to way to do that and configure your server in general from a web interface


Although not exactly user-friendly, I created my first proper bash script the other day for setting up a Postfix server on a VPS. You have to create the vanilla VPS first, but then the script does all of the apt install stuff and uses awk/sed/whatever to modify config files.

The nice thing is that it is mostly what you would do manually anyway and the commands are unlikely to change much/often since they will install latest versions of postfix etc. when you run the script.

I think this might be more doable since the scripts are easy to create and maintain so perhaps just a site with bash scripts for e.g. "Postfix + Sendmail" or "PHP7 + nginx"


Whether or not you classify it as “easy” isn’t relevant. You hit the nail on the head in the prior sentence: “time” is the issue, and you’re asking someone to trade their time for a pittance.

FWIW I’d charge way more than $100/hr for this work.


I think they mean that if you already have the setup as a provisioning script all you would need to do is to modify it a little, run it and get cash in return.


Something like CapRover wouldn't work for you? Although it's not very up to date. And the one-click collection of apps, is a bit outdated too. You'd need to reference your custom docker images.


Doesn’t DO itself have “apps” ?

Also see Cloudron. Not cheap but I’ve heard that people are very happy with their service. Basically you self host their “app as a service” platform so to speak.

Kind of like a super polished Sandstorm, but with totally different sandboxing technologies (I believe Cloudron uses Docker, but I'm not sure if they still do, and I believe Sandstorm used Kenton Varda’s Cap’n Proto technology, which probably allowed for even greater sandboxing/protection than Docker, I would have to imagine).


I used to be a Sandstorm user but yes it was abandoned. Have been a Cloudron guy since the start and have been very happy with it.


This is what we’re building over at https://KubeSail.com :)


Take a look at cloudron.io. It's not open source but sadly sandstorm.io apps have gotten out of date.


year 2000 called, they are asking for cpanel back


You can choose apps and spawn a VPS on DO already.


But can you install multiple apps on one instance?


Love this, I have a similar setup but had never heard of fail2ban or logwatch. Looking forward to checking these out


CrowdSec may interest you.

https://crowdsec.net/

(No affiliation)


Just to add one single point from my side:

• Backup does not go to some cloud service that I cannot control; it goes to a small server at home with a NAS attached, in the basement.

• During development I run some smaller services simply from home, on a small, old ALIX based on an AMD Geode. The only problem here is that I need to upgrade at some point because some SSE2 instructions are not supported, which now causes problems with newer packages, including some self-compiled ones.


Three independent, but somewhat related thoughts on this topic:

1). On HOWTO articles about infra (1/2): I'd like to see more articles that lead with requirements, rather than setups that then justify the setup with requirements. Like, congrats, you managed to host a bunch of web applications via containers on a dedicated server. It's really nice for a super personal project and I'm sure it helped OP gain a lot of operational experience across several domains, but I just find this type of article to be "DIY porn" for a highly specific subset of DIY.

2). On HOWTO articles about infra (2/2): Is there any sort of comprehensive infra setup guide out there? Even something that just covers "core" AWS services (CloudFront, EC2, ELB, ECS, Lambda, CloudWatch, CloudTrail, and a few others...) feels like it would be incredibly useful for so many people.

3). "AWS on Rails" - I feel like we're getting close to a time when "AWS on Rails" emerges and we see the birth of a new opinionated tool that says "Most people just want to ship their product, not fiddle with cloud infra. Do it our way and you can focus on product instead of infra"


AWS is complexity-as-a-service which abstracts away so much that you pay a HUGE price when it's time to scale -- and it's a confusing matrix of tradeoffs. I find it easier to login to a server, "sudo apt-get install" and tail some logs, than to try to manage the monstrosity that is AWS. Products that go horizontal-cloud-first are easy to scale, but burn an order of magnitude more money, with far worse realtime performance. At the end of the day, AWS is just an abstraction on top of CPU, RAM and disk -- if your app makes bad use of underlying CPU, RAM and disk, then no amount of cloud magic is going to fix that, it'll just mask the problem and you'll end up paying an arm and a leg. AWS/GCloud are actually incentivized to make it hard to debug and optimize performance, they would want you to waste resources on their platform.

See also: https://news.ycombinator.com/item?id=29660117


Exactly. In this decade it's still a ripoff, meaning a huge margin for them, but the market is there; one can't deny that. It'll even out eventually so that they're on par with smaller shops and dedicated/colocation providers. Eventually they'll have no choice but to cut margins and leverage scale to kick out smaller competition. They'll delay it as long as possible, of course, with lock-in dark patterns and so on, but in the end there is no reason, in theory, why they should not be more attractive price-wise than anything else you can come up with. For now they're happy taking huge margins, and people just throw money at them without much first-principles thinking.


No:

- They’ll leverage GDPR so that complying with all the privacy standards (SOC, credit card management…) is only possible if you use AWS,

- Worse, they’ll provide special Linux repos with vulnerability fixes that only they provide. Log4j can make all your distributions obsolete in a single day, and only Amazon Linux users will have that patch, this quickly.

- Then insurers will stop covering people who deploy their software manually, because vulnerabilities are too likely to happen.


In the long run, and at national or global scale, AWS/GCP will be cheaper thanks to 1) general price reductions every few quarters or years and 2) private contracts. It’s important to realize that those cloud providers do a whole bunch of work to reduce costs internally so customers can pay less (eventually).


AWS and competing clouds are a non-starter for anything resource intensive such as processing lots of data (needs tons of CPU/GPU) or serving/moving lots of it (needs tons of bandwidth).

The costs are bearable for early-stage startups that are mostly typical web applications with little resource requirements, or bigger startups that have so much VC money they are happy to burn it, but beyond these two scenarios clouds are very bad value for money.


Generally, application-level decisions will have a bigger influence on cost than the underlying compute and storage for large data processing applications.

Tradeoffs that will dominate the costs

1. Are you using a fast language or a slow language? (10-100x cost difference)

2. Are you using an efficient storage format such as parquet? (10-100x cost difference)

3. Are you using a reasonably efficient query planner or job processor? (1-10x cost difference)

4. Are you using a good algorithm for the data processing task, meaning good, efficient SQL, an efficient imperative algorithm, etc.? (unbounded cost difference)

The above tradeoffs add up to a cost difference of up to 10^5 (or greater, depending on point 4). Once you account for risk, utilization, capex, and people costs, the cost difference between compute platforms is usually negligible compared to the above points.
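The 10^5 figure follows from multiplying the upper bounds of the first three points, treating them as independent factors (an assumption of this sketch). Point 4 is left out because it is described as unbounded.

```python
from math import prod

# Upper bounds taken from points 1-3 of the list above.
worst_case_factors = {
    "language": 100,
    "storage format": 100,
    "query planner / job processor": 10,
}

combined = prod(worst_case_factors.values())
print(f"combined worst-case factor: {combined:,}x")
```

Against a multiplier of that size, a platform price difference of a few times over is small, which is the point the comment is making.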


What about the cost of optimizing the process? If compute can be obtained very cheap elsewhere, it may be cheaper to just use that with an inefficient process rather than spend time/money optimizing the process and then running it on expensive compute.


Depends,

Cloud providers cut compute/storage costs every few years. You can get long term contracts with them that have heavy discounts for guaranteed spend. Switching compute instance types is pretty cheap, and switching compute platforms is relatively low cost when the time comes. Meaning that investors can generally expect your compute costs to fall YoY all else being equal.

Building out your own compute generally means adding new headcount to deal with colo things such as vendor management, technical details such as networking, capex/opex management, and finally hardware selection. The odds that the hardware you select perfectly matches the compute you need in 2 years are low. The odds that a cloud provider has made a cheaper compute option in 2 years are moderate. The odds that your app needs something that’s available trivially in a cloud provider are high.


There is a life beyond AWS. I host applications (including ones responsible for 10s of millions in revenue) on dedicated servers rented from Hetzner and OVH. Do not even do containers as my normal deployment is server per particular business (with standby) and the cost of renting a dedicated server comparatively to revenue is microscopic. CI/CD / setup from scratch / backup / restore is handled by a single bash script and has virtually zero administration overhead. For my model I see zero business benefits doing anything on cloud. I did some deployments on Azure due to a particular client requesting it but management overhead was way higher and do not even start on costs vs performance.


The business benefits of the cloud are ability to scale, an API for everything, cheap automation, cheap managed services for everything, and good customer support. Like you say for your model none of those things may be important, but they're killer features for others


>"ability to scale"

I write C++ business API servers. On the type of hardware I rent from Hetzner /OVH they're capable of processing thousands of requests per second. This would cover probably 90+ percent of real world businesses.

>"an API for everything"

Not sure what you mean here. API for what in particular? My servers talk to partner systems of real businesses. We consume their APIs and they consume ours. Integration is done by little plugins that translate requests should it be needed. I am not sure how Amazon will help in this department. The only generic API that everyone uses is email but you do not need Amazon for that. As for database - I always deploy one local to my business server.

>"good customer support"

Frankly in many years there was like 2-3 times when I needed something resembling support from Hetzner / OVH. In either case the response was prompt and satisfactory.


Of course there are use cases where not using cloud facilities is cheaper, but unless your business is low margin the difference might still be irrelevant and having the managed services available easily can be handy if you ever need them.

There are many use cases where hosting fee is not the right thing to optimize for.


The last time I compared AWS hosting prices with self-hosting, AWS-based hosting would have cost us, every three months, as much as buying over-the-top specced server hardware. These numbers are not directly comparable, but that is a large difference to account for colocation and maintenance. And we never managed to get performance from Postgres on AWS similar to the self-hosted version we use now. But the main advantage is the simplification you get. It is certainly not easier to use your own hardware, but it is far simpler.


> > "an API for everything"

> Not sure what you mean here.

I assume referring to APIs for controlling and monitoring the infrastructure, not anything about APIs you may provide in your application or consume from external sources.


What infrastructure? Thanks to my primitive approach I hardly have any.


If you want to dynamically provision storage or grab an extra instance when you’re under load or something. It’s handy to not have to sit on a bunch of storage you’re not using just in case there’s a rush.


Unless the storage (or compute) is so cheap that it's cheaper to just always have enough on hand than bother with autoscaling and the added complexity and potential point of failure.

Old-school dedicated servers are so cheap that you can match a startups' peak autoscaled load and still pay less than their "idle" load.
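The comparison being made can be sketched as a small calculation. All numbers below are placeholders chosen only to show the shape of the arithmetic; they are not price quotes from any provider, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average month length in hours


def autoscaled_monthly_cost(idle_instances, peak_instances, peak_hours, hourly_price):
    """Always-on baseline plus the extra instances that run only during peak hours."""
    baseline = idle_instances * HOURS_PER_MONTH * hourly_price
    burst = (peak_instances - idle_instances) * peak_hours * hourly_price
    return baseline + burst


def dedicated_monthly_cost(servers, monthly_price):
    """Flat cost of servers sized for peak load and left running all month."""
    return servers * monthly_price


# Placeholder inputs, not real prices.
idle_only = autoscaled_monthly_cost(2, 2, 0, hourly_price=0.20)
with_peaks = autoscaled_monthly_cost(2, 10, 100, hourly_price=0.20)
dedicated = dedicated_monthly_cost(servers=2, monthly_price=60.0)

print(f"cloud idle: {idle_only:.2f}  cloud with peaks: {with_peaks:.2f}  dedicated: {dedicated:.2f}")
```

Plugging in your own instance counts and prices shows whether the claim holds for a given workload; with other inputs the result can go the other way.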


Exactly my case. My peak load costs are way less than their "idle". Storage expansion if needed is even simpler to handle. Those cloudy people like to create problems where none exists.


Playing devil’s advocate with the normal process for how we launch a new service in big tech:

1. Have you done a security review and ideally pen-testing from a 3rd party? Are you fully TLS from end to end, and are you aware of any exploitable vectors in your API/UI and how you mitigate them?

2. How do you handle software patching and vulnerabilities?

3. Do you consider your app operationally ready? Can you roll back, do you audit changes, do you need multi-AZ resilience, do you need a cell-based architecture? Did someone other than yourself, or outside your team, look at your monitoring dashboard? Do you need to be paged at 3am if some of your dependencies or underlying nodes degrade? We have to answer a 50-question template and review it multiple times.

4. Did you calculate the cost and forecast it with how the service/app may grow in the next 12-36 months?

While you still need to do all this when using a cloud provider, you probably _should_ do much more if you manage the bare metal.

If you have already done all those, Kudos to you, but I still find it hard to trust everyone who DIY


My clients do / pay for whatever audits are required in their line of business. I do not get involved unless told to do some particular things.

>" but I still find it hard to trust everyone who DIY"

It depends on what the actual DIY part is. Either way, it is your problem, not mine. Sorry to say, but to me your whole post feels like typical FUD, scaring clients into the cloud.

>"While you still need to do all this when using a cloud provider, you probably _should_ do much more if you manage the bare metal"

No I should not probably do much more. I do much less as I have way less moving parts.


I just think all those features are necessary for any serious usage, and it’s costly to build them all on your own. But perhaps you are right, since it fits your business model. You seem not to be interested in building a platform at all.


>"You seem not to be interested in building a platform at all"

I build products. That is my primary output. And the actual delivery is a binary executable that does the job. And it does it with the stellar performance that makes all that horizontal scaling infra totally unneeded. I serve the needs of my clients, not the needs of Amazon.


To beelzebub the devil...how many large scale hacks/data leaks/embarrassment from large companies are because of a misconfigured S3 bucket?


I like this. Out of curiosity, could you share your bash script?


I do not want to clean it up to remove some revealing things, so no. But it is fairly trivial. If you are an experienced programmer, you would have no trouble writing one yourself in no time.


Understood, it was pure curiosity anyway ;)


What do you use for backup?


I would recommend to use borgbackup - it is very convenient for security (it provides very flexible encryption options with safe defaults) and efficiency (deduplication)


scheduled jobs running pg_dump and rsync
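A minimal sketch of what such a scheduled job might run, assuming a custom-format dump per day that is then mirrored to another host. The database name, directory, and remote target are hypothetical, and the commands are only built here, not executed.

```python
from datetime import date


def backup_commands(database, dump_dir, remote, today=None):
    """Return argv lists: dump one database, then sync the dump directory off-host."""
    today = today or date.today()
    dump_file = f"{dump_dir}/{database}-{today.isoformat()}.dump"
    return [
        # Custom format is compressed and restorable with pg_restore.
        ["pg_dump", "--format=custom", f"--file={dump_file}", database],
        # Trailing slash: copy the directory contents, not the directory itself.
        ["rsync", "--archive", "--compress", f"{dump_dir}/", remote],
    ]


for argv in backup_commands("shop", "/var/backups/pg", "backup@standby.example.com:/srv/pg/"):
    print(" ".join(argv))
```

A cron entry or systemd timer could pass each list to `subprocess.run(argv, check=True)` so that a failed dump stops the sync.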


I would guess rsync ;)


> I just find this type of article to be "DIY porn" for a highly specific subset of DIY.

This seems unnecessarily dismissive. (But maybe that means I am part of the 'specific subset'... :) ). IMO there is a big difference between 'tinkering' and 'self-hosting'. Projects you are tinkering with are typically transitory and can be treated as "pets". But in the 'self-hosted' world, stability and ease of maintenance are huge. In that regard I think the overhead of containers makes total sense, especially when running a bunch of stuff on the same server.


I may also be part of that highly specific subset because I enjoy these kinds of articles as well. It’s interesting to explore different solutions and to understand their pros and cons even if they aren’t motivated by upfront requirements (you’re probably less likely to come up with an interesting configuration in that case anyway).


The problem with AWS is that you can’t really do anything without understanding 1. IAM, and 2. VPCs/networking. And these are probably the two most complicated parts. For DIY you’re probably best off avoiding AWS


This 1000%. It was amazing how much more comfortable I was with my suite of VPSs when I moved them from AWS to Digital Ocean! I always describe onboarding with AWS as "drinking from a firehose". There is so much to understand and it is hard to tell what you should actually care about. DO, on the other hand, has a much simpler set of services that seem much more applicable to 'non-enterprise' usage. (Plus the documentation for DO is great!)


I have a recruitment problem. If I tell recruits “We use AWS”, they’ll be happy to be waterboarded when trying to do something; they’ll have the feeling of being connected with the rest of the world and of preparing their resume for the future.

If I tell them “I have Digital Ocean, I maintain the load balancer myself, the database myself, ELK myself, and you’ll become an expert in Postgres, Ansible, Nginx and Debian”, I sound like an old fart.

The future for them in AWS is a click-on-the-button deployment and they’ll master nothing.


Amazon tried to emulate Digital Ocean with Lightsail.

It does feel a bit like a Cinderella service however. The versions of the managed databases you can connect are sometimes several years old.


I had this feeling when I first started with AWS years ago. It was hard to find a good overview and all of the Amazon doc on individual services seemed to start in the middle. So, a lot of my initial understanding came through intuition, and trial and error.

For many scenarios, you can completely ignore IAM, but it's definitely not advisable.

On the VPC side, it's actually fairly straightforward, but you may need to come up to speed a bit on some networking concepts if (like me) that's not your background. Nothing too onerous though, especially if you have some technical background.

There are also some gotchas that allow you to too easily do things like create security groups or other resources outside the correct VPC. If you overlook that, you're in for some brick wall head-banging 'til you figure it out.


But how can we trust the DIY stuff to meet compliance and hold the right security bar? It’s much easier to do with AWS.

Or maybe as a startup, to-C website you don’t really care


I think the complexity can lead to its own set of security risks, as people just keep opening permissions wider until things connect.


That's actually a really good point. Out of the box, it's hard to screw up because things are pretty locked down. It's really in attempting to open things up that the security risk comes in if people aren't explicitly aware of exactly what they're opening.

EDIT: and this isn't necessarily difficult to grok. A lot of what you'll use from the network side is security groups, and they are straightforward. /EDIT

There are also actually some bad patterns in the AWS Console UI that don't help here. For instance, despite all the warnings they place on S3 buckets about making things public, they still allow you to appear to change subobjects to private. In a traditional hierarchical directory structure, the more granular subobject settings would override, but not so with S3. If you didn't know that, then you've just shot yourself in the foot.


Great and interesting point. I believe the solution is to have “security by default” Infra-as-code construct and some static analyzer


When I am helping out people new to AWS, this is where 90% of the problems are. It suggests to me that they are lacking sensible defaults, when so many people have trouble and I just recommend connecting Beanstalk to RDS. I think some meta recipes that give you a set up where this is already secure and working make sense.


I built a tool for deploying web apps on AWS.. https://github.com/keybittech/awayto If you look at the deployment script app/bin/data/deploy, you can break things down into manageable pieces and understand what's going on probably within a day or two given you have some baseline understanding of frameworks and APIs.

But, I totally agree that the underlying nuance is a lot to take on when you start getting into X is required for Z which connects to A, yadda yadda.

That being said, if you choose a service and invest the time to understand it, you are availing yourself to a very wide world of technology "at your fingertips." You can most certainly say the same for DIY, just different shades of what you want to be responsible for I guess.


Sounds about right based on my experience with AWS for past 8 years. Does anyone (besides AWS/TF/Pulumi, for whom it’s not their core product) attempt to solve the networking and IAM? These two areas, esp AWS networking, despite being fundamental building blocks, just never get coverage.

I know I’m being a beggar and beggars can’t be choosers, however, I believe that there are folks here who could solve this problem and make a nice living off of it. Hope someone reads this comment and builds something to address this.


There's definitely something to this... but I think in practice most projects end up needing some amount of customization in their networking even if it's 90% "standard". In attempting to provide this, you might end up just gradually recreating TF/Pulumi anyway.


You can do a lot with EB for simplistic projects. But yes without that you are in for a world of pain using anything else


I think there's good value to be had from articles like these, depending on where you're starting and how you approach this kind of work.

On the requirements side, one thing I would like to see though is an approach for determining how your applications will scale on cloud infra.

For instance on AWS, which RDS and EC2 instances will suffice out of the gate, and at what IOPS/bandwidth? And when will I need to scale them? The metric is simply how many users can I support with acceptable response times on a given configuration?

Sure, we know that's highly dependent on the applications, stack, etc. But I've often thought there should be some rubric for approaching this that doesn't require going heavy on performance analyzers, load balance testing, etc, which frequently seem like overkill out of the gate when you're making your initial config selections.


2) I think this is a bit like asking, "Why isn't there a comprehensive guide to art"

Infra is massively complicated. Not only do you have competing frameworks/modules, you also have these change in different releases of Linux. Some of these are simpler, some are more comprehensive. Some are really hard to use but powerful/well-maintained, others are simple but might not be so good. Some perform well at a cost, others might perform relatively less well but are easier. Sometimes it is worth the effort setting up ufw on a server, sometimes it isn't.

By the time you got to something that was simple enough to create a guide from, and which didn't change each week when some vendor renames a configuration option, it would be very high-level and possibly not very useful.


Heroku is "AWS on Rails" taken to an extreme and there have been lots of other in-between services and tools. Cloud 66, Laravel Forge and even AWS's Elastic Beanstalk are all part of that spectrum.


> Is there any sort of comprehensive infra setup guide out there?

This would be the AWS certification at their different levels.


I would tend to disagree. First, because not everyone is on AWS. Second, because even for people on AWS, some of a company's infra isn't on AWS (e.g. gmail). Third because not every infra setup work has an AWS tool.

To take a personal example, we use multiple infra providers.

Some of our infra is gmail (currently working on automating it with terraform).

Some of it is other infrastructure providers that aren't US-based.

etc.


What’s the “automating gmail with terraform” angle? G-suite automation? Or GCP?


g-suite.


Can you elaborate on the terraform automation for gmail?


When we first built our user account system, we had to create a google user and add it to relevant groups in admin console whenever we hired someone, before the automation could kick in.

We're tired of that so we're looking into using a terraform provider [1], so that we can declare new users and instantiate their resources with a simple PR.

[1]: https://registry.terraform.io/providers/hashicorp/googlework...


> It's really nice for a super personal project and I'm sure it helped OP gain a lot of operational experience across several domains, but I just find this type of article to be "DYI porn" for a highly specific subset of DYI.

I share (some parts) of the sentiment.

Fiddling with a VPS and dockers (and Linux generally) landed me my current job but there's a curve and now I feel the need to up my game with a deeper knowledge and understanding of the different pieces and of the overall picture. Otherwise I am just another admin bash monkey.


I (my company) use a similar approach: a single dedicated server with docker containers, by using dokku [0], for an heroku-like self-hosted PaaS.

Most of our applications are either:

- app developed in-house (django/flask): Procfile + deploy with git push

- standard app with a docker image available: deploy directly

Dokku comes with useful "service" plugins for databases, auto https (letsencrypt), virtual hosts... Overall, a good experience.

[0] https://dokku.com/


+1 for Dokku here. Been running 10 containers on the second-cheapest Hetzner instance available for years now. Never had any issues.

My only, tiny gripe would be excessive space consumption on the somewhat small 20gb SSDs you get with Hetzner VPSs.


> My only, tiny gripe would be excessive space consumption on the somewhat small 20gb SSDs you get with Hetzner VPSs.

I was trying to imagine the reason for this. Is it that dokku is similar to heroku, and does a build step upon git-push-to-deploy? So assets/compilation/etc have artifacts and a cache that sticks around to help speed up future pushes/builds?


Same here (not a company but some paying users) but with CapRover.

Multiple webapps (Symfony, Phoenix, Magento) on a 20€/m OneProvider server.

Databases backups are done by a cron bash script which uploads to a ftp.

It works fine, only real downside for my use case is the small downtime after a deploy. I probably would use something else for a frequently deployed pro webapp.


Pardon my curiosity, but do you pay for Dokku Pro?


I do not.


Thanks for this article, it's great to see people caring for their server (does it have a name?) and not defaulting to the serverless craze. Here's a few thoughts :)

> there is some small downtime when I deploy new versions of things since I don't have any load balancing or rolling deployments

It's entirely possible to achieve, depending on your stack. `nginx -s reload` will reload the entire config without killing existing connections or inducing undue downtime. So if you can start a second instance of your "webapp" on a separate port/socket (or folder for PHP) and point nginx to it there shouldn't be any downtime involved.

> for users that are geographically far away latency can be high

That's true, but counter-intuitively, i found unless you're serving huge content (think video or multiple MB pages) it's not a problem. CDN can actually make it worse on a bad connection, because it takes additional roundtrips to resolve the CDN's domain and fetch stuff from there while i already have a connection established to your site. As someone who regularly uses really poor xDSL (from the other side of the atlantic ocean) i have a better experience with sites without a CDN that fit in under 1MB (or even better < 200KB) with as few requests as possible (for the clients that don't support HTTP2).

> CloudFlare (...) That may become necessary if I ever have trouble with DDOS attacks

I've personally found OVH to be more than capable and willing to deal with DDOS for their customers. OVH has been previously posted on HN for dealing with huge DDOS. That is of course if you have proper caching and you don't have an easy venue for a remote attacker to induce huge CPU/RAM load. For example, Plausible-like analytics can be such an attack vector because every request is logged in a database; something like GoAccess [0] is more resilient, and no logs is even lighter on resources.

[0] https://goaccess.io/


> there is some small downtime when I deploy new versions of things

Some time ago I was looking for an easier way to fix this. It seemed to me that a good way would be to have the reverse proxy (e.g. nginx or similar) hold the requests until the app restarts. For the user, this would mean a ~10s hiccup rather than 504 errors.

I didn't find an easy way to do it with nginx though and was sort of disappointed. Maybe other reverse proxies make this easier? Or maybe there is a stand-alone tool to do this?

[edit: one app I host can only be run single-instance as it keeps some state in-process. It can outsource it to Redis, but that seems overkill if it's only needed during upgrades, 10s/week or so]


> [edit: one app I host can only be run single-instance as it keeps some state in-process. It can outsource it to Redis, but that seems overkill if it's only needed during upgrades, 10s/week or so]

Not exactly the same, but couldn't you serve a 425 Too Early with a meta refresh of 10s to achieve exactly the same goal? So change your nginx config to serve this "updating, your browser will refresh automatically" page, reload nginx, update the app, revert nginx config and reload nginx. Would that not address your needs, albeit in a more convoluted way?


That might work for HTML pages, API routes would need a different configuration. Then every app consuming the API needs to be updated to support the 425 code gracefully. In addition, I have to manually do this switch in nginx before and after the update...

I would rather a system that holds requests for 20s max if the backend refuses connections.
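A minimal sketch of that "hold for up to 20s" idea, written as a standalone Python helper rather than as a feature of nginx or any other proxy. The function name `wait_for_backend` and its parameters are hypothetical; a proxy-like front end could call it before forwarding a request and only return an error once the deadline has passed.

```python
import socket
import time


def wait_for_backend(host: str, port: int, max_wait: float = 20.0,
                     interval: float = 0.25) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Keeps retrying while connection attempts fail, and gives up
    (returning False) after roughly max_wait seconds.
    """
    deadline = time.monotonic() + max_wait
    while True:
        try:
            # Try to open a connection; close it again right away on success.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, timed out, or otherwise unavailable.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

This only checks that the backend accepts TCP connections again; it says nothing about whether the app has finished initializing, so a real setup would probably probe a health endpoint instead.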


> I would rather a system that holds requests for 20s max if the backend refuses connections.

I think that's what nginx does by default, isn't it? Try again until "gateway timeout". The problem in this case is if you want graceful shutdown you need your backend to stop accepting new connections without stopping to process the existing ones, then update the app. If some client connections are long-lived, that's a hard problem and that's why in that case running a second instance makes sense if you can do it.

> Then every app consuming the API needs to be updated to support the 425 code gracefully.

That's indeed a problem, but your client should probably support some forms of server "failure" modes, including a "try again soon" type of reply (or interpreting 5XX as such).

> I have to manually do this switch in nginx before and after the update...

This can be fully automated. Just use different config files in sites-enabled. Then you can mv or ln to enable one or the other and run nginx -s reload. The whole process would be a few lines of shell script.
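The switch described above can be sketched in Python instead of shell. This is an illustration under assumptions: the function name `switch_site`, the file names, and the single-symlink layout are hypothetical, and the reload command is passed in so it can be replaced when trying the code without nginx installed.

```python
import os
import subprocess
from pathlib import Path


def switch_site(enabled_link: Path, target: Path,
                reload_cmd=("nginx", "-s", "reload")) -> None:
    """Point the symlink enabled_link at target, then reload nginx."""
    tmp = enabled_link.with_name(enabled_link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    # os.replace renames atomically on POSIX, so the enabled config
    # is never missing between the two states.
    os.replace(tmp, enabled_link)
    # Raises CalledProcessError if the reload command exits non-zero.
    subprocess.run(list(reload_cmd), check=True)
```

A deploy script would call it once with the maintenance config before updating the app, and once more with the normal config afterwards.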


This is not my experience with nginx, I am seeing requests eventually time out. Then I am seeing 504 errors even though the backend is up, seemingly like it takes some time for nginx to notice.

This might very well be a configuration error on my part though.

> your client should probably support some forms of server "failure" modes, including a "try again soon" type of reply

Currently the "try again soon" is passed on directly to the user. For a lot of endpoints, the client is a web browser for which this is not possible (short of turning the whole thing into a PWA... just for the ~10s a week the app is restarting...)


This would be a cool feature for reverse proxies to have.


Seems HAProxy can do it, though it's manual, not automatic when the app is unavailable during reload: https://serverfault.com/a/450983/187237


Great write up! Tip I learned this week when I migrated my VPS [1]: when dumping MySQL/MariaDB databases for disaster recovery, dump the grants (aka database user rights) with the pt-show-grants tool.

You don't want to import the mysql table itself on a fresh MySQL/MariaDB installation, it's a headache. So dump all your tables to SQL (raw data) and dump your user grants/rights with pt-show-grants (which in itself creates a small SQL file) that you easily import in MySQL/MariaDB.

[1] https://j11g.com/2021/12/28/migrating-a-lamp-vps/


If you don't have access to (or don't want to use) the Percona toolkit, you can get all of the grants directly from the `sql_grants` table by installing `common_schema`.


Be careful with docker and ufw though! Any rules you setup in ufw will be ignored by docker, so exposing a container will always open it up to the public even if you specifically denied access through ufw.


Very good point, I didn't know this and almost got burned by it while learning Docker. What I did was use a shared network (for private db connection etc) in a docker-compose file, and then expose the port I wanted to reverse proxy out on by

    ports:
        - 127.0.0.1:3000:3000
This way it only exposes it to the local host on the machine without exposing it on the firewall. Then I reverse proxied out port 3000 with NGINX to the outside world. I'm surprised this isn't talked about more in beginner tutorials etc.
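To make the difference between `3000:3000` and `127.0.0.1:3000:3000` concrete, here is a small sketch that flags short-syntax port mappings without a specific host IP. The helper name `binds_all_interfaces` is hypothetical, and it assumes Docker's default behaviour of publishing on all interfaces when no host IP is given (a daemon-level setting can change that default).

```python
def binds_all_interfaces(port_mapping: str) -> bool:
    """Return True if a short-syntax port mapping publishes on all interfaces.

    Handles "CONTAINER", "HOST:CONTAINER" and "IP:HOST:CONTAINER",
    optionally followed by "/tcp" or "/udp". Bracketed IPv6 addresses
    are not handled in this sketch.
    """
    spec = port_mapping.split("/", 1)[0]  # drop the protocol suffix
    parts = spec.split(":")
    if len(parts) < 3:
        # No host IP given: assumed to bind on all interfaces.
        return True
    host_ip = parts[0]
    return host_ip in ("", "0.0.0.0")
```

Running something like this over every `ports:` entry in a compose file is a cheap way to spot services that are reachable from outside regardless of ufw rules.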


Technically, Docker is adding iptables rules that are ignored by ufw rather than docker ignoring ufw.

To fix, just turn off iptables for the docker daemon and add the rules manually to UFW


And if you don’t want to do that because there are some downsides, make sure you setup container networking correctly and don’t just expose ports just to expose ports. Learned that one the hard way when someone exposed redis.


Nice write-up :) I'll probably adopt some of these fancy new tools like nginx or munin (that you call old) some day... still running good old Apache + php, postfix, dovecot, irssi, ... I think my box (in its various reincarnations, first as dedicated server, then as VM on a bigger dedicated server that I shared with friends, and now as cloud VM) has been running since ~2005. Now I feel old ;)


Same here. It's amazing how many services you can easily host if they are on PHP and low traffic.

The beauty of PHP is that more services consume no additional resources until they are used. This lets you run many many services without worrying about resource usage. In stark contrast to running everything in containers where you have at least a process per idle service.


I ran servers with 100000s of sites/applications per server and indeed php made that possible. I had a complex setup with jails/chroots and selinux and some customizations to the mysql, apache and php source to make sure users couldn't abuse things. With nodejs or whatever, we would've run a huge loss, now we ran a nice profit.

When people go on about environmental issues and such, I cannot help but think of all the useless cycles spent by 'modern software'. FaaS helps I guess. But that is not really unlike CGI; with my old setup it is actually practically the same but mine was less flexible (php only).


Why does php not need to idle?


This is super useful, thanks for posting. I hate AWS complexity-as-a-service, just give me a damn baremetal machine and let me run and tune my own services. I love having full visibility into raw disk/cpu performance and even benching my hardware.

So many useful comparisons here, Bunny.net vs. CloudFlare, and the fact that you got this far without even using fail2ban!

Questions (not necessarily for OP, but for anyone)

- Give us an idea of the maximum load this server is doing in terms of requests per second?

- Anyone choosing AMD Epyc over Intel to avoid spectre/meltdown mitigation slowdowns?

- Any use of postfix or other SMTP forwarding via Postmark or another email provider?

- What is your Postgres configuration like for your heaviest loads? max_connections? Just wondering how Postgres scales across cores.


Hey, author here!

- Request rate is pretty low on average and peaks at around 15 requests/second. That's nothing really and it would probably take 100x or more request rate before I saw any kind of CPU bottlenecking or similar; my upload would probably bottleneck first. The biggest resource usage on the server comes from some expensive database queries made by some of my applications.

- I'd definitely be down to use that kind of CPU but it wasn't available at the price point I wanted. Most of my stuff isn't CPU-bound anyway.

- I used to self-host my own mailserver and webmail, but to be honest it was a disaster. Even after spending tons of time configuring reverse DNS, DMARC, DKIM, SPF, and the other list of arcane and specialized stuff you need, I still had tons of issues with mail I sent going to spam. I gave up and pay Google to host my mail for my own domains now.

- I really haven't done much tuning. MySQL/MariaDB is the DB that gets the most load and its settings are mostly default. I have put a lot of effort into profiling my apps' query patterns and making sure proper indexes and schemas are in place, though.


Traefik Proxy[0] was a game changer for my self-hosted setup of Docker containers.

Traefik can read labels applied to Docker containers (easily done with docker-compose) and setup the proxy for you as containers come and go. Even maintains the Lets Encrypt certificates seamlessly.

[0] https://traefik.io/traefik/


Traefik is great, but their documentation is awful IMO. I moved to Caddy which I prefer currently.


To be fair, Caddy's docs aren't great either. Last time I tried to deploy it to give SSL to something that didn't have it, took me 2 hrs to figure out the docs enough to get it working.


Caddy 1 docs felt handcrafted for each use case it supports, whereas Caddy 2 documentation feels generated, except for a couple of pages, which is enough to cover 99% of my needs.


Yes exactly.

I also think Caddy 1 was just simpler to use in general, so it didn't matter as much. But I don't have much experience with that version, so could be wrong.


What could make the docs better? What solved your problem?


Time and trial-and-error is what eventually solved it.

I was trying to configure the auto-HTTPS functionality to use DNS challenges, because my setup forbids any of the others, and this apparently requires plugins that are only supported by community efforts, so that's probably why I had such an issue.

The docs around configuring this were not immediately clear to me. It's like one part is over in this corner, the other over there. And the part about DNS provider plugins is off in a completely different direction. I definitely think things could be better organized. Also your one tutorial for getting HTTPS support only mentions the very basic cases. I think it'd be beneficial to see some more advanced tutorials using for instance some of the other LE challenge methods, something like this definitely would've saved me some time.


Definitely for people getting started with self hosting I would recommend Caddy.


I love & use caddy for proxying to Docker containers + other services running outside Docker.

I wish there were an easier way to bind docker ports to Caddy automatically (without using dokku et al.), but for now I maintain a Caddyfile. Which, thinking of it, doesn't even require setting up a janky script for the rare times when I need to host a new service & modify the config.

I guess there's no reason to make things harder for myself 6 months in the future.

Related: https://xkcd.com/1205/


I do something similar:

- GCP VM with “Google Container OS”.

- “Cloud init” config set on the VM metadata (can easily recreate the VM; no snowflake config via SSH mutations).

- My service runs in a docker container, reads/writes to a SQLite file on the host disk.

- GCP incrementally snapshots the disk every hour or so, and makes copies to different region. Any disk writes are copied instantly to another zone.

- Lets encrypt cert is read from the host disk and the docker container serves HTTPS directly (no proxy). Certificate is renewed with the LE CLI.

- The service logs to standard out, this is collected by the Google logs daemon which I can view with the web UI.

- Google have their own HTTP uptime monitoring and alerting which sends you an SMS.


TIL about their uptime monitor, thanks!

https://cloud.google.com/monitoring/uptime-checks


Nice. I've gone down a different path and built https://github.com/piku/piku, which I use to run 12-15 web services (and batch workers) off a single-core VM in Azure, plus around the same amount of private services on a 4GB RAM Raspberry Pi.

I still use docker-compose and have a k3s cluster running, but I really like the simplicity of deploying via git.

Cloud-init for my core setups is here: https://github.com/piku/cloud-init


Really good write up and always nice to see the honesty about past mistakes. For those without time to get hands quite as dirty, like me, I've found that CapRover[1] gives just enough UI to fiddle while having sensible defaults.

[1] https://caprover.com/


The only negative about caprover I’ve found is the distinction between containers with persistent storage and not. I don’t want to recreate my whole app just to add persistent storage.

Otherwise I’ve been using it without issue for months now.


Kasra from CapRover. Feel free to open a Feature Request on Github. There is no architectural design limitation that prevents this. It can certainly be added.


Instead of recording the “docker run“ commands, you might want to have a look at docker-compose


I had started out the same way, especially if it was a new app and I wasn't familiar with how I really wanted to run it. Some containers expect a fair number of environment variables and multiple mounts. Once I got everything working, I would create a script /svcs with the corresponding docker run command. There's even a cool tool called "runlike" which can create a well formatted command for any running container.

https://github.com/lavie/runlike/

But I've got those migrated to docker-compose files these days and I try to start with the docker-compose file instead of going directly into testing out docker run commands.


Came here to say this! Honestly, my intro to docker-compose came when I realised I was starting to do the same thing (saving 'run' commands). Small learning curve for the yaml structure, but well worth it (and honestly I think it is easier to visualize the deployment when it is all nicely laid out in the compose file). Pretty much the only way I use docker now. (Also has a bonus of not really adding a bunch of overhead or complexity. Just does what you need for a basic multi-container deployment and no more.)


Yes I thought that was odd too! OP mentions keeping manual records of "docker run" commands and that it might need some improvement...

Well apart from that detail that's exactly what I do for my stack of personal sites. "docker-compose" is really hard to beat for all kinds of setups. And I get 64GB to play with for 50EUR/month from Hetzner, so I don't spend too much time worrying about RAM.


I’m using the same server type from OVH in Hillsboro and it’s great. They frequently do sales. Highly recommend anyone interested to go for the nvme upgrade.

I rent three now total, one for production sites, one as a development machine strictly for vscode and another for development / staging / misc. waaaaaay overkill but it’s been a huge quality of life improvement.

For containers I just use a couple docker compose and a script to run them all.

Reverse proxy is through nginx proxy manager which takes care of LE renewals and has a nice web interface.


I’m running collocated servers with simple docker compose file too. Actually https://mailsnag.com is hosted on one of the servers and is using single docker compose file for deployment. Not sure why the author had to record every single docker run command when docker compose file manages all of that for you.


Where do you go to monitor these Hillsboro OVH sales?


They send out emails but also just around major holidays and just randomly. They use this page most of the time from what I can tell https://us.ovhcloud.com/deals

Sometimes though they'll send targeted email offers that are not on the site and are good for older hardware but at a really cheap price.


There's projects like Yunohost [0] and Freedombox [1] that aim to make all this easier for the average person. Interested to hear about more such software projects.

[0] https://yunohost.org

[1] https://freedombox.org


Another one is libreserver.org (formerly freedombone), whose blog is a treasure trove. To my knowledge, it's the only solution that supports Tor/I2P out of the box, and even has experimental branches (looking for testers) to support a local "mesh" setup.

These projects are amazing but if you're gonna use them, don't forget to do backups, especially if your system is hosted on a SD card (for example on a raspi).

I'm just a little sad there's no standard packaging format across those distros, although i've spoken with yunohost and libreserver maintainers in the past and they seemed somewhat interested in the topic!


I've been using yunohost for a few months and nothing could be easier. I wish more guides would spread the word about it instead of promoting manual installation of multiple packages involving the command line and debugging.


> Whenever I add a new service, I record the exact docker run command I used to launch it and refer to it when I need to re-create the containers for whatever reason. I know that this is an area that could use improvement; it's quite manual right now. I'd be interested to hear about some kind of light-weight solution for this that people have come up with.

Later...

> I host my own analytics ... They provide an open-source, self-hostable version which I have it deployed with docker-compose.

Wouldn't docker-compose be the "already doing it" answer to the first question? It's pretty much your library of 'exact docker run commands', plus sugar to manage the running ones collectively.


> Wouldn't docker-compose be the "already doing it" answer to the first question? It's pretty much your library of 'exact docker run commands', plus sugar to manage the running ones collectively.

I'm also confused why he's not using compose. Perhaps he is unaware that when you have multiple services in a compose file, re-running `docker compose up` only restarts containers whose parameters have changed in the compose file.

He mentions further down that he's using compose for one service, so obviously he is aware of it.

But I can't blame him. When getting up to speed on docker, I found that figuring out compose and how to use it effectively was frustratingly under-documented.
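For anyone wondering how "only restart what changed" can work in principle, here is a simplified model of the idea: remember a hash of each service definition and compare it on the next run. This is not Compose's actual implementation; the function names and the dictionary layout are hypothetical.

```python
import hashlib
import json


def config_hash(service: dict) -> str:
    """Stable hash of one service definition (key order does not matter)."""
    canonical = json.dumps(service, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def services_to_recreate(running: dict, desired: dict) -> list:
    """Names of services in `desired` that are new or whose definition changed.

    running: service name -> hash recorded when its container was created.
    desired: service name -> current service definition from the file.
    """
    return sorted(
        name for name, definition in desired.items()
        if running.get(name) != config_hash(definition)
    )
```

With this model, editing one service in a file that defines ten would leave the other nine containers untouched.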


Yet another modern "cloud native" era developer discovers how overpriced and underpowered most cloud offerings are.

Here's what they did folks: they set cloud prices a while back, did not drop them as Moore's Law delivered more power but instead pocketed the profit, and meanwhile ran an entire "cloud native" development push to encourage development practices that maximize cloud lock-in (both through dependence on cloud services and encouraging complexity).

Oh, and bandwidth is ludicrously overpriced in the cloud. You can transfer hundreds of terabytes a month outbound for <$500 if you know where to look. No I don't mean from static data either. Look at bare metal hosting.


>I have all my HTTPS certificates issued + managed via letsencrypt. It's a terrific service; all I have to do is run snap run certonly --nginx every 3 months and everything is taken care of automatically.

Hopefully through cron and not manual invocation! Certbot can safely be executed daily by cron, as it will only attempt renewal if it is required.

Automating certificate renewal is a very important step in ensuring availability. I feel like part of the on-call sysadmin initiation process (pre-ACME days) was getting a frantic midnight phone call because someone forgot to renew a certificate...

I suspect they are using cron, but this has been omitted unintentionally.
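The reason a daily run is harmless can be sketched as a simple expiry check: nothing happens unless the certificate is close to expiring. The function name `needs_renewal` is hypothetical, and the 30-day window is an assumption for illustration, not a value read from Certbot's configuration.

```python
from datetime import datetime, timedelta


def needs_renewal(not_after: datetime, now: datetime,
                  renew_before: timedelta = timedelta(days=30)) -> bool:
    """True if the certificate expires within `renew_before` of `now`."""
    return not_after - now <= renew_before
```

With 90-day certificates and this window, a daily job would skip renewal for roughly the first 60 days and then renew on the first run after that.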


Except when restarting the services that rely on the certificate doesn't work reliably... but yeah, needs to be automated.


A few months ago I tried Nginx Proxy Manager[1] and never looked back.

It provides a nice looking UI to manage reverse proxy with Let's Encrypt certs, auto renewal and a few other nice features.

[1]: https://nginxproxymanager.com/


great writeup that overlaps with a lot of stuff i do for myself — a few related tool suggestions from my own experience are caddy for https (absurdly easy config — seriously, i would never go back to nginx if i could help it) and netdata for server monitoring (open source, optionally connectable to a web dashboard with email alerts)

[0] https://caddyserver.com/

[1] https://github.com/netdata/netdata


Lots of AWS frustration here, so I will mention a project I created to auto generate AWS web applications, covering what I believe are a majority of practical use cases (db, ui, api, users, roles, groups).

https://awayto.dev

https://github.com/keybittech/awayto

It's not a product, just a project. Check it out if AWS is getting you down! :)


Echoing the sentiment here, this is a great way to host smaller projects on the cheap, without adding the complexity/price of k8s, Nomad et al!

I do the same, and have spent some time automating the backup of such a set of standalone containers [0], in case others also find it useful.

[0] https://github.com/jareware/docker-volume-backup


I also have a tonne of stuff running on a single, fanless server via docker compose. It is amazing, barely breaks 20% CPU, better uptime than aws ;-) ... One overlooked aspect is that if you use something like my Streacom db4 case, beautiful, fanless, noiseless, you can put it in a living area. Depending on where you live, the power usage can be fully offset against heating costs for your home, making it very efficient.


Just a note on power consumption with regard to heat output. Electric resistive heating is 100% efficient with regard to electricity from the socket. This is the upper limit on efficiency for a computer. This ignores efficiency losses from transmission and from electricity generation. If your electricity is generated by fossil fuels, you are getting much less than 100% efficiency from the energy stored in those fuels.

Heat pumps can achieve greater than 100% efficiency relative to electric resistive heating, because they move heat, rather than do work to create heat. These can achieve equivalent heating at 50% of the electricity utilization compared to resistive heating.[0]

Local fossil fuel heating (e.g. natural gas furnace) is much more efficient than electric resistive heating. The efficiency of local fossil fuel combustion is close to 100% of the energy content of the fuel, as it is turned directly into heat and that heat is dissipated into the home. More than 60% of the energy content used to generate electricity is lost in the generation process in the US.[1]

So, the heat from a computer does offset heating costs, but at less than 100% efficiency. For a small fanless machine, this is not likely to be a large expenditure, but it is valuable to keep in mind that heat from electronics is not "free".

[0]: https://www.energy.gov/energysaver/heat-pump-systems

[1]: https://www.eia.gov/todayinenergy/detail.php?id=44436
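
The comparison above can be put into rough numbers. This Python sketch uses only the figures quoted in the comment and turns them into "fuel energy burned per kWh of heat in the room". The 40% generation efficiency (from "more than 60% lost"), the heat pump factor of 2 (from "50% of the electricity") and the 100% furnace efficiency are simplifying assumptions, and transmission losses are ignored.

```python
def fuel_per_kwh_heat(generation_efficiency: float, cop: float) -> float:
    """kWh of fuel burned at the power plant per kWh of heat delivered at home.

    generation_efficiency: fraction of fuel energy that arrives as electricity.
    cop: heat delivered per unit of electricity (1.0 for resistive heating).
    """
    return 1.0 / (generation_efficiency * cop)


# Assumption: "more than 60% lost" is taken as exactly 40% delivered.
GENERATION_EFFICIENCY = 0.40

# A computer or space heater turns 1 kWh of electricity into 1 kWh of heat.
resistive = fuel_per_kwh_heat(GENERATION_EFFICIENCY, cop=1.0)
# A heat pump needing half the electricity has a factor of 2.
heat_pump = fuel_per_kwh_heat(GENERATION_EFFICIENCY, cop=2.0)
# Assumption: local combustion treated as 100% efficient (upper bound).
furnace = 1.0 / 1.0

if __name__ == "__main__":
    for name, fuel in [("resistive / computer", resistive),
                       ("heat pump", heat_pump),
                       ("gas furnace", furnace)]:
        print(f"{name}: {fuel:.2f} kWh fuel per kWh heat")
```

Under these assumptions the computer is the most fuel-hungry heater of the three when the grid is fossil-powered, which is the point the comment makes.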


Great post with lots of details. I'm sure I'll try some of the tools mentioned.

We do something similar, multiple django sites and static sites on a single digital ocean droplet. We use docker for everything though, and have a single nginx listening on 80/443 that routes traffic to the sites by domain. Each site has a separate pg db and everything is glued together / configured / deployed via ansible.


Do you think I can do this without Docker? I'm afraid to use Docker; I don't get it. Why not just systemctl restart webservice1 or 2?


Of course you can. That approach makes different tradeoffs, of course, but it's entirely possible. I have a "legacy" server with ~20 small/medium, non-containerized applications running on it (Django, older Python frameworks, some Lisp & Scheme stuff, etc.), all running as system services and sitting behind an Nginx front-end.

But I can see the merits of containers, and might containerize these apps & services some day.


I'm doing something similar though I've opted specifically to _do_ use Kubernetes via k0s [0]. It works wonderfully well and allows me to use most things that are available in the k8s ecosystem like cert-manager [1] and external-dns [2]. All configuration is stored in Git and the server itself is basically disposable as it only runs k0s and some very basic stuff like iptables rules.

I see this sentiment quite a lot that k8s is too complex for small scale setups, but in my experience it scales down very well to small setups. Granted, you do have to know how to work with k8s, but once you learn that you can apply that knowledge to many different kinds of setups like k0s, bare metal, EKS/AKS etc.

[0] https://k0sproject.io/

[1] https://cert-manager.io/

[2] https://github.com/kubernetes-sigs/external-dns


I'm playing with k0s and it seems it doesn't play nice with firewalld.

With firewalld active, containers cannot do networking, not even with hosts in the same LAN.

Everything else works beautifully though.


I'm using iptables myself and it works fine, though you have to make sure that traffic on the kube-bridge interface is allowed.

With iptables:

  -A INPUT -i kube-bridge -j ACCEPT
  -A OUTPUT -o kube-bridge -j ACCEPT

Other than that I've configured iptables to drop all incoming traffic except a few whitelisted ports.


Do you have any estimates of how resource-hungry k0s is? I ran a few resource-constrained k3s clusters, where 25% of CPU was always spent on running k3s itself.


Very similar. I guess it's really k8s (the control plane) itself that is so resource intensive. Looking with top right now kube-apiserver, kubelet, kube-controller, kine and k0s use 13.5, 12.5, 5.6, and 3.0 % CPU respectively. Obviously it fluctuates quite a bit, but seems to be around 25-30% of 1 CPU core too. Also uses about 500-600mb of memory.

So yes, it definitely takes quite a bit of resources. I'm running this on 4 CPU cores and 6 GB memory, so 25% of 1 core and some 600mb of memory still leaves plenty of resources for the services. On a more philosophical note (as was mentioned below in this thread), it is a bit wasteful perhaps.


Just done a very quick search but it seems like k3s is the better choice so what did you like about k0s?


Curious what makes you think k3s is the better choice? The only reason I ended up going with k0s was that I had problems getting k3s working well behind a locked down firewall. With k0s that was pretty easy.


Is there a reason people don’t self-host on a small home server? $85/mo is a lot of money if you aren’t making any money off what you’re hosting. If you mostly run CRUD APIs and sites, is there any downside to a low bandwidth home connection? Multiple family members already stream Plex content from my measly 20mbit upstream. Why not run a whole site or multiples?


You could. If I did it, I would most likely go with Cloudflare Tunnel ( https://blog.cloudflare.com/tunnel-for-everyone/ ), which is free to use, since I would want to protect myself from a possible DDoS attack.

With that in place, I think it would be more than feasible to do it from your home network if you have a fast enough upload link.


I both maintain a server cabinet at home and rent VMs from providers. There are many reasons people don't self-host: noise, heat, dealing with hardware setup, etc. And if any of your services receives public traffic, you'll have to be very careful in configuring it because you don't want attackers to target your home network or the ISP/police to come at you for illegal traffic on your line.

$85/month is a lot of money, but given the number of services he runs on his server, each service now costs roughly $3/month, lower than the price of the cheapest DigitalOcean droplet.


I do both.

I have a server @home with Plex and few other services that are used mostly by myself and close relatives. Not something very public. It runs on a second-hand i7 NUC which is almost silent unless someone needs transcoding on content read from Plex. But it's "no warranty" services.

And I rent a $20 server at OVH on which I put more public stuff, like blogs for myself or others, to which I don't want to be linked _directly_. For instance, a friend's blog was under attack because some people didn't share her ideas. I was covered by OVH's anti-DDoS. If it had been on my home server, I probably would have had some _troubles_. Same goes for the seedbox hosted on it. I don't have to set up a VPN and put safeguards in place to make sure that not a single bit of data from the seedbox is shared without the VPN. I just run it exposing the public IP of the server and don't care. Worst case scenario is OVH taking down the server after (many) abuse reports.


We made a product called Hoppy Network (https://hoppy.network) that fits this use case perfectly. We use WireGuard tunnels to bypass all ISP restrictions, and provide a clean /32 IPv4 and /56 IPv6 using a native network interface. Some of our customers are on Starlink or cellular ISPs. We just launched our annual plans today, cheapest plan is $80/year. Plus, we're sending out a sticker-pack to our first 20 annual plan customers. For the purists out there, we don't filter ICMP packets and don't block any ports.

TLDR: You can self-host without your ISP knowing!


No need for CDN, if the connection is already setup and the server supports HTTP2 the files can be sent in parallel. TCP and SSL handshake back-n-forth will likely eat up the initial Atlantic latency. Also static content can be cached on the user device. And serving static content is well optimized in most web servers, so you do not need beefy hardware.


If you want to do something like this from home behind CGNAT, blocked 80/443 ports, etc, you'll probably need to set up tunneling as well:

https://github.com/anderspitman/awesome-tunneling


Nice writeup, and at the core not too different from what I'm doing myself (albeit with the server at home, not colo'd, and the specs are far more modest).

The only thing I'd change in your workflow, perhaps, is switching from docker CLI commands to docker-compose. That'd make things a lot more reproducible and easy to read, and if you group relevant containers into a compose file, they're also automagically networked together, which is handy.

Your "trick" of grouping TCP ports by the hundreds is something I might steal, I've been using $lastport+1 for a while and now there's a bit of guesswork involved whenever I need to do things that don't go through the proxy. Then again, that's not often, so I might leave it.
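
For anyone wanting to try the port-grouping idea, here is a small Python sketch of one way to do it: each application gets its own block of a hundred ports, and its services take fixed offsets inside that block. The base port, block size and function name are hypothetical choices for illustration, not taken from the article.

```python
BASE_PORT = 4000   # hypothetical first port of the first block
BLOCK_SIZE = 100   # one block of a hundred ports per application
MAX_PORT = 65535   # highest valid TCP port


def port_for(app_index: int, service_offset: int) -> int:
    """Return the port for a service inside an application's block.

    app_index: 0 for the first application, 1 for the second, and so on.
    service_offset: position of the service inside the block (0..99).
    """
    if app_index < 0:
        raise ValueError("app_index must be non-negative")
    if not 0 <= service_offset < BLOCK_SIZE:
        raise ValueError("service_offset must be within the block")
    port = BASE_PORT + app_index * BLOCK_SIZE + service_offset
    if port > MAX_PORT:
        raise ValueError("port out of range")
    return port


if __name__ == "__main__":
    # Third application (index 2): web server at offset 0, database at offset 1.
    print(port_for(2, 0), port_for(2, 1))
```

The benefit over $lastport+1 is that a port number tells you which application it belongs to without looking anything up.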


Using FreeBSD jails you can easily host hundreds if not thousands of web applications and services on a single server having the specs mentioned by the OP. This practice isn't even noteworthy in FreeBSD land as it is so common.


I went the opposite direction, 1 vm per service. It's much easier to secure, backup, restore thanks to full isolation.


Me too. Yesterday I was trying to add a second app (3 docker-compose services) to one of my 5$/mo droplets and it started impacting and slowing down the other app. Debugging was harder too and could affect both... Perhaps at some point I'll try it again.


Sounds like your server may have been swapping? Disk IO is notoriously bad on most VPSes, so make sure to disable swap and set up the OOM killer. On a small VPS, containerization overhead is also not negligible: you may consider setting everything up natively (in your VM) to counteract that.
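
A quick way to test the swapping theory on Linux is to look at /proc/meminfo. The Python sketch below parses that format and reports how much swap is in use; the sample values are made up for illustration, and nonzero swap usage is only a hint that the box is short on RAM, not proof that it is actively thrashing.

```python
from pathlib import Path

# Made-up sample in /proc/meminfo format, used when the real file is absent.
SAMPLE = """\
MemTotal:        1014720 kB
MemFree:           70112 kB
SwapTotal:       1048572 kB
SwapFree:         524284 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field: value} dict of integers."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kib(meminfo: dict) -> int:
    """Swap currently in use, in the file's kB units."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


if __name__ == "__main__":
    proc = Path("/proc/meminfo")
    text = proc.read_text() if proc.exists() else SAMPLE
    used = swap_used_kib(parse_meminfo(text))
    print(f"swap in use: {used / 1024:.0f} MiB")
```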


Yea, that makes sense! Ty


You should look into systemd-nspawn, VM experience without the overhead.


I hope you don't use the VM snapshotting feature as backup method? I had problems when running a database service on it for obvious reasons.


Nope, indeed. I use the native backup tool to create a dump, then use borg to save the backup in a remote location.


But that bloats your backups like crazy. Much better to use something like Ansible to set up the VMs and then just back up the data.


True about the backups. I'm thinking once I upgrade to 22.04, I'll be able to use virtio-fs for better efficiency (+ actual dedup!).


I have a similar setup and wrote up how to set everything up here

https://costapiy.com/deploy_django_project_linux_server/ and here https://github.com/costapiy/server_setup


I had the same problem and didn't want to manage things by hand, so I wrote Harbormaster:

https://gitlab.com/stavros/harbormaster

It basically pulls Compose apps from the git repositories you specify, builds the containers and makes sure they're running. Pretty simple and works really well for me.


I wonder what the environmental impact of a 24/7 running system with such high-end specs is. Desktops are worse with a graphics card (just yesterday I noticed my system's power consumption doubles when the GPU turns on: 20W with 100% CPU on all cores and WiFi stressed; 37W with additionally primusrun glxgears running), but desktops only run on demand. Dedicated non-mobile hardware doesn't scale to demand that well and has fairly high idle draw.

Don't get me wrong, my server also runs a ton of crap, like git/email/backups/irc/mysql/etc. plus a couple websites and some python scripts (3x doing continuous data logging, 2x providing some service), but it draws less than a traditional light bulb and also runs everything in linux containers. Unless you're doing large data processing tasks like machine learning on a weekly/daily basis, odds are you don't need more than two consumer-grade cores (I've got more since I repurposed an old laptop, but they're very rarely used). Large RAM is nice, though, I do have to say that. My 8GB is stretching it a little, considering that my favorite email server software is a Windows GUI thing and so I run virtualbox to have this guilty pleasure, which consumes about a quarter of the system's RAM just for one service.

Aside from Owncast, no idea what requirements that has, I think all of OP's services would be fine with a Raspberry Pi as well, just RAM would be tight with so many individual containers. Especially given the static site hosting, and doubly especially with that offloaded to a CDN, the only dynamic things (screenshot hoster, pastebin) are own use and won't see a dozen concurrent users continuously or anything.

Edit: I read over the -track API websites part. I don't know whether those might fit on a Pi as well, but that sounds like various people make use of it and this server is not just single user.


I run a home-server with pretty high spec and how much extra juice it has left over has been bothering me. It feels wasteful.

My server is hardly CPU-bound at all - the most intensive thing is the occasional transcoding for jellyfin. It does, however, use quite a bit of RAM for a couple of minecraft servers and ZFS. I'd really like some sort of preferably ARM or maybe RISC-V based server (in the future) that can take SATA and a good bit of RAM, but most of the SBCs I see would require that my drives work over USB, which is annoying, and they usually don't have more than 4/8 GB of RAM.


From what I saw, both Intel and AMD are trying to adopt hybrid CPUs that hopefully have a power consumption curve closer to M1 systems. So I hope that this improves in the future.


That's great, but from an ecological perspective, repurposing second-hand hardware will always have more benefits than producing new "green" hardware, as most of the energy spent across the lifecycle of a computer happens during production.

Buying a raspberry pi for selfhosting is certainly not greener than repurposing your old desktop/laptop. Although i have to admit the economic incentives are skewed due to foundries and global supply chains paying energy orders of magnitude cheaper than we pay our electricity.


Where did you read that? It doesn't sound implausible, but if we're talking 70W idle for a 24/7 server, that does add up so I'd be interested in where the cut-off point is.

Edit: just realized the 70W figure is in a sibling thread, not a (grand)parent, but I'm still interested regardless!


Some sources, not very recent but with great links to more detailed studies: https://css.umich.edu/factsheets/green-it-factsheet https://www.fcgov.com/climatewise/pdf/computers.pdf

Arguably, none of these account for the pollution due to extraction/refinement of materials, which is another concern raised with building more stuff. Recycling is also a polluting process (usually less so than new extraction) but unfortunately most electronics is never recycled and R&D in the hardware world is mostly focused on raw performance and does not even try to optimize for recyclability.

If any government cared at all about climate change, they would mandate hardware manufacturers to publish hardware sheets (for long-term maintenance and interoperability [0]), outlaw throwaway single-purpose computers (as opposed to flashable general-purpose computers [1]) and planned obsolescence, and invest massively in two key areas:

- green IT R&D: there are very few universities around the world working on this, and they operate on a shoestring budget

- public-service repair shops for electronics: it's ridiculously hard (or expensive) to find electronics repair services even in big cities, but having skilled people do a 1$ part change on a device that would otherwise be thrown away (eg. soldering a new micro-USB connector or changing a burnt capacitor) goes a long way toward extending the lifetime of existing devices

I'm interested if people have more links/resources to share on that topic!

[0] https://www.usenix.org/conference/osdi21/presentation/fri-ke...

[1] https://boingboing.net/2011/12/27/the-coming-war-on-general-...


have you come across Harun Šiljak's fantastic piece in the Science for the People magazine titled 'Opening This Article Voids Warranty'?

"Repair is not an auxiliary, optional part of the economy. Repair is care, at every stage of the supply chain, and at every scale. Those in dire need of repair for their devices are, just like Terry Pratchett’s Theory of Economic Injustice predicts, in the crosshairs of widening economic inequality.4 Already-impoverished consumers are encouraged to buy ever-newer products (and other planned obsolescence mechanisms have co-evolved with the boom of overproduction for the Global North). These products are notoriously hard to repair by design and require repeated purchases, exposing the long-term scheme of the manufacturers. Mineral extraction necessary for the manufacture of new hi-tech devices inflicts death and immiseration on surrounding populations.5 A community that sees no value to repair is a community that cannot respond to the crisis of capitalism, and is merely its hostage. Repair, as an act of reclaiming technology, is ongoing in the Global North and South with complementary driving forces and problems."

[...]

"A classic strategy of anti-repair design is avoiding modularity. If the parts of the device are strategically bundled together so that failure of one part requires replacement of a whole unit, it is not “repair” anymore. While it happens with mechanical components as well, the electronic version of this strategy is worthy of closer examination. Unlike a gearbox whose miniaturization still results in an assemblage of separate gears that might be smaller and harder to replace, miniaturization in electronics was driven by manufacture of monolithic semiconductor structures. Control systems that have previously been implemented with, for example, a collection of discrete transistors and diodes (basic electronic components that are easily replaceable) have been revamped as embedded systems: for the same functionality they now use a microchip with software running on it. Access to the software is not provided. Access to the chip itself is not provided as it is commonly “globbed” (covered with a black blob of epoxy). On top of this, the manufacturer takes precautions to prevent you from replacing the entire controller with a different, customized controller on your own. Here I return to the robotic arm: what kind of a controller do you want such a mechanism to have? The odds are that the same arm might be deployed in a myriad of different settings and scenarios, and needs tweaking of the controller. The “body without organs” controller under the blob of epoxy offers no modularity, no chance to expand, reduce, or in any other way customize inputs, outputs, or processing capabilities. The individual components that might be at a higher risk of damage (e.g. transistors) don’t exist anymore, so every failure takes the entire block of silicon down with it. 
And finally, if product support is discontinued, the manufacturer goes out of business, or chooses to change the business model into selling a service rather than a product, the controller is expected to become an unusable brick. To make things worse, by making non-standard interfaces with mechanics and/or placing restrictive licenses on its software, the brick is supposed to be irreplaceable, hence rendering the entire robotic arm unusable (even if the mechanics are perfectly sound).

The loss of repairability is not a consequence of technological progress or increasing complexity—it should arguably be the opposite. Complex systems science pioneer W. Brian Arthur explains the two primary mechanisms of improving a technology: “internal replacement,” or changing the existing parts; and “structural deepening,” which means adding new components.10 Neither of these require that new parts and components cannot be modular, replaceable, and repairable. Complexity, in fact, is all about modularity and heterogeneity and can be an argument in favor of repair. The concepts of internal replacement and structural deepening, if anything, are the philosophy of repair as a creative process. New parts or mechanisms that come from repair contribute to an invention: potential new applications of the device, a new approach to manufacturing, and personalization of the item. A creatively repaired device is where the social network merges with the technological one. However, that is not in the interests of the manufacturing lobby: this network is one of capital accumulation.

The other aforementioned strategy of disabling repair is the legal one. To keep this grip of capital on the technology and knowledge of manufacturing and repair, the opponents of repair create the illusion of illegality: Looking under the hood should be taboo, understanding how things work should be perceived as illegal, and the concept of patents and protection of intellectual property should be regurgitated as respect for science and protecting the world from anarchy. Big manufacturers such as Apple also grasp at other legal straws such as privacy concerns.11

Bogus legal barriers run against the basic principles of science and engineering. Take, for example, the concept of reverse engineering. Finding out how a piece of hardware or software works by observing its inputs and outputs is an essential part of repair in modern technological systems. Often portrayed as illegal, this activity does not go against trade secrets laws. Instead, it becomes an issue concerning terms and conditions agreed to by user and manufacturer.12 Among legal contracts, “terms and conditions” represent a world of their own, with clauses that are often void, unenforceable, or plain illegal.13 The “opening box voids warranty” stickers mentioned earlier are a blatant example, but not the only one. Through lobbying, manufacturers erect new legal barriers where previously there had been none: when the Unitron Mac 512, a Brazilian clone of the Apple Macintosh was developed in the mid-eighties, it infringed no laws in Brazil. Only after heavy lobbying from Apple and threats of sanctions from the US government did the Brazilian government introduce a completely new “software law” to stop the sale of the reverse-engineered Macintosh.14"

source: https://magazine.scienceforthepeople.org/vol24-2-dont-be-evi...


Wow, that's a great resource. Thank you very much for the link!


From the environment perspective, it also really matters where that 70W comes from generation-wise. If you're in an area that primarily uses Nuclear or things like Wind, Solar, Hydro, or Geothermal, that 70W of electricity isn't going to be particularly bad for the environment compared to places where it comes from Oil, Coal, or Natural Gas plants.


Generally true, although it's still a very rare exception if your energy is both green and could not reasonably have been used to reduce ongoing CO2e-intensive consumption. We still have a green energy shortage basically everywhere in the world (Iceland might be a rare exception, having green energy while not being conveniently near other countries without low-CO2e energy sources of their own).

I can use solar panels on my own roof to power an extravagant server, but it's still not really net zero (even if you would offset solar panel production and recycling) because they could have supplied to the still-very-fossil national grid if I hadn't been consuming it frivolously.

(Edit: I should maybe add that we don't all need to be saints all the time. My guilty pleasure is long showers, I enjoy those despite the power consumption being 10kW. I do better in other areas and vote with my wallet and ballot. I of course have no insight into what other energy choices you / OP / the reader makes so my comments are not meant as criticism towards any person.)


That would indeed be very welcome, both for the planet and simply from a normal end-user perspective!


I guess that in this very specific environment, the only practical reason against the Pis are the containers. Spotty support for ARM images seems to be a continued problem.


I didn't know that was a problem. I run all my services in containers (only service on the host is ssh) but it's all from Debian repositories and so would run on basically any architecture. I guess it depends on how specialized the software is that you require but what is built into Debian is quite extensive in my experience.


I briefly ran a home server on desktop grade hardware. Ryzen 5 with 2 hard drives. The thing was pulling a constant 70w while basically idle. Insanely inefficient compared to adding an extra user to an existing app/service.


Following this rationale, many people run small hosting coops from home, for example as part of the chatons.org or libreho.st federations. The author of the article even acknowledges they host some services for their sister and some other people.

That's a great way to mutualize resources (reduce waste) and at the same time is arguably much greener than datacenter/cloud services which require dedicated infrastructure (a great source of energy/pollution), and often have layers and layers of user tracking/profiling and detailed logging which consume great amounts of resources.


That sounds quite bad for "doing nothing", modern desktop class hardware can do quite a bit better than that.


> modern desktop class hardware can do quite a bit better than [70W idle]

If a €1200 built-from-parts desktop from March 2020 counts as modern desktop class hardware... this system uses 85W idle. It has an AMD CPU+GPU, is Intel and/or Nvidia any better in this regard or are you talking about more low-end hardware?


A lot of it is usually setup issues, i.e. either not enabling power saving features or hardware incompatibilities causing them to not be used. E.g. some mainboards ship with efficient power states disabled in BIOS.

EDIT: e.g. for an optimized system with Ryzen 9 5950X and RTX 2070 Super apparently 40 W is doable for "sitting idly on Windows Desktop" according to reputable tests. Lower-class hardware can do less.


"40W is doable" is honestly not what I expected from your comment (I got the impression of both less and it being the default), but then again you didn't say it was "good", just "better". Fair enough, but this achievable idle consumption is still double the peak consumption of a mid-range laptop. Either way, thanks for putting a number to it!


Ryzen 9 with 16 cores and fancy-ish GPU is not really a mid-range setup, as I said, you can go lower with weaker components. (the other numbers I had at hand quickly were from way below the 1200€ mentioned in the comment I replied to, and I thought it was a useful upper mark). And in reverse, 20W peak is low for a laptop - my Thinkpad 480s (which I guess you can argue is not a mid-range laptop, but not exactly a workstation beast either) peaks more at 40.

The desktop comparable to many laptops is more like a mini-ITX-based, embedded-graphics box with e.g. a quad-core CPU. There, idle <=15W is realistic. (And the laptop idles at ~5, so it is still more efficient, of course.) Components in between that and the high end land somewhere in between.

Desktop is going to be worse than laptops, but people tend to overestimate how much power a desktop system actually has to use if it's built well. Modern CPUs are efficient at power-saving, PSUs have gotten better, ...


I had to stick a little GPU in it since there were no built in graphics and the machine refused to boot without a GPU. That would explain some of the power. The rest I assume went to the fans and hard drives.


if you use a NUC you can get less than 30W easily.
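
To see how the idle figures quoted in this subthread add up over a year of 24/7 operation, here is a short Python sketch. The wattages are the ones mentioned above; the electricity price is a purely hypothetical placeholder, so substitute your own tariff.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical tariff, only there so the cost column has something to show.
PRICE_PER_KWH = 0.30


def annual_kwh(watts: float) -> float:
    """Energy used in one year by a device drawing `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts: float, price_per_kwh: float = PRICE_PER_KWH) -> float:
    """Yearly electricity cost for a constant draw, at the given price per kWh."""
    return annual_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    # Idle figures quoted in the comments above.
    for label, watts in [("desktop, 85 W", 85), ("desktop, 70 W", 70),
                         ("tuned desktop, 40 W", 40), ("NUC, 30 W", 30),
                         ("mini-ITX, 15 W", 15)]:
        print(f"{label}: {annual_kwh(watts):.1f} kWh/year, "
              f"cost {annual_cost(watts):.2f} at the assumed tariff")
```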


Out of interest, what is the email server? Your server's EHLO wasn't very revealing.


It's an old thing that I should probably not mention for opsec reasons since, honestly, it probably can get pwned (I would be surprised if no intelligence agency has a 0-day for it), and I keep thinking I should switch to linux-based solutions, but it just never floats to the top of my todo list. It's also a fairly big operation since I'd have to coordinate the switchover with some people and I have no idea how to go about IMAP message copying other than telling them to move it over in Thunderbird which seems error-prone.

If anyone has a good suggestion for an open source system where I can:

- Give people access to control their domains via a web interface (create accounts, set a catch-all, such things)

- Have aliases and server-side rules (e.g. blackhole email coming into spamaddr2021december@lucb1e.com; move From "bank" Subject "balance change*" Into inbox/notifications), preferably via the web interface

- Optionally: have IMAP and SMTP without a lot of moving parts (I've looked at typical Linux setups before and they're usually quite finicky)

Then please let me know!
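
To pin down what is meant by server-side rules, here is a minimal Python sketch of the matching logic: the first rule whose patterns all match decides what happens to the message. The rule structure, names and example.com addresses are hypothetical and do not describe any particular mail server.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


@dataclass
class Rule:
    action: str                   # "discard" or "move"
    to: str = "*"                 # glob pattern for the recipient address
    sender: str = "*"             # glob pattern for the From header
    subject: str = "*"            # glob pattern for the Subject header
    folder: Optional[str] = None  # target folder for "move"


def _matches(value: str, pattern: str) -> bool:
    """Case-insensitive glob match."""
    return fnmatchcase(value.lower(), pattern.lower())


def apply_rules(rules: List[Rule], to: str, sender: str,
                subject: str) -> Tuple[str, Optional[str]]:
    """Return (action, folder) of the first matching rule, else ("keep", "inbox")."""
    for rule in rules:
        if (_matches(to, rule.to) and _matches(sender, rule.sender)
                and _matches(subject, rule.subject)):
            return rule.action, rule.folder
    return "keep", "inbox"


# Hypothetical rules mirroring the two examples in the list above.
RULES = [
    Rule("discard", to="spamaddr2021december@example.com"),
    Rule("move", sender="*bank*", subject="balance change*",
         folder="inbox/notifications"),
]

if __name__ == "__main__":
    print(apply_rules(RULES, "me@example.com", "alerts@bank.example",
                      "Balance change on your account"))
```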


I’ve had this before with webmin and virtualmin (I think), but they do a lot more and effectively take over your whole server, so I’m not sure if that’s what you are looking for.


That should be fine if I can just run it in a container and simply use only the mail server part. I'll take a look, thanks!


> my favorite email server software is a Windows GUI thing

> [list of features]

hMailServer it is?

And regarding SMTP replacement... for personal use I moved to Fastmail years ago. It does everything listed except "control domains" - but does it in a reseller account (so you can have the control too). Of course it all costs money, so...


I guess mailcow would suit your needs. Two years ago I replaced an old OpenXchange setup with mailcow and it was fairly easy to switch over. It also supports IMAP copy from the admin UI.


What made you switch from OpenXchange?


I have a much simpler set-up, but I don't run everything on one server. I usually prefer having a separate VPS or dedicated server for each service, but this is starting to be a bit more costly than simply renting a powerful machine for everything.

One advantage of having everything scattered around is that not everything is going to go down at once (eg: a DDOS or even a traffic spike on your machine will kill all your sites/services if you only have one machine).

I use:

- Percona PMM[0] - Database/server monitoring

- Backups to BackBlaze B2 using cron jobs

- Simple deployments, usually using SFTP to upload files (usually for LAMP applications, also running Node.js apps using pm2 that can listen to file changes and restart the app)

I am actually working on creating some tools/scripts/guides to aid with self-hosting (for backing up stuff, uptime monitoring, alerting).


Nice! I do something similar for a few production applications, but I use docker compose to bring the services up, so I don’t have to create the networks manually or remember all the command arguments. Works great.


Is your 64GB memory ECC?

Why do you RAID-1 NVMe? They are likely to fail simultaneously so maybe make them independent and schedule regular backups from one volume to the other, or to a remote disk drive somewhere.


I don't think OVH offers any dedicated servers without ECC. Their cheaper offspring, Kimsufi (a separate company for quite a few years now), may, although I'm not sure.

Even the cheapest OVH dedicated server, starting at about 50 EUR / month, has 16 GB of RAM and it's ECC.


Many Kimsufi offerings are not fitted with ECC; I don't think some of the lower-end CPUs (the Atoms) even support it at all.

Their middle-budget brand SoYouStart seems to only offer machines with ECC. I started using one recently, so I assume you are right and the main brand operates likewise.


This seems to be the RAM they've provisioned it with: https://www.samsung.com/semiconductor/dram/module/M393A2K40B...

It doesn't look to be ECC.

> Why do you RAID-1 NVMe

That's what they set it up with when I first rented the server. What makes them likely to fail simultaneously, out of curiosity? I do back up everything irreplaceable as much as I can; I've not had to do it, but theoretically I should be able to rebuild everything from scratch with a day or two of effort.


Raid-1 is doing exactly what you recommend without any effort: a perfect replica of the disk. And if the other one dies, who cares? The beauty of raid-1 is you don't need the other one to have a full copy.


I think the idea here is that RAID1 forces both SSDs to write every block at the same time. With identical SSDs and very similar write endurance profiles you're likely to have them both give up at the same time.

Even just a nightly rsync would decorrelate what is right now nearly perfect correlation.
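
To put rough numbers on that argument, here is a minimal back-of-the-envelope sketch. It assumes wear is driven purely by bytes written, which is a simplification, and every figure in it (the endurance rating, the daily write volume, the share of writes that survives to the nightly sync) is hypothetical and chosen only for illustration.

```python
def days_until_worn(tbw_rating_tb: float, writes_per_day_tb: float) -> float:
    """Days until a drive reaches its rated write endurance at a constant write rate."""
    if writes_per_day_tb <= 0:
        raise ValueError("writes_per_day_tb must be positive")
    return tbw_rating_tb / writes_per_day_tb


# Hypothetical figures, not taken from any real drive or workload.
TBW_RATING_TB = 600.0      # assumed endurance rating of each SSD, in TB written
PRIMARY_WRITES_TB = 0.5    # assumed TB written per day to the primary drive
NET_CHANGE_FRACTION = 0.2  # assumed share of those writes still present at sync time

# RAID 1: every write goes to both members, so both wear at the same rate.
raid1_first = days_until_worn(TBW_RATING_TB, PRIMARY_WRITES_TB)
raid1_second = days_until_worn(TBW_RATING_TB, PRIMARY_WRITES_TB)

# Nightly rsync: the second drive only receives the net changes once a day.
rsync_primary = days_until_worn(TBW_RATING_TB, PRIMARY_WRITES_TB)
rsync_secondary = days_until_worn(
    TBW_RATING_TB, PRIMARY_WRITES_TB * NET_CHANGE_FRACTION
)

print(f"RAID 1 gap between wear-out dates: {abs(raid1_first - raid1_second):.0f} days")
print(f"rsync gap between wear-out dates:  {abs(rsync_secondary - rsync_primary):.0f} days")
```

Under this model the mirrored pair reaches its rated endurance on the same day, while the rsync target lags far behind. It says nothing about failures that are not wear-related.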


> identical SSDs ... you're likely to have them both give up at the same time

I wouldn't say much more likely than with traditional drives, unless you are getting towards EOL in terms of how much rewriting has been done, but after that much time I'd expect randomness to separate things out at least a bit.

The main concern I have with either drive type is finding out blocks that haven't been touched in ages have quietly gone bad, and you don't notice until trying to read them to rebuild the array once a failed drive has been replaced - that applies equally unless you run a regular verify. Other failure modes like the controller dying are less likely to happen concurrently, unless there is a power problem or some such in the machine of course, but again these might affect all drive types and this is one of the reasons you need proper backups as well as RAID (the age-old mantra: RAID is not a backup solution, RAID increases availability and reduces the chance you'll need to restore from backup).

Having said that, my home server deliberately has different drives (different controllers, highly unlikely that even if the memory on each is from the same manufacturer it is from the same batch) in its R1 mirror of SSDs, just in case. The spinning metal drives it also has in another array were bought in a way to decrease the chance of getting multiple from one batch in case it is a bad batch.

> nightly rsync

The problem with that and other filesystem level options is that depending on the filesystem and what you are running, some things might be missed due to file-locking. As RAID is block-device level this is never going to be the case, though of course in either case you can't catch what is in RAM and not yet physically written.

Of course this problem will be present for most off-device backup solutions too, so you could use the same mitigation you have there for backing up between the drives too.
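
For the "regular verify" point, a minimal file-level sketch of the idea: record a checksum for every file once, then re-read everything on a schedule so quietly damaged blocks are noticed before a rebuild depends on them. This is only a stand-in for an array-level check, not a replacement for one. The function names are made up for the example, and it only makes sense for data that is not supposed to change, since a legitimate edit also shows up as "changed".

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-read every file in the manifest and report anything missing or changed."""
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {relative}")
    return problems
```

Run `build_manifest` once after writing the data, store the result somewhere off the array, and call `verify_manifest` from a scheduled job; an empty list means every file still reads back with the digest recorded earlier.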


LVM snapshots of an xfs filesystem would do the same but without the jiggery-pokery of an rsync. It's atomic, too, iirc
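
A rough sketch of what that sequence could look like, written as a function that only builds the command lines and never runs them. The volume group, logical volume, snapshot name, size, mount point and destination are all hypothetical placeholders. Note that this version still uses rsync, but only to copy out of the frozen snapshot; the `nouuid` mount option is there because an XFS snapshot carries the same filesystem UUID as its origin.

```python
import shlex


def snapshot_backup_commands(
    vg: str,
    lv: str,
    destination: str,
    snap_name: str = "backup_snap",
    snap_size: str = "5G",
    mount_point: str = "/mnt/backup_snap",
) -> list[list[str]]:
    """Build (but do not run) the commands for a snapshot-based copy of one LV."""
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    return [
        # Create a point-in-time snapshot of the origin volume.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # Mount it read-only; nouuid avoids the duplicate-UUID refusal on XFS.
        ["mount", "-o", "ro,nouuid", snapshot, mount_point],
        # Copy the frozen view to the backup location.
        ["rsync", "-a", "--delete", f"{mount_point}/", f"{destination}/"],
        # Clean up so the snapshot does not fill up and get invalidated.
        ["umount", mount_point],
        ["lvremove", "--yes", snapshot],
    ]


if __name__ == "__main__":
    # Hypothetical names: volume group "vg0", logical volume "data".
    for command in snapshot_backup_commands("vg0", "data", "/backup/data"):
        print(shlex.join(command))
```

Printing the commands first makes it easy to review them before wiring anything into a cron job. The snapshot needs enough free space in the volume group to absorb writes made while the copy runs.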


Again, both NVMe modules are likely to fail simultaneously when used in a RAID-1 mirror on the same chassis, controller and PSU, under the same workload, especially if they are the same model and age.


I'm not sure this issue is significantly worse for SSDs compared to other drive types, except once they get really old and are close to EOL as defined by the amount written, though I'm happy to be proven wrong if you have some references where the theory has been tested.

If you are really worried about that, perhaps artificially stress one of the drives for some days before building the array so it is more likely to go first by enough time to replace and bring the array back from a degraded state?


Just add an event to calendar for $date+1y:

"Replace one of the SSD/NVMe drives in $server because I'm afraid of simultaneous failure (and probably still haven't configured backups for this server)"


> Just add an event to calendar for $date+1y

Unfortunately that won't wash if you are renting a server or colocating. Replacing a drive without it having already failed would most likely result in a charge for unnecessary hands-on maintenance time.

Though if you are expanding the storage anyway you could do it with an identically sized pair of drives, sync the existing array over to one of the new ones, drop the extra, then you have two unmatched drives for a new array. If using LVM you can join that to the existing VG or (less safe but better performing once you are done) you can try to reshape the two arrays into a stripeset for RAID1+0. And hope that the new drives are not from the same batch as your existing ones and have just been sat in a store cupboard for the last year.


Hehe, have been doing that on my Raspberry Pi 4 for years now

- Nextcloud

- Gitlab

- A bunch of websites hosted under different domains

- Matrix homeserver

And the load is still surprisingly low


> And the load is still surprisingly low

Which means that you have some extremely unpopular services running...? :) /s

(Happy new year!)

