Self hosting 10TB in S3 on a framework laptop and disks (jamesoclaire.com)
170 points by ddxv 13 hours ago | 83 comments




If it's just the mainboard and no screen, OP could put it in a dedicated case like the CoolerMaster one:

https://www.coolermaster.com/en-global/products/framework/


Here's a link to the case on the Framework marketplace:

https://frame.work/ca/en/products/cooler-master-mainboard-ca...

I put my original mainboard in one of these when I upgraded. It's fantastic. I had it VESA-mounted to the back of a monitor for a while which made a great desktop PC. Now I use it as an HTPC.


Those are pretty cool. I meant to highlight more, that the laptop has done super well. I can't even tell it's on as I hear no fan / no heat. I guess laptops are pretty good for this as they are great at sipping power when there is a low load.

Back in 2012 or so, I reused an old netbook (an Asus Eee PC) with an Atom CPU & 1GB of RAM, installed Ubuntu Server, and used it as a home server. It handled the printer, DNS-VPN proxying for streaming, and a few other things admirably for years. (And ironically was resilient to Spectre because its Atom CPU was before Intel added speculative execution)

Eventually, the thing that kicked the bucket was actually the keyboard (and later the fan started making "my car won't start" noises occasionally). Even the horribly-slow HDD (that handled Ubuntu Server surprisingly well) hadn't died yet.


What do you use self-hosted S3 for? I feel like all the use cases I can think of would be better served by a network attached file system.

A fair few things want blob object storage like S3. NFS does not scale to ridiculous levels horizontally or vertically. S3 does things like de-duplication and other funky tricks.

So if you want to use an app that needs S3 then you need to deploy S3 and not NFS.

I run a minio cluster (S3) for Veeam backups at work. I also run multiple NFS for Veeam and VMware datastores.

Tools for the job mate!


I'd rather go with an old Dell T30 and 2x10TB Seagate Exos in ZFS RAID1 mode (Mirror). This thing would make me nervous every day, even with a daily backup in place... While the Dell T30 would also make me nervous, you could at least plug the disks into any other device and are not wiring up everything with some easy to pull out cables ;)

However, garage sounds nice :-) Thanks for posting.


I've been using ZFS for quite a while, and I had a realization some time ago that for a lot of data, I could tolerate a few hours worth of loss.

So instead of a mirror, I've set up two separate one-disk pools, with automatic snapshots of the primary pool every few hours, which are then zfs send/recv to the other pool.

This gives me a lot more flexibility in terms of the disks involved (one could be an SSD and the other spinning rust, for example), at the cost of some read speed and potential uptime.

Depending on your needs, you could even have the other disk external, and only connect it every few days.

I also have another mirrored RAID pool for more precious data. However almost all articles on ZFS focus on the RAID aspect, while few talk about the less hardware demanding setup described above.
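
A minimal sketch of the snapshot-and-replicate cycle described above, assuming two single-disk pools with the hypothetical names `fast` and `slow` and snapshot names chosen by the caller. The `zfs send -RI ... | zfs recv -Fdu ...` pipeline follows the one posted further down this thread; the function name and the `DRY_RUN` switch are made up for illustration and are not part of ZFS.

```shell
#!/bin/sh
# replicate SRC_POOL DST_POOL PREV_SNAP NEW_SNAP
# Takes a recursive snapshot of SRC_POOL, then sends everything between
# PREV_SNAP and NEW_SNAP to DST_POOL.
# Set DRY_RUN=1 to print the commands instead of running them.
replicate() {
    src=$1; dst=$2; prev=$3; new=$4
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "zfs snapshot -r $src@$new"
        echo "zfs send -RI $src@$prev $src@$new | zfs recv -Fdu $dst"
        return 0
    fi
    # -R: include child datasets; -I: include intermediate snapshots.
    # recv -F rolls the destination back to its latest snapshot first,
    # -d keeps the dataset path, -u leaves the received datasets unmounted.
    zfs snapshot -r "$src@$new" &&
        zfs send -RI "$src@$prev" "$src@$new" | zfs recv -Fdu "$dst"
}
```

Run from cron every few hours, e.g. `replicate fast slow auto-0600 auto-1200`. The very first transfer has no previous snapshot, so it would need a full `zfs send -R` without `-I`.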


Interesting idea... thanks for sharing.

I have two setups.

1.) A mirror with an attached Tasmota power plug that I can turn on and off via curl to spin up a USB backup HD:

  curl "$TASMOTA_HOST/cm?cmnd=POWER+ON"
  # preparation and pool imports
  # ...
  # clone the active pool onto usb pool
  zfs send --raw -RI "$BACKUP_FROM_SNAPSHOT" "$BACKUP_UNTIL_SNAPSHOT" | pv | zfs recv -Fdu "$DST_POOL"
2.) A backup server that pulls backups via zsync (https://gitlab.bashclub.org/bashclub/zsync/), so that ransomware has no chance.

To prevent partial data loss I use zfs-auto-snapshot, zrepl or sanoid, which I configure to snapshot every 15 minutes and keep daily, weekly, monthly and yearly snapshots as long as possible.

To clean up my space when having too many snapshots, I wrote my own zfs-tool (https://github.com/sandreas/zfs-tool), where you can do something like this:

  zfs-tool list-snapshots --contains='rpool/home@' --required-space=20G --keep-time="30d"

That's a really cool idea and matches my use case well. I just copy pasted it to another person in this thread who was asking about the ZFS setup.

Your use case perfectly matches mine in that I wouldn't mind much about a few hours of data loss.

I guess the one issue is that it would require more disks, which at the current prices is not cheap. I was surprised how expensive it was when I bought them 6 months ago and was even more surprised when I looked recently and saw that the same drives cost even more now.


I opted to use a two disk mirror, and offline the slow disk. Hourly cronjob to online the slow disk, wait, and then offline it again.

Gives me the benefit of automatic fixes in the event of bit rot in any blocks more than an hour old too.


That sounds cool; is it possible to just query the ZFS system to know when it has finished synchronizing the slow disk, before bringing it offline again? Do you think that stopping and spinning the disk again, 24 times a day, is not going to cause much wear to the motors?

  zpool wait -t resilver <poolname>

That is another way, though annoying if you've set up automatic error reporting.
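
The hourly online / wait / offline cycle discussed above could be scripted roughly like this. It is only a sketch, assuming OpenZFS 2.0 or later, where `zpool wait -t resilver` blocks until resilvering has finished. The function name, the pool and device names, and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# resync_slow_disk POOL DEVICE
# Brings the slow mirror member online, waits for the resilver to finish,
# then takes it offline again.
# Set DRY_RUN=1 to print the commands instead of running them.
resync_slow_disk() {
    pool=$1; disk=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "zpool online $pool $disk"
        echo "zpool wait -t resilver $pool"
        echo "zpool offline $pool $disk"
        return 0
    fi
    # Stop at the first failure so a disk that never came online
    # is not reported as synchronized.
    zpool online "$pool" "$disk" &&
        zpool wait -t resilver "$pool" &&
        zpool offline "$pool" "$disk"
}
```

Called from an hourly cron job as, say, `resync_slow_disk tank sdb`. While the disk is offline the pool reports as degraded, which is what trips automatic error reporting.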

I’ve not heard of garage before but it looks quite interesting. I use s3 a lot for work but for homelab backups I’ve always just used borg on borgbase. Now I’m wondering whether I could use garage to pair a local node and AWS glacier for cheap redundancy of a large media library (I’m assuming that ~all of the reading is automatically done from the local node). TFA doesn’t really talk much about the actual experience of using garage - would love to hear more opinions from those who use it for self-hosting.

Edit: Realised you can’t use glacier since storage has to be mounted to the ec2 compute running the garage binary as a filesystem. So doesn’t really make sense as media library backup over just scheduling a periodic borg / restic backup to glacier directly.


Another alternative is ZeroFS[1], just store your stuff directly to S3.

[1] https://github.com/Barre/ZeroFS


That looks very interesting - will look into it, thanks.

I haven't needed to interact with Garage itself specifically. I've been using Boto3 / awscli / s3cmd / rclone for everything S3 API related and it's worked great. Garage took a few commands to set up, turn on, and get API keys configured, and has then been left to run on its own for the past 4 months.

So in that sense, I've loved it.
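
For anyone wondering what "awscli for everything" looks like against a self-hosted endpoint, here is a small sketch. It assumes Garage's S3 API is reachable at `http://localhost:3900` (the port used in Garage's quick-start configuration; adjust to your setup). The wrapper name `garage_s3`, the bucket name, and the `DRY_RUN` switch are made up; `--endpoint-url` is the standard AWS CLI option for pointing at a non-AWS endpoint.

```shell
#!/bin/sh
# Assumed endpoint of the local S3-compatible service; override via the environment.
GARAGE_ENDPOINT="${GARAGE_ENDPOINT:-http://localhost:3900}"

# garage_s3 ARGS...
# Runs "aws s3 ARGS..." against the local endpoint instead of AWS.
# Credentials and region come from the usual AWS CLI environment variables
# (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION).
# Set DRY_RUN=1 to print the command instead of running it.
garage_s3() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "aws --endpoint-url $GARAGE_ENDPOINT s3 $*"
        return 0
    fi
    aws --endpoint-url "$GARAGE_ENDPOINT" s3 "$@"
}
```

Typical calls would be `garage_s3 ls` or `garage_s3 sync ./results s3://my-bucket/results`, where `my-bucket` is a placeholder.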


I love Garage. It just works. I have Garage running on a few older Odroid HC2's, primarily for k8s Velero backup, and it's just set and forget.

Nice to see Garage mentioned. I was deciding between S3-compatible self-hosted alternatives and ended up choosing SeaweedFS. It seems to require less manual configuration compared to Garage

I'd like more elaboration on the technical side. Not literally how to do the same and what commands to use, but more along the lines of how the ZFS pools are configured, or whether Garage is opinionated and configures it all by itself. Are there mirrors in there? Or is it just individual pools that sync from some disks to others?

I have 2 USB disks and want to make a cheapo NAS but I always doubt between making a ZFS mirror, making 2 independent pools and use one to backup the other, or just go the alternate route and use SnapRAID and then be able to mix more older HDDs for maximum usage of the hardware I already own.


My understanding is that Garage is not opinionated and could easily have worked without ZFS. I installed ZFS in Ubuntu, and then later installed Garage.

As for the ZFS setup, I kept it simple and did RAID5/raidz1. I'm no expert in that, and have been starting to think about it again as the pool approaches 33% full.

I saw this comment in another thread here that sounded interesting as well by magicalhippo: "I've been using ZFS for quite a while, and I had a realization some time ago that for a lot of data, I could tolerate a few hours worth of loss. So instead of a mirror, I've set up two separate one-disk pools, with automatic snapshots of the primary pool every few hours, which are then zfs send/recv to the other pool."

This caught my attention as it matches my use case well. My original idea was that RAID5 would be good in case an HD fails, and that I would replicate the setup at another location, but the overall costs (~$1k USD) are enough that I haven't done that yet.


If you know where to look/are a little lucky, you can get an adequate RAID5 going for like $500-800 depending on the storage you need. I grabbed a QNAP 4 bay (no SSD caching) and 4x refurbished enterprise HDDs (14TB each) for just under $700 all-in last November if memory serves. Pretty reasonable for a 42TB RAID5 IMO.
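
The 42TB figure follows from the usual single-parity rule of thumb: one drive's worth of capacity goes to parity, so usable space is roughly (drives - 1) x drive size. A tiny sketch, with a made-up function name, that ignores filesystem overhead and the TB vs TiB difference:

```shell
#!/bin/sh
# raid5_usable_tb DRIVES SIZE_TB
# Approximate usable capacity of a single-parity array (RAID5 / raidz1).
raid5_usable_tb() {
    drives=$1; size_tb=$2
    echo $(( ($drives - 1) * $size_tb ))
}
```

`raid5_usable_tb 4 14` prints `42`, matching the setup above.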

Just wanted to share a quiet successful self hosting.

Does this JBOD consist of SSD? HDDs in that amount can be rather noisy.

Yeah they are HDs and are surprisingly noisy.

It's weird to me that "owning a computer that runs stuff" is now "self-hosting", just feels like an odd phrasing. Like there's an assumption that all computers belong to someone else now, so we have to specify that we're using our own.

Think services

You can own a computer and not run any services at all. Most people do.

Deciding to run your own services, like email, means a lot of work that most people aren’t interested or capable of doing.

It’s the difference between using your computer to consume things or produce things.


It’s not clear from the blog post if the S3 is accessible from outside their home. I agree with the parent that purely local services aren’t what typically counts as “self-hosting”.

We call it self hosting because it is typically hosted by someone else, get it?

Let's not kid ourselves that maintaining 10TB with resiliency handling and other controls built in is something that is trivial. It is only trivial due to the offerings that Cloud computing has made easy.

Self-hosting implies those features without the cloud element and not just buying a computer.


10tb fits on one disk though - it may not be trivial but it's not overly complicated setting up a raid-1. Off-site redundancy and backup of course does make it more complicated however.

And all of those things are more steps than "buying a computer".

Reminds me of the "Dropbox can be built in a weekend"


You can buy a 10TB+ external drive which uses RAID1.

You can also buy a computer with this — not a laptop, and I don't know about budget desktops, but on Dell's site (for example) it's just a drop-down selection box.


Moot point. It really depends on your expectations.

Self-hosting 10TB in an enterprise context is trivial.

Self hosting 10TB at home is easy.

The thing is: once you learn enough ZFS, whether you’re hosting 10 or 200TB it doesn’t change much.

The real challenge is justifying to yourself spending for all those disks. But if it’s functional to yourself spending hobby…


Very cool! I replaced my mainboard on my framework and am trying to convert it to a backup for my nas.

Could you talk a little more about your zfs setup? I literally just want it to be a place to send snapshots but I’m worried about the usb connection speed and the accidentally unplugging it and losing data



ZFS is RAM hungry, plus it doesn't like USB connections (as the article implied). So, I've been eyeing btrfs as a way to set up my NAS drives. Would I miss something in that setup?

Getting into S3 myself and really curious about what Garage has to offer vs the more mature alternatives like Minio. From what I gather, it kinda works better with small (a few kilobytes) files or something?

Minio recently started removing features from the community version. https://news.ycombinator.com/item?id=44136108

How awful. It seems to be a pattern nowadays?

Some former colleagues still using gitlab ce tell me they also removed features from their self-hosted version, particularly from their runners.


I loved minio until they silently removed 99% of the admin UI to push users towards the paid offering. It just disappeared one day after fetching the new minio images. The only evidence of the change online was discussions by confused users in the GitHub issues

I have also been considering this for some time. Been comparing MinIO, Garage, and Ceph. MinIO may not be wise given their recent moves, as another commenter noted. Garage seems ok but their git doesn’t show much activity these days so I wonder if it too will be abandoned. Which leaves us with Ceph. May have a higher learning curve but also offers the most flexibility as one can do object as well as block and file. Gonna set up a single node with 9 OSD’s soon and give it a go but always looking for input if anyone would like to provide some.

If I can reassure you about Garage, it's not at all abandoned. We have active work going on to make a GUI for cluster administration, and we have applied for a new round of funding for more low-level work on performance, which should keep us going for the next year or so. Expect some more activity in the near future.

I manage several Garage clusters and will keep maintaining the software to keep these clusters running. But concerning the "low level of activity in the git repo": we originally built Garage for some specific needs, and it fits these needs quite well in its current form. So I'd argue that "low activity" doesn't mean it's not reliable, in fact it's the contrary: low activity means that it works well for us and there isn't a need to change anything.

Of course implementing new features is another deal, I personally have only limited time to spend on implementing features that I don't need myself. But we would always welcome outside contributions of new features from people with specific needs.


I appreciate the response! Thanks for the update. I will continue keeping an eye on the project then and possibly give it a try. I have read the docs and was considering setting it up across two sites. The implementation seemed to address this pain point with distributed storage solutions and latency.

I've used Ceph in a home lab setting for 9 years or so now. Since cephadm it has gotten even easier to manage, even though it really was never that hard. A few pointers. First, no SMR drives: they have such bad performance that they can periodically drop out of the cluster. Second, no consumer SSDs/NVMe devices. You need power loss protection (PLP) on your drives. Ceph writes directly to the drive and ignores the cache; without PLP you may literally have slower performance than rust.

You also want fast networking, I just use 10Gbps. My nodes each are 6 rust and 1 NVMe drive each, 5 nodes. I colocate my MONs and MDS daemons with my OSDs, each node has 64GB of RAM and I use around 40GB.

Usage is RBD for a three node OpenStack cluster, and CephFS. I have about 424TiB between rust and NVMe raw.


The point about SMR drives cannot be stressed enough.

SMR drives are an absolutely shit-tier choice in terms of drives.


I have an ancient QNAP NAS (2015) which is on borrowed time and I’m trying to figure out what to replace it with. Keep going back and forth between rolling my own with a Jonsbo case vs. a prebuilt like the new Ubiquiti boxes. This is an attractive third option of a modest compute box (raspy, NUC, etc.) paired with a JBOD over USB. Can you still use something like TrueNAS with a setup like that?

Local storage should be like a home appliance, not something we build even though we can.

When things inevitably need attention it’s not about diy.


Thanks for the lead on Garage S3. Everyone's always recommending minIO and Ceph which are just not fun to work with.

Neat. Depending on your use case it might make sense. Still I wonder what they use for backup? For many use cases downtime is acceptable, but data loss is generally not. Did I miss it in the post?

OP here. I currently have some things synced to a cloud S3. The long term plan would be to replicate the setup at another location to take advantage of Garage regions/nodes, but I need to wait for the money for that.

What enclosure houses the JBOD?

Don't know about that one but can recommend Terramaster DAS, they don't cheap out on the controller. I have a d4-320 connected to my NUC.

With the metadata only on the internal drive, isn't this a SPOF?

Given that it's JBOD over USB I don't think this is aimed at redundancy

Yeah, this was an effort to get around cloud costs for large amounts of 'low value' data that I have but use in my other home servers for processing. I still sync some smaller result sets to an S3 in the cloud for redundancy as well as for CDN uses.

I thought zfs is doing the RAID.

It could be. Author didn't specify. zfs isn't inherently redundant or RAID so it may or may not have redundancy

i'd be stressed out while watering those plants.

Plants look very portable

The laptop is easy to repair, at least.

Why are you calling it S3? That is a proprietary Amazon cloud technology. Why not call it what it is, e.g. ZFS, file store, or object store? Let's not dilute terms.

That's a good point, it is S3-compatible object storage, not just S3. My experience with AWS S3 has shaped the way I use object storage, and since this project is synced to another S3-compatible object store using the S3 protocol, in my head I just call it all S3.

> Garage implements the Amazon S3 API and thus is already compatible with many applications.

https://garagehq.deuxfleurs.fr/


Yes, it's S3 API compatible, but it's not S3. The originally submitted article title misleads by claiming it's S3. There is no valid excuse.

Amazing, I will try Garage.

What brand of HDD did you use?


I went with IronWolf, likely due to price, though interestingly they are 25% more expensive than when I bought them six months ago.

Read up on backblaze hard drive reports. Great source of info

10TB, you could just mirror 2 drives with that, seen people serving 10PB at home by this point I'm sorry to say

I really don't get it. Do they host it on Amazon S3 or do they self-host it on a NAS?

They built an object storage system exposing an S3-compatible API, by using https://garagehq.deuxfleurs.fr/

Okay, weird to call it S3, if it is just object storage somewhere else. It's like saying "EKS" if you mean Kubernetes, or talking about "self hosting EC2" by installing qemu.

> weird to call it S3

I feel that is a bit of an unfair assessment.

AWS S3 was the first S3-compatible API provider; nowadays most cloud providers and a bunch of self-hosted software support S3(-compatible) APIs. Call it Object Store (which is a bit unspecific) or call it S3-Compatible.

EKS and EC2 on the other hand are a set of tools and services, operated by AWS for you - with some APIs surrounding them that are not replicated by any other party (at least for production use).


You can even do S3-on-ZFS

S3 is both a product and basically an API standard.

Garage talks the same S3 API.

GarageFS S3 compatibility https://garagehq.deuxfleurs.fr/documentation/reference-manua... vs

  - Openstack Swift
  - CEPH Object Gateway
  - Riak CS
  - OpenIO
SeaweedFS vs. JuiceFS https://juicefs.com/docs/community/comparison/juicefs_vs_sea...

It’s self-hosted, and self-hosted NASes can run the S3 storage protocol locally as well.

Yeah, that's pretty standard for object storage to be S3-compatible. I think azure blob is the only one that doesn't support it.

>>About 5 months ago I made the decision to start self hosting my own S3.

Is it eleven nines of durability? No. You didn't build S3. You built a cheapo NAS.


And won't be charged for ingress, egress or IOPS etc, it's better than bad, it's good. Happy times.

I think it's pretty obvious he's talking about the protocol not the amazon service...


