
I've used BTRFS exclusively for over a decade now on all my personal laptops, servers, and embedded devices. I've never had a single problem.

It's the flagship Linux filesystem: outside of database workloads, I don't understand why anybody uses anything else.


"Flagship"? I don't know a single person who uses it in production systems. It's the only filesystem I've lost data to. Ditto for friends.

Please go look up survivor bias. That's what all you btrfs fanboys don't seem to understand. It doesn't matter how well it has worked for 99.9% of you. Filesystems have to be the most reliable component in an operating system.

It's a flagship whose fsck requires you to contact developers to seek advice on how to use it because otherwise it might destroy your filesystem.

It's a flagship whose userspace tools, fifteen years in, are still seeing major changes.

It's a flagship whose design is so poor that fifteen years in the developers are making major changes to its structure and deprecating old features in ways that do not trigger an automatic upgrade or an informative error telling you to upgrade, but instead cause the filesystem to panic with error messages for which there is no documentation and little clue what the problem is.

No other filesystem has these issues.


Btrfs is in production all over the damn place, at big corporations and all kinds of different deployments. Synology has their own btrfs setup that they ship to customers with their NAS software for example.

I found it incredibly annoying the first time I ran out of disk space on btrfs, but many of these points are hyperbolic and honestly just silly. For example, btrfs doesn't really do offline fsck. fsck.btrfs has a zero percent chance of destroying your volume because it does nothing. As for the user space utilities changing... I'm not sure how that demonstrates the filesystem is not production ready.

Personally I usually use either XFS or btrfs as my root filesystem. While I've hit some snags with btrfs, I've never lost any data. I don't actually know anyone who has; I've only heard about it secondhand.

And it's not like other well-regarded filesystems have never run into data loss situations: even OpenZFS recently (about a year ago) uncovered a data-eating bug that called its reliability into question.

I'm sure some people will angrily tell me that actually btrfs is shit and the worst thing to ever be created and honestly whatever. I am not passionate about filesystems. Wake me up when there's a better one and it's mainlined. Maybe it will eventually be bcachefs. (Edit: and just to be clear, I do realize bcachefs is mainline and Kent Overstreet considers it to be stable and safe. However, it's still young and its upstream future has been called into question. For non-technical reasons, but still; it does make me less confident.)


    For example, btrfs doesn't really do offline fsck. fsck.btrfs has a
    zero percent chance of destroying your volume because it does nothing.
fsck.btrfs does indeed do nothing, but that's not the tool they were complaining about. From the btrfs-check(8) manpage:

    Warning

    Do not use --repair unless you are advised to do so by a
    developer or an experienced user, and then only after having
    accepted that no fsck can successfully repair all types of
    filesystem corruption. E.g. some other software or hardware
    bugs can fatally damage a volume.
    
    [...]
    
    DANGEROUS OPTIONS
    
    --repair
        enable the repair mode and attempt to fix problems where possible

        Note there’s a warning and 10 second delay when this option is
        run without --force to give users a chance to think twice
        before running repair, the warnings in documentation have
        shown to be insufficient

Yes, but that doesn't do the job that a fsck implementation does. fsck is something you stuff into your initrd to do some quick checks/repairs prior to mounting, but btrfs intentionally doesn't need those.
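Concretely, that's why a btrfs root's fstab entry conventionally sets the sixth field (fs_passno) to 0, so no boot-time fsck is run against it; the UUIDs below are made-up placeholders:

```
# <device>                                  <mount>  <type>  <options>          <dump>  <pass>
UUID=0a1b2c3d-1111-2222-3333-444455556666   /        btrfs   defaults,noatime   0       0
UUID=AAAA-BBBB                              /boot    vfat    defaults           0       2
```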

If you need btrfs-check, you have probably hit either a catastrophic bug or a hardware failure. This is not the same role fsck plays for some other filesystems. For what it's worth, ZFS is designed the same way and also has no fsck utility.

So whatever point was intended to be made was not, in any case.


>I don't actually know anyone who has, I've merely just heard about it.

Well "yarg", a few comments up in this conversation, says he lost all his data to it within the last year.

I've seen enough comments like that that I don't see it as a trustworthy filesystem. I never see comments like that about ext4 or ZFS.


Contrary to popular belief, people on a forum you happen to participate in are still just strangers. In line with popular belief, anecdotal evidence is not a good basis to form an opinion.

Exactly how do you propose to form an opinion on filesystem reliability then? Do my own testing with thousands of computers over the course of 15 years?

You don't determine what CPUs are fast or reliable by reading forum comments and guessing, why would filesystems be any different?

That said, you make a good point. It's actually pretty hard to quantify how "stable" a filesystem is meaningfully. It's not like anyone is doing Jepsen-style analysis of filesystems right now, so the best thing we can go off of is testimony. And right now for btrfs, the two types of data points are essentially: companies that have been using it in production successfully, and people on the internet saying it sucks. I'm not saying either of those is great, and I am not trying to tell anyone that btrfs is good by some subjective measure. I'm just here to tell people it's apparently stable enough to be used in production... because, well, it's being used in production.

Would I argue it is a particularly stable filesystem? No, in large part because it's huge. It's a filesystem with an integrated volume manager, snapshots, transparent compression and much more. Something vastly simpler with a lower surface area and more time in the oven is simply less likely to run into bugs.

Would I argue it is perfectly reasonable to use btrfs for your PC? Without question. A home use case with a simple volume setup is exceedingly unlikely to be challenging for btrfs. It has some rough edges, but I don't expect to be any more likely to lose data to btrfs bugs than to hardware failures. The bottom line is, if you absolutely must not lose data, having proper redundancy and backups is probably a much bigger concern than btrfs bugs for most people.


>You don't determine what CPUs are fast or reliable by reading forum comments and guessing, why would filesystems be any different?

Your premise is entirely wrong. How else would I determine what CPUs are fast or reliable? Buy dozens of them and stress-test them all? No, I use online sites like cpu-monkey.com that compare different CPUs' features and performance according to various benchmarks, for the performance part at least. For reliability, what way can you possibly think of other than simply aggregating user ratings (i.e. anecdotes)? If you aren't running a datacenter or something, you have no practical alternative.

At least for spinning-rust HDDs, the helpful folks at Backblaze have made a treasure trove of long-term data available to us. But this isn't available for most other things.

> It's not like anyone is doing Jepsen-style analysis of filesystems right now, so the best thing we can go off of is testimony.

This is exactly my point. We have nothing better, for most of this stuff.

>companies that have been using it in production successfully, and people on the internet saying it sucks

Companies using something doesn't always mean it's any good, especially for individual/consumer use. Companies can afford teams of professionals to manage stuff, and they can also make their own custom versions of things (esp. true with OSS code). They're also using things in ways that aren't comparable to individuals. These companies may be using btrfs in a highly feature-restricted way that they've found, through testing, is safe and reliable for their use case.

> It's a filesystem with an integrated volume manager, snapshots, transparent compression and much more. Something vastly simpler with a lower surface area and more time in the oven is simply less likely to run into bugs.

This is all true, but ZFS has generally all the same features, yet I don't see remotely as many testimonials from people saying "ZFS ate my data!" as I have with btrfs over the years. Maybe btrfs has gotten better over time, but as the American car manufacturers found out, it takes very little time to ruin your reputation for reliability, and a very long time to repair that reputation.


> Your premise is entirely wrong. How else would I determine what CPUs are fast or reliable? Buy dozens of them and stress-test them all? No, I use online sites like cpu-monkey.com that compare different CPUs' features and performance according to various benchmarks, for the performance part at least. For reliability, what way can you possibly think of other than simply aggregating user ratings (i.e. anecdotes)? If you aren't running a datacenter or something, you have no practical alternative.

My point is just that anecdotes alone don't tell you much. I'm not suggesting that everyone needs to conduct studies on how reliable something is, but if nobody has done the groundwork then the best thing we can really say is we're not sure how stable it is because the best evidence is not very good and it conflicts.

> Companies using something doesn't always mean it's any good, especially for individual/consumer use. Companies can afford teams of professionals to manage stuff, and they can also make their own custom versions of things (esp. true with OSS code). They're also using things in ways that aren't comparable to individuals. These companies may be using btrfs in a highly feature-restricted way that they've found, through testing, is safe and reliable for their use case.

For Synology you can take a look at what they're shipping, since they're shipping it to consumers. It does seem like they're not using many of the volume management features, instead using some proprietary volume management scheme at the block layer. Otherwise, however, there's nothing particularly special that I can see; it's just btrfs. Other advanced features like transparent compression are available and exposed in the UI.

(edit: Small correction. While I'm still pretty sure Synology has custom volume management for RAID which works on the block level, as it turns out, they are actually using btrfs subvolumes as well.)

I think the Synology case is an especially interesting bit of evidence because it's gotta be one of the worst cases of shipping a filesystem, since you're shipping it to customer machines you don't control and can't easily inspect later. It's not the only case of shipping btrfs to the customer either, I believe ChromeOS does this and even uses subvolumes, though I didn't actually look for myself when I was using it so I'm not actually 100% sure on that one.

> This is all true, but ZFS has generally all the same features, yet I don't see remotely as many testimonials from people saying "ZFS ate my data!" as I have with btrfs over the years. Maybe btrfs has gotten better over time, but as the American car manufacturers found out, it takes very little time to ruin your reputation for reliability, and a very long time to repair that reputation.

In my opinion, ZFS and other Solaris technologies that came out around that time period set a very high bar for reliable, genuinely innovative system features. I think we're going to have to live with the fact that just having a production-ready filesystem dropped onto the world is not going to be the common case, especially in the open source world: the filesystem will need to go through its growing pains in the open.

Btrfs has earned a reputation as the perpetually-unfinished filesystem. Maybe it's tainted and it will simply never approach the degree of stability that ZFS has. Or, maybe it already has, and it will just take a while for people to acknowledge it. It's hard to be sure.

My favorite option would be if I just simply don't have to find out, because an option arrives that quickly proves itself to be much better. bcachefs is a prime contender since it not only seems to have better bones but it's also faster than btrfs in benchmarks anyways (which is not saying much because btrfs is actually quite slow.) But for me, I'm still waiting. And until then, ZFS is not in mainline Linux, and it never will be. So for now, I'm using btrfs and generally OK recommending it for users that want more advanced features than ext4 can offer, with the simple caveat that you should always keep sufficient backups of your important data at all times.

I only joined in on this discussion because I think that the btrfs hysteria train has gone off the rails. Btrfs is a flawed filesystem, but its flaws are vastly overstated every time it comes up. It's just, simply put, not that bad. It does generally work as expected.


>Synology has their own btrfs setup that they ship to customers with their NAS software for example.

Synology infamously/hilariously does not use btrfs as the underlying file system because even they don't trust btrfs's RAID subsystem. Synology uses LVM RAID that is presented to btrfs as a single drive. btrfs isn't managing any of the volumes/disks.


Their reason for not using btrfs as a multi-device volume manager is not specified, though it's reasonable to infer that it is because btrfs's own built-in volume management/RAID wasn't suitable. That's not really very surprising: back in ~2016 when Synology started using btrfs, these features were still somewhat nascent even though other parts of the filesystem were starting to become more mature. To this day, btrfs RAID is still pretty limited, and I wouldn't recommend it. (As far as I know, btrfs RAID5/6 is even still considered incomplete upstream.) On the other hand, btrfs subvolumes as a whole are relatively stable, and that and other features are used in Synology DSM and ChromeOS.

That said, there's really nothing particularly wrong with using btrfs with another block-level volume manager. I'm sure it seems silly since it's something btrfs ostensibly supports, but filesystem-level redundancy is still one of those things that I think I would generally be afraid to lean on too hard. More traditional RAID at the block level is simply going to be less susceptible to bugs, and it might even be a bit easier to manage. (I've used ZFS raidz before and ran into issues/confusion when trying to manage the zpool. I have nothing but respect for the developers of ZFS but I think the degree to which people portray ZFS as an impeccable specimen of filesystem perfection is a little bit unrealistic, it can be confusing, limited, and even, at least very occasionally, buggy too.)


>That's not really very surprising: back in ~2016 when Synology started using btrfs, these features were still somewhat nascent even though other parts of the filesystem were starting to become more mature.

btrfs was seven years old at that point and declared "stable" three years before that.

ZFS is an example of amazingly written code by awesome engineers. It's simple to manage, scales well, and easy to grok. btrfs sadly will go the wayside once bcachefs reaches maturity. I wouldn't trust btrfs for important data, and neither should you. If you experience data loss on a Synology box, the answer you'll get from them is "tough shit, hope you have backups, and here's a coupon for a new Synology unit."


> btrfs was seven years old at that point and declared "stable" three years before that.

The on-disk format was declared stable in 2013[1]. That just meant that barring an act of God, they were not going to break the on-disk format, e.g. a filesystem created at that point would continue to be mountable for the foreseeable future. It was not necessarily a declaration that the filesystem itself was now stable, and it especially was not suggesting that all of the features were stable. (As far as I know, many features still carried warning labels.)

Furthermore, the "it's been X years!" refrain about open source projects has to stop. This is the same nonsense that happens with every other thing that is developed in the open. Who cares? What matters isn't how long it took to get here. What matters is where it's at. I know there's going to be some attempt at rationalizing this bit, but it's wasted on me because I'm tired of hearing this.

> ZFS is an example of amazingly written code by awesome engineers. It's simple to manage, scales well, and easy to grok.

Agreed. But ZFS was written by developers at Sun Microsystems for their commercial UNIX. We should all be gracious to live in a world where Sun Microsystems existed. We should also accept that Sun Microsystems is not the standard any more than Bell Labs was the standard, they are extreme outliers. If we measure everything based on whether it's as good as what Sun Microsystems was doing in the 2000s, we're going to have a bad time.

As an example, DTrace is still better than LTTng is right now. I hope that sinks in for everyone.

However, OpenZFS is not backed by Sun Microsystems, because Sun Microsystems is dead. Thankfully and graciously at that, it has been maintained for many years by volunteers, including at least one person who worked on ZFS at Sun. (Probably more, but I only know of one.)

Now if OpenZFS eats your data, there is no big entity to go to any more than there is for btrfs. As far as I know, there's no big entity funding development, improvements, or maintenance. That's fine; that's how many filesystems are. But still, that's not what propelled ZFS to where it stood when Sun was murdered.

> btrfs sadly will go the wayside once bcachefs reaches maturity.

I doubt it will disappear quickly: it will probably continue to see ongoing development. Open Source is generally pretty good at keeping things alive in a zombie state. That's pretty important since it is typically non-trivial to do online conversion of filesystems. (Of course, we're in a thread about a tool that does seamless offline conversion of filesystems, which is pretty awesome and impressive in and of itself.)
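For readers who haven't used it, btrfs-convert is one such offline converter. A rough sketch of the workflow, assuming /dev/sdX1 is a placeholder for an unmounted ext4 device and that you have backups first:

```
# Convert in place; the original ext4 metadata is preserved
btrfs-convert /dev/sdX1

# Not happy? Roll back to the untouched ext4 filesystem:
btrfs-convert -r /dev/sdX1

# Happy? Mount it and delete the saved image to reclaim space
# (path assumes the volume is mounted at /mnt):
btrfs subvolume delete /mnt/ext2_saved
```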

But for what it's worth, I am fine with bcachefs supplanting btrfs eventually. It seems like it had a better start, it benchmarks faster, and it's maturing nicely. Is it safer today? Depends on who you ask. But it's hard to deny the trajectory: bcachefs will likely be considered stable by most within a year or two at the outside, assuming kernel drama doesn't hold back upstream.

Should users trust bcachefs with their data? I think you probably can right now with decent safety, if you're using mainline kernels, but bcachefs is still pretty new. I'm not aware of anyone using it in production yet. It could really use a bit more time before recommending people jump over to it.

> I wouldn't trust btrfs for important data, and neither should you.

I stand by my statement: you should always ensure you have sufficient backups for important data, but most users should absolutely fear hardware failures more than btrfs bugs. Hardware failures are a when, not an if: hardware will always fail eventually. Data-eating btrfs bugs have certainly existed, but it's not like they just appear left and right. When such a bug appears, it is often newsworthy, and usually has to do with some unforeseen case that you are not so likely to run into by accident.

Rather than lose data, btrfs is instead more likely to just piss you off by being weird. There are known quirks that probably won't lose you any data, but that are horribly annoying. It is still possible, to my knowledge, to get stuck in a state where the filesystem is too full to delete files and the only way out is in recovery. This is pretty stupid.

It's also not particularly fast, so if someone isn't looking for a feature-rich CoW filesystem with checksums, I strongly recommend just going with XFS instead. But if you run Linux and you do want that, btrfs is the only mainline game in town. ZFS is out-of-tree and holds back your kernel version, not to mention you can never really ship products using it (with Linux) because of silly licensing issues.

> If you experience data loss on a Synology box, the answer you'll get from them is "tough shit, hope you have backups, and here's a coupon for a new Synology unit."

That suggests that their brand image somewhat depends on the rarity of btrfs bugs in their implementation, but Synology has a somewhat good reputation actually. If anything really hurts their reputation, it's mainly the usual stuff (enshittification.) The fact that DSM defaults to using btrfs is one of the more boring things at this point.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...


I agree with what you say, and I would never trust btrfs with my data because of issues that I've seen in the past. At my last job I installed my Ubuntu desktop with btrfs, and within three days it had been corrupted so badly by a power outage that I had to completely wipe and reinstall the system.

That said:

> but cause the filesystem to panic with error messages for which there is no documentation and little clue what the problem is.

The one and only time I experimented with ZFS as a root filesystem I got bit in the ass because the zfs tools one day added a new feature flag to the filesystem that the boot loader (grub) didn't understand and therefore it refused to read the filesystem, even read-only. Real kick in the teeth, that one, especially since the feature flag was completely irrelevant to just reading enough of the filesystem for the boot loader to load the kernel and there was no way to override it without patching grub's zfs module on another system then porting it over.

Aside from that, ZFS has been fantastic, and now that we're all using UEFI and our kernels and initrds are on FAT32 filesystems I'm much less worried, but I'm still a bit gunshy. Not as much as with BTRFS, mind you, but somewhat.
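For what it's worth, OpenZFS later grew a mitigation for exactly this failure mode: the pool-level compatibility property (OpenZFS 2.1+) pins a pool to a named feature set so the tools won't enable flags a consumer like GRUB can't read. Pool and device names here are placeholders:

```
# Create a boot pool restricted to GRUB-readable feature flags:
zpool create -o compatibility=grub2 bpool /dev/sdX2

# Later 'zpool upgrade' calls will only enable features in that set.
```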


I lost data on btrfs on a Raspberry Pi with a slightly dodgy PSU.

We need more testing of filesystems under sudden power loss.

I switched to a NAS with battery backup and it's been better.

So that was inconclusive; before that, the last time I lost data like that was to ReiserFS in the early 2000s.


> Please go look up survivor bias. That's what all you btrfs fanboys don't seem to understand. It doesn't matter how well it has worked for 99.9% of you. Filesystems have to be the most reliable component in an operating system.

Not sure. It's useful if they are reliable, but they only need to be roughly as reliable as your storage media. If your storage media breaks down once in a thousand years (or once a year for a thousand disks), then it doesn't matter much if your filesystem breaks down once in a million years or once in a trillion years.

That being said, I had some trouble with BTRFS.


Meta (Facebook) has millions of instances of Btrfs in production. More than any other filesystem by far. A few years ago when Fedora desktop variants started using Btrfs by default, Meta’s experience showed it was no less reliable than ext4 or XFS.

I couldn't disagree more: I've worked with lots of embedded devices running systemd, and it solves many more problems than it introduces. The community is also quite responsive and helpful in my experience.

I won't pretend there aren't occasional weird problems... but there's always a solution, here's a recent example: https://github.com/systemd/systemd/issues/34683

Memory use is irrelevant to me: every embedded Linux device I've been paid to work on in the past five years had over 1GB of RAM. If I'm on a tiny machine where I care about 8MB RSS, I'm not running Linux, I'm running Zephyr or FreeRTOS.


> every embedded Linux device I've been paid to work on in the past five years had over 1GB of RAM. If I'm on a tiny machine where I care about 8MB RSS, I'm not running Linux, I'm running Zephyr or FreeRTOS

The gap between “over 1GB of RAM” and 8MB RSS contains the vast majority of embedded Linux devices.

I, too, enjoy when the RAM budget is over 1GB. The majority of cost constrained products don’t allow that, though.

That said, it’s more than just RAM. It increases boot times (mentioned in the article), which is a pretty big deal on certain consumer products that aren’t always powered on. The article makes some good points that you’ve waved away because you’ve been working on a different category of devices.


> The gap between “over 1GB of RAM” and 8MB RSS contains the vast majority of embedded Linux devices

The gap between 16MB RAM and 64MB RAM doesn't exist, though. Literally doesn't, the components have the same cost down to a cent in the BOM.

And if you can have 64MB, then systemd's own true memory use (around 3-4MB) is completely immaterial.
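A quick way to sanity-check that number on any Linux box (on a systemd system PID 1 is systemd itself; elsewhere it's whatever init you run):

```shell
# Report the resident set size of PID 1, converted from kB to MB
awk '/^VmRSS/ {printf "PID 1 RSS: %.1f MB\n", $2/1024}' /proc/1/status
```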


Except that, thanks to the availability crisis that hit the industry over the past decade, you sometimes have to go with the 4MB part.

Just look at wifi routers: in the USA and China they are all sold with 64 or 128MB of RAM. In South America and Europe they are all 16 or 32 for no clear reason.


Do you have some examples? I have a very hard time imagining a modern wifi router supporting the latest standards and IPv6, an admin web interface and so on running on 16 MB of RAM. I also take issue with "wifi routers in Europe are all 16 or 32 MB of RAM". In what decade?

My ISP provided router also does VPN, VoIP, mesh networking, firewalling, and it's towards the lower end of feature set (as it's offered for free and not a fancy router I bought).

Are you talking about devices from the early 2000?


My TP-Link MR3020 from around 2015 only has 4^H 16 MB of RAM (4 MB flash) and thus cannot even run OpenWrt anymore.


That thing had already been overstaying its welcome for two years even then by staying on WiFi 4; 802.11n was adopted 15 years ago.


I’ve still got two MR3040. TP-Link hasn’t released any update for them in years. You can run an older version of OpenWrt on them, but there’s no real point. These things don’t even support 5GHz WiFi.


I've got a few devices that only support 2.4 B/G. They're not in common use, but using an equally legacy router is the only way for them to connect.


A device with a very nice design. I still keep it as a decoration even though it's a brick.


2015 was also almost 10 years ago.


So I guess it's a brick now and there is nothing we can do about it.


Using something that's already been produced (good) is not the same as selling dead end e-waste that's so underspecced it's barely working new (bad).


every. single. one.

Pick any modem from Linksys or D-Link or Netgear, then buy one in South America and compare what's really inside.

Look at all the revB entries on the OpenWrt wiki: sometimes the RAM drops, sometimes the ARM CPU changes to MediaTek, and often the wifi chip changes from Qualcomm to RTL. And it's always the revisions sold outside of the USA and China in the observation fields.


> south America and Europe they are all 16 or 32 for no clear reason

I don't know where you're getting your data from but it's clearly wrong or outdated. These are the most often sold routers in Czechia on Alza (the largest online retailer) under $100:

- TP-Link Archer AX53 (256MB)

- TP-Link Archer AX23 (128MB)

- TP-Link Archer C6 V3.2 (128MB)

- TP-Link Archer AX55 Pro (512MB?)

...

- Mercusys MR80X (256MB)

- ASUS RT-AX52 (256MB)

https://www.alza.cz/EN/best-sellers-best-wifi-routers/188430...


"Best sellers" usually means "best advertised because of worst sales".


> The gap between “over 1GB of RAM” and 8MB RSS contains the vast majority of embedded Linux devices.

Of all currently existing Linux devices running around the world right this moment? Maybe.

But of new devices? Absolutely not, and that's what I'm talking about.

> The majority of cost constrained products don’t allow that, though.

They increasingly do allow for it, is the point I'm trying to make.

And when they don't: there are far better non-Linux open source options now than there used to be, which are by design better suited to running in constrained environments than a full blown Linux userland ever can be.

> It increases boot times (mentioned in the article) which is a pretty big deal on certain consumer products that aren’t always powered on. The article makes some good points that you’ve waved away because you’ve been working on a different category of devices.

I've absolutely worked on that category of devices, I almost never run Linux on them because there's usually an easier and better way. Especially where half a second of boot time is important.


> But of new devices? Absolutely not, and that's what I'm talking about.

The trouble with "new" is that it keeps getting old.

There would have been a time when people would have said that 32MB is a crazy high amount of memory -- enough to run Windows NT with an entire GUI! But as the saying goes, "what Andy giveth, Bill taketh away". Only these days the role of Windows is being played by systemd.

By the time the >1GB systems make it into the low end of the embedded market, the systemd requirements will presumably have increased even more.

> there are far better non-Linux open source options now than there used to be, which are by design better suited to running in constrained environments than a full blown Linux userland ever can be.

This seems like assuming the conclusion. The thing people are complaining about is that they want Linux to be good in those environments too.


> There would have been a time when people would have said that 32MB is a crazy high amount of memory

Those days are long gone though, for better or worse.

We live in the 2020s now and RAM is plentiful. The small computers we all carry in our pockets (phones) usually have between 4 and 16 GB of RAM.


That's entirely the point. In the days of user devices with 32MB of RAM, embedded devices were expected to make do with 32KB. Now we have desktops with 32GB and the embedded devices have to make do with 32MB. But you don't get to use GB of RAM now just because embedded devices might have that in some years time, and unless something is done to address it, the increase in hardware over time doesn't get you within the budget either because the software bloat increases just as fast.

And the progress has kind of stalled:

https://aiimpacts.org/trends-in-dram-price-per-gigabyte/

We've been stuck at ~$10/GB for a decade. There are plenty of devices for which $10 is a significant fraction of the BOM and they're not going to use a GB of RAM if they can get away with less. And if the hardware price isn't giving you a free ride anymore, not only do you have to stop the software from getting even bigger, if you want it to fit in those devices you actually need it to get smaller.


I recently looked up 2x48GB RAM kits and they are around 300€ and more for the overclockable ones. That is 3€ per GB and this is in the more expensive segment in the market since anyone who isn't overclocking their RAM is fine using four slots.


The end of that chart is in 2020 and in the interim the DRAM makers have been thumped for price fixing again, causing a non-trivial short-term reduction in price. But if this is the "real" price then it has declined from ~$10/GB in 2012 to, let's say, $1/GB now, a factor of 10 in twelve years. By way of comparison, between 1995 and 2005 (ten years, not twelve) it fell by a factor of something like 700.

You can say the free lunch is still there, but it's gone from a buffet to a stick of celery.


> We live in the 2020s now and ram is plenty. The small computers we all carry in our pockets (phones) usually have between 4 and 16g GB ram.

I do not think the monster CPUs running Android or iOS nowadays are representative of embedded CPUs.

RAM still requires power to retain its contents. In devices that sleep most of the time, decreasing the amount of RAM can be the easiest way to increase battery life.

I would also think many of the small computers inside my phone have less memory. For example, there is probably at least one CPU inside the phone module, a CPU doing wear leveling inside the flash memory modules, a CPU managing the battery, a CPU in the fingerprint reader, etc.


> It increases boot times

Is that really the case? On desktops it is significantly faster than all the other alternatives. Of course, if you know your hardware there is no need for discovering stuff and the like, but I don't know. I would be interested in real-life experiences, because to me systemd's boot time was always way faster than supposedly simpler alternatives.


When Arch Linux switched to systemd, my laptop (with an HDD) boot times jumped from 11 seconds to over a minute. That 11 seconds was easy to achieve in Arch’s config by removing services from the boot list and marking some of the others to be started in parallel without blocking others. After the switch to systemd there was no longer such a simple list in a text file, and when asked for the list systemd would produce such a giant graph that I had no energy to wade through it and improve things.

Later, when I got myself a laptop with an SSD, I discovered that what my older Arch configuration could do on an HDD is what systemd could do only with an SSD.


I switched to systemd when Arch switched and from the get go, it was massively easier to parallelise with systemd than with the old system and that was with an HDD.

Systemd already parallelises by default, so I don't know what insanely strange things you were doing, but I fail to see how it could bring boot time from 11s to 1 minute. Also, it's very easy to get a list of every enabled service with systemctl (systemctl list-unit-files --state=enabled), so I don't really know what your point about a giant graph is.


Running things in parallel isn't going to make the disk faster... with a HDD I'd think it is actually even more likely to make the disk slower.


We don’t have to talk in hypotheticals here. Booting time benchmarks from the time systemd was released are everywhere and showed shorter boot times. It was discussed ad nauseam at the time.


Arch changed to systemd in 2012, at which point systemd was 2 years old. It surely had quite a few growing pains, but I don't think that's representative of the project. In general it was the first init system that could properly parallelize, and as I mentioned, it is significantly faster on most desktop systems than anything.


> In general it was the first init system that could properly parallelize

I'm not sure what you mean by "properly" but didn't initng and upstart (and probably some others I can't recall) do the parallel stuff before systemd?


It was only faster if you started with bloated Red Hat systems to begin with. But yes, it was the beginning of parallelism in init...

The "faster boot" you're remembering was actually a joke at the time. Since the team working on it were probably booting VMs all the time, the system was incredibly aggressive on shutdown, and that was the source of it: it reboots so fast because it just throws everything out and reboots. I don't really care much for the jokes, but that is why everyone today remembers "systemd is fast".


It mandates strict session termination, unlike the unsustainable wild west approach of older Unix systems. Proper resource deallocation is crucial for modern service management. When a user exits without approval of "lingering user processes," all their processes should be signaled to quit and subsequently killed.


I think the "unsustainable wild west" of sending SIGTERM, waiting, and then sending SIGHUP was very good because it was adaptable (you were on your own if you had non-standard stuff, but at least you could expect a contract).

Nowadays if you start anything more serious from your user session (e.g. a qemu VM from your user shell) it will get SIGHUP ASAP on shutdown, because systemd doesn't care about non-service PIDs. But oh well.

...which is where the jokes about "systemd is good for really fast reboots" came from mostly.


The old way has literally no way to differentiate between a frozen process and one that just simply wants to keep on running after the session's end, e.g. tmux, screen.

It's trivial to run these as a user service, which can linger afterwards. Also, systemd has a configurable wait time before it kills a process (the "dreaded" 2-minute timer is usually something similar).
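For anyone who hasn't tried this, a user unit plus lingering covers the tmux/screen case. A minimal sketch (the unit name and tmux paths are illustrative, not canonical):

```
# Hypothetical ~/.config/systemd/user/tmux.service
[Unit]
Description=tmux server that survives logout

[Service]
Type=forking
ExecStart=/usr/bin/tmux new-session -d
ExecStop=/usr/bin/tmux kill-server

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user enable --now tmux.service`, then allow the user's manager to outlive logins with `loginctl enable-linger $USER`.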


Which was fine for everything that didn't need a watchdog. systemd, on the other hand, still lacks the most common use cases, and people bend over backwards to implement them with what's available. Ask distro maintainers who know the difference between the main types of service files...


I mean, this anecdote only tells us that if something is configured poorly it will behave poorly.

If you're working in the embedded space it's surely worth a little bit of time to optimize something like this.


Same for me. Incredibly useful on the bloated sensors running Linux.

All the smaller systems with no RAM run on baremetal anyway. There's no room and no need to run Linux or a threaded RTOS. Much less security headaches also.


> every embedded Linux device I've been paid to work on in the past five years had over 1GB of RAM

That is almost by definition not an embedded device. There's a reason we have vfork().


There are two types of embedded systems: those that ship 1 million units, and those that ship 50. If you're shipping 1M units, you need to optimize RAM size, but if you're only shipping a few, then it's not worth squeezing everything down as long as it doesn't break your power/cost target. There's a ton of devices out there that literally just use a cheap smartphone as an "embedded" CPU, because that way Google has already done 90% of your R&D for you.


Well, you can gatekeep all you want, but it's increasingly practical and common to have what would have seemed like an absurd amount of RAM a decade ago on things like toasters.


You have been living in a strange world if you’ve been getting away with 1GB in the average consumer IoT device for the past 5 years.

That’s not typical at all. I’ve done a lot of work with production runs in the millions. There is no way we’d put that much RAM on a device unless it was absolutely, undeniably required for core functionality.


Typically in IoT you'll count RAM in kB not MB and definitely not GB. See STM32 H5/H7/L4/L4+/U0 as an example.


Typically in IoT you'll use an SoC that actually supports some kind of network connection.


Depends.

In Automotive (i.e. telematics devices) you'll want a separate MCU for CAN-bus. For example, if you are doing Request-Response model you'll want to make use of the built-in filters. Besides, it is unlikely that a modem would support the CAN interface.

In Cellular IoT you'll prefer a separate MCU as it is much easier to port to different hardware. For example, you can hook-up the module via UART and use CMUX (AT/GNSS/PPP) and you'll cover 80%+ of the modules available in the market with very minimal specific implementation layers to enter into these modes.


I've asked in the past, and been told that even a 2x-3x difference in the amount of RAM made such a negligible difference in cost that it was decided to go with the larger amount. I frankly have a hard time understanding how that can be true... but I can't really imagine why they wouldn't be honest with me about it.


> I've asked in the past, and been told that even a 2x-3x difference in the amount of RAM made such a negligible difference in cost that it was decided to go with the larger amount

That doesn't pass the sniff test. Look at retail RAM prices. Certainly the magnitude of the price is quite different than buying individual RAM chips at quantity, but the costs do scale up as RAM size goes up. Hell, look at RAM chip prices: you are definitely going to increase the price by more than a negligible amount if you 2x or 3x the amount of RAM in your design.

Also consider the Raspberry Pi, since the article mentions it quite a bit: RAM size on the Pi is the sole driver of the different price points for each Pi generation.


At quantities of 100, 512Mbit of RAM is $1.01 [0]. 1Gbit of RAM is $1.10 [1]. 2Gbit is $1.05 [2]. 4Gbit is $1.16 [3]. It is only at 8Gbit that prices substantially increase to $2.30 [4].

So no, at those sizes price really does not change all that much, and installing 512MB of RAM instead of 64MB only increases the product's cost by $0.15. It's a commodity made on legacy processes, things like packaging, testing, and handling far outweigh the cost of the actual transistors inside the chip.

[0]: https://www.lcsc.com/product-detail/DDR-SDRAM_Winbond-Elec-W...

[1]: https://www.lcsc.com/product-detail/DDR-SDRAM_Winbond-Elec-W...

[2]: https://www.lcsc.com/product-detail/DDR-SDRAM_Samsung-K4B2G1...

[3]: https://www.lcsc.com/product-detail/DDR-SDRAM_Samsung-K4B4G1...

[4]: https://www.lcsc.com/product-detail/DDR-SDRAM_Samsung-K4A8G1...


At a company I worked at they explicitly told us they will do anything to avoid upgrading the hardware from 1GB to 4GB because it increases costs. They would rather we optimize the software to use less RAM than upgrade the hardware.

I remember arguing as well with people about $0.10 components. They told me it was a no-go, not even worth bringing up. Sometimes even $0.01 is a big deal.


Yeah, prices do indeed go up beyond 1GB - but we're talking about systemd needing 8MB of RAM. With small RAM chips there is more variation between parts than there is between sizes - hence the linked 2Gbit chip being cheaper than the 1Gbit one despite both of them being the cheapest option at LCSC. Those 8MBytes of systemd might push you from needing 1Gbit of RAM to 2Gbit, but it isn't going to push you from 8Gbit to 64Gbit - and as shown 1Gbit and 2Gbit don't meaningfully differ in price.

There are a lot of other factors involved with respinning hardware which make an upgrade a lot more expensive than simply a BOM increase. I can definitely understand why an existing product wouldn't be upgraded, but for a new product going to a bigger memory chip is a far smaller hurdle. The added software engineering time for working around RAM limitations can easily outweigh any money saved on the BOM, with choosing a smaller chip ending up being penny-wise pound-foolish.

And indeed, an extra $0.10 or even $0.01 can be a big deal. But those cheap systems usually aren't powerful enough to meaningfully run Linux in the first place: just because you can technically hack a $1.00 RP2040 or ESP32 into running Linux doesn't mean it is a good idea to actually do so. If your product is both cheap enough that it represents a significant fraction of your BOM and high-volume enough that you can afford the engineering time in the first place, why not use a purpose-built embedded OS like Zephyr?


The retail price of a finished product has very little to do with the cost of individual components and more with profit margins or customer segmentation.


Even Apple produced laptops with 8GB of RAM until just recently, and sold them at huge margins (AFAIK). If you're going to produce something at a $50 cost, the cost of 1GB of RAM will be meaningful.

In my experience production people will eat your soul for a single resistor if they can cut costs on it.


That is the Apple tax on everything the fruit company sells, they always push the margins as far as fans are willing to pay for.


That RAM is unified though, not a good comparison.

Also, just because something holds true at large numbers doesn't mean it scales all the way down. Either due to economies of scale, or the negligibly different architecture/components at that size.


The RAM is ordinary LPDDR5 organized into what is de facto just a large number of memory channels. It's not HBM or anything exotic; the RAM chips themselves cost the same as they do anywhere else.


Had vfork, unless you're on some non-BSD where vfork has remained relevant the last three decades?

  DESCRIPTION
       vfork() was originally used to create new processes without fully
       copying the address space of the old process, which is horrendously
       inefficient in a paged environment. It was useful when the purpose of
       fork(2) would have been to create a new system context for an
       execve(2). Since fork(2) is now efficient, even in the above case, the
       need for vfork() has diminished. vfork() differs from fork(2) in that
       the parent is suspended until the child makes a call to execve(2) or
       an exit (either by a call to _exit(2) or abnormally).
  ...
  HISTORY
       The vfork() function call appeared in 3.0BSD with the additional
       semantics that the child process ran in the memory of the parent until
       it called execve(2) or exited. That sharing of memory was removed in
       4.4BSD.


Huh?

Been a while since I was in the digital signage space, but a lot of the equipment runs off-the-shelf RK3288s plugged into commercial displays. 2GB of RAM was pretty common. IIRC, LG's webOS TVs in the digital signage space have a minimum of 2GB of RAM built directly into the units themselves. I believe Samsung's Tizen-based units have similar RAM.

My router has 1GB of RAM in it. But even my cheapest routers have 128 to 256 MB of RAM. The Cisco 9300 Catalyst switches have about 8 GB of RAM, but switches with beefy amounts of RAM are getting pretty common now, even if somewhat pricey.

Yeah there's massive swathes of embedded space that's tiny. But the higher end stuff isn't exactly out of reach anymore. The RK3288's IIRC ran about $20 a unit at the time before I left the industry.


A decade ago, I had to settle for 512MB of RAM for my Windows XP desktop.


Three decades perhaps. In 2014 it was Windows 8, SSDs, 4th gen i7 (Haswell), and 8-16GB of DDR3 RAM. Even the iPhone 6 came with 1GB of RAM.


There's a huge chunk of people out there, not able to afford the latest.


But desktops with those specs are <$100, probably less than $50.

If they can't afford those specs then you're approaching the group of people who can't afford a computer in the first place.


Yes, that's about 3 billion people. They use whatever they can afford. Usually that's whatever is very old, because new software doesn't run well except on very new hardware which is more expensive.

I recently wanted an MP3 player for an art project. Local stores don't sell mp3 players anymore, they only sell smartphones. So I bought an MVNO smartphone for $40. When I charged it up and tried to use it, I thought maybe it was broken, because it would take 10-30 seconds to load a settings menu or app. Nope, all these bargain carrier-branded phones are that slow. The hardware is [somewhat] old, but the new Android OSes run like molasses on them. It was like going back in time. Remember how Windows 98 would make your hard drive screech for a good couple minutes as it struggled to juggle the swap memory so you could open MS Word? That's the experience with most software today with "affordable" hardware even a few years old.

So using Windows XP is often the only choice, if you don't have a lot of money, like 1/3rd of the planet. (And it's not just the third world. 59% of American households with K-12 school kids don't have a working computer, or it works too slowly to be useful)


This is simply so misleading that it's nonsense.

So you bought a new phone for $40, and it was a POS?

My kids use my old iPhone 7, which is in the same price bracket and is nothing like that. It's fast enough for Roblox, Minecraft, and certainly fast enough for a web browser.

I have an old Dell USFF that I use for server purposes, but it's a Skylake (so newer than the original conversation), with an SSD and 16GB, and that was <£50. It can boot with systemd in under 5 seconds. It can boot to the full GNOME desktop in under 6 seconds. Firefox can start and get the Office.com site up in less than 3 seconds.

Because that's what we're talking about.

> But desktops with those specs are <$100, probably less than $50.

I just checked eBay. Yes they still are.


I see you, but win XP was literally end of life 10 years ago.


So what?


Indeed, but still wrong.

In 2014 I got myself a second- or third-hand ThinkPad X220, released in 2011, off eBay. It came with 8GB of RAM (two 4GB sticks) but it supported 16GB as well (two 8GB sticks).

The laptop (Asus A8Jc) I got when I was a teenager in ~2005 came with a dual-core Intel CPU and 1GB of RAM. So "512MB desktops" are way older than that.


That was the amount of RAM in my Athlon Windows XP multimedia PC, bought in 2002; by 2006 my newly acquired ThinkPad's RAM was already measured in GB.


Three decades ago was 1995 and Windows 95 ran on 8MB of RAM.


That "occasional weird problem" is because systemd is not designed to be used with other software. While you technically still have the choice to roll your own non-systemd initramfs, it will be an uphill battle.


> That "occasional weird problem" is because systemd is not designed to be used with other software.

That's not true, systemd has an explicitly documented and supported initrd interface: https://systemd.io/INITRD_INTERFACE/

It's actually really easy, there's rarely a reason for the initrd to be more complex than a single 50-line busybox script on an embedded device with built-in drivers.


"really easy" except for the hotplug events that don't get to systemd and cause your problem.


I got an immediate answer with a solution from the upstream developer when I asked about it: I can't imagine how that could have been easier to solve. And it was also trivial to hack around by leaving the entry out of fstab and mounting it in a script.


Debian, Ubuntu and all derivatives thereof use initramfs-tools, which does not use systemd in the initrd, and things work just fine


Ubuntu is literally trying to replace it with dracut as we speak.


https://dracut-ng.github.io/dracut-ng/developer/compatabilit...

Dracut is used both in Void Linux and on Alpine without systemd and with busybox.

It even runs continuous integration with musl based containers.


Alpine uses mkinitfs by default, though.


Not an uphill battle at all; Void Linux does this by default and has for many years.


Void Linux dropped systemd because it wouldn't work with a libc other than glibc, which I would add as the next point in my systemd lock-in list.


The embedded people will naturally look at busybox. I saw it running on a credit card scanner at Old Navy a few years ago.

It has an init:

  $ /home/busybox-1.35 init --help
  BusyBox v1.35.0 (2022-01-17 19:57:02 CET) multi-call binary.

  Usage: init

  Init is the first process started during boot. It never exits.
  It (re)spawns children according to /etc/inittab.
  Signals:
  HUP: reload /etc/inittab
  TSTP: stop respawning until CONT
  QUIT: re-exec another init
  USR1/TERM/USR2/INT: run halt/reboot/poweroff/Ctrl-Alt-Del script
On an embedded system, that will be a strong contender for pid 1.


This is what Alpine Linux uses; Alpine also uses OpenRC for service startup.


Not uphill at all; try buildroot


Small typo: Zephyr https://zephyrproject.org/


Anything mass produced is going to be pressured to reduce BOM cost, RAM capacity is still and will continue to be a prime target.


I was just thinking "there is no way this person works on embedded devices". Then I read the last paragraph where you bring up "over 1GB of RAM". Explains that.


Systemd trolls on HN are really out of hand. This isn't even credible.


It could work the way OP described if they routed all outbound traffic via ISP-A regardless of source address, and ISP-A allowed spoofing. I think that's what they meant.


It is common practice for business subscribers (around the UK) to get a /29. On the router we add a single /32 via the tunnel.

I think even the cheapest 100-buck business plans from many ISPs come with a /28 or /29. It is a complete waste, because we had something like 10 offices with 3-5 people with laptops and NO servers. The common question from the ISPs is: do you need some IPs? When we answer no, they give us a /29.


> but I need the flexibility to send packets with any of my source addresses through any of my ISPs

As someone who always enables rp_filter everywhere... I'm very curious why?


I use a read-only squashfs rootfs on top of dm-verity to get a trusted userspace. The initramfs is a 50 line shell script which calls veritysetup with the known root hash, and is itself part of the signed boot image. Only /var is writable.


> something that I'd like to get rid off in the future as well by adding a VPN layer on top.

What VPN software would you use? Personally I've never found anything I consider as trustworthy as OpenSSH.


Wireguard. I actually also set up a second backup tunnel in case some upgrade or change messes up the first one.


+1 to WireGuard. For people new to it, there are some great scripts which set up and configure it for you like https://github.com/Nyr/wireguard-install


I use OpenVPN for historical reasons but today I’d go for Wireguard, much simpler, faster and integrated in the kernel, connectionless so much less friction when e.g. rebooting or changing networks.


Wireguard is quite good too, and if you’re up for some complication in your life you can do full mesh quite easily with it if your online infra is a bit distributed.


There's also helpers for wireguard meshes. Or become dependent on yet another service like tailscale (at least there's headscale) or zerotier.


It has no DHCP.


You can configure hosts statically if you don't have a zillion of them.


I use IPv6 magic


Aren't the cool kids using https://tailscale.com/ these days?


multicast/mDNS is broken, and it doesn't seem that it will be fixed anytime soon. This prevents hosts discovering each other as if they were on non-virtual LAN.

Personally, I find that having to set up an OIDC provider is too much overhead for a VPN. In a corporate setting, you likely have something already, but for individuals or small teams it's too much extra work.


How could that work with their architecture? They configure your device to use a DNS server running locally in their app. That resolves their device names to their internal device IP addresses. Their device names default to hostnames, just like mDNS does.

So to give an example if I enter http://geder in my browser I want that to resolve to 100.100.5.10 regardless of if I am on my home network (where geder is) or if I am on a train.

From my perspective half the reason to use tailscale is that it replaces why I'd want mDNS with less bugs.


That requires rewriting all software to follow tailscale's model instead of mDNS. Additionally, discovery would no longer work when devices are on the same physical network.


Except that mDNS is required for loads of things (via DNS-SD, which is basically the main reason to use mDNS).


Ain't no cool kids (in my world) using centralized for-profit services for essential things like that.


Pretty much all of it is open-source, and there's a self-hosted open-source alternative available for the only closed-source cloud-hosted component[0] - and that's even actively being promoted by Tailscale![1]

[0]: https://headscale.net/

[1]: https://tailscale.com/opensource#encouraging-headscale


Seems the cool kids are using Headscale then if anything, rather than Tailscale :)


Traditionally on Linux it's going to be AF_PACKET sockets: you can read and write raw ethernet frames via the usual syscalls.

https://www.man7.org/linux/man-pages/man7/packet.7.html

There are more modern interfaces which perform better.


Writing raw ethernet frames, can you do that as an unprivileged user? Or is the point more that we're doing userspace networking, which could nevertheless be run as root?


You need CAP_NET_RAW, but I'm pretty sure you can drop privileges after bind()


I'm surprised rateless fountain codes aren't mentioned! If you enjoy this sort of thing, you'll find the Luby Transform Code fascinating: https://en.wikipedia.org/wiki/Luby_transform_code

This paper is a really nice overview with more detail: https://switzernet.com/people/emin-gabrielyan/060112-capilla...

LT codes are used as the "outer code" in the linear time RaptorQ encoding specified in RFC6330: https://www.rfc-editor.org/rfc/rfc6330


I have implemented RaptorQ and RFC 6330.

First, the RFC is pointlessly complex and optimized for files, not for streaming. If you want to play with it, manage blocks by yourself and ignore the asinine interleaving and block size management.

Second, the algorithm is actually split in two parts, and while the second (generation of repair blocks) is linear, the first is cubic in the number of messages that you put together in a block (~~ matrix Gaussian elimination).

And while parts of both encoding and decoding can be cached, I think the "linear time" encoding claim for RaptorQ is actually just false marketing speak.


Are rateless fountain codes the better solution, and if so, are there any systems using them?


Amplidata did this https://en.wikipedia.org/wiki/Amplidata

It's a great solution (fast, storage overhead of about 1.2%) iff your data is immutable.


Aren't there patent problems with fountain codes?


Luby's original paper was published in 2002. Not sure about RaptorQ though...


IIRC, Qualcomm still holds patents on RaptorQ. They do provide a blanket license exemption for implementing RFC6330.


Are there any more detailed technical reports about the current starliner problems out there yet? All I can find are a few paragraphs in press releases. Maybe we just have to wait.


You can use pthread_once() to simplify the initialization part: https://man.archlinux.org/man/pthread_once.3.en

I don't understand the desire not to link to pthread, it's about as ubiquitous as a library can be.

I doubt it's really a problem in this application... but naive userspace spinlocks are absolutely horrendous, see NOTES here: https://man.archlinux.org/man/pthread_spin_init.3.en

  User-space spin locks [...] are, by definition, prone to priority inversion and unbounded spin times. A programmer using spin locks must be exceptionally careful not only in the code, but also in terms of system configuration, thread placement, and priority assignment.

