I actually disagree about the reasoning behind this strategy. iOS may be more important to Apple at the moment, in terms of market value, but there's a better reason to target it first. iOS is deployed to constrained systems (iPhone, iPad, Apple TV), which are easier to test because there is less variance to deal with than on macOS deployments. It makes a ton of sense to start there for technical reasons so that they can validate it in more consistent environments.
More generally, Apple was providing working OS upgrades for years while the notion of upgrading Windows between major versions was basically laughable. Wasn't Windows 8 the first version of Windows that you could actually upgrade to and have the system boot successfully? Maybe Vista to 7 worked for people. I never bothered trying. But I remember upgrading from OS X 10.3 to 10.4 and being floored when everything just worked.
If older editions of Windows just forcibly purged all the drivers when you upgraded, and told you to reinstall them afresh from the installation CD/OEM site (where you'd then get a version that's maybe for your current OS instead), that could have saved everyone quite a few headaches.
Then again, with major peripherals (displays, keyboards/mice, Ethernet cards, USB controllers) being much less standard than today, ripping out the OEM's driver could wedge your computer.
I have upgraded (or my dad did, anyway) from Win 3.11 to Win 95 and from Win 95 to 98. I think I clean installed Win XP when I got a new computer, but eventually I upgraded that to Win 7.
All of them ran fine. Or was that just not the norm?
What iOS 10.3 did was basically the equivalent of having a Windows upgrade which converted FAT/FAT32 to NTFS in-place without the user having to manage the process.
I wonder if Apple will take the same approach for macOS (convert manually if you want for now, or we'll use it if you reinstall). Seems like a far greater risk for Apple if upgrading the OS breaks something for a lot of people, when they want users to be completely comfortable about updates.
It definitely worked on the boot partition at least on XP, not sure about 2k. If it can’t lock the drive it schedules it to be done at boot time (similar to on-boot chkdsk).
I think my failure rate got to about 30% before I stopped even trying to run the upgrade and just did a fresh install after backing everything up.
* Siri on the desktop
* Gatekeeper getting more restrictive
* Neglect of pro users
* New stuff showing up on iOS first
Possibility to help the user make a backup via iCloud (harder to do on Mac).
Also, you might argue that if you had to lose it, you'd rather lose data on your phone than your computer. I back up my computer every hour with Time Machine, but if I lost my phone I wouldn't care in the slightest if I had to install again from a 10-day-old backup.
Here's a good article addressing bit rot: https://lockss.org/locksswiki/files/ACM2010.pdf
The first assertion is simply wrong: the internet is littered with people who have corrupt media files which cause user-visible errors. Beyond that, however, RAID can be part of a strategy but is far from sufficient. The paper you linked provides a good discussion of the reasons why but the root mistake is assuming that failures are perfectly distributed random events at a single point.
End-to-end strong integrity checks are so important because e.g. in that scenario with a two-drive RAID array you'd still be subject to errors caused transferring the data from the host to the RAID controller, on the controller itself, in communication to the drives (one reason why it's important to mix vendors & models), and on the path back when reading data. You'd also have to worry about corruption in memory, errors caused by e.g. power fluctuation or other hardware events which may be grossly underestimated because they're silently detected and retried until you get lucky and one of the 100,000 errors happened to generate a valid checksum, etc.
Generating a cryptographic hash as early as possible and checking it at every stage is key, both because it catches almost all of those scenarios and because it allows you to confidently report a positive validation result rather than just the absence of detected errors.
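To make the "hash early, check at every stage" idea concrete, here is a minimal Python sketch. The function names (`sha256_of`, `copy_verified`) and the file names are made up for illustration, and SHA-256 is just one reasonable choice of hash; nothing here is taken from a specific tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source, destination, expected_digest):
    """Copy a file, then re-read the copy and compare it with the original digest."""
    shutil.copyfile(source, destination)
    actual_digest = sha256_of(destination)
    if actual_digest != expected_digest:
        raise IOError(f"{destination}: digest mismatch after copy")
    return actual_digest


# Hypothetical three-stage pipeline: ingest -> staging -> archive.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "photo.raw"
    original.write_bytes(b"example payload" * 1000)
    digest_at_ingest = sha256_of(original)  # hash as early as possible

    staged = Path(workdir) / "staged.raw"
    archived = Path(workdir) / "archived.raw"
    copy_verified(original, staged, digest_at_ingest)
    copy_verified(staged, archived, digest_at_ingest)
    print("all stages verified:", digest_at_ingest[:16])
```

One caveat: re-reading a file right after writing it may be served from the OS page cache rather than the disk, so a check like this is more convincing when it is repeated later or run from a different machine.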
More important, however, is that this isn't a reliably random process but often a highly correlated one. Many users won't have a problem but the people who do often have many files affected. I've seen multiple cases where that wasn't noticed until after their backup software, cloud service, etc. had copied the corrupted bits, not to mention cases where data was corrupted on a RAID array which wasn't regularly scrubbed and so the corruption wasn't noticed until the other disk failed, long after the invalid data had been written to tape as well.
Apple devices, save for their least sold model (MacPro), in their least sold product line (Mac) are consumer devices with no ECC RAM. Most people don't know what ECC RAM is, most don't care, and won't pay more for it.
For Apple, it doesn't make sense to switch to ECC for system memory. However, if they implement storage ECC at block level in their SSD controller (Apple designs their own SSD controllers), they can obtain the same functionality at much lower cost (only flash controller RAM needs to be ECC).
Huh! I assumed they used off the shelf stuff. Do you have any articles about this? :)
They mentioned the '15 MacBook was the first to use Apple's custom SSD controller, which iFixit took a picture of here: https://www.ifixit.com/Teardown/Retina+Macbook+2015+Teardown...
They most likely started off from Apple's purchase of Anobit, the custom flash controller designer, back in 2011: https://en.wikipedia.org/wiki/Anobit
All their SSDs use their controllers.
As for external disks, Apple couldn't care less; for them, the future is cloud storage.
From their blog post:
> The easiest way to protect against hardware failure is to keep multiple copies of the data in different hardware failure domains. While that does work, it is fairly resource-intensive, and we knew we could do better. We wondered, “Can we store fewer than two copies of the same data and still protect against loss?”
But the point is no filesystem can protect you from unreliable RAM.
Though if we're talking about Optane, then that's different. In the future (with solid OS support), Optane will be both RAM and non-volatile storage, and you really don't want data corruption there. Has Intel announced anything regarding ECC, parity or other systems in place to protect against bitrot in Optane?
This is a major danger with database servers, as regular bit rot can be detected or corrected above the filesystem level, but no systems exist that can guarantee protection for data in RAM. Since databases try to do most/all operations in RAM before flushing to disk, there's a huge risk of silent data corruption.
Adding bitrot protection to the filesystem only gives you more protection if the drive has crappy error correction. Adding protection to RAM adds checks to a system that usually has none.
Would love to read a source on this.
The disk has an onboard data buffer, usually implemented in DRAM. That DRAM chip is likely sourced by the lowest bidder, and it is likely not to have either parity or ECC. Also the CPU used in the drive can introduce its own data errors and corruption. It doesn't even have to be hardware. A firmware bug can also corrupt data.
There are numerous buffers and storage elements involved in transferring the data to/from the disk's onboard memory to/from the computer's RAM. Many/most of those elements are not checked for parity. Data is subject to corruption in transit.
That's what makes a filesystem like ZFS so interesting. It checks the entire data path. It doesn't care where the corruption occurs.
Sure, that's a small possibility, but are we trying to get to 0% chance of data corruption?
And look, we're already back to, you can only get RAM from Apple, because so many of their products come with it soldered on the logic board with no upgrade path.
A special controller isn't a valid substitute for other measures. Apple hasn't historically been any good at designing filesystems, and they aren't any good now.
People died in mangled messes but the sky didn't fall then either.
The status quo is rarely a sufficient argument, because the human race is pretty much terrible at everything, improving things only slowly by incrementally accruing useful strategies and procedures.
We built bridges well via centuries of practice; software is less mature.
And I didn't say it is. My answer was to the parent who singled-out Apple as somehow special in neglecting this.
In my time at Microsoft I did see a number of workarounds for bad disks in Windows source, in ntfs.sys and elsewhere.
However, I agree with your overall assessment; it's not as if anyone is running ZFS or similar as a default.
ZFS can't really help you with that as the data was likely damaged in transit rather than on disk; though with its own 256-bit hash, it is likely to detect those faulty system components sooner rather than later.
I encountered this going through a copy of my photos stored on a pair of WD Greens using NTFS 3-4 years ago. The original copy on a ZFS machine was fine. I found a few others, and promptly stopped using those drives.
Two years ago I had repeated bursts of ZFS checksum errors from a pair of SanDisk SSDs. Evidently TRIM didn't quite work perfectly 100% of the time, and caused data corruption - luckily ZFS was always able to repair it, and it being detected meant I could do something about it early - I updated firmware and the issue went away. Last year it came back after an OS update, and I just turned TRIM off completely (I guess it was sensitive to TRIM patterns and those changed).
Last year I also had a Toshiba HDD forget how to IO properly, and got a constant stream of ZFS checksum errors from it until I yanked it from the hot-swap bay and reinserted it. It resilvered and scrubbed fine.
These aren't the only times I've seen checksum errors and silent corruption, they're just the most recent. ZFS lost a file once, and was very noisy about it - the status message for the lost metadata stayed until I recreated the pool. NTFS, UFS2, ext2, all were completely silent on the fact that they were showing me data that was clearly wrong.
I don't trust disks, or IO controllers, and I don't trust filesystems that do. Neither should you.
- Run MemTest86 on hardware before using it as storage
- Use an FS that does checksums
Unfortunately, the latter doesn't seem to apply to APFS.
AMD (at least in the past) includes it on all parts so that it's up to the consumer to choose.
 - http://www.hardwarecanucks.com/forum/hardware-canucks-review...
 - https://www.reddit.com/r/Amd/comments/5x4hxu/we_are_amd_crea...
Do you have any idea the cost of running validation tests? I'm not the least bit concerned that they haven't "validated" the ECC functionality. It's enabled, they know it works, it's the same ECC they use on server class chips, and if someone found a bug I have no doubt they'd issue microcode to fix it.
Page 5 is perhaps the most important one, where it observes that neither Windows nor Linux appears to react to an uncorrectable error (UE) by halting, and Windows can't quite figure out that ECC is enabled on the platform and parse the notifications it gets as such.
So, sure, I should concede that it is "enabled" on all parts, I was wrong. But that doesn't mean it should be trusted on any of them.
 - http://www.hardwarecanucks.com/forum/hardware-canucks-review...
My home media center PC / NAS runs an i3-4370 CPU with ECC RAM
Most motherboards have the extra memory traces in place even if they don't enable ECC.
"using it as a differentiator between server and desktop"
so it would cost them SOMETHING...
Those are two segmented markets. If Intel's revenue is above their R&D and other expenses, then whatever profit they make milking enterprises/server companies is independent of what they make milking us.
But selling "locked batteries" in a product where the batteries are 80% of the innovation/feature set, and where after-market batteries could cause all kinds of issues, is one thing.
Whereas selling memory at triple or more the price just because you switched on some feature (ECC) that would have cost nothing to switch on for everybody is another thing.
Product market segmentation is a very reasonable thing to do. Why do people make it out like a bad thing? If you ever run a business, you will want to find a way to get big enterprise to pay X, and small business to pay X/4. ECC is something businesses care way more about than gamers, so why not charge more for it?
Because most of us would rather pay a price that mostly reflects costs + some reasonable profit, not some artificially created segment, not fuel extravagant profits, not pay for future research, not pay for the company to have cash reserves, etc etc.
If it offends you that your device has some enterprise feature that you don't really need turned off unless you pay $x... sorry? But you don't really have a right to the feature for some margin % that you deem fair.
If your flash device never moved the static data, the only flash blocks that would get wear cycles would be the flash blocks that contained dynamic (normally changed and rewritten) data. The result would be the blocks of flash that were not static would quickly wear out and the blocks of flash that were static would have a lot of unused write cycles available.
In order to use all of the wear cycles of all of the blocks, the static data has to be moved regularly so the blocks all have a (roughly) equal number of wear cycles. Every time the data is moved, there is an opportunity for data corruption.
The flash data blocks (typically) have ECC (error checking and correction) which is designed to prevent data corruption. There are limitations to ECC:
* ECC can only correct a limited number of errors.
* Flash memory is not a perfect storage medium, it can "bit rot" too - the primary reason for ECC with flash is to "hide" the inherent bit rotting of flash. "MLC" flash chips aggravate the problem because their margins are smaller.
* If a memory controller does a wear leveling move and the source data is bad, beyond the ability of the ECC to correct, it has no way to correct that error and (generally) has no way to inform the user that their file (system) has suffered corruption.
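As a rough illustration of why controllers move static data at all, here is a toy Python model of per-block erase counts. It is deliberately idealised: the block counts and write volume are arbitrary, writes are spread round-robin, and the extra erases spent relocating static data are ignored. Real flash translation layers are far more involved.

```python
def erase_counts(num_blocks, static_blocks, writes, move_static_data):
    """Toy model: return per-block erase counts after `writes` block rewrites."""
    counts = [0] * num_blocks
    if move_static_data:
        # Idealised static wear leveling: every block takes a turn.
        # (The extra erases spent relocating static data are ignored.)
        eligible = list(range(num_blocks))
    else:
        # Blocks 0..static_blocks-1 hold static data and are never rewritten.
        eligible = list(range(static_blocks, num_blocks))
    for write_index in range(writes):
        counts[eligible[write_index % len(eligible)]] += 1
    return counts


# Hypothetical device: 10 blocks, 8 of them holding static data, 1000 rewrites.
without_moves = erase_counts(10, 8, 1000, move_static_data=False)
with_moves = erase_counts(10, 8, 1000, move_static_data=True)

print("worst block without moving static data:", max(without_moves))
print("worst block with static data moved:   ", max(with_moves))
```

In this model the two dynamic blocks absorb every write unless static data is rotated, which is the imbalance described above. The price of evening it out is that data which was never rewritten by the user still gets copied around, and each copy is a chance for an undetected error.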
In Jean-Louis Gassée's anecdote (which is typical), his notification that his wife's files were corrupt was a backup failure notification. The backup failure was telling him that it could not read files, but it was not clear to him (and would not be clear to most users) that the root cause was file corruption, not a backup problem per se.
par2create -r5 -n2 example.par2 *.jpg
In what way?
The other alternative is having a server that's up and running all the time, exposed to the internet (or complicate the setup with a VPN), so I can sync. Operations would take a long time (via the internet) or I would anyway need to transfer the data to my computer, work on it, sync it back. During this time, any protections that ZFS offers are null since anything could happen in my computer and I can't test for it locally.
ZFS is great. But it's not the answer to everything.
Only if you use it everywhere. Including on your laptop.
I have a cron job which runs every month and verifies each photo album; that picked up the error.
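The comment doesn't say what the cron job actually runs. As one possible shape for such a periodic check, here is a Python sketch that records a checksum per file and later reports anything missing or changed. The names `build_manifest` and `verify_album` are invented for this example.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(album_dir):
    """Map each file's path (relative to the album) to its digest."""
    root = Path(album_dir)
    return {
        str(path.relative_to(root)): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_album(album_dir, manifest):
    """Return (relative_name, problem) pairs for missing or altered files."""
    root = Path(album_dir)
    problems = []
    for relative_name, expected_digest in manifest.items():
        candidate = root / relative_name
        if not candidate.is_file():
            problems.append((relative_name, "missing"))
        elif file_sha256(candidate) != expected_digest:
            problems.append((relative_name, "checksum mismatch"))
    return problems
```

A scheduled job would call `build_manifest` once when an album is finalised, store the result somewhere safe, and call `verify_album` on each run. A check like this only detects damage; repairing it still needs a good copy or recovery data such as PAR2 files.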
Switched back to vinyl for my music, my 20 year old Technics 1210 will probably outlast my current laptop and the next!
"Record wear can be reduced to virtual insignificance, however, by the use of a high-quality, correctly adjusted turntable and tonearm, a high-compliance magnetic cartridge with a high-end stylus in good condition, and careful record handling, with non-abrasive removal of dust before playing and other cleaning if necessary."
In other words, record wear is negligible if you buy good equipment and take good care of it. Just like the risk of bit rot, catastrophic crashes etc. is negligible if you buy good storage equipment and take good care of your data (good backups etc.)
Perhaps the one big advantage of vinyl over digital is that the shelf life of unused vinyl is longer than a human lifetime. A disk stored on a shelf, however, can suffer damage in many ways and is guaranteed to be very hard to connect to your computer in ten to twenty years.
He also invented the legendary BeOS.
BeFS was way way ahead of its time. Really a shame that OS didn't get picked up by more people.
Both my iPhone 7 and my iPad Pro only have 32G of storage and the new file system gave me back extra storage due to better utilization. It was interesting how smoothly the transition went.
There's a little more about the early Mac software history at Folklore.org, including another screen shot of what eventually became Finder:
For the old-timers, it was a shape table and multiple draw commands of numbered (ascii value - 32) shapes would allow you to write proportionally spaced text to the graphical screen. It was the same format Take-1 used (we used their programmer's toolkit).
So, Cream 12 came down from Altos all the way to our Apple II+'s.
ZFS/RAID + ECC RAM are complicated and expensive and are not even a 100% guarantee against bit rot.
Does it make sense? No idea.
In any case, the "hash" appears to be CRC32, stored in extended attributes:
$ xattr .inputrc
$ xattr -px 'com.apple.finder.copy.source.checksum#N' .inputrc
26 E5 4A AB
$ cksum .inputrc
2873812262 65 .inputrc
$ printf '%x\n' "$(cksum .inputrc | cut -d ' ' -f 1)"
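The four bytes from `xattr -px` and the decimal value from `cksum` in the transcript above look unrelated until byte order is taken into account. A short Python check, using only the values shown in the transcript:

```python
# Bytes as printed by `xattr -px` in the transcript above.
stored_bytes = bytes.fromhex("26 E5 4A AB")
# Decimal checksum printed by `cksum .inputrc` in the same transcript.
cksum_value = 2873812262

as_little_endian = int.from_bytes(stored_bytes, "little")
as_big_endian = int.from_bytes(stored_bytes, "big")

print(f"little-endian: {as_little_endian:#010x}")
print(f"big-endian:    {as_big_endian:#010x}")
print("matches cksum:", as_little_endian == cksum_value)
```

Read as a little-endian 32-bit integer, the attribute is exactly the `cksum` value. One caveat on naming: `cksum` computes the POSIX CRC, which does not give the same value as `zlib.crc32` for the same bytes, so "CRC32" is only loosely accurate here.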
If so, who cares about that?
With the 'unix epoch' way of doing things (counting seconds since 1970), a (signed) 32bit int will run out of values in 2038. If they'd counted milliseconds since 1970, they would have exhausted int32_t in less than 36 minutes, and nanoseconds would have exhausted same in less than 2 seconds.
So the move to 64bit timestamps solves the '2038 problem' (for the filesystem, at least) - but at some point someone has decided this filesystem does not need to support the year 292,277,026,596AD, and has chosen to use some of the space for granularity instead. (Given 292 billion years is 20 times the estimated age of the universe, they were probably correct in their assumption that we'll have a new filesystem before then).
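For anyone wondering where a year like 292,277,026,596 comes from, it is what a signed 64-bit count of whole seconds since 1970 works out to. A quick Python check, assuming a mean Gregorian year of 365.2425 days and an age of the universe of roughly 13.8 billion years (both are assumptions of this sketch, not figures from the thread):

```python
SECONDS_PER_YEAR = 31_556_952   # assumption: mean Gregorian year, 365.2425 days
AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumption: commonly quoted estimate

max_signed_64 = 2**63 - 1       # largest value of a signed 64-bit counter
years_after_epoch = max_signed_64 // SECONDS_PER_YEAR
overflow_year = 1970 + years_after_epoch

print(f"overflow year: about {overflow_year:,}")
print(f"multiples of the universe's age: {years_after_epoch / AGE_OF_UNIVERSE_YEARS:.1f}")
```

The result is on the order of 292 billion years, roughly twenty times the assumed age of the universe, which is consistent with the figures quoted above.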
If the number of people who need to differentiate between two files created in the same second, is more than the number of people who expect their Watch to still work after our sun is a cold shrivelled mass - they've made the more logical use of the valuespace.
I don't know where you came up with that number. Is it from Apple documentation?
A true nanosecond timestamp wouldn't get you anywhere near the number you suggest. Here's a back of the envelope calculation:
A 32-bit timestamp with one-second resolution is good for, more or less, 68 years. Or 136 years if unsigned. That's the traditional "unix epoch" setup.
Now add 32 more bits to the LSBs of the timestamp. There are 1 billion nanoseconds in a second. A 32 bit integer can hold over 4 billion different values. So the 32 LSBs, if they represent nanoseconds, will overflow in just over 4 seconds.
What that means is the 32 MSBs now have a range of just over 4x the traditional Unix timestamp. If it was 68 or 136 years before, it becomes 291 or 582 years.
That's what I would have done if I were Apple. I would have set the LSB of a 64 bit unsigned counter to represent 1 nanosecond. I would keep year 0 as 1970. So, problem solved until (roughly) 1970 + 582 = 2552.
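The back-of-the-envelope numbers can be checked directly. This Python sketch assumes a mean Gregorian year of 365.2425 days and a 64-bit counter whose least significant bit is one nanosecond; whether APFS actually lays out its timestamps this way is not something the thread establishes.

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # assumption: mean Gregorian year
NANOSECONDS_PER_SECOND = 10**9


def range_in_years(bits, ticks_per_second):
    """Years covered by a counter with `bits` value bits at the given tick rate."""
    return 2**bits / ticks_per_second / SECONDS_PER_YEAR


low_word_seconds = 2**32 / NANOSECONDS_PER_SECOND         # when the low 32 bits wrap
unsigned_years = range_in_years(64, NANOSECONDS_PER_SECOND)
signed_years = range_in_years(63, NANOSECONDS_PER_SECOND)  # one bit spent on the sign

print(f"low 32 bits wrap after {low_word_seconds:.2f} s")
print(f"unsigned 64-bit ns counter: {unsigned_years:.1f} years, until about {1970 + int(unsigned_years)}")
print(f"signed 64-bit ns counter:   {signed_years:.1f} years, until about {1970 + int(signed_years)}")
```

Computed exactly, the ranges come out a couple of years longer than the rounded 291 and 582 above, which doesn't change the conclusion.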
This is why I call nanoseconds a 'free' feature. It's clear it's time to solve the '2038 problem' now, and simply moving to a 64-bit timestamp would solve that. But rather than solving this problem for the next 292 billion years, we can make better use of the timestamp today. So, as you say, providing nanosecond granularity for the next 500 years is more useful than one-second granularity for the next 292 billion.
Milliseconds would have lasted 24 days. Microseconds would have been 36 mins.
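These figures are easy to verify. A small Python sketch of how long a signed 32-bit counter lasts at each resolution (the year length is an assumed mean Gregorian year):

```python
MAX_INT32_TICKS = 2**31  # number of non-negative values in a signed 32-bit int
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # assumption: mean Gregorian year

ticks_per_second = {
    "seconds": 1,
    "milliseconds": 10**3,
    "microseconds": 10**6,
    "nanoseconds": 10**9,
}

# Seconds of real time each resolution covers before the counter overflows.
lifetime_seconds = {
    unit: MAX_INT32_TICKS / rate for unit, rate in ticks_per_second.items()
}

print(f"seconds:      {lifetime_seconds['seconds'] / SECONDS_PER_YEAR:.1f} years")
print(f"milliseconds: {lifetime_seconds['milliseconds'] / 86_400:.1f} days")
print(f"microseconds: {lifetime_seconds['microseconds'] / 60:.1f} minutes")
print(f"nanoseconds:  {lifetime_seconds['nanoseconds']:.2f} seconds")
```

So one-second ticks last about 68 years (hence 2038), milliseconds a little under 25 days, microseconds just under 36 minutes, and nanoseconds a bit over 2 seconds.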
"It also massively increases the granularity of object time-stamping: APFS supports nanosecond time stamp granularity rather than the 1-second time stamp granularity in HFS+. Nanosecond timestamps are important in a modern file system because they help with atomicity—in a file system that keeps a record of writes, having nanosecond-level granularity is important in tracking the order of operations."
Full thing is at: https://arstechnica.com/apple/2016/06/digging-into-the-dev-d...
And if you're doing microseconds, why not go for the full nanoseconds?
Funny way to refer to Microsoft.
I probably would have left the original title alone, personally, but you could make a case that it was misleading. HN moderators apparently agreed and have since changed it.