Which seemed crazy to me, because the disks were used in non-redundant configurations: if there were read errors on those disks, they caused actual data corruption, and eventually caused servers several steps up the line to crash and set my pager off, which is how I found out about it.
That's the hard part of infrastructure as code: a lot of programmers don't understand (or don't think about?) what it means to have a failure. In this case, running the disks non-redundantly was reasonable; the system would have dealt just fine with the whole server falling over... but because the firmware "recovered" the error, the error was propagated all the way up to my goddamn pager. (Infernal pager? Sisyphean pager? That job had the most active pager I've worn in 20-odd years of wearing pagers.)
Most of my devops consulting these days is more on the human side of things (devs and ops not getting along, Managements Just Don't Understand, etc.) but whenever I end up in a design review this is still the first thing I ask: "how does this break, under what circumstances will it break, and how do we respond to it breaking without waking somebody with SSH access up at two in the morning?"
As long as you plan for that last part, you're fine. Planning can mean just wiping and reinstalling every time you get an unrecoverable read error; it can mean RMAing the disk after the first reallocated sector (as a RAID would); or it can mean using something like zfs to catch errors and mark whole files bad.
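As a rough illustration of that middle policy (RMA after the first reallocated sector), here's a sketch that parses `smartctl -A`-style output for SMART attribute 5 and flags the drive. The sample output and function names are mine, not from any real monitoring setup; column layout varies by smartmontools version, so treat the parsing as an assumption.

```python
# Hypothetical sketch: flag a disk for RMA based on SMART attribute 5
# (Reallocated_Sector_Ct), mirroring the "replace on first reallocated
# sector" policy a RAID controller would effectively apply.

def reallocated_sectors(smart_output: str) -> int:
    """Parse `smartctl -A`-style output; return the raw value of
    attribute 5 (Reallocated_Sector_Ct), or 0 if it isn't present."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the
        # tenth column in the classic smartctl table layout.
        if len(fields) >= 10 and fields[0] == "5":
            return int(fields[9])
    return 0

def should_rma(smart_output: str) -> bool:
    # Even a single reallocated sector is grounds for replacement.
    return reallocated_sectors(smart_output) > 0

# Illustrative sample, not captured from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       3
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21400
"""

print(should_rma(SAMPLE))  # prints True: 3 reallocated sectors
```

The point isn't the parsing; it's that the decision happens in a cron job or monitoring check, not on a pager at two in the morning.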
Centralized disk has its own set of issues, though it is a possible solution.
The only possible scenarios I can think of are even worse: data loss, uptime loss...