I see a lot of mentions on various forums of the storage they used (5 PB), but I'm wondering if anyone knows what kind of backend they used to house it. From what I saw, there were too many disks, in the wrong type of enclosure, to be running on a single server, which suggests multiple physical servers. I've seen a prior CERN research paper on Gluster and Ceph (iirc), and I'm wondering if anyone in the know could enlighten me.
The WaPo article also references a few of the interesting issues they had:
"Then they spent the two years parsing literal truckloads of data, some of which had to be shipped on hard drives from the South Pole and defrosted outside a supercomputer facility at MIT."