I see a lot about the long timeframe of the data (20 years), but nothing about how much data that actually is. I'm not very familiar with this kind of data, but I'm curious about the software side: how much data was needed, how long processing took, whether any special hardware was required, etc.
How much data did you start with (gigabytes? terabytes?)?
What does this data actually look like? csv, custom binary format, some open spec maybe?
How much did you end up filtering out for the various reasons in the paper?
Was there anything that surprised you personally while working on this paper? It seems like most of this confirms existing theory, which is great, but I'm curious whether you had any new takeaways.
Does the team want to continue pursuing this? If so, what do they hope to accomplish, or is there some odd data or behavior you would like to keep looking at?
Yes, we analysed a lot of observations for the calibration work; that's the advantage of a big public archive. After processing, the data take up several hundred gigabytes. It would probably be a few times more, but I threw away quite a lot that we don't use for this analysis (flared time periods and low energies). That doesn't include the raw input datasets, which might be a few TB; I haven't checked, as they're on a different system.
The data, as I said above, are in FITS format, the standard astronomical file format, which here holds binary tables. The processed data are event files (lists of photons), spectra (tables of energy vs. number of photons), and detector responses (matrices that turn a model spectrum into an observed spectrum). Along the way there are lots of intermediate text and FITS files. I even used HDF5 for part of the code, but mainly because it's so easy to use from Python.
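To make the "detector response" idea concrete, here is a minimal sketch in plain Python of how a response matrix turns a model spectrum into predicted counts per detector channel. The three-bin layout, the matrix values, and the function name `fold_model` are hypothetical; real responses are large tables stored in FITS files (in X-ray work often split into a redistribution matrix and an effective-area curve), and spectral-fitting packages do this folding for you.

```python
import math

def fold_model(response, model_flux, exposure_s):
    """Predict counts in each detector channel from a model spectrum.

    response[i][j]: effective area (cm^2) times the probability that a photon
        in true-energy bin j is recorded in detector channel i.
    model_flux[j]: model photon flux in energy bin j (photons / cm^2 / s).
    exposure_s: exposure time in seconds.
    """
    counts = []
    for row in response:
        if len(row) != len(model_flux):
            raise ValueError("each response row needs one entry per energy bin")
        # Sum the contributions of every true-energy bin to this channel.
        counts.append(exposure_s * math.fsum(r * f for r, f in zip(row, model_flux)))
    return counts

# Hypothetical 3-channel x 3-energy-bin response; the off-diagonal terms mimic
# finite energy resolution smearing photons into neighbouring channels.
response = [
    [80.0, 10.0, 0.0],
    [20.0, 150.0, 15.0],
    [0.0, 40.0, 120.0],
]
model_flux = [1e-3, 2e-3, 1e-3]  # hypothetical model spectrum
counts = fold_model(response, model_flux, exposure_s=1000.0)
print(counts)  # predicted counts per channel
```

Fitting a spectrum then amounts to adjusting the model until these predicted counts match the observed ones.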
How much was filtered? For an average observation we usually need to filter out around 40% of the time periods because of flares caused by soft protons hitting the detector. In this analysis we also threw away a lot of the data at lower energies, as we were only interested in the high-energy emission lines, where we can calibrate the detector. I don't know the exact number there; maybe we threw away 80% of the total events by filtering out the low energies. Finally, we also threw away half of the events to retain those with the best energy resolution (those where a photon hits a single pixel on the detector).
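The three cuts described above (flare removal, a low-energy cut, and keeping only single-pixel events) can be sketched as follows. This is a simplified illustration, not the team's pipeline: the `Event` fields, the 4 keV energy threshold, and the rule of dropping time bins whose count exceeds twice the median are all hypothetical stand-ins chosen for a tiny example.

```python
import statistics
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    time: float        # seconds since the start of the observation
    energy_kev: float  # reconstructed photon energy
    n_pixels: int      # number of pixels the photon's charge landed on

def good_time_bins(events, t_start, t_stop, bin_size, max_ratio):
    """Return (start, stop) intervals that are not flared.

    Hypothetical rule: a bin is flared if its count exceeds max_ratio times
    the median count per bin (a simple stand-in for real flare screening).
    """
    n_bins = int((t_stop - t_start) // bin_size)
    counts = [0] * n_bins
    for ev in events:
        idx = int((ev.time - t_start) // bin_size)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    limit = max_ratio * statistics.median(counts)
    return [
        (t_start + i * bin_size, t_start + (i + 1) * bin_size)
        for i, c in enumerate(counts)
        if c <= limit
    ]

def filter_events(events, intervals, min_energy_kev):
    """Keep events in good time, above the energy cut, on a single pixel."""
    return [
        ev for ev in events
        if any(start <= ev.time < stop for start, stop in intervals)
        and ev.energy_kev >= min_energy_kev
        and ev.n_pixels == 1
    ]

# Hypothetical toy observation: four 10 s bins, the third one flared.
events = (
    [Event(1.0, 6.5, 1), Event(5.0, 1.0, 1)]          # quiet bin
    + [Event(12.0, 8.0, 2), Event(15.0, 7.0, 1)]      # quiet bin
    + [Event(20.0 + i, 6.0, 1) for i in range(10)]    # flare
    + [Event(31.0, 5.0, 1), Event(38.0, 3.0, 2)]      # quiet bin
)
gti = good_time_bins(events, t_start=0.0, t_stop=40.0, bin_size=10.0, max_ratio=2.0)
kept = filter_events(events, gti, min_energy_kev=4.0)
print(gti)
print(f"kept {len(kept)} of {len(events)} events")
```

Each cut removes a different kind of event, so the overall fraction kept is best measured directly from the filtered list rather than estimated by multiplying the individual percentages.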
Surprises? For the Perseus cluster, it was nice when I made a map of the motions and ended up with something that looked like the simulations of sloshing. For Coma, I was surprised that the gas in the cluster still has the same velocity as the central galaxies. I would have thought it should have slowed down; it will be interesting to discuss this further with theorists. I was also surprised by the complexity of the detector on the instrument. It seemed a simple idea when I started, but it turned out to be rather tricky.
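Gas motions like these are measured from small shifts in emission-line energies. Below is a minimal sketch of the arithmetic, assuming a line with a rest energy of about 6.7 keV (roughly where the strong highly ionized iron lines in cluster spectra sit) and a hypothetical reference redshift of 0.02; the function names and numbers are illustrative only. It also shows why the calibration matters: at 6.7 keV, a velocity of 100 km/s corresponds to an energy shift of only about 2 eV.

```python
C_KM_S = 299_792.458  # speed of light (km/s)

def redshift_from_line(e_rest_kev, e_obs_kev):
    """Redshift implied by an emission line: 1 + z = E_rest / E_obs."""
    return e_rest_kev / e_obs_kev - 1.0

def relative_velocity_km_s(z_gas, z_ref):
    """Line-of-sight velocity of the gas relative to a reference redshift.

    Standard relation v = c (z_gas - z_ref) / (1 + z_ref), valid for v << c.
    """
    return C_KM_S * (z_gas - z_ref) / (1.0 + z_ref)

def energy_shift_ev(e_rest_kev, v_km_s):
    """Rest-frame line-energy shift (eV) for a velocity v << c.

    The shift seen by the observer is smaller by a factor of (1 + z).
    """
    return 1000.0 * e_rest_kev * v_km_s / C_KM_S

E_REST_KEV = 6.7  # assumed line rest energy (roughly Fe XXV)
Z_REF = 0.02      # hypothetical reference redshift, e.g. of the central galaxies

# Hypothetical observed line energy for one region of a velocity map.
e_obs = E_REST_KEV / 1.021
z_gas = redshift_from_line(E_REST_KEV, e_obs)
v = relative_velocity_km_s(z_gas, Z_REF)
print(f"z_gas = {z_gas:.4f}, v = {v:.0f} km/s")
print(f"100 km/s shifts the line by {energy_shift_ev(E_REST_KEV, 100.0):.2f} eV")
```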
We're planning to pursue this further. We have new deep observations of two other nearby clusters. The aim is to study "feedback" by active galactic nuclei (active black holes affecting their surroundings) in the centres of these clusters. They should be disturbing the gas/plasma, and we hope to measure that, as it hasn't been done before. There are also some things we could do to improve the calibration technique if we have time; for example, we could also use photons that land on multiple pixels.