
I'm still working through the paper, which is fascinating, and have only had a chance to skim it, so forgive me if I ask any questions that are answered in the paper.

I see long timeframes for the data (20 years), but nothing about how much data that actually is. I'm not very familiar with this kind of data, but am curious about the software side: how much data was needed, how long processing took, any special hardware required, etc.

How much data did you start with (gigabytes? terabytes?)?

What does this data actually look like? csv, custom binary format, some open spec maybe?

How much did you end up filtering out for the various reasons in the paper?

Was there anything that surprised you personally while working on this paper? It seems like most of this is confirming existing theory, which is great, but I'm curious if you had any new takeaways.

Does the team want to continue to pursue this? If so, what do they hope to accomplish? Or maybe there's some odd data / behavior that you would like to continue to look at?

Software wise, we use a standard pipeline that reduces the data from the space observatory into the standard astronomy format (FITS), provided by the European Space Agency. The output is in the form of events - X-ray photons which landed on a detector at a particular time. This can then be turned into spectra with the standard software, extracting from particular spatial regions. The spectra can be fit with a standard tool in X-ray astronomy (Xspec), but this also relies on spectral models (some standard, some I made for this project). However, a lot of the hard work is in the form of Python code I made for running the pipeline, extracting spectra, collating the spectra, adding them together, fitting them, collating the results and doing fits. There are also some scripts in tcl for controlling Xspec. The plots and things were done with Veusz (which I wrote) and ds9 (a standard astronomy image GUI).
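To make the "events to spectra" step concrete for readers who haven't seen this kind of data, here is a minimal sketch in plain Python. It is not the actual pipeline code: the event tuples, the circular region, the energy range, and the function names are all made up for illustration, and the real work is done by the standard ESA software on FITS event files.

```python
# Hypothetical event list: (arrival time in s, energy in keV, detector x, detector y).
# Real event files have many more columns and millions of rows.
events = [
    (10.0, 6.4, 5, 5),
    (12.5, 6.7, 6, 5),
    (13.1, 1.2, 20, 20),
    (15.0, 6.6, 5, 6),
    (18.2, 8.0, 4, 4),
]


def in_region(x, y, cx, cy, radius):
    """True if pixel (x, y) lies inside a circle centred on (cx, cy)."""
    return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2


def bin_events(energies_kev, e_min, e_max, n_bins):
    """Histogram photon energies into equal-width channels."""
    width = (e_max - e_min) / n_bins
    counts = [0] * n_bins
    for e in energies_kev:
        if e_min <= e < e_max:
            idx = int((e - e_min) / width)
            # Guard against floating-point rounding at the upper edge.
            counts[min(idx, n_bins - 1)] += 1
    return counts


# Keep only photons from one spatial region, then bin them by energy.
selected = [e for (t, e, x, y) in events if in_region(x, y, 5, 5, 2)]
spectrum = bin_events(selected, 6.0, 9.0, 3)
print(spectrum)
```

The result is just a table of energy channel vs number of photons, which is the kind of object that then gets passed to a fitting tool.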

Yes - we analysed a lot of observations to do the calibration work - that's the advantage of a big public archive. After processing it takes several hundred gigabytes. It probably would take a few times more, but I threw away quite a lot of it which we don't use for this analysis (flared time periods and low energies). That doesn't include the input raw datasets, which might be a few TB - I've not checked, as they're on a different system.

The data, as I say above, is in FITS format, which is a standard binary table format. The processed data are these event files (lists of photons), spectra (tables of energy vs number of photons), and detector responses (matrices to turn a model spectrum into an observed spectrum). Along the way there are lots of intermediate text and FITS files. I even used HDF5 for part of the code, but that's mainly because it's so easy to use from Python.
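As a rough illustration of what "matrices to turn a model spectrum into an observed spectrum" means, the sketch below multiplies a toy response matrix by a toy model spectrum. The 3x3 matrix and the numbers are invented for the example; real responses are far larger and also encode things like the effective area of the telescope.

```python
def fold_model(response, model_flux):
    """Predicted counts per detector channel = response matrix times model spectrum."""
    return [sum(r * m for r, m in zip(row, model_flux)) for row in response]


# Toy response: rows are detector channels, columns are model energy bins.
# Each column says how photons of one true energy spread over the channels.
response = [
    [0.5, 0.25, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.25, 0.5],
]

# A narrow emission line: all the model flux sits in the middle energy bin.
model_flux = [0.0, 8.0, 0.0]

predicted = fold_model(response, model_flux)
print(predicted)
```

In this toy case the narrow line ends up spread over all three channels, which is why fitting compares the folded model with the observed spectrum rather than trying to invert the detector response.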

How much was filtered? Usually we need to filter around 40% of the time periods for an average observation due to flares caused by soft protons hitting the detector. In this analysis we also threw away a lot of the data at lower energies, as we were only interested in the high energy emission lines, where we can calibrate the detector. I don't know the number there - maybe we threw away 80% of the total events by filtering the low energies. Finally, we also throw away half of the events, to retain those with the best energy resolution (those where a photon hits a single pixel on the detector).
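The three cuts described above (flared time periods, low energies, multi-pixel events) can be sketched as simple filters on an event list. This is only an illustration under assumed names: the dictionary keys, the good-time intervals, and the 4 keV threshold are hypothetical and are not taken from the paper or from the real event file columns.

```python
def apply_cuts(events, good_intervals, e_min_kev):
    """Keep events in a good time interval, above e_min_kev, hitting a single pixel."""
    kept = []
    for ev in events:
        in_good_time = any(start <= ev["time"] < stop for start, stop in good_intervals)
        if in_good_time and ev["energy"] >= e_min_kev and ev["n_pixels"] == 1:
            kept.append(ev)
    return kept


# Hypothetical events: time in s, energy in keV, number of pixels the photon hit.
events = [
    {"time": 5.0, "energy": 6.4, "n_pixels": 1},   # passes all cuts
    {"time": 5.5, "energy": 6.4, "n_pixels": 2},   # rejected: multi-pixel event
    {"time": 25.0, "energy": 7.0, "n_pixels": 1},  # rejected: inside a flare
    {"time": 45.0, "energy": 0.8, "n_pixels": 1},  # rejected: low energy
    {"time": 50.0, "energy": 8.1, "n_pixels": 1},  # passes all cuts
]

# Assumed flare-free periods; everything between 20 s and 40 s counts as flared.
good_intervals = [(0.0, 20.0), (40.0, 60.0)]

kept = apply_cuts(events, good_intervals, e_min_kev=4.0)
print(len(kept), "of", len(events), "events kept")
```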

Surprises? For the Perseus cluster, it was nice when I made a map of the motions and ended up with something that looked like the simulations of sloshing. For Coma, I was surprised that the gas in the cluster still has the same velocity as the central galaxies - I would have thought that it should have slowed down - it will be interesting to discuss this further with theorists. I was also surprised by the complexity of the detector on the instrument. It seemed a simple idea when I started, but turned out to be rather tricky.

We're planning to pursue this further. We have new deep observations of two other nearby clusters. The aim is to study "feedback" by active galactic nuclei - active black holes affecting their surroundings - in the centre of these clusters. They should be disturbing the gas/plasma and we hope to measure that, as that hasn't been done before. There are also some things we could do to improve the calibration technique if we have time. For example, we could also use photons which land on multiple pixels.
