
Using Apache Spark to Analyze Large Neuroimaging Datasets - gk1
https://blog.dominodatalab.com/pca-on-very-large-neuroimaging-datasets-using-pyspark/
======
cottonseed
If you're doing neuroscience image analysis, you probably want to take a look
at Bolt, Thunder, Lightning:

[http://bolt-project.org/](http://bolt-project.org/)
[http://thunder-project.org/](http://thunder-project.org/)
[http://lightning-viz.org/](http://lightning-viz.org/)

and the associated work going on at the Freeman lab at HHMI:

[https://www.janelia.org/lab/freeman-lab](https://www.janelia.org/lab/freeman-lab)

~~~
fitzwatermellow
Good stuff, thanks! I recently saw this Nature Neuroscience article on the state
of the art in neuroimaging, and it's amazing:

The Human Connectome Project's neuroimaging approach

[http://www.nature.com/neuro/journal/v19/n9/full/nn.4361.html](http://www.nature.com/neuro/journal/v19/n9/full/nn.4361.html)

------
flxb
Did you try applying sklearn's PCA to a subsampled dataset? Randomly sampling
1% of your dataset would probably let you find the first four principal
components in less than 30 minutes.
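A minimal sketch of that subsampling idea, assuming the data fits in memory as a NumPy array with one observation per row (how rows map to voxels or time points in the original post is an assumption here). `subsample_pca` is a hypothetical helper name; `sklearn.decomposition.PCA` is the real scikit-learn class, and the data below is synthetic, not neuroimaging data.

```python
import numpy as np
from sklearn.decomposition import PCA


def subsample_pca(X, fraction=0.01, n_components=4, seed=0):
    """Fit PCA on a random subset of the rows of X (rows = observations).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    # Keep at least n_components rows so PCA has enough samples to fit.
    n_keep = max(n_components, int(round(fraction * n_rows)))
    # Sample row indices uniformly at random, without replacement.
    idx = rng.choice(n_rows, size=n_keep, replace=False)
    pca = PCA(n_components=n_components)
    pca.fit(X[idx])
    return pca


if __name__ == "__main__":
    # Synthetic stand-in data: 100,000 observations x 50 features.
    rng = np.random.default_rng(42)
    X = rng.standard_normal((100_000, 50))
    pca = subsample_pca(X, fraction=0.01, n_components=4)
    print(pca.components_.shape)  # (4, 50)
```

Sampling rows without replacement keeps each observation at most once; on real data the 1% figure is a tunable guess, not a guarantee that the leading components will be stable.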

It would be interesting to see whether these components differ significantly
from the ones you got on the whole dataset.
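One way to make that comparison concrete, sketched under the assumption that both sets of components are available as NumPy arrays with one component per row (e.g. sklearn's `components_`). The helper name `component_similarity` is hypothetical. It reports the absolute cosine between matching components (the sign of a principal component is arbitrary) and the principal angles between the two spanned subspaces, which are insensitive to ordering or rotation within the subspace.

```python
import numpy as np


def component_similarity(A, B):
    """Compare two sets of principal components stored as rows.

    Hypothetical helper. Returns (per_component, principal_angles_deg):
    - per_component[i]: |cos| of the angle between A[i] and B[i]
      (absolute value because each component's sign is arbitrary).
    - principal_angles_deg: principal angles, in degrees, between the
      subspaces spanned by the rows of A and by the rows of B.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Normalize rows defensively; sklearn's components_ are already unit norm.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    per_component = np.abs(np.sum(A * B, axis=1))
    # Orthonormal bases for each subspace (columns of qa and qb).
    qa, _ = np.linalg.qr(A.T)
    qb, _ = np.linalg.qr(B.T)
    # Singular values of qa^T qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    return per_component, angles
```

In practice you would pass the subsample's `components_` and the components from the full-dataset run; angles near 0 degrees mean the subsample recovered essentially the same subspace.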

