> I immediately found the results suspect, and think I have found what is actually going on. The dataset it was trained on was 2770 images, minus 982 of those used for validation. I posit that the system did not actually read any pictures from the brains, but simply overfitted all the training images into the network itself. For example, if one looks at a picture of a teddy bear, you'd get an overfitted picture of another teddy bear from the training dataset instead.
> The best evidence for this is a picture(1) from page 6 of the paper. Look at the second row. The building generated by 'mind reading' subject 2 and 4 look strikingly similar, but not very similar to the ground truth! From manually combing through the training dataset, I found a picture of a building that does look like that, and by scaling it down and cropping it exactly in the middle, it overlays rather closely(2) on the output that was ostensibly generated for an unrelated image.
> If so, then at most they found that looking at similar subjects lights up similar regions of the brain; putting Stable Diffusion on top of it serves no purpose. At worst it's entirely cherry-picked coincidences.
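If one wanted to test that memorization hypothesis systematically instead of combing through the training set by hand, a rough sketch could look like the following. The directory names and the crude downscaled-MSE similarity are my own assumptions (a perceptual metric such as LPIPS would be more appropriate), and this is not the paper's code:

```python
# Hypothetical sketch: for each generated "reconstruction", find the training
# image it most resembles, to see whether the outputs are near-copies of the
# training set.
import numpy as np
from pathlib import Path
from PIL import Image

def thumb(path, size=(64, 64)):
    """Load an image as a small grayscale array in [0, 1]."""
    img = Image.open(path).convert("L").resize(size, Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0

# Directory names are assumptions, not the dataset's actual layout.
train_thumbs = {p: thumb(p) for p in Path("train_images").glob("*.png")}

for gen_path in Path("generated_outputs").glob("*.png"):
    g = thumb(gen_path)
    # Lower MSE = closer match; a near-zero value against some training image
    # would support the memorization hypothesis.
    best_path, best_img = min(train_thumbs.items(),
                              key=lambda kv: np.mean((kv[1] - g) ** 2))
    print(gen_path.name, "->", best_path.name,
          f"MSE={np.mean((best_img - g) ** 2):.4f}")
```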
Yep, at best methods like these are more like classifiers than "reconstruction" in any meaningful detail.
Still cool that you can classify brain activity, but ultimately reproducing a picture of a building here recovers about the same amount of information as printing the word "building" on the screen.
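To put a rough number on that: selecting one label out of, say, 80 object categories conveys only a few bits, while even a heavily compressed image carries orders of magnitude more. A back-of-the-envelope comparison (the category count and image size here are arbitrary assumptions):

```python
import math

num_categories = 80            # roughly COCO-sized label set (assumption)
label_bits = math.log2(num_categories)

# Upper bound for a 512x512 RGB image at 8 bits per channel; real images have
# far less entropy than this, but still vastly more than a single label.
raw_image_bits = 512 * 512 * 3 * 8

print(f"label: ~{label_bits:.1f} bits, raw 512x512 image: {raw_image_bits:,} bits")
```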
Note too that the paper states, "We generated five images for each test image and selected the generated images with highest PSMs [perceptual similarity metrics]." So it directly admits that the presented images are cherry-picked at least once, in a way that IMO is blatantly cheating (why take the best of 5 and not 5 million? Then the pictures would look even closer to the ground truth!).
If the pictures they showcase at the top are among the best results they got, that means the results are double cherry-picked. In that case it wouldn't surprise me if you could get similar results from random noise.
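A quick simulation illustrates why best-of-N selection is so suspect: even when every candidate is pure noise, the best match to the target keeps improving as N grows. These are entirely synthetic numbers, with a simple correlation standing in for the paper's PSM:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.standard_normal(1000)          # stand-in for a target image's features

for n in (1, 5, 100, 5_000):
    candidates = rng.standard_normal((n, 1000))   # pure-noise "reconstructions"
    sims = [np.corrcoef(target, c)[0, 1] for c in candidates]
    print(f"best of {n:>5}: max similarity to target = {max(sims):.3f}")
```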
fMRI is extremely coarse-grained, and it only measures blood flow. There's no way you can reconstruct what a person is thinking from just that, except at a really basic level.
If we want better brain-to-machine interfaces, we'll need to work on better non-invasive (or at least invasive but low-risk) brain imaging.
MEG has much better temporal resolution than MR, but its spatial resolution is only marginally better (according to the source below, figure 6). The comparison isn't helped by the fact that MR has a vast array of acquisition methods and techniques.
The link below is also interesting for its discussion of fMRI.
I have a feeling this is going to involve NIR-II with fluorescent particles. To be clear, I'm not advocating for this; the thought of it freaks me out. Just because something isn't physically invasive doesn't mean it isn't invasive in other ways!
Methinks it's the thought that it could work that is freaky, not the mechanism itself. If you think governments demanding backdoors to encryption is bad now, what if that tech actually worked? All it would take is a kidnapping case like in "The Cell" for things to get very uncomfortable.
> NSD provides data acquired from a 7-Tesla fMRI scanner over 30–40 sessions during which each subject viewed three repetitions of 10,000 images. We analyzed data for four of the eight subjects who completed all imaging sessions (subj01, subj02, subj05, and subj07).
Imagine viewing 30,000 images in an MRI. Where do I sign up?
The paper states that they created a model per subject. I wonder if it would have been possible to train just one that works for all subjects.
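For what it's worth, my understanding is that the per-subject mapping in pipelines like this is essentially a regularized linear regression from voxels to the diffusion model's latent vectors; the sketch below is a guess at its shape (made-up array sizes, not the authors' code). It also shows why a single shared model isn't trivial: each subject has a different number of voxels in a different anatomical layout.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Made-up shapes: fMRI responses (n_trials x n_voxels) and latent vectors of the
# viewed images (n_trials x latent_dim). Note that n_voxels differs per subject.
subjects = {
    "subj01": (rng.standard_normal((800, 3000)), rng.standard_normal((800, 1024))),
    "subj02": (rng.standard_normal((800, 2800)), rng.standard_normal((800, 1024))),
}

# One regression per subject, which is what the paper reports doing.
per_subject = {name: Ridge(alpha=100.0).fit(X, Y)
               for name, (X, Y) in subjects.items()}

# A single shared model would first need the voxel dimension to agree across
# subjects, e.g. via functional alignment or projection onto a shared component
# space -- you can't simply pool rows when each subject has a different n_voxels.
```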
Also, isn't the visual cortex active when dreaming? I'd love to see a recording of my dreams someday.
I'll copy the first comment by https://news.ycombinator.com/user?id=Aransentin. I usually quote only part of a comment, but this one explains the problem very clearly:
> I immediately found the results suspect, and think I have found what is actually going on. The dataset it was trained on was 2770 images, minus 982 of those used for validation. I posit that the system did not actually read any pictures from the brains, but simply overfitted all the training images into the network itself. For example, if one looks at a picture of a teddy bear, you'd get an overfitted picture of another teddy bear from the training dataset instead.
> The best evidence for this is a picture(1) from page 6 of the paper. Look at the second row. The building generated by 'mind reading' subject 2 and 4 look strikingly similar, but not very similar to the ground truth! From manually combing through the training dataset, I found a picture of a building that does look like that, and by scaling it down and cropping it exactly in the middle, it overlays rather closely(2) on the output that was ostensibly generated for an unrelated image.
> If so, then at most they found that looking at similar subjects lights up similar regions of the brain; putting Stable Diffusion on top of it serves no purpose. At worst it's entirely cherry-picked coincidences.
> 1. https://i.imgur.com/ILCD2Mu.png
> 2. https://i.imgur.com/ftMlGq8.png