Hacker News
High-resolution image reconstruction with latent diffusion models from human brain activity (biorxiv.org)
99 points by rntn on April 28, 2023 | hide | past | favorite | 18 comments



Previous discussion https://news.ycombinator.com/item?id=35012981 (459 points | 56 days ago | 159 comments)

I'll copy the top comment from that thread, by https://news.ycombinator.com/user?id=Aransentin. I usually quote just a part, but this one lays out the problem very clearly:

> I immediately found the results suspect, and think I have found what is actually going on. The dataset it was trained on was 2770 images, minus 982 of those used for validation. I posit that the system did not actually read any pictures from the brains, but simply overfitted all the training images into the network itself. For example, if one looks at a picture of a teddy bear, you'd get an overfitted picture of another teddy bear from the training dataset instead.

> The best evidence for this is a picture(1) from page 6 of the paper. Look at the second row. The building generated by 'mind reading' subject 2 and 4 look strikingly similar, but not very similar to the ground truth! From manually combing through the training dataset, I found a picture of a building that does look like that, and by scaling it down and cropping it exactly in the middle, it overlays rather closely(2) on the output that was ostensibly generated for an unrelated image.

> If so, at most they found that looking at similar subjects light up similar regions of the brain, putting Stable Diffusion on top of it serves no purpose. At worst it's entirely cherry-picked coincidences.

> 1. https://i.imgur.com/ILCD2Mu.png .

> 2. https://i.imgur.com/ftMlGq8.png .
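One way to probe the memorization hypothesis directly is a nearest-neighbour sweep: for each "reconstruction", find the training image it most resembles. A minimal sketch (assuming the images are already loaded as NumPy arrays; a perceptual metric like LPIPS would be more convincing than pixel MSE, but even pixel MSE flags gross memorization):

```python
import numpy as np

def nearest_training_image(generated, train_imgs):
    """Index and MSE of the training image closest to `generated`.

    generated:  (H, W, C) float array
    train_imgs: (N, H, W, C) float array
    """
    diffs = train_imgs - generated[None]          # broadcast over the N axis
    mse = (diffs ** 2).reshape(len(train_imgs), -1).mean(axis=1)
    idx = int(np.argmin(mse))
    return idx, float(mse[idx])
```

If the nearest training neighbour is consistently much closer to the model's output than the ground-truth test image is, that's strong evidence the network is regurgitating training data rather than reading it from the brain.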


Yep. At best, methods like these act more like classifiers than "reconstruction" in any detailed sense.

Still cool that you can classify brain activity, but ultimately reproducing a picture of a building here recovers about the same amount of information as printing the word "building" on the screen.


Note too that the paper states "We generated five images for each test image and selected the generated images with highest PSMs [perceptual similarity metrics]", so it directly admits the presented images are cherry-picked at least once, in a way that IMO is blatant cheating (why take the best of 5 and not of 5 million? The pictures would look even closer to the ground truth!)

If the pictures they showcase at the top are among the best results they got, it means it's double-cherry picked. In that case it wouldn't surprise me if you could get similar results from random noise.
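That last intuition is easy to check with a toy simulation (nothing to do with the paper's actual pipeline): generate N pure-noise candidates and report the one most similar, by cosine similarity, to a fixed "ground truth" vector. The best score climbs steadily with N, which is exactly why best-of-N selection inflates apparent quality:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.standard_normal(1_000)      # stand-in "ground truth" vector

def best_of_n(n):
    """Highest cosine similarity between the target and n pure-noise candidates."""
    cands = rng.standard_normal((n, 1_000))
    sims = cands @ target / (np.linalg.norm(cands, axis=1) * np.linalg.norm(target))
    return float(sims.max())

for n in (1, 5, 100, 10_000):
    print(f"best of {n:>6}: {best_of_n(n):.3f}")
```

Even though every candidate is noise, the best-of-10,000 score is reliably far above the best-of-1 score, purely from selection pressure.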


fMRI is extremely coarse-grained, and it only measures blood flow, not neural activity directly. There's no way to reconstruct what a person is thinking from that alone, except at a really basic level.

If we want better brain-to-machine interfaces, we'll need to work on better non-invasive (or at least invasive but low-risk) brain imaging.


There's magnetoencephalography[1][2], which from what I can tell offers a more direct measure of brain activity.

[1]: https://en.wikipedia.org/wiki/Magnetoencephalography

[2]: https://hebergement.universite-paris-saclay.fr/supraconducti...


MEG has much better temporal resolution than MR, but its spatial resolution is only marginally better (according to the source below, figure 6). Comparisons are muddied, though, by the vast array of MR acquisition methods and techniques.

The below link is also interesting for its discussion of fMRI.

https://www.nature.com/scitable/blog/brain-metrics/what_does...


I have a feeling this is going to involve NIR-II with fluorescent particles. To be clear, I'm not advocating for this; the thought of it freaks me out. Just because something isn't physically invasive doesn't mean it isn't invasive in other ways!


It freaks you out? Isn't NIR dye pretty safe to use?


Reading thoughts is the ultimate invasion of privacy. Nothing weird about that freaking people out.


Methinks it's the thought that it could work that is freaky, not the mechanism itself. If you think governments demanding backdoors to encryption is bad now, what if that tech worked? All it would take is a kidnapping case like in "The Cell" for things to get very uncomfortable.


> NSD provides data acquired from a 7-Tesla fMRI scanner over 30–40 sessions during which each subject viewed three repetitions of 10,000 images. We analyzed data for four of the eight subjects who completed all imaging sessions (subj01, subj02, subj05, and subj07).

Imagine viewing 30,000 images in an MRI. Where do I sign up?
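The 30,000 figure is just the quoted numbers multiplied out; spread over 30-40 sessions, that's roughly 750-1,000 image presentations per session:

```python
images = 10_000
repetitions = 3

trials = images * repetitions          # total presentations per subject
print(trials)                          # 30000
print(trials // 40, trials // 30)     # 750 1000  (per-session range)
```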


Yeah, honestly, that would be pretty fun. :)


That would be torture for this ADHD claustrophobe!


You'd go insane from the incessant clicking of the MRI machine.


The paper states that they created a model per subject. I wonder whether it would have been possible to train just one that works for all subjects. Also, isn't the visual cortex active when dreaming? I'd love to see a recording of my dreams some day.


They didn't even validate it with a dead fish.

https://www.wired.com/2009/09/fmrisalmon/


Is non-invasive brain imaging at "thought-level" resolution an impossible problem?

Imagine trying to image your CPU/motherboard to see what state its electrons are in.


This is great. I would start a company for tinfoil helmets immediately. Huge market. Big rewards.



