Identifying organic compounds with visible light

cookieperson · on March 19, 2023

This paper is pretty weak. Sorry to bust the AI hype train on this, but collecting the refractive index of compounds at a bunch of wavelengths is completely possible and not revolutionary. I did it at a job, and so have my friends. In some ways RI-based measurements are harder to take than Raman measurements. Classification based on RI is also nothing new...

Also... there are some pretty serious challenges with using other people's data rather than actual data collected from a single instrument. Yes, RI is a physical thing, but taking those measurements across many wavelengths involves error, sometimes instrument-dependent error.

I won't bother looking at the code because neither the abstract nor the results particularly interest me as someone trained in this field (notably, the author doesn't seem to be). Some part of me does wonder whether they ran this experiment without testing on training data. It happens all the time with newcomers. Again, not that the claim is unbelievable; it's just a worry given the premises posed in the paper itself. A symptom that this is probably low quality is that it's not published in a journal whose reviewers are interested in or trained to actually review this. Yes, believe it or not, there are five or so journals way better suited for this kind of publication. I almost worry it was rejected elsewhere before landing in a p-chem journal because... it doesn't seem very good, new, well written, or useful.

bildung · on March 19, 2023

Funnily enough, using the refractive index to discern compounds is the modus operandi of one of the very first spectrometers, made by William Hyde Wollaston in 1802: https://www.en.silicann.com/blog/post/history-of-spectroscop... (disclaimer: I wrote that).

Very cool approach, though it will most probably only work for classification (i.e. what is this sample), whereas NIR spectroscopy (most common when working with organic compounds) allows for quantification of the sample analytes (how much of each analyte is in this sample).

cookieperson · on March 19, 2023

Yeah, their goal seems to be classification. I'm completely unimpressed by this paper. I'm a former participant in this field, and I feel bad for the outsiders here thinking this is a breakthrough. Had I reviewed this, it wouldn't have been accepted, and I'm a far more liberal reviewer than a lot of people in the field. It's unsurprising to me that this was published in a journal for which the topic is a bad fit...

sargun · on March 19, 2023

This is potentially super powerful for the idea of a pocket spectrophotometer for everyone. This idea has been in many startup pitches over the last 15 years -- the most common consumer use case is identifying allergens. I can see this being useful beyond that too, with things like reagent-free desktop drug testing systems that don't cost $25k.

cookieperson · on March 19, 2023

Not really. You can build a pocket Raman spectrometer for probably 1000 USD max.

Refractive-index-based classification across wavelengths is probably not going to help you find allergens, unless you isolate the allergen from the mixture, then take the measurement, and have that allergen in your model...

kwhitefoot · on March 19, 2023

Sounds interesting but it's got a long way to go before it will be useful to anyone.

ginko · on March 19, 2023

Does the voting step with multiple classifiers make sense? My gut feeling would be that training a single larger network would be more effective.

gpcr1949 · on March 19, 2023

They use a random forest classifier, which is an ensemble model that gives a consensus result of several decision trees. One way to achieve this consensus is voting. Random forest models are commonly used in building chemical models like this (and in QSAR), because they are quite robust. Due to the typically small size of chemical data sets (dozens to thousands of samples), more sophisticated methods are often not usable or do not perform better.
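To make the voting idea concrete, here is a minimal sketch of hard majority voting over per-tree predictions. It is not the paper's code: the `majority_vote` helper and the compound labels are made up for illustration, and some random forest implementations average per-class probabilities across trees instead of counting hard votes.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from five decision trees for a single sample.
tree_votes = ["ethanol", "methanol", "ethanol", "ethanol", "acetone"]
consensus = majority_vote(tree_votes)
print(consensus)
```

Each tree sees a different bootstrap sample of the training data, so individual trees disagree; the vote smooths out those disagreements, which is part of why the method is robust on small data sets.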

cookieperson · on March 19, 2023

Even then, random forest is the wrong choice for this type of data. It should be the thing you try in your first hour of having the data, before choosing something more appropriate.

cookieperson · on March 19, 2023

My experience is that there are far simpler, faster, and well-known models for performing this type of classification, and that the premise of this study is flawed in so many ways that it could have absolutely no utility at all if someone attempted to implement it in real life.

jplona · on March 19, 2023

Just from the title, I'm imagining this is more or less https://images.app.goo.gl/jGsA1QGurgprJbw78

cookieperson · on March 19, 2023

You made my day lol, this is what I got from it too.