There are a few alternative designs that aim to mitigate this particular issue, either by preventing crosstalk or by using multiple LED wavelengths and angles. The second one is something a med device engineer told me about a few months ago, but I'm having trouble finding any papers or information on it.
That effect is going to be tiny, though. Wikipedia is telling me a hemoglobin tetramer masses "about" 64,000 daltons, so a complex loaded up with 4 CO will mass 64,111 amu vs 64,127 with 4 O2. That's about a 0.025% difference.
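A quick sanity check of that arithmetic, using approximate molecular weights (the exact tetramer mass depends on the variant, so treat the last digits as fuzzy):

    # Back-of-the-envelope: mass difference between Hb loaded with 4 CO vs 4 O2.
    # 64,000 Da is the "about" figure from Wikipedia, not a measured value.
    hb = 64000.0
    co = 28.01   # molecular weight of CO
    o2 = 32.00   # molecular weight of O2
    hbco = hb + 4 * co   # ~64112
    hbo2 = hb + 4 * o2   # ~64128
    print((hbo2 - hbco) / hbo2 * 100)  # ~0.025% relative difference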
On how these devices work: it's typically two LEDs that produce light which bounces through the skin, some light is absorbed by blood (different absorption depending on oxygenation and density), and the reflected light is captured by a photodiode. All of this is on one surface; they are very basic. The density changes rapidly and that's how you capture the pulse (easy to get, btw), while the oxygenation is based on absolute levels (THIS IS THE PROBLEM!!!!). After building one I learned not to take these things too seriously, especially fingertip- and wrist-mounted ones.
I encourage others to build them. You can do most of it in a weekend if you have some soldering experience (best if you can do surface-mount soldering), some programming, and are comfortable doing data analysis. And no, I do not expect my device to be on par with the ones in hospitals, but I was quickly able to match readings from a finger-based device that I bought from the local drug store.
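For anyone curious what the data-analysis side of such a build looks like, here is a minimal sketch of the classic ratio-of-ratios calculation that most hobbyist designs end up with. The 110 - 25*R calibration constants are a textbook placeholder, not values from any real device:

    import numpy as np

    def estimate_spo2(red, ir):
        """Toy SpO2 estimate from raw red and infrared photodiode samples.

        The pulsatile (AC) component rides on a large steady (DC) level;
        the ratio of the two normalized amplitudes is mapped to SpO2 via
        an empirical calibration curve."""
        red = np.asarray(red, dtype=float)
        ir = np.asarray(ir, dtype=float)
        ac_red, dc_red = red.max() - red.min(), red.mean()
        ac_ir, dc_ir = ir.max() - ir.min(), ir.mean()
        r = (ac_red / dc_red) / (ac_ir / dc_ir)
        return 110.0 - 25.0 * r  # placeholder calibration, NOT clinical

The pulse rate falls out of the AC component almost for free, which is why that part is easy; the absolute-level calibration is where all the trouble described above lives.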
I'm not very familiar, so I'm not sure how to interpret this. Does this describe the normal procedure for when you can rely on a pulse oximeter, or does it describe the modified procedure for when you can't?
"If your service has a pulse oximeter, you should have a protocol describing when to use it. Generally this will include all patients complaining of respiratory problems or otherwise at risk for hypoxia. When used properly, the device can help you to assess the effectiveness of artificial respirations, oxygen therapy, and bronchodilator (inhaler) therapy."
Pulse oximetry is for monitoring change. When a critical value is needed for a medical intervention, such as a decision to put in a breathing tube, tests like arterial blood gases are used.
For one, according to the article the discrepancy was found to be up to 8 points in some cases, not 3. And it seems reasonable that quick, practical decisions in hospitals are made based on point thresholds, not purely huge point leaps as you suggest.
I haven't dug any further than the article and your comment to verify these facts. But rather than find a way to prove why this doesn't matter, why don't we assume it does? Then the knowledge and awareness might spread a little bit more through us, and if it is an issue, it might be that much more likely to be solved. There's no penalty for being wrong with a weakly held but well-meaning assumption. At the very least, we can be more aware of another possible dimension of bias in technology.
Because it should be casually dismissed. This person doesn't know what they're talking about, refused to listen to the people who told them they didn't know what they are talking about, and wrote an article full of nonsense.
If the assumption is incorrect, then it would be misinformation that we would be spreading.
Also, doctors are already aware of how oximeters may be inaccurate. That doesn't mean they can't still be useful.
First, let's be clear:
> Doctors know that these pulse ox meters aren't very accurate and use them accordingly
Some doctors know...
> Nobody's changing their plan because you have a pulse ox of 93 instead of 95. They're there to quickly measure big changes that indicate a problem.
Sometimes, perhaps even often, they're used for this. It's one use case of pulse oximetry, and it's an important one.
I read your post as extrapolating general claims from common cases, and erasing the real experiences of people that fall outside of these common cases.
Bias often happens at the margins. That doesn't make it less real for the people who suffer from it, even if most people don't suffer, even if the common cases work well.
It can be hard to see those margins when you aren't in them, but that doesn't mean they don't exist -- they are still worth writing about and trying to fix.
As someone else pointed out, this isn't only about melanin in the skin. Darkened skin for any reason will cause this (burns, smoking, dirt, etc...). Medical personnel are trained to account for this partial accuracy.
The other fact is that there are newer techniques being worked on to improve the technology.
This is absolutely bias, and it is entirely due to engineering choices made by people, not nature.
Bias doesn't happen in physics. Bias happens where humans and physics interact.
> Medical personnel are trained to account for this
Medical personnel aren't the only people who use pulse oximeters and need them to read reliably. Beyond that, this thread and the article it discusses are full of examples of why "medical personnel are trained" isn't enough of an answer here.
"Cargo Cult Science" (esp. Millikan).
Einstein to M. Curie:
Planck: Science advances one funeral at a time. (See also Kuhn.)
Yes, the way in which varying wavelengths of light interact with varying configurations of bulk matter is a physical phenomenon, and not subject to bias.
But that isn't the same as saying, as the comment to which I originally replied seemed to do, that no bias can exist in the way in which that phenomenon is interpreted by humans.
You sound pretty emotional, maybe tone it down and try to think objectively.
Just because there's an underlying physical reason for the device to be inaccurate on a subset of the population, doesn't mean that it's fair to use that device on the impacted population. It's been accepted by the medical community, despite the additional training required to correctly use the device -- amounting to guesswork when the medical personnel remember that it's necessary. Accounting for human error, this means that affected populations will receive a lower quality of care.
Just a hypothetical. Would it be accepted if it was accurate for black women, and required guesswork when treating white men? Because there is a long history of significant bias in medical studies. Just because "it's just physics" explains the device's flaw, doesn't mean that acceptance of a flawed device isn't a result of systemic bias.
The only thing you could claim injustice for would be if there was no other mechanism for measuring blood oxygen other than something that was known to not work properly for people of color.
No one, myself included, is saying that physics is biased, or that this tool is inherently biased. The headline is misleading: it's really talking about the interpretation of the number that's biased, not the physics. The number is just a number.
The bias comes through the typical institutional use of this tool, as described in the article: the "normal" ranges are based on what is "normal" for lightly pigmented people, so heavily pigmented people are by default not in the "normal" range. Do some doctors realize this? Of course they do. But as programmers we all know the power of defaults: some doctors won't realize this, or will forget, or will realize it but it will still affect them.
Other policies (like the Medicaid one described in the article) also ignore these differences.
It’s not the physics that’s biased. It’s the institutional and individual assumptions around the tool.
However, yes: in my experience I never saw enough of a difference in dark-skinned individuals to be clinically meaningful.
Threshold values are used to make decisions, dark-skinned people do read higher (meaning decisions not informed by knowledge of the difference are likely to delay intervention for darker-skinned people), and they also have worse outcomes with otherwise similar presentation for many of the conditions where pulse oximetry would be a factor in decision-making. It's quite possible that the outcome difference is in part because of the measurement difference, even if the likely impact in each individual case is minor.
The problem is that that '95' might really be a '90' - surely a drop from 98 to 90 is a cause for concern. If that information is being hidden, it needs to be fixed, regardless of other, more subjective fallback methods.
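A toy illustration of why a constant positive bias matters around a threshold. The 92% intervention cutoff and the +3 point overestimate below are both invented for the example, not taken from any guideline or study:

    THRESHOLD = 92   # hypothetical "intervene below this" SpO2
    BIAS = 3         # hypothetical overestimate for darker skin

    for true_sao2 in range(94, 85, -1):
        displayed = true_sao2 + BIAS
        if true_sao2 < THRESHOLD and displayed >= THRESHOLD:
            print(f"true SaO2 {true_sao2}: device shows {displayed}, no flag raised")

The readings it prints (true SaO2 of 89-91 displayed as 92-94) are exactly the window where the delay in intervention would happen.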
For reference I have two pulse oximeters at home and they routinely give me results that differ by 2-3%.
Seems like an issue we should fix, but I don't know what impact this would have in a clinical setting given that they use multiple criteria and the devices are known to be inaccurate.
The SpO2 measurement performance of a device must be verified before the device is released to the market. The U.S. Food and Drug Administration (FDA) suggests using standards presented in the following:
ISO 80601-2-61:2017 – Medical electrical equipment -- Part 2-61: Particular requirements for basic safety and essential performance of pulse oximeter equipment
Pulse Oximeters – Premarket Notification Submissions [510(k)s] Guidance for Industry and Food and Drug Administration Staff
According to these regulations, manufacturers need to declare the calibration range, reference, accuracy, methods of calibration and range of displayed saturation level. Furthermore, for the performance assessment, the FDA requires at least 200 data points equally spaced over a saturation range of 70% to 100%. Test subjects should have different ages, gender, and skin tones. For instance, the FDA requires that at least 30% of the volunteers must have dark skin pigmentation. The overall error or the root mean square error (RMSE) must be below 3.0% for transmissive pulse oximetry and below 3.5% for reflective pulse oximetry.
© 26 Mar, 2019, Maxim Integrated Products, Inc.
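For concreteness, the accuracy figure in that guidance is essentially the root-mean-square difference between paired device readings (SpO2) and arterial CO-oximetry reference values (SaO2). A simplified sketch of the calculation, with invented data points:

    import numpy as np

    def rms_error(spo2, sao2):
        """RMS difference between device readings and reference SaO2,
        in percentage points (simplified ISO 80601-2-61 style metric)."""
        spo2, sao2 = np.asarray(spo2, float), np.asarray(sao2, float)
        return np.sqrt(np.mean((spo2 - sao2) ** 2))

    # Invented paired readings, just to show the calculation:
    print(rms_error([97, 93, 88, 81, 74], [96, 94, 86, 80, 72]))  # ~1.5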
I get that it is interesting to research optical measurement systems to 'search for bias', and I'm sure there are going to be biases in varying degrees, but that doesn't mean it's always for the same reason or with the same result.
There are very real problems, of course. Take the example of photographic film development from the article: that is a problem because a picture that is supposed to reflect the subject you photographed no longer does what it was supposed to do. And someone here posted the example of the soap dispenser that couldn't "see" people who didn't have light skin. Also a problem, because now you can't get soap. Both of those are mostly examples of errors made during engineering (apply Hanlon's razor when in doubt - it's not always malice like people often assume), and might have been preventable if there had been more diverse samples and interactions available during development.
That doesn't mean it's the only type of problem we have with devices for humans, or that keeping a color palette of people around is always going to fix it.
Take the soap dispenser example: you could have made one that uses radar or ultrasonic detection. That solves multiple problems, because you no longer have to consider lighting conditions, which always vary due to differences in bathrooms (light fixtures, positioning, positioning of the dispenser itself, etc.). That doesn't relate to skin color, but it can just as easily be a similar problem that could have been prevented all the same by doing proper engineering. But it's likely that in cases like this (soap dispensers) the detection system was just a module that some other third-party company imports and sells, and they bought it from a module integrator somewhere else, who bought the parts as generic motion detectors from a company four layers down the chain that specializes in measuring daytime movements of rabbits, or something silly like that.
On the other hand: if you have a government that requires certain controls to be in place and applied for medical equipment to be used in a medical setting you would expect some of the requirements to be drawn up in a way that reflects the use cases. It appears that either the devices are not scoped within those controls, or the controls failed. Or it's all good and the difference in measurement is not significant enough to matter for the use case and thus the devices are used as-is and this research doesn't relate to the real world as well as people think.
When I see these types of examples brought up, I don't see malice being assumed. Rather, these are good illustrations of structural bias. "Structural" (or "systemic") is often used in contrast to overt bigotry or malice. It's sometimes the result of a certain workforce or industry being substantially skewed with respect to the wider population. You're right, Hanlon and his razor do work here to show that it's a naive error rather than intentional harm. But the result is still traceable to the disparities between the designers and the population at large.
As for the soap dispenser: If the team of engineers designing the module had a mix of skin colors, then they would have quickly seen that their product failed sometimes. They would, at the outset, have known that they needed to tune the photoreceptor better, or switch to a different method of detection.
Though it is funny that Nikon, a Japanese company, made cameras with a blink detector that falsely triggered on Asian eyes. That either refutes my own point above, or it indicates they imported that functionality.
It could be that the device was designed and completed for manufacturing 15 years ago, the manufacturing was shifted around between two companies, and the supply chain is 10 levels deep, with participants that all did their own tests on their own part, in their own company, in their own country, on their own continent. And between that and the consumer, it turned out there were four companies that just trade and don't actually look at, touch, or use the product.
This is probably not what happened at Nikon; I'd guess that they have a few departments that may not work together as well as they could. It would be interesting to see how this actually came to be. I imagine there are many ways to create a (partially) bad product, and equally many ways to try to prevent that.
When it comes to us in particular, we can each truthfully say "it wasn't my fault and I didn't mean any harm". But harm happens, and Hanlon's razor becomes dull with repeated application like that. As another rule goes, "Once is happenstance, twice is coincidence, three times is enemy action".
The solutions to systemic problems aren't simple. They certainly include rejecting the people who are outright being malicious. They also include a willingness to say, "Gosh, I keep using this didn't-mean-anything excuse. Maybe I need to start instead meaning to not do it."
Most of it was designed somewhere, built somewhere else, integrated yet somewhere else, and then simply bought out of a catalogue in bulk, shipped to a warehouse and sold in smaller bulk to stores. The products and people that could have detected a problem are all over the world and have seen every possible version of a human, and still the products end up in use with the problems they have.
If the pattern were constrained to fit a scenario as you describe, then it would probably be something we can do something about. But just like in enterprise software contracts, the distance (mental, physical, and legal) between all of this doesn't allow for much change from any single actor. And I'm pretty sure the manufacturer of a sensor, who doesn't even know where it ends up 10 layers deep in the supply chain, is not going to be remotely aware of this.
In practical terms we're seeing an anti-pattern and we try to personify it so we can express our dissatisfaction, but that's just as practical as shouting at the sun for not damaging all skin types at similar rates. There is no single source of the fault and there is no person or company or something like that, that is 'responsible' for it.
One possibility is to have some sort of consumer organisation test products on a diverse set of humans and report on products that lack support for certain shapes, colours or sizes, but other than that I have no idea how we'd adjust multiple people at multiple companies in multiple countries with multiple cultures.
The problem comes when you say, "Hey, maybe there oughta be some black people involved in some of the processes", and people scream "reverse racism" and "affirmative action" and all the other buzzwords. They assert the harm to themselves, and they get listened to. But when black people say, "Well, I'm also being harmed, and in ways you're not seeing," the response is "well, that specific thing isn't my fault".
The solutions are systemic. As you say, it's a chain: for a black person to get hired into that position everything has to have gone right for them. They didn't get failed by bad schools and they didn't get dismissed by their professors and they didn't get filtered out by a resume screener who didn't like their name and they didn't lose a job to somebody who hires people they know -- which just doesn't happen to include black people.
So it's not easy to solve, and even the good solutions are slow. But one critical step is for each person to recognize that their local optimizations and local considerations contribute to the bigger problem, and think of ways that they can make things better rather than just say "not my problem".
If the building is commercial, not residential, you'll find yourself on the business end of an ADA complaint. Not because of malice, but because of neglect.
The point isn't that people have different skin. It's not even that this one common device works better for some people than others. It's that the practice of medicine, having grown up around this cheap and effective device, turns out to be discriminatory in ways that we wouldn't naively expect.
That's worth talking about, because it's a problem that should be fixed.
> apply Hanlon's razor when in doubt - it's not always malice like people often assume
Nowhere in the article is "malice" alleged. You added that part yourself, precisely because the bias described has infected you too. Reflexively arguing that journalism shouldn't report "black people get bad O2 readings" isn't helpful to anyone, most especially the patients needing better care.
It's more that having a higher quantity of a known problem doesn't make the problem go away, and neither does trying to find a single source or single instigator of a systemic problem. It's not a new problem either.
If we take skin color and replace it with finger length for devices that require you to grip it, or with distance between the eyes for things that you look through, we have the same issue. In that regard this is just another facet of ergonomics in design. Even if we were to test for a lot of things in a lot of places in the supply chain you still can't catch everything.
That's the point of inclusion.
Although the soap dispenser example feels like one of those problems, it is just wear and tear of the sensor and they work properly when cleaned.
In this case, pulse-ox failures can affect when people are admitted for treatment.
Same with all the white labeling and supply chain and integrators in between; the disconnect between all of the stages and the contracts that can bind a bad product to be delivered and sold anyway can't be fixed on just one end of the process.
Why not? That seems like the appropriate place to reject the inferior components arriving from that long and deep supply chain. You're creating a soap dispenser for sale in North America. It does not work on 15% of the population. That's a fail. You can't just throw your hands up and say, "Well, those photoreceptors were designed in Taiwan, and they didn't account for dark skin". If that's the case then you don't source those, or you request a different calibration profile, or you add a lens filter that compensates. You engineer the solution for your target market.
The same company using global components took the time to print logos in English and the user manual in English. They made sure the power circuitry matched North America's system. They set up a domestic phone number for support. Established payment systems to accept USD for their product.
They do all these things to sell better into a particular market, but they don't bother with the actual people who will be using it?
Don't get me wrong. You are accurately explaining the how of the failure. But that's exactly why the last step in the supply-chain exists. To make the actual finished product. That's where you catch deficiencies like this.
Yes, sure, why don't we dispense with our most important and practical monitoring system because it has a racial bias? Pulse oximetry saves lives every day. Many lives. Just ask older caregivers who were around before these devices existed.
uh we're saying make a better one.
and to only order the better ones.
and if you are making a better one to factor in these things.
and it is easier to factor in these things with a more diverse team as the edge cases won't be considered edge cases, but main cases.
Also, no one said this is the only problem we have with devices.
One of the off-brand meters was completely wrong (e.g. dangerous) for blacks, the other one was in between.
IMO it seems that the certification requirements for these devices, especially the ones used by medical staff to make decisions, need to include better testing.
Another interesting point that I did not consider: you can have a device with an average error under a safe threshold that still gives a much-above-average error for some groups, putting those groups at risk if the people using these devices are not warned about the issue.
It is a long article with many studies cited, so I can't check all of them. IMO "the readings can be affected by skin tone" is not sufficient for a medical device. What should a nurse or doctor do? There is no numeric value, so what does that mean? Is the device useless if you have dark skin, or is it 2% wrong, or, as the article suggests, is the error non-linear and does it increase if you are suffering from low oxygen?
What I would do if I were selling these products is test with a few ranges of skin tones and, if needed, have a switch on the device that you set for a certain level, or, if that is too expensive, print an interval instead of a number and train the user to read it.
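A rough sketch of what that interval display could look like; the per-setting error bounds here are invented purely for illustration, not taken from any study or device spec:

    # Hypothetical per-setting error bounds (made up for the example).
    ERROR_BOUNDS = {"light": 2.0, "medium": 3.0, "dark": 4.0}

    def display_interval(reading, skin_setting):
        err = ERROR_BOUNDS[skin_setting]
        low = reading - err
        high = min(100.0, reading + err)
        return f"SpO2 {low:.0f}-{high:.0f}%"

    print(display_interval(94, "dark"))  # "SpO2 90-98%"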
FWIW, my wife (Han, who grew up in Shanghai) generally has whiter skin than do I (a generic American of mostly northern European descent).
Wait. Would this happen to PIR-based systems?
The author cites one study where doctors noticed "a bias of up to 8 percent", which sounds high, but if you click through to the link provided, it says:
>The mean bias (Spo2 − Sao2) for the 70%–80% saturation range was 2.61% for the Masimo Radical with clip-on sensor, −1.58% for the Radical with disposable sensor, 2.59% for the Nellcor clip, 3.6% for the Nellcor disposable, −0.60% for the Nonin clip, and 2.43% for the Nonin disposable.
Basically, the amount of bias varies by manufacturer and type of device, and it is not always a positive bias as implied by the text of the link.
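For clarity, "mean bias" there is just the signed average of SpO2 minus SaO2 over the paired readings, so it can legitimately come out negative for some sensors. A tiny example with invented numbers:

    import numpy as np

    # Invented paired readings; the point is only that the average of the
    # signed differences can be positive or negative depending on the sensor.
    spo2 = np.array([76.0, 74.5, 72.0, 78.0])
    sao2 = np.array([74.0, 76.0, 73.5, 75.0])
    print(np.mean(spo2 - sao2))  # 0.5 here; another sensor might give -1.2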
This seems to be more of a case of measurement error bounds inherent to any measurement device, rather than "a popular medical device encod[ing] racial bias" and all the other rhetoric in the article.
"systemic racism is as much a part of biology as genomes are: The conditions in which we develop—including limited access to healthy food, exposure to toxic pollutants, the threat of police violence or the injurious stress of racial discrimination—influence the likelihood that any one of us will suffer from high blood pressure, diabetes or serious complications from COVID-19"
As a nonmedical person, this seems blindingly obvious - to the point I could reference the "floor is made of floor" meme - so it would surprise me to learn that professionals haven't considered this downfall of the technology when taking their readings. Judging by the discussion here, I don't think I'm going to be surprised today.
What I am saying is that systems developed by a team with a majority of a single race are bound to have biases, purely because of the lack of opinion diversity while testing the system.
But I could be completely wrong...
Edit: I pressed send by accident, I never meant to actually post this comment but I'm leaving it up anyways.
Unfortunately it turned out that the fingerprint readers were racist, and refused to allow black people into the building.
They were swiftly removed.
Edit: there is a further problem: because many interpretations of the same facts are possible, if you go looking for certain things you will very likely find them, while someone with different initial beliefs will find evidence of something else. All from the same facts.
The article presents what appears to be a plausible, evidence-based hypothesis as to why skin color in particular (not exclusive of other factors) might lead to inaccurate results for dark-skinned people. It then mentions studies done to either confirm or reject this hypothesis, and that the results seemed to confirm it. They link to a follow-up study done here.
You're only giving vague, general dismissals. If you believe the conclusions reached are in error, what evidence do you have that pulse oximeters are not affected by skin color?
I'm also not suggesting we ignore skin color as a relevant factor. I'm taking issue with the idea that everything that involves skin color is evidence of bias or other more malicious underlying causes.
“That statistical estimator has a bias” means that you can expect it to often produce an over estimate (for example).
Yes, it does. The FDA blocked its use.
Is it possible to jailbreak iOS and gain access to... hmm, probably not. Even if the driver ships in iOS, it would be a fair question how much "disabled" was implemented by removing secret/irreplaceable code rather than simply disabling flags (potentially because the former approach was more convenient).
I bet they do. They certainly did a while back, and I've found nothing saying this has been fixed.