Hacker News
Pulse oximeters give biased results for people with darker skin (bostonreview.net)
180 points by lilrhody on Aug 6, 2020 | 117 comments

I'm a student EMT and our instructor told us not to rely on a pulse ox reading if our patient has dark skin, or if they were in a fire, or are a lifelong smoker, or if they have any nail polish or dirt/grease on their fingertips. We just look for the 94-100% range and would only take action if we get a reading around 87% or lower, or see a sudden drop during treatment.
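As a rough sketch, the instructor's rule of thumb above can be written as a small decision function. The 87% and 94-100% cut-offs come from the comment. The `drop_points=4.0` threshold for a "sudden drop" and the `reading_reliable` flag are hypothetical, because the comment does not quantify either. This only illustrates the logic and is not a clinical protocol.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NORMAL = "94-100%: no action based on this reading alone"
    MONITOR = "between cut-offs: keep assessing the patient"
    ACT = "about 87% or lower, or a sudden drop: take action"
    UNRELIABLE = "reading not trusted: rely on other assessment"


def triage(spo2: float,
           previous: Optional[float] = None,
           reading_reliable: bool = True,
           drop_points: float = 4.0) -> Action:
    """Apply the comment's rule of thumb to one SpO2 reading.

    drop_points is a hypothetical value for what counts as a "sudden drop".
    """
    # Conditions listed in the comment (dark skin, fire exposure, lifelong
    # smoking, nail polish, dirt or grease) make the number untrustworthy.
    if not reading_reliable:
        return Action.UNRELIABLE
    # Act on a low absolute value or on a large drop since the last reading.
    if spo2 <= 87 or (previous is not None and previous - spo2 >= drop_points):
        return Action.ACT
    if 94 <= spo2 <= 100:
        return Action.NORMAL
    return Action.MONITOR
```

For example, `triage(95, previous=99)` returns `Action.ACT` because of the 4-point drop, even though 95 alone falls in the normal range.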

There are a few alternative designs that aim to mitigate this particular issue, either by preventing crosstalk [1] or by using multiple LED wavelengths and angles. A med device engineer told me about the second one a few months ago, but I'm having trouble finding any papers or information on it.

[1] https://www.hindawi.com/journals/jhe/2018/3521738/

The article notes that the devices inaccurately report higher oxygen levels for darker skin tones. That means that if a dark-skinned person gets a reading in the 94-100% range, they may in fact be at lower oxygen levels that put them at risk.

The bias is smaller than the actual measurement error. The measurement error is around +/- 1. So a 96 reading means actual blood O2 of 95-97. The bias introduced by skin color is around 0.5. So a 96 reading means 95-97 in a white person and 94.5-96.5 in a black person. The bias from skin color isn't significant because the measurement itself is pretty inaccurate.
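The arithmetic in this comment can be made explicit. A minimal sketch, assuming (as the comment does) a symmetric ±1 point measurement error and a skin-color bias that makes the device read 0.5 points too high; both numbers are the commenter's, not values from a datasheet.

```python
def plausible_true_range(reading: float, error: float = 1.0,
                         bias: float = 0.0) -> tuple[float, float]:
    """Range of actual SaO2 consistent with a pulse-ox reading.

    A positive bias means the device reads high, so the range shifts down.
    """
    center = reading - bias
    return (center - error, center + error)


print(plausible_true_range(96))            # (95.0, 97.0)
print(plausible_true_range(96, bias=0.5))  # (94.5, 96.5)
```

With these inputs the two ranges overlap almost entirely, which is the commenter's point; the conclusion depends on the bias staying small relative to the error.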

That's true. There are other context clues that we can use, like a patient's medical history, how their breathing looks/sounds, or if their environment or mental status suggests CO poisoning (that also won't show up even on the perfect pulse ox, since it binds to hemoglobin the same way oxygen does). Overall we just need a better way to get information about oxygen levels.

At first thought, a hemoglobin protein loaded up with CO molecules should have slightly less mass than an oxygenated hemoglobin complex, which could... theoretically... be measured, maybe with Raman spectroscopy.

That effect is going to be tiny, though. Wikipedia is telling me a hemoglobin tetramer has a mass of "about" 64,000 daltons, so a complex loaded with 4 CO would be about 64,112 Da vs 64,128 Da with 4 O2: a difference of about 0.025%.
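The back-of-the-envelope estimate can be checked in a few lines. A sketch assuming the commenter's rounded 64,000 Da for the tetramer, standard atomic masses for carbon (12.011) and oxygen (15.999), and four ligands bound per tetramer.

```python
HB_TETRAMER_DA = 64_000.0  # commenter's rounded figure for the tetramer
CO_DA = 12.011 + 15.999    # carbon monoxide, ~28.01 Da
O2_DA = 2 * 15.999         # molecular oxygen, ~32.00 Da


def loaded_mass(ligand_da: float, sites: int = 4) -> float:
    """Mass of a tetramer with every binding site occupied by one ligand."""
    return HB_TETRAMER_DA + sites * ligand_da


co_hb = loaded_mass(CO_DA)
o2_hb = loaded_mass(O2_DA)
relative_diff_pct = (o2_hb - co_hb) / o2_hb * 100

print(f"{co_hb:.0f} vs {o2_hb:.0f} Da, {relative_diff_pct:.3f}% difference")
```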

Just so you have some extra information (I actually built a heart rate monitor and pulse oximeter): skin color and transparency make a big difference, as does the color of the LEDs. I'm pretty pale and one of my friends is pretty tan, and my signal was much stronger than his. The IR (infrared) LEDs work better because they penetrate the skin better and there are clearer absorption lines from oxygenated blood. I also tested different locations for the device. Most are fingertip or wrist (Fitbits and smart watches), but I got substantially stronger signals by placing the device near the upper forearm (god, did this help in debugging). This was true for everyone I tested (unfortunately only white people) and helped smooth out the data a little, but the difference was still noticeable.

On how these devices work: there are typically two LEDs that produce light that travels through the skin. Some of the light is absorbed by blood (absorption differs with oxygenation and blood volume), then the reflected light is captured by a photodiode. All of this is on one surface; they are very basic. The blood volume changes rapidly with each heartbeat, and that's how you capture the pulse (easy to get, btw), while the oxygenation is based on absolute levels (THIS IS THE PROBLEM!!!!). After building one I learned not to take these things too seriously, especially fingertip and wrist-mounted ones.
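For anyone who wants to try building one, here is a minimal sketch of one widely described way to turn the red and IR photodiode signals into a saturation estimate, the "ratio of ratios". Each channel's pulsatile (AC) component is divided by its steady (DC) level, and the ratio R is mapped to SpO2 with the often-quoted linear approximation `110 - 25 * R`. That mapping is an assumption here: real devices use empirically calibrated curves, and this code is not calibrated against anything.

```python
import math


def ac_dc(signal: list[float]) -> tuple[float, float]:
    """Peak-to-peak (AC) and mean (DC) of one channel over whole pulses."""
    dc = sum(signal) / len(signal)
    ac = max(signal) - min(signal)
    return ac, dc


def estimate_spo2(red: list[float], ir: list[float]) -> tuple[float, float]:
    """Return (R, SpO2 %) using an uncalibrated linear approximation."""
    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(ir)
    r = (ac_red / dc_red) / (ac_ir / dc_ir)
    return r, 110.0 - 25.0 * r


# Synthetic one-pulse signals: same DC level, red pulsates half as much as IR.
n = 100
red = [1000.0 + 5.0 * math.sin(2 * math.pi * k / n) for k in range(n)]
ir = [1000.0 + 10.0 * math.sin(2 * math.pi * k / n) for k in range(n)]
r, spo2 = estimate_spo2(red, ir)
print(f"R = {r:.2f}, SpO2 ~ {spo2:.1f}%")
```

Dividing AC by DC is intended to reduce the influence of light lost to static absorbers such as skin; the skin-tone differences discussed in this thread suggest that this does not fully remove it.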

I encourage others to build them. You can do most of it in a weekend if you have some soldering experience (best if you can do surface-mount soldering), some programming, and are comfortable doing data analysis. And no, I do not expect my device to be on par with the ones in hospitals, but I was quickly able to match readings from a finger-based device that I bought from the local drug store.

> We just look for the 94-100% range and would only take action if we get a reading around 87% or lower

I'm not very familiar, so I'm not sure how to interpret this. Does this describe the normal procedure for when you can rely on a pulse oximeter, or does it describe the modified procedure for when you can't?

That's the normal procedure, as far as I know. Taking a pulse ox reading is done at the same time as we'd get blood pressure, listen to breath sounds and watch chest expansion depth/rate, and talk to the patient about how they're feeling or what's wrong. It's just one factor that could tell us that they're short of breath and we should provide oxygen, but it's important to consider the patient as a whole. They like to say "treat the patient, not the data".

Thanks. So in a case where you cannot rely on it, do you completely avoid using the pulse oximeter at all? Or do you use it but place less faith in the readings and interpret the readings differently?

Since I'm just a student I think it's best if I defer to the textbook:

"If your service has a pulse oximeter, you should have a protocol describing when to use it. Generally this will include all patients complaining of respiratory problems or otherwise at risk for hypoxia. When used properly, the device can help you to assess the effectiveness of artificial respirations, oxygen therapy, and bronchodilator (inhaler) therapy."

If people were in a fire it's true the value may be flawed, but we usually say that more because of potential carboxy- and methemoglobinemia than because of soot.

Personally, I've never seen a case where the difference was clinically meaningful. So, meh...

Yeah, and if there was a real pressing need to get a more accurate number, you would get an arterial blood gas.

Pulse oximetry is for monitoring change. When a critical value is needed for a medical intervention, such as a decision to put in a breathing tube, tests like an arterial blood gas are used.

This device seems pretty flawed if it can't read fingers on like 80% of people...

This really isn't a big deal. Doctors know that these pulse ox meters aren't very accurate and use them accordingly. Nobody's changing their plan because you have a pulse ox of 93 instead of 95. They're there to quickly measure big changes that indicate a problem. In other words, they're looking for patients going from 98 to 75, not 98 to 95.

My insurance company didn't authorize a full sleep study because my overnight pulse oximeter results were normal, so it took me an extra couple of years to find out I needed a CPAP machine. It might have helped to know that my skin color could cause inaccurate results.

What's the point of casually dismissing this article's premise? Whether the conclusions are fully sound or not isn't the goal of this article; this is journalism, not academic work. The goal is to raise awareness of a potential bias, and I think it achieves that (assuming the facts all check out).

For one, according to the article the discrepancy was found to be up to 8 points in some cases, not 3. And it seems reasonable that quick, practical decisions in hospitals are made based on point thresholds, not purely huge point leaps as you suggest.

I haven't dug any further than the article and your comment to verify these facts. But rather than find a way to prove why this doesn't matter, why don't we assume it does? Then the knowledge and awareness might spread a little bit more through us, and if it is an issue, it might be that much more likely to be solved. There's no penalty for being wrong with a weakly held but well-meaning assumption. At the very least, we can be more aware of another possible dimension of bias in technology.

>What's the point of casually dismissing this article's premise?

Because it should be casually dismissed. This person doesn't know what they're talking about, refused to listen to the people who told them they didn't know what they are talking about, and wrote an article full of nonsense.

"Journalists" in 2020 summarized.

>why don't we assume it does? Then the knowledge and awareness might spread a little bit more

If the assumption is incorrect, then it would be misinformation that we would be spreading.

Medicine is very rarely about absolutes; it's mostly about how things get better or worse. When I was in the hospital, no one looked at the absolute values of my blood work; they cared whether the values dropped in response to the treatment.

Also, doctors are already aware that oximeters may be inaccurate. That doesn't mean they can't still be useful.

Yeah, if we had such cut and dry "rules" about medicine, then we could automate a lot of what doctors do. Medicine is as much about the art as it is the science. You come to understand the limitations of different measuring techniques, and learn to know when you need to rely on each for more accurate representations.

Many quick, practical decisions are made on patient assessment and clinical judgement, not just numbers. None of us that I know of has ever come across a perfect pulse ox. "Treat the patient, not the numbers" as the saying goes. When they don't match up, I like to do things like use a different finger/hand (or ear lobe or forehead).

This thread, because it has race as a secondary theme, will be rife with such statements dismissing the premise.

Bullshit should be dismissed, regardless of race.

That hasn’t been true at all in my experience. When my son got pneumonia, we were told he would be hospitalized at 92 when he was reading 94. He’s dark-skinned so who knows what the actual reading should have been.

FTA: "At a reading of 88 or 89, Medicare will reimburse for oxygen at home, but at 90 it won’t"

Getting home oxygen reimbursed by Medicare also requires a diagnosis of certain conditions and specific testing. If home oxygen is required, the testing to satisfy Medicare requirements won't be done with the common fingertip pulse ox meter.

Do you have an example of what testing they do actually use?

Probably ABG like the article mentions.

I think they would do a test for arterial blood gas.

There might actually be room for a lawsuit there, if that regulation has a demonstrable impact on people with darker skin.

Frustratingly, this kind of dismissal perpetuates injustices.

First, let's be clear:

> Doctors know that these pulse ox meters aren't very accurate and use them accordingly

Some doctors know...

> Nobody's changing their plan because you have a pulse ox of 93 instead of 95. They're there to quickly measure big changes that indicate a problem.

Sometimes, perhaps even often, they're used for this. It's one use case of pulse oximetry, and it's an important one.

I read your post as extrapolating general claims from common cases, and erasing the real experiences of people that fall outside of these common cases.

Bias often happens at the margins. That doesn't make it less real for the people who suffer from it, even if most people don't suffer, even if the common cases work well.

It can be hard to see those margins when you aren't in them, but that doesn't mean they don't exist -- they are still worth writing about and trying to fix.

This IS NOT BIAS! This is physics!

As someone else pointed out, this isn't only about melanin in the skin. Darkened skin for any reason will cause this (burns, smoking, dirt, etc...). Medical personnel are trained to account for this partial accuracy.

The other fact is that there are newer techniques being worked on to improve the technology.

When a measurement device is, in some settings, consistently off in one direction... that's one of the most central definitions of bias.

Physics describes how oxygen saturated blood looks in the infrared, but using that information in a device is engineering. The engineers who design these devices should have better understood how something obvious like skin color would bias the measurements. It seems like the physics could be exploited again via a visible light sensor that detects darker skin.

This is absolutely bias, and it is entirely due to engineering choices made by people, not nature.

> This IS NOT BIAS! This is physics!

Bias doesn't happen in physics. Bias happens where humans and physics interact.

> Medical personnel are trained to account for this

Medical personnel aren't the only people who use pulse oximeters and need them to read reliably. Beyond that, this thread and the article it discusses are full of examples of why "medical personnel are trained" isn't enough of an answer here.

Both measurement and social/cultural bias occur in physics:


"Cargo Cult Science" (esp. Millikan). http://calteches.library.caltech.edu/51/2/CargoCult.htm

Einstein to M. Curie: https://www.brainpickings.org/2016/04/19/einstein-curie-lett...

Planck: Science advances one funeral at a time. (See also Kuhn.) https://en.wikiquote.org/wiki/Max_Planck

Yes, and the point that I'm making is that "this isn't bias, this is physics" is missing the point.

Yes, the way in which varying wavelengths of light interact with varying configurations of bulk matter is a physical phenomenon, and not subject to bias.

But that isn't the same as saying, as the comment to which I originally replied seemed to do, that no bias can exist in the way in which that phenomenon is interpreted by humans.

I suspect there are also instances of bias introduced by factors other than cultural, social, human, and perceptual dynamics.

> This IS NOT BIAS! This is physics!

You sound pretty emotional, maybe tone it down and try to think objectively.

Just because there's an underlying physical reason for the device to be inaccurate on a subset of the population, doesn't mean that it's fair to use that device on the impacted population. It's been accepted by the medical community, despite the additional training required to correctly use the device -- amounting to guesswork when the medical personnel remember that it's necessary. Accounting for human error, this means that affected populations will receive a lower quality of care.

Just a hypothetical. Would it be accepted if it was accurate for black women, and required guesswork when treating white men? Because there is a long history of significant bias in medical studies. Just because "it's just physics" explains the device's flaw, doesn't mean that acceptance of a flawed device isn't a result of systemic bias.

I'd hardly call it injustice. Anything based on light is going to be variable depending on skin tone (or, as others have mentioned, darkened skin for any reason).

The only thing you could claim injustice for would be if there was no other mechanism for measuring blood oxygen other than something that was known to not work properly for people of color.

A lot of other folks made this point, but I’m replying here because your response was the most considered.

No one, myself included, is saying that physics is biased, or that this tool is inherently biased. The headline is misleading; it’s really talking about the interpretation of the number that’s biased, not the physics. The number is just a number.

The bias comes through the typical institutional use of this tool, as described in the article: the “normal” regions are based on “normal” for lightly pigmented people, so heavily pigmented people are by default not in the “normal” range. Do some doctors realize this? Of course they do. But as programmers we all know the power of defaults: some doctors won’t realize this, or will forget, or will realize it but still be affected by it.

Other policies (like the Medicare one described in the article) also ignore these differences.

It’s not the physics that’s biased. It’s the institutional and individual assumptions around the tool.

Also, while they might not change their plan because of a reading of 93 instead of 95, the article describes a bias of up to 8 points, which does seem large enough to be significant.

We are looking for people to reach 90+, or 92+ if reasonably healthy. 75 is super low and, apart from exceptions, a _very_ concerning value. See the oxygen-hemoglobin dissociation curve to understand why.
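Why 75 is so much worse than 90 follows from the sigmoid shape of that curve. A rough sketch, using the Hill equation as an approximation with commonly quoted adult values of P50 ≈ 26.8 mmHg and a Hill coefficient of about 2.7. Real curves shift with pH, temperature, CO2 and 2,3-DPG, so treat the numbers as illustrative only.

```python
P50_MMHG = 26.8   # approximate PO2 at 50% saturation (normal adult)
HILL_N = 2.7      # approximate Hill coefficient for hemoglobin


def saturation(po2_mmhg: float) -> float:
    """Approximate fractional O2 saturation at a given PO2 (Hill equation)."""
    x = (po2_mmhg / P50_MMHG) ** HILL_N
    return x / (1.0 + x)


def po2_for_saturation(sat: float) -> float:
    """Inverse of saturation(): PO2 (mmHg) needed for a given saturation."""
    return P50_MMHG * (sat / (1.0 - sat)) ** (1.0 / HILL_N)


for pct in (98, 90, 75):
    print(f"SpO2 {pct}% ~ PO2 {po2_for_saturation(pct / 100):.0f} mmHg")
```

On this approximation, 98% corresponds to a PO2 of roughly 113 mmHg, 90% to roughly 60 mmHg, and 75% to about 40 mmHg, close to normal venous PO2. Below about 90% the curve is steep, so saturation falls quickly with further small drops in PO2.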

However yes, in my experience I never had enough difference in dark-skinned individuals to be clinically meaningful.

> Nobody's changing their plan because you have a pulse ox of 93 instead of 95.

Threshold values are used to make decisions. People with dark skin do read higher (meaning decisions not informed by knowledge of the difference are likely to delay intervention for them), and they also have worse outcomes with otherwise similar presentations for many of the conditions where pulse oximetry would factor into decision-making. It's quite possible that the outcome difference is partly due to the measurement difference, even if the likely impact in each individual case is minor.

> In other words, they're looking for patients going from 98 to 75, not 98 to 95

The problem is that that '95' might really be a '90' - surely a drop from 98 to 90 is a cause for concern. If that information is being hidden, it needs to be fixed, regardless of other, more subjective fallback methods.
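The effect of a reading that runs high against a fixed threshold can be illustrated with hypothetical numbers: a patient whose true saturation falls by one point per hour from 98%, an action threshold of 92%, and a device that either reads accurately or reads 3 points high. None of these values come from the article; they only show the mechanism.

```python
from typing import Optional


def hours_until_alert(start: float = 98.0, decline_per_hour: float = 1.0,
                      bias: float = 0.0, threshold: float = 92.0,
                      max_hours: int = 48) -> Optional[tuple[int, float]]:
    """First hour the displayed reading reaches the threshold.

    Returns (hour, true saturation at that hour), or None if never reached.
    A positive bias means the device overstates saturation.
    """
    for hour in range(max_hours + 1):
        true_sat = start - decline_per_hour * hour
        displayed = true_sat + bias
        if displayed <= threshold:
            return hour, true_sat
    return None


print(hours_until_alert())          # accurate device
print(hours_until_alert(bias=3.0))  # device reading 3 points high
```

Under these assumptions the biased device triggers three hours later, by which time the true saturation is 89% rather than 92%.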

A 5% difference is only common in the 60-70% range on some clips. In the 90-100% range the difference is up to 2-3% on the worst pulse oximeters, and usually around 1%.

For reference I have two pulse oximeters at home and they routinely give me results that differ by 2-3%.

Seems like an issue we should fix, but I don't know what impact this would have in a clinical setting given that they use multiple criteria and the devices are known to be inaccurate.


The SpO2 measurement performance of a device must be verified before the device is released to the market. The U.S. Food and Drug Administration (FDA) suggests using standards presented in the following:

ISO 80601-2-61:2017 – Medical electrical equipment -- Part 2-61: Particular requirements for basic safety and essential performance of pulse oximeter equipment

Pulse Oximeters – Premarket Notification Submissions [510(k)s] Guidance for Industry and Food and Drug Administration Staff

According to these regulations, manufacturers need to declare the calibration range, reference, accuracy, methods of calibration and range of displayed saturation level. Furthermore, for the performance assessment, the FDA requires at least 200 data points equally spaced over a saturation range of 70% to 100%. Test subjects should have different ages, gender, and skin tones. For instance, the FDA requires that at least 30% of the volunteers must have dark skin pigmentation. The overall error or the root mean square error (RMSE) must be below 3.0% for transmissive pulse oximetry and below 3.5% for reflective pulse oximetry.
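The accuracy figure described here is a root-mean-square difference between paired device readings (SpO2) and reference arterial blood samples (SaO2). A minimal sketch of that calculation, using made-up paired values and the limits quoted above; the function name and data are illustrative, not taken from the standard.

```python
import math

ARMS_LIMIT = {"transmissive": 3.0, "reflective": 3.5}  # limits quoted above


def arms(spo2: list[float], sao2: list[float]) -> float:
    """Root-mean-square difference between device and reference readings."""
    if not spo2 or len(spo2) != len(sao2):
        raise ValueError("need equally sized, non-empty paired samples")
    squared = [(s - r) ** 2 for s, r in zip(spo2, sao2)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical paired readings: device SpO2 vs. arterial SaO2 reference.
device = [97.0, 92.0, 85.0, 78.0]
reference = [96.0, 90.0, 85.0, 79.0]
error = arms(device, reference)
print(f"Arms = {error:.2f}, passes transmissive limit: "
      f"{error < ARMS_LIMIT['transmissive']}")
```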

© 26 Mar, 2019, Maxim Integrated Products, Inc. https://www.maximintegrated.com/en/design/technical-document...

Seems like it would be better if instead of merely an overall error of at most 3% they required an error of at most 3% in the subpopulations of interest too.
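A small sketch of that suggestion with hypothetical data: compute the same RMS error pooled and per subgroup. In this made-up example, 30% of the samples come from one group; the pooled error passes a 3.0 limit, yet that group's own error is 5 points.

```python
import math
from collections import defaultdict

# Hypothetical (group, device SpO2, reference SaO2) triples.
samples = [("lighter", 96.0, 95.0)] * 7 + [("darker", 95.0, 90.0)] * 3


def rms(errors: list[float]) -> float:
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def arms_by_group(rows):
    """Return (pooled Arms, {group: Arms}) for (group, spo2, sao2) rows."""
    per_group = defaultdict(list)
    for group, spo2, sao2 in rows:
        per_group[group].append(spo2 - sao2)
    pooled = rms([spo2 - sao2 for _, spo2, sao2 in rows])
    return pooled, {g: rms(errs) for g, errs in per_group.items()}


pooled, by_group = arms_by_group(samples)
print(f"pooled Arms = {pooled:.2f}")          # under a 3.0 limit
for group, value in by_group.items():
    print(f"{group}: Arms = {value:.2f}")
```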

And darker coloured textiles are less visible on people with darker skin. Grass is green and bananas tend to be curved.

I get that it is interesting to research optically measuring systems to 'search for bias' and I'm sure there are going to be biases in varying degrees, but that doesn't mean it's always for the same reason or with the same result.

There are very real problems, of course. Take the example of photographic film development from the article: that is a problem because a picture that is supposed to reflect the subject you photographed no longer does what it was supposed to do. And someone here posted the example of the soap dispenser that couldn't "see" people who didn't have light skin. Also a problem, because now you can't get soap. Both of those are mostly examples of errors made during engineering (apply Hanlon's razor when in doubt - it's not always malice like people often assume), and might be preventable if there were more different samples and interactions available during development.

That doesn't mean it's the only type of problem we have with devices for humans, or that keeping a color palette of people around is always going to fix it.

Take the soap dispenser example: you could have made one that uses radar or ultrasonic detection, which solves multiple problems because you no longer have to consider lighting conditions, which always vary between bathrooms (light fixtures, positioning, positioning of the dispenser itself, etc.). That doesn't relate to skin color, but it's a similar problem that could have been prevented all the same by doing proper engineering. But it's likely that in cases like this (soap dispensers) the detection system was just a module that some third-party company imports and sells, having bought it from a module integrator somewhere else, who bought the parts as generic motion detectors from a company 4 layers down the chain that specializes in measuring the daytime movements of rabbits or something silly like that.

On the other hand: if you have a government that requires certain controls to be in place and applied for medical equipment to be used in a medical setting you would expect some of the requirements to be drawn up in a way that reflects the use cases. It appears that either the devices are not scoped within those controls, or the controls failed. Or it's all good and the difference in measurement is not significant enough to matter for the use case and thus the devices are used as-is and this research doesn't relate to the real world as well as people think.

> Both of those are mostly examples of errors made during engineering (apply Hanlon's razor when in doubt - it's not always malice like people often assume), and might be preventable if there were more different samples and interactions available during development.

When I see these types of examples brought up, I don't see malice being assumed. Rather, these are good illustrations of structural bias. "Structural" (or "systemic") is often used in contrast to overt bigotry or malice. It's sometimes the result of a certain workforce or industry being substantially skewed with respect to the wider population. You're right, Hanlon and his razor do work here to show that it's a naive error rather than intentional harm. But the result is still traceable to the disparities between the designers and the population at large.

As for the soap dispenser: If the team of engineers designing the module had a mix of skin colors, then they would have quickly seen that their product failed sometimes. They would, at the outset, have known that they needed to tune the photoreceptor better, or switch to a different method of detection.

Though it is funny that Nikon, a Japanese company, made cameras with a blink detector that falsely triggered on Asian eyes. That either refutes my own point above, or it indicates they imported that functionality.

The issue is that we assume that there was a building somewhere, where a group of people sat down and designed and produced a device (the soap dispenser). But we don't actually know that. If there is a single spot where this was done, and also where the testing was done, and there was an abundance of skin tones available, then this would have been caught. But that's a lot of ifs.

It could be that the device was designed and completed for manufacturing 15 years ago, manufacturing was shifted around between 2 companies, and the supply chain has 10 levels of participants that each did their own tests on their own part, in their own company, in their own country, on their own continent. And between that and the consumer there turned out to be 4 companies that just trade and never actually look at, touch or use the product.

This is probably not what happened at Nikon; I'd guess that they have a few departments that may not work together as well as they could. It would be interesting to see how this actually came to be. I imagine there are many ways to create a (partially) bad product, and equally many ways to try to prevent that.

Each instance of it isn't necessarily malicious, but the way they all end up causing harm to the same people makes the system as a whole act as if it were malicious. Especially when combined with a proportion of people who are actively malicious, and when any attempts to address the systemic issue are dismissed.

When it comes to us in particular, we can each truthfully say "it wasn't my fault and I didn't mean any harm". But harm happens, and Hanlon's razor becomes dull with repeated application like that. As another rule goes, "Once is happenstance, twice is coincidence, three times is enemy action".

The solutions to systemic problems aren't simple. They certainly include rejecting the people who are outright being malicious. They also include a willingness to say, "Gosh, I keep using this didn't-mean-anything excuse. Maybe I need to start instead meaning to not do it."

The issue with all of this is that it's not just one company with a group of all the parts that design, create, test and sell this stuff.

Most of it was designed somewhere, built somewhere else, integrated somewhere else again, and then simply bought in bulk out of a catalogue, shipped to a warehouse and sold in smaller lots to stores. The products and people that could have detected a problem are all over the world and have seen every possible version of a human, and still these products end up in use with the problems they have.

If the pattern were constrained to fit a scenario like the one you describe, it would probably be something we could do something about. But just like in enterprise software contracts, the distance (mental, physical and legal) between all of these parties doesn't allow much change by any single actor. And I doubt the manufacturer of a sensor, who doesn't even know where it ends up 10 layers deep in the supply chain, is even remotely aware of this.

In practical terms we're seeing an anti-pattern and we try to personify it so we can express our dissatisfaction, but that's just as practical as shouting at the sun for not damaging all skin types at similar rates. There is no single source of the fault and there is no person or company or something like that, that is 'responsible' for it.

One possibility is to have some sort of consumer organisation test products on a diverse set of humans and report on products that lack support for certain shapes, colours or sizes, but other than that I have no idea how we'd adjust multiple people at multiple companies in multiple countries with multiple cultures.

The chains do interfere with the problem, and one part of the solution is to have people at every single step who think of it as their problem. What I suspect happens in a case like this is that a white manufacturer asks a bunch of white engineers for a design, who sends it off to a Chinese manufacturing plant, to be shipped to a white buying agent in pharmacies. Nobody in the process is explicitly racist, but neither are there any black people in the process to say, "Hey, would this work on me?"

The problem comes when you say, "Hey, maybe there oughta be some black people involved in some of the processes", and people scream "reverse racism" and "affirmative action" and all the other buzzwords. They assert the harm to themselves, and they get listened to. But when black people say, "Well, I'm also being harmed, and in ways you're not seeing," the response is "well, that specific thing isn't my fault".

The solutions are systemic. As you say, it's a chain: for a black person to get hired into that position everything has to have gone right for them. They didn't get failed by bad schools and they didn't get dismissed by their professors and they didn't get filtered out by a resume screener who didn't like their name and they didn't lose a job to somebody who hires people they know -- which just doesn't happen to include black people.

So it's not easy to solve, and even the good solutions are slow. But one critical step is for each person to recognize that their local optimizations and local considerations contribute to the bigger problem, and think of ways that they can make things better rather than just say "not my problem".

If you have stairs leading up to the porch of your house, they aren't there to prevent people with disabilities from entering. They're there because it's a cheap and convenient solution.

If the building is commercial, not residential, you'll find yourself on the business end of an ADA complaint. Not because of malice, but because of neglect.

> And darker coloured textiles are less visible on people with darker skin. Grass is green and bananas tend to be curved.

The point isn't that people have different skin. It's not even that this one common device works better for some people than others. It's that the practice of medicine, having grown up around this cheap and effective device, turns out to be discriminatory in ways that we wouldn't naively expect.

That's worth talking about, because it's a problem that should be fixed.

> apply Hanlon's razor when in doubt - it's not always malice like people often assume

Nowhere in the article is "malice" alleged. You added that part yourself, precisely because the bias described has infected you too. Reflexively arguing that "black people get bad O2 readings" shouldn't be raised in journalism isn't helpful to anyone, least of all the patients who need better care.

My point wasn't to suggest that skin tones are the only thing in the world that aren't considered by all the people during all product design phases. It was more that we know that this is a problem in our bubble of communication, we know it just as much that we know that grass tends to be green (or yellow-ish when it doesn't rain for a while and the sun is out).

It's more that having a higher quantity of a known problem doesn't make the problem go away, and neither does trying to find a single source or single instigator of a systemic problem. It's not a new problem either.

If we take skin color and replace it with finger length for devices that require you to grip it, or with distance between the eyes for things that you look through, we have the same issue. In that regard this is just another facet of ergonomics in design. Even if we were to test for a lot of things in a lot of places in the supply chain you still can't catch everything.

The product just wouldn't have shipped with a product manager who also experienced it not working.

That's the point of inclusion.

Although the soap dispenser example feels like one of those problems, it is just wear and tear of the sensor and they work properly when cleaned.

In this case, pulse ox failures can affect when people are admitted for treatment.

It would probably not have shipped in that case, but the number of product managers actually using their own products sadly isn't at the saturation you'd expect.

Same with all the white labeling and supply chain and integrators in between; the disconnect between all of the stages and the contracts that can bind a bad product to be delivered and sold anyway can't be fixed on just one end of the process.

I'm going to challenge you on "...can't be fixed on just one end of the process"

Why not? That seems like the appropriate place to reject the inferior components arriving from that long and deep supply chain. You're creating a soap dispenser for sale in North America. It does not work on 15% of the population. That's a fail. You can't just throw your hands up and say, "Well, those photoreceptors were designed in Taiwan, and they didn't account for dark skin". If that's the case then you don't source those, or you request a different calibration profile, or you add a lens filter that compensates. You engineer the solution for your target market.

The same company using global components took the time to print logos and the user manual in English. They made sure the power circuitry matched North America's system. They set up a domestic phone number for support and established payment systems to accept USD for their product.

They do all these things to sell better into a particular market, but they don't bother with the actual people who will be using it?

Don't get me wrong. You are accurately explaining the how of the failure. But that's exactly why the last step in the supply-chain exists. To make the actual finished product. That's where you catch deficiencies like this.

> The product just wouldn't have shipped with a product manager who also experienced it not working.

Yes sure, why don't we dispense with our most important and practical monitoring system because it has a racial bias? Pulse oximetry saves lives every day. Many lives. Just ask old caregivers who were around before it.

> why don't we dispense with our most important and practical monitoring system because it has a racial bias?

uh we're saying make a better one.

and to only order the better ones.

and if you are making a better one to factor in these things.

and it is easier to factor in these things with a more diverse team, as the edge cases won't be considered edge cases, but main cases.

Just because your customer didn't explicitly tell you to account for people with darker skin doesn't mean your failure to do so is just "an error made during engineering". It can still be systemic racial bias for neither you nor your client to consider who is actually going to be using your devices.

Also, no one said this is the only problem we have with devices.

If I understand it, they looked at 3 pulse ox meters. The expensive one, from the company that was sponsoring the study, read about one point (out of 100) lower on blacks, which is inaccurate but not outrageous.

One of the off-brand meters was completely wrong (i.e. dangerous) for blacks; the other one was in between.

They also looked at a few studies and asked companies whether they had handled the issues mentioned in those studies; the most positive result was that some models from one company are now better (probably some of the others are not).

IMO the certification of these devices, especially the ones medical staff use to make decisions, needs better testing.

Another interesting point I hadn't considered: a device can have an average error under a safe threshold but a much larger than average error for some groups, putting those groups at risk if the people using these devices aren't warned about the issue.
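To make that concrete, here is a small Python sketch with invented numbers (not data from any study cited in the article): a hypothetical device whose pooled mean absolute error passes a threshold while one subgroup's error does not. The group labels, the readings, and the 3-point threshold are all assumptions for illustration.

```python
from statistics import mean

# Invented paired readings: (group, device SpO2, true SaO2).
# Not data from any study; chosen only to illustrate the effect.
readings = [
    ("A", 95, 95), ("A", 93, 94), ("A", 97, 96), ("A", 90, 90),
    ("A", 92, 92), ("A", 96, 96), ("A", 88, 89), ("A", 94, 94),
    ("B", 94, 90), ("B", 92, 88),
]

SAFE_THRESHOLD = 3.0  # hypothetical acceptable mean absolute error, in points

def mean_abs_error(rows):
    """Mean absolute difference between device reading and true value."""
    return mean(abs(spo2 - sao2) for _, spo2, sao2 in rows)

def verdict(err):
    return "OK" if err <= SAFE_THRESHOLD else "ABOVE THRESHOLD"

overall = mean_abs_error(readings)
by_group = {
    g: mean_abs_error([r for r in readings if r[0] == g])
    for g in sorted({r[0] for r in readings})
}

print(f"overall: MAE {overall:.2f} ({verdict(overall)})")
for g, err in by_group.items():
    print(f"group {g}: MAE {err:.2f} ({verdict(err)})")
```

The pooled figure looks fine only because group B is a small share of the readings; reporting the error per group is what exposes the problem.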

Nonin created the first of these back in the '90s, so I pulled one of their spec sheets from 2016 (https://www.nonin.com/wp-content/uploads/2018/09/NoninConnec...) which explicitly mentions the pigmentation issue along with nail polish, poor circulation, and breathing issues. From that, this isn't exactly a new or unknown thing since commercial products have been working around this for some time. I wasn't able to find an industry standard for testing on these, so maybe the article should cite that as an opportunity to improve a product or add a new method?

> I wasn't able to find an industry standard for testing on these, so maybe the article should cite that as an opportunity to improve a product or add a new method?

It is a long article with many studies cited, so I can't check all of them, but IMO "the readings can be affected by skin tone" is not sufficient for a medical device. What should a nurse or doctor do? There is no numeric value, so what does that mean? Is the device useless if you have dark skin, is it 2% wrong, or, as the article suggests, is the error nonlinear and larger when you are suffering from low oxygen?

If I were selling these products, I would test across a few ranges of skin tone and, if needed, add a switch on the device that you have to set to a certain level; or, if that is too expensive, print an interval instead of a single number and train users to read it.
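A rough Python sketch of the "print an interval instead of a number" idea. The skin-tone settings and the error bounds in `TOLERANCE` are hypothetical placeholders, not calibration values from any real device; a real product would have to derive them from validation testing.

```python
# Hypothetical per-setting error bounds (lower, upper), in saturation points:
# how far the true SaO2 may sit below/above the displayed SpO2.
# These values are placeholders, not real calibration data.
TOLERANCE = {
    "light": (-2, 2),
    "medium": (-3, 2),
    "dark": (-4, 2),
}

def display_interval(spo2: int, setting: str) -> str:
    """Format a reading as a range of plausible true saturations, clamped to 0-100."""
    low_off, high_off = TOLERANCE[setting]
    low = max(0, spo2 + low_off)
    high = min(100, spo2 + high_off)
    return f"{low}-{high}%"

print(display_interval(94, "light"))
print(display_interval(94, "dark"))
```

A fixed offset per setting is the simplest version; if the error really grows at low saturation, as the question above suggests, the bounds would also have to depend on the reading itself.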

I wonder if the pulse oxes in China are calibrated for Chinese skin or White skin? Many people are buying pulse oxes straight off of Alibaba and they might be getting readings that are too low if the calibration is off.

I’ll be surprised if they are calibrated at all.

Lots of thermometers on Amazon aren’t even FDA cleared so I wouldn’t be surprised if the pulse oximeters aren’t FDA cleared either.

Depends on what you are measuring as well; if you want to know whether there is a significant change, you can do that. If you want absolute numbers, you are probably doing it wrong anyway. (There is no single true template for numbers universally describing what a certain human should be like.)
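Under the big assumption that a given device's error for a given person is a constant offset, that offset cancels out when you only look at changes from that person's own baseline. A minimal Python sketch with made-up numbers; the offset and the saturation trajectory are invented for illustration.

```python
from typing import List

# Assumption for this sketch: for one person and one device, the reading is
# the true saturation plus a constant offset. Real errors need not be constant.
def device_reading(true_sao2: float, offset: float) -> float:
    return true_sao2 + offset

def change_from_baseline(values: List[float]) -> List[float]:
    """Differences relative to the first (baseline) value."""
    baseline = values[0]
    return [v - baseline for v in values]

true_values = [97.0, 95.0, 92.0]  # invented trajectory of true saturation
offset = 2.0                      # invented constant overestimate

shown = [device_reading(v, offset) for v in true_values]
print(shown)                              # [99.0, 97.0, 94.0]
print(change_from_baseline(shown))        # [0.0, -2.0, -5.0]
print(change_from_baseline(true_values))  # [0.0, -2.0, -5.0]
```

If the offset itself drifts as saturation drops, the cancellation only holds approximately, so this supports watching for significant change rather than trusting the absolute number.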

I saw a video somewhere of a pulse ox that would give a fake but reasonable-looking reading if you put a pencil in its sensor. It just makes you think it's working.

China is home to a tremendous amount of racial diversity, from the majority Han to Tibetans to the oppressed Uighur. Our western idea that "Asian" is a race is rather flawed. I don't think you can come up with any "typical" Chinese skin color.

FWIW, my wife (Han, who grew up in Shanghai) generally has whiter skin than do I (a generic American of mostly northern European descent).

Reminds me of the automatic soap dispenser at Facebook that can't see black people. https://gizmodo.com/why-cant-this-soap-dispenser-identify-da...

I can believe this because automatic doors seem more prone to having a hard time opening when I wear black clothes. It's not a big deal or anything, but I occasionally have to step back and then forth again to get it to open. Is this a common experience of black people, or am I crazy?

Oh, THAT'S why entrance barriers at supermarkets only occasionally fail, but fail consistently on a given day...

Wait. Would this happen to PIR-based systems?

I'm not sure. It seems at least possible with supermarket doors because those supposedly use a low-power microwave beam. With PIR, it still relies on a deflection of a beam, so I think it would be a problem with that as well, but I'm just guessing.

PIR doesn't have a "beam", it detects temperature changes (i.e. you, a warm object, moving around). Clothes could play a role, not sure if the color though.

This article doesn't seem particularly balanced. Unless I'm missing something, the author doesn't really give any generalized details about the amount of bias (for example, mean bias).

The author cites one study where doctors noticed "a bias of up to 8 percent", which sounds high, but if you click through to the link provided, it says:

>The mean bias (Spo2 − Sao2) for the 70%–80% saturation range was 2.61% for the Masimo Radical with clip-on sensor, −1.58% for the Radical with disposable sensor, 2.59% for the Nellcor clip, 3.6% for the Nellcor disposable, −0.60% for the Nonin clip, and 2.43% for the Nonin disposable.

Basically, the amount of bias varies by manufacturer and type of device, and it is not always a positive bias as implied by the text of the link.
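For anyone unfamiliar with the metric in that quote: mean bias is the average of (SpO2 - SaO2) over paired readings, so a positive value means the oximeter reads high and a negative value means it reads low. A small Python sketch; the example pairs are invented, while the six means are copied from the quoted study excerpt.

```python
from statistics import mean

def mean_bias(pairs):
    """Average of (SpO2 - SaO2); > 0 means the device reads high."""
    return mean(spo2 - sao2 for spo2, sao2 in pairs)

# Invented example pairs (device SpO2, arterial SaO2) in the 70%-80% range.
example = [(78, 75), (74, 72), (80, 77)]
print(f"example mean bias: {mean_bias(example):+.2f}")

# Mean bias for the 70%-80% range, as quoted from the linked study.
quoted = {
    "Masimo Radical, clip-on": 2.61,
    "Masimo Radical, disposable": -1.58,
    "Nellcor, clip": 2.59,
    "Nellcor, disposable": 3.6,
    "Nonin, clip": -0.60,
    "Nonin, disposable": 2.43,
}
for name, bias in quoted.items():
    direction = "reads high" if bias > 0 else "reads low"
    print(f"{name}: {bias:+.2f} ({direction})")
```

Two of the six device/sensor combinations read low on average, which is the point above: the sign depends on the manufacturer and the sensor type.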

This seems to be more of a case of measurement error bounds inherent to any measurement device, rather than "a popular medical device encod[ing] racial bias" and all the other rhetoric in the article.

IIUC, dark-skinned people have higher COVID-19 mortality rates, and physicians usually(?) use this kind of oximeter in their decision-making. I wonder if there's a connection.

Here's a good article that discusses why Black Americans have higher mortality rates:

"systemic racism is as much a part of biology as genomes are: The conditions in which we develop—including limited access to healthy food, exposure to toxic pollutants, the threat of police violence or the injurious stress of racial discrimination—influence the likelihood that any one of us will suffer from high blood pressure, diabetes or serious complications from COVID-19"


The article also mentions that nonwhite people's trouble breathing is more often dismissed as anxiety than white people's. It's possible there are layers of biases, and at each step there's a risk of not getting timely medical intervention, resulting in poorer outcomes.

Right. I wasn't trying to imply that the oximeter bias explained the entire difference in outcome rates. I'm just curious if it matters enough to warrant further study.

I agree with you, and I'm speculating that the oximeter is the small tip of the iceberg: a multitude of "small" biases that add up to terrible outcomes.

Interestingly, the new Apple Watch will supposedly have a pulse oximeter of sorts [1]. I was looking forward to that. Now I wonder how biased it is as well, especially given how widely Apple Watches are distributed and how they are used as a de facto emergency health monitoring device [2].

[1]: https://www.tomsguide.com/news/apple-watch-6-blood-oxygen-mo...

[2]: https://9to5mac.com/2020/07/01/critical-heart-disease/

Sorry if I'm not demonstrating adequate sympathy, but all I can think of from reading this headline is "oh dear, a person with a genetic adaptation that prevents some wavelengths of light from penetrating to deeper layers of the skin will find it hard to make some wavelengths of light penetrate to deeper layers of the skin"

As a nonmedical person, this seems blindingly obvious - to the point I could reference the "floor is made of floor" meme - so it would surprise me to learn that professionals haven't considered this downfall of the technology when taking their readings. Judging by the discussion here, I don't think I'm going to be surprised today.

I believe that if we were given the race of the developers, this bias would become clear. Nothing against the developers or their race; I'm not implying the developers are racist in any way.

What I am saying is that systems developed by a team made up mostly of a single race are bound to have biases, purely because of a lack of diverse opinions while testing the system.

But I could be completely wrong...

Edit: I pressed send by accident, I never meant to actually post this comment but I'm leaving it up anyways.

It's maybe not too surprising that a device that works by shining a light through your skin is affected by pigments in the skin.

That's not supposed to be the surprising part; it's that most pulse ox meters are known not to work well for darker skin, and it simply wasn't fixed.

A company I worked for decided one day to purchase budget fingerprint readers for door access. They were installed with much fanfare.

Unfortunately it turned out that the fingerprint readers were racist, and refused to allow black people into the building.

They were swiftly removed.

Measurement devices do not have 'bias', they have 'tolerance'. Give them some credit.

Two separate things. The tolerance for blood oxygen is centered around sensor readings from white skin, which is a bias.

And people with sweaty hands. And people with low iron levels. There's this push to show that everything is biased against certain classes. I'm sure some of the examples are true, but I can't imagine that constantly seeking evidence of problematic issues is good for individuals or society.

Seeking evidence of problematic issues is how society identifies and eventually addresses such issues. Do you believe society would benefit from ignoring evidence or pretending those issues didn't exist?

I agree that seeking evidence is how issues are addressed. The problem is that evidence is being sought in a particular way, which systematically biases which evidence is available for consideration.

Edit: there is a further problem: because many interpretations of the same facts are possible, if you go looking for certain things you will very likely find them, while someone with different initial beliefs will find evidence of something else. All from the same facts.

But you seem to be suggesting that we ignore skin color as a relevant factor, despite the data, out of a belief that any correlations drawn from that data must be distorted to further a political agenda.

The article presents what appears to be a plausible, evidence-based hypothesis as to why skin color in particular (not exclusive of other factors) might lead to inaccurate results for dark-skinned people. It then mentions studies done to either confirm or reject this hypothesis, and that the results seemed to confirm it. They link to a follow-up study done here[0].

You're only giving vague, general dismissals. If you believe the conclusions reached are in error, what evidence do you have that pulse oximeters are not affected by skin color?


I don't think the conclusions are in error. It's reasonably well known among medical professionals that skin color affects pulse-ox readings. It is also true that sweaty hands or low iron levels will cause incorrect pulse-ox readings.

I'm also not suggesting we ignore skin color as a relevant factor. I'm taking issue with the idea that everything that involves skin color is evidence of bias or other more malicious underlying causes.

I don't think this should be flagged/removed. It's a technical review of medical devices.


Bias here means reliably off in a particular direction, not discriminatory.

“That statistical estimator has a bias” means that you can expect it to systematically produce an overestimate (for example), not just to be off by random noise.
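The textbook example of a biased estimator in this sense is the sample variance computed by dividing by n instead of n - 1: on average it underestimates the true variance by a factor of (n - 1)/n. This Python sketch checks that exactly by enumerating every possible sample of size 2 from a fair 0/1 population (true variance 1/4); the population and sample size are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

population = [0, 1]        # fair coin: each value equally likely
true_var = Fraction(1, 4)  # population variance of a fair 0/1 variable
n = 2

def var_div_n(xs):
    """Sample variance dividing by n (the biased version)."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_div_n_minus_1(xs):
    """Sample variance dividing by n - 1 (the unbiased version)."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Every sample of size n (drawn with replacement) is equally likely,
# so the expected value is the plain average over all of them.
samples = list(product(population, repeat=n))
expected_biased = sum(var_div_n(s) for s in samples) / len(samples)
expected_unbiased = sum(var_div_n_minus_1(s) for s in samples) / len(samples)

print(expected_biased)    # 1/8: systematically below 1/4
print(expected_unbiased)  # 1/4: matches the true variance
```

The same idea applies to an oximeter: if its expected reading differs from the true saturation, that gap is its bias, regardless of how noisy individual readings are.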

While I get being touchy about such a term, I think bias is the proper term here. https://en.m.wikipedia.org/wiki/Bias_(statistics)

I think the quibble may not be with the headline, which just says bias, but with the repeated use of "racial bias" in the article content. That is a somewhat loaded term in these times we are living in. A more neutral write-up might just have observed that the devices seem to have poor quality control and may not be particularly accurate for all skin colors. Caveat emptor and all that, but it is a $25 medical device.

"Pulse oximeters give less accurate results for people with darker skin" would have conveyed exactly the same information, but probably would have yielded fewer clicks

Quelle surprise - garbage in, garbage out. I'll bet my Apple Watch doesn't have these issues.

The Apple Watch doesn't have a pulse oximeter (EDIT: it does have one that is not approved and therefore is disabled), so it would be difficult for it to have this issue. You might be thinking of optical HR measurement, for which sensors have also been shown to have skin color biases in some cases.


> The Apple Watch doesn't have a pulse oximeter

Yes, it does. The FDA blocked its use.



Noted and thanks for the information, but as far as I can tell my point stands if the feature is completely inaccessible. I don't really understand what the GP comment was referring to regarding Apple Watch pulse oximetry accuracy. Other consumer devices such as my smartphone and watch have working pulse oximeter sensors built-in that are presumably FDA approved (or not FDA banned anyway), so it doesn't bode well for the Apple Watch if it can't pass this.

I have a lot of confidence in Apple, even if they are less than 100% all the time. They put a vast amount of effort into catering to diverse users; iOS accessibility, for example. So I would bet that the Watch sensors are solid. Or will be, as the feature appears to be coming soon: https://9to5mac.com/2020/03/08/apple-watch-blood-oxygen-satu...

Your dedication to your Kool-Aid is impressive.


Is it possible to jailbreak iOS and gain access to... hmm, probably not. Even if the driver ships in iOS, it would be a fair question how much "disabled" was implemented by removing secret/irreplaceable code rather than simply disabling flags (potentially because the former approach was more convenient).

Ah yeah, maybe. I've had the watch and previously a Vivoactive HR and they never seemed to have any trouble with my skin colour when measured against other measurement methods.

Apple Watch already had exactly this problem in the past.

>I'll bet my Apple Watch doesn't have these issues.

I bet they do. They certainly did a while back, and I've found nothing saying this has been fixed.


Mildly interesting, but that looks exceedingly anecdotal.
