Overcoming Bias in Computer Vision

At the recent Embedded Vision Summit, Will Byrne gave a presentation, Overcoming Bias in Computer Vision, a Business Imperative. He started by pointing out that bias is marring AI's rollout. The three examples he cited are:

- the Tay.AI chatbot became racist, sexist, and homophobic
- voice command systems in cars fail more often with women
- criminal sentencing AI shows bias against African-Americans

I'm not sure I'd describe the first two as biased in any way. The chatbot was largely feeding back what it was fed, and it didn't have good defenses. If, on the other hand, it had become extremely nice, I suspect the same people complaining about it would have found that just fine, and not biased at all.

Women's voices are harder to recognize. I'll tell you another one: so are Scottish accents. This video is a comedy sketch (two Scotsmen stuck in a voice recognition elevator), but similar problems have been reported with Siri. If you have never lived in Scotland, you may need to turn on the closed-captioning to understand everything, but that doesn't prove you are biased against Scots.

https://youtu.be/NMS2VnDveP8

This might reflect not doing enough training with women (and Scotsmen), or it might reflect a genuinely harder problem. I'll tell you one I know from some consulting work I did a few years ago: Chinese women's fingerprints are harder to recognize. This is a genuine issue (the prints are less well-defined) and does not just reflect a lack of training on Chinese women. In fact, fingerprint recognition isn't (or certainly wasn't) done using neural nets; that would be using a sledgehammer to crack a nut. The fingerprint algorithms simply look at features and try to match them to the stored data.

The third one, the criminal sentencing one, might be true bias. I don't know which system is being referred to.

Training Data

Not all systems are based on neural nets, but those that are reflect the training data they are given. There are three ways to get training data:

First, use existing labeled training data. The most well-known dataset, for vision, is ImageNet, the existence of which may be responsible for a good part of the huge improvement in vision algorithms. Previously, groups used limited data, whereas ImageNet contains 14 million labeled images. Not just dogs, but individual breeds, for example.

Second, generate it algorithmically. This is known as competitive learning and is how AlphaGo Zero got so good so fast: it only needs the rules of the game to generate lots of training data (see the sketch below). In a similar way, self-driving car algorithms are often trained against driving environment simulators. But most situations do not have relatively simple rules, so this approach is not always possible.

Third, direct training, also known as unsupervised learning. This is the "take your toddler to the zoo, point to an animal, and say 'that is a zebra,' and the training is done" approach (see my post Embedded Vision Summit: It's a Visual World for more background). We are nowhere near knowing how to do this.
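To make the second approach concrete, here is a minimal sketch of generating labeled training data purely from the rules of a game. The names (self_play_game and so on) are mine, not from the talk, and it uses tic-tac-toe rather than Go, with uniformly random play rather than AlphaGo Zero's network-guided self-play, so the labels are noisy. But the principle is the same: no human ever labels an example.

```python
import random

# The only human input is the rules of the game, encoded here as the
# eight winning lines of tic-tac-toe.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_game():
    """Play one game of random self-play; return position history and result."""
    board, history, player = [' '] * 9, [], 'X'
    while winner(board) is None and ' ' in board:
        history.append(''.join(board))
        move = random.choice([i for i, sq in enumerate(board) if sq == ' '])
        board[move] = player
        player = 'O' if player == 'X' else 'X'
    return history, winner(board)   # winner None means a draw

# Every position that occurred is labeled with the game's eventual
# outcome: +1 if X went on to win, -1 if O did, 0 for a draw.
training_data = []
for _ in range(10_000):
    history, result = self_play_game()
    value = {'X': +1, 'O': -1, None: 0}[result]
    training_data.extend((position, value) for position in history)

print(f"generated {len(training_data)} labeled positions from 10,000 games")
```

AlphaGo Zero's loop has the same shape, except that the random move choice is replaced by the current network plus tree search, and the labeled positions are fed back to improve that network.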
Yann LeCun, until recently Facebook's head of AI, said after AlphaGo's win: "As I've said in previous statements: most of human and animal learning is unsupervised learning. We need to solve the unsupervised learning problem before we can even think of getting to true AI. And that's just an obstacle we know about. What about all the ones we don't know about?"

So I think Will was limiting the problem too much when he said: "All machine intelligence is built upon training data that was, at one point, created by people."

The first approach above is like that. But the second is not: AlphaGo Zero only had training data generated by people in the very limited sense that people created the rules of Go and Chess. And unsupervised learning only uses training data generated by people in the sense that people decided those striped animals are called zebras.

He did have some examples from Labeled Faces in the Wild, a dataset that has been used to gauge the effectiveness of facial recognition tools. He found that 83% of the faces were white people. For the US that is a little high (white people are 77% of the population). Further, if this were training data for an algorithm to be used in China, it would obviously be a poor choice of dataset. But this shows a general point: if you want to train for a more general case, you need more (and often a lot more) training data. Removing the excess white people from the corpus would not improve recognition of Chinese faces.

What is Statistical Bias?

Bias in everyday conversation sometimes means that any statement that there are differences between groups, such as nationality or sex or age, is biased. But bias is really a statistical term: it measures the expected error when estimating the value of a parameter, the difference between the expected value of the estimate and the actual value. So it is closer to "how true is it?" If the bias is zero (or small enough), the estimator is said to be unbiased.

Black men are much less likely to commit suicide than non-blacks. What that means is that an unbiased estimate of the rate of suicide among blacks is lower than for the rest of the country. That is an important fact for doctors to know and take into account when thinking about how to treat, for example, depression. It is really important not to override unbiased estimates (in the statistical sense) just because they appear biased in the everyday sense. Of course, care has to be taken that the sample really is unbiased, but with enough randomization it does not even need to be a huge sample; the size of the population doesn't even appear in the equations (a teaspoon is still just fine to check the saltiness of a commercial-sized pan of soup).

Or take movies: women like romcoms more than action movies, and men are the other way around. Calling that statement sexist bias (in the everyday sense), rather than admitting it is unbiased in the statistical sense, is just likely to lead to Netflix annoying everyone. Men and women will be bombarded with movies in which they have no interest if the recommendations are pressured to ignore reality.

While I'm talking about statistics, here are two other things people get wrong all the time:

- Just because a distribution is continuous doesn't mean that nothing can be distinguished. The color spectrum is continuous, but that doesn't mean we don't distinguish between red and green every day, even though there are lots of colors in between. Good luck telling a police officer, after you go through a red light, that all colors are the same and it's really "hueist" to say otherwise.

- Just because two distributions overlap, and the in-group variance is bigger than the variance between the groups, does not mean that the groups are essentially the same. The in-group variability in height for women (and for men) is bigger than the average difference in height between men and women, but it remains true that, on average, women are shorter than men. Obviously, you need to avoid the trap of going to the other extreme and saying "all women are shorter than all men." Just watch Game of Thrones.
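These claims are easy to check numerically. Here is a minimal sketch, using only the Python standard library; all the numbers (heights, spreads, sample sizes) are illustrative assumptions, not real data. It verifies three of the statements above: the sample mean is an unbiased estimator (its expected error is about zero); a fixed-size random sample is about as accurate whether the pot holds ten thousand items or a million (the teaspoon); and two heavily overlapping distributions can still have clearly different means (the heights).

```python
import random
import statistics

random.seed(0)

# 1. The sample mean is statistically unbiased: averaged over many repeated
#    samples, its error (expected estimate minus true value) is close to zero.
population = [random.gauss(100, 15) for _ in range(200_000)]
true_mean = statistics.mean(population)
estimates = [statistics.mean(random.sample(population, 50)) for _ in range(2_000)]
print(f"bias of the sample mean: {statistics.mean(estimates) - true_mean:+.3f}")

# 2. The teaspoon: for a fixed sample size, the accuracy of the estimate
#    barely depends on how big the pot is.
for pot_size in (10_000, 1_000_000):
    pot = [random.gauss(100, 15) for _ in range(pot_size)]
    spread = statistics.stdev(statistics.mean(random.sample(pot, 50))
                              for _ in range(2_000))
    print(f"population {pot_size:>9,}: std dev of a 50-item sample mean = {spread:.2f}")

# 3. Overlapping distributions can still have clearly different means.
#    Heights in cm; means and standard deviations are rough illustrative figures.
women = [random.gauss(162, 7) for _ in range(100_000)]
men = [random.gauss(176, 7) for _ in range(100_000)]
mean_men = statistics.mean(men)
taller = sum(1 for w in women if w > mean_men) / len(women)
print(f"women taller than the average man: {taller:.1%}, "
      f"yet the mean gap is {mean_men - statistics.mean(women):.1f} cm")
```

The second experiment is the key one: doubling the pot a hundredfold leaves the spread of the estimate essentially unchanged, because only the sample size appears in the standard-error formula.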
I think that in this social media era, there is a big worry that scientifically accurate results get ignored because they don't "feel right." For example, above I said that women like romcom movies. Maybe you thought that was just a stereotype, so you could ignore it. But, in fact, Stereotype Accuracy is One of the Largest and Most Replicable Effects in All of Social Psychology. But again, don't go too far: stereotypes are only true on average and don't apply to every individual. I'm sure there are women who hate romantic comedies, for example. In particular, given the title of Will's presentation, it would be very dangerous if social media pressure made it a "business imperative" to ignore science.

Final Conclusion

There are lots of mistakes to avoid in training data, unintentional bias being one of them. But we should be careful not to fall into the trap of avoiding so-called bias when it is actually statistically unbiased. We use the loaded term "bias" in two totally different ways, and we need to be aware of the difference.

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.
