Sometimes I’ve come across a bit of real-life interpersonal drama in which one person says of another, “that person gave me a funny look, so they must hate me”.
Although a person saying this may have been destabilised by other factors (high-stress medical conditions or whatever) and may have immediate emotional needs that are more for sympathy than logic, there is a part of me that wants to avoid the possibility of reinforcing a potentially harmful misunderstanding, and respond along the lines of:
What if you misread them? What if what they were thinking of wasn’t you at all? Can’t we at least apply Hanlon’s Razor here?
to which the frustrated individual on the receiving end of the “look” might reply something like:
What do *you* know about facial expressions? You have a brain condition that stops you from being able to process them! I *know* I’ve read that person correctly, and you’re in no position to suggest otherwise.
It’s correct that I basically can’t process facial expressions myself, but I *do* happen to have scientific evidence that “normal” people aren’t always right about them.
When I was a PhD student, I sat two desks away from a specialist who knows *everything* about facial expressions. She invented a computer system to read them, and went on to turn that into a successful AI company. You might have read her book *Girl Decoded*. And while Rana el Kaliouby was primarily focused on *computer* readings of emotions, she necessarily had to look at the way *humans* read them too, which brings me to a very interesting number she found in one of her experiments.
Rana has improved her emotion-reading system over the years: her early versions were not as good as the modern one. (I say this because I want it to be clear that the numbers I’m about to drag up from her early days are *not* representative of the modern commercial system.) I distinctly remember one time when Rana was testing an *early* version of her system on a certain set of videos, and its accuracy turned out to be 77.4%. Although getting it right nearly four times out of five was definitely impressive, it didn’t seem *super* impressive just yet, and Rana wanted to compare it with something else to get some idea of where she stood.
Well, there *wasn’t* another computer system that could try to read the complex emotions in those videos. There might have been a couple of basic systems that could tell “happiness” from “sadness”, but Rana was the first one to try to distinguish between things like “confusion” and “concentration”. So the only way she’d be able to get a meaningful comparison figure with which to evaluate her system was to run an experiment asking *humans* to name the emotions and see how well *they* did.
Of course Rana picked fully-sighted neurotypical subjects with no known medical issues that affected the reading of facial expressions, and there was me sitting in the corner thinking “oh, those sighties will get 98% for sure”.
On the basic test, they got 71%.
They were wrong nearly three times in ten, just looking at *basic* emotions.
That person who gave you a funny look? Could be a 30% chance your brain was misreading it.
And then, when those volunteers were tested on more *complex* emotional states, their accuracy fell to 51.5%.
This experiment says there could be a 50-50 chance your brain has misread that subtly funny look the person gave you.
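For anyone who wants the arithmetic spelled out, here is a minimal Python sketch. It uses only the two accuracy figures quoted above; the names `human_accuracy` and `misread_rate` are made up for illustration and do not come from Rana’s thesis.

```python
# Accuracy figures quoted in the text (percent of emotions named correctly
# by the human volunteers). Variable names are illustrative only.
human_accuracy = {"basic": 71.0, "complex": 51.5}


def misread_rate(accuracy_percent):
    """Return the percentage of readings that were wrong."""
    return round(100.0 - accuracy_percent, 1)


# Misread percentage for each kind of emotion.
misread = {kind: misread_rate(acc) for kind, acc in human_accuracy.items()}

for kind, rate in misread.items():
    print(f"{kind} emotions: misread about {rate}% of the time")
```

So “nearly three times in ten” is 29%, and the “50-50” figure is really 48.5%, which is close enough for the point being made.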
(I’m not leaking unpublished results here. It’s all in Rana’s thesis. Naturally she doesn’t make it a *major* point, because her thesis is focused on the automatic system, but the human results *are* in there. Download Rana’s thesis and search for Figure 3.4.)
We *cannot* jump to conclusions about a person’s relationship with us from the way our brain reacted to just one look, even if we know that look was reacting to *us* (which we don’t know for sure). We need more data than that. Sure, we might not want to hang around to collect said more data if we’re likely to be in a one-off confrontational situation with a stranger, but if it’s a person you’ll meet again, then please don’t write them off *just* because of a look—you might have read it correctly, but you also might not have, so it might be worth giving your hypothesis an extra test or two, just to make sure.
Now, we *could* argue about these percentages. Although Rana conducted the experiment rigorously, it *was* only one experiment and could probably do with being repeated at a larger scale—after all, Rana wasn’t exactly focusing her PhD on how good *humans* are at reading emotions; that was just an incidental figure she needed as a baseline for evaluating her computer system. Nevertheless, it is strong evidence that there exist at least some circumstances in which “normal” people can fail to read emotions correctly between 30% and 50% of the time.
Can I take a “cheap shot” at Kate Crawford’s book *Atlas of AI*? Crawford was right to point out the huge hidden resource consumption of some “Big Tech” (not just limited to “AI”) and the dangers of people having “automation bias” (thinking the computer is better than it really is; I’m not sure “AI” should have been called “intelligence” unless you limit that to pure “crystallised” intelligence in Cattell’s model), but when she had a go at Rana’s brainchild she went too far. Yes, I may be biased: I shared an office with Rana and encouraged her in her quest, saying her system could be a great help to people who have trouble reading emotions themselves. But I do have an actual argument here; I’m not *just* being pro-Rana because I was one of her “lab mates”.
In fairness to Kate Crawford, I don’t think she *meant* to do a “hatchet job” on Rana. I mean, if you’re writing a book called “Atlas of” some problem, then it can understandably make you feel pressure to dig up a bit of “dirt” on as many aspects of it as possible. Something exists called affective computing? Quick, find something shaky about it. OK, got something, so let’s dismiss that field and move on to the next because this publisher deadline won’t meet itself. That being the case, I’m only having a go at the book, not the author, who might have done better if she hadn’t been under pressure at the time.
So the objection to the whole of affective computing that was raised in *Atlas of AI* went back to Rana’s early experiments testing both her system and her colleagues on a set of videos that was originally compiled by Professor Simon Baron-Cohen. He’s the one who came up with the “empathising–systemising” theory, with men tending to lean a little toward the ‘systemising’ solve-the-problem side and women tending to lean a little toward the ‘empathising’ understand-the-person side, and autism being an arrangement of the brain that’s extremely good at systemising but has more trouble with empathising (although that *doesn’t* mean autistic people don’t *have* empathy; they just might need a clearer explanation of the situation first). Baron-Cohen compiled the videos for an experiment he wanted to run to find out if autistic people can be trained to recognise emotions (which is a fair enough question, whatever its answer turned out to be), and in compiling that set, which was also the one used by Rana in her tests, he did employ some actors.
Actors! Oh no! Everybody panic and run because the whole field that Rana invented is resting on the wobbly foundation of a set of videos made by *actors*!
Except, of course, that’s not the whole story.
Firstly, those were early days. Rana and her company have done *much* more work since then, working with *millions* of natural spontaneous emotions recorded from people who gave their consent. The state of Rana’s prototype in 2003 is not at all relevant to the modern system. Everybody’s got to start *somewhere* and it’s been effectively remade since then.
Secondly—and more relevant to our interpretation of that small-scale experiment with human emotion readers—although some actors can be bad, other actors can be good, and the *best* actors have learned the skill of adopting a temporary mindset that they *are* the character, facing the situation *for real*—or at least have the ability to re-live real incidents from their own lives that trigger the required emotions for real, which is one reason why being a top actor can be such a high-stress job.
True, it’s safe to say Baron-Cohen wouldn’t have been able to afford to employ actors with top track records like Patrick Stewart, nor did he clip pre-confirmed footage from existing blockbuster films (there *might* be legal arguments about copyright exceptions for certain uses in certain situations, but he probably preferred to avoid the possibility of getting that wrong and simply create new material from scratch), but if you employ enough early-career actors you should eventually find unrecognised “talent” among them—and each of the videos in that set was run past a panel of ten judges, and kept only if at least eight of the judges said it was a good performance of the emotion it was supposed to be.
That basically means a set of performances that is misread at most 20% of the time by one set of people can be misread 30% to 50% of the time by another set of people, when the two reading sessions are done under slightly different circumstances.
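The comparison can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: it treats the eight-out-of-ten rule as a per-clip ceiling on how much of the panel could have disagreed, and sets that beside the volunteers’ overall misread rates. The names used (`max_panel_misread`, `volunteer_misread`, `gaps`) are hypothetical.

```python
# Panel rule quoted in the text: a clip was kept only if at least
# 8 of the 10 judges said it was a good performance of the emotion.
JUDGES = 10
MIN_AGREEING = 8


def max_panel_misread(judges, min_agreeing):
    """Largest share of the panel (percent) that could disagree with a kept clip."""
    return 100 * (judges - min_agreeing) / judges


panel_ceiling = max_panel_misread(JUDGES, MIN_AGREEING)

# Volunteers' misread rates, derived from the 71% and 51.5% accuracy figures.
volunteer_misread = {"basic": 100 - 71, "complex": 100 - 51.5}

# How far each volunteer misread rate sits above the panel's ceiling.
gaps = {kind: rate - panel_ceiling for kind, rate in volunteer_misread.items()}

print(f"panel ceiling: {panel_ceiling}%")
for kind, gap in gaps.items():
    print(f"{kind}: {volunteer_misread[kind]}% misread, {gap} points above the ceiling")
```

The two groups did not view the clips under identical conditions, so the gap is suggestive rather than a like-for-like measurement.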
As I said, this was an early-days baseline test for getting Rana’s automatic system off the ground: it could do with a larger-scale, more controlled repeat. But as preliminary data it’s still enough to cast doubt on the idea that the “normal” human brain is always reliable at reading the emotions of others. It is not.
All material © Silas S. Brown unless otherwise stated.