Kartik Hosanagar and Vivian Jair
July 23, 2018
In 2014, Stanford professor Clifford Nass faced a student revolt. Nass's
students claimed that those in one section of his technology interface course
received higher grades on the final exam than counterparts in another.
Unfortunately, they were right: two different teaching assistants had graded
the two different sections' exams, and one had been more lenient than the
other. Students with similar answers had ended up with different grades.
Nass, a computer scientist, recognized the unfairness and created a technical
fix: a simple statistical model to adjust scores, where students got a certain
percentage boost on their final mark when graded by a TA known to give grades
that percentage lower than average. In the spirit of openness, Nass sent out
emails to the class with a full explanation of his algorithm. Further
complaints poured in, some even angrier than before. Where had he gone wrong?
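As a rough illustration of the kind of adjustment described above, here is a minimal sketch, not Nass's actual model. The function name, the sample scores, and the choice to leave the lenient section's marks untouched are all assumptions made for this example.

```python
from statistics import mean

def boost_harsh_sections(scores_by_ta):
    """Boost each section by the percentage its TA graded below the overall average.

    Assumption: sections graded at or above the average are left unchanged.
    """
    overall = mean(s for scores in scores_by_ta.values() for s in scores)
    adjusted = {}
    for ta, scores in scores_by_ta.items():
        # Fraction by which this TA's average falls short of the overall average.
        shortfall = max(0.0, (overall - mean(scores)) / overall)
        adjusted[ta] = [s * (1 + shortfall) for s in scores]
    return adjusted

# Hypothetical final-exam scores for two sections.
raw = {"ta_lenient": [88, 92, 90], "ta_strict": [78, 82, 80]}
adjusted = boost_harsh_sections(raw)
for ta, scores in adjusted.items():
    print(ta, [round(s, 1) for s in scores])
```

Even in this toy version, the boost narrows the gap between sections without closing it, and the same percentage is applied to every student in a section, whatever their individual answers looked like.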
Companies and governments increasingly rely upon algorithms to make decisions
that affect people's lives and livelihoods, from loan approvals to
recruiting, legal sentencing, and college admissions. Less vital decisions,
too, are being delegated to machines, from internet search results to product
recommendations, dating matches, and what content goes up on our social media
feeds. In response, many experts have called for rules and regulations that
would make the inner workings of these algorithms transparent. But as Nass's
experience makes clear, transparency can backfire if not implemented carefully.
Fortunately, there is a smart way forward.
Transparency and Trust
Two years after the protests in Nass's class, René Kizilcec, a young Stanford
PhD student who had worked under Nass, decided to conduct a study looking at the
effects of grading transparency on student trust. He used the massive open
online course (MOOC) platform Coursera, which, like many MOOCs, employs peer
grading to manage extraordinarily high volumes of exams. The work gets done,
but peer grading exacerbates the problem of grading bias since it involves
large numbers of graders with varying personalities and tendencies.
In Kizilcec's study, 103 students submitted essays for peer grading and got
back two marks: a grade that represented an average peer grade, and a computed
grade, which was the product of an algorithm that adjusted for bias. Some
students were told, "Your computed grade is X, which is the grade you received
from your peers." Others were provided greater transparency: in fact, an entire
paragraph explaining how the grade had been calculated, why adjustments had
been made (to account for peers' "bias and accuracy"), and naming the type of
algorithm used ("an expectation maximization algorithm with a prior"). Both
groups were then asked to rate their trust in the process.
The students had also been asked what grade they thought they would get, and it
turned out that levels of trust in those students whose actual grades hit or
exceeded that estimate were unaffected by transparency. But people whose
expectations were violated (students who received lower scores than they
expected) trusted the algorithm more when they got more of an explanation of
how it worked. This was interesting for two reasons: it confirmed a human
tendency to apply greater scrutiny to information when expectations are
violated, and it showed that the distrust that might accompany negative or
disappointing results can be alleviated if people believe that the underlying
process is fair.
But how do we reconcile this finding with Nass's experience? Kizilcec had in
fact tested three levels of transparency: low and medium but also high, where
the students got not only a paragraph explaining the grading process but also
their raw peer-graded scores and how these were each precisely adjusted by the
algorithm to get to a final grade. And this is where the results got more
interesting. In the experiment, while medium transparency increased trust
significantly, high transparency eroded it completely, to the point where trust
levels were either equal to or lower than among students experiencing low
transparency.
Making Modern AI Transparent: A Fool's Errand?
What are businesses to take home from this experiment? It suggests that
technical transparency (revealing the source code, inputs, and outputs of the
algorithm) can build trust in many situations. But most algorithms in the
world today are created and managed by for-profit companies, and many
businesses regard their algorithms as highly valuable forms of intellectual
property that must remain in a black box. Some lawmakers have proposed a
compromise, suggesting that the source code be revealed to regulators or
auditors in the event of a serious problem, and this adjudicator will assure
consumers that the process is fair.
This approach merely shifts the burden of belief from the algorithm itself to
the regulators. This may be a palatable solution in many arenas: for example, few
of us fully understand financial markets, so we trust the SEC to take on
oversight. But in a world where decisions large and small, personal and
societal, are being handed over to algorithms, this becomes less acceptable.
Another problem with technical transparency is that it makes algorithms
vulnerable to gaming. If an instructor releases the complete source code for an
algorithm grading student essays, it becomes easy for students to exploit
loopholes in the code: maybe, for example, the algorithm seeks evidence that
the students have done research by looking for phrases such as "according to
published research." A student might then deliberately use this language at the
start of every paragraph in her essay.
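The loophole is easy to see in a toy version of such a grader. The sketch below is purely hypothetical: the scoring rule, the point values, and the sample essays are invented for illustration, and only the trigger phrase comes from the example above.

```python
import re

RESEARCH_PHRASE = "according to published research"

def research_score(essay, points_per_hit=5, cap=100):
    """Hypothetical naive scorer: award points for every occurrence of the phrase."""
    hits = len(re.findall(re.escape(RESEARCH_PHRASE), essay.lower()))
    return min(cap, hits * points_per_hit)

# One genuine reference to research.
honest = "According to published research, sleep improves recall. Our survey agrees."

# The same phrase stuffed into the start of five sentences.
gamed = " ".join(
    f"According to published research, claim {i} is obviously true."
    for i in range(1, 6)
)

print("honest:", research_score(honest))
print("gamed: ", research_score(gamed))
```

A student who has read the code learns that the scorer counts phrases rather than checking sources, and can raise the score without doing any more research.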
But the biggest problem is that modern AI is making source code, transparent
or not, less relevant compared with other factors in algorithmic functioning.
Specifically, machine learning algorithms, and deep learning algorithms in
particular, are usually built on just a few hundred lines of code. The
algorithm's logic is mostly learned from training data and is rarely reflected
in its source code. Which is to say, some of today's best-performing algorithms
are often the most opaque. High transparency might involve getting our heads
around reams and reams of data, and then still only being able to guess at
what lessons the algorithm has learned from it.
This is where Kizilcec's work becomes relevant: a way to embrace rather than
despair over deep learning's impenetrability. His work shows that users will
not trust black box models, but they don't need, or even want, extremely high
levels of transparency. That means responsible companies need not fret over
what percentage of source code to reveal, or how to help users read massive
datasets. Instead, they should work to provide basic insights on the factors
driving algorithmic decisions.
Explainable AI: The Way Forward
One of the more important sections of the EU's groundbreaking General Data
Protection Regulation (GDPR) focuses on the right to explanation. Essentially,
it mandates that users be able to demand the data behind the algorithmic
decisions made for them, including in recommendation systems, credit and
insurance risk systems, advertising programs, and social networks. In doing so,
it tackles intentional concealment by corporations. But it doesn't address
the technical challenges associated with transparency in modern algorithms.
Here, a movement called explainable AI (xAI) might be helpful.
xAI systems work by analyzing various inputs used by a decision-making
algorithm, measuring the impact of each of the inputs individually and in
groups, and finally reporting the set of inputs that had the biggest impact on
the final decision. For example, if such a system were applied to an
essay-grading algorithm, it might analyze how changes in various inputs such as
content, word count, vocabulary level, grammar, or sourcing affected the final
grade and provide an explanation like this:
Tim received a score of 73 on his essay.
49 percent of Tim's score is explained by content matches with key concepts
listed in the grading key.
18 percent of the score is explained by Tim's essay exceeding the word-count
threshold of 1,000 words but not exceeding the limit of 1,300 words.
13 percent of the score is explained by the fact that Tim's essay mentioned
relevant source documents in appropriate contexts.
The rest of Tim's score is explained by several other less significant factors.
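To make the idea concrete, here is a minimal sketch of one simple way to produce such a report: remove one input at a time and measure how much the score drops. Everything in it is an assumption for illustration, including the stand-in grader, its weights, the feature names, and the baseline values. It is not the method used in the research mentioned in this article.

```python
def grade_essay(features):
    """Stand-in for a black-box grader; the weights are invented for illustration."""
    score = 50 * features["content_match"]            # share of key concepts covered, 0..1
    score += 20 if 1000 <= features["word_count"] <= 1300 else 0
    score += 5 * min(features["sources_cited"], 3)    # credit for up to three sources
    score += 15 * features["grammar"]                 # grammar quality, 0..1
    return score

# Hypothetical "empty essay" values used when an input is removed.
BASELINE = {"content_match": 0.0, "word_count": 0, "sources_cited": 0, "grammar": 0.0}

def explain(features, model=grade_essay, baseline=BASELINE):
    """Ablate one input at a time and report each input's share of the total drop."""
    full = model(features)
    drops = {}
    for name in features:
        ablated = dict(features, **{name: baseline[name]})
        drops[name] = full - model(ablated)
    total = sum(drops.values())
    shares = {name: drop / total for name, drop in drops.items()} if total else {}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return full, ranked

tim = {"content_match": 0.7, "word_count": 1150, "sources_cited": 2, "grammar": 0.8}
score, ranked = explain(tim)
print(f"Tim received a score of {score:.0f} on his essay.")
for name, share in ranked:
    print(f"{share:.0%} of the score is attributed to {name}")
```

This sketch only measures inputs individually. Capturing the effect of inputs in groups, as described above, takes more elaborate techniques, such as attributions based on Shapley values, which average an input's impact over many combinations of the other inputs.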
In some of our ongoing research, we find that achieving this level of
transparency is well within the capabilities of today's machine learning and
statistical methods. This kind of analysis could help engineers get around the
black box problem: the problem that they themselves don't always know what is
motivating the decisions of their machine learning algorithms. It identifies
relationships between inputs and outcomes, spots possible biases, and gives
routes into fixing problems. Would it also, for users, hit that transparency
sweet spot that Kizilcec identified? It's too soon to tell. In the meantime, it
is worth remembering that building trust in machine learning and analytics will
require a system of relationships, where regulators, for example, get high
levels of transparency, and users accept medium levels. "Both sides are
important," says Kizilcec of how transparency for auditors versus users can
affect buy-in. "If we get only one side right, it won't work."
Kartik Hosanagar is a Professor of Technology and Digital Business at The
Wharton School of the University of Pennsylvania. He was previously a cofounder
of Yodle Inc. Follow him on Twitter @khosanagar.
Vivian Jair is a graduate of the University of Pennsylvania where she studied
Strategic Management, Finance, and Operations and worked as a research
assistant in Professor Kartik Hosanagar's research group.