Superforecasting: How to Upgrade Your Company's Judgment

2016-04-29 09:04:43

Paul J. H. Schoemaker and Philip E. Tetlock

From the May 2016 Issue

Imagine that you could dramatically improve your firm's forecasting ability,

but to do so you'd have to expose just how unreliable its predictions and the

people making them really are. That's exactly what the U.S. intelligence

community did, with dramatic results. Back in October 2002, the National

Intelligence Council issued its official opinion that Iraq possessed chemical

and biological weapons and was actively producing more weapons of mass

destruction. Of course, that judgment proved colossally wrong. Shaken by its

intelligence failure, the $50 billion bureaucracy set out to determine how it

could do better in the future, realizing that the process might reveal glaring

organizational deficiencies.

The resulting research program included a large-scale, multiyear prediction

tournament, co-led by one of us (Phil), called the Good Judgment Project. The

series of contests, which pitted thousands of amateurs against seasoned

intelligence analysts, generated three surprising insights: First, talented

generalists often outperform specialists in making forecasts. Second, carefully

crafted training can enhance predictive acumen. And third, well-run teams can

outperform individuals. These findings have important implications for the way

organizations and businesses forecast uncertain outcomes, such as how a

competitor will respond to a new-product launch, how much revenue a promotion

will generate, or whether prospective hires will perform well.

The approach we'll describe here for building an ever-improving organizational

forecasting capability is not a cookbook that offers proven recipes for

success. Many of the principles are fairly new and have only recently been

applied in business settings. However, our research shows that they can help

leaders discover and nurture their organizations' best predictive capabilities

wherever they may reside.

Find the Sweet Spot

Companies and individuals are notoriously inept at judging the likelihood of

uncertain events, as studies show all too well. Getting judgments wrong, of

course, can have serious consequences. Steve Ballmer's prognostication in 2007

that "there's no chance that the iPhone is going to get any significant market

share" left Microsoft with no room to consider alternative scenarios. But

improving a firm's forecasting competence even a little can yield a competitive

advantage. A company that is right three times out of five on its judgment

calls is going to have an ever-increasing edge on a competitor that gets them

right only two times out of five.

Before we discuss how an organization can build a predictive edge, let's look

at the types of judgments that are most amenable to improvement and those not

worth focusing on. We can dispense with predictions that are either entirely

straightforward or seemingly impossible. Consider issues that are highly

predictable: You know where the hands of your clock will be five hours from

now; life insurance companies can reliably set premiums on the basis of updated

mortality tables. For issues that can be predicted with great accuracy using

econometric and operations-research tools, there is no advantage to be gained

by developing subjective judgment skills in those areas: The data speaks loud

and clear.

About the Good Judgment Project

In 2011, Philip Tetlock teamed up with Barbara Mellers, of the Wharton School,

to launch the Good Judgment Project. The goal was to determine whether some

people are naturally better than others at prediction and whether prediction

performance could be enhanced. The GJP was one of five academic research teams

that competed in an innovative tournament funded by the Intelligence Advanced

Research Projects Activity (IARPA), in which forecasters were challenged to

answer the types of geopolitical and economic questions that U.S. intelligence

agencies pose to their analysts.

The IARPA initiative ran from 2011 to 2015 and recruited more than 25,000

forecasters who made well over a million predictions on topics ranging from

whether Greece would exit the eurozone to the likelihood of a leadership

turnover in Russia to the risk of a financial panic in China. The GJP

decisively won the tournament, besting even the intelligence community's own

analysts.

At the other end of the spectrum, we find issues that are complex, poorly

understood, and tough to quantify, such as the patterns of clouds on a given

day or when the next game-changing technology will pop out of a garage in

Silicon Valley. Here, too, there's little advantage in investing resources in

systematically improving judgment: The problems are just too hard to crack.

The sweet spot that companies should focus on is forecasts for which some data,

logic, and analysis can be used but seasoned judgment and careful questioning

also play key roles. Predicting the commercial potential of drugs in clinical

trials requires scientific expertise as well as business judgment. Assessors of

acquisition candidates draw on formal scoring models, but they must also gauge

intangibles such as cultural fit, the chemistry among leaders, and the

likelihood that anticipated synergies will actually materialize.

Consider the experience of a UK bank that lost a great deal of money in the

early 1990s by lending to U.S. cable companies that were hot but then tanked.

The chief lending officer conducted an audit of these presumed lending errors,

analyzing the types of loans made, the characteristics of clients and loan

officers involved, the incentives at play, and other factors. She scored the

bad loans on each factor and then ran an analysis to see which ones best

explained the variance in the amounts lost. In cases where the losses were

substantial, she found problems in the underwriting process that resulted in

loans to clients with poor financial health or no prior relationship with the

bank, issues for which expertise and judgment were important. The bank was able

to make targeted improvements that boosted performance and minimized losses.

On the basis of our research and consulting experience, we have identified a

set of practices that leaders can apply to improve their firms' judgment in

this middle ground. Our recommendations focus on improving individuals'

forecasting ability through training; using teams to boost accuracy; and

tracking prediction performance and providing rapid feedback. The general

approaches we describe should of course be tailored to each organization and

evolve as the firm learns what works in which circumstances.

Train for Good Judgment

Most predictions made in companies, whether they concern project budgets, sales

forecasts, or the performance of potential hires or acquisitions, are not the

result of cold calculus. They are colored by the forecaster's understanding of

basic statistical arguments, susceptibility to cognitive biases, desire to

influence others' thinking, and concerns about reputation. Indeed, predictions

are often intentionally vague to maximize wiggle room should they prove wrong.

The good news is that training in reasoning and debiasing can reliably

strengthen a firm's forecasting competence. The Good Judgment Project

demonstrated that as little as one hour of training improved forecasting

accuracy by about 14% over the course of a year.

Learn the basics.

Basic reasoning errors (such as believing that a coin that has landed heads

three times in a row is likelier to land tails on the next flip) take a toll on

prediction accuracy. So it's essential that companies lay a foundation of

forecasting basics: The GJP's training in probability concepts such as

regression to the mean and Bayesian revision (updating a probability estimate

in light of new data), for example, boosted participants' accuracy measurably.

Companies should also require that forecasts include a precise definition of

what is to be predicted (say, the chance that a potential hire will meet her

sales targets) and the time frame involved (one year, for example). The

prediction itself must be expressed as a numeric probability so that it can be

precisely scored for accuracy later. That means asserting that one is 80%

confident, rather than "fairly sure," that the prospective employee will meet

her targets.
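
Bayesian revision of such a numeric estimate can be sketched in a few lines. The function name and every number here are illustrative assumptions, not Good Judgment Project figures: a forecaster starts 80% confident that a hire will meet her targets, then sees a weak first quarter, which is assumed to occur 30% of the time for hires who go on to meet their targets and 70% of the time for those who do not.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the revised probability of a hypothesis after seeing one piece of evidence.

    prior: probability held before the evidence arrived
    p_evidence_if_true: chance of seeing the evidence if the hypothesis is true
    p_evidence_if_false: chance of seeing the evidence if the hypothesis is false
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Hypothetical inputs: 80% prior, weak quarter seen in 30% of eventual
# successes and 70% of eventual misses.
posterior = bayes_update(prior=0.8, p_evidence_if_true=0.3, p_evidence_if_false=0.7)
print(f"{posterior:.2f}")
```

Under these assumed likelihoods the estimate falls from 80% to roughly 63%: the new data moves the forecast without overturning it.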

Understand cognitive biases.

Cognitive biases are widely known to skew judgment, and some have particularly

pernicious effects on forecasting. They lead people to follow the crowd, to

look for information that confirms their views, and to strive to prove just how

right they are. It's a tall order to debias human judgment, but the GJP has had

some success in raising participants' awareness of key biases that compromise

forecasting. For example, the project trained beginners to watch out for

confirmation bias that can create false confidence, and to give due weight to

evidence that challenges their conclusions. And it reminded trainees to not

look at problems in isolation but, rather, take what Nobel laureate Daniel

Kahneman calls "the outside view." For instance, in predicting how long a

project will take to complete, trainees were counseled to first ask how long it

typically takes to complete similar projects, to avoid underestimating the time

needed.

Training can also help people understand the psychological factors that lead to

biased probability estimates, such as the tendency to rely on flawed intuition

in lieu of careful analysis. Statistical intuitions are notoriously susceptible

to illusions and superstition. Stock market analysts may see patterns in the

data that have no statistical basis, and sports fans often regard basketball

free-throw streaks, or "hot hands," as evidence of extraordinary new capability

when in fact they're witnessing a mirage caused by capricious variations in a

small sample size.

How Training and Teams Improve Prediction

The Good Judgment Project tracked the accuracy of participants' forecasts about

economic and geopolitical events. The control group, made up of motivated

volunteers, received no training about the biases that can plague forecasters.

Its members performed at about the same level as most employees in high-quality

companies, perhaps even better, since they were self-selected, competitive

individuals. The second group benefited from training on biases and how to

overcome them. Teams of trained individuals, who debated their forecasts

(usually virtually), performed even better. When the best forecasters were

culled, over successive rounds, into an elite group of superforecasters, their

predictions were nearly twice as accurate as those made by untrained

forecasters, representing a huge opportunity for companies.

[Figure: How Training and Teams Improve Prediction]

Another technique for making people aware of the psychological biases

underlying skewed estimates is to give them confidence quizzes. Participants

are asked for range estimates about general-interest questions (such as "How

old was Martin Luther King Jr. when he died?") or company-specific ones (such

as "How much federal tax did our firm pay in the past year?"). The predictors'

task is to give their best guess in the form of a range and assign a degree of

confidence to it; for example, one might guess with 90% confidence that Dr.

King was between 40 and 55 when he was assassinated (he was 39). The aim is to

measure not participants' domain-specific knowledge, but, rather, how well they

know what they don't know. As Will Rogers wryly noted: "It is not what we don't

know that gets us into trouble; it is what we know that ain't so." Participants

commonly discover that half or more of their 90% confidence ranges don't

contain the true answer.
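
Scoring such a quiz takes only a few lines. The sketch below is one possible way to do it, not the authors' instrument: the function name is hypothetical, the first range is the King example from the text, and the other two ranges are made-up answers from imagined participants.

```python
def hit_rate(answers):
    """Share of (low, high, truth) ranges that contain the true value."""
    hits = sum(1 for low, high, truth in answers if low <= truth <= high)
    return hits / len(answers)


KING_AGE = 39  # Dr. King's age at his death, as stated in the text

answers = [
    (40, 55, KING_AGE),  # range from the text: misses by one year
    (35, 45, KING_AGE),  # hypothetical participant: contains the truth
    (30, 38, KING_AGE),  # hypothetical participant: misses
]

rate = hit_rate(answers)
print(f"{rate:.0%} of the stated 90% ranges contain the truth")
```

If the ranges really deserved 90% confidence, the hit rate over many questions would sit near 90%; a much lower figure is the overconfidence the quiz is designed to expose.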

Again, there's no one-size-fits-all remedy for avoiding these systematic

errors; companies should tailor training programs to their circumstances.

Susquehanna International Group, a privately held global quantitative trading

firm, has its own idiosyncratic approach. Founded in 1987 by poker aficionados,

the company, which transacts more than a billion dollars in trades a year,

requires new hires to play lots of poker on company time. In the process,

trainees learn about cognitive traps, emotional influences such as wishful

thinking, behavioral game theory, and, of course, options theory, arbitrage,

and foreign exchange and trading regulations. The poker-playing exercises

sensitize the trainees to the value of thinking in probability terms, focusing

on information asymmetry (what the opponent might know that I don't), learning

when to fold a bad hand, and defining success not as winning each round but as

making the most of the hand you are dealt.

Companies should also engage in customized training that focuses on narrower

prediction domains, such as sales and R&D, or areas where past performance has

been especially poor. If your sales team is prone to hubris, that bias can be

systematically addressed. Such tailored programs are more challenging to

develop and run than general ones, but because they are targeted, they often

yield greater benefits.

Build the Right Kind of Teams

Assembling forecasters into teams is an effective way to improve forecasts. In

the Good Judgment Project, several hundred forecasters were randomly assigned

to work alone and several hundred to work collaboratively in teams. In each of

the four years of the IARPA tournament, the forecasters working in teams

outperformed those who worked alone. Of course, to achieve good results, teams

must be deftly managed and have certain distinctive features.

Composition.

The forecasters who do the best in GJP tournaments are brutally honest about

the source of their success, appreciating that they may have gotten a

prediction right despite (not because of) their analysis. They are cautious,

humble, open-minded, analytical and good with numbers. In assembling teams,

companies should look for natural forecasters who show an alertness to bias, a

knack for sound reasoning, and a respect for data.

Who Are These Superforecasters?

The Good Judgment Project identified the traits shared by the best-performing

forecasters in the Intelligence Advanced Research Projects Activity tournament.

A public tournament is ongoing at gjopen.com; join to see if you have what it

takes.

Philosophical Approach and Outlook

Cautious: They understand that few things are certain

Humble: They appreciate their limits

Nondeterministic: They don't assume that what happens is meant to be

Abilities and Thinking Style

Open-minded: They see beliefs as hypotheses to be tested

Inquiring: They are intellectually curious and enjoy mental challenges

Reflective: They are introspective and self-critical

Numerate: They are comfortable with numbers

Methods of Forecasting

Pragmatic: They are not wedded to any one idea or agenda

Analytical: They consider other views

Synthesizing: They blend diverse views into their own

Probability-focused: They judge the probability of events not as certain or uncertain but as more or less likely

Thoughtful updaters: They change their minds when new facts warrant it

Intuitive shrinks: They are aware of their cognitive and emotional biases

Work Ethic

Improvement-minded: They strive to get better

Tenacious: They stick with a problem for as long as needed

It's also important that forecasting teams be intellectually diverse. At least

one member should have domain expertise (a finance professional on a budget

forecasting team, for example), but nonexperts are essential too, particularly

ones who won't shy away from challenging the presumed experts. Don't

underestimate these generalists. In the GJP contests, nonexpert civilian

forecasters often beat trained intelligence analysts at their own game.

Diverging, evaluating, and converging.

Whether a team is making a forecast about a single event (such as the

likelihood of a U.S. recession two years from now) or making recurring

predictions (such as the risk each year of recession in an array of countries),

a successful team needs to manage three phases well: a diverging phase, in

which the issue, assumptions, and approaches to finding an answer are explored

from multiple angles; an evaluating phase, which includes time for productive

disagreement; and a converging phase, when the team settles on a prediction. In

each of these phases, learning and progress are fastest when questions are

focused and feedback is frequent.

The diverging and evaluating phases are essential; if they are cursory or

ignored, the team develops tunnel vision (focusing too narrowly and quickly

locking into a wrong answer) and prediction quality suffers. The right norms can

help prevent this, including a focus on gathering new information and testing

assumptions relevant to the forecasts. Teams must also focus on neutralizing a

common prediction error called anchoring, wherein an early and possibly

ill-advised estimate skews subsequent opinions far too long. This often happens

unconsciously because easily available numbers serve as convenient starting

points. (Even random numbers, when used in an initial estimate, have been shown

to anchor people's final judgments.)

One of us (Paul) ran an experiment with University of Chicago MBA subjects that

demonstrated the impact of divergent exploration on the path to a final

prediction. In one test, subjects in the control group were asked to estimate

how many gold medals the U.S. would win relative to another top country in the

next summer Olympics and to provide their 90% confidence ranges around these

estimates. The other group was asked to first sketch out various reasons why

the ratio of medals might be lower or higher than in years past and then make

an estimate. This group naturally thought back to terrorist attacks and

boycotts, and considered other factors that might influence the outcome, from

illness to improved training to performance-enhancing drugs. As a consequence

of this divergent thinking, this group's ranges were significantly wider than

the control group's, often by more than half. In general, wider ranges reflect

more carefully weighed predictions; narrow ranges commonly indicate

overconfident and often less accurate forecasts.

Trust.

Finally, trust among members of any team is required for good outcomes. It is

particularly critical for prediction teams because of the nature of the work.

Teams that are predicting the success or failure of a new acquisition, or

handicapping the odds of successfully divesting a part of the business, may

reach conclusions that raise turf issues or threaten egos and reputations. They

are also likely to expose areas of the firm, and perhaps individuals, with poor

forecasting abilities. To ensure that forecasters share their best thinking,

members must trust one another and trust that leadership will defend their work

and protect their jobs and reputations. Few things chill a forecasting team

faster than a sense that its conclusions could threaten the team itself.

Track Performance and Give Feedback

Our work on the Good Judgment Project and with a range of companies shows that

tracking prediction outcomes and providing timely feedback is essential to

improving forecasting performance.

Consider U.S. weather forecasters, who, though much maligned, excel at what

they do. When they say there's a 30% chance of rain, 30% of the time it rains

on those days, on average. Key to their superior performance is that they

receive timely, continual, and unambiguous feedback about their accuracy, which

is often tied to their performance reviews. Bridge players, internal auditors,

and oil geologists also shine at prediction thanks in part to robust feedback

and incentives for improvement.
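
The weather forecasters' standard is calibration: among all the days given a 30% chance of rain, it should rain on about 30%. A minimal sketch of that check follows; the function name and the ten sample forecasts are invented for illustration, and rounding stated probabilities to one decimal place is an assumed bucketing choice.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Map each stated probability to the observed frequency of the event.

    forecasts: iterable of (probability, occurred) pairs.
    Probabilities are bucketed by rounding to one decimal place.
    """
    buckets = defaultdict(list)
    for probability, occurred in forecasts:
        buckets[round(probability, 1)].append(1 if occurred else 0)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}


# Hypothetical record: ten days forecast at 30%, rain on three of them.
sample = [(0.3, True)] * 3 + [(0.3, False)] * 7
table = calibration_table(sample)
print(table)
```

A well-calibrated forecaster shows observed frequencies close to the stated probabilities in every bucket; large gaps mark where feedback is most needed.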

The purest measure for the accuracy of predictions and tracking them over time

is the Brier score. It allows companies to make direct, statistically reliable

comparisons among forecasters across a series of predictions. Over time, the

scores reveal those who excel, be they individuals, members of a team, or

entire teams competing with others.

Brier Scores Reveal Your Best and Worst Predictors

It's important that forecasters make precise estimates of probability, for

example, pegging at 80% the likelihood that their firm will sell between 9,000

and 11,000 units of a new product in the first quarter. That way, the

predictions can be analyzed and compared using a method called Brier scoring,

allowing managers to reliably rank forecasters on the basis of skill.

Brier scores are calculated by squaring the difference between a probability

prediction and the actual outcome, scored as 1 if the event happened and 0 if

not. For example, if a forecaster assigns a 0.9 probability (a 90% confidence

level) that the firm will exceed a sales target and the firm then does, her

Brier score for that forecast is:

(0.9 − 1)², or 0.01.

If the firm misses the target, her score is:

(0.9 − 0)², or 0.81.

The closer to zero the score is, the smaller the forecast error and the better

the prediction.
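
The two calculations above extend directly to a series of forecasts. The sketch below assumes the single-event, yes-or-no form of the score described here; the function names and the sample track record are hypothetical.

```python
def brier_score(probability, occurred):
    """Squared gap between a stated probability and the outcome (1 or 0)."""
    outcome = 1.0 if occurred else 0.0
    return (probability - outcome) ** 2


def mean_brier_score(forecasts):
    """Average Brier score over (probability, occurred) pairs."""
    scores = [brier_score(p, occurred) for p, occurred in forecasts]
    return sum(scores) / len(scores)


# The example from the text: a 0.9 forecast that hits, then one that misses.
hit = brier_score(0.9, True)
miss = brier_score(0.9, False)
print(round(hit, 2), round(miss, 2))

# Hypothetical track record for one forecaster.
track_record = [(0.9, True), (0.9, False), (0.6, True)]
average = mean_brier_score(track_record)
print(round(average, 3))
```

A forecaster who answers 50% to every question scores 0.25 each time under this formula, so an average above 0.25 signals forecasts that are worse than an uninformative guess.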

Brier scoring makes it readily apparent who's good at forecasting and who

isn't. By enabling direct comparison among forecasters, the tool encourages

thoughtful analysis while exposing "shooting from the hip" and biased

prognostications.

But simply knowing a team's score does little to improve performance; you have

to track the process it used as well. It's important to audit why outcomes were

achieved, good or bad, so that you can learn from them. Some audits may reveal

that certain process steps led to a good or a bad prediction. Others may show

that a forecast was correct despite a faulty rationale (that is, it was lucky),

or that a forecast was wrong because of unusual circumstances rather than a

flawed analysis. For example, a retailer may make very accurate forecasts of

how many customers will visit a store on a given day, but if a black-swan event

(say, a bomb threat) closes the store, its forecast for that day will be badly

off. Its Brier score would indicate poor performance, but a process audit would

show that bad luck, not bad process, accounted for the outlying score.

Gauging group dynamics is also a critical part of the process audit. No amount

of good data and by-the-book forecasting can overcome flawed team dynamics.

Consider the discussions that took place between NASA and engineering

contractor Morton Thiokol before the doomed launch of the space shuttle

Challenger in 1986. At first, Thiokol engineers advised against the launch,

concerned that cold temperatures could compromise the O-rings that sealed the

rocket boosters' joints. They predicted a much higher than usual chance of

failure because of the temperature. Ultimately, and tragically, Thiokol

reversed its stance.

The engineers' analysis was good; the organizational process was flawed. A

reconstruction of the events that day, based on congressional hearings,

revealed the interwoven conditions that compromised the forecast: time

pressure, directive leadership, failure to fully explore alternate views,

silencing of dissenters, and a sense of infallibility (after all, 24 previous

flights had gone well).

To avoid such catastrophes, and to replicate successes, companies should

systematically collect real-time accounts of how their top teams make

judgments, keeping records of assumptions made, data used, experts consulted,

external events, and so on. Videos or transcripts of meetings can be used to

analyze process; asking forecasters to record their own process may also offer

important insights. Recall Susquehanna International Group, which trains its

traders to play poker. Those traders are required to document their rationale

for entering or exiting a trade before making a transaction. They are asked to

consider key questions: What information might others have that you don't that

might affect the trade? What cognitive traps might skew your judgment on this

transaction? Why do you believe the firm has an edge on this trade? Susquehanna

further emphasizes the importance of process by pegging traders' bonuses not

just to the outcome of individual trades but also to whether the underlying

analytic process was sound.

Well-run audits can reveal post facto whether forecasters coalesced around a

bad anchor, framed the problem poorly, overlooked an important insight, or

failed to engage (or even muzzled) team members with dissenting views.

Likewise, they can highlight the process steps that led to good forecasts and

thereby provide other teams with best practices for improving predictions.

Each of the methods we've described (training, team building, tracking, and

talent spotting) is essential to good forecasting. The approach must be

customized across businesses, and no firm, to our knowledge, has yet mastered

them all to create a fully integrated program. This presents a great

opportunity for companies that take the lead, particularly those with a culture

of organizational innovation and those who embrace the kind of experimentation

the intelligence community did.

But companies will capture this advantage only if respected leaders champion

the effort, by broadcasting an openness to trial and error, a willingness to

ruffle feathers, and a readiness to expose "what we know that ain't so" in

order to hone the firm's predictive edge.

A version of this article appeared in the May 2016 issue (pp. 72-78) of Harvard

Business Review.

Paul J. H. Schoemaker is the former research director of the Wharton School's

Mack Institute and a coauthor of Peripheral Vision (Harvard Business Review

Press, 2006). He served as an adviser to the Good Judgment Project.

Philip E. Tetlock is the Annenberg University Professor at the University of

Pennsylvania and a coauthor of Superforecasting (Crown, 2015). He co-led the

Good Judgment Project.