Uncertainty surrounds virtually every element of international politics. Heads of state confront uncertainty when judging how their counterparts will react to crises.1 Generals confront uncertainty when evaluating the chances that their strategies will succeed or fail.2 Intelligence analysts confront uncertainty when they assess other states’ capabilities and intentions.3 Diplomats confront uncertainty when they attempt to discern their negotiating partners’ bottom lines.4 In these circumstances, and many others like them, national security officials must constantly grapple with the fact that they possess imperfect information about the world.
How well do national security officials meet that challenge? What kinds of biases do national security officials display when assessing uncertainty? How do those tendencies shape international affairs? Scholars answer these questions in many different ways. Realists typically argue that national security officials can be expected to assess uncertainty in a rational, unbiased manner.5 Political psychologists often claim that national security officials are prone to overconfidence, in the sense that they consistently assign too much certainty to their judgments.6 Overconfidence is widely viewed as a source of instability in international politics, as leaders who exaggerate the chances that their policies will succeed will also be more prone to initiating military disasters.7 Other scholars, however, argue that national security officials are prone to underconfidence due to professional cultures that discourage taking analytic risks.8 Underconfidence can undermine national security decision-making, too, particularly by discouraging leaders from exploiting feasible opportunities to advance their country’s interests.
It is notoriously difficult to understand which of these problems predominates, and to what extent, in national security decision-making. Part of the problem is that national security is so complex that it is often impossible to say whether any assessment of uncertainty in this domain is “right” or “wrong.”9 For example, if a general claims that there is a 70 percent chance they will win a battle, but they lose, then it is hard to know whether the general’s judgment was flawed or if they simply got unlucky.10 The standard way to solve that challenge is to evaluate the accuracy of many judgments at once. Thus, if we look at all of the battles in which generals predict a 70 percent chance of success, then we can see whether generals actually win those battles roughly 70 percent of the time.11 Yet that approach is difficult to implement in national security affairs, where important events are rare enough to make statistical evaluation challenging, practitioners rarely make explicit assessments of uncertainty, and the most important judgments are often classified.12
To understand how cognitive biases might influence national security decision-making, scholars frequently analyze “non-elite” samples, such as college students or participants recruited from the general population. For example, a recent multi-year study called the Good Judgment Project recruited thousands of people to make nearly one million forecasts about international politics.13 These predictions tended to be overconfident, but it is hard to extrapolate the extent to which this finding generalizes to national security officials. Some studies show that elite and non-elite populations exhibit similar psychological tendencies;14 others argue that national security officials should display fewer cognitive biases than the broader public;15 others find that national security officials have biases that are not prominent among non-elites;16 while still others theorize that overconfidence is one bias that elites and non-elites are especially likely to share.17 It is not possible to resolve these debates without conducting large-scale analyses of how well (or how poorly) national security officials assess uncertainty.
This study aims to answer these questions by analyzing a novel dataset containing over 60,000 assessments of uncertainty made by nearly 2,000 military and civilian national security officials from more than forty NATO allies and partners. This is by far the largest publicly available body of probability estimates made by national security practitioners. It is also the only large-scale study of its kind that spans military and civilian elites from many national backgrounds. These data have important limitations—most notably, they were gathered by asking national security officials to answer surveys rather than by analyzing the output of structured analytic processes. The findings are thus primarily useful for evaluating national security officials’ intuitive abilities to assess uncertainty; they do not reflect real-world judgments. Yet the next section explains that there are many reasons to expect that intuitive biases shape national security decisions. It is important for national security bureaucracies to identify and mitigate those flaws.
The data reveal that national security officials’ intuitions are overwhelmingly overconfident.18 For example, when study participants estimated that statements had a 90 percent chance of being true, those statements were true just 58 percent of the time. If participants had made every one of their judgments with less certainty, 96 percent of them would have improved their performance. In short, if you are a national security professional, the world is probably more uncertain than you think.19
This finding extends an emerging body of scholarship that shows how foreign policy practitioners share cognitive biases that are widespread among the general public.20 For example, this study indicates that national security officials were significantly more overconfident than participants in the Good Judgment Project, and that this bias was comparable to the results of identical surveys administered to respondents on the crowdsourcing platform Amazon Mechanical Turk.21 This pattern is remarkable given that national security bureaucracies have strong incentives to cultivate skills for assessing uncertainty accurately.22 Yet most national security bureaucracies do not systematically gather data to identify and correct judgmental biases.23 This article shows that it would be feasible and desirable to implement such procedures.
The study’s findings provide several additional insights for policy and scholarship. For example, experimental evidence from this study shows that just two minutes of training significantly reduced national security officials’ overconfidence. National security officials’ cognitive biases are thus widespread, but they also appear to be tractable if national security organizations are willing to combat them with relatively small amounts of effort. Another experiment embedded within this study demonstrates that national security officials’ intuitions for assessing uncertainty are especially prone to false positives. This finding implies that there may be a shared cognitive foundation for several phenomena that scholars typically treat as distinct, such as mutual optimism in war (in which both sides overestimate their chances of success), threat inflation (which involves attaching excessive certainty to ambiguous claims), and overrating the probability of changes to the status quo. Understanding that national security officials’ judgments are prone to false positives also carries practical implications—suggesting, for example, that intelligence agencies should avoid “single-outcome forecasting” by ensuring that analysts always consider multiple hypotheses when assessing uncertainty.24
Finally, the data reveal that national security officials display similar cognitive biases regardless of whether they are asked to assess uncertainty about future versus current issues, and regardless of whether they are asked to express their judgments using numbers or words. This finding suggests that surveys eliciting numeric assessments of uncertainty about factual matters, which can be conducted in minutes, reveal biases that are relevant for understanding how national security officials make forecasts using natural language, which can take months or years to process. These patterns provide additional evidence that national security bureaucracies can leverage insights from decision science to improve cognitive performance at scale.25
Studying Cognitive Biases Among National Security Officials
How well (or how poorly) do national security officials assess uncertainty? The most rigorous studies of this subject offer mixed answers to that question. For example, when David Mandel and Alan Barnes examined 1,514 forecasts made by the Canadian Intelligence Secretariat’s Middle East Division, they found that those judgments were systematically underconfident.26 When Nicholas Miller analyzed 199 judgments from US National Intelligence Estimates regarding nuclear proliferation, he found that those judgments were initially quite overconfident, but that the quality of these assessments improved over time.27 When Bradley Stastny and Paul Lehner analyzed 99 forecasts made by US intelligence analysts on a range of subjects, they found that those judgments were overconfident in some areas, underconfident in others, and poorly calibrated on the whole.28 Each of these studies offers important contributions, particularly in showing that scholars can rigorously evaluate assessments of uncertainty in national security contexts. Each nevertheless examines a relatively small volume of data drawn from relatively narrow subsets of practitioners. It is thus unsurprising that these studies reach conflicting conclusions. All of these studies, moreover, focus on civilian intelligence analysts, whose behavior may not generalize to national security professionals writ large.
Other scholars have examined the challenges of assessing uncertainty in international politics by drawing study participants from the general population. The Good Judgment Project, for example, recruited more than 2,000 individuals to make geopolitical forecasts.29 That study identified a group of “superforecasters” who made highly accurate predictions, but found that participants were, on the whole, moderately overconfident; for example, when Good Judgment Project forecasters estimated that an outcome had a 95 percent chance of taking place, those outcomes occurred closer to 85 percent of the time.30 Yet, as noted earlier, it is not obvious that this finding applies to national security professionals, who have more incentives than the general population to hone their ability to assess uncertainty, who devote their careers to studying world politics, and who inhabit unique professional cultures that might encourage excessive caution rather than overconfidence. In sum, no empirical study to date provides generalizable foundations for understanding the extent to which national security officials’ assessments of uncertainty are systematically biased in one direction or another.
To tackle that challenge, this study partnered with four advanced military education programs: the Canadian Forces College, the NATO Defense College, the Norwegian Defence Intelligence School, and the US National War College.31 These institutions comprise large, diverse samples of national security professionals. In Canada, Europe, and the United States, military officers who obtain the rank of colonel are normally required to complete a graduate degree at these kinds of institutions. The NATO Defense College and the US National War College serve an especially diverse range of countries, drawing students from more than forty NATO allies and partners.32 These institutions’ cohorts also contain substantial numbers of civilian national security officials drawn from foreign affairs ministries, intelligence agencies, and other areas of government tasked with responsibilities related to international affairs.33 These institutions agreed to administer online surveys as part of their core curricula in exchange for providing participants individualized feedback about their cognitive biases. Participation rates exceeded 90 percent for most cohorts. A total of 1,894 national security officials participated in this exercise.34 These officials made 63,130 assessments of uncertainty.
This study design has several advantages over prior research. For example, the study contains roughly thirty times as many assessments of uncertainty as Mandel and Barnes’ analysis of Canadian intelligence officials, which was previously the largest publicly available dataset examining national security officials’ probabilistic judgments.35 Whereas most prior studies of this subject involve relatively narrow samples of personnel, often drawn from one office within one country and almost always focusing on intelligence analysts specifically, this study involves a wide variety of civilian and military officials who represent a wide range of nationalities. While it is, of course, impossible to know whether this study’s findings apply to states (such as China) that do not send national security officials to institutions associated with NATO, we can at least be confident that the cognitive biases documented in this article generalize broadly—that they are not the product of particular countries or institutional cultures. And, while survey research on national security elites often suffers from low participation rates that raise questions about representativeness,36 the data described in this article reflect judgments made by nearly every national security official who was assigned to one of the educational programs with which the study partnered.37
Each survey asked participants to estimate the chances that 30 to 40 statements were true. These questions were regularly updated across survey waves, covering a variety of topics related to international military, economic, and political affairs. In total, the study contained more than 250 unique questions. Every survey was cleared in advance by participating institutions to ensure that its content was relevant to the national security officials with whom they worked.
Most questions asked respondents to assess uncertainty about current issues. For example, one question asked: “In your opinion, what are the chances that NATO’s members spend more money on defense than the rest of the world combined?”38 Assessments of uncertainty on these questions could be evaluated immediately, in order to give national security professionals feedback as soon as the survey concluded.39 Other questions asked participants to make forecasts that could only be evaluated at later dates, such as: “In your opinion, what are the chances that Russia and Ukraine will officially declare a ceasefire by the end of 2022?”40 As shown below, national security professionals demonstrated similar cognitive biases across these question formats.
Most assessments of uncertainty in the study were elicited as numeric percentages, which made it possible to give clear feedback to the participants regarding their judgmental biases. Quantitative assessments of uncertainty, however, might seem inapt, given that national security officials often express uncertainty using qualitative language. To address this issue, a random subset of responses was elicited using qualitative terms, such as “likely” and “almost certain,” that are recommended for use in the US Intelligence Community.41 This variation also had no meaningful impact on results.
This study’s primary drawback is that national security officials naturally invest less effort into completing surveys than they would devote to analyzing real decisions. This limitation is essentially unavoidable for experimental research on high-stakes decision-making.42 Results should thus be interpreted as measuring participants’ intuitions for assessing uncertainty, recognizing that these intuitions are just one input to national security analysis and decision-making. As Daniel Kahneman might phrase it, these data reflect national security officials “thinking fast”—the data presented below capture national security officials’ “cognitive first steps” when assessing uncertainty.43
These intuitions matter for two main reasons. First, substantial evidence shows that individuals’ initial, intuitive impressions of a problem anchor their subsequent judgments.44 Even if deliberative analysis can mitigate the impact of intuitive cognitive errors, the first steps that national security professionals take when assessing uncertainty shape their subsequent performance. This argument is consistent with findings from Joshua Kertzer and colleagues showing that individual-level cognitive biases persist in group settings.45 Even if group deliberation often improves analytic rigor, it does not necessarily eliminate flaws in human judgment. In some cases, group deliberation can amplify cognitive biases, for example by suppressing heterodox viewpoints46 or through herding behavior that encourages individuals to adopt more extreme views.47
Additionally, national security officials often make high-stakes choices under conditions of stress and time scarcity that preclude the use of structured analytic processes. These constraints, which are essentially unavoidable in tactical decision-making, force individuals to rely on their intuitions in a manner that amplifies the effects of cognitive biases.48 Even at strategic levels, national security officials frequently form beliefs based on intuitions rather than on conducting extensive deliberations or reading rigorous intelligence reports.49 For example, high-ranking members in the George W. Bush administration devoted little systematic effort to assessing the long-term risks of invading Iraq.50 Instead, they based their decision to go to war on intuitive assumptions that the US military could easily stabilize Iraq after toppling Saddam Hussein’s regime.51 This is just one salient example of why it is important to understand the accuracy of national security officials’ intuitions for assessing uncertainty, to identify the biases that those intuitions contain, and to determine whether those problems can be mitigated.52
National Security Officials’ Intuitions Are Overwhelmingly Overconfident
Figure 1 presents a calibration curve depicting the 50,408 numeric assessments of uncertainty that this study collected.53 The figure’s horizontal axis captures the chances that national security officials assigned to statements being true. The vertical axis indicates the proportion of the time that those statements were actually true. If national security officials’ intuitions for assessing uncertainty were well calibrated, then the data would fit a 45-degree line, such that when study participants said a statement had a 30 percent chance of being true, then those statements would actually be true 30 percent of the time.
Figure 1. Judgmental calibration for 50,408 assessments of uncertainty made by national security professionals. Figure by the author.
Instead, figure 1 reveals that the national security officials who participated in this study were overwhelmingly overconfident. For instance, when officials thought there was a 90 percent chance that a statement was true, those statements were true just 57 percent of the time. This degree of overconfidence is at least as large as what other studies have previously documented in non-elite samples. For example, the national security officials who contributed to this study were significantly more overconfident than forecasters who participated in the Good Judgment Project, and they were roughly as overconfident as a group of 775 respondents recruited to take the same survey that was administered to one of the study’s National War College cohorts.54
This pattern of overconfidence was remarkably consistent across the data. All nineteen cohorts of national security officials who participated in the study gave overconfident estimates. This bias appeared for both civilian and military professionals, for both men and women, and for both US and non-US citizens.55 This bias also appeared across a wide range of subject matter; limiting the analysis to virtually any subset of survey questions produced similar results. (See the online appendix for details.)
The data also show that national security officials’ judgments were biased towards false positives. Figure 1 documents this pattern by showing that national security officials’ assessments were particularly overconfident when they estimated probabilities over 50 percent. We can quantify this overconfidence by measuring the difference between the probabilities that national security officials assigned to their judgments and the actual proportion of those claims that were true. Thus, when the statements to which participants assigned a 90 percent probability turned out to be true just 57 percent of the time, that represents a bias of 33 percentage points. By contrast, if we look at statements to which participants assigned a 10 percent probability—a degree of certainty that is logically equivalent to judgments of 90 percent—those statements turned out to be true 32 percent of the time, for a gap of 22 percentage points. In other words, national security officials appear to have a particular tendency to believe that false statements are true. Later sections of this article will present further evidence to document that bias and explain why it has important implications for the theory and practice of national security decision-making.
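For readers who wish to replicate this kind of analysis on their own data, a calibration curve of the sort shown in figure 1 takes only a few lines of code to compute. The sketch below is a minimal illustration in Python with pandas; the data frame and its "estimate" and "truth" columns are hypothetical stand-ins rather than the study's actual schema.

```python
import pandas as pd

def calibration_curve(df, bin_width=10):
    """Compare stated probabilities (0-100) to observed frequencies of truth.

    Assumes an 'estimate' column (the probability, in percent, that a
    respondent assigned to a statement being true) and a 'truth' column
    (1 if the statement was in fact true, 0 otherwise).
    """
    bins = (df["estimate"] // bin_width) * bin_width  # e.g., 87 falls in the 80s bin
    grouped = df.groupby(bins)
    return pd.DataFrame({
        "mean_estimate": grouped["estimate"].mean(),   # horizontal axis of figure 1
        "pct_true": grouped["truth"].mean() * 100,     # vertical axis of figure 1
        "n_judgments": grouped["truth"].size(),
    })

# Well-calibrated judges produce points near the 45-degree line. Overconfidence
# appears as pct_true falling below mean_estimate for estimates above 50 and
# above mean_estimate for estimates below 50.

example = pd.DataFrame({"estimate": [90, 90, 70, 10], "truth": [1, 0, 1, 0]})
print(calibration_curve(example))
```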
Figure 2 quantifies the average accuracy of each study participant’s judgments using Brier scores, which capture the squared difference between the probability estimates an individual made and the estimates they could have made if they knew each question’s “right answer” with certainty.56 Since Brier scores measure squared error, lower numbers indicate more accurate judgments. The vertical lines in figure 2 reflect two benchmarks for gauging performance. A Brier score of 0.250 is the score that participants would have received if they claimed complete ignorance, and thus recorded probability estimates of 50 percent, for every question the survey posed. A Brier score of 0.335 is the score that participants would have received, on average, if they had responded to the survey by making probability estimates at random.
Figure 2. Brier scores for 1,470 national security professionals. Figure by the author.
The average participant’s Brier score in this study was 0.280,57 with 68 percent of participants receiving Brier scores that were worse than 0.250. In other words, most national security officials in this study would have performed better if they simply said they did not know the answer to every question that the survey gave them. Sixteen percent of participants would have received better scores, in expectation, if they had guessed probabilities at random.
These findings do not indicate that national security officials lack knowledge or that they cannot think probabilistically. Figure 1 clearly demonstrates that study participants had reliable intuitions for judging which statements were more likely to be true than others.58 Yet national security officials’ overconfidence was so extreme that it essentially canceled out the knowledge that these individuals possessed. Of national security officials who participated in the study, 96 percent would have received better Brier scores if they had attached less certainty to every one of their judgments.59
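The Brier scores, the two benchmarks described above, and the claim that most participants would have scored better with uniformly less certainty can all be expressed compactly. The following sketch is illustrative only: the estimates and outcomes are hypothetical inputs, and the shrinkage weight is not the adjustment used in the study.

```python
import numpy as np

def brier_score(estimates_pct, truths):
    """Mean squared error between stated probabilities and binary outcomes (lower is better)."""
    p = np.asarray(estimates_pct, dtype=float) / 100.0
    o = np.asarray(truths, dtype=float)
    return np.mean((p - o) ** 2)

def shrink_toward_even(estimates_pct, weight):
    """Pull every estimate part of the way toward 50 percent (weight=1 yields 50 everywhere)."""
    p = np.asarray(estimates_pct, dtype=float)
    return 50.0 + (1.0 - weight) * (p - 50.0)

# Benchmark 1: answering "50 percent" everywhere scores 0.250, since
# (0.5 - 0)**2 = (0.5 - 1)**2 = 0.25 regardless of the outcome.
print(brier_score([50, 50, 50], [1, 0, 1]))  # 0.25

# Benchmark 2: guessing estimates uniformly at random scores roughly 1/3 in
# expectation; the article reports 0.335 under the survey's response format.

# A participant was "helped by caution" if some shrinkage weight lowers their score:
est, truth = [90, 80, 95, 20], [1, 0, 0, 0]
print(brier_score(est, truth), brier_score(shrink_toward_even(est, 0.5), truth))
```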
Figure 3 indicates that participants who attached less certainty to their judgments tended to be more accurate overall. The horizontal axis of this graph reflects each national security official’s “certitude”: the average distance between their assessments of uncertainty and 50 percent.60 The graph’s vertical axis captures each national security official’s Brier score. Figure 3 “normalizes” these attributes into percentile rankings within each survey cohort in order to minimize confounding factors that might result from different groups receiving different questions at different times. The graph reveals a consistent, negative relationship between certitude and judgmental accuracy.61 In other words: The more certainty national security officials possessed in this study, the less accurate their judgments tended to be.
Figure 3. National security officials who assigned more certainty to their judgments also tended to be less accurate. Figure by the author.
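Figure 3's "certitude" measure and the within-cohort percentile ranking can be reproduced along similar lines. The sketch below is a hedged illustration assuming hypothetical "cohort", "participant", "estimate", and "truth" columns; it is not the study's own code.

```python
import pandas as pd

def certitude_vs_accuracy(df):
    """Per-participant certitude (mean distance from 50 percent) and Brier score,
    converted to percentile ranks within each survey cohort."""
    per_person = (
        df.assign(
            dist=(df["estimate"] - 50).abs(),
            sq_err=(df["estimate"] / 100 - df["truth"]) ** 2,
        )
        .groupby(["cohort", "participant"], as_index=False)
        .agg(certitude=("dist", "mean"), brier=("sq_err", "mean"))
    )
    per_person["certitude_pctile"] = per_person.groupby("cohort")["certitude"].rank(pct=True)
    per_person["brier_pctile"] = per_person.groupby("cohort")["brier"].rank(pct=True)
    return per_person

# Figure 3's pattern corresponds to a positive association between
# certitude_pctile and brier_pctile: more certainty, higher (worse) error.
```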
Mitigating Overconfidence Through Brief Training
Showing that national security officials are overwhelmingly overconfident does not imply that their biases are impossible to correct. It is plausible that their overconfidence stems, at least in part, from the fact that most national security officials do not receive explicit feedback about their abilities to assess uncertainty. In the absence of such feedback, it is easy to develop “illusions of skill.”62 Philip Tetlock, for example, has documented a tendency for experts to give themselves full credit for making judgments that seem wise after the fact while “explaining away” their failures in a manner that prevents effective learning.63 How hard is it to burst these illusions and thereby improve performance?
The standard tool that decision scientists deploy for this purpose is called “calibration training.” This method involves asking participants to assess uncertainty, providing feedback on the accuracy of their judgments, and then administering follow-up surveys to measure improvement over time.64 That approach was infeasible in the context of this research, where national security officials were only available to take a single survey.
This study thus took a different approach to combating cognitive biases by providing a random subset of individuals with information at the start of each survey describing the biases that national security officials had previously demonstrated in prior surveys.65 This material explained that prior participants’ judgments were systematically overconfident, documented that claim by presenting a calibration curve like the graph in figure 1, and explained that almost all participants would have achieved better scores if they had assigned less certainty to every one of their judgments. (See the online appendix for details.) This extra information was not demanding: On average, participants spent two minutes reading it before moving on to the remainder of the survey.
Armed with this information, national security officials made much better assessments of uncertainty. They posted average Brier scores of 0.274, while participants in the control group posted average Brier scores of 0.291. This improvement was highly significant in both statistical and substantive terms.66 National security officials were similarly receptive to this extra information regardless of whether they were men or women, military or civilian personnel, and US or non-US citizens. As expected, improved performance was associated with the fact that participants who viewed data about prior cohorts’ overconfidence attached less certainty to their judgments.67 Almost all of the improved performance in the treatment group (91 percent) is attributable to the fact that they became more cautious when assessing uncertainty.68
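At bottom, the treatment effect reported here is a comparison of mean Brier scores across two randomly assigned groups. The sketch below illustrates one conventional way to test such a difference; the simulated scores are placeholders drawn around the reported group means, not the study's data, and the article's own estimation strategy may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder per-participant Brier scores, centered on the reported means
# (0.274 for the treated group, 0.291 for the control group).
treated = rng.normal(loc=0.274, scale=0.08, size=900)
control = rng.normal(loc=0.291, scale=0.08, size=900)

# Welch's t-test for a difference in mean Brier scores across the two arms.
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)
print(f"difference in means: {treated.mean() - control.mean():+.3f}, p = {p_value:.4f}")
```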
This finding is consistent with prior research showing that decision-makers can be trained to combat cognitive biases. For example, Megan Kelly and David Mandel found that having intelligence analysts complete a course of six instructional videos significantly improved their judgmental accuracy.69 The Good Judgment Project found that forecasters who were randomly assigned to a one-hour online training program on reducing cognitive biases performed significantly better than their counterparts.70 This study complements that literature by showing that interventions need not be extensive or sophisticated to have a meaningful impact. If national security officials can systematically improve their judgments by receiving just two minutes of training, then national security bureaucracies may be able to combat overconfidence by institutionalizing similarly simple procedures at large scales.
Bias Toward False Positives
Study participants’ assessments of uncertainty were biased toward false positives. In other words, national security officials appear to find it easier to generate ideas about why a hypothesis might be true than why it might be false. Yet, without experimentally manipulating questions, it is difficult to know whether this pattern represents a consistent cognitive bias, as opposed to spurious features of survey design. Survey questions may have unintentionally been phrased in a manner that skewed participants’ judgments.
To address this ambiguity, a subset of surveys randomly selected questions from two mutually exclusive, logically complementary alternatives. For example, half of participants might receive this question: “What are the chances that Boko Haram has killed more civilians than ISIS since 2010?” The other half would receive this question: “What are the chances that ISIS has killed more civilians than Boko Haram since 2010?” Since each hypothesis is, in effect, the negation of the other, the average probabilities that rational individuals assign to them should sum to 100 percent.71 If participants’ judgments were skewed toward false positives, then these average estimates would sum to more than 100 percent.
Figure 4 depicts the average participant’s response to each of the two mutually exclusive question variants. Across 280 questions that appeared in this experimental module, the average probabilities participants assigned to each question variant summed to 110 percent. That bias is highly statistically significant72 and is widespread in the data. The average response to each survey question’s two variants summed to more than 100 percent for 244 of the 280 questions in the experiment. This shows that national security officials’ assessments of uncertainty were systematically biased toward false positives, and that this bias generalizes across a wide range of questions rather than being driven by performance on an idiosyncratic subset of issues in the study.
Figure 4. The average probability that national security officials assigned to mutually exclusive question variants consistently summed to more than 100 percent. Figure by the author.
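The logic of this experiment is straightforward to express in code. Below is a hedged sketch that averages responses to each variant of a question pair and checks whether the two averages sum to 100 percent; the "pair_id", "variant", and "estimate" column names are illustrative assumptions, not the study's schema.

```python
import pandas as pd

def coherence_check(df):
    """Sum the average probability assigned to each variant of every question pair.

    Coherent responses should make each pair's average estimates sum to roughly
    100 percent; sums above 100 indicate a bias toward false positives.
    """
    means = df.groupby(["pair_id", "variant"])["estimate"].mean().unstack("variant")
    means["variant_sum"] = means["A"] + means["B"]
    return means

# Example: one pair whose two phrasings average 65 and 45 percent,
# summing to 110 percent rather than 100.
toy = pd.DataFrame({
    "pair_id": [1, 1, 1, 1],
    "variant": ["A", "A", "B", "B"],
    "estimate": [70, 60, 40, 50],
})
print(coherence_check(toy))
```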
This bias toward false positives may be related to the “availability heuristic”: the tendency for people to exaggerate the chances of outcomes that come more readily to mind.73 In this view, imagining how a hypothesis might be true may tend to be a more concrete (and thus easier) task than imagining how a hypothesis might be false. If that is the case, then the availability heuristic suggests that most people will have a bias toward confirming, rather than refuting, hypotheses presented to them. The overrepresentation of false positives shown in this survey may also be related to “acquiescence bias”: the tendency for people to agree with propositions they are asked to consider.74
Both of these interpretations have similar relevance for decision-making. In particular, they support the idea that national security officials should avoid the practice of “single-outcome forecasting,” in which analysts focus on assessing the chances of a particular hypothesis being true rather than evaluating how uncertainty is distributed among multiple possibilities.75 A good example of this contrast is how US intelligence analysts studying Iraq’s alleged nuclear program in 2002 concluded that Saddam Hussein was importing aluminum tubes in order to build centrifuges for enriching uranium. This argument was plausible, but there was also evidence to indicate that Iraq was using the aluminum tubes to build conventional rockets (an alternate hypothesis that turned out to be correct).76 Orienting the intelligence process around assessing the chances that Iraq was building nuclear weapons may have exacerbated analysts’ bias towards false positives. Simultaneously assessing the chances of multiple hypotheses—in this case the chances that Iraq was using the aluminum tubes for centrifuges, or for rockets, or for some other purpose—can counteract that tendency.77 Documenting a consistent bias toward false positives also supports prior research arguing that national security officials may benefit from employing a “falsificationist” mindset, which means that they explicitly seek out information to disconfirm statements they think are likely to be true.78
Facts Versus Predictions, and Words Versus Numbers
Most assessments of uncertainty in the study pertained to factual matters.79 Those estimates could be used to provide national security professionals with immediate feedback about their performance.
The task of assessing uncertainty about current states of the world is crucial to many elements of national security decision-making. For example, debates about whether Iraq was pursuing nuclear weapons in 2002, or whether the United States had correctly identified Osama bin Laden’s location in 2010, or the state of the US-Soviet nuclear balance during the Cold War, or the extent to which Chinese leaders currently possess revisionist intentions, all require assessing uncertainty about factual matters. Yet national security professionals must also assess uncertainty about future states of the world, such as estimating the chances that military operations will succeed or predicting how another country might respond to diplomatic provocations.80 To what extent do the study’s findings about how national security officials assess factual matters reflect their capabilities for making predictions? To address this question, the study also collected a series of forecasts that were scored at later dates.81 Figure 5 presents calibration curves for each of these question types.
Figure 5. Judgmental calibration when assessing current versus future states of the world. Figure by the author.
These graphs demonstrate that national security officials’ overconfidence and their proclivity for false positives were even more pronounced when making forecasts. Since there is no way to ensure that the surveys’ forecasting questions were as difficult as the questions regarding current states of the world, figure 5 cannot sustain causal claims about the degree to which national security professionals’ performance differs across these question types. Figure 5 nevertheless shows that this study’s findings are not driven by the choice to focus primarily on assessing uncertainty about current states of the world. If anything, it appears that national security bureaucracies have greater reasons to worry about overconfidence and a bias toward false positives when assessing uncertainty about future events.
This study also asked national security officials to assess uncertainty by estimating numeric percentages.82 As noted earlier, this format facilitates providing clear feedback about judgmental biases, but it differs from the way that national security officials often assess uncertainty: verbally and qualitatively. It is possible that asking national security officials to assess uncertainty in unfamiliar ways garbled their thoughts and thereby produced judgmental biases that would not appear in normal settings.83 To test whether this distinction matters, four survey waves assigned a random subset of participants to assess uncertainty using the “words of estimative probability” shown in figure 6, based on the US National Intelligence Council’s then-current guidance for expressing uncertainty.84
Since these words do not carry precise definitions, the accuracy of these data is open to some interpretation. However, national security officials who used these terms were also clearly overconfident. For example, when participants said that a statement was “almost certain” to be true, those statements turned out to be false 32 percent of the time (and true the other 68 percent).
Figure 6. “Words of estimative probability” lexicon recommended by the US National Intelligence Council.
Figure 7. Judgmental calibration for 13,480 verbal assessments of uncertainty provided by national security officials. Figure by the author.
Figure 7 also replicates the finding that national security officials’ assessments of uncertainty are prone to false positives. For example, when participants said that a statement had a “remote chance” of being true, those statements were true 12 percent of the time, a rate of surprise roughly one-third as large as the rate national security officials encountered when they used the term “almost certain,” despite the fact that these judgments reflect equivalent degrees of certitude according to the lexicon used in the experiment.85
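Scoring the verbal responses works the same way as scoring the numeric ones once each judgment is paired with the eventual truth value. The sketch below assumes hypothetical "term" and "truth" columns, and the term list is an illustrative ordering of the kind of lexicon discussed above rather than a reproduction of figure 6.

```python
import pandas as pd

# Illustrative ordering of estimative terms from least to most certain;
# the exact terms and numeric ranges follow the lexicon shown in figure 6.
TERM_ORDER = [
    "remote chance", "very unlikely", "unlikely", "roughly even chance",
    "likely", "very likely", "almost certain",
]

def verbal_calibration(df):
    """Share of statements assigned each verbal term that turned out to be true.

    Assumes a 'term' column (the word of estimative probability chosen) and a
    'truth' column (1 if the statement was true, 0 otherwise).
    """
    rates = df.groupby("term")["truth"].agg(pct_true="mean", n="size")
    rates["pct_true"] *= 100
    return rates.reindex([t for t in TERM_ORDER if t in rates.index])
```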
The article’s main findings are thus not particularly sensitive to whether national security officials assessed uncertainty about future versus current states of the world, nor do they depend on eliciting judgments using numbers rather than words. This is welcome news for national security organizations that wish to employ tools from the decision sciences to analyze and improve decision-making. Forecasts that national security officials make using natural language can take years to evaluate. By contrast, surveys eliciting numeric assessments of uncertainty about factual matters can be conducted in a matter of minutes. The fact that such surveys appear to offer generalizable insights about national security officials’ cognitive biases provides further evidence that organizations can tackle those problems at scale.
Implications for Scholarship and Policy
Systematic overconfidence among national security officials has troubling implications for policymaking. National security officials continually make choices that place lives and resources at risk. It is important to minimize those risks wherever possible; indeed, some courses of action are only worth taking if they are almost certain to succeed. Yet even when national security officials who participated in this study were completely certain that their judgments were right—that is, when they said a statement had either a 0 percent or a 100 percent chance of being true—they were wrong more than 25 percent of the time.86 If analysts who are completely certain about their conclusions are wrong so often, then leaders must be cautious about trusting anyone who claims that a course of action is truly safe.
These findings also offer implications for international relations theory, particularly with respect to scholarship in the realist tradition, which assumes that national security officials make rational assessments of uncertainty simply because they have strategic incentives to do so.87 If national security officials devoted as much effort to bolstering their cognitive capacities as realists assume, then we would not expect their judgments to be so overconfident; we would not expect this bias to be comparable to judgments made by “non-elite” respondents who have no special reasons to cultivate talent for assessing uncertainty in world politics. Nor would we expect that just two minutes of training would markedly improve national security officials’ performance. Each of these findings throws doubt on the assumption that national security officials can reliably assess uncertainty in rational ways.
Relaxing that assumption matters for international relations theory. It suggests that decision-makers are likely to underestimate the risks surrounding national security policies, a bias that is likely to foment international instability.88 Demonstrating that national security officials’ judgments are skewed toward false positives also suggests a cognitive foundation for several phenomena that scholars tend to treat as separate. For example, a proclivity for false positives may be part of the reason why foreign policy analysts tend to overrate the probability of changes to the status quo,89 and may contribute to threat inflation.90 The adage that generals “always prepare to fight the last war” is consistent with the idea that, once national security officials have identified a challenge, they will overestimate the chances of encountering that challenge again in the future.91 Mutual optimism in war—a phenomenon that is widely viewed as a destabilizing force in international politics92—also requires at least one state to make a false-positive assessment in overpredicting its chances of obtaining a favorable outcome.
Future research could expand the above analysis in at least two ways. First, study participants could take multiple rounds of surveys.93 This approach would provide clearer evidence of the extent to which training produces durable improvements in performance. This procedure would also facilitate experimenting with different training methods in order to determine the most effective approaches to improving assessments of uncertainty.94 The Good Judgment Project, for example, identified specific training procedures that durably improved the quality of geopolitical forecasts.95 National security organizations would generally benefit from incorporating such training into professional development programs. It is equally important to understand which interventions have short-term impacts that primarily involve “priming” people to think in certain ways without permanently enhancing their skill sets. Bureaucracies would benefit from incorporating these interventions into structured analytic techniques and other standard operating procedures that national security officials employ on a regular basis.
It would also be valuable to gather large-scale datasets that evaluate judgments made by groups, as assessments of uncertainty in national security often reflect corporate judgment rather than individual viewpoints.96 Some studies show that group collaboration can attenuate individual-level cognitive biases, particularly for groups whose members hold diverse viewpoints that expose people to new information they had not previously considered.97 In other contexts, groupwork has been shown to replicate individual-level biases98 or even to exacerbate judgmental errors.99 The latter problem is particularly likely to occur in cases where group members share similar views; here, collaboration runs the risk of amplifying biases in a phenomenon known as “group extremity shift.”100
In other words, it is not obvious whether we should expect institutional procedures to amplify, mitigate, or maintain national security officials’ intuitive overconfidence. Moreover, different kinds of institutional procedures likely interact with cognitive biases in different ways across different contexts.101 The complex and contingent nature of these relationships suggests that bureaucratic practices likely play an important (and arguably understudied102) role in shaping the rationality of national security policy. If national security officials’ intuitions for assessing uncertainty are as flawed as this study indicates, then rational decision-making must be mediated by institutional design where possible.
Finally, this study supports some practical advice for national security practitioners. First, remember that the world is more uncertain than you think. Recognize that your intuitions are likely to be overconfident, especially if you have not previously received systematic, quantitative feedback on your assessments of uncertainty. If you think that an outcome is likely to be true, consider those chances to be closer to 60 percent than 90 percent. If you think that an outcome is unlikely to be true, consider those chances to be closer to 40 percent than 10 percent. Apply the same corrections to advice you receive from others—the world is likely more uncertain than they think, too.
Second, remember that your judgments are prone to false positives. As described earlier, you can combat this problem by employing a falsificationist mindset and by avoiding single-outcome forecasting. Instead of assessing the chances that a single statement is true, try to consider how uncertainty is distributed across multiple possibilities. Making this range of possibilities explicit can combat your natural tendency to fixate on one potential outcome to the exclusion of others.103
Finally, national security bureaucracies would benefit from providing personnel with quantitative feedback regarding their ability to assess uncertainty. Though this study documents widespread overconfidence among national security officials, its data also suggest that this bias is tractable. If just two minutes of training can substantially mitigate national security officials’ overconfidence, then national security bureaucracies can almost certainly identify and combat cognitive biases at large scales. The procedure described in this article—administering surveys, processing data, and providing participants with individualized feedback on their performance—took roughly twenty minutes per cohort. These tasks can be automated, using code that appears in this article’s online appendix. Similar exercises could be incorporated into any professional training program, or conducted by any institution willing to devote a small amount of time to the goal of improving its participants’ judgments about the world.
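To illustrate how such feedback can be automated, here is a minimal sketch of a per-participant summary. It is not the code from the online appendix; the column names and the 90 percent threshold are assumptions made for the example.

```python
import pandas as pd

def participant_feedback(df):
    """Summarize each participant's performance for individualized feedback.

    Assumes columns 'participant', 'estimate' (percent), and 'truth' (0/1).
    Returns each participant's Brier score, average certitude, and the share of
    their high-confidence judgments (90 percent or more) that turned out true.
    """
    df = df.assign(
        sq_err=(df["estimate"] / 100 - df["truth"]) ** 2,
        dist=(df["estimate"] - 50).abs(),
        high_conf=df["estimate"] >= 90,
        high_conf_true=(df["estimate"] >= 90) & (df["truth"] == 1),
    )
    out = df.groupby("participant").agg(
        brier=("sq_err", "mean"),
        certitude=("dist", "mean"),
        n_high_conf=("high_conf", "sum"),
        n_high_conf_true=("high_conf_true", "sum"),
    )
    out["high_conf_hit_rate"] = out["n_high_conf_true"] / out["n_high_conf"]
    return out

# Each row of the returned table can be printed or emailed as a short
# individualized report at the end of a survey session.
```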
Jeffrey A. Friedman is an associate professor of government at Dartmouth College, where he studies the politics and psychology of national security decision-making and directs the John Rosenwald postdoctoral fellows program in US foreign policy and international security.
Acknowledgements: Special thanks go to the 1,894 national security officials who generously volunteered their time to participate in this study. Rich Andres, Christina Brookes, Mark Bucknam, Jennifer Lerner, Stephen Mariano, Idun Mustulien, and Bryan Pendleton played irreplaceable roles in facilitating the partnerships on which the project depended. Many of the core ideas presented in this article were inspired by Richard Zeckhauser, who also participated in early training sessions. Erik Lin-Greenberg, Nicholas Miller, Caleb Pomeroy, Alberto Simpser, Megan Stewart, Michael Poznansky, and John Wilcox provided thoughtful comments on prior drafts. Freya Jamison, Luca Fagotti, Joowon Kim, and Benjamin Rutan provided research assistance. The research presented in this article was partly conducted while the author was Visiting Fellow at the Institute for Advanced Study in Toulouse. Funding from the French Agence Nationale de la Recherche (under the Investissement d’Avenir programme, ANR-17-EURE-0010) is gratefully acknowledged.