In 1995, McArthur Wheeler walked into two Pittsburgh banks and robbed them
in broad daylight, with no visible attempt at disguise. He was arrested
later that night, less than an hour after videotapes of him taken from
surveillance cameras were broadcast on the 11 o'clock news. When police
later showed him the surveillance tapes, Mr. Wheeler stared in
incredulity. "But I wore the juice," he mumbled. Apparently, Mr. Wheeler
was under the impression that rubbing one's face with lemon juice rendered
it invisible to videotape cameras (Fuocco, 1996).
We bring up the unfortunate affairs of Mr. Wheeler to make three points.
The first two are noncontroversial. First, in many domains in life,
success and satisfaction depend on knowledge, wisdom, or savvy in knowing
which rules to follow and which strategies to pursue. This is true not
only for committing crimes, but also for many tasks in the social and
intellectual domains, such as promoting effective leadership, raising
children, constructing a solid logical argument, or designing a rigorous
psychological study. Second, people differ widely in the knowledge and
strategies they apply in these domains (Dunning, Meyerowitz, & Holzberg,
1989; Dunning, Perie, & Story, 1991; Story & Dunning, 1998), with varying
levels of success. Some of the knowledge and theories that
people apply to their actions are sound and meet with favorable results.
Others, like the lemon juice hypothesis of McArthur Wheeler, are imperfect
at best and wrong-headed, incompetent, or dysfunctional at worst.
Perhaps more controversial is the third point, the one that is the focus
of this article. We argue that when people are incompetent in the
strategies they adopt to achieve success and satisfaction, they suffer a
dual burden: Not only do they reach erroneous conclusions and make
unfortunate choices, but their incompetence robs them of the ability to
realize it. Instead, like Mr. Wheeler, they are left with the mistaken
impression that they are doing just fine. As Miller (1993) perceptively
observed in the quote that opens this article, and as Charles Darwin
(1871) sagely noted over a century ago, "ignorance more frequently begets
confidence than does knowledge" (p. 3).
In essence, we argue that the skills that engender competence in a
particular domain are often the very same skills necessary to evaluate
competence in that domain, whether one's own or anyone else's. Because of
this, incompetent individuals lack what cognitive psychologists variously
term metacognition (Everson & Tobias, 1998), metamemory (Klin, Guizman, &
Levine, 1997), metacomprehension (Maki, Jonas, & Kallod, 1994), or
self-monitoring skills (Chi, Glaser, & Rees, 1982). These terms refer to
the ability to know how well one is performing,
when one is likely to be accurate in judgment, and when one is likely to
be in error. For example, consider the ability to write grammatical
English. The skills that enable one to construct a grammatical sentence
are the same skills necessary to recognize a grammatical sentence, and
thus are the same skills necessary to determine if a grammatical mistake
has been made. In short, the same knowledge that underlies the ability to
produce correct judgment is also the knowledge that underlies the ability
to recognize correct judgment. To lack the former is to be deficient in
the latter.
The Studies
We explored these predictions in four studies. In each, we presented
participants with tests that assessed their ability in a domain in which
knowledge, wisdom, or savvy was crucial: humor (Study 1), logical
reasoning (Studies 2 and 4), and English grammar (Study 3). We then asked
participants to assess their ability and test performance. In all studies,
we predicted that participants in general would overestimate their ability
and performance relative to objective criteria. But more to the point, we
predicted that those who proved to be incompetent (i.e., those who scored
in the bottom quarter of the distribution) would be unaware that they had
performed poorly. For example, their score would fall in the 10th or 15th
percentile among their peers, but they would estimate that it fell much
higher (Prediction 1). Of course, this overestimation could be taken as a
mathematical verity. If one has a low score, one has a better chance of
overestimating one's performance than underestimating it. Thus, the real
question in these studies is how much those who scored poorly would be
miscalibrated with respect to their performance.
In addition, we wanted to examine the relationship between miscalibrated
views of ability and metacognitive skills, which we operationalized as (a)
the ability to distinguish what one has answered correctly from what one
has answered incorrectly and (b) the ability to recognize competence in
others. Thus, in Study 4, we asked participants to not only estimate their
overall performance and ability, but to indicate which specific test items
they believed they had answered correctly and which incorrectly. In Study
3, we showed competent and incompetent individuals the responses of others
and assessed how well participants from each group could spot good and
poor performances. In both studies, we predicted that the incompetent
would manifest poorer metacognitive skills than would their more competent
peers (Prediction 2).
We also wanted to find out what experiences or interventions would make
low performers realize the true level of performance that they had
attained. Thus, in Study 3, we asked participants to reassess their own
ability after they had seen the responses of their peers. We predicted
that competent individuals would learn from observing the responses of
others, thereby becoming better calibrated about the quality of their
performance relative to their peers. Incompetent participants, in
contrast, would not (Prediction 3). In Study 4, we gave participants
training in the domain of logical reasoning and explored whether this
newfound competence would prompt incompetent individuals toward a better
understanding of the true level of their ability and test performance
(Prediction 4).
Study 1: Humor
In Study 1, we decided to explore people's perceptions of their competence
in a domain that requires sophisticated knowledge and wisdom about the
tastes and reactions of other people. That domain was humor. To anticipate
what is and what others will find funny, one must have subtle and tacit
knowledge about other people's tastes. Thus, in Study 1 we presented
participants with a series of jokes and asked them to rate the humor of
each one. We then compared their ratings with those provided by a panel of
experts, namely, professional comedians who make their living by
recognizing what is funny and reporting it to their audiences. By
comparing each participant's ratings with those of our expert panel, we
could roughly assess participants' ability to spot humor.
Our key interest was how perceptions of that ability converged with actual
ability. Specifically, we wanted to discover whether those who did poorly
on our measure would recognize the low quality of their performance. Would
they recognize it or would they be unaware?
Method
Participants.
Participants were 65 Cornell University undergraduates from a variety of
courses in psychology who earned extra credit for their participation.
Materials.
We created a 30-item questionnaire made up of jokes we felt were of
varying comedic value. Jokes were taken from
Gender failed to qualify any results in this or any of the studies
reported in this article, and thus receives no further mention.
Our first prediction was that participants overall would overestimate
their ability to tell what is funny relative to their peers. To find out
whether this was the case, we first assigned each participant a percentile
rank based on the extent to which his or her joke ratings correlated with
the ratings provided by our panel of professionals (with higher
correlations corresponding to better performance). On average,
participants put their ability to recognize what is funny in the 66th
percentile, which exceeded the actual mean percentile (50, by definition)
by 16 percentile points, one-sample t(64) = 7.02, p < .0001. This
overestimation occurred even though self-ratings of ability were
significantly correlated with our measure of actual ability, r(63) = .39,
p < .001.
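The ranking step described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code: the ratings are invented, the function names are hypothetical, and the mid-rank percentile convention is an assumption, since the text does not say how ties were handled.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def percentile_ranks(scores):
    """Mid-rank percentiles: ties share the middle of their rank range."""
    n = len(scores)
    ranks = []
    for s in scores:
        below = sum(1 for other in scores if other < s)
        equal = sum(1 for other in scores if other == s)  # includes s itself
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks

# Invented ratings for five jokes (NOT data from the study).
expert_panel = [1, 2, 3, 4, 5]
participants = {
    "A": [1, 2, 3, 4, 5],  # agrees with the panel
    "B": [5, 4, 3, 2, 1],  # reverses the panel
    "C": [2, 1, 4, 3, 5],  # partly agrees
}

# A participant's score is the correlation of his or her ratings
# with the panel's; higher correlation means better performance.
scores = [pearson(r, expert_panel) for r in participants.values()]
ranks = percentile_ranks(scores)
for name, score, rank in zip(participants, scores, ranks):
    print(name, round(score, 2), round(rank, 1))
```

Under the mid-rank convention assumed here, the mean percentile is 50 for any sample, which matches the remark that the actual mean percentile is 50 by definition.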
Our main focus, however, is on the perceptions of relatively "incompetent"
participants, which we defined as those whose test score fell in the
bottom quartile (n = 16). As Figure 1 depicts, these participants grossly
overestimated their ability relative to their peers. Whereas their actual
performance fell in the 12th percentile, they put themselves in the 58th
percentile. These estimates were not only higher than the ranking they
actually achieved, paired t(15) = 10.33, p < .0001, but were also
marginally higher than a ranking of "average" (i.e., the 50th percentile),
one-sample t(15) = 1.96, p < .07. That is, even participants in the bottom
quarter of the
distribution tended to feel that they were better than average.
In short, Study 1 revealed two effects of interest. First, although
perceptions of ability were modestly correlated with actual ability,
people tended to overestimate their ability relative to their peers.
Second, and most important, those who performed particularly poorly
relative to their peers were utterly unaware of this fact. Participants
scoring in the bottom quartile on our humor test not only overestimated
their percentile ranking, but they overestimated it by 46 percentile
points. To be sure, they had an inkling that they were not as talented in
this domain as were participants in the top quartile, as evidenced by the
significant correlation between perceived and actual ability. However,
that suspicion failed to anticipate the magnitude of their shortcomings.
At first blush, the reader may point to the regression effect as an
alternative interpretation of our results. After all, we examined the
perceptions of people who had scored extremely poorly on the objective
test we handed them, and found that their perceptions were less extreme
than their reality. Because perceptions of ability are imperfectly
correlated with actual ability, the regression effect virtually guarantees
this result. Moreover, because incompetent participants scored close to
the bottom of the distribution, it was nearly impossible for them to
underestimate their performance.
Despite the inevitability of the regression effect, we believe that the
overestimation we observed was more psychological than artifactual. For
one, if regression alone were to blame for our results, then the magnitude
of miscalibration among the bottom quartile would be comparable with that
of the top quartile. A glance at Figure 1 quickly disabuses one of this
notion. Still, we believe this issue
warrants empirical attention, which we devote in Studies 3 and 4.
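The regression argument can be illustrated with a small simulation. The sketch below is not the authors' analysis: it uses synthetic data and assumes that perceived ability is simply true ability plus unbiased random noise, with no psychological bias at all. The function names, sample size, and noise level are all hypothetical choices.

```python
import random

def percentile_ranks(values):
    """Percentile of each value within the sample (continuous data, no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = 100.0 * (position + 0.5) / n
    return ranks

def mean(xs):
    return sum(xs) / len(xs)

def quartile_gaps(n=4000, noise_sd=1.0, seed=0):
    """Mean (perceived - actual) percentile in the bottom and top quartiles."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: perception = ability + symmetric noise (pure regression).
    perceived = [a + rng.gauss(0, noise_sd) for a in ability]
    actual_pct = percentile_ranks(ability)
    perceived_pct = percentile_ranks(perceived)
    bottom = [perceived_pct[i] - actual_pct[i]
              for i in range(n) if actual_pct[i] < 25]
    top = [perceived_pct[i] - actual_pct[i]
           for i in range(n) if actual_pct[i] > 75]
    return mean(bottom), mean(top)

bottom_gap, top_gap = quartile_gaps()
print(f"bottom quartile gap: {bottom_gap:+.1f} percentile points")
print(f"top quartile gap:    {top_gap:+.1f} percentile points")
```

In this noise-only model the bottom quartile overestimates and the top quartile underestimates by roughly the same amount. That symmetry is the benchmark against which the asymmetric pattern in Figure 1 is being compared.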
We conducted Study 2 with three goals in mind. First, we wanted to
replicate the results of Study 1 in a different domain, one focusing on
intellectual rather than social abilities. We chose logical reasoning, a
skill central to the academic careers of the participants we tested and a
skill that is called on frequently. We wondered if those who do poorly
relative to their peers on a logical reasoning test would be unaware of
their poor performance.
Examining logical reasoning also enabled us to compare perceived and
actual ability in a domain less ambiguous than the one we examined in the
previous study. It could reasonably be argued that humor, like beauty, is
in the eye of the beholder.
Indeed, the imperfect interrater reliability among our group of
professional comedians suggests that there is considerable variability in
what is considered funny even by experts. This criterion problem, or lack
of uncontroversial criteria against which self-perceptions can be
compared, is particularly problematic in light of the tendency to define
ambiguous traits and abilities in ways that emphasize one's own strengths
(Dunning et al., 1989). Thus, it may have been the tendency to define
humor idiosyncratically, and in ways favorable to one's tastes and
sensibilities, that produced the miscalibration we observed, not the
tendency of the incompetent to miss their own failings. By examining
logical reasoning skills, we could
circumvent this problem by presenting students with questions for which
there is a definitive right answer.
Finally, we wanted to introduce another objective criterion with which we
could compare participants' perceptions. Because percentile ranking is by
definition a comparative measure, the miscalibration we saw could have
come from either of two sources. In the comparison, participants may have
overestimated their own ability (our contention) or may have
underestimated the skills of their peers. To address this issue, in Study
2 we added a second criterion with which to compare participants'
perceptions. At the end of the test, we asked participants to estimate how
many of the questions they had gotten right and compared their estimates
with their actual test scores. This enabled us to directly examine whether
the incompetent are, indeed, miscalibrated with respect to their own
ability and performance.
The order in which specific questions were asked did not affect any of the
results in this or in any of the studies reported in this article and thus
receives no further mention.
As expected, participants overestimated their logical reasoning ability
relative to their peers. On average, participants placed themselves in the
66th percentile among students from their class, which was significantly
higher than the actual mean of 50, one-sample t(44) = 8.13, p < .0001.
Participants also overestimated their percentile rank on the test,
M percentile = 61, one-sample t(44) = 4.70, p < .0001. Participants did
not, however, overestimate how many questions they answered correctly,
M = 13.3 (perceived) vs. 12.9 (actual), t < 1. As in Study 1, perceptions
of ability were positively related to actual ability, although in this
case, not to a significant degree. The correlations between actual ability
and the three perceived ability and performance measures ranged from .05
to .19, all ns.
What (or rather, who) was responsible for this gross miscalibration? To
find out, we once again split participants into quartiles based on their
performance on the test. As Figure 2 clearly illustrates, it was
participants in the bottom quartile (n = 11) who overestimated their
logical reasoning ability and test performance to the greatest extent.
Although these individuals scored at the 12th percentile on average, they
nevertheless believed that their general logical reasoning ability fell at
the 68th percentile and their score on the test fell at the 62nd
percentile. Their estimates not only exceeded their actual percentile
scores, ts(10) = 17.2 and 11.0, respectively, ps < .0001, but exceeded the
50th percentile as well, ts(10) = 4.93 and 2.31, respectively, ps < .05.
Thus, participants in the bottom quartile not only overestimated
themselves but believed that they were above average. Similarly, they
thought they had answered 14.2 problems correctly on average, compared
with the actual mean score of 9.6, t(10) = 7.66, p < .0001.
In sum, Study 2 replicated the primary results of Study 1 in a different
domain. Participants in general overestimated their logical reasoning
ability, and it was once again those in the bottom quartile who showed the
greatest miscalibration. It is important to note that these same effects
were observed when participants considered their percentile score, ruling
out the criterion problem discussed earlier. Lest one think these results
reflect erroneous peer assessment rather than erroneous self-assessment,
participants in the bottom quartile also overestimated the number of test
items they had gotten right by nearly 50%.
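The size of these gaps follows directly from the bottom-quartile means reported for Study 2. The short check below only reworks that arithmetic; the helper name is hypothetical.

```python
def relative_overestimate(perceived, actual):
    """Overestimate expressed as a fraction of the actual value."""
    return (perceived - actual) / actual

# Bottom-quartile means reported for Study 2.
raw_score_gap = relative_overestimate(14.2, 9.6)  # problems answered correctly
ability_gap = 68 - 12  # perceived vs. actual percentile, general ability
test_gap = 62 - 12     # perceived vs. actual percentile, test score

print(f"raw score overestimated by {raw_score_gap:.1%}")  # 47.9%
print(f"ability gap: {ability_gap} percentile points")
print(f"test gap: {test_gap} percentile points")
```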
Study 3 was conducted in two phases. The first phase consisted of a
replication of the first two studies in a third domain, one requiring
knowledge of clear and decisive rules and facts: grammar. People may
differ in the worth they assign to American Standard Written English
(ASWE), but they do agree that such a standard exists, and they differ in
their ability to produce and recognize written documents that conform to
that standard.
Thus, in Study 3 we asked participants to complete a test assessing their
knowledge of ASWE. We also asked them to rate their overall ability to
recognize correct grammar, how their test performance compared with that
of their peers, and finally how many items they had answered correctly on
the test. In this way, we could see if those who did poorly would
recognize that fact.
After completing the test, participants compared their general ability to
"identify grammatically correct standard English" with that of other
students from their class on the same percentile scale used in the
previous studies. As in Study 2, participants also estimated the
percentile rank of their test performance among their student peers, as
well as the number of individual test items they had answered correctly.
As in Studies 1 and 2, participants overestimated their ability and
performance relative to objective criteria. On average, participants'
estimates of their grammar ability (M percentile = 71) and performance on
the test (M percentile = 68) exceeded the actual mean of 50, one-sample
ts(83) = 5.90 and 5.13, respectively, ps < .0001. Participants also
overestimated the number of items they answered correctly, M = 15.2
(perceived) versus 13.3 (actual), t(83) = 6.63, p < .0001. Although
participants' perceptions of their general grammar ability were
uncorrelated with their actual test scores, r(82) = .14, ns, their
perceptions of how their test performance would rank among their peers
were correlated with their actual score, albeit to a marginal degree,
r(82) = .19, p < .09, as was their direct estimate of their raw test
score, r(82) = .54, p < .0001.
As in previous studies, participants falling in other quartiles
overestimated their ability and performance much less than did those in
the bottom quartile. However, as Figure 3 shows, those in the top quartile
once again underestimated themselves. Whereas their test performance fell
in the 89th percentile among their peers, they rated their ability to be
in the 72nd percentile and their test performance in the 70th percentile,
ts(18) = -4.73 and -5.08, respectively, ps < .0001. Top-quartile
participants did not, however, underestimate their raw score on the test,
M = 16.9 (perceived) versus 16.4 (actual), t(18) = 1.37, ns.
Thus far, we have shown that people who lack the knowledge or wisdom to
perform well are often unaware of this fact. We attribute this lack of
awareness to a deficit in metacognitive skill. That is, the same
incompetence that leads them to make wrong choices also deprives them of
the savvy necessary to recognize competence, be it their own or anyone
else's.
We designed a second phase of Study 3 to put the latter half of this claim
to a test. Several weeks after the first phase of Study 3, we invited the
bottom- and top-quartile performers from this study back to the laboratory
for a follow-up. There, we gave each group the tests of five of their
peers to "grade" and asked them to assess how competent each target had
been in completing the test. In keeping with Prediction 2, we expected
that bottom-quartile participants would have more trouble with this
metacognitive task than would their top-quartile counterparts.
This study also enabled us to explore Prediction 3, that incompetent
individuals fail to gain insight into their own incompetence by observing
the behavior of other people. One of the ways people gain insight into
their own competence is by comparing themselves with others (Festinger,
1954; Gilbert, Giesler, & Morris, 1995). We reasoned that if the
incompetent cannot recognize competence in
others, then they will be unable to make use of this social comparison
opportunity. To test this prediction, we asked participants to reassess
themselves after they had seen the responses of their peers. We predicted
that despite seeing the superior test performances of their classmates,
bottom-quartile participants would continue to believe that they had
performed competently.
In contrast, we expected that top-quartile participants, because they have
the metacognitive skill to recognize competence and incompetence in
others, would revise their self-ratings after the grading task. In
particular, we predicted that they would recognize that the performances
of the five individuals they evaluated were inferior to their own, and
thus would raise their estimates of their percentile ranking accordingly.
That is, top-quartile participants would learn from observing the
responses of others, whereas bottom-quartile participants would not.
In making these predictions, we felt that we could account for an anomaly
that appeared in all three previous studies: Despite the fact that
top-quartile participants were far more calibrated than were their less
skilled counterparts, they tended to underestimate their performance
relative to their peers. We felt that this miscalibration had a different
source than the miscalibration evidenced by bottom-quartile participants.
That is, top-quartile participants did not underestimate themselves
because they were wrong about their own performances, but rather because
they were wrong about the performances of their peers. In essence, we
believe they fell prey to the false-consensus effect (Ross, Greene, &
House, 1977). In the absence of data to the contrary, they mistakenly
assumed that their peers would tend to provide the same (correct) answers
as they themselves, an impression that could be immediately corrected by
showing them the performances of their peers. By examining the extent to
which competent individuals revised their ability estimates after grading
the tests of their less competent peers, we could put this false-consensus
interpretation to a test.
After this, participants were shown their own test again and were asked to
re-rate their ability and performance on the test relative to their peers,
using the same percentile scales as before. They also re-estimated the
number of test questions they had answered correctly.
With top-quartile participants, a completely different picture emerged. As
predicted, after grading the test performance of five of their peers,
top-quartile participants raised their estimates of their own general
grammar ability, t(18) = 2.07, p = .05, and their percentile ranking on
the test, t(18) = 3.61, p < .005. These results are consistent with the
false-consensus effect account we have offered. Armed with the ability to
assess competence and incompetence in others, participants in the top
quartile realized that the performances of the five individuals they
evaluated (and thus their peers in general) were inferior to their own. As
a consequence, top-quartile participants became better calibrated with
respect to their percentile ranking. Note that a false-consensus
interpretation does not predict any revision for estimates of one's raw
score, as learning of the poor performance of one's peers conveys no
information about how well one has performed in absolute terms. Indeed, as
Table 1 shows, no revision occurred, t(18) < 1.
This study also supported Prediction 3, that incompetent individuals fail
to gain insight into their own incompetence by observing the behavior of
other people. Despite seeing the superior performances of their peers,
bottom-quartile participants continued to hold the mistaken impression
that they had performed just fine. The story for high-performing
participants, however, was quite different. The accuracy of their
self-appraisals did improve. We attribute this finding to a
false-consensus effect. Simply put, because top-quartile participants
performed so adeptly, they assumed the same was true of their peers. After
seeing the performances of others, however, they were disabused of this
notion, and thus they improved the accuracy of their self-appraisals.
Thus, the
miscalibration of the incompetent stems from an error about the self,
whereas the miscalibration of the highly competent stems from an error
about others.
The central proposition in our argument is that incompetent individuals
lack the metacognitive skills that enable them to tell how poorly they are
performing, and as a result, they come to hold inflated views of their
performance and ability. Consistent with this notion, we have shown that
incompetent individuals (compared with their more competent peers) are
unaware of their deficient abilities (Studies 1 through 3) and show
deficient metacognitive skills (Study 3).
The best acid test of our proposition, however, is to manipulate
competence and see if this improves metacognitive skills and thus the
accuracy of self-appraisals (Prediction 4). This would not only enable us
to speak directly to causality, but would help rule out the regression
effect alternative account discussed earlier. If the incompetent
overestimate themselves simply because their test scores are very low (the
regression effect), then manipulating competence after they take the test
ought to have no effect on the accuracy of their self-appraisals. If
instead it takes competence to recognize competence, then manipulating
competence ought to enable the incompetent to recognize that they have
performed poorly. Of course, there is a paradox to this assertion. It
suggests that the way to make incompetent individuals realize their own
incompetence is to make them competent.
In Study 4, that is precisely what we set out to do. We gave participants
a test of logic based on the Wason selection task (Wason, 1966) and asked
them to assess themselves in a manner similar to that in the previous
studies. We then gave half of the participants a short training session
designed to improve their logical reasoning skills. Finally, we tested the
metacognitive skills of all participants by asking them to indicate which
items they had answered correctly and which incorrectly (after McPherson &
Thomas, 1989) and to rate their ability and test performance once more.
We predicted that training would provide incompetent individuals with the
metacognitive skills needed to realize that they had performed poorly and
thus would help them realize the limitations of their ability.
Specifically, we expected that the training would (a) improve the ability
of the incompetent to evaluate which test problems they had answered
correctly and which incorrectly and, in the process, (b) reduce the
miscalibration of their ability estimates.
After taking the test, participants were asked to rate their logical
reasoning skills and performance on the test relative to their classmates
on a percentile scale. They also estimated the number of problems they had
solved correctly.
Next, a random selection of 70 participants were given a short
logical-reasoning training packet. Modeled after work by Cheng and her
colleagues (Cheng, Holyoak, Nisbett, & Oliver, 1986), this packet
described techniques for testing the veracity of logical syllogisms such
as the Wason selection task. The remaining 70 participants
encountered an unrelated filler task that took about the same amount of
time (10 min) as did the training packet.
Afterward, participants in both conditions completed a metacognition task
in which they went through their own tests and indicated which problems
they thought they had answered correctly and which incorrectly.
Participants then re-estimated the total number of problems they had
answered correctly and compared themselves with their peers in terms of
their general logical reasoning ability and their test performance.
Scores on the metacognition task supported the first part of this
prediction. To assess participants' metacognitive skills, we summed the
number of questions each participant accurately identified as correct or
incorrect, out of the 10 problems. Overall, participants who received the
training packet graded their own tests more accurately (M = 9.3) than did
participants who did not receive the packet (M = 6.3), t(138) = 7.32,
p < .0001, a difference even more pronounced when looking at
bottom-quartile participants exclusively, Ms = 9.3 versus 3.5,
t(36) = 7.18, p < .0001. In fact, the training packet was so successful
that those who had originally scored in the bottom quartile were just as
accurate in monitoring their test performance as were those who had
initially scored in the top quartile, Ms = 9.3 and 9.9, respectively,
t(30) = 1.38, ns.
In other words, the incompetent had become experts.
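The metacognition score described above is a count of matches between what a participant actually got right and what he or she believed was right. A minimal sketch follows; the function name and the ten answers are invented for illustration and are not data from the study.

```python
def metacognition_score(actually_correct, judged_correct):
    """Number of items whose correctness the participant identified accurately.

    An item counts when the participant's judgment (correct / incorrect)
    matches the item's true status, whichever that status is.
    """
    if len(actually_correct) != len(judged_correct):
        raise ValueError("both lists must cover the same test items")
    return sum(a == j for a, j in zip(actually_correct, judged_correct))

# Hypothetical participant on a 10-problem test (invented values).
actually_correct = [True, False, True, True, False,
                    False, True, False, True, True]
judged_correct = [True, True, True, True, True,
                  False, True, True, True, True]

score = metacognition_score(actually_correct, judged_correct)
print(score, "of", len(actually_correct))  # 7 of 10
```

Here the hypothetical participant judged three wrong answers to be right, so the score is 7. Note that a participant who marks every item "correct" still earns credit for each item actually answered correctly, so the score is not independent of test performance.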
To test the second part of our prediction, we examined the impact of
training on participants' self-impressions in a series of 2 (training: yes
or no) × 2 (pre- vs. postmanipulation) × 4 (quartile: 1
through 4) mixed-model analyses of variance (ANOVAs). These analyses
revealed the expected three-way interactions for estimates of general
ability, F(3, 132) = 2.49, p < .07, percentile score on the test,
F(3, 132) = 8.32, p < .001, and raw test score, F(3, 132) = 19.67,
p < .0001, indicating that the impact of training on self-assessment
depended on participants' initial test performance. Table 2 displays how
training influenced the degree of miscalibration participants exhibited
for each measure.
To examine these interactions in greater detail, we conducted two sets of
2 (training: yes or no) × 2 (pre- vs. postmanipulation) ANOVAs. The
first looked at participants in the bottom quartile, the second at
participants in the top quartile. Among bottom-quartile participants, we
found the expected interactions for estimates of logical reasoning
ability, F(1, 35) = 6.67, p < .02, percentile test score,
F(1, 35) = 14.30, p < .002, and raw test score, F(1, 35) = 41.0,
p < .0001, indicating that the change in participants' estimates of
their ability and test performance depended on whether they had received
training.
No such increase in calibration was found for bottom-quartile participants
in the untrained group (n = 18). As Table 2 shows, they initially reported
that both their ability and score on the test fell in the 55th percentile,
and did not change those estimates in their second set of self-ratings,
all ts < 1. Their estimates of their raw test score, however, did change,
but in the wrong direction. In their initial ratings, they estimated that
they had solved 5.8 problems correctly. On their second ratings, they
raised that estimate to 6.3, t(17) = 2.62, p < .02.
For individuals who scored in the top quartile, training had a very
different effect. As we did for their bottom-quartile counterparts, we
conducted a set of 2 (training: yes or no) × 2 (pre- vs.
postmanipulation) ANOVAs. These analyses revealed significant interactions
for estimates of test performance, F(1, 26) = 6.39, p < .025, and raw
score, F(1, 26) = 4.95, p < .05, but not for estimates of general ability,
F(1, 26) = 1.03, ns.
In the first analysis, we examined objective performance, metacognitive
skill, and the accuracy of self-appraisals in a manner suggested by
Baron and Kenny (1986)
. According to their procedure, metacognitive skill would be shown to
mediate the link between incompetence and inflated self-assessment if (a)
low levels of objective performance were associated with inflated self-
assessment, (b) low levels of objective performance were associated with
deficits in metacognitive skill, and (c) deficits in metacognitive skill
were associated with inflated self-assessment even after controlling for
objective performance. Focusing on the 70 participants in the untrained
group, we found considerable evidence of mediation. First, as reported
earlier, participants' test performance was a strong predictor of how much
they overestimated their ability and test performance. An additional
analysis revealed that test performance was also strongly related to
metacognitive skill, b(68) = .75, p < .0001. Finally, and most important,
deficits in metacognitive skill predicted inflated self-assessment on all
three self-ratings we examined (general logical reasoning ability,
comparative performance on the test, and absolute score on the test), even
after objective performance on the test was held constant. This was true
for the first set of self-appraisals, bs(67) = -.40 to -.49, ps < .001, as
well as the second, bs(67) = -.41 to -.50, ps < .001.
5
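The three mediational links (a), (b), and (c) can be sketched as three
regressions. The Python example below is a minimal illustration using
NumPy least squares on standardized variables. The function names
(`standardize`, `standardized_betas`, `baron_kenny`) and the synthetic data
are assumptions introduced here; they do not reproduce the study's data or
the authors' code, and the sketch omits significance tests.

```python
import numpy as np


def standardize(x):
    """Return x rescaled to mean 0 and sample standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def standardized_betas(outcome, *predictors):
    """Standardized OLS coefficients of outcome on the predictors.

    All variables are z-scored first, so no intercept term is needed.
    """
    X = np.column_stack([standardize(p) for p in predictors])
    coefs, *_ = np.linalg.lstsq(X, standardize(outcome), rcond=None)
    return coefs


def baron_kenny(performance, metacognition, overestimation):
    """Estimate the three links described in the text."""
    # (a) objective performance -> inflated self-assessment
    a_total = standardized_betas(overestimation, performance)[0]
    # (b) objective performance -> metacognitive skill
    b_mediator = standardized_betas(metacognition, performance)[0]
    # (c) metacognitive skill -> self-assessment, controlling for performance
    c_direct, c_mediator = standardized_betas(
        overestimation, performance, metacognition
    )
    return {
        "a_total": a_total,
        "b_mediator": b_mediator,
        "c_mediator": c_mediator,
        "c_direct": c_direct,
    }


# Synthetic data (an assumption for illustration, NOT the study's data):
# overestimation depends on performance only through metacognitive skill.
rng = np.random.default_rng(0)
n = 70
performance = rng.normal(size=n)
metacognition = 0.75 * performance + 0.5 * rng.normal(size=n)
overestimation = -0.8 * metacognition + 0.4 * rng.normal(size=n)

links = baron_kenny(performance, metacognition, overestimation)
for name, value in links.items():
    print(f"{name}: {value:+.2f}")
```

Under this procedure, evidence for mediation is a reliable `c_mediator`
coefficient together with a `c_direct` coefficient that shrinks relative to
`a_total` once the mediator is in the model.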
Given these results, one could wonder whether the impact of training on
the self-assessments of participants in the bottom quartile was similarly
mediated by metacognitive skills. To find out, we conducted a mediational
analysis focusing on bottom-quartile participants in both trained and
untrained groups. Here too, all three mediational links were supported. As
previously reported, bottom-quartile participants who received training
(a) provided less inflated self-assessments and (b) evidenced better
metacognitive skills than those who did not receive training. Completing
this analysis, regression analyses revealed that metacognitive skills
predicted inflated self-assessment with participants' training condition
held constant, bs(34) = -.68 to -.97, ps < .01. In fact, training itself
failed to predict miscalibration when bottom-quartile participants'
metacognitive skills were taken into account, bs(34) = .00 to .25, ns.
These analyses suggest that the benefit of training on the accuracy of
self-assessment was achieved by means of improved metacognitive skills.
6
General Discussion
In the neurosciences, practitioners and researchers occasionally come
across the curious malady of anosognosia. Caused by certain types of
damage to the right side of the brain, anosognosia leaves people paralyzed
on the left side of their body. But more than that, when doctors place a
cup in front of such patients and ask them to pick it up with their left
hand, patients not only fail to comply but also fail to understand why.
When asked to explain their failure, such patients might state that they
are tired, that they did not hear the doctor's instructions, or that they
did not feel like responding, but never that they are suffering from
paralysis. In essence, anosognosia not only causes paralysis, but also the
inability to realize that one is paralyzed (D'Amasio, 1994).
In this article, we proposed a psychological analogue to anosognosia. We
argued that incompetence, like anosognosia, not only causes poor
performance but also the inability to recognize that one's performance is
poor. Indeed, across the four studies, participants in the bottom quartile
not only overestimated themselves, but thought they were above-average,
Z = 4.64, p < .0001. In a phrase, Thomas Gray was right: Ignorance is
bliss, at least when it comes to assessments of one's own ability.
What causes this gross overestimation? Studies 3 and 4 pointed to a lack
of metacognitive skills among less skilled participants. Bottom-quartile
participants were less successful than were top-quartile participants in
the metacognitive tasks of discerning what one has answered correctly
versus incorrectly (Study 4) and distinguishing superior from inferior
performances on the part of one's peers (Study 3). More conclusively,
Study 4 showed that improving participants' metacognitive skills also
improved the accuracy of their self-appraisals. Note that these findings
are inconsistent with a simple regression effect interpretation of our
results, which does not predict any changes in self-appraisals given
different levels of metacognitive skill. Regression also cannot explain
the fact that bottom-quartile participants were nearly 4 times more
miscalibrated than their top-quartile counterparts.
Study 4 also revealed a paradox. It suggested that one way to make people
recognize their incompetence is to make them competent. Once we taught
bottom-quartile participants how to solve Wason selection tasks correctly,
they also gained the metacognitive skills to recognize the previous error
of their ways. Of course, and herein lies the paradox, once they gained
the metacognitive skills to recognize their own incompetence, they were no
longer incompetent. "To have such knowledge," as Miller (1993) put it in
the quote that began this article, "would already be to remedy a good
portion of the offense."
The Burden of Expertise
Although our emphasis has been on the miscalibration of incompetent
individuals, along the way we discovered that highly competent individuals
also show some systematic bias in their self-appraisals. Across the four
sets of studies, participants in the top quartile tended to underestimate
their ability and test performance relative to their peers, Zs = -5.66 and
-4.77, respectively, ps < .0001. What accounts for this underestimation?
Here, too, the
regression effect seems a likely candidate: Just as extremely low
performances are likely to be associated with slightly higher perceptions
of performance, so too are extremely high performances likely to be
associated with slightly lower perceptions of performance.
As it turns out, however, our data point to a more psychological
explanation. Specifically, top-quartile participants appear to have fallen
prey to a false-consensus effect (Ross et al., 1977). Simply put, these
participants assumed that because they performed so
well, their peers must have performed well likewise. This would have led
top-quartile participants to underestimate their comparative abilities
(i.e., how their general ability and test performance compare with that of
their peers), but not their absolute abilities (i.e., their raw score on
the test). This was precisely the pattern of data we observed: Compared
with participants falling in the third quartile, participants in the top
quartile were an average of 23% less calibrated in terms of their
comparative performance on the test, but 16% more calibrated in terms
of their objective performance on the test.
7
More conclusive evidence came from Phase 2 of Study 3. Once top-quartile
participants learned how poorly their peers had performed, they raised
their self-appraisals to more accurate levels. We have argued that
unskilled individuals suffer a dual burden: Not only do they perform
poorly, but they fail to realize it. It thus appears that extremely
competent individuals suffer a burden as well. Although they perform
competently, they fail to realize that their proficiency is not
necessarily shared by their peers.
Incompetence and the Failure of Feedback
One puzzling aspect of our results is how the incompetent fail, through
life experience, to learn that they are unskilled. This is not a new
puzzle. Sullivan, in 1953, marveled at "the failure of learning which has
left their capacity for
fantastic, self-centered delusions so utterly unaffected by a life-long
history of educative events" (p. 80). With that observation in mind, it is
striking that our student participants overestimated their standing on
academically oriented tests as familiar to them as grammar and logical
reasoning. Although our analysis suggests that incompetent individuals are
unable to spot their poor performances themselves, one would have thought
negative feedback would have been inevitable at some point in their
academic career. So why had they not learned?
One reason is that people seldom receive negative feedback about their
skills and abilities from others in everyday life (Blumberg, 1972; Darley
& Fazio, 1980; Goffman, 1955; Matlin & Stang, 1978; Tesser & Rosen, 1975).
Even young children are familiar with the notion that "if you do not have
something nice to say, don't say anything at all." Second, the bungled
robbery attempt of McArthur Wheeler notwithstanding, some tasks and
settings preclude people from receiving self-correcting information that
would reveal the suboptimal nature of their decisions (Einhorn, 1982).
Third, even if people receive negative feedback, they still must come
to an accurate understanding of why that failure has occurred. The problem
with failure is that it is subject to more attributional ambiguity than
success. For success to occur, many things must go right: The person must
be skilled, apply effort, and perhaps be a bit lucky. For failure to
occur, the lack of any one of these components is sufficient. Because of
this, even if people receive feedback that points to a lack of skill, they
may attribute it to some other factor (Snyder, Higgins, & Stucky, 1983;
Snyder, Shenkel, & Lowery, 1977).
Finally, Study 3 showed that incompetent individuals may be unable to take
full advantage of one particular kind of feedback: social comparison. One
of the ways people gain insight into their own competence is by watching
the behavior of others (Festinger, 1954; Gilbert, Giesler, & Morris,
1995). In a perfect world, everyone could see the judgments and decisions
that
other people reach, accurately assess how competent those decisions are,
and then revise their view of their own competence by comparison. However,
Study 3 showed that incompetent individuals are unable to take full
advantage of such opportunities. Compared with their more expert peers,
they were less able to spot competence when they saw it, and as a
consequence, were less able to learn that their ability estimates were
incorrect.
Limitations of the Present Analysis
We do not mean to imply that people are always unaware of their
incompetence. We doubt whether many of our readers would dare take on
Michael Jordan in a game of one-on-one, challenge Eric Clapton with a
session of dueling guitars, or enter into a friendly wager on the golf
course with Tiger Woods. Nor do we mean to imply that the metacognitive
failings of the incompetent are the only reason people overestimate their
abilities relative to their peers. We have little doubt that other factors
such as motivational biases (Alicke, 1985; Brown, 1986; Taylor & Brown,
1988), self-serving trait definitions (Dunning & Cohen, 1992; Dunning et
al., 1989), selective recall of past behavior (Sanitioso, Kunda, & Fong,
1990), and the tendency to ignore the proficiencies of others (Klar,
Medding, & Sarel, 1996; Kruger, 1999) also play a role. Indeed, although
bottom-quartile participants accounted for the bulk of the above-average
effects observed in our studies (overestimating their ability by an
average of 50 percentile points), there was also a slight tendency for the
other quartiles to overestimate themselves (by just over 6 percentile
points), a fact our metacognitive analysis cannot explain.
When can the incompetent be expected to overestimate themselves because of
their lack of skill? Although our data do not speak to this issue
directly, we believe the answer depends on the domain under consideration.
Some domains, like those examined in this article, are those in which
knowledge about the domain confers competence in the domain. Individuals
with a great understanding of the rules of grammar or inferential logic,
for example, are by definition skilled linguists and logicians. In such
domains, lack of skill implies both the inability to perform competently
and the inability to recognize competence; these are therefore also the
domains in which the incompetent are likely to be unaware of their lack of
skill.
In other domains, however, competence is not wholly dependent on knowledge
or wisdom, but depends on other factors, such as physical skill. One need
not look far to find individuals with an impressive understanding of the
strategies and techniques of basketball, for instance, yet who could not
"dunk" to save their lives. (These people are called coaches.) Similarly,
art appraisers make a living evaluating fine calligraphy, but know they do
not possess the steady hand and patient nature necessary to produce the
work themselves. In such domains, those in which knowledge about the
domain does not necessarily translate into competence in the domain, one
can become acutely, even painfully, aware of the limits of one's
ability. In golf, for instance, one can know all about the fine points of
course management, club selection, and effective "swing thoughts," but
one's incompetence will become sorely obvious when, after watching one's
more able partner drive the ball 250 yards down the fairway, one proceeds
to hit one's own ball 150 yards down the fairway, 50 yards to the right,
and onto the hood of that 1993 Ford Taurus.
Finally, in order for the incompetent to overestimate themselves, they
must satisfy a minimal threshold of knowledge, theory, or experience that
suggests to themselves that they can generate correct answers. In some
domains, there are clear and unavoidable reality constraints that
prohibit this notion. For example, most people have no trouble
identifying their inability to translate Slovenian proverbs, reconstruct
an 8-cylinder engine, or diagnose acute disseminated encephalomyelitis. In
these domains, without even an intuition of how to respond, people do not
overestimate their ability. Instead, if people show any bias at all, it is
to rate themselves as worse than their peers (Kruger, 1999).
Relation to Work on Overconfidence
The finding that people systematically overestimate their ability and
performance calls to mind other work on calibration in which people make a
prediction and estimate the likelihood that the prediction will prove
correct. Consistently, the confidence with which people make their
predictions far exceeds their accuracy rates (e.g., Dunning, Griffin,
Milojkovic, & Ross, 1990; Vallone, Griffin, Lin, & Ross, 1990;
Lichtenstein, Fischhoff, & Phillips, 1982).
Our data both complement and extend this work. In particular, work on
overconfidence has shown that people are more miscalibrated when they face
difficult tasks, ones for which they fail to possess the requisite
knowledge, than they are for easy tasks, ones for which they do possess
that knowledge (Lichtenstein & Fischhoff, 1977). Our work replicates this
point not by looking at properties of the task
but at properties of the person. Whether the task is difficult because of
the nature of the task or because the person is unskilled, the end result
is a large degree of overconfidence.
Our data also provide an empirical rebuttal to a critique that has been
leveled at past work on overconfidence. Gigerenzer (1991) and his
colleagues (Gigerenzer, Hoffrage, & Kleinbölting, 1991) have argued that
the types of probability estimates used in traditional overconfidence
work, namely, those concerning the occurrence of single events, are
fundamentally flawed. According to the critique, probabilities do not
apply to single events but only to multiple ones. As
a consequence, if people make probability estimates in more appropriate
contexts (such as by estimating the total number of test items answered
correctly), "cognitive illusions" such as overconfidence disappear. Our
results call this critique into question. Across the three studies in
which we have relevant data, participants consistently overestimated the
number of items they had answered correctly, Z = 4.94, p < .0001.