AMMUNITION SELECTION:
RESEARCH AND MEASUREMENT ISSUES

By 
N.J. SCHEERS, Ph.D.
Operations Research Analyst 
and 
STEPHEN R. BAND, Ph.D. 
Special Agent
Institutional Research and Development Unit
FBI Academy
Quantico, VA



When law enforcement officers talk about the "most effective"
caliber bullet or the "best" combat handgun on the street,
emotions run high and opinions vary.  This can be expected, since
these topics have caused considerable debate for years.

But what of the firearms expert who is tasked with the
responsibility of selecting ammunition and firearms for a
department?  What are the crucial issues that should be
considered?  Where should testing begin?  What needs to be
addressed in order to conduct a fair and impartial ammunition and
firearms selection program?

The FBI Academy's Institutional Research and Development Unit
(IRDU) provides consultation primarily to the FBI's Training
Division personnel regarding research methodology, evaluation
and statistical analysis.  This article provides an introduction
to research design and statistical analysis with regard to
ammunition selection.  It is intended to assist firearms
personnel in designing an ammunition research project and
analyzing the results.

The topics addressed include (1) research design, (2) criteria
for selecting ammunition, (3) rater bias, and (4) statistical
analyses.  Throughout the article emphasis is placed on
understanding the logic of the various elements of a research
project.

DESIGN OF THE RESEARCH

Kerlinger, a research methodologist, indicates that research
design is the structure, plan or strategy developed to obtain
results from a research project. "Research designs are invented
to enable the researcher to answer research questions as validly,
objectively, accurately, and economically as possible."(1)

In designing any ammunition selection study, the first step is to
determine the comparisons to be made.  For example, is the
purpose of the study to compare the same caliber bullet
performance for ammunition made by different companies or to
compare the performance of the same caliber bullet in handguns
produced by different manufacturers?  

The following research design is used throughout this article as
a convenient example; three different calibers are compared on
performance measures of penetration, expansion and weight in a
variety of target simulants (targets).  Examples of targets are
gelatin blocks to simulate human tissue, sheets of metal to
resemble the properties of an automobile door, automobile
windshield glass held at a given angle, and so on.  

"Internal validity" and "external validity" are two major
criteria by which any research design is judged.  Internal
validity, for the example shown above, is the extent to which
differences in penetration, expansion and weight can be
attributed to differences in the physical characteristics of the
calibers rather than to other influences or conditions.  External
validity is the extent to which similar differences in
performance would generalize to other ammunition, conditions or
settings.  The ideal would be to maximize both internal and
external validity.  However, the importance of maximizing
internal validity, that is, controlling unwanted influences, by
necessity, often limits external validity.

Internal Validity

Internal validity is extremely important in any ammunition
selection study; if the research is internally valid, then there
is a high probability that the differences in caliber performance
are caused by the different sizes of the calibers.  Internal
validity is synonymous with control over unwanted influences. 
For ammunition selection studies, the unwanted influences that
must be controlled or held constant would include environmental
conditions, physical/human conditions, and target simulants.  

Environmental conditions-In an indoor range, environmental
conditions for firing ammunition can be easily controlled. 
Shooting should take place where temperature, weather, light and
noise are kept fairly constant.  Without an indoor range, keeping
these conditions constant is extremely difficult.

Physical/human conditions-Many other physical and human
influences can affect a study.  Some of these influences can be
determined; others cannot.  The best way to control unwanted
influences is to simultaneously set up test barrels, one for each
caliber to be tested, and randomly determine the order in which
the test barrels are fired.  (A table of random numbers can be
used to determine the order.)  For example, a researcher who
fires one caliber all morning and then fires a different caliber
throughout the afternoon might have measurements influenced by
the fatigue of late afternoon shooting and therefore
unintentionally record measurement results favoring the caliber
shot in the morning.
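
The random ordering described above can also be produced by
computer instead of a printed table of random numbers.  The
following is a minimal sketch, not a prescribed procedure; the
caliber labels, the five rounds per caliber, and the function name
are assumptions chosen only for illustration.

```python
import random

def randomized_firing_order(calibers, rounds_per_caliber, seed=None):
    """Return a shuffled list of caliber labels, one entry per round fired."""
    rng = random.Random(seed)  # a recorded seed lets the schedule be reproduced
    schedule = [c for c in calibers for _ in range(rounds_per_caliber)]
    rng.shuffle(schedule)
    return schedule

# Hypothetical labels; substitute the calibers actually under test.
calibers = ["caliber A", "caliber B", "caliber C"]
order = randomized_firing_order(calibers, rounds_per_caliber=5, seed=1)

for shot_number, caliber in enumerate(order, start=1):
    print(shot_number, caliber)
```

Because every caliber appears throughout the session, influences
such as late afternoon fatigue are spread across all calibers
rather than falling on only one.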

Other variables are not controlled by random ordering for firing
the different calibers.  For example, if test barrels are not of
equal length, firing them in random order would not compensate
for these differences.  Using test barrels of unequal length will
affect not only the velocity but also the extent of penetration. 
Therefore, if unequal length test barrels are used, additional
research is necessary to determine the extent of the differences
among the calibers tested, which adds greatly to the complexity
of the research.

Targets-Whether one type of target or a variety of targets is
used in the study, controlling the variations in the
construction of these targets is critical and can be done by
randomly distributing targets (again using the random numbers
table) of a given type across calibers.  For example, if a batch
of gelatin blocks is not mixed thoroughly and blocks with greater
density are used with only one caliber, then any differences in
penetration, expansion or weight for the different calibers could
be partially or fully caused by the consistency of the gelatin
blocks.
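
As an illustration of randomly distributing targets of a given type
across calibers, the sketch below shuffles a batch of numbered
gelatin blocks and deals an equal share to each caliber.  The block
numbers, the batch size of 15, and the caliber labels are
hypothetical.

```python
import random

def assign_targets(target_ids, calibers, seed=None):
    """Randomly split targets into equal-sized groups, one group per caliber."""
    if len(target_ids) % len(calibers) != 0:
        raise ValueError("number of targets must be a multiple of the number of calibers")
    rng = random.Random(seed)
    shuffled = list(target_ids)
    rng.shuffle(shuffled)
    per_caliber = len(shuffled) // len(calibers)
    # Deal consecutive slices of the shuffled list to each caliber.
    return {
        caliber: sorted(shuffled[i * per_caliber:(i + 1) * per_caliber])
        for i, caliber in enumerate(calibers)
    }

block_ids = list(range(1, 16))  # 15 gelatin blocks from one batch (assumed)
assignment = assign_targets(block_ids, ["caliber A", "caliber B", "caliber C"], seed=7)

for caliber, blocks in assignment.items():
    print(caliber, blocks)
```

If some blocks in the batch are denser than others, random
assignment makes it unlikely that the denser blocks all end up
with a single caliber.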

Since gelatin blocks are used both as stand-alones and behind
other targets, two other controls are suggested.  First, because
gelatin blocks can deteriorate easily, care must be taken to
preserve their integrity.  Gelatin blocks should be stored in
insulated coolers prior to use and should be checked by measuring
their temperature before being used for targets.  Second, an
already-penetrated gelatin block should not be used again as a
target.  The trauma from the first round's impact may disturb
the consistency of the gelatin and affect the measurement of
penetration from later rounds fired into it.

External Validity

After maximizing internal validity, the researcher must also
plan for external validity so that the results can be generalized
beyond the bullets used in the study.  There are many conditions
under which results may be generalized; no study can accomplish
all of them.  However, it's important to know what these
conditions are since the generalizations that cannot be made set
the limitations of the study.

External validity is the extent to which any difference in
performance among the calibers can be generalized to (1) a larger
population, such as other lots of ammunition of the same caliber
made by the same manufacturer; (2) different populations, such as
other ammunition of the same caliber made by different
manufacturers; (3) "real-life" targets that the study targets
purport to "simulate"; and (4) other conditions and settings.   

How can a researcher determine if the results of a study can be
generalized to a larger population of other same caliber bullets
from the same manufacturer?  If the bullets in a study are a
random sample from this larger population of bullets, the
bullets are representative of that population.  This means that
any other random sample of the same caliber bullets from this
population can be expected to produce similar results.

How can the results be generalized to other conditions or
settings?  One way is to build important conditions into the
research design.  When the study at the beginning of this article
was designed to compare the performance of different calibers in
a variety of targets, we decided to see if performance results
would generalize over the different target types.  If a
particular caliber shows superior performance, will this occur in
all targets in the study? Some of the targets?

No one study can provide answers to all the questions that can be
generated around a particular research question.  Often, logic
and expert judgment must be used to provide some tentative
answers as to whether the results will generalize to the same
calibers made by other manufacturers and to other conditions and
settings.  Will the same results be obtained in actual automobile
doors as in simulated targets?  Will the same results hold in
extreme temperature as in an indoor range?  If it is important
to answer these questions with confidence, the best procedure is
to carry out a series of studies that vary the important
conditions and settings to determine the extent of the
generalization over conditions.

CRITERIA FOR AMMUNITION SELECTION

The criteria we are using to determine the most effective bullet
are performance measures linked to adversary incapacitation. 
These performance measures are penetration, expansion and weight. 

Reliable and Valid Measurements-Whenever any measurement is
taken, whether it is a blood pressure test, an achievement test
or measurement of bullet performance, it is important to know how
reliable and valid these measurements are.  Reliability refers to
consistency of measurement; for example, it is the extent to
which two raters measuring penetration for a given round obtain
similar results.  Validity refers to the accuracy of measurement;
biased measurements can occur if the measurement of penetration
for one of the calibers is consistently too high or too low.
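
One simple way to examine the consistency of two raters is to
correlate their measurements of the same rounds and to look at the
average difference between them.  The sketch below assumes five
rounds measured in inches by two raters; the numbers are invented
for illustration and the correlation coefficient is only one of
several possible reliability indices.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical penetration measurements (inches) of the same five rounds.
rater_1 = [12.0, 13.5, 11.0, 14.0, 12.5]
rater_2 = [12.25, 13.25, 11.25, 14.25, 12.5]

agreement = pearson_r(rater_1, rater_2)                   # near 1.0 = consistent
offset = mean(b - a for a, b in zip(rater_1, rater_2))    # systematic gap between raters

print(round(agreement, 3), round(offset, 3))
```

A high correlation indicates that the raters rank the rounds
alike; an average difference far from zero suggests that one rater
reads consistently higher than the other.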

Reliability and validity can affect the results of a study. If
measurement is unreliable, e.g., if the measurement is taken
with a ruler made of very flexible rubber, it will be more
difficult to find true differences among the calibers.  If a
measurement is biased for one caliber but not another, the
results may show differences that are not true differences.

A New Measurement Procedure-Of the three criteria for
ammunition selection, the measurement of a round's penetration
into a gelatin block seems to have the most potential for
reliability and validity problems.  The traditional method of
measuring wound tracks in ballistic gelatin is to view the track
through the surface of the gelatin block and measure the channel
from bullet entry to the end of the "bounce back" with a tape
measure or ruler.  We call this method of measuring penetration
"topical measurement."  

There are two potential problems with the traditional
measurement of penetration.  The first problem centers on
reliability of the measurement.  Would optical/light refraction
through the gelatin block result in inconsistent (more
unreliable) results when penetration was measured topically?  The
second problem centers on the accuracy of the measurement.  Is
there sufficient curvature in some of the wound tracks that
differential results would occur if a more accurate (valid)
measure of the wound track were applied?

In our work in ammunition selection, these problems have been
addressed by having two different raters measure each "wound
track" using two different methods.  First, measurements were
taken topically using a locking metal tape measure.  Then, a
medical urethral catheter was used to measure the wound track
internally up to the back of the resting bullet.  The total
catheter measurement was the internal measurement added to a
topical measurement from the back of the bullet up to and
including "bounce-back."  For each round fired, two raters
measured penetration both topically and with the catheter.

Both topical and catheter procedures were highly reliable when
the measurements of the two raters were compared.  In examining
the validity of the two procedures, we found that the heaviest
caliber studied showed more curvature than the lightest caliber. 
The average curvature for the heaviest caliber was almost
one-third of an inch, with the largest recorded curvature of over
one-half inch.  Therefore, if curvature is expected, it is
probably best to use the catheter method of measuring
penetration.
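
The comparison of the two procedures can be summarized by treating
curvature as the catheter measurement minus the topical
measurement for each round.  The sketch below is illustrative
only; the paired values and the caliber labels are hypothetical
and are not the data from our study.

```python
from statistics import mean

# Each pair is (topical inches, catheter inches) for one round. Values are invented.
measurements = {
    "smaller caliber": [(12.0, 12.1), (11.5, 11.5), (12.5, 12.7)],
    "larger caliber": [(14.0, 14.3), (13.5, 14.0), (14.5, 14.7)],
}

def curvature_summary(pairs):
    """Return (average, largest) extra track length found by the catheter."""
    diffs = [catheter - topical for topical, catheter in pairs]
    return mean(diffs), max(diffs)

summary = {caliber: curvature_summary(pairs) for caliber, pairs in measurements.items()}

for caliber, (average, largest) in summary.items():
    print(f"{caliber}: average {average:.2f} in., largest {largest:.2f} in.")
```

If the average difference is clearly larger for one caliber than
another, topical measurement understates penetration unevenly
across calibers, which is the validity concern raised above.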

RATER BIAS

Rater bias can occur in ammunition selection research when the
researchers themselves (raters) are measuring penetration,
expansion and weight.  Under these conditions it is necessary to
guard against conscious or unconscious biases of the researchers
who may favor a specific caliber.  However, favoring a specific
caliber should not prevent individuals from being active in a
research project.  Rather, controls must be built into the
research that prevent conscious or unconscious biases from
affecting the results.

The usual procedure for eliminating rater bias is to keep the
raters "blind," that is, prevent those who take the penetration,
expansion and weight measurements from knowing which caliber is
being fired.  In ammunition selection studies, firearms experts
are often employed as researchers to select the most effective
bullet.  These experts can, for the most part, immediately
determine bullet caliber from bullet performance; it is
impossible to keep them "blind."  To get around this problem,
staff members not familiar with firearms can be taught to take 
penetration, expansion and weight measurements.  Using blind
raters will add much credibility to a research project.

STATISTICAL ANALYSES

When statistical inference tests are used in making decisions
about results, the question being asked is, "Did the differences
among the calibers happen by chance or are they true
differences?"  A statistically significant result is interpreted
to mean that differences as large as those observed would be very
unlikely to occur by chance alone if the calibers truly did not
differ.

Ammunition and firearms experts may find it useful to call upon
experts in research methodology and statistics to make
recommendations concerning the design of the study, sample size,
procedures and statistical analyses.  Oftentimes, it is possible
to use a graduate student in research methods and/or applied
statistics at a local university to assist in research projects.

Conditions That Influence Statistical Tests

Several conditions influence whether results of performance
tests are statistically significant.  Two of the most important
influences are the size of the sample and the variability of the
data.  In general, the larger the sample size (the number of test
bullets fired) and the smaller the variability (the amount of
variation in penetration of several rounds of a specific
caliber), the more likely it is that the results will be
statistically significant if true differences exist among the
calibers tested.  

While a researcher usually does not have control over the
variability of the data, it is possible to have some control over
sample size.  In ammunition selection studies, because of the
labor involved in making gelatin blocks, a sample of five rounds
per caliber for several targets is considered quite large. 
Statistically, however, this is a small sample size and depending
on the variability of the data, differences as large as one inch
may not be statistically significant.
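
To show how a one-inch difference can fail to reach significance
with five rounds per caliber, the sketch below computes a
pooled-variance t statistic for two calibers.  The penetration
values are invented, a two-caliber t-test is used here only for
simplicity, and 2.306 is the standard tabled two-tailed critical
value of t at the .05 level with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic using the pooled variance estimate."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_b) - mean(sample_a)) / std_err

# Hypothetical penetration (inches), five rounds each; means differ by 1.0 inch.
caliber_a = [11.0, 12.0, 12.5, 13.0, 11.5]
caliber_b = [12.0, 13.0, 13.5, 14.0, 12.5]

T_CRITICAL_DF8 = 2.306  # tabled value: two-tailed, alpha = .05, df = 5 + 5 - 2

t_stat = pooled_t(caliber_a, caliber_b)
significant = abs(t_stat) > T_CRITICAL_DF8

print(round(t_stat, 3), significant)
```

With these values the t statistic is about 2.0, which falls short
of the critical value even though the means differ by a full inch.
Less variable data or more rounds per caliber would raise the
statistic.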

Statistical Procedures for Ammunition Selection Testing

Because various types of designs can be applied to ammunition
selection studies, numerous types of statistical tests can be
applied to the resultant data.  The following analyses can be
considered and discussed with a consulting statistician for
additional advice with a specific project:

1. Descriptive statistics summarizing the number of rounds
fired, the means, standard deviations, standard errors, 95%
confidence intervals, and minimum and maximum measures can be
recorded and displayed in tables;  

2. Homogeneity of variance tests can be conducted to identify 
significant differences in the variability of the different calibers 
tested; 

3. Analysis of variance (ANOVA) tests can be conducted to
identify significant mean differences among two or more calibers
for the various targets.  If an equal number of rounds is
fired for each caliber, ANOVA is the appropriate statistical test
since it is robust to violations of the homogeneity of variance
assumption; and

4. For those ANOVA analyses where significant differences are
found, post hoc comparisons can be calculated to determine
significant differences between all possible pairs of means for
the different calibers tested in a project. 
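
Items 1 and 3 above can be sketched as follows for a single
target type.  This is an illustration under stated assumptions,
not a substitute for a statistician's analysis: the penetration
values and caliber labels are invented, and the critical values
(2.776 for t with 4 degrees of freedom, 3.89 for F with 2 and 12
degrees of freedom, both at the .05 level) are taken from standard
tables.  Homogeneity of variance tests and post hoc comparisons
are not shown.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF4 = 2.776   # tabled two-tailed t, alpha = .05, df = 4 (n = 5 per caliber)
F_CRIT_2_12 = 3.89   # tabled F, alpha = .05, df = (2, 12)

# Hypothetical penetration (inches) in one target type, five rounds per caliber.
penetration = {
    "caliber A": [11.0, 12.0, 12.5, 13.0, 11.5],
    "caliber B": [12.0, 13.0, 13.5, 14.0, 12.5],
    "caliber C": [13.5, 14.5, 15.0, 15.5, 14.0],
}

def describe(values):
    """Descriptive statistics for one caliber (item 1)."""
    n = len(values)
    m, sd = mean(values), stdev(values)
    se = sd / sqrt(n)
    return {
        "n": n, "mean": m, "sd": sd, "se": se,
        "ci95": (m - T_CRIT_DF4 * se, m + T_CRIT_DF4 * se),
        "min": min(values), "max": max(values),
    }

def one_way_anova_f(groups):
    """F statistic for a one-way analysis of variance (item 3)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

table = {caliber: describe(values) for caliber, values in penetration.items()}
f_stat = one_way_anova_f(list(penetration.values()))
anova_significant = f_stat > F_CRIT_2_12

for caliber, row in table.items():
    print(caliber, row)
print(round(f_stat, 2), anova_significant)
```

When the F statistic exceeds the critical value, as it does for
these invented data, the post hoc comparisons in item 4 are the
next step for identifying which pairs of calibers differ.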

CONCLUSION

Ammunition selection research projects must be considered in the
context of the overall difficulty in obtaining bullet
performance data.  Despite the best intentions of researchers to
control potential bias and extraneous variables, "real world"
variables associated with law enforcement combat situations can
never be perfectly simulated.  

The research and measurement techniques suggested for ammunition
selection projects are not unique to ammunition selection;
indeed, they are widely used in the physical and behavioral
sciences. However, techniques of this type infrequently appear in
law enforcement-related research literature for ammunition
testing.  When more rigorous approaches to research are used,
there is much more confidence in the results and the
interpretation of the results.  The importance of valid results
cannot be overstated; the lives of law enforcement officers
depend on the results.

Footnote

(1) F.N. Kerlinger, Foundations of Behavioral Research (New York:
Holt, Rinehart and Winston, 1984).
