This document contains a brief discussion of the types of survey errors and the accuracy of estimates derived from surveys.
Estimates derived from sample surveys are subject to two types of errors--sampling errors and
nonsampling errors. Nonsampling errors can be attributed to many
sources, such as response differences, definitional difficulties, differing respondent interpretations,
and respondent inability to recall information.
Sampling errors (the focus of this presentation) occur when estimates are derived from a sample rather than a census of the population. The sample used for a particular survey is only one of a large number of possible samples of the same size and design that could have been selected. Even if the same questionnaire and instructions were used, the estimates from each sample would differ from the others. This difference, termed sampling error, occurs by chance, and its variability is measured by the standard error associated with a particular survey.
Estimates of the characteristics of scientists and engineers obtained using SESTAT are based on sample surveys and are thus subject to sampling errors. (A related term is the variance, which is the square of the standard error and is sometimes used in standard error calculations.)
Having estimated a population quantity such as a mean or total, it is desirable to assess
the accuracy of the estimate. The customary approach is to construct a confidence interval within which
one is sufficiently sure the true population value lies. The standard error of a survey estimate measures
the precision with which an estimate from one sample approximates the true population value, and
thus can be used to construct a confidence interval for a survey parameter to assess
the accuracy of the estimate. Let $\hat{\theta}$ be an estimator of a parameter of interest $\theta$ with a standard error $se(\hat{\theta})$. If the sample size is large, then an approximate $(1-\alpha)100$ percent confidence interval for $\theta$ is
$$\hat{\theta} \pm z_{\alpha/2}\, se(\hat{\theta}),$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the normal distribution with mean zero and variance one.
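The large-sample interval described above can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the example estimate and standard error are hypothetical and do not come from SESTAT. The normal percentage point is obtained from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, alpha=0.05):
    """Return (lower, upper) for an approximate (1 - alpha)100 percent interval."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    # Upper alpha/2 percentage point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical survey estimate and standard error, for illustration only.
lower, upper = confidence_interval(estimate=10_000, standard_error=500, alpha=0.05)
print(f"95 percent interval: {lower:.0f} to {upper:.0f}")
```

Passing `alpha=0.10` or `alpha=0.01` gives the 90 percent and 99 percent intervals in the same way.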
If the process of selecting a sample from the population were repeated many times and an estimate and its standard error calculated for each sample, then:
Approximately 90 percent ($\alpha = 0.10$) of the intervals from 1.645 ($= z_{0.05}$) standard errors below the estimate to 1.645 standard errors above the estimate will include the true population value.
Approximately 95 percent ($\alpha = 0.05$) of the intervals from 1.96 ($= z_{0.025}$) standard errors below the estimate to 1.96 standard errors above the estimate will include the true population value.
Approximately 99 percent ($\alpha = 0.01$) of the intervals from 2.575 ($= z_{0.005}$) standard errors below the estimate to 2.575 standard errors above the estimate will include the true population value.
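The repeated-sampling interpretation above can be checked with a small simulation. The sketch below is illustrative only and rests on stated assumptions: a made-up population of normally distributed values, simple random sampling without replacement, and the sample mean as the estimate. It does not reproduce the SESTAT sample design, and the name `coverage` is hypothetical.

```python
import math
import random
import statistics

random.seed(12345)  # fixed seed so the sketch is reproducible

# Hypothetical population; its mean plays the role of the true population value.
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def coverage(z, sample_size=200, replicates=2_000):
    """Fraction of replicate intervals (estimate +/- z * SE) containing true_mean."""
    hits = 0
    for _ in range(replicates):
        sample = random.sample(population, sample_size)
        estimate = statistics.fmean(sample)
        # Standard error of a sample mean under simple random sampling,
        # ignoring the finite population correction (negligible here).
        standard_error = statistics.stdev(sample) / math.sqrt(sample_size)
        if abs(estimate - true_mean) <= z * standard_error:
            hits += 1
    return hits / replicates


for level, z in [(90, 1.645), (95, 1.96), (99, 2.575)]:
    print(f"{level} percent nominal: {coverage(z):.3f} observed")
```

Under these assumptions, the observed fractions should land close to the nominal 90, 95, and 99 percent levels, with small differences due to simulation noise.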
With an estimate of the standard error and the factors above (1.645, 1.96, or 2.575), a data user may construct a
confidence interval, or range of values, that includes the true population value with the given
probability $1-\alpha$ ($\alpha$ = 0.10, 0.05, or 0.01).