Understanding Sampling Errors

This document contains a brief discussion of the types of survey errors and the accuracy of estimates derived from surveys.

1. Types of Survey Errors

Estimates derived from sample surveys are subject to two types of errors: sampling errors and nonsampling errors. Nonsampling errors can be attributed to many sources, such as response differences, definitional difficulties, differing respondent interpretations, and respondent inability to recall information.

Sampling errors (the focus of this presentation) occur when estimates are derived from a sample rather than a census of the population. The sample used for a particular survey is only one of a large number of possible samples of the same size and design that could have been selected. Even if the same questionnaire and instructions were used, the estimates from each sample would differ from the others. This difference, termed sampling error, occurs by chance, and its variability is measured by the standard error associated with a particular survey.
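The idea of "many possible samples" can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the population is synthetic, the design is simple random sampling without replacement, and the names (`simulate_sampling_error`, the population values) are hypothetical. Real surveys such as those behind SESTAT use more complex designs, so their standard errors are not computed this way.

```python
import random
import statistics


def simulate_sampling_error(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population of 10,000 values (assumption, for illustration only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(60, 15) for _ in range(10_000)]

means = simulate_sampling_error(population, sample_size=400, n_samples=1_000)

true_mean = statistics.fmean(population)
# The spread of the estimates across samples is an empirical standard error.
empirical_se = statistics.stdev(means)
# Textbook approximation for a simple random sample: sigma / sqrt(n).
approx_se = statistics.pstdev(population) / 400 ** 0.5

print(f"true mean       : {true_mean:.2f}")
print(f"empirical SE    : {empirical_se:.3f}")
print(f"sigma / sqrt(n) : {approx_se:.3f}")
print(f"variance (SE^2) : {empirical_se ** 2:.3f}")
```

Each sample gives a slightly different mean even though the population and the procedure are unchanged. The standard deviation of those means is what the standard error describes.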

Estimates of the characteristics of scientists and engineers obtained using SESTAT are based on sample surveys and are thus subject to sampling errors. A related term is the variance, which is the square of the standard error and is sometimes used in standard error calculations.

2. Assessing the Accuracy of Estimates

Once a population quantity such as a mean or total has been estimated, it is desirable to assess the accuracy of the estimate. The customary approach is to construct a confidence interval within which one is sufficiently sure the true population value lies. The standard error of a survey estimate measures the precision with which an estimate from one sample approximates the true population value, and thus can be used to construct a confidence interval for a survey parameter. Let $\bar{\theta}$ be an estimator of a parameter of interest $\theta$ with standard error $se(\bar{\theta})$. If the sample size is large, then an approximate $(1-\alpha)100$ percent confidence interval for $\theta$ is

$$\bar{\theta} \pm z_{\alpha/2}\, se(\bar{\theta}),$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the normal distribution with mean zero and variance one, that is, the value that a standard normal variable exceeds with probability $\alpha/2$.
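As a minimal sketch of the large-sample interval described above (estimate plus or minus the normal percentage point times the standard error), the following uses Python's standard-library `statistics.NormalDist` to obtain the percentage point. The function name `confidence_interval` and the example figures are hypothetical and are not taken from SESTAT.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, alpha=0.05):
    """Approximate (1 - alpha) * 100 percent interval for a large sample."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    # Upper alpha/2 point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * std_error
    return estimate - margin, estimate + margin


# Hypothetical figures: an estimated total of 12,500 with standard error 400.
low, high = confidence_interval(12_500, 400, alpha=0.05)
print(f"95% CI: ({low:.0f}, {high:.0f})")
```

With these illustrative inputs the interval runs from roughly 11,716 to 13,284. A smaller alpha gives a larger percentage point and therefore a wider interval.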

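If the process of selecting a sample from the population were repeated many times and an estimate and its standard error calculated for each sample, then:

- Approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the true population value.
- Approximately 95 percent of the intervals from 1.96 standard errors below the estimate to 1.96 standard errors above the estimate would include the true population value.
- Approximately 99 percent of the intervals from 2.575 standard errors below the estimate to 2.575 standard errors above the estimate would include the true population value.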

With an estimate of the standard error and the appropriate factor (1.645, 1.96, or 2.575), a data user may construct a confidence interval, or range of values, that includes the true population value with the corresponding probability of 0.90, 0.95, or 0.99 ($\alpha$ = 0.10, 0.05, or 0.01).
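The repeated-sampling interpretation can be checked with a short simulation. This is a sketch under stated assumptions: a synthetic population, simple random sampling, and hypothetical helper names (`critical_value`, `coverage_rate`). It also recomputes the factors from the normal distribution. The exact value for $\alpha$ = 0.01 is about 2.5758, which printed tables give as 2.575 or 2.576.

```python
import random
import statistics
from statistics import NormalDist


def critical_value(alpha):
    """Upper alpha/2 point of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def coverage_rate(population, sample_size, alpha, n_repeats=1_000, seed=1):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    z = critical_value(alpha)
    hits = 0
    for _ in range(n_repeats):
        sample = rng.sample(population, sample_size)
        estimate = statistics.fmean(sample)
        # Estimated standard error of the mean under simple random sampling.
        std_error = statistics.stdev(sample) / sample_size ** 0.5
        if estimate - z * std_error <= true_mean <= estimate + z * std_error:
            hits += 1
    return hits / n_repeats


# Hypothetical population (assumption, for illustration only).
pop_rng = random.Random(7)
population = [pop_rng.gauss(50, 10) for _ in range(20_000)]

for alpha in (0.10, 0.05, 0.01):
    rate = coverage_rate(population, sample_size=200, alpha=alpha)
    print(f"alpha={alpha:.2f}  z={critical_value(alpha):.3f}  coverage={rate:.3f}")
```

The observed coverage should land near 0.90, 0.95, and 0.99 respectively, with some simulation noise. It will not match exactly, because the interval is a large-sample approximation and the number of repetitions is finite.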


Updated: September 28, 2001