- For convenience, both rate estimates and corresponding
population estimates are included in each table. Population estimates are
presented in thousands (e.g., a population estimate of 430 represents 430,000
persons). Each "observed estimate" is followed by its 95% confidence interval
in parentheses. Estimates are provided for each of the following time periods
when use of illicit drugs, alcohol, and tobacco occurred: (a) use in the
lifetime (ever used), (b) use in the past year (used past year), and (c)
use in the past month (used past month), also referred to as "current use."
These estimates have been obtained by weighting the data to reflect current
population totals for various demographic subgroup populations.

- An analysis weight was calculated for each completed
interview to reflect selection probabilities and to compensate for nonresponse
and undercoverage. A poststratification adjustment was made to force the
respondent weight totals to equal U.S. Bureau of the Census projections
of the civilian, noninstitutionalized population aged 12 years or older. The
poststratification totals⁴ were obtained from
the National Estimates and Population Projections Branch of the U.S. Bureau
of the Census and classified
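A poststratification adjustment of this kind can be sketched as a simple cell-ratio adjustment. Within each cell, every respondent weight is multiplied by the Census control total for that cell divided by the current sum of respondent weights in the cell, so the adjusted weights add up to the control total. This is a minimal sketch, not the NHSDA production procedure: the cell labels, weights, and control totals below are hypothetical, and the actual adjustment cells are not given in this passage.

```python
from collections import defaultdict


def poststratify(weights, cells, control_totals):
    """Scale weights so they sum to the control total within each cell.

    weights:        pre-adjustment analysis weights, one per respondent
    cells:          hypothetical poststratification cell label per respondent
    control_totals: dict mapping cell label -> population projection for it
    """
    # Sum the current weights within each cell.
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    # Multiply each weight by its own cell's ratio adjustment factor.
    return [w * control_totals[c] / cell_sums[c] for w, c in zip(weights, cells)]


# Hypothetical example with two cells and made-up control totals.
weights = [100.0, 150.0, 250.0, 400.0]
cells = ["12-17", "12-17", "18+", "18+"]
controls = {"12-17": 500.0, "18+": 1300.0}
adjusted = poststratify(weights, cells, controls)
print(adjusted)  # [200.0, 300.0, 500.0, 800.0]
```

After adjustment, the weights in the "12-17" cell sum to 500 and those in the "18+" cell sum to 1,300, matching the assumed control totals.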

- Tables 1A, 1B, and 1C in Section II present the
sample sizes and U.S. population totals on which the population estimates
of drug use are based. The population figures are estimates of the civilian,
noninstitutionalized population and are generated by summing the individual
final analysis weights of the respondents belonging to each population
category.
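Summing final weights in this way gives the population total for a category. Summing the same weights only over respondents who report use gives the estimated number of users, and the ratio of the two is the weighted prevalence rate. The sketch below assumes a hypothetical boolean `users` flag and made-up weights. It reports totals in thousands, as the tables do.

```python
def population_estimates(weights, groups, users):
    """Sum final analysis weights by population category.

    weights: final analysis weights (persons represented by each respondent)
    groups:  hypothetical category label for each respondent (e.g. sex)
    users:   True if the respondent reported use in the period of interest
    Returns {group: (population, users, rate)}.
    """
    sums = {}
    for w, g, used in zip(weights, groups, users):
        pop, n_users = sums.get(g, (0.0, 0.0))
        sums[g] = (pop + w, n_users + (w if used else 0.0))
    return {g: (pop, n_users, n_users / pop) for g, (pop, n_users) in sums.items()}


# Made-up weights; a real file has many respondents per category.
weights = [1200.0, 800.0, 1500.0, 500.0]
groups = ["male", "male", "female", "female"]
users = [True, False, True, False]
est = population_estimates(weights, groups, users)
for g, (pop, n_users, rate) in est.items():
    # Population figures in thousands, rate as a percentage.
    print(g, pop / 1000, n_users / 1000, round(100 * rate, 1))
```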

- The population estimates in this report are based
on either the total responding sample or all cases in a subgroup, including
some cases where missing data for some recency-of-use and frequency-of-use
variables were replaced with logically or statistically imputed values.
The interview classification "minimally complete" (a
status necessary for a case to be included in the database) requires that
data on the recency of use of alcohol, marijuana, and cocaine be present.
To determine case completeness, an editing procedure was employed to replace
missing data for these substances based on information supplied by the
respondent elsewhere in the questionnaire. After this editing, case completeness
was determined. When necessary, additional logical imputation also was
done to replace other inconsistent, missing, or otherwise faulty data.

After editing, any data still missing for recency-of-use and frequency-of-use questions (for drugs other than alcohol, cocaine, and marijuana) were statistically imputed using the technique of sequential hot-deck imputation. The first step in this procedure involves sorting the data file progressively using data on recency of use of alcohol, marijuana, and cocaine; age; gender; Hispanic origin; race/ethnicity; and a State indicator variable (i.e., California, Arizona, or remainder of the United States). The hot-deck imputation procedure replaces a missing item on a particular record by the last encountered nonmissing response for that item (from the previous record) on the sorted database. The hot-deck imputation procedure is appropriate for recency-of-use and most frequency-of-use variables because the level of item nonresponse is low.
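The sequential hot-deck step can be sketched as follows. First sort the file on the sorting variables, then walk through it and carry forward the last nonmissing value of the item into any record where it is missing. This is a minimal sketch that makes some assumptions. The variable names (`alc_recency`, `age_group`, `halluc_recency`) are hypothetical. The sort variables are assumed to be complete. A record that is missing the item and comes before any donor is left as `None`, whereas a production hot deck would normally seed the deck with a starting value.

```python
def sequential_hot_deck(records, sort_keys, item):
    """Fill missing values (None) of `item` with the last nonmissing value
    seen on the file sorted by `sort_keys`. Returns a new, sorted list."""
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in sort_keys))
    last_seen = None
    filled = []
    for rec in ordered:
        rec = dict(rec)  # copy so the caller's records are not modified
        if rec[item] is None:
            rec[item] = last_seen  # donor is the previous nonmissing record
        else:
            last_seen = rec[item]
        filled.append(rec)
    return filled


# Hypothetical records: sort variables complete, target item has gaps.
records = [
    {"id": 1, "alc_recency": 2, "age_group": 3, "halluc_recency": 3},
    {"id": 2, "alc_recency": 1, "age_group": 1, "halluc_recency": 1},
    {"id": 3, "alc_recency": 2, "age_group": 3, "halluc_recency": None},
    {"id": 4, "alc_recency": 1, "age_group": 1, "halluc_recency": None},
]
filled = sequential_hot_deck(records, ["alc_recency", "age_group"], "halluc_recency")
for rec in filled:
    print(rec["id"], rec["halluc_recency"])
```

Because the file is sorted on variables related to drug use and demographics, the donor is usually a respondent with similar characteristics. This is why the method works reasonably well when item nonresponse is low.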

Missing data for the variables on frequency of use
of alcohol, cocaine, and marijuana in the past 12 months were statistically
imputed using a regression-based method of imputation. This imputation
procedure involves estimating a polytomous logistic model using a number
of respondent characteristics. The explanatory variables used in these
models included those variables used in the recency-of-use hot-deck imputation
procedure, such as recency of use of alcohol, marijuana, and cocaine;
age; gender; Hispanic origin; race/ethnicity; and State. After the model
parameters were estimated, the resulting model was used to predict a categorical
value for each frequency-of-use item nonresponse. The model-based imputation
procedure is appropriate for alcohol, cocaine, and marijuana frequencies
for two reasons: (a) the relative amount of nonresponse or faulty responses
to these questions is larger than what is observed for the recency-of-use
and other frequency-of-use items, and (b) the model-based procedure allows
a greater number of statistically significant explanatory variables to
contribute to imputing a response compared to what is possible with the
hot-deck method.
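A regression-based imputation of this type can be sketched in two steps. First fit a polytomous (multinomial) logistic model on respondents who answered the frequency item. Then assign each item nonrespondent a category predicted from the model. The sketch below fits the model by plain batch gradient descent for self-containment. It uses one hypothetical centred covariate instead of the report's full set of explanatory variables, and it assumes the predicted category is the most probable one. The passage does not say whether the NHSDA took the most probable category or drew a category at random from the predicted probabilities.

```python
import math


def softmax(scores):
    """Convert linear scores to category probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(beta, x):
    """Category probabilities for one covariate vector x (intercept added)."""
    x1 = [1.0] + list(x)
    return softmax([sum(b * v for b, v in zip(row, x1)) for row in beta])


def fit_multinomial_logit(X, y, n_classes, lr=0.5, n_iter=2000):
    """Fit a polytomous (multinomial) logistic model by batch gradient descent.

    X: covariate lists for respondents with observed answers
    y: observed category codes 0..n_classes-1
    Returns one coefficient row (intercept first) per category.
    """
    n_coef = len(X[0]) + 1
    beta = [[0.0] * n_coef for _ in range(n_classes)]
    for _ in range(n_iter):
        grad = [[0.0] * n_coef for _ in range(n_classes)]
        for x, label in zip(X, y):
            x1 = [1.0] + list(x)
            probs = predict_probs(beta, x)
            for k in range(n_classes):
                # Gradient of the negative log-likelihood for category k.
                err = probs[k] - (1.0 if label == k else 0.0)
                for j in range(n_coef):
                    grad[k][j] += err * x1[j]
        for k in range(n_classes):
            for j in range(n_coef):
                beta[k][j] -= lr * grad[k][j] / len(X)
    return beta


def impute_category(beta, x):
    """Impute the most probable category for an item nonrespondent."""
    probs = predict_probs(beta, x)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical item respondents: one centred covariate and three
# frequency categories (0 = low, 1 = medium, 2 = high).
X = [[-1.1], [-1.0], [-0.9], [-0.1], [0.0], [0.1], [0.9], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
beta = fit_multinomial_logit(X, y, n_classes=3)

# Impute categories for three hypothetical nonrespondents.
print([impute_category(beta, [x]) for x in (-1.0, 0.0, 1.0)])
```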

The main advantage of imputation is that it simplifies the calculation of estimates. Its use can reduce the bias caused by missing data and thus improve the accuracy of estimates. In the 1998 NHSDA, however, the potential impact of bias due to item nonresponse and the impact of imputation on the estimates themselves were quite small because item nonresponse was less than 2% for most of the drug use recency questions.

- The NHSDA, like all sample surveys, has an inherent
degree of statistical uncertainty based on the sample design. NHSDA estimates
are subject to uncertainties of two types: sampling errors and nonsampling
errors. Examples of nonsampling errors are recording mistakes, coding errors,
nonresponse, differences in respondents' interpretations of questions,
and purposely false answers. The effects of nonsampling errors on the estimates
cannot normally be quantified; however, rigorous attempts are made to minimize
their occurrence through pretesting, interviewer training, interview verification,
coder training, coding verification, and other quality control measures.

Sampling errors denote the random fluctuations that occur in estimates based on samples drawn from a population; such variations can be eliminated only by conducting a complete census. Using the same procedures, different samples drawn from the same population would be expected to result in different estimates. Many of these observed estimates would differ to some degree from the "true" population value, and these differences are due to sampling error. The variance of an estimate is the basic measure of this type of error.

To account for the complex features of the NHSDA sample design (such as unequal selection probabilities, stratification, and clustering), the variance estimates of the NHSDA drug use statistics are computed for this report using a survey data analysis software package called SUDAAN.⁵ Estimates of means or proportions, such as drug use prevalence, take the form of nonlinear statistics whose variances cannot be expressed in closed form. Variance estimation for nonlinear statistics in SUDAAN is based on a first-order Taylor series approximation of the deviations of estimates from their expected values. The resulting variance estimates are approximately unbiased for sufficiently large sample sizes.
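The following sketch shows how a Taylor series (linearization) variance estimator works for a weighted proportion. It is a textbook version and does not reproduce SUDAAN's internals. The weighted proportion is a ratio of two weighted totals. Its linearized value for each respondent is w(y − p̂)/Σw. These values are summed to first-stage unit (PSU) totals, and the variance is taken from the spread of the PSU totals within each stratum. The sketch assumes a with-replacement approximation for PSUs and at least two PSUs per stratum, and the data layout is hypothetical.

```python
from collections import defaultdict


def taylor_variance_proportion(y, w, strata, psus):
    """Linearization variance of a weighted proportion (ratio estimator).

    y: 0/1 indicator of use; w: final weights;
    strata, psus: design identifiers for each respondent (hypothetical).
    Assumes a with-replacement PSU approximation, >= 2 PSUs per stratum.
    """
    total_w = sum(w)
    p_hat = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized value for each respondent, accumulated to PSU totals.
    psu_totals = defaultdict(float)
    for yi, wi, h, i in zip(y, w, strata, psus):
        psu_totals[(h, i)] += wi * (yi - p_hat) / total_w
    # Group PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return p_hat, variance


# Hypothetical design: 2 strata, 2 PSUs each, a few respondents per PSU.
y = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
w = [1.5, 1.0, 2.0, 1.0, 1.2, 0.8, 1.0, 2.5, 1.0, 1.0]
strata = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 2, 1, 1, 2, 2, 2]
p_hat, v = taylor_variance_proportion(y, w, strata, psus)
print(round(p_hat, 4), round(v ** 0.5, 4))
```

As a check, suppose all weights are equal, there is a single stratum, and each respondent is its own PSU. In that case the formula reduces to the familiar s²/n for a sample mean.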

For a given variance estimate, the associated design effect is the ratio of the design-based variance estimate to the variance that would have been obtained from a simple random sample of the same size. Because the combined design features of stratification, clustering, and unequal weighting are expected to increase the variance estimates, the design effect should virtually always be greater than one. For prevalence rates near zero, however, the variance-inflating effects of unequal weighting and clustering are sometimes underestimated, resulting in design effects of less than one. Because the corresponding variance estimates are then considered anomalously small, two other variance estimates are computed as quality control measures. The first of these other variance estimates is based only on the stratification and unequal weighting effects, and the second is based on simple random sampling. The variance estimate used for obtaining confidence intervals is then the maximum of these three estimates.
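The design effect and the "maximum of three" quality-control rule can be sketched in a few lines. This sketch assumes the simple random sampling variance of a proportion is p(1 − p)/n. It takes the full-design and stratification-plus-weighting variances as inputs rather than computing them; one hypothetical way to obtain the latter is to rerun a variance routine with each respondent treated as its own PSU. All numbers below are made up.

```python
def srs_variance(p, n):
    """Variance of a proportion under simple random sampling (assumes p(1-p)/n)."""
    return p * (1.0 - p) / n


def design_effect(v_design, p, n):
    """Design-based variance divided by the SRS variance for the same n."""
    return v_design / srs_variance(p, n)


def variance_for_ci(v_full, v_strat_weight, p, n):
    """Quality-control rule: use the largest of the three variance estimates.

    v_full:         variance reflecting stratification, clustering, and weighting
    v_strat_weight: variance reflecting stratification and unequal weighting only
    """
    return max(v_full, v_strat_weight, srs_variance(p, n))


# Hypothetical rare-drug estimate whose full design variance looks too small.
p, n = 0.002, 1000
v_full, v_strat_weight = 1.5e-6, 1.8e-6
print(round(design_effect(v_full, p, n), 3))  # a design effect below one
print(variance_for_ci(v_full, v_strat_weight, p, n))
```

Here the design effect is below one. In that case the SRS variance is the largest of the three estimates, so the sketch falls back to it.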

The 95% confidence intervals for the drug use proportions and corresponding population estimates are constructed based on the logit transformation. Because the drug use proportions in the NHSDA are frequently small, the logit transformation has been used for this report to yield asymmetric interval boundaries. These asymmetric intervals are more balanced with respect to the probability that the interval is above or below the true population value than is the case for standard symmetric confidence intervals.

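To illustrate the method, let

$$L = \operatorname{logit}(p) = \ln\left[\frac{p}{1-p}\right],$$

where "*ln*" denotes the natural logarithm, and let the lower and upper 95% confidence limits for *L* be

$$A = L - 1.96\left(\frac{\sqrt{\operatorname{var}(p)}}{p(1-p)}\right), \qquad B = L + 1.96\left(\frac{\sqrt{\operatorname{var}(p)}}{p(1-p)}\right),$$

where the quantity in parentheses that is multiplied
by 1.96 estimates the standard error (SE) of *L*. Applying the inverse
logistic transformation to the confidence interval endpoints, A and B,
yields a 95% confidence interval for the proportion, *P*, as

$$\frac{1}{1+\exp(-A)} \le P \le \frac{1}{1+\exp(-B)},$$

where "*exp*" denotes the inverse log transformation.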
The lower and upper confidence interval endpoints for percentage estimates
are obtained by multiplying the lower and upper endpoints for proportions
by 100. The confidence interval for the corresponding population estimate
is obtained by multiplying the confidence interval endpoints by the estimated
number of individuals in the population subgroup constituting the base
or denominator of the associated proportion.
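The logit interval can be sketched as a short function with the following steps:

1. Take the logit of p.
2. Estimate its standard error as SE(p)/[p(1 − p)].
3. Step ±1.96 standard errors on the logit scale.
4. Map both limits back with the inverse logistic function.
5. Multiply the limits by the subgroup's population total to get limits for the number of users.

The proportion, SE, and base population below are hypothetical inputs, not values from the tables.

```python
import math


def inverse_logit(t):
    """Inverse logistic transformation: exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))


def logit_ci(p, se, z=1.96):
    """Asymmetric 95% CI for a proportion via the logit transformation."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    L = math.log(p / (1.0 - p))        # L = logit of p
    se_L = se / (p * (1.0 - p))        # estimated SE of L
    A, B = L - z * se_L, L + z * se_L  # limits on the logit scale
    return inverse_logit(A), inverse_logit(B)


def population_ci(p, se, base_population, z=1.96):
    """Scale the proportion limits by the subgroup's population total."""
    lo, hi = logit_ci(p, se, z)
    return lo * base_population, hi * base_population


# Hypothetical inputs: p = 0.33 with SE 0.007, base of 200,000 (thousands).
lo, hi = logit_ci(0.33, 0.007)
print(f"{100 * lo:.1f}% to {100 * hi:.1f}%")
print(population_ci(0.33, 0.007, 200_000))
```

For p below 0.5, the upper limit lies farther from p than the lower limit does. This is the asymmetry the logit transformation is meant to produce.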

- The precise interpretation of the 95% confidence
interval is as follows: If repeated samples of identical design are drawn
from the population, and the sample estimate and corresponding upper and
lower confidence limits are calculated for each sample, then the true population
value is covered by the confidence intervals of, on average, 95 of 100
samples.
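The repeated-sampling interpretation can be checked with a small simulation. The simulation draws many samples from a population with a known rate, builds a 95% logit interval from each sample, and counts how often the interval covers the true rate. The sketch assumes simple random sampling with an SRS standard error, not the NHSDA design. The true rate, sample size, and number of replications are made up. Samples with no users or all users are skipped because their logit is undefined.

```python
import math
import random


def logit_ci(p, se, z=1.96):
    """Logit-transformed confidence interval for a proportion."""
    L = math.log(p / (1.0 - p))
    se_L = se / (p * (1.0 - p))
    return (1.0 / (1.0 + math.exp(-(L - z * se_L))),
            1.0 / (1.0 + math.exp(-(L + z * se_L))))


def simulate_coverage(true_p=0.10, n=500, reps=2000, seed=1998):
    """Share of repeated simple random samples whose 95% CI covers true_p."""
    rng = random.Random(seed)
    covered = valid = 0
    for _ in range(reps):
        # Draw one sample of n persons and count users.
        x = sum(rng.random() < true_p for _ in range(n))
        if x in (0, n):
            continue  # logit undefined for p_hat of 0 or 1; skipped here
        p_hat = x / n
        se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # SRS standard error
        lo, hi = logit_ci(p_hat, se)
        valid += 1
        covered += lo <= true_p <= hi
    return covered / valid


print(simulate_coverage())
```

The printed share should be close to 0.95. It will not match exactly, because both the simulation and the interval are approximations.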

For tables in this report, each estimate of the number of users of the drug in the defined subgroup (as well as its corresponding estimated percentage of the subgroup's total population) is accompanied by an upper and lower confidence limit. For example, in the lower portion of Table 3A, the "observed estimate" for the total number of people who have "ever used" marijuana is 72,070,000. The "lower limit" is 69,122,000, and the "upper limit" is 75,080,000. The interpretation of these estimates is that one can be 95% confident that the total number of people who have ever tried marijuana at least once in their lifetime lies between 69,122,000 and 75,080,000, with the best 1998 NHSDA estimate being 72,070,000. The corresponding percentage estimates for the lower and upper confidence limits are 31.6% and 34.4%, respectively, with the best estimate being 33.0%.

As in other publications in the NHSDA series, estimates with low precision are not reported. The criterion used for suppressing estimates is based on the size of the estimate and the relative standard error (RSE) of the estimate. The RSE is defined as the standard error of an estimate divided by the estimate itself. Specifically, cell percentages and corresponding estimates of numbers of users are suppressed if at least one of the following three criteria is met:

(1) *p* < .0005 or *p* ≥ .9995;

(2) *RSE*[-*ln*(*p*)] > 0.175 when *p* ≤ 0.5; or

(3) *RSE*[-*ln*(1-*p*)] > 0.175 when *p* > 0.5,

where

$$RSE[-\ln(p)] = \frac{SE(p)/p}{-\ln(p)}, \qquad RSE[-\ln(1-p)] = \frac{SE(p)/(1-p)}{-\ln(1-p)}.$$
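The three suppression criteria translate directly into a small check. The sketch computes the RSE of −ln(p) as [SE(p)/p]/[−ln(p)] and the RSE of −ln(1 − p) as [SE(p)/(1 − p)]/[−ln(1 − p)], which are first-order (delta-method) approximations. The (p, SE) pairs below are hypothetical.

```python
import math


def rse_neg_log(p, se):
    """RSE of -ln(p): [SE(p)/p] / [-ln(p)]."""
    return (se / p) / (-math.log(p))


def rse_neg_log_complement(p, se):
    """RSE of -ln(1-p): [SE(p)/(1-p)] / [-ln(1-p)]."""
    return (se / (1.0 - p)) / (-math.log(1.0 - p))


def suppress(p, se, limit=0.175):
    """True if an estimate meets any of the three suppression criteria."""
    if p < 0.0005 or p >= 0.9995:                 # criterion (1)
        return True
    if p <= 0.5:
        return rse_neg_log(p, se) > limit         # criterion (2)
    return rse_neg_log_complement(p, se) > limit  # criterion (3)


# Hypothetical (p, SE) pairs.
for p, se in [(0.0003, 0.0001), (0.33, 0.007), (0.01, 0.01), (0.9, 0.01)]:
    print(p, se, suppress(p, se))
```

Criterion (1) is tested first, so the logarithms are only evaluated for p strictly between .0005 and .9995. This avoids taking the log of zero.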

4 These 1998 population projections were based on the 1990 U.S. Census counts.

5 Shah, B.V., Barnwell, B.G., & Bieler, G.S. (1997). *SUDAAN user's manual: Version 7.5*. Research Triangle Park, NC: Research Triangle Institute.

This page was last updated on June 03, 2008.