1997 National Household Survey on Drug Abuse:  Population Estimates



The primary objective of the National Household Survey on Drug Abuse (NHSDA) is to measure the prevalence of use of illicit drugs, alcohol, tobacco products, and nonmedical use of prescription drugs in the United States. The 1997 NHSDA is the seventeenth in the series, which began in 1971.

The Population Estimates report is published to provide the drug abuse prevention, treatment, and research communities, and other interested parties, with timely data on current substance use prevalence measures. A companion volume, the Main Findings report, is issued at a later date and presents an expanded analysis of the data, including information on drug and alcohol use trends; demographic correlates of use of illicit drugs, alcohol, and tobacco; patterns and problems of drug use; perceived harmfulness of drug use, etc. Another NHSDA report containing preliminary results from the NHSDA is produced for distribution when the survey results are first released each year. Special analytic reports are also periodically produced on topics of current interest (e.g., drug use among employed people, drug use and family structure).

Estimates presented in Sections III and IV of this report are based on a questionnaire and estimation methodology introduced in 1994 and continued in 1995 through 1997. [ Office of Applied Studies. (1995). National Household Survey on Drug Abuse: Population estimates 1994 (DHHS Publication No. SMA 95-3063, p. 9). Rockville, MD: Substance Abuse and Mental Health Services Administration.] Due to the effect this new methodology may have on the magnitude of estimates, comparisons with NHSDA data prior to 1994 should be made with caution. For additional detailed information on the 1994 questionnaire modification and its implications, see SAMHSA Publication #SMA-96-3084 entitled, The Development and Implementation of a New Data Collection Instrument for the 1994 National Household Survey on Drug Abuse.


The survey was based on a stratified, multi-stage area probability sample. For the 1997 study, 123 primary sample units (PSUs) were selected at the first stage of sampling. Within each PSU, area segments were selected with unequal probability proportional to a composite size measure that was designed to overrepresent concentrated Hispanic and black neighborhoods, as well as younger individuals. Dwelling units were selected from each sample segment. The target population included all civilian residents of households (including civilians residing on military installations) and noninstitutional group quarters (e.g., college dormitories, homeless shelters, rooming houses) 12 years old and older. Persons excluded from the universe include military personnel on active duty, transient populations (such as homeless people who do not reside in shelters), and residents of institutional group quarters (e.g., jails and hospitals). Data collection was continuous over the calendar year.

During the 1997 study, data from a special supplemental sample were collected beginning with the second quarter of data collection. This supplemental sample was designed to increase the number of respondents who reside in California and Arizona, which allows analysts to form relatively precise direct survey estimates for these States. In summary, 15 of the 123 PSUs were located in California and 18 PSUs were located in Arizona. The final person-level sample weights for NHSDA respondents were adjusted to account for this supplemental sample, thereby eliminating any potential bias in estimates that might otherwise exist.

Survey data were collected through personal visits to each selected residence. Introductory letters explaining the survey were mailed to each residence prior to the interviewer's visit. Upon arrival, the field representative conducted a short voluntary screening procedure with any resident of the household 18 years old or older who was capable of providing information on the age, race/ethnicity, sex, and marital status of each resident 12 years old or older. This information was used in a random selection procedure that determined whether any residents were eligible for an in-depth interview (either one, two, or no individuals were selected). The interviewer had no control over this selection procedure. The 1997 within-household person selection probabilities were based on the race/ethnicity of the head of household and the ages of each household member. Selected individuals were then asked if they would complete a voluntary interview. NHSDA field representatives conducted the interviews using a paper and pencil questionnaire that included both interviewer-administered questions and self-administered answer sheets (for collection of sensitive information). All screening and interview responses are kept confidential.

In 1997 a total of 31,290 eligible dwelling unit members were selected for an interview; of these, a total of 24,505 interviews were completed (5,223 interviews were conducted in California, 4,577 interviews were conducted in Arizona and 14,705 interviews were conducted in the remaining States). Total response rates for screening and interviewing were 92.7 percent and 78.3 percent, respectively.
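The counts in the preceding paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative. It computes a simple unweighted ratio, and the published response rates may have been computed differently (for example, with weights).

```python
# Counts quoted in the text for the 1997 NHSDA.
selected_persons = 31_290
completed_by_area = {
    "California": 5_223,
    "Arizona": 4_577,
    "Remaining States": 14_705,
}

# Total completed interviews is the sum over the three areas.
completed_total = sum(completed_by_area.values())

# Unweighted interview completion ratio (an assumption about how the
# published 78.3 percent was derived; it may instead be a weighted rate).
interview_rate = completed_total / selected_persons

print(completed_total)
print(f"{100 * interview_rate:.1f} percent")
```

The area counts sum to the reported 24,505 completed interviews, and the unweighted ratio rounds to the reported 78.3 percent.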


For convenience, both rate estimates and corresponding population estimates are included in each table. Population estimates are presented in thousands (e.g., a population estimate of 430 represents 430,000 persons). Each "observed estimate" is followed by its 95 percent confidence interval in parentheses. Estimates are provided for each of the following time periods when use of illicit drugs, alcohol, and tobacco occurred: (a) use in the lifetime (Ever Used), (b) use in the past year (Used Past Year), and (c) use in the past month (Used Past Month), also referred to as "current use." These estimates have been obtained by weighting the data to reflect current population totals for various demographic subgroup populations.

Development of Weights

An analysis weight is calculated for each completed interview to reflect selection probabilities and to compensate for nonresponse and undercoverage. A poststratification adjustment is made to force the respondent weight totals to equal U.S. Bureau of the Census projections of the civilian, noninstitutionalized population. The poststratification totals [ These 1997 population projections are based on the 1990 U.S. Census counts.] are obtained from the National Estimates and Projections Branch of the U.S. Bureau of the Census and are classified by age group, sex, race, and Hispanic origin. The poststratification totals obtained from the U.S. Bureau of the Census were adjusted using State-level population projections that were also obtained from the U.S. Bureau of the Census. In addition to the demographic control totals, this allowed a three-level "State" variable (i.e., California, Arizona, and all other States) to be used in the poststratification in order to account for the special supplemental sample. In general, the interview samples from each quarter are poststratified to one-fourth of the projected population totals. These totals represent the population at the midpoint of each quarter's data collection period (the 15th day of February, May, August, and November). The resulting quarterly analysis weights sum to the average of the four quarter-specific projections. The final analysis weight can be viewed as the number of population members that each respondent represents.
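The quarterly poststratification step described above can be sketched as a ratio adjustment within each quarter and poststratum. This is a minimal illustration, not the survey's production weighting code: the function name, the record layout, and the toy projections are all hypothetical, and the real procedure also includes nonresponse and other adjustments not shown here.

```python
from collections import defaultdict

def poststratify(records, control_totals, n_quarters=4):
    """Scale weights so each (quarter, cell) sums to 1/n_quarters of its projection.

    records: list of dicts with keys 'quarter', 'cell', 'weight' (hypothetical layout).
    control_totals: dict mapping (quarter, cell) to the projected population
        at the midpoint of that quarter.
    Returns a list of adjusted weights in the same order as records.
    """
    # Sum the pre-adjustment weights within each quarter-by-cell group.
    weight_sums = defaultdict(float)
    for r in records:
        weight_sums[(r["quarter"], r["cell"])] += r["weight"]

    adjusted = []
    for r in records:
        key = (r["quarter"], r["cell"])
        target = control_totals[key] / n_quarters  # one-fourth of the projection
        adjusted.append(r["weight"] * target / weight_sums[key])
    return adjusted

# Toy data: one poststratum, two respondents per quarter (all values invented).
toy_records = [
    {"quarter": q, "cell": "A", "weight": w}
    for q in (1, 2, 3, 4)
    for w in (100.0, 300.0)
]
toy_controls = {(1, "A"): 1000.0, (2, "A"): 1010.0, (3, "A"): 1020.0, (4, "A"): 1030.0}

final_weights = poststratify(toy_records, toy_controls)
annual_total = sum(final_weights)
average_projection = sum(toy_controls.values()) / 4
```

In this toy case the adjusted weights summed over all four quarters equal the average of the four quarter-specific projections, which is the property stated in the text.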

Tables 1A, 1B, and 1C present the sample sizes and U.S. population totals on which the population estimates of drug use are based. The population figures are estimates of the civilian, noninstitutionalized population and are generated by summing the individual final analysis weights of the respondents belonging to each population category.
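As a rough illustration of how population totals and prevalence rates follow from the final analysis weights, the sketch below sums weights by subgroup. The function name, field names, and toy respondents are hypothetical; the weights are invented and far smaller than real survey totals.

```python
from collections import defaultdict

def weighted_estimates(respondents):
    """Return population total, user total, and rate for each subgroup.

    respondents: list of dicts with keys 'group', 'weight', and
    'used_past_month' (a hypothetical record layout).
    """
    population = defaultdict(float)
    users = defaultdict(float)
    for r in respondents:
        population[r["group"]] += r["weight"]      # sum of final analysis weights
        if r["used_past_month"]:
            users[r["group"]] += r["weight"]       # weighted count of users
    return {
        g: {
            "population": population[g],
            "users": users[g],
            "rate": users[g] / population[g],
        }
        for g in population
    }

# Invented toy respondents.
toy = [
    {"group": "18 to 25", "weight": 9000.0, "used_past_month": True},
    {"group": "18 to 25", "weight": 11000.0, "used_past_month": False},
    {"group": "26 to 34", "weight": 12000.0, "used_past_month": False},
    {"group": "26 to 34", "weight": 8000.0, "used_past_month": True},
    {"group": "26 to 34", "weight": 20000.0, "used_past_month": False},
]
estimates = weighted_estimates(toy)

# The report presents population estimates in thousands.
users_in_thousands = {g: e["users"] / 1000 for g, e in estimates.items()}
```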

Adjusting for Nonresponse Through Imputation

Population estimates are based on either the total sample or all cases in a subgroup, including some cases where missing data for some recency-of-use and frequency-of-use variables were replaced with logically or statistically imputed (i.e., replaced) values. The interview classification "minimally complete" (a status necessary for a case to be included in the database) requires that data on the recency of use of alcohol, marijuana, and cocaine be present. To determine case completeness, an editing procedure is employed to replace missing data for these substances based on information supplied by the respondent elsewhere in the questionnaire. After this editing, case completeness is determined. When necessary, additional logical imputation is also done to replace other inconsistent, missing, or otherwise faulty data.

After editing, any data still missing for recency-of-use questions (for drugs other than alcohol, cocaine, and marijuana) are statistically imputed using a technique known as "hot deck imputation." The first step in this procedure involves sorting the data file progressively using data on recency of use of alcohol, marijuana, and cocaine; age; sex; Hispanic origin; and race. The hot deck imputation procedure replaces a missing item on a particular record by the last encountered nonmissing response for that item (from the previous record) on the sorted database. The hot deck imputation procedure is appropriate for recency-of-use variables because the level of item nonresponse is low.
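The sequential hot deck described above can be sketched as a sort followed by a carry-forward of the last nonmissing response. This is a simplified illustration under assumed record and field names; it is not the survey's actual imputation program, and it leaves a value missing when no earlier donor exists in the sorted file.

```python
def hot_deck_impute(records, item, sort_keys):
    """Fill missing values of `item` with the last nonmissing value in sort order.

    records: list of dicts (hypothetical layout); a missing value is None.
    sort_keys: field names used to sort the file so that neighboring
        records are similar (e.g., recency of alcohol use, age, sex).
    Returns a new sorted list of copied records with imputed values.
    """
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in sort_keys))
    last_response = None
    result = []
    for r in ordered:
        filled = dict(r)  # copy so the input records are not modified
        if filled[item] is None:
            if last_response is not None:
                filled[item] = last_response  # donate the previous response
        else:
            last_response = filled[item]
        result.append(filled)
    return result

# Invented toy file: 'lsd' holds a recency-of-use code, None means missing.
toy = [
    {"id": 4, "alcohol": 2, "age": 30, "lsd": None},
    {"id": 1, "alcohol": 1, "age": 20, "lsd": 3},
    {"id": 3, "alcohol": 2, "age": 19, "lsd": 4},
    {"id": 2, "alcohol": 1, "age": 21, "lsd": None},
]
imputed = hot_deck_impute(toy, item="lsd", sort_keys=["alcohol", "age"])
by_id = {r["id"]: r["lsd"] for r in imputed}
```

Sorting first is what makes the donor a plausible match: the record that supplies the value sits immediately before the recipient on the sort variables.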

Missing data for the variables on frequency of use in the past 12 months are statistically imputed using a regression-based method of imputation. This imputation procedure involves estimating a polytomous logistic model using a number of respondent characteristics. The explanatory variables used in these models include those variables used in the recency-of-use hot deck imputation procedure, such as recency of use of alcohol, marijuana, and cocaine; age; sex; Hispanic origin; and race. After the model parameters are estimated, the resulting model is used to predict a categorical value for each frequency-of-use item nonresponse. The model-based imputation procedure is appropriate for two reasons: (a) the relative amount of nonresponse or faulty responses to these questions is larger than what is observed for the recency-of-use items, and (b) the model-based procedure allows a greater number of statistically significant explanatory variables to contribute to imputing a response compared to what is possible with the hot deck method.

The main advantage of imputation is that it simplifies calculation of the estimates. Its use can reduce the bias caused by missing data and thus improve the accuracy of estimates. In this survey, however, the potential impact of bias due to item nonresponse and the impact of imputation on the estimates themselves are quite small since item nonresponse is less than 2 percent for the drug use recency questions.

Sampling Error and Confidence Intervals

The NHSDA, like all sample surveys, has an inherent degree of statistical uncertainty based on the sample design. NHSDA estimates are subject to uncertainties of two types: sampling errors and nonsampling errors. Examples of nonsampling errors are recording mistakes, coding errors, nonresponse, differences in respondents' interpretations of questions, and purposely false answers. The effects of nonsampling errors on the estimates cannot normally be quantified; however, rigorous attempts are made to minimize their occurrence through pretesting, interviewer training, interview verification, coder training, coding verification, and other quality control measures.

Sampling errors denote the random fluctuations that occur in estimates based on samples drawn from a population; such variations can be eliminated only by conducting a complete census. Using the same procedures, different samples drawn from the same population would be expected to result in different estimates. Many of these observed estimates would differ to some degree from the "true" population value, and these differences are due to sampling error. The variance of an estimate is the basic measure of this type of error.

To account for the complex features of the NHSDA sample design (such as unequal selection probabilities, stratification, and clustering), the variance estimates of the NHSDA drug use statistics are computed for this report using the survey data analysis software package, SUDAAN. [ Shah, B.V., Barnwell, B.G., & Bieler, G.S. (1997). SUDAAN User's Manual, Release 7.5. Research Triangle Park, NC: Research Triangle Institute.] Estimates of means or proportions, such as drug use prevalence, take the form of nonlinear statistics where the variances cannot be expressed in closed form. Variance estimation for nonlinear statistics in SUDAAN is based on a first-order Taylor series approximation of the deviations of estimates from their expected values. The resulting variance estimates are approximately unbiased for sufficiently large sample sizes.

For a given variance estimate, the associated design effect is the ratio of the design-based variance estimate over the variance that would have been obtained from a simple random sample of the same size. Because the combined design features of stratification, clustering, and unequal weighting are expected to increase the variance estimates, the design effect should virtually always be greater than one. However, for prevalence rates near zero, the variance-inflating effects of unequal weighting and clustering are sometimes underestimated, resulting in design effects of less than one. Because the corresponding variance estimates are then considered anomalously small, two other variance estimates are computed as quality control measures. The first of these other variance estimates is based only on the stratification and unequal weighting effects, and the second is based on simple random sampling. The variance estimate used for obtaining confidence intervals is then the maximum of these three estimates.
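The design effect and the "maximum of three variance estimates" rule can be sketched as follows. The function names are hypothetical, and the simple random sampling variance is taken to be p(1-p)/n, which is an assumption; the report does not give its exact formula. The design-based and stratification-plus-weighting variances would come from survey software and are supplied here as invented inputs.

```python
def srs_variance(p, n):
    """Variance of a proportion under simple random sampling (assumed form p(1-p)/n)."""
    return p * (1 - p) / n

def design_effect(design_variance, p, n):
    """Ratio of the design-based variance to the simple random sampling variance."""
    return design_variance / srs_variance(p, n)

def variance_for_interval(design_variance, strat_weight_variance, p, n):
    """Pick the largest of the three variance estimates described in the text."""
    return max(design_variance, strat_weight_variance, srs_variance(p, n))

# Invented example: a rare behavior where the design-based variance looks too small.
p_hat, n = 0.01, 1000
design_var = 8.0e-6          # hypothetical full design-based estimate
strat_weight_var = 9.0e-6    # hypothetical stratification and weighting only

deff = design_effect(design_var, p_hat, n)
chosen = variance_for_interval(design_var, strat_weight_var, p_hat, n)
```

Here the design effect is below one, so the rule falls back to the larger simple random sampling variance rather than the anomalously small design-based value.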

The 95 percent confidence intervals for the drug use proportions and corresponding population estimates are constructed based on the logit transformation. Because the drug use proportions in the NHSDA are frequently small, the logit transformation has been used for this report to yield asymmetric interval boundaries. These asymmetric intervals are more balanced with respect to the probability that the interval is above or below the true population value than is the case for standard symmetric confidence intervals.

To illustrate the method, let

p = estimated proportion,

var(p) = variance estimate of p,

q = 1-p,

L = logit of p = ln [p/(1-p)], where "ln" denotes the natural logarithm, and

var(L) = var(p)/(pq)^2.

The approximate 95 percent confidence interval for L is then calculated as

\[ L \pm 1.96 \left( \frac{\sqrt{\mathrm{var}(p)}}{pq} \right) = (A, B), \]

where the quantity in parentheses that is multiplied by 1.96 estimates the standard error (SE) of L. Applying the inverse logistic transformation to the confidence interval endpoints, A and B, yields a 95 percent confidence interval for the proportion, P, as

\[ \left( \frac{1}{1 + \exp(-A)},\ \frac{1}{1 + \exp(-B)} \right) = (P_{lower}, P_{upper}), \]

where "exp" denotes the inverse log transformation. The lower and upper confidence interval endpoints for percentage estimates are obtained by multiplying the lower and upper endpoints for proportions by 100. The confidence interval for the corresponding population estimate is obtained by multiplying the confidence interval endpoints by the estimated number of individuals in the population subgroup constituting the base or denominator of the associated proportion.
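The logit-based interval defined above translates directly into code. The sketch below follows the stated definitions of L, var(L), A, and B; the function names and the example inputs (a proportion of 0.02 with variance 0.00001 and a base population of 1,000,000) are invented for illustration and are not taken from the report's tables.

```python
import math

def logit_confidence_interval(p, var_p, z=1.96):
    """Asymmetric confidence interval for a proportion via the logit transformation."""
    q = 1 - p
    logit = math.log(p / q)                 # L = ln[p / (1 - p)]
    se_logit = math.sqrt(var_p) / (p * q)   # sqrt(var(L)) with var(L) = var(p) / (pq)^2
    a = logit - z * se_logit                # lower endpoint on the logit scale
    b = logit + z * se_logit                # upper endpoint on the logit scale
    lower = 1 / (1 + math.exp(-a))          # inverse logistic transformation
    upper = 1 / (1 + math.exp(-b))
    return lower, upper

def population_interval(p, var_p, base_population):
    """Scale the proportion interval by the subgroup's estimated population."""
    lower, upper = logit_confidence_interval(p, var_p)
    return lower * base_population, upper * base_population

# Invented inputs for illustration only.
p_hat, var_hat = 0.02, 0.00001
lower, upper = logit_confidence_interval(p_hat, var_hat)
pop_lower, pop_upper = population_interval(p_hat, var_hat, 1_000_000)
```

For a small proportion such as this one, the upper endpoint lies farther from the estimate than the lower endpoint does, which is the asymmetry the report describes, and the lower endpoint cannot fall below zero.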

The precise interpretation of the 95 percent confidence interval is as follows: If repeated samples of identical design are drawn from the population, and the sample estimate and corresponding upper and lower confidence limits are calculated for each sample, then the true population value is covered by the confidence intervals of, on average, 95 of 100 samples.

For tables in this volume, each estimate of the number of users of the drug in the defined subgroup (as well as its corresponding estimated percentage of the subgroup's total population) is accompanied by an upper and lower confidence limit. For example, in the lower portion of Table 3A, the "observed estimate" for the total number of people who have "ever used" marijuana is 71,112,000. The "lower limit" is 68,207,000, and the "upper limit" is 74,079,000. The interpretation of these estimates is that one can be 95 percent confident that the total number of people who have ever tried marijuana at least once in their lifetime lies between 68,207,000 and 74,079,000, with the best 1997 NHSDA estimate being 71,112,000. The corresponding percentage estimates for the lower and upper confidence limits are 31.5 percent and 34.3 percent, respectively, with the best estimate being 32.9 percent.

As in other publications in the NHSDA series, estimates with low precision are not reported. The criterion used for suppressing estimates is based on the size of the estimate and the relative standard error (RSE) of the estimate. The RSE is defined as the ratio of the standard error of an estimate to the estimate itself. Specifically, cell percentages and corresponding estimates of numbers of users are suppressed if at least one of the following criteria is met:

(1) p < .0005 or p > .9995

(2) RSE[-ln(p)] > 0.175 when p < 0.5

(3) RSE[-ln(1-p)] > 0.175 when p > 0.5

where RSE[-ln(p)] is RSE(p)/[-ln(p)]. For computational purposes, this is equivalent to

(1) p < .0005 or p > .9995

(2) \( \frac{\mathrm{SE}(p)/p}{-\ln(p)} > 0.175 \) when p < 0.5

(3) \( \frac{\mathrm{SE}(p)/(1-p)}{-\ln(1-p)} > 0.175 \) when p > 0.5

where SE(p) is the standard error estimate of p. The log transformation of p is used to provide a more balanced treatment of measuring the quality of small, large, and intermediate p values. The switch to (1-p) for p greater than 0.5 yields a symmetric suppression rule across the range of possible p values. Since the sample sizes for subgroup populations are relatively large, low precision generally occurs only for prevalence rates that are near either 0 percent or 100 percent.
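The suppression rule can be expressed as a short predicate. This is a sketch with a hypothetical function name and invented inputs. The report states the criteria for p < 0.5 and p > 0.5 only; at exactly p = 0.5 the two expressions coincide, so the sketch folds that case into the first branch.

```python
import math

def is_suppressed(p, se_p, threshold=0.175):
    """Return True if an estimated proportion fails the precision criteria."""
    # Criterion 1: proportions extremely close to 0 or 1.
    if p < 0.0005 or p > 0.9995:
        return True
    # Criterion 2: RSE of -ln(p) for p up to 0.5 (p == 0.5 handled here by assumption).
    if p <= 0.5:
        return (se_p / p) / (-math.log(p)) > threshold
    # Criterion 3: the mirror-image rule using (1 - p) for p above 0.5.
    return (se_p / (1 - p)) / (-math.log(1 - p)) > threshold

# Invented examples.
precise = is_suppressed(0.329, 0.007)        # moderate p, small SE
imprecise_low = is_suppressed(0.01, 0.009)   # small p, SE nearly as large as p
imprecise_high = is_suppressed(0.99, 0.009)  # mirror image of the previous case
tiny = is_suppressed(0.0001, 0.00001)        # fails criterion 1 outright
```

Because the rule switches to (1-p) above 0.5, a proportion of 0.99 is treated the same way as a proportion of 0.01 with the same standard error.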


NHSDA drug use prevalence data are presented for each sex; four major age groups (12 to 17, 18 to 25, 26 to 34, and 35 years old and older); three major mutually exclusive racial/ethnic groups, based on respondents' self-classifications (Hispanic in origin, regardless of race; white, not of Hispanic origin; and black, not of Hispanic origin); and four geographic regions. (Those who did not identify themselves as Hispanic, non-Hispanic white, or non-Hispanic black are included in the population totals, but separate estimates are not presented for this "other" category because the sample size is too small [see Table 1B].) Tables are presented separately for the total population, whites, Hispanics, blacks, and geographic region. The four U.S. Bureau of the Census regions are Northeast, North Central, South, and West. For each drug, eight tables are arranged to facilitate group comparisons. Data for the estimated numbers of users in subgroups are arranged in rows and presented by sex for each of four age groups. Data in the remaining seven tables for each racial/ethnic or regional subgroup are presented first by age, then by sex, and finally for the total population.

Time periods of use shown in column headings are "ever used," "used past year," and "used past month." These categories are cumulative, i.e., those who have "used [in the] past month" are also included in the "used [in the] past year" and "ever used" categories. Likewise, those who have "used [in the] past year" are included in the "ever used" estimates.

Other than presenting results by age group and other basic demographic characteristics, no attempt is made in this report to control for potentially confounding factors that might help explain any associations observed. This point is particularly salient with respect to race/ethnicity, which tends to be highly associated with socioeconomic characteristics. Also, the cross-sectional nature of the data precludes any causal interpretations of observed relationships. Nevertheless, data presented in this report are useful for comparing demographic subgroups with respect to drug use rates, regardless of why they differ.

Prevalence Estimates for Specific Drugs and Drug Classes

Section III presents the basic set of drug use prevalence estimates grouped by various drug categories. The first drug category presented is "Any Illicit Drug," which includes any use of marijuana/hashish; cocaine, including crack; inhalants; hallucinogens, including LSD and PCP; heroin; and the nonmedical use of psychotherapeutics, i.e., stimulants, sedatives, tranquilizers, and analgesics. Following the estimates for any illicit use, tables are presented separately for various specific categories of illicit drug use as well as for alcohol, cigarettes, and smokeless tobacco. The small number of respondents reporting these drug use behaviors resulted in low precision for most "used past month" estimates, as well as many other estimates; therefore, less detail is shown for estimates of PCP use, LSD use, heroin use, and needle use.

Frequency of Drug Use Among Past Year Users

Data presented in Section IV are useful for identifying how often a drug is used. After earlier surveys, the estimate of those who had used a drug in the past month was cited by some readers as an estimate of the number of "regular users." This interpretation was not satisfactory because past month users include both those experimenting with the drug as well as regular users. Therefore, information has since been collected on the frequency of use in the past year for marijuana, cocaine, and alcohol. These drugs were selected because of their higher prevalence rates. Frequency of drug use during the past 12 months is classified into three categories: "at least once," "12 or more days," and "51 or more days." The categories are cumulative; those using "51 or more days" are also counted among the "12 or more days" and the "at least once" users. Similarly, those using "12 or more days" are also counted among those who have used "at least once" in the past year. By definition, estimates for those who have used "at least once" are equivalent to those who have "used past year" in earlier tables.
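The cumulative frequency categories can be illustrated with a small classifier. The function name and input (number of days used in the past 12 months) are hypothetical; the thresholds are the ones named in the text.

```python
def frequency_categories(days_used_past_year):
    """Flag each cumulative frequency category a past-year user falls into."""
    return {
        "at least once": days_used_past_year >= 1,
        "12 or more days": days_used_past_year >= 12,
        "51 or more days": days_used_past_year >= 51,
    }

# A respondent reporting 60 days of use is counted in all three categories;
# one reporting 20 days is counted in the first two only.
heavy = frequency_categories(60)
moderate = frequency_categories(20)
```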


The estimates produced in this report should be viewed as approximations based on the best available data. Readers are therefore cautioned to take the following points into account when using or interpreting the data in this report:

·The value of self-reports depends on the honesty and memory of sampled respondents. Research has supported the validity of self-report data in similar contexts. [ Rouse, B.A., Kozel, N.J., & Richards, L.G. (Eds.). (1985). Self-report methods of estimating drug use (NIDA Research Monograph No. 57, DHHS Publication No. ADM 85-1402). Rockville, MD: National Institute on Drug Abuse.], [ Turner, C.F., Lessler, J.T., & Gfroerer, J.C. (Eds.). (1992). Survey measurement of drug use: Methodological studies (DHHS Publication No. ADM 92-1929). Rockville, MD: National Institute on Drug Abuse.] Although NHSDA procedures are designed to encourage truthfulness and recall, as with all studies of this type, some underreporting or overreporting may occur.

·NHSDA drug use prevalence estimates for specific subgroups are sometimes based on modest to small sample sizes, which may lead to substantial sampling error.

·Population projections prepared for the U.S. Bureau of the Census' Current Population Survey (and used in weighting the 1997 NHSDA sample) are subject to error, which increases with the age of the last census.

·The population surveyed consists of noninstitutionalized civilians living in households, college dormitories, homeless shelters, rooming houses, and on military installations. Therefore, this report does not present estimates for some segments of the U.S. population that may contain a substantial proportion of drug users, such as transients not residing in shelters (e.g., users of soup kitchens or residents of street encampments) and those incarcerated in county jails or State and Federal prisons.


This page was last updated on February 05, 2009.