Worker Drug Use and Workplace Policies and Programs: Results from the 1994 and 1997 National Household Survey on Drug Abuse
Although the sample design has changed several times since its inception in 1971, the NHSDA has continued to be representative of the U.S. general population age 12 and older. The 1997 NHSDA employed a multistage area probability sample of 24,505 persons interviewed January-December 1997. Response rates for household screening and for interviewing were 93 percent and 78 percent, respectively. Blacks, Hispanics, young people (i.e., age 12-34), and residents of Arizona and California were oversampled to improve the accuracy of estimates for those populations.
The first stage of sampling in the NHSDA is the selection of 115 primary sampling units (PSUs), each consisting of counties (i.e., administrative subdivisions of States) or groups of counties (e.g., metropolitan areas). Within these PSUs, segments (e.g., city blocks or enumeration districts) are selected. In each of the segments, a listing of all dwellings is made. From the eligible sample units, which can be either households or units within group quarters, sample persons are randomly selected, with unequal probabilities, using a screening procedure carried out by interviewers. Oversampling of certain subpopulations is accomplished by assigning the appropriate selection probabilities at the PSU, segment, and person levels. In 1997, these subpopulations were blacks, Hispanics, young people (i.e., age 12-34), and residents of Arizona and California.
In 1997, the NHSDA sampled segments were allocated equally into four separate samples, one for each 3-month period during the year, so that the survey was essentially continuously in the field. A more complete description of the 1997 NHSDA sample design can be found in the 1997 National Household Survey on Drug Abuse Sample Design Report (Office of Applied Studies, SAMHSA, 1998).
Selection of Primary Sampling Units (PSUs)
The 1997 NHSDA used the same 115 PSUs that were selected for the 1996 NHSDA. These 115 PSUs were selected to represent the nation's total eligible population, including areas of high Hispanic concentration. The PSUs were metropolitan areas, counties, groups of counties, and independent cities. The 115 PSUs comprised 43 certainty PSUs and 72 noncertainty PSUs. The 43 certainty PSUs were metropolitan areas with high Hispanic concentration that have been included in the NHSDA with certainty since 1988. The 72 noncertainty PSUs were selected with probabilities proportional to size and minimal replacement to represent the balance of the nation outside the 43 certainty PSUs.
The noncertainty PSUs were selected with probabilities proportional to a composite size measure. The composite size measure was defined as the sum of racial/ethnic group dwelling counts weighted by the specified racial/ethnic sampling rates. This selection scheme allowed for targeting particular racial/ethnic subpopulation sample sizes. The use of a composite size measure ensured (1) roughly equal sample sizes per sample subarea (the second-stage unit of selection) and (2) roughly equal probabilities of selecting eligible individuals (the fifth or final-stage unit of selection) within race/ethnicity and age group.
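The composite size measure described above can be illustrated with a short Python sketch. The sampling rates, dwelling counts, PSU names, and the number of PSUs drawn are all hypothetical values chosen for illustration, not NHSDA figures. The sketch computes only the size measures and the resulting expected selection probabilities; it does not implement the minimal replacement draw itself.

```python
def composite_size(dwelling_counts, sampling_rates):
    """Sum of group dwelling counts, each weighted by that group's sampling rate."""
    return sum(dwelling_counts[g] * sampling_rates[g] for g in sampling_rates)

# Hypothetical racial/ethnic sampling rates (assumed, not from the report).
rates = {"hispanic": 0.004, "black": 0.003, "other": 0.001}

# Hypothetical dwelling counts for three illustrative PSUs.
psus = {
    "PSU_A": {"hispanic": 5000, "black": 2000, "other": 30000},
    "PSU_B": {"hispanic": 500, "black": 8000, "other": 20000},
    "PSU_C": {"hispanic": 1000, "black": 1000, "other": 40000},
}

sizes = {name: composite_size(counts, rates) for name, counts in psus.items()}
total_size = sum(sizes.values())

# Under probability-proportional-to-size selection of n PSUs, the expected
# selection probability of PSU i is n * size_i / total_size.
n_to_select = 2
selection_prob = {name: n_to_select * s / total_size for name, s in sizes.items()}

for name in psus:
    print(name, round(sizes[name], 1), round(selection_prob[name], 3))
```

Because each group's count is weighted by its sampling rate, a PSU with many dwellings in a heavily sampled group receives a larger size measure than its raw dwelling count alone would suggest.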
Selection of the Second-Stage Sample: Subareas within PSUs
About 96 percent of the subareas (i.e., area segments) selected in the 1997 NHSDA, or 1,860 out of 1,940 total selected segments, were previously used in the 1996 NHSDA. The remaining four percent were new subareas that had to be selected in 1997 because there were insufficient unused listing units in some 1996 subareas to provide the required number of sample listing units per subarea.
Both the 1996 NHSDA sample subareas and the new subareas selected in 1997 were drawn from the 1990 Census files. Within each sample PSU, subareas were defined by combining adjacent blocks to create non-overlapping area segments that contained at least 90 occupied dwellings. The area segments from each stratum of each of the PSUs were selected with probabilities proportional to a composite size measure. For each stratum, a composite size measure equaled a weighted average of the numbers of Hispanic, non-Hispanic black, and non-Hispanic non-black dwelling units with weights proportional to the desired racial/ethnic sampling rates.
Selection of the Third-Stage Sample: Listing Units within Subareas
Projections indicated that screenings had to be completed for approximately 50,000 dwelling units in order to identify sufficient dwelling units to yield Hispanic, non-Hispanic black, and age domain samples of the desired size. Assuming an average 93 percent screening completion rate and a projected average 85 percent listing unit eligibility rate, approximately 63,000 listings had to be selected. A listing unit was ineligible for the study if it was: (a) vacant; (b) a vacation, second, or temporary home; (c) not a dwelling unit; (d) a military facility whose occupants were only military personnel; or (e) an institutional housing facility. A sample of 67,049 dwelling units was selected; of these, 56,912 were determined to be eligible sample units.
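The arithmetic behind the figure of approximately 63,000 listings can be checked with a few lines of Python. The variable names are illustrative; the numbers are those given in the paragraph above.

```python
# Figures taken from the text above.
target_screened = 50_000      # completed screenings needed
screening_rate = 0.93         # assumed average screening completion rate
eligibility_rate = 0.85       # projected listing unit eligibility rate

# Listings must be inflated for both ineligibility and screening nonresponse.
listings_needed = target_screened / (screening_rate * eligibility_rate)
print(round(listings_needed))

# Eligibility rate implied by the dwelling units reported as selected and eligible.
observed_eligibility = 56_912 / 67_049
print(round(observed_eligibility, 3))
```

The implied eligibility rate of the selected dwelling units is close to the 85 percent projection.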
Dwelling unit listings were selected using systematic sampling. The sampled listings were then sent to the field for screening. After first determining that a sampled listing was eligible for the study, the interviewer completed a dwelling roster that listed all residents age 12 and older with their age and race/ethnicity. This roster formed the basis for the within-dwelling sampling of individuals.
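Systematic sampling of listings can be sketched as follows. This is a minimal illustration of the general technique (a random start followed by a fixed skip interval), not the NHSDA production procedure; the function name, the frame of 100 listings, and the sample size of 10 are hypothetical.

```python
import random

def systematic_sample(units, n, rng=None):
    """Select n units from an ordered list using a random start and fixed interval.

    Assumes 0 < n <= len(units).
    """
    rng = rng or random.Random()
    interval = len(units) / n                 # skip interval (may be fractional)
    start = rng.random() * interval           # random start within the first interval
    last = len(units) - 1
    # Guard the index against floating-point edge cases at the end of the list.
    return [units[min(int(start + k * interval), last)] for k in range(n)]

# Hypothetical frame of 100 listed dwelling units, identified by line number.
listings = list(range(1, 101))
sample = systematic_sample(listings, 10, rng=random.Random(1997))
print(sample)
```

Because the selections are spread evenly through the ordered list, every part of the listing is represented in the sample.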
Within-Dwelling Unit Sampling
The 1997 NHSDA used a within-dwelling unit sample selection approach in which each dwelling was classified according to the race/ethnicity of the head of the dwelling. The interviewers also determined the age domains represented by individuals residing in each dwelling in terms of the presence or absence of individuals age 12-17, 18-25, 26-34, 35-49, and 50 and older.
Data Collection Methodology
NHSDA data are collected through in-person interviews with sample persons, incorporating procedures to maximize respondents' cooperation and willingness to report honestly about their illicit drug use behavior. Introductory letters are sent to sampled addresses, followed by an interviewer visit. A five-minute screening procedure involves listing all household members along with their basic demographic data and possible selection of sample person(s). This selection process is designed to provide the necessary sample sizes for specified population groups by selecting up to two persons per household, depending on the composition of the household.
Interviewers attempt to conduct the survey in a private place, away from other household members. The administration of the interview averages about an hour in length, and includes a combination of interviewer-administered and self-administered questions. In the latter, the answers to sensitive questions, such as those dealing with illicit drug use, are recorded by the respondent and not seen or reviewed by the interviewer. After the self-administered answer sheets are completed, they are placed by the respondent in an envelope, which is sealed and mailed to the contractor with no personal identifying information attached.
Upon receipt, questionnaires are checked for critical identification and demographic data, then keyed to disk. This creates a file consisting of one record for each completed interview. Extensive within-record consistency checks and resolution of most inconsistencies and missing data are done using machine editing routines, called logical imputation. For some key variables that still have missing values after the application of logical imputation, statistical imputation is used to replace the missing data with appropriate valid response codes.
Two types of statistical imputation procedures are used. Hot-deck imputation involves the replacement of a missing value with a valid code taken from another respondent who is "similar" and has complete data. Logistic regression models are also used to determine replacement values for some variables.
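The idea of hot-deck imputation can be shown with a small Python sketch. The report does not specify how "similar" respondents are identified, so this example assumes a simple rule: a donor is drawn at random from respondents in the same imputation class. The function name, the classing variables, and the records are hypothetical.

```python
import random

def hot_deck_impute(records, target, class_vars, rng=None):
    """Fill missing values of `target` from a random donor in the same class.

    Assumption for illustration: "similar" means identical values on class_vars.
    Records whose class has no donor are left unchanged.
    """
    rng = rng or random.Random()
    donors = {}
    for r in records:
        if r[target] is not None:
            key = tuple(r[v] for v in class_vars)
            donors.setdefault(key, []).append(r[target])
    imputed = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r[target] is None:
            pool = donors.get(tuple(r[v] for v in class_vars))
            if pool:
                r[target] = rng.choice(pool)
        imputed.append(r)
    return imputed

# Hypothetical records; None marks a missing response.
records = [
    {"age_group": "18-25", "sex": "F", "used": 1},
    {"age_group": "18-25", "sex": "F", "used": None},
    {"age_group": "50+", "sex": "M", "used": 0},
    {"age_group": "50+", "sex": "M", "used": None},
]
completed = hot_deck_impute(records, "used", ["age_group", "sex"])
print([r["used"] for r in completed])
```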
The population estimates in this report are based on sample survey data rather than on complete data for the population. Therefore, the data were weighted to obtain unbiased estimates of drug use in the population represented by the 1997 NHSDA. The basic sampling weights for the 1997 NHSDA are equal to the inverse of the probabilities of selection of sample respondents. In other words, the smaller a respondent's chance of entering the sample, the larger the weight of that respondent in the calculation of unbiased estimates for the target population.
The probability of selecting a respondent can be computed as the product of four stagewise sampling probabilities: (1) the probability of selecting the respondent's PSU, (2) the probability of selecting the respondent's subarea given selection of his/her PSU, (3) the probability of selecting the respondent's dwelling unit given selection of his/her subarea, and (4) the probability of selecting the respondent given selection of his/her dwelling unit.
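As a worked illustration, the basic sampling weight is the inverse of the product of the four stagewise probabilities. The probabilities below are hypothetical values chosen for easy arithmetic, and the function names are illustrative.

```python
import math

def selection_probability(p_psu, p_subarea, p_dwelling, p_person):
    """Overall selection probability as the product of four stagewise probabilities."""
    return math.prod([p_psu, p_subarea, p_dwelling, p_person])

def basic_weight(p_psu, p_subarea, p_dwelling, p_person):
    """Basic sampling weight: the inverse of the overall selection probability."""
    return 1.0 / selection_probability(p_psu, p_subarea, p_dwelling, p_person)

# Hypothetical stagewise probabilities for one respondent.
prob = selection_probability(0.5, 0.1, 0.05, 0.5)
weight = basic_weight(0.5, 0.1, 0.05, 0.5)
print(prob, weight)
```

With these assumed values the respondent has a selection probability of 0.00125 and therefore stands in for roughly 800 members of the target population before any nonresponse or post-stratification adjustment.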
To obtain the final NHSDA weights, the basic sampling weights were adjusted to account for dwelling unit-level and individual-level nonresponse and subsampling of individuals within dwelling units, and further adjusted to ensure consistency with intercensal population projections obtained from the U.S. Census Bureau. The adjustments that were made to the basic sampling weights are as follows:
Adjustment 2. Dwelling unit roster adjustment. This adjustment was based on a logistic regression model using the same predictor variables used in Adjustment 1 together with age, race/ethnicity, and sex.
Adjustment 3. Adjustment for individual nonresponse (incomplete interviews). This adjustment was based on a logistic regression model using the same predictor variables as Adjustment 2.
Adjustment 4. Post-stratification to ensure consistency with U.S. Census Bureau projections. This adjustment was designed to ensure consistency between the sums of weights within specified demographic subgroups and U.S. Census Bureau projections of the sizes of the subgroups. The post-stratification totals were obtained from the National Estimates and Projections Branch of the U.S. Census Bureau and were classified by age, sex, race, and Hispanic origin.
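The post-stratification step can be sketched as a ratio adjustment within demographic cells. This is a simplified illustration that assumes each respondent belongs to exactly one cell; it is not presented as the exact procedure used for the NHSDA. The cell labels, weights, and control totals are hypothetical.

```python
def poststratify(weights, cells, control_totals):
    """Scale weights so that their sum within each cell equals the control total."""
    cell_sums = {}
    for w, c in zip(weights, cells):
        cell_sums[c] = cell_sums.get(c, 0.0) + w
    return [w * control_totals[c] / cell_sums[c] for w, c in zip(weights, cells)]

# Hypothetical adjusted weights, cell membership, and Census control totals.
weights = [100.0, 200.0, 300.0, 400.0]
cells = ["age12_17_male", "age12_17_male", "age18_25_female", "age18_25_female"]
controls = {"age12_17_male": 600.0, "age18_25_female": 1400.0}

final = poststratify(weights, cells, controls)
print(final)
```

After the adjustment, the weights in each cell sum to the corresponding control total, while the relative sizes of the weights within a cell are unchanged.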
The precise method of statistical estimation applied in the logistic regression analyses of Chapter 6 is called "weighted maximum likelihood estimation." This estimation method is implemented in the computer program SUDAAN and discussed in Shah et al. (1995) and Shah et al. (1997).
Limitations of the Data
An important limitation of the NHSDA estimates of drug use prevalence is that they are only designed to describe the target population of the survey (i.e., the civilian non-institutionalized population). Although this includes more than 98 percent of the total U.S. population, it does exclude some important and unique subpopulations who may have very different substance use patterns. For example, the survey excludes active military personnel, who have been shown to have significantly lower rates of illicit drug use. The survey also excludes persons living in institutional group quarters, such as prisons and residential drug treatment centers, who have been shown to have higher rates of illicit drug use. Homeless persons who are not living in a shelter at the time of the survey, another population shown to have higher than average rates of illicit drug use, are also excluded.
Sampling Error and Statistical Significance
The sampling error of an estimate is the error caused by the selection of a sample instead of conducting a census of the population. Sampling error is reduced by selecting a large sample and by using efficient sample design and estimation strategies such as stratification, optimal allocation, and ratio estimation.
With the use of probability sampling methods in the NHSDA, it is possible to develop estimates of sampling error from the survey data. These estimates, which are presented in Appendix B, have been calculated for all prevalence estimates presented in this report using a software package called SUDAAN that implements a Taylor series linearization approach to take into account the effects of the complex NHSDA design features (Shah, Barnwell, Hunt, & LaVange, 1994). The sampling errors are used to identify unreliable estimates and to test for the statistical significance of differences between estimates.
Estimates considered to be unreliable due to unacceptably large sampling error are noted by asterisks (*) in the tables in Appendix A. The criterion used for suppressing estimates was based on the relative standard error (RSE), which is defined as the ratio of the standard error over the estimate. The log transformation of the proportion estimate (p) was used to calculate the RSE. Specifically, rates and corresponding estimated number of users were suppressed if:
When making comparisons of estimates for different population subgroups from the same data year, the covariance term, which is usually small and positive, has typically been ignored. This results in somewhat conservative tests of hypotheses that will sometimes fail to establish statistical significance when in fact it exists.
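The reason that ignoring a positive covariance is conservative can be written out directly. The notation here is assumed for illustration: \hat{p}_1 and \hat{p}_2 denote two subgroup estimates from the same data year, and z is the usual test statistic for their difference.

```latex
\[
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  = \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2)
  - 2\,\operatorname{Cov}(\hat{p}_1, \hat{p}_2)
\]
\[
z_{\text{ignoring covariance}}
  = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2)}}
\]
```

When the covariance is positive, dropping the last term overstates the variance of the difference, so the absolute value of z is smaller than it would be with the covariance included, and a true difference is less likely to be declared significant.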
Nonsampling errors occur from nonresponse, coding errors, computer processing errors, errors in the sampling frame, reporting errors, and other errors. Nonsampling errors in the NHSDA are reduced through data editing, statistical adjustments for nonresponse, and close monitoring and periodic retraining of interviewers. Although nonsampling errors can often be much larger than sampling errors, measurement of most nonsampling errors is difficult or impossible. However, some indication of the effects of some types of nonsampling errors can be obtained through proxy measures such as response rates and from other research studies.
In the 1997 NHSDA, of the 81,068 eligible households sampled, 75,136 were successfully screened, for a screening response rate of 92.7 percent. In these screened households, a total of 31,290 sample persons were selected, and completed interviews were obtained from 24,505 of these sample persons, for an interview response rate of 78.3 percent. Nearly eleven percent of the sample persons (3,365) were classified as refusals, 2,198 (7.0%) were not available or never at home, and 1,151 (3.7%) did not participate for various other reasons, such as physical or mental incompetence or language barrier. The response rate was highest among those age 12-17 (83%). Response rates were also higher among Hispanics (83%) and blacks (82%) than among whites (76%).
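The response rates quoted above can be reproduced from the reported counts. The variable names are illustrative; the counts are those given in the text.

```python
# Counts reported in the text above.
eligible_households = 81_068
screened_households = 75_136
selected_persons = 31_290
completed_interviews = 24_505
refusals = 3_365
not_available = 2_198
other_nonresponse = 1_151

screening_rate = screened_households / eligible_households
interview_rate = completed_interviews / selected_persons

print(f"Screening response rate: {screening_rate:.1%}")
print(f"Interview response rate: {interview_rate:.1%}")
print(f"Refusals: {refusals / selected_persons:.1%}")
print(f"Not available: {not_available / selected_persons:.1%}")
print(f"Other nonresponse: {other_nonresponse / selected_persons:.1%}")
```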
Among survey participants, item response rates were above 98 percent for most questionnaire items. However, inconsistent responses for some items, including the drug use items, were common. Estimates of drug use from the NHSDA are based on the responses to multiple questions by respondents, so that the maximum amount of information is used in determining whether a respondent is classified as a drug user. Inconsistencies in responses are resolved through a logical editing process that involves some judgement on the part of survey analysts and is a potential source of nonsampling error. For example, a respondent might report his/her most recent use of a drug as more than a month ago, but in a later question report having used in the past month. This inconsistency could occur because the interviewer may have developed greater rapport with the respondent in the latter stages of the interview, leading to more openness on the part of the respondent. In this example, the respondent would be considered a past-month user. In 1997, 22 percent of the estimate of past-month marijuana use and 53 percent of the estimate of past-month cocaine use were based on such cases.
NHSDA estimates are based on self-reports of drug use, and their value depends on respondents' truthfulness and memory. Although many studies have generally established the validity of self-report data and the NHSDA procedures are designed to encourage honesty and recall, some degree of underreporting is assumed. No adjustment to NHSDA data is made to correct for this. The methodology used in the NHSDA has been shown to produce more valid results than other self-report methods, such as telephone interviews (Turner, Lessler, & Gfroerer, 1992; Aquilino, 1994). However, comparisons of NHSDA data with data from surveys conducted in classrooms suggest that underreporting of drug use by youth in their homes may be substantial (Gfroerer, Wright, & Kopstein, 1997).
Aquilino, W.S. (1994). Interview mode effects in surveys of drug and alcohol use: A field experiment. Public Opinion Quarterly, 58, 210-240.
Bishop, Y.M.M., Fienberg, S.E., & Holland, P.W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: The MIT Press.
Gfroerer, J., Wright, D., & Kopstein, A. (1997). Prevalence of youth substance use: The impact of methodological differences between two national surveys. Drug and Alcohol Dependence, 47, 19-30.
Office of Applied Studies, SAMHSA. (1998). 1997 National Household Survey on Drug Abuse Sample Design Report. Rockville, MD: SAMHSA.
Shah, B.V., Barnwell, B.G., Hunt, P.N., & LaVange, L.M. (1994). SUDAAN User's Manual: Release 6.4. Research Triangle Park, NC: Research Triangle Institute.
Shah, B.V., Folsom, R.E., LaVange, L.M., Wheeless, S.C., Boyle, K.E., & Williams, R.L. (1995). Statistical methods and mathematical algorithms used in SUDAAN. Research Triangle Park, NC: Research Triangle Institute.
Shah, B.V., Barnwell, B.G., & Bieler, G. (1997). SUDAAN User's Manual, Release 7.5. Research Triangle Park, NC: Research Triangle Institute.
Turner, C. F., Lessler, J.T. & Gfroerer, J. C. (1992). Survey measurement of drug use: Methodological studies. Rockville, MD: National Institute on Drug Abuse.
This page was last updated on December 30, 2008.