1997 National Household Survey on Drug Abuse 
Sample Design Effects and Generalized Standard Errors
The best variance estimation approach is to use commercially available variance estimation software packages, such as the Research Triangle Institute (RTI) SUrvey DAta ANalysis (SUDAAN) package (Shah, Barnwell, & Bieler, 1997). Most software packages compute variance estimates for means, percentages, and other statistics based on a first-order Taylor series approximation of the deviations of estimates from their expected values. SEs have been computed using SUDAAN for all parameter estimates appearing in this report and are available from the Office of Applied Studies (OAS) upon request. Whenever possible, these estimates should be used to compute confidence intervals (CIs) and perform statistical comparisons. The goal here, however, is to provide future users of the 1997 NHSDA database with approximate SE estimates for situations in which the published NHSDA SE estimates are not available.
Two approaches for approximating SE estimates are presented in this section. The first uses median domain design effects. The second is based on a prediction equation obtained from modeling design effects. These alternatives to the published SE estimates are described below.
As noted previously, the design effect is the ratio of the design-based variance estimate to the variance estimate that would have been obtained from a simple random sample of the same size. Therefore, the design effect summarizes the effects of stratification, clustering, and unequal weighting on the variance of an estimate from a complex sample design. Because clustering and unequal weighting are expected to increase the variance, the design effect should virtually always be greater than one.
As also discussed earlier, however, design effects were frequently less than one for prevalence rates near zero. Because these values were considered spurious, another design effect estimate based only on stratification and unequal weighting effects was substituted if it was greater than the total design effect. Moreover, if both design effect estimates were less than one, a value of one was substituted.
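The substitution rule above can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical, and it assumes the alternative estimate replaces the total design effect whenever it is the larger of the two, which is one reading of the rule as stated.

```python
def adjusted_design_effect(total_deff, strat_weight_deff):
    """Return the design effect after the substitution rule.

    total_deff: design effect reflecting stratification, clustering,
        and unequal weighting.
    strat_weight_deff: alternative design effect based only on
        stratification and unequal weighting.

    Assumption: the alternative is substituted whenever it exceeds the
    total design effect, and the result is never allowed below one.
    """
    return max(total_deff, strat_weight_deff, 1.0)


# Hypothetical values for illustration only.
print(adjusted_design_effect(0.6, 0.8))  # both below one
print(adjusted_design_effect(0.6, 1.3))  # alternative exceeds total
print(adjusted_design_effect(2.5, 1.4))  # total already the larger value
```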
For the 1997 NHSDA, the median design effects were based on estimates from:
For each specified domain within the 1997 NHSDA, a median design effect, rather than an average design effect, was calculated from the above estimates. Because extreme values of some design effects would have distorted the associated averages, medians were chosen to provide a better measure of the central value. The domains were defined by cross-classifications of age by gender, race/ethnicity, population density, geographic region of residence, adult education, and current employment. The domain Arizona/California also was included. Design effects associated with percentage estimates exhibiting low precision were not used. Because the design effects from the licit drug use estimates tended to be larger than the design effects from the illicit drug use estimates, the median design effects were computed separately for these two classifications.
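To see why a median is less sensitive to extreme design effects than an average, consider the short Python sketch below. The four input values are purely illustrative (they happen to be the 18-25 Region entries of Table C.2, reused here only as convenient numbers), and the variable names are hypothetical.

```python
import statistics

# Illustrative design effects; one value is far larger than the rest.
design_effects = [2.49, 1.65, 2.85, 13.94]

mean_deff = statistics.mean(design_effects)
median_deff = statistics.median(design_effects)

# The single extreme value pulls the mean well above the typical value,
# while the median stays close to the bulk of the data.
print(f"mean   = {mean_deff:.2f}")
print(f"median = {median_deff:.2f}")
```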
It was observed that the 1997 design effects were generally 1.1 to 1.5 times larger than the 1996 design effects, uniformly across demographic characteristics. An examination of unequal weighting effects showed that the 1997 unequal weighting effects were 1.5 times the 1996 effects in the West (compared with 0.9 times the 1996 unequal weighting effects in the other three regions).
Two changes made to the 1997 sample design likely contributed to the increase in the unequal weighting. First, the Arizona-California supplement may have caused enough of an increase in unequal weighting to affect the precision in some of the domains. Second, another source of unequal weighting was the modification made to the within-household selection procedures to allow pairs in a household to be selected; this introduced some within-age-group unequal weighting that did not exist in the 1996 survey.
Table C.1 presents the median design effects for the illicit drugs, and Table C.2 presents the median design effects for the licit drugs. These tables can be used to calculate an approximate variance estimate for a particular domain of the 1997 NHSDA as follows:
var(p_{d})_{appx} = DEFF_{d,MED} [p_{d}(1 - p_{d}) / n_{d}],     (C1)
where
p_{d} = estimated proportion for domain d,
n_{d} = sample size for domain d, and
DEFF_{d,MED} = median design effect for domain d.
The approximate SE estimate for p_{d}, SE(p_{d})_{appx}, is simply the square root of var(p_{d})_{appx}.
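As a minimal sketch of this calculation, the Python function below multiplies the simple random sample variance p(1 - p)/n by a median design effect and takes the square root. The function name and the example inputs for p and n are hypothetical; only the design effect of 3.00 (males aged 18 to 25, Table C.1) is taken from the tables.

```python
import math


def approx_se_from_median_deff(p, n, deff_median):
    """Approximate SE of a proportion using a median design effect.

    Computes sqrt(deff_median * p * (1 - p) / n), i.e., the simple
    random sample variance inflated by the median design effect.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0 or deff_median <= 0:
        raise ValueError("n and deff_median must be positive")
    variance = deff_median * p * (1.0 - p) / n
    return math.sqrt(variance)


# Hypothetical prevalence (15%) and domain sample size (1,500);
# 3.00 is the Table C.1 entry for males aged 18 to 25.
se = approx_se_from_median_deff(p=0.15, n=1500, deff_median=3.00)
print(f"approximate SE = {100 * se:.2f} percentage points")
```

Setting deff_median to 1 recovers the usual simple random sample SE, which makes the inflation due to the complex design easy to see.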
When a median design effect for a domain under investigation is not listed in Table C.1 or C.2, an alternative SE approximation is recommended. This approximation uses a prediction equation obtained from modeling estimated design effects. The definition of the design effect is the basis for the regression model:
DEFF = var(p) / [p(1 - p)/n],
where
var(p) = design-based variance estimate of p and
[p(1 - p)/n] = simple random sample variance estimate of p.
Taking the log (base 10) of both sides of the above equation leads to the following log-linear model:
log(DEFF) = β_{0} + β_{1} log(p) + β_{2} log(1 - p) + β_{3} log(n),
where β_{0}, β_{1}, β_{2}, β_{3} = regression coefficients for the intercept, log(p), log(1 - p), and log(n), respectively.
Separate models were fit for the licit and illicit drug use estimates in the 1997 NHSDA. The design effects used to calculate the medians in Tables C.1 and C.2 were used to fit the licit and illicit drug use models for the 1997 NHSDA.
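The sketch below shows one way such a model could be fit: an ordinary least squares regression of log10(DEFF) on log10(p), log10(1 - p), and log10(n). The report does not state the fitting method, so least squares is an assumption, as are the function name, the synthetic data, and the "true" coefficients used to generate them.

```python
import numpy as np


def fit_log_deff_model(p, n, deff):
    """Least squares fit of log10(deff) on log10(p), log10(1-p), log10(n).

    Returns the coefficients in the order
    (intercept, log(p), log(1 - p), log(n)).
    """
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    deff = np.asarray(deff, dtype=float)
    X = np.column_stack(
        [np.ones_like(p), np.log10(p), np.log10(1.0 - p), np.log10(n)]
    )
    y = np.log10(deff)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


# Synthetic, noise-free data generated from made-up coefficients,
# used only to check that the fit recovers them.
rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.90, size=200)
n = rng.integers(100, 20000, size=200)
true_beta = np.array([0.10, -0.30, 0.20, 0.20])
log_deff = (
    true_beta[0]
    + true_beta[1] * np.log10(p)
    + true_beta[2] * np.log10(1.0 - p)
    + true_beta[3] * np.log10(n)
)
deff = 10.0 ** log_deff

beta_hat = fit_log_deff_model(p, n, deff)
print(np.round(beta_hat, 4))
```

In practice the fit would be run twice, once on the licit and once on the illicit design effects, giving one coefficient set per drug classification.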
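By substituting the fitted model into the definition of the design effect, a prediction equation for the approximate SE is obtained:
SE(p)_{appx} = sqrt{ [p(1 - p)/n] × 10^{b_{0i} + b_{1i} log(p) + b_{2i} log(1 - p) + b_{3i} log(n)} },
where b_{0i}, b_{1i}, b_{2i}, b_{3i} = regression coefficient estimates for the intercept, log(p), log(1 - p), and log(n), respectively. The index i indicates whether the SE approximation is for a licit drug or illicit drug prevalence estimate.
After solving for the regression coefficients, the above approximation reduces to the following two prediction equations:
SE(p)_{appx} = 1.1032 p^{0.8264} (1 - p)^{0.6002} / n^{0.4036} for illicit drug use estimates,     (C2)
SE(p)_{appx} = 0.8911 p^{0.5910} (1 - p)^{0.5569} / n^{0.3876} for licit drug use estimates.     (C3)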
Tables C.3 and C.4 present generalized SEs for various percentages (from 1% to 99%) and sample sizes (from 100 to 24,505) for the 1997 NHSDA, predicted from Equations (C2) and (C3).
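For percentages or sample sizes that fall between the tabulated values, the prediction equations can be evaluated directly. The Python sketch below uses the coefficients printed in the notes to Tables C.3 and C.4; the function and dictionary names are hypothetical, and p is entered as a proportion while the result is returned in percentage points, as in the tables.

```python
# Coefficients (multiplier, exponent of p, exponent of 1 - p, exponent of n)
# as printed in the notes to Tables C.3 (illicit) and C.4 (licit).
GSE_COEFFS = {
    "illicit": (1.1032, 0.8264, 0.6002, 0.4036),
    "licit": (0.8911, 0.5910, 0.5569, 0.3876),
}


def generalized_se_percent(p, n, drug_type):
    """Generalized SE, in percentage points, for a proportion p based on n."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    a, b_p, b_q, b_n = GSE_COEFFS[drug_type]
    return 100.0 * a * p**b_p * (1.0 - p) ** b_q / n**b_n


# Example: a 50% estimate based on 100 respondents.
for drug_type in ("illicit", "licit"):
    se = generalized_se_percent(0.50, 100, drug_type)
    print(f"{drug_type}: {se:.2f}")
```

For a proportion of 0.50 and n = 100, these evaluate to approximately 6.40 (illicit) and 6.75 (licit), values that appear in the n = 100 rows of Tables C.3 and C.4.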
In summary, the user may obtain 1997 NHSDA SE estimates from the following sources, in recommended order:
1. commercially available variance estimation software packages, such as SUDAAN; otherwise,
2. published SEs from this or other reports using data from the 1997 NHSDA (obtainable upon request from the OAS at SAMHSA); otherwise,
3. median domain design effects appearing in Tables C.1 and C.2 and application of Equation (C1); otherwise,
4. model-based prediction, using Equations (C2) and (C3) or Tables C.3 and C.4 for national or regional estimates.
Once the variance estimates have been obtained, the user may apply the methods discussed in previous sections to compute confidence intervals or make statistical comparisons.
Table C.1 Median Design Effects of Illicit Drug Use Estimates, by Age Group and Demographic Characteristics: 1997 NHSDA Questionnaire
Age Group in Years
Demographic Characteristic  |  12-17  |  18-25  |  26-34  |  35+  |  Total
Total  |  2.72  |  3.14  |  2.15  |  2.59  |  3.69
Gender
Male  |  2.27  |  3.00  |  1.85  |  2.50  |  3.67
Female  |  2.62  |  2.83  |  2.13  |  1.82  |  2.63
Race/Ethnicity^{1}
White, non-Hispanic  |  2.27  |  2.49  |  1.50  |  2.22  |  2.81
Black, non-Hispanic  |  1.77  |  1.51  |  1.52  |  1.85  |  2.87
Hispanic  |  2.18  |  2.65  |  1.63  |  2.58  |  3.43
Population Density
Large metro  |  2.72  |  4.05  |  2.62  |  2.32  |  4.15
Small metro  |  2.18  |  2.38  |  1.86  |  2.42  |  3.39
Nonmetro  |  2.22  |  2.21  |  1.60  |  1.74  |  1.99
Region
Northeast  |  1.87  |  2.33  |  1.98  |  1.33  |  2.16
North Central  |  1.49  |  1.91  |  1.73  |  1.99  |  2.79
South  |  1.81  |  2.70  |  2.00  |  2.52  |  3.19
West  |  4.66  |  3.70  |  2.84  |  3.43  |  5.23
Adult Education^{2}
Less than high school  |  N/A  |  2.88  |  2.21  |  2.66  |  3.03
High school graduate  |  N/A  |  3.30  |  1.89  |  2.17  |  2.59
Some college  |  N/A  |  2.67  |  2.43  |  2.15  |  2.85
College graduate  |  N/A  |  2.29  |  2.10  |  2.78  |  2.90
Current Employment^{3}
Full-time  |  N/A  |  3.27  |  2.22  |  2.86  |  3.54
Part-time  |  N/A  |  2.77  |  2.21  |  1.68  |  2.06
Unemployed  |  N/A  |  2.40  |  1.86  |  1.25  |  2.04
Other^{4}  |  N/A  |  2.66  |  1.88  |  2.13  |  2.25
Arizona/California
California  |  2.35  |  2.48  |  2.38  |  2.48  |  3.50
Arizona, Quarters 2-4  |  1.44  |  1.65  |  1.59  |  1.38  |  2.23
N/A: Not applicable.
^{1}The category "other" for race/ethnicity is not included.
^{2}Data on adult education are not applicable for 12 to 17 year olds.
^{3}Data on current employment are not applicable for 12 to 17 year olds.
^{4}Retired, disabled, homemaker, student, or "other."
Table C.2 Median Design Effects of Licit Drug Use Estimates, by Age Group and Demographic Characteristics: 1997 NHSDA Questionnaire
Age Group in Years
Demographic Characteristic  |  12-17  |  18-25  |  26-34  |  35+  |  Total
Total  |  3.35  |  4.45  |  2.89  |  3.97  |  7.12
Gender
Male  |  3.36  |  3.47  |  2.37  |  2.94  |  5.13
Female  |  2.56  |  3.75  |  2.67  |  2.92  |  5.01
Race/Ethnicity^{1}
White, non-Hispanic  |  2.78  |  4.53  |  2.20  |  3.30  |  4.73
Black, non-Hispanic  |  1.82  |  2.11  |  1.88  |  2.10  |  3.90
Hispanic  |  2.63  |  3.45  |  1.85  |  2.59  |  4.03
Population Density
Large metro  |  3.50  |  3.48  |  2.83  |  3.88  |  5.95
Small metro  |  2.47  |  5.42  |  3.10  |  2.58  |  4.73
Nonmetro  |  3.72  |  3.23  |  2.21  |  3.07  |  6.35
Region
Northeast  |  2.00  |  2.49  |  2.47  |  2.06  |  3.20
North Central  |  2.43  |  1.65  |  1.93  |  1.97  |  3.40
South  |  2.58  |  2.85  |  3.27  |  3.05  |  5.64
West  |  4.49  |  13.94  |  3.74  |  8.63  |  17.89
Adult Education^{2}
Less than high school  |  N/A  |  3.06  |  2.53  |  3.20  |  4.30
High school graduate  |  N/A  |  2.88  |  2.46  |  2.49  |  3.64
Some college  |  N/A  |  4.94  |  2.41  |  3.14  |  4.70
College graduate  |  N/A  |  2.78  |  2.28  |  3.61  |  3.74
Current Employment^{3}
Full-time  |  N/A  |  3.24  |  2.35  |  3.31  |  3.84
Part-time  |  N/A  |  3.92  |  2.57  |  2.53  |  3.72
Unemployed  |  N/A  |  2.43  |  2.53  |  2.39  |  3.69
Other^{4}  |  N/A  |  3.50  |  2.00  |  3.58  |  5.18
Arizona/California
California  |  2.23  |  3.02  |  2.47  |  2.71  |  4.43
Arizona, Quarters 2-4  |  1.58  |  1.71  |  1.93  |  1.39  |  2.94
N/A: Not applicable.
^{1}The category "other" for race/ethnicity is not included.
^{2}Data on adult education are not applicable for 12 to 17 year olds.
^{3}Data on current employment are not applicable for 12 to 17 year olds.
^{4}Retired, disabled, homemaker, student, or "other."
Table C.3 Generalized Standard Errors for Estimated Percentages of Illicit Drug Use Estimates: 1997
Sample Size for Base of Percentage, n  |  1%  |  2%  |  3%  |  5%  |  10%  |  20%  |  30%  |  40%  |  50%  |  60%  |  70%  |  80%  |  90%  |  95%  |  97%  |  98%  |  99% 
100  0.38  0.67  0.93  1.40  2.41  3.98  5.13  5.93  6.40  6.50  6.22  5.44  3.96  2.73  2.04  1.62  1.07 
300  0.24  0.43  0.60  0.90  1.54  2.55  3.29  3.81  4.11  4.17  3.99  3.49  2.54  1.75  1.31  1.04  0.69 
500  0.20  0.35  0.49  0.73  1.26  2.08  2.68  3.10  3.34  3.40  3.25  2.84  2.07  1.43  1.07  0.84  0.56 
700  0.17  0.31  0.42  0.64  1.10  1.81  2.34  2.71  2.92  2.97  2.83  2.48  1.80  1.24  0.93  0.74  0.49 
900  0.16  0.28  0.38  0.58  0.99  1.64  2.11  2.44  2.63  2.68  2.56  2.24  1.63  1.12  0.84  0.67  0.44 
1,000  0.15  0.26  0.37  0.55  0.95  1.57  2.03  2.34  2.53  2.57  2.45  2.15  1.56  1.08  0.81  0.64  0.42 
1,250  0.14  0.24  0.34  0.51  0.87  1.43  1.85  2.14  2.31  2.35  2.24  1.96  1.43  0.98  0.74  0.58  0.39 
1,500  0.13  0.22  0.31  0.47  0.81  1.33  1.72  1.99  2.14  2.18  2.08  1.82  1.33  0.91  0.68  0.54  0.36 
2,000  0.11  0.20  0.28  0.42  0.72  1.19  1.53  1.77  1.91  1.94  1.86  1.62  1.18  0.81  0.61  0.48  0.32 
2,500  0.10  0.18  0.25  0.38  0.66  1.08  1.40  1.62  1.74  1.77  1.70  1.48  1.08  0.74  0.56  0.44  0.29 
3,000  0.10  0.17  0.24  0.36  0.61  1.01  1.30  1.50  1.62  1.65  1.57  1.38  1.00  0.69  0.52  0.41  0.27 
4,000  0.09  0.15  0.21  0.32  0.54  0.90  1.16  1.34  1.44  1.47  1.40  1.23  0.89  0.62  0.46  0.36  0.24 
5,000  0.08  0.14  0.19  0.29  0.50  0.82  1.06  1.22  1.32  1.34  1.28  1.12  0.82  0.56  0.42  0.33  0.22 
7,500  0.07  0.12  0.16  0.25  0.42  0.70  0.90  1.04  1.12  1.14  1.09  0.95  0.69  0.48  0.36  0.28  0.19 
10,000  0.06  0.10  0.15  0.22  0.38  0.62  0.80  0.92  1.00  1.01  0.97  0.85  0.62  0.43  0.32  0.25  0.17 
15,000  0.05  0.09  0.12  0.19  0.32  0.53  0.68  0.79  0.85  0.86  0.82  0.72  0.52  0.36  0.27  0.21  0.14 
18,000  0.05  0.08  0.11  0.17  0.30  0.49  0.63  0.73  0.79  0.80  0.76  0.67  0.49  0.34  0.25  0.20  0.13 
24,505^{1}  0.04  0.07  0.10  0.15  0.26  0.43  0.56  0.64  0.69  0.71  0.67  0.59  0.43  0.30  0.22  0.18  0.12 
Note: Generalized standard errors are predicted from the following equation: SE = 100*[1.1032 p^{0.8264} (1 - p)^{0.6002} / n^{0.4036}].
^{1}The total sample size for the 1997 NHSDA is 24,505.
Table C.4 Generalized Standard Errors for Estimated Percentages of Licit Drug Use Estimates: 1997
Sample Size for Base of Percentage, n  |  1%  |  2%  |  3%  |  5%  |  10%  |  20%  |  30%  |  40%  |  50%  |  60%  |  70%  |  80%  |  90%  |  95%  |  97%  |  98%  |  99% 
100  0.98  1.46  1.85  2.47  3.62  5.10  6.02  6.55  6.75  6.64  6.19  5.35  3.90  2.73  2.08  1.67  1.14 
300  0.64  0.96  1.21  1.62  2.36  3.33  3.93  4.28  4.41  4.34  4.05  3.49  2.55  1.79  1.36  1.09  0.75 
500  0.52  0.78  0.99  1.33  1.94  2.73  3.22  3.51  3.62  3.56  3.32  2.87  2.09  1.47  1.12  0.90  0.61 
700  0.46  0.69  0.87  1.16  1.70  2.40  2.83  3.08  3.17  3.12  2.91  2.52  1.83  1.29  0.98  0.79  0.54 
900  0.42  0.63  0.79  1.06  1.54  2.18  2.57  2.79  2.88  2.83  2.64  2.28  1.66  1.17  0.89  0.71  0.49 
1,000  0.40  0.60  0.76  1.01  1.48  2.09  2.46  2.68  2.76  2.72  2.54  2.19  1.60  1.12  0.85  0.68  0.47 
1,250  0.37  0.55  0.70  0.93  1.36  1.92  2.26  2.46  2.53  2.49  2.33  2.01  1.46  1.03  0.78  0.63  0.43 
1,500  0.34  0.51  0.65  0.87  1.27  1.79  2.11  2.29  2.36  2.32  2.17  1.87  1.36  0.96  0.73  0.59  0.40 
2,000  0.31  0.46  0.58  0.77  1.13  1.60  1.88  2.05  2.11  2.08  1.94  1.67  1.22  0.86  0.65  0.52  0.36 
2,500  0.28  0.42  0.53  0.71  1.04  1.46  1.73  1.88  1.94  1.91  1.78  1.54  1.12  0.79  0.60  0.48  0.33 
3,000  0.26  0.39  0.50  0.66  0.97  1.36  1.61  1.75  1.81  1.78  1.66  1.43  1.04  0.73  0.56  0.45  0.31 
4,000  0.23  0.35  0.44  0.59  0.87  1.22  1.44  1.57  1.61  1.59  1.48  1.28  0.93  0.65  0.50  0.40  0.27 
5,000  0.21  0.32  0.41  0.54  0.79  1.12  1.32  1.44  1.48  1.46  1.36  1.17  0.86  0.60  0.46  0.37  0.25 
7,500  0.18  0.27  0.35  0.46  0.68  0.96  1.13  1.23  1.27  1.24  1.16  1.00  0.73  0.51  0.39  0.31  0.21 
10,000  0.16  0.25  0.31  0.42  0.61  0.86  1.01  1.10  1.13  1.11  1.04  0.90  0.65  0.46  0.35  0.28  0.19 
15,000  0.14  0.21  0.27  0.35  0.52  0.73  0.86  0.94  0.97  0.95  0.89  0.77  0.56  0.39  0.30  0.24  0.16 
18,000  0.13  0.20  0.25  0.33  0.48  0.68  0.80  0.87  0.90  0.89  0.83  0.71  0.52  0.37  0.28  0.22  0.15 
24,505^{1}  0.12  0.17  0.22  0.29  0.43  0.60  0.71  0.78  0.80  0.79  0.73  0.63  0.46  0.32  0.25  0.20  0.14 
Note: Generalized standard errors are predicted from the following equation: SE = 100*[0.8911 p^{0.5910} (1 - p)^{0.5569} / n^{0.3876}].
^{1}The total sample size for the 1997 NHSDA is 24,505.
