Appendix B: State Estimation Methodology
B.1 Background
In response to the need for State-level information on substance abuse problems, the Substance Abuse and Mental Health Services Administration (SAMHSA) began developing and testing small area estimation (SAE) methods for the National Household Survey on Drug Abuse (NHSDA) in 1994 under a contract with RTI of Research Triangle Park, North Carolina. That developmental work used logistic regression models with data from the combined 1991 to 1993 NHSDAs and local area indicators, such as drug-related arrests, alcohol-related death rates, and block group/tract-level characteristics from the 1990 Census that were found to be associated with substance abuse. In 1996, the results were published for 25 States for which there were sufficient sample data (Office of Applied Studies [OAS], 1996). A subsequent report described the methodology in detail and noted areas in which improvements were needed (Folsom & Judkins, 1997).
The increasing need for State-level estimates of substance use led to the decision to expand the NHSDA to provide estimates for all 50 States and the District of Columbia on an annual basis beginning in 1999. It was determined that, with the use of modeling similar to that used with the 1991 to 1993 NHSDA data in conjunction with a sample designed for State-level estimation, a sample of about 67,500 persons would be sufficient to make reasonably precise estimates.
The State-based NHSDA sample design implemented in 1999 had the following characteristics:
States are stratified into field interviewer (FI) regions that cover the geography of each State. The FI regions consist of contiguous Census tracts and counties and are designed to yield about 75 interviews per region. In the 42 smaller States (by population) and the District of Columbia, there are 12 FI regions; in the eight largest States, there are 48 FI regions.
Within each region, eight segments are randomly selected and two are allocated to each calendar quarter of data collection.
Within each segment, households are screened, and a sample of one to two persons per household is selected. An average of nine responding persons per segment is sought.
The samples are selected so that approximately 900 responding persons, 300 in each age group (12 to 17, 18 to 25, and 26 or older), are drawn in each of the 42 States and the District of Columbia. In the eight large States, the person samples are allocated equally to the three age groups with overall respondent sample sizes ranging from 2,669 to 4,681.
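The design targets above can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the text (about 75 interviews per FI region, 12 or 48 regions per State); the 3,600 figure for a large State is simply what the 75-per-region target implies and is not a number stated in the text.

```python
# Design constants taken from the description above.
INTERVIEWS_PER_FI_REGION = 75   # approximate target per FI region
SMALL_AREA_REGIONS = 12         # 42 smaller States and the District of Columbia
LARGE_STATE_REGIONS = 48        # eight largest States
N_SMALL_AREAS = 43              # 42 States + District of Columbia
N_LARGE_STATES = 8


def target_sample(n_regions: int) -> int:
    """Approximate respondents per State implied by the FI-region target."""
    return n_regions * INTERVIEWS_PER_FI_REGION


small_target = target_sample(SMALL_AREA_REGIONS)    # 900
large_target = target_sample(LARGE_STATE_REGIONS)   # 3,600 (implied, see lead-in)
national_target = N_SMALL_AREAS * small_target + N_LARGE_STATES * large_target

print(small_target, large_target, national_target)
```

Under these assumptions the national total comes to 67,500, which agrees with the sample size cited for the expanded survey.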
Tables B.1, B.2, and B.3 present, respectively, the achieved response rates, the survey population sizes (by State and age group), and the associated sample sizes.
In preparation for the modeling of the 1999 data, RTI used the data from the combined 1994-96 NHSDAs to develop an improved methodology that utilized more local area data and produced better estimates of the accuracy of the State estimates (Folsom, Shah, & Vaish, 1999). That effort involved the development of procedures that would validate the results for geographic areas with large samples. This work was reviewed by a panel with expertise in small area estimation.^{3} They approved of the methodology, but suggested further improvements for the modeling to be used to produce the 1999 State estimates. Those improvements have been incorporated into the methodology finally used for the State estimates included in this report. The methodology, called Survey-Weighted Hierarchical Bayes Estimation (HB), is described below.
B.2 Goals of Modeling
There were several goals underlying the estimation process. The first was to model substance use-related rates at the lowest possible level and aggregate over the levels to form the State estimates. The chosen level of aggregation was the 32 age group (12 to 17, 18 to 25, 26 to 34, 35+) by race/ethnicity (white-not Hispanic, black-not Hispanic, Hispanic, other) by gender cells at the block group level. Estimated population counts are obtained from a private vendor for each block group for each of the 32 cells. This level of aggregation was desired because the NHSDA first stage of sample selection was at the block group level, so that there would be data at this level to fit a model. In addition, there was a great deal of information from the Census at the block group level that could be used as predictors in the models. If substance use-related rates could be estimated for each of the 32 cells at the block group level, it would only be necessary to multiply by the estimated population counts and aggregate to the State level.
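The final aggregation step described above (multiply each cell's estimated rate by its population count, then sum to the State) can be sketched as follows. The function name, the key layout, and the numbers are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Dict, Tuple

# Key: (block_group_id, cell_id); cell_id indexes the 32 age x race x gender cells.
CellKey = Tuple[str, int]


def aggregate_rate(rates: Dict[CellKey, float], counts: Dict[CellKey, float]) -> float:
    """Population-weighted prevalence over all block-group cells supplied."""
    total_pop = sum(counts.values())
    if total_pop == 0:
        raise ValueError("no population in the requested area")
    expected_cases = sum(rates[key] * counts[key] for key in counts)
    return expected_cases / total_pop


# Hypothetical two-block-group, two-cell example (values invented for illustration).
rates = {("bg1", 0): 0.10, ("bg1", 1): 0.02, ("bg2", 0): 0.08, ("bg2", 1): 0.04}
counts = {("bg1", 0): 100, ("bg1", 1): 300, ("bg2", 0): 200, ("bg2", 1): 400}
state_rate = aggregate_rate(rates, counts)
```

Restricting the dictionaries to the block groups of any larger area gives the estimate for that area in the same way.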
Another goal of the estimation process was to include the sampling weight in the model in such a way that the small area estimates would converge to the design-based (sample-weighted) estimate when they are aggregated to a sufficient sample size. There was a desire for the estimates to have this characteristic so that there would be consistency with the survey-weighted national estimates based on the entire sample.
A third goal was to include as much local source data as possible, especially data related to each substance use measure. This would help provide a better fit beyond the strictly sociodemographic information. The desire was to use national sources of these data so that there would be consistency of collection and estimation methodology across States.
Recognizing that estimates based solely on these "fixed" effects would not reflect differences across States due to differences in laws, enforcement activities, advertising campaigns, outreach activities, and other such unique State contributions, a fourth goal was to include "random" effects to compensate for these differences. The types of random effects that could be supported by NHSDA data were a function of the size of sample and the model fit to the sample data. For the 1999 survey, random effects were included at the State level and for substate regions comprised of three (typically neighboring) FI regions. Although this grouping of the three FI regions was principally motivated by the need to accumulate enough sample to support good model fitting for the low prevalence NHSDA outcomes, it was also reasoned that it would be possible to produce substate HB estimates for areas comprised of these FI region groups, once 2 or 3 years of NHSDA data were available, because that would yield substate region samples of at least 400 respondents. For substate areas like counties and large municipalities that do not conform to the substate region boundaries, HB estimates could be derived from their elemental block group-level contributions, but the direct survey data employed in the estimation of the associated substate region effects would not be restricted to the county or city of interest. This mismatch of FI region and county/large municipality boundaries weakens the theoretical appeal of the associated HB estimate. For this reason, substate HB estimates probably should be restricted to areas that can be matched reasonably well to FI region groups.
One of the difficulties of typical SAE has been obtaining good estimates of the accuracy of the estimates with prediction intervals that give a good representation of the true probability of coverage of the intervals. Therefore, the final major goal was to provide accurate prediction intervals—ones that would approach the usual sample-based intervals as the sample size increases.
B.3 Predictors Used in Logistic Regression Models
Local area data used as potential predictor variables in the logistic regression models were obtained from several sources, including Claritas, the Census Bureau, the FBI (Uniform Crime Reports), Health Resources and Services Administration (Area Resource File), SAMHSA (Uniform Facility Data Set), and the National Center for Health Statistics (mortality data). The list of sources and the actual variables that were selected as independent predictors for each age group for the estimation of the treatment gap are provided below.
B.3.1 Sources of Data
B.3.2 Predictor Variables in Final Model, by Age Group
Age Group 1 (Ages 12 to 17)
Age Group 2 (Ages 18 to 25)
Age Group 3 (Ages 26 to 34)
Age Group 4 (Ages 35 or Older)
B.4 Method of Selecting Independent Variables for the Models
For the 1999 SAE exercise, independent variables for modeling each of the substance use measures were first identified by a CHAID (Chi-squared Automatic Interaction Detector) algorithm. CHAID does not use sample weights. Prior to this process, all the continuous variables were categorized using deciles and were treated as ordinal in CHAID. Region was treated as a nominal categorical variable in CHAID. Significant independent variables from each model that were final nodes in the tree-growing process were identified as indicator variables destined for inclusion at a later step.
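As an illustration of the decile categorization mentioned above, the following sketch maps a continuous predictor to ordinal codes 1 through 10. The cut-point rule (the default of Python's `statistics.quantiles`) and the function name are assumptions; the text does not say how ties or boundaries were handled.

```python
from bisect import bisect_right
from statistics import quantiles


def decile_codes(values):
    """Map each value to an ordinal decile code from 1 (lowest) to 10 (highest)."""
    cut_points = quantiles(values, n=10)  # nine interior cut points
    return [bisect_right(cut_points, v) + 1 for v in values]


example = list(range(1, 101))  # hypothetical predictor values
codes = decile_codes(example)
```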
Independently, a SAS stepwise logistic regression model was fit for each dependent variable by age group. The SAS stepwise was used because it was able to quickly run all of the variables for all of the models, although it was recognized that the software would not take into account the complex sample design and the weights. The independent variables included all the first-order or linear polynomial trend contrasts across the 10 levels of the categorized variables, as well as the gender, region, and race variables. Significant variables (at the 3 percent level) were identified from this process. Based on this list, another list of variables was created that included the second- and third-order polynomials and the interaction of the first-order polynomials with the gender, race, and region variables.
Next, the variables from the CHAID process and the SAS process were entered into a SAS stepwise logistic model at the 1 percent significance level. Because of past concerns about overfitting of the data in earlier estimation using the 1991 to 1993 NHSDA data, the significance levels were made quite stringent. These variables were then entered into a SUDAAN logistic regression model because the SUDAAN software would adjust for the effects of the weights and other aspects of the complex sample design. All variables that were still significant at the 1 percent significance level were entered into the survey-weighted HB process.
Independently, a factor-analytic approach was used to determine the important variables to include in the model. This approach would allow the data to self-identify the important dimensions. The concern here was to use an alternative method that would have a certain face validity. That method was utilized to identify an independent set of variables that were then processed through the HB estimation. The results, however, in terms of model fit and prediction intervals were generally not as good as with the CHAID/SAS/SUDAAN screening process for candidate independent variables. Also, the factor-analytic approach involved an inherently subjective step to attribute names to the various factor loadings, and the interest was more in the predictive ability of variables than in a substantive description of the dimensions. Nevertheless, it was encouraging to see that the two approaches gave reasonably similar results. For these reasons, the estimates in this report were those based on the screening method that started with the CHAID process.
To select variables for the 2000 treatment gap model, an alternative to the 1999 approach was also implemented. This alternative, designed to further reduce the risk of overfitting, involved splitting the 2000 sample into two halves with the 7,200 sample area segments (block clusters) used as sampling units for the splitting. One of those half-samples was designated the training sample, and its complement was assigned the role of validation sample. The 1999 variable selection strategy was then applied to the training sample with a less stringent 10 percent significance level for retaining variables. Note that with a sample size one-half as large, the training sample would yield standard errors for the logistic regression coefficients that were expected to be inflated by a factor of 1.4. Therefore, a training sample significance level of 7 percent would be expected to yield a significance level of 1 percent in the full sample. The 10 percent level was chosen for the training sample after trying several alternatives. Once the variables were chosen using the training sample, the model was refit on the validation sample and variables that were not significant at the 10 percent level were dropped. The two alternative models resulting from the 1999 variable selection method and the new 2000 alternative were both subjected to the internal benchmarking validation exercise described later in this appendix (Section B.7). The new method produced small area estimates that were noticeably less biased for the 26 or older age groups and the 12 or older age groups. Based on this result, the alternative set of predictor variables was chosen.
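The relationship between the half-sample and full-sample significance levels can be reproduced with a short calculation. This is a sketch under the assumption that standard errors scale with one over the square root of the sample size and that the test statistics are approximately standard Normal; the function name is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def training_level(full_sample_alpha: float, sample_fraction: float) -> float:
    """Two-sided level in a subsample that corresponds to full_sample_alpha overall.

    Assumes standard errors scale with 1/sqrt(n), so a subsample holding
    `sample_fraction` of the data has SEs inflated by 1/sqrt(sample_fraction).
    """
    z_full = STD_NORMAL.inv_cdf(1.0 - full_sample_alpha / 2.0)
    z_sub = z_full * sample_fraction ** 0.5  # same effect, larger standard error
    return 2.0 * (1.0 - STD_NORMAL.cdf(z_sub))


inflation = (1 / 0.5) ** 0.5            # standard error inflation for a half-sample
alpha_train = training_level(0.01, 0.5)  # level matching 1 percent in the full sample
print(round(inflation, 2), alpha_train)
```

With a half-sample the inflation factor is about 1.4 and the matching training-sample level is close to 7 percent, in line with the figures quoted above.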
B.5 General Model Description
The model can be characterized as a complex mixed model (including both fixed and random effects) of the form:
λ = Xβ + ZU
Each of the symbols represents a matrix or vector. The leading term Xβ is the usual (fixed) regression contribution, and ZU represents random effects for the States and FI region groups that the data will support and for which estimates are desired. Not obvious from the notation is that the form of the model is a logistic model used to estimate dichotomous data. The vector λ has elements ln[π_{ijk}/(1 - π_{ijk})], where π_{ijk} is the propensity for the k^{th} person in the j^{th} FI composite region in the i^{th} State to engage in the behavior of interest (e.g., to use marijuana in the past month). Also not obvious from the notation is that the model fitting utilizes the final "sample" weights as discussed above. The "sample" weights have been adjusted for nonresponse and poststratified to known Census counts.
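The logistic form described above can be illustrated for a single person. In this sketch the fixed-effect coefficients, the State effect, and the FI-region-group effect are passed in as plain numbers; all names and values are hypothetical, and the sketch shows only how a propensity follows from the linear predictor, not how the effects are estimated.

```python
import math


def inverse_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def propensity(x_row, beta, state_effect, region_effect):
    """Propensity from fixed effects plus State and FI-region-group random effects."""
    linear_predictor = sum(x * b for x, b in zip(x_row, beta))
    linear_predictor += state_effect + region_effect
    return inverse_logit(linear_predictor)


# Hypothetical values: intercept plus one predictor.
p = propensity(x_row=[1.0, 0.5], beta=[-3.0, 0.8], state_effect=0.10, region_effect=-0.05)
log_odds = math.log(p / (1.0 - p))  # recovers the linear predictor, -2.55
```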
The estimate for each State behaves like a "weighted" average of the direct survey estimate in that State and the predicted value based on the national regression model. The "weights" in this case are functions of the relative precision of the sample-based estimate for the State and the predicted estimate based on the national model. The eight large States have large samples, and thus more "weight" is given to the sample estimate relative to the model-based regression estimate. The 42 small States and the District of Columbia put relatively more "weight" on the regression estimate because of their smaller samples. The national regression estimate actually uses national parameters that are based on the full sample of approximately 72,000 persons; however, the regression estimate for a specific State is based on applying the national regression parameters to that State's "local" county, block group, and tract-level predictor variables and summing to the State level. Therefore, even the national regression component of the estimate for a State includes "local" State data.
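The "weighted average" behavior described above can be illustrated with the classical precision-weighted composite estimator. This is a simplified stand-in chosen for illustration, not the HB computation itself; the variances and estimates are invented.

```python
def composite_estimate(direct, var_direct, model, var_model):
    """Precision-weighted average of a direct estimate and a model prediction.

    Returns the composite and the weight placed on the direct estimate.
    """
    weight_on_direct = var_model / (var_model + var_direct)
    composite = weight_on_direct * direct + (1.0 - weight_on_direct) * model
    return composite, weight_on_direct


# Hypothetical: same direct and model values, but the large State has a
# more precise direct estimate (smaller variance) than the small State.
est_large, w_large = composite_estimate(direct=2.4, var_direct=0.04, model=2.0, var_model=0.04)
est_small, w_small = composite_estimate(direct=2.4, var_direct=0.16, model=2.0, var_model=0.04)
```

The large State puts half its weight on the direct estimate while the small State puts only one fifth, so the small State's composite sits closer to the regression prediction.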
The goal then was to come up with the best estimates of β and U. This would lead to the best estimates of λ, which would in turn lead to the best estimate of π. Once π has been estimated for each block group and each age/race/gender cell within a block group, the results can be weighted by the projected Census population counts at that level to make estimates for any geographic area larger than a block group.
B.6 Implementation of Modeling
The solution to the equation for λ in the above section is not straightforward but involves a series of iterative steps to generate values of the desired fixed and random effects from the underlying joint distribution. The technique will be described in more detail in a methodological report currently in progress. In the interim, the basic process can be described as follows.
Let β denote the matrix of fixed effects, η be the matrix of State random effects (i = 1, ..., 51), and v denote the matrix of FI composite region effects j within State i. Because the goal is to estimate separate models for four age groups, it is assumed that the random effect vectors are four-variate Normal with null mean vectors and 4 x 4 covariance matrices D_{η} and D_{v}, respectively. To estimate the individual effects, a Bayesian approach is used to represent the joint density function given the data by f(β, η, v, D_{v}, D_{η} | y). According to the Bayes process, this can be estimated once the conditional distributions are known:
f_{1}(β | η, v, D_{v}, D_{η}, y), f_{2}(D_{v}, D_{η} | η, v, y), and f_{3}(η, v | β, D_{v}, D_{η}, y).
To generate random draws from these distributions, Markov Chain Monte Carlo (MCMC) processes need to be used. These are a body of methods for generating pseudo-random draws from probability distributions via Markov chains. A Markov chain is fully specified by its starting distribution P(X_{0}) and the transition kernel P(X_{t} | X_{t-1}).
Each MCMC step that involves the vector of binary outcome variables y in the conditioning set needs first to be modified by defining a pseudo-likelihood using survey weights. In defining pseudo-likelihood, weights are introduced after scaling them to the effective sample size based on a suitable design effect. Note that with the pseudo-likelihood, the covariance matrix of the pseudo-score functions is no longer equal to the pseudo-information matrix; therefore, a sandwich-type of covariance matrix was used to compute the design effect. In this process, weights are largely assumed to be noninformative (i.e., unrelated to the outcome variable y). The assumption of noninformative weights is useful in finding tractable expressions for the appropriate information matrix of the pseudo-score functions. The pseudo-log-likelihood remains an unbiased estimate of the finite-population log-likelihood regardless of this assumption.
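A weighted pseudo-log-likelihood of the kind described above can be sketched as follows. The scaling rule is an assumption: the weights are rescaled to sum to the effective sample size, taken here as the number of respondents divided by a single design effect. The text does not give the exact rule, and the function names are illustrative.

```python
import math


def scale_weights(weights, design_effect):
    """Rescale weights so they sum to the effective sample size n / deff."""
    n_eff = len(weights) / design_effect
    total = sum(weights)
    return [w * n_eff / total for w in weights]


def pseudo_log_likelihood(y, p, scaled_weights):
    """Weighted Bernoulli log-likelihood for binary outcomes y with propensities p."""
    return sum(
        w * (yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
        for yi, pi, w in zip(y, p, scaled_weights)
    )


# Hypothetical respondents: outcomes, fitted propensities, and raw sample weights.
y = [1, 0, 0, 1]
p = [0.30, 0.10, 0.20, 0.40]
raw_weights = [1200.0, 800.0, 1500.0, 500.0]
scaled = scale_weights(raw_weights, design_effect=2.0)
pll = pseudo_log_likelihood(y, p, scaled)
```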
Step I [β_{a} | η, v, y] (note that this does not depend on D_{η}, D_{v})
With a flat prior for β_{a}, the conditional posterior is proportional to the pseudo-likelihood function. For large samples, this posterior can be approximated by the multivariate Normal distribution with mean vector equal to the pseudo-maximum likelihood estimate and with asymptotic covariance matrix having the associated sandwich form. Assuming that the survey weights are noninformative makes the age group-specific β_{a} vectors conditionally independent of each other. Therefore, the β_{a} can be updated separately at each MCMC cycle.
Step II [η_{i} | β, v_{i}, D_{η}, y] (this does not depend on D_{v})
Here, the conditional posterior is proportional to the product of the prior g(η_{i} | .), the pseudo-likelihood function f(y | .), and the prior p(β, D_{η}); this last prior can be omitted because it does not involve η_{i}. Calculating the denominator (the normalization constant) of the posterior distribution for η_{i} requires multidimensional integration and is numerically intractable. To get around this problem, the Metropolis-Hastings (M-H) algorithm is used, which requires a dominating density convenient for Monte Carlo sampling. For this purpose, the mode and curvature of the conditional posterior distribution are used; these can be obtained from its numerator alone. A Gaussian distribution with matching mode and curvature then defines the dominating density for M-H. As with the age group-specific β_{a} parameters, the State-specific random effect vectors η_{i} are conditionally independent of each other and can be updated separately at each MCMC cycle.
Step III [v_{ij} | β, η_{i}, D_{v}, y] (this does not depend on D_{η})
Similar to Step II.
Step IV [D_{η} | η], [D_{v} | v] (here, η and v include all the information from y)
Here, the pseudo-likelihood involving design weights comes in implicitly through the conditioning parameters η and v evaluated at the current cycle. An exact conditional posterior distribution is obtained because the inverse Wishart priors for D_{η} and D_{v} are conjugate.
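The Metropolis-Hastings update in Step II can be illustrated in one dimension. The sketch below is a simplification under stated assumptions: a single scalar State effect instead of a four-variate vector, a Normal prior with known variance, fixed-effect contributions supplied as offsets, and an independence-type M-H sampler whose Gaussian proposal matches the mode and curvature of the conditional posterior. All function names and data are hypothetical.

```python
import math
import random


def log_post(eta, y, offset, w, prior_var):
    """Unnormalized log posterior: weighted Bernoulli pseudo-log-likelihood + Normal prior."""
    total = -eta * eta / (2.0 * prior_var)
    for yk, ak, wk in zip(y, offset, w):
        lp = ak + eta
        total += wk * (yk * lp - math.log1p(math.exp(lp)))
    return total


def grad_and_info(eta, y, offset, w, prior_var):
    """First derivative and negative second derivative of log_post at eta."""
    grad = -eta / prior_var
    info = 1.0 / prior_var
    for yk, ak, wk in zip(y, offset, w):
        p = 1.0 / (1.0 + math.exp(-(ak + eta)))
        grad += wk * (yk - p)
        info += wk * p * (1.0 - p)
    return grad, info


def find_mode(y, offset, w, prior_var, iterations=25):
    """Newton iterations for the posterior mode; returns (mode, curvature at the mode)."""
    eta = 0.0
    for _ in range(iterations):
        grad, info = grad_and_info(eta, y, offset, w, prior_var)
        eta += grad / info
    return eta, grad_and_info(eta, y, offset, w, prior_var)[1]


def mh_independence(y, offset, w, prior_var, n_draws, seed=0):
    """Independence M-H with a Gaussian proposal matched to mode and curvature."""
    rng = random.Random(seed)
    mode, info = find_mode(y, offset, w, prior_var)
    sd = info ** -0.5

    def log_q(x):  # Gaussian proposal log density, up to an additive constant
        return -0.5 * ((x - mode) / sd) ** 2

    current = mode
    current_score = log_post(current, y, offset, w, prior_var) - log_q(current)
    draws = []
    for _ in range(n_draws):
        proposal = rng.gauss(mode, sd)
        score = log_post(proposal, y, offset, w, prior_var) - log_q(proposal)
        # Accept with probability min(1, ratio of target/proposal weights).
        if math.log(1.0 - rng.random()) < score - current_score:
            current, current_score = proposal, score
        draws.append(current)
    return draws


# Hypothetical data for one State: outcomes, fixed-effect offsets, scaled weights.
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
offset = [-2.0] * 10
w = [1.0] * 10
draws = mh_independence(y, offset, w, prior_var=0.25, n_draws=2000)
```

Only the numerator of the posterior is ever evaluated, which is why the intractable normalization constant does not need to be computed.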
Remarks
B.7 Validation and Other Results
The following validation methodology was implemented at the time of the estimation of the 2000 percentage treatment gap and is specific to this measure. Validation was also conducted earlier at the time of the first release of the 1999 NHSDA data (OAS, 2000) and was based on the seven variables discussed in that report. Subsequently, an error in the imputation program was discovered and corrected, and the corrected file was used for the validation of the treatment gap estimation. Further information about the impact of the error on the previously released data from the 1999 NHSDA is provided in the 2000 Summary of Findings (OAS, 2001).
To validate the fit of the SAE models, the eight large sample States were used as internal benchmarks. For this purpose, 6 pseudo-FI regions within each large sample State were created by pooling the 48 initial regions into 6 groups of 8. Each of these 6 pseudo-FI regions was then expected to have 16 area segments per calendar quarter (8 FI regions with 2 segments each). For each of these pseudo-FI region-by-quarter sets of 16 area segments, any segments devoid of interviews were first randomly replaced by a selection from the non-empty segments in the set. The completed set of 16 segments from each pseudo-FI region-by-quarter combination was then randomly partitioned into 8 replicates of 2 segments each. When combined, each pair of large sample States had 12 pseudo-FI regions. By pooling one segment pair from each of the 48 pseudo-FI region-by-quarter combinations (12 pseudo-FI regions by 4 quarters), 8 substate replicates were formed. Each of these 8 substate replicates mimicked the size and design structure of a small sample State.
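The replicate construction for one pseudo-FI region-by-quarter set can be sketched as follows. The data layout and function name are hypothetical, and the text does not say whether replacement segments were drawn with or without replacement; this sketch draws each replacement independently.

```python
import random


def form_replicates(segments, rng, n_replicates=8):
    """Partition one pseudo-FI-region-by-quarter set of segments into replicates.

    `segments` maps a segment id to its list of interviews (possibly empty).
    """
    non_empty = [sid for sid, interviews in segments.items() if interviews]
    # Replace each empty segment by a random draw from the non-empty ones.
    completed = [sid if segments[sid] else rng.choice(non_empty) for sid in segments]
    rng.shuffle(completed)
    size = len(completed) // n_replicates
    return [completed[i * size:(i + 1) * size] for i in range(n_replicates)]


# Hypothetical set of 16 segments, two of which yielded no interviews.
segments = {f"seg{i:02d}": ["interview"] * 9 for i in range(16)}
segments["seg03"] = []
segments["seg11"] = []
replicates = form_replicates(segments, random.Random(2000))

# 12 pseudo-FI regions per State pair times 4 quarters gives 48 such sets.
sets_per_state_pair = 12 * 4
```

Pooling replicate r from every one of the 48 sets would then give the r-th pseudo-small State sample for a State pair.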
Having created 8 pseudo-small State samples and associated universe-level files for each of the 4 paired States, SAEs were then produced for the 32 pseudo-States. Table B.4 shows these 32 substate SAEs and their direct survey-weighted analogs for the percentage treatment gap.^{4} Relative absolute biases of the substate estimates are shown where the full State sample direct estimate is used as the benchmark value.
The State-specific relative absolute bias (RB) quantities in Table B.4 equal the absolute difference between the average of the eight substate small area estimates and the full sample design-based benchmark for the State pair (e.g., California and Texas), divided by the benchmark. The average relative absolute bias (ARB) is the simple average of the RBs across the four combined-State pairs. The average relative bias across the 32 pseudo-States was only about 4 percent. This implies that, on average, for a pseudo-State (similar in design and sample size to the 42 small States and the District of Columbia) with an estimated 2 percent treatment gap, the true value in the population is within about 0.08 percentage points of the estimate (4 percent of 2 percent).
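The RB calculation can be reproduced from the printed entries of Table B.4. The sketch below uses the Total column for two State pairs; because the printed values are rounded to two decimals, the results come out close to, but not exactly equal to, the published RB figures (10.46 and 0.37).

```python
def relative_absolute_bias(substate_estimates, benchmark):
    """|mean of substate SAEs - benchmark| / benchmark, expressed in percent."""
    mean_sae = sum(substate_estimates) / len(substate_estimates)
    return 100.0 * abs(mean_sae - benchmark) / benchmark


# Total (12 or older) column of Table B.4, as printed.
ca_tx = [2.25, 2.27, 2.34, 2.59, 2.13, 2.12, 1.96, 2.09]
ny_fl = [1.70, 1.88, 1.93, 1.88, 1.82, 1.69, 1.59, 2.02]
rb_ca_tx = relative_absolute_bias(ca_tx, benchmark=2.01)  # California and Texas DBE
rb_ny_fl = relative_absolute_bias(ny_fl, benchmark=1.82)  # New York and Florida DBE

# A 4 percent relative bias on a 2 percent treatment gap, in percentage points.
implied_margin = 0.04 * 2.0
```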
To compare the overall precision of the small area estimates with the direct survey estimates, ratios of the corresponding 95 percent Bayes prediction intervals, which fully account for the posterior variance of the fixed and random effect parameters, were compared with the corresponding direct survey 95 percent confidence intervals. These results are displayed in Table B.5.
The SAE and direct intervals are summarized by showing average ratios of the relative interval widths (the interval width for a State divided by the corresponding estimate for that State) by State and overall averages of the ratios across States. The average ratio across the 32 pseudo-States is about 1.80. This indicates generally that the confidence intervals for direct design-based estimates based on the same sample size would be about 1.8 times as wide as the prediction intervals resulting from the HB approach. Put another way, the HB estimates are equivalent in precision to a direct estimate based on a sample that is 3.3 times larger. The tables also present the average relative root mean square (RMSQ), a measure that takes into consideration both the (small) bias and the variance in the HB estimation.
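The link between the interval width ratio and the equivalent sample size follows from the usual assumption that interval width is proportional to one over the square root of the sample size. A minimal sketch, with an illustrative function name:

```python
def equivalent_sample_multiplier(width_ratio: float) -> float:
    """Factor by which a direct-estimate sample must grow to match a narrower interval,
    assuming interval width is proportional to 1 / sqrt(n)."""
    return width_ratio ** 2


multiplier = equivalent_sample_multiplier(1.80)
```

The rounded ratio of 1.80 gives a multiplier of about 3.2; the 3.3 quoted in the text presumably reflects the unrounded average ratio, which is an assumption on our part.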
B.8 Caveats
Table B.1 shows the screening, interview, and overall response rate for each State and the District of Columbia. As mentioned in the text, these variable response rates can be associated with variable levels of nonresponse bias. In addition, there may be varying levels of response bias as a result of underreporting (and sometimes overreporting) use of illicit substances. For 1999 and 2000, the assumption being made is that the biases from these two sources are constant across States so that comparisons among States still hold.
Another possible contributor to bias in the State estimates, and the estimates in general, was the effect of editing and imputation. In developing the editing and imputation process, the desire was to minimize the amount of editing, typically somewhat subjective, and instead let the random imputation process supply any partially missing information. Overall, the percentage of imputed information was quite small for most substances. For example, respondents may have indicated that they used the drug in their lifetime or in the past year, but left blank the question about use in the past month. The method is based on a multivariate imputation in which some demographic and other substance use information from the respondent is used to determine a donor who is similar in those characteristics but has supplied data for the drug in question. Often, information was also available from the partial respondent on the recency of drug use. For many of the records, this auxiliary information was available. For a small portion, no auxiliary information was available, in which case a random donor with similar drug use patterns and demographic characteristics was used.
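The random donor fallback described above can be sketched as a draw from the respondents who match the recipient on a set of characteristics and who supplied the missing item. This is a deliberately simplified illustration: the actual procedure is multivariate and its matching rules are not given in this appendix, so the function name, field names, and records below are all hypothetical.

```python
import random


def impute_from_random_donor(recipient, donors, match_keys, target, rng):
    """Return a donor's value of `target`, drawn at random from matching donors.

    Returns None when no donor matches; a real procedure would then relax the match.
    """
    pool = [
        d for d in donors
        if d.get(target) is not None
        and all(d.get(k) == recipient.get(k) for k in match_keys)
    ]
    return rng.choice(pool)[target] if pool else None


donors = [
    {"age_group": "18-25", "past_year_use": 1, "past_month_use": 1},
    {"age_group": "18-25", "past_year_use": 1, "past_month_use": 0},
    {"age_group": "26+", "past_year_use": 1, "past_month_use": 0},
]
recipient = {"age_group": "18-25", "past_year_use": 1, "past_month_use": None}
value = impute_from_random_donor(
    recipient, donors, ["age_group", "past_year_use"], "past_month_use", random.Random(1)
)
```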
Table B.1 2000 NHSDA Weighted CAI Screening and Interview Response Rates, by State

| State | Screening Response Rate | Interview Response Rate | Overall Response Rate | State | Screening Response Rate | Interview Response Rate | Overall Response Rate |
|---|---|---|---|---|---|---|---|
| Total | 92.84 | 73.93 | 68.64 | Missouri | 92.25 | 70.80 | 65.31 |
| Alabama | 95.50 | 77.98 | 74.47 | Montana | 94.91 | 80.21 | 76.13 |
| Alaska | 95.43 | 80.24 | 76.58 | Nebraska | 93.13 | 74.58 | 69.46 |
| Arizona | 92.99 | 73.78 | 68.61 | Nevada | 92.08 | 74.44 | 68.54 |
| Arkansas | 97.19 | 81.00 | 78.73 | New Hampshire | 92.41 | 75.12 | 69.42 |
| California | 90.99 | 69.50 | 63.24 | New Jersey | 91.96 | 66.56 | 61.21 |
| Colorado | 94.84 | 75.26 | 71.37 | New Mexico | 97.43 | 80.80 | 78.72 |
| Connecticut | 89.83 | 71.36 | 64.10 | New York | 88.78 | 73.73 | 65.46 |
| Delaware | 92.91 | 68.25 | 63.42 | North Carolina | 94.51 | 73.19 | 69.17 |
| District of Columbia | 93.50 | 85.56 | 80.00 | North Dakota | 94.43 | 79.46 | 75.03 |
| Florida | 94.64 | 75.73 | 71.67 | Ohio | 94.89 | 75.79 | 71.92 |
| Georgia | 92.95 | 69.76 | 64.84 | Oklahoma | 93.06 | 74.85 | 69.66 |
| Hawaii | 91.95 | 78.45 | 72.14 | Oregon | 91.87 | 73.91 | 67.90 |
| Idaho | 93.94 | 74.45 | 69.94 | Pennsylvania | 94.37 | 73.50 | 69.36 |
| Illinois | 88.71 | 65.59 | 58.19 | Rhode Island | 91.26 | 74.11 | 67.63 |
| Indiana | 92.62 | 73.87 | 68.42 | South Carolina | 94.69 | 77.84 | 73.71 |
| Iowa | 94.78 | 80.00 | 75.83 | South Dakota | 95.15 | 76.67 | 72.95 |
| Kansas | 92.28 | 73.45 | 67.79 | Tennessee | 90.25 | 72.45 | 65.39 |
| Kentucky | 95.79 | 84.14 | 80.59 | Texas | 94.72 | 78.12 | 74.00 |
| Louisiana | 95.04 | 80.81 | 76.80 | Utah | 95.11 | 83.44 | 79.36 |
| Maine | 92.39 | 78.46 | 72.49 | Vermont | 92.62 | 80.80 | 74.83 |
| Maryland | 94.88 | 76.88 | 72.94 | Virginia | 91.44 | 75.18 | 68.75 |
| Massachusetts | 89.77 | 66.45 | 59.65 | Washington | 93.59 | 75.45 | 70.61 |
| Michigan | 93.19 | 73.18 | 68.20 | West Virginia | 95.19 | 78.17 | 74.41 |
| Minnesota | 94.66 | 80.62 | 76.32 | Wisconsin | 94.33 | 75.06 | 70.81 |
| Mississippi | 93.60 | 79.14 | 74.07 | Wyoming | 95.41 | 76.61 | 73.09 |

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.
Table B.2 Estimated Numbers (in Thousands) of Persons Aged 12 or Older, by Age Group and State: 2000

| State | Total | Age 12-17 | Age 18-25 | Age 26 or Older |
|---|---|---|---|---|
| Total | 223,280 | 23,368 | 28,984 | 170,927 |
| Alabama | 3,654 | 371 | 476 | 2,807 |
| Alaska | 491 | 63 | 71 | 357 |
| Arizona | 3,866 | 434 | 516 | 2,916 |
| Arkansas | 2,159 | 225 | 279 | 1,655 |
| California | 25,736 | 2,851 | 3,513 | 19,371 |
| Colorado | 3,411 | 358 | 452 | 2,601 |
| Connecticut | 2,701 | 257 | 308 | 2,136 |
| Delaware | 630 | 65 | 79 | 487 |
| District of Columbia | 424 | 44 | 58 | 321 |
| Florida | 12,693 | 1,178 | 1,368 | 10,147 |
| Georgia | 6,354 | 680 | 863 | 4,811 |
| Hawaii | 975 | 95 | 115 | 764 |
| Idaho | 1,083 | 130 | 165 | 789 |
| Illinois | 9,768 | 998 | 1,306 | 7,465 |
| Indiana | 4,949 | 512 | 665 | 3,772 |
| Iowa | 2,390 | 249 | 319 | 1,822 |
| Kansas | 2,155 | 240 | 293 | 1,622 |
| Kentucky | 3,287 | 329 | 435 | 2,524 |
| Louisiana | 3,561 | 418 | 519 | 2,624 |
| Maine | 1,047 | 103 | 122 | 822 |
| Maryland | 4,281 | 421 | 510 | 3,349 |
| Massachusetts | 5,119 | 504 | 611 | 4,004 |
| Michigan | 7,918 | 832 | 1,032 | 6,053 |
| Minnesota | 3,954 | 431 | 539 | 2,985 |
| Mississippi | 2,270 | 259 | 323 | 1,688 |
| Missouri | 4,534 | 476 | 596 | 3,462 |
| Montana | 776 | 85 | 100 | 591 |
| Nebraska | 1,376 | 154 | 189 | 1,032 |
| Nevada | 1,544 | 146 | 184 | 1,214 |
| New Hampshire | 1,007 | 105 | 120 | 782 |
| New Jersey | 6,717 | 629 | 783 | 5,305 |
| New Mexico | 1,490 | 174 | 211 | 1,105 |
| New York | 14,782 | 1,476 | 1,825 | 11,480 |
| North Carolina | 6,365 | 651 | 777 | 4,936 |
| North Dakota | 535 | 62 | 77 | 396 |
| Ohio | 9,292 | 951 | 1,212 | 7,129 |
| Oklahoma | 2,744 | 306 | 367 | 2,072 |
| Oregon | 2,827 | 276 | 355 | 2,197 |
| Pennsylvania | 10,117 | 988 | 1,186 | 7,943 |
| Rhode Island | 821 | 84 | 95 | 642 |
| South Carolina | 3,130 | 326 | 386 | 2,418 |
| South Dakota | 619 | 73 | 88 | 458 |
| Tennessee | 4,657 | 464 | 598 | 3,595 |
| Texas | 16,057 | 1,877 | 2,368 | 11,813 |
| Utah | 1,715 | 248 | 326 | 1,142 |
| Vermont | 512 | 55 | 63 | 394 |
| Virginia | 5,648 | 563 | 691 | 4,395 |
| Washington | 4,784 | 487 | 606 | 3,691 |
| West Virginia | 1,553 | 141 | 195 | 1,216 |
| Wisconsin | 4,376 | 476 | 590 | 3,310 |
| Wyoming | 425 | 49 | 61 | 315 |

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.
Table B.3 Survey Sample Size for Persons Aged 12 or Older, by Age Group and State: 2000

| State | Total | Age 12-17 | Age 18-25 | Age 26 or Older |
|---|---|---|---|---|
| Total | 71,764 | 25,717 | 22,613 | 23,434 |
| Alabama | 936 | 294 | 337 | 305 |
| Alaska | 833 | 294 | 257 | 282 |
| Arizona | 927 | 292 | 303 | 332 |
| Arkansas | 960 | 310 | 364 | 286 |
| California | 5,022 | 2,365 | 1,354 | 1,303 |
| Colorado | 911 | 278 | 298 | 335 |
| Connecticut | 891 | 299 | 262 | 330 |
| Delaware | 928 | 321 | 297 | 310 |
| District of Columbia | 918 | 259 | 340 | 319 |
| Florida | 3,478 | 1,194 | 1,140 | 1,144 |
| Georgia | 1,145 | 520 | 330 | 295 |
| Hawaii | 945 | 309 | 307 | 329 |
| Idaho | 894 | 311 | 283 | 300 |
| Illinois | 3,660 | 1,262 | 1,128 | 1,270 |
| Indiana | 1,061 | 405 | 353 | 303 |
| Iowa | 921 | 284 | 324 | 313 |
| Kansas | 897 | 291 | 323 | 283 |
| Kentucky | 1,018 | 341 | 345 | 332 |
| Louisiana | 939 | 356 | 278 | 305 |
| Maine | 901 | 321 | 234 | 346 |
| Maryland | 967 | 332 | 317 | 318 |
| Massachusetts | 1,002 | 378 | 298 | 326 |
| Michigan | 3,576 | 1,234 | 1,090 | 1,252 |
| Minnesota | 893 | 297 | 306 | 290 |
| Mississippi | 917 | 309 | 320 | 288 |
| Missouri | 893 | 314 | 302 | 277 |
| Montana | 914 | 276 | 334 | 304 |
| Nebraska | 906 | 311 | 291 | 304 |
| Nevada | 925 | 305 | 284 | 336 |
| New Hampshire | 883 | 280 | 246 | 357 |
| New Jersey | 1,200 | 553 | 289 | 358 |
| New Mexico | 874 | 315 | 267 | 292 |
| New York | 3,589 | 1,160 | 1,142 | 1,287 |
| North Carolina | 1,043 | 418 | 326 | 299 |
| North Dakota | 896 | 288 | 320 | 288 |
| Ohio | 3,678 | 1,227 | 1,215 | 1,236 |
| Oklahoma | 973 | 303 | 374 | 296 |
| Oregon | 864 | 288 | 275 | 301 |
| Pennsylvania | 3,997 | 1,474 | 1,195 | 1,328 |
| Rhode Island | 950 | 293 | 324 | 333 |
| South Carolina | 855 | 275 | 269 | 311 |
| South Dakota | 855 | 289 | 272 | 294 |
| Tennessee | 947 | 367 | 285 | 295 |
| Texas | 4,020 | 1,498 | 1,307 | 1,215 |
| Utah | 1,031 | 362 | 372 | 297 |
| Vermont | 981 | 344 | 320 | 317 |
| Virginia | 1,047 | 437 | 274 | 336 |
| Washington | 1,006 | 408 | 289 | 309 |
| West Virginia | 950 | 322 | 286 | 342 |
| Wisconsin | 1,119 | 453 | 312 | 354 |
| Wyoming | 828 | 301 | 255 | 272 |

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.
Table B.4 Simulated Substate Prevalence Rates, Relative Absolute Bias, and Root Mean Square for Persons Needing But Not Receiving Treatment for an Illicit Drug Problem in the Past Year: 2000
Needing But Not Receiving Treatment for an Illicit Drug Problem
| Area or Measure | Total | Aged 12-17 | Aged 18-25 | Aged 26 or Older |
| --- | --- | --- | --- | --- |
| California and Texas SAE | 2.18 | 5.30 | 4.79 | 1.21 |
| California and Texas DBE | 2.01 | 5.34 | 4.95 | 0.95 |
| CA_TX1 | 2.25 | 4.69 | 4.20 | 1.51 |
| CA_TX2 | 2.27 | 6.24 | 5.17 | 1.12 |
| CA_TX3 | 2.34 | 5.97 | 5.09 | 1.27 |
| CA_TX4 | 2.59 | 5.37 | 5.71 | 1.59 |
| CA_TX5 | 2.13 | 5.11 | 4.93 | 1.16 |
| CA_TX6 | 2.12 | 4.98 | 4.69 | 1.21 |
| CA_TX7 | 1.96 | 4.36 | 4.43 | 1.13 |
| CA_TX8 | 2.09 | 5.28 | 4.20 | 1.21 |
| RMSQ | 13.75 | 11.03 | 10.36 | 38.26 |
| REL ABS BIAS | 10.46 | 1.66 | 3.06 | 34.08 |
| New York and Florida SAE | 1.84 | 3.90 | 6.78 | 0.85 |
| New York and Florida DBE | 1.82 | 3.48 | 7.04 | 0.85 |
| NY_FL1 | 1.70 | 3.98 | 6.23 | 0.76 |
| NY_FL2 | 1.88 | 4.04 | 6.94 | 0.87 |
| NY_FL3 | 1.93 | 4.51 | 7.36 | 0.81 |
| NY_FL4 | 1.88 | 3.93 | 6.69 | 0.92 |
| NY_FL5 | 1.82 | 3.66 | 6.30 | 0.93 |
| NY_FL6 | 1.69 | 4.13 | 5.80 | 0.78 |
| NY_FL7 | 1.59 | 3.68 | 5.37 | 0.77 |
| NY_FL8 | 2.02 | 3.78 | 7.52 | 0.99 |
| RMSQ | 7.37 | 15.84 | 12.32 | 9.65 |
| REL ABS BIAS | 0.37 | 13.97 | 7.33 | 0.94 |
| Ohio and Michigan SAE | 1.64 | 3.84 | 5.44 | 0.70 |
| Ohio and Michigan DBE | 1.66 | 4.00 | 5.59 | 0.67 |
| OH_MI1 | 1.52 | 3.21 | 5.34 | 0.64 |
| OH_MI2 | 1.72 | 3.91 | 5.27 | 0.82 |
| OH_MI3 | 1.75 | 3.98 | 5.49 | 0.81 |
| OH_MI4 | 1.59 | 4.35 | 5.00 | 0.63 |
| OH_MI5 | 1.60 | 4.46 | 5.16 | 0.61 |
| OH_MI6 | 1.62 | 3.49 | 5.73 | 0.66 |
| OH_MI7 | 1.80 | 3.21 | 6.01 | 0.89 |
| OH_MI8 | 1.63 | 3.91 | 5.25 | 0.70 |
| RMSQ | 5.31 | 12.04 | 6.37 | 16.34 |
| REL ABS BIAS | 0.37 | 4.68 | 3.25 | 7.16 |
| Pennsylvania and Illinois SAE | 1.74 | 3.40 | 5.92 | 0.86 |
| Pennsylvania and Illinois DBE | 1.70 | 3.17 | 5.84 | 0.85 |
| PA_IL1 | 1.83 | 3.02 | 5.67 | 1.05 |
| PA_IL2 | 1.69 | 4.03 | 5.05 | 0.84 |
| PA_IL3 | 1.69 | 3.22 | 6.47 | 0.72 |
| PA_IL4 | 1.75 | 3.85 | 5.74 | 0.83 |
| PA_IL5 | 2.03 | 4.10 | 6.98 | 0.96 |
| PA_IL6 | 1.65 | 3.06 | 5.47 | 0.86 |
| PA_IL7 | 1.77 | 3.23 | 6.85 | 0.77 |
| PA_IL8 | 1.78 | 3.39 | 5.23 | 1.01 |
| RMSQ | 7.65 | 16.41 | 11.93 | 13.63 |
| REL ABS BIAS | 4.11 | 10.08 | 1.54 | 4.09 |
| AVERAGE RMSQ | 8.52 | 13.83 | 10.24 | 19.47 |
| AVERAGE REL ABS BIAS | 3.83 | 7.60 | 3.79 | 11.57 |
Note: Relative Absolute Bias = |Combined State Design-Based Estimate (DBE) - Mean of Eight Substate Small Area Estimates (SAE)| / Combined State Design-Based Estimate.
Note: Root Mean Square (RMSQ) = Sqrt(Mean Squared Difference of Substate Small Area Estimates with Respect to Combined State Design-Based Estimate) / Combined State Design-Based Estimate.
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.
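The two validation measures defined in the notes to Table B.4 can be checked against the published figures. The following is a minimal sketch, not the original computation: the function names are illustrative, and the scaling by 100 is an assumption inferred from the magnitude of the published RMSQ and REL ABS BIAS rows. The inputs are the rounded values from the "Total" column for the combined California and Texas area.

```python
import math


def relative_absolute_bias(dbe, substate_saes):
    """|DBE - mean of substate SAEs| / DBE, scaled to a percentage (assumed)."""
    mean_sae = sum(substate_saes) / len(substate_saes)
    return 100.0 * abs(dbe - mean_sae) / dbe


def relative_rmsq(dbe, substate_saes):
    """sqrt(mean squared difference of SAEs from DBE) / DBE, as a percentage (assumed)."""
    mean_sq = sum((s - dbe) ** 2 for s in substate_saes) / len(substate_saes)
    return 100.0 * math.sqrt(mean_sq) / dbe


# Rounded values from the "Total" column of Table B.4 (California and Texas).
ca_tx_dbe = 2.01
ca_tx_substate_saes = [2.25, 2.27, 2.34, 2.59, 2.13, 2.12, 1.96, 2.09]

bias = relative_absolute_bias(ca_tx_dbe, ca_tx_substate_saes)
rmsq = relative_rmsq(ca_tx_dbe, ca_tx_substate_saes)
print(f"REL ABS BIAS: {bias:.2f}")
print(f"RMSQ: {rmsq:.2f}")
```

With these rounded inputs the sketch gives roughly 10.4 and 13.7, close to the published 10.46 and 13.75. The small gap is presumably because the published values were computed from unrounded estimates.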
Table B.5 Ratio of Relative Width of Design-Based Confidence Intervals to Small Area Estimation Prediction Intervals for Persons Needing But Not Receiving Treatment for an Illicit Drug Problem in the Past Year: 2000
Ratio of Relative Width
| Area | Total | Aged 12-17 | Aged 18-25 | Aged 26 or Older |
| --- | --- | --- | --- | --- |
| CA_TX1 | 1.35 | 1.60 | 1.95 | 1.03 |
| CA_TX2 | 1.37 | 1.13 | 1.09 | 5.19 |
| CA_TX3 | 1.42 | 1.38 | 1.58 | 1.89 |
| CA_TX4 | 1.69 | 1.61 | 1.81 | 1.87 |
| CA_TX5 | 1.19 | 1.15 | 1.57 | 2.37 |
| CA_TX6 | 1.63 | 1.13 | 1.77 | 2.69 |
| CA_TX7 | 1.54 | 1.43 | 2.04 | 2.73 |
| CA_TX8 | 1.78 | 1.41 | 1.94 | 3.17 |
| California and Texas | 1.37 | 1.09 | 1.18 | 1.83 |
| AVERAGE OVER 8 SUBSTATES | 1.50 | 1.35 | 1.72 | 2.62 |
| NY_FL1 | 1.39 | 1.91 | 1.30 | 5.42 |
| NY_FL2 | 2.57 | 1.77 | 1.26 | 2.99 |
| NY_FL3 | 1.42 | 1.91 | 1.65 | 2.57 |
| NY_FL4 | 2.96 | 2.17 | 1.65 | 2.89 |
| NY_FL5 | 2.42 | 2.41 | 1.55 | 2.74 |
| NY_FL6 | 2.00 | 1.61 | 1.36 | 3.55 |
| NY_FL7 | 1.54 | 1.85 | 1.84 | 2.62 |
| NY_FL8 | 1.73 | 2.18 | 1.40 | 1.88 |
| New York and Florida | 1.61 | 1.27 | 1.16 | 1.75 |
| AVERAGE OVER 8 SUBSTATES | 2.01 | 1.97 | 1.50 | 3.08 |
| OH_MI1 | 2.34 | 1.74 | 2.24 | 5.12 |
| OH_MI2 | 2.16 | 1.90 | 1.24 | 2.18 |
| OH_MI3 | 2.15 | 1.30 | 1.78 | 2.73 |
| OH_MI4 | 2.03 | 1.70 | 1.65 | 5.23 |
| OH_MI5 | 1.55 | 1.17 | 1.99 | * |
| OH_MI6 | 1.59 | 1.49 | 1.42 | 5.48 |
| OH_MI7 | 1.84 | 2.20 | 1.17 | 1.73 |
| OH_MI8 | 1.55 | 1.49 | 1.80 | 1.17 |
| Ohio and Michigan | 1.37 | 1.22 | 1.01 | 1.42 |
| AVERAGE OVER 8 SUBSTATES | 1.90 | 1.62 | 1.66 | 3.38 |
| PA_IL1 | 1.63 | 2.36 | 1.79 | 1.23 |
| PA_IL2 | 2.52 | 1.86 | 2.24 | 3.75 |
| PA_IL3 | 1.74 | 2.00 | 1.46 | * |
| PA_IL4 | 1.59 | 1.34 | 2.12 | 2.31 |
| PA_IL5 | 1.66 | 1.49 | 1.29 | 2.11 |
| PA_IL6 | 1.79 | 2.12 | 1.32 | 1.94 |
| PA_IL7 | 1.90 | 2.51 | 1.37 | 5.44 |
| PA_IL8 | 2.12 | 1.94 | 1.49 | 1.76 |
| Pennsylvania and Illinois | 1.48 | 1.38 | 1.30 | 1.36 |
| AVERAGE OVER 8 SUBSTATES | 1.87 | 1.95 | 1.64 | 2.65 |
* Relative width not computed due to design-based estimate of zero.
Note: Relative Width Ratio = (Length of Design-Based Confidence Interval/Design-Based Estimate)/(Length of Small Area Estimate Prediction Interval/Small Area Estimate).
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 2000.
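The relative width ratio defined in the note to Table B.5 can be sketched as follows. This is an illustration only: the function names are hypothetical, and the interval endpoints in the example are made-up values, not survey results. A design-based estimate of zero returns `None`, mirroring the cells marked with an asterisk in the table.

```python
def relative_width(lower, upper, estimate):
    """Interval length divided by the point estimate; None when the estimate is zero."""
    if estimate == 0:
        return None
    return (upper - lower) / estimate


def relative_width_ratio(dbe, dbe_ci, sae, sae_pi):
    """Relative width of the design-based CI over that of the SAE prediction interval."""
    dbe_rel = relative_width(dbe_ci[0], dbe_ci[1], dbe)
    sae_rel = relative_width(sae_pi[0], sae_pi[1], sae)
    if dbe_rel is None or sae_rel is None or sae_rel == 0:
        return None  # ratio is undefined, as for the "*" cells
    return dbe_rel / sae_rel


# Hypothetical inputs (not taken from the survey): estimates and interval endpoints.
example = relative_width_ratio(dbe=2.0, dbe_ci=(1.0, 3.0), sae=2.0, sae_pi=(1.5, 2.5))
zero_case = relative_width_ratio(dbe=0.0, dbe_ci=(0.0, 0.0), sae=0.7, sae_pi=(0.4, 1.2))
print(example, zero_case)
```

A ratio above 1 means the design-based confidence interval is wider, relative to its estimate, than the small area estimation prediction interval is relative to its own.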
^{3} The panel included William Bell of the U.S. Bureau of the Census; Partha Lahiri of the University of Nebraska; Balgobin Nandram of Worcester Polytechnic Institute and the National Center for Health Statistics; Wesley Schaible, formerly Associate Commissioner for Research and Evaluation at the Bureau of Labor Statistics; and Alan Zaslavsky of Harvard University. Other attendees involved in the development or discussion were Ralph Folsom, Judith Lessler, Avinash Singh, and Akhil Vaish of RTI and Doug Wright of SAMHSA.
^{4} The validation results were based on a preliminary model; therefore, the combined State estimates shown in Table B.4 generally will not agree with estimates made by combining the corresponding State estimates from Table 6 or 7 in Chapter 3.
This page was last updated on June 03, 2008.