This document provides information on the model-based small area estimates of substance use and mental health disorders in states based on data from the 2021 National Survey on Drug Use and Health (NSDUH). These estimates are available online along with other related information.¹
NSDUH is an annual survey of the civilian, noninstitutionalized population aged 12 or older, conducted from January through December, and is sponsored by the Substance Abuse and Mental Health Services Administration (SAMHSA). The survey collects information from individuals residing in households and noninstitutional group quarters (e.g., shelters, rooming houses, dormitories) and from civilians living on military bases. The 2021 NSDUH used multimode data collection, in which respondents completed the survey via the web or in person in eligible locations.² In 2021, NSDUH collected data from 69,850 respondents aged 12 or older.
NSDUH is planned and managed by SAMHSA’s Center for Behavioral Health Statistics and Quality (CBHSQ). Data collection and analysis are conducted under contract with RTI International.³ A summary of NSDUH’s methodology is given in Section A.2. Section A.3 lists all the tables and files associated with the 2021 state estimates. Section A.4 details the criteria used to suppress estimates. Section A.5 describes the confidence intervals and margins of error and explains how to interpret them for the small area estimates. Section A.6 discusses related substance use measures and warns users against drawing conclusions by subtracting the small area estimate for one measure from that for another. Section A.7 briefly discusses methodological changes for the 2021 NSDUH.
The survey-weighted hierarchical Bayes (SWHB) small area estimation (SAE) methodology used in the production of state estimates from the 1999 to 2019⁴ surveys was also used in the production of the 2021 state estimates. The SWHB methodology is described in Appendix E of the 2001 state report (Wright, 2003b) and in Folsom and colleagues (1999). A general model description is given in Section B.1 of this document. A list of measures (outcomes) for which small area estimates are produced is given in Section B.2. Predictors used in the 2021 SAE modeling are listed and described in Section B.3. Selection of predictors for SAE modeling is described in Section B.4.
Small area estimates obtained using the SWHB methodology are design consistent (i.e., the small area estimates for states with large sample sizes are close to the robust design-based estimates). Additionally, the national small area estimates⁵ are very close to the national design-based estimates. However, to ensure internal consistency, it is desirable to have the national small area estimates exactly match the national design-based estimates. This process is called “benchmarking.” The benchmarked state-level estimates are also potentially less biased than the unbenchmarked state-level estimates. Beginning in 2002, exact benchmarking was introduced, as described in Section B.5.⁶ Tables of the estimated numbers of individuals associated with each measure are available online,⁷ and an explanation of how these counts and their respective Bayesian confidence intervals⁸ are calculated can be found in Section B.6. Section B.7 discusses the method to compute aggregated estimates by combining two age groups. The definition and explanation of the formula used in estimating the marijuana initiation rate are given in Section B.8. Note that, unlike the other outcomes discussed in this document, marijuana initiation is calculated as a ratio of two measures.
State estimates for the age groups 12 to 17, 18 to 25, 26 or older, 18 or older, and 12 or older⁹ are provided for all measures except any mental illness (AMI), serious mental illness (SMI), receipt of mental health services, major depressive episode (MDE; i.e., depression), serious thoughts of suicide, suicide plans, and suicide attempts. Additionally, estimates for youths aged 12 to 17 are not available for past year heroin use because heroin use in the past year for youths aged 12 to 17 was extremely rare in the 2021 NSDUH. As a result, estimates of past year heroin use for people aged 12 or older are also not produced.
Estimates of underage (aged 12 to 20) alcohol use and binge alcohol use were also produced.¹⁰ Alcohol consumption is expected to differ significantly across the 18 to 25 age group because of the legalization of alcohol at age 21. Therefore, it was decided that it would be useful to produce small area estimates for people aged 12 to 20. A short description of the methodology used to produce underage drinking estimates is provided in Section B.9.
The remainder of Section B covers four additional topics:
NSDUH is the primary source of statistical information on the use of tobacco, alcohol, prescription pain relievers, and other substances (e.g., marijuana, cocaine) by the U.S. civilian, noninstitutionalized population aged 12 or older. The survey also includes several series of questions that focus on mental health issues. NSDUH has been ongoing since 1971 and is conducted by the federal government. The survey collects information from residents of households and noninstitutional group quarters (e.g., shelters, rooming houses, dormitories) and from civilians living on military bases. NSDUH excludes homeless people who do not use shelters, military personnel on active duty, and residents of institutional group quarters, such as jails and hospitals. From 1999 to 2019, the data were collected via face-to-face (in-person) interviews at a respondent’s place of residence using a combination of computer-assisted personal interviewing conducted by an interviewer and audio computer-assisted self-interviewing. Because of the coronavirus disease 2019 (COVID-19) pandemic, an additional web data collection mode was introduced to the 2020 NSDUH and continued to be used in the 2021 survey.
SAMHSA suspended in-person data collection on the 2020 NSDUH on March 16, 2020, because of the COVID-19 pandemic, a situation that affected virtually all national surveys that collect data in person, including NSDUH. A small-scale data collection effort was conducted in July 2020 to test protocols to reduce the risk of COVID-19 infection through in-person data collection. Because of ongoing COVID-19 infection rates in the United States, however, it became evident that a return to full-scale in-person data collection would not be feasible for obtaining a representative sample with a sufficient number of interviews to produce national estimates with acceptable precision for people aged 12 or older. Therefore, SAMHSA approved multimode data collection (in-person and web-based data collection) for the 2020 NSDUH beginning in Quarter 4. In-person data collection resumed on October 1, 2020 (in locations where COVID-19 infection metrics were sufficiently low), and web-based data collection began on October 30, 2020. Therefore, in addition to the collection of data through multiple survey modes in 2020, there was a gap in full-scale data collection between Quarters 1 and 4. Detailed descriptions of the methodological changes to the 2020 NSDUH because of the COVID-19 pandemic are provided in Section A.7 of this document and in Chapters 2, 3, and 6 of the 2020 National Survey on Drug Use and Health (NSDUH): Methodological Summary and Definitions report (CBHSQ, 2021b).
The 2021 sample was selected using the coordinated sample design developed for the 2014 through 2022 NSDUHs. The coordinated sample design is state based, with an independent, multistage area probability sample within each state and the District of Columbia. This design designates 12 states as large sample states. These 12 states have the following target sample sizes per year: 4,560 interviews in California; 3,300 interviews in Florida, New York, and Texas; 2,400 interviews in Illinois, Michigan, Ohio, and Pennsylvania; and 1,500 interviews in Georgia, New Jersey, North Carolina, and Virginia. Making the sample sizes more proportional to the state population sizes improves the precision of national NSDUH estimates. This change also allows for a more cost-efficient sample allocation to the largest states while slightly increasing the sample sizes in smaller states to improve the precision of state estimates (note that the target sample size per year in the small states is 960 interviews, except for Hawaii, where the target sample size is 967 interviews). The fielded sample sizes for each state in 2021 are provided in Table C.1.
Nationally in 2021, approximately 220,740 addresses were screened, and 69,850 individuals responded within the screened addresses (see Table C.1). The weighted screening response rate (SRR) for 2021 was 22.2 percent, and the weighted interview response rate (IRR) was 46.2 percent, for an overall weighted response rate (ORR) of 10.3 percent (Table C.1). The ORRs for 2021 ranged from 7.1 percent in New Jersey to 18.6 percent in Vermont. Estimates have been adjusted to reflect the probability of selection, unit nonresponse, poststratification to known census population estimates, item imputation, and other aspects of the estimation process. These procedures are described in detail in 2021 National Survey on Drug Use and Health: Methodological Resource Book (CBHSQ, 2022a).
All sampled households are screened to confirm eligibility and to select zero, one, or two household members to participate in the survey. The weighted SRR is defined as the weighted number of successfully screened households (or dwelling units)¹¹ divided by the weighted number of eligible households, or
$$\text{SRR} = \frac{\sum w_{hh}\,\text{complete}_{hh}}{\sum w_{hh}\,\text{eligible}_{hh}},$$
where $w_{hh}$ is the inverse of the unconditional probability of selection for the household (hh) and excludes all adjustments for nonresponse and poststratification, and $\text{complete}_{hh}$ and $\text{eligible}_{hh}$ indicate (1 or 0) whether the household was successfully screened and whether it was eligible.
In successfully screened households, eligible household members who were selected were asked to complete the interview. The weighted IRR for NSDUH is defined as the weighted number of respondents divided by the weighted number of selected people, or
$$\text{IRR} = \frac{\sum w_{i}\,\text{complete}_{i}}{\sum w_{i}\,\text{selected}_{i}},$$
where $w_{i}$ is the inverse of the probability of selection for the ith person and includes household-level nonresponse and poststratification adjustments. To be considered a completed interview, a respondent must provide enough data to pass the usable case rule.¹²
The weighted ORR is defined as the product of the weighted SRR and the weighted IRR, or
$$\text{ORR} = \text{SRR} \times \text{IRR}.$$
For more details on the screening and response rates, see Section 3.3.1 in 2021 Methodological Resource Book (CBHSQ, 2022a).
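The three response rates defined above can be illustrated with a short sketch. The Python below is a minimal example under stated assumptions: the household and person records, their weights, and the function name `weighted_rate` are hypothetical and are not taken from NSDUH files. The final lines check that the published national SRR and IRR multiply to the published ORR.

```python
def weighted_rate(weights, numerator_flags, denominator_flags):
    """Weighted count of numerator cases divided by weighted count of denominator cases."""
    numerator = sum(w * f for w, f in zip(weights, numerator_flags))
    denominator = sum(w * f for w, f in zip(weights, denominator_flags))
    return numerator / denominator

# Hypothetical toy data: four sampled households with design weights.
hh_weights = [100.0, 150.0, 200.0, 50.0]
hh_eligible = [1, 1, 1, 0]   # 1 = eligible household
hh_complete = [1, 0, 1, 0]   # 1 = successfully screened
srr = weighted_rate(hh_weights, hh_complete, hh_eligible)

# Hypothetical toy data: three selected people with person-level weights.
person_weights = [120.0, 80.0, 200.0]
person_selected = [1, 1, 1]  # 1 = selected for the interview
person_complete = [1, 0, 1]  # 1 = completed a usable interview
irr = weighted_rate(person_weights, person_complete, person_selected)

# The overall rate is the product of the screening and interview rates.
orr = srr * irr

# Published 2021 national rates from the text: SRR 22.2 percent, IRR 46.2 percent.
published_orr = 0.222 * 0.462
print(f"toy ORR = {orr:.3f}; published ORR = {published_orr * 100:.1f} percent")
```

Because the ORR is a product of two rates, a low screening rate limits the overall rate no matter how high the interview rate is.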
This section lists all products associated with the 2021 state estimates. All estimates are based on data from the 2021 NSDUH only. Historically, starting with the 2002-2003 state report through the 2018-2019 state report, the state estimates have been produced by pooling two years of NSDUH data except for the 2002 state report where estimates were based only on 2002 data. The pooling of a current year’s data with a previous year’s data to produce state estimates was recommended by an SAE expert panel¹³ to increase the precision of year-to-year change estimates (e.g., 2017-2018 vs. 2018-2019). The panel also noted that a single year of NSDUH data is sufficient to produce reliable state estimates.
As mentioned in Section A.2, there was a gap in full-scale data collection between Quarters 1 and 4 of 2020 due to COVID-19, which made the 2020 data not comparable with any other year. It is anticipated that 2021 will become a new baseline for monitoring NSDUH trends, and as a result, state estimates presented here are based on 2021 data alone and are not compared with prior state estimates.
The following products exclude age groups 12 to 17 and 12 or older for past year heroin use because in 2021 heroin use among youth was very rare. Additionally, a suppression rule was applied to the state SAEs and suppressed estimates are noted by an asterisk (*) in the various tables discussed below. Information about the suppression criteria can be found in Section A.4. In addition to this methodology document for the 2021 state estimates, the following products are available at https://www.samhsa.gov/data/nsduh/state-reports-NSDUH-2021:
Beginning in 2021, suppression is applied to unreliable estimates. Estimates meeting the suppression criteria discussed here are designated as unreliable; they are not shown in the tables and are noted by asterisks (*). The suppression criterion is based on a combination of the relative standard error (RSE) of $-\ln(p)$ or $-\ln(1-p)$ and the effective sample size (EFN), where $p$ denotes the unbenchmarked small area estimate and $\ln(p)$ denotes the natural logarithm of $p$. For $p \le 50$ percent, an RSE of $-\ln(p)$ is used, and for $p > 50$ percent, an RSE of $-\ln(1-p)$ is used. The separate formulas for $p \le 50$ percent and $p > 50$ percent produce a symmetric suppression rule; that is, if $p$ is suppressed, then so will $(1 - p)$. By using the first-order Taylor series approximation method, an estimate of an RSE of $-\ln(p)$ and an RSE of $-\ln(1-p)$ is given by
$$\text{RSE}[-\ln(p)] = \frac{\sqrt{\operatorname{var}(p)}}{p\,[-\ln(p)]}, \qquad \text{RSE}[-\ln(1-p)] = \frac{\sqrt{\operatorname{var}(p)}}{(1-p)\,[-\ln(1-p)]},$$
where $\operatorname{var}(p)$ denotes the posterior variance of $p$. The EFN is defined as $n/\text{design effect}$, where $n$ denotes the raw sample size and design effect is defined as $\operatorname{var}(p)/[p(1-p)/n]$; hence, $\text{EFN} = p(1-p)/\operatorname{var}(p)$. A lower bound of 0.2 also was imposed on the design effects (i.e., all design effects that were less than 0.2 were changed to 0.2) to avoid publishing state by age group estimates with very small sample sizes or small prevalence estimates.
The following criterion was used to suppress state small area estimates:
when $p < 5.23$ percent, then suppress if an RSE of $-\ln(p)$ > 17.5 percent; when 5.23 percent ≤ $p$ ≤ 94.77 percent, then suppress if the EFN ≤ 68; and when $p > 94.77$ percent, then suppress if an RSE of $-\ln(1-p)$ > 17.5 percent.
A graph is shown in Figure 1 in order to describe the relationship between $p$ and the EFN for an RSE of $-\ln(p)$ = 17.5 percent when $p \le 50$ percent and for an RSE of $-\ln(1-p)$ = 17.5 percent when $p > 50$ percent. The suppression criterion switches to EFN between 5.23 percent and 94.77 percent so that the EFN is not allowed to fall below the EFN of 68 required at $p$ = 50 percent.
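The suppression rule can be written as a small decision function. The following Python is a minimal sketch rather than the production implementation: the function and constant names are hypothetical, proportions are expressed on a 0 to 1 scale, and the sketch assumes that the 0.2 design effect floor enters only through the EFN. The formulas are the log-transformed RSE and the EFN defined in this section (RSE of the negative natural logarithm of p or of 1 − p, and EFN as n divided by the design effect).

```python
import math

RSE_CUTOFF = 0.175             # 17.5 percent
EFN_CUTOFF = 68                # minimum effective sample size in the middle range
P_LOW, P_HIGH = 0.0523, 0.9477
DEFF_FLOOR = 0.2               # lower bound imposed on the design effect

def rse_neg_log(p, post_var):
    """Taylor-series RSE of -ln(p) for p <= 0.5, or of -ln(1 - p) for p > 0.5."""
    q = p if p <= 0.5 else 1.0 - p
    return math.sqrt(post_var) / (q * (-math.log(q)))

def effective_sample_size(p, post_var, n):
    """EFN = n / design effect, with the design effect floored at 0.2."""
    design_effect = post_var / (p * (1.0 - p) / n)
    return n / max(design_effect, DEFF_FLOOR)

def is_suppressed(p, post_var, n):
    """Apply the RSE rule in the tails and the EFN rule in the middle range."""
    if p < P_LOW or p > P_HIGH:
        return rse_neg_log(p, post_var) > RSE_CUTOFF
    return effective_sample_size(p, post_var, n) <= EFN_CUTOFF

# Hypothetical inputs: a 2 percent estimate with posterior standard error 0.02.
print(is_suppressed(0.02, 0.02 ** 2, 900))
```

At p = 50 percent with an EFN of exactly 68, the RSE of the negative log of p works out to about 17.5 percent, which is why the two parts of the rule meet at that point.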
At the top of each of the 35 tables showing state-level model-based estimates¹⁴ is the design-based national estimate along with a 95 percent design-based confidence interval, all of which are based on the survey design, the survey weights, and the reported data. The state estimates are model-based statistics (using SAE methodology) that have been adjusted (benchmarked) such that the population-weighted mean of the estimates across the 50 states and the District of Columbia equals the design-based national estimate. For more details on this benchmarking, see Section B.5. The region-level estimates are also benchmarked and are obtained by taking the population-weighted mean of the associated state-level benchmarked estimates. Associated with each state and regional estimate is a 95 percent Bayesian confidence interval. These intervals indicate the uncertainty in the estimate due to both sampling variability and model fit. For example, the state with the highest estimate of past month use of marijuana for young adults aged 18 to 25 in 2021 was Arizona, with an estimate of 40.0 percent and a 95 percent Bayesian confidence interval that ranged from 31.8 to 48.8 percent (see Table 3 of the state model-based prevalence estimates’ tables [CBHSQ, 2022c]). Assuming that sampling and modeling conditions held, the Bayes posterior probability was 0.95 that the true percentage of past month marijuana use in Arizona for young adults aged 18 to 25 in 2021 was between 31.8 and 48.8 percent. As noted earlier in footnote 8, the term “prediction interval” (PI) was used in the 2004-2005 NSDUH state report (Wright et al., 2007) and prior reports to represent uncertainty in the state and regional estimates. However, that term also is used in other applications to estimate future values of a parameter of interest. That interpretation does not apply to NSDUH state model-based estimates, so PI was replaced with “Bayesian confidence interval.”
“Margin of error” is another term used to describe uncertainty in the estimates. For example, if $(l, u)$ is a 95 percent symmetric confidence interval for the population proportion ($p$) and $\hat{p}$ is an estimate of $p$ obtained from the survey data, then the margin of error of $\hat{p}$ is given by $(u - \hat{p})$ or $(\hat{p} - l)$. When $(l, u)$ is a symmetric confidence interval, $(u - \hat{p})$ will be the same as $(\hat{p} - l)$. The margin of error will vary for each estimate and will be affected not only by the sample size (e.g., the larger the sample, the smaller the margin of error) but also by the sample design (e.g., telephone surveys using random digit dialing and surveys employing a stratified multistage cluster design will, more than likely, produce a different margin of error) (Scheuren, 2004).
The confidence intervals shown in NSDUH state reports are asymmetric, meaning that the distance between the estimate and the lower confidence limit will not be the same as the distance between the upper confidence limit and the estimate. For example, Utah’s 2021 past month marijuana use estimate is 16.0 percent for young adults aged 18 to 25, with a 95 percent Bayesian confidence interval equal to 12.3 to 20.6 percent (see Table 3 of the state model-based prevalence estimates’ tables [CBHSQ, 2022c]). Therefore, Utah’s estimate is 3.7 (i.e., 16.0 – 12.3) percentage points from the lower 95 percent confidence limit and 4.6 (i.e., 20.6 – 16.0) percentage points from the upper limit. These asymmetric confidence intervals work well for small percentages often found in NSDUH state estimate tables and reports while still being appropriate for larger percentages. Some surveys or polls provide only one margin of error for all reported percentages. This single number is usually calculated by setting the sample percentage estimate ($\hat{p}$) equal to 50 percent, which will produce an upper bound or maximum margin of error. Such an approach would not be feasible in this situation because the NSDUH state estimates vary from less than 1 percent to more than 75 percent; hence, applying a single margin of error to these estimates could significantly overstate or understate the actual precision levels. Therefore, given the differences mentioned above, it is more useful and informative to report the confidence interval for each estimate instead of a margin of error.
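The arithmetic in the Utah example, and the single maximum margin of error that some polls report, can be sketched as follows. This is an illustration only: the function names are hypothetical, and the maximum margin of error uses the familiar normal approximation under simple random sampling with a hypothetical sample size, which is an assumption that does not reflect the NSDUH design or the Bayesian intervals reported here.

```python
import math

def interval_distances(estimate, lower, upper):
    """Distances from the estimate to the lower and upper interval limits."""
    return estimate - lower, upper - estimate

def max_margin_of_error(n, z=1.96):
    """Largest 95 percent margin of error for a proportion, reached at 50 percent.

    Assumes simple random sampling and a normal approximation (illustrative only).
    """
    return z * math.sqrt(0.5 * 0.5 / n)

# Utah, past month marijuana use, ages 18 to 25 (values from the text, in percent).
lower_gap, upper_gap = interval_distances(16.0, 12.3, 20.6)
print(f"lower gap = {lower_gap:.1f}, upper gap = {upper_gap:.1f} percentage points")

# Hypothetical poll of 1,000 respondents: one margin of error for all percentages.
print(f"maximum margin of error = {100 * max_margin_of_error(1000):.1f} percentage points")
```

The two gaps differ (about 3.7 versus 4.6 percentage points), which a single symmetric margin of error cannot express.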
When it is indicated that a state has the highest or lowest estimate, it does not imply that the state’s estimate is significantly higher or lower than the next highest or lowest state’s estimate. Additionally, two significantly different state estimates (at the 5 percent level of significance) may have overlapping 95 percent confidence intervals. For details on a more accurate test to compare state estimates, see 2021 National Survey on Drug Use and Health: Comparison of Population Percentages from the United States, Census Regions, States, and the District of Columbia (CBHSQ, forthcoming a).
State estimates are produced for a number of related measures, such as marijuana use in the past month and illicit drug use in the past month, or alcohol use disorder in the past year and needing but not receiving treatment at a specialty facility for alcohol use in the past year. It might appear that one could draw conclusions by subtracting one from the other (e.g., subtracting the percentage who misused pain relievers in the past year from the percentage who misused opioids [misuse of pain relievers or use of heroin] in the past year to find the percentage who used only heroin in the past year but did not misuse pain relievers). Because related measures have been estimated separately with different models, subtracting one measure from another related measure at the state or census region level can give misleading results, perhaps even a “negative” estimate, and should be avoided. Users are advised to view the estimates along with their respective confidence intervals to get a better idea of the range in which the “true” value of the prevalence rate might fall (see Section A.5 for more details).
However, at the national level, because these estimates are design-based estimates, such comparisons can be made. For example, at the national level, subtracting estimates for cigarette use in the past month from the estimates of tobacco use in the past month will give the estimate of people who did not use cigarettes in the past month but used other forms of tobacco, such as cigars, pipes, or smokeless tobacco, in the past month.
Similar to the 2020 NSDUH, the COVID-19 pandemic affected data collection for the 2021 NSDUH. The 2021 NSDUH continued the use of multimode data collection procedures that were first implemented in October 2020 for the 2020 NSDUH. Multimode data collection was used for the entire 2021 NSDUH sample; however, the proportion of in-person interviews gradually increased from the beginning to the end of 2021. Even so, the multimode nature of the 2021 NSDUH is an important methodological difference from previous years. This section discusses special methodological issues specific to the 2021 NSDUH. More detailed information can be found in Chapter 6 of the 2021 Methodological Summary and Definitions report (CBHSQ, 2022b).
Before October 2020, all NSDUH data were collected in person, mainly in respondents’ homes. It was known that the use of multimode data collection for the 2020 NSDUH could affect the validity of comparisons between estimates from 2020 and those from prior years. However, the benefits of including a web-based interview option outweighed this concern, especially given the limitations on in-person data collection imposed by the COVID-19 pandemic. The COVID-19 pandemic forced all in-person data collection for NSDUH to stop in mid-March 2020. Except for a brief data collection period in July 2020 to test in-person safety protocols, data collection did not resume until Quarter 4 of 2020 (i.e., October 2020). The web data collection mode was introduced for NSDUH in Quarter 4 of 2020, and more than 90 percent of interviews in that quarter were conducted via the web. This multimode design continued into 2021, although some modifications were made to the data collection procedures, as discussed in Section 2.2 of the 2021 Methodological Summary and Definitions report (CBHSQ, 2022b). More than three quarters of interviews in Quarter 1 were completed via the web (76.6 percent). By Quarter 4, fewer than half of the interviews (41.5 percent) were completed that way. Altogether, 54.6 percent of the 2021 interviews were completed via the web.
National estimates differed significantly by web and in-person modes of data collection (also known as a “mode effect”). These differences were observed even in analyses that adjusted for demographic characteristics of respondents such as age, gender, race, and Hispanic origin. Consequently, estimates based on both web and in-person interviews were not comparable with estimates based on only one of these data collection modes. Weighting for the demographic characteristics of the sample to match the demographic characteristics of the population only partially adjusts for this difference. Differences between web and in-person respondents for most measures were not consistent across quarters. See Section 6.2.2 of CBHSQ (2022b) for more information about the findings from the assessments of multimode methodological changes in 2021. The estimates based on 2021 data represent an overall average of temporal (over 4 quarters) and data collection mode effects. Due to these methodological differences, it is not recommended to compare 2021 data with data from prior surveys.
NSDUH person-level weights are calibrated to population estimates for the state and demographic domains provided by the U.S. Census Bureau. For the 2011-2020 NSDUHs, the population estimates used in the poststratification adjustment were based on population estimates projected from the 2010 decennial census. Starting with the 2021 NSDUH, population estimates based on the 2020 decennial census were used in developing the person-level analysis weights.
The 2020 decennial census population estimates represent the current population characteristics more accurately than the population estimates calculated from the 2010 decennial census. As the U.S. Census Bureau noted in a press release on the quality indicators for the 2020 census,¹⁵ “Despite all the challenges of the pandemic, the completeness and accuracy of 2020 Census results are comparable with recent censuses.” For more details on how the 2021 NSDUH weights were developed, refer to Section 2.3.4 of CBHSQ (2022b).
The multimode data collection affected all methodological areas, including imputation procedures, weighting procedures, presentation of the data, and analysis and interpretation of the data. Given the impact of methodological differences in 2021 on the estimates (see Sections 6.1 and 6.2 of CBHSQ [2022b]), it was decided that it would not be appropriate to compare estimates from the 2021 NSDUH with those from prior years. For this reason, no statistical comparisons between the 2021 state estimates and estimates from prior years were done.
The state small area estimation (SAE) model is a complex mixed¹⁶ (including both fixed and random effects) logistic regression model of the following form:
$$\log\left[\frac{\pi_{aijk}}{1-\pi_{aijk}}\right] = x_{aijk}'\beta_{a} + \eta_{ai} + \nu_{aij},$$
where $\pi_{aijk}$ is the probability of engaging in the behavior of interest (e.g., using marijuana in the past month) for person-$k$ belonging to age group-$a$ in grouped state sampling region (SSR)-$j$ of state-$i$.¹⁷ Let $x_{aijk}$ denote a vector of predictor variables (independent variables or fixed effects) associated with age group-$a$ (12 to 17, 18 to 25, 26 to 34, and 35 or older) and $\beta_{a}$ denote the associated vector of the regression parameters. The age group-specific vectors of the auxiliary variables are defined for every block group in the nation and also include person-level demographic variables, such as race/ethnicity and gender. The vectors of state-level random effects $\eta_{i} = (\eta_{1i}, \ldots, \eta_{Ai})'$ and grouped SSR-level random effects $\nu_{ij} = (\nu_{1ij}, \ldots, \nu_{Aij})'$ are assumed to be mutually independent with $\eta_{i} \sim N_{A}(0, D_{\eta})$ and $\nu_{ij} \sim N_{A}(0, D_{\nu})$, where $A$ is the total number of individual age groups modeled (generally, $A = 4$). For hierarchical Bayes (HB) estimation purposes, an improper uniform prior distribution is assumed for $\beta_{a}$, and proper Wishart prior distributions are assumed for $D_{\eta}^{-1}$ and $D_{\nu}^{-1}$. The HB solution for $\pi_{aijk}$ involves a series of complex Markov Chain Monte Carlo (MCMC) steps to generate values of the desired fixed and random effects from the underlying joint posterior distribution. The basic process is described in Folsom and colleagues (1999); Shah and colleagues (2000); and Wright (2003a, 2003b).
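In a mixed logistic regression model of this kind, a linear predictor made up of the fixed effects plus the state-level and grouped SSR-level random effects is mapped to a probability through the inverse logit. The Python below is a minimal sketch for a single person and a single age group; the function names and all numeric values are hypothetical and serve only to show the calculation, not the fitted NSDUH model.

```python
import math

def inverse_logit(value):
    """Map a value on the logit scale to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))

def predicted_probability(predictors, coefficients, state_effect, ssr_effect):
    """Probability implied by fixed effects plus state and grouped SSR random effects."""
    fixed_part = sum(x * b for x, b in zip(predictors, coefficients))
    return inverse_logit(fixed_part + state_effect + ssr_effect)

# Hypothetical values: an intercept term and one block group-level predictor.
predictors = [1.0, 0.5]
coefficients = [-2.0, 1.0]
probability = predicted_probability(predictors, coefficients,
                                    state_effect=0.3, ssr_effect=-0.1)
print(f"predicted probability = {probability:.3f}")
```

In the HB setting, this calculation would be repeated for each MCMC draw of the coefficients and random effects, giving a posterior distribution for each probability rather than a single value.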
Once the required number of MCMC samples (1,250 in all) for the parameters of interest are generated and tested for convergence properties (see Raftery & Lewis, 1992), the small area estimates for each race/ethnicity × gender cell within a block group can be obtained for each age group. These block group-level small area estimates then can be aggregated using the appropriate population count projections for the desired age group(s) to form state-level small area estimates. These state-level small area estimates are benchmarked to the national design-based estimates as described in Section B.5.
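The aggregation step described above amounts to a population-weighted average of the cell-level estimates. The Python below is a minimal sketch with hypothetical cell estimates and population counts; the function name `aggregate_to_state` is an assumption made for illustration, and benchmarking (Section B.5) is not included.

```python
def aggregate_to_state(cell_estimates, cell_populations):
    """Population-weighted average of cell-level prevalence estimates."""
    total_population = sum(cell_populations)
    weighted_sum = sum(p * n for p, n in zip(cell_estimates, cell_populations))
    return weighted_sum / total_population

# Hypothetical population counts for two cells (e.g., race/ethnicity x gender
# cells within block groups) for one age group in one state.
cell_populations = [300, 100]

# Hypothetical cell-level estimates for three MCMC draws (one inner list per draw).
cell_draws = [
    [0.10, 0.20],
    [0.12, 0.18],
    [0.08, 0.24],
]

# Aggregating within each draw yields posterior draws of the state-level estimate.
state_draws = [aggregate_to_state(draw, cell_populations) for draw in cell_draws]
print(state_draws)
```

Aggregating draw by draw, rather than averaging first, keeps a full set of posterior draws for the state-level estimate, from which point estimates and interval limits can then be computed.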
The following list contains all binary (0,1) measures for which age group-specific state estimates were produced. For all measures listed below, 2021 National Survey on Drug Use and Health (NSDUH) data were used to produce estimates.
Local area data used as potential predictor variables in the mixed logistic regression models were obtained from the following sources:
Data sources, along with the description of potential predictor variables obtained from each source, are provided in the following lists.
|Claritas Data (Description)||Claritas Data (Level)|
|% Population Aged 0 to 19 in Block Group||Block Group|
|% Population Aged 20 to 24 in Block Group||Block Group|
|% Population Aged 25 to 34 in Block Group||Block Group|
|% Population Aged 35 to 44 in Block Group||Block Group|
|% Population Aged 45 to 54 in Block Group||Block Group|
|% Population Aged 55 to 64 in Block Group||Block Group|
|% Population Aged 65 or Older in Block Group||Block Group|
|% Non-Hispanic Blacks in Block Group||Block Group|
|% Hispanics in Block Group||Block Group|
|% Non-Hispanic Other Races in Block Group||Block Group|
|% Non-Hispanic Whites in Block Group||Block Group|
|% Males in Block Group||Block Group|
|% American Indians, Eskimos, Aleuts in Tract||Tract|
|% Asians, Pacific Islanders in Tract||Tract|
|% Population Aged 0 to 19 in Tract||Tract|
|% Population Aged 20 to 24 in Tract||Tract|
|% Population Aged 25 to 34 in Tract||Tract|
|% Population Aged 35 to 44 in Tract||Tract|
|% Population Aged 45 to 54 in Tract||Tract|
|% Population Aged 55 to 64 in Tract||Tract|
|% Population Aged 65 or Older in Tract||Tract|
|% Non-Hispanic Blacks in Tract||Tract|
|% Hispanics in Tract||Tract|
|% Non-Hispanic Other Races in Tract||Tract|
|% Non-Hispanic Whites in Tract||Tract|
|% Males in Tract||Tract|
|% Population Aged 0 to 19 in County||County|
|% Population Aged 20 to 24 in County||County|
|% Population Aged 25 to 34 in County||County|
|% Population Aged 35 to 44 in County||County|
|% Population Aged 45 to 54 in County||County|
|% Population Aged 55 to 64 in County||County|
|% Population Aged 65 or Older in County||County|
|% Non-Hispanic Blacks in County||County|
|% Hispanics in County||County|
|% Non-Hispanic Other Races in County||County|
|% Non-Hispanic Whites in County||County|
|% Males in County||County|
|American Community Survey (ACS) (Description)||ACS Data (Level)|
|% Population Who Dropped Out of High School||Tract|
|% Housing Units Built in 1940 to 1949||Tract|
|% Females 16 Years or Older in Labor Force||Tract|
|% Females Never Married||Tract|
|% Females Separated, Divorced, Widowed, or Other||Tract|
|% One-Person Households||Tract|
|% Males 16 Years or Older in Labor Force||Tract|
|% Males Never Married||Tract|
|% Males Separated, Divorced, Widowed, or Other||Tract|
|% Housing Units Built in 1939 or Earlier||Tract|
|Average Number of Persons per Room||Tract|
|% Families below Poverty Level||Tract|
|% Households with Public Assistance Income||Tract|
|% Housing Units Rented||Tract|
|% Population with 9 to 12 Years of School, No High School Diploma||Tract|
|% Population with 0 to 8 Years of School||Tract|
|% Population with Associate’s Degree||Tract|
|% Population with Some College and No Degree||Tract|
|% Population with Bachelor’s, Graduate, Professional Degree||Tract|
|% Housing Units with No Telephone Service Available||Tract|
|% Households with No Vehicle Available||Tract|
|% Population with No Health Insurance||Tract|
|Median Rents for Rental Units||Tract|
|Median Value of Owner-Occupied Housing Units||Tract|
|Median Household Income||Tract|
|% Families below the Poverty Level||County|
|Uniform Crime Report (UCR) Data (Description)||UCR Data (Level)|
|Drug Possession Arrest Rate||County|
|Drug Sale or Manufacture Arrest Rate||County|
|Drug Violations’ Arrest Rate||County|
|Marijuana Possession Arrest Rate||County|
|Marijuana Sale or Manufacture Arrest Rate||County|
|Opium or Cocaine Possession Arrest Rate||County|
|Opium or Cocaine Sale or Manufacture Arrest Rate||County|
|Other Drug Possession Arrest Rate||County|
|Other Dangerous Non-Narcotics Arrest Rate||County|
|Serious Crime Arrest Rate||County|
|Violent Crime Arrest Rate||County|
|Driving under Influence Arrest Rate||County|
|Other Categorical Data (Description)||Other Categorical Data (Source)||Other Categorical Data (Level)|
|= 1 if Hispanic, = 0 Otherwise||National Survey on Drug Use and Health (NSDUH) Sample||Person|
|= 1 if Non-Hispanic Black, = 0 Otherwise||NSDUH Sample||Person|
|= 1 if Non-Hispanic Other, = 0 Otherwise||NSDUH Sample||Person|
|= 1 if Male, = 0 if Female||NSDUH Sample||Person|
|= 1 if Metropolitan Statistical Area (MSA) with ≥ 1 Million, = 0 Otherwise||2010 Census||County|
|= 1 if MSA with < 1 Million, = 0 Otherwise||2010 Census||County|
|= 1 if Non-MSA Urban, = 0 Otherwise||2010 Census||Tract|
|= 1 if Urban Area, = 0 if Rural Area||2010 Census||Tract|
|= 1 if No Cubans in Tract, = 0 Otherwise||2010 Census||Tract|
|= 1 if No Arrests for Dangerous Non-Narcotics, = 0 Otherwise||Uniform Crime Report (UCR)||County|
|= 1 if No Arrests for Opium or Cocaine Possession, = 0 Otherwise||UCR||County|
|= 1 if No Housing Units Built in 1939 or Earlier, = 0 Otherwise||American Community Survey (ACS)||Tract|
|= 1 if No Housing Units Built in 1940 to 1949, = 0 Otherwise||ACS||Tract|
|= 1 if No Households with Public Assistance Income, = 0 Otherwise||ACS||Tract|
|Miscellaneous Data (Description)||Miscellaneous Data (Source)||Miscellaneous Data (Level)|
|Alcohol Death Rate, Underlying Cause||National Center for Health Statistics’ International Classification of Diseases, 10th revision (NCHS-ICD-10)||County|
|Cigarette Death Rate, Underlying Cause||NCHS-ICD-10||County|
|Drug Death Rate, Underlying Cause||NCHS-ICD-10||County|
|Alcohol Treatment Rate||National Survey of Substance Abuse Treatment Services (N-SSATS)||County|
|Alcohol and Drug Treatment Rate||N-SSATS||County|
|Drug Treatment Rate||N-SSATS||County|
|Unemployment Rate||Bureau of Labor Statistics (BLS)||County|
|Per Capita Income (in Thousands)||Bureau of Economic Analysis (BEA)||County|
|Average Suicide Rate (per 10,000)||NCHS-ICD-10||County|
|Food Stamp Participation Rate||Census Bureau||County|
|Single State Agency Maintenance of Effort||National Association of State Alcohol and Drug Abuse Directors (NASADAD)||State|
|Block Grant Awards||Substance Abuse and Mental Health Services Administration (SAMHSA)||State|
|Cost of Services Factor Index||SAMHSA||State|
|Total Taxable Resources per Capita Index||U.S. Department of Treasury||State|
|% Hispanics Who Are Cuban||2010 Census||Tract|
The predictor variables used in the SAE models were selected from the set of potential predictors given above using the method described in Section B.4.
Predictor variable selection was done using the 2021 data for all measures, using the following multistep process:
The self-calibration built into the survey-weighted hierarchical Bayes (SWHB) solution ensures the population-weighted average of the state small area estimates will closely match the national design-based estimates. The national design-based estimates in NSDUH are based entirely on survey-weighted data using a direct estimation approach, whereas the state and census region estimates are model based. Given the self-calibration ensured by the SWHB method, for state reports prior to 2002, the standard Bayes prescription was followed; specifically, the posterior mean was used for the point estimate, and the tail percentiles of the posterior distribution were used for the Bayesian confidence interval limits.
Singh and Folsom (2001) extended Ghosh’s (1992) results on constrained Bayes estimation to include exact benchmarking to design-based national estimates. In the simplest version of this constrained Bayes solution where only the design-based mean is imposed as a benchmarking constraint, each of the 2021 state-by-age group small area estimates is adjusted by adding the common factor $\Delta_a = D_a - P_a$, where $D_a$ is the design-based national estimate and $P_a$ is the population-weighted mean of the state small area estimates ($P_{sa}$) for age group-a. The exactly benchmarked state-s and age group-a small area estimates then are given by $\theta_{sa} = P_{sa} + \Delta_a$. Experience with such additive adjustments suggests that the resulting exactly benchmarked state small area estimates will always be between 0 and 100 percent because the SWHB self-calibration ensures the adjustment factor is small relative to the size of the state-level small area estimates.
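The additive adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout, and the three-state inputs are hypothetical and are not NSDUH values; the only logic taken from the text is that one common factor (the national design-based estimate minus the population-weighted mean of the state estimates) is added to every state estimate within an age group.

```python
def benchmark_estimates(state_estimates, state_populations, national_design_estimate):
    """Add one common factor to every state estimate so that the
    population-weighted mean equals the national design-based estimate."""
    total_population = sum(state_populations[s] for s in state_estimates)
    weighted_mean = sum(
        state_estimates[s] * state_populations[s] for s in state_estimates
    ) / total_population
    # Common additive adjustment for this age group.
    adjustment = national_design_estimate - weighted_mean
    benchmarked = {s: p + adjustment for s, p in state_estimates.items()}
    return benchmarked, adjustment


# Hypothetical proportions and population counts (NOT NSDUH data).
estimates = {"A": 0.40, "B": 0.50, "C": 0.45}
populations = {"A": 1000, "B": 3000, "C": 2000}
benchmarked, delta = benchmark_estimates(estimates, populations, 0.47)
```

After the shift, the population-weighted mean of `benchmarked` reproduces the national value (0.47 in this made-up example), while differences between states are unchanged.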
Relative to the Bayes posterior mean, these benchmark-constrained state small area estimates are biased by the common additive adjustment factor. Therefore, the posterior mean squared error (MSE) for each benchmarked state small area estimate has the square of this adjustment factor added to its posterior variance. To achieve the desirable feature of exact benchmarking, this constrained Bayes adjustment factor was implemented for the state-by-age group small area estimates. The associated Bayesian confidence (credible) intervals can be recentered at the benchmarked small area estimates on the logit scale with the symmetric interval end points based on the posterior root mean squared errors (RMSEs). The adjusted 95 percent Bayesian confidence intervals are defined as follows:
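$Lower_{sa} = \exp(L_{sa}) / [1 + \exp(L_{sa})]$ and $Upper_{sa} = \exp(U_{sa}) / [1 + \exp(U_{sa})]$,
where $L_{sa} = \ln[\theta_{sa} / (1 - \theta_{sa})] - 1.96 \sqrt{MSE_{sa}}$, $U_{sa} = \ln[\theta_{sa} / (1 - \theta_{sa})] + 1.96 \sqrt{MSE_{sa}}$, and $MSE_{sa} = \{\ln[P_{sa} / (1 - P_{sa})] - \ln[\theta_{sa} / (1 - \theta_{sa})]\}^2 + \text{posterior variance of } \ln[P_{sa} / (1 - P_{sa})]$. Here $P_{sa}$ denotes the posterior mean and $\theta_{sa}$ the benchmarked small area estimate for state-s and age group-a.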
The associated posterior coverage probabilities for these benchmarked intervals are very close to the prescribed 0.95 value because the state small area estimates have posterior distributions that can be approximated exceptionally well by a Gaussian distribution after the logit transformation.
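The recentered interval can be illustrated with a short Python sketch. It assumes that the mean squared error is formed on the logit scale as the squared difference between the logits of the posterior mean and the benchmarked estimate, plus the posterior variance of the logit; that reading, the function names, and the numeric inputs in the usage note are assumptions made for illustration, not values or code from NSDUH.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def benchmarked_interval(posterior_mean, benchmarked, logit_posterior_var, z=1.96):
    """Interval symmetric on the logit scale, centered at the benchmarked estimate.

    Assumption: MSE = (logit shift caused by benchmarking)^2 + posterior variance
    of the logit of the unbenchmarked estimate.
    """
    shift = logit(posterior_mean) - logit(benchmarked)
    rmse = math.sqrt(shift ** 2 + logit_posterior_var)
    center = logit(benchmarked)
    return inv_logit(center - z * rmse), inv_logit(center + z * rmse)
```

For example, `benchmarked_interval(0.40, 0.41, 0.02)` uses hypothetical inputs and returns end points that lie between 0 and 1 and bracket 0.41. Because the interval is symmetric only on the logit scale, it is generally asymmetric on the percentage scale.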
Tables 1 to 35 of 2021 National Survey on Drug Use and Health: Model-Based Estimated Totals (in Thousands) (50 States and the District of Columbia) (CBHSQ, forthcoming b) show the estimated numbers of individuals associated with each of the 34 measures of interest. To calculate these numbers, the benchmarked small area estimates and associated 95 percent Bayesian confidence intervals are multiplied by the 2021 population count of the state by the age group of interest (Tables C.1 to C.3 of this methodology document).
For example, past month use of alcohol among 18- to 25-year-olds in Alabama was 39.69 percent in 2021. The corresponding Bayesian confidence interval ranged from 33.62 to 46.10 percent. The population count for 18- to 25-year-olds for 2021 in Alabama was 508,027 (see Table C.2 in Section C of this methodology document). Hence, the estimated number of 18- to 25-year-olds using alcohol in the past month in Alabama was 0.3969 × 508,027, which is 201,636. The associated Bayesian confidence interval ranged from 0.3362 × 508,027 (i.e., 170,799) to 0.4610 × 508,027 (i.e., 234,200). Note that when estimates of the number of individuals are calculated for Tables 1 to 35 in 2021 Model-Based Estimated Totals (CBHSQ, forthcoming b), the unrounded percentages and population counts are used, then the numbers are reported to the nearest thousand. Hence, the number obtained by multiplying the published estimate with the published population estimate may not exactly match the counts published in these tables because of rounding differences.
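The Alabama arithmetic above can be reproduced with a small Python helper. The function name is hypothetical; the percentages and the population count are the published values quoted in the example, so results can differ slightly from the published totals, which are computed from unrounded inputs.

```python
def estimated_total(percent, population):
    """Convert a prevalence percentage into an estimated number of people."""
    return round(percent / 100.0 * population)


alabama_18_to_25 = 508_027  # 2021 population count quoted in the example above

point = estimated_total(39.69, alabama_18_to_25)  # 201636
lower = estimated_total(33.62, alabama_18_to_25)  # 170799
upper = estimated_total(46.10, alabama_18_to_25)  # 234200
```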
The only exception to this calculation is the production of the estimated numbers of marijuana initiates. Those estimates cannot be directly calculated as the product of the percentage estimate of first use of marijuana and the population counts available in Section C. That is because the denominator of that percentage estimate is defined as the number of person-years at risk for marijuana initiation, which is a combination of individuals who never used marijuana and one half of the individuals who initiated in the past 24 months (see Section B.8 for more details).
Tables 1 to 35 of 2021 Model-Based Prevalence Estimates (CBHSQ, 2022c) show estimates for the following age groups: 12 to 17, 18 to 25, 26 or older, 18 or older, and 12 or older. If a user is interested in producing aggregated estimates, such as for those aged 12 to 25, the aggregated estimates can be calculated using the prevalence estimates along with the population totals shown in Section C of this document. However, the confidence intervals for such aggregated estimates cannot be calculated from the information provided in the tables. Below is an example of the calculation of an aggregated estimate for a given state.
In 2021, past month use of alcohol in Alabama among youths aged 12 to 17 was 5.96 percent, and among young adults aged 18 to 25 it was 39.69 percent. The population counts for 12- to 17-year-olds and 18- to 25-year-olds in 2021 in Alabama were 395,422 and 508,027, respectively (see Table C.2 in Section C of this methodology document). Hence, one would calculate the estimate for people aged 12 to 25 by first finding the number of users aged 12 to 25, which is 225,203 ([0.0596 × 395,422] + [0.3969 × 508,027]), then dividing that number by the population aged 12 to 25, which results in a rate of 24.93 percent (225,203 / [395,422 + 508,027]).
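The aggregation in this example is a population-weighted average, which can be sketched in Python as follows. The function name is hypothetical; the inputs are the published Alabama percentages and population counts quoted above.

```python
def aggregate_prevalence(percents, populations):
    """Combine age-group prevalence percentages into one aggregated estimate.

    Returns the estimated number of users and the aggregated percentage.
    """
    users = sum(p / 100.0 * n for p, n in zip(percents, populations))
    return users, 100.0 * users / sum(populations)


# Ages 12 to 17 and 18 to 25 in Alabama, 2021 (values quoted in the text).
users, rate = aggregate_prevalence([5.96, 39.69], [395_422, 508_027])
# round(users) gives 225203 and round(rate, 2) gives 24.93
```

The same weighting applies to any set of non-overlapping age groups, but it yields only a point estimate; as noted above, a confidence interval cannot be derived this way.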
Initiation rates typically are calculated as the number of new initiates of a substance during a period of time (such as in the past year) divided by an estimate of the number of person-years of exposure (in thousands). The initiation definition used here employs a simpler form of the at-risk population based on the model-based methodology. This model-based initiation rate (i.e., first use of marijuana in the past year among people at risk for initiation of marijuana use) is defined as follows:
$\text{Initiation rate} = 100 \times \{[X_1 / (0.5 \times X_1 + X_2)] / 2\}$,
where $X_1$ is the number of marijuana initiates in the past 24 months, and $X_2$ is the number of persons who never used marijuana.
The initiation rate is expressed as a percentage or rate per 100 person-years of exposure. Note that this estimate uses a 2-year time period to accumulate initiation cases from the annual survey. By assuming further that the distribution of first use for the initiation cases is uniform across the 2-year interval, the total number of person-years of exposure is 1 year on average for the initiation cases plus 2 years for all the “never users” at the end of the time period. This approximation to the person-years of exposure permits one to recast the initiation rate as a function of two population prevalence rates—namely, the fraction of people who first used marijuana in the past 2 years and the fraction who had never used marijuana. Both of these prevalence estimates were estimated using the SWHB estimation approach. Note that only initiation rates for marijuana use are provided here.
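The person-years reasoning in this paragraph can be checked with a brief Python sketch. The function names and the example counts are hypothetical; the only inputs taken from the text are the assumptions of 1 year of exposure on average for each initiate, 2 years for each never user, and a rate expressed per 100 person-years.

```python
def initiation_rate(initiates_24mo, never_users):
    """Initiates per 100 person-years of exposure over a 2-year window."""
    # 1 year on average per initiate, 2 years per never user.
    person_years = 1.0 * initiates_24mo + 2.0 * never_users
    return 100.0 * initiates_24mo / person_years


def initiation_rate_from_prevalence(frac_initiated_24mo, frac_never_used):
    """Same rate written as a function of two population prevalence fractions."""
    return 100.0 * (frac_initiated_24mo / (0.5 * frac_initiated_24mo + frac_never_used)) / 2.0


# Hypothetical population of 1,000: 40 initiates in the past 24 months, 480 never users.
rate_counts = initiation_rate(40, 480)                         # 4.0 per 100 person-years
rate_fractions = initiation_rate_from_prevalence(0.04, 0.48)   # same value
```

Both forms agree because dividing the numerator and the denominator by the population size leaves the ratio unchanged, which is why the rate can be built from two modeled prevalence estimates.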
To obtain small area estimates for people aged 12 to 20 for past month alcohol and binge alcohol use, a separate set of SAE models with predictors selected for the age groups 12 to 17, 18 to 20, 21 to 34, and 35 or older was used. Model-based estimates for people aged 12 to 20 were produced by taking the population-weighted average of the individual age group (12 to 17 and 18 to 20) estimates. Estimates for underage drinking for past month alcohol and binge alcohol use were benchmarked to match national design-based estimates for that age group using the process described in Section B.5.
In the 2021 NSDUH, questions about vaping marijuana were added to the emerging issues section of the questionnaire. Respondents who reported that they vaped anything were asked whether they ever vaped marijuana with a vaping device. Additionally, respondents who answered “yes” to ever vaping marijuana were then asked how long it had been since they last vaped marijuana with a vaping device.
To maintain consistent measures across years where possible, a general principle of editing is not to edit across interview sections (except in situations where answers to questions in a previous section govern skip logic in a later section). However, the introduction to the marijuana section of the interview did not mention the use of marijuana with a vaping device as one of the ways people could use marijuana. Therefore, respondents might not have thought about vaping marijuana when they answered the earlier marijuana questions. For this reason, data from these marijuana vaping questions were incorporated into the marijuana use measures and related measures that include marijuana beginning with the 2021 NSDUH. If respondents reported that they did not use marijuana in the marijuana section of the questionnaire, but they later reported that they vaped marijuana, they were considered to have used marijuana in their lifetime and in the applicable recency period.
For details on marijuana vaping, please refer to CBHSQ (2022b).
The NSDUH questionnaire includes questions to measure SUD for alcohol and drugs. SUD estimates for drugs and alcohol in the 2021 NSDUH were based on the criteria in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5; American Psychiatric Association [APA], 2013). Respondents were asked SUD questions separately for any drugs or alcohol they used in the 12 months prior to the survey.
Drugs included marijuana, cocaine (including crack), heroin, hallucinogens, inhalants, methamphetamine, and any use of prescription pain relievers, tranquilizers, stimulants, or sedatives. Beginning in 2021, NSDUH respondents who reported any use of prescription psychotherapeutic drugs (i.e., pain relievers, tranquilizers, stimulants, or sedatives) in the past year (i.e., not just misuse of prescription drugs) were asked the respective SUD questions for that category of prescription drugs.
DSM-5 includes the following SUD criteria (as measured in the 2021 NSDUH):
For alcohol, marijuana, cocaine, heroin, and methamphetamine, respondents were classified as having an SUD if they had at least 2 of the 11 criteria in a 12-month period. However, respondents were classified as having a hallucinogen use disorder or an inhalant use disorder if they had at least 2 of the first 10 criteria in the past 12 months; the withdrawal criterion does not apply to hallucinogens and inhalants.
For use or misuse of prescription drugs, the applicable DSM-5 criteria for classifying respondents as having a prescription drug use disorder depend on whether respondents misused prescription drugs or used but did not misuse prescription drugs in the past year. If respondents misused prescription drugs in the past year, they were classified as having a prescription drug use disorder if they had at least 2 of the 11 criteria. However, if respondents used but did not misuse prescription drugs in the past year, they were classified as having a prescription drug use disorder if they had at least 2 of the first 9 criteria. Criteria 10 (tolerance) and 11 (withdrawal) do not apply to respondents who used but did not misuse these prescription drugs in the past year; tolerance and withdrawal can occur as normal physiological adaptations when people use these prescription drugs appropriately under medical supervision (Hasin et al., 2015).
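The classification rules in the two preceding paragraphs can be summarized in a short Python sketch. It is illustrative only: the function name, the substance labels, and the representation of a respondent's criteria as a set of criterion numbers are assumptions, and the sketch relies on the numbering stated in the text (criterion 10 is tolerance and criterion 11 is withdrawal).

```python
# Substances for which the withdrawal criterion (11) does not apply.
NO_WITHDRAWAL = {"hallucinogens", "inhalants"}
# Prescription psychotherapeutic drug categories.
PRESCRIPTION = {"pain relievers", "tranquilizers", "stimulants", "sedatives"}


def has_sud(substance, criteria_met, misused=True):
    """Return True if at least 2 applicable criteria were met in the past 12 months.

    criteria_met: set of criterion numbers (1 to 11) endorsed by the respondent.
    misused: only relevant for prescription drugs; False means the respondent
             used but did not misuse the drug in the past year.
    """
    if substance in PRESCRIPTION and not misused:
        applicable = range(1, 10)   # criteria 1 to 9 (no tolerance, no withdrawal)
    elif substance in NO_WITHDRAWAL:
        applicable = range(1, 11)   # criteria 1 to 10 (no withdrawal)
    else:
        applicable = range(1, 12)   # all 11 criteria
    return len(set(criteria_met) & set(applicable)) >= 2
```

For instance, `has_sud("alcohol", {3, 11})` is True, whereas `has_sud("inhalants", {3, 11})` is False because withdrawal is not counted for inhalants, and `has_sud("pain relievers", {10, 11}, misused=False)` is False because neither tolerance nor withdrawal is counted for use without misuse.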
The following lists the substances and types of use or misuse that are included in the 2021 NSDUH state SAEs:
Illicit drug or alcohol use disorder includes data from past year users of alcohol, marijuana, cocaine, heroin, hallucinogens, inhalants, and methamphetamine, and past year misusers of prescription psychotherapeutic drugs. SAEs for this illicit drug or alcohol use disorder measure are not shown; however, it is relevant to the definition for the need for substance use treatment described in Section B.12.
For more information about the SUD definitions based on criteria from DSM-5, see CBHSQ (2022b).
The 2021 NSDUH included a series of questions designed to measure treatment need for an alcohol or illicit drug use problem and to determine people needing but not receiving treatment. Respondents were classified as needing substance use treatment in the past year if they met either of the following criteria:
For additional details on how respondents were classified as needing substance use treatment, see CBHSQ (2022b).
This section provides a summary of the measurement issues associated with six mental health outcome variables such as mental illness, depression, and suicidal thoughts and behaviors. Additional details can be found in Sections 3.4.6, 3.4.7, and 3.4.14 of CBHSQ (2022b).
In the 2000-2001 and 2002-2003 NSDUH state SAE reports (Wright, 2003a, 2003b; Wright & Sathe, 2005), the Kessler-6 (K6) distress scale was used to measure SMI (Kessler et al., 2003). However, SAMHSA discontinued producing state-level SMI estimates beginning with the release of the 2003-2004 state report (Wright & Sathe, 2006) because of concerns about the validity of using only the K6 distress scale without an impairment scale; see Section B.4.4 in Appendix B of the 2004 NSDUH national findings report (OAS, 2005). The use of the K6 distress scale continued in the 2003-2004 and the 2004-2005 state reports (Wright & Sathe, 2006; Wright et al., 2007), not as a measure of SMI but as a measure of serious psychological distress (SPD) because it was determined that the K6 scale measured only SPD and merely contributed to measuring SMI and AMI (see the details that follow).
In December 2006, a new technical advisory group was convened by SAMHSA’s OAS (which later became CBHSQ) and the Center for Mental Health Services to solicit recommendations for data collection strategies to address SAMHSA’s legislative requirements. Although the technical advisory group recognized the ideal way to estimate SMI in NSDUH would be to administer a clinical diagnostic interview annually to all 45,000 adult respondents, this approach was not feasible because of constraints on the interview time and the need for trained mental health clinicians to conduct the interviews. Therefore, the approach recommended by the technical advisory group and adopted by SAMHSA for NSDUH was to use short scales in the NSDUH interview that separately measure psychological distress and functional impairment for use in a statistical model that predicts whether a respondent had mental illness.
To accomplish this, SAMHSA’s CBHSQ initiated a Mental Health Surveillance Study (MHSS) in 2008 as part of NSDUH to develop and implement methods to estimate SMI. Models using the short scales for psychological distress and impairment to predict mental illness status were developed from a subsample of adult respondents who had completed the NSDUH interview and were administered a clinical psychological diagnostic interview soon afterward. For the clinical interview data, people were classified as having SMI if they had a diagnosable mental, behavioral, or emotional disorder in the past 12 months, other than a developmental disorder or SUD, that met the 4th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria (APA, 1994) and resulted in substantial functional impairment. This estimation methodology was implemented in the 2008 NSDUH (for details on the 2008 model, see CBHSQ [2021b]).
Based on recommendations from this panel, estimates of SMI were presented based on a revised methodology; thus, they were not comparable with estimates for SMI or SPD shown in NSDUH state reports prior to 2009. However, in 2013, another revision to the methodology for creating SMI estimates was made, and the estimates presented for 2011 and 2012 were based on this revised methodology (and therefore are not comparable with previously published estimates of SMI). Thus, the 2008-2009, 2009-2010, and 2010-2011 SMI estimates were reproduced using the new 2013 methodology. The 2013 methodology refers to the 2012 model described as follows.
Clinical Measurement of Mental Illness. Mental illness was measured in the MHSS clinical interviews using an adapted version of the Structured Clinical Interview for the DSM-IV-TR Axis I Disorders, Research Version, Non-patient Edition (SCID) (First et al., 2002) and was differentiated by the level of functional impairment based on the Global Assessment of Functioning (GAF) scale (Endicott et al., 1976). Past year disorders assessed through the SCID included mood disorders (e.g., MDE, manic episode), anxiety disorders (e.g., panic disorder, generalized anxiety disorder, posttraumatic stress disorder), eating disorders (e.g., anorexia nervosa), intermittent explosive disorder, and adjustment disorder. In addition, the presence of psychotic symptoms was assessed. SUDs also were assessed, although these disorders were not used to produce estimates of mental illness.
The SCID and the GAF in combination were considered to be the “gold standard” for measuring mental illness.
The 2012 SMI Model. The 2012 SMI prediction model was fit with data from 4,912 MHSS respondents from 2008 through 2012. For more information about the instruments and items used to measure the variables employed in the 2012 model, see CBHSQ (2022b). Specifically, in CBHSQ (2022b), the instrument used to measure mental illness in the clinical interviews is described, followed by descriptions of the scales and items in the main NSDUH interviews that were used as predictor variables in the model (i.e., the K6 and World Health Organization Disability Assessment Schedule [WHODAS] total scores, age, MDE, and suicidal thoughts). The response variable Y equaled 1 when an SMI diagnosis was positive based on the clinical interview; otherwise, Y was 0. Letting X be a vector of characteristics attached to a NSDUH respondent and letting the probability that this respondent had SMI be $\pi = \Pr(Y = 1 \mid X)$, the 2012 SMI prediction model was
$\mathrm{logit}(\hat{\pi}) \equiv \ln[\hat{\pi} / (1 - \hat{\pi})] = X'\hat{\beta}$, (1)
where $\hat{\pi}$ refers to an estimate of the SMI response probability $\pi$, and $\hat{\beta}$ is the vector of estimated model coefficients.
The covariates in equation (1) came from the main NSDUH interview data:
A cut point probability $\pi_0$ was determined, so that if $\hat{\pi} \geq \pi_0$ for a particular respondent, then the respondent was predicted to be SMI positive; otherwise, the respondent was predicted to be SMI negative. The cut point (0.260573529) was chosen so that the weighted numbers of false positives and false negatives in the MHSS dataset were as close to equal as possible. To produce state estimates for SMI, the predicted SMI status for all adult NSDUH respondents was used in SAE modeling as the dependent variable.
A second cut point probability (0.0192519810) was determined so that any respondent with an SMI probability greater than or equal to the cut point was predicted to be positive for AMI, and the remaining respondents were predicted to be negative for AMI. The second cut point was chosen so that the weighted numbers of AMI false positives and false negatives were as close to equal as possible.
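The two cut points can be applied to a predicted SMI probability as in the following Python sketch. The function name and return structure are hypothetical, and the sketch assumes that the SMI rule uses the same "greater than or equal to" comparison that the text states for AMI; the cut point values are those given above.

```python
SMI_CUT_POINT = 0.260573529    # first cut point (SMI), as given in the text
AMI_CUT_POINT = 0.0192519810   # second cut point (AMI), as given in the text


def predict_mental_illness(smi_probability):
    """Classify a respondent from the model-predicted SMI probability.

    Assumption: both rules compare with >= against their cut point.
    """
    return {
        "smi": smi_probability >= SMI_CUT_POINT,
        "ami": smi_probability >= AMI_CUT_POINT,
    }
```

Because the AMI cut point is lower than the SMI cut point, every respondent predicted to be SMI positive is also predicted to be AMI positive. For example, `predict_mental_illness(0.10)` returns `{"smi": False, "ami": True}`.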
Starting in 2021, the measures used in the mental illness models were all imputed. Therefore, the source variables to create the measures of AMI and SMI had no missing data.
Two sections related to MDE were included in the 2021 questionnaire: an adult depression section and an adolescent depression section. These sections were originally derived from DSM-IV criteria for MDE. Consistent with the more recent criteria in DSM-5, NSDUH does not exclude MDEs occurring exclusively in the context of bereavement.
Questions on depression permit estimates of MDE to be calculated. Separate sections were administered to adults aged 18 or older and youths aged 12 to 17. The adult questions were adapted from the depression section of the National Comorbidity Survey Replication (NCS-R), and the questions for youths were adapted from the depression section of the National Comorbidity Survey Replication Adolescent Supplement (NCS-A) (see https://www.hcp.med.harvard.edu/ncs/ ). To make the sections developmentally appropriate for youths, there are minor wording differences in a few questions between the adult and youth sections. Revisions to the questions in both sections were made primarily to reduce the length and to modify the NCS questions, which are interviewer administered, for self-administration in NSDUH.
According to DSM-5, a person is classified as having had an MDE in their lifetime if they had five or more of the following nine symptoms nearly every day (except where noted) in the same 2-week period, where at least one of the symptoms is a depressed mood or loss of interest or pleasure in daily activities: (1) depressed mood most of the day; (2) markedly diminished interest or pleasure in all or almost all activities most of the day; (3) significant weight loss when not sick or dieting, or weight gain when not pregnant or growing, or decrease or increase in appetite; (4) insomnia or hypersomnia; (5) psychomotor agitation or retardation at a level observable by others; (6) fatigue or loss of energy; (7) feelings of worthlessness or excessive or inappropriate guilt; (8) diminished ability to think or concentrate or indecisiveness; and (9) recurrent thoughts of death or suicidality (i.e., recurrent suicidal ideation without a specific plan, making a specific plan, or making an attempt). Unlike the other symptoms listed previously, recurrent thoughts of death or suicidality did not need to have occurred nearly every day (APA, 2013).
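The symptom-count rule in this paragraph can be expressed as a small Python check. This is a simplified sketch: the function name is hypothetical, symptoms are represented by their numbers (1 to 9) as listed above, and the sketch covers only the counting rule, not the duration, frequency, or questionnaire skip logic.

```python
def meets_mde_symptom_rule(symptoms):
    """True if 5 or more of the 9 symptoms are present and at least one of them
    is depressed mood (1) or loss of interest or pleasure (2)."""
    present = set(symptoms) & set(range(1, 10))
    has_core_symptom = bool(present & {1, 2})
    return len(present) >= 5 and has_core_symptom
```

For example, `meets_mde_symptom_rule({1, 4, 6, 7, 8})` is True, while `meets_mde_symptom_rule({3, 4, 6, 7, 8})` is False because neither of the first two symptoms is present.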
Respondents who have had an MDE in their lifetime are asked if, during the past 12 months, they had a period of depression lasting 2 weeks or longer while also having some of the other symptoms mentioned. Respondents reporting experiences consistent with having had an MDE in the past year are asked questions from the Sheehan Disability Scale (SDS) to measure the level of functional impairment in major life activities reported to be caused by the MDE in the past 12 months (Leon et al., 1997).
Starting in 2021, the variables for MDE among adults were statistically imputed. MDE variables were not statistically imputed for youths aged 12 to 17.
The 2021 NSDUH included sets of questions asking adults aged 18 or older whether they had serious thoughts of suicide, made any suicide plans, or had attempted suicide in the past 12 months. All adult respondents in 2021 were asked whether they made a suicide plan or attempted suicide regardless of whether they reported that they had serious thoughts of suicide in the past 12 months. Additionally, beginning in 2021, the variables for suicidal thoughts and behaviors among adults were statistically imputed, so these variables had no missing data for 2021.
|District of Columbia||19,060||17,200||3,580||23.03%||1,340||770||569,400||60.05%||13.83%|
|DU = dwelling unit.
Source: SAMHSA, Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health, 2021.
|District of Columbia||350||150||33,728||44.53%||330||210||76,598||69.14%||660||410||459,074||59.72%|
|NOTE: Computations in this table are based on a respondent’s age at screening. Thus, the data in the Total Respondents column(s) could differ from data in other National Survey on Drug Use and Health tables that use the respondent’s age recorded during the interview.
Source: SAMHSA, Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health, 2021.
|District of Columbia||430||200||54,718||52.84%||990||620||535,672||61.12%|
|NOTE: Computations in this table are based on a respondent’s age at screening. Thus, the data in the Total Respondents column(s) could differ from data in other National Survey on Drug Use and Health tables that use the respondent’s age recorded during the interview.
Source: SAMHSA, Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health, 2021.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (DSM-IV) (4th ed.).
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (DSM-5) (5th ed.). https://doi.org/10.1176/appi.books.9780890425787
Center for Behavioral Health Statistics and Quality. (2021b). 2020 National Survey on Drug Use and Health: Methodological summary and definitions. https://www.samhsa.gov/data/report/2020-methodological-summary-and-definitions
Center for Behavioral Health Statistics and Quality. (2022a). 2021 National Survey on Drug Use and Health: Methodological resource book. Substance Abuse and Mental Health Services Administration. https://www.samhsa.gov/data/report/nsduh-2021-methodological-resource-book-mrb
Center for Behavioral Health Statistics and Quality. (2022b). 2021 National Survey on Drug Use and Health: Methodological summary and definitions. https://www.samhsa.gov/data/report/2021-methodological-summary-and-definitions
Center for Behavioral Health Statistics and Quality. (2022c). 2021 National Survey on Drug Use and Health: Model-based prevalence estimates (50 states and the District of Columbia). https://www.samhsa.gov/data/report/2021-nsduh-state-prevalence-estimates
Center for Behavioral Health Statistics and Quality. (forthcoming a). 2021 National Survey on Drug Use and Health: Comparison of population percentages from the United States, census regions, states, and the District of Columbia. Substance Abuse and Mental Health Services Administration.
Center for Behavioral Health Statistics and Quality. (forthcoming b). 2021 National Survey on Drug Use and Health: Model-based estimated totals (in thousands). Substance Abuse and Mental Health Services Administration.
Center for Systems Science and Engineering, Johns Hopkins University. (2021). Coronavirus resource center: Global map: COVID-19 dashboard. https://coronavirus.jhu.edu/map.html
Endicott, J., Spitzer, R. L., Fleiss, J. L., & Cohen, J. (1976). The Global Assessment Scale: A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry, 33(6), 766-771. https://doi.org/10.1001/archpsyc.1976.01770060086012
First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (2002). Structured Clinical Interview for DSM-IV-TR Axis I Disorders, Research Version, Non-patient Edition (SCID-I/NP). New York State Psychiatric Institute, Biometrics Research.
Folsom, R. E., Shah, B., & Vaish, A. (1999). Substance abuse in states: A methodological report on model based estimates from the 1994-1996 National Household Surveys on Drug Abuse. In Proceedings of the 1999 Joint Statistical Meetings, American Statistical Association, Survey Research Methods Section, Baltimore, MD (pp. 371-375). American Statistical Association.
Ghosh, M. (1992). Constrained Bayes estimation with applications. Journal of the American Statistical Association, 87(418), 533-540. https://doi.org/10.2307/2290287
Hasin, D. S., Greenstein, E., Aivadyan, C., Stohl, M., Aharonovich, E., Saha, T., Goldstein, R., Nunes, E. V., Jung, J., Zhang, H., & Grant, B. F. (2015). The Alcohol Use Disorder and Associated Disabilities Interview Schedule-5 (AUDADIS-5): Procedural validity of substance use disorders modules through clinical re-appraisal in a general population sample. Drug and Alcohol Dependence, 148, 40-46. https://doi.org/10.1016/j.drugalcdep.2014.12.011
Kessler, R. C., Barker, P. R., Colpe, L. J., Epstein, J. F., Gfroerer, J. C., Hiripi, E., Howes, M. J., Normand, S. L., Manderscheid, R. W., Walters, E. E., & Zaslavsky, A. M. (2003). Screening for serious mental illness in the general population. Archives of General Psychiatry, 60(2), 184-189. https://doi.org/10.1001/archpsyc.60.2.184
Leon, A. C., Olfson, M., Portera, L., Farber, L., & Sheehan, D. V. (1997). Assessing psychiatric impairment in primary care with the Sheehan Disability Scale. International Journal of Psychiatry in Medicine, 27(2), 93-105. https://doi.org/10.2190/t8em-c8yh-373n-1uwd
Office of Applied Studies. (2005). Results from the 2004 National Survey on Drug Use and Health: National findings (HHS Publication No. SMA 05-4062, NSDUH Series H-28). Substance Abuse and Mental Health Services Administration.
Raftery, A. E., & Lewis, S. (1992). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763-774). Oxford University Press.
Rao, J. N. K. (2003). Small area estimation (Wiley Series in Survey Methodology) (1st ed.). John Wiley & Sons.
RTI International. (2013). SUDAAN® language manual, release 11.0.1.
SAS Institute Inc. (2017). SAS/STAT software: Release 14.1.
Scheuren, F. (2004). What is a survey? (2nd ed.). https://www.unh.edu/institutional-research/sites/default/files/media/2022-05/what-is-a-survey.pdf
Shah, B. V., Barnwell, B. G., Folsom, R., & Vaish, A. (2000). Design consistent small area estimates using Gibbs algorithm for logistic models. In Proceedings of the 2000 Joint Statistical Meetings, American Statistical Association, Survey Research Methods Section, Indianapolis, IN (pp. 105-111). American Statistical Association.
Singh, A. C., & Folsom, R. E. (2001, April 11-14). Hierarchical Bayes calibrated domain estimation via Metropolis-Hastings Step in MCMC with application to small areas. Presented at the International Conference on Small Area Estimation and Related Topics, Potomac, MD.
Wright, D. (2003a). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume I. Findings (HHS Publication No. SMA 03-3775, NHSDA Series H-19). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.
Wright, D. (2003b). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume II. Individual state tables and technical appendices (HHS Publication No. SMA 03-3826, NHSDA Series H-20). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.
Wright, D., & Sathe, N. (2005). State estimates of substance use from the 2002-2003 National Surveys on Drug Use and Health (HHS Publication No. SMA 05-3989, NSDUH Series H-26). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.
Wright, D., & Sathe, N. (2006). State estimates of substance use from the 2003-2004 National Surveys on Drug Use and Health (HHS Publication No. SMA 06-4142, NSDUH Series H-29). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.
Wright, D., Sathe, N., & Spagnola, K. (2007). State estimates of substance use from the 2004-2005 National Surveys on Drug Use and Health (HHS Publication No. SMA 07-4235, NSDUH Series H-31). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.
This National Survey on Drug Use and Health (NSDUH) document was prepared by the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA), U.S. Department of Health and Human Services (HHS), and by RTI International, Research Triangle Park, North Carolina. Work by RTI was performed under Contract No. HHSS283201700002C. Marlon Daniel served as government project officer and as the contracting officer representative.
This document was drafted by RTI and reviewed at SAMHSA. Production of the report at SAMHSA was managed by Rong Cai and Shiromani Gyawali.
1 Use the NSDUH link on the following webpage: https://www.samhsa.gov/data/nsduh/state-reports-NSDUH-2021.
2 Eligibility of areas for in-person data collection was determined by state- and county-level COVID-19 metrics collected by Johns Hopkins University (Center for Systems Science and Engineering, Johns Hopkins University, 2021) (see Section 2.2.1 of the 2021 National Survey on Drug Use and Health (NSDUH): Methodological Summary and Definitions report [CBHSQ, 2022b]).
3 RTI International is a trade name of Research Triangle Institute. RTI and the RTI logo are U.S. registered trademarks of Research Triangle Institute.
4 The 2019-2020 State SAEs were produced, but they have since been removed from SAMHSA’s website. Methodological investigations found that the unusual societal circumstances in 2020 and the resulting methodological revisions to NSDUH data collection have affected the comparability of 2020 estimates with estimates from 2019 and earlier. Consequently, estimates that involve combining data from 2020 with previous years have been removed from the SAMHSA website.
5 National small area estimates = Population-weighted averages of state-level small area estimates.
6 The census region-level estimates in the tables are population-weighted aggregates of the state estimates. The published national estimates, however, are benchmarked to exactly match the design-based estimates.
7 See Tables 1 to 35 in 2021 National Survey on Drug Use and Health: Model-Based Estimated Totals (in Thousands) (50 States and the District of Columbia) (CBHSQ, forthcoming b).
8 Note that in the 2004-2005 NSDUH state report (Wright et al., 2007) and prior reports, the term “prediction interval” (PI) was used to represent uncertainty in the state and regional estimates. However, that term also is used in other applications to estimate future values of a parameter of interest. That interpretation does not apply to NSDUH state report estimates; thus, “prediction interval” was dropped and replaced with “Bayesian confidence interval.”
9 For major depressive episode, estimates for people aged 12 or older are not included. For any mental illness, serious mental illness, receipt of mental health services, thoughts of suicide, suicide plans, and suicide attempts, estimates for youths aged 12 to 17 and people aged 12 or older are not included because youths are not asked these questions.
10 Binge drinking is defined as having five or more drinks (for males) or four or more drinks (for females) on the same occasion on at least 1 day in the 30 days prior to the survey.
11 A successfully screened household is one in which all screening questionnaire items were answered by an adult resident of the household and either zero, one, or two household members were selected for the NSDUH interview.
12 The usable case rule requires that a respondent answer “yes” or “no” to the question on lifetime use of cigarettes and “yes” or “no” to at least nine additional lifetime use questions.
13 The SAE expert panel, convened in 1999 and 2000, had six members: Dr. William Bell of the U.S. Bureau of the Census; Partha Lahiri, Professor of the Joint Program in Survey Methodology at the University of Maryland at College Park; Professor Balgobin Nandram of Worcester Polytechnic Institute; Wesley Schaible, formerly Associate Commissioner for Research and Evaluation at the Bureau of Labor Statistics; Professor J. N. K. Rao of Carleton University; and Professor Alan Zaslavsky of Harvard University.
14 See Tables 1 to 35 in 2021 Model-Based Prevalence Estimates (CBHSQ, 2022c).
16 The use of mixed models (fixed and random effects) allows additional error components (random effects) to be included. These account for differences between states and within-state variations that are not taken into account by the predictor variables (fixed effects) alone. It is also difficult (if not impossible) to produce valid mean squared errors (MSEs) for small area estimates based solely on a fixed-effect national regression model (i.e., synthetic estimation) (Rao, 2003, p. 52). The mixed models produce estimates that are approximately represented by a weighted combination of the direct estimate from the state data and a regression estimate from the national model. The regression coefficients of the national model are estimated using data from all of the states (i.e., borrowing strength), and the regression estimate for a particular state is obtained by applying the national model to the state-specific predictor data. The regression estimate for the state is then combined with the direct estimate from the state data in a weighted combination where the weights are obtained by minimizing the MSE (variance + squared bias) of the small area estimate.
17 To increase the precision of the estimated random effects at the within-state level, three SSRs from the 2021 sample were grouped together to form 250 grouped SSRs. California had 12 grouped SSRs; Florida, New York, and Texas each had 10 grouped SSRs; Illinois, Michigan, Ohio, and Pennsylvania each had 8 grouped SSRs; Georgia, New Jersey, North Carolina, and Virginia each had 5 grouped SSRs; and the rest of the states and the District of Columbia each had 4 grouped SSRs.
19 This is the first time state estimates of opioid misuse in the past year have been published by the Substance Abuse and Mental Health Services Administration (SAMHSA).
20 Estimates of underage (aged 12 to 20) alcohol use were also produced.
21 Estimates of underage (aged 12 to 20) binge alcohol use were also produced.
22 This is the first time state estimates of opioid use disorder in the past year have been published by SAMHSA.
24 Generally, age groups are 12 to 17, 18 to 25, 26 to 34, and 35 or older. For underage alcohol and binge alcohol use, the age group is 12 to 20.
25 Depending on the measure and age group, significance levels were 1, 3, 5, or 10 percent.
26 Depending on the measure and age group, significance levels were 1, 3, or 5 percent.
27 Depending on the measure and age group, significance levels were 0.5, 1, 3, 5, or 10 percent.
28 See Table 14 in 2021 National Survey on Drug Use and Health: Model-Based Prevalence Estimates (50 States and the District of Columbia) (CBHSQ, 2022c).
29 See Table 14 in 2021 NSDUH: Model-Based Estimated Totals (CBHSQ, forthcoming b).
30 See Table 14 in 2021 NSDUH: Model-Based Prevalence Estimates (CBHSQ, 2022c).
31 In NSDUH SAE documents prior to 2016-2017, the term “initiation” was referred to as “incidence.”
32 NSDUH respondents in 2021 were asked the respective questions for alcohol use disorder or marijuana use disorder if they reported use of these substances on 6 or more days in the past year.
33 For alcohol, for example, withdrawal symptoms include (but are not limited to) trouble sleeping, hands trembling, hallucinations (seeing, feeling, or hearing things that are not really there), or feeling anxious.
34 For alcohol use disorder, for example, this criterion involves the use of alcohol, sedatives, or tranquilizers to get over or avoid alcohol withdrawal symptoms.
35 NSDUH respondents in 2021 were asked the respective questions for alcohol use disorder or marijuana use disorder only if they reported use of these substances on 6 or more days in the past year.
36 The GAF is a numeric scale used by mental health clinicians to quantify the severity of mental disorders and the extent to which mental disorders negatively affected a person’s daily functioning. In the MHSS, GAF scores were assigned by clinical interviewers at the end of each SCID interview based on information gathered throughout the interview about symptoms of mental disorders and related impairment. This procedure differs from use of the WHODAS in NSDUH, which relies on respondents’ (rather than clinicians’) perceptions of the extent to which their symptoms of psychological distress affected their day-to-day functioning.
37 “An MDE” refers to the occurrence of at least one MDE, rather than only one MDE. Similarly, reference to “the MDE” in a given period (e.g., the past 12 months) does not mean an individual had only one MDE in that period.
Long description, Figure 1. This figure is a graph of a function within a coordinate plane; the horizontal axis shows the estimated proportion (p = small area estimate), and the vertical axis shows the required effective sample size for the estimated proportion to be published. A horizontal line through the graph indicates that an effective sample size of 68 is required for the current suppression rule. There also is a dashed vertical line at the intersection of the estimated proportion of 0.05 and the effective sample size of 68. The graph decreases from an infinitely large required effective sample size when the estimated proportion is close to zero and approaches a local minimum of 50 when the estimated proportion is 0.2. The graph increases for estimated proportions greater than 0.2 until a required effective sample size of 68 is reached for an estimated proportion of 0.5. There also is a dashed vertical line at the intersection of the estimated proportion of 0.5 and the effective sample size of 68. The graph decreases for estimated proportions greater than 0.5 and approaches a local minimum of 50 for the required effective sample size when the estimated proportion is 0.8. The graph increases for estimated proportions greater than 0.8 and reaches an infinitely large required effective sample size when the estimated proportion is close to 1.0.
Long description end. Return to Figure 1.
Long description, Equation 1. Capital S R R is equal to the ratio of two quantities. The numerator is the summation of the product of w sub h h and complete sub h h. The denominator is the summation of the product of w sub h h and eligible sub h h.
Long description end. Return to Equation 1.
Long description, Equation 2. Capital I R R is equal to the ratio of two quantities. The numerator is the summation of the product of w sub i and complete sub i. The denominator is the summation of the product of w sub i and selected sub i.
Long description end. Return to Equation 2.
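Equations 1 and 2 share one form: a weighted sum of completed cases divided by a weighted sum of eligible (or selected) cases. The Python sketch below illustrates only that form. The function name `weighted_response_rate`, the 0/1 coding of the indicators, and the sample values are assumptions made for illustration and are not taken from NSDUH data or production code.

```python
from typing import Sequence


def weighted_response_rate(
    weights: Sequence[float],
    completed: Sequence[int],
    in_denominator: Sequence[int],
) -> float:
    """Return sum(w * completed) / sum(w * in_denominator).

    For Equation 1, `in_denominator` plays the role of the eligible
    indicator; for Equation 2, it plays the role of the selected indicator.
    Indicators are assumed to be coded 0/1 (an assumption of this sketch).
    """
    if not (len(weights) == len(completed) == len(in_denominator)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(w * c for w, c in zip(weights, completed))
    denominator = sum(w * d for w, d in zip(weights, in_denominator))
    if denominator == 0:
        raise ValueError("weighted denominator is zero")
    return numerator / denominator


# Hypothetical household-level inputs (not real survey values).
hh_weights = [100.0, 250.0, 150.0, 500.0]
hh_complete = [1, 0, 1, 1]
hh_eligible = [1, 1, 1, 1]
srr = weighted_response_rate(hh_weights, hh_complete, hh_eligible)
```

The same function can be reused for the interview-level rate by passing person-level weights with the complete and selected indicators.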
Long description end. Return to Equation 3.
Long description, Equation 4. The relative standard error of the negative of the natural logarithm of p is equal to the square root of the posterior variance of p divided by the product of p and the negative of the natural logarithm of p. The relative standard error of the negative of the natural logarithm of 1 minus p is equal to the square root of the posterior variance of 1 minus p divided by the product of 1 minus p and the negative of the natural logarithm of 1 minus p.
Long description end. Return to Equation 4.
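As an illustration of Equation 4, the following Python sketch computes the two relative standard errors from an estimate p and its posterior variance. The function names and the example inputs are hypothetical. The sketch relies on the fact that the variance of 1 minus p equals the variance of p, and it does not apply any suppression threshold.

```python
import math
from typing import Tuple


def rse_neg_log(p: float, posterior_variance: float) -> float:
    """RSE of -ln(p): sqrt(posterior variance) / (p * -ln(p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if posterior_variance < 0.0:
        raise ValueError("posterior variance must be nonnegative")
    return math.sqrt(posterior_variance) / (p * -math.log(p))


def rse_pair(p: float, posterior_variance: float) -> Tuple[float, float]:
    """Return (RSE[-ln(p)], RSE[-ln(1 - p)]).

    Var(1 - p) equals Var(p), so the same variance is used for both terms.
    """
    return (
        rse_neg_log(p, posterior_variance),
        rse_neg_log(1.0 - p, posterior_variance),
    )


# Hypothetical inputs: p = 0.5 with posterior variance 0.0025.
rse_p, rse_q = rse_pair(0.5, 0.0025)
```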
Long description, Equation 5. The model is given by the following equation: log of pi sub a, i, j, k divided by 1 minus pi sub a, i, j, k is equal to the sum of three terms. The first term is given by x transpose sub a, i, j, k times beta sub a. The second term is eta sub a, i. And the third term is nu sub a, i, j.
Long description end. Return to Equation 5.
Long description, Equation 6. Lower sub s and a is defined as the exponential of capital L sub s and a divided by the sum of 1 and the exponential of capital L sub s and a. And upper sub s and a is defined as the exponential of capital U sub s and a divided by the sum of 1 and the exponential of capital U sub s and a.
Long description end. Return to Equation 6.
Long description, Equation 7. Capital L sub s and a is defined as the difference of two quantities. The first quantity is the natural logarithm of the ratio of Theta sub s and a and 1 minus Theta sub s and a. The second quantity is the product of 1.96 and the square root of MSE sub s and a, which is the mean squared error for state-s and age group-a.
Long description end. Return to Equation 7.
Long description, Equation 8. Capital U sub s and a is defined as the sum of two quantities. The first quantity is the natural logarithm of the ratio of Theta sub s and a and 1 minus Theta sub s and a. The second quantity is the product of 1.96 and the square root of MSE sub s and a, which is the mean squared error for state-s and age group-a.
Long description end. Return to Equation 8.
Long description, Equation 9. The mean squared error, MSE sub s and a, is defined as the sum of two quantities. The first quantity is the square of the difference of two parts. Part 1 is defined as the natural logarithm of the ratio of capital P sub s and a and 1 minus capital P sub s and a. Part 2 is defined as the natural logarithm of the ratio of Theta sub s and a and 1 minus Theta sub s and a. The second quantity is the posterior variance of the natural logarithm of the ratio of capital P sub s and a and 1 minus capital P sub s and a.
Long description end. Return to Equation 9.
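Equations 6 through 9 can be read together: the interval is built on the logit scale around Theta, using the mean squared error from Equation 9, and is then transformed back to the proportion scale. The Python sketch below follows the long descriptions term by term. The function names and the example values are hypothetical, and the sketch is not the production SAE code.

```python
import math
from typing import Tuple


def logit(p: float) -> float:
    """Natural log of p / (1 - p)."""
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Inverse logit, equal to exp(x) / (1 + exp(x)) as in Equation 6."""
    return 1.0 / (1.0 + math.exp(-x))


def mse_logit(p_sa: float, theta_sa: float, post_var_logit_p: float) -> float:
    """Equation 9: squared logit difference plus posterior variance of logit(P)."""
    return (logit(p_sa) - logit(theta_sa)) ** 2 + post_var_logit_p


def interval(theta_sa: float, mse_sa: float, z: float = 1.96) -> Tuple[float, float]:
    """Equations 6 to 8: logit(Theta) -/+ z * sqrt(MSE), back-transformed."""
    half_width = z * math.sqrt(mse_sa)
    center = logit(theta_sa)
    return expit(center - half_width), expit(center + half_width)


# Hypothetical inputs: P and Theta both 0.2, posterior variance 0.04.
mse = mse_logit(0.2, 0.2, 0.04)
lower, upper = interval(0.2, mse)
```

Because the interval is symmetric on the logit scale, it is generally asymmetric around Theta on the proportion scale and always stays between 0 and 1.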
Long description, Equation 10. The average annual rate is defined as 100 times quantity q divided by 2. Quantity q is defined as capital X sub 1 divided by the sum of two terms: 0.5 times capital X sub 1, and capital X sub 2.
Long description end. Return to Equation 10.
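A short worked example of Equation 10 in Python follows. The function name `average_annual_rate` and the input values are hypothetical; the sketch only mirrors the arithmetic in the long description.

```python
def average_annual_rate(x1: float, x2: float) -> float:
    """Equation 10: 100 * { [X1 / (0.5 * X1 + X2)] / 2 }."""
    denominator = 0.5 * x1 + x2
    if denominator <= 0:
        raise ValueError("0.5 * X1 + X2 must be positive")
    q = x1 / denominator
    return 100.0 * q / 2.0


# Hypothetical inputs: X1 = 20 and X2 = 90 give q = 20 / 100 = 0.2,
# so the average annual rate is 100 * 0.2 / 2 = 10 percent.
rate = average_annual_rate(20.0, 90.0)
```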
Long description, Equation 11. The logit of pi hat is equivalent to the logarithm of pi hat divided by the quantity 1 minus pi hat, which is equal to the sum of the following six quantities: negative 5.972664, the product of 0.0873416 and capital X sub k, the product of 0.3385193 and capital X sub w, the product of 1.9552664 and capital X sub s, the product of 1.1267330 and capital X sub m, and the product of 0.1059137 and capital X sub a.
Pi hat is equal to the ratio of two quantities. The numerator is 1. The denominator is 1 plus e raised to the negative value of the sum of the following six quantities: negative 5.972664, the product of 0.0873416 and capital X sub k, the product of 0.3385193 and capital X sub w, the product of 1.9552664 and capital X sub s, the product of 1.1267330 and capital X sub m, and the product of 0.1059137 and capital X sub a.
Long description end. Return to Equation 11.
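Equation 11 can be evaluated directly from the coefficients listed in its long description. The Python sketch below does so; the function name, the dictionary keys, and the example inputs are hypothetical, and the sketch does not reproduce how the five predictors are coded, so any values passed in are assumed to be already on the scale the model expects.

```python
import math
from typing import Mapping

# Coefficients copied from the long description of Equation 11.
INTERCEPT = -5.972664
COEFFICIENTS = {
    "x_k": 0.0873416,
    "x_w": 0.3385193,
    "x_s": 1.9552664,
    "x_m": 1.1267330,
    "x_a": 0.1059137,
}


def linear_predictor(x: Mapping[str, float]) -> float:
    """logit(pi hat): intercept plus the five coefficient-by-predictor products."""
    return INTERCEPT + sum(coef * x[name] for name, coef in COEFFICIENTS.items())


def predicted_probability(x: Mapping[str, float]) -> float:
    """pi hat = 1 / (1 + exp(-logit(pi hat)))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x)))


# Hypothetical inputs: all predictors zero, then X_s switched to 1.
baseline = {"x_k": 0.0, "x_w": 0.0, "x_s": 0.0, "x_m": 0.0, "x_a": 0.0}
p_baseline = predicted_probability(baseline)
p_with_s = predicted_probability({**baseline, "x_s": 1.0})
```

With all predictors at zero, the logit equals the intercept, which gives a quick check that the two forms of the equation agree.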