2021
National Survey on Drug Use and Health:
Guide to State Tables and Summary of Small
Area Estimation Methodology

Section A: Overview of NSDUH and Model-Based State Estimates

A.1 Introduction

This document provides information on the model-based small area estimates of substance use and mental health disorders in states based on data from the 2021 National Survey on Drug Use and Health (NSDUH). These estimates are available online along with other related information.¹

NSDUH is an annual survey of the civilian, noninstitutionalized population aged 12 or older, conducted from January through December, and is sponsored by the Substance Abuse and Mental Health Services Administration (SAMHSA). The survey collects information from individuals residing in households, noninstitutionalized group quarters (e.g., shelters, rooming houses, dormitories), and civilians living on military bases. The 2021 NSDUH used multimode data collection, in which respondents completed the survey via the web or in person in eligible locations.² In 2021, NSDUH collected data from 69,850 respondents aged 12 or older.

NSDUH is planned and managed by SAMHSA’s Center for Behavioral Health Statistics and Quality (CBHSQ). Data collection and analysis are conducted under contract with RTI International.³ A summary of NSDUH’s methodology is given in Section A.2. Section A.3 lists all the tables and files associated with the 2021 state estimates. Section A.4 provides details on the criteria used to suppress estimates. Section A.5 explains the confidence intervals and margins of error and how to interpret them with respect to the small area estimates. Section A.6 discusses related substance use measures and warns users against drawing conclusions by subtracting small area estimates for two different measures. Section A.7 briefly discusses methodological changes for the 2021 NSDUH.

The survey-weighted hierarchical Bayes (SWHB) small area estimation (SAE) methodology used in the production of state estimates from the 1999 to 2019⁴ surveys was also used in the production of the 2021 state estimates. The SWHB methodology is described in Appendix E of the 2001 state report (Wright, 2003b) and in Folsom and colleagues (1999). A general model description is given in Section B.1 of this document. A list of measures (outcomes) for which small area estimates are produced is given in Section B.2. Predictors used in the 2021 SAE modeling are listed and described in Section B.3. Selection of predictors for SAE modeling is described in Section B.4.

Small area estimates obtained using the SWHB methodology are design consistent (i.e., the small area estimates for states with large sample sizes are close to the robust design-based estimates). Additionally, the national small area estimates⁵ are very close to the national design-based estimates. However, to ensure internal consistency, it is desirable to have the national small area estimates exactly match the national design-based estimates. This process is called “benchmarking.” The benchmarked state-level estimates are also potentially less biased than the unbenchmarked state-level estimates. Beginning in 2002, exact benchmarking was introduced, as described in Section B.5.⁶ Tables of the estimated numbers of individuals associated with each measure are available online,⁷ and an explanation of how these counts and their respective Bayesian confidence intervals⁸ are calculated can be found in Section B.6. Section B.7 discusses the method to compute aggregated estimates by combining two age groups. The definition and explanation of the formula used in estimating the marijuana initiation rate are given in Section B.8. Note that, unlike the other outcomes discussed in this document, marijuana initiation is calculated as a ratio of two measures.

State estimates for the age groups 12 to 17, 18 to 25, 26 or older, 18 or older, and 12 or older⁹ are provided for all measures except any mental illness (AMI), serious mental illness (SMI), receipt of mental health services, major depressive episode (MDE; i.e., depression), serious thoughts of suicide, suicide plans, and suicide attempts. Additionally, estimates for youths aged 12 to 17 are not available for past year heroin use because heroin use in the past year for youths aged 12 to 17 was extremely rare in the 2021 NSDUH. As a result, estimates of past year heroin use for people aged 12 or older are also not produced.

Estimates of underage (aged 12 to 20) alcohol use and binge alcohol use were also produced.¹⁰ Alcohol consumption is expected to differ significantly across the 18 to 25 age group because of the legalization of alcohol at age 21. Therefore, it was decided that it would be useful to produce small area estimates for people aged 12 to 20. A short description of the methodology used to produce underage drinking estimates is provided in Section B.9.

The remainder of Section B covers four additional topics:

In Section C, the 2021 survey sample sizes, response rates, and population estimates are included in Tables C.1 to C.3, respectively.

A.2 Summary of NSDUH Methodology

NSDUH is the primary source of statistical information on the use of tobacco, alcohol, prescription pain relievers, and other substances (e.g., marijuana, cocaine) by the U.S. civilian, noninstitutionalized population aged 12 or older. The survey also includes several series of questions that focus on mental health issues. NSDUH has been ongoing since 1971 and is conducted by the federal government. The survey collects information from residents of households and noninstitutional group quarters (e.g., shelters, rooming houses, dormitories) and from civilians living on military bases. NSDUH excludes homeless people who do not use shelters, military personnel on active duty, and residents of institutional group quarters, such as jails and hospitals. From 1999 to 2019, the data were collected via face-to-face (in-person) interviews at a respondent’s place of residence using a combination of computer-assisted personal interviewing conducted by an interviewer and audio computer-assisted self-interviewing. Because of the coronavirus disease 2019 (COVID-19) pandemic, an additional web data collection mode was introduced to the 2020 NSDUH and continued to be used in the 2021 survey.

SAMHSA suspended in-person data collection on the 2020 NSDUH on March 16, 2020, because of the COVID-19 pandemic, a situation that affected virtually all national surveys that collect data in person, including NSDUH. A small-scale data collection effort was conducted in July 2020 to test protocols to reduce the risk of COVID-19 infection through in-person data collection. Because of ongoing COVID-19 infection rates in the United States, however, it became evident that a return to full-scale in-person data collection would not be feasible for obtaining a representative sample with a sufficient number of interviews to produce national estimates with acceptable precision for people aged 12 or older. Therefore, SAMHSA approved multimode data collection (in-person and web-based data collection) for the 2020 NSDUH beginning in Quarter 4. In-person data collection resumed on October 1, 2020 (in locations where COVID-19 infection metrics were sufficiently low), and web-based data collection began on October 30, 2020. Therefore, in addition to the collection of data through multiple survey modes in 2020, there was a gap in full-scale data collection between Quarters 1 and 4. Detailed descriptions of the methodological changes to the 2020 NSDUH because of the COVID-19 pandemic are provided in Section A.7 of this document and in Chapters 2, 3, and 6 of the 2020 National Survey on Drug Use and Health (NSDUH): Methodological Summary and Definitions report (CBHSQ, 2021b).

The 2021 sample was selected using the coordinated sample design developed for the 2014 through 2022 NSDUHs. The coordinated sample design is state based, with an independent, multistage area probability sample within each state and the District of Columbia. This design designates 12 states as large sample states. These 12 states have the following target sample sizes per year: 4,560 interviews in California; 3,300 interviews in Florida, New York, and Texas; 2,400 interviews in Illinois, Michigan, Ohio, and Pennsylvania; and 1,500 interviews in Georgia, New Jersey, North Carolina, and Virginia. Making the sample sizes more proportional to the state population sizes improves the precision of national NSDUH estimates. This change also allows for a more cost-efficient sample allocation to the largest states while slightly increasing the sample sizes in smaller states to improve the precision of state estimates (note that the target sample size per year in the small states is 960 interviews, except for Hawaii, where the target sample size is 967 interviews). The fielded sample sizes for each state in 2021 are provided in Table C.1.
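The per-state targets listed above can be tallied into a national target as a quick arithmetic check. The sketch below assumes that every area not named as a large sample state (38 states plus the District of Columbia) counts as a "small state" with a target of 960 interviews, apart from Hawaii; the variable names are illustrative only.

```python
# Target annual interviews per large sample state, as listed in the text.
large_state_targets = {
    4560: ["California"],
    3300: ["Florida", "New York", "Texas"],
    2400: ["Illinois", "Michigan", "Ohio", "Pennsylvania"],
    1500: ["Georgia", "New Jersey", "North Carolina", "Virginia"],
}

n_areas = 51  # 50 states plus the District of Columbia
n_large = sum(len(states) for states in large_state_targets.values())
n_small = n_areas - n_large  # assumption: all remaining areas are "small states"

large_total = sum(target * len(states)
                  for target, states in large_state_targets.items())
small_total = (n_small - 1) * 960 + 967  # Hawaii's target is 967 rather than 960

national_target = large_total + small_total
print(n_large, n_small, national_target)
```

Under that assumption, the large sample states account for somewhat less than half of the national target, which is consistent with the statement that smaller states retain enough sample for state-level precision.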

Nationally in 2021, a total of approximately 220,740 addresses were screened, and 69,850 individuals responded within the screened addresses (see Table C.1). The weighted screening response rate (SRR) for 2021 was 22.2 percent, and the weighted interview response rate (IRR) was 46.2 percent, for an overall weighted response rate (ORR) of 10.3 percent (Table C.1). The ORRs for 2021 ranged from 7.1 percent in New Jersey to 18.6 percent in Vermont. Estimates have been adjusted to reflect the probability of selection, unit nonresponse, poststratification to known census population estimates, item imputation, and other aspects of the estimation process. These procedures are described in detail in the 2021 National Survey on Drug Use and Health: Methodological Resource Book (CBHSQ, 2022a).

All sampled households are screened to confirm eligibility and to select zero, one, or two household members to participate in the survey. The weighted SRR is defined as the weighted number of successfully screened households (or dwelling units)¹¹ divided by the weighted number of eligible households, or

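$$\mathrm{SRR} = \frac{\sum w_{hh}\,\mathit{complete}_{hh}}{\sum w_{hh}\,\mathit{eligible}_{hh}},$$

where $w_{hh}$ is the inverse of the unconditional probability of selection for the household (hh) and excludes all adjustments for nonresponse and poststratification, $\mathit{complete}_{hh}$ indicates a successfully screened household, and $\mathit{eligible}_{hh}$ indicates an eligible household.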

In successfully screened households, eligible household members who were selected were asked to complete the interview. The weighted IRR for NSDUH is defined as the weighted number of respondents divided by the weighted number of selected people, or

$$\mathrm{IRR} = \frac{\sum w_{i}\,\mathit{complete}_{i}}{\sum w_{i}\,\mathit{selected}_{i}},$$

where $w_{i}$ is the inverse of the probability of selection for the ith person and includes household-level nonresponse and poststratification adjustments, $\mathit{complete}_{i}$ indicates a respondent, and $\mathit{selected}_{i}$ indicates a selected person. To be considered a completed interview, a respondent must provide enough data to pass the usable case rule.¹²

The weighted ORR is defined as the product of the weighted SRR and the weighted IRR or

$$\mathrm{ORR} = \mathrm{SRR} \times \mathrm{IRR}.$$
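The three response rates can be sketched in a few lines of code. The household and person records below are hypothetical toy data, and `weighted_rate` is an illustrative helper rather than part of any NSDUH software; the last lines simply check the published national rates (22.2 percent and 46.2 percent) against the published ORR.

```python
def weighted_rate(weights, numerator_flags, denominator_flags):
    """Weighted count of numerator cases divided by weighted count of denominator cases."""
    num = sum(w for w, flag in zip(weights, numerator_flags) if flag)
    den = sum(w for w, flag in zip(weights, denominator_flags) if flag)
    return num / den

# Hypothetical toy data: four eligible households, three screened successfully.
hh_weights = [100.0, 150.0, 120.0, 130.0]
hh_eligible = [True, True, True, True]
hh_screened = [True, False, True, True]
srr = weighted_rate(hh_weights, hh_screened, hh_eligible)

# Hypothetical toy data: five selected people, three completed interviews.
person_weights = [200.0, 180.0, 220.0, 210.0, 190.0]
person_selected = [True] * 5
person_completed = [True, False, True, False, True]
irr = weighted_rate(person_weights, person_completed, person_selected)

# The overall rate is the product of the two stage-specific rates.
orr = srr * irr

# Published 2021 national SRR and IRR, taken from the text.
published_orr = 0.222 * 0.462
print(f"toy ORR: {orr:.3f}; published ORR: {published_orr:.1%}")
```

Because the ORR is a product, a low rate at either stage caps the overall rate, which is why the 2021 ORR is far below the IRR.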

For more details on the screening and response rates, see Section 3.3.1 in the 2021 Methodological Resource Book (CBHSQ, 2022a).

A.3 Presentation of Data

This section lists all products associated with the 2021 state estimates. All estimates are based on data from the 2021 NSDUH only. Historically, from the 2002-2003 state report through the 2018-2019 state report, the state estimates were produced by pooling two years of NSDUH data; the exception was the 2002 state report, where estimates were based only on 2002 data. The pooling of a current year’s data with a previous year’s data to produce state estimates was recommended by an SAE expert panel¹³ to increase the precision of year-to-year change estimates (e.g., 2017-2018 vs. 2018-2019). The panel also noted that a single year of NSDUH data is sufficient to produce reliable state estimates.

As mentioned in Section A.2, there was a gap in full-scale data collection between Quarters 1 and 4 of 2020 due to COVID-19, which made the 2020 data not comparable with any other year. It is anticipated that 2021 will become a new baseline for monitoring NSDUH trends, and as a result, state estimates presented here are based on 2021 data alone and are not compared with prior state estimates.

The following products exclude age groups 12 to 17 and 12 or older for past year heroin use because in 2021 heroin use among youth was very rare. Additionally, a suppression rule was applied to the state SAEs and suppressed estimates are noted by an asterisk (*) in the various tables discussed below. Information about the suppression criteria can be found in Section A.4. In addition to this methodology document for the 2021 state estimates, the following products are available at https://www.samhsa.gov/data/nsduh/state-reports-NSDUH-2021:

A.4 Suppression Criteria for State Estimates

Beginning in 2021, suppression is applied to unreliable estimates. Estimates that meet the suppression criteria discussed here are designated as unreliable; they are not shown in the tables and are marked instead with an asterisk (*). The suppression criterion is based on a combination of the relative standard error (RSE) of $-\ln(p)$ or $-\ln(1-p)$ and the effective sample size (EFN), where $p$ denotes the unbenchmarked small area estimate at the state by age group level and $\ln(p)$ denotes the natural logarithm of $p$. For $p \le 50$ percent, $\mathrm{RSE}[-\ln(p)]$ is used, and for $p > 50$ percent, $\mathrm{RSE}[-\ln(1-p)]$ is used. The separate formulas for $p \le 50$ percent and $p > 50$ percent produce a symmetric suppression rule; that is, if $p$ is suppressed, then so is $(1-p)$. By using the first-order Taylor series approximation method, estimates of $\mathrm{RSE}[-\ln(p)]$ and $\mathrm{RSE}[-\ln(1-p)]$ are given by

$$\mathrm{RSE}[-\ln(p)] = \frac{\sqrt{\mathrm{var}(p)}}{p\,[-\ln(p)]} \quad \text{and} \quad \mathrm{RSE}[-\ln(1-p)] = \frac{\sqrt{\mathrm{var}(p)}}{(1-p)\,[-\ln(1-p)]},$$

where $\mathrm{var}(p)$ denotes the posterior variance of $p$. The EFN is defined as $\mathrm{EFN} = n / \mathit{deff}$, where $n$ denotes the raw sample size and the design effect is $\mathit{deff} = n\,\mathrm{var}(p) / [p(1-p)]$; hence, $\mathrm{EFN} = p(1-p) / \mathrm{var}(p)$. A lower bound of 0.2 also was imposed on the design effects (i.e., all design effects that were less than 0.2 were changed to 0.2) to avoid publishing state by age group estimates with very small sample sizes or small prevalence estimates.
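The Taylor series step and the EFN identity can be spelled out as follows. This is a sketch of standard delta-method algebra using the quantities defined in this section (the posterior variance of $p$, the raw sample size $n$, and the design effect); it is not quoted from the report.

```latex
\begin{aligned}
\mathrm{var}[-\ln(p)] &\approx \left(\frac{d}{dp}\bigl[-\ln(p)\bigr]\right)^{2}\mathrm{var}(p)
  = \frac{\mathrm{var}(p)}{p^{2}}, \\
\mathrm{RSE}[-\ln(p)] &= \frac{\sqrt{\mathrm{var}[-\ln(p)]}}{-\ln(p)}
  \approx \frac{\sqrt{\mathrm{var}(p)}}{p\,[-\ln(p)]}, \\
\mathrm{EFN} &= \frac{n}{\mathit{deff}}
  = n \cdot \frac{p(1-p)}{n\,\mathrm{var}(p)}
  = \frac{p(1-p)}{\mathrm{var}(p)}.
\end{aligned}
```

Because $\mathrm{var}(1-p) = \mathrm{var}(p)$, replacing $p$ with $1-p$ in the first two lines gives the corresponding expression for $\mathrm{RSE}[-\ln(1-p)]$, which is what makes the rule symmetric.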

The following criterion was used to suppress state small area estimates:

  • when $p < 5.23$ percent, suppress if $\mathrm{RSE}[-\ln(p)] > 17.5$ percent;
  • when $5.23$ percent $\le p \le 94.77$ percent, suppress if the EFN $\le 68$; and
  • when $p > 94.77$ percent, suppress if $\mathrm{RSE}[-\ln(1-p)] > 17.5$ percent.

Figure 1 shows the relationship between $p$ and the EFN when $\mathrm{RSE}[-\ln(p)] = 17.5$ percent for $p \le 50$ percent and when $\mathrm{RSE}[-\ln(1-p)] = 17.5$ percent for $p > 50$ percent. The suppression criterion switches to the EFN between 5.23 percent and 94.77 percent so that the EFN is not allowed to fall below 68, the EFN required at $p = 50$ percent.
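The suppression rule and the curve plotted in Figure 1 can be sketched as below. This is an illustration under stated assumptions, not the production code: the function names are hypothetical, proportions are expressed on a 0 to 1 scale, and the 0.2 floor on the design effect is applied only when computing the EFN because the text does not say whether it also enters the RSE.

```python
import math

RSE_CUTOFF = 0.175           # 17.5 percent
EFN_CUTOFF = 68
P_LOW, P_HIGH = 0.0523, 0.9477
MIN_DEFF = 0.2               # lower bound imposed on the design effect

def effective_sample_size(p, post_var, n):
    """EFN = n / deff, with deff = n * var(p) / (p * (1 - p)) floored at 0.2."""
    deff = max(n * post_var / (p * (1.0 - p)), MIN_DEFF)
    return n / deff

def rse_neg_log(p, post_var):
    """Taylor-series RSE of -ln(p) when p <= 0.5, or of -ln(1 - p) when p > 0.5."""
    q = p if p <= 0.5 else 1.0 - p
    return math.sqrt(post_var) / (q * -math.log(q))

def suppress(p, post_var, n):
    """Return True when the estimate meets the suppression criterion."""
    if p < P_LOW or p > P_HIGH:
        return rse_neg_log(p, post_var) > RSE_CUTOFF
    return effective_sample_size(p, post_var, n) <= EFN_CUTOFF

def efn_at_cutoff(p, rse=RSE_CUTOFF):
    """EFN at which the RSE equals the cutoff, using var(p) = p(1 - p) / EFN."""
    q = p if p <= 0.5 else 1.0 - p
    return (1.0 - q) / (q * math.log(q) ** 2 * rse ** 2)

# Hypothetical inputs: (estimate, posterior variance, raw sample size).
print(suppress(0.30, 0.0021, 200))   # mid-range estimate, judged on the EFN
print(suppress(0.01, 0.0001, 900))   # rare outcome, judged on the RSE
print(round(efn_at_cutoff(0.50)), round(efn_at_cutoff(0.0523)))
```

Evaluating `efn_at_cutoff` at 50 percent and at 5.23 percent gives nearly the same value, which illustrates why the rule switches to a fixed EFN threshold between those two points.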

Figure 1. Small Area Estimate versus Effective Sample Size when the Relative Standard Error Equals 17.5 Percent


A.5 Confidence Intervals and Margins of Error

At the top of each of the 35 tables showing state-level model-based estimates¹⁴ is the design-based national estimate along with a 95 percent design-based confidence interval, all of which are based on the survey design, the survey weights, and the reported data. The state estimates are model-based statistics (using SAE methodology) that have been adjusted (benchmarked) such that the population-weighted mean of the estimates across the 50 states and the District of Columbia equals the design-based national estimate. For more details on this benchmarking, see Section B.5. The region-level estimates are also benchmarked and are obtained by taking the population-weighted mean of the associated state-level benchmarked estimates. Associated with each state and regional estimate is a 95 percent Bayesian confidence interval. These intervals indicate the uncertainty in the estimate due to both sampling variability and model fit. For example, the state with the highest estimate of past month use of marijuana for young adults aged 18 to 25 in 2021 was Arizona, with an estimate of 40.0 percent and a 95 percent Bayesian confidence interval that ranged from 31.8 to 48.8 percent (see Table 3 of the state model-based prevalence estimates’ tables [CBHSQ, 2022c]). Assuming that sampling and modeling conditions held, the Bayes posterior probability was 0.95 that the true percentage of past month marijuana use in Arizona for young adults aged 18 to 25 in 2021 was between 31.8 and 48.8 percent. As noted earlier in footnote 8, the term “prediction interval” (PI) was used in the 2004-2005 NSDUH state report (Wright et al., 2007) and prior reports to represent uncertainty in the state and regional estimates. However, that term is also used in other applications to estimate future values of a parameter of interest. That interpretation does not apply to NSDUH state model-based estimates, so PI was replaced with “Bayesian confidence interval.”

“Margin of error” is another term used to describe uncertainty in the estimates. For example, if $(l, u)$ is a 95 percent symmetric confidence interval for the population proportion ($p$) and $\hat{p}$ is an estimate of $p$ obtained from the survey data, then the margin of error of $\hat{p}$ is given by $(u - \hat{p})$ or $(\hat{p} - l)$. When $(l, u)$ is a symmetric confidence interval, $(u - \hat{p})$ will be the same as $(\hat{p} - l)$. The margin of error will vary for each estimate and will be affected not only by the sample size (e.g., the larger the sample, the smaller the margin of error) but also by the sample design (e.g., telephone surveys using random digit dialing and surveys employing a stratified multistage cluster design will, more than likely, produce a different margin of error) (Scheuren, 2004).

The confidence intervals shown in NSDUH state reports are asymmetric, meaning that the distance between the estimate and the lower confidence limit will not be the same as the distance between the upper confidence limit and the estimate. For example, Utah’s 2021 past month marijuana use estimate is 16.0 percent for young adults aged 18 to 25, with a 95 percent Bayesian confidence interval equal to 12.3 to 20.6 percent (see Table 3 of the state model-based prevalence estimates’ tables [CBHSQ, 2022c]). Therefore, Utah’s estimate is 3.7 (i.e., 16.0 – 12.3) percentage points from the lower 95 percent confidence limit and 4.6 (i.e., 20.6 – 16.0) percentage points from the upper limit. These asymmetric confidence intervals work well for small percentages often found in NSDUH state estimate tables and reports while still being appropriate for larger percentages. Some surveys or polls provide only one margin of error for all reported percentages. This single number is usually calculated by setting the sample percentage estimate ($\hat{p}$) equal to 50 percent, which will produce an upper bound or maximum margin of error. Such an approach would not be feasible in this situation because the NSDUH state estimates vary from less than 1 percent to more than 75 percent; hence, applying a single margin of error to these estimates could significantly overstate or understate the actual precision levels. Therefore, given the differences mentioned above, it is more useful and informative to report the confidence interval for each estimate instead of a margin of error.
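The asymmetry and the single margin of error argument can be illustrated with a short calculation. The interval values come from the text; `srs_margin_of_error` uses the textbook simple random sampling formula with a hypothetical sample size of 1,000, which is an assumption made here for illustration and does not reflect the NSDUH design or the Bayesian intervals.

```python
import math

def interval_distances(estimate, lower, upper):
    """Distances (in percentage points) from the estimate to each interval limit."""
    return round(estimate - lower, 1), round(upper - estimate, 1)

def srs_margin_of_error(p, n, z=1.96):
    """Symmetric margin of error under simple random sampling (an assumption here)."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Published 2021 estimates for past month marijuana use, ages 18 to 25.
utah = interval_distances(16.0, 12.3, 20.6)
arizona = interval_distances(40.0, 31.8, 48.8)

n = 1000  # hypothetical sample size
worst_case = srs_margin_of_error(0.50, n)    # the "one number for all" margin
rare_outcome = srs_margin_of_error(0.01, n)  # margin for a 1 percent estimate

print(utah, arizona)
print(f"{worst_case:.3f} vs {rare_outcome:.3f}")
```

In this toy setting the worst-case margin is several times larger than the margin for a 1 percent estimate, so a single published margin would badly overstate the uncertainty of rare outcomes.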

When it is indicated that a state has the highest or lowest estimate, it does not imply that the state’s estimate is significantly higher or lower than the next highest or lowest state’s estimate. Additionally, two significantly different state estimates (at the 5 percent level of significance) may have overlapping 95 percent confidence intervals. For details on a more accurate test to compare state estimates, see 2021 National Survey on Drug Use and Health: Comparison of Population Percentages from the United States, Census Regions, States, and the District of Columbia (CBHSQ, forthcoming a).

A.6 Related Substance Use Measures

State estimates are produced for a number of related measures, such as marijuana use in the past month and illicit drug use in the past month, or alcohol use disorder in the past year and needing but not receiving treatment at a specialty facility for alcohol use in the past year. It might appear that one could draw conclusions by subtracting one from the other (e.g., subtracting the percentage who misused pain relievers in the past year from the percentage who misused opioids [misuse of pain relievers or use of heroin] in the past year to find the percentage who used only heroin in the past year but did not misuse pain relievers). Because related measures have been estimated separately with different models, subtracting one measure from another related measure at the state or census region level can give misleading results, perhaps even a “negative” estimate, and should be avoided. Users are advised to view the estimates along with their respective confidence intervals to get a better idea of the range in which the “true” value of the prevalence rate might fall (see Section A.5 for more details).

However, at the national level, because these estimates are design-based estimates, such comparisons can be made. For example, at the national level, subtracting estimates for cigarette use in the past month from the estimates of tobacco use in the past month will give the estimate of people who did not use cigarettes in the past month but used other forms of tobacco, such as cigars, pipes, or smokeless tobacco, in the past month.

A.7 2021 NSDUH Methodological Changes and Implication for Estimates

Similar to the 2020 NSDUH, the COVID-19 pandemic affected data collection for the 2021 NSDUH. The 2021 NSDUH continued the use of multimode data collection procedures that were first implemented in October 2020 for the 2020 NSDUH. Multimode data collection was used for the entire 2021 NSDUH sample; however, the proportion of in-person interviews gradually increased from the beginning to the end of 2021. Even so, the multimode nature of the 2021 NSDUH is an important methodological difference from previous years. This section discusses special methodological issues specific to the 2021 NSDUH. More detailed information can be found in Chapter 6 of the 2021 Methodological Summary and Definitions report (CBHSQ, 2022b).

A.7.1 Methodological Changes

A.7.1.1 Data Collection Mode

Before October 2020, all NSDUH data were collected in person, mainly in respondents’ homes. It was known that the use of multimode data collection for the 2020 NSDUH could affect the validity of comparisons between estimates from 2020 and those from prior years. However, the benefits of including a web-based interview option outweighed this concern, especially given the limitations on in-person data collection imposed by the COVID-19 pandemic. The COVID-19 pandemic forced all in-person data collection for NSDUH to stop in mid-March 2020. Except for a brief data collection period in July 2020 to test in-person safety protocols, data collection did not resume until Quarter 4 of 2020 (i.e., October 2020). The web data collection mode was introduced for NSDUH in Quarter 4 of 2020, and more than 90 percent of interviews in that quarter were conducted via the web. This multimode design continued into 2021, although some modifications were made to the data collection procedures, as discussed in Section 2.2 of the 2021 Methodological Summary and Definitions report (CBHSQ, 2022b). More than three quarters of interviews in Quarter 1 were completed via the web (76.6 percent). By Quarter 4, fewer than half of the interviews (41.5 percent) were completed that way. Altogether, 54.6 percent of the 2021 interviews were completed via the web.

National estimates differed significantly by web and in-person modes of data collection (also known as a “mode effect”). These differences were observed even in analyses that adjusted for demographic characteristics of respondents such as age, gender, race, and Hispanic origin. Consequently, estimates based on both web and in-person interviews were not comparable with estimates based on only one of these data collection modes. Weighting for the demographic characteristics of the sample to match the demographic characteristics of the population only partially adjusts for this difference. Differences between web and in-person respondents for most measures were not consistent across quarters. See Section 6.2.2 of CBHSQ (2022b) for more information about the findings from the assessments of multimode methodological changes in 2021. The estimates based on 2021 data represent an overall average of temporal (over 4 quarters) and data collection mode effects. Due to these methodological differences, it is not recommended to compare 2021 data with data from prior surveys.

A.7.1.2 Use of 2020 Census Data in Weighting

NSDUH person-level weights are calibrated to population estimates for the state and demographic domains provided by the U.S. Census Bureau. For the 2011-2020 NSDUHs, the population estimates used in the poststratification adjustment were based on population estimates projected from the 2010 decennial census. Starting with the 2021 NSDUH, population estimates based on the 2020 decennial census were used in developing the person-level analysis weights.

The 2020 decennial census population estimates represent the current population characteristics more accurately than the population estimates calculated from the 2010 decennial census. As the U.S. Census Bureau noted in a press release on the quality indicators for the 2020 census,¹⁵ “Despite all the challenges of the pandemic, the completeness and accuracy of 2020 Census results are comparable with recent censuses.” For more details on how the 2021 NSDUH weights were developed, refer to Section 2.3.4 of CBHSQ (2022b).

A.7.2 Comparisons with Prior Years

The multimode data collection affected all methodological areas, such as imputation procedures, weighting procedures, presentation of the data, and analysis and interpretation of the data. Given the impact of the methodological differences in 2021 on the estimates (see Sections 6.1 and 6.2 of CBHSQ [2022b]), it was decided that it would not be appropriate to compare estimates from the 2021 NSDUH with those from prior years. For this reason, no statistical comparisons between the 2021 state estimates and estimates from prior years were done.

Section B: State Model-Based Estimation Methodology

B.1 General Model Description

The state small area estimation (SAE) model is a complex mixed¹⁶ (including both fixed and random effects) logistic regression model of the following form:

$$\log\left[\frac{\pi_{aijk}}{1 - \pi_{aijk}}\right] = x'_{aijk}\,\beta_{a} + \eta_{ai} + \nu_{aij},$$

where $\pi_{aijk}$ is the probability of engaging in the behavior of interest (e.g., using marijuana in the past month) for person-k belonging to age group-a in grouped state sampling region (SSR)-j of state-i.¹⁷ Let $x_{aijk}$ denote a $p_{a} \times 1$ vector of predictor variables (independent variables or fixed effects) associated with age group-a (12 to 17, 18 to 25, 26 to 34, and 35 or older) and $\beta_{a}$ denote the associated vector of the regression parameters. The age group-specific vectors of the auxiliary variables are defined for every block group in the nation and also include person-level demographic variables, such as race/ethnicity and gender. The vectors of state-level random effects $\eta_{i} = (\eta_{1i}, \ldots, \eta_{Ai})'$ and grouped SSR-level random effects $\nu_{ij} = (\nu_{1ij}, \ldots, \nu_{Aij})'$ are assumed to be mutually independent with $\eta_{i} \sim N(0, D_{\eta})$ and $\nu_{ij} \sim N(0, D_{\nu})$, where $A$ is the total number of individual age groups modeled (generally, $A = 4$). For hierarchical Bayes (HB) estimation purposes, an improper uniform prior distribution is assumed for $\beta_{a}$, and proper Wishart prior distributions are assumed for $D_{\eta}^{-1}$ and $D_{\nu}^{-1}$. The HB solution for $\pi_{aijk}$ involves a series of complex Markov Chain Monte Carlo (MCMC) steps to generate values of the desired fixed and random effects from the underlying joint posterior distribution. The basic process is described in Folsom and colleagues (1999); Shah and colleagues (2000); and Wright (2003a, 2003b).
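To make the model form concrete, the sketch below evaluates the logistic link for one person given a single set of parameter values. All numbers are hypothetical, and the function names are illustrative; in the actual procedure the fixed and random effects are drawn by MCMC rather than supplied by hand.

```python
import math

def inverse_logit(t):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def model_probability(x, beta_a, eta_ai, nu_aij):
    """Probability implied by fixed effects plus state and grouped-SSR random effects."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta_a)) + eta_ai + nu_aij
    return inverse_logit(linear_predictor)

# Hypothetical values for one person in age group a, grouped SSR j, state i.
x = [1.0, 0.5]        # intercept and one predictor
beta_a = [-2.0, 0.8]  # age group-specific regression parameters
eta_ai = 0.10         # state-level random effect
nu_aij = -0.05        # grouped SSR-level random effect

pi = model_probability(x, beta_a, eta_ai, nu_aij)
print(round(pi, 4))
```

The two random effects shift the log-odds up or down for everyone in the same state and grouped SSR, which is how the model lets areas depart from what the predictors alone would imply.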

Once the required number of MCMC samples (1,250 in all) for the parameters of interest are generated and tested for convergence properties (see Raftery & Lewis, 1992), the small area estimates for each race/ethnicity × gender cell within a block group can be obtained for each age group. These block group-level small area estimates then can be aggregated using the appropriate population count projections for the desired age group(s) to form state-level small area estimates. These state-level small area estimates are benchmarked to the national design-based estimates as described in Section B.5.
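The aggregation step can be sketched as a population-weighted mean computed separately for each MCMC sample. Everything below is illustrative: the cell probabilities are simulated stand-ins for real MCMC output, the population counts are made up, and summarizing the draws by their mean and variance is an assumption about how the posterior quantities are formed (benchmarking and interval construction are covered in Sections B.5 and B.6 and are not reproduced here).

```python
import random
import statistics

def aggregate_draw(cell_probs, cell_pops):
    """Population-weighted mean of cell-level probabilities for one MCMC draw."""
    return sum(p * n for p, n in zip(cell_probs, cell_pops)) / sum(cell_pops)

def summarize_state(draws, cell_pops):
    """Mean and variance of the aggregated state-level proportion across draws."""
    state_draws = [aggregate_draw(d, cell_pops) for d in draws]
    return statistics.fmean(state_draws), statistics.variance(state_draws)

random.seed(0)
cell_pops = [1200, 800, 500]   # hypothetical population count projections
base = [0.10, 0.15, 0.25]      # hypothetical cell-level probabilities

# 1,250 simulated draws standing in for the MCMC samples.
draws = [[min(max(b + random.gauss(0, 0.01), 0.0), 1.0) for b in base]
         for _ in range(1250)]

post_mean, post_var = summarize_state(draws, cell_pops)
print(round(post_mean, 3), post_var)
```

Aggregating within each draw before summarizing keeps the uncertainty of the cell-level estimates in the state-level variance.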

B.2 Measures (Outcomes) Modeled

The following list contains all binary (0,1) measures for which age group-specific state estimates were produced. For all measures listed below, 2021 National Survey on Drug Use and Health (NSDUH) data were used to produce estimates.

  1. illicit drug use in the past month,
  2. marijuana use in the past year,
  3. marijuana use in the past month,
  4. perceptions of great risk from smoking marijuana once a month,
  5. first use of marijuana in the past year among people at risk for initiation of marijuana,¹⁸
  6. illicit drug use other than marijuana in the past month,
  7. cocaine use in the past year,
  8. perceptions of great risk from using cocaine once a month,
  9. heroin use in the past year,
  10. perceptions of great risk from trying heroin once or twice,
  11. methamphetamine use in the past year,
  12. prescription pain reliever misuse in the past year,
  13. opioid misuse in the past year,¹⁹
  14. alcohol use in the past month,²⁰
  15. binge alcohol use in the past month,²¹
  16. perceptions of great risk from having five or more drinks of an alcoholic beverage once or twice a week,
  17. tobacco product use in the past month,
  18. cigarette use in the past month,
  19. perceptions of great risk from smoking one or more packs of cigarettes per day,
  20. drug use disorder in the past year,
  21. pain reliever use disorder in the past year,
  22. opioid use disorder in the past year,22
  23. alcohol use disorder in the past year,
  24. substance use disorder (SUD) in the past year,
  25. needing but not receiving treatment for illicit drug use at a specialty facility in the past year,
  26. needing but not receiving treatment for alcohol use at a specialty facility in the past year,
  27. needing but not receiving treatment for substance use at a specialty facility in the past year,
  28. any mental illness (AMI) in the past year,
  29. serious mental illness (SMI) in the past year,
  30. major depressive episode (MDE; i.e., depression) in the past year,
  31. had serious thoughts of suicide in the past year,
  32. made any suicide plans in the past year,
  33. attempted suicide in the past year, and
  34. received mental health services in the past year.

B.3 Predictors Used in Mixed Logistic Regression Models

Local area data used as potential predictor variables in the mixed logistic regression models were obtained from the following sources:

  • Claritas. Claritas23 population projections are used to update age group, gender, and race/ethnicity predictor variables each year.
  • U.S. Census Bureau. The 2010 census (demographic and geographic variables) and 2019 food stamp participation estimates were used (https://www.census.gov/data/datasets/time-series/demo/saipe/model-tables.html). The Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) program obtains Food Stamp program (now known as the Supplemental Nutrition Assistance Program [SNAP]) participation estimates from the U.S. Department of Agriculture, Food and Nutrition Service. Also, the Census Bureau’s 2015-2019 American Community Survey (ACS) 5-year demographic and socioeconomic variables at the tract level and poverty variable at the county level were used (https://www.census.gov/programs-surveys/acs/).
  • Federal Bureau of Investigation (FBI). Uniform Crime Report (UCR) arrest totals were obtained from https://www.icpsr.umich.edu/icpsrweb/NACJD/series/57. The most current data used are from 2016 for most counties, with prior years’ data substituted in a few cases.
  • Bureau of Labor Statistics (BLS). The 2021 county-level unemployment estimates were used (https://www.bls.gov/lau/tables.htm). The BLS uses results from the Current Population Survey (CPS) to provide county-level unemployment estimates. The CPS is a monthly survey of households conducted by the Census Bureau for the BLS.
  • Bureau of Economic Analysis (BEA). The 2020 county-level per capita income estimates were used (https://www.bea.gov/data/income-saving/personal-income-county-metro-and-other-areas). These county-level per capita income estimates are produced by the Regional Income Division of the BEA.
  • National Center for Health Statistics (NCHS). Mortality data using International Classification of Diseases, 10th revision (ICD-10), 2012-2017, were used. The ICD-10 death data are from the NCHS at the Centers for Disease Control and Prevention.
  • Substance Abuse and Mental Health Services Administration (SAMHSA), Center for Behavioral Health Statistics and Quality (CBHSQ; formerly the Office of Applied Studies [OAS]). Data were used from the National Survey of Substance Abuse Treatment Services (N-SSATS). The 2017 and 2019 data on drug and alcohol treatment were obtained. Most recent data available on maintenance of effort expenditures, block grant awards, cost of services, and total taxable resources data were also used.

Data sources, along with the description of potential predictor variables obtained from each source, are provided in the following lists.

Claritas Data
Description | Level
% Population Aged 0 to 19 in Block Group | Block Group
% Population Aged 20 to 24 in Block Group | Block Group
% Population Aged 25 to 34 in Block Group | Block Group
% Population Aged 35 to 44 in Block Group | Block Group
% Population Aged 45 to 54 in Block Group | Block Group
% Population Aged 55 to 64 in Block Group | Block Group
% Population Aged 65 or Older in Block Group | Block Group
% Non-Hispanic Blacks in Block Group | Block Group
% Hispanics in Block Group | Block Group
% Non-Hispanic Other Races in Block Group | Block Group
% Non-Hispanic Whites in Block Group | Block Group
% Males in Block Group | Block Group
% American Indians, Eskimos, Aleuts in Tract | Tract
% Asians, Pacific Islanders in Tract | Tract
% Population Aged 0 to 19 in Tract | Tract
% Population Aged 20 to 24 in Tract | Tract
% Population Aged 25 to 34 in Tract | Tract
% Population Aged 35 to 44 in Tract | Tract
% Population Aged 45 to 54 in Tract | Tract
% Population Aged 55 to 64 in Tract | Tract
% Population Aged 65 or Older in Tract | Tract
% Non-Hispanic Blacks in Tract | Tract
% Hispanics in Tract | Tract
% Non-Hispanic Other Races in Tract | Tract
% Non-Hispanic Whites in Tract | Tract
% Males in Tract | Tract
% Population Aged 0 to 19 in County | County
% Population Aged 20 to 24 in County | County
% Population Aged 25 to 34 in County | County
% Population Aged 35 to 44 in County | County
% Population Aged 45 to 54 in County | County
% Population Aged 55 to 64 in County | County
% Population Aged 65 or Older in County | County
% Non-Hispanic Blacks in County | County
% Hispanics in County | County
% Non-Hispanic Other Races in County | County
% Non-Hispanic Whites in County | County
% Males in County | County

American Community Survey (ACS) Data
Description | Level
% Population Who Dropped Out of High School | Tract
% Housing Units Built in 1940 to 1949 | Tract
% Females 16 Years or Older in Labor Force | Tract
% Females Never Married | Tract
% Females Separated, Divorced, Widowed, or Other | Tract
% One-Person Households | Tract
% Males 16 Years or Older in Labor Force | Tract
% Males Never Married | Tract
% Males Separated, Divorced, Widowed, or Other | Tract
% Housing Units Built in 1939 or Earlier | Tract
Average Number of Persons per Room | Tract
% Families below Poverty Level | Tract
% Households with Public Assistance Income | Tract
% Housing Units Rented | Tract
% Population with 9 to 12 Years of School, No High School Diploma | Tract
% Population with 0 to 8 Years of School | Tract
% Population with Associate’s Degree | Tract
% Population with Some College and No Degree | Tract
% Population with Bachelor’s, Graduate, Professional Degree | Tract
% Housing Units with No Telephone Service Available | Tract
% Households with No Vehicle Available | Tract
% Population with No Health Insurance | Tract
Median Rents for Rental Units | Tract
Median Value of Owner-Occupied Housing Units | Tract
Median Household Income | Tract
% Families below the Poverty Level | County

Uniform Crime Report (UCR) Data
Description | Level
Drug Possession Arrest Rate | County
Drug Sale or Manufacture Arrest Rate | County
Drug Violations’ Arrest Rate | County
Marijuana Possession Arrest Rate | County
Marijuana Sale or Manufacture Arrest Rate | County
Opium or Cocaine Possession Arrest Rate | County
Opium or Cocaine Sale or Manufacture Arrest Rate | County
Other Drug Possession Arrest Rate | County
Other Dangerous Non-Narcotics Arrest Rate | County
Serious Crime Arrest Rate | County
Violent Crime Arrest Rate | County
Driving under Influence Arrest Rate | County

Other Categorical Data
Description | Source | Level
= 1 if Hispanic, = 0 Otherwise | National Survey on Drug Use and Health (NSDUH) Sample | Person
= 1 if Non-Hispanic Black, = 0 Otherwise | NSDUH Sample | Person
= 1 if Non-Hispanic Other, = 0 Otherwise | NSDUH Sample | Person
= 1 if Male, = 0 if Female | NSDUH Sample | Person
= 1 if Metropolitan Statistical Area (MSA) with ≥ 1 Million, = 0 Otherwise | 2010 Census | County
= 1 if MSA with < 1 Million, = 0 Otherwise | 2010 Census | County
= 1 if Non-MSA Urban, = 0 Otherwise | 2010 Census | Tract
= 1 if Urban Area, = 0 if Rural Area | 2010 Census | Tract
= 1 if No Cubans in Tract, = 0 Otherwise | 2010 Census | Tract
= 1 if No Arrests for Dangerous Non-Narcotics, = 0 Otherwise | Uniform Crime Report (UCR) | County
= 1 if No Arrests for Opium or Cocaine Possession, = 0 Otherwise | UCR | County
= 1 if No Housing Units Built in 1939 or Earlier, = 0 Otherwise | American Community Survey (ACS) | Tract
= 1 if No Housing Units Built in 1940 to 1949, = 0 Otherwise | ACS | Tract
= 1 if No Households with Public Assistance Income, = 0 Otherwise | ACS | Tract

Miscellaneous Data
Description | Source | Level
Alcohol Death Rate, Underlying Cause | National Center for Health Statistics’ International Classification of Diseases, 10th revision (NCHS-ICD-10) | County
Cigarette Death Rate, Underlying Cause | NCHS-ICD-10 | County
Drug Death Rate, Underlying Cause | NCHS-ICD-10 | County
Alcohol Treatment Rate | National Survey of Substance Abuse Treatment Services (N-SSATS) | County
Alcohol and Drug Treatment Rate | N-SSATS | County
Drug Treatment Rate | N-SSATS | County
Unemployment Rate | Bureau of Labor Statistics (BLS) | County
Per Capita Income (in Thousands) | Bureau of Economic Analysis (BEA) | County
Average Suicide Rate (per 10,000) | NCHS-ICD-10 | County
Food Stamp Participation Rate | Census Bureau | County
Single State Agency Maintenance of Effort | National Association of State Alcohol and Drug Abuse Directors (NASADAD) | State
Block Grant Awards | Substance Abuse and Mental Health Services Administration (SAMHSA) | State
Cost of Services Factor Index | SAMHSA | State
Total Taxable Resources per Capita Index | U.S. Department of Treasury | State
% Hispanics Who Are Cuban | 2010 Census | Tract

The predictor variables used in the SAE models were selected from the set of potential predictors given above using the method described in Section B.4.

B.4 Selection of Predictor Variables for the SAE Models

Predictor variable selection was done using the 2021 data for all measures, using the following multistep process:

  1. For each measure, age group-specific24 SAS® stepwise logistic regression models were fit using the sample data (SAS Institute Inc., 2017). The input list to these models included all linear polynomials (constructed from continuous predictor variables) and other categorical or indicator variables given in Section B.3. All significant25 predictors were input to step 2, given as follows.
  2. Using the sample, all significant predictors from step 1 then were input to PROC HPSPLIT to identify significant complex (at most three-way) interaction terms. PROC HPSPLIT is a SAS procedure that uses decision-tree algorithms to build classification systems. The exhaustive chi-squared automatic interaction detector (CHAID) algorithm was used to create the trees.
  3. All the significant variables from step 1, along with their corresponding higher-order polynomials (quadratic and cubic), interaction of gender and race, and the significant interactions detected by PROC HPSPLIT in step 2 then were input to SAS stepwise logistic regression models. All predictors that remained significant26 then were input to step 4 of variable selection.
  4. All significant variables from step 3 were input to fit SUDAAN (RTI International, 2013) logistic regression models, and predictors that remained significant27 were used in the SAE models described in Section B.1. The race and gender predictors were forced in most of the models.

B.5 Benchmarking the Age Group-Specific Small Area Estimates

The self-calibration built into the survey-weighted hierarchical Bayes (SWHB) solution ensures the population-weighted average of the state small area estimates will closely match the national design-based estimates. The national design-based estimates in NSDUH are based entirely on survey-weighted data using a direct estimation approach, whereas the state and census region estimates are model based. Given the self-calibration ensured by the SWHB method, for state reports prior to 2002, the standard Bayes prescription was followed; specifically, the posterior mean was used for the point estimate, and the tail percentiles of the posterior distribution were used for the Bayesian confidence interval limits.

Singh and Folsom (2001) extended Ghosh’s (1992) results on constrained Bayes estimation to include exact benchmarking to design-based national estimates. In the simplest version of this constrained Bayes solution where only the design-based mean is imposed as a benchmarking constraint, each of the 2021 state-by-age group small area estimates is adjusted by adding the common factor $\Delta_a = D_a - P_a$, where $D_a$ is the design-based national estimate and $P_a$ is the population-weighted mean of the state small area estimates ($P_{sa}$) for age group-a. The exactly benchmarked state-s and age group-a small area estimates then are given by $\theta_{sa} = P_{sa} + \Delta_a$. Experience with such additive adjustments suggests that the resulting exactly benchmarked state small area estimates will always be between 0 and 100 percent because the SWHB self-calibration ensures the adjustment factor is small relative to the size of the state-level small area estimates.
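
The additive adjustment can be illustrated with a short Python sketch. It assumes the national model-based estimate is the population-weighted mean of the state estimates, as described above. The function name `benchmark` and all input values are hypothetical.

```python
def benchmark(state_estimates, state_pops, national_design_estimate):
    """Shift every state estimate by the common additive factor.

    state_estimates: {state: model-based estimate for one age group}
    state_pops:      {state: population count for that age group}
    Returns (benchmarked estimates, adjustment factor).
    """
    total_pop = sum(state_pops[s] for s in state_estimates)
    # Population-weighted mean of the state small area estimates.
    national_model = (
        sum(state_estimates[s] * state_pops[s] for s in state_estimates) / total_pop
    )
    # Common factor: design-based national estimate minus model-based one.
    delta = national_design_estimate - national_model
    benchmarked = {s: p + delta for s, p in state_estimates.items()}
    return benchmarked, delta


# Made-up inputs for two states.
estimates = {"A": 0.10, "B": 0.20}
pops = {"A": 100, "B": 300}
benchmarked, delta = benchmark(estimates, pops, national_design_estimate=0.18)
```

After the shift, the population-weighted mean of the benchmarked estimates reproduces the design-based national estimate, which is the point of exact benchmarking.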

Relative to the Bayes posterior mean, these benchmark-constrained state small area estimates are biased by the common additive adjustment factor. Therefore, the posterior mean squared error (MSE) for each benchmarked state small area estimate has the square of this adjustment factor added to its posterior variance. To achieve the desirable feature of exact benchmarking, this constrained Bayes adjustment factor was implemented for the state-by-age group small area estimates. The associated Bayesian confidence (credible) intervals can be recentered at the benchmarked small area estimates on the logit scale with the symmetric interval end points based on the posterior root mean squared errors (RMSEs). The adjusted 95 percent Bayesian confidence intervals ($\mathrm{Lower}_{sa}$, $\mathrm{Upper}_{sa}$), where $\mathrm{Lower}_{sa}$ and $\mathrm{Upper}_{sa}$ are the lower and upper bounds of the 95 percent Bayesian confidence interval of $\theta_{sa}$, are defined as follows:

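$$\mathrm{Lower}_{sa} = \frac{\exp(L_{sa})}{1 + \exp(L_{sa})} \quad \text{and} \quad \mathrm{Upper}_{sa} = \frac{\exp(U_{sa})}{1 + \exp(U_{sa})},$$

where

$$L_{sa} = \ln\!\left[\frac{\theta_{sa}}{1 - \theta_{sa}}\right] - 1.96\sqrt{\mathrm{MSE}_{sa}},$$

$$U_{sa} = \ln\!\left[\frac{\theta_{sa}}{1 - \theta_{sa}}\right] + 1.96\sqrt{\mathrm{MSE}_{sa}}, \quad \text{and}$$

$$\mathrm{MSE}_{sa} = \left(\ln\!\left[\frac{P_{sa}}{1 - P_{sa}}\right] - \ln\!\left[\frac{\theta_{sa}}{1 - \theta_{sa}}\right]\right)^{2} + \text{posterior variance of } \ln\!\left[\frac{P_{sa}}{1 - P_{sa}}\right].$$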

The associated posterior coverage probabilities for these benchmarked intervals are very close to the prescribed 0.95 value because the state small area estimates have posterior distributions that can be approximated exceptionally well by a Gaussian distribution after the logit transformation.
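
A minimal Python sketch of the recentered interval follows. It assumes the interval is formed on the logit scale as the benchmarked estimate plus or minus 1.96 posterior RMSEs and then back-transformed. It also assumes the MSE is the squared logit-scale shift between the unbenchmarked and benchmarked estimates plus the posterior variance of the unbenchmarked logit. The function names and input values are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def benchmarked_interval(p_sa, theta_sa, post_var_logit, z=1.96):
    """Interval centered at the benchmarked estimate on the logit scale.

    p_sa:           unbenchmarked state estimate (proportion, 0 < p < 1)
    theta_sa:       benchmarked state estimate (proportion, 0 < theta < 1)
    post_var_logit: posterior variance of logit(p_sa) (assumed to be given)
    """
    # Squared shift on the logit scale plus posterior variance (assumption).
    mse = (logit(p_sa) - logit(theta_sa)) ** 2 + post_var_logit
    half_width = z * math.sqrt(mse)
    center = logit(theta_sa)
    return inv_logit(center - half_width), inv_logit(center + half_width)


# Made-up inputs.
lower, upper = benchmarked_interval(p_sa=0.40, theta_sa=0.41, post_var_logit=0.02)
```

Because the end points are symmetric only on the logit scale, the back-transformed interval is generally not symmetric around the benchmarked estimate on the percentage scale, and it always stays between 0 and 100 percent.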

B.6 Calculation of Estimated Number of Individuals Associated with Each Outcome

Tables 1 to 35 of 2021 National Survey on Drug Use and Health: Model-Based Estimated Totals (in Thousands) (50 States and the District of Columbia) (CBHSQ, forthcoming b) show the estimated numbers of individuals associated with each of the 34 measures of interest. To calculate these numbers, the benchmarked small area estimates and associated 95 percent Bayesian confidence intervals are multiplied by the 2021 population count of the state by the age group of interest (Tables C.1 to C.3 of this methodology document).

For example, past month use of alcohol among 18- to 25-year-olds in Alabama was 39.69 percent in 2021.28 The corresponding Bayesian confidence interval ranged from 33.62 to 46.10 percent. The population count for 18- to 25-year-olds for 2021 in Alabama was 508,027 (see Table C.2 in Section C of this methodology document). Hence, the estimated number of 18- to 25-year-olds using alcohol in the past month in Alabama was 0.3969 × 508,027, which is 201,636.29 The associated Bayesian confidence interval ranged from 0.3362 × 508,027 (i.e., 170,799) to 0.4610 × 508,027 (i.e., 234,200). Note that when estimates of the number of individuals are calculated for Tables 1 to 35 in 2021 Model-Based Estimated Totals (CBHSQ, forthcoming b), the unrounded percentages and population counts are used, and the numbers then are reported to the nearest thousand. Hence, the number obtained by multiplying the published estimate by the published population estimate may not exactly match the counts published in these tables because of rounding differences.
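
The Alabama arithmetic above can be reproduced with a few lines of Python. This sketch uses the published (rounded) percentages, so it illustrates the calculation rather than the unrounded production values. The function name `estimated_number` is hypothetical.

```python
def estimated_number(proportion, population):
    """Estimated number of people: prevalence proportion times population count."""
    return round(proportion * population)


alabama_18_to_25 = 508_027  # 2021 population count quoted in the text

estimate = estimated_number(0.3969, alabama_18_to_25)  # 201,636
lower = estimated_number(0.3362, alabama_18_to_25)     # 170,799
upper = estimated_number(0.4610, alabama_18_to_25)     # 234,200
```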

The only exception to this calculation is the production of the estimated numbers of marijuana initiates. Those estimates cannot be directly calculated as the product of the percentage estimate of first use of marijuana and the population counts available in Section C. That is because the denominator of that percentage estimate is defined as the number of person-years at risk for marijuana initiation, which is a combination of individuals who never used marijuana and one half of the individuals who initiated in the past 24 months (see Section B.8 for more details).

B.7 Calculation of Aggregate Age Group Estimates and Limitations

Tables 1 to 35 of 2021 Model-Based Prevalence Estimates (CBHSQ, 2022c) show estimates for the following age groups: 12 to 17, 18 to 25, 26 or older, 18 or older, and 12 or older. A user interested in producing aggregated estimates, such as for those aged 12 to 25, could calculate them using the prevalence estimates along with the population totals shown in Section C of this document. However, the confidence intervals for such aggregated estimates cannot be calculated from the information provided in the tables. Below is an example of the calculation of an aggregated estimate for a given state.

In 2021, past month use of alcohol in Alabama among youths aged 12 to 17 was 5.96 percent, and among young adults aged 18 to 25 it was 39.69 percent.30 The population counts for 12- to 17-year-olds and 18- to 25-year-olds in 2021 in Alabama were 395,422 and 508,027, respectively (see Table C.2 in Section C of this methodology document). Hence, one would calculate the estimate for people aged 12 to 25 by first finding the number of users aged 12 to 25, which is 225,203 ([0.0596 × 395,422] + [0.3969 × 508,027]), then dividing that number by the population aged 12 to 25, which results in a rate of 24.93 percent (225,203 / [395,422 + 508,027]).
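
The same aggregation can be written as a small Python function. This is a sketch using the published Alabama figures quoted above. The name `pooled_estimate` is hypothetical, and, as noted, no confidence interval can be derived this way.

```python
def pooled_estimate(groups):
    """Combine age group estimates into one aggregated estimate.

    groups: list of (prevalence proportion, population count) pairs.
    Returns (estimated number of users, aggregated prevalence proportion).
    """
    users = sum(rate * pop for rate, pop in groups)
    population = sum(pop for _, pop in groups)
    return users, users / population


# Alabama, past month alcohol use, 2021: ages 12 to 17 and 18 to 25.
users, rate = pooled_estimate([(0.0596, 395_422), (0.3969, 508_027)])
print(round(users))          # 225203
print(round(100 * rate, 2))  # 24.93
```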

B.8 Calculation of Initiation of Marijuana Use

Initiation31 rates typically are calculated as the number of new initiates of a substance during a period of time (such as in the past year) divided by an estimate of the number of person-years of exposure (in thousands). The initiation definition used here employs a simpler form of the at-risk population based on the model-based methodology. This model-based initiation rate (i.e., first use of marijuana in the past year among people at risk for initiation of marijuana use) is defined as follows:

$$\text{Average annual rate} = 100 \times \left\{ \left[ \frac{X_1}{0.5 \times X_1 + X_2} \right] \div 2 \right\},$$

where $X_1$ is the number of marijuana initiates in the past 24 months, and $X_2$ is the number of persons who never used marijuana.

The initiation rate is expressed as a percentage or rate per 100 person-years of exposure. Note that this estimate uses a 2-year time period to accumulate initiation cases from the annual survey. By assuming further that the distribution of first use for the initiation cases is uniform across the 2-year interval, the total number of person-years of exposure is 1 year on average for the initiation cases plus 2 years for all the “never users” at the end of the time period. This approximation to the person-years of exposure permits one to recast the initiation rate as a function of two population prevalence rates—namely, the fraction of people who first used marijuana in the past 2 years and the fraction who had never used marijuana. Both of these prevalence estimates were estimated using the SWHB estimation approach. Note that only initiation rates for marijuana use are provided here.
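
The person-years reasoning above can be sketched in Python. The first function follows the description directly (1 year of exposure per initiate, 2 years per never user). The second recasts the rate in terms of two prevalence fractions, under the assumption that both fractions share the same population denominator. Function names and input values are hypothetical.

```python
import math


def initiation_rate(initiates_24mo, never_users):
    """Average annual initiation rate per 100 person-years of exposure."""
    # Initiates contribute 1 year on average; never users contribute 2 years.
    person_years = 1.0 * initiates_24mo + 2.0 * never_users
    return 100.0 * initiates_24mo / person_years


def initiation_rate_from_prevalence(frac_initiated_2yr, frac_never_used):
    """Same rate, written as a function of two population prevalence fractions."""
    at_risk = 0.5 * frac_initiated_2yr + frac_never_used
    return 100.0 * (frac_initiated_2yr / at_risk) / 2.0


# Made-up counts: 20 initiates in the past 24 months, 90 never users,
# in a population of 1,000 people.
rate_counts = initiation_rate(20, 90)
rate_fracs = initiation_rate_from_prevalence(20 / 1000, 90 / 1000)
```

Both forms give the same value because dividing the counts by a common population size cancels out of the ratio.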

B.9 Underage Drinking

To obtain small area estimates for people aged 12 to 20 for past month alcohol and binge alcohol use, a separate set of SAE models was used, with predictors selected for the age groups 12 to 17, 18 to 20, 21 to 34, and 35 or older. Model-based estimates for people aged 12 to 20 were produced by taking the population-weighted average of the individual age group (12 to 17 and 18 to 20) estimates. Estimates for underage drinking for past month alcohol and binge alcohol use were benchmarked to match national design-based estimates for that age group using the process described in Section B.5.

B.10 Marijuana Use

In the 2021 NSDUH, questions about vaping marijuana were added to the emerging issues section of the questionnaire. Respondents who reported that they vaped anything were asked whether they ever vaped marijuana with a vaping device. Additionally, respondents who answered “yes” to ever vaping marijuana were then asked how long it had been since they last vaped marijuana with a vaping device.

To maintain consistent measures across years where possible, a general principle of editing is not to edit across interview sections (except in situations where answers to questions in a previous section govern skip logic in a later section). However, the introduction to the marijuana section of the interview did not mention the use of marijuana with a vaping device as one of the ways people could use marijuana. Therefore, respondents might not have thought about vaping marijuana when they answered the earlier marijuana questions. For this reason, data from these marijuana vaping questions were incorporated into the marijuana use measures and related measures that include marijuana beginning with the 2021 NSDUH. If respondents reported that they did not use marijuana in the marijuana section of the questionnaire, but they later reported that they vaped marijuana, they were considered to have used marijuana in their lifetime and in the applicable recency period.

For details on marijuana vaping, please refer to Section 3.4.10.3 of CBHSQ (2022b).

B.11 Substance Use Disorder (SUD)

The NSDUH questionnaire includes questions to measure SUD for alcohol and drugs. SUD estimates for drugs and alcohol in the 2021 NSDUH were based on the criteria in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5; American Psychiatric Association [APA], 2013). Respondents were asked SUD questions separately for any drugs or alcohol they used in the 12 months prior to the survey.32

Drugs included marijuana, cocaine (including crack), heroin, hallucinogens, inhalants, methamphetamine, and any use of prescription pain relievers, tranquilizers, stimulants, or sedatives. Beginning in 2021, NSDUH respondents who reported any use of prescription psychotherapeutic drugs (i.e., pain relievers, tranquilizers, stimulants, or sedatives) in the past year (i.e., not just misuse of prescription drugs) were asked the respective SUD questions for that category of prescription drugs.

DSM-5 includes the following SUD criteria (as measured in the 2021 NSDUH):

  1. The substance is often taken in larger amounts or over a longer period than intended.
  2. There is a persistent desire or unsuccessful efforts to cut down or control substance use.
  3. A great deal of time is spent in activities necessary to obtain the substance, use the substance, or recover from its effects.
  4. There is craving, or a strong desire or urge, to use the substance.
  5. There is recurrent substance use resulting in a failure to fulfill major role obligations at work, school, or home.
  6. There is continued substance use despite having persistent or recurrent social or interpersonal problems caused by or exacerbated by the effects of the substance.
  7. Important social, occupational, or recreational activities are given up or reduced because of substance use.
  8. There is recurrent substance use in situations in which it is physically hazardous.
  9. Substance use is continued despite knowledge of having a persistent or recurrent physical or psychological problem that is likely to have been caused or exacerbated by the substance.
  10. There is a need for markedly increased amounts of the substance to achieve intoxication or the desired effect, or markedly diminished effect with continued use of the same amount of the substance (i.e., tolerance).
  11. There are two components of withdrawal symptoms, either of which meet the overall criterion for withdrawal symptoms:
    1. There is a required number of withdrawal symptoms that occur when substance use is cut back or stopped following a period of prolonged use.33
    2. The substance or a related substance is used to get over or avoid withdrawal symptoms.34

For alcohol, marijuana, cocaine, heroin, and methamphetamine, respondents were classified as having an SUD if they had at least 2 of the 11 criteria in a 12-month period. However, respondents were classified as having a hallucinogen use disorder or an inhalant use disorder if they had at least 2 of the first 10 criteria in the past 12 months; the withdrawal criterion does not apply to hallucinogens and inhalants.

For use or misuse of prescription drugs, the applicable DSM-5 criteria for classifying respondents as having a prescription drug use disorder depend on whether respondents misused prescription drugs or used but did not misuse prescription drugs in the past year. If respondents misused prescription drugs in the past year, they were classified as having a prescription drug use disorder if they had at least 2 of the 11 criteria shown above. However, if respondents used but did not misuse prescription drugs in the past year, they were classified as having a prescription drug use disorder if they had at least 2 of the first 9 criteria shown above. Criteria 10 (tolerance) and 11 (withdrawal) do not apply to respondents who used but did not misuse these prescription drugs in the past year; tolerance and withdrawal can occur as normal physiological adaptations when people use these prescription drugs appropriately under medical supervision (Hasin et al., 2015).
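
The classification rules in the preceding paragraphs can be summarized in a short Python sketch. It encodes only the counting rules stated above (which of the 11 criteria apply and the threshold of 2). The substance labels and function names are hypothetical and do not correspond to NSDUH variable names.

```python
ALL_11 = frozenset(range(1, 12))    # criteria 1 through 11
FIRST_10 = frozenset(range(1, 11))  # withdrawal (11) does not apply
FIRST_9 = frozenset(range(1, 10))   # tolerance (10) and withdrawal (11) do not apply

NO_WITHDRAWAL = {"hallucinogens", "inhalants"}
PRESCRIPTION = {"pain_relievers", "tranquilizers", "stimulants", "sedatives"}


def applicable_criteria(substance, misused=True):
    """Criteria that count toward an SUD for this substance and type of use."""
    if substance in NO_WITHDRAWAL:
        return FIRST_10
    if substance in PRESCRIPTION and not misused:
        return FIRST_9
    return ALL_11


def has_sud(criteria_met, substance, misused=True):
    """True if at least 2 applicable criteria were met in the past 12 months."""
    counted = set(criteria_met) & applicable_criteria(substance, misused)
    return len(counted) >= 2


# Tolerance and withdrawal alone: enough for alcohol, not for someone who
# used but did not misuse prescription pain relievers.
alcohol_case = has_sud({10, 11}, "alcohol")
rx_use_only_case = has_sud({10, 11}, "pain_relievers", misused=False)
```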

The following lists the substances and types of use or misuse that are included in the 2021 NSDUH state SAEs:

Illicit drug or alcohol use disorder includes data from past year users of alcohol, marijuana, cocaine, heroin, hallucinogens, inhalants, and methamphetamine, and past year misusers of prescription psychotherapeutic drugs. SAEs for this illicit drug or alcohol use disorder measure are not shown; however, it is relevant to the definition for the need for substance use treatment described in Section B.12.

For more information about the SUD definitions based on criteria from DSM-5, see Section 3.4.3.2 of CBHSQ (2022b).

B.12 Needing But Not Receiving Treatment

The 2021 NSDUH included a series of questions designed to measure treatment need for an alcohol or illicit drug use problem and to determine people needing but not receiving treatment. Respondents were classified as needing substance use treatment in the past year if they met either of the following criteria:

  1. presence of an illicit drug or alcohol use disorder in the past year (see Section B.11 of this report or Section 3.4.3 of CBHSQ [2022b]), or
  2. receipt of treatment at a specialty facility (i.e., drug and alcohol rehabilitation facility [inpatient or outpatient], hospital [inpatient only], or mental health center) in the past year for the use of alcohol or illicit drugs (or both).

For additional details on how respondents were classified as needing substance use treatment, see Section 3.4.4.1 of CBHSQ (2022b).
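
As a rough illustration, the two-part definition of treatment need can be expressed as simple boolean logic in Python. The second function is an assumption based on the measure name ("needing but not receiving treatment at a specialty facility") rather than on a rule stated in this section. The function and argument names are hypothetical.

```python
def needed_treatment(had_disorder, received_specialty_treatment):
    """Needing treatment: either criterion listed above is sufficient."""
    return had_disorder or received_specialty_treatment


def needed_but_not_received(had_disorder, received_specialty_treatment):
    """Assumed derivation: needed treatment and got none at a specialty facility."""
    return (
        needed_treatment(had_disorder, received_specialty_treatment)
        and not received_specialty_treatment
    )


# A respondent with a past year disorder and no specialty treatment.
example = needed_but_not_received(had_disorder=True, received_specialty_treatment=False)
```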

B.13 Mental Health Measures

This section provides a summary of the measurement issues associated with six mental health outcome variables such as mental illness, depression, and suicidal thoughts and behaviors. Additional details can be found in Sections 3.4.6, 3.4.7, and 3.4.14 of CBHSQ (2022b).

B.13.1 Mental Illness

In the 2000-2001 and 2002-2003 NSDUH state SAE reports (Wright, 2003a, 2003b; Wright & Sathe, 2005), the Kessler-6 (K6) distress scale was used to measure SMI (Kessler et al., 2003). However, SAMHSA discontinued producing state-level SMI estimates beginning with the release of the 2003-2004 state report (Wright & Sathe, 2006) because of concerns about the validity of using only the K6 distress scale without an impairment scale; see Section B.4.4 in Appendix B of the 2004 NSDUH national findings report (OAS, 2005). The use of the K6 distress scale continued in the 2003-2004 and the 2004-2005 state reports (Wright & Sathe, 2006; Wright et al., 2007), not as a measure of SMI but as a measure of serious psychological distress (SPD) because it was determined that the K6 scale measured only SPD and merely contributed to measuring SMI and AMI (see the details that follow).

In December 2006, a new technical advisory group was convened by SAMHSA’s OAS (which later became CBHSQ) and the Center for Mental Health Services to solicit recommendations for data collection strategies to address SAMHSA’s legislative requirements. Although the technical advisory group recognized the ideal way to estimate SMI in NSDUH would be to administer a clinical diagnostic interview annually to all 45,000 adult respondents, this approach was not feasible because of constraints on the interview time and the need for trained mental health clinicians to conduct the interviews. Therefore, the approach recommended by the technical advisory group and adopted by SAMHSA for NSDUH was to use short scales in the NSDUH interview that separately measure psychological distress and functional impairment for use in a statistical model that predicts whether a respondent had mental illness.

To accomplish this, SAMHSA’s CBHSQ initiated a Mental Health Surveillance Study (MHSS) in 2008 as part of NSDUH to develop and implement methods to estimate SMI. Models using the short scales for psychological distress and impairment to predict mental illness status were developed from a subsample of adult respondents who had completed the NSDUH interview and were administered a clinical psychological diagnostic interview soon afterward. For the clinical interview data, people were classified as having SMI if they had a diagnosable mental, behavioral, or emotional disorder in the past 12 months, other than a developmental disorder or SUD, that met the 4th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria (APA, 1994) and resulted in substantial functional impairment. This estimation methodology was implemented in the 2008 NSDUH (for details on the 2008 model, see Section 3.4.7.2 of CBHSQ [2021b]).

Based on recommendations from this panel, estimates of SMI were presented based on a revised methodology; thus, they were not comparable with estimates for SMI or SPD shown in NSDUH state reports prior to 2009. However, in 2013, another revision to the methodology for creating SMI estimates was made, and the estimates presented for 2011 and 2012 were based on this revised methodology (and therefore are not comparable with previously published estimates of SMI). Thus, the 2008-2009, 2009-2010, and 2010-2011 SMI estimates were reproduced using the new 2013 methodology. The 2013 methodology refers to the 2012 model described as follows.

Clinical Measurement of Mental Illness. Mental illness was measured in the MHSS clinical interviews using an adapted version of the Structured Clinical Interview for the DSM-IV-TR Axis I Disorders, Research Version, Non-patient Edition (SCID) (First et al., 2002) and was differentiated by the level of functional impairment based on the Global Assessment of Functioning (GAF) scale (Endicott et al., 1976).36 Past year disorders assessed through the SCID included mood disorders (e.g., MDE, manic episode), anxiety disorders (e.g., panic disorder, generalized anxiety disorder, posttraumatic stress disorder), eating disorders (e.g., anorexia nervosa), intermittent explosive disorder, and adjustment disorder. In addition, the presence of psychotic symptoms was assessed. SUDs also were assessed, although these disorders were not used to produce estimates of mental illness.

The SCID and the GAF in combination were considered to be the “gold standard” for measuring mental illness.

The 2012 SMI Model. The 2012 SMI prediction model was fit with data from 4,912 MHSS respondents from 2008 through 2012. For more information about the instruments and items used to measure the variables employed in the 2012 model, see Sections 3.4.6.4 through 3.4.6.6 of CBHSQ (2022b). Specifically, in CBHSQ (2022b), the instrument used to measure mental illness in the clinical interviews is described, followed by descriptions of the scales and items in the main NSDUH interviews that were used as predictor variables in the model (i.e., the K6 and World Health Organization Disability Assessment Schedule [WHODAS] total scores, age, MDE, and suicidal thoughts). The response variable Y equaled 1 when an SMI diagnosis was positive based on the clinical interview; otherwise, Y was 0. Letting X be the vector of explanatory variables (characteristics) attached to a NSDUH respondent and letting the probability that this respondent had SMI be $\pi = P(Y = 1 \mid X)$, the 2012 SMI prediction model was

[Equation 11: 2012 SMI prediction model]

where $\hat{\pi}$ refers to an estimate of the SMI response probability $\pi$.

The covariates in equation (1) came from the main NSDUH interview data:

A cut point probability $\pi_0$ was determined, so that if $\hat{\pi} \ge \pi_0$ for a particular respondent, then the respondent was predicted to be SMI positive; otherwise, the respondent was predicted to be SMI negative. The cut point (0.260573529) was chosen so that the weighted numbers of false positives and false negatives in the MHSS dataset were as close to equal as possible. To produce state estimates for SMI, the predicted SMI status for all adult NSDUH respondents was used in SAE modeling as the dependent variable.

A second cut point probability (0.0192519810) was determined so that any respondent with an SMI probability greater than or equal to the cut point was predicted to be positive for AMI, and the remaining respondents were predicted to be negative for AMI. The second cut point was chosen so that the weighted numbers of AMI false positives and false negatives were as close to equal as possible.
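The two cut point rules described above can be illustrated with a short sketch. The cut point values are taken from the text; the function names, the toy data, and the simple search over observed probabilities are assumptions made for illustration and are not the procedure actually used to derive the published cut points.

```python
from typing import Dict, Sequence

SMI_CUT_POINT = 0.260573529    # cut point for SMI, as stated in the text
AMI_CUT_POINT = 0.0192519810   # cut point for AMI, as stated in the text


def classify(pi_hat: float) -> Dict[str, bool]:
    """Apply both cut points to one predicted SMI probability."""
    return {"smi": pi_hat >= SMI_CUT_POINT, "ami": pi_hat >= AMI_CUT_POINT}


def weighted_error_gap(pi_hats: Sequence[float], truths: Sequence[int],
                       weights: Sequence[float], cut: float) -> float:
    """Absolute difference between weighted false positives and false negatives."""
    fp = sum(w for p, y, w in zip(pi_hats, truths, weights) if p >= cut and y == 0)
    fn = sum(w for p, y, w in zip(pi_hats, truths, weights) if p < cut and y == 1)
    return abs(fp - fn)


def choose_cut_point(pi_hats: Sequence[float], truths: Sequence[int],
                     weights: Sequence[float]) -> float:
    """Hypothetical search: try each observed probability as a cut point and
    keep the one whose weighted false positives and false negatives are closest."""
    candidates = sorted(set(pi_hats))
    return min(candidates,
               key=lambda c: weighted_error_gap(pi_hats, truths, weights, c))


if __name__ == "__main__":
    # Toy data only: predicted probabilities, clinical status (1 = positive), weights.
    toy_pi = [0.05, 0.20, 0.30, 0.60, 0.90]
    toy_y = [0, 1, 0, 1, 1]
    toy_w = [1.0, 2.0, 2.0, 1.0, 1.0]
    print(choose_cut_point(toy_pi, toy_y, toy_w))
    print(classify(0.10))
```

Because the AMI cut point is lower than the SMI cut point, every respondent predicted to be SMI positive is also predicted to be AMI positive under these rules.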

Starting in 2021, the measures used in the mental illness models were all imputed. Therefore, the source variables used to create the measures of AMI and SMI had no missing data.

B.13.2 Major Depressive Episode (Depression)

Two sections related to MDE were included in the 2021 questionnaire: an adult depression section and an adolescent depression section. These sections were originally derived from DSM-IV criteria for MDE. Consistent with the more recent criteria in DSM-5, NSDUH does not exclude MDEs occurring exclusively in the context of bereavement.

Questions on depression permit estimates of MDE to be calculated. Separate sections were administered to adults aged 18 or older and youths aged 12 to 17. The adult questions were adapted from the depression section of the National Comorbidity Survey Replication (NCS-R), and the questions for youths were adapted from the depression section of the National Comorbidity Survey Replication Adolescent Supplement (NCS-A) (see https://www.hcp.med.harvard.edu/ncs/). To make the sections developmentally appropriate for youths, there are minor wording differences in a few questions between the adult and youth sections. Revisions to the questions in both sections were made primarily to reduce the length and to modify the NCS questions, which are interviewer administered, for self-administration in NSDUH.

According to DSM-5, a person is classified as having had an MDE37 in their lifetime if they had five or more of the following nine symptoms nearly every day (except where noted) in the same 2-week period, where at least one of the symptoms is a depressed mood or loss of interest or pleasure in daily activities: (1) depressed mood most of the day; (2) markedly diminished interest or pleasure in all or almost all activities most of the day; (3) significant weight loss when not sick or dieting, or weight gain when not pregnant or growing, or decrease or increase in appetite; (4) insomnia or hypersomnia; (5) psychomotor agitation or retardation at a level observable by others; (6) fatigue or loss of energy; (7) feelings of worthlessness or excessive or inappropriate guilt; (8) diminished ability to think or concentrate or indecisiveness; and (9) recurrent thoughts of death or suicidality (i.e., recurrent suicidal ideation without a specific plan, making a specific plan, or making an attempt). Unlike the other symptoms listed previously, recurrent thoughts of death or suicidality did not need to have occurred nearly every day (APA, 2013).
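The symptom-count rule in the preceding paragraph can be sketched as follows. This is a minimal illustration only: it assumes each of the nine symptoms has already been judged present or absent for the same 2-week period (including the "nearly every day" condition where it applies), and the function name is hypothetical. It does not reproduce the NSDUH questionnaire logic.

```python
from typing import Iterable

# Symptoms 1 and 2 in the list above: depressed mood; loss of interest or pleasure.
CORE_SYMPTOMS = {1, 2}


def meets_symptom_rule(present: Iterable[int]) -> bool:
    """Return True if five or more of the nine symptoms are present and
    at least one of them is symptom 1 or symptom 2."""
    reported = {s for s in present if 1 <= s <= 9}
    return len(reported) >= 5 and bool(reported & CORE_SYMPTOMS)


if __name__ == "__main__":
    print(meets_symptom_rule([1, 3, 4, 6, 7]))     # five symptoms, includes symptom 1
    print(meets_symptom_rule([3, 4, 5, 6, 7, 8]))  # six symptoms, neither 1 nor 2
    print(meets_symptom_rule([1, 2, 3, 4]))        # only four symptoms
```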

Respondents who have had an MDE in their lifetime are asked if, during the past 12 months, they had a period of depression lasting 2 weeks or longer while also having some of the other symptoms mentioned. Respondents reporting experiences consistent with having had an MDE in the past year are asked questions from the SDS to measure the level of functional impairment in major life activities reported to be caused by the MDE in the past 12 months (Leon et al., 1997).

Starting in 2021, the variables for MDE among adults were statistically imputed. MDE variables were not statistically imputed for youths aged 12 to 17.

B.13.3 Suicidal Thoughts and Behavior

The 2021 NSDUH included sets of questions asking adults aged 18 or older whether they had serious thoughts of suicide, made any suicide plans, or had attempted suicide in the past 12 months. All adult respondents in 2021 were asked whether they made a suicide plan or attempted suicide regardless of whether they reported that they had serious thoughts of suicide in the past 12 months. Additionally, beginning in 2021, the variables for suicidal thoughts and behaviors among adults were statistically imputed, so these variables had no missing data for 2021.

Section C: Sample Sizes, Response Rates, and Population Estimates

Table C.1 – Sample Sizes, Weighted Screening and Interview Response Rates, and Population Estimates; by State, for Persons Aged 12 or Older: 2021
State | Total Selected DUs | Total Eligible DUs | Total Completed Screeners | Weighted DU Screening Response Rate | Total Selected People | Total Respondents | Population Estimate | Weighted Interview Response Rate | Weighted Overall Response Rate
Total U.S. 1,138,830 1,021,720 220,740 22.21% 152,220 69,850 279,843,944 46.24% 10.27%
Northeast 219,370 197,530 39,510 19.13% 26,410 11,850 48,930,520 46.81% 8.95%
Midwest 265,410 239,590 58,010 25.59% 39,120 17,820 58,022,805 45.58% 11.66%
South 393,790 347,940 72,890 21.51% 48,810 23,470 106,587,154 48.25% 10.38%
West 260,260 236,660 50,330 22.49% 37,870 16,720 66,303,465 43.16% 9.71%
Alabama 17,460 15,560 4,600 32.05% 2,940 1,160 4,242,820 40.38% 12.94%
Alaska 16,680 14,040 2,910 19.40% 2,050 980 585,924 48.29% 9.37%
Arizona 18,390 15,980 3,180 19.11% 2,310 970 6,155,311 42.49% 8.12%
Arkansas 17,150 14,090 3,000 22.41% 2,000 890 2,525,675 47.35% 10.61%
California 60,030 57,390 11,530 21.32% 9,730 4,080 33,100,274 41.59% 8.87%
Colorado 17,010 14,980 3,810 25.72% 2,670 1,170 4,925,859 46.95% 12.08%
Connecticut 17,140 16,100 2,980 19.31% 2,020 860 3,106,746 46.04% 8.89%
Delaware 16,560 14,390 3,330 23.59% 2,010 950 854,842 47.68% 11.25%
District of Columbia 19,060 17,200 3,580 23.03% 1,340 770 569,400 60.05% 13.83%
Florida 53,970 46,780 8,870 20.42% 5,730 2,620 18,725,406 46.87% 9.57%
Georgia 23,370 21,710 4,870 22.93% 3,910 1,870 9,002,387 46.22% 10.60%
Hawaii 18,810 17,150 3,810 21.96% 2,800 1,070 1,183,093 38.42% 8.44%
Idaho 15,500 13,670 2,670 18.57% 1,890 870 1,583,643 48.27% 8.96%
Illinois 40,010 37,410 6,990 19.76% 5,170 2,080 10,711,240 39.18% 7.74%
Indiana 16,730 14,940 3,590 26.38% 2,610 1,250 5,693,016 49.18% 12.98%
Iowa 16,510 15,150 3,570 27.61% 2,310 1,070 2,682,262 46.23% 12.76%
Kansas 16,460 14,960 4,780 35.29% 3,650 1,670 2,422,699 43.18% 15.24%
Kentucky 18,950 16,570 3,720 24.81% 2,240 1,120 3,772,667 51.47% 12.77%
Louisiana 17,530 14,790 3,510 25.28% 2,460 1,060 3,822,416 42.72% 10.80%
Maine 16,400 13,640 3,790 26.49% 2,110 960 1,200,075 51.13% 13.55%
Maryland 19,430 18,250 4,390 26.52% 2,980 1,500 5,187,063 46.95% 12.45%
Massachusetts 18,430 17,190 3,150 20.47% 2,150 920 6,050,007 43.48% 8.90%
Michigan 41,490 37,110 9,880 28.43% 6,220 2,920 8,576,889 49.58% 14.10%
Minnesota 15,770 14,540 3,030 20.87% 1,900 860 4,804,648 45.98% 9.60%
Mississippi 17,350 14,930 2,660 19.23% 1,870 1,000 2,449,265 53.02% 10.20%
Missouri 16,460 14,240 3,100 22.96% 1,870 860 5,181,361 47.46% 10.90%
Montana 15,490 13,480 2,410 16.77% 1,450 690 939,222 53.48% 8.97%
Nebraska 15,210 13,680 3,280 25.55% 2,400 1,140 1,622,425 46.28% 11.83%
Nevada 18,250 17,240 3,120 19.28% 2,480 1,160 2,657,845 47.59% 9.17%
New Hampshire 15,480 13,750 3,400 26.69% 2,260 990 1,214,720 47.04% 12.55%
New Jersey 26,630 24,450 4,230 16.28% 3,190 1,280 7,869,140 43.72% 7.12%
New Mexico 15,260 13,430 2,580 20.00% 1,970 1,020 1,780,887 50.33% 10.07%
New York 50,790 46,140 7,530 18.00% 5,700 2,700 16,916,914 47.99% 8.64%
North Carolina 26,290 23,660 4,170 17.75% 2,580 1,320 8,880,603 50.02% 8.88%
North Dakota 14,150 12,390 2,830 22.75% 1,900 920 634,226 46.99% 10.69%
Ohio 40,990 37,900 9,810 25.72% 6,320 2,730 9,939,581 42.69% 10.98%
Oklahoma 16,930 15,140 2,930 19.38% 2,080 970 3,287,372 50.95% 9.87%
Oregon 16,390 15,290 4,800 33.63% 3,010 1,280 3,658,684 41.53% 13.97%
Pennsylvania 40,940 38,040 7,070 19.38% 4,590 2,080 11,058,561 48.17% 9.33%
Rhode Island 17,180 14,320 2,410 18.46% 1,520 690 947,403 48.13% 8.88%
South Carolina 17,680 15,170 2,490 16.87% 1,580 840 4,391,328 56.27% 9.49%
South Dakota 15,350 13,060 2,430 19.15% 1,730 840 734,115 47.50% 9.09%
Tennessee 18,080 16,120 2,780 18.93% 1,850 860 5,877,872 51.17% 9.69%
Texas 51,070 45,760 8,140 18.22% 6,810 3,140 24,239,644 47.23% 8.60%
Utah 16,610 15,110 3,850 24.61% 3,690 1,830 2,707,190 53.06% 13.06%
Vermont 16,380 13,910 4,960 34.05% 2,890 1,380 566,956 54.70% 18.63%
Virginia 26,470 23,390 6,600 29.25% 4,570 2,470 7,229,456 52.42% 15.33%
Washington 16,350 15,060 3,690 26.89% 2,580 940 6,539,450 38.71% 10.41%
West Virginia 16,440 14,440 3,250 23.93% 1,870 940 1,528,938 44.30% 10.60%
Wisconsin 16,280 14,210 4,740 34.47% 3,060 1,490 5,020,343 51.93% 17.90%
Wyoming 15,500 13,850 1,980 12.64% 1,260 670 486,083 57.37% 7.25%
DU = dwelling unit.
Source: SAMHSA, Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health, 2021.
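The three response rate columns in Table C.1 can be related arithmetically. The sketch below assumes, based on the table values rather than on a definition given in this section, that the weighted overall response rate equals the product of the weighted DU screening response rate and the weighted interview response rate. The function name is illustrative, and because the published rates are rounded, a difference of about 0.01 percentage points is possible for some rows.

```python
def overall_response_rate(screening_pct: float, interview_pct: float) -> float:
    """Product of two percentages, expressed as a percentage rounded to 2 decimals."""
    return round(screening_pct * interview_pct / 100.0, 2)


# (screening rate, interview rate, published overall rate), copied from Table C.1.
ROWS = {
    "Total U.S.": (22.21, 46.24, 10.27),
    "Alabama": (32.05, 40.38, 12.94),
    "Vermont": (34.05, 54.70, 18.63),
    "Wyoming": (12.64, 57.37, 7.25),
}

if __name__ == "__main__":
    for name, (screening, interview, published) in ROWS.items():
        computed = overall_response_rate(screening, interview)
        print(f"{name}: computed {computed:.2f}%, published {published:.2f}%")
```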
Table C.2 – Sample Sizes, Weighted Interview Response Rates, and Population Estimates; by State and Three Age Groups: 2021
State | 12-17 Total Selected People | 12-17 Total Respondents | 12-17 Population Estimate | 12-17 Weighted Interview Response Rate | 18-25 Total Selected People | 18-25 Total Respondents | 18-25 Population Estimate | 18-25 Weighted Interview Response Rate | 26+ Total Selected People | 26+ Total Respondents | 26+ Population Estimate | 26+ Weighted Interview Response Rate
Total U.S. 35,430 13,270 26,019,281 38.40% 37,180 16,540 33,458,433 43.01% 79,610 40,040 220,366,229 47.63%
Northeast 5,950 2,120 4,144,411 34.44% 6,400 2,710 5,724,517 42.10% 14,060 7,020 39,061,593 48.76%
Midwest 8,980 3,140 5,482,975 36.86% 9,750 4,330 7,102,761 41.54% 20,400 10,350 45,437,069 47.27%
South 11,850 4,890 10,151,564 42.14% 11,760 5,590 12,696,092 46.49% 25,210 12,980 83,739,498 49.24%
West 8,660 3,120 6,240,332 36.29% 9,280 3,910 7,935,064 39.40% 19,930 9,690 52,128,069 44.53%
Alabama 700 240 395,422 38.51% 760 310 508,027 38.64% 1,470 610 3,339,372 40.88%
Alaska 500 190 59,073 34.56% 480 230 65,965 47.52% 1,080 570 460,886 50.14%
Arizona 530 210 581,282 38.11% 580 230 759,837 37.10% 1,190 530 4,814,192 43.72%
Arkansas 420 120 248,172 25.30% 550 240 309,645 43.32% 1,030 520 1,967,858 50.90%
California 2,220 830 3,110,181 38.04% 2,320 950 3,976,766 39.42% 5,190 2,300 26,013,327 42.34%
Colorado 570 190 446,231 33.37% 740 300 587,055 40.27% 1,350 680 3,892,572 49.51%
Connecticut 490 160 272,723 28.61% 410 160 378,772 37.45% 1,120 540 2,455,251 49.26%
Delaware 450 180 73,577 35.30% 590 260 92,181 44.50% 970 500 689,084 49.61%
District of Columbia 350 150 33,728 44.53% 330 210 76,598 69.14% 660 410 459,074 59.72%
Florida 1,340 530 1,515,024 41.09% 1,350 640 1,926,018 47.66% 3,040 1,450 15,284,364 47.31%
Georgia 890 420 913,170 45.96% 960 470 1,118,065 51.00% 2,060 990 6,971,153 45.54%
Hawaii 580 170 97,954 33.46% 640 250 117,389 38.48% 1,590 650 967,749 38.88%
Idaho 570 190 170,496 33.37% 370 160 194,262 43.12% 950 520 1,218,885 51.37%
Illinois 1,190 350 1,010,183 32.40% 1,260 470 1,275,485 33.86% 2,720 1,270 8,425,572 40.78%
Indiana 550 230 560,432 47.08% 680 310 735,781 47.67% 1,380 710 4,396,803 49.68%
Iowa 610 220 260,546 34.20% 480 230 352,606 42.98% 1,220 620 2,069,110 48.12%
Kansas 770 310 248,490 41.15% 980 430 318,285 38.36% 1,900 940 1,855,924 44.33%
Kentucky 550 220 355,789 47.12% 570 270 457,909 46.67% 1,130 630 2,958,969 52.72%
Louisiana 570 220 374,804 41.52% 550 220 453,214 36.20% 1,340 620 2,994,399 43.87%
Maine 500 180 92,597 35.99% 520 210 121,056 43.71% 1,080 560 986,422 53.57%
Maryland 750 340 476,715 40.81% 730 350 586,757 46.07% 1,500 810 4,123,592 47.80%
Massachusetts 430 120 489,322 29.34% 550 220 774,583 41.07% 1,170 580 4,786,102 45.26%
Michigan 1,510 570 770,114 36.06% 1,450 640 1,046,218 43.95% 3,270 1,710 6,760,557 52.09%
Minnesota 480 150 463,273 33.40% 430 200 563,299 43.15% 1,000 520 3,778,076 47.90%
Mississippi 490 230 251,241 46.26% 380 190 303,082 48.23% 1,010 580 1,894,942 54.58%
Missouri 440 140 486,555 38.16% 460 210 621,371 46.49% 960 510 4,073,435 48.78%
Montana 370 110 83,053 31.96% 350 140 111,204 37.19% 730 440 744,965 58.44%
Nebraska 520 170 167,182 34.51% 610 300 212,264 45.43% 1,270 670 1,242,979 47.96%
Nevada 570 230 246,567 45.27% 620 280 284,837 42.71% 1,290 640 2,126,441 48.52%
New Hampshire 510 190 95,411 41.38% 550 220 138,311 43.83% 1,190 570 980,998 48.02%
New Jersey 840 270 719,971 30.39% 870 350 872,319 40.15% 1,480 660 6,276,850 45.78%
New Mexico 440 200 172,756 39.36% 510 240 214,873 43.83% 1,020 580 1,393,259 52.55%
New York 1,140 440 1,407,192 38.11% 1,350 660 1,975,502 46.90% 3,210 1,600 13,534,221 49.11%
North Carolina 680 290 823,515 45.43% 540 270 1,059,223 44.65% 1,370 760 6,997,864 51.29%
North Dakota 430 140 60,339 34.11% 460 230 90,326 50.09% 1,010 550 483,561 47.91%
Ohio 1,450 500 919,931 35.74% 1,650 690 1,182,393 36.73% 3,220 1,540 7,837,257 44.42%
Oklahoma 550 210 334,862 35.05% 510 210 420,050 45.25% 1,020 550 2,532,461 54.30%
Oregon 620 210 308,408 30.30% 830 330 407,507 40.46% 1,560 740 2,942,769 42.84%
Pennsylvania 1,040 380 949,324 35.65% 1,160 460 1,267,925 37.60% 2,390 1,230 8,841,312 51.05%
Rhode Island 350 100 74,798 29.68% 320 130 122,791 43.35% 850 460 749,814 50.39%
South Carolina 380 160 399,457 43.04% 390 200 502,860 56.94% 810 480 3,489,012 57.60%
South Dakota 410 130 74,669 36.47% 440 220 91,353 43.36% 890 500 568,093 49.78%
Tennessee 490 170 542,631 35.79% 470 230 689,135 48.28% 890 470 4,646,107 53.41%
Texas 1,590 650 2,625,037 41.96% 1,610 730 3,151,861 44.25% 3,600 1,760 18,462,746 48.51%
Utah 880 360 337,975 42.07% 850 390 428,034 45.56% 1,960 1,080 1,941,181 56.75%
Vermont 650 270 43,072 42.41% 660 290 73,259 44.33% 1,580 830 450,625 57.47%
Virginia 1,140 580 659,759 50.18% 1,080 590 869,866 50.21% 2,340 1,310 5,699,831 52.98%
Washington 570 140 578,426 24.67% 640 210 730,514 31.78% 1,370 590 5,230,511 41.28%
West Virginia 500 210 128,663 45.76% 420 220 171,603 51.22% 950 510 1,228,673 43.14%
Wisconsin 620 250 461,262 39.96% 870 420 613,380 46.87% 1,570 820 3,945,702 54.20%
Wyoming 250 <100 47,929 41.78% 360 200 56,821 53.63% 650 380 381,333 59.89%
NOTE: Computations in this table are based on a respondent’s age at screening. Thus, the data in the Total Respondents column(s) could differ from data in other National Survey on Drug Use and Health tables that use the respondent’s age recorded during the interview.
Source: SAMHSA, Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health, 2021.
Table C.3 – Sample Sizes, Weighted Interview Response Rates, and Population Estimates; by State and Two Age Groups: 2021
State | 12-20 Total Selected People | 12-20 Total Respondents | 12-20 Population Estimate | 12-20 Weighted Interview Response Rate | 18+ Total Selected People | 18+ Total Respondents | 18+ Population Estimate | 18+ Weighted Interview Response Rate
Total U.S. 48,710 19,030 39,093,526 39.90% 116,790 56,580 253,824,662 47.03%
Northeast 8,270 3,100 6,472,180 37.18% 20,460 9,730 44,786,109 47.93%
Midwest 12,440 4,590 8,144,512 37.66% 30,150 14,670 52,539,830 46.49%
South 16,030 6,850 15,096,995 43.65% 36,970 18,570 96,435,590 48.88%
West 11,960 4,490 9,379,838 37.60% 29,220 13,600 60,063,133 43.86%
Alabama 970 350 588,696 39.18% 2,240 910 3,847,398 40.57%
Alaska 670 270 88,221 40.59% 1,550 790 526,851 49.80%
Arizona 720 270 885,858 38.12% 1,770 770 5,574,029 42.90%
Arkansas 630 210 370,837 30.66% 1,590 770 2,277,503 49.88%
California 3,100 1,220 4,720,697 39.12% 7,510 3,250 29,990,092 41.95%
Colorado 830 290 688,166 35.27% 2,090 980 4,479,628 48.30%
Connecticut 640 210 378,959 29.13% 1,520 700 2,834,023 47.71%
Delaware 680 290 108,581 38.65% 1,560 770 781,265 48.93%
District of Columbia 430 200 54,718 52.84% 990 620 535,672 61.12%
Florida 1,850 780 2,422,602 44.88% 4,390 2,090 17,210,381 47.35%
Georgia 1,240 590 1,390,797 48.88% 3,020 1,460 8,089,218 46.25%
Hawaii 800 260 144,468 36.79% 2,220 900 1,085,138 38.84%
Idaho 700 230 227,513 33.45% 1,320 680 1,413,147 50.20%
Illinois 1,650 500 1,520,941 31.86% 3,980 1,730 9,701,057 39.89%
Indiana 800 330 821,526 45.61% 2,050 1,020 5,132,584 49.40%
Iowa 790 310 410,952 37.82% 1,700 850 2,421,716 47.42%
Kansas 1,120 460 355,573 40.05% 2,880 1,360 2,174,209 43.42%
Kentucky 740 300 536,035 46.20% 1,700 900 3,416,877 51.93%
Louisiana 770 300 556,021 40.82% 1,890 850 3,447,613 42.84%
Maine 700 250 134,837 37.63% 1,600 780 1,107,478 52.45%
Maryland 1,020 460 692,128 43.95% 2,230 1,160 4,710,349 47.57%
Massachusetts 620 190 829,183 32.25% 1,720 790 5,560,684 44.71%
Michigan 2,030 790 1,152,891 38.26% 4,710 2,360 7,806,775 50.95%
Minnesota 660 220 699,739 36.97% 1,420 720 4,341,374 47.28%
Mississippi 630 290 334,606 43.55% 1,380 770 2,198,024 53.75%
Missouri 590 200 678,153 39.73% 1,420 720 4,694,806 48.49%
Montana 480 150 115,066 29.23% 1,080 580 856,169 55.51%
Nebraska 730 280 250,908 38.62% 1,880 960 1,455,243 47.58%
Nevada 800 340 365,145 45.57% 1,910 930 2,411,278 47.84%
New Hampshire 720 270 143,496 41.55% 1,750 790 1,119,308 47.51%
New Jersey 1,140 410 1,123,258 36.55% 2,350 1,010 7,149,169 45.08%
New Mexico 620 270 236,762 39.24% 1,530 820 1,608,131 51.43%
New York 1,610 690 2,213,206 41.20% 4,560 2,260 15,509,722 48.85%
North Carolina 840 370 1,114,671 44.18% 1,900 1,030 8,057,088 50.49%
North Dakota 580 210 91,205 38.08% 1,470 780 573,887 48.26%
Ohio 2,030 730 1,390,643 35.59% 4,870 2,230 9,019,650 43.40%
Oklahoma 730 270 470,056 37.67% 1,530 760 2,952,510 52.90%
Oregon 900 330 452,306 34.68% 2,390 1,070 3,350,276 42.56%
Pennsylvania 1,490 560 1,452,682 36.16% 3,550 1,700 10,109,237 49.33%
Rhode Island 460 150 124,350 31.78% 1,170 590 872,604 49.57%
South Carolina 530 230 575,655 45.15% 1,200 680 3,991,871 57.52%
South Dakota 530 180 109,996 39.70% 1,330 720 659,446 48.81%
Tennessee 630 220 776,875 37.05% 1,360 700 5,335,241 52.73%
Texas 2,200 920 3,939,985 42.71% 5,220 2,490 21,614,607 47.87%
Utah 1,170 490 496,773 43.63% 2,810 1,470 2,369,215 54.65%
Vermont 900 380 72,209 42.31% 2,240 1,110 523,883 55.67%
Virginia 1,490 780 975,161 51.78% 3,420 1,890 6,569,697 52.63%
Washington 800 210 888,724 26.91% 2,010 800 5,961,025 40.10%
West Virginia 650 290 189,574 46.84% 1,370 730 1,400,276 44.16%
Wisconsin 930 390 661,986 40.26% 2,430 1,240 4,559,082 53.22%
Wyoming 380 160 70,139 44.96% 1,010 580 438,154 59.02%
NOTE: Computations in this table are based on a respondent’s age at screening. Thus, the data in the Total Respondents column(s) could differ from data in other National Survey on Drug Use and Health tables that use the respondent’s age recorded during the interview.
Source: SAMHSA, Center for Behavioral Health Statistics and Quality, National Survey on Drug Use and Health, 2021.
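As a reading aid, the national population estimates in Tables C.1 through C.3 can be cross-checked against one another. The sketch below uses only the "Total U.S." values printed in those tables; the variable names are illustrative, and attributing the one-person gap to rounding of weighted estimates is an assumption, not a statement from the source.

```python
# "Total U.S." population estimates copied from Tables C.1, C.2, and C.3.
POP_12_PLUS = 279_843_944   # Table C.1, aged 12 or older
POP_12_17 = 26_019_281      # Table C.2
POP_18_25 = 33_458_433      # Table C.2
POP_26_PLUS = 220_366_229   # Table C.2
POP_18_PLUS = 253_824_662   # Table C.3

# Adults in Table C.3 should equal the two adult age groups in Table C.2.
adults_from_c2 = POP_18_25 + POP_26_PLUS

# All age groups together should be close to the Table C.1 total.
gap = POP_12_PLUS - (POP_12_17 + POP_18_PLUS)

if __name__ == "__main__":
    print(adults_from_c2 == POP_18_PLUS)
    print(gap)  # a gap of 1 is presumably due to rounding (assumption)
```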

Section D: References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (DSM-IV) (4th ed.).

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (DSM-5) (5th ed.). https://doi.org/10.1176/appi.books.9780890425787

Center for Behavioral Health Statistics and Quality. (2021b). 2020 National Survey on Drug Use and Health: Methodological summary and definitions. https://www.samhsa.gov/data/report/2020-methodological-summary-and-definitions

Center for Behavioral Health Statistics and Quality. (2022a). 2021 National Survey on Drug Use and Health: Methodological resource book. Substance Abuse and Mental Health Services Administration. https://www.samhsa.gov/data/report/nsduh-2021-methodological-resource-book-mrb

Center for Behavioral Health Statistics and Quality. (2022b). 2021 National Survey on Drug Use and Health: Methodological summary and definitions. https://www.samhsa.gov/data/report/2021-methodological-summary-and-definitions

Center for Behavioral Health Statistics and Quality. (2022c). 2021 National Survey on Drug Use and Health: Model-based prevalence estimates (50 states and the District of Columbia). https://www.samhsa.gov/data/report/2021-nsduh-state-prevalence-estimates

Center for Behavioral Health Statistics and Quality. (forthcoming a). 2021 National Survey on Drug Use and Health: Comparison of population percentages from the United States, census regions, states, and the District of Columbia. Substance Abuse and Mental Health Services Administration.

Center for Behavioral Health Statistics and Quality. (forthcoming b). 2021 National Survey on Drug Use and Health: Model-based estimated totals (in thousands). Substance Abuse and Mental Health Services Administration.

Center for Systems Science and Engineering, Johns Hopkins University. (2021). Coronavirus resource center: Global map: COVID-19 dashboard. https://coronavirus.jhu.edu/map.html

Endicott, J., Spitzer, R. L., Fleiss, J. L., & Cohen, J. (1976). The Global Assessment Scale: A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry, 33(6), 766-771. https://doi.org/10.1001/archpsyc.1976.01770060086012

First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (2002). Structured Clinical Interview for DSM-IV-TR Axis I Disorders, Research Version, Non-patient Edition (SCID-I/NP). New York State Psychiatric Institute, Biometrics Research.

Folsom, R. E., Shah, B., & Vaish, A. (1999). Substance abuse in states: A methodological report on model based estimates from the 1994-1996 National Household Surveys on Drug Abuse. In Proceedings of the 1999 Joint Statistical Meetings, American Statistical Association, Survey Research Methods Section, Baltimore, MD (pp. 371-375). American Statistical Association.

Ghosh, M. (1992). Constrained Bayes estimation with applications. Journal of the American Statistical Association, 87(418), 533-540. https://doi.org/10.2307/2290287

Hasin, D. S., Greenstein, E., Aivadyan, C., Stohl, M., Aharonovich, E., Saha, T., Goldstein, R., Nunes, E. V., Jung, J., Zhang, H., & Grant, B. F. (2015). The Alcohol Use Disorder and Associated Disabilities Interview Schedule-5 (AUDADIS-5): Procedural validity of substance use disorders modules through clinical re-appraisal in a general population sample. Drug and Alcohol Dependence, 148, 40-46. https://doi.org/10.1016/j.drugalcdep.2014.12.011

Kessler, R. C., Barker, P. R., Colpe, L. J., Epstein, J. F., Gfroerer, J. C., Hiripi, E., Howes, M. J., Normand, S. L., Manderscheid, R. W., Walters, E. E., & Zaslavsky, A. M. (2003). Screening for serious mental illness in the general population. Archives of General Psychiatry, 60(2), 184-189. https://doi.org/10.1001/archpsyc.60.2.184

Leon, A. C., Olfson, M., Portera, L., Farber, L., & Sheehan, D. V. (1997). Assessing psychiatric impairment in primary care with the Sheehan Disability Scale. International Journal of Psychiatry in Medicine, 27(2), 93-105. https://doi.org/10.2190/t8em-c8yh-373n-1uwd

Office of Applied Studies. (2005). Results from the 2004 National Survey on Drug Use and Health: National findings (HHS Publication No. SMA 05-4062, NSDUH Series H-28). Substance Abuse and Mental Health Services Administration.

Raftery, A. E., & Lewis, S. (1992). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763-774). Oxford University Press.

Rao, J. N. K. (2003). Small area estimation (Wiley Series in Survey Methodology) (1st ed.). John Wiley & Sons.

RTI International. (2013). SUDAAN® language manual, release 11.0.1.

SAS Institute Inc. (2017). SAS/STAT software: Release 14.1.

Scheuren, F. (2004). What is a survey? (2nd ed.). https://www.unh.edu/institutional-research/sites/default/files/media/2022-05/what-is-a-survey.pdf

Shah, B. V., Barnwell, B. G., Folsom, R., & Vaish, A. (2000). Design consistent small area estimates using Gibbs algorithm for logistic models. In Proceedings of the 2000 Joint Statistical Meetings, American Statistical Association, Survey Research Methods Section, Indianapolis, IN (pp. 105-111). American Statistical Association.

Singh, A. C., & Folsom, R. E. (2001, April 11-14). Hierarchical Bayes calibrated domain estimation via Metropolis-Hastings Step in MCMC with application to small areas. Presented at the International Conference on Small Area Estimation and Related Topics, Potomac, MD.

Wright, D. (2003a). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume I. Findings (HHS Publication No. SMA 03-3775, NHSDA Series H-19). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

Wright, D. (2003b). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume II. Individual state tables and technical appendices (HHS Publication No. SMA 03-3826, NHSDA Series H-20). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

Wright, D., & Sathe, N. (2005). State estimates of substance use from the 2002-2003 National Surveys on Drug Use and Health (HHS Publication No. SMA 05-3989, NSDUH Series H-26). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

Wright, D., & Sathe, N. (2006). State estimates of substance use from the 2003-2004 National Surveys on Drug Use and Health (HHS Publication No. SMA 06-4142, NSDUH Series H-29). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

Wright, D., Sathe, N., & Spagnola, K. (2007). State estimates of substance use from the 2004-2005 National Surveys on Drug Use and Health (HHS Publication No. SMA 07-4235, NSDUH Series H-31). Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

Section E: List of Contributors

This National Survey on Drug Use and Health (NSDUH) document was prepared by the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA), U.S. Department of Health and Human Services (HHS), and by RTI International, Research Triangle Park, North Carolina. Work by RTI was performed under Contract No. HHSS283201700002C. Marlon Daniel served as government project officer and as the contracting officer representative.

This document was drafted by RTI and reviewed at SAMHSA. Production of the report at SAMHSA was managed by Rong Cai and Shiromani Gyawali.

End Notes

1 Use the NSDUH link on the following webpage: https://www.samhsa.gov/data/nsduh/state-reports-NSDUH-2021.

2 Eligibility of areas for in-person data collection was determined by state- and county-level COVID-19 metrics collected by Johns Hopkins University (Center for Systems Science and Engineering, Johns Hopkins University, 2021) (see Section 2.2.1 of the 2021 National Survey on Drug Use and Health (NSDUH): Methodological Summary and Definitions report [CBHSQ, 2022b]).

3 RTI International is a trade name of Research Triangle Institute. RTI and the RTI logo are U.S. registered trademarks of Research Triangle Institute.

4 The 2019-2020 State SAEs were produced, but they have since been removed from SAMHSA’s website. Methodological investigations found that the unusual societal circumstances in 2020 and the resulting methodological revisions to NSDUH data collection have affected the comparability of 2020 estimates with estimates from 2019 and earlier. Consequently, estimates that involve combining data from 2020 with previous years have been removed from the SAMHSA website. 

5 National small area estimates = Population-weighted averages of state-level small area estimates.

6 The census region-level estimates in the tables are population-weighted aggregates of the state estimates. The published national estimates, however, are benchmarked to exactly match the design-based estimates.

7 See Tables 1 to 35 in 2021 National Survey on Drug Use and Health: Model-Based Estimated Totals (in Thousands) (50 States and the District of Columbia) (CBHSQ, forthcoming b).

8 Note that in the 2004-2005 NSDUH state report (Wright et al., 2007) and prior reports, the term “prediction interval” (PI) was used to represent uncertainty in the state and regional estimates. However, that term also is used in other applications to estimate future values of a parameter of interest. That interpretation does not apply to NSDUH state report estimates; thus, “prediction interval” was dropped and replaced with “Bayesian confidence interval.”

9 For major depressive episode, estimates for people aged 12 or older are not included. For any mental illness, serious mental illness, receipt of mental health services, thoughts of suicide, suicide plans, and suicide attempts, estimates for youths aged 12 to 17 and people aged 12 or older are not included because youths are not asked these questions.

10 Binge drinking is defined as having five or more drinks (for males) or four or more drinks (for females) on the same occasion on at least 1 day in the 30 days prior to the survey.

11 A successfully screened household is one in which all screening questionnaire items were answered by an adult resident of the household and either zero, one, or two household members were selected for the NSDUH interview.

12 The usable case rule requires that a respondent answer “yes” or “no” to the question on lifetime use of cigarettes and “yes” or “no” to at least nine additional lifetime use questions.

13 The SAE expert panel, convened in 1999 and 2000, had six members: Dr. William Bell of the U.S. Bureau of the Census; Partha Lahiri, Professor of the Joint Program in Survey Methodology at the University of Maryland at College Park; Professor Balgobin Nandram of Worcester Polytechnic Institute; Wesley Schaible, formerly Associate Commissioner for Research and Evaluation at the Bureau of Labor Statistics; Professor J. N. K. Rao of Carleton University; and Professor Alan Zaslavsky of Harvard University.

14 See Tables 1 to 35 in 2021 Model-Based Prevalence Estimates (CBHSQ, 2022c).

15 See https://www.census.gov/newsroom/press-releases/2021/quality-indicators-on-2020-census.html.

16 The use of mixed models (fixed and random effects) allows additional error components (random effects) to be included. These account for differences between states and within-state variations that are not taken into account by the predictor variables (fixed effects) alone. It is also difficult (if not impossible) to produce valid mean squared errors (MSEs) for small area estimates based solely on a fixed-effect national regression model (i.e., synthetic estimation) (Rao, 2003, p. 52). The mixed models produce estimates that are approximately represented by a weighted combination of the direct estimate from the state data and a regression estimate from the national model. The regression coefficients of the national model are estimated using data from all of the states (i.e., borrowing strength), and the regression estimate for a particular state is obtained by applying the national model to the state-specific predictor data. The regression estimate for the state is then combined with the direct estimate from the state data in a weighted combination where the weights are obtained by minimizing the MSE (variance + squared bias) of the small area estimate.

17 To increase the precision of the estimated random effects at the within-state level, three SSRs from the 2021 sample were grouped together to form 250 grouped SSRs. California had 12 grouped SSRs; Florida, New York, and Texas each had 10 grouped SSRs; Illinois, Michigan, Ohio, and Pennsylvania each had 8 grouped SSRs; Georgia, New Jersey, North Carolina, and Virginia each had 5 grouped SSRs; and the rest of the states and the District of Columbia each had 4 grouped SSRs.

18 For details on how this outcome is calculated, see Section B.8 of this document.

19 This is the first time state estimates of opioid misuse in the past year have been published by the Substance Abuse and Mental Health Services Administration (SAMHSA).

20 Estimates of underage (aged 12 to 20) alcohol use were also produced.

21 Estimates of underage (aged 12 to 20) binge alcohol use were also produced.

22 This is the first time state estimates of opioid use disorder in the past year have been published by SAMHSA.

23 Claritas is a market research firm headquartered in Cincinnati, Ohio (see https://claritas.com/).

24 Generally, age groups are 12 to 17, 18 to 25, 26 to 34, and 35 or older. For underage alcohol and binge alcohol use, the age group is 12 to 20.

25 Depending on the measure and age group, significance levels were 1, 3, 5, or 10 percent.

26 Depending on the measure and age group, significance levels were 1, 3, or 5 percent.

27 Depending on the measure and age group, significance levels were 0.5, 1, 3, 5, or 10 percent.

28 See Table 14 in 2021 National Survey on Drug Use and Health: Model-Based Prevalence Estimates (50 States and the District of Columbia) (CBHSQ, 2022c).

29 See Table 14 in 2021 NSDUH: Model-Based Estimated Totals (CBHSQ, forthcoming b).

30 See Table 14 in 2021 NSDUH: Model-Based Prevalence Estimates (CBHSQ, 2022c).

31 In NSDUH SAE documents prior to 2016-2017, the term “initiation” was referred to as “incidence.”

32 NSDUH respondents in 2021 were asked the respective questions for alcohol use disorder or marijuana use disorder if they reported use of these substances on 6 or more days in the past year.

33 For alcohol, for example, withdrawal symptoms include (but are not limited to) trouble sleeping, hands trembling, hallucinations (seeing, feeling, or hearing things that are not really there), or feeling anxious.

34 For alcohol use disorder, for example, this criterion involves the use of alcohol, sedatives, or tranquilizers to get over or avoid alcohol withdrawal symptoms.

35 NSDUH respondents in 2021 were asked the respective questions for alcohol use disorder or marijuana use disorder only if they reported use of these substances on 6 or more days in the past year.

36 The GAF is a numeric scale used by mental health clinicians to quantify the severity of mental disorders and the extent to which mental disorders negatively affected a person’s daily functioning. In the MHSS, GAF scores were assigned by clinical interviewers at the end of each SCID interview based on information gathered throughout the interview about symptoms of mental disorders and related impairment. This procedure differs from use of the WHODAS in NSDUH, which relies on respondents’ (rather than clinicians’) perceptions of the extent to which their symptoms of psychological distress affected their day-to-day functioning.

37 “An MDE” refers to the occurrence of at least one MDE, rather than only one MDE. Similarly, reference to “the MDE” in a given period (e.g., the past 12 months) does not mean an individual had only one MDE in that period.
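Footnote 16 describes the small area estimate as approximately a weighted combination of a direct estimate and a regression estimate, with weights chosen to minimize the MSE. The following is a minimal sketch of that idea only. It assumes a textbook composite estimator in which the direct estimate is unbiased with a known sampling variance, the regression estimate has a known MSE, and the two are uncorrelated. The function and argument names are hypothetical, and this is not the SWHB procedure itself.

```python
def composite_estimate(direct, regression, var_direct, mse_regression):
    """Combine a direct and a regression estimate with MSE-minimizing weights.

    Illustrative assumptions (not the SWHB model): the direct estimate is
    unbiased with variance var_direct, the regression estimate has mean
    squared error mse_regression, and the two are uncorrelated.
    """
    total = var_direct + mse_regression
    if total <= 0:
        raise ValueError("var_direct + mse_regression must be positive")
    # The weight on the direct estimate grows as the direct estimate
    # becomes more precise relative to the regression estimate.
    weight_direct = mse_regression / total
    estimate = weight_direct * direct + (1.0 - weight_direct) * regression
    # MSE of the combined estimate at the optimal weight.
    mse = var_direct * mse_regression / total
    return estimate, weight_direct, mse
```

Under these assumptions, a state with a large sample (small `var_direct`) is pulled toward its own direct estimate, and a state with a small sample is pulled toward the regression estimate.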

Long Descriptions—Figure

Long description, Figure 1. This figure is a graph of a function within a coordinate plane; the horizontal axis shows the estimated proportion (p = small area estimate), and the vertical axis shows the required effective sample size for the estimated proportion to be published. A horizontal line through the graph indicates that an effective sample size of 68 is required for the current suppression rule. There also is a dashed vertical line at the intersection of the estimated proportion of 0.05 and the effective sample size of 68. The graph decreases from an infinitely large required effective sample size when the estimated proportion is close to zero and approaches a local minimum of 50 when the estimated proportion is 0.2. The graph increases for estimated proportions greater than 0.2 until a required effective sample size of 68 is reached for an estimated proportion of 0.5. There also is a dashed vertical line at the intersection of the estimated proportion of 0.5 and the effective sample size of 68. The graph decreases for estimated proportions greater than 0.5 and approaches a local minimum of 50 for the required effective sample size when the estimated proportion is 0.8. The graph increases for estimated proportions greater than 0.8 and reaches an infinitely large required effective sample size when the estimated proportion is close to 1.0.

Long description end. Return to Figure 1.
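The shape of the curve described for Figure 1 can be reproduced with a short calculation. The sketch below assumes two things that are not stated in this long description: a suppression threshold of 0.175 on the relative standard error of -ln(p) (applied to -ln(1 - p) when p exceeds 0.5), and a binomial variance of p(1 - p)/n for an effective sample size n. Under those assumptions, the required effective sample size is close to 50 at p = 0.2 and close to 68 at p = 0.5, which agrees with the description. The function name is hypothetical.

```python
import math

RSE_THRESHOLD = 0.175  # assumed cutoff; not stated in the figure description


def required_effective_n(p, threshold=RSE_THRESHOLD):
    """Effective sample size at which the RSE of -ln(p) equals the threshold.

    Assumes Var(p) = p * (1 - p) / n. For p > 0.5 the rule is applied to
    1 - p, which makes the curve symmetric about 0.5.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    q = p if p <= 0.5 else 1.0 - p
    # Solve sqrt(q * (1 - q) / n) / (q * -ln(q)) = threshold for n.
    return (1.0 - q) / (q * math.log(q) ** 2 * threshold ** 2)


for p in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(f"p = {p:.2f}: required effective n is about {required_effective_n(p):.1f}")
```

The required size grows without bound as p approaches 0 or 1 because the denominator of the expression goes to zero there.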

Long Descriptions—Equations

Long description, Equation 1. Capital S R R is equal to the ratio of two quantities. The numerator is the summation of the product of w sub h h and complete sub h h. The denominator is the summation of the product of w sub h h and eligible sub h h.

Long description end. Return to Equation 1.

Long description, Equation 2. Capital I R R is equal to the ratio of two quantities. The numerator is the summation of the product of w sub i and complete sub i. The denominator is the summation of the product of w sub i and selected sub i.

Long description end. Return to Equation 2.

Long description, Equation 3. Capital O R R is equal to the product of capital S R R and capital I R R.

Long description end. Return to Equation 3.
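Equations 1 through 3 can be illustrated with a small calculation. This sketch reads the subscripts hh and i as household and person indices, which is an assumption, and it treats "complete," "eligible," and "selected" as 0/1 indicators. The data values and function name are hypothetical.

```python
def weighted_rate(weights, numerator_flags, denominator_flags):
    """Ratio of two weighted sums, the common form of Equations 1 and 2."""
    num = sum(w * f for w, f in zip(weights, numerator_flags))
    den = sum(w * f for w, f in zip(weights, denominator_flags))
    if den == 0:
        raise ValueError("weighted denominator is zero")
    return num / den


# Hypothetical household records: weight, complete flag, eligible flag.
hh_w = [100.0, 150.0, 250.0]
hh_complete = [1, 0, 1]
hh_eligible = [1, 1, 1]

# Hypothetical person records: weight, complete flag, selected flag.
p_w = [200.0, 300.0, 500.0]
p_complete = [1, 1, 0]
p_selected = [1, 1, 1]

srr = weighted_rate(hh_w, hh_complete, hh_eligible)  # Equation 1
irr = weighted_rate(p_w, p_complete, p_selected)     # Equation 2
orr = srr * irr                                      # Equation 3
print(srr, irr, orr)
```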

Long description, Equation 4. The relative standard error of the negative of the natural logarithm of p is equal to the square root of the posterior variance of p divided by the product of p and the negative of the natural logarithm of p. The relative standard error of the negative of the natural logarithm of 1 minus p is equal to the square root of the posterior variance of 1 minus p divided by the product of 1 minus p and the negative of the natural logarithm of 1 minus p.

Long description end. Return to Equation 4.
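A minimal sketch of the two quantities in Equation 4 follows. It takes the estimate p and its posterior variance as given inputs. The function names are hypothetical, and the sketch uses the fact that 1 - p has the same variance as p.

```python
import math


def rse_neg_log(p, posterior_variance):
    """Relative standard error of -ln(p): sqrt(variance) / (p * -ln(p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if posterior_variance < 0:
        raise ValueError("posterior_variance must be nonnegative")
    return math.sqrt(posterior_variance) / (p * -math.log(p))


def rse_pair(p, posterior_variance):
    """Return (RSE of -ln(p), RSE of -ln(1 - p)).

    The same variance is used for both terms because Var(1 - p) = Var(p).
    """
    return (rse_neg_log(p, posterior_variance),
            rse_neg_log(1.0 - p, posterior_variance))
```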

Long description, Equation 5. The model is given by the following equation: log of pi sub a, i, j, k divided by 1 minus pi sub a, i, j, k is equal to the sum of three terms. The first term is given by x transpose sub a, i, j, k times beta sub a. The second term is eta sub a, i. And the third term is nu sub a, i, j.

Long description end. Return to Equation 5.
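To show how Equation 5 maps to a probability, the sketch below inverts the logit for a single record. The predictor values, coefficients, and function name are hypothetical. Footnotes 16 and 17 suggest that the terms eta and nu are state-level and within-state random effects; that reading is an assumption here, and the code simply treats them as two numbers added to the fixed-effect term.

```python
import math


def predicted_probability(x, beta, eta, nu):
    """Invert the logit in Equation 5: pi = 1 / (1 + exp(-(x'beta + eta + nu)))."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    # Fixed-effect term x'beta plus the two additional terms eta and nu.
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta)) + eta + nu
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

A positive value of eta or nu raises the predicted probability above what the fixed-effect term alone would give, and a negative value lowers it.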

Long description, Equation 6. Lower sub s and a is defined as the exponential of capital L sub s and a divided by the sum of 1 and the exponential of capital L sub s and a. And upper sub s and a is defined as the exponential of capital U sub s and a divided by the sum of 1 and the exponential of capital U sub s and a.

Long description end. Return to Equation 6.

Long description, Equation 7. Capital L sub s and a is defined as the difference of two quantities. The first quantity is the natural logarithm of the ratio of Theta sub s and a and 1 minus Theta sub s and a. The second quantity is the product of 1.96 and the square root of MSE sub s and a, which is the mean squared error for state-s and age group-a.

Long description end. Return to Equation 7.

Long description, Equation 8. Capital U sub s and a is defined as the sum of two quantities. The first quantity is the natural logarithm of the ratio of Theta sub s and a and 1 minus Theta sub s and a. The second quantity is the product of 1.96 and the square root of MSE sub s and a, which is the mean squared error for state-s and age group-a.

Long description end. Return to Equation 8.

Long description, Equation 9. The mean squared error, MSE sub s and a, is defined as the sum of two quantities. The first quantity is the square of the difference of two parts. Part 1 is defined as the natural logarithm of the ratio of capital P sub s and a and 1 minus capital P sub s and a. Part 2 is defined as the natural logarithm of the ratio of Theta sub s and a and 1 minus Theta sub s and a. The second quantity is the posterior variance of the natural logarithm of the ratio of capital P sub s and a and 1 minus capital P sub s and a.

Long description end. Return to Equation 9.
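Equations 6 through 9 describe an interval that is built on the logit scale and then transformed back to the proportion scale. The sketch below follows those descriptions term by term. The function and argument names are hypothetical, and the value of Part 1 in Equation 9 and the posterior variance are taken as given inputs rather than computed.

```python
import math


def logit(p):
    """Natural logarithm of p / (1 - p)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """exp(x) / (1 + exp(x)), the transform used in Equation 6."""
    return math.exp(x) / (1.0 + math.exp(x))


def logit_mse(part1_logit_p, theta, posterior_var_logit_p):
    """Equation 9: squared difference on the logit scale plus posterior variance."""
    return (part1_logit_p - logit(theta)) ** 2 + posterior_var_logit_p


def interval_95(theta, mse):
    """Equations 6 to 8: interval endpoints for a small area estimate theta."""
    half_width = 1.96 * math.sqrt(mse)
    lower_logit = logit(theta) - half_width  # Equation 7
    upper_logit = logit(theta) + half_width  # Equation 8
    return inv_logit(lower_logit), inv_logit(upper_logit)  # Equation 6
```

Because the endpoints are transformed back from the logit scale, the interval always lies between 0 and 1 and is generally not symmetric around the estimate.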

Long description, Equation 10. The average annual rate is defined as 100 times quantity q divided by 2. Quantity q is defined as capital X sub 1 divided by the sum of two terms: 0.5 times capital X sub 1, and capital X sub 2.

Long description end. Return to Equation 10.
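The arithmetic in Equation 10 can be written as a short function. The input values in the example are hypothetical, and capital X sub 1 and capital X sub 2 are used as they are defined in the body of the document.

```python
def average_annual_rate(x1, x2):
    """Equation 10: 100 * (q / 2), where q = x1 / (0.5 * x1 + x2)."""
    denominator = 0.5 * x1 + x2
    if denominator <= 0:
        raise ValueError("0.5 * x1 + x2 must be positive")
    q = x1 / denominator
    return 100.0 * q / 2.0


# Hypothetical inputs: q = 10 / (5 + 95) = 0.1, so the rate is 100 * 0.1 / 2.
print(average_annual_rate(10.0, 95.0))
```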

Long description, Equation 11. The logit of pi hat is equivalent to the logarithm of pi hat divided by the quantity 1 minus pi hat, which is equal to the sum of the following six quantities: negative 5.972664, the product of 0.0873416 and capital X sub k, the product of 0.3385193 and capital X sub w, the product of 1.9552664 and capital X sub s, the product of 1.1267330 and capital X sub m, and the product of 0.1059137 and capital X sub a.
or
Pi hat is equal to the ratio of two quantities. The numerator is 1. The denominator is 1 plus e raised to the negative value of the sum of the following six quantities: negative 5.972664, the product of 0.0873416 and capital X sub k, the product of 0.3385193 and capital X sub w, the product of 1.9552664 and capital X sub s, the product of 1.1267330 and capital X sub m, and the product of 0.1059137 and capital X sub a.

Long description end. Return to Equation 11.
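The two forms of Equation 11 are equivalent, which can be checked numerically. The sketch below uses the coefficients exactly as given in the long description. The function names are hypothetical, and the predictor values passed in are illustrative inputs, not survey data.

```python
import math

# Coefficients as given in the long description of Equation 11.
INTERCEPT = -5.972664
COEF_K = 0.0873416
COEF_W = 0.3385193
COEF_S = 1.9552664
COEF_M = 1.1267330
COEF_A = 0.1059137


def linear_predictor(x_k, x_w, x_s, x_m, x_a):
    """First form of Equation 11: the logit of pi hat."""
    return (INTERCEPT + COEF_K * x_k + COEF_W * x_w
            + COEF_S * x_s + COEF_M * x_m + COEF_A * x_a)


def predicted_pi(x_k, x_w, x_s, x_m, x_a):
    """Second form of Equation 11: pi hat = 1 / (1 + exp(-linear predictor))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x_k, x_w, x_s, x_m, x_a)))
```

Taking the logit of the value returned by `predicted_pi` recovers the value returned by `linear_predictor` for the same inputs.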
