This methodology document is a companion piece to the Comparison of the 2021‑2022 and 2022‑2023 Population Percentages (50 States and the District of Columbia) tables. These tables, available at https://www.samhsa.gov/data/report/comparison-2021-2022-and-2022-2023-nsduh-population-percentages-50-states-and-district, present the 2021‑2022 and 2022‑2023 National Survey on Drug Use and Health (NSDUH) state estimates and an indication of the statistical significance of the difference or change (p value). The tables were produced for outcomes that were consistently defined across the two time periods and for which 2021‑2022 and 2022‑2023 state-level estimates were available. Estimates shown in Tables 1 to 33 are based on the small area estimation (SAE) procedure.¹ The moving average state estimates for the overlapping 2021‑2022 and 2022‑2023 time periods were obtained from independent applications of the survey-weighted hierarchical Bayes (SWHB) methodology; that is, the 2022‑2023 models were fit independently of the previously fitted 2021‑2022 models. This independent approach was followed so that the previously published 2021‑2022 estimates would not have to be revised. The following discussion describes the methodology used to conduct statistical tests of significance comparing the 2021‑2022 and 2022‑2023 state population percentages.
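Let $\pi_{1,sa}$ and $\pi_{2,sa}$ denote the 2021‑2022 and 2022‑2023 population percentages, respectively, for state $s$ and age group $a$. The difference between $\pi_{1,sa}$ and $\pi_{2,sa}$ is defined in terms of the log-odds ratio ($\mathrm{lor}_{sa}$) rather than the simple difference. This is because the posterior distribution of $\mathrm{lor}_{sa}$ is closer to Gaussian than the posterior distribution of the simple difference ($\pi_{2,sa} - \pi_{1,sa}$). Let ln denote the natural logarithm; then $\mathrm{lor}_{sa}$ is defined as follows:

$$\mathrm{lor}_{sa} = \ln\left(\frac{\pi_{2,sa}/(1-\pi_{2,sa})}{\pi_{1,sa}/(1-\pi_{1,sa})}\right). \tag{1}$$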
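The p value given in the above-referenced tables is computed to test the null hypothesis of no difference (i.e., $\pi_{1,sa} = \pi_{2,sa}$ or, equivalently, $\mathrm{lor}_{sa} = 0$). An estimate of $\mathrm{lor}_{sa}$ is given by

$$\widehat{\mathrm{lor}}_{sa} = \ln\left(\frac{\hat{\pi}_{2,sa}/(1-\hat{\pi}_{2,sa})}{\hat{\pi}_{1,sa}/(1-\hat{\pi}_{1,sa})}\right), \tag{2}$$

where $\hat{\pi}_{1,sa}$ and $\hat{\pi}_{2,sa}$ are the small area estimates of $\pi_{1,sa}$ and $\pi_{2,sa}$, respectively.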
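Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not NSDUH production code. It assumes the estimates are supplied as proportions (a percentage of 12.0 is entered as 0.120), because the odds $\pi/(1-\pi)$ are only defined on that scale. The function names and the input values are hypothetical.

```python
import math


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p) for a proportion 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def log_odds_ratio(pi1_hat: float, pi2_hat: float) -> float:
    """Estimated log-odds ratio, Equation (2).

    pi1_hat: 2021-2022 small area estimate, as a proportion (hypothetical input)
    pi2_hat: 2022-2023 small area estimate, as a proportion (hypothetical input)
    ln[(pi2/(1-pi2)) / (pi1/(1-pi1))] equals ln(odds2) - ln(odds1).
    """
    return log_odds(pi2_hat) - log_odds(pi1_hat)


# Hypothetical estimates: 12.0% in 2021-2022 and 13.5% in 2022-2023.
lor_hat = log_odds_ratio(0.120, 0.135)
print(f"estimated log-odds ratio: {lor_hat:.4f}")
```

Writing the ratio as a difference of log-odds is the same identity that the variance formula for $\widehat{\mathrm{lor}}_{sa}$ relies on.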
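Let $\hat{\Theta}_1 = \hat{\pi}_{1,sa}/(1-\hat{\pi}_{1,sa})$ and $\hat{\Theta}_2 = \hat{\pi}_{2,sa}/(1-\hat{\pi}_{2,sa})$. The subscript $sa$ has been dropped from $\hat{\Theta}_1$ and $\hat{\Theta}_2$ to simplify the notation. An estimate of the posterior variance of $\widehat{\mathrm{lor}}_{sa}$ is given by the following formula:

$$v(\widehat{\mathrm{lor}}_{sa}) = v(\ln\hat{\Theta}_1) + v(\ln\hat{\Theta}_2) - 2\,\mathrm{cov}(\ln\hat{\Theta}_1, \ln\hat{\Theta}_2), \tag{3}$$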
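where $\mathrm{cov}(\ln\hat{\Theta}_1, \ln\hat{\Theta}_2)$ denotes the covariance between $\ln\hat{\Theta}_1$ and $\ln\hat{\Theta}_2$. This covariance is defined in terms of the associated correlation as follows:

$$\mathrm{cov}(\ln\hat{\Theta}_1, \ln\hat{\Theta}_2) = \rho(\ln\hat{\Theta}_1, \ln\hat{\Theta}_2)\,\sqrt{v(\ln\hat{\Theta}_1)\,v(\ln\hat{\Theta}_2)}. \tag{4}$$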
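Equations (3) and (4) combine into a single function, sketched below. The sketch assumes the two posterior variances are already on the log-odds scale and that a correlation value is available. The function name and the example values are hypothetical.

```python
import math


def lor_variance(v1: float, v2: float, rho: float) -> float:
    """Posterior variance of the estimated log-odds ratio, Equations (3)-(4).

    v1:  posterior variance of ln(Theta1_hat), the 2021-2022 log-odds
    v2:  posterior variance of ln(Theta2_hat), the 2022-2023 log-odds
    rho: correlation between ln(Theta1_hat) and ln(Theta2_hat)
    """
    if v1 < 0 or v2 < 0:
        raise ValueError("variances must be non-negative")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    cov = rho * math.sqrt(v1 * v2)  # Equation (4)
    return v1 + v2 - 2.0 * cov      # Equation (3)


# Hypothetical inputs: overlapping periods share data, so rho is positive here.
print(lor_variance(0.010, 0.012, 0.5))
```

The covariance is subtracted, so a larger positive correlation gives a smaller variance for the change estimate.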
Note that the variances $v(\ln\hat{\Theta}_1)$ and $v(\ln\hat{\Theta}_2)$ used here to calculate $v(\widehat{\mathrm{lor}}_{sa})$ are the same posterior variances used in calculating the 2021‑2022 and 2022‑2023 Bayesian confidence intervals, respectively.
The correlation between $\ln\hat{\Theta}_1$ and $\ln\hat{\Theta}_2$ was obtained by simultaneously modeling the 2021, 2022, and 2023 NSDUH data. These models used the predictors employed in the 2022‑2023 SAE models for all 3 years. This simultaneous modeling approach was adopted based on the results of the validation study² conducted for measuring change in the 1999‑2000 and 2000‑2001 state population percentages. For this simultaneous model, 12 subpopulation-specific models were fitted: four age groups (12 to 17, 18 to 25, 26 to 34, and 35 or older) by 3 years (2021, 2022, and 2023). Each model had its own set of fixed and random effects. Accordingly, the general covariance matrices for the state and within-state random effects were 12 × 12 matrices, corresponding to the 12-element (age group × year) vectors of random effects. The survey-weighted, Bernoulli-type log likelihood employed in the SWHB methodology was appropriate for this simultaneous model because the 12 (age group × year) subpopulations were nonoverlapping. The correlation $\rho(\ln\hat{\Theta}_1, \ln\hat{\Theta}_2)$ was approximated by the correlation calculated from the posterior distributions of $\ln\Theta_1$ and $\ln\Theta_2$ from the simultaneous model.
For eight outcomes,³ the above-mentioned model did not converge. For these outcomes, a different model was used to obtain the correlations between the 2021‑2022 and 2022‑2023 state estimates. This model simultaneously models the 2021‑2022 and 2022‑2023 data, with the 2022 data included twice (once in each period). This overlapping year model simultaneously fits eight subpopulation-specific models (i.e., four age groups × two overlapping time points [2021‑2022 and 2022‑2023]) instead of 12. Previous validation studies have shown that this model underestimates the correlations,⁴ which results in more conservative tests. Because the covariance term is subtracted when computing the posterior variance of the estimated log-odds ratio, an underestimated correlation inflates that variance and therefore the p value. As a result, fewer significant differences may be detected for these outcomes.
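The conservativeness argument can be checked numerically. The sketch below computes the same two-sided normal p value twice for one hypothetical state and age group. The first run uses an assumed "true" correlation and the second uses a smaller, underestimated one. All values and the function name are illustrative and are not taken from the NSDUH tables.

```python
import math


def p_value(lor_hat: float, v1: float, v2: float, rho: float) -> float:
    """Two-sided normal p value for H0: lor = 0, using the variance from
    Equations (3)-(4): v1 + v2 - 2 * rho * sqrt(v1 * v2)."""
    var = v1 + v2 - 2.0 * rho * math.sqrt(v1 * v2)
    z = lor_hat / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * Pr(Z > |z|)


# Hypothetical inputs on the log-odds scale for one state and age group.
lor_hat, v1, v2 = 0.13, 0.004, 0.004
p_true = p_value(lor_hat, v1, v2, rho=0.6)   # assumed "true" correlation
p_under = p_value(lor_hat, v1, v2, rho=0.3)  # underestimated correlation
print(f"rho=0.6: p={p_true:.3f}; rho=0.3: p={p_under:.3f}")
```

With these made-up inputs, the change is significant at the 0.05 level under the larger correlation but not under the underestimated one. This is the kind of lost detection the paragraph above describes.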
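To calculate the p value for testing the null hypothesis of no difference ($\mathrm{lor}_{sa} = 0$), it is assumed that the posterior distribution of $\mathrm{lor}_{sa}$ is normal with estimated mean $\widehat{\mathrm{lor}}_{sa}$ and variance $v(\widehat{\mathrm{lor}}_{sa})$. The Bayesian p value, or significance level, for the null hypothesis of no difference, $\mathrm{lor}_{sa} = 0$, is $2\Pr(Z > |z|)$. Here $Z$ is a standard normal random variate, $z = \widehat{\mathrm{lor}}_{sa}/\sqrt{v(\widehat{\mathrm{lor}}_{sa})}$, and $|z|$ denotes the absolute value of $z$.

Following Rubin (1987),⁵ this Bayesian significance level (or p value) for the null value of $\mathrm{lor}_{sa}$, say $\mathrm{lor}_0$, is defined as the posterior probability of the collection of $\mathrm{lor}_{sa}$ values that are no more likely than the null (no change) value $\mathrm{lor}_0$. In other words, it covers the values whose posterior density, $d(\mathrm{lor}_{sa})$, is no larger than $d(\mathrm{lor}_0)$:

$$p(\mathrm{lor}_0) = \Pr\{d(\mathrm{lor}_{sa}) \le d(\mathrm{lor}_0)\}. \tag{5}$$

With the posterior distribution of $\mathrm{lor}_{sa}$ approximately normal, $p(\mathrm{lor}_0)$ with $\mathrm{lor}_0 = 0$ reduces to $2\Pr(Z > |z|)$. The normal density is symmetric about $\widehat{\mathrm{lor}}_{sa}$ and decreases with distance from it. Therefore $d(\mathrm{lor}_{sa}) \le d(0)$ exactly when $|\mathrm{lor}_{sa} - \widehat{\mathrm{lor}}_{sa}| \ge |\widehat{\mathrm{lor}}_{sa}|$.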
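The equivalence between Equation (5) and the closed-form normal p value can be checked by simulation. The sketch below assumes a normal posterior with hypothetical mean and variance on the log-odds scale. It computes the p value two ways: as $2\Pr(Z > |z|)$ via `math.erfc`, and by drawing from the posterior and counting the draws whose density is no larger than the density at zero, which is a direct reading of Rubin's definition. The function names and the seed are illustrative.

```python
import math
import random


def bayesian_p_value(lor_hat: float, var_lor: float) -> float:
    """Closed form under a normal posterior: 2 * Pr(Z > |z|)."""
    z = lor_hat / math.sqrt(var_lor)
    return math.erfc(abs(z) / math.sqrt(2.0))


def bayesian_p_value_mc(lor_hat: float, var_lor: float,
                        n_draws: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Equation (5) with lor_0 = 0:
    Pr{d(lor) <= d(0)}, where d is the N(lor_hat, var_lor) density."""
    sd = math.sqrt(var_lor)
    rng = random.Random(seed)

    def density(x: float) -> float:
        return math.exp(-0.5 * ((x - lor_hat) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

    d_null = density(0.0)
    hits = sum(1 for _ in range(n_draws)
               if density(rng.gauss(lor_hat, sd)) <= d_null)
    return hits / n_draws


# Hypothetical posterior: mean 0.13, variance 0.0032 on the log-odds scale.
print(bayesian_p_value(0.13, 0.0032), bayesian_p_value_mc(0.13, 0.0032))
```

The two numbers should agree up to Monte Carlo error. The simulation is only a check of the identity; under the normal approximation, the closed form is all that is needed.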
1 For details, see Section B in 2022‑2023 National Surveys on Drug Use and Health: Guide to State Tables and Summary of Small Area Estimation Methodology at https://www.samhsa.gov/data/report/2022-2023-nsduh-guide-state-tables-and-summary-sae-methodology.
2 See Appendix E, Section E.2, of the following report: Wright, D. (2003). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume II. Individual state tables and technical appendices (HHS Publication No. SMA 03-3826, NHSDA Series H-20). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies.
3 The outcomes were: first use of marijuana in the past year among those at risk for initiation of marijuana use (Table 5), illicit drug use other than marijuana in the past month (Table 6), cocaine use in the past year (Table 7), heroin use in the past year (Table 9), hallucinogen use in the past year (Table 11), methamphetamine use in the past year (Table 12), pain reliever use disorder in the past year (Table 26), and attempted suicide in the past year (Table 33).
4 See Appendix E, Section E.1, of the following report: Wright, D. (2003). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume II. Individual state tables and technical appendices (HHS Publication No. SMA 03-3826, NHSDA Series H-20). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies.
5 See the following reference: Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys (Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics). New York, NY: John Wiley & Sons.
Long description, Equation 1: The log-odds ratio, lor sub s and a, is defined as the natural logarithm of the ratio of two quantities. The numerator of the ratio is pi 2 sub s and a divided by 1 minus pi 2 sub s and a. The denominator of the ratio is pi 1 sub s and a divided by 1 minus pi 1 sub s and a.
Long description, Equation 2: The estimate of the log-odds ratio, lor hat sub s and a, is defined as the natural logarithm of the ratio of two quantities. The numerator of the ratio is pi hat 2 sub s and a divided by 1 minus pi hat 2 sub s and a. The denominator of the ratio is pi hat 1 sub s and a divided by 1 minus pi hat 1 sub s and a, where pi hat 1 sub s and a represents the 2021‑2022 state estimates and pi hat 2 sub s and a represents the 2022‑2023 state estimates.
Long description, Equation 3: Variance v of the estimate of the log-odds ratio, lor hat sub s and a, is a function of three quantities: q1, q2, and q3. It is expressed as the sum of q1 and q2 minus q3. Quantity q1 is the variance v of the natural logarithm of Theta 1 hat, quantity q2 is the variance v of the natural logarithm of Theta 2 hat, and quantity q3 is 2 times the covariance between the natural logarithm of Theta 1 hat and the natural logarithm of Theta 2 hat.
Long description, Equation 4: The covariance between the natural logarithm of Theta 1 hat and the natural logarithm of Theta 2 hat is equal to the correlation between the natural logarithm of Theta 1 hat and the natural logarithm of Theta 2 hat multiplied by the square root of the product of the variance v of the natural logarithm of Theta 1 hat and variance v of the natural logarithm of Theta 2 hat.
Long description, Equation 5: The p value at the null log-odds ratio, lor sub zero, is equal to the probability that d of the log-odds ratio lor sub s and a is less than or equal to d of lor sub zero, where d denotes the posterior density.