Methodology Used to Estimate the Bayes Significance Level for the Null Hypothesis of No Change between the 2021‑2022 and 2022‑2023 State Population Percentages

This methodology document is a companion piece to the Comparison of the 2021‑2022 and 2022‑2023 Population Percentages (50 States and the District of Columbia) tables, which can be found at https://www.samhsa.gov/data/report/comparison-2021-2022-and-2022-2023-nsduh-population-percentages-50-states-and-district. These tables present the 2021‑2022 and 2022‑2023 National Survey on Drug Use and Health (NSDUH) state estimates and an indication of the statistical significance of the difference or change (p value). The tables were produced for outcomes that were defined consistently across the two time periods and for which both 2021‑2022 and 2022‑2023 state-level estimates were available. Estimates shown in Tables 1 to 33 are based on the small area estimation (SAE) procedure.¹ The moving average state estimates for the overlapping 2021‑2022 and 2022‑2023 time periods were obtained from independent applications of the survey-weighted hierarchical Bayes (SWHB) methodology; that is, the 2022‑2023 models were fit independently of the previously fitted 2021‑2022 models. This independent approach was followed so that the previously published 2021‑2022 estimates would not need to be revised. The following discussion describes the methodology used to conduct statistical tests of significance comparing the 2021‑2022 and 2022‑2023 state population percentages.

Let $\pi_{1sa}$ and $\pi_{2sa}$ denote the 2021‑2022 and 2022‑2023 population percentages, respectively, for state $s$ and age group $a$. The difference between $\pi_{1sa}$ and $\pi_{2sa}$ is defined in terms of the log-odds ratio ($\mathrm{lor}_{sa}$) rather than the simple difference $\pi_{2sa} - \pi_{1sa}$, because the posterior distribution of $\mathrm{lor}_{sa}$ is closer to Gaussian than the posterior distribution of the simple difference. With $\ln$ denoting the natural logarithm, $\mathrm{lor}_{sa}$ is defined as follows:

$$\mathrm{lor}_{sa} = \ln\left[\frac{\pi_{2sa}/(1-\pi_{2sa})}{\pi_{1sa}/(1-\pi_{1sa})}\right]. \tag{1}$$


The p value given in the tables referenced above is computed to test the null hypothesis of no difference (i.e., $\pi_{2sa} = \pi_{1sa}$ or, equivalently, $\mathrm{lor}_{sa} = 0$). An estimate of $\mathrm{lor}_{sa}$ is given by

$$\widehat{\mathrm{lor}}_{sa} = \ln\left[\frac{\hat{\pi}_{2sa}/(1-\hat{\pi}_{2sa})}{\hat{\pi}_{1sa}/(1-\hat{\pi}_{1sa})}\right], \tag{2}$$

where $\hat{\pi}_{1sa}$ and $\hat{\pi}_{2sa}$ are the small area estimates of $\pi_{1sa}$ and $\pi_{2sa}$, respectively.
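A minimal Python sketch of Equations 1 and 2 follows. It assumes the small area estimates are supplied as proportions between 0 and 1 (a percentage of 20.0 becomes 0.20), since the odds $\pi/(1-\pi)$ are only defined on that scale; the function name log_odds_ratio and the example values are hypothetical and are not taken from the NSDUH tables.

```python
import math


def log_odds_ratio(p1_hat: float, p2_hat: float) -> float:
    """Estimated log-odds ratio (Equation 2).

    p1_hat: 2021-2022 small area estimate, as a proportion in (0, 1).
    p2_hat: 2022-2023 small area estimate, as a proportion in (0, 1).
    A positive result means the odds increased from 2021-2022 to 2022-2023.
    """
    if not (0.0 < p1_hat < 1.0 and 0.0 < p2_hat < 1.0):
        raise ValueError("estimates must be proportions strictly between 0 and 1")
    theta1 = p1_hat / (1.0 - p1_hat)  # odds for 2021-2022
    theta2 = p2_hat / (1.0 - p2_hat)  # odds for 2022-2023
    return math.log(theta2 / theta1)


# Hypothetical estimates: 20.0% in 2021-2022 and 25.0% in 2022-2023.
print(round(log_odds_ratio(0.20, 0.25), 4))  # 0.2877
```

Swapping the two periods only flips the sign of the result, and equal estimates give exactly 0, which corresponds to the null value of no change.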

Let $\hat{\theta}_1 = \hat{\pi}_{1sa}/(1-\hat{\pi}_{1sa})$ and $\hat{\theta}_2 = \hat{\pi}_{2sa}/(1-\hat{\pi}_{2sa})$, where the subscript $sa$ has been dropped from $\hat{\theta}_1$ and $\hat{\theta}_2$ to simplify the notation. An estimate of the posterior variance of $\mathrm{lor}_{sa}$ is given by the following formula:

$$v(\widehat{\mathrm{lor}}_{sa}) = v(\ln\hat{\theta}_1) + v(\ln\hat{\theta}_2) - 2\,\mathrm{Cov}(\ln\hat{\theta}_1, \ln\hat{\theta}_2), \tag{3}$$

where $\mathrm{Cov}(\ln\hat{\theta}_1, \ln\hat{\theta}_2)$ denotes the covariance between $\ln\hat{\theta}_1$ and $\ln\hat{\theta}_2$. This covariance is defined in terms of the associated correlation as follows:

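$$\mathrm{Cov}(\ln\hat{\theta}_1, \ln\hat{\theta}_2) = \mathrm{Corr}(\ln\hat{\theta}_1, \ln\hat{\theta}_2)\sqrt{v(\ln\hat{\theta}_1)\,v(\ln\hat{\theta}_2)}. \tag{4}$$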

Note that the variances $v(\ln\hat{\theta}_1)$ and $v(\ln\hat{\theta}_2)$ used here to calculate $v(\widehat{\mathrm{lor}}_{sa})$ are the same posterior variances used in calculating the 2021‑2022 and 2022‑2023 Bayesian confidence intervals, respectively.
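The variance calculation in Equations 3 and 4 can be sketched as follows. The function name var_log_odds_ratio and the input values are hypothetical; in practice the two variances would be the posterior variances of the log-odds used for the Bayesian confidence intervals, and the correlation would come from the simultaneous model.

```python
import math


def var_log_odds_ratio(v1: float, v2: float, corr: float) -> float:
    """Variance of the estimated log-odds ratio (Equations 3 and 4).

    v1:   posterior variance of ln(theta1_hat), the 2021-2022 log-odds
    v2:   posterior variance of ln(theta2_hat), the 2022-2023 log-odds
    corr: correlation between ln(theta1_hat) and ln(theta2_hat)
    """
    if v1 < 0 or v2 < 0:
        raise ValueError("variances must be non-negative")
    if not -1.0 <= corr <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    cov = corr * math.sqrt(v1 * v2)  # Equation 4
    return v1 + v2 - 2.0 * cov  # Equation 3


# Hypothetical inputs: equal variances of 0.01 and a correlation of 0.5.
print(round(var_log_odds_ratio(0.01, 0.01, 0.5), 6))  # 0.01
```

Because the covariance term is subtracted, a positive correlation shrinks the variance relative to treating the two periods as independent (corr = 0), which is why an accurate correlation matters for the power of the test.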

The correlation between $\ln\hat{\theta}_1$ and $\ln\hat{\theta}_2$ was obtained by simultaneously modeling the 2021, 2022, and 2023 NSDUH data. These models used the same predictors employed in the 2022‑2023 SAE models for all 3 years. This simultaneous modeling approach was adopted based on the results of the validation study² conducted for measuring change in the 1999‑2000 and 2000‑2001 state population percentages. For this simultaneous model, 12 subpopulation-specific models, one for each combination of four age groups (12 to 17, 18 to 25, 26 to 34, and 35 or older) and 3 years (2021, 2022, and 2023), were fitted, each with its own set of fixed and random effects. In this case, the general covariance matrices for the state and within-state random effects were 12 × 12 matrices corresponding to the 12-element (age group × year) vectors of random effects. The survey-weighted, Bernoulli-type log likelihood employed in the SWHB methodology was appropriate for this simultaneous model because the 12 (age group × year) subpopulations were nonoverlapping. The correlation between $\ln\hat{\theta}_1$ and $\ln\hat{\theta}_2$ was approximated by the correlation calculated from the posterior distributions of $\ln[\pi_{1sa}/(1-\pi_{1sa})]$ and $\ln[\pi_{2sa}/(1-\pi_{2sa})]$ under the simultaneous model.

For eight outcomes,³ the model described above did not converge. For these outcomes, the correlations between the 2021‑2022 and 2022‑2023 state estimates were instead obtained from a different model that simultaneously fits the 2021‑2022 and 2022‑2023 data, so that the 2022 data are included twice. This overlapping-year model simultaneously fits eight subpopulation-specific models (i.e., four age groups × two overlapping time points [2021‑2022 and 2022‑2023]) instead of 12. Previous validation studies have shown that this model underestimates the correlations.⁴ Because a smaller correlation inflates the variance in Equation 3, and therefore the p value, the resulting tests are more conservative, and fewer significant differences may be detected for these outcomes.
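To see why underestimating the correlation makes the test more conservative, the following sketch computes the p value for the same hypothetical estimate and variances under two correlations. All numbers are invented for illustration, and the function name p_value is hypothetical; the point is only that a smaller correlation gives a larger variance in Equation 3 and therefore a larger p value.

```python
import math


def p_value(lor_hat: float, v1: float, v2: float, corr: float) -> float:
    """Two-sided p value 2 * Pr(Z >= |z|) with z = lor_hat / sqrt(var)."""
    var = v1 + v2 - 2.0 * corr * math.sqrt(v1 * v2)  # Equations 3 and 4
    z = lor_hat / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * Pr(Z >= |z|)


# Hypothetical values: same estimate and variances, two correlations.
lor_hat, v1, v2 = 0.20, 0.01, 0.01
p_true = p_value(lor_hat, v1, v2, corr=0.6)   # assumed "true" correlation
p_under = p_value(lor_hat, v1, v2, corr=0.3)  # understated correlation
print(f"{p_true:.3f} {p_under:.3f}")
```

With these inputs p_true falls below 0.05 while p_under does not, so a change that would be flagged as significant with the larger correlation is missed with the understated one.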

To calculate the p value for testing the null hypothesis of no difference ($\mathrm{lor}_{sa} = 0$), the posterior distribution of $\mathrm{lor}_{sa}$ is assumed to be normal with estimated mean $\widehat{\mathrm{lor}}_{sa}$ and variance $v(\widehat{\mathrm{lor}}_{sa})$. The Bayesian p value, or significance level, for the null hypothesis $\mathrm{lor}_{sa} = 0$ is $p = 2\Pr(Z \ge |z|)$, where $Z$ is a standard normal random variate and $z = \widehat{\mathrm{lor}}_{sa}/\sqrt{v(\widehat{\mathrm{lor}}_{sa})}$. This Bayesian significance level for a null value of $\mathrm{lor}_{sa}$, say $\mathrm{lor}_0$, is defined following Rubin (1987)⁵ as the posterior probability of the set of $\mathrm{lor}_{sa}$ values that are no more likely than the null (no change) value $\mathrm{lor}_0$, as measured by the posterior density $d(\cdot)$:

$$p(\mathrm{lor}_0) = \Pr\{d(\mathrm{lor}_{sa}) \le d(\mathrm{lor}_0)\}. \tag{5}$$

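Because the posterior distribution of $\mathrm{lor}_{sa}$ is approximately normal, and a normal density decreases symmetrically with distance from its mean, the values with $d(\mathrm{lor}_{sa}) \le d(0)$ are exactly those lying at least $|\widehat{\mathrm{lor}}_{sa}|$ away from $\widehat{\mathrm{lor}}_{sa}$. The p value of $\mathrm{lor}_0 = 0$ defined by Equation 5 therefore equals $2\Pr(Z \ge |z|)$.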
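A short sketch of the p value calculation follows, assuming a normal posterior for $\mathrm{lor}_{sa}$ as stated above. The hypothetical function bayes_p_value uses the identity 2 Pr(Z ≥ |z|) = erfc(|z|/√2); the equally hypothetical rubin_p_value_mc is a Monte Carlo check that applies Equation 5 directly, by counting posterior draws whose density is no greater than the density at the null value 0.

```python
import math
import random


def bayes_p_value(lor_hat: float, var: float) -> float:
    """Normal-approximation p value for H0: lor_sa = 0, i.e. 2 * Pr(Z >= |z|)."""
    z = lor_hat / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def rubin_p_value_mc(lor_hat: float, var: float, lor0: float = 0.0,
                     n_draws: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo version of Equation 5: Pr{d(lor_sa) <= d(lor0)} under a
    normal posterior with mean lor_hat and variance var."""
    rng = random.Random(seed)
    sd = math.sqrt(var)

    def density(x: float) -> float:
        # Normal density up to a constant factor; only comparisons matter here.
        return math.exp(-0.5 * ((x - lor_hat) / sd) ** 2)

    d0 = density(lor0)
    hits = sum(density(rng.gauss(lor_hat, sd)) <= d0 for _ in range(n_draws))
    return hits / n_draws


# Hypothetical posterior summary: lor_hat = 0.20 with variance 0.008.
print(round(bayes_p_value(0.20, 0.008), 4))
print(round(rubin_p_value_mc(0.20, 0.008), 4))
```

With these inputs both functions return a value near 0.025, differing only by simulation noise, which illustrates that the closed form and the density-based definition coincide for a normal posterior.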

Endnotes

1 For details, see Section B in 2022‑2023 National Surveys on Drug Use and Health: Guide to State Tables and Summary of Small Area Estimation Methodology at https://www.samhsa.gov/data/report/2022-2023-nsduh-guide-state-tables-and-summary-sae-methodology.

2 See Appendix E, Section E.2, of the following report: Wright, D. (2003). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume II. Individual state tables and technical appendices (HHS Publication No. SMA 03-3826, NHSDA Series H-20). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

3 The outcomes were: first use of marijuana in the past year among those at risk for initiation of marijuana use (Table 5), illicit drug use other than marijuana in the past month (Table 6), cocaine use in the past year (Table 7), heroin use in the past year (Table 9), hallucinogen use in the past year (Table 11), methamphetamine use in the past year (Table 12), pain reliever use disorder in the past year (Table 26), and attempted suicide in the past year (Table 33).

4 See Appendix E, Section E.1, of the following report: Wright, D. (2003). State estimates of substance use from the 2001 National Household Survey on Drug Abuse: Volume II. Individual state tables and technical appendices (HHS Publication No. SMA 03-3826, NHSDA Series H-20). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

5 See the following reference: Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys (Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics). New York, NY: John Wiley & Sons.

Long Descriptions for Equations

Long description, Equation 1: The log-odds ratio, lor sub s and a, is defined as the natural logarithm of the ratio of two quantities. The numerator of the ratio is pi 2 sub s and a divided by 1 minus pi 2 sub s and a. The denominator of the ratio is pi 1 sub s and a divided by 1 minus pi 1 sub s and a.

Long description, Equation 2: The estimate of the log-odds ratio, lor hat sub s and a, is defined as the natural logarithm of the ratio of two quantities. The numerator of the ratio is pi hat 2 sub s and a divided by 1 minus pi hat 2 sub s and a. The denominator of the ratio is pi hat 1 sub s and a divided by 1 minus pi hat 1 sub s and a, where pi hat 1 sub s and a represents the 2021‑2022 state estimates and pi hat 2 sub s and a represents the 2022‑2023 state estimates.

Long description, Equation 3: Variance v of the estimate of the log-odds ratio, lor hat sub s and a, is a function of three quantities: q1, q2, and q3. It is expressed as the sum of q1 and q2 minus q3. Quantity q1 is the variance v of the natural logarithm of Theta 1 hat, quantity q2 is the variance v of the natural logarithm of Theta 2 hat, and quantity q3 is 2 times the covariance between the natural logarithm of Theta 1 hat and the natural logarithm of Theta 2 hat.

Long description, Equation 4: The covariance between the natural logarithm of Theta 1 hat and the natural logarithm of Theta 2 hat is equal to the correlation between the natural logarithm of Theta 1 hat and the natural logarithm of Theta 2 hat multiplied by the square root of the product of the variance v of the natural logarithm of Theta 1 hat and variance v of the natural logarithm of Theta 2 hat.

Long description, Equation 5: The p value of log-odds ratio lor sub zero is equal to the posterior probability that d of the log-odds ratio lor sub s and a is less than or equal to d of the log-odds ratio lor sub zero.
