Prevalence of Substance Use Among Racial & Ethnic Subgroups in the U.S.


1 Problems of Sampling Small Racial/Ethnic Subgroups

Chapter 1 discussed four limitations of NHSDA data for the analysis of racial/ethnic patterns of substance use: 1) limitations in the measurement of substance use, 2) problems in the measurement of race/ethnicity (especially the inability to measure identification with Asian/Pacific Islander national origin subgroups), 3) the lack of measurements of important covariates of race/ethnicity, and 4) inadequate sample representation of smaller racial/ethnic groups. This appendix addresses the fourth of these limitations by examining options for redesigning the NHSDA sample to yield larger samples of small racial/ethnic subgroups. We compare and evaluate these options with respect to statistical efficiency, feasibility, and survey costs.

The sample design and estimation techniques of the NHSDA currently support satisfactorily precise estimates of drug use prevalence among blacks, whites, and Hispanics in the NHSDA target population and among subclasses of these three groups defined by gender and age. [ The precision targets of the 1995 NHSDA included coefficients of variation (CVs) of less than 12% for estimates of drug use prevalence between 10% and 90% among Hispanics aged 12-17, 18-25, 26-34, and 35 and older and among non-Hispanic blacks in the same age groups. See Research Triangle Institute (1995) for details.] Like the National Health Interview Survey (NCHS, 1989; Judkins et al., 1994), the NHSDA uses disproportionate stratified sampling of primary sampling units (PSUs), of areal segments within PSUs, of screened households within segments, and of individuals within households to increase the sample counts of blacks and Hispanics. Yet the current NHSDA does not allow sufficiently precise annual estimates of drug use prevalence in smaller racial/ethnic groups such as Native Americans, Asian and Pacific Islander subgroups (e.g., Chinese, Filipinos, Koreans, Vietnamese), and Hispanic subgroups (e.g., Cubans, Puerto Ricans). Even after combining the large NHSDAs conducted in 1991 through 1993, for a total combined sample size of more than 87,000 respondents, we found in Chapters 4 and 5 that detailed patterns of drug use prevalence in the Native American, "Hispanic-Caribbean," and "Hispanic-South America" subgroups were too imprecise to be reported. (See Appendix A for discussion of SAMHSA's criterion for suppressing low precision estimates.) Taking Native Americans as an example, standard error tables (Appendix A) suggest that it would be necessary to increase the combined 1991-93 sample size (n = 416) by a factor of about 2.5, to about 1,000 completed interviews, to produce drug use prevalence estimates for age-sex categories of Native Americans that are sufficiently precise to be reported. Perhaps four times as many Native American respondents would be required to produce more detailed estimates such as those presented for larger racial/ethnic categories in Chapter 5.
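To make the link between a CV target and a subgroup sample size concrete, the following minimal Python sketch uses the textbook approximation for the CV of an estimated proportion. It is an illustration only, not the NHSDA variance estimation procedure: the function names and the `deff` (design effect) parameter are assumptions introduced here, and a real complex-sample design would require a design effect greater than 1.

```python
import math


def cv_of_prevalence(p, n, deff=1.0):
    """Approximate CV of an estimated prevalence p based on n respondents.

    deff is a hypothetical design effect (1.0 means simple random sampling).
    """
    if not 0.0 < p < 1.0 or n <= 0:
        raise ValueError("need 0 < p < 1 and n > 0")
    standard_error = math.sqrt(deff * p * (1.0 - p) / n)
    return standard_error / p


def respondents_needed(p, target_cv, deff=1.0):
    """Number of respondents (as a float) at which the CV equals target_cv."""
    if not 0.0 < p < 1.0 or target_cv <= 0:
        raise ValueError("need 0 < p < 1 and target_cv > 0")
    return deff * (1.0 - p) / (p * target_cv ** 2)


if __name__ == "__main__":
    # Low prevalences drive the sample size requirement.
    for prevalence in (0.10, 0.50, 0.90):
        print(prevalence, round(respondents_needed(prevalence, 0.12)))
```

Under these assumptions, a 12% CV for a 10% prevalence needs roughly 625 respondents in the domain before any design effect is applied, which is why small subgroups split further by age and sex quickly fall short.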

The required sample size of a subgroup in the NHSDA, or in any other survey, depends upon the specific analytical objectives. For example, an adequate sample size for estimating substance use prevalence in a subgroup might be inadequate for estimating change in prevalence in the subgroup or inadequate for estimating prevalence in sociodemographic subclasses of the subgroup. This Appendix does not assume specific required sample sizes, or corresponding levels of precision, for estimates of drug use prevalence within small racial/ethnic subgroups, because the sampling rates of each approach, and of each mix of approaches, might be tailored to obtain any levels of precision for domains defined by race/ethnicity and other variables. The optimal design would be the approach, or mix of approaches, that yielded the required levels of precision at minimum cost.

The optimal design for sampling rare populations generally depends upon two factors: 1) the availability of lists of rare population members and 2) the extent of geographical clustering of rare population members, i.e., the extent to which they are concentrated in relatively few geographic subareas. Unfortunately, lists with good coverage of households and auxiliary information on the racial/ethnic identifications of household members are not available and cannot easily be constructed in the United States. [ Use of household lists from the decennial U.S. censuses is prohibited by the confidentiality provisions in the authorizing legislation of the U.S. Bureau of the Census. Metropolitan telephone book listings might be used to sample ethnic groups whose members mostly have known surnames that are largely distinct from the surnames used by nonmembers: For example, the surname "Wong" may be distinctively Chinese, but "Lee" is not. Attempts to use surnames to identify Hispanics have resulted in false positive rates of about 12-15% and false negative rates of about 20-30% (Passel and Word, 1980; Judkins et al., 1992), and these rates are likely to rise if intermarriage between Hispanics and non-Hispanics continues to increase (Harrison and Bennett, 1995).

 Even if a set of surnames uniquely identified members of a specified ethnic group, the population coverage of the telephone listing would be poor to the extent that the ethnic group had a high percentage of households without telephones or with unlisted numbers. A sample drawn from telephone listings would also be expensive to field in a household interview survey unless the racial/ethnic group were highly geographically clustered. ] Thus, the principal design consideration is the extent of geographical clustering of racial/ethnic subgroups. This Appendix restricts attention to probability sampling designs [ Probability sampling designs are designs for which it is possible to estimate sampling variances because every unit in the population has a known nonzero chance of being selected.] and assumes that the collection of data from small racial/ethnic subgroups will use the standard NHSDA questionnaire and household interviewing procedures. [ The only exception is that both the screener and questionnaire would be revised to collect data on Asian/Pacific Islander subgroups (see Chapter 6). 

Much research shows that in-person interviews designed to encourage honest self-reporting are the best way to obtain valid data on drug use (e.g., NIDA, 1992).] If a racial/ethnic subgroup is small and not highly geographically clustered, then probability sampling of the subgroup can be extremely expensive because the costs of locating subgroup members can easily exceed interviewing costs (Alton and Anderson, 1986; Sudan et al., 1988). Another key design issue is that, for any fixed overall survey budget, increasing the precision of estimates of drug use prevalence for small racial/ethnic subgroups will reduce the precision of estimates for subgroups that are not oversampled and for the total target population.

Since racial/ethnic subgroups in the U.S. differ substantially in size and in geographical distribution, the optimal sample design may vary from subgroup to subgroup. Based on 1990 census ancestry data, Native Americans make up only about 0.8% of the U.S. population, each of the three largest Asian/Pacific Islander subgroups (Chinese, Filipino, and Japanese) less than 0.4% (see Chiswick and Sullivan, 1995), Mexicans 5.4%, Puerto Ricans 1.1%, Central Americans 0.5%, and other Hispanic subgroups less than 0.5% (see Table 2.2). [ Population estimates for small racial/ethnic subgroups can vary substantially depending on how subgroups are defined using ancestry, race, language, and/or birthplace data in the 1990 census. The estimates of the population percentages of Asian subgroups in this paragraph are based on ancestry and are larger than corresponding estimates based on race, language, and birthplace (Barringer et al., 1993) . ] Even if the population percentages of Asian and Hispanic subgroups increase rapidly in the next decade, as projected by the U.S. Bureau of the Census (see Chapter 1), probability sampling of these subgroups may continue to be difficult because of their diffuse geographic distributions in the U.S.

Table B1 presents estimated percentages of population ("%") and indexes of residential segregation ("S"), based on 1990 census data, of Asians, blacks, and Hispanics in all U.S. metropolitan areas and in selected metropolitan areas (Harrison and Bennett, 1995; Frey and Farley, 1996). The index S ranges from 0 (complete integration) to 100 (complete segregation) and is interpreted as the minimum percentage of the subgroup's population that would have to move to another census block group in order for the subgroup's population to have the same distribution across block groups as the total population of the metro area. [ The census block group is defined as a grouping of one or more adjacent census blocks with average population equal to about 564 persons in the 1990 census (Frey and Farley, 1996).]
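As an illustration of how an index of this kind can be computed, the sketch below implements the standard index of dissimilarity between a subgroup and all other residents across block groups. This is an assumption: the source does not give the formula behind S, and the function name and toy counts are hypothetical.

```python
def segregation_index(subgroup_counts, total_counts):
    """Index of dissimilarity (0-100) between a subgroup and everyone else.

    subgroup_counts[i] and total_counts[i] are the subgroup and total
    populations of block group i.
    """
    if len(subgroup_counts) != len(total_counts):
        raise ValueError("count lists must have the same length")
    others = [t - s for s, t in zip(subgroup_counts, total_counts)]
    subgroup_total = sum(subgroup_counts)
    others_total = sum(others)
    if subgroup_total <= 0 or others_total <= 0:
        raise ValueError("both groups must be present in the metro area")
    gap = sum(
        abs(s / subgroup_total - o / others_total)
        for s, o in zip(subgroup_counts, others)
    )
    return 100.0 * 0.5 * gap


if __name__ == "__main__":
    # Toy metro area with three block groups (hypothetical counts).
    print(segregation_index([40, 10, 0], [100, 100, 100]))
```

A value of 0 means the subgroup is spread across block groups exactly like everyone else; 100 means no block group contains both subgroup members and other residents.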

Table B1 shows that, in general, blacks are much more highly residentially segregated than either Asians or Hispanics, Hispanics tend to be slightly more segregated than Asians, and both blacks and Hispanics have larger population percentages than Asians. An exception is San Francisco, where Asians and Hispanics are about equally segregated and both comprise larger percentages (about 15%) of the population than blacks. Segregation indexes are not available for Native Americans within specific metropolitan areas, but Harrison and Bennett (1995) indicate that only about 51% of the approximately 2 million Native Americans in the U.S. lived in metropolitan areas in 1990, that Native Americans comprised only about 0.5% of the population of U.S. metropolitan areas, and that Native Americans in metropolitan areas were even less residentially segregated than Asians (about 35 vs. 40). [ Similarly, Massey et al. (1993) showed that 8% of blacks, 15% of Hispanics, 37% of Asian/Pacific Islanders, and 47% of Native Americans resided in blocks with no more than 10% of the residents belonging to the same subgroup; 62% of blacks, 40% of Hispanics, 13% of Asian/Pacific Islanders, and 30% of Native Americans resided in blocks with 60% or more belonging to the same subgroup. ]

Given the difficulties of oversampling rare and diffusely distributed subgroups, these data explain why blacks are oversampled in most federal household surveys of the national population, why Hispanics are oversampled in many such surveys (NHIS as well as NHSDA), and why Native Americans, Asian Americans, and Asian subgroups are oversampled in few if any major federal surveys of the national household population.

Table B1. Percentages of population (%) and indexes of residential segregation (S) of Asians, blacks, and Hispanics: all U.S. metropolitan areas and selected metropolitan areas, 1990 (see note 1).

Metropolitan area                  Asians      Blacks      Hispanics
                                   %    S      %    S      %    S
All U.S. metro areas               4   40     13   62     10   44
[area name missing in source]      3   54     19   86     11   66
[area name missing in source]      2   49     14   64     13   50
[area name missing in source]      3   49     18   66     21   49
Los Angeles                        9   45      8   66     33   53
[area name missing in source]      1   40     17   74     33   56
New York                           5   52     16   71     15   54
San Francisco                     15   47      8   61     16   45
Washington, DC                     5   39     26   66      6   41

1. The data for all metro areas in this table are taken from Harrison and Bennett (1995). The data for selected metro areas are from Frey and Farley (1996).























2 Specific Sample Design Options

This section presents details of four sampling approaches that might be used to obtain larger samples of small racial/ethnic subgroups in NHSDA. The four approaches are 1) increasing the NHSDA sample size, 2) oversampling of PSUs and areal segments, 3) household screening and subsampling, and 4) network sampling. The four sample redesign options have competing advantages and disadvantages:

1. Increasing the NHSDA sample size. Increasing the NHSDA sample size by proportionally increasing the sampling fractions of all sampling domains is the simplest approach, but this approach is the most expensive. Of course, this needs to be considered in the context of the current NHSDA sample size and current plans. In 1998 the sample size will be approximately 25,000 persons, and in 1999 it will increase to approximately 70,000 persons in order to provide for state level estimation. Using the 1999 sample size as the new base level, for example, would imply that increasing the annual sample size of Native Americans to about 1,000 respondents would require increasing the annual overall sample size of the NHSDA by a factor of about 3, from about 70,000 respondents in the 1999 NHSDA to 210,000 in future NHSDAs, or combining 3 years of sample. Increasing the NHSDA overall sample size might be one element in a redesign aimed at increasing the precision of estimates for small racial/ethnic subgroups. Yet, in the interest of economy, this approach should probably be combined with other redesign options, especially #2 and #3. The key question is the extent to which other options can be used to reduce the increase in overall sample size that would otherwise be necessary to produce satisfactorily precise estimates for small racial/ethnic subgroups. Following Alton and Anderson (1986), we refer to options that would substantially increase precision without greatly increasing costs as "effective."
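The arithmetic behind the "factor of about 3" can be sketched as follows. The sketch assumes, purely for illustration, that Native Americans would make up the same share of a future sample as they did of the pooled 1991-93 sample (416 of roughly 87,000 respondents); the function names are hypothetical.

```python
def expected_subgroup_n(overall_n, subgroup_share):
    """Expected subgroup respondents when the subgroup's sample share is fixed."""
    return overall_n * subgroup_share


def expansion_factor(target_subgroup_n, overall_n, subgroup_share):
    """Factor by which the overall sample must grow to reach the subgroup target."""
    return target_subgroup_n / expected_subgroup_n(overall_n, subgroup_share)


# Assumption: share of Native Americans observed in the pooled 1991-93 sample.
NATIVE_AMERICAN_SHARE = 416 / 87_000

if __name__ == "__main__":
    base_n = 70_000  # planned 1999 sample size cited in the text
    print(round(expected_subgroup_n(base_n, NATIVE_AMERICAN_SHARE)))
    print(round(expansion_factor(1_000, base_n, NATIVE_AMERICAN_SHARE), 2))
```

Under that assumption a 70,000-person sample yields roughly 335 Native American respondents, so reaching 1,000 requires about three times the overall sample or about three pooled survey years.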

2. Oversampling primary sampling units and areal segments: Like other major national household surveys, the NHSDA already increases the sample representation of blacks and Hispanics by oversampling geographical subareas with large concentrations of blacks and Hispanics at each stage of sample selection. [ Like other major national household surveys, including NHIS and CPS, the NHSDA uses a multistage areal probability sample design. 

There are four stages of sampling: In the first stage, about 115 primary sampling units (PSUs) are randomly selected from a population of about 2,000 non-overlapping metropolitan and nonmetropolitan areas (counties and groups of counties) which divide up the entire land area of the U.S. In the second stage, area segments (subareas defined by combining adjacent blocks) are randomly selected from the PSUs. In the third stage, households are randomly selected from sample segments. In the fourth stage, household members are randomly selected from sample households.] In the first two stages, relatively high probabilities of selection are assigned to PSUs (first-stage sampling units) and to areal segments within PSUs (second-stage sampling units) that contain high percentages of blacks and Hispanics according to the U.S. census. In the third stage, a sample of housing units within each sample segment is screened to identify households that contain Hispanics and blacks. The screening allows households with blacks and Hispanics to be sampled at higher rates than other households, while maintaining approximately equal probabilities of selection of households within each racial/ethnic category and age-sex domain and approximately equal interviewer workloads per sample segment (about nine interviews per segment). In the fourth stage, household members (up to a maximum of two) are oversampled in Hispanic and black households. [ In the first stage selection of PSUs, the percentage of Hispanics in metropolitan areas was one criterion that was used to determine 43 sample PSUs (out of a total of 115) that were selected with certainty (i.e., probability 1.0). 

In the second stage selection of areal segments within PSUs, the segments of the 43 certainty PSUs were divided into five geographic strata based on 1990 census data on the racial and ethnic composition of these segments: 1) High Hispanic segments, 2) Moderate Hispanic segments, 3) Low Hispanic segments, 4) High black segments, and 5) High white segments. The segments of the 72 noncertainty PSUs were similarly divided into five strata, yielding a total of ten geographic strata. Without geographic oversampling, the screening costs for obtaining the targeted numbers of Hispanic and black completed interviews would be extremely high (RTI, 1995, p. 9). ] With the overall expansion of the NHSDA sample in 1999, it is anticipated that adequate precision will be achieved for blacks and Hispanics without any oversampling.

The basic advantage of geographic oversampling is that statistics for a small subgroup are more precise if sample cases are drawn from geographic subareas in rough proportion to the numbers of subgroup members who live in those subareas. Unfortunately, even though geographic oversampling is effective in increasing the precision of estimates for blacks and Hispanics in the U.S., geographic oversampling would not greatly improve the precision of estimates for smaller racial/ethnic subgroups in the U.S. This is because, as shown by Alton and Anderson (1986), two conditions are necessary for geographic oversampling to be effective: 1) a large percentage of the small subgroup should live in the geographic subareas that are oversampled and 2) the small subgroup should make up a substantial percentage of the total population in the subareas that are oversampled. [ For simplicity, Alton and Anderson (1986) assume that the element variance of the response variable is constant across geographic strata and that the costs of including subgroup and non-subgroup members in the sample are equal. 

Even if screening and subsampling within geographic subareas are not applied (see the next section), oversampling of areas where rare subgroups are concentrated increases the precision of rare subgroup statistics by allocating subgroup sample cases across subareas in a way that is more nearly in proportion to the subgroup populations of the subareas. Unless screening and subsampling within subareas are applied, on the other hand, this kind of geographic oversampling can reduce the precision of statistics for non-subgroup members and for the total population.] Reductions of more than 20% in costs occur if 40% or more of the small subgroup resides in subareas containing no more than 5% of the total population or if 80% or more of the small subgroup resides in subareas containing no more than 20% of the total population. As suggested by Table B1, these conditions are not satisfied for small racial/ethnic subgroups in the U.S. as a whole. The value of geographic oversampling may be even less for NHSDAs conducted at the end of the 1990s, because 1990 census data on the racial/ethnic compositions of metropolitan areas and block segments are likely to become seriously outdated.
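The two conditions can be explored numerically with a two-stratum sketch. It assumes, as in the simplifying assumptions quoted above, a constant element variance and equal per-case costs, and it applies standard optimum allocation; it is a derivation under those assumptions, not necessarily the exact formula used in the cited paper. Here `a` is the fraction of the subgroup living in the oversampled stratum and `w` is that stratum's share of the total population (both names are introduced for this example).

```python
import math


def variance_ratio(a, w):
    """Variance under optimum allocation divided by variance under
    proportional allocation, for the same total number of sampled cases."""
    if not (0.0 < a < 1.0 and 0.0 < w < 1.0):
        raise ValueError("a and w must lie strictly between 0 and 1")
    return (math.sqrt(a * w) + math.sqrt((1.0 - a) * (1.0 - w))) ** 2


def optimal_rate_ratio(a, w):
    """Optimum ratio of sampling fractions (oversampled stratum / remainder),
    equal to the square root of the ratio of subgroup prevalences."""
    prevalence_ratio = (a / w) / ((1.0 - a) / (1.0 - w))
    return math.sqrt(prevalence_ratio)


if __name__ == "__main__":
    for a, w in ((0.4, 0.05), (0.8, 0.2), (0.3, 0.2)):
        print(a, w, round(variance_ratio(a, w), 3), round(optimal_rate_ratio(a, w), 2))
```

In this simplified model the ratio is about 0.80 when 40% of the subgroup lives in 5% of the population and 0.64 when 80% lives in 20%, but it stays close to 1 when the subgroup is only mildly concentrated, which is the situation Table B1 suggests for small subgroups nationally.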

Given appropriate definitions of sampling strata, the Alton and Anderson (1986) conditions are likely to be much more closely satisfied within selected metropolitan areas than in the U.S. as a whole. Many metropolitan areas where small racial/ethnic subgroups make up appreciable percentages of the population are already included with certainty as "self-representing PSUs" in the NHSDA sample (see Table B1). In at least two cases (Cubans in Miami; Puerto Ricans in New York), a large fraction of total U.S. subgroup members reside in a single metropolitan area. A cost-effective strategy would be to increase the sample size of each such PSU, while also increasing the selection probabilities of areal segments within the PSU that contained high percentages of the targeted subgroup. [ NHSDA's current technique of sampling block segments with probabilities proportional to a composite measure of size could be extended to simultaneously increase the sampling rate of a targeted small subgroup, such as Chinese-Americans, while continuing to oversample blacks and Hispanics. If $r_1$, $r_2$, and $r_3$ denote the sampling rates of the three racial/ethnic subgroups and $N_{1i}$, $N_{2i}$, and $N_{3i}$ denote the numbers of subgroup members in the i-th segment of the metropolitan area, then areal segments would be selected with probabilities proportional to the composite measure of size $M_i = \sum_{j=1}^{3} r_j N_{ji}$. This sampling approach would yield the desired mix of subgroup members and make the sample within each subgroup approximately self-weighting and the sample sizes per segment approximately equal.] A disadvantage of this approach is that precise inferences about the racial/ethnic subgroup would be restricted to individuals who lived in the oversampled metropolitan area. No satisfactorily precise estimates could be made for subgroup members who did not live in the oversampled metro area or for subgroup members in the U.S. as a whole.
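The composite measure of size described in the bracketed note can be sketched in a few lines. The rates and segment counts below are invented for illustration, and the function names are hypothetical; the point is only to show why selecting segments with probability proportional to the composite size, and then sampling subgroup j within a selected segment at its target rate divided by the segment's selection probability, gives each subgroup a constant overall rate and each segment the same expected workload.

```python
def composite_sizes(rates, counts_by_segment):
    """Composite size of each segment: sum over subgroups of rate * count."""
    return [sum(r * n for r, n in zip(rates, counts)) for counts in counts_by_segment]


def selection_probabilities(sizes, segments_to_select):
    """Probability-proportional-to-size selection probabilities."""
    total = sum(sizes)
    probs = [segments_to_select * m / total for m in sizes]
    if any(p > 1.0 for p in probs):
        raise ValueError("a segment is large enough to be taken with certainty")
    return probs


def expected_interviews(rates, counts, prob):
    """Expected interviews in a selected segment when subgroup j is sampled
    within the segment at rate r_j / prob (overall rate r_j)."""
    return sum(n * (r / prob) for r, n in zip(rates, counts))


# Hypothetical target sampling rates for three subgroups and three segments.
RATES = (0.02, 0.01, 0.005)
SEGMENTS = [(100, 50, 850), (10, 20, 970), (300, 100, 600)]

if __name__ == "__main__":
    sizes = composite_sizes(RATES, SEGMENTS)
    probs = selection_probabilities(sizes, segments_to_select=1)
    for counts, prob in zip(SEGMENTS, probs):
        print(round(prob, 3), round(expected_interviews(RATES, counts, prob), 2))
```

In this toy example every segment has the same expected number of interviews if selected, even though the segments differ sharply in composition.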

3. Screening of households with subsampling: With screening, a large sample of households is selected, and a brief questionnaire is administered to a knowledgeable household resident to determine the racial/ethnic identifications of all of the household's residents. Households that include members of a targeted racial/ethnic subgroup are included in the sample with a higher probability than households that do not include such individuals. Since racial/ethnic data for specific households cannot legally be made available from the U.S. census, screening is essential if households containing subgroup members are to be oversampled within areal segments. In general, screening is effective only if the cost of screening a household is much smaller than the cost of conducting a complete interview.

In the NHSDA, screening of households for blacks and Hispanics is used in combination with geographic oversampling of high-black and high-Hispanic PSUs and segments, but screening might also be applied without oversampling PSUs and segments. In 1999, current plans are to screen approximately 250,000 households (approximately 400,000 persons) to obtain a sample of 70,000 respondents. This screening is needed in order to obtain sufficient numbers of persons age 12-17 in the sample. Given this large sample of screened households, it would not be very costly to screen for race/ethnic minorities age 18 and older, because numbers of them would be found in the approximately 180,000 (surplus) households screened but not interviewed. It needs to be noted that most of the additional persons are in the oldest age groups, and that there are increasingly fewer "surplus" households as the age decreases, culminating in zero "surplus" households for ages 12-17. Of course, in addition to any extra screening required, any increase in the actual number of interviews would increase the survey costs proportionately. In other circumstances, the cost of screening a household for detailed racial/ethnic composition can be high, probably more than one-third the cost of conducting a standard interview. [ The Census Bureau estimated that screening for racial/ethnic composition costs one-third as much as a standard NHIS interview (Waksberg, 1995). The average interview in NHIS, like the average interview in NHSDA, lasts about one hour. Alton and Anderson (1986) point out that effective screening requires keeping the false negative rate low, which suggests using a less stringent criterion for identifying members of the rare subgroup, e.g., a screener that allows multiple ancestries. ]
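A rough cost model shows why screening cost matters so much for rare subgroups. The sketch assumes that each screened household yields a subgroup interview with probability equal to the subgroup's population share, that every identified household is interviewed, and that there is no nonresponse; costs are expressed in units of one standard interview. These simplifications and the function name are assumptions made for this example.

```python
def cost_per_subgroup_interview(prevalence, screen_cost, interview_cost=1.0):
    """Expected cost of one completed subgroup interview.

    prevalence  : chance that a screened household yields a subgroup interview
    screen_cost : cost of screening one household (interview = 1.0 by default)
    """
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    households_screened_per_interview = 1.0 / prevalence
    return households_screened_per_interview * screen_cost + interview_cost


if __name__ == "__main__":
    # Screening at one-third of an interview, for shares cited in the text.
    for share in (0.008, 0.054, 0.15):
        print(share, round(cost_per_subgroup_interview(share, 1.0 / 3.0), 1))
```

With screening at one-third the cost of an interview and a subgroup share of 0.8%, locating costs dominate: each completed interview costs the equivalent of more than 40 standard interviews. When the households have already been screened for other reasons, as with the planned 1999 surplus households, the effective `screen_cost` is much lower.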

Like geographic oversampling, screening with subsampling would be more cost-effective when applied to metropolitan areas where a targeted small racial/ethnic subgroup makes up a large percentage of the population. The most effective strategy for sampling rare subgroup members within a selected metropolitan area would be to combine geographic oversampling of areal segments with high subgroup concentrations (option #2) with screening and subsampling of subgroup households within areal segments (option #3). [ Without screening and subsampling within areal segments, geographic oversampling of areal segments within the metropolitan area would reduce the effectiveness of the overall sample design. Specifically, whites (and other non-subgroup individuals) who lived in segments with high subgroup representation would have a greater chance of being sampled than whites in predominantly white segments. Without screening, these unequal selection probabilities would result in a loss of precision in estimates of drug use prevalence for whites and for the total metropolitan area.]

4. Network sampling and related approaches. Network sampling (or multiplicity sampling as it is also called) would increase the information about rare subgroup members that is collected using the screener (option #3). If one or more rare subgroup members were found in a sample household, the screener would also ascertain not only the racial/ethnic identifications of all household members but also the names, addresses, telephone numbers, and racial/ethnic identifications of specified relatives of household members, such as parents, siblings, and children (Sirken, 1970). NHSDA interviewers would interview subgroup members living in the household and would also locate, track, and interview specified relatives living outside the household. Given the concentration of kinship ties within local areas, it would be economical to follow up only those relatives who were living in the same PSU as the screened household or within other NHSDA sample PSUs. Methods of appropriately weighting and analyzing data from network samples have been developed in recent years (e.g., Sirken, 1972; Czaja et al., 1986; Thompson, 1992). [ For an estimate to be unbiased, individuals who are eligible to enter the sample through the selection of n different households must be weighted by 1/n. The increased information obtained from networks is offset somewhat by reduced sample efficiency due to the variability of the weights.]
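The 1/n weighting rule in the bracketed note can be illustrated with a small prevalence estimator. The data layout (a household weight plus a list of reported persons, each with a drug use indicator and a multiplicity count) is a hypothetical structure chosen for this sketch, not an NHSDA file format.

```python
def multiplicity_prevalence(households):
    """Weighted prevalence from a network sample.

    households: list of (household_weight, persons), where persons is a list of
    (uses_drug, multiplicity) pairs and multiplicity is the number of households
    through which that person could have entered the sample.
    """
    weighted_users = 0.0
    weighted_persons = 0.0
    for household_weight, persons in households:
        for uses_drug, multiplicity in persons:
            if multiplicity < 1:
                raise ValueError("multiplicity must be at least 1")
            person_weight = household_weight / multiplicity
            weighted_users += person_weight * uses_drug
            weighted_persons += person_weight
    return weighted_users / weighted_persons


if __name__ == "__main__":
    sample = [
        (100.0, [(1, 1), (0, 2)]),  # a resident plus a sibling reachable via 2 households
        (100.0, [(0, 2)]),
    ]
    print(multiplicity_prevalence(sample))
```

Dividing by the multiplicity keeps people with many linked households from being overcounted, at the price of more variable weights.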

For the purpose of oversampling small racial/ethnic subgroups in the NHSDA, network sampling has four principal disadvantages:

• Respondents might be unable or unwilling to provide accurate and complete data on the race/ethnicities and addresses of relatives who are not living in the same household. In fact, the NHSDA does not currently collect either names or addresses because of the sensitive nature of the questions.

• The costs of extending the screener and of following up and interviewing relatives might be substantial, even if the sample were restricted to one or more metropolitan areas.

• Planning a network sample requires data on the sizes of familial networks and on the "network counting rules," i.e., rules linking any individual in the target population to their own and to their relatives' households. This information is not currently available for small racial/ethnic subgroups in the U.S.

• The gains in statistical precision from network sampling might be small relative to the increases in the sample sizes of rare subgroup members. The reason is that measures of substance use prevalence are likely to be positively correlated among members of the same familial network. For example, parents influence children and siblings influence each other in their decisions to use drugs. If so, design effects due to clustering of individual respondents within familial networks are likely to be appreciable.
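The last point can be quantified with the common approximation for the design effect of clustered samples, deff = 1 + (m - 1) * rho, where m is the average number of respondents per familial network and rho is the intra-network correlation of the drug use measure. The values used below are hypothetical; the source gives no estimates of either quantity.

```python
def design_effect(avg_network_size, intraclass_correlation):
    """Approximate design effect from clustering respondents within networks."""
    if avg_network_size < 1:
        raise ValueError("average network size must be at least 1")
    return 1.0 + (avg_network_size - 1.0) * intraclass_correlation


def effective_sample_size(n, avg_network_size, intraclass_correlation):
    """Sample size of an unclustered sample with the same precision."""
    return n / design_effect(avg_network_size, intraclass_correlation)


if __name__ == "__main__":
    # Hypothetical: 1,400 respondents, 3 per network, correlation 0.2.
    print(round(design_effect(3, 0.2), 2))
    print(round(effective_sample_size(1_400, 3, 0.2)))
```

Under these illustrative values, 1,400 networked respondents carry about as much information as 1,000 independent ones.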

Table B2 summarizes our assessments of the four redesign options.


Table B2. Assessment of four NHSDA sample redesign options.

1. Increase the NHSDA sample size
Overall assessment: Increases already planned will help significantly.

2. Oversampling PSUs and areal segments
Advantages: Less expensive than a nonclustered sample.
Disadvantages: Not effective for small subgroups in U.S.; depends on out-of-date 1990 census data.
Overall assessment: Effective in metros with high subgroup concentration when combined with #3.

3. Screening of households with subsampling
Advantages: Improves sampling efficiency when compared to #1.
Disadvantages: May not result in sufficient sample sizes for some small subgroups in the national population.
Overall assessment: Effective given current plans for 1999, especially for metros with high subgroup concentration.

4. Network sampling and related approaches
Advantages: Permits the identification of rare population members without screening.
Disadvantages: Needs feasibility testing and data on network sizes not currently available. Precision reduced by intra-family correlation.
Overall assessment: Quality of data on relatives outside the household and follow-up costs are major concerns.



















3 Summary and Conclusions

The preceding section suggests that, given the larger sample sizes planned for 1999 and beyond, screening and within-household oversampling of specified population groups would be the most cost-effective method of improving the precision of estimates for racial/ethnic subgroups. If the goal were to sample 1,000 persons in a specified racial/ethnic minority, then the overall sample size would need to be increased for each such group for which estimates are desired. Otherwise, varying probabilities of selection would result in lower precision for all other estimates.

Also, oversampling small racial/ethnic subgroups in one or more metropolitan areas where they are highly concentrated might be an especially effective sample redesign option. Table B1 shows that such oversampling might be feasible for specific subgroups within specified metropolitan areas. For example, Asians (mostly Chinese and Filipinos) comprised about 15% of the population of San Francisco in 1990, and Asians were more highly segregated in San Francisco than in all metro areas (47 vs. 40). Based on U.S. Bureau of the Census (1992c), other metropolitan areas that might be suitable for oversampling specific racial/ethnic subgroups include Miami (18% Cuban), New York City (7% Puerto Rican), Los Angeles (26% Mexican), and Honolulu (23% Japanese). 

Three techniques might be used to increase the yield of rare subgroup members within metropolitan areas where they are concentrated: 1) oversampling of area segments containing high percentages of the subgroup, 2) screening of households within segments and oversampling of those containing subgroup members, and 3) oversampling of subgroup members within sample households. [ The approach combines elements of several of the sampling approaches discussed in the preceding section: First, the metropolitan area containing a large concentration of the targeted subgroup would be included in the NHSDA sample with certainty. Second, area segments in the metropolitan area that contained a large percentage of targeted subgroup members, according to the U.S. Census, would be selected with higher probabilities than other segments. Third, households within selected area segments would be screened for members of the targeted subgroup, and households containing targeted subgroup members would be included in the sample with higher probabilities than other households. ]

Yet the approach of oversampling rare subgroups in selected metropolitan areas also has some disadvantages:

• Data from oversampled metropolitan area(s) cannot generally be used to make estimates for the population of rare subgroups in the U.S. as a whole. Only if subgroup members in the oversampled area(s) comprised a very large fraction (say 75% or more) of total subgroup members in the U.S. would it be defensible to compute estimates of the subgroups' drug use prevalence in the total U.S. by "poststratifying" estimates of the subgroups' drug use prevalence in the oversampled area(s). [ The basic idea of poststratification is simple: One first computes the estimated percentages of subgroup members ever using a specified drug separately for age-sex domains of the oversampled metropolitan areas, then multiplies each estimated percentage by the corresponding proportion of the national subgroup population in the age-sex domain (as estimated from the U.S. Census), and finally adds the products across age-sex domains to obtain an estimated percentage for the national population of the subgroup. Unfortunately, such poststratified national estimates can be highly biased unless the oversampled metropolitan area(s) contain a large fraction of total subgroup members in the U.S.]

• Effective oversampling of metropolitan areas depends on accurate data on the racial/ethnic compositions of subareas (tracts and block groups) within the metropolitan areas, but tract and block data from the 1990 census are likely to be seriously outdated by the late 1990s.

• Oversampling of selected metropolitan areas would not be effective for Native Americans. Only about 51% of Native Americans live in major metropolitan areas, and these Native Americans are not highly concentrated in any single metropolitan area (Harrison and Bennett, 1995). While 30% of Native Americans live on reservations or in other areas where Native Americans are highly concentrated, the remaining 70% live mostly in areas that contain 90% or more non-Native Americans (Massey et al., 1993). [ A variant of the approach of oversampling metropolitan areas applicable to Native Americans would be to select a supplementary stratified multistage sample of the approximately 30% of Native Americans who live on reservations. Reservations would be sampled in the first stage and households within reservations in the second stage. To ensure representation of major tribal affiliations, the sample of reservations could be stratified by region and selected with probabilities proportional to population size. A disadvantage is that conclusions could be made only about Native Americans on reservations.]
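The poststratification steps described in the first point above (estimate prevalence within each age-sex domain of the oversampled area, multiply by the domain's share of the national subgroup population, and sum) can be sketched as follows. The domain labels, prevalences, and shares are hypothetical values chosen only to show the mechanics.

```python
def poststratified_prevalence(domain_prevalence, national_shares):
    """Weighted sum of domain prevalences using national domain shares.

    domain_prevalence: estimated prevalence per age-sex domain in the
                       oversampled metropolitan area(s)
    national_shares  : proportion of the national subgroup population
                       in each domain (must sum to 1)
    """
    if set(domain_prevalence) != set(national_shares):
        raise ValueError("both inputs must cover the same domains")
    if abs(sum(national_shares.values()) - 1.0) > 1e-9:
        raise ValueError("national shares must sum to 1")
    return sum(domain_prevalence[d] * national_shares[d] for d in national_shares)


if __name__ == "__main__":
    # Hypothetical metro-area estimates and national age-sex shares.
    prevalence = {"male 12-25": 0.20, "female 12-25": 0.10,
                  "male 26+": 0.08, "female 26+": 0.04}
    shares = {"male 12-25": 0.15, "female 12-25": 0.15,
              "male 26+": 0.35, "female 26+": 0.35}
    print(round(poststratified_prevalence(prevalence, shares), 4))
```

The adjustment corrects only for differences in age-sex composition. If subgroup members outside the oversampled area differ in drug use within the same age-sex domain, the national estimate remains biased, which is the concern raised in the text.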

This report illustrates the potential uses as well as the limitations of an approach to improving estimates for small racial/ethnic subgroups that does not require redesign of the NHSDA sample. By pooling data across three unusually large NHSDAs conducted between 1991 and 1993, we were able to estimate overall drug use (i.e., not controlling for age, sex, or other covariates) of Native Americans, Asian/Pacific Islanders, and seven subclasses of Hispanics and to provide more detailed estimates of drug use (i.e., controlling for age, sex, and other covariates) for Asian/Pacific Islanders and four somewhat more broadly defined subclasses of Hispanics.


This page was last updated on May 19, 2008.