DAWN Data Files
Legacy DAWN data from 1997 through 2011 are shared in public-use files (PUFs). These are full datasets that have been treated with confidentiality protections. The PUF codebooks give more information on the specific data treatments, although they may not list every variable. Legacy DAWN used a longitudinal probability sample of hospitals in the United States. The PUFs contain information on emergency department (ED) visits related to substance use and misuse, and they can be used to track trends and to understand the impact of substance use on healthcare.
In 2018, DAWN was reestablished to use hospital-based and probability sample-based surveillance.
Versions
2015-01-20: For a small number of cases (approximately 1%), some of the drug mention variables (i.e. CATID_1_1 and TOXTEST_1) were updated to reflect the current drug categorizations from the Drug Reference Vocabulary (DRV).
Scope and Methodology Notes
GEOGRAPHIC COVERAGE: United States
DATA TYPES: Medical Records
UNIVERSE: All nonfederal, short-stay, general surgical and medical hospitals that operate at least one 24-hour ED with more than 100 annual ED visits in the United States
The re-established version of DAWN retains the valuable aspects of legacy DAWN, such as direct chart review to identify DAWN cases, an inclusive case definition in order to identify all ED visits related to substance use (not just misuse), and the collection of detailed information about the drugs involved in ED visits, using the most specific terminologies available in the ED records (for example, brand names or slang terms).
Improvements in new DAWN include more timely data, data available at more frequent intervals, and data for a wider range of geographic types, including urban, suburban, and rural areas. Having data available more quickly means that DAWN can serve as a true “early warning” system and inform public health response efforts in local areas.
Major changes to DAWN were made in 2003 as a result of a redesign intended to improve the quality and representativeness of DAWN estimates. The changes included a new design for the hospital sample, a new case definition for drug-related ED visits eligible for DAWN, revised data items submitted on these cases, a new protocol for case finding, and improved quality assurance measures. These changes created a permanent break in trends, so comparisons cannot be made between the old DAWN (2002 and prior years) and the redesigned DAWN (2004 and forward). The year 2003 was a period of transition between the old and the redesigned DAWN, and only interim, half-year estimates were produced for it.
Currently, the only public-use files available are from legacy DAWN. Several measures have been taken to protect the confidentiality of DAWN data.
In the public use file, complex design variables have been adjusted to optimize disclosure protection while preserving the original design and statistical properties of the data to the highest degree possible. Specifically, each year primary sampling units (PSUs) are randomly selected for combination or division, and original strata may be combined with adjacent strata. Self-representing PSUs may be treated as non-self-representing as a result of this process. Case weight, replicate, and PSU frame count values are adjusted to reflect changes to PSUs and strata and to further maximize disclosure protection.
PSU and strata identification values are randomized each year. While DAWN is not designed to identify the contribution or influence of a particular hospital, applied disclosure protection methods and identification value randomization preclude multilevel modeling at the hospital level and comparison of individual sampling units over time.
Although disclosure protection has been applied in a way that minimizes deviation from the original sampling error calculation model, statistical analyses generated from the public-use file may differ from results provided on the DAWN website. For online analysis using Survey Documentation and Analysis (SDA), complex design variables are used to generate statistical results but are not directly accessible. SDA uses the original design variables, modified slightly to accommodate the variance estimation capabilities of the SDA statistical program.
Original variables recoded for disclosure protection include:
- Quarter: “Month of episode” has been recoded into “quarter.”
- Day part: “Exact time of episode” has been recoded into four “day part” categories.
- Case disposition: “Chemical dependency/detox” has been combined with “psychiatric unit.” Hospitals with combined chemical dependency and psychiatric units are included in the “other inpatient unit” disposition category.
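The quarter recode is simple enough to sketch. The Python fragment below is an illustration only, not the procedure actually used to build the PUF. It collapses a month number (1 to 12) into a quarter. The function name `month_to_quarter` is hypothetical, and the use of calendar quarters is an assumption. The day-part boundaries are not stated here, so that recode is not reproduced.

```python
def month_to_quarter(month: int) -> int:
    """Collapse a calendar month (1-12) into a calendar quarter (1-4).

    Assumption: quarters follow the calendar year (Jan-Mar is quarter 1).
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return (month - 1) // 3 + 1


# Months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on.
quarters = [month_to_quarter(m) for m in range(1, 13)]
```

Because the recode is many-to-one, the original month cannot be recovered from the quarter, which is the point of the disclosure treatment.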
Several limitations of the data should be noted before using the legacy DAWN public-use files.
Information on drug-related ED visits is based on a sample and is, therefore, subject to sampling variability. Hospital participation rates in oversampled metropolitan areas typically have been 50 percent or higher. However, the participation rate in the remainder of the United States has been lower, in the range of 20 to 30 percent, since the DAWN redesign in 2003.
In any sample survey, a low response rate is of concern because it creates the opportunity for bias. That is, nonparticipating hospitals may have different characteristics than participating hospitals, possibly including differences in the drugs reported, types of drug-related ED visits, patient disposition, or population demographics.
Although every effort is made during the data collection phase to collect data accurately and precisely, existing medical records vary in specificity and detail. Therefore, factors that may affect the reliability and accuracy of the findings include the following:
- DAWN data collectors attempt to identify the exact drugs involved in an ED visit. If existing medical records include only a general description of a drug (such as “benzodiazepines” or “opiates”), the drug is grouped in a general category (such as “benzodiazepines not otherwise specified”). Similarly, records often describe a drug as amphetamine without specifying if it is methamphetamine.
- DAWN seeks to report only drugs that are related to the ED visit, not all the drugs or medications that the patient may be taking on a regular basis as prescribed by a doctor (including over-the-counter medications). If the ED record is not clear on this point, drugs may be included in the data that are not specifically related to the visit. For example, anecdotal evidence suggests that methadone may be over-reported when the medical records fail to mention that the patient is in a methadone treatment program. The opposite is also true; a current medication may be involved in the ED visit but not recognized as a contributing factor by the clinician.
For methodological information on a particular year or date range, check the codebook for the corresponding dataset.
Sample
The legacy version of DAWN employs a multistage sampling design for the selection of EDs for analysis. Stratified simple random sampling with oversampling in selected metropolitan areas is used to select the hospitals.
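As a rough illustration of stratified simple random sampling with oversampling, the Python sketch below draws hospitals independently within each stratum and attaches the inverse selection probability as a base weight. The stratum names, frame sizes, sample sizes, and the function `stratified_sample` are all hypothetical. The actual DAWN strata and allocation are described in the codebooks, not here.

```python
import random


def stratified_sample(frame, sample_sizes, seed=0):
    """Draw a simple random sample of hospitals within each stratum.

    frame: dict mapping stratum name -> list of hospital IDs.
    sample_sizes: dict mapping stratum name -> number of hospitals to select.
    Returns a dict mapping stratum name -> list of (hospital_id, base_weight),
    where base_weight = N_h / n_h is the inverse selection probability.
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, hospitals in frame.items():
        n = min(sample_sizes[stratum], len(hospitals))
        if n == 0:
            selected[stratum] = []
            continue
        chosen = rng.sample(hospitals, n)  # sampling without replacement
        weight = len(hospitals) / n
        selected[stratum] = [(h, weight) for h in chosen]
    return selected


# Hypothetical frame: one oversampled metropolitan stratum and one remainder stratum.
frame = {
    "metro_A": [f"A{i:03d}" for i in range(40)],
    "remainder": [f"R{i:04d}" for i in range(1000)],
}
# Oversampling: half of metro_A is selected, but only 5% of the remainder.
sample = stratified_sample(frame, {"metro_A": 20, "remainder": 50})
```

Oversampled strata end up with smaller base weights, so each sampled metropolitan hospital represents fewer hospitals in the frame than a sampled hospital from the remainder.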
DAWN's target sample frame consists of all nonfederal, short-stay, general medical and surgical hospitals in the United States that have one or more EDs open 24 hours a day.
DAWN cases are identified by the systematic, retrospective review of ED medical records in participating hospitals. Due to the volume of cases in some EDs, a sample of medical records may be selected for review.
Weight
The legacy version of DAWN includes a set of complex sample design variables to calculate estimates for the universe of DAWN-eligible hospitals in the United States from the sampled hospitals participating in DAWN.
The primary sampling weights reflect the probability of selection, and separate adjustment factors are included to account for sampling of ED visits, nonresponse, data quality, and the known total of ED visits delivered by the universe of eligible hospitals.
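A minimal sketch of how such a weight could be assembled, assuming each adjustment enters as a multiplicative factor, is shown below in Python. The function names, argument names, and numbers are hypothetical, the data quality adjustment is omitted because its form is not described here, and the codebooks remain the authority on how the published case weights were actually built.

```python
def case_weight(selection_prob, visit_sampling_rate, response_rate):
    """Inverse-probability base weight times two adjustment factors."""
    base = 1.0 / selection_prob          # hospital's probability of selection
    visits = 1.0 / visit_sampling_rate   # share of ED records reviewed
    nonresponse = 1.0 / response_rate    # share of sampled hospitals participating
    return base * visits * nonresponse


def benchmark_factor(hospital_weights, hospital_visits, known_total_visits):
    """Ratio that scales the weighted ED visit total to a known universe total."""
    weighted = sum(w * v for w, v in zip(hospital_weights, hospital_visits))
    return known_total_visits / weighted


# Hypothetical hospital: selected with probability 0.1, half of its records
# reviewed, in a group where 80% of sampled hospitals participate.
w = case_weight(selection_prob=0.1, visit_sampling_rate=0.5, response_rate=0.8)

# Hypothetical benchmark: two hospitals whose weighted visits total 500,000
# against a known universe total of 550,000 ED visits.
factor = benchmark_factor([10.0, 20.0], [30_000, 10_000], 550_000)
final_weight = w * factor
```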
DAWN design variables include variance estimation stratum (STRATA), PSU, replicate (REPLICATE), PSU frame count (PSUFRAME), and case weight (CASEWGT).
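To show how these variables might be used together, the Python sketch below estimates a weighted total of ED visits and a standard error from records carrying STRATA, PSU, and CASEWGT. It assumes a stratified design with PSUs treated as sampled with replacement (a common Taylor linearization approximation), ignores REPLICATE and PSUFRAME, and applies no finite population correction. The function name and the toy records are hypothetical, and the variance method prescribed in the codebook for a given year takes precedence.

```python
from collections import defaultdict
from math import sqrt


def weighted_total_with_se(records, indicator=lambda r: 1.0):
    """Estimate a weighted total and a with-replacement standard error.

    records: iterable of dicts with keys "STRATA", "PSU", and "CASEWGT".
    indicator: function returning 1.0 for visits to count, else 0.0.
    """
    # Sum the weighted contributions within each (stratum, PSU) pair.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        psu_totals[r["STRATA"]][r["PSU"]] += r["CASEWGT"] * indicator(r)

    total = 0.0
    variance = 0.0
    for psus in psu_totals.values():
        values = list(psus.values())
        n = len(values)
        total += sum(values)
        if n < 2:
            continue  # a single-PSU stratum contributes no variance in this sketch
        mean = sum(values) / n
        # Between-PSU variance of the weighted totals within the stratum.
        variance += n / (n - 1) * sum((v - mean) ** 2 for v in values)
    return total, sqrt(variance)


# Toy records (hypothetical values, not DAWN data).
records = [
    {"STRATA": 1, "PSU": 1, "CASEWGT": 10.0},
    {"STRATA": 1, "PSU": 1, "CASEWGT": 10.0},
    {"STRATA": 1, "PSU": 2, "CASEWGT": 30.0},
    {"STRATA": 2, "PSU": 1, "CASEWGT": 5.0},
    {"STRATA": 2, "PSU": 2, "CASEWGT": 15.0},
]
total, se = weighted_total_with_se(records)
```

Passing a different `indicator` restricts the estimate to a subgroup of visits while keeping every PSU in the variance calculation, which is generally preferable to filtering the records first.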