Upon receipt, questionnaires are checked for critical identification and demographic data and then keyed to disk, creating a file with one record for each completed interview. Machine editing routines, called logical imputation, perform extensive within-record consistency checks and resolve most inconsistencies and missing data. For some key variables that still have missing values after logical imputation, statistical imputation replaces the missing data with appropriate valid response codes. Two types of statistical imputation are used. Hot-deck imputation replaces a missing value with a valid code taken from another respondent who is "similar" and has complete data. Logistic regression models are also used to determine replacement values for some variables.
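The hot-deck step can be illustrated with a minimal sketch. It assumes records are Python dictionaries, that "similar" means sharing the same values on a set of matching variables (an imputation class), and that a donor is drawn at random from the complete cases in that class. The helper `hot_deck_impute` and field names such as `age_group`, `sex`, and `income` are hypothetical; the NHSDA's actual donor-matching rules are not described here.

```python
import random
from typing import Any, Dict, List, Optional, Sequence

Record = Dict[str, Any]


def hot_deck_impute(
    records: List[Record],
    target: str,
    match_keys: Sequence[str],
    seed: Optional[int] = 0,
) -> List[Record]:
    """Return copies of `records` with missing (None) values of `target`
    filled from a randomly chosen donor in the same imputation class.

    Hypothetical sketch: an imputation class is the tuple of values of
    `match_keys`; a production system would use its own matching rules.
    """
    rng = random.Random(seed)

    # Collect donor values (complete cases) for each imputation class.
    donors: Dict[tuple, List[Any]] = {}
    for rec in records:
        if rec[target] is not None:
            cls = tuple(rec[k] for k in match_keys)
            donors.setdefault(cls, []).append(rec[target])

    flag = target + "_imputed"
    result: List[Record] = []
    for rec in records:
        new = dict(rec)  # copy so the input records are not modified
        new[flag] = False
        if new[target] is None:
            pool = donors.get(tuple(rec[k] for k in match_keys))
            if pool:  # leave the value missing if the class has no donor
                new[target] = rng.choice(pool)
                new[flag] = True
        result.append(new)
    return result
```

In this sketch a recipient whose class has no donor keeps its missing value; a fuller implementation would typically collapse classes until a donor is found. The `_imputed` flag records which values were filled so analysts can distinguish reported from imputed data.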
Data are generally released to the public about six months after the end of data collection. Public use data files are available 1-2 years after completion of data collection.
Each record (i.e., respondent) is assigned an analysis weight which incorporates:
- a. The inverse of the respondent's overall selection probability, which is the product of the inverses of the selection probabilities at each stage of sampling.
- b. Adjustments for household-level and person-level nonresponse.
- c. A poststratification adjustment to Census projections of the civilian noninstitutionalized population of the United States for the midpoint of each NHSDA data collection period. The adjustment is made to the age, sex, and race/ethnicity distributions (see Appendix 2 for a discussion of the poststratification adjustment).
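The three weight components can be sketched as successive multiplicative factors. This is a simplified illustration, not the NHSDA production procedure: it assumes a single nonresponse adjustment within hypothetical weighting classes (the survey adjusts separately for household and person nonresponse) and a single cell-by-cell ratio poststratification to hypothetical control totals. The function names, the field names `weight` and `responded`, and any class or cell labels are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence

Unit = Dict[str, Any]


def base_weight(stage_probs: Sequence[float]) -> float:
    """(a) Inverse of the overall selection probability, computed as the
    product of the inverses of the per-stage selection probabilities."""
    w = 1.0
    for p in stage_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid selection probability: {p}")
        w /= p
    return w


def adjust_for_nonresponse(units: List[Unit], class_key: str) -> List[Unit]:
    """(b) Within each (hypothetical) weighting class, scale respondents'
    weights by (total weight of eligible units) / (total weight of
    respondents), so respondents also represent nonrespondents.
    Nonrespondents are dropped from the returned list."""
    eligible: Dict[Hashable, float] = defaultdict(float)
    responding: Dict[Hashable, float] = defaultdict(float)
    for u in units:
        eligible[u[class_key]] += u["weight"]
        if u["responded"]:
            responding[u[class_key]] += u["weight"]
    return [
        {**u, "weight": u["weight"] * eligible[u[class_key]] / responding[u[class_key]]}
        for u in units
        if u["responded"]
    ]


def poststratify(
    units: List[Unit], cell_key: str, controls: Dict[Hashable, float]
) -> List[Unit]:
    """(c) Ratio-adjust weights so each poststratum's weighted total equals
    its external control total (e.g. a population projection).
    Raises KeyError if a cell has no control total."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for u in units:
        totals[u[cell_key]] += u["weight"]
    return [
        {**u, "weight": u["weight"] * controls[u[cell_key]] / totals[u[cell_key]]}
        for u in units
    ]
```

Because each step multiplies the weight produced by the previous one, the final analysis weight is the product of the base weight, the nonresponse factor, and the poststratification factor. In practice the poststratification cells would be cross-classifications of age, sex, and race/ethnicity rather than a single variable.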
This page was last updated on June 16, 2008.