How does SAMHSA prepare public-use data files for release?
SAMHSA follows a series of steps for archiving each new data set. Program staff make any necessary corrections to the data and remedy any problems uncovered during data review.
Processing a study for public use requires that all variables, missing data codes, and coding schemes be standardized across all elements of a study. This stage of processing may be lengthy, depending on the data and on the completeness of the materials received. All variables must be examined to ensure that each is identified and labeled. When variables are not thoroughly described, SAMHSA staff consult the documentation and/or questionnaires.
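As a rough illustration of the standardization step, the sketch below maps each variable's study-specific missing-data codes to one common code and lists variables that still lack labels. The common code (-9), the variable names, and the source codes are hypothetical assumptions chosen for the example, not SAMHSA's actual coding scheme.

```python
# Hypothetical sketch: harmonize study-specific missing-data codes to one scheme
# and flag variables that lack labels. Codes and names are illustrative only.

STANDARD_MISSING = -9  # assumed common code for "missing" after standardization


def standardize_missing(records, missing_codes_by_var):
    """Return new records where each variable's source-specific missing codes
    are replaced by STANDARD_MISSING."""
    cleaned = []
    for rec in records:
        new_rec = {}
        for var, value in rec.items():
            codes = missing_codes_by_var.get(var, set())
            new_rec[var] = STANDARD_MISSING if value in codes else value
        cleaned.append(new_rec)
    return cleaned


def unlabeled_variables(records, labels):
    """List variables that appear in the data but have no label."""
    seen = set()
    for rec in records:
        seen.update(rec)
    return sorted(v for v in seen if not labels.get(v))


if __name__ == "__main__":
    data = [
        {"AGE": 34, "INCOME": 99999, "Q1": ""},
        {"AGE": 999, "INCOME": 52000, "Q1": "yes"},
    ]
    missing = {"AGE": {999}, "INCOME": {99999}, "Q1": {""}}
    labels = {"AGE": "Age in years", "INCOME": "Annual household income"}
    print(standardize_missing(data, missing))
    print(unlabeled_variables(data, labels))
```

Here 999, 99999, and an empty string each meant "missing" in a different variable; after standardization all three become -9, and `Q1` is reported because it has no label.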
Each study is assessed to determine whether any issues of respondent confidentiality exist and is checked for problems arising from either direct or indirect identifying variables. Direct identifiers may be blanked or deleted to safeguard privacy before the data are released to the public. Reducing the disclosure risk introduced by indirect identifiers may involve recoding the data. For example, dates may be converted to time intervals, which allows for time-lapse analyses without providing exact dates that might permit the identification of respondents. Variables such as age and income may be converted to categories.
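The recodes described above can be sketched for a single record as follows. The set of direct identifiers, the field names (`ADMIT_DATE`, `DISCHARGE_DATE`, and so on), and the age and income cut points are all hypothetical assumptions for illustration; actual disclosure-review rules are study-specific.

```python
from datetime import date

# Hypothetical sketch of the disclosure-limitation recodes described above.
# Field names, the choice of interval, and the category cut points are
# illustrative assumptions, not SAMHSA's actual rules.

DIRECT_IDENTIFIERS = {"NAME", "SSN", "PHONE"}  # assumed direct identifiers

# (low, high, label); high=None means open-ended
AGE_BINS = [(12, 17, "12-17"), (18, 25, "18-25"), (26, 34, "26-34"), (35, None, "35+")]
INCOME_BINS = [
    (0, 19999, "<$20,000"),
    (20000, 49999, "$20,000-$49,999"),
    (50000, 74999, "$50,000-$74,999"),
    (75000, None, "$75,000+"),
]


def categorize(value, bins):
    """Return the label of the first bin containing value, or None."""
    for low, high, label in bins:
        if value >= low and (high is None or value <= high):
            return label
    return None


def deidentify(record):
    """Drop direct identifiers and recode indirect ones in one record."""
    # Delete direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace exact dates with an interval (days from admission to discharge).
    admit = out.pop("ADMIT_DATE")
    discharge = out.pop("DISCHARGE_DATE")
    out["LENGTH_OF_STAY_DAYS"] = (discharge - admit).days
    # Collapse exact age and income into categories.
    out["AGE_GROUP"] = categorize(out.pop("AGE"), AGE_BINS)
    out["INCOME_GROUP"] = categorize(out.pop("INCOME"), INCOME_BINS)
    return out


if __name__ == "__main__":
    record = {
        "NAME": "Jane Doe", "SSN": "000-00-0000", "AGE": 29, "INCOME": 48000,
        "ADMIT_DATE": date(2020, 3, 1), "DISCHARGE_DATE": date(2020, 3, 15),
        "STATE": "OH",
    }
    print(deidentify(record))
```

The released record keeps a 14-day length of stay, which still supports time-lapse analysis, but no longer carries the name, Social Security number, exact dates, exact age, or exact income.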