The initial investigations of computer-assisted interviewing (CAI) methods included a 1996 feasibility experiment and a series of cognitive laboratory studies that investigated specific issues related to response under CAI. In this chapter, we present a summary of these preliminary studies. Additional information can be found elsewhere (Caspar & Edwards, 1997; Research Triangle Institute [RTI], 1997).
4.1 1996 Feasibility Experiment
The 1996 feasibility experiment focused on assessing the operational feasibility of using an electronic version of the NHSDA instrument. The study was conducted in the fall of 1996 and compared two CAI versions to the 1996 paper-and-pencil interview (PAPI) survey in 20 purposively selected primary sampling units (PSUs). Each CAI instrument contained a computer-assisted personal interviewing (CAPI) section that corresponded to the 1996 PAPI instrument and an audio computer-assisted self-interviewing (ACASI) section that corresponded to the 1996 answer sheets. The two versions tested differed in their approach to the ACASI component.
The NHSDA core answer sheets7 began with a question on whether the respondent had ever used the substance, then questions on the characteristics of their use followed. In these core sections, even if the respondent had never used the substance, he or she was required to mark an answer for these subsequent questions on characteristics of use. Thus, for every question, there was a response choice labeled "I have never _______ in my life." The "answer every question strategy" was used to prevent false negative reports of substance use due to either privacy or response burden concerns on the part of the respondent and to eliminate response errors due to failure on the part of the respondent to follow the correct route through the questionnaire. Non-core answer sheets sometimes allowed the respondents to skip detailed questions that were not applicable given earlier responses.
The two CAI versions tested in 1996 differed in the structure of their ACASI components. In one version, the 1996 answer sheets were replicated almost exactly; for the core sections, respondents were not routed past the detailed questions when they reported that they had not used the particular substance. In the other version, contingent questioning was used in the core answer sheets, and respondents who reported no use of a substance were routed past the detailed questions. These two ACASI versions were called MIRROR and SKIP, respectively; the CAPI components were identical.
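The routing difference between the two versions can be sketched as follows. This is a minimal illustration only; the question list, function names, and return values are hypothetical and do not reproduce the actual instrument logic:

```python
# Hypothetical sketch of the MIRROR vs. SKIP routing difference.
# The detail questions listed here are illustrative placeholders.
DETAIL_QUESTIONS = ["age at first use", "days used in past 12 months"]

def mirror_version(ever_used: bool):
    """MIRROR: replicate the paper answer sheets. Every detail question is
    asked regardless of the lifetime-use answer; each offers an
    'I have never used ... in my life' response choice."""
    return DETAIL_QUESTIONS

def skip_version(ever_used: bool):
    """SKIP: contingent questioning. Respondents who report no lifetime use
    are routed past the detail questions."""
    return DETAIL_QUESTIONS if ever_used else []
```

Under this sketch, a SKIP respondent who reports no lifetime use answers no detail questions at all, which is why the SKIP interview ran shorter in the field.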
This 1996 feasibility experiment was conducted in 20 of the 1996 NHSDA PSUs. Within these PSUs, additional housing units were selected and randomly assigned to receive either the CAPI/MIRROR or CAPI/SKIP version. All programming and testing of the CAI instruments was completed on an expedited basis; interviewers were trained in late October 1996, and all CAI interviews were completed in November 1996.8 A total of 435 interviews were completed: 177 paper, 136 CAPI/MIRROR, and 122 CAPI/SKIP.
Additional information was collected using seven mechanisms, described as follows:
Several debriefing questions were completed by the interviewer immediately after the interview to gather information on the setting and privacy of the interview and the respondent's ability to complete the self-administered portions of the interview.
A brief discussion was held between the respondent and interviewer on the privacy, difficulty, and interest in the interview. These discussions were recorded, transcribed, and coded.
An analysis was made of a keystroke file that had recorded all keystrokes made by either interviewers or respondents during the interview.
Time stamps were subsequently analyzed to determine the time required to complete sections of the interview.
An observation form was completed by the interviewer while the respondent was working on the ACASI component of the interview.
Problem logs were maintained by in-house project staff during the data collection.
Interviewer debriefing calls were made during which operations and potential improvements were discussed.
Overall, the 1996 feasibility experiment demonstrated that a CAI approach to collection of NHSDA data was workable:
Respondents were able and willing to complete an extended ACASI interview.
Large decreases in response rates were unlikely.
CAPI reduced the time it took interviewers to complete the personal interview component.
The SKIP version was some 10 minutes shorter than the MIRROR version due to respondents answering fewer questions.
About 14% of all ACASI respondents asked the interviewer to explain a question, and about 25% asked about how to use the computer during the interview. Respondents were less likely to ask for help when using the SKIP version.
Relative to the answer sheets, interviewers were much less able to glean respondents' answers to the self-administered sections under either ACASI version. This indicated that the ACASI administration was more private, even though the privacy of the overall setting was similar.
ACASI appeared to increase reporting of past year and past month marijuana and cocaine use.
Very few respondents gave a pattern of response that indicated that they were either unwilling or unable to complete the response task once they had begun.
Based on these positive results, several laboratory experiments were conducted in 1997 to examine alternative questioning strategies under a CAI version of the NHSDA.
4.2 Cognitive Laboratory Testing
To refine the CAI NHSDA instrument prior to the 1997 field experiment, cognitive laboratory testing was conducted between April 22, 1997, and June 10, 1997. A total of 50 subjects were recruited and interviewed in RTI's Laboratory for Survey Methods and Measurement. The majority of the testing covered three specific areas: (a) the voice used for the ACASI portion of the interview, (b) a new method for collecting the 12-month frequency of use data, and (c) procedures that allow the respondent to resolve inconsistencies in his or her data at the time of interview. In addition, we tested the implementation of a "multiple use" treatment under which respondents received multiple opportunities to report use of a particular substance within the same time period.
4.2.2 Subject Recruitment and Demographics
Subjects were recruited for laboratory testing through a number of sources, including flyers placed around the Raleigh/Durham/Chapel Hill, North Carolina, area; advertisements placed in newsletters; and word of mouth. Subjects included in the test of multiple use questions were required to have drunk alcohol during the past 12 months. Most of the subjects were interviewed at RTI; however, in a few cases, interviews were scheduled for other locations when participants were unable to travel to RTI. Participants were told that the interview would last no longer than 2 hours. Each participant was paid $35 for his or her time. All the laboratory participants were 18 years of age or older. Youths were not included in this round of testing because the need for parental permission extends the time it takes to conduct laboratory testing.
4.2.3 Inconsistency Resolution Procedures
One of the most significant potential benefits of converting the NHSDA to a computer-assisted format is the chance to resolve inconsistent data at the time of the interview. However, achieving the privacy benefits of the ACASI component of the interview requires that the respondent be able to resolve inconsistencies for many items on his or her own. Thus, one of the goals of the laboratory testing was to develop a method for resolving inconsistent data that the respondent could easily understand and complete without significant intervention by the interviewer.
We developed a resolution methodology that combines two components. First, respondents were asked to verify that an answer they entered was, in fact, correct. So, for example, when a 20-year-old respondent indicated that she was 51 the first time she drank alcohol (a clearly inconsistent answer), the computer was programmed to verify that this information was correct. This step was included to help eliminate inconsistencies that may be due to keying errors. If the respondent indicated that this information was incorrect, she was then routed back to answer the question again (perhaps this time entering her age when she took her first drink as "15"). A second component incorporates the resolution of inconsistent answers. For example, a respondent who indicated drinking alcohol on 15 days in the past 12 months, but then reported drinking alcohol on 25 days in the past 30 days would first be asked to verify the last entry keyed. If he indicated that the entry was correct, then he was routed to a question that identified the inconsistency and was provided with an opportunity to fix the incorrect entry.
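The two-step logic can be sketched in Python. This is a hedged illustration only: the function names, the `ask` callback standing in for the ACASI screen, and the return values are assumptions for exposition, not the actual CAI program:

```python
# Illustrative sketch of the verification/resolution methodology.
# The inconsistency rule used here: days of use in the past 30 days
# cannot exceed days of use in the past 12 months.

def is_consistent(days_12mo, days_30day):
    """A 30-day frequency greater than the 12-month frequency is impossible."""
    return days_30day <= days_12mo

def resolve(days_12mo, days_30day, ask):
    """If the pair is inconsistent, present a resolution screen.
    `ask(options)` is a hypothetical stand-in for the ACASI screen; it
    returns the index of the option the respondent selects.  Returns the
    item that must be re-asked, or None if the answers are consistent."""
    if is_consistent(days_12mo, days_30day):
        return None
    choice = ask([
        f"I drank alcohol on {days_12mo} days in the past 12 months",
        f"I drank alcohol on {days_30day} days in the past 30 days",
        "Neither answer is correct",
    ])
    # The respondent identifies the CORRECT answer, and the instrument
    # re-asks the other (incorrect) item.
    if choice == 0:
        return "30-day question"
    if choice == 1:
        return "12-month question"
    return "both questions, in original order"
```

For the example in the text, `resolve(15, 25, ask)` would trigger the resolution screen; a respondent who confirms the 12-month answer is routed back to the 30-day question.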
In developing the actual text of these verification and resolution screens, we sought to incorporate several important features:
The original responses that were inconsistent were provided to the respondent to enhance recall and comprehension.
The resolution screens were worded so as to not explicitly place the responsibility for the inconsistency on the respondent (e.g., "your answers"), but rather to imply that the computer may be incorrect (e.g., "the computer recorded").
Respondents were asked to identify the incorrect response when two items were inconsistent to facilitate the flow of questioning.
When a respondent indicated that both answers were incorrect, he or she was routed back to the two items in the same order they were presented the first time in order to maintain a consistent flow through the instrument.
Respondents were explicitly notified when their answers were inconsistent. This was done rather than attempting to resolve the inconsistency without actually making the respondents aware of the problem.
The results indicated that in at least one case we were incorrect in our assumptions about how best to structure the verification and resolution tasks.
To maximize the efficiency of our laboratory testing, we felt we could not rely on respondents to give inconsistent answers in the laboratory testing. Even with an extremely large sample, the number of respondents who would provide inconsistent answers is quite small. Because only 40 respondents were recruited for this task, it seemed entirely possible that none of the respondents who came into the laboratory would provide an inconsistent response. For this reason, we developed a laboratory task that incorporated the use of vignettes. The vignettes we used were essentially brief descriptions of a person and his or her drinking behavior. The laboratory subject was instructed to answer a series of questions about drinking alcohol as though he or she were the person in the vignette. The subject read the vignette, then began answering the questions as they appeared on the computer screen. At a specific point in the questioning process, the subject was instructed to obtain additional information from the interviewer to be able to continue answering the questions. This additional information resulted in the subject providing inconsistent answers, which he or she was then required to verify or resolve.
The vignette methodology was somewhat artificial. That is, subjects did not answer the questions based on their own experiences. Thus, we did not learn anything about why people may give inconsistent answers (e.g., poorly worded questions, difficulty in recalling the information, desire to conceal information). However, our primary objective in this testing was to determine whether respondents could easily navigate through the verification and resolution process without becoming either confused or annoyed.
Our initial round of testing indicated one very pronounced problem. Respondents found it very confusing to be asked which of their answers was incorrect when two answers were identified as inconsistent. Respondents reported that it was much more logical to be asked which answer was correct. Even when the researcher pointed out that the reason for the question was to determine which question to re-ask, the subjects were nearly unanimous in their preference for identifying the correct answer.
Our initial testing also pointed to some problems with the vignette task itself. A number of respondents had difficulties figuring out which information in the vignette was applicable to which survey question. In our effort to make the vignettes seem more "realistic," we added information not specifically needed to complete the set of questions asked. To reduce the confusion caused by the vignettes, we scaled back the amount of information provided in each vignette to only that which was needed to complete the task.
Using the revised vignettes and the resolution process that asked the respondent to indicate which of his or her answers was correct, we began testing a second round of subjects. The second round of subjects seemed to find the resolution task much easier than subjects in the first round had found it. Subjects were able to easily select which of the answers was correct during the resolution process and understood that they were being routed back to the incorrect item to make the necessary correction. In general, respondents were not put off by the verification and resolution process. Some respondents went so far as to note that they would appreciate the computer pointing out inconsistencies in their data. Although not unanimous, most respondents preferred the less direct wording of "the computer recorded..." to the more direct wording that would say, "you reported that..." Respondents with little computer experience seemed to prefer this wording because they believed that the computer could make mistakes in how entries were stored. Respondents with greater computer literacy recognized that recording errors are made by the respondent and not the computer. However, the majority of these respondents still felt the less direct wording would be less confrontational and less likely to embarrass the respondent.
Verification screens were reported to be easier to complete than the resolution questions. Comments made by the subjects indicated that the verification screens were short and to the point. Resolution screens were reported to be "too wordy." Respondents indicated that there was so much text to read that it was easy to get confused. Therefore, for our second round of testing, we reduced the amount of text in the resolution questions. For example, in round two the resolution screen that identified inconsistencies between 12-month frequency and 30-day frequency was worded as follows:
The computer compared the answers for the last question and an earlier question. According to the answers it recorded, you drank one or more alcoholic beverages on more days in the past 30 days than in the past 12 months. This is not possible. Which of the following is correct?
I drank alcohol on [XX] days in the past 12 months
I drank alcohol on [YY] days in the past 30 days
Neither answer is correct
Subjects noted that the text in the body of the question was repeated in the response categories. This redundancy was viewed as unnecessary and sometimes confusing.
Prior to conducting the third (and last) round of laboratory interviews, we made some additional changes to the resolution screens to reduce the amount of text. Scripting was reduced again, and the revised version of the question shown above was worded as follows:
The answers for the last question and an earlier question disagree. Which answer is correct?
I drank alcohol on [XX] days in the past 12 months
I drank alcohol on [YY] days in the past 30 days
Neither answer is correct
The revised wording seemed to work well in the laboratory. Subjects were still able to complete the resolution task with little trouble, and we did not have as many complaints about the wordiness of the screens.
Based on these three rounds of testing, we developed a method for resolving inconsistent responses in the NHSDA that was tested on respondents in the 1997 field experiment.
4.2.4 12-Month Frequency of Use
Two problems with the 12-month frequency of use item have been noted over the years. First, the question is difficult to answer because it requires the respondent to recall information over a long period of time. Second, the answer categories combine a total number of days with a periodicity estimate, which confuses respondents who have "episodic" use patterns. The response categories were as follows:
More than 300 days (every day or almost every day),
At least 201 but not more than 300 days (5 to 6 days a week),
At least 101 but not more than 200 days (3 to 4 days a week),
At least 51 but not more than 100 days (1 to 2 days a week),
At least 25 but not more than 50 days (3 to 4 days a month),
At least 12 but not more than 24 days (1 to 2 days a month),
At least 6 but not more than 11 days (less than 1 day a month),
At least 3 but not more than 5 days in the past 12 months, and
At least 1 but not more than 2 days in the past 12 months.
For example, a respondent who only drinks alcohol every day during a 2-week vacation each year may have difficulty choosing a category because the number of days she drank alcohol (14) falls in a category that is also identified as "1 to 2 days a month."
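The ambiguity in this example can be made concrete with a small sketch that encodes the nine categories above. The encoding is illustrative; only the day ranges and periodicity labels come from the instrument:

```python
# The 1996 12-month frequency categories, encoded as
# (low_days, high_days, periodicity label) for illustration.
CATEGORIES = [
    (301, 365, "every day or almost every day"),
    (201, 300, "5 to 6 days a week"),
    (101, 200, "3 to 4 days a week"),
    (51, 100, "1 to 2 days a week"),
    (25, 50, "3 to 4 days a month"),
    (12, 24, "1 to 2 days a month"),
    (6, 11, "less than 1 day a month"),
    (3, 5, "3 to 5 days in the past 12 months"),
    (1, 2, "1 to 2 days in the past 12 months"),
]

def category_for(days):
    """Return the periodicity label whose day range contains `days`."""
    for low, high, label in CATEGORIES:
        if low <= days <= high:
            return label
    raise ValueError(f"{days} days is outside 1-365")
```

For the vacationing respondent, `category_for(14)` yields the "1 to 2 days a month" label, even though her 14 days of drinking were consecutive rather than spread monthly; the day count and the periodicity pull in different directions.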
To begin to understand how people work with these categories, we developed a laboratory protocol that split the two parts of the response categories. Respondents were first asked to report the number of days they drank alcohol during the past 12 months using a showcard that displayed only the text shown in parentheses (i.e., "every day or almost every day," "5 to 6 days a week," etc.). Based on the category they selected, the interviewer provided them with a numerical estimate of their 12-month frequency. In each case, the estimate given was the numeric range that corresponds to the text in parentheses. Subjects were asked whether the numeric estimate seemed right for them and if not, whether the actual frequency was higher or lower than the estimate. We also developed a parallel set of items that asked about eating candy. These items were developed for use with respondents who had not drunk alcohol during the past 12 months. Initially, we used these items only with nondrinkers, but about halfway through the testing period, we began taking every subject through both sets of materials in order to maximize the amount of information we could collect about how people work with the set of categories.
Not surprisingly, respondents had difficulty recalling their alcohol use over the past 12 months. Although having categories to choose from made the task easier, a number of subjects still reported estimating their use based on their use over just the past few months. Respondents with more sporadic use patterns had more difficulties reporting their answer than did respondents with regular use patterns. Generally, the difficulty was caused by the implicit regularity of the response alternatives. Subjects who used the substance infrequently noted that the categories were difficult to work with because they implied that the use occurred on a regular basis. One subject volunteered that although he could answer the questions for alcohol, his sporadic use of other drugs (marijuana and inhalants) would make it difficult for him to report his use accurately.
However, among respondents who were able to select a category (about 70% of the respondents), the estimate provided by the interviewer was reported to be correct. Respondents noted that because there was such a broad range to the categories, it was possible to allow for less regular use without needing to switch categories. In only four cases did the subjects indicate that their actual use was lower or higher than the estimate provided. In these cases, respondents were not changing categories based on an exact count, but rather based on "gut reactions" that led them to believe that their use was outside the range provided. In some cases, respondents seemed to agree with the estimate provided by the interviewer simply because the interviewer was considered to be the authority. Several respondents made such comments as, "if that's how it multiplies out, then it must be right" or "if you say so." Some respondents checked the interviewer's math themselves by multiplying the rate of use by the number of months or weeks in a year.
Based on this laboratory testing, it seemed that the biggest problem respondents had with the 12-month frequency of use question was dealing with the inherent periodicity contained in the categories. The question was particularly difficult for respondents with sporadic or infrequent use. In an attempt to facilitate the response task, we developed a revised 12-month frequency of use question series. This new series includes a question that allows the respondent to select the unit for reporting his or her use. Respondents can choose to report the number of days per week, the days per month, or the total number of days they used during the past 12 months. The follow-up question that the respondent receives is based on his or her choice of units.
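One way the unit-based answers might be placed on a common annual scale is sketched below. The source does not state how the instrument converted the answers, so the conversion factors, the cap at 365, and the function name are assumptions made for illustration:

```python
# Hypothetical sketch of the revised 12-month frequency series: the
# respondent first chooses a reporting unit, then the follow-up answer
# is converted to total days in the past 12 months.

def annualize(unit, count):
    """Convert a frequency reported in the respondent's chosen unit to an
    estimated total number of days in the past 12 months, capped at 365."""
    if unit == "days per week":
        total = count * 52
    elif unit == "days per month":
        total = count * 12
    elif unit == "total days":
        total = count
    else:
        raise ValueError(f"unknown unit: {unit}")
    return min(int(total), 365)
```

For example, a respondent reporting an average of 2 days per week would be credited with roughly 104 days of use in the past 12 months, while a special-occasions drinker could simply report the total count directly.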
We interviewed 10 subjects using this protocol and asked them to report any confusion or difficulties they experienced with any of the questions. With only 10 subjects, we must use caution in interpreting our results, but the general response was favorable. All but one of the subjects readily understood why the units question was being asked. Most of the subjects noted that the inclusion of the units question made the 12-month frequency question significantly easier to answer. Two respondents chose to report their use in total number of days, four chose to report the average number of days per month, and four chose to report the average number of days per week. When asked why they selected the category they did, respondents uniformly reported they had chosen their category based on the frequency of their use. The two respondents who chose to report total number of days both indicated that they drank only on special occasions (New Year's Eve, weddings, anniversaries, etc.) and thus it was easiest to count up the special occasions that had occurred during the past 12 months. Respondents who chose to report the average number of days per month did so because there were too many days for them to count up total days, but they did not drink frequently enough to report weekly. Respondents who chose to report the average number of days per week did so because they drink more often and more regularly than monthly.
Particularly for the respondents who chose to report the average number of days per month, we were concerned that their answer to the 12-month frequency question would affect how they answered the past 30-day frequency question. This proved not to be a problem, however. Respondents did not feel that their past 30-day frequency of use needed to parallel their answer for the 12-month frequency of use. The respondents commented that the 12-month frequency question was asking for an average number of days per month during the past 12 months, while the 30-day frequency question was asking about a specific 30-day period. Thus, respondents did not feel obligated to provide the same answer to the 30-day question that they had just provided to the 12-month question. Their responses were similar, however, usually falling within 1 to 3 days of their 12-month answer.
In short, the new 12-month series of questions seemed to work well. Respondents had a few comments for ways to improve the wording of the response categories to make them more clear, but overall this methodology seemed feasible and was used for the 1997 field experiment.
4.2.5 Selecting a Voice for ACASI
Originally, we had planned to examine respondent reactions to male and female voices in the 1997 field experiment by using a within-subjects design in which respondents heard some of the questions read in a female voice and some read in a male voice. We decided to drop this treatment from the experiment because there was not time at the end of the development period to record and program the entire interview in two voices. However, a cognitive laboratory study examined respondents' reactions to the different voices. Respondents listened to four pairs of voices (male and female in each pair) and were asked to indicate which pair they preferred. Following this, they listened to each voice and rated it on several voice characteristics. Respondents were able to reliably choose a preferred voice. There was some indication that respondents rated the preferred female voice as being deeper and slower and as having fewer changes in loudness than the other female voices. Among the male voices, there was little difference in the respondents' ratings of voice characteristics for the preferred voice relative to the others, although the two most preferred male voices were ranked as having higher pitch (see Caspar & Edwards, 1997, for detailed results of this study).
7 The term "answer sheets" is used because the interview was constructed so that the respondent could mark answers in response to questions read by an interviewer. In most cases, however, the respondent was allowed to complete an answer sheet without the interviewer reading the questions.
8 Because the PAPI cases were part of the regular NHSDA sample, interviewing of these cases continued through the end of 1996, which corresponds to the regular NHSDA Quarter 4 data collection period.
This page was last updated on June 16, 2008.