A Survey Design for a Sensitive Binary Variable Correlated with Another Nonsensitive Binary Variable

Tian et al. (2007) introduced a so-called hidden sensitivity model for evaluating the association between two sensitive questions with binary outcomes. In practice, however, we sometimes need to assess the association between one sensitive binary variable (e.g., whether or not the respondent is a drug user, or whether the number of sexual partners is ≤1 or >1) and one nonsensitive binary variable (e.g., good or poor health status, or with or without cervical cancer). To address this issue, in this paper we propose a new survey scheme, called the combination questionnaire design/model, which makes full use of the information contained in the nonsensitive binary variable and consists of a main questionnaire and a supplemental questionnaire. The supplemental questionnaire, which is in effect a direct-questioning design, can effectively reduce noncompliance behavior because fewer respondents are faced with the sensitive question. Likelihood-based inferences for the parameters of interest are derived, including maximum likelihood estimates via the expectation-maximization algorithm, asymptotic confidence intervals, and bootstrap confidence intervals. A likelihood ratio test is provided to test the association between the two binary random variables. Bayesian inferences are also discussed. Simulation studies are performed, and a cervical cancer data set from Atlanta is used to illustrate the proposed methods.

1. Introduction

Warner [1] introduced a randomized response technique to obtain truthful answers to questions with sensitive attributes. Using the Warner design, Kraemer [2] derived a bivariate correlation between an attribute with polytomous responses and an attribute with normally distributed responses.
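As a brief illustration of the randomized response idea in Warner [1], the following Python sketch simulates the standard Warner design and recovers the sensitive proportion from the observed "yes" rate. It assumes the textbook form of the design: each respondent is asked, with a known probability p, whether they have the sensitive attribute, and otherwise whether they do not have it, so that P(yes) = p·π + (1 − p)(1 − π). The function names and parameter values are hypothetical and chosen only for illustration; this is not the combination questionnaire design proposed in this paper.

```python
import random


def warner_responses(true_status, p, rng):
    """Simulate yes/no answers under the Warner design (illustrative only).

    true_status: list of 0/1 values, 1 meaning the sensitive attribute is present.
    p: probability that the randomization device selects the direct statement.
    rng: a random.Random instance, so the simulation is reproducible.
    """
    answers = []
    for status in true_status:
        asked_direct = rng.random() < p
        # Direct statement: truthful answer is the status itself.
        # Complementary statement: truthful answer is the opposite.
        answers.append(status if asked_direct else 1 - status)
    return answers


def warner_estimate(answers, p):
    """Moment estimator of the sensitive proportion pi.

    Solves lambda = p*pi + (1 - p)*(1 - pi) for pi, where lambda is the
    observed proportion of 'yes' answers. Undefined when p = 0.5.
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5 for pi to be identifiable")
    yes_rate = sum(answers) / len(answers)
    return (yes_rate - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n, true_pi, p = 20000, 0.3, 0.7  # hypothetical values
    population = [1 if rng.random() < true_pi else 0 for _ in range(n)]
    answers = warner_responses(population, p, rng)
    print(f"estimated pi: {warner_estimate(answers, p):.3f}")
```

The interviewer never learns which statement a respondent answered, which is what protects privacy; the cost is extra variance, which grows as p approaches 0.5.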
Fox and Tracy [3] derived an estimator of the Pearson product-moment correlation coefficient between two sensitive questions by assuming that randomized response observations can be treated as individual-level scores contaminated by random measurement error. Edgell et al. [4] considered the correlation between two sensitive questions using the unrelated question design or the additive constants design. Christofides [5] presented a randomized response technique with two randomization devices to estimate the proportion of individuals who have two sensitive characteristics at the same time. Kim and Warde [6] considered a multinomial randomized response model that can handle untruthful responses. They also derived the Pearson product-moment correlation estimator, which may be used to quantify the linear relationship between two variables when


