|
Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents

Abstract: In this paper we argue that categorizing continuous variables into quantiles is highly problematic and present several potential alternatives. We also discuss the perceived drawbacks of these newer statistical methods and the possible reasons for their slow adoption by epidemiologists. The use of quantiles is often inadequate for epidemiologic research with continuous variables.

Epidemiology is often introduced using examples in which both exposure and outcome are considered in binary terms: research participants are defined as having, say, lung cancer or not, and being smokers or not, and then the proportion of smokers is compared between cases and controls. Many exposures, however, are inherently continuous. Indeed, in the classic case-control study on smoking and lung cancer[1], Doll and Bradford-Hill report results for cases and controls both in terms of the proportion of smokers and by "amount of tobacco consumed", grouped into several categories such as 1-4, 15-24 or 50+ cigarettes per day.

In contemporary epidemiologic practice, it is more customary to group continuous variables into quantiles (most often tertiles, quartiles or quintiles) based on the exposure's distribution. In one recent study, for example, researchers examining the link between dietary fat and breast cancer grouped fat intake into quintiles. They reported that women in the highest quintile of fat intake were 11% more likely to get breast cancer than women in the lowest quintile[2]. As another example, surgeon annual caseload was found to be significantly associated with the survival of patients after an acute myocardial infarction[3].
The authors reported that the 30-day mortality rate was 13.5% for physicians in the lowest quartile of volume (5 or fewer cases per year) compared to 11.8% for physicians in the highest quartile (more than 24 cases annually).

A number of researchers have commented on the disadvantages of categorization in epidemiologic studies[4]. Many associations can be tested using linear models and p
|