%0 Journal Article
%T Weighted Kappas for Tables
%A Matthijs J. Warrens
%J Journal of Probability and Statistics
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/325831
%X Weighted kappa is a widely used statistic for summarizing inter-rater agreement on a categorical scale. For rating scales with three categories, there are seven versions of weighted kappa. It is shown analytically how these weighted kappas are related. Several conditional equalities and inequalities between the weighted kappas are derived. The analysis indicates that the weighted kappas measure the same thing but to a different extent. One cannot, therefore, use the same magnitude guidelines for all weighted kappas.
1. Introduction
In biomedical, behavioral, and engineering research, it is frequently required that a group of objects be rated on a categorical scale by two observers. Examples include clinicians who classify the extent of disease in patients, pathologists who rate the severity of lesions from scans, and experts who classify production faults. Analysis of the agreement between the two observers can be used to assess the reliability of the rating system. High agreement would indicate consensus in the diagnosis and interchangeability of the observers. Various authors have proposed statistical methodology for analyzing agreement. For example, for modeling patterns of agreement, the loglinear models proposed in Tanner and Young [1] and Agresti [2, 3] can be used. In practice, however, researchers are frequently interested only in a single number that quantifies the degree of agreement between the raters [4, 5]. Various statistics have been proposed in the literature [6, 7], but the most popular statistic for summarizing rater agreement is the weighted kappa introduced by Cohen [8]. Weighted kappa allows the use of weighting schemes to describe the closeness of agreement between categories.
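To make the role of the weighting scheme concrete, the following is a minimal Python sketch of the standard weighted kappa computation, (observed weighted agreement minus chance-expected weighted agreement) divided by (1 minus chance-expected weighted agreement), with agreement weights between 0 and 1. The function name `weighted_kappa`, the 3-by-3 table of counts, and the two weight matrices are illustrative assumptions and are not taken from the article. The linear weights in particular are only one commonly used ordinal scheme, included here for contrast with identity weights.

```python
def weighted_kappa(table, weights):
    """Weighted kappa for a square table of counts.

    table[i][j]   : number of objects rater 1 put in category i and rater 2 in j
    weights[i][j] : agreement weight in [0, 1], with 1 on the diagonal
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Convert counts to proportions.
    p = [[table[i][j] / n for j in range(k)] for i in range(k)]
    # Marginal proportions for rater 1 (rows) and rater 2 (columns).
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed agreement and weighted agreement expected by chance.
    observed = sum(weights[i][j] * p[i][j] for i in range(k) for j in range(k))
    expected = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 80 objects on a three-category scale (made-up data).
table = [
    [20, 5, 0],
    [5, 20, 5],
    [0, 5, 20],
]

# Identity weights: only exact agreement counts (reduces to unweighted kappa).
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# Linear weights (assumed example scheme): adjacent categories get partial credit.
linear = [[1 - abs(i - j) / 2 for j in range(3)] for i in range(3)]

kappa_identity = weighted_kappa(table, identity)
kappa_linear = weighted_kappa(table, linear)
print(kappa_identity, kappa_linear)
```

On this made-up table the two weighting schemes return different values for the same ratings, which illustrates why a single set of magnitude guidelines may not transfer between versions of weighted kappa.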
Each weighting scheme defines a different version or special case of weighted kappa. Different weighting schemes have been proposed for the various scale types. In this paper, we only consider scales of three categories. This is the smallest number of categories for which we can distinguish three types of categorical scales, namely, nominal scales, continuous-ordinal scales, and dichotomous-ordinal scales [9]. A dichotomous-ordinal scale contains a point of “absence” and two points of “presence”, for example, no disability, moderate disability, or severe disability. A continuous-ordinal scale does not have a point of “absence”. The scale can be described by three categories of “presence”, for example, low, moderate, or high. Identity weights are used when the categories are nominal [10]. In this case, weighted kappa becomes the
%U http://www.hindawi.com/journals/jps/2013/325831/