%0 Journal Article
%T Power Weighted Versions of Bennett, Alpert, and Goldstein's S
%A Matthijs J. Warrens
%J Journal of Mathematics
%D 2014
%I Hindawi Publishing Corporation
%R 10.1155/2014/231909
%X A weighted version of Bennett, Alpert, and Goldstein's S is studied. It is shown that the special cases of the weighted coefficient are often ordered in the same way. It is also shown that many of its special cases tend to produce values close to unity, especially when the number of categories of the rating scale is large. It is argued that the application of the weighted S as an agreement coefficient is not without difficulties.

1. Introduction. In behavioral and biomedical science it is frequently necessary to measure the intensity of a behavior or a disease. Examples are the degree of arousal of a speech-anxious participant while giving a presentation, the severity of lesions from scans, or the severity of sedation during opioid administration for pain management. The intensity of these phenomena is usually classified by a single observer using a rating scale with ordered categories, for example, mild, moderate, or severe. To ensure that the observer fully understands what he or she is asked to interpret, the categories must be clearly defined. To measure the reliability of the rating scale, researchers typically ask two observers to independently rate the same set of subjects. Analysis of the agreement between the observers can then be used to assess the reliability of the scale. High agreement between the ratings of the observers usually indicates consensus in the diagnosis and interchangeability of the classifications of the observers.

For assessing agreement on an ordinal scale, various statistical methodologies have been developed. For example, the loglinear models presented in Tanner and Young [1] and Agresti [2, 3] can be used for analyzing the patterns of agreement and potential sources of disagreement. Applications of these models can be found in Becker [4] and Graham and Jackson [5]. However, researchers are usually interested only in a coefficient that (roughly) summarizes the agreement in a single number. The most commonly used coefficient for summarizing agreement on an ordinal scale is weighted kappa, proposed in Cohen [6] ([5, 7]). Cohen [8] proposed coefficient kappa as an index of agreement when the rating scale has nominal (unordered) categories [9]. The coefficient corrects for agreement due to chance. Weighted kappa extends Cohen's original kappa to rating scales with ordered categories. In the latter case there is usually more disagreement between the observers on adjacent categories than on categories that are further apart. With weighted kappa it is possible to describe the closeness between categories using weights. Both kappa and
%U http://www.hindawi.com/journals/jmath/2014/231909/
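
The introduction above refers to Cohen's kappa, weighted kappa, and Bennett, Alpert, and Goldstein's S without stating their formulas. As a minimal sketch, the standard definitions from the agreement literature are given below; the notation ($p_{ij}$, $p_o$, $p_e$, $w_{ij}$, $c$) is assumed here and is not taken from the record itself.

\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \sum_i p_{ii},
\quad
p_e = \sum_i p_{i\cdot}\, p_{\cdot i},
\]
where $p_{ij}$ is the proportion of subjects placed in category $i$ by the first observer and category $j$ by the second. Weighted kappa introduces disagreement weights $w_{ij}$ (for example, quadratic weights $w_{ij} = (i-j)^2$ for ordered categories):
\[
\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}.
\]
Bennett, Alpert, and Goldstein's S replaces the chance term $p_e$ by the uniform value $1/c$ for a scale with $c$ categories:
\[
S = \frac{p_o - 1/c}{1 - 1/c} = \frac{c\, p_o - 1}{c - 1}.
\]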