Theoretically, a confidence interval is formed by adding to and subtracting from kappa the critical value for the desired confidence level times the standard error of kappa. Since the most frequently desired level is 95%, the constant 1.96 is used to multiply the standard error of kappa (SE). The confidence interval formula is therefore: κ ± 1.96 × SE. An example of the calculated kappa statistic is shown in Figure 3. Note that the percent agreement is 0.94, while the kappa is 0.85 – a significant reduction in the level of congruence. The greater the expected chance agreement, the lower the resulting value of kappa. The calculation is a simple procedure when the values are limited to zero and one and there are two data collectors; if there are more data collectors, the procedure is a little more complex (Table 2). However, as long as only two values are possible, the calculation remains simple: the researcher calculates the percentage agreement for each row and then averages across the rows. Another advantage of the matrix is that it allows the researcher to determine whether errors are random, and therefore fairly evenly distributed across all data collectors and variables, or whether a particular data collector frequently records values that differ from those of the other data collectors.
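As a rough illustration of that row-by-row procedure, the sketch below computes the percent agreement for each variable across several data collectors and then averages over variables. The variable names, the scores, and the definition of agreement as the share of collectors choosing the majority value are assumptions made for this example, not values taken from Table 2.

```python
from typing import Dict, List

def agreement_by_variable(scores: Dict[str, List[int]]) -> Dict[str, float]:
    """Percent agreement per variable for several data collectors.

    `scores` maps each variable to the values (e.g. 0/1) the collectors assigned
    to it. With only two possible values, agreement on a variable is taken here
    as the share of collectors who chose the majority value (an assumption).
    """
    agreement = {}
    for variable, values in scores.items():
        majority = max(set(values), key=values.count)  # most frequent value
        agreement[variable] = values.count(majority) / len(values)
    return agreement

# Hypothetical scores from four data collectors on three variables.
scores = {
    "Variable 1":  [1, 1, 1, 1],   # full agreement
    "Variable 2":  [0, 0, 0, 1],   # one outlier score
    "Variable 10": [1, 0, 1, 0],   # low agreement -> worth reviewing
}
per_variable = agreement_by_variable(scores)
for variable, pct in per_variable.items():
    print(f"{variable}: {pct:.0%} agreement")
print(f"Overall: {sum(per_variable.values()) / len(per_variable):.0%}")
```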

In Table 2, which shows an overall interrater reliability of 90%, no data collector had an excessive number of outlier scores (scores that disagreed with the scores given by the majority of the raters). Another advantage of this technique is that it allows the researcher to identify variables that may be problematic. Note that Table 2 shows the raters achieved only 60% agreement for Variable 10. This variable may warrant review to determine the cause of such low agreement in its assessment.

To calculate Pe (the probability of chance agreement), we sum, over the categories, the product of each rater's marginal proportions: Pe = Σ_k (n_k1 / N) × (n_k2 / N), where n_ki is the number of items that rater i assigned to category k and N is the total number of items. The formula for Cohen's kappa for two raters is: κ = (Po − Pe) / (1 − Pe), where Po is the relative observed agreement among the raters and Pe is the hypothetical probability of chance agreement. If you have only two categories, Scott's pi statistic (with a confidence interval constructed by the Donner and Eliasziw (1992) method) is more reliable for inter-rater agreement than kappa (Zwick, 1988). Note that a separate standard error is used when testing kappa, namely the standard error computed under the null hypothesis that kappa is zero for all k categories. Kappa-hat is calculated as in the m > 2, k = 2 case shown above. Under multinomial sampling, the sample kappa has an approximately normal distribution in large samples; for the sample variance, see Agresti (2013), p. 435. Thus we can rely on the usual asymptotic 95% confidence interval.
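Putting the kappa formula and the 95% confidence interval together, a minimal sketch for two raters might look like the following. The ratings are invented, and the standard error formula used here, SE = sqrt(Po(1 − Po) / (N(1 − Pe)²)), is a common large-sample approximation assumed for the example rather than the variance derived in Agresti (2013).

```python
import math
from collections import Counter

def cohen_kappa_with_ci(ratings_a, ratings_b, z=1.96):
    """Cohen's kappa for two raters plus an approximate 95% CI (kappa +/- z*SE).

    SE uses the large-sample approximation sqrt(Po*(1-Po) / (N*(1-Pe)**2)).
    """
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)

    # Po: proportion of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Pe: sum over categories of the product of the raters' marginal proportions.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)

# Two raters coding 20 items with binary codes (0/1); data are hypothetical.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
rater_2 = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
kappa, (lo, hi) = cohen_kappa_with_ci(rater_1, rater_2)
print(f"kappa = {kappa:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

With only 20 items the interval is wide; as noted above, the asymptotic confidence interval is only dependable for reasonably large samples.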

Another factor is the number of codes. As the number of codes increases, kappa values tend to be higher. Based on a simulation study, Bakeman and colleagues concluded that for fallible observers, kappa values were lower when there were fewer codes.
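A minimal simulation sketch of this effect follows. It is not Bakeman and colleagues' actual design: the 20% error rate, the uniform misclassification model, and the function names are assumptions chosen only to show how chance agreement, and with it kappa, varies with the number of codes when observers are fallible.

```python
import random

def simulate_kappa(n_codes: int, n_items: int = 1000, error_rate: float = 0.2,
                   seed: int = 0) -> float:
    """Rough simulation of two fallible observers coding the same items.

    Each observer records the true code with probability 1 - error_rate and
    otherwise picks one of the other codes at random (an assumed error model).
    Returns Cohen's kappa for the two simulated observers.
    """
    rng = random.Random(seed)
    codes = list(range(n_codes))

    def observe(true_code: int) -> int:
        if rng.random() < error_rate:
            return rng.choice([c for c in codes if c != true_code])
        return true_code

    truth = [rng.choice(codes) for _ in range(n_items)]
    a = [observe(t) for t in truth]
    b = [observe(t) for t in truth]

    p_o = sum(x == y for x, y in zip(a, b)) / n_items
    p_e = sum((a.count(c) / n_items) * (b.count(c) / n_items) for c in codes)
    return (p_o - p_e) / (1 - p_e)

# Same error rate, increasing number of codes: kappa rises as chance agreement falls.
for k in (2, 3, 5, 10):
    print(f"{k:>2} codes: kappa = {simulate_kappa(k):.2f}")
```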