Comparison of inter-rater agreement measures

Welcome!
This demo compares the agreement among assessors in a dataset, computed with Common Agreement Phi, Krippendorff's alpha, and Percent Agreement.
Each row represents a different item, and each column represents a different assessor.
Note that this demo has a limited allocation of memory and CPU time. For extensive analyses please download the source code and run it locally.
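For reference, the sketch below (not the demo's own implementation) computes one common formulation of Percent Agreement for a matrix in this format: for every item it takes the fraction of assessor pairs that gave the same rating, then averages over items. The small matrix at the end is made up for illustration; Phi and Krippendorff's alpha additionally correct for chance agreement and are more involved to compute.

    import numpy as np
    from itertools import combinations

    def percent_agreement(ratings):
        # `ratings` is an items x assessors matrix: for each item (row), take
        # the fraction of assessor pairs that agree, then average over items.
        ratings = np.asarray(ratings)
        per_item = []
        for row in ratings:
            pairs = list(combinations(row, 2))
            agreeing = sum(1 for a, b in pairs if a == b)
            per_item.append(agreeing / len(pairs))
        return float(np.mean(per_item))

    # Illustrative matrix: 3 items rated by 3 assessors on a 1-5 scale.
    example = [[1, 1, 2],
               [3, 3, 3],
               [5, 4, 5]]
    print(percent_agreement(example))  # 5/9 = 0.555..., since 5 of the 9 within-item pairs agree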
You can input your own file or generate a random example:

Random example

The following table contains a randomly generated example:

1,3,3,2,2
3,3,5,5,1
4,2,2,5,4
4,2,3,1,4
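A matrix like the one above can be generated with a few lines of NumPy, as in the sketch below; the 4x5 shape and the 1-5 scale mirror the example shown and are assumptions, not constraints of the demo.

    import numpy as np

    # Draw a random items x assessors matrix of integer ratings.
    rng = np.random.default_rng()
    example = rng.integers(low=1, high=6, size=(4, 5))  # high is exclusive, so ratings are 1-5
    print(example)  # one row per item, one column per assessor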

Upload your own file

You can also upload a numeric CSV file and compute the agreement measures on it. Each row represents a different item, and each column represents a different assessor.
The file must conform to the following (see the loading and validation sketch after the note below):
  • CSV format;
  • no header;
  • both extremes of the scale appear at least once in the file.
(You can access a version of Phi without such limitations here)
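The sketch below shows one way such a file could be loaded and checked before computing the measures; the 1-5 scale bounds and the file name are hypothetical, and this is not the demo's own validation code.

    import numpy as np

    # Hypothetical scale bounds for a 1-5 rating scale; adjust to your data.
    SCALE_MIN, SCALE_MAX = 1, 5

    def load_ratings(path):
        # Load a headerless numeric CSV with one row per item and one column
        # per assessor, then check that both extremes of the scale appear.
        data = np.loadtxt(path, delimiter=",", ndmin=2)
        if not ((data == SCALE_MIN).any() and (data == SCALE_MAX).any()):
            raise ValueError("Both extremes of the scale must appear at least once in the file.")
        return data

    ratings = load_ratings("ratings.csv")  # hypothetical file name
    print(ratings.shape)                   # (n_items, n_assessors)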

Results

Results will appear here after you generate a random example or upload a file.

More information in our paper

Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing. Alessandro Checco, Kevin Roitero, Eddy Maddalena, Stefano Mizzaro, and Gianluca Demartini. In Proceedings of the 5th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2017).

BibTeX

@inproceedings{checco2017let,
  title={Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing},
  author={Checco, A and Roitero, K and Maddalena, E and Mizzaro, S and Demartini, G},
  booktitle={Proceedings of the Fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP-17)},
  pages={11--20},
  year={2017},
  organization={AAAI Press}
}
Demo by Eddy Maddalena and Alessandro Checco. Email: a.checco@sheffield.ac.uk.