Evaluating Quadratic Weighted Kappa as the Standard Performance Metric for Automated Essay Scoring

Afrizal Doewes, Nughthoh Kurdhi, Akrati Saxena

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of AES models, the Quadratic Weighted Kappa (QWK) is commonly used as the evaluation metric. However, we have identified several limitations of using QWK as the sole metric for evaluating AES model performance. These limitations include its sensitivity to the rating scale, the potential for the so-called “kappa paradox” to occur, the impact of prevalence, the impact of the position of agreements in the diagonal agreement matrix, and its limitation in handling a large number of raters. Our findings suggest that relying solely on QWK as the evaluation metric for AES performance may not be sufficient. We further discuss insights into additional metrics to comprehensively evaluate the performance and accuracy of AES models.
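To make the prevalence effect named in the abstract concrete, here is a minimal Python sketch. It assumes the common QWK definition with disagreement weights (i - j)^2 / (k - 1)^2 on a k-point scale. The function names and the two agreement tables are hypothetical illustrations and are not taken from the paper. Both tables have the same 85% raw agreement between human and machine scores, yet the skewed one yields a much lower kappa.

```python
def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """QWK for two equally long lists of integer scores on [min_score, max_score]."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1
    if k < 2:
        raise ValueError("the rating scale needs at least two points")
    n = len(human)

    # Observed agreement matrix: rows are human scores, columns are machine scores.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1

    row_totals = [sum(r) for r in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            # Count expected by chance if the two raters were independent.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        # Both raters used one and the same category; kappa is undefined here.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


def expand_table(table, min_score):
    """Turn an agreement table of counts into paired score lists (hypothetical helper)."""
    human, machine = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            human += [i + min_score] * count
            machine += [j + min_score] * count
    return human, machine


# Hypothetical tables on a two-point scale, each with 85 of 100 essays in agreement.
balanced = [[40, 9], [6, 45]]
skewed = [[80, 10], [5, 5]]

kappa_balanced = quadratic_weighted_kappa(*expand_table(balanced, 1), 1, 2)
kappa_skewed = quadratic_weighted_kappa(*expand_table(skewed, 1), 1, 2)
print(round(kappa_balanced, 2))  # about 0.70
print(round(kappa_skewed, 2))    # about 0.32
```

On a two-point scale the quadratic weights reduce to unweighted Cohen's kappa, which keeps the arithmetic easy to check by hand. On longer scales the same function applies unchanged.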
Original language: English
Title of host publication: Proceedings of the 16th International Conference on Educational Data Mining
Editors: Mingyu Feng, Tanja Käser, Partha Talukdar
Publisher: International Educational Data Mining Society (IEDMS)
Pages: 103-113
Number of pages: 11
ISBN (electronic): 978-1-7336736-4-8
DOIs:
Status: Published - 11 Jul 2023
Event: 16th International Conference on Educational Data Mining, EDM 2023 - Bengaluru, India
Duration: 11 Jul 2023 to 14 Jul 2023

Conference

Conference: 16th International Conference on Educational Data Mining, EDM 2023
Abbreviated title: EDM 2023
Country/Territory: India
City: Bengaluru
Period: 11/07/23 to 14/07/23
