Individual Fairness Evaluation for Automated Essay Scoring System

Research output: Contribution to conference › Paper › Academic

3 Citations (Scopus)

Abstract

In Automated Essay Scoring (AES) systems, many previous works have studied group fairness using the demographic features of essay writers. However, individual fairness also plays an important role in fair evaluation and has not yet been explored. Introduced by Dwork et al. [10], the fundamental concept of individual fairness is that “similar people should receive similar treatment”. In the context of AES, individual fairness means that “similar essays should be treated similarly”. In this work, we propose a methodology to measure individual fairness in AES. The similarity of essays can be computed as the distance between their text representations. We compare several text representations of essays, from classical text features, such as BOW and TF-IDF, to more recent deep-learning-based features, such as Sentence-BERT and LASER. We also evaluate their behavior on paraphrased essays to understand whether they maintain the similarity ranking between original and paraphrased essays. Finally, we demonstrate how to evaluate automated scoring models with regard to individual fairness by counting the number of essay pairs that satisfy the individual fairness equation and by observing the correlation between score differences and essay distances. Our analysis suggests that Sentence-BERT, as the text representation of the essays, and Gradient Boosting, as the score prediction model, provide the best results under the proposed individual fairness evaluation methodology.
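The evaluation described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the sentence-transformers library for Sentence-BERT embeddings (the model name is an arbitrary choice), Euclidean distance between embeddings, and a hypothetical Lipschitz constant L. The fairness condition checked is the Dwork-style bound |f(x) − f(y)| ≤ L · d(x, y), applied pairwise to predicted scores.

```python
# Hypothetical sketch of the individual-fairness evaluation described above.
# Assumes sentence-transformers, scikit-learn, scipy, and numpy are installed.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import GradientBoostingRegressor

def individual_fairness_report(essays, human_scores, L=1.0):
    """Count essay pairs satisfying |f(x) - f(y)| <= L * d(x, y) and
    correlate score differences with essay distances."""
    # Sentence-BERT embeddings as the text representation (model is an assumption)
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    X = encoder.encode(essays)

    # Gradient Boosting as the score prediction model
    model = GradientBoostingRegressor().fit(X, human_scores)
    preds = model.predict(X)

    dists, score_diffs = [], []
    for i, j in combinations(range(len(essays)), 2):
        dists.append(np.linalg.norm(X[i] - X[j]))   # distance between essay representations
        score_diffs.append(abs(preds[i] - preds[j]))  # difference in predicted scores

    dists, score_diffs = np.array(dists), np.array(score_diffs)
    fair_fraction = np.mean(score_diffs <= L * dists)  # fraction of pairs meeting the bound
    rho, _ = spearmanr(dists, score_diffs)             # distance vs. score-difference correlation
    return fair_fraction, rho
```

A higher fraction of pairs satisfying the bound, together with a positive distance/score-difference correlation, would indicate a scoring model that treats similar essays similarly in the sense sketched here.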

Original language: English
DOIs
Status: Published - 18 Jul 2022
