TY - CONF
T1 - Individual Fairness Evaluation for Automated Essay Scoring System
AU - Doewes, Afrizal
AU - Saxena, Akrati
AU - Pei, Yulong
AU - Pechenizkiy, Mykola
PY - 2022/7/18
Y1 - 2022/7/18
N2 - In Automated Essay Scoring (AES) systems, many previous works have studied group fairness using the demographic features of essay writers. However, individual fairness also plays an important role in fair evaluation and has not yet been explored. Introduced by Dwork et al. [10], the fundamental concept of individual fairness is that “similar people should receive similar treatment”. In the context of AES, individual fairness means that “similar essays should be treated similarly”. In this work, we propose a methodology to measure individual fairness in AES. The similarity of essays can be computed using the distance between the text representations of the essays. We compare several text representations of essays, from classical text features, such as BOW and TF-IDF, to more recent deep-learning-based features, such as Sentence-BERT and LASER. We also evaluate their performance on paraphrased essays to determine whether they can maintain the ranking of similarities between the original and the paraphrased essays. Finally, we demonstrate how to evaluate the performance of automated scoring models with regard to individual fairness by counting the number of pairs of essays that satisfy the individual fairness equation and by observing the correlation of the score difference with the distance between essays. Our analysis suggests that Sentence-BERT, as the text representation of the essays, and Gradient Boosting, as the score prediction model, provide better results under the proposed individual fairness evaluation methodology.
AB - In Automated Essay Scoring (AES) systems, many previous works have studied group fairness using the demographic features of essay writers. However, individual fairness also plays an important role in fair evaluation and has not yet been explored. Introduced by Dwork et al. [10], the fundamental concept of individual fairness is that “similar people should receive similar treatment”. In the context of AES, individual fairness means that “similar essays should be treated similarly”. In this work, we propose a methodology to measure individual fairness in AES. The similarity of essays can be computed using the distance between the text representations of the essays. We compare several text representations of essays, from classical text features, such as BOW and TF-IDF, to more recent deep-learning-based features, such as Sentence-BERT and LASER. We also evaluate their performance on paraphrased essays to determine whether they can maintain the ranking of similarities between the original and the paraphrased essays. Finally, we demonstrate how to evaluate the performance of automated scoring models with regard to individual fairness by counting the number of pairs of essays that satisfy the individual fairness equation and by observing the correlation of the score difference with the distance between essays. Our analysis suggests that Sentence-BERT, as the text representation of the essays, and Gradient Boosting, as the score prediction model, provide better results under the proposed individual fairness evaluation methodology.
KW - Individual Fairness
KW - Automated Essay Scoring
KW - Distance Metric
KW - Rank Evaluation
KW - Automated Essay Scoring System
UR - http://www.scopus.com/inward/record.url?scp=85174809220&partnerID=8YFLogxK
U2 - 10.5281/zenodo.6853151
DO - 10.5281/zenodo.6853151
M3 - Paper
ER -