TY - GEN
T1 - Statistical tests for joint analysis of performance measures
AU - Benavoli, Alessio
AU - de Campos, Cassio P.
PY - 2016/1/8
Y1 - 2016/1/8
N2 - Recently there has been increasing interest in the development of new methods using Pareto optimality to deal with multiobjective criteria (for example, accuracy and architectural complexity). Once one has learned a model based on their devised method, the problem is then how to compare it with the state of the art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Unfortunately, the standard tests used for this purpose are not able to jointly consider multiple performance measures. The aim of this paper is to resolve this issue by developing statistical procedures that are able to account for multiple competing measures at the same time. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend them by discovering conditional independences among measures to reduce the number of parameters of such models, since the number of cases studied in such comparisons is usually very small. Real data from a comparison among general-purpose classifiers is used to show a practical application of our tests.
AB - Recently there has been increasing interest in the development of new methods using Pareto optimality to deal with multiobjective criteria (for example, accuracy and architectural complexity). Once one has learned a model based on their devised method, the problem is then how to compare it with the state of the art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Unfortunately, the standard tests used for this purpose are not able to jointly consider multiple performance measures. The aim of this paper is to resolve this issue by developing statistical procedures that are able to account for multiple competing measures at the same time. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend them by discovering conditional independences among measures to reduce the number of parameters of such models, since the number of cases studied in such comparisons is usually very small. Real data from a comparison among general-purpose classifiers is used to show a practical application of our tests.
UR - http://www.scopus.com/inward/record.url?scp=84955318574&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-28379-1_6
DO - 10.1007/978-3-319-28379-1_6
M3 - Conference contribution
SN - 978-3-319-28378-4
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 76
EP - 92
BT - Advanced Methodologies for Bayesian Networks - 2nd International Workshop, AMBN 2015, Proceedings
A2 - Suzuki, Joe
A2 - Ueno, Maomi
PB - Springer
CY - Berlin
T2 - 2nd International Workshop on Advanced Methodologies for Bayesian Networks, AMBN 2015
Y2 - 16 November 2015 through 18 November 2015
ER -