TY - GEN
T1 - Forecasting Publication’s Success Using Machine Learning
AU - Alchokr, Rand
AU - Haider, Rayed
AU - Shakeel, Yusra
AU - Leich, Thomas
AU - Saake, Gunter
AU - Krüger, Jacob
PY - 2023
Y1 - 2023
N2 - Measuring the success and impact of a scientific publication is an important and thus controversial matter. Despite all the criticism, citation counts are widely considered an indicator of a publication’s success. Therefore, in this paper, we use a machine learning framework to test the ability of alternative metrics (altmetrics) to predict the future impact of papers as reflected in their citation counts. For the experiment, we extracted 7,588 papers from 10 computer science journals. To build the feature space for the prediction problem, we collected 14 different altmetric indices. We used three feature selection approaches, namely variance threshold, Pearson’s correlation, and mutual information, to reduce the feature space and rank the features according to their contribution to the original dataset. To assess the classification performance of these features, we used three classifiers: Decision Tree, Random Forest, and Support Vector Machine. According to the experimental data, altmetrics can predict future citations, and the most useful altmetric indicators are social media count, tweets, news count, capture count, and full-text view, with Random Forest outperforming the other classifiers.
KW - Bibliometric
KW - Alternative metrics
KW - Machine learning
KW - Computer Science
M3 - Conference contribution
T3 - CEUR Workshop Proceedings
SP - 77
EP - 89
BT - BIR 2023 : Bibliometric-enhanced Information Retrieval
A2 - Frommholz, Ingo
A2 - Mayr, Philipp
A2 - Cabanac, Guillaume
A2 - Verberne, Suzan
A2 - Brennan, Jordan
PB - CEUR-WS.org
T2 - 13th International Workshop on Bibliometric-enhanced Information Retrieval, BIR 2023
Y2 - 2 April 2023 through 2 April 2023
ER -