Abstract
Community-Based Question Answering (CQA) has grown in popularity as a way for people from all backgrounds to share information and knowledge. Stack Overflow is a widely used CQA website that focuses on programming problems and questions. Many of the questions posted on Stack Overflow have already been answered. However, two questions that ask the same thing can differ greatly in vocabulary and grammatical structure, which makes their semantic equivalence difficult to determine. Automatic duplicate detection saves moderators time before they take action and helps askers find solutions quickly. Finding a similar question across two websites in two different languages is harder still. The proposed approach therefore focuses on similarity detection across the English Stack Overflow (SO) and Spanish Stack Overflow (SO-ES) sites. It prepares labeled data by collecting questions from both websites and labeling them manually, and it balances the data with the Synthetic Minority Oversampling Technique (SMOTE). This work also uses machine learning techniques such as neural networks, Word Mover's Distance (WMD), and Logistic Regression to detect similar questions on the SO and SO-ES sites. The models are evaluated using standard metrics such as the confusion matrix, accuracy, and recall. Logistic Regression outperforms the other three algorithms in terms of accuracy, while WMD performs well in terms of recall.
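To make the balancing and evaluation steps mentioned in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy feature vectors, and the toy labels are all hypothetical, and a real pipeline would more likely rely on an established library. The sketch shows SMOTE-style oversampling (interpolating between a minority sample and one of its nearest minority neighbours) and the computation of accuracy and recall from a binary confusion matrix.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, SMOTE-style (illustrative only).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels; 1 = similar question pair."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all pairs that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fp, fn, tn):
    """Fraction of truly similar pairs that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: three minority-class feature vectors.
minority = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0]]
new_points = smote_oversample(minority, n_new=4)

# Hypothetical labels and predictions for eight question pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print("confusion matrix (tp, fp, fn, tn):", cm)
print("accuracy:", accuracy(*cm))
print("recall:", recall(*cm))
```

Reporting recall alongside accuracy matters on imbalanced data, where a classifier can reach high accuracy while still missing most of the similar pairs.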
Original language | English |
---|---|
Title | Proceedings - 2022 48th Latin American Computing Conference, CLEI 2022 |
Publisher | Institute of Electrical and Electronics Engineers |
ISBN (electronic) | 9781665476713 |
DOIs | |
Status | Published - 2022 |
Event | 48th Latin American Computing Conference, CLEI 2022 - Armenia, Colombia. Duration: 17 Oct 2022 → 21 Oct 2022 |
Conference
Conference | 48th Latin American Computing Conference, CLEI 2022 |
---|---|
Country/Region | Colombia |
City | Armenia |
Period | 17/10/22 → 21/10/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.