Question Similarity Detection on Stack Overflow Sites

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Community-based question answering (CQA) has grown in popularity as a way for people from all backgrounds to share information and knowledge. Stack Overflow is a widely used CQA website focused on programming problems and questions. Many questions posted on Stack Overflow have already been answered in earlier posts. However, two questions that ask the same thing can use very different vocabulary and grammatical structures, which makes it difficult to determine whether they are semantically equivalent. Automatic duplicate detection saves moderators time before they take action and helps askers find solutions quickly. Finding similar questions across two websites written in two different languages is harder still. The proposed approach therefore focuses on similarity detection across the English and Spanish Stack Overflow websites. It builds a labeled dataset by collecting questions from both websites and labeling them manually, and it applies the Synthetic Minority Oversampling Technique (SMOTE) to balance the data. To detect similar questions on the SO and SO-ES sites, the work uses machine learning techniques such as neural networks, Word Mover's Distance (WMD), and logistic regression. The models are evaluated with standard metrics: the confusion matrix, accuracy, and recall. Logistic regression outperforms the other algorithms in terms of accuracy, while WMD performs well in terms of recall.
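SMOTE balances a dataset by synthesising new minority-class samples rather than duplicating existing ones. Each new sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The abstract does not say how question pairs were turned into features or which SMOTE implementation the authors used. The sketch below is therefore a generic, standard-library illustration of the original SMOTE interpolation idea. It assumes each labeled pair is already a numeric feature vector. The function name `smote` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded so results are reproducible
    k = min(k, len(minority) - 1)      # cannot have more neighbours than others

    # Precompute the k nearest minority neighbours of each sample (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a minority sample
        j = rng.choice(neighbours[i])      # pick one of its nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority[i], minority[j])])
    return synthetic
```

For example, with `minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]`, the call `smote(minority, 4, k=2)` returns four new two-dimensional points, each lying on an edge of the triangle the originals span. Oversampling is normally applied only to the training split, so that synthetic points derived from test data do not leak into evaluation. In practice a maintained implementation, such as the `SMOTE` class in imbalanced-learn (used via `fit_resample`), would usually replace a hand-written version like this.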

Original language: English
Title: Proceedings - 2022 48th Latin American Computing Conference, CLEI 2022
Publisher: Institute of Electrical and Electronics Engineers
Electronic ISBN: 9781665476713
DOIs:
Status: Published - 2022
Event: 48th Latin American Computing Conference, CLEI 2022 - Armenia, Colombia
Duration: 17 Oct 2022 to 21 Oct 2022

Conference

Conference: 48th Latin American Computing Conference, CLEI 2022
Country/Territory: Colombia
City: Armenia
Period: 17/10/22 to 21/10/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.
