Abstract
Topic modeling is a popular method for analysing large amounts of unstructured text data and extracting meaningful insights. The coherence of the generated topics is a critical metric for assessing model quality, as it measures the semantic relatedness of the words in a topic. The distributional hypothesis, a fundamental theory in linguistics, states that words occurring in the same contexts tend to have similar meanings. Based on this theory, coherence scores often use word co-occurrence within a given context as a proxy for word association. Many coherence scores therefore rely on Normalised Pointwise Mutual Information (NPMI), which uses a sliding window to define the neighbourhood that serves as the context. This approach assumes that the neighbourhood has no structure beyond the presence of words: where a word sits inside the window does not matter. Inspired by the distributional hypothesis, we hypothesise that the distance between words is relevant for determining their association. Hence, we propose using a fuzzy sliding window to define a neighbourhood in which the association between words depends on their membership in the window. To compute such fuzzy coherence scores, we propose Fuzzy Normalized Pointwise Mutual Information (FNPMI). We implement two different neighbourhood structures through two definitions of the sliding window's membership function.
In the first implementation, the association between two words correlates positively with their distance; in the second, it correlates negatively. We compare how well the proposed coherence metrics correlate with human judgment. We find that coherence scores based on a fuzzy sliding window correlate less with human judgment than those based on a crisp sliding window. This finding indicates that word distance within a window matters less than the choice of the window size itself.
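The abstract does not state the FNPMI formula, so the following Python sketch is only one possible reading, built on our own assumptions. It follows the usual boolean sliding-window setup, where P(w) is the fraction of windows that contain w, and replaces the crisp joint count with a fuzzy one: each window contributes the membership value of the token distance between the two words. The linear functions `increasing` and `decreasing` are hypothetical stand-ins for the two neighbourhood structures, and every function name here is illustrative rather than taken from the paper. With a constant membership of 1 (`crisp`), the sketch reduces to ordinary NPMI, log(P(wi,wj) / (P(wi)P(wj))) / -log P(wi,wj), over a boolean sliding window.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

# Assumption: a membership function maps (token distance d, window size s) to [0, 1].
Membership = Callable[[int, int], float]


def crisp(d: int, s: int) -> float:
    """Boolean sliding window: every pair inside the window counts fully."""
    return 1.0


def increasing(d: int, s: int) -> float:
    """Hypothetical variant: association grows with distance (1/(s-1) at d=1, 1 at d=s-1)."""
    return d / (s - 1)


def decreasing(d: int, s: int) -> float:
    """Hypothetical variant: association shrinks with distance (1 at d=1, 1/(s-1) at d=s-1)."""
    return (s - d) / (s - 1)


def sliding_windows(tokens: Sequence[str], s: int) -> List[Sequence[str]]:
    """Windows of s tokens shifted by one token; a short text forms a single window."""
    if len(tokens) <= s:
        return [tokens]
    return [tokens[i:i + s] for i in range(len(tokens) - s + 1)]


def fuzzy_npmi(wi: str, wj: str, wins: List[Sequence[str]], s: int, mu: Membership) -> float:
    """NPMI in which each joint occurrence is weighted by the membership of the word distance."""
    n = len(wins)
    c_i = c_j = c_ij = 0.0
    for win in wins:
        pos_i = [k for k, t in enumerate(win) if t == wi]
        pos_j = [k for k, t in enumerate(win) if t == wj]
        c_i += 1.0 if pos_i else 0.0
        c_j += 1.0 if pos_j else 0.0
        if pos_i and pos_j:
            # Fuzzy co-occurrence degree of this window: the best-scoring occurrence pair.
            c_ij += max(mu(abs(a - b), s) for a in pos_i for b in pos_j)
    p_i, p_j, p_ij = c_i / n, c_j / n, c_ij / n
    if p_ij == 0.0:
        return -1.0  # the words never co-occur: NPMI's lower bound by convention
    if p_ij >= 1.0:
        return 1.0  # full co-occurrence in every window: NPMI's upper bound
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)


def coherence(topic: Sequence[str], tokens: Sequence[str], s: int,
              mu: Membership = crisp) -> float:
    """Mean (fuzzy) NPMI over all pairs of a topic's distinct top words."""
    if s < 2:
        raise ValueError("window size must be at least 2")
    if len(topic) < 2 or len(set(topic)) != len(topic):
        raise ValueError("need at least two distinct topic words")
    wins = sliding_windows(tokens, s)
    scores = [fuzzy_npmi(a, b, wins, s, mu) for a, b in combinations(topic, 2)]
    return sum(scores) / len(scores)
```

On the toy text `a b x y a b x y` with window size 3, where `a` and `b` are always adjacent, `decreasing` reproduces the crisp score while `increasing` lowers it. Because a fuzzy joint weight never exceeds 1, the fuzzy score in this sketch can never exceed the crisp one for the same window size.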
Original language | English |
---|---|
Title of host publication | 2023 IEEE International Conference on Fuzzy Systems, FUZZ 2023 |
Publisher | Institute of Electrical and Electronics Engineers |
Number of pages | 6 |
Electronic ISBN | 979-8-3503-3228-5 |
DOIs | |
Status | Published - 9 Nov 2023 |
Event | 2023 IEEE International Conference on Fuzzy Systems, Songdo Incheon, South Korea. Duration: 13 Aug 2023 → 17 Aug 2023. http://fuzz-ieee.org |
Conference
Conference | 2023 IEEE International Conference on Fuzzy Systems |
---|---|
Abbreviated title | FUZZ-IEEE |
Country/Territory | South Korea |
City | Songdo Incheon |
Period | 13 Aug 2023 → 17 Aug 2023 |
Internet address | |