Effect of calculating Pointwise Mutual Information using a Fuzzy Sliding Window in Topic Modeling

Emil Rijcken, Kalliopi Zervanou, Marco Spruit, Floortje Scheepers, Uzay Kaymak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Topic modeling is a popular method for analysing large amounts of unstructured text data and extracting meaningful insights. The coherence of the generated topics is a critical metric for determining the model quality and measuring the semantic relatedness of the words in a topic. The distributional hypothesis, a fundamental theory in linguistics, states that words occurring in the same contexts tend to have similar meanings. Based on this theory, word co-occurrence in a given context is often used to reflect word association in coherence scores. To this end, many coherence scores use Normalised Pointwise Mutual Information (NPMI), which uses a sliding window to describe the neighbourhood that defines the context. It is assumed that there is no other structure in the neighbourhood except for the presence of words. Inspired by the distributional hypothesis, we hypothesise the word distance to be relevant for determining the word association. Hence, we propose using a fuzzy sliding window to define a neighbourhood in which the association between words depends on the membership of the words in the fuzzy sliding window. To this end, we propose Fuzzy Normalized Pointwise Mutual Information (FNPMI) to calculate fuzzy coherence scores. We implement two different neighbourhood structures by the definition of the membership function of the sliding window.
In the first implementation, the association between two words correlates positively with the distance, whereas the correlation is negative in the second. We compare the correlation of our proposed new coherence metrics with human judgment. We find that coherence scores computed with a fuzzy sliding window correlate less with human judgment than those computed with a crisp sliding window. This finding indicates that word distance within a window is less important than defining the window size itself.
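
The abstract does not give the FNPMI formula, so the following is only a minimal Python sketch of the idea it describes, not the authors' implementation. It assumes the standard definition NPMI(a, b) = log(P(a, b) / (P(a) P(b))) / -log P(a, b), with probabilities estimated as fractions of sliding windows. The function names (`fuzzy_npmi`, `crisp`, `nearer_is_stronger`, `farther_is_stronger`) and the linear membership functions are hypothetical choices made for illustration. With the `crisp` membership, every co-occurrence inside a window counts fully; with a fuzzy membership, each co-occurrence is weighted by a degree that depends on the distance between the two words.

```python
import math
from typing import Callable, Sequence

# A membership function maps (word distance, window size) to a degree in (0, 1].
Membership = Callable[[int, int], float]


def crisp(distance: int, window: int) -> float:
    """Crisp window: any co-occurrence inside the window counts fully."""
    return 1.0


def nearer_is_stronger(distance: int, window: int) -> float:
    """Hypothetical linear membership that decreases with word distance."""
    return (window - distance) / (window - 1)


def farther_is_stronger(distance: int, window: int) -> float:
    """Hypothetical linear membership that increases with word distance."""
    return distance / (window - 1)


def fuzzy_npmi(
    tokens: Sequence[str],
    word_a: str,
    word_b: str,
    window: int,
    membership: Membership = crisp,
) -> float:
    """NPMI of two distinct words, with joint counts weighted by a membership degree."""
    if word_a == word_b:
        raise ValueError("word_a and word_b must be distinct")
    if window < 2 or len(tokens) < window:
        raise ValueError("window must be >= 2 and no longer than the text")

    n_windows = len(tokens) - window + 1
    count_a = 0
    count_b = 0
    joint = 0.0
    for start in range(n_windows):
        span = tokens[start:start + window]
        pos_a = [i for i, tok in enumerate(span) if tok == word_a]
        pos_b = [i for i, tok in enumerate(span) if tok == word_b]
        count_a += 1 if pos_a else 0
        count_b += 1 if pos_b else 0
        if pos_a and pos_b:
            # Use the strongest membership over all occurrence pairs in this window.
            joint += max(
                membership(abs(i - j), window) for i in pos_a for j in pos_b
            )

    if joint == 0.0:
        return -1.0  # the words never co-occur: lower bound of NPMI
    p_a = count_a / n_windows
    p_b = count_b / n_windows
    p_ab = joint / n_windows
    if p_ab >= 1.0:
        return 1.0  # the words co-occur fully in every window: upper bound
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


tokens = ["a", "b", "x", "x", "x", "a", "b", "x", "x", "x"]
crisp_score = fuzzy_npmi(tokens, "a", "b", window=3)
near_score = fuzzy_npmi(tokens, "a", "b", window=3, membership=nearer_is_stronger)
far_score = fuzzy_npmi(tokens, "a", "b", window=3, membership=farther_is_stronger)
```

In this toy sequence the two words are always adjacent, so the distance-decreasing membership reproduces the crisp score, while the distance-increasing membership lowers it. Other aggregation choices (for example, summing memberships instead of taking the maximum) are equally plausible under the abstract's description.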
Original language: English
Title: 2023 IEEE International Conference on Fuzzy Systems, FUZZ 2023
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 6
ISBN (electronic): 979-8-3503-3228-5
Status: Published - 9 Nov 2023
Event: 2023 IEEE International Conference on Fuzzy Systems - Songdo Incheon, South Korea
Duration: 13 Aug 2023 to 17 Aug 2023
http://fuzz-ieee.org

Conference

Conference: 2023 IEEE International Conference on Fuzzy Systems
Abbreviated title: FUZZ-IEEE
Country/Territory: South Korea
City: Songdo Incheon
Period: 13/08/23 to 17/08/23