Effect of calculating Pointwise Mutual Information using a Fuzzy Sliding Window in Topic Modeling

Emil Rijcken, Kalliopi Zervanou, Marco Spruit, Floortje Scheepers, Uzay Kaymak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Topic modeling is a popular method for analysing large amounts of unstructured text data and extracting meaningful insights. The coherence of the generated topics is a critical metric for determining model quality and measuring the semantic relatedness of the words in a topic. The distributional hypothesis, a fundamental theory in linguistics, states that words occurring in the same contexts tend to have similar meanings. Based on this theory, word co-occurrence in a given context is often used to reflect word association in coherence scores. To this end, many coherence scores use Normalised Pointwise Mutual Information (NPMI), which uses a sliding window to describe the neighbourhood that defines the context. This approach assumes that the neighbourhood has no structure other than the presence of words. Inspired by the distributional hypothesis, we hypothesise that word distance is relevant for determining word association. Hence, we propose using a fuzzy sliding window to define a neighbourhood in which the association between words depends on the membership of the words in the fuzzy sliding window. To this end, we propose Fuzzy Normalized Pointwise Mutual Information (FNPMI) to calculate fuzzy coherence scores. We implement two different neighbourhood structures through the definition of the membership function of the sliding window. In the first implementation, the association between two words correlates positively with their distance, whereas the correlation is negative in the second. We compare the correlation of our proposed coherence metrics with human judgment. We find that coherence scores based on a fuzzy sliding window correlate less with human judgment than those based on a crisp sliding window. This finding indicates that word distance within a window is less important than the choice of the window size itself.
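
The abstract does not give the FNPMI formula, so the following Python sketch is only one possible reading of the idea, not the authors' definition. It uses the standard NPMI, i.e. PMI(a, b) = log(P(a, b) / (P(a) P(b))) divided by -log P(a, b), with probabilities estimated from sliding windows. The function names (`crisp`, `centre_peaked`, `edge_peaked`, `npmi`), the linear membership shapes measured from the window centre, and the use of the minimum as the fuzzy "and" are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def crisp(offset, size):
    """Crisp window: every position in the window has full membership."""
    return 1.0


def centre_peaked(offset, size):
    """Assumed shape: membership decays linearly away from the window centre."""
    centre = (size - 1) / 2
    return 1.0 - abs(offset - centre) / (centre + 1)


def edge_peaked(offset, size):
    """Assumed shape: membership grows linearly away from the window centre."""
    centre = (size - 1) / 2
    return (abs(offset - centre) + 1) / (centre + 1)


def window_memberships(tokens, window, membership):
    """Yield one {word: membership degree} mapping per sliding window."""
    n_windows = max(1, len(tokens) - window + 1)
    for start in range(n_windows):
        span = tokens[start:start + window]
        degrees = defaultdict(float)
        for offset, word in enumerate(span):
            # A repeated word keeps its highest membership in the window.
            degrees[word] = max(degrees[word], membership(offset, len(span)))
        yield degrees


def npmi(word_a, word_b, documents, window=5, membership=crisp):
    """NPMI from (fuzzy) window counts; reduces to crisp NPMI for `crisp`."""
    count_a = count_b = count_ab = 0.0
    total = 0
    for tokens in documents:
        for degrees in window_memberships(tokens, window, membership):
            total += 1
            mu_a = degrees.get(word_a, 0.0)
            mu_b = degrees.get(word_b, 0.0)
            count_a += mu_a
            count_b += mu_b
            count_ab += min(mu_a, mu_b)  # assumption: min as fuzzy intersection
    if total == 0 or count_ab == 0.0:
        return -1.0  # the words never share a window
    p_a, p_b, p_ab = count_a / total, count_b / total, count_ab / total
    if p_ab >= 1.0:
        return 1.0  # both words fully present in every window
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


# Toy corpus, invented purely for illustration.
documents = [
    ["topic", "model", "coherence", "score"],
    ["topic", "model", "word", "window"],
]
for shape in (crisp, centre_peaked, edge_peaked):
    print(shape.__name__, npmi("topic", "model", documents, window=3, membership=shape))
```

With `crisp`, every word in a window counts fully, so the sketch reduces to the usual boolean sliding-window estimate. The two fuzzy shapes only change how much each occurrence contributes, which makes it easy to compare the three variants on the same corpus.
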
Original language: English
Title of host publication: 2023 IEEE International Conference on Fuzzy Systems, FUZZ 2023
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 6
ISBN (Electronic): 979-8-3503-3228-5
Publication status: Published - 9 Nov 2023
Event: 2023 IEEE International Conference on Fuzzy Systems - Songdo Incheon, Korea, Republic of
Duration: 13 Aug 2023 to 17 Aug 2023
http://fuzz-ieee.org

Conference

Conference: 2023 IEEE International Conference on Fuzzy Systems
Abbreviated title: FUZZ-IEEE
Country/Territory: Korea, Republic of
City: Songdo Incheon
Period: 13/08/23 to 17/08/23

Keywords

  • Distributional Hypothesis
  • Fuzzy Sliding Window
  • Topic Modeling
  • Natural Language Processing
