Abstract
The aim of this research was to address, from a methodological point of view, the problem of filtering a hard-to-reach minority population among social media users. The Dutch veg(etari)an Twitter population served as a case study. Predictive performance was measured on three data sets, each reflecting one of the main approaches to social media data collection: choosing accounts arbitrarily and filtering their followers; analysing the social network of users; and analysing the text (tweets) of users. Supervised learning techniques were applied for modelling. Predictive performance was highest when modelling on social network data (F1-score: 0.90–0.97) and lowest when modelling on text (tweet) data (F1-score: 0.87–0.90). It is concluded that filtering a minority population among social media users is possible with all three data collection approaches, with social network data yielding the highest predictive performance.
|Date of Award