N-gram representations for comment filtering

D. Brand, St. Kroon, B. Van Der Merwe, L. Cleophas

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
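As a rough illustration of what an N-gram representation of a comment looks like, the sketch below counts word-level or character-level N-grams in a short text. It is not the authors' implementation: the abstract does not state the tokenisation, the N-gram level, or any weighting scheme, so the function names (`char_ngrams`, `word_ngrams`, `ngram_counts`), the lowercasing, and the whitespace tokenisation are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams (assumed: lowercase, split on whitespace)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n, unit="word"):
    """Map each n-gram to its count; this sparse count vector is the feature representation."""
    if unit == "word":
        grams = word_ngrams(text, n)
    else:
        grams = char_ngrams(text.lower(), n)
    return Counter(grams)


# Hypothetical comment, used only to show the shape of the features.
comment = "great post thanks for the great post"
bigrams = ngram_counts(comment, 2)
print(bigrams.most_common(1))  # [('great post', 2)]
```

Count vectors of this kind can be passed to any standard classifier, in contrast to manually designed features, which have to be specified by hand for each domain.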

Original language: English
Title of host publication: SAICSIT '15 Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, 28-30 September 2015, Stellenbosch, South Africa
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Pages: 1-10
ISBN (Print): 9781450336833
Publication status: Published - 28 Sep 2015
Event: 2015 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2015) - Stellenbosch Institute for Advanced Study (STIAS), Stellenbosch, South Africa
Duration: 28 Sep 2015 to 30 Sep 2015
http://www.saicsit2015.org/

Conference

Conference: 2015 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2015)
Abbreviated title: SAICSIT 2015
Country: South Africa
City: Stellenbosch
Period: 28/09/15 to 30/09/15
Other: "Knowledge through Technology"

Keywords

  • Classification
  • Feature design
  • Information retrieval
  • N-gram models
  • NLP
  • Text mining
  • Vector space models
