N-gram representations for comment filtering

D. Brand, St. Kroon, B. Van Der Merwe, L. Cleophas

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review


    Abstract

    Accurate classifiers for short texts are valuable assets in many applications. This is especially true in online communities, where users contribute content in the form of posts and comments, and where an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification and compares them to the manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
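    As a rough illustration of what an N-gram representation of a short comment looks like, the following Python sketch turns a comment into a bag of word unigram and bigram counts. The abstract does not specify the tokenisation, the N-gram orders, or the weighting used in the paper, so the helper names `tokenize`, `ngrams` and `ngram_features`, the lowercase word tokeniser, and the default orders (1, 2) are illustrative assumptions, not the authors' method.

```python
from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokeniser; the paper's tokenisation is not stated.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order (empty if too few tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text: str, orders: tuple[int, ...] = (1, 2)) -> Counter:
    """Bag-of-N-grams: count each N-gram of every requested order.

    Hypothetical helper; the default orders (1, 2) are an illustrative choice.
    """
    tokens = tokenize(text)
    features: Counter = Counter()
    for n in orders:
        # Join each N-gram into a single string key such as "great post".
        features.update(" ".join(gram) for gram in ngrams(tokens, n))
    return features
```

    For example, `ngram_features("Great post, great comments")` counts `"great"` twice and each of the bigrams `"great post"`, `"post great"` and `"great comments"` once. The resulting counts form a sparse feature vector that a standard linear classifier can consume directly, with no hand-designed features required.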

    Original language: English
    Title of host publication: SAICSIT '15 Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, 28-30 September 2015, Stellenbosch, South Africa
    Place of publication: New York
    Publisher: Association for Computing Machinery, Inc
    Pages: 1-10
    ISBN (Print): 9781450336833
    Publication status: Published, 28 Sep 2015
    Event: 2015 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2015), Stellenbosch Institute for Advanced Study (STIAS), Stellenbosch, South Africa
    Duration: 28 Sep 2015 to 30 Sep 2015
    http://www.saicsit2015.org/

    Conference

    Conference: 2015 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2015)
    Abbreviated title: SAICSIT 2015
    Country/Territory: South Africa
    City: Stellenbosch
    Period: 28/09/15 to 30/09/15
    Other: "Knowledge through Technology"

    Keywords

    • Classification
    • Feature design
    • Information retrieval
    • N-gram models
    • NLP
    • Text mining
    • Vector space models

