N-gram representations for comment filtering

D. Brand, St. Kroon, B. Van Der Merwe, L. Cleophas

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review


Abstract

Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
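As a rough illustration of what "N-grams as features" can mean in a vector space setting, the following minimal sketch counts word-level unigrams and bigrams and projects them onto a shared vocabulary. It is not taken from the paper: the choice of word-level (rather than character-level) N-grams, the whitespace tokenisation, the example comments, and the function names `word_ngrams`, `ngram_features`, and `to_vector` are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams in `text` as a list of tuples."""
    # Assumption: lowercase + whitespace split stands in for real tokenisation.
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count every word n-gram of length 1..max_n occurring in `text`."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


def to_vector(counts, vocabulary):
    """Project n-gram counts onto a fixed vocabulary order."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical comments, used only to show the shape of the representation.
comments = ["great post thanks", "buy cheap pills now"]
features = [ngram_features(c) for c in comments]

# The shared vocabulary is the union of all n-grams seen in the comments.
vocabulary = sorted(set().union(*features))
vectors = [to_vector(f, vocabulary) for f in features]
```

Each entry of `vectors` is a count vector over the shared vocabulary and could be passed to any standard classifier; which classifier and weighting scheme the authors used is not stated in this record.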

Original language: English
Title of host publication: SAICSIT '15 Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, 28-30 September 2015, Stellenbosch, South Africa
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Pages: 1-10
ISBN (Print): 9781450336833
DOIs: https://doi.org/10.1145/2815782.2815789
Publication status: Published - 28 Sep 2015
Event: 2015 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2015) - Stellenbosch Institute for Advanced Study (STIAS), Stellenbosch, South Africa
Duration: 28 Sep 2015 to 30 Sep 2015
http://www.saicsit2015.org/

Conference

Conference: 2015 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2015)
Abbreviated title: SAICSIT 2015
Country: South Africa
City: Stellenbosch
Period: 28/09/15 to 30/09/15
Other: "Knowledge through Technology"
Internet address: http://www.saicsit2015.org/

Fingerprint

Feature extraction
Classifiers

Keywords

  • Classification
  • Feature design
  • Information retrieval
  • N-gram models
  • NLP
  • Text mining
  • Vector space models

Cite this

Brand, D., Kroon, S., Van Der Merwe, B., & Cleophas, L. (2015). N-gram representations for comment filtering. In SAICSIT '15 Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, 28-30 September 2015, Stellenbosch, South Africa (pp. 1-10). [6] New York: Association for Computing Machinery, Inc. https://doi.org/10.1145/2815782.2815789
@inproceedings{a731a0391edd499d90f2fdd684902ab0,
title = "N-gram representations for comment filtering",
abstract = "Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.",
keywords = "Classification, Feature design, Information retrieval, N-gram models, NLP, Text mining, Vector space models",
author = "D. Brand and St. Kroon and {Van Der Merwe}, B. and L. Cleophas",
year = "2015",
month = "9",
day = "28",
doi = "10.1145/2815782.2815789",
language = "English",
isbn = "9781450336833",
pages = "1--10",
booktitle = "SAICSIT '15 Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, 28-30 September 2015, Stellenbosch, South Africa",
publisher = "Association for Computing Machinery, Inc",
address = "United States",

}


TY - GEN

T1 - N-gram representations for comment filtering

AU - Brand, D.

AU - Kroon, St.

AU - Van Der Merwe, B.

AU - Cleophas, L.

PY - 2015/9/28

Y1 - 2015/9/28

N2 - Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.

AB - Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.

KW - Classification

KW - Feature design

KW - Information retrieval

KW - N-gram models

KW - NLP

KW - Text mining

KW - Vector space models

UR - http://www.scopus.com/inward/record.url?scp=84959374437&partnerID=8YFLogxK

U2 - 10.1145/2815782.2815789

DO - 10.1145/2815782.2815789

M3 - Conference contribution

AN - SCOPUS:84959374437

SN - 9781450336833

SP - 1

EP - 10

BT - SAICSIT '15 Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, 28-30 September 2015, Stellenbosch, South Africa

PB - Association for Computing Machinery, Inc

CY - New York

ER -
