Term based semantic clusters for very short text classification

Jasper Paalman, Shantanu Mullick, Kalliopi Zervanou, Yingqian Zhang

Research output: Contribution to conference, Abstract, Academic

Abstract

Very short texts, such as tweets and invoices, present challenges in classification. Such texts abound in ellipsis, grammatical errors, misspellings, and semantic variation. Although term occurrences are strong indicators of content, in very short texts, sparsity makes it difficult to capture enough content for a semantic classifier. A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach for the classification of short invoice descriptions, in such a way that each class reflects a different group of products or services. The developed algorithm is called Term Based Semantic Clusters (TBSeC).

Original language: English
Number of pages: 12
Status: Published - 8 Nov 2019
Event: 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019 - Brussels, Belgium
Duration: 6 Nov 2019 to 8 Nov 2019

Conference

Conference: 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019
Country: Belgium
City: Brussels
Period: 6/11/19 to 8/11/19

  • Cite this

    Paalman, J., Mullick, S., Zervanou, K., & Zhang, Y. (2019). Term based semantic clusters for very short text classification. Abstract from 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019, Brussels, Belgium. http://ceur-ws.org/Vol-2491/abstract117.pdf