Exploiting multi-word similarity for retrieval in medical document collections: The TSRM approach

Euthymios Drymonas, Kalliopi Zervanou, Euripides G.M. Petrakis

Research output: Contribution to journal, Journal article, Academic, peer-reviewed

3 Citations (Scopus)
2 Downloads (Pure)

Abstract

In this paper, we investigate potential improvements to Information Retrieval (IR) models related to document representation and conceptual topic retrieval in medical document collections. We propose the TSRM (Term Similarity and Retrieval Model) approach, in which document representations are based on multi-word domain terms rather than the single keywords typically used in traditional IR. The proposed representation is semantically compact and more efficient, being reduced to a limited number of meaningful multi-word terms (phrases) rather than large vectors of single words, some of which may be void of distinctive content semantics. In computing document similarity, contrary to the other state-of-the-art methods examined in this work, TSRM adopts a knowledge-poor solution, namely an approach that does not require any existing knowledge resources, such as ontologies or thesauri. The evaluation of TSRM is based on OHSUMED, a standard TREC collection of medical documents, and illustrates the efficiency of TSRM over other well-established general-purpose IR models.
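
To make the idea of a multi-word term representation concrete, the following is a minimal sketch and not the TSRM implementation. It assumes a small, hard-coded list of domain terms (`DOMAIN_TERMS` is hypothetical; the paper's own term extraction step is not described in the abstract) and compares documents by plain cosine similarity over exact term matches. The term similarity measure that TSRM itself uses is not specified here, so exact matching is a stand-in chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical term list for illustration only; a real system would
# extract multi-word domain terms from the collection automatically.
DOMAIN_TERMS = ["myocardial infarction", "blood pressure", "heart failure"]


def term_vector(text, terms=DOMAIN_TERMS):
    """Represent a document as counts of the multi-word terms it contains."""
    # Lowercase and collapse punctuation so terms match on word boundaries.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    counts = Counter()
    for term in terms:
        hits = re.findall(r"\b" + re.escape(term) + r"\b", normalized)
        if hits:
            counts[term] = len(hits)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no known terms matches nothing
    return dot / (norm_a * norm_b)


query = term_vector("treatment of myocardial infarction and high blood pressure")
doc_a = term_vector("Myocardial infarction outcomes in patients with elevated blood pressure.")
doc_b = term_vector("Heart failure management guidelines.")

print(cosine(query, doc_a))  # shares both terms with the query
print(cosine(query, doc_b))  # shares no terms with the query
```

Each vector here has at most as many entries as there are domain terms, which is the compactness the abstract refers to when contrasting phrases with large single-word vectors.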

Original language: English
Pages (from-to): 315-321
Number of pages: 7
Journal: Journal of Digital Information Management
Volume: 8
Issue number: 5
Status: Published - 1 Oct 2010
Published externally: Yes
