In this paper, we investigate on potential improvements to Information Retrieval (IR) models related to document representation and conceptual, topic retrieval, in medical document collections. We propose the TSRM1 (Term Similarity and Retrieval Model) approach, where document representations are based on multi-word domain terms, rather than mere single key-words, typically applied in traditional IR. The proposed representation is semantically compact and more efficient, being reduced to a limited number of meaningful multi-word terms (phrases), rather than large vectors of single-words, part of which may be void of distinctive content semantics. In computing document similarity, contrary to other state-of-theart methods examined in this work, TSRM adopts a knowledge poor solution, namely an approach which does not require any existing knowledge resources, such as ontologies, or thesauri. The evaluation of TSRM is based on OHSUMED, a standard TREC collection of medical documents and illustrated the effciency of TSRM over other well established general purpose IR models.
|Tijdschrift||Journal of Digital Information Management|
|Nummer van het tijdschrift||5|
|Status||Gepubliceerd - 1 okt. 2010|