A machine learning approach for the curation of biomedical literature

  • M. Shi
  • D.S. Edwin
  • R. Menon
  • Lixiang Shen
  • J.Y.K. Lim
  • H.T. Loh
  • S. Sathiyakeerthi
  • C.J. Ong

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    2 Citations (Scopus)
    1 Downloads (Pure)

    Abstract

    The biomedical sciences hold a vast repository of information spread across a large body of research papers. Researchers often have to spend considerable time reading entire papers before they can determine whether a paper should be curated (archived). In this paper, we present an automated text classification system for biomedical papers. A paper is classified according to whether it contains experimental evidence for the expression of molecular gene products for specified genes. The system performs preprocessing and data cleaning, followed by feature extraction from the raw text. It then classifies the paper from the extracted features using a Naïve Bayes classifier. Our approach makes it possible to classify (and curate) biomedical papers automatically, potentially saving considerable time and resources. The system proved to be highly accurate and received an honourable mention in Task 1 of the KDD Cup 2002.
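The pipeline described in the abstract (preprocessing, feature extraction, Naïve Bayes classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the bag-of-words features, the add-one smoothing, the `curate`/`skip` labels, and the four toy training sentences are all assumptions made up for illustration, and the names `extract_features` and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only (not from the paper).
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "for",
              "was", "were", "is", "to", "by"}


def extract_features(text):
    """Preprocess raw text: lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = extract_features(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in extract_features(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data: "curate" means the text reports expression evidence.
train_docs = [
    "Northern blot analysis showed expression of the gene transcript in embryos",
    "In situ hybridization detected the protein product in wing discs",
    "We review the genomic sequence and annotation of the chromosome",
    "A computational model of population genetics is proposed",
]
train_labels = ["curate", "curate", "skip", "skip"]

model = NaiveBayes().fit(train_docs, train_labels)
query = "Expression of the transcript was detected by northern blot"
scores = model.log_scores(query)
prediction = model.predict(query)
print(prediction)
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, and the smoothing keeps a token that is absent from one class from driving that class's probability to zero.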
    Original language: English
    Title of host publication: Advances in Information Retrieval, 25th European Conference on IR Research, Pisa, Italy, April 14-16, 2003
    Editors: F. Sebastiani
    Place of Publication: Berlin
    Publisher: Springer
    Pages: 597-604
    ISBN (Print): 978-3-540-01274-0
    Publication status: Published - 2003

    Publication series

    Name: Lecture Notes in Computer Science
    Volume: 2633
    ISSN (Print): 0302-9743
