Large scale analysis of small repeats via mining of the human genome

I. van den Berg, D. Bosnacki, P.A.J. Hilbers

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed



Small repetitive sequences, called tandem repeats, are abundant throughout the human genome, in both coding and non-coding regions. Their role is still mostly unknown, but at least 20 of these repetitive sequences have been linked to neurodegenerative disorders. The mutational process underlying these disorders is not yet fully understood. Understanding the origin, function and possible usefulness of tandem repeats will require analysis of large amounts of data from various sources. In this paper we attempt such a large-scale analysis of short repeats. We describe and discuss the steps needed to perform large-scale genomic analysis. We define tandem repeats and compare the results of repeat localization with genome annotations. We show that the degree of repetitiveness differs between the human chromosomes: chromosomes 19 and 17 have more repeats per megabase pair than any other chromosome, and the Y chromosome has the fewest. We also demonstrate that some repeat motifs are much more common than others. Mono- and dinucleotide repeats are the most abundant, with A and AAC the most common motifs, while CG is hardly present in the genome. Repeats with unit length 3 are underrepresented in the genome, and repeats with unit length 9 are extremely rare.
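
To make the notions of "tandem repeat", "unit length" and "repeats per megabase pair" concrete, the following is a minimal Python sketch, not the method used in the paper. The abstract does not give the authors' exact repeat definition, so the function names and the thresholds (`max_unit`, `min_copies`, `min_length`) are illustrative assumptions. The sketch only finds perfect (mismatch-free) repeats on one strand and does not merge equivalent motifs such as AC and CA.

```python
import re
from collections import Counter


def find_tandem_repeats(seq, max_unit=6, min_copies=3, min_length=8):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    All thresholds are assumptions chosen for illustration only.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # One unit of unit_len bases followed by at least
        # (min_copies - 1) exact copies of that same unit.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter unit
            # (e.g. "AA" or "ACAC"); the shorter unit length reports them.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            if len(match.group(0)) >= min_length:
                hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


def motif_counts(hits):
    """Count how often each repeat unit (motif) occurs among the hits."""
    return Counter(unit for _, unit, _ in hits)


def repeats_per_mbp(seq, **kwargs):
    """Number of detected repeats per megabase pair of sequence."""
    hits = find_tandem_repeats(seq, **kwargs)
    return len(hits) / (len(seq) / 1_000_000)
```

Applied per chromosome, `repeats_per_mbp` would give a density figure of the kind compared in the abstract, and `motif_counts` a motif frequency table. Because `finditer` returns non-overlapping matches, a skipped non-primitive match can shift where a neighbouring repeat is reported; a production tool would need to handle this, along with imperfect repeats and masked (N) regions.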
Original language: English
Title of host publication: Proceedings of the 20th International Workshop on Database and Expert Systems Application (DEXA 2009), 31 August - 4 September 2009, Linz, Austria
Publisher: IEEE Computer Society
ISBN (Print): 978-0-7695-3763-4
Publication status: Published - 2009

