TY - UNPB
T1 - PAV-spotter
T2 - using signal cross-correlations to identify Presence/Absence Variation in target capture data
AU - de Visser, Manon Chantal
AU - Ploeg, Chris van der
AU - Cvijanovic, Milena
AU - Vucic, Tijana
AU - Theodoropoulos, Anagnostis
AU - Wielstra, Ben
PY - 2024/10/29
Y1 - 2024/10/29
N2 - High-throughput sequencing technologies have become essential in the fields of evolutionary biology and genomics. When dealing with non-model organisms or genomic gigantism, sequencing whole genomes is still relatively costly, and therefore reduced representations of the genome are frequently obtained, for instance by "target capture" approaches. While computational tools exist that can handle target capture data and identify small-scale variants such as single nucleotide polymorphisms and micro-indels, options to identify large-scale structural variants are limited. To meet this need, we introduce PAVspotter: a tool that can identify presence/absence variation (PAV) in target capture data. PAVspotter conducts a signal cross-correlation calculation, in which the distributions of read counts per target are compared between samples of different a priori defined classes, e.g. male versus female, or diseased versus healthy. We apply and test our methodology by studying Triturus newts: salamanders with gigantic genomes that currently lack an annotated reference genome. Triturus newts suffer from a hereditary disease that kills half their offspring during embryogenesis. We compare the target capture data of two different types of diseased embryos, characterized by unique deletions, with those of healthy embryos. Our findings show that PAVspotter helps to expose such structural variants, even with medium to low sequencing coverage, small sample sizes, and background noise due to mismapped reads. PAVspotter can be used to study the structural variation underlying supergene systems in the absence of chromosome-level genome assemblies. The code, including further explanation of how to use and customize it, is available through the PAVspotter GitHub repository.
AB - High-throughput sequencing technologies have become essential in the fields of evolutionary biology and genomics. When dealing with non-model organisms or genomic gigantism, sequencing whole genomes is still relatively costly, and therefore reduced representations of the genome are frequently obtained, for instance by "target capture" approaches. While computational tools exist that can handle target capture data and identify small-scale variants such as single nucleotide polymorphisms and micro-indels, options to identify large-scale structural variants are limited. To meet this need, we introduce PAVspotter: a tool that can identify presence/absence variation (PAV) in target capture data. PAVspotter conducts a signal cross-correlation calculation, in which the distributions of read counts per target are compared between samples of different a priori defined classes, e.g. male versus female, or diseased versus healthy. We apply and test our methodology by studying Triturus newts: salamanders with gigantic genomes that currently lack an annotated reference genome. Triturus newts suffer from a hereditary disease that kills half their offspring during embryogenesis. We compare the target capture data of two different types of diseased embryos, characterized by unique deletions, with those of healthy embryos. Our findings show that PAVspotter helps to expose such structural variants, even with medium to low sequencing coverage, small sample sizes, and background noise due to mismapped reads. PAVspotter can be used to study the structural variation underlying supergene systems in the absence of chromosome-level genome assemblies. The code, including further explanation of how to use and customize it, is available through the PAVspotter GitHub repository.
U2 - 10.1101/2024.10.25.620064
DO - 10.1101/2024.10.25.620064
M3 - Preprint
BT - PAV-spotter
PB - bioRxiv
ER -