CNV in human genome

DNA sequence features underlying large-scale duplications and deletions in the human genome

Copy number variants (CNVs) are large structural variants that manifest mainly as duplications and deletions. They may cover up to 12% of the whole genome and have an impact on phenotypic diversity and disease. Our study used 5 867 structural duplications and 33 181 structural deletions available from the 1000 Genomes Project.

The aim of this project was to characterize regions of the human genome that are susceptible to structural duplications or deletions. We asked whether specific regions of the human genome are particularly vulnerable to the formation of duplications and deletions and, if so, what distinguishes those regions from unchanged regions.

Main findings:

The number of unknown nucleotides in the analysed dataset was relatively low, which confirmed the high reliability of the structural variants. The distribution of GC content within copy number variant regions was not normal and differed significantly from the distribution in a randomised set of sequences. Low-complexity regions in duplications did not differ significantly from the randomised data, whereas those in deletions did. The 100-bp regions located upstream and downstream of duplications, as well as downstream of deletions, differed significantly from the randomised data, whereas sequences located upstream of deletions did not. The majority of variants intersected with gene regions, mainly with introns.
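
The per-region measurements behind these findings (fraction of unknown nucleotides, GC content, 100-bp flanks, and a randomised comparison set) can be illustrated with a minimal Python sketch. It is not the project's actual pipeline. The function names, the 0-based half-open coordinates, the convention that "upstream" means lower coordinates on the reference strand, and the length-matched uniform sampling used for the randomised set are all assumptions made for illustration.

```python
import random


def gc_fraction(seq):
    """Fraction of G/C among known bases (A, C, G, T); None if there are none."""
    seq = seq.upper()
    known = sum(seq.count(base) for base in "ACGT")
    if known == 0:
        return None
    return (seq.count("G") + seq.count("C")) / known


def n_fraction(seq):
    """Fraction of unknown nucleotides (N) in the sequence."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


def flanks(chrom, start, end, width=100):
    """Return (upstream, downstream) flanks of the variant [start, end).

    Assumption: coordinates are 0-based and half-open, and "upstream"
    means lower coordinates on the reference strand. Flanks are
    truncated at the chromosome ends.
    """
    upstream = chrom[max(0, start - width):start]
    downstream = chrom[end:end + width]
    return upstream, downstream


def random_regions(chrom, lengths, rng):
    """Sample one region per requested length, uniformly along chrom.

    Assumption: the randomised set is length-matched to the variants.
    Raises ValueError if a length exceeds the chromosome length.
    """
    regions = []
    for length in lengths:
        start = rng.randrange(0, len(chrom) - length + 1)
        regions.append(chrom[start:start + length])
    return regions
```

Computing gc_fraction for every variant and for every region returned by random_regions gives two samples that can then be compared with a distribution test. Because the GC distribution is reported as non-normal, a non-parametric test would be a natural choice, but the test actually used is not stated here.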