Describing the genome-wide distribution of copy number variations in various breeds of domestic cattle based on the next-generation sequencing data of 121 individuals

Key investigators: Magda Mielczarek, Joanna Szyda
Period of work: 2015-2016
Funded: National Science Centre

Objectives: Copy number variations (CNVs) are a major source of genetic diversity in mammals and are defined as gains (duplications) and losses (deletions) of DNA fragments. Their length ranges from 50 bp to several million bp, and they are widespread in genomes. Moreover, CNVs cover many functional elements of the genome, such as genes or regulatory sequences, and can therefore markedly affect the phenotypic characteristics of individuals. In this project we have a unique opportunity to analyse CNVs across many individuals belonging to 9 different breeds of domestic cattle (Bos taurus Linnaeus, 1758). The main aims of this study are therefore (i) detecting CNV polymorphisms in 121 individuals, (ii) describing the distribution of these polymorphisms across the genome, (iii) assessing the inter-breed and inter-individual variation in the number and distribution of copies, and (iv) functionally annotating the CNVs.

Methods: Obtaining a reliable set of annotated CNVs involved the following steps: (i) alignment to the reference genome, (ii) post-alignment data processing, (iii) CNV detection, (iv) CNV validation and (v) CNV annotation. The final set of validated CNVs was subjected to statistical analysis in order to provide population-wide inferences. In particular, the inter-individual and inter-breed variation in the number and distribution of detected polymorphisms was tested separately for duplications and deletions.
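The summary above does not name the statistical tests that were used. As a rough illustration of what testing inter-breed variation in CNV counts can look like, the sketch below runs a permutation test on the between-breed sum of squares. The test choice, the breed labels and the counts are all assumptions made for this example and are not taken from the project.

```python
import random
from statistics import mean


def between_group_ss(groups):
    """Between-group sum of squares of group means around the grand mean."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)


def permutation_test(groups, n_perm=2000, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = between_group_ss(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        # Shuffle animals across breeds while keeping the group sizes fixed.
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if between_group_ss(shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-animal deletion counts for three made-up breeds.
counts = {
    "breed_A": [1500, 1620, 1580, 1710],
    "breed_B": [1490, 1550, 1605, 1660],
    "breed_C": [2900, 3100, 2750, 3050],
}
stat, p_value = permutation_test(list(counts.values()))
```

A permutation test is used here only because it needs no distributional assumptions and no third-party packages; a rank-based or model-based test would serve the same illustrative purpose.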

Results: The total number of detected CNVs varied strongly between CNV detection programs. The plot below shows the total number of duplications found by the read-depth (RD) based CNVnator and the split-read (SR) based Pindel.


In order to create the final, reliable dataset of polymorphisms, overlapping CNVs were determined. The number of duplications per animal ranged between 12 and 11,704, with a mean of 1,343 ± 1,086 and a median of 1,212. The number of deletions ranged between 0 and 3,960, with a mean of 1,708 ± 700 and a median of 1,628. CNV length varied strongly: duplications ranged from 200 to 4,992,800 bp, with a mean of 31,018 ± 169,307 bp and a median of 6,900 bp. The shortest deletion was 200 bp and the longest 4,536,800 bp, with a mean of 10,836 ± 53,724 bp and a median of 2,000 bp. The number of duplications (a) and deletions (b) per bull and the length of duplications (c) and deletions (d) observed in the whole validated data set are presented in the figure below.

[Figure: box plots of CNV number and length]
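The overlap step and the summary statistics reported above can be sketched as follows. This is a minimal illustration and not the project's pipeline: it assumes that validation means keeping calls from one program that overlap a call from the other by at least `min_overlap_bp` on the same chromosome, and that the "±" values are standard deviations. The function names, the threshold and the coordinates are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def overlapping_calls(calls_a, calls_b, min_overlap_bp=1):
    """Return calls from calls_a that overlap any call in calls_b.

    Each call is (chromosome, start, end) with 1-based inclusive coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in calls_b:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in calls_a:
        for b_start, b_end in by_chrom.get(chrom, ()):
            # Number of shared base pairs; zero or negative means no overlap.
            shared = min(end, b_end) - max(start, b_start) + 1
            if shared >= min_overlap_bp:
                kept.append((chrom, start, end))
                break
    return kept


def summarise(values):
    """Range, mean, standard deviation and median of a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
    }


# Made-up duplication calls from a read-depth and a split-read caller.
rd_calls = [("1", 1000, 5000), ("1", 20000, 26000), ("2", 300, 900)]
sr_calls = [("1", 4500, 7000), ("2", 1000, 2000), ("1", 30000, 31000)]

validated = overlapping_calls(rd_calls, sr_calls)
lengths = [end - start + 1 for _, start, end in rd_calls]
summary = summarise(lengths)
```

In practice a reciprocal-overlap fraction is a common alternative to a fixed base-pair threshold; the fixed threshold is used here only to keep the sketch short.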

Functional annotation placed 70.51 % of duplications and 67.92 % of deletions in non-genic regions, while 29.49 % of duplications and 32.08 % of deletions were assigned Sequence Ontology (SO) terms corresponding to gene regions. The figure below shows the proportion of both annotation types across all duplications and deletions.
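The genic versus non-genic split can be illustrated with a much simplified sketch. It treats a CNV as genic if it overlaps any gene interval on the same chromosome, which is an assumption made here for illustration; it does not reproduce the assignment of SO terms by an annotation tool, and the gene and CNV coordinates are invented.

```python
def is_genic(cnv, genes):
    """True if the CNV shares at least one base pair with any gene interval."""
    chrom, start, end = cnv
    return any(
        g_chrom == chrom and g_start <= end and start <= g_end
        for g_chrom, g_start, g_end in genes
    )


def genic_percentage(cnvs, genes):
    """Percentage of CNVs overlapping a gene, rounded to two decimals."""
    genic = sum(is_genic(cnv, genes) for cnv in cnvs)
    return round(100 * genic / len(cnvs), 2)


# Hypothetical gene intervals and deletions, 1-based inclusive coordinates.
genes = [("1", 2000, 8000), ("3", 100, 900)]
deletions = [
    ("1", 1000, 2500),   # overlaps the gene on chromosome 1
    ("1", 9000, 9500),   # lies past the end of that gene
    ("3", 950, 1200),    # starts after the gene on chromosome 3 ends
    ("3", 50, 150),      # overlaps the gene on chromosome 3
]
pct_genic = genic_percentage(deletions, genes)
pct_non_genic = round(100 - pct_genic, 2)
```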

Conclusions: In the present study, a genome-wide assessment of copy number variations in cattle was performed using whole-genome sequence data. The analysis enabled the detection, validation and annotation of 445,791 CNVs as well as hypothesis testing on the obtained results. Highly significant inter-individual and inter-breed variation was observed in both the number and the length of CNVs. The breed-specific pattern was most pronounced for the Fleckvieh breed, suggesting that it was subjected to different artificial selection pressure than the other breeds. Moreover, significant variation in the CNV distribution was also observed within the genome, with CNV density varying with genomic function. In summary, the analysis showed the high complexity of the CNV landscape in Bos taurus genomes.


The results included here were essential for writing the PhD thesis “The genome-wide distribution of copy number variations in various breeds of domestic cattle (Bos taurus Linnaeus, 1758) based on the next-generation sequencing data”. The thesis was defended on 10.10.2016 at the Faculty of Biology and Environmental Protection, Nicolaus Copernicus University in Toruń.