Resolving genetic predisposition to clinical mastitis based on whole genome sequences of 32 cows

Key investigators: Magdalena Frąszczak, Magda Mielczarek, Tomasz Suchocki, Joanna Szyda
Period: 2015-2018

Work Done Summary

The main goal of the project was to understand the genetic background of clinical mastitis in dairy cattle. To achieve this goal, we characterized the inter-individual variability resulting from the presence of SNP and CNV polymorphisms. The research resulted in the following scientific articles:

  • The assessment of inter-individual variation of whole-genome DNA sequence in 32 cows
  • Analysis of Copy Number Variations in Holstein-Friesian Cow Genomes Based on Whole Genome Sequence Data
  • The Genetic Background of Clinical Mastitis in Holstein-Friesian Cattle.

Bioinformatics tools designed in this project have been additionally used to analyze the quality of the cattle reference genome (UMD 3.1), which is described in the article:

  • The impact of a reference genome quality on copy number variant detection.

Moreover, these tools were used to describe breed-specific SNPs in domestic cattle:

  • Identification and annotation of breed-specific single nucleotide polymorphisms in Bos taurus genomes
  • Population Structure Analysis of Bull Genomes of European and Western Ancestry.

Grant Description


The major aim of this study is to use the whole genome DNA sequences of 32 individuals to identify single nucleotide polymorphisms (SNPs) as well as Copy Number Variations (CNVs) present in the genome of domestic cattle. These polymorphisms will then be used to find genes or genomic regions responsible for the risk of clinical mastitis.


Whole genome DNA sequences are available for 32 cows of the Polish Holstein-Friesian breed. The cows were selected from a database of 991 animals, which comprises individuals with clinical mastitis cases diagnosed by a veterinarian and their healthy herdmates. The average sequencing coverage calculated across the 32 individuals is high: it amounts to 14.03 and varies between 5 and 17.


The genome averaged sequencing coverage for each individual

The experimental design comprises 16 pairs of paternal half-sisters. Within each pair the animals are matched by the number of parities, production level, and birth year, but differ in their mastitis resistance, expressed as the frequency of clinical mastitis diagnosed throughout their production life. In particular, in each pair one half-sib had no clinical mastitis throughout the whole production period (control group) and the other had multiple clinical mastitis cases (case group).

Paired-end reads will be aligned to the reference genome using the BWA-MEM software. Single nucleotide variants will be detected using the GATK and Samtools packages. CNVs will be detected with the CNVnator program, which uses local differences in read depth to identify copy number variation sites. Allele frequencies will be estimated using the Jackknife resampling algorithm. Moreover, the r2 statistic, which quantifies the amount of linkage disequilibrium between pairs of SNPs, will be calculated using the PLINK package. SNP haplotypes will be reconstructed with the GATK and Beagle software packages. The Odds Ratio and the Likelihood Ratio Tests will be used to assess differences in SNP allele frequency between the case and control groups. Afterwards, the nominal P values will be corrected for multiple testing using the estimated number of effective, independent tests. In addition, the False Discovery Rate will be calculated. For testing haplotype effects, a logistic regression model will be compared with linear and quadratic discriminant models as well as with a random forest algorithm. Finally, the Variant Effect Predictor software, together with custom-written programs, will be used for the functional annotation of the significant variants and regions.
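The allele-frequency comparison described above can be illustrated with a minimal Python sketch. It is not the project's actual code. The delete-one form of the Jackknife, the 0/1/2 coding of genotypes (count of the alternative allele), the 0.5 continuity correction for empty cells, and the Benjamini-Hochberg procedure for the False Discovery Rate are all assumptions made for illustration, and all function names are hypothetical. The correction based on the effective number of independent tests is not shown.

```python
import math


def jackknife_allele_frequency(genotypes):
    """Delete-one jackknife estimate of the alternative-allele frequency.

    genotypes: alternative-allele counts (0, 1 or 2), one per diploid animal
    (assumed coding). Returns (estimate, standard_error).
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    total = sum(genotypes)
    full = total / (2 * n)  # frequency from all 2n alleles
    # Leave-one-out frequencies: drop one animal, 2(n-1) alleles remain.
    loo = [(total - g) / (2 * (n - 1)) for g in genotypes]
    mean_loo = sum(loo) / n
    estimate = n * full - (n - 1) * mean_loo  # bias-corrected estimate
    variance = (n - 1) / n * sum((f - mean_loo) ** 2 for f in loo)
    return estimate, math.sqrt(variance)


def allelic_odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio of carrying the alternative allele, cases vs. controls.

    Returns (odds_ratio, standard_error_of_log_odds_ratio). If any cell is
    zero, 0.5 is added to every cell (an assumed continuity correction).
    """
    cells = [case_alt, case_ref, control_alt, control_ref]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data only; these numbers are not results from the project.
    print(jackknife_allele_frequency([0, 1, 2, 1]))
    print(allelic_odds_ratio(20, 12, 8, 24))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

With 16 cases and 16 controls, each group contributes 32 alleles per SNP, so the four counts passed to `allelic_odds_ratio` sum to 32 within each group. Because the allele frequency is a simple mean, the jackknife point estimate equals the plain frequency here; what the resampling adds is the standard error.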

Project impact

From a genetic perspective, the significance of our project lies in the fact that, in contrast to commercial SNP panels, individual sequence data allow for the identification of rare genetic variants, which, according to recent results from human genetic studies, play a predominant role in the determination of genetic variation. To our knowledge, no study involving rare variants has been conducted in dairy cattle. Some novel statistical approaches also need to be introduced: a novel method of multiple testing correction and a Jackknife resampling procedure, which has not so far been used in the context of whole genome DNA sequence analysis. Moreover, understanding the genetic determination of clinical mastitis is of high importance for dairy cattle breeding, since udder infections in high-performing cows are very common. They not only compromise animal welfare but also cause considerable economic losses for the breeder.

Selected results

The total number of SNPs identified across 32 individuals

The table lists genes that were selected for inter-individual comparison of single nucleotide variability. The primary selection criterion was gene function, for which three categories were considered: (1) housekeeping, (2) neutral to selection and (3) strongly selected genes.

Annotation of all detected CNVs to Sequence Ontology terms: deletions – left, duplications – right

A. The total number of detected autosomal duplications (black) and deletions (grey) for 29 cows. B. The median (black) and mean (grey) lengths of deletions found on all autosomes for 29 cows. C. The median (black) and mean (grey) lengths of duplications found on all autosomes for 29 cows.

The number of annotated deletion and duplication breakpoints. The abbreviated names of 8 categories are: (1) cod – coding sequences, (2) int – introns, (3) spl – splice regions, (4) ncod – non-coding transcripts, (5) utr – 5’ and 3’ UTR, (6) ug – upstream gene regions, (7) dg – downstream gene regions and (8) ing – intergenic variants.


  1. The genomic landscape of SNPs and CNVs is very dynamic
  2. A high number of the functional genomic units that exhibit differential polymorphisms between a healthy and a sick sib are involved in immune response
  3. CNVs play an important role in disease susceptibility
  4. Sequence deletions reduce resistance to clinical mastitis (CM) more severely than sequence duplications increase it.


The research was supported by the European Union Seventh Framework Programme through the NADIR (FP7-228394) project, by the Polish National Science Centre (NCN) grant 2014/13/B/NZ9/02016, and by The Leading National Research Centre (KNOW) programme for 2014-2018. Computations were carried out at the Poznan Supercomputing and Networking Centre.