Resolving genetic predisposition to clinical mastitis based on whole genome sequences of 32 cows

Investigators: Joanna Szyda, Magdalena Frąszczak, Magda Mielczarek, Tomasz Suchocki

    Research project objectives/Hypothesis
    The major aim of this study is to use the whole genome DNA sequences of 32 individuals to identify single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) present in the genome of domestic cattle. These polymorphisms will then be used to find genes or genomic regions responsible for the risk of clinical mastitis.
    Research methodology
    Whole genome DNA sequences are available for 32 cows of the Polish Holstein-Friesian breed, selected from a database of 991 cows that comprises individuals with clinical mastitis diagnosed by a veterinarian and their healthy herdmates. The average sequencing coverage across the 32 individuals is high: 14.03, ranging from 5 to 17. The experimental design consists of 16 pairs of paternal half-sisters matched by number of parities, production level, and birth year, but differing in mastitis resistance, expressed as the frequency of clinical mastitis diagnosed throughout their production life. In each pair, one half-sister had no clinical mastitis throughout her whole production period (control group) and the other had multiple clinical mastitis cases (case group). Paired-end reads will be aligned to the reference genome using the BWA-MEM software. Single nucleotide variants will be detected using the GATK and Samtools packages, and CNVs will be detected using the CNVnator program, which uses local differences in read depth to identify copy number variation sites. Allele frequencies will be estimated using the jackknife resampling algorithm. Moreover, the r-squared (r²) statistic, which quantifies linkage disequilibrium between pairs of SNPs, will be calculated using the PLINK package. SNP haplotypes will be reconstructed with the GATK and Beagle software packages. The odds ratio and likelihood ratio tests will be used to assess differences in SNP allele frequency between the case and control groups. Afterwards, the nominal P values will be corrected for multiple testing using the estimated number of effective, independent tests. In addition, the false discovery rate will be calculated. For testing haplotype effects, a logistic regression model will be compared with linear and quadratic discriminant models as well as with a random forest algorithm. Finally, the Variant Effect Predictor software, together with custom-written programs, will be used for the functional annotation of the significant variants and regions.
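    The per-SNP statistics named above can be illustrated with a minimal Python sketch. It is not the project's implementation. The function names, the genotype coding (0, 1, or 2 copies of the alternate allele per animal), the 0.5 continuity correction for empty cells, and the choice of the Benjamini-Hochberg procedure for the false discovery rate are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def jackknife_allele_frequency(genotypes):
    """Delete-one jackknife estimate of the alternate-allele frequency.

    genotypes: alternate-allele counts per animal (0, 1, or 2); this coding
    is an assumption of the sketch. Returns (estimate, standard_error).
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two genotyped animals")
    total = sum(genotypes)
    full = total / (2 * n)  # frequency using all animals
    # Frequency recomputed with each animal left out in turn.
    leave_one_out = [(total - g) / (2 * (n - 1)) for g in genotypes]
    loo_mean = sum(leave_one_out) / n
    # Bias-corrected estimate; for a plain frequency it equals `full`,
    # so the practical gain here is the standard error.
    estimate = n * full - (n - 1) * loo_mean
    variance = (n - 1) / n * sum((x - loo_mean) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)


def allele_odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio of the alternate allele in cases versus controls.

    Adds 0.5 to every cell (an assumed continuity correction) so that
    alleles absent from one group do not cause division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (case_alt, case_ref, control_alt, control_ref))
    return (a * d) / (b * c)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up genotypes for one SNP in 16 cases and 16 controls.
    cases = [1, 2, 1, 0, 1, 1, 2, 0, 1, 1, 0, 1, 2, 1, 0, 1]
    controls = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    print(jackknife_allele_frequency(cases + controls))
    print(allele_odds_ratio(sum(cases), 2 * len(cases) - sum(cases),
                            sum(controls), 2 * len(controls) - sum(controls)))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

    With only 16 animals per group, rare alleles will often be missing from one group entirely, which is why the sketch guards the odds ratio against zero counts.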
    Research project impact
    From a genetic perspective, the significance of our project lies in the fact that, in contrast to commercial SNP panels, individual sequence data allow for the identification of rare genetic variants, which, according to recent results from human genetic studies, play a predominant role in determining genetic variation. To our knowledge, no study involving rare variants has been conducted in dairy cattle. Some novel statistical approaches also need to be introduced: a novel method of multiple testing correction and a jackknife resampling procedure, which has not so far been used in the context of whole genome DNA sequence analysis. Moreover, understanding the genetic determination of clinical mastitis is of high importance for dairy cattle breeding, since udder infections are very common in high-performing cows. They not only compromise animal welfare but also cause considerable economic losses for the breeder.