Bioinformatic modelling of the impact of probiotic supplementation on microbiomes of breeding ponds and of digestive tract of the Common carp (Cyprinus carpio)

Investigators: J. Szyda, M. Mielczarek, J. Jakimowicz, T. Suchocki, D. Słomian
Students involved: P. Hajduk, L. Jarosz, M. Sztuka

In recent years, there has been great interest in the use of effective microorganisms as probiotic supplements in aquaculture to improve water quality, inhibit pathogens, and promote the growth of farmed fish. The use of probiotics, which control pathogens through a variety of mechanisms, is viewed as an alternative to antibiotics and has become a major field in the development of aquaculture. Considering the potential benefits of adding probiotics, many farmers have recently started using commercially available products on their fish farms.

Although the efficacy of probiotic products in aquaculture has been extensively studied, there is a shortage of research on their practical, on-farm (i.e. not experimental) application in Common carp (Cyprinus carpio) breeding, especially when probiotics are applied as a mixture of effective microorganism communities rather than as a single bacterial species.

With the above in mind, the general goal of the project is to assess the dynamics of water, sediment, and fish intestinal microbiota diversity, as well as the differences in water metatranscriptome composition, in earthen ponds, which represent practical fish breeding conditions. The two major factors potentially influencing these dynamics that are considered in the project are (i) the probiotic supplementation of water and (ii) the probiotic supplementation of feed.

The sequencing process went well. The number of sequences for water samples can be found below.

The plots below show the quality of the raw 16S rRNA data.

For the 16S rRNA analysis, the QIIME2 software was used. First, adapter trimming was performed to remove non-biological sequence. A quality threshold of 30 was applied to exclude sequences with a quality score below that threshold, and sequences shorter than 200 nucleotides were removed. Paired ends were merged with specific thresholds, maintaining sequence length consistency within the 16S regions. A denoising step then produced Amplicon Sequence Variants (ASVs), which distinguish sequences differing by a single nucleotide, an advancement over Operational Taxonomic Unit (OTU) tables. The resulting tables were used to determine the microbial composition of the samples. Taxonomic assignment was then performed, and the alpha diversity metric was computed based on it. Finally, ANCOM was employed to conduct a differential abundance test for detecting alterations in the microbial composition between experimental groups.
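
As an illustration of what an alpha diversity metric measures, the sketch below computes the Shannon index and the observed richness from a vector of per-taxon counts. The text does not state which alpha diversity metric was used, so the choice of Shannon is an assumption, and the counts are hypothetical. In the project itself this step was carried out within QIIME2, not with this code.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


# Hypothetical read counts per taxon for two samples (not project data).
sample_even = [25, 25, 25, 25]
sample_skewed = [97, 1, 1, 1]

print(shannon_diversity(sample_even), observed_richness(sample_even))
print(shannon_diversity(sample_skewed), observed_richness(sample_skewed))
```

Both samples contain the same number of taxa, but the evenly distributed one yields the higher Shannon value, because the index accounts for evenness as well as richness.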


Neural Network Classification:

To differentiate microbiomes and classify individuals based on the probiotics they were fed, we attempted to construct a neural network classifier using Keras under TensorFlow. The data were in the form of feature tables containing the abundance of bacterial families for individuals from different ponds. These tables were joined into a single table containing the abundances for all individuals and for every bacterial family present in at least one individual. Missing values were substituted with zeroes.
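
The joining step described above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the table layout (individuals in rows, bacterial families in columns), the helper name `join_feature_tables`, the sample identifiers, and all counts are assumptions made for the example.

```python
import pandas as pd

# Hypothetical per-pond feature tables: rows are individuals,
# columns are bacterial families, values are abundances.
pond_a = pd.DataFrame(
    {"Aeromonadaceae": [120, 80], "Vibrionaceae": [5, 0]},
    index=["A_fish1", "A_fish2"],
)
pond_b = pd.DataFrame(
    {"Aeromonadaceae": [60], "Lactobacillaceae": [40]},
    index=["B_fish1"],
)


def join_feature_tables(tables):
    """Stack per-pond tables, keeping the union of all families."""
    # An outer join keeps every family seen in any table; families
    # missing from a given table appear as NaN for its individuals.
    joined = pd.concat(tables, axis=0, join="outer", sort=True)
    # Missing values are substituted with zeroes.
    joined = joined.fillna(0)
    # Keep only families present in at least one individual.
    joined = joined.loc[:, (joined != 0).any(axis=0)]
    return joined.astype(int)


combined = join_feature_tables([pond_a, pond_b])
print(combined)
```

The outer join is what allows families observed in only one pond to remain in the combined table, with zero abundance for individuals from the other ponds.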

For classification, the experimental setup (i.e., what individuals were fed) served as the class, while the abundance of bacterial families acted as predictors. This dataset was then applied to a Dense Neural Network (DNN) and a Convolutional Neural Network (CNN) model. Accuracy was used as the metric to assess the model’s performance. It was calculated on a binary basis, with each correctly classified experimental setup assigned a score of 1, and each misclassification assigned a score of 0. The average of these scores was then determined.
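
The accuracy calculation described above amounts to the following. The function name and the class labels are hypothetical and serve only to illustrate the scoring rule.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of individuals whose experimental setup was predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    # Score 1 for each correct classification and 0 for each misclassification.
    scores = [1 if t == p else 0 for t, p in zip(true_labels, predicted_labels)]
    # Accuracy is the average of these scores.
    return sum(scores) / len(scores)


# Hypothetical class labels for four individuals (not project data).
true_setups = ["control", "water", "feed", "feed"]
predicted_setups = ["control", "feed", "feed", "feed"]

print(accuracy(true_setups, predicted_setups))
```

In this toy case three of the four individuals are classified correctly, so the accuracy is 0.75.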

In initial results, the CNN model performed better, reaching an unstable accuracy of around 0.65, whereas the DNN model stabilized at 0.6. The primary challenge encountered was rapid overfitting due to limited data availability (only 125 individuals). The plots below show the accuracy on validation data during model training (DNN: top, CNN: bottom).