Research article Special Issues

Global dynamics of a model for treating microorganisms in sewage by periodically adding microbial flocculants

  • Received: 26 April 2019 Accepted: 16 September 2019 Published: 29 September 2019
  • In this paper, a mathematical model for microbial treatment in livestock and poultry sewage is proposed and analyzed. We consider periodic addition of microbial flocculants to treat microorganisms such as Escherichia coli in sewage. Different from the traditional models, a class of composite dynamics models composed of impulsive differential equations is established. Our aim is to study the relationship between substrate, microorganisms and flocculants in sewage systems as well as the treatment strategies of microorganisms. Precisely, we first show the process of mathematical modeling by using impulsive differential equations. Then by using the theory of impulsive differential equations, the dynamics of the model is investigated. Our results show that the system has a microorganisms-extinction periodic solution which is globally asymptotically stable when a certain threshold value is less than one, and the system is permanent when a certain threshold value is greater than one. Furthermore, the control strategy for microorganisms treatment is discussed. Finally, some numerical simulations are carried out to illustrate the theoretical results.

    Citation: Tongqian Zhang, Ning Gao, Tengfei Wang, Hongxia Liu, Zhichao Jiang. Global dynamics of a model for treating microorganisms in sewage by periodically adding microbial flocculants[J]. Mathematical Biosciences and Engineering, 2020, 17(1): 179-201. doi: 10.3934/mbe.2020010



    Supervised learning is an important branch of machine learning. In supervised multi-classification problems, each sample is assigned a label which indicates the category it belongs to [1]. Supervised learning is effective when there are enough samples with high quality labels. However, it is expensive and time-consuming to build datasets with a multitude of accurate labels. To solve this problem, researchers have proposed a series of weakly supervised learning (WSL) methods, which aim to train models with partial, incomplete or inaccurate supervised information, such as noise-label learning [2,3,4,5], semi-supervised learning [6,7,8,9], partial-label learning [10,11,12], positive-confidence learning [13], unlabeled-unlabeled learning [14] and others.

    In this paper, we consider another WSL framework called complementary label learning (CLL). We show the difference between complementary labels and true labels in Figure 1. Compared to an ordinary label, a complementary label indicates a class that the sample does not belong to. It is easier and less costly to collect these complementary labels. For example, in some very specialized domains, expert knowledge is very expensive. If complementary labels are used for annotation, we only need to determine the extent of the label space and then use common sense to determine which category is wrong. It is much simpler and faster to determine a class that a sample does not belong to than the class it belongs to. Besides, CLL can also protect data privacy in sensitive fields such as medical and financial records, because we no longer need to disclose the true information of the data. This not only protects data privacy and security, but also makes it easier to collect data in these areas.

    Figure 1.  Comparison of the complementary labels (bottom) with the real labels (top). A complementary label is one of the categories the image does not belong to.

    The framework of CLL was first proposed by Ishida et al. [15]. They proved that the unbiased risk estimator (URE) computed only from complementary labels is equivalent to the ordinary classification risk when the loss function satisfies certain conditions. In URE, the loss function must be nonconvex and symmetric, which leads to certain limitations. To overcome this limitation, Yu et al. [16] made cross-entropy loss usable in CLL by constructing a complementary label transition matrix, and they also considered the case where different labels have different probabilities of being selected as a complementary label. Then, Ishida et al. [17] expanded URE and proposed a CLL framework adapted to more general loss functions. This framework still has an unbiased estimator of the regular classification risk, but it works for all loss functions. Chou et al. [18] optimized URE from the perspective of gradient estimation, and proposed using a surrogate complementary loss (SCL) to obtain an unbiased risk estimate, which effectively alleviated the problem of overfitting in URE. Liu et al. [19] applied common losses such as categorical cross entropy (CCE), mean square error (MSE) and mean absolute error (MAE) to CLL. Ishiguro et al. [20] studied the problem that complementary labels may be affected by label noise. To mitigate its adverse effects, they selected noise-robust losses which satisfy a weighted symmetric condition or a more relaxed condition. Recently, Zhang et al. [21] broadened the setting of complementary label datasets and discussed the case in which a dataset contains a large number of complementary labels and a small number of true labels at the same time. They proposed an adversarial complementary label learning network, named Clarinet. Clarinet consists of two deep neural networks, one to classify complementary labels and true labels, and the other to learn from complementary labels.

    Previous studies on CLL always focus on rewriting the classification risk under the ordinary label distribution to the risk under the complementary label distribution and exploring the use of more loss functions [15,16,17,18,19]. These rewriting risk techniques prove the consistency relationship between the risk of complementary label classification and the risk of supervised classification. This enables the classifier to perform accurate classification using only the complementary labels. However, in this process, only complementary labels are involved in the risk calculation, and the information contained in them is extremely limited, which results in consistently lower performance of CLL compared to supervised learning. Therefore, we aim to enhance the supervision information of the complementary labels to further improve the performance of CLL. In this paper, we propose a two-step complementary label enhancement framework based on knowledge distillation (KDCL). It consists of the following components: 1) a teacher model trained on complementary label dataset to generate soft labels which contain more supervision information as label distribution; 2) a student model trained on the same dataset to learn from both soft labels and complementary labels; 3) a final loss function to integrate loss from soft labels and complementary labels and update parameters of the student model. We use three CLL loss functions to conduct experiments on several benchmark datasets, and compare the accuracy of the student model before and after enhancement by KDCL. The experimental results show that KDCL can effectively improve the performance of CLL.

    Suppose that the input sample is a d-dimensional vector $x \in \mathbb{R}^d$ with class label $y \in \{1,2,\ldots,K\}$, where $K$ is the number of classes in the dataset. Given a training set $D=\{(x_i,y_i)\}_{i=1}^{N}$ of $N$ samples, each drawn independently from the same distribution $p(x,y)$, the goal of learning from true labels is to learn a mapping $f(x)$ from the sample space $\mathbb{R}^d$ to the label space $\{1,2,\ldots,K\}$; $f(x)$ is also called a classifier. We want $f(x)$ to minimize the multi-class classification risk:

    $$R(f)=\mathbb{E}_{p(x,y)\sim D}\left[L(f(x),y)\right], \tag{1}$$

    where $L(f(x),y)$ is a multi-class loss function. $f(x)$ is usually obtained by the following equation:

    $$f(x)=\arg\max_{y\in\{1,2,\ldots,K\}} g_y(x), \tag{2}$$

    where $g(x):\mathbb{R}^d\to\mathbb{R}^K$. In deep neural networks, $g(x)$ is the prediction distribution output by the last fully connected layer.

    In general, the distribution $p(x,y)$ is unknown. We can use the sample mean to approximate the classification risk in Eq (1). $R(f)$ is empirically estimated as $\hat{R}(f)$:

    $$\hat{R}(f)=\frac{1}{N}\sum_{i=1}^{N}L(f(x_i),y_i), \tag{3}$$

    where $N$ is the number of training samples and $(x_i,y_i)$ is the i-th sample.

    In CLL, each sample $x$ is assigned only one complementary label $\bar{y}$. Therefore, the dataset is switched from $D=\{(x_i,y_i)\}_{i=1}^{N}$ to $\bar{D}=\{(x_i,\bar{y}_i)\}_{i=1}^{N}$, where $\bar{y}\in\{1,2,\ldots,K\}\setminus\{y\}$ and $D\neq\bar{D}$. The samples in $\bar{D}$ independently follow an unknown distribution $\bar{p}(x,\bar{y})$. If all complementary labels are selected in an unbiased way, which means that each of the $K-1$ incorrect classes has the same probability of being chosen, $\bar{p}(x,\bar{y})$ can be written as:

    $$\bar{p}(x,\bar{y})=\frac{1}{K-1}\sum_{y\neq\bar{y}}p(x,y). \tag{4}$$
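    The unbiased selection behind Eq (4) can be sketched in a few lines of NumPy. This is only an illustration: the function name `sample_complementary_labels` is hypothetical, and labels are zero-based ($0,\ldots,K-1$) rather than the $1,\ldots,K$ convention used in the text.

```python
import numpy as np

def sample_complementary_labels(y, num_classes, rng):
    """Draw one complementary label per sample, uniformly over the K-1 wrong classes."""
    y = np.asarray(y)
    # An offset in {1, ..., K-1}, added modulo K, never lands on the true label
    # and reaches every other class with probability 1/(K-1).
    offsets = rng.integers(1, num_classes, size=y.shape)
    return (y + offsets) % num_classes

rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=10000)      # hypothetical true labels
y_comp = sample_complementary_labels(y_true, 10, rng)
```

    By construction, no complementary label ever coincides with the true label of its sample.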

    Suppose that $\bar{L}(f(x),\bar{y})$ is a complementary loss function. We can then obtain a multi-class risk similar to Eq (1) under the distribution $\bar{p}(x,\bar{y})$:

    $$\bar{R}(f)=\mathbb{E}_{\bar{p}(x,\bar{y})\sim\bar{D}}\left[\bar{L}(f(x),\bar{y})\right]. \tag{5}$$

    To the best of our knowledge, Ishida et al. [15] are the first to prove that the difference between Eq (1) and Eq (5) is constant when the loss function $\bar{L}$ satisfies certain conditions, and that this constant $M$ depends only on the number of categories $K$:

    $$R(f)=(K-1)\,\mathbb{E}_{\bar{p}(x,\bar{y})\sim\bar{D}}\left[\bar{L}(f(x),\bar{y})\right]+M=(K-1)\bar{R}(f)+M. \tag{6}$$

    All coefficients are constant when the loss function satisfies the condition, so it is possible to learn from complementary labels by minimizing $R(f)$ in Eq (6). Then, they rewrite the one-versus-all (OVA) loss $L_{OVA}$ and the pairwise-comparison (PC) loss $L_{PC}$ of ordinary multi-class classification as $\bar{L}_{OVA}$ and $\bar{L}_{PC}$ in CLL:

    $$\bar{L}_{OVA}(g(x),\bar{y})=\frac{1}{K-1}\sum_{y\neq\bar{y}}l(g_y(x))+l(-g_{\bar{y}}(x)),\qquad \bar{L}_{PC}(g(x),\bar{y})=\sum_{y\neq\bar{y}}l(g_y(x)-g_{\bar{y}}(x)), \tag{7}$$

    where $l(z):\mathbb{R}\to\mathbb{R}$ is a binary loss that must be nonconvex and symmetric, such as the sigmoid loss. $g(x)$ is the same as in Eq (2) and $g_y(x)$ is the y-th element of $g(x)$. Finally, the unbiased risk estimator of $R(f)$ can be obtained by the sample mean:

    $$\hat{R}(f)=\frac{K-1}{N}\sum_{n=1}^{N}\bar{L}(f(x_n),\bar{y}_n)+M. \tag{8}$$
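    As an illustration of Eq (7), the following NumPy sketch evaluates the two complementary losses with the sigmoid loss $l(z)=1/(1+e^{z})$, which satisfies $l(z)+l(-z)=1$. The function names and the zero-based label convention are assumptions made for this example, not taken from [15].

```python
import numpy as np

def sigmoid_loss(z):
    # l(z) = 1 / (1 + exp(z)); symmetric in the sense l(z) + l(-z) = 1.
    return 1.0 / (1.0 + np.exp(z))

def _other_class_mask(g, y_bar):
    # True everywhere except at each sample's complementary label.
    mask = np.ones_like(g, dtype=bool)
    mask[np.arange(g.shape[0]), y_bar] = False
    return mask

def ova_complementary_loss(g, y_bar):
    """OVA part of Eq (7); g is an (N, K) score matrix, y_bar holds zero-based labels."""
    n, k = g.shape
    mask = _other_class_mask(g, y_bar)
    others = np.where(mask, sigmoid_loss(g), 0.0).sum(axis=1) / (k - 1)
    return others + sigmoid_loss(-g[np.arange(n), y_bar])

def pc_complementary_loss(g, y_bar):
    """PC part of Eq (7): sum over y != y_bar of l(g_y - g_ybar)."""
    n, _ = g.shape
    mask = _other_class_mask(g, y_bar)
    diff = g - g[np.arange(n), y_bar][:, None]
    return np.where(mask, sigmoid_loss(diff), 0.0).sum(axis=1)

g = np.zeros((2, 3))            # hypothetical scores: all classes tied
y_bar = np.array([0, 2])
ova = ova_complementary_loss(g, y_bar)
pc = pc_complementary_loss(g, y_bar)
```

    With all scores tied at zero, every $l(\cdot)$ term equals 0.5, so both losses evaluate to 1 per sample for $K=3$.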

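    Although it is feasible to learn a classifier that minimizes Eq (8) from complementary labels, the restriction on the loss function limits the application of URE. Yu et al. [16] analyze the relationship between ordinary and complementary labels in terms of conditional probability:

    $$P(\bar{y}=j\mid x)=\sum_{i\neq j}P(\bar{y}=j\mid y=i)\,P(y=i\mid x), \tag{9}$$

    where $i,j\in\{1,2,\ldots,K\}$. When all complementary labels are selected in an unbiased way, $P(\bar{y}\mid y)$ can be expressed as a transition matrix $Q$:

    $$Q=\begin{bmatrix}0&\frac{1}{K-1}&\cdots&\frac{1}{K-1}\\ \frac{1}{K-1}&0&\cdots&\frac{1}{K-1}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{1}{K-1}&\frac{1}{K-1}&\cdots&0\end{bmatrix}_{K\times K}, \tag{10}$$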

    where the element in row $i$ and column $j$ of $Q$ represents $P(\bar{y}=j\mid y=i)$. Since the true label and the complementary label of a sample are mutually exclusive, $P(\bar{y}=j\mid y=i)=0$ whenever $i=j$. Therefore, the entries on the diagonal of the matrix are 0.
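    A small numerical check of Eqs (9) and (10) may help. The sketch below builds the uniform transition matrix and maps a hypothetical posterior $P(y\mid x)$ to the complementary-label posterior $P(\bar{y}\mid x)=Q^{T}P(y\mid x)$; the helper name `uniform_transition_matrix` and the probability values are assumptions for illustration.

```python
import numpy as np

def uniform_transition_matrix(num_classes):
    """Q[i, j] = P(ybar = j | y = i): zero on the diagonal, 1/(K-1) elsewhere."""
    q = np.full((num_classes, num_classes), 1.0 / (num_classes - 1))
    np.fill_diagonal(q, 0.0)
    return q

Q = uniform_transition_matrix(4)
p_true = np.array([0.7, 0.1, 0.1, 0.1])   # hypothetical P(y = i | x)
p_comp = Q.T @ p_true                      # Eq (9): P(ybar = j | x)
```

    Each row of $Q$ sums to one, and the class that is most likely under $p(y\mid x)$ becomes the least likely complementary label.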

    Combining Eqs (5), (9) and (10), we can rewrite $\bar{R}(f)$ as:

    $$\bar{R}(f)=\mathbb{E}_{\bar{p}(x,\bar{y})}\left[L_{CE}(Q^{T}g(x),\bar{y})\right], \tag{11}$$

    where $L_{CE}$ is the cross-entropy loss, which is widely used in deep learning. The classification risk $\bar{R}(f)$ in Eq (11) is also consistent with the ordinary classification risk $R(f)$ [16].

    In image classification, the outputs of the last fully connected layer of a deep neural network, after the Softmax function, contain the predicted probability distribution over all classes. Compared with a single logical label, these outputs carry more information. Hinton et al. [22] define the outputs as soft labels and propose a knowledge distillation framework. We draw on the idea of knowledge distillation and aim to improve the performance of CLL by enhancing complementary labels through soft labels.

    In the framework of knowledge distillation, Hinton et al. [22] modify the Softmax function and introduce the parameter $T$ to control the smoothness of soft labels. The ordinary Softmax function can be expressed as follows:

    $$y'_i=\frac{\exp(y_i)}{\sum_j\exp(y_j)}, \tag{12}$$

    where $y'_i$ is the predicted probability of the i-th class, $\exp(\cdot)$ is the exponential function and $y_i$ is the predicted output of the classification network for the i-th class. The Softmax function combines the prediction outputs of the model for all classes, and uses the exponential function to normalize the output values to the interval [0, 1].

    The rewritten Softmax function is as follows:

    $$y'_i=\frac{\exp(y_i/T)}{\sum_j\exp(y_j/T)}. \tag{13}$$

    We present a comparison of the smoothness of soft labels for different $T$ in Figure 2. As $T$ gradually increases, the soft labels become smoother. In effect, $T$ regulates how much attention is paid to the negative labels: the higher $T$, the more attention is paid to them. $T$ is an adjustable hyperparameter during training.
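    The effect of $T$ in Eq (13) can be reproduced with a short NumPy sketch. The logits below are hypothetical values chosen for illustration, and the function name is not from [22].

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Eq (13): divide the logits by T before the usual Softmax normalization."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([4.0, 1.0, 0.5])          # hypothetical network outputs
soft = {T: softmax_with_temperature(logits, T) for T in (1.0, 5.0, 80.0)}
for T, probs in soft.items():
    print(T, probs.round(3))
```

    Setting $T=1$ recovers Eq (12); as $T$ grows, the largest probability shrinks and the distribution approaches uniform.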

    Figure 2.  The smoothness of soft labels for different T. The higher T, the smoother soft labels will be.

    For one sample, soft labels not only indicate its most likely category, but also contain the correlation between the other labels. Soft labels therefore carry more information than the complementary label. If we add an extra term to the ordinary complementary label classification loss and introduce soft labels as additional supervision information, CLL can be expected to perform better than when only complementary labels are used. Of course, we need a model with high accuracy to produce the soft labels, which makes them more credible. This model is also trained with complementary labels.

    Taking advantage of this property, we propose KDCL, a complementary label learning framework based on knowledge distillation. The overall structure is shown in Figure 3.

    Figure 3.  The framework architecture of KDCL. α and β are the weighting factors to balance KL loss and complementary loss.

    KDCL is a two-stage training framework consisting of a more complex teacher model with higher accuracy and a simpler student model with lower accuracy. First, the teacher model is trained with complementary labels on the dataset and predicts all samples in the training set. The prediction results are normalized by the Softmax function with $T=t$ ($t>1$) to generate soft labels $S_{tea}$. Second, the student model is trained and its outputs are processed in two ways: one produces the soft prediction results $S_{stu}$ with $T=t$ ($t>1$), and the other produces the ordinary prediction results $P_{stu}$ with $T=1$. Then, the KL divergence between $S_{tea}$ and $S_{stu}$ is calculated, and the complementary label loss between $P_{stu}$ and the complementary labels is calculated at the same time. The two losses are weighted to obtain the final distillation loss. Finally, the parameters of the student model are updated using the final loss.

    In KDCL, the final loss consists of the Kullback-Leibler (KL) loss and the complementary loss. On the one hand, the student model needs to learn knowledge from the teacher model to improve its ability. On the other hand, the teacher model is not completely correct, and the student model also needs to learn by itself to reduce the influence of the teacher model’s errors on the learning process. It is therefore better to consider both.

    The final distillation loss consists of two parts and can be expressed as follows:

    $$L_{KDCL}=\alpha L_{KL}+L_{CL}, \tag{14}$$

    where $L_{KL}$ denotes the KL divergence and $L_{CL}$ denotes the complementary loss. Given the probability distributions $p^t$ from the teacher model and $p^s$ from the student model, their KL divergence can be expressed as follows:

    $$L_{KL}(p^t,p^s)=-\sum_i p^t_i\log\frac{p^s_i}{p^t_i}, \tag{15}$$

    where $i$ denotes the i-th element in tensor $p^t$ or $p^s$.
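    The KL term in Eq (15) is straightforward to evaluate numerically. The sketch below is a minimal NumPy version; the small constant `eps`, which guards against taking the logarithm of zero, is an assumption of this example and not part of the definition.

```python
import numpy as np

def kl_divergence(p_t, p_s, eps=1e-12):
    """KL(p_t || p_s) = sum_i p_t[i] * log(p_t[i] / p_s[i]), as in Eq (15)."""
    p_t = np.asarray(p_t, dtype=float)
    p_s = np.asarray(p_s, dtype=float)
    return float(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps))))

teacher = np.array([0.5, 0.5])   # hypothetical teacher distribution
student = np.array([0.9, 0.1])   # hypothetical student distribution
kl_value = kl_divergence(teacher, student)
kl_same = kl_divergence(teacher, teacher)
```

    The divergence is zero when the student reproduces the teacher's distribution and positive otherwise, which is what lets it act as a penalty that pulls the student toward the teacher.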

    We select three complementary losses for KDCL: the PC loss proposed by Ishida et al. [15], the FWD loss proposed by Yu et al. [16] and the SCL-NL loss proposed by Chou et al. [18]. Suppose that $p^s$ is the probability distribution for sample $x$ from the student model and $\bar{y}$ is the complementary label of $x$. These complementary losses are shown in Eqs (16)–(18).

    $$\bar{L}_{PC}(p^s,\bar{y})=\frac{K-1}{n}\sum_{y\neq\bar{y}}\left(p^s_y-p^s_{\bar{y}}\right)-\frac{K\times(K-1)}{2}+K-1, \tag{16}$$
    $$\bar{L}_{FWD}(p^s,\bar{y})=-\sum_i\bar{y}_i\times\log\left(Q^{T}\times p^s\right)_i, \tag{17}$$
    $$\bar{L}_{SCL\text{-}NL}(p^s,\bar{y})=\sum_i\bar{y}_i\times\left(-\log\left(1-p^s_{\bar{y}}\right)\right), \tag{18}$$

    where $K$ denotes the number of categories of the dataset, and $Q^{T}$ denotes the transpose of $Q$, which is a $K\times K$ square matrix with all entries equal to $1/(K-1)$ except the diagonal.
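    To make the FWD and SCL-NL losses concrete, here is a minimal per-sample NumPy sketch. It assumes that $\bar{y}_i$ in the formulas is the one-hot encoding of the complementary label, that labels are zero-based, and that $Q$ is the uniform matrix of Eq (10); the function names and probability values are hypothetical.

```python
import numpy as np

def fwd_loss(p_s, y_bar, Q):
    """Cross-entropy between Q^T p_s and the one-hot complementary label."""
    p_comp = Q.T @ p_s                     # predicted complementary-label distribution
    return float(-np.log(p_comp[y_bar]))

def scl_nl_loss(p_s, y_bar):
    """Negative log of the probability that the sample is NOT in class y_bar."""
    return float(-np.log(1.0 - p_s[y_bar]))

K = 4
Q = np.full((K, K), 1.0 / (K - 1))
np.fill_diagonal(Q, 0.0)

p_s = np.array([0.6, 0.2, 0.1, 0.1])       # hypothetical student prediction
y_bar = 1                                   # hypothetical complementary label
fwd = fwd_loss(p_s, y_bar, Q)
scl = scl_nl_loss(p_s, y_bar)
```

    With the uniform $Q$, $(Q^{T}p^s)_{\bar{y}}=(1-p^s_{\bar{y}})/(K-1)$, so in this sketch the two losses differ only by the constant $\log(K-1)$ and have identical gradients with respect to $p^s$.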

    With the arguments $p^t$, $p^s$ and $\bar{y}$, the final loss can be expressed in more detail as follows:

    $$L_{KD\text{-}PC}(p^t,p^s,\bar{y})=\alpha L_{KL}(p^t,p^s)+\bar{L}_{PC}(p^s,\bar{y}) \tag{19}$$
    $$L_{KD\text{-}FWD}(p^t,p^s,\bar{y})=\alpha L_{KL}(p^t,p^s)+\bar{L}_{FWD}(p^s,\bar{y}) \tag{20}$$
    $$L_{KD\text{-}SCL}(p^t,p^s,\bar{y})=\alpha L_{KL}(p^t,p^s)+\bar{L}_{SCL\text{-}NL}(p^s,\bar{y}) \tag{21}$$

    $\alpha$ is the weighting factor, which controls the degree of influence of the soft labels on the overall classification loss. The values of $\alpha$ will be determined in the experiment.
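    The pieces above can be put together in a short NumPy sketch of the KD-SCL variant in Eq (21). This is an illustrative reading of the formulas, not the authors' implementation: the function names are hypothetical, labels are zero-based, the loss is averaged over the batch, and the defaults $T=80$ and $\alpha=0.5$ are the values selected in the experiments of this paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_scl_loss(teacher_logits, student_logits, y_bar, alpha=0.5, T=80.0):
    """Batch mean of alpha * KL(S_tea || S_stu) + SCL-NL(P_stu, y_bar)."""
    s_tea = softmax(teacher_logits, T)      # softened teacher labels
    s_stu = softmax(student_logits, T)      # softened student predictions
    p_stu = softmax(student_logits, 1.0)    # ordinary student predictions
    kl = np.sum(s_tea * (np.log(s_tea) - np.log(s_stu)), axis=1)
    n = len(y_bar)
    # Small constant avoids log(0) when the student is certain of class y_bar.
    cl = -np.log(1.0 - p_stu[np.arange(n), y_bar] + 1e-12)
    return float(np.mean(alpha * kl + cl))

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(5, 10))   # hypothetical teacher outputs
student_logits = rng.normal(size=(5, 10))   # hypothetical student outputs
y_bar = rng.integers(0, 10, size=5)         # hypothetical complementary labels
loss = kd_scl_loss(teacher_logits, student_logits, y_bar)
```

    Setting `alpha=0` reduces the sketch to plain SCL-NL training, which corresponds to the student baseline without distillation.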

    We evaluate and compare the student models optimized by KDCL with the same models only trained by complementary labels on four public image classification datasets. Three complementary label losses including PC loss [15], FWD loss [16] and SCL-NL loss [18], are used as loss functions for training the models. All the experiments are carried out on a server with a 15 vCPU Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz, 80 GB RAM and one RTX 3090 GPU with 24 GB memory.

    Four benchmark image classification datasets, including MNIST, Fashion-MNIST(F-MNIST), Kuzushiji-MNIST(K-MNIST) and CIFAR10, are used to verify the effectiveness of KDCL.

    MNIST: consists of 60,000 28 × 28 pixel grayscale images for training and 10,000 images for testing, with a total of 10 categories representing numbers between 0 and 9.

    F-MNIST: is an alternative dataset to MNIST and consists of 10 categories, 60,000 training images and 10,000 test images, each with a size of 28 × 28 pixels.

    K-MNIST: is a dataset of classical Japanese (Kuzushiji) characters in 10 classes, written in a style widely used between the mid-Heian period and early modern Japan, and is designed as a drop-in replacement for the MNIST dataset. K-MNIST contains a total of 70,000 grayscale images of 28 × 28 pixels in 10 categories.

    CIFAR10: consists of 60,000 32 × 32 color images, 50,000 of which are used as the training set and 10,000 as the test set. Each category contains 6000 images.

    Following the settings in [15,17,18], we select complementary labels in an unbiased way for the samples in all datasets. Besides, we apply two different sets of teacher-student networks to these datasets. Specifically, for MNIST, F-MNIST and K-MNIST, we choose Lenet-5 [23] as the teacher model and an MLP [24] with 500 hidden neurons as the student model, because these datasets are relatively simple and simple networks work well on them. For the CIFAR10 dataset, since color images are more difficult to classify, we need a deeper CNN to extract features. We choose DenseNet-121 [25] as the teacher model and ResNet-18 [26] as the student model.

    For training, on MNIST, F-MNIST and K-MNIST we train Lenet-5 and the MLP for 120 epochs and use SGD as the optimizer with a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate is 0.1 and it is halved every 30 epochs. The batch size is set to 128. On the CIFAR10 dataset, we train DenseNet-121 and ResNet-18 for 80 epochs and use SGD as the optimizer with a momentum of 0.9 and a weight decay of 0.0005. The initial learning rate is selected from {1e-1, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4} and it is divided by 10 every 30 epochs.
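    The step-wise learning-rate decay described above can be written as a one-line rule. The helper below is a hypothetical sketch of that rule, not the training code used in the experiments; for CIFAR10 the base rate is taken as 1e-1 purely as an example from the listed candidates.

```python
import math

def step_lr(epoch, base_lr, factor, step=30):
    """Learning rate at a zero-based epoch: multiply by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)

# MNIST-style schedule: start at 0.1 and halve every 30 epochs over 120 epochs.
mnist_lrs = [step_lr(e, base_lr=0.1, factor=0.5) for e in range(120)]

# CIFAR10-style schedule: divide by 10 every 30 epochs over 80 epochs.
cifar_lrs = [step_lr(e, base_lr=0.1, factor=0.1) for e in range(80)]
```

    Under these settings the MNIST-style schedule ends at 0.0125 and the CIFAR10-style schedule ends at one hundredth of its initial rate.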

    In Figure 4, we make a parameter sensitivity analysis of the distillation temperature T in Eq (13) and the soft label weighting factor α in Eqs (19)–(21).

    Figure 4.  Test accuracy for different T with fixed α, and for different α with fixed T. The experiments are conducted with Lenet-5 and MLP on MNIST, F-MNIST and K-MNIST, and with DenseNet-121 and ResNet-18 on CIFAR-10.

    We first explore the influence of the distillation temperature $T$. When $T=1$, which means directly using the probability distribution output by the teacher model as soft labels without softening, KDCL exhibits the worst accuracy. This is because at low temperature the soft labels differ sharply between positive and negative classes, making it difficult for the student model to learn effectively. As $T$ gradually increases, the soft labels become smoother, the student model can learn the knowledge in the soft labels more easily, and the accuracy gradually improves. When $T\geq 80$, the gap between positive and negative classes in the soft labels becomes extremely small and the influence of the negative classes becomes too large, so the accuracy no longer increases and may even decrease.

    Then, we further investigate the optimal value of soft label weighting factor α. We follow the setting in Hinton et al. [22], and set α in the range of 0 to 1. On the same dataset, the change of α does not have a great impact on the accuracy of KDCL. This indicates that the KDCL model parameter optimization process is not sensitive to the hyperparameter α. Nevertheless, the model still achieves higher accuracy when α=0.5.

    Based on the above analysis, we will set T=80, α=0.5 in subsequent experiments.

    We show the accuracy for all models with three complementary label losses before and after being optimized by KDCL on four datasets. The results are presented in Table 1.

    Table 1.  Comparison of classification accuracies between different methods using different network architectures on MNIST, F-MNIST, K-MNIST and CIFAR-10.
    Dataset   MNIST                         F-MNIST                       K-MNIST                       CIFAR-10
    Model     Lenet-5   MLP       KDCL-MLP  Lenet-5   MLP       KDCL-MLP  Lenet-5   MLP       KDCL-MLP  Lenet-5   MLP       KDCL-MLP
    PC        89.94%    83.78%    86.10%    77.22%    76.67%    77.42%    67.77%    60.52%    60.34%    38.31%    32.74%    33.37%
    FWD       85.35%    83.67%    84.61%    85.35%    83.67%    84.61%    86.85%    70.86%    75.41%    60.74%    44.93%    46.65%
    SCL-NL    98.18%    92.06%    94.33%    85.93%    83.69%    84.66%    86.85%    70.59%    75.25%    61.64%    40.46%    45.98%


    In Table 1, we show the experimental results of KDCL, where we compare the performance of the student model optimized by KDCL with that of the same model trained only with complementary labels, across different losses and datasets. On MNIST, which is a relatively simple dataset, all methods achieve high accuracies. With the help of KDCL, we improve the accuracy of the MLP from 83.78% to 86.10% with PC loss, from 92.07% to 94.32% with FWD loss and from 92.06% to 94.33% with SCL-NL loss. SCL-NL loss performs best among the three loss functions. Besides, after being enhanced by KDCL, the accuracy of KDCL-MLP falls between that of the MLP and that of Lenet-5. On F-MNIST, which is more complex than MNIST, all methods show a slight decrease in accuracy. KDCL achieves 77.42% with PC loss, 84.61% with FWD loss and 84.66% with SCL-NL loss. On K-MNIST, which is more complex than F-MNIST, our method does not improve the accuracy of the MLP with PC loss, but it improves the accuracy by 4.55 percentage points with FWD loss and by 4.66 percentage points with SCL-NL loss. On CIFAR-10, which is the most complex of the four datasets, there is a significant drop in accuracy. Nevertheless, the student model still benefits from KDCL, which indicates that the framework remains effective across different datasets.

    We show the testing process of all models in Figure 5.

    Figure 5.  Comparison of the testing process of teacher models, student models and KDCL-student models on four datasets.

    In Figure 5, we present the convergence speed of all models in our experiments. The results show that the student model distilled by KDCL converges faster than that trained only with complementary labels. This indicates that the model can learn the features of the images more accurately and efficiently when utilizing both soft labels and complementary labels.

    Additionally, we observe that the PC loss exhibits a decrease in accuracy on more challenging datasets, particularly on CIFAR10. This is because the PC loss uses the Sigmoid function as the normalization function, which can lead to negative values in the loss calculation and prevent the model from finding better parameters when updating. This phenomenon becomes more pronounced on the CIFAR10 dataset, where a peak appears. However, KDCL can alleviate this phenomenon and shift the peak to a later epoch. This demonstrates the effectiveness of KDCL in addressing the limitations of existing CLL methods and improving the performance of complementary label learning.

    In this study, we established a knowledge distillation training framework for CLL, called KDCL. As stated in the introduction, the supervision information in complementary labels is easily overlooked. The proposed framework employs a deep CNN model with higher accuracy to turn complementary labels into soft labels. Both the soft labels and the original complementary labels are used to train the classification model. After optimization by KDCL, compared to using only the normal CLL methods, the accuracy is improved by 0.5–4.5%.

    KDCL has several limitations. First, its performance could be influenced by the choice of teacher-student models and CLL algorithms. Our experiments use specific combinations of models and algorithms, and the results may vary with different configurations. With stronger CNN networks and better CLL algorithms, KDCL may achieve better performance on more difficult datasets. Another drawback of the proposed scheme is its time cost. Because KDCL is a two-stage training framework that first trains a high-accuracy teacher model using complementary labels, its overall training time is relatively high. Training a high-accuracy model typically takes a considerable amount of time, which poses a challenge to the efficiency of KDCL. In addition, KDCL is only tested on public datasets, in which the data distribution is relatively uniform. In the future, we will consider expanding the application scope of KDCL to dynamically imbalanced data for CLL, or combining it with hybrid deep learning models [27,28,29].

    In this paper, we make a first attempt to leverage the knowledge distillation training framework in CLL. To enhance the supervision information present in complementary labels, which is often overlooked in existing CLL methods, we propose a complementary label enhancement framework based on knowledge distillation, called KDCL. Specifically, KDCL consists of a teacher model and a student model. By adopting knowledge distillation techniques, the teacher model transfers its softened knowledge to the student model. The student model then learns from both soft labels and complementary labels to improve its classification performance. The experimental results on four benchmark datasets show that KDCL can improve the classification accuracy of CLL, and that it remains effective on difficult datasets.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the National Natural Science Foundation of China (No. 61976217, 62306320), the Natural Science Foundation of Jiangsu Province (No. BK20231063), the Fundamental Research Funds of Central Universities (No. 2019XKQYMS87), Science and Technology Planning Project of Xuzhou (No. KC21193).

    All authors declare that they have no conflicts of interest.



    [1] Agricultural Statistics Annual, United States Department of Agriculture, 2017. Available from: https://www.nass.usda.gov/Publications/Ag_Statistics/2017/index.php.
    [2] Per Capita Consumption of Poultry and Livestock, 1960 to Forecast 2020, in Pounds, National Chicken Council, 2019. Available from: https://www.nationalchickencouncil.org/about-the-industry/statistics/per-capita-consumption-of-poultry-\and-livestock-1965-to-estimated-2012-in-pounds/.
    [3] T. Zheng, P. Li, X. Ma, et al., Pilot-scale multi-level biological contact oxidation system on the treatment of high concentration poultry manure wastewater, Process Saf. Environ., 120 (2018), 187-194.
    [4] K. Yetilmezsoy, F. Ilhan, Z. Sapci-Zengin, et al., Decolorization and COD reduction of UASB pretreated poultry manure wastewater by electrocoagulation process: A post-treatment study, J. Hazard. Mater., 162 (2009), 120-132.
    [5] R. Rajagopal and D. I. Massé, Start-up of dry anaerobic digestion system for processing solid poultry litter using adapted liquid inoculum, Process Saf. Environ., 102 (2016), 495-502.
    [6] R. K. Upadhyay, K. Ranjit, J. Datta, et al., Emergence of spatial patterns in a damaged diffusive eco-epidemiological system, Internat. J. Bifur. Chaos., 28 (2018), 1830028. doi: 10.1142/S0218127418300288
    [7] W. Zhang, Y. Wei and Y. Jin, Full-scale processing by anaerobic baffle reactor, sequencing batch reactor, and sand filter for treating high-salinity wastewater from offshore oil rigs, Processes, 6 (2018).
    [8] T. Zhang, Global analysis of continuous flow bioreactor and membrane reactor models with death and maintenance, J. Math. Chem., 50 (2012), 2239-2247.
    [9] T. Zhang, Z. Chen and M. Han, Dynamical analysis of a stochastic model for cascaded continuous flow bioreactors, J. Math. Chem., 52 (2014), 1441-1459.
    [10] Z. Jiang, X. Bi, T. Zhang, et al., Global Hopf bifurcation of a delayed phytoplankton-zooplankton system considering toxin producing effect and delay dependent coefficient, Math. Biosci. Eng., 16 (2019), 3807-3829.
    [11] S. Biggs, M. Habgood, G. J. Jameson, et al., Aggregate structures formed via a bridging flocculation mechanism, Chem. Eng. J., 80 (2000), 13-22. doi: 10.1016/S1383-5866(00)00072-1
    [12] M. Hjorth and B. U. Jørgensen, Polymer flocculation mechanism in animal slurry established by charge neutralization, Water Res., 46 (2012), 1045-1051.
    [13] P. Sun, C. Hui, N. Bai, et al., Revealing the characteristics of a novel bioflocculant and its flocculation performance in Microcystis aeruginosa removal, Sci. Rep., 5 (2015), 17465.
    [14] T. Holst, N. O. G. Jørgensen, C. Jørgensen, et al., Degradation of microcystin in sediments at oxic and anoxic, denitrifying conditions, Water Res., 37 (2003), 4748-4760.
    [15] C. Butterfield, Studies of sewage purification: II. A zooglea-forming bacterium isolated from activated sludge, Public Health Rep., 50 (1935), 671-684.
    [16] J. Chattopadhyay, R. Sarkar and A. El Abdllaoui, Conditions for production of microbial cell flocculant by Aspergillus sojae AJ7002, Agric. Biol. Chem., 40 (1976), 1341-1347.
    [17] H. Takagi and K. Kadowaki, Purification and chemical properties of a flocculant produced by Paecilomyces, Agric. Biol. Chem., 49 (1985), 3159-3164.
    [18] R. Kurane, K. Toeda, K. Takeda, et al., Culture conditions for production of microbial flocculant by Rhodococcus erythropolis, Agric. Biol. Chem., 50 (1986), 2309-2313.
    [19] H. Salehizadeh and S. Shojaosadati, Isolation and characterisation of a bioflocculant produced by Bacillus firmus, Biotechnol. Lett., 24 (2002), 35-40.
    [20] Z. Zhang, S. Xia, J. Zhao, et al., Characterization and flocculation mechanism of high efficiency microbial flocculant TJ-F1 from Proteus mirabilis, Colloid. Surface B., 75 (2010), 247-251.
    [21] K. Song, W. Ma, S. Guo, et al., A class of dynamic models describing microbial flocculant with nutrient competition and metabolic products in wastewater treatment, Adv. Difference Equ., 2018 (2018), 33.
    [22] K. Song, W. Ma, S. Guo, et al., Global behavior of a dynamic model with biodegradation of microcystins, J. Appl. Anal. Comput., 9 (2019), 1261-1276.
    [23] K. Song, T. Zhang and W. Ma, Nontrivial periodic solution of a stochastic non-autonomous model with biodegradation of microcystins, Appl. Math. Lett., 94 (2019), 87-93.
    [24] S. Guo and W. Ma, Global dynamics of a microorganism flocculation model with time delay, Commun. Pure Appl. Anal., 16 (2017), 1883-1891.
    [25] S. Guo, W. Ma and X. Zhao, Global dynamics of a time-delayed microorganism flocculation model with saturated functional responses, J. Dynam. Differential Equations, 30 (2018), 1247-1271.
    [26] M. Galle and C. Jungen, Discontinuous sewage treatment process and small installation for carrying out this process, EP 2004.
    [27] J. Fleischer, K. Schlafmann, R. Otchwemah, et al., Elimination of enteroviruses, other enteric viruses, F-specific coliphages, somatic coliphages and E. coli in four sewage treatment plants of southern Germany, J. Water. Supply: Res. T., 49 (2000), 127-138.
    [28] T. Hsu, T. Meadows, L. Meadows, et al., Growth on Two Limiting Essential Resources in a Self-Cycling Fermentor, Math. Biosci. Eng., 16 (2018), 78-100.
    [29] M. Mohajerani, M. Mehrvar and F. Ein-Mozaffari, Recent Achievements in Combination of Ultrasonolysis and Other Advanced Oxidation Processes for Wastewater Treatment, Int. J. Chem. React. Eng., 8 (2010).
    [30] T. Zhang, T. Zhang and X. Meng, Stability analysis of a chemostat model with maintenance energy, Appl. Math. Lett., 68 (2017), 1-7.
    [31] T. Zhang, X. Liu, X. Meng, et al., Spatio-temporal dynamics near the steady state of a planktonic system, Comput. Math. Appl., 75 (2018), 4490-4504. doi: 10.1016/j.camwa.2018.03.044
    [32] T. Zhang, W. Ma, X. Meng, et al., Periodic solution of a prey-predator model with nonlinear state feedback control, Appl. Math. Comput., 266 (2015), 95-107.
    [33] H. Zhang, P. Georgescu and L. Zhang, Periodic patterns and Pareto efficiency of state dependent impulsive controls regulating interactions between wild and transgenic mosquito populations, Commun. Nonlinear Sci. Numer. Simul., 31 (2016), 83-107.
    [34] H. Qi, X. Meng and T. Feng, Dynamics analysis of a stochastic non-autonomous one-predator-two-prey system with Beddington-DeAngelis functional response and impulsive perturbations, Adv. Difference Equ., 2019 (2019), 235.
    [35] Y. Li, H. Cheng, J. Wang, et al., Dynamic analysis of unilateral diffusion Gompertz model with impulsive control strategy, Adv. Difference Equ., 2018 (2018), 32.
    [36] T. Zhang, W. Ma and X. Meng, Global dynamics of a delayed chemostat model with harvest by impulsive flocculant input, Adv. Difference Equ., 2017 (2017), 115.
    [37] B. Liu, Y. Zhang and L. Chen, The dynamical behaviors of a Lotka-Volterra predator-prey model concerning integrated pest management, Nonlinear Anal. Real World Appl., 6 (2005), 227-243.
    [38] G. Liu, Z. Chang and X. Meng, Asymptotic analysis of impulsive dispersal predator-prey systems with Markov switching on finite-state space, J. Funct. Spaces, 2019 (2019), 8057153.
    [39] J. Jiao, S. Cai and L. Chen, Analysis of a stage-structured predator-prey system with birth pulse and impulsive harvesting at different moments, Nonlinear Anal. Real World Appl., 12 (2011), 2232-2244.
    [40] M. Chi and W. Zhao, Dynamical Analysis of Two-Microorganism and Single Nutrient Stochastic Chemostat Model with Monod-Haldane Response Function, Complexity, 2019 (2019), 8719067.
    [41] X. Zhuo, Global attractability and permanence for a new stage-structured delay impulsive ecosystem, J. Appl. Anal. Comput., 8 (2018), 457-470.
    [42] J. Wang, H. Cheng, Y. Li, et al., The geometrical analysis of a predator-prey model with multi-state dependent impulsive, J. Appl. Anal. Comput., 8 (2018), 427-442.
    [43] Z. Jiang, W. Zhang, J. Zhang, et al., Dynamical analysis of a phytoplankton-zooplankton system with harvesting term and Holling III functional response, Internat. J. Bifur. Chaos., 28 (2018), 1850162.
    [44] S. Yuan and T. Zhang, Dynamics of a plasmid chemostat model with periodic nutrient input and delayed nutrient recycling, Nonlinear Anal. Real World Appl., 13 (2012), 2104-2119.
    [45] X. Meng, L. Wang and T. Zhang, Global dynamics analysis of a nonlinear impulsive stochastic chemostat system in a polluted environment, J. Appl. Anal. Comput., 6 (2016), 865-875.
    [46] J. Gao, B. Shen, E. Feng, et al., Modelling and optimal control for an impulsive dynamical system in microbial fed-batch culture, J. Comput. Appl. Math., 32 (2013), 275-290.
    [47] S. Sun, Y. Sun, G. Zhang, et al., Dynamical behavior of a stochastic two-species Monod competition chemostat model, Appl. Math. Comput., 298 (2017), 153-170.
    [48] S. Zhang, X. Meng, T. Feng, et al., Dynamics analysis and numerical simulations of a stochastic non-autonomous predator-prey system with impulsive effects, Nonlinear Anal. Hybrid Syst., 26 (2017), 19-37.
    [49] Z. Li, L. Chen and Z. Liu, Periodic solution of a chemostat model with variable yield and impulsive state feedback control, Appl. Math. Model., 36 (2012), 1255-1266.
    [50] M. Chi and W. Zhao, Dynamical analysis of multi-nutrient and single microorganism chemostat model in a polluted environment, Adv. Difference Equ., 2018 (2018), 120.
    [51] K. Sun, Y. Tian, L. Chen, et al., Nonlinear modelling of a synchronized chemostat with impulsive state feedback control, Math. Comput. Model., 52 (2010), 227-240.
    [52] H. Guo and L. Chen, Periodic solution of a chemostat model with Monod growth rate and impulsive state feedback control, J. Theoret. Biol., 260 (2009), 502-509.
    [53] K. Liu, T. Zhang and L. Chen, State-dependent pulse vaccination and therapeutic strategy in an SI epidemic model with nonlinear incidence rate, Comput. Math. Methods Med., 2019 (2019), Article ID 3859815, 10 pages.
    [54] T. Zhang, X. Meng, Y. Song, et al., A stage-structured predator-prey SI model with disease in the prey and impulsive effects, Math. Model. Anal., 18 (2013), 505-528.
    [55] S. Gao, L. Luo, S. Yan, et al., Dynamical behavior of a novel impulsive switching model for HLB with seasonal fluctuations, Complexity, 2018 (2018), 11 pages.
    [56] Y. Song, A. Miao, T. Zhang, et al., Extinction and persistence of a stochastic SIRS epidemic model with saturated incidence rate and transfer from infectious to susceptible, Adv. Difference Equ., 2018 (2018), 293.
    [57] X. Fan, Y. Song and W. Zhao, Modeling cell-to-cell spread of HIV-1 with nonlocal infections, Complexity, 2018 (2018), 2139290.
    [58] N. Gao, Y. Song, X. Wang, et al., Dynamics of a stochastic SIS epidemic model with nonlinear incidence rates, Adv. Difference Equ., 2019 (2019), 41.
    [59] Z. Bai, X. Dong and C. Yin, Existence results for impulsive nonlinear fractional differential equation with mixed boundary conditions, Bound. Value Probl., 2016 (2016), 63.
    [60] G. Li, W. Ling and C. Ding, A new comparison principle for impulsive functional differential equations, Discrete Dyn. Nat. Soc., (2015), 139828.
    [61] Z. Lü, Y. Zheng and L. Zhang, Razumikhin type boundedness theorems in terms of two measures for impulsive integro-differential systems, Acta. Math. Appl. Sin-E, 30 (2014), 1007-1016.
    [62] D. Bainov and P. Simeonov, Systems with Impulse Effect: Stability, Theory, and Applications, Ellis Horwood, Chichester, 1989.
    [63] V. Lakshmikantham, D. Bainov and P. Simeonov, Theory of Impulsive Differential Equations, Vol. 6, World Scientific, Singapore, 1989.
    [64] V. Nemytskii and V. Stepanov, Qualitative Theory of Differential Equations, Vol. 22, Courier Dover Publications, New York, 1989.
  • © 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
