Research article Special Issues

Vaccine-induced reduction of COVID-19 clusters in school settings in Japan during the epidemic wave caused by B.1.1.529 (Omicron) BA.2, 2022

  • Clusters of COVID-19 in high-risk settings, such as schools, have been deemed a critical driving force of the major epidemic waves at the societal level. In Japan, the vaccination coverage among students remained low up to early 2022, especially for 5–11-year-olds. The vaccination of the student population only started in February 2022. Given this background and considering that vaccine effectiveness against school transmission has not been intensively studied, this paper proposes a mathematical model that links the occurrence of clustering to the case count among populations aged 0–19, 20–59, and 60+ years of age. We first estimated the protected (immune) fraction of each age group either by infection or vaccination and then linked the case count in each age group to the number of clusters via a time series regression model that accounts for the time-varying hazard of clustering per infector. From January 3 to May 30, 2022, there were 4,722 reported clusters in school settings. Our model suggests that the immunity offered by vaccination averted 226 (95% credible interval: 219–232) school clusters. Counterfactual scenarios assuming elevated vaccination coverage with faster roll-out reveal that additional school clusters could have been averted. Our study indicates that even relatively low vaccination coverage among students could substantially lower the risk of clustering through vaccine-induced immunity. Our results also suggest that antigenically updated vaccines that are more effective against the variant responsible for the ongoing epidemic may greatly help decrease not only the incidence but also the unnecessary loss of learning opportunities among school-age students.

    Citation: Yuta Okada, Hiroshi Nishiura. Vaccine-induced reduction of COVID-19 clusters in school settings in Japan during the epidemic wave caused by B.1.1.529 (Omicron) BA.2, 2022[J]. Mathematical Biosciences and Engineering, 2024, 21(9): 7087-7101. doi: 10.3934/mbe.2024312




    Supervised learning is an important branch of machine learning. In supervised multi-classification problems, each sample is assigned a label which indicates the category it belongs to [1]. Supervised learning is effective when there are enough samples with high quality labels. However, it is expensive and time-consuming to build datasets with a multitude of accurate labels. To solve this problem, researchers have proposed a series of weakly supervised learning (WSL) methods, which aim to train models with partial, incomplete or inaccurate supervised information, such as noise-label learning [2,3,4,5], semi-supervised learning [6,7,8,9], partial-label learning [10,11,12], positive-confidence learning [13], unlabeled-unlabeled learning [14] and others.

    In this paper, we consider another WSL framework called complementary label learning (CLL). We show the difference between complementary labels and true labels in Figure 1. Compared to an ordinary label, a complementary label indicates a class that the sample does not belong to. It is easier and less costly to collect such complementary labels. For example, in some highly specialized domains, expert knowledge is very expensive. If complementary labels are used for annotation, we only need to determine the extent of the label space and then use common sense to determine which category is wrong. It is much simpler and faster to determine which class a sample does not belong to than which class it belongs to. Besides, CLL can also protect data privacy in sensitive fields such as medical and financial records, because we no longer need to disclose the true information of the data. This not only protects data privacy and security, but also makes it easier to collect data in these areas.

    Figure 1.  Comparison of the complementary labels (bottom) with the real labels (top). A complementary label is one of the categories the image does not belong to.

    The framework of CLL was first proposed by Ishida et al. [15]. They proved that the unbiased risk estimator (URE) computed only from complementary labels is equivalent to the ordinary classification risk when the loss function satisfies certain conditions. In URE, the loss function must be nonconvex and symmetric, which limits its applicability. To overcome this limitation, Yu et al. [16] made cross-entropy loss usable in CLL by constructing a complementary label transition matrix, and they also considered the case in which different labels have different probabilities of being selected as a complementary label. Then, Ishida et al. [17] extended URE and proposed a CLL framework adapted to more general loss functions. This framework still has an unbiased estimator of the regular classification risk, but it works for all loss functions. Chou et al. [18] improved URE from the perspective of gradient estimation and proposed using a surrogate complementary loss (SCL) to obtain an unbiased risk estimate, which effectively alleviated the overfitting problem of URE. Liu et al. [19] applied common losses such as categorical cross entropy (CCE), mean square error (MSE) and mean absolute error (MAE) to CLL. Ishiguro et al. [20] studied the problem that complementary labels may be affected by label noise. To mitigate its adverse effects, they selected noise-robust losses that satisfy a weighted symmetric condition or a more relaxed condition. Recently, Zhang et al. [21] broadened the setting of complementary label datasets and discussed the case in which a dataset contains a large number of complementary labels and a small number of true labels at the same time. They proposed an adversarial complementary label learning network, named Clarinet. Clarinet consists of two deep neural networks, one to classify complementary labels and true labels, and the other to learn from complementary labels.

    Previous studies on CLL always focus on rewriting the classification risk under the ordinary label distribution to the risk under the complementary label distribution and exploring the use of more loss functions [15,16,17,18,19]. These rewriting risk techniques prove the consistency relationship between the risk of complementary label classification and the risk of supervised classification. This enables the classifier to perform accurate classification using only the complementary labels. However, in this process, only complementary labels are involved in the risk calculation, and the information contained in them is extremely limited, which results in consistently lower performance of CLL compared to supervised learning. Therefore, we aim to enhance the supervision information of the complementary labels to further improve the performance of CLL. In this paper, we propose a two-step complementary label enhancement framework based on knowledge distillation (KDCL). It consists of the following components: 1) a teacher model trained on complementary label dataset to generate soft labels which contain more supervision information as label distribution; 2) a student model trained on the same dataset to learn from both soft labels and complementary labels; 3) a final loss function to integrate loss from soft labels and complementary labels and update parameters of the student model. We use three CLL loss functions to conduct experiments on several benchmark datasets, and compare the accuracy of the student model before and after enhancement by KDCL. The experimental results show that KDCL can effectively improve the performance of CLL.

    Suppose that the input sample is a $d$-dimensional vector $x\in\mathbb{R}^d$ with class label $y\in\{1,2,\ldots,K\}$, where $K$ is the number of classes in the dataset. We are given a training set $D=\{(x_i,y_i)\}_{i=1}^{N}$ with $N$ samples, all of which independently follow the same distribution $p(x,y)$. The goal of learning from true labels is to learn a mapping $f(x)$ from the sample space $\mathbb{R}^d$ to the label space $\{1,2,\ldots,K\}$; $f(x)$ is also called a classifier. We want $f(x)$ to minimize the multi-class classification risk:

    $$R(f)=\mathbb{E}_{p(x,y)\sim D}\left[L(f(x),y)\right], \tag{1}$$

    where $L(f(x),y)$ is a multi-class loss function. The classifier $f(x)$ is usually obtained by the following equation:

    $$f(x)=\arg\max_{y\in\{1,2,\ldots,K\}}g_y(x), \tag{2}$$

    where $g(x):\mathbb{R}^d\to\mathbb{R}^K$. In deep neural networks, $g(x)$ is the prediction distribution of the output from the last fully connected layer.

    In general, the distribution $p(x,y)$ is unknown. We can use the sample mean to approximate the classification risk in Eq (1). $R(f)$ is empirically estimated as $\hat{R}(f)$:

    $$\hat{R}(f)=\frac{1}{N}\sum_{i=1}^{N}L(f(x_i),y_i), \tag{3}$$

    where $N$ is the number of training samples and $i$ indexes the $i$-th sample.

    In CLL, each sample $x$ is assigned only one complementary label $\bar{y}$. Therefore, the dataset is switched from $D=\{(x_i,y_i)\}_{i=1}^{N}$ to $\bar{D}=\{(x_i,\bar{y}_i)\}_{i=1}^{N}$, where $\bar{y}\in\{1,2,\ldots,K\}\setminus\{y\}$ and $D\neq\bar{D}$. The samples in $\bar{D}$ independently follow an unknown distribution $\bar{p}(x,\bar{y})$. If all complementary labels are selected in an unbiased way, which means that every wrong class has the same probability of being chosen, $\bar{p}(x,\bar{y})$ can be expressed as:

    $$\bar{p}(x,\bar{y})=\frac{1}{K-1}\sum_{y\neq\bar{y}}p(x,y). \tag{4}$$
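    The unbiased selection described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `sample_complementary_label`, the fixed seed, and the toy label list are assumptions, not part of the original experiments.

```python
import random

def sample_complementary_label(true_label, num_classes, rng):
    """Draw one complementary label uniformly from the K - 1 wrong classes."""
    # Classes are numbered 1..K, matching the label space used in the text.
    candidates = [c for c in range(1, num_classes + 1) if c != true_label]
    return rng.choice(candidates)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_labels = [3, 1, 2, 3, 1]  # toy labels for K = 3 classes
complementary = [sample_complementary_label(y, 3, rng) for y in true_labels]
```

    Because every wrong class is equally likely, each one is chosen with probability 1/(K-1), which is the factor that appears in Eq (4).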

    Supposing that $\bar{L}(f(x),\bar{y})$ is a complementary loss function, we can obtain a multi-class risk similar to Eq (1) under the distribution $\bar{p}(x,\bar{y})$:

    $$\bar{R}(f)=\mathbb{E}_{\bar{p}(x,\bar{y})\sim\bar{D}}\left[\bar{L}(f(x),\bar{y})\right]. \tag{5}$$

    To the best of our knowledge, Ishida et al. [15] were the first to prove that the difference between Eq (1) and Eq (5) is constant when the loss function $\bar{L}$ satisfies certain conditions, and that this constant $M$ depends only on the number of categories $K$:

    $$R(f)=(K-1)\mathbb{E}_{\bar{p}(x,\bar{y})\sim\bar{D}}\left[\bar{L}(f(x),\bar{y})\right]+M=(K-1)\bar{R}(f)+M. \tag{6}$$

    All coefficients are constant when the loss function satisfies the condition, so it is possible to learn from complementary labels by minimizing $R(f)$ in Eq (6). They then rewrite the one-versus-all (OVA) loss $L_{OVA}$ and the pairwise-comparison (PC) loss $L_{PC}$ of ordinary multi-class classification as $\bar{L}_{OVA}$ and $\bar{L}_{PC}$ in CLL:

    $$\begin{aligned}
    \bar{L}_{OVA}(g(x),\bar{y})&=\frac{1}{K-1}\sum_{y\neq\bar{y}}l(g_y(x))+l(-g_{\bar{y}}(x)),\\
    \bar{L}_{PC}(g(x),\bar{y})&=\sum_{y\neq\bar{y}}l(g_y(x)-g_{\bar{y}}(x)),
    \end{aligned} \tag{7}$$

    where $l(z):\mathbb{R}\to\mathbb{R}$ is a binary loss that must be nonconvex and symmetric, such as the sigmoid loss. $g(x)$ is the same as in Eq (2) and $g_y(x)$ is the $y$-th element of $g(x)$. Finally, the unbiased risk estimator of $R(f)$ can be obtained by the sample mean:

    $$\hat{R}(f)=\frac{K-1}{N}\sum_{n=1}^{N}\bar{L}(f(x_n),\bar{y}_n)+M. \tag{8}$$
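    As a minimal sketch of Eqs (7) and (8), the following pure-Python code evaluates the OVA and PC complementary losses with the sigmoid loss l(z) = 1/(1 + exp(z)) and averages them over a batch. The function names, the 0-based class indices, and the default M = 0 (an additive constant does not change the minimizer) are assumptions made for illustration.

```python
import math

def sigmoid_loss(z):
    """Binary loss l(z) = 1 / (1 + exp(z)); it satisfies l(z) + l(-z) = 1."""
    return 1.0 / (1.0 + math.exp(z))

def ova_complementary_loss(g, comp_label):
    """OVA form of Eq (7). g holds K scores; comp_label is a 0-based index."""
    K = len(g)
    others = sum(sigmoid_loss(g[y]) for y in range(K) if y != comp_label)
    return others / (K - 1) + sigmoid_loss(-g[comp_label])

def pc_complementary_loss(g, comp_label):
    """PC form of Eq (7): sum of l(g_y - g_ybar) over all y != ybar."""
    K = len(g)
    return sum(sigmoid_loss(g[y] - g[comp_label])
               for y in range(K) if y != comp_label)

def empirical_risk(scores, comp_labels, loss, M=0.0):
    """Eq (8): (K - 1) / N times the summed complementary loss, plus M."""
    N, K = len(scores), len(scores[0])
    total = sum(loss(g, yb) for g, yb in zip(scores, comp_labels))
    return (K - 1) / N * total + M

# Toy batch: two samples, three classes, complementary labels 2 and 0.
batch_scores = [[2.0, 1.0, -3.0], [-2.0, 0.5, 1.5]]
batch_comp = [2, 0]
risk_pc = empirical_risk(batch_scores, batch_comp, pc_complementary_loss)
```

    Both losses are small when the score of the complementary class is low relative to the others, which is the behaviour the risk in Eq (8) rewards.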

    Although it is feasible to learn a classifier that minimizes Eq (8) from complementary labels, the restriction on the loss function limits the application of URE. Yu et al. [16] analyze the relationship between ordinary and complementary labels in terms of conditional probability:

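    $$P(\bar{y}=j\mid x)=\sum_{i\neq j}P(\bar{y}=j\mid y=i)P(y=i\mid x), \tag{9}$$

    where $i,j\in\{1,2,\ldots,K\}$. When all complementary labels are selected in an unbiased way, $P(\bar{y}\mid y)$ can be expressed as a transition matrix $Q$:

    $$Q=\begin{bmatrix}
    0 & \frac{1}{K-1} & \cdots & \frac{1}{K-1}\\
    \frac{1}{K-1} & 0 & \cdots & \frac{1}{K-1}\\
    \vdots & \vdots & \ddots & \vdots\\
    \frac{1}{K-1} & \frac{1}{K-1} & \cdots & 0
    \end{bmatrix}_{K\times K}, \tag{10}$$

    where each element of $Q$ represents $P(\bar{y}=j\mid y=i)$. Since the true label and the complementary label of a sample are mutually exclusive, $P(\bar{y}=i\mid y=i)=0$, so the entries on the diagonal of the matrix are 0.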
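    To make Eqs (9) and (10) concrete, the sketch below builds the uniform transition matrix for K = 4 and maps a class posterior to the corresponding complementary-label posterior. The function names and the example posterior are hypothetical, chosen only for illustration.

```python
def transition_matrix(num_classes):
    """Uniform Q of Eq (10): 0 on the diagonal, 1 / (K - 1) elsewhere."""
    K = num_classes
    return [[0.0 if i == j else 1.0 / (K - 1) for j in range(K)]
            for i in range(K)]

def complementary_posterior(Q, class_posterior):
    """Eq (9): P(ybar = j | x) = sum_i Q[i][j] * P(y = i | x)."""
    K = len(Q)
    return [sum(Q[i][j] * class_posterior[i] for i in range(K))
            for j in range(K)]

Q = transition_matrix(4)
# A sample that most likely belongs to class 0.
p_bar = complementary_posterior(Q, [0.7, 0.1, 0.1, 0.1])
```

    In this example the likely true class receives the smallest complementary probability, (1 - 0.7) / 3 = 0.1, while each of the other classes receives (1 - 0.1) / 3 = 0.3.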

    Combining Eqs (5), (9) and (10), we can rewrite $\bar{R}(f)$ as:

    $$\bar{R}(f)=\mathbb{E}_{\bar{p}(x,\bar{y})}\left[L_{CE}\left(Q^{T}g(x),\bar{y}\right)\right], \tag{11}$$

    where $L_{CE}$ is the cross-entropy loss, which is widely used in deep learning. The classification risk $\bar{R}(f)$ in Eq (11) is also consistent with the ordinary classification risk $R(f)$ [16].
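    A minimal sketch of the per-sample loss inside Eq (11), assuming the uniform transition matrix of Eq (10) and assuming that g(x) is the Softmax output of the network. The names `softmax` and `forward_loss` and the 0-based indexing are illustrative choices.

```python
import math

def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward_loss(logits, comp_label):
    """Cross-entropy between Q^T g(x) and a one-hot complementary label."""
    K = len(logits)
    p = softmax(logits)  # estimate of P(y | x)
    # With the uniform Q, (Q^T p)_j = sum_{i != j} p_i / (K - 1).
    q = [sum(p[i] for i in range(K) if i != j) / (K - 1) for j in range(K)]
    return -math.log(q[comp_label])

low = forward_loss([3.0, 0.0, -3.0], 2)   # complementary class scored low
high = forward_loss([3.0, 0.0, 3.0], 2)   # complementary class scored high
```

    The loss shrinks as the network assigns less probability to the complementary class, and under the uniform Q it can never fall below log(K - 1).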

    In image classification, the outputs of the last fully connected layer of a deep neural network, after the Softmax function, contain the predicted probability distribution over all classes. Compared with a single logical label, these outputs carry more information. Hinton et al. [22] define the outputs as soft labels and propose a knowledge distillation framework. We draw on the idea of knowledge distillation and aim to improve the performance of CLL by enhancing complementary labels through soft labels.

    In the framework of knowledge distillation, Hinton et al. [22] modify the Softmax function and they introduce the parameter T to control the smoothness of soft labels. The ordinary Softmax function can be expressed as follows:

    $$y'_i=\frac{\exp(y_i)}{\sum_{j}\exp(y_j)}, \tag{12}$$

    where $y'_i$ is the predicted probability of the $i$-th class, $\exp(\cdot)$ is the exponential function and $y_i$ is the predicted output of the classification network for the $i$-th class. The Softmax function combines the prediction outputs of the model for all classes and uses the exponential function to normalize the output values to the interval [0, 1].

    The rewritten Softmax function is as follows:

    $$y'_i=\frac{\exp(y_i/T)}{\sum_{j}\exp(y_j/T)}. \tag{13}$$

    We present a comparison of the smoothness of soft labels for different $T$ in Figure 2. As $T$ gradually increases, the soft labels become smoother. In effect, $T$ regulates how much attention is paid to the negative labels: the higher $T$, the more attention is paid to them. $T$ is an adjustable hyperparameter during training.

    Figure 2.  The smoothness of soft labels for different T. The higher T, the smoother soft labels will be.
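    The effect of the temperature can be checked numerically with a short sketch of the temperature-scaled Softmax. The function name and the example logits are assumptions for illustration; T = 80 is used because it is the value selected later in the experiments.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled Softmax: exp(y_i / T) / sum_j exp(y_j / T)."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [6.0, 2.0, 1.0]
sharp = softmax_with_temperature(logits, T=1.0)    # ordinary Softmax
smooth = softmax_with_temperature(logits, T=80.0)  # heavily softened labels
```

    With T = 1 almost all probability mass sits on the first class, whereas with T = 80 the three probabilities are nearly equal, so the negative classes contribute much more to the soft label.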

    For one sample, soft labels not only indicate its most likely category, but also capture the correlation between the other labels. Soft labels therefore carry more information than a complementary label. If we add an extra term to the ordinary complementary label classification loss and introduce soft labels as additional supervision information, CLL can be expected to perform better than when using only complementary labels. Of course, we need a model with high accuracy to produce the soft labels, which makes them more credible. This model is also trained with complementary labels.

    Taking advantage of this property, we propose KDCL, a complementary label learning framework based on knowledge distillation. The overall structure is shown in Figure 3.

    Figure 3.  The framework architecture of KDCL. α and β are the weighting factors to balance KL loss and complementary loss.

    KDCL is a two-stage training framework consisting of a more complex teacher model with higher accuracy and a simpler student model with lower accuracy. First, the teacher model is trained with complementary labels on the dataset and predicts all samples in the training set. The prediction results are normalized by the Softmax function with $T=t$ $(t>1)$ to generate soft labels $S_{tea}$. Second, the student model is trained and its outputs are processed in two ways, one to produce the soft prediction results $S_{stu}$ with $T=t$ $(t>1)$, and the other to output ordinary prediction results $P_{stu}$ with $T=1$. Then, the KL divergence between $S_{tea}$ and $S_{stu}$ is calculated, and the complementary label loss between $P_{stu}$ and the complementary labels is calculated at the same time. The two losses are weighted to obtain the final distillation loss. Finally, the parameters of the student model are updated using the final loss.

    In KDCL, the final loss consists of the Kullback-Leibler (KL) loss and the complementary loss. On the one hand, the student model needs to learn knowledge from the teacher model to improve its ability. On the other hand, the teacher model is not completely correct, so the student model also needs to learn by itself to reduce the influence of the teacher model's errors on the learning process. It is therefore better to consider both.

    The final distillation loss consists of two parts and it can be expressed as follows:

    $$L_{KDCL}=\alpha L_{KL}+L_{CL}, \tag{14}$$

    where $L_{KL}$ denotes the KL divergence and $L_{CL}$ denotes the complementary loss. Given the probability distributions $p_t$ from the teacher model and $p_s$ from the student model, their KL divergence can be expressed as follows:

    $$L_{KL}(p_t,p_s)=-\sum_{i}p_{ti}\log\frac{p_{si}}{p_{ti}}, \tag{15}$$

    where $i$ denotes the $i$-th element of the tensor $p_t$ or $p_s$.
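    For reference, the KL divergence between a teacher distribution and a student distribution can be computed as in the sketch below. It uses the standard definition, the sum over i of p_t[i] * log(p_t[i] / p_s[i]); the function name and the small `eps` guard against log(0) are assumptions added for illustration.

```python
import math

def kl_divergence(p_t, p_s, eps=1e-12):
    """KL(p_t || p_s) = sum_i p_t[i] * log(p_t[i] / p_s[i])."""
    # Terms with p_t[i] == 0 contribute nothing and are skipped.
    return sum(pt * math.log((pt + eps) / (ps + eps))
               for pt, ps in zip(p_t, p_s) if pt > 0.0)

teacher = [0.9, 0.1]
student = [0.5, 0.5]
kl_same = kl_divergence(teacher, teacher)  # identical distributions
kl_diff = kl_divergence(teacher, student)  # student disagrees with teacher
```

    The divergence is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions move apart, which is why minimizing it pulls the student toward the teacher's soft labels.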

    We select three complementary losses for KDCL: the PC loss proposed by Ishida et al. [15], the FWD loss proposed by Yu et al. [16] and the SCL-NL loss proposed by Chou et al. [18]. Supposing that $p_s$ is the probability distribution for sample $x$ from the student model and $\bar{y}$ is the complementary label of $x$, these complementary losses are shown in Eqs (16)–(18).

    $$\bar{L}_{PC}(p_s,\bar{y})=\frac{K-1}{n}\sum_{y\neq\bar{y}}\left(p_{sy}-p_{s\bar{y}}\right)-\frac{K\times(K-1)}{2}+K-1, \tag{16}$$
    $$\bar{L}_{FWD}(p_s,\bar{y})=-\sum_{i}\bar{y}_i\times\log\left(Q^{T}\times p_{si}\right), \tag{17}$$
    $$\bar{L}_{SCL\text{-}NL}(p_s,\bar{y})=\sum_{i}\bar{y}_i\times\left(-\log\left(1-p_{s\bar{y}}\right)\right), \tag{18}$$

    where $K$ denotes the number of categories of the dataset, and $Q^{T}$ denotes the transpose of $Q$, which is a $K\times K$ square matrix whose entries are all $1/(K-1)$ except on the diagonal, where they are 0.

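    With parameters $p_t$, $p_s$ and $\bar{y}$, the final loss can be expressed in more detail as follows:

    $$L_{KD\text{-}PC}(p_t,p_s,\bar{y})=\alpha L_{KL}(p_t,p_s)+\bar{L}_{PC}(p_s,\bar{y}) \tag{19}$$
    $$L_{KD\text{-}FWD}(p_t,p_s,\bar{y})=\alpha L_{KL}(p_t,p_s)+\bar{L}_{FWD}(p_s,\bar{y}) \tag{20}$$
    $$L_{KD\text{-}SCL}(p_t,p_s,\bar{y})=\alpha L_{KL}(p_t,p_s)+\bar{L}_{SCL\text{-}NL}(p_s,\bar{y}) \tag{21}$$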

    α is the weighting factor, which is used to control the degree of influence of soft labels on the overall classification loss. The values of α will be determined in the experiment.
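    Putting the pieces together, the following sketch evaluates the distillation loss for a single sample using the SCL-NL complementary loss. It is a simplified illustration rather than the training code used in the experiments: all function names are hypothetical, class indices are 0-based, and the defaults T = 80 and α = 0.5 are the values selected later in the sensitivity analysis.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled Softmax (T = 1 gives the ordinary Softmax)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_t, p_s):
    """KL(p_t || p_s) for strictly positive distributions."""
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))

def scl_nl_loss(p_s, comp_label, eps=1e-12):
    """SCL-NL loss: -log(1 - p_s[ybar]); eps guards against log(0)."""
    return -math.log(1.0 - p_s[comp_label] + eps)

def kdcl_loss(teacher_logits, student_logits, comp_label, T=80.0, alpha=0.5):
    """alpha * KL(S_tea, S_stu) + complementary loss on P_stu."""
    s_tea = softmax(teacher_logits, T)    # softened teacher output
    s_stu = softmax(student_logits, T)    # softened student output
    p_stu = softmax(student_logits, 1.0)  # ordinary student prediction
    return alpha * kl_divergence(s_tea, s_stu) + scl_nl_loss(p_stu, comp_label)

student = [2.0, 0.5, -1.0]
teacher = [0.0, 3.0, -1.0]
loss = kdcl_loss(teacher, student, comp_label=2)
```

    When the student's softened output matches the teacher's, the KL term vanishes and only the complementary loss remains, so α controls how strongly any disagreement with the teacher is penalized.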

    We evaluate and compare the student models optimized by KDCL with the same models only trained by complementary labels on four public image classification datasets. Three complementary label losses including PC loss [15], FWD loss [16] and SCL-NL loss [18], are used as loss functions for training the models. All the experiments are carried out on a server with a 15 vCPU Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz, 80 GB RAM and one RTX 3090 GPU with 24 GB memory.

    Four benchmark image classification datasets, including MNIST, Fashion-MNIST (F-MNIST), Kuzushiji-MNIST (K-MNIST) and CIFAR10, are used to verify the effectiveness of KDCL.

    MNIST: consists of 60,000 28 × 28 pixel grayscale images for training and 10,000 images for testing, with a total of 10 categories representing numbers between 0 and 9.

    F-MNIST: is an alternative dataset to MNIST and consists of 10 categories, 60,000 training images and 10,000 test images, each with a size of 28 × 28 pixels.

    K-MNIST: is an extension of the MNIST dataset derived from 10 ancient Japanese characters widely used between the mid-Heian period and early modern Japan. K-MNIST contains a total of 70,000 gray-scale images of 28 × 28 pixels in 10 categories.

    CIFAR10: consists of 60,000 32 × 32 color images, 50,000 of which are used as the training set and 10,000 as the test set. Each category contains 6000 images.

    Following the settings in [15,17,18], we select complementary labels in an unbiased way for the samples in all datasets. Besides, we apply two different sets of teacher-student networks to these datasets. Specifically, for MNIST, F-MNIST and K-MNIST, we choose Lenet-5 [23] as the teacher model and an MLP [24] with 500 hidden neurons as the student model, because these datasets are relatively simple and simple networks work well on them. For the CIFAR10 dataset, since color images are more difficult to classify, we need a deeper CNN to extract features. We choose DenseNet-121 [25] as the teacher model and ResNet-18 [26] as the student model.

    For training, on MNIST, F-MNIST and K-MNIST we train Lenet-5 and the MLP for 120 epochs and use SGD as the optimizer with a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate is 0.1 and it is halved every 30 epochs. The batch size is set to 128. For the CIFAR10 dataset, we train DenseNet-121 and ResNet-18 for 80 epochs and use SGD as the optimizer with a momentum of 0.9 and a weight decay of 0.0005. The learning rate is selected from {1e-1, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4} and it is divided by 10 every 30 epochs.
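    The step-wise learning rate decay described above can be written as a one-line schedule. This sketch assumes 0-based epoch numbering and uses a hypothetical helper name, `step_lr`; the initial rate 1e-2 in the CIFAR10 example is just one of the candidate values listed above.

```python
def step_lr(initial_lr, epoch, step_size=30, factor=0.5):
    """Learning rate after multiplying by `factor` every `step_size` epochs."""
    return initial_lr * factor ** (epoch // step_size)

# MNIST-style setting: start at 0.1 and halve every 30 epochs for 120 epochs.
mnist_schedule = [step_lr(0.1, epoch) for epoch in range(120)]

# CIFAR10-style setting: divide by 10 every 30 epochs for 80 epochs.
cifar_schedule = [step_lr(1e-2, epoch, factor=0.1) for epoch in range(80)]
```

    Under these assumptions the MNIST-style schedule takes the values 0.1, 0.05, 0.025 and 0.0125 over the four 30-epoch blocks.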

    In Figure 4, we make a parameter sensitivity analysis of the distillation temperature T in Eq (13) and the soft label weighting factor α in Eqs (19)–(21).

    Figure 4.  Test accuracy results of different T with fixed α and comparison results of different α with fixed T. The experiments are conducted with Lenet-5 and MLP on MNIST, F-MNIST and K-MNIST, and with DenseNet-121 and ResNet-18 on CIFAR-10.

    We first explore the influence of the distillation temperature $T$. As we can see, when $T=1$, which means directly using the probability distribution output by the teacher model as soft labels without softening, KDCL exhibits the worst accuracy. This is because when the temperature is low, the soft labels differ sharply between positive and negative classes, making it difficult for the student model to learn effectively. As $T$ gradually increases, the soft labels become smoother, the student model can more easily learn the knowledge they contain, and the accuracy gradually improves. When $T>80$, the gap between positive and negative classes in the soft labels is extremely small and the influence of the negative classes is too large, so the accuracy no longer increases and may even decrease.

    Then, we further investigate the optimal value of soft label weighting factor α. We follow the setting in Hinton et al. [22], and set α in the range of 0 to 1. On the same dataset, the change of α does not have a great impact on the accuracy of KDCL. This indicates that the KDCL model parameter optimization process is not sensitive to the hyperparameter α. Nevertheless, the model still achieves higher accuracy when α=0.5.

    Based on the above analysis, we will set T=80, α=0.5 in subsequent experiments.

    We show the accuracy for all models with three complementary label losses before and after being optimized by KDCL on four datasets. The results are presented in Table 1.

    Table 1.  Comparison of classification accuracies between different methods using different network architectures on MNIST, F-MNIST, K-MNIST and CIFAR-10.
    Dataset    Loss     Lenet-5   MLP       KDCL-MLP
    MNIST      PC       89.94%    83.78%    86.10%
    MNIST      FWD      85.35%    83.67%    84.61%
    MNIST      SCL-NL   98.18%    92.06%    94.33%
    F-MNIST    PC       77.22%    76.67%    77.42%
    F-MNIST    FWD      85.35%    83.67%    84.61%
    F-MNIST    SCL-NL   85.93%    83.69%    84.66%
    K-MNIST    PC       67.77%    60.52%    60.34%
    K-MNIST    FWD      86.85%    70.86%    75.41%
    K-MNIST    SCL-NL   86.85%    70.59%    75.25%
    CIFAR-10   PC       38.31%    32.74%    33.37%
    CIFAR-10   FWD      60.74%    44.93%    46.65%
    CIFAR-10   SCL-NL   61.64%    40.46%    45.98%


    In Table 1, we show the experimental results of KDCL, where we compare the performance of the student model optimized by KDCL with that of the student model trained only with complementary labels across different losses and datasets. On MNIST, which is a relatively simple dataset, all methods achieve high accuracies. With the help of KDCL, we improve the accuracy of MLP from 83.78% to 86.10% with PC loss, from 92.07% to 94.32% with FWD loss and from 92.06% to 94.33% with SCL-NL loss. SCL-NL loss performs best among the three loss functions. Besides, after being enhanced by KDCL, the accuracy of KDCL-MLP falls between the accuracies of the MLP model and Lenet-5. On F-MNIST, which is more complex than MNIST, the accuracy of all methods decreases slightly. Our KDCL achieves 77.42% with PC loss, 84.61% with FWD loss and 84.66% with SCL-NL loss. On K-MNIST, which is more complex than F-MNIST, our method does not significantly improve the accuracy of MLP when using PC loss, but it improves the accuracy by 4.55 percentage points with FWD loss and 4.66 percentage points with SCL-NL loss. On CIFAR-10, which is the most complex of the four datasets, there is a significant drop in accuracies. Nevertheless, the student model can still be improved by KDCL, demonstrating its robustness and effectiveness across different datasets.
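    The gains quoted above can be recomputed directly from the student and KDCL-student columns of Table 1. The snippet below does this for a few entries copied from the table; the dictionary layout and variable names are illustrative only.

```python
# (student accuracy, KDCL-student accuracy) in percent, copied from Table 1.
table1 = {
    ("MNIST", "PC"): (83.78, 86.10),
    ("MNIST", "SCL-NL"): (92.06, 94.33),
    ("K-MNIST", "PC"): (60.52, 60.34),
    ("K-MNIST", "FWD"): (70.86, 75.41),
    ("K-MNIST", "SCL-NL"): (70.59, 75.25),
    ("CIFAR-10", "FWD"): (44.93, 46.65),
}

# Gain in percentage points for each (dataset, loss) pair.
gains = {key: round(after - before, 2)
         for key, (before, after) in table1.items()}
```

    For K-MNIST this gives gains of 4.55 points with FWD loss and 4.66 points with SCL-NL loss, and a small negative value (-0.18) with PC loss, consistent with the statement that PC loss shows no significant improvement on that dataset.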

    We show the testing process of all models in Figure 5.

    Figure 5.  Comparison of the testing process of teacher models, student models and KDCL-student models on four datasets.

    In Figure 5, we present the convergence speed of all models in our experiments. The results show that the student model distilled by KDCL converges faster than that trained only with complementary labels. This indicates that the model can learn the features of the images more accurately and efficiently when utilizing both soft labels and complementary labels.

    Additionally, we observe that the PC loss exhibits a decrease in accuracy on more challenging datasets, particularly on CIFAR10. This is because the PC loss uses the Sigmoid function as the normalization function, which can lead to negative values in the loss calculation and prevent the model from finding better parameters when updating. This phenomenon becomes more pronounced on the CIFAR10 dataset, where a peak appears. However, KDCL can alleviate this phenomenon and shift the peak to a later epoch. This demonstrates the effectiveness of KDCL in addressing the limitations of existing CLL methods and improving the performance of complementary label learning.

    In this study, we established a knowledge distillation training framework for CLL, called KDCL. As stated in the introduction, the supervision information in complementary labels is easily missed. The proposed framework employs a deep CNN model with higher accuracy to soften complementary labels into soft labels. Both the soft labels and the original complementary labels are used to train the classification model. Compared with using the standard CLL methods alone, KDCL improves the accuracy by 0.5–4.5%.

    KDCL has several limitations. First, its performance can be influenced by the choice of teacher-student models and CLL algorithms. Our experiments use specific combinations of models and algorithms, and the results may vary with different configurations. With stronger CNN networks and better CLL algorithms, KDCL may achieve better performance on more difficult datasets. Another drawback of the proposed scheme is its time cost. Because KDCL is a two-stage training framework that first trains a high-accuracy teacher model using complementary labels, its overall training time is relatively high. Training a high-accuracy model typically takes a considerable amount of time, which poses a challenge to the efficiency of KDCL. In addition, KDCL has only been tested on public datasets whose data distribution is relatively uniform. In the future, we will consider expanding the application scope of KDCL to dynamically imbalanced data for CLL, or combining it with hybrid deep learning models [27,28,29].

    In this paper, we make the first attempt to leverage the knowledge distillation training framework in CLL. To enhance the supervision information present in complementary labels, which is often overlooked in existing CLL methods, we propose a complementary label enhancement framework based on knowledge distillation, called KDCL. Specifically, KDCL consists of a teacher model and a student model. By adopting knowledge distillation techniques, the teacher model transfers its softened knowledge to the student model. The student model then learns from both soft labels and complementary labels to improve its classification performance. The experimental results on four benchmark datasets show that KDCL can improve the classification accuracy of CLL and maintain robustness and effectiveness on difficult datasets.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the National Natural Science Foundation of China (No. 61976217, 62306320), the Natural Science Foundation of Jiangsu Province (No. BK20231063), the Fundamental Research Funds of Central Universities (No. 2019XKQYMS87), Science and Technology Planning Project of Xuzhou (No. KC21193).

    All authors declare that they have no conflicts of interest.



    [1] N. Banholzer, K. Zürcher, P. Jent, P. Bittel, L. Furrer, M. Egger, et al., SARS-CoV-2 transmission with and without mask wearing or air cleaners in schools in Switzerland: A modeling study of epidemiological, environmental, and molecular data, PLoS Med., 20 (2023), e1004226. https://doi.org/10.1371/journal.pmed.1004226 doi: 10.1371/journal.pmed.1004226
    [2] E.A. Meyerowitz, A. Richterman, R.T. Gandhi, P.E. Sax, Transmission of SARS-CoV-2: A review of viral, host, and environmental factors, Ann. Intern. Med., 174 (2021), 69–79. https://doi.org/10.7326/M20-5008 doi: 10.7326/M20-5008
    [3] M. Cevik, J.L. Marcus, C. Buckee, T.C. Smith, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) transmission dynamics should inform policy, Clin. Infect. Dis., 73 (2021), S170–S176. https://doi.org/10.1093/cid/ciaa1442 doi: 10.1093/cid/ciaa1442
    [4] D.C. Adam, P. Wu, J.Y. Wong, E.H.Y. Lau, T.K. Tsang, S. Cauchemez, et al., Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong, Nat Med., 26 (2020), 1714–1719. https://doi.org/10.1038/s41591-020-1092-0 doi: 10.1038/s41591-020-1092-0
    [5] T. Liu, D. Gong, J. Xiao, J. Hu, G. He, Z. Rong, W. Ma, Cluster infections play important roles in the rapid evolution of COVID-19 transmission: A systematic review, Int. J. Infect. Dis., 99 (2020), 374–380. https://doi.org/10.1016/j.ijid.2020.07.073 doi: 10.1016/j.ijid.2020.07.073
    [6] Y. Furuse, N. Tsuchiya, R. Miyahara, I. Yasuda, E. Sando, Y.K. Ko, et al., COVID-19 case-clusters and transmission chains in the communities in Japan, J. Infect., 84 (2022), 248–288. https://doi.org/10.1016/j.jinf.2021.08.016 doi: 10.1016/j.jinf.2021.08.016
    [7] P. Tupper, S. Pai, COVID Schools Canada, C. Colijn, COVID-19 cluster size and transmission rates in schools from crowdsourced case reports, ELife, 11 (2022), e76174. https://doi.org/10.7554/eLife.76174 doi: 10.7554/eLife.76174
    [8] M. Ueda, K. Hayashi, H. Nishiura, Identifying High-Risk Events for COVID-19 Transmission: Estimating the Risk of Clustering Using Nationwide Data, Viruses, 15 (2023), 456. https://doi.org/10.3390/v15020456 doi: 10.3390/v15020456
    [9] H. Oshitani, L.A.W.J. The Expert Members of The National COVID-19 Cluster Taskforce at The Ministry of Health, Cluster-Based Approach to Coronavirus Disease 2019 (COVID-19) Response in Japan, from February to April 2020, Jpn. J. Infect. Dis., 73 (2020), 491–493. https://doi.org/10.7883/yoken.JJID.2020.363 doi: 10.7883/yoken.JJID.2020.363
    [10] T. Imamura, A. Watanabe, Y. Serizawa, M. Nakashita, M. Saito, M. Okada, et al., Transmission of COVID-19 in Nightlife, Household, and Health Care Settings in Tokyo, Japan, in 2020, JAMA Netw Open., 6 (2023), e230589. https://doi.org/10.1001/jamanetworkopen.2023.0589 doi: 10.1001/jamanetworkopen.2023.0589
    [11] S. Nagata, T. Nakaya, Y. Adachi, T. Inamori, K. Nakamura, D. Arima, et al., Mobility change and COVID-19 in Japan: Mobile data analysis of locations of infection, J. Epidemiol., 31 (2021), 387–391. https://doi.org/10.2188/jea.JE20200625 doi: 10.2188/jea.JE20200625
    [12] M. Nakanishi, R. Shibasaki, S. Yamasaki, S. Miyazawa, S. Usami, H. Nishiura, et al., On-site dining in Tokyo during the COVID-19 pandemic: Time series analysis using mobile phone location data, JMIR MHealth UHealth., 9 (2021), e27342. https://doi.org/10.2196/27342 doi: 10.2196/27342
    [13] Y. Okada, S. Yamasaki, A. Nishida, R. Shibasaki, H. Nishiura, Night-time population consistently explains the transmission dynamics of coronavirus disease 2019 in three megacities in Japan, Front Public Health., 11 (2023). https://doi.org/10.3389/fpubh.2023.1163698 doi: 10.3389/fpubh.2023.1163698
    [14] Ministry of Education Culture Sports Science and Technology, Surveys on the situation in schools, case studies of initiatives (Japanese), Ministry of Education, Culture, Sports, Science and Technology. https://www.mext.go.jp/a_menu/coronavirus/mext_00007.html (accessed September 4, 2024)
    [15] A. Takamatsu, H. Honda, T. Miwa, T. Tabuchi, K. Taniguchi, K. Shibuya, et al., Factors associated with COVID-19 booster vaccine hesitancy: A nationwide, cross-sectional survey in Japan, Public Health, 223 (2023), 72–79. https://doi.org/10.1016/j.puhe.2023.07.02 doi: 10.1016/j.puhe.2023.07.02
    [16] Y. Takahashi, K. Ishitsuka, M. Sampei, S. Okawa, Y. Hosokawa, A. Ishiguro, et al., COVID-19 vaccine literacy and vaccine hesitancy among pregnant women and mothers of young children in Japan, Vaccine, 40 (2022), 6849–6856. https://doi.org/10.1016/j.vaccine.2022.09.094 doi: 10.1016/j.vaccine.2022.09.094
    [17] D. Courtney, P. Watson, M. Battaglia, B.H. Mulsant, P. Szatmari, COVID-19 Impacts on Child and Youth Anxiety and Depression: Challenges and Opportunities, Can, J, Psychiatry., 65 (2020), 688–691. https://doi.org/10.1177/0706743720935646 doi: 10.1177/0706743720935646
    [18] R.E. Norman, M. Byambaa, R. De, A. Butchart, J. Scott, T. Vos, The long-term health consequences of child physical abuse, emotional abuse, and neglect: A systematic review and meta-analysis, PLoS Med., 9 (2012), e1001349. https://doi.org/10.1371/journal.pmed.1001349 doi: 10.1371/journal.pmed.1001349
    [19] M. Sasanami, T. Kayano, H. Nishiura, The number of COVID-19 clusters in healthcare and elderly care facilities averted by vaccination of healthcare workers in Japan, February–June 2021, Math. Biosci. Eng., 19 (2022), 2762–2773. https://doi.org/10.3934/mbe.2022126 doi: 10.3934/mbe.2022126
    [20] E. Abdollahi, M. Haworth-Brockman, Y. Keynan, J.M. Langley, S.M. Moghadas, Simulating the effect of school closure during COVID-19 outbreaks in Ontario, Canada, BMC Med., 18 (2020), 230. https://doi.org/10.1186/s12916-020-01705-8 doi: 10.1186/s12916-020-01705-8
    [21] K.A. Auger, S.S. Shah, T. Richardson, D. Hartley, M. Hall, A. Warniment, et al., Association Between Statewide School Closure and COVID-19 Incidence and Mortality in the US, JAMA, 324 (2020), 859–870. https://doi.org/10.1001/jama.2020.1434 doi: 10.1001/jama.2020.1434
    [22] K. Fukumoto, C.T. McClean, K. Nakagawa, No causal effect of school closures in Japan on the spread of COVID-19 in spring 2020, Nat. Med., 27 (2021), 2111–2119. https://doi.org/10.1038/s41591-021-01571-8 doi: 10.1038/s41591-021-01571-8
    [23] K. Iwata, A. Doi, C. Miyakoshi, Was school closure effective in mitigating coronavirus disease 2019 (COVID-19)? Time series analysis using Bayesian inference, Int. J. Infect. Dis., 99 (2020), 57–61. https://doi.org/10.1016/j.ijid.2020.07.052 doi: 10.1016/j.ijid.2020.07.052
    [24] T. Leng, E.M. Hill, A. Holmes, E. Southall, R.N. Thompson, M.J. Tildesley, et al., Quantifying pupil-to-pupil SARS-CoV-2 transmission and the impact of lateral flow testing in English secondary schools, Nat. Commun., 13 (2022), 1106. https://doi.org/10.1038/s41467-022-28731-9 doi: 10.1038/s41467-022-28731-9
    [25] K.E. Wiens, C.P. Smith, E. Badillo-Goicoechea, K.H. Grantz, M.K. Grabowski, A.S. Azman, et al., In-person schooling and associated COVID-19 risk in the United States over spring semester 2021, Sci Adv., 8 (2022), eabm9128. https://doi.org/10.1126/sciadv.abm9128 doi: 10.1126/sciadv.abm9128
    [26] C. Molina Grané, P. Mancuso, M. Vicentini, F. Venturelli, O. Djuric, M. Manica, et al., SARS-CoV-2 transmission patterns in educational settings during the Alpha wave in Reggio-Emilia, Italy, Epidemics, 44 (2023), 100712. https://doi.org/10.1016/j.epidem.2023.100712 doi: 10.1016/j.epidem.2023.100712
    [27] K.O. Zimmerman, I.C. Akinboyo, M.A. Brookhart, A.E. Boutzoukas, K.A. McGann, M.J. Smith, et al., Incidence and secondary transmission of SARS-CoV-2 infections in schools, Pediatrics, 147 (2021), e2020048090. https://doi.org/10.1542/peds.2020-048090 doi: 10.1542/peds.2020-048090
    [28] P. van den Berg, E.M. Schechter-Perkins, R.S. Jack, I. Epshtein, R. Nelson, E. Oster, et al., Effectiveness of 3 Versus 6 ft of Physical Distancing for Controlling Spread of Coronavirus Disease 2019 Among Primary and Secondary Students and Staff: A Retrospective, Statewide Cohort Study, Clin. Infect. Dis., 73 (2021), 1871–1878. https://doi.org/10.1093/cid/ciab230 doi: 10.1093/cid/ciab230
    [29] M. Hast, M. Swanson, C. Scott, E. Oraka, C. Espinosa, E. Burnett, et al., Prevalence of risk behaviors and correlates of SARS-CoV-2 positivity among in-school contacts of confirmed cases in a Georgia school district in the pre-vaccine era, December 2020–January 2021, BMC Public Health, 22 (2022), 101. https://doi.org/10.1186/s12889-021-12347-7 doi: 10.1186/s12889-021-12347-7
    [30] T. Akaishi, S. Kushimoto, Y. Katori, N. Sugawara, K. Igarashi, M. Fujita, et al., COVID-19 transmission at schools in Japan, Tohoku J. Exp. Med., 255 (2021), 239–246. https://doi.org/10.1620/tjem.255.239 doi: 10.1620/tjem.255.239
    [31] E. Colosi, G. Bassignana, D.A. Contreras, C. Poirier, P.-Y. Boëlle, S. Cauchemez, et al., Screening and vaccination against COVID-19 to minimise school closure: A modelling study, Lancet Infect. Dis., 22 (2022), 977–989. https://doi.org/10.1016/S1473-3099(22)00138-4 doi: 10.1016/S1473-3099(22)00138-4
    [32] P. Tupper, C. Colijn, COVID-19 in schools: Mitigating classroom clusters in the context of variable transmission, PLoS Comput. Biol., 17 (2021), e1009120. https://doi.org/10.1371/journal.pcbi.1009120 doi: 10.1371/journal.pcbi.1009120
    [33] T. Imamura, M. Saito, Y.K. Ko, T. Imamura, K. Otani, H. Akaba, et al., Roles of children and adolescents in COVID-19 transmission in the community: A retrospective analysis of nationwide data in Japan, Front. Pediatr., 9 (2021). https://doi.org/10.3389/fped.2021.705882 doi: 10.3389/fped.2021.705882
    [34] T. Akaishi, T. Ishii, Coronavirus disease 2019 transmission and symptoms in young children during the severe acute respiratory syndrome coronavirus 2 Delta variant and Omicron variant outbreaks, J. Int. Med. Res., 50 (2022), 03000605221102079. https://doi.org/10.1177/03000605221102079 doi: 10.1177/03000605221102079
    [35] M. Sasanami, M. Fujimoto, T. Kayano, K. Hayashi, H. Nishiura, Projecting the COVID-19 immune landscape in Japan in the presence of waning immunity and booster vaccination, J. Theor. Biol., 559 (2023), 111384. https://doi.org/10.1016/j.jtbi.2022.111384 doi: 10.1016/j.jtbi.2022.111384
    [36] Ministry of Health Labour and Welfare, Visualizing the data: information on COVID-19 infections, (n.d.). https://covid19.mhlw.go.jp/en/ (accessed September 4, 2024)
    [37] Bureau of Social Welfare and Public Health Tokyo Metropolitan Government, Details of announcement of new coronavirus-positive patients in Tokyo, 2023, (n.d.). https://catalog.data.metro.tokyo.lg.jp/dataset/t000010d0000000068
    [38] Alex Selby, Estimating Generation Time Of Omicron - Covid-19, (2022). https://sonorouschocolate.com/covid19/index.php/Estimating_Generation_Time_Of_Omicron (accessed September 4, 2024)
    [39] National Institute of Infectious Diseases, Estimation of Incubation Period for SARS-CoV-2 Mutant B.1.1.529 Strain (Omicron Strain): Preliminary Report, 2022, (2022). https://www.niid.go.jp/niid/ja/2019-ncov/2551-cepr/10903-b11529-period.html (accessed September 4, 2024)
    [40] M. Salmon, D. Schumacher, M. Höhle, Monitoring Count Time Series in R: Aberration detection in public health surveillance, J. Stat. Softw., 70 (2016), 1–35. https://doi.org/10.18637/jss.v070.i10 doi: 10.18637/jss.v070.i10
    [41] C. Fraser, Estimating individual and household reproduction numbers in an emerging epidemic, PLoS One, 2 (2007), e758. https://doi.org/10.1371/journal.pone.0000758 doi: 10.1371/journal.pone.0000758
    [42] H. Nishiura, Time variations in the transmissibility of pandemic influenza in Prussia, Germany, from 1918–19, Theor. Biol. Med. Model., 4 (2007), 20. https://doi.org/10.1186/1742-4682-4-20 doi: 10.1186/1742-4682-4-20
    [43] H. Nishiura, G. Chowell, The effective reproduction number as a prelude to statistical estimation of time-dependent epidemic trends, in: G. Chowell, J.M. Hyman, L.M.A. Bettencourt, C. Castillo-Chavez (Eds.), Mathematical and Statistical Estimation Approaches in Epidemiology, Springer Netherlands, Dordrecht, 2009: pp. 103–121. https://doi.org/10.1007/978-90-481-2313-1_5
    [44] T. Sanada, T. Honda, F. Yasui, K. Yamaji, T. Munakata, N. Yamamoto, et al., Serologic survey of IgG against SARS-CoV-2 among hospital visitors without a history of SARS-CoV-2 infection in Tokyo, 2020–2021, J. Epidemiol., 32 (2022), 105–111. https://doi.org/10.2188/jea.JE20210324 doi: 10.2188/jea.JE20210324
    [45] Ministry of Internal Affairs and Communications Statistics Bureau, Statistics Bureau Homepage/Population Estimates/Population Estimates (as of October 1, 2022) - Nationwide: Age (each year), Population by Gender; Prefectures: Age (5-year age groups), Population by Gender. (in Japanese), 2023, (2023). https://www.stat.go.jp/data/jinsui/2022np/index.html (accessed September 4, 2024)
    [46] R: The R Project for Statistical Computing, (2023). https://www.r-project.org/ (accessed September 4, 2024)
    [47] J. Gabry, R. Češnovar, A. Johnson, S. Bronder, R Interface to CmdStan, (2023). https://mc-stan.org/cmdstanr/ (accessed September 4, 2024)
    [48] Stan Development Team, Stan modeling language users guide and reference manual, Version 2.32, 2023, (2023). https://mc-stan.org/ (accessed September 4, 2024)
    [49] L. Casini, M. Roccetti, Reopening Italy's schools in September 2020: A Bayesian estimation of the change in the growth rate of new SARS-CoV-2 cases, BMJ Open, 11 (2021), e051458. https://doi.org/10.1136/bmjopen-2021-051458 doi: 10.1136/bmjopen-2021-051458
    [50] T. Braeye, L. Catteau, R. Brondeel, J.A.F. van Loenhout, K. Proesmans, L. Cornelissen, et al., Vaccine effectiveness against transmission of alpha, delta and omicron SARS-COV-2-infection, Belgian contact tracing, 2021–2022, Vaccine, 41 (2023), 3292–3300. https://doi.org/10.1016/j.vaccine.2023.03.069 doi: 10.1016/j.vaccine.2023.03.069
  • mbe-21-09-312_supplementary_final.docx
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)