
The purpose of this study was to examine the effect of an immediate neuromuscular electrical stimulation (NMES) intervention on countermovement jump (CMJ) height and to explore kinematic differences in the CMJ at each instant. Fifteen male students who had never received electrical stimulation were randomly selected as participants. In the first test, participants performed the CMJ with maximal effort. In the second test, they performed the CMJ with maximal effort immediately after 30 min of NMES. In both tests, kinematic data were collected with a high-speed optical motion capture system. The results revealed that, after immediate NMES training, neuromuscular activation induced post-activation potentiation, which increased the height of the center of gravity in the CMJ and affected the angular velocity of the hip joint, the velocity and acceleration of the thigh and shank, and the velocity of the soles of the feet. We recommend that future training combine NMES intervention with the improvement of technical movements and physical exercise.
Citation: Chao-Fu Chen, Shu-Fan Wang, Xing-Xing Shen, Lei Liu, Hui-Ju Wu. Kinematic analysis of countermovement jump performance in response to immediate neuromuscular electrical stimulation[J]. Mathematical Biosciences and Engineering, 2023, 20(9): 16033-16044. doi: 10.3934/mbe.2023715
Supervised learning is an important branch of machine learning. In supervised multi-class classification, each sample is assigned a label that indicates the category it belongs to [1]. Supervised learning is effective when there are enough samples with high-quality labels. However, building datasets with many accurate labels is expensive and time-consuming. To address this problem, researchers have proposed a series of weakly supervised learning (WSL) methods, which aim to train models with partial, incomplete or inaccurate supervision information. Examples include noisy-label learning [2,3,4,5], semi-supervised learning [6,7,8,9], partial-label learning [10,11,12], positive-confidence learning [13] and unlabeled-unlabeled learning [14].
In this paper, we consider another WSL framework called complementary label learning (CLL). Figure 1 shows the difference between complementary labels and true labels. An ordinary label indicates the class a sample belongs to, whereas a complementary label indicates a class that the sample does not belong to. Complementary labels are generally easier and less costly to collect. For example, expert knowledge is very expensive in some highly specialized domains. If complementary labels are used for annotation, annotators need only know the extent of the label space and can then use common sense to rule out a wrong category. This works because it is much simpler and faster to determine which class a sample does not belong to than which class it belongs to. In addition, CLL can help protect data privacy in sensitive fields such as medical and financial records, because the true information of the data no longer needs to be disclosed. This protects data privacy and security and also makes it easier to collect data in these areas.
The framework of CLL was first proposed by Ishida et al. [15]. They proved that, when the loss function satisfies certain conditions, the unbiased risk estimator (URE) computed only from complementary labels is equivalent to the ordinary classification risk. However, URE requires the loss function to be nonconvex and symmetric, which limits its applicability. To overcome this limitation, Yu et al. [16] made the cross-entropy loss usable in CLL by constructing a complementary label transition matrix. They also considered that different labels may have different probabilities of being selected as a complementary label. Ishida et al. [17] then extended URE into a CLL framework that accommodates more general loss functions. This framework still provides an unbiased estimator of the ordinary classification risk, but it works for any loss function. Chou et al. [18] analyzed URE from the perspective of gradient estimation and proposed the surrogate complementary loss (SCL) for risk estimation, which effectively alleviated the overfitting problem of URE. Liu et al. [19] applied common losses such as categorical cross entropy (CCE), mean square error (MSE) and mean absolute error (MAE) to CLL. Ishiguro et al. [20] studied the case in which complementary labels are affected by label noise. To mitigate its adverse effects, they selected noise-robust losses that satisfy a weighted symmetric condition or a more relaxed condition. Recently, Zhang et al. [21] broadened the setting of complementary label datasets to the case in which a dataset contains a large number of complementary labels and a small number of true labels at the same time. They proposed an adversarial complementary label learning network named Clarinet, which consists of two deep neural networks: one classifies complementary labels and true labels, and the other learns from complementary labels.
Previous studies on CLL have mostly focused on two directions: rewriting the classification risk under the ordinary label distribution as a risk under the complementary label distribution, and extending CLL to more loss functions [15,16,17,18,19]. These risk-rewriting techniques establish consistency between the complementary label classification risk and the supervised classification risk, which enables a classifier to classify accurately using only complementary labels. However, only complementary labels are involved in the risk calculation, and the information they contain is extremely limited. As a result, CLL consistently performs worse than supervised learning. We therefore aim to enhance the supervision information of complementary labels to further improve the performance of CLL. In this paper, we propose a two-step complementary label enhancement framework based on knowledge distillation (KDCL). It consists of the following components: 1) a teacher model trained on the complementary label dataset to generate soft labels, which carry more supervision information in the form of a label distribution; 2) a student model trained on the same dataset to learn from both the soft labels and the complementary labels; 3) a final loss function that integrates the losses from the soft labels and the complementary labels and is used to update the parameters of the student model. We conduct experiments with three CLL loss functions on several benchmark datasets and compare the accuracy of the student model before and after enhancement by KDCL. The experimental results show that KDCL effectively improves the performance of CLL.
Suppose that the input sample is a $d$-dimensional vector $x\in\mathbb{R}^d$ with class label $y\in\{1,2,\dots,K\}$, where $K$ is the number of classes in the dataset. We are given a training set $D=\{(x_i,y_i)\}_{i=1}^{N}$ of $N$ samples, each drawn independently from the same distribution $p(x,y)$. The goal of learning from true labels is to learn a mapping $f(x)$ from the sample space $\mathbb{R}^d$ to the label space $\{1,2,\dots,K\}$. The mapping $f(x)$ is also called a classifier. We want $f(x)$ to minimize the multi-class classification risk:
$$R(f)=\mathbb{E}_{(x,y)\sim p(x,y)}\big[L(f(x),y)\big] \tag{1}$$
where $L(f(x),y)$ is a multi-class loss function. The classifier $f(x)$ is usually obtained as follows:
$$f(x)=\arg\max_{y\in\{1,2,\dots,K\}} g_y(x) \tag{2}$$
where $g:\mathbb{R}^d\to\mathbb{R}^K$ and $g_y(x)$ is the $y$-th element of $g(x)$. In deep neural networks, $g(x)$ is the prediction vector output by the last fully connected layer.
In general, the distribution $p(x,y)$ is unknown, so the classification risk in Eq (1) is approximated by the sample mean. The empirical estimate $\hat R(f)$ of $R(f)$ is:
$$\hat R(f)=\frac{1}{N}\sum_{i=1}^{N}L(f(x_i),y_i) \tag{3}$$
where $N$ is the number of training samples and $i$ indexes the samples.
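Eqs (2) and (3) can be illustrated with a few lines of plain Python. This is a minimal sketch, not the authors' code. It assumes $L$ is the 0-1 loss, so the empirical risk is simply the error rate. The function names `predict`, `zero_one_loss` and `empirical_risk` are hypothetical. In practice a differentiable surrogate computed on $g(x)$ is minimized instead.

```python
from typing import Callable, List, Sequence


def predict(scores: Sequence[float]) -> int:
    """Eq (2): f(x) = argmax_y g_y(x), returned as a 1-indexed class label."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best + 1


def zero_one_loss(pred: int, label: int) -> float:
    """Illustrative choice of L: 1.0 for a wrong prediction, 0.0 otherwise."""
    return float(pred != label)


def empirical_risk(
    all_scores: List[Sequence[float]],
    labels: List[int],
    loss: Callable[[int, int], float] = zero_one_loss,
) -> float:
    """Eq (3): mean of L(f(x_i), y_i) over the N samples."""
    n = len(labels)
    return sum(loss(predict(s), y) for s, y in zip(all_scores, labels)) / n


if __name__ == "__main__":
    # Hypothetical outputs g(x_i) for N = 3 samples and K = 3 classes.
    g = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 1.0, 0.5]]
    y = [1, 3, 1]
    print([predict(s) for s in g])  # [1, 3, 2]
    print(empirical_risk(g, y))     # one of the three samples is misclassified
```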
In CLL, each sample $x$ is assigned only one complementary label $\bar y$. The dataset therefore changes from $D=\{(x_i,y_i)\}_{i=1}^{N}$ to $\bar D=\{(x_i,\bar y_i)\}_{i=1}^{N}$, where $\bar y\in\{1,2,\dots,K\}\setminus\{y\}$ and $D\neq\bar D$. The samples in $\bar D$ independently follow an unknown distribution $\bar p(x,\bar y)$. Suppose all complementary labels are selected in an unbiased way, meaning that every class other than the true one has the same probability of being chosen. Then $\bar p(x,\bar y)$ can be expressed as:
$$\bar p(x,\bar y)=\frac{1}{K-1}\sum_{y\neq\bar y}p(x,y) \tag{4}$$
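The unbiased selection behind Eq (4) amounts to drawing each complementary label uniformly from the $K-1$ classes other than the true one. The sketch below uses only Python's standard library to show one way to generate such labels. The helper names and the fixed seed are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List


def complementary_label(y: int, num_classes: int, rng: random.Random) -> int:
    """Draw ybar uniformly from the K - 1 classes other than y (each with probability 1/(K-1))."""
    candidates = [k for k in range(1, num_classes + 1) if k != y]
    return rng.choice(candidates)


def complement_dataset(labels: List[int], num_classes: int, seed: int = 0) -> List[int]:
    """Replace every true label y_i with one unbiased complementary label ybar_i."""
    rng = random.Random(seed)
    return [complementary_label(y, num_classes, rng) for y in labels]


if __name__ == "__main__":
    K = 4
    ybar = complement_dataset([1] * 30000, K)
    # Class 1 never appears; classes 2, 3 and 4 each appear roughly 30000 / 3 times.
    print(sorted(Counter(ybar).items()))
```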
Supposing that $\bar L(f(x),\bar y)$ is a complementary loss function, we can define a multi-class risk analogous to Eq (1) under the distribution $\bar p(x,\bar y)$:
$$\bar R(f)=\mathbb{E}_{(x,\bar y)\sim\bar p(x,\bar y)}\big[\bar L(f(x),\bar y)\big] \tag{5}$$
To the best of our knowledge, Ishida et al. [15] were the first to prove the following result. When the loss function $\bar L$ satisfies certain conditions, $R(f)$ in Eq (1) equals $(K-1)\bar R(f)$ from Eq (5) up to an additive constant $M$ that depends only on the number of categories $K$:
$$R(f)=(K-1)\,\mathbb{E}_{(x,\bar y)\sim\bar p(x,\bar y)}\big[\bar L(f(x),\bar y)\big]+M=(K-1)\bar R(f)+M \tag{6}$$
All coefficients are constant when the loss function satisfies the condition. A classifier can therefore be learned from complementary labels by minimizing $\bar R(f)$, which by Eq (6) also minimizes $R(f)$. Ishida et al. then rewrite the one-versus-all (OVA) loss $L_{OVA}$ and the pairwise-comparison (PC) loss $L_{PC}$ of ordinary multi-class classification as $\bar L_{OVA}$ and $\bar L_{PC}$ for CLL:
$$\bar L_{OVA}(g(x),\bar y)=\frac{1}{K-1}\sum_{y\neq\bar y}l(g_y(x))+l(-g_{\bar y}(x)),\qquad \bar L_{PC}(g(x),\bar y)=\sum_{y\neq\bar y}l\big(g_y(x)-g_{\bar y}(x)\big) \tag{7}$$
where $l(z):\mathbb{R}\to\mathbb{R}$ is a binary loss that must be nonconvex and symmetric, i.e., $l(z)+l(-z)=1$, such as the sigmoid loss. $g(x)$ is defined as in Eq (2), and $g_y(x)$ is the $y$-th element of $g(x)$. Finally, the unbiased risk estimator of $R(f)$ is obtained from the sample mean:
$$\hat R(f)=\frac{K-1}{N}\sum_{n=1}^{N}\bar L(f(x_n),\bar y_n)+M \tag{8}$$
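The following sketch makes Eqs (7) and (8) concrete. It evaluates the complementary OVA and PC losses with the sigmoid loss $l(z)=1/(1+e^{z})$, which satisfies $l(z)+l(-z)=1$, and averages them into the estimator of Eq (8). It is an illustrative assumption, not Ishida et al.'s implementation. Class indices are 0-based, and $M$ is passed in with a default of 0. This is safe because $M$ does not depend on $g$ and therefore does not change the minimizer; its actual value is not given in the text.

```python
import math
from typing import List, Sequence


def sigmoid_loss(z: float) -> float:
    """Binary sigmoid loss l(z) = 1 / (1 + exp(z)); symmetric: l(z) + l(-z) = 1."""
    if z >= 0:
        e = math.exp(-z)  # avoids overflow of exp(z) for large positive z
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def comp_ova_loss(g: Sequence[float], ybar: int) -> float:
    """Complementary OVA loss of Eq (7); ybar is a 0-indexed class here."""
    k = len(g)
    others = sum(sigmoid_loss(g[y]) for y in range(k) if y != ybar)
    return others / (k - 1) + sigmoid_loss(-g[ybar])


def comp_pc_loss(g: Sequence[float], ybar: int) -> float:
    """Complementary PC loss of Eq (7)."""
    return sum(sigmoid_loss(g[y] - g[ybar]) for y in range(len(g)) if y != ybar)


def ure(scores: List[Sequence[float]], ybars: List[int], comp_loss, m: float = 0.0) -> float:
    """Eq (8): (K-1)/N * sum_n Lbar(g(x_n), ybar_n) + M."""
    k, n = len(scores[0]), len(ybars)
    return (k - 1) / n * sum(comp_loss(g, yb) for g, yb in zip(scores, ybars)) + m


if __name__ == "__main__":
    # Hypothetical scores for two samples with K = 3; the second sample's scores contradict its ybar.
    scores = [[2.0, 1.0, -3.0], [-3.0, 1.0, 2.0]]
    ybars = [2, 2]
    print(ure(scores, ybars, comp_pc_loss), ure(scores, ybars, comp_ova_loss))
```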
Although it is feasible to learn a classifier that minimizes Eq (8) from complementary labels, the restriction on the loss function limits the application of URE. Yu et al. [16] analyze the relationship between ordinary and complementary labels in terms of conditional probability:
$$P(\bar y=j\mid x)=\sum_{i\neq j}P(\bar y=j\mid y=i)\,P(y=i\mid x) \tag{9}$$
for all $i,j\in\{1,2,\dots,K\}$. When all complementary labels are selected in an unbiased way, $P(\bar y\mid y)$ can be expressed as a transition matrix $Q$:
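$$Q=\begin{pmatrix}0&\frac{1}{K-1}&\cdots&\frac{1}{K-1}\\ \frac{1}{K-1}&0&\cdots&\frac{1}{K-1}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{1}{K-1}&\frac{1}{K-1}&\cdots&0\end{pmatrix}_{K\times K} \tag{10}$$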
where the element in row $i$ and column $j$ of $Q$ is $P(\bar y=j\mid y=i)$. The true label and the complementary label of a sample are mutually exclusive, so $P(\bar y=i\mid y=i)=0$ and all diagonal entries of $Q$ are 0.
Combining Eqs (5), (9) and (10), we can rewrite ˉR(f) as:
$$\bar R(f)=\mathbb{E}_{(x,\bar y)\sim\bar p(x,\bar y)}\big[L_{CE}(Q^{\top}g(x),\bar y)\big] \tag{11}$$
where $L_{CE}$ is the cross-entropy loss widely used in deep learning. The classification risk $\bar R(f)$ in Eq (11) is also consistent with the ordinary classification risk $R(f)$ [16].
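The transition matrix of Eq (10) and the forward-corrected cross-entropy of Eq (11) can be sketched as follows. Following Yu et al.'s forward correction, the sketch assumes that $g(x)$ has already been passed through Softmax, so $Q^{\top}g(x)$ is the predicted distribution of the complementary label. Function names are hypothetical and class indices are 0-based.

```python
import math
from typing import List, Sequence


def transition_matrix(k: int) -> List[List[float]]:
    """Eq (10): Q[i][j] = P(ybar = j | y = i) = 1/(K-1) if i != j, else 0."""
    return [[0.0 if i == j else 1.0 / (k - 1) for j in range(k)] for i in range(k)]


def softmax(z: Sequence[float]) -> List[float]:
    """Ordinary Softmax, shifted by the max logit for numerical stability."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward_ce(logits: Sequence[float], ybar: int) -> float:
    """Cross-entropy between Q^T softmax(logits) and the 0-indexed complementary label."""
    k = len(logits)
    q = transition_matrix(k)
    p = softmax(logits)
    # (Q^T p)_j = sum_i Q[i][j] * p_i, the predicted P(ybar = j | x) of Eq (9).
    pbar = [sum(q[i][j] * p[i] for i in range(k)) for j in range(k)]
    return -math.log(pbar[ybar])


if __name__ == "__main__":
    # The loss is large when the model is confident in the class the label rules out.
    print(forward_ce([5.0, 0.0, 0.0], 0), forward_ce([-5.0, 0.0, 0.0], 0))
```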
In image classification, applying the Softmax function to the outputs of the last fully connected layer of a deep neural network yields a predicted probability distribution over all classes. This distribution carries more information than a single hard label. Hinton et al. [22] call these outputs soft labels and propose a knowledge distillation framework. We draw on the idea of knowledge distillation to improve the performance of CLL by enhancing complementary labels with soft labels.
In the knowledge distillation framework, Hinton et al. [22] modify the Softmax function by introducing a temperature parameter $T$ that controls the smoothness of the soft labels. The ordinary Softmax function is:
$$y'_i=\frac{\exp(y_i)}{\sum_j\exp(y_j)} \tag{12}$$
where $y'_i$ is the predicted probability of the $i$-th class, $\exp(\cdot)$ is the exponential function and $y_i$ is the output of the classification network for the $i$-th class. The Softmax function combines the model's outputs for all classes and uses the exponential function to normalize them into the interval [0, 1] so that they sum to 1.
The rewritten Softmax function is as follows:
$$y'_i=\frac{\exp(y_i/T)}{\sum_j\exp(y_j/T)} \tag{13}$$
Figure 2 compares the smoothness of soft labels for different values of $T$: as $T$ increases, the soft labels become smoother. In effect, $T$ regulates how much attention is paid to the negative labels, and the higher $T$ is, the more attention they receive. $T$ is an adjustable hyperparameter during training.
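The effect of $T$ in Eqs (12) and (13) can be checked numerically. In this sketch the logits are hypothetical. It prints the softened distribution and its entropy for $T = 1, 4, 80$; higher entropy corresponds to smoother soft labels. The function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax_t(logits: Sequence[float], t: float = 1.0) -> List[float]:
    """Eq (13): exp(y_i / T) / sum_j exp(y_j / T); T = 1 gives Eq (12)."""
    if t <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / t for v in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    e = [math.exp(v - m) for v in scaled]
    s = sum(e)
    return [v / s for v in e]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; a larger value means a smoother distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


if __name__ == "__main__":
    logits = [6.0, 2.0, 1.0, -1.0]  # hypothetical teacher outputs for one sample
    for t in (1.0, 4.0, 80.0):
        p = softmax_t(logits, t)
        print(t, [round(x, 3) for x in p], round(entropy(p), 3))
```

At very large $T$ every class approaches probability $1/K$, which matches the text's point that a very high temperature gives too much weight to the negative labels.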
For a given sample, soft labels indicate its correct category and also encode the relationships among the other labels, so they carry more information than a complementary label. Suppose we add an extra term to the ordinary complementary label classification loss and introduce soft labels as additional supervision information. CLL can then perform better than with complementary labels alone. However, the soft labels are credible only if they come from a highly accurate model, and this model is itself trained with complementary labels.
Taking advantage of this property, we propose KDCL, a complementary label learning framework based on knowledge distillation. The overall structure is shown in Figure 3.
KDCL is a two-stage training framework consisting of a more complex teacher model with higher accuracy and a simpler student model with lower accuracy. First, the teacher model is trained with complementary labels on the dataset and then predicts all samples in the training set. The predictions are normalized by the Softmax function with $T=t$ $(t>1)$ to generate soft labels $S_{tea}$. Second, the student model is trained, and its outputs are processed in two ways. One produces the soft predictions $S_{stu}$ with $T=t$ $(t>1)$, and the other produces the ordinary predictions $P_{stu}$ with $T=1$. The KL divergence between $S_{tea}$ and $S_{stu}$ is then calculated, together with the complementary label loss between $P_{stu}$ and the complementary labels. The two losses are weighted to obtain the final distillation loss, which is used to update the parameters of the student model.
In KDCL, the final loss consists of a Kullback-Leibler (KL) loss and a complementary loss. On the one hand, the student model needs to learn knowledge from the teacher model to improve its ability. On the other hand, the teacher model is not always correct, so the student model also needs to learn by itself to reduce the influence of the teacher model's errors on the learning process. The final loss therefore takes both into account.
The final distillation loss consists of two parts and it can be expressed as follows:
$$L_{KDCL}=\alpha L_{KL}+L_{CL} \tag{14}$$
where $L_{KL}$ denotes the KL divergence and $L_{CL}$ denotes the complementary loss. Given the probability distributions $p^t$ from the teacher model and $p^s$ from the student model, their KL divergence can be expressed as follows:
$$L_{KL}(p^t,p^s)=\sum_i -p^t_i\log\frac{p^s_i}{p^t_i} \tag{15}$$
where $i$ denotes the $i$-th element of the tensor $p^t$ or $p^s$.
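Eq (15) translates directly into code. The sketch below assumes plain Python lists of probabilities. The small `eps` floor on $p^s_i$ is our addition to avoid $\log 0$ and is not part of the original formula.

```python
import math
from typing import Sequence


def kl_divergence(pt: Sequence[float], ps: Sequence[float], eps: float = 1e-12) -> float:
    """Eq (15): sum_i -pt_i * log(ps_i / pt_i), i.e. KL(p^t || p^s)."""
    total = 0.0
    for a, b in zip(pt, ps):
        if a > 0.0:  # terms with pt_i = 0 contribute nothing (0 * log 0 = 0)
            total += -a * math.log(max(b, eps) / a)  # eps guards against log(0)
    return total


if __name__ == "__main__":
    teacher = [0.5, 0.5]
    student = [0.9, 0.1]
    print(kl_divergence(teacher, student), kl_divergence(student, teacher))
```

KL divergence is asymmetric, so the argument order matters. Eq (15) places the teacher distribution $p^t$ first.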
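We select three complementary losses for KDCL: the PC loss proposed by Ishida et al. [15], the FWD loss proposed by Yu et al. [16] and the SCL-NL loss proposed by Chou et al. [18]. Let $p^s$ be the probability distribution predicted by the student model for sample $x$, and let $\bar y$ be the complementary label of $x$. The complementary losses are given in Eqs (16)–(18).
$$\bar L_{PC}(p^s,\bar y)=\frac{K-1}{n}\sum_{i=1}^{n}\sum_{y\neq\bar y_i}l\big(p^s_{i,y}-p^s_{i,\bar y_i}\big)-\frac{K(K-1)}{2}+K-1 \tag{16}$$
$$\bar L_{FWD}(p^s,\bar y)=-\sum_i\bar y_i\log\big((Q^{\top}p^s)_i\big) \tag{17}$$
$$\bar L_{SCL\text{-}NL}(p^s,\bar y)=-\sum_i\bar y_i\log(1-p^s_i)=-\log(1-p^s_{\bar y}) \tag{18}$$
Here $K$ is the number of categories in the dataset, and $n$ is the number of samples $i$ in a mini-batch over which the PC loss is computed. The function $l$ is the sigmoid loss, as in Eq (7), and $\bar y_i$ in Eqs (17) and (18) is the $i$-th element of the one-hot encoding of $\bar y$. $Q^{\top}$ is the transpose of $Q$, the $K\times K$ matrix whose off-diagonal entries are all $1/(K-1)$ and whose diagonal entries are 0.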
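Two of these complementary losses can be sketched as below; FWD is the forward-corrected cross-entropy with $Q^{\top}$. Following Eq (16), the PC loss is computed over a mini-batch of $n$ samples. Using the sigmoid loss $l(z)=1/(1+e^{z})$ for $l$ and 0-based class indices are our assumptions. The `eps` guard in SCL-NL is an added numerical safeguard.

```python
import math
from typing import List, Sequence


def sigmoid_loss(z: float) -> float:
    """l(z) = 1 / (1 + exp(z)), written to avoid overflow for large |z|."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def pc_loss(batch_ps: List[Sequence[float]], batch_ybar: List[int]) -> float:
    """Eq (16) over a mini-batch of n samples; ybar values are 0-indexed."""
    n, k = len(batch_ybar), len(batch_ps[0])
    total = sum(
        sigmoid_loss(ps[y] - ps[yb])
        for ps, yb in zip(batch_ps, batch_ybar)
        for y in range(k)
        if y != yb
    )
    return (k - 1) / n * total - k * (k - 1) / 2 + k - 1


def scl_nl_loss(ps: Sequence[float], ybar: int, eps: float = 1e-12) -> float:
    """Eq (18): -log(1 - p^s_ybar); eps keeps the log finite if p^s_ybar == 1."""
    return -math.log(max(1.0 - ps[ybar], eps))


if __name__ == "__main__":
    batch = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # hypothetical student probabilities
    print(pc_loss(batch, [2, 2]), [scl_nl_loss(p, 2) for p in batch])
```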
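With $p^t$, $p^s$ and $\bar y$, the final loss can be written more explicitly as follows:
$$L_{KD\text{-}PC}(p^t,p^s,\bar y)=\alpha L_{KL}(p^t,p^s)+\bar L_{PC}(p^s,\bar y) \tag{19}$$
$$L_{KD\text{-}FWD}(p^t,p^s,\bar y)=\alpha L_{KL}(p^t,p^s)+\bar L_{FWD}(p^s,\bar y) \tag{20}$$
$$L_{KD\text{-}SCL}(p^t,p^s,\bar y)=\alpha L_{KL}(p^t,p^s)+\bar L_{SCL\text{-}NL}(p^s,\bar y) \tag{21}$$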
$\alpha$ is a weighting factor that controls how strongly the soft labels influence the overall classification loss. Its value is determined experimentally.
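Eqs (13)–(15), (18) and (21) combine into the loss computed in one KDCL training step for the SCL-NL variant, sketched below in plain Python under several assumptions:

- The teacher logits are produced once by the already-trained teacher and held fixed.
- The loss is averaged over the mini-batch.
- The defaults $T=80$ and $\alpha=0.5$ are the values selected later in the experiments.

A real implementation would use tensors and automatic differentiation in a deep learning framework, which the text does not specify. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_t(logits: Sequence[float], t: float = 1.0) -> List[float]:
    """Temperature Softmax of Eq (13), shifted by the max for stability."""
    scaled = [v / t for v in logits]
    m = max(scaled)
    e = [math.exp(v - m) for v in scaled]
    s = sum(e)
    return [v / s for v in e]


def kl_divergence(pt: Sequence[float], ps: Sequence[float], eps: float = 1e-12) -> float:
    """Eq (15): KL(p^t || p^s); zero-probability teacher terms are skipped."""
    return sum(-a * math.log(max(b, eps) / a) for a, b in zip(pt, ps) if a > 0.0)


def scl_nl_loss(ps: Sequence[float], ybar: int, eps: float = 1e-12) -> float:
    """Eq (18): -log(1 - p^s_ybar), with ybar 0-indexed."""
    return -math.log(max(1.0 - ps[ybar], eps))


def kdcl_scl_loss(
    teacher_logits: List[Sequence[float]],
    student_logits: List[Sequence[float]],
    ybars: List[int],
    t: float = 80.0,
    alpha: float = 0.5,
) -> float:
    """Mini-batch mean of Eq (21): alpha * L_KL(S_tea, S_stu) + L_SCL-NL(P_stu, ybar)."""
    total = 0.0
    for t_logits, s_logits, yb in zip(teacher_logits, student_logits, ybars):
        s_tea = softmax_t(t_logits, t)    # soft labels from the fixed teacher, T = t
        s_stu = softmax_t(s_logits, t)    # softened student predictions, T = t
        p_stu = softmax_t(s_logits, 1.0)  # ordinary student predictions, T = 1
        total += alpha * kl_divergence(s_tea, s_stu) + scl_nl_loss(p_stu, yb)
    return total / len(ybars)


if __name__ == "__main__":
    teacher = [[3.0, 0.0, -2.0]]  # hypothetical logits for one sample, K = 3
    student = [[0.0, 2.0, 1.0]]
    print(kdcl_scl_loss(teacher, student, [2]))
```

Swapping `scl_nl_loss` for a PC or FWD term gives the other two variants in Eqs (19) and (20).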
We evaluate and compare the student models optimized by KDCL with the same models only trained by complementary labels on four public image classification datasets. Three complementary label losses including PC loss [15], FWD loss [16] and SCL-NL loss [18], are used as loss functions for training the models. All the experiments are carried out on a server with a 15 vCPU Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz, 80 GB RAM and one RTX 3090 GPU with 24 GB memory.
Four benchmark image classification datasets, MNIST, Fashion-MNIST (F-MNIST), Kuzushiji-MNIST (K-MNIST) and CIFAR10, are used to verify the effectiveness of KDCL.
MNIST: consists of 60,000 28 × 28 pixel grayscale images for training and 10,000 images for testing, with a total of 10 categories representing numbers between 0 and 9.
F-MNIST: an alternative dataset to MNIST that consists of 10 categories, 60,000 training images and 10,000 test images, each with a size of 28 × 28 pixels.
K-MNIST: an extension of the MNIST dataset derived from 10 classes of Kuzushiji, the cursive Japanese characters widely used from the mid-Heian period to early modern Japan. K-MNIST contains a total of 70,000 grayscale images of 28 × 28 pixels in 10 categories.
CIFAR10: consists of 60,000 32 × 32 color images, 50,000 of which are used as the training set and 10,000 as the test set. Each category contains 6000 images.
Following the settings in [15,17,18], we select complementary labels for the samples in all datasets in an unbiased way. We also apply two different teacher-student network pairs to these datasets. For MNIST, F-MNIST and K-MNIST, we choose Lenet-5 [23] as the teacher model and an MLP [24] with 500 hidden neurons as the student model; because these datasets are relatively simple, simple networks work well. For CIFAR10, color images are more difficult to classify, so deeper CNNs are needed to extract features. We choose DenseNet-121 [25] as the teacher model and ResNet-18 [26] as the student model.
For MNIST, F-MNIST and K-MNIST, we train Lenet-5 and MLP for 120 epochs using SGD with a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate is 0.1 and is halved every 30 epochs, and the batch size is 128. For CIFAR10, we train DenseNet-121 and ResNet-18 for 80 epochs using SGD with a momentum of 0.9 and a weight decay of 0.0005. The learning rate is selected from {1e-1, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4} and is divided by 10 every 30 epochs.
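The step schedules described above can be written as a one-line function. The sketch assumes 0-indexed epochs with the decay applied at epochs 30, 60 and 90, and the function name is hypothetical. If the training used PyTorch, which the text does not state, the same schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)` for the MNIST-style datasets and `gamma=0.1` for CIFAR10.

```python
def step_lr(epoch: int, base_lr: float, factor: float, step_size: int = 30) -> float:
    """Learning rate at a 0-indexed epoch: base_lr * factor ** (epoch // step_size)."""
    return base_lr * factor ** (epoch // step_size)


if __name__ == "__main__":
    # MNIST-style setting: start at 0.1 and halve every 30 epochs over 120 epochs.
    print(sorted({step_lr(e, 0.1, 0.5) for e in range(120)}, reverse=True))
    # CIFAR10-style setting with one candidate rate, 1e-2, divided by 10 every 30 epochs over 80 epochs.
    print(sorted({step_lr(e, 1e-2, 0.1) for e in range(80)}, reverse=True))
```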
Figure 4 presents a sensitivity analysis of the distillation temperature $T$ in Eq (13) and the soft label weighting factor $\alpha$ in Eqs (19)–(21).
We first explore the influence of the distillation temperature $T$. When $T=1$, the probability distribution output by the teacher model is used directly as soft labels without softening, and KDCL has the lowest accuracy. At low temperatures, the soft labels differ sharply between positive and negative classes, which makes it difficult for the student model to learn effectively. As $T$ increases, the soft labels become smoother, the student model can more easily learn the knowledge they contain, and accuracy gradually improves. When $T\geq 80$, the gap between positive and negative classes in the soft labels becomes extremely small and the negative classes have too much influence. As a result, accuracy stops increasing or even decreases.
We then investigate the optimal value of the soft label weighting factor $\alpha$. Following Hinton et al. [22], we vary $\alpha$ between 0 and 1. On a given dataset, changing $\alpha$ has little effect on the accuracy of KDCL, which indicates that the parameter optimization of KDCL is not sensitive to the hyperparameter $\alpha$. Nevertheless, the model achieves slightly higher accuracy at $\alpha=0.5$.
Based on the above analysis, we will set T=80, α=0.5 in subsequent experiments.
We show the accuracy for all models with three complementary label losses before and after being optimized by KDCL on four datasets. The results are presented in Table 1.
| Loss | MNIST Lenet-5 | MNIST MLP | MNIST KDCL-MLP | F-MNIST Lenet-5 | F-MNIST MLP | F-MNIST KDCL-MLP | K-MNIST Lenet-5 | K-MNIST MLP | K-MNIST KDCL-MLP | CIFAR-10 Lenet-5 | CIFAR-10 MLP | CIFAR-10 KDCL-MLP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PC | 89.94% | 83.78% | 86.10% | 77.22% | 76.67% | 77.42% | 67.77% | 60.52% | 60.34% | 38.31% | 32.74% | 33.37% |
| FWD | 85.35% | 83.67% | 84.61% | 85.35% | 83.67% | 84.61% | 86.85% | 70.86% | 75.41% | 60.74% | 44.93% | 46.65% |
| SCL-NL | 98.18% | 92.06% | 94.33% | 85.93% | 83.69% | 84.66% | 86.85% | 70.59% | 75.25% | 61.64% | 40.46% | 45.98% |
In Table 1, we compare the student model optimized by KDCL with the same model trained only with complementary labels, across different losses and datasets. On MNIST, a relatively simple dataset, all methods achieve high accuracy. With KDCL, the accuracy of MLP improves from 83.78% to 86.10% with PC loss, from 92.07% to 94.32% with FWD loss and from 92.06% to 94.33% with SCL-NL loss. SCL-NL loss performs best among the three loss functions. In addition, the accuracy of KDCL-MLP falls between that of the MLP model and Lenet-5. On F-MNIST, which is more complex than MNIST, accuracy decreases slightly for all methods; KDCL achieves 77.42% with PC loss, 84.61% with FWD loss and 84.66% with SCL-NL loss. On K-MNIST, which is more complex than F-MNIST, KDCL does not significantly improve the accuracy of MLP with PC loss. It does, however, improve accuracy by 4.55 percentage points with FWD loss and by 4.66 percentage points with SCL-NL loss. On CIFAR-10, the most complex of the four datasets, accuracy drops markedly. Nevertheless, KDCL still improves the student model, demonstrating its robustness and effectiveness across different datasets.
Figure 5 shows the test accuracy of all models during training.
In Figure 5, we present the convergence speed of all models in our experiments. The results show that the student model distilled by KDCL converges faster than that trained only with complementary labels. This indicates that the model can learn the features of the images more accurately and efficiently when utilizing both soft labels and complementary labels.
Additionally, we observe that the PC loss exhibits a decrease in accuracy on more challenging datasets, particularly on CIFAR10. This is because the PC loss uses the Sigmoid function as the normalization function, which can lead to negative values in the loss calculation and prevent the model from finding better parameters when updating. This phenomenon becomes more pronounced on the CIFAR10 dataset, where a peak appears. However, KDCL can alleviate this phenomenon and shift the peak to a later epoch. This demonstrates the effectiveness of KDCL in addressing the limitations of existing CLL methods and improving the performance of complementary label learning.
In this study, we established a knowledge distillation training framework for CLL, called KDCL. As stated in the introduction, the supervision information in complementary labels is limited. The proposed framework employs a deep CNN model with higher accuracy to convert complementary labels into soft labels. Both the soft labels and the original complementary labels are then used to train the classification model. Compared with using ordinary CLL methods alone, KDCL improves accuracy by 0.5–4.5%.
This work has several limitations. First, KDCL's performance may depend on the choice of teacher-student models and CLL algorithms. Our experiments use specific combinations of models and algorithms, and the results may vary with other configurations. With stronger CNNs and better CLL algorithms, KDCL could achieve better performance on more difficult datasets. Second, KDCL has a relatively high time cost. Its two-stage framework requires first training a high-accuracy teacher model with complementary labels, which typically takes considerable time and reduces the efficiency of KDCL. Third, KDCL has been tested only on public datasets whose data distributions are relatively balanced. In future work, we plan to extend KDCL to dynamically imbalanced data for CLL, or to combine it with hybrid deep learning models [27,28,29].
In this paper, we make the first attempt to apply the knowledge distillation training framework to CLL. Existing CLL methods often overlook how to enhance the supervision information present in complementary labels. To address this, we propose a complementary label enhancement framework based on knowledge distillation, called KDCL. Specifically, KDCL consists of a teacher model and a student model. Through knowledge distillation, the teacher model transfers its softened knowledge to the student model. The student model then learns from both soft labels and complementary labels to improve its classification performance. The experimental results on four benchmark datasets show that KDCL improves the classification accuracy of CLL and remains robust and effective on difficult datasets.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported by the National Natural Science Foundation of China (No. 61976217, 62306320), the Natural Science Foundation of Jiangsu Province (No. BK20231063), the Fundamental Research Funds of Central Universities (No. 2019XKQYMS87), Science and Technology Planning Project of Xuzhou (No. KC21193).
All authors declare that they have no conflicts of interest.
[1] |
G. Papaiakovou, Kinematic and kinetic differences in the execution of vertical jumps between people with good and poor ankle joint dorsiflexion, J. Sports Sci., 31 (2013), 1789–1796. https://doi.org/10.1080/02640414.2013.803587 doi: 10.1080/02640414.2013.803587
![]() |
[2] |
W. C. Dobbs, D. V. Tolusso, M. V. Fedewa, M. R. Esco, Effect of postactivation potentiation on explosive vertical jump: A systematic review and meta-analysis, J. Strength Cond. Res., 33 (2019), 2009–2018. https://doi.org/10.1519/JSC.0000000000002750 doi: 10.1519/JSC.0000000000002750
![]() |
[3] |
G. P. Berriel, A. S. Cardoso, R. R. Costa, R. G. Rosa, H. B. Oliveira, L. F. M. Kruel, et al., Effects of postactivation performance enhancement on the vertical jump in high-level volleyball athletes, J. Hum. Kinet., 82(2022), 145–153. https://doi.org/10.2478/hukin-2022-0041 doi: 10.2478/hukin-2022-0041
![]() |
[4] |
F. Bello, E. Aedo-Muñoz, D. Moreira, C. Brito, B. Miarka, E. Cabello, Beach and indoor volleyball athletes present similar lower limb muscle activation during a countermovement jump, Hum. Mov., 21(2019), 42–50. https://doi.org/10.5114/hm.2020.89913 doi: 10.5114/hm.2020.89913
![]() |
[5] |
M. A. Mina, A. J. Blazevich, T. Tsatalas, G. Giakas, L. B. Seitz, A. D. Kay, Variable, but not free-weight, resistance back squat exercise potentiates jump performance following a comprehensive task-specific warm-up, Scand. J. Med. Sci. Sports, 29 (2019), 380–392. https://doi.org/10.1111/sms.13341 doi: 10.1111/sms.13341
![]() |
[6] |
R. J. Downey, D. A. Deprez, P. D. Chilibeck, Effects of postactivation potentiation on maximal vertical jump performance after a conditioning contraction in upper-body and lower-body muscle groups, J. Strength Cond. Res., 36 (2022), 259–261. https://doi.org/10.1519/JSC.0000000000004171 doi: 10.1519/JSC.0000000000004171
![]() |
[7] |
D. G. Sale, Postactivation potentiation: role in performance, Br. J. Sports Med., 38 (2004), 386–387. https://doi.org/10.1136/bjsm.2002.003392 doi: 10.1136/bjsm.2002.003392
![]() |
[8] |
S. Vuk, G. Markovic, S. Jaric, External loading and maximum dynamic output in vertical jumping: The role of training history, Hum. Mov. Sci., 31 (2012), 139–151. https://doi.org/10.1016/j.humov.2011.04.007 doi: 10.1016/j.humov.2011.04.007
![]() |
[9] |
J. J. McMahon, T. J. Suchomel, J. P. Lake, P. Comfort, Understanding the key phases of the countermovement jump force-time curve, Strength Cond. J., 40 (2018), 96–106. https://doi.org/10.1519/SSC.0000000000000375 doi: 10.1519/SSC.0000000000000375
![]() |
[10] |
Z. Anicic, D. Janicijevic, O. M. Knezevic, A. Garcia-Ramos, M. R. Petrovic, D. Cabarkapa, et al., Assessment of countermovement jump: What should we report?, Life, 13 (2023), 190. https://doi.org/10.3390/life13010190 doi: 10.3390/life13010190
![]() |
[11] |
D. G. Sale, Postactivation potentiation: role in human performance, Exercise Sport Sci. Rev., 30 (2002), 138–143. https://doi.org/10.1097/00003677-200207000-00008 doi: 10.1097/00003677-200207000-00008
![]() |
[12] |
M. Spieszny, R. Trybulski, P. Biel, A. Zając, M. Krzysztofik, Post-Isometric back squat performance enhancement of squat and countermovement jump, Int. J. Environ. Res. Public Health, 19 (2022), 12720. https://doi.org/10.3390/ijerph191912720 doi: 10.3390/ijerph191912720
[13] L. Villalon-Gasch, A. Penichet-Tomas, S. Sebastia-Amat, B. Pueo, J. M. Jimenez-Olmedo, Postactivation performance enhancement (PAPE) increases vertical jump in elite female volleyball players, Int. J. Environ. Res. Public Health, 19 (2022), 462. https://doi.org/10.3390/ijerph19010462
[14] R. G. Gheller, J. Dal Pupo, L. A. P. D. Lima, B. M. D. Moura, S. G. D. Santos, A influência da profundidade de agachamento no desempenho e em parâmetros biomecânicos do salto com contra movimento [The influence of squat depth on performance and biomechanical parameters of the countermovement jump], Revista Brasileira de Cineantropometria e Desempenho Humano, 16 (2014), 658–668. https://doi.org/10.5007/1980-0037.2014v16n6p658
[15] R. G. Gheller, J. Dal Pupo, J. Ache-Dias, D. Detanico, J. Padulo, S. G. dos Santos, Effect of different knee starting angles on intersegmental coordination and performance in vertical jumps, Hum. Mov. Sci., 42 (2015), 71–80. https://doi.org/10.1016/j.humov.2015.04.010
[16] T. B. M. A. Moura, V. H. A. Okazaki, Kinematic and kinetic variable determinants on vertical jump performance: A review, MOJ Sports Med., 5 (2022), 25–33. https://doi.org/10.15406/mojsm.2022.05.00113
[17] M. J. Bade, J. E. Stevens-Lapsley, Restoration of physical function in patients following total knee arthroplasty: An update on rehabilitation practices, Curr. Opin. Rheumatol., 24 (2012), 208–214. https://doi.org/10.1097/BOR.0b013e32834ff26d
[18] T. E. Cayot, J. W. Bellew, S. Kennedy, E. Pursley, N. Smith, K. Stemme, Acute effects of 3 neuromuscular electrical stimulation waveforms on exercising and recovery microvascular oxygenation responses, J. Sport Rehabil., 31 (2022), 554–561. https://doi.org/10.1123/jsr.2021-0326
[19] M. Behringer, M. Moser, J. Montag, M. McCourt, D. Tenner, J. Mester, Electrically induced muscle cramps induce hypertrophy of calf muscles in healthy adults, J. Musculoskeletal Neuronal Interact., 15 (2015), 227–236.
[20] M. Pantović, B. Popović, D. Madić, J. Obradović, Effects of neuromuscular electrical stimulation and resistance training on knee extensor/flexor muscles, Coll. Antropol., 39 (2015), 153–157.
[21] C. He, X. X. Shen, X. X. Xie, X. Y. Chen, C. F. Chen, The influence of NMES intervention on martial arts flying double flying feet, Phys. Educ. Kaohsiung Univ. Appl. Sci., 5 (2022), 29–39.
[22] Y. Lima, O. Ozkaya, G. A. Balci, R. Aydinoglu, C. Islegen, Electromyostimulation application on peroneus longus muscle improves balance and strength in American football players, J. Sport Rehabil., 31 (2022), 599–604. https://doi.org/10.1123/jsr.2021-0264
[23] Ş. Şimşek, A. N. O. Soysal, A. K. Özdemir, Ü. B. Aslan, M. B. Korkmaz, Effect of superimposed Russian current on quadriceps strength and lower-extremity endurance in healthy males and females, J. Sport Rehabil., 32 (2022), 46–52. https://doi.org/10.1123/jsr.2021-0437
[24] A. Biel, Trail Guide to the Body's Quick Reference to Trigger Points, Books of Discovery, 2019.
[25] W. T. Dempster, The anthropometry of body action, Ann. New York Acad. Sci., 63 (1955), 559–585. https://doi.org/10.1111/j.1749-6632.1955.tb32112.x
[26] C. F. Chen, H. J. Wu, The effect of an 8-week rope skipping intervention on standing long jump performance, Int. J. Environ. Res. Public Health, 19 (2022), 8472. https://doi.org/10.3390/ijerph19148472
[27] C. F. Chen, H. J. Wu, Z. S. Yang, H. Chen, H. T. Peng, Motion analysis for jumping discus throwing correction, Int. J. Environ. Res. Public Health, 18 (2021), 13414. https://doi.org/10.3390/ijerph182413414
[28] C. F. Chen, H. J. Wu, C. Liu, S. C. Wang, Kinematics analysis of male runners via forefoot and rearfoot strike strategies: A preliminary study, Int. J. Environ. Res. Public Health, 19 (2022), 15924. https://doi.org/10.3390/ijerph192315924
[29] C. F. Chen, M. H. Chuang, H. J. Wu, Joint energy and shot mechanical energy of glide-style shot put, Proc. Inst. Mech. Eng. Part P J. Sports Eng. Technol., (2022). https://doi.org/10.1177/17543371221123168
[30] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd edition, Lawrence Erlbaum Associates, Hillsdale, NJ, 1988.
[31] K. Mackala, J. Stodólka, A. Siemienski, M. Coh, Biomechanical analysis of squat jump and countermovement jump from varying starting positions, J. Strength Cond. Res., 27 (2013), 2650–2661. https://doi.org/10.1519/JSC.0b013e31828909ec
[32] M. Billot, A. Martin, C. Paizis, C. Cometti, N. Babault, Effects of an electrostimulation training program on strength, jumping, and kicking capacities in soccer players, J. Strength Cond. Res., 24 (2010), 1407–1413. https://doi.org/10.1519/JSC.0b013e3181d43790
[33] S. Bartolomei, R. De Luca, S. M. Marcora, May a nonlocalized postactivation performance enhancement exist between the upper and lower body in trained men?, J. Strength Cond. Res., 37 (2023), 68–73. https://doi.org/10.1519/JSC.0000000000004243
[34] R. M. Enoka, Neuromechanics of Human Movement, 5th edition, Human Kinetics, Champaign, IL, 2015. https://doi.org/10.5040/9781492595632