Zero-shot learning recognizes unseen-class samples using a model learned from seen-class samples and semantic features. Because the training set contains no information about the unseen classes, some researchers have proposed generating unseen-class samples with generative models. However, such a generative model is first trained on the seen-class training samples and only then used to generate unseen-class samples, so the generated features tend to be biased toward the seen classes and may deviate substantially from real unseen-class samples. To address this problem, we generate the unseen-class samples with an autoencoder and construct the loss function by combining the semantic features of the unseen classes with the generated sample features. The proposed method is validated on three datasets and shows good results.
Citation: Tianshu Wei, Jinjie Huang, Cong Jin. Zero-shot learning via visual-semantic aligned autoencoder[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 14081-14095. doi: 10.3934/mbe.2023629
Let
● (Divisorial)
● (Flipping)
● (Mixed)
Note that the mixed case can occur only if either
We can almost always choose the initial
Our aim is to discuss a significant special case where the
Definition 1 (MMP with scaling). Let
By the
$$(X_j,\Theta_j)\ \xrightarrow{\;\phi_j\;}\ Z_j\ \xleftarrow{\;\psi_j\;}\ (X_{j+1},\Theta_{j+1}), \tag{1.1}$$
where $g_j\colon X_j\to S$ and $g_{j+1}\colon X_{j+1}\to S$ factor through $Z_j\to S$.
where
(2)
(3)
(4)
Note that (4) implies that
In general such a diagram need not exist, but if it does, it is unique and then
$$(X,\Theta)\ \xrightarrow{\;\phi\;}\ Z\ \xleftarrow{\;\phi^+\;}\ (X^+,\Theta^+), \tag{1.5}$$
where $g\colon X\to S$ and $g^+\colon X^+\to S$ factor through $Z\to S$.
We say that the MMP terminates with
(6) either
(7) or
Warning 1.8. Our terminology is slightly different from [7], where it is assumed that
One advantage is that our MMP steps are uniquely determined by the starting data. This makes it possible to extend the theory to algebraic spaces [33].
Theorem 2 is formulated for Noetherian base schemes. We do not prove any new results about the existence of flips, but Theorem 2 says that if the MMP with scaling exists and terminates, then its steps are simpler than expected, and the end result is more controlled than expected.
On the other hand, for 3-dimensional schemes, Theorem 2 can be used to conclude that, in some important cases, the MMP runs and terminates, see Theorem 9.
Theorem 2. Let
(i)
(ii)
(iii)
(iv)
(v) The
We run the
(1)
(a) either
(b) or
(2) The
(3)
Furthermore, if the MMP terminates with
(4)
(5) if
Remark 2.6. In applications the following are the key points:
(a) We avoided the mixed case.
(b) In the flipping case we have both
(c) In (3) we have an explicit, relatively ample, exceptional
(d) In case (5) we end with
(e) In case (5) the last MMP step is a divisorial contraction, giving what [35] calls a Kollár component; no further flips are needed.
Proof. Assertions (1-3) concern only one MMP-step, so we may as well drop the index
Let
$$\sum_i h_i\,(E_i\cdot C) = -r^{-1}(E_\Theta\cdot C). \tag{2.7}$$
By Lemma 3 this shows that the
$$\sum_i h_i\bigl(e'(E_i\cdot C) - e(E_i\cdot C')\bigr) = 0. \tag{2.8}$$
By the linear independence of the
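A sketch of how (2.8) follows from (2.7); here I write $e = (E_\Theta\cdot C)$ and $e' = (E_\Theta\cdot C')$, which is an assumption about the elided notation:

```latex
% Apply (2.7) to two curves C and C':
%   \sum_i h_i (E_i \cdot C)  = -r^{-1} e,   where e  = (E_\Theta \cdot C),
%   \sum_i h_i (E_i \cdot C') = -r^{-1} e',  where e' = (E_\Theta \cdot C').
% Multiply the first identity by e', the second by e, and subtract:
\sum_i h_i\bigl(e'(E_i\cdot C) - e(E_i\cdot C')\bigr)
  = -r^{-1}e\,e' + r^{-1}e\,e' = 0,
% which is exactly (2.8).
```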
Assume first that
$$\phi_*(E_\Theta + rH) = \sum_{i>1} (e_i + r h_i)\,\phi_*(E_i)$$
is
Otherwise
$$g^{-1}\bigl(g(\operatorname{supp}(E_\Theta + rH))\bigr) = \operatorname{supp}(E_\Theta + rH). \tag{2.9}$$
If
Thus
Assume next that the flip
Finally, if the MMP terminates with
Lemma 3. Let
$$\sum_{i=1}^{n} h_i v_i = \gamma v_0 \quad\text{for some } \gamma\in L.$$
Then
Proof. We may assume that
$$\sum_{i=1}^{n} h_i a_i = \gamma a_0 \quad\text{and}\quad \sum_{i=1}^{n} h_i b_i = \gamma b_0.$$
This gives that
$$\sum_{i=1}^{n} h_i\,(b_0 a_i - a_0 b_i) = 0.$$
Since the
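The elimination of $\gamma$ between the two displayed identities is elementary; a sketch:

```latex
% From \sum_i h_i a_i = \gamma a_0 and \sum_i h_i b_i = \gamma b_0,
% multiply the first identity by b_0, the second by a_0, and subtract:
b_0\sum_{i=1}^{n} h_i a_i - a_0\sum_{i=1}^{n} h_i b_i
  = \gamma\,(b_0 a_0 - a_0 b_0) = 0,
% hence \sum_{i=1}^{n} h_i (b_0 a_i - a_0 b_i) = 0, as claimed.
```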
Lemma 4. Let
Proof. Assume that
$$\sum_{i=1}^{n} s_i h_i = -\Bigl(\sum_{i=1}^{n} s_i e_i\Bigr)\cdot \sum_{i=0}^{n} r_i h_i.$$
If
The following is a slight modification of [3,Lem.1.5.1]; see also [17,5.3].
Lemma 5. Let
Comments on
Conjecture 6. Let
(1)
(2) The completion of
Using [30,Tag 0CAV] one can reformulate (6.2) as a finite type statement:
(3) There are elementary étale morphisms
$$\bigl(x, X, \textstyle\sum D^X_i\bigr) \leftarrow \bigl(u, U, \textstyle\sum D^U_i\bigr) \rightarrow \bigl(y, Y, \textstyle\sum D^Y_i\bigr).$$
Almost all resolution methods commute with étale morphisms, thus if we want to prove something about a resolution of
A positive answer to Conjecture 6 (for
(Note that [27] uses an even stronger formulation: Every normal, analytic singularity has an algebraization whose class group is generated by the canonical class. This is, however, not true, since not every normal, analytic singularity has an algebraization.)
Existence of certain resolutions.
7 (The assumptions 2.i-v). In most applications of Theorem 2 we start with a normal pair
Typically we choose a log resolution
We want
The existence of a
8 (Ample, exceptional divisors). Assume that we blow up an ideal sheaf
Claim 8.1. Let
Resolution of singularities is also known for 3-dimensional excellent schemes [10], but in its original form it does not guarantee projectivity in general. Nonetheless, combining [6,2.7] and [23,Cor.3] we get the following.
Claim 8.2. Let
Next we mention some applications. In each case we use Theorem 2 to modify the previous proofs and obtain more general results. We give only some hints as to how this is done; we refer to the original papers for definitions and details of proofs.
The first two applications are to dlt 3-folds. In both cases Theorem 2 allows us to run MMP in a way that works in every characteristic and also for bases that are not
Relative MMP for dlt 3-folds.
Theorem 9. Let
Then the MMP over
(1) each step
(a) either a contraction
(b) or a flip
(2)
(3) if either
Proof. Assume first that the MMP steps exist and the MMP terminates. Note that
$$K_X + E + g^{-1}_*\Delta \;\sim_{\mathbb{R}}\; g^*(K_Y+\Delta) + \sum_j \bigl(1 + a(E_j,Y,\Delta)\bigr)E_j \;\sim_{g,\mathbb{R}}\; \sum_j \bigl(1 + a(E_j,Y,\Delta)\bigr)E_j =: E_\Theta.$$
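This is the standard discrepancy computation; a sketch, under the assumption that $E = \sum_j E_j$ is the reduced exceptional divisor of $g$:

```latex
% By the definition of discrepancies for g : X -> Y,
%   K_X + g^{-1}_*\Delta \sim_{\mathbb R}
%     g^*(K_Y+\Delta) + \sum_j a(E_j,Y,\Delta)\,E_j.
% Adding E = \sum_j E_j to both sides gives
K_X + E + g^{-1}_*\Delta
  \sim_{\mathbb R} g^*(K_Y+\Delta) + \sum_j \bigl(1 + a(E_j,Y,\Delta)\bigr) E_j,
% and since g^*(K_Y+\Delta) is trivial relative to g,
%   K_X + E + g^{-1}_*\Delta
%     \sim_{g,\mathbb R} \sum_j (1 + a(E_j,Y,\Delta))\,E_j =: E_\Theta.
```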
We get from Theorem 2 that (1.a-b) are the possible MMP-steps, and (2-3) from Theorem 15-5.
For existence and termination, all details are given in [6,9.12].
However, I would like to note that we are in a special situation, which can be treated with the methods of [1,29], at least when the closed points of
The key point is that everything happens inside
Contractions for reducible surfaces have been treated in [1,Secs.11-12], see also [12,Chap.6] and [31].
The presence of
The short note [34] explains how [15,3.4] gives 1-complemented 3-fold flips; see [16,3.1 and 4.3] for stronger results.
Inversion of adjunction for 3-folds. Using Theorem 9 we can remove the
Corollary 10. Let
This implies that one direction of Reid's classification of terminal singularities using 'general elephants' [28,p.393] works in every characteristic. This could be useful in extending [2] to characteristics
Corollary 11. Let
Divisor class group of dlt singularities. The divisor class group of a rational surface singularity is finite by [24], and [8] plus an easy argument shows that the divisor class group of a rational 3-dimensional singularity is finitely generated. Thus the divisor class group of a 3-dimensional dlt singularity is finitely generated in characteristic
Proposition 12. [21,B.1] Let
It seems reasonable to conjecture that the same holds in all dimensions, see [21,B.6].
Grauert-Riemenschneider vanishing. One can prove a variant of the Grauert-Riemenschneider (abbreviated as G-R) vanishing theorem [13] by following the steps of the MMP.
Definition 13 (G-R vanishing). Let
Let
(1)
(2)
Then
We say that G-R vanishing holds over
By an elementary computation, if
If
G-R vanishing also holds over 2-dimensional, excellent schemes by [24]; see [20,10.4]. In particular, if
However, G-R vanishing fails for 3-folds in every positive characteristic, as shown by cones over surfaces for which Kodaira's vanishing fails. Thus the following may be the type of G-R vanishing result that one can hope for.
Theorem 14. [5] Let
Proof. Let
A technical problem is that we seem to need various rationality properties of the singularities of the
For divisorial contractions
For flips
From G-R vanishing one can derive various rationality properties for all excellent dlt pairs. This can be done by following the method of two spectral sequences as in [19] or [20,7.27]; see [5] for an improved version.
Theorem 15. [5] Let
(1)
(2) Every irreducible component of
(3) Let
See [5,12] for the precise resolution assumptions needed. The conclusions are well known in characteristic 0, see [22,5.25], [12,Sec.3.13] and [20,7.27]. For 3-dimensional dlt varieties in
The next two applications are in characteristic 0.
Dual complex of a resolution. Our results can be used to remove the
Corollary 16. Let
Theorem 17. Let
(1)
(2)
(3)
Then
Proof. Fix
Let us now run the
Note that
We claim that each MMP-step as in Theorem 2 induces either a collapse or an isomorphism of
By [11,Thm.19] we get an elementary collapse (or an isomorphism) if there is a divisor
It remains to deal with the case when we contract
Dlt modifications of algebraic spaces. By [25], a normal, quasi-projective pair
However, dlt modifications are rarely unique, thus it was not obvious that they exist when the base is not quasi-projective. [33] observed that Theorem 2 gives enough uniqueness to allow for gluing. This is not hard when
Theorem 18 (Villalobos-Paz). Let
(1)
(2)
(3)
(4)
(5) either
I thank E. Arvidsson, F. Bernasconi, J. Carvajal-Rojas, J. Lacini, A. Stäbler, D. Villalobos-Paz, C. Xu for helpful comments and J. Witaszek for numerous e-mails about flips.