Research article

Convergence of online learning algorithm with a parameterized loss

  • The learning performance of machine learning algorithms is an important subject in machine learning theory, and the choice of loss function is one of the key factors affecting that performance. In this paper, we introduce a parameterized loss function into the online learning algorithm and investigate its performance. By applying convex analysis techniques, we prove the convergence of the learning sequence and provide the convergence rate in the expectation sense. The analysis shows that the convergence rate can be greatly improved by adjusting the parameter in the loss function.

    Citation: Shuhua Wang. Convergence of online learning algorithm with a parameterized loss[J]. AIMS Mathematics, 2022, 7(11): 20066-20084. doi: 10.3934/math.20221098




    It is well known that the Herbrand-Ribet theorem concerns the relation between the $p$-part of the class group of the cyclotomic field $\mathbb{Q}(\zeta_p)$ and Bernoulli numbers.

    We introduce some notation. Let $F=\mathbb{Q}(\zeta_p)$ be the cyclotomic field, and let

    $$G=\mathrm{Gal}(\mathbb{Q}(\zeta_p)/\mathbb{Q})=\{\sigma_a : 1\le a\le p-1\}$$

    be the Galois group, where $\sigma_a(\zeta_p)=\zeta_p^{a}$. Let $\omega$ be the Teichmüller character of the group $(\mathbb{Z}/p)^{\times}$, that is, the character $\omega:(\mathbb{Z}/p)^{\times}\to\mathbb{Z}_p^{\times}$ such that, for $a\in\mathbb{Z}$ with $(a,p)=1$, $\omega(a)^{p-1}=1$ and $\omega(a)\equiv a\pmod p$. For the group ring $\mathbb{Z}_p[G]$, where $\mathbb{Z}_p$ is the ring of $p$-adic integers, the idempotents are

    $$\varepsilon_i=\frac{1}{p-1}\sum_{a=1}^{p-1}\omega^{i}(a)\,\sigma_a^{-1},\qquad 0\le i\le p-2.$$
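    These $\varepsilon_i$ form a complete family of orthogonal idempotents. A quick check via the orthogonality of characters (this verification is added here and is not in the original text): substituting $c=ab$,

    $$\varepsilon_i\varepsilon_j=\frac{1}{(p-1)^2}\sum_{a,b=1}^{p-1}\omega^{i}(a)\,\omega^{j}(b)\,\sigma_{ab}^{-1}=\frac{1}{(p-1)^2}\sum_{c=1}^{p-1}\omega^{j}(c)\,\sigma_c^{-1}\sum_{a=1}^{p-1}\omega^{i-j}(a)=\delta_{ij}\,\varepsilon_j,$$

    since $\sum_{a}\omega^{i-j}(a)$ equals $p-1$ when $i\equiv j\pmod{p-1}$ and $0$ otherwise; similarly $\sum_{i=0}^{p-2}\varepsilon_i=1$.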

    Let $A$ be the $p$-part of $\mathrm{Cl}(F)$, the class group of $F$. Then $A=\bigoplus_{i=0}^{p-2}A_i$, where $A_i=\varepsilon_iA$.

    Herbrand's theorem states that if $\varepsilon_iA\neq 0$, then $p$ divides the numerator of the Bernoulli number $B_{p-i}$. In 1976, Ribet [7] proved the converse of Herbrand's theorem. Thus the Herbrand-Ribet theorem reads as follows.

    Theorem 1.1. Let $i$ be an odd integer with $3\le i\le p-2$. Then $\varepsilon_iA\neq 0$ if and only if $p$ divides the numerator of the Bernoulli number $B_{p-i}$.
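    For instance (a classical example added for illustration, not in the original): $p=691$ divides the numerator of

    $$B_{12}=-\frac{691}{2730},$$

    so Theorem 1.1 with $i=p-12=679$ gives $\varepsilon_{679}A\neq 0$ for $F=\mathbb{Q}(\zeta_{691})$.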

    Herbrand's theorem is obtained from the properties of the Stickelberger element and the $p$-adic $L$-function. In [8], the Herbrand-Ribet theorem for function fields was obtained. In addition, Coates and Sinnott [2] proved an analogue of Stickelberger's theorem for the $K_2$ groups.

    Inspired by the above results, in this paper we obtain the $K_2$ analogue of the Herbrand-Ribet theorem and the $K_2$ analogue of the Vandiver conjecture, respectively.

    Let $S$ be a finite set of places of $F=\mathbb{Q}(\zeta_p)$ including the archimedean ones. Let $O_S$ denote the ring of $S$-integers in $F$, i.e., the ring of all $a\in F$ such that $v(a)\ge 0$ for each place $v\notin S$. Then there is an exact sequence

    $$0\longrightarrow \ker d_S\longrightarrow K_2F\stackrel{d_S}{\longrightarrow}\bigoplus_{v\notin S}\kappa(v)^{*}\longrightarrow 0,$$

    where $\kappa(v)$ denotes the residue field at $v$.

    By Quillen's localization sequence, we have the isomorphism $\ker d_S\cong K_2(O_S)$, which is moreover a $G$-isomorphism if $S$ is stable under $G$ (see [9, P. 271]).
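    For orientation, the map $d_S$ is given on Steinberg symbols by the tame symbol at each place $v\notin S$; explicitly (a standard formula, recalled here for the reader's convenience, not spelled out in the original),

    $$d_v(\{a,b\})=(-1)^{v(a)v(b)}\,\frac{a^{v(b)}}{b^{v(a)}}\ \bmod\ \mathfrak{m}_v,$$

    where $\mathfrak{m}_v$ is the maximal ideal of the valuation ring at $v$.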

    Let $K_2(\mathbb{Z}[\zeta_p])$ be the $K_2$ group of the ring of algebraic integers $\mathbb{Z}[\zeta_p]$, and let $C$ be the $p$-part of $K_2(\mathbb{Z}[\zeta_p])$. Then we have $C=\bigoplus_{i=0}^{p-2}C_i$, where $C_i=\varepsilon_iC$.

    Lemma 2.1. There exist G-isomorphisms:

    $$\varepsilon_jA/p\cong\varepsilon_{j+1}C/p,\qquad 0\le j\le p-3.$$

    Proof. We note an isomorphism [4]

    $$\mu_p\otimes A\cong C/p,\qquad (2.1)$$

    where $G$ acts on $\mu_p\otimes A$ by the formula

    $$(\zeta\otimes x)^{\rho}=\zeta^{\rho}\otimes x^{\rho},\qquad\text{for }\zeta\in\mu_p,\ \rho\in G,\ x\in A.$$

    We claim that the above isomorphism is a $G$-isomorphism. Let $S$ be the set of places of $\mathbb{Q}(\zeta_p)$ consisting of the archimedean ones and the finite ones above $p$. Let $S_c$ denote the set of complex places. Then there is a natural exact sequence (see [9, Theorem 6.2])

    $$0\longrightarrow\mu_p\otimes\mathrm{Cl}(O_S)\longrightarrow K_2O_S/p\stackrel{h_S^1}{\longrightarrow}\Big(\bigoplus_{v\in S\setminus S_c}\mu_p\Big)_0\longrightarrow 0,\qquad (2.2)$$

    where $(\,\cdot\,)_0$ denotes the subgroup of the direct sum consisting of the elements $z=(z_v)$ such that $\sum_v z_v=0$. The map $h_S^1$ is the one induced by the $p$-th power norm residue symbols for $v\in S\setminus S_c$. Since $S$ is stable under $G$, the above exact sequence is a sequence of $G$-modules with $G$-homomorphisms (see [9, P. 271]). By [11, Theorem 73], $C$ and the $p$-part of $K_2(\mathbb{Z}[\zeta_p,1/p])$ are both equal to $H^2_{\mathrm{\acute{e}t}}(\mathbb{Z}[\zeta_p,1/p],\mathbb{Z}_p(2))$. Since $p\mathbb{Z}[\zeta_p]=(1-\zeta_p)^{p-1}$, the $p$-part of $\mathrm{Cl}(O_S)$ is equal to $A$. Moreover, since the fourth term in (2.2) is $0$ (see [11, Example 5]), we get that (2.1) is a $G$-isomorphism.

    Then we consider the following homomorphism

    $$\delta: A\longrightarrow\mu_p\otimes A,\qquad x\mapsto\zeta_p\otimes x.$$

    Here, $\delta$ is not a homomorphism of $G$-modules. The kernel of $\delta$ is $pA$ (note that $\mu_p\otimes A\cong A/pA$ as abelian groups, since $\mu_p\cong\mathbb{Z}/p$), so we get an isomorphism

    $$\delta: A/pA\stackrel{\sim}{\longrightarrow}\mu_p\otimes A.\qquad (2.3)$$

    Next we give the explicit description of $\delta$ under the Galois group action. For $z:=\zeta_{p^n}$, we have $\sigma_a(z)=z^{\omega(a)}$ (see [1, Lemma 3.3]), so

    $$\sigma_a(\delta x)=\sigma_a(\zeta_p)\otimes\sigma_ax=\zeta_p^{\omega(a)}\otimes\sigma_ax=\omega(a)\,\delta(\sigma_a(x)).$$

    Therefore,

    $$\zeta_p\otimes\varepsilon_jx=\zeta_p\otimes\Big(\frac{1}{p-1}\sum_{a=1}^{p-1}\omega^{j}(a)\,\sigma_a^{-1}(x)\Big)=\frac{1}{p-1}\sum_{a=1}^{p-1}\omega^{j+1}(a)\,\sigma_a^{-1}(\zeta_p\otimes x)=\varepsilon_{j+1}(\zeta_p\otimes x).$$

    Hence

    $$\delta(\varepsilon_jx)=\zeta_p\otimes\varepsilon_jx=\varepsilon_{j+1}(\zeta_p\otimes x)=\varepsilon_{j+1}\delta(x).\qquad (2.4)$$

    By (2.4), applying the idempotent $\varepsilon_j$ to (2.3) leads to

    $$\varepsilon_j(A/pA)\cong\varepsilon_{j+1}(\mu_p\otimes A).$$

    Since (2.1) is a $G$-isomorphism, combining it with the above isomorphism we obtain

    $$\varepsilon_jA/p\cong\varepsilon_{j+1}C/p,\qquad 0\le j\le p-3,$$

    as desired.

    Next, we give the $K_2$ analogue of the Herbrand-Ribet theorem for the field $\mathbb{Q}(\zeta_p)$ as follows.

    Theorem 2.1. Let $i$ be even, $4\le i\le p-3$. Then

    $$C_i\neq 0\Longleftrightarrow p\mid B_{p+1-i}.$$

    Proof. Since $A$ and $C$ are finite abelian $p$-groups, a component vanishes if and only if it vanishes modulo $p$; hence

    $$C_i\neq 0\Longleftrightarrow\varepsilon_iC/p\neq 0,$$
    $$A_{i-1}\neq 0\Longleftrightarrow\varepsilon_{i-1}A/p\neq 0.$$

    From Lemma 2.1, we have $\varepsilon_iC/p\cong\varepsilon_{i-1}A/p$. Applying Theorem 1.1 to the odd index $i-1$ (note that $B_{p-(i-1)}=B_{p+1-i}$), we get $C_i\neq 0\Longleftrightarrow p\mid B_{p+1-i}$, as required.
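    To illustrate with the classical example noted after Theorem 1.1 (this remark is added here, not in the original): for $p=691$ and $i=680$ we have $p+1-i=12$ and $691\mid B_{12}$, so Theorem 2.1 gives $C_{680}\neq 0$; via Lemma 2.1 this corresponds to $\varepsilon_{679}A\neq 0$.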

    However, the implication "$\Rightarrow$" can also be obtained from the properties of the Stickelberger element, without using Theorem 1.1 and Lemma 2.1. We sketch the argument as follows.

    Considering the Stickelberger element of the cyclotomic field $\mathbb{Q}(\zeta_p)$ at $s=-1$,

    $$\theta_{-1}=\sum_{a=1}^{p-1}\zeta(\sigma_a,-1)\,\sigma_a^{-1},$$

    where $\zeta(\sigma,s)$ is the partial zeta function, we can prove that $(c^2-\omega^{i}(c))B_{2,\omega^{-i}}$ annihilates $C_i$; moreover, for $i=4,6,\dots,p-3$, $B_{2,\omega^{-i}}$ annihilates $C_i$.

    We now suppose $C_i\neq 0$. Then $B_{2,\omega^{-i}}\equiv 0\pmod p$. Since

    $$B_{2,\omega^{n}}\equiv\frac{B_{n+2}}{n+2}\pmod p,$$

    we get, using $\omega^{p-1}=1$,

    $$B_{2,\omega^{-i}}=B_{2,\omega^{p-1-i}}\equiv\frac{B_{p+1-i}}{p+1-i}\pmod p.$$

    Therefore, $p\mid B_{p+1-i}$.

    Vandiver's conjecture states that $p$ does not divide the class number of $\mathbb{Q}(\zeta_p)^{+}$, where $\mathbb{Q}(\zeta_p)^{+}$ is the maximal real subfield of the cyclotomic field $\mathbb{Q}(\zeta_p)$. Equivalently, Vandiver's conjecture says that all the even components $\varepsilon_iA$ ($i$ even) are trivial.

    Lemma 3.1. For any irregular prime $p$, $A_{2i}=0$, where $1\le i\le 14$.

    Proof. From [10] (Tables §1, Bernoulli numbers), for $i=1,2,3,4,5,7$ we have $p\nmid B_{2i}$. So from Theorem 1.1, we have $A_{p-2i}=0$ for $i=1,2,3,4,5,7$. By the reflection theorem (see [10, Theorem 10.9]),

    $$p\text{-rank}\,A_{2i}\le p\text{-rank}\,A_{p-2i},$$

    we get $A_{2i}=0$.
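    In fact, for the indices above the condition $p\nmid B_{2i}$ holds for every irregular prime: the relevant Bernoulli numbers are

    $$B_2=\tfrac{1}{6},\quad B_4=-\tfrac{1}{30},\quad B_6=\tfrac{1}{42},\quad B_8=-\tfrac{1}{30},\quad B_{10}=\tfrac{5}{66},\quad B_{14}=\tfrac{7}{6},$$

    whose numerators have no prime factor exceeding $7$, while the smallest irregular prime is $37$ (this remark is added here; the original simply cites the tables in [10]).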

    Let $P_n$ denote the largest prime factor of the numerator of $B_n$, when such a factor exists. For $i=6,8,9,10,11,12,13,14$ (that is, for $n=2i=12,16,18,20,22,24,26,28$), from [10] (Tables §1, Bernoulli numbers) we have

    $$P_{12}=691,\quad P_{16}=3617,\quad P_{18}=43867,\quad P_{20}=617,$$
    $$P_{22}=593,\quad P_{24}=2294797,\quad P_{26}=657931,\quad P_{28}=362903.$$

    These primes are all less than 12,000,000, and it is well known that the Vandiver conjecture has been verified for all irregular primes less than 12,000,000. Hence, for these $i$, either $p\nmid B_{2i}$ and the argument above applies, or $p\mid B_{2i}$ and then $p\le P_{2i}<12{,}000{,}000$, so Vandiver's conjecture holds for $p$. In either case we get $A_{2i}=0$ for $i=6,8,9,10,11,12,13,14$.
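    The table of largest prime factors is easy to re-derive by machine. A minimal sketch (added here, not from the paper), assuming SymPy is available:

        # Recompute P_n, the largest prime factor of the numerator of the
        # Bernoulli number B_n, for the eight indices used in Lemma 3.1.
        from sympy import bernoulli, factorint

        for n in [12, 16, 18, 20, 22, 24, 26, 28]:
            numerator = abs(bernoulli(n).p)      # .p is the numerator of a SymPy Rational
            largest = max(factorint(numerator))  # factorint returns {prime: multiplicity}
            print(f"P_{n} = {largest}")

    Running this reproduces the eight values listed above.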

    Now we can formulate a $K_2$-analogue of Vandiver's conjecture as follows.

    Conjecture 3.1. For odd $i$, $\varepsilon_iC=0$, where $C$ is the $p$-part of $K_2(\mathbb{Z}[\zeta_p])$.

    It has been proved that $\varepsilon_{p-3}A$ always vanishes (see [5]), and that if the prime $p\equiv 3\pmod 4$, then $\varepsilon_{(p+1)/2}A$ is trivial (see [3,6]). Combining these results with Lemmas 2.1 and 3.1, we get the following result, which verifies some cases of Conjecture 3.1.

    Theorem 3.1. For any irregular prime $p$, $C_{2i+1}=0$ $(1\le i\le 14)$, $C_{p-2}=0$, and $C_{(p+3)/2}=0$ if $p\equiv 3\pmod 4$.
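    For the reader's convenience (this short proof sketch is added here): by Lemma 2.1 with $j=2i$, $j=p-3$ and $j=(p+1)/2$ respectively, $\varepsilon_{2i}A/p\cong C_{2i+1}/p$, $\varepsilon_{p-3}A/p\cong C_{p-2}/p$ and $\varepsilon_{(p+1)/2}A/p\cong C_{(p+3)/2}/p$, so the three statements follow from Lemma 3.1 and the two vanishing results just quoted, since a component of the finite $p$-group $C$ vanishes if and only if it vanishes modulo $p$.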

    We gave the $K_2$ analogue of the Herbrand-Ribet theorem and proved it. We also formulated the $K_2$ analogue of Vandiver's conjecture; a proof appears to be difficult, and we have only verified some special cases of it.

    The authors are grateful to the referees and the editor for their careful reviews. The first author was supported by the National Natural Science Foundation of China (No. 11901079), China Postdoctoral Science Foundation (No. 2021M700751) and the Scientific and Technological Research Program Foundation of Jilin Province (No. JJKH20190690KJ; No. 20200401085GX; No. JJKH20220091KJ). The second author was supported by the National Natural Science Foundation of China (No. 11601211).

    The authors declare that there are no conflicts of interest regarding the publication of this paper.



    [1] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc., 68 (1950), 337–404. http://dx.doi.org/10.2307/1990404 doi: 10.2307/1990404
    [2] W. Dai, J. Hu, Y. Cheng, X. Wang, T. Chai, RVFLN-based online adaptive semi-supervised learning algorithm with application to product quality estimation of industrial processes, J. Cent. South Univ., 26 (2019), 3338–3350. http://dx.doi.org/10.1007/s11771-019-4257-6 doi: 10.1007/s11771-019-4257-6
    [3] J. Gui, Y. Liu, X. Deng, B. Liu, Network capacity optimization for Cellular-assisted vehicular systems by online learning-based mmWave beam selection, Wirel. Commun. Mob. Com., 2021 (2021), 8876186. http://dx.doi.org/10.1155/2021/8876186 doi: 10.1155/2021/8876186
    [4] M. Li, I. Sethi, A new online learning algorithm with application to image segmentation, Image Processing: Algorithms and Systems IV, 5672 (2005), 277–286. http://dx.doi.org/10.1117/12.586328 doi: 10.1117/12.586328
    [5] S. Sai Santosh, S. Darak, Intelligent and reconfigurable architecture for KL divergence based online machine learning algorithm, arXiv: 2002.07713.
    [6] B. Yang, J. Yao, X. Yang, Y. Shi, Painting image classification using online learning algorithm, In: Distributed, ambient and pervasive interactions, Cham: Springer, 2017,393–403. http://dx.doi.org/10.1007/978-3-319-58697-7_29
    [7] S. Das, Kuhoo, D. Mishra, M. Rout, An optimized feature reduction based currency forecasting model exploring the online sequential extreme learning machine and krill herd strategies, Physica A, 513 (2019), 339–370. http://dx.doi.org/10.1016/j.physa.2018.09.021 doi: 10.1016/j.physa.2018.09.021
    [8] S. Smale, Y. Yao, Online learning algorithms, Found. Comput. Math., 6 (2006), 145–170. http://dx.doi.org/10.1007/s10208-004-0160-z
    [9] Y. Ying, D. Zhou, Online regularized classification algorithms, IEEE Trans. Inform. Theory, 52 (2006), 4775–4788. http://dx.doi.org/10.1109/TIT.2006.883632 doi: 10.1109/TIT.2006.883632
    [10] Y. Ying, D. Zhou, Unregularized online learning algorithms with general loss functions, Appl. Comput. Harmon. Anal., 42 (2017), 224–244. http://dx.doi.org/10.1016/J.ACHA.2015.08.007 doi: 10.1016/J.ACHA.2015.08.007
    [11] Y. Zeng, D. Klabjian, Online adaptive machine learning based algorithm for implied volatility surface modeling, Knowl.-Based Syst., 163 (2019), 376–391. http://dx.doi.org/10.1016/j.knosys.2018.08.039 doi: 10.1016/j.knosys.2018.08.039
    [12] J. Lin, D. Zhou, Online learning algorithms can converge comparably fast as batch learning, IEEE Trans. Neural Netw. Learn. Syst., 29 (2018), 2367–2378. http://dx.doi.org/10.1109/TNNLS.2017.2677970 doi: 10.1109/TNNLS.2017.2677970
    [13] P. Huber, E. Ronchetti, Robust statistics, Hoboken: John Wiley & Sons, 2009. http://dx.doi.org/10.1002/9780470434697
    [14] Y. Wu, Y. Liu, Robust truncated hinge loss support vector machine, J. Am. Stat. Assoc., 102 (2007), 974–983. http://dx.doi.org/10.1198/016214507000000617 doi: 10.1198/016214507000000617
    [15] Y. Yu, M. Yang, L. Xu, M. White, D. Schuurmans, Relaxed clipping: a global training method for robust regression and classification, Proceedings of the 23rd International Conference on Neural Information Processing Systems, 2 (2010), 2532–2540.
    [16] S. Huang, Y. Feng, Q. Wu, Learning theory of minimum error entropy under weak moment conditions, Anal. Appl., 20 (2022), 121–139. http://dx.doi.org/10.1142/S0219530521500044 doi: 10.1142/S0219530521500044
    [17] F. Lv, J. Fan, Optimal learning with Gaussians and correntropy loss, Anal. Appl., 19 (2021), 107–124. http://dx.doi.org/10.1142/S0219530519410124 doi: 10.1142/S0219530519410124
    [18] X. Zhu, Z. Li, J. Sun, Expression recognition method combining convolutional features and Transformer, Math. Found. Comput., in press. http://dx.doi.org/10.3934/mfc.2022018
    [19] S. Suzumura, K. Ogawa, M. Sugiyama, M. Karasuyama, I. Takeuchi, Homotopy continuation approaches for robust SV classification and regression, Mach. Learn., 106 (2017), 1009–1038. http://dx.doi.org/10.1007/s10994-017-5627-7 doi: 10.1007/s10994-017-5627-7
    [20] Z. Guo, T. Hu, L. Shi, Gradient descent for robust kernel-based regression, Inverse Probl., 34 (2018), 065009. http://dx.doi.org/10.1088/1361-6420/aabe55 doi: 10.1088/1361-6420/aabe55
    [21] B. Sheng, H. Zhu, The convergence rate of semi-supervised regression with quadratic loss, Appl. Math. Comput., 321 (2018), 11–24. http://dx.doi.org/10.1016/j.amc.2017.10.033 doi: 10.1016/j.amc.2017.10.033
    [22] M. Pontil, Y. Ying, D. Zhou, Error analysis for online gradient descent algorithms in reproducing kernel Hilbert spaces, Proceedings of Technical Report, University College London, 2005, 1–20.
    [23] S. Wang, Z. Chen, B. Sheng, Convergence of online pairwise regression learning with quadratic loss, Commun. Pur. Appl. Anal., 19 (2020), 4023–4054. http://dx.doi.org/10.3934/cpaa.2020178 doi: 10.3934/cpaa.2020178
    [24] H. Bauschke, P. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, Cham: Springer-Verlag, 2010. http://dx.doi.org/10.1007/978-3-319-48311-5
    [25] Z. Guo, L. Shi, Fast and strong convergence of online learning algorithms, Adv. Comput. Math., 45 (2019), 2745–2770. http://dx.doi.org/10.1007/s10444-019-09707-8 doi: 10.1007/s10444-019-09707-8
    [26] Y. Lei, D. Zhou, Convergence of online mirror descent, Appl. Comput. Harmon. Anal., 48 (2020), 343–373. http://dx.doi.org/10.1016/j.acha.2018.05.005 doi: 10.1016/j.acha.2018.05.005
    [27] I. Baloch, T. Abdeljawad, S. Bibi, A. Mukheimer, G. Farid, A. Haq, Some new Caputo fractional derivative inequalities for exponentially (θ,hm)-convex functions, AIMS Mathematics, 7 (2022), 3006–3026. http://dx.doi.org/10.3934/math.2022166 doi: 10.3934/math.2022166
    [28] P. Mohammed, D. O'Regan, A. Brzo, K. Abualnaja, D. Baleanu, Analysis of positivity results for discrete fractional operators by means of exponential kernels, AIMS Mathematics, 7 (2022), 15812–15823. http://dx.doi.org/10.3934/math.2022865 doi: 10.3934/math.2022865
    [29] Y. Xia, J. Zhou, T. Xu, W. Gao, An improved deep convolutional neural network model with kernel loss function in image classification, Math. Found. Comput., 3 (2020), 51–64. http://dx.doi.org/10.3934/mfc.2020005 doi: 10.3934/mfc.2020005
    [30] D. Zhou, Deep distributed convolutional neural networks: universality, Anal. Appl., 16 (2018), 895–919. http://dx.doi.org/10.1142/S0219530518500124 doi: 10.1142/S0219530518500124
    [31] D. Zhou, Universality of deep convolutional neural networks, Appl. Comput. Harmon. Anal., 48 (2020), 787–794. http://dx.doi.org/10.1016/j.acha.2019.06.004 doi: 10.1016/j.acha.2019.06.004
    [32] D. Zhou, Theory of deep convolutional neural networks: downsampling, Neural Networks, 124 (2020), 319–327. http://dx.doi.org/10.1016/j.neunet.2020.01.018 doi: 10.1016/j.neunet.2020.01.018
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)