Research article

An online conjugate gradient algorithm for large-scale data analysis in machine learning

  • In recent years, the amount of available data has been growing exponentially, and large-scale data are becoming ubiquitous. Machine learning is key to deriving insight from this deluge of data. In this paper, we focus on large-scale data analysis, especially classification data, and propose an online conjugate gradient (CG) descent algorithm. Our algorithm draws on the improved Fletcher-Reeves (IFR) CG method proposed by Jiang and Jian [13] as well as the variance-reduction approach for stochastic gradient descent of Johnson and Zhang [15]. In theory, we prove that the proposed online algorithm achieves a linear convergence rate under the strong Wolfe line search when the objective function is smooth and strongly convex. Comparison results on several benchmark classification datasets demonstrate that our approach is promising for solving large-scale machine learning problems, in terms of area under the curve (AUC) value and convergence behavior.

    Citation: Wei Xue, Pengcheng Wan, Qiao Li, Ping Zhong, Gaohang Yu, Tao Tao. An online conjugate gradient algorithm for large-scale data analysis in machine learning[J]. AIMS Mathematics, 2021, 6(2): 1515-1537. doi: 10.3934/math.2021092

    Related Papers:

    [1] Zhi Guang Li . Global regularity and blowup for a class of non-Newtonian polytropic variation-inequality problem from investment-consumption problems. AIMS Mathematics, 2023, 8(8): 18174-18184. doi: 10.3934/math.2023923
    [2] Yudong Sun, Tao Wu . Hölder and Schauder estimates for weak solutions of a certain class of non-divergent variation inequality problems in finance. AIMS Mathematics, 2023, 8(8): 18995-19003. doi: 10.3934/math.2023968
    [3] Jia Li, Zhipeng Tong . Local Hölder continuity of inverse variation-inequality problem constructed by non-Newtonian polytropic operators in finance. AIMS Mathematics, 2023, 8(12): 28753-28765. doi: 10.3934/math.20231472
    [4] Zengtai Gong, Chengcheng Shen . Monotone set-valued measures: Choquet integral, f-divergence and Radon-Nikodym derivatives. AIMS Mathematics, 2022, 7(6): 10892-10916. doi: 10.3934/math.2022609
    [5] Jittiporn Tangkhawiwetkul . A neural network for solving the generalized inverse mixed variational inequality problem in Hilbert Spaces. AIMS Mathematics, 2023, 8(3): 7258-7276. doi: 10.3934/math.2023365
    [6] Imran Ali, Faizan Ahmad Khan, Haider Abbas Rizvi, Rais Ahmad, Arvind Kumar Rajpoot . Second order evolutionary partial differential variational-like inequalities. AIMS Mathematics, 2022, 7(9): 16832-16850. doi: 10.3934/math.2022924
    [7] Xinyue Zhang, Haibo Chen, Jie Yang . Blow up behavior of minimizers for a fractional p-Laplace problem with external potentials and mass critical nonlinearity. AIMS Mathematics, 2025, 10(2): 3597-3622. doi: 10.3934/math.2025166
    [8] Zongqi Sun . Regularity and higher integrability of weak solutions to a class of non-Newtonian variation-inequality problems arising from American lookback options. AIMS Mathematics, 2023, 8(6): 14633-14643. doi: 10.3934/math.2023749
    [9] Ting Xie, Dapeng Li . On the stability of projected dynamical system for generalized variational inequality with hesitant fuzzy relation. AIMS Mathematics, 2020, 5(6): 7107-7121. doi: 10.3934/math.2020455
    [10] Tao Wu . Some results for a variation-inequality problem with fourth order p(x)-Kirchhoff operator arising from options on fresh agricultural products. AIMS Mathematics, 2023, 8(3): 6749-6762. doi: 10.3934/math.2023343


    First, consider a kind of variation-inequality problem

    $$\begin{cases}
    Lu\ge 0, & (x,t)\in\Omega_T,\\
    u\ge u_0, & (x,t)\in\Omega_T,\\
    Lu\,(u-u_0)=0, & (x,t)\in\Omega_T,\\
    u(0,x)=u_0(x), & x\in\Omega,\\
    u(t,x)=0, & (x,t)\in\partial\Omega\times(0,T),
    \end{cases}\tag{1}$$

    with the non-Newtonian polytropic operator,

    $$Lu=\partial_t u-u^m\nabla\cdot\big(|\nabla u^m|^{p-2}\nabla u^m\big)-\gamma|\nabla u^m|^{p}.\tag{2}$$

    Here, $\Omega\subset\mathbb{R}^N$ ($N\ge2$) is a bounded domain with an appropriately smooth boundary $\partial\Omega$, $p\ge2$, $m>0$, and $u_0$ satisfies

    $$u_0>0,\qquad u_0^m\in W_0^{1,p}(\Omega)\cap L^\infty(\Omega).$$
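    Written pointwise, system (1) is a complementarity problem: at each point of $\Omega_T$ either the obstacle is inactive and the equation holds, or the solution sits on the obstacle, that is,

    $$\text{either}\quad u>u_0\ \text{and}\ Lu=0,\qquad\text{or}\quad u=u_0\ \text{and}\ Lu\ge0.$$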

    The theory of variation-inequality problems has gained significant attention due to its applications in option pricing. These applications are discussed in references [1,2,3], where more details on the financial background can be found. In recent years, there has been a growing interest in the study of variation-inequality problems, with a particular emphasis on investigating the existence and uniqueness of solutions.

    In 2022, Li and Bi considered a two-dimensional variation-inequality system [4],

    $$\begin{cases}
    \min\{L_iu_i-f_i(x,t,u_1,u_2),\,u_i-u_{i,0}\}=0, & (x,t)\in\Omega_T,\\
    u(0,x)=u_0(x), & x\in\Omega,\\
    u(t,x)=0, & (x,t)\in\partial\Omega\times(0,T),
    \end{cases}$$

    involving a degenerate parabolic operator

    $$L_iu_i=\partial_tu_i-\operatorname{div}\big(|\nabla u_i|^{p_i-2}\nabla u_i\big),\qquad i=1,2.$$

    Using the comparison principle for $L_iu_i$ and norm estimation techniques, they obtained sequences of upper and lower solutions for an auxiliary problem, and then analyzed the existence and uniqueness of weak solutions. Reference [5] considers the initial-boundary value problem for a single variational inequality, where the author explores the more complex non-divergence parabolic operator

    $$Lu=\partial_tu-u\,\operatorname{div}\big(a(u)|\nabla u|^{p(x)-2}\nabla u\big)-\gamma|\nabla u|^{p(x)}-f(x,t).$$

    Reference [5] constructs a more intricate auxiliary problem and proves that weak solutions exist and are unique by using successive integration and various inequality estimates. Readers can refer to [6,7,8] for further results in this direction.

    In the field of differential equations, there is a sizable literature on initial-boundary value problems involving the non-Newtonian polytropic operator. In [9,10], the authors focused on a specific class of such problems, namely

    $$\begin{cases}
    \partial_tu-\nabla\cdot\big(|\nabla u^m|^{p-2}\nabla u^m\big)+h(x,t)u^{\alpha}=0, & (x,t)\in\Omega_T,\\
    u(0,x)=u_0(x), & x\in\Omega,\\
    u(t,x)=0, & (x,t)\in\partial\Omega\times(0,T).
    \end{cases}$$

    To investigate the existence of a weak solution, they made use of topological degree theory.

    Currently, there is no literature on variational inequalities driven by the non-Newtonian polytropic operator (2). Therefore, we use the partial differential equation results from [5,9,10] to investigate the existence and blow-up properties of weak solutions of variation-inequality (1). Additionally, because the operator $Lu$ degenerates at $u=0$ and $\nabla u^m=0$, some traditional methods for existence proofs are no longer applicable. Here, we use a fixed point argument (the Leray-Schauder theorem) to overcome this difficulty, and obtain the existence and blow-up of generalized solutions.

    We first consider how variation-inequality problems arise in corn options. During the harvest season, farmers face the problem of corn storage, while flour manufacturers are concerned about downtime caused by a lack of raw materials.

    In exchange for the farmer storing the raw materials in the warehouse, the flour manufacturer promises the farmer the following contract:

    At any time within one year, the farmer has the right to sell the corn at the agreed price $K$.

    Assuming that the current time is 0, the corn price $S_t$ on the time interval $[0,T]$ is given by

    $$dS_t=\mu S_t\,dt+\sigma S_t\,dW_t,$$

    where $S_0$ is known, $\mu$ represents the annual growth rate of the corn price, and $\sigma$ represents the volatility. $\{W_t,\,t\ge0\}$ is a Wiener process, representing market noise.
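    As a purely illustrative sketch (the parameter values, strike and barrier below are placeholders, not taken from this paper), such price dynamics can be simulated with the exact log-normal update of geometric Brownian motion:

```python
import numpy as np

# Minimal sketch: simulate corn-price paths S_t under dS_t = mu*S_t*dt + sigma*S_t*dW_t
# using the exact log-normal step. All parameter values are illustrative placeholders.
rng = np.random.default_rng(0)

S0, mu, sigma = 100.0, 0.03, 0.25      # initial price, drift, volatility (assumed)
T, n_steps, n_paths = 1.0, 252, 10_000
dt = T / n_steps

S = np.full(n_paths, S0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)        # Wiener increments
    S *= np.exp((mu - 0.5 * sigma**2) * dt + sigma * dW)    # exact GBM update

K, B = 105.0, 150.0                     # agreed price and barrier (assumed)
print("mean terminal price:", S.mean())
print("fraction of paths ending above the barrier B:", (S > B).mean())
```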

    In addition, to avoid significant economic losses for flour manufacturers due to rapid increases in raw material prices, obstacle clauses are often included in the following form: if the price of corn rises above $B$, the option contract becomes null and void. According to [1,2,3], the value $V$ of the option contract at any time $t\in[0,T]$ satisfies

    $$\begin{cases}
    \min\{L_0V,\,V-\max\{S-K,0\}\}=0, & (S,t)\in(0,B)\times(0,T),\\
    V(0,S)=\max\{S-K,0\}, & S\in(0,B),\\
    V(t,B)=0, & t\in(0,T),\\
    V(t,0)=0, & t\in(0,T),
    \end{cases}\tag{3}$$

    where $L_0V=\partial_tV+\frac{1}{2}\sigma^2S^2\partial_{SS}V+rS\,\partial_SV-rV$, $r$ is the risk-free interest rate of the agricultural product market, and $B$ is the upper bound of corn prices, which prevents flour manufacturers from incurring significant losses due to rising corn prices. On the one hand, if $x=\ln S$, then (3) can be written as

    $$\begin{cases}
    \min\{L_1V,\,V-\max\{e^{x}-K,0\}\}=0, & (x,t)\in(-\infty,\ln B)\times(0,T),\\
    V(0,x)=\max\{e^{x}-K,0\}, & x\in(-\infty,\ln B),\\
    V(t,\ln B)=0, & t\in(0,T),\\
    V(t,x)\to0\ \text{as}\ x\to-\infty, & t\in(0,T),
    \end{cases}$$

    where $L_1V=\partial_tV+\frac{1}{2}\sigma^2\partial_{xx}V+r\partial_xV-rV$. It can be seen that the transformed problem is a constant-coefficient parabolic variational inequality, which has long been studied (see [1,2,3] for details). On the other hand, when there are transaction costs in agricultural product trading, the constant $\sigma$ in the operator $L_0V$ is no longer constant and often depends on $\partial_SV$ as well as on $V$ itself. For instance, the well-known Leland model [5] adjusts the volatility $\sigma$ into a non-divergence structure represented by

    $$\sigma^{2}=\sigma_0^{2}\Big(1+\mathrm{Le}\,\operatorname{sign}\!\big(V\,\partial_x\big(|\partial_xV|^{p-2}\partial_xV\big)\big)\Big),\qquad p\ge2,\tag{4}$$

    where $\sigma_0$ denotes the original volatility and $\mathrm{Le}$ is the Leland number.
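    The adjustment (4), read as written above, can be evaluated numerically once a discretized value profile is available. The following sketch uses finite differences; the profile `V`, the grid, and the parameters `sigma0`, `Le`, `p` are hypothetical placeholders:

```python
import numpy as np

# Sketch of (4): sigma^2 = sigma0^2 * (1 + Le * sign(V * d/dx(|V_x|^{p-2} V_x))),
# evaluated by finite differences on a hypothetical grid. Not the paper's code.
def leland_adjusted_vol_sq(V, dx, sigma0=0.2, Le=0.3, p=2.0):
    Vx = np.gradient(V, dx)                # first derivative of the option value
    flux = np.abs(Vx) ** (p - 2) * Vx      # nonlinear flux |V_x|^{p-2} V_x
    flux_x = np.gradient(flux, dx)         # derivative of the flux
    return sigma0**2 * (1.0 + Le * np.sign(V * flux_x))

x = np.linspace(0.0, 1.0, 201)
V = np.maximum(np.exp(x) - 1.2, 0.0)       # placeholder payoff-like profile
print(leland_adjusted_vol_sq(V, x[1] - x[0])[:5])
```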

    Inspired by these findings, we aim to explore the more intricate variation-inequality model (1). When $m=1$, the non-divergence polytropic structure $u^m\nabla\cdot(|\nabla u^m|^{p-2}\nabla u^m)$ in model (1) reduces to an $N$-dimensional analogue of the structure in model (4). It is worth noting that while model (4) considers only one type of risky asset and is posed in a one-dimensional space, model (1) studies the problem in an $N$-dimensional space.

    Variation-inequality (1) degenerates when either $u=0$ or $\nabla u^m=0$, so classical solutions cannot be expected in general. Following a similar approach to [1,3], we consider generalized solutions and first introduce a class of maximal monotone maps $G:[0,+\infty)\to[0,+\infty)$ satisfying

    $$G(x)=0\ \ \text{if}\ x>0,\qquad G(x)>0\ \ \text{if}\ x=0.\tag{5}$$

    Definition 2.1. A pair $(u,\xi)$ is called a generalized solution of variation-inequality (1) if, for any fixed $T>0$,

    (a) $u^m\in L^\infty(0,T;W_0^{1,p}(\Omega))$, $\partial_tu\in L^2(\Omega_T)$;

    (b) $\xi\in G(u-u_0)$ for any $(x,t)\in\Omega_T$;

    (c) $u(x,t)\ge u_0(x)$ and $u(x,0)=u_0(x)$ for any $(x,t)\in\Omega_T$;

    (d) for every test function $\varphi\in C^1(\bar\Omega_T)$, there holds

    $$\int_{\Omega_T}\partial_tu\,\varphi+u^m|\nabla u^m|^{p-2}\nabla u^m\cdot\nabla\varphi\,dx\,dt+(1-\gamma)\int_{\Omega_T}|\nabla u^m|^{p}\varphi\,dx\,dt=\int_{\Omega_T}\xi\varphi\,dx\,dt.$$
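    The $(1-\gamma)$ coefficient in (d) comes from a formal integration by parts of the non-divergence term in (2): for smooth functions with $u^m=0$ on $\partial\Omega$,

    $$-\int_{\Omega_T}u^m\nabla\cdot\big(|\nabla u^m|^{p-2}\nabla u^m\big)\varphi\,dx\,dt=\int_{\Omega_T}u^m|\nabla u^m|^{p-2}\nabla u^m\cdot\nabla\varphi\,dx\,dt+\int_{\Omega_T}|\nabla u^m|^{p}\varphi\,dx\,dt,$$

    so that, together with $-\gamma\int_{\Omega_T}|\nabla u^m|^{p}\varphi\,dx\,dt$, the gradient terms appear with the factor $(1-\gamma)$; see also the remark in the concluding section.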

    As mentioned above, $u^m\nabla\cdot(|\nabla u^m|^{p-2}\nabla u^m)$ degenerates when $u^m=0$ or $\nabla u^m=0$. We therefore use a parameter $\varepsilon\in(0,1)$ to regularize this term in the operator $Lu$ and in the initial-boundary conditions. Meanwhile, we use $\varepsilon$ to construct a penalty function $\beta_\varepsilon(\cdot)$ that controls the inequality constraints in (1); the penalty map $\beta_\varepsilon:\mathbb{R}^{+}\to\mathbb{R}$ satisfies

    $$\beta_\varepsilon(x)=0\ \ \text{if}\ x>\varepsilon,\qquad \beta_\varepsilon(x)\in[-M_0,0)\ \ \text{if}\ x\in[0,\varepsilon].\tag{6}$$

    In other words, we consider the following regularized problem

    $$\begin{cases}
    L_\varepsilon u_\varepsilon=\beta_\varepsilon(u_\varepsilon-u_0), & (x,t)\in\Omega_T,\\
    u_\varepsilon(x,0)=u_{0\varepsilon}(x), & x\in\Omega,\\
    u_\varepsilon(x,t)=\varepsilon, & (x,t)\in\partial\Omega\times(0,T),
    \end{cases}\tag{7}$$

    where

    $$L_\varepsilon u_\varepsilon=\partial_tu_\varepsilon-u_\varepsilon^m\nabla\cdot\Big(\big(|\nabla u_\varepsilon^m|^2+\varepsilon\big)^{\frac{p-2}{2}}\nabla u_\varepsilon^m\Big)-\gamma\big(|\nabla u_\varepsilon^m|^2+\varepsilon\big)^{\frac{p-2}{2}}|\nabla u_\varepsilon^m|^{2}.$$

    Similar to [4,5], problem (7) admits a solution $u_\varepsilon$ satisfying $u_\varepsilon^m\in L^\infty(0,T;W^{1,p}(\Omega))$, $\partial_tu_\varepsilon^m\in L^2(0,T;L^2(\Omega))$, and the identity

    $$\int_\Omega\Big(\partial_tu_\varepsilon\,\varphi+u_\varepsilon^m\big(|\nabla u_\varepsilon^m|^2+\varepsilon\big)^{\frac{p-2}{2}}\nabla u_\varepsilon^m\cdot\nabla\varphi+(1-\gamma)\big(|\nabla u_\varepsilon^m|^2+\varepsilon\big)^{\frac{p-2}{2}}|\nabla u_\varepsilon^m|^{2}\varphi\Big)dx=\int_\Omega\beta_\varepsilon(u_\varepsilon-u_0)\varphi\,dx,\tag{8}$$

    for every $\varphi\in C^1(\bar\Omega_T)$. Meanwhile, for any $\varepsilon\in(0,1)$,

    $$u_{0\varepsilon}\le u_\varepsilon\le\|u_0\|_{L^\infty(\Omega)}+\varepsilon,\qquad u_{\varepsilon_1}\le u_{\varepsilon_2}\ \ \text{for}\ \varepsilon_1\le\varepsilon_2.\tag{9}$$

    Indeed, define $A_\theta(u_\varepsilon)=\theta u_\varepsilon^m+(1-\theta)u_\varepsilon$ and

    $$L_\varepsilon^{\theta,\omega}u_\varepsilon=\partial_tu_\varepsilon-A_\theta\,\operatorname{div}\Big(\big(|\nabla A_\theta(u_\varepsilon)|^2+\varepsilon\big)^{\frac{p-2}{2}}\nabla A_\theta(u_\varepsilon)\Big)-\gamma\big(|\nabla A_\theta(u_\varepsilon)|^2+\varepsilon\big)^{\frac{p-2}{2}}|\nabla A_\theta(u_\varepsilon)|^{2}.$$

    One can use a map based on Leray-Schauder fixed point theory

    $$M:L^\infty\big(0,T;W_0^{1,p}(\Omega)\big)\times[0,1]\to L^\infty\big(0,T;W_0^{1,p}(\Omega)\big),\tag{10}$$

    that is,

    $$\begin{cases}
    L_\varepsilon^{\theta,\omega}u_\varepsilon=\theta\beta_\varepsilon(u_\varepsilon-u_0), & (x,t)\in\Omega_T,\\
    u_\varepsilon(x,0)=u_{0\varepsilon}(x)=u_0+\varepsilon, & x\in\Omega,\\
    u_\varepsilon(x,t)=\varepsilon, & (x,t)\in\partial\Omega\times(0,T),
    \end{cases}\tag{11}$$

    so that, by proving the boundedness, continuity and compactness of the operator $M$, the existence result for the regularized problem (7) can be established. For details, see [11]; they are omitted here.

    In this section, we consider the existence of a generalized solution to variation-inequality (1). Since $u_\varepsilon^m\in L^\infty(0,T;W^{1,p}(\Omega))$ and $\partial_tu_\varepsilon^m\in L^2(0,T;L^2(\Omega))$, combining with (9) we may infer that the family $\{u_\varepsilon,\,\varepsilon>0\}$ contains a subsequence $\{u_{\varepsilon_k},\,k=1,2,\dots\}$, with $\varepsilon_k\to0$ as $k\to\infty$, and a function $u$ such that

    $$u_{\varepsilon_k}\to u\quad\text{a.e. in}\ \Omega_T\ \text{as}\ k\to\infty,\tag{12}$$
    $$u_{\varepsilon_k}^m\rightharpoonup u^m\quad\text{weakly-}*\ \text{in}\ L^\infty\big(0,T;W_0^{1,p}(\Omega)\big)\ \text{as}\ k\to\infty,\tag{13}$$
    $$\partial_tu_{\varepsilon_k}^m\rightharpoonup\partial_tu^m\quad\text{weakly in}\ L^2(\Omega_T)\ \text{as}\ k\to\infty.\tag{14}$$

    From (9), one can easily show that $u_{\varepsilon_k}\ge u$ for $(x,t)\in\Omega_T$, $k=1,2,3,\dots$. So one can infer that, for all $(x,t)\in\Omega_T$,

    $$\beta_{\varepsilon_k}(u_{\varepsilon_k}-u_0)\to\xi\quad\text{as}\ k\to\infty.\tag{15}$$

    Next, we pass to the limit $k\to\infty$. It follows from (13) that, for any $(x,t)\in\Omega_T$ and $k=1,2,3,\dots$,

    $$u_{\varepsilon_k}^m\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}\nabla u_{\varepsilon_k}^m\rightharpoonup\chi_1\quad\text{weakly in}\ L^1(\Omega)\ \text{as}\ k\to\infty,\tag{16}$$
    $$\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}|\nabla u_{\varepsilon_k}^m|^{2}\rightharpoonup\chi_2\quad\text{weakly in}\ L^1(\Omega)\ \text{as}\ k\to\infty,\tag{17}$$

    so that, passing to the limit $k\to\infty$,

    $$\int_\Omega\partial_tu\,\varphi+\chi_1\cdot\nabla\varphi\,dx+(1-\gamma)\int_\Omega\chi_2\varphi\,dx=\int_\Omega\xi\varphi\,dx.\tag{18}$$

    Choosing $\varphi=u_{\varepsilon_k}-u$ in (8) and replacing $\varepsilon$ by $\varepsilon_k$, one can infer that

    $$\int_\Omega\partial_tu_{\varepsilon_k}\,\varphi+u_{\varepsilon_k}^m\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}\nabla u_{\varepsilon_k}^m\cdot\nabla\varphi+(1-\gamma)\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}|\nabla u_{\varepsilon_k}^m|^{2}\varphi\,dx=\int_\Omega\beta_{\varepsilon_k}(u_{\varepsilon_k}-u_0)\varphi\,dx.\tag{19}$$

    Subtracting (18) from (19) and integrating it from 0 to T,

    $$\int_{\Omega_T}(\partial_tu_{\varepsilon_k}-\partial_tu)\,\varphi+\Big[u_{\varepsilon_k}^m\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}\nabla u_{\varepsilon_k}^m-\chi_1\Big]\cdot\nabla\varphi\,dx\,dt+(1-\gamma)\int_{\Omega_T}\Big[\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}|\nabla u_{\varepsilon_k}^m|^{2}-\chi_2\Big]\varphi\,dx\,dt=\int_{\Omega_T}\Big[\beta_{\varepsilon_k}(u_{\varepsilon_k}-u_0)-\xi\Big]\varphi\,dx\,dt.\tag{20}$$

    From (15) and (17), we infer that

    $$\lim_{k\to\infty}\int_{\Omega_T}\Big[\beta_{\varepsilon_k}(u_{\varepsilon_k}-u_0)-\xi\Big]\varphi\,dx\,dt=0,\tag{21}$$
    $$\lim_{k\to\infty}\int_{\Omega_T}\Big[\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}|\nabla u_{\varepsilon_k}^m|^{2}-\chi_2\Big]\varphi\,dx\,dt=0.\tag{22}$$

    Recall that $u_{\varepsilon_k}(x,0)=u_0(x)+\varepsilon_k$ for any $x\in\Omega$. Then

    $$\int_{\Omega_T}(\partial_tu_{\varepsilon_k}-\partial_tu)\,\varphi\,dx\,dt=\frac{1}{2}\int_\Omega(u_{\varepsilon_k}-u)^2(x,T)\,dx-\frac{1}{2}\varepsilon_k^{2}|\Omega|\ \ge\ -\frac{1}{2}\varepsilon_k^{2}|\Omega|.$$
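    Indeed, since $\varphi=u_{\varepsilon_k}-u$, the time term is a perfect derivative,

    $$\int_{\Omega_T}\partial_t(u_{\varepsilon_k}-u)\,(u_{\varepsilon_k}-u)\,dx\,dt=\frac{1}{2}\int_\Omega(u_{\varepsilon_k}-u)^2(x,T)\,dx-\frac{1}{2}\int_\Omega(u_{\varepsilon_k}-u)^2(x,0)\,dx,$$

    and the last integral equals $\frac{1}{2}\varepsilon_k^{2}|\Omega|$ because $u_{\varepsilon_k}(x,0)-u(x,0)=\varepsilon_k$.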

    Note that $\varepsilon_k\to0$ as $k\to\infty$. So we may infer that $\int_{\Omega_T}(\partial_tu_{\varepsilon_k}-\partial_tu)\varphi\,dx\,dt\ge0$ when $k$ is large enough. Then, dropping this non-negative term on the left-hand side of (20) and passing to the limit $k\to\infty$, it is clear that

    $$\lim_{k\to\infty}\int_{\Omega_T}\Big[u_{\varepsilon_k}^m\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}\nabla u_{\varepsilon_k}^m-\chi_1\Big]\cdot\nabla\varphi\,dx\,dt\le0.\tag{23}$$

    Note that $u^m|\nabla u^m|^{p-2}\nabla u^m=|\nabla u^{\mu m}|^{p-2}\nabla u^{\mu m}$ with $\mu=\frac{p}{p-1}$, up to a constant factor depending only on $p$. As mentioned in [12], it follows from (9) that

    $$\Big[u_{\varepsilon_k}^m\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}\nabla u_{\varepsilon_k}^m-u^m|\nabla u^m|^{p-2}\nabla u^m\Big]\cdot\nabla\big(u^{\mu m}_{\varepsilon_k}-u^{\mu m}\big)\ \ge\ \Big[|\nabla u^{\mu m}_{\varepsilon_k}|^{p-2}\nabla u^{\mu m}_{\varepsilon_k}-|\nabla u^{\mu m}|^{p-2}\nabla u^{\mu m}\Big]\cdot\nabla\big(u^{m}_{\varepsilon_k}-u^{m}\big)\ \ge\ C\big|\nabla u^{\mu m}_{\varepsilon_k}-\nabla u^{\mu m}\big|^{p}\ \ge\ 0.\tag{24}$$
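    The last inequality in (24) relies on the standard monotonicity of the $p$-Laplacian flux: for $p\ge2$ there exists a constant $C>0$ depending only on $p$ such that, for all vectors $a,b\in\mathbb{R}^N$,

    $$\big(|a|^{p-2}a-|b|^{p-2}b\big)\cdot(a-b)\ \ge\ C\,|a-b|^{p},$$

    applied here with $a=\nabla u^{\mu m}_{\varepsilon_k}$ and $b=\nabla u^{\mu m}$.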

    Thus, using $\operatorname{sgn}\big(u^{\mu m}_{\varepsilon_k}-u^{\mu m}\big)=\operatorname{sgn}(\varphi)$, one arrives at

    $$\Big[\big(|\nabla u_{\varepsilon_k}^m|^2+\varepsilon_k\big)^{\frac{p-2}{2}}\nabla u_{\varepsilon_k}^m-|\nabla u^m|^{p-2}\nabla u^m\Big]\cdot\nabla\varphi\ge0.\tag{25}$$

    Combining (23) and (25), one can see that

    $$\int_{\Omega_T}\Big[u^m|\nabla u^m|^{p-2}\nabla u^m-\chi_1\Big]\cdot\nabla\varphi\,dx\,dt\le0.\tag{26}$$

    Obviously, if we swap $u_{\varepsilon_k}$ and $u$, one can get another inequality

    $$\int_{\Omega_T}\Big[\chi_1-u^m|\nabla u^m|^{p-2}\nabla u^m\Big]\cdot\nabla\varphi\,dx\,dt\le0.\tag{27}$$

    Combining (26) and (27), we obtain (28) below and give the following Lemma.

    Lemma 3.1. For any $t\in(0,T]$ and $x\in\Omega$,

    $$\chi_1=u^m|\nabla u^m|^{p-2}\nabla u^m\quad\text{a.e. in}\ \Omega_T,\tag{28}$$
    $$\big\|\nabla u^{\mu m}_{\varepsilon_k}-\nabla u^{\mu m}\big\|_{L^p(\Omega)}\to0\quad\text{as}\ k\to\infty.\tag{29}$$

    Proof. Equation (28) follows by combining (26) and (27), and (29) is an immediate consequence of (23), (24) and (28).

    Following a proof similar to that of (16)–(28), one can infer that

    $$\chi_2=|\nabla u^m|^{p}\quad\text{a.e. in}\ \Omega_T.\tag{30}$$

    Further, we prove $\xi\in G(u-u_0)$. When $u_{\varepsilon_k}>u_0+\varepsilon_k$, we have $\beta_{\varepsilon_k}(u_{\varepsilon_k}-u_0)=0$, so

    $$\xi(x,t)=0\quad\text{if}\ u>u_0.\tag{31}$$

    If $u_0\le u_{\varepsilon_k}<u_0+\varepsilon_k$, the bounds on $\beta_{\varepsilon_k}(u_{\varepsilon_k}-u_0)$ in (6) and $\beta_\varepsilon(\cdot)\in C^2(\mathbb{R})$ imply that

    $$\xi(x,t)\ge0\quad\text{if}\ u=u_0.\tag{32}$$

    Combining (31) and (32), it can be easily verified that $\xi\in G(u-u_0)$.

    Further, passing to the limit $k\to\infty$ in the initial condition in (7) and in the first inequality of (9), we obtain

    $$u(x,0)=u_0(x)\ \text{in}\ \Omega,\qquad u(x,t)\ge u_0(x)\ \text{in}\ \Omega_T.$$

    Combining the results above, we infer that $(u,\xi)$ satisfies the conditions of Definition 2.1, so that $(u,\xi)$ is a generalized solution of (1).

    Theorem 3.1. Assume that $u_0^m\in W_0^{1,p}(\Omega)\cap L^\infty(\Omega)$ and $\gamma\le1$. Then variation-inequality (1) admits at least one generalized solution $(u,\xi)$ in the sense of Definition 2.1.

    In this section, we consider the blowup of the generalized solution when $\gamma>2$. Assume that $(u,\xi)$ is a generalized solution of (1). Taking $\varphi=u^m$ in Definition 2.1(d), it is easy to see that

    $$\frac{1}{m+1}\frac{d}{dt}\int_\Omega u(\cdot,t)^{m+1}\,dx+(2-\gamma)\int_\Omega|\nabla u^m|^{p}u^m\,dx=\int_\Omega\xi u^m\,dx.\tag{33}$$
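    Identity (33) is obtained from Definition 2.1(d) with $\varphi=u^m$, using

    $$\partial_tu\;u^m=\frac{1}{m+1}\,\partial_tu^{m+1},\qquad u^m|\nabla u^m|^{p-2}\nabla u^m\cdot\nabla u^m=u^m|\nabla u^m|^{p},$$

    so that the two gradient contributions combine into $\big(1+(1-\gamma)\big)\int_\Omega|\nabla u^m|^{p}u^m\,dx=(2-\gamma)\int_\Omega|\nabla u^m|^{p}u^m\,dx$.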

    It follows from (5), (9), and $\xi\in G(u-u_0)$ that $\int_\Omega\xi u^m\,dx\ge0$. Let

    $$E(t)=\int_\Omega u(\cdot,t)^{m+1}\,dx.$$

    It follows from (c) in Definition 2.1 that $E(t)\ge0$ for any $t\in(0,T]$, so one can infer that

    $$\frac{d}{dt}E(t)\ge(m+1)(\gamma-2)\int_\Omega|\nabla u^m|^{p}u^m\,dx.\tag{34}$$

    Using the Poincaré inequality gives

    $$\int_\Omega|\nabla u^m|^{p}u^m\,dx=\frac{p}{(p+1)m}\int_\Omega\Big|\nabla u^{\left(\frac{1}{p}+1\right)m}\Big|^{p}\,dx\ \ge\ \frac{p}{(p+1)m}\int_\Omega|u|^{(p+1)m}\,dx.\tag{35}$$

    It remains to estimate $\int_\Omega|u|^{(p+1)m}\,dx$ from below in terms of $E(t)$. By the Hölder inequality,

    $$E(t)\le C(|\Omega|)\Big(\int_\Omega|u|^{(p+1)m}\,dx\Big)^{\frac{m+1}{(p+1)m}},$$

    so that

    $$\int_\Omega|u|^{(p+1)m}\,dx\ge C(|\Omega|)E(t)^{\frac{(p+1)m}{m+1}}.\tag{36}$$
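    The constant in (36) is a Hölder constant: since $mp>1$, the exponent $\frac{(p+1)m}{m+1}$ is larger than $1$, and Hölder's inequality on the bounded domain $\Omega$ gives

    $$\int_\Omega u^{m+1}\,dx\le\Big(\int_\Omega u^{(p+1)m}\,dx\Big)^{\frac{m+1}{(p+1)m}}|\Omega|^{1-\frac{m+1}{(p+1)m}};$$

    raising both sides to the power $\frac{(p+1)m}{m+1}$ and absorbing the power of $|\Omega|$ into $C(|\Omega|)$ yields (36).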

    Combining (34)–(36), one can find that

    $$\frac{d}{dt}E(t)\ge C(|\Omega|)\frac{p(m+1)(\gamma-2)}{(p+1)m}E(t)^{\frac{(p+1)m}{m+1}}.\tag{37}$$

    Note that $mp>1$. Separating variables, we have

    $$\frac{d}{dt}E(t)^{\frac{1-mp}{m+1}}\le C(|\Omega|)\frac{p(\gamma-2)(1-mp)}{(p+1)m},\tag{38}$$

    such that

    $$E(t)\ge\frac{1}{\Big(E(0)^{\frac{1-mp}{m+1}}-C(|\Omega|)\frac{p(\gamma-2)(mp-1)}{(p+1)m}\,t\Big)^{\frac{m+1}{mp-1}}}.\tag{39}$$
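    Inequality (38) is (37) rewritten for $E^{\frac{1-mp}{m+1}}$: since $mp>1$, the exponent $\frac{1-mp}{m+1}$ is negative, and the chain rule gives

    $$\frac{d}{dt}E(t)^{\frac{1-mp}{m+1}}=\frac{1-mp}{m+1}\,E(t)^{-\frac{(p+1)m}{m+1}}\,\frac{d}{dt}E(t)\ \le\ \frac{1-mp}{m+1}\cdot C(|\Omega|)\frac{p(m+1)(\gamma-2)}{(p+1)m}=C(|\Omega|)\frac{p(\gamma-2)(1-mp)}{(p+1)m},$$

    where the inequality reverses because the prefactor is negative; integrating from $0$ to $t$ and inverting the negative power gives (39).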

    This means that the generalized solution blows up in finite time, no later than $T^{*}=\dfrac{(p+1)m\,E(0)^{\frac{1-mp}{m+1}}}{p(\gamma-2)(mp-1)\,C(|\Omega|)}$.

    Theorem 4.1. Assume $mp>1$. If $\gamma>2$, then the generalized solution $(u,\xi)$ of variation-inequality (1) blows up in finite time.
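    For intuition about Theorem 4.1, the bound $T^{*}$ above can be evaluated for sample parameters. The values of $m$, $p$, $\gamma$, the constant $C(|\Omega|)$, and the initial energy $E(0)$ below are illustrative placeholders, not data from the paper:

```python
import math

# Sketch: evaluate T* = (p+1)*m*E0^((1-m*p)/(m+1)) / (p*(gamma-2)*(m*p-1)*C)
# from Section 4 for illustrative parameter choices (E0 and C are placeholders).
def blowup_time(E0, m=1.0, p=2.0, gamma=3.0, C=1.0):
    assert m * p > 1 and gamma > 2, "Theorem 4.1 requires mp > 1 and gamma > 2"
    return (p + 1) * m * E0 ** ((1 - m * p) / (m + 1)) / (p * (gamma - 2) * (m * p - 1) * C)

for E0 in (0.5, 1.0, 2.0, 5.0):
    print(f"E(0) = {E0:4.1f}  ->  blow-up no later than T* = {blowup_time(E0):.4f}")
```

    As expected from the formula, a larger initial energy $E(0)$ yields a smaller upper bound on the blow-up time.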

    In this study, we investigated the existence and blowup of generalized solutions to a class of variation-inequality problems with the non-divergence polytropic parabolic operator

    $$Lu=\partial_tu-u^m\nabla\cdot\big(|\nabla u^m|^{p-2}\nabla u^m\big)-\gamma|\nabla u^m|^{p}.$$

    We first considered the existence of generalized solutions. Due to the use of integration by parts, the term $-\gamma|\nabla u^m|^{p}$ becomes $(1-\gamma)|\nabla u^m|^{p}$ in the weak formulation. In the process of proving $u_\varepsilon^m\in L^\infty(0,T;W^{1,p}(\Omega))$ and $\partial_tu_\varepsilon^m\in L^2(0,T;L^2(\Omega))$ as in [4,5], the coefficient $(1-\gamma)$ is required to be nonnegative, which leads to the restriction $\gamma\le1$. Regarding the restriction on $p$: the condition $p>1$ is required in (24) and in the identity preceding it. Moreover, we used the results $u_\varepsilon^m\in L^\infty(0,T;W^{1,p}(\Omega))$ and $\partial_tu_\varepsilon^m\in L^2(0,T;L^2(\Omega))$ from [4,5], where $p\ge2$ is required; therefore we impose the restriction $p\ge2$. The results show that variation-inequality (1) admits at least one solution $(u,\xi)$ when $\gamma\le1$.

    Second, we analyzed the blowup phenomenon of the generalized solution. In (38), $mp$ must be greater than 1; otherwise (39) is invalid. The results show that the generalized solution $(u,\xi)$ of variation-inequality (1) blows up in finite time when $\gamma>2$.

    The author sincerely thanks the editors and anonymous reviewers for their insightful comments and constructive suggestions, which greatly improved the quality of the paper.

    The author declares no conflict of interest.



    [1] J. Barzilai, J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988), 141-148. doi: 10.1093/imanum/8.1.141
    [2] E. Bisong, Batch vs. online learning, Building Machine Learning and Deep Learning Models on Google Cloud Platform, 2019.
    [3] L. Bottou, F. E. Curtis, J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev., 60 (2018), 223-311. doi: 10.1137/16M1080173
    [4] Y. H. Dai, Y. Yuan, Nonlinear conjugate gradient methods, Shanghai: Shanghai Scientific Technical Publishers, 2000.
    [5] D. Davis, B. Grimmer, Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems, SIAM J. Optim., 29 (2019), 1908-1930. doi: 10.1137/17M1151031
    [6] R. Dehghani, N. Bidabadi, H. Fahs, M. M. Hosseini, A conjugate gradient method based on a modified secant relation for unconstrained optimization, Numer. Funct. Anal. Optim., 41 (2020), 621-634. doi: 10.1080/01630563.2019.1669641
    [7] P. Faramarzi, K. Amini, A modified spectral conjugate gradient method with global convergence, J. Optim. Theory Appl., 182 (2019), 667-690. doi: 10.1007/s10957-019-01527-6
    [8] R. Fletcher, C. M. Reeves, Function minimization by conjugate gradients, Comput. J., 7 (1964), 149-154. doi: 10.1093/comjnl/7.2.149
    [9] J. C. Gilbert, J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM J. Optim., 2 (1992), 21-42. doi: 10.1137/0802003
    [10] W. W. Hager, H. Zhang, Algorithm 851: CG DESCENT, a conjugate gradient method with guaranteed descent, ACM Trans. Math. Software, 32 (2006), 113-137. doi: 10.1145/1132973.1132979
    [11] A. S. Halilu, M. Y. Waziri, Y. B. Musa, Inexact double step length method for solving systems of nonlinear equations, Stat. Optim. Inf. Comput., 8 (2020), 165-174. doi: 10.19139/soic-2310-5070-532
    [12] H. Jiang, P. Wilford, A stochastic conjugate gradient method for the approximation of functions, J. Comput. Appl. Math., 236 (2012), 2529-2544. doi: 10.1016/j.cam.2011.12.012
    [13] X. Jiang, J. Jian, Improved Fletcher-Reeves and Dai-Yuan conjugate gradient methods with the strong Wolfe line search, J. Comput. Appl. Math., 348 (2019), 525-534. doi: 10.1016/j.cam.2018.09.012
    [14] X. B. Jin, X. Y. Zhang, K. Huang, G. G. Geng, Stochastic conjugate gradient algorithm with variance reduction, IEEE Trans. Neural Networks Learn. Syst., 30 (2018), 1360-1369.
    [15] R. Johnson, T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, 2013.
    [16] X. L. Li, Preconditioned stochastic gradient descent, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 1454-1466. doi: 10.1109/TNNLS.2017.2672978
    [17] Y. Liu, X. Wang, T. Guo, A linearly convergent stochastic recursive gradient method for convex optimization, Optim. Lett., 2020. doi: 10.1007/s11590-020-01550-x
    [18] M. Lotfi, S. M. Hosseini, An efficient Dai-Liao type conjugate gradient method by reformulating the CG parameter in the search direction equation, J. Comput. Appl. Math., 371 (2020), 112708. doi: 10.1016/j.cam.2019.112708
    [19] S. Mandt, M. D. Hoffman, D. M. Blei, Stochastic gradient descent as approximate Bayesian inference, J. Mach. Learn. Res., 18 (2017), 4873-4907.
    [20] P. Moritz, R. Nishihara, M. I. Jordan, A linearly-convergent stochastic L-BFGS algorithm, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016.
    [21] L. M. Nguyen, J. Liu, K. Scheinberg, M. Takáč, SARAH: A novel method for machine learning problems using stochastic recursive gradient, Proceedings of the 34th International Conference on Machine Learning, 2017.
    [22] A. Nitanda, Accelerated stochastic gradient descent for minimizing finite sums, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016.
    [23] H. Robbins, S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), 400-407. doi: 10.1214/aoms/1177729586
    [24] N. N. Schraudolph, T. Graepel, Combining conjugate direction methods with stochastic approximation of gradients, Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics, 2003.
    [25] G. Shao, W. Xue, G. Yu, X. Zheng, Improved SVRG for finite sum structure optimization with application to binary classification, J. Ind. Manage. Optim., 16 (2020), 2253-2266.
    [26] C. Tan, S. Ma, Y. H. Dai, Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, Advances in Neural Information Processing Systems, 2016.
    [27] P. Toulis, E. Airoldi, J. Rennie, Statistical analysis of stochastic gradient methods for generalized linear models, Proceedings of the 31st International Conference on Machine Learning, 2014.
    [28] V. Vapnik, The nature of statistical learning theory, New York: Springer, 1995.
    [29] L. Xiao, T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), 2057-2075. doi: 10.1137/140961791
    [30] Z. Xu, Y. H. Dai, A stochastic approximation frame algorithm with adaptive directions, Numer. Math. Theory Methods Appl., 1 (2008), 460-474.
    [31] W. Xue, J. Ren, X. Zheng, Z. Liu, Y. Ling, A new DY conjugate gradient method and applications to image denoising, IEICE Trans. Inf. Syst., 101 (2018), 2984-2990.
    [32] Q. Zheng, X. Tian, N. Jiang, M. Yang, Layer-wise learning based stochastic gradient descent method for the optimization of deep convolutional neural network, J. Intell. Fuzzy Syst., 37 (2019), 5641-5654. doi: 10.3233/JIFS-190861
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)