Research article

Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

  • The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. GD type optimization schemes can be regarded as temporal discretization methods for the gradient flow (GF) differential equations associated to the considered optimization problem and, in view of this, it seems to be a natural direction of research to first aim to develop a mathematical convergence theory for time-continuous GF differential equations and, thereafter, to aim to extend such a time-continuous convergence theory to implementable time-discrete GD type optimization methods. In this article we establish two basic results for GF differential equations in the training of fully-connected feedforward ANNs with one hidden layer and ReLU activation. In the first main result of this article we establish in the training of such ANNs under the assumption that the probability distribution of the input data of the considered supervised learning problem is absolutely continuous with a bounded density function that every GF differential equation admits for every initial value a solution which is also unique among a suitable class of solutions. In the second main result of this article we prove in the training of such ANNs under the assumption that the target function and the density function of the probability distribution of the input data are piecewise polynomial that every non-divergent GF trajectory converges with an appropriate rate of convergence to a critical point and that the risk of the non-divergent GF trajectory converges with rate 1 to the risk of the critical point. We establish this result by proving that the considered risk function is semialgebraic and, consequently, satisfies the Kurdyka-Łojasiewicz inequality, which allows us to show convergence of every non-divergent GF trajectory.

    Citation: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss. Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation[J]. Electronic Research Archive, 2023, 31(5): 2519-2554. doi: 10.3934/era.2023128
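The relation between GD and GF described in the abstract can be written out as a short worked equation. This is only a sketch in assumed notation: $\mathcal{G}$ stands for a (generalized) gradient of the risk function, $\theta$ for the initial parameter vector, and $\gamma > 0$ for a step size; none of these symbols are taken from the article itself.

```latex
\begin{align*}
\text{GF:}\quad & \Theta_0 = \theta, \qquad \frac{d}{dt}\Theta_t = -\mathcal{G}(\Theta_t), \quad t \ge 0, \\
\text{GD:}\quad & \theta_0 = \theta, \qquad \theta_{n+1} = \theta_n - \gamma\,\mathcal{G}(\theta_n), \quad n = 0, 1, 2, \dots
\end{align*}
```

Under these assumptions the GD recursion is the explicit Euler scheme for the GF equation with time step $\gamma$, so that $\theta_n$ approximates $\Theta_{n\gamma}$.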




    Lung cancer is one of the leading causes of cancer deaths globally, accounting for 24% of all cancer deaths [1,2]. The global mortality-to-incidence ratio of lung cancer is 0.87, reflecting its poor outcomes [3]. Non-small cell lung cancer (NSCLC) accounts for about 85% of lung cancers, and lung adenocarcinoma (LUAD) is its most common subtype [4]. Lung cancer is often diagnosed at a late stage, with a correspondingly poor prognosis [5]. The prognosis of lung cancer is associated with lymph node metastasis (LNM). The 5-year survival of lung adenocarcinoma patients with LNM was reported to be only 26–53%, whereas that of early-stage lung adenocarcinoma patients without LNM was over 95% [6,7]. Therefore, identifying specific markers of lymph node metastasis in early-stage lung adenocarcinoma would aid cancer diagnosis and treatment.

    Various biomolecules, such as proteins, mRNA, miRNA, methylated DNA and lncRNA, have been used as cancer biomarkers. Among these, miRNA is emerging as a useful tool. The miRNAs are a class of non-coding RNAs that post-transcriptionally control gene expression via either translational repression or mRNA degradation. Evidence reveals that miRNAs play significant roles in regulatory mechanisms, including tumorigenesis [8,9]. Multiple studies have shown that tumor-derived miRNAs can persist in human plasma in a very stable form, thus serving as potential biomarkers to facilitate the early detection of lung cancer [10,11]. Compared with a single miRNA biomarker, a miRNA signature comprising multiple miRNAs may improve prediction accuracy and be more powerful in classifying cancer subtypes [12,13]. To our knowledge, no studies have addressed LNM prediction in LUAD using miRNA signatures. Therefore, we constructed a miRNA signature to predict lymph node metastasis in LUAD and hypothesized that the signature would achieve high prediction accuracy.

    Transcriptome data including microRNA expression and mRNA expression for lung adenocarcinoma patients were downloaded from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) on July 3rd 2019. Tumor staging, sex and other corresponding clinical data were obtained from TCGA clinical information. MiRNA expressions were available for 342 patients without distant metastases. Only miRNAs with a trimmed mean of 95% larger than 5 counts were retained in the profile.

    Patients were randomly partitioned into training and validation cohorts; the t-test, Fisher's exact test and chi-square test showed no significant differences in patient characteristics between the two cohorts. Least absolute shrinkage and selection operator (LASSO) regression analysis was used in the training cohort to minimize multicollinearity [14]. Ten-fold cross-validation was applied in LASSO to select the optimal tuning parameter (λ) via the minimum criteria, and the optimal λ value of 0.054, with log(λ) = −1.268, was chosen. A logistic regression model was used to construct the final miRNA disease signature to identify patients with lymph node metastasis. The logistic regression formula fitted in the training cohort was then applied to the validation cohort. A receiver operator characteristic (ROC) curve was constructed, and the area under the ROC curve (AUC) was calculated to validate the performance of LNM prediction. To evaluate the clinical application value of the signature, decision curve analysis (DCA) was conducted in RStudio. A heatmap was plotted using TBtools (https://github.com/CJ-Chen/TBtools) to compare miRNA expression between patients with and without LNM [15]. All statistical analyses were conducted using SPSS Version 23.0 software or the package "rmda" within the R statistical software version 3.6.0. Two-tailed tests were used, and p values < 0.05 were considered significant.
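The AUC used above to judge the signature can be illustrated with a minimal sketch. It relies on the rank interpretation of the AUC: the probability that a randomly chosen patient with LNM receives a higher score than a randomly chosen patient without LNM, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 values (1 = lymph node metastasis present).
    scores: iterable of predicted risk scores, same length as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(round(roc_auc(toy_labels, toy_scores), 3))
```

The pairwise loop is quadratic in the cohort size, which is acceptable for a few hundred patients; a rank-based implementation would be preferable for larger data sets.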

    The miRWalk database (http://www.umm.uni-heidelberg.de/apps/zmf/mirwalk/) was used to predict the target genes of the identified miRNAs [16]. We constructed miRNA-target gene interaction networks from the miRNA-target gene interacting pairs, and the miRNA-mRNA interaction network was visualized with Cytoscape (version 3.7.1) software (http://cytoscape.org/) [17].

    The TCGA-LUAD cohort was randomly divided into a training set (n = 259) and a validation set (n = 83). The mean age of all patients in the study was 65.2 ± 9.9 years; 174 (50.9%) were female, 168 (49.1%) were male, and 123 (36.0%) had lymph node metastasis. The demographics of the training and validation cohorts were well balanced, as shown in Table 1.

    Table 1.  Clinical characteristics of the training and validation cohorts.
    Characteristics                  Training cohort    Validation cohort    p Value
    N                                259                83
    LNM                              89 (34.4%)         34 (41%)             0.295(a)
    Age                              65.29 ± 9.97       65.06 ± 9.83         0.864(b)
    Sex
      female                         134 (51.7%)        40 (48.2%)
      male                           125 (48.3%)        43 (51.8%)           0.615(b)
    T stage
      T1                             81 (31.1%)         21 (25.3%)           0.686(c)
      T2                             147 (56.8%)        49 (59%)
      T3                             21 (8.8%)          9 (10.8%)
      T4                             10 (4.1%)          4 (4.9%)
    Cigarettes smoked per year(d)    39.85 ± 23.12      42.07 ± 27.54        0.609(a)
    Event
      alive                          149 (57.5%)        56 (67.5%)           0.207(c)
      dead                           103 (39.8%)        25 (30.1%)
      unknown                        9 (2.7%)           2 (2.4%)
    (a) The p value was calculated by Fisher's exact test. (b) The p value was calculated by the t test. (c) The p value was calculated by the χ2 test. (d) Number of packets of cigarettes smoked per year by the patients.


    In the training cohort, a total of 387 miRNAs were entered into the LASSO logistic regression program, and 10 of them had non-zero coefficients and were retained as potential predictors (Figure 1A). The 10 selected miRNAs were hsa-miR-30d, hsa-miR-338, hsa-miR-582, hsa-miR-378a, hsa-miR-3065, hsa-miR-664a, hsa-miR-552, hsa-miR-3653, hsa-miR-4728 and hsa-miR-376b. To simplify the signature, each miRNA was removed from the disease signature in turn to assess its effect on the AUC in the testing cohort. Because the removal of hsa-miR-3653 led to a greater AUC, we excluded it from our signature. Given that hsa-miR-598 [18], hsa-miR-891a [19], hsa-miR-509-3 [20] and hsa-miR-133a [21] have been reported to have significant prognostic value in LUAD, we then included them. A 13-miRNA signature was established using logistic regression. The patient risk score was derived by summing the expression level of each miRNA multiplied by its corresponding coefficient. The risk score for lymph node metastasis was calculated as follows: Risk score = (0.00002 × miR-30d) + (-0.000053 × miR-338) + (0.000113 × miR-582) + (0.000412 × miR-378a) + (-0.000767 × miR-3065) + (-0.002072 × miR-664a) + (-0.006408 × miR-598) + (-0.002753 × miR-552) + (0.002608 × miR-891a) + (0.007073 × miR-509-3) + (-0.016114 × miR-133a-1) + (-0.009681 × miR-4728) + (0.042837 × miR-376b) - 0.456838.
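The risk-score formula above is a weighted sum plus an intercept, which the following minimal sketch evaluates. The coefficients and intercept are copied from the formula; the miRNA keys are written as miR-509-3 and miR-133a-1, matching the 13 miRNAs of the signature. The function names, the input format (a dictionary of expression counts) and the conversion of the score to a probability through the logistic function are assumptions made for illustration, on the premise that the score is the linear predictor of the fitted logistic regression.

```python
import math

# Coefficients and intercept copied from the risk-score formula in the text.
COEFFICIENTS = {
    "miR-30d": 0.00002,
    "miR-338": -0.000053,
    "miR-582": 0.000113,
    "miR-378a": 0.000412,
    "miR-3065": -0.000767,
    "miR-664a": -0.002072,
    "miR-598": -0.006408,
    "miR-552": -0.002753,
    "miR-891a": 0.002608,
    "miR-509-3": 0.007073,
    "miR-133a-1": -0.016114,
    "miR-4728": -0.009681,
    "miR-376b": 0.042837,
}
INTERCEPT = -0.456838


def risk_score(expression):
    """Weighted sum of miRNA expression levels plus the intercept."""
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return INTERCEPT + sum(
        coef * expression[name] for name, coef in COEFFICIENTS.items()
    )


def lnm_probability(score):
    """Assumption: the score is a logistic-regression linear predictor."""
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # Hypothetical patient with zero counts everywhere: score equals the intercept.
    zero_patient = {name: 0.0 for name in COEFFICIENTS}
    score = risk_score(zero_patient)
    print(score, lnm_probability(score))
```

Raising an error on missing miRNAs, rather than silently treating them as zero, avoids producing a score from an incomplete expression profile.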

    Figure 1.  Selection of features by the least absolute shrinkage and selection operator (LASSO) binary logistic regression model and miRNA signature for the prediction of LNM. (A) Tuning parameters (λ) selected in the LASSO model applied 10-fold cross-validation via the minimum criteria. The Y-axis indicates the binomial deviances. The lower X-axis indicates the log(λ). Numbers along the upper X-axis represent the average number of predictors. Red dots indicate average deviance values for each model with a given λ. Vertical bars through the red dots show the upper and lower limits of the deviances. Dotted vertical lines were drawn at the optimal values using the minimum criteria and 1 standard error of the minimum criteria (the 1-SE criteria). The optimal λ value of 0.054 with log(λ) = −1.268 was chosen. (B, C) The ROC curves of the miRNA signature. ROC, receiver operator characteristic. (D) DCA for the miRNA disease signature. The Y-axis represents net benefit. The X-axis represents threshold probability. The threshold probability is where the expected benefit of treatment is equal to the expected benefit of avoiding treatment. The red line represents the 13-miRNA signature model. The blue line represents the hypothesis that all patients have lymph node metastases (LNM). The black line represents the hypothesis that no patients have LNM.

    We evaluated the accuracy of the 13-miRNA signature in the training and validation cohorts; the area under the curve (AUC) values were 0.782 (95% CI, 0.725–0.839) and 0.691 (95% CI, 0.575–0.806), respectively (Figure 1B, C). We calculated the AUC values of different sex, age and T-stage subgroups to validate the performance of our miRNA signature in these diverse subsets. As shown in Table 2, the AUC values across subgroups ranged from 0.600 to 0.811. Notably, the AUC value reached 0.811 in patients aged 70 years or older. Hence, the signature showed good performance in both training and validation cohorts. Decision curve analysis (DCA) is shown in Figure 1D. If the threshold probability of a patient was between 0.2 and 0.6, using the miRNA signature to predict LNM would be more beneficial than the "treat all" or "treat none" strategies. For example, at the 0.25 threshold, the net benefit was 14.6% (95% CI, 7.6–22.0%) for the treat-all model and 20.7% (95% CI, 15.7–27.0%) for our disease signature model. The net benefit of the disease signature model was significantly higher than that of the treat-none strategy at thresholds ≤ 0.63. Thus, our signature was considered clinically valuable. The differences in expression of the thirteen miRNAs between patients with and without LNM are shown in Figure 2. The expression levels of miR-30d, miR-376b, miR-378a, miR-509-3, miR-582 and miR-891a were relatively higher in patients with LNM than in patients without LNM, while the expression levels of miR-133a, miR-3065, miR-338, miR-4728, miR-552, miR-598 and miR-664a were lower in patients with LNM. Therefore, the results of DCA and the expression distribution both supported the predictive validity of our miRNA signature.
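The net-benefit figures quoted for the decision curve can be related to the usual decision-curve definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below assumes that definition (the text does not spell out the formula used by the "rmda" package) and uses hypothetical function names. For the treat-all strategy every patient is classified as positive, so the true positives are all patients with LNM and the false positives are all patients without LNM.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit of a decision rule at a given threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)   # weight placed on false positives
    return true_pos / n - (false_pos / n) * odds


def treat_all_net_benefit(n_with_lnm, n, threshold):
    """Treat-all strategy: everyone is classified as having LNM."""
    return net_benefit(n_with_lnm, n - n_with_lnm, n, threshold)


if __name__ == "__main__":
    # 123 of the 342 patients had LNM, as reported in the text.
    print(round(treat_all_net_benefit(123, 342, 0.25), 3))
```

With 123 of 342 patients having LNM and a threshold of 0.25, this evaluates to roughly 0.146, in line with the 14.6% reported for the treat-all model; the treat-none strategy has a net benefit of zero by construction.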

    Table 2.  AUC value and 95% Confidence Intervals (CI) in different sex, age and T-stage subgroups.
    Subgroup      N      AUC      95% CI
    Overall       342    0.757    (0.705, 0.809)
    Age
      ≤ 49        22     0.709    (0.483, 0.935)
      50–59       71     0.694    (0.568, 0.820)
      60–69       108    0.753    (0.657, 0.849)
      ≥ 70        122    0.811    (0.733, 0.889)
    Sex
      M           168    0.774    (0.703, 0.844)
      F           174    0.738    (0.661, 0.816)
    T-stage
      T1          102    0.697    (0.569, 0.825)
      T2          196    0.765    (0.699, 0.831)
      T3          30     0.787    (0.623, 0.952)
      T4          14     0.600    (0.284, 0.916)

    Figure 2.  Expression of miRNAs in signature in the lymph node metastasis group versus non-lymph node metastasis group. Red indicates high relative expression and green indicates low relative expression.

    A total of 8968 mRNAs were identified from the TCGA database, of which 325 mRNAs were differentially expressed between lung adenocarcinoma patients with LNM and those without LNM (fold change > 20, P < 0.05). The next step was to deduce mRNAs targeted by miRNAs. We focused on the 13 miRNAs in our signature and 27 out of 325 differentially expressed mRNAs were identified as potential targets using miRWalk database.

    Based on the data collected, we constructed a miRNA-mRNA regulatory network using Cytoscape 3.7 (Figure 3). A total of 184 interactions were identified in this network. Among the miRNAs, hsa-miR-378a, hsa-miR-4728, hsa-miR-598, hsa-miR-3065, hsa-miR-338, hsa-miR-509-3, hsa-miR-552, hsa-miR-664a and hsa-miR-891a regulated the most target mRNAs. LIPF, NEUROD4, PSG4, PPP1R3A, DAZ2 and CRISP1 were putatively regulated by the largest number of miRNAs and may have a potential role in lymph node metastasis in lung adenocarcinoma patients.

    Figure 3.  The proposed regulatory network between differentially expressed miRNAs (DEMIs) and differentially expressed mRNAs (DEMs) associated with lymph node metastasis in lung adenocarcinoma. Rectangles and ellipses represent the miRNAs and mRNAs, respectively. The red and green rings indicate relatively up-regulated and down-regulated expression, respectively, in lung adenocarcinoma patients with LNM.

    Lung cancer is a leading cause of cancer deaths worldwide, contributing to 1.8 million deaths in 2018 alone [22,23]. Involvement of the regional lymph nodes is a characteristic feature of the early stage of tumor growth in NSCLC, and "N status" is a major factor in the clinical management of this subtype [24]. NSCLC patients with LNM often have a poor prognosis and high mortality. It was reported that the 5-year survival of lung adenocarcinoma patients without LNM was over 95%, while that of mid-stage or late-stage cancer patients with LNM was only 30% [7]. Therefore, lymph node metastasis is closely associated with the prognosis of lung adenocarcinoma, and its accurate prediction helps to initiate early treatment and improve patient outcomes.

    MiRNA signatures of lung cancer are commonly used to predict the survival time of cancer patients [25,26]. However, survival time is easily swayed by healthcare quality, psychological condition, treatment options and other factors. Lymph node metastasis, by contrast, is a disease status that is closely related to gene expression. Thus, predicting lymph node metastasis with miRNAs is more reasonable and objective than predicting survival time. To our knowledge, this is the first miRNA disease signature developed to predict lymph node metastasis in patients with lung adenocarcinoma. Although previous studies have identified miRNA signatures that correlate with overall survival, their clinical use has been limited by the significant heterogeneity of LUAD patients in aspects such as pathology, treatment plans, surgical options, healthcare services and psychological conditions. Each of these aspects may significantly influence patients' overall survival, thus contributing to the unreliability of survival prediction. Using the TCGA lung adenocarcinoma cohort, we constructed a 13-miRNA signature with satisfactory AUC values to predict LNM in LUAD. Our signature focuses on factors associated with lymph node metastasis rather than overall survival; because LNM can be evaluated more objectively, the signature has potential clinical value. More importantly, the prediction of lymph node metastasis offers a distinct approach to evaluating patients' prognosis and could serve as an important indicator for physicians choosing treatment plans. For patients at T1 and T2 stage, the signature had AUC values of 0.697 and 0.765, respectively, indicating good performance in early-stage lung adenocarcinoma. From T1 to T3, prediction accuracy increased with T stage, although the AUC was lower in the small T4 subgroup (Table 2). In addition, our signature worked better in older patients, as patients aged 60 years or older had higher AUC values than younger patients.

    The miRNAs in our signature are known to function in oncogenesis in lung cancer or have been reported to have prognostic value. In our signature, miR-582, miR-378a, miR-891a, miR-509-3 and miR-376b were upregulated in LUAD patients with lymph node metastasis. Fang et al. reported that in NSCLC, miR-582-3p had an activating effect on Wnt/β-catenin signaling, and that its overexpression targets several negative regulators in the pathway, AXIN2, DKK3 and SFRP1, thereby promoting tumorigenesis and tumor recurrence [27]. Conversely, miR-582-5p was reported to suppress the growth and invasion of NSCLC cells by targeting NOTCH1 and MAP3K2 [28,29]. MiR-378 overexpression could decrease the expression of HMOX1 and p53 while enhancing that of MUC5AC, vascular endothelial growth factor, interleukin-8 and Ang-1, thereby promoting proliferation, migration and stimulation of endothelial cells in NSCLC cell lines [30]. Overexpression of miR-891a was observed in NSCLC, where it negatively regulated HOXA5, a tumor suppressor gene whose product could up-regulate the expression of the TP53 tumor suppressor gene [31]. This result was supported by a further study showing that miR-891a-5p was present at a significantly higher level in lung cancer than in control samples [32]. MiR-509-3 could repress PLK1 expression, inhibiting cancer proliferation and sensitizing cells to DNA-damaging agents, which provides insights into the future optimization of chemotherapy [33]. In breast cancer and renal cell carcinoma, miR-509-3 played a similar role in suppressing cell invasion and migration [34,35]. Korkmaz et al. found that overexpression of miR-376b lowered the levels of ATG4C and BECN1 to control autophagy [36].

    On the other hand, miR-30d, miR-338, miR-3065, miR-664a, miR-598, miR-552, miR-133a-1 and miR-4728 were down-regulated in LUAD patients with lymph node metastasis. MiR-30d-5p, which is often down-regulated in NSCLC tissues, could target cyclin E2 (CCNE2) to inhibit the growth, distribution and motility of NSCLC cells [37]. Similarly, miR-338-3p could suppress the migration and invasion of NSCLC cells by targeting integrin β3, a metastasis-related protein, or Sox4, an epithelial-mesenchymal transition (EMT)-related transcription factor [38,39]. A negative correlation was observed between the expression of miR-338-3p and that of insulin receptor substrate 2, an oncogene, suggesting another route by which miR-338-3p suppresses tumors [40]. MiR-3065 was reported to target some lung squamous cell carcinoma lncRNAs, but its role in oncogenesis has not yet been studied in lung adenocarcinoma [41]. The presence of miR-664 may enhance proliferation and migration in lung cancer cell lines while inhibiting apoptosis [42]. Overexpression of miR-598 could inhibit tumor cell metastasis in NSCLC in vivo and negatively regulate Derlin-1 and EMT to suppress cancer cell invasion and metastasis in vitro [43]. An inhibitory effect of miR-598 overexpression on LNM, mediated by targeting zinc finger E-box-binding homeobox 2, was also observed in NSCLC cells [44]. Xu et al. found a positive feedback loop between LINC01296, Twist1 and miR-598, indicating a potential therapeutic target [45]. MiR-552 was mainly reported in colorectal cancer as a negative prognostic indicator and may potentially be used to predict the origin of lung carcinoma, namely to differentiate primary lung adenocarcinoma from colorectal cancer metastases [46–48]. Studies have also concluded that overexpression of miR-133a could suppress cell proliferation, invasion and metastasis in lung cancer cell lines by suppressing the expression of matrix metalloproteinase MMP14 and of oncogenic receptors including insulin-like growth factor 1 receptor (IGF-1R), TGF-beta receptor type-1 (TGFBR1) and epidermal growth factor receptor (EGFR) [49,50]. Clinical studies also confirmed that the expression of miR-133a was negatively associated with N classification status and MMP-14 expression [51]. The role of miR-4728 in lung cancer has not yet been reported, but it served as a negative regulator of MAPK signaling by directly targeting the ERK upstream kinase MST4 to suppress cancer cell proliferation in vitro [52].

    These studies demonstrate the association between the expression of these miRNAs and lymph node metastasis. The miRNAs identified in our signature and the interactions between miRNAs and mRNAs help to increase our understanding of the pathogenesis and lymph node metastasis of LUAD. Moreover, these miRNAs and pathways could serve as potential therapeutic targets in lung cancer and provide insights into future clinical use.

    It should be noted that although the miRNAs in our signature have demonstrated functions in cell migration, invasion and other developmental processes, there was little overlap between our miRNAs and those reported in other signatures that predicted overall survival, possibly owing to different methodology and distinct expected outcomes. Because the mRNAs in our miRNA-mRNA regulatory network have not yet been reported in this context, further investigation of these genes is warranted. Another limitation is that both our training and validation cohorts were obtained from the TCGA database. External validation sets and experimental validation of the biological function of these miRNAs may reveal further implications for lymph node metastasis in lung adenocarcinoma.

    This work was supported by Zhejiang Medical and Health Project (2019334185), Zhejiang Natural Sciences Foundation Grant (LQ17H160011, LY17H160029, LY18H160007), the National Natural Science Foundation of China (81602635, 81703072), Zhejiang Medical Innovative Discipline Construction Project-2016 and the Innovation research grant program for 8-year-system medical students at Zhejiang University (No.119000-5405A1).

    The authors declare no competing interests.



    [1] F. Bach, E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), in Advances in Neural Information Processing Systems, 26 (2013), 773–781. Available from: https://proceedings.neurips.cc/paper/2013/file/7fe1f8abaad094e0b5cb1b01d712f708-Paper.pdf.
    [2] A. Jentzen, B. Kuckuck, A. Neufeld, P. von Wurstemberger, Strong error analysis for stochastic gradient descent optimization algorithms, IMA J. Numer. Anal., 41 (2021), 455–492. https://doi.org/10.1093/imanum/drz055 doi: 10.1093/imanum/drz055
    [3] E. Moulines, F. Bach, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, in Advances in Neural Information Processing Systems, 24 (2011), 451–459. Available from: https://proceedings.neurips.cc/paper/2011/file/40008b9a5380fcacce3976bf7c08af5b-Paper.pdf.
    [4] Y. Nesterov, Introductory Lectures on Convex Optimization, 2004. https://doi.org/10.1007/978-1-4419-8853-9
    [5] A. Rakhlin, O. Shamir, K. Sridharan, Making gradient descent optimal for strongly convex stochastic optimization, in Proceedings of the 29th International Conference on Machine Learning, Madison, WI, USA, (2012), 1571–1578.
    [6] P. A. Absil, R. Mahony, B. Andrews, Convergence of the iterates of descent methods for analytic cost functions, SIAM J. Optim., 16 (2005), 531–547. https://doi.org/10.1137/040605266 doi: 10.1137/040605266
    [7] H. Attouch, J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program., 116 (2009), 5–16. https://doi.org/10.1007/s10107-007-0133-5 doi: 10.1007/s10107-007-0133-5
    [8] H. Attouch, J. Bolte, B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., 137 (2013), 91–129. https://doi.org/10.1007/s10107-011-0484-9 doi: 10.1007/s10107-011-0484-9
    [9] J. Bolte, A. Daniilidis, A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Optim., 17 (2007), 1205–1223. https://doi.org/10.1137/050644641 doi: 10.1137/050644641
    [10] S. Dereich, S. Kassing, Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes, preprint, arXiv: 2102.09385.
    [11] H. Karimi, J. Nutini, M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition, in Machine Learning and Knowledge Discovery in Databases, (2016), 795–811. https://doi.org/10.1007/978-3-319-46128-1_50
    [12] K. Kurdyka, T. Mostowski, A. Parusiński, Proof of the gradient conjecture of R. Thom, Ann. Math., 152 (2000), 763–792. https://doi.org/10.2307/2661354 doi: 10.2307/2661354
    [13] J. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. Jordan, B. Recht, First-order methods almost always avoid strict saddle points, Math. Program., 176 (2019), 311–337. https://doi.org/10.1007/s10107-019-01374-3 doi: 10.1007/s10107-019-01374-3
    [14] J. D. Lee, M. Simchowitz, M. I. Jordan, B. Recht, Gradient descent only converges to minimizers, in 29th Annual Conference on Learning Theory, 49 (2016), 1246–1257. Available from: http://proceedings.mlr.press/v49/lee16.html.
    [15] S. Łojasiewicz, Sur les trajectoires du gradient d'une fonction analytique, in Geometry Seminars, (1983), 115–117.
    [16] P. Ochs, Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano, SIAM J. Optim., 29 (2019), 541–570. https://doi.org/10.1137/17M1124085 doi: 10.1137/17M1124085
    [17] D. P. Bertsekas, J. N. Tsitsiklis, Gradient convergence in gradient methods with errors, SIAM J. Optim., 10 (2000), 627–642. https://doi.org/10.1137/S105262349733106 doi: 10.1137/S105262349733106
    [18] B. Fehrman, B. Gess, A. Jentzen, Convergence rates for the stochastic gradient descent method for non-convex objective functions, J. Mach. Learn. Res., 21 (2022), 5354–5401. Available from: https://dl.acm.org/doi/abs/10.5555/3455716.3455852.
    [19] Y. Lei, T. Hu, G. Li, K. Tang, Stochastic gradient descent for nonconvex learning without bounded gradient assumptions, IEEE Trans. Neural Networks Learn. Syst., 31 (2019), 4394–4400. https://doi.org/10.1109/TNNLS.2019.2952219 doi: 10.1109/TNNLS.2019.2952219
    [20] V. Patel, Stopping criteria for, and strong convergence of, stochastic gradient descent on Bottou-Curtis-Nocedal functions, Math. Program., 195 (2022), 693–734. https://doi.org/10.1007/s10107-021-01710-6 doi: 10.1007/s10107-021-01710-6
    [21] F. Santambrogio, Euclidean, metric, and Wasserstein gradient flows: an overview, Bull. Math. Sci., 7 (2017), 87–154. https://doi.org/10.1007/s13373-017-0101-1 doi: 10.1007/s13373-017-0101-1
    [22] S. Arora, S. Du, W. Hu, Z. Li, R. Wang, Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks, in Proceedings of the 36th International Conference on Machine Learning, 97 (2019), 322–332. Available from: http://proceedings.mlr.press/v97/arora19a.html.
    [23] L. Chizat, E. Oyallon, F. Bach, On lazy training in differentiable programming, in Advances in Neural Information Processing Systems, 32 (2019). Available from: https://proceedings.neurips.cc/paper/2019/file/ae614c557843b1df326cb29c57225459-Paper.pdf.
    [24] S. S. Du, X. Zhai, B. Póczos, A. Singh, Gradient descent provably optimizes over-parameterized neural networks, in International Conference on Learning Representations, 2019. Available from: https://openreview.net/forum?id=S1eK3i09YQ.
    [25] W. E, C. Ma, L. Wu, A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics, Sci. China Math., 63 (2020), 1235–1258. https://doi.org/10.1007/s11425-019-1628-5
    [26] A. Jacot, F. Gabriel, C. Hongler, Neural tangent kernel: convergence and generalization in neural networks, in Advances in Neural Information Processing Systems, 31 (2018). Available from: https://proceedings.neurips.cc/paper/2018/file/5a4be1fa34e62bb8a6ec6b91d2462f5a-Paper.pdf.
    [27] A. Jentzen, T. Kröger, Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases, preprint, arXiv: 2102.11840.
    [28] G. Zhang, J. Martens, R. Grosse, Fast convergence of natural gradient descent for over-parameterized neural networks, in Advances in Neural Information Processing Systems, 32 (2019), 8082–8093. Available from: https://proceedings.neurips.cc/paper/2019/file/1da546f25222c1ee710cf7e2f7a3ff0c-Paper.pdf.
    [29] Z. Chen, G. Rotskoff, J. Bruna, E. Vanden-Eijnden, A dynamical central limit theorem for shallow neural networks, in Advances in Neural Information Processing Systems, 33 (2020), 22217–22230. Available from: https://proceedings.neurips.cc/paper/2020/file/fc5b3186f1cf0daece964f78259b7ba0-Paper.pdf.
    [30] L. Chizat, Sparse optimization on measures with over-parameterized gradient descent, Math. Program., 194 (2022), 487–532. https://doi.org/10.1007/s10107-021-01636-z
    [31] L. Chizat, F. Bach, On the global convergence of gradient descent for over-parameterized models using optimal transport, in Advances in Neural Information Processing Systems, 31 (2018), 3036–3046. Available from: https://proceedings.neurips.cc/paper/2018/file/a1afc58c6ca9540d057299ec3016d726-Paper.pdf.
    [32] W. E, C. Ma, S. Wojtowytsch, L. Wu, Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't, preprint, arXiv: 2009.10713.
    [33] P. Cheridito, A. Jentzen, A. Riekert, F. Rossmannek, A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions, J. Complexity, 72 (2022), 101646. https://doi.org/10.1016/j.jco.2022.101646
    [34] A. Jentzen, A. Riekert, A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions, Z. Angew. Math. Phys., 73 (2022), 188. https://doi.org/10.1007/s00033-022-01716-w
    [35] A. Jentzen, A. Riekert, A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions, J. Mach. Learn. Res., 23 (2022), 1–50. Available from: https://www.jmlr.org/papers/volume23/21-0962/21-0962.pdf.
    [36] P. Cheridito, A. Jentzen, F. Rossmannek, Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions, J. Nonlinear Sci., 32 (2022), 64. https://doi.org/10.1007/s00332-022-09823-8
    [37] A. Jentzen, A. Riekert, Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation, J. Math. Anal. Appl., 517 (2023), 126601. https://doi.org/10.1016/j.jmaa.2022.126601
    [38] D. Gallon, A. Jentzen, F. Lindner, Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks, preprint, arXiv: 2211.15641.
    [39] R. T. Rockafellar, R. Wets, Variational Analysis, Springer-Verlag, Berlin, 1998. https://doi.org/10.1007/978-3-642-02431-3
    [40] E. Bierstone, P. D. Milman, Semianalytic and subanalytic sets, Inst. Hautes Études Sci. Publ. Math., 67 (1988), 5–42. https://doi.org/10.1007/BF02699126
    [41] T. Kaiser, Integration of semialgebraic functions and integrated Nash functions, Math. Z., 275 (2013), 349–366. https://doi.org/10.1007/s00209-012-1138-1
    [42] M. Coste, An introduction to semialgebraic geometry, 2000. Available from: http://blogs.mat.ucm.es/jesusr/wp-content/uploads/sites/52/2020/03/SAG.pdf.
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)