Objective
Single-stranded DNA-binding protein 1 (SSBP1) plays an important role in DNA repair processes and the maintenance of genomic stability. The aim of this study was to evaluate the expression of SSBP1 and its prognostic value in lung adenocarcinoma (LUAD) using bioinformatics approaches.
Methods
We used the UALCAN, Kaplan-Meier plotter, LinkedOmics, WebGestalt, cBioPortal and TIMER2.0 databases in this study.
Results
UALCAN analysis showed that SSBP1 expression was up-regulated in LUAD samples and was correlated with clinicopathological features including age, cancer stage, and nodal metastasis status. Multivariate Cox regression analysis with the Kaplan-Meier plotter showed that high SSBP1 expression was independently associated with poor overall survival (hazard ratio = 1.63, 95% confidence interval: 1.08−2.46, log-rank P = 0.02). LinkedOmics analysis showed that, in LUAD, 5078 genes were positively and 7905 genes were negatively correlated with SSBP1 expression. Functional enrichment analysis of SSBP1 and its positively correlated genes with WebGestalt showed that the significantly enriched biological process was ribosomal large subunit biogenesis and the significantly enriched pathway was the proteasome. According to cBioPortal, SSBP1 was altered in 1.7% of LUAD patients, and patients with SSBP1 alterations had a worse prognosis than those without (log-rank P = 4.26e-05). Finally, SSBP1 expression was negatively correlated with the B cell infiltration level (Rho = −0.193, P = 1.54e-05) and with the expression of B cell markers including CD79A and CD19.
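The reported hazard ratio, confidence interval and P value can be cross-checked with a few lines of arithmetic. The sketch below back-calculates the log hazard ratio, its standard error and a two-sided Wald p-value from the published hazard ratio and 95% interval. It assumes the interval is a Wald interval that is symmetric on the log scale; the abstract reports a log-rank P, which is a different test, so only rough agreement is expected. The function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Recover log(HR), its SE and a two-sided Wald p-value from an HR and its 95% CI.

    Assumes a symmetric Wald interval on the log scale:
    log(upper) - log(lower) = 2 * z_crit * SE.
    """
    log_hr = math.log(hr)
    midpoint = 0.5 * (math.log(lower) + math.log(upper))  # should be close to log_hr
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_hr / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return log_hr, midpoint, se, z, p_two_sided

log_hr, midpoint, se, z, p = wald_from_ci(1.63, 1.08, 2.46)
print(f"log HR = {log_hr:.3f} (CI midpoint {midpoint:.3f}), SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

With the reported values the interval midpoint matches log(1.63), and the implied Wald p-value is about 0.02, so the hazard ratio, interval and P value are mutually consistent.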
Conclusion
Our results suggest that SSBP1 may be a prognostic marker for human LUAD.
Citation: Jian Huang, Zheng-Fu Xie. Identification of SSBP1 as a prognostic marker in human lung adenocarcinoma using bioinformatics approaches[J]. Mathematical Biosciences and Engineering, 2022, 19(3): 3022-3035. doi: 10.3934/mbe.2022139
1. Introduction
Discrete lifetime data, such as the number of failures of a particular brand of appliance within a given period, the total number of operations a machine performs before it fails, the number of rounds a weapon fires before its first malfunction, and human lifespan measured in whole years, arise frequently in reliability studies. For more classic examples, see Szymkowiak and Iwinska [1]. To describe, analyze, and model such data, practitioners typically use discrete models such as the Poisson, negative binomial, and geometric distributions. In many situations, however, these distributions are not the best choice: for instance, the Poisson distribution cannot handle seasonal or periodic data, and the negative binomial distribution cannot describe underdispersed data. More flexible discrete lifetime distributions are therefore needed for many other kinds of complex discrete lifetime data. Discretizing a continuous random variable is a useful strategy, because it yields a discrete lifetime model whose characteristics are comparable to those of the underlying continuous model.
The essential concept of discretizing continuous random variables was first presented by Roy [2]. Specifically, let Y be a continuous random variable with survival function $S_Y(y)$, and define $Z=[Y]$, the largest integer less than or equal to Y. The probability mass function (PMF) of Z is then
$$P(Z=z)=S_Y(z)-S_Y(z+1).$$
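To make Roy's construction concrete, here is a minimal sketch that evaluates P(Z=z) directly from a survival function. The helper names are illustrative, and the example assumes a nonnegative lifetime Y, so z runs over 0, 1, 2, .... Discretizing an exponential variable with rate λ reproduces a geometric law, and the probabilities of a discretized Weibull variable telescope to S_Y(0) − S_Y(N) over z = 0, …, N−1.

```python
import math
from typing import Callable

Survival = Callable[[float], float]

def discretized_pmf(survival: Survival, z: int) -> float:
    """Roy's discretization of Z = [Y]: P(Z = z) = S_Y(z) - S_Y(z + 1)."""
    return survival(z) - survival(z + 1)

def exponential_survival(rate: float) -> Survival:
    # S_Y(y) = exp(-rate * y) for y >= 0
    return lambda y: math.exp(-rate * y)

def weibull_survival(lam: float, beta: float) -> Survival:
    # S_Y(y) = exp(-lam * y**beta) for y >= 0
    return lambda y: math.exp(-lam * y ** beta)

# Discretized exponential: P(Z = z) = q**z * (1 - q) with q = exp(-rate),
# i.e. a geometric law on {0, 1, 2, ...}
rate = 0.7
q = math.exp(-rate)
geo_check = [(discretized_pmf(exponential_survival(rate), z), q ** z * (1 - q)) for z in range(5)]

# Discretized Weibull: the probabilities over z = 0..199 sum to S(0) - S(200), essentially 1
total = sum(discretized_pmf(weibull_survival(0.5, 1.5), z) for z in range(200))
print(total)
```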
Many researchers have used this approach to introduce new discrete lifetime distributions. For instance, the discrete normal distribution was first introduced by Roy [3]. Using the general method of discretizing a continuous distribution, Krishna and Pundir [4] introduced the discrete Burr and Pareto distributions. In addition, Bracquemond and Gaudoin [5] gave an extensive overview of discrete distributions, such as the discrete Weibull distribution, that are used in reliability to describe the discrete lifetimes of nonrepairable systems. The Weibull distribution is one of the most commonly used distributions for analyzing continuous lifetime data because it can fit many types of data and has a relatively simple structure (Johnson et al. [6]). There are at least three discrete versions of the Weibull distribution: (a) the Type Ⅰ discrete Weibull distribution, which preserves the form of the continuous survival function (SF), introduced by Nakagawa and Osaki [7]; (b) the Type Ⅱ discrete Weibull distribution, proposed by Stein and Dattero [8]; and (c) the three-parameter discrete Weibull distribution, introduced by Padgett and Spurrier [9]. The most popular of these is the Type Ⅰ discrete Weibull distribution, whose properties have been studied extensively. Englehardt and Li [10] used it to analyze pathogen counts in treated water over time. Barbiero [11,12] compared several methods for estimating its parameters, including minimum chi-square and least-squares estimation. Vila et al. [13] studied the basic theoretical properties of the Type Ⅰ discrete Weibull distribution in detail and applied it to censored data. Yoo [14] extended the discrete Weibull regression model to accommodate missing data. In addition, El-Morshedy et al. [15] studied a new bivariate exponential discrete Weibull distribution in detail.
The primary goal of this study is to improve current techniques for estimating complex discrete probability distribution models. The score equations of such distributions typically have no explicit analytical solution, so Newton's method is usually used to solve them numerically. However, its poor convergence behavior and strong dependence on the initial value make it difficult to obtain the best estimates. Recently, Liu et al. [16] used the majorize-minimize (MM) algorithm to improve maximum likelihood estimation for the simplex distribution. Li and Tian [17] introduced a new root-finding method called the upper-crossing/solution (US) algorithm. Compared with conventional iterative algorithms such as Newton's algorithm, the US algorithm reduces the influence of the initial value and converges strongly and stably to the true root of the target equation, moving closer to it at every iteration. Its benefits have been illustrated on several classic models, such as the Weibull, gamma, zeta, and generalized Poisson distributions. Cai [18] improved maximum likelihood estimation of the parameters of the generalized gamma distribution by combining the US algorithm with the second-derivative lower-bound function (SeLF) algorithm.
The essence of the US algorithm is to find a U-function U(θ|θ(t)) that reduces solving the complicated nonlinear equation h(θ)=0 to solving the equation U(θ|θ(t))=0, which has an explicit solution. Li and Tian [17] presented several ways to construct the U-function; among them, the first-derivative lower bound (FLB) function method requires only the first derivative of the objective function h(θ), which reduces algorithmic complexity. In previous research, the US algorithm was generally used to find roots of univariate nonlinear equations or to solve maximum likelihood estimation problems for multi-parameter distributions whose score equations can be partly solved explicitly. Specifically, for a distribution with two parameters (α,β), the maximum likelihood estimator of one parameter could be written explicitly as a function of the other. However, Li and Tian [17] do not discuss whether the approach also applies to more complex multi-parameter distributions, in which the estimator of one parameter cannot be written explicitly in terms of the other; this issue deserves further investigation.
The rest of the paper is organized as follows. Section 2 introduces the US algorithm and the FLB function method in detail. Section 3 applies the proposed approach to maximum likelihood estimation of the parameters of the Type Ⅰ discrete Weibull distribution. Section 4 reports numerical simulations that evaluate the performance of the method and compare it with alternative estimation approaches. Section 5 demonstrates the applicability of the US algorithm through the analysis of two real data sets. Section 6 gives conclusions and discussion.
2. The US algorithm
Finding the zero of a function, or equivalently the root of an equation, is one of the most common problems in numerical computation. Computing the maximum likelihood estimate (MLE) of parameters in classical statistics, and the maximum a posteriori (MAP) estimate in Bayesian statistics, can typically be reduced to finding the zero of a nonlinear function h(θ). Suppose h(θ) is a nonlinear function of a single variable θ; we wish to find the unique root θ⋆ such that
$$h(\theta)=0,\qquad\theta\in\Theta\subseteq\mathbb{R}.\tag{2.1}$$
The US algorithm is a recently proposed root-finding method. Its procedure resembles those of the widely used EM (expectation-maximization) and MM (minorization-maximization, or majorization-minimization) algorithms [17]. Each iteration consists of two steps: the upper-crossing step (U-step) and the solution step (S-step). The algorithm has two main advantages:
(a) It converges strongly and stably to the root θ⋆ of Eq (2.1): the sequence of iterates $\{\theta^{(t)}\}_{t=0}^{\infty}$ satisfies
$$\theta^{(0)}<\theta^{(1)}<\cdots<\theta^{(t)}<\cdots\le\theta^{\star}\quad\text{or}\quad\theta^{\star}\le\cdots<\theta^{(t)}<\cdots<\theta^{(1)}<\theta^{(0)}.$$
(b) It is less sensitive to the initial value than Newton's algorithm.
To simplify the presentation of the US algorithm, two symbols for changing-direction (CD) inequalities, $\overset{\operatorname{sgn}(\alpha)}{\le}$ and $\overset{\operatorname{sgn}(\alpha)}{\ge}$, are introduced. Their definition is as follows: for two functions $f_1(x)$ and $f_2(x)$ on the same domain Q,
It is typically difficult to locate the root θ⋆ of the nonlinear equation h(θ)=0 directly. The US algorithm constructs a surrogate function U(θ|θ(t)) to replace h(θ), turning the problem of solving a complex nonlinear equation into that of solving the equation U(θ|θ(t))=0, which has an explicit solution. First, we assume that
$$h(\theta)<0\ \ \forall\,\theta>\theta^{\star}\quad\text{and}\quad h(\theta)>0\ \ \forall\,\theta<\theta^{\star}.\tag{2.2}$$
If instead h(θ)<0 for θ<θ⋆, both sides of the equation h(θ)=0 can be multiplied by −1; the resulting equation has the same root θ⋆ and satisfies assumption (2.2). Let θ(t) denote the current iterate. A function U(θ|θ(t)) satisfying the following criteria is called the U-function of h(θ) at θ=θ(t):
Using the CD inequality notation, this condition can be written as
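$$h(\theta)\overset{\operatorname{sgn}(\theta-\theta^{(t)})}{\ge}U(\theta\mid\theta^{(t)}),\qquad\forall\,\theta,\theta^{(t)}\in\Theta.\tag{2.4}$$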
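To see why (2.2) and (2.4) give the monotone behavior in property (a), it helps to spell out one reading of the CD notation, which we assume here: $h(\theta)\overset{\operatorname{sgn}(\theta-\theta^{(t)})}{\ge}U(\theta\mid\theta^{(t)})$ means $h(\theta)\ge U(\theta\mid\theta^{(t)})$ when $\theta\ge\theta^{(t)}$ and $h(\theta)\le U(\theta\mid\theta^{(t)})$ when $\theta\le\theta^{(t)}$. Writing $\theta^{(t+1)}$ for the root of $U(\theta\mid\theta^{(t)})=0$, the case $\theta^{(t)}<\theta^{\star}$ runs as follows (the case $\theta^{(t)}>\theta^{\star}$ is symmetric):

```latex
\begin{aligned}
&\theta^{(t)}<\theta^{\star}\;\Longrightarrow\;h(\theta^{(t)})>0 &&\text{by (2.2)},\\
&\theta^{(t+1)}\le\theta^{(t)}\;\Longrightarrow\;h(\theta^{(t+1)})\le U(\theta^{(t+1)}\mid\theta^{(t)})=0 &&\text{impossible, since } h>0 \text{ for } \theta<\theta^{\star},\\
&\text{so } \theta^{(t+1)}>\theta^{(t)}\;\Longrightarrow\;h(\theta^{(t+1)})\ge U(\theta^{(t+1)}\mid\theta^{(t)})=0\;\Longrightarrow\;\theta^{(t+1)}\le\theta^{\star} &&\text{by (2.2)}.
\end{aligned}
```

Repeating the argument gives θ(0) < θ(1) < ⋯ ≤ θ⋆, so the iterates form a bounded monotone sequence and therefore converge; identifying the limit as θ⋆ needs an additional continuity argument that this sketch does not cover.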
2.2. The U-equation
As described above, the US algorithm is an iterative method for solving nonlinear equations in which each iteration consists of a U-step and an S-step. The U-step finds a U-function satisfying condition (2.4), and the S-step solves the simplified U-equation U(θ|θ(t))=0 to obtain its root θ(t+1):
$$\theta^{(t+1)}=\operatorname{sol}\{U(\theta\mid\theta^{(t)})=0,\ \theta\in\Theta\}.\tag{2.5}$$
In typical cases θ(t+1) has an explicit expression, and the U-equation may even be linear in θ. By iterating these two steps, the sequence $\{\theta^{(t)}\}_{t=0}^{\infty}$ converges to the root θ⋆ of h(θ)=0.
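A minimal sketch of the U-step/S-step loop, assuming the simplest possible U-function: the straight line through (θ(t), h(θ(t))) with a constant negative slope b. This line satisfies (2.4) whenever h′(θ) ≥ b on Θ, and its root has the closed form θ(t+1) = θ(t) − h(θ(t))/b. The function `us_constant_slope` and the test equation θ + sin θ = 2 (for which h′(θ) = −1 − cos θ ≥ −2) are illustrative choices, not taken from the paper.

```python
import math
from typing import Callable, List, Tuple

def us_constant_slope(h: Callable[[float], float], b: float, theta0: float,
                      tol: float = 1e-12, max_iter: int = 10_000) -> Tuple[float, List[float]]:
    """US iteration with the linear U-function U(theta | theta_t) = h(theta_t) + b * (theta - theta_t).

    Assumes h satisfies (2.2) and that b < 0 is a lower bound for h'(theta) on the whole domain.
    """
    if b >= 0:
        raise ValueError("the slope b must be negative")
    theta, path = theta0, [theta0]
    for _ in range(max_iter):
        # U-step: the line through (theta, h(theta)) with slope b is a U-function at theta.
        # S-step: its root is available in closed form.
        theta_new = theta - h(theta) / b
        path.append(theta_new)
        if abs(theta_new - theta) < tol:
            return theta_new, path
        theta = theta_new
    raise RuntimeError("US iteration did not converge within max_iter steps")

def h(theta: float) -> float:
    """Example target: the root solves theta + sin(theta) = 2; h'(theta) = -1 - cos(theta) >= -2."""
    return 2.0 - theta - math.sin(theta)

root_up, path_up = us_constant_slope(h, b=-2.0, theta0=0.0)      # start below the root
root_down, path_down = us_constant_slope(h, b=-2.0, theta0=3.0)  # start above the root
```

Started below the root the iterates increase, and started above it they decrease, matching the one-sided approach described in property (a).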
2.3. The first-derivative lower bound function method
A given objective function h(θ) generally admits many U-functions satisfying (2.4), and different U-functions lead to different US algorithms. The U-function can be constructed from low-order derivatives of h(θ), for example by the first-derivative lower bound (FLB) function method, the second-derivative lower-upper bound (SLUB) constants method, or the third-derivative lower bound (TLB) constant method [17]. All three methods yield efficient iterations when h(θ) is complex and its root has no closed form, but they differ in convergence speed. For maximizing an objective function, the FLB-based US algorithm behaves like the EM and MM algorithms, which also converge linearly. This article mainly uses the FLB method, which relies only on the first derivative of h(θ), to construct the required U-functions. Suppose that on the parameter space Θ there exists a lower-bound function b(θ) for the first derivative of h(θ), i.e.,
$$h'(\theta)\ge b(\theta),\qquad\forall\,\theta\in\Theta.\tag{2.6}$$
The U-function of h(θ) at θ=θ(t) can be formally defined as follows
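where $g(\theta^{(t)})=g(\theta^{\star})+(\theta^{(t)}-\theta^{\star})h'(\theta^{\star})+\tfrac{1}{2}(\theta^{(t)}-\theta^{\star})^{2}h''(\hat{\theta})$ is the first-order Taylor expansion around θ⋆ with a second-order remainder, and $\hat{\theta}$ is a point between θ(t) and θ⋆.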
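The formal definition referred to above is not reproduced in this text. As a sketch of how a first-derivative lower bound yields a U-function (a standard construction that may differ in form from the authors' definition), let $B$ be any antiderivative of $b$, so that $B'(\theta)=b(\theta)$, and set

```latex
U(\theta\mid\theta^{(t)}) = h(\theta^{(t)}) + B(\theta) - B(\theta^{(t)}),
\qquad
h(\theta) - U(\theta\mid\theta^{(t)}) = \int_{\theta^{(t)}}^{\theta}\bigl[h'(s) - b(s)\bigr]\,ds .
```

By (2.6) the integrand is nonnegative, so $h(\theta)\ge U(\theta\mid\theta^{(t)})$ for $\theta\ge\theta^{(t)}$ and $h(\theta)\le U(\theta\mid\theta^{(t)})$ for $\theta\le\theta^{(t)}$, which is the sign-switching pattern that (2.4) expresses under the reading of the CD notation assumed here. If $b(\theta)\equiv b<0$ is constant, $U$ is linear in $\theta$ and the S-step is explicit: $\theta^{(t+1)}=\theta^{(t)}-h(\theta^{(t)})/b$.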
Although Li and Tian [17] proposed the US algorithm, their applications considered only one- and two-parameter distributions. In the two-parameter case, the score equations could be solved so that one parameter is expressed explicitly in terms of the other. Whether the US algorithm remains effective in the more general and complex situation, where neither parameter can be expressed explicitly in terms of the other, was not discussed; this is the question addressed in this article. We construct new FLB functions for the parameters of interest and then, starting from initial values, update the iterates with the corresponding S-steps until the convergence criteria are met.
3. The US algorithm for the MLE of the Type Ⅰ discrete Weibull distribution
The score functions of the discrete Weibull distribution have a complex nested-exponential form (terms such as $\alpha^{x^{\beta}}$), so the score equations have no closed-form solution and neither parameter can be expressed in terms of the other. To investigate the applicability of the US algorithm to such complicated models, in this section we combine the US algorithm with the FLB method to compute the maximum likelihood estimates.
3.1. The Type Ⅰ discrete Weibull distribution
Suppose a random variable follows the Weibull distribution W(λ,β) with λ>0 and β>0, whose cumulative distribution function (CDF) is $H(t;\lambda,\beta)=1-e^{-\lambda t^{\beta}}$ for t>0. Define $\alpha=e^{-\lambda}$, so that 0<α<1. If the probability mass function (PMF) of a random variable X can be written as
$$P(X=x;\alpha,\beta)=\alpha^{([x]-1)^{\beta}}-\alpha^{[x]^{\beta}},\qquad x\ge 1,$$
then X is said to follow the Type Ⅰ discrete Weibull distribution, denoted X∼DW(θ). Here, [x] denotes the largest integer less than or equal to x. When β = 1, the discrete Weibull distribution reduces to the geometric distribution Geo(q) with q=1−α.
Accordingly, the CDF of X is
$$F(x;\alpha,\beta)=1-\alpha^{[x]^{\beta}}.$$
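The PMF and CDF above translate directly into code. This sketch (function names are illustrative) evaluates both; summing the PMF reproduces the CDF, and the β = 1 case can be checked against the geometric PMF (1−α)α^{x−1} on {1, 2, …}, i.e. Geo(q) with q = 1−α.

```python
import math

def dw_pmf(x: int, alpha: float, beta: float) -> float:
    """Type I discrete Weibull PMF: alpha**((x - 1)**beta) - alpha**(x**beta) for integer x >= 1."""
    if x < 1:
        return 0.0
    return alpha ** ((x - 1) ** beta) - alpha ** (x ** beta)

def dw_cdf(x: float, alpha: float, beta: float) -> float:
    """Type I discrete Weibull CDF: 1 - alpha**([x]**beta), where [x] = floor(x); zero below 1."""
    k = math.floor(x)
    if k < 1:
        # also avoids a negative base with a fractional exponent, which Python turns into a complex number
        return 0.0
    return 1.0 - alpha ** (k ** beta)

alpha, beta = 0.6, 1.5
# Partial sums of the PMF for k = 1..7; each should equal dw_cdf(k, alpha, beta)
cumulative = [sum(dw_pmf(j, alpha, beta) for j in range(1, k + 1)) for k in range(1, 8)]
```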
3.2. The US algorithm for the MLE of α and β
This section describes in detail how the US algorithm is applied to maximum likelihood estimation for the Type Ⅰ discrete Weibull distribution. Assume X follows the Type Ⅰ discrete Weibull distribution DW(θ), where the parameter vector $\theta=(\alpha,\beta)^{\top}$ lies in the parameter space $\Theta\subset\mathbb{R}^{2}$. Let $x=(x_1,\ldots,x_n)$ denote the observed values of the random sample $(X_1,\ldots,X_n)$. Then the log-likelihood function of θ is
$$\ell(\theta\mid x)=\sum_{i=1}^{n}\log\left(\alpha^{(x_i-1)^{\beta}}-\alpha^{x_i^{\beta}}\right).$$
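The partial derivatives of ℓ(θ|x) are not reproduced in the text, so the sketch below derives them directly from the log-likelihood; this is our own derivation, not a transcription of the paper's equations. With $a_i=(x_i-1)^{\beta}$ and $b_i=x_i^{\beta}$,
$\partial\ell/\partial\alpha=\sum_i (a_i\alpha^{a_i}-b_i\alpha^{b_i})/[\alpha(\alpha^{a_i}-\alpha^{b_i})]$ and
$\partial\ell/\partial\beta=\log\alpha\sum_i[\alpha^{a_i}a_i\log(x_i-1)-\alpha^{b_i}b_i\log x_i]/(\alpha^{a_i}-\alpha^{b_i})$,
using the convention $0\cdot\log 0=0$ for observations with $x_i=1$. Function names are illustrative; for very large $x_i^{\beta}$ both powers of α can underflow to zero, which this sketch does not guard against.

```python
import math
from typing import Sequence, Tuple

def _pow_log(y: float, beta: float) -> float:
    """y**beta * log(y), with the convention 0 * log(0) = 0 (needed when x_i = 1)."""
    return 0.0 if y == 0 else y ** beta * math.log(y)

def dw_loglik(alpha: float, beta: float, data: Sequence[int]) -> float:
    """Log-likelihood of the Type I discrete Weibull distribution for data x_i >= 1."""
    return sum(math.log(alpha ** ((x - 1) ** beta) - alpha ** (x ** beta)) for x in data)

def dw_score(alpha: float, beta: float, data: Sequence[int]) -> Tuple[float, float]:
    """Analytic partial derivatives (d ell / d alpha, d ell / d beta)."""
    log_alpha = math.log(alpha)
    s_alpha = s_beta = 0.0
    for x in data:
        a, b = (x - 1) ** beta, x ** beta
        pa, pb = alpha ** a, alpha ** b           # pa > pb because b > a and 0 < alpha < 1
        denom = pa - pb
        # d/d alpha of alpha**a equals a * alpha**(a - 1) = a * pa / alpha
        s_alpha += (a * pa - b * pb) / (alpha * denom)
        # d/d beta of alpha**((x-1)**beta) equals alpha**a * log(alpha) * (x-1)**beta * log(x-1)
        s_beta += log_alpha * (pa * _pow_log(x - 1, beta) - pb * _pow_log(x, beta)) / denom
    return s_alpha, s_beta
```

A central finite difference of `dw_loglik` is a convenient check on the analytic score before it is used inside any iterative scheme.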
First, the first-order partial derivative of ℓ(θ|x) with respect to α can be calculated as
where $C_1=h_1(\alpha^{(t)})$, $C_2=\sum_{i=1}^{n}\left[4x_i^{2\beta^{(t)}}+2x_i^{\beta^{(t)}}\right]$, and $C_3=3C_2\left[\frac{1}{\alpha^{(t)}}-\frac{1}{1-\alpha^{(t)}}\right]$. Similarly, we obtain the partial derivative of ℓ(θ|x) with respect to β,
To construct the FLB functions and derive the US algorithm when neither parameter has an explicit solution, we need the following two lemmas.
The procedure for estimating the two parameters is as follows. First, we determine the FLB functions for α and β from Eqs (3.2) and (3.5), respectively. Next, given initial values α(t) and β(t), we compute α(t+1) from Eq (3.3), and then compute β(t+1) from Eq (3.6) using α(t+1) and β(t). If both estimates satisfy the convergence criterion, they are returned; otherwise, we update the iterates and repeat these steps until both estimates converge.
Algorithm: Calculating the MLEs of α and β via the US algorithm.
Input: initial values α(0) and β(0); observed data $X_{\mathrm{obs}}=\{x_i\}_{i=1}^{n}$.
Output: $\hat{\alpha}$, $\hat{\beta}$.
1 Select the FLB functions for α and β, respectively;
2 Set the initial values α(t), β(t) with t = 0;
3 repeat
4 Using α(t) and β(t), calculate α(t+1) from (3.3);
5 Using α(t+1) and β(t), calculate β(t+1) from (3.6); update t = t + 1;
6 until convergence.
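The control flow of this algorithm can be sketched as follows. Because the update formulas (3.3) and (3.6) are not reproduced here, they enter as placeholder callables (`update_alpha` and `update_beta`, both hypothetical names), and the stopping rule, an absolute change below `tol` in both parameters, is also an assumption. As in the algorithm, the α-update sees the current pair (α(t), β(t)), while the β-update already uses the new α(t+1).

```python
from typing import Callable, Tuple

Update = Callable[[float, float], float]

def cyclic_us_fit(update_alpha: Update, update_beta: Update,
                  alpha0: float, beta0: float,
                  tol: float = 1e-10, max_iter: int = 10_000) -> Tuple[float, float, int]:
    """Alternate the alpha- and beta-updates until both parameters stop changing."""
    alpha, beta = alpha0, beta0
    for t in range(1, max_iter + 1):
        alpha_new = update_alpha(alpha, beta)      # alpha(t+1) from alpha(t), beta(t)
        beta_new = update_beta(alpha_new, beta)    # beta(t+1) from alpha(t+1), beta(t)
        converged = abs(alpha_new - alpha) < tol and abs(beta_new - beta) < tol
        alpha, beta = alpha_new, beta_new
        if converged:
            return alpha, beta, t
    raise RuntimeError("no convergence within max_iter iterations")

# Toy stand-ins for the paper's updates: exact coordinate-wise minimizers of
# f(a, b) = (a - 1)**2 + (b - 2)**2 + a * b / 2, whose joint minimizer is (8/15, 28/15).
a_hat, b_hat, n_iter = cyclic_us_fit(lambda a, b: 1 - b / 4,
                                     lambda a, b: 2 - a / 4,
                                     alpha0=0.0, beta0=0.0)
```

The toy updates only exercise the alternating structure; in the actual method each callable would perform one US step for its parameter with the other held fixed.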
4. Simulation study
In this section, we conduct simulation studies to confirm the applicability of the US algorithm to complex nonlinear equations and to compare its performance with that of the classic Newton algorithm. First, the steps of parameter estimation with the Newton algorithm are as follows:
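The Newton steps themselves are not reproduced here. For reference, the standard Newton-Raphson update for maximizing ℓ(θ|x), which we assume is the form used, is

```latex
\theta^{(t+1)} = \theta^{(t)} - \left[\nabla^{2}\ell(\theta^{(t)}\mid x)\right]^{-1}\nabla\ell(\theta^{(t)}\mid x),
\qquad
\nabla\ell = \begin{pmatrix} \partial\ell/\partial\alpha \\ \partial\ell/\partial\beta \end{pmatrix},
\quad
\nabla^{2}\ell = \begin{pmatrix}
\partial^{2}\ell/\partial\alpha^{2} & \partial^{2}\ell/\partial\alpha\,\partial\beta \\
\partial^{2}\ell/\partial\beta\,\partial\alpha & \partial^{2}\ell/\partial\beta^{2}
\end{pmatrix}.
```

Unlike the US update, nothing in this step keeps θ(t+1) inside Θ = {0 < α < 1, β > 0} or forces the iterates to move monotonically, which is consistent with the instability reported for Newton's method in the simulations.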
The sample sizes are n = 50, 100, 200, and the parameter values are α ∈ {0.2, 0.4, 0.6, 0.8} and β ∈ {0.5, 1.0, 1.5, 2.0}. For each setting, we independently generated $X^{(k)}_1,\ldots,X^{(k)}_n\overset{\text{iid}}{\sim}DW(\alpha,\beta)$ for k = 1, …, K with K = 1000. The MLEs of the parameters under the US algorithm were computed via Eqs (3.2) and (3.4). For every parameter combination we ran 1000 replications of the experiment and evaluated the two methods using the convergence percentage and the mean squared error (MSE) of the parameter estimates.
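The data-generating step can be sketched with inverse-transform sampling. Because P(X > k) = α^{k^β} for integers k ≥ 0, one can draw a continuous Weibull variable Y with survival function α^{y^β} and set X = ⌈Y⌉. This route, the function names and the seeds are our assumptions; the paper does not say how its samples were generated. A helper for the MSE over the K replicates is included.

```python
import math
import random
from typing import List, Sequence

def rdw(n: int, alpha: float, beta: float, rng: random.Random) -> List[int]:
    """Draw n Type I discrete Weibull variates as X = ceil(Y), where Y is a continuous
    Weibull variable with P(Y > y) = alpha**(y**beta), so that P(X > k) = alpha**(k**beta)."""
    log_alpha = math.log(alpha)
    sample = []
    for _ in range(n):
        u = 1.0 - rng.random()                           # u in (0, 1], so log(u) is finite
        y = (math.log(u) / log_alpha) ** (1.0 / beta)    # solves alpha**(y**beta) = u
        sample.append(max(1, math.ceil(y)))              # y == 0 only when u == 1; map it to 1
    return sample

def mse(estimates: Sequence[float], true_value: float) -> float:
    """Mean squared error of K replicate estimates around the true parameter value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# One simulated data set of size n = 50 for alpha = 0.4, beta = 1.0 (a geometric case)
data = rdw(50, alpha=0.4, beta=1.0, rng=random.Random(1))
```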
Tables 1–4 report the simulation results of the two algorithms for each scenario. For both algorithms, the MSE of the parameter estimates decreases as the sample size increases, suggesting that both estimators are asymptotically unbiased. For fixed β, the MSE of β decreases steadily as α increases. Overall, the MSE of both parameters is smaller under the US algorithm than under the Newton algorithm, suggesting that the US algorithm gives more accurate estimates. Furthermore, the convergence percentages in the tables show that the US algorithm is more stable.
Table 1.
The MSE and convergence percentage from simulated data for β=0.5.
Figure 1 shows how the estimates of α and β obtained by the US algorithm and by Newton's method evolve as the number of iterations increases. In terms of stability, Newton's method is clearly unstable and often changes direction several times before settling on the correct trend, whereas the US algorithm approaches the maximum likelihood estimates monotonically. In terms of convergence speed, the FLB-based US algorithm converges linearly, while the Newton algorithm converges quadratically; consequently, the FLB method converges comparatively slowly.
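The linear-versus-quadratic contrast can be seen on a toy equation. The sketch below, an illustration we chose rather than the paper's experiment, solves h(θ) = e^{−θ} − θ = 0 on Θ = [0, ∞), where h′(θ) = −e^{−θ} − 1 ≥ −2. It applies the constant-bound FLB step θ(t+1) = θ(t) − h(θ(t))/b with b = −2 and Newton's step θ(t+1) = θ(t) − h(θ(t))/h′(θ(t)), and counts the iterations each needs.

```python
import math
from typing import Callable, Tuple

def h(theta: float) -> float:
    return math.exp(-theta) - theta

def dh(theta: float) -> float:
    return -math.exp(-theta) - 1.0  # in [-2, -1) for theta >= 0, so b = -2 is a valid lower bound

def iterate(step: Callable[[float], float], theta0: float,
            tol: float = 1e-12, max_iter: int = 1000) -> Tuple[float, int]:
    """Apply an update rule until successive iterates differ by less than tol."""
    theta = theta0
    for n in range(1, max_iter + 1):
        theta_new = step(theta)
        if abs(theta_new - theta) < tol:
            return theta_new, n
        theta = theta_new
    raise RuntimeError("no convergence")

B = -2.0
us_root, us_iters = iterate(lambda t: t - h(t) / B, theta0=0.0)               # FLB step: linear rate
newton_root, newton_iters = iterate(lambda t: t - h(t) / dh(t), theta0=0.0)   # Newton: quadratic rate
print(us_iters, newton_iters)
```

Both runs reach the same root (about 0.567143), but the FLB iteration needs several times as many steps: its error shrinks by a roughly constant factor 1 − h′(θ⋆)/b ≈ 0.22 per step, whereas Newton's error is roughly squared at each step.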
This section analyzes two real data sets to illustrate the applicability of the US algorithm. The first data set, shown in Table 5, contains the remission times (in weeks) of 20 leukemia patients who received treatment, studied by Hassan et al. [19]. The second data set comes from the National Highway Traffic Safety Administration of the United States (www-fars.nhtsa.dot.gov) and reports the number of deaths of children under 5 years of age in motor vehicle accidents in 32 states during 2022.
Table 5.
The leukaemia patient data and the vehicle fatality data.
We fit both data sets with the Type Ⅰ discrete Weibull distribution and the geometric distribution. Table 6 gives the results for the leukemia patient data. The Cramér-von Mises, Anderson-Darling, and Kolmogorov-Smirnov tests show that both distributions fit the data adequately. The p-values of all three tests are highest for the Type Ⅰ discrete Weibull distribution estimated with the US algorithm, indicating the best fit. The Akaike information criterion (AIC) [20] and Bayesian information criterion (BIC) [21] likewise favor the DW distribution estimated with the US algorithm.
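For reference, the information criteria used here are AIC = 2k − 2ℓ̂ and BIC = k log n − 2ℓ̂, where ℓ̂ is the maximized log-likelihood, k is the number of parameters (k = 2 for DW, k = 1 for Geo) and n is the sample size. The sketch below computes them, together with the closed-form geometric fit on {1, 2, …}, whose success probability has MLE q̂ = 1/x̄ (equivalently α̂ = 1 − q̂ for the β = 1 DW case). Function names are illustrative, and the toy data are not the paper's.

```python
import math
from typing import Sequence, Tuple

def aic(loglik: float, k: int) -> float:
    return 2 * k - 2 * loglik

def bic(loglik: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * loglik

def geometric_fit(data: Sequence[int]) -> Tuple[float, float]:
    """MLE of q for P(X = x) = q * (1 - q)**(x - 1), x >= 1, and the maximized log-likelihood.

    Requires at least one observation greater than 1 (otherwise q_hat = 1 and log(0) fails).
    """
    n, total = len(data), sum(data)
    q_hat = n / total                       # 1 / sample mean
    loglik = n * math.log(q_hat) + (total - n) * math.log(1 - q_hat)
    return q_hat, loglik

toy = [1, 2, 3]
q_hat, ll_geo = geometric_fit(toy)
print(q_hat, aic(ll_geo, 1), bic(ll_geo, 1, len(toy)))
```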
Table 6.
Fits of the DW distribution and the geometric distribution to the leukaemia patient data.
Table 7 shows the results of fitting the vehicle fatality data with the Type Ⅰ discrete Weibull distribution and the geometric distribution. At the 0.05 significance level, the p-values of the three tests indicate that neither the Geo distribution nor the Type Ⅰ discrete Weibull distribution estimated with the Newton algorithm fits the data adequately. The AIC and BIC values indicate that the DW distribution estimated with the US algorithm gives the best fit. Figure 2 shows histograms of the two data sets together with the fitted DW and Geo distributions, and Figures 3 and 4 present the corresponding QQ plots. These plots also show that the US algorithm performs better.
Table 7.
Fits of the DW distribution and the geometric distribution to the vehicle fatality data.
Figure 2.
Histograms of the leukaemia patient data (left panel) and the vehicle fatality data (right panel), with the corresponding curves fitted by the DW and Geo distributions.
The US algorithm is a novel iterative method with high stability and reliable convergence. Existing research covers only simple cases such as univariate nonlinear equations. This paper extends the US algorithm to the more complex case of two-parameter discrete distributions in which the estimator of one parameter cannot be expressed explicitly in terms of the other. To estimate the parameters, we combine the US algorithm with the FLB method. Simulation results for the Type Ⅰ discrete Weibull distribution show that the US algorithm is accurate and stable. To demonstrate its applicability in complex settings, we also analyzed two real data sets modeled by the Type Ⅰ discrete Weibull distribution: remission times of leukemia patients and deaths of young children in motor vehicle accidents. Compared with the conventional Newton algorithm, the proposed approach yields a better fit.
Author contributions
Yuanhang Ouyang: Formal analysis, Writing original draft, Software, Investigation, Methodology, Data curation; Ruyun Yan: Validation, Software, Formal analysis; Jianhua Shi: Validation, Resources, Writing-review and editing, Methodology. All authors have read and approved the final version of the manuscript for publication.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This research was conducted under a project titled "The National Social Science Fund of China" (20XTJ003).
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
[1]
K. C. Thandra, A. Barsouk, K. Saginala, J. S. Aluru, A. Barsouk, Epidemiology of lung cancer, Contemp. Oncol., 25 (2021), 45−52. https://doi.org/10.5114/wo.2021.103829 doi: 10.5114/wo.2021.103829
[2]
Y. Li, E. Bolderson, R. Kumar, P. A. Muniandy, Y. Xue, D. J. Richard, et al., HSSB1 and hSSB2 form similar multiprotein complexes that participate in DNA damage response, J. Biol. Chem., 284 (2009), 23525−23531. https://doi.org/10.1074/jbc.C109.039586 doi: 10.1074/jbc.C109.039586
[3]
D. J. Richard, K. Savage, E. Bolderson, L. Cubeddu, S. So, M. Ghita, et al., hSSB1 rapidly binds at the sites of DNA double-strand breaks and is required for the efficient recruitment of the MRN complex, Nucleic. Acids. Res., 39 (2011), 1692−1702. https://doi.org/10.1093/nar/gkq1098 doi: 10.1093/nar/gkq1098
[4]
D. J. Richard, E. Bolderson, L. Cubeddu, R. I. Wadsworth, K. Savage, G. G. Sharma, et al., Single-stranded DNA-binding protein hSSB1 is critical for genomic stability, Nature, 453 (2008), 677−681. https://doi.org/10.1038/nature06883 doi: 10.1038/nature06883
[5]
P. Gu, W. Deng, M. Lei, S. Chang, Single strand DNA binding proteins 1 and 2 protect newly replicated telomeres, Cell Res., 23 (2013), 705−719. https://doi.org/10.1038/cr.2013.31 doi: 10.1038/cr.2013.31
[6]
Y. S. Kim, J. D. Hwan, S. Bae, D. H. Bae, W. A. Shick, Identification of differentially expressed genes using an annealing control primer system in stage Ⅲ serous ovarian carcinoma, BMC. Cancer., 10 (2010), 576. https://doi.org/10.1186/1471-2407-10-576 doi: 10.1186/1471-2407-10-576
[7]
Y. Ye, A. Huang, C. Huang, J. Liu, B. Wang, K. Lin, et al., Comparative mitochondrial proteomic analysis of hepatocellular carcinoma from patients, Proteomics Clin. Appl., 7 (2013), 403−415. doi: 10.1002/prca.201100103 doi: 10.1002/prca.201100103
[8]
Y. Yang, C. Pan, L. Yu, H. Ruan, L. Chang, J. Yang, et al., SSBP1 Upregulation in colorectal cancer regulates mitochondrial mass, Cancer Manag. Res., 11 (2019), 10093−10106. https://doi.org/10.2147/CMAR.S211292 doi: 10.2147/CMAR.S211292
[9]
D. S. Chandrashekar, B. Bashel, S. A. H. Balasubramanya, C. J. Creighton, I. Ponce-Rodriguez, B. V. S. K. Chakravarthi, et al., UALCAN: A portal for facilitating tumor subgroup gene expression and survival analyses, Neoplasia, 19 (2017), 649−658. https://doi.org/10.1016/j.neo.2017.05.002
[10]
C. H. Chen, J. M. Lai, T. Y. Chou, C. Y. Chen, L. J. Su, Y. C. Lee, et al., VEGFA upregulates FLJ10540 and modulates migration and invasion of lung cancer via PI3K/AKT pathway, PLoS One, 4 (2009), e5052. https://doi.org/10.1371/journal.pone.0005052
[11]
M. Kabbout, M. M. Garcia, J. Fujimoto, D. D. Liu, D. Woods, C. W. Chow, et al., ETS2 mediated tumor suppressive function and MET oncogene inhibition in human non-small cell lung cancer, Clin. Cancer Res., 19 (2013), 3383−3395. https://doi.org/10.1158/1078-0432.CCR-13-0341
[12]
B. Györffy, A. Lanczky, A. C. Eklund, C. Denkert, J. Budczies, Q. Li, et al., An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients, Breast Cancer Res. Treat., 123 (2010), 725−731. https://doi.org/10.1007/s10549-009-0674-9
[13]
S. V. Vasaikar, P. Straub, J. Wang, B. Zhang, LinkedOmics: analyzing multi-omics data within and across 32 cancer types, Nucleic Acids Res., 46 (2018), D956−D963. https://doi.org/10.1093/nar/gkx1090
[14]
Y. Liao, J. Wang, E. J. Jaehnig, Z. Shi, B. Zhang, WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs, Nucleic Acids Res., 47 (2019), W199−W205. https://doi.org/10.1093/nar/gkz401
[15]
L. Chang, G. Zhou, O. Soufan, J. Xia, miRNet 2.0: network-based visual analytics for miRNA functional analysis and systems biology, Nucleic Acids Res., 48 (2020), W244−W251. https://doi.org/10.1093/nar/gkaa467
[16]
T. Li, J. Fan, B. Wang, N. Traugh, Q. Chen, J. S. Liu, et al., TIMER: A web server for comprehensive analysis of tumor-infiltrating immune cells, Cancer Res., 77 (2017), e108−e110. https://doi.org/10.1158/0008-5472.CAN-17-0307
[17]
Z. W. Chen, B. Liu, N. W. Tang, Y. H. Xu, X. Y. Ye, Z. M. Li, et al., FBXL5-mediated degradation of single-stranded DNA-binding protein hSSB1 controls DNA damage response, Nucleic Acids Res., 42 (2014), 11560−11569. https://doi.org/10.1093/nar/gku876
[18]
Y. Wang, L. Hu, X. Zhang, H. Zhao, H. Xu, Y. Wei, et al., Downregulation of mitochondrial single stranded DNA binding protein (SSBP1) induces mitochondrial dysfunction and increases the radiosensitivity in non-small cell lung cancer cells, J. Cancer, 8 (2017), 1400−1409. https://doi.org/10.7150/jca.18170
[19]
E. Koedoot, M. Fokkelman, V. M. Rogkoti, M. Smid, I. van de Sandt, H. de Bont, et al., Uncovering the signaling landscape controlling breast cancer cell migration identifies novel metastasis driver genes, Nat. Commun., 10 (2019), 2983. https://doi.org/10.1038/s41467-019-11020-3
[20]
W. Xu, H. Huang, L. Yu, L. Cao, Meta-analysis of gene expression profiles indicates genes in spliceosome pathway are up-regulated in hepatocellular carcinoma (HCC), Med. Oncol., 32 (2015), 96. https://doi.org/10.1007/s12032-014-0425-6
[21]
T. Y. Hsu, L. M. Simon, N. J. Neill, R. Marcotte, A. Sayad, C. S. Bland, et al., The spliceosome is a therapeutic vulnerability in MYC-driven cancer, Nature, 525 (2015), 384−388. https://doi.org/10.1038/nature14985
[22]
T. Saiki, T. Kawai, K. Morita, M. Ohta, T. Saito, K. Rokutan, et al., Identification of marker genes for differential diagnosis of chronic fatigue syndrome, Mol. Med., 14 (2008), 599−607. https://doi.org/10.2119/2007-00059.Saiki
[23]
J. He, H. C. Ford, J. Carroll, C. Douglas, E. Gonzales, S. Ding, et al., Assembly of the membrane domain of ATP synthase in human mitochondria, Proc. Natl. Acad. Sci., 115 (2018), 2988−2993. https://doi.org/10.1073/pnas.1722086115
[24]
C. Galber, G. Minervini, G. Cannino, F. Boldrin, V. Petronilli, S. Tosatto, et al., The f subunit of human ATP synthase is essential for normal mitochondrial morphology and permeability transition, Cell Rep., 35 (2021), 109111. https://doi.org/10.1016/j.celrep.2021.109111
[25]
S. Tamilzhalagan, M. Muthuswami, J. Periasamy, M. H. Lee, S. Y. Rha, P. Tan, et al., Upregulated, 7q21-22 amplicon candidate gene SHFM1 confers oncogenic advantage by suppressing p53 function in gastric cancer, Cell. Signal., 27 (2015), 1075−1086. https://doi.org/10.1016/j.cellsig.2015.02.010
[26]
K. L. Pokrzywinski, T. G. Biel, D. Kryndushkin, V. A. Rao, Therapeutic targeting of the mitochondria initiates excessive superoxide production and mitochondrial depolarization causing decreased mtDNA integrity, PLoS One, 11 (2016), e0168283. https://doi.org/10.1371/journal.pone.0168283
[27]
A. Tartarone, V. Lapadula, C. Di Micco, G. Rossi, C. Ottanelli, A. Marini, et al., Beyond conventional: The new horizon of targeted therapy for the treatment of advanced non small cell lung cancer, Front. Oncol., 11 (2021), 632256. https://doi.org/10.3389/fonc.2021.632256
[28]
Z. Tan, H. Xue, Y. Sun, C. Zhang, Y. Song, Y. Qi, The role of tumor inflammatory microenvironment in lung cancer, Front. Pharmacol., 12 (2021), 688625. https://doi.org/10.3389/fphar.2021.688625
[29]
A. M. van der Leun, D. S. Thommen, T. N. Schumacher, CD8+ T cell states in human cancer: insights from single-cell analysis, Nat. Rev. Cancer, 20 (2020), 218−232. https://doi.org/10.1038/s41568-019-0235-4
[30]
J. Mei, Z. Xiao, C. Guo, Q. Pu, L. Ma, C. Liu, et al., Prognostic impact of tumor-associated macrophage infiltration in non-small cell lung cancer: A systemic review and meta-analysis, Oncotarget, 7 (2016), 34217−34228. https://doi.org/10.18632/oncotarget.9079
[31]
S. S. Wang, W. Liu, D. Ly, H. Xu, L. Qu, L. Zhang, Tumor-infiltrating B cells: Their role and application in anti-tumor immunity in lung cancer, Cell. Mol. Immunol., 16 (2019), 6−18. https://doi.org/10.1038/s41423-018-0027-x
Jian Huang, Zheng-Fu Xie. Identification of SSBP1 as a prognostic marker in human lung adenocarcinoma using bioinformatics approaches[J]. Mathematical Biosciences and Engineering, 2022, 19(3): 3022-3035. doi: 10.3934/mbe.2022139
Figure 1. SSBP1 mRNA expression evaluated using the UALCAN database. (A) Up-regulation of SSBP1 mRNA expression in lung adenocarcinoma samples. (B) SSBP1 mRNA expression in subgroups defined by gender. (C) SSBP1 mRNA expression in subgroups defined by age. (D) SSBP1 mRNA expression in subgroups defined by tobacco smoking. (E) SSBP1 mRNA expression in subgroups defined by cancer stage. (F) SSBP1 mRNA expression in subgroups defined by nodal metastasis status. (G) SSBP1 mRNA expression in the GSE43458 dataset. (H) SSBP1 mRNA expression in the GSE7670 dataset.
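The tumor-versus-normal comparison in Figure 1 rests on a two-sample test of mean expression. As an illustration only, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom. It assumes an unequal-variance t-test is the relevant comparison, it is not UALCAN's own code, and the function name `welch_t` and the expression values are hypothetical. Turning t into a P value needs the Student t distribution (for example from SciPy), which is left out to keep the example dependency-free.

```python
import math
from statistics import mean, variance


def welch_t(tumor, normal):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(tumor), len(normal)
    # Squared standard errors, using sample variances (n - 1 denominator).
    se1 = variance(tumor) / n1
    se2 = variance(normal) / n2
    t = (mean(tumor) - mean(normal)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical log2 expression values, for illustration only.
tumor = [6.1, 6.4, 5.9, 6.8, 6.3, 6.6]
normal = [5.2, 5.5, 5.1, 5.4]
t, df = welch_t(tumor, normal)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A positive t means the tumor mean exceeds the normal mean. Welch's form avoids assuming the two groups share a variance, which is rarely safe when tumor and normal sample sizes differ greatly.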
Figure 2. Correlation between SSBP1 expression and lung adenocarcinoma prognosis. (A) According to the UALCAN database, the high-SSBP1 expression group had considerably worse overall survival than the medium/low-SSBP1 expression group. (B) According to the Kaplan-Meier plotter, the high-SSBP1 expression group had considerably worse overall survival than the low-SSBP1 expression group. (C) Multivariate Cox regression analysis showed that high SSBP1 expression was independently associated with poor overall survival.
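The survival curves in Figure 2(A) and 2(B) are Kaplan-Meier estimates. A minimal sketch of the product-limit estimator follows. The function name `kaplan_meier` and the follow-up data are hypothetical, and neither UALCAN nor the Kaplan-Meier plotter is claimed to compute its curves with this code.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count deaths and censorings that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Everyone observed at t (died or censored) then leaves the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for a "high expression" group.
high = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
print(high)
```

Subjects censored at the same time as a death are kept in the risk set for that death, which is the usual convention. Running the function separately on the high- and low-expression groups gives the two curves that a plot like Figure 2(B) compares.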
Figure 3. SSBP1 co-expressed genes and functional enrichment analysis. (A) Volcano plot showing that 5078 genes were positively correlated and 7905 genes were negatively correlated with SSBP1 expression. (B) Heat map of the top 50 genes positively correlated with SSBP1 expression. (C) Heat map of the top 50 genes negatively correlated with SSBP1 expression. (D) Gene Ontology (biological process) analysis of SSBP1 and its positively correlated genes. (E) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of SSBP1 and its positively correlated genes. (F) Gene Ontology (biological process) analysis of the genes negatively correlated with SSBP1. (G) KEGG pathway analysis of the genes negatively correlated with SSBP1.
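Panels (D) to (G) of Figure 3 come from enrichment analysis in WebGestalt. The sketch below shows a common form of over-representation analysis: a one-sided hypergeometric test per term, followed by Benjamini-Hochberg correction across terms. It assumes this scoring scheme matches the tool's; the functions `ora_pvalue` and `benjamini_hochberg` and the counts in the usage lines are hypothetical.

```python
from math import comb


def ora_pvalue(N, K, n, k):
    """One-sided hypergeometric P(X >= k) for over-representation analysis.

    N: genes in the reference set, K: reference genes annotated to the term,
    n: genes in the input list, k: input genes annotated to the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: 12 of 500 input genes fall in a 100-gene term,
# against a 20,000-gene reference (2.5 genes would be expected by chance).
p = ora_pvalue(N=20000, K=100, n=500, k=12)
print(p, benjamini_hochberg([p, 0.03, 0.4]))
```

Exact integer binomial coefficients are used instead of floating-point factorials, so the tail probability does not overflow even for genome-sized reference sets.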
Figure 4. Genes most strongly correlated with SSBP1 expression and their association with lung adenocarcinoma prognosis. (A) The genes most strongly correlated with SSBP1 expression. (B) Association of these genes with lung adenocarcinoma prognosis.
Figure 5. cBioPortal analysis of SSBP1. (A) Frequency of SSBP1 alterations in lung adenocarcinoma patients. (B) Lung adenocarcinoma patients with SSBP1 alterations had worse prognosis than those without SSBP1 alterations. (C) Genes with a significantly higher alteration frequency in the SSBP1-altered group than in the SSBP1-unaltered group. (D) Association of SSBP1 alterations with alterations of major oncogenic drivers. (E) The top 5 genes with the highest alteration frequency in the SSBP1-altered group.
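The survival comparison in Figure 5(B) between SSBP1-altered and unaltered patients is a log-rank test. A minimal two-group version is sketched below as an illustration. It is not cBioPortal's implementation, and the function name `logrank_test` and the example data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: 1 = altered, 0 = unaltered. Returns (chi-square, p-value).
    """
    subjects = list(zip(times, events, groups))
    death_times = sorted({t for t, e, _ in subjects if e == 1})
    obs_minus_exp = 0.0
    var = 0.0
    for t in death_times:
        at_risk = [g for tt, _, g in subjects if tt >= t]
        deaths = [g for tt, e, g in subjects if tt == t and e == 1]
        n, n1 = len(at_risk), sum(at_risk)
        d, d1 = len(deaths), sum(deaths)
        # Observed minus expected deaths in group 1 at this time point.
        obs_minus_exp += d1 - d * n1 / n
        # Hypergeometric variance of d1 (zero when only one subject is at risk).
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square with 1 df equals a two-sided normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: the "altered" group (1) dies earlier than group 0.
chi2, p = logrank_test([1, 2, 3, 10, 11, 12], [1] * 6, [1, 1, 1, 0, 0, 0])
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The statistic compares observed and expected deaths at every event time, so it uses the whole follow-up period rather than survival at a single landmark. The same test applies to the per-gene comparisons in Figure 6.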
Figure 6. Survival analysis of the top 5 genes with the highest alteration frequency in the SSBP1-altered group. (A-E) These genes were DENND11, TAS2R4, TAS2R3, WEE2 and AGK; alterations in each were associated with poor prognosis in lung adenocarcinoma.
Figure 7. Correlation of SSBP1 expression with immune cell infiltration in lung adenocarcinoma. (A) SSBP1 expression was negatively correlated with the B cell infiltration level. (B) SSBP1 expression was negatively correlated with the expression of B cell biomarkers, including CD79A and CD19. (C) The low B cell infiltration group had considerably worse prognosis than the high B cell infiltration group.
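The association in Figure 7(A) is reported as a Spearman correlation coefficient (Rho). The sketch below computes Spearman's rho as the Pearson correlation of average ranks and, as an assumption about how tumor purity might be adjusted for, a first-order partial correlation given a third variable. It does not reproduce TIMER2.0's code, and all function names and values in it are hypothetical.

```python
import math


def rank(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after adjusting for z (e.g. purity)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical per-sample values: SSBP1 expression, B cell score, purity.
ssbp1 = [5.1, 6.3, 4.8, 7.0, 5.9, 6.5]
bcell = [0.12, 0.08, 0.15, 0.05, 0.10, 0.07]
purity = [0.6, 0.8, 0.5, 0.9, 0.7, 0.75]
print(spearman(ssbp1, bcell), partial_spearman(ssbp1, bcell, purity))
```

Rank-based correlation suits infiltration estimates because it only assumes a monotone relationship and is robust to the skewed distributions these scores often have.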