This paper focuses on the adaptive reinforcement learning-based optimal control problem for standard nonstrict-feedback nonlinear systems with the actuator fault and an unknown dead zone. To simultaneously reduce the computational complexity and eliminate the local optimal problem, a novel neural network weight updated algorithm is presented to replace the classic gradient descent method. By utilizing the backstepping technique, the actor critic-based reinforcement learning control strategy is developed for high-order nonlinear nonstrict-feedback systems. In addition, two auxiliary parameters are presented to deal with the input dead zone and actuator fault respectively. All signals in the system are proven to be semi-globally uniformly ultimately bounded by Lyapunov theory analysis. At the end of the paper, some simulation results are shown to illustrate the remarkable effect of the proposed approach.
Citation: Zichen Wang, Xin Wang. Fault-tolerant control for nonlinear systems with a dead zone: Reinforcement learning approach[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6334-6357. doi: 10.3934/mbe.2023274
Consider the linear hybrid time-delay system
\[ \dot{x}(t) = A x(t) + B x(t-\tau) + C \int_{-h}^{0} x(t+\theta)\, d\theta, \tag{1.1} \]
where \(x(t) \in \mathbb{R}^n\), the matrices \(A\), \(B\), and \(C\) are in \(\mathbb{R}^{n \times n}\), and \(\tau \ge 0\) and \(h \ge 0\) are constant delays. Systems of this nature are extensively employed in modeling dynamical systems in economics and population dynamics, as well as in the study of deformable solids with memory and wave processes in extended electrical circuits [1,2].
A system is said to be positive if for any nonnegative initial condition, the solutions x(t) of the system remain nonnegative for all t≥0.
Throughout this paper, we deal with real square matrices. For a symmetric matrix \(A \in \mathbb{R}^{n \times n}\), we write \(A \prec 0\) (\(A \preceq 0\), resp.) if \(A\) is negative definite (negative semidefinite, resp.).
Let \(A = [a_{ij}]_{i,j=1}^{n}\) be a real \(n \times n\) matrix. Then \(A\) is called a
● Metzler matrix if \(a_{ij} \ge 0\) for all \(i \ne j\), where \(i, j = 1, \dots, n\);
● Nonnegative matrix if \(a_{ij} \ge 0\) for all \(i, j = 1, \dots, n\);
● Hurwitz matrix if all its eigenvalues have negative real parts.
For further details on such matrices, refer to [3].
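The three matrix classes defined above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `is_metzler`, `is_nonnegative`, and `is_hurwitz` and the sample matrix are illustrative and do not come from the paper.

```python
import numpy as np

def is_metzler(A):
    """True if every off-diagonal entry of A is nonnegative."""
    A = np.asarray(A, dtype=float)
    off_diagonal = A[~np.eye(A.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal >= 0))

def is_nonnegative(A):
    """True if every entry of A is nonnegative."""
    return bool(np.all(np.asarray(A, dtype=float) >= 0))

def is_hurwitz(A):
    """True if every eigenvalue of A has a negative real part."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    return bool(np.all(eigenvalues.real < 0))

# Illustrative matrix: Metzler and Hurwitz, but not nonnegative.
A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])
print(is_metzler(A), is_nonnegative(A), is_hurwitz(A))
```

Note that the eigenvalue test is a floating-point check, so matrices with eigenvalues very close to the imaginary axis may be misclassified.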
The following result, known as the Lyapunov theorem, provides a necessary and sufficient condition for a matrix to be Hurwitz.
Lemma 1.1. [4] Let \(A \in \mathbb{R}^{n \times n}\). Then, \(A\) is Hurwitz if and only if there exists a positive definite matrix \(P \in \mathbb{R}^{n \times n}\) such that
\[ A^T P + P A \prec 0. \tag{1.2} \]
The matrix inequality (1.2) is commonly referred to as the Lyapunov inequality. If a positive diagonal matrix \(P\) satisfies the Lyapunov inequality, then \(A\) is called a Lyapunov diagonally stable matrix. It follows from Lemma 1.1 that every Lyapunov diagonally stable matrix is Hurwitz. For a deeper treatment of this topic from the perspective of matrix theory, we refer the reader to [3,4,5,6].
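As an illustration of the Lyapunov inequality (1.2) with a diagonal \(P\), the sketch below tests whether a given vector of positive diagonal entries certifies Lyapunov diagonal stability. It only verifies a candidate and does not search for one; the helper names and the example matrix are assumptions made for this illustration.

```python
import numpy as np

def is_negative_definite(S, tol=1e-12):
    """True if the symmetric part of S has only negative eigenvalues."""
    S = np.asarray(S, dtype=float)
    return bool(np.max(np.linalg.eigvalsh((S + S.T) / 2)) < -tol)

def is_diagonal_lyapunov_solution(A, d):
    """Check A^T P + P A < 0 for P = diag(d) with all d_i > 0."""
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    if np.any(d <= 0):
        return False  # P must be a positive diagonal matrix
    P = np.diag(d)
    return is_negative_definite(A.T @ P + P @ A)

# Illustrative matrix for which P = I already works.
A = np.array([[-2.0, 1.0],
              [1.0, -3.0]])
print(is_diagonal_lyapunov_solution(A, [1.0, 1.0]))
```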
Lyapunov diagonal stability has been applied in various fields, such as population dynamics [7,8], communication networks [9], and systems theory [3]. Given its significance, it has been extensively studied in the literature [10,11,12,13,14,15].
This notion of matrix stability has been extended to simultaneous Lyapunov diagonal stability, which requires the existence of a single positive diagonal matrix \(D\) that satisfies the Lyapunov inequality for every member of a family of matrices \(\mathcal{A} = \{A_i\}_{i=1}^{r}\), where \(A_i \in \mathbb{R}^{n \times n}\) for \(i = 1, \dots, r\). This form of stability is also referred to as common Lyapunov diagonal stability. Several approaches have been used in the literature to characterize it. In [16], a theorem of alternatives for linear maps over inner product spaces is presented. Another approach, developed in [17], utilizes the notion of P-matrices and Hadamard products. In [18], Khatri-Rao products are used to derive further characterizations. For more recent developments in this area, see [16,17,18,19,20,21,22].
Returning to the system (1.1), let us quote a few results. The first one provides a characterization of system (1.1) assuming it is a positive system. Meanwhile, the second result offers a necessary and sufficient condition for the asymptotic stability of the system under the same assumption.
Lemma 1.2. [23] The system given in (1.1) is positive if and only if \(A\) is Metzler and \(B\) and \(C\) are nonnegative.
Lemma 1.3. [24] If (1.1) is a positive system, then it is asymptotically stable if and only if
\[ A + B + hC \]
is a Hurwitz matrix.
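Lemmas 1.2 and 1.3 together give a simple numerical test for system (1.1). The sketch below is one way to express it, assuming NumPy; the function name `classify_delay_system` and the sample data are hypothetical. Because Lemma 1.3 applies only to positive systems, the stability verdict is returned as `None` when positivity fails.

```python
import numpy as np

def is_metzler(A):
    """True if every off-diagonal entry of A is nonnegative."""
    A = np.asarray(A, dtype=float)
    off_diagonal = A[~np.eye(A.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal >= 0))

def is_hurwitz(A):
    """True if every eigenvalue of A has a negative real part."""
    return bool(np.all(np.linalg.eigvals(np.asarray(A, dtype=float)).real < 0))

def classify_delay_system(A, B, C, h):
    """Return (positive, stable) for system (1.1) via Lemmas 1.2 and 1.3."""
    A, B, C = (np.asarray(M, dtype=float) for M in (A, B, C))
    positive = is_metzler(A) and bool(np.all(B >= 0)) and bool(np.all(C >= 0))
    # Lemma 1.3 is stated for positive systems only.
    stable = is_hurwitz(A + B + h * C) if positive else None
    return positive, stable

# Illustrative data: the same matrices with a short and a long distributed delay.
A = np.array([[-4.0, 1.0], [1.0, -4.0]])
B = np.full((2, 2), 0.5)
C = 0.5 * np.eye(2)
print(classify_delay_system(A, B, C, h=1.0))
print(classify_delay_system(A, B, C, h=10.0))
```

The discrete delay \(\tau\) does not appear in the test, whereas the length \(h\) of the distributed delay does, which is why the two calls above can give different verdicts.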
According to [25], when employing the direct Lyapunov method for the stability analysis of the system (1.1), a Lyapunov-Krasovskii functional can be determined and expressed in the following form:
\[ V(x) = x^T(t) P x(t) + \int_{t-\tau}^{t} x^T(\theta) Q x(\theta)\, d\theta + \int_{-h}^{0} \int_{t+\theta}^{t} x^T(u) R x(u)\, du\, d\theta, \tag{1.3} \]
where \(P \), \(Q \), and \(R \) are positive definite matrices.
The time-delay system in (1.1) is said to be diagonally stable if there exists a Lyapunov-Krasovskii functional of the form (1.3), with positive diagonal matrices \(P\), \(Q\), and \(R\), that meets the conditions of the Krasovskii theorem on asymptotic stability [26].
Theorem 1.1. [27] If system (1.1) is positive, then it is asymptotically stable if and only if it is diagonally stable.
As shown in [27], a positive system of the form (1.1) is diagonally stable if and only if there exists a triple of positive diagonal matrices \(P\), \(Q\), and \(R\) satisfying the following linear matrix inequality:
\[ \begin{bmatrix} A^T P + P A + Q + hR & PB & hPC \\ B^T P & -Q & 0 \\ hC^T P & 0 & -hR \end{bmatrix} \prec 0. \]
The main goal of this paper is to characterize the existence of a diagonal solution \(P\) of the above inequality in the case where \(A\) is a Metzler matrix and \(B\) and \(C\) are nonnegative matrices. Without loss of generality, we assume \(h = 1\). In other words, we investigate the existence of a positive diagonal solution \(P\) of the inequality
\[ \begin{bmatrix} A^T P + P A + Q + R & PB & PC \\ B^T P & -Q & 0 \\ C^T P & 0 & -R \end{bmatrix} \prec 0. \tag{1.4} \]
An immediate observation here is that when B=C=0, this last inequality reduces to the Lyapunov inequality. Meanwhile, if only C=0, it becomes the Riccati inequality; see [28] for further details.
In a Euclidean vector space over R, nonempty disjoint convex sets can be separated by a hyperplane. The hyperplane separation theorem, which appears in various forms in the literature (see [29]) is particularly relevant when one of the convex sets is a cone, as highlighted in the following lemma. This version, which we will refer to as the Separation Theorem, is an important result for this paper.
Lemma 1.4. [30] Let U and V be nonempty convex subsets in a Euclidean space E (over R) with an inner product ⟨⋅,⋅⟩. In addition, suppose that V is a cone and the intersection of U and V is empty. Then, there exists a nonzero vector u in E such that
\[ \langle u, z \rangle \ge 0, \quad \forall z \in U, \]
\[ \langle u, z \rangle \le 0, \quad \forall z \in V. \]
It is well-known that the cone of positive semidefinite matrices in Rn×n is self-dual. The next lemma demonstrates this.
Lemma 1.5. [29] Let W denote the space of all n×n symmetric matrices equipped with the inner product ⟨X,Y⟩=tr(XY) for all X,Y∈W. Then, the cone of positive semidefinite matrices S in W is self-dual. That is, if H∈W satisfies
\[ \operatorname{tr}(HX) \ge 0, \quad \forall X \in S, \]
then H∈S.
We start this section with some technical lemmas which are going to be necessary for us to develop the main results of this paper.
Lemma 2.1. [19] Suppose \(A \in \mathbb{R}^{n \times n} \) is a Metzler matrix and \(B \in \mathbb{R}^{n \times n} \) is a nonnegative matrix. If the matrix \(C = A + B \) is Hurwitz, then there exists a positive diagonal matrix \(D \in \mathbb{R}^{n \times n} \) such that the following inequalities hold:
\[ A^T D + D A \prec 0 \]
and
\[ C^T D + D C \prec 0. \]
Lemma 2.2. [28] Suppose \(A \in \mathbb{R}^{n \times n} \) is a Metzler matrix and \(H \in \mathbb{R}^{n \times n} \) is a positive semidefinite matrix. Furthermore, let \(u \in \mathbb{R}^n \) be a nonnegative vector such that \(u_i = \sqrt{h_{ii}} \) for \(i = 1, \dots, n \). Then, the following inequality holds:
\[ \operatorname{tr}(u u^T A) \ge \operatorname{tr}(HA). \]
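A quick numerical sanity check of Lemma 2.2 can be sketched as follows. The random test data, the seed, and the helper name `trace_gap` are assumptions made for this illustration; the check is not a proof, it only evaluates both traces for sample matrices.

```python
import numpy as np

def trace_gap(H, A):
    """Return tr(u u^T A) - tr(H A), where u_i = sqrt(h_ii)."""
    H = np.asarray(H, dtype=float)
    A = np.asarray(A, dtype=float)
    u = np.sqrt(np.diag(H))
    return float(np.trace(np.outer(u, u) @ A) - np.trace(H @ A))

rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
H = G @ G.T                                    # positive semidefinite by construction
A = rng.random((n, n))                         # nonnegative off-diagonal entries
A[np.diag_indices(n)] = -5.0 * rng.random(n)   # arbitrary diagonal, so A is Metzler
print(trace_gap(H, A) >= -1e-9)

# Small hand-checkable case: the gap equals 4.
H_small = np.array([[1.0, -1.0], [-1.0, 1.0]])
A_small = np.array([[0.0, 1.0], [1.0, 0.0]])
print(trace_gap(H_small, A_small))
```

The diagonal terms of the two traces coincide because \(u_i^2 = h_{ii}\), so the gap comes only from the off-diagonal entries, where \(a_{ji} \ge 0\) and \(h_{ij} \le \sqrt{h_{ii} h_{jj}}\).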
Using the Schur complement, we can derive the following result, which provides an equivalent formulation of inequality (1.4). For more detailed discussions on the Schur complement and its applications, see, for example, [31].
Lemma 2.3. Let \(A, B, C \in \mathbb{R}^{n \times n} \). There exist positive definite matrices \(P, Q, \) and \(R \) in \(\mathbb{R}^{n \times n} \) satisfying the inequality
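\[ \begin{bmatrix} A^T P + P A + Q + R & PB & PC \\ B^T P & -Q & 0 \\ C^T P & 0 & -R \end{bmatrix} \prec 0 \tag{2.1} \]
if and only if
\[ A^T P + P A + Q + R + P B Q^{-1} B^T P + P C R^{-1} C^T P \prec 0. \]
Proof. It follows directly from the Schur complement that inequality (2.1) holds if and only if the following two conditions are satisfied:
(i) \( \begin{bmatrix} -Q & 0 \\ 0 & -R \end{bmatrix} \prec 0 \), and
(ii) \( A^T P + P A + Q + R - \begin{bmatrix} PB & PC \end{bmatrix} \begin{bmatrix} -Q & 0 \\ 0 & -R \end{bmatrix}^{-1} \begin{bmatrix} B^T P \\ C^T P \end{bmatrix} \prec 0 \).
Since \(Q\) and \(R\) are positive definite matrices, (i) always holds. Meanwhile, (ii) is equivalent to
\[ A^T P + P A + Q + R + P B Q^{-1} B^T P + P C R^{-1} C^T P \prec 0. \]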
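The equivalence in Lemma 2.3 can be observed numerically. The sketch below, assuming NumPy, evaluates both the block inequality (2.1) and the reduced inequality for given \(P\), \(Q\), \(R\) and confirms that they agree on two sample instances. All function names and test matrices are illustrative assumptions.

```python
import numpy as np

def is_negative_definite(S, tol=1e-12):
    """True if the symmetric part of S has only negative eigenvalues."""
    S = np.asarray(S, dtype=float)
    return bool(np.max(np.linalg.eigvalsh((S + S.T) / 2)) < -tol)

def block_lmi(A, B, C, P, Q, R):
    """Assemble the 3n x 3n block matrix on the left-hand side of (2.1)."""
    Z = np.zeros_like(A)
    return np.block([
        [A.T @ P + P @ A + Q + R, P @ B, P @ C],
        [B.T @ P, -Q, Z],
        [C.T @ P, Z, -R],
    ])

def reduced_lmi(A, B, C, P, Q, R):
    """Assemble the n x n Schur-complement form from Lemma 2.3."""
    return (A.T @ P + P @ A + Q + R
            + P @ B @ np.linalg.inv(Q) @ B.T @ P
            + P @ C @ np.linalg.inv(R) @ C.T @ P)

I = np.eye(2)
B = np.full((2, 2), 0.5)
C = 0.5 * I

# Instance 1: both forms are negative definite with P = Q = R = I.
A_good = np.array([[-4.0, 1.0], [1.0, -4.0]])
good = (is_negative_definite(block_lmi(A_good, B, C, I, I, I)),
        is_negative_definite(reduced_lmi(A_good, B, C, I, I, I)))

# Instance 2: both forms fail with the same P, Q, R.
A_bad = -I
bad = (is_negative_definite(block_lmi(A_bad, B, B, I, I, I)),
       is_negative_definite(reduced_lmi(A_bad, B, B, I, I, I)))
print(good, bad)
```

The reduced form is convenient for small hand computations because it involves only \(n \times n\) matrices, while the block form is linear in \(P\), \(Q\), and \(R\).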
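Theorem 2.1. Let \(A, B, C \in \mathbb{R}^{n \times n}\) be such that \(A\) is a Metzler matrix, while \(B\) and \(C\) are nonnegative matrices. If there exist a positive diagonal matrix \(D\) and positive definite matrices \(Q\) and \(R\) satisfying the inequality (1.4) with \(P = D\), then the matrix \(A + B + C\) is Lyapunov diagonally stable.
Proof. Suppose that \(D \in \mathbb{R}^{n \times n}\) is a positive diagonal matrix and \(Q, R \in \mathbb{R}^{n \times n}\) are positive definite matrices satisfying the inequality in (1.4). For any nonzero vector \(u \in \mathbb{R}^n\), define the vector \(v = \begin{bmatrix} u^T & u^T & u^T \end{bmatrix}^T \in \mathbb{R}^{3n}\). Clearly, \(v\) is nonzero since \(u\) is nonzero. Therefore, it follows that
\[ v^T \begin{bmatrix} A^T D + D A + Q + R & DB & DC \\ B^T D & -Q & 0 \\ C^T D & 0 & -R \end{bmatrix} v = \begin{bmatrix} u^T & u^T & u^T \end{bmatrix} \begin{bmatrix} A^T D + D A + Q + R & DB & DC \\ B^T D & -Q & 0 \\ C^T D & 0 & -R \end{bmatrix} \begin{bmatrix} u \\ u \\ u \end{bmatrix} < 0. \]
This means that \(u^T (A^T D + D A + B^T D + D B + C^T D + D C) u < 0\) for every nonzero \(u\), i.e.,
\[ (A + B + C)^T D + D (A + B + C) \prec 0. \]
This implies that the matrix \(A + B + C\) is Lyapunov diagonally stable.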
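The cancellation used in this proof can be written out term by term. The following expansion is a worked version of that step, using only the symbols of Theorem 2.1; the \(Q\) and \(R\) terms of the first diagonal block cancel against the \(-Q\) and \(-R\) blocks.

```latex
\begin{aligned}
\begin{bmatrix} u^T & u^T & u^T \end{bmatrix}
\begin{bmatrix}
A^T D + D A + Q + R & D B & D C \\
B^T D & -Q & 0 \\
C^T D & 0 & -R
\end{bmatrix}
\begin{bmatrix} u \\ u \\ u \end{bmatrix}
&= u^T (A^T D + D A + Q + R) u + u^T D B u + u^T D C u \\
&\quad + u^T B^T D u - u^T Q u + u^T C^T D u - u^T R u \\
&= u^T \big( (A + B + C)^T D + D (A + B + C) \big) u .
\end{aligned}
```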
We are now ready to present our main result, the proof of which follows the approach outlined in Theorem 3.1 from [28].
Theorem 2.2. Let \(A, B, C \in \mathbb{R}^{n \times n}\) be such that \(A\) is a Metzler matrix, while \(B\) and \(C\) are nonnegative matrices. Then, there exist a positive diagonal matrix \(D\) and positive definite matrices \(Q\) and \(R\) satisfying the inequality (1.4) with \(P = D\) if and only if the matrix \(A + B + C\) is Hurwitz.
Proof. Necessity: This is immediate from Theorem 2.1 and the Lyapunov theorem (Lemma 1.1).
Sufficiency: Suppose that \(A + B + C\) is a Hurwitz matrix. Since \(B\) and \(C\) are nonnegative, \(B + C\) is also a nonnegative matrix. Hence, since \(A\) is Metzler, by Lemma 2.1 there is a positive diagonal matrix \(D \in \mathbb{R}^{n \times n}\) satisfying the following inequality:
\[ (A + B + C)^T D + D (A + B + C) \prec 0. \tag{2.2} \]
Fix a matrix \(D\) satisfying (2.2). To finish this direction, we need to show that there is a pair of positive definite matrices \(Q\) and \(R\) in \(\mathbb{R}^{n \times n}\) such that
\[ \begin{bmatrix} A^T D + D A + Q + R & DB & DC \\ B^T D & -Q & 0 \\ C^T D & 0 & -R \end{bmatrix} \prec 0. \tag{2.3} \]
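We argue by contradiction. Suppose that there are no positive definite matrices \(Q\) and \(R\) in \(\mathbb{R}^{n \times n}\) satisfying (2.3). Consider the space \(E\) consisting of all \(3n \times 3n\) real symmetric matrices, with the inner product defined by
\[ \langle F, G \rangle = \operatorname{tr}(FG), \]
where \(F, G \in E\). Furthermore, consider the following sets:
\[ U = \left\{ \begin{bmatrix} A^T D + D A + Q + R & DB & DC \\ B^T D & -Q & 0 \\ C^T D & 0 & -R \end{bmatrix} \;\middle|\; Q \succ 0,\ R \succ 0 \text{ in } \mathbb{R}^{n \times n} \right\}, \tag{2.4} \]
and
\[ V = \{ S \in E \mid S \prec 0 \}. \tag{2.5} \]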
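Since there are no positive definite matrices \(Q\) and \(R\) satisfying (2.3), we must have \(U \cap V = \emptyset\). Furthermore, \(U\) and \(V\) are nonempty convex subsets of the vector space \(E\), and \(V\) is a cone. Hence the separation theorem, Lemma 1.4, is applicable. Consequently, there exists a nonzero matrix \(H \in E\) such that
\[ \operatorname{tr}(HX) \ge 0, \quad \forall X \in U, \tag{2.6} \]
\[ \operatorname{tr}(HY) \le 0, \quad \forall Y \in V. \tag{2.7} \]
Inequality (2.7) gives \(\operatorname{tr}(HX) \ge 0\) for every positive definite \(X \in E\) and hence, by continuity, for every positive semidefinite \(X \in E\). According to Lemma 1.5, this means that \(H \succeq 0\). Next, partition \(H\) into a \(3 \times 3\) block matrix as follows:
\[ H = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{12}^T & H_{22} & H_{23} \\ H_{13}^T & H_{23}^T & H_{33} \end{bmatrix}, \]
where each \(H_{ij} \in \mathbb{R}^{n \times n}\). Consequently, from inequality (2.6), we have
\[ \operatorname{tr}(HX) = \operatorname{tr}\left( \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{12}^T & H_{22} & H_{23} \\ H_{13}^T & H_{23}^T & H_{33} \end{bmatrix} \begin{bmatrix} A^T D + D A + Q + R & DB & DC \\ B^T D & -Q & 0 \\ C^T D & 0 & -R \end{bmatrix} \right) \ge 0 \]
for all \(Q \succ 0\) and \(R \succ 0\) in \(\mathbb{R}^{n \times n}\). Expanding the block product and keeping only its diagonal blocks, which are the only ones that contribute to the trace, we obtain the following expression:
\[ \operatorname{tr}\big( H_{11}(A^T D + D A) + H_{12} B^T D + H_{13} C^T D + H_{12}^T D B + (H_{11} - H_{22}) Q + H_{13}^T D C + (H_{11} - H_{33}) R \big) \ge 0 \]
for all \(Q \succ 0\) and \(R \succ 0\). Rearranging this last inequality, we conclude that for all \(Q \succ 0\) and \(R \succ 0\), we have
\[ \operatorname{tr}\big( H_{11}(A^T D + D A) + H_{12} B^T D + H_{13} C^T D + H_{12}^T D B + H_{13}^T D C \big) \ge \operatorname{tr}\big( (H_{22} - H_{11}) Q + (H_{33} - H_{11}) R \big). \tag{2.8} \]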
If \(\operatorname{tr}(Q(H_{22} - H_{11})) > 0\) for some positive definite matrix \(Q\), we obtain a contradiction with (2.8). To see this, replace \(Q\) by the positive definite matrix \(tQ\), where \(t > 0\) is a sufficiently large scalar. The right-hand side of (2.8) then grows without bound while the left-hand side stays fixed, so (2.8) fails. Therefore, we conclude that
\[ \operatorname{tr}(Q(H_{22} - H_{11})) \le 0 \tag{2.9} \]
for all positive definite matrices \(Q\). Using a similar argument, it can be shown that for all positive definite matrices \(R\),
\[ \operatorname{tr}(R(H_{33} - H_{11})) \le 0. \tag{2.10} \]
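Now, the following trace:
\[ \operatorname{tr}\big( H_{11}(A^T D + D A) + H_{12} B^T D + H_{13} C^T D + H_{12}^T D B + H_{13}^T D C \big) \]
must be nonnegative. Otherwise, we obtain a contradiction to (2.8). To see this, suppose the trace is negative, fix positive definite matrices \(Q\) and \(R\), and let \(t > 0\) be a sufficiently small scalar. Substituting the positive definite matrices \(tQ\) and \(tR\) into (2.8), the right-hand side tends to zero as \(t \to 0\), so for \(t\) small enough we get
\[ \operatorname{tr}\big( (H_{22} - H_{11}) tQ + (H_{33} - H_{11}) tR \big) > \operatorname{tr}\big( H_{11}(A^T D + D A) + H_{12} B^T D + H_{13} C^T D + H_{12}^T D B + H_{13}^T D C \big). \]
This clearly contradicts the fact that (2.8) holds for all positive definite matrices \(Q\) and \(R\). Therefore, we must have
\[ \operatorname{tr}\big( H_{11}(A^T D + D A) + H_{12} B^T D + H_{13} C^T D + H_{12}^T D B + H_{13}^T D C \big) \ge 0. \tag{2.11} \]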
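Now, recall that the matrix \(H\) is partitioned into a \(3 \times 3\) block matrix with each block in \(\mathbb{R}^{n \times n}\). Construct a vector \(u \in \mathbb{R}^{3n}\) as \(u = \begin{bmatrix} x^T & y^T & z^T \end{bmatrix}^T\), where \(x_j = \sqrt{(H_{11})_{jj}}\), \(y_j = \sqrt{(H_{22})_{jj}}\), and \(z_j = \sqrt{(H_{33})_{jj}}\) for \(j = 1, \dots, n\). This means that \(x_j^2\) is the \(j\)th diagonal entry of \(H_{11}\), \(y_j^2\) is the \(j\)th diagonal entry of \(H_{22}\), and \(z_j^2\) is the \(j\)th diagonal entry of \(H_{33}\).
According to Lemma 1.5, inequality (2.9) implies that \(H_{22} - H_{11} \preceq 0\) and inequality (2.10) implies that \(H_{33} - H_{11} \preceq 0\). This is equivalent to \(H_{11} - H_{22} \succeq 0\) and \(H_{11} - H_{33} \succeq 0\). Since the diagonal entries of a positive semidefinite matrix are nonnegative, we get \(x_j^2 \ge y_j^2\) and \(x_j^2 \ge z_j^2\); as \(x_j\), \(y_j\), and \(z_j\) are nonnegative by construction, it follows that \(x_j \ge y_j \ge 0\) and \(x_j \ge z_j \ge 0\) for \(j = 1, \dots, n\). Also, observe that \(x\) is a nonzero vector; otherwise \(y = 0\) and \(z = 0\), so every diagonal entry of the positive semidefinite matrix \(H\) vanishes, which forces \(H = 0\) and contradicts the fact that \(H\) is nonzero.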
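Additionally, note that we have
\[ \operatorname{tr}\big( H_{11}(A^T D + D A) + H_{12} B^T D + H_{13} C^T D + H_{12}^T D B + H_{13}^T D C \big) = \operatorname{tr}\left( \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{12}^T & H_{22} & H_{23} \\ H_{13}^T & H_{23}^T & H_{33} \end{bmatrix} \begin{bmatrix} A^T D + D A & DB & DC \\ B^T D & 0 & 0 \\ C^T D & 0 & 0 \end{bmatrix} \right). \tag{2.12} \]
Recall that \(H\) is a positive semidefinite matrix. On the other hand, since \(A\) is Metzler, \(B\) and \(C\) are nonnegative, and \(D\) is a positive diagonal matrix, the matrix
\[ \begin{bmatrix} A^T D + D A & DB & DC \\ B^T D & 0 & 0 \\ C^T D & 0 & 0 \end{bmatrix} \]
is Metzler. Thus, by Lemma 2.2 it follows that
\[ \operatorname{tr}\left( u u^T \begin{bmatrix} A^T D + D A & DB & DC \\ B^T D & 0 & 0 \\ C^T D & 0 & 0 \end{bmatrix} \right) \ge \operatorname{tr}\left( \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{12}^T & H_{22} & H_{23} \\ H_{13}^T & H_{23}^T & H_{33} \end{bmatrix} \begin{bmatrix} A^T D + D A & DB & DC \\ B^T D & 0 & 0 \\ C^T D & 0 & 0 \end{bmatrix} \right). \tag{2.13} \]
From (2.11), (2.12), and (2.13), we obtain that
\[ \operatorname{tr}\left( u u^T \begin{bmatrix} A^T D + D A & DB & DC \\ B^T D & 0 & 0 \\ C^T D & 0 & 0 \end{bmatrix} \right) \ge 0, \tag{2.14} \]
which is the same as the following trace inequality:
\[ \operatorname{tr}\big( x x^T (A^T D + D A) + x y^T B^T D + x z^T C^T D + y x^T D B + z x^T D C \big) \ge 0. \tag{2.15} \]
Using the identity \(\operatorname{tr}(a b^T M) = b^T M a\) for vectors \(a, b\) and a matrix \(M\), the left-hand side of (2.15) equals \(2 x^T D A x + 2 x^T D B y + 2 x^T D C z\). Hence (2.15) is equivalent to
\[ x^T D A x + x^T D B y + x^T D C z \ge 0. \tag{2.16} \]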
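We note that \(B\), \(C\), and \(D\) have nonnegative entries and that \(x\) is a nonnegative vector. Additionally, for each \(j\), we have \(x_j \ge y_j\) and \(x_j \ge z_j\). Thus, we conclude that
\[ x^T D B x \ge x^T D B y \]
and
\[ x^T D C x \ge x^T D C z. \]
By this and (2.16), we can see that
\[ x^T D A x + x^T D B x + x^T D C x \ge x^T D A x + x^T D B y + x^T D C z \ge 0, \]
i.e.,
\[ x^T (D A + D B + D C) x \ge 0. \]
Since a scalar equals its own transpose, this last inequality implies that
\[ x^T (A^T D + D A + B^T D + D B + C^T D + D C) x = x^T \big( (A + B + C)^T D + D (A + B + C) \big) x \ge 0. \]
Since \(x\) is nonzero, this contradicts the fact that \(D\) satisfies the inequality in (2.2). Hence there exist \(Q \succ 0\) and \(R \succ 0\) in \(\mathbb{R}^{n \times n}\) satisfying (2.3). This completes the proof.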
We note that when C=0, Theorem 2.2 coincides with Theorem 3.1 in [28].
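To make Theorem 2.2 concrete, the sketch below builds a positive diagonal \(D\) for a Metzler Hurwitz matrix \(M = A + B + C\) and then verifies inequality (1.4) for a trial pair \(Q\), \(R\). The construction of \(D\) from positive vectors \(v\) and \(w\) with \(Mv < 0\) and \(M^T w < 0\) is a standard device for Metzler Hurwitz matrices and is an assumption of this illustration, not the argument used in the proof above. The choice \(Q = R = I\) is likewise only a trial that happens to work for this example; the theorem guarantees existence, not this particular choice.

```python
import numpy as np

def is_negative_definite(S, tol=1e-12):
    """True if the symmetric part of S has only negative eigenvalues."""
    S = np.asarray(S, dtype=float)
    return bool(np.max(np.linalg.eigvalsh((S + S.T) / 2)) < -tol)

def diagonal_lyapunov_solution(M):
    """For a Metzler Hurwitz M, return D = diag(w_i / v_i) with
    M v = -1 and M^T w = -1 (both v and w are then positive)."""
    ones = np.ones(M.shape[0])
    v = np.linalg.solve(M, -ones)
    w = np.linalg.solve(M.T, -ones)
    return np.diag(w / v)

def block_lmi(A, B, C, D, Q, R):
    """Left-hand side of inequality (1.4) with P = D."""
    Z = np.zeros_like(A)
    return np.block([
        [A.T @ D + D @ A + Q + R, D @ B, D @ C],
        [B.T @ D, -Q, Z],
        [C.T @ D, Z, -R],
    ])

# Illustrative data: A is Metzler, B and C are nonnegative.
A = np.array([[-4.0, 1.0], [1.0, -4.0]])
B = np.full((2, 2), 0.5)
C = 0.5 * np.eye(2)
M = A + B + C

D = diagonal_lyapunov_solution(M)
Q = R = np.eye(2)  # trial choice, not prescribed by the theorem
lmi = block_lmi(A, B, C, D, Q, R)
print(np.all(np.linalg.eigvals(M).real < 0),
      is_negative_definite(M.T @ D + D @ M),
      is_negative_definite(lmi))
```

For other data the trial \(Q = R = I\) may fail even though \(A + B + C\) is Hurwitz; in that case a semidefinite programming solver would be a natural way to search for \(Q\) and \(R\).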
In this paper, we have derived a characterization for the existence of diagonal solutions for a class of linear matrix inequalities. We considered systems in which \(A\) is a Metzler matrix and \(B\) and \(C\) are nonnegative matrices. Using a separation theorem, we proved that inequality (1.4) admits a positive diagonal solution \(D\), together with positive definite matrices \(Q\) and \(R\), if and only if the matrix \(A + B + C\) is Hurwitz. Our findings extend the current understanding of Lyapunov diagonal stability and provide a practical criterion for the stability of positive time-delay systems.
Our work broadens the scope of Lyapunov diagonal stability by providing a more comprehensive set of conditions under which stability can be ensured. This advancement is particularly relevant for applications in economics, population dynamics, and engineering, where systems often exhibit time delays and require robustness under nonnegativity constraints.
Future work can explore the possibility of developing similar characterizations for arbitrary matrices A, B, and C, without the restrictive condition that A is Metzler and B and C are nonnegative matrices. Such a generalization could open up new avenues for analysis in systems where these conditions do not hold. Additionally, another promising direction for future research is the development of further characterizations that parallel the results for Lyapunov diagonal stability, as seen in the works of [10] and [15]. These explorations could provide deeper insights into the stability of complex dynamical systems and enhance our understanding of the interplay between these inequalities and broader stability criteria.
We would like to express our sincere gratitude to the anonymous reviewers for their valuable and constructive feedback, which greatly enhanced the quality of this paper.
The researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2024-9/1).
The author does not have any conflict of interest.
[1] J. B. Du, W. J. Cheng, G. Y. Lu, H. T. Gao, X. L. Chu, Z. C. Zhang, et al., Resource pricing and allocation in MEC enabled blockchain systems: An A3C deep reinforcement learning approach, IEEE Trans. Network Sci. Eng., 9 (2022), 33–44. https://doi.org/10.1109/TNSE.2021.3068340
[2] H. X. Peng, X. M. Shen, Deep reinforcement learning based resource management for multi-access edge computing in vehicular networks, IEEE Trans. Network Sci. Eng., 7 (2021), 2416–2428. https://doi.org/10.1109/TNSE.2020.2978856
[3] D. C. Chen, X. L. Liu, W. W. Yu, Finite-time fuzzy adaptive consensus for heterogeneous nonlinear multi-agent systems, IEEE Trans. Network Sci. Eng., 7 (2021), 3057–3066. https://doi.org/10.1109/TNSE.2020.3013528
[4] J. Wang, Q. Wang, H. Wu, T. Huang, Finite-time consensus and finite-time H∞ consensus of multi-agent systems under directed topology, IEEE Trans. Network Sci. Eng., 7 (2020), 1619–1632. https://doi.org/10.1109/TNSE.2019.2943023
[5] T. Gao, T. Li, Y. J. Liu, S. Tong, IBLF-based adaptive neural control of state-constrained uncertain stochastic nonlinear systems, IEEE Trans. Neural Networks Learn. Syst., 33 (2022), 7345–7356. https://doi.org/10.1109/TNNLS.2021.3084820
[6] T. T. Gao, Y. J. Liu, D. P. Li, S. C. Tong, T. S. Li, Adaptive neural control using tangent time-varying BLFs for a class of uncertain stochastic nonlinear systems with full state constraints, IEEE Trans. Cybern., 51 (2021), 1943–1953. https://doi.org/10.1109/TCYB.2019.2906118
[7] P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, Ph. D. dissertation, Harvard University, 1974.
[8] Y. Tang, D. D. Zhang, P. Shi, W. B. Zhang, F. Qian, Event-based formation control for nonlinear multiagent systems under DOS attacks, IEEE Trans. Autom. Control., 66 (2021), 452–459. https://doi.org/10.1109/TAC.2020.2979936
[9] Y. Tang, X. T. Wu, P. Shi, F. Qian, Input-to-state stability for nonlinear systems with stochastic impulses, Automatica, 113 (2020), 0005–1098. https://doi.org/10.1016/j.automatica.2019.108766
[10] X. T. Wu, Y. Tang, J. D. Cao, X. R. Mao, Stability analysis for continuous-time switched systems with stochastic switching signals, IEEE Trans. Autom. Control., 63 (2018), 3083–3090. https://doi.org/10.1109/TAC.2017.2779882
[11] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, F. L. Lewis, Optimal and autonomous control using reinforcement learning: A survey, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2042–2062. https://doi.org/10.1109/TNNLS.2017.2773458
[12] V. Narayanan, S. Jagannathan, Event-triggered distributed control of nonlinear interconnected systems using online reinforcement learning with exploration, IEEE Trans. Cybern., 48 (2018), 2510–2519. https://doi.org/10.1109/TCYB.2017.2741342
[13] B. Luo, H. N. Wu, T. Huang, Off-policy reinforcement learning for H∞ control design, IEEE Trans. Cybern., 45 (2015), 65–76. https://doi.org/10.1109/TCYB.2014.2319577
[14] R. Song, F. L. Lewis, Q. Wei, Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero sum games, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 704–713. https://doi.org/10.1109/TNNLS.2016.2582849
[15] X. Yang, D. Liu, B. Luo, C. Li, Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning, Inf. Sci., 369 (2016), 731–747. https://doi.org/10.1016/j.ins.2016.07.051
[16] H. Zhang, K. Zhang, Y. Cai, J. Han, Adaptive fuzzy fault-tolerant tracking control for partially unknown systems with actuator faults via integral reinforcement learning method, IEEE Trans. Fuzzy Syst., 27 (2019), 1986–1998. https://doi.org/10.1109/TFUZZ.2019.2893211
[17] W. Bai, Q. Zhou, T. Li, H. Li, Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation, IEEE Trans. Cybern., 50 (2020), 3433–3443. https://doi.org/10.1109/TCYB.2019.2921057
[18] Y. Li, S. Tong, Adaptive neural networks decentralized FTC design for nonstrict-feedback nonlinear interconnected large-scale systems against actuator faults, IEEE Trans. Neural Networks Learn. Syst., 28 (2017), 2541–2554. https://doi.org/10.1109/TNNLS.2016.2598580
[19] Q. Chen, H. Shi, M. Sun, Echo state network-based backstepping adaptive iterative learning control for strict-feedback systems: An error-tracking approach, IEEE Trans. Cybern., 50 (2020), 3009–3022. https://doi.org/10.1109/TCYB.2019.2931877
[20] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. https://doi.org/10.1109/TFUZZ.2016.2540058
[21] W. Bai, T. Li, S. Tong, NN reinforcement learning adaptive control for a class of nonstrict-feedback discrete-time systems, IEEE Trans. Cybern., 50 (2020), 4573–4584. https://doi.org/10.1109/TCYB.2020.2963849
[22] Y. Li, K. Sun, S. Tong, Observer-based adaptive fuzzy fault-tolerant optimal control for SISO nonlinear systems, IEEE Trans. Cybern., 49 (2019), 649–661. https://doi.org/10.1109/TCYB.2017.2785801
[23] H. Modares, F. L. Lewis, M. B. Naghibi-Sistani, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 50 (2014), 193–202. https://doi.org/10.1016/j.automatica.2013.09.043
[24] Z. Wang, L. Liu, Y. Wu, H. Zhang, Optimal fault-tolerant control for discrete-time nonlinear strict-feedback systems based on adaptive critic design, IEEE Trans. Neural Networks Learn. Syst., 29 (2018), 2179–2191. https://doi.org/10.1109/TNNLS.2018.2810138
[25] H. Li, Y. Wu, M. Chen, Adaptive fault-tolerant tracking control for discrete-time multiagent systems via reinforcement learning algorithm, IEEE Trans. Cybern., 51 (2021), 1163–1174. https://doi.org/10.1109/TCYB.2020.2982168
[26] Y. J. Liu, L. Tang, S. Tong, C. L. P. Chen, D. J. Li, Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Trans. Neural Networks Learn. Syst., 26 (2015), 165–176. https://doi.org/10.1109/TNNLS.2014.2360724
[27] W. Bai, T. Li, Y. Long, C. L. P. Chen, Event-triggered multigradient recursive reinforcement learning tracking control for multiagent systems, IEEE Trans. Neural Networks Learn. Syst., 34 (2023), 355–379. https://doi.org/10.1109/TNNLS.2021.3094901
[28] H. Wang, G. H. Yang, A finite frequency domain approach to fault detection for linear discrete-time systems, Int. J. Control., 81 (2008), 1162–1171. https://doi.org/10.1080/00207170701691513
[29] C. Tan, G. Tao, R. Qi, A discrete-time parameter estimation based adaptive actuator failure compensation control scheme, Int. J. Control., 86 (2013), 276–289. https://doi.org/10.1080/00207179.2012.723828
[30] J. Na, X. Ren, G. Herrmann, Z. Qiao, Adaptive neural dynamic surface control for servo systems with unknown dead-zone, Control Eng. Pract., 19 (2011), 1328–1343. https://doi.org/10.1016/j.conengprac.2011.07.005
[31] Y. J. Liu, S. Li, S. Tong, C. L. P. Chen, Adaptive reinforcement learning control based on neural approximation for nonlinear discrete-time systems with unknown nonaffine dead-zone input, IEEE Trans. Neural Networks Learn. Syst., 30 (2019), 295–305. https://doi.org/10.1109/TNNLS.2018.2844165
[32] S. S. Ge, J. Zhang, T. H. Lee, Adaptive neural network control for a class of MIMO nonlinear systems with disturbances in discrete time, IEEE Trans. Syst., Man, Cybern. B, Cybern., 34 (2004), 1630–1645. https://doi.org/10.1109/TSMCB.2004.826827
[33] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. https://doi.org/10.1109/TFUZZ.2015.2418000
[34] S. S. Ge, G. Y. Li, T. H. Lee, Adaptive NN control for a class of strict-feedback discrete-time nonlinear systems, Automatica, 39 (2003), 807–819. https://doi.org/10.1016/S0005-1098(03)00032-3
[35] Q. Yang, S. Jagannathan, Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators, IEEE Trans. Syst., Man, Cybern. B, Cybern., 42 (2012), 377–390. https://doi.org/10.1109/TSMCB.2011.2166384
[36] Y. J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete time systems with dead-zone, IEEE Trans. Fuzzy Syst., 24 (2016), 16–28. https://doi.org/10.1109/TFUZZ.2015.2418000
[37] S. Ferrari, J. E. Steck, R. Chandramohan, Adaptive feedback control by constrained approximate dynamic programming, IEEE Trans. Syst., Man, Cybern. B, Cybern., 38 (2008), 982–987. https://doi.org/10.1109/TSMCB.2008.924140
[38] S. Tong, Y. Li, S. Sui, Adaptive fuzzy tracking control design for SISO uncertain nonstrict feedback nonlinear systems, IEEE Trans. Fuzzy Syst., 24 (2016), 1441–1454. https://doi.org/10.1109/TFUZZ.2016.2540058