Radial basis function neural networks (RBFNNs) of Hankel translates of order μ > −1/2 with varying widths, whose activation function σ is a.e. continuous and such that z^{−μ−1/2}σ(z) is locally essentially bounded and not an even polynomial, are shown to enjoy the universal approximation property (UAP) in appropriate spaces of continuous and integrable functions. In this way, the requirement that σ be continuous for this kind of network to achieve the UAP is weakened, and some results that hold true for RBFNNs of standard translates are extended to RBFNNs of Hankel translates.
Citation: Isabel Marrero. Relaxed conditions for universal approximation by radial basis function neural networks of Hankel translates[J]. AIMS Mathematics, 2025, 10(5): 10852-10865. doi: 10.3934/math.2025493
Many complex problems are nowadays modeled and solved by means of neural networks (NNs), which have become a fundamental tool in machine learning and artificial intelligence. While NNs admit many possible architectures, radial basis function neural networks (RBFNNs) may be classified as single hidden layer, feedforward nonlinear NNs. In fact, they consist of three sequential layers: the first or input layer, the last or output layer, and an intermediate one, referred to as the hidden layer. Information flows only in one direction, from the input layer to the output one. Each layer is composed of several nodes, which act as neurons in the network. Once an input is received by the neurons in the first layer, it is processed by the neurons in the hidden layer by means of a locally biased activation function, thus producing partial outputs that are linearly combined by the neurons in the last layer to render a final output. The nonlinearity of the model comes from the activation function, which, in the case of RBFNNs, is some radial kernel, often a Gaussian.
More specifically, given d ∈ N, an RBFNN is any function v: R^d → R expressible as
\[
v(x)=\sum_{i=1}^{N}w_i\,h\!\left(\frac{\|x-z_i\|}{\theta_i}\right),\tag{1.1}
\]
where h: [0,∞) → R represents the activation function; x ∈ R^d is the input; N ∈ N is the number of hidden-layer nodes; w = (w_1,…,w_N) ∈ R^N collects the weights, w_i connecting the i-th node to the output layer; and z_i ∈ R^d, θ_i > 0 respectively denote the centroid and width of the kernel at the i-th node (1 ≤ i ≤ N). The kernel widths can either remain uniform across all nodes or vary individually from node to node.
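To fix ideas, the following minimal sketch (ours, not from the paper) evaluates (1.1) for the Gaussian kernel h(r) = e^{−r²}, one common choice among many admissible activations; the function name rbfnn and all numerical values are illustrative only.

```python
import numpy as np

def rbfnn(x, weights, centroids, widths, h=lambda r: np.exp(-r**2)):
    """Evaluate v(x) = sum_i w_i * h(||x - z_i|| / theta_i), Eq. (1.1).

    x: point of R^d; weights, widths: length-N arrays;
    centroids: (N, d) array of the z_i. The Gaussian h is one admissible choice.
    """
    r = np.linalg.norm(np.asarray(x, dtype=float) - centroids, axis=1) / widths
    return np.dot(weights, h(r))

# A 3-node network on R^2 with varying widths theta_i:
v = rbfnn(x=[0.5, -0.2],
          weights=np.array([1.0, -0.5, 2.0]),
          centroids=np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]),
          widths=np.array([0.7, 1.2, 0.9]))
print(v)
```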
Soon after their introduction by Broomhead and Lowe [1] in the 1980s, RBFNNs were applied to supervised learning tasks like classification, pattern recognition, regression, and time series prediction [2,3]. Their theoretical appeal relies on their capacity of being dense in appropriate spaces of integrable or continuous functions, which, in NNs terminology, is referred to as the universal approximation property (UAP). A substantial corpus of literature has been devoted to studying this property in terms of the activation function h. For instance, Park and Sandberg [4,5] demonstrated that relatively soft conditions on h (such as being integrable with a nonzero integral, bounded, and a.e. continuous) are sufficient to guarantee this property in Lp(Rd) (1≤p<∞). Later on, Liao et al. [6] established that RBFNNs can uniformly approximate any continuous function provided that h is a.e. continuous, locally essentially bounded, and not a polynomial. Moreover, for 1≤p<∞, any function in an Lp space with respect to a finite measure can be approximated by some RBFNN with an essentially bounded activation function h that is not a polynomial. For further insights on p-mean approximation capabilities of RBFNNs, see [7] and references therein. Although the nonpolynomiality of h is clearly necessary, it has also been shown to suffice for other classes of networks to achieve the UAP [8,9].
The Hankel transformation, being particularly well-suited to handle radial functions, motivated Arteaga and Marrero [10] to propose and study a radial basis function (RBF) interpolation scheme where the interpolants are given by
\[
u(x)=\sum_{i=1}^{n}\alpha_i\,(\tau_{a_i}\phi)(x)+\sum_{j=0}^{m-1}\beta_j\,p_{\mu,j}(x)\qquad(x\in I).
\]
Here, I = (0,∞), ϕ is a complex basis function on I, μ ≥ −1/2, and τ_z = τ_{μ,z} stands for the operator of Hankel translation with order μ and symbol z ∈ I, while, for 1 ≤ i ≤ n and 0 ≤ j ≤ m−1, a_i ∈ I are the interpolation nodes, p_{μ,j}(x) = x^{2j+μ+1/2} are monomials of Müntz type, and α_i, β_j are complex coefficients.
Details on the Hankel transformation and its associated translation and convolution operators will be provided in Section 2 below, as the results in the present paper delve into this approach in the framework of NNs. In fact, by replacing the standard translation in (1.1) with the Hankel translation τ_z (z ∈ I), we arrive at the following definition.
Definition 1.1 ([11,12]). An RBFNN of Hankel translates is any real function v on I that can be expressed as
\[
v(x)=\sum_{i=1}^{N}w_i\,\tau_{z_i}(\lambda_{\sigma_i}\phi)(x)\qquad(x\in I),
\]
where ϕ is the activation function, N ∈ N is the number of nodes in the hidden layer, and w_i ∈ R stands for the weight from the i-th node to the output one, while z_i, σ_i ∈ I represent the centroid and width, respectively, of the i-th node (1 ≤ i ≤ N). Also, (λ_rϕ)(t) = ϕ(rt) (t ∈ I) is a homothety of ratio r ∈ I.
The class of all RBFNNs of Hankel translates will be denoted by S_1(ϕ) = S_{μ,1}(ϕ).
It should be remarked that the UAP of closely related structures (termed RBFNNs of Delsarte translates) was investigated by Arteaga and the author in a series of papers, beginning with [13]. By considering RBFNNs of Hankel (or Delsarte) translates, a new parameter μ is introduced, which provides the practitioner with a greater variety of manageable kernels. This might be useful in handling mathematical models built upon a class of RBFs depending on the order μ [14,15], as network performance can be improved just by finely tuning this extra parameter, without increasing the number of centroids. Indeed, numerical and graphical examples illustrating the effect of μ in the approximation of functions can be found in [12, Section 5].
Unless otherwise stated, henceforth we let μ>−1/2. The following function spaces are to be considered:
● L^∞_{μ,c} = z^{μ+1/2}L^∞([0,c], z^{2μ+1}dz) (c ∈ I). The usual norm of this space will be denoted by ‖·‖_{μ,∞,c}.
● L^∞_{μ,ℓ} is the space of functions belonging to L^∞_{μ,c} for all c ∈ I, topologized by the sequence of seminorms {‖·‖_{μ,∞,n}}_{n∈N}.
● C_{μ,c} (c ∈ I) is the space of functions u, continuous on (0,c], for which
\[
\lim_{z\to 0^+}z^{-\mu-1/2}u(z)\tag{1.2}
\]
exists and is finite, normed by ‖·‖_{μ,∞,c}. The correspondence u ↦ z^{−μ−1/2}u(z) sets up an isometric isomorphism between C_{μ,c} and the Banach space C[0,c] of the functions that are continuous on the interval [0,c], with the supremum norm. Therefore, C_{μ,c} is Banach, too.
● C_μ is the space of functions u, continuous on I, for which (1.2) exists and is finite. Topologized by the sequence of seminorms {‖·‖_{μ,∞,n}}_{n∈N}, C_μ becomes a Fréchet space.
In [12], Marrero proved the following: when ϕ ∈ C_μ, the class S_1(ϕ) is dense in C_μ if, and only if, ϕ ∉ π_μ, where
\[
\pi_\mu=\operatorname{span}\{t^{2r+\mu+1/2}:r\in\mathbb{N}_0\}.\tag{1.3}
\]
This generalizes to RBFNNs of Hankel translates a result of Pinkus [9, Theorem 12] for standard translates. Here we aim to extend the results in [6] to the Hankel setting as well: we will show that the density of S_1(ϕ) in C_μ (in the sense that the closure of S_1(ϕ) as a subspace of L^∞_{μ,ℓ} contains C_μ) can be achieved under relaxed conditions on ϕ, namely, membership in L^∞_{μ,ℓ}∖π_μ and a.e. continuity, instead of membership in C_μ.
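For orientation, here is a concrete instance (our illustration). Take μ = 1/2, so that
\[
\pi_{1/2}=\operatorname{span}\{t^{2r+1}:r\in\mathbb{N}_0\},
\]
the odd monomials; then σ(z) = z e^{−z²} satisfies z^{−1}σ(z) = e^{−z²}, which extends continuously to [0,∞) and is not an even polynomial, so σ ∈ C_{1/2}∖π_{1/2} and already falls under [12], whereas the relaxed hypotheses of the present paper additionally admit activations with, e.g., jump discontinuities.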
The structure and main results of the paper are as follows. After gathering in Section 2 the basic preliminaries on the translation and convolution operators associated with the Hankel transformation, the UAP is addressed. In Section 3, we recall from [12] the UAP for the case of activation functions in C_μ (Theorem 3.2), along with an auxiliary lemma, which gets slightly improved. In Section 4, the UAP for a.e. continuous activation functions in L^∞_{μ,ℓ} is established (Theorems 4.6 and 4.7). We remark that, in either case, nonpolynomiality of the activation function in the hidden layer, understood as exclusion from the class (1.3), plays a pivotal role.
Let μ ∈ R, let J_μ denote the Bessel function of the first kind and order μ, and set 𝒥_μ(z) = z^{1/2}J_μ(z) (z ∈ I). Whenever the involved integral exists, the Hankel transform of a function ϕ = ϕ(x) (x ∈ I) is typically defined as
\[
(h_\mu\phi)(x)=\int_0^\infty\phi(t)\,\mathscr{J}_\mu(xt)\,dt\qquad(x\in I).
\]
Zemanian extended the Hankel transformation to spaces of distributions by adapting the ideas that led Schwartz [16] to produce a distributional theory of the Fourier transformation. In fact, the Zemanian class H_μ [17,18] of all complex functions ϕ ∈ C^∞(I) such that
\[
\nu_{\mu,r}(\phi)=\max_{0\le k\le r}\,\sup_{x\in I}\bigl|(1+x^2)^r\,(x^{-1}D)^k\,x^{-\mu-1/2}\phi(x)\bigr|<\infty\qquad(r\in\mathbb{N}_0),
\]
where D=d/dx, plays in the Hankel transformation setting the same role as the Schwartz space of rapidly decreasing functions with respect to the Fourier transformation. When μ≥−1/2, the sequence of norms {νμ,r}r∈N0 makes Hμ into a Fréchet space, and hμ a self-isomorphism of Hμ. Hence, its adjoint h′μ is also a self-isomorphism of the dual H′μ when either its weak∗ or strong topologies are considered.
Zemanian [19] further introduced the class B_μ, which plays with respect to the Hankel transformation the same role as the test space of infinitely differentiable, compactly supported functions in the context of the Fourier transformation. Given a ∈ I, the space B_{μ,a} consists of all complex functions ϕ ∈ C^∞(I) satisfying ϕ(x) = 0 for x > a, and
\[
\delta_{\mu,r}(\phi)=\sup_{x\in I}\bigl|(x^{-1}D)^r\,x^{-\mu-1/2}\phi(x)\bigr|<\infty\qquad(r\in\mathbb{N}_0).
\]
Topologized by means of the seminorms {δμ,r}r∈N0, this space is Fréchet. The strict inductive limit Bμ of {Bμ,a}a∈I is a dense subspace of Hμ; consequently, its dual B′μ can be viewed as a superspace of H′μ.
Sousa Pinto [20] pioneered the study of the distributional Hankel convolution, although focusing on distributions of compact support, with μ = 0. Betancor and the author [21,22,23] subsequently extended this theory to wider distribution spaces for any μ > −1/2. The Hankel #-convolution of φ,ϕ ∈ H_μ is classically defined as
\[
(\varphi\#\phi)(x)=\int_0^\infty\varphi(y)\,(\tau_x\phi)(y)\,dy\qquad(x\in I),
\]
where
\[
(\tau_x\phi)(y)=\int_0^\infty\phi(z)\,D_\mu(x,y,z)\,dz\qquad(y\in I)\tag{2.1}
\]
is the Hankel translate of ϕ, with symbol x∈I. For x,y,z∈I, the nonnegative function
\[
D_\mu(x,y,z)=\int_0^\infty t^{-\mu-1/2}\,\mathscr{J}_\mu(xt)\,\mathscr{J}_\mu(yt)\,\mathscr{J}_\mu(zt)\,dt
=\begin{cases}
\dfrac{\bigl[z^2-(x-y)^2\bigr]^{\mu-1/2}\bigl[(x+y)^2-z^2\bigr]^{\mu-1/2}}{2^{3\mu-1}\pi^{1/2}\,\Gamma(\mu+1/2)\,(xyz)^{\mu-1/2}}, & |x-y|<z<x+y,\\[2ex]
0, & \text{otherwise},
\end{cases}
\]
occurring in (2.1) is known as the Delsarte kernel. It is symmetric in its variables and satisfies the duplication formula
\[
\int_0^\infty\mathscr{J}_\mu(zt)\,D_\mu(x,y,z)\,dz=t^{-\mu-1/2}\,\mathscr{J}_\mu(xt)\,\mathscr{J}_\mu(yt)\qquad(x,y,t\in I)
\]
along with the integrability property
\[
\int_0^\infty D_\mu(x,y,z)\,z^{\mu+1/2}\,dz=c_\mu^{-1}(xy)^{\mu+1/2}\qquad(x,y\in I),\tag{2.2}
\]
where c_μ = 2^μΓ(μ+1). In particular,
\[
(\tau_x\phi)(y)=(\tau_y\phi)(x)\qquad(\phi\in H_\mu,\;x,y\in I).
\]
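As a quick plausibility check, the closed form of D_μ and the identity (2.2) can be verified numerically. The sketch below is ours, not part of the paper; the helper D_mu and the sample values of μ, x, y are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def D_mu(x, y, z, mu):
    """Delsarte kernel D_mu(x, y, z); vanishes unless |x - y| < z < x + y."""
    if not abs(x - y) < z < x + y:
        return 0.0
    num = ((z**2 - (x - y)**2) * ((x + y)**2 - z**2))**(mu - 0.5)
    den = 2**(3*mu - 1) * np.sqrt(np.pi) * gamma(mu + 0.5) * (x*y*z)**(mu - 0.5)
    return num / den

mu, x, y = 0.75, 1.3, 0.8
c_mu = 2**mu * gamma(mu + 1)

# Left-hand side of (2.2): integrate over the support (|x - y|, x + y) only.
lhs, _ = quad(lambda z: D_mu(x, y, z, mu) * z**(mu + 0.5), abs(x - y), x + y)
rhs = (x*y)**(mu + 0.5) / c_mu
print(lhs, rhs)  # the two values agree to quadrature accuracy
```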
Other key results include the shifting formula
\[
h_\mu(\tau_y\phi)(x)=x^{-\mu-1/2}\,\mathscr{J}_\mu(xy)\,(h_\mu\phi)(x)\qquad(\phi\in H_\mu,\;x,y\in I),
\]
and the exchange formula
\[
h_\mu(\varphi\#\phi)(x)=x^{-\mu-1/2}\,(h_\mu\varphi)(x)\,(h_\mu\phi)(x)\qquad(\varphi,\phi\in H_\mu,\;x\in I).
\]
The translation operator extends to H′_μ by transposition. Given f ∈ H′_μ and ϕ ∈ H_μ, their Hankel convolution f#ϕ ∈ H′_μ is given by [23, Definition 3.1]
\[
(f\#\phi)(x)=\langle f,\tau_x\phi\rangle\qquad(x\in I).
\]
The shifting and exchange formulas
\[
h'_\mu(\tau_y f)(x)=x^{-\mu-1/2}\,\mathscr{J}_\mu(xy)\,(h'_\mu f)(x)
\]
and
\[
h'_\mu(f\#\phi)(x)=x^{-\mu-1/2}\,(h_\mu\phi)(x)\,(h'_\mu f)(x)
\]
are valid in the distributional sense (cf. [23, Proposition 3.5]). The interested reader is especially referred to [18,21,22,23] for a more extensive study of the generalized Hankel transformation and its associated translation and convolution.
Except for the a.e. pointwise convergence stated in part (ⅰ), the next lemma is contained in [12, Lemma 2.1].
Lemma 3.1. For z ∈ I and ϕ ∈ L^∞_{μ,ℓ}, let τ_zϕ be as in (2.1), and define
\[
(T_z\phi)(x)=\phi_z(x)=c_\mu z^{-\mu-1/2}(\tau_z\phi)(x)\qquad(x\in I).
\]
Then, the following holds:
(i) The function x ↦ (τ_zϕ)(x) is well defined and continuous on I. Both operators T_z and τ_z are linear and continuous from L^∞_{μ,ℓ} into itself. If, moreover, ϕ is a.e. continuous, then lim_{z→0^+} ϕ_z(x) = ϕ(x) for a.e. x ∈ I.
(ii) When restricted to C_μ, both T_z and τ_z define continuous linear operators into C_μ. Also, if ϕ ∈ C_μ, then lim_{z→0^+} ϕ_z = ϕ in C_μ.
Proof. As said above, it only remains to show that lim_{z→0^+} ϕ_z(x) = ϕ(x) for a.e. x ∈ I whenever ϕ ∈ L^∞_{μ,ℓ} is a.e. continuous, that is, whenever the set of its discontinuity points has measure zero.
Assume x ∈ I is a continuity point of ϕ; then, given any ε > 0, for some δ = δ(x,ε) > 0, the conditions t ∈ I and |t−x| < δ imply
\[
\bigl|t^{-\mu-1/2}\phi(t)-x^{-\mu-1/2}\phi(x)\bigr|<\varepsilon.
\]
Furthermore, if 0 < z < δ and t ∈ I with |t−x| ≥ δ > z, then D_μ(x,z,t) = 0. Thus, using (2.2), we may write
\[
\begin{aligned}
\bigl|x^{-\mu-1/2}\phi_z(x)-x^{-\mu-1/2}\phi(x)\bigr|
&=\bigl|c_\mu(xz)^{-\mu-1/2}(\tau_z\phi)(x)-x^{-\mu-1/2}\phi(x)\bigr|\\
&=\Bigl|c_\mu(xz)^{-\mu-1/2}\int_0^\infty\phi(t)\,D_\mu(x,z,t)\,dt
-c_\mu(xz)^{-\mu-1/2}x^{-\mu-1/2}\phi(x)\int_0^\infty D_\mu(x,z,t)\,t^{\mu+1/2}\,dt\Bigr|\\
&\le c_\mu(xz)^{-\mu-1/2}\int_{|t-x|<\delta}\bigl|t^{-\mu-1/2}\phi(t)-x^{-\mu-1/2}\phi(x)\bigr|\,D_\mu(x,z,t)\,t^{\mu+1/2}\,dt\\
&<\varepsilon\qquad(0<z<\delta),
\end{aligned}
\]
which settles the lemma.
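The limit just proved is easy to observe numerically. In the sketch below (ours; the choices of μ, ϕ, and x are arbitrary), ϕ_z(x) = c_μ z^{−μ−1/2}(τ_zϕ)(x) is computed by quadrature over the support (|x−z|, x+z) of the Delsarte kernel and approaches ϕ(x) as z → 0⁺.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

mu = 0.75
c_mu = 2**mu * gamma(mu + 1)

def D_mu(x, y, z):
    # Delsarte kernel, as in the Section 2 sketch; zero unless |x - y| < z < x + y.
    if not abs(x - y) < z < x + y:
        return 0.0
    num = ((z**2 - (x - y)**2) * ((x + y)**2 - z**2))**(mu - 0.5)
    den = 2**(3*mu - 1) * np.sqrt(np.pi) * gamma(mu + 0.5) * (x*y*z)**(mu - 0.5)
    return num / den

phi = lambda t: t**(mu + 0.5) * np.exp(-t)  # phi in C_mu: t^{-mu-1/2} phi(t) = e^{-t}

def phi_z(x, z):
    """(T_z phi)(x) = c_mu z^{-mu-1/2} (tau_z phi)(x)."""
    val, _ = quad(lambda t: phi(t) * D_mu(x, z, t), abs(x - z), x + z)
    return c_mu * z**(-mu - 0.5) * val

x = 1.0
for z in [0.5, 0.1, 0.01]:
    print(z, phi_z(x, z), phi(x))  # phi_z(x) -> phi(x) as z -> 0+
```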
We end this section with a main result from [12] and some comments about its proof.
Theorem 3.2 ([12, Theorem 3.3]). Let ϕ ∈ C_μ∖π_μ. Then, S_1(ϕ) = span{τ_s(λ_rϕ) : s,r ∈ I} ⊂ C_μ is dense in C_μ; i.e., for any f ∈ C_μ, c ∈ I, and ε > 0, some g ∈ S_1(ϕ) satisfies ‖f−g‖_{μ,∞,c} < ε.
Conversely, if ϕ∈πμ, then S1(ϕ) has finite dimension, which prevents it from being dense in Cμ.
Proof. The description of S1(ϕ) is clear. A proof of the converse part was given in [12, Theorem 2.5]; however, we include it here for completeness. Let
\[
S_\mu=x^{-\mu-1/2}\,D\,x^{2\mu+1}\,D\,x^{-\mu-1/2}
\]
denote the Bessel differential operator of order μ. Given m ∈ N_0, a distribution f ∈ H′_μ solves the differential equation S_μ^{m+1}f = 0 if, and only if, f ∈ π_μ and the degree of the even polynomial t^{−μ−1/2}f(t) is not greater than 2m [10, Theorem 2.19]. Assume ϕ ∈ π_μ and z^{−μ−1/2}ϕ(z) has degree 2m, so that S_μ^{m+1}ϕ = 0. The commutativity of S_μ with Hankel translations (cf. [24]), followed by a simple computation, yields
\[
S_\mu^{m+1}[\tau_s(\lambda_r\phi)]=r^{2(m+1)}\,\tau_s[\lambda_r(S_\mu^{m+1}\phi)]=0\qquad(s,r\in I).
\]
This means that S_1(ϕ) is contained in the solution space of S_μ^{m+1}f = 0 which, by the result quoted above, has dimension m+1. Being finite-dimensional and hence closed, S_1(ϕ) cannot be dense in infinite-dimensional spaces.
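As a sanity check (our illustration), consider the simplest case ϕ = p_{μ,0}, i.e., ϕ(t) = t^{μ+1/2} and m = 0: by (2.2),
\[
(\tau_s(\lambda_r\phi))(x)=r^{\mu+1/2}\int_0^\infty z^{\mu+1/2}\,D_\mu(s,x,z)\,dz=c_\mu^{-1}\,r^{\mu+1/2}(sx)^{\mu+1/2}\qquad(s,r\in I),
\]
so every generator of S_1(ϕ) is a multiple of x^{μ+1/2} and dim S_1(ϕ) = 1 = m+1.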
In this section, a series of lemmas will lead us to our main result. We begin with the following basic fact.
Lemma 4.1. Assume A ⊂ X_μ, where X_μ = L^∞_{μ,ℓ} or X_μ = C_μ, and let \(\overline{A}\), respectively \(\overline{A}^{\,c}\), denote the closure of A in the topology of X_μ, respectively in the norm of X_{μ,c}, where, for any c ∈ I, X_{μ,c} = L^∞_{μ,c} or X_{μ,c} = C_{μ,c}. Then,
\[
\overline{A}=\bigcap_{c\in I}\overline{A}^{\,c}.
\]
Proof. The inclusion map X_μ ↪ X_{μ,c} being continuous, it is evident that \(\overline{A}\subset\overline{A}^{\,c}\) for all c ∈ I.
Conversely, suppose \(g\in\overline{A}^{\,c}\) whenever c ∈ I. Then, in particular, for every n ∈ N, there exists g_n ∈ A such that ‖g−g_n‖_{μ,∞,n} < n^{−1}. Given b ∈ I and ε > 0, choose m ∈ N with m ≥ max{b, ε^{−1}}. We have
\[
\|g-g_n\|_{\mu,\infty,b}\le\|g-g_n\|_{\mu,\infty,m}\le\|g-g_n\|_{\mu,\infty,n}<\frac{1}{n}\le\frac{1}{m}\le\varepsilon\qquad(n\ge m).
\]
The arbitrariness of b ∈ I shows that lim_{n→∞} g_n = g in the topology of X_μ, so that \(g\in\overline{A}\).
Lemma 4.2. Let σ ∈ L^∞_{μ,ℓ} be a.e. continuous, and let b,c ∈ I. Then, given ρ ∈ B_{μ,b}, the convolution
\[
(\sigma\#\rho)(x)=\int_0^\infty(\tau_x\sigma)(t)\,\rho(t)\,dt\qquad(x\in I)\tag{4.1}
\]
lies in C_{μ,c} and can be approximated from span{τ_sσ : s ∈ I} in the norm of L^∞_{μ,c}. In other words, for any ρ ∈ B_μ, the convolution σ#ρ lies in C_μ and belongs to the closure of span{τ_sσ : s ∈ I} in L^∞_{μ,ℓ}.
Proof. It can be adapted from that of [12, Lemma 3.1]. Fix ρ∈Bμ,b. By virtue of Lemma 3.1(i), τxσ∈L∞μ,ℓ for each x∈I; consequently, the function (4.1) is well defined.
We begin by showing the continuity of σ#ρ on (0,c]. To this end, pick x_0 ∈ (0,c]. We have
\[
\begin{aligned}
|(\sigma\#\rho)(x)-(\sigma\#\rho)(x_0)|
&\le\int_0^\infty|(\tau_x\sigma)(z)-(\tau_{x_0}\sigma)(z)|\,|\rho(z)|\,dz\\
&\le b^{\mu+1/2}\int_0^b|(\tau_x\sigma)(z)-(\tau_{x_0}\sigma)(z)|\,\bigl|z^{-\mu-1/2}\rho(z)\bigr|\,dz\\
&\le b^{\mu+1/2}\sup_{z\in I}\bigl|z^{-\mu-1/2}\rho(z)\bigr|\int_0^b|(\tau_x\sigma)(z)-(\tau_{x_0}\sigma)(z)|\,dz\qquad(x\in(0,c]).
\end{aligned}
\]
Moreover, for each z∈(0,b], using (2.2) we may write
\[
\begin{aligned}
|(\tau_x\sigma)(z)-(\tau_{x_0}\sigma)(z)|
&\le\operatorname*{ess\,sup}_{t\in[0,b+c]}\bigl|t^{-\mu-1/2}\sigma(t)\bigr|\int_0^{b+c}|D_\mu(x,z,t)-D_\mu(x_0,z,t)|\,t^{\mu+1/2}\,dt\\
&\le c_\mu^{-1}z^{\mu+1/2}\bigl(x^{\mu+1/2}+x_0^{\mu+1/2}\bigr)\operatorname*{ess\,sup}_{t\in[0,b+c]}\bigl|t^{-\mu-1/2}\sigma(t)\bigr|\\
&\le 2c_\mu^{-1}(bc)^{\mu+1/2}\operatorname*{ess\,sup}_{t\in[0,b+c]}\bigl|t^{-\mu-1/2}\sigma(t)\bigr|\qquad(x\in(0,c]).
\end{aligned}
\]
Lemma 3.1(ⅰ) guarantees that
\[
\lim_{x\to x_0}|(\tau_x\sigma)(z)-(\tau_{x_0}\sigma)(z)|=0\qquad(z\in(0,b]).
\]
The desired continuity now follows from an application of the Lebesgue theorem of dominated convergence.
Similarly, because of Lemma 3.1(ⅰ), the estimate
\[
\begin{aligned}
\Bigl|c_\mu x^{-\mu-1/2}(\sigma\#\rho)(x)-\int_0^\infty\sigma(z)\rho(z)\,dz\Bigr|
&=\Bigl|\int_0^b c_\mu x^{-\mu-1/2}(\tau_x\sigma)(z)\,\rho(z)\,dz-\int_0^b\sigma(z)\rho(z)\,dz\Bigr|\\
&\le\int_0^b\bigl|c_\mu(xz)^{-\mu-1/2}(\tau_x\sigma)(z)-z^{-\mu-1/2}\sigma(z)\bigr|\,|\rho(z)|\,z^{\mu+1/2}\,dz\\
&=\int_0^b\bigl|z^{-\mu-1/2}\sigma_x(z)-z^{-\mu-1/2}\sigma(z)\bigr|\,\bigl|z^{-\mu-1/2}\rho(z)\bigr|\,z^{2\mu+1}\,dz\\
&\le\sup_{z\in I}\bigl|z^{-\mu-1/2}\rho(z)\bigr|\int_0^b\bigl|z^{-\mu-1/2}\sigma_x(z)-z^{-\mu-1/2}\sigma(z)\bigr|\,z^{2\mu+1}\,dz\qquad(x\in I),
\end{aligned}
\]
and dominated convergence, justified by the bound
\[
\begin{aligned}
\bigl|z^{-\mu-1/2}\sigma_x(z)-z^{-\mu-1/2}\sigma(z)\bigr|
&\le\bigl|z^{-\mu-1/2}\sigma_x(z)\bigr|+\bigl|z^{-\mu-1/2}\sigma(z)\bigr|\\
&\le\Bigl|c_\mu(xz)^{-\mu-1/2}\int_0^{b+x}D_\mu(x,z,t)\,\sigma(t)\,dt\Bigr|+\bigl|z^{-\mu-1/2}\sigma(z)\bigr|\\
&\le 2\operatorname*{ess\,sup}_{t\in[0,b+c]}\bigl|t^{-\mu-1/2}\sigma(t)\bigr|\qquad(x\in(0,c],\;z\in(0,b]),
\end{aligned}
\]
we arrive at
\[
\lim_{x\to 0^+}x^{-\mu-1/2}(\sigma\#\rho)(x)=c_\mu^{-1}\int_0^\infty\sigma(z)\rho(z)\,dz.
\]
Thus, σ#ρ∈Cμ,c.
Next, fix x ∈ (0,c]. For each n ∈ N, consider the partition {t_i = ib/n : 0 ≤ i ≤ n} of [0,b], and let ε > 0. The following estimate is easily obtained:
\[
\begin{aligned}
\Bigl|(\sigma\#\rho)(x)-\sum_{i=1}^n\frac{b\rho(t_i)}{n}\,(\tau_{t_i}\sigma)(x)\Bigr|
&\le\Bigl|\int_0^\infty(\tau_x\sigma)(t)\,\rho(t)\,dt-\sum_{i=1}^n\int_{t_{i-1}}^{t_i}t_i^{-\mu-1/2}(\tau_x\sigma)(t_i)\,t^{\mu+1/2}\rho(t)\,dt\Bigr|\\
&\quad+\Bigl|\sum_{i=1}^n\int_{t_{i-1}}^{t_i}t_i^{-\mu-1/2}(\tau_x\sigma)(t_i)\,t^{\mu+1/2}\rho(t)\,dt-\frac{b}{n}\sum_{i=1}^n(\tau_x\sigma)(t_i)\,\rho(t_i)\Bigr|.
\end{aligned}
\tag{4.2}
\]
As z^{2μ+1} and z^{−μ−1/2}ρ(z) are uniformly continuous on [0,b] (cf. [18, Lemma 5.2-1]), for large enough n, the second term on the right-hand side of (4.2) can be bounded by
\[
\begin{aligned}
\Bigl|\sum_{i=1}^n\int_{t_{i-1}}^{t_i}&t_i^{-\mu-1/2}(\tau_x\sigma)(t_i)\,t^{\mu+1/2}\rho(t)\,dt-\frac{b}{n}\sum_{i=1}^n(\tau_x\sigma)(t_i)\,\rho(t_i)\Bigr|\\
&\le x^{\mu+1/2}c_\mu^{-1}\operatorname*{ess\,sup}_{z\in[0,b+c]}\bigl|z^{-\mu-1/2}\sigma(z)\bigr|\sum_{i=1}^n\int_{t_{i-1}}^{t_i}\bigl|t^{\mu+1/2}\rho(t)-t_i^{\mu+1/2}\rho(t_i)\bigr|\,dt\\
&\le x^{\mu+1/2}c_\mu^{-1}\operatorname*{ess\,sup}_{z\in[0,b+c]}\bigl|z^{-\mu-1/2}\sigma(z)\bigr|\\
&\qquad\times\sum_{i=1}^n\int_{t_{i-1}}^{t_i}\Bigl[\sup_{t\in I}\bigl|t^{-\mu-1/2}\rho(t)\bigr|\,\bigl|t^{2\mu+1}-t_i^{2\mu+1}\bigr|+\bigl|t^{-\mu-1/2}\rho(t)-t_i^{-\mu-1/2}\rho(t_i)\bigr|\,t_i^{2\mu+1}\Bigr]\,dt\\
&<\frac{x^{\mu+1/2}\varepsilon}{2}.
\end{aligned}
\tag{4.3}
\]
Concerning the first term on the right-hand side of (4.2), recall that σ is a.e. continuous and note that the representation (2.1), jointly with Lemma 3.1, renders the map (x,t) ↦ (xt)^{−μ−1/2}(τ_xσ)(t) continuous on (I∖U)×[0,∞), where U is some open set containing the points of discontinuity of σ, with measure less than a given λ > 0. Therefore, this map is uniformly continuous over compacta: to every α,β > 0, there corresponds N ∈ N, independent of x ∈ [α,c]∖U, such that n ≥ N implies
\[
\bigl|(xt)^{-\mu-1/2}(\tau_x\sigma)(t)-(xt_i)^{-\mu-1/2}(\tau_x\sigma)(t_i)\bigr|<\beta\qquad(t\in[t_{i-1},t_i],\;1\le i\le n).
\]
In particular, given α,η>0, we may arrange for
\[
\begin{aligned}
\Bigl|\int_0^\infty(\tau_x\sigma)(t)\,\rho(t)\,dt&-\sum_{i=1}^n\int_{t_{i-1}}^{t_i}t_i^{-\mu-1/2}(\tau_x\sigma)(t_i)\,t^{\mu+1/2}\rho(t)\,dt\Bigr|\\
&\le\sum_{i=1}^n\int_{t_{i-1}}^{t_i}\bigl|t^{-\mu-1/2}(\tau_x\sigma)(t)-t_i^{-\mu-1/2}(\tau_x\sigma)(t_i)\bigr|\,\bigl|t^{-\mu-1/2}\rho(t)\bigr|\,t^{2\mu+1}\,dt\\
&\le x^{\mu+1/2}\sup_{t\in I}\bigl|t^{-\mu-1/2}\rho(t)\bigr|\sum_{i=1}^n\int_{t_{i-1}}^{t_i}\bigl|(xt)^{-\mu-1/2}(\tau_x\sigma)(t)-(xt_i)^{-\mu-1/2}(\tau_x\sigma)(t_i)\bigr|\,t^{2\mu+1}\,dt\\
&<x^{\mu+1/2}\eta\qquad(x\in[\alpha,c]\setminus U),
\end{aligned}
\tag{4.4}
\]
provided that n is large enough. This way, given η,δ > 0, there exists N ∈ N such that, whenever n ≥ N, the measure of the set of points x ∈ (0,c] for which the left-hand side of (4.4), weighted by x^{−μ−1/2}, is greater than or equal to η does not exceed δ; that is, the sequence of such measures converges to zero or, in other words, the corresponding functional sequence converges to zero in measure. By passing to a subsequence if necessary, a.e. convergence is achieved; thus, we obtain
\[
\Bigl|\int_0^\infty(\tau_x\sigma)(t)\,\rho(t)\,dt-\sum_{i=1}^n\int_{t_{i-1}}^{t_i}t_i^{-\mu-1/2}(\tau_x\sigma)(t_i)\,t^{\mu+1/2}\rho(t)\,dt\Bigr|<\frac{x^{\mu+1/2}\varepsilon}{2}
\tag{4.5}
\]
for a.e. x∈[0,c] and sufficiently large n. A combination of (4.2), (4.3), and (4.5) results in the estimate
\[
\Bigl\|\sigma\#\rho-\sum_{i=1}^n\frac{b\rho(t_i)}{n}\,\tau_{t_i}\sigma\Bigr\|_{\mu,\infty,c}
=\operatorname*{ess\,sup}_{x\in[0,c]}\Bigl|x^{-\mu-1/2}(\sigma\#\rho)(x)-x^{-\mu-1/2}\sum_{i=1}^n\frac{b\rho(t_i)}{n}\,(\tau_{t_i}\sigma)(x)\Bigr|<\varepsilon,
\]
valid for all sufficiently large n. This accomplishes the first part of the proof.
Now, for any ρ ∈ B_μ, we have that σ#ρ ∈ C_μ lies in the closure of span{τ_sσ : s ∈ I} in L^∞_{μ,c} whenever c ∈ I. Since, by Lemma 3.1(i), span{τ_sσ : s ∈ I} ⊂ L^∞_{μ,ℓ}, a direct application of Lemma 4.1 reveals that σ#ρ belongs to the closure of span{τ_sσ : s ∈ I} in L^∞_{μ,ℓ}. The proof is complete.
Remark 4.3. Observe that, in the notation and conditions of Lemma 4.2, both
\[
\Bigl\{\sum_{i=1}^n\frac{b\rho(t_i)}{n}\,\tau_{t_i}\sigma\Bigr\}_{n\in\mathbb{N}}
\]
and
\[
\Bigl\{\sum_{i=1}^n\Bigl[t_i^{-\mu-1/2}\int_{t_{i-1}}^{t_i}t^{\mu+1/2}\rho(t)\,dt\Bigr]\tau_{t_i}\sigma\Bigr\}_{n\in\mathbb{N}}
\]
are approximating sequences to σ#ρ from span{τ_sσ : s ∈ I}.
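The first of these sequences is straightforward to test numerically. The sketch below is ours and not part of the paper: σ (with a jump at t = 1), the bump ρ ∈ B_{μ,1}, and all parameter values are illustrative choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

mu, b = 0.75, 1.0  # rho below is supported in [0, b]

def D_mu(x, y, z):
    # Delsarte kernel, as in the Section 2 sketch; zero unless |x - y| < z < x + y.
    if not abs(x - y) < z < x + y:
        return 0.0
    num = ((z**2 - (x - y)**2) * ((x + y)**2 - z**2))**(mu - 0.5)
    den = 2**(3*mu - 1) * np.sqrt(np.pi) * gamma(mu + 0.5) * (x*y*z)**(mu - 0.5)
    return num / den

# An a.e. continuous activation (jump at t = 1), locally in z^{mu+1/2} L^infty:
sigma = lambda t: t**(mu + 0.5) * (1.0 if t < 1.0 else 0.5)

# A test function in B_{mu,b}: t^{mu+1/2} times a smooth bump in t^2.
rho = lambda t: t**(mu + 0.5) * np.exp(-1.0/(1.0 - t**2)) if t < b else 0.0

def tau_sigma(s, x):
    """(tau_s sigma)(x), integrating over the kernel's support (|s - x|, s + x)."""
    val, _ = quad(lambda t: sigma(t) * D_mu(s, x, t), abs(s - x), s + x, limit=200)
    return val

x = 0.8

# Reference value of (sigma # rho)(x) by an outer quadrature over [0, b].
ref, _ = quad(lambda t: tau_sigma(x, t) * rho(t), 0.0, b, limit=200)

# Riemann-sum networks of Remark 4.3: sum_i (b rho(t_i)/n) (tau_{t_i} sigma)(x).
for n in [10, 40, 160]:
    ti = b * np.arange(1, n + 1) / n
    net = sum(b * rho(t) / n * tau_sigma(t, x) for t in ti)
    print(n, net, ref)  # the network values approach the reference as n grows
```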
Lemma 4.4. Assume σ ∈ L^∞_{μ,ℓ} is a.e. continuous and does not lie in π_μ. Then, there is some ρ ∈ B_μ such that σ#ρ does not lie in π_μ either.
Proof. Lemma 4.2 allows us to argue as in the proof of [12, Lemma 3.2].
Lemma 4.5. If σ ∈ L^∞_{μ,ℓ}, ρ ∈ B_μ, and a ∈ I, then τ_a(σ#ρ) = σ#(τ_aρ).
Proof. Defined as in (4.1), the convolution σ#τaρ makes sense, because Bμ is stable under Hankel translations [21, Corollary 3.3].
Let b ∈ I be such that ρ(t) = 0 for t > b. There holds
\[
\begin{aligned}
\int_0^\infty D_\mu(a,x,z)\,dz&\int_0^\infty|\rho(s)|\,ds\int_0^\infty|\sigma(t)|\,D_\mu(z,s,t)\,dt\\
&\le\int_0^{x+a}D_\mu(a,x,z)\,dz\int_0^b|\rho(s)|\,ds\int_0^{x+a+b}|\sigma(t)|\,D_\mu(z,s,t)\,dt\\
&\le\sup_{s\in I}\bigl|s^{-\mu-1/2}\rho(s)\bigr|\int_0^\infty D_\mu(a,x,z)\,dz\int_0^{x+a+b}|\sigma(t)|\,dt\int_0^\infty D_\mu(z,s,t)\,s^{\mu+1/2}\,ds\\
&=c_\mu^{-1}\sup_{s\in I}\bigl|s^{-\mu-1/2}\rho(s)\bigr|\int_0^\infty D_\mu(a,x,z)\,z^{\mu+1/2}\,dz\int_0^{x+a+b}\bigl|t^{-\mu-1/2}\sigma(t)\bigr|\,t^{2\mu+1}\,dt\\
&\le c_\mu^{-2}(ax)^{\mu+1/2}\operatorname*{ess\,sup}_{t\in[0,x+a+b]}\bigl|t^{-\mu-1/2}\sigma(t)\bigr|\,\sup_{s\in I}\bigl|s^{-\mu-1/2}\rho(s)\bigr|\int_0^{x+a+b}t^{2\mu+1}\,dt<\infty\qquad(x\in I).
\end{aligned}
\]
Thus, the Fubini theorem may be applied to obtain
\[
\begin{aligned}
\tau_a(\sigma\#\rho)(x)&=\int_0^\infty(\sigma\#\rho)(z)\,D_\mu(a,x,z)\,dz
=\int_0^\infty D_\mu(a,x,z)\,dz\int_0^\infty\rho(s)\,ds\int_0^\infty\sigma(t)\,D_\mu(z,s,t)\,dt\\
&=\int_0^\infty\sigma(t)\,dt\int_0^\infty\rho(s)\,ds\int_0^\infty D_\mu(a,x,z)\,D_\mu(z,s,t)\,dz\\
&=\int_0^\infty\sigma(t)\,dt\int_0^\infty\rho(s)\,ds\int_0^\infty D_\mu(a,z,s)\,D_\mu(x,z,t)\,dz\\
&=\int_0^\infty dz\int_0^\infty\sigma(t)\,D_\mu(x,z,t)\,dt\int_0^\infty\rho(s)\,D_\mu(a,z,s)\,ds\\
&=\int_0^\infty(\tau_x\sigma)(z)\,(\tau_a\rho)(z)\,dz=(\sigma\#\tau_a\rho)(x)\qquad(x\in I),
\end{aligned}
\]
as claimed.
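The kernel interchange used above rests on the product identity
\[
\int_0^\infty D_\mu(a,x,z)\,D_\mu(z,s,t)\,dz=\int_0^\infty D_\mu(a,z,s)\,D_\mu(x,z,t)\,dz,
\]
which, for completeness, can be checked as follows (our sketch): expanding D_μ(z,s,t) by its defining integral and applying the duplication formula to the z-integral,
\[
\int_0^\infty D_\mu(a,x,z)\,D_\mu(z,s,t)\,dz=\int_0^\infty u^{-2\mu-1}\,\mathscr{J}_\mu(au)\,\mathscr{J}_\mu(xu)\,\mathscr{J}_\mu(su)\,\mathscr{J}_\mu(tu)\,du,
\]
an expression invariant under the interchange of x and s; the claim then follows from the symmetry of D_μ in its arguments.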
Theorem 4.6. Let σ ∈ L^∞_{μ,ℓ}∖π_μ be a.e. continuous. Then,
\[
S_1(\sigma)=\operatorname{span}\{\tau_s(\lambda_r\sigma):s,r\in I\}\subset L^\infty_{\mu,\ell}
\]
is dense in C_μ; i.e., for any f ∈ C_μ, c ∈ I, and ε > 0, some g ∈ S_1(σ) satisfies ‖f−g‖_{μ,∞,c} < ε.
Conversely, if σ∈πμ, then S1(σ) has finite dimension, which prevents it from being dense in Cμ.
Proof. The converse statement is contained in Theorem 3.2.
For the direct one, use Lemmas 4.2 and 4.4 to get some ρ ∈ B_μ such that σ#ρ ∈ C_μ∖π_μ. The identity
\[
\lambda_r(\tau_q\sigma)=r^{\mu+1/2}\,\tau_{q/r}(\lambda_r\sigma)\qquad(r,q\in I)\tag{4.6}
\]
can be derived by simple changes of variables, as sketched below.
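Indeed (our reconstruction of that computation), the explicit formula for the Delsarte kernel shows that D_μ(rx,ry,rz) = r^{μ−1/2}D_μ(x,y,z), so the substitution z = rt gives
\[
\lambda_r(\tau_q\sigma)(x)=(\tau_q\sigma)(rx)=\int_0^\infty\sigma(z)\,D_\mu(q,rx,z)\,dz
=r\int_0^\infty(\lambda_r\sigma)(t)\,D_\mu\bigl(r(q/r),rx,rt\bigr)\,dt
=r^{\mu+1/2}\,\tau_{q/r}(\lambda_r\sigma)(x).
\]
A combination of Theorem 3.2 with (4.6) and Lemma 4.5 yields the density of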
\[
S_1(\sigma\#\rho)=\operatorname{span}\{\lambda_r(\sigma\#\tau_q\rho):r,q\in I\}
\]
in C_μ. Recalling that B_μ is stable under Hankel translations, invoke Lemma 4.2 again, this time to approximate σ#τ_qρ from span{τ_sσ : s ∈ I} in the topology of L^∞_{μ,ℓ}. After a new application of (4.6), we are done.
As a consequence of Theorem 4.6, the hypotheses imposed on the activation function in [12, Theorem 4.1] can be weakened.
Theorem 4.7. Let σ ∈ L^∞_{μ,ℓ} be a.e. continuous, and let 1 ≤ p < ∞. Given c ∈ I, let γ be a Radon measure on [0,c] satisfying
\[
\int_0^c t^{\mu+1/2}\,d|\gamma|(t)<\infty.
\]
Then, for S_1(σ) = span{τ_s(λ_rσ) : s,r ∈ I} to be dense in L^p([0,c],dγ), it is necessary and sufficient that σ ∉ π_μ.
Proof. If σ∈πμ then, as shown above, S1(σ) has finite dimension, which prevents it from being dense in Lp([0,c],dγ).
Conversely, if σ∉πμ then, from Theorem 4.6, S1(σ) is dense in Cμ,c, and hence in Lp([0,c],dγ).
The universal approximation property (UAP) of three-layered radial basis function neural networks of Hankel translates with varying widths has been studied. The requirement on the activation function σ in the hidden layer for such networks to approximate continuous functions locally in the esssup-norm has been satisfactorily weakened from continuity to local essential boundedness and a.e. continuity, provided that z^{−μ−1/2}σ(z) (z ∈ I) is not an even polynomial. The UAP in p-mean (1 ≤ p < ∞) with respect to a suitable finite measure can therefore be attained under the same relaxed condition.
The author declares she has not used Artificial Intelligence (AI) tools in the creation of this article.
The author wants to express her gratitude to the anonymous reviewers for valuable comments that helped improve the presentation of the paper.
There is no conflict of interest to disclose.
[1] D. S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Syst., 2 (1988), 321–355.
[2] R. P. Lippmann, Pattern classification using neural networks, IEEE Commun. Mag., 27 (1989), 47–64. https://doi.org/10.1109/35.41401
[3] S. Renals, R. Rohwer, Phoneme classification experiments using radial basis functions, International 1989 Joint Conference on Neural Networks, Washington DC (USA), 1989, 461–467. https://doi.org/10.1109/IJCNN.1989.118620
[4] J. Park, I. W. Sandberg, Universal approximation using Radial-Basis-Function networks, Neural Comput., 3 (1991), 246–257. https://doi.org/10.1162/neco.1991.3.2.246
[5] J. Park, I. W. Sandberg, Approximation and radial-basis-function networks, Neural Comput., 5 (1993), 305–316. https://doi.org/10.1162/neco.1993.5.2.305
[6] Y. Liao, S. C. Fang, H. L. W. Nuttle, Relaxed conditions for radial-basis function networks to be universal approximators, Neural Netw., 16 (2003), 1019–1028. https://doi.org/10.1016/S0893-6080(02)00227-7
[7] D. Nan, W. Wu, J. L. Long, Y. M. Ma, L. J. Sun, L^p approximation capability of RBF neural networks, Acta Math. Sin.-Engl. Ser., 24 (2008), 1533–1540. https://doi.org/10.1007/s10114-008-6423-x
[8] M. Leshno, V. Y. Lin, A. Pinkus, S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Netw., 6 (1993), 861–867. https://doi.org/10.1016/S0893-6080(05)80131-5
[9] A. Pinkus, TDI-subspaces of C(R^d) and some density problems from neural networks, J. Approx. Theory, 85 (1996), 269–287. https://doi.org/10.1006/jath.1996.0042
[10] C. Arteaga, I. Marrero, A scheme for interpolation by Hankel translates of a basis function, J. Approx. Theory, 164 (2012), 1540–1576. https://doi.org/10.1016/j.jat.2012.08.005
[11] I. Marrero, The role of nonpolynomiality in uniform approximation by RBF networks of Hankel translates, J. Funct. Spaces, 2019 (2019), 1845491. https://doi.org/10.1155/2019/1845491
[12] I. Marrero, Radial basis function neural networks of Hankel translates as universal approximators, Anal. Appl. (Singap.), 17 (2019), 897–930. https://doi.org/10.1142/S0219530519500064
[13] C. Arteaga, I. Marrero, Universal approximation by radial basis function networks of Delsarte translates, Neural Netw., 46 (2013), 299–305. https://doi.org/10.1016/j.neunet.2013.06.011
[14] H. Corrada, K. Lee, B. Klein, R. Klein, S. Iyengar, G. Wahba, Examining the relative influence of familial, genetic, and environmental covariate information in flexible risk models, Proc. Natl. Acad. Sci. USA, 106 (2009), 8128–8133. https://doi.org/10.1073/pnas.0902906106
[15] S. Hamzehei Javaran, N. Khaji, A. Noorzad, First kind Bessel function (J-Bessel) as radial basis function for plane dynamic analysis using dual reciprocity boundary element method, Acta Mech., 218 (2011), 247–258. https://doi.org/10.1007/s00707-010-0421-7
[16] L. Schwartz, Théorie des distributions, Vols. I, II, Publications de l'Institut de Mathématique de l'Université de Strasbourg, Paris: Hermann & Cie, 1950–1951.
[17] A. H. Zemanian, A distributional Hankel transformation, SIAM J. Appl. Math., 14 (1966), 561–576. https://doi.org/10.1137/0114049
[18] A. H. Zemanian, Generalized integral transformations, Pure and Applied Mathematics, Vol. 18, New York: John Wiley & Sons, 1968.
[19] A. H. Zemanian, The Hankel transformation of certain distributions of rapid growth, SIAM J. Appl. Math., 14 (1966), 678–690. https://doi.org/10.1137/0114056
[20] J. de Sousa Pinto, A generalised Hankel convolution, SIAM J. Math. Anal., 16 (1985), 1335–1346. https://doi.org/10.1137/0516097
[21] J. J. Betancor, I. Marrero, The Hankel convolution and the Zemanian spaces B_μ and B′_μ, Math. Nachr., 160 (1993), 277–298. https://doi.org/10.1002/mana.3211600113
[22] J. J. Betancor, I. Marrero, Structure and convergence in certain spaces of distributions and the generalized Hankel convolution, Math. Japon., 38 (1993), 1141–1155.
[23] I. Marrero, J. J. Betancor, Hankel convolution of generalized functions, Rend. Mat. Ser. VII, 15 (1995), 351–380.
[24] J. J. Betancor, A new characterization of the bounded operators commuting with Hankel translation, Arch. Math., 69 (1997), 403–408. https://doi.org/10.1007/s000130050138