In predictive modeling, addressing class imbalance is a critical concern, particularly in applications where certain classes are disproportionately represented. This study delved into the implications of class imbalance on the interpretability of the random forest models. Class imbalance is a common challenge in machine learning, particularly in domains where certain classes are under-represented. This study investigated the impact of class imbalance on random forest model performance in churn and fraud detection scenarios. We trained and evaluated random forest models on churn datasets with class imbalances ranging from 20% to 50% and fraud datasets with imbalances from 1% to 15%. The results revealed consistent improvements in the precision, recall, F1-score, and accuracy as class imbalance decreases, indicating that models become more precise and accurate in identifying rare events with balanced datasets. Additionally, we employed interpretability techniques such as Shapley values, partial dependence plots (PDPs), and breakdown plots to elucidate the effect of class imbalance on model interpretability. Shapley values showed varying feature importance across different class distributions, with a general decrease as datasets became more balanced. PDPs illustrated a consistent upward trend in estimated values as datasets approached balance, indicating consistent relationships between input variables and predicted outcomes. Breakdown plots highlighted significant changes in individual predictions as class imbalance varied, underscoring the importance of considering class distribution in interpreting model outputs. These findings contribute to our understanding of the complex interplay between class balance, model performance, and interpretability, offering insights for developing more robust and reliable predictive models in real-world applications.
Citation: Lindani Dube, Tanja Verster. Interpretability of the random forest model under class imbalance[J]. Data Science in Finance and Economics, 2024, 4(3): 446-468. doi: 10.3934/DSFE.2024019
Transport-type equations arise ubiquitously in the physical, biological and social sciences (e.g., see [1,2,3]). They have recently been used, for example, to approximate the dynamics of opinion formation [3] (see also [4] and [5]), to describe flow on networks (see [6,7,8]) and to model the dynamics of structured populations [9]. Because the space of measures is a natural setting for these equations, as it unifies discrete and continuous dynamics under the same framework, researchers have recently focused on studying the well-posedness of such equations on this space [10,11,12], thereby generalizing previous results that treated these equations in the space of integrable functions (e.g., [1]).
Understanding the differentiability of solutions of differential equation models with respect to parameters is crucial for many applications, including optimal control (e.g., [13,14]), parameter estimation and least-squares problems of fitting models to data [15,16], and sensitivity analysis of solutions with respect to model parameters, which can be used to obtain information on parameter uncertainty, including confidence intervals for estimated model parameters (e.g., [17,18,19]). Such applications require the minimization of a functional that depends on the model solution, and hence it often becomes necessary to solve (numerically) for the critical points of the equation that represents the derivative of the solutions with respect to the parameter.
In this paper, we focus on deriving an equation that represents the derivative of a transport equation with respect to the vector field. To this end, consider the following transport equation in the space of bounded, nonnegative Radon measures \mathcal{M}^+(\mathbb{R}^d) :
\partial_t\mu_t + \partial_x(v(x)\mu_t) = 0 | (1.1) |
where \mu_t:[0, T]\to \mathcal{M}^+(\mathbb{R}^d) and v:\mathbb{R}^d\to\mathbb{R}^d is a given vector field. Equation (1.1) is equipped with the initial condition \mu_{|t = 0} = \mu_0 . It is well-known that if v\in W^{1, \infty}(\mathbb{R}^d) , this equation has a unique solution in C([0, +\infty), \mathcal{M}^+(\mathbb{R}^d)) given by \mu_t = T_t^{\#}\mu_0 where T_t is the flow of v (defined in (2.2)) and T_t^{\#} denotes the push-forward along the map T_t (see Eq (2.5)). Here, the space of measures is endowed with the so-called bounded Lipschitz norm \|\cdot\|_{BL^*} (see Eq (2.1)).
Here, we focus on the regularity of \mu_t with respect to v , i.e., if v is slightly perturbed, how will \mu_t change? To be more precise, suppose v(x) is replaced with the new vector field v^h(x): = v_0(x)+hv_p(x) where v_0 and v_p are fixed vector fields and h can vary. The perturbed equation is then
\partial_t\mu_t^h + \partial_x(v^h(x)\mu_t^h) = 0 | (1.2) |
which has the unique solution \mu_t^h = (T_t^h)^{\#}\mu_0 where T_t^h is the flow of the vector field v^h . It is easy to see, using the representation formula for solutions to (1.2) presented in [20] or [21] (Eq 1.3) and estimates similar to the ones used to prove Lemma 3.8 in [22], that the map h\mapsto \mu^h is Lipschitz continuous in C([0, T], \mathcal{M}^+(\mathbb{R}^d)) so that in particular \lim_{h\to 0}\mu^h = \mu in C([0, T], \mathcal{M}^+(\mathbb{R}^d)) for any T > 0 (see also Eq (2.7)).
The next step in understanding the regularity of h\mapsto \mu_t^h consists in studying the existence of the derivative \partial_h\mu^h . This type of question has recently been addressed in [23] for the linear transport equation and in [24] for general nonlinear structured population models (including the transport equation). Briefly, denoting by \rho_{t, h}^{\Delta h}: = (\mu_t^{h+\Delta h}-\mu_t^h)/\Delta h the difference quotient, the question is to give a precise mathematical meaning to the limit \lim_{\Delta h\to 0}\rho_{t, h}^{\Delta h} . It turns out that this type of problem cannot be answered in the framework of the bounded Lipschitz norm (see Example 3.5 in [24]). Indeed, it is necessary to move to the bigger space Z defined as the closure of \mathcal{M}(\mathbb{R}^d) in (C^{1, \alpha}(\mathbb{R}^d))^* , endowed with the dual norm (see Section 2.3 for a brief introduction). Then, according to Theorem 1.1 in [23], one can prove that there exists \rho_{t, h}\in Z such that \lim_{\Delta h\to 0}\rho_{t, h}^{\Delta h} = \rho_{t, h} (see also Theorem 2.1 below).
In this paper we want to characterize \rho as the unique solution to some equation. In fact, one of the main results in this work (see Theorem 4.1 below) states that \rho is the unique solution to the equation
\partial_t\rho_t + \partial_x(v_0(x)\rho_t) = -\partial_x(v_p(x)\mu_t). |
This equation can then be thought of as the sensitivity equation satisfied by the directional derivative of \mu under the perturbation v_0+hv_p . We will also prove an analogous result in the nonlinear case when the vector field v depends on \mu (see Theorem 5.1 below).
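As a sanity check, the sensitivity equation can be verified by hand in the simplest possible setting. The following worked example is only an illustration and is not taken from the paper: it assumes d = 1 , constant vector fields v_0\equiv a and v_p\equiv b , and a single Dirac mass \mu_0 = \delta_{x_0} as initial condition.

```latex
% Illustrative assumptions: d = 1, v_0 \equiv a, v_p \equiv b, \mu_0 = \delta_{x_0}.
\begin{align*}
T_t^h(x) &= x + (a + hb)\,t, \qquad \mu_t^h = (T_t^h)^{\#}\delta_{x_0} = \delta_{x_0 + (a+hb)t}, \\
(\rho_t, \phi) &= \frac{d}{dh}\Big|_{h=0} \phi\big(x_0 + (a+hb)t\big) = bt\,\phi'(x_0 + at),
\qquad\text{i.e.}\qquad \rho_t = -bt\,\partial_x\delta_{x_0+at}, \\
\partial_t\rho_t + \partial_x(a\,\rho_t)
&= -b\,\partial_x\delta_{x_0+at}
   - bt\,\underbrace{(\partial_t + a\,\partial_x)\,\partial_x\delta_{x_0+at}}_{=\,0}
 = -\partial_x(b\,\mu_t).
\end{align*}
```

In this toy case \rho_0 = 0 and \rho_t is a multiple of the derivative of a Dirac mass, which is not a measure. This is consistent with the need to work in a space larger than the space of measures.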
The proofs of these results require the detailed study of a linear transport equation in Z of the form
\partial_t\mu_t + \partial_x(v(x)\mu_t) = \nu_t, \qquad \mu_{|t = 0} = \mu_0. | (1.3) |
While the existence of a solution to (1.3) can be established by extending standard techniques to the current setting on the space Z, the uniqueness issue presents some unexpected difficulties which led to a new notion of solution. With this new concept of solution, we are able to prove in Theorem 3.1 that this equation is well-posed.
The paper is organized as follows. In Section 2 we briefly recall some known facts concerning transport equations in the space of measures and the space Z . We also establish new properties of the space Z . To keep the exposition smooth, the details of the long proofs of these new properties are deferred to the Appendix. In Section 3, we prove the existence and uniqueness of a solution to a linear equation of type (1.3) in Z . This allows us to formulate sensitivity equations in the linear (Section 4) and the nonlinear (Section 5) cases. In Section 6, we discuss possible applications of our results.
We briefly review here the formulation of the transport equation
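\partial_t\mu_t + \partial_x(v(x)\mu_t) = 0 |
on the space of nonnegative Radon measures \mathcal{M}^+(\mathbb{R}^d) . This space is equipped with the bounded Lipschitz norm defined for \mu\in \mathcal{M}^+(\mathbb{R}^d) as
\|\mu\|_{BL^*} = \sup\limits_{\|\psi\|_{W^{1, \infty}(\mathbb{R}^d)}\le 1}\int_{\mathbb{R}^d}\psi(x)\, d\mu(x), | (2.1) |
as the total variation norm is too strong. Here, W^{1, \infty}(\mathbb{R}^d) is the space of bounded and globally Lipschitz functions.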
Let v be a vector field with v\in W^{1, \infty}(\mathbb{R}^d, \mathbb{R}^d) . Then, the flow of v , denoted by T_t^v:\mathbb{R}^d\to\mathbb{R}^d , is defined as the solution to the ODE:
\frac{d}{dt}(T_t^v)(x) = v((T_t^v)(x)), \qquad (T_0^v)(x) = x. | (2.2) |
Notice that (T_t^v)(x) is defined for all t\in\mathbb{R} . If there is no risk of confusion, we write T_t instead of T_t^v . Now, the classical method of characteristics allows us to solve the transport equation
\partial_t\mu_t + \partial_x(v(x)\mu_t) = \nu_t, \qquad \mu_{t|t = 0} = \mu_0, | (2.3) |
where \nu_t\in C([0, T], \mathcal{M}^+(\mathbb{R}^d)) . More precisely, the unique measure solution in C([0, T], \mathcal{M}(\mathbb{R}^d)) to (2.3) is given by propagating the initial condition \mu_0 along the flow of v , namely
\mu_t = T_t^{\#}\mu_0 + \int_0^t T_{t-s}^{\#}\nu_s\, ds, | (2.4) |
where for f:\mathbb{R}^d\to\mathbb{R}^d and a measure \mu\in \mathcal{M}^+(\mathbb{R}^d) , f^{\#}\mu is the push-forward measure defined as
f^{\#}\mu(A) = \mu(f^{-1}(A)) \quad \text{for any measurable } A\subset \mathbb{R}^d. | (2.5) |
We remark here that the definition of the push-forward measure yields the following change of variables formula: for all measurable maps T:\mathbb{R}^d\to\mathbb{R}^d and \phi:\mathbb{R}^d\to\mathbb{R} ,
\int_{\mathbb{R}^d}\phi(x)\, d(T^{\#}\mu)(x) = \int_{\mathbb{R}^d}\phi\circ T(x)\, d\mu(x). | (2.6) |
For the proof, see [25] for the case \nu = 0 and Proposition 3.6 in [21]. Let us also note that formula (2.4) is true also in the setting of bounded Radon measures \mathcal{M}(\mathbb{R}^d) : as the equation is linear, one can apply the Hahn-Jordan decomposition (see Section 4.2 in [26]) and solve the equations for the positive and the negative parts of the measure separately.
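The push-forward representation and the change of variables formula (2.6) can be illustrated numerically for a finite sum of Dirac masses, where pushing the measure forward simply moves the atoms along the flow (2.2) and leaves the weights unchanged. The Python sketch below is an illustration under assumptions that are not part of the paper: the one-dimensional field v(x) = \sin x , the RK4 integrator, the atoms, the weights and the test function \arctan are all hypothetical choices.

```python
import math

def v(x):
    # Illustrative bounded, globally Lipschitz vector field (an assumption).
    return math.sin(x)

def flow(x, t, steps=2000):
    """Approximate T_t(x), the solution of (2.2), with classical RK4."""
    h = t / steps
    for _ in range(steps):
        k1 = v(x)
        k2 = v(x + 0.5 * h * k1)
        k3 = v(x + 0.5 * h * k2)
        k4 = v(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def exact_flow(x, t):
    # Closed form for x' = sin(x), x in (-pi, pi): tan(x/2) is multiplied by e^t.
    return 2 * math.atan(math.tan(x / 2) * math.exp(t))

def push_forward(atoms, weights, t):
    """Push sum_i w_i delta_{x_i} forward along T_t: atoms move, weights stay."""
    return [flow(x, t) for x in atoms], list(weights)

def pair(atoms, weights, phi):
    """Pairing (mu, phi) = sum_i w_i phi(x_i) for a discrete measure."""
    return sum(w * phi(x) for x, w in zip(atoms, weights))

atoms0, weights0 = [-1.0, 0.3, 2.0], [0.5, 0.2, 0.3]
t = 1.0
atoms_t, weights_t = push_forward(atoms0, weights0, t)

# Formula (2.6): integrating phi against T#mu equals integrating phi∘T against mu.
lhs = pair(atoms_t, weights_t, math.atan)
rhs = pair(atoms0, weights0, lambda x: math.atan(exact_flow(x, t)))
print(abs(lhs - rhs))
```

The two pairings agree up to the integration error of the numerical flow, and the total mass is unchanged because only the positions of the atoms are updated.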
Now, let v_1 and v_2 be two bounded and globally Lipschitz vector fields. Let \mu_t^{(1)} and \mu_t^{(2)} be the solutions to (2.3) with vector fields v_1 and v_2 , respectively. Then, there is a constant C = C(T, \|v_1\|_{W^{1, \infty}}, \|v_2\|_{W^{1, \infty}}, \mu_0) such that
\|\mu_t^{(1)}-\mu_t^{(2)}\|_{BL^*}\le C\|v_1-v_2\|_{\infty}, \quad \text{for any } t\in [0, T]. | (2.7) |
For the proof, one simply applies the triangle and Gronwall inequalities as in the proof of Lemma 3.8 in [22]. The solution to (2.3) thus depends continuously on v.
The transport equation (2.3) can also be studied in a nonlinear setting where the vector field depends on the measure solution itself. Then, the nonlinear transport equation takes the form
\partial_t\mu_t + \partial_x(v[\mu_t](x)\mu_t) = 0, | (2.8) |
where v:\mathcal{M}^+(\mathbb{R}^d)\to W^{1, \infty}(\mathbb{R}^d, \mathbb{R}^d) . It is common in applications that v depends on \mu through some weighted mean of \mu of the form
v[\mu](x) = V\left(x, \int_{\mathbb{R}^d}K_V(x, y)\, d\mu(y)\right) | (2.9) |
for given maps V:\mathbb{R}^d\times\mathbb{R}\to\mathbb{R}^d and K_V:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R} .
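Given \alpha\in (0, 1) , we consider the space C^{1, \alpha}(\mathbb{R}^d) of bounded continuous functions with bounded and \alpha -Hölder derivative endowed with the norm
\|u\|_{C^{1, \alpha}}: = \|u\|_{\infty}+\|Du\|_{\infty}+\sup\limits_{x\neq y}\frac{|Du(x)-Du(y)|}{|x-y|^{\alpha}}. |
Lemma 2.1. 1. For any u\in C^{1, \alpha}(\mathbb{R}^d) ,
|u(x+y)-u(x)-\nabla u(x)y|\le \|\nabla u\|_{C^{0, \alpha}}|y|^{1+\alpha}\quad \text{for any } x, y\in\mathbb{R}^d. | (2.10) |
2. If \phi\in C^{1, \alpha}(\mathbb{R}^d) and T\in C^{1, \alpha}(\mathbb{R}^d, \mathbb{R}^d) then \phi\circ T\in C^{1, \alpha}(\mathbb{R}^d) with norm bounded by a constant depending only on a bound of \|\phi\|_{C^{1, \alpha}} and \|T\|_{C^{1, \alpha}} .
Proof. The first assertion follows from
u(x+y)-u(x)-\nabla u(x)y = \int_0^1\left(\frac{d}{dt}u(x+ty)-\nabla u(x)y\right) dt = \int_0^1(\nabla u(x+ty)-\nabla u(x))y\, dt. |
For the second one we only need to estimate |D(\phi\circ T)(x)-D(\phi\circ T)(y)| . We have
|D\phi(T(x))DT(x)-D\phi(T(y))DT(y)| \le |D\phi(T(x))(DT(x)-DT(y))|+|(D\phi(T(x))-D\phi(T(y)))DT(y)| \le \|\phi\|_{C^{1, \alpha}}\|T\|_{C^{1, \alpha}}|x-y|^{\alpha}+\|\phi\|_{C^{1, \alpha}}|T(x)-T(y)|^{\alpha}\|T\|_{C^{1, \alpha}} \le C|x-y|^{\alpha}, |
where C = \|\phi\|_{C^{1, \alpha}}\|T\|_{C^{1, \alpha}}+\|\phi\|_{C^{1, \alpha}}\|T\|_{C^{1, \alpha}}^{1+\alpha} .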
We also recall the following result from Cor. 3.16 in [24] regarding the regularity of the flow T_t^v defined in (2.2):
Proposition 2.1. Assume that v\in C^{1, \alpha}(\mathbb{R}^d, \mathbb{R}^d) . Then there exists a constant C_T > 0 depending only on T and \|v\|_{C^{1, \alpha}} such that \|D(T_t^v)\|_{C^{0, \alpha}}\le C_T for any t\in [0, T] . Moreover it can be checked upon inspection of the proof that C_T\to 1 as T\to 0 .
We consider the space Z defined as the closure of \mathcal{M}(\mathbb{R}^d) in (C^{1, \alpha}(\mathbb{R}^d))^* , endowed with the dual norm, for some \alpha (see Remark 2.1 on the choice of \alpha ). This space was first introduced in [23] where the authors demonstrated that Z has a lot of convenient topological properties. In particular, Z is a separable Banach space with its dual being isometrically isomorphic to C^{1, \alpha}(\mathbb{R}^d) . Indeed it was proved in [23, Prop. 5.1] that \mathrm{span}\{\delta_x, x\in\mathbb{Q}^d\} is dense in Z . In particular this implies that any element of Z can be approximated by bounded measures.
Notice that using duality we have for any \mu\in Z ,
\|\mu\|_Z = \sup\limits_{\|\psi\|_{C^{1, \alpha}}\le 1}(\mu, \psi). |
The main advantage of the space Z is its applicability to studying differentiation problems with respect to perturbation of transport equations. More precisely, let us consider Eq (2.3) with \nu_t = 0 and vector field v_0(x)+hv_p(x) where h\in [-M, M] for some M > 0 , and denote by \mu_t^h its solution, namely
\partial_t\mu_t^h + \partial_x((v_0+hv_p)\mu_t^h) = 0, \qquad \mu^h_{t|t = 0} = \mu_0. |
One is then interested in the limit of \frac{\mu_t^{h+\Delta h}-\mu_t^h}{\Delta h} as \Delta h\to 0 . The following result was obtained in [23]:
Theorem 2.1. Let v_0, v_p\in C^{1+\alpha}(\mathbb{R}^d, \mathbb{R}^d) . Then, \frac{\mu_t^{h+\Delta h}-\mu_t^h}{\Delta h} converges in C([0, T], Z) as \Delta h\to 0 .
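Remark 2.1. Let Z_{\alpha} = \overline{\mathcal{M}(\mathbb{R}^d)}^{(C^{1, \alpha}(\mathbb{R}^d))^*} . Notice that if 0 < \alpha < \alpha' < 1 then C^{1, \alpha'}\subset C^{1, \alpha} from which we deduce that Z_{\alpha}\subset Z_{\alpha'} with continuous injection. Therefore, if the incremental quotient \frac{\mu_t^{h+\Delta h}-\mu_t^h}{\Delta h} converges in Z_{\alpha} , it also does so in Z_{\alpha'} for any \alpha' > \alpha . Moreover, since Z_{\alpha}\subset Z_{\alpha'} continuously, both limits coincide. So there is no ambiguity and we simply write Z instead of Z_{\alpha} .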
Such a perturbation problem can also be studied for the nonlinear transport equation (2.8) with a vector field v_0[\mu] like (2.9). We perturb v_0[\mu] by considering v^h[\mu](x) defined as
v^h[\mu](x) = v_0[\mu](x)+hv_p[\mu](x) = V_0\left(x, \int_{\mathbb{R}^d}K_{V_0}(x, y)\, d\mu(y)\right)+hV_p\left(x, \int_{\mathbb{R}^d}K_{V_p}(x, y)\, d\mu(y)\right). | (2.11) |
Then, we have the following result:
Theorem 2.2. Let \alpha > \frac{1}{2} and v^h[\mu] be given by (2.11), where V_0, V_p\in C^{1+\alpha}(\mathbb{R}^d\times\mathbb{R}, \mathbb{R}^d) and K_{V_0}, K_{V_p}\in C^{2+\alpha}(\mathbb{R}^d\times\mathbb{R}^d, \mathbb{R}) . Let \mu_t^h be the unique solution to (2.8) with the vector field v^h[\mu] . Then, \frac{\mu_t^{h+\Delta h}-\mu_t^h}{\Delta h} converges in C([0, T], Z) as \Delta h\to 0 .
Remark 2.2. The proof of existence and uniqueness of solutions as well as of a differentiability result was actually given only for the case of \mathbb{R}^+ in [22] and [24] respectively. However, the proof can be easily extended to \mathbb{R}^d . Indeed, the main idea is to construct an approximating sequence as follows. The interval of time [0, T] is divided into 2^k subintervals of the form \left[\frac{lT}{2^k}, \frac{(l+1)T}{2^k}\right] where l = 0, 1, ..., 2^k-1 . Then, the following approximation is defined recursively: for t\in \left(\frac{lT}{2^k}, \frac{(l+1)T}{2^k}\right] , let \mu_t be the solution to
\partial_t\mu_t + \partial_x\left(v\left[\mu_{\frac{lT}{2^k}}\right](x)\mu_t\right) = 0 |
with initial condition \mu_{\frac{lT}{2^k}} . One then uses the formula for the solution of the linear problem (2.4) to conclude the proof. See [22] and [24] for more details.
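The recursive construction in Remark 2.2 can be sketched for a measure made of finitely many Dirac masses: on each of the 2^k subintervals the vector field is frozen at the measure from the start of the subinterval, and the resulting linear problem is solved by moving the atoms along the flow of that frozen field. The Python sketch below is only an illustration under hypothetical choices that do not come from the paper: dimension one, V(x, m) = m , the kernel K(x, y) = \tanh(y-x) , two atoms of equal weight, and RK4 substeps for the flow.

```python
import math

def frozen_field(atoms, weights, V, K):
    """Return x -> v[mu](x) = V(x, sum_j w_j K(x, y_j)) for the frozen measure mu."""
    frozen = list(atoms)
    def field(x):
        return V(x, sum(w * K(x, y) for y, w in zip(frozen, weights)))
    return field

def advect(x, field, dt, substeps=20):
    """Move one atom along the flow of a fixed field using RK4 substeps."""
    h = dt / substeps
    for _ in range(substeps):
        k1 = field(x)
        k2 = field(x + 0.5 * h * k1)
        k3 = field(x + 0.5 * h * k2)
        k4 = field(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def solve_nonlinear(atoms, weights, V, K, T, k):
    """Approximate (2.8) on [0, T] with 2**k subintervals and a frozen field on each."""
    n = 2 ** k
    dt = T / n
    for _ in range(n):
        field = frozen_field(atoms, weights, V, K)      # v[mu_{lT/2^k}]
        atoms = [advect(x, field, dt) for x in atoms]   # linear problem: push atoms forward
    return atoms

# Hypothetical model: attraction towards a weighted mean of the other atoms.
V = lambda x, m: m
K = lambda x, y: math.tanh(y - x)
atoms0, weights = [-1.0, 1.0], [0.5, 0.5]

coarse = solve_nonlinear(atoms0, weights, V, K, T=1.0, k=3)
fine = solve_nonlinear(atoms0, weights, V, K, T=1.0, k=6)
print(coarse, fine)
```

Comparing the coarse and the fine partition gives a rough impression of how the approximations stabilize as k grows; the sketch makes no claim about convergence rates.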
The following Propositions discuss the distributional derivatives of bounded Radon measures as elements of space Z. For easier flow of this section long proofs are provided in the Appendix.
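We can see a Radon measure \mu\in \mathcal{M}(\mathbb{R}^d) as a distribution by (\mu, \phi) = \int\phi\, d\mu , \phi\in C_c^{\infty}(\mathbb{R}^d) . We denote by \partial_x\phi: = \nabla\phi\cdot x the derivative of \phi in direction x\in\mathbb{R}^d . We then define a distribution \partial_x\mu by duality letting (\partial_x\mu, \phi) = -(\mu, \partial_x\phi) . The next result shows that in fact \partial_x\mu belongs to Z when \mu is bounded.
Proposition 2.2. For any bounded \mu\in \mathcal{M}(\mathbb{R}^d) , the distributional derivative \partial_x\mu of \mu in direction x\in\mathbb{R}^d belongs to Z .
Proof. Let \mu\in\mathcal{M}(\mathbb{R}^d) be bounded. Since Z is a vector space, to prove that the distributional derivative \partial_x\mu belongs to Z it suffices to find a sequence \nu_h\in\mathcal{M}(\mathbb{R}^d) such that \nu_h\to -\partial_x\mu as h\to 0 in Z . Let \tau_h be the translation operator defined by \tau_h\phi(y): = \phi(y+hx) for any \phi . Take \nu_h: = (\tau_h^{\#}\mu-\mu)/h\in\mathcal{M}(\mathbb{R}^d) . Then for any \phi\in C^{1, \alpha}(\mathbb{R}^d) with \|\phi\|_{C^{1, \alpha}(\mathbb{R}^d)}\le 1 we have using (2.10) that
|(\nu_h, \phi)-(-\partial_x\mu, \phi)| \le \int_{\mathbb{R}^d}\left|\frac{\phi(y+hx)-\phi(y)}{h}-\partial_x\phi(y)\right| d|\mu|(y) \le |h|^{\alpha}|x|^{1+\alpha}\|\mu\|_{TV} |
so that \nu_h\to -\partial_x\mu in Z as h\to 0 .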
Proposition 2.3. Consider \mu_n, \mu\in\mathcal{M}_b(\mathbb{R}^d) such that \mu_n\to\mu narrowly (i.e., in duality with bounded and continuous functions C_b(\mathbb{R}^d) ). Then \partial_x\mu_n\to\partial_x\mu in Z .
Proof. See Appendix.
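Proposition 2.4. For a bounded vector field v on \mathbb{R}^d and \mu\in\mathcal{M}_b(\mathbb{R}^d) we have
\|\partial_x(v\mu)\|_Z\le \|\mu\|_{TV}\|v\|_{\infty}. | (2.12) |
Moreover, consider measures \mu_n, \mu\in\mathcal{M}_b(\mathbb{R}^d) such that \mu_n\to\mu narrowly and vector fields v_n, v\in C_b(\mathbb{R}^d, \mathbb{R}^d) such that v_n\to v uniformly. Then \partial_x(v_n\mu_n)\to\partial_x(v\mu) in Z .
Proof. For any \phi such that \|\phi\|_{C^{1, \alpha}}\le 1 we have
|(\partial_x(v\mu), \phi)| = |(\mu, v\, \partial_x\phi)|\le \|\mu\|_{TV}\|v\, \partial_x\phi\|_{\infty}\le \|\mu\|_{TV}\|v\|_{\infty}. |
Then, in view of Proposition 2.3, to verify the second assertion, it is sufficient to prove that v_n\mu_n\to v\mu narrowly. For \phi\in C_b(\mathbb{R}^d) , we have
(v_n\mu_n-v\mu, \phi) = (\mu_n, (v_n-v)\phi)+(\mu_n-\mu, v\phi) |
where (\cdot, \cdot) denotes the dual pairing. The first term can be bounded by \|\mu_n\|_{TV}\|(v_n-v)\phi\|_{\infty}\le C\|v_n-v\|_{\infty}\to 0 while the second tends to 0 since v\phi\in C_b(\mathbb{R}^d) .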
We deduce that
Corollary 2.1. Let [0, T]\ni t\mapsto\mu_t\in\mathcal{M}_b(\mathbb{R}^d) be a narrowly continuous map and v\in C_b(\mathbb{R}^d, \mathbb{R}^d) . Then \partial_x(v\mu_t)\in C([0, T], Z) .
It will also be useful to define the push-forward of an element of Z. The idea is quite simple. In fact, since this is well-defined on the space of measures, we can extend its definition for elements of Z by means of Cauchy sequences.
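Proposition 2.5. Let T\in C^{1, \alpha}(\mathbb{R}^d, \mathbb{R}^d) . Then for any \mu\in Z we can define T^{\#}\mu\in Z by
T^{\#}\mu: = \lim\limits_{n\to\infty}T^{\#}\mu_n |
where \{\mu_n\}_{n\in\mathbb{N}}\subset\mathcal{M}(\mathbb{R}^d) is any sequence such that \mu_n\to\mu in Z . Then, for any \phi\in C^{1, \alpha}(\mathbb{R}^d) we have the following analogue of the change of variables formula (2.6):
(T^{\#}\mu, \phi) = (\mu, \phi\circ T) |
where \phi\circ T denotes composition of the maps \phi and T .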
Proof. See Appendix.
We conclude this section with the following classical observation. By definition, if \mu\in Z , there is a sequence of bounded measures \{\mu_n\}_{n\in\mathbb{N}} such that \mu_n\to\mu in Z . Now, if \mu\in C([0, T], Z) , one can choose an approximating sequence for each \mu_t , t\in [0, T] . However, it is possible to construct an approximating sequence that is continuous in time and thus approximates the whole curve t\mapsto\mu_t , t\in [0, T] . This is the content of the following lemma.
Lemma 2.2. Let \mu\in C([0, T], Z) . There is a sequence \{\mu^{(n)}\}_{n\in\mathbb{N}}\subset C([0, T], \mathcal{M}_b(\mathbb{R}^d)) such that \mu^{(n)}\to\mu in C([0, T], Z) as n\to\infty .
Proof. See Appendix.
Corollary 2.2. Let \nu\in C([0, T], Z) and v\in C^{1, \alpha}(\mathbb{R}^d) . Then the map t\mapsto (T_t^v)^{\#}\nu_t is continuous from [0, T] to Z .
Proof. See Appendix.
In this section, we study the following transport equation in the space Z:
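\partial_t\mu_t + \partial_x(v(x)\mu_t) = \nu_t, \qquad \mu_{|t = 0} = \mu_0, | (3.1) |
where v\in C^{1, \alpha}(\mathbb{R}^d, \mathbb{R}^d) , \nu\in C([0, T], Z) and \mu_0\in Z . We begin with a concept of a very weak solution.
Definition 3.1. We say that \mu\in C([0, T], Z) is a very weak solution to (3.1) in Z if for any \varphi\in C([0, T]\times\mathbb{R}^d) with \varphi\in C([0, T], C^{2+\alpha}(\mathbb{R}^d)) and \varphi_t\in C([0, T], C^{1+\alpha}(\mathbb{R}^d)) we have:
(\mu_T, \varphi(\cdot, T)) = (\mu_0, \varphi(\cdot, 0))+\int_0^T(\mu_t, \varphi_t(\cdot, t)+v\cdot\nabla\varphi(\cdot, t))\, dt+\int_0^T(\nu_t, \varphi(\cdot, t))\, dt. | (3.2) |
Note that we have to use test functions of regularity at least C^{2+\alpha} in the space variable x so that the function \varphi_t+v(x)\cdot\nabla_x\varphi lies in C^{1+\alpha}(\mathbb{R}^d) , the domain of the functional \mu_t .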
Proposition 3.1. Equation (3.1) has at least one very weak solution in C([0, T], Z) given by
\mu_t = T_t^{\#}\mu_0+\int_0^t T_{t-s}^{\#}\nu_s\, ds | (3.3) |
where the integral is a Bochner integral in Z .
Moreover, if \mu_0 = 0 and \nu_t = 0 , then for any very weak solution \mu_t we have
(\mu_t, \eta) = 0 | (3.4) |
for all \eta\in C^{2+\alpha}(\mathbb{R}^d) and t\in [0, T] .
Proof. We first verify that the integral appearing on the right-hand side of (3.3) is a Bochner integral in Z . According to Corollary 2.2 the map f:s\in [0, t]\mapsto T_{t-s}^{\#}\nu_s\in Z is continuous. Thus for any z^*\in Z^* , z^*\circ f is also continuous. Since Z is separable, we conclude using the Pettis theorem that f is measurable. Moreover for any \phi\in C^{1, \alpha}(\mathbb{R}^d) , \|\phi\|_{C^{1, \alpha}}\le 1 , we have
|(f(s), \phi)| = |(\nu_s, \phi\circ T_{t-s})|\le \|\nu_s\|_Z\|\phi\circ T_{t-s}\|_{C^{1, \alpha}}\le C_T |
since \nu\in C([0, T], Z) and in view of Lemma 2.1 and Proposition 2.1. It follows that \max_{0\le s\le t}\|f(s)\|_Z\le C_T and thus that f is Bochner-integrable. It is also easily seen that \int_0^t T_{t-s}^{\#}\nu_s\, ds is continuous in t .
Let \mu_t be defined by (3.3). Clearly \mu\in C([0, T], Z) . We now verify that \mu_t is a solution in the sense of Definition 3.1. According to Lemma 2.2, we can find sequences \{\nu^{(n)}\}_{n\in\mathbb{N}}\subset C([0, T], \mathcal{M}_b(\mathbb{R}^d)) and \{\mu_0^{(n)}\}_{n\in\mathbb{N}}\subset\mathcal{M}_b(\mathbb{R}^d) such that \|\mu_0^{(n)}-\mu_0\|_Z\to 0 and \|\nu_t^{(n)}-\nu_t\|_Z\to 0 uniformly in t\in [0, T] . Then the transport equation
\partial_t\mu_t + \partial_x(v(x)\mu_t) = \nu_t^{(n)}, \qquad \mu_{|t = 0} = \mu_0^{(n)} | (3.5) |
has a unique solution \mu^{(n)}\in C([0, T], \mathcal{M}_b(\mathbb{R}^d)) given by
\mu_t^{(n)} = T_t^{\#}\mu_0^{(n)}+\int_0^t T_{t-s}^{\#}\nu_s^{(n)}\, ds. | (3.6) |
According to Proposition 2.5, T_t^{\#}\mu_0^{(n)}\to T_t^{\#}\mu_0 in Z and, for any s , T_{t-s}^{\#}\nu_s^{(n)}\to T_{t-s}^{\#}\nu_s in Z . Since \|T_{t-s}^{\#}\nu_s^{(n)}\|_Z\le C_T , the Dominated Convergence Theorem yields \int_0^t T_{t-s}^{\#}\nu_s^{(n)}\, ds\to\int_0^t T_{t-s}^{\#}\nu_s\, ds in Z . Thus for any t\in [0, T] , \mu_t^{(n)} converges in Z to \mu_t given by
\mu_t: = \lim\limits_{n\to +\infty}\mu_t^{(n)} = T_t^{\#}\mu_0+\int_0^t T_{t-s}^{\#}\nu_s\, ds. |
Clearly, \mu\in C([0, T], Z) . On the other hand, the weak formulation for (3.5) is valid for test functions of class C^1([0, T]\times\mathbb{R}^d)\cap W^{1, \infty}([0, T]\times\mathbb{R}^d) . In particular, taking test functions as in Definition 3.1, we send n\to\infty in the weak formulation for (3.5) to deduce that \mu_t is a very weak solution to (3.1).
To prove (3.4), we use the so-called dual problem (cf. Remark 8.1.5 and Proposition 8.1.7 in [27] or Proposition 5.34 in [25]). More precisely, given some function \psi(x, t) , let \varphi be the solution of
\partial_t\varphi+v(x)\cdot\nabla_x\varphi = \psi, \qquad \varphi(x, T) = 0, | (3.7) |
which is explicitly given by \varphi(x, t) = -\int_t^T\psi(T_{s-t}(x), s)\, ds . We consider \psi of the form \psi(x, t) = \xi(t)\eta(x) where \xi\in C_c^{\infty}([0, T]) and \eta\in C^{2, \alpha}(\mathbb{R}^d) . We then use the corresponding solution \varphi of (3.7) as a test function in (3.2) to conclude
\int_0^T\xi(t)(\mu_t, \eta)\, dt = 0. |
Since the map t\mapsto(\mu_t, \eta) is continuous for t\in [0, T] and since \xi is arbitrary, we deduce that (\mu_t, \eta) = 0 for any \eta\in C^{2, \alpha}(\mathbb{R}^d) and t\in [0, T] .
Unfortunately, condition (3.4) does not imply that \mu_t = 0 so that we cannot deduce the uniqueness of a solution to (3.1). The problem here is that C^{2+\alpha}(\mathbb{R}^d) is not dense in C^{1+\alpha}(\mathbb{R}^d) . The following two examples show the typical problem with approximation of Hölder functions.
Example 3.1. One can easily check that f(x) = \sqrt{x}\in C^{1/2}([0, 1]) . Suppose there is a sequence \{f_n\}_{n\in\mathbb{N}}\subset C^1([0, 1]) such that \|f_n-f\|_{C^{1/2}}\to 0 . Then
0\leftarrow\|f_n-f\|_{C^{1/2}}\ge \sup\limits_{x\in (0, 1]}\left|1-\frac{f_n(x)-f_n(0)}{\sqrt{x}}\right|\ge \sup\limits_{x\in (0, 1]}\frac{|f_n(x)-f_n(0)|}{\sqrt{x}}-1, |
contradicting \{f_n\}_{n\in\mathbb{N}}\subset C^1([0, 1]) .
Example 3.2. We construct a nontrivial functional on C^{1/2}([0, 1]) which vanishes on C^{1, 1/2}([0, 1]) . In particular, this shows that functionals on C^{1/2}([0, 1]) cannot be uniquely characterized by their values on C^{1, 1/2}([0, 1]) . Let X = C^1([0, 1])\oplus\mathrm{lin}(\sqrt{x}) be a linear subspace of C^{1/2}([0, 1]) . On X , we can define a functional \varphi:X\to\mathbb{R} with
\varphi(f) = \lim\limits_{x\to 0}\frac{f(x)-f(0)}{\sqrt{x}}. |
Notice that \varphi is continuous since |\varphi(f)|\le \|f\|_{C^{1/2}} . By the analytic version of the Hahn-Banach Theorem (Theorem 1.1 in [28]), we can then extend \varphi to a continuous functional on C^{1/2}([0, 1]) . It is easily seen that \varphi(f) = 0 for any f\in C^1([0, 1]) by Taylor's estimate but that \varphi(\sqrt{x}) = 1 .
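There is also a characterization of the subset of C^{\alpha} consisting of functions that can be approximated by smooth functions:
Remark 3.1. Let \Omega\subset\mathbb{R}^d . Then, f\in C^{\alpha}(\Omega) can be approximated by smooth functions if and only if f is an element of the set
F_{\alpha}(\Omega) = \left\{f\in C^{\alpha}(\Omega):\lim\limits_{t\to 0^+}\sup\limits_{|x-y|\le t}\frac{|f(x)-f(y)|}{|x-y|^{\alpha}} = 0\right\}. |
One easily checks that for \Omega = [0, 1] , \sqrt{x}\notin F_{1/2}(\Omega) . Moreover, for any \beta > \alpha , C^{\beta}(\Omega)\subset F_{\alpha}(\Omega) .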
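Both claims at the end of Remark 3.1 can be checked in one line each. The short computation below is a sketch added for convenience and is not taken from the paper; the symbol [f]_{\beta} for the \beta -Hölder seminorm is notation introduced only for this purpose.

```latex
% Claim 1: \sqrt{x} \notin F_{1/2}([0,1]). Take y = 0 in the supremum:
\sup_{|x-y|\le t}\frac{|\sqrt{x}-\sqrt{y}|}{|x-y|^{1/2}}
\;\ge\; \sup_{0<x\le t}\frac{\sqrt{x}}{x^{1/2}} \;=\; 1
\qquad \text{for every } t\in(0,1],
% so the limit as t \to 0^+ cannot vanish.

% Claim 2: C^{\beta}(\Omega) \subset F_{\alpha}(\Omega) for \beta > \alpha.
% With [f]_{\beta} the \beta-H\"older seminorm (notation assumed here):
\sup_{0<|x-y|\le t}\frac{|f(x)-f(y)|}{|x-y|^{\alpha}}
\;\le\; [f]_{\beta}\sup_{0<|x-y|\le t}|x-y|^{\beta-\alpha}
\;\le\; [f]_{\beta}\, t^{\beta-\alpha} \;\longrightarrow\; 0
\qquad (t\to 0^+).
```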
Therefore, we realize that the space of test functions is too small to deduce uniqueness of weak solutions. This is the case for many PDEs formulated in the weak sense. Probably one of the most famous is Euler's equation, where one can construct infinitely many distributional solutions with a prescribed energy profile (thus contradicting conservation of energy), see [29] and references therein. The standard procedure in such situations for many evolutionary problems is to require some additional conditions to be satisfied by a weak solution (like the entropy condition for conservation laws, see [30], Section 3.4).
To establish additional conditions required from weak solutions, we should get some insight about which solutions we would like to extract. First, note that if \nu\in Z , there is an approximating sequence of measures \nu_n\in\mathcal{M}(\mathbb{R}^d) such that \nu_n\to\nu in Z . Now, recall that we want to find an equation that is satisfied by the derivative of the solution to (3.1) with respect to the perturbation parameter h . Therefore, in our case, such an approximating sequence is of the form \frac{\mu_t^{h+\Delta h}-\mu_t^h}{\Delta h} . We will see in the proof of Theorem 4.1 below that \left\|\frac{\mu_t^{h+\Delta h}-\mu_t^h}{\Delta h}\right\|_{BL^*}\le C_T for some constant C_T independent of h , \Delta h and t . This suggests defining the following admissibility class:
\mathcal{A} = \{\nu\in Z:\exists\{\nu_n\}_{n\in\mathbb{N}}\subset\mathcal{M}(\mathbb{R}^d)\ \text{s.t. } \nu_n\to\nu \text{ in } Z \text{ and } \|\nu_n\|_{BL^*}\le C\}. | (3.8) |
Notice that \mathcal{A} is a subspace of Z containing the bounded measures \mathcal{M}_b(\mathbb{R}^d) so that \mathcal{A} is dense in Z . In view of the proof of Proposition 2.2 we also have that \partial_x\mu\in\mathcal{A} for any \mu\in\mathcal{M}_b(\mathbb{R}^d) . In fact we have the following stronger result:
Proposition 3.2. Let \mu:[0, T]\to \mathcal{M}_b(\mathbb{R}^d) be continuous and TV-bounded, and let x\in \mathbb{R}^d . Then \partial_x\mu\in C([0, T], Z) with values in \mathcal{A} and in fact there exists \rho^h\in C([0, T], \mathcal{M}_b(\mathbb{R}^d)) , h\in (0, 1) , such that
\lim\limits_{h\to 0} \max\limits_{0\le t\le T}\|\rho^h_t- \partial_x(\mu_t)\|_Z = 0 \qquad \mathit{\text{and}} \qquad \sup\limits_{h\in (0, 1], \, t\in [0, T]}\|\rho_t^h\|_{BL^*} \le C. |
Proof. According to Proposition 2.3, \partial_x\mu\in C([0, T], Z) . Let \tau_h be the translation defined by \tau_h\phi(y) = \phi(y+hx) . It is then easy to verify using the same arguments as in the proof of Proposition 2.2 that \rho_t^h: = (\tau_h^{\#} \mu_t - \mu_t)/h satisfies the requirements.
We can now define a weak solution as follows.
Definition 3.2. We say that \mu \in C([0, T], Z) is a weak solution to (3.1) in Z if \mu is a very weak solution (see Definition 3.1) and for all t\in [0, T] , \mu_t \in \mathcal{A} .
With this definition we are now able to establish the following existence and uniqueness result:
Theorem 3.1. Let \mu_0 \in \mathcal{A} and \nu \in C([0, T], Z) with values in \mathcal{A} . Assume that there exists a sequence \nu^n\in C([0, T], \mathcal{M}_b(\mathbb{R}^d)) , n\in\mathbb{N} , such that
\begin{equation} \lim\limits_{n\to +\infty} \max\limits_{0\le t\le T}\|\nu_t^n-\nu_t\|_Z = 0 \qquad \mathit{\text{and}} \qquad \sup\limits_{n\in\mathbb{N}, \, t\in [0, T]}\|\nu_t^n\|_{BL^*} \le C. \end{equation} | (3.9) |
Then, equation (3.1) has a unique weak solution in the sense of Definition 3.2 which is given by
\begin{equation} \mu_t = T_t^{\#} \mu_0 + \int_0^t T_{t-s}^{\#}\nu_s\, ds. \end{equation} | (3.10) |
Note that according to Proposition 3.2, the Theorem applies in particular when \nu_t = \partial_x(\mu_t) with \mu:[0, T]\to \mathcal{M}_b(\mathbb{R}^d) continuous and TV-bounded.
Proof. To prove the uniqueness statement, since the equation is linear, it is sufficient to prove that if \mu_0 = 0 and \nu_t = 0 for all t \in [0, T] , then \mu_t = 0 for all t\in [0, T] . This is equivalent to (\mu_t, \eta) = 0 for any \eta \in C^{1, \alpha}(\mathbb{R}^d) . Fix \eta \in C^{1, \alpha}(\mathbb{R}^d) and for \epsilon > 0 denote by \eta^{\epsilon} the standard mollification of \eta . Since \eta and its derivatives are uniformly continuous, we have \|\eta^{\epsilon} - \eta\|_{W^{1, \infty}} \to 0 as \epsilon \to 0 . Moreover, for fixed \epsilon > 0 , \eta^\epsilon \in C^{2, \alpha}(\mathbb{R}^d) so that (\mu_t, \eta^{\epsilon}) = 0 by (3.4). Since \mu_t \in \mathcal{A} there exists a BL-bounded sequence \mu_t^{(n)}\in \mathcal{M}_b(\mathbb{R}^d) converging in Z to \mu_t . For a fixed \epsilon > 0 we then write
(\mu_t, \eta) = (\mu_t, \eta^{\epsilon}) + (\mu_t, \eta - \eta^{\epsilon}) = \lim\limits_{n\to \infty} (\mu_t^{(n)}, \eta - \eta^{\epsilon}) |
with
(\mu_t^{(n)}, \eta - \eta^{\epsilon}) \le \|\mu_t^{(n)}\|_{BL^*}\| \eta - \eta^{\epsilon}\|_{W^{1, \infty}} \le C \| \eta - \eta^{\epsilon}\|_{W^{1, \infty}} |
for some constant C independent of n . Thus
|(\mu_t, \eta)| \leq C \| \eta - \eta^{\epsilon}\|_{W^{1, \infty}}. |
Since \epsilon > 0 is arbitrary, we conclude (\mu_t, \eta) = 0 .
Concerning the existence, we already know from Proposition 3.1 that \mu_t = T_t^{\#} \mu_0 + \int_0^t T_{t-s}^{\#}\nu_s\, ds belongs to C([0, T], Z) and solves the equation. It remains to prove that \mu_t\in \mathcal{A} for any t\in [0, T] . Since \mu_0\in \mathcal{A} there exists a BL-bounded sequence \mu_0^{(n)}\in \mathcal{M}_b(\mathbb{R}^d) converging in Z to \mu_0 . Let
\mu_t^{(n)} : = T_t^{\#} \mu_0^{(n)} + \int_0^t T_{t-s}^{\#}\nu_s^{(n)}\, ds |
where \nu^{(n)} denotes the sequence from (3.9). We verify as in the proof of Proposition 3.1 that \mu_t^{(n)}\to \mu_t in Z for any given t . Moreover for any bounded Lipschitz \phi with \|\phi\|_{BL}\le 1 we have
\begin{eqnarray*} (\mu_t^{(n)}, \phi) & = & (\mu_0^{(n)}, \phi\circ T_t) + \int_0^t (\nu_s^{(n)}, \phi\circ T_{t-s})\, ds \\ & \le & \|\mu_0^{(n)}\|_{BL^*} \|\phi\circ T_t\|_{BL} + \int_0^t \|\nu_s^{(n)}\|_{BL^*}\|\phi\circ T_{t-s}\|_{BL}\, ds \end{eqnarray*} |
Since Lip(T_t)\le e^{t \, Lip(v)} we have \|\phi\circ T_t\|_{BL}\le e^{t \, Lip(v)}\|\phi\|_{BL} . Thus, for \|\phi\|_{BL}\le 1 and choosing C_T = e^{T \, Lip(v)}\Big(\sup_n \|\mu_0^{(n)}\|_{BL^*} + T\, \sup_{n, 0\le s\le T}\|\nu_s^{(n)}\|_{BL^*} \Big) we see that
\begin{equation*} (\mu_t^{(n)}, \phi) \le C_T . \end{equation*} |
Hence, \sup_{n\in\mathbb{N}, \, t\in [0, T]}\|\mu_t^{(n)}\|_{BL^*} \le C_T .
In this section we formulate an equation that is satisfied by the derivative of the solutions \mu_t with respect to h , i.e., \rho_{t, h} = \lim_{\Delta h \to 0} \frac{\mu^{h+\Delta h}_t - \mu^h_t}{\Delta h} , where \mu^h_t solves
\begin{equation} \partial_t\mu_t^h + \partial_x(v^h(x)\mu_t^h) = 0 \end{equation} | (4.1) |
with initial condition \mu^h_{|t = 0} = \mu_0 and v^h = v_0 + hv_p where v_0, v_p \in C^{1+\alpha}(\mathbb{R}^d, \mathbb{R}^d) are given vector fields. The derivative \rho_{t, h} exists according to Theorem 2.1.
To obtain the equation \rho_{t, h} should solve, we subtract the equations satisfied by \mu_t^h and \mu_t^{h+\Delta h} , namely
\partial_t\mu_t^h + \partial_x(v^h(x)\mu_t^h) = 0 |
\partial_t\mu_t^{h+\Delta h} + \partial_x((v^h(x)+\Delta h v_p(x))\mu_t^{h+\Delta h}) = 0 |
to obtain that \rho_{t, h}^{\Delta h} : = \frac{\mu^{h+\Delta h}_t - \mu^h_t}{\Delta h} satisfies
\partial_t\rho_{t, h}^{\Delta h} + \partial_x(v^h(x)\rho_{t, h}^{\Delta h} ) = - \partial_x(v_p(x) \mu_t^{h+\Delta h} ). |
Thus, intuitively the limit \rho_{t, h} = \lim_{\Delta h \to 0}\, \rho_{t, h}^{\Delta h} should satisfy
\begin{equation} \partial_t \rho_{t, h} + \partial_x(v^h(x)\rho_{t, h}) = - \partial_x(v_p(x)\mu_t^h). \end{equation} | (4.2) |
Since the right-hand side belongs to Z in view of Proposition 2.4, we are naturally led to study this equation in Z . The following theorem asserts that this intuition is correct and can be rigorously justified.
Theorem 4.1. The derivative \rho_{t, h} = \lim_{\Delta h \to 0} \frac{\mu_t^{h+\Delta h} - \mu_t^h}{\Delta h} where \mu_t^h and \mu_t^{h+\Delta h} solve (4.1) is the unique weak solution (cf. Definition 3.2) of
\begin{equation} \partial_t \rho_{t, h} + \partial_x(v^h(x)\rho_{t, h}) = - \partial_x(v_p(x)\mu_t^h) \end{equation} | (4.3) |
with initial condition \rho_{0, h} = 0 .
Proof. Let \rho^{\Delta h}_{t, h}: = (\mu_t^{h+\Delta h}-\mu_t^h)/{\Delta h} . Since \mu_t^{h+\Delta h} and \mu_t^h are solutions to (4.1), we have that for any \varphi \in C^1([0, T]\times \mathbb{R}^d) \cap W^{1, \infty}([0, T]\times \mathbb{R}^d) :
\begin{multline*} \int_{\mathbb{R}^d} \varphi(x, t)\, d\mu^h_t(x) - \int_{\mathbb{R}^d} \varphi(x, 0)\, d\mu_0(x) \\ = \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s) d\mu^h_s(x)ds + \int_0^t\int_{\mathbb{R}^d} (v_0(x)+hv_p(x))\cdot \nabla \varphi(x, s)\, d\mu^h_s ds \end{multline*} |
and similarly
\begin{multline*} \int_{\mathbb{R}^d} \varphi(x, t)\, d\mu^{h+\Delta h}_t(x) - \int_{\mathbb{R}^d} \varphi(x, 0)\, d\mu_0(x) \\ = \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s) d\mu^{h+\Delta h}_s(x)ds + \int_0^t\int_{\mathbb{R}^d} (v_0(x)+(h+\Delta h)v_p(x))\cdot \nabla \varphi(x, s)\, d\mu^{h+\Delta h}_s ds. \end{multline*} |
Subtracting these equations and dividing by \Delta h , we obtain that
\begin{eqnarray*} \int_{\mathbb{R}^d} \varphi(x, t)\, d\rho^{\Delta h}_{t, h} & = & \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s)\, d\rho^{\Delta h}_{s, h} ds + \int_0^t\int_{\mathbb{R}^d} v^h(x)\cdot\nabla\varphi(x, s)\, d\rho^{\Delta h}_{s, h}(x) ds \\ && + \int_0^t\int_{\mathbb{R}^d} v_p(x)\cdot\nabla\varphi(x, s)\, d\mu_s^{h+\Delta h} ds. \end{eqnarray*} |
Since \mu^{h+\Delta h}\to \mu^h in C([0, T], \mathcal{M}(\mathbb{R}^d)) as \Delta h\to 0 , we can pass to the limit \Delta h\to 0 in the last term on the right-hand side using the Dominated Convergence Theorem to obtain
\begin{equation} \begin{split} \int_{\mathbb{R}^d} \varphi(x, t)\, d\rho^{\Delta h}_{t, h} = & \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s)\, d\rho^{\Delta h}_{s, h} ds + \int_0^t\int_{\mathbb{R}^d} v^h(x)\cdot\nabla\varphi(x, s)\, d\rho^{\Delta h}_{s, h}(x) ds \\ & + \int_0^t\int_{\mathbb{R}^d} v_p(x)\cdot\nabla\varphi(x, s)\, d\mu_s^h ds. \end{split} \end{equation} | (4.4) |
Recall that we know from Theorem 2.1 that the limit \rho_h = \lim_{\Delta h\to 0} \rho_{h}^{\Delta h} exists in C([0, T], Z) ; in particular, \|\rho_{t, h}^{\Delta h}\|_Z\le C_T for any t\in [0, T] and any \Delta h small. Now, if \varphi satisfies \varphi \in C([0, T], C^{2+\alpha}(\mathbb{R}^d)) and \partial_t\varphi \in C([0, T], C^{1+\alpha}(\mathbb{R}^d)) , using that v_0, v_p \in C^{1+\alpha}(\mathbb{R}^d, \mathbb{R}^d) , we deduce that as \Delta h\to 0 ,
(\rho^{\Delta h}_{s, h}, \partial_t \varphi(., s) + v^h\cdot\nabla\varphi(., s)) \to (\rho_{s, h}, \partial_t \varphi(., s) + v^h\cdot\nabla\varphi(., s)). |
Moreover for any s\in [0, T] and \Delta h small,
\begin{eqnarray*} |(\rho^{\Delta h}_{s, h}, \partial_t \varphi(., s) + v^h\cdot\nabla\varphi(., s))| \le \|\rho_{s, h}^{\Delta h}\|_Z \| \partial_t \varphi(., s) + v^h\cdot\nabla\varphi(., s) \|_{C^{1+\alpha}} \le C_T. \end{eqnarray*} |
Using the Dominated Convergence Theorem, we can thus send \Delta h \to 0 in (4.4) to deduce:
\begin{equation} \begin{split} \left(\rho_{t, h}, \varphi(\cdot, t) \right) = & \int_0^t\int_{\mathbb{R}^d} v_p(x)\cdot\nabla\varphi(x, s)\, d\mu_s^h ds \\ & + \int_0^t \left(\rho_{s, h} , \partial_t \varphi(\cdot, s) + v^h(\cdot)\cdot\nabla\varphi(\cdot, s) \right)\, ds. \end{split} \end{equation} | (4.5) |
Thus, \rho_{t, h} is a very weak solution of (4.3) with initial condition \rho_{0, h} = 0 .
Let us prove that \rho_{t, h} is the unique weak solution to (4.3). First note that due to Corollary 2.1, \partial_x(v_p(x)\mu^h_t) \in C([0, T], Z) . Moreover, we claim that \rho_{t, h} \in \mathcal{A} for all t\in [0, T] , where \mathcal{A} is the admissibility class defined in (3.8). Indeed, since \rho_{t, h}^{\Delta h}\to \rho_{t, h} in Z , it suffices to verify that \|\rho_{t, h}^{\Delta h}\|_{BL^*}\le C with C independent of \Delta h . Recall that \mu_t^h = (T_t^h)^{\#} \mu_0 where T^h_t is the flow of v^h . Using Gronwall's inequality it is easy to see that
\|T_t^h-T_t^{h+\Delta h}\|_\infty \le \Delta h \|v_p\|_\infty \exp(Lip(v^h)t). |
Thus for any \phi\in W^{1, \infty}(\mathbb{R}^d) , \|\phi\|_{W^{1, \infty}}\le 1 ,
\begin{eqnarray*} (\mu_t^h-\mu_t^{h+\Delta h}, \phi) & = & (\mu_0, \phi\circ T_t^h - \phi\circ T_t^{h+\Delta h}) \le \|\mu_0\|_{TV} \|T_t^h - T_t^{h+\Delta h}\|_\infty \\ & \le & \|\mu_0\|_{TV} \Delta h \|v_p\|_\infty \exp(Lip(v^h)t) = : C_{T, h}\Delta h \end{eqnarray*} |
Taking the supremum over such \phi , we deduce that \|\mu_t^h-\mu_t^{h+\Delta h}\|_{BL^*}\le C_{T, h}\Delta h , that is, \|\rho_{t, h}^{\Delta h}\|_{BL^*}\le C_{T, h} uniformly in \Delta h , which proves the claim. Therefore, in view of Theorem 3.1, we conclude that \rho_{t, h} is the unique weak solution to (4.3).
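The flow estimate \|T_t^h-T_t^{h+\Delta h}\|_\infty \le \Delta h \|v_p\|_\infty \exp(Lip(v^h)t) obtained from Gronwall's inequality can be observed numerically. The following sketch rests on choices that are assumptions made for illustration only: d = 1, v_0 = \sin, v_p = \cos, h = 0.5, a classical fourth-order Runge-Kutta integrator, and a finite grid of starting points standing in for the supremum over x. The helper `rk4_flow` is hypothetical.

```python
import math
from typing import Callable


def rk4_flow(v: Callable[[float], float], x: float,
             t_end: float, steps: int = 200) -> float:
    """Approximate the flow T_t(x) of x' = v(x) with the classical RK4 scheme."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = v(x)
        k2 = v(x + 0.5 * dt * k1)
        k3 = v(x + 0.5 * dt * k2)
        k4 = v(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


h, dh, t_end = 0.5, 1e-2, 1.0


def v_h(x: float) -> float:
    return math.sin(x) + h * math.cos(x)            # v^h = v_0 + h v_p


def v_h_dh(x: float) -> float:
    return math.sin(x) + (h + dh) * math.cos(x)     # v^{h + dh}


# Finite grid used as a stand-in for the supremum over x.
grid = [-4.0 + 0.1 * k for k in range(81)]
gap = max(abs(rk4_flow(v_h, x, t_end) - rk4_flow(v_h_dh, x, t_end)) for x in grid)

lip = math.sqrt(1.0 + h * h)          # sup |cos x - h sin x| = Lip(v^h)
bound = dh * 1.0 * math.exp(lip * t_end)   # ||v_p||_inf = 1
print(gap, bound)
```

In this example the observed gap stays below the bound and scales linearly with \Delta h, which is the behaviour the proof exploits to bound the difference quotient in BL^*.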
Notice that in the previous proof we exploited the fact that we already knew that the derivative \rho_h = \lim_{\Delta h\to 0} \rho_{h}^{\Delta h} exists due to [23]. But the well-posedness theory we established in the previous section and the fact that \rho_{t, h} is characterized as the unique solution to equation (4.3) allow us to give an alternative short proof of the existence of \rho_h . Indeed, let us define \rho_{t, h} as the unique solution to (4.3). We then need to prove that
\begin{equation} \lim\limits_{\Delta h\to 0} \max\limits_{0\le t\le T} \|\rho_{t, h}^{\Delta h}-\rho_{t, h}\|_Z = 0. \end{equation} | (4.6) |
In view of (4.4), \rho_{t, h}^{\Delta h} satisfies
\partial_t \rho^{\Delta h}_{t, h} + \partial_x(v^h(x)\rho^{\Delta h}_{t, h}) = - \partial_x(v_p(x)\mu_t^{h+\Delta h}). |
Since \rho_{t, h}, \rho_{t, h}^{\Delta h}\in \mathcal{A} , Theorem 3.1 yields
\rho_{t, h} = \int_0^t (T_{t-s}^h)^{\#} \nu_s^h \, ds, \qquad \nu_s^h = - \partial_x(v_p(x)\mu_s^h), |
\rho_{t, h}^{\Delta h} = \int_0^t (T_{t-s}^h)^{\#} \nu_s^{h+\Delta h} \, ds, \qquad \nu_s^{h+\Delta h} = - \partial_x(v_p(x)\mu_s^{h+\Delta h}), |
where T_t^h is the flow of v^h . Then
\|\rho_{t, h}^{\Delta h}-\rho_{t, h}\|_Z \le \int_0^t \|(T_{t-s}^h)^{\#} \nu_s^{h+\Delta h}- (T_{t-s}^h)^{\#} \nu_s^h\|_Z\, ds \le C_{T, h}\int_0^t \|\nu_s^{h+\Delta h}- \nu_s^h\|_Z\, ds |
where we used in the last inequality that \|\phi\circ T_t^h\|_{C^{1+\alpha}}\le C_{T, h} for any \|\phi\|_{C^{1+\alpha}}\le 1 . We deduce (4.6) using Lemma 4.1 below.
Lemma 4.1. There holds
\begin{equation} \lim\limits_{\Delta h\to 0} \max\limits_{0\le s\le T} \|\nu_s^{h+\Delta h}- \nu_s^h\|_Z = 0. \end{equation} | (4.7) |
Proof. The proof follows the lines of the proof of Proposition 2.3. Suppose that (4.7) is not true so that there exist \varepsilon > 0 and sequences \{t_{\Delta h} \}\subset [0, T] , \{\phi_{\Delta h}\}\subset C^{1+\alpha}(\mathbb{R}^d) , \|\phi_{\Delta h}\|_{C^{1+\alpha}}\le 1 such that
\begin{equation} (\mu_{t_{\Delta h}}^{h+\Delta h}-\mu_{t_{\Delta h}}^h, v_p \nabla \phi_{\Delta h} )\ge \varepsilon > 0. \end{equation} | (4.8) |
As in the proof of Proposition 2.3 there exists \phi\in C^{1+\alpha}(\mathbb{R}^d) , \|\phi\|_{C^{1+\alpha}}\le 1 , such that up to a subsequence \phi_{\Delta h}\to \phi in C^1_{loc}(\mathbb{R}^d) . Moreover there exists t_0 = \lim_{\Delta h\to 0}t_{\Delta h} up to a subsequence. Independently recall that \mu_t^h = (T_t^h)^{\#}\mu_0 and \mu_t^{h+\Delta h} = (T_t^{h+\Delta h})^{\#}\mu_0 . It follows that \|\mu_t^h\|_{TV}, \|\mu_t^{h+\Delta h}\|_{TV}\le \|\mu_0\|_{TV} and also that for any \delta > 0 there exists a compact set K\subset \mathbb{R}^d such that
|\mu_t^{h+\Delta h}|( \mathbb{R}^d\backslash K), \, |\mu_t^h|( \mathbb{R}^d\backslash K)\le \delta \qquad \text{for any $|\Delta h|\le 1$ and $t\in [0, T]$.} |
Since v_p \nabla \phi_{\Delta h}\to v_p \nabla \phi in C_{loc}(\mathbb{R}^d) it follows that
\begin{equation} (\mu_{t_{\Delta h}}^{h+\Delta h}, v_p \nabla \phi_{\Delta h}) - (\mu_{t_{\Delta h}}^{h+\Delta h}, v_p \nabla \phi) \to 0. \end{equation} | (4.9) |
Finally, letting \psi: = v_p \nabla \phi we have
(\mu_{t_{\Delta h}}^{h+\Delta h}-\mu_{t_{\Delta h}}^h, v_p \nabla \phi) = \int_{ \mathbb{R}^d} \psi(T_{t_{\Delta h}}^{h+\Delta h}(x))-\psi(T_{t_{\Delta h}}^h(x)) \, d\mu_0(x). |
Since \psi is bounded and continuous, and T_{t_{\Delta h}}^{h+\Delta h}(x)\to T_{t_0}^h(x) , T_{t_{\Delta h}}^h(x)\to T_{t_0}^h(x) for any x\in \mathbb{R}^d , the Dominated Convergence Theorem gives (\mu_{t_{\Delta h}}^{h+\Delta h}-\mu_{t_{\Delta h}}^h, v_p \nabla \phi)\to 0 . This and (4.9) contradict (4.8).
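The representation \rho_{t, h} = \int_0^t (T_{t-s}^h)^{\#} \nu_s^h \, ds with \nu_s^h = - \partial_x(v_p(x)\mu_s^h) can be compared with a difference quotient in a case where the flow is explicit. The sketch below is a minimal illustration under assumptions not made in the text: d = 1, v_0(x) = -x and v_p = 1 (so that T_t^h(x) = x e^{-t} + h(1 - e^{-t}); note that this v_0 is unbounded and is used only because its flow is explicit), an atomic initial measure, the test function \phi = \sin, and a central rather than forward difference quotient. All function names are hypothetical.

```python
import math
from typing import Callable

# Atomic initial measure mu_0 = sum_i m_i delta_{x_i}, listed as (x_i, m_i).
ATOMS = [(-0.5, 0.3), (0.2, 0.5), (1.0, 0.2)]


def flow(t: float, x: float, h: float) -> float:
    """Explicit flow of v^h(x) = -x + h (illustrative choice v_0 = -x, v_p = 1)."""
    return x * math.exp(-t) + h * (1.0 - math.exp(-t))


def pair_mu(phi: Callable[[float], float], t: float, h: float) -> float:
    """(mu_t^h, phi) for the push-forward mu_t^h = (T_t^h)^# mu_0."""
    return sum(m * phi(flow(t, x, h)) for x, m in ATOMS)


def pair_rho_duhamel(dphi: Callable[[float], float], t: float, h: float,
                     n: int = 1000) -> float:
    """(rho_{t,h}, phi) from the Duhamel formula, with dphi = phi'."""
    def integrand(s: float) -> float:
        # (nu_s, phi o T_{t-s}) = (mu_s, v_p * d/dx [phi o T_{t-s}]) with v_p = 1
        # and d/dx T_{t-s}(x) = exp(-(t - s)).
        total = 0.0
        for x, m in ATOMS:
            xs = flow(s, x, h)
            total += m * dphi(flow(t - s, xs, h)) * math.exp(-(t - s))
        return total

    ds = t / n
    vals = [integrand(k * ds) for k in range(n + 1)]
    return ds * (sum(vals) - 0.5 * (vals[0] + vals[-1]))   # trapezoid rule


t, h, dh = 1.0, 0.3, 1e-5
quotient = (pair_mu(math.sin, t, h + dh) - pair_mu(math.sin, t, h - dh)) / (2.0 * dh)
duhamel = pair_rho_duhamel(math.cos, t, h)
print(quotient, duhamel)
```

In this setting both numbers approximate (1 - e^{-t}) \sum_i m_i \cos(T_t^h(x_i)), so the agreement is a consistency check of the formula rather than evidence for the general theorem.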
In this section we formulate an equation satisfied by the derivative
\rho_{t, h} = \lim\limits_{\Delta h \to 0} \frac{\mu^{h+\Delta h}_t - \mu^h_t}{\Delta h} |
where \mu^h_t solves
\begin{equation} \partial_t\mu_t^h + \partial_x(v^h[\mu_t^h](x)\mu_t^h) = 0 \end{equation} | (5.1) |
with the initial condition \mu^h_{|t = 0} = \mu_0 , where v^h[\mu_t^h] is a vector field which depends a priori in a nonlinear way on \mu_t^h .
Let us first present some heuristic computations to determine the equation \rho_{t, h} should satisfy. Let \rho^{\Delta h}_{t, h}: = (\mu_t^{h+\Delta h}-\mu_t^h)/{\Delta h} . Since \mu_t^{h+\Delta h} and \mu_t^h are solutions to (5.1) we have that for any \varphi \in C^1([0, T]\times \mathbb{R}^d) \cap W^{1, \infty}([0, T]\times \mathbb{R}^d) :
\begin{multline} \int_{\mathbb{R}^d} \varphi(x, t)\, d\mu^h_t(x) - \int_{\mathbb{R}^d} \varphi(x, 0)\, d\mu_0(x) \\ = \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s) d\mu^h_s(x)ds + \int_0^t\int_{\mathbb{R}^d} v^h[\mu^h_s](x)\cdot \nabla \varphi(x, s)\, d\mu^h_s ds \end{multline} | (5.2) |
and similarly
\begin{multline} \int_{\mathbb{R}^d} \varphi(x, t)\, d\mu^{h+\Delta h}_t(x) - \int_{\mathbb{R}^d} \varphi(x, 0)\, d\mu_0(x) = \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s) d\mu^{h+\Delta h}_s(x)ds \\ + \int_0^t\int_{\mathbb{R}^d} (v^h[\mu^{h+\Delta h}_s](x) + \Delta h\, v_p[\mu^{h+\Delta h}_s](x))\cdot \nabla \varphi(x, s)\, d\mu^{h+\Delta h}_s ds. \end{multline} | (5.3) |
Notice that \mu^{h+\Delta h}_s = \mu^{h}_s + \Delta h\, \rho^{\Delta h}_{s, h} . Then, formally performing a first-order Taylor expansion,
\begin{equation} \begin{split} v^h[\mu^{h+\Delta h}_s](x) & = v^h[\mu^{h}_s + \Delta h\, \rho^{\Delta h}_{s, h}] \\ & = v^h[\mu^{h}_s] + \Delta h\, Dv^h[\mu^h_s]\rho^{\Delta h}_{s, h} + \Delta h\, o(1). \end{split} \end{equation} | (5.4) |
Subtracting (5.2) from (5.3) and dividing by \Delta h , we then obtain
\begin{eqnarray*} && \int_{\mathbb{R}^d} \varphi(x, t)\, d\rho^{\Delta h}_{t, h}(x) - \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s) d\rho^{\Delta h}_{s, h}(x)ds - \int_0^t\int_{\mathbb{R}^d} v^h[\mu^h_s](x)\cdot\nabla\varphi(x, s)\, d\rho^{\Delta h}_{s, h}ds \\ && = \int_0^t\int_{ \mathbb{R}^d} \Big( Dv^h[\mu^h_s]\rho^{\Delta h}_{s, h} + v_p[\mu_s^{h+\Delta h}] + o(1)\Big)\cdot\nabla\varphi(x, s)\, d\mu_s^{h+\Delta h}ds. \end{eqnarray*} |
Thus \rho^{\Delta h}_{t, h} solves the linear equation
\begin{equation} \partial_t\rho^{\Delta h}_{t, h} + \partial_x (v^h[\mu_t^h]\rho^{\Delta h}_{t, h}) = - \partial_x \Big( (Dv^h[\mu^h_t]\rho^{\Delta h}_{t, h} + v_p[\mu_t^{h+\Delta h}] + o(1)) \mu_t^{h+\Delta h} \Big) \end{equation} | (5.5) |
with initial condition \rho^{\Delta h}_{t = 0, h} = 0 . We thus expect the limit \rho_{t, h} to solve
\begin{equation} \partial_t\rho_{t, h} + \partial_x (v^h[\mu_t^h]\rho_{t, h}) = - \partial_x \Big( (Dv^h[\mu^h_t]\rho_{t, h} + v_p[\mu_t^h])\mu_t^h \Big). \end{equation} | (5.6) |
Compared with the linear case studied in the previous section, where we obtained the sensitivity equation (4.3), the situation is now more complicated: even though (5.6) is linear in \rho_{t, h} , the right-hand side depends on \rho_{t, h} , so the existence and uniqueness theory developed so far does not apply directly.
It turns out, however, that the previous formal reasoning (in particular the formal Taylor expansion (5.4)) can be justified when v^h[\mu] is of the form (2.11), namely
\begin{equation*} \begin{array}{ll} v^h[\mu](x) & = v_0[\mu](x) + hv_p[\mu](x) \\ & = V_0\left(x, \int_{\mathbb{R}^d} K_{V_0}(x, y) d\mu(y)\right) + h V_p\left(x, \int_{\mathbb{R}^d} K_{V_p}(x, y) d\mu(y)\right) \end{array} \end{equation*} |
with V_0, V_p \in C^{1+\alpha}(\mathbb{R}^d \times \mathbb{R}, \mathbb{R}^d) and K_{V_0}, K_{V_p} \in C^{2+\alpha}(\mathbb{R}^d \times \mathbb{R}^d, \mathbb{R}) for some \alpha > \frac{1}{2} . In that case the derivative \rho_{t, h} exists according to Theorem 2.2 and we have the following result from [24] (Lemma 4.6):
Lemma 5.1. Let V, K_V \in C^{1+\alpha}(\mathbb{R}^d \times \mathbb{R}^d) and let the map h \mapsto \mu_t^h be differentiable in Z . Then, for every x \in \mathbb{R}^d , the map h \mapsto V \left(x, \int_{\mathbb{R}^d} K_V(x, y) d \mu_t^h\right) is C^{1+\alpha}(\mathbb{R}, \mathbb{R}^d) with norms bounded by some constant depending on the C^{1+\alpha} norms of V and K_V as well as the Z norm of the derivative of \mu_t^h . Moreover, if \rho_{t, h} = \lim_{\Delta h \to 0} \frac{\mu_t^{h+\Delta h} - \mu_t^h}{\Delta h} , we have the following chain rule:
\frac{ \partial}{ \partial h} V \left(x, \int_{\mathbb{R}^d} K_V(x, y) d \mu_t^h(y)\right) = \nabla_y V \left(x, \int_{\mathbb{R}^d} K_V(x, y) d \mu_t^h(y)\right) \left( \rho_{t, h}, K_V(x, \cdot) \right), |
where \nabla_y V denotes the gradient of V with respect to the second variable.
Then Lemma 5.1 and Lemma 2.1 give the following rigorous Taylor expansion:
Corollary 5.1. In the framework of Lemma 5.1,
\begin{equation} \begin{split} & V \left(x, \int_{\mathbb{R}^d} K_V(x, y) d \mu_t^{h+\Delta h}\right) - V \left(x, \int_{\mathbb{R}^d} K_V(x, y) d \mu_t^h\right) \\ & = \Delta h \, \mathcal{C}[V, \mu_t^h](x) \left( \rho_{t, h}, K_V(x, \cdot) \right) + O(|\Delta h|^{1+\alpha}) \end{split} \end{equation} | (5.7) |
where
\mathcal{C}[V, \mu](x) = \nabla_y V \left(x, \int_{\mathbb{R}^d} K_V(x, y) d \mu \right) |
and the O(|\Delta h|^{1+\alpha}) is uniform in x\in \mathbb{R}^d .
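The chain rule of Lemma 5.1 can be tested numerically on a hand-made family of measures. Everything specific in the sketch below is an assumption introduced for illustration: d = 1, the atomic family \mu^h = \sum_i m_i \delta_{y_i + h w_i} (for which (\rho_h, \psi) = \sum_i m_i w_i \psi'(y_i + h w_i) ), the Gaussian kernel K_V(x, y) = e^{-(x-y)^2} , and V(x, m) = x + \tanh(m) . The derivative in h is approximated by a central difference.

```python
import math

# Hypothetical atomic family mu^h = sum_i m_i delta_{y_i + h w_i}: (y_i, m_i, w_i).
ATOMS = [(-0.4, 0.5, 1.0), (0.3, 0.2, -0.5), (0.9, 0.3, 2.0)]


def K(x: float, y: float) -> float:
    """Illustrative smooth kernel K_V(x, y)."""
    return math.exp(-(x - y) ** 2)


def dK_dy(x: float, y: float) -> float:
    """Derivative of K_V with respect to its second argument."""
    return 2.0 * (x - y) * math.exp(-(x - y) ** 2)


def V(x: float, m: float) -> float:
    """Illustrative V(x, m); its gradient in the second variable is 1 - tanh(m)^2."""
    return x + math.tanh(m)


def dV_dm(x: float, m: float) -> float:
    return 1.0 - math.tanh(m) ** 2


def mean_field(x: float, h: float) -> float:
    """Integral of K_V(x, y) against mu^h."""
    return sum(m * K(x, y + h * w) for y, m, w in ATOMS)


def pair_rho(x: float, h: float) -> float:
    """(rho_h, K_V(x, .)) for rho_h = d/dh mu^h, i.e. sum_i m_i w_i dK/dy."""
    return sum(m * w * dK_dy(x, y + h * w) for y, m, w in ATOMS)


x, h, dh = 0.1, 0.2, 1e-5
lhs = (V(x, mean_field(x, h + dh)) - V(x, mean_field(x, h - dh))) / (2.0 * dh)
rhs = dV_dm(x, mean_field(x, h)) * pair_rho(x, h)
print(lhs, rhs)
```

The two printed values should coincide up to the truncation error of the central difference; the factor `dV_dm(...)` plays the role of \mathcal{C}[V, \mu^h](x) .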
The following theorem asserts that the sensitivity equation (5.6) we obtained formally is the correct one:
Theorem 5.1. The derivative \rho_{t, h} = \lim_{\Delta h \to 0} \frac{\mu_t^{h+\Delta h} - \mu_t^h}{\Delta h} where \mu_t^h and \mu_t^{h+\Delta h} solve (5.1) is the unique weak solution of
\begin{multline} \partial_t \rho_{t, h} + \partial_x(v^h[\mu_t^h] (x)\rho_{t, h}) = - \partial_x(v_p[\mu_t^h](x)\mu_t^h) \\ - \partial_x\left[\mathcal{C}[V_0, \mu_t^h](x) \left( \rho_{t, h}, K_{V_0}(x, \cdot) \right) \mu_t^h\right] - h\, \partial_x\left[\mathcal{C}[V_p, \mu_t^h](x) \left( \rho_{t, h}, K_{V_p}(x, \cdot) \right) \mu_t^h\right] \end{multline} | (5.8) |
with initial condition \rho_{0, h} = 0 . More precisely, the weak formulation is satisfied for all test functions \varphi(x, t) of regularity \varphi \in C([0, T], C^{2+\alpha}(\mathbb{R}^d)) , \varphi_t \in C([0, T], C^{1+\alpha}(\mathbb{R}^d)) , and \rho_{t, h} \in \mathcal{A} for all t\in [0, T] where \mathcal{A} is defined in (3.8).
Proof. Let \rho^{\Delta h}_{t, h}: = (\mu_t^{h+\Delta h}-\mu_t^h)/{\Delta h} . Since \mu_t^{h+\Delta h} and \mu_t^h are solutions to (5.1) we have that for any \varphi \in C^1([0, T]\times \mathbb{R}^d) \cap W^{1, \infty}([0, T]\times \mathbb{R}^d) :
\begin{multline} \int_{\mathbb{R}^d} \varphi(x, t)\, d\mu^h_t(x) - \int_{\mathbb{R}^d} \varphi(x, 0)\, d\mu_0(x) = \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s) d\mu^h_s(x)ds \\ +\int_0^t\int_{\mathbb{R}^d} (v_0[\mu^h_s](x) + hv_p[\mu^h_s](x))\cdot \nabla \varphi(x, s)\, d\mu^h_s ds \end{multline} | (5.9) |
and similarly
\begin{multline} \int_{\mathbb{R}^d} \varphi(x, t)\, d\mu^{h+\Delta h}_t(x) - \int_{\mathbb{R}^d} \varphi(x, 0)\, d\mu_0(x) = \int_0^t \int_{\mathbb{R}^d} \partial_t \varphi(x, s) d\mu^{h+\Delta h}_s(x)ds \\ + \int_0^t\int_{\mathbb{R}^d} (v_0[\mu^{h+\Delta h}_s](x) + (h+\Delta h)v_p[\mu^{h+\Delta h}_s](x))\cdot \nabla \varphi(x, s)\, d\mu^{h+\Delta h}_s ds. \end{multline} | (5.10) |
The plan is to subtract these equations, divide by \Delta h and pass to the limit \Delta h \to 0 . First, in view of (5.7),
\begin{eqnarray*} && v_0[\mu^{h+\Delta h}_s](x) + h v_p[\mu^{h+\Delta h}_s](x) = v_0[\mu^h_s](x) + h v_p[\mu^h_s](x) \\ && + \, \Delta h\, \mathcal{C}[V_0, \mu_s^h](x)\left( \rho_{s, h}, K_{V_0}(x, \cdot) \right) + h\, \Delta h\, \mathcal{C}[V_p, \mu_s^h](x)\left( \rho_{s, h}, K_{V_p}(x, \cdot) \right) + O(|\Delta h|^{1+\alpha}) . \end{eqnarray*} |
Therefore, for \varphi(x, t) of regularity \varphi \in C([0, T], C^{2+\alpha}(\mathbb{R}^d)) and \varphi_t \in C([0, T], C^{1+\alpha}(\mathbb{R}^d)) , we subtract (5.9) from (5.10), divide by \Delta h and send \Delta h \to 0 . Recalling that \rho_{t, h}^{\Delta h}\to \rho_{t, h} in Z uniformly in t\in [0, T] , we obtain
\begin{eqnarray*} \left(\rho_{t, h}, \varphi(\cdot, t) \right) & = & \int_0^t\int_{\mathbb{R}^d} v_p[\mu_s^h](x)\cdot\nabla\varphi(x, s)\, d\mu_s^h(x) ds \\ && + \int_0^t \left(\rho_{s, h} , \partial_t \varphi(\cdot, s) + v^h[\mu^h_s](\cdot)\cdot\nabla\varphi(\cdot, s) \right)\, ds \\ && +\int_0^t\int_{\mathbb{R}^d} \left[\mathcal{C}[V_0, \mu_s^h](x) \left( \rho_{s, h}, K_{V_0}(x, \cdot) \right) \right] \cdot \nabla\varphi(x, s) d\mu_s^h(x) ds \\ && +h \int_0^t\int_{\mathbb{R}^d} \left[\mathcal{C}[V_p, \mu_s^h](x) \left( \rho_{s, h}, K_{V_p}(x, \cdot) \right) \right] \cdot \nabla\varphi(x, s) d\mu_s^h(x) ds. \end{eqnarray*} |
Thus, \rho_{t, h} is a weak solution of (5.8). It is also in the admissible class \mathcal{A} due to the Lipschitz continuity of solutions with respect to the vector field.
To obtain uniqueness, suppose that \rho^{(1)}_{t, h} and \rho^{(2)}_{t, h} are solutions to (5.8) with values in \mathcal{A} . Then, their difference \rho_{t, h} = \rho^{(1)}_{t, h} - \rho^{(2)}_{t, h}\in \mathcal{A} satisfies
\begin{align} \left(\rho_{t, h}, \varphi(\cdot, t) \right) = & \int_0^t \left(\rho_{s, h} , \partial_t \varphi(\cdot, s) + v^h[\mu^h_s](\cdot)\cdot\nabla\varphi(\cdot, s) \right)\, ds \\ &+ \int_0^t\int_{\mathbb{R}^d} \left[\mathcal{C}[V_0, \mu_s^h](x) \left( \rho_{s, h}, K_{V_0}(x, \cdot) \right) \right] \cdot \nabla\varphi(x, s) d\mu_s^h(x) ds \\ &+ h \int_0^t\int_{\mathbb{R}^d} \left[\mathcal{C}[V_p, \mu_s^h](x) \left( \rho_{s, h}, K_{V_p}(x, \cdot) \right) \right] \cdot \nabla\varphi(x, s) d\mu_s^h(x) ds. \end{align} | (5.11) |
Fix \psi \in C^{2+\alpha}(\mathbb{R}^d) . As in the proof of Proposition 3.1, we again use the duality method to find a test function \varphi_{\psi}(x, t) such that
\partial_s \varphi_{\psi}(x, s) + v^h[\mu^h_s](x)\cdot\nabla\varphi_{\psi}(x, s) = 0, \quad \quad \varphi_{\psi}(x, t) = \psi(x). |
Actually, it can be given explicitly as \varphi_{\psi}(x, s) = \psi(T(x, t, s)) where T is the flow of the non-autonomous vector field v^h[\mu^h_s] which solves the ODE:
\partial_s \; T(x, s, t) = v^h[\mu^h_s]\left(T(x, s, t) \right), \quad \quad T(x, t, t) = x, |
see Remark 8.1.5 and Proposition 8.1.7 in [27]. Using the test function \varphi_{\psi} in (5.11) we deduce
\begin{equation} \begin{split} \left(\rho_{t, h}, \psi \right) & = \int_0^t\int_{\mathbb{R}^d} \left[\mathcal{C}[V_0, \mu_s^h](x) \left( \rho_{s, h}, K_{V_0}(x, \cdot) \right) \right] \cdot \nabla\varphi_{\psi}(x, s) d\mu_s^h(x) ds \\ &+ h \int_0^t\int_{\mathbb{R}^d} \left[\mathcal{C}[V_p, \mu_s^h](x) \left( \rho_{s, h}, K_{V_p}(x, \cdot) \right) \right] \cdot \nabla\varphi_{\psi}(x, s) d\mu_s^h(x) ds \end{split} \end{equation} | (5.12) |
for any \psi \in C^{2+\alpha}(\mathbb{R}^d) . Since the kernels K_{V_0} and K_{V_p} are both assumed to be C^{2+\alpha}(\mathbb{R}^d \times \mathbb{R}^d) , there is a constant C such that
\left( \rho_{s, h}, K_{V_0}(x, \cdot) \right), \left( \rho_{s, h}, K_{V_p}(x, \cdot) \right) \leq C \sup\limits_{\|\psi\|_{ C^{2+\alpha}} \leq 1} \left(\rho_{s, h}, \psi \right). |
Moreover, for \psi \in C^{2+\alpha}(\mathbb{R}^d) with \| \psi \|_{C^{2+\alpha}} \leq 1 we see from the explicit formula that there is another constant C such that \|\nabla\varphi_{\psi} \|_{\infty} \leq C . Therefore, from (5.12), we conclude
\sup\limits_{\|\psi\|_{ C^{2+\alpha}} \leq 1} \left(\rho_{t, h}, \psi \right) \leq C \int_0^t \sup\limits_{\|\psi\|_{ C^{2+\alpha}} \leq 1} \left(\rho_{s, h}, \psi\right) ds |
for some possibly bigger constant C . Now, Gronwall's inequality implies
\left(\rho_{s, h}, \psi\right) = 0 |
for all s \in [0, t] and all \psi \in C^{2+\alpha}(\mathbb{R}^d) . As \rho_{s, h} is in the admissible class \mathcal{A} , we can repeat the uniqueness proof from Theorem 3.1 to deduce that \rho_{s, h} = 0 as desired.
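For atomic initial data the nonlinear problem (5.1) reduces to an interacting particle system, and the sensitivity with respect to h can be computed by differentiating the particle ODEs. The sketch below is a particle-level counterpart of Theorem 5.1 and not the authors' scheme; it relies on assumptions chosen for illustration: d = 1, V_0(x, m) = V_p(x, m) = m , kernels K_{V_0}(x, y) = \sin(y - x) and K_{V_p}(x, y) = \cos(y - x) , test function \phi = \sin , and RK4 time stepping. With z_i = \partial_h x_i , one has (\rho_{t, h}, \phi) = \sum_i m_i \phi'(x_i) z_i , which is compared with a central difference quotient of (\mu_t^h, \phi) .

```python
import math
from typing import List, Tuple

M = [0.2, 0.3, 0.1, 0.4]        # particle masses m_j (illustrative)
X0 = [-1.0, -0.2, 0.5, 1.3]     # initial positions of the atoms of mu_0


def rhs(state: List[float], h: float) -> List[float]:
    """Right-hand side for positions x_i and sensitivities z_i = d x_i / d h."""
    n = len(M)
    x, z = state[:n], state[n:]
    dx, dz = [], []
    for i in range(n):
        vx = vz = 0.0
        for j in range(n):
            d = x[j] - x[i]
            s, c = math.sin(d), math.cos(d)
            vx += M[j] * (s + h * c)                       # v^h[mu](x_i)
            # Linearisation in the positions, plus the explicit h-derivative (v_p term).
            vz += M[j] * ((c - h * s) * (z[j] - z[i]) + c)
        dx.append(vx)
        dz.append(vz)
    return dx + dz


def solve(h: float, t_end: float = 1.0, steps: int = 200) -> Tuple[List[float], List[float]]:
    """Integrate positions and sensitivities with RK4, starting from z = 0."""
    y = X0 + [0.0] * len(X0)
    dt = t_end / steps
    for _ in range(steps):
        k1 = rhs(y, h)
        k2 = rhs([a + 0.5 * dt * b for a, b in zip(y, k1)], h)
        k3 = rhs([a + 0.5 * dt * b for a, b in zip(y, k2)], h)
        k4 = rhs([a + dt * b for a, b in zip(y, k3)], h)
        y = [a + dt * (p + 2.0 * q + 2.0 * r + s) / 6.0
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    n = len(X0)
    return y[:n], y[n:]


def pair_mu(x: List[float]) -> float:
    """(mu_t, phi) with phi = sin."""
    return sum(m * math.sin(xi) for m, xi in zip(M, x))


h, dh = 0.4, 1e-5
x, z = solve(h)
sens = sum(m * math.cos(xi) * zi for m, xi, zi in zip(M, x, z))   # (rho_{t,h}, phi)
quot = (pair_mu(solve(h + dh)[0]) - pair_mu(solve(h - dh)[0])) / (2.0 * dh)
print(sens, quot)
```

In the sensitivity equation for z_i , the terms involving z_j correspond to the pairings ( \rho, K(x_i, \cdot) ) on the right-hand side of (5.8), the constant term to v_p[\mu]\mu , and the terms involving z_i to the transport of \rho by v^h[\mu] .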
As mentioned above, transport-type equations like (1.1) describe a wide variety of phenomena occurring in physics, biology and the social sciences. In this section we present applications in which the theory developed here is of use.
Here we are interested in functionals of the form
\mathcal{J}(h) = \int_{\mathbb{R}^d} F(x) d\mu_t^h(x), |
where \mu_t^h is a measure solution to the perturbed transport equation (1.2) on the space of nonnegative Radon measures, while F \in C^{1+\alpha}(\mathbb{R}^d) . Such functionals can describe various quantities of practical importance. For example, for F(x) = 1 this functional provides the total number of individuals in a population, since \mu^h \in C([0, T], \mathcal{M}^+ (\mathbb{R}^d)) .
Now, let \partial_h \mu_t^h \in C([0, T], Z) be the derivative of \mu_t^h with respect to h . Then, h \mapsto \mathcal{J}(h) is differentiable and
\partial_h \mathcal{J}(h) = (\partial_h \mu_t^h, F). |
The value of this derivative can be used in the optimization of the functional \mathcal{J} , i.e., in finding the value of h for which \mathcal{J} is smallest. Our work characterizes the derivative as the solution of a PDE, which makes it possible to design approximation schemes for the quantity (\partial_h \mu_t^h, F) .
Another application of paramount importance is parameter estimation and fitting models to data, as this allows for model validation. To this end, let \int_{\mathbb{R}^d} d\mu_t^h represent the total number of individuals in a population at time t provided by the perturbed transport equation model considered on the space of nonnegative Radon measures. Suppose that D_{k} represents data on the number of individuals in the population at time t_k , k = 1, \dots, K (a time series of the total population). Consider the following minimization problem involving a least-squares functional that measures the distance between the model solution and data:
\min\limits_{h} \mathcal{J} (h) = \min\limits_{h} \sum\limits_{k = 1}^K \left |\int_{\mathbb{R}^d} d\mu_{t_k}^h - D_{k} \right |^2, |
subject to
\partial_t\mu^h_{t} + \partial_x(v^h[\mu^h_t](x)\mu^h_t) = 0, \quad \mu^h_t |_{t = 0} = \mu_0 \in \mathcal{M}^+ (\mathbb{R}^d). |
The derivative \partial_h\mathcal{J}(h) , which depends on the derivative \partial_h \mu_t^{h} , the solution to (4.3), can be used to minimize the least-squares distance \mathcal{J}(h) . The value \bar h that minimizes \mathcal{J}(h) also provides an estimate for the vector field, given by v^{\bar h} .
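To make the role of the sensitivity in such a fit concrete, the following sketch runs a Gauss-Newton iteration on a least-squares functional of the same shape. It is only an illustration under stated assumptions, none of which come from the text: the linear model v^h(x) = -x + h with explicit flow T_t^h(x) = x e^{-t} + h(1 - e^{-t}) , an atomic \mu_0 , synthetic noise-free data generated with h = 0.7 , and the observable F(x) = x^2 substituted for the population count so that the data depend on h . Gauss-Newton is one possible optimizer among many; the names `observable`, `sensitivity` and `gauss_newton` are hypothetical.

```python
import math
from typing import List

ATOMS = [(0.5, 1.0 / 3.0), (1.0, 1.0 / 3.0), (1.5, 1.0 / 3.0)]   # (x_i, m_i)
TIMES = [0.5, 1.0, 1.5, 2.0]                                     # observation times t_k


def observable(h: float, t: float) -> float:
    """(mu_t^h, F) with F(x) = x^2 and mu_t^h the push-forward of mu_0 by T_t^h."""
    b = 1.0 - math.exp(-t)
    return sum(m * (x * math.exp(-t) + h * b) ** 2 for x, m in ATOMS)


def sensitivity(h: float, t: float) -> float:
    """(rho_{t,h}, F) = sum_i m_i F'(X_i) dX_i/dh, where dX_i/dh = 1 - exp(-t)."""
    b = 1.0 - math.exp(-t)
    return sum(m * 2.0 * (x * math.exp(-t) + h * b) * b for x, m in ATOMS)


H_TRUE = 0.7
DATA: List[float] = [observable(H_TRUE, t) for t in TIMES]   # synthetic data D_k


def gauss_newton(h: float = 0.0, iters: int = 20) -> float:
    """Minimise sum_k |(mu_{t_k}^h, F) - D_k|^2 over h by Gauss-Newton steps."""
    for _ in range(iters):
        residuals = [observable(h, t) - d for t, d in zip(TIMES, DATA)]
        slopes = [sensitivity(h, t) for t in TIMES]
        h -= sum(r * s for r, s in zip(residuals, slopes)) / sum(s * s for s in slopes)
    return h


h_est = gauss_newton()
print(h_est)
```

Each iteration needs the model output and its h-derivative at every observation time, which is why an efficient solver for the sensitivity equation matters in practice.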
We conclude by pointing out that the above two applications demonstrate the need for the development of numerical approximation schemes for computing solutions to transport equations of the type (4.1) or (4.3). There have been some efforts in the direction of solving transport equations in the space of nonnegative Radon measures endowed with the bounded Lipschitz norm (e.g., [31,32]), but to our knowledge, no such numerical schemes exist for solving transport equations in the space Z . Furthermore, because minimization problems generally involve computing the solution multiple times until a minimizer is reached, it is important for any scheme developed to be efficient and fast.
Proof. We want to prove that if \mu_n\to \mu narrowly, then \partial_x\mu_n\to \partial_x\mu in Z i.e.
\lim\limits_{n\to +\infty} \| \partial_x\mu_n- \partial_x\mu\|_Z = \lim\limits_{n\to +\infty} \sup\limits_{\|\phi\|_{C^{1, \alpha}}\le 1} \, |(\mu_n-\mu, \partial_x\phi)| = 0. |
Assume that this is not true. Then there exist \varepsilon > 0 , a subsequence (\mu_{n_k})_k that we still denote by (\mu_n)_n for simplicity, and functions \phi_n , \|\phi_n\|_{C^{1, \alpha}}\le 1 , such that
\begin{equation} |(\mu_n-\mu, \partial_x\phi_n)| \ge \varepsilon > 0. \end{equation} | (6.1) |
By the Arzelà-Ascoli theorem, up to a subsequence, \phi_n\to \phi in C^1(K) for any compact set K\subset \mathbb{R}^d . Passing to the limit in |\phi_n(x)|\le 1 , |\phi'_n(x)|\le 1 , and |\phi_n'(x)-\phi_n'(y)|\le |x-y|^\alpha , we obtain that \|\phi\|_{C^{1, \alpha}}\le 1 . From Theorem 5 in [33], we deduce that
(\mu_n, \partial_x\phi_n) \to (\mu, \partial_x \phi). |
Moreover, from Theorem 4 in [33], we know that the sequence \{\mu_n \}_{n \in \mathbb{N}} is tight and TV-bounded. It follows that \mu is bounded and thus tight. We deduce that
(\mu, \partial_x\phi_n) \to (\mu, \partial_x \phi). |
These two facts contradict (6.1).
Proof. Let \{\mu_n\}_{n \in \mathbb{N}} \subset \mathcal{M}(\mathbb{R}^d) be such that \mu_n\to\mu in Z for \mu \in Z . Let \phi\in C^{1, \alpha}(\mathbb{R}^d) with \|\phi\|_{C^{1, \alpha}}\le 1 . Since T\in C^{1, \alpha}(\mathbb{R}^d, \mathbb{R}^d) we have \phi\circ T\in C^{1, \alpha}(\mathbb{R}^d) with \|\phi\circ T\|_{C^{1, \alpha}} \le C , independently of \phi . Then
|(T^{\#}\mu_n - T^{\#}\mu_m, \phi)| = |(\mu_n - \mu_m, \phi\circ T)| \le \|\mu_n-\mu_m\|_Z \|\phi\circ T\|_{C^{1, \alpha}} \le C \|\mu_n-\mu_m\|_Z. |
Thus, \|T^{\#}\mu_n - T^{\#}\mu_m\|_Z\le C \|\mu_n-\mu_m\|_Z and so the sequence \{T^{\#}\mu_n\}_{n \in \mathbb{N}} is a Cauchy sequence in Z . By completeness of Z , it converges to some element we denote by T^{\#}\mu . This is independent of the choice of the approximating sequence \mu_n because if \{\tilde\mu_n \}_{n\in \mathbb{N}} \subset \mathcal{M}(\mathbb{R}^d) is another sequence such that \tilde\mu_n\to \mu in Z then
|(T^{\#}\tilde\mu_n-T^{\#}\mu_n, \phi)| = |(\mu_n-\tilde \mu_n, \phi\circ T)| \le C\|\mu_n-\mu\|_Z + C\|\tilde\mu_n-\mu\|_Z |
so that \|T^{\#}\tilde\mu_n-T^{\#}\mu_n\|_Z\to 0 . Moreover, for any \phi\in C^{1, \alpha}(\mathbb{R}^d) ,
(T^{\#}\mu, \phi) = \lim\limits_{n \to \infty} \, (T^{\#}\mu_n, \phi) = \lim \, (\mu_n, \phi\circ T) = (\mu, \phi\circ T). |
Proof. First note that the map t\in [0, T] \mapsto \mu_t is uniformly continuous, so there is a nondecreasing function \omega:[0, \infty) \to [0, \infty) with \lim_{t\to 0^+} \omega(t) = \omega(0) = 0 (usually called a modulus of continuity) such that
\| \mu_t - \mu_s \|_{Z} \leq \omega(|t-s|) \qquad s, t\in [0, T]. |
Given n \in \mathbb{N} , let \delta_n = T/n . We consider the partition \{t^{(n)}_0, .., t^{(n)}_n\} of [0, T] with mesh points t^{(n)}_k = k\delta_n for k = 0, ..., n . For each such k , consider a bounded measure \mu_k^{(n)} such that
\|\mu_{t^{(n)}_k} - \mu_k^{(n)}\|_{Z} \leq 1/n. |
Then, we define \mu^{(n)}\in C([0, T], Z) as the polygonal curve passing through the points (t^{(n)}_k, \mu_k^{(n)}) , k = 0, .., n , namely
\mu^{(n)}_t = \begin{cases} \mu_k^{(n)} & \mbox{ if } t = t^{(n)}_k \mbox{ for some } k = 0, ..., n. \\ \frac{t-t^{(n)}_k}{\delta_n}\mu_{k+1}^{(n)} +\frac{ t^{(n)}_{k+1}-t}{\delta_n}\mu_k^{(n)} & \mbox{ if } t \in (t^{(n)}_k, t^{(n)}_{k+1}) \mbox{ for some } k = 0, ..., n-1. \end{cases} |
Clearly, \mu^{(n)} \in C([0, T], Z) and for any n , \max_{0\le t\le T}\|\mu^{(n)}_t\|_{TV}\le C_n .
Now, for t\in[0, T] , let \hat t and \check t be the closest mesh points from left and right respectively. Then,
\begin{eqnarray*} \Big\| \mu^{(n)}_t - \mu^{(n)}_{\hat t}\Big\|_Z & = & \frac{ t-\hat t}{\delta_n} \Big\| \mu^{(n)}_{\check t}- \mu^{(n)}_{\hat t}\Big\|_Z \leq \Big\| \mu^{(n)}_{\check t}- \mu^{(n)}_{\hat t}\Big\|_Z \\ & \leq & 2/n + \Big\| \mu_{\check t}- \mu_{\hat t}\Big\|_Z \leq 2/n + \omega(|\check t - \hat t|) \leq 2/n + \omega(\delta_n) \end{eqnarray*} |
Therefore, for any t\in[0, T] :
\begin{eqnarray*} \Big\| \mu^{(n)}_t - \mu_t \Big\|_Z & \leq & \Big\| \mu^{(n)}_t - \mu^{(n)}_{\hat t}\Big\|_Z + \Big\| \mu^{(n)}_{\hat t} - \mu_{\hat t}\Big\|_Z + \Big\| \mu_{\hat t} - \mu_{t}\Big\|_Z \\ & \leq & (2/n + \omega(\delta_n)) + 1/n + \omega(\delta_n) \end{eqnarray*} |
Thus \lim_{n\to +\infty} \max_{0\le t\le T} \Big\| \mu^{(n)}_t - \mu_t \Big\|_Z = 0 .
Proof. In view of Lemma 2.2 there exists (\nu^n)_n\subset C([0, T], \mathcal{M}_b(\mathbb{R}^d)) such that \lim_{n\to +\infty} \|\nu^n_t - \nu_t\|_Z = 0 uniformly in t\in [0, T] . For any \phi\in C^{1, \alpha}(\mathbb{R}^d) , \|\phi\|_{C^{1, \alpha}}\le 1 , and any s, t\in [0, T] , we write
\begin{eqnarray*} && |(T_s^{\#} \nu_s - T_t^{\#} \nu_t, \phi) | \le |(T_s^{\#} \nu_s - T_s^{\#} \nu_s^n, \phi)| +|(T_s^{\#} \nu_s^n - T_t^{\#} \nu_t^n, \phi)| + |(T_t^{\#} \nu_t^n - T_t^{\#} \nu_t, \phi)| \\ && \le \|\nu_s-\nu_s^n\|_Z \|\phi\circ T_s\|_{C^{1, \alpha}} + |(T_s^{\#} \nu_s^n - T_t^{\#} \nu_t^n, \phi)| + \|\nu_t-\nu_t^n\|_Z \|\phi\circ T_t\|_{C^{1, \alpha}} \end{eqnarray*} |
In view of Lemma 2.1 and Proposition 2.1 we have \|\phi\circ T_\tau\|_{C^{1, \alpha}}\le C_T for any \tau\in[0, T] . Thus
\begin{eqnarray*} && |(T_s^{\#} \nu_s - T_t^{\#} \nu_t, \phi) | \le |(T_s^{\#} \nu_s^n - T_t^{\#} \nu_t^n, \phi)| + 2C_T \max\limits_{0\le t\le T}\|\nu_t-\nu_t^n\|_Z \end{eqnarray*} |
Now, we handle the first term on the right-hand side as follows
\begin{eqnarray*} |(T_s^{\#} \nu_s^n - T_t^{\#} \nu_t^n, \phi)| & \le & |(T_s^{\#} \nu_s^n - T_s^{\#} \nu_t^n, \phi)| + |(T_s^{\#} \nu_t^n - T_t^{\#} \nu_t^n, \phi)| \\ & \le & \|\nu_s^n-\nu_t^n\|_Z \|\phi\circ T_s\|_{C^{1, \alpha}} + \|\nu_t^n\|_{TV} \|\phi\circ T_s-\phi\circ T_t\|_\infty \\ & \le & C_T\|\nu_s^n-\nu_t^n\|_Z + \|\nu_t^n\|_{TV} \|v\|_\infty|s-t|. \end{eqnarray*} |
Thus,
\begin{eqnarray*} && |(T_s^{\#} \nu_s - T_t^{\#} \nu_t, \phi) | \le C_T\|\nu_s^n-\nu_t^n\|_Z + \|\nu_t^n\|_{TV} \|v\|_\infty|s-t| + 2C_T \max\limits_{0\le t\le T}\|\nu_t-\nu_t^n\|_Z. \end{eqnarray*} |
We conclude recalling that for a fixed n , \nu_t^n is continuous in t for the Z -norm and TV-bounded uniformly in t\in [0, T] .
Nicolas Saintier is supported by the University of Buenos Aires through the grant UBACYT 20020170200256BA. Jakub Skrzeczkowski is supported by National Science Center, Poland through project no. 2017/27/B/ST1/01569.
The authors declare that there are no conflicts of interest.
[1] | Abd Algani YM, Ritonga M, Bala BK, et al. (2022) Machine learning in health condition check-up: An approach using Breiman's random forest algorithm. Measurement 23: 100406. |
[2] | Ariza-Garzón MJ, Arroyo J, Caparrini A, et al. (2020) Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access 8: 64873–64890. https://doi.org/10.1109/ACCESS.2020.2984412 |
[3] | Biecek P, Burzykowski T (2021a) Explanatory model analysis: explore, explain, and examine predictive models. CRC Press. https://doi.org/10.1201/9780429027192 |
[4] | Biecek P, Burzykowski T (2021b) Local interpretable model-agnostic explanations (lime). Explanatory Model Analysis Explore, Explain and Examine Predictive Models, 1: 107–124. |
[5] |
Breiman L (2001) Random forests. Mach learn 45: 5–32. https://doi.org/10.1023/A:1010933404324 doi: 10.1023/A:1010933404324
![]() |
[6] |
Chen Y, Calabrese R, Martin-Barragan B (2024) Interpretable machine learning for imbalanced credit scoring datasets. Eur J Oper Res 312: 357–372. https://doi.org/10.1016/j.ejor.2023.06.036 doi: 10.1016/j.ejor.2023.06.036
![]() |
[7] | Davis R, Lo AW, Mishra S, et al. (2022) Explainable machine learning models of consumer credit risk. J Financ Data Sci 5. |
[8] | Du Toit H, Schutte WD, Raubenheimer H (2023) Shapley values as an interpretability technique in credit scoring. J Risk Model Validat 17. |
[9] |
Dube L, Verster T (2023) Enhancing classification performance in imbalanced datasets: A comparative analysis of machine learning models. Data Sci Financ Econ 3: 354–379. https://doi.org/10.3934/DSFE.2023021 doi: 10.3934/DSFE.2023021
![]() |
[10] | Dube L, Verster T (2024) Assessing the performance of machine learning models for default prediction under missing data and class imbalance: A simulation study. ORiON 40: 1–24. |
[11] |
Dumitrache A, Nastu AA, Stancu S (2020) Churn prediction in telecommunication industry: Model interpretability. J Eastern Eur Res Bus Econ 2020. https://doi.org/10.5171/2020.241442 doi: 10.5171/2020.241442
![]() |
[12] | Fisher A, Rudin C, Dominici F (2019) All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously. J Mach Learn Res 20: 1–81. |
[13] | Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27: 294–300. |
[14] | Goutte C, Gaussier E (2005) A probabilistic interpretation of precision, recall and f-score, with implication for evaluation. In European conference on information retrieval, 345–359, Springer. |
[15] | Greenwell BM (2017) pdp: An r package for constructing partial dependence plots. R J 9: 421. |
[16] |
Gregorutti B, Michel B, Saint-Pierre P (2017) Correlation and variable importance in random forests. Stat Comput 27: 659–678. https://doi.org/10.1007/s11222-016-9646-1 doi: 10.1007/s11222-016-9646-1
![]() |
[17] | Guliyev H, Tatoğlu FY (2021) Customer churn analysis in banking sector: Evidence from explainable machine learning models. J Appl Microeconometrics 1: 85–99. |
[18] | Hastie T, Tibshirani R, Friedman J, et al. (2009) Random forests. The elements of statistical learning: Data mining, inference, and prediction, 587–604. |
[19] |
Jafari MJ, Tarokh MJ, Soleimani P (2023) An interpretable machine learning framework for customer churn prediction: A case study in the telecommunications industry. J Ind Eng Manage Stud 10: 141–157. https://doi.org/10.22116/jiems.2023.365114.1504 doi: 10.22116/jiems.2023.365114.1504
![]() |
[20] |
Jiao Y, Du P (2016) Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant Biol 4: 320–330. https://doi.org/10.1007/s40484-016-0081-2 doi: 10.1007/s40484-016-0081-2
![]() |
[21] | Liaw A, Wiener M, et al. (2002) Classification and regression by randomforest. R News 2: 18–22. |
[22] | Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Adv Neur Inf Process Syst 30. |
[23] |
Moraffah R, Karami M, Guo R, et al. (2020) Causal interpretability for machine learning-problems, methods and evaluation. ACM SIGKDD Explor Newsl 22: 18–33. https://doi.org/10.1145/3400051.3400058 doi: 10.1145/3400051.3400058
![]() |
[24] | Nationalbank Oesterreichische (2004). Guidelines on credit risk management: Rating models and validation. Oesterreichische Nationalbank. |
[25] | Nohara Y, Matsumoto K, Soejima H, et al. (2022) Explanation of machine learning models using Shapley additive explanation and application for real data in hospital. Comput Meth Prog Bio 214: 106584. |
[26] | Peng K, Peng Y, Li W (2023) Research on customer churn prediction and model interpretability analysis. Plos one 18: e0289724. |
[27] | Ribeiro MT, Singh S, Guestrin C (2016) Model-agnostic interpretability of machine learning. arXiv preprint. https://doi.org/10.48550/arXiv.1606.05386 |
[28] | Rodríguez-Pérez R, Bajorath J (2019) Interpretation of compound activity predictions from complex machine learning models using local approximations and Shapley values. J Med Chem 63: 8761–8777. |
[29] | Shahhosseini M, Hu G (2021) Improved weighted random forest for classification problems. In Progress in Intelligent Decision Science: Proceeding of IDS 2020, 42–56, Springer. |
[30] | Shapley L (2020) A value for n-person games. Class Game Theory 69–79. |
[31] | Staniak M, Biecek P (2018) Explanations of model predictions with live and breakdown packages. arXiv preprint. |
[32] |
Tekouabou SC, Gherghina SC, Toulni H, et al. (2022) Towards explainable machine learning for bank churn prediction using data balancing and ensemble-based methods. Mathematics 10: 2379. https://doi.org/10.3390/math10142379 doi: 10.3390/math10142379
![]() |
[33] |
Tran KL, Le HA, Nguyen TH, et al. (2022) Explainable machine learning for financial distress prediction: evidence from Vietnam. Data 7: 160. https://doi.org/10.3390/data7110160 doi: 10.3390/data7110160
![]() |
[34] |
Uddin MS, Chi G, Al Janabi MA, et al. (2022) Leveraging random forest in micro-enterprises credit risk modelling for accuracy and interpretability. Int J Financ Econ 27: 3713–3729. https://doi.org/10.1002/ijfe.2346 doi: 10.1002/ijfe.2346
![]() |
[35] |
Verster T, Fourie E (2023) The changing landscape of financial credit risk models. Int J Financ Stud 11: 98. https://doi.org/10.3390/ijfs11030098 doi: 10.3390/ijfs11030098
![]() |
[36] |
Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Min ASA Data Sci J 6: 496–505. https://doi.org/10.1002/sam.11196 doi: 10.1002/sam.11196
![]() |
[37] |
Yu F, Wei C, Deng P, et al. (2021) Deep exploration of random forest model boosts the interpretability of machine learning studies of complicated immune responses and lung burden of nanoparticles. Sci Adv 7: eabf4130. https://doi.org/10.1126/sciadv.abf413 doi: 10.1126/sciadv.abf413
![]() |
[38] |
Zhu X, Chu Q, Song X, et al. (2023) Explainable prediction of loan default based on machine learning models. Data Sci Manag 6: 123–133. https://doi.org/10.1016/j.dsm.2023.04.003 doi: 10.1016/j.dsm.2023.04.003
![]() |