Citation: Hamidou Tembine. Mean-field-type games[J]. AIMS Mathematics, 2017, 2(4): 706-735. doi: 10.3934/Math.2017.4.706
The term "mean-field" has been referred to a physics concept that attempts to describe the effect of an infinite number of particles on the motion of a single particle. Researchers began to apply the concept to social sciences in the early 1960s to study how an infinite number of factors affect individual decisions. However, the key ingredient in a game-theoretic context is the influence of the distribution of states and or control actions into the payoffs of the decision-makers who may have different preferences and characters, and are not necessarily exchangeable per class (or indistinguishable per class/type). A mean-field-type game is a game in which the payoffs and/or the state dynamics coefficient functions involve not only the state and actions profiles but also the distributions of state-action process (or its marginal distributions). In contrast to mean-field games [2,3] in which a single player does not influence of the mean-field terms, here, it is totally different. In mean-field-type games, a single player may have a strong impact on the mean-field terms. This paper presents the key ingredients and recent development of mean-field-type game theory.
A stochastic game is a model for dynamic interactions in which the state evolves in a way that depends on the actions of the decision-makers. The model was introduced in [1], which proved that two-player zero-sum discounted games with finitely many states under perfect state observation have a value and that both players have optimal stationary strategies. In Shapley's initial model of stochastic games, the state transitions and the instant payoff were determined by the state-action profiles; explicit dependence on the distribution of states or the distribution of actions was not examined.
A stochastic optimization problem in which the first moment (the expected value of the state) influences the state dynamics and the performance criterion was examined in [10]. This corresponds to a first-moment-based mean-field-type game with a single decision-maker. The model was extended in [11] to include aggregative structures of the distribution of states, and a new stochastic maximum principle was established in the risk-neutral setting. The resulting system is not a simple augmented-state-space approach because the expected value of the Hamiltonian is involved in it. The work of [17] extends that result to performance criteria that include the entire probability measure of the states. The works [12,13] consider both the expected value of the states and the expected value of the control actions in the performance criteria. We refer the reader to [4,14,15,16] for stochastic maximum principles of mean-field type.
Mean-field-type games with two or more players can be seen as the multi-agent generalization of the single-agent mean-field-type control problem. In [18,19] it is shown that the methodology can be used for risk-sensitive mean-field-type games, where weakened conditions on the drifts are provided. State measurement noise and partial observation were studied in [20,21]. See also [27,28,29,31,32,33,34,35,36,37,38,39,46,47,48,49,50].
A Wiener chaos expansion has been proposed in [30] to transform mean-field-type games into equivalent deterministic game problems in which the state process is replaced by its polynomial chaos or Wiener chaos expansion counterpart. The decomposition is similar to the Kosambi-Karhunen-Loeve expansion known in the theory of stochastic processes, which represents a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded domain. Cooperative mean-field-type games were introduced in [5,41,42,43,44,45]. In [40], several applications of mean-field-type games in engineering are provided.
The rest of the paper is structured as follows. Section 2 presents a generic model. Section 3 focuses on solution methods. Classes of mean-field-type games are provided in Section 4. Section 5 examines the partial state observation case. Section 6 presents some recent developments and extensions of mean-field-type games. Section 7 concludes the paper.
Table 1 summarizes the notations used in the article.
I | ≜ | set of decision-makers |
T | ≜ | length of the horizon
[0,T] | ≜ | horizon of the mean-field-type game |
t | ≜ | time index
S | ≜ | state space |
s(t) | ≜ | state at time t |
Δ(S) | ≜ | set of probability measures on S |
m(t,.) | ≜ | probability measure of the state at time t |
Ai | ≜ | control action set of decision-maker i∈I |
ai(.) | ≜ | strategy of decision-maker i∈I |
a=(ai)i∈I | ≜ | strategy profile |
b(t,s,m,a) | ≜ | drift coefficient function |
σ(t,s,m,a) | ≜ | diffusion coefficient function |
ri(t,s,m,a) | ≜ | instant payoff of decision-maker i∈I |
gi(s,m) | ≜ | terminal payoff of decision-maker i∈I |
Ri,T(m0,a) | ≜ | cumulative payoff of i |
Vi(t,m) | ≜ | equilibrium payoff of i∈I |
(pi,qi) | ≜ | first order adjoint process of i∈I |
(Pi,Qi) | ≜ | second order adjoint process of i∈I |
v∗i(t,s) | ≜ | dual function of i∈I |
Hi | ≜ | ri+bpi+σqi of i∈I |
Hi,s | ≜ | ri,s+bspi+σsqi |
Hi,ss | ≜ | ri,ss+bsspi+σssqi |
A basic mean-field-type game in continuous time is described by:
• The set of decision-makers I={1,2,…,I}, where I∈N.
• The horizon of the interaction is the interval [0,T], T>0.
• There is a non-empty state space S. The set of probability measures on S is denoted by Δ(S).
• For each decision-maker i, a non-empty control action set Ai is available. The set Ai is not necessarily convex. The set of all control actions of all the decision-makers is A=∏i∈IAi.
• An instant payoff of decision-maker i is ri: S×Δ(S)×A→R.
• The state evolution is explicitly given by a controlled Itô stochastic differential equation of mean-field type, called a controlled McKean-Vlasov equation:
s(t) = s_0+\int_0^t b\, dt'+\int_0^t \sigma\, dB(t'), \ \ t>0. |
with s(0)=s0∈S, and b,σ: S×Δ(S)×A→R, where b is the drift coefficient functional, σ is the diffusion coefficient functional, and B is a one-dimensional standard Brownian motion on a given probability space (Ω,F,P).
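As a complement, the following minimal sketch (not from the paper) illustrates how such a controlled McKean-Vlasov dynamics can be simulated with an interacting-particle Euler-Maruyama scheme, the mean-field term being approximated by the empirical mean of the particles; the drift, diffusion and feedback control below are hypothetical choices made only for illustration.

```python
import numpy as np

# Euler-Maruyama particle sketch of a controlled McKean-Vlasov SDE
# ds = b(t, s, E[s], a) dt + sigma dB. All coefficients below are
# illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(0)
N, T, steps = 5000, 1.0, 200          # particles, horizon, time steps
dt = T / steps

def b(t, s, mean_s, a):
    # drift of mean-field type: pulls the state toward the population mean
    return -0.5 * s + 0.3 * mean_s + a

def sigma(t, s, mean_s, a):
    return 0.2

def control(t, s, mean_s):
    # hypothetical state-and-mean-field feedback strategy
    return -0.1 * (s - mean_s)

s = rng.normal(1.0, 0.5, size=N)      # samples from an initial law m_0
for k in range(steps):
    t = k * dt
    m1 = s.mean()                     # empirical approximation of E[s(t)]
    a = control(t, s, m1)
    dB = rng.normal(0.0, np.sqrt(dt), size=N)
    s = s + b(t, s, m1, a) * dt + sigma(t, s, m1, a) * dB

print("approximate E[s(T)] =", s.mean(), "  Var[s(T)] =", s.var())
```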
Given an initial state s0 drawn from the initial distribution m0, the game G(s0) proceeds as follows. At each instant, each decision-maker observes the state (perfect monitoring, perfect state observation), chooses a control action according to her strategy (defined below), and observes/measures her payoff.
For this game, we adopt the following notions. An admissible control of decision-maker i∈I is a progressively measurable process with respect to the filtration \mathcal{F}_t, taking values in A_i. L^2_{\mathcal{F}}([0, T], A_j) denotes the set of \mathcal{F}-adapted, A_j-valued processes on [0, T], and C([0, T], X) denotes the set of continuous (hence measurable) functions [0, T]\rightarrow X. The set of admissible controls of decision-maker i is denoted by \mathcal{A}_i.
We identify two processes a_i and \tilde{a}_i in \mathcal{A}_i if
\mathbb{P}(a_i = \tilde{a}_i, \ \mbox{a.e. on } [0, T]) = 1. |
A strategy for decision-maker i starting at time 0 is a measurable map (denoted again by) a_i:\ [0, T]\times C([0, T], X)\times \prod_{j\neq i} L^2_{\mathcal{F}}([0, T], A_j)\rightarrow A_i for which there exists \epsilon>0 such that for any
(t', f_1, f_2)\in (0, T]\times \{C([0, T], X)\times \prod_{j\neq i} L^2_{\mathcal{F}}([0, T], A_j)\}^2, |
if f_1 = f_2 on [0, t'] then a_i(., f_1) = a_i(., f_2) on [0, t'+\epsilon]. To each strategy profile one can associate a control process profile. This allows us to work with both open-loop and feedback forms of strategies by considering a_i(., s^{s_0, a}, m). For open-loop strategies the information structure is limited to \{t\}: a_i is simply a measurable function of time (and the initial data). The stochastic maximum principle can be used as a methodology for finding optimal open-loop strategies. The information structure for feedback strategies is s and its distribution (and the initial data). The dual adjoint functions, which are obtained from the Bellman functions, can be used for finding feedback strategies; here the feedback is a state-and-mean-field feedback.
The cumulative payoff of decision-maker i is
\mathcal{R}_{i, T}(m_0, a): = g_i(s(T), m(T, .))+\int_0^T r_i(s(t), m(t, .), a(t))\, dt, |
where g_i(s(T), m(T, .)) is the terminal payoff of i. The risk-neutral payoff of i is \mathbb{E}[\mathcal{R}_{i, T}(m_0, a)]. The risk-neutral mean-field-type game G_{0, T}(m_0) is the normal-form game (\mathcal{I}, (\mathcal{A}_i, \mathbb{E}\mathcal{R}_{i, T})_{i\in \mathcal{I}}).
We now introduce the key problems addressed in this paper and some solution concepts.
Given a_{-i}: = (a_1, \ldots, a_{i-1}, a_{i+1}, \ldots), the best-response value problem associated with decision-maker i is
V_i(0, m_0) = \left\{ \begin{array}{lll} \sup\limits_{a_i\in \mathcal{A}_i} \mathbb{E}[\mathcal{R}_{i, T}(m_0, a)], \\ \mbox{subject to } s(t) = s^{s_0, a}(t), \ \ m(t, .) = \mathbb{P}_{s^{s_0, a}(t)}. \end{array} \right. | (1)
The strategies a_i of i that achieve the above best-response value V_i(0, m_0) are called best-response strategies of i to a_{-i}: =(a_{j})_{j\neq i}. The set of such strategies is denoted by BR_{i}(a_{-i}). We are interested in characterizing the best-response strategies of every decision-maker.
An equilibrium point is a strategy profile a such that for every decision-maker i, a_i\in BR_i(a_{-i}) and the resulting mean-field is m(t, .) =\mathbb{P}_{s^{s_0, a}(t)}. We are interested in characterizing equilibrium strategies of every decision-maker.
As the state dynamics is a stochastic differential equation of mean-field type, the existence and uniqueness of a solution to the state equation is not always guaranteed. Below, we provide sufficient conditions for existence and uniqueness of a solution.
Lemma 1 (Existence). If the coefficient functions b, \sigma are continuously differentiable with respect to s, m and the Gateaux-derivatives are bounded, then for each strategy profile (a_i)_{i\in \mathcal{I}}\in \prod_{i\in \mathcal{I}} \mathcal{A}_i, there is a unique solution to the state dynamics which we denote by s(t): = s^{s_0, a}(t).
A proof of this Lemma can be directly obtained from [7] by choosing a strategy profile (a_i)_{i\in \mathcal{I}}\in \prod_{i\in \mathcal{I}} \mathcal{A}_i.
Lemma 2. The probability law of the state solves (in the weak sense) the following Fokker-Planck-Kolmogorov equation of mean-field type:
m_t+(bm)_s-\frac{1}{2}(\sigma^2 m)_{ss} = 0, \ \ m(0, .) = m_0(.). | (2) |
The proof is by now standard, using integration by parts in the sense of distributions.
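To make (2) concrete, here is a small finite-difference sketch (an illustration under assumed Ornstein-Uhlenbeck-type coefficients, not the paper's model) that propagates the density m(t, .) forward in time and compares the result with the known Gaussian law of that particular process.

```python
import numpy as np

# Explicit finite-difference sketch of the Fokker-Planck-Kolmogorov equation
# m_t + (b m)_s - (1/2)(sigma^2 m)_ss = 0 for the illustrative drift
# b(s) = -kappa*s and constant sigma (hypothetical, mean-field-free data).

kappa, sig = 1.0, 0.5
L, ns, T, nt = 4.0, 400, 1.0, 4000
s = np.linspace(-L, L, ns)
ds, dt = s[1] - s[0], T / nt

m = np.exp(-0.5 * ((s - 1.0) / 0.3) ** 2)     # initial law m_0
m /= m.sum() * ds

for _ in range(nt):
    dflux = np.gradient((-kappa * s) * m, ds)                    # (b m)_s
    diff = 0.5 * sig**2 * np.gradient(np.gradient(m, ds), ds)    # (1/2)(sigma^2 m)_ss
    m = np.clip(m + dt * (-dflux + diff), 0.0, None)
    m /= m.sum() * ds                                            # keep a probability density

# for this Ornstein-Uhlenbeck example the exact law at time T is Gaussian
mean_T = 1.0 * np.exp(-kappa * T)
var_T = 0.3**2 * np.exp(-2*kappa*T) + sig**2/(2*kappa) * (1 - np.exp(-2*kappa*T))
fd_mean = (s * m).sum() * ds
fd_var = ((s - fd_mean)**2 * m).sum() * ds
print("FD mean:", fd_mean, " exact:", mean_T)
print("FD var :", fd_var, " exact:", var_T)
```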
In this section we present three different solution approaches for solving mean-field-type game problems. We also provide some relationships between these methods. We introduce a notion of Gâteaux differentiability with respect to a measure.
Let \mathcal{S} be a vector space, let b\in L^{\alpha}(\Omega\times [0, T]\times \mathcal{S}\times \Delta(\mathcal{S}), \ \mathbb{R}), and let O\subset L^{\alpha^*} be an open set.
Definition 1 (Gâteaux derivative). The Gâteaux differential b_m[\tilde{m}] of the functional b at m\in O in the direction \tilde{m} is defined as
\lim\limits_{\epsilon \rightarrow 0_+}\frac{d}{d\epsilon} {b}(., t, s, m+\epsilon \tilde{m}) = \int {b}_m(., t, s, m)(\xi) \ \tilde{m}(d\xi). |
If the limit exists for all directions \tilde{m}, then one says that b is Gâteaux differentiable at m.
If {m}\in L^{\alpha} and b_m\in L^{\alpha^*}, the limit above is finite. The limit appearing in Definition 1 is taken relative to the topology of L^{\alpha^*}.
Note that one can connect the derivative with respect to s of the Gâteaux derivative b_m to the notion of the functional derivative with respect to the measure, and also to the Wasserstein gradient.
We provide Gâteaux differentiation of \| s\|_{\alpha}-based functions (a numerical sanity check of the first example is sketched after these two examples):
• \alpha-norm: Let g(m) = (\int |s|^{ \alpha} m(t, ds))^{1/ \alpha} = :m_{\alpha}^{1/ \alpha}, and m\in L^{\alpha}, then
\lim\limits_{\epsilon \rightarrow 0_+}\frac{d}{d\epsilon} g( m+\epsilon \tilde{m}) = \frac{1}{ \alpha} \left[\int |\xi|^{ \alpha} \ m(d\xi)\right]^{\frac{1}{ \alpha}-1} \left[\int |\xi|^{ \alpha} \ \tilde{m}(d\xi)\right]. |
Thus, g_m(m)(s) =\frac{s^{ \alpha}}{ \alpha m_{ \alpha}^{ \alpha-1}}, and \partial_s g_m(m)(s) =\frac{s^{ \alpha-1}}{m_{ \alpha}^{ \alpha-1}} \neq 0 = \partial_{m}[g_s].
• L^{ \alpha}-normed drift: Let
\bar{b} = \left(\int_{y\in \mathcal{S}} |b|^{\alpha}(., t, s^a(t), y, a(t)) m^a(t, dy)\right)^{\frac{1}{\alpha}}, |
i.e., the L^{\alpha}-norm of b with respect to the measure m^a(t, .). We compute the Gâteaux-derivative of the L^{ \alpha}-normed drifts: \bar{b}_m(., t, s, m)(\xi): =\frac{b^{ \alpha}(., t, s, \xi)}{ \alpha \bar{b}^{ \alpha-1}}. By changing variables, one has \bar{b}_m(., t, \xi, m)(s) =\frac{b^{ \alpha}(., t, \xi, s)}{ \alpha \bar{b}^{ \alpha-1}(t, \xi, m)}. We differentiate with respect to the state s to get:
\begin{eqnarray} \partial_s \bar{b}_m(., t, \xi, m)(s) & = & \frac{b^{ \alpha-1}(., t, \xi, s) b_y(., t, \xi, s)}{\bar{b}^{ \alpha-1}(t, \xi, m)}. \end{eqnarray} | (3) |
E[L \partial_s \bar{b}_m(., t, S, m)(s)] =E[L \frac{b^{ \alpha-1}(., t, S, s) b_y(., t, S, s)}{\bar{b}^{ \alpha-1}(t, S, m)}]: = \tilde{E}[\tilde{L} \frac{b^{ \alpha-1}(., t, \tilde{S}, s) b_y(., t, \tilde{S}, s)}{\bar{b}^{ \alpha-1}(t, \tilde{S}, m)}], where the notation \tilde{E} denotes the expectation with respect to the tilde variables, \tilde{S} being a copy of S. We now replace the argument s by S to get \tilde{E}[\tilde{L} \partial_s \bar{b}_m(., t, \tilde{S}, m)(S)] = \tilde{E}[\tilde{L} \frac{b^{ \alpha-1}(., t, \tilde{S}, S) b_y(., t, \tilde{S}, S)}{\bar{b}^{ \alpha-1}(t, \tilde{S}, m)}].
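As a sanity check of the first example above, the following sketch (with an arbitrary grid, base density m and direction \tilde{m}, all assumptions made only for illustration) compares the closed-form directional derivative of the \alpha-norm functional with a one-sided finite difference.

```python
import numpy as np

# Finite-difference check of the Gateaux derivative of g(m) = (∫|s|^α m(ds))^{1/α}.
# The grid, the base density m and the direction m~ below are illustrative choices.

alpha = 3.0
xi = np.linspace(-2.0, 2.0, 401)
dx = xi[1] - xi[0]

m = np.exp(-0.5 * xi**2)
m /= m.sum() * dx                                    # base measure (density weights)
m_dir = np.exp(-2.0 * (xi - 0.5)**2)
m_dir /= m_dir.sum() * dx                            # direction m~

def moment_alpha(density):
    # ∫ |ξ|^α density(ξ) dξ, discretised on the grid
    return (np.abs(xi) ** alpha * density).sum() * dx

def g(density):
    return moment_alpha(density) ** (1.0 / alpha)

# closed-form directional derivative: (1/α)(∫|ξ|^α m)^{1/α-1} ∫|ξ|^α m~(dξ)
closed = (1.0 / alpha) * moment_alpha(m) ** (1.0 / alpha - 1.0) * moment_alpha(m_dir)

eps = 1e-6
fd = (g(m + eps * m_dir) - g(m)) / eps               # one-sided finite difference in ε

print("closed form:", closed, "  finite difference:", fd)
```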
In contrast to classical differential games or classical mean-field games, in which one can directly use the standard Dynamic Programming Principle (DPP) on s(t), here the presence of the term m^{a}(t, .) in the functions r_i, b, \sigma may create a time-inconsistency, and s(t) is not the appropriate state for the DPP. The next proposition identifies an appropriate state space under which a DPP can be obtained and a Hamilton-Jacobi-Bellman (HJB) equation can be derived.
Proposition 1. If there exists V such that
\begin{equation}\label{dpp}\left\{ \begin{array}{lll} i\in \mathcal{I}, \\ V_{i, t}+\int_s [\sup\limits_{a_i} \ r_i+ b V_{i, sm}+\frac{\sigma^2}{2}V_{i, ssm}]m(t, ds) = 0, \\ V_i(T, m) = \int_s m(T, ds) g_i(s, m(T, .)), \\ m(t, .) = \mathbb{P}_{s^a(t)} \end{array} \right. \end{equation} | (4) |
then V_i(0, m_0) is an equilibrium payoff and the optimal strategy a_i of i maximizes r_i+ b V_{i, sm}+\frac{\sigma^2}{2}V_{i, ssm}.
Proof. In order to use a dynamic programming principle we look for an augmented state such that one gets a Markovian system. By rewriting the payoff functional \mathbb{E}\mathcal{R}_{i, T} as a function of the measures m(t, .) and the strategies a_i(.), we obtain a classical setup. Hence, we only need the evolution of the measures m(t, .) which is given by (2). Then, one obtains a deterministic differential game problem with m as state (in infinite dimensions). The best response of i is
V_i(0, m_0) = \left\{ \begin{array}{lll} \sup\limits_{a_i\in \mathcal{A}_i} \mathbb{E}[\mathcal{R}_{i, T}(m_0, a)]\\ m(t, .) \ \mbox{solves} \ (2). \end{array} \right. |
Applying the classical DPP to the deterministic problem yields
V_{i, t}+\sup\limits_{a_i}[\int_s r_i m(t, ds)+ \langle \tilde{b}, V_{i, m}\rangle] = 0, |
where \tilde{b}(m)(t, s) = -(bm)_s+\frac{1}{2}(\sigma^2 m)_{ss}. We apply the theory of distributions to establish the following equalities:
\langle \tilde{b}, V_{i, m}\rangle =\int_{s} \tilde{b}(m) V_{i, m}\ ds | (5) |
=\int_{s} -(bm)_sV_{i, m}+\frac{1}{2}(\sigma^2 m)_{ss}V_{i, m} | (6) |
=\int_{s} [b V_{i, sm}+\frac{\sigma^2}{2}V_{i, ssm}] m(t, ds), | (7) |
which is obtained by integration by parts (in the sense of distributions). This completes the proof.
Example 1 (Mean-field-free equations). If r_i, g_i, b, \sigma are all independent of m then V_i(t, m) = \int v_i(t, s)\ m(t, ds) = \mathbb{E}_{m(t, .)} v_i(t, s(t)) where the function (v_i(t, s))_{i} solves the classical Bellman system
\begin{equation}\label{dpptt2}\left\{ \begin{array}{lll} i\in \mathcal{I}, \\ v_{i, t}+H_{i} = 0, \\ v_i(T, s) = g_i(s, m(T, .)), \\ \end{array} \right. \end{equation} | (8) |
Moreover
\begin{equation} \label{britt} v_i(0, s_0) = \left\{ \begin{array}{lll} \sup\limits_{a_i\in \mathcal{A}_i} \mathbb{E}[\mathcal{R}_{i, T}(s_0, a)]\\ s(t) =s^{s_0, a}(t), \ s(0) = s_0\\ \end{array} \right. \end{equation} | (9) |
Proposition 1 provides a sufficient condition for equilibria. However, the existence of a classical solution to the infinite-dimensional HJB equation is not guaranteed, and the value function may not be differentiable as in the classical case. Weaker notions of solution, such as viscosity solutions using weak sub/super-differential sets, will be introduced in the next section.
Since the DPP equilibrium system (4) is in infinite dimensions, the solvability of such an integro-partial differential equation poses some technical difficulties. However, when the value function V is weakly differentiable with respect to (s, m), a finite-dimensional quantity can be obtained, as it lies in the dual space.
Definition 2. The function v^*(t, s) = V_{m}(t, s) is the dual function associated to V.
Example 2. In the mean-field-free setting, i.e., when r_i, g_i, b, \sigma are independent of m, the dual function v^*(t, s) = V_{m}(t, s) coincides with the function v(t, s). However, the two functions do not coincide in the mean-field-dependent setting.
Proposition 2. If V_{m}, V_{sm}, V_{ssm} exist and are continuous then the equilibrium dual function v^* = V_m solves the following system of partial differential equations
\begin{equation}\label{dpp2}\left\{ \begin{array}{lll} i\in \mathcal{I}, \\ v^*_{i, t}+H_{i}+ \mathbb{E} H_{i, m} = 0, \\ v^*_i(T, s) = g_i(s, m(T, .))+\mathbb{E} g_{i, m}, \\ m(t, .) = \mathbb{P}_{s^a(t)}, \ m(0, .) = m_0(.) \end{array} \right. \end{equation} | (10) |
where {H}_i(t, s, m, v^*_{i, s}, v^*_{i, ss}) =\sup_{a_i} \ [r_i+ b v^*_{i, s}+\frac{\sigma^2}{2}v^*_{i, ss}].
Proof. We differentiate the HJB system (4) in the direction \tilde{m} = m,
\begin{equation}\label{dpptt}\left\{ \begin{array}{lll} i\in \mathcal{I}, \\ (V_{i, t})_m+[\sup\limits_{a_i} \ r_i+ b V_{i, sm}+\frac{\sigma^2}{2}V_{i, ssm}]\\ +\int_{\tilde s} [\sup\limits_{a_i} \ r_i+ b V_{i, sm}+\frac{\sigma^2}{2}V_{i, ssm}]_m(t, \tilde{s}, m)(s) m(t, d\tilde{s}) = 0, \\ V_{i, m}(T, s) = g_i(s, m(T, .))+\int_s m(T, ds) g_i(s, m(T, .)), \\ m(t, .) = \mathbb{P}_{s^a(t)} \end{array} \right. \end{equation} | (11) |
Noting that (V_{i, t})_m = v_{i, t}^*, \ V_{i, sm} = v_{i, s}^* and V_{i, ssm} = v_{i, ss}^*, the latter system (11) becomes the announced one. This completes the proof.
As pointed out by [17], the dual function v^*_{i}(0, s_0) is not the equilibrium payoff of the decision-maker i. The dual function plays an important role in establishing a stochastic maximum principle as we will see in the next subsection.
We now present first order and second-order adjoint processes that are useful in establishing necessary conditions for mean-field-type equilibria. The advantage of the adjoint processes is that they solve a class of linear backward SDEs of mean-field type. The following result provides existence and uniqueness conditions.
Lemma 3 ([9,11]). Consider the following mean-field backward SDE
p(t) = p(T)+\int_t^T \tilde{E}[\hat{f}(t', \tilde{p}(t'), \tilde{q}(t'), p(t'), q(t'))]\ dt' -\int_t^T q(t') dB(t'), \ |
where p(T) is a progressively measurable, square integrable random variable. Let \hat{f}(t, ., ., ., .) be Lipschitz for all time t\in [0, T] and t \mapsto \hat{f}(t, 0, 0, 0, 0) be square integrable over [0, T]. Then, the mean-field backward SDE has a unique adapted solution satisfying
E\left[\sup\limits_{t\in[0, T]}| p(t)|^2+\int_0^T | q(t)|^2 dt\right]<\infty. | (12) |
Consider the first-order adjoint processes (p_i, q_i)_{i\in \mathcal{I}}
\begin{equation}\label{firstorder}\left\{\begin{array}{ll} dp_i = - [H_{i, s} + \tilde{\mathbb{E}} \partial_s H_{i, m}]dt +q_i dB, \\ p_i(T) = g_{i, s}(T)+\tilde{\mathbb{E}}\{ \partial_s g_{i, m}(T)\}, \\ i\in \mathcal{I}. \end{array}\right. \end{equation} | (13) |
Lemma 4. Suppose that the functions b, \sigma, r_i, g_i are continuously differentiable with respect to (s, m), and that all their first-order derivatives with respect to (s, m) are continuous in (s, m, a) and bounded. Then, the first-order adjoint system is a linear SDE with almost surely bounded coefficient functions. There is a unique \mathcal{F}-adapted solution such that
\mathbb{E}\left[\sup\limits_{t\in [0, T]}|p_i(t)|^2+\int_0^T |q_i(t)|^2 dt\right] < +\infty. |
These strong smoothness conditions on b, \sigma, r_i, g_i can be considerably weakened using representations of weak sub/super-differential sets. We refer the reader to [8,9] for more details on existence and uniqueness of solutions to backward SDE of mean-field type.
Proof. By choosing \hat{f}(t, \tilde{p}(t), \tilde{q}(t), p(t), q(t)) = \alpha_0(t, .)+\alpha_1(t, .) \tilde{p}(t)+\alpha_2(t, .) \tilde{q}(t)+\alpha_3(t, .) p(t) +\alpha_4(t, .)q(t), where the \alpha_i(t, .) are measurable bounded coefficient functions, one gets a backward equation in the form of the adjoint equations. Under the assumptions imposed on b, \sigma, r_i, g_i, the coefficients \alpha_i fulfill the requirements and the result follows as a direct application.
Note that for linear-quadratic mean-field-type games, the payoff function r_i =-s^2-\int y^2 m(t, dy) -a_i^2-\int_{a'_i} (a'_i)^2 \mathbb{P}_{a_i}(da'_i) does not satisfy the above assumptions. In that case, one can relax the boundedness assumption and replace it by L^{2}-estimates if the second moment is finite.
Consider the second-order adjoint processes (P_i, Q_i)_{i\in \mathcal{I}}
\begin{equation}\label{secondorder} \left\{\begin{array}{ll} dP_i =-\{H_{i, ss}+\tilde{\mathbb{E}} \partial_{ss}{H}_{i, m} +( 2 b_s+\sigma_s^2) P_i+2\sigma_s Q_i \}dt+ Q_i dB, \\ P_i(T) = g_{i, ss}(T)+\tilde{\mathbb{E}}\{ \partial_{ss}g_{i, m}(T)\}, \\ i\in \mathcal{I}. \end{array}\right. \end{equation} | (14) |
Lemma 5. Suppose that the functions b, \sigma, r_i, g_i are twice continuously differentiable with respect to (s, m), and that all their derivatives up to second order with respect to (s, m) are continuous in (s, m, a) and bounded. Then, the second-order adjoint system is a linear backward SDE with almost surely bounded coefficient functions. There is a unique \mathcal{F}-adapted solution such that
\mathbb{E}\left[\sup\limits_{t\in [0, T]}|P_i(t)|^2+\int_0^T |Q_i(t)|^2 dt\right] < +\infty. |
The proof follows immediately from Lemma 3.
Proposition 3 (Stochastic Maximum Principle). If (s, a) is a pair of equilibrium state and equilibrium strategy, then there is a vector of \mathbb{F}-adapted processes (p_i, q_i, P_i, Q_i)_{i\in \mathcal{I}} such that H_i(s, m, a'_i, a_{-i})-H_i(s, m, a)+\frac{1}{2}P_i(\sigma(s, m, a'_i, a_{-i})-\sigma(s, m, a))^2\leq 0 for all a'_i\in \mathcal{A}_i, for almost every time t\in [0, T] and \mathbb{P}-almost surely. In particular, if \sigma(t, s, m, a) = \sigma(t, s, m) is independent of a_i then \sup_{a'_i\in \mathcal{A}_i}H_i(s, m, a'_i, a_{-i})\leq H_i(s, m, a).
Proof. The proof follows similar steps as in [10] by replacing the derivative with respect to the first-moment component by the Gâteaux derivative with respect to m, for each decision-maker i, when fixing the strategies of the other decision-makers.
In order to provide weaker conditions, we introduce super- and sub-differentials of the viscosity solution {V}. Define the first-order super-differential of the dual function as
D^{1, +}_{t, w}V_{m}(t, w) = \{ (d_1, d_2)\in \mathbb{R}\times T(\mathcal{S})\ | \ \limsup_{t', w' \rightarrow t, w} \frac{V_{m}(t', w')-V_{m}(t, w)-d_1(t'-t)-d_2(w'-w)}{ |t'-t|+\| w'-w\|}\leq 0\}.
Similarly, the first-order sub-differential is
D^{1, -}_{t, w}V_m(t, w) = \{ (d_1, d_2)\in \mathbb{R}\times T(\mathcal{S})\ | \ \liminf_{t', w' \rightarrow t, w} \frac{V_{m}(t', w')-V_{m}(t, w)-d_1(t'-t)-d_2(w'-w)}{ |t'-t|+\| w'-w\|}\geq 0 \}.
Using these weak derivatives, one has a relationship between the stochastic maximum principle and the dynamic programming principle in terms of an inclusion: p(t) \in D^{1, +}_{s}V_{m}(t, w). In particular, if V is Gâteaux-differentiable with respect to m and s then
\left\{ \begin{array}{lll} p_i(t) = V_{i, sm}(s(t), m(t)) = d_2, \\ q_i(t) = \sigma V_{i, ssm}(s(t), m(t)). \end{array} \right. |
Proposition 4. If the first and second order weak derivatives with respect to s of V_{m} exist then the process p_i(t) = V_{i, sm}(s(t)) solves the backward SDE:
dp_i = -\alpha_i dt+ \beta_i dB, \ \ p_i(T) = g_{i, s}(s(T), m(T))+\int_s \partial_s g_{i, m} m(T, ds), |
where \alpha_i =H_{i, s}+\tilde{\mathbb{E}} [\partial_s H_{i, m}] and \beta_i = \sigma V_{i, ssm} = q_i. Here H_{i, s} is a notation for r_{i, s}+b_{s} p_i+\sigma_s q_i and should not be confused with the derivative of the functional H_i.
Proof. Let \tilde{f}_i(t, s) = V_{i, sm}(t, s) = v_{i, s}^*(t, s). Applying Itô's formula to \tilde{p}_i(t) = \tilde{f}_i(t, s(t)) yields
\begin{array}{l} d\tilde{p}_i =[(\tilde{f}_i)_t+ b(\tilde{f}_i)_s +\frac{\sigma^2}{2}(\tilde{f}_i)_{ss}]dt +\sigma (\tilde{f}_i)_s dB \end{array}, |
We identify the coefficient processes \tilde{q}_i = \sigma (\tilde{f}_i)_s = \sigma v_{i, ss}^* =\sigma V_{i, ssm}. It remains to find the drift coefficient of \tilde{p}_i.
\left\{\begin{array}{lll} \tilde{p}_i = v_{i, s}^*(t, s) =V_{i, sm}(t, s), \\ (\tilde{f}_i)_t = v_{i, ts}^*, \\ b(\tilde{f}_i)_s =bv_{i, ss}^*, \\ \frac{\sigma^2}{2}(\tilde{f}_i)_{ss} = \frac{\sigma^2}{2}v_{i, sss}^* \end{array} \right. |
Summing together one obtains
\begin{equation} \left\{\begin{array}{lll} (\tilde{f}_i)_t+ b(\tilde{f}_i)_s +\frac{\sigma^2}{2}(\tilde{f}_i)_{ss} = v_{i, ts}^*+bv_{i, ss}^*+\frac{\sigma^2}{2}v_{i, sss}^*\end{array} \right. \end{equation} | (15) |
In order to identify the latter term, one differentiates with respect to s the equation (10) satisfied by the dual function v_i^*.
\begin{equation}\label{dpp3}\left\{ \begin{array}{lll} i\in \mathcal{I}, \\ v^*_{i, st}+r_{i, s}+(bv^*_{i, s})_s+(\frac{\sigma^2}{2}v^*_{i, ss})_s+ \tilde{\mathbb{E}} {H}_{i, sm} = 0, \\ v^*_{i, s}(T, s) = g_{i, s}(s, m(T, .))+\tilde{\mathbb{E}} g_{i, sm}, \\ m(t, .) = \mathbb{P}_{s^a(t)}, \ m(0, .) = m_0(.) \end{array} \right. \end{equation} | (16)
which is expanded as
\begin{equation}\label{dpp3t}\left\{ \begin{array}{lll} i\in \mathcal{I}, \\ (v^*_{i, s})_t+b v^*_{i, ss}+\frac{\sigma^2}{2}v^*_{i, sss}\\ +[r_{i, s}+b_sv^*_{i, s}+ \sigma_s (\sigma v^*_{i, ss})+ \tilde{\mathbb{E}} {H}_{i, sm}] = 0, \\ v^*_{i, s}(T, s) = g_{i, s}(s, m(T, .))+\tilde{\mathbb{E}} g_{i, sm}, \\ m(t, .) = \mathbb{P}_{s^a(t)}, \ m(0, .) = m_0(.) \end{array} \right. \end{equation} | (17) |
It follows that (v^*_{i, s})_t+b v^*_{i, ss}+\frac{\sigma^2}{2}v^*_{i, sss} =-[r_{i, s}+b_sv^*_{i, s}+ \sigma_s (\sigma v^*_{i, ss})+ \tilde{\mathbb{E}} H_{i, sm}], and
\begin{array}{l} d\tilde{p}_i =-[r_{i, s}+b_sv^*_{i, s}+ \sigma_s(\sigma v^*_{i, ss})+ \tilde{\mathbb{E}} H_{i, sm}]dt +\sigma v_{i, ss}^*dB\\ = -[H_{i, s}+ \tilde{\mathbb{E}} H_{i, sm}]dt+q_idB \end{array}, |
The two pairs of processes (p_i, q_i) and (\tilde{p}_i = v^*_{i, s}, \tilde{q}_i = \sigma v^*_{i, ss}) solve the same backward SDE with the same terminal condition. Under the assumptions above, these two processes are identical. This completes the proof.
The second-order super-differential is defined as D^{1, 2, +}_{t, w}V_{m}(t, w) = \{ (d_1, d_2, d_3)\in \mathbb{R}\times T(\mathcal{S})\times S^n \ | \
\limsup_{t', w' \rightarrow t, w} \frac{V_{m}(t', w')-V_{m}(t, w)-d_1(t'-t)-d_2(w'-w)-\frac{1}{2}(w'-w)'d_3(w'-w)}{ |t'-t|+\| w'-w\|^2}\leq 0 \}.
Similarly, the second-order sub-differential is D^{1, 2, -}_{t, w}V_{m}(t, w) = \{ (d_1, d_2, d_3)\in \mathbb{R}\times T(\mathcal{S})\times S^n \ | \
\liminf_{t', w' \rightarrow t, w} \frac{V_{m}(t', w')-V_{m}(t, w)-d_1(t'-t)-d_2(w'-w) -\frac{1}{2}(w'-w)'d_3(w'-w)}{ |t'-t|+\| w'-w\|^2}\geq 0 \}.
V_i is a viscosity solution if and only if V_i(T, m) =\int_s g_i m(T, ds) for any measure m and for all (t, m)\in [0, T)\times \Delta(\mathcal{S}),
\left\{ \begin{array}{lll} d_1+\int \sup\limits_{a_i(.)\in \mathcal{A}_i} H_i (t, s, m, a, d_2, d_3)\leq 0, \ \forall \ d \in D_{t, w}^{1, 2, +}V_{i, m}(t, w), \\ d_1+\int \sup\limits_{a_i(.)\in \mathcal{A}_i} H_i \geq 0, \ \forall \ d \in D_{t, w}^{1, 2, -}V_{i, m}(t, w). \ \end{array} \right. |
In particular, if V_{i, m}\in C^{1, 3} then
\left\{ \begin{array}{lll} P_i(t) = V_{i, ssm}(t, s(t)) = d_3, \\ Q_i(t) = \sigma V_{i, sssm}(t, s(t)). \end{array} \right. |
Proposition 5. If the first and second order weak derivatives with respect to s of V_{m} exist then the process P_i(t) = V_{i, ssm}(t, s(t)) solves the backward SDE:
dP_i = -\gamma_i dt+ \kappa_i dB, \ \ P_i(T) = g_{i, ss}(s(T), m(T))+\int_s \partial_{ss} g_{i, m} m(T, ds), |
where \gamma_i =H_{i, ss}+\tilde{\mathbb{E}} \partial_{ss}{H}_{i, m} +(2 b_s+\sigma_s^2) P_i+2\sigma_s Q_i and \kappa_i = \sigma V_{i, sssm} = Q_i.
Proof. We compute explicitly the second order terms. Let \tilde{g}_i(t, s) = V_{i, ssm}(t, s) = v_{i, ss}^*(t, s). Applying Ito's formula to \tilde{P}_i(t) = \tilde{g}_i(t, s(t)) yields
\begin{array}{l} d\tilde{P}_i =[(\tilde{g}_i)_t+ b(\tilde{g}_i)_s +\frac{\sigma^2}{2}(\tilde{g}_i)_{ss}]dt +\sigma (\tilde{g}_i)_s dB \end{array}, |
We identify the coefficient processes \tilde{Q}_i = \sigma (\tilde{g}_i)_{s} = \sigma v_{i, sss}^* =\sigma V_{i, sssm}. It remains to find the drift coefficient of \tilde{P}_i.
\left\{\begin{array}{lll} \tilde{P}_i = v_{i, ss}^*(t, s) =V_{i, ssm}(t, s), \\ (\tilde{g}_i)_t = v_{i, tss}^*, \\ b(\tilde{g}_i)_s =bv_{i, sss}^*, \\ \frac{\sigma^2}{2}(\tilde{g}_i)_{ss} = \frac{\sigma^2}{2}v_{i, ssss}^* \end{array} \right. |
Summing together one obtains
\begin{equation} \left\{\begin{array}{lll} (\tilde{g}_i)_t+ b(\tilde{g}_i)_s +\frac{\sigma^2}{2}(\tilde{g}_i)_{ss} = v_{i, tss}^*+bv_{i, sss}^*+\frac{\sigma^2}{2}v_{i, ssss}^*\end{array} \right. \end{equation} | (18) |
In order to identify the latter term, one can differentiate twice with respect to s the equation (10) satisfied by the dual function v_i^*.
\begin{equation}\label{dpp5}\left\{ \begin{array}{lll} i\in \mathcal{I}, \\ v^*_{i, sst}+r_{i, ss}+(bv^*_{i, s})_{ss}+(\frac{\sigma^2}{2}v^*_{i, ss})_{ss}+ \tilde{\mathbb{E}} {H}_{i, ssm} = 0, \\ v^*_{i, ss}(T, m) = g_{i, ss}(s, m(T, .))+\tilde{\mathbb{E}} g_{i, ssm}, \\ m(t, .) = \mathbb{P}_{s^a(t)}, \ m(0, .) = m_0(.) \end{array} \right. \end{equation} | (19) |
\left\{\begin{array}{lll} (bv^*_{i, s})_{ss} = (b_s v^*_{i, s}+bv^*_{i, ss})_s = b_{ss}v^*_{i, s}+2b_s v^*_{i, ss}+bv^*_{i, sss}, \\ (\frac{\sigma^2}{2}v^*_{i, ss})_{ss} = [\sigma_s\sigma v^*_{i, ss}+ \frac{\sigma^2}{2}v^*_{i, sss}]_s =(\sigma_{ss}\sigma +\sigma_s^2)v^*_{i, ss}+ 2\sigma_s\sigma v^*_{i, sss}+\frac{\sigma^2}{2}v^*_{i, ssss} \end{array} \right. |
By substitution we obtain
\left\{\begin{array}{lll} 0 = v^*_{i, sst}+r_{i, ss}+(bv^*_{i, s})_{ss}+(\frac{\sigma^2}{2}v^*_{i, ss})_{ss}+ \tilde{\mathbb{E}} {H}_{i, ssm}\\ = v_{i, tss}^*+bv_{i, sss}^*+\frac{\sigma^2}{2}v_{i, ssss}^*\\ +r_{i, ss}+b_{ss}v^*_{i, s}+\sigma_{ss}(\sigma v^*_{i, ss})+\\ + (2b_s +\sigma_s^2) v^*_{i, ss}+ 2\sigma_s (\sigma v^*_{i, sss})+ \tilde{\mathbb{E}} {H}_{i, ssm} \end{array} \right. |
\left\{\begin{array}{lll} -[v_{i, tss}^*+bv_{i, sss}^*+\frac{\sigma^2}{2}v_{i, ssss}^*]\\ =r_{i, ss}+b_{ss}v^*_{i, s}+\sigma_{ss}\sigma v^*_{i, ss}+\\ + (2b_s +\sigma_s^2) v^*_{i, ss}+ 2\sigma_s (\sigma v^*_{i, sss})+\tilde{\mathbb{E}} {H}_{i, ssm}\\ = H_{i, ss}+(2b_s +\sigma_s^2) \tilde{P}_i+ 2\sigma_s \tilde{Q}_i+\tilde{\mathbb{E}} {H}_{i, ssm}\\ \end{array} \right. |
\begin{array}{l} d\tilde{P}_i =[(\tilde{g}_i)_t+ b(\tilde{g}_i)_s +\frac{\sigma^2}{2}(\tilde{g}_i)_{ss}]dt +\sigma (\tilde{g}_i)_s dB\\ = [v_{i, tss}^*+bv_{i, sss}^*+\frac{\sigma^2}{2}v_{i, ssss}^*]dt+ (\sigma v_{i, sss}^*) dB\\ = -[H_{i, ss}+\tilde{\mathbb{E}} {H}_{i, ssm}+(2b_s +\sigma_s^2) \tilde{P}_i+ 2\sigma_s \tilde{Q}_i]dt+ \tilde{Q}_i dB \end{array}, |
where H_{i, ss} = r_{i, ss}+b_{ss}p_i+\sigma_{ss}q_i. This completes the proof.
The space of square integrable real-valued deterministic functions over [0, T] is denoted by L^2([0, T], \mathbb{R}). By the Riesz-Fischer theorem, the space L^2([0, T], \mathbb{R}) of square Lebesgue-integrable functions over [0, T] is an (infinite-dimensional) complete metric space. Thus, L^2([0, T], \mathbb{R}) is a Hilbert space (with the inner product \langle f, g\rangle = \int_0^T f g\, dt).
Lemma 6 ([22], prop. 5.14). Every Hilbert space that is not reduced to the singleton \{0\} has an orthonormal basis. In particular, L^2([0, T], \mathbb{R}) has an orthonormal basis. Let \{\hat{m}_k(.), k\in \mathbb{Z}_+\} be an orthonormal system in the Hilbert space L^2([0, T], \mathbb{R}). Then, the following are equivalent:
• \{\hat{m}_k(.), k\in \mathbb{Z}_+\} is an orthonormal basis in the Hilbert space L^2([0, T], \mathbb{R}).
• \forall \ f\in L^2([0, T], \mathbb{R}), f(t) = \sum_{k\in \mathbb{Z}_+} \langle f, \hat{m}_k\rangle \hat{m}_k(t) = \sum_{k\in \mathbb{Z}_+} \left(\int_0^T f(t')\hat{m}_k(t')dt'\right)\hat{m}_k(t).
• One has \| f\|^2_{L^2([0, T], \mathbb{R})} = \sum_{k\in \mathbb{Z}_+}| \langle f, \hat{m}_k\rangle|^2 =\sum_{k\in \mathbb{Z}_+}\left(\int_0^T f(t)\hat{m}_k(t)\ dt\right)^2 for all f\in L^2([0, T], \mathbb{R}).
• The system \langle f, \hat{m}_k\rangle = 0\ \forall k \in \mathbb{Z}_+ implies that f = 0 (identically null function over [0, T]).
See Robinson ([22], prop. 5.14) for a proof of this result.
Example 3. Orthonormal Basis of L^2([0, T], \mathbb{R}):
• An important orthogonal basis of L^2[0, T] is the set \{1, \cos(\frac{2\pi}{T}kt), \sin(\frac{2\pi}{T}kt)\}_{ \ k\geq 1}. Normalizing by the respective norms in L^2([0, T], \mathbb{R}), one gets an orthonormal basis.
• One can use the Gram-Schmidt algorithm to construct new bases from more-or-less arbitrary collections of vectors. This is an inductive process applied to any linearly independent family \{f_k \ | \ k\in \mathbb{Z}_+\} of square integrable functions over [0, T] whose span is dense. (i) Since f_0\neq 0, we set \hat{m}_0 =\frac{f_0}{|| f_0||_{L^2([0, T], \mathbb{R})}} and (ii) then inductively
\hat{f}_{k+1}(t) ={f}_{k+1}(t)-\sum\limits_{i = 0}^k\langle {f}_{k+1}, \hat{m}_i\rangle \hat{m}_i(t), |
\hat{f}_{k+1}\neq 0, because otherwise {f}_{k+1} could be written as a linear combination of the vectors \hat{m}_0(.), \ldots, \hat{m}_k(.). Thus, set \hat{m}_{k+1} =\frac{\hat{f}_{k+1}}{|| \hat{f}_{k+1}||_{L^2([0, T], \mathbb{R})}} to get a unit vector. (iii) Then the set \{\hat{m}_k(.) \ | \ k\in \mathbb{Z}_+\} is an orthonormal basis of L^2([0, T], \mathbb{R}).
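A small numerical sketch of this construction, with the monomials 1, t, t^2, ... as an illustrative starting family and the inner product discretised by a Riemann sum on a grid, is the following.

```python
import numpy as np

# Gram-Schmidt sketch in L^2([0,T]) with <f,g> = ∫_0^T f g dt approximated
# by a Riemann sum; the starting family {1, t, t^2, ...} is an illustrative choice.

T, n = 1.0, 2000
t = np.linspace(0.0, T, n, endpoint=False) + 0.5 * T / n   # midpoint grid
dt = T / n

def inner(f, g):
    return (f * g).sum() * dt          # <f, g> = ∫_0^T f g dt

family = [t**k for k in range(5)]      # f_0, ..., f_4
basis = []
for f in family:
    h = f.copy()
    for e in basis:                    # remove the projections on previous elements
        h = h - inner(f, e) * e
    basis.append(h / np.sqrt(inner(h, h)))

# the Gram matrix of the constructed family should be close to the identity
G = np.array([[inner(e1, e2) for e2 in basis] for e1 in basis])
print(np.round(G, 6))
```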
Lemma 7. From any orthonormal basis \{\hat{m}_k\}_k of L^2([0, T], \mathbb{R}), set \xi_k = \int_0^T \hat{m}_k(t) dB(t), the stochastic Itô integral of \hat{m}_k. Then, the random variables \{\xi_k\}_k are independent and identically distributed Gaussian random variables with zero mean and variance equal to
\int_0^T \hat{m}_k^2(t) dt =\|\hat{m}_k \|_{L^2([0, T], \mathbb{R})}^2 = 1. |
Furthermore, the standard Brownian motion can be decomposed as follows:
\begin{eqnarray} \nonumber B(t)& = &\int_0^t dB(t') = \int_0^T {\rm{ll}}_{[0, t]}(t')dB(t')\\ \nonumber && = \int_0^T \left( \sum\limits_{k\geq 0} \hat{m}_k(t') \langle {\rm{ll}}_{[0, t]}(.), \hat{m}_k(.)\rangle\right) dB(t')\\ \nonumber & = & \sum\limits_{k\geq 0} \langle {\rm{ll}}_{[0, t]}(.), \hat{m}_k(.)\rangle \int_0^T \hat{m}_k(t') dB(t') \\ && = \sum\limits_{k\geq 0} \left(\int_0^t \hat{m}_k(t')dt'\right)\xi_k. \end{eqnarray} | (20) |
The expansion
B(t) = \sum\limits_{k\geq 0} \xi_k\int_0^t \hat{m}_k(t')dt' |
converges in the mean-square sense:
\mathbb{E}\left[\|B(t)-\sum\limits_{k = 0}^K \xi_k\int_0^t \hat{m}_k(t')dt'\|^2\right] \rightarrow 0 |
as K goes to infinity, for t\leq T.
The mean-square error is
\mathbb{E}\left[\|\sum\limits_{k = K+1}^{+\infty} \xi_k\int_0^t \hat{m}_k(t')dt'\|^2\right] =\sum\limits_{k = K+1}^{+\infty} \left( \int_0^t \hat{m}_k(t')dt'\right)^2 =O(\frac{t}{K}). |
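The following sketch illustrates the expansion numerically, using the orthonormal family \hat{m}_k(t) = \sqrt{2/T}\cos(\pi(k+1/2)t/T) as one convenient (assumed) choice of basis, and checks that the truncated series has approximately the Brownian mean and variance.

```python
import numpy as np

# Truncated expansion B(t) ≈ Σ_{k<K} ξ_k ∫_0^t m̂_k(t') dt' with i.i.d. N(0,1)
# coefficients ξ_k. The basis m̂_k(t) = sqrt(2/T) cos(π(k+1/2)t/T) is one
# convenient choice; its antiderivative is available in closed form.

rng = np.random.default_rng(1)
T, K, nsamples = 1.0, 50, 20000
t = np.linspace(0.0, T, 200)

k = np.arange(K)[:, None]                                  # shape (K, 1)
# ∫_0^t m̂_k(t') dt' = sqrt(2T)/(π(k+1/2)) * sin(π(k+1/2) t / T)
phi = np.sqrt(2.0 * T) / (np.pi * (k + 0.5)) * np.sin(np.pi * (k + 0.5) * t / T)

xi = rng.standard_normal((nsamples, K))                    # i.i.d. Gaussian ξ_k
B = xi @ phi                                               # each row: one truncated path

# sanity checks against Brownian motion: E[B(t)] ≈ 0 and Var[B(t)] ≈ t
print("max |mean(B(t))|   :", np.abs(B.mean(axis=0)).max())
print("max |var(B(t)) - t|:", np.abs(B.var(axis=0) - t).max())
```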
As a result of the above lemma, we can view the Itô process
s(t) = s(0)+\int_0^t b(t', s(t'), m(t'), a(t'))dt' +\int_0^t \sigma(t', s(t'), m(t'), a(t'))dB(t'), |
as a function of t, s(0), w and the set of random variables \xi = (\xi_k)_k. We denote the solution of the state equation as s(t): = s^{s_0, a}(t).
Definition 3. With the one-dimensional Hermite polynomials
\hat{H}_k(t) = \frac{(-1)^k}{k!} e^{\frac{t^2}{2}}\frac{d^k}{dt^k}\left[e^{-\frac{t^2}{2}}\right], |
the basis polynomials of the Wiener chaos space are defined by
\chi_{\alpha}(\xi) =\sqrt{\alpha !} \prod\limits_{i} \hat{H}_{\alpha_i}(\xi_i) |
where \alpha! is the product of all the components' factorials, \alpha denotes the multi-index from the set
\{ \alpha = (\alpha_i)_i\ | \ \alpha_i\geq 0, \ |\alpha| = \sum\limits_{i = 0}^{+\infty}\alpha_i<+\infty\}. |
Note that, for a multi-index \alpha such that |\alpha | < +\infty, the number of nontrivial terms in the product \prod_{i} \hat{H}_{\alpha_i} is finite, since \hat{H}_0 = 1. From Definition 3 and the properties of the Hermite polynomials one directly shows the orthogonality of the above polynomial basis, whose elements are often referred to as Wick polynomials. If the controlled state process s\in L^2(\Omega \times [0, T], \mathbb{R}), then one can decompose it as
s(t) = \sum\limits_{\alpha} \langle s(t), \chi_{\alpha}(\xi)\rangle \chi_{\alpha}(\xi) = \sum\limits_{\alpha} E[s(t)\chi_{\alpha}(\xi)] \chi_{\alpha}(\xi), |
where \langle x, y\rangle = E[x y] when x and y are random variables. In particular, the coefficient function of order zero, s_0(t) = E[s(t)\chi_{0}(\xi)], has a special meaning: it coincides with the expectation of the process s(t), as the basis polynomial of order zero is identically one.
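A Monte Carlo check of the orthonormality of the Wick polynomials (an illustrative sketch with two Gaussian coordinates and multi-indices of order at most two, all assumed choices) can be written as follows.

```python
import numpy as np
from math import factorial
from itertools import product

# Monte Carlo check that the Wick polynomials χ_α(ξ) = sqrt(α!) Π_i Ĥ_{α_i}(ξ_i)
# are orthonormal under the Gaussian law of ξ, with Ĥ_k = He_k / k! built from
# the probabilists' Hermite polynomials He_k. Illustrative sketch only.

rng = np.random.default_rng(2)

def He(k, x):
    # probabilists' Hermite polynomials via the three-term recurrence
    h_prev, h = np.ones_like(x), x
    if k == 0:
        return h_prev
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h

def chi(alpha, xi):
    # Wick polynomial for a finite multi-index alpha, evaluated on samples xi
    out = np.ones(xi.shape[0])
    for i, a in enumerate(alpha):
        out *= He(a, xi[:, i]) / factorial(a)
    return np.sqrt(np.prod([factorial(a) for a in alpha])) * out

xi = rng.standard_normal((200000, 2))          # two Gaussian coordinates ξ_1, ξ_2
indices = list(product(range(3), repeat=2))    # multi-indices (α_1, α_2) with α_i ≤ 2
G = np.array([[np.mean(chi(a, xi) * chi(b, xi)) for b in indices] for a in indices])
print(np.round(G, 2))                          # approximately the identity matrix
```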
This is summarized in the following well-known result:
Theorem 3.1 (Cameron and Martin [23]). Assume that the state process s(t) is adapted to the filtration generated by the Wiener chaos \{\chi_{\alpha}(\xi)\}_{\alpha}, and is in L^2(\Omega \times [0, T], \mathbb{R}) i.e., it satisfies the integrability condition E[\int_0^T|s(t)|^2 dt] < +\infty. Then, s(t) can be expanded in [0, T] as
s(t, \xi) = \sum\limits_{\alpha} s_{\alpha}(t)\chi_{\alpha}(\xi), |
where the deterministic coefficient functions s_{\alpha}(t) =E[s(t)\chi_{\alpha}(\xi)] can be interpreted as projections of the process s(t) onto the corresponding chaos basis.
Note that the variance of s(t) involves only the coefficients with index |\alpha|\geq 1, and not the zero-order term, which is the mean. The polynomial chaos (PC) or Wiener chaos framework was developed by Norbert Wiener [24,25] and generalized later by Cameron and Martin [23]. The Fourier-Hermite series of s(t) is often called the Wiener chaos expansion (WCE).
Theorem 3.2 (Error estimates, Theorem 2.1 in [26]). If there exists k such that \| \partial_{\xi}^k s(t, \xi)\|^2 < +\infty, then the following error estimates holds:
\| s(t)-\sum\limits_{|\alpha| \leq K } s_{\alpha}(t)\chi_{\alpha}(\xi)\|^2 \leq \frac{ \| \partial_{\xi}^k s(t, \xi)\|^2}{\prod\limits_{i = 0}^{k-1}(K-i+1)} = :\epsilon_K. |
It is important to notice that this error \epsilon_K is much better than Monte-Carlo sampling (which is of order O\left(\frac{1}{\sqrt{K}}\right) for K samples) when the random processes of the basis capture the state process s. This provides a more efficient way to solve mean-field-type game problems. However, the choice of a basis is crucial, and the chosen elements of the basis need to be sparse enough in order to reduce the curse of dimensionality.
We now reformulate the mean-field-type game problems using the PC framework above. We replace the state process s and the control process by their respective PC expansions. Therefore, s is determined by its coefficient functions (s_{\alpha})_{\alpha}. We can compute the expected payoffs as
\begin{eqnarray} && \mathbb{E} g_i( s(T), m(T)) =\hat{g}_i( (s_{\alpha}(T))_{\alpha})\\ \nonumber && \mathbb{E}r_i(t, s(t), m(t), a(t)) = \hat{r}_i(t, (s_{\alpha}(t))_{\alpha}, (a_{\alpha}(t))_{\alpha}). \end{eqnarray} | (21) |
and the state dynamics coefficients are
s_{\alpha}(t) = s_{\alpha}(0){\rm{ll}}_{\alpha = 0}+\langle \chi_{\alpha}(.), \int_0^t b dt'\rangle + \langle \chi_{\alpha}(.), \int_0^t \sigma dB(t')\rangle. |
The Lebesgue integral term is \langle \chi_{\alpha}(.), \int_0^t b dt' \rangle =\int_0^t b_{\alpha} dt', and the Itô integral term becomes
\langle \chi_{\alpha}(.), \int_0^t \sigma dB(t')\rangle = \int_0^t \sigma \partial_{t'}[\chi_{\alpha}] dt'. |
Using Hermite polynomials, we know that
\partial_{t'}[\chi_{\alpha}] = \sum\limits_{k = 1}^{\infty}\sqrt{\alpha_k} \hat{m}_k(t') \chi_{\hat{\alpha}^k}(\xi) |
where \hat{\alpha}^k = (\alpha_1, \ldots, \alpha_{k-1}, \alpha_{k}-1, \alpha_{k+1}, \ldots), i.e., \hat{\alpha}^k_i = (\alpha_{k}-1){\rm{ll}}_{i = k}+\alpha_{i}{\rm{ll}}_{i\neq k}. It turns out that s_{\alpha} solves the ordinary differential system:
\dot{s}_{\alpha}(t) = b_{\alpha}(t)+ \sum\limits_{k = 1}^{\infty}\sqrt{\alpha_k} \hat{m}_k(t) E[\sigma (t, .)\chi_{\hat{\alpha}^k}(\xi)] |
with initial condition s(0){\rm{ll}}_{\alpha = 0}.
Proposition 6. The best response problem (1) becomes the following: for each decision-maker i, given the coefficient strategies of the others (\{a_{j, \alpha}\}_{\alpha}, \ j\neq i), the best response solves
\begin{eqnarray} \label{chaos} \left\{ \begin{array}{lll} \sup\limits_{(a_{i, \alpha}(\cdot))_{\alpha}} \mathbb{E} \hat{\mathcal{R}}_{i, T}, \\ \displaystyle{\mbox{ subject to }\ }\\ \dot{s}_{\alpha}(t) = b_{\alpha}(t)+ \sum\limits_{k = 1}^{\infty}\sqrt{\alpha_k} \hat{m}_k(t) E[\sigma (t, .)\chi_{\hat{\alpha}^k}] \\ s_{\alpha}(0) = s(0){\rm{ll}}_{\alpha = 0}, \\ \end{array} \right. \end{eqnarray} | (22) |
where \hat{\mathcal{R}}_{i, T} =\hat{g}_i+ \int_0^T \ \hat{r}_idt.
Proof. We first rewrite the instantaneous payoff functions in terms of the Wiener chaos. We replace the state process s(t) by its decomposition \sum_{\alpha}s_{\alpha}(t)\chi_{\alpha}(\xi).
\begin{eqnarray} \mathbb{E}r_i(t, s(t), m(t), a(t)) = : \hat{r}_i(t, \{s_{\alpha}(t)\}_{\alpha}, \{a_{\alpha}(t)\}_{\alpha}) \end{eqnarray} | (23) |
In addition, the coefficients dynamics s_{\alpha}(t) can be easily obtained from the state dynamics and s_{\alpha} solves a standard differential system:
\dot{s}_{\alpha}(t) = b_{\alpha}(t)+ \sum\limits_{k = 1}^{\infty}\sqrt{\alpha_k} \hat{m}_k(t) \mathbb{E}[\sigma (t, .)\chi_{\hat{\alpha}^k}(\xi)]. |
with initial condition s(0){\rm{ll}}_{\alpha = 0}. Collecting these together, we obtain a standard dynamic optimization problem subject to coefficient dynamics with multiple indices. This completes the proof.
As we can observe, problem (22) is now a standard differential game with deterministic state dynamics. Note that, in order to obtain these transformations, the underlying problem must be posed in the Hilbert space of square integrable random processes. This implies that the variance of the problem must be finite.
In this section, equilibrium systems for mean-field-type games with aggregative structures are presented.
In this subsection we take b, \sigma, r_i, g_i as functions of m^a only through aggregative terms \int \phi_{.}(s) m^a(t, ds) = \mathbb{E}[\phi_{.}(s^a(t))], and \int \phi_{.}(a') \mathbb{P}_a(t, da') = \mathbb{E}[\phi_{.}(a(t))]. In that case the Gâteaux differentiation with respect to m can be reduced to a finite-dimensional differentiation.
\begin{equation}\label{SDEugamersnoisyrn} \left\{\begin{array}{lll} \mathcal{R}_{i, T} =\left[\int_0^Tr_i(t, s^a(t), \mathbb{E}[\phi_{r_i}(s^a(t))], a(t), \mathbb{E}[\phi_{r_i}(a(t))])\, dt + g_i(s^a(T), \mathbb{E}[\phi_{g_i}(s^a(t))]) \right], \\ \sup\limits_{a_i\in \mathcal{U}_i} \mathbb{E}\mathcal{R}_{i, T} \ \ \mbox{subject to}\\ ds^a(t) = {b}(t, s^a(t), \mathbb{E}[\phi_{b}(s^a(t))], a(t), \mathbb{E}[\phi_{b}(a(t))]) \ dt+ \sigma( t, s^a(t), \mathbb{E}[\phi_{\sigma}(s^a(t))], a(t), \mathbb{E}[\phi_{\sigma}(a(t))])dB(t), \\ s^a(0) = s_0.\\ \end{array}\right. \end{equation} | (24) |
If b, \sigma, r_i are only functions of (t, s^a, \mathbb{E}[\phi_{.}(s^a(t))], a, \mathbb{E}[\phi_{.}(a(t))]), then the partial derivatives of the dual functions, p_i = V_{i, sm}(t, s^a(t)) and q_i = \sigma V_{i, ssm}(t, s^a(t)), solve the backward SDE system:
\begin{equation}\label{SDEugamersnoisyrn2}\left\{ \begin{array}{l} dp_i = \{-H_{i, s}-\phi_{r_i, s}\mathbb{E}[r_{i, y}]-\phi_{b, s}\mathbb{E}[b_{y} p_i]-\phi_{\sigma, s}\mathbb{E}[\sigma_{y} q_i]\} +q_i dB, \\ p_i(T) =g_{i, s}(s^a(T), \mathbb{E}[\phi_{g_i}(s^a(T))]) +\phi_{g_i, s}(T)\mathbb{E}[g_{i, y}(s^a(T), \mathbb{E}[\phi_{g_i}(s^a(T))])] \end{array} \right. \end{equation} | (25) |
where H_{i, s} = r_{i, s}+b_s p_i+\sigma_s q_i.
Note that the aggregative structure in the form \tilde{E}[\phi(S(t), \tilde{S}(t))] = \int_{w} \phi(S, w)m(t, dw) which is a random variable, is already included in the cases \phi(t, s, m) discussed in the previous section.
We take b, \sigma, r_i as functions of the k-th moment of the state, \int s^k m^a(t, ds) = \mathbb{E}[(s^a(t))^k], and the l-th moment of the control action of decision-maker i, \int (a'_i)^l \mathbb{P}_{a_i}(t, da'_i) = \mathbb{E}[a^l_i(t)]. Thus, the aggregative function is \phi_{.}(y) = y^k, and its derivative is \phi_{., y} = k y^{k-1}. For k>1, the dynamics yields
\begin{equation}\label{moment}\left\{ \begin{array}{l} dp_i = -\{H_{i, s}+k s^{k-1}\mathbb{E}H_{i, y}\} dt+q_i dB, \\ p_i(T) =g_{i, s}(s^a(T), \mathbb{E}[(s^a(T))^k]) +k s^{k-1}(T)\mathbb{E}[g_{i, y}(s^a(T), \mathbb{E}[(s^a(T))^k])] \end{array} \right. \end{equation} | (26) |
The existence of a solution is not immediate. It requires the (k-1)-moment estimates of the state for k>1.
We take b, \sigma, r_i as functions of the first moments \int s m^a(t, ds) = \mathbb{E}[s^a(t)] and \int a' \mathbb{P}_a(t, da') = \mathbb{E}[a(t)]. Then the aggregative functions reduce to the identity function \phi_{.}(y) = y, whose derivative is \phi_{., y} = 1. Hence, the first-order risk-neutral adjoint processes p_i(t) = V_{i, sm}(t, s^a(t)) and q_i(t) = \sigma V_{i, ssm}(t, s^a(t)) solve a simpler backward SDE system, and (25) reduces to
\begin{array}{l} dp_i = \{-H_{i, s}-\mathbb{E}[H_{i, y}(t, {s^a}, \mathbb{E}[s^a], p_i, q_i)]\} +q_i dB\\ p_i(T) =g_{i, s}(s^a(T), \mathbb{E}[s^a(T)]) +\mathbb{E}[g_{i, y}(s^a(T), \mathbb{E}[s^a(T)])]. \end{array} |
However, the optimal control strategy equation is modified because of the presence of \mathbb{E}[a(t)]. In the convex control set case, one obtains the variational inequality (H_{i, a}+\mathbb{E}H_{i, \bar{a}})(a_i-a'_i)\geq 0, where H_{i, \bar{a}_i} denotes the derivative of the Hamiltonian with respect to the component \bar{a}_i = \mathbb{E}[a_i(t)].
This section examines situations in which the state is partially observed.
\begin{equation}\label{SDEugamersnoisytt} \left\{\begin{array}{lll} \mathcal{R}_{i, T}(m_0, a) = \int_0^Tr_i(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t))\, dt + g_i(s^{s_0, a}(T), m^{m_0, a}_1(T)), \\ \sup\limits_{a_i\in \mathcal{A}_i} \mathbb{E}^a \mathcal{R}_{i, T}(m_0, a) \ \mbox{subject to}\\ ds^{s_0, a}(t) = {b}(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t)) \ dt\\ \quad \quad+ \sigma(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t))dB(t)\\ \quad \quad + \sigma_o(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t)) dB_o(t), \\ s^a(0) = s_0, \\ dy^{a}(t) = {b}_o(t, s^{s_0, a}(t), m^{m_0, a}(t), a(t)) \ dt+ dB_o(t), \ y^a(0) = 0, \\ m^{m_0, a}(t, \ .): = \mathbb{P}_{s^{s_0, a}(t)}. \end{array}\right. \end{equation} | (27) |
where \mathbb{E}^a denotes the expectation with respect to the probability space (\Omega, \mathbb{F}, \{\mathcal{F}_t\}_t, \mathbb{P}^a), where \mathbb{P}^a is defined below. Under partial state observation, an admissible strategy a_i is a random process a_i:\ [0, T]\times \Omega \rightarrow A_i that is adapted to \mathcal{F}^y_t = \sigma(y({t'}), \ t'\leq t) and such that \mathbb{E}\int_0^T |a_i(t, .)|^{\alpha} dt < \infty. The set of admissible strategies is denoted by \hat{\mathcal{A}}_i. Introduce the density process \rho^a(t) = e^{\int_0^t b_o(t') dy^a(t') -\frac{1}{2}\int_0^t |b_o(t')|^2 dt'}. Then, \rho^a(t) solves the forward SDE d\rho^a = \rho^a b_o dy^a, \ \rho^a(0) = 1. By the Girsanov transform d\mathbb{P}^a =\rho^a d \mathbb{P}, the partial observation problem is transformed into a full observation problem with respect to \mathbb{P} and with a new state (\hat{s}^a, \rho^a).
\begin{equation}\label{SDEpartial} \left\{\begin{array}{lll} \sup\limits_{a_i\in \hat{\mathcal{A}}_i}\ \ \mathbb{E} [\int_0^T\rho^a(t) r_i(t, \hat{s}^{s_0, a}(t), m^{m_0, a}(t), a(t))\, dt + \rho^a(T)g_i(\hat{s}^{s_0, a}(T), m^{m_0, a}(T))] \\ \mbox{subject to}\\ d\hat{s}^a(t) = {b} \ dt+ \sigma dB(t)+ \sigma_o [dy^a(t)-{b}_o \ dt], \\ = [{b}-\sigma_ob_o] \ dt+ \sigma dB+ \sigma_o dy^a(t), \\ \hat{s}^a(0) = s_0, \\ d\rho^a = \rho^a b_o dy^a, \ \rho^a(0) = 1. \end{array}\right. \end{equation} | (28) |
This problem is similar to (1) but with a new state (\hat{s}^{s_0, a}, \rho^a). One obtains an augmented state (\hat{s}, \rho) and the infinite-dimensional DPP can be directly applied by considering the probability measure m(t, d\hat{s}d\rho) = \mathbb{P}_{\hat{s}^a(t), \rho^a(t)}. Similarly the stochastic maximum principle can be applied with the state (\hat{s}, \rho).
d\left(\begin{array}{lll} \hat{s}^a\\ \rho \end{array}\right) = \left(\begin{array}{lll} {b}-\sigma_ob_o\\ 0 \end{array}\right)dt+ \left(\begin{array}{ll} \sigma & \sigma_o \\ 0 & \rho^a b_o \end{array}\right) \left(\begin{array}{lll} dB\\ dy^a \end{array}\right). |
The new Hamiltonian for decision-maker i is
H_i(t, s, \rho, m, (p_1, p_2), q) = \rho r_i+ [{b}-\sigma_ob_o] p_1+ trace(\Gamma' q), |
where \Gamma: = \left(\begin{array}{ll} \sigma & \sigma_o \\ 0 & \rho^a b_o \end{array}\right), and \Gamma' is the transpose of the matrix \Gamma, and q: = \left(\begin{array}{ll} q_{11} & q_{12} \\ q_{21} & q_{22} \end{array}\right). For the optimal control action, however, it should be conditioned on the observation filtration \mathcal{F}_t^y. A necessary condition in the smooth case yields
\mathbb{E}\left[\delta H_i+\frac{1}{2}P_i [\delta \sigma]^2 \ | \ \mathcal{F}_t^y\right]\leq 0. |
Note that the methodology extends to the case of individual observation per decision-maker i using the filtration \mathcal{F}_{i, t}^{y_i}.
This section focuses on recent developments and extensions of mean-field-type games.
Consider a mean-field-type game setup with the following data:
\left\{ \begin{array}{cc} \mbox{Time step: } & t\in \{0, 1, \ldots, T-1\}\\ \mbox{Set of decision-makers:} &\mathcal{I}\\ \mbox{Initial state : } & s_{0} \sim m_0^s\\ \mbox{Stochastic state dynamics: } & s(t+1) \sim q_{t+1}(. | \ s(t), m^s(t), m^a(t), a(t))\\ \mbox{Instant payoff of}\ i : & r_{i}(t, s(t), m^s(t), m^a(t), a(t))\\ \mbox{Terminal payoff of } i: & g_{i}(s(T), m^s(T))\\ \end{array} \right. |
Here the time space is a discrete set denoted by \{0, 1, \ldots, T-1\}, T\geq 1 is the length of the horizon, t denotes a time step, s is the state process of the system, and m_0^s is the initial distribution of states. A decision-maker, denoted by i, has an action space A_i. The system state s(t) is stochastic and its probability transition from t to t+1 is given by q_{t+1}: \ \mathcal{S}\times \Delta(\mathcal{S})\times \prod_i \Delta(A_i) \times \prod_i A_i \rightarrow \Delta(\mathcal{S}). Denote by m^s(t) the distribution of states and by m^a(t) the distribution of actions at time t. Decision-maker i's cumulative payoff is
\mathcal{R}_{i, T} (m_0^s, a) =\sum\limits_{t = 0}^{T-1}r_{i}(t, s(t), m^s(t), m^a(t), a(t)) + g_{i}(s(T), m^s(T)), |
The risk-neutral best response problem of i is
\begin{equation} \left\{\begin{array}{lll} \sup\limits_{a_i}\mathbb{E}\mathcal{R}_{i, T} (m_0^s, a)\ \ \mbox{subject to}\\ s(t+1) \sim q_{t+1}(. | \ s(t), m^s(t), m^a(t), a(t)), \\ m^s(t, .) = \mathbb{P}_{s(t)}, \ s_{0} \sim m_0^s, \end{array} \right. \end{equation} | (29) |
Let us express the expected payoff in terms of the measure m(t, .).
Er_{i} (t, s(t), m^s(t), m^a(t), a(t)) = \int r_{i} (t, \bar{s}, m^s(t), m^a(t), a(t)) m^s(t, d\bar{s}) = \hat{r}_{i} (t, m(t), a(t)) |
where \hat{r} _{i} depends only on the measure m(t) and the strategy profile a(t). Similarly one can rewrite the expected value of the terminal payoff as
Eg_{i} (s(T), m^s(T)) = \int g_{i} (\bar{s}, m^s(T))m^s(T, d\bar{s}) =\hat{g}_{i} (m^s(T)). |
Proposition 7. On the space of measures, one has a deterministic dynamic game problem over multiple stages. Therefore a dynamic programming principle (DPP) holds:
\begin{equation}\left\{ \begin{array}{c} V_{i} (t, m^s(t)) = \sup\limits_{a'_i}\left\{ \hat{r}_{i} ( t, m^s(t), a'_{i}(t), a_{-i}(t)) +V_{i} (t+1, m^s({t+1})) \right\}\\ m^s(t+1, ds') =\int_s q_{t+1}(ds' | \ s, m^s(t), m^a(t), a(t)) m^s(t, ds) \end{array}\right. \end{equation} | (30) |
As we can see, the best-response strategy may depend on the state and on the mean-field m^s, which is referred to as a (state-and-mean-field) feedback strategy. Therefore, m^{a}(t) can be expressed as a function of (s(t), m^{s}(t, .)). Thus, the payoff \hat{r}_{i}(t, .) can be expressed as a function of (m(t, .), a(t)).
Proposition 8. Suppose a sequence of real-valued functions V_{i}(t, .), \ t\leq T, defined on the set of probability measures over \mathcal{S}, satisfies the DPP relation above. Then V_{i}(t, m) is the value function on \Delta(\mathcal{S}) starting from m(t) = m. Moreover, if the supremum is achieved for some action a^*_i(., m), then the best-response strategy is in (state-and-mean-field) feedback form, and the payoff value is
R_i(a^*) = V_{i} (0, m_0). |
Proposition 8 provides a sufficient condition for best-response strategies in terms of (s, m^s(t)). The proof is immediate and follows from the verification theorem of the DPP in deterministic dynamic games.
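The following backward-induction sketch illustrates Propositions 7-8 for a hypothetical two-state, two-action example (a congestion-type reward and a switching kernel chosen purely for illustration), with the simplex \Delta(\mathcal{S}) parameterised by m = P(state = 1) and discretised on a grid.

```python
import numpy as np
from itertools import product

# Backward induction of the DPP on measures for a hypothetical two-state,
# two-action mean-field-type control problem (one decision-maker, the others
# frozen). Model data below are illustrative, not taken from the paper.

S, A, T = [0, 1], [0, 1], 10            # states, actions (0 = stay, 1 = switch)
grid = np.linspace(0.0, 1.0, 101)       # m = P(state = 1) parameterises Δ(S)

def q(s_next, s, m, a):
    # transition kernel: "switch" moves to the other state with probability 0.9
    target = 1 - s if a == 1 else s
    return 0.9 if s_next == target else 0.1

def r(s, m, a):
    # congestion cost in the current state plus a small switching cost
    occupancy = m if s == 1 else 1.0 - m
    return -occupancy - 0.1 * a

def g(s, m):
    occupancy = m if s == 1 else 1.0 - m
    return -occupancy

policies = list(product(A, repeat=len(S)))            # feedback maps pi: S -> A

V = np.array([(1 - m) * g(0, m) + m * g(1, m) for m in grid])   # V(T, m)
for t in reversed(range(T)):
    V_new = np.empty_like(V)
    for j, m in enumerate(grid):
        ms = np.array([1.0 - m, m])                   # current law over S
        best = -np.inf
        for pi in policies:
            instant = sum(ms[s] * r(s, m, pi[s]) for s in S)
            m_next = sum(q(1, s, m, pi[s]) * ms[s] for s in S)   # next P(state = 1)
            best = max(best, instant + np.interp(m_next, grid, V))
        V_new[j] = best
    V = V_new

print("V(0, m0) for m0 = P(state=1) in {0, 0.5, 1}:",
      [round(float(np.interp(x, grid, V)), 3) for x in [0.0, 0.5, 1.0]])
```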
Particular cases of interest
• Finite state space: Suppose that the state space and the action spaces are nonempty and finite. Let the state transition be
\mathbb{P}( s(t+1) = s'\ | \ s(t), m^s(t), m^a(t), a(t)) =q_{t+1}(s' | \ s(t), m^s(t), m^a(t), a(t)), |
DPP becomes
\left\{ \begin{array}{c} V_{i} (t, m^s(t)) = \sup\limits_{a'_i}\left\{ \hat{r}_{i} (t, m(t), a'_{it}, a_{-i, t}) +V_{i}(t+1, m^s(t+1)) \right\}\\ m^s(t+1, s') =\sum\limits_{s\in \mathcal{S}} q_{t+1}(s' | \ s, m^s(t), m^a(t), a(t)) m^s(t, s) \end{array}\right. |
As for classical polymatrix games, a pure mean-field equilibrium may not exist in general in mean-field-type games with finite actions. However, a mixed extension can be adopted. By extending the action space to the set of probability measures on A_i and the functions \hat{r}_{it}, \hat{g}_{iT}, q_{t+1} to the corresponding mixed extensions, one gets the existence of mean-field equilibria in behavioral (mixed) strategies for payoffs that are independent of m^a.
• Continuous state space: Consider the interactive state dynamics in discrete time
s(t+1) =s(t)+ b(t+1, s(t), m^s(t), m^a(t), a(t), \eta_{t+1}) |
where \eta is a random process. The transition kernel of s(t+1) given s(t), m^s(t), m^a(t), a(t) is
q_{t+1}(ds' | \ s(t), m^s(t), m^a(t), a(t)) =\int_{\eta} P(ds'\ni s(t)+ b_{t+1}( s(t), m^s(t), m^a(t), a(t), \eta)) \mathbb{P}_{\eta_{t+1}}(d\eta) |
where \mathbb{P}_{\eta_{t+1}}(d\eta) denotes the probability distribution of \eta_{t+1}. The Bellman equation of Proposition 7 applies to this case.
• Mean-field free case: If r_i(s, a, m^s, m^a) = r_i(s, a) and g_i(s, m^s) = g_i(s) for every decision-maker i, then
\hat{r}_i(m, a) = \int_s {r}_i(s, a) m^s(t, ds). |
There exists a function {v}_i such that
V_i(t, m^s(t)) = \langle {v}_i(t, .), m^s(t, .)\rangle =\int_s {v}_i(t, s) m^s(t, ds), |
where {v}_i(t, s) is a mean-field-free equilibrium payoff function of decision-maker i. In that case, the dynamic programming principle reduces to {v}_{i}(t, s) = \sup_{a'_{it}}H_i(t, s, a'_{it}, a_{-i, t}), where
H_i = {r}_{i}(t, s, a'_{it}, a_{-i, t}) +\int_{s'} {v}_{i}(t+1, s')\, q_{t+1}(ds' | s, a'_{it}, a_{-i, t}), |
which is the classical Bellman-Shapley equilibrium system. Note that, in this case, v_i(t, s) = \partial_{m(t, s)} V_i(t, m).
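To make the finite-state recursion (30) concrete, here is the minimal numerical sketch announced in the finite-state item above, for a single decision-maker with two states and two actions. The transition kernel, the running payoff, and their dependence on m^s are illustrative assumptions (not taken from this paper); the code simply iterates the backward value recursion on a grid of measures together with the forward update of m^s.

```python
import numpy as np

# Minimal sketch of the finite-state recursion (30) on the simplex of measures,
# for a single decision-maker with two states and two actions.
# All model data (q, r, the congestion effect) are illustrative assumptions.

T = 5                                  # horizon (number of stages)
S, A = 2, 2                            # number of states, number of actions
grid = np.linspace(0.0, 1.0, 101)      # m^s = (m0, 1 - m0), parameterized by m0

def q(s_next, s, m0, a):
    """Transition kernel q_{t+1}(s' | s, m^s, a); staying in state 0 is
    assumed harder when state 0 is congested (large m0)."""
    stay = 0.9 - 0.4 * m0 if a == 0 else 0.6
    return stay if s_next == s else 1.0 - stay

def r_hat(m0, a):
    """hat r(t, m, a) = sum_s r(s, m, a) m(s), with an assumed congestion
    penalty on state 0 and a small control cost."""
    r = np.array([1.0 - m0, 0.5]) - 0.1 * a
    return r @ np.array([m0, 1.0 - m0])

def forward(m0, a):
    """Forward measure update m^s(t+1, s') = sum_s q(s' | s, m^s, a) m^s(t, s);
    returns the mass on state 0."""
    m = np.array([m0, 1.0 - m0])
    return sum(q(0, s, m0, a) * m[s] for s in range(S))

# Backward recursion (30): V(T, m) = hat g(m), taken to be 0 here.
V = np.zeros((T + 1, grid.size))
policy = np.zeros((T, grid.size), dtype=int)
for t in range(T - 1, -1, -1):
    for k, m0 in enumerate(grid):
        vals = [r_hat(m0, a) + np.interp(forward(m0, a), grid, V[t + 1])
                for a in range(A)]
        V[t, k], policy[t, k] = max(vals), int(np.argmax(vals))

print("V(0, m) at measures m0 in {0, 0.25, 0.5, 0.75, 1}:", V[0, ::25])
```

The interpolation step is only needed because the simplex is discretized; in the mean-field-free case, the same loop collapses to the classical Bellman-Shapley recursion on the finite state space.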
Finite state continuous time games
Discrete-state (countable or finite) games of mean-field type can be analyzed using the above methodologies. The state process becomes a continuous-time interactive Markov decision process in which mean-field terms, such as the distribution of states and the distribution of control actions, are involved.
Each decision-maker has a risk-sensitivity index \theta_i\in \mathbb{R}. In the limit as \theta_i vanishes, one recovers the risk-neutral setup discussed above. Consider the risk-sensitive best-response problem of decision-maker i:
\begin{equation}\label{SDEugamersnoisy} \left\{\begin{array}{lll} \sup\limits_{a_i\in \mathcal{A}_i} \frac{1}{\theta_i}\log\left( \mathbb{E}e^{\theta_i \mathcal{R}_{i, T}}\right)\ \mbox{subject to}\\ s(t) =s_0+\int_0^t b dt'+\int_0^t \sigma dB(t'), \ t>0\\ s(t) =s^{s_0, a}(t), \ s(0) = s_0\sim m_0\\ m(t, .) = m^{m_0, a}(t, .) = \mathbb{P}_{s^{s_0, a}(t)} \end{array}\right. \end{equation} | (31) |
Introduce the following augmented state (s, z) such that z_i(0) = 0, and dz_i(t) =r_i(t, s^a(t), m^a(t), a(t))\, dt. Then,
e^{\theta_i \mathcal{R}_{i, T}} =e^{\theta_i z_i(T)+ \theta_i g_i(s^a(T), m^a(T))}. |
Thus, we obtain the following mean-field-type control problem for decision-maker i:
\begin{equation}\label{D2dppmfsdersttnoisy} \left\{\begin{array}{lll} \sup\limits_{a_i} \mathbb{E}e^{\theta_i [z_i(T)+ g_i(s^a(T), m^a(T))]}\\ z^a_i(t) =\int_0^t r_i\, dt', \\ z^a_i(0) = 0, \\ s(t) =s_0+\int_0^t b dt'+\int_0^t \sigma dB(t'), \ t>0\\ s^a(0) = s_0. \end{array}\right. \end{equation} | (32) |
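Before passing to the measure formulation, the augmented-state reformulation (32) can be checked numerically. The sketch below is a Monte Carlo estimate of the risk-sensitive objective via the pair (s, z); the drift, diffusion, payoffs, and the feedback strategy are illustrative assumptions, and the mean-field coupling is ignored for brevity.

```python
import numpy as np

# Monte Carlo sketch of the risk-sensitive objective in (31)-(32) using the
# augmented pair (s, z). All model data below are illustrative assumptions.

rng = np.random.default_rng(0)
theta = 0.5                         # risk-sensitivity index theta_i
T, N, M = 1.0, 200, 20000           # horizon, time steps, sample paths
dt = T / N

b = lambda s, a: -s + a             # assumed drift
sigma = 0.3                         # assumed (constant) diffusion
r = lambda s, a: -0.5 * (s**2 + 0.1 * a**2)   # assumed running payoff
g = lambda s: -s**2                            # assumed terminal payoff
a_fb = lambda s: -0.5 * s           # some fixed admissible feedback strategy

s = rng.normal(0.0, 0.2, size=M)    # s(0) ~ m_0
z = np.zeros(M)                     # z(0) = 0, dz = r dt
for _ in range(N):
    a = a_fb(s)
    dB = rng.normal(0.0, np.sqrt(dt), size=M)
    z += r(s, a) * dt
    s += b(s, a) * dt + sigma * dB

payoff = np.exp(theta * (z + g(s)))
print("risk-sensitive value (1/theta) log E[exp(theta R)]:",
      np.log(payoff.mean()) / theta)
print("risk-neutral value E[R]:", (z + g(s)).mean())
```

The empirical distribution of the simulated pairs (s, z) is exactly the object that the measure \mu introduced next approximates in the weak sense.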
Let us introduce \mu(t, dsdz) = \mathbb{P}_{s^a(t), z^a(t)}, the probability measure associated with the joint process (s^a(t), z^a(t)). Generically, the measure \mu solves the Fokker-Planck-Kolmogorov forward equation in the distributional (weak) sense. The term \mathbb{E}e^{\theta_i \mathcal{R}_{i, T}} can be rewritten in a deterministic manner as
\int \mu(T, dsdz) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu(T, ., d\tilde{z})))}\ . |
This is a terminal objective in the sense that it is evaluated only at \mu(T, .). The measure \mu is an infinite-dimensional quantity that serves as the state of the problem. The advantage now is that \mu(.) is a deterministic quantity. Since there is no running cost, one can write the HJB equation directly, using classical calculus of variations, for
\hat{V}^{\theta}_i(t, \mu) = \sup\limits_{a_i\in \mathcal{A}_i} \int \mu(T, dsdz) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu(T, ., d\tilde{z})))}, |
starting from \mu(t, .) = \mu at time t. The risk-sensitive best value is {V}^{\theta}_i(t, \mu) := \frac{1}{\theta_i}\log \hat{V}^{\theta}_i(t, \mu). The measure \mu evolves according to
\mu_t = -\partial_s[{b} \mu] -\mbox{div}_z(r \mu)+\frac{1}{2}\partial_{ss}(\sigma'\sigma \mu), | (33) |
with initial distribution \ \mu(0, ds, dz) = m_0(ds)\delta_0(dz), where {\sigma}' is the transpose of {\sigma}.
The risk-sensitive HJB system yields
\begin{eqnarray} \nonumber 0& = & \hat{V}^{\theta}_{i, t}(t, \mu)+ \int {H}^{\theta}_i \mu(t, dsdz), \\ \nonumber && \hat{V}^{\theta}_i(T, \mu) =\int \mu(T, dsdz) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu(T, ., d\tilde{z})))}, \ \\ && i\in \mathcal{I} \label{hjbrsnoisy} \end{eqnarray} | (34) |
where {H}^{\theta}_i(t, s, z, \mu, \hat{V}_{i, z\mu}, \hat{V}_{i, ss\mu}) =\sup_{a_i}\left[ {b}\,\partial_s \hat{V}_{i, \mu} +\langle r, \partial_z \hat{V}_{i, \mu}\rangle +\frac{\sigma^2}{2} \partial_{ss} \hat{V}_{i, \mu}\right]. Note that
\begin{eqnarray} \nonumber \hat{V}^{\theta}_i(T, \mu)& = & \int \mu(T, dsdz) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu(T, ., d\tilde{z})))}\\ \nonumber & = &\int_{s, z_i} \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}_i}\int_{\tilde{z}_{-i}}\mu(T, ., d\tilde{z})))} \int_{z_{-i}}\mu(T, dsdz)\\ & = &\int_{s, z_i} \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}_i}\mu_i(T, ., d\tilde{z}_i)))}\mu_i(T, dsdz_i) =\hat{V}^{\theta}_i(T, \mu_i), \end{eqnarray} | (35) |
where \mu_i(t, dsdz_i) = \int_{z_{-i}} \mu(t, dsdz_idz_{-i}) is the marginal of \mu with respect to (s, z_i), and \hat{V}^{\theta}_i(t, \mu) = \tilde{V}_i(t, \mu_i). The above calculation shows that the value function \frac{1}{\theta_i}\log \hat{V}^{\theta}_i depends only on the measure \mu_i.
Proposition 9. If there exists a function \tilde{V} with
\tilde{V}_{i, t}, \tilde{V}_{i, s\mu_i}, \tilde{V}_{i, ss\mu_i}, \tilde{V}_{i, zz\mu_i}, |
satisfying
\begin{eqnarray} \nonumber 0& = &\tilde{V}_{i, t}+ \int {H}^{\theta}_i \mu_i(t, dsdz_i), \\ \nonumber && \tilde{V}_i(T, \mu_i) =\int \mu_i(T, dsdz_i) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu_i(T, ., d\tilde{z})))}, \ \\ \nonumber && \mu_i(t, dsdz_i) = \int_{z_{-i}} \mu(t, dsdz_idz_{-i}), \\ \nonumber && \mu_t = -\partial_s[{b} \mu] -\mbox{div}_z(r \mu)+\frac{1}{2}\partial_{ss}(\sigma^2 \mu), \\ \nonumber && \mu(0, dsdz) = m_0(ds)\delta_{0}(dz), \\ && i\in \mathcal{I}, \ \label{hjbrsnoisytt} \end{eqnarray} | (36) |
where the integrand Hamiltonian is
\begin{array}{l} {H}^{\theta}_i (t, s, z, \mu_i, \tilde{V}_{i, s\mu_i}, \tilde{V}_{i, z_i\mu_i}, \tilde{V}_{i, ss\mu_i}) \ =\sup\limits_{a_i}[{b}\tilde{V}_{i, s\mu_i} + r_i\tilde{V}_{i, z_i\mu_i} +\frac{\sigma^2}{2} \tilde{V}_{i, ss\mu_i}]\\ = \tilde{V}_{i, z_i\mu_i}\sup\limits_{a_i}[r_i +{b}\frac{\tilde{V}_{i, s\mu_i} }{\tilde{V}_{i, z_i\mu_i}} +\frac{\sigma^2}{2} \frac{\tilde{V}_{i, ss\mu_i}}{\tilde{V}_{i, z_i\mu_i}}]\ = \tilde{V}_{i, z_i\mu_i}H_i(t, s, z, \mu_i, \frac{\tilde{V}_{i, s\mu_i} }{\tilde{V}_{i, z_i\mu_i}}, \frac{\tilde{V}_{i, ss\mu_i}}{\tilde{V}_{i, z_i\mu_i}}) \end{array} |
then \frac{1}{\theta_i}\log \tilde{V}_{i}(0, \mu) is an equilibrium payoff for decision-maker i, and the best-response strategy a_i achieves the supremum in {H}^{\theta}_i given the other decision-makers' strategies a_{-i}.
Proof. Apply the recipe of Proposition 1 with the augmented state (s, z) and the measure \mu.
From the relation
{V}^{\theta}_i(t, \mu): =\frac{1}{\theta_i}\log \hat{V}^{\theta}_i(t, \mu) \implies e^{\theta_i {V}^{\theta}_i(t, \mu)} =\hat{V}^{\theta}_i(t, \mu), | (37) |
it follows that
\begin{array}{l} \hat{V}^{\theta}_{i, \mu}(t, \mu) =\theta_i {V}^{\theta}_{i, \mu}(t, \mu) e^{\theta_i {V}^{\theta}_i(t, \mu)} =\theta_i {V}^{\theta}_{i, \mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, s\mu}(t, \mu) =\theta_i {V}^{\theta}_{i, s\mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, z\mu}(t, \mu) =\theta_i {V}^{\theta}_{i, z\mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, ss\mu}(t, \mu) =\theta_i {V}^{\theta}_{i, ss\mu}(t, \mu) \hat{V}^{\theta}_i(t, \mu), \\ \hat{V}^{\theta}_{i, t}(t, \mu) =\theta_i {V}^{\theta}_{i, t}(t, \mu) \hat{V}^{\theta}_i(t, \mu). \end{array} |
Thus,
\begin{array}{l} {H}^{\theta}_i (t, s, z, \mu_i, \hat{V}_{i, s\mu_i}, \hat{V}_{i, z_i\mu_i}, \hat{V}_{i, ss\mu_i}) \ = \hat{V}_{i, z_i\mu_i}H_i(t, s, z, \mu_i, \frac{\hat{V}_{i, s\mu_i} }{\hat{V}_{i, z_i\mu_i}}, \frac{\hat{V}_{i, ss\mu_i}}{\hat{V}_{i, z_i\mu_i}})\\ % = \theta_i {V}^{\theta}_{i, z\mu} \hat{V}^{\theta}_i(t, \mu) H_i(t, s, z, \mu_i, \frac{{V}^{\theta}_{i, s\mu_i} }{{V}^{\theta}_{i, z_i\mu_i}}, \frac{{V}^{\theta}_{i, ss\mu_i}}{{V}^{\theta}_{i, z_i\mu_i}}) \end{array} |
\begin{eqnarray} \nonumber 0& = & \theta_i {V}^{\theta}_{i, t}(t, \mu) \hat{V}^{\theta}_i(t, \mu) + \int \theta_i {V}^{\theta}_{i, z\mu} \hat{V}^{\theta}_i(t, \mu). H_i(t, s, z, \mu_i, \frac{{V}^{\theta}_{i, s\mu_i} }{{V}^{\theta}_{i, z_i\mu_i}}, \frac{{V}^{\theta}_{i, ss\mu_i}}{{V}^{\theta}_{i, z_i\mu_i}}) \mu_i(t, dsdz_i), \\ \nonumber && {V}^{\theta}_i(T, \mu_i) =\frac{1}{\theta_i}\log \left[\int \mu_i(T, dsdz_i) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu_i(T, ., d\tilde{z})))}\right], \ \\ \nonumber \label{hjbrsnoisyttty} \end{eqnarray} |
Dividing by \theta_i \hat{V}^{\theta}_i(t, \mu)\neq 0, we obtain that the risk-sensitive best-response payoff {V}^{\theta}_i(t, \mu) solves the functional PDE given by
\begin{eqnarray} \nonumber 0& = & {V}^{\theta}_{i, t}(t, \mu) + \int {V}^{\theta}_{i, z\mu} . H_i(t, s, z, \mu_i, \frac{{V}^{\theta}_{i, s\mu_i} }{{V}^{\theta}_{i, z_i\mu_i}}, \frac{{V}^{\theta}_{i, ss\mu_i}}{{V}^{\theta}_{i, z_i\mu_i}}) \mu_i(t, dsdz_i), \\ \nonumber && {V}^{\theta}_i(T, \mu_i) =\frac{1}{\theta_i}\log \left[\int \mu_i(T, dsdz_i) \ e^{\theta_i(z_i+g_i(s, \int_{\tilde{z}}\mu_i(T, ., d\tilde{z})))}\right], \ \\ \nonumber && \mu_i(t, dsdz_i) = \int_{z_{-i}} \mu(t, dsdz_idz_{-i}), \\ \nonumber && \mu_t = -\partial_s[{b} \mu] -\mbox{div}_z(r \mu)+\frac{1}{2}\partial_{ss}(\sigma^2 \mu), \\ \nonumber && \mu(0, dsdz) = m_0(ds)\delta_{0}(dz), \\ && i\in \mathcal{I}, \ \label{hjbrsnoisytttyt} \end{eqnarray} | (38) |
Let \tilde{v}^*_i(t, s, z_i): = \tilde{V}_{i, \mu_i}(\mu_i)(t, s, z_i). It follows that \tilde{v}^*_i(t, s, z_i) solves the following PDE system:
\begin{array}{l} 0 = \tilde{v}^*_{i, t}+ \tilde{v}^*_{i, z_i} H_i(t, s, z_i, \mu_i, \frac{\tilde{v}^*_{i, s}}{\tilde{v}^*_{i, z_i}}, \frac{\tilde{v}^*_{i, ss}}{\tilde{v}^*_{i, z_i}})\ + \int \tilde{v}^*_{i, \tilde{z}_i}H_{i, \mu_i} \mu_i(t, d\tilde{s}, d\tilde{z}_i). \end{array} |
The terminal condition for this PDE is
\tilde{v}^*_i(T, s, z_i) = e^{\theta_i (z_i+g_i(s, m(T)))}+\theta_i \int g_{i, m}(\tilde{s}, m(T))({s}) e^{\theta_i (\tilde{z}_i+g_i(\tilde{s}, m(T)))}\ \mu_i(T, d\tilde{s}, d\tilde{z}_i). |
From the equality \tilde{V}^{\theta}_{i, \mu}(t, \mu) =\theta_i {V}^{\theta}_{i, \mu}(t, \mu) \tilde{V}^{\theta}_i(t, \mu), we deduce that the function {v}^*_i(t, s, z_i): = {V}^{\theta}_{i, \mu_i}(\mu_i)(t, s, z_i) =\frac{\tilde{V}^{\theta}_{i, \mu}}{\theta_i \tilde{V}^{\theta}_i} = \frac{\tilde{v}^*_i(t, s, z_i)}{\theta_i \tilde{V}^{\theta}_i} is the dual function associated with the risk-sensitive best response value of decision-maker i.
Note that, as in the risk-neutral case, in general, the risk-sensitive dual function {v}^*_i(t, s, z_i) differs from the risk-sensitive payoff function {V}^{\theta}_i(t, \mu_i) or {V}^{\theta}_i(t, \mu).
Itô's formula applied to
\begin{array}{lll} p^{\theta}_i(t) = \frac{\tilde{v}^*_{i, s}(t, s(t), z_i(t))}{\tilde{v}^*_{i, z_i}(t, s(t), z_i(t))}\ =\frac{ \tilde{V}_{i, s\mu_i}(t, s(t), z_i(t))}{\tilde{V}_{i, z_i\mu_i}(t, s(t), z_i(t))}, \end{array} |
provides a risk-sensitive first-order adjoint equation with the risk-sensitive Hamiltonian H_i^{\theta} = H_i(t, s, m, p^{\theta}_i, q_i^{\theta}+p_i^{\theta} l_i).
Proposition 10. For \theta_i\neq 0, the risk-sensitive first order adjoint process (p_i^{\theta}, q_i^{\theta}, l_i, \eta_i)_{i\in \mathcal{I}} solves
\begin{array}{ll} dp^{\theta}_i = - \{H_{i, s} +\frac{1}{\eta_i} \mathbb{E}[\eta_i H_{i, s\mu_i}]\} dt +q_i^{\theta}[-l_i dt+ dB], \\ p^{\theta}_i(T) = g_{i, s}+\frac{1}{\eta_i(T)}\mathbb{E}\{ \eta_i(T) \partial_s g_{i, m}\}, \\ d\eta_i = \eta_i l_i dB, \\ \eta_i(T) =\theta_i e^{\theta_i[z_i(T)+g_i(s(T), m(T, .))]} \end{array} |
Proposition 10 provides the risk-sensitive first-order system of the stochastic maximum principle of mean-field type. By identification of the processes, one obtains
\begin{array}{lll} p_i^{\theta}(t) = \frac{\tilde{v}^*_{i, s}(t, s(t), z_i(t))}{\tilde{v}^*_{i, z_i}(t, s(t), z_i(t))} =\frac{ \tilde{V}_{i, s\mu_i}(t, s(t), z_i(t))}{\tilde{V}_{i, z_i\mu_i}(t, s(t), z_i(t))}, \\ q^{\theta}_i = \sigma \frac{\tilde{v}^*_{i, ss}}{ \eta_i}-p_i^{\theta} l_i, \\ \eta_i(t) = \tilde{v}^*_{i, z_i}(t, s(t), z_i(t)), \\ l_i = \sigma\frac{\eta_{i, s}}{\eta_i} = \sigma \partial_s[\ \log \eta_i] \end{array} |
Proof. A proof can be obtained by following steps similar to those in the proof of Proposition 4.
One of the fundamental elements in the theory of cooperative games is the formulation of optimal behavior for the decision-makers. Decision-maker behavior (control actions and imputations) satisfying specific optimality principles then constitutes a solution of the game. In other words, a solution concept of a dynamic cooperative game is produced by a set of optimality principles, such as a dynamic bargaining solution and a payoff allocation procedure. Altruism and cooperation are fascinating research areas. Intuitively, one is tempted to claim that the decision-makers are better off when they all work cooperatively. However, behaviors that are far from cooperative are frequently observed. So, if cooperation is the answer, what is the question, and why do these behaviors arise?
Let us consider a simple example of a cooperative mean-field-type game [42] with two decision-makers. Assume that if they work together (jointly), they are able to obtain V(\{12\}, m_0, [0, T]). DM 1 gets V(\{1\}, m_0, [0, T]) if he or she works alone, and DM 2 gets V(\{2\}, m_0, [0, T]). From these three values alone, it is not clear why these decision-makers should work together. In order to formalize this in terms of their interests, we introduce a cost of forming a coalition, C(\{12\}, m_0, [0, T])\geq 0, which is the cost incurred when both decision-makers pool their efforts (it includes information exchange costs, coalition creation costs, etc.). While this cost is often neglected in the literature, it may be important in many setups. Thus, a necessary condition for possible cooperation between the decision-makers is
V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T]) > V(\{1\}, m_0, [0, T])+V(\{2\}, m_0, [0, T]). |
Then, the next question is: what will be their payoffs if they cooperate? To answer this question, we need to know how to share the outcome of the cooperation. It is clear that allocating the equal share \frac{1}{2}[V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T])] to each decision-maker is not necessarily appropriate, since it can be less than \min_{i}(V(\{i\}, m_0, [0, T])). Thus, the allocation has to be done in a more clever way.
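A toy numeric instance (all values assumed purely for illustration) makes the point: cooperation can be collectively worthwhile while the equal split is individually unacceptable.

```python
# Toy numeric instance; all values are assumptions for illustration only.
V1, V2 = 10.0, 2.0        # standalone values V({1}), V({2})
V12, C12 = 16.0, 1.0      # joint value V({12}) and coalition-making cost

# Necessary condition for cooperation: net joint value exceeds the sum
# of standalone values (15 > 12 here).
print("cooperation worthwhile?", V12 - C12 > V1 + V2)

# The equal split of the net joint value is still unacceptable to DM 1,
# since 7.5 < V({1}) = 10.
equal_share = 0.5 * (V12 - C12)
print("equal share", equal_share, "acceptable to DM 1?", equal_share >= V1)
```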
Cooperative game-theoretic solutions such as the bargaining solution, the core, the Shapley value, and the nucleolus deal with such problems. When stochasticity and time dependence are involved, these solution concepts require careful adaptation. In addition, if the payoff function and the state dynamics are of mean-field type, the corresponding optimality equations need to be established. In terms of equilibrium payoffs, a mean-field-type Nash equilibrium could play the role of a benchmark in a cooperative game, i.e., it gives what decision-makers could secure for themselves if there is no agreement, namely V(\{i\}, m_0, [0, T]).
Let R_{c, i} be the total before-side-payment cooperative payoff of decision-maker i, i.e.,
\mathbb{E} [g_i(s_c(T), m_c(T))+\int_0^T r_{i}(t, s_c(t), m_c(t), a_c(t))dt] |
where s_c is the optimal state under the cooperative (joint) decision-making scenario. One has \sum_{j\in \mathcal{J}} R_{c, j} = V(\mathcal{J}, m_0, [0, T]). However, one needs to find a suitable way to share the payoff V(\mathcal{J}, m_0, [0, T])-C(\mathcal{J}, m_0, [0, T]). This leads to the notion of an imputation, i.e., a vector profile (\gamma_j)_{j\in \mathcal{J}} such that \sum_{j\in \mathcal{J}'}\gamma_j\geq V(\mathcal{J}', m_0, [0, T])-C(\mathcal{J}', m_0, [0, T]) for any \mathcal{J}'\subset \mathcal{J}, and \sum_{j\in \mathcal{J}}\gamma_j = V(\mathcal{J}, m_0, [0, T])-C(\mathcal{J}, m_0, [0, T]).
By virtue of the mean-field-type joint optimization, the sum of individual payoffs under cooperation is greater than or equal to its noncooperative counterpart, i.e.,
\sum\limits_{j\in \mathcal{J}} R_{c, j} \geq \sum\limits_{j\in \mathcal{J}} V(\{j\}, m_0, [0, T]). |
Thus, the dividend of cooperation (without the coalition-making cost) is
DC = \sum\limits_{j\in \mathcal{J}} R_{c, j} - \sum\limits_{j\in \mathcal{J}} V(\{j\}, m_0, [0, T])\geq 0. |
Hence, the dividend of cooperation net of the coalition-making cost, to be distributed among the decision-makers, is DC-C(\mathcal{J}, m_0, [0, T]).
As a first consequence, it is clear that if the coalition-making cost is too high (compared to the coalition value of the game), then there is no reason for the decision-makers to form a coalition.
Therefore, for cooperation purposes, we require the positivity of DC-C(\mathcal{J}, m_0, [0, T]). Using a cooperative game approach yields individual payoffs for the whole interval [0, T]. The selected imputation has, by definition, the property that each decision-maker's payoff in the cooperative game is greater than or equal to what she would get in a noncooperative game played on the same time interval.
Let \gamma_i(t) = \gamma(\{i\}, \mathcal{J}, m_0, [t, T]) be the cooperative payoff-to-go after side payment for decision-maker i at position [t, T], 0 < t < T, of the game. This is the amount of individual payoff that decision-maker i will actually receive. One way of sharing the payoff is to use a dynamical Shapley value. The payoff allocated to decision-maker i under the Shapley value is
\begin{eqnarray}\gamma_i& = &\sum\limits_{\mathcal{J}'\subset \mathcal{J}, i\notin \mathcal{J}'} \frac{|\mathcal{J}'|!(|\mathcal{J}|-|\mathcal{J}'|-1)!}{|\mathcal{J}|!} [(V-C)(\mathcal{J}' \cup \{i\}, [t, T])-(V-C)(\mathcal{J}', [t, T])].\end{eqnarray} | (39) |
In the two-decision-maker case, the payoffs of the decision-makers are
\begin{equation} \begin{array}{ll}\gamma_1 =\\ V(\{1\}, m_0, [0, T])+ \frac{1}{2}\left[V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T])- V(\{1\}, m_0, [0, T])- V(\{2\}, m_0, [0, T]) \right], \nonumber\\ \gamma_2 = \\ V(\{2\}, m_0, [0, T])+ \frac{1}{2}\left[V(\{12\}, m_0, [0, T])-C(\{12\}, m_0, [0, T])- V(\{1\}, m_0, [0, T])- V(\{2\}, m_0, [0, T]) \right]. \nonumber \end{array} \end{equation} |
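As an illustration of the allocation rule (39), here is a minimal Python sketch computing the Shapley value of the net characteristic function V - C for the two-decision-maker toy instance introduced above; all numerical values are assumptions. Its output reproduces the two-player formulas displayed above.

```python
from itertools import combinations
from math import factorial

# Sketch of the allocation (39): the Shapley value of the net characteristic
# function V - C. The numbers reuse the assumed toy instance above.
players = (1, 2)
V = {(): 0.0, (1,): 10.0, (2,): 2.0, (1, 2): 16.0}
C = {(): 0.0, (1,): 0.0, (2,): 0.0, (1, 2): 1.0}
net = {coal: V[coal] - C[coal] for coal in V}

def shapley(i):
    n = len(players)
    others = [j for j in players if j != i]
    value = 0.0
    for k in range(len(others) + 1):
        for coal in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            value += weight * (net[tuple(sorted(coal + (i,)))] - net[coal])
    return value

gamma = {i: shapley(i) for i in players}
print(gamma)                 # {1: 11.5, 2: 3.5}, matching the two-player formulas
print(sum(gamma.values()))   # 15.0 = V({12}) - C({12})  (efficiency)
```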
Definition 4. A cooperative solution is time consistent if, at any position t \in [0, T], the cooperative payoff-to-go satisfies \gamma(\{i\}, m(t), [t, T])\geq V(\{i\}, m(t), [t, T]), where the deviating payoff V(\{i\}, m(t), [t, T]) is computed along the cooperative state trajectory s_c(t).
This notion of time consistency and its implementation in cooperative differential games was introduced in [6]. A stronger notion of time consistency requires that the cooperative payoff-to-go dominates the noncooperative payoff-to-go for any state s(t), \ t\in [0, T]. This is called a sub-game consistent solution.
Although the dynamical Shapley value is one of the most commonly used allocation principles, when the decision-makers are asymmetric in their powers and in the sizes of their payoffs, an equal imputation of the cooperative gains may not be fully agreeable to the asymmetric decision-makers. To overcome this, one can adopt the allocation principle in which the decision-makers' shares of the gain from cooperation are proportional to the relative sizes of their expected deviating payoffs. Thus, a proportional time-consistent solution is given by
\begin{eqnarray} \gamma_i(t) =\frac{V(\{i\}, [t, T])}{\sum\limits_{j}V(\{j\}, [t, T])} [V(\mathcal{J}, [t, T])-C(\mathcal{J}, [t, T])]. \end{eqnarray} | (40) |
If these quantities are positive, one gets \gamma(\{i\}, \mathcal{I}, [t, T])\geq V(\{i\}, m(t), [t, T]), \ \forall t (individual rationality), and \sum_{i\in\mathcal{I}} \gamma(\{i\}, \mathcal{I}, [t, T]) = V(\mathcal{I}, m(t), [t, T])-C(\mathcal{I}, [t, T]) (efficiency at any time).
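For comparison, a short sketch of the proportional rule (40) on the same assumed toy values, checking efficiency and individual rationality numerically.

```python
# Sketch of the proportional time-consistent rule (40) on the same assumed
# toy values; it checks efficiency and individual rationality numerically.
V1, V2, V12, C12 = 10.0, 2.0, 16.0, 1.0
standalone = {1: V1, 2: V2}
total = sum(standalone.values())
gamma = {i: (v / total) * (V12 - C12) for i, v in standalone.items()}
print(gamma)                                                # {1: 12.5, 2: 2.5}
print(round(sum(gamma.values()), 10))                       # 15.0 (efficiency)
print(all(gamma[i] >= standalone[i] for i in standalone))   # individual rationality
```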
For dynamic games, an additional and stringent condition on the solutions is required: The specific optimality principle must remain optimal at any instant of time throughout the game duration along the optimal state trajectory. This condition is known as dynamic stability or time consistency.
In the context of mean-field type games, the notion of time consistency is crucial since the initial distribution of states and starting time influences naturally the Kolmogorov forward equation. A cooperative solution is sub-game consistent if an extension of the cooperative strategy to a situation with a later starting time and to any possible state brought about by the prior optimal behavior of the decision-makers remains optimal. Sub-game consistent is a stronger notion of time consistency. In the presence of stochastic elements, sub-game consistency is required in a credible cooperative solution. In the field of cooperative mean-field type games, little research has been published to date on sub-game consistent solutions.
If the set \mathcal{A} is non-convex but is a general separable complete metric space (Polish space), Pontryagin's approach suggests the following perturbation method, called spike variation. The approach is well adapted to sub-game perfection in games. Fix (t, s)\in [0, T]\times \mathcal{S} and define the control law a_{\epsilon} as the spike variation of \hat{a} over the set [t, t+\epsilon], \ \epsilon>0, i.e.,
a_{\epsilon}(t') =a(t')\, \mathbb{1}_{[t, t+\epsilon]}(t') + \hat{a}(t')\, \mathbb{1}_{[0, T]\backslash [t, t+\epsilon]}(t'), |
where a is an arbitrary admissible control and \mathbb{1}_{[t, t+\epsilon]} is the indicator function of the set [t, t+\epsilon].
Definition 5. Let R_J be the objective to be maximized in a. We say that \hat{a} is a sub-game perfect cooperative strategy under spike variation if, for any t_0, s_0, a,
\lim\limits_{\epsilon \rightarrow 0}\ \frac{1}{\epsilon}\left[R_J([t_0, T], s_0, \hat{a})- R_J([t_0, T], s_0, a_{\epsilon})\ \right] \geq 0. |
Note that a sub-game perfect cooperative strategy under spike variation is in particular a time consistent solution.
The key difference here is that the solution we are looking for should not depend on the initial data (when and where we started).
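The spike variation itself is straightforward to construct numerically. The following sketch builds a_{\epsilon} on a discretized horizon and evaluates the difference quotient of Definition 5 for a simple deterministic control problem; the dynamics, the payoffs, and the candidate strategy \hat{a} are illustrative assumptions, and noise and mean-field terms are dropped for brevity.

```python
import numpy as np

# Sketch of the spike variation a_eps of Definition 5 on a discretized horizon,
# for a simple deterministic control problem (noise and mean-field terms are
# dropped; dynamics, payoffs and the candidate strategy are assumptions).

T, N = 1.0, 1000
dt = T / N
times = np.linspace(0.0, T, N, endpoint=False)

def payoff(a, s0=1.0):
    """R_J = g(s(T)) + int_0^T r dt with ds = (-s + a) dt,
    r = -(s^2 + a^2)/2 and g(s) = -s^2 (all assumed)."""
    s, total = s0, 0.0
    for ak in a:
        total += -0.5 * (s**2 + ak**2) * dt
        s += (-s + ak) * dt
    return total - s**2

a_hat = -0.4 * np.exp(-times)      # candidate cooperative strategy (assumed)
a_alt = np.full(N, 1.0)            # an arbitrary admissible control

for eps in (0.2, 0.1, 0.05, 0.01):
    spike = (times >= 0.3) & (times < 0.3 + eps)
    a_eps = np.where(spike, a_alt, a_hat)      # spike variation on [t, t + eps]
    quotient = (payoff(a_hat) - payoff(a_eps)) / eps
    print(f"eps = {eps:5.2f}   (R_J(a_hat) - R_J(a_eps)) / eps = {quotient:+.4f}")
```

The sign of the limiting quotient indicates whether the candidate \hat{a} passes the test of Definition 5 against the chosen alternative control.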
Let H^{t_0, s_0} be the Pontryagin function associated with the random variable s that starts from s_0 at time t_0\in [0, T]: H^{t_0, s_0}(t, s, m, a, p, q) = b\, p^{t_0, s_0}+\sigma\, q^{t_0, s_0} +r^{t_0, s_0}_J, where the notation r^{t_0, s_0} means that r is evaluated with the aggregate term m = m^{t_0, s_0, a}. The first-order adjoint process (p^{t_0, s_0}, q^{t_0, s_0}) under the optimal cooperative control law becomes
\begin{equation}\begin{array}{ll} dp^{t_0, s_0} =-[H^{t_0, s_0}_s+\mathbb{E} \partial_s H^{t_0, s_0}_{m}] dt +q^{t_0, s_0} dB(t), \ t>t_0, \\ p^{t_0, s_0}(T) = g^{t_0, s_0}_s(T)+\mathbb{E} \partial_s g^{t_0, s_0}_{m}(T). \end{array} \end{equation} | (41) |
The second order adjoint equation (P^{t_0, s_0}, Q^{t_0, s_0}) is defined following similar steps as in (14) but with conditioning on t_0, s_0.
Proposition 11. Let the assumptions of Lemma 5 hold. If (\hat{x}, \hat{a}) is an optimal solution of the cooperative mean-field-type game, then there exist two pairs of processes (p, q), (P, Q) that satisfy the first-order and second-order adjoint equations, such that
H^{t, \hat{x}(t)}(t, \hat{x}, \hat{m}, \hat{a}, p^{t, \hat{x}(t)}, q^{t, \hat{x}(t)})-H^{t, \hat{x}(t)}(t, \hat{x}, \hat{m}, a, p^{t, \hat{x}(t)}, q^{t, \hat{x}(t)}) |
+\frac{1}{2}P^{t, \hat{x}(t)}(t)\left( \sigma(t, \hat{x}, \hat{m}, \hat{a})- \sigma(t, \hat{x}, \hat{m}, a)\right)^2 \geq 0, |
for all a(.)\in \mathcal{A}, almost every t and \mathbb{P}-almost surely.
In this article, basic results on mean-field-type games were presented. Most of the results presented here extend to the multi-dimensional state case (vector or matrix), and the state-of-the-art stochastic maximum principle can be carried out for random coefficient functions, common noise, time delays, and backward-forward stochastic integro-differential equations. The state-of-the-art dynamic programming principle works in infinite dimensions, on the space of measures. By introducing dual functions, which are weak (Gâteaux) derivatives, relationships between the stochastic maximum principle and dynamic programming were established. The methodology was shown to be flexible enough to handle partial state observation and imperfect state measurement in the non-degenerate case using a Girsanov transform. Wiener chaos expansions of the underlying processes were proposed to solve the game problems, and truncation error bounds were derived using the Kosambi-Karhunen-Loeve approach. This allows one to solve the mean-field-type problem much faster than standard methods such as multi-level Monte Carlo sampling or stochastic collocation. The choice of a basis is crucial, as the number of elements influences the curse of dimensionality of the problem. Sparse representations and non-intrusive proper generalized decompositions of the processes may be needed in order to significantly reduce the complexity.
This research work is supported by U.S. Air Force Office of Scientific Research under grant number FA9550-17-1-0259. The author is grateful to Prof. Boualem Djehiche for useful comments on the Wiener Chaos Expansions in mean-field-type games. The author is grateful to the Editor and the reviewers' valuable comments that improved the manuscript.
The author declares that there is no conflict of interest regarding the publication of this paper.
[1] | L. S. Shapley, Stochastic games, PNAS, P. Natl Acad. Sci. USA, 39 (1953), 1095-1100. |
[2] | B. Jovanovic, R. W. Rosenthal, Anonymous sequential games, J. Math. Econ. Elsevier, 17 (1988), 77-87. |
[3] | J. M. Lasry, P. L. Lions, Mean field games, Jpn. J. Math. 2 (2007), 229-260. |
[4] | A. Bensoussan, B. Djehiche, H. Tembine, et al. Risk-sensitive mean-field-type control, IEEE CDC, 2017. |
[5] | B. Djehiche, S. A. Tcheukam and H. Tembine, A Mean-Field Game of Evacuation in Multi-Level Building, IEEE T. Automat. Contr. 62 (2017), 5154-5169. |
[6] | L. A. Petrosjan, Stability of the solutions in differential games with several players, Vestnik Leningrad. univ. 19 (1977), 46-52. |
[7] | B. Jourdain, S. Méléard and W. Woyczynski, Nonlinear SDEs driven by Lévy processes and related PDEs, Alea, 4 (2008), 1-29. |
[8] | R. Buckdahn, B. Djehiche, J. Li, et al. Mean-field backward stochastic differential equations: a limit approach, Ann. Probab. 37 (2009), 1524-1565. |
[9] | R. Buckdahn, J. Li and S. Peng, Mean-field backward stochastic differential equations and related partial differential equations, Stoch. Proc. Appl. 119 (2009), 3133-3154. |
[10] | D. Andersson, B. Djehiche, A maximum principle for SDEs of Mean-Field Type, Appl. Math. Opt. 63 (2011), 341-356. |
[11] | R. Buckdahn, B. Djehiche and J. Li, A General Stochastic Maximum Principle for SDEs of Mean-Field Type, Appl. Math. Opt. 64 (2011), 197-216. |
[12] | J. Hosking, A stochastic maximum principle for a stochastic differential game of a mean-field type, Appl. Math. Opt. 66 (2012), 415-454. |
[13] | J. Li, Stochastic maximum principle in the mean-field controls, Automatica, 48 (2012), 366-373. |
[14] | M. Hafayed, S. Abbas, A. Abba, On mean-field partial information maximum principle of optimal control for stochastic systems with Lévy processes, J. Optimiz. Theory App. 167 (2015), 1051-1069. |
[15] | R. Elliott, X. Li and Y. H. Ni, Discrete time mean-field stochastic linear-quadratic optimal control problems, Automatica, 49 (2013), 3222-3233. |
[16] | M. Hafayed, A mean-field maximum principle for optimal control of forward-backward stochastic differential equations with Poisson jump processes, International Journal of Dynamics and Control, 1 (2013), 300-315. |
[17] | A. Bensoussan, J. Frehse and S. C. P. Yam, Mean-field games and mean-field-type control theory, SpringerBriefs in mathematics, Springer, 2013. |
[18] | B. Djehiche, H. Tembine and R. Tempone, A Stochastic Maximum Principle for Risk-Sensitive Mean-Field-Type Control, IEEE T. Automat. Contr. 60 (2015), 2640-2649. |
[19] | H. Tembine, Risk-sensitive mean-field-type games with Lp-norm drifts, Automatica, 59 (2015), 224-237. |
[20] | B. Djehiche, H. Tembine, Risk-Sensitive Mean-Field-Type Control under Partial Observation, In Stochastics in environmental and financial economics, Springer Proceedings in Mathematics and Statistics, 2016. |
[21] | K. Law, H. Tembine and R. Tempone, Deterministic Mean-Field Ensemble Kalman Filtering, SIAM J. Sci. Comput. 38 (2016), 1251-1279. |
[22] | J. Robinson, Applied Analysis lecture notes, University of Warwick, 2006. |
[23] | R. H. Cameron, W. T. Martin, The orthogonal development of non-linear functionals in series of Fourier-Hermite functionals, Ann. Math. 48 (1947), 385-392. |
[24] | N. Wiener, The homogeneous chaos, Am. J. Math. 60 (1938), 897-936. |
[25] | N. Wiener, Nonlinear Problems in Random Theory, Technology Press of the Massachusetts Institute of Technology/Wiley, New York, 1958. |
[26] | F. Augustin, A. Gilg, M. Paffrath, et al. Polynomial chaos for the approximation of uncertainties: Chances and limits, Eur. J. Appl. Math. 19 (2008), 149-190. |
[27] | G. Wang, C. Zhang and W. Zhang, Stochastic maximum principle for mean-field type optimal control under partial information, IEEE T. Automat. Contr. 59 (2014), 522-528. |
[28] | Z. Wu, A maximum principle for partially observed optimal control of forward-backward stochastic control systems, Science China Information Sciences, 53 (2010), 2205-2214. |
[29] | J. Yong, Linear-quadratic optimal control problems for mean-field stochastic differential equations, SIAM J. Control Optim. 51 (2013), 2809-2838. |
[30] | H. Tembine, Uncertainty quantification in mean-field-type teams and games, In Proceedings of 54th IEEE Conference on Decision and Control (CDC), 2015, 4418-4423. |
[31] | S. Tang, The maximum principle for partially observed optimal control of stochastic differential equations, SIAM J. Control Optim. 36 (1998), 1596-1617. |
[32] | H. Ma, B. Liu, Linear Quadratic Optimal Control Problem for Partially Observed Forward Backward Stochastic Differential Equations of Mean-Field Type, Asian J. Control, 2017. |
[33] | G. Wang, Z. Wu and J. Xiong, A linear-quadratic optimal control problem of forward-backward stochastic differential equations with partial information, IEEE T. Automat. Contr. 60 (2015), 2904-2916. |
[34] | Q. Meng, Y. Shen, Optimal control of mean-field jump-diffusion systems with delay: A stochastic maximum principle approach, J. Comput. Appl. Math. 279 (2015), 13-30. |
[35] | T. Meyer-Brandis, B. Oksendal and X. Y. Zhou, A mean-field stochastic maximum principle via Malliavin calculus, Stochastics, 84 (2012), 643-666. |
[36] | Y. Shen, Q. Meng and P. Shi, Maximum principle for mean-field jump-diffusion stochastic delay differential equations and its application to finance, Automatica, 50 (2014), 1565-1579. |
[37] | Y. Shen, T. K. Siu, The maximum principle for a jump-diffusion mean-field model and its application to the mean-variance problem, Nonlinear Anal-Theor, 86 (2013), 58-73. |
[38] | G. Wang, Z. Wu, The maximum principles for stochastic recursive optimal control problems under partial information, IEEE T. Automat. Contr. 54 (2009), 1230-1242. |
[39] | G. Wang, Z. Wu and J. Xiong, Maximum principles for forward-backward stochastic control systems with correlated state and observation noises, SIAM J. Control Optim. 51 (2013), 491-524. |
[40] | B. Djehiche, A. Tcheukam and H. Tembine, Mean-Field-Type Games in Engineering, AIMS Electronics and Electrical Engineering, 1 (2017), 18-73. |
[41] | J. Gao, H. Tembine, Distributed Mean-Field-Type Filters for Big Data Assimilation, IEEE International Conference on Data Science Systems (DSS), Sydney, Australia, 12-14 Dec. 2016. |
[42] | A. K. Cisse, H. Tembine, Cooperative Mean-Field Type Games, 19th World Congress of the International Federation of Automatic Control (IFAC), Cape Town, South Africa, 24-29 August 2014, 8995-9000. |
[43] | H. Tembine, Tutorial on Mean-Field-Type Games, 19th World Congress of the International Federation of Automatic Control (IFAC), Cape Town, South Africa, 24-29 August 2014. |
[44] | G. Rossi, A. S. Tcheukam and H. Tembine, How Much Does Users' Psychology Matter in Engineering Mean-Field-Type Games, Workshop on Game Theory and Experimental Methods, June 6-7, Second University of Naples, Italy, 2016. |
[45] | A. S. Tcheukam, H. Tembine, On the Distributed Mean-Variance Paradigm, 13th International Multi-Conference on Systems, Signals & Devices, Conference on Systems, Automation & Control, March 21-24, 2016, Leipzig, Germany, 604-609. |
[46] | T. E. Duncan, H. Tembine, Linear-quadratic mean-field-type games: A direct method, Preprint, 2017. |
[47] | T. E. Duncan, H. Tembine, Linear-quadratic mean-field-type games with common noise: A direct method, Preprint, 2017. |
[48] | T. E. Duncan, H. Tembine, Other-regarding payoffs in linear-quadratic mean-field-type games with common noise: A direct method, Preprint, 2017. |
[49] | T. E. Duncan, H. Tembine, Nash bargaining solution in linear-quadratic mean-field-type games: A direct method, Preprint, 2017. |
[50] | H. Tembine, B. Djehiche, P. Yam, et al. Mean-field-type games with jumps and regime switching, Preprint, 2017. |
\mathcal{I} | \triangleq | set of decision-makers |
T | \triangleq | length of the horizon |
[0, T] | \triangleq | horizon of the mean-field-type game |
t | \triangleq | time index |
\mathcal{S} | \triangleq | state space |
s(t) | \triangleq | state at time t |
\Delta(\mathcal{S}) | \triangleq | set of probability measures on \mathcal{S} |
m(t, .) | \triangleq | probability measure of the state at time t |
A_i | \triangleq | control action set of decision-maker i\in \mathcal{I} |
a_i(.) | \triangleq | strategy of decision-maker i\in \mathcal{I} |
a=(a_i)_{i\in \mathcal{I}} | \triangleq | strategy profile |
b(t, s, m, a) | \triangleq | drift coefficient function |
\sigma(t, s, m, a) | \triangleq | diffusion coefficient function |
r_i(t, s, m, a) | \triangleq | instant payoff of decision-maker i\in \mathcal{I} |
g_i(s, m) | \triangleq | terminal payoff of decision-maker i\in \mathcal{I} |
\mathcal{R}_{i, T}(m_0, a) | \triangleq | cumulative payoff of i |
V_i(t, m) | \triangleq | equilibrium payoff of i\in \mathcal{I} |
(p_i, q_i) | \triangleq | first order adjoint process of i\in \mathcal{I} |
(P_i, Q_i) | \triangleq | second order adjoint process of i\in \mathcal{I} |
v_i^*(t, s) | \triangleq | dual function of i\in \mathcal{I} |
H_i | \triangleq | r_i+bp_i+\sigma q_i of i\in \mathcal{I} |
H_{i, s} | \triangleq | r_{i, s}+b_sp_i+\sigma_s q_i |
H_{i, ss} | \triangleq | r_{i, ss}+b_{ss}p_i+\sigma_{ss} q_i |