Research article

Modeling path-dependent state transitions by a recurrent neural network

  • Received: 20 June 2022 Revised: 29 July 2022 Accepted: 09 September 2022 Published: 26 September 2022
  • Rating transition models are widely used for credit risk evaluation. It is not uncommon that a time-homogeneous Markov rating migration model will deteriorate quickly after projecting repeatedly for a few periods. This is because the time-homogeneous Markov condition is generally not satisfied. For a credit portfolio, the rating transition is usually path-dependent. In this paper, we propose a recurrent neural network (RNN) model for modeling path-dependent rating migration. An RNN is a type of artificial neural network where connections between nodes form a directed graph along a temporal sequence. There are neurons for input and output at each time period. The model is informed by the past behaviors for a loan along the path. Information learned from previous periods propagates to future periods. The experiments show that this RNN model is robust.

    Citation: Bill Huajian Yang. Modeling path-dependent state transitions by a recurrent neural network[J]. Big Data and Information Analytics, 2022, 7: 1-12. doi: 10.3934/bdia.2022001




    Rating transition models are widely used in the financial industry for credit risk evaluations, including stress testing and IFRS 9 expected credit loss evaluation [1,2,3,4], under the assumption that a rating transition is a time-homogeneous Markov process, depending only on the current rating and covariates. However, it is not uncommon that a Markov model deteriorates quickly after projecting for a few periods. This is because the Markov condition is generally not satisfied. A rating transition for a credit portfolio is generally path dependent. A test for this assumption is required [5,6] for the use of these Markov models.

There are various methods for path-dependent credit risk modeling [7], including regime-switching models [8] and conditional methods [9,10]. The latter are comparable to a cohort analysis.

In this paper, we propose a recurrent neural network (RNN) model for modeling path-dependent rating transition. An RNN is a type of artificial neural network where connections between nodes form a directed graph along a temporal sequence. There are neurons for input and output at each time period. The RNN is informed by past behaviors along the path. Information learned from previous periods propagates to future periods [11,12,13,14,15,16,17].

    The network structure for the proposed RNN model is described in Section 2 by using (2.1)–(2.4). This RNN model was implemented in Python. The experiments show that this RNN model is robust, compared to Markov transition models. Applications of this RNN model include the following, wherever path dependence is relevant:

    (a) Path-dependent asset evaluation or credit risk evaluation

    (b) Decisioning for account management

(c) Forecasting loss for stress testing and expected credit loss for IFRS 9 projects

    (d) Estimating conditional probability of default for survival analysis

The paper is organized as follows. In Section 2, we set up the proposed RNN model. In Section 3, we calculate the partial derivatives of the network cost function. In Section 4, we present the experimental results for the proposed RNN model, benchmarked against the time-homogeneous and time-inhomogeneous rating transition models.

In this section, we describe the proposed RNN model, as given in (2.1)–(2.4), for modeling path-dependent rating transition. Given an observation horizon with $T$ periods:

$$0 = t_0 < t_1 < t_2 < \cdots < t_T,$$

our goal is to estimate, for each $i \ge 0$, the probability of transitioning to a rating at time $t_{i+1}$, given the rating and covariates at time $t_i$.

Traditional rating transition models assume the Markov condition, i.e., that the transition probability depends only on the current rating and covariates. Markov transition models include the following types:

    (a) Time-homogeneous Markov rating transition, as represented by one single transition model for all periods

    (b) Time-inhomogeneous Markov rating transition, as represented by one transition model for each period

    It is not uncommon that a time-homogeneous Markov model will deteriorate quickly after projecting only for a few periods. This is because the Markov condition is generally not satisfied. A rating transition for a credit portfolio is generally path dependent.

    An RNN is an artificial neural network where connections between nodes form a directed graph along a temporal sequence. The chart below depicts the structure of an RNN. At the ith time period of the temporal sequence, the input neurons, the hidden neurons and the output neurons are respectively labeled x(i), h(i) and y(i):

    Figure 1.  An RNN for rating transition.

An RNN shares the advantages of common neural networks; in particular, information learned at one point is propagated backward and forward to all periods. It is path dependent.

Let $\{R_i\}_{i=1}^{n}$ denote the $n$ ratings for a credit portfolio. For a loan portfolio, we reserve $R_{n-1}$ as the withdrawal rating and $R_n$ as the default rating. Both the default and withdrawal ratings are assumed to be absorbed states, meaning that a loan rated by a default or withdrawal rating is excluded from the sample for subsequent observations. Rating labels are observable at the beginning and the end of a period.

Let $r^{(i)}_j$ denote the indicator with a value of 1 if the rating for a loan at the end of the $i$th period is $R_j$ and 0 otherwise. Let $n_p$ denote the number of non-absorbed ratings. An input at the $i$th period is denoted as

$$x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_m),$$

where the first $(m - n_p)$ input components $(x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{m-n_p})$ denote the covariates observable at the beginning of the $i$th period, and the remaining $n_p$ components are the rating indicators for non-absorbed ratings observed at the end of the $(i-1)$th period:

$$x^{(i)}_j = r^{(i-1)}_j, \quad m - n_p + 1 \le j \le m.$$

That is, the non-absorbed rating observed at the end of the $(i-1)$th period is used as the input for the next period. The output at the $i$th period is denoted as

$$y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_n),$$

where

$$y^{(i)}_j = r^{(i)}_j, \quad 1 \le j \le n.$$
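In code, the input construction above can be sketched as follows (a minimal illustration; the helper name, the covariate values, and the array layout are assumptions, not from the paper's implementation):

```python
import numpy as np

def encode_input(covariates, prev_rating, n_nonabsorbed):
    # x^(i): the first m - n_p components are covariates observed at the
    # beginning of the period; the last n_p components are the one-hot
    # indicators r^(i-1)_j of the non-absorbed rating held at the end of
    # the previous period.
    indicators = np.zeros(n_nonabsorbed)
    indicators[prev_rating] = 1.0
    return np.concatenate([covariates, indicators])

# A loan with three covariates that ended the previous period in the rating
# with index 2, out of five non-absorbed ratings:
x = encode_input(np.array([1.2, 0.8, 1.5]), prev_rating=2, n_nonabsorbed=5)
# x has m = 3 + 5 = 8 components, with x[5] = 1 and the rest of the
# indicator block zero
```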

    The structure for this RNN is described as shown in (2.1)–(2.4) below. Initially, for the first period, we have

(a) Input: $x^{(1)} = (x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_m)$;

(b) Output: $y^{(1)} = (y^{(1)}_1, y^{(1)}_2, \ldots, y^{(1)}_n)$, i.e., a unit vector where all components are zero except for one, which has a value of 1; it is a random realization generated by the multinomial probability $p_1 = (p_{11}, p_{12}, \ldots, p_{1n})$, where

$$p_{1j} = \frac{\exp(v^{(1)}_j)}{\exp(v^{(1)}_1) + \exp(v^{(1)}_2) + \cdots + \exp(v^{(1)}_n)}, \tag{2.1}$$

and

$$v^{(1)}_j = a^{(1)}_{j1} x^{(1)}_1 + a^{(1)}_{j2} x^{(1)}_2 + \cdots + a^{(1)}_{jm} x^{(1)}_m. \tag{2.2}$$

The vector $(v^{(1)}_1, v^{(1)}_2, \ldots, v^{(1)}_n)$ in (2.1) and (2.2) represents the information learned during the 1st period, which is stored in the hidden neurons $h^{(1)} = (h^{(1)}_1, h^{(1)}_2, \ldots, h^{(1)}_n)$ of the 1st period.

In general, given the vector $(v^{(i-1)}_1, v^{(i-1)}_2, \ldots, v^{(i-1)}_n)$ for the $(i-1)$th period ($i \ge 2$), we have the following at the $i$th period:

(c) Input: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_m)$;

(d) Output: $y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_n)$, i.e., a unit vector, which is a random realization generated by the multinomial probability $p_i = (p_{i1}, p_{i2}, \ldots, p_{in})$, where

$$p_{ij} = \frac{\exp(v^{(i)}_j)}{\exp(v^{(i)}_1) + \exp(v^{(i)}_2) + \cdots + \exp(v^{(i)}_n)}, \tag{2.3}$$

and

$$v^{(i)}_j = a^{(i)}_{j1} x^{(i)}_1 + a^{(i)}_{j2} x^{(i)}_2 + \cdots + a^{(i)}_{jm} x^{(i)}_m + b^{(i)}_{j1} v^{(i-1)}_1 + b^{(i)}_{j2} v^{(i-1)}_2 + \cdots + b^{(i)}_{jn} v^{(i-1)}_n. \tag{2.4}$$

As observed, $v^{(i)}_j$ consists of two parts: one from the current input, and the other from history, i.e., $(v^{(i-1)}_1, v^{(i-1)}_2, \ldots, v^{(i-1)}_n)$, corresponding to the information learned up to the $(i-1)$th period. Similarly, the vector $(v^{(i)}_1, v^{(i)}_2, \ldots, v^{(i)}_n)$ represents the information learned up to the $i$th period, which is stored in the hidden neurons $h^{(i)} = (h^{(i)}_1, h^{(i)}_2, \ldots, h^{(i)}_n)$.
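A minimal forward pass implementing (2.1)–(2.4) might look like the following (an illustrative sketch; the variable names and the list-of-matrices layout are assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def forward(xs, A, B):
    # xs: inputs x^(1..T), each of length m
    # A:  per-period (n x m) matrices of weights a^(i)_jk, eqs. (2.2)/(2.4)
    # B:  per-period (n x n) matrices of weights b^(i)_jk, eq. (2.4); B[0] unused
    vs, ps = [], []
    v_prev = None
    for i, x in enumerate(xs):
        v = A[i] @ x                   # current-input part, eq. (2.2)
        if v_prev is not None:
            v = v + B[i] @ v_prev      # history part, eq. (2.4)
        vs.append(v)
        ps.append(softmax(v))          # multinomial probability, eqs. (2.1)/(2.3)
        v_prev = v
    return vs, ps

rng = np.random.default_rng(0)
n, m, T = 4, 6, 3
A = [rng.normal(size=(n, m)) for _ in range(T)]
B = [None] + [rng.normal(size=(n, n)) for _ in range(T - 1)]
xs = [rng.normal(size=m) for _ in range(T)]
vs, ps = forward(xs, A, B)
# each ps[i] is a valid multinomial probability vector summing to 1
```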

This RNN model works at the $i$th stage as follows:

1) Collect the input $x^{(i)}$ and the information $h^{(i-1)}$ learned up to the end of the $(i-1)$th period.

2) Learn from $x^{(i)}$ and $h^{(i-1)}$, and store the learned information in the hidden neurons $h^{(i)}$.

3) Derive the multinomial probability $(p_{i1}, p_{i2}, \ldots, p_{in})$, where $p_{ij}$ is the probability that the loan will transition to the $j$th rating.

4) Output $y^{(i)}$, a random multinomial realization given the multinomial probability distribution $(p_{i1}, p_{i2}, \ldots, p_{in})$.

    Remark 2.1. The formulation of (2.4) does not come with a bias (i.e., intercept). An intercept can be inserted by adding a covariate with a constant value of 1, whenever necessary.

Let $y^{(k)} = (y^{(k)}_1, y^{(k)}_2, \ldots, y^{(k)}_n)$ be the observed outcome at the $k$th period. The cost function at the $k$th period, denoted by $L_k$, is given as (for one single data point):

$$L_k = -\sum_{j=1}^{n} y^{(k)}_j \log(p_{kj}). \tag{3.1}$$

This is the negative log-likelihood for observing the multinomial outcome $(y^{(k)}_1, y^{(k)}_2, \ldots, y^{(k)}_n)$. The total cost function to be minimized for this recurrent neural network is

$$L = L_1 + L_2 + \cdots + L_T, \tag{3.2}$$

summed over the entire training sample.
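As code, the cost of (3.1)–(3.2) for a single path is the cross-entropy summed over periods (an illustrative sketch; the function name and data layout are assumptions):

```python
import numpy as np

def path_cost(ps, ys):
    # L = L_1 + ... + L_T, with L_k = -sum_j y^(k)_j log(p_kj):
    # the negative log-likelihood of the observed one-hot outcomes ys
    # under the predicted multinomial probabilities ps.
    return -sum(float(y @ np.log(p)) for p, y in zip(ps, ys))

# One period, two ratings, predicted 50/50, observed rating 1:
c = path_cost([np.array([0.5, 0.5])], [np.array([1.0, 0.0])])
# c equals -log(0.5) = log 2
```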

    Training a neural network involves a series of gradient descent searches. Evaluation of partial derivatives is essential. In this sub-section, we calculate the partial derivatives for the network cost function with respect to network weights.

Let $d^{(k,r)}_i$ denote the partial derivative of $L_k$ with respect to $v^{(k-r)}_i$, $0 \le r \le k-1$. By (2.1) and (2.3), we have the partial derivative $\partial p_{ki} / \partial v^{(k)}_j$ as:

$$\frac{\partial p_{ki}}{\partial v^{(k)}_j} = \begin{cases} p_{ki}(1 - p_{ki}), & i = j, \\ -p_{ki}\, p_{kj}, & i \ne j. \end{cases} \tag{3.3}$$

Hence, by (3.1) and (3.3), we have the following for $1 \le i \le n$:

$$d^{(k,0)}_i = \frac{\partial L_k}{\partial v^{(k)}_i} = -\sum_{j=1}^{n} y^{(k)}_j \frac{\partial \log(p_{kj})}{\partial v^{(k)}_i} = -y^{(k)}_i (1 - p_{ki}) + \sum_{j \ne i} y^{(k)}_j\, p_{ki} = -(y^{(k)}_i - p_{ki}), \tag{3.4}$$

where $\sum_{j=1}^{n} y^{(k)}_j = 1$ is used.
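The compact form of (3.4), $d^{(k,0)}_i = p_{ki} - y^{(k)}_i$, is easy to verify numerically against central finite differences (an illustrative check, not from the paper):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def Lk(v, y):
    # per-period cost, eq. (3.1)
    return -float(y @ np.log(softmax(v)))

v = np.array([0.3, -1.2, 0.8])
y = np.array([0.0, 1.0, 0.0])       # observed one-hot outcome

analytic = softmax(v) - y           # d^(k,0) = -(y - p), eq. (3.4)
numeric = np.zeros_like(v)
eps = 1e-6
for i in range(3):                  # central finite differences
    e = np.zeros(3); e[i] = eps
    numeric[i] = (Lk(v + e, y) - Lk(v - e, y)) / (2 * eps)
# analytic and numeric gradients agree to finite-difference accuracy
```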

Given $d^{(k,0)}_j$, $1 \le j \le n$, we can calculate $d^{(k,1)}_i$ from the top down for $k > 1$ and $1 \le i \le n$ by using (2.4):

$$d^{(k,1)}_i = \frac{\partial L_k}{\partial v^{(k-1)}_i} = \sum_{j=1}^{n} \left(\frac{\partial L_k}{\partial v^{(k)}_j}\right)\left(\frac{\partial v^{(k)}_j}{\partial v^{(k-1)}_i}\right) = \sum_{j=1}^{n} b^{(k)}_{ji}\, d^{(k,0)}_j.$$

Inductively, we have the following for $k > r$ and $1 \le i \le n$:

$$d^{(k,r)}_i = \frac{\partial L_k}{\partial v^{(k-r)}_i} = \sum_{j=1}^{n} \left(\frac{\partial L_k}{\partial v^{(k-r+1)}_j}\right)\left(\frac{\partial v^{(k-r+1)}_j}{\partial v^{(k-r)}_i}\right) = \sum_{j=1}^{n} b^{(k-r+1)}_{ji}\, d^{(k,r-1)}_j. \tag{3.5}$$

We will use the following fact:

$$\frac{\partial L_k}{\partial v^{(i)}_j} = 0 \quad \text{if } k < i. \tag{3.6}$$

Let $D_{(T-r)i}$ denote the partial derivative of $L$ with respect to $v^{(T-r)}_i$. Given $\{d^{(k,0)}_i \mid 1 \le i \le n,\ 1 \le k \le T\}$, we can calculate $\{D_{(T-r)i} \mid 1 \le i \le n,\ 0 \le r < T\}$ from the top down. Initially, at the top period, we have the following by (3.6):

$$D_{Ti} = \frac{\partial L}{\partial v^{(T)}_i} = \frac{\partial L_T}{\partial v^{(T)}_i} = d^{(T,0)}_i.$$

Next, backward from the top period, we have $D_{(T-1)i}$ for $T > 1$ and $1 \le i \le n$ as follows:

$$\begin{aligned} D_{(T-1)i} &= \frac{\partial L}{\partial v^{(T-1)}_i} = \frac{\partial (L_T + L_{T-1})}{\partial v^{(T-1)}_i} \\ &= \frac{\partial L_T}{\partial v^{(T-1)}_i} + \frac{\partial L_{T-1}}{\partial v^{(T-1)}_i} = \sum_{j=1}^{n} \frac{\partial L_T}{\partial v^{(T)}_j} \frac{\partial v^{(T)}_j}{\partial v^{(T-1)}_i} + d^{(T-1,0)}_i \\ &= \sum_{j=1}^{n} b^{(T)}_{ji} \frac{\partial L}{\partial v^{(T)}_j} + d^{(T-1,0)}_i \\ &= \sum_{j=1}^{n} b^{(T)}_{ji}\, D_{Tj} + d^{(T-1,0)}_i, \end{aligned}$$

where (3.6) is used for the 2nd and 5th equality signs. Inductively, we have $D_{(T-r)i}$ for $T > r$ and $1 \le i \le n$ from the top down as follows:

$$\begin{aligned} D_{(T-r)i} &= \frac{\partial L}{\partial v^{(T-r)}_i} = \frac{\partial (L_T + L_{T-1} + \cdots + L_{T-r+1})}{\partial v^{(T-r)}_i} + \frac{\partial L_{T-r}}{\partial v^{(T-r)}_i} \\ &= \sum_{j=1}^{n} \frac{\partial (L_T + L_{T-1} + \cdots + L_{T-r+1})}{\partial v^{(T-r+1)}_j} \frac{\partial v^{(T-r+1)}_j}{\partial v^{(T-r)}_i} + d^{(T-r,0)}_i \\ &= \sum_{j=1}^{n} \frac{\partial L}{\partial v^{(T-r+1)}_j} \frac{\partial v^{(T-r+1)}_j}{\partial v^{(T-r)}_i} + d^{(T-r,0)}_i \\ &= \sum_{j=1}^{n} b^{(T-r+1)}_{ji}\, D_{(T-r+1)j} + d^{(T-r,0)}_i, \end{aligned} \tag{3.7}$$

where (3.6) is used for the 2nd and 4th equality signs.
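The recursion (3.7) runs backward through time; it can be sketched as follows (an illustrative implementation; the index conventions and variable names are assumptions):

```python
import numpy as np

def backprop_D(ds, B):
    # ds: per-period gradients d^(k,0) = p_k - y^(k), one length-n vector
    #     per period k = 1..T
    # B:  recurrent weight matrices b^(r); B[0] is unused
    # Returns D[r] = dL/dv^(r+1), accumulated from the top period down
    # via (3.7): D_(r) = b^(r+1)^T D_(r+1) + d^(r,0).
    T = len(ds)
    D = [None] * T
    D[T - 1] = ds[T - 1]                    # top period: D_T = d^(T,0)
    for r in range(T - 2, -1, -1):
        D[r] = B[r + 1].T @ D[r + 1] + ds[r]
    return D

ds = [np.array([1.0, -1.0]), np.array([0.5, 0.5])]
B = [None, np.array([[1.0, 2.0], [3.0, 4.0]])]
D = backprop_D(ds, B)
# D[1] = ds[1]; D[0] = B[1]^T @ ds[1] + ds[0] = [3.0, 2.0]
```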

Given $\{D_{ri}\}$, i.e., the partial derivatives of the cost function $L$ with respect to $v^{(r)}_i$, we can now find the partial derivatives of $L$ with respect to the network weights $a^{(r)}_{ij}$ and $b^{(r)}_{ij}$ as defined in (2.2) and (2.4).

Let $\delta_{rij}$ and $\sigma_{rij}$ denote, respectively, the partial derivatives of $L$ with respect to $a^{(r)}_{ij}$ and $b^{(r)}_{ij}$. By (2.4), at the top time period $r = T$, we have

$$\delta_{Tij} = \frac{\partial L}{\partial a^{(T)}_{ij}} = \frac{\partial L}{\partial v^{(T)}_i} \frac{\partial v^{(T)}_i}{\partial a^{(T)}_{ij}} = x^{(T)}_j\, D_{Ti}, \qquad \sigma_{Tij} = \frac{\partial L}{\partial b^{(T)}_{ij}} = \frac{\partial L}{\partial v^{(T)}_i} \frac{\partial v^{(T)}_i}{\partial b^{(T)}_{ij}} = v^{(T-1)}_j\, D_{Ti}.$$

In general, we have

$$\delta_{(T-r)ij} = x^{(T-r)}_j\, D_{(T-r)i}, \tag{3.8}$$
$$\sigma_{(T-r)ij} = v^{(T-r-1)}_j\, D_{(T-r)i}. \tag{3.9}$$

    A good initialization of the network weights a(r)ij and b(r)ij speeds up the convergence for the network training. In this sub-section, we propose an algorithm for initializing the network weights.

Let $w^{(r)}_j$ denote the vector of weights in (2.4) for $v^{(r)}_j$ at the $r$th time period, i.e.,

$$w^{(r)}_j = (a^{(r)}_{j1}, a^{(r)}_{j2}, \ldots, a^{(r)}_{jm}, b^{(r)}_{j1}, b^{(r)}_{j2}, \ldots, b^{(r)}_{jn})^{\mathrm{T}} \tag{3.10A}$$

for $r > 1$, and for $r = 1$,

$$w^{(1)}_j = (a^{(1)}_{j1}, a^{(1)}_{j2}, \ldots, a^{(1)}_{jm})^{\mathrm{T}}. \tag{3.10B}$$

Then the weight matrix for the network at the $r$th time period is given by

$$w^{(r)} = (w^{(r)}_1, w^{(r)}_2, \ldots, w^{(r)}_n), \quad 1 \le r \le T. \tag{3.11}$$

    Algorithm 3.1 (Initialization). Initialize network weights a(r)ij and b(r)ij as follows, step-by-step, starting from the first time period:

(a) Find $w^{(1)}_j$, $1 \le j \le n$, by running a linear regression (or a logistic regression if more sensitivity is required for some $y^{(1)}_j$'s) against the binary target $y^{(1)}_j$ with $x^{(1)}$ as the explanatory variable. Derive $v^{(1)}_j$ by (2.2).

(b) Given $x^{(2)} = (x^{(2)}_1, x^{(2)}_2, \ldots, x^{(2)}_m)$ and $v^{(1)} = (v^{(1)}_1, v^{(1)}_2, \ldots, v^{(1)}_n)$, find $w^{(2)}_j$, $1 \le j \le n$, by running a linear (or logistic) regression against $y^{(2)}_j$ with the components of $x^{(2)}$ and $v^{(1)}$ as explanatory variables. Derive $v^{(2)}_j$ by (2.4).

(c) Repeat (b) to obtain the initial weights $w^{(r)}_j$ at the $r$th time period for $1 \le j \le n$ and $1 \le r \le T$.
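A sketch of Algorithm 3.1, with ordinary least squares standing in for the per-target linear regressions (the paper also allows logistic regression where more sensitivity is needed; the data layout and function name are assumptions):

```python
import numpy as np

def init_weights(Xs, Ys):
    # Xs[r]: (N x m) covariate design matrix at period r
    # Ys[r]: (N x n) one-hot outcome matrix y^(r) at period r
    # Returns one weight matrix per period whose columns are w^(r)_j.
    weights, V_prev = [], None
    for X, Y in zip(Xs, Ys):
        # From period 2 on, the previous period's v^(r-1) joins the
        # explanatory variables, as in step (b).
        Z = X if V_prev is None else np.hstack([X, V_prev])
        W, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # one OLS fit per rating j
        weights.append(W)
        V_prev = Z @ W                             # v^(r), fed forward
    return weights

rng = np.random.default_rng(1)
Xs = [rng.normal(size=(50, 3)) for _ in range(2)]
Ys = [np.eye(4)[rng.integers(0, 4, size=50)] for _ in range(2)]
W = init_weights(Xs, Ys)
# W[0] is 3 x 4 (covariates only); W[1] is (3 + 4) x 4
```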

Given the initial weights, network training involves a series of gradient descent searches, as described in the next algorithm. Let $w^{(r)}$ be the weight matrix in (3.11) for the network at the $r$th time period, i.e.,

$$w^{(r)} = (w^{(r)}_1, w^{(r)}_2, \ldots, w^{(r)}_n), \quad 1 \le r \le T.$$

Algorithm 3.2 (Network training). Update the network weights $w^{(r)}$, $1 \le r \le T$, step-by-step, as described below:

(a) Forward scoring: Randomly select a small batch of examples (1–10 loan accounts, for example) from the time series of the training sample and calculate $p_{rj}$ by (2.3) using the current weights, for $1 \le r \le T$ and $1 \le j \le n$.

(b) Select a time period $r$, from 1 to $T$ in sequence. At the $r$th time period, find the partial derivatives of $L$ with respect to $a^{(r)}_{ij}$ and $b^{(r)}_{ij}$ by (3.8) and (3.9); then, calculate $\Delta w^{(r)}_j$ as follows:

$$\Delta w^{(r)}_j = -\mathrm{avg}\left(\frac{\partial L}{\partial a^{(r)}_{j1}}, \frac{\partial L}{\partial a^{(r)}_{j2}}, \ldots, \frac{\partial L}{\partial a^{(r)}_{jm}}, \frac{\partial L}{\partial b^{(r)}_{j1}}, \frac{\partial L}{\partial b^{(r)}_{j2}}, \ldots, \frac{\partial L}{\partial b^{(r)}_{jn}}\right)^{\mathrm{T}}, \quad 1 \le j \le n,$$

which is the negated average of the partial derivatives of $L$ over the batch; hence, we obtain the following weight matrix:

$$\Delta w^{(r)} = (\Delta w^{(r)}_1, \Delta w^{(r)}_2, \ldots, \Delta w^{(r)}_n).$$

(c) Select a learning rate $\eta$ from a grid of values such that the update of $w^{(r)}$,

$$w^{(r)} \leftarrow w^{(r)} + \eta\, \Delta w^{(r)}, \tag{3.12}$$

gives rise to the biggest decrease of the cost function (3.2) over the entire training sample. Execute the update of $w^{(r)}$ by (3.12).

    (d) Steps (a)–(c) are repeated until no material improvement is possible.
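Per (3.8) and (3.9), the per-example weight gradients are outer products of $D_{(r)}$ with the inputs, and step (c) then moves the weights against the gradient. A sketch for a single example (variable names and layout are assumptions; batch averaging is omitted):

```python
import numpy as np

def weight_gradients(xs, vs, D):
    # Gradients (3.8)-(3.9) for one example: dL/da^(r) = D_(r) x^(r)^T
    # and dL/db^(r) = D_(r) v^(r-1)^T (outer products).
    dA = [np.outer(D[r], xs[r]) for r in range(len(xs))]
    dB = [None] + [np.outer(D[r], vs[r - 1]) for r in range(1, len(xs))]
    return dA, dB

def update(W, grads, eta):
    # Step (c): w^(r) <- w^(r) + eta * Delta w^(r), where Delta w^(r) is the
    # negated (averaged) gradient, so the cost decreases.
    return [w if g is None else w - eta * g for w, g in zip(W, grads)]

D = [np.array([1.0, 2.0])]
xs = [np.array([3.0, 4.0, 5.0])]
dA, dB = weight_gradients(xs, [None], D)
# dA[0] is the 2 x 3 outer product D[0] x^T; dA[0][1][2] = 2 * 5 = 10
W2 = update([np.ones((2, 3))], dA, 0.1)
```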

    With the partial derivatives being evaluated over only one small batch of examples, these partial derivatives are called the mini-batch stochastic gradient for the cost function. This gradient can go off in a direction far from the batch gradient (i.e., the gradient over the entire training sample). Nevertheless, this noisiness is what we need for non-convex optimization [18,19] to escape from saddle points or local minima (Theorem 6 in [19]). The disadvantage is that more iterations are required to reach a good solution.

Remark 3.3. For Step (c) in Algorithm 3.2, there are better approaches for selecting the learning rate $\eta$ than exhausting all values in the grid. For example, let $\eta_i$ be the $i$th value in the grid, ordered from 1 downward, and suppose that $\eta_i$ is currently the best learning rate found so far and that it leads to a decrease of the cost function. If $\eta_{i+1}$ does not lead to a bigger decrease than $\eta_i$, stop the search and use $\eta_i$ as the learning rate.
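Remark 3.3 can be rendered as a short helper (an illustrative sketch; `cost_after` is a hypothetical callback assumed to return the total cost (3.2) after a trial update with the given rate):

```python
def pick_learning_rate(grid, cost_after):
    # Try rates from the largest down; keep the first rate that decreases
    # the cost, and stop as soon as the next candidate fails to decrease
    # it further.
    best, best_cost = None, cost_after(0.0)  # baseline: no update
    for eta in sorted(grid, reverse=True):
        c = cost_after(eta)
        if c < best_cost:
            best, best_cost = eta, c
        elif best is not None:
            break                            # no bigger decrease: stop early
    return best

# With a trial cost minimized near eta = 0.5, the search settles there:
eta = pick_learning_rate([1.0, 0.5, 0.1, 0.01], lambda e: (e - 0.5) ** 2)
# eta == 0.5
```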

    In this section, we present the experimental results for the proposed RNN model, as benchmarked with two other Markov rating transition models.

The data we used constituted a synthetic sample, simulating a commercial loan portfolio with seven ratings $\{R_i\}_{i=1}^{7}$ over seven quarters (periods). At the end of each quarter, accounts are rated by one of the seven ratings, with $R_6$ and $R_7$ being, respectively, the withdrawal and default ratings. Both are absorbed ratings; accounts holding them were excluded from observation in later quarters. For simplicity, we included only three covariates, which simulated the following drivers for a loan:

    (a) Debt service coverage ratio

    (b) Debt to tangible net worth ratio

    (c) Current ratio

The sample contained 10,000 accounts. It was split 50:50 into training and validation samples. We focused on the following three models:

    1) Model 1—The proposed RNN rating transition model;

    2) Model 2—Time-inhomogeneous Markov transition model, with one separate transition model for each period;

    3) Model 3—Time-homogeneous Markov transition model, with one single transition model for all periods.

All three models use the same covariates.

Let $y_j$ denote a binary variable for a loan with a value of 1 if the loan has the rating $R_j$ at the quarter end, and 0 otherwise. Let $(p_1, p_2, \ldots, p_7)$ be the multinomial probabilities for a loan estimated by a rating transition model at the beginning of a quarter, with $p_j$ being the probability of transitioning to $R_j$ at the quarter end.

Tables 1 and 2 below show the Gini coefficients, over the training and validation samples respectively, for each of the three models above when ranking each of the seven ratings individually. For example, in Table 1, the RNN transition model had a Gini of 0.84 for ranking rating $R_1$ over the training sample. This Gini was calculated by using $p_1$ to predict $y_1$ over the entire training sample. The results in these two tables demonstrate strong performance for the RNN transition model relative to the other two models.
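The Gini coefficients here follow the usual credit-scoring convention Gini = 2·AUC − 1 (an assumption on our part; the paper does not spell out the formula). A direct pairwise computation, with ties counted as half, might look like this (an illustrative O(n²) version):

```python
def gini(p, y):
    # Gini = 2 * AUC - 1, with AUC computed over all (default, non-default)
    # pairs: the fraction of pairs where the default is scored higher.
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1

g = gini([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# a perfect ranking gives g == 1.0
```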

Table 1.  Gini by rating on training.
Model Rating
1 2 3 4 5 6 7 Avg
1 0.84 0.69 0.62 0.48 0.54 0.50 0.68 0.62
2 0.73 0.45 0.39 0.24 0.37 0.37 0.55 0.44
3 0.53 0.33 0.19 0.15 0.20 0.32 0.53 0.32

Table 2.  Gini by rating on validation.
Model Rating
1 2 3 4 5 6 7 Avg
1 0.82 0.68 0.60 0.46 0.45 0.34 0.66 0.57
2 0.74 0.45 0.39 0.24 0.28 0.36 0.54 0.43
3 0.52 0.34 0.18 0.12 0.28 0.32 0.51 0.33


In the remainder of this section, we focus on the robustness of a model in predicting the default event, i.e., the quality of using $p_7$ to predict $y_7$, the default indicator. Tables 3 and 4 below show the Gini coefficients period by period, over the training and validation samples respectively, for ranking the default indicator over each of the seven periods. Again, the RNN transition model significantly outperformed the other two benchmark models across all periods.

    Table 3.  Gini by period for default rating on training.
    Model Period
    1 2 3 4 5 6 7 Avg
    1 0.66 0.63 0.61 0.62 0.57 0.67 0.51 0.61
    2 0.66 0.07 0.43 0.37 0.33 0.21 0.16 0.32
    3 0.66 0.31 0.22 0.27 0.33 0.21 0.16 0.31

    Table 4.  Gini by period for default rating on validation.
    Model Period
    1 2 3 4 5 6 7 Avg
    1 0.64 0.62 0.58 0.55 0.46 0.55 0.53 0.56
    2 0.64 0.05 0.35 0.28 0.25 0.04 0.20 0.26
    3 0.64 0.27 0.11 0.18 0.25 0.04 0.20 0.24


The following six tables show the actual and predicted default rates for each model by decile over the training and validation samples. For example, Table 5 shows the actual and predicted default rates over the training sample for the RNN rating transition model. The values in the table were calculated by first sorting $p_7$ in ascending order and then dividing the sample into 10 buckets, each containing about 10% of the sample. The averages of the actual and predicted default rates were then taken over each bucket.
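The decile construction just described can be sketched as follows (an illustrative helper; the function name and the simulated data are assumptions, not the paper's sample):

```python
import numpy as np

def decile_table(p, y):
    # Sort by predicted default probability p (ascending), split into 10
    # roughly equal buckets, and average actual outcomes y and predictions
    # p within each bucket.
    p, y = np.asarray(p), np.asarray(y)
    order = np.argsort(p)
    actual, pred = [], []
    for bucket in np.array_split(order, 10):
        actual.append(float(y[bucket].mean()))
        pred.append(float(p[bucket].mean()))
    return actual, pred

rng = np.random.default_rng(2)
p = rng.uniform(size=1000)
y = (rng.uniform(size=1000) < p).astype(float)  # defaults likelier at high p
actual, pred = decile_table(p, y)
# both lists have 10 entries, and the predicted rates increase by decile
```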

Table 5.  RNN on training (Gini = 68%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 3.62% 4.90% 4.97% 5.33% 21.16% 23.79% 36.43% 42.47% 61.72% 81.19%
    Pred 3.20% 4.33% 4.59% 5.63% 19.68% 23.24% 35.62% 44.20% 62.25% 82.85%

Table 6.  RNN on validation (Gini = 66%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 4.53% 4.89% 4.53% 8.49% 24.73% 24.03% 34.46% 45.04% 61.44% 81.09%
    Pred 3.17% 4.31% 4.57% 6.45% 20.65% 23.49% 36.65% 45.87% 63.57% 84.49%

Table 7.  One migration matrix per period on training (Gini = 55%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 11.29% 12.50% 4.55% 7.03% 23.01% 26.99% 38.99% 32.67% 50.14% 78.42%
    Pred 1.46% 5.04% 5.84% 6.19% 8.20% 15.54% 24.29% 49.62% 73.66% 95.73%

Table 8.  One migration matrix per period on validation (Gini = 54%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 11.87% 14.89% 5.25% 8.42% 22.65% 26.40% 39.86% 34.39% 51.80% 77.71%
    Pred 1.52% 5.16% 5.97% 6.36% 8.58% 16.22% 24.82% 51.71% 76.60% 96.29%

Table 9.  One single migration matrix on training (Gini = 53%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 19.11% 4.90% 12.93% 4.97% 19.03% 29.05% 27.70% 39.28% 50.07% 78.57%
    Pred 2.52% 5.36% 6.20% 6.71% 9.82% 21.95% 25.37% 35.67% 75.51% 96.46%

Table 10.  One single migration matrix on validation (Gini = 51%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 21.30% 5.90% 14.46% 4.53% 19.41% 29.06% 29.50% 39.64% 52.09% 77.35%
    Pred 2.52% 5.49% 6.35% 6.87% 10.39% 22.60% 25.81% 37.94% 78.29% 96.97%


These results demonstrate a significant improvement for the RNN model over the other two models, on both training and validation.

    A rating transition for a credit portfolio is generally path dependent. A Markov rating transition model, either homogeneous or inhomogeneous, usually does not perform well after projecting for a few periods. The RNN model proposed in this paper provides a solution for modeling state transitions under non-Markov settings. This RNN is informed by the information history along the path. The experiments show that this proposed RNN model significantly outperforms Markov models where path dependence is relevant.

The author thanks Biao Wu for many valuable discussions over the last 3 years on deep machine learning and Python financial engineering, as well as for his insights and comments. Thanks also go to Felix Kan for many valuable comments, and to Zunwei Du, Kaijie Cui, and Glenn Fei for many valuable conversations.

    The views expressed in this article are not necessarily those of the banks the author works with. Please direct any comments to the author Bill Huajian Yang at: h_y02@yahoo.ca.



    [1] Yang BH, Du Z, (2016) Rating transition probability models and CCAR stress testing. J Risk Model Validation 10: 1–19. https://doi.org/10.21314/JRMV.2016.155 doi: 10.21314/JRMV.2016.155
    [2] Yang BH, (2017) Forward ordinal models for point-in-time probability of default term structure. J Risk Model Validation 11: 1–18. https://doi.org/10.21314/JRMV.2017.181 doi: 10.21314/JRMV.2017.181
    [3] Dos Reis G, Pfeuffer M, Smith G, (2020) Capturing model risk and rating momentum in the estimation of probabilities of default and credit rating migrations. Quant Finance 20: 1069–1083. https://doi.org/10.1080/14697688.2020.1726439 doi: 10.1080/14697688.2020.1726439
    [4] Miu P, Ozdemir B, (2009) Stress testing probability of default and rating migration rate with respect to Basel Ⅱ requirements. J Risk Model Validation 3: 3–38. https://doi.org/10.21314/JRMV.2009.048 doi: 10.21314/JRMV.2009.048
    [5] Kiefer NM, Larson CE, (2004) Testing Simple Markov Structures for Credit Rating Transitions. Comptroller of the Currency.
    [6] Kiefer NM, Larson CE, (2007) A simulation estimator for testing the time homogeneity of credit rating transitions. J Empirical Finance 14: 818–835. https://doi.org/10.1016/j.jempfin.2006.08.001 doi: 10.1016/j.jempfin.2006.08.001
    [7] Juhasz P, Vidovics-Dancs A, Szaz J, (2017) Measuring path dependency. UTMS J Econ 8: 29–37.
    [8] Russo E, (2020) A discrete-time approach to evaluate path-dependent derivatives in a regime-switching risk model. Risks 8: 9. https://doi.org/10.3390/risks8010009 doi: 10.3390/risks8010009
    [9] Yang BH, (2017) Point-in-time PD term structure models for multi-period scenario loss projection. J Risk Model Validation 11: 73–94. https://doi.org/10.21314/JRMV.2017.164 doi: 10.21314/JRMV.2017.164
    [10] Zhu S, Lomibao D, (2005) A conditional valuation approach for path-dependent instruments. SSRN Electron J. https://doi.org/10.2139/ssrn.806704
    [11] Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J, (2009) A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31: 855–868. https://doi.org/10.1109/TPAMI.2008.137 doi: 10.1109/TPAMI.2008.137
[12] Tealab A, (2018) Time series forecasting using artificial neural networks methodologies: A systematic review. Future Comput Inf J 3: 334–340. https://doi.org/10.1016/j.fcij.2018.10.003 doi: 10.1016/j.fcij.2018.10.003
[13] Hyötyniemi H, (1997) Turing machines are recurrent neural networks, In Proceedings of STeP'96, (eds. Jarmo Alander, Timo Honkela and Matti Jakobsson), Publications of the Finnish Artificial Intelligence Society, 13–24.
    [14] Elman JL, (1990) Finding structure in time. Cognitive Sci 14: 179–211. https://doi.org/10.1016/0364-0213(90)90002-E doi: 10.1016/0364-0213(90)90002-E
    [15] Schmidhuber J, (2015) Deep learning in neural networks: An overview. Neural Networks 61: 85–117. https://doi.org/10.1016/j.neunet.2014.09.003 doi: 10.1016/j.neunet.2014.09.003
    [16] Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H (2018), State-of-the-art in artificial neural network applications: A survey, Heliyon 4: e00938. https://doi.org/10.1016/j.heliyon.2018.e00938 doi: 10.1016/j.heliyon.2018.e00938
    [17] Dupond S (2019), A thorough review on the current advance of neural network structures. Annu Rev Control 14: 200–230.
    [18] Bottou L, (2010) Large-scale machine learning with stochastic gradient descent, In Proceedings of COMPSTAT'2010 Physica-Verlag HD, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
    [19] Ge R, Huang F, Jin C, Yuan Y, (2015) Escaping from saddle points-online stochastic gradient for tensor decomposition, In Conference on Learning Theory, PMLR, 1–46.
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)