Research article

Modeling path-dependent state transitions by a recurrent neural network

  • Received: 20 June 2022 Revised: 29 July 2022 Accepted: 09 September 2022 Published: 26 September 2022
  • Rating transition models are widely used for credit risk evaluation. It is not uncommon that a time-homogeneous Markov rating migration model will deteriorate quickly after projecting repeatedly for a few periods. This is because the time-homogeneous Markov condition is generally not satisfied. For a credit portfolio, the rating transition is usually path-dependent. In this paper, we propose a recurrent neural network (RNN) model for modeling path-dependent rating migration. An RNN is a type of artificial neural network where connections between nodes form a directed graph along a temporal sequence. There are neurons for input and output at each time period. The model is informed by the past behaviors for a loan along the path. Information learned from previous periods propagates to future periods. The experiments show that this RNN model is robust.

    Citation: Bill Huajian Yang. Modeling path-dependent state transitions by a recurrent neural network[J]. Big Data and Information Analytics, 2022, 7: 1-12. doi: 10.3934/bdia.2022001




    Rating transition models are widely used in the financial industry for credit risk evaluations, including stress testing and IFRS 9 expected credit loss evaluation [1,2,3,4], under the assumption that a rating transition is a time-homogeneous Markov process, depending only on the current rating and covariates. However, it is not uncommon that a Markov model deteriorates quickly after projecting for a few periods. This is because the Markov condition is generally not satisfied. A rating transition for a credit portfolio is generally path dependent. A test for this assumption is required [5,6] for the use of these Markov models.

There are various methods for path-dependent credit risk modeling [7], including regime-switching models [8] and conditional methods [9,10]. The latter are comparable to a cohort analysis.

In this paper, we propose a recurrent neural network (RNN) model for modeling path-dependent rating transition. An RNN is a type of artificial neural network where connections between nodes form a directed graph along a temporal sequence. There are neurons for input and output at each time period. The RNN is informed by past behaviors along the path. Information learned from previous periods propagates to future periods [11,12,13,14,15,16,17].

    The network structure for the proposed RNN model is described in Section 2 by using (2.1)–(2.4). This RNN model was implemented in Python. The experiments show that this RNN model is robust, compared to Markov transition models. Applications of this RNN model include the following, wherever path dependence is relevant:

    (a) Path-dependent asset evaluation or credit risk evaluation

    (b) Decisioning for account management

(c) Forecasting loss for stress testing and expected credit loss for IFRS 9 projects

    (d) Estimating conditional probability of default for survival analysis

The paper is organized as follows. In Section 2, we set up the proposed RNN model. In Section 3, we calculate the partial derivatives of the network cost function. In Section 4, we present the experimental results for the proposed RNN model, benchmarked against the time-homogeneous and time-inhomogeneous rating transition models.

In this section, we describe the proposed RNN model, as given in (2.1)–(2.4), for modeling path-dependent rating transition. Given an observation horizon with $T$ periods:

$$0 = t_0 < t_1 < t_2 < \cdots < t_T,$$

our goal is to estimate, for each $i \ge 0$, the probability of transitioning to a rating at time $t_{i+1}$, given the rating and covariates at time $t_i$.

Traditional rating transition models assume the Markov condition, i.e., that the transition probability depends only on the current rating and covariates. Markov transition models include the following types:

    (a) Time-homogeneous Markov rating transition, as represented by one single transition model for all periods

    (b) Time-inhomogeneous Markov rating transition, as represented by one transition model for each period

    It is not uncommon that a time-homogeneous Markov model will deteriorate quickly after projecting only for a few periods. This is because the Markov condition is generally not satisfied. A rating transition for a credit portfolio is generally path dependent.

    An RNN is an artificial neural network where connections between nodes form a directed graph along a temporal sequence. The chart below depicts the structure of an RNN. At the ith time period of the temporal sequence, the input neurons, the hidden neurons and the output neurons are respectively labeled x(i), h(i) and y(i):

    Figure 1.  An RNN for rating transition.

An RNN shares the advantages of common neural networks; in particular, information learned at one point is propagated backward and forward to all periods. It is path dependent.

Let $\{R_i\}_{i=1}^{n}$ denote the $n$ ratings for a credit portfolio. For a loan portfolio, we reserve $R_{n-1}$ as the withdrawal rating and $R_n$ as the default rating. Both the default and withdrawal ratings are assumed to be absorbed states, meaning that a loan rated by a default or withdrawal rating is excluded from the sample for subsequent observations. Rating labels are observable at the beginning and the end of a period.

Let $r^{(i)}_j$ denote the indicator with a value of 1 if the rating for a loan at the end of the $i$th period is $R_j$ and 0 otherwise. Let $n_p$ denote the number of non-absorbed ratings. An input at the $i$th period is denoted as

$$x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_m),$$

where the first $(m - n_p)$ input components $(x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{m-n_p})$ denote the covariates observable at the beginning of the $i$th period, and the remaining $n_p$ components are the rating indicators for non-absorbed ratings observed at the end of the $(i-1)$th period:

$$x^{(i)}_j = r^{(i-1)}_j, \quad m - n_p + 1 \le j \le m.$$

That is, the non-absorbed rating observed at the end of the $(i-1)$th period is used as the input for the next period. The output at the $i$th period is denoted as

$$y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_n),$$

where

$$y^{(i)}_j = r^{(i)}_j, \quad 1 \le j \le n.$$
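In code, the input construction above can be sketched as follows (a minimal illustration; the helper name, the covariate values, and the array layout are assumptions, not from the paper's implementation):

```python
import numpy as np

def encode_input(covariates, prev_rating, n_nonabsorbed):
    # x^(i): the first m - n_p components are covariates observed at the
    # beginning of the period; the last n_p components are the one-hot
    # indicators r^(i-1)_j of the non-absorbed rating held at the end of
    # the previous period.
    indicators = np.zeros(n_nonabsorbed)
    indicators[prev_rating] = 1.0
    return np.concatenate([covariates, indicators])

# A loan with three covariates that ended the previous period in the rating
# with index 2, out of five non-absorbed ratings:
x = encode_input(np.array([1.2, 0.8, 1.5]), prev_rating=2, n_nonabsorbed=5)
# x has m = 3 + 5 = 8 components, with x[5] = 1 and the rest of the
# indicator block zero
```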

    The structure for this RNN is described as shown in (2.1)–(2.4) below. Initially, for the first period, we have

(a) Input: $x^{(1)} = (x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_m)$;

(b) Output: $y^{(1)} = (y^{(1)}_1, y^{(1)}_2, \ldots, y^{(1)}_n)$, i.e., a unit vector where all components are zero except for one, which has a value of 1; it is a random realization generated by the multinomial probability $p_1 = (p_{11}, p_{12}, \ldots, p_{1n})$, where

$$p_{1j} = \frac{\exp(v^{(1)}_j)}{\exp(v^{(1)}_1) + \exp(v^{(1)}_2) + \cdots + \exp(v^{(1)}_n)}, \tag{2.1}$$

and

$$v^{(1)}_j = a^{(1)}_{j1} x^{(1)}_1 + a^{(1)}_{j2} x^{(1)}_2 + \cdots + a^{(1)}_{jm} x^{(1)}_m. \tag{2.2}$$

The vector $(v^{(1)}_1, v^{(1)}_2, \ldots, v^{(1)}_n)$ in (2.1) and (2.2) represents the information learned during the 1st period, which is stored in the hidden neurons $h^{(1)} = (h^{(1)}_1, h^{(1)}_2, \ldots, h^{(1)}_n)$ of the 1st period.

In general, given the vector $(v^{(i-1)}_1, v^{(i-1)}_2, \ldots, v^{(i-1)}_n)$ for the $(i-1)$th period ($i \ge 2$), we have the following at the $i$th period:

(c) Input: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_m)$;

(d) Output: $y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_n)$, i.e., a unit vector, which is a random realization generated by the multinomial probability $p_i = (p_{i1}, p_{i2}, \ldots, p_{in})$, where

$$p_{ij} = \frac{\exp(v^{(i)}_j)}{\exp(v^{(i)}_1) + \exp(v^{(i)}_2) + \cdots + \exp(v^{(i)}_n)}, \tag{2.3}$$

and

$$v^{(i)}_j = a^{(i)}_{j1} x^{(i)}_1 + a^{(i)}_{j2} x^{(i)}_2 + \cdots + a^{(i)}_{jm} x^{(i)}_m + b^{(i)}_{j1} v^{(i-1)}_1 + b^{(i)}_{j2} v^{(i-1)}_2 + \cdots + b^{(i)}_{jn} v^{(i-1)}_n. \tag{2.4}$$

As observed, $v^{(i)}_j$ consists of two parts: one from the current input, and the other from history, i.e., $(v^{(i-1)}_1, v^{(i-1)}_2, \ldots, v^{(i-1)}_n)$, corresponding to the information learned up to the $(i-1)$th period. Similarly, the vector $(v^{(i)}_1, v^{(i)}_2, \ldots, v^{(i)}_n)$ represents the information learned up to the $i$th period, which is stored in the hidden neurons $h^{(i)} = (h^{(i)}_1, h^{(i)}_2, \ldots, h^{(i)}_n)$.
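A minimal forward pass implementing (2.1)–(2.4) might look like the following (an illustrative sketch; the variable names and the list-of-matrices layout are assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def forward(xs, A, B):
    # xs: inputs x^(1..T), each of length m
    # A:  per-period (n x m) matrices of weights a^(i)_jk, eqs. (2.2)/(2.4)
    # B:  per-period (n x n) matrices of weights b^(i)_jk, eq. (2.4); B[0] unused
    vs, ps = [], []
    v_prev = None
    for i, x in enumerate(xs):
        v = A[i] @ x                   # current-input part, eq. (2.2)
        if v_prev is not None:
            v = v + B[i] @ v_prev      # history part, eq. (2.4)
        vs.append(v)
        ps.append(softmax(v))          # multinomial probability, eqs. (2.1)/(2.3)
        v_prev = v
    return vs, ps

rng = np.random.default_rng(0)
n, m, T = 4, 6, 3
A = [rng.normal(size=(n, m)) for _ in range(T)]
B = [None] + [rng.normal(size=(n, n)) for _ in range(T - 1)]
xs = [rng.normal(size=m) for _ in range(T)]
vs, ps = forward(xs, A, B)
# each ps[i] is a valid multinomial probability vector summing to 1
```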

This RNN model works at the $i$th stage as follows:

1) Collect the input $x^{(i)}$ and the information $h^{(i-1)}$ learned up to the end of the $(i-1)$th period.

2) Learn from $x^{(i)}$ and $h^{(i-1)}$, and store the learned information in the hidden neurons $h^{(i)}$.

3) Derive the multinomial probability $(p_{i1}, p_{i2}, \ldots, p_{in})$, where $p_{ij}$ is the probability that the loan will transition to the $j$th rating.

4) Output $y^{(i)}$, a random multinomial realization given the multinomial probability distribution $(p_{i1}, p_{i2}, \ldots, p_{in})$.

    Remark 2.1. The formulation of (2.4) does not come with a bias (i.e., intercept). An intercept can be inserted by adding a covariate with a constant value of 1, whenever necessary.

Let $y^{(k)} = (y^{(k)}_1, y^{(k)}_2, \ldots, y^{(k)}_n)$ be the observed outcome at the $k$th period. The cost function at the $k$th period, denoted by $L_k$, is given as (for one single data point):

$$L_k = -\sum_{j=1}^{n} y^{(k)}_j \log(p_{kj}). \tag{3.1}$$

This is the negative log-likelihood for observing the multinomial outcome $(y^{(k)}_1, y^{(k)}_2, \ldots, y^{(k)}_n)$. The total cost function to be minimized for this recurrent neural network is

$$L = L_1 + L_2 + \cdots + L_T, \tag{3.2}$$

summed over the entire training sample.
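As code, the cost of (3.1)–(3.2) for a single path is the cross-entropy summed over periods (an illustrative sketch; the function name and data layout are assumptions):

```python
import numpy as np

def path_cost(ps, ys):
    # L = L_1 + ... + L_T, with L_k = -sum_j y^(k)_j log(p_kj):
    # the negative log-likelihood of the observed one-hot outcomes ys
    # under the predicted multinomial probabilities ps.
    return -sum(float(y @ np.log(p)) for p, y in zip(ps, ys))

# One period, two ratings, predicted 50/50, observed rating 1:
c = path_cost([np.array([0.5, 0.5])], [np.array([1.0, 0.0])])
# c equals -log(0.5) = log 2
```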

    Training a neural network involves a series of gradient descent searches. Evaluation of partial derivatives is essential. In this sub-section, we calculate the partial derivatives for the network cost function with respect to network weights.

Let $d^{(k,r)}_i$ denote the partial derivative of $L_k$ with respect to $v^{(k-r)}_i$, $0 \le r \le k-1$. By (2.1) and (2.3), we have the partial derivative $\partial p_{ki} / \partial v^{(k)}_j$ as:

$$\frac{\partial p_{ki}}{\partial v^{(k)}_j} = \begin{cases} p_{ki}(1 - p_{ki}), & i = j, \\ -p_{ki}\, p_{kj}, & i \ne j. \end{cases} \tag{3.3}$$

Hence, by (3.1) and (3.3), we have the following for $1 \le i \le n$:

$$d^{(k,0)}_i = \frac{\partial L_k}{\partial v^{(k)}_i} = -\sum_{j=1}^{n} y^{(k)}_j \frac{\partial \log(p_{kj})}{\partial v^{(k)}_i} = -y^{(k)}_i (1 - p_{ki}) + \sum_{j \ne i} y^{(k)}_j\, p_{ki} = -(y^{(k)}_i - p_{ki}), \tag{3.4}$$

where $\sum_{j=1}^{n} y^{(k)}_j = 1$ is used.
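The compact form of (3.4), $d^{(k,0)}_i = p_{ki} - y^{(k)}_i$, is easy to verify numerically against central finite differences (an illustrative check, not from the paper):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def Lk(v, y):
    # per-period cost, eq. (3.1)
    return -float(y @ np.log(softmax(v)))

v = np.array([0.3, -1.2, 0.8])
y = np.array([0.0, 1.0, 0.0])       # observed one-hot outcome

analytic = softmax(v) - y           # d^(k,0) = -(y - p), eq. (3.4)
numeric = np.zeros_like(v)
eps = 1e-6
for i in range(3):                  # central finite differences
    e = np.zeros(3); e[i] = eps
    numeric[i] = (Lk(v + e, y) - Lk(v - e, y)) / (2 * eps)
# analytic and numeric gradients agree to finite-difference accuracy
```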

Given $d^{(k,0)}_j$, $1 \le j \le n$, we can calculate $d^{(k,1)}_i$ from the top down for $k > 1$ and $1 \le i \le n$ by using (2.4):

$$d^{(k,1)}_i = \frac{\partial L_k}{\partial v^{(k-1)}_i} = \sum_{j=1}^{n} \left(\frac{\partial L_k}{\partial v^{(k)}_j}\right)\left(\frac{\partial v^{(k)}_j}{\partial v^{(k-1)}_i}\right) = \sum_{j=1}^{n} b^{(k)}_{ji}\, d^{(k,0)}_j.$$

Inductively, we have the following for $k > r$ and $1 \le i \le n$:

$$d^{(k,r)}_i = \frac{\partial L_k}{\partial v^{(k-r)}_i} = \sum_{j=1}^{n} \left(\frac{\partial L_k}{\partial v^{(k-r+1)}_j}\right)\left(\frac{\partial v^{(k-r+1)}_j}{\partial v^{(k-r)}_i}\right) = \sum_{j=1}^{n} b^{(k-r+1)}_{ji}\, d^{(k,r-1)}_j. \tag{3.5}$$

We will use the following fact:

$$\frac{\partial L_k}{\partial v^{(i)}_j} = 0 \quad \text{if } k < i. \tag{3.6}$$

Let $D_{(T-r)i}$ denote the partial derivative of $L$ with respect to $v^{(T-r)}_i$. Given $\{d^{(k,0)}_i \mid 1 \le i \le n,\ 1 \le k \le T\}$, we can calculate $\{D_{(T-r)i} \mid 1 \le i \le n,\ 0 \le r < T\}$ from the top down. Initially, at the top period, we have the following by (3.6):

$$D_{Ti} = \frac{\partial L}{\partial v^{(T)}_i} = \frac{\partial L_T}{\partial v^{(T)}_i} = d^{(T,0)}_i.$$

Next, backward from the top period, we have $D_{(T-1)i}$ for $T > 1$ and $1 \le i \le n$ as follows:

$$\begin{aligned} D_{(T-1)i} &= \frac{\partial L}{\partial v^{(T-1)}_i} = \frac{\partial (L_T + L_{T-1})}{\partial v^{(T-1)}_i} \\ &= \frac{\partial L_T}{\partial v^{(T-1)}_i} + \frac{\partial L_{T-1}}{\partial v^{(T-1)}_i} = \sum_{j=1}^{n} \frac{\partial L_T}{\partial v^{(T)}_j} \frac{\partial v^{(T)}_j}{\partial v^{(T-1)}_i} + d^{(T-1,0)}_i \\ &= \sum_{j=1}^{n} b^{(T)}_{ji} \frac{\partial L}{\partial v^{(T)}_j} + d^{(T-1,0)}_i \\ &= \sum_{j=1}^{n} b^{(T)}_{ji}\, D_{Tj} + d^{(T-1,0)}_i, \end{aligned}$$

where (3.6) is used for the 2nd and 5th equality signs. Inductively, we have $D_{(T-r)i}$ for $T > r$ and $1 \le i \le n$ from the top down as follows:

$$\begin{aligned} D_{(T-r)i} &= \frac{\partial L}{\partial v^{(T-r)}_i} = \frac{\partial (L_T + L_{T-1} + \cdots + L_{T-r+1})}{\partial v^{(T-r)}_i} + \frac{\partial L_{T-r}}{\partial v^{(T-r)}_i} \\ &= \sum_{j=1}^{n} \frac{\partial (L_T + L_{T-1} + \cdots + L_{T-r+1})}{\partial v^{(T-r+1)}_j} \frac{\partial v^{(T-r+1)}_j}{\partial v^{(T-r)}_i} + d^{(T-r,0)}_i \\ &= \sum_{j=1}^{n} \frac{\partial L}{\partial v^{(T-r+1)}_j} \frac{\partial v^{(T-r+1)}_j}{\partial v^{(T-r)}_i} + d^{(T-r,0)}_i \\ &= \sum_{j=1}^{n} b^{(T-r+1)}_{ji}\, D_{(T-r+1)j} + d^{(T-r,0)}_i, \end{aligned} \tag{3.7}$$

where (3.6) is used for the 2nd and 4th equality signs.
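The recursion (3.7) runs backward through time; it can be sketched as follows (an illustrative implementation; the index conventions and variable names are assumptions):

```python
import numpy as np

def backprop_D(ds, B):
    # ds: per-period gradients d^(k,0) = p_k - y^(k), one length-n vector
    #     per period k = 1..T
    # B:  recurrent weight matrices b^(r); B[0] is unused
    # Returns D[r] = dL/dv^(r+1), accumulated from the top period down
    # via (3.7): D_(r) = b^(r+1)^T D_(r+1) + d^(r,0).
    T = len(ds)
    D = [None] * T
    D[T - 1] = ds[T - 1]                    # top period: D_T = d^(T,0)
    for r in range(T - 2, -1, -1):
        D[r] = B[r + 1].T @ D[r + 1] + ds[r]
    return D

ds = [np.array([1.0, -1.0]), np.array([0.5, 0.5])]
B = [None, np.array([[1.0, 2.0], [3.0, 4.0]])]
D = backprop_D(ds, B)
# D[1] = ds[1]; D[0] = B[1]^T @ ds[1] + ds[0] = [3.0, 2.0]
```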

Given $\{D_{ri}\}$, i.e., the partial derivatives of the cost function $L$ with respect to $v^{(r)}_i$, we can now find the partial derivatives of $L$ with respect to the network weights $a^{(r)}_{ij}$ and $b^{(r)}_{ij}$ as defined in (2.2) and (2.4).

Let $\delta_{rij}$ and $\sigma_{rij}$ denote, respectively, the partial derivatives of $L$ with respect to $a^{(r)}_{ij}$ and $b^{(r)}_{ij}$. By (2.4), at the top time period $r = T$, we have

$$\delta_{Tij} = \frac{\partial L}{\partial a^{(T)}_{ij}} = \frac{\partial L}{\partial v^{(T)}_i} \frac{\partial v^{(T)}_i}{\partial a^{(T)}_{ij}} = x^{(T)}_j\, D_{Ti}, \qquad \sigma_{Tij} = \frac{\partial L}{\partial b^{(T)}_{ij}} = \frac{\partial L}{\partial v^{(T)}_i} \frac{\partial v^{(T)}_i}{\partial b^{(T)}_{ij}} = v^{(T-1)}_j\, D_{Ti}.$$

In general, we have

$$\delta_{(T-r)ij} = x^{(T-r)}_j\, D_{(T-r)i}, \tag{3.8}$$
$$\sigma_{(T-r)ij} = v^{(T-r-1)}_j\, D_{(T-r)i}. \tag{3.9}$$

    A good initialization of the network weights a(r)ij and b(r)ij speeds up the convergence for the network training. In this sub-section, we propose an algorithm for initializing the network weights.

Let $w^{(r)}_j$ denote the vector of weights in (2.4) for $v^{(r)}_j$ at the $r$th time period, i.e.,

$$w^{(r)}_j = (a^{(r)}_{j1}, a^{(r)}_{j2}, \ldots, a^{(r)}_{jm}, b^{(r)}_{j1}, b^{(r)}_{j2}, \ldots, b^{(r)}_{jn})^{\mathrm{T}} \tag{3.10A}$$

for $r > 1$, and for $r = 1$,

$$w^{(1)}_j = (a^{(1)}_{j1}, a^{(1)}_{j2}, \ldots, a^{(1)}_{jm})^{\mathrm{T}}. \tag{3.10B}$$

Then the weight matrix for the network at the $r$th time period is given by

$$w^{(r)} = (w^{(r)}_1, w^{(r)}_2, \ldots, w^{(r)}_n), \quad 1 \le r \le T. \tag{3.11}$$

    Algorithm 3.1 (Initialization). Initialize network weights a(r)ij and b(r)ij as follows, step-by-step, starting from the first time period:

(a) Find $w^{(1)}_j$, $1 \le j \le n$, by running a linear regression (or a logistic regression if more sensitivity is required for some $y^{(1)}_j$'s) against the binary target $y^{(1)}_j$ with $x^{(1)}$ as the explanatory variable. Derive $v^{(1)}_j$ by (2.2).

(b) Given $x^{(2)} = (x^{(2)}_1, x^{(2)}_2, \ldots, x^{(2)}_m)$ and $v^{(1)} = (v^{(1)}_1, v^{(1)}_2, \ldots, v^{(1)}_n)$, find $w^{(2)}_j$, $1 \le j \le n$, by running a linear (or logistic) regression against $y^{(2)}_j$ with the components of $x^{(2)}$ and $v^{(1)}$ as explanatory variables. Derive $v^{(2)}_j$ by (2.4).

(c) Repeat (b) to obtain the initial weights $w^{(r)}_j$ at the $r$th time period for $1 \le j \le n$ and $1 \le r \le T$.
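A sketch of Algorithm 3.1, with ordinary least squares standing in for the per-target linear regressions (the paper also allows logistic regression where more sensitivity is needed; the data layout and function name are assumptions):

```python
import numpy as np

def init_weights(Xs, Ys):
    # Xs[r]: (N x m) covariate design matrix at period r
    # Ys[r]: (N x n) one-hot outcome matrix y^(r) at period r
    # Returns one weight matrix per period whose columns are w^(r)_j.
    weights, V_prev = [], None
    for X, Y in zip(Xs, Ys):
        # From period 2 on, the previous period's v^(r-1) joins the
        # explanatory variables, as in step (b).
        Z = X if V_prev is None else np.hstack([X, V_prev])
        W, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # one OLS fit per rating j
        weights.append(W)
        V_prev = Z @ W                             # v^(r), fed forward
    return weights

rng = np.random.default_rng(1)
Xs = [rng.normal(size=(50, 3)) for _ in range(2)]
Ys = [np.eye(4)[rng.integers(0, 4, size=50)] for _ in range(2)]
W = init_weights(Xs, Ys)
# W[0] is 3 x 4 (covariates only); W[1] is (3 + 4) x 4
```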

Given the initial weights, network training involves a series of gradient descent searches, as described in the next algorithm. Let $w^{(r)}$ be the weight matrix in (3.11) for the network at the $r$th time period, i.e.,

$$w^{(r)} = (w^{(r)}_1, w^{(r)}_2, \ldots, w^{(r)}_n), \quad 1 \le r \le T.$$

Algorithm 3.2 (Network training). Update the network weights $w^{(r)}$, $1 \le r \le T$, step-by-step, as described below:

(a) Forward scoring: Randomly select a small batch of examples (1–10 loan accounts, for example) from the time series of the training sample and calculate $p_{rj}$ by (2.3) using the current weights, for $1 \le r \le T$ and $1 \le j \le n$.

(b) Select a time period $r$, from 1 to $T$ in sequence. At the $r$th time period, find the partial derivatives of $L$ with respect to $a^{(r)}_{ij}$ and $b^{(r)}_{ij}$ by (3.8) and (3.9); then, calculate $\Delta w^{(r)}_j$ as follows:

$$\Delta w^{(r)}_j = -\mathrm{avg}\left(\frac{\partial L}{\partial a^{(r)}_{j1}}, \frac{\partial L}{\partial a^{(r)}_{j2}}, \ldots, \frac{\partial L}{\partial a^{(r)}_{jm}}, \frac{\partial L}{\partial b^{(r)}_{j1}}, \frac{\partial L}{\partial b^{(r)}_{j2}}, \ldots, \frac{\partial L}{\partial b^{(r)}_{jn}}\right)^{\mathrm{T}}, \quad 1 \le j \le n,$$

which is the negated average of the partial derivatives of $L$ over the batch; hence, we obtain the following weight matrix:

$$\Delta w^{(r)} = (\Delta w^{(r)}_1, \Delta w^{(r)}_2, \ldots, \Delta w^{(r)}_n).$$

(c) Select a learning rate $\eta$ from a grid of values such that the update of $w^{(r)}$,

$$w^{(r)} \leftarrow w^{(r)} + \eta\, \Delta w^{(r)}, \tag{3.12}$$

gives rise to the biggest decrease of the cost function (3.2) over the entire training sample. Execute the update of $w^{(r)}$ by (3.12).

    (d) Steps (a)–(c) are repeated until no material improvement is possible.
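Per (3.8) and (3.9), the per-example weight gradients are outer products of $D_{(r)}$ with the inputs, and step (c) then moves the weights against the gradient. A sketch for a single example (variable names and layout are assumptions; batch averaging is omitted):

```python
import numpy as np

def weight_gradients(xs, vs, D):
    # Gradients (3.8)-(3.9) for one example: dL/da^(r) = D_(r) x^(r)^T
    # and dL/db^(r) = D_(r) v^(r-1)^T (outer products).
    dA = [np.outer(D[r], xs[r]) for r in range(len(xs))]
    dB = [None] + [np.outer(D[r], vs[r - 1]) for r in range(1, len(xs))]
    return dA, dB

def update(W, grads, eta):
    # Step (c): w^(r) <- w^(r) + eta * Delta w^(r), where Delta w^(r) is the
    # negated (averaged) gradient, so the cost decreases.
    return [w if g is None else w - eta * g for w, g in zip(W, grads)]

D = [np.array([1.0, 2.0])]
xs = [np.array([3.0, 4.0, 5.0])]
dA, dB = weight_gradients(xs, [None], D)
# dA[0] is the 2 x 3 outer product D[0] x^T; dA[0][1][2] = 2 * 5 = 10
W2 = update([np.ones((2, 3))], dA, 0.1)
```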

    With the partial derivatives being evaluated over only one small batch of examples, these partial derivatives are called the mini-batch stochastic gradient for the cost function. This gradient can go off in a direction far from the batch gradient (i.e., the gradient over the entire training sample). Nevertheless, this noisiness is what we need for non-convex optimization [18,19] to escape from saddle points or local minima (Theorem 6 in [19]). The disadvantage is that more iterations are required to reach a good solution.

Remark 3.3. For Step (c) in Algorithm 3.2, there are better approaches for selecting the learning rate $\eta$ than exhausting all values in the grid. For example, let $\eta_i$ be the $i$th value in the grid, ordered from 1 downward, and suppose that $\eta_i$ is currently the best learning rate found so far and that it leads to a decrease of the cost function. If $\eta_{i+1}$ does not lead to a bigger decrease than $\eta_i$, stop the search and use $\eta_i$ as the learning rate.
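Remark 3.3 can be rendered as a short helper (an illustrative sketch; `cost_after` is a hypothetical callback assumed to return the total cost (3.2) after a trial update with the given rate):

```python
def pick_learning_rate(grid, cost_after):
    # Try rates from the largest down; keep the first rate that decreases
    # the cost, and stop as soon as the next candidate fails to decrease
    # it further.
    best, best_cost = None, cost_after(0.0)  # baseline: no update
    for eta in sorted(grid, reverse=True):
        c = cost_after(eta)
        if c < best_cost:
            best, best_cost = eta, c
        elif best is not None:
            break                            # no bigger decrease: stop early
    return best

# With a trial cost minimized near eta = 0.5, the search settles there:
eta = pick_learning_rate([1.0, 0.5, 0.1, 0.01], lambda e: (e - 0.5) ** 2)
# eta == 0.5
```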

    In this section, we present the experimental results for the proposed RNN model, as benchmarked with two other Markov rating transition models.

The data we used constituted a synthetic sample, simulating a commercial loan portfolio with seven ratings $\{R_i\}_{i=1}^{7}$ over seven quarters (periods). At the end of each quarter, accounts are rated by one of the seven ratings, with $R_6$ and $R_7$ being, respectively, the withdrawal and default ratings. Both are absorbed ratings; accounts holding them were excluded from observation in later quarters. For simplicity, we included only three covariates, which simulated the following drivers for a loan:

    (a) Debt service coverage ratio

    (b) Debt to tangible net worth ratio

    (c) Current ratio

The sample contained 10,000 accounts. It was split 50:50 into training and validation samples. We focused on the following three models:

    1) Model 1—The proposed RNN rating transition model;

    2) Model 2—Time-inhomogeneous Markov transition model, with one separate transition model for each period;

    3) Model 3—Time-homogeneous Markov transition model, with one single transition model for all periods.

All three models use the same covariates.

Let $y_j$ denote a binary variable for a loan with a value of 1 if the loan has the rating $R_j$ at the quarter end, and 0 otherwise. Let $(p_1, p_2, \ldots, p_7)$ be the multinomial probabilities for a loan estimated by a rating transition model at the beginning of a quarter, with $p_j$ being the probability of transitioning to $R_j$ at the quarter end.

Tables 1 and 2 below show the Gini coefficients, over the training and validation samples respectively, for each of the three models above when ranking each of the seven ratings individually. For example, in Table 1, the RNN transition model had a Gini of 0.84 for ranking rating $R_1$ over the training sample. This Gini was calculated by using $p_1$ to predict $y_1$ over the entire training sample. The results in these two tables demonstrate strong performance for the RNN transition model relative to the other two models.
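The Gini coefficients here follow the usual credit-scoring convention Gini = 2·AUC − 1 (an assumption on our part; the paper does not spell out the formula). A direct pairwise computation, with ties counted as half, might look like this (an illustrative O(n²) version):

```python
def gini(p, y):
    # Gini = 2 * AUC - 1, with AUC computed over all (default, non-default)
    # pairs: the fraction of pairs where the default is scored higher.
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1

g = gini([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# a perfect ranking gives g == 1.0
```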

Table 1.  Gini by rating on training.
Model Rating
1 2 3 4 5 6 7 Avg
1 0.84 0.69 0.62 0.48 0.54 0.50 0.68 0.62
2 0.73 0.45 0.39 0.24 0.37 0.37 0.55 0.44
3 0.53 0.33 0.19 0.15 0.20 0.32 0.53 0.32

Table 2.  Gini by rating on validation.
Model Rating
1 2 3 4 5 6 7 Avg
1 0.82 0.68 0.60 0.46 0.45 0.34 0.66 0.57
2 0.74 0.45 0.39 0.24 0.28 0.36 0.54 0.43
3 0.52 0.34 0.18 0.12 0.28 0.32 0.51 0.33


In the remainder of this section, we focus on the robustness of a model in predicting the default event, i.e., the quality of using $p_7$ to predict $y_7$, the default indicator. Tables 3 and 4 below show the Gini coefficients period by period, over the training and validation samples respectively, for ranking the default indicator over each of the seven periods. Again, the RNN transition model significantly outperformed the other two benchmark models across all periods.

    Table 3.  Gini by period for default rating on training.
    Model Period
    1 2 3 4 5 6 7 Avg
    1 0.66 0.63 0.61 0.62 0.57 0.67 0.51 0.61
    2 0.66 0.07 0.43 0.37 0.33 0.21 0.16 0.32
    3 0.66 0.31 0.22 0.27 0.33 0.21 0.16 0.31

    Table 4.  Gini by period for default rating on validation.
    Model Period
    1 2 3 4 5 6 7 Avg
    1 0.64 0.62 0.58 0.55 0.46 0.55 0.53 0.56
    2 0.64 0.05 0.35 0.28 0.25 0.04 0.20 0.26
    3 0.64 0.27 0.11 0.18 0.25 0.04 0.20 0.24


The following six tables show the actual and predicted default rates for each model by decile over the training and validation samples. For example, Table 5 shows the actual and predicted default rates over the training sample for the RNN rating transition model. The values in the table were calculated by first sorting $p_7$ in ascending order and then dividing the sample into 10 buckets, each containing about 10% of the sample. The averages of the actual and predicted default rates were then taken over each bucket.
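The decile construction just described can be sketched as follows (an illustrative helper; the function name and the simulated data are assumptions, not the paper's sample):

```python
import numpy as np

def decile_table(p, y):
    # Sort by predicted default probability p (ascending), split into 10
    # roughly equal buckets, and average actual outcomes y and predictions
    # p within each bucket.
    p, y = np.asarray(p), np.asarray(y)
    order = np.argsort(p)
    actual, pred = [], []
    for bucket in np.array_split(order, 10):
        actual.append(float(y[bucket].mean()))
        pred.append(float(p[bucket].mean()))
    return actual, pred

rng = np.random.default_rng(2)
p = rng.uniform(size=1000)
y = (rng.uniform(size=1000) < p).astype(float)  # defaults likelier at high p
actual, pred = decile_table(p, y)
# both lists have 10 entries, and the predicted rates increase by decile
```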

Table 5.  RNN on training (Gini = 68%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 3.62% 4.90% 4.97% 5.33% 21.16% 23.79% 36.43% 42.47% 61.72% 81.19%
    Pred 3.20% 4.33% 4.59% 5.63% 19.68% 23.24% 35.62% 44.20% 62.25% 82.85%

Table 6.  RNN on validation (Gini = 66%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 4.53% 4.89% 4.53% 8.49% 24.73% 24.03% 34.46% 45.04% 61.44% 81.09%
    Pred 3.17% 4.31% 4.57% 6.45% 20.65% 23.49% 36.65% 45.87% 63.57% 84.49%

Table 7.  One migration matrix per period on training (Gini = 55%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 11.29% 12.50% 4.55% 7.03% 23.01% 26.99% 38.99% 32.67% 50.14% 78.42%
    Pred 1.46% 5.04% 5.84% 6.19% 8.20% 15.54% 24.29% 49.62% 73.66% 95.73%

Table 8.  One migration matrix per period on validation (Gini = 54%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 11.87% 14.89% 5.25% 8.42% 22.65% 26.40% 39.86% 34.39% 51.80% 77.71%
    Pred 1.52% 5.16% 5.97% 6.36% 8.58% 16.22% 24.82% 51.71% 76.60% 96.29%

Table 9.  One single migration matrix on training (Gini = 53%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 19.11% 4.90% 12.93% 4.97% 19.03% 29.05% 27.70% 39.28% 50.07% 78.57%
    Pred 2.52% 5.36% 6.20% 6.71% 9.82% 21.95% 25.37% 35.67% 75.51% 96.46%

Table 10.  One single migration matrix on validation (Gini = 51%).
    Decile 0 1 2 3 4 5 6 7 8 9
    Actual 21.30% 5.90% 14.46% 4.53% 19.41% 29.06% 29.50% 39.64% 52.09% 77.35%
    Pred 2.52% 5.49% 6.35% 6.87% 10.39% 22.60% 25.81% 37.94% 78.29% 96.97%


These results demonstrate a significant improvement for the RNN model over the other two models, on both training and validation.

    A rating transition for a credit portfolio is generally path dependent. A Markov rating transition model, either homogeneous or inhomogeneous, usually does not perform well after projecting for a few periods. The RNN model proposed in this paper provides a solution for modeling state transitions under non-Markov settings. This RNN is informed by the information history along the path. The experiments show that this proposed RNN model significantly outperforms Markov models where path dependence is relevant.

The author thanks Biao Wu for many valuable discussions over the last 3 years on deep machine learning and Python financial engineering, as well as for his insights and comments. Thanks also go to Felix Kan for many valuable comments, and to Zunwei Du, Kaijie Cui, and Glenn Fei for many valuable conversations.

    The views expressed in this article are not necessarily those of the banks the author works with. Please direct any comments to the author Bill Huajian Yang at: h_y02@yahoo.ca.



    [1] Yang BH, Du Z, (2016) Rating transition probability models and CCAR stress testing. J Risk Model Validation 10: 1–19. https://doi.org/10.21314/JRMV.2016.155 doi: 10.21314/JRMV.2016.155
    [2] Yang BH, (2017) Forward ordinal models for point-in-time probability of default term structure. J Risk Model Validation 11: 1–18. https://doi.org/10.21314/JRMV.2017.181 doi: 10.21314/JRMV.2017.181
    [3] Dos Reis G, Pfeuffer M, Smith G, (2020) Capturing model risk and rating momentum in the estimation of probabilities of default and credit rating migrations. Quant Finance 20: 1069–1083. https://doi.org/10.1080/14697688.2020.1726439 doi: 10.1080/14697688.2020.1726439
    [4] Miu P, Ozdemir B, (2009) Stress testing probability of default and rating migration rate with respect to Basel Ⅱ requirements. J Risk Model Validation 3: 3–38. https://doi.org/10.21314/JRMV.2009.048 doi: 10.21314/JRMV.2009.048
    [5] Kiefer NM, Larson CE, (2004) Testing Simple Markov Structures for Credit Rating Transitions. Comptroller of the Currency.
    [6] Kiefer NM, Larson CE, (2007) A simulation estimator for testing the time homogeneity of credit rating transitions. J Empirical Finance 14: 818–835. https://doi.org/10.1016/j.jempfin.2006.08.001 doi: 10.1016/j.jempfin.2006.08.001
    [7] Juhasz P, Vidovics-Dancs A, Szaz J, (2017) Measuring path dependency. UTMS J Econ 8: 29–37.
    [8] Russo E, (2020) A discrete-time approach to evaluate path-dependent derivatives in a regime-switching risk model. Risks 8: 9. https://doi.org/10.3390/risks8010009 doi: 10.3390/risks8010009
    [9] Yang BH, (2017) Point-in-time PD term structure models for multi-period scenario loss projection. J Risk Model Validation 11: 73–94. https://doi.org/10.21314/JRMV.2017.164 doi: 10.21314/JRMV.2017.164
    [10] Zhu S, Lomibao D, (2005) A conditional valuation approach for path-dependent instruments. SSRN Electron J. https://doi.org/10.2139/ssrn.806704
    [11] Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J, (2009) A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31: 855–868. https://doi.org/10.1109/TPAMI.2008.137 doi: 10.1109/TPAMI.2008.137
[12] Tealab A, (2018) Time series forecasting using artificial neural networks methodologies: A systematic review. Future Comput Inf J 3: 334–340. https://doi.org/10.1016/j.fcij.2018.10.003 doi: 10.1016/j.fcij.2018.10.003
[13] Hyötyniemi H, (1997) Turing machines are recurrent neural networks, In Proceedings of STeP'96, (eds. Jarmo Alander, Timo Honkela and Matti Jakobsson), Publications of the Finnish Artificial Intelligence Society, 13–24.
    [14] Elman JL, (1990) Finding structure in time. Cognitive Sci 14: 179–211. https://doi.org/10.1016/0364-0213(90)90002-E doi: 10.1016/0364-0213(90)90002-E
    [15] Schmidhuber J, (2015) Deep learning in neural networks: An overview. Neural Networks 61: 85–117. https://doi.org/10.1016/j.neunet.2014.09.003 doi: 10.1016/j.neunet.2014.09.003
    [16] Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H (2018), State-of-the-art in artificial neural network applications: A survey, Heliyon 4: e00938. https://doi.org/10.1016/j.heliyon.2018.e00938 doi: 10.1016/j.heliyon.2018.e00938
    [17] Dupond S (2019), A thorough review on the current advance of neural network structures. Annu Rev Control 14: 200–230.
    [18] Bottou L, (2010) Large-scale machine learning with stochastic gradient descent, In Proceedings of COMPSTAT'2010 Physica-Verlag HD, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
    [19] Ge R, Huang F, Jin C, Yuan Y, (2015) Escaping from saddle points-online stochastic gradient for tensor decomposition, In Conference on Learning Theory, PMLR, 1–46.
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)