1. Introduction
Contemporary Amperex Technology Co., Ltd. (CATL), a leading new energy innovation and technology company in China and worldwide, has attracted widespread attention for its stock price [1,2,3]. Predicting the trend of a stock price is an important basis for many investment decisions. However, influenced by factors such as new energy costs, usability, economic conditions, and policy environments, CATL's stock price exhibits prolonged volatility and long-memory characteristics, which makes predicting its price trend inherently difficult. The objective of this study is to design an efficient numerical method, named extended fractional neural stochastic differential equation (fNSDE)-Net, for simulating and forecasting stock prices that exhibit long-term dependence and for pricing the corresponding Bermuda options. The core idea behind this method is to integrate the generative adversarial network (GAN) with the fNSDE and self-attention modules to generate and predict the prices of underlying assets. The main contributions of this paper are summarized as follows:
ⅰ) The proposed method, extended fNSDE-Net, combines the strengths of fNSDE-Net, which excels at simulating long-range correlated time series, with the generative capabilities of GAN, enabling the generation of multiple simulation paths for a single stock price time series in a probabilistic sense. In particular, the self-attention mechanism is incorporated into the generator component of our method to better capture long-range dependencies. The results demonstrate that our approach outperforms the pure fNSDE-GAN method [4] and the NSDE-GAN method [5] in terms of both fitting and generalization performance for stock prices exhibiting temporal memory effects.
ⅱ) In addition to fitting and forecasting the stock price of CATL, we also address the pricing of a vanilla Bermuda call option derived from the generated stock price paths. Motivated by the least squares Monte Carlo (LSM) method originally proposed by Longstaff et al. [6], we apply dynamic programming principles to price the corresponding option. To the best of our knowledge, no existing work has combined fNSDE-Net with LSM methods for option pricing. This paper aims to make a modest contribution in this regard.
With the advancement of artificial intelligence (AI), using AI to simulate, calculate, and predict changes in new energy stock prices has become increasingly popular and is now a major subject of exploration for financial analysts and academic researchers. In recent years, scholars have made significant contributions in this field. For example, Sadorsky [7] used random forests to predict the stock price direction of clean energy exchange traded funds, and showed that random forests are a useful method for forecasting clean energy stock price trends. In addition, Sadorsky [8] used tree-based machine learning methods to forecast the direction of solar stock prices. Wang et al. [9] developed a novel prediction method that integrates time series and cloud models to predict the stock prices of new energy vehicle enterprises. Gu et al. [10] proposed a prediction method that combines a Markov chain with a long short-term memory (LSTM) neural network to predict new energy stock prices, and showed that the new method was better than the LSTM neural network alone in terms of average relative error, a posteriori error, mean square error, and small probability error. Meng et al. [11] employed a novel attention mechanism to forecast new energy electricity prices, and demonstrated that the proposed model was superior to some hybrid models. Shen et al. [12] utilized autoregressive univariate time series models to predict the trends of new energy stock closing prices, and showed by comparison that the temporal convolutional attention neural network performs well in stock price prediction. Zhu et al. [13] introduced a new model called "VMD-BIGRU" to predict the trend of the portfolio and quantify the trades of the portfolios, and showed that this model greatly improved forecasting performance compared to single gated recurrent unit or LSTM models. Fan et al. [14] used a new AI model, involving support vector machines and Ant Lion Optimization, to predict stock prices in the new energy industry. Alshater et al. [15] deployed machine learning models to forecast energy equity prices by employing uncertainty indices as a proxy for predicting energy market volatility. Gao et al. [16] introduced the most commonly used machine learning techniques and explored their diverse applications in marketing, stock analysis, demand forecasting, and energy marketing. Ghosh et al. [17] investigated the predictability of clean energy investment in the US market by using NeuralProphet and selecting eight sectoral stock indices. Jiang et al. [18] compared the application of various artificial intelligence methods in electricity price forecasting. Lin et al. [19] proposed a novel decomposition-ensemble model to predict energy prices that fluctuate wildly. Mostafa et al. [20] used evolutionary techniques such as gene expression programming and artificial neural network models to predict oil prices over the period from January 2, 1986 to June 12, 2012. Pino et al. [21] predicted the hourly energy prices for the next day in the Spanish electricity production market based on artificial neural networks and compared the results with the ARIMA model. Although great progress has been made in numerical simulation and prediction of new energy stock prices using neural networks or machine learning methods, the problem of predicting the price of a single new energy stock with long-term memory effects and non-Brownian motion characteristics has, to the best of our knowledge, not yet been addressed. This paper attempts to fill this gap from the perspective of numerical approximation.
The rest of the paper is organized as follows: Section 2 theoretically outlines the details of the extended fNSDE-Net approach. Section 3 presents a series of numerical experiments to illustrate the effectiveness of the new approach. Section 4 introduces the pricing problem of a vanilla Bermuda call option induced by the generated stock price path and verifies the reliability of the pricing method through comparative experiments. Finally, Section 5 concludes.
2. Stock price and new network model
In this section, we first present the closing prices of CATL stock over the past year, revealing their long-range dependency. Then we introduce the proposed new method in detail. We begin by stating some basic mathematical concepts and definitions.
Definition 2.1. Let H∈(0,1) be a fixed number. A real-valued mean-zero Gaussian process B^H(t), t≥0, is called fractional Brownian motion (fBm) with Hurst index H if it fulfills
$$B^H(0) = 0$$
almost surely and its covariance function satisfies:
$$\mathbb{E}\big[B^H(t)\,B^H(s)\big] = \frac{1}{2}\big(t^{2H} + s^{2H} - |t-s|^{2H}\big), \qquad s,t \ge 0.$$
Obviously, fBm becomes the pure Brownian motion when H=1/2. Moreover, fBm has stationary increments since
$$\mathbb{E}\big[|B^H(t) - B^H(s)|^{2}\big] = |t-s|^{2H},$$
see, e.g., [22, Section 5.3] for details. In particular, almost all paths of fBm with Hurst index H satisfy (H−ε)-Hölder regularity [23] for any ε>0, i.e.,
$$|B^H(t) - B^H(s)| \le c\,|t-s|^{H-\varepsilon}$$
for a constant c>0, and s,t≥0. This means that the sample paths of fBm become more regular as the Hurst index H increases, as shown in Figure 1. It is worth pointing out that, according to the definition of long-range dependency [4, Definition Ⅲ.2], one readily verifies that the increments of fBm exhibit long-range dependency if and only if H>1/2.
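To make the definition concrete, the following is a minimal sketch of one standard way to draw an fBm path on a finite grid, namely a Cholesky factorization of the fBm covariance matrix. The function names, the grid size, and the seed are illustrative assumptions and are not taken from the paper's code; only the value H=0.721 comes from the text.

```python
import math
import random

def fbm_cov(s, t, hurst):
    """Covariance of fBm: 0.5 * (s^2H + t^2H - |t - s|^2H)."""
    return 0.5 * (s ** (2 * hurst) + t ** (2 * hurst) - abs(t - s) ** (2 * hurst))

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def sample_fbm(n_steps, horizon, hurst, rng):
    """One fBm path on the grid t_i = i * horizon / n_steps, with B^H(0) = 0."""
    times = [horizon * i / n_steps for i in range(1, n_steps + 1)]
    cov = [[fbm_cov(s, t, hurst) for t in times] for s in times]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in times]          # independent standard normals
    path = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n_steps)]
    return [0.0] + path

# Illustrative call: 50 steps on [0, 1] with the Hurst index estimated in the text.
rng = random.Random(0)
path = sample_fbm(n_steps=50, horizon=1.0, hurst=0.721, rng=rng)
```

The Cholesky approach is exact on the grid but costs cubic time in the number of grid points, so it is suitable only for short paths such as the one-year daily series considered here.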
Next, we investigate the mathematical characteristics of the stock price of CATL. Using the AKShare database (see https://github.com/akfamily/akshare), we take the closing prices of CATL stock for all workdays from May 2023 to May 2024, which are shown in Figure 2.
AKShare is a Python library that provides financial data extraction and analysis tools for Chinese financial markets. Many researchers have used data from AKShare to analyze stock price trends; see, e.g., [24,25] and references therein.
Now we will show that the path of CATL's stock closing price is characterized by long-range dependency. To this end, we need to introduce the notion of the R/S statistic.
Let
$$\{V(t)\}_{t=1}^{T}$$
be a discrete random time series. Define the range of V(t) at time T by
$$R_T := \max_{1\le t\le T}\sum_{s=1}^{t}\big(V(s)-\bar V_T\big) - \min_{1\le t\le T}\sum_{s=1}^{t}\big(V(s)-\bar V_T\big)$$
with
$$\bar V_T := \frac{1}{T}\sum_{t=1}^{T} V(t)$$
representing the sample mean. Furthermore, let S_T be the sample standard deviation defined by
$$S_T := \sqrt{\frac{1}{T}\sum_{t=1}^{T}\big(V(t)-\bar V_T\big)^{2}}.$$
The quantity R_T/S_T is called the R/S statistic. An important application of the R/S statistic is to determine whether a random time series V(t) can be simulated using fBm. To be specific, Mandelbrot [26] has proven that the R/S statistic asymptotically satisfies the relation
$$\log\frac{R_T}{S_T} = \log c + H\log T + o_T(1), \tag{2.2}$$
where H is the Hurst index, c is a constant, and o_T(1) is a quantity that vanishes in probability as T tends to infinity. In fact, Eq (2.2) provides a natural way to estimate H, for example by a linear regression of log(R_T/S_T) on log T.
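The regression-based estimate of H mentioned above can be sketched as follows. This is an illustrative implementation under stated assumptions: the series is cut into non-overlapping windows, the R/S statistic is averaged per window length, and H is read off as the least squares slope. The window sizes, the function names, and the synthetic white-noise input are hypothetical; the CATL data and the exact procedure used by the authors are not reproduced here.

```python
import math
import random

def rescaled_range(values):
    """R/S of one window: range of centred partial sums over the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    partial, cum = [], 0.0
    for v in values:
        cum += v - mean
        partial.append(cum)
    spread = max(partial) - min(partial)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return spread / std

def hurst_rs(series, window_sizes):
    """Least squares slope of log(R/S) against log(window length)."""
    xs, ys = [], []
    for size in window_sizes:
        blocks = [series[i:i + size] for i in range(0, len(series) - size + 1, size)]
        avg_rs = sum(rescaled_range(b) for b in blocks) / len(blocks)
        xs.append(math.log(size))
        ys.append(math.log(avg_rs))
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return slope / sum((x - x_bar) ** 2 for x in xs)

# Synthetic stand-in: independent Gaussian noise has no long-range dependency,
# so the estimate should land near 0.5, up to finite-sample bias.
rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h_est = hurst_rs(white_noise, window_sizes=[16, 32, 64, 128, 256, 512])
```

An estimate clearly above 0.5, such as the value 0.721 reported for CATL, is what indicates long-range dependency in this framework.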
A simple calculation gives a Hurst index of 0.721 for the closing price path of CATL stock, which is bigger than 0.5, indicating that the increments of CATL stock price have long-range dependency. This implies that employing a dynamic process driven by fBm with H>1/2 to simulate the CATL stock price is reasonable. In addition to the Hurst index of CATL's stock price, we also calculated the Hurst index for the stock prices of several other well-known companies (see Table 1 below for details) and found that, with the exception of CATL, the Hurst index of each of these stock prices is less than 1/2. In the current work, we focus on the stock price valuation problem with long-term dependencies (i.e., H>1/2), and thus, we selected CATL's stock price as the underlying asset.
One more thing to note is that CATL's observed stock price is just a single path. Thus, a powerful generative model is needed that can produce more possible realizations in a probabilistic sense, thereby providing more valuable information for predicting stock prices. To address this, we propose a novel method, hereafter referred to as extended fNSDE-Net.
2.1. Extended fNSDE-Net
The extended fNSDE-Net consists of two main components, the generator and the discriminator. The generator is formulated as an NSDE driven by an fBm with Hurst index H>1/2, which is used to generate multiple latent paths with long-range dependency. The discriminator is represented as a neural controlled differential equation (NCDE) by using the Riemann-Stieltjes measure of the generator output, aiming to provide a score of how close the generated path is to the true stock price path. In particular, the self-attention module is embedded in the generator to efficiently model relationships between widely separated data. We begin with a description of the generator.
Generator. Given a time horizon T>0, let Ytrue(t) denote the closing price of CATL stock, t∈[0,T]. Ytrue(t) is what we wish to approximate, and it is a continuous stochastic process. Let
$$B^H(t), \qquad t\in[0,T],$$
be a w-dimensional fBm with Hurst index H>1/2. Let γ be a v-dimensional random variable following standard fBm. Set
$$\zeta_\theta:\mathbb{R}^{v}\to\mathbb{R}^{x},\quad \mu_\theta:[0,T]\times\mathbb{R}^{x}\to\mathbb{R}^{x},\quad \sigma_\theta:[0,T]\times\mathbb{R}^{x}\to\mathbb{R}^{x\times w},\quad \alpha_\theta\in\mathbb{R}^{m\times x},\quad \beta_\theta\in\mathbb{R}^{m},$$
where ζθ, μθ, σθ, αθ, and βθ are all neural networks with a common parameter θ. The hyperparameter x is the dimension of the hidden state. Now, define the fNSDE as below:
$$X(0)=\zeta_\theta(\gamma),\qquad dX(t)=\mu_\theta(t,X(t))\,dt+\sigma_\theta(t,X(t))\,dB^H(t),\qquad Y(t)=\alpha_\theta X(t)+\beta_\theta, \tag{2.3}$$
where
$$X:[0,T]\to\mathbb{R}^{x}$$
is a latent variable such that each generated path of Y(t) can approximate Ytrue in some probabilistic sense. In fact, (2.3) is an extension of the NSDE generator shown in [5, (3)]. However, this extension is non-trivial because we change the noise driving mechanism from pure Brownian motion to fBm, so that (2.3) can generate a time series with long-range dependency and therefore better simulate the target path of CATL's closing stock price.
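As an illustration of how a generator of the form (2.3) can be simulated, the sketch below applies an Euler-type scheme to a scalar latent state driven by fBm increments. It is a toy under stated assumptions, not the authors' implementation: the hidden dimension is 1, the networks are tiny fixed-weight tanh networks (hence globally Lipschitz), the initial map is taken to be a tanh of a scaled input, and all parameter values in `theta` are hypothetical.

```python
import math
import random

def fgn_increments(n, dt, hurst, rng):
    """Increments of fBm on a uniform grid, sampled via Cholesky of their covariance."""
    def gamma(k):
        k = abs(k)
        return 0.5 * dt ** (2 * hurst) * (
            (k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + abs(k - 1) ** (2 * hurst))
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = gamma(i - j) - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

def tanh_mlp(weights, t, x):
    """Scalar one-hidden-layer network: sum of c * tanh(a * t + b * x + d)."""
    return sum(c * math.tanh(a * t + b * x + d) for a, b, c, d in weights)

def generate_path(theta, gamma0, n_steps, horizon, hurst, rng):
    """Euler-type scheme for X(0) = zeta(gamma0), dX = mu dt + sigma dB^H, Y = alpha X + beta."""
    dt = horizon / n_steps
    d_fbm = fgn_increments(n_steps, dt, hurst, rng)
    x = math.tanh(theta["zeta"] * gamma0)            # hypothetical form of the initial map
    ys = [theta["alpha"] * x + theta["beta"]]
    for k in range(n_steps):
        t = k * dt
        x += tanh_mlp(theta["mu"], t, x) * dt + tanh_mlp(theta["sigma"], t, x) * d_fbm[k]
        ys.append(theta["alpha"] * x + theta["beta"])
    return ys

# Hypothetical parameters; in training these would be learned.
theta = {"zeta": 0.8, "alpha": 1.0, "beta": 0.0,
         "mu": [(0.5, -0.3, 0.4, 0.1)], "sigma": [(0.0, 0.2, 0.3, 0.5)]}
ys = generate_path(theta, gamma0=0.3, n_steps=20, horizon=1.0, hurst=0.721,
                   rng=random.Random(2))
```

Calling `generate_path` repeatedly with different random states yields different sample paths for the same parameters, which is the sense in which the generator produces multiple realizations.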
Remark 2.1. The existence and uniqueness of the latent variable X(t) can be guaranteed by imposing some mild conditions, such as the global Lipschitz condition, on the drift and diffusion terms of the SDE; see, e.g., [4]. In particular, Hayashi et al. have proven in [4, Theorem V.1] that when H>1/2 and μθ and σθ have multi-layer perceptron (MLP) network structures with the tanh activation function, the strong solution X(t) exists and is unique. Moreover, the globally Lipschitz networks (drift term and diffusion term) in (2.3) ensure that the price dynamics do not exhibit wild, unbounded behavior. This guarantees that the model behaves smoothly and helps prevent it from overfitting to noise or large market swings [27], thus providing more reliable and stable predictions for stock prices.
In order to further efficiently simulate relationships between widely separated time points, we additionally incorporate the self-attention module into the generator.
Self-attention. Let
with n being a positive integer number. Denote by
the value of Y(t) at time point
where Y(ti)∈Rm. Let W1,W2∈Rm×m be the learned weight matrices. Define
Set
where τj,i stands for the extent to which the model attends to the i-th time point when generating Y(tj) [28]. In other words, τj,i represents an attention weight coefficient between ti and tj, i,j=1,⋯,n. Let
with W3,W4∈Rm×m denoting the other two learned weight matrices. Define the output of the attention layer by
with
Next, we aim to combine this output with Y(ti) to increase global relations. Precisely denote the final output by
with
where ϵ is a learnable scalar parameter initialized as 0. Notably, introducing the learnable ϵ allows the network to first rely on the information in the local neighborhood (because this is easier), and then gradually learn to assign more weight to non-local cues.
Define the continuous version of O by
where the initial value
Each sample path of O(t) is infinite dimensional. O(t) will be regarded as the input to the discriminator. We now briefly describe the discriminator.
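The self-attention step described above can be sketched as follows, assuming the standard formulation of [28]: scores are inner products of two linear projections, the weights τj,i are a softmax over i, and the attended values are added back to Y(tj) with the learnable scale ϵ. The helper names and the toy inputs are hypothetical, and the continuous-time version O(t) is not covered by this sketch.

```python
import math

def matvec(w, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w]

def self_attention(ys, w1, w2, w3, w4, eps):
    """Return O_j = eps * a_j + Y(t_j) for a sequence ys of m-dimensional vectors."""
    keys = [matvec(w1, y) for y in ys]
    queries = [matvec(w2, y) for y in ys]
    values = [matvec(w3, y) for y in ys]
    out = []
    for j, q in enumerate(queries):
        scores = [sum(k_c * q_c for k_c, q_c in zip(k, q)) for k in keys]
        top = max(scores)
        expo = [math.exp(s - top) for s in scores]   # shift by the max for stability
        total = sum(expo)
        tau = [e / total for e in expo]              # tau[i] plays the role of tau_{j,i}
        mixed = [sum(tau[i] * values[i][c] for i in range(len(ys)))
                 for c in range(len(ys[0]))]
        attn = matvec(w4, mixed)
        out.append([eps * a + y for a, y in zip(attn, ys[j])])
    return out

# Toy input with m = 2 and three time points; identity weights for illustration.
ident = [[1.0, 0.0], [0.0, 1.0]]
seq = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
out_at_init = self_attention(seq, ident, ident, ident, ident, eps=0.0)
```

With ϵ=0, as at initialization, the layer returns its input unchanged, which matches the remark that the network first relies on local information and only gradually mixes in non-local cues.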
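Discriminator. Let
$$\xi_\rho:\mathbb{R}^{m}\to\mathbb{R}^{h},\quad f_\rho:[0,T]\times\mathbb{R}^{h}\to\mathbb{R}^{h},\quad g_\rho:[0,T]\times\mathbb{R}^{h}\to\mathbb{R}^{h\times m},\quad q_\rho:\mathbb{R}^{h}\to\mathbb{R},$$
where ξρ,fρ,gρ,qρ are all globally Lipschitz neural networks parameterised collectively by ρ, and the dimension h is a hyperparameter representing the size of the hidden state. The discriminator is expressed as another NSDE of the form: for H(t): [0,T]→Rh,
$$H(0)=\xi_\rho(O(0)),\qquad dH(t)=f_\rho(t,H(t))\,dt+g_\rho(t,H(t))\,dO(t),\qquad D=q_\rho(H(T)), \tag{2.4}$$
where gρ(t,H(t))dO(t) denotes the Riemann-Stieltjes integral related to O(t). The value D∈R is a function of the terminal hidden state H(T). It is the discriminator's score for real versus fake. To show intuitively how the extended fNSDE-Net works, we present its flowchart in Figure 3.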
Remark 2.2. The well-posedness of the SDE shown in (2.4) can be guaranteed since fρ and gρ are globally Lipschitz [5]. The construction of the discriminator (2.4) is inspired by the formulation of NCDE shown in [29] with respect to the control O(t). In particular, we note that the discriminator (2.4) is driven by fBm, which is different from the classical discriminator [5, (5)] driven by pure Brownian motion.
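To illustrate how a discriminator of this controlled type turns a path into a score, the following sketch discretizes the Riemann-Stieltjes integral with left-point sums over the increments of a scalar control path. The scalar hidden state, the tanh-based vector fields, and the parameter dictionary `rho` are all hypothetical simplifications chosen so that the functions stay globally Lipschitz; they are not the networks used in the paper.

```python
import math

def discriminator_score(path, horizon, rho):
    """Left-point scheme for dH = f(t, H) dt + g(t, H) dO, followed by a linear readout."""
    n = len(path) - 1
    dt = horizon / n
    h = math.tanh(rho["xi"] * path[0])               # initial hidden state from O(0)
    for k in range(n):
        t = k * dt
        drift = rho["f"] * math.tanh(h + t)
        vector_field = rho["g"] * math.tanh(h)
        h += drift * dt + vector_field * (path[k + 1] - path[k])
    return rho["q"] * h                              # score D as a function of H(T)

# Hypothetical parameters and a short toy control path.
rho = {"xi": 1.0, "f": 0.2, "g": 0.7, "q": 2.0}
score = discriminator_score([0.5, 0.55, 0.52, 0.6], horizon=1.0, rho=rho)
```

Because the path enters only through its increments and its initial value, the same routine can score either a generated path or an interpolated observed series.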
With the model specified, we are now in a position to focus on the data training. Suppose we observe an irregularly sampled time series
from Ytrue. Denote the linear interpolation of y by ˆy(t) such that
and define ˆH(t) by the following SDE:
where gρ(t,ˆH(t))dˆy(t) is the Riemann-Stieltjes integral, depending on the regularity of ˆy(t).
The training loss function applied in this work is the Wasserstein loss, which is a common loss function in GAN, see, e.g., [30,31]. Let
be the overall action of the generator with self-attention. Let
represent the overall action of the discriminator. Similarly, let
denote the overall action of the system (2.5). Then the generator is optimized by
and the discriminator is optimized by
where E[⋅] stands for the expectation. Training is performed via stochastic gradient descent [32] techniques as usual.
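As a small numerical illustration of the Wasserstein training objectives, the sketch below evaluates both objectives from batches of discriminator scores, with expectations replaced by sample means. It assumes the sign convention used for NSDE-GANs in [5], where the generator minimizes the mean score of generated paths and the discriminator maximizes the gap between generated and real scores; the function name and the score values are hypothetical.

```python
def wasserstein_objectives(fake_scores, real_scores):
    """Sample-mean versions of the generator and discriminator objectives."""
    mean_fake = sum(fake_scores) / len(fake_scores)
    mean_real = sum(real_scores) / len(real_scores)
    generator_objective = mean_fake                   # minimized over theta
    discriminator_objective = mean_fake - mean_real   # maximized over rho
    return generator_objective, discriminator_objective

# Hypothetical scores for two generated paths and two observed paths.
gen_obj, disc_obj = wasserstein_objectives([1.0, 3.0], [0.5, 1.5])
```

In practice both objectives would be differentiated with respect to the network parameters, and the discriminator must remain Lipschitz for the objective to approximate a Wasserstein distance.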
3. Numerical experiments
This section is devoted to comparing the extended fNSDE-Net with several other network models through numerical experiments to highlight the superior performance of the proposed method in terms of fitting and generalization. Before doing this, we first normalize the observed data y using the MinMaxScaler provided by the scikit-learn preprocessing library to avoid excessively large and unstable weight value distributions during the calculation process. Specifically, we define yscaled according to y as below:
$$y_{\mathrm{scaled}} := \frac{y-\min(y)}{\max(y)-\min(y)}.$$
The MinMax scaler is chosen here because it preserves the shape of the dataset without introducing any distortion, and Tee et al. [33] have employed this technique to normalize the data.
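For concreteness, min-max scaling to the unit interval, which is the default behavior of scikit-learn's MinMaxScaler, can be written in a few lines. The sketch below uses plain Python instead of the library so that it is self-contained; the function names and the sample prices are hypothetical.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def inverse_min_max(scaled, lo, hi):
    """Undo the scaling, e.g., to report generated paths in price units."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical closing prices.
prices = [180.0, 200.0, 220.0]
scaled = min_max_scale(prices)
restored = inverse_min_max(scaled, min(prices), max(prices))
```

The inverse map is needed whenever simulated paths are compared with a strike price or reported in the original currency units.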
We select the stock closing prices of CATL from May 28, 2023, to May 28, 2024, a total of 242 data points, as observations. Therefore,
Moreover, considering the timeliness of stocks, we intend to make short-term predictions of stock prices and use the last small part of the dataset as a testing set to demonstrate the generalization performance of the proposed method. Specifically, we take the first J1:=236 data for training and the remaining J2:=6 data for testing. Define
$$\{x^{k}_{\mathrm{train}}(j)\}_{j=1}^{J_1}$$
and
$$\{x^{k}_{\mathrm{test}}(j)\}_{j=1}^{J_2}$$
as the numerical solutions generated by the extended fNSDE-Net corresponding to the k-th sample of the training set and the testing set, respectively, where k=1,⋯,M with M being the total number of generated sample paths. Let the sample means of xktrain(⋅) and xktest(⋅) at the j-th time point be ˉxtrain(j) and ˉxtest(j), defined by
$$\bar x_{\mathrm{train}}(j) := \frac{1}{M}\sum_{k=1}^{M} x^{k}_{\mathrm{train}}(j), \qquad \bar x_{\mathrm{test}}(j) := \frac{1}{M}\sum_{k=1}^{M} x^{k}_{\mathrm{test}}(j).$$
The relative fitting error and generalization error are expressed as follows:
A series of numerical experiments are conducted, including ablation studies and comparisons with the classical NSDE-GAN method originally proposed by Kidger et al. [5], to show the effectiveness of our model. For more details on the hyperparameter settings and the implementation of our method, see the code available at https://github.com/JHUNAI/EfNSDE.
Table 2 presents the numerical results of fitting error ef and generalization error eg for different models.
Among these models, (1) represents our proposed model; (2) is the model developed by Hayashi et al. [4], which differs from model (1) in that it does not include a self-attention module; (3) is another model that differs from ours in that it uses standard Brownian motion instead of the fBm applied in our model; and (4) is the pure NSDE-GAN model proposed by Kidger et al. [5]. Based on the numerical results shown in Table 2, we can draw the following conclusions:
ⅰ) Compared with models (2) and (4), our model demonstrates clear advantages in terms of both fitting error and generalization error. Specifically, for a given sample size, the calculation error of our model is smaller.
ⅱ) Compared with model (3), although our model is slightly inferior in terms of fitting error, it shows significant superiority in the calculation of generalization error. In other words, the generalization error computed using model (1) is notably smaller than that obtained from model (3).
Also shown in Figure 4 are 10 sample trajectories generated by different models to intuitively illustrate the fitting and generalization performance of each model. Next, we further investigate the fitting and generalization error performance of different models from the perspective of strong and weak errors. These two errors will be calculated using the last time points of the training and testing sets as examples. The strong and weak errors of fitting and generalization are defined as follows:
Note that the weak error is related to the approximation of the probability law of the solution, while the strong error measures the deviation from the solution trajectory. Table 3 presents the strong and weak error behaviors of different models, where models (1)–(4) are consistent with those in Table 2. From Table 3, the following conclusions can be drawn:
ⅰ) Compared with models (2) and (4), our model (i.e., model (1)) exhibits smaller fitting and generalization errors in both strong and weak senses, indicating that our model has higher computational accuracy.
ⅱ) Although the strong and weak fitting error calculations of model (3) are slightly better than those of our proposed model (i.e., model (1)), its strong and weak generalization error calculations are significantly worse. Therefore, our proposed model shows a clear overall advantage.
This once again demonstrates the effectiveness of the extended fNSDE-Net.
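The distinction between strong and weak errors used in this section can be sketched numerically. The code below assumes the usual definitions at a single time point: the strong error averages the absolute deviation of each generated sample from the observed value, while the weak error takes the absolute deviation of the sample mean. These formulas, the function names, and the sample values are assumptions for illustration and may differ in normalization from the definitions used for Table 3.

```python
def strong_error(samples, target):
    """Mean absolute deviation of individual samples from the target value."""
    return sum(abs(s - target) for s in samples) / len(samples)

def weak_error(samples, target):
    """Absolute deviation of the sample mean from the target value."""
    return abs(sum(samples) / len(samples) - target)

# Hypothetical generated values at the last time point and a hypothetical observation.
generated = [1.0, 3.0]
observed = 2.0
e_strong = strong_error(generated, observed)
e_weak = weak_error(generated, observed)
```

In this toy case the samples straddle the observation, so the weak error vanishes while the strong error does not; in general the weak error defined this way never exceeds the strong error.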
4. Bermuda option pricing
This section aims at valuing a vanilla Bermuda call option based on the observed information y of CATL's stock price and the simulated paths in the testing set xktest(⋅), k=1,⋯,M.
Note that the Bermuda option is halfway between American and European options, and it can be exercised at a set of prescribed dates within the horizon. The main purpose of this section is to provide an effective way for pricing Bermuda options derived from CATL stock prices. The principal idea is motivated by the LSM method proposed by Longstaff et al. [6], which is a dynamic programming method based on iteratively selecting an optimal policy to estimate a continuation value that determines the optimal exercise strategy. However, unlike [6], our underlying asset prices (i.e., the simulated stock price paths) are not driven by a geometric Brownian motion (GBM), but are instead computed by a new method proposed in this paper. Next, we intend to use the simulated stock price data in the testing set to value the Bermuda option at the time (denoted as t0) corresponding to the observation value yj=238, and use this as a numerical example to describe the pricing procedure in detail.
Denote the time stamps corresponding to the observed data yj=239,⋯,yj=242 by t1,⋯,t4. Let the strike price K:=200 and the interest rate r:=0.5. For the sake of clarity, we take 10 sample paths as an example to demonstrate the pricing algorithm. Let Sji be the stock price on the i-th sample path generated at time j, where i=1,⋯,10, j=t1,⋯,t4. Suppose that the exercise dates are separated by an equidistant interval δt.
In Table 4, we present the simulated values of CATL stock price from t1 to t4 using the extended fNSDE-Net.
According to the LSM pricing rule [6], at the final exercise date, t4, the option should be exercised if it is in-the-money. However, before the final exercise date, the optimal strategy is to compare the immediate intrinsic value with the expected cash flow from the continuation value, and exercise the option if the intrinsic value is larger.
The cashflow at time t4 is calculated as max(St4i−K, 0), for i=1,⋯,10, and the relevant results are presented in Table 5.
The fitting data for computing the intrinsic value and continuation value at time t3 are shown in Table 6, where Rji represents the discounted cashflow on the i-th sample path generated at time j, and the multiplier e−rδt is used as the discount factor.
The intrinsic value at time t3 is calculated as (St3i−K). The continuation value for each sample path at time t3 is computed using the least squares method with power polynomial basis functions, based on St3i and the cashflow at t4. It is worth noting that Samanez et al. [34] demonstrated that power polynomials provide better approximation performance than other types of basis functions in the LSM method. Additionally, note that the 9-th path does not need to be considered, as it is out-of-the-money (i.e., St39−K<0). The calculated data for the continuation value and intrinsic value at t3 are shown in Table 7, and the cashflow can be updated by comparing these two values, as shown in Table 8.
Repeating the backward step from t4 to t3 and comparing the continuation and intrinsic values at times t2 and t1 (see Tables 9–13 for intermediate calculation results) yields the cashflow matrix shown in Table 14 and the optimal stopping time on each sample path; see Table 15. Now, the intrinsic value at time t0 is computed by
and the continuation value is calculated via Table 14, i.e., this continuation value is
Hence, the option valuation at time t0 is
In the pricing process described above, we used the LSM pricing framework with extended fNSDE-Net to price a vanilla Bermuda call option. Next, we conduct numerical experiments to compare the reliability of the option price estimates obtained using the LSM with NSDE-GAN against those obtained using our method under different sample sizes. Specifically, we compare the standard deviations (SD) and the lengths of the confidence intervals (CI) for the option price estimates.
Table 16 presents the numerical comparison results, from which it is evident that, for a given sample size, the standard deviation of the estimator calculated using LSM with extended fNSDE-Net (our method) is smaller than that obtained using LSM with NSDE-GAN method. This indicates that our method is more stable. Furthermore, the length of the confidence interval of the option estimator calculated by our method is shorter than that of the LSM with NSDE-GAN method, suggesting that the estimator obtained by our method is more reliable.
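The backward induction used in this section can be sketched as follows. This is a minimal LSM illustration for a Bermuda call under stated assumptions: the continuation value is regressed on a quadratic polynomial in the moneyness S/K−1 (a rescaling chosen here only for numerical stability), only in-the-money paths enter the regression, and the regression is skipped when fewer than three distinct in-the-money prices are available. The simulated prices, the time step, and all function names are hypothetical; only K=200 and r=0.5 are taken from the text.

```python
import math
import random

def fit_quadratic(xs, ys):
    """Least squares fit of y ~ c0 + c1*x + c2*x^2 via the normal equations."""
    powers = [sum(x ** p for x in xs) for p in range(5)]
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    for col in range(3):                              # elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            a[r] = [arc - f * acc for arc, acc in zip(a[r], a[col])]
            b[r] -= f * b[col]
    coef = [0.0] * 3
    for i in (2, 1, 0):                               # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, 3))) / a[i][i]
    return coef

def lsm_continuation(paths, strike, rate, dt):
    """Time-t0 continuation value; paths[i] lists prices at the exercise dates t1..tn."""
    disc = math.exp(-rate * dt)
    cash = [max(p[-1] - strike, 0.0) for p in paths]  # payoff at the final exercise date
    for j in range(len(paths[0]) - 2, -1, -1):
        cash = [disc * c for c in cash]               # discount cashflows back one date
        itm = [i for i, p in enumerate(paths) if p[j] > strike]
        xs = [paths[i][j] / strike - 1.0 for i in itm]
        if len(set(xs)) >= 3:
            c0, c1, c2 = fit_quadratic(xs, [cash[i] for i in itm])
            for i, x in zip(itm, xs):
                intrinsic = paths[i][j] - strike
                if intrinsic > c0 + c1 * x + c2 * x * x:
                    cash[i] = intrinsic               # exercise now on this path
    return disc * sum(cash) / len(cash)               # discount from t1 to t0 and average

def bermudan_call_value(s0, paths, strike, rate, dt):
    """Option value at t0: the larger of the intrinsic and the continuation value."""
    return max(max(s0 - strike, 0.0), lsm_continuation(paths, strike, rate, dt))

# Hypothetical simulated prices at four exercise dates on 200 paths.
rng = random.Random(4)
demo_paths = [[200.0 + rng.gauss(0.0, 8.0) for _ in range(4)] for _ in range(200)]
value = bermudan_call_value(198.0, demo_paths, strike=200.0, rate=0.5, dt=1.0 / 252)
```

In the setting of this section, `demo_paths` would be replaced by the stock price paths generated by the extended fNSDE-Net, rescaled back to price units.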
5. Conclusions
This paper proposes a new numerical method called extended fNSDE-Net, which is based on GANs with self-attention modules and fNSDE to generate and predict CATL's stock price. This new method not only generates multiple sample paths for the initial input in the sense of probability but also retains the long-term memory characteristics of the generated samples. Numerical experiments show the effectiveness of this new method. Additionally, we calculated the price of Bermuda options derived from stock price paths using the LSM method. The detailed pricing process is demonstrated, and the effectiveness of our pricing method is validated through comparative experiments. Regarding the limitations of this paper, we assume that the drift and diffusion terms of both the generator NSDE and the discriminator NSDE are globally Lipschitz continuous. However, this assumption is made primarily for the convenience of proving the well-posedness of the corresponding NSDE. If we replace the global Lipschitz condition with a weaker, non-global Lipschitz condition, the corresponding NSDE could potentially represent more complex dynamic models. However, this would also complicate the theoretical analysis, which is a topic for our future research.
Author contributions
Xiao Qi: writing-original draft, data curation, formal analysis, investigation, methodology, validation; Tianyao Duan: software, methodology, data curation; Lihua Wang: conceptualization, supervision, methodology, resources; Huan Guo: conceptualization, formal analysis, investigation, methodology, project administration, resources, supervision, writing-review & editing. All authors have read and agreed to the published version of the manuscript.
Use of Generative-AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 71601085, the Key Research and Development Project of Hubei Province, China under Grant 2020BCA084, the Research Fund of Jianghan University under Grant No. 2024JCYJ04, and the Ministry of Education Humanities and Social Sciences Research Project in China under Grant No. 23YJCZH062.
Conflict of interest
The authors declare no competing interests.