In option pricing, parametric models originating from the Black-Scholes-Merton framework have proven extremely persistent. However, machine learning models have recently entered the field with success, arguably owing to their flexible, non-parametric nature. A recently proposed LSTM-MLP deep learning architecture combines time series data with cross-sectional pricing information and avoids explicit volatility estimates. This LSTM-MLP model outperforms relevant benchmarks along several dimensions. In this research, we investigated whether a transformer-based alternative captures the inter-temporal characteristics of the data better than the LSTM-MLP model. We found that although the transformer performs better during the extreme market conditions of COVID-19, the LSTM-MLP architecture is superior overall.
Citation: Boye A. Høverstad, Morten Risstad, Lavrans K. Sagen. Transformers vs. LSTM-MLP for option pricing[J]. AIMS Mathematics, 2025, 10(11): 27152-27170. doi: 10.3934/math.20251193