Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method for classifying regimes in time series data, and in one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wk-means clustering algorithm applied to synthetic one-dimensional time series data, extending previous work by examining the dynamics of the algorithm and how varying the hyperparameters affects performance over different random initialisations. We compute simple metrics that we find useful for identifying high-quality clusterings. We then extend Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance with a sliced Wasserstein distance, yielding a method we call 'sliced Wasserstein k-means (sWk-means) clustering'. We apply sWk-means clustering to the problem of automated regime classification in multidimensional time series data, using synthetic data to demonstrate the validity and effectiveness of the approach. Finally, we show that the sWk-means method identifies distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks on some limitations of our approach and potential complementary or alternative approaches.
Citation: Qinmeng Luan, James Hamp. Automated regime classification in multidimensional time series data using sliced Wasserstein k-means clustering[J]. Data Science in Finance and Economics, 2025, 5(3): 387-418. doi: 10.3934/DSFE.2025016
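As a rough illustration of the approach described in the abstract, the sketch below clusters windows of multidimensional returns using a Monte Carlo estimate of the sliced Wasserstein distance: each window is projected onto random one-dimensional directions, and the one-dimensional Wasserstein distance between projections is computed from order statistics. The function names, hyperparameters, and the medoid-style centroid update are illustrative assumptions only; the paper's own algorithm (in particular its barycenter computation and initialisation scheme) may differ.

```python
# Minimal sketch of sliced-Wasserstein k-means-style regime clustering.
# Names, window handling, and the medoid-based centre update are assumptions
# for illustration, not the authors' exact implementation.
import numpy as np

def sliced_wasserstein(x, y, n_projections=50, rng=None):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between two
    equally sized point clouds x, y of shape (n_samples, dim)."""
    rng = np.random.default_rng(rng)
    dim = x.shape[1]
    # Random unit directions on the sphere.
    thetas = rng.normal(size=(n_projections, dim))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    # Project both samples onto each direction and sort: for 1D empirical
    # measures with equal weights, W_2 is the L2 distance of order statistics.
    x_proj = np.sort(x @ thetas.T, axis=0)
    y_proj = np.sort(y @ thetas.T, axis=0)
    return np.sqrt(np.mean((x_proj - y_proj) ** 2))

def swk_means(windows, k, n_iter=20, n_projections=50, seed=0):
    """Cluster a list of return windows (each of shape (window_len, dim)) by a
    medoid-style Lloyd iteration under the sliced Wasserstein distance."""
    rng = np.random.default_rng(seed)
    n = len(windows)
    # Precompute pairwise distances; they are reused in every iteration.
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = sliced_wasserstein(
                windows[i], windows[j], n_projections, rng)
    centres = rng.choice(n, size=k, replace=False)  # random initialisation
    for _ in range(n_iter):
        labels = np.argmin(dist[:, centres], axis=1)
        # Update each centre to the cluster medoid (the member minimising the
        # total within-cluster distance), a stand-in for a true barycenter.
        new_centres = centres.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) > 0:
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_centres[c] = members[np.argmin(within)]
        if np.array_equal(new_centres, centres):
            break
        centres = new_centres
    return labels, centres
```

In this sketch, each element of `windows` would be an array of shape (window_length, dim), for example log-returns of several foreign exchange spot rates over a rolling window; the returned labels then give a regime assignment per window.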