ESG-XAI: An explainable unsupervised feature selection pipeline for ESG datasets

Tristan Lim; Zhiyuan Wang; Yun Teng

doi:10.3934/DSFE.2026010

Data Science in Finance and Economics

2026, Volume 6, Issue 2: 277-314.

Research article

1. SUSS Academy, Singapore University of Social Sciences, Singapore
2. School of Business, Singapore University of Social Sciences, Singapore

Received: 08 September 2025 Revised: 22 December 2025 Accepted: 13 March 2026 Published: 10 May 2026
JEL Codes: C38, G34, M14, Q56

Problem
Environmental, Social, and Governance (ESG) panels are difficult to model: variables are measured on mixed scales, redundancy within silos (the E, S, and G pillars) is high, cross-silo alignment is weak, and labels suitable for supervision are scarce or disputed.
Method
We present a deployment-ready, end-to-end workflow that turns such disclosure matrices into train-only labels for evaluation, out-of-sample performance estimates, and audit-ready explanations that trace back to individual ESG indicators. We separated the system into three planes (Discovery, Prediction, and Explanation) with strict train/test boundaries. Discovery built a balanced multi-view latent representation (per-silo PCA fusion) and used evidence accumulation with a stability rule, minimizing the Proportion of Ambiguous Clustering (PAC), to choose the number of segments. When global discovery was weak, an evaluation-feasibility guard constructed label-agnostic, stratified base splits and deferred labeling to the training folds only. Prediction adopted nested labeling, with test points assigned to the nearest centroid, closing common leakage paths. Explanation was fold-local: we selected a balanced set of features per silo on the training fold and computed SHAP values only on the held-out test fold.
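As a rough illustration of the stability rule, the sketch below computes the Proportion of Ambiguous Clustering from a set of resampled clusterings and picks the candidate number of segments with the lowest value. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (consensus_matrix, pac), the ambiguity interval (0.1, 0.9), and the toy label runs are all hypothetical, and the abstract does not say which base clusterer or thresholds were used.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Pairwise co-assignment rates across resampled clusterings.

    Each run maps item index -> cluster id for the items drawn in that
    resample. Returns {(i, j): share of the runs containing both i and j
    in which the two items share a cluster}.
    """
    together, sampled = {}, {}
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[(i, j)] = sampled.get((i, j), 0) + 1
            if labels[i] == labels[j]:
                together[(i, j)] = together.get((i, j), 0) + 1
    return {pair: together.get(pair, 0) / n for pair, n in sampled.items()}


def pac(consensus, lower=0.1, upper=0.9):
    """Share of item pairs whose consensus value lies strictly inside
    (lower, upper). The interval is an assumed, commonly used default."""
    values = list(consensus.values())
    return sum(lower < v < upper for v in values) / len(values)


# Made-up label runs for 4 items; a real pipeline would obtain these by
# clustering resampled data for each candidate number of segments.
runs_by_k = {
    2: [{0: 0, 1: 0, 2: 1, 3: 1}, {0: 1, 1: 1, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 1, 3: 1}],
    3: [{0: 0, 1: 0, 2: 1, 3: 2}, {0: 0, 1: 1, 2: 2, 3: 2}, {0: 0, 1: 1, 2: 1, 3: 2}],
}

pac_by_k = {k: pac(consensus_matrix(runs)) for k, runs in runs_by_k.items()}
best_k = min(pac_by_k, key=pac_by_k.get)  # lowest PAC = most stable partition
print(pac_by_k, best_k)
```

In this toy case the two-segment runs agree on every pair (PAC of 0), while the three-segment runs leave half of the pairs ambiguous (PAC of 0.5), so two segments would be selected.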
Results
On a 2024 ESG panel (57 firms; 61 indicators: E = 15, S = 25, G = 21), 2D linear discriminants (PCA/NCA-LDA) attained a balanced accuracy of ≈ 0.917 with low dispersion under strict nested evaluation, outperforming higher-capacity baselines. These performance metrics quantified the out-of-sample consistency and generalization of the discovered latent structure and remained stable even when raw cluster assignments varied. Fold-local explanations with human-interpretable drivers indicated social variables as the dominant contributors, followed by environmental and governance variables.
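To make the reported metric concrete, the sketch below shows one way a single fold of a nested evaluation could be scored: centroids are derived from training-fold labels only, test points receive reference labels by nearest centroid, and a downstream model's predictions are scored with balanced accuracy (the unweighted mean of per-class recall). It is a hedged illustration rather than the paper's code; the function names, the 2D toy points, and the predicted list standing in for a fitted classifier are all assumptions.

```python
def centroids(points, labels):
    """Mean vector of each labelled group, computed on training data only."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


def nearest_centroid(point, cents):
    """Label of the centroid with the smallest squared Euclidean distance."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(point, cents[c])))


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy training fold: labels are assumed to come from clustering this fold only.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.5, 0.5), (1, 1), (5, 5), (6, 6), (4, 4), (0.2, 0.1)]

cents = centroids(train_x, train_y)  # no test information is used here
reference = [nearest_centroid(p, cents) for p in test_x]

# Hypothetical predictions from a classifier fitted on the training fold.
predicted = [0, 0, 1, 1, 0, 0]
score = balanced_accuracy(reference, predicted)
print(reference, round(score, 3))
```

Because the centroids never see the test fold, the reference labels cannot leak test information into training; repeating this over folds and averaging would give a summary comparable in kind to the figure reported above.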
Contributions
The result is a portable, auditable ESG analytics pipeline that enables evaluation when discovery is weak, reduces model risk, and delivers transparent outputs suitable for ESG use cases in small-$n$ settings.
Keywords: ESG analytics, consensus clustering, stability-based model selection, nested evaluation, explainable AI (XAI)
Citation: Tristan Lim, Zhiyuan Wang, Yun Teng. ESG-XAI: An explainable unsupervised feature selection pipeline for ESG datasets[J]. Data Science in Finance and Economics, 2026, 6(2): 277-314. doi: 10.3934/DSFE.2026010

© 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)