Environmental, Social, and Governance (ESG) panels are difficult to model, as variables live on mixed scales, redundancy within silos is high, cross-silo alignment is weak, and labels suitable for supervision are scarce or disputed.
We present a deployment-ready, end-to-end workflow that turns such disclosure matrices into train-only labels for evaluation, out-of-sample performance estimates, and audit-ready explanations that trace back to individual ESG indicators. For the system, we separated three planes (Discovery, Prediction, and Explanation) with strict train/test boundaries. Discovery built a balanced multi-view latent representation (per-silo PCA fusion) and used evidence accumulation with a stability rule to choose the number of segments (minimizing the Proportion of Ambiguous Clustering). When global discovery was weak, an evaluation-feasibility guard constructed label-agnostic, stratified base splits and deferred labeling to training folds only. Prediction adopted nested labeling with nearest-centroid assignment of test samples, closing common leakage paths. Explanation was fold-local: we selected a balanced set of features per silo on the training fold and computed SHAP values only on the held-out test fold.
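The nested labeling step can be illustrated with a minimal sketch. It is not the authors' implementation: the silo column layout, the use of plain k-means in place of evidence accumulation, the plain `KFold` split standing in for the stratified base splits, the LDA classifier on the fused latent, and the names `SILOS`, `fit_fused_latent`, `transform_fused_latent`, and `nested_label_cv` are all assumptions made for illustration. The point it shows is that scalers, per-silo PCA, and cluster centroids are fitted on training rows only, and test rows receive labels by nearest-centroid assignment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

# Hypothetical column layout matching the panel sizes (E = 15, S = 25, G = 21).
SILOS = {"E": slice(0, 15), "S": slice(15, 40), "G": slice(40, 61)}


def fit_fused_latent(X_train, n_components=2):
    """Fit one scaler and one PCA per silo, using training rows only."""
    models = {}
    for name, cols in SILOS.items():
        scaler = StandardScaler().fit(X_train[:, cols])
        pca = PCA(n_components=n_components).fit(scaler.transform(X_train[:, cols]))
        models[name] = (scaler, pca)
    return models


def transform_fused_latent(models, X):
    """Concatenate per-silo PCA scores; each silo contributes the same number of dimensions."""
    blocks = [
        pca.transform(scaler.transform(X[:, SILOS[name]]))
        for name, (scaler, pca) in models.items()
    ]
    return np.hstack(blocks)


def nested_label_cv(X, n_segments=3, n_splits=5, seed=0):
    """Return one balanced-accuracy score per fold under train-only labeling."""
    scores = []
    # Label-agnostic split: no cluster labels exist yet when the folds are drawn.
    splitter = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in splitter.split(X):
        models = fit_fused_latent(X[train_idx])
        Z_train = transform_fused_latent(models, X[train_idx])
        Z_test = transform_fused_latent(models, X[test_idx])

        # Labels are discovered on the training fold only (k-means as a stand-in).
        km = KMeans(n_clusters=n_segments, n_init=10, random_state=seed).fit(Z_train)
        y_train = km.labels_
        # Test rows are labeled by their nearest training centroid.
        y_test = km.predict(Z_test)

        clf = LinearDiscriminantAnalysis().fit(Z_train, y_train)
        scores.append(balanced_accuracy_score(y_test, clf.predict(Z_test)))
    return np.array(scores)


# Synthetic stand-in data with the same shape as the panel (57 firms, 61 indicators).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(57, 61))
scores = nested_label_cv(X_demo)
print(scores.round(3))
```

Because every fitted object inside the loop sees only `X[train_idx]`, a preprocessing or clustering step cannot pass information from the test fold into the labels or the classifier, which is the leakage path the nested design is meant to close.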
On a 2024 ESG panel (57 firms; 61 indicators: E = 15, S = 25, G = 21), 2D linear discriminants (PCA/NCA-LDA) attained balanced accuracy ≈ 0.917, with low dispersion under strict nested evaluation, outperforming higher-capacity baselines. Performance metrics quantified the out-of-sample consistency and generalization of discovered latent structure, remaining stable even when raw cluster assignments varied. Fold-local explanations with human-interpretable drivers indicated social variables as dominant contributors, followed by environmental and governance variables.
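For reference, balanced accuracy is the unweighted mean of per-class recall. The notation below is an assumption introduced here, not taken from the paper: $K$ is the number of segments, and $\mathrm{TP}_k$ and $\mathrm{FN}_k$ are the true positives and false negatives for segment $k$ on a held-out fold.

$$
\mathrm{BA} = \frac{1}{K} \sum_{k=1}^{K} \frac{\mathrm{TP}_k}{\mathrm{TP}_k + \mathrm{FN}_k}
$$

Under this definition a classifier that always predicts a single segment scores $1/K$, so the metric is not inflated by unequal segment sizes.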
A portable, auditable ESG analytics pipeline enables evaluation when discovery is weak, reduces model risk, and delivers transparent outputs suitable for ESG use cases in small-$n$ settings.
Citation: Tristan Lim, Zhiyuan Wang, Yun Teng. ESG-XAI: An explainable unsupervised feature selection pipeline for ESG datasets[J]. Data Science in Finance and Economics, 2026, 6(2): 277-314. doi: 10.3934/DSFE.2026010