Mahalanobis-geometry imputation for multivariate data with missing entries

Alvaro H. Salas S.; David L. Ocampo R.; Lorenzo J. Martínez H.

doi:10.3934/math.2026600

AIMS Mathematics

2026, Volume 11, Issue 5: 14641-14654. doi: 10.3934/math.2026600


Research article


1. Department of Mathematics and Statistics, Universidad Nacional de Colombia, Manizales, Colombia
2. Universidad de Caldas, Manizales, Colombia

Received: 08 February 2026; Revised: 23 April 2026; Accepted: 29 April 2026; Published: 25 May 2026
MSC: 62F35, 62H12, 62H25, 62J10, 65F10

Abstract

Missing entries in multivariate data distort not only marginal summaries but also the covariance geometry that governs scale-adjusted and correlation-aware comparisons between observations. Motivated by covariance-sensitive downstream tasks, this paper develops a deterministic imputation framework driven by Mahalanobis distance. The first stage is a linear frozen-covariance procedure: missing entries are temporarily replaced by simple columnwise values, a fixed covariance matrix is computed, and the sum of the nonconstant squared Mahalanobis distances is minimized with respect to the unknown entries. Since the inverse covariance is fixed at that stage, the objective is quadratic and the first-order optimality conditions reduce to a linear system. The second stage is a nonlinear covariance-updating refinement in which the covariance matrix depends on the imputed values themselves and the optimization is performed locally, using the linear solution as initializer. We derive a compact matrix representation of the linear objective, give a sufficient full-rank condition guaranteeing uniqueness of the stationarity system, discuss the bias induced by freezing the covariance, and provide a regularized fallback for singular or ill-conditioned systems. The framework also clarifies its scope with respect to MCAR, MAR-type, and structured block masks, and uses covariance stabilization only as a numerical safeguard rather than as a determinant-minimization estimator. A repeated-mask experiment on the red wine quality dataset shows that the Mahalanobis method substantially improves on mean imputation at all masking levels and becomes the strongest among the tested methods at the highest missingness level considered. The resulting method is transparent, reproducible, and intended for moderate continuous-data settings in which preserving empirical covariance geometry is more important than fitting a large black-box model.
Keywords: missing data, imputation, Mahalanobis distance, sample covariance matrix, covariance geometry, regularized linear systems, MCAR masking
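
The first (linear, frozen-covariance) stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not spell out: missing entries are initialized with column means, both the mean vector and the covariance are frozen after that fill (so the quadratic objective decouples row by row), and the regularized fallback is a simple ridge term. The function name `frozen_mahalanobis_impute` and the parameters `ridge` and `cond_limit` are hypothetical.

```python
import numpy as np

def frozen_mahalanobis_impute(X, ridge=1e-8, cond_limit=1e10):
    """Sketch of a frozen-covariance Mahalanobis imputation (assumed form)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                      # True where an entry is missing
    col_means = np.nanmean(X, axis=0)       # simple columnwise initial values
    X0 = np.where(mask, col_means, X)

    # Freeze the mean and covariance computed from the mean-filled data.
    mu = X0.mean(axis=0)
    S = np.atleast_2d(np.cov(X0, rowvar=False))
    p = X.shape[1]
    if np.linalg.cond(S) > cond_limit:      # assumed ridge safeguard
        S = S + ridge * np.eye(p)
    P = np.linalg.inv(S)                    # fixed inverse covariance

    X_imp = X0.copy()
    for i in np.flatnonzero(mask.any(axis=1)):
        m = mask[i]                         # missing coordinates of row i
        o = ~m                              # observed coordinates of row i
        # Stationarity of (x - mu)^T P (x - mu) in the missing block:
        #   P_mm (x_m - mu_m) = -P_mo (x_o - mu_o)
        A = P[np.ix_(m, m)]
        b = -P[np.ix_(m, o)] @ (X[i, o] - mu[o])
        if np.linalg.cond(A) > cond_limit:  # assumed regularized fallback
            A = A + ridge * np.eye(int(m.sum()))
        X_imp[i, m] = mu[m] + np.linalg.solve(A, b)
    return X_imp

# Synthetic demo with strongly correlated columns and a 10% MCAR mask.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X_true = np.column_stack([
    z,
    2.0 * z + 0.1 * rng.normal(size=200),
    -z + 0.1 * rng.normal(size=200),
])
miss = rng.random(X_true.shape) < 0.10
X_miss = np.where(miss, np.nan, X_true)
X_hat = frozen_mahalanobis_impute(X_miss)
```

Under these assumptions each row's linear system involves only the block of the inverse covariance indexed by that row's missing coordinates, which is positive definite whenever the frozen covariance is. The nonlinear covariance-updating refinement of the second stage is not covered by this sketch.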
Citation: Alvaro H. Salas S., David L. Ocampo R., Lorenzo J. Martínez H. Mahalanobis-geometry imputation for multivariate data with missing entries. AIMS Mathematics, 2026, 11(5): 14641-14654. doi: 10.3934/math.2026600



© 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
