Research article

Mahalanobis-geometry imputation for multivariate data with missing entries

  • Published: 25 May 2026
  • MSC: 62F35, 62H12, 62H25, 62J10, 65F10

  • Missing entries in multivariate data distort not only marginal summaries but also the covariance geometry that governs scale-adjusted and correlation-aware comparisons between observations. Motivated by covariance-sensitive downstream tasks, this paper develops a deterministic imputation framework driven by Mahalanobis distance. The first stage is a linear frozen-covariance procedure: missing entries are temporarily replaced by simple columnwise values, a fixed covariance matrix is computed, and the sum of the nonconstant squared Mahalanobis distances is minimized with respect to the unknown entries. Since the inverse covariance is fixed at that stage, the objective is quadratic and the first-order optimality conditions reduce to a linear system. The second stage is a nonlinear covariance-updating refinement in which the covariance matrix depends on the imputed values themselves and the optimization is performed locally, using the linear solution as initializer. We derive a compact matrix representation of the linear objective, give a sufficient full-rank condition guaranteeing uniqueness of the stationarity system, discuss the bias induced by freezing the covariance, and provide a regularized fallback for singular or ill-conditioned systems. The framework also clarifies its scope with respect to MCAR, MAR-type, and structured block masks, and uses covariance stabilization only as a numerical safeguard rather than as a determinant-minimization estimator. A repeated-mask experiment on the red wine quality dataset shows that the Mahalanobis method substantially improves on mean imputation at all masking levels and becomes the strongest among the tested methods at the highest missingness level considered. The resulting method is transparent, reproducible, and intended for moderate continuous-data settings in which preserving empirical covariance geometry is more important than fitting a large black-box model.
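The linear frozen-covariance stage described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the squared Mahalanobis distances are measured from each observation to the frozen column mean (the abstract does not spell out the exact objective), that the temporary columnwise fill is the column mean, and that the regularized fallback is a small ridge added to the covariance. The function name `frozen_cov_impute` and the parameter `ridge` are hypothetical. Under these assumptions the objective separates by row, and the stationarity condition for a row with missing index set m and observed set o is the linear system P_mm (x_m - mu_m) = -P_mo (x_o - mu_o), where P is the frozen inverse covariance.

```python
import numpy as np

def frozen_cov_impute(X, ridge=1e-8):
    """Illustrative frozen-covariance Mahalanobis imputation (hypothetical helper).

    Assumption: each row's squared Mahalanobis distance to the frozen column
    mean is minimized over that row's missing entries (marked as NaN).
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)

    # Step 1: temporary columnwise fill (assumed here to be the column mean).
    mu = np.nanmean(X, axis=0)
    filled = np.where(miss, mu, X)

    # Step 2: freeze the covariance; the small ridge is an assumed numerical
    # safeguard for singular or ill-conditioned covariance matrices.
    S = np.cov(filled, rowvar=False)
    P = np.linalg.inv(S + ridge * np.eye(X.shape[1]))

    # Step 3: with P fixed the objective is quadratic, so each incomplete row
    # is imputed by solving one small linear system.
    out = filled.copy()
    for i in np.flatnonzero(miss.any(axis=1)):
        m = miss[i]
        o = ~m
        if not o.any():
            continue  # nothing observed in this row: keep the column means
        rhs = -P[np.ix_(m, o)] @ (X[i, o] - mu[o])
        out[i, m] = mu[m] + np.linalg.solve(P[np.ix_(m, m)], rhs)
    return out


# Small synthetic demonstration (made-up data, not the wine dataset).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
X_true = rng.normal(size=(50, 3)) @ A
mask = rng.random(X_true.shape) < 0.2
X_obs = np.where(mask, np.nan, X_true)
X_imp = frozen_cov_impute(X_obs)
```

Observed entries are left unchanged and only the masked entries are overwritten. The covariance-updating second stage from the abstract is not shown; in this sketch it would start from `X_imp` and re-estimate the covariance as the imputed values change.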

    Citation: Alvaro H. Salas S., David L. Ocampo R., Lorenzo J. Martínez H. Mahalanobis-geometry imputation for multivariate data with missing entries[J]. AIMS Mathematics, 2026, 11(5): 14641-14654. doi: 10.3934/math.2026600

  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)