We perform a numerical study of autoencoder deep neural networks (DNNs) in which the input and output vectors have the same dimension. Our focus is on the fixed points (FPs) arising in these DNNs. We show that the existence and number of these FPs depend on the distribution of the randomly initialized DNN weight matrices. We first consider initialization with independently and identically distributed (i.i.d.) light-tailed weights (e.g., Gaussian) and show the existence of a single stable FP for a wide class of DNN architectures. In contrast, for heavy-tailed distributions (e.g., Cauchy), which typically appear after training, multiple stable FPs emerge. We observe an intriguing non-monotone dependence of the number of FPs on the DNN's depth. Finally, we link our results for untrained DNNs to trained ones by showing that multiple FPs emerge after training DNNs with light-tailed initialization.
Citation: L. Berlyand, O. Krupchytskyi, V. Slavin, Random weights of DNNs and emergence of fixed points, Networks and Heterogeneous Media, 21 (2026), 170–182. https://doi.org/10.3934/nhm.2026007
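To make the setting concrete, the following is a minimal sketch (not the paper's experimental code) of the kind of computation the study performs: a randomly initialized map with square i.i.d. weight matrices is iterated from many random starting points, and the distinct limits are counted as stable fixed points. The dimension, depth, tanh activation, 1/sqrt(n) scaling, and the `make_network` / `find_fixed_point` helpers are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def make_network(dim, depth, weight_sampler, rng):
    """Build a depth-layer map x -> phi(W_L ... phi(W_1 x)) with square i.i.d. weights."""
    weights = [weight_sampler(rng, (dim, dim)) / np.sqrt(dim) for _ in range(depth)]
    def forward(x):
        for W in weights:
            x = np.tanh(W @ x)  # any Lipschitz nonlinearity could be used here
        return x
    return forward

def find_fixed_point(forward, x0, tol=1e-9, max_iter=10_000):
    """Picard iteration x_{k+1} = F(x_k); returns the limit if it converges."""
    x = x0
    for _ in range(max_iter):
        x_next = forward(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return None  # no convergence within max_iter

rng = np.random.default_rng(0)
dim, depth, n_starts = 32, 5, 50

# Light-tailed vs. heavy-tailed i.i.d. weight entries.
gaussian = lambda g, shape: g.standard_normal(shape)
cauchy = lambda g, shape: g.standard_cauchy(shape)

for name, sampler in [("Gaussian", gaussian), ("Cauchy", cauchy)]:
    F = make_network(dim, depth, sampler, rng)
    fps = []
    for _ in range(n_starts):
        fp = find_fixed_point(F, rng.standard_normal(dim))
        # Keep only limits not already recorded (up to a small tolerance).
        if fp is not None and not any(np.linalg.norm(fp - q) < 1e-6 for q in fps):
            fps.append(fp)
    print(f"{name}: {len(fps)} distinct stable fixed point(s) found")
```

In this toy setup, the light-tailed (Gaussian) initialization would typically yield a single attracting limit across starting points, while the heavy-tailed (Cauchy) entries can produce several distinct limits; the counts printed by this sketch are illustrative and depend on the assumed scaling and activation.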