Research article Special Issues

Testing for correlation in Gaussian databases via local decision making

  • Published: 01 April 2025
  • MSC : 60F10, 62F03, 62F05, 62H20, 62H22, 68P15, 68P27

  • We propose a computationally efficient statistical test for detecting correlation between two Gaussian databases. The problem is formulated as a hypothesis test: under the null hypothesis, the databases are independent; under the alternative hypothesis, they are correlated but subject to an unknown row permutation. We derive bounds on both type Ⅰ and type Ⅱ error probabilities and demonstrate that the proposed test outperforms a recently introduced method across a broad range of parameter settings. The test statistic is based on a sum of dependent indicator random variables. To effectively bound the type Ⅰ error probability, we introduce a novel graph-theoretic technique for bounding the $ k $-th moments of such statistics.

    Citation: Ran Tamir. Testing for correlation in Gaussian databases via local decision making[J]. AIMS Mathematics, 2025, 10(4): 7721-7766. doi: 10.3934/math.2025355

    Related Papers:

  • We propose a computationally efficient statistical test for detecting correlation between two Gaussian databases. The problem is formulated as a hypothesis test: under the null hypothesis, the databases are independent; under the alternative hypothesis, they are correlated but subject to an unknown row permutation. We derive bounds on both type Ⅰ and type Ⅱ error probabilities and demonstrate that the proposed test outperforms a recently introduced method across a broad range of parameter settings. The test statistic is based on a sum of dependent indicator random variables. To effectively bound the type Ⅰ error probability, we introduce a novel graph-theoretic technique for bounding the $ k $-th moments of such statistics.



    加载中


    [1] P. Ohm, Broken promises of privacy: Responding to the surprising failure of anonymization, UCLA Law Rev., 57 (2009), 1701.
    [2] J. Sedayao, R. Bhardwaj, N. Gorade, Making big data, privacy, and anonymization work together in the enterprise: Experiences and issues, In: 2014 IEEE International Congress on Big Data, USA: IEEE, 2014,601–607. https://doi.org/10.1109/BigData.Congress.2014.92
    [3] L. Sweeney, Weaving technology and policy together to maintain confidentiality, J. Law Med. Ethics, 25 (1997), 98–110. https://doi.org/10.1111/j.1748-720X.1997.tb01885.x doi: 10.1111/j.1748-720X.1997.tb01885.x
    [4] F. M. Naini, J. Unnikrishnan, P. Thiran, M. Vetterli, Where you are is who you are: User identification by matching statistics, IEEE T. Inf. Foren. Sec., 11 (2016), 358–372. https://doi.org/10.1109/TIFS.2015.2498131 doi: 10.1109/TIFS.2015.2498131
    [5] A. Datta, D. Sharma, A. Sinha, Provable de-anonymization of large datasets with sparse dimensions, In: International Conference on Principles of Security and Trust, Berlin, Germany: Springer, 2012,229–248. https://doi.org/10.1007/978-3-642-28641-4_13
    [6] A. Narayanan, V. Shmatikov, Robust de-anonymization of large sparse datasets, In: 2008 IEEE Symposium on Security and Privacy, USA: IEEE, 2008,111–125. https://doi.org/10.1109/SP.2008.33
    [7] N. Takbiri, A. Houmansadr, D. L. Goeckel, H. Pishro-Nik, Matching anonymized and obfuscated time series to users' profiles, IEEE T. Inform. Theory, 65 (2019), 724–741. https://doi.org/10.1109/TIT.2018.2873134 doi: 10.1109/TIT.2018.2873134
    [8] G. Wondracek, T. Holz, E. Kirda, C. Kruegel, A practical attack to de-anonymize social network users, In: 2010 IEEE Symposium on Security and Privacy, USA: IEEE, 2010,223–238. https://doi.org/10.1109/SP.2010.21
    [9] J. Su, A. Shukla, S. Goel, A. Narayanan, De-anonymizing web browsing data with social networks, In: Proc. 26th Int. Conf. World Wide Web, 2017, 1261–1269. https://doi.org/10.1145/3038912.3052714
    [10] L. Bilge, T. Strufe, D. Balzarotti, E. Kirda, All your contacts are belong to us: Automated identity theft attacks on social networks, In: Proc. 18th Int. Conf. World wide web, 2009,551–560. https://doi.org/10.1145/1526709.1526784
    [11] M. Srivatsa M. Hicks, Deanonymizing mobility traces: Using social network as a side-channel, In: Proc. ACM Conf. Comput. Commun. Secur., 2012,628–637. https://doi.org/10.1145/2382196.2382262
    [12] Z. Cheng, J. Caverlee, K. Lee, You are where you tweet: A content based approach to geo-locating Twitter users, In: Proc. 19th ACM Int. Conf. Inf. Knowl. Manage., 2010,759–768. https://doi.org/10.1145/1871437.1871535
    [13] Z. K, B. Nazer, Detecting correlated Gaussian databases, In: IEEE International Symposium on Information Theory, Finland: IEEE, 2022, 2064–2069. https://doi.org/10.1109/ISIT50566.2022.9834731
    [14] D. Elimelech, W. Huleihel, Phase transitions in the detection of correlated databases, In: International Conference on Machine Learning (PMLR), 2023, 9246–9266.
    [15] V. Paslev, W. Huleihel, Testing dependency of unlabeled databases, IEEE T. Inform. Theory, 70 (2024), 7410–7431. https://doi.org/10.1109/TIT.2024.3442977 doi: 10.1109/TIT.2024.3442977
    [16] D. Cullina, P. Mittal, N. Kiyavash, Fundamental limits of database alignment, In: IEEE International Symposium on Information Theory, USA: IEEE, 2018,651–655. https://doi.org/10.1109/ISIT.2018.8437908
    [17] O. E. Dai, D. Cullina, N. Kiyavash, Database alignment with Gaussian features, In: Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, 2019, 3225–3233.
    [18] O. E. Dai, D. Cullina, N. Kiyavash, Achievability of nearly-exact alignment for correlated Gaussian databases, In: IEEE International Symposium on Information Theory, USA: IEEE, 2020, 1230–1235. https://doi.org/10.1109/ISIT44484.2020.9174507
    [19] F. Shirani, S. Garg, E. Erkip, A concentration of measure approach to database de-anonymization, In: IEEE International Symposium on Information Theory, France: IEEE, 2019, 2748–2752. https://doi.org/10.1109/ISIT.2019.8849392
    [20] S. Bakırtaş, E. Erkip, Database matching under column deletions, In: IEEE International Symposium on Information Theory, Australia: IEEE, 2021, 2720–2725. https://doi.org/10.1109/ISIT45174.2021.9518145
    [21] S. Bakırtaş, E. Erkip, Database matching under adversarial column deletions, In: IEEE Information Theory Workshop, France: IEEE, 2023,181–185. https://doi.org/10.1109/ITW55543.2023.10161615
    [22] S. Bakırtaş, E. Erkip, Distribution-agnostic database de-anonymization under synchronization errors, In: IEEE International Workshop on Information Forensics and Security (WIFS), Germany: IEEE, 2023, 1–6. https://doi.org/10.1109/ITW55543.2023.10161615
    [23] S. Bakırtaş, E. Erkip, Database matching under noisy synchronization errors, IEEE T. Inform. Theory, 70 (2024), 4335–4367. https://doi.org/10.1109/TIT.2024.3388990 doi: 10.1109/TIT.2024.3388990
    [24] M. Chertkov, L. Kroc, F. Krzakala, M. Vergassola, L. Zdeborová, Inference in particle tracking experiments by passing messages between images, In: Proceedings of the National Academy of Sciences, 107 (2010), 7663–7668. https://doi.org/10.1073/pnas.0910994107
    [25] D. Kunisky, J. Niles-Weed, Strong recovery of geometric planted matchings, In: ACM-SIAM Symposium on Discrete Algorithms, 2022,834–876. https://doi.org/10.1137/1.9781611977073.36
    [26] P. Pedarsani, M. Grossglauser, On the privacy of anonymized networks, In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, 1235–1243. https://doi.org/10.1145/2020408.2020596
    [27] J. Ding, Z. Ma, Y. Wu, J. Xu, Efficient random graph matching via degree profiles, Probab. Theory Rel., 179 (2021), 29–115. https://doi.org/10.1007/s00440-020-00997-4 doi: 10.1007/s00440-020-00997-4
    [28] Y. Wu, J. Xu, S. H. Yu, Testing correlation of unlabeled random graphs, Ann. Appl. Probab., 33 (2023), 2519–2558. https://doi.org/10.1214/22-AAP1786 doi: 10.1214/22-AAP1786
    [29] C. Mao, Y. Wu, J. Xu, S. H. Yu, Testing network correlation efficiently via counting trees, Ann. Stat., 52 (2024), 2483–2505. https://doi.org/10.1214/23-AOS2261 doi: 10.1214/23-AOS2261
    [30] Y. Wu, J. Xu, H. Y. Sophie, Settling the sharp reconstruction thresholds of random graph matching, IEEE T. Inform. Theory, 68 (2022), 5391–5417. https://doi.org/10.1109/TIT.2022.3169005 doi: 10.1109/TIT.2022.3169005
    [31] L. Ganassali, Sharp threshold for alignment of graph databases with Gaussian weights, In: Mathematical and Scientific Machine Learning, 2 (2022), 314–335.
    [32] S. M. Kay, Fundamentals of statistical signal processing: Detection theory, University of Rhode Island: Prentice Hall PTR, 1998.
    [33] M. Basseville, I. V. Nikiforov, Detection of abrupt changes: theory and application, Prentice-Hall, 1993.
    [34] H. W. Kuhn, The Hungarian Method for the assignment problem, Nav. Res. Logist. Quart., 2 (1955), 83–97. https://doi.org/10.1002/nav.3800020109 doi: 10.1002/nav.3800020109
    [35] H. W. Kuhn, Variants of the Hungarian method for assignment problems, Nav. Res. Logist. Quart., 3 (1956), 253–258. https://doi.org/10.1002/nav.3800030404 doi: 10.1002/nav.3800030404
    [36] J. Munkres, Algorithms for the assignment and transportation problems, J. Soc. Ind. Appl. Math., 5 (1957), 32–38. https://doi.org/10.1137/0105003 doi: 10.1137/0105003
    [37] J. Edmonds, R. M. Karp, Theoretical improvements in algorithmic efficiency for network flow problems, JACM, 19 (1972), 248–264. https://doi.org/10.1145/321694.321699 doi: 10.1145/321694.321699
    [38] S. Janson, New versions of Suen's correlation inequality, Random Struct. Algor., 13 (1998), 467–483. https://doi.org/10.1002/(SICI)1098-2418(199810/12)13:3/4<467::AID-RSA15>3.0.CO;2-W doi: 10.1002/(SICI)1098-2418(199810/12)13:3/4<467::AID-RSA15>3.0.CO;2-W
    [39] R. Tamir, N. Merhav, N. Weinberger, A. G. i Fàbregas, Large deviations behavior of the logarithmic error probability of random codes, IEEE T. Inform. Theory, 66 (2020), 6635–6659. https://doi.org/10.1109/TIT.2020.2995136 doi: 10.1109/TIT.2020.2995136
    [40] L. V. Truong, G. Cocco, J. Font-Segura, A. G. i Fàbregas, Concentration properties of random codes, IEEE T. Inform. Theory, 69 (2023), 7499–7537. https://doi.org/10.1109/TIT.2023.3312326 doi: 10.1109/TIT.2023.3312326
    [41] N. Merhav, Error exponents of typical random codes, IEEE T. Inform. Theory, 64 (2018), 6223–6235. https://doi.org/10.1109/TIT.2018.2834503 doi: 10.1109/TIT.2018.2834503
    [42] N. Merhav, Statistical physics and information theory, Found. Trends Commun., 6 (2009), 1–212. https://doi.org/10.1561/0100000052 doi: 10.1561/0100000052
    [43] C. Bunte, A. Lapidoth, On the listsize capacity with feedback, IEEE T. Inform. Theory, 60 (2014), 6733–6748. https://doi.org/10.1109/TIT.2014.2355815 doi: 10.1109/TIT.2014.2355815
    [44] S. Li, Concise formulas for the area and volume of a hyperspherical cap, Asian J. Math. Stat., 4 (2011), 66–70. https://doi.org/10.3923/ajms.2011.66.70 doi: 10.3923/ajms.2011.66.70
    [45] J. Riordan, An introduction to combinatorial analysis, New York: Wiley, 1980.
  • Reader Comments
  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(921) PDF downloads(82) Cited by(0)

Article outline

Figures and Tables

Figures(15)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog