
We propose a computationally efficient statistical test for detecting correlation between two Gaussian databases. The problem is formulated as a hypothesis test: under the null hypothesis, the databases are independent; under the alternative hypothesis, they are correlated but subject to an unknown row permutation. We derive bounds on both type Ⅰ and type Ⅱ error probabilities and demonstrate that the proposed test outperforms a recently introduced method across a broad range of parameter settings. The test statistic is based on a sum of dependent indicator random variables. To effectively bound the type Ⅰ error probability, we introduce a novel graph-theoretic technique for bounding the k-th moments of such statistics.
Citation: Ran Tamir. Testing for correlation in Gaussian databases via local decision making[J]. AIMS Mathematics, 2025, 10(4): 7721-7766. doi: 10.3934/math.2025355
With the thriving use of smart devices and the growing popularity of big data, various institutions have been collecting more and more personal data from users, which are then either published or sold for research or profit-making purposes. Although the published data are typically anonymized (i.e., the explicit attributes of the users, such as names and dates of birth, are erased), there has been increasing concern about potential privacy leakage from anonymized data, approached from juridical [1] and corporate [2] points of view. These concerns are also expressed in the literature regarding successful practical de-anonymization attacks on real data; the reader is referred to [3,4,5,6,7] for a non-exhaustive list. We briefly elaborate on some of these studies. In [4], the task of identifying users from the statistics of their behavioral patterns, based on call data records, web browsing histories, or GPS trajectories, was studied. It was demonstrated that a large fraction of users can be easily identified, given only histograms of their data; hence, these histograms can act as users' fingerprints. In [6], the authors presented a class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records, and so on. Their techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. More studies regarding practical de-anonymization attacks on real data have been conducted with a specific focus on social networks; in this case, the reader is referred to [8,9,10,11,12] and to the references therein. For example, in [8], a novel de-anonymization attack that exploits group membership information that is available on social networking sites was introduced. The authors showed that information about the group memberships of a user (i.e., the groups of a social network to which a user belongs) is sufficient to uniquely identify this person or at least to significantly reduce the set of possible candidates. Although extremely valuable, these lines of work do not provide a complete rigorous understanding of the conditions under which anonymized databases are vulnerable to privacy attacks; thus, more fundamental research on these topics is still needed.
Two fundamental problems of statistical inference in database models are correlation detection and alignment recovery. Correlation detection between two databases is basically a hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternative hypothesis, a permutation exists, for which the databases are correlated. In this task, the main objective is to achieve the best trade-off between type Ⅰ and type Ⅱ error probabilities. In the problem of database alignment recovery, one assumes that the two databases are correlated and wants to estimate the underlying permutation. While these two problems are intimately related, this paper is devoted solely to correlation detection.
Correlation detection of databases with n sequences, each containing d independent Gaussian entries, has recently been studied in [13,14], while correlation detection of databases with independent entries and general distributions has lately been explored in [15]. In the present manuscript, we continue along this line of work and analyze a simple statistical test that solves the problem of correlation detection between two Gaussian databases relatively efficientlyⅠ. The tester first calculates a statistic which is based on local decisions for all n^2 pairs of sequences and then makes a final decision in favor of one of the hypotheses based on this statistic. Since the proposed statistical test is based on a sum of indicator random variables which are weakly dependent, the analysis of the two error probabilities, especially the type Ⅰ probability of error, turns out to be highly non-trivial, and this is the main technical contribution of this work.
ⅠThe proposed statistical test has a computational complexity of {\cal O}(n^{2}) , while the generalized likelihood ratio (GLR) test has a computational complexity of {\cal O}(n^{3}) . More on the GLR test can be found in Section 4.2.
We compare the performances of the proposed statistical test with those of [13]. The main problem in [13] was to identify the smallest possible correlation coefficient between corresponding entries of the databases, as a function of n and d, such that the sum of the type Ⅰ and type Ⅱ error probabilities can be driven to zero as n and d tend to infinity. While the detector that was proposed in [13] can also be used for non-asymptotic correlation values, we show that for a wide range of parameter choices, its performance is inferior to that of the newly proposed statistical test. In addition to the comparison between the analytically derived theoretical guarantees, we also present Monte–Carlo simulation results for specific parameter values, which also demonstrate that the proposed statistical test outperforms the previous one.
Alignment recovery of correlated databases has been investigated from an information-theoretic point of view. The database alignment problem was originally introduced by Cullina et al. [16]. The discrete setting was studied in [16], which derived achievability and converse bounds in terms of a newly introduced measure of correlation, called the cycle mutual information. Exact recovery of the underlying permutation for correlated Gaussian databases was studied in [17], and a follow-up work extended the results to partial recovery [18]. A typicality-based framework for permutation estimation was investigated in [19]. Database matching under random column deletions and under adversarial column deletions was studied in [20] and [21], respectively. Theoretical guarantees for the de-anonymization of random correlated databases without prior knowledge of the data distribution were proposed in [22]. The problem of database matching under noisy synchronization errors was recently studied in [23]. The Gaussian database alignment recovery problem is equivalent to a certain idealized tracking problem studied in [24,25]. The recovery problem with two correlated random graphs has also been investigated in the past few years. This line of work started with the correlated Erdős–Rényi graph model with dependent Bernoulli edge pairs [26]. A subsequent work studied the recovery problem in the Gaussian setting [27]. More recent papers investigated the corresponding detection problem for correlation between graphs [28,29]. It has lately been proved [30,31] that detecting whether Gaussian graphs are correlated is as difficult as recovering the node labels.
The continuation of this paper is organized as follows. In Section 2, we establish the notation conventions. In Section 3, we formalize the model and formulate our main objectives. In Section 4, we state some results from previous work and discuss the GLR test. In Section 5, we state the new proposed statistical test and then provide and discuss the main results of this work. In Section 6, we introduce the novel graph-theoretic concepts that have paved the way to handling the type Ⅰ probability of error of the proposed test. In Section 7, we compare the performances of the proposed test and an existing one. Section 8 concludes the work and suggests some open avenues for future research. All the results of this work are proved in the appendices.
Throughout the paper, random variables will be denoted by capital letters, and the specific values they take will be denoted by the corresponding lower case letters. Random vectors and their realizations will be denoted, respectively, by capital letters and the corresponding lowercase letters, both in bold font. For example, the random vector \boldsymbol{X} = (X_{1}, X_{2}, \ldots, X_{d}) ( d a positive integer) may take a specific vector value \boldsymbol{x} = (x_{1}, x_{2}, \ldots, x_{d}) in \mathbb{R}^{d} . When used in the linear-algebraic context, these vectors should be thought of as column vectors, so when they appear with the superscript \mathsf{T} , they will be transformed into row vectors by transposition. Thus, \boldsymbol{x}^{\mathsf{T}} \boldsymbol{y} is the inner product of \boldsymbol{x} and \boldsymbol{y} . The notation \| \boldsymbol{x}\| will stand for the Euclidean norm of the vector \boldsymbol{x} . As is customary in probability theory, we write \boldsymbol{X} = (X_{1}, \ldots, X_{d}) \sim {\cal N}(\boldsymbol{0}_{d}, {\bf{I}}_{d}) (with \boldsymbol{0}_{d} being a vector of d zeros and {\bf{I}}_{d} being the d \times d identity matrix) to denote that the probability density function of \boldsymbol{X} is
\begin{align} P_{ \boldsymbol{X}}( \boldsymbol{x}) = (2\pi)^{-d/2} \cdot \exp\left(-\frac{1}{2}\| \boldsymbol{x}\|^{2}\right), \quad \boldsymbol{x} \in \mathbb{R}^{d}. \end{align} | (2.1)
For \boldsymbol{X} = (X_{1}, \ldots, X_{d}) and \boldsymbol{Y} = (Y_{1}, \ldots, Y_{d}) , we write
\begin{align} ( \boldsymbol{X}, \boldsymbol{Y}) \sim {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix}\right] \right) \end{align} | (2.2)
to denote the fact that \boldsymbol{X} and \boldsymbol{Y} are jointly Gaussian with independent and identically distributed (IID) pairs, where each pair (X_{i}, Y_{i}) , i \in \{1, \ldots, d\} , is a Gaussian vector with a zero mean and the specified correlation \rho . Logarithms are taken to the natural base. The probability of an event {\cal E} will be denoted by \mathbb{P}\{{\cal E}\} and the indicator function by \mathbb{1}\{{\cal E}\} . The expectation operator will be denoted by \mathbb{E}[\cdot] . The set of all permutations of \{1, 2, \ldots, n\} is denoted by {\cal S}_{n} .
Let \boldsymbol{X}^{n} be a database comprising the n feature vectors \{ \boldsymbol{X}_{1}, \ldots, \boldsymbol{X}_{n}\} and let \boldsymbol{Y}^{n} be a second database comprising the n feature vectors \{ \boldsymbol{Y}_{1}, \ldots, \boldsymbol{Y}_{n}\} . Each feature vector contains d entries. We consider the following binary hypothesis testing problem. Under the null hypothesis \mathsf{H}_{0} , the Gaussian databases \boldsymbol{X}^{n} and \boldsymbol{Y}^{n} are generated independently with \boldsymbol{X}_{1}, \ldots, \boldsymbol{X}_{n} \overset{\text{IID}}{\sim} {\cal N}(\boldsymbol{0}_{d}, {\bf{I}}_{d}) and \boldsymbol{Y}_{1}, \ldots, \boldsymbol{Y}_{n} \overset{\text{IID}}{\sim} {\cal N}(\boldsymbol{0}_{d}, {\bf{I}}_{d}) (Figure 1). Let us use \mathbb{P}_{0} to denote the probability distribution that governs ( \boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) under \mathsf{H}_{0} . Under the alternative hypothesis \mathsf{H}_{1} , the databases \boldsymbol{X}^{n} and \boldsymbol{Y}^{n} are correlated with some unknown permutation \boldsymbol{\sigma} = (\sigma_{1}, \sigma_{2}, \ldots, \sigma_{n}) \in {\cal S}_{n} and some known correlation coefficient \rho \in (0, 1) (Figure 2). If \rho \in (-1, 0) , then one considers the negation of \boldsymbol{Y}^{n} . Let us use \mathbb{P}_{1| \boldsymbol{\sigma}} to denote the probability distribution that governs ( \boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) under \mathsf{H}_{1} , for some permutation \boldsymbol{\sigma} \in {\cal S}_{n} . In summary,
\begin{align} \mathsf{H}_{0}&: ( \boldsymbol{X}_{1}, \boldsymbol{Y}_{1}), \ldots, ( \boldsymbol{X}_{n}, \boldsymbol{Y}_{n}) \overset{\text{IID}}{\sim} {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right] \right) \\ \mathsf{H}_{1}&: ( \boldsymbol{X}_{1}, \boldsymbol{Y}_{\sigma_{1}}), \ldots, ( \boldsymbol{X}_{n}, \boldsymbol{Y}_{\sigma_{n}}) \overset{\text{IID}}{\sim} {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix}\right] \right), \end{align} | (3.1)
for some permutation \boldsymbol{\sigma} \in {\cal S}_{n} .
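To make the hypothesis pair (3.1) concrete, the following minimal sketch (in Python with NumPy; the function name and interface are ours, not part of the paper) generates a pair of databases under either hypothesis. Under \mathsf{H}_{1} , each row of \boldsymbol{Y}^{n} is a noisy copy of the corresponding row of \boldsymbol{X}^{n} with per-entry correlation \rho , and the rows of \boldsymbol{Y}^{n} are then placed according to a uniformly random permutation.

```python
import numpy as np

def generate_databases(n, d, rho=0.0, correlated=False, rng=None):
    """Sample (X^n, Y^n) according to (3.1).

    correlated=False -> H0 (independent databases); rho is ignored.
    correlated=True  -> H1 with correlation rho and a random permutation sigma.
    Returns X, Y as n x d arrays, and sigma (None under H0).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n, d))
    if not correlated:
        return X, rng.standard_normal((n, d)), None
    # Each entry pair (X[i, j], Ycorr[i, j]) is bivariate Gaussian with correlation rho
    Ycorr = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal((n, d))
    sigma = rng.permutation(n)
    Y = np.empty_like(Ycorr)
    Y[sigma] = Ycorr            # so that (X_i, Y_{sigma_i}) are the correlated pairs
    return X, Y, sigma
```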
We would like to consider a problem of correlation detection. Given the databases \boldsymbol{X}^{n} and \boldsymbol{Y}^{n} (and not the permutation \boldsymbol{\sigma} ), a statistical test \phi: \mathbb{R}^{d \times n} \times \mathbb{R}^{d \times n} \to \{0, 1\} decides whether the null hypothesis or the alternative hypothesis occurred.
For a given n and d , the type Ⅰ probability of error of a test \phi is
\begin{align} P_{\mbox{ FA}}(\phi) \stackrel{\triangle}{ = } \mathbb{P}_{0}\{\phi( \boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) = 1\}, \end{align} | (3.2)
and the type Ⅱ probability of error is
\begin{align} P_{\mbox{ MD}}(\phi) \stackrel{\triangle}{ = } \max\limits_{ \boldsymbol{\sigma} \in {\cal S}_{n}} \mathbb{P}_{1| \boldsymbol{\sigma}}\{\phi( \boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) = 0\}, \end{align} | (3.3)
where in (3.2), FA stands for false alarm, and in (3.3), MD stands for missed detection.
For the specific statistical test that is proposed in Section 5, which solves the testing problem defined above, our main objective is to find theoretical guarantees on the two error probabilities as functions of the parameters \rho , n , and d .
The testing problem between databases with IID Gaussian entries has already been considered in [13,14]. For this specific testing problem, the following statistical test was proposed in [13]. The statistic is defined by
\begin{align} T \stackrel{\triangle}{ = } \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \boldsymbol{X}_{i}^{\mathsf{T}} \boldsymbol{Y}_{j}, \end{align} | (4.1) |
and the sum test is defined by comparing T with a threshold
\begin{align} \phi_{\mbox{ Sum}}( \boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) = \left\{ \begin{array}{l l} 0 & \quad { T < t } \\ 1 & \quad { T \geq t .} \end{array} \right. \end{align} | (4.2) |
The following result was proved in [13, Section Ⅳ].
Proposition 1. (Sum test) Let t = \sqrt{\gamma}\tfrac{dn}{2} with \gamma \in (0, 4\rho^{2}) . The error probabilities of the sum test \phi_{\mathit{\mbox{Sum}}} for the binary hypothesis testing problem (3.1) are upper-bounded by
\begin{align} P_{\mathit{\mbox{FA}}}(\phi_{\mathit{\mbox{Sum}}}) &\leq \exp\left(-\frac{d}{2}G_{\mathit{\mbox{FA}}}(\gamma)\right) \end{align} | (4.3) |
\begin{align} P_{\mathit{\mbox{MD}}}(\phi_{\mathit{\mbox{Sum}}}) &\leq \exp\left(-\frac{d}{2}G_{\mathit{\mbox{MD}}}(\gamma)\right), \end{align} | (4.4) |
where,
\begin{align} G_{\mathit{\mbox{FA}}}(\gamma) & \stackrel{\triangle}{ = } \sqrt{1+\gamma} - 1 - \ln\left(\frac{1+\sqrt{1+\gamma}}{2}\right), \end{align} | (4.5) |
\begin{align} G_{\mathit{\mbox{MD}}}(\gamma) & \stackrel{\triangle}{ = } \frac{1}{1-\rho^{2}}\left(\sqrt{(1-\rho^{2})^{2}+\gamma} - \sqrt{\rho^{2}\gamma}\right) - 1 - \ln\left(\frac{1-\rho^{2}+\sqrt{(1-\rho^{2})^{2}+\gamma}}{2}\right). \end{align} | (4.6) |
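For reference, a minimal numerical sketch of the sum test and of the bounds of Proposition 1 (in Python with NumPy; the function names are ours) is as follows.

```python
import numpy as np

def sum_test(X, Y, gamma):
    """Sum test (4.1)-(4.2); X, Y are n x d arrays, gamma in (0, 4*rho^2)."""
    n, d = X.shape
    T = X.sum(axis=0) @ Y.sum(axis=0)      # statistic (4.1), in the form (5.1)
    t = np.sqrt(gamma) * d * n / 2.0       # threshold of Proposition 1
    return int(T >= t)                     # 1 = decide H1

def sum_test_bounds(d, rho, gamma):
    """Upper bounds (4.3)-(4.4) via the exponents (4.5)-(4.6)."""
    G_fa = np.sqrt(1 + gamma) - 1 - np.log((1 + np.sqrt(1 + gamma)) / 2)
    r = 1 - rho**2
    G_md = (np.sqrt(r**2 + gamma) - np.sqrt(rho**2 * gamma)) / r \
           - 1 - np.log((r + np.sqrt(r**2 + gamma)) / 2)
    return np.exp(-d * G_fa / 2), np.exp(-d * G_md / 2)

# Example values: d = 50, rho = 0.4, and gamma in the middle of (0, 4*rho^2)
print(sum_test_bounds(50, 0.4, 2 * 0.4**2))
```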
The GLR test is known to be an asymptotically uniformly most powerful (UMP)Ⅱ test among all invariant tests [32, Section 6.4.2]. Although there are no general results regarding non-asymptotic properties of the GLR test, it is known to be optimal in special cases [33]. Since the sum test from Section 4.1, as well as the count test (which will be presented in the following section) can be proved to be invariant tests, it makes sense to also briefly present the GLR test and discuss its computational complexity. We first derive a simplified statistic for the GLR test. Since only the alternative hypothesis is composite, the GLR test is given by
ⅡA UMP test minimizes the type Ⅱ probability of error among all tests with a constrained type Ⅰ probability of error. For example, the Neyman-Pearson test is UMP when testing between two simple hypotheses.
\begin{align} \frac{\max\nolimits_{ \boldsymbol{\sigma} \in {\cal S}_{n}}p_{1| \boldsymbol{\sigma}}( \boldsymbol{x}^{n}, \boldsymbol{y}^{n})}{p_{0}( \boldsymbol{x}^{n}, \boldsymbol{y}^{n})} \gtrless \lambda, \end{align} | (4.7) |
for some \lambda \in \mathbb{R} , or, more explicitly, by
\begin{align} \frac{\max\nolimits_{ \boldsymbol{\sigma} \in {\cal S}_{n}} \prod\nolimits_{k = 1}^{n} c_1 \exp\left\{-\frac{1}{2(1-\rho^2)}(\| \boldsymbol{x}_k\|^{2} -2\rho \boldsymbol{x}_{k}^{ \mathsf{T}} \boldsymbol{y}_{\sigma_k} + \| \boldsymbol{y}_{\sigma_k}\|^{2})\right\}}{\prod\nolimits_{k = 1}^{n} c_0 \exp\left\{-\frac{1}{2}(\| \boldsymbol{x}_k\|^{2} + \| \boldsymbol{y}_k\|^{2})\right\}} \gtrless \lambda. \end{align} | (4.8) |
By simple manipulations,
\begin{align} \max\limits_{ \boldsymbol{\sigma} \in {\cal S}_{n}} \left\{-\frac{1}{1-\rho^2}\sum\limits_{k = 1}^{n}(\| \boldsymbol{x}_k\|^{2} -2\rho \boldsymbol{x}_{k}^{ \mathsf{T}} \boldsymbol{y}_{\sigma_k} + \| \boldsymbol{y}_{\sigma_k}\|^{2})\right\} + \sum\limits_{k = 1}^{n}(\| \boldsymbol{x}_k\|^{2} + \| \boldsymbol{y}_{k}\|^{2}) \gtrless \lambda', \end{align} | (4.9) |
which is equivalent to
\begin{align} \max\limits_{ \boldsymbol{\sigma} \in {\cal S}_{n}} \sum\limits_{k = 1}^{n} 2\rho \boldsymbol{x}_{k}^{ \mathsf{T}} \boldsymbol{y}_{\sigma_k} - \sum\limits_{k = 1}^{n}(\rho^2\| \boldsymbol{x}_k\|^{2} + \rho^2\| \boldsymbol{y}_{k}\|^{2}) \gtrless \lambda". \end{align} | (4.10) |
Thus, the GLR test requires us to find the solution of an assignment problem of size n with the scores given by the empirical correlations. Relying on the Kuhn–Munkres algorithm [34,35,36] (also known as the Hungarian algorithm) and its modifications [37], this assignment problem can be solved with a running time of {\cal O}(n^{3}) . It is important to note that this computational complexity holds in general, but when the optimal assignment is relatively easy to find (e.g., for a diagonal score matrix), the running time of the Hungarian algorithm is reduced to {\cal O}(n^{2}) . In our settings, this holds when \rho is close to 1.
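For illustration, the GLR statistic (4.10) can be computed with an off-the-shelf assignment solver; a minimal sketch using SciPy's implementation of a Hungarian-type algorithm is given below (the function name is ours).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def glr_statistic(X, Y, rho):
    """Left-hand side of (4.10); X, Y are n x d arrays, rho in (0, 1).

    The maximization over permutations is an assignment problem with score
    matrix S[k, l] = x_k^T y_l, solved in O(n^3) time.
    """
    S = X @ Y.T                                  # empirical correlations
    row, col = linear_sum_assignment(S, maximize=True)
    max_term = 2 * rho * S[row, col].sum()       # max over sigma of sum_k 2*rho*x_k^T y_{sigma_k}
    penalty = rho**2 * ((X**2).sum() + (Y**2).sum())
    return max_term - penalty                    # compared with a threshold lambda''
```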
While the GLR test is likely to perform well in the studied testing problem, it has two main disadvantages. First, as explained above, its computational complexity grows cubically in n , which is discouraging from a practical point of view; second, the GLR test does not lend itself to a tractable analytical analysis (specifically, of the type Ⅰ error event). These facts motivate us to propose an alternative statistical test that is computationally efficient (relative to the GLR test) and analytically tractable, on the one hand, and performs at least as well as the existing sum test, on the other hand. The count test, which is the main theme of the next section, fulfills all these requirements.
In order to motivate the new proposed statistical test, let us briefly revisit the sum test of Section 4.1. Note that its statistic can also be written as follows:
\begin{align} T = \left(\sum\limits_{i = 1}^{n} \boldsymbol{X}_{i}\right)^{ \mathsf{T}} \left(\sum\limits_{j = 1}^{n} \boldsymbol{Y}_{j}\right), \end{align} | (5.1) |
which means that one first sums up all the n feature vectors in each database to get the vectors \hat{ \boldsymbol{X}} and \hat{ \boldsymbol{Y}} , and then tests whether these two vectors are correlated or not. The drawback of collapsing each database into a single vector is obvious: one discards the original structure of the problem and, as a consequence, valuable information is lost. Nonetheless, there is a significant advantage to such a procedure: The computational complexity of the sum test is linear in n . As opposed to the sum test, we would like to take advantage of the fact that each database comprises n sequences. We note the following simple observation: The inner product between two independent vectors \boldsymbol{X}, \boldsymbol{Y} \sim {\cal N}(\boldsymbol{0}_{d}, {\bf{I}}_{d}) is typically close to zero, while the inner product between the vectors
\begin{align} ( \boldsymbol{X}, \boldsymbol{Y}) \sim {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix}\right] \right) \end{align} | (5.2) |
concentrates around \rho d . This means that one can first test in a local manner for each pair of feature vectors and in a second step, count how many of them seem to be correlated. Ideally, under the null hypothesis, the number of detected correlated pairs is close to zero, while under the alternative hypothesis, this number should be close to n times the probability of local detection, which, in turn, should be close to one. Thus, on the basis of the count of local detections, one should be able to distinguish very well between the two hypotheses. It should be noted that the computational complexity of this testing procedure is {\cal O}(n^2) , which is higher than that of the sum test, but still much lower than the complexity of the GLR test ( {\cal O}(n^3) ).
According to the explanation above, an appropriate statistic is given by
\begin{align} M(\xi) \stackrel{\triangle}{ = } \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1} \left\{ \boldsymbol{X}_{i}^{\mathsf{T}} \boldsymbol{Y}_{j} \geq \xi \right\}, \end{align} | (5.3) |
where \xi is some predefined threshold. While a test based on M(\xi) can be analyzed, it turns out that by first normalizing the feature vectors before calculating the inner products, one can propose a test which lends itself to a much tighter analysis. We will elaborate more on this point in Section 6. In the meanwhile, we continue as follows. Denote the normalized vectors
\begin{align} \tilde{ \boldsymbol{X}}_{i} = \frac{ \boldsymbol{X}_{i}}{\| \boldsymbol{X}_{i}\|}, \; \; i \in \{1, 2, \ldots, n\}, \end{align} | (5.4) |
and
\begin{align} \tilde{ \boldsymbol{Y}}_{j} = \frac{ \boldsymbol{Y}_{j}}{\| \boldsymbol{Y}_{j}\|}, \; \; j \in \{1, 2, \ldots, n\}, \end{align} | (5.5) |
and define the statistic
\begin{align} N(\theta) \stackrel{\triangle}{ = } \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1} \left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}, \end{align} | (5.6) |
where \theta \in (0, 1) is the local threshold.
In order to present our statistical test, we need some definitions. For two feature vectors \boldsymbol{X} and \boldsymbol{Y} , correlated according to a given \rho \in (0, 1) , the local detection probability is defined by
\begin{align} P \equiv P(d, \rho, \theta) \stackrel{\triangle}{ = } \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\}, \end{align} | (5.7) |
while for two independent feature vectors (i.e., \rho = 0 ), the local false-alarm probability is defined by
\begin{align} Q \equiv Q(d, \theta) \stackrel{\triangle}{ = } P(d, 0, \theta). \end{align} | (5.8) |
Under \mathsf{H}_{1} , note that the expectation of N(\theta) is given by
\begin{align} \Delta_{n, d} \stackrel{\triangle}{ = } \mathbb{E}_{1| \boldsymbol{\sigma}}[N(\theta)] & = \mathbb{E}_{1| \boldsymbol{\sigma}}\left[\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}} \tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right] \end{align} | (5.9) |
\begin{align} & = \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{P}_{1| \boldsymbol{\sigma}} \left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}} \tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \end{align} | (5.10) |
\begin{align} & = nP + n(n-1)Q. \end{align} | (5.11) |
Thus, the proposed test compares N(\theta) with a threshold as follows:
\begin{align} \phi_{\mbox{ Count}}( \boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) = \left\{ \begin{array}{l l} 0 & \quad { N(\theta) < \beta \Delta_{n, d} } \\ 1 & \quad { N(\theta) \geq \beta \Delta_{n, d} , } \end{array} \right. \end{align} | (5.12) |
where \beta \in (0, 1) is the global threshold.
Our proposed statistical test may be presented in a simple graphical way. One has to draw a table with n rows and n columns. In this table, Row i stands for \boldsymbol{X}_{i} and Column j for \boldsymbol{Y}_{j} . The cell (i, j) is filled with a dot only if \tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta . The dots are then counted and one rejects the null if the total number of dots exceeds a threshold of \beta[nP+n(n-1)Q] . Typical tables for n = 10 under \mathsf{H}_{0} and under \mathsf{H}_{1} with the identity permutation are given in Figure 3.
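An implementation of the count test is equally simple; the following sketch (Python/NumPy, our function name) assumes that the local probabilities P = P(d, \rho, \theta) and Q = Q(d, \theta) have already been computed, e.g., via (5.39) and (5.20) below.

```python
import numpy as np

def count_test(X, Y, theta, beta, P, Q):
    """Count test (5.12); X, Y are n x d arrays, theta, beta in (0, 1).

    P and Q are the local detection and local false-alarm probabilities
    defined in (5.7) and (5.8), respectively.
    """
    n = X.shape[0]
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # (5.4)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)   # (5.5)
    N = int((Xn @ Yn.T >= theta).sum())                 # statistic N(theta), (5.6)
    Delta = n * P + n * (n - 1) * Q                     # expectation (5.11)
    return int(N >= beta * Delta)                       # 1 = decide H1
```

Forming the n \times n matrix of normalized inner products requires {\cal O}(n^{2}d) operations, in line with the {\cal O}(n^{2}) complexity discussion above.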
While the two error probabilities of the sum test (4.1)-(4.2) were upper-bounded using standard large deviation techniques, i.e., Chernoff's bound, this cannot be similarly done with the new proposed test, since calculating the moment-generating function of the double summation in (5.6) seems to be hopeless. Thus, alternative tools from large deviation theory have to be invoked. We start with the type Ⅱ probability of error, which is given explicitly by
\begin{align} P_{\mbox{ MD}}(\phi_{\mbox{ Count}}) & = \mathbb{P}_{1| \boldsymbol{\sigma}} \left\{\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \leq \beta [nP+n(n-1)Q] \right\}. \end{align} | (5.13) |
The main difficulty in analyzing this lower tail probability is the statistical dependencies between the indicator random variables. Nevertheless, an existing large deviation result by Janson [38, Theorem 10] provides the appropriate tool to handle the situation at hand. Other recently treated settings where tools from [38] proved useful can be found in [39, Appendixes B and I] and [40, Appendix A.10].
Regarding the type Ⅰ probability of error, the situation is somewhat more complicated, since no general large deviation bounds exist to assess upper tail probabilities of the form
\begin{align} P_{\mbox{ FA}}(\phi_{\mbox{ Count}}) & = \mathbb{P}_{0} \left\{\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \geq \beta [nP+n(n-1)Q] \right\}. \end{align} | (5.14) |
Hence, the best we could do is to upper-bound this probability using a generalization of Chebyshev's inequality as follows:
\begin{align} P_{\mbox{ FA}}(\phi_{\mbox{ Count}}) \leq \frac{ \mathbb{E}_{0}\left[\left(\sum\nolimits_{i = 1}^{n} \sum\nolimits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right]}{\left(\beta [nP+n(n-1)Q] \right)^{\alpha}}, \end{align} | (5.15) |
where \alpha > 0 may be optimized to yield the tightest bound. In order to derive (or at least, to upper-bound) the \alpha -th moment in (5.15), we consider two different cases. On the one hand, when \alpha \in (0, 1] , we are able to upper-bound the \alpha -th moment using similar techniques as in the proof of the lower bound on the error exponent of the typical random code [41, p. 6229, right column]. On the other hand, for a general \alpha > 1 , we are not able to handle the \alpha -th moment; hence, instead, we have considered only the k -th integer moments of orders k \geq 2 . For the special case k = 2 , the calculation is relatively straightforward, but for a general k \geq 3 , the situation is somewhat more complicated. Thus, new techniques that rely on graph-counting considerations are developed in Appendix C to prove Theorem 2 (which is stated in Section 6).
In the following result, which is proved in Appendix B, we present upper bounds on the type Ⅰ and type Ⅱ error probabilities for a general set of parameters. Let us denote the Stirling numbers of the second kind as follows:
\begin{align} \mathsf{S}(k, \ell) = \frac{1}{\ell!} \sum\limits_{i = 0}^{\ell} (-1)^{i} \binom{\ell}{i}(\ell-i)^{k}, \end{align} | (5.16) |
and define the numbers
\begin{align} \mathsf{B}(k) = \sum\limits_{\ell = 1}^{k} \mathsf{S}(k, \ell) \cdot \frac{k^{2\ell}}{\ell!}. \end{align} | (5.17) |
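The combinatorial constants (5.16)-(5.17) can be computed exactly with a few lines of code; a minimal sketch (Python, standard library only):

```python
from math import comb, factorial

def stirling2(k, l):
    """Stirling number of the second kind S(k, l), cf. (5.16)."""
    return sum((-1)**i * comb(l, i) * (l - i)**k for i in range(l + 1)) // factorial(l)

def B_const(k):
    """The constant B(k) of (5.17) that appears in the moment bounds below."""
    return sum(stirling2(k, l) * k**(2 * l) / factorial(l) for l in range(1, k + 1))

print([B_const(k) for k in range(1, 6)])   # grows quickly with k
```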
Theorem 1. (Type Ⅰ and type Ⅱ probability bounds) Let n, d \in \mathbb{N} and \rho \in (0, 1) be given. Let P = P(d, \rho, \theta) and Q = Q(d, \theta) be as defined above. Then, for any \beta \in (0, 1) and \theta \in (0, 1) ,
\begin{align} P_{\mathit{\mbox{FA}}}(\phi_{\mathit{\mbox{Count}}}) &\leq \min \left\{ \min\limits_{\alpha \in [0, 1]} \frac{(n^{2}Q)^{\alpha} \cdot \mathbb{1}\{n^{2}Q > 1\} + n^{2}Q \cdot \mathbb{1}\{n^{2}Q \leq 1\}}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}} , \frac{n^{2}(n^{2}-1)Q^{2} + n^{2}Q}{\left(\beta [nP+n(n-1)Q]\right)^{2}}, \right. \\ &\; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \left. \inf\limits_{k \geq 3} \frac{k(k+1)\mathsf{B}(k)\left[(n^{2}Q)^{k} \cdot \mathbb{1}\{n^{2}Q > 1\} + n^{2}Q \cdot \mathbb{1}\{n^{2}Q \leq 1\}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{k}} \right\}, \end{align} | (5.18) |
and,
\begin{align} P_{\mathit{\mbox{MD}}}(\phi_{\mathit{\mbox{Count}}}) &\leq \exp \left\{- \min \left( (1-\beta)^{2} \frac{nP + n(n-1)Q}{16Qn + 2}, (1-\beta) \frac{n}{12} \right) \right\}. \end{align} | (5.19) |
We now discuss the asymptotic behavior of the bounds. One benefit of Proposition 1 (Section 4.1) with respect to Theorem 1 is the clear conclusion that when d tends to infinity, the error probabilities vanish exponentially. In order to understand the asymptotic behavior of the upper bounds in Theorem 1, we first have to handle the quantities Q(d, \theta) and P(d, \rho, \theta) . Q(d, \theta) is the probability that the normalized inner product of an independent pair of Gaussian vectors exceeds \theta ; thus, thanks to spherical symmetry, Q(d, \theta) equals the probability that a uniformly distributed vector on the d -dimensional unit hyper-sphere falls within a d -dimensional hyper-spherical cap with a half-angle \varphi = \arccos(\theta) . This probability is given by the expression [44]
\begin{align} Q(d, \theta) = \frac{1}{2}\frac{B\left(\sin^{2}(\varphi); \frac{d-1}{2}, \frac{1}{2}\right)}{B\left(\frac{d-1}{2}, \frac{1}{2}\right)} = \frac{1}{2}\frac{B\left(1-\theta^{2}; \frac{d-1}{2}, \frac{1}{2}\right)}{B\left(\frac{d-1}{2}, \frac{1}{2}\right)}, \end{align} | (5.20) |
where the complete beta function is defined by
\begin{align} B(a, b) = \int_{0}^{1} t^{a-1}(1-t)^{b-1} \mathrm{d} t, \quad a, b > 0, \end{align} | (5.21) |
and the incomplete beta function by
\begin{align} B(x; a, b) = \int_{0}^{x} t^{a-1}(1-t)^{b-1} \mathrm{d} t, \quad x\in[0, 1], \; a, b > 0. \end{align} | (5.22) |
The following result, which is proved in Appendix D, shows that Q(d, \theta) converges exponentially fast to zero.
Lemma 1. Let d \in \mathbb{N} and \theta \in (0, 1) . Let
\begin{align} ( \boldsymbol{X}, \boldsymbol{Y}) \sim {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right] \right), \end{align} | (5.23) |
and
\begin{align} \tilde{ \boldsymbol{X}} = \frac{ \boldsymbol{X}}{\| \boldsymbol{X}\|}, \; \; \; \tilde{ \boldsymbol{Y}} = \frac{ \boldsymbol{Y}}{\| \boldsymbol{Y}\|}. \end{align} | (5.24) |
Then,
\begin{align} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} \leq \frac{1}{\theta\sqrt{d}} \cdot \left(1-\theta^{2}\right)^{\frac{d-1}{2}}. \end{align} | (5.25) |
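Numerically, the exact cap probability (5.20) and the bound (5.25) are both one-liners, since scipy.special.betainc returns the regularized incomplete beta function, so that (5.20) equals betainc((d-1)/2, 1/2, 1-\theta^2)/2 . A minimal sketch comparing the two:

```python
import numpy as np
from scipy.special import betainc

def Q_exact(d, theta):
    """Q(d, theta) via the hyper-spherical cap formula (5.20)."""
    return 0.5 * betainc((d - 1) / 2, 0.5, 1 - theta**2)

def Q_bound(d, theta):
    """Exponential upper bound of Lemma 1, Eq. (5.25)."""
    return (1 - theta**2)**((d - 1) / 2) / (theta * np.sqrt(d))

for d in (10, 50, 200):
    print(d, Q_exact(d, 0.6), Q_bound(d, 0.6))   # both decay exponentially in d
```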
In addition, one can prove that P(d, \rho, \theta) converges to one as d \to \infty for any \theta \in (0, \rho) . Since the proof of this result is quite long and relatively technical, we refrain from presenting it here. Going back to the bounds in Theorem 1, we conclude the following. Regarding the bound in (5.18), it can be further upper-bounded as
\begin{align} P_{\mbox{ FA}}(\phi_{\mbox{ Count}}) &\leq \frac{n^{2}(n^{2}-1)Q(d, \theta)^{2} + n^{2}Q(d, \theta)}{\left(\beta [nP(d, \rho, \theta)+n(n-1)Q(d, \theta)]\right)^{2}}, \end{align} | (5.26) |
which vanishes exponentially fast for any fixed n as d \to \infty . Regarding the bound in (5.19), the situation is different, because when d \to \infty , we find that
\begin{align} P_{\mbox{ MD}}(\phi_{\mbox{ Count}}) &\leq \exp \left\{- n \cdot \min \left\{ (1-\beta)^{2}/2 , (1-\beta)/12 \right\} \right\}, \end{align} | (5.27) |
which is exponentially small in n but does not vanish as d grows.
Several remarks are now in order.
Remark 1. The threshold \beta \in (0, 1) trades off between the two error probabilities. When \beta is relatively low, relatively few "dots" are required to reject \mathsf{H}_{0} ; hence, the type Ⅰ error probability is high and the type Ⅱ error probability is low. When \beta grows towards 1, an increasing number of dots is needed to reject \mathsf{H}_{0} ; the type Ⅰ error probability gradually decreases while the type Ⅱ error probability increases.
Remark 2. The relatively interesting fact (at least to the writer of these lines) is the apparent dichotomy between two regions in the parameter space \{(n, d)\} . On the one hand, if n^{2}Q(d, \theta) , which is the expected number of dots under \mathsf{H}_{0} , is smaller than 1, then the \alpha -th moment in (5.15) is proportional to n^{2}Q(d, \theta) for any \alpha \in (0, 1]\cup\{2, 3, \ldots\} . In this case, the type Ⅰ error probability is relatively small, since the typical number of "correlated" pairs of vectors is zero. On the other hand, if (n, d) are such that n^{2}Q(d, \theta) \geq 1 , then the \alpha -th moment in (5.15) is proportional to [n^{2}Q(d, \theta)]^{\alpha} . In this case, any increment in n^{2}Q(d, \theta) (due to an increasing n or a decreasing d ) causes a relatively sharp increment in the type Ⅰ error probability. Such phenomena, where moments of enumerators (i.e., sums of IID or weakly dependent indicator random variables) undergo phase transitions, have already been encountered multiple times, e.g., in information theory [39, Appendix B, Lemma 3], [42, p. 168], [43, Appendix A, Lemma 1.1].
Remark 3. In this manuscript, we only consider a noiseless setup. Regarding noisy setups where the data are affected by additive white Gaussian noiseⅢ, consider the following. Let
ⅢObfuscation, which refers to the deliberate addition of noise to the database entries, has been suggested as an additional measure to protect privacy [3].
\begin{align} ( \boldsymbol{X}, \boldsymbol{Y}) \sim {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix}\right] \right), \end{align} | (5.28) |
and
\begin{align} \hat{ \boldsymbol{X}} & = \frac{ \boldsymbol{X} + {\bf{U}}}{\sqrt{1+ \sigma^{2}}} \end{align} | (5.29) |
\begin{align} \hat{ \boldsymbol{Y}} & = \frac{ \boldsymbol{Y} + {\bf{V}}}{\sqrt{1+ \sigma^{2}}}, \end{align} | (5.30) |
where {\bf{U}}, {\bf{V}} \overset{\text{IID}}{\sim} {\cal N}(\boldsymbol{0}_{d}, \sigma^{2} {\bf{I}}_{d}) are independent of (\boldsymbol{X}, \boldsymbol{Y}) . We can immediately see that (\hat{ \boldsymbol{X}}, \hat{ \boldsymbol{Y}}) are jointly Gaussian with a correlation coefficient of \frac{\rho}{1+ \sigma^{2}} , and thus, the count test and its theoretical guarantees can be relied on also for this extended case.
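A quick Monte Carlo sanity check of the effective correlation \rho/(1+\sigma^{2}) claimed above (Python/NumPy; the specific values of \rho and \sigma^{2} are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
d, rho, sigma2 = 1_000_000, 0.6, 0.5

# A long IID stream of correlated entry pairs plays the role of (X, Y)
X = rng.standard_normal(d)
Y = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(d)
Xh = (X + np.sqrt(sigma2) * rng.standard_normal(d)) / np.sqrt(1 + sigma2)   # (5.29)
Yh = (Y + np.sqrt(sigma2) * rng.standard_normal(d)) / np.sqrt(1 + sigma2)   # (5.30)

print(np.corrcoef(Xh, Yh)[0, 1], rho / (1 + sigma2))   # both approximately 0.4
```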
Remark 4. Both the sum test and the count test assume perfect knowledge of \rho in order to choose desired trade-offs between the type Ⅰ and type Ⅱ error probabilities. In practice, the value of \rho may be unknown a priori or only known to lie in an interval of possible values. In the latter case, where \rho \in [\rho_0, \rho_1] and this range is relatively narrow, the sum/count tests can still be applied with any \rho in this range, with some degraded performance due to the expected mismatch between the true \rho and the chosen one. Choosing \rho closer to \rho_0 favors lower type Ⅱ error probabilities, and vice versa. The general case where \rho is unknown and may take any value in (0, 1) seems to be more complicated and will not be discussed here in detail; we just mention that a possible solution may be in the spirit of the GLR test (as in (4.7)), where the likelihood under \mathsf{H}_{1} is now also maximized over \rho \in (0, 1) , in addition to the maximization over the permutations of the rows.
Let us begin by presenting an explicit expression for P(d, \rho, \theta) , which is the probability that the angle between a pair of correlated Gaussian vectors crosses a threshold. We first make a few definitions. For \rho \in (0, 1) and \theta \in (0, 1) , define the constants
\begin{align} \alpha(\rho) & = \frac{1-\rho^2}{\rho^2}, \end{align} | (5.31) |
\begin{align} \beta(\rho, \theta) & = \frac{1-\rho^2}{\rho^2(1-\theta^2)}, \end{align} | (5.32) |
and the functions
\begin{align} F_{1}(u) & = -\frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{u} - \theta\sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u}, \end{align} | (5.33) |
\begin{align} F_{2}(u) & = -\frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{u} + \theta\sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u}. \end{align} | (5.34) |
Moreover, for any d\in \mathbb{N} , we define the two probability density functions
\begin{align} f(u) = \frac{1}{B\left(\frac{d}{2}, \frac{d}{2}\right)} u^{d/2-1}\left(1+u\right)^{-d}, \; u\geq 0, \end{align} | (5.35) |
and
\begin{align} g(s) = \frac{(1-s^{2})^{\frac{d-3}{2}}}{B\left(\frac{d-1}{2}, \frac{1}{2}\right)}, \; s \in [-1, 1]. \end{align} | (5.36) |
The following lemma is proved in Appendix E.
Lemma 2. Let d \in \mathbb{N} , \rho \in (0, 1) , and \theta \in (0, 1) . Let
\begin{align} ( \boldsymbol{X}, \boldsymbol{Y}) \sim {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix}\right] \right), \end{align} | (5.37) |
and
\begin{align} \tilde{ \boldsymbol{X}} = \frac{ \boldsymbol{X}}{\| \boldsymbol{X}\|}, \; \; \; \tilde{ \boldsymbol{Y}} = \frac{ \boldsymbol{Y}}{\| \boldsymbol{Y}\|}. \end{align} | (5.38) |
Then,
\begin{align} P(d, \rho, \theta) = \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} & = \int_{0}^{\alpha(\rho)} \left[\int_{F_{2}(u)}^{1} g(s) \mathrm{d} s \right] f(u) \mathrm{d} u \\ &\; \; \; + \int_{\alpha(\rho)}^{\beta(\rho, \theta)} \left[\int_{-1}^{F_{1}(u)} g(s) \mathrm{d} s + \int_{F_{2}(u)}^{1} g(s) \mathrm{d} s\right] f(u) \mathrm{d} u \\ &\; \; \; \; \; \; + \int_{\beta(\rho, \theta)}^{\infty} f(u) \mathrm{d} u. \end{align} | (5.39) |
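The double integral (5.39) is straightforward to evaluate with standard quadrature; a numerical sketch (Python/SciPy, default tolerances, function names ours) reads as follows.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as beta_fn

def P_of(d, rho, theta):
    """P(d, rho, theta) via the representation (5.39) of Lemma 2."""
    a = (1 - rho**2) / rho**2                       # alpha(rho), (5.31)
    b = (1 - rho**2) / (rho**2 * (1 - theta**2))    # beta(rho, theta), (5.32)
    c = rho * (1 - theta**2) / np.sqrt(1 - rho**2)

    def F(u, sign):                                 # F_1 (sign = -1) and F_2 (sign = +1), (5.33)-(5.34)
        inner = max(1 - rho**2 * (1 - theta**2) / (1 - rho**2) * u, 0.0)
        return -c * np.sqrt(u) + sign * theta * np.sqrt(inner)

    f = lambda u: u**(d / 2 - 1) * (1 + u)**(-d) / beta_fn(d / 2, d / 2)   # (5.35)
    g = lambda s: (1 - s**2)**((d - 3) / 2) / beta_fn((d - 1) / 2, 0.5)    # (5.36)
    g_tail = lambda x: quad(g, x, 1)[0]             # int_x^1 g(s) ds
    g_head = lambda x: quad(g, -1, x)[0]            # int_{-1}^x g(s) ds

    term1 = quad(lambda u: g_tail(F(u, +1)) * f(u), 0, a)[0]
    term2 = quad(lambda u: (g_head(F(u, -1)) + g_tail(F(u, +1))) * f(u), a, b)[0]
    term3 = quad(f, b, np.inf)[0]
    return term1 + term2 + term3

print(P_of(50, 0.7, 0.6))   # approximately 0.9, consistent with Section 7
```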
The bound on the type Ⅰ error probability in (5.18) is given as a minimization over three competing terms, and it is by no means obvious that all of these terms are required to achieve the best performance. In order to show that all of them are indeed required, let us compare
\begin{align} E_{\mbox{ frac}}(\beta) = -\log \left(\min\limits_{\alpha \in [0, 1]} \frac{(n^{2}Q)^{\alpha} \cdot \mathbb{1}\{n^{2}Q > 1\} + n^{2}Q \cdot \mathbb{1}\{n^{2}Q \leq 1\}}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}}\right), \end{align} | (5.40) |
\begin{align} E_{\mbox{ sec}}(\beta) = -\log \left(\frac{n^{2}(n^{2}-1)Q^{2} + n^{2}Q}{\left(\beta [nP+n(n-1)Q]\right)^{2}}\right), \end{align} | (5.41) |
and
\begin{align} E_{\mbox{ int}}(\beta) = -\log \left(\inf\limits_{k \geq 3} \frac{k(k+1)\mathsf{B}(k)\left[(n^{2}Q)^{k} \cdot \mathbb{1}\{n^{2}Q > 1\} + n^{2}Q \cdot \mathbb{1}\{n^{2}Q \leq 1\}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{k}}\right). \end{align} | (5.42) |
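For completeness, the three exponents (5.40)-(5.42) can be evaluated directly once P and Q are available; in the following sketch (Python/NumPy, function names ours), the minimization over \alpha is performed on a grid and the infimum over k is restricted to \{3, \ldots, 30\} , as done in the text, and the helper for \mathsf{B}(k) repeats (5.16)-(5.17).

```python
import numpy as np
from math import comb, factorial

def B_const(k):                                  # B(k) of (5.17)
    S = lambda k, l: sum((-1)**i * comb(l, i) * (l - i)**k
                         for i in range(l + 1)) // factorial(l)
    return sum(S(k, l) * k**(2 * l) / factorial(l) for l in range(1, k + 1))

def fa_exponents(n, P, Q, beta, kmax=30):
    """E_frac, E_sec, E_int of (5.40)-(5.42); larger means a smaller FA bound."""
    Delta = beta * (n * P + n * (n - 1) * Q)     # global threshold beta * Delta_{n,d}
    nQ = n**2 * Q
    num = lambda power: nQ**power if nQ > 1 else nQ
    E_frac = -np.log(min(num(a) / Delta**a for a in np.linspace(0, 1, 101)))
    E_sec = -np.log((n**2 * (n**2 - 1) * Q**2 + n**2 * Q) / Delta**2)
    E_int = -np.log(min(k * (k + 1) * B_const(k) * num(k) / Delta**k
                        for k in range(3, kmax + 1)))
    return E_frac, E_sec, E_int

# Example usage: n = 200 with P and Q computed via (5.39) and (5.20), respectively.
```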
In Figure 4 below, we present plots of E_{\mbox{ frac}}(\beta) (solid line) and E_{\mbox{ sec}}(\beta) (dashed line) for the parameters n = 200 , d = 50 , \rho = 0.4 , and \theta = 0.6 . The quantity Q(d, \theta) is calculated numerically using (5.20), and P(d, \rho, \theta) is calculated using (5.39) in Lemma 2. As can be seen in Figure 4, at relatively low values of \beta , E_{\mbox{ frac}}(\beta) is constant and strictly improves upon E_{\mbox{ sec}}(\beta) . In this range, the optimizer is \alpha^{*} = 0 . At higher \beta values, the optimizer is \alpha^{*} = 1 , and E_{\mbox{ frac}}(\beta) is actually related to Markov's inequality. In the range of relatively high \beta values, E_{\mbox{ sec}}(\beta) outperforms E_{\mbox{ frac}}(\beta) , which is not very surprising, since Chebyshev's inequality is usually tighter than Markov's inequality.
With respect to the bound in E_{\mbox{ int}}(\beta) , the situation is somewhat more complicated and depends on the model's parameters. For E_{\mbox{ int}}(\beta) to be higher than E_{\mbox{ sec}}(\beta) , we have two conflicting requirements. On the one hand, we need n^{2}Q(d, \theta) \leq 1 , which holds for relatively high \theta values, but, on the other hand, we require nP(d, \rho, \theta) to be relatively large, which holds for relatively low \theta values. Thus, E_{\mbox{ int}}(\beta) improves on E_{\mbox{ sec}}(\beta) only in some cases but not generally. To capture this fact, we present plots of the three bounds in Figure 5 for the parameters n = 200 and d = 50 , but for two different pairs of (\rho, \theta) for which n^{2}Q(d, \theta) \leq 1 . Since the numbers \mathsf{B}(k) in (5.17) grow quite rapidly with k , when calculating the bound in E_{\mbox{ int}}(\beta) , we replaced the infimum over \{3, 4, \ldots\} with a minimization over \{3, 4, \ldots, 30\} . For \rho = 0.6 and \theta = 0.7 , it turns out that nP(d, \rho, \theta) \approx 24.2 ; in this case, E_{\mbox{ sec}}(\beta) uniformly outperforms E_{\mbox{ int}}(\beta) for all \beta \in [0, 1] . For \rho = 0.7 and \theta = 0.6 , we find that nP(d, \rho, \theta) \approx 179.3 . In this alternative case, E_{\mbox{ int}}(\beta) is higher than E_{\mbox{ sec}}(\beta) for relatively high \beta values.
By comparing among E_{\mbox{ frac}}(\beta) , E_{\mbox{ sec}}(\beta) , and E_{\mbox{ int}}(\beta) for specific parameter values, and showing that none of them uniformly dominates the others, we can safely conclude that, generally, all three bounds are needed in order to achieve the best guarantee for the type Ⅰ error probability.
As explained in Section 5.3, a feasible way to upper-bound the type Ⅰ probability of error is by using the generalized Chebyshev's inequality, which, in turn, requires evaluating (or bounding) the moments
\begin{align} \mathbb{E}_{0}\left[\left(\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right], \end{align} | (6.1) |
where \alpha > 0 . For \alpha \in (0, 1] we relied on existing techniques from [41]. The case \alpha > 1 seems to be much more intricate; nonetheless, for k -th order integer moments, we were able to derive plausible bounds by using some graph-theoretic concepts. In what follows, we present a slightly more general result, which is the main theoretical contribution of this work.
Theorem 2. Let n, d \in \mathbb{N} . Let \boldsymbol{X}_{1}, \ldots, \boldsymbol{X}_{n} and \boldsymbol{Y}_{1}, \ldots, \boldsymbol{Y}_{n} be two sets of IID random variables taking values in \mathbb{R}^{d} . Let J: \mathbb{R}^{d} \times \mathbb{R}^{d} \to \{0, 1\} and assume that
\begin{align} \mathbb{P}(J( \boldsymbol{x}, \boldsymbol{Y}) = 1) &\leq Q, \; \; \forall \boldsymbol{x} \in \mathbb{R}^d, \end{align} | (6.2) |
\begin{align} \mathbb{P}(J( \boldsymbol{X}, \boldsymbol{y}) = 1) &\leq Q, \; \; \forall \boldsymbol{y} \in \mathbb{R}^d, \end{align} | (6.3) |
\begin{align} \mathbb{P}(J( \boldsymbol{X}, \boldsymbol{Y}) = 1) &\leq Q, \end{align} | (6.4) |
where Q \in [0, 1] is independent of \boldsymbol{x} and \boldsymbol{y} . Let k \in \mathbb{N} . Then,
\begin{align} \mathbb{E} \left[\left( \sum\limits_{\ell = 1}^{n} \sum\limits_{m = 1}^{n} J( \boldsymbol{X}_{\ell}, \boldsymbol{Y}_{m}) \right)^{k}\right] \leq k(k+1)\mathsf{B}(k) \cdot \left\{ \begin{array}{l l} (n^{2}Q)^{k} & \quad \mathit{{if\; n^{2}Q > 1 , }} \\ n^{2}Q & \quad \mathit{{if\; n^{2}Q \leq 1 .}} \end{array} \right. \end{align} | (6.5) |
A proof of Theorem 2 is given in Appendix C. The three technical uniformity conditions in (6.2)–(6.4) clarify the motivation for the initial normalization of the feature vectors in (5.4) and (5.5). Thanks to these normalizations and the spherical symmetry of the Gaussian distribution, we have the following for any \boldsymbol{x} \in \mathbb{R}^d and \boldsymbol{y} \in \mathbb{R}^d :
\begin{align} \mathbb{P}\left\{\tilde{ \boldsymbol{x}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} = \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{y}} \geq \theta \right\} = \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} = Q(d, \theta). \end{align} | (6.6) |
Thus, we can use Theorem 2 when deriving theoretical guarantees on the type Ⅰ error probability of the count test as defined with N(\theta) . On the other hand, referring to the statistic in (5.3), which does not involve data normalization, we see that the conditions in (6.2)–(6.4) are not satisfied. For example, the probability \mathbb{P}\left\{ \boldsymbol{x}^{\mathsf{T}} \boldsymbol{Y} \geq \xi \right\} is given by a function of both \xi and \| \boldsymbol{x}\| .
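As a crude numerical illustration of Theorem 2 for the kernel J( \boldsymbol{x}, \boldsymbol{y}) = \mathbb{1}\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{y}} \geq \theta\} under \mathsf{H}_{0} , the following sketch estimates the k -th moment of N(\theta) by Monte Carlo and compares it with the bound (6.5); the parameter values are arbitrary, Q(d, \theta) is computed via (5.20), and the bound is expected to be loose.

```python
import numpy as np
from math import comb, factorial
from scipy.special import betainc

def B_const(k):                                   # B(k) of (5.17)
    S = lambda k, l: sum((-1)**i * comb(l, i) * (l - i)**k
                         for i in range(l + 1)) // factorial(l)
    return sum(S(k, l) * k**(2 * l) / factorial(l) for l in range(1, k + 1))

n, d, theta, k, trials = 50, 20, 0.5, 3, 2000
Q = 0.5 * betainc((d - 1) / 2, 0.5, 1 - theta**2)     # Q(d, theta), (5.20)
rng = np.random.default_rng(1)

moment = 0.0
for _ in range(trials):
    X = rng.standard_normal((n, d))
    Y = rng.standard_normal((n, d))
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    moment += float((Xn @ Yn.T >= theta).sum()) ** k / trials

bound = k * (k + 1) * B_const(k) * ((n**2 * Q)**k if n**2 * Q > 1 else n**2 * Q)
print(moment, bound)     # the empirical k-th moment should not exceed the bound
```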
We now move on to explaining the main proof concepts of Theorem 2 by referring to a slightly different probabilistic problem. The main benefit of switching to a closely related problem is that its proof techniques are essentially the same as those required in the proof of Theorem 2, while still being easy to understand from simple diagrams.
Let \boldsymbol{X}_{1}, \ldots, \boldsymbol{X}_{M} be a set of IID random variables taking values in \mathbb{R}^{d} and let J: \mathbb{R}^{d} \times \mathbb{R}^{d} \to \{0, 1\} . Let us denote the indicator random variables {\cal I}(m, m') = J(\boldsymbol{X}_m, \boldsymbol{X}_{m'}) and the enumerator
\begin{align} N = \sum\limits_{(m, m') \in [M]^2} {\cal I}(m, m'), \end{align} | (6.7) |
where the set [M]^2 is an abbreviation for the set \{(m, m'): m, m' \in \{1, 2, \ldots, M\}, m \neq m'\} . The main objective is to derive an upper bound for \mathbb{E}[N^k] . In what follows, we provide and explain only the novel concepts in analyzing this k -th-order moment. We divide the analysis into three main steps: Graph counting, graph pruning, and graph reduction.
For any k \in \mathbb{N} , let \mathsf{S}(k, d) be the number of ways to partition a set of k labeled objects into d \in \{1, 2, \dotsc, k\} nonempty unlabeled subsets, which is given by Stirling numbers of the second kind [45]:
\begin{align} \mathsf{S}(k, d) = \frac{1}{d!} \sum\limits_{i = 0}^{d} (-1)^{i} \binom{d}{i}(d-i)^{k}. \end{align} | (6.8) |
We begin as follows:
\begin{align} \mathbb{E}[N^{k}] & = \mathbb{E} \left[ \left( \sum\limits_{(m, m') \in [M]^{2}} {\cal I}(m, m') \right)^{k} \right] \end{align} | (6.9) |
\begin{align} & = \sum\limits_{(m_{1}, m'_{1}) \in [M]^{2}} \sum\limits_{(m_{2}, m'_{2}) \in [M]^{2}} \dots \sum\limits_{(m_{k}, m'_{k}) \in [M]^{2}} \mathbb{E} \left[ {\cal I}(m_{1}, m'_{1}) \cdot {\cal I}(m_{2}, m'_{2}) \cdot \ldots \cdot {\cal I}(m_{k}, m'_{k}) \right] \end{align} | (6.10) |
\begin{align} & = \sum\limits_{d = 1}^{k} \sum\limits_{\left\{\substack{(m_{i}, m'_{i}) \in [M]^{2}, \; 1 \leq i \leq k, \\ \text{divided into } d \text{ subsets of identical pairs}} \right\}} \mathbb{E} \left[ {\cal I}(m_{1}, m'_{1}) \cdot {\cal I}(m_{2}, m'_{2}) \cdot \ldots \cdot {\cal I}(m_{k}, m'_{k}) \right] \end{align} | (6.11) |
\begin{align} & = \sum\limits_{d = 1}^{k} \mathsf{S}(k, d) \sum\limits_{\left\{\substack{(m_{i}, m'_{i}) \in [M]^{2}, \; 1 \leq i \leq d, \\ (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j} \right\}} \mathbb{E} \left[ {\cal I}(m_{1}, m'_{1}) \cdot {\cal I}(m_{2}, m'_{2}) \cdot \ldots \cdot {\cal I}(m_{d}, m'_{d}) \right] , \end{align} | (6.12) |
where, in the inner summation in (6.11), we sum over all possible k pairs of vector indices, which are divided in any possible way into exactly d subsets, and all pairs in each subset are identicalⅣ. In (6.12), we use the Stirling numbers of the second kind and sum over exactly d distinct pairs of vector indices, where all the identical pairs of indices in (6.11) have been merged together, using the trivial fact that multiplying any number of identical indicator random variables is equal to any one of them.
ⅣTwo pairs (m_{1}, m'_{1}) and (m_{2}, m'_{2}) are called identical if and only if m_{1} = m_{2} and m'_{1} = m'_{2} ; otherwise, they are said to be distinct.
Let us handle the inner sum in (6.12). The idea is as follows. Instead of summing over the set \{(m_{i}, m'_{i}) \in [M]^{2}, \; 1 \leq i \leq d, \; (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j\} of d distinct pairs of vector indices, we represent each possible configuration of the indices in this set as a graph G , and sum over all the different graphs with exactly d distinct edges. In our graph representation, each vector index m_{i} \in \{1, 2, \dotsc, M\} is denoted by a vertex, and each pair of indices (m_{i}, m'_{i}) , m_{i} \neq m'_{i} , is connected by an edgeⅤ. Hence, the number of edges is fixed, but the numbers of vertices and subgraphs (i.e., disconnected parts of the graph) are variable.
ⅤIndices that belong to distinct pairs may be joined together, e.g., if (m_{1}, m'_{1}) and (m_{2}, m'_{2}) are distinct, then it may be that m_{1} = m_{2} or m'_{1} = m'_{2} , but not both.
Now, given d \in \{1, 2, \dotsc, k\} , we sum over the range of integers, which consists of all possible numbers of vertices needed to support a graph with d different edges. At one extreme, if all of the d edges are disconnected, then we must have 2d vertices. At the other extreme, let \mathcal{V}(d) be the minimal number of vertices that can support a graph of d edges.
Next, given d \in \{1, 2, \dotsc, k\} and v \in [\mathcal{V}(d), 2d] , we sum over the range of possible numbers of subgraphs. Let \mathcal{S}_{\mbox{ min}}(d, v) ( \mathcal{S}_{\mbox{ max}}(d, v) ) be the minimal (maximal) number of subgraphs that a graph with d edges and v vertices can have. For a given triplet (d, v, s) , where s \in [\mathcal{S}_{\mbox{ min}}(d, v), \mathcal{S}_{\mbox{ max}}(d, v)] is the number of subgraphs within G , note that one can create many different graphs (see Figure 6), and we have to take all of them into account. Hence, let \mathsf{T}(d, v, s) be the number of distinct ways (or topologies) to connect a graph with d edges, v vertices, and s subgraphs. Finally, for any 1 \leq i \leq \mathsf{T}(d, v, s) , let \mathsf{G}_{i}(d, v, s) be the set of different graphs with d edges, v vertices, s subgraphs, and the topology i , which can be defined on the set of M vectors, as we explained before.
Let \Theta(G) be an indicator random variable that equals one if and only if all the indicator random variables related to the pairs of vectors that are linked by the edges of G equals one. With the definitions above, the inner sum in (6.12) can now be written as:
\begin{align} &\sum\limits_{\{(m_{i}, m'_{i}) \in [M]^{2}, \; 1 \leq i \leq d, \; (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j\}} \mathbb{E} \left[ {\cal I}(m_{1}, m'_{1}) \cdot {\cal I}(m_{2}, m'_{2}) \cdot \ldots \cdot {\cal I}(m_{d}, m'_{d}) \right] \\ &\equiv \sum\limits_{\{(m_{i}, m'_{i}) \in [M]^{2}, \; 1 \leq i \leq d, \; (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j\}} \mathbb{E} \left[ \prod\limits_{i = 1}^{d} J( \boldsymbol{X}_{m_{i}}, \boldsymbol{X}_{m'_{i}}) \right] \end{align} | (6.13) |
\begin{align} & = \sum\limits_{v = \mathcal{V}(d)}^{2d} \sum\limits_{s = \mathcal{S}_{\mbox{ min}}(d, v)}^{\mathcal{S}_{\mbox{ max}}(d, v)} \sum\limits_{i = 1}^{\mathsf{T}(d, v, s)} \sum\limits_{G \in \mathsf{G}_{i}(d, v, s)} \mathbb{E} \left[ \Theta(G) \right], \end{align} | (6.14) |
i.e., for any d \in \{1, 2, \dotsc, k\} , we first sum over the number of vertices, then over the number of subgraphs. Later, for a fixed triplet (d, v, s) , we sum over all possible \mathsf{T}(d, v, s) topologies and finally over all specific graphs G \in \mathsf{G}_{i}(d, v, s) with a given topology.
It turns out that the expectation of \Theta(G) can be easily evaluated if all subgraphs of G are trees (i.e., if G is a forest, in the terminology of graph theory). If at least one subgraph of G contains cycles, we apply a process of graph pruning, in which we cut out the minimal number of edgesⅥ, while keeping all vertices intact, until we get a forest (for example, see Figure 7).
ⅥIn fact, this procedure is equivalent to upper-bounding some of the indicator functions in (6.13) by one.
Denote by \mathcal{P}(G) the pruned graph of G . Notice that the expectation of \Theta(G) is upper-boundedⅦ by the expectation of \Theta(\mathcal{P}(G)) , and equality holds between them if G is a forest. Before we proceed, let us calculate the quantity \mathcal{E}(G) , which is the number of edges in \mathcal{P}(G) . Of course, if G is already a forest, then \mathcal{E}(G) = d . For any G \in \mathsf{G}_{i}(d, v, s) which is not necessarily a forest, we find \mathcal{E}(G) as follows. Assume that v_{1}, v_{2}, \dotsc, v_{s} are the numbers of vertices in each of the s subgraphs of G . Then, in the process of graph pruning, each of the j \in \{1, 2, \dotsc, s\} subgraphs of G will transform into a tree with exactly v_{j} - 1 edges. Hence,
ⅦIt follows from the fact that \Theta(G) \leq\Theta(\mathcal{P}(G)) with probability one.
\begin{align} \mathcal{E}(G) = \sum\limits_{j = 1}^{s} (v_{j} - 1) = v - s. \end{align} | (6.15) |
The expectation of \Theta(\mathcal{P}(G)) can be upper-bounded in a simple iterative process of graph reduction. In the first step, we take the expectation with respect to all vectors that are labels of leaf vertices in \mathcal{P}(G) , while we condition on the realizations of all other vectors, namely those corresponding to inner vertices in \mathcal{P}(G) ; afterwards, we erase the leaf vector vertices and the corresponding edges. The successive steps are identical to the first one on the remaining (unerased) graph, continuing until all vectors that are attributed to G have been considered (for example, see Figure 8).
In each step of graph reduction, the expectation with respect to each of the leaf vectors is upper-bounded as \mathbb{P} \left\{ J( \boldsymbol{x}, \boldsymbol{X}_{\mbox{ leaf}}) = 1 \right\} \leq Q , regardless of \boldsymbol{x} , since we condition on the realizations of vectors that are attributed to the inner vertices. For any graph G \in \mathsf{G}_{i}(d, v, s) , we conclude that the expectation of \Theta(\mathcal{P}(G)) is upper-bounded by Q^{\mathcal{E}(G)} . Thus, for any G \in \mathsf{G}_{i}(d, v, s) , we obtain the following upper bound on the expectation of \Theta(G) :
\begin{align} \mathbb{E}\left[ \Theta(G) \right] \leq \mathbb{E} \left[ \Theta(\mathcal{P}(G)) \right] \leq Q^{\mathcal{E}(G)} = Q^{v-s}. \end{align} | (6.16) |
This bound involves, in fact, the main conceptual steps in the derivation of an upper bound on \mathbb{E}[N^{k}] ; after upper-bounding (6.14) using (6.16), the remaining steps are much more technical and will not be presented here, as they resemble the second half of the proof of Theorem 2 (in Appendix C).
We now show that for some parameter choices, the new proposed statistical test in (5.12) achieves lower type Ⅰ and type Ⅱ error probabilities than the sum test in (4.2), which was proposed in [13]. The fact that we could prove that the new test has lower error probabilities is not trivial, since the analysis in [13] is based on Chernoff's bound, which is known to be relatively tightⅧ in many cases, while the analysis in the current manuscript follows other methods that are able to accommodate the statistical dependencies between the indicator random variables in (5.6).
ⅧIn the sense that a lower bound with a matching exponent function can be derived.
For each specific \rho , the trade-off between the two error probabilities varies significantly with the choices of the thresholds \beta and \theta ; hence, these parameters should be tuned accordingly to achieve the desired trade-off. Let us denote the right-hand sides of (5.18) and (5.19) as P_{\mbox{ FA}}(\beta, \theta) and P_{\mbox{ MD}}(\beta, \theta) , respectively, where the dependence on (n, d, \rho) is made implicit. We define a unique performance curve for the count test by minimizing the type Ⅱ error probability while constraining the type Ⅰ error probability to a desired level. More specifically,
\begin{align} E_{\mbox{ MD}}^{\mbox{ Count}}(n, d, \rho, E_{\mbox{ FA}}) = \max\limits_{\{(\beta, \theta) \in (0, 1)^2:\; -\log P_{\mbox{ FA}}(\beta, \theta) \geq E_{\mbox{ FA}}\}} -\log P_{\mbox{ MD}}(\beta, \theta). \end{align} | (7.1) |
In Figure 9 below, we present a plot of the trade-off between \tfrac{d}{2}G_{\mbox{ FA}}(\gamma) and \tfrac{d}{2}G_{\mbox{ MD}}(\gamma) for \gamma \in (0, 4\rho^{2}) (solid line) together with a plot of E_{\mbox{ MD}}^{\mbox{ Count}}(n, d, \rho, E_{\mbox{ FA}}) as a function of E_{\mbox{ FA}} (scattered points) for the parameters n = 200 , d = 50 , and for two values of \rho . The quantity Q(d, \theta) is calculated numerically using (5.20) and P(d, \rho, \theta) is calculated using (5.39) in Lemma 2. Since the numbers \mathsf{B}(k) in (5.17) grow quite rapidly with k , when calculating the bound in (5.42), we replaced the infimum over \{3, 4, 5, \ldots\} with a minimization over \{3, 4, \ldots, 30\} . As can be seen in Figure 9, the newly proposed statistical test achieves lower error probabilities for both \rho = 0.7 and \rho = 0.4 .
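For completeness, the curve in (7.1) can be traced by a straightforward grid search over the two thresholds. The sketch below is purely illustrative: the arguments p_fa and p_md stand for implementations of the right-hand sides of (5.18) and (5.19), which are assumed to be available; their names and signatures are hypothetical.

```python
import numpy as np

def count_md_exponent(n, d, rho, E_FA, p_fa, p_md, grid=200):
    """Grid-search approximation of (7.1): maximize -log P_MD subject to -log P_FA >= E_FA."""
    best = -np.inf
    for beta in np.linspace(0.01, 0.99, grid):
        for theta in np.linspace(0.01, 0.99, grid):
            if -np.log(p_fa(n, d, rho, beta, theta)) >= E_FA:
                best = max(best, -np.log(p_md(n, d, rho, beta, theta)))
    return best
```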
In the comparison above, we see that the count test uniformly outperforms the sum test for specific choices of the parameter triplet n, d, \rho . We now show that the range of parameter choices for which the count test is better than the sum test is relatively wide. To compare the two tests, let us define
\begin{align} E_{\mbox{ MD}}^{\mbox{ Sum}}(d, \rho, E_{\mbox{ FA}}) = \max\limits_{\{\gamma \in (0, 4\rho^2):\; \tfrac{d}{2}G_{\mbox{ FA}}(\gamma) \geq E_{\mbox{ FA}}\}} \tfrac{d}{2}G_{\mbox{ MD}}(\gamma). \end{align} | (7.2) |
We let d vary while fixing the parameter values n = 200 and E_{\mbox{ FA}} = 3 , such that the type Ⅰ error probabilities are upper-bounded by approximately 0.05 for any value of d . In Figure 10 below, we present plots of E_{\mbox{ MD}}^{\mbox{ Count}}(n, d, \rho, E_{\mbox{ FA}}) and E_{\mbox{ MD}}^{\mbox{ Sum}}(d, \rho, E_{\mbox{ FA}}) as functions of d for two values of \rho . As can be seen in Figure 10, the count test outperforms the sum test for relatively low d values, while the sum test performs better otherwise. One should note that for lower values of \rho , the range of d in which the count test outperforms the sum test increases.
It may be somewhat surprising to some readers that the sum test was found to perform worse than the count test in the Gaussian setup. In particular, it is possible that the sum test may actually perform better in practice, but the (Chernoff-based) bounds for its error probabilities may not be tight enough to show this. In order to reject this hypothesis, we computed the type Ⅰ and type Ⅱ error probabilities of both tests via Monte–Carlo simulations. In Figure 11 below, we present points of the form (-\log \hat{P}_{\mbox{ FA}}, -\log \hat{P}_{\mbox{ MD}}) for some threshold values t in the sum test and for some threshold pairs (\theta, \beta) in the count test for the parameters n = 200 , d = 50 , and \rho = 0.3 . In order to facilitate the simulations for the count test, we fixed the local threshold as \theta = 0.4 and picked a few values for \beta in (0, 1) . The error probabilities of the sum test have been estimated on the basis of 20,000 samples at each point, while for the count test we increased the sample size to 200,000 at each point. As can be seen in Figure 11, the estimated error probabilities of the sum test are higher than those of the count test, so the latter also performs better in practice, at least for the specific parameter values chosen here.
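For readers who wish to reproduce a comparison of this kind, the following minimal sketch (not the exact code used for Figure 11) estimates the two error probabilities of the count test by Monte–Carlo simulation; the decision threshold is written as a free parameter tau rather than \beta[nP+n(n-1)Q] .

```python
import numpy as np

def count_statistic(X, Y, theta):
    """N(theta): the number of normalized row pairs whose inner product is at least theta."""
    Xt = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yt = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return np.sum(Xt @ Yt.T >= theta)

def estimate_errors(n, d, rho, theta, tau, trials=2000, seed=0):
    """Monte-Carlo estimates of the type I and type II error probabilities of the count test."""
    rng = np.random.default_rng(seed)
    fa = md = 0
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        # Null hypothesis: Y is independent of X.
        Y0 = rng.standard_normal((n, d))
        fa += count_statistic(X, Y0, theta) >= tau
        # Alternative hypothesis: Y is correlated with X and its rows are permuted.
        Y1 = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal((n, d))
        md += count_statistic(X, Y1[rng.permutation(n)], theta) < tau
    return fa / trials, md / trials
```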
In this manuscript, we propose the count test, a new statistical test for the correlation detection problem between two Gaussian databases. The proposed test has a reduced computational complexity in comparison with the GLR test, which makes it more desirable from a practical point of view. The count test relies on a statistic given by a sum of weakly dependent indicator random variables. Therefore, novel graph-theoretic techniques have been developed in order to derive theoretical guarantees on the type Ⅰ probability of error. We compared the count test with the sum test, which was recently proposed as a valid solution for the same statistical problem. By comparing the analytical bounds on the two error probabilities, we find that the count test outperforms the sum test over a relatively wide range of system parameters. In principle, the probability bounds may not reflect the true performance of the tests, and thus we also present a comparison between the estimated error probabilities that result from Monte–Carlo simulations of both tests. In this comparison as well, the count test outperforms the sum test. Further possible directions for future research are suggested as follows.
(1) A goal for future research is to analyze the GLR test for the correlation detection problem and compare its performance with those of the sum test and the count test. Bounding the type Ⅱ error probability of the GLR test should be relatively straightforward, but deriving a plausible bound for its type Ⅰ error probability seems to be more challenging and may require new probabilistic tools. In addition, one can also propose a generalization of the simplified form of the GLR test in (4.10), which trades off computational complexity against reliability; we suggest a statistical test with a running time of {\cal O}(n^{\delta}) for some \delta \in (1, 3) . Specifically, we suggest the following test. For two given databases \boldsymbol{X}^{n} and \boldsymbol{Y}^{n} and a predefined \alpha \in [0, 1] , we consider the first n^{\alpha} rows in \boldsymbol{X}^{n} and solve the partial assignment problem between \{ \boldsymbol{X}_{1}, \ldots, \boldsymbol{X}_{n^{\alpha}}\} and \boldsymbol{Y}^{n} , i.e., we calculate the statistic
\begin{align} W(\alpha) = \max\limits_{\{ {\cal A} \subseteq [n], | {\cal A}| = n^{\alpha}\}} \max\limits_{ \boldsymbol{\sigma} \in {\cal S}_{n^{\alpha}}} \sum\limits_{i = 1}^{n^{\alpha}} \boldsymbol{X}_{i}^{\mathsf{T}} \boldsymbol{Y}_{\sigma_{i}}^{ {\cal A}}, \end{align} | (8.1) |
where \boldsymbol{Y}^{ {\cal A}} is a database that contains only the rows from \boldsymbol{Y}^{n} , with indices as given by the set {\cal A} \subseteq [n] . The proposed test then compares W(\alpha) to a threshold as follows:
\begin{align} \phi_{\mbox{ GLR}}( \boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) = \left\{ \begin{array}{l l} 0 & \quad { W(\alpha) < \xi } \\ 1 & \quad { W(\alpha) \geq \xi , } \end{array} \right. \end{align} | (8.2) |
where \xi \in (\mathbb{E}_{0}[W(\alpha)], \mathbb{E}_{1}[W(\alpha)]) is a global threshold.
Concerning the running time of the test \phi_{{\mbox{ GLR}}}(\boldsymbol{X}^{n}, \boldsymbol{Y}^{n}) , it is directly connected to the computational complexity of the partial assignment problem between J jobs and W workers ( J \leq W ). Such an assignment problem can be solved sequentially; one can add the j -th job and update the total cost in time {\cal O}(jW) , yielding an overall time complexity of {\cal O}(\sum_{j = 1}^{J}jW) = {\cal O}(J^{2}W) . In our case, the time complexity is given by {\cal O}(n^{1+2\alpha}) . It may be of special interest to compare this test at \alpha = 0 with the sum test (whose computational complexity scales like {\cal O}(n) ) and, additionally, to compare this test at \alpha = 1/2 with the count test (whose computational complexity scales like {\cal O}(n^2) ).
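To make the suggested test concrete, the maximization in (8.1) is a rectangular (partial) assignment problem: the first n^{\alpha} rows of \boldsymbol{X}^{n} are matched to distinct rows of \boldsymbol{Y}^{n} so as to maximize the sum of inner products. A minimal sketch follows, in which SciPy's Hungarian-algorithm solver is used as a stand-in for the sequential procedure described above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def partial_assignment_statistic(X, Y, alpha):
    """W(alpha) in (8.1): optimally match the first n**alpha rows of X to distinct rows of Y."""
    n = X.shape[0]
    m = max(1, int(round(n**alpha)))
    scores = X[:m] @ Y.T                              # m x n matrix of inner products
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return scores[rows, cols].sum()

def glr_like_test(X, Y, alpha, xi):
    """The test in (8.2): declare correlation whenever W(alpha) crosses the global threshold xi."""
    return int(partial_assignment_statistic(X, Y, alpha) >= xi)
```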
(2) In the current manuscript, we have considered a relatively simplified model, in which the components of every feature vector are IID and Gaussian. In practice, both of these assumptions may not hold. As a first step towards more general settings, one should consider a model where each feature vector is still Gaussian but has a general covariance matrix. In a second step, a model with non-Gaussian feature vectors may be considered as well. It is of great interest to explore statistical tests for more general models that are characterized by high reliability and low computational complexity.
The author declares that he has not used Artificial Intelligence (AI) tools in the creation of this article.
The author gratefully acknowledges the timely and constructive reports of the anonymous referees, as well as editorial comments provided by the editor. Special thanks are extended to Edward Hall (EPFL) for his help in executing the Monte–Carlo simulations in Section 7. The author also thanks Argaman and Omri Tamir for sharing their artistic preferences when finalizing Figure 4.
The author declares no conflicts of interest.
Preliminaries
The main purpose of this appendix is to provide the general setting and the main result borrowed from [38].
Let \{U_{ \boldsymbol{k}}\}_{ \boldsymbol{k} \in {\cal K}} be a family of Bernoulli random variables, where {\cal K} is a set of multidimensional indices. Let {\cal G} be a dependency graph for \{U_{ \boldsymbol{k}}\}_{ \boldsymbol{k} \in {\cal K}} , i.e., a graph with a vertex set {\cal K} such that if {\cal A} and {\cal B} are two disjoint subsets of {\cal K} , and {\cal G} contains no edge between {\cal A} and {\cal B} , then the families \{U_{ \boldsymbol{k}}\}_{ \boldsymbol{k} \in {\cal A}} and \{U_{ \boldsymbol{k}}\}_{ \boldsymbol{k} \in {\cal B}} are independent. Let S = \sum_{ \boldsymbol{k} \in {\cal K}} U_{ \boldsymbol{k}} and \Delta = \mathbb{E}[S] . Moreover, we write \boldsymbol{i} \sim \boldsymbol{j} if (\boldsymbol{i}, \boldsymbol{j}) is an edge in the dependency graph {\cal G} . Let
\begin{align} \Omega = \max\limits_{ \boldsymbol{i} \in {\cal K}} \sum\limits_{ \boldsymbol{j} \in {\cal K}, \boldsymbol{j} \sim \boldsymbol{i}} \mathbb{E} [U_{ \boldsymbol{j}}] \end{align} | (A.1) |
and
\begin{align} \Theta = \frac{1}{2} \sum\limits_{ \boldsymbol{i} \in {\cal K}} \sum\limits_{ \boldsymbol{j} \in {\cal K}, \boldsymbol{j} \sim \boldsymbol{i}} \mathbb{E} [U_{ \boldsymbol{i}}U_{ \boldsymbol{j}}]. \end{align} | (A.2) |
The following result will be used in the proof of Theorem 1 in Appendix B:
Theorem 3. ([38], Theorem 10) With the notation given above, for any 0\leq \beta \leq 1 ,
\begin{align} \mathbb{P} \{S \leq \beta \Delta\} \leq \exp \left\{- \min \left( (1-\beta)^{2} \frac{\Delta^{2}}{8\Theta+2\Delta}, (1-\beta) \frac{\Delta}{6\Omega} \right) \right\} . \end{align} | (A.3) |
Analysis of false alarms
For \alpha \geq 0 , consider the following:
\begin{align} P_{\mbox{ FA}}(\phi_{\mbox{ Count}}) & = \mathbb{P}_{0} \left\{N(\theta) \geq \beta [nP+n(n-1)Q] \right\} \end{align} | (B.1) |
\begin{align} & = \mathbb{P}_{0} \left\{\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1} \left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \geq \beta [nP+n(n-1)Q] \right\} \end{align} | (B.2) |
\begin{align} & = \mathbb{P}_{0} \left\{\left(\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha} \geq \left(\beta [nP+n(n-1)Q]\right)^{\alpha} \right\} \end{align} | (B.3) |
\begin{align} &\leq \frac{ \mathbb{E}_{0} \left[\left(\sum\nolimits_{i = 1}^{n} \sum\nolimits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}}, \end{align} | (B.4) |
where (B.4) is due to Markov's inequality. Since \alpha \geq 0 is arbitrary, it holds that
\begin{align} P_{\mbox{ FA}}(\phi_{\mbox{ Count}}) &\leq \inf\limits_{\alpha \geq 0} \frac{ \mathbb{E}_{0} \left[\left(\sum\nolimits_{i = 1}^{n} \sum\nolimits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}} \end{align} | (B.5) |
\begin{align} & = \min \left\{ \min\limits_{\alpha \in [0, 1]} \frac{ \mathbb{E}_{0} \left[\left(\sum\nolimits_{i = 1}^{n} \sum\nolimits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}} , \right. \\ &\; \; \; \; \; \; \; \; \; \; \; \; \; \; \left. \inf\limits_{\alpha \geq 1} \frac{ \mathbb{E}_{0} \left[\left(\sum\nolimits_{i = 1}^{n} \sum\nolimits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}} \right\} \end{align} | (B.6) |
\begin{align} &\leq \min \left\{ \min\limits_{\alpha \in [0, 1]} \frac{ \mathbb{E}_{0} \left[\left(\sum\nolimits_{i = 1}^{n} \sum\nolimits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}} , \right. \\ &\; \; \; \; \; \; \; \; \; \; \; \; \; \; \left. \inf\limits_{k \in \mathbb{N}} \frac{ \mathbb{E}_{0} \left[\left(\sum\nolimits_{i = 1}^{n} \sum\nolimits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{k}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{k}} \right\}, \end{align} | (B.7) |
so the main tasks are to evaluate the fractional \alpha -th moment and the integer k -th moment of N(\theta) . We start with the fractional \alpha -th moment of N(\theta) and proceed as follows. For a given \alpha \in [0, 1] , let \xi \in [\alpha, 1] . In this case,
\begin{align} & \mathbb{E}_{0} \left[\left(\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right] \\ &\; \; \; \; = \mathbb{E}_{0} \left\{\left[\left(\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\xi}\right]^{\alpha/\xi}\right\} \end{align} | (B.8) |
\begin{align} &\; \; \; \; \leq \mathbb{E}_{0} \left\{\left[\sum\limits_{i = 1}^{n} \left(\sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\xi}\right]^{\alpha/\xi}\right\} \end{align} | (B.9) |
\begin{align} &\; \; \; \; \leq \left\{ \mathbb{E}_{0}\left[\sum\limits_{i = 1}^{n} \left(\sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\xi}\right]\right\}^{\alpha/\xi} \end{align} | (B.10) |
\begin{align} &\; \; \; \; = \left\{\sum\limits_{i = 1}^{n} \mathbb{E}_{0}\left[\left(\sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\xi}\right]\right\}^{\alpha/\xi}, \end{align} | (B.11) |
where the first inequality is based on the fact that (\sum_{i}a_{i})^{t} \leq \sum_{i}a_{i}^{t} whenever \{a_{i}\} are non-negative and t \in [0, 1] , and the second inequality follows from Jensen's inequality and the concavity of the function f(u) = u^{\alpha/\xi} when u \geq 0 and 0 < \alpha/\xi \leq 1 . In order to proceed, we use the following modification of [43, Appendix A, Lemma 1.1].
Lemma 3. Let I_{1}, \ldots, I_{n} be IID Bernoulli random variables with success probability bounded as:
\begin{align} p = \mathbb{P}(I_{i} = 1) = 1- \mathbb{P}(I_{i} = 0) \leq Q, \end{align} | (B.12) |
where n \in \mathbb{N} and Q \in [0, 1] . Let \xi \in (0, 1) . In this case,
\begin{align} \mathbb{E} \left[\left( \sum\limits_{i = 1}^{n} I_{i} \right)^{\xi}\right] \leq \left\{ \begin{array}{l l} nQ & \quad \mathit{{if\; nQ \leq 1 , }} \\ (nQ)^{\xi} & \quad \mathit{{if \; nQ > 1 .}} \end{array} \right. \end{align} | (B.13) |
Proof. We distinguish between two cases. If nQ \leq 1 , then we rely on the result in [43, Equation (149)] with (\lceil\xi\rceil!)^2\lceil\xi\rceil = 1 to conclude that
\begin{align} \mathbb{E} \left[\left( \sum\limits_{i = 1}^{n} I_{i} \right)^{\xi}\right] \leq nQ. \end{align} | (B.14) |
Conversely, if nQ > 1 , Jensen's inequality and the concavity of the function f(u) = u^{\xi} when u \geq 0 and 0 < \xi < 1 yields
\begin{align} \mathbb{E} \left[\left( \sum\limits_{i = 1}^{n} I_{i} \right)^{\xi}\right] \leq \left( \mathbb{E}\left[\sum\limits_{i = 1}^{n} I_{i}\right]\right)^{\xi} \leq (nQ)^{\xi}. \end{align} | (B.15) |
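A quick Monte–Carlo check of Lemma 3, with illustrative parameter values, can be carried out as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q, xi = 50, 0.005, 0.6                        # here nQ = 0.25 <= 1, so the bound is nQ
S = rng.binomial(1, Q, size=(200000, n)).sum(axis=1)
print(np.mean(S**xi), "<=", n * Q)               # empirical fractional moment vs. the bound nQ
```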
Note that, conditioned on \boldsymbol{X}_{i} = \boldsymbol{x} , the random variables \left\{ \mathbb{1}\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \}\right\}_{j = 1}^{n} are IID Bernoulli, and thus
\begin{align} \mathbb{E}_{0}\left[\left(\sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\xi} \middle| \boldsymbol{X}_{i} \right] \leq \left\{ \begin{array}{l l} nQ & \quad {\text{if}\; nQ \leq 1 , } \\ (nQ)^{\xi} & \quad {\text{if}\; nQ > 1 .} \end{array} \right. \end{align} | (B.16) |
Since the right-hand side of (B.16) does not depend on \boldsymbol{X}_{i} , it also holds that
\begin{align} \mathbb{E}_{0}\left[\left(\sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\xi} \right] \leq \left\{ \begin{array}{l l} nQ & \quad {\text{if}\; nQ \leq 1 , } \\ (nQ)^{\xi} & \quad {\text{if}\; nQ > 1 .} \end{array} \right. \end{align} | (B.17) |
Upper-bounding (B.11) using (B.17) yields
\begin{align} \mathbb{E}_{0} \left[\left(\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right] &\leq \left(\sum\limits_{i = 1}^{n} \left\{ \begin{array}{l l} nQ & \quad {\text{if}\; nQ \leq 1 } \\ (nQ)^{\xi} & \quad {\text{if}\; nQ > 1 } \end{array} \right. \right)^{\alpha/\xi} \end{align} | (B.18) |
\begin{align} & = \left(\left\{ \begin{array}{l l} n^{2}Q & \quad {\text{if}\; nQ \leq 1 } \\ n^{1+\xi}Q^{\xi} & \quad {\text{if}\; nQ > 1 } \end{array} \right. \right)^{\alpha/\xi} \end{align} | (B.19) |
\begin{align} & = \left\{ \begin{array}{l l} (n^{2}Q)^{\alpha/\xi} & \quad {\text{if}\; nQ \leq 1 } \\ n^{(1/\xi+1)\alpha}Q^{\alpha} & \quad {\text{if}\; nQ > 1 } \end{array} \right. . \end{align} | (B.20) |
Finally, minimizing over \xi \in [\alpha, 1] yields
\begin{align} \mathbb{E}_{0} \left[\left(\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{\alpha}\right] &\leq \min\limits_{\xi \in [\alpha, 1]} \left\{ \begin{array}{l l} (n^{2}Q)^{\alpha/\xi} & \quad {\text{if}\; nQ \leq 1 } \\ n^{(1/\xi+1)\alpha}Q^{\alpha} & \quad {\text{if}\; nQ > 1 } \end{array} \right. \end{align} | (B.21) |
\begin{align} & = \min\limits_{\xi \in [\alpha, 1]} \left\{ \begin{array}{l l} (n^{2}Q)^{\alpha/\xi} & \quad {\text{if}\; nQ \leq 1, n^{2}Q \leq 1 } \\ (n^{2}Q)^{\alpha/\xi} & \quad {\text{if}\; nQ \leq 1, n^{2}Q > 1 } \\ n^{(1/\xi+1)\alpha}Q^{\alpha} & \quad {\text{if}\; nQ > 1, n^{2}Q > 1 } \end{array} \right. \end{align} | (B.22) |
\begin{align} & = \left\{ \begin{array}{l l} (n^{2}Q)^{\alpha/\alpha} & \quad {\text{if}\; nQ \leq 1, n^{2}Q \leq 1 } \\ (n^{2}Q)^{\alpha/1} & \quad {\text{if}\; nQ \leq 1, n^{2}Q > 1 } \\ n^{(1/1+1)\alpha}Q^{\alpha} & \quad {\text{if}\; nQ > 1, n^{2}Q > 1 } \end{array} \right. \end{align} | (B.23) |
\begin{align} & = \left\{ \begin{array}{l l} n^{2}Q & \quad {\text{if}\; nQ \leq 1, n^{2}Q \leq 1 } \\ (n^{2}Q)^{\alpha} & \quad {\text{if}\; nQ \leq 1, n^{2}Q > 1 } \\ (n^{2}Q)^{\alpha} & \quad {\text{if}\; nQ > 1, n^{2}Q > 1 } \end{array} \right. \end{align} | (B.24) |
\begin{align} & = \left\{ \begin{array}{l l} n^{2}Q & \quad {\text{if}\; n^{2}Q \leq 1 } \\ (n^{2}Q)^{\alpha} & \quad {\text{if}\; n^{2}Q > 1 } \end{array} \right. . \end{align} | (B.25) |
It then follows that the first term inside the minimum in (B.7) is upper-bounded by
\begin{align} \min\limits_{\alpha \in [0, 1]} \frac{(n^{2}Q)^{\alpha} \cdot \mathbb{1}\{n^{2}Q > 1\} + n^{2}Q \cdot \mathbb{1}\{n^{2}Q \leq 1\}}{\left(\beta [nP+n(n-1)Q]\right)^{\alpha}}. \end{align} | (B.26) |
Moving on to upper-bounding the integer k -th moments of N(\theta) , let us start with the case k = 2 .
\begin{align} & \mathbb{E}_{0}\left[\left(\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\}\right)^{2}\right] \\ &\; \; \; \; = \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \sum\limits_{k = 1}^{n} \sum\limits_{\ell = 1}^{n} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta, \tilde{ \boldsymbol{X}}_{k}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \end{align} | (B.27) |
\begin{align} &\; \; \; \; = \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \sum\limits_{k\neq i} \sum\limits_{\ell\neq j} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta, \tilde{ \boldsymbol{X}}_{k}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \\ &\; \; \; \; \; \; + \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \sum\limits_{k\neq i} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta, \tilde{ \boldsymbol{X}}_{k}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \\ &\; \; \; \; \; \; +\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \sum\limits_{\ell\neq j} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta, \tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \\ &\; \; \; \; \; \; +\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \end{align} | (B.28) |
\begin{align} &\; \; \; \; = n^{2}(n-1)^{2} Q^{2} + n^{2}(n-1) Q^{2} + n^{2}(n-1) Q^{2} + n^{2} Q \end{align} | (B.29) |
\begin{align} &\; \; \; \; = n^{2}(n^{2}-1)Q^{2} + n^{2}Q, \end{align} | (B.30) |
where (B.29) is due to
\begin{align} & \mathbb{P}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta, \tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \\ &\; \; = \int \mathbb{P}\left\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta, \tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \mathrm{d} F_{ \boldsymbol{X}_{i}}( \boldsymbol{x}) \end{align} | (B.31) |
\begin{align} &\; \; = \int \mathbb{P}\left\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta\right\} \cdot \mathbb{P}\left\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \mathrm{d} F_{ \boldsymbol{X}_{i}}( \boldsymbol{x}) \end{align} | (B.32) |
\begin{align} &\; \; = \int Q^{2} \mathrm{d} F_{ \boldsymbol{X}_{i}}( \boldsymbol{x}) \end{align} | (B.33) |
\begin{align} &\; \; = Q^{2}. \end{align} | (B.34) |
The resulting second-order bound is given by
\begin{align} \frac{n^{2}(n^{2}-1)Q^{2} + n^{2}Q}{\left(\beta [nP+n(n-1)Q]\right)^{2}}. \end{align} | (B.35) |
For the general case of k \geq 3 , we use the result of Theorem 2 (which appears in Section 6). It follows that the second term inside the minimum in (B.7) is upper-bounded by
\begin{align} \inf\limits_{k \geq 3} \frac{k(k+1)\mathsf{B}(k)\left[(n^{2}Q)^{k} \cdot \mathbb{1}\{n^{2}Q > 1\} + n^{2}Q \cdot \mathbb{1}\{n^{2}Q \leq 1\}\right]}{\left(\beta [nP+n(n-1)Q]\right)^{k}}. \end{align} | (B.36) |
Upper-bounding (B.7) with (B.26), (B.35), and (B.36) proves (5.18) in Theorem 1.
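Numerically, the resulting false-alarm bound can be evaluated by combining (B.26), (B.35), and (B.36) and truncating the infimum at a finite value of k , as is done in Section 7. The sketch below assumes that \mathsf{B}(k) in (5.17) equals \sum_{d = 1}^{k} \mathsf{S}(k, d) k^{2d}/d! , as suggested by the passage from (C.27) to (C.28); the function names are illustrative.

```python
import numpy as np
from math import comb, factorial

def stirling2(k, d):
    """Stirling numbers of the second kind, as in (C.1)."""
    return sum((-1)**i * comb(d, i) * (d - i)**k for i in range(d + 1)) // factorial(d)

def B(k):
    """Assumed form of B(k), read off from the passage (C.27)-(C.28)."""
    return sum(stirling2(k, d) * k**(2 * d) / factorial(d) for d in range(1, k + 1))

def fa_bound(n, P, Q, beta, k_max=30):
    """Minimum of the three pieces (B.26), (B.35), and (B.36), with the infimum truncated at k_max."""
    thr = beta * (n * P + n * (n - 1) * Q)            # the threshold beta * Delta_{n,d}
    m = n**2 * Q                                      # appears throughout (B.25)-(B.36)
    alphas = np.linspace(1e-3, 1.0, 1000)
    frac = np.min((m**alphas if m > 1 else m) / thr**alphas)              # (B.26)
    second = (n**2 * (n**2 - 1) * Q**2 + n**2 * Q) / thr**2               # (B.35)
    higher = min(k * (k + 1) * B(k) * (m**k if m > 1 else m) / thr**k
                 for k in range(3, k_max + 1))                            # (B.36)
    return min(frac, second, higher)
```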
Analysis of missed detection
Consider the following:
\begin{align} P_{\mbox{ MD}}(\phi_{\mbox{ Count}}) & = \mathbb{P}_{1| \boldsymbol{\sigma}} \left\{N(\theta) \leq \beta \left[nP+n(n-1)Q\right] \right\} \end{align} | (B.37) |
\begin{align} & = \mathbb{P}_{1| \boldsymbol{\sigma}} \left\{\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \leq \beta \Delta_{n, d} \right\}. \end{align} | (B.38) |
Let us abbreviate the indicator random variables as {\cal I}(i, j) = \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} . In order to upper-bound (B.38), we use the result of Theorem 3, which appears in Appendix A. In our case, we have \Delta = \Delta_{n, d} , and it only remains to assess the quantities \Theta and \Omega . One can easily check that the indicator random variables {\cal I}(i, j) and {\cal I}(k, \ell) are independent as long as i \neq k and j \neq \ell . Thus, we define our dependency graph so that each vertex (i, j) is connected to exactly 2n - 2 vertices of the form (i, \ell) , \ell \neq j , or (k, j) , k \neq i . If the vertices (i, j) and (k, \ell) are connected, we write (i, j) \sim (k, \ell) .
Let us denote the set \Psi_{n} = \{(m, m'):\; m, m'\in \{1, 2, \ldots, n\}\} . Concerning \Theta and \Omega , we get
\begin{align} \Theta & = \frac{1}{2} \sum\limits_{(i, j) \in \Psi_{n}} \sum\limits_{\substack{(k, \ell) \in \Psi_{n} \\ (k, \ell) \sim (i, j)}} \mathbb{E}[ {\cal I}(i, j) {\cal I}(k, \ell)] \end{align} | (B.39) |
\begin{align} & = \frac{1}{2} \sum\limits_{i = 1}^{n} \sum\limits_{\substack{(k, \ell) \in \Psi_{n} \\ (k, \ell) \sim (i, i)}} \mathbb{E}[ {\cal I}(i, i) {\cal I}(k, \ell)] + \frac{1}{2} \sum\limits_{i = 1}^{n} \sum\limits_{j \neq i} \sum\limits_{\substack{(k, \ell) \in \Psi_{n} \\ (k, \ell) \sim (i, j)}} \mathbb{E}[ {\cal I}(i, j) {\cal I}(k, \ell)]. \end{align} | (B.40) |
Regarding the summands in the left-hand term of (B.40), we derive them as follows:
\begin{align} \mathbb{E}[ {\cal I}(i, i) {\cal I}(k, \ell)] & = \mathbb{P}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{i} \geq \theta, \tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \end{align} | (B.41) |
\begin{align} & = \int \mathbb{P}\left\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{i} \geq \theta, \tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \mathrm{d} F_{ \boldsymbol{X}_{i}}( \boldsymbol{x}) \end{align} | (B.42) |
\begin{align} & = \int \mathbb{P}\left\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{i} \geq \theta\right\} \cdot \mathbb{P}\left\{\tilde{ \boldsymbol{x}}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{\ell} \geq \theta \right\} \mathrm{d} F_{ \boldsymbol{X}_{i}}( \boldsymbol{x}) \end{align} | (B.43) |
\begin{align} & = \int PQ \mathrm{d} F_{ \boldsymbol{X}_{i}}( \boldsymbol{x}) \end{align} | (B.44) |
\begin{align} & = PQ. \end{align} | (B.45) |
Deriving the summands in the right-hand term of (B.40) in a similar way, we eventually arrive at
\begin{align} \Theta &\leq \frac{1}{2} PQ (2n-2)n + \frac{1}{2}[2Q^{2}(n-2) + 2PQ]n(n-1) \end{align} | (B.46) |
\begin{align} & = PQ(n-1)n + [Q^{2}(n-2) + PQ]n(n-1) \end{align} | (B.47) |
\begin{align} & = 2PQn(n-1) + Q^{2}n(n-1)(n-2). \end{align} | (B.48) |
In addition
\begin{align} \Omega & = \max\limits_{(i, j) \in \Psi_{n}} \sum\limits_{\substack{(k, \ell) \in \Psi_{n} \\ (k, \ell) \sim (i, j)}} \mathbb{E}[ {\cal I}(k, \ell)] \end{align} | (B.49) |
\begin{align} & = \max\left\{ \sum\limits_{\substack{(k, \ell) \in \Psi_{n} \\ (k, \ell) \sim (i, i)}} \mathbb{E}[ {\cal I}(k, \ell)], \sum\limits_{\substack{(k, \ell) \in \Psi_{n} \\ (k, \ell) \sim (i, j), \; i \neq j}} \mathbb{E}[ {\cal I}(k, \ell)] \right\} \end{align} | (B.50) |
\begin{align} & = \max\left\{(2n-2)Q, 2P + (2n-4)Q \right\} \end{align} | (B.51) |
\begin{align} & = 2P + (2n-4)Q. \end{align} | (B.52) |
Then
\begin{align} \frac{\Delta_{n, d}}{6\Omega} & = \frac{nP + n(n-1)Q}{6[2P + (2n-4)Q]} \end{align} | (B.53) |
\begin{align} & = \frac{n[P + (n-1)Q]}{12[P + (n-2)Q]} \end{align} | (B.54) |
\begin{align} &\geq \frac{n[P + (n-2)Q]}{12[P + (n-2)Q]} \end{align} | (B.55) |
\begin{align} & = \frac{n}{12}, \end{align} | (B.56) |
and
\begin{align} \frac{\Delta_{n, d}^{2}}{8\Theta + 2\Delta_{n, d}} &\geq \frac{[nP + n(n-1)Q]^{2}}{8[2PQn(n-1) + Q^{2}n(n-1)(n-2)] + 2[nP + n(n-1)Q]} \end{align} | (B.57) |
\begin{align} &\geq \frac{[nP + n(n-1)Q]^{2}}{8[2PQn^{2} + 2Q^{2}n^{2}(n-1)] + 2[nP + n(n-1)Q]} \end{align} | (B.58) |
\begin{align} & = \frac{[nP + n(n-1)Q]^{2}}{16Qn[Pn + Qn(n-1)] + 2[nP + n(n-1)Q]} \end{align} | (B.59) |
\begin{align} & = \frac{nP + n(n-1)Q}{16Qn + 2}. \end{align} | (B.60) |
Hence
\begin{align} P_{\mbox{ MD}}(\phi_{\mbox{ Count}}) & = \mathbb{P}_{1|\sigma} \left\{\sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} \mathbb{1}\left\{\tilde{ \boldsymbol{X}}_{i}^{\mathsf{T}}\tilde{ \boldsymbol{Y}}_{j} \geq \theta \right\} \leq \beta \Delta_{n, d} \right\} \end{align} | (B.61) |
\begin{align} &\leq \exp \left\{- \min \left( (1-\beta)^{2} \frac{\Delta_{n, d}^{2}}{8\Theta+2\Delta_{n, d}}, (1-\beta) \frac{\Delta_{n, d}}{6\Omega} \right) \right\} \end{align} | (B.62) |
\begin{align} &\leq \exp \left\{- \min \left( (1-\beta)^{2} \frac{nP + n(n-1)Q}{16Qn + 2}, (1-\beta) \frac{n}{12} \right) \right\}, \end{align} | (B.63) |
which completes the proof of Theorem 1.
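The missed-detection bound (B.63) is explicit and can be evaluated directly; a minimal sketch is given below.

```python
from math import exp

def md_bound(n, P, Q, beta):
    """Right-hand side of (B.63)."""
    delta = n * P + n * (n - 1) * Q               # Delta_{n,d}
    return exp(-min((1 - beta)**2 * delta / (16 * Q * n + 2), (1 - beta) * n / 12))
```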
For any k \in \mathbb{N} , let \mathsf{S}(k, d) be the number of ways to partition a set of k labeled objects into d \in \{1, 2, \dotsc, k\} nonempty unlabeled subsets, which is given by Stirling numbers of the second kind [45], i.e.,
\begin{align} \mathsf{S}(k, d) = \frac{1}{d!} \sum\limits_{i = 0}^{d} (-1)^{i} \binom{d}{i}(d-i)^{k}. \end{align} | (C.1) |
Obviously, \mathsf{S}(k, 1) = \mathsf{S}(k, k) = 1 . As before, we denote \Psi_{n} = \{(m, m'):\; m, m'\in \{1, 2, \ldots, n\}\} . In this case,
\begin{align} \mathbb{E} \left[ N^{k} \right] & = \mathbb{E} \left[ \left( \sum\limits_{(m, m') \in \Psi_{n}} J( \boldsymbol{X}_{m}, \boldsymbol{Y}_{m'}) \right)^{k} \right] \end{align} | (C.2) |
\begin{align} & = \sum\limits_{(m_{1}, m'_{1}) \in \Psi_{n}} \dots \sum\limits_{(m_{k}, m'_{k}) \in \Psi_{n}} \mathbb{E} \left[ J( \boldsymbol{X}_{m_{1}}, \boldsymbol{Y}_{m'_{1}}) \cdot \ldots \cdot J( \boldsymbol{X}_{m_{k}}, \boldsymbol{Y}_{m'_{k}}) \right] \end{align} | (C.3) |
\begin{align} & = \sum\limits_{d = 1}^{k} \sum\limits_{\left\{\substack{(m_{i}, m'_{i}) \in \Psi_{n}, \; 1 \leq i \leq k, \\ \text{divided into } d \text{ subsets of identical pairs}} \right\}} \mathbb{E} \left[ J( \boldsymbol{X}_{m_{1}}, \boldsymbol{Y}_{m'_{1}}) \cdot \ldots \cdot J( \boldsymbol{X}_{m_{k}}, \boldsymbol{Y}_{m'_{k}}) \right] \end{align} | (C.4) |
\begin{align} & = \sum\limits_{d = 1}^{k} \mathsf{S}(k, d) \sum\limits_{\left\{\substack{(m_{i}, m'_{i}) \in \Psi_{n}, \; 1 \leq i \leq d, \\ (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j} \right\}} \mathbb{E} \left[ J( \boldsymbol{X}_{m_{1}}, \boldsymbol{Y}_{m'_{1}}) \cdot \ldots \cdot J( \boldsymbol{X}_{m_{d}}, \boldsymbol{Y}_{m'_{d}}) \right] , \end{align} | (C.5) |
where, in the inner summation in (C.4), we sum over all possible k pairs of vector indices, which are divided in any possible way into exactly d subsets, and all pairs in each subset are identical. In (C.5), we use the Stirling numbers of the second kind and sum over exactly d distinct pairs of vector indices, where all the identical pairs of indices in (C.4) have been merged together, using the trivial fact that the product of any number of identical indicator random variables equals any one of them.
Let us handle the inner sum of (C.5). The idea is as follows. Instead of summing over the set \{(m_{i}, m'_{i}) \in \Psi_{n}, \; 1 \leq i \leq d, \; (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j\} of d distinct pairs of indices of vectors, we represent each possible configuration of indices in this set as a graph G and sum over all different graphs with exactly d distinct edges. In our graph representation, each vector index m_{i} \in \{1, 2, \dotsc, n\} and m'_{i} \in \{1, 2, \dotsc, n\} is denoted by a vertex, and each pair of indices (m_{i}, m'_{i}) , m_{i} \neq m'_{i} , is connected by an edge. Hence, the number of edges is fixed, but the numbers of vertices and subgraphs (i.e., disconnected parts of the graph) are variable.
Given d \in \{1, 2, \dotsc, k\} , we sum over the set {\cal V}(d) = \{(v_{x}, v_{y})\} of pairs of integers, which consists of all possible pairs of vertices needed in order to support a graph with d different edges. Next, given d \in \{1, 2, \dotsc, k\} and (v_{x}, v_{y}) \in {\cal V}(d) , we sum over the range of the possible number of subgraphs. Let \mathcal{S}_{\mbox{ min}}(d, v_{x}, v_{y}) ( \mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y}) ) be the minimal (maximal) number of subgraphs that a graph with d edges and (v_{x}, v_{y}) vertices can have. For a given quadruplet (d, v_{x}, v_{y}, s) , where s \in [\mathcal{S}_{\mbox{ min}}(d, v_{x}, v_{y}), \mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})] is the number of subgraphs within G , note that one can create many different graphs (see Figure 12), and we have to take all of them into account. Hence, let \mathsf{T}(d, v_{x}, v_{y}) ( \mathsf{T}(d, v_{x}, v_{y}, s) ) be the number of distinct ways to connect a graph with d edges and (v_{x}, v_{y}) vertices (and s subgraphs). Finally, for any 1 \leq i \leq \mathsf{T}(d, v_{x}, v_{y}, s) , let \mathsf{G}_{i}(d, v_{x}, v_{y}, s) be the set of different graphs with d edges, (v_{x}, v_{y}) vertices, and s subgraphs, which can be defined on the set \Psi_{n} of pairs of vectors, as we explained before.
We now prove that the cardinality of the set \mathsf{G}_{i}(d, v_{x}, v_{y}, s) is upper-bounded by an expression that depends only on the numbers of vertices (v_{x}, v_{y}) . First, let N(v_{x}) be the number of ways to choose v_{x} different vectors from a set of cardinality n . In this case,
\begin{align} N(v_{x}) = n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot (n-v_{x}+1) \leq n^{v_{x}}. \end{align} | (C.6) |
Thus
\begin{align} |\mathsf{G}_{i}(d, v_{x}, v_{y}, s)| & = N(v_{x}) \cdot N(v_{y}) \end{align} | (C.7) |
\begin{align} &\leq n^{v_{x}} \cdot n^{v_{y}}. \end{align} | (C.8) |
Let \Theta(G) be an indicator random variable that equals one if and only if all pairs of feature vectors that are linked by the edges of G have J(\boldsymbol{X}, \boldsymbol{Y}) = 1 . With the definitions above, the inner sum of (C.5) can now be written as
\begin{align} &\sum\limits_{\{(m_{i}, m'_{i}) \in \Psi_{n}, \; 1 \leq i \leq d, \; (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j\}} \mathbb{E} \left[ J( \boldsymbol{X}_{m_{1}}, \boldsymbol{Y}_{m'_{1}}) \cdot \ldots \cdot J( \boldsymbol{X}_{m_{d}}, \boldsymbol{Y}_{m'_{d}}) \right] \\ &\; \; \; \; \equiv \sum\limits_{\{(m_{i}, m'_{i}) \in \Psi_{n}, \; 1 \leq i \leq d, \; (m_{i}, m'_{i}) \neq (m_{j}, m'_{j}) \; \forall i \neq j\}} \mathbb{E} \left[ \prod\limits_{i = 1}^{d} J( \boldsymbol{X}_{m_{i}}, \boldsymbol{Y}_{m'_{i}}) \right] \end{align} | (C.9) |
\begin{align} &\; \; \; \; = \sum\limits_{(v_{x}, v_{y}) \in {\cal V}(d)} \sum\limits_{s = \mathcal{S}_{\mbox{ min}}(d, v_{x}, v_{y})}^{\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \sum\limits_{i = 1}^{\mathsf{T}(d, v_{x}, v_{y}, s)} \sum\limits_{G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s)} \mathbb{E} \left[ \Theta(G) \right], \end{align} | (C.10) |
i.e., for any d \in \{1, 2, \dotsc, k\} , we first sum over the numbers of vertices, then over the number of subgraphs, then, for a fixed quadruplet (d, v_{x}, v_{y}, s) , over all possible \mathsf{T}(d, v_{x}, v_{y}, s) topologies, and, finally, over all specific graphs G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s) with a given topology. One should note that all limits of the three outer sums in (C.10) depend only on k , while |\mathsf{G}_{i}(d, v_{x}, v_{y}, s)| is the only quantity that also depends on n . Similarly to the procedure described in Section 6.2, here too, the expectation of \Theta(G) can be easily evaluated if all subgraphs of G are trees (i.e., G is a forest). If at least one subgraph of G contains loops, we apply a process of graph pruning, in which we cut out the minimal number of edgesⅨ, while keeping all vertices intact, until we get a forest (for example, see Figure 13).
ⅨIn fact, this procedure is equivalent to upper-bounding some of the indicator functions in (C.9) by one.
We use \mathcal{P}(G) to denote the pruned graph of G . Notice that the expectation of \Theta(G) is upper-boundedⅩ by the expectation of \Theta(\mathcal{P}(G)) , which can be evaluated in a simple iterative process of graph reduction, similarly to the procedure described in Section 6.3. In the first step, we take the expectation with respect to all vectors that are the labels of leaf vertices in \mathcal{P}(G) , while we condition on the realizations of all other vectors, namely those corresponding to inner vertices in \mathcal{P}(G) ; afterwards, we erase leaf vector vertices and corresponding edges. The successive steps are identical to the first one on the remaining (unerased) graph, continuing until all vectors that are attributed to G have been considered (for example, see Figure 14).
ⅩIt follows from the fact that \Theta(G) \leq\Theta(\mathcal{P}(G)) with probability one.
In each step of graph reduction, the expectation with respect to each of the leaf vectors is upper-bounded as \mathbb{P} \left\{J(\boldsymbol{x}, \boldsymbol{Y}_{\mbox{ leaf}}) = 1 \right\} \leq Q , since we condition on the realizations of vectors that are attributed to the inner vertices. For any graph G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s) , we conclude that the expectation of \Theta(\mathcal{P}(G)) is upper-bounded by Q^{ {\cal E}(G)} , where {\cal E}(G) is the number of edges in {\cal P}(G) . Of course, if G is already a forest, then \mathcal{E}(G) = d . For any G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s) which is not a forest, we find \mathcal{E}(G) as follows. Assume that v_{x}(1), v_{x}(2), \dotsc, v_{x}(s) and v_{y}(1), v_{y}(2), \dotsc, v_{y}(s) are the numbers of vertices in each of the s subgraphs of G . In the process of graph pruning, each subgraph j \in \{1, 2, \dotsc, s\} of G is transformed into a tree with exactly v_{x}(j) + v_{y}(j) - 1 edges. Hence
\begin{align} \mathcal{E}(G) = \sum\limits_{j = 1}^{s} (v_{x}(j) + v_{y}(j) - 1) = v_{x} + v_{y} - s. \end{align} | (C.11) |
The innermost sum of (C.10) can be treated as follows:
\begin{align} \sum\limits_{G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s)} \mathbb{E} \left[ \Theta(G) \right] &\leq \sum\limits_{G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s)} \mathbb{E} \left[ \Theta(\mathcal{P}(G)) \right] \end{align} | (C.12) |
\begin{align} &\leq \sum\limits_{G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s)} Q^{ {\cal E}(G)} \end{align} | (C.13) |
\begin{align} & = \sum\limits_{G \in \mathsf{G}_{i}(d, v_{x}, v_{y}, s)} Q^{v_{x}+v_{y}-s} \end{align} | (C.14) |
\begin{align} & = |\mathsf{G}_{i}(d, v_{x}, v_{y}, s)| \cdot Q^{v_{x}+v_{y}-s} \end{align} | (C.15) |
\begin{align} &\leq n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-s}, \end{align} | (C.16) |
where (C.14) follows from (C.11), and (C.16) is due to (C.8). Next, we substitute (C.16) back into (C.10) and get
\begin{align} &\sum\limits_{s = \mathcal{S}_{\mbox{ min}}(d, v_{x}, v_{y})}^{\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \sum\limits_{i = 1}^{\mathsf{T}(d, v_{x}, v_{y}, s)} n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-s} \\ &\; \; \; \; \; \; = \sum\limits_{s = \mathcal{S}_{\mbox{ min}}(d, v_{x}, v_{y})}^{\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \mathsf{T}(d, v_{x}, v_{y}, s) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-s} \end{align} | (C.17) |
\begin{align} &\; \; \; \; \; \; \leq \left( \sum\limits_{s = \mathcal{S}_{\mbox{ min}}(d, v_{x}, v_{y})}^{\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \mathsf{T}(d, v_{x}, v_{y}, s) \right) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \end{align} | (C.18) |
\begin{align} &\; \; \; \; \; \; = \mathsf{T}(d, v_{x}, v_{y}) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})}, \end{align} | (C.19) |
where (C.18) is true since Q \in [0, 1] , so that replacing s by its maximal value \mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y}) provides an upper bound. The equality in (C.19) follows from the definitions of \mathsf{T}(d, v_{x}, v_{y}) and \mathsf{T}(d, v_{x}, v_{y}, s) . Before we substitute it back into (C.10) and then into (C.5), we summarize some minor results that will be needed in the sequel. Let us define d^{*} = \max\{v_{x}, v_{y}\} .
Lemma 4. We have the following:
1. For fixed (v_{x}, v_{y}) , \mathcal{S}_{\mathit{\mbox{max}}}(d, v_{x}, v_{y}) is a non-increasing sequence of d .
2. For any (v_{x}, v_{y}) , we have \mathcal{S}_{\mathit{\mbox{max}}}(d^{*}, v_{x}, v_{y}) = \min\{v_{x}, v_{y}\} .
Proof. We prove each of the two arguments separately.
1. For fixed (v_{x}, v_{y}) , if we add an edge to the graph, there are only two options. On the one hand, we may connect vertices that belong to the same subgraph, so that the number of subgraphs remains the same. On the other hand, we may connect vertices that belong to different subgraphs, in which case the number of subgraphs decreases.
2. Without loss of generality, assume that v_{x} \leq v_{y} . Then d^{*} = v_{y} , and hence, by definition, each column vertex is incident to exactly one edge, so that the edges incident to each row vertex form a subgraph. Thus, the number of subgraphs equals exactly v_{x} .
Now we have
\begin{align} \mathbb{E} \left[ N^{k} \right] &\leq \sum\limits_{d = 1}^{k} \mathsf{S}(k, d) \sum\limits_{(v_{x}, v_{y}) \in {\cal V}(d)} \mathsf{T}(d, v_{x}, v_{y}) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \end{align} | (C.20) |
\begin{align} & = \sum\limits_{d = 1}^{k} \sum\limits_{(v_{x}, v_{y}) \in {\cal V}(d)} \mathsf{S}(k, d) \cdot \mathsf{T}(d, v_{x}, v_{y}) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \end{align} | (C.21) |
\begin{align} & = \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{k} \sum\limits_{d = \max\{v_{x}, v_{y}\}}^{\min\{k, v_{x} \cdot v_{y}\}} \mathsf{S}(k, d) \cdot \mathsf{T}(d, v_{x}, v_{y}) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-\mathcal{S}_{\mbox{ max}}(d, v_{x}, v_{y})} \end{align} | (C.22) |
\begin{align} &\leq \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{k} \left( \sum\limits_{d = \max\{v_{x}, v_{y}\}}^{\min\{k, v_{x} \cdot v_{y}\}} \mathsf{S}(k, d) \cdot \mathsf{T}(d, v_{x}, v_{y}) \right) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y}-\mathcal{S}_{\mbox{ max}}(d^{*}, v_{x}, v_{y})}. \end{align} | (C.23) |
In (C.22) we changed the order of summation, and (C.23) is true since Q \in [0, 1] , and thus, according to the first point in Lemma 4, we reach an upper bound by substituting d = d^{*} . Moving forward, we use the trivial bound
\begin{align} \mathsf{T}(d, v_{x}, v_{y}) \leq \binom{v_{x}v_{y}}{d} \leq \frac{(v_{x}v_{y})^{d}}{d!}, \end{align} | (C.24) |
and arrive at
\begin{align} \mathbb{E} \left[ N^{k} \right] &\leq \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{k} \left( \sum\limits_{d = \max\{v_{x}, v_{y}\}}^{\min\{k, v_{x} \cdot v_{y}\}} \mathsf{S}(k, d) \cdot \frac{(v_{x}v_{y})^{d}}{d!} \right) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}+v_{y} - \min\{v_{x}, v_{y}\}} \end{align} | (C.25) |
\begin{align} &\leq \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{k} \left( \sum\limits_{d = 1}^{k} \mathsf{S}(k, d) \cdot \frac{(v_{x}v_{y})^{d}}{d!} \right) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{\max\{v_{x}, v_{y}\}} \end{align} | (C.26) |
\begin{align} &\leq \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{k} \left( \sum\limits_{d = 1}^{k} \mathsf{S}(k, d) \cdot \frac{k^{2d}}{d!} \right) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{\max\{v_{x}, v_{y}\}} \end{align} | (C.27) |
\begin{align} & = \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{k} \mathsf{B}(k) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{\max\{v_{x}, v_{y}\}}, \end{align} | (C.28) |
where (C.25) follows from the second point in Lemma 4 and (C.28) follows from the definition in (5.17). Finally,
\begin{align} \mathbb{E} \left[ N^{k} \right] &\leq \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{v_{x}} \mathsf{B}(k) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{\max\{v_{x}, v_{y}\}} \\ &\; \; \; \; \; \; + \sum\limits_{v_{y} = 1}^{k} \sum\limits_{v_{x} = 1}^{v_{y}} \mathsf{B}(k) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{\max\{v_{x}, v_{y}\}} \end{align} | (C.29) |
\begin{align} & = \sum\limits_{v_{x} = 1}^{k} \sum\limits_{v_{y} = 1}^{v_{x}} \mathsf{B}(k) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{x}} \\ &\; \; \; \; \; \; + \sum\limits_{v_{y} = 1}^{k} \sum\limits_{v_{x} = 1}^{v_{y}} \mathsf{B}(k) \cdot n^{v_{x}} \cdot n^{v_{y}} \cdot Q^{v_{y}} \end{align} | (C.30) |
\begin{align} &\leq \sum\limits_{v_{x} = 1}^{k} v_{x} \cdot \mathsf{B}(k) \cdot (n^{2}Q)^{v_{x}} + \sum\limits_{v_{y} = 1}^{k} v_{y} \cdot \mathsf{B}(k) \cdot (n^{2}Q)^{v_{y}} \end{align} | (C.31) |
\begin{align} & = 2 \mathsf{B}(k) \sum\limits_{\ell = 1}^{k} \ell \cdot (n^{2}Q)^{\ell}. \end{align} | (C.32) |
Now, if n^{2}Q > 1 , then
\begin{align} \mathbb{E} \left[ N^{k} \right] &\leq \left( 2\mathsf{B}(k) \sum\limits_{\ell = 1}^{k} \ell \right) \cdot (n^{2}Q)^{k} \end{align} | (C.33) |
\begin{align} &\leq k(k+1)\mathsf{B}(k) \cdot (n^{2}Q)^{k}; \end{align} | (C.34) |
otherwise, if n^{2}Q \leq 1 , then
\begin{align} \mathbb{E} \left[ N^{k} \right] &\leq k(k+1)\mathsf{B}(k) \cdot n^{2}Q, \end{align} | (C.35) |
which completes the proof of Theorem 2.
We upper-bound the expression in (5.20), which is given by
\begin{align} Q(d, \theta) = \frac{1}{2}\frac{B\left(1-\theta^{2}; \frac{d-1}{2}, \frac{1}{2}\right)}{B\left(\frac{d-1}{2}, \frac{1}{2}\right)}. \end{align} | (D.1) |
As for the numerator, we have
\begin{align} B\left(1-\theta^{2}; \frac{d-1}{2}, \frac{1}{2}\right) & = \int_{0}^{1-\theta^{2}} \frac{t^{\frac{d-3}{2}}}{\sqrt{1-t}} \mathrm{d} t \end{align} | (D.2) |
\begin{align} &\leq \int_{0}^{1-\theta^{2}} \frac{t^{\frac{d-3}{2}}}{\sqrt{1-(1-\theta^{2})}} \mathrm{d} t \end{align} | (D.3) |
\begin{align} & = \frac{1}{\theta} \int_{0}^{1-\theta^{2}} t^{\frac{d-3}{2}} \mathrm{d} t \end{align} | (D.4) |
\begin{align} & = \frac{2}{\theta(d-1)} \cdot \left(1-\theta^{2}\right)^{\frac{d-1}{2}} \end{align} | (D.5) |
\begin{align} &\leq \frac{4}{\theta d} \cdot \left(1-\theta^{2}\right)^{\frac{d-1}{2}}. \end{align} | (D.6) |
In order to bound the beta function, we use the following identity:
\begin{align} B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}, \end{align} | (D.7) |
where \Gamma(\cdot) is the gamma function:
\begin{align} \Gamma(s) = \int_{0}^{\infty} x^{s-1}e^{-x} \mathrm{d} x. \end{align} | (D.8) |
In general
\begin{align} B\left(\frac{d-1}{2}, \frac{1}{2}\right) & = \frac{\Gamma\left(\frac{d-1}{2}\right)\Gamma\left(\frac{1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)} \end{align} | (D.9) |
\begin{align} & = \frac{\sqrt{\pi}\Gamma\left(\frac{d-1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}. \end{align} | (D.10) |
In order to bound the ratio of gamma functions, let us invoke Gautschi's inequality.
Lemma 5. Let x be a positive real number and let s \in (0, 1) . In this case,
\begin{align} x^{1-s} \leq \frac{\Gamma(x+1)}{\Gamma(x+s)} \leq (x+1)^{1-s}. \end{align} | (D.11) |
For a lower bound
\begin{align} B\left(\frac{d-1}{2}, \frac{1}{2}\right) & = \frac{\sqrt{\pi}\Gamma\left(\frac{d-1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)} \end{align} | (D.12) |
\begin{align} & = \frac{\sqrt{\pi}}{\frac{\Gamma\left(\frac{d}{2}-1+1\right)}{\Gamma\left(\frac{d}{2}-1+\frac{1}{2}\right)}} \end{align} | (D.13) |
\begin{align} &\geq \frac{\sqrt{\pi}}{(\frac{d}{2}-1+1)^{1-\frac{1}{2}}} \end{align} | (D.14) |
\begin{align} & = \sqrt{\frac{2\pi}{d}}, \end{align} | (D.15) |
where (D.14) is due to the upper bound in Lemma 5. Now, combining (D.6) and (D.15) yields the upper bound
\begin{align} Q(d, \theta) &\leq \frac{1}{2} \frac{\frac{4}{\theta d} \cdot \left(1-\theta^{2}\right)^{\frac{d-1}{2}}}{\sqrt{\frac{2\pi}{d}}} \end{align} | (D.16) |
\begin{align} & = \sqrt{\frac{2}{\pi}}\frac{1}{\theta\sqrt{d}} \cdot \left(1-\theta^{2}\right)^{\frac{d-1}{2}} \end{align} | (D.17) |
\begin{align} &\leq \frac{1}{\theta\sqrt{d}} \cdot \left(1-\theta^{2}\right)^{\frac{d-1}{2}}, \end{align} | (D.18) |
which completes the proof of Lemma 1.
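The bound just established can be compared numerically with the exact expression (D.1), which equals one half of the regularized incomplete beta function. A minimal sketch, assuming SciPy is available, follows.

```python
import numpy as np
from scipy.special import betainc

def Q_exact(d, theta):
    """Q(d, theta) from (D.1), written via the regularized incomplete beta function."""
    return 0.5 * betainc((d - 1) / 2, 0.5, 1 - theta**2)

def Q_bound(d, theta):
    """The upper bound (D.18)."""
    return (1 - theta**2)**((d - 1) / 2) / (theta * np.sqrt(d))

for d, theta in [(50, 0.4), (50, 0.7), (200, 0.3)]:
    print(d, theta, Q_exact(d, theta), "<=", Q_bound(d, theta))
```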
Observe that under the assumption of
\begin{align} ( \boldsymbol{X}, \boldsymbol{Y}) \sim {\cal N}^{\otimes d}\left(\left[\begin{matrix} 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix}\right] \right), \end{align} | (E.1) |
it holds that \boldsymbol{Y} = \rho \boldsymbol{X} + \sqrt{1-\rho^{2}} \boldsymbol{Z} with \boldsymbol{Z} \sim {\cal N}(\boldsymbol{0}_{d}, {\bf{I}}_{d}) and independent of \boldsymbol{X} . Consider the following for some \theta \in (0, 1) :
\begin{align} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} & = \mathbb{P}\left\{\frac{ \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Y}}{\| \boldsymbol{X}\| \cdot \| \boldsymbol{Y}\|} \geq \theta \right\} \end{align} | (E.2) |
\begin{align} & = \mathbb{P}\left\{\frac{ \boldsymbol{X}^{\mathsf{T}}\left(\rho \boldsymbol{X} + \sqrt{1-\rho^{2}} \boldsymbol{Z}\right)}{\| \boldsymbol{X}\|\left\|\rho \boldsymbol{X} + \sqrt{1-\rho^{2}} \boldsymbol{Z}\right\|} \geq \theta \right\} \end{align} | (E.3) |
\begin{align} & = \mathbb{P}\left\{\frac{\rho\| \boldsymbol{X}\|^{2} + \sqrt{1-\rho^{2}} \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Z}} {\| \boldsymbol{X}\| \sqrt{\rho^{2}\| \boldsymbol{X}\|^{2} + 2\rho\sqrt{1-\rho^{2}} \boldsymbol{X}^{ \mathsf{T}} \boldsymbol{Z} + (1-\rho^{2})\| \boldsymbol{Z}\|^{2}}} \geq \theta \right\}. \end{align} | (E.4) |
Since \theta > 0 , the event in (E.4) can occur only if \rho\| \boldsymbol{X}\|^{2} + \sqrt{1-\rho^{2}} \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Z} \geq 0 . Let us denote the event
\begin{align} {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) = \left\{\rho\| \boldsymbol{X}\|^{2} + \sqrt{1-\rho^{2}} \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Z} \geq 0\right\} \end{align} | (E.5) |
and the angle between \boldsymbol{X} and \boldsymbol{Z} as \Phi_{ \mbox{xz}} . In this case,
\begin{align} & \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} \\ & = \mathbb{P}\left\{\frac{\rho\| \boldsymbol{X}\|^{2} + \sqrt{1-\rho^{2}} \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Z}} {\| \boldsymbol{X}\| \sqrt{\rho^{2}\| \boldsymbol{X}\|^{2} + 2\rho\sqrt{1-\rho^{2}} \boldsymbol{X}^{ \mathsf{T}} \boldsymbol{Z} + (1-\rho^{2})\| \boldsymbol{Z}\|^{2}}} \geq \theta , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\} \end{align} | (E.6) |
\begin{align} & = \mathbb{P}\left\{\frac{\rho^{2}\| \boldsymbol{X}\|^{4} + 2 \rho \sqrt{1-\rho^{2}} \| \boldsymbol{X}\|^{2} \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Z} +(1-\rho^{2})\left( \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Z}\right)^{2}} {\rho^{2}\| \boldsymbol{X}\|^{4} + 2\rho\sqrt{1-\rho^{2}}\| \boldsymbol{X}\|^{2} \boldsymbol{X}^{ \mathsf{T}} \boldsymbol{Z} + (1-\rho^{2})\| \boldsymbol{X}\|^{2}\| \boldsymbol{Z}\|^{2}} \geq \theta^{2} , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\} \end{align} | (E.7) |
\begin{align} & = \mathbb{P}\left\{\frac{\rho^{2}\| \boldsymbol{X}\|^{4} + 2 \rho \sqrt{1-\rho^{2}} \| \boldsymbol{X}\|^{3}\| \boldsymbol{Z}\|\cdot\cos(\Phi_{ \mbox{xz}}) +(1-\rho^{2})\| \boldsymbol{X}\|^{2}\| \boldsymbol{Z}\|^{2}\cdot\cos^{2}(\Phi_{ \mbox{xz}})} {\rho^{2}\| \boldsymbol{X}\|^{4} + 2\rho\sqrt{1-\rho^{2}}\| \boldsymbol{X}\|^{3}\| \boldsymbol{Z}\|\cdot\cos(\Phi_{ \mbox{xz}}) + (1-\rho^{2})\| \boldsymbol{X}\|^{2}\| \boldsymbol{Z}\|^{2}} \geq \theta^{2} , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\} \end{align} | (E.8) |
\begin{align} & = \mathbb{P}\left\{\frac{\rho^{2}\| \boldsymbol{X}\|^{2} + 2 \rho \sqrt{1-\rho^{2}} \| \boldsymbol{X}\|\| \boldsymbol{Z}\|\cdot\cos(\Phi_{ \mbox{xz}}) +(1-\rho^{2})\| \boldsymbol{Z}\|^{2}\cdot\cos^{2}(\Phi_{ \mbox{xz}})} {\rho^{2}\| \boldsymbol{X}\|^{2} + 2\rho\sqrt{1-\rho^{2}}\| \boldsymbol{X}\|\| \boldsymbol{Z}\|\cdot\cos(\Phi_{ \mbox{xz}}) + (1-\rho^{2})\| \boldsymbol{Z}\|^{2}} \geq \theta^{2} , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\}. \end{align} | (E.9) |
Rearranging, we get
\begin{align} & \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} \\ & = \mathbb{P}\left\{ (1-\rho^{2})\| \boldsymbol{Z}\|^{2}\cdot\cos^{2}(\Phi_{ \mbox{xz}}) + 2 \rho \sqrt{1-\rho^{2}} (1-\theta^{2}) \| \boldsymbol{X}\|\| \boldsymbol{Z}\|\cdot\cos(\Phi_{ \mbox{xz}}) \right. \\ &\; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \left. + \rho^{2}(1-\theta^{2})\| \boldsymbol{X}\|^{2} - (1-\rho^{2})\theta^{2}\| \boldsymbol{Z}\|^{2} \geq 0 , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\} \end{align} | (E.10) |
\begin{align} & = \mathbb{P}\left\{ (1-\rho^{2})\cdot\cos^{2}(\Phi_{ \mbox{xz}}) + 2 \rho \sqrt{1-\rho^{2}} (1-\theta^{2}) \frac{\| \boldsymbol{X}\|}{\| \boldsymbol{Z}\|}\cdot\cos(\Phi_{ \mbox{xz}}) \right. \\ &\; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \; \left. + \rho^{2}(1-\theta^{2})\frac{\| \boldsymbol{X}\|^{2}}{\| \boldsymbol{Z}\|^{2}} - (1-\rho^{2})\theta^{2} \geq 0 , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\} \end{align} | (E.11) |
\begin{align} & \stackrel{\triangle}{ = } \mathbb{P}\left\{\cos^{2}(\Phi_{ \mbox{xz}}) + \frac{2\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}} \sqrt{U} \cdot\cos(\Phi_{ \mbox{xz}}) + \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} U - \theta^{2} \geq 0 , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\} \end{align} | (E.12) |
\begin{align} & = \mathbb{P}\left\{\left(\cos(\Phi_{ \mbox{xz}}) + \frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{U} \right)^{2} + \frac{\rho^{2}\theta^{2}(1-\theta^{2})}{1-\rho^{2}} U - \theta^{2} \geq 0 , {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\} \end{align} | (E.13) |
\begin{align} & = \mathbb{P}\left\{\left(\cos(\Phi_{ \mbox{xz}}) + \frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{U} \right)^{2} \geq \theta^{2} \left[1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} U\right], {\cal F}_{\rho}( \boldsymbol{X}, \boldsymbol{Z}) \right\}. \end{align} | (E.14) |
Note that \rho\| \boldsymbol{X}\|^{2} + \sqrt{1-\rho^{2}} \boldsymbol{X}^{\mathsf{T}} \boldsymbol{Z} \geq 0 is equivalent to
\begin{align} \cos(\Phi_{ \mbox{xz}}) \geq -\frac{\rho}{\sqrt{1-\rho^{2}}}\sqrt{U}, \end{align} | (E.15) |
and therefore
\begin{align} & \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} \\ & = \mathbb{P}\left\{\left(\cos(\Phi_{ \mbox{xz}}) + \frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{U} \right)^{2} \geq \theta^{2} \left[1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} U\right], \cos(\Phi_{ \mbox{xz}}) \geq -\frac{\rho}{\sqrt{1-\rho^{2}}}\sqrt{U}\right\}. \end{align} | (E.16) |
Now, the left-hand side event in (E.16) becomes trivial if and only if
\begin{align} 1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} U \leq 0 \; \iff\; U \geq \frac{1-\rho^2}{\rho^2(1-\theta^2)} \stackrel{\triangle}{ = } \beta(\rho, \theta), \end{align} | (E.17) |
while the right-hand side event in (E.16) becomes trivial if and only if
\begin{align} \frac{\rho}{\sqrt{1-\rho^{2}}} \sqrt{U} \geq 1 \; \iff\; U \geq \frac{1-\rho^2}{\rho^2} \stackrel{\triangle}{ = } \alpha(\rho). \end{align} | (E.18) |
In addition, note that for any \theta \in (0, 1) , it holds that \beta(\rho, \theta) \geq \alpha(\rho) . Define the functions
\begin{align} F_{1}(u) \equiv F_{1}(u, \rho, \theta) & = -\frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{u} - \theta\sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u}, \end{align} | (E.19) |
\begin{align} F_{2}(u) \equiv F_{2}(u, \rho, \theta) & = -\frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{u} + \theta\sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u}, \end{align} | (E.20) |
\begin{align} F_{3}(u) \equiv F_{3}(u, \rho) & = - \frac{\rho }{\sqrt{1-\rho^{2}}}\sqrt{u}, \end{align} | (E.21) |
such that
\begin{align} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} = \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(U, \rho, \theta)]\cup [F_{2}(U, \rho, \theta), 1], \cos(\Phi_{ \mbox{xz}}) \in [F_{3}(U, \rho), 1] \right\}. \end{align} | (E.22) |
Note that both \| \boldsymbol{X}\|^{2} and \| \boldsymbol{Z}\|^{2} follow a \chi^{2} distribution with d degrees of freedom. Hence, the random variable defined by the ratio
\begin{align} U = \frac{\| \boldsymbol{X}\|^{2}}{\| \boldsymbol{Z}\|^{2}} \end{align} | (E.23) |
follows the F -distribution with (d, d) degrees of freedom. The probability density function of a general F -distributed random variable with (d_{1}, d_{2}) degrees of freedom is given by
\begin{align} f_{U}(u;d_{1}, d_{2}) = \frac{1}{B\left(\frac{d_{1}}{2}, \frac{d_{2}}{2}\right)} \left(\frac{d_{1}}{d_{2}}\right)^{d_{1}/2} u^{d_{1}/2-1}\left(1+\frac{d_{1}}{d_{2}}u\right)^{-(d_{1}+d_{2})/2}, \quad u\geq 0, \end{align} | (E.24) |
and for d_{1} = d_{2} = d , we denote it as
\begin{align} f_{U}(u) = \frac{1}{B\left(\frac{d}{2}, \frac{d}{2}\right)} u^{d/2-1}\left(1+u\right)^{-d}, \quad u\geq 0. \end{align} | (E.25) |
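As a quick numerical sanity check of (E.23)–(E.25) (not part of the derivation itself), the following minimal Python sketch, assuming NumPy and SciPy are available, compares the closed-form density in (E.25) with scipy.stats.f and with a Monte Carlo estimate based on the ratio of squared norms of independent Gaussian vectors; the dimension d = 8 is an arbitrary illustrative choice.

```python
# Sanity check for (E.23)-(E.25): with X, Z ~ N(0, I_d) independent,
# U = ||X||^2 / ||Z||^2 follows an F-distribution with (d, d) degrees of freedom.
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

d = 8
u_grid = np.linspace(0.05, 5.0, 200)

# Closed-form density from (E.25).
pdf_closed = u_grid ** (d / 2 - 1) * (1 + u_grid) ** (-d) / beta_fn(d / 2, d / 2)

# Reference density from SciPy's F-distribution with (d, d) degrees of freedom.
pdf_scipy = stats.f.pdf(u_grid, dfn=d, dfd=d)
assert np.allclose(pdf_closed, pdf_scipy)

# Monte Carlo: ratio of squared norms of independent standard Gaussian vectors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200_000, d))
Z = rng.standard_normal((200_000, d))
U = (X ** 2).sum(axis=1) / (Z ** 2).sum(axis=1)
print("P{U >= 1}: empirical =", (U >= 1).mean(), " exact =", stats.f.sf(1.0, d, d))
```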
We now have
\begin{align} & \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} \\ & = \int_{0}^{\infty} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(u)]\cup [F_{2}(u), 1], \cos(\Phi_{ \mbox{xz}}) \in [F_{3}(u), 1] \right\} f_{U}(u) \mathrm{d} u \end{align} | (E.26) |
\begin{align} & = \int_{0}^{\alpha(\rho)} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(u)]\cup [F_{2}(u), 1], \cos(\Phi_{ \mbox{xz}}) \in [F_{3}(u), 1] \right\} f_{U}(u) \mathrm{d} u \\ &\; \; \; +\int_{\alpha(\rho)}^{\beta(\rho, \theta)} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(u)]\cup [F_{2}(u), 1], \cos(\Phi_{ \mbox{xz}}) \in [F_{3}(u), 1] \right\} f_{U}(u) \mathrm{d} u \\ &\; \; \; \; \; \; +\int_{\beta(\rho, \theta)}^{\infty} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(u)]\cup [F_{2}(u), 1], \cos(\Phi_{ \mbox{xz}}) \in [F_{3}(u), 1] \right\} f_{U}(u) \mathrm{d} u \end{align} | (E.27) |
\begin{align} & = \int_{0}^{\alpha(\rho)} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(u)]\cup [F_{2}(u), 1], \cos(\Phi_{ \mbox{xz}}) \in [F_{3}(u), 1] \right\} f_{U}(u) \mathrm{d} u \\ &\; \; \; + \int_{\alpha(\rho)}^{\beta(\rho, \theta)} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(u)]\cup [F_{2}(u), 1] \right\} f_{U}(u) \mathrm{d} u \\ &\; \; \; \; \; \; + \int_{\beta(\rho, \theta)}^{\infty} f_{U}(u) \mathrm{d} u, \end{align} | (E.28) |
where (E.28) follows from the following considerations: for the second integral, we use the fact that for any u \geq \alpha(\rho) , the second event is trivial, and for the last integral, we use the fact that for any u \geq \beta(\rho, \theta) , both events become trivial. To evaluate the first integral in (E.28), we prove that F_{1}(u) \leq F_{3}(u) \leq F_{2}(u) holds for any u \in [0, \alpha(\rho)] (as illustrated in Figure 15 for two specific choices of the pair (\rho, \theta) ).
It immediately follows from the definitions of F_{1}(u) and F_{2}(u) in (E.19) and (E.20) that for any u \in [0, \beta(\rho, \theta)] , F_{1}(u) \leq F_{2}(u) . We start by proving that F_{1}(u) \leq F_{3}(u) . Note that F_{3}(u) is monotonically decreasing and F_{3}(0) = 0 . In addition, F_{1}(0) = -\theta < F_{3}(0) . Furthermore, note that F_{1}(\alpha(\rho)) = F_{3}(\alpha(\rho)) = -1 , so it only remains to show that F_{1}(u) is also monotonically decreasing for any u \in [0, \alpha(\rho)] . To this end, we show that F'_{1}(u) = 0 has a unique solution at u = \alpha(\rho) . The derivative F'_{1}(u) is given by
\begin{align} F'_{1}(u) = -\frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\cdot\frac{1}{2\sqrt{u}} + \theta \frac{\frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}}}{2\sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u}}. \end{align} | (E.29) |
F'_{1}(u) = 0 is then equivalent to
\begin{align} \sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u} = \frac{\rho\theta}{\sqrt{1-\rho^{2}}} \cdot \sqrt{u}, \end{align} | (E.30) |
or, in turn, to the linear equation
\begin{align} 1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u = \frac{\rho^2\theta^2}{1-\rho^{2}} u, \end{align} | (E.31) |
whose unique solution is given by
\begin{align} u^{*} = \frac{1-\rho^2}{\rho^2} = \alpha(\rho). \end{align} | (E.32) |
Since F'_{1}(u) vanishes only at u = \alpha(\rho) , the function F_{1}(u) is strictly monotonic on [0, \alpha(\rho)] , and since F_1(0) = -\theta > -1 = F_1(\alpha(\rho)) , it must be monotonically decreasing on this interval. This completes the proof of F_{1}(u) \leq F_{3}(u) .
We next prove that F_{3}(u) \leq F_{2}(u) . The derivative of F_{2}(u) is given by
\begin{align} F'_{2}(u) = -\frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\cdot\frac{1}{2\sqrt{u}} - \theta \frac{\frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}}}{2\sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u}}, \end{align} | (E.33) |
which is negative for any u \in (0, \beta(\rho, \theta)) , and thus F_{2}(u) is a monotonically decreasing function on [0, \beta(\rho, \theta)] . Note that F_{2}(0) = \theta > 0 = F_{3}(0) and, in addition
\begin{align} F_{2}(\beta(\rho, \theta)) & = -\frac{\rho (1-\theta^{2})}{\sqrt{1-\rho^{2}}}\sqrt{\frac{1-\rho^2}{\rho^2(1-\theta^2)}} + \theta\sqrt{1 - \frac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} \cdot \frac{1-\rho^2}{\rho^2(1-\theta^2)}} \end{align} | (E.34) |
\begin{align} & = -\sqrt{1-\theta^{2}} \end{align} | (E.35) |
\begin{align} &\geq -1. \end{align} | (E.36) |
Since F_{2}(u) is monotonically decreasing and \beta(\rho, \theta) \geq \alpha(\rho) , we get
\begin{align} F_{2}(\alpha(\rho)) \geq F_{2}(\beta(\rho, \theta)) \geq -1 = F_{3}(\alpha(\rho)), \end{align} | (E.37) |
which completes the proof of F_{3}(u) \leq F_{2}(u) , since F_{3}(u) is monotonically decreasing for any u\geq 0 .
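The ordering established above can also be spot-checked numerically. The sketch below is illustrative only (it assumes NumPy and a few arbitrary parameter pairs, not taken from the paper): it evaluates F_{1}, F_{2}, F_{3} from (E.19)–(E.21) on a grid of u \in [0, \alpha(\rho)] and verifies F_{1}(u) \leq F_{3}(u) \leq F_{2}(u) .

```python
# Numerical spot-check of F1(u) <= F3(u) <= F2(u) on [0, alpha(rho)], cf. (E.19)-(E.21).
import numpy as np

def F1(u, rho, theta):
    c = rho ** 2 * (1 - theta ** 2) / (1 - rho ** 2)
    return (-rho * (1 - theta ** 2) / np.sqrt(1 - rho ** 2) * np.sqrt(u)
            - theta * np.sqrt(1 - c * u))

def F2(u, rho, theta):
    c = rho ** 2 * (1 - theta ** 2) / (1 - rho ** 2)
    return (-rho * (1 - theta ** 2) / np.sqrt(1 - rho ** 2) * np.sqrt(u)
            + theta * np.sqrt(1 - c * u))

def F3(u, rho):
    return -rho / np.sqrt(1 - rho ** 2) * np.sqrt(u)

for rho, theta in [(0.3, 0.2), (0.6, 0.5), (0.9, 0.7)]:
    alpha = (1 - rho ** 2) / rho ** 2          # alpha(rho) from (E.18)
    u = np.linspace(0.0, alpha, 1001)
    assert np.all(F1(u, rho, theta) <= F3(u, rho) + 1e-12)
    assert np.all(F3(u, rho) <= F2(u, rho, theta) + 1e-12)
    print(f"rho={rho}, theta={theta}: ordering holds on [0, {alpha:.3f}]")
```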
Using the double inequality F_{1}(u) \leq F_{3}(u) \leq F_{2}(u) , which holds for any u \in [0, \alpha(\rho)] (indeed, F_{3}(u) - F_{1}(u) = \theta\big[\sqrt{1 - \tfrac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u} - \tfrac{\rho\theta}{\sqrt{1-\rho^{2}}}\sqrt{u}\big] is non-negative precisely when u \leq \alpha(\rho) , while F_{2}(u) - F_{3}(u) = \tfrac{\rho\theta^{2}}{\sqrt{1-\rho^{2}}}\sqrt{u} + \theta\sqrt{1 - \tfrac{\rho^{2}(1-\theta^{2})}{1-\rho^{2}} u} is non-negative on all of [0, \beta(\rho, \theta)] ), we arrive at
\begin{align} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} & = \int_{0}^{\alpha(\rho)} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [F_{2}(u), 1] \right\} f_{U}(u) \mathrm{d} u \\ &\; \; \; + \int_{\alpha(\rho)}^{\beta(\rho, \theta)} \mathbb{P}\left\{\cos(\Phi_{ \mbox{xz}}) \in [-1, F_{1}(u)]\cup [F_{2}(u), 1] \right\} f_{U}(u) \mathrm{d} u \\ &\; \; \; \; \; \; + \int_{\beta(\rho, \theta)}^{\infty} f_{U}(u) \mathrm{d} u. \end{align} | (E.38) |
To evaluate the probabilities inside the first two integrals in (E.38), we invoke the following result.
Lemma 6. Let d \in \mathbb{N} and \boldsymbol{X}, \boldsymbol{Y} \sim {\cal N}(\boldsymbol{0}_{d}, {\bf{I}}_{d}) be two independent random vectors. Let \tilde{ \boldsymbol{X}} = \tfrac{ \boldsymbol{X}}{\| \boldsymbol{X}\|}, \; \tilde{ \boldsymbol{Y}} = \tfrac{ \boldsymbol{Y}}{\| \boldsymbol{Y}\|} , and define S = \cos(\Phi) = \tilde{ \boldsymbol{X}}^{ \mathsf{T}}\tilde{ \boldsymbol{Y}} . The probability density function of S is then given by
\begin{align} g_{S}(s) = \frac{(1-s^{2})^{\frac{d-3}{2}}}{B\left(\frac{d-1}{2}, \frac{1}{2}\right)}, \; s \in [-1, 1], \end{align} | (E.39) |
where the complete beta function is
\begin{align} B(a, b) = \int_{0}^{1} t^{a-1}(1-t)^{b-1} \mathrm{d} t, \quad a, b > 0. \end{align} | (E.40) |
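Lemma 6 can be checked empirically as follows. The sketch below is illustrative only (it assumes NumPy/SciPy; d = 8 and the threshold 0.3 are arbitrary choices): it compares a Monte Carlo estimate of \mathbb{P}\{S \geq 0.3\} with the tail integral of (E.39), computed via the regularized incomplete beta function through the reformulation (1+S)/2 \sim \mathrm{Beta}\big(\tfrac{d-1}{2}, \tfrac{d-1}{2}\big) implied by (E.39).

```python
# Empirical check of Lemma 6 / (E.39): S is the cosine of the angle between two
# independent standard Gaussian vectors in R^d; (1 + S)/2 ~ Beta((d-1)/2, (d-1)/2).
import numpy as np
from scipy.special import betainc

d, s0 = 8, 0.3
rng = np.random.default_rng(1)
X = rng.standard_normal((500_000, d))
Y = rng.standard_normal((500_000, d))
S = (X * Y).sum(axis=1) / (np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1))

# Tail probability implied by the density (E.39), via the regularized incomplete beta.
tail_exact = 1.0 - betainc((d - 1) / 2, (d - 1) / 2, (1 + s0) / 2)
print("P{S >= 0.3}: empirical =", (S >= s0).mean(), " from (E.39) =", round(tail_exact, 5))
```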
Finally,
\begin{align} \mathbb{P}\left\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \right\} & = \int_{0}^{\alpha(\rho)} \left[\int_{F_{2}(u)}^{1} g_{S}(s) \mathrm{d} s \right] f_{U}(u) \mathrm{d} u \\ &\; \; \; + \int_{\alpha(\rho)}^{\beta(\rho, \theta)} \left[\int_{-1}^{F_{1}(u)} g_{S}(s) \mathrm{d} s + \int_{F_{2}(u)}^{1} g_{S}(s) \mathrm{d} s\right] f_{U}(u) \mathrm{d} u \\ &\; \; \; \; \; \; + \int_{\beta(\rho, \theta)}^{\infty} f_{U}(u) \mathrm{d} u, \end{align} | (E.41) |
which completes the proof of Lemma 2.
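To illustrate how the final expression (E.41) can be evaluated in practice, the following sketch computes the three integrals of (E.41) by one-dimensional quadrature and cross-checks the result against a direct Monte Carlo estimate of \mathbb{P}\{\tilde{ \boldsymbol{X}}^{\mathsf{T}} \tilde{ \boldsymbol{Y}} \geq \theta \} . It is a minimal illustration rather than the paper's code: it assumes NumPy/SciPy, the correlation model \boldsymbol{Y} = \rho \boldsymbol{X} + \sqrt{1-\rho^{2}}\, \boldsymbol{Z} with \boldsymbol{X}, \boldsymbol{Z} independent standard Gaussian vectors (the model underlying (E.11)), and arbitrary parameter values d = 8 , \rho = 0.6 , \theta = 0.2 .

```python
# Numerical evaluation of (E.41) and a Monte Carlo cross-check of
# P{ tilde(X)^T tilde(Y) >= theta } under the assumed model Y = rho*X + sqrt(1-rho^2)*Z,
# with X, Z independent N(0, I_d).  Illustrative sketch; parameters are arbitrary.
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.special import betainc

d, rho, theta = 8, 0.6, 0.2
alpha = (1 - rho ** 2) / rho ** 2                       # alpha(rho), (E.18)
beta = (1 - rho ** 2) / (rho ** 2 * (1 - theta ** 2))   # beta(rho, theta), (E.17)
c = rho ** 2 * (1 - theta ** 2) / (1 - rho ** 2)

def F1(u):  # (E.19)
    return (-rho * (1 - theta ** 2) / np.sqrt(1 - rho ** 2) * np.sqrt(u)
            - theta * np.sqrt(max(1 - c * u, 0.0)))

def F2(u):  # (E.20)
    return (-rho * (1 - theta ** 2) / np.sqrt(1 - rho ** 2) * np.sqrt(u)
            + theta * np.sqrt(max(1 - c * u, 0.0)))

def G_cdf(s):
    # CDF of S under (E.39), using (1 + S)/2 ~ Beta((d-1)/2, (d-1)/2).
    return betainc((d - 1) / 2, (d - 1) / 2, np.clip((1 + s) / 2, 0.0, 1.0))

def f_U(u):
    # F(d, d) density, (E.25).
    return stats.f.pdf(u, dfn=d, dfd=d)

# The three integrals of (E.41).
I1, _ = quad(lambda u: (1 - G_cdf(F2(u))) * f_U(u), 0.0, alpha)
I2, _ = quad(lambda u: (G_cdf(F1(u)) + 1 - G_cdf(F2(u))) * f_U(u), alpha, beta)
I3 = stats.f.sf(beta, dfn=d, dfd=d)
p_formula = I1 + I2 + I3

# Direct Monte Carlo estimate of the same probability.
rng = np.random.default_rng(2)
X = rng.standard_normal((500_000, d))
Z = rng.standard_normal((500_000, d))
Y = rho * X + np.sqrt(1 - rho ** 2) * Z
cos_xy = (X * Y).sum(axis=1) / (np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1))
p_mc = (cos_xy >= theta).mean()

print(f"(E.41) by quadrature: {p_formula:.5f}   Monte Carlo: {p_mc:.5f}")
```

The two printed values should agree to within Monte Carlo error, which provides a convenient end-to-end check of (E.41).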