Research article

Output statistics, equivocation, and state masking

  • Given a discrete memoryless channel and a target distribution on its output alphabet, one wishes to construct a length-n rate-R codebook such that the output distribution—computed over a codeword that is chosen uniformly at random—should be close to the n-fold tensor product of the target distribution. Here "close" means that the relative entropy between the output distribution and said n-fold product should be small. We characterize the smallest achievable relative entropy divided by n as n tends to infinity. We then demonstrate two applications of this result. The first application is an alternative proof of the achievability of the rate-equivocation region of the wiretap channel. The second application is a new capacity result for communication subject to state masking in the scenario where the decoder has access to channel-state information.

    Citation: Ligong Wang. Output statistics, equivocation, and state masking[J]. AIMS Mathematics, 2025, 10(6): 13151-13165. doi: 10.3934/math.2025590




    The problem of approximating a certain output distribution over a noisy channel appears in many works in Information Theory [1,2,3,4,5,6,7,8,9]. Here we consider a variant of this problem where the approximation error—measured in normalized relative entropy—is not required to approach zero as the number of channel uses increases. Instead, we study the trade-off between the approximation error and the rate of the codebook.

    Consider a discrete memoryless channel (DMC) with finite input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, and channel law

    $W(y|x), \quad x \in \mathcal{X},\ y \in \mathcal{Y}.$ (1)

    Let $\hat{Q}_Y$ be a probability mass function (PMF) on $\mathcal{Y}$ such that

    $\mathrm{supp}(\hat{Q}_Y) = \mathcal{Y}.$ (2)

    We wish to approximate its $n$-fold product $\hat{Q}_Y^{\times n}$. A length-$n$ rate-$R$ codebook is given by

    $\mathcal{C} = \{x^n(1), \ldots, x^n(2^{nR})\}.$ (3)

    When a codeword is chosen equiprobably from $\mathcal{C}$ and sent over $n$ uses of the channel, the output distribution is

    $P_{Y^n}(y^n) = 2^{-nR} \sum_{m=1}^{2^{nR}} \prod_{i=1}^{n} W(y_i|x_i(m)), \quad y^n \in \mathcal{Y}^n.$ (4)

    The approximation error that we consider is

    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr).$ (5)
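    To make (4) and (5) concrete, the following minimal sketch (our own illustration, not part of the paper) computes the induced output distribution and the normalized relative entropy by brute force for an assumed binary channel, target PMF, and toy codebook.

```python
import itertools
import numpy as np

# Illustrative binary-input, binary-output channel W(y|x): rows indexed by x, columns by y.
W = np.array([[0.9, 0.1],
              [0.2, 0.8]])
Q_hat = np.array([0.6, 0.4])        # assumed target PMF Q_hat_Y with full support

# A toy codebook as in (3): n = 3 and two codewords, so R = 1/3.
codebook = [(0, 0, 1), (1, 0, 1)]
n = 3

def P_Yn(yn):
    """Output distribution (4): average over the equiprobable codewords."""
    return np.mean([np.prod([W[x, y] for x, y in zip(xn, yn)])
                    for xn in codebook])

# Normalized relative entropy (5), by summing over all y^n.
D = 0.0
for yn in itertools.product(range(2), repeat=n):
    p = P_Yn(yn)
    q = np.prod([Q_hat[y] for y in yn])
    D += p * np.log2(p / q)           # p > 0 for this channel, so no 0*log(0) issue
print("(1/n) D(P_Yn || Qhat^n) =", D / n, "bits")
```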

    Assume that there exists an input PMF $Q_X$ such that $\hat{Q}_Y = Q_X W$, by which notation we mean

    $(Q_X W)(y) = \sum_{x \in \mathcal{X}} Q_X(x) W(y|x), \quad y \in \mathcal{Y}.$ (6)

    Wyner's Soft-Covering Lemma [1] asserts that there exists a sequence of codebooks like (3) such that (5) tends to zero as $n \to \infty$ provided

    $R > I(Q_X, W).$ (7)

    Later, Cuff [5,6] showed that, under (7), the unnormalized relative entropy $D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr)$ can also be made to approach zero, as can the total variation distance between $P_{Y^n}$ and $\hat{Q}_Y^{\times n}$. Furthermore, the probability that a randomly generated codebook (according to $Q_X$) produces an output distribution that is not close to $\hat{Q}_Y^{\times n}$ in total variation distance is doubly-exponentially small in $n$.

    Now consider the case where again $\hat{Q}_Y = Q_X W$, but where $R$ approaches $I(Q_X, W)$ from below. Furthermore, assume that $Q_X$ and $\hat{Q}_Y$ are capacity-achieving input and output distributions, respectively, so $I(Q_X, W)$ equals the capacity of the channel. Then there exists a sequence of codes such that (5) approaches zero [2, Theorem 15], [3, Theorem 2]. In fact, any sequence of "good codes"—those whose rates approach capacity and whose average error probabilities approach zero as $n \to \infty$—must be such that (5) tends to zero as $n \to \infty$. This result has been generalized to codes with nonvanishing error probability [8,9].

    In all results that we recalled above, the minimum of (5) approaches zero as $n \to \infty$. Here we are interested in cases where $R$ is not large enough for (5) to approach zero. Specifically, either $R$ is smaller than, and possibly bounded away from, $I(Q_X, W)$ for any $Q_X$ such that $\hat{Q}_Y = Q_X W$, or there exists no input distribution that can induce $\hat{Q}_Y$ via $W$. In Section 2, we study the tradeoff between the rate $R$ and the approximation error (5) in such cases.

    The Soft-Covering Lemma and its stronger versions are useful in many problems in information-theoretic security. For example, the original version of the lemma can be used to derive the secrecy capacity of the wiretap channel, although Wyner did not observe this connection in his original work [10].

    Here we demonstrate two applications of the rate-error tradeoff that we derive. The first is an alternative proof of the achievability of the rate-equivocation region of the wiretap channel; see Section 3. The second application is in state masking [11,12]. Specifically, we derive capacity subject to state masking when the decoder has channel-state information (CSI); see Section 4.

    The following simple identity will be used repeatedly in our proofs.

    Lemma 1. Let $P_{X^n Y^n}$ be a joint distribution on $\mathcal{X}^n \times \mathcal{Y}^n$ such that $P_{Y^n|X^n} = W^{\times n}$, and let $\bar{P}_{XY}$ be its average distribution on $\mathcal{X} \times \mathcal{Y}$. Further, let $\hat{Q}_Y$ be a distribution on $\mathcal{Y}$. The following holds:

    $D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) = n I(\bar{P}_X, W) + n D(\bar{P}_Y \| \hat{Q}_Y) - I(X^n; Y^n).$ (8)

    Proof. We will use the following well-known identity (which is easily verified by expanding both of its sides): For $Q_Y = Q_X W$,

    $I(Q_X, W) + D(Q_Y \| \hat{Q}_Y) = \mathrm{E}_{Q_X}\bigl[D\bigl(W(\cdot|X) \,\big\|\, \hat{Q}_Y\bigr)\bigr].$ (9)

    Replacing $Q_X$, $W$, and $\hat{Q}_Y$ in the above respectively by $P_{X^n}$, $W^{\times n}$, and $\hat{Q}_Y^{\times n}$, we obtain

    $I(X^n; Y^n) + D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) = \mathrm{E}_{P_{X^n}}\bigl[D\bigl(W^{\times n}(\cdot|X^n) \,\big\|\, \hat{Q}_Y^{\times n}\bigr)\bigr]$ (10)
    $= n\, \mathrm{E}_{\bar{P}_X}\bigl[D\bigl(W(\cdot|X) \,\big\|\, \hat{Q}_Y\bigr)\bigr]$ (11)
    $= n I(\bar{P}_X, W) + n D(\bar{P}_Y \| \hat{Q}_Y),$ (12)

    where the last step follows by applying (9) again, this time with $Q_X$ replaced by $\bar{P}_X$.
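    As a quick sanity check (our own, not from the paper), the identity (9) can be verified numerically for randomly generated $W$, $Q_X$, and $\hat{Q}_Y$:

```python
import numpy as np

rng = np.random.default_rng(0)

def rel_entropy(p, q):
    """D(p || q) in nats, assuming supp(p) is contained in supp(q)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Random channel W(y|x), input PMF Q_X, and target Q_hat with full support.
nx, ny = 3, 4
W = rng.random((nx, ny)); W /= W.sum(axis=1, keepdims=True)
Q_X = rng.random(nx);     Q_X /= Q_X.sum()
Q_hat = rng.random(ny);   Q_hat /= Q_hat.sum()

Q_Y = Q_X @ W                                                    # Q_Y = Q_X W, eq. (6)
I = sum(Q_X[x] * rel_entropy(W[x], Q_Y) for x in range(nx))      # I(Q_X, W)
lhs = I + rel_entropy(Q_Y, Q_hat)                                # left-hand side of (9)
rhs = sum(Q_X[x] * rel_entropy(W[x], Q_hat) for x in range(nx))  # right-hand side of (9)
print(lhs, rhs)   # the two numbers agree up to floating-point error
```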

    Theorem 2 (Direct result). Given some DMC (1) and $\hat{Q}_Y$ satisfying (2), fix a PMF $Q_X$ on $\mathcal{X}$ and let $Q_Y = Q_X W$. For any $R < I(Q_X, W)$, $\epsilon > 0$, and $\zeta > 0$, let $\mathcal{C}$ be a random codebook whose entries are independent and identically distributed (IID) according to $Q_X$. Then, as $n \to \infty$, the probability that $\mathcal{C}$ satisfies both of the following tends to one:

    (C1) The average probability of a decoding error (by an optimal decoder) is at most $\epsilon$;

    (C2) Let $P_{Y^n}$ be given by (4); then (5) is upper- and lower-bounded as

    $I(Q_X, W) + D(Q_Y \| \hat{Q}_Y) - R - \zeta \le \frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \le I(Q_X, W) + D(Q_Y \| \hat{Q}_Y) - R + R\epsilon + \zeta.$ (13)

    Proof. Let $P^{\mathcal{C}}_{X^n}$ be the uniform distribution over the codewords of $\mathcal{C}$, and let $\bar{P}^{\mathcal{C}}_X$ denote its average PMF on $\mathcal{X}$, i.e.,

    $\bar{P}^{\mathcal{C}}_X(x) = \frac{1}{n} \sum_{i=1}^{n} P^{\mathcal{C}}_{X_i}(x), \quad x \in \mathcal{X}.$ (14)

    By standard arguments (see, e.g., [13]) we can deduce that, for sufficiently large $n$, with high probability, the average decoding error probability of $\mathcal{C}$ is less than $\epsilon$, i.e., it satisfies (C1). Furthermore, by the Law of Large Numbers, as $n \to \infty$, $\bar{P}^{\mathcal{C}}_X(x)$—which is random because $\mathcal{C}$ is random—converges to $Q_X(x)$ with probability one for all $x \in \mathrm{supp}(Q_X)$. Therefore, for any $\alpha > 0$, for sufficiently large $n$, we have, with high probability,

    $\delta_{\mathrm{TV}}\bigl(\bar{P}^{\mathcal{C}}_X, Q_X\bigr) \le \alpha,$ (15)

    where $\delta_{\mathrm{TV}}(\cdot,\cdot)$ denotes the total variation distance:

    $\delta_{\mathrm{TV}}(P, Q) \triangleq \frac{1}{2} \|P - Q\|_1.$ (16)

    By the union bound, we deduce that, with high probability, $\mathcal{C}$ satisfies both (C1) and (15). We now fix $\mathcal{C}$ to be any (deterministic) codebook satisfying (C1) and (15) and show that it must also satisfy (C2) (for an appropriately chosen $\alpha$), which will then conclude the proof. From now on, we drop the superscript $\mathcal{C}$: $P_{X^n Y^n}$ denotes the joint input-output distribution on $\mathcal{X}^n \times \mathcal{Y}^n$ resulting from sending a uniformly chosen codeword from $\mathcal{C}$, and $\bar{P}_{XY}$ denotes its average PMF on $\mathcal{X} \times \mathcal{Y}$. We can upper- and lower-bound $I(X^n; Y^n)$ as

    $nR(1 - \epsilon) - 1 \le I(X^n; Y^n) \le nR,$ (17)

    where the lower bound (first inequality) follows by Fano's Inequality [13], and the upper bound (second inequality) because there are only $2^{nR}$ codewords, so $H(X^n) \le nR$. Using (17) together with Lemma 1, we have

    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \le I(\bar{P}_X, W) + D(\bar{P}_Y \| \hat{Q}_Y) - R + R\epsilon + \frac{1}{n}$ (18)
    $= \sum_{x \in \mathcal{X}} \bar{P}_X(x) D\bigl(W(\cdot|x) \,\big\|\, \hat{Q}_Y\bigr) - R + R\epsilon + \frac{1}{n},$ (19)

    and also

    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \ge \sum_{x \in \mathcal{X}} \bar{P}_X(x) D\bigl(W(\cdot|x) \,\big\|\, \hat{Q}_Y\bigr) - R.$ (20)

    Denote

    $d \triangleq \max_{x \in \mathcal{X}} D\bigl(W(\cdot|x) \,\big\|\, \hat{Q}_Y\bigr),$ (21)

    which is finite due to (2). Then

    $\Bigl| \sum_{x \in \mathcal{X}} \bar{P}_X(x) D\bigl(W(\cdot|x) \,\big\|\, \hat{Q}_Y\bigr) - \sum_{x \in \mathcal{X}} Q_X(x) D\bigl(W(\cdot|x) \,\big\|\, \hat{Q}_Y\bigr) \Bigr| \le \delta_{\mathrm{TV}}\bigl(\bar{P}_X, Q_X\bigr)\, d \le \alpha d.$ (22)

    We combine (19) and (22) to obtain that, for sufficiently large $n$,

    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \le \sum_{x \in \mathcal{X}} Q_X(x) D\bigl(W(\cdot|x) \,\big\|\, \hat{Q}_Y\bigr) - R + R\epsilon + \alpha d + \frac{1}{n}$ (23)
    $= I(Q_X, W) + D(Q_Y \| \hat{Q}_Y) - R + R\epsilon + \alpha d + \frac{1}{n}.$ (24)

    Similarly, by (20) and (22),

    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \ge I(Q_X, W) + D(Q_Y \| \hat{Q}_Y) - R - \alpha d.$ (25)

    Combining (24) and (25) and choosing $\alpha < \zeta/d$ (strictly) proves (13).
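    The two-sided bound (13) can also be explored numerically. The sketch below (our own illustration, with an assumed BSC, uniform $Q_X$, and $\hat{Q}_Y = Q_X W$) draws one random codebook and computes (5) exactly; for such a short blocklength the bound is only loosely reflected, since Theorem 2 is an asymptotic, high-probability statement.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Assumed setting: BSC(0.1), Q_X = Bernoulli(0.5), Q_hat_Y = Q_X W (so D(Q_Y||Q_hat_Y) = 0).
p = 0.1
W = np.array([[1 - p, p], [p, 1 - p]])
Q_X = np.array([0.5, 0.5])
Q_hat = Q_X @ W

n, R = 10, 0.3
M = int(2 ** (n * R))                            # codebook size 2^{nR}
codebook = rng.choice(2, size=(M, n), p=Q_X)     # IID codewords drawn from Q_X

# Exact normalized relative entropy (5) for this particular random codebook.
D = 0.0
for yn in itertools.product(range(2), repeat=n):
    yn = np.array(yn)
    lik = np.prod(W[codebook, yn], axis=1)       # W^n(y^n | x^n(m)) for every m
    P = lik.mean()                               # P_{Y^n}(y^n), eq. (4)
    Q = np.prod(Q_hat[yn])
    D += P * np.log2(P / Q)

I_QW = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)   # I(Q_X, W) = 1 - h(p) bits
print("empirical (5):", D / n, "  vs  I(Q_X, W) - R =", I_QW - R)
```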

    We first present a lower bound on (5) for any codebook.

    Theorem 3 (Converse: General lower bound). For all $n$, any code $\mathcal{C}$ satisfies the following: Let $P_{Y^n}$ be given by (4). Then

    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \ge \min_{Q_X} \bigl\{ (I(Q_X, W) - R)^{+} + D(Q_Y \| \hat{Q}_Y) \bigr\},$ (26)

    where $Q_Y = Q_X W$.

    Proof. We have the following standard upper bound on $I(X^n; Y^n)$ that follows by the chain rule, the channel being memoryless, and concavity of mutual information in the input distribution:

    $I(X^n; Y^n) \le n I(\bar{P}_X, W),$ (27)

    where $\bar{P}_X$ is the average PMF on $\mathcal{X}$ induced by the codebook. Also recall the upper bound (second inequality) in (17). Combining these two bounds with Lemma 1 immediately yields

    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \ge \bigl(I(\bar{P}_X, W) - R\bigr)^{+} + D(\bar{P}_Y \| \hat{Q}_Y),$ (28)

    which continues to hold when its right-hand side (RHS) is minimized over $\bar{P}_X$.

    Maximizing (5) over all codebooks is trivial: one should choose all codewords to be identical and to consist only of the input symbol $x$ that maximizes $D\bigl(W(\cdot|x) \,\big\|\, \hat{Q}_Y\bigr)$. The problem becomes meaningful when we require the codebook to have a small decoding error probability, in which case we have the following upper bound.

    Theorem 4 (Converse: Upper bound for decodable codes). Any code $\mathcal{C}$ with decoding error probability less than or equal to $\epsilon$ must satisfy: There exists some $Q_X$ such that, for $Q_Y = Q_X W$,

    $I(Q_X, W) \ge R - R\epsilon - \frac{1}{n}$ (29)
    $\frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) \le I(Q_X, W) + D(Q_Y \| \hat{Q}_Y) - R + R\epsilon + \frac{1}{n}.$ (30)

    Proof. Choose $Q_X = \bar{P}_X$, the average input PMF induced by the codebook; then (29) follows by Fano's Inequality together with (27), and (30) follows by Fano's Inequality together with Lemma 1.

    The above results yield the following asymptotics on (5). First, combining Theorems 2 and 3, we have

    $\lim_{n \to \infty} \min_{\mathcal{C}} \frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) = \min_{Q_X} \bigl\{ (I(Q_X, W) - R)^{+} + D(Q_Y \| \hat{Q}_Y) \bigr\},$ (31)

    where the minimum on the left-hand side is over all length-$n$ rate-$R$ codebooks. If we only allow codes with vanishing decoding error probabilities—which, with some abuse of notation, we denote as $\mathcal{C}_{\mathrm{cor}}$—then, by Theorems 2 and 4,

    $\lim_{n \to \infty} \min_{\mathcal{C}_{\mathrm{cor}}} \frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) = \min_{Q_X \colon I(Q_X, W) \ge R} \bigl\{ I(Q_X, W) - R + D(Q_Y \| \hat{Q}_Y) \bigr\}.$ (32)

    Note that (31) and (32) coincide when

    $R \le \min_{Q_X \colon Q_X W = \hat{Q}_Y} I(Q_X, W).$ (33)

    Beyond this threshold, (31) remains zero, whereas (32) increases with $R$.

    Finally, again for codes with vanishing decoding error probabilities, Theorems 2 and 4 together imply

    $\lim_{n \to \infty} \max_{\mathcal{C}_{\mathrm{cor}}} \frac{1}{n} D\bigl(P_{Y^n} \,\big\|\, \hat{Q}_Y^{\times n}\bigr) = \max_{Q_X \colon I(Q_X, W) \ge R} \bigl\{ I(Q_X, W) - R + D(Q_Y \| \hat{Q}_Y) \bigr\}.$ (34)

    Note that (32) is meaningful only when $R$ is below the capacity of the channel; else correct decoding would not be possible, and the minimizations on both of its sides would be over empty sets; similarly for (34).

    Due to the identity (9), the objective in the minimizations on the RHSs of (31) (when restricted to $Q_X$ such that $I(Q_X, W) \ge R$) and (32) is linear in $Q_X$, and so is the objective in the maximization on the RHS of (34). Therefore they are achieved on the boundary, as we demonstrate in the following example.

    Example 5. Let $W$ be a binary symmetric channel with crossover probability $p \in (0, 0.5)$, and let $\hat{Q}_Y$ be Bernoulli$(q)$ with $q \in (p, 0.5)$, so $D\bigl(W(\cdot|0) \,\big\|\, \hat{Q}_Y\bigr) < D\bigl(W(\cdot|1) \,\big\|\, \hat{Q}_Y\bigr)$. It then follows that the minimum on the RHS of (32) is achieved by using the input $0$ as frequently as possible, i.e., it is achieved by $Q_X$ being Bernoulli$(a)$ such that $I(Q_X, W) = R$ and $a < 0.5$. Similarly, the maximum on the RHS of (34) is achieved by Bernoulli$(1-a)$. We plot (31), (32), and (34) for this example in Figure 1.

    Figure 1.  Example 5 with $p = 0.15$ and $q = 0.35$. The solid line depicts (31) and (32) when $R$ satisfies (33); the dashed line depicts (32) when $R$ exceeds the RHS of (33); and the dotted line depicts (34).
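    The curves in Figure 1 can be reproduced with a short script. The following sketch (our own; it simply grids over Bernoulli$(a)$ input distributions) evaluates the right-hand sides of (31), (32), and (34) for the channel of Example 5.

```python
import numpy as np
import matplotlib.pyplot as plt

p, q = 0.15, 0.35                      # parameters of Example 5

def h(x):                              # binary entropy in bits
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def d(x, y):                           # binary divergence D(Bern(x) || Bern(y))
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return x * np.log2(x / y) + (1 - x) * np.log2((1 - x) / (1 - y))

a = np.linspace(0, 1, 2001)            # Q_X = Bernoulli(a)
qy = p + a * (1 - 2 * p)               # Q_Y(1) when Q_Y = Q_X W
I = h(qy) - h(p)                       # I(Q_X, W)
D = d(qy, q)                           # D(Q_Y || Q_hat_Y)

Rs = np.linspace(0, 1 - h(p), 400)     # rates up to channel capacity
rhs31, rhs32, rhs34 = [], [], []
for R in Rs:
    rhs31.append(np.min(np.maximum(I - R, 0) + D))     # RHS of (31)
    feasible = I >= R
    rhs32.append(np.min((I - R + D)[feasible]))        # RHS of (32)
    rhs34.append(np.max((I - R + D)[feasible]))        # RHS of (34)

plt.plot(Rs, rhs31, label="(31): any codebook")
plt.plot(Rs, rhs32, "--", label="(32): decodable, min")
plt.plot(Rs, rhs34, ":", label="(34): decodable, max")
plt.xlabel("R [bits/channel use]")
plt.ylabel("normalized relative entropy [bits]")
plt.legend()
plt.show()
```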

    Our results above recover the Soft-Covering Lemma [1] as well as its converse, and the proofs are considerably simpler than the ones that are usually seen in the literature. We prove both direct and converse results with the approximation error measured by the normalized relative entropy (5). As shown in [5], (5) tending to zero is a weaker requirement than $\delta_{\mathrm{TV}}\bigl(P_{Y^n}, \hat{Q}_Y^{\times n}\bigr)$ tending to zero, so the direct result that we recover (which is the same as Wyner's original version [1]) is weaker than those in [5,6], but the converse we prove is stronger than a converse stated in terms of $\delta_{\mathrm{TV}}\bigl(P_{Y^n}, \hat{Q}_Y^{\times n}\bigr)$.

    The converse to the Soft-Covering Lemma asserts that, in order for (5) to approach zero as $n \to \infty$, $R$ must satisfy

    $R \ge \min_{Q_X \colon Q_X W = \hat{Q}_Y} I(Q_X, W).$ (35)

    This follows immediately from Theorem 3: For the RHS of (26) to approach zero, both summands inside the minimization must tend to zero. This means $Q_Y = Q_X W$ must tend to $\hat{Q}_Y$, and $R$ must approach or exceed $I(Q_X, W)$.

    For the direct Soft-Covering Lemma, we apply Theorem 2, choose $Q_X$ to be such that $Q_Y = \hat{Q}_Y$, and let $R$ approach $I(Q_X, W)$ from below. Then the second inequality in (13) implies that the normalized relative entropy (5) can be made arbitrarily close to zero.*

    *Note that our proof of the direct Soft-Covering Lemma builds upon the classic proof that, with high probability, the random codebook $\mathcal{C}$ admits a small decoding error probability. The latter is standard, albeit nontrivial.

    Consider a wiretap channel characterized by input alphabet $\mathcal{X}$, two output alphabets $\mathcal{Y}$ and $\mathcal{Z}$, and channel law $W(y,z|x)$, $(x,y,z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$. A length-$n$ rate-$R$ code consists of a stochastic encoder that maps a message $m \in \{1, \ldots, 2^{nR}\}$, possibly together with some local randomness, to $x^n$, and a decoder that maps $y^n$ to $\hat{m} \in \{1, \ldots, 2^{nR}\}$; an error occurs whenever $\hat{m} \neq m$. A rate-equivocation pair $(R, \Delta_{\mathrm{eq}})$ is said to be achievable if there exists a sequence of length-$n$ rate-$R$ codes (indexed by $n$) whose error probability tends to zero as $n \to \infty$, and

    $\liminf_{n \to \infty} \frac{1}{n} H(M | Z^n) \ge \Delta_{\mathrm{eq}}.$ (36)

    The following result is well known [10,14]; we provide a simple proof via Theorem 2.

    The optimal rate-equivocation region also involves an auxiliary random variable $U$ satisfying the Markov relation $U \to X \to (Y, Z)$. The direct part of that result can be obtained immediately from Theorem 6 by viewing $U$, instead of $X$, as the channel input. We therefore focus on proving the achievability result that does not involve $U$.

    Theorem 6. For any input distribution $P_X$ such that $I(X;Y) > I(X;Z)$, any rate-equivocation pair $(R, \Delta_{\mathrm{eq}})$ satisfying the following is achievable:

    $\Delta_{\mathrm{eq}} < R < I(X;Y)$ (37)
    $\Delta_{\mathrm{eq}} < I(X;Y) - I(X;Z).$ (38)

    Proof. Fix some $R' > 0$. Generate a codebook

    $\bigl\{ x^n(m, \ell) \colon\ m \in \{1, \ldots, 2^{nR}\},\ \ell \in \{1, \ldots, 2^{nR'}\} \bigr\}$ (39)

    IID according to $P_X$. The encoder draws $L$ uniformly at random over $\{1, \ldots, 2^{nR'}\}$. It then maps $(m, L)$, with $m$ being the message, to the codeword $x^n(m, L)$, which it subsequently sends to the channel.

    Following standard procedures [13], one can show that the probability that the randomly generated codebook has small average error probability (even for decoding both $m$ and $\ell$) will tend to one as $n \to \infty$ provided

    $R + R' < I(X;Y).$ (40)

    We next study equivocation. Let $\mathcal{C}$ denote the entire (randomly generated) codebook, let $\mathcal{C}(m)$ denote $\{X^n(m, 1), \ldots, X^n(m, 2^{nR'})\}$, the random sub-codebook for message $M = m$, and let $P_{Z^n | \mathcal{C}(m)}$ denote the distribution at the eavesdropper given $M = m$ and with a uniformly chosen $L$. Then

    $I(M; Z^n | \mathcal{C}) = 2^{-nR} \sum_{m=1}^{2^{nR}} D\bigl(P_{Z^n | \mathcal{C}(m)} \,\big\|\, P_{Z^n}\bigr)$ (41)
    $= 2^{-nR} \sum_{m=1}^{2^{nR}} D\bigl(P_{Z^n | \mathcal{C}(m)} \,\big\|\, P_Z^{\times n}\bigr) - D\bigl(P_{Z^n} \,\big\|\, P_Z^{\times n}\bigr)$ (42)
    $\le 2^{-nR} \sum_{m=1}^{2^{nR}} D\bigl(P_{Z^n | \mathcal{C}(m)} \,\big\|\, P_Z^{\times n}\bigr).$ (43)

    By Theorem 2, for any $\alpha > 0$, for all sufficiently large $n$,

    $\Pr\Bigl[ \frac{1}{n} D\bigl(P_{Z^n | \mathcal{C}(m)} \,\big\|\, P_Z^{\times n}\bigr) \le I(X;Z) - R' + \alpha \Bigr] \ge 1 - \alpha.$ (44)

    Let $B(m)$ be the indicator function

    $B(m) = \mathbb{1}\Bigl[ \frac{1}{n} D\bigl(P_{Z^n | \mathcal{C}(m)} \,\big\|\, P_Z^{\times n}\bigr) > I(X;Z) - R' + \alpha \Bigr],$ (45)

    then $\{B(m)\}$ are IID Bernoulli$(p)$ for some $p \le \alpha$. Denote

    $B = 2^{-nR} \sum_{m} B(m),$ (46)

    then

    $\mathrm{E}[B] = p \le \alpha$ (47)
    $\mathrm{Var}[B] = 2^{-nR} (p - p^2) \le 2^{-nR} \alpha.$ (48)

    Now define

    $d = \max_{x \in \mathcal{X}} D\bigl(P_{Z|X=x} \,\big\|\, P_Z\bigr),$ (49)

    then, by the definition of $B$,

    $\frac{1}{n} I(M; Z^n | \mathcal{C}) \le I(X;Z) - R' + \alpha + B d.$ (50)

    Consequently,

    $\Pr\Bigl[ \frac{1}{n} I(M; Z^n | \mathcal{C}) > I(X;Z) - R' + \alpha + 2\alpha d \Bigr] \le \Pr[B > 2\alpha] \le 2^{-nR}.$ (51)

    Thus, there must be a realization $C$ of the codebook such that the decoding error probability is small and

    $\frac{1}{n} I(M; Z^n | \mathcal{C} = C) \le I(X;Z) - R' + \beta,$ (52)

    where $\beta \triangleq \alpha + 2\alpha d$ can be made arbitrarily close to zero. This means we can achieve equivocation

    $\Delta_{\mathrm{eq}} = \frac{1}{n} \bigl( H(M) - I(M; Z^n) \bigr) \ge R + R' - I(X;Z) - \beta.$ (53)

    Combining (40) and (53) together with

    $R' > 0,$ (54)

    and eliminating $R'$, we obtain the desired result.
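    As a numerical illustration of Theorem 6 (our own; the paper contains no such example), consider a degraded binary wiretap channel in which the legitimate receiver sees a BSC$(p_1)$ and the eavesdropper a BSC$(p_2)$ with $p_2 > p_1$, and take $P_X$ uniform:

```python
import numpy as np

def h(x):                                        # binary entropy in bits
    return 0.0 if x in (0.0, 1.0) else -x * np.log2(x) - (1 - x) * np.log2(1 - x)

# Assumed wiretap channel: legitimate channel BSC(p1), eavesdropper channel BSC(p2).
p1, p2 = 0.05, 0.2
I_XY = 1 - h(p1)                                 # I(X;Y) under uniform P_X
I_XZ = 1 - h(p2)                                 # I(X;Z) under uniform P_X

# Theorem 6: (R, Delta_eq) is achievable whenever Delta_eq < R < I(X;Y) and
# Delta_eq < I(X;Y) - I(X;Z); the largest equivocation is approached as R -> I(X;Y).
print("I(X;Y) =", I_XY)
print("I(X;Z) =", I_XZ)
print("largest achievable equivocation ~", min(I_XY, I_XY - I_XZ), "bits/use")
```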

    State masking was first studied in [11] in a setting where the encoder has noncausal CSI while the decoder does not. Here we are mainly interested in the setting where the encoder does not have CSI but the decoder does.

    Consider a state-dependent DMC with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, state alphabet $\mathcal{S}$, and channel law

    $W(y|x,s), \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ s \in \mathcal{S}.$ (55)

    The state sequence is IID according to $P_S$. In a length-$n$ rate-$R$ code, the encoder is a possibly stochastic mapping

    $\mathrm{enc} \colon \{1, \ldots, 2^{nR}\} \to \mathcal{X}^n, \quad m \mapsto x^n,$ (56)

    and the decoder is a mapping from the output $y^n$ and the state sequence $s^n$ to its guess of the message:

    $\mathrm{dec} \colon \mathcal{Y}^n \times \mathcal{S}^n \to \{1, \ldots, 2^{nR}\}, \quad (y^n, s^n) \mapsto \hat{m}.$ (57)

    As usual, we say a decoding error occurs if $\hat{m} \neq m$, and we consider the average error probability, where the average is over a uniformly drawn message and the randomness of the state sequence. We impose the following state-masking constraint: For some given constant $K > 0$, the joint distribution induced by a uniformly drawn message, the encoder, the random state sequence, and the channel law must satisfy, for all $n$,

    $I(S^n; Y^n) \le nK.$ (58)

    The supremum over all rates that are achievable—in the sense that the average error probability can be made arbitrarily small as $n \to \infty$ while (58) is satisfied—is called the capacity in this setting.

    Theorem 7. The capacity of the state-dependent DMC with CSI at the decoder under the constraint (58) is given by

    $\sup I(X; Y, S)$ (59)

    over joint distributions of the form $P_X(x) P_S(s) W(y|x,s)$ subject to

    $I(S;Y) < K.$ (60)

    To approach this capacity, it is sufficient to use deterministic encoders.

    Proof. See Section 4.2.
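    For illustration (our own, not part of the paper), (59)-(60) can be evaluated by a brute-force grid search over input distributions. The toy state-dependent channel below is an assumption: binary $X$, $S$, $Y$; the channel is noiseless when $S = 0$ and its output is stuck at $1$ when $S = 1$, so lowering $I(S;Y)$ forces the input toward the symbol $1$ and genuinely costs rate.

```python
import numpy as np

def H(p):                                   # entropy of a PMF (array), in bits
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Assumed toy channel W[x, s, y]: noiseless when s = 0, output stuck at 1 when s = 1.
P_S = np.array([0.7, 0.3])
W = np.zeros((2, 2, 2))
for x in range(2):
    W[x, 0, x] = 1.0                        # s = 0: y = x
    W[x, 1, 1] = 1.0                        # s = 1: y = 1 regardless of x

def evaluate(a):
    """Return I(X;Y,S) and I(S;Y) for P_X = Bernoulli(a)."""
    P_X = np.array([1 - a, a])
    P_XSY = P_X[:, None, None] * P_S[None, :, None] * W     # joint PMF over (x, s, y)
    P_SY = P_XSY.sum(axis=0)
    I_SY = H(P_SY.sum(axis=1)) + H(P_SY.sum(axis=0)) - H(P_SY.ravel())
    H_X_given_YS = 0.0                       # I(X;Y,S) = H(X) - H(X|Y,S)
    for s in range(2):
        for y in range(2):
            psy = P_XSY[:, s, y].sum()
            if psy > 0:
                H_X_given_YS += psy * H(P_XSY[:, s, y] / psy)
    return H(P_X) - H_X_given_YS, I_SY

K = 0.1
best = 0.0
for a in np.linspace(0, 1, 501):
    I_XYS, I_SY = evaluate(a)
    if I_SY < K:                             # masking constraint (60)
        best = max(best, I_XYS)
print("estimate of (59) subject to (60) with K =", K, ":", best, "bits/use")
```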

    The above result may look rather natural by itself, but perhaps less so once compared to previous results on state masking without decoder CSI. When the encoder has noncausal CSI and the decoder has no CSI, the single-letter state-masking capacity formula in [11] involves a constraint on $I(S; U, Y)$ rather than just $I(S;Y)$ as in (60), where $U$ is the auxiliary random variable in Gel'fand-Pinsker coding [15].

    For a more direct comparison with Theorem 7, let us consider the setting in which neither encoder nor decoder has CSI, so the encoder is the same as above, while (57) is replaced by

    $\mathrm{dec} \colon \mathcal{Y}^n \to \{1, \ldots, 2^{nR}\}, \quad y^n \mapsto \hat{m}.$ (61)

    Define capacity similarly as above. In the limit $K \to 0$, the problem becomes so-called state obfuscation, and the capacity is given by [12, Theorem 3], of which the next theorem can be considered a generalization. Its proof does not use the results from Section 2; we include this result for comparison and its proof for completeness.

    Theorem 8. The capacity of the state-dependent DMC without CSI under the constraint (58) is given by

    $\sup I(U; Y)$ (62)

    over joint distributions $P_U(u) P_{X|U}(x|u) P_S(s) W(y|x,s)$—where $U$ is an auxiliary random variable taking values in some finite set $\mathcal{U}$—subject to the condition

    $I(S; U, Y) < K.$ (63)

    Proof. See Section 4.3.

    Note that not only is (62) different from (59), but so is (63) from (60). The difference becomes even more apparent when we only allow deterministic encoders:

    Remark 9. In the no-CSI setting, if the encoder is restricted to being a deterministic mapping, then the capacity becomes

    $\sup I(X; Y)$ (64)

    over joint distributions $P_X(x) P_S(s) W(y|x,s)$ subject to the condition

    $I(S; X, Y) < K.$ (65)

    The proof is similar to that of Theorem 8 and is omitted.

    We start with the converse part of the theorem, which is rather straightforward. Take any (possibly random) code satisfying (58). Let $\bar{P}_{XYS}$ denote the average distribution on $\mathcal{X} \times \mathcal{Y} \times \mathcal{S}$ induced by a uniformly drawn codeword, the random states, and the channel. We start from (58) to derive the following bound:

    $nK \ge I(S^n; Y^n)$ (66)
    $= H(S^n) - H(S^n | Y^n)$ (67)
    $= \sum_{i=1}^{n} \bigl( H(S_i) - H(S_i | Y^n, S^{i-1}) \bigr)$ (68)
    $= \sum_{i=1}^{n} I(S_i; Y^n, S^{i-1})$ (69)
    $\ge \sum_{i=1}^{n} I(S_i; Y_i)$ (70)
    $\ge n \left. I(S;Y) \right|_{\bar{P}_{YS}},$ (71)

    where the last step follows because every $S_i$ has the same distribution, and because $I(S;Y)$, for fixed $P_S$, is convex in $P_{Y|S}$. On the other hand, since the decoder knows $S^n$, we can view the pair $(Y, S)$ as the output of the channel. Using standard arguments invoking Fano's Inequality, we can show

    $nR \le n \left. I(X; Y, S) \right|_{\bar{P}_{XYS}} + n\epsilon$ (72)

    for some $\epsilon > 0$ that approaches zero as $n \to \infty$ and as the error probability is required to vanish. The converse part of the theorem follows from (71) and (72).

    We next prove the direct part with the help of Theorem 2. Let $P_X$ be an input distribution such that $P_{XYS}(x,y,s) \triangleq P_X(x) P_S(s) W(y|x,s)$ satisfies

    $I(S;Y) \le K'$ (73)

    for some $K' < K$. All single-letter mutual informations in the following are computed according to $P_{XYS}$.

    Once again, because the decoder knows $S^n$, the effective output of the channel is the pair $(Y, S)$. We can apply Theorem 2 with the choice

    $\hat{Q}_{YS} = P_Y \times P_S$ (74)

    to obtain that, for any

    $R < I(X; Y, S),$ (75)

    there exists a deterministic length-$n$ rate-$R$ code with small error probability that induces a distribution $P_{Y^n S^n}$ satisfying, for some small $\epsilon$,

    $\frac{1}{n} D\bigl(P_{S^n Y^n} \,\big\|\, P_S^{\times n} \times P_Y^{\times n}\bigr) \le I(X; Y, S) - R + D\bigl(P_{SY} \,\big\|\, P_S \times P_Y\bigr) + \epsilon$ (76)
    $= I(X; Y, S) - R + I(S;Y) + \epsilon$ (77)
    $\le I(X; Y, S) - R + K' + \epsilon.$ (78)

    We can now bound $I(S^n; Y^n)$ as

    $I(S^n; Y^n) = D\bigl(P_{S^n Y^n} \,\big\|\, P_{S^n} \times P_{Y^n}\bigr)$ (79)
    $= D\bigl(P_{S^n Y^n} \,\big\|\, P_S^{\times n} \times P_Y^{\times n}\bigr) - D\bigl(P_{Y^n} \,\big\|\, P_Y^{\times n}\bigr)$ (80)
    $\le D\bigl(P_{S^n Y^n} \,\big\|\, P_S^{\times n} \times P_Y^{\times n}\bigr)$ (81)
    $\le nK' + n \bigl( I(X; Y, S) - R + \epsilon \bigr),$ (82)

    where the last step follows from (78). Since $K' < K$, and since $R$ can be chosen to be arbitrarily close to $I(X; Y, S)$, we conclude that $I(S^n; Y^n)$ is smaller than $nK$ for sufficiently large $n$. The proof is concluded by noting that the construction works for all $K' < K$.

    We start with the converse part. Take a code that has a small error probability and that satisfies (58). Let $P_{X^n Y^n S^n}$ be the joint distribution induced by sending a uniformly drawn codeword through the channel. Define

    $U_i \triangleq (M, Y^{i-1}, S^{i-1}), \quad i = 1, \ldots, n.$ (83)

    Let $T$ be uniformly distributed over $\{1, \ldots, n\}$ and independent of everything else. With a slight abuse of notation, define $U \triangleq (U_T, T)$, $Y \triangleq Y_T$, and $S \triangleq S_T$. We first upper-bound $I(M, Y^n; S^n)$ as

    $I(M, Y^n; S^n) \le I(Y^n; S^n) + H(M | Y^n)$ (84)
    $\le nK + n\epsilon$ (85)

    for some $\epsilon$ that approaches zero when the error probability approaches zero and $n \to \infty$. The last step follows by (58) and Fano's Inequality. We then lower-bound $I(M, Y^n; S^n)$ as

    $I(M, Y^n; S^n) = \sum_{i=1}^{n} I(M, Y^n; S_i | S^{i-1})$ (86)
    $= \sum_{i=1}^{n} I(M, Y^n, S^{i-1}; S_i)$ (87)
    $\ge \sum_{i=1}^{n} I(M, Y^i, S^{i-1}; S_i)$ (88)
    $= \sum_{i=1}^{n} I(U_i, Y_i; S_i)$ (89)
    $= n I(U_T, Y_T; S_T | T)$ (90)
    $= n I(U, Y; S),$ (91)

    where (87) follows because $S^n$ is IID; and (91) because $T$ is independent of $S_T$. Combining (85) and (91), we have

    $I(U, Y; S) \le K + \epsilon.$ (92)

    On the other hand, using a standard argument invoking Fano's Inequality, we can show

    $n(R - \epsilon) \le I(M; Y^n)$ (93)
    $\le \sum_{i=1}^{n} I(U_i; Y_i)$ (94)
    $\le n I(U; Y).$ (95)

    Combining (92) and (95) proves the converse part of the theorem.

    We next prove the direct part of the theorem. Fix any finite set $\mathcal{U}$ and joint distribution $P_{UX}$ on $\mathcal{U} \times \mathcal{X}$. Generate a codebook by picking the codewords $\{u^n(1), \ldots, u^n(2^{nR})\}$ IID according to $P_U$. To send message $m$, the sender first passes $u_1(m), \ldots, u_n(m)$ independently through $P_{X|U}$, and then sends the resulting $x$-sequence to the channel. Thus, effectively, we have a channel whose input alphabet is $\mathcal{U}$ instead of $\mathcal{X}$.

    It follows from standard arguments that, with high probability, the randomly generated code has small average decoding error probability provided $R < I(U;Y)$. We shall show that, again with high probability, the code will also satisfy the constraint (58). Then it will follow from the union bound that there exist codes that satisfy both. To check (58), first write

    $I(S^n; Y^n) \le I(S^n; U^n, Y^n)$ (96)
    $= I(S^n; Y^n | U^n)$ (97)
    $= H(Y^n | U^n) - H(Y^n | U^n, S^n).$ (98)

    For the first term on the RHS of (98), we have

    $H(Y^n | U^n) = \sum_{i=1}^{n} H(Y_i | U^n, Y^{i-1}) \le \sum_{i=1}^{n} H(Y_i | U_i).$ (99)

    Since the codebook is generated IID according to $P_U$, by the Law of Large Numbers, the RHS divided by $n$ tends to $H(Y|U)$ with probability one. Therefore, for every $\epsilon > 0$, for sufficiently large $n$, with high probability,

    $H(Y^n | U^n) \le n H(Y|U) + n\epsilon,$ (100)

    with the understanding that $H(Y|U)$ is computed according to the chosen joint distribution. For the second term on the RHS of (98), we have

    $H(Y^n | U^n, S^n) = \sum_{i=1}^{n} H(Y_i | U_i, S_i),$ (101)

    because the state-dependent channel $P_{Y|US}$ is memoryless. Again by the Law of Large Numbers, for sufficiently large $n$, with high probability,

    $H(Y^n | U^n, S^n) \ge n H(Y|U,S) - n\epsilon.$ (102)

    Summarizing (98), (100), and (102), with high probability

    $I(S^n; Y^n) \le n I(S; U, Y) + 2n\epsilon.$ (103)

    So (58) is indeed satisfied with high probability if $I(S; U, Y) < K$ and $\epsilon$ is sufficiently small. This completes the proof.

    The author declares that no Artificial Intelligence (AI) tools were utilized in creating this article.

    The author thanks Stefan Moser, Sergio Verdú, and Shun Watanabe for inspiring discussions and email exchanges. He also thanks Guest Editor Igal Sason and the anonymous reviewers for their helpful comments.

    The author declares no conflicts of interest.



    [1] A. D. Wyner, The common information of two independent random variables, IEEE Trans. Inform. Theory, 21 (1975), 163–179. https://doi.org/10.1109/TIT.1975.1055346 doi: 10.1109/TIT.1975.1055346
    [2] T. S. Han, S. Verdú, Approximation theory of output statistics, IEEE Trans. Inform. Theory, 39 (1993), 752–772. https://doi.org/10.1109/18.256486 doi: 10.1109/18.256486
    [3] S. Shamai, S. Verdú, The empirical distribution of good codes, IEEE Trans. Inform. Theory, 43 (1997), 836–846. https://doi.org/10.1109/18.568695 doi: 10.1109/18.568695
    [4] M. Hayashi, General nonasymptotic and asymptotic formulas in channel resolvability and identification capacity and their application to the wiretap channel, IEEE Trans. Inform. Theory, 52 (2006), 1562–1575. https://doi.org/10.1109/TIT.2006.871040 doi: 10.1109/TIT.2006.871040
    [5] P. Cuff, Distributed channel synthesis, IEEE Trans. Inform. Theory, 59 (2013), 7071–7096. https://doi.org/10.1109/TIT.2013.2279330 doi: 10.1109/TIT.2013.2279330
    [6] P. Cuff, A stronger soft-covering lemma and applications, in 2015 IEEE Conf. Comm. and Network Security, Florence, Italy, 2015. https://doi.org/10.1109/CNS.2015.7346808
    [7] J. Hou, G. Kramer, Effective secrecy: Reliability, confusion and stealth, in Proc. IEEE Int. Symp. Inform. Theory, Honolulu, HI, USA, 2014. https://doi.org/10.1109/ISIT.2014.6874903
    [8] M. Raginsky, I. Sason, Concentration of measure inequalities in information theory, communications, and coding: Third edition, Now Foundations and Trends, 2018.
    [9] Y. Polyanskiy, S. Verdú, Empirical distribution of good channel codes with nonvanishing error probability, IEEE Trans. Inform. Theory, 60 (2014), 5–21. https://doi.org/10.1109/TIT.2013.2284506 doi: 10.1109/TIT.2013.2284506
    [10] A. D. Wyner, The wiretap channel, Bell Syst. Techn. J., 54 (1975), 1355–1387. https://doi.org/10.1002/j.1538-7305.1975.tb02040.x doi: 10.1002/j.1538-7305.1975.tb02040.x
    [11] N. Merhav, S. Shamai, Information rates subject to state masking, IEEE Trans. Inform. Theory, 53 (2007), 2254–2261. https://doi.org/10.1109/TIT.2007.896860 doi: 10.1109/TIT.2007.896860
    [12] L. Wang, G. W. Wornell, Communication over discrete channels subject to state obfuscation, IEEE Trans. Inform. Theory, 70 (2024), 8455–8466. https://doi.org/10.1109/TIT.2024.3432573 doi: 10.1109/TIT.2024.3432573
    [13] T. M. Cover, J. A. Thomas, Elements of information theory, 2 Eds., John Wiley & Sons, New York, 2006.
    [14] I. Csiszár, J. Körner, Broadcast channels with confidential messages, IEEE Trans. Inform. Theory, 24 (1978), 339–348. https://doi.org/10.1109/TIT.1978.1055892 doi: 10.1109/TIT.1978.1055892
    [15] S. I. Gel'fand, M. S. Pinsker, Coding for channels with random parameters, Prob. Contr. Inform. Theory, 9 (1980), 19–31.
  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)