Mutual Information and MMSE

Notation:

  • Mutual information (MI)
    \begin{align}
    I(X;Y)
    &=\int p(x,y)\log \frac{p(x|y)}{p(x)}\text{d}x\text{d}y\\
    &=\int p(x,y)\log \frac{p(x,y)}{p(x)p(y)}\text{d}x\text{d}y\\
    &=\int p(x,y)\log \frac{p(y|x)}{p(y)}\text{d}x\text{d}y\\
    &=I(Y;X)
    \end{align}
  • Integration by parts
    \begin{align}
    \int u(x)v'(x)\text{d}x= \left.u(x)v(x)\right|_{x=-\infty}^{x=+\infty}-\int u'(x)v(x)\text{d}x
    \end{align}
    where $v'(x)$ denotes $\frac{\text{d}v(x)}{\text{d}x}$.

Theorem: Consider the linear Gaussian model
\begin{align}
Y=\sqrt{\gamma}X+U,\quad U\sim \mathcal{N}(0,1)
\end{align}
where $\gamma>0$ is the signal-to-noise ratio (SNR) and the noise $U$ is independent of $X$. Then
\begin{align}
\frac{\text{d}I(X;Y)}{\text{d}\gamma}=\frac{1}{2}\text{MMSE}
\end{align}
where
\begin{align}
\text{MMSE}=\int (x-\hat{x})^2p(x,y;\gamma)\text{d}x\text{d}y
\end{align}
and $\hat{x}=\int xp(x|y;\gamma)\text{d}x$ is the conditional mean of $X$ given $Y=y$.
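
As a quick sanity check of the theorem, consider a standard Gaussian input, $X\sim\mathcal{N}(0,1)$. This input distribution is an assumption made only for this example; the theorem itself does not restrict $p(x)$. In that case $Y\sim\mathcal{N}(0,1+\gamma)$ and the posterior of $X$ given $Y=y$ is Gaussian with variance $\frac{1}{1+\gamma}$, so with natural logarithms
\begin{align}
I(X;Y)&=\frac{1}{2}\log(1+\gamma)\\
\frac{\text{d}I(X;Y)}{\text{d}\gamma}&=\frac{1}{2}\cdot\frac{1}{1+\gamma}\\
\text{MMSE}&=\frac{1}{1+\gamma}
\end{align}
which agrees with $\frac{\text{d}I(X;Y)}{\text{d}\gamma}=\frac{1}{2}\text{MMSE}$ for every $\gamma>0$.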

$Proof$: Define
\begin{align}
p_k(y;\gamma)=\int x^k p(y,x;\gamma)\text{d}x=\mathbb{E}_X\left\{X^kp(y|X;\gamma)\right\}
\end{align}
We then have the following two conclusions:

  1. \begin{align}
    \frac{\text{d} p_k(y;\gamma)}{\text{d}\gamma}
    &=\frac{1}{2\sqrt{\gamma} }yp_{k+1}(y;\gamma)-\frac{1}{2}p_{k+2}(y;\gamma)\\
    &=-\frac{1}{2\sqrt{\gamma} }\frac{\text{d} }{\text{d}y}p_{k+1}(y;\gamma)
    \end{align}
  2. \begin{align}
    \hat{x}_{\text{MMSE} }=\int xp(x|y;\gamma)\text{d}x=\frac{p_1(y;\gamma)}{p_0(y;\gamma)}
    \end{align}
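
Conclusion 1 can be checked directly from the Gaussian likelihood $p(y|x;\gamma)=\frac{1}{\sqrt{2\pi} }\exp \left[-\frac{(y-\sqrt{\gamma}x)^2}{2}\right]$. The following is a sketch that assumes differentiation and integration over $x$ may be interchanged. Differentiating the likelihood gives
\begin{align}
\frac{\partial p(y|x;\gamma)}{\partial \gamma}&=\frac{x}{2\sqrt{\gamma} }(y-\sqrt{\gamma}x)p(y|x;\gamma)\\
\frac{\partial p(y|x;\gamma)}{\partial y}&=-(y-\sqrt{\gamma}x)p(y|x;\gamma)
\end{align}
Multiplying both lines by $x^kp(x)$ and integrating over $x$ yields
\begin{align}
\frac{\text{d} p_k(y;\gamma)}{\text{d}\gamma}&=\frac{1}{2\sqrt{\gamma} }yp_{k+1}(y;\gamma)-\frac{1}{2}p_{k+2}(y;\gamma)\\
\frac{\text{d} p_k(y;\gamma)}{\text{d}y}&=-yp_k(y;\gamma)+\sqrt{\gamma}p_{k+1}(y;\gamma)
\end{align}
Replacing $k$ by $k+1$ in the second line and multiplying by $-\frac{1}{2\sqrt{\gamma} }$ reproduces the right-hand side of the first line, which is the second equality in conclusion 1.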

Mutual information
\begin{align}
I(X;Y)
&=\int p(y,x;\gamma)\log \frac{p(y|x;\gamma)}{p(y;\gamma)}\text{d}x\text{d}y\\
&=\underbrace{\int p(y,x;\gamma)\log p(y|x;\gamma)\text{d}x\text{d}y}_{\xi}-\underbrace{\int p(y,x;\gamma)\log p(y;\gamma)\text{d}x\text{d}y}_{\zeta}
\end{align}
To this end, we compute $\xi$ and $\zeta$ separately. For $\xi$,
\begin{align}
\xi
&=\int p(y|x;\gamma)p(x)\log p(y|x;\gamma)\text{d}x\text{d}y\\
&\overset{(a)}{=}-\frac{1}{2}\int p(y|x;\gamma)p(x)\log (2\pi) \text{d}x\text{d}y-\frac{1}{2}\int (y-\sqrt{\gamma}x)^2p(y|x;\gamma)p(x)\text{d}x\text{d}y\\
&=-\frac{1}{2}\log (2\pi e)
\end{align}
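where $(a)$ uses the Gaussian likelihood $p(y|x;\gamma)=\frac{1}{\sqrt{2\pi} }\exp \left[-\frac{(y-\sqrt{\gamma}x)^2}{2}\right]$, and the last equality follows because $\int (y-\sqrt{\gamma}x)^2p(y|x;\gamma)\text{d}y=1$ for every $x$. Similarly, for $\zeta$,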
\begin{align}
\zeta
&=\int p(y,x;\gamma)\log p(y;\gamma)\text{d}x\text{d}y\\
&=\int_y\int_x p(y,x;\gamma)\text{d}x\log p(y;\gamma)\text{d}y\\
&=\int_y p(y;\gamma)\log p(y;\gamma)\text{d}y
\end{align}
Since $\xi$ does not depend on $\gamma$, differentiating $I(X;Y)=\xi-\zeta$ with respect to $\gamma$ and using $p(y;\gamma)=p_0(y;\gamma)$ yields
\begin{align}
\frac{\text{d}I(X;Y)}{\text{d}\gamma}
&=-\frac{\text{d} }{\text{d}\gamma}\int p_0(y;\gamma)\log p_0(y;\gamma)\text{d}y\\
&=-\int \left[\log p_0(y;\gamma)+1\right]\frac{\text{d}p_0(y;\gamma)}{\text{d}\gamma}\text{d}y\\
&=\frac{1}{2\sqrt{\gamma} }\int \log p_0(y;\gamma)\frac{\text{d}p_1(y;\gamma)}{\text{d}y}\text{d}y+\frac{1}{2\sqrt{\gamma} }\underbrace{\int \frac{\text{d}p_1(y;\gamma)}{\text{d}y}\text{d}y}_{\kappa}\\
&\overset{(a)}{=}\frac{1}{2\sqrt{\gamma} }\int \log p_0(y;\gamma)\frac{\text{d}p_1(y;\gamma)}{\text{d}y}\text{d}y\\
&\overset{(b)}{=}-\frac{1}{2\sqrt{\gamma} }\int \frac{p_1(y;\gamma)}{p_0(y;\gamma)}\frac{\text{d}p_0(y;\gamma)}{\text{d}y}\text{d}y\\
&\overset{(c)}{=}\frac{1}{2\sqrt{\gamma} }\int \frac{p_1(y;\gamma)}{p_0(y;\gamma)}\left[y-\sqrt{\gamma}\frac{p_1(y;\gamma)}{p_0(y;\gamma)}\right]p_0(y;\gamma)\text{d}y
\end{align}
where the third equality uses conclusion 1 with $k=0$, and $(a)$ holds because $p_1(y;\gamma)$ vanishes as $y\to\pm\infty$, so that
\begin{align}
\kappa=\left.p_1(y;\gamma)\right|_{y=-\infty}^{y=+\infty}=0
\end{align}
$(b)$ follows from integration by parts, with the boundary term assumed to vanish,
\begin{align}
&\int \log p_0(y;\gamma)\frac{\text{d}p_1(y;\gamma)}{\text{d}y}\text{d}y\\
=&\left.{p_1(y;\gamma)\log p_0(y;\gamma)}\right|_{y=-\infty}^{y=+\infty}-\int \frac{p_1(y;\gamma)}{p_0(y;\gamma)}\frac{\text{d}p_0(y;\gamma)}{\text{d}y}\text{d}y\\
=&-\int \frac{p_1(y;\gamma)}{p_0(y;\gamma)}\frac{\text{d}p_0(y;\gamma)}{\text{d}y}\text{d}y
\end{align}
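and $(c)$ holds because $\frac{\text{d}p_0(y;\gamma)}{\text{d}y}=-yp_0(y;\gamma)+\sqrt{\gamma}p_1(y;\gamma)$, which is obtained by differentiating the Gaussian likelihood with respect to $y$ and is the same identity that links the two expressions in conclusion 1.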

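Substituting conclusion 2, i.e. $\frac{p_1(y;\gamma)}{p_0(y;\gamma)}=\int xp(x|y;\gamma)\text{d}x$, together with $p_0(y;\gamma)=p(y;\gamma)$, we have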
\begin{align}
\frac{\text{d}I(X;Y)}{\text{d}\gamma}
&=\frac{1}{2\sqrt{\gamma} }\int \left(\int xp(x|y;\gamma)\text{d}x\right)\left[y-\sqrt{\gamma}\left(\int xp(x|y;\gamma)\text{d}x\right)\right]p(y;\gamma)\text{d}y\\
&=\frac{1}{2\sqrt{\gamma} }\int_{x,y} xyp(x,y;\gamma)\text{d}x\text{d}y-\frac{1}{2}\int_y\left(\int xp(x|y;\gamma)\text{d}x\right)^2p(y;\gamma)\text{d}y\\
&=\frac{1}{2}\int x^2p(x,y;\gamma)\text{d}x\text{d}y-\frac{1}{2}\int \hat{x}^2p(x,y;\gamma)\text{d}x\text{d}y\\
&=\frac{1}{2}\mathbb{E}\left\{X^2-\hat{X}^2\right\}\\
&\overset{(d)}{=}\frac{1}{2}\text{MMSE}
\end{align}
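where the expectation is taken over $p(x,y;\gamma)$, and the third equality uses $\int xyp(x,y;\gamma)\text{d}x\text{d}y=\sqrt{\gamma}\int x^2p(x)\text{d}x$, which holds because $\mathbb{E}\left\{Y|X=x\right\}=\sqrt{\gamma}x$. In addition, $(d)$ holds by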
\begin{align}
&\int (x-\hat{x})^2p(x,y;\gamma)\text{d}x\text{d}y\\
=&\int x^2p(x,y;\gamma)\text{d}x\text{d}y+\int \hat{x}^2 p(x,y;\gamma)\text{d}x\text{d}y-2\int x\hat{x}p(x,y;\gamma)\text{d}x\text{d}y\\
=&\int x^2p(x)\text{d}x+ \int \hat{x}^2p(y;\gamma)\text{d}y-2\int \hat{x} \int x p(x|y;\gamma)\text{d}x p(y;\gamma)\text{d}y\\
=&\int x^2p(x)\text{d}x+\int \hat{x}^2p(y;\gamma)\text{d}y-2\int \hat{x}^2p(y;\gamma)\text{d}y\\
=&\int x^2p(x)\text{d}x-\int \hat{x}^2p(y;\gamma)\text{d}y\\
=&\int (x^2-\hat{x}^2)p(x,y;\gamma)\text{d}x\text{d}y
\end{align}
Note that $\hat{x}=\int xp(x|y;\gamma)\text{d}x$ is a function of $y$ only.
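
To illustrate step $(d)$ with concrete numbers, assume a standard Gaussian input $X\sim\mathcal{N}(0,1)$ (an assumption made only for this example). The conditional mean is then linear in $y$, and $\mathbb{E}\left\{Y^2\right\}=1+\gamma$, so
\begin{align}
\hat{x}&=\frac{\sqrt{\gamma} }{1+\gamma}y\\
\mathbb{E}\left\{\hat{X}^2\right\}&=\frac{\gamma}{(1+\gamma)^2}\mathbb{E}\left\{Y^2\right\}=\frac{\gamma}{1+\gamma}\\
\mathbb{E}\left\{X^2-\hat{X}^2\right\}&=1-\frac{\gamma}{1+\gamma}=\frac{1}{1+\gamma}
\end{align}
The last line equals the posterior variance of $X$ given $Y$, i.e. the MMSE for this input, as $(d)$ predicts.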

References

[1] Guo D. Gaussian channels: Information, estimation and multiuser detection[D]. Princeton University, 2004.
[2] Guo D, Shamai S, Verdú S. Mutual information and minimum mean-square error in Gaussian channels[J]. IEEE Transactions on Information Theory, 2005, 51(4): 1261-1282.