Shannon's Theorem; block codes

Posted on 2024-02-22

Overview

So far, we have chosen to use the term code for a vector subspace \(C\) of \(\mathbb{F}_q^n\). The idea is that we are interested in encoding some data by identifying it with vectors in \(\mathbb{F}_q^k\).

If \(G\) is a generator matrix for our code in standard form, then we encode our data: given a vector \(v \in \mathbb{F}_q^k\), the encoded version is \[v \cdot G \in \mathbb{F}_q^n.\]

Note that – since \(G\) is in standard form – if \(v = (v_1,\cdots,v_k)\) then \[v\cdot G = (v_1,\cdots,v_k,w_{k+1},\cdots,w_n)\] for some scalars \(w_j \in \mathbb{F}_q.\)
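To make this concrete, here is a minimal Python sketch of the encoding. The generator matrix G below is a made-up example in standard form over \(\mathbb{F}_2\) (with \(k = 2\), \(n = 5\)), not one drawn from the text.

    # Encode v in F_q^k as the codeword v * G in F_q^n, working mod q.
    # G is a made-up generator matrix in standard form over F_2:
    # identity block in the first k = 2 columns, length n = 5.
    q = 2
    G = [
        [1, 0, 1, 1, 0],
        [0, 1, 0, 1, 1],
    ]

    def encode(v, G, q):
        """Return the codeword v * G over F_q."""
        n = len(G[0])
        k = len(G)
        return tuple(sum(v[i] * G[i][j] for i in range(k)) % q
                     for j in range(n))

    # Since G is in standard form, the first k coordinates reproduce v:
    print(encode((1, 1), G, q))  # (1, 1, 1, 0, 1)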

Our intent is to “transmit” this encoded data \(v \cdot G\), possibly through a noisy channel that may introduce transmission errors. At the other end, some vector \(w\) in \(\mathbb{F}_q^n\) is received, and the hope is to recover the vector \(v\) from the received vector \(w\).

The field of Information Theory formulates a way to reason about such systems; we are going to sketch a rudimentary description.

Noisy Channels and Shannon’s Theorem

We consider finite sets \(S\) and \(T\) (say \(S = \{s_1,\cdots,s_n\}\) and \(T = \{t_1,\cdots,t_m\}\)).

We consider random variables \(X\) on \(S\) and \(Y\) on \(T\). In particular, we may consider probabilities:

\[P(X=s) = p_s, \quad \text{the probability that the random var $X$ takes value $s \in S$}\]

\[P(Y=t) = q_t, \quad \text{the probability that the random var $Y$ takes value $t \in T$}\]

We view the elements of the set \(S\) as the data we send through some transmission channel, and \(T\) as the data we receive.

In the case of our linear codes \(C \subset \mathbb{F}_q^n\) described above, \(S\) would be the code \(C\) (whose codewords correspond to the data vectors in \(\mathbb{F}_q^k\) via the encoding \(v \mapsto v \cdot G\)) and \(T\) would be \(\mathbb{F}_q^n\).

Block codes

More generally, we consider block codes \(C \subset A^n\). Here \(A\) is an alphabet, and the code words in \(C\) are just \(n\)-tuples of elements from \(A\). We write \(q = |A|\). We say that the length of the code \(C\) is \(n\).

In this setting, for our transmission channel we take \(S = C\) and \(T = A^n\).

Channel

A channel \(\Gamma\) for transmission amounts to a matrix \((p_{st})\) with rows indexed by the set \(S\) and columns indexed by the set \(T\), where \[p_{st} = P( Y = t \mid X = s),\] i.e. the conditional probability that \(Y= t\) given that \(X = s\).

Example

An example is the binary symmetric channel, where \(S = T = \{0,1\}\) and the channel matrix \(\Gamma\) is given by \[(p_{st}) = \begin{bmatrix} \phi & 1 - \phi \\ 1 - \phi & \phi \end{bmatrix} \quad \text{for some $\phi \in [0,1]$}.\]

Here \(\phi\) represents the probability that \(0\) was received given that \(0\) was sent, and also the probability that \(1\) was received given that \(1\) was sent.

Note for example that if we know the channel matrix and if we know the probabilities for the random variable \(X\), we find the probabilities for \(Y\) via \[P(Y=t) = \sum_{s \in S} P(Y=t \mid X = s) \cdot P(X=s).\]

Example
For the binary symmetric channel, if we know that \(P(X=0) = p\), then \[P(Y=0) = \phi \cdot p + (1-\phi) (1-p).\]
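Here is a small numerical check of these formulas in Python, for the binary symmetric channel with made-up values \(\phi = 0.9\) and \(p = 0.3\):

    # Compute P(Y = t) = sum over s of P(Y = t | X = s) * P(X = s)
    # for the binary symmetric channel; phi and p are made-up values.
    phi = 0.9
    channel = {  # channel[s][t] = P(Y = t | X = s)
        0: {0: phi, 1: 1 - phi},
        1: {0: 1 - phi, 1: phi},
    }
    p = 0.3
    p_X = {0: p, 1: 1 - p}

    p_Y = {t: sum(channel[s][t] * p_X[s] for s in p_X) for t in (0, 1)}
    print(p_Y[0])                         # 0.34
    print(phi * p + (1 - phi) * (1 - p))  # same, by the formula above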

Decoding

With notation as above, a decoding is a function \(\Delta:T \to S\). Think of it this way: given that \(t \in T\) was received, the decoding function “guesses” that \(\Delta(t) \in S\) was sent.

The question is how to identify a good decoding function.

Well, we can consider the probabilities that reflect how often a decoding \(\Delta\) is correct:

\[q_{\Delta(t),t} = P(X = \Delta(t) \mid Y = t)\] represents the probability that \(\Delta(t)\) was sent given that \(t\) was received.

The average probability of a correct decoding is given by \[P_{\operatorname{COR}} = \sum_{t \in T} q_t \cdot q_{\Delta(t),t}\] (recall that \(q_t = P(Y = t)\)).
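By Bayes' rule, \(q_t \cdot q_{\Delta(t),t} = P(Y=t) \cdot P(X = \Delta(t) \mid Y = t) = p_{\Delta(t),t} \cdot P(X = \Delta(t))\), so \(P_{\operatorname{COR}}\) can be computed from the channel matrix and the input distribution alone. A sketch, reusing the binary symmetric channel with the made-up values above:

    # Average probability of a correct decoding.  By Bayes' rule,
    #   q_t * q_{Delta(t), t} = P(X = Delta(t)) * p_{Delta(t), t},
    # so P_COR needs only the channel matrix, the input distribution,
    # and the decoding function Delta (all values made up).
    phi = 0.9
    channel = {0: {0: phi, 1: 1 - phi}, 1: {0: 1 - phi, 1: phi}}
    p_X = {0: 0.3, 1: 0.7}
    Delta = {0: 0, 1: 1}  # received symbol -> guessed sent symbol

    P_COR = sum(p_X[Delta[t]] * channel[Delta[t]][t] for t in Delta)
    print(P_COR)  # 0.3*0.9 + 0.7*0.9 = 0.9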

Maximum likelihood decoding

A decoding \(\Delta:T \to S\) is said to be a maximum likelihood decoding if \[p_{\Delta(t),t} \ge p_{s,t}\] for every \(s \in S\) and every \(t \in T\).
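Computationally, a maximum likelihood decoding just picks, for each received \(t\), a row index \(s\) maximizing the \(t\)-th column of the channel matrix. A sketch with a made-up \(2 \times 3\) channel matrix (ties broken arbitrarily):

    # Maximum likelihood decoding: for each received t, choose s
    # maximizing p_{s,t} = P(Y = t | X = s).  The channel matrix is a
    # made-up example with S = {0, 1} and T = {0, 1, 2}.
    channel = {
        0: {0: 0.8, 1: 0.1, 2: 0.1},
        1: {0: 0.2, 1: 0.6, 2: 0.2},
    }

    def ml_decode(t):
        """An s maximizing P(Y = t | X = s); ties broken arbitrarily."""
        return max(channel, key=lambda s: channel[s][t])

    print([ml_decode(t) for t in (0, 1, 2)])  # [0, 1, 1]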

Under some circumstances, one achieves maximum likelihood decoding through minimum distance decoding.

For example, we have:

Lemma

For the binary symmetric channel with \(\phi > 1/2\) and a code \(C \subset \mathbb{F}_2^n\), consider the assignment \[\Delta(v) = u\] where \(u\) is a nearest neighbor to \(v\) in \(C\) with respect to the Hamming distance.

Then \(\Delta\) is a maximum likelihood decoding.
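A minimal sketch of nearest neighbor decoding, for a made-up binary code \(C \subset \mathbb{F}_2^5\):

    # Nearest neighbor decoding in F_2^n; C is a made-up binary code.
    C = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 1)]

    def hamming(u, v):
        """Number of coordinates in which u and v differ."""
        return sum(1 for a, b in zip(u, v) if a != b)

    def nearest(w):
        """A codeword of C closest to w (ties broken arbitrarily)."""
        return min(C, key=lambda u: hamming(u, w))

    # A single flipped bit is corrected:
    print(nearest((1, 1, 1, 1, 1)))  # (1, 1, 1, 0, 1)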

Transmission rate and capacity

For a block code \(C \subset A^n\) with \(|A| = q\), the transmission rate of \(C\) is defined to be \[R = \dfrac{\log_q(|C|)}{n}\]

If \(A = \mathbb{F}_q\) and \(C\) is a linear code with \(k = \dim_{\mathbb{F}_q} C\), then \[R = k/n.\]
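For instance, one can compute rates numerically; the examples below are the binary repetition code of length 5 and a hypothetical binary linear code with \(k = 4\), \(n = 7\):

    import math

    # Transmission rate R = log_q(|C|) / n.
    def rate(size_C, n, q):
        return math.log(size_C, q) / n

    print(rate(2, 5, 2))       # 0.2: binary repetition code of length 5
    print(rate(2 ** 4, 7, 2))  # ~0.571: a binary linear code, k = 4, n = 7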

There is a notion of the capacity of a channel \(\Gamma\); it is a number which in some sense encodes the theoretical maximum rate at which information can be reliably transmitted over the channel; we omit the definition here.

Shannon’s Theorem

Theorem (Shannon)

Let \(\delta > 0\) be given, and let \(R\) be a positive real number less than the channel capacity.

For every sufficiently large natural number \(n\), there is a code of length \(n\) and transmission rate \(\ge R\) such that, using maximum likelihood decoding, the probability \(P_{\operatorname{COR}}\) of a correct decoding satisfies \[P_{\operatorname{COR}} > 1 - \delta.\]

This shows that, given a channel with non-zero capacity, there are codes which allow us to communicate over the channel with a probability of correct decoding arbitrarily close to 1.

The theorem is not constructive, of course; in some sense, this motivates the subject: how does one find codes that work well?


Bounds for block codes

Here we consider an alphabet \(A\) – thus \(A\) is just a finite set, of order \(|A| = q\) – and a set of codewords \(C \subset A^n\); we call \(n\) the length of the codewords.

We consider the Hamming distance on \(A^n\): for \(u,v \in A^n\), \[\operatorname{dist}(u,v) = \#\{i \mid u_i \ne v_i\}\] where e.g. \(u = (u_1,\cdots,u_n) \in A^n\). In words, the distance between \(u\) and \(v\) is the number of coordinates in which the tuples differ.

It is straightforward to check that \(\operatorname{dist}\) is a metric on the finite set \(A^n\); in particular, the triangle inequality holds: for every \(u,v,w \in A^n\) we have \[\operatorname{dist}(u,v) \le \operatorname{dist}(u,w) + \operatorname{dist}(w,v).\]

The minimal distance of \(C\) is given by \[d = \min \{ \operatorname{dist}(u,v) \mid u,v \in C, u \ne v\}.\]
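Both notions are straightforward to compute by brute force; the ternary code below is a made-up example:

    from itertools import combinations

    def hamming(u, v):
        """Hamming distance: number of coordinates where u and v differ."""
        return sum(1 for a, b in zip(u, v) if a != b)

    def minimal_distance(C):
        """Minimum of dist(u, v) over pairs of distinct codewords."""
        return min(hamming(u, v) for u, v in combinations(C, 2))

    # A made-up ternary code of length 4 over A = {0, 1, 2}:
    C = [(0, 0, 0, 0), (1, 1, 1, 0), (2, 2, 0, 2)]
    print(minimal_distance(C))  # 3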

Lemma
Using nearest neighbor decoding, a block code of minimal distance \(d\) can correct up to \((d-1)/2\) errors.
Proof

For every \(w \in A^n\) and every pair of distinct codewords \(u,v \in C\) we have \[(*) \quad d \le \operatorname{dist}(u,v) \le \operatorname{dist}(u,w) + \operatorname{dist}(w,v).\]

Now, if \(\operatorname{dist}(u,w) \le (d-1)/2\) and \(\operatorname{dist}(w,v) \le (d-1)/2\) then \(\operatorname{dist}(u,v) \le d-1\), contrary to \((*)\). Thus for any \(w\) there is at most one codeword \(u \in C\) for which \(\operatorname{dist}(u,w) \le (d-1)/2\).

From the point of view of code transmission, if \(w \in A^n\) is received and no more than \((d-1)/2\) of the components of \(w = (w_1,w_2,\cdots,w_n)\) are erroneous, then nearest neighbor decoding will find the codeword in \(C\) that was transmitted.

Example (Repetition code)

Consider a finite alphabet \(A\) and the codewords \(C=\{(a,a,\cdots,a) \in A^r \mid a \in A\}\). Thus the data \(a \in A\) is encoded by the redundant codeword \((a,a,\cdots,a)\).

The minimal distance between distinct codewords in \(C\) is \(r\), so the Lemma shows that using nearest neighbor decoding, this code can correct up to \((r-1)/2\) errors.

(Note that in this case nearest neighbor decoding amounts to decoding \[(a_1,a_2,\cdots,a_r)\] by “majority vote”; in other words, view \(\{a_1,a_2,\cdots,a_r\}\) as a multi-set and choose an element with maximal multiplicity.)
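A sketch of majority-vote decoding, with a made-up alphabet and \(r = 5\):

    from collections import Counter

    # Majority-vote decoding for the repetition code, r = 5.
    def majority_decode(word):
        """A symbol of maximal multiplicity in the received word."""
        return Counter(word).most_common(1)[0][0]

    # Up to (5 - 1)/2 = 2 errors are corrected:
    received = ('b', 'a', 'a', 'c', 'a')  # 'a' was sent, two errors
    print(majority_decode(received))      # 'a'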

Counting codes with given parameters

Write \(A_q(n,d)\) for the maximum size \(|C|\) of a block code \(C \subset A^n\) having minimal distance \(d\); here, as before, \(A\) is an alphabet of size \(q\).

We are going to prove some results about the quantity \(A_q(n,d)\).

We first compute the size of a “ball” in the metric space \((A^n,\operatorname{dist})\).

Lemma

For \(u \in A^n\) and a natural number \(m\) write: \[B_m(u) = \{v \in A^n \mid \operatorname{dist}(u,v) \le m\}.\]

Let \[\delta(m) = 1 + \dbinom{n}{1}(q-1) + \dbinom{n}{2}(q-1)^2 + \cdots + \dbinom{n}{m}(q-1)^m = \sum_{j=0}^m \dbinom{n}{j}(q-1)^j.\] Then \(|B_m(u)| = \delta(m)\).

Remark

Note that if \(k \in \mathbb{N}\), \(k > n\) then we insist that \(\dbinom{n}{k} = 0\); in this case “there are 0 ways of choosing precisely \(k\) elements from a set of size \(n\).”

This is consistent e.g. with the formula \[\dbinom{n}{k} = \dfrac{n(n-1)\cdots(n-k+1)}{k!}\] since the factor \((n-n) = 0\) appears in the numerator.

Proof of Lemma
For each \(j = 0,1,\cdots,m\) there are \(\dbinom{n}{j} \cdot (q-1)^j\) elements of \(A^n\) at distance precisely \(j\) from \(u\): choose the \(j\) coordinates in which to differ from \(u\), then choose one of the \(q-1\) other symbols for each of those coordinates. Summing over \(j\) gives \(\delta(m)\).
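The formula for \(\delta(m)\) is easy to check by brute force for small parameters. (Note that Python's math.comb(n, j) returns 0 when j > n, matching the remark above.)

    import math
    from itertools import product

    def delta(n, q, m):
        """Size of a ball of radius m in A^n with |A| = q."""
        return sum(math.comb(n, j) * (q - 1) ** j for j in range(m + 1))

    # Brute-force check for small made-up parameters:
    n, q, m = 5, 2, 2
    u = (0,) * n
    ball = [v for v in product(range(q), repeat=n)
            if sum(a != b for a, b in zip(u, v)) <= m]
    print(len(ball), delta(n, q, m))  # 16 16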

We may now state and prove the following:

Theorem (Gilbert-Varshamov Bound)
\[A_q(n,d) \cdot \delta(d-1) \ge q^n.\]
Proof

Suppose that \(C \subset A^n\) is a code with minimal distance \(d\) for which \(|C| = A_q(n,d)\).

Notice that each ball \(B_{d-1}(u)\) has \(\delta(d-1)\) elements, so \(|C| \cdot \delta(d-1)\) is an upper bound for the size of the union \[\bigcup_{u \in C} B_{d-1}(u)\] (the balls need not be disjoint, but an upper bound is all we need).

Thus, if \[|C|\cdot \delta(d-1) < q^n = |A^n|\] then \[\bigcup_{u \in C} B_{d-1}(u) \subsetneq A^n;\] thus, there is some element \(v \in A^n\) for which \[v \not \in \bigcup_{u \in C} B_{d-1}(u).\] We then have \(\operatorname{dist}(u,v) \ge d\) for every \(u \in C\).

This shows that \(C \cup \{v\} \subset A^n\) is a code having minimal distance \(d\), contradicting the assumption that \(|C| = A_q(n,d)\); i.e. contradicting the assumption that \(C\) has maximal size among codes \(C' \subset A^n\) with minimal distance \(d\).