The Notebook

**H(X) = -\sum_{i}p_{i}\log_{b}p_{i}**

**Data Representation and Compression**

Below, I provide an ASCII (American Standard Code for Information Interchange) encoding of "hello". In ASCII, each letter, as well as other control characters such as the newline, is mapped to a numeric value.
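As a minimal sketch, the 7-bit ASCII encoding of "hello" can be produced directly from each character's code point:

```python
# Encode "hello" as 7-bit ASCII: each character's code point, zero-padded to 7 bits.
text = "hello"
codes = [format(ord(ch), "07b") for ch in text]
for ch, bits in zip(text, codes):
    print(ch, bits)
print("total bits:", 7 * len(text))  # 35 bits for 5 characters
```

For example, "h" has code point 104, which is `1101000` in binary.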

So, there are 26 letters in the English alphabet, and we could say that $5$ bits would be enough to encode all of them, since:

**2^{5} = 32 > 26**

Nevertheless, ASCII defines 128 different values (see Reference 1), which requires at least 7 bits per value. The rest of the document tries to answer the following question:
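Both bit counts above follow from the same rule: $k$ distinct values need $\lceil \log_2 k \rceil$ bits. A quick check:

```python
from math import ceil, log2

# Minimum bits needed to distinguish k values: ceil(log2(k)).
print(ceil(log2(26)))   # 5 bits suffice for the 26-letter alphabet
print(ceil(log2(128)))  # 7 bits are needed for all 128 ASCII values
```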

"What would be the minimum number of bits that we have to use if we knew the distribution of letters for a particular text that we have?"

So, as can be seen in the Figure above, and as we all know, some letters are used far more often than others; the letter "e" is the most common in English. So, would it be better to represent "e" with fewer bits than the others, if the letter distribution of our text were exactly the same as the distribution above? If so, how much shorter could its codeword be?

**Uniquely Decodable Codes**

Here, a code is

A uniquely decodable code is called a

The code below is decodable but it is not a prefix code because the codeword for "a" is a prefix for the codeword for "b". This means that we can not instantaneously decode "a" without waiting for the next bit of data (to determine whether it is actually "a" or just the first half of "b".)

Alternatively, the code below is a prefix code, since no codeword is a prefix of another codeword.

**Kraft's Inequality**

**\sum_{i=1}^{r} s^{-n_{i}} \leq 1**

In our previous example code, the alphabet is simply the binary digits

**2^{-1} + 2^{-3} + 2^{-4} + 2^{-4} \leq 1**

As it could be observed in the following Code Tree, there are

So, a codeword at level

Besides, descendant sets of codewords must be disjoint to satisfy prefix-free requirement. As an example, "1" and "10" has joint descendants, which are "101" and "100", hence a prefix-free codeword can not contain both "1" and "10".

So, we know that the total number of forbidden codewords at step

**\sum_{i=1}^{j-1} s^{n_j - n_i}**

As an example, if our level

**\sum_{i=1}^{j-1} s^{n_j - n_i} = 2^{(3-1)} + 2^{(3-2)} = 6**

For each level

**\sum_{i=1}^r s^{n_r - n_i} \leq s^{n_r} \\
\Rightarrow \sum_{i=1}^{r} s^{-n_{i}} \leq 1**

**Shannon Entropy**

**\sum_{i=1}^r p_{i}n_{i}**

**\sum_{i=1}^r p_{i}\log_{s}\frac{1}{p_{i}}**

So, here we calculate the difference between the Shannon entropy and the expected length of the code is:

**\sum_{i=1}^r p_{i}\log_{s}\frac{1}{p_{i}} - \sum_{i=1}^r p_{i}n_{i} = \sum_{i=1}^r p_{i}(\log_{s}\frac{1}{p_{i}} - \log_{s}s^{n_{i}})**

**=\sum_{i=1}^r p_{i}(\log_{s}\frac{1}{p_{i}} + \log_{s}s^{-n_{i}})**

**=\sum_{i=1}^r p_{i}\log_{s} \frac{s^{-n_{i}}}{p_{i}}**

**\leq \log_{s}(\sum_{i=1}^r p_{i} \frac{s^{-n_{i}}}{p_{i}})**

**= \log_{s} (\sum_{i=1}^r s^{-n_{i}})**

**\leq \log_{s} 1 = 0**

- http://ee.hawaii.edu/~tep/EE160/Book/chap4/subsection2.1.1.1.html
- https://mortada.net/simple-proof-for-krafts-inequality.html
- https://www.ics.uci.edu/~dan/pubs/DC-Sec1.html
- https://www2.isye.gatech.edu/~yxie77/ece587/Lecture7.pdf