E2 201 Information Theory (Aug-Dec 2016)

Instructor: Himanshu Tyagi
Teaching Assistants: G D Raghava and K R Sahasranand

Latest Announcements

  • Final exam on November 26 from 4:00 pm to 6:00 pm in ECE 1.08.
  • HW6 solutions and solutions for midterm 2 posted.
  • HW7 posted. The folder contains a detailed instruction file, the template code, and a reference paper.
  • HW6 posted. Quiz based on this HW will be held on Friday, November 18, 2016 at 5:00pm in ECE 108.
  • HW5 posted. Quiz based on this HW will be held on Friday, October 28, 2016 at 5:00pm in ECE 108.
  • Midterm-2 to be held on Friday, November 4, 2016 at 5:00pm in ECE 108.
  • Extra class on Friday, October 21, 2016 at 5:00pm in ECE 108.
  • Review session on HW4 on Friday, October 14, 2016 at 5:00pm in ECE 108.
  • HW4 posted. Quiz based on this HW will be held on Tuesday, October 18, 2016 at 4:00pm in ECE 108.

Time & Venue

  • Class: Tu-Th 4-5:30pm in ECE 1.08
  • Review session: Typically on Fr 5-6pm in ECE 1.08
  • Instructor office hours: By appointment


This is a roughly 14 weeks course. It will be divided into the following three parts:
  1. Basics: Introduction to Information theory; a simple data compression problem; transmission of two messages over a noisy channel (hypothesis testing); measures of information and their properties.
  2. Source and Channel coding: Data compression; transmission over noisy channels
  3. Special topics (time permitting): Differential entropy; Rate-distortion theory;
Lecture notes will be uploaded before or within a week of the class. However, these are simply a sketch of what was/will be discussed in the class, and I suggest that the students maintain their own class notes.

    Part 1: Basics

  • Lecture 1: Introduction to Information Theory, course logistics; introduction to data compression; review of probability [pdf]
  • Lecture 2: Review of probability, a source coding theorem [pdf]
  • Lecture 3: A source coding theorem (contd.); Memoryless sources and Shannon's lossless source coding theorem [pdf]
  • Lecture 4: Proof of Shannon's lossless source coding theorem; introduction to transmission [pdf]
  • Lecture 5: Introduction to binary hypothesis testing; bounds for error in binary hypothesis testing [pdf]
  • Lecture 6: Bounds for error in binary hypothesis testing (contd.); Stein's lemma; example application: Distinguishing two coins [pdf]
  • Lecture 7: M-ary hypothesis testing and the MAP rule; Union bound for reducing M-ary to binary; Introduction to the channel coding problem [pdf]
  • Lecture 8: Heuristic derivation of Shannon’s channel coding theorem and introduction of mutual information [pdf]
  • Lecture 9: Properties of Information Measures [pdf]
  • Lecture 10: Properties of Information Measures (contd.): Shapes of information measures [pdf]
  • Lecture 11: Properties of Information Measures (contd.): Data processing inequalities; Fano's inequality; Continuity of entropy [pdf]
  • Lecture 12: Properties of Information Measures (contd.): Variational forms of information measures and corollaries [pdf]
  • Part 2a: Data Compression

  • Lecture 13: Introduction to data compression; fixed- and variable-length codes; types of error-free codes: Nonsingular, Uniquely-decodable and Prefix-free [pdf]
  • Lecture 14: Kraft's inequality; Shannon codes; minimum average length for prefix-free and uniquely decodable codes [pdf]
  • Lecture 15: Bounds for length of nonsingular codes; Shannon-Fano code [pdf]
  • Lecture 16: Shannon-Fano-Elias code; Huffman code; codes with error allowed [pdf]
  • Lecture 17: Lower bounds for average length of codes with error allowed; coding a sequence of symbols; Arithmetic code; Introduction to universal codes [pdf]
  • Lecture 18: Fixed-length universally rate optimal codes; introduction to types; duality between probability assignment and universal coding [pdf]
  • Lecture 19: Variable-length universally code using types; Universal arithmetic codes [pdf]
  • Part 2b: Transmission over Noisy Channels

  • Lecture 20: Channel codes and capacity; Bounds for capacity of a binary symmetric channel (sphere packing bound and the GV bound) [pdf]
  • Lecture 21: Feinstein's maximal code construction for a binary symmetric channel [pdf]
  • Lecture 22 (two classes): Extending maximal code construction to a general DMC; method of types [pdf]
  • Lecture 23: Maximal code construction for a general DMC; sphere packing bound for a general DMC; Shannon's channel coding theorem [pdf]
  • Lecture 24: Strong converse for average probability of error; examples; Fano's inequality based week converse; feedback does not improve capacity; source-channel separation [pdf]
  • Lecture 25: Random coding proof of achievability based on typical sets [pdf]
  • Lecture 26: Random coding proof of achievability based on information density [pdf]
  • Lecture 27: Additive Gaussian noise channel; capacity of AGN; proof sketch of the converse via sphere packing [pdf]
  • Lecture 28: Random coding achievability proof for AGN [pdf]
  • Summary of what we have covered in channel coding [pdf]

    Part 3: Special topics

  • Lecture 29: Differential entropy; volume of large probability sets; weak converse for AGN capacity
  • Lecture 30: Lossy compression and rate distortion theorem


Undergraduate level probability and appetite for formal mathematical reasoning.


Homework:   20%   A quiz will be organised based on each homework. Lowest grade will be dropped.
Mid-term 1:   20%   September 30
Mid-term 2:   20%   November 4
Final exam:   40%   November 26 at 4:00pm


While there are many very good textbooks available for Information Theory, there is no required textbook for this course. Here I list a subset from which I will draw my lecture notes:
  1. T. Cover and J. Thomas, Elements of Information Theory, Second edition, Wiley, 2006.
  2. I. Csiszàr and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Second edition, Cambridge, 2011.
  3. J. Wolfowitz, Coding Theorems of Information Theory, Probability Theory and Stochastic Processes series, Springer, 1978.
  4. A. Khinchin, Mathematical foundations of information theory, Dover, 2001 edition.
  5. A. Feinstein, Foundations of Information Theory, McGraw-Hill, 1958.
  6. T. S. Han, Information spectrum methods in Information Theory, Stochastic Modeling and Applied Probability series, Springer, 2003.
  7. R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1969.

Course description

This is a graduate level introductory course in Information Theory and will be a prerequisite for many advanced courses. Information Theory is the science for measuring, preserving, transmitting, and estimating information in random data. It was initially proposed by Shannon as a mathematical theory of communication more than five decades ago. It provides the fundamental limits of performance for transmission of messages generated by a random source over a noisy communication channel. On the one hand, Information Theory has been the driving force behind the revolution in digital communication and has led to various practical data compression and error correcting codes that meet the fundamental theoretical limits of performance. On the other hand, over the years, techniques and concepts from Information Theory have found applications well beyond communication theory. In this course, we will introduce the basic notions and results of Information Theory, keeping in mind both its fundamental role in communication theory and its varied applications beyond communication theory. This course, and the follow-up advanced courses to be offered in the future, will be of interest to students from various backgrounds.

Broad list of topics

  • Compression of random data: fixed and variable length source coding theorems and entropy
  • Hypothesis testing: Stein's lemma and Kullback-Leibler divergence
  • Measures of randomness: Properties of Shannon entropy, Kullback-Leibler divergence, and Mutual Information
  • Transmission over a noisy channel: channel coding theorems, joint source-channel coding theorem, the Gaussian channel
  • Lower bounds on minimax cost in parameter estimation using Fano's inequality
  • Quantisation and lossy compression: rate-distortion theorem

Who should take this course

  • Any communication engineer!
  • Computer scientists interested in data compression
  • Computer scientists interested in communication complexity
  • Cryptographers interested in information theoretic security and quantum cryptography
  • Data scientists interested in information theoretic methods and measures of information
  • Statisticians interested in information theoretic lower bounds
  • Physicists interested Quantum Information Theory