E2 209 Topics in Information Theory and Statistical Learning
(Jan-Apr 2018)

Instructor: Himanshu Tyagi

Latest Announcements

  • Homework 2 posted. Due on April 23, 2018.
  • Course project presentations on April 19 and 20 from 1:00pm to 5:00pm. Coffee/chai/samosa on the house!
  • No classes on April 9, 12, and 16. Makeup classes on April 17 and 18, 2:00pm to 5:00pm.
  • The list of course projects is available here.
  • Homework 1 posted. Due on March 5, 2018.
  • First class on Wednesday, January 3 at 2:00pm in ECE 1.07

Time & Venue

  • Class: Monday and Wednesday 2pm-3:30pm in ECE 1.07

Homework

Agenda

I am providing my handwritten lecture notes below. These are very raw and must be treated the same way as raw fish.

Part 1: Estimation and Testing

  • Lecture 1: Course outline; examples from distribution testing; decision theoretic framework; total variation distance. [pdf]
  • Lecture 2: Kullback-Leibler divergence and its properties; Fano's inequality. [pdf]
  • Lecture 3: Estimating the bias of a coin; the Fano method for proving lower bounds on the minimax risk. [pdf]
  • Lecture 4: Concentration inequalities; the median trick (see the sketch after this list); estimating the Gaussian mean. [pdf]
  • Lecture 5: Lower bound for Gaussian mean estimation. [pdf]
  • Lecture 6: Scheffe selector. [pdf]
  • Lecture 7: Gaussian mean estimation using Scheffe selector. [pdf]
  • Lecture 8: Learning Gaussian mixtures using Scheffe: 1-dimensional case. [pdf]
  • Lecture 9: Approximating the k-dimensional space containing means. [pdf]
  • Lecture 10: Learning Gaussian mixtures using Scheffe: high-dimensional case. [pdf]
  • Lecture 11: Introduction to distribution testing; uniformity testing upper bound. [pdf]
  • Lecture 12: Uniformity testing lower bounds. [pdf]
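
As a quick illustration of the median trick from Lecture 4, here is a minimal median-of-means sketch in Python; the function name, the number of groups k, and the heavy-tailed test data are illustrative choices, not taken from the notes.

    import numpy as np

    def median_of_means(samples, k):
        # Median-of-means: split the data into k groups, average each group,
        # and return the median of the group means. The median step boosts the
        # confidence of the plain sample mean: under mild moment assumptions,
        # the failure probability decays exponentially in k.
        groups = np.array_split(np.asarray(samples, dtype=float), k)
        return np.median([g.mean() for g in groups])

    # Toy usage: heavy-tailed data (Student-t with 3 degrees of freedom, true mean 0).
    rng = np.random.default_rng(0)
    data = rng.standard_t(df=3, size=10_000)
    print(median_of_means(data, k=20))
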
Part 2: Compression, Estimation, and Model Selection

  • Lecture 13: Compression and probability assignment; Bayesian approach. [pdf]
  • Lecture 14: Dirichlet prior for probability assignment; Shannon lower bound. [pdf]
  • Lecture 15: Universal portfolio, Cover's algorithm, and probability assignment. [pdf]
  • Lecture 16: Universal portfolio and prediction. [pdf]
  • Lectures 17-20: Multiplicative weight update for prediction and multi-armed bandit problems; lower bounds; universal portfolio using multiplicative weight update (see the sketch after this list). [17-20.pdf]
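
To make the multiplicative weight update concrete, here is a minimal sketch of the update rule for prediction with expert advice; losses are assumed to lie in [0, 1], and the function name, learning rate eta, and toy loss data are illustrative choices rather than anything specific from the notes.

    import numpy as np

    def multiplicative_weights(loss_matrix, eta=0.5):
        # loss_matrix[t, i] is the loss of expert i in round t, assumed in [0, 1].
        # Maintain one weight per expert, play the normalized weights each round,
        # and exponentially down-weight each expert in proportion to its loss.
        T, n = loss_matrix.shape
        weights = np.ones(n)
        plays = []
        for t in range(T):
            plays.append(weights / weights.sum())
            weights = weights * np.exp(-eta * loss_matrix[t])
        return np.array(plays)

    # Toy usage: 3 experts over 100 rounds, expert 0 suffers the smallest losses.
    rng = np.random.default_rng(1)
    losses = rng.uniform(size=(100, 3))
    losses[:, 0] *= 0.2
    print(multiplicative_weights(losses)[-1])  # most weight ends up on expert 0
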
Course description

Information theory is an indispensable tool for a statistician (data scientist?). It allows one to quantify the information a random variable reveals about another, bit by bit. This is in contrast to the classical probability view, where one can quantify knowing or not knowing a random variable, but not having just a bit of information about it. The incremental notion of information and uncertainty provided by information theory enables the proof of lower bounds for statistical problems, the design of penalty functions for model selection, and the design of information-theoretic metrics for feature selection. Furthermore, the measures of information provided by information theory give rise to a geometry on the probability simplex, which in turn provides a heuristic interpretation of several popular optimization procedures used in machine learning.

In this topics course, we will present several use cases for information theory in statistics and machine learning, covering recent developments along with classic results. Our plan is to cover the following topics (but, as a wise person once said, prediction is difficult, especially about the future).
1. Part 1: Information-theoretically optimal distribution testing and learning: minimax and Bayesian formulations; information-theoretic lower bounds; case studies such as uniformity testing, learning Gaussian mixtures, etc.
2. Part 2: Probability estimation and compression: optimal-redundancy compression; estimating discrete probabilities; universal portfolio and online learning; context tree weighting; model selection: BIC and MDL criteria.
3. Part 3: Information geometry: a geometric view of parametric families; an information-geometric view of popular optimization procedures (case studies: alternating minimization, belief propagation).

Resources: We will be drawing from several recent papers and books. The specific source will be given along with the lecture notes.

Prerequisites

A graduate-level course in information theory, probability theory, or statistics.

Grading

  • Homework (2-4): 60%
  • Literature review (to be done in groups of size 2): 30%
  • Class participation: 10%