Information Theory and Applications
This course introduces the foundations and applications of information theory from a principled, probabilistic perspective. Its aim is to clarify what quantities such as entropy, relative entropy, and mutual information actually measure, and why they arise naturally in problems of uncertainty and inference and in the study of physical systems. The course emphasises conceptual understanding and mathematical structure, while illustrating how information-theoretic ideas connect to neuroscience, physics, statistics, and machine learning. The goal is not only to present the formalism but also to develop a clear, operational understanding of how information-theoretic tools can be meaningfully applied across disciplines.
- To demonstrate a solid understanding of the fundamental concepts of information theory
- To develop the ability to construct rigorous mathematical arguments and proofs within the framework of probability theory and information theory
- To prove and apply the fundamental theorems of information theory, such as the source coding and channel coding theorems
- To apply the principles and techniques of information theory to model and characterise problems of a probabilistic nature: large deviation theory, hypothesis testing, Bayesian inference, etc.
- To acquire the language to describe impossibility results in settings such as estimation and hypothesis testing
The course begins with a review of probability and discrete random sources, develops entropy as a quantitative measure of uncertainty, and establishes the fundamental limits of lossless data compression. We then introduce relative entropy (Kullback–Leibler divergence) and show how it governs hypothesis testing and statistical distinguishability. These ideas are extended to information transmission, including mutual information, channel capacity, and the Gaussian channel model.
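For orientation, and in the notation of Cover and Thomas, the central quantities of this first part are

$$
H(X) = -\sum_{x} p(x)\log p(x), \qquad
D(P\,\|\,Q) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)}, \qquad
I(X;Y) = D(P_{XY}\,\|\,P_X P_Y),
$$

culminating in the capacity of the power-constrained Gaussian channel, $C = \tfrac{1}{2}\log\bigl(1 + P/N\bigr)$ for signal power $P$ and noise variance $N$.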
In the second part of the course, information-theoretic methods are applied to problems in statistical inference and learning. Topics include parameter estimation and the Cramér–Rao bound, Fano’s inequality, exploration bias in data analysis, and information-theoretic perspectives on generalisation error. Throughout, the emphasis is on operational meaning, structural results, and fundamental limits rather than computational techniques.
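Two results anchor this part, stated here in their standard forms: for any unbiased estimator $\hat\theta$ of a parameter $\theta$ with Fisher information $I(\theta)$, and for any estimate $\hat{X}$ of $X \in \mathcal{X}$ computed from $Y$, with error probability $P_e = \Pr(\hat{X} \neq X)$ and binary entropy function $h_b$,

$$
\operatorname{Var}(\hat\theta) \ge \frac{1}{I(\theta)}
\qquad \text{and} \qquad
H(X \mid Y) \le h_b(P_e) + P_e \log\bigl(\lvert\mathcal{X}\rvert - 1\bigr).
$$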
Depending on available time and the background of the students, additional applications and selected modern developments may be discussed, particularly finite-sample and non-asymptotic perspectives relevant to contemporary research in statistics, neuroscience, physics, and machine learning.
1. Review of probability theory
2. Data compression: coding theorem for a discrete memoryless source
3. Entropy
4. Lossless coding
5. Kullback–Leibler divergence
6. Hypothesis testing
7. Channel coding: information transmission theorem and mutual information
8. Continuous random variables and information-theoretic quantities; the Gaussian channel
9. Parameter estimation, the Cramér–Rao bound, and Fano's inequality
10. Exploration bias in data science
11. Generalisation error of learning algorithms
Final exam: 50%; oral presentation: 30%; in-term tests/quizzes: 20%
Students are expected to have:
- Some knowledge of probability theory (random variables, expectation, conditioning, Bayes’ rule)
- Basic mathematical maturity (comfort with proofs, inequalities, and abstract reasoning)
- Familiarity with calculus and linear algebra
No prior knowledge of information theory is assumed.
Programming experience is not required.
Elements of Information Theory, Thomas M. Cover and Joy A. Thomas
Information Theory: From Coding to Learning, Yury Polyanskiy and Yihong Wu
Information Theory: Coding Theorems for Discrete Memoryless Systems, Imre Csiszár and János Körner