Lecture 1.1

Introduction to the Course

Note

Lecture 1.1 is a course introduction lecture specific to the 2020 cohort (logistics, grading, practicalities). Rather than reproduce that material here, this page gives a general overview of the course arc that is useful to any reader. It will be expanded once all lecture notes are complete.

Machine Learning 1 is a graduate-level course in the Master of AI program at the University of Amsterdam. It follows Bishop's Pattern Recognition and Machine Learning (PRML) as its primary text, with the goal of building a rigorous probabilistic foundation for machine learning — from first principles in probability theory all the way to neural networks, kernel methods, and unsupervised learning.

What This Course Is About

Most introductions to machine learning focus on tools and recipes. This course focuses on understanding why those tools work. The central thread is probabilistic reasoning: the idea that learning from data is fundamentally an exercise in inference under uncertainty. Every major method — linear regression, classification, neural networks, Gaussian processes — is derived from this unifying perspective rather than introduced as a standalone algorithm.

The course takes a Bayesian viewpoint throughout. This means treating model parameters as random variables, quantifying uncertainty explicitly, and making predictions by marginalizing over the parameters rather than relying on a single point estimate. This viewpoint provides a cleaner theoretical framework and also directly motivates many practical techniques, such as regularization, model selection, and kernel methods.
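
As a brief illustration of the difference between marginalization and a point estimate, the sketch below writes out both forms of prediction. The symbols are assumptions chosen to match Bishop's usual conventions rather than anything defined on this page: D is the training data, w the model parameters, x a new input with target t, phi a feature map, and alpha and beta the prior and noise precisions of an assumed Gaussian model.

```latex
% Bayesian prediction: average over all parameter values, weighted by the posterior
p(t \mid \mathbf{x}, \mathcal{D})
  = \int p(t \mid \mathbf{x}, \mathbf{w}) \, p(\mathbf{w} \mid \mathcal{D}) \, \mathrm{d}\mathbf{w},
\qquad
p(\mathbf{w} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathbf{w}) \, p(\mathbf{w})

% Point-estimate prediction: plug in a single value, e.g. the MAP estimate
p(t \mid \mathbf{x}, \mathcal{D}) \approx p(t \mid \mathbf{x}, \mathbf{w}_{\mathrm{MAP}}),
\qquad
\mathbf{w}_{\mathrm{MAP}} = \arg\max_{\mathbf{w}} \; p(\mathcal{D} \mid \mathbf{w}) \, p(\mathbf{w})

% Assumed example: Gaussian noise (precision \beta) and prior N(w | 0, \alpha^{-1} I)
-\ln p(\mathbf{w} \mid \mathcal{D})
  = \frac{\beta}{2} \sum_{n=1}^{N} \bigl( t_n - \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n) \bigr)^2
  + \frac{\alpha}{2} \, \mathbf{w}^{\mathsf{T}} \mathbf{w} + \text{const}
```

Under these assumptions, maximizing the posterior is the same as minimizing a sum-of-squares error with an L2 penalty of strength lambda = alpha / beta, which is one way the Bayesian view motivates regularization.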

Course Structure

The course runs over nine weeks, following the chapter structure of Bishop:

Nine-Week Overview

Week	Topic	Bishop chapters
1	Introduction & Probability Theory	1–2
2	Distributions, MLE & Bayesian Prediction	2
3	Linear Regression	3
4	Model Selection & Bayesian Linear Regression	3
5	Evidence Approximation & Classification	3–4
6	Logistic Regression & Neural Networks	4–5
7	Unsupervised Learning & Dimensionality Reduction	9, 12
8	Kernel Methods & Gaussian Processes	6
9	Ensemble Methods	14

Prerequisites

The course assumes comfort with:

Linear algebra — vectors, matrices, matrix multiplication, eigendecomposition.
Calculus — differentiation (including partial derivatives and the chain rule), integration, and optimization via setting derivatives to zero.
Basic probability — random variables, probability distributions, expectation. The course rebuilds these from scratch in the first two weeks, but a prior encounter helps.
Programming — assignments use Python.

The Role of Bishop's Book

Pattern Recognition and Machine Learning by Christopher Bishop (2006) is the backbone of the course. It is a dense, mathematically rigorous text that rewards careful reading. The lectures follow its structure and notation closely, and the lecture notes on this site are written to complement it by providing the reasoning and intuition that connect the derivations in the book to the bigger picture.

The most effective way to consolidate the material is to read ahead in Bishop before each lecture and to work through the derivations yourself afterwards.