Online and Reinforcement Learning (OReL)

Course content

In classical machine learning, data are collected and analysed offline, and it is assumed that new data come from the same distribution as the data the algorithm was trained on. If this assumption fails, the theoretical guarantees become void and the empirical performance may deteriorate dramatically. But what if we want to design an algorithm for playing chess? The opponent is not going to sample moves from a fixed distribution.

Online and reinforcement learning break out of the static realm and move into a perpetual cycle of getting new information, analysing it, and executing actions based on the updated estimate of reality. We consider agents (computer programs, robots, living beings) that learn through interactions with (real or simulated) environments. Examples include repeated investment in the stock market, spam filtering, online advertising, online routing, medical treatments, games, and robotics. This framework makes it possible to model a much richer range of problems, including problems with limited feedback, problems with delayed feedback, and even adversarial problems, where the environment deliberately acts against the algorithm (as, for example, in chess or spam filtering). At the same time, it stimulates the development of fascinating mathematical tools for designing and analysing algorithms for these problems.

In the course we will cover:

The notion of regret: the evaluation measure, which replaces generalization error in offline learning and makes it possible to define and analyse learning in adversarial environments
Various forms of feedback, including full-information and limited (bandit) feedback
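
As a rough illustration of the notion of regret, one common textbook formulation is shown below. The notation is an assumption made here for illustration and is not taken from the course material: a_t is the action played in round t, \ell_t(a) is the loss of action a in round t, \mathcal{A} is the action set, and T is the number of rounds.

```latex
R_T = \sum_{t=1}^{T} \ell_t(a_t) - \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell_t(a)
```

The first sum is the cumulative loss of the algorithm, and the second is the cumulative loss of the best fixed action in hindsight. No assumption on how the losses are generated is needed, which is why the definition also makes sense in adversarial environments.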

We will introduce the following basic online learning settings, algorithms, and their analysis:

Follow the Leader algorithm
Prediction with expert advice: the Hedge / Exponential Weights algorithm
Stochastic and adversarial multi-armed bandits: the UCB1 and EXP3 algorithms
Contextual bandits: EXP4 algorithm
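
To give a flavour of the algorithms listed above, here is a minimal sketch of Hedge (Exponential Weights) for prediction with expert advice. It is an illustration under stated assumptions, not the course's reference implementation: the function name `hedge`, the requirement that losses lie in [0, 1], and the fixed learning rate `eta` are all choices made for this example.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge on a sequence of loss vectors (one loss per expert per round).

    Assumes every loss lies in [0, 1]. Returns the final probability vector
    over experts and the regret of the expected loss against the best expert.
    """
    n_experts = len(loss_rounds[0])
    cum_losses = [0.0] * n_experts
    expected_loss = 0.0
    probs = [1.0 / n_experts] * n_experts
    for losses in loss_rounds:
        # Weights are proportional to exp(-eta * cumulative loss);
        # subtracting the minimum keeps the exponentials numerically stable.
        shift = min(cum_losses)
        weights = [math.exp(-eta * (c - shift)) for c in cum_losses]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of sampling an expert according to probs.
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Full-information feedback: the losses of all experts are observed.
        cum_losses = [c + l for c, l in zip(cum_losses, losses)]
    regret = expected_loss - min(cum_losses)
    return probs, regret


# Hypothetical toy data: expert 0 never errs, expert 1 always errs.
T, N = 50, 2
rounds = [[0.0, 1.0]] * T
final_probs, regret = hedge(rounds, eta=math.sqrt(2 * math.log(N) / T))
```

In this toy run the probability mass moves towards the better expert, and the regret stays below sqrt(2 T ln N), a standard bound for this choice of learning rate.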

And the following basic reinforcement learning settings, algorithms, and their analysis:

Markov Decision Processes (MDPs)
Monte Carlo Methods for reinforcement learning
Dynamic programming for reinforcement learning
Temporal Difference Learning (e.g., Q-Learning)
Reinforcement learning using function approximators (e.g., Deep Q-Learning)
Online reinforcement learning: average-reward and discounted settings
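
As an illustration of temporal difference learning, the sketch below runs tabular Q-Learning on a small, hypothetical chain environment. The environment (four states in a row, reward 1 for reaching the rightmost state), the uniformly random behaviour policy, and all parameter values are assumptions made for this example and do not come from the course material.

```python
import random


def q_learning_chain(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-Learning on a 4-state chain; state 3 is terminal.

    Actions: 0 = move left (stays put in state 0), 1 = move right.
    Reward is 1.0 on entering the terminal state and 0.0 otherwise.
    """
    rng = random.Random(seed)
    n_states, goal = 4, 3
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Q-Learning is off-policy, so a uniformly random behaviour
            # policy is enough to explore this tiny environment.
            action = rng.randrange(2)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # No bootstrapping from the terminal state.
            bootstrap = 0.0 if next_state == goal else gamma * max(q[next_state])
            # Temporal difference update towards reward + discounted best value.
            q[state][action] += alpha * (reward + bootstrap - q[state][action])
            state = next_state
    return q


q = q_learning_chain()
# Greedy action in each non-terminal state.
greedy = [0 if q[s][0] > q[s][1] else 1 for s in range(3)]
```

With these settings the greedy policy moves right in every non-terminal state, and the learned values approach the discounted returns 0.81, 0.9, and 1.0 for moving right from states 0, 1, and 2.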

We will also cover a few advanced topics. The selection of advanced topics will depend on the lecturers and will be announced on Absalon.

The students will learn tools for theoretical analysis of most of the algorithms studied in the course and will implement the algorithms in Python.

The course will bring the students up to a level sufficient for writing a master's thesis in the domain of online and reinforcement learning.

WARNING: If you have not taken DIKU's Machine Learning A course, please carefully check the "Recommended Academic Qualifications" box below. Machine learning courses given elsewhere do not necessarily prepare you well for this course, because DIKU's machine learning courses have a stronger theoretical component than the average machine learning course. We advise against taking the course if you do not meet the academic qualifications.

Education

MSc Programme in Computer Science

MSc Programme in Statistics

MSc Programme in Mathematics-Economics

Learning outcome

Knowledge of

Evaluation measures used in online and reinforcement learning
Basic online learning settings
Basic reinforcement learning settings
Basic algorithms for online and reinforcement learning problems
Basic tools for theoretical analysis of these algorithms

Skills in

Reading and understanding recent scientific literature in the field of online and reinforcement learning
Formalizing and solving online and reinforcement learning problems
Applying the knowledge obtained by reading scientific papers
Analyzing online and reinforcement learning algorithms and implementing them

Competences in

Understanding advanced methods, and applying the knowledge to practical problems
Planning and carrying out self-learning

Teaching and learning methods

Lectures, exercise classes, and weekly home assignments.

Literature

See Absalon when the course is set up.

Recommended prerequisites

It is assumed that the students have successfully passed the "Machine Learning A" course offered by the Department of Computer Science (DIKU) (or the older “Machine Learning” course). Be aware that machine learning courses offered outside DIKU do not necessarily prepare you well for the course.

The course requires a strong mathematical background. It is suitable for computer science master's students, as well as students from mathematics (statistics, actuarial math, math-economics, etc.) and physics study programmes who have basic Python programming skills.
Students from other study programmes should have math skills at least at the level of a computer science bachelor's degree, and basic Python programming skills. At https://sites.google.com/diku.edu/machine-learning-courses/orel we list the concrete topics and exercises from the Machine Learning A course that we rely on in OReL.

Feedback form

Written

Continuous feedback during the course of the semester

Exam

ECTS

7,5 ECTS

Type of assessment

Continuous assessment

Type of assessment details

6-8 weekly take-home assignments. The assignments must be solved individually.

The course is based on weekly home assignments, which are graded continuously over the course of the semester. The final grade will be given as an overall assessment.

Aid

All aids allowed

Marking scale

7-point grading scale

Censorship form

No external censorship

Several internal examiners

Re-exam

The re-exam consists of two elements:

The first element is handing in at least 6 of the course assignments no later than 2 weeks prior to the oral part of the re-exam.
The second element is a 30-minute oral examination, without preparation, covering the course curriculum.

The final grade will be given as an overall assessment of the two re-exam elements.

Criteria for exam assessment

See Learning Outcome

Course type

Single subject courses (day)

Workload

Category              Hours
Lectures              28
Preparation           18
Theory exercises      70
Practical exercises   70
Exam                  20
Total                 206

Course information

Language: English
Course number: NDAK21003U
ECTS: 7,5 ECTS
Programme level: Full Degree Master
Duration: 1 block
Placement: Block 3
Schedule group: A

This is an on-site course, but we support remote participation via online streaming and lecture recording.
Capacity: No limitation – unless you register in the late-registration period (BSc and MSc) or as a credit or single subject student.
Study board: Study Board of Mathematics and Computer Science

Contracting department

Department of Computer Science

Contracting faculty

Faculty of Science

Course Coordinator

Sadegh Talebi

Teacher

Yevgeny Seldin, Christian Igel, and Sadegh Talebi

Saved on the 14-02-2024
