Course ATS: Advanced Topics in Stochastic Operations Research: Multi-armed bandit theory and applications

Former course title: "MAB: Multi-armed bandit theory and applications"

Time:	Monday 15.15 – 17.00 (November 20 - December 18 and January 22 - February 26).
Location:	Campus Utrecht Science Park. Details about lecture rooms follow after registration.
Lecturers:	Dr. A.V. den Boer (UvA) and Dr. O. Kanavetas (UL)

Course description:
In the first part of this course we will study data-driven decision problems: optimization problems for which the relation between decision and outcome is unknown upfront, and thus has to be learned on the fly from accumulating data. This type of problem has an intrinsic tension between statistical goals and optimization goals: learning how the system behaves (the statistical goal) is accelerated by experimenting with different actions, while for taking good decisions (the optimization goal), one would like to limit experimentation and instead use estimated optimal decisions. We will study this 'exploration-exploitation' trade-off for so-called 'multi-armed bandit problems', the paradigmatic framework for dynamic optimization problems with incomplete information.
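
As a rough illustration of the exploration-exploitation trade-off described above, here is a minimal Python sketch of an epsilon-greedy strategy on a Bernoulli bandit. It is only one simple heuristic, chosen here for brevity and not taken from the course material; the function name epsilon_greedy and its parameters (true_means, horizon, epsilon, seed) are assumptions made for this example.

```python
import random


def epsilon_greedy(true_means, horizon, epsilon, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds with epsilon-greedy.

    true_means: success probability of each arm (unknown to the learner,
                used here only to simulate rewards).
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k    # number of times each arm was played
    sums = [0.0] * k   # sum of rewards observed per arm
    total = 0
    for t in range(horizon):
        if t < k:
            # Initialisation: play every arm once so each has an estimate.
            arm = t
        elif rng.random() < epsilon:
            # Explore: pick an arm uniformly at random (statistical goal).
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the best estimated mean
            # (optimization goal).
            arm = max(range(k), key=lambda a: sums[a] / pulls[a])
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        sums[arm] += reward
        total += reward
    return pulls, total


if __name__ == "__main__":
    pulls, total = epsilon_greedy([0.3, 0.5, 0.7], horizon=1000, epsilon=0.1)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

A larger epsilon spends more rounds on learning the arm means, while a smaller epsilon relies earlier on the current estimates and risks settling on a suboptimal arm.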

In the second part of this course we will study reinforcement learning, which has evolved into one of the most active research areas within machine learning, artificial intelligence, and neural network research. Research in reinforcement learning has two core objectives: designing effective learning algorithms, and understanding the capabilities and limitations of these algorithms. In this part of the course we aim to give a concise and accessible presentation of the key concepts and algorithms in reinforcement learning. We will explore a range of learning problems, clarify foundational principles, present several state-of-the-art algorithms, and discuss their properties and limitations.

Literature:
First part: Bandit Algorithms by Tor Lattimore and Csaba Szepesvari (available online)
Second part: Algorithms for Reinforcement Learning by Csaba Szepesvari (available online)

Prerequisites:
Probability theory and statistics, and some coding skills (Python/Matlab).

Examination:
To be determined.

Addresses of the lecturers:
Dr. A.V. den Boer
Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Universiteit van Amsterdam
Postbus 94248, 1090 GE Amsterdam
Phone: 020-5252497 E-mail: A.V.denBoer@uva.nl

Dr. O. Kanavetas
Mathematical Institute, Leiden University
P.O. Box 9512, 2300 RA Leiden
Phone: 071 - 5277126 E-mail: o.kanavetas@math.leidenuniv.nl