Contextual Bandit: from Theory to Applications. - Vernade - Workshop 3 - CEB T1 2019

Claire Vernade (Google DeepMind) / 05.04.2019

Contextual Bandit: from Theory to Applications.

Trading off exploration versus exploitation is a key problem in computer science: it is about learning how to make decisions in order to optimize a long-term cost. While many areas of machine learning aim at estimating a hidden function given a dataset, reinforcement learning is rather about optimally building a dataset of observations of this hidden function that contains just enough information to guarantee that the maximum is properly estimated.

The first part of this talk reviews the main techniques and results known on the contextual linear bandit. We'll mostly rely on the recent book of Lattimore and Szepesvári (2019) [1].

In the second part of this talk, we share our experience in applying bandit algorithms in industry [2]. Real-world problems often don't behave as the theory would like them to. In particular, while the system is supposed to be interacting with its environment, the customers' feedback is often delayed or missing, so the necessary updates cannot be performed. We propose a solution to this issue, propose some alternative models and architectures, and finish the presentation with open questions on sequential learning beyond bandits.

[1] Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. preprint (2018).
[2] Vernade, Claire, et al. Contextual bandits under delayed feedback. arXiv preprint arXiv:1807.02089 (2018).

Language: English; Date: 05.04.2019; Speaker: Vernade, Claire; Event: Workshop 3 - CEB T1 2019; Venue: IHP
