Restless Multi-Armed Bandits

Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Care Domain

We propose a way to differentiate through MDP planning for restless multi-armed bandits. Using decision-focused learning, we apply this approach to better learn each arm's transition matrices from the features associated with that arm.
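
As a rough illustration of the prediction step only, the sketch below maps one arm's feature vector to row-stochastic transition matrices with a softmax. The function names, the linear-softmax parameterization, and the example weights are all assumptions made for illustration, not the model used in the paper. In decision-focused learning the weights would be trained through the downstream planning objective rather than by fitting observed transitions alone; that differentiation step is not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_transitions(features, weights):
    """Map one arm's features to transition matrices P[a][s][s_next].

    weights[a][s][s_next] is a hypothetical weight vector with the same
    length as `features`; each (action, state) row is passed through a
    softmax so that it forms a valid probability distribution.
    """
    P = []
    for w_action in weights:
        P_action = []
        for w_state in w_action:
            logits = [
                sum(w * x for w, x in zip(w_vec, features))
                for w_vec in w_state
            ]
            P_action.append(softmax(logits))
        P.append(P_action)
    return P


# Hypothetical example: 2 actions, 2 states, 2 features per arm.
features = [1.0, 0.5]
weights = [
    [[[0.2, -0.1], [0.0, 0.3]], [[-0.4, 0.1], [0.5, 0.2]]],  # action 0
    [[[0.1, 0.1], [0.6, 0.0]], [[-0.2, 0.0], [0.9, 0.4]]],   # action 1
]
P = predict_transitions(features, weights)
```

Here `P[a][s]` is the predicted distribution over next states for action `a` in state `s`; the softmax keeps every row non-negative and summing to one regardless of the weights.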

Q-Learning Lagrange Policies for Multi-Action Restless Bandits

We propose two online, model-free algorithms for learning the Whittle index of *multi-action* restless multi-armed bandits.
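
To give a feel for the underlying idea, the sketch below runs plain tabular Q-learning on a single multi-action arm whose reward is reduced by a penalty `lam` times the cost of the chosen action, as in a Lagrangian relaxation of the budget constraint. It is not either of the algorithms proposed in the paper: the function name, the toy arm, the action costs, and all hyperparameters are assumptions chosen for illustration. The transition probabilities are used only to simulate the environment, so the learner itself remains model-free.

```python
import random


def q_learning_with_penalty(P, R, costs, lam, steps=5000,
                            alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on one arm with reward R[s] - lam * costs[a].

    P[a][s] is the next-state distribution used by the simulator only;
    the update rule never reads it directly.
    """
    rng = random.Random(seed)
    n_actions, n_states = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])
        # Simulate one transition of the arm.
        s_next = rng.choices(range(n_states), weights=P[a][s])[0]
        # Penalty-adjusted reward: acting costs lam per unit of action cost.
        reward = R[s] - lam * costs[a]
        target = reward + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


# Hypothetical arm: 2 states (0 = bad, 1 = good), 3 actions of rising cost.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # action 0: passive
    [[0.6, 0.4], [0.3, 0.7]],  # action 1: light intervention
    [[0.3, 0.7], [0.1, 0.9]],  # action 2: strong intervention
]
R = [0.0, 1.0]
costs = [0.0, 1.0, 2.0]

Q_free = q_learning_with_penalty(P, R, costs, lam=0.0)
Q_costly = q_learning_with_penalty(P, R, costs, lam=100.0)
greedy_costly = [max(range(3), key=lambda a: Q_costly[s][a]) for s in range(2)]
```

With a very large penalty the greedy action in every state is the free passive action, while with `lam=0.0` interventions cost nothing. Sweeping `lam` between these extremes and noting where the preferred action changes is the intuition behind index-style policies.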