Reinforcement Learning

Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Care Domain

We propose a way to differentiate through MDP planning in Restless Multi-Armed Bandits. We use this approach with decision-focused learning to better learn the transition matrices of each arm from the features associated with it.

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning

We propose a way to differentiate through the optimal policy found by reinforcement learning. Specifically, we propose two optimality conditions that hold at convergence and show how to use them to (approximately) compute gradients.

Q-Learning Lagrange Policies for Multi-Action Restless Bandits

We propose two online, model-free algorithms that learn the Whittle index for *multi-action* Restless Multi-Armed Bandits.