Implicit Layers | Sanket Shah

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning

We propose a way to optimally differentiate through Reinforcement Learning. Specifically, we propose two optimality conditions that hold at convergence and show how to (approximately) calculate gradients using them.