Sanket Shah
Bio
Publications
Posts
Contact
Implicit Layers
Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning
We propose a way to optimally differentiate through Reinforcement Learning. Specifically, we propose two optimality conditions that hold at convergence and show how to (approximately) calculate gradients using them.
Cite
×