Abstract:
In the first part, we will give an introduction to discounted-cost and average-cost Markov decision processes (MDPs). Discounted-cost MDPs are more commonly studied in the reinforcement learning and approximate dynamic programming literature, and we will discuss some reasons why average-cost MDPs are harder to study in this context. In the second part, we will present recent results on approximate dynamic programming and reinforcement learning for average-cost MDPs, specifically new results on approximate policy iteration and soft policy iteration.
This is joint work with Yashaswini Murthy and Mehrdad Moharrami.