Output details
11 - Computer Science and Informatics
University of York
Dynamic Potential-Based Reward Shaping
<12>This paper shows that the theoretical guarantees of potential-based reward shaping (PBRS), an important and useful reinforcement learning (RL) tool, extend to cases where the reward shaping changes dynamically
during learning. This is especially important when the reward shaping is itself being learned online.
As one conference referee noted, "this is a big result". In private correspondence, many researchers
have expressed surprise at our result, and the work has already led to a joint publication at
AAMAS with researchers from the University of Nebraska. The paper is rigorous, combining theoretical proofs with empirical demonstrations of those guarantees.