1. What did you find out?
We introduced the concept of action mapping for safe reinforcement learning (RL). Current RL methods are often untrustworthy, which necessitates safety or feasibility models that constrain RL agents. The key challenge is to combine these feasibility models with RL efficiently: existing methods focus on constraint satisfaction rather than learning efficiency. Our action mapping approach first learns all feasible actions and then selects the optimal one among them. The key advantage is that learning the feasible actions requires no environment interactions, only queries to the feasibility model.
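The two-stage idea can be illustrated with a minimal sketch. Everything here is hypothetical: a toy 1-D feasibility model (an action is feasible if it keeps the state inside a safe interval) and a fixed table of action values stand in for the learned models.

```python
def feasibility_model(state, action):
    # Hypothetical feasibility model: an action is feasible if the
    # (1-D) state stays within the safe interval [-1, 1] after one step.
    return abs(state + action) <= 1.0

def select_action(state, candidate_actions, q_values):
    # Action mapping idea in two stages: first restrict to the feasible
    # actions (pure feasibility-model queries, no environment interaction),
    # then pick the highest-valued action among the feasible ones.
    feasible = [a for a in candidate_actions if feasibility_model(state, a)]
    return max(feasible, key=lambda a: q_values[a])

state = 0.8
candidates = [-0.5, 0.0, 0.5]
q_values = {-0.5: 0.2, 0.0: 0.5, 0.5: 0.9}  # 0.5 has the best value...
print(select_action(state, candidates, q_values))  # -> 0.0 (0.5 is infeasible)
```

The infeasible action is filtered out before the value comparison, so the agent never needs to learn, by trial and error, which actions are unsafe.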
2. What unique challenges did you face during your research?
A major challenge was the absence of existing algorithms for learning, in a self-supervised way, to generate all feasible actions. RL algorithms typically learn the single best action for a given state, not all feasible ones. Moreover, training a generative neural network to produce elements of a class, such as images of human faces, normally requires a large dataset of that class; in our case, no dataset of feasible actions existed. In the end, we developed a novel algorithm capable of generating all feasible actions without relying on a pre-existing dataset.
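The dataset-free aspect can be sketched as follows. This is a simplified illustration, not the actual algorithm: instead of training a generative network, it only shows how feasibility-model queries can label sampled actions, so that the feasible samples themselves become training targets where a pre-existing dataset would otherwise be needed. The feasibility model (two disjoint feasible intervals) is a made-up example.

```python
import random

def feasibility_model(action):
    # Hypothetical feasibility model: actions in [0.2, 0.4] or [0.7, 0.9]
    # are feasible (two disjoint feasible regions).
    return 0.2 <= action <= 0.4 or 0.7 <= action <= 0.9

def build_feasible_set(num_queries=10000, seed=0):
    # Self-supervised labeling: sample candidate actions and keep those
    # the feasibility model accepts. No environment interaction and no
    # pre-existing dataset is required, only feasibility-model queries.
    rng = random.Random(seed)
    samples = (rng.uniform(0.0, 1.0) for _ in range(num_queries))
    return [a for a in samples if feasibility_model(a)]

targets = build_feasible_set()
# The collected set covers both feasible regions, so a generative model
# trained on it could learn to produce the full range of feasible actions.
print(min(targets), max(targets))
```

Note that covering *all* feasible actions, including disconnected regions like the two intervals above, is exactly what distinguishes this from standard RL, which would collapse onto a single best action.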
3. What are the practical uses of your new findings?
This research marks the initial step in developing the action mapping framework for safe reinforcement learning. The next step involves learning to choose the optimal action from the feasible actions, completing the framework. If successful, this approach has the potential to significantly improve the learning efficiency of safe RL, making it more applicable to a wide range of real-world systems. Practical applications include autonomous systems, robotics, and any safety-critical environment where reliable decision-making is essential.
Professor responsible for this research project: Prof. Dr Marco Caccamo, Chair of Cyber-Physical Systems in Production Engineering, Technical University of Munich (TUM)