TY - UNPB
T1 - Safe and Time-Efficient Exploration in Reinforcement Learning-Based Control of Vehicle Thermal Systems
AU - Garg, P.
AU - Silvas, Emilia
AU - Willems, Frank P.T.
PY - 2024/7/16
Y1 - 2024/7/16
N2 - Reinforcement Learning has achieved huge success in various applications in controlled environments. However, its adoption in real-world applications remains limited due to challenges in guaranteeing safe system operation, the required experiment time, and the a-priori system knowledge and models required by existing methods. To address these limitations, we propose a novel exploration method that integrates a reciprocal Control Barrier Function and an on-line learned Gaussian Process Regression model. For safe system operation, we leverage the information from the reciprocal Control Barrier Function to limit the step-size of the agent's actions when approaching the safety boundary. To make this exploration process time-efficient, we use information gain metrics, calculated from the action-value estimates of an on-line learned Gaussian Process Regression model, to determine the direction of the agent's actions. We demonstrate the potential of our exploration method in simulation for efficiency-optimal calibration of a thermal management system for battery electric vehicles. To quantify the benefits in terms of safety, optimality and time-efficiency, we benchmark our exploration method against random and uncertainty-driven exploration methods. For the studied test case, the proposed exploration method satisfies the safety constraint and converges to within 1.25% of the true optimal action while requiring 28% and 18% lower experiment time compared to the random and uncertainty-driven exploration methods, respectively.
AB - Reinforcement Learning has achieved huge success in various applications in controlled environments. However, its adoption in real-world applications remains limited due to challenges in guaranteeing safe system operation, the required experiment time, and the a-priori system knowledge and models required by existing methods. To address these limitations, we propose a novel exploration method that integrates a reciprocal Control Barrier Function and an on-line learned Gaussian Process Regression model. For safe system operation, we leverage the information from the reciprocal Control Barrier Function to limit the step-size of the agent's actions when approaching the safety boundary. To make this exploration process time-efficient, we use information gain metrics, calculated from the action-value estimates of an on-line learned Gaussian Process Regression model, to determine the direction of the agent's actions. We demonstrate the potential of our exploration method in simulation for efficiency-optimal calibration of a thermal management system for battery electric vehicles. To quantify the benefits in terms of safety, optimality and time-efficiency, we benchmark our exploration method against random and uncertainty-driven exploration methods. For the studied test case, the proposed exploration method satisfies the safety constraint and converges to within 1.25% of the true optimal action while requiring 28% and 18% lower experiment time compared to the random and uncertainty-driven exploration methods, respectively.
U2 - 10.36227/techrxiv.172114904.44789755/v1
DO - 10.36227/techrxiv.172114904.44789755/v1
M3 - Preprint
BT - Safe and Time-Efficient Exploration in Reinforcement Learning-Based Control of Vehicle Thermal Systems
PB - TechRxiv
ER -