TY - JOUR
T1 - An Automated Deep Reinforcement Learning Pipeline for Dynamic Pricing
AU - Refaei Afshar, Reza
AU - Rhuggenaath, Jason
AU - Zhang, Yingqian
AU - Kaymak, Uzay
PY - 2023/6/1
Y1 - 2023/6/1
N2 - A dynamic pricing problem is difficult due to the highly dynamic environment and unknown demand distributions. In this article, we propose a deep reinforcement learning (DRL) framework, which is a pipeline that automatically defines the DRL components for solving a dynamic pricing problem. The automated DRL pipeline is necessary because the DRL framework can be designed in numerous ways, and manually finding optimal configurations is tedious. The levels of automation make nonexperts capable of using DRL for dynamic pricing. Our DRL pipeline contains three steps of DRL design, including Markov decision process modeling, algorithm selection, and hyperparameter optimization. It starts with transforming available information to state representation and defining reward function using a reward shaping approach. Then, the hyperparameters are tuned using a novel hyperparameter optimization method that integrates Bayesian optimization and the selection operator of the genetic algorithm. We employ our DRL pipeline on reserve price optimization problems in online advertising as a case study. We show that using the DRL configuration obtained by our DRL pipeline, a pricing policy is obtained whose revenue is significantly higher than the benchmark methods. The evaluation is performed by developing a simulation for the real-time bidding environment that makes exploration possible for the reinforcement learning agent.
KW - AutoRL
KW - Heuristic algorithms
KW - Machine learning algorithms
KW - Mathematical models
KW - Optimization
KW - Pipelines
KW - Pricing
KW - Reinforcement learning
KW - automated reinforcement learning pipeline
KW - Bayesian optimization
KW - dynamic pricing
KW - dynamic pricing (DP)
KW - Automated reinforcement learning (AutoRL) pipeline
KW - Bayesian optimization (BO)
UR - http://www.scopus.com/inward/record.url?scp=85133739199&partnerID=8YFLogxK
U2 - 10.1109/TAI.2022.3186292
DO - 10.1109/TAI.2022.3186292
M3 - Article
SN - 2691-4581
VL - 4
SP - 428
EP - 437
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 3
ER -