An Automated Deep Reinforcement Learning Pipeline for Dynamic Pricing

Reza Refaei Afshar (Corresponding author), Jason Rhuggenaath, Yingqian Zhang, Uzay Kaymak

Research output: Contribution to journal › Article › Academic › peer-review

5 Citations (Scopus)
350 Downloads (Pure)

Abstract

Dynamic pricing is difficult because the environment is highly dynamic and demand distributions are unknown. In this article, we propose a deep reinforcement learning (DRL) framework in the form of a pipeline that automatically defines the DRL components for solving a dynamic pricing problem. An automated DRL pipeline is necessary because a DRL framework can be designed in numerous ways, and manually finding optimal configurations is tedious. The levels of automation enable nonexperts to use DRL for dynamic pricing. Our DRL pipeline covers three steps of DRL design: Markov decision process modeling, algorithm selection, and hyperparameter optimization. It starts by transforming the available information into a state representation and defining the reward function using a reward shaping approach. Then, the hyperparameters are tuned using a novel hyperparameter optimization method that integrates Bayesian optimization and the selection operator of the genetic algorithm. We apply our DRL pipeline to reserve price optimization problems in online advertising as a case study. We show that the DRL configuration obtained by our pipeline yields a pricing policy whose revenue is significantly higher than that of the benchmark methods. The evaluation is performed in a simulation of the real-time bidding environment, developed to make exploration possible for the reinforcement learning agent.
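
To make the reserve price setting in the abstract concrete, the following is a minimal sketch of how revenue can depend on the reserve price in a simulated auction. It is not the authors' simulator and contains no reinforcement learning. The second-price auction format, the uniformly distributed bids, and the names auction_revenue and average_revenue are all assumptions introduced here for illustration.

```python
import random


def auction_revenue(bids, reserve):
    """Revenue of one auction, ASSUMING a second-price format with a reserve.

    The auction format is an illustrative assumption, not taken from the article.
    """
    ranked = sorted(bids, reverse=True)
    if not ranked or ranked[0] < reserve:
        return 0.0  # no bid clears the reserve, so the impression stays unsold
    second = ranked[1] if len(ranked) > 1 else 0.0
    # The winner pays the larger of the second-highest bid and the reserve.
    return max(second, reserve)


def average_revenue(reserve, n_auctions=10_000, n_bidders=3, seed=0):
    """Average revenue over simulated auctions with uniform(0, 1) bids (assumed)."""
    rng = random.Random(seed)  # fixed seed so every reserve sees the same bids
    total = 0.0
    for _ in range(n_auctions):
        bids = [rng.random() for _ in range(n_bidders)]
        total += auction_revenue(bids, reserve)
    return total / n_auctions


if __name__ == "__main__":
    # Sweep a few candidate reserve prices and report the simulated revenue.
    for reserve in (0.0, 0.25, 0.5, 0.75, 0.95):
        print(f"reserve={reserve:.2f}  avg revenue={average_revenue(reserve):.3f}")
```

Under these assumptions, a reserve that is too low leaves revenue on the table and one that is too high leaves impressions unsold. A pricing policy of the kind the abstract describes would have to learn this trade-off from interaction with the environment rather than from a fixed sweep.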

Original language: English
Pages (from-to): 428-437
Number of pages: 10
Journal: IEEE Transactions on Artificial Intelligence
Volume: 4
Issue number: 3
Early online date: 27 Jun 2022
Status: Published - 1 Jun 2023
