An Automated Deep Reinforcement Learning Pipeline for Dynamic Pricing

Reza Refaei Afshar (Corresponding author), Jason Rhuggenaath, Yingqian Zhang, Uzay Kaymak

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Dynamic pricing is difficult because the environment is highly dynamic and demand distributions are unknown. In this article, we propose a deep reinforcement learning (DRL) framework in the form of a pipeline that automatically defines the DRL components for solving a dynamic pricing problem. An automated DRL pipeline is necessary because a DRL framework can be designed in numerous ways, and manually finding optimal configurations is tedious. This automation enables nonexperts to use DRL for dynamic pricing. Our DRL pipeline covers three steps of DRL design: Markov decision process modeling, algorithm selection, and hyperparameter optimization. It starts by transforming the available information into a state representation and defining the reward function using a reward shaping approach. Then, the hyperparameters are tuned using a novel hyperparameter optimization method that integrates Bayesian optimization and the selection operator of the genetic algorithm. We apply our DRL pipeline to reserve price optimization problems in online advertising as a case study. We show that the DRL configuration obtained by our pipeline yields a pricing policy whose revenue is significantly higher than that of the benchmark methods. The evaluation is performed by developing a simulation of the real-time bidding environment that makes exploration possible for the reinforcement learning agent.
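To make the reserve price setting in the abstract concrete, the following is a minimal sketch of a simulated auction environment. It is not the authors' simulator: it assumes a sealed-bid second-price auction with independent bids drawn uniformly from [0, 1], and the names `auction_revenue` and `average_revenue` are hypothetical. It only illustrates why the reserve price is worth optimizing: a reserve that is too low leaves revenue on the table, while one that is too high leaves impressions unsold.

```python
import random


def auction_revenue(bids, reserve):
    """Publisher revenue from one second-price auction with a reserve price.

    Assumption (not from the article): standard second-price rules, where the
    winner pays the larger of the second-highest bid and the reserve.
    """
    if not bids:
        return 0.0
    ranked = sorted(bids, reverse=True)
    if ranked[0] < reserve:
        return 0.0  # no bid clears the reserve, so the impression is unsold
    second = ranked[1] if len(ranked) > 1 else 0.0
    return max(second, reserve)


def average_revenue(reserve, n_auctions=10_000, n_bidders=3, seed=0):
    """Average revenue per auction for a fixed reserve price.

    Assumption: bids are i.i.d. uniform on [0, 1]; a fixed seed keeps the
    comparison between reserve prices on the same simulated bids.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_auctions):
        bids = [rng.uniform(0.0, 1.0) for _ in range(n_bidders)]
        total += auction_revenue(bids, reserve)
    return total / n_auctions


if __name__ == "__main__":
    for reserve in (0.0, 0.25, 0.5, 0.75):
        print(f"reserve={reserve:.2f}  avg revenue={average_revenue(reserve):.3f}")
```

In a reinforcement learning formulation, a function like `auction_revenue` would play the role of the per-step reward signal, and the agent would choose the reserve from the information available before each auction instead of using a fixed value.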

Original language: English
Pages (from-to): 428-437
Number of pages: 10
Journal: IEEE Transactions on Artificial Intelligence
Volume: 4
Issue number: 3
Early online date: 27 Jun 2022
DOIs
Publication status: Published - 1 Jun 2023

Keywords

  • AutoRL
  • Heuristic algorithms
  • Machine learning algorithms
  • Mathematical models
  • Optimization
  • Pipelines
  • Pricing
  • Reinforcement learning
  • automated reinforcement learning pipeline
  • Bayesian optimization
  • dynamic pricing
  • dynamic pricing (DP)
  • Automated reinforcement learning (AutoRL) pipeline
  • Bayesian optimization (BO)
