A Reward Shaping Approach for Reserve Price Optimization using Deep Reinforcement Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Real-time bidding is the process of buying and selling online advertisements in real-time auctions. These auctions are run by header bidding partners or ad exchanges to sell publishers' ad placements. Ad exchanges run second-price auctions, and a reserve price should be set for each ad placement or impression. This reserve price is normally determined by the bids of header bidding partners. However, ad exchanges may outbid higher reserve prices, so optimizing this value strongly affects revenue. In this paper, we propose a deep reinforcement learning approach for adjusting the reserve price of individual impressions using contextual information. Normally, ad exchanges return no information about the auction except the sold/unsold status. This binary feedback is not suitable for maximizing revenue because it contains no explicit information about the revenue. To enrich the reward function, we develop a novel reward shaping approach that provides an informative reward signal for the reinforcement learning agent. In this approach, different intervals of the reserve price get different weights, and the reward value of each interval is learned through a search procedure. Using a simulator, we test our method on a set of impressions. The results show that the proposed method outperforms the baselines in terms of revenue.
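
The interval-based reward shaping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `shaped_reward`, the interval boundaries in `edges`, the values in `weights`, and the choice of a zero reward for unsold impressions are all assumptions made for illustration. In the paper the per-interval reward values come from a search procedure; here they are simply supplied by hand.

```python
from bisect import bisect_right


def shaped_reward(reserve_price, sold, edges, weights, unsold_reward=0.0):
    """Turn binary sold/unsold feedback into an interval-weighted reward.

    edges   -- ascending boundaries that split the reserve-price range into
               len(edges) + 1 intervals (hypothetical values).
    weights -- one reward value per interval (hypothetical; the paper
               obtains these through a search procedure).
    unsold_reward -- reward when the impression is not sold (assumed 0.0).
    """
    if len(weights) != len(edges) + 1:
        raise ValueError("need exactly one weight per interval")
    if not sold:
        return unsold_reward
    # bisect_right gives the index of the interval containing reserve_price;
    # a price equal to a boundary falls into the interval above it.
    return weights[bisect_right(edges, reserve_price)]


# Hypothetical setup: four intervals, higher reserve prices rewarded more.
edges = [1.0, 2.0, 3.0]
weights = [0.1, 0.4, 0.7, 1.0]

print(shaped_reward(0.5, True, edges, weights))   # sold in the lowest interval
print(shaped_reward(2.5, True, edges, weights))   # sold in the third interval
print(shaped_reward(2.5, False, edges, weights))  # unsold
```

With a scheme like this, two sold impressions no longer yield the same reward: one sold at a higher reserve price earns more, which gives the agent a signal related to revenue even though the ad exchange reports only the sold/unsold status.
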
Original language: English
Title of host publication: Proceedings of the International Joint Conference on Neural Networks (IJCNN2021)
Publication status: Accepted/In press - 10 Apr 2021
Event: 2021 International Joint Conference on Neural Networks (IJCNN)
Duration: 18 Jul 2021 → 22 Jul 2021

Conference

Conference: 2021 International Joint Conference on Neural Networks (IJCNN)
Period: 18/07/21 → 22/07/21
