Reinforcement Learning for Real Time Bidding

Master’s thesis carried out at Emerse Sverige AB for the Department of Computer Science, Lund University.


Author: Erik Smith


Supervisors:

Pierre Nugues, Department of Computer Science, Faculty of Engineering, Lund University

Elin Anna Topp, Department of Computer Science, Faculty of Engineering, Lund University

Carl-Johan Grund, Emerse Sverige AB

Rasmus Larsson, Emerse Sverige AB



Today, the most common software-based approach to trading advertising slots is real-time bidding: as soon as a user begins to load a web page, an auction for the slot is held in real time, and the highest bidder gets to display their advertisement of choice. Each bidder, however, has a limited budget and strives to spend it in a manner that maximizes the value of the advertisement slots bought. In this thesis, we formalize this problem by modelling the bidding process as a Markov decision process. To find the optimal auction bid, we propose two solution methods: value iteration and actor–critic policy gradients. The effectiveness of the value iteration approach, compared with other common baseline methods, is demonstrated on real-world auction data.




