
Animating Bayesian Bandits in Python

Randomized probability matching is a strategy for solving the explore/exploit problem that arises when running experiments. It may be useful, for example, when a publisher is evaluating a number of different ad designs to determine which has the best click-through rate. At any given time the experimenter may decide to display the ad which has performed best so far (“exploit”), or try out different ads that might perform better (“explore”).

Ted Dunning showed a video demonstration of randomized probability matching during a recent Mahout presentation in New York. The following video is based on Ted’s, and the Python code that generates it gives some insight into how randomized probability matching works.

The upper plot of the animation represents each ad design with a different color. The click/no-click events for each ad are sampled from a Bernoulli distribution, with the $P(Click)$ for each color indicated by the small arrows in the top plot. In a real-world experiment, these would be the unknown values that the experimenter is trying to discern. The curves in the plot represent the uncertainty about the true $P(Click)$ value for each ad, or in other words $P(P(Click) = x)$. They change over time as experiments are run. The beta distribution is a conjugate prior for Bernoulli trials and is simple to update as new results come in: it is entirely parameterized by the number of click and no-click events seen so far.
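For concreteness, here is a minimal sketch of that setup, with assumed names and click rates rather than the post's actual code: each experiment is a Bernoulli trial, and each ad's curve is a beta posterior parameterized by its click and no-click counts.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Hypothetical setup: the hidden P(Click) for each ad (the small arrows in the top plot).
true_p_click = np.array([0.05, 0.11, 0.08])
clicks = np.zeros(len(true_p_click))      # click events seen so far, per ad
no_clicks = np.zeros(len(true_p_click))   # no-click events seen so far, per ad

def run_experiment(arm):
    """Show ad `arm` once (a Bernoulli trial) and update its posterior counts."""
    clicked = rng.random() < true_p_click[arm]
    if clicked:
        clicks[arm] += 1
    else:
        no_clicks[arm] += 1
    return clicked

# The curve drawn for an ad is its Beta(1 + clicks, 1 + no_clicks) posterior density,
# evaluated over possible P(Click) values in [0, 1]:
for _ in range(50):
    run_experiment(1)
x = np.linspace(0, 1, 200)
curve = beta.pdf(x, 1 + clicks[1], 1 + no_clicks[1])
```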

The lower plot shows a rolling average of the regret we experience over time given our experimental choices. The regret for a given experiment is the difference between the click probability of the optimal ad design and the click probability of the design that was actually chosen. Of course, normally these would be the unknown values we would be trying to calculate in the first place, but here we know them since we are sampling from distributions that we created. The regret declines over time as the beta distributions narrow: we become more certain about the true click probabilities and shift more of our experiments to the best-performing ad designs.
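A sketch of that regret bookkeeping, again with illustrative names: since the simulation knows every ad's true $P(Click)$, per-experiment regret is just the gap to the best ad, and a moving average smooths it for the lower plot.

```python
import numpy as np

true_p_click = np.array([0.05, 0.11, 0.08])   # hidden click rates, as above
best_p = true_p_click.max()                    # click rate of the optimal ad

regrets = []

def record_regret(chosen_arm):
    """Regret for one experiment: optimal P(Click) minus the chosen ad's P(Click)."""
    regrets.append(best_p - true_p_click[chosen_arm])

def rolling_average(values, window=100):
    """Moving average of the regret series, as plotted in the lower panel."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return np.cumsum(values) / (np.arange(len(values)) + 1)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")
```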

At each frame of the animation, our code runs an experiment and updates the beta distribution for the corresponding ad design. The find_arm_to_run function draws a number of samples from each of the beta distributions and randomly chooses the next experiment, picking each ad design with probability equal to the fraction of samples in which that design has the highest click probability.
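The post's actual implementation of find_arm_to_run isn't reproduced here, but based on that description it might look roughly like this sketch (the Beta(1, 1) prior, the sample count, and the exact interface are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def find_arm_to_run(clicks, no_clicks, n_samples=100):
    """Pick the next ad to show via randomized probability matching.

    Draws `n_samples` values from each ad's Beta(1 + clicks, 1 + no_clicks)
    posterior, then chooses an ad with probability equal to the fraction of
    draws in which its sample was the highest.
    """
    clicks = np.asarray(clicks, dtype=float)
    no_clicks = np.asarray(no_clicks, dtype=float)
    n_arms = len(clicks)
    samples = rng.beta(1 + clicks, 1 + no_clicks, size=(n_samples, n_arms))
    winners = samples.argmax(axis=1)                            # best ad in each draw
    win_prob = np.bincount(winners, minlength=n_arms) / n_samples
    return rng.choice(n_arms, p=win_prob)

# e.g. with 3 clicks / 40 no-clicks, 10 / 60, and 5 / 45 observed so far:
next_arm = find_arm_to_run([3, 10, 5], [40, 60, 45])
```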

Steven Scott’s article has additional theoretical and experimental results on randomized probability matching and multi-armed bandits.

Jon Sondag
Data Scientist
