EXAMPLE: Bandit Problems
In many economically significant environments, agents must repeatedly choose among uncertain alternatives about which they can learn only through experimentation. Examples include a shopper deciding whether to purchase his favorite brand of orange juice or to experiment with a new one he has never tried, and an oil company deciding whether to continue testing a tract of land or to move its equipment to another tract. If these agents do not experiment enough, they can lose considerable welfare: the shopper could miss out on a delicious new brand of juice he would purchase and enjoy in the future, and the oil company may engage in an expensive recovery operation based on too few good test results. On the other hand, if these agents experiment too much, they lose welfare during the periods they spend pursuing inferior choices.
Professor Christopher Anderson has studied behavior in an abstract version of these problems, known as multi-armed bandit problems. In these experiments, subjects are presented with several alternatives, and in each of several periods they must choose one of them to receive a payoff. The payoff is the sum of an unknown underlying average, randomly determined at the beginning of the experiment, and a random noise value, which is drawn anew each period. Given this structure, a good payoff may come from an alternative with a good average payoff observed with a small noise value, or from one with a poor average payoff observed with a large positive noise value. In each period, subjects must therefore choose between selecting the alternative that has given them the highest payoffs in the past and learning more about alternatives about which less is known because they have been chosen fewer times, possibly discovering that one of them yields higher payoffs.
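The payoff structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the experiments: the function names, the normal distributions with mean 0 and standard deviation 1, the number of arms and periods, and the epsilon-greedy choice rule are all hypothetical choices made for the example.

```python
import random


def draw_arm_means(n_arms, rng, mean_sd=1.0):
    """Draw each arm's unknown average payoff once, at the start of the experiment."""
    return [rng.gauss(0.0, mean_sd) for _ in range(n_arms)]


def observe_payoff(arm_mean, rng, noise_sd=1.0):
    """One period's payoff: the arm's fixed average plus freshly drawn noise."""
    return arm_mean + rng.gauss(0.0, noise_sd)


def run_epsilon_greedy(n_arms=3, n_periods=50, epsilon=0.1, seed=0):
    """Simulate one subject who mostly exploits but sometimes explores.

    All defaults are assumptions for illustration only.
    Returns the true arm means, the number of pulls per arm, and the choice sequence.
    """
    rng = random.Random(seed)
    means = draw_arm_means(n_arms, rng)
    counts = [0] * n_arms    # how often each arm has been chosen
    totals = [0.0] * n_arms  # sum of payoffs observed from each arm
    choices = []
    for _ in range(n_periods):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]                 # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)      # explore: pick an arm at random
        else:
            # exploit: pick the arm with the highest observed average so far
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        payoff = observe_payoff(means[arm], rng)
        counts[arm] += 1
        totals[arm] += payoff
        choices.append(arm)
    return means, counts, choices
```

Calling `run_epsilon_greedy(epsilon=0.0)` gives a subject who never explores after the first pass through the arms, while larger values of `epsilon` spend more periods on arms that currently look inferior, which mirrors the tradeoff between too little and too much experimentation.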
Here are links to the actual experimental stimuli and to working papers (.pdf format) that provide detailed explanations and results for each. The experiment examples below are for demonstration purposes only, so there is no cash reward at this time.
- Working paper: “A Laboratory Study of Present Bias in Bandit Problems” (478Kb).
Experiment – example 1: Subjects choose among arms with normally distributed average payoffs and noise.
- Working paper: “Ambiguity Aversion in Multi-armed Bandit Problems” (366Kb).
Experiment – example 2: Subjects choose between two arms with beta-distributed means and Bernoulli-distributed payoffs.
Experiment – example 3: Subjects purchase information about two arms with beta priors and Bernoulli-distributed payoffs.
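For arms with beta priors and Bernoulli payoffs, the standard Bayesian update is simple enough to sketch: a Beta(alpha, beta) belief about an arm's success probability becomes Beta(alpha + 1, beta) after a payoff of 1 and Beta(alpha, beta + 1) after a payoff of 0. The code below is a minimal illustration of that textbook update only. The function names and the uniform Beta(1, 1) starting prior are assumptions for the example, not the parameters used in the experiments or the working paper.

```python
def update_beta(alpha, beta, payoff):
    """Update a Beta(alpha, beta) belief after one Bernoulli payoff (0 or 1)."""
    if payoff not in (0, 1):
        raise ValueError("a Bernoulli payoff must be 0 or 1")
    return alpha + payoff, beta + (1 - payoff)


def posterior_mean(alpha, beta):
    """Expected success probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Hypothetical example: start from a uniform Beta(1, 1) prior (an assumption)
# and observe two successes followed by one failure on the same arm.
belief = (1, 1)
for observed in (1, 1, 0):
    belief = update_beta(*belief, observed)
```

Each additional observation shifts the posterior mean by less than the one before it, which is one way to see why an arm that has been chosen fewer times offers more to learn.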