Quidest?

e-greedy

· Lorenzo Drumond

A version of the greedy strategy that introduces exploration:

The idea is to take either the best known choice (the one with the highest estimated value) with probability $(1-e)$, or a random choice with probability $e$.

We can therefore balance how much exploration we want by tuning the $e$ parameter: $e = 0$ means full exploitation, $e = 1$ means full exploration.
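
As a minimal sketch of the selection rule (assuming we keep a running value estimate per arm in a list called `q_values`, a name chosen here just for illustration), this could look like:

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """Pick the greedy arm with probability 1 - epsilon, a random arm otherwise.

    q_values: current value estimates, one per arm (assumed to be updated elsewhere).
    epsilon:  exploration rate in [0, 1]; 0 = full exploitation, 1 = full exploration.
    """
    if random.random() < epsilon:
        # Explore: pick any arm uniformly at random
        return random.randrange(len(q_values))
    # Exploit: pick the arm with the highest current estimate
    return max(range(len(q_values)), key=lambda a: q_values[a])
```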

Example

In the Bernoulli 3-armed bandit example (greedy-strategy), with $e=0.1$ we would pick a random option 10% of the time. This ensures we don’t get stuck in suboptimal choices.

A drawback of this strategy is that the exploration never stops: even after the best arm has been identified, we keep choosing a random (possibly suboptimal) action on a fraction $e$ of the steps.

The cost of this exploration can be quantified as the expected regret of picking a random action:

$[(0.8-0.3) + (0.8-0.7) + (0.8-0.8)]/3 = 0.6/3 = 0.2$

Therefore we would have 0 expected loss with probability 0.9 (when we exploit the best arm), and 0.2 expected loss with probability 0.1 (when we explore). This means the average per-step regret can never drop below

$(1-e) \cdot 0 + e \cdot 0.2 = 0.1 \cdot 0.2 = 0.02$
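
A quick simulation of this floor, as a sketch only: it assumes the arm success probabilities 0.3, 0.7 and 0.8 from the greedy-strategy example, and it assumes the best arm is already known, so all remaining regret comes from exploration steps:

```python
import random

ARM_MEANS = [0.3, 0.7, 0.8]   # assumed from the greedy-strategy example
BEST = max(ARM_MEANS)
EPSILON = 0.1
STEPS = 200_000

total_regret = 0.0
for _ in range(STEPS):
    if random.random() < EPSILON:
        arm = random.randrange(len(ARM_MEANS))   # exploration step: uniform random arm
    else:
        arm = ARM_MEANS.index(BEST)              # exploitation step: best arm assumed known
    total_regret += BEST - ARM_MEANS[arm]        # accumulate expected (not sampled) per-step regret

print(total_regret / STEPS)   # ~= (1 - 0.1) * 0 + 0.1 * 0.2 = 0.02
```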

References

Next -> decaying-e-greedy

#medium #exploration #exploitation #greedy #statistics #strategy #regret #math #multi_armed #bandits #tradeoff