Quidest?

Bayesian Inference - PyMC example

· Lorenzo Drumond

Task: You are given a series of daily text-message counts from a user of your system. The data, plotted over time, appears in the chart below. You are curious to know if the user’s text-messaging habits have changed over time, either gradually or suddenly. How can you model this?

A Poisson random variable is a very appropriate model for this type of count data.

Denoting day $i$’s text-message count as $C_i$: $C_i \sim \text{Poisson}(\lambda)$
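To get a feel for why, here is a minimal sketch (with a made-up rate of 20 texts per day, purely for illustration): Poisson draws are exactly the kind of non-negative integer counts we are dealing with.

import numpy as np

rng = np.random.default_rng(42)

lam = 20  # hypothetical rate: on average 20 texts per day

# Poisson output is always a non-negative integer, matching count data.
simulated_counts = rng.poisson(lam, size=10)
print(simulated_counts)         # ten days of simulated counts
print(simulated_counts.mean())  # approaches lam as the sample grows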

We are not sure what the value of $\lambda$ is. Looking at the (fictitious) data, it seems the rate may change at some later point in the observation period.

How can we represent this observation mathematically? Let’s assume that on some day during the observation period (call it $\tau$), the parameter $\lambda$ suddenly jumps to a higher value. So we really have two $\lambda$ parameters: one for the period before $\tau$, and one for the rest of the observation period. In the literature, a sudden transition like this would be called a switchpoint:

$$
\lambda =
\begin{cases}
\lambda_1 & \text{if } t \lt \tau \cr
\lambda_2 & \text{if } t \ge \tau
\end{cases}
$$
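The switchpoint rule is easy to express directly with NumPy, as a sanity check before we build the full model. The values of lambda_1, lambda_2 and tau below are hypothetical, chosen only to illustrate the piecewise definition:

import numpy as np

# Hypothetical values, for illustration only.
lambda_1, lambda_2, tau = 18.0, 23.0, 45
n_days = 70

t = np.arange(n_days)
# The switchpoint rule: lambda_1 before day tau, lambda_2 from tau on.
lam = np.where(t < tau, lambda_1, lambda_2)

# Fictitious data generated under this rule jumps at the switchpoint:
rng = np.random.default_rng(0)
fake_counts = rng.poisson(lam)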

We are interested in inferring the unknown $\lambda$s. To use Bayesian inference, we need to assign prior probabilities to the different possible values of $\lambda$. What would be good prior probability distributions for $\lambda_1$ and $\lambda_2$? Recall that $\lambda$ can be any positive number. As we saw earlier, the exponential distribution provides a continuous density function for positive numbers, so it might be a good choice for modeling $\lambda_i$. But recall that the exponential distribution takes a parameter of its own, so we’ll need to include that parameter in our model. Let’s call that parameter $\alpha$.

$$
\begin{align}
\lambda_1 &\sim \text{Exp}(\alpha) \cr
\lambda_2 &\sim \text{Exp}(\alpha)
\end{align}
$$

$\alpha$ is called a hyper-parameter or parent variable. In literal terms, it is a parameter that influences other parameters. Our initial guess at $\alpha$ does not influence the model too strongly, so we have some flexibility in our choice. A good rule of thumb is to set the exponential parameter equal to the inverse of the average of the count data. Since we’re modeling $\lambda$ using an exponential distribution, we can use the expected value identity shown earlier to get:

$$
\frac{1}{N}\sum_{i=0}^N C_i \approx E[\lambda \mid \alpha] = \frac{1}{\alpha}
$$
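A quick sketch of this rule of thumb, using simulated stand-in counts (in the real analysis, count_data comes from the dataset): setting $\alpha$ to the inverse of the sample mean makes the prior’s expected value match the average observed count. Note that NumPy parameterizes the exponential by its scale, which is $1/\alpha$:

import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the real observations: 70 days of hypothetical counts.
count_data = rng.poisson(20, size=70)

# Rule of thumb: alpha is the inverse of the sample mean, so the prior
# mean E[lambda | alpha] = 1/alpha matches the average observed count.
alpha = 1.0 / count_data.mean()

# A large sample of prior draws should have a mean close to the data mean.
prior_draws = rng.exponential(scale=1.0 / alpha, size=100_000)
print(count_data.mean(), prior_draws.mean())  # approximately equal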

What about $\tau$? Because of the noisiness of the data, it’s difficult to pick out a priori when $\tau$ might have occurred. Instead, we can assign a uniform prior belief to every possible day. This is equivalent to saying

$$
\begin{align}
\tau &\sim \text{DiscreteUniform}(1, 70) \cr
\Rightarrow P(\tau = k) &= \frac{1}{70}
\end{align}
$$
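In code, this prior is just equal mass on each of the 70 observation days. The sketch below uses scipy.stats.randint, whose interval is half-open, so the upper bound is 71:

from scipy import stats

# tau ~ DiscreteUniform(1, 70): every day is equally likely a priori.
tau_prior = stats.randint(1, 71)

print(tau_prior.pmf(1))   # 1/70 ≈ 0.0143
print(tau_prior.pmf(35))  # same mass on every day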

So after all this, what does our overall prior distribution for the unknown variables look like? Frankly, it doesn’t matter. What we should understand is that it’s an ugly, complicated mess involving symbols only a mathematician could love. And things will only get uglier the more complicated our models become. Regardless, all we really care about is the posterior distribution.

We next turn to PyMC, a Python library for performing Bayesian analysis that is undaunted by the mathematical monster we have created.

PyMC code is easy to read. The only novel thing should be the syntax. Simply remember that we are representing the model’s components ($\tau, \lambda_1, \lambda_2$ ) as variables.

import pymc as pm
import numpy as np

with pm.Model() as model:
    alpha = 1.0 / count_data.mean()  # Recall count_data is the variable
                                     # that holds our text counts
    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)

    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1)

    idx = np.arange(n_count_data)  # day index
    # switch() assigns lambda_1 or lambda_2 as the value of lambda_,
    # depending on which side of tau we are on: the values of lambda_
    # up until tau are lambda_1, and the values afterwards are lambda_2.
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)

    # observation combines our data, count_data, with our proposed
    # data-generation scheme, given by the variable lambda_,
    # through the observed keyword.
    observation = pm.Poisson("obs", lambda_, observed=count_data)

    step = pm.Metropolis()
    trace = pm.sample(10000, tune=5000, step=step, return_inferencedata=False)

lambda_1_samples = trace["lambda_1"]
lambda_2_samples = trace["lambda_2"]
tau_samples = trace["tau"]

Recall that Bayesian methodology returns a distribution. Hence we now have distributions to describe the unknown $\lambda$s and $\tau$. What have we gained? Immediately, we can see the uncertainty in our estimates: the wider the distribution, the less certain our posterior belief should be. We can also see what the plausible values for the parameters are: $\lambda_1$ is around 18 and $\lambda_2$ is around 23. The posterior distributions of the two $\lambda$s are clearly distinct, indicating that it is indeed likely that there was a change in the user’s text-message behaviour.
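For example, a few lines of NumPy turn the sample arrays extracted above into point estimates and credible intervals (a sketch; the exact numbers depend on your data and sampler run):

import numpy as np

# Posterior point estimates and 95% credible intervals, computed
# directly from the sample arrays extracted from the trace above.
for name, samples in [("lambda_1", lambda_1_samples),
                      ("lambda_2", lambda_2_samples),
                      ("tau", tau_samples)]:
    lo, hi = np.percentile(samples, [2.5, 97.5])
    print(f"{name}: mean={samples.mean():.2f}, 95% CI=({lo:.2f}, {hi:.2f})")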


#bayesian #introduction #probabilistic #distribution #hacker #programming #statistics #pymc #inference #machine_learning