Same data, different outcome

The usual scheme

The procedure goes as follows: we have a court case in front of us and must decide whether the defendant is guilty or not. The decision can never be certain, so we can only base it on the evidence presented. The more evidence speaks against the defendant, the more confident we can be that the defendant is guilty. Without enough evidence, however, the defendant may still be guilty, but does not have to be.

A null-hypothesis test works the same way: the further the observed evidence lies in the tail of the distribution implied by the null hypothesis, the smaller the p-value and the more confidently we can reject the null hypothesis.

The twist is that, depending on how we define that distribution, the p-value can differ enough to flip the decision entirely.

With a coin

Suppose we have a coin and would like to know whether it is fair. To find out, we collect evidence by flipping it 24 times and obtain 7 heads. Is the coin fair?

The answer is: it depends on how we designed the experiment.

Intention to flip the coin 24 times and see how many heads we obtained

In this case the number of heads is binomially distributed. A two-sided binomial test yields a p-value of about 0.064, just short of significance at the usual 0.05 level. Therefore, we cannot conclude anything about the coin at the moment.

>>> import scipy.stats
>>> scipy.stats.binomtest(k=7, n=24, p=0.5).pvalue
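To see what the two-sided test adds up, here is a minimal sketch that rebuilds the p-value from the binomial probability mass function using only the standard library. It assumes the common convention of summing every outcome that is no more probable than the observed one; the helper names `binomial_pmf` and `two_sided_binomial_p` are made up for this illustration.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """P(exactly k heads in n flips) for a coin with head probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p(heads, flips, p=0.5):
    """Hypothetical helper: sum all head counts that are at most as likely as the observed one."""
    observed = binomial_pmf(heads, flips, p)
    total = 0.0
    for k in range(flips + 1):
        prob = binomial_pmf(k, flips, p)
        # small relative tolerance so floating-point ties count as equally likely
        if prob <= observed * (1 + 1e-7):
            total += prob
    return total


p_binomial = two_sided_binomial_p(7, 24)
print(round(p_binomial, 4))
```

For a fair coin the distribution is symmetric, so the sum covers 7 or fewer heads together with 17 or more heads.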

Intention to flip the coin until we get 7 heads and see how many flips we needed

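Since the intention is different, the probability distribution changes as well: the number of flips needed now follows a negative binomial distribution. This small change leads to a big change in the outcome. The one-sided p-value, the probability of needing 24 or more flips, is now about 0.017. Even doubled as a rough two-sided equivalent (about 0.035), it falls below 0.05, so we would reject the hypothesis that the coin is fair.

>>> # P(at least 24 - 7 = 17 tails before the 7th head); the cdf argument is one less so the observed count is included
>>> 1 - scipy.stats.nbinom.cdf(24 - 7 - 1, n=7, p=0.5)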
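The same tail can be worked out by hand. The sketch below is only an illustration, with the made-up helper names `flips_needed_pmf` and `tail_at_least`: it writes down the negative binomial mass function for the number of flips needed and sums the tail that starts at the observed 24 flips, so the observed outcome itself is counted.

```python
from math import comb


def flips_needed_pmf(flips, heads, p=0.5):
    """P(the heads-th head arrives exactly on flip number `flips`)."""
    # the last flip must be a head; the other heads - 1 heads fall anywhere before it
    return comb(flips - 1, heads - 1) * p**heads * (1 - p) ** (flips - heads)


def tail_at_least(flips, heads, p=0.5):
    """Hypothetical helper: P(needing `flips` or more flips to collect `heads` heads)."""
    # complement of finishing in fewer flips; at least `heads` flips are always needed
    return 1.0 - sum(flips_needed_pmf(n, heads, p) for n in range(heads, flips))


p_one_sided = tail_at_least(24, 7)
print(round(p_one_sided, 4))
```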


Always take a p-value with a grain of salt. The p-value of an event depends on the outcome space in which the event takes place: changing the outcome space does not change the event, but it does change the probability that the event happens.
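As a like-for-like illustration of this point, the sketch below compares the one-sided tails of the two designs for the same 7 heads in 24 flips, using exact fractions. The helper name `prob_at_most` is an assumption made for this example. With a fixed number of flips, the results at least as extreme are 7 or fewer heads in 24 flips; with a fixed number of heads, needing 24 or more flips is the same as seeing 6 or fewer heads in the first 23 flips.

```python
from fractions import Fraction
from math import comb


def prob_at_most(heads, flips):
    """Hypothetical helper: exact P(at most `heads` heads in `flips` fair flips)."""
    return Fraction(sum(comb(flips, k) for k in range(heads + 1)), 2**flips)


# Fixed number of flips: 7 or fewer heads in 24 flips.
fixed_flips_tail = prob_at_most(7, 24)

# Fixed number of heads: 24 or more flips needed, i.e. 6 or fewer heads in the first 23.
fixed_heads_tail = prob_at_most(6, 23)

print(float(fixed_flips_tail), float(fixed_heads_tail))
```

The observed data are identical in both lines; only the set of outcomes counted as "at least as extreme" differs, and with it the tail probability.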