Long Nguyen / Some notes on Z-Test
Created 2021-03-09 Modified 2021-03-09

593 Words

The Z-Test is a null hypothesis significance testing method I used more than once in my statistics course. Unfortunately, I could not find a reference on how to compute the P value for my problem, since common literature primarily demonstrates the critical-value-region variant, which only gives a boolean answer without any further information about the strength of the evidence.

One-sample Z-Test

Assume independent and identically distributed (i.i.d.) random variables $X_1, …, X_n$ with $X_i \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known and $\mu$ is unknown. For the following null hypothesis testing problems

$$\text{Two-sided: }H_0: \mu = \mu_0 \text{ against } H_1: \mu \neq \mu_0$$ $$\text{One-sided less: }H_0: \mu \geq \mu_0 \text { against } H_1: \mu < \mu_0$$ $$\text{One-sided greater: } H_0: \mu \leq \mu_0 \text { against } H_1: \mu > \mu_0$$

we wish to obtain the test statistic of the sample, which can be achieved by standardizing the sample mean against the hypothesized population mean

$$Z = \sqrt{n}\,\frac{\bar{X} - \mu_0}{\sigma} \sim N(\mu=0, \sigma^2=1) \text{ where } \bar{X} \text{ is the sample mean of } X_1, …, X_n$$
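
To make this concrete, here is a minimal sketch in Python; the sample data, $\mu_0$, and $\sigma$ are made-up values for illustration:

```python
import numpy as np

# Hypothetical data: n = 30 draws with known sigma = 2.0 (made up for illustration)
rng = np.random.default_rng(0)
x = rng.normal(loc=5.3, scale=2.0, size=30)

mu_0 = 5.0    # hypothesized population mean
sigma = 2.0   # known population standard deviation
n = len(x)

# Standardize the sample mean: under H_0, Z ~ N(0, 1)
z = np.sqrt(n) * (x.mean() - mu_0) / sigma
print(f"Z = {z:.4f}")
```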

Critical region

The first method to make a decision, i.e. to reject the null hypothesis based on the test statistic $Z$, is to compare $Z$ against a quantile of the standard normal distribution $N(\mu=0, \sigma^2=1)$, whose CDF we denote by $\Phi$

$$\text{Two-sided: } |Z| > \Phi^{-1}(1 - \alpha/2)$$ $$\text{One-sided less: } Z < -\Phi^{-1}(1 - \alpha)$$ $$\text{One-sided greater: } Z > \Phi^{-1}(1 - \alpha)$$

where $\alpha$ is the significance level, i.e. the accepted probability of a type I error.
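
A sketch of this decision rule, using `scipy.stats.norm.ppf` as $\Phi^{-1}$ and a hypothetical test statistic:

```python
from scipy.stats import norm

alpha = 0.05  # significance level
z = 1.8       # hypothetical test statistic from the previous step

# Reject H_0 when Z falls into the critical region
reject_two_sided = abs(z) > norm.ppf(1 - alpha / 2)
reject_less = z < -norm.ppf(1 - alpha)
reject_greater = z > norm.ppf(1 - alpha)
print(reject_two_sided, reject_less, reject_greater)
```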

P value

With the test statistic, we are also able to compute the P value directly (which is essentially just the inverse of the critical region method)

$$\text{Two-sided: } p = 2 \cdot (1 - \Phi(|Z|))$$ $$\text{One-sided less: } p = \Phi(Z)$$ $$\text{One-sided greater: } p = 1 - \Phi(Z)$$
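
The same three cases in code, with `scipy.stats.norm.cdf` as $\Phi$ (again with a hypothetical test statistic):

```python
from scipy.stats import norm

z = 1.8  # hypothetical test statistic

p_two_sided = 2 * (1 - norm.cdf(abs(z)))
p_less = norm.cdf(z)
p_greater = 1 - norm.cdf(z)
print(f"two-sided: {p_two_sided:.4f}, less: {p_less:.4f}, greater: {p_greater:.4f}")
```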

Notes for me

The heart of the Z-Test is the nice property that any random variable $X$ with mean $\mu$ and standard deviation $\sigma$ can be standardized by the linear transformation $$Z = \frac{X - \mu}{\sigma}$$

from which we obtain $E(Z) = 0$ and $Var(Z) = 1$. And since a linear transformation of a normally distributed variable is again normally distributed, we can exploit the ubiquity of the standard normal distribution’s Z-Table to compute the P value easily.
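
A quick numerical check of this property on simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=3.0, size=100_000)  # arbitrary mu and sigma

z = (x - 10.0) / 3.0      # standardize with the true mu and sigma
print(z.mean(), z.var())  # approximately 0 and 1
```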

Two-independent-sample Z-Test

Assume two i.i.d. samples $X_1, …, X_n$ and $Y_1, …, Y_m$ with $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_i \sim N(\mu_2, \sigma_2^2)$. It is assumed that $\sigma_1$ and $\sigma_2$ are known while $\mu_1$ and $\mu_2$ remain unknown. For the following testing problems (stated in terms of the population means, not the sample means)

$$\text{Two-sided: }H_0: \mu_1 = \mu_2 \text{ against } H_1: \mu_1 \neq \mu_2$$ $$\text{One-sided less: }H_0: \mu_1 \geq \mu_2 \text{ against } H_1: \mu_1 < \mu_2$$ $$\text{One-sided greater: } H_0: \mu_1 \leq \mu_2 \text{ against } H_1: \mu_1 > \mu_2$$

The computation of the Z-statistic is a bit different this time: the denominator is the standard error of the difference of the two sample means

$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_1^2 / n + \sigma_2^2 / m}} \sim N(\mu=0, \sigma^2=1)$$
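
A minimal sketch of the two-sample statistic, again on made-up data with assumed-known standard deviations:

```python
import numpy as np

# Hypothetical independent samples (made up for illustration)
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=40)   # sigma_1 = 2.0, n = 40
y = rng.normal(loc=4.5, scale=1.5, size=35)   # sigma_2 = 1.5, m = 35

sigma_1, sigma_2 = 2.0, 1.5
n, m = len(x), len(y)

# Standard error of the difference of the two sample means
se = np.sqrt(sigma_1**2 / n + sigma_2**2 / m)
z = (x.mean() - y.mean()) / se
print(f"Z = {z:.4f}")
```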

The computations of the critical region and the P value are the same as in the one-sample case from this point on.

Two-dependent-sample Z-Test

For this test to be valid, we assume two i.i.d. samples $X_1, …, X_n$ and $Y_1, …, Y_n$ from the same population, where $X_i$ and $Y_i$ are measurements of the same test object (paired samples). The test hypotheses are the same as for the two-independent-sample Z-Test; only the computation of the Z-statistic differs.

We assign $\delta_i$ as the paired difference between $X_i$ and $Y_i$, defined as $\delta_i = X_i - Y_i$, with sample mean $\bar\delta = \frac{1}{n} \sum_{i=1}^{n} \delta_i$. We also assume the $\delta_i$ to be normally distributed with known standard deviation $\sigma$; under the null hypothesis with hypothesized mean difference $\mu_0$ (typically $\mu_0 = 0$) we then have

$$Z = \sqrt{n} \frac{\bar\delta - \mu_0}{\sigma} \sim N(\mu = 0, \sigma^2=1)$$
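
And a sketch of the paired variant; the paired measurements and the known $\sigma$ of the differences are made up for illustration:

```python
import numpy as np

# Hypothetical paired measurements on the same n = 25 test objects
rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=1.0, size=25)
y = x + rng.normal(loc=0.3, scale=0.5, size=25)

delta = x - y   # paired differences
mu_0 = 0.0      # hypothesized mean difference under H_0
sigma = 0.5     # assumed known standard deviation of the differences
n = len(delta)

z = np.sqrt(n) * (delta.mean() - mu_0) / sigma
print(f"Z = {z:.4f}")
```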

The computations of the critical region and the P value are again the same from this point on.