Hypothesis Testing
Hypothesis Testing, Analysis of Variance
[!tldr] Summary At the turn of the twentieth century, practical problems in production prompted a key advance in statistics. William Sealy Gosset, a statistician employed by the Guinness brewery, needed a reliable way to tell whether changes in process or ingredients really altered batch yields when only a handful of trials could be run. He developed a simple, pragmatic approach: run matched small experiments, compute the differences for each pair, and ask whether the observed differences are plausibly just sampling noise. To handle the extra uncertainty that comes with very small samples, Gosset derived the sampling distribution now known as Student’s t, on which is based a test that compares the sample mean to the estimated variation in the data.
Principle of Hypothesis Testing
Generalities
Suppose we are given a random variable \(X\) whose probability distribution depends on the hypotheses we wish to test. Suppose there are several perfectly known hypotheses \(\mathcal{H}_{0}, \mathcal{H}_{1}, \ldots, \mathcal{H}_{n}\) (finite or infinite, countable or not) and that the probability distribution of \(X\) depends on which hypothesis is true. The test will allow us to make a judgment on the hypothesis made and to evaluate the degree of validity of the judgment, based on the value taken by \(X\) on a sample.
We will first study the case where we have two simple hypotheses \(\mathcal{H}_{0}\) and \(\mathcal{H}_{1}\). A hypothesis is said to be simple if it completely and uniquely defines the probability distribution of \(X\); otherwise, it is said to be composite. For example, in the presence of a batch of parts distinguished as suitable and defective, the two hypotheses:
- \(\mathcal{H}_0\): the batch contains 5% defects
- \(\mathcal{H}_{1}\): the batch contains 10% defects

are simple hypotheses since each of them fully defines the batch. By contrast, the two hypotheses:

- \(\mathcal{H}_0\): the batch contains 5% defects or fewer
- \(\mathcal{H}_{1}\): the batch contains more than 5% defects

are composite hypotheses since neither one fully defines the batch.
Suppose then that there are two simple hypotheses \(\mathcal{H}_{0}\) and \(\mathcal{H}_{1}\), covering all possibilities. This means that one or the other of the two is necessarily true. In this case, it is possible to make one of the two judgments:
- \(\mathcal{H}_0\) is true, therefore \(\mathcal{H}_{1}\) is false
- \(\mathcal{H}_{1}\) is true, therefore \(\mathcal{H}_{0}\) is false
We can symbolize this set by the table below where the possible states are in the first row and the judgments made are in the first column. The table contains the consequences of the different combinations.
| Judgment | \(\mathcal{H}_0\) is realised | \(\mathcal{H}_1\) is realised |
|---|---|---|
| \(\mathcal{H}_0\) accepted | Correct judgment | False judgment |
| \(\mathcal{H}_1\) accepted | False judgment | Correct judgment |
Among the two hypotheses, \(\mathcal{H}_{0}\) and \(\mathcal{H}_{1}\), there is generally one whose incorrect rejection has more serious consequences than the other. It is therefore normal not to treat the two hypotheses symmetrically. Thus, we can make two types of errors:
- Type I error: rejecting \(\mathcal{H}_{0}\) when \(\mathcal{H}_{0}\) is true. Its probability is the type I error risk.
- Type II error: accepting \(\mathcal{H}_{0}\) when \(\mathcal{H}_{1}\) is true. Its probability is the type II error risk.

To link the judgment made to the observation of the variable \(X\), we proceed as follows:
- We say that we accept \(\mathcal{H}_{0}\) if the observed value of \(X\), say \(x\), is in a certain domain \(\mathcal{A}\) called the acceptance region of the hypothesis \(\mathcal{H}_{0}\).
- We say that we reject \(\mathcal{H}_{0}\) and accept \(\mathcal{H}_{1}\) if the observed value belongs to \(\bar{\mathcal{A}}\), called the critical region or rejection region.
To choose the domain \(\mathcal{A}\), we generally impose two conditions:
- The probability of making a type I error is equal to a determined threshold \(\alpha\) chosen a priori as small as desired
- The probability \(\beta\) of making a type II error is minimal
In general, the first condition is not sufficient to uniquely define \(\mathcal{A}\). We have now:
| Judgment | \(\mathcal{H}_0\) is realised | \(\mathcal{H}_1\) is realised |
|---|---|---|
| \(\mathcal{H}_0\) accepted, realisation of \(\{ X \in \mathcal{A}\}\) | Correct judgment: \(1 - \alpha\) | False judgment, type II error: \(\beta\) |
| \(\mathcal{H}_1\) accepted, realisation of \(\{ X \notin \mathcal{A} \}\) | False judgment, type I error: \(\alpha\) | Correct judgment: \(1 - \beta\) |
Such a model of reasoning is called hypothesis testing. The complement of \(\beta\), i.e., \(1 - \beta\), is called the power of the test: for a fixed type I error risk, the smaller the type II error risk, the more powerful the test. The hypothesis \(\mathcal{H}_{0}\) on which the test is conducted is called the null hypothesis.
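To make the two risks concrete, here is a minimal sketch built on the 5% versus 10% defect example above. The sampling plan is an assumption introduced for illustration and is not part of the text: we inspect a hypothetical sample of `n = 50` parts and accept \(\mathcal{H}_{0}\) when at most `c = 5` of them are defective. The number of defective parts is modelled as a binomial variable.

```python
from math import comb

def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

# Hypothetical sampling plan (assumption, not from the text): inspect n parts
# and accept H0 when the number of defective parts is at most c.
n, c = 50, 5
p0, p1 = 0.05, 0.10  # defect rates under H0 and H1 (simple hypotheses of the example)

alpha = 1 - binom_cdf(c, n, p0)  # type I risk: reject H0 although H0 is true
beta = binom_cdf(c, n, p1)       # type II risk: accept H0 although H1 is true
power = 1 - beta                 # power of the test

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Changing `c` shows the trade-off between the two risks: for a fixed sample size, enlarging the acceptance region lowers the type I risk and raises the type II risk.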
Case of composite hypotheses
In real life, the problem is often not to choose between two simple hypotheses \(\mathcal{H}_{0}\) and \(\mathcal{H}_{1}\), but between a simple hypothesis \(\mathcal{H}_{0}\) and a set of hypotheses \(\mathcal{H}_{1}, \mathcal{H}_{2}, \ldots , \mathcal{H}_{n}\) or a continuous set of hypotheses \(\mathcal{H}\). We can reduce the first problem to the previous case by successively comparing \(\mathcal{H}_{0}\) to each \(\mathcal{H}_{i}\) and finding the corresponding regions \(\mathcal{A}_{i}\). We thus obtain a set of acceptance regions \(\mathcal{A}_{1}, \mathcal{A}_{2}, \ldots , \mathcal{A}_{i}, \ldots , \mathcal{A}_{n}\). In a special case, the acceptance regions have a common part \(\mathcal{A}\). The test based on this domain is said to be uniformly most powerful (UMP). Indeed, with \(\mathcal{A}\) as acceptance region, the type I error risk is equal to \(\alpha\) and the type II error risk is minimal, whichever alternative hypothesis \(\mathcal{H}\) holds. Thus, depending on the form of the alternative hypothesis \(\mathcal{H}\), we are led to concentrate the risk \(\alpha\) at one end of the distribution or, on the contrary, to distribute it at both ends.
Suppose, for example, that we are comparing two hypotheses of the form \(\mathcal{H}_{0} \ : \ \theta = \theta_{0}\) and \(\mathcal{H}_{1} \ : \ \theta > \theta_{0}\). We are led to what is called a right-tailed test, where the type I error risk is placed on the right of the distribution of the decision variable \(X\) (under \(\mathcal{H}_0\)). Similarly, the hypothesis test of the form \(\mathcal{H}_{0} \ : \ \theta = \theta_{0}\) and \(\mathcal{H}_{1} \ : \ \theta < \theta_{0}\) leads to a left-tailed test. Finally, in the case of hypotheses of the form \(\mathcal{H}_{0} \ : \ \theta = \theta_{0}\) and \(\mathcal{H}_{1} \ : \ \theta \neq \theta_{0}\), it seems logical to distribute the risk \(\alpha\) equally at both ends of the distribution. The test is then said to be symmetric or two-tailed.
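The three placements of the risk \(\alpha\) can be sketched numerically. The snippet below assumes, purely for illustration, that the decision variable follows a standard normal law under \(\mathcal{H}_{0}\); the variable names are hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # assumed law of the decision variable under H0: N(0, 1)

# Right-tailed test: the whole risk alpha sits in the right tail.
right_tail = std_normal.inv_cdf(1 - alpha)    # reject H0 if x > right_tail
# Left-tailed test: the whole risk alpha sits in the left tail.
left_tail = std_normal.inv_cdf(alpha)         # reject H0 if x < left_tail
# Two-tailed test: alpha / 2 in each tail.
two_tail = std_normal.inv_cdf(1 - alpha / 2)  # reject H0 if |x| > two_tail

print(f"right-tailed: x > {right_tail:.3f}")
print(f"left-tailed:  x < {left_tail:.3f}")
print(f"two-tailed:   |x| > {two_tail:.3f}")
```

For the same \(\alpha\), the two-tailed threshold is further from zero than the one-tailed thresholds, because each tail only receives half of the risk.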
General Approach of a Test
The general approach of a test consists of the following:
- Choose and formulate the hypotheses \(\mathcal{H}_{0}\) and \(\mathcal{H}_{1}\)
- Determine the decision variable, the one with which the test will actually be conducted and which will be evaluated on a sample, as well as its probability distribution under the null hypothesis
- Choose the risk \(\alpha\) and compute the acceptance region or, preferably, the critical region that depends on it.
- Optionally, compute the power of the test \(1 - \beta\)
- Compute the experimental value of the decision variable from a sample.
- Compare it to the critical region to conclude by rejecting or not rejecting the hypothesis \(\mathcal{H}_{0}\).
It should be kept in mind, given the asymmetry of the two tested hypotheses, that the conclusion of the test is stronger when we reject \(\mathcal{H}_{0}\): failing to reject \(\mathcal{H}_{0}\) does not establish that it is true. Thus, most often, we choose the hypotheses such that \(\mathcal{H}_{1}\) is the one in favor of which we would like to conclude.
Conformity Tests to a Standard
Reminder of Common Probability Laws
Check the other notes
Conformity Test on the Mean of a Normal Random Variable with Known Variance
This test is based on the variable \(\bar{X}\), the empirical mean of an \(n\)-sample. The procedure is:
- Make the hypothesis \(\mathcal{H}_{0} \ : \mu = \mu_{0}\), the alternative hypothesis being \(\mathcal{H}_{1} \ : \mu \neq \mu_{0}\).
- It follows, under \(\mathcal{H}_{0}\), that \(\bar{X}\) follows the distribution \(\mathcal{N}(\mu_{0}, \frac{\sigma^{2}}{n})\) and consequently, \(U = \frac{\bar{X} - \mu_{0}}{\sigma/\sqrt{n}}\sim\mathcal{N}(0,1)\).
- Let us set a risk \(\alpha\). This determines an interval \([-u_{\alpha/2},u_{\alpha/2}]\) in which the decision variable \(U\) has probability \(1- \alpha\) of falling if \(\mathcal{H}_{0}\) is correct and, consequently, outside of which \(U\) has a small probability \(\alpha\) of falling. In other words, if \(\mathcal{H}_{0}\) is true, \(\mathbb{P}(\lvert U \rvert > u_{\alpha/2}) = \alpha\), and neglecting the probability \(\alpha\) means considering it very unlikely to find \(U\) outside the interval if the hypothesis is true.
- From the data of the sample actually obtained \((x_{1}, x_{2}, \ldots, x_{n})\) compute the value \(u\) of \(U\) and place it relative to the interval \([-u_{\alpha/2},u_{\alpha/2}]\). Conclude as follows:
- If \(u\) falls outside the interval, we prefer to reject the hypothesis, knowing however that we assume the risk \(\alpha\) of rejecting it wrongly.
- If \(u\) falls inside the interval, this does not mean, unfortunately, that the hypothesis made is true, but only that the data collected are not in contradiction with this hypothesis. In other words, we are unable to conclude either in favor or against the hypothesis. We will see that in practical applications, this is generally less problematic than it seems, because it is against a wrong rejection of the hypothesis that we must guard, the conservation of the hypothesis corresponding to the status quo.
Example 1
Imperial specifications state that each TIE Fighter produced on Lothal must have a shield energy of 2.5 units. A random sample of 100 TIE Fighters is tested and analyzed. They have an average shield energy of 2.6 units, with a known standard deviation of 0.4 units and a normal distribution. Does the tested batch meet specifications at a 5% risk level?
We test \(\mathcal{H}_{0} : \mu = 2.5\) against \(\mathcal{H}_{1}: \mu \neq 2.5\) (two-tailed test). Under \(\mathcal{H}_{0}\), the statistic
\[U = \frac{\bar{X}- \mu_{0}}{\sigma/\sqrt{n}} = \frac{\bar{X}-2.5}{0.4/\sqrt{100}} \sim \mathcal{N}(0,1)\]
At \(\alpha=5\%\), the critical region is \(\{U < -1.96 \} \cup \{U > 1.96\}\). We compute:
\[u = \frac{2.6 - 2.5}{0.4/10}=2.5\]
Since \(2.5 > 1.96\), \(u\) falls in the critical region. We reject \(\mathcal{H}_{0}\) at the 5% level: the tested batch does not meet the specification.
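The computation of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_test_two_tailed` is a hypothetical helper, not an established API.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed conformity test on the mean with known sigma.

    Returns (u, critical value u_{alpha/2}, True if H0 is rejected).
    """
    u = (x_bar - mu0) / (sigma / sqrt(n))         # observed decision variable
    u_crit = NormalDist().inv_cdf(1 - alpha / 2)  # quantile of N(0, 1)
    return u, u_crit, abs(u) > u_crit

# Data of Example 1: 100 TIE Fighters, mean 2.6, known sigma 0.4, target 2.5.
u, u_crit, reject = z_test_two_tailed(x_bar=2.6, mu0=2.5, sigma=0.4, n=100)
print(f"u = {u:.2f}, critical value = {u_crit:.2f}, reject H0: {reject}")
```

The critical value is obtained from the inverse cumulative distribution function instead of a printed table, which is why it appears with more decimals than the 1.96 used in the text.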
Conformity Test on the Mean of a Normal Variable with Unknown Variance
This test is based on the variables \(\bar{X}\) and \(S^{2}\), respectively the empirical mean and variance of an \(n\)-sample. We know that, under \(\mathcal{H}_{0} : \mu = \mu_{0}\): \(T = \frac{\bar{X}- \mu_{0}}{S/\sqrt{n-1}} \sim \mathcal{T}(n-1)\). This is our decision variable. The test then consists of placing the quantity \(t = \frac{\bar{x} - \mu_{0}}{s/\sqrt{n-1}}\) relative to the interval \([-t_{\alpha/2},t_{\alpha/2}]\) read in the Student’s t-table with \((n-1)\) degrees of freedom and defined such that \(\mathbb{P}(\lvert T \rvert > t_{\alpha/2})=\alpha\).
Example 2
Let us return to Example 1, now assuming that the standard deviation of the population is unknown and that the standard deviation measured on the sample is 0.6.
We test the hypothesis \(\mathcal{H}_{0}: \mu=2.5\) against \(\mathcal{H}_{1}: \mu \neq 2.5\) (two-tailed test). Under \(\mathcal{H}_{0}\), the variable \(T = \frac{\bar{X} - \mu_{0}}{S/\sqrt{n-1}}\) follows \(\mathcal{T}(n-1) = \mathcal{T}(99)\). At the 5% level, the critical region is \(\{T < -1.98 \} \cup \{T > 1.98\}\). We compute \(t = \frac{2.6 - 2.5}{0.6/\sqrt{99}} = 1.66\), which is not in the critical region. We cannot reject \(\mathcal{H}_0\) at level \(\alpha = 5\%\).
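As a quick check of Example 2, the sketch below recomputes the statistic with the document's convention (empirical standard deviation with divisor \(n\), hence the \(\sqrt{n-1}\) in the formula). The critical value 1.98 is taken from the text, since the Python standard library has no Student quantile function.

```python
from math import sqrt

# Data of Example 2
n, x_bar, s, mu0 = 100, 2.6, 0.6, 2.5
t_crit = 1.98  # two-tailed 5% critical value for T(99), as read in the text

# Decision variable with the document's convention: S uses divisor n.
t = (x_bar - mu0) / (s / sqrt(n - 1))
reject = abs(t) > t_crit

print(f"t = {t:.2f}, reject H0: {reject}")
```

With the same sample mean as in Example 1, the larger and now estimated standard deviation is enough to move the statistic out of the critical region.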
Conformity Test on the Variance of a Normal Variable
This test is based on the variable \(S^{2}\), the empirical variance of an \(n\)-sample. Assuming the hypothesis \(\mathcal{H}_{0} \ : \ \sigma^{2} = \sigma_{0}^{2}\), we know that: \(Z = \frac{nS^{2}}{\sigma_{0}^{2}} \sim \chi^{2}(n-1)\). It follows that, if the hypothesis is true, \(Z\) has probability \(1 - \alpha\) of falling in the interval \([z_{1}, z_{2}]\), where \(z_{1}\) and \(z_{2}\) are read in the table of the \(\chi^{2}\) distribution with \((n-1)\) degrees of freedom and are such that \(\mathbb{P}(Z < z_{1}) = \mathbb{P}(Z > z_{2}) = \alpha/2\). It is then sufficient to compute the value \(z = ns^{2}/\sigma^{2}_{0}\) from the observations and to place it relative to the interval \([z_{1}, z_{2}]\) to conclude.
Example 3: Resistance Variability of Kylo Ren’s Lightsaber Crystals
A user of Kylo Ren’s lightsaber crystals suspects that the standard deviation of their resistance is greater than the manufacturer’s specified value of 0.75 newtons. To verify this, resistance tests are conducted on several crystals, yielding the following values:
| # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Resistance (N) | 72.1 | 74.5 | 72.8 | 75.0 | 73.4 | 75.4 | 76.1 | 73.5 | 74.1 |
Assuming the resistance is normally distributed, can we conclude that the standard deviation exceeds the specification at a 5% significance level?
- Formulate the hypotheses:
- \[H_{0}: \sigma^{2}= 0.75^{2}\]
- \[H_{1}: \sigma^{2}> 0.75^2\]
- Compute the empirical variance and standard deviation
\(s^{2}=1.4667, \quad s=1.2111\)
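- Under \(H_{0}\), the variable \(Z = \frac{9S^2}{\sigma_{0}^2}\) follows a \(\chi^{2}(9-1)\) distribution. Since the test is right-tailed, at a 5% significance level the critical region is \(\{Z > 15.5\}\).
- Compute \(z\):
\(z = \frac{9 \times 1.4667}{0.75^{2}}= 23.47\). This value exceeds the critical value, so we reject \(H_{0}\) at the 5% level and conclude that the standard deviation exceeds the specification.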
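The figures of Example 3 can be recomputed from the raw data. This is a small sketch using the standard library; `pvariance` divides by \(n\), which matches the empirical variance \(S^{2}\) used in this note. The critical value 15.5 is the one quoted in the text.

```python
from statistics import pvariance

# Resistance measurements of Example 3 (N)
resistance = [72.1, 74.5, 72.8, 75.0, 73.4, 75.4, 76.1, 73.5, 74.1]
sigma0 = 0.75     # standard deviation specified by the manufacturer
chi2_crit = 15.5  # right-tailed 5% critical value for chi2(8), from the text

n = len(resistance)
s2 = pvariance(resistance)  # empirical variance, divisor n
z = n * s2 / sigma0**2      # observed value of the decision variable
reject = z > chi2_crit      # right-tailed test

print(f"s^2 = {s2:.4f}, z = {z:.2f}, reject H0: {reject}")
```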
Paired Data Test
We want to compare two measuring devices and, to do this, we use \(n\) supports by proceeding, on each of them, to two measurements using the two devices under examination. The two measurements will be said to be paired. We present the results as
| Article | 1 | 2 | … | \(i\) | … | \(n\) |
|---|---|---|---|---|---|---|
| Series 1 | \(x_1\) | \(x_2\) | … | \(x_i\) | … | \(x_n\) |
| Series 2 | \(y_1\) | \(y_2\) | … | \(y_i\) | … | \(y_n\) |
Let \(d_{i}= (y_{i}-x_{i})\) and let \(\bar{d}\) and \(s_{d}\) be the mean and standard deviation of the differences. We assume that the \(d_i\) are realisations of a variable \(D\) that follows a normal distribution. The test of the hypothesis \(\mathcal{H}_{0}: \ \mathbb{E}(D) = 0\) is the conformity test on the mean with unknown variance presented earlier, with \(\mu_{0} = 0\).
Example 4: Precision of Macaron Diameter Measurements
We have two methods for measuring the diameter of macarons. On a sample of 9 macarons, each measured with both methods, we obtain the following results; the last row gives the differences \(d = A - B\). At a 5% significance level, is there a systematic bias between the two methods?
| Method A | 1.286 | 1.251 | 1.322 | 1.339 | 1.250 | 1.402 | 1.365 | 1.437 | 1.459 |
|---|---|---|---|---|---|---|---|---|---|
| Method B | 1.061 | 1.092 | 1.363 | 1.362 | 1.265 | 1.278 | 1.237 | 1.386 | 1.352 |
| d=A−B | 0.225 | 0.159 | -0.041 | -0.023 | -0.015 | 0.124 | 0.128 | 0.051 | 0.107 |
Let \(\mu_{D}= \mathbb{E}(D)\). We test the hypothesis:
- \[H_{0}: \mu_{D}= 0\]
- \[H_{1}: \mu_{D}\neq 0\]
Under \(H_0\), the variable \(T=\frac{\bar{D}-\mu_{D}}{S/\sqrt{n-1}}=\frac{\bar{D}-0}{S/\sqrt{8}}\) follows a \(\mathcal{T}(8)\) distribution. At a 5% significance level, the critical region is \(\{T < -2.306\} \cup \{T > 2.306\}\). Computation of \(t\):
\[t= \frac{0.0794 - 0}{0.0865/\sqrt{8}}=2.598\]
This value falls in the critical region. Conclusion: We reject \(H_0\) and conclude that there is a significant difference between the two methods at the 5% significance level.
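The paired test of Example 4 can be checked from the two measurement series. The sketch below follows the document's convention (standard deviation with divisor \(n\), computed with `pstdev`) and uses the critical value 2.306 quoted in the text.

```python
from math import sqrt
from statistics import mean, pstdev

# Measurements of Example 4
method_a = [1.286, 1.251, 1.322, 1.339, 1.250, 1.402, 1.365, 1.437, 1.459]
method_b = [1.061, 1.092, 1.363, 1.362, 1.265, 1.278, 1.237, 1.386, 1.352]
t_crit = 2.306  # two-tailed 5% critical value for T(8), from the text

d = [a - b for a, b in zip(method_a, method_b)]  # paired differences A - B
n = len(d)
d_bar = mean(d)
s_d = pstdev(d)                  # empirical standard deviation, divisor n
t = d_bar / (s_d / sqrt(n - 1))  # decision variable under H0: E(D) = 0
reject = abs(t) > t_crit

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t:.3f}, reject H0: {reject}")
```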
Comparison Tests of 2 Normal Populations
Comparing two normal populations amounts to asking whether they have the same mean and the same variance since these two parameters are sufficient to fully determine a normal distribution. For theoretical reasons that will appear in a following paragraph, the comparison of variances must precede that of means.
Variance Comparison
Let \(n_1\) and \(s_{1}^{2}\) be the size and variance of the sample taken from the first population, and let \(n_{2}\) and \(s_{2}^{2}\) be the size and variance of the sample taken from the second population. We know that the unbiased estimates of the variances \(\sigma_{1}^{2}\) and \(\sigma_{2}^{2}\) of the two populations are written as: \(s_{1}^{*2}= \frac{n_{1}s_{1}^{2}}{n_{1}-1} \ \text{and} \ s_{2}^{*2}= \frac{n_{2}s_{2}^{2}}{n_{2}-1}\). Under the null hypothesis of equality of the variances of the two populations (\(\sigma_{1}^{2}=\sigma_{2}^{2}\)), these two estimates differ only due to the randomness of sampling. The same is true for their ratio \(f=s_{1}^{*2}/s_{2}^{*2}\), which differs from 1 only due to the randomness of sampling. The statistician Ronald Aylmer Fisher, a British biologist and mathematician and author of the classic test we are about to present, retained this form and computed the probability distribution of the variable: \(F(\nu_{1}, \nu_{2}) = \frac{Z_{1}/\nu_{1}}{Z_{2}/\nu_{2}}\) where \(Z_{1}\) and \(Z_{2}\) are two independent random variables that follow \(\chi^{2}\) distributions with \(\nu_{1}\) and \(\nu_{2}\) degrees of freedom, respectively.
Theorem:
If we denote by \(S_{1}^{2}\) and \(S_{2}^{2}\), the empirical variances of two samples taken from normal populations, then the variable:
\[F = \frac { \frac{n_{1}S_{1}^{2}}{\sigma_{1}^{2}} /(n_{1}-1) } { \frac{n_{2}S_{2}^{2}}{\sigma_{2}^{2}} /(n_{2}-1) } = \frac { S_{1}^{*2}/\sigma_1^{2} } { S_{2}^{*2}/\sigma_2^{2} }\]
follows a Fisher distribution with \(n_{1}-1\) and \(n_{2}-1\) degrees of freedom, denoted \(\mathcal{F}(n_{1}-1; n_{2}-1)\).
Corollary
Thus, under the hypothesis \(\mathcal{H}_{0} \ : \ \sigma_{1}^{2}=\sigma_{2}^{2}=\sigma^{2}\), we have: \(F = \frac { n_{1}S_{1}^{2}/(n_{1}-1) } { n_{2}S_{2}^{2}/(n_{2}-1) } = \frac{S_{1}^{*2}}{S_{2}^{*2}} \sim \mathcal{F}(n_{1}-1, n_{2}-1)\)
This variable being defined, the sequence of operations is now well known. Setting a negligible probability threshold \(\alpha\), we read from the Fisher-Snedecor table with \(n_{1}-1\) and \(n_{2}-1\) degrees of freedom the values \(f_1\) and \(f_2\) such that \(\mathbb{P}(F < f_{1}) = \mathbb{P}(F > f_{2}) = \alpha/2\), and we reject the hypothesis of equal variances if the observed ratio \(f\) falls outside the interval \([f_{1}, f_{2}]\). More often called Fisher’s test, this test sometimes takes the name Fisher-Snedecor test, Snedecor’s test, or F-test.
Estimation of \(\sigma^{2}\)
Assuming that the result of the previous test does not oppose the hypothesis of equality of variances, it is useful to estimate the common value \(\sigma^{2}\) of the variances of the two populations. We will need this particularly for the mean comparison test presented in the following section.
Since, under the hypothesis of equality of variances, \(\frac{n_{1}S_{1}^{2}}{\sigma^{2}}\) and \(\frac{n_{2}S_{2}^{2}}{\sigma^{2}}\) are independent variables that follow \(\chi^{2}\) distributions with \(n_{1}-1\) and \(n_{2}-1\) degrees of freedom, respectively, their sum \(\frac{n_{1}S_{1}^{2} + n_{2}S_{2}^{2}}{\sigma^{2}}\) follows a \(\chi^{2}\) distribution with \(n_{1}+n_{2}-2\) degrees of freedom, whose expectation and variance are \(n_{1}+n_{2}-2\) and \(2(n_{1}+ n_{2} -2)\), respectively. It follows:
Theorem
Under the hypothesis \(\mathcal{H}_{0}\ : \ \sigma_{1}^{2}=\sigma_{2}^{2}=\sigma^{2}\), the variable \(S^{*2} = \frac {n_{1}S_{1}^{2} + n_{2}S_{2}^{2}} {n_{1}+n_{2}-2}\) is an unbiased estimator of \(\sigma^{2}\).
Mean comparison
Once the equality of variances has been tested in the case of normal populations, it is sufficient to test the equality of means to consider that the populations are identical. The theoretical reasons that lead to presenting the comparison of variances before that of means can, at this stage, be explained. Indeed, the test for comparing variances made no assumption about the equality of means. On the other hand, the test for equality of means requires the equality of variances. It is therefore necessary to verify this equality before being interested in the means.
That being said, let there be two populations \(\mathcal{P}_{1}\) and \(\mathcal{P}_{2}\) characterised by normal random variables \(X_{1}\) and \(X_{2}\) with respective expectations \(\mu_{1}\) and \(\mu_{2}\), but with the same variance \(\sigma^{2}\). Let \(n_{1}\) and \(n_{2}\) be the sizes of the two samples \(\mathcal{E}_1\) and \(\mathcal{E}_{2}\) randomly taken from each of these two populations; let \(\bar{x}_{1}\) and \(\bar{x}_{2}\) be their means and \(s_{1}^{2}\) and \(s_{2}^{2}\) their variances. Under these conditions, we can consider that:
- \(\bar{x}_{1}\) is a realisation of the variable \(\bar{X}_{1}\) which follows the law \(\mathcal{N}(\mu_{1}; \sigma^{2}/n_1)\)
- \(\bar{x}_{2}\) is a realisation of the variable \(\bar{X}_{2}\) which follows the law \(\mathcal{N}(\mu_{2}; \sigma^{2}/n_2)\)
- \(s_{1}^{2}\) and \(s_{2}^{2}\) are realisations of the variables \(S_{1}^{2}\) and \(S_{2}^{2}\) such that the variable \(\frac{n_{1}S_{1}^{2} + n_{2}S_{2}^{2}}{\sigma^{2}}\) follows the law \(\chi^{2}(n_{1}+ n_{2}-2)\) and is independent of \(\bar{X}_{1}\) and \(\bar{X}_{2}\).
Let’s now make the hypothesis \(\mathcal{H}_{0}: \ \mu_{1}= \mu_{2} = \mu\). It follows that the variable \(\bar{X}_{1}- \bar{X}_{2}\) follows a normal distribution with zero expectation and variance equal to the sum of the variances of \(\bar{X}_{1}\) and \(\bar{X}_{2}\), i.e. \(\sigma^{2}(\frac{1}{n_{1}} + \frac{1}{n_{2}})\). Therefore:
Theorem
Given two variables \(X_{1}\sim \mathcal{N}(\mu_{1}, \sigma^{2})\) and \(X_{2}\sim \mathcal{N}(\mu_{2}, \sigma^{2})\) under the hypothesis \(\mathcal{H}_{0}: \ \mu_{1} =\mu_{2}\), we have: \(U = \frac{\bar{X}_{1}- \bar{X}_{2}}{\sigma \sqrt{\frac{1}{n_{1}}+ \frac{1}{n_2}}} \sim \mathcal{N}(0,1)\)
To eliminate the unknown quantity \(\sigma\), it is sufficient to consider the quotient:
\[T = \frac { \frac{ \bar{X}_{1}-\bar{X}_{2} }{\sigma \sqrt{\frac{1}{n_{1}}+ \frac{1}{n_{2}}}} } { \sqrt{ \frac{ n_{1}S_{1}^{2}+ n_{2}S_{2}^{2} }{\sigma^{2}(n_{1}+n_{2}-2)} } } = \frac{\bar{X}_{1} - \bar{X}_{2}} {\sqrt{ \frac { n_{1}S_{1}^{2}+ n_{2}S_{2}^2 }{n_{1}+n_{2}-2} \left( \frac{1}{n_{1}}+ \frac{1}{n_2} \right)}} = \frac{\bar{X}_{1}- \bar{X}_{2}} {S^{*} \sqrt{ \frac{1}{n_{1}} + \frac{1}{n_{2}} }}\]
We then have:
Theorem
Given two variables \(X_{1}\sim \mathcal{N}(\mu_{1}, \sigma^{2})\) and \(X_{2}\sim \mathcal{N}(\mu_{2}, \sigma^{2})\) with \(\sigma^{2}\) unknown, under the hypothesis \(\mathcal{H}_{0}: \ \mu_{1}= \mu_{2}\): \(T = \frac{\bar{X}_{1}- \bar{X}_{2}} {S^{*} \sqrt{ \frac{1}{n_{1}} + \frac{1}{n_{2}} }} \sim \mathcal{T}(n_{1}+n_{2}-2)\). Therefore, under the hypothesis of equality of the means of the two populations considered, the quantity \(t = \frac{\bar{x}_{1}- \bar{x}_{2}} {s^{*} \sqrt{ \frac{1}{n_{1}} + \frac{1}{n_{2}} }}\) is the realisation of a Student’s t variable with \(n_{1}+n_{2}-2\) degrees of freedom which, to conclude, we only need to place relative to the interval \([-t_{\alpha/2};t_{\alpha/2}]\) corresponding to the chosen risk \(\alpha\). If \(t\) does not belong to the interval, we say that the difference between the observed means is significant at the risk \(\alpha\); otherwise, that it is not significant.
Estimation of the difference in population means
If the observed difference between the sample means \(\bar{x}_{1}\) and \(\bar{x}_{2}\) is significant (that is, indicative of a difference between the population means \(\mu_{1}\) and \(\mu_{2}\)), it may be useful to estimate the difference \(\Delta = \mu_{1}- \mu_{2}\). The variable \(\bar{X}_{1}- \bar{X}_{2}\) is an unbiased estimator of \(\Delta\). The confidence interval is based on the variable: \(T = \frac{(\bar{X}_{1}- \bar{X}_{2}) - \Delta} {S^{*} \sqrt{\frac{1}{n_{1}}+ \frac{1}{n_{2}}} } \sim \mathcal{T}(n_{1}+ n_{2}-2)\). We therefore have, at the risk \(\alpha\) (confidence level \(1 - \alpha\)):
\[(\bar{x}_{1}-\bar{x}_{2}) - t_{\alpha/2}\times s^{*}\sqrt{\frac{1}{n_{1}}+ \frac{1}{n_{2}}} < \Delta < (\bar{x}_{1}-\bar{x}_{2}) + t_{\alpha/2}\times s^{*}\sqrt{\frac{1}{n_{1}}+ \frac{1}{n_{2}}}\]
Example 5: Comparing Two Macaron Recipes
Let’s revisit Example 4, but now assume the values in the table come from two distinct samples. In other words, they are no longer paired. At a 5% significance level, is there a difference between the two recipes?
We recall the data below:
| Recipe A | 1.286 | 1.251 | 1.322 | 1.339 | 1.250 | 1.402 | 1.365 | 1.437 | 1.459 |
|---|---|---|---|---|---|---|---|---|---|
| Recipe B | 1.061 | 1.092 | 1.363 | 1.362 | 1.265 | 1.278 | 1.237 | 1.386 | 1.352 |
We must first compare the variances under the hypothesis \(H_{0}: \sigma^{2}_{A}=\sigma_{B}^{2}\). The alternative hypothesis is \(H_{1} : \sigma_{A}^{2} \neq \sigma_{B}^{2}\). This is a two-tailed test. We have:
\[F = \frac{n_{A}S_{A}^{2}/(n_{A}-1)}{n_BS_{B}^{2}/(n_{B}-1)} = \frac{9S^{2}_{A}/8}{9S_{B}^{2}/8} =\frac{S_{A}^{2}}{S_{B}^{2}} \sim \mathcal{F}(8;8)\]
At a 5% significance level, the critical region is \(\{F < 0.22\}\cup \{F > 4.43\}\). We compute \(f=\frac{s_{A}^{2}}{s_{B}^{2}} =0.41\), which does not fall in the critical region. We therefore cannot reject the hypothesis of equal variances. For the rest of the exercise, we compute an estimate of the common variance:
\[s^{*2} = \frac{n_{A}s_{A}^{2}+n_{B}s_{B}^{2}}{n_{A}+n_{B}-2} = 0.01\]
We now test the hypothesis of equal means, \(H_{0}: \mu_{A}= \mu_{B}\). The alternative hypothesis is \(H_{1}: \mu_{A}\neq \mu_{B}\). Under \(H_{0}\), we have:
\[T = \frac{\bar{X}_{A}-\bar{X}_{B}}{S^{*}\sqrt{\frac{1}{n_{A}}+ \frac{1}{n_{B}}}} = \frac{\bar{X}_{A}-\bar{X}_{B}}{S^{*}\sqrt{\frac{1}{9}+\frac{1}{9}}} \sim \mathcal{T}(n_{A}+n_{B}-2) = \mathcal{T}(16)\]
At a 5% significance level, the critical region is \(\{\lvert T \rvert > 2.12\}\). We compute \(t = 1.68\). Since \(t=1.68\) is less than 2.12, it does not fall in the critical region. We therefore cannot reject \(H_0\). We conclude that there is no significant difference between the two means, unlike what we obtained with the paired test in Example 4.
This result illustrates that the paired test, when applicable, is more powerful than the mean comparison test preceded by the variance comparison test.
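The two steps of Example 5 (variance comparison, then mean comparison with the pooled variance) can be sketched as follows. The critical values 0.22, 4.43 and 2.12 are those quoted in the text; the confidence interval at the end is an addition for illustration, built from the interval formula of the previous section.

```python
from math import sqrt
from statistics import mean, pvariance

# Data of Example 5, treated as two independent samples
recipe_a = [1.286, 1.251, 1.322, 1.339, 1.250, 1.402, 1.365, 1.437, 1.459]
recipe_b = [1.061, 1.092, 1.363, 1.362, 1.265, 1.278, 1.237, 1.386, 1.352]
f_low, f_high = 0.22, 4.43  # two-tailed 5% bounds for F(8; 8), from the text
t_crit = 2.12               # two-tailed 5% critical value for T(16), from the text

n_a, n_b = len(recipe_a), len(recipe_b)
s2_a, s2_b = pvariance(recipe_a), pvariance(recipe_b)  # divisor n

# Step 1: ratio of the unbiased variance estimates.
f = (n_a * s2_a / (n_a - 1)) / (n_b * s2_b / (n_b - 1))
variances_differ = f < f_low or f > f_high

# Step 2: pooled variance estimate and Student statistic.
s2_pooled = (n_a * s2_a + n_b * s2_b) / (n_a + n_b - 2)
diff = mean(recipe_a) - mean(recipe_b)
std_err = sqrt(s2_pooled) * sqrt(1 / n_a + 1 / n_b)
t = diff / std_err
means_differ = abs(t) > t_crit

# 95% confidence interval for the difference of the population means.
ci = (diff - t_crit * std_err, diff + t_crit * std_err)

print(f"f = {f:.2f}, variances differ: {variances_differ}")
print(f"s*^2 = {s2_pooled:.4f}, t = {t:.2f}, means differ: {means_differ}")
print(f"95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval contains zero, which is another way of reading the non-rejection of \(H_0\).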
Fit tests
We have seen in the previous themes some probability laws likely to constitute models for the reference populations. The aim now, in the presence of observations, is to choose the appropriate model and verify that the available observations fit well with it.
The problem, in its most general form, consists of characterizing from the data the type of the reference law, then specifying this law by estimating the parameters that define it completely. In practice, however, we do not proceed exactly in this way. Since the reference laws are most often identified with the fundamental probability laws (binomial law, normal law, log-normal law, etc.), it turns out to be simpler to:
- bring the examined distribution closer to the probability law to which it intuitively (or for theoretical reasons) seems it should be linked;
- then verify the validity of the approximation thus made.
When the fit to one of the fundamental laws proves to be unjustified, it is necessary to resort to other reference laws, and there are a considerable number of them (gamma law, beta law, Pareto law, Gumbel law, Weibull law, etc.), or possibly to create some for the occasion.
Determination of the Type of the reference distribution
There is no particular recipe for determining the type of the reference law to which the observed distribution is suspected to be linked. In general, one is guided by logical considerations or attempts approximations that seem to result from the form of the observed distributions.
In the case of distributions relating to discrete variables, the fit to a discrete uniform law, a binomial law, or a Poisson law is immediately considered.
In the case of continuous variables, the fit to the normal or log-normal law is very often, but not always, legitimate. To verify, before any complicated computation, that the hypothesis of such fits is not a priori absurd, we have simple and rapid means.
The normal law is a symmetric law. Moreover, we have seen that the interval \([\mu - u \sigma, \mu + u \sigma]\) contains approximately 68% of the probability for \(u=1\), 95% for \(u=2\) and almost 100% (about 99.7%) for \(u=3\).
Therefore, if an observed distribution is such that the frequencies of the observations included within these intervals are close to these probabilities, there is a presumption of normality.
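This quick check can be sketched in Python: compute the theoretical probabilities of the intervals for a normal law, then the observed frequencies in a sample. The helper `empirical_coverage` is a hypothetical name, and the resistance data of Example 3 are reused purely as an illustration (with only nine observations, the comparison is very rough).

```python
from statistics import NormalDist, mean, pstdev

std_normal = NormalDist()  # N(0, 1)

# Theoretical probability of [mu - u*sigma, mu + u*sigma] for a normal law.
theoretical = {u: std_normal.cdf(u) - std_normal.cdf(-u) for u in (1, 2, 3)}

def empirical_coverage(sample, u):
    """Fraction of observations within u empirical standard deviations of the mean."""
    m, s = mean(sample), pstdev(sample)
    return sum(m - u * s <= x <= m + u * s for x in sample) / len(sample)

# Resistance data of Example 3, reused as an illustration only.
sample = [72.1, 74.5, 72.8, 75.0, 73.4, 75.4, 76.1, 73.5, 74.1]
empirical = {u: empirical_coverage(sample, u) for u in (1, 2, 3)}

for u in (1, 2, 3):
    print(f"u = {u}: theoretical {theoretical[u]:.4f}, observed {empirical[u]:.4f}")
```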
We can also verify this presumption using a transformation known as Henry’s line, which performs an anamorphosis of the cumulative distribution function of a normal law into a straight line.
We can also plot a quantile-quantile diagram in which we represent the quantiles of the observed distribution as a function of the same quantiles of a normal law whose parameters are estimated from the observed distribution. If the distribution is close to a normal law, this diagram takes the form of a straight line with the equation \(y=x\). We can proceed similarly with other probability laws.
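The points of such a quantile-quantile diagram can be computed without any plotting library. The sketch below is only one possible construction: the plotting positions \((i - 0.5)/n\) are a common convention chosen here as an assumption, and the resistance data of Example 3 are reused as an illustration.

```python
from statistics import NormalDist, mean, pstdev

# Resistance data of Example 3, reused as an illustration only.
sample = [72.1, 74.5, 72.8, 75.0, 73.4, 75.4, 76.1, 73.5, 74.1]
n = len(sample)

# Normal law whose parameters are estimated from the observed distribution.
fitted = NormalDist(mean(sample), pstdev(sample))

# Plotting positions (i - 0.5) / n: an assumed convention, others exist.
probs = [(i - 0.5) / n for i in range(1, n + 1)]
theoretical = [fitted.inv_cdf(p) for p in probs]  # quantiles of the fitted law
observed = sorted(sample)                         # quantiles of the sample

# Points close to the line y = x support the presumption of normality.
for q_theo, q_obs in zip(theoretical, observed):
    print(f"theoretical {q_theo:7.3f}   observed {q_obs:7.3f}")
```

Passing the two lists to any plotting tool, together with the line \(y = x\), gives the diagram described above.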
Estimation of the parameters of the reference law
The reference law most often depends on a certain number of parameters that need to be estimated to define it completely. A binomial law is entirely defined by the proportion \(p\) to which it corresponds (with \(n\) known). A normal law is entirely defined by its expectation \(\mu\) and its standard deviation \(\sigma\). It is therefore necessary, from the available data, to estimate either the proportion \(p\), or the expectation \(\mu\) and the standard deviation \(\sigma\) of the reference binomial or normal law, to consider only these two examples.
Testing the Goodness-of-Fit
Let there be a sample of size \(n\) of a random variable \(X\), discrete or discretized, providing the random counts \(N_{1}, N_{2}, \ldots, N_{k}\) in \(k\) classes. We propose to test the hypothesis \(\mathcal{H}_{0}\): “the random variable \(X\) follows the law \(\mathcal{L}\)”, the reference law whose choice was discussed above. Once the parameters of this reference law are estimated, it is possible to compute the theoretical probabilities \(p_{1}, p_{2}, \ldots, p_{k}\) for the \(k\) classes and deduce the theoretical counts, equal to \(np_{i}\) for class \(i\).
The comparison of the observed and theoretical counts highlights more or less strong differences. This is not surprising since, under the hypothesis that the performed fit is justified, the distribution of the theoretical counts \(np_{i}\) is only the limit law of the distribution of the \(N_{i}\). It remains to be seen whether the differences thus highlighted are compatible with the sole randomness of sampling. It is only under this condition that we can consider the performed fit as legitimate. The verification consists of determining the law of a certain function of the set of fluctuations between observed and theoretical counts, under the hypothesis that these fluctuations are indeed only due to the randomness of sampling. Under these conditions, we can show that:
Theorem
\(D^{2}= \sum_{i=1}^{k}\frac{(N_{i}- np_{i})^{2}}{np_{i}}\) approximately follows a \(\chi^{2}\) distribution with \(k-1-l\) degrees of freedom, where \(l\) is the number of parameters that had to be estimated for the computation of the theoretical counts.
To a low probability threshold \(\alpha\), which can be considered negligible, corresponds a value \(c\) such that the probability of observing the event \(\{D^{2} > c\}\) is equal to \(\alpha\). If the observed value \(d^{2}\) is greater than \(c\), it seems preferable to doubt the hypothesis of the legitimacy of the fit. If, on the contrary, the value of \(d^{2}\) is less than \(c\), there is no reason to doubt this hypothesis. Unfortunately, this does not mean that it is true. Now, what we generally wish is to confirm the validity of the considered model. The negative aspect of the statistical test is problematic in this precise case, in the sense that it does not take into account the risk of wrongly retaining the hypothesis.
Note that we have performed a one-tailed test to the right, since it is large differences between observed and theoretical counts that we potentially want to detect, hence a large value of \(\chi^{2}\).
Finally, let’s add a last remark on the implementation of the test. For the law of the quantity \(D^{2}\) to be sufficiently close to a \(\chi^{2}\) law, it is necessary not only for \(n\) to be large enough but also for the theoretical counts \(np_{i}\) not to be too small. In practice, they must not be less than 5. If some of them are too small, it is necessary to proceed with regroupings of classes.
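The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chi_square_gof`, its parameters and the die-roll data are hypothetical and do not come from the text, and the critical value is simply read from a \(\chi^{2}\) table.

```python
def chi_square_gof(observed, probs, n_estimated=0, min_expected=5.0):
    """Return (d2, dof, expected) for a chi-square goodness-of-fit test.

    observed    -- observed counts n_1, ..., n_k
    probs       -- theoretical class probabilities p_1, ..., p_k under H0
    n_estimated -- number l of parameters estimated from the sample
    """
    n = sum(observed)
    expected = [n * p for p in probs]
    # The chi-square approximation needs every theoretical count n*p_i >= 5.
    if min(expected) < min_expected:
        raise ValueError("theoretical count too small: regroup classes first")
    d2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - n_estimated
    return d2, dof, expected


# Hypothetical data: 120 rolls of a die, H0 = "the die is fair".
observed = [25, 17, 15, 23, 24, 16]
d2, dof, expected = chi_square_gof(observed, [1 / 6] * 6)
critical = 11.07  # 95% quantile of chi-square with 5 dof, read from a table
print(f"d2 = {d2:.2f}, dof = {dof}, reject H0: {d2 > critical}")
```

With these made-up counts, \(d^{2}\) stays well below the critical value, so there is no reason to doubt the fit. The `ValueError` mirrors the practical rule that classes must be regrouped when a theoretical count falls below 5.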
Example 6
A pastry chef measures the diameters of 300 macarons and obtains:
| Diameter (cm) | 4.5–4.6 | 4.6–4.7 | 4.7–4.8 | 4.8–4.9 | 4.9–5.0 |
|---|---|---|---|---|---|
| Count | 20 | 59 | 136 | 74 | 11 |
Question: Do the macaron diameters follow a normal distribution?
Steps:
Estimate parameters:
- Sample mean: \(\bar{x}=4.749\ cm\)
- Sample std. dev. \(s = 0.127\ cm\)
Compute theoretical probabilities: Standardise intervals using \(Z =\frac{X - \bar{X}}{s}\) then find \(\mathbb{P}(a_{i}<Z<b_{i})\) from the standard normal table.
| Interval (cm) | \(a_i\) | \(b_i\) | \(\mathbb{P}(a_i < Z < b_i)\) | Expected count \(np_i\) |
|---|---|---|---|---|
| 4.5–4.6 | -1.97 | -1.14 | 0.1292 | 38.76 |
| 4.6–4.7 | -1.14 | -0.31 | 0.3023 | 90.69 |
| 4.7–4.8 | -0.31 | 0.52 | 0.3206 | 96.18 |
| 4.8–4.9 | 0.52 | 1.35 | 0.2857 | 85.71 |
| 4.9–5.0 | 1.35 | 2.18 | 0.0622 | 18.66 |
- Compute \(D^{2}\):
- Compare to critical value:
- Degrees of freedom: \(k - 1 - l = 5 - 1 - 2 = 2\) (five classes, two estimated parameters)
- Critical value for \(\alpha = 0.05\): \(\chi_{0.05,2}^{2}=5.99\)
- Conclusion: \(12.45>5.99\), so reject \(H_0\): the macaron diameters do not follow a normal distribution
Independence Tests
In the tests seen in section 3, we placed ourselves in the context of populations described by normal variables that could therefore be characterized by two parameters: their expectation and their variance. However, very often, we are led to consider variables whose distribution law we do not know. To overcome this difficulty, statisticians have developed tests, called non-parametric, which make no assumption about the nature of the populations involved. There is a very wide variety of such non-parametric tests, some of which are based on a random variable following a \(\chi^{2}\) law similar to that seen in the test of a previous section.
In this section, we will focus on presenting the test of independence between two variables. Other non-parametric tests will be discussed in the next section.
Let \(x_{1}, x_{2}, \ldots, x_{r}\) and \(y_{1}, y_{2}, \ldots, y_{s}\) be the modalities of two variables \(X\) and \(Y\). A sample of \(n\) individuals, on which the values taken simultaneously by the two variables have been recorded, has given the results below:
| | \(y_1\) | \(\ldots\) | \(y_j\) | \(\ldots\) | \(y_s\) | Total |
|---|---|---|---|---|---|---|
| \(x_1\) | \(n_{11}\) | \(\ldots\) | \(n_{1j}\) | \(\ldots\) | \(n_{1s}\) | \(n_{1.}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(x_i\) | \(n_{i1}\) | \(\ldots\) | \(n_{ij}\) | \(\ldots\) | \(n_{is}\) | \(n_{i.}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(x_r\) | \(n_{r1}\) | \(\ldots\) | \(n_{rj}\) | \(\ldots\) | \(n_{rs}\) | \(n_{r.}\) |
| Total | \(n_{.1}\) | \(\ldots\) | \(n_{.j}\) | \(\ldots\) | \(n_{.s}\) | \(n\) |
We introduce the following notations:
- \(n_{ij}\) is the number of individuals that presented both the modality \(x_i\) of \(X\) and the modality \(y_j\) of \(Y\).
- \(n_{i.} = \sum\limits_{j=1}^{s}n_{ij}\) represents the total of row \(x_{i}\) (marginal count in \(X\)).
- \(n_{.j} = \sum\limits_{i=1}^{r} n_{ij}\) represents the total of column \(y_{j}\) (marginal count in \(Y\)).

Consider the following probabilities:
- \(p_{ij} = \mathbb{P}(X=x_{i} \text{ and } Y = y_{j})\) the probability, for an individual chosen at random, of being in cell \((i,j)\) of the table
- \(p_{i.} = \mathbb{P}(X=x_{i})\) the probability of having modality \(x_{i}\) of the variable \(X\).
- \(p_{.j} = \mathbb{P}(Y = y_{j})\) the probability of having modality \(y_{j}\) of the variable \(Y\).

We do not know these probabilities, but they can be estimated from our table data. Indeed, thanks to the law of large numbers, we can estimate \(p_{i.}\) by the proportion of individuals having the modality \(x_{i}\), i.e., \(n_{i.}/n\). Similarly, we can estimate \(p_{.j}\) by the ratio \(n_{.j}/n\).
Let us make the hypothesis \(\mathcal{H}_{0}\): “the two variables are independent”. It follows that: \(p_{ij} = p_{i.} \times p_{.j}\). Under the hypothesis of independence, the theoretical count \(t_{ij}\) corresponding to the observed count \(n_{ij}\) is then equal to \(t_{ij} = p_{ij} \times n\), which, after replacing the probabilities by their estimates, gives: \(t_{ij} = \frac{n_{i.} \times n_{.j}}{n}\)
Theorem
If \(n\) is large, and denoting by \(N_{ij}\) the variables whose \(n_{ij}\) are the realisations, then the variable: \(D^{2} = \sum\limits_{i=1}^{r}\sum\limits_{j=1}^{s} \frac{ (N_{ij} - t_{ij})^{2}}{t_{ij}}\) follows a \(\chi^{2}\) distribution with \((r-1)(s-1)\) degrees of freedom.
For a chosen risk \(\alpha\), we define a limit value \(c\) such that \(\mathbb{P}(D^{2} > c) = \alpha\). If the observed value \(d^{2}\) is greater than \(c\), it seems preferable to doubt the hypothesis of independence. If, on the contrary, the value of \(d^{2}\) is less than \(c\), there is no reason to reject it.
Example 7
A baker who loves experimenting tests whether oven temperature (low, medium, high) affects cake texture (moist, dry). Data from 300 cakes:
| | Low Temp | Medium Temp | High Temp | Total |
|---|---|---|---|---|
| Moist | 60 | 80 | 40 | 180 |
| Dry | 40 | 20 | 60 | 120 |
| Total | 100 | 100 | 100 | 300 |
Question: Is cake texture independent of oven temperature?
Steps
- Expected frequencies:
- Compute \(D^{2}\):
- Compare to critical value
- Degrees of freedom: \((2-1)(3-1) = 2\)
- Critical value for \(\alpha = 0.05\): \(\chi_{0.05,2}^{2}=5.99\)
- Conclusion: Reject \(H_{0}\). Cake texture depends on oven temperature
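The steps of Example 7 can be checked numerically. The sketch below is a minimal illustration: the helper name `chi_square_independence` is hypothetical, the counts are those of the table above, and the critical value 5.99 is the one quoted in the example.

```python
def chi_square_independence(table):
    """Return (d2, dof, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Theoretical counts under independence: t_ij = n_i. * n_.j / n
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    d2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return d2, dof, expected


# Rows: moist, dry. Columns: low, medium, high temperature.
cakes = [[60, 80, 40],
         [40, 20, 60]]
d2, dof, expected = chi_square_independence(cakes)
print(expected)                 # theoretical counts t_ij
print(round(d2, 2), dof, d2 > 5.99)
```

Every theoretical count is 60 for the moist row and 40 for the dry row, and \(d^{2}\) comes out far above 5.99, which agrees with the rejection of \(H_{0}\) stated in the example.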
Other Non-Parametric Tests
There is a very wide variety of other non-parametric tests besides those seen in the previous sections. We will limit ourselves here to presenting those that are most used and that happen to be based on considering a random variable following a \(\chi^{2}\) law as before.
Comparison test for several qualitative populations
Let \(\mathcal{P}_{1}, \mathcal{P}_{2}, \ldots, \mathcal{P}_{r}\) be \(r\) populations whose individuals are distinguished according to \(s\) categories \(C_{1}, C_{2}, \ldots, C_{s}\) which can be modalities of a qualitative variable (or the classes of a quantitative variable). A sample has been taken from each of these populations. Let \(n_{1}, n_{2}, \ldots, n_{i}, \ldots, n_{r}\) be their sizes, and let \(n_{ij}\) be the number of individuals that come from population \(\mathcal{P}_{i}\) and belong to category \(C_{j}\).
| | \(C_{1}\) | \(\ldots\) | \(C_{j}\) | \(\ldots\) | \(C_{s}\) | Total |
|---|---|---|---|---|---|---|
| \(\mathcal{P}_{1}\) | \(n_{11}\) | \(\ldots\) | \(n_{1j}\) | \(\ldots\) | \(n_{1s}\) | \(n_{1}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(\mathcal{P}_{i}\) | \(n_{i1}\) | \(\ldots\) | \(n_{ij}\) | \(\ldots\) | \(n_{is}\) | \(n_{i}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(\mathcal{P}_{r}\) | \(n_{r1}\) | \(\ldots\) | \(n_{rj}\) | \(\ldots\) | \(n_{rs}\) | \(n_{r}\) |
If we make the hypothesis that the populations are identical, then the probabilities, denoted \(p_{1}, p_{2}, \ldots, p_{s}\), of belonging to each of the classes are the same for all populations, and we can define theoretical counts in each class and for each population: \(t_{ij}= n_{i}p_{j}\) for each class \(C_j\) of population \(\mathcal{P}_{i}\). In practice the \(p_{j}\) are unknown and are estimated by the pooled proportions \(\frac{1}{n}\sum\limits_{i=1}^{r} n_{ij}\), with \(n = \sum\limits_{i=1}^{r} n_{i}\). Then:
Theorem
Under the hypothesis \(\mathcal{H}_{0}\) “the populations are identical”, we have: \(D^{2}= \sum\limits_{i=1}^{r}\sum\limits_{j=1}^{s}\frac{(N_{ij}- t_{ij})^{2}}{t_{ij}} \sim \chi^{2}((r-1)(s-1))\) where the \(n_{ij}\) are the realisations of the variables \(N_{ij}\). We can then implement the same test as the one in the previous section.
Median test
Given the results provided by two samples of size \(n_{1}\) and \(n_{2}\):
| Sample 1 | \(x_{1}\) | \(x_{2}\) | \(\ldots\) | \(x_{n_{1}}\) |
|---|---|---|---|---|
| Sample 2 | \(y_{1}\) | \(y_{2}\) | \(\ldots\) | \(y_{n_{2}}\) |
Let’s arrange all these results in a single increasing sequence and let’s denote the median of this sequence by \(M\). After counting the observations above and below \(M\), the data table can be summarized as follows:
| Observed Counts | \(> M\) | \(< M\) | Total |
|---|---|---|---|
| Sample 1 | \(n_{11}\) | \(n_{12}\) | \(n_{1}\) |
| Sample 2 | \(n_{21}\) | \(n_{22}\) | \(n_{2}\) |
| Total | \(\frac{n_{1} + n_{2}}{2}\) | \(\frac{n_{1} + n_{2}}{2}\) | \(n_1 + n_2\) |
Under the hypothesis that the two populations are identical, the theoretical proportion of observations above and below the median is in all cases \(1/2\). The previous table corresponds to the following theoretical table, and we are ultimately led to a \(\chi^2\) test with 1 degree of freedom as in the previous section:
| Theoretical Counts | \(> M\) | \(< M\) | Total |
|---|---|---|---|
| Sample 1 | \(\frac{n_1}{2}\) | \(\frac{n_1}{2}\) | \(n_{1}\) |
| Sample 2 | \(\frac{n_2}{2}\) | \(\frac{n_2}{2}\) | \(n_{2}\) |
| Total | \(\frac{n_{1} + n_{2}}{2}\) | \(\frac{n_{1} + n_{2}}{2}\) | \(n_1 + n_2\) |
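As an illustration, the median test can be sketched as follows. The function name `median_test` and the two samples are hypothetical. Observations equal to the pooled median are simply ignored, which is an assumption of this sketch since the text does not discuss ties.

```python
from statistics import median


def median_test(sample1, sample2):
    """Return (m, table, d2) for the two-sample median test.

    table[i] = [count above m, count below m] for sample i.
    Values equal to the pooled median are ignored (assumption of this sketch).
    """
    m = median(list(sample1) + list(sample2))
    table, d2 = [], 0.0
    for sample in (sample1, sample2):
        above = sum(1 for v in sample if v > m)
        below = sum(1 for v in sample if v < m)
        half = (above + below) / 2  # theoretical count under H0
        d2 += (above - half) ** 2 / half + (below - half) ** 2 / half
        table.append([above, below])
    return m, table, d2


# Hypothetical measurements for two samples of size 8.
sample1 = [12, 15, 11, 18, 14, 16, 13, 17]
sample2 = [19, 21, 20, 10, 22, 23, 24, 25]
m, table, d2 = median_test(sample1, sample2)
print(m, table, d2, d2 > 3.84)  # 3.84: chi-square 95% quantile, 1 dof
```

Here the pooled median separates the two samples almost completely, so \(d^{2}\) exceeds 3.84 and the hypothesis of identical populations would be rejected at the 5% level.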
Sign Test
This test applies to paired observations. On the same individual \(i\), two measurements \(x_{i}\) and \(y_{i}\) have been made, and we are interested in the differences \(d_{i} = y_{i}-x_{i}\). In the classic test, we took into account the values of these differences, but in the sign test, we only retain their signs. There is therefore a loss of information.

If there is no difference between the measurements, the probability of a plus sign is equal to that of a minus sign and equal to \(0.5\). If there are \(n\) individuals in the sample, the theoretical counts are equal to \(0.5n\) and we are again led to a \(\chi^{2}\) test with 1 degree of freedom on the quantity: \(d^{2}= \frac{(n_{+} - 0.5n)^{2}}{0.5n} + \frac{(n_{-} - 0.5n)^{2}}{0.5n}\) where \(n_{+}\) and \(n_{-}\) are the numbers of positive and negative differences.

We can also implement this test by using a binomial law. Indeed, if we denote by \(K\) the number of positive differences, under the null hypothesis of absence of difference between the two populations, each difference has a one in two chance of being positive. The variable \(K\) therefore follows a binomial law \(\mathcal{B}(n; 0.5)\). It is then sufficient to check whether the computed value \(k\) belongs to the critical region, the latter being defined by the quantiles of the law \(\mathcal{B}(n; 0.5)\).

Saporta also presents Wilcoxon’s signed-rank test, which is more powerful than the sign test for paired data and insufficiently known.
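Both versions of the sign test can be sketched together. The names `sign_test`, `before` and `after` and the data are hypothetical. Zero differences are dropped and the binomial p-value is taken two-sided; both choices are assumptions of this sketch rather than prescriptions of the text.

```python
from math import comb


def sign_test(x, y):
    """Return (k, n, d2, p_value) for paired samples x and y.

    k is the number of positive differences y_i - x_i among the n non-zero ones.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]  # zero differences dropped
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)
    half = 0.5 * n
    # Chi-square statistic with 1 degree of freedom.
    d2 = (k - half) ** 2 / half + ((n - k) - half) ** 2 / half
    # Exact two-sided p-value under K ~ B(n, 0.5).
    tail = min(k, n - k)
    p_value = min(1.0, 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n)
    return k, n, d2, p_value


# Hypothetical paired measurements on 10 individuals.
before = [72, 68, 75, 80, 66, 71, 77, 69, 74, 70]
after = [75, 70, 74, 85, 69, 73, 80, 72, 78, 73]
k, n, d2, p_value = sign_test(before, after)
print(k, n, round(d2, 2), round(p_value, 4))
```

With 9 positive differences out of 10, both the \(\chi^{2}\) statistic (compared to 3.84) and the exact binomial p-value point to a significant difference at the 5% level. For small \(n\), the binomial version is preferable since the \(\chi^{2}\) approximation requires theoretical counts of at least 5.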
Analysis of Variance
Imagine the following case: a manufacturer of light bulbs, having the choice between 4 types of filaments, proposes to study the influence of the type of filament (a qualitative variable) on the lifespan of the manufactured bulbs (a quantitative variable). To do this, they will have 4 samples of several identical bulbs made, except for the filament, have the bulbs burn until they go out, and then compare the results obtained. The statistical technique allowing this comparison is called analysis of variance. The objective of this section is therefore to present the analysis of variance technique for studying the influence of one factor, then of several factors.
As we will see, this technique constitutes an extension of the mean comparison test that we saw in an earlier section, applied to the case of more than 2 normal populations.
Search for the Influence of a Factor
We will denote by \(A\) the factor and call \(A_{1}, A_{2}, \ldots, A_{i}, \ldots, A_{p}\) its \(p\) modalities. The problem is the study of the influence of factor \(A\) on the quantitative variable \(Y\). The available experimentation consisted of carrying out, for each modality \(A_{i}\) of the factor, a certain number \(n_{i}\) of measurements of the studied variable \(Y\) so as to have a table like the following where \(\bar{y_{1}}, \ldots, \bar{y_{i}}, \ldots, \bar{y_{p}}\) are the column means:
| \(A_{1}\) | \(\ldots\) | \(A_{i}\) | \(\ldots\) | \(A_{p}\) |
|---|---|---|---|---|
| \(y_{11}\) | \(\ldots\) | \(y_{i1}\) | \(\ldots\) | \(y_{p1}\) |
| \(\vdots\) | | \(\vdots\) | | \(\vdots\) |
| \(y_{1j}\) | \(\ldots\) | \(y_{ij}\) | \(\ldots\) | \(y_{pj}\) |
| \(\vdots\) | | \(\vdots\) | | \(\vdots\) |
| \(y_{1n_{1}}\) | \(\ldots\) | \(y_{in_{i}}\) | \(\ldots\) | \(y_{pn_{p}}\) |
| \(\bar{y_1}\) | \(\ldots\) | \(\bar{y_i}\) | \(\ldots\) | \(\bar{y_p}\) |
We can more generally interpret the columns of the table as samples from several populations that need to be compared.
The Analysis of Variance Relation
Let’s call \(\bar{y}\) the overall mean of the measurements: \(\bar{y} = \frac{1}{n}\sum\limits_{i=1}^{p} \sum\limits_{j=1}^{n_i}y_{ij}\ \text{with}\ n =\sum\limits_{i=1}^{p}n_{i}\)

Let’s then perform a decomposition: \((y_{ij} - \bar{y}) = (\bar{y}_{i} - \bar{y}) + (y_{ij}- \bar{y}_{i})\)

By squaring and summing, the double product is zero. Indeed, by definition of the means \(\bar{y}_{i}\), we can write: \(2 \sum\limits_{i=1}^{p} \sum\limits_{j=1}^{n_{i}}(\bar{y}_{i}-\bar{y})(y_{ij} - \bar{y}_{i}) = 2 \sum\limits_{i=1}^{p} (\bar{y}_{i}-\bar{y}) \sum\limits_{j=1}^{n_{i}}(y_{ij} - \bar{y}_{i}) = 0\)

We therefore obtain: \(\sum\limits_{i=1}^{p}\sum\limits_{j=1}^{n_{i}}(y_{ij}-\bar{y})^{2}= \sum\limits_{i=1}^{p}n_{i}(\bar{y}_{i}-\bar{y})^{2}+ \sum\limits_{i=1}^{p}\sum\limits_{j=1}^{n_i}(y_{ij} - \bar{y}_{i})^2\), a relation called analysis of variance, which decomposes the total sum of squares: \(SST = \sum\limits_{i=1}^{p}\sum\limits_{j=1}^{n_i}(y_{ij} - \bar{y})^2\) into a sum of squares measuring the inter-column variability (i.e., the influence of the factor or the difference between the populations): \(SSA = \sum\limits_{i=1}^{p}n_{i}(\bar{y}_{i}-\bar{y})^{2}\) and a sum of squares measuring the intra-column variability (residual sum of squares): \(SSE = \sum\limits_{i=1}^{p}\sum\limits_{j=1}^{n_i}(y_{ij} - \bar{y}_{i})^2\)

Note the great generality of this relation since it was established without making any assumptions about the data. However, the structure of the basic relation: \((y_{ij} - \bar{y}) = (\bar{y}_{i} - \bar{y}) + (y_{ij} - \bar{y}_{i})\) implicitly admits the additivity of the influence of the factor \((\bar{y}_{i} - \bar{y})\) and a residual \((y_{ij} - \bar{y}_{i})\). We can associate degrees of freedom with these sums that correspond to the number of independent terms in each of them:
- for SST: \(n-1\) degrees of freedom (number of values minus 1)
- for SSA: \(p-1\) degrees of freedom (number of modalities of the factor minus 1)
- for SSE: \(n-p\) degrees of freedom (by difference of the two previous values)
The model
To allow statistical inference, it is necessary to make a certain number of assumptions. The basic model of analysis of variance is written as: \(Y_{i}= \mu_{i}+ \varepsilon_{i}= \mu + \alpha_{i}+\varepsilon_{i}\) The \(\alpha_{i}\) are unknown. To lift their indeterminacy to within a constant, we are accustomed to setting: \(\sum\limits_{i=1}^{p}n_{i}\alpha_{i}=0\) The \(\varepsilon_{i}\) represent the random fluctuations corresponding to measurement errors or to the influence of uncontrolled factors. We will assume that there is no systematic error, or that it is contained in \(\mu\), hence \(\mathbb{E}(\varepsilon_{i})=0\).
We further assume that the \(\varepsilon_{i}\):
- are independent, which implies \(Cov(\varepsilon_{i},\varepsilon_{j}) =0 \ \text{for} \ i \neq j\)
- have the same variance (homoscedasticity): \(\forall \ i, \ \mathbb{V}(\varepsilon_{i}) = \sigma^{2}\)
- follow normal laws.
Among these assumptions, the most restrictive is certainly the second, according to which the error on the variable \(Y\) is independent of the value taken by \(Y\) i.e., in particular, that it is not of a multiplicative type. To verify whether it is legitimate, we have several tests, the best known of which is Bartlett’s test, but the latter is very sensitive to the normality assumption.
Analysis of variance test
Let’s make the hypothesis \(\mathcal{H}_{0}\) that factor \(A\) has no influence on the variable \(Y\). This means that all \(Y_{i}\) have the same mean \(\mu\) and therefore that \(\alpha_{1} = \ldots = \alpha_{i}= \ldots = \alpha_{p} = 0\). Under \(\mathcal{H}_{0}\), we can then show that \(\frac{SSA}{\sigma^{2}}\sim \chi^{2}(p-1)\). This implies that \(\frac{SSA}{p-1}\) is an estimator of \(\sigma^{2}\). Since, on the other hand, \(\frac{SSE}{\sigma^{2}} \sim \chi^{2}(n-p)\), the variable \(\frac{SSE}{n-p}\) is also an estimator of \(\sigma^{2}\). It follows:
Theorem
Under \(\mathcal{H}_{0}: \ \forall i, \ \alpha_{i}=0\) (no influence of factor \(A\)), we have: \(F = \frac{SSA/(p-1)}{SSE/(n-p)} \sim \mathcal{F}(p-1;n-p)\) and, as such, its value is expected to be close to 1.
If the computed value \(f\) is greater than the threshold value \(f_{\alpha}\) read from the Fisher-Snedecor table and such that \(\mathbb{P}(F > f_{\alpha}) = \alpha\), we can reject the hypothesis \(\mathcal{H}_{0}\) at the risk \(\alpha\) of rejecting it wrongly and conclude in favour of the alternative hypothesis \(\mathcal{H}_{1}: \ \exists i,j : \ \alpha_{i}\neq \alpha_{j}\), which translates an influence of factor \(A\). If it is lower, the available information does not allow us to conclude an influence of factor \(A\). It is important to perform a one-tailed test to the right. Indeed, the low values of \(f\) correspond to small differences between the column means \(\bar{y_i}\), whereas the test aims to highlight large differences.
Practical Computation
We compute \(\begin{align} SST &= \sum\limits_{i=1}^{p}\sum\limits_{j=1}^{n_{i}} (y_{ij} - \bar{y})^{2} =\sum\limits_{i=1}^{p}\sum\limits_{j=1}^{n_{i}} y_{ij}^{2}-n \bar{y}^{2} \\ SSA &= \sum\limits_{i=1}^{p}n_{i}(\bar{y_{i}} - \bar{y})^{2} =\sum\limits_{i=1}^{p}n_{i}\bar{y_{i}}^{2}- n\bar{y}^{2} \end{align}\) and, by difference: \(SSE = SST - SSA\)
We then construct the following table:
| Variation | SS | Degrees of Freedom | Computed \(f\) | Critical value |
|---|---|---|---|---|
| Factor | SSA | \(p-1\) | \(\frac{SSA/(p-1)}{SSE/(n-p)}\) | \(f_\alpha(p-1, n-p)\) |
| Residual | SSE | \(n-p\) | | |
| Total | SST | \(n-1\) | | |
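The construction of this table can be sketched in Python. The function name `one_way_anova` and the bulb lifetimes below are hypothetical, chosen only to echo the filament example. The critical value 3.49 is \(f_{0.05}(3, 12)\) as read from a Fisher-Snedecor table.

```python
def one_way_anova(groups):
    """Return (ssa, sse, sst, f, dof) for a list of samples, one per modality."""
    n = sum(len(g) for g in groups)
    p = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Inter-column (factor) and intra-column (residual) sums of squares.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
    f = (ssa / (p - 1)) / (sse / (n - p))
    return ssa, sse, sst, f, (p - 1, n - p)


# Hypothetical lifetimes (hundreds of hours) for 4 filament types.
lifetimes = [
    [10, 12, 11, 13],
    [14, 15, 13, 14],
    [9, 10, 11, 10],
    [12, 13, 12, 11],
]
ssa, sse, sst, f, dof = one_way_anova(lifetimes)
print(ssa, sse, sst, round(f, 2), dof, f > 3.49)
```

The sketch computes SSE directly rather than by difference, which makes it possible to verify numerically that SST = SSA + SSE. With these made-up data the computed \(f\) is well above 3.49, so the hypothesis of no filament effect would be rejected at the 5% level.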
Study of the Influence of Two Factors
Imagine that the bulb manufacturer, mentioned above, is concerned with studying the influence, on the lifespan of the bulbs, not only of the type of filament used but also of the nature of the filling gas.
They could obviously first carry out a “filament” study using single-factor analysis of variance, then proceed, secondly, to a “gas” study in all respects similar. Having done this, they would only have to compare the results of these two studies to get an idea of the influence of the two factors studied. But by proceeding in this way, they would implicitly postulate the additivity of the “filament” and “gas” influences, which is not certain.
The two-factor analysis of variance will allow us to globally address the problem and potentially highlight what is commonly called the interactions of the studied factors.
Factorial experiment
Let, in general, \(A\) and \(B\) be two factors whose influence on a quantitative variable \(Y\) we propose to study. We will call \(A_{1}, A_{2}, \ldots, A_{i}, \ldots, A_{p}\) the \(p\) modalities of factor \(A\) and \(B_{1}, B_{2}, \ldots, B_{j}, \ldots, B_{q}\) the \(q\) modalities of factor \(B\). The implementation of the two-factor analysis of variance requires having at least one measurement of \(Y\) for each combination \((A_{i}, B_{j})\) of the modalities of the factors.
We will assume that the experimentation has allowed us to carry out \(r\) repetitions, i.e., \(r\) measurements for each of the \(pq\) combinations of the modalities of the factors. The case with no repetitions (\(r=1\)) will be discussed later.
The tests are therefore conducted in such a way as to obtain the following measurement table, one of the difficulties of the experimentation being to avoid missing measurements.
| | \(A_1\) | \(\ldots\) | \(A_{i}\) | \(\ldots\) | \(A_{p}\) |
|---|---|---|---|---|---|
| \(B_1\) | \(y_{111}, y_{112}, \ldots y_{11r}\) | \(\ldots\) | \(y_{i11}, y_{i12}, \ldots y_{i1r}\) | \(\ldots\) | \(y_{p11}, y_{p12}, \ldots y_{p1r}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(B_j\) | \(y_{1j1}, y_{1j2}, \ldots y_{1jr}\) | \(\ldots\) | \(y_{ij1}, y_{ij2}, \ldots y_{ijr}\) | \(\ldots\) | \(y_{pj1}, y_{pj2}, \ldots y_{pjr}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(B_q\) | \(y_{1q1}, y_{1q2}, \ldots y_{1qr}\) | \(\ldots\) | \(y_{iq1}, y_{iq2}, \ldots y_{iqr}\) | \(\ldots\) | \(y_{pq1}, y_{pq2}, \ldots y_{pqr}\) |
The experimental design thus carried out is called a factorial design. It is said to be balanced because there is the same number of measurements in each cell of the table. There are other balanced experimental designs that avoid the main disadvantage of the factorial design, which is to be very costly in terms of the number of measurements to be carried out.
Additive Model and model with interaction
The most general model, assuming the additivity of errors, is the following: \(Y_{ij} = \mu_{ij} + \varepsilon_{ij}\) By making explicit \(\mu_{ij}\), a commonly used model is the additive model: \(\mu_{ij} = \mu + \alpha_{i} + \beta_{j}\) We thus assume that there is additivity of effects: the combined action of the modalities \(A_{i}\) and \(B_{j}\) is the sum of the isolated actions of \(A\) on the one hand and \(B\) on the other hand. If we do not assume this restrictive hypothesis of additivity, we adopt the model with interaction: \(\mu_{ij} = \mu + \alpha_{i}+ \beta_{j} + \gamma_{ij}\) There is no longer additivity of effects because, in addition to the direct actions of \(A\) and \(B\), there is the term \(\gamma_{ij}\) which translates an additional effect due to the combination of the modalities \(A_{i}\) and \(B_{j}\).
We say that \((\alpha_{1}, \alpha_{2}, \ldots, \alpha_p)\) and \((\beta_{1}, \beta_{2}, \ldots, \beta_{q})\) are the actions of factors \(A\) and \(B\) while \((\gamma_{11}, \gamma_{12}, \ldots, \gamma_{pq})\) are the interactions of the pair \((A,B)\). We can also say that the model with interaction translates the fact that the action of factor \(A\), for instance, depends on the modalities of factor \(B\).
To lift the indeterminacy of \(\mu\), we set the following relations: \(\sum\limits_{i=1}^{p}\alpha_{i}= \sum\limits_{j=1}^{q}\beta_{j}= 0\) \(\sum\limits_{i=1}^{p}\gamma_{ij}=0 \quad \forall j \quad \text{and} \quad \sum\limits_{j=1}^{q}\gamma_{ij}=0 \quad \forall i\)
Analysis of Variance Relation
Let’s call:
- \(\bar{y}_{i}\) the mean of a column of the measurement table: \(\bar{y}_{i} = \frac{1}{qr}\sum\limits_{jk} y_{ijk}\)
- \(\bar{y}_{j}\) the mean of a row of the table: \(\bar{y}_{j}= \frac{1}{pr} \sum\limits_{ik}y_{ijk}\)
- \(\bar{y}_{ij}\) the mean of a cell of the table: \(\bar{y}_{ij} = \frac{1}{r}\sum\limits_{k}y_{ijk}\)
- \(\bar{y}\) the overall mean of the measurements: \(\bar{y}= \frac{1}{pqr}\sum\limits_{ijk}y_{ijk}\)
Let us then perform the decomposition:
\[(y_{ijk}-\bar{y}) = (\bar{y}_{i}-\bar{y}) + (\bar{y}_{j}- \bar{y}) + \left[ (\bar{y}_{ij} - \bar{y}) - (\bar{y}_{i}-\bar{y}) - (\bar{y}_{j}-\bar{y}) \right] + (y_{ijk}- \bar{y}_{ij})\]
By squaring and summing, the double products cancel out by definition of the different means, under the strict condition that the table is complete, i.e., that there are no missing measurements. We therefore obtain:
\[\sum\limits_{ijk} (y_{ijk} - \bar{y})^{2}= qr \sum\limits_{i}(\bar{y}_{i} - \bar{y})^{2} + pr \sum\limits_{j}(\bar{y}_{j} - \bar{y})^{2}+ r \sum\limits_{ij}\left[(\bar{y}_{ij}-\bar{y}) - (\bar{y}_{i} - \bar{y}) - (\bar{y}_{j} - \bar{y}) \right]^{2} +\sum\limits_{ijk}(y_{ijk}- \bar{y}_{ij})^{2}\]
which we will symbolically denote as: \(SST = SSA + SSB + SSAB + SSE.\) This is the analysis of variance relation. It allows us to decompose the total sum of squares into four sums. The first two correspond respectively to the actions of \(A\) and \(B\). The third corresponds to the interaction of \(A\) and \(B\). The last is the residual sum of squares.
Analysis of Variance Tests
Let’s assume, as in the case of a single factor, that the \(\varepsilon_{ij}\) are independent random variables all following the law \(\mathcal{N}(0,\sigma^{2})\). It is then possible to perform statistical inference from the observations and to test:
- the presence of an interaction,
- the influence of a factor.
Interaction Test
Let’s make the hypothesis that there is no interaction between factors \(A\) and \(B\). This means that \(\forall i,j\ :\ \gamma_{ij}=0\). We then show that \(\frac{SSAB}{\sigma^{2}} \sim \chi^{2}((p-1)(q-1))\). Since \(\frac{SSE}{\sigma^{2}} \sim \chi^{2}(n-pq)\) with \(n = pqr\), i.e. \(\chi^{2}(pq(r-1))\), it follows that:
Theorem
Under \(\mathcal{H}_{0}: \ \forall i,j, \ \gamma_{ij}= 0\) (no interaction), we have: \(F_{AB} = \frac{SSAB/((p-1)(q-1))}{SSE/(pq(r-1))} \sim \mathcal{F}((p-1)(q-1);pq(r-1))\)
Test of the Influence of a Factor
Let’s make the hypothesis that factor \(A\), for instance, has no influence on the variable \(Y\). This means that \(\forall i \ : \ \alpha_{i}=0\). We then show that \(\frac{SSA}{\sigma^{2}} \sim \chi^{2}(p-1)\). Therefore,
Theorem
Under \(\mathcal{H}_{0}: \ \forall i, \ \alpha_{i}=0\) (no influence of factor \(A\)), we have: \(F_{A} = \frac{SSA/(p-1)}{SSE/(pq(r-1))} \sim \mathcal{F}(p-1;pq(r-1))\)
Execution of computation
We compute SSA, SSB, SSAB, and SSE using the following formulas:
\(\begin{align} SSA &= qr \sum\limits_{i}\bar{y}_{i}^{2} -pqr\bar{y}^{2}\\ SSB &= pr \sum\limits_{j} \bar{y_{j}}^{2}- pqr\bar{y}^{2}\\ SSAB &= r \sum\limits_{ij}\bar{y}_{ij}^{2}- pqr\bar{y}^{2}- SSA - SSB\\ SST &= \sum\limits_{ijk}y_{ijk}^{2}- pqr\bar{y}^{2} \end{align}\) then, by difference: \(SSE = SST - SSA - SSB - SSAB\) We finally draw up the following table:
| Variation | SS | Degrees of Freedom | Computed \(f\) | Critical Value |
|---|---|---|---|---|
| Factor A | SSA | \(p-1\) | \(f_{A}=\frac{SSA/(p-1)}{SSE/(pq(r-1))}\) | \(f_{\alpha}(p-1,pq(r-1))\) |
| Factor B | SSB | \(q-1\) | \(f_{B}=\frac{SSB/(q-1)}{SSE/(pq(r-1))}\) | \(f_{\alpha}(q-1,pq(r-1))\) |
| Interaction | SSAB | \((p-1)(q-1)\) | \(f_{AB}=\frac{SSAB/((p-1)(q-1))}{SSE/(pq(r-1))}\) | \(f_{\alpha}((p-1)(q-1),pq(r-1))\) |
| Residual | SSE | \(pq(r-1)\) | | |
| Total | SST | \(pqr-1\) | | |
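A minimal sketch of this computation for a balanced design with replication is given below. The function name `two_way_anova`, the nested-list layout `y[i][j][k]` and the data are assumptions made for illustration only. The critical value 7.71 is \(f_{0.05}(1, 4)\) read from a Fisher-Snedecor table.

```python
def two_way_anova(y):
    """y[i][j] is the list of r measurements for the combination (A_i, B_j)."""
    p, q, r = len(y), len(y[0]), len(y[0][0])
    grand = sum(v for row in y for cell in row for v in cell) / (p * q * r)
    mean_a = [sum(v for cell in y[i] for v in cell) / (q * r) for i in range(p)]
    mean_b = [sum(v for i in range(p) for v in y[i][j]) / (p * r) for j in range(q)]
    mean_ab = [[sum(y[i][j]) / r for j in range(q)] for i in range(p)]

    ssa = q * r * sum((m - grand) ** 2 for m in mean_a)
    ssb = p * r * sum((m - grand) ** 2 for m in mean_b)
    ssab = r * sum(
        (mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(p) for j in range(q)
    )
    sse = sum(
        (v - mean_ab[i][j]) ** 2
        for i in range(p) for j in range(q) for v in y[i][j]
    )
    ms_e = sse / (p * q * (r - 1))  # residual mean square
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
        "f_A": (ssa / (p - 1)) / ms_e,
        "f_B": (ssb / (q - 1)) / ms_e,
        "f_AB": (ssab / ((p - 1) * (q - 1))) / ms_e,
        "dof": (p - 1, q - 1, (p - 1) * (q - 1), p * q * (r - 1)),
    }


# Hypothetical lifetimes: 2 filaments (A) x 2 gases (B), r = 2 repetitions.
bulbs = [
    [[10, 12], [14, 16]],  # filament A_1 with gases B_1 and B_2
    [[11, 13], [20, 22]],  # filament A_2 with gases B_1 and B_2
]
result = two_way_anova(bulbs)
for name in ("f_A", "f_B", "f_AB"):
    print(name, result[name], result[name] > 7.71)
```

With these made-up numbers both main effects exceed the critical value while the interaction does not. The sketch assumes a complete, balanced table: a missing measurement would invalidate the decomposition, as noted above.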
Analysis of Variance Without Replication
Suppose that we have carried out only one measurement \(y_{ij}\) for each pair of modalities \((A_{i}, B_{j})\) according to the following table.
| | \(A_1\) | \(\ldots\) | \(A_i\) | \(\ldots\) | \(A_p\) |
|---|---|---|---|---|---|
| \(B_1\) | \(y_{11}\) | \(\ldots\) | \(y_{i1}\) | \(\ldots\) | \(y_{p1}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(B_j\) | \(y_{1j}\) | \(\ldots\) | \(y_{ij}\) | \(\ldots\) | \(y_{pj}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(B_q\) | \(y_{1q}\) | \(\ldots\) | \(y_{iq}\) | \(\ldots\) | \(y_{pq}\) |
The analysis of variance equation is then written as
\[\sum\limits_{ij}(y_{ij}-\bar{y})^{2} = q \sum\limits_{i}(\bar{y}_{i}- \bar{y})^{2} + p \sum\limits_{j}(\bar{y}_{j}-\bar{y})^{2} + \sum\limits_{ij}\left[ (y_{ij}-\bar{y}) - (\bar{y}_{i}-\bar{y}) - (\bar{y}_{j}- \bar{y}) \right]^{2}\]
i.e., with the usual notations: \(SST = SSA + SSB + SSAB\)

It becomes impossible to test the interaction, since we no longer have a quantity such as SSE allowing, by division, to eliminate \(\sigma^{2}\) and obtain a Fisher-Snedecor distribution. It is therefore necessary in this case to make the hypothesis (impossible to verify) that there is no interaction. We must therefore adopt the additive model: \(y_{ij} = \mu + \alpha_{i}+\beta_{j}+\varepsilon_{ij}\)

Under this condition, and whatever the actions of factors \(A\) and \(B\), we show, as in the general case, that \(\frac{SSAB}{\sigma^{2}} \sim \chi^{2}((p-1)(q-1))\): the term SSAB now plays the role of the residual sum of squares. Then, to test the influence of \(A\) for instance, let’s make the hypothesis that the \(\alpha_{i}\) are all zero. It implies that \(\frac{SSA}{\sigma^{2}}\sim \chi^{2}(p-1)\) and consequently, that:
\[F_{A}= \frac { \frac{SSA}{p-1} } { \frac{SSAB}{(p-1)(q-1)} } \sim \mathcal{F}(p-1;(p-1)(q-1))\]
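The case without replication can be sketched in the same spirit. The function name `anova_no_replication`, the layout `y[i][j]` and the data are hypothetical. The critical value 6.94 is \(f_{0.05}(2, 4)\) read from a Fisher-Snedecor table.

```python
def anova_no_replication(y):
    """y[i][j] is the single measurement for the combination (A_i, B_j)."""
    p, q = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (p * q)
    mean_a = [sum(row) / q for row in y]
    mean_b = [sum(y[i][j] for i in range(p)) / p for j in range(q)]

    ssa = q * sum((m - grand) ** 2 for m in mean_a)
    ssb = p * sum((m - grand) ** 2 for m in mean_b)
    # SSAB is used as the residual term under the additive model.
    ssab = sum(
        (y[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(p) for j in range(q)
    )
    ms_res = ssab / ((p - 1) * (q - 1))
    f_a = (ssa / (p - 1)) / ms_res
    f_b = (ssb / (q - 1)) / ms_res
    return ssa, ssb, ssab, f_a, f_b


# Hypothetical measurements: 3 modalities of A (rows) x 3 modalities of B.
measurements = [
    [10, 12, 14],
    [13, 14, 18],
    [16, 19, 19],
]
ssa, ssb, ssab, f_a, f_b = anova_no_replication(measurements)
print(ssa, ssb, ssab, f_a, f_b, f_a > 6.94, f_b > 6.94)
```

The statistic for factor \(B\) is obtained by symmetry, replacing SSA and \(p-1\) by SSB and \(q-1\). Both tests rest on the unverifiable assumption of no interaction discussed above.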