Statistical Hypothesis Testing

Master A-Level hypothesis testing: null and alternative hypotheses, significance levels, binomial tests, critical regions, and one/two-tailed tests.

Hypothesis testing is a formal statistical procedure used to assess whether observed data provides sufficient evidence to reject a claim about a population parameter. It is one of the most important topics in A-Level Statistics, tested by all major exam boards (AQA, Edexcel, OCR), and it underpins much of the statistical reasoning used in science, medicine, business, and social research.

At A-Level, hypothesis testing is introduced using the binomial distribution (and later the normal distribution at A2 for some boards). You will learn to set up hypotheses, choose significance levels, calculate test statistics or probabilities, and draw conclusions in context. The language and logic of hypothesis testing are precise — examiners reward students who use correct terminology and structure their arguments clearly.

This guide covers the full A-Level hypothesis testing framework, with a focus on binomial tests, critical regions, and the interpretation of results.

Core Concepts

What is a Hypothesis Test?

A hypothesis test starts with a claim (or assumption) about a population parameter, typically a probability $p$ or a mean $\mu$. We collect data and ask: "Is this data consistent with the claim, or does it provide evidence against it?"

The test follows a structured process:

  1. Define the hypotheses.
  2. Choose a significance level.
  3. Collect data and calculate the test statistic.
  4. Compare with the critical value (or calculate the $p$-value).
  5. Draw a conclusion in context.

Null and Alternative Hypotheses

The null hypothesis $H_0$ is the default assumption, typically that nothing has changed or that a parameter takes a specified value. For example:

$$H_0: p = 0.3$$

The alternative hypothesis $H_1$ specifies what we suspect might be true instead. It can take one of three forms:

  • One-tailed (upper): $H_1: p > 0.3$ (we suspect $p$ is larger)
  • One-tailed (lower): $H_1: p < 0.3$ (we suspect $p$ is smaller)
  • Two-tailed: $H_1: p \neq 0.3$ (we suspect $p$ is different, but don't specify the direction)

The choice of $H_1$ depends on the context of the problem and must be decided before looking at the data.

Significance Level

The significance level $\alpha$ is the probability of incorrectly rejecting $H_0$ when it is actually true (a Type I error). Common significance levels are:

  • $\alpha = 0.05$ (5%), the most common
  • $\alpha = 0.01$ (1%), more stringent
  • $\alpha = 0.10$ (10%), more lenient

The smaller the significance level, the stronger the evidence needed to reject $H_0$.

The Binomial Test

At A-Level, many hypothesis tests involve a binomial distribution. If we observe $X$ successes in $n$ independent trials, each with probability $p$ of success, then under $H_0$:

$$X \sim B(n, p_0)$$

where $p_0$ is the value of $p$ specified in $H_0$.

To carry out the test, we calculate the probability of obtaining a result as extreme as (or more extreme than) the observed value, assuming $H_0$ is true.

For a one-tailed test ($H_1: p < p_0$):

Calculate $P(X \leq x_{\text{obs}})$ under $H_0$. If this probability is less than $\alpha$, reject $H_0$.

For a one-tailed test ($H_1: p > p_0$):

Calculate $P(X \geq x_{\text{obs}})$ under $H_0$. If this probability is less than $\alpha$, reject $H_0$.

For a two-tailed test ($H_1: p \neq p_0$):

Calculate the probability in the relevant tail and compare with $\frac{\alpha}{2}$ (since the significance level is split between both tails).
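The tail probabilities described above can be computed directly from the binomial formula. The following is a minimal sketch using only Python's standard library; the function names `binom_pmf` and `tail_probability` and the illustrative numbers are assumptions made for this example, not part of any exam board's method.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), from the binomial formula."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def tail_probability(x_obs: int, n: int, p0: float, tail: str) -> float:
    """Tail probability under H0: X ~ B(n, p0).

    tail="lower" gives P(X <= x_obs), used when H1: p < p0.
    tail="upper" gives P(X >= x_obs), used when H1: p > p0.
    """
    if tail == "lower":
        values = range(0, x_obs + 1)
    elif tail == "upper":
        values = range(x_obs, n + 1)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return sum(binom_pmf(r, n, p0) for r in values)


# Illustrative numbers (assumed): 2 successes in 20 trials,
# testing H0: p = 0.3 against H1: p < 0.3 at the 5% level.
alpha = 0.05
p_value = tail_probability(2, 20, 0.3, "lower")
print(f"P(X <= 2) = {p_value:.4f}; reject H0: {p_value < alpha}")
```

For a two-tailed test, the same function can be called for the relevant tail and the result compared with `alpha / 2` instead of `alpha`.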

Critical Regions and Critical Values

The critical region is the set of values of the test statistic that lead to rejection of $H_0$. The critical value is the boundary of this region.

For a binomial test with $H_1: p < p_0$ at the 5% level, the critical region consists of all values $x$ such that $P(X \leq x) < 0.05$. The largest such $x$ is the critical value.

For $H_1: p > p_0$, the critical region is in the upper tail: values $x$ such that $P(X \geq x) < 0.05$. The smallest such $x$ is the critical value.

When the observed value falls in the critical region, we reject $H_0$. When it falls outside, we do not reject $H_0$.
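Finding a critical value amounts to scanning cumulative probabilities until the tail probability crosses the significance level. The sketch below shows one way to do this in Python; the helper names `binom_cdf`, `lower_critical_value` and `upper_critical_value` are hypothetical, and the strict inequality follows the definition used in this guide.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(0, x + 1))


def lower_critical_value(n: int, p0: float, alpha: float):
    """Largest c with P(X <= c) < alpha, or None if no such c exists."""
    c = None
    for x in range(0, n + 1):
        if binom_cdf(x, n, p0) < alpha:
            c = x
        else:
            break  # the cumulative probability only increases from here
    return c


def upper_critical_value(n: int, p0: float, alpha: float):
    """Smallest c with P(X >= c) < alpha, or None if no such c exists."""
    for x in range(0, n + 1):
        upper_tail = 1 - binom_cdf(x - 1, n, p0)  # P(X >= x)
        if upper_tail < alpha:
            return x
    return None


# Illustrative values (assumed): X ~ B(20, 0.3) at the 5% level.
print("lower critical value:", lower_critical_value(20, 0.3, 0.05))
print("upper critical value:", upper_critical_value(20, 0.3, 0.05))
```

A return value of `None` signals that no value of $X$ is extreme enough to reject $H_0$ at that level, which can happen when $n$ is small.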

The Actual Significance Level

Because the binomial distribution is discrete, we usually cannot achieve exactly $\alpha = 0.05$. The actual significance level is the probability of the critical region, which is as close to $\alpha$ as possible without exceeding it.

For example, if the critical region is $X \leq 2$ and $P(X \leq 2) = 0.0382$, then the actual significance level is $3.82\%$, not exactly $5\%$.
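The same idea extends to a two-tailed test, where the actual significance level is the sum of the probabilities of both tails of the critical region. The following sketch assumes illustrative values $n = 20$, $p_0 = 0.5$ and $\alpha = 0.05$ (not taken from this guide) and allows at most $\alpha/2$ in each tail.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(0, x + 1))


# Illustrative values (assumed for this sketch).
n, p0, alpha = 20, 0.5, 0.05

# Lower tail: largest a with P(X <= a) < alpha / 2.
a = max(x for x in range(n + 1) if binom_cdf(x, n, p0) < alpha / 2)

# Upper tail: smallest b with P(X >= b) < alpha / 2.
b = min(x for x in range(n + 1) if 1 - binom_cdf(x - 1, n, p0) < alpha / 2)

# Actual significance level: total probability of the critical region.
actual_level = binom_cdf(a, n, p0) + (1 - binom_cdf(b - 1, n, p0))

print(f"Critical region: X <= {a} or X >= {b}")
print(f"Actual significance level: {actual_level:.4f}")
```

With these values the actual significance level comes out below the nominal 5%, as expected for a discrete distribution.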

Type I and Type II Errors

A Type I error occurs when we reject $H_0$ when it is actually true. The probability of a Type I error equals the significance level of the test (for a discrete distribution such as the binomial, this is the actual significance level, which is at most $\alpha$).

A Type II error occurs when we fail to reject $H_0$ when it is actually false. The probability of a Type II error depends on the true value of the parameter and is harder to calculate.

| | $H_0$ true | $H_0$ false |
| --- | --- | --- |
| Reject $H_0$ | Type I error | Correct decision |
| Do not reject $H_0$ | Correct decision | Type II error |
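To see why the Type II error probability depends on the true parameter, it can help to compute it for a few assumed values. The sketch below takes a lower-tailed test of $H_0: p = 0.3$ with $n = 12$ and critical region $X \leq 0$; the "true" values of $p$ in the loop are purely illustrative assumptions, and `type_ii_lower_tail` is a hypothetical helper name.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(0, x + 1))


def type_ii_lower_tail(c: int, n: int, p_true: float) -> float:
    """P(Type II error) for critical region X <= c.

    H0 is not rejected when X > c, so this is P(X > c) computed
    with the true probability p_true rather than p0.
    """
    return 1 - binom_cdf(c, n, p_true)


n, c, p0 = 12, 0, 0.3

# Type I error probability: chance of landing in the critical region under H0.
type_i = binom_cdf(c, n, p0)
print(f"P(Type I error) = {type_i:.4f}")

# Type II error probability for some assumed true values of p.
for p_true in (0.25, 0.10, 0.05):
    print(f"true p = {p_true}: P(Type II error) = {type_ii_lower_tail(c, n, p_true):.4f}")
```

The further the true value of $p$ lies from $p_0$, the smaller the Type II error probability becomes, although with a sample this small it stays large.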

Writing Conclusions

Conclusions must be written in context and with appropriate language:

  • Reject $H_0$: "There is sufficient evidence at the $\alpha$ significance level to reject $H_0$ and conclude that [contextual statement about $H_1$]."
  • Do not reject $H_0$: "There is insufficient evidence at the $\alpha$ significance level to reject $H_0$. There is no significant evidence that [contextual statement about $H_1$]."

Important: we never say we "accept $H_0$". We only say we "do not reject" it, because failing to find evidence against $H_0$ is not the same as proving it true.

Strategy Tips

Tip 1: Read the Context Carefully

The wording of the question tells you which alternative hypothesis to use. Phrases like "believes the proportion has increased" suggest $H_1: p > p_0$; "claims it has changed" suggests $H_1: p \neq p_0$.

Tip 2: Set Up Hypotheses Before Calculating

Always write down $H_0$ and $H_1$ before doing any calculations. This ensures you test the correct tail and use the correct comparison.

Tip 3: Use the Correct Tail Probability

For $H_1: p < p_0$, calculate $P(X \leq x)$ (lower tail). For $H_1: p > p_0$, calculate $P(X \geq x)$ (upper tail). Mixing these up is one of the most common errors.

Tip 4: State the Distribution Under $H_0$

Explicitly write "Under $H_0$, $X \sim B(n, p_0)$". This earns a method mark and shows the examiner you understand the test framework.

Tip 5: Always Conclude in Context

A conclusion that says only "reject $H_0$" without reference to the real-world situation will lose marks. Always relate your answer back to the scenario described in the question.

Worked Example: Example 1

Problem

A manufacturer claims that $20\%$ of items produced are defective. A quality inspector tests a random sample of $20$ items and finds $7$ defective. Test, at the $5\%$ significance level, whether the proportion of defective items is greater than $20\%$.

Solution

$H_0: p = 0.2$ (the proportion of defective items is $20\%$)

$H_1: p > 0.2$ (the proportion is greater than $20\%$)

Significance level: $\alpha = 0.05$ (one-tailed test).

Under $H_0$: $X \sim B(20, 0.2)$, where $X$ is the number of defective items.

Observed value: $x = 7$.

Calculate $P(X \geq 7)$ under $H_0$:

$$P(X \geq 7) = 1 - P(X \leq 6)$$

Using binomial tables or a calculator: $P(X \leq 6) = 0.9133$

$$P(X \geq 7) = 1 - 0.9133 = 0.0867$$

Since $0.0867 > 0.05$, we do not reject $H_0$.

Conclusion: There is insufficient evidence at the $5\%$ significance level to conclude that the proportion of defective items is greater than $20\%$.

Worked Example: Example 2

Problem

A coin is suspected of being biased. It is tossed $10$ times and lands on heads $9$ times. Test at the $5\%$ significance level whether the coin is biased towards heads.

Solution

$H_0: p = 0.5$ (the coin is fair)

$H_1: p > 0.5$ (the coin is biased towards heads)

Significance level: $\alpha = 0.05$ (one-tailed).

Under $H_0$: $X \sim B(10, 0.5)$, where $X$ is the number of heads.

Observed value: $x = 9$.

$$P(X \geq 9) = P(X = 9) + P(X = 10)$$

$$P(X = 9) = \binom{10}{9}(0.5)^{10} = 10 \times \frac{1}{1024} = \frac{10}{1024}$$

$$P(X = 10) = \binom{10}{10}(0.5)^{10} = \frac{1}{1024}$$

$$P(X \geq 9) = \frac{11}{1024} \approx 0.0107$$

Since $0.0107 < 0.05$, we reject $H_0$.

Conclusion: There is sufficient evidence at the $5\%$ significance level to conclude that the coin is biased towards heads.
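Because $p_0 = \frac{1}{2}$ here, the tail probability is an exact fraction, and it can be checked without rounding. The sketch below uses Python's `fractions.Fraction` for exact arithmetic; the helper name `pmf` is an assumption of this example.

```python
from fractions import Fraction
from math import comb

n = 10
p0 = Fraction(1, 2)  # probability of heads under H0


def pmf(r: int) -> Fraction:
    """Exact P(X = r) for X ~ B(10, 1/2)."""
    return comb(n, r) * p0**r * (1 - p0) ** (n - r)


# Upper tail for the observed value x = 9.
p_upper = pmf(9) + pmf(10)
print(p_upper, float(p_upper))  # prints: 11/1024 0.0107421875

# Compare with the 5% significance level, also held as an exact fraction.
print("reject H0:", p_upper < Fraction(5, 100))
```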

Worked Example: Example 3

Problem

Historically, $35\%$ of students at a school achieve a grade A in maths. After introducing a new teaching method, a random sample of $15$ students is taken and $8$ achieve grade A. Test at the $5\%$ significance level whether there is evidence that the proportion has changed.

Solution

$H_0: p = 0.35$

$H_1: p \neq 0.35$ (two-tailed test)

Significance level: $\alpha = 0.05$, so each tail has $\alpha/2 = 0.025$.

Under $H_0$: $X \sim B(15, 0.35)$, where $X$ is the number of students achieving grade A.

Observed value: $x = 8$. Since $8 > 15 \times 0.35 = 5.25$, the observed value lies above the expected number under $H_0$, so we test the upper tail.

$$P(X \geq 8) = 1 - P(X \leq 7)$$

Using a calculator: $P(X \leq 7) = 0.8868$

$$P(X \geq 8) = 1 - 0.8868 = 0.1132$$

Since $0.1132 > 0.025$ (half the significance level, which is the threshold for each tail of a two-tailed test), we do not reject $H_0$.

Conclusion: There is insufficient evidence at the $5\%$ significance level to conclude that the proportion of students achieving grade A has changed following the new teaching method.
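The tail probability in a two-tailed test like this one can be recomputed from the binomial formula instead of being read from tables. The following sketch picks the tail by comparing the observed value with the mean under $H_0$ and then compares with $\alpha/2$; the variable names are illustrative assumptions.

```python
from math import comb

# Values from the example: X ~ B(15, 0.35) under H0, observed x = 8.
n, p0, x_obs, alpha = 15, 0.35, 8, 0.05


def pmf(r: int) -> float:
    """P(X = r) for X ~ B(n, p0)."""
    return comb(n, r) * p0**r * (1 - p0) ** (n - r)


# Choose the tail on the same side of the mean as the observed value.
mean = n * p0
if x_obs > mean:
    tail = sum(pmf(r) for r in range(x_obs, n + 1))   # P(X >= x_obs)
else:
    tail = sum(pmf(r) for r in range(0, x_obs + 1))   # P(X <= x_obs)

# Two-tailed test: compare the single tail with alpha / 2.
reject = tail < alpha / 2
print(f"tail probability = {tail:.4f}, threshold = {alpha / 2}, reject H0: {reject}")
```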

Worked Example: Example 4

Problem

Find the critical region for a test of $H_0: p = 0.3$ against $H_1: p < 0.3$ using $X \sim B(12, 0.3)$ at the $5\%$ significance level.

Solution

We need the largest value $c$ such that $P(X \leq c) < 0.05$ under $H_0$.

$$P(X = 0) = (0.7)^{12} = 0.0138$$

$$P(X \leq 0) = 0.0138 < 0.05$$

$$P(X \leq 1) = P(X = 0) + P(X = 1) = 0.0138 + \binom{12}{1}(0.3)^1(0.7)^{11} = 0.0138 + 0.0712 = 0.0850$$

$$P(X \leq 1) = 0.0850 > 0.05$$

So the critical region is $X \leq 0$, i.e., $\{0\}$.

The actual significance level is $0.0138$ ($1.38\%$).

Practice Problems

    Problem 1

    A die is thought to be biased. The probability of rolling a six is tested. In $30$ rolls, $9$ sixes are observed. Test at the $5\%$ level whether the die is biased towards six. ($H_0: p = \frac{1}{6}$, $H_1: p > \frac{1}{6}$.) [Hint: $P(X \geq 9)$ where $X \sim B(30, 1/6)$]

    Problem 2

    A charity claims that $40\%$ of households donate. A survey of $25$ households finds $6$ donors. Test at the $5\%$ level whether the proportion is less than $40\%$. [Answer: $P(X \leq 6) \approx 0.074 > 0.05$, do not reject $H_0$]

    Problem 3

    Find the critical region for testing $H_0: p = 0.5$ against $H_1: p < 0.5$ with $n = 10$ at the $5\%$ significance level. [Answer: $X \leq 1$, actual significance $= 0.0107$]

    Problem 4

    A factory's defect rate has historically been $10\%$. After maintenance, a sample of $50$ items reveals $2$ defects. Is there evidence at the $5\%$ level that the defect rate has decreased? [Hint: one-tailed test, $X \sim B(50, 0.1)$]

    Problem 5

    Explain what is meant by a Type I error in the context of Problem 1 above. State its probability.

Common Mistakes

  • Saying "accept $H_0$" instead of "do not reject $H_0$". This is a critical language error. We never prove $H_0$ true; we merely find insufficient evidence to reject it.

  • Using the wrong tail. If $H_1: p > p_0$, you need the upper tail probability $P(X \geq x)$, not $P(X \leq x)$. Read $H_1$ carefully to determine the correct direction.

  • Forgetting to halve $\alpha$ for two-tailed tests. In a two-tailed test, compare the tail probability with $\frac{\alpha}{2}$, not $\alpha$. Forgetting this effectively doubles the significance level.

  • Not writing the distribution under $H_0$. Always state $X \sim B(n, p_0)$ explicitly. This is a required step in the method and earns marks.

  • Vague or non-contextual conclusions. "Reject $H_0$" alone is not sufficient. You must relate the conclusion to the real-world scenario described in the question.

  • Confusing $P(X = x)$ with $P(X \leq x)$. The $p$-value for a lower-tailed test is the cumulative probability $P(X \leq x)$, not the probability of that single value.

Frequently Asked Questions

Why don't we "accept" the null hypothesis?

Because failing to reject $H_0$ does not prove it is true. It merely means we did not find enough evidence against it. A different sample might yield different results. The correct phrase is "there is insufficient evidence to reject $H_0$".

How do I decide between a one-tailed and two-tailed test?

If the question suggests a specific direction of change (e.g., "believes the proportion has increased"), use a one-tailed test. If it says "test whether the proportion has changed" without specifying direction, use a two-tailed test.

What if my $p$-value exactly equals the significance level?

Convention varies, but at A-Level, if the $p$-value equals $\alpha$, we are on the boundary of the critical region. Most exam mark schemes treat this as "reject $H_0$" (the critical region includes the boundary), but read the question carefully.

Do I need to calculate binomial probabilities by hand?

You should be able to use the binomial probability formula $P(X = r) = \binom{n}{r}p^r(1-p)^{n-r}$ and cumulative probabilities. In practice, many exam boards provide statistical tables or expect calculator use. Check your board's guidance.

What is the actual significance level, and why does it differ from $\alpha$?

The actual significance level is the exact probability of the critical region. Because the binomial distribution is discrete, we cannot always achieve exactly $\alpha = 0.05$. The actual significance level is the largest possible probability that does not exceed $\alpha$.

Key Takeaways

  • Hypothesis testing follows a rigid structure. Define $H_0$ and $H_1$, state the significance level, identify the distribution under $H_0$, compute the probability, compare, and conclude in context.

  • $H_0$ represents the status quo. The null hypothesis is what we assume to be true unless the data provides sufficient evidence against it.

  • The significance level controls Type I error. Choosing $\alpha = 0.05$ means we accept a $5\%$ chance of incorrectly rejecting a true $H_0$.

  • Critical regions define rejection boundaries. If the observed test statistic falls in the critical region, we reject $H_0$. Otherwise, we do not.

  • Language matters enormously. Use "sufficient evidence to reject" and "insufficient evidence to reject"; never "accept $H_0$" or "prove $H_1$".

  • Context is king. Every conclusion must be expressed in terms of the original problem. Statistical jargon alone does not earn full marks.
