Year 11 General Univariate Data Analysis

Comparing Data For A Numerical Variable Across Two Or More Groups


Theory

Compare a numerical variable across groups using parallel boxplots, back-to-back stem-and-leaf plots, or side-by-side histograms. Compare four things: centre, spread, shape, and outliers. Use median + IQR if either group has outliers. Watch for overlapping boxplots (suggesting no clear difference) and small-sample limitations.

When comparing a numerical variable across two or more groups (Year 11 vs Year 12 study hours, Brisbane vs Sydney rainfall), the goal is to compare centre, spread, and shape, and to check each group for outliers.

Choosing the right display:

  • Parallel boxplots — best for comparing centre and spread of two or more groups on the same axis.
  • Back-to-back stem-and-leaf — good when you want to keep every individual value visible.
  • Parallel dot plots — for two small datasets with discrete values.
  • Side-by-side histograms — for large grouped datasets (use the same scale on both axes).

The four-step comparison:

  1. Centre: which group has the higher mean/median? By how much?
  2. Spread: which has the larger SD/IQR/range? Similar or very different?
  3. Shape: both symmetric? Is one skewed?
  4. Outliers: any extreme values in either group?

The first diagram is a parallel boxplot comparing Year 11 and Year 12 weekly study hours. The second is the four-step framework to apply whenever you're asked to compare two distributions.

Figure: Parallel boxplots of weekly study hours (horizontal axis: hours per week, 0 to 12). Year 11 median = 5 hours; Year 12 median = 8 hours.
Year 12's distribution sits roughly 3 hours to the right of Year 11's, with similar IQRs.

Figure: Four-step comparison framework.

Step  What to compare                        What to look at / report
1     Centre: where is the typical value?    Compare medians (or means). By how much do they differ?
2     Spread: how widely do values vary?     Compare IQRs (or SDs). Is one group more variable?
3     Shape: symmetric or skewed?            Symmetric, positively skewed, or negatively skewed?
4     Outliers: extreme values?              Any in either group? Use median/IQR if so.

Write one short sentence per step in plain English.
Four things to compare every time: centre, spread, shape, and outliers.

This subtopic is about describing differences, not numerical formulas. Use the four-step framework consistently.

Centre

Compare medians (preferred for skewed or outlier data) or means (preferred for symmetric data).

$\text{centre difference} = \text{median}_A - \text{median}_B$

Spread

Compare IQRs (preferred when outliers are present) or standard deviations (preferred for symmetric data).

$\text{IQR} = Q_3 - Q_1, \qquad \text{range} = \max - \min$

Display choices

Situation                                          Best display
Comparing centre and spread, two or more groups    Parallel boxplots
Want to see every value, two groups                Back-to-back stem-and-leaf
Two small discrete datasets                        Parallel dot plots
Two large grouped datasets                         Side-by-side histograms (same scale)

Robust comparisons. If outliers are present in either group, compare medians and IQRs: they are not pulled by extreme values, whereas the mean and standard deviation are.

Comparing two distributions

  1. Choose a display appropriate to the data and question (parallel boxplots are the default).
  2. Centre: compare medians (or means). State which is higher and by how much.
  3. Spread: compare IQRs (or SDs and ranges). Note if one group is much more variable.
  4. Shape: are both symmetric, or is one skewed? Bimodal?
  5. Outliers: any extreme values? If yes, prefer median + IQR over mean + SD.
  6. Conclude in plain English using the original units (e.g. "Year 12 students study, on average, about 3 hours more per week").
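
The steps above can be sketched in Python. This is a minimal illustration rather than part of the course method: the study-hours data are invented, the quartiles use the median-of-halves convention (middle value excluded when the count is odd), and outliers are flagged with the 1.5 × IQR rule. Both conventions are assumptions, so check which ones your course uses. Shape still has to be judged from the display itself.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using median-of-halves (assumed convention)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

def describe(values):
    """Median, IQR and any values outside the 1.5 x IQR fences (assumed rule)."""
    q1, med, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": med, "iqr": iqr, "outliers": outliers}

# Hypothetical weekly study hours, invented for illustration only.
year11 = [2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9]
year12 = [5, 6, 7, 7, 8, 8, 9, 9, 10, 11, 20]

y11, y12 = describe(year11), describe(year12)
centre_diff = y12["median"] - y11["median"]

print(f"Centre: Year 12 median is {centre_diff} hours higher "
      f"({y12['median']} vs {y11['median']}).")
print(f"Spread: IQRs are {y11['iqr']} hours (Year 11) and {y12['iqr']} hours (Year 12).")
print(f"Outliers: Year 11 {y11['outliers']}, Year 12 {y12['outliers']}.")
```

Because one group has an outlier here, the sketch reports medians and IQRs rather than means and standard deviations, in line with step 5.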

Reading parallel boxplots

  1. For each boxplot, read off the five-number summary.
  2. Compare medians side by side — are they clearly separated, or do the boxes overlap?
  3. Compare box widths (IQRs) and whisker lengths (ranges).
  4. Note any outliers (dots beyond the whiskers).
EXAMPLE 1 — COMPARE MEDIANS
Year 11 boxplot: median = 14 hours. Year 12 boxplot: median = 24 hours. Compare.
SOLUTION

Subtract the medians to quantify the centre difference.

$\text{Median diff} = 24 - 14 = 10 \text{ hours}$

Answer: Year 12 students typically study 10 hours more per week than Year 11 students.

$\text{median diff} = 10 \text{ hours}$
EXAMPLE 2 — COMPARE RANGES
Brisbane daily rainfall (sorted): 0, 2, 5, 8, 12, 15, 25. Cairns: 5, 10, 15, 20, 30, 35, 40. Compare ranges.
SOLUTION

Range = max − min for each group.

$\text{Brisbane range} = 25 - 0 = 25$
$\text{Cairns range} = 40 - 5 = 35$

Answer: Cairns has the greater spread (range 35 vs 25).

$\text{Cairns range} > \text{Brisbane range}$
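
As a check on Example 2, the ranges can be computed alongside the rest of the five-number summary. This is a sketch only: the function name `five_number_summary` is made up for illustration, and the quartiles assume the median-of-halves convention (middle value excluded when the count is odd), which may differ from the one your calculator uses.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using median-of-halves (assumed)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Rainfall values from Example 2.
brisbane = [0, 2, 5, 8, 12, 15, 25]
cairns = [5, 10, 15, 20, 30, 35, 40]

for name, rain in (("Brisbane", brisbane), ("Cairns", cairns)):
    low, q1, med, q3, high = five_number_summary(rain)
    print(f"{name}: min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}, "
          f"range={high - low}, IQR={q3 - q1}")
```

Under these assumptions the IQRs (13 for Brisbane, 25 for Cairns) tell the same story as the ranges: Cairns is the more variable group.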
EXAMPLE 3 — COMPARING WITH OUTLIERS
Office A delivery times: typically around 15 to 25 min. Office B: similar typical times but one extreme value of 120 min. Which summary measure is most affected by the outlier?
SOLUTION

The mean includes every value, so the 120 drags it up sharply. The median depends only on the middle of the ordered data, so one extreme value shifts it by no more than one place in the ordered list and it barely changes. Likewise, standard deviation and range are pulled by outliers; the IQR is not.

Answer: the mean is most affected by the outlier. For Office B, use median + IQR for a fair comparison with Office A.

$\text{mean} = \text{most affected}$
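
The effect in Example 3 is easy to see numerically. The delivery times below are invented to match the description (times between 15 and 25 min, with one value replaced by 120 min in Office B); they are not data from the example itself.

```python
from statistics import mean, median, stdev

# Hypothetical delivery times in minutes, invented for illustration.
office_a = [15, 17, 18, 20, 21, 23, 25]
office_b = [15, 17, 18, 20, 21, 23, 120]  # same values, but one extreme time

for name, times in (("Office A", office_a), ("Office B", office_b)):
    print(f"{name}: mean={mean(times):.1f}, median={median(times)}, "
          f"SD={stdev(times):.1f}, range={max(times) - min(times)}")
```

The median is 20 min for both offices, while the mean, standard deviation, and range for Office B are all inflated by the single 120.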
EXAMPLE 4 — SMALL SAMPLE LIMITATION
A sample of 5 Year 12 students has a mean 2 hours higher than a sample of 5 Year 11s. Does this prove Year 12 studies more?
SOLUTION

No. With only 5 students per group, a difference of 2 hours could easily come from chance variation between two small samples. The same comparison with, say, 50 students per group would be much more reliable.

Answer: the sample size is too small — the apparent 2-hour difference might just be random variation, not a real population difference.

$\text{small samples} \Rightarrow \text{unreliable}$
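
A short simulation can make the point in Example 4 concrete. In this sketch both samples are drawn from the same made-up population (normal, mean 6 hours, SD 2 hours, both values assumed purely for illustration), so any gap between the two sample means is chance alone.

```python
import random
from statistics import mean, stdev

def simulated_differences(sample_size, trials=2000, seed=1):
    """Gaps between two sample means when both samples share one population."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        # Hypothetical population: mean 6 h, SD 2 h (assumed values).
        a = [rng.gauss(6, 2) for _ in range(sample_size)]
        b = [rng.gauss(6, 2) for _ in range(sample_size)]
        diffs.append(mean(b) - mean(a))
    return diffs

def share_at_least(diffs, gap):
    """Fraction of trials where the sample means differ by `gap` or more."""
    return sum(abs(d) >= gap for d in diffs) / len(diffs)

diffs_5 = simulated_differences(5)
diffs_50 = simulated_differences(50)

for n, diffs in ((5, diffs_5), (50, diffs_50)):
    print(f"n = {n}: SD of the gap = {stdev(diffs):.2f} h, "
          f"trials with a gap of 2 h or more = {share_at_least(diffs, 2):.1%}")
```

With 5 students per group, a 2-hour gap turns up by chance in a noticeable share of trials; with 50 per group it should be rare, which is why the larger samples support a stronger conclusion.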

Common pitfalls

Heavy overlap → no clear difference. If two parallel boxplots overlap heavily, the groups aren't clearly different even if the medians are slightly apart. Real differences usually mean well-separated medians and minimal box overlap.
Use the same scale for parallel histograms or dot plots. Different scales can make comparable distributions look very different or different distributions look the same. Always check the axes.
Small samples → unreliable conclusions. A difference of a few units between two samples of 5 could easily be random. Large samples give much more reliable comparisons.
Mean is dragged by outliers; median isn't. If either group has outliers, compare medians (and IQRs) for a fair, robust comparison.
Always state units and context. "Year 12 is 10 higher" is incomplete. Say "Year 12 students study, on average, 10 hours more per week than Year 11 students."

Frequently asked questions

Which display should I use to compare two groups?

Parallel (side-by-side) boxplots are usually best — they show centre, spread, range, and outliers at a glance. Back-to-back stem-and-leaf plots are good when you want to keep every individual value visible. Side-by-side histograms work for large grouped datasets.

What four things should I compare?

Centre (which group has the higher mean or median, and by how much), spread (which has the larger SD, IQR, or range), shape (are both symmetric or is one skewed), and outliers (any extreme values in either group).

Should I use mean or median to compare?

Use the median if either group has outliers or is skewed, because the median is resistant to outliers. Use the mean for symmetric data with no outliers.

What if two parallel boxplots overlap?

Heavy overlap suggests no clear difference between the groups. A real difference usually means the medians are clearly separated and the boxes overlap only slightly or not at all. Small differences with overlapping boxes could easily be due to random variation.
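
One rough way to put a number on "heavy overlap" is to measure how much of the two boxes (the Q1 to Q3 intervals) coincides. The function below is a hypothetical helper and the quartiles are invented; treat it as a visual aid, not a formal test of whether two groups differ.

```python
def box_overlap(q1_a, q3_a, q1_b, q3_b):
    """Length of the interval shared by two boxes (0 if they do not overlap)."""
    return max(0, min(q3_a, q3_b) - max(q1_a, q1_b))

# Hypothetical quartiles (Q1, Q3) in hours, invented for illustration.
group_a = (4, 7)
others = {"B": (6, 10), "C": (4.5, 7.5), "D": (9, 12)}

for name, (q1, q3) in others.items():
    shared = box_overlap(*group_a, q1, q3)
    narrower = min(group_a[1] - group_a[0], q3 - q1)
    print(f"A vs {name}: boxes share {shared} h "
          f"({shared / narrower:.0%} of the narrower box)")
```

Here A and C overlap almost completely (no clear difference), A and B overlap only partly, and A and D do not overlap at all.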

Why do small samples make comparisons unreliable?

With small samples, a difference of a few units between groups can easily arise by chance. Larger samples give more reliable comparisons because random fluctuations average out.

Why must side-by-side histograms use the same scale?

Different scales can make similar distributions look very different, or make different distributions look similar. Always use the same horizontal and vertical scales when comparing two histograms — otherwise the comparison is misleading.
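
To illustrate the same-scale rule, the sketch below bins two groups on identical class intervals and prints text histograms in which one `#` always stands for one observation. The data, the bin width of 4 hours, and the function name `shared_bins` are all assumptions made for this example.

```python
def shared_bins(group_a, group_b, bin_width):
    """Count both groups on the same class intervals so bars are comparable."""
    combined = group_a + group_b
    start = (min(combined) // bin_width) * bin_width
    edges = []
    edge = start
    while edge <= max(combined):
        edges.append(edge)
        edge += bin_width

    def counts(values):
        # Each bin includes its lower edge and excludes its upper edge.
        return [sum(lo <= v < lo + bin_width for v in values) for lo in edges]

    return edges, counts(group_a), counts(group_b)

# Hypothetical weekly study hours, invented for illustration only.
year11 = [2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9]
year12 = [5, 6, 7, 7, 8, 8, 9, 9, 10, 11, 20]
BIN_WIDTH = 4

edges, counts_11, counts_12 = shared_bins(year11, year12, BIN_WIDTH)
for label, counts in (("Year 11", counts_11), ("Year 12", counts_12)):
    print(label)
    for lo, count in zip(edges, counts):
        print(f"  {lo:>2} to <{lo + BIN_WIDTH:<2} | {'#' * count}")
```

Because both groups use the same intervals and the same symbol size, the shift of Year 12 towards higher values can be read directly by comparing rows.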

Video Lessons

  • Comparing Box Plots-Comparing Box and Whisker Plots
  • Understanding & Comparing Boxplots (Box and Whisker Plots)
