Comparing Data For A Numerical Variable Across Two Or More Groups
Theory
Compare a numerical variable across groups using parallel boxplots, back-to-back stem-and-leaf plots, parallel dot plots, or side-by-side histograms. Compare four things: centre, spread, shape, and outliers. Use the median and IQR if either group is skewed or has outliers. Heavily overlapping boxplots suggest no clear difference between the groups, and small samples limit how much you can conclude.
When comparing a numerical variable across two or more groups (Year
Choosing the right display:
- Parallel boxplots — best for comparing centre and spread of two or more groups on the same axis.
- Back-to-back stem-and-leaf — good when you want to keep every individual value visible.
- Parallel dot plots — for two small datasets with discrete values.
- Side-by-side histograms — for large grouped datasets (use the same scale on both axes).
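To make the "keep every individual value visible" idea concrete, here is a minimal sketch of a text back-to-back stem-and-leaf plot. The function name `back_to_back_stem_leaf`, the group labels, and the data are all hypothetical, and the sketch assumes non-negative whole-number values with the tens digit as the stem.

```python
from collections import defaultdict


def back_to_back_stem_leaf(left, right, left_label="Group A", right_label="Group B"):
    """Return a text back-to-back stem-and-leaf plot.

    Assumes non-negative integers: stem = tens, leaf = units.
    """
    left_leaves = defaultdict(list)
    right_leaves = defaultdict(list)
    for value in left:
        left_leaves[value // 10].append(value % 10)
    for value in right:
        right_leaves[value // 10].append(value % 10)

    lowest = min(min(left), min(right)) // 10
    highest = max(max(left), max(right)) // 10
    rows = []
    for stem in range(lowest, highest + 1):
        # Left-hand leaves are written largest-first so they grow outward from the stem.
        left_part = " ".join(str(d) for d in sorted(left_leaves[stem], reverse=True))
        right_part = " ".join(str(d) for d in sorted(right_leaves[stem]))
        rows.append((left_part, stem, right_part))

    width = max(len(left_label), max(len(row[0]) for row in rows))
    lines = [f"{left_label:>{width}} |    | {right_label}"]
    for left_part, stem, right_part in rows:
        lines.append(f"{left_part:>{width}} | {stem:>2} | {right_part}".rstrip())
    return "\n".join(lines)


# Made-up values, for illustration only.
group_a = [8, 9, 12, 14, 15, 17, 21]
group_b = [15, 18, 19, 22, 24, 27, 31]
print(back_to_back_stem_leaf(group_a, group_b))
```

Both groups share one column of stems, so centre and spread can be compared by eye while every value stays readable.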
The four-step comparison:
- Centre: which group has the higher mean/median? By how much?
- Spread: which has the larger SD/IQR/range? Similar or very different?
- Shape: both symmetric? Is one skewed?
- Outliers: any extreme values in either group?
The first diagram is a parallel boxplot comparing Year 11 and Year 12 weekly study hours. The second is the four-step framework to apply whenever you're asked to compare two distributions.
This subtopic is about describing differences, not numerical formulas. Use the four-step framework consistently.
Centre
Compare medians (preferred when the data are skewed or contain outliers) or means (preferred for symmetric data without outliers).
Spread
Compare IQRs (preferred when the data are skewed or contain outliers) or standard deviations (preferred for symmetric data without outliers).
Display choices
| Situation | Best display |
|---|---|
| Comparing centre and spread, two or more groups | Parallel boxplots |
| Want to see every value, two groups | Back-to-back stem-and-leaf |
| Two small discrete datasets | Parallel dot plots |
| Two large grouped datasets | Side-by-side histograms (same scale) |
Comparing two distributions
- Choose a display appropriate to the data and question (parallel boxplots are the default).
- Centre: compare medians (or means). State which is higher and by how much.
- Spread: compare IQRs (or SDs and ranges). Note if one group is much more variable.
- Shape: are both symmetric, or is one skewed? Bimodal?
- Outliers: any extreme values? If yes, prefer median + IQR over mean + SD.
- Conclude in plain English using the original units (e.g. "Year 12 students study, on average, about 3 hours more per week").
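The steps above can be sketched in code, under several assumptions that do not come from these notes: quartiles use the median-of-halves method, outliers are flagged with the common 1.5 × IQR fence rule, and `statistics.stdev` gives the sample standard deviation. The function names and the data are hypothetical. Shape is left out because it is better judged from the display itself.

```python
import statistics


def quartiles(values):
    """Q1 and Q3 by the median-of-halves method (needs at least two values).

    When the count is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])


def summarise(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    # Assumption: the usual 1.5 x IQR fences decide what counts as an outlier.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "iqr": iqr,
        "sd": statistics.stdev(values),  # sample standard deviation
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


def compare(name_a, a, name_b, b, units):
    """Return a one-sentence comparison of centre and spread."""
    sa, sb = summarise(a), summarise(b)
    # If either group has outliers, switch to the resistant measures.
    use_resistant = bool(sa["outliers"] or sb["outliers"])
    centre, spread = ("median", "iqr") if use_resistant else ("mean", "sd")
    higher = name_a if sa[centre] >= sb[centre] else name_b
    wider = name_a if sa[spread] >= sb[spread] else name_b
    diff = abs(sa[centre] - sb[centre])
    return (f"{higher} has the higher {centre} by {diff:.1f} {units}; "
            f"{wider} has the larger {spread.upper()}.")


# Made-up values, for illustration only.
group_a = [8, 9, 12, 14, 15, 17, 21]
group_b = [15, 18, 19, 22, 24, 27, 31]
print(compare("Group A", group_a, "Group B", group_b, "hours"))
```

This sketch only checks for outliers. A fuller version would also switch to the median and IQR when either distribution is clearly skewed.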
Reading parallel boxplots
- For each boxplot, read off the five-number summary.
- Compare medians side by side — are they clearly separated, or do the boxes overlap?
- Compare box lengths (IQRs) and the overall whisker-to-whisker extents (ranges, when there are no outliers).
- Note any outliers (dots beyond the whiskers).
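The values read off a boxplot can also be computed directly from the raw data. The sketch below assumes the median-of-halves method for quartiles (other conventions give slightly different values), and the function name `five_number_summary` and the data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return min, Q1, median, Q3 and max (needs at least two values)."""
    data = sorted(values)
    half = len(data) // 2  # the middle value is excluded when the count is odd
    return {
        "min": data[0],
        "q1": statistics.median(data[:half]),
        "median": statistics.median(data),
        "q3": statistics.median(data[-half:]),
        "max": data[-1],
    }


# Made-up values, for illustration only.
groups = {
    "Group A": [8, 9, 12, 14, 15, 17, 21],
    "Group B": [15, 18, 19, 22, 24, 27, 31],
}
for name, values in groups.items():
    s = five_number_summary(values)
    print(name, s, "IQR =", s["q3"] - s["q1"], "range =", s["max"] - s["min"])
```

Note that `min` and `max` here are the raw extremes. On a boxplot drawn with outliers shown as dots, the whiskers stop at the most extreme values that are not outliers.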
Subtract the medians to quantify the centre difference.
| Median diff |
Answer: Year 12 students study typically
Range = max − min for each group.
| Brisbane range | ||
| Cairns range |
Answer: Cairns has the greater spread (range
The mean includes every value, so the
Answer: the mean is most affected by the outlier. For Office B, use median + IQR for a fair comparison with Office A.
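A quick numerical check shows why the mean is the measure most affected. The numbers below are hypothetical and are not the data from the question above. The two lists are identical except for one extreme value.

```python
import statistics

# Made-up values, for illustration only.
without_outlier = [42, 45, 47, 50, 52, 55]
with_outlier = [42, 45, 47, 50, 52, 150]  # last value replaced by an extreme one

for label, data in [("without outlier", without_outlier), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {statistics.mean(data):.1f}, median = {statistics.median(data):.1f}")
```

The single extreme value drags the mean upward, while the median stays the same because it depends only on the middle of the ordered data.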
No. With only
Answer: the sample size is too small — the apparent
Frequently asked questions
Which display should I use to compare two groups?
Parallel (side-by-side) boxplots are usually best — they show centre, spread, range, and outliers at a glance. Back-to-back stem-and-leaf plots are good when you want to keep every individual value visible. Side-by-side histograms work for large grouped datasets.
What four things should I compare?
Centre (which group has the higher mean or median, and by how much), spread (which has the larger SD, IQR, or range), shape (are both symmetric or is one skewed), and outliers (any extreme values in either group).
Should I use mean or median to compare?
Use the median if either group has outliers or is skewed, because the median is resistant to outliers. Use the mean for symmetric data with no outliers.
What if two parallel boxplots overlap?
Heavy overlap suggests no clear difference between the groups. A real difference usually shows up as clearly separated medians and boxes that overlap only slightly or not at all. Small differences with overlapping boxes could easily be due to random variation.
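As a rough illustration, the overlap check can be written as a comparison of the two boxes (Q1 to Q3). The three-way rule below is an informal rule of thumb made up for this sketch, not a formal statistical test, and the function names and data are hypothetical. Quartiles use the median-of-halves method.

```python
import statistics


def box(values):
    """Return (Q1, median, Q3) by the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    return (statistics.median(data[:half]),
            statistics.median(data),
            statistics.median(data[-half:]))


def describe_overlap(a, b):
    q1_a, med_a, q3_a = box(a)
    q1_b, med_b, q3_b = box(b)
    if q3_a < q1_b or q3_b < q1_a:
        return "boxes do not overlap: the groups look clearly different"
    # Illustrative rule: overlap is "heavy" if either median lies inside the other box.
    if q1_b <= med_a <= q3_b or q1_a <= med_b <= q3_a:
        return "heavy overlap: no clear difference"
    return "partial overlap: medians separated, difference is suggestive only"


# Made-up values, for illustration only.
group_a = [8, 9, 12, 14, 15, 17, 21]
print(describe_overlap(group_a, [15, 18, 19, 22, 24, 27, 31]))
print(describe_overlap(group_a, [10, 11, 13, 15, 16, 18, 22]))
print(describe_overlap(group_a, [12, 16, 17, 19, 20, 23, 25]))
```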
Why do small samples make comparisons unreliable?
With small samples, a difference of a few units between groups can easily arise by chance. Larger samples give more reliable comparisons because random fluctuations average out.
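A small simulation can illustrate this. The sketch below repeatedly draws two samples from the same population, so any gap between their medians is due to chance alone, and counts how often the gap reaches a chosen threshold. The population (normal, mean 15, SD 5), the 3-unit threshold, and the function name `chance_gap_rate` are all assumptions chosen for illustration.

```python
import random
import statistics


def chance_gap_rate(sample_size, threshold, trials=2000, seed=1):
    """Estimate how often two samples from the SAME population have
    medians that differ by at least `threshold`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(15, 5) for _ in range(sample_size)]
        b = [rng.gauss(15, 5) for _ in range(sample_size)]
        if abs(statistics.median(a) - statistics.median(b)) >= threshold:
            hits += 1
    return hits / trials


for n in (5, 20, 100):
    rate = chance_gap_rate(n, threshold=3)
    print(f"n = {n:>3}: medians differ by 3 or more in {rate:.0%} of trials")
```

Under these assumptions, a 3-unit gap between medians is common with very small samples and rare with large ones, even though the two groups are not actually different.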
Why must side-by-side histograms use the same scale?
Different scales can make similar distributions look very different, or make different distributions look similar. Always use the same horizontal and vertical scales when comparing two histograms — otherwise the comparison is misleading.
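One way to guarantee matching scales is to count both groups into the same class intervals and use one shared maximum for the frequency axis. This is a minimal sketch assuming whole-number data. The function name `shared_bins` and the values are hypothetical.

```python
def shared_bins(group_a, group_b, bin_width):
    """Count both groups into the SAME class intervals.

    Returns (edges, counts_a, counts_b, y_max). Each interval includes
    its lower edge and excludes its upper edge.
    """
    low = min(min(group_a), min(group_b))
    high = max(max(group_a), max(group_b))
    start = (low // bin_width) * bin_width  # round down to a tidy edge

    edges = []
    edge = start
    while edge <= high:
        edges.append(edge)
        edge += bin_width
    edges.append(edge)  # closing edge, above the largest value

    def counts(values):
        result = [0] * (len(edges) - 1)
        for v in values:
            result[(v - start) // bin_width] += 1
        return result

    counts_a, counts_b = counts(group_a), counts(group_b)
    # One shared vertical maximum keeps the bar heights comparable.
    y_max = max(max(counts_a), max(counts_b))
    return edges, counts_a, counts_b, y_max


# Made-up values, for illustration only.
edges, counts_a, counts_b, y_max = shared_bins(
    [8, 9, 12, 14, 15, 17, 21], [15, 18, 19, 22, 24, 27, 31], bin_width=5
)
print(edges, counts_a, counts_b, y_max)
```

Plotting both sets of counts against the same `edges`, with the vertical axis running from 0 to `y_max`, gives two histograms that can be compared fairly.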