Year 11 General Univariate Data Analysis

Comparing Data For A Numerical Variable Across Two Or More Groups


Theory

Compare a numerical variable across groups using parallel boxplots, back-to-back stem-and-leaf plots, or side-by-side histograms. Compare four things: centre, spread, shape, and outliers. Use median + IQR if either group has outliers. Watch for overlapping boxplots (suggesting no clear difference) and small-sample limitations.

When comparing a numerical variable across two or more groups (Year $11$ vs Year $12$ study hours, Brisbane vs Sydney rainfall), the goal is to compare centre, spread, shape, and outliers.

Choosing the right display:

Parallel boxplots — best for comparing centre and spread of two or more groups on the same axis.
Back-to-back stem-and-leaf — good when you want to keep every individual value visible.
Parallel dot plots — for two small datasets with discrete values.
Side-by-side histograms — for large grouped datasets (use the same scale on both axes).

The four-step comparison:

Centre: which group has the higher mean/median? By how much?
Spread: which has the larger SD/IQR/range? Similar or very different?
Shape: both symmetric? Is one skewed?
Outliers: any extreme values in either group?

The first diagram is a parallel boxplot comparing Year 11 and Year 12 weekly study hours. The second is the four-step framework to apply whenever you're asked to compare two distributions.

Year 12's distribution sits roughly $3$ hours to the right of Year 11's, with similar IQRs.

Four things to compare every time: centre, spread, shape, and outliers.

This subtopic is about describing differences, not numerical formulas. Use the four-step framework consistently.

Centre

Compare medians (preferred for skewed or outlier data) or means (preferred for symmetric data).

$\text{centre difference} = \text{median}_{A} - \text{median}_{B}$

Spread

Compare IQRs (preferred when outliers are present) or standard deviations (preferred for symmetric data).

$\text{IQR} = Q_{3} - Q_{1}, \quad \text{range} = \text{max} - \text{min}$
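As a rough illustration of the two spread formulas, the sketch below computes the IQR and range for the Brisbane rainfall values used in Example 2. It assumes quartiles are found as the medians of the lower and upper halves, leaving out the middle value when the count is odd. Other quartile conventions exist and can give slightly different answers. The function name `five_number_summary` is made up for this sketch.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # values above the middle position
    return data[0], median(lower), median(data), median(upper), data[-1]

brisbane = [0, 2, 5, 8, 12, 15, 25]   # Brisbane values from Example 2
low, q1, med, q3, high = five_number_summary(brisbane)
iqr = q3 - q1               # IQR = Q3 - Q1
data_range = high - low     # range = max - min
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, range = {data_range}")
# prints: Q1 = 2, Q3 = 15, IQR = 13, range = 25
```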

Display choices

Situation	Best display
Comparing centre and spread, two or more groups	Parallel boxplots
Want to see every value, two groups	Back-to-back stem-and-leaf
Two small discrete datasets	Parallel dot plots
Two large grouped datasets	Side-by-side histograms (same scale)

Robust comparisons. If outliers are present in either group, compare medians and IQRs — they are not pulled by extreme values, whereas mean and standard deviation are.

Comparing two distributions

Choose a display appropriate to the data and question (parallel boxplots are the default).
Centre: compare medians (or means). State which is higher and by how much.
Spread: compare IQRs (or SDs and ranges). Note if one group is much more variable.
Shape: are both symmetric, or is one skewed? Bimodal?
Outliers: any extreme values? If yes, prefer median + IQR over mean + SD.
Conclude in plain English using the original units (e.g. "Year 12 students study, on average, about 3 hours more per week").
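The centre and spread steps above can be sketched in code. The example below is only an illustration: the study-hours values are hypothetical (chosen so the medians differ by 3 hours), the helper names `quartiles` and `compare_groups` are invented here, and quartiles are assumed to follow the median-of-halves convention. Shape and outliers still need to be judged from a display.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    return median(data[:half]), median(data), median(data[half + n % 2:])

def compare_groups(name_a, a, name_b, b, units):
    """Compare centre (medians) and spread (IQRs) and build a plain-English conclusion."""
    q1_a, med_a, q3_a = quartiles(a)
    q1_b, med_b, q3_b = quartiles(b)
    diff = med_a - med_b
    higher, lower = (name_a, name_b) if diff >= 0 else (name_b, name_a)
    return {
        "median_difference": abs(diff),
        "iqr": {name_a: q3_a - q1_a, name_b: q3_b - q1_b},
        "conclusion": f"The median for {higher} is {abs(diff)} {units} higher than for {lower}.",
    }

# Hypothetical weekly study hours, for illustration only.
year11 = [10, 12, 13, 14, 15, 17, 18]
year12 = [13, 15, 16, 17, 18, 20, 21]

result = compare_groups("Year 12", year12, "Year 11", year11, "hours per week")
print(result["conclusion"])
print(result["iqr"])
```

With these made-up values the medians are 17 and 14 hours and both IQRs are 5 hours, so the groups differ in centre but not in spread.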

Reading parallel boxplots

For each boxplot, read off the five-number summary.
Compare medians side by side — are they clearly separated, or do the boxes overlap?
Compare box widths (IQRs) and whisker lengths (ranges).
Note any outliers (dots beyond the whiskers).
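One way to make "do the boxes overlap?" concrete is to measure how much of the two boxes (the intervals from $Q_1$ to $Q_3$) is shared. The sketch below is a visual aid only, not a formal statistical test. The function name `box_overlap` and the quartile values are hypothetical.

```python
def box_overlap(box_a, box_b):
    """Length of the interval shared by two boxes, each given as (Q1, Q3).

    Returns 0 when the boxes do not overlap at all.
    """
    low = max(box_a[0], box_b[0])     # where the shared interval would start
    high = min(box_a[1], box_b[1])    # where the shared interval would end
    return max(0, high - low)

# Hypothetical (Q1, Q3) pairs read off two parallel boxplots.
print(box_overlap((12, 17), (15, 20)))   # boxes share the interval 15 to 17 -> 2
print(box_overlap((12, 17), (20, 26)))   # boxes are fully separated -> 0
```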

EXAMPLE 1 — COMPARE MEDIANS

Year 11 boxplot: median $= 14$ hours. Year 12 boxplot: median $= 24$ hours. Compare.

SOLUTION

Subtract the medians to quantify the centre difference.

Median diff $= 24 - 14 = 10$ hours

Answer: Year 12 students typically study $10$ hours more per week than Year 11 students.

median diff = 10

EXAMPLE 2 — COMPARE RANGES

Brisbane daily rainfall (sorted): $0, 2, 5, 8, 12, 15, 25$. Cairns: $5, 10, 15, 20, 30, 35, 40$. Compare ranges.

SOLUTION

Range = max − min for each group.

Brisbane range	$=$	$25 - 0 = 25$
Cairns range	$=$	$40 - 5 = 35$

Answer: Cairns has the greater spread (range $35$ vs $25$ ).

Cairns range > Brisbane range
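The same arithmetic can be checked with a few lines of code. This is a minimal sketch using the values from this example; the helper name `value_range` is invented here.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

brisbane = [0, 2, 5, 8, 12, 15, 25]
cairns = [5, 10, 15, 20, 30, 35, 40]

print(value_range(brisbane))   # 25
print(value_range(cairns))     # 35
```

Because the range uses only the two most extreme values in each group, a single unusual day could change it a lot.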

EXAMPLE 3 — COMPARING WITH OUTLIERS

Office A delivery times: typical around $15$–$25$ min. Office B: similar typical times but one extreme value of $120$ min. Which summary measure is most affected by the outlier?

SOLUTION

The mean includes every value, so the $120$ drags it up sharply. The median depends only on the middle of the ordered data, so one extra extreme value shifts it by at most half a position and it barely changes. Likewise, standard deviation and range are pulled by outliers; IQR is not.

Answer: of the measures of centre, the mean is the one most affected by the outlier. For Office B, use median + IQR for a fair comparison with Office A.

mean = most affected
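To see the effect numerically, the sketch below uses hypothetical delivery times chosen to fit the description in this example (the actual data are not given). It assumes the median-of-halves convention for quartiles, and the helper name `iqr` is invented here.

```python
from statistics import mean, median, stdev

def iqr(values):
    """IQR using the median-of-halves convention (an assumption; conventions vary)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    return median(data[half + n % 2:]) - median(data[:half])

# Hypothetical delivery times in minutes, for illustration only.
office_a = [15, 18, 20, 22, 25]
office_b = [15, 18, 20, 22, 25, 120]   # same times plus one 120-minute outlier

for name, summary in [("mean", mean), ("median", median), ("SD", stdev), ("IQR", iqr)]:
    print(f"{name}: A = {summary(office_a):.1f}, B = {summary(office_b):.1f}")
```

With these made-up values the mean rises from 20 to about 36.7 minutes while the median moves only from 20 to 21, and the IQR stays at 7 while the standard deviation grows more than tenfold.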

EXAMPLE 4 — SMALL SAMPLE LIMITATION

A sample of $5$ Year 12 students has a mean $2$ hours higher than a sample of $5$ Year 11s. Does this prove Year 12 studies more?

SOLUTION

No. With only $5$ students per group, a difference of $2$ hours could easily come from chance variation between two small samples. The same comparison with, say, $50$ students per group would be much more reliable.

Answer: the sample size is too small — the apparent $2$ -hour difference might just be random variation, not a real population difference.

small samples ⇒ unreliable
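A small simulation can make the role of chance visible. The sketch below draws both samples from the same hypothetical population, so any difference in sample means is pure chance, and counts how often that difference reaches $2$ hours. The population (normal, mean 15 hours, standard deviation 4 hours), the number of trials, and the function names are all assumptions made for illustration.

```python
import random

def simulated_mean_differences(sample_size, trials=2000, seed=1):
    """Differences in sample means when both samples come from the SAME population."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    diffs = []
    for _ in range(trials):
        a = [rng.gauss(15, 4) for _ in range(sample_size)]
        b = [rng.gauss(15, 4) for _ in range(sample_size)]
        diffs.append(sum(a) / sample_size - sum(b) / sample_size)
    return diffs

def share_at_least(diffs, threshold):
    """Proportion of simulated differences at least `threshold` in size."""
    return sum(abs(d) >= threshold for d in diffs) / len(diffs)

small = simulated_mean_differences(5)
large = simulated_mean_differences(50)
print(f"n = 5:  {share_at_least(small, 2):.2f} of trials differ by 2+ hours")
print(f"n = 50: {share_at_least(large, 2):.2f} of trials differ by 2+ hours")
```

Under these assumptions, a gap of 2 hours or more appears by chance in a large share of the trials with 5 students per group, but only rarely with 50 per group.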

Common pitfalls

Heavy overlap → no clear difference. If two parallel boxplots overlap heavily, the groups aren't clearly different even if the medians are slightly apart. Real differences usually mean well-separated medians and minimal box overlap.

Use the same scale for parallel histograms or dot plots. Different scales can make comparable distributions look very different or different distributions look the same. Always check the axes.

Small samples → unreliable conclusions. A difference of a few units between two samples of $5$ could easily be random. Large samples give much more reliable comparisons.

Mean is dragged by outliers; median isn't. If either group has outliers, compare medians (and IQRs) for a fair, robust comparison.

Always state units and context. "Year 12 is $10$ higher" is incomplete. Say "Year 12 students study, on average, $10$ hours more per week than Year 11 students."

Frequently asked questions

Which display should I use to compare two groups?

Parallel (side-by-side) boxplots are usually best — they show centre, spread, range, and outliers at a glance. Back-to-back stem-and-leaf plots are good when you want to keep every individual value visible. Side-by-side histograms work for large grouped datasets.

What four things should I compare?

Centre (which group has the higher mean or median, and by how much), spread (which has the larger SD, IQR, or range), shape (are both symmetric or is one skewed), and outliers (any extreme values in either group).

Should I use mean or median to compare?

Use the median if either group has outliers or is skewed, because the median is resistant to outliers. Use the mean for symmetric data with no outliers.

What if two parallel boxplots overlap?

Heavy overlap suggests no clear difference between the groups. A real difference usually means the medians are clearly separated and the boxes overlap little or not at all. Small differences with overlapping boxes could easily be due to random variation.

Why do small samples make comparisons unreliable?

With small samples, a difference of a few units between groups can easily arise by chance. Larger samples give more reliable comparisons because random fluctuations average out.

Why must side-by-side histograms use the same scale?

Different scales can make similar distributions look very different, or make different distributions look similar. Always use the same horizontal and vertical scales when comparing two histograms — otherwise the comparison is misleading.
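One way to guarantee a common horizontal scale is to build a single set of class boundaries from both groups before counting. The sketch below does this for the small rainfall datasets from Example 2, purely to keep the numbers easy to check (histograms are normally used for much larger datasets). The function names and the class width of 10 are assumptions for illustration.

```python
def shared_bins(group_a, group_b, width):
    """Class boundaries covering both groups, so both histograms share one horizontal scale."""
    low = min(min(group_a), min(group_b))
    high = max(max(group_a), max(group_b))
    edges = [(low // width) * width]     # start at a multiple of the class width
    while edges[-1] <= high:             # extend until the largest value is covered
        edges.append(edges[-1] + width)
    return edges

def frequencies(values, edges):
    """Count the values falling in each class [edges[i], edges[i+1])."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[int((v - edges[0]) // width)] += 1
    return counts

brisbane = [0, 2, 5, 8, 12, 15, 25]
cairns = [5, 10, 15, 20, 30, 35, 40]

edges = shared_bins(brisbane, cairns, 10)
print(edges)                          # [0, 10, 20, 30, 40, 50]
print(frequencies(brisbane, edges))   # [4, 2, 1, 0, 0]
print(frequencies(cairns, edges))     # [1, 2, 1, 2, 1]
```

For a common vertical scale, both plots can then use the largest frequency found in either group (4 here) as the top of the axis.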
