Characteristics Of Distributions Of Numerical Data: Shape, Location And Spread
Theory
Describe a distribution by shape (symmetric, positively skewed, negatively skewed), location (mean, median, mode), and spread (range, IQR, standard deviation). The mean is pulled by outliers; the median is barely affected. Use mean + standard deviation for symmetric data without outliers, and median + IQR for skewed data or data with outliers.
A numerical distribution is summarised by three features:
- Shape: the overall pattern. Is it symmetric, skewed to one side, or does it have multiple peaks?
- Location (centre): where the typical value sits.
- Spread: how widely the values vary.
Common shapes:
- Symmetric: left and right halves mirror each other. Mean $\approx$ median.
- Positively skewed (right-skewed): long tail on the right. Mean $>$ median.
- Negatively skewed (left-skewed): long tail on the left. Mean $<$ median.
- Bimodal: two distinct peaks (often two mixed groups).
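The mean-versus-median comparisons in the list above can be turned into a rough check. The sketch below is only a rule of thumb: the function name `skew_hint` and the `tolerance` cut-off are assumptions made for illustration, and the check cannot detect a bimodal shape, so a plot of the data is still the better guide.

```python
from statistics import mean, median


def skew_hint(data, tolerance=0.05):
    """Guess the skew direction by comparing the mean with the median.

    `tolerance` is a hypothetical cut-off (a fraction of the range) below
    which the mean and median are treated as roughly equal.
    """
    gap = mean(data) - median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(gap) <= tolerance * spread:
        return "roughly symmetric"
    if gap > 0:
        return "positively skewed"  # mean > median: long tail on the right
    return "negatively skewed"      # mean < median: long tail on the left


# Made-up datasets for illustration.
print(skew_hint([1, 2, 3, 4, 5]))
print(skew_hint([1, 2, 2, 3, 3, 4, 20]))
print(skew_hint([1, 17, 18, 18, 19, 19, 20]))
```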
Common location measures:
- Mean: $\bar{x} = \dfrac{\sum x}{n}$. Sensitive to outliers.
- Median: middle value when sorted. Position $\dfrac{n+1}{2}$. For even $n$, average the two middle values. Resistant to outliers.
- Mode: most frequent value.
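As a quick illustration of the three location measures, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made up for the example. `multimode` is used rather than `mode` so that ties are returned as a list instead of being hidden.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]  # made-up values for illustration

centre_mean = mean(scores)      # sum of the values divided by n
centre_median = median(scores)  # n is even, so the two middle values are averaged
modes = multimode(scores)       # list of the most frequent value(s)

print("mean:", centre_mean)
print("median:", centre_median)
print("mode(s):", modes)
```

Here the two middle values of the sorted data are 3 and 5, so the median is their average, 4.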
Common spread measures:
- Range = max $-$ min.
- Quartiles $Q_1, Q_2, Q_3$ split sorted data into four equal parts; $Q_2$ is the median.
- Interquartile range: $\text{IQR} = Q_3 - Q_1$. Spread of the middle 50%. Resistant to outliers.
- Standard deviation: typical distance from the mean. Larger = more spread.
- Five-number summary: min, $Q_1$, median, $Q_3$, max.
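The range and standard deviation can be sketched with the standard library as follows. The values are made up, and since the text does not say whether the sample or the population standard deviation is intended, both are shown as an assumption. Quartile conventions differ between tools, so this sketch leaves the IQR aside.

```python
from statistics import pstdev, stdev

values = [4, 8, 6, 5, 3, 7, 9]  # made-up values for illustration

data_range = max(values) - min(values)  # range = max - min
sample_sd = stdev(values)               # sample SD: divides by n - 1
population_sd = pstdev(values)          # population SD: divides by n

print(f"range = {data_range}")
print(f"sample SD = {sample_sd:.2f}")
print(f"population SD = {population_sd:.2f}")
```

The sample version is always slightly larger than the population version for the same data, which matters most for small datasets.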
The first diagram shows the three main shapes of a distribution. The second demonstrates how an outlier pulls the mean but barely moves the median.
The key formulas are for the mean, the median position, and the IQR.
Mean
$$\bar{x} = \frac{\sum x}{n}$$
Median position
For sorted data of size $n$:
$$\text{position} = \frac{n+1}{2}$$
Spread
$$\text{IQR} = Q_3 - Q_1$$
Choosing the right summary
| Shape | Centre | Spread |
|---|---|---|
| Symmetric, no outliers | Mean | Standard deviation |
| Skewed or with outliers | Median | IQR |
Calculating the median
- Sort the data from smallest to largest.
- Find position $\dfrac{n+1}{2}$.
- For odd $n$: median is the value at that position. For even $n$: average the two middle values.
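The median steps above can be sketched directly in code. The function name `median_by_position` is an assumption for illustration; positions in the rule count from 1, while Python indexes from 0, which is why the sketch subtracts 1.

```python
def median_by_position(data):
    """Median via the (n + 1) / 2 position rule (positions count from 1)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: (n + 1) / 2 ends in .5, so average the values on either side.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Made-up datasets for illustration.
print(median_by_position([7, 1, 5, 3, 9]))                # n = 5, position 3
print(median_by_position([8, 2, 6, 4, 10, 12, 14, 16]))   # n = 8, positions 4 and 5
```

For the eight-value list the position is 4.5, so the result is the average of the 4th and 5th sorted values (8 and 10).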
Calculating quartiles and IQR
- Sort the data and find the median ($Q_2$).
- Lower half (everything below the median): $Q_1$ = median of this half.
- Upper half (everything above the median): $Q_3$ = median of this half.
- $\text{IQR} = Q_3 - Q_1$.
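A minimal sketch of the median-of-halves method follows. It assumes that, for an odd number of values, the median itself is left out of both halves, which is one reading of "everything below/above the median". Other conventions exist (spreadsheets and statistics libraries often interpolate), so software may report slightly different quartiles. The function name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    lower_half = ordered[: n // 2]        # values below the median
    upper_half = ordered[(n + 1) // 2:]   # values above the median
    return median(lower_half), median(ordered), median(upper_half)


data = [1, 3, 5, 7, 9, 11, 13]  # made-up values for illustration
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
five_number_summary = (min(data), q1, q2, q3, max(data))

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("five-number summary:", five_number_summary)
```

With seven values the median is the 4th value (7), the lower half is 1, 3, 5 and the upper half is 9, 11, 13, giving quartiles of 3 and 11.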
Choosing centre and spread
- Look at the shape first: is it symmetric or skewed, and does it contain outliers?
- If symmetric and no outliers: use mean + standard deviation.
- If skewed or has outliers: use median + IQR. These are resistant to extreme values.
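The decision rule above is small enough to write down as a lookup. This is only a sketch: the function name `choose_summary` is hypothetical, and judging whether the data is symmetric or contains outliers is left to the reader, since the text gives no numerical test for either.

```python
def choose_summary(symmetric, has_outliers):
    """Return the (centre, spread) pair suggested for the given shape."""
    if symmetric and not has_outliers:
        return ("mean", "standard deviation")
    # Skewed data, or any data with outliers: use the resistant measures.
    return ("median", "IQR")


print(choose_summary(symmetric=True, has_outliers=False))
print(choose_summary(symmetric=False, has_outliers=False))
print(choose_summary(symmetric=True, has_outliers=True))
```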
Add all values, then divide by the number of values.
Answer: the mean is
| Median | ||
Answer: the median score is
| median of | ||
| median of | ||
| IQR |
Five-number summary:
Answer:
New dataset:
| New mean | ||
| New median |
The mean jumped from
Answer: the outlier pulled the mean dramatically but barely moved the median, illustrating why the median is preferred for skewed or outlier-prone data.
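The same effect can be reproduced with a small experiment. The numbers below are hypothetical and are not the dataset from the worked example; they only illustrate how one extreme value shifts each measure.

```python
from statistics import mean, median

original = [10, 12, 13, 15, 20]   # hypothetical values, not the worked example's data
with_outlier = original + [200]   # same data plus one extreme value

mean_before, median_before = mean(original), median(original)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

Here the mean more than triples (14 to 45) while the median moves by one unit (13 to 14).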
Common pitfalls
Frequently asked questions
What are the three things to describe about a distribution?
Shape (the overall pattern — symmetric, skewed, or bimodal), location (where the centre sits — mean, median, mode), and spread (how widely the values vary — range, IQR, standard deviation).
What's the difference between positively and negatively skewed?
Positively skewed (right-skewed) has a long tail on the right; the mean is greater than the median because large values pull it up. Negatively skewed (left-skewed) has a long tail on the left; the mean is less than the median.
How do I calculate the median for an even number of values?
Sort the values from smallest to largest. The median is the average of the two middle values. For 8 values, the median is the average of the 4th and 5th values.
What is the interquartile range and why is it useful?
The interquartile range (IQR) is Q3 minus Q1 — the spread of the middle 50 percent of the data. It is resistant to outliers, unlike the range and standard deviation, which makes it a good measure of spread for skewed data.
When should I use mean vs median?
Use the mean (with standard deviation) for roughly symmetric data with no outliers. Use the median (with IQR) for skewed data or data with outliers, because the median is barely affected by extreme values.
What is the five-number summary?
The five-number summary is: minimum, Q1, median, Q3, maximum. It is the foundation for drawing a boxplot and gives a compact picture of the centre, spread, and tails of the distribution.