Finding Outliers in a Data Set



Outliers are data points that don’t fit the pattern of the rest of the numbers. They are the extremely high or extremely low values in the data set.

A simple way to find an outlier is to examine the numbers in the data set. Most of the numbers cluster within a certain range, while a few are far lower or far higher than the rest. Such numbers are known as outliers.

Another definition of an outlier

An outlier is a data point that is distinctly separate from the rest of the data. One common definition treats as an outlier any data point that lies more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile. The interquartile range (IQR) is the difference between the third quartile and the first quartile of the data set.

Find the outlier(s) for the data 0, 2, 5, 6, 9, 12, 35.

Solution

For the given data set, we have the following five-number summary.

minimum = 0

first quartile = 2

median = 6

third quartile = 12

maximum = 35
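The summary above can be reproduced with a short Python sketch. It assumes the quartile convention that matches these values: the first and third quartiles are the medians of the lower and upper halves of the sorted data, with the overall median left out when the count is odd. Other conventions exist and can give different quartiles. The function name five_number_summary is illustrative only.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, excluding the middle value when the count is odd.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # values below the median position
    upper = values[-half:]   # values above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])

print(five_number_summary([0, 2, 5, 6, 9, 12, 35]))  # (0, 2, 6, 12, 35)
```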

IQR = 12 – 2 = 10, so 1.5·IQR = 15.

To determine whether there are outliers, we look for numbers that lie more than 1.5·IQR, or 15, beyond the quartiles.

first quartile – 1.5·IQR = 2 – 15 = –13

third quartile + 1.5·IQR = 12 + 15 = 27

Since 35 is outside the interval from –13 to 27, 35 is the outlier in this data set.
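The same arithmetic can be checked in a few lines of Python. This is only a sketch: it takes the quartiles 2 and 12 from the five-number summary as given rather than computing them, and the variable names are illustrative.

```python
data = [0, 2, 5, 6, 9, 12, 35]
q1, q3 = 2, 12  # first and third quartiles, taken from the summary in the text

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are outliers
upper_fence = q3 + 1.5 * iqr     # values above this are outliers

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)  # 10 -13.0 27.0 [35]
```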

Find the outlier(s) in the given data set below.

28, 26, 29, 30, 81, 32, 37

Solution

Step 1:

The value that differs markedly from the other numbers in the given set is 81.

Step 2:

So the outlier for this data set is 81.

Find the outlier(s) in the given data set below.

16, 14, 3, 12, 15, 17, 22, 15, 52

Solution

Step 1:

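The value that differs most from the other numbers in the given set is 52. The value 3 is also well below the rest: with the quartiles found as in the first example (first quartile = 13, third quartile = 19.5), the 1.5·IQR rule gives the interval from 3.25 to 29.25, and 3 falls just outside it.

Step 2:

So the most obvious outlier for this data set is 52, and under the 1.5·IQR rule 3 is an outlier as well.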
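Answers found by inspection can be cross-checked against the 1.5·IQR rule. The sketch below is one way to do that with Python's standard library. It assumes the "exclusive" method of statistics.quantiles (Python 3.8 or later), which happens to reproduce the quartile values used in the first example; it is not the only quartile convention, and other methods may flag different points. The helper name iqr_outliers is hypothetical.

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return the values lying more than 1.5 IQRs beyond the quartiles."""
    # Assumption: quartiles come from the stdlib "exclusive" method.
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    return [x for x in data if x < low or x > high]

print(iqr_outliers([28, 26, 29, 30, 81, 32, 37]))         # [81]
print(iqr_outliers([16, 14, 3, 12, 15, 17, 22, 15, 52]))  # [3, 52]
```

Under this convention the rule agrees with inspection for the first set. For the second set it flags 3 as well as 52, because the lower cutoff works out to 3.25, which shows how a low outlier can be overlooked when only the largest value stands out.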
