Mean, Median, and Mode in Data Mining

A Beginner's Guide to Basic Data Mining Concepts with Simple Examples

Mean (Average)

The mean is the sum of all values in a dataset divided by the number of values.

Example: For the numbers 3, 2, 4, 3, 4, 9, 7:

Mean \(= \frac{3+2+4+3+4+9+7}{7} = \frac{32}{7} \approx 4.57\)
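To check the arithmetic, here is a minimal Python 3 sketch. It computes the mean directly from the definition and compares the result with the standard library's `statistics.mean`. The variable names are illustrative only.

```python
from statistics import mean

# The example dataset from the text above
values = [3, 2, 4, 3, 4, 9, 7]

# Mean by definition: sum of the values divided by how many there are
manual_mean = sum(values) / len(values)

# The standard library's statistics.mean should agree
library_mean = mean(values)

print(f"sum = {sum(values)}, n = {len(values)}")  # sum = 32, n = 7
print(f"mean = {manual_mean:.2f}")                # mean = 4.57
```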

Median

The median is the middle value of a dataset when it is ordered from least to greatest. If the dataset has an even number of observations, the median is the average of the two middle numbers.

Example: For the numbers 3, 2, 4, 3, 4, 9, 7, sorted order is 2, 3, 3, 4, 4, 7, 9:

Median \(= 4\)
(With seven values, the middle one is the fourth number in the sorted list.)
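The rule above can be sketched as a small Python function. The name `median` and its error handling for empty input are illustrative choices, not part of any particular library. The function sorts the data and then takes either the single middle value (odd count) or the average of the two middle values (even count).

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)   # order from least to greatest
    n = len(ordered)
    mid = n // 2               # 0-based index of the middle (odd n) or upper-middle (even n) value
    if n % 2 == 1:
        # Odd count: exactly one middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [3, 2, 4, 3, 4, 9, 7]
print(sorted(data))  # [2, 3, 3, 4, 4, 7, 9]
print(median(data))  # 4
```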

Example 2: For the dataset {2, 3, 3, 4, 5, 7, 7, 7, 9, 10}, which is already in ascending order:

Since there are 10 values (an even number), the median is the average of the 5th and 6th values:
Median \(= \frac{5 + 7}{2} = 6\)
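For the even-length case, this Python 3 sketch picks out the 5th and 6th values by index and compares the result with `statistics.median`. It also shows `statistics.median_low` and `statistics.median_high`, which return one of the two middle values instead of their average. That can be useful when the median must be a value that actually occurs in the data.

```python
from statistics import median, median_low, median_high

data = [2, 3, 3, 4, 5, 7, 7, 7, 9, 10]   # 10 values, already sorted

# 1-based positions 5 and 6 are 0-based indices 4 and 5
fifth, sixth = data[4], data[5]
print(fifth, sixth)                          # 5 7
print((fifth + sixth) / 2)                   # 6.0

# statistics.median averages the two middle values for even-length data
print(median(data))                          # 6.0

# median_low / median_high return the lower / upper middle value instead
print(median_low(data), median_high(data))   # 5 7
```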

Mode

The mode is the value that appears most frequently in a dataset. A dataset can have more than one mode if multiple values have the same highest frequency.

Example: For the numbers 3, 2, 4, 3, 4, 9, 7:

Modes \(= 3\) and \(4\)
(Both 3 and 4 appear twice, more often than any other value, so this dataset is bimodal.)
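Because a dataset can have several modes, a sketch that returns all of them is handy. The hypothetical helper `modes` below counts occurrences with `collections.Counter` and keeps every value that reaches the highest count; its result is compared with the standard library's `statistics.multimode` (available since Python 3.8).

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)          # value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())        # the highest frequency
    return [v for v, c in counts.items() if c == top]


data = [3, 2, 4, 3, 4, 9, 7]
print(modes(data))      # [3, 4]
print(multimode(data))  # [3, 4]
```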

Example 2: For the dataset {2, 3, 3, 4, 5, 7, 7, 7, 9, 10}:

Mode \(= 7\)
(In this dataset, the value 7 appears three times, more often than any other value.)
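As a final check, this Python 3 sketch computes all three measures for the second dataset with the `statistics` module. The dictionary layout is only for display. Note that since Python 3.8, `statistics.mode` returns the first mode it encounters when there are ties, so for a multimodal dataset like the first example, `statistics.multimode` is the safer choice.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 4, 5, 7, 7, 7, 9, 10]

summary = {
    "mean": mean(data),      # 57 / 10
    "median": median(data),  # average of the 5th and 6th values
    "mode": mode(data),      # the single most common value
}
print(summary)  # {'mean': 5.7, 'median': 6.0, 'mode': 7}
```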