[latexpage]The measures of center covered here are the mean, median, mode, and midrange.
Mean
Example: calculate the mean of this data:
1, 3, 5, 6, 8, 10
\[
\bar{x}=\frac{\sum{x}}{n}=\frac{1+3+5+6+8+10}{6}=\frac{33}{6}=5.5
\]
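The same calculation can be checked in R, the language used in the code at the end of this post. This is a minimal sketch: `sum(x) / length(x)` spells out the formula (the sum of the values divided by their count $n$), and base R's `mean()` should give the same result.

```r
# Data from the example above
x <- c(1, 3, 5, 6, 8, 10)

# Mean by the formula: sum of the values divided by their count n
manual_mean <- sum(x) / length(x)
manual_mean   # 5.5

# Base R's built-in mean() gives the same value
mean(x)       # 5.5
```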
Advantages:
– relatively reliable: means of samples drawn from the same population don’t vary as much as other measures of center
– takes every data value into account
Disadvantages:
– sensitive to every data value, so a single extreme value can affect it dramatically
– not a resistant measure of center
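To see how much a single extreme value can move the mean, the sketch below replaces the largest value in the example data (10) with 100. The value 100 is a made-up outlier chosen only for illustration; the median is printed alongside for comparison.

```r
# Original data from the mean example
x <- c(1, 3, 5, 6, 8, 10)
# Same data with the largest value replaced by an extreme value (illustrative only)
x_outlier <- c(1, 3, 5, 6, 8, 100)

mean(x)            # 5.5
mean(x_outlier)    # 20.5, pulled far toward the extreme value

# The median does not move, because the two middle values (5 and 6) are unchanged
median(x)          # 5.5
median(x_outlier)  # 5.5
```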
Median
The median is the middle value. It splits the dataset in half, which makes it a natural measure of central tendency. To find the median, order your data from smallest to largest, then find the data point that has an equal number of values above it and below it. If there is an even number of values, there are two middle values, and the median is their mean.
Example:
The data: 1, 2, 6, 4, 3, 7, 8
First, sort the data: 1, 2, 3, 4, 6, 7, 8. There are seven values, so the middle one is the 4th value, 4, with three values below it and three above it. This is the median.
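The sorting procedure above can be sketched in R as follows. `my_median()` is a hypothetical helper written only to mirror the steps; in practice base R's `median()` does this for you. The sketch also handles an even number of values by averaging the two middle values, which is the usual convention and what `median()` does.

```r
# my_median(): hypothetical helper that mirrors the manual steps
my_median <- function(x) {
  s <- sort(x)                        # order from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                    # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2     # even n: mean of the two middle values
  }
}

my_median(c(1, 2, 6, 4, 3, 7, 8))     # 4, same as median()
my_median(c(1, 3, 5, 6, 8, 10))       # 5.5, average of 5 and 6
```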
Advantages:
- not affected by extreme values
- a resistant measure of center
Disadvantages:
- depends only on the middle value (or the two middle values), so it ignores the size of all the other data values
Mode
The mode is the value that occurs most frequently in your dataset, which makes it a different type of measure of central tendency than the mean or median. To identify the mode, sort the values in the dataset numerically or by category, then find the value that occurs most often.
Example:
The data: 1, 2, 3, 4, 4, 5, 5, 5
The value 5 occurs three times, more often than any other value, so the mode is 5.
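Counting frequencies is easy in R with `table()`. The minimal sketch below keeps every value whose count equals the highest count, so if several values tie it returns all of them (unlike the `getmode()` function later in this post, which returns only the first of the tied values).

```r
x <- c(1, 2, 3, 4, 4, 5, 5, 5)

# Count how often each distinct value occurs
# (here: 1, 2 and 3 once each, 4 twice, 5 three times)
counts <- table(x)

# Keep every value whose count equals the highest count;
# table() stores the values as names, so convert them back to numbers
modes <- as.numeric(names(counts)[counts == max(counts)])
modes   # 5
```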
Advantages:
- can be used to measure central tendency for nominal data
Disadvantages:
- can vary a lot between different samples from the same population, especially when the sample size is small
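Because the mode only needs counts, it also works for nominal data, where a mean or median has no meaning. The sketch below uses a small made-up vector of survey answers (the values are hypothetical, chosen only for illustration).

```r
# Hypothetical nominal data: favorite colors from a small survey
colors <- c("red", "blue", "red", "green", "blue", "red")

# Frequency of each category: blue 2, green 1, red 3
counts <- table(colors)

# The mode is the category with the highest frequency
names(counts)[counts == max(counts)]   # "red"
```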
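Midrange
The midrange is the value halfway between the smallest and largest values in the dataset:
\[
\text{midrange}=\frac{\min(x)+\max(x)}{2}
\]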
Advantages:
- very easy to calculate
Disadvantages:
- very sensitive to extreme values, because it uses only the smallest and largest values
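The midrange is (smallest value + largest value) / 2. Base R has no midrange function, so `midrange()` below is a hypothetical one-line helper. The second call reuses the made-up outlier 100 to show how strongly one extreme value shifts the result.

```r
# midrange(): hypothetical helper, not a base R function
# Midrange = (smallest value + largest value) / 2
midrange <- function(x) (min(x) + max(x)) / 2

midrange(c(1, 3, 5, 6, 8, 10))    # 5.5
midrange(c(1, 3, 5, 6, 8, 100))   # 50.5, one extreme value moves it a lot
```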
When continuous data follow a symmetric, unimodal distribution, the mean, median, and mode coincide. Analysts typically prefer the mean in such cases because it uses every value in the dataset. For a skewed distribution, however, the median is often the most suitable measure of central tendency: the long tail pulls the mean toward it, while the median is resistant to the extreme values in the tail.
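A small comparison makes this concrete. Both vectors below are made-up toy data: the first is symmetric around 3 with a single peak, the second has one large value that creates a long right tail. `mode_of()` is a hypothetical helper using the same logic as the `getmode()` function later in this post, defined here so the snippet runs on its own.

```r
# Hypothetical helper: most frequent value (first one wins on ties)
mode_of <- function(v) {
  uv <- unique(v)
  uv[which.max(tabulate(match(v, uv)))]
}

# Symmetric, unimodal toy data: all three measures agree
sym <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
c(mean = mean(sym), median = median(sym), mode = mode_of(sym))     # 3, 3, 3

# Right-skewed toy data: the value 20 drags the mean to the right
skew <- c(1, 2, 2, 3, 3, 3, 4, 20)
c(mean = mean(skew), median = median(skew), mode = mode_of(skew))  # 4.75, 3, 3
```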
For ordinal data, the median or the mode is generally recommended. For nominal (unordered categorical) data, the mode is the only appropriate choice.
#R code
#mean
data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7)
mean(data)
#[1] 4
#median
median(data)
#[1] 4
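#Mode (base R's mode() returns an object's storage mode, not the statistical mode)
getmode <- function(v) {
  uniqv <- unique(v)                           # distinct values, in order of first appearance
  uniqv[which.max(tabulate(match(v, uniqv)))]  # count each distinct value and return the most frequent (ties go to the first)
}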
result <- getmode(data)
print(result)
#[1] 4

#Relation between mean, median and mode
#For a symmetric, unimodal distribution, the mean, median and mode are equal.
#(Equal values alone do not prove that a distribution is symmetric.)
# Plot a histogram of x with a density curve and vertical lines at the
# mean (blue), median (green) and mode (red); getmode() is defined above
distributions <- function(x) {
  hist(x, col = "peachpuff", border = "black", prob = TRUE)
  lines(density(x), lwd = 2, col = "chocolate3")
  abline(v = mean(x), col = "royalblue", lwd = 2)
  abline(v = median(x), col = "green", lwd = 2)
  abline(v = getmode(x), col = "red", lwd = 2)
}
# For this data all three lines fall at 4, so they overlap
distributions(data)