R Mean, Median, and Mode

How To Calculate Mean, Median, and Mode in R

A measure of central tendency, also known as a measure of center or central location, is a summary statistic that represents an entire dataset with a single value indicating the middle or center of its distribution. The three primary measures of central tendency are the mean, median, and mode. Each one gives a different perspective on the typical value within the distribution.

In R, statistical analysis is carried out with built-in functions, many of which ship with a standard R installation (for example, mean() is in the base package and median() is in the stats package). These functions take an R vector as input, along with optional arguments, and return the result.
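As a minimal sketch of that calling pattern, the snippet below passes a small made-up vector (the values are illustrative and not part of the employee data) to mean() and median(), then uses the optional trim argument of mean() to show how an extra argument changes the result.

```r
# Hypothetical values, used only to illustrate the calling pattern
x <- c(2, 4, 9, 15)

# The vector is the first argument
plain_mean <- mean(x)
plain_median <- median(x)

# Optional arguments adjust the computation: trim = 0.25 drops the
# lowest and highest 25% of the sorted values before averaging
trimmed_mean <- mean(x, trim = 0.25)

print(plain_mean)    # 7.5
print(plain_median)  # 6.5
print(trimmed_mean)  # 6.5
```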

First, we need to prepare the data. We can either create it directly in R or import it from a CSV file. As an example, we use a semicolon-separated file called employees2.csv stored on the local disk:

id;name;salary;start_date;division
1;Saleem;700.3;2022-02-01;IT
2;Jim;500.2;2023-09-20;Operations
3;Ahmed;600;2024-08-16;IT
4;Maya;740;2024-05-13;HR
5;Maria;870.25;2022-11-28;Finance
6;Ben;520;2023-05-22;IT
7;Omar;952.8;2023-10-30;Operations
8;Jhon;733.5;2024-06-10;Finance
9;Mike;700.3;2024-09-12;HR
10;Ameer;760.5;2024-07-17;Finance

Then we import the data into a data frame as follows:

# Import the semicolon-separated file using read.csv()
# (adjust the path to wherever employees2.csv is saved)
df <- read.csv("D:/personal/akhor umur/statsmelon/R codes/employees2.csv", sep = ";")
# Print the first 6 rows
print(head(df))

#output:
  id   name salary start_date     division
1  1 Saleem 700.30 2022-02-01         IT  
2  2    Jim 500.20 2023-09-20 Operations  
3  3  Ahmed 600.00 2024-08-16         IT  
4  4   Maya 740.00 2024-05-13         HR  
5  5  Maria 870.25 2022-11-28    Finance  
6  6    Ben 520.00 2023-05-22         IT  
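If the CSV file is not at hand, the same table can be created directly in R. The sketch below rebuilds the ten rows listed earlier with data.frame(); it keeps start_date as plain text, which is an assumption made for simplicity rather than something the import step guarantees.

```r
# Build the employee table in memory instead of reading employees2.csv
df <- data.frame(
  id = 1:10,
  name = c("Saleem", "Jim", "Ahmed", "Maya", "Maria",
           "Ben", "Omar", "Jhon", "Mike", "Ameer"),
  salary = c(700.3, 500.2, 600, 740, 870.25,
             520, 952.8, 733.5, 700.3, 760.5),
  # Dates are kept as character strings here (an assumption for simplicity)
  start_date = c("2022-02-01", "2023-09-20", "2024-08-16", "2024-05-13",
                 "2022-11-28", "2023-05-22", "2023-10-30", "2024-06-10",
                 "2024-09-12", "2024-07-17"),
  division = c("IT", "Operations", "IT", "HR", "Finance",
               "IT", "Operations", "Finance", "HR", "Finance")
)

# Check the column types and the first 6 rows
str(df)
print(head(df))
```

Either way, df$salary is a numeric vector that can be passed to the functions in the following sections.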

Mean in R Programming

The mean is the sum of all values divided by the number of values:

[latex]
\bar{x}=\frac{\sum{x}}{n}
[/latex]
Where ∑x = sum of all values, and n = number of values

Use the mean() function to calculate the mean.

Example – calculate the mean of salary

# Calculate the mean salary
# (the result is not named "mean", to avoid confusion with the function)
salary_mean <- mean(df$salary)
print(salary_mean)

#output:
[1] 707.785
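To connect the result to the formula, the following sketch computes the mean by hand as the sum divided by the count and compares it with mean(). The salary vector is typed in directly so the snippet runs without the CSV file; manual_mean is just an illustrative variable name.

```r
# Salaries copied from the employees2.csv listing
salary <- c(700.3, 500.2, 600, 740, 870.25,
            520, 952.8, 733.5, 700.3, 760.5)

# Apply the formula: sum of all values divided by the number of values
n <- length(salary)
manual_mean <- sum(salary) / n
print(manual_mean)  # 707.785

# Compare with the built-in function, allowing for floating-point rounding
print(isTRUE(all.equal(manual_mean, mean(salary))))  # TRUE
```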

Median in R Programming

The median is the middle value of the sorted data: it splits the dataset in half. When the number of values is even, the median is the average of the two middle values.

Use the median() function to calculate the median.

Example – calculate the median of salary

# Calculate the median salary
med <- median(df$salary)
print(med)

#output:
[1] 716.9
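The value 716.9 does not appear in the data because there are ten salaries, an even count. A minimal sketch of the calculation by hand, with the salaries typed in directly and manual_median as an illustrative name:

```r
# Salaries copied from the employees2.csv listing
salary <- c(700.3, 500.2, 600, 740, 870.25,
            520, 952.8, 733.5, 700.3, 760.5)

# Sort the values and locate the middle
sorted <- sort(salary)
n <- length(sorted)

if (n %% 2 == 1) {
  # Odd count: the single middle value
  manual_median <- sorted[(n + 1) / 2]
} else {
  # Even count: average of the two middle values (700.3 and 733.5 here)
  manual_median <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2
}

print(manual_median)  # 716.9
```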

Mode in R Programming

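The mode is the value that occurs most frequently in the dataset. Unlike the mean and median, it has no dedicated function in base R: the built-in mode() function reports how an object is stored (for example "numeric"), not its most frequent value, so we have to compute the statistical mode ourselves.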

Example – find the mode by sorting a frequency table of the column

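# Count each value, sort the counts in decreasing order, return the top value
# (named find_mode so that it does not mask the built-in mode() function)
find_mode <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  as.numeric(names(counts)[1])
}
find_mode(df$salary)

#output:
[1] 700.3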

Finding the Mode with the modeest Package

# Install the package if it is not already installed
install.packages("modeest")
# Load the library
library(modeest)
# Compute the most frequent value
salary_mode <- mfv(df$salary)
print(salary_mode)

#output:
[1] 700.3
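A column can have more than one most frequent value. In the sample data, the division column is an example: IT and Finance both appear three times. The sketch below uses only base R and returns every tied value; all_modes is a hypothetical helper name, not a built-in function.

```r
# Divisions copied from the employees2.csv listing
division <- c("IT", "Operations", "IT", "HR", "Finance",
              "IT", "Operations", "Finance", "HR", "Finance")

# all_modes() is a hypothetical helper: it returns every value whose
# count equals the highest count, so ties are not silently dropped
all_modes <- function(x) {
  counts <- table(x)
  names(counts)[as.vector(counts) == max(counts)]
}

print(table(division))
print(all_modes(division))  # "Finance" "IT"
```

Because the result comes from the names of the frequency table, it is always a character vector; for a numeric column, wrap the call in as.numeric().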

Mean and Median with Missing Values (NA)

We can still calculate the mean and median when the data contain missing values (NA). By default, mean() and median() return NA if any value is missing; passing the argument na.rm = TRUE removes the missing values before the calculation.

Example:

m <- c(100, NA, NA, 40, 30, NA, 10, 8, NA) 
mean(m, na.rm = TRUE) 
median(m, na.rm = TRUE)

#output:
[1] 37.6
[1] 30
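For comparison, this short sketch shows what happens without na.rm = TRUE, counts the missing entries, and reproduces the same results by filtering the vector manually. The variable name clean is illustrative.

```r
m <- c(100, NA, NA, 40, 30, NA, 10, 8, NA)

# Without na.rm = TRUE, a single NA makes the result NA
print(mean(m))    # NA
print(median(m))  # NA

# Count how many values are missing
print(sum(is.na(m)))  # 4

# Dropping the NA values by hand gives the same results as na.rm = TRUE
clean <- m[!is.na(m)]
print(mean(clean))    # 37.6
print(median(clean))  # 30
```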

Conclusion:

The mean and median are calculated by calling the functions mean() and median(), while the mode can be found by sorting a frequency table of a data frame column or by using a package such as modeest.