In R, categorical data is stored in factors, which statistical modeling functions use to treat values as categories rather than free text.
R Factors Tutorial
Factors are data objects that categorize data and store it as levels. A factor can be built from character or integer values; internally it keeps one integer code per element plus the set of distinct values, called levels. This makes factors well suited to columns with a limited number of unique values, such as “True”/“False” or “Male”/“Female”, and particularly useful for statistical modeling.
Syntax: factor() or as.factor()
Example:
# Create a vector.
data <- c("Mountain","Sea","Sky","Mountain","Sea","Land","Mountain","Sea","Mountain","Sky","Land")
print(data)
print(is.factor(data))
# Apply the factor.
factor_data <- factor(data)
print(factor_data)
print(is.factor(factor_data))
#output:
[1] "Mountain" "Sea" "Sky" "Mountain" "Sea" "Land" "Mountain" "Sea" "Mountain" "Sky" "Land"
[1] FALSE
[1] Mountain Sea Sky Mountain Sea Land Mountain Sea Mountain Sky Land
Levels: Land Mountain Sea Sky
[1] TRUE
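To make "stored as levels" concrete, the following sketch inspects the factor from the example above. It uses only base R functions (levels(), nlevels(), as.integer(), table()) and reuses the variable names from the example.

```r
# Rebuild the factor from the example above.
data <- c("Mountain","Sea","Sky","Mountain","Sea","Land","Mountain","Sea","Mountain","Sky","Land")
factor_data <- factor(data)

# The distinct categories, sorted alphabetically by default.
print(levels(factor_data))
# The number of distinct categories.
print(nlevels(factor_data))
# The integer code stored for each element (its position in levels()).
print(as.integer(factor_data))
# The number of observations per level.
print(table(factor_data))
```

Each integer code is simply the index of that element's value in levels(factor_data), so "Land" is stored as 1 and "Sky" as 4.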
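Factors in a Data Frame
Before R 4.0.0, data.frame() converted character columns to factors by default. Since R 4.0.0 the stringsAsFactors argument defaults to FALSE, so a text column stays a character vector until it is converted explicitly with factor(), as the output below shows.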
Example:
# Create the vectors for data frame.
gender <- c("female","male","female","male","female","male","male")
age <- c(20,25,30,35,40,45,50)
weight <- c(47,55,66,67,70,70,65)
# Create the data frame.
df1 <- data.frame(gender,age,weight)
print(df1)
# Print the gender column
print(df1$gender)
# Test if the gender column is a factor.
print(is.factor(df1$gender))
# Apply the factor.
factor_data <- factor(df1$gender)
print(factor_data)
print(is.factor(factor_data))
#output:
  gender age weight
1 female  20     47
2   male  25     55
3 female  30     66
4   male  35     67
5 female  40     70
6   male  45     70
7   male  50     65
[1] "female" "male" "female" "male" "female" "male" "male"
[1] FALSE
[1] female male female male female male male
Levels: female male
[1] TRUE
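If the text column should be a factor inside the data frame itself, there are two common ways to get there. The sketch below sets stringsAsFactors explicitly so that it behaves the same on any R version; the names df_fct and df_chr are hypothetical and used only for illustration.

```r
gender <- c("female","male","female","male","female","male","male")
age <- c(20,25,30,35,40,45,50)

# Option 1: ask data.frame() to convert character columns when the frame is created.
df_fct <- data.frame(gender, age, stringsAsFactors = TRUE)
print(is.factor(df_fct$gender))

# Option 2: keep the column as text, then convert that single column in place.
df_chr <- data.frame(gender, age, stringsAsFactors = FALSE)
df_chr$gender <- factor(df_chr$gender)
print(levels(df_chr$gender))
```

Converting one column at a time is usually the safer choice when only some text columns are truly categorical.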
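Changing the Order of Factor Levels
By default, factor() sorts the levels alphabetically. The levels argument sets a custom order instead.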
Example:
# Create a vector.
df2 <- c("Mountain","Sea","Sky","Mountain","Sea","Land","Mountain","Sea","Mountain","Sky","Land")
print(df2)
print(is.factor(df2))
print(factor(df2))
# Apply the factor function with required order of the level.
new_order <- factor(df2,levels = c("Land","Sea","Sky","Mountain"))
print(new_order)
#output:
[1] "Mountain" "Sea" "Sky" "Mountain" "Sea" "Land" "Mountain" "Sea" "Mountain" "Sky" "Land"
[1] FALSE
[1] Mountain Sea Sky Mountain Sea Land Mountain Sea Mountain Sky Land
Levels: Land Mountain Sea Sky
[1] Mountain Sea Sky Mountain Sea Land Mountain Sea Mountain Sky Land
Levels: Land Sea Sky Mountain
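Reordering the levels does not change the values themselves, only the integer codes behind them and the order in which functions such as table() report them. A minimal sketch, reusing df2 from the example (default_order is a hypothetical name added for comparison):

```r
df2 <- c("Mountain","Sea","Sky","Mountain","Sea","Land","Mountain","Sea","Mountain","Sky","Land")
default_order <- factor(df2)
new_order <- factor(df2, levels = c("Land","Sea","Sky","Mountain"))

# The values are the same in both factors.
print(all(as.character(default_order) == as.character(new_order)))

# The integer codes differ because they index into different level orders.
print(as.integer(default_order))
print(as.integer(new_order))

# table() reports its counts in level order.
print(table(new_order))
```

Note that any value missing from the levels argument becomes NA in the resulting factor, so the vector passed to levels should list every category that occurs in the data.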
Generate Factor Levels
We can generate factor levels by using the gl() function.
Syntax: gl(n, k, labels), where:
- n – integer giving the number of levels.
- k – integer giving the number of replications.
- labels – vector of labels of resulting factor levels.
Example:
v <- gl(3, 4, labels = c("Java", "Bali","Borneo"))
print(v)
#output:
[1] Java Java Java Java Bali Bali Bali Bali Borneo Borneo Borneo Borneo
Levels: Java Bali Borneo
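To show what gl() does under the hood, the sketch below rebuilds the same factor by hand with rep() and factor(), and then uses gl()'s length argument to produce an alternating pattern. The names by_hand and alternating and the labels "Low" and "High" are hypothetical, chosen only for illustration.

```r
v <- gl(3, 4, labels = c("Java", "Bali", "Borneo"))

# The same values built by hand: codes 1 to 3, each repeated 4 times.
by_hand <- factor(rep(1:3, each = 4), labels = c("Java", "Bali", "Borneo"))
print(all(as.character(v) == as.character(by_hand)))

# With k = 1 and a longer length, the levels alternate instead of forming blocks.
alternating <- gl(2, 1, length = 6, labels = c("Low", "High"))
print(alternating)
```

The length argument recycles the pattern of n levels times k replications until the requested number of elements is reached.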
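Conclusion:
A factor is created by calling factor() or as.factor(). The levels argument of factor() controls the order of the levels, and gl() generates a factor from a repeating pattern of levels.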
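The two functions differ in one practical way when the input is already a factor: as.factor() returns it unchanged, while factor() rebuilds the levels from the values that are actually present. A small sketch, with hypothetical names f and subset_f:

```r
f <- factor(c("Land", "Sea", "Sky"))

# Subsetting keeps all original levels, even those no longer used.
subset_f <- f[f != "Sky"]

# as.factor() leaves an existing factor as it is, so "Sky" remains a level.
print(levels(as.factor(subset_f)))

# factor() recomputes the levels and drops the unused "Sky".
print(levels(factor(subset_f)))
```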
