R Factors

In R, categorical data is stored in factors, a data type that many statistical and summary functions rely on to group observations.

R Factors Tutorials


Factors are data objects that categorize data and store it as levels. They can be created from both strings and integers, which makes them well suited to columns with a limited number of unique values, such as “True”/“False” or “Male”/“Female”. Factors are particularly useful in statistical modeling, where such columns are treated as categories rather than as plain text.

Syntax: factor() or as.factor()

Example:

# Create a vector.
data <- c("Mountain","Sea","Sky","Mountain","Sea","Land","Mountain","Sea","Mountain","Sky","Land")

print(data)
print(is.factor(data))

# Apply the factor.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))

#output:
[1] "Mountain" "Sea" "Sky" "Mountain" "Sea" "Land" "Mountain" "Sea" "Mountain" "Sky" "Land"
[1] FALSE

[1] Mountain Sea Sky Mountain Sea Land Mountain Sea Mountain Sky Land    
Levels: Land Mountain Sea Sky
[1] TRUE
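
The example above uses factor(); the short sketch below uses as.factor() instead and looks at what a factor stores. The vector `sizes` and its values are made up for illustration and are not part of the original example.

```r
# Hypothetical sample data, not taken from the example above.
sizes <- c("small", "large", "medium", "small", "large")

# as.factor() converts an existing vector to a factor.
size_factor <- as.factor(sizes)

# The distinct categories, sorted alphabetically by default.
print(levels(size_factor))

# The integer code stored for each element (its position in the levels).
print(as.integer(size_factor))

# Number of observations per level.
print(table(size_factor))
```

Each element is stored as an integer code pointing into the levels, which is why a factor is a compact way to hold a column with few unique values.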

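Factors in a Data Frame

Before R 4.0.0, data.frame() treated text columns as categorical and converted them to factors by default. Since R 4.0.0, the stringsAsFactors argument defaults to FALSE, so text columns stay as character vectors (as the output below shows) and are converted explicitly with factor().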

Example:

# Create the vectors for data frame.
gender <- c("female","male","female","male","female","male","male")
age <- c(20,25,30,35,40,45,50)
weight <- c(47,55,66,67,70,70,65)

# Create the data frame.
df1 <- data.frame(gender,age,weight)
print(df1)

# Print the gender column
print(df1$gender)

# Test if the gender column is a factor.
print(is.factor(df1$gender))

# Apply the factor.
factor_data <- factor(df1$gender)

print(factor_data)
print(is.factor(factor_data))

#output:
  gender age weight
1 female  20     47
2   male  25     55
3 female  30     66
4   male  35     67
5 female  40     70
6   male  45     70
7   male  50     65

[1] "female" "male"   "female" "male"   "female" "male"   "male"  
[1] FALSE

[1] female male   female male   female male   male  
Levels: female male

[1] TRUE  
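
The example above leaves the data frame itself unchanged, because the factor is stored in a separate variable. Here is a minimal sketch of two ways to get a factor column inside the data frame. The names `df_a` and `df_b` and the shortened data are assumptions made for illustration.

```r
# Hypothetical data, mirroring the shape of the example above.
gender <- c("female", "male", "female", "male")
age <- c(20, 25, 30, 35)

# Option 1: convert character columns while building the data frame.
df_a <- data.frame(gender, age, stringsAsFactors = TRUE)
print(is.factor(df_a$gender))

# Option 2: build the data frame first, then convert one column in place.
df_b <- data.frame(gender, age)
df_b$gender <- factor(df_b$gender)
print(levels(df_b$gender))

# Numeric columns are not converted in either case.
print(is.factor(df_a$age))
```

Option 2 is usually the safer choice when only some text columns are categorical, since stringsAsFactors = TRUE converts every character column.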

Changing the Order of Factor Levels

Example:

# Create a vector.
df2 <- c("Mountain","Sea","Sky","Mountain","Sea","Land","Mountain","Sea","Mountain","Sky","Land")
print(df2)
print(is.factor(df2))
print(factor(df2))
# Apply factor() with the required order of the levels.
new_order <- factor(df2,levels = c("Land","Sea","Sky","Mountain"))
print(new_order)

#output:
[1] "Mountain" "Sea" "Sky" "Mountain" "Sea" "Land" "Mountain" "Sea" "Mountain" "Sky" "Land"
[1] FALSE

[1] Mountain Sea Sky Mountain Sea Land Mountain Sea Mountain Sky Land
Levels: Land Mountain Sea Sky

[1] Mountain Sea Sky Mountain Sea Land Mountain Sea Mountain Sky Land    
Levels: Land Sea Sky Mountain
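
Reordering the levels does not change the values themselves, only the order in which R lists and codes them. The sketch below shows that effect; the `ratings` vector is hypothetical.

```r
# Hypothetical survey responses.
ratings <- c("medium", "low", "high", "medium", "low")

# Default level order is alphabetical.
default_order <- factor(ratings)
print(levels(default_order))

# Custom level order supplied through the levels argument.
custom_order <- factor(ratings, levels = c("low", "medium", "high"))
print(levels(custom_order))

# table() reports the counts in level order.
print(table(custom_order))

# The stored integer codes follow the level order as well.
print(as.integer(custom_order))
```

This is why the level order matters in practice: summaries such as table() follow it, so a meaningful order makes the output easier to read.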

Generate Factor Levels

We can generate factor levels by using the gl() function.

Syntax: gl(n, k, labels)
where:

  • n – integer giving the number of levels.
  • k – integer giving the number of replications.
  • labels – vector of labels of resulting factor levels.

Example:

v <- gl(3, 4, labels = c("Java", "Bali","Borneo"))
print(v)

#output:
[1] Java   Java   Java   Java   Bali   Bali   Bali   Bali   Borneo Borneo Borneo Borneo
Levels: Java Bali Borneo
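
To see how n and k interact, here is a small sketch with hypothetical labels. It also uses the length argument of gl(), which is not covered in the syntax above: it sets the total length of the result, so the pattern repeats until that length is reached.

```r
# Two levels, each repeated three times in a block.
blocks <- gl(2, 3, labels = c("Control", "Treatment"))
print(blocks)

# With k = 1 and an explicit length, the levels alternate.
alternating <- gl(2, 1, length = 6, labels = c("Control", "Treatment"))
print(alternating)
```

Both results have the same two levels and six elements; only the arrangement differs.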

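Conclusion:

Factors are created by calling factor() or as.factor(). The levels argument of factor() sets the order of the levels, and gl() generates a factor from a pattern of levels.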