Fast & Clean Data Wrangling in R
The dplyr package is one of the most popular tools in R for data manipulation. It makes working with data fast, readable, and efficient using simple verbs like:
select(), filter(), mutate(), arrange(), summarise(), group_by()
In this tutorial, you’ll learn how to use these core functions with clear examples.
1. Install & Load dplyr
install.packages("dplyr")  # run once to install the package
library(dplyr)             # run in every new R session
2. Select Columns
Choose only the columns you need:
data2 <- select(data, age, income, city)
Exclude columns:
data2 <- select(data, -id, -email)
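To see both forms side by side, here is a minimal sketch on a small made-up data frame. The `people` table and its values are assumptions for illustration only; they are not the tutorial's `data`.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
people <- data.frame(
  id     = 1:3,
  email  = c("a@example.com", "b@example.com", "c@example.com"),
  age    = c(25, 41, 33),
  income = c(3200, 5400, 4100),
  city   = c("Jakarta", "Bandung", "Jakarta")
)

# Keep only the named columns
kept <- select(people, age, income, city)

# Drop columns by prefixing them with a minus sign
dropped <- select(people, -id, -email)

names(kept)     # "age" "income" "city"
names(dropped)  # "age" "income" "city"
```

In this toy table both calls end up with the same three columns, so the choice between them mostly depends on whether the keep list or the drop list is shorter.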
3. Filter Rows
Keep only rows that meet conditions:
data3 <- filter(data, age > 30)
data4 <- filter(data, city == "Jakarta")
data5 <- filter(data, age > 30 & city == "Jakarta")
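Two details are easy to miss here: conditions separated by commas are combined with AND, and rows where a condition evaluates to NA are dropped. The sketch below illustrates both on a made-up `people` table (an assumption for this example, not the tutorial's `data`).

```r
library(dplyr)

# Hypothetical toy data; note the missing age in row 3
people <- data.frame(
  age  = c(25, 41, NA, 52),
  city = c("Jakarta", "Jakarta", "Jakarta", "Bandung")
)

# Explicit AND with &
both_amp <- filter(people, age > 30 & city == "Jakarta")

# Comma-separated conditions are also combined with AND
both_comma <- filter(people, age > 30, city == "Jakarta")

# Only the 41-year-old from Jakarta remains in each result;
# the row with NA age is dropped because NA > 30 is NA, not TRUE
nrow(both_amp)    # 1
nrow(both_comma)  # 1
```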
4. Create / Modify Variables with mutate()
data <- mutate(data,
  income_k = income / 1000,
  log_income = log(income))
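Within a single mutate() call, a later expression can use a column created earlier in the same call. A small sketch, assuming a made-up `people` table and a hypothetical `high_earner` flag that is not part of the original example:

```r
library(dplyr)

# Hypothetical toy data
people <- data.frame(income = c(3200, 5400, 4100))

people <- mutate(people,
  income_k    = income / 1000,
  high_earner = income_k > 4   # refers to income_k, created one line above
)

people$income_k     # 3.2 5.4 4.1
people$high_earner  # FALSE TRUE TRUE
```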
5. Sort Data with arrange()
data_sorted <- arrange(data, income)
data_desc <- arrange(data, desc(income))
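arrange() also accepts several sort keys; later keys break ties in earlier ones. The following sketch sorts a made-up `people` table (an assumption for illustration) by city, then by income from highest to lowest within each city.

```r
library(dplyr)

# Hypothetical toy data
people <- data.frame(
  city   = c("Jakarta", "Bandung", "Jakarta", "Bandung"),
  income = c(3200, 5400, 4100, 2900)
)

# First key: city ascending; second key: income descending within each city
by_city_income <- arrange(people, city, desc(income))

by_city_income$city    # "Bandung" "Bandung" "Jakarta" "Jakarta"
by_city_income$income  # 5400 2900 4100 3200
```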
6. Group & Summarize
data_summary <- data %>%
  group_by(city) %>%
  summarise(
    avg_income = mean(income, na.rm = TRUE),
    n = n()
  )
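The same pattern can be tried on a small made-up table to see what na.rm = TRUE and n() do. The `people` data below is an assumption for illustration. Note that n() counts every row in the group, including the one whose income is missing.

```r
library(dplyr)

# Hypothetical toy data; one Bandung income is missing
people <- data.frame(
  city   = c("Jakarta", "Jakarta", "Bandung", "Bandung"),
  income = c(3000, 5000, 4200, NA)
)

city_summary <- people %>%
  group_by(city) %>%
  summarise(
    avg_income = mean(income, na.rm = TRUE),  # NA is ignored in the mean
    n = n()                                   # but the row is still counted
  )

# Bandung: avg_income 4200, n 2
# Jakarta: avg_income 4000, n 2
city_summary
```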
7. Count Observations
count(data, city)
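count() is shorthand for grouping by a column and counting the rows in each group; the result stores the counts in a column named n. A brief sketch on made-up data (the `people` table is an assumption), using the sort argument to put the most frequent value first:

```r
library(dplyr)

# Hypothetical toy data
people <- data.frame(
  city = c("Jakarta", "Bandung", "Jakarta", "Jakarta")
)

# sort = TRUE orders the result by descending count
counts <- count(people, city, sort = TRUE)

counts$city  # "Jakarta" "Bandung"
counts$n     # 3 1
```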
8. Join Tables
left_join(data1, data2, by = "id")
inner_join(data1, data2, by = "id")
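The difference between the two joins shows up when some ids have no match. In the sketch below, the `customers` and `orders` tables are made up for illustration: left_join() keeps every row of the first table and fills unmatched columns with NA, while inner_join() keeps only ids present in both tables.

```r
library(dplyr)

# Hypothetical toy tables; id 2 has no order, id 4 has no customer
customers <- data.frame(
  id   = c(1, 2, 3),
  city = c("Jakarta", "Bandung", "Surabaya")
)
orders <- data.frame(
  id     = c(1, 3, 4),
  amount = c(250, 90, 120)
)

left  <- left_join(customers, orders, by = "id")
inner <- inner_join(customers, orders, by = "id")

nrow(left)   # 3: all customers kept, amount is NA for id 2
nrow(inner)  # 2: only ids 1 and 3 appear in both tables
```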
9. Pipe Operator %>%
Instead of:
summarise(group_by(data, city), avg_income = mean(income))
Use:
data %>% group_by(city) %>% summarise(avg_income = mean(income))
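The pipe pays off most when several verbs are chained, because each step reads top to bottom in the order it runs. A sketch combining the verbs from this tutorial on a made-up `people` table (the data and the age threshold are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data
people <- data.frame(
  city   = c("Jakarta", "Jakarta", "Bandung", "Bandung", "Surabaya"),
  age    = c(25, 41, 35, 52, 38),
  income = c(3200, 5400, 4100, 2900, 6000)
)

result <- people %>%
  filter(age > 30) %>%                                   # drop the 25-year-old
  group_by(city) %>%
  summarise(avg_income = mean(income, na.rm = TRUE)) %>%
  arrange(desc(avg_income))                              # highest average first

result$city        # "Surabaya" "Jakarta" "Bandung"
result$avg_income  # 6000 5400 3500
```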
Conclusion
With dplyr, data manipulation becomes fast, readable, and powerful.