R: Data Manipulation with dplyr

Fast & Clean Data Wrangling in R

The dplyr package is one of the most popular tools in R for data manipulation. It keeps data-wrangling code short and readable by expressing each step as a simple verb:

select()
filter()
mutate()
arrange()
summarise()
group_by()

In this tutorial, you’ll learn how to use these core functions with clear examples.

1. Install & Load dplyr

install.packages("dplyr")
library(dplyr)

2. Select Columns

Choose only the columns you need:

data2 <- select(data, age, income, city)

Exclude columns:

data2 <- select(data, -id, -email)
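The snippets above assume a data frame called data already exists. As a minimal sketch, the example below builds a small made-up table (the name people and all values are hypothetical) and applies both forms of select(), plus the starts_with() selection helper:

```r
library(dplyr)

# Hypothetical sample data; column names follow the tutorial's examples
people <- data.frame(
  id     = 1:3,
  email  = c("a@example.com", "b@example.com", "c@example.com"),
  age    = c(25, 34, 41),
  income = c(4200, 6100, 7300),
  city   = c("Jakarta", "Bandung", "Jakarta")
)

kept      <- select(people, age, income, city)   # keep three columns
dropped   <- select(people, -id, -email)         # drop two columns
by_prefix <- select(people, starts_with("i"))    # columns whose name starts with "i"

names(kept)
names(dropped)
names(by_prefix)
```

Here kept and dropped end up with the same three columns, so which form to use mostly depends on whether the keep list or the drop list is shorter.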

3. Filter Rows

Keep only rows that meet conditions:

data3 <- filter(data, age > 30)
data4 <- filter(data, city == "Jakarta")
data5 <- filter(data, age > 30 & city == "Jakarta")
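Two details are easy to miss here: conditions separated by commas are combined with AND, and rows where a condition evaluates to NA are dropped. A small sketch with made-up data (the people table is hypothetical) illustrates both, along with %in% for matching several values:

```r
library(dplyr)

# Hypothetical sample data with one missing age
people <- data.frame(
  age  = c(25, 34, 41, NA),
  city = c("Jakarta", "Bandung", "Jakarta", "Jakarta")
)

# Comma-separated conditions are combined with AND, same as using &
both_and   <- filter(people, age > 30 & city == "Jakarta")
both_comma <- filter(people, age > 30, city == "Jakarta")

# %in% matches against several values at once
two_cities <- filter(people, city %in% c("Jakarta", "Bandung"))

# The row with age = NA is not kept, because NA > 30 is NA rather than TRUE
over_30 <- filter(people, age > 30)
nrow(over_30)
```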

4. Create / Modify Variables with mutate()

data <- mutate(data,
               income_k = income / 1000,
               log_income = log(income))
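A column created inside mutate() can be reused later in the same call. Note also that log(0) is -Inf, so if income can be zero one common workaround is log1p(), which computes log(1 + x). The sketch below uses made-up values; the column income_k_rounded and the switch to log1p() are illustrative assumptions, not part of the original example:

```r
library(dplyr)

# Hypothetical sample data, including a zero income
people <- data.frame(income = c(4200, 6100, 0))

people <- mutate(people,
                 income_k         = income / 1000,
                 # a column created above can be reused in the same call
                 income_k_rounded = round(income_k, 1),
                 # log1p(x) = log(1 + x), so a zero income gives 0 instead of -Inf
                 log_income       = log1p(income))

people
```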

5. Sort Data with arrange()

data_sorted <- arrange(data, income)
data_desc   <- arrange(data, desc(income))
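arrange() also accepts several sort keys, with later keys breaking ties in earlier ones, and it places missing values last even inside desc(). A minimal sketch with made-up data (the people table is hypothetical):

```r
library(dplyr)

# Hypothetical sample data with one missing income
people <- data.frame(
  city   = c("Jakarta", "Bandung", "Jakarta", "Bandung"),
  income = c(7300, 6100, 4200, NA)
)

# Sort by city A-Z, then by income from highest to lowest within each city
sorted <- arrange(people, city, desc(income))
sorted
```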

6. Group & Summarize

data_summary <- data %>%
  group_by(city) %>%
  summarise(
    avg_income = mean(income, na.rm = TRUE),
    n = n()
  )
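To see what this returns, here is the same pattern run on a small made-up table (the people data and the extra n_missing column are assumptions added for illustration). The result has one row per city:

```r
library(dplyr)

# Hypothetical sample data with one missing income
people <- data.frame(
  city   = c("Jakarta", "Bandung", "Jakarta", "Bandung"),
  income = c(7000, 4000, 5000, NA)
)

city_summary <- people %>%
  group_by(city) %>%
  summarise(
    avg_income = mean(income, na.rm = TRUE),  # the NA is skipped, not treated as 0
    n          = n(),                         # counts rows, including the NA row
    n_missing  = sum(is.na(income))           # how many incomes were missing
  )

city_summary
```

Reporting n_missing next to the mean makes it clear when an average is based on fewer rows than n suggests.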

7. Count Observations

count(data, city)
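count() is a shortcut for grouping and then counting rows. The sketch below, using a made-up people table, compares it with the longer group_by() and summarise() form and shows the sort argument:

```r
library(dplyr)

# Hypothetical sample data
people <- data.frame(city = c("Jakarta", "Bandung", "Jakarta"))

# sort = TRUE puts the largest group first
by_count <- count(people, city, sort = TRUE)

# The longer equivalent; its rows are ordered by city rather than by n
by_summary <- people %>%
  group_by(city) %>%
  summarise(n = n())

by_count
by_summary
```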

8. Join Tables

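Combine two tables that share a key column. Here data1 and data2 stand for any two data frames that both contain id:

# Keep every row of data1; columns from data2 are NA where no id matches
left_join(data1, data2, by = "id")
# Keep only rows whose id appears in both tables
inner_join(data1, data2, by = "id")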
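The difference between the two joins is easiest to see on tables whose keys only partly overlap. A minimal sketch with two made-up tables (customers and orders are hypothetical names):

```r
library(dplyr)

# Hypothetical tables that share an id column
customers <- data.frame(id = c(1, 2, 3), city = c("Jakarta", "Bandung", "Medan"))
orders    <- data.frame(id = c(1, 3, 4), total = c(250, 90, 40))

# All three customers are kept; id 2 has no order, so its total is NA
left <- left_join(customers, orders, by = "id")

# Only ids present in both tables (1 and 3) are kept
inner <- inner_join(customers, orders, by = "id")

left
inner
```

In both results the order with id 4 disappears, because it has no match in customers.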

9. Pipe Operator %>%

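Instead of nesting calls, which have to be read from the inside out:

summarise(group_by(data, city), avg_income = mean(income, na.rm = TRUE))

Chain them with the pipe, so the steps read from top to bottom:

data %>%
  group_by(city) %>%
  summarise(avg_income = mean(income, na.rm = TRUE))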
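The pipe pays off most when several verbs are chained. As a sketch, the example below combines four of the functions from this tutorial on a made-up people table (all values are hypothetical):

```r
library(dplyr)

# Hypothetical sample data
people <- data.frame(
  age    = c(25, 34, 41, 52),
  income = c(4000, 6000, 8000, 5000),
  city   = c("Jakarta", "Bandung", "Jakarta", "Jakarta")
)

result <- people %>%
  filter(age > 30) %>%                                   # keep people over 30
  group_by(city) %>%                                     # one group per city
  summarise(avg_income = mean(income, na.rm = TRUE)) %>% # average per group
  arrange(desc(avg_income))                              # highest average first

result
```

On R 4.1 or later, the native pipe |> can be used in place of %>% for a chain like this one.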

Conclusion

With dplyr, data manipulation becomes fast, readable, and powerful.