# How to Subset Data Frame in R by Multiple Conditions

code
rtip
operations
dplyr
datatable
Author

Steven P. Sanderson II, MPH

Published

March 7, 2024

# Introduction

In data analysis with R, subsetting data frames based on multiple conditions is a common task. It allows us to extract specific subsets of data that meet certain criteria. In this blog post, we will explore how to subset a data frame using three different methods: base R’s `subset()` function, dplyr’s `filter()` function, and the data.table package.

# Examples

## Using Base R’s subset() Function

Base R provides a handy function called `subset()` that allows us to subset data frames based on one or more conditions.

``````# Load the mtcars dataset
data(mtcars)

# Subset data frame using subset() function
subset_mtcars <- subset(mtcars, mpg > 20 & cyl == 4)

# View the resulting subset
print(subset_mtcars)``````
``````                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2``````

In the above code, we first load the `mtcars` dataset. Then, we use the `subset()` function to create a subset of the data frame where the miles per gallon (`mpg`) is greater than 20 and the number of cylinders (`cyl`) is equal to 4. Finally, we print the resulting subset.

## Using dplyr’s filter() Function

dplyr is a powerful package for data manipulation, and it provides the `filter()` function for subsetting data frames based on conditions.

``````# Load the dplyr package
library(dplyr)

# Subset data frame using filter() function
filter_mtcars <- mtcars %>%
filter(mpg > 20, cyl == 4)

# View the resulting subset
print(filter_mtcars)``````
``````                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2``````

In this code snippet, we load the dplyr package and use the `%>%` operator, also known as the pipe operator, to pipe the `mtcars` dataset into the `filter()` function. We specify the conditions within the `filter()` function to create the subset, and then print the resulting subset.

## Using data.table Package

The data.table package is known for its speed and efficiency in handling large datasets. We can use data.table’s syntax to subset data frames as well.

``````# Load the data.table package
library(data.table)

# Convert mtcars to data.table
dt_mtcars <- as.data.table(mtcars)

# Subset data frame using data.table syntax
dt_subset_mtcars <- dt_mtcars[mpg > 20 & cyl == 4]

# Convert back to data frame (optional)
subset_mtcars_dt <- as.data.frame(dt_subset_mtcars)

# View the resulting subset
print(subset_mtcars_dt)``````
``````    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
4  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
5  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
6  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
7  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
8  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
9  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
10 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
11 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2``````

In this code block, we first load the data.table package and convert the `mtcars` data frame into a data.table using the `as.data.table()` function. Then, we subset the data using data.table’s syntax, specifying the conditions within square brackets. Optionally, we can convert the resulting subset back to a data frame using `as.data.frame()` function before printing it.

# Conclusion

In this blog post, we learned three different methods for subsetting data frames in R by multiple conditions. Whether you prefer base R’s `subset()` function, dplyr’s `filter()` function, or data.table’s syntax, there are multiple ways to achieve the same result. I encourage you to try out these methods on your own datasets and explore the flexibility and efficiency they offer in data manipulation tasks. Happy coding!