How to Select Columns Containing a Specific String in R

code
rtip
operations
Author

Steven P. Sanderson II, MPH

Published

May 15, 2024

How to Select Columns Containing a Specific String in R

Today I want to discuss a common task in data manipulation: selecting columns containing a specific string. Whether you’re working with base R or popular packages like stringr, stringi, or dplyr, I’ll show you how to efficiently achieve this. We’ll cover various methods and provide clear examples to help you understand each approach. Let’s get started!

Examples

Using Base R

Example 1: Using grep

In base R, the grep function is your friend. It searches for patterns in a character vector and returns the indices of the matching elements.

# Sample data frame
df <- data.frame(
  apple_price = c(1, 2, 3),
  orange_price = c(4, 5, 6),
  banana_weight = c(7, 8, 9),
  grape_weight = c(10, 11, 12)
)

# Select columns containing "price"
cols <- grep("price", names(df))
print(cols)
[1] 1 2
df_price <- df[, cols]
print(df_price)
  apple_price orange_price
1           1            4
2           2            5
3           3            6
# Using value = TRUE to return column names
cols <- grep("price", names(df), value = TRUE)
print(cols)
[1] "apple_price"  "orange_price"
df_price <- df[, cols]
print(df_price)
  apple_price orange_price
1           1            4
2           2            5
3           3            6

In this example, we use grep to search for the string “price” in the column names. The value = TRUE argument returns the names of the matching columns instead of their indices. We then use these names to subset the data frame.

Example 2: Using grepl

grepl is another useful function that returns a logical vector indicating whether the pattern was found.

# Select columns containing "weight"
cols <- grepl("weight", names(df))
df_weight <- df[, cols]

print(df_weight)
  banana_weight grape_weight
1             7           10
2             8           11
3             9           12

Here, grepl checks each column name for the string “weight” and returns a logical vector. We use this vector to subset the data frame.

Using stringr

The stringr package provides a set of convenient functions for string manipulation. Let’s see how to use it for our task.

Example 3: Using str_detect

library(stringr)

# Select columns containing "price"
cols <- str_detect(names(df), "price")
df_price <- df[, cols]

print(df_price)
  apple_price orange_price
1           1            4
2           2            5
3           3            6

str_detect checks each column name for the presence of the string “price” and returns a logical vector, which we use to subset the data frame.

Using stringi

stringi is another powerful package for string manipulation. It offers a variety of functions for pattern matching.

Example 4: Using stri_detect_fixed

library(stringi)

# Select columns containing "weight"
cols <- stri_detect_fixed(names(df), "weight")
df_weight <- df[, cols]

print(df_weight)
  banana_weight grape_weight
1             7           10
2             8           11
3             9           12

stri_detect_fixed is similar to str_detect but comes from the stringi package. It checks for the fixed pattern “weight” and returns a logical vector.

Using dplyr

dplyr is a popular package for data manipulation. It provides a straightforward way to select columns based on their names.

Example 5: Using select with contains

library(dplyr)

# Select columns containing "price"
df_price <- df %>% select(contains("price"))

print(df_price)
  apple_price orange_price
1           1            4
2           2            5
3           3            6

The select function combined with contains makes it easy to select columns that include the string “price”. This approach is highly readable and concise.

Conclusion

We’ve covered several methods to select columns containing a specific string in R using base R, stringr, stringi, and dplyr. Each method has its strengths, so choose the one that best fits your needs and coding style.

Feel free to experiment with these examples on your own data sets. Understanding these techniques will enhance your data manipulation skills and make your code more efficient and readable. Happy coding!