# Extracting Numbers from Strings in R

code
rtip
operations
strings
Author

Steven P. Sanderson II, MPH

Published

June 11, 2024

# Introduction

Hello! Today, we’ll jump into something I think is a pretty neat task in data processing: extracting numbers from strings. We’ll explore three different methods using base R, the `stringr` package, and the `stringi` package. Each method has its own strengths, so let’s get started!

# Examples

## Extracting Numbers with Base R

Base R provides powerful tools to manipulate strings, and you can use regular expressions to extract numbers. Here’s a simple example:

``````# Sample string
text <- "The price is 45 dollars and 50 cents."

# Extract numbers using regular expressions
numbers <- gregexpr("[0-9]+", text)
result <- regmatches(text, numbers)

# Convert to numeric
numeric_result <- as.numeric(unlist(result))

print(numeric_result)``````
``[1] 45 50``

Explanation:

1. `gregexpr("[0-9]+", text)` finds all sequences of digits in the text.
2. `regmatches(text, numbers)` extracts these sequences from the text.
3. `unlist(result)` flattens the list of matches.
4. `as.numeric()` converts the character strings to numeric values.

## Extracting Numbers with `stringr`

The `stringr` package offers a more user-friendly approach to string manipulation. Here’s how you can extract numbers:

``````library(stringr)

# Sample string
text <- "The price is 45 dollars and 50 cents."

# Extract numbers using stringr
numbers <- str_extract_all(text, "\\d+")

# Convert to numeric
numeric_result <- as.numeric(unlist(numbers))

print(numeric_result)``````
``[1] 45 50``

Explanation:

1. `str_extract_all(text, "\\d+")` extracts all sequences of digits from the text. `\\d+` is a regular expression that matches one or more digits.
2. `unlist(numbers)` and `as.numeric()` convert the result to numeric, as explained in the base R method.

## Extracting Numbers with `stringi`

The `stringi` package is another excellent tool for string manipulation, providing robust and efficient functions. Here’s an example:

``````library(stringi)

# Sample string
text <- "The price is 45 dollars and 50 cents."

# Extract numbers using stringi
numbers <- stri_extract_all_regex(text, "\\d+")

# Convert to numeric
numeric_result <- as.numeric(unlist(numbers))

print(numeric_result)``````
``[1] 45 50``

Explanation:

1. `stri_extract_all_regex(text, "\\d+")` extracts all sequences of digits from the text using regular expressions.
2. As before, `unlist(numbers)` and `as.numeric()` are used to convert the result to numeric values.

# Comparison and Conclusion

• Base R is flexible and does not require additional packages, but the syntax can be a bit cumbersome.
• stringr simplifies the process with intuitive functions, making the code easier to read and write.
• stringi offers powerful and efficient string operations, suitable for performance-critical tasks.

# Try It Yourself!

I encourage you to try these methods on your own data. Extracting numbers from strings is a useful skill, especially when working with messy data. Experiment with different strings and see which method you prefer. Happy coding!