Steven P. Sanderson II, MPH
October 6, 2025
Key Takeaway: R offers three powerful methods for selecting rows with maximum values: Base R (simple, no dependencies), dplyr (readable, tidyverse-friendly), and data.table (fast performance). Each has distinct advantages for different scenarios.
Finding rows with maximum values in a specific column is a common operation in data analysis. Whether you are trying to identify top performers, peak measurements, or maximum scores, R provides multiple efficient approaches. This guide compares Base R, dplyr, and data.table methods with performance insights and practical examples.
Let’s start with a sample dataset to demonstrate each method:
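```r
# Sample data
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 25, 15, 25, 20),
  Group = c("A", "A", "B", "B", "C")
)

print(data)
```

```
  ID Value Group
1  1    10     A
2  2    25     A
3  3    15     B
4  4    25     B
5  5    20     C
```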
Use which.max() to get the index of the first maximum value:
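A minimal sketch of that approach with the sample data above (standard bracket subsetting on the returned index):

```r
# which.max() gives the position of the first maximum only,
# so later ties are ignored
data[which.max(data$Value), ]
```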
Result: Row 2 (ID=2, Value=25)
Use logical subsetting to capture all maximum values:
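A sketch of the logical-subsetting version; with the sample data it should return both tied rows (IDs 2 and 4, each with Value = 25):

```r
# Compare every value against the column maximum; all ties are kept
data[data$Value == max(data$Value), ]
```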
The dplyr package offers slice_max() for intuitive max row selection.
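Here is one way slice_max() could be applied to the sample data; by default it keeps ties (with_ties = TRUE), and paired with group_by() it returns the top row(s) per group:

```r
library(dplyr)

# All rows tied for the overall maximum Value
data |>
  slice_max(Value)

# Maximum row(s) within each Group
data |>
  group_by(Group) |>
  slice_max(Value, n = 1) |>
  ungroup()
```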
data.table provides the fastest performance for large datasets.
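A sketch of the data.table equivalents, converting the sample data frame first (the object name max_dt is just for illustration):

```r
library(data.table)

max_dt <- as.data.table(data)

# All rows tied for the overall maximum Value
max_dt[Value == max(Value)]

# First maximum row within each Group
max_dt[, .SD[which.max(Value)], by = Group]
```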
Row selection (filtering) is one of the most common data manipulation operations. Let’s compare the performance of Base R, dplyr, and data.table for filtering operations using the rbenchmark package.
```r
# Load required packages
library(rbenchmark)
library(data.table)
library(dplyr)

# Create sample data
set.seed(123)
n <- 100000L
dt <- data.table(
  id = 1:n,
  category = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  value = rnorm(n, mean = 50, sd = 15),
  flag = sample(c(TRUE, FALSE), n, replace = TRUE)
)

# Convert to data.frame for base R and dplyr
df <- as.data.frame(dt)

# Benchmark: Finding rows with maximum value
result <- benchmark(
  base_R = df[df$value == max(df$value), ],
  data_table = dt[value == max(value)],
  dplyr = filter(df, value == max(value)),
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
) |>
  arrange(relative)

print(result)
```

```
        test replications elapsed relative
1     base_R         1000    1.72    1.000
2      dplyr         1000    4.25    2.471
3 data_table         1000    6.10    3.547
```
Always include error handling for edge cases where no maximum can be determined.
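As a sketch of what that guard could look like (the helper name safe_max_rows is hypothetical), this wrapper returns zero rows instead of erroring when the target column is empty or entirely NA:

```r
# Hypothetical helper: select max-value rows while guarding against
# columns where no maximum can be determined (empty or all NA)
safe_max_rows <- function(df, col) {
  vals <- df[[col]]
  if (length(vals) == 0 || all(is.na(vals))) {
    return(df[0, , drop = FALSE])  # empty result instead of an error
  }
  df[which(vals == max(vals, na.rm = TRUE)), , drop = FALSE]
}

safe_max_rows(data, "Value")  # rows with ID 2 and 4 (Value = 25)
```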
| Scenario | Recommended Method | Reason |
|---|---|---|
| Small data (<1K rows) | Any method | Performance differences minimal |
| Medium data (1K-10K) | Base R or data.table | Good performance balance |
| Large data (>10K rows) | data.table | Best performance scaling |
| Readability priority | dplyr | Clear, expressive syntax |
| No dependencies | Base R | Built-in functionality |
Now it’s your turn to experiment with rbenchmark and deepen your understanding of R performance optimization!
Compare the performance of different string matching methods:
```r
library(stringr)

# Create test data
text_data <- data.frame(
  id = 1:50000,
  text = sample(c("apple", "banana", "cherry", "date", "elderberry"),
                50000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Your task: Benchmark these approaches for finding rows containing "app"
benchmark(
  base_R = text_data[grepl("app", text_data$text), ],
  base_R_fixed = text_data[grepl("app", text_data$text, fixed = TRUE), ],
  stringr = text_data[str_detect(text_data$text, "app"), ],
  replications = 50,
  columns = c("test", "elapsed", "relative")
) |>
  arrange(relative)
```

```
          test elapsed relative
1 base_R_fixed    0.26    1.000
2       base_R    0.95    3.654
3      stringr    1.30    5.000
```
Question: Which method is fastest? Why might fixed = TRUE make a difference?
Compare grouping and summarization methods:
```r
# Create grouping data
group_data <- data.frame(
  group = sample(LETTERS[1:10], 10000, replace = TRUE),
  value = rnorm(10000)
)
dt_group <- as.data.table(group_data)

# Your task: Benchmark mean calculation by group
benchmark(
  base_R = aggregate(value ~ group, data = group_data, FUN = mean),
  data_table = dt_group[, .(mean_value = mean(value)), by = group],
  dplyr = group_data %>% group_by(group) %>% summarise(mean_value = mean(value)),
  replications = 100,
  columns = c("test", "elapsed", "relative")
) |>
  arrange(relative)
```

```
        test elapsed relative
1 data_table    0.23    1.000
2      dplyr    0.67    2.913
3     base_R    1.23    5.348
```
Challenge: Try with different aggregation functions (median, sd, length). Do the relative performance patterns change?
Investigate memory usage alongside timing:
```r
# Your task: Compare memory efficiency
library(pryr)  # for object_size()

# Create copies for fair comparison
df_copy <- df
dt_copy <- copy(dt)

# Time and measure memory for column addition
system.time({
  df_result <- transform(df_copy, new_col = value * 2)
  cat("Base R result size:", object_size(df_result), "\n")
})
```

```
Base R result size: 3201456
   user  system elapsed
   0.02    0.00    0.03
```

```r
system.time({
  dt_copy[, new_col := value * 2]
  cat("data.table result size:", object_size(dt_copy), "\n")
})
```

```
data.table result size: 3602088
   user  system elapsed
      0       0       0
```
Question: Which approach uses less memory? Why might this matter for large datasets?
Apply what you’ve learned to your own dataset:
- Create or load a dataset of your own (or simulate one with rnorm() and sample())
- Benchmark the Base R, dplyr, and data.table approaches using rbenchmark with at least 50 replications

Bonus Challenge: Try grouped maximum operations if your data has natural grouping variables!
Some quick, easy takeaways from this guide:
- Base R which.max() and logical subsetting provide simple, dependency-free solutions
- dplyr slice_max() offers the most readable syntax with excellent tie handling
- data.table delivers superior performance, especially for large datasets and grouped operations
- rbenchmark helps you make data-driven decisions about method selection
- Always consider handling NA values and ties in real-world applications
- Choose your method based on dataset size, performance requirements, and code readability preferences
- Performance differences become more significant with larger datasets and complex operations
Selecting rows with maximum values in R is straightforward with all three approaches. Base R methods work well for most scenarios without additional packages. dplyr excels when code readability matters most. data.table is your best choice for performance-critical applications with large datasets.
The rbenchmark package provides valuable insights into actual performance differences, helping you make informed decisions about which method to use for your specific situation.
Your turn: Try implementing these methods with your own data and compare the performance differences. Start with the approach that best fits your current workflow and data size requirements, then optimize based on your benchmarking results!
Sanderson, S. (2024, December 10). How to Select Row with Max Value in Specific Column in R: A Complete Guide. R-bloggers. https://www.r-bloggers.com/2024/12/how-to-select-row-with-max-value-in-specific-column-in-r-a-complete-guide/
Statology. (n.d.). R: How to Select Row with Max Value in Specific Column. Statology. Retrieved October 6, 2025, from https://www.statology.org/r-select-row-with-max-value/
Happy Coding! 🚀
You can connect with me at any one of the below:
Telegram Channel here: https://t.me/steveondata
LinkedIn Network here: https://www.linkedin.com/in/spsanderson/
Mastodon Social here: https://mstdn.social/@stevensanderson
RStats Network here: https://rstats.me/@spsanderson
GitHub Network here: https://github.com/spsanderson
Bluesky Network here: https://bsky.app/profile/spsanderson.com
My Book: Extending Excel with Python and R here: https://packt.link/oTyZJ
You.com Referral Link: https://you.com/join/EHSLDTL6