How to Select Row with Max Value in Specific Column in R
Learn how to select rows with the maximum value in a specific column in R using base R, dplyr, and data.table. This comprehensive guide covers code examples, syntax explanations, and benchmarking tips to help you efficiently filter max value rows in your data frames.
code
rtip
Author
Steven P. Sanderson II, MPH
Published
October 6, 2025
Keywords
Programming, max value column R, select row with max value R, dplyr max value row, data.table max value row, R select row by column maximum, R filter max value, R get row with highest value, R dataframe select max, R find row with max in group, R slice_max example, how to select all rows with maximum value in a column using dplyr, select first row with max value in R dataframe, find row with max value in specific column using data.table, R code to filter rows with highest value in each group, best way to select row with max value in R without dependencies
Key Takeaway: R offers three common ways to select rows with maximum values: base R (simple, no dependencies), dplyr (readable, tidyverse-friendly), and data.table (fast, especially on large data). Each has distinct advantages in different scenarios.
Finding rows with maximum values in a specific column is a common operation in data analysis. Whether you are identifying top performers, peak measurements, or maximum scores, R provides several efficient approaches. This guide compares base R, dplyr, and data.table methods with performance insights and practical examples.
Sample Data Setup
Let’s start with a sample dataset to demonstrate each method:
ID Value Group
1 1 10 A
2 2 25 A
3 3 15 B
4 4 25 B
5 5 20 C
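The code that builds this table isn't shown in the original post. Here is a minimal reconstruction, assuming numeric (double) ID and Value columns and a character Group column, which matches the `<dbl>`/`<chr>` types printed by the dplyr output later on:

```r
# Reconstructed sample data (an assumption, not the author's original code)
data <- data.frame(
  ID    = c(1, 2, 3, 4, 5),           # doubles, matching the <dbl> type shown later
  Value = c(10, 25, 15, 25, 20),      # rows 2 and 4 tie for the maximum (25)
  Group = c("A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE            # already the default in R >= 4.0
)

print(data)
```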
Base R Methods
First Row with Max Value
Use which.max() to get the index of the first maximum value:
# Returns first occurrence only
data[which.max(data$Value), ]
ID Value Group
2 2 25 A
Result: Row 2 (ID=2, Value=25)
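Because which.max() stops at the first maximum, it silently hides ties. A quick base R check compares it with which(), which returns every matching index (the sample data is rebuilt here so the snippet runs on its own):

```r
data <- data.frame(
  ID    = c(1, 2, 3, 4, 5),
  Value = c(10, 25, 15, 25, 20),
  Group = c("A", "A", "B", "B", "C")
)

first_idx <- which.max(data$Value)                 # index of the first maximum only
all_idx   <- which(data$Value == max(data$Value))  # indices of every tied maximum

first_idx  # 2
all_idx    # 2 4
```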
All Rows with Max Value (Handling Ties)
Use logical subsetting to capture all maximum values:
# Returns all rows with max value
data[data$Value == max(data$Value), ]
ID Value Group
2 2 25 A
4 4 25 B
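Real data often contains missing values. max() returns NA when any element is NA, and plain logical indexing with an NA comparison returns an all-NA row. A hedged sketch of an NA-safe variant, using a hypothetical data_na data frame:

```r
# Hypothetical data with a missing value in the column of interest
data_na <- data.frame(
  ID    = c(1, 2, 3, 4, 5),
  Value = c(10, 25, NA, 25, 20)
)

max(data_na$Value)  # NA: a single missing value propagates through max()

# na.rm = TRUE ignores missing values; which() drops the NA comparisons
top_rows <- data_na[which(data_na$Value == max(data_na$Value, na.rm = TRUE)), ]
top_rows
```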
dplyr Methods
The dplyr package offers slice_max(), which selects the rows with the largest values of a column and lets you control how ties are handled.
First Max Row
library(dplyr)

# First max row only
data %>%
  slice_max(Value, n = 1, with_ties = FALSE)
ID Value Group
1 2 25 A
All Max Rows (Including Ties)
# All rows with max value (default behavior)
data %>%
  slice_max(Value, n = 1)
ID Value Group
1 2 25 A
2 4 25 B
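filter() with a max() comparison is an equivalent way to keep all tied rows, and it is the form used in the benchmark later in this post. A small sketch, again rebuilding the sample data so it runs on its own:

```r
library(dplyr)

data <- data.frame(
  ID    = c(1, 2, 3, 4, 5),
  Value = c(10, 25, 15, 25, 20),
  Group = c("A", "A", "B", "B", "C")
)

# Keeps every row whose Value equals the column maximum (ties included)
max_rows <- data %>%
  filter(Value == max(Value, na.rm = TRUE))

max_rows
```

Unlike slice_max(), filter() has no n argument, so it can't directly return, say, the rows with the top three values.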
Grouped Max Selection
# Max row per group
data %>%
  group_by(Group) %>%
  slice_max(Value, n = 1)
# A tibble: 3 × 3
# Groups: Group [3]
ID Value Group
<dbl> <dbl> <chr>
1 2 25 A
2 4 25 B
3 5 20 C
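For a dependency-free version of the grouped case, ave() can broadcast each group's maximum back onto every row, after which ordinary logical subsetting keeps the per-group maxima (ties included). This is one base R approach among several, not code from the original post:

```r
data <- data.frame(
  ID    = c(1, 2, 3, 4, 5),
  Value = c(10, 25, 15, 25, 20),
  Group = c("A", "A", "B", "B", "C")
)

# ave() returns a vector the same length as Value, holding each row's group maximum
group_max <- ave(data$Value, data$Group, FUN = max)

# Keep rows that equal their own group's maximum (ties within a group are kept)
grouped_max <- data[data$Value == group_max, ]
grouped_max
```

Unlike the dplyr version, the result is an ordinary data frame and rows keep their original order rather than being arranged by group.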
data.table Methods
data.table is typically the fastest of the three approaches on large datasets.
Setup and Basic Selection
library(data.table)

dt <- as.data.table(data)

# First max row
dt[which.max(Value)]
ID Value Group
<num> <num> <char>
1: 2 25 A
# All max rows
dt[Value == max(Value)]
ID Value Group
<num> <num> <char>
1: 2 25 A
2: 4 25 B
Grouped Max Selection
# Max row per group (first maximum only; ties within a group are dropped)
dt[, .SD[which.max(Value)], by = Group]
Group ID Value
<char> <num> <num>
1: A 2 25
2: B 4 25
3: C 5 20
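Note that .SD[which.max(Value)] keeps only the first maximum in each group, unlike the grouped dplyr version, which keeps ties. A commonly used alternative collects row numbers with .I and subsets the table once. The sketch below shows a first-max and a tie-preserving variant; with this sample data both return the same three rows because no group contains a tie. The .I idiom is standard data.table usage, but measure it on your own data before assuming it's faster:

```r
library(data.table)

dt <- data.table(
  ID    = c(1, 2, 3, 4, 5),
  Value = c(10, 25, 15, 25, 20),
  Group = c("A", "A", "B", "B", "C")
)

# .I holds row numbers of the full table; V1 is the default name of the unnamed j result
first_idx <- dt[, .I[which.max(Value)], by = Group]$V1
first_max <- dt[first_idx]

# Same idea, but keep every row tied for its group's maximum
tie_idx <- dt[, .I[Value == max(Value)], by = Group]$V1
all_max <- dt[tie_idx]

first_max
all_max
```

A side effect of this approach is that columns keep their original order (ID, Value, Group), whereas the .SD version moves the grouping column to the front.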
Performance Comparison Using rbenchmark
Row selection (filtering) is one of the most common data manipulation operations. Let's compare the performance of base R, dplyr, and data.table for filtering with the rbenchmark package.
# Load required packages (install them first with install.packages() if needed)
library(rbenchmark)
library(data.table)
library(dplyr)

# Create sample data
set.seed(123)
n <- 100000L
dt <- data.table(
  id       = 1:n,
  category = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  value    = rnorm(n, mean = 50, sd = 15),
  flag     = sample(c(TRUE, FALSE), n, replace = TRUE)
)

# Convert to data.frame for base R and dplyr
df <- as.data.frame(dt)

# Benchmark: finding rows with the maximum value
result <- benchmark(
  base_R     = df[df$value == max(df$value), ],
  data_table = dt[value == max(value)],
  dplyr      = filter(df, value == max(value)),
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
) |>
  arrange(relative)

print(result)
Challenge: Try with different aggregation functions (median, sd, length). Do the relative performance patterns change?
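The benchmark above covers only the ungrouped case, while the takeaways below single out grouped operations as data.table's strong point. Here is a sketch of a grouped version on the same kind of simulated data. The base R line uses ave() and the data.table line uses the .I idiom; both are substitutions rather than code from the original post, and replications = 50 is an arbitrary choice to keep the run short:

```r
library(rbenchmark)
library(data.table)
library(dplyr)

set.seed(123)
n  <- 100000L
dt <- data.table(
  category = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  value    = rnorm(n, mean = 50, sd = 15)
)
df <- as.data.frame(dt)

# Benchmark: rows holding the maximum value within each category (ties kept)
grouped <- benchmark(
  base_R     = df[df$value == ave(df$value, df$category, FUN = max), ],
  data_table = dt[dt[, .I[value == max(value)], by = category]$V1],
  dplyr      = df %>% group_by(category) %>% slice_max(value, n = 1),
  replications = 50,
  columns      = c("test", "replications", "elapsed", "relative"),
  order        = "relative"   # rbenchmark sorts the result by this column
)

print(grouped)
```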
Exercise 3: Memory Efficiency Test
Investigate memory usage alongside timing:
# Your task: Compare memory efficiency
library(pryr)  # for object_size()

# Create copies for fair comparison
df_copy <- df
dt_copy <- copy(dt)

# Time and measure memory for column addition
system.time({
  df_result <- transform(df_copy, new_col = value * 2)
  cat("Base R result size:", object_size(df_result), "\n")
})
Base R result size: 3201456
user system elapsed
0.02 0.00 0.03
system.time({
  dt_copy[, new_col := value * 2]
  cat("data.table result size:", object_size(dt_copy), "\n")
})
data.table result size: 3602088
user system elapsed
0 0 0
Question: Which approach uses less memory? Why might this matter for large datasets?
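Keep in mind that object_size() reports the size of the finished object, not any temporary copies made along the way. One way to see the difference in behavior is data.table's address() helper: transform() returns a new data frame, while := adds the column to the existing data.table by reference. A minimal sketch on a tiny example:

```r
library(data.table)

df <- data.frame(value = c(1, 2, 3))
dt <- data.table(value = c(1, 2, 3))

# Base R: transform() builds and returns a new data frame
df_new <- transform(df, new_col = value * 2)
address(df) == address(df_new)   # FALSE: a different object

# data.table: := adds the column in place, using pre-allocated column slots
before <- address(dt)
dt[, new_col := value * 2]
before == address(dt)            # TRUE: same object, no full copy
```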
Exercise 4: Your Own Max Value Challenge
Apply what you’ve learned to your own dataset:
Load your data (or create a sample with rnorm() and sample())
Identify the column you want to find maximum values for
Test all three methods (Base R, dplyr, data.table) for finding max rows
Benchmark the performance using rbenchmark with at least 50 replications
Analyze the results: Which method works best for your specific use case?
Bonus Challenge: Try grouped maximum operations if your data has natural grouping variables!
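Before benchmarking, it is worth confirming that the three methods actually return the same rows; otherwise the timings compare different work. A minimal sketch with simulated data. The helper name same_max_rows() is hypothetical, and rounding is only there to make ties more likely:

```r
library(data.table)
library(dplyr)

set.seed(42)
df <- data.frame(
  id    = 1:1000,
  value = round(rnorm(1000, mean = 50, sd = 15))  # rounding makes ties more likely
)
dt <- as.data.table(df)

# Hypothetical helper: TRUE when all three methods return the same ids, in any order
same_max_rows <- function(df, dt) {
  base_ids  <- df[df$value == max(df$value), "id"]
  dplyr_ids <- slice_max(df, value, n = 1)$id       # ties kept by default
  dt_ids    <- dt[value == max(value), id]
  setequal(base_ids, dplyr_ids) && setequal(base_ids, dt_ids)
}

same_max_rows(df, dt)
```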
Quick Takeaways
Key points from this guide:
• Base R: which.max() and logical subsetting provide simple, dependency-free solutions
• dplyr: slice_max() offers the most readable syntax, with explicit control over ties
• data.table typically delivers the best performance, especially for large datasets and grouped operations
• rbenchmark helps you make data-driven decisions about method selection
• Always consider how to handle NA values and ties in real-world data
• Choose your method based on dataset size, performance requirements, and code readability preferences
• Performance differences become more significant with larger datasets and more complex operations
Conclusion
Selecting rows with maximum values in R is straightforward with all three approaches. Base R methods work well for most scenarios without additional packages. dplyr excels when code readability matters most. data.table is your best choice for performance-critical applications with large datasets.
The rbenchmark package measures the actual performance differences on your own data, helping you make an informed decision about which method fits your specific situation.
Your turn: Try implementing these methods with your own data and compare the performance differences. Start with the approach that best fits your current workflow and data size requirements, then optimize based on your benchmarking results!
References
Sanderson, S. (2024, December 10). How to Select Row with Max Value in Specific Column in R: A Complete Guide. R-bloggers. https://www.r-bloggers.com/2024/12/how-to-select-row-with-max-value-in-specific-column-in-r-a-complete-guide/
Statology. (n.d.). R: How to Select Row with Max Value in Specific Column. Statology. Retrieved October 6, 2025, from https://www.statology.org/r-select-row-with-max-value/