Steven P. Sanderson II, MPH
October 6, 2025
Key Takeaway: R offers three powerful methods for selecting rows with maximum values: Base R (simple, no dependencies), dplyr (readable, tidyverse-friendly), and data.table (fast performance). Each has distinct advantages for different scenarios.
Finding rows with maximum values in a specific column is a common operation in data analysis. Whether you are trying to identify top performers, peak measurements, or maximum scores, R provides multiple efficient approaches. This guide compares Base R, dplyr, and data.table methods with performance insights and practical examples.
Let’s start with a sample dataset to demonstrate each method:
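```r
# Sample data
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 25, 15, 25, 20),
  Group = c("A", "A", "B", "B", "C")
)

print(data)
```

```
  ID Value Group
1  1    10     A
2  2    25     A
3  3    15     B
4  4    25     B
5  5    20     C
```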
Use which.max() to get the index of the first maximum value:
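A minimal sketch of that approach with the sample data above (standard bracket subsetting on the returned index):

```r
# which.max() gives the position of the first maximum only,
# so later ties are ignored
data[which.max(data$Value), ]
```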
Result: Row 2 (ID=2, Value=25)
Use logical subsetting to capture all maximum values:
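A sketch of the logical-subsetting version; with the sample data it should return both tied rows (IDs 2 and 4, each with Value = 25):

```r
# Compare every value against the column maximum; all ties are kept
data[data$Value == max(data$Value), ]
```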
The dplyr package offers slice_max() for intuitive max row selection.
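Here is one way slice_max() could be applied to the sample data; by default it keeps ties (with_ties = TRUE), and paired with group_by() it returns the top row(s) per group:

```r
library(dplyr)

# All rows tied for the overall maximum Value
data |>
  slice_max(Value)

# Maximum row(s) within each Group
data |>
  group_by(Group) |>
  slice_max(Value, n = 1) |>
  ungroup()
```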
data.table provides the fastest performance for large datasets.
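A sketch of the data.table equivalents, converting the sample data frame first (the object name max_dt is just for illustration):

```r
library(data.table)

max_dt <- as.data.table(data)

# All rows tied for the overall maximum Value
max_dt[Value == max(Value)]

# First maximum row within each Group
max_dt[, .SD[which.max(Value)], by = Group]
```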
Row selection (filtering) is one of the most common data manipulation operations. Let’s compare the performance of Base R, dplyr, and data.table for filtering operations using the rbenchmark package.
```r
# Load required packages
library(rbenchmark)
library(data.table)
library(dplyr)

# Create sample data
set.seed(123)
n <- 100000L
dt <- data.table(
  id = 1:n,
  category = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  value = rnorm(n, mean = 50, sd = 15),
  flag = sample(c(TRUE, FALSE), n, replace = TRUE)
)

# Convert to data.frame for base R and dplyr
df <- as.data.frame(dt)

# Benchmark: Finding rows with maximum value
result <- benchmark(
  base_R = df[df$value == max(df$value), ],
  data_table = dt[value == max(value)],
  dplyr = filter(df, value == max(value)),
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
) |>
  arrange(relative)

print(result)
```

```
        test replications elapsed relative
1     base_R         1000    1.72    1.000
2      dplyr         1000    4.25    2.471
3 data_table         1000    6.10    3.547
```
Always include error handling for edge cases where no maximum can be determined.
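As a sketch of what that guard could look like (the helper name safe_max_rows is hypothetical), this wrapper returns zero rows instead of erroring when the target column is empty or entirely NA:

```r
# Hypothetical helper: select max-value rows while guarding against
# columns where no maximum can be determined (empty or all NA)
safe_max_rows <- function(df, col) {
  vals <- df[[col]]
  if (length(vals) == 0 || all(is.na(vals))) {
    return(df[0, , drop = FALSE])  # empty result instead of an error
  }
  df[which(vals == max(vals, na.rm = TRUE)), , drop = FALSE]
}

safe_max_rows(data, "Value")  # rows with ID 2 and 4 (Value = 25)
```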
| Scenario | Recommended Method | Reason |
|---|---|---|
| Small data (<1K rows) | Any method | Performance differences minimal |
| Medium data (1K-10K) | Base R or data.table | Good performance balance |
| Large data (>10K rows) | data.table | Best performance scaling |
| Readability priority | dplyr | Clear, expressive syntax |
| No dependencies | Base R | Built-in functionality |
Now it’s your turn to experiment with rbenchmark and deepen your understanding of R performance optimization!
Compare the performance of different string matching methods:
```r
library(stringr)

# Create test data
text_data <- data.frame(
  id = 1:50000,
  text = sample(c("apple", "banana", "cherry", "date", "elderberry"),
                50000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Your task: Benchmark these approaches for finding rows containing "app"
benchmark(
  base_R = text_data[grepl("app", text_data$text), ],
  base_R_fixed = text_data[grepl("app", text_data$text, fixed = TRUE), ],
  stringr = text_data[str_detect(text_data$text, "app"), ],
  replications = 50,
  columns = c("test", "elapsed", "relative")
) |>
  arrange(relative)
```

```
          test elapsed relative
1 base_R_fixed    0.26    1.000
2       base_R    0.95    3.654
3      stringr    1.30    5.000
```
Question: Which method is fastest? Why might fixed = TRUE make a difference?
Compare grouping and summarization methods:
```r
# Create grouping data
group_data <- data.frame(
  group = sample(LETTERS[1:10], 10000, replace = TRUE),
  value = rnorm(10000)
)
dt_group <- as.data.table(group_data)

# Your task: Benchmark mean calculation by group
benchmark(
  base_R = aggregate(value ~ group, data = group_data, FUN = mean),
  data_table = dt_group[, .(mean_value = mean(value)), by = group],
  dplyr = group_data %>% group_by(group) %>% summarise(mean_value = mean(value)),
  replications = 100,
  columns = c("test", "elapsed", "relative")
) |>
  arrange(relative)
```

```
        test elapsed relative
1 data_table    0.23    1.000
2      dplyr    0.67    2.913
3     base_R    1.23    5.348
```
Challenge: Try with different aggregation functions (median, sd, length). Do the relative performance patterns change?
Investigate memory usage alongside timing:
```r
# Your task: Compare memory efficiency
library(pryr)  # for object_size()

# Create copies for fair comparison
df_copy <- df
dt_copy <- copy(dt)

# Time and measure memory for column addition
system.time({
  df_result <- transform(df_copy, new_col = value * 2)
  cat("Base R result size:", object_size(df_result), "\n")
})
```

```
Base R result size: 3201456
   user  system elapsed
   0.02    0.00    0.03
```

```r
system.time({
  dt_copy[, new_col := value * 2]
  cat("data.table result size:", object_size(dt_copy), "\n")
})
```

```
data.table result size: 3602088
   user  system elapsed
      0       0       0
```
Question: Which approach uses less memory? Why might this matter for large datasets?
Apply what you’ve learned to your own dataset:
- Create or load a dataset of your own (or simulate one with rnorm() and sample())
- Benchmark the Base R, dplyr, and data.table approaches using rbenchmark with at least 50 replications

Bonus Challenge: Try grouped maximum operations if your data has natural grouping variables!
Some quick, easy takeaways from this guide:
- Base R which.max() and logical subsetting provide simple, dependency-free solutions
- dplyr slice_max() offers the most readable syntax with excellent tie handling
- data.table delivers superior performance, especially for large datasets and grouped operations
- rbenchmark helps you make data-driven decisions about method selection
- Always consider handling NA values and ties in real-world applications
- Choose your method based on dataset size, performance requirements, and code readability preferences
- Performance differences become more significant with larger datasets and complex operations
Selecting rows with maximum values in R is straightforward with all three approaches. Base R methods work well for most scenarios without additional packages. dplyr excels when code readability matters most. data.table is your best choice for performance-critical applications with large datasets.
The rbenchmark package provides valuable insights into actual performance differences, helping you make informed decisions about which method to use for your specific situation.
Your turn: Try implementing these methods with your own data and compare the performance differences. Start with the approach that best fits your current workflow and data size requirements, then optimize based on your benchmarking results!
Sanderson, S. (2024, December 10). How to Select Row with Max Value in Specific Column in R: A Complete Guide. R-bloggers. https://www.r-bloggers.com/2024/12/how-to-select-row-with-max-value-in-specific-column-in-r-a-complete-guide/
Statology. (n.d.). R: How to Select Row with Max Value in Specific Column. Statology. Retrieved October 6, 2025, from https://www.statology.org/r-select-row-with-max-value/
Happy Coding! 🚀
You can connect with me at any one of the below:
Telegram Channel here: https://t.me/steveondata
LinkedIn Network here: https://www.linkedin.com/in/spsanderson/
Mastodon Social here: https://mstdn.social/@stevensanderson
RStats Network here: https://rstats.me/@spsanderson
GitHub Network here: https://github.com/spsanderson
Bluesky Network here: https://bsky.app/profile/spsanderson.com
My Book: Extending Excel with Python and R here: https://packt.link/oTyZJ
You.com Referral Link: https://you.com/join/EHSLDTL6