How to Retrieve Row Numbers in R: Complete Guide with Base R, dplyr, and data.table Examples

Learn how to get row numbers in R with clear examples in base R, dplyr, and data.table for efficient data manipulation and grouping.
Author

Steven P. Sanderson II, MPH

Published

August 4, 2025

Keywords

Programming, how to get row numbers in R, row numbers in R, retrieve row numbers R, R row indexing, row number R programming, base R row numbers, dplyr row number example, data.table row indexing, R group row numbers, conditional row selection R, how to add row numbers to a data frame in R, get row numbers within groups using dplyr in R, efficient way to retrieve row numbers in data.table, find row numbers based on condition in R, create sequential row numbers by group in R programming

Key Insight: Retrieving row numbers is a handy skill for any R programmer. Whether you’re working with base R, dplyr, or data.table, each approach has its strengths, and choosing the right method can significantly affect your code’s performance and readability.

Working with row numbers is one of the most common tasks in R programming. Whether you need to identify specific rows, create unique identifiers, or filter data based on position, understanding how to retrieve row numbers efficiently is crucial for effective data manipulation.

In this comprehensive guide, you’ll learn multiple approaches to retrieve row numbers in R using base R, dplyr, and data.table packages. We’ll cover the syntax, provide practical examples, and compare performance to help you choose the best method for your specific use case.


Why Row Numbers Matter in R Programming

Row numbers serve several critical purposes in data analysis:

  • Data identification: Uniquely identify rows for tracking and referencing
  • Conditional filtering: Select rows based on their position
  • Ranking and ordering: Create rankings within groups or datasets
  • Data validation: Check data integrity and identify duplicates
  • Indexing: Create custom indices for complex data operations

Understanding different approaches to retrieve row numbers gives you flexibility to choose the most appropriate method based on your data size, performance requirements, and coding style preferences.


Base R Methods for Row Number Retrieval

Base R provides several built-in functions for working with row numbers. These methods are reliable, widely supported, and often surprisingly fast for many use cases.

Using rownames() and row.names()

The most straightforward way to get row identifiers in base R is using rownames() or row.names():

# Create sample data frame
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28),
  city = c("New York", "Boston", "Chicago", "Miami")
)

# Get row names (returns character vector)
rownames(df)
# [1] "1" "2" "3" "4"

# Alternative syntax (identical result)
row.names(df)
# [1] "1" "2" "3" "4"

Simple Explanation: Both functions return the row names as a character vector. By default, R assigns sequential numbers as row names starting from “1”.
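
As a small sketch of why character row names are not always a safe stand-in for positions, consider what happens after subsetting. The `older` data frame and the other variable names here are hypothetical, introduced only for illustration:

```r
# Sample data, mirroring the data frame used in this section
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28)
)

# Subsetting keeps the original row names, so they are no longer 1..n
older <- df[df$age > 26, ]
original_ids <- as.integer(rownames(older))  # row names converted to integers
positions <- seq_len(nrow(older))            # positions within the subset

# Resetting row names restores the default sequential labels
rownames(older) <- NULL
reset_ids <- rownames(older)
```

Here `original_ids` still points back at the source data frame, while `positions` and `reset_ids` count only the rows that survived the subset. Which one you want depends on whether you are tracking provenance or position.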

Creating Sequential Row Numbers with seq_len()

To generate actual numeric row numbers, combine seq_len() with nrow():

# Add row numbers as a new column
df$row_num <- seq_len(nrow(df))
print(df)
     name age     city row_num
1   Alice  25 New York       1
2     Bob  30   Boston       2
3 Charlie  35  Chicago       3
4   Diana  28    Miami       4

Simple Explanation: seq_len(nrow(df)) creates a sequence from 1 to the number of rows in the data frame. This is the standard base R idiom for generating row numbers.
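
One reason seq_len() is usually preferred over the 1:nrow(df) shortcut is its behavior on an empty data frame. The following is a minimal sketch; the `empty`, `safe_index`, `risky_index`, and `visited` names are assumptions made for this example:

```r
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# A data frame with the same columns but zero rows
empty <- df[0, ]

safe_index <- seq_len(nrow(empty))  # zero-length integer vector
risky_index <- 1:nrow(empty)        # counts down from 1 to 0

# Looping over the safe index never enters the loop body
visited <- 0
for (i in safe_index) visited <- visited + 1
```

A loop over `risky_index` would run twice and try to touch rows that do not exist, which is why the seq_len() form is the safer default in functions that may receive empty input.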

Finding Row Numbers with Conditions using which()

Use which() to find row numbers that meet specific criteria:

# Find rows where age is greater than 30
which(df$age > 30)
# [1] 3

# Find rows where city is "Boston"
which(df$city == "Boston")
# [1] 2

# Multiple conditions
which(df$age > 25 & df$city != "Miami")
# [1] 2 3

Simple Explanation: which() returns the positions (row numbers) where a logical condition is TRUE. It’s perfect for conditional row selection.
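
Because which() keeps only the positions that are TRUE, it also quietly skips missing values. A short sketch with a hypothetical `ages` vector shows how that differs from indexing with the raw logical vector:

```r
ages <- c(25, NA, 35, 28)

# The comparison propagates NA for the missing age
condition <- ages > 26

# which() drops the NA and returns only the TRUE positions
matching_rows <- which(condition)

# Indexing with the logical vector keeps an NA placeholder instead
logical_pick <- ages[condition]
which_pick <- ages[matching_rows]
```

In this sketch `logical_pick` has three elements, the first of which is NA, while `which_pick` contains only the two matching ages. Wrapping a condition in which() is a common way to avoid NA rows when subsetting a data frame.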

Row Numbers Within Groups using ave()

For grouped operations, use ave() with seq_along():

# Add group column
df$group <- c("A", "A", "B", "B")

# Create row numbers within each group
df$group_row <- ave(df$age, df$group, FUN = seq_along)
print(df[, c("name", "group", "group_row")])
     name group group_row
1   Alice     A         1
2     Bob     A         2
3 Charlie     B         1
4   Diana     B         2

Simple Explanation: ave() applies a function within groups. seq_along() creates sequential numbers for each group separately.
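
The same idea works when group members are not adjacent. This sketch passes seq_len(nrow(df)) as the first argument instead of a data column, a variation on the call above (not the only way to write it) that keeps the result an integer whatever the column types are:

```r
# Groups are interleaved rather than sorted
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  group = c("A", "B", "A", "B")
)

# Number rows within each group, preserving the original row order
df$group_row <- ave(seq_len(nrow(df)), df$group, FUN = seq_along)
```

Alice and Bob are each first in their groups, and Charlie and Diana are second, even though the groups alternate.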


dplyr Methods for Row Number Retrieval

The dplyr package offers intuitive, pipe-friendly functions for row number operations. While it often carries more overhead than base R for simple lookups, dplyr excels in readability and integration with tidyverse workflows.

Basic Row Numbering with row_number()

library(dplyr)

# Add row numbers using mutate
df <- df %>%
  mutate(dplyr_row_num = row_number())

print(df %>% select(name, dplyr_row_num))
     name dplyr_row_num
1   Alice             1
2     Bob             2
3 Charlie             3
4   Diana             4

Simple Explanation: row_number() creates consecutive integers for each row. Combined with mutate(), it adds a new column with row numbers.
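
row_number() can also take a vector, in which case it ranks by that vector instead of by position. The sketch below assumes dplyr is installed; the `ranked`, `position`, and `age_rank` names are hypothetical:

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28)
)

ranked <- df %>%
  mutate(
    position = row_number(),           # 1..n in current row order
    age_rank = row_number(desc(age))   # 1 = oldest, ties broken by order
  )
```

Here Charlie, the oldest, gets `age_rank` 1 while keeping `position` 3, so the two columns answer different questions.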

Conditional Row Selection with slice()

# Select specific rows by position
df %>% slice(1, 3)
#      name age     city row_num group group_row dplyr_row_num
# 1   Alice  25 New York       1     A         1             1
# 2 Charlie  35  Chicago       3     B         1             3

# Select first two rows
df %>% slice(1:2)
#    name age     city row_num group group_row dplyr_row_num
# 1 Alice  25 New York       1     A         1             1
# 2   Bob  30   Boston       2     A         2             2

# Select last row
df %>% slice(n())
#    name age  city row_num group group_row dplyr_row_num
# 1 Diana  28 Miami       4     B         2             4

Simple Explanation: slice() selects rows by their position. Use n() to reference the last row.
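
A few related positional helpers are worth knowing. This is a brief sketch assuming a dplyr version that provides slice_head() and slice_tail() (1.0.0 or later, to the best of our knowledge); the result names are hypothetical:

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28)
)

without_first <- df %>% slice(-1)          # negative positions drop rows
last_two <- df %>% slice_tail(n = 2)       # last two rows
first_row <- df %>% slice_head(n = 1)      # first row
```

Negative positions in slice() mirror negative indexing in base R, which can be handy for dropping a header-like first row.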

Row Numbers Within Groups

# Row numbers within each group
df %>%
  group_by(group) %>%
  mutate(group_row_dplyr = row_number()) %>%
  select(name, group, group_row_dplyr)
# A tibble: 4 × 3
# Groups:   group [2]
  name    group group_row_dplyr
  <chr>   <chr>           <int>
1 Alice   A                   1
2 Bob     A                   2
3 Charlie B                   1
4 Diana   B                   2

Simple Explanation: Combine group_by() with row_number() to restart numbering within each group.
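
A common use of grouped row numbers is keeping the first row of each group. The following sketch assumes dplyr is installed and uses a hypothetical `first_per_group` result name:

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  group = c("A", "A", "B", "B")
)

# Within each group, keep only the row numbered 1
first_per_group <- df %>%
  group_by(group) %>%
  filter(row_number() == 1) %>%
  ungroup()
```

The ungroup() at the end is a defensive habit so that later steps in a pipeline do not inherit the grouping by accident.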

Finding Row Numbers with Filter

# Get row numbers for rows meeting criteria
df %>%
  mutate(original_row = row_number()) %>%
  filter(age > 30) %>%
  select(name, age, original_row)
     name age original_row
1 Charlie  35            3

Simple Explanation: Add row numbers first, then filter to preserve original row positions.
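
To see why the order of the two steps matters, this sketch runs both orderings on the same small data frame. It assumes dplyr is installed; `numbered_before` and `numbered_after` are hypothetical names:

```r
library(dplyr)

df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28)
)

# Number first, then filter: keeps the original position
numbered_before <- df %>%
  mutate(original_row = row_number()) %>%
  filter(age > 30) %>%
  pull(original_row)

# Filter first, then number: positions restart within the filtered result
numbered_after <- df %>%
  filter(age > 30) %>%
  mutate(original_row = row_number()) %>%
  pull(original_row)
```

Only the first ordering reports that Charlie sits in row 3 of the source data; the second reports 1 because the numbering happens after the other rows are gone.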


data.table Methods for Row Number Retrieval

data.table provides the most efficient methods for row operations, especially with large datasets. The syntax is concise but requires understanding data.table’s unique approach.

Basic Row Indexing with .I

library(data.table)

# Convert to data.table
DT <- as.data.table(df)

# Add row numbers using .I
DT[, row_num_dt := .I]
print(DT[, .(name, row_num_dt)])
      name row_num_dt
    <char>      <int>
1:   Alice          1
2:     Bob          2
3: Charlie          3
4:   Diana          4

Simple Explanation: .I returns row indices. The := operator adds a new column by reference (very efficient).
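
Modifying by reference has a consequence that can surprise newcomers: plain assignment does not copy a data.table. A minimal sketch, assuming data.table is installed and using hypothetical `alias` and `snapshot` names:

```r
library(data.table)

DT <- data.table(name = c("Alice", "Bob"))

alias <- DT            # no copy: both names refer to the same table
snapshot <- copy(DT)   # explicit, independent copy

# Adding a column through the alias also changes DT
alias[, row_num := .I]
```

After this runs, `DT` has a `row_num` column even though it was never modified directly, while `snapshot` is unchanged. Use copy() whenever you need to keep the original intact.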

Finding Row Numbers with Conditions

# Get row numbers where age > 30
DT[, .I[age > 30]]
# [1] 3

# More complex conditions
DT[, .I[age > 25 & city != "Miami"]]
# [1] 2 3

Simple Explanation: Subset .I by the condition inside the second argument (j) to get the original row numbers. Writing DT[age > 30, .I] instead filters first and then numbers the surviving rows from 1, so it does not return the original positions.
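
Two related data.table idioms also return positions in the full table. This is a sketch on a small hypothetical table; it relies on the `which = TRUE` argument of `[.data.table` and on `V1` being the default name for an unnamed j result, both as described in the data.table documentation:

```r
library(data.table)

DT <- data.table(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age = c(25, 30, 35, 28),
  group = c("A", "A", "B", "B")
)

# which = TRUE returns the row numbers matched by i instead of the rows
matches <- DT[age > 26, which = TRUE]

# .I subset inside j, evaluated per group: row of the oldest person in each group
oldest_rows <- DT[, .I[which.max(age)], by = group]$V1
```

The second form is useful as a two-step pattern: compute the row numbers per group, then pass them back as `DT[oldest_rows]` to pull the full rows.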

Row Numbers Within Groups

# Add group row numbers
DT[, group_row_dt := seq_len(.N), by = group]
print(DT[, .(name, group, group_row_dt)])
      name  group group_row_dt
    <char> <char>        <int>
1:   Alice      A            1
2:     Bob      A            2
3: Charlie      B            1
4:   Diana      B            2

Simple Explanation: .N gives the number of rows in each group. seq_len(.N) creates sequential numbers within each group defined by by = group.

Using rowid() for Group Numbering

# Alternative method for group row numbers
DT[, group_row_alt := rowid(group)]
print(DT[, .(name, group, group_row_alt)])
      name  group group_row_alt
    <char> <char>         <int>
1:   Alice      A             1
2:     Bob      A             2
3: Charlie      B             1
4:   Diana      B             2

Simple Explanation: rowid() is a data.table convenience function that automatically generates sequential IDs within groups.
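
As a quick check that the two grouping idioms agree, this sketch applies both to a table whose groups are interleaved. It assumes data.table is installed; the column names are hypothetical:

```r
library(data.table)

DT <- data.table(group = c("A", "B", "A", "B"))

# rowid() needs no by= clause; it counts occurrences of each value
DT[, via_rowid := rowid(group)]

# seq_len(.N) numbers rows within each by-group
DT[, via_seq := seq_len(.N), by = group]
```

Both columns come out as 1, 1, 2, 2. rowid() is the shorter spelling when all you need is the within-group counter.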


Performance Benchmarking with rbenchmark

To compare the performance of different row number retrieval methods, we’ll use the rbenchmark package. It runs each expression a fixed number of times and reports the total elapsed, user, and system time, along with each method’s time relative to the fastest.

Setting Up the Benchmark

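Here’s how to benchmark different approaches for finding rows that meet specific conditions. Note that the four expressions are not strictly equivalent: the grep() call matches every row name rather than testing value, and the dplyr pipeline numbers rows after filtering, so it returns 1 to k rather than original positions. Treat the timings as a rough guide to overhead, and expect them to vary by machine.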

library(rbenchmark)
library(dplyr)

# Create sample dataset
df <- data.frame(
  id = 1:10000,
  value = rnorm(10000),
  category = sample(letters[1:5], 10000, replace = TRUE)
)

# Run benchmark comparison
benchmark(
  "which(condition)" = {
    row_nums <- which(df$value > 0)
  },
  "grep(pattern, rownames)" = {
    matching_rows <- grep("^[1-9]", rownames(df))
  },
  "subset(df, condition, select=row.names)" = {
    subset_rows <- as.numeric(rownames(subset(df, value > 0)))
  },
  "dplyr::filter() %>% row_number()" = {
    filtered_rows <- df %>% 
      filter(value > 0) %>% 
      mutate(row_num = row_number()) %>% 
      pull(row_num)
  },
  replications = 500,
  columns = c("test", "replications", "elapsed", "relative", "user.self", "sys.self")
) %>%
  arrange(relative)
                                     test replications elapsed relative
1                        which(condition)          500    0.08     1.00
2        dplyr::filter() %>% row_number()          500    2.02    25.25
3                 grep(pattern, rownames)          500    3.12    39.00
4 subset(df, condition, select=row.names)          500    3.22    40.25
  user.self sys.self
1      0.03     0.01
2      1.77     0.02
3      2.54     0.06
4      2.62     0.23

Understanding rbenchmark Output

  • elapsed: Total time in seconds for all replications
  • relative: Performance relative to the fastest method (1.00 = fastest)
  • user.self: CPU time spent in the user process
  • sys.self: CPU time spent in system calls
  • replications: Number of times each test was run for accuracy
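
To make the relative column concrete, this sketch recomputes it from the elapsed values printed in the output above. The vector names and the per-call conversion are additions for illustration, not part of rbenchmark:

```r
# Elapsed seconds copied from the benchmark output above
elapsed <- c(which = 0.08, dplyr = 2.02, grep = 3.12, subset = 3.22)
replications <- 500

# relative = each method's elapsed time divided by the fastest
relative <- round(elapsed / min(elapsed), 2)

# Average cost of a single call, in milliseconds
per_call_ms <- elapsed / replications * 1000
```

The recomputed values match the relative column (1.00, 25.25, 39.00, 40.25). The per-call view also puts the gap in perspective: even the slowest method costs only a few milliseconds per call on 10,000 rows.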

Recommendations by Use Case:

Data Size or Workflow   Best Choice            Why
< 1K rows               Base R                 Simple, readable, adequate performance
1K - 10K rows           Base R or data.table   Both perform well, choose based on preference
10K - 100K rows         data.table             Clear performance advantage
> 100K rows             data.table             Significant speed improvement, memory efficient
Tidyverse workflow      dplyr                  Better integration, acceptable for small-medium data

Your Turn!

Let’s put these concepts into practice with a real-world scenario.

Challenge: You have a sales dataset and need to:

  1. Add row numbers to track each transaction
  2. Find the row numbers of sales over $1000
  3. Create sequential numbers within each salesperson group
  4. Select every 3rd row for quality control sampling

# Sample sales data
sales_data <- data.frame(
  transaction_id = 101:110,
  salesperson = rep(c("John", "Jane", "Mike"), length.out = 10),
  amount = c(750, 1200, 890, 1500, 650, 2000, 1100, 800, 1300, 900),
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 10)
)

Try to solve this using all three methods (base R, dplyr, and data.table), then check the solution below.

Solution

# BASE R SOLUTION
# 1. Add row numbers
sales_data$row_num <- seq_len(nrow(sales_data))

# 2. Find rows with sales > $1000
high_sales_rows <- which(sales_data$amount > 1000)
print(paste("High sales in rows:", paste(high_sales_rows, collapse = ", ")))
# [1] "High sales in rows: 2, 4, 6, 7, 9"

# 3. Row numbers within salesperson groups
sales_data$group_row <- ave(sales_data$amount, sales_data$salesperson, FUN = seq_along)

# 4. Select every 3rd row
every_third <- sales_data[seq(3, nrow(sales_data), by = 3), ]

# DPLYR SOLUTION
library(dplyr)
sales_dplyr <- sales_data %>%
  # 1. Add row numbers
  mutate(row_num = row_number()) %>%
  # 3. Group row numbers
  group_by(salesperson) %>%
  mutate(group_row = row_number()) %>%
  ungroup()

# 2. Find high sales rows
high_sales_dplyr <- sales_dplyr %>%
  filter(amount > 1000) %>%
  pull(row_num)

# 4. Every 3rd row
every_third_dplyr <- sales_dplyr %>% slice(seq(3, n(), by = 3))

# DATA.TABLE SOLUTION
library(data.table)
sales_dt <- as.data.table(sales_data)

# 1. Add row numbers
sales_dt[, row_num := .I]

# 2. Find high sales rows (subset .I in j so original positions are returned)
high_sales_dt <- sales_dt[, .I[amount > 1000]]

# 3. Group row numbers
sales_dt[, group_row := seq_len(.N), by = salesperson]

# 4. Every 3rd row
every_third_dt <- sales_dt[seq(3, .N, by = 3)]

Quick Takeaways

Base R: Use seq_len(nrow()) for row numbers, which() for conditional selection, and ave() for grouped operations

dplyr: Leverage row_number(), slice(), and group_by() combinations for readable, pipeline-friendly code

data.table: Utilize .I for row indices, .N for group sizes, and rowid() for efficient group numbering

Performance: In the benchmark above, which() was the fastest way to locate rows by condition; data.table excels for large datasets, and dplyr prioritizes readability

Benchmarking: Use the rbenchmark package to compare total elapsed time for each method across repeated runs

Memory: data.table modifies by reference (efficient), while base R and dplyr follow copy-on-modify semantics

Syntax: data.table is most concise, dplyr is most readable, base R is most familiar


Frequently Asked Questions

Q: What’s the difference between rownames() and row_number()? A: rownames() returns character row identifiers (which may not be sequential), while row_number() creates consecutive integers starting from 1.

Q: Why is data.table faster than dplyr for row operations? A: data.table modifies objects by reference and uses optimized C code, while dplyr creates copies and has more overhead from its abstraction layer.

Q: When should I use which() instead of filter()? A: Use which() when you need the actual row numbers/positions. Use filter() when you want to subset the data and continue with dplyr operations.

Q: Can I mix different approaches in the same project? A: Yes, but be consistent within functions or analysis sections. Consider using dtplyr to combine dplyr syntax with data.table performance.

Q: How do I handle row numbers when data has missing values? A: Positional row numbers (seq_len(), row_number() with no arguments, .I) are assigned regardless of missing data. Conditional lookups differ: which() and filter() skip rows where the condition evaluates to NA. Use complete.cases() if you need to exclude rows with missing values.


Conclusion

Mastering row number retrieval in R opens up powerful possibilities for data manipulation and analysis. Each approach - base R, dplyr, and data.table - offers unique advantages:

  • Base R provides reliable, universally available functions that work well for small to medium datasets
  • dplyr offers readable, intuitive syntax that integrates seamlessly with tidyverse workflows
  • data.table delivers superior performance and memory efficiency, especially crucial for large datasets

The choice between methods depends on your specific needs: data size, performance requirements, team preferences, and existing codebase. For maximum flexibility, consider learning all three approaches and choosing the most appropriate one for each situation.

Start practicing these techniques with your own datasets, and remember that the best method is the one that helps you solve your specific data challenges effectively and efficiently.




Happy Coding! 🚀
