# Level Up Your Data Wrangling: Adding Index Columns in R like a Pro!

Author: Steven P. Sanderson II, MPH

Published: February 16, 2024

# Introduction

Data wrangling in R is like cooking: you have your ingredients (data), and you use tools (functions) to prepare them (clean, transform) for analysis (consumption!). One essential tool is adding an “index column” – a unique identifier for each row. This might seem simple, but there are several ways to do it in base R and tidyverse packages like `dplyr` and `tibble`. Let’s explore and spice up your data wrangling skills!

# Examples

## Adding Heat with Base R

### Ex 1: The Sequencer:

Imagine lining up your rows. `cbind(index = 1:nrow(df), df)` adds a new first column named `index` holding the numbers 1 to n, where n is the number of rows in your data frame (`df`).

``````
# Sample data
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

df_with_index <- cbind(index = 1:nrow(df), df)
df_with_index
``````
``````
  index    name age
1     1   Alice  25
2     2     Bob  30
3     3 Charlie  28
``````
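If you would rather not build a new object, a column can also be assigned directly. This is a small sketch rather than one of the methods listed in this post: the column name `index` is only illustrative, and because assignment appends the column on the right, the last step moves it to the front.

```r
# Sample data (same shape as the examples in this post)
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

# Assign the index directly; the new column is appended on the right
df$index <- 1:nrow(df)

# Reorder the columns so the index comes first
df <- df[, c("index", "name", "age")]
df
```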

### Ex 2: Row Name Shuffle:

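Prefer names over numbers? Every data frame already carries row names, which default to "1", "2", and so on. `rownames(df) <- 1:nrow(df)` resets them to plain row numbers, replacing any existing row names, and `cbind(index = rownames(df), df)` copies them into a regular column. Keep in mind that row names are text, so the resulting `index` column is character rather than integer.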

``````
# Sample data
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

df_with_index <- cbind(index = rownames(df), df)
df_with_index
``````
``````
  index    name age
1     1   Alice  25
2     2     Bob  30
3     3 Charlie  28
``````
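One thing worth checking with this approach is the column type. Row names are stored as text, so the new column prints like a number but is not one. A minimal sketch, assuming you want a numeric index and are on R 4.0 or later, converts it with `as.integer()`:

```r
# Sample data
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

df_with_index <- cbind(index = rownames(df), df)

# The index is text at this point ("character" in R 4.0 and later)
class(df_with_index$index)

# Convert it so it sorts and compares as a number
df_with_index$index <- as.integer(df_with_index$index)
class(df_with_index$index)
```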

### Ex 3: The All-Seeing Eye:

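`seq_len(nrow(df))` generates the integers 1 to n, perfect for adding as a new column named “index”. Unlike `1:nrow(df)`, it returns an empty vector when the data frame has zero rows, which makes it the safer choice inside functions and pipelines.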

``````
# Sample data
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

df_with_index <- cbind(index = seq_len(nrow(df)), df)
df_with_index
``````
``````
  index    name age
1     1   Alice  25
2     2     Bob  30
3     3 Charlie  28
``````
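On three rows, `seq_len(nrow(df))` and `1:nrow(df)` give the same result. The difference shows up on an empty data frame, as this quick sketch illustrates. The names `empty_df`, `safe`, and `risky` are made up for the example.

```r
# Sample data, then an empty copy with the same columns and zero rows
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))
empty_df <- df[0, ]

1:nrow(empty_df)          # counts down from 1 to 0, so it has two elements
seq_len(nrow(empty_df))   # an empty integer vector

# seq_len() produces a valid zero-row data frame with the extra column
safe <- cbind(index = seq_len(nrow(empty_df)), empty_df)
dim(safe)

# 1:nrow() tries to attach two index values to zero rows, which fails;
# tryCatch() captures the failure so the script keeps running
risky <- tryCatch(
  cbind(index = 1:nrow(empty_df), empty_df),
  error = function(e) "error"
)
risky
```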

## The Tidyverse Twist:

The `tidyverse` offers dedicated helpers for the same job:

### Ex 1: Tibble Magic:

`tibble::rowid_to_column(df)` adds a column named “rowid” at the front of the data, holding sequential row IDs that start at 1.

``````
library(tibble)

# Convert df to tibble
df_tib <- as_tibble(df)

df_tib_indexed <- rowid_to_column(df_tib)
df_tib_indexed
``````
``````
# A tibble: 3 × 3
  rowid name      age
  <int> <chr>   <dbl>
1     1 Alice      25
2     2 Bob        30
3     3 Charlie    28
``````
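If “rowid” is not the name you want, `rowid_to_column()` takes a `var` argument that sets it. A short sketch, using `index` as an illustrative name to match the base R examples:

```r
library(tibble)

# Sample data
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

# var controls the name of the new column (the default is "rowid")
df_named <- rowid_to_column(as_tibble(df), var = "index")
df_named
```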

### Ex 2: dplyr’s Ranking System:

`dplyr::row_number()`, called without arguments inside `mutate()`, numbers the rows from 1 in their current order. `select(rowid, everything())` then moves the new column to the front.

``````
library(dplyr)
df_tib_ranked <- df_tib |>
  mutate(rowid = row_number()) |>
  select(rowid, everything())

df_tib_ranked
``````
``````
# A tibble: 3 × 3
  rowid name      age
  <int> <chr>   <dbl>
1     1 Alice      25
2     2 Bob        30
3     3 Charlie    28
``````
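Where `row_number()` really earns its keep is on grouped data, because the count restarts within each group. The sketch below uses a made-up `scores` data frame with a hypothetical `team` column, and assumes you want a separate index per team:

```r
library(dplyr)

# Hypothetical data: two teams with different numbers of rows
scores <- data.frame(
  team  = c("A", "A", "B", "B", "B"),
  score = c(10, 12, 7, 9, 11)
)

scores_indexed <- scores |>
  group_by(team) |>
  mutate(rowid = row_number()) |>  # restarts at 1 for each team
  ungroup()

scores_indexed
```

Calling `ungroup()` at the end keeps later steps from silently operating per team.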

Experiment and see what suits your workflow! Base R offers flexibility, while `tidyverse` provides concise and consistent syntax.