```
# Mean of each of the four numeric columns (MARGIN = 2 means "by column")
column_means <- apply(iris[, 1:4], 2, mean)
print(column_means)
```

```
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.843333 3.057333 3.758000 1.199333
```

Author: Steven P. Sanderson II, MPH

Published: February 15, 2024

Welcome, fellow R warriors! Today, we delve into the heart of vectorized operations with R’s “apply” family: `apply()`, `lapply()`, `sapply()`, and `tapply()`. These functions are your secret weapons for efficiency and elegance, so buckle up and prepare to be amazed!

**But first, the “why”:** Loops are great, but for repetitive tasks on data structures, a vectorized style reigns supreme. It’s cleaner, often faster, and lets you focus on the “what” instead of the “how” of your analysis. Enter the apply family, each member offering a unique twist on applying functions to your data.
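To make the “what versus how” point concrete, here is a minimal sketch that computes the same result twice: once with an explicit loop and once with `sapply()`. The list `values` and the variable names are made up for illustration, and the gain shown is mainly in readability rather than a guaranteed speedup.

```r
# Hypothetical input: three small numeric vectors in a named list
values <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Explicit loop: pre-allocate the result, then fill it element by element
loop_means <- numeric(length(values))
names(loop_means) <- names(values)
for (i in seq_along(values)) {
  loop_means[i] <- mean(values[[i]])
}

# apply-family version: sapply() handles the iteration and the names
apply_means <- sapply(values, mean)

# Compare the two results
print(all.equal(loop_means, apply_means))
```

The loop spells out allocation, indexing, and naming; the `sapply()` call states only which function goes over which data.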

Think of `apply()` as the customizable grandfather. It takes three main arguments:

- `X`: Your data (matrix, array, or data frame; a data frame is coerced to a matrix first).
- `MARGIN`: Where to apply the function (rows = 1, columns = 2, both = `c(1, 2)`).
- `FUN`: The function to apply (e.g., `mean`, `sum`, or your own custom function).
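The effect of `MARGIN` is easiest to see on a tiny matrix. The sketch below uses a hypothetical 2 x 3 matrix `m` (not from the original example) and applies a function by row, by column, and cell by cell.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)  # MARGIN = 2: one result per column

# MARGIN = c(1, 2) visits every cell, so the result keeps the matrix shape
doubled <- apply(m, c(1, 2), function(x) x * 2)

print(row_sums)  # 9 12
print(col_sums)  # 3 7 11
print(doubled)
```

For simple arithmetic like doubling, `m * 2` does the same job directly; `c(1, 2)` earns its keep when the function is not already vectorized.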

Calculate the mean of each column in the `iris` dataset with `column_means <- apply(iris[, 1:4], 2, mean)`:

```
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.843333 3.057333 3.758000 1.199333
```

**Explanation:** We apply `mean` (FUN) to each column (MARGIN = 2) of the first four columns (`iris[, 1:4]`) of the `iris` data frame, storing the results in `column_means`.
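`FUN` does not have to be a built-in. As a small sketch, the block below repeats the column-mean call and then swaps in an anonymous function; the name `column_spread` is made up for illustration. It also checks the result against base R's `colMeans()`, which is a common shortcut for this particular case.

```r
# The call described above: mean of each numeric column of iris
column_means <- apply(iris[, 1:4], 2, mean)

# A custom function works the same way: here, the spread (max - min) per column
column_spread <- apply(iris[, 1:4], 2, function(x) max(x) - min(x))
print(column_spread)

# For plain column means, colMeans() should agree with the apply() result
print(all.equal(column_means, colMeans(iris[, 1:4])))
```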

`lapply()` is the speed demon, applying a function to each element of a list or vector and always returning a list of results, one per element.

Calculate the median of each set of petal lengths in a list of numeric vectors:

```
petal_lengths <- list(c(1.4, 1.5, 1.6), c(4.4, 4.5, 4.6))
petal_medians <- lapply(petal_lengths, median)
print(petal_medians)
```

```
[[1]]
[1] 1.5
[[2]]
[1] 4.5
```

**Explanation:** We apply `median` to each vector in `petal_lengths`, returning a list (`petal_medians`) containing the medians.
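Two follow-ups that often help in practice, sketched under some assumptions: the element names `setosa` and `versicolor` are hypothetical labels added for illustration, and `column_classes` is a made-up variable name. Named inputs give named results, and because a data frame is a list of columns, `lapply()` walks over columns.

```r
# Naming the list elements carries the names through to the result
petal_lengths <- list(setosa = c(1.4, 1.5, 1.6), versicolor = c(4.4, 4.5, 4.6))
petal_medians <- lapply(petal_lengths, median)

# Individual results are pulled out of the list with $ or [[ ]]
print(petal_medians$setosa)
print(petal_medians[["versicolor"]])

# A data frame is a list of columns, so lapply() visits each column in turn
column_classes <- lapply(iris, class)
print(column_classes$Species)
```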

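`sapply()` is like `lapply()`, but it tries to simplify the output. If every result is a single value of a compatible type (e.g., numeric), it returns a vector instead of a list; if every result has the same length greater than one, it returns a matrix; otherwise you still get a list.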
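Because the return type of `sapply()` depends on what the function gives back, it is worth seeing the cases side by side. This is a minimal sketch with made-up inputs; it also shows `vapply()`, a base R relative that asks you to declare the expected shape up front.

```r
x <- list(a = 1:3, b = 4:6)  # hypothetical input

# Every result has length 1, so sapply() returns a named vector
sums <- sapply(x, sum)

# Every result has length 2, so sapply() returns a matrix with 2 rows
ranges <- sapply(x, range)

# Results of different lengths cannot be simplified, so a list comes back
uneven <- sapply(list(a = 1:3, b = 1:5), function(v) v[v > 2])

# vapply() states the expected shape and raises an error if it is not met
means_checked <- vapply(x, mean, FUN.VALUE = numeric(1))

print(sums)
print(ranges)
print(class(uneven))
print(means_checked)
```

Inside functions or scripts where a surprise list would break later steps, `vapply()` is often the safer choice.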

Find the minimum value in each row of a matrix:

```
# Build a 3 x 4 matrix; the name `mat` avoids masking the base function matrix()
mat <- matrix(1:12, nrow = 3)
# For each row index, take the minimum of that row
row_mins <- sapply(seq_len(nrow(mat)), function(i) min(mat[i, ]))
print(row_mins)
```

`[1] 1 2 3`

**Explanation:** We use an anonymous function to find the minimum in each row (`mat[i, ]`) and apply it to each row number (`seq_len(nrow(mat))`). `sapply()` simplifies the output to a vector of minimum values (`row_mins`).
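Looping over row numbers works, but row-wise summaries are exactly what `apply()` with `MARGIN = 1` is for. A short sketch comparing the two follows; `mat` is a stand-in name for the example matrix, and the two result names are made up.

```r
mat <- matrix(1:12, nrow = 3)  # same values as the example matrix

# Index-based version with sapply()
row_mins_sapply <- sapply(seq_len(nrow(mat)), function(i) min(mat[i, ]))

# Direct version: apply min() to each row
row_mins_apply <- apply(mat, 1, min)

# Both give one minimum per row
print(row_mins_sapply)
print(row_mins_apply)
```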

`tapply()` groups data based on another variable and applies a function to each group. Perfect for summarizing data by categories!

Calculate the average sepal length for each species in the `iris` dataset:

```
sepal_length_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(sepal_length_by_species)
```

```
setosa versicolor virginica
5.006 5.936 6.588
```

**Explanation:** We group the `Sepal.Length` column by the `Species` column (using `iris$Species`) and calculate the mean (`mean`) for each group. The results are stored in `sepal_length_by_species`.
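The grouping idea extends in two directions: a different summary function, and more than one grouping variable. The sketch below is illustrative only. The second half switches to the built-in `mtcars` dataset, which the original example does not use, because `iris` has a single categorical column; all variable names are made up.

```r
# Same grouping as above, different summary: the median instead of the mean
median_by_species <- tapply(iris$Sepal.Length, iris$Species, median)
print(median_by_species)

# Two grouping variables, passed as a list, give a two-way table.
# Here: mean miles per gallon by cylinder count (cyl) and transmission (am).
mpg_by_cyl_am <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, am = mtcars$am), mean)
print(mpg_by_cyl_am)
```

With two factors the result is a matrix with one row per `cyl` level and one column per `am` level.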

These are just a taste of the apply family’s power. Now it’s your turn! Explore different functions, data structures, and margins to see how these tools can streamline your R workflow. Remember, practice makes perfect, so dive in, experiment, and conquer the world of vectorized operations!

**Bonus Tip:** Check out the `purrr` package for even more apply-like functions with a functional programming twist!
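As a rough taste, here is how the earlier median example might look with `purrr`. This sketch assumes the package is installed; if it is not, the block falls back to base R so it still runs. `map()` plays the role of `lapply()`, while `map_dbl()` insists on a numeric vector.

```r
petal_lengths <- list(c(1.4, 1.5, 1.6), c(4.4, 4.5, 4.6))

if (requireNamespace("purrr", quietly = TRUE)) {
  # map() mirrors lapply(): it always returns a list
  medians_list <- purrr::map(petal_lengths, median)
  # map_dbl() returns a double vector and errors on anything else
  medians_dbl <- purrr::map_dbl(petal_lengths, median)
} else {
  # Fallback so the sketch still runs without purrr installed
  medians_list <- lapply(petal_lengths, median)
  medians_dbl <- vapply(petal_lengths, median, FUN.VALUE = numeric(1))
}

print(medians_dbl)
```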