# Introduction

Calculating percentages by group is a common task in data analysis. It shows how observations are distributed across categories. In this blog post, we’ll walk you through calculating percentages by group in three ways: with base R, with dplyr, and with data.table. To keep things simple, we will use the well-known Iris dataset.

The Iris dataset contains measurements for different species of iris flowers: sepal length, sepal width, petal length, and petal width. We will focus on the ‘Species’ column and calculate the percentage of each species in the dataset.

# Examples

## Example 1: Using Base R

Step 1: Load the Iris dataset

```
# Load the Iris dataset
data(iris)
```

Step 2: Calculate the counts by group

```
# Use the table() function to get the counts of each species
group_counts <- table(iris$Species)
```

Step 3: Calculate the total count

```
# Calculate the total count using the sum() function
total_count <- sum(group_counts)
```

Step 4: Calculate the percentage by group

```
# Divide each count by the total count and multiply by 100 to get the percentage
percentage_by_group <- (group_counts / total_count) * 100
```

Step 5: Combine group names and percentages into a data frame and display the result

```
# Combine group names and percentages into a data frame;
# as.numeric() drops the table class so Percentage is a single numeric column
result_base_R <- data.frame(
  Species = names(percentage_by_group),
  Percentage = as.numeric(percentage_by_group)
)
# Print the result
print(result_base_R)
```

```
     Species Percentage
1     setosa   33.33333
2 versicolor   33.33333
3  virginica   33.33333
```
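
Steps 2 to 4 can also be collapsed into a single call. As a minimal sketch, base R's `prop.table()` converts a table of counts into proportions, so multiplying by 100 gives the percentages directly. The names `pct` and `result_short` are illustrative and not part of the walkthrough above.

```r
# Proportions of each species, scaled to percentages
pct <- prop.table(table(iris$Species)) * 100

# as.data.frame() on a one-dimensional table returns one row per group,
# with the group labels in Var1 and the values in Freq
result_short <- as.data.frame(pct)
names(result_short) <- c("Species", "Percentage")

# Optional: round for display
result_short$Percentage <- round(result_short$Percentage, 1)
print(result_short)
```

This version is handy when you only need the percentages and do not care about the intermediate counts.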

## Example 2: Using dplyr

Step 1: Load the necessary library and the Iris dataset

```
# Load the dplyr library
library(dplyr)
# Load the Iris dataset
data(iris)
```

Step 2: Calculate the percentage by group using dplyr

```
# Use the group_by() and summarise() functions to calculate percentages
result_dplyr <- iris %>%
  group_by(Species) %>%
  summarise(Percentage = n() / nrow(iris) * 100)
```

Step 3: Display the result

```
# Print the result
print(result_dplyr)
```

```
# A tibble: 3 × 2
  Species    Percentage
  <fct>           <dbl>
1 setosa           33.3
2 versicolor       33.3
3 virginica        33.3
```
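
An equivalent formulation, sketched here as an alternative rather than a replacement, counts the rows per group with `count()` and then divides by the sum of those counts inside `mutate()`. This avoids referring to `iris` a second time in the pipeline, which helps if rows were filtered earlier in the chain. The name `result_count` is illustrative.

```r
library(dplyr)

# count() returns one row per species with the count in column n;
# sum(n) is then the total number of rows across all groups
result_count <- iris %>%
  count(Species) %>%
  mutate(Percentage = n / sum(n) * 100)

print(result_count)
```

The result keeps the raw count column `n` alongside `Percentage`, which can be useful for reporting both.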

## Example 3: Using data.table

Step 1: Load the necessary library and convert the Iris dataset to a data.table

```
# Load the data.table library
library(data.table)
# Convert the Iris dataset to a data.table
iris_dt <- as.data.table(iris)
```

Step 2: Calculate the percentage by group using data.table

```
# Use the .N special symbol to count the rows in each group, then divide by the total
result_data_table <- iris_dt[, .(Percentage = .N / nrow(iris_dt) * 100), by = Species]
```

Step 3: Display the result

```
# Print the result
print(result_data_table)
```

```
      Species Percentage
1:     setosa   33.33333
2: versicolor   33.33333
3:  virginica   33.33333
```
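
The query above returns a new summary table. If the goal is instead to attach each group's percentage to every row, data.table's `:=` operator adds the column by reference, without copying the table. A minimal sketch, where `iris_copy` and `GroupPercentage` are illustrative names:

```r
library(data.table)

# Work on a separate data.table so the original iris data frame is untouched
iris_copy <- as.data.table(iris)

# := adds the column in place; .N is the number of rows in the current group
iris_copy[, GroupPercentage := .N / nrow(iris_copy) * 100, by = Species]

# One row per species, to confirm the values
print(unique(iris_copy[, .(Species, GroupPercentage)]))
```

Because the column is added in place, no assignment with `<-` is needed on the second statement.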

# Conclusion

In this blog post, we demonstrated three ways to calculate percentages by group in R: base R, dplyr, and data.table. Base R needs no extra packages, dplyr expresses the calculation as a readable pipeline, and data.table keeps it to a single concise expression, so you can choose the one that suits your needs and preferences. The key takeaway is that understanding the distribution of data within groups can provide valuable insights in data analysis. We encourage you to try these methods on your own datasets and explore further possibilities with these tools. Happy coding!