# Visualizing Categorical Data in R: A Guide with Engaging Charts Using the Iris Dataset

rtip
viz
Author: Steven P. Sanderson II, MPH

Published: August 29, 2023

# Introduction

Categorical data represents distinct groups or categories. Visualizing it can reveal patterns and relationships that are hard to see in a raw table. In this blog post, we will explore three popular charts for visualizing categorical data in R using the iris dataset: a barplot with `geom_bar()` from ggplot2, a grouped boxplot with base R and ggplot2, and a mosaic plot. We will explain each section of code in simple terms and encourage readers to try these charts on their own.

# Examples

## Example 1: Barplots with geom_bar() from ggplot2

Barplots are a common and effective way to visualize categorical data. We can use the `geom_bar()` function from the ggplot2 package to create barplots in R. `geom_bar()` takes the variable mapped to the x-axis and draws one bar per distinct value, with the bar height equal to the number of rows that have that value [3].

```r
library(ggplot2)

# Create a barplot of species counts using geom_bar()
ggplot(data = iris, aes(x = Species, fill = Species)) +
  geom_bar() +
  theme_minimal() +
  labs(
    title = "Bar Chart of Species Count",
    y = "Count",  # labs() takes `y` for the axis label, not `ylab`
    fill = "Species"
  )
```

Explanation:

- Load the ggplot2 package using `library(ggplot2)`.
- The iris dataset is already available in R, so we can directly use it.
- The `aes()` function specifies the aesthetic mappings, where `x` represents the variable on the x-axis and `fill` colors each bar by species.
- The `geom_bar()` function creates the barplot.
- `theme_minimal()` applies a clean theme, and `labs()` sets the title and the legend label.
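To see what `geom_bar()` is counting, it can help to compute the tally by hand. The sketch below is an illustration rather than part of the original example: it builds the counts with `table()` and then draws the same chart from the precomputed counts with `geom_col()`. The names `species_counts` and `p_counts` are made up for this example.

```r
library(ggplot2)

# Count the rows per species; this is the tally geom_bar() computes for us
species_counts <- as.data.frame(table(iris$Species))
names(species_counts) <- c("Species", "Count")
print(species_counts)

# Draw the same bars from the precomputed counts.
# geom_col() uses the y values as given instead of counting rows.
p_counts <- ggplot(species_counts, aes(x = Species, y = Count, fill = Species)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Bar Chart of Species Count", y = "Count", fill = "Species")
p_counts
```

Because iris contains the same number of flowers for every species, all three bars come out the same height.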

Try creating a barplot with the Species variable from the iris dataset using the provided code. Experiment with different variables and datasets to explore the patterns and distributions within your data.
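As one way to experiment with a different dataset, here is a minimal sketch using `mtcars`, which also ships with R. The choice of `mtcars` and the `cyl` column is our own assumption for illustration; unlike iris, the categories here have unequal counts, so the bars differ in height. The names `mt`, `cyl_counts`, and `p_cyl` are made up for this example.

```r
library(ggplot2)

# cyl is stored as a number in mtcars, so convert it to a factor first
mt <- mtcars
mt$cyl <- factor(mt$cyl)

# Tally of cars per cylinder count (the bar heights geom_bar() will draw)
cyl_counts <- table(mt$cyl)
print(cyl_counts)

p_cyl <- ggplot(mt, aes(x = cyl, fill = cyl)) +
  geom_bar() +
  theme_minimal() +
  labs(
    title = "Cars by Number of Cylinders",
    x = "Cylinders",
    y = "Count",
    fill = "Cylinders"
  )
p_cyl
```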

## Example 2: Grouped Boxplot with base R and ggplot2

A grouped boxplot is a useful chart for comparing the distribution of a continuous variable across different categories of a categorical variable. We can create a grouped boxplot using both base R and ggplot2.

### Base R

```r
# Create a grouped boxplot using base R
boxplot(Sepal.Length ~ Species, data = iris)
```

### ggplot2

```r
library(ggplot2)

# Create a grouped boxplot using ggplot2
ggplot(data = iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  theme_minimal() +
  labs(fill = "Species")
```

Explanation:

- In base R, we use the `boxplot()` function to create a grouped boxplot. The formula `Sepal.Length ~ Species` specifies that the Sepal.Length variable should be plotted against the Species variable [2].
- In ggplot2, we use the `geom_boxplot()` function to create a grouped boxplot. The `aes()` function specifies the aesthetic mappings, where `x` represents the categorical variable and `y` represents the numeric variable.
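To connect the boxes to actual numbers, the following sketch computes the per-species summaries with base R. It is an illustration, not part of the original post, and the names `medians` and `five_num` are made up here. Base R's `boxplot()` takes its box edges (hinges) from `fivenum()`; note that the whiskers can stop short of the minimum and maximum when there are outliers, and that ggplot2 may compute the box edges slightly differently.

```r
# Median Sepal.Length per species: the thick line inside each box
medians <- tapply(iris$Sepal.Length, iris$Species, median)
print(medians)

# Five-number summary per species
five_num <- sapply(split(iris$Sepal.Length, iris$Species), fivenum)
rownames(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five_num)
```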

Create a grouped boxplot with the Sepal.Length variable across different species in the iris dataset using either base R or ggplot2. Compare the distributions of Sepal.Length for each species and observe any differences.
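If you want to compare all four measurements at once, one option is to reshape the data into long format and let `fill` split each measurement by species. This is a sketch under our own assumptions: it uses base R's `stack()` for the reshape, and the names `iris_long` and `p_grouped` are made up for this example.

```r
library(ggplot2)

# Reshape the four numeric columns into long format with stack()
iris_long <- stack(iris[, 1:4])
names(iris_long) <- c("Value", "Measurement")

# stack() appends the columns one after another,
# so the Species column repeats once per stacked column
iris_long$Species <- rep(iris$Species, times = 4)

# One group of boxes per measurement, one box per species within each group
p_grouped <- ggplot(iris_long, aes(x = Measurement, y = Value, fill = Species)) +
  geom_boxplot() +
  theme_minimal() +
  labs(y = "Value (cm)", fill = "Species")
p_grouped
```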

## Example 3: Mosaic Plot

A mosaic plot is a graphical representation of the relationship between two or more categorical variables. The area of each tile is proportional to the number of observations in that combination of categories, which allows for visual comparison of proportions.

```r
mosaicplot(table(iris$Species, iris$Petal.Width))
```

Explanation:

- The `table()` function creates a contingency table of the two variables, Species and Petal.Width, from the iris dataset. Petal.Width is numeric, so `table()` treats each distinct measured value as its own category.
- The `mosaicplot()` function creates the mosaic plot.

Create a mosaic plot with the Species and Petal.Width variables from the iris dataset using the provided code. Explore the relationships and proportions between the variables. Experiment with different combinations of variables to gain insights from the mosaic plot.
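Because Petal.Width has many distinct values, the mosaic plot above has many thin tiles. One variation to try is to bin the widths into a few groups first with `cut()`. The break points (0, 0.8, 1.7, 2.5) and the labels below are arbitrary choices made for this illustration, not values from the original post, and the names `width_group` and `width_table` are made up as well.

```r
# Bin Petal.Width into three groups; the breaks are illustrative assumptions
width_group <- cut(
  iris$Petal.Width,
  breaks = c(0, 0.8, 1.7, 2.5),
  labels = c("narrow", "medium", "wide")
)

# Contingency table of species by width group
width_table <- table(Species = iris$Species, Width = width_group)
print(width_table)

# Mosaic plot of the binned table; color = TRUE shades the tiles
mosaicplot(width_table, main = "Species by Petal Width Group", color = TRUE)
```

With fewer categories, each tile is large enough to compare across species at a glance.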

# Conclusion

Visualizing categorical data is essential for understanding patterns and relationships within the data. In this blog post, we explored three engaging charts for visualizing categorical data in R using the iris dataset: a barplot with `geom_bar()` from ggplot2, a grouped boxplot with base R and ggplot2, and a mosaic plot. We explained each section of code in simple terms and encouraged readers to try these charts on their own. By experimenting with these charts, readers can gain valuable insights from their own categorical data and make informed decisions based on the visualizations.

Happy plotting!

# Resources:

• [1] https://community.rstudio.com/t/how-to-plot-categorical-data-in-r/21285
• [2] https://gexijin.github.io/learnR/step-into-r-programmingthe-iris-flower-dataset.html
• [3] https://www.statology.org/plot-categorical-data-in-r/
• [4] https://www.r-bloggers.com/2022/01/handling-categorical-data-in-r-part-4/
• [5] https://www.geeksforgeeks.org/how-to-plot-categorical-data-in-r/
• [6] https://rpubs.com/odenipinedo/introduction-to-data-visualization-with-ggplot2