# Introduction

Data visualization is a powerful tool in a data scientist’s toolkit. It helps us understand our data and present it in a way that is easy to comprehend. In this blog post, we will explore how to plot predicted values in R using the mtcars dataset. We will train a linear regression model to predict the miles per gallon (mpg) of cars from their other attributes, then plot the predictions against the actual values. By the end of this tutorial, you’ll know how to plot predicted values and can apply the same steps to your own data analysis projects.

**Step 1: Load the Required Libraries**

Before we dive into the code, let’s make sure we have the necessary libraries installed. We’ll be using `ggplot2` for plotting and `caret`, a model training and evaluation package whose `createDataPartition()` function we use to split the data. If you haven’t installed them already, you can do so with:

```
install.packages("ggplot2")
install.packages("caret")
```

Now, let’s load the libraries:

```
library(ggplot2)
library(caret)
```

**Step 2: Load and Explore the Data**

We’ll use the classic `mtcars` dataset, which ships with R and records 11 attributes for 32 car models. Our goal is to predict the fuel efficiency (mpg) of these cars. Let’s take a first look at the dataset:

`head(mtcars)`

```
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

This displays the first six rows of the dataset, giving you an idea of what it looks like.
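Beyond `head()`, a few base R calls give a fuller picture of the data before modeling. The sketch below uses only functions that ship with R; which summaries are worth checking is a judgment call rather than part of the original walkthrough.

```r
# mtcars ships with R, so no loading step is needed
dim(mtcars)          # 32 rows (cars) and 11 columns (attributes)

# Column types: every column in mtcars is stored as numeric
str(mtcars)

# Distribution of the outcome we want to predict
summary(mtcars$mpg)
```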

**Step 3: Split the Data into Training and Testing Sets**

Before we proceed with modeling and prediction, we need to split our data into training and testing sets. We’ll use 80% of the data for training and the remaining 20% for testing:

```
set.seed(123) # for reproducibility
# Row indices for the training set (about 80% of the cars)
splitIndex <- createDataPartition(mtcars$mpg, p = 0.8, list = FALSE)
training_data <- mtcars[splitIndex, ]
testing_data <- mtcars[-splitIndex, ]
```
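If `caret` is not available, a rough equivalent can be written in base R. The sketch below is an alternative, not the method used in this post: it draws a simple random sample, whereas `createDataPartition()` also tries to balance the split across the range of `mpg`. The names `train_rows`, `train_base`, and `test_base` are illustrative.

```r
# Simple random 80/20 split in base R (no stratification on mpg)
set.seed(123)
n <- nrow(mtcars)
train_rows <- sample(seq_len(n), size = floor(0.8 * n))

train_base <- mtcars[train_rows, ]
test_base <- mtcars[-train_rows, ]

# floor(0.8 * 32) = 25 training rows, leaving 7 for testing
nrow(train_base)
nrow(test_base)

# Sanity check: no car appears in both sets
length(intersect(rownames(train_base), rownames(test_base)))
```

With only 32 cars, either approach leaves a very small test set, so results can shift noticeably with a different seed.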

**Step 4: Build a Simple Linear Regression Model**

Now, let’s build a linear regression model that predicts `mpg` from all the other attributes (the `.` in the formula stands for every remaining column). We’ll use the `lm()` function:

`model <- lm(mpg ~ ., data = training_data)`

This line of code fits the linear regression model using the training data.
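To see what the `.` in the formula expands to and how well the model fits, you can inspect the fitted object. This is a minimal sketch that refits on the full `mtcars` data so it runs on its own (an assumption made for illustration; in the walkthrough you would inspect `model` directly). The name `fit` is hypothetical.

```r
# Assumption: fit on all of mtcars so this snippet is self-contained
fit <- lm(mpg ~ ., data = mtcars)

# The `.` expands to every column except the response
attr(terms(fit), "term.labels")

# One coefficient per predictor plus the intercept
coef(fit)

# Share of variance in mpg explained on the data used for fitting
summary(fit)$r.squared
```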

**Step 5: Make Predictions**

With our model trained, we can now make predictions on the testing data:

`predictions <- predict(model, newdata = testing_data)`
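Before plotting, it can help to summarize prediction error in a single number. The sketch below computes the root mean squared error (RMSE) and mean absolute error (MAE) in base R. To stay self-contained it uses a simple random split rather than `createDataPartition()`, so its numbers will differ from the walkthrough’s; `train_rows`, `fit`, and `preds` are illustrative names.

```r
# Assumption: simple random 80/20 split so the snippet runs without caret
set.seed(123)
train_rows <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
train <- mtcars[train_rows, ]
test <- mtcars[-train_rows, ]

fit <- lm(mpg ~ ., data = train)
preds <- predict(fit, newdata = test)

# Prediction errors on the held-out cars
errors <- test$mpg - preds
rmse <- sqrt(mean(errors^2))  # penalizes large misses more heavily
mae <- mean(abs(errors))      # average miss in mpg units

c(RMSE = rmse, MAE = mae)
```

Both metrics are in miles per gallon, so they can be read directly against the range of `mpg` in the data.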

**Step 6: Create a Scatter Plot of Predicted vs. Actual Values**

The most exciting part is visualizing the predicted values. We can do this using a scatter plot. Let’s create one:

```
# Combine actual and predicted values
plot_data <- data.frame(Actual = testing_data$mpg, Predicted = predictions)

# Create a scatter plot
ggplot(plot_data, aes(x = Actual, y = Predicted)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  labs(
    x = "Actual MPG",
    y = "Predicted MPG",
    title = "Actual vs. Predicted MPG"
  ) +
  theme_minimal()
```

This code generates a scatter plot with the actual MPG values on the x-axis and the predicted MPG values on the y-axis. The red line is a linear fit of the predicted values on the actual values, with its confidence band shaded in gray. The closer the points lie to the diagonal where predicted equals actual, the better the predictions.
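A fitted line through the points shows the trend, but it does not show where perfect predictions would fall. One common variation, sketched here as an option rather than as part of the original plot, is to draw the identity line `y = x` with `geom_abline()`. The snippet builds its own data with a simple random split (an assumption so that it runs without `caret`), and `plot_df` and `p` are illustrative names.

```r
library(ggplot2)

# Assumption: simple random 80/20 split in base R so the snippet is self-contained
set.seed(123)
train_rows <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
fit <- lm(mpg ~ ., data = mtcars[train_rows, ])
test <- mtcars[-train_rows, ]
plot_df <- data.frame(Actual = test$mpg, Predicted = predict(fit, newdata = test))

p <- ggplot(plot_df, aes(x = Actual, y = Predicted)) +
  geom_point() +
  # Dashed identity line: a point on it was predicted exactly
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  # Same scale on both axes so the identity line sits at 45 degrees
  coord_equal() +
  labs(
    x = "Actual MPG",
    y = "Predicted MPG",
    title = "Actual vs. Predicted MPG with identity line"
  ) +
  theme_minimal()
p
```

Points above the dashed line are over-predictions and points below it are under-predictions.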

Here is how to create the same plot in base R:

```
# Combine actual and predicted values
plot_data <- data.frame(Actual = testing_data$mpg, Predicted = predictions)

# Create a scatter plot
plot(plot_data$Actual, plot_data$Predicted,
     xlab = "Actual MPG", ylab = "Predicted MPG",
     main = "Actual vs. Predicted MPG",
     pch = 19, col = "blue")

# Add a regression line
abline(lm(Predicted ~ Actual, data = plot_data), col = "red")
```
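A closely related view is to plot the prediction errors themselves. The sketch below is an optional extension in base R, not part of the original walkthrough: it plots the residuals (actual minus predicted) against the predicted values, again using a simple random split so it runs on its own. `resid_test` and the other names are illustrative.

```r
# Assumption: simple random 80/20 split so the snippet runs without caret
set.seed(123)
train_rows <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
fit <- lm(mpg ~ ., data = mtcars[train_rows, ])
test <- mtcars[-train_rows, ]
preds <- predict(fit, newdata = test)

# Residual = actual - predicted; positive means the model under-predicted
resid_test <- test$mpg - preds

plot(preds, resid_test,
     xlab = "Predicted MPG", ylab = "Residual (Actual - Predicted)",
     main = "Residuals vs. Predicted MPG",
     pch = 19, col = "blue")

# Reference line at zero error
abline(h = 0, col = "red", lty = 2)
```

Residuals scattered evenly around the zero line suggest no systematic over- or under-prediction, though seven test points are too few to draw firm conclusions.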

# Conclusion

Congratulations! You’ve successfully learned how to plot predicted values in R using the mtcars dataset. Visualization is a vital part of data analysis, and it can provide valuable insights into the performance of your predictive models.

I encourage you to try this on your own datasets and explore more advanced visualization techniques. Experiment with different models and datasets to gain a deeper understanding of data visualization in R. Happy coding!