# Introduction to Linear Regression in R: Analyzing the mtcars Dataset with lm()

Author: Steven P. Sanderson II, MPH

Published: June 15, 2023

# Introduction

The `lm()` function in R fits linear regression models. The name stands for “linear model,” and the function lets you quantify the relationship between variables and make predictions from the fitted model.

Let’s look at the two main parameters of the `lm()` function:

1. `formula`: This is the most important parameter, as it specifies the relationship between the variables. It follows the pattern `y ~ x1 + x2 + ...`, where `y` is the response variable and `x1`, `x2`, etc., are the predictor variables. For example, in the `mtcars` dataset, we can use the formula `mpg ~ wt` to predict miles per gallon (`mpg`) from the weight of the car (`wt`, recorded in thousands of pounds).

2. `data`: This parameter is the data frame that contains the variables named in the formula. In our case, we’ll use the `mtcars` dataset that comes with R.
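To see how the two arguments fit together, here is a minimal sketch. It stores the formula in a variable (the names `f` and `fit` are only for illustration) and checks that every variable it mentions is a column of `mtcars`:

```r
# A formula is an ordinary R object that can be stored and inspected
f <- mpg ~ wt          # `f` is an illustrative name, not part of lm()

class(f)               # "formula"
all.vars(f)            # the variable names used in the formula

# lm() looks these names up in the data frame passed as `data`
all(all.vars(f) %in% names(mtcars))

# Passing the stored formula is equivalent to writing it inline
fit <- lm(f, data = mtcars)
coef(fit)
```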

Now, let’s see some examples using the `mtcars` dataset.

# Examples

## Example 1: Simple Linear Regression

```r
# Fit a linear regression model to predict mpg based on weight
model <- lm(mpg ~ wt, data = mtcars)

# Print the summary of the model
summary(model)
```

```
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
```
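Since one goal of fitting the model is prediction, here is a small sketch of using the fitted line. The 3,000 lb car and the names `new_car`, `predicted_mpg`, and `manual_mpg` are hypothetical and chosen only for illustration:

```r
# Refit the simple model from Example 1
model <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(model)

# Predict mpg for a hypothetical car weighing 3,000 lbs
# (wt is recorded in units of 1,000 lbs, so wt = 3)
new_car <- data.frame(wt = 3)
predicted_mpg <- predict(model, newdata = new_car)
predicted_mpg   # about 21.25, i.e. 37.2851 - 5.3445 * 3

# The same value computed by hand from the coefficients
manual_mpg <- coef(model)[["(Intercept)"]] + coef(model)[["wt"]] * 3
all.equal(unname(predicted_mpg), manual_mpg)
```

The hand calculation makes explicit that `predict()` simply evaluates the fitted equation at the new values.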

## Example 2: Multiple Linear Regression

```r
# Fit a linear regression model to predict mpg based on weight and horsepower
model <- lm(mpg ~ wt + hp, data = mtcars)

# Print the summary of the model
summary(model)
```

```
Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max
-3.941 -1.600 -0.182  1.050  5.854

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8148
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
```
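One way to read the `wt` coefficient in a multiple regression is as the expected change in `mpg` for a one-unit change in `wt` with `hp` held fixed. The sketch below checks this with two hypothetical cars (the data frame `two_cars` and its values are made up for illustration):

```r
# Refit the model from Example 2
model <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars with the same horsepower, 1,000 lbs apart in weight
two_cars <- data.frame(wt = c(3, 4), hp = c(110, 110))
preds <- predict(model, newdata = two_cars)

# The gap between the two predictions equals the wt coefficient (about -3.88)
diff(preds)
coef(model)[["wt"]]
```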

## Example 3: Include Interaction Term

```r
# Fit a linear regression model to predict mpg based on weight, horsepower, and their interaction
model <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Print the summary of the model
summary(model)
```

```
Call:
lm(formula = mpg ~ wt + hp + wt:hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.0632 -1.6491 -0.7362  1.4211  4.5513

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
wt          -8.21662    1.26971  -6.471 5.20e-07 ***
hp          -0.12010    0.02470  -4.863 4.04e-05 ***
wt:hp        0.02785    0.00742   3.753 0.000811 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.153 on 28 degrees of freedom
Multiple R-squared:  0.8848,    Adjusted R-squared:  0.8724
F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13
```
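Two details about interaction terms may be worth a quick sketch. First, in R's formula syntax `wt * hp` is shorthand for `wt + hp + wt:hp`. Second, once an interaction is included, the slope for `wt` is no longer a single number but depends on `hp`. The helper `wt_slope_at()` below is a hypothetical function written for this illustration, not part of base R:

```r
# wt * hp expands to wt + hp + wt:hp, so both calls fit the same model
model_long  <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)
model_short <- lm(mpg ~ wt * hp, data = mtcars)
all.equal(coef(model_long), coef(model_short))

# With an interaction, the slope for wt at a given hp is
#   b_wt + b_wt:hp * hp
b <- coef(model_long)
wt_slope_at <- function(hp) b[["wt"]] + b[["wt:hp"]] * hp   # hypothetical helper

# Slope of wt for cars with 100 and 200 horsepower
wt_slope_at(c(100, 200))
```

Because the interaction coefficient is positive here, the negative effect of weight on `mpg` becomes smaller in magnitude as horsepower increases.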

These examples demonstrate how to use the `lm()` function with different sets of predictor variables. After fitting the model, you can use the `summary()` function to get detailed information about the regression results, including coefficients, p-values, and R-squared values.
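If you need those numbers in later code rather than just printed, the object returned by `summary()` can be indexed directly. A minimal sketch, reusing the simple model from Example 1 (the variable name `s` is arbitrary):

```r
# Refit the simple model and keep its summary object
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$coefficients

# p-value for the wt coefficient
s$coefficients["wt", "Pr(>|t|)"]

# R-squared values (about 0.7528 and 0.7446, matching the printed summary)
s$r.squared
s$adj.r.squared
```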

I encourage you to try running these examples and explore different variables in the `mtcars` dataset. Feel free to modify the formulas and experiment with additional parameters to deepen your understanding of linear regression modeling in R!
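As one possible starting point for that experimentation, the sketch below uses the `subset` argument of `lm()` to fit the model on part of the data, and `update()` to add a predictor to an existing fit. The choice of manual-transmission cars and of `qsec` as an extra predictor is arbitrary, and the object names are only illustrative:

```r
# Fit the Example 1 model on manual-transmission cars only
# (in mtcars, am = 1 means manual and am = 0 means automatic)
model_manual <- lm(mpg ~ wt, data = mtcars, subset = am == 1)
nobs(model_manual)   # number of rows actually used in the fit

# update() refits an existing model with a modified formula;
# here qsec (quarter-mile time) is added as a second predictor
model_qsec <- update(model_manual, . ~ . + qsec)
formula(model_qsec)
coef(model_qsec)
```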