Simplifying Model Formulas with the R Function ‘reformulate()’

rtip
Author

Steven P. Sanderson II, MPH

Published

June 13, 2023

Introduction

As a programmer, you may come across various scenarios where you need to create complex model formulas in R. However, constructing these formulas can often be challenging and time-consuming. This is where the ‘reformulate()’ function comes to the rescue! In this blog post, we will explore the purpose and usage of the reformulate() function in R, and provide you with simple examples to help you grasp its power.

What is ‘reformulate()’?

The reformulate() function is a handy tool in R that simplifies the creation of model formulas. It allows you to construct formulas by specifying the response variable and predictor variables using a character vector or formula-like syntax. The function then generates a formula object that can be used in various modeling functions within R.

Usage and Syntax: The syntax of the ‘reformulate()’ function is as follows:

reformulate(response, ...)

Here, ‘response’ represents the response variable, and ‘…’ denotes one or more predictor variables. The predictor variables can be specified as separate arguments or as a character vector.

Examples

Example 1: Linear Regression

Let’s say we want to use the mtcars dataset containing information about cars, including their hp and number of cylinders. We want to perform a linear regression to predict the mpg of the car based upon hp and cyl. Here’s how we can use ‘reformulate()’ for this purpose:

library(stats)

# Creating a formula using reformulate()
formula <- reformulate(c("hp", "cyl"), response = "mpg")

# Fitting a linear regression model
model <- lm(formula, data = mtcars)

formula
mpg ~ hp + cyl
model

Call:
lm(formula = formula, data = mtcars)

Coefficients:
(Intercept)           hp          cyl  
   36.90833     -0.01912     -2.26469  

In this example, the ‘reformulate()’ function creates a formula object that specifies the relationship between the response variable “mpg” and the predictor variables “hp” and “cyl”. This formula is then passed to the ‘lm()’ function for fitting a linear regression model.

Example 2: Logistic Regression

Consider a scenario where we use the mtcars dataset. We use the mpg, hp, and disp variables, and whether the car is an automatic or manual. We want to perform a logistic regression to predict the probability of passing based on the mpg, hp, and disp. Here’s how ‘reformulate()’ can help us:

library(stats)

# Creating a formula using reformulate()
formula <- reformulate(c("mpg", "hp", "disp"), response = "am")

# Fitting a logistic regression model
model <- glm(formula, data = mtcars, family = "binomial")

formula
am ~ mpg + hp + disp
model

Call:  glm(formula = formula, family = "binomial", data = mtcars)

Coefficients:
(Intercept)          mpg           hp         disp  
  -33.81283      1.28498      0.14936     -0.06545  

Degrees of Freedom: 31 Total (i.e. Null);  28 Residual
Null Deviance:      43.23 
Residual Deviance: 10.15    AIC: 18.15

In this example, the ‘reformulate()’ function constructs a formula that defines the relationship between the response variable “am” and the predictor variables “mpg”, “hp”, and “disp”. The resulting formula is then passed to the glm() function for fitting a logistic regression model.

Conclusion

The ‘reformulate()’ function simplifies the creation of model formulas in R by allowing you to specify the response and predictor variables concisely. By leveraging this function, you can save time and effort when constructing complex formulas for various modeling tasks. Whether you’re performing linear regression, logistic regression, or other types of analyses, ‘reformulate()’ is a valuable tool in your programming arsenal.

So, the next time you find yourself struggling with model formula creation, remember the power of ‘reformulate()’ and let it handle the complexity for you!