Using .drop_na in Fast Classification and Regression

code
rtip
tidyaml
Author

Steven P. Sanderson II, MPH

Published

January 17, 2024

Introduction

In the newest release of tidyAML there has been an addition of a new parameter to the functions fast_classification() and fast_regression(). The parameter is .drop_na and it is a logical value that defaults to TRUE. This parameter is used to determine if the function should drop rows with missing values from the output if a model cannot be built for some reason. Let’s take a look at the function and it’s arguments.

fast_regression(
  .data,
  .rec_obj,
  .parsnip_fns = "all",
  .parsnip_eng = "all",
  .split_type = "initial_split",
  .split_args = NULL,
  .drop_na = TRUE
)

Arguments

.data - The data being passed to the function for the regression problem .rec_obj - The recipe object being passed. .parsnip_fns - The default is ‘all’ which will create all possible regression model specifications supported. .parsnip_eng - The default is ‘all’ which will create all possible regression model specifications supported. .split_type - The default is ‘initial_split’, you can pass any type of split supported by rsample .split_args - The default is NULL, when NULL then the default parameters of the split type will be executed for the rsample split type. .drop_na - The default is TRUE, which will drop all NA’s from the data.

Now let’s see this in action.

Example

We are going to use the mtcars dataset for this example. We will create a regression problem where we are trying to predict mpg using all other variables in the dataset. We will not load in all the libraries that are supported causing the function to return NULL for some models and we will set the parameter .drop_na to FALSE.

library(tidyAML)
library(tidymodels)
library(tidyverse)

tidymodels::tidymodels_prefer()

# Create regression problem
rec_obj <- recipe(mpg ~ ., data = mtcars)
frt_tbl <- fast_regression(
  mtcars,
  rec_obj,
  .parsnip_eng = c("lm","glm","gee"),
  .parsnip_fns = "linear_reg",
  .drop_na = FALSE
  )

glimpse(frt_tbl)
Rows: 3
Columns: 8
$ .model_id       <int> 1, 2, 3
$ .parsnip_engine <chr> "lm", "gee", "glm"
$ .parsnip_mode   <chr> "regression", "regression", "regression"
$ .parsnip_fns    <chr> "linear_reg", "linear_reg", "linear_reg"
$ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw       <list> [<tbl_df[64 x 3]>], <NULL>, [<tbl_df[64 x 3]>]
extract_wflw(frt_tbl, 1:nrow(frt_tbl))
[[1]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
NULL

[[3]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: glm 

Here we can see that the function returned NULL for the gee model because we did not load in the multilevelmod library. We can also see that the function did not drop that model from the output because .drop_na was set to FALSE. Now let’s set it back to TRUE.

frt_tbl <- fast_regression(
  mtcars,
  rec_obj,
  .parsnip_eng = c("lm","glm","gee"),
  .parsnip_fns = "linear_reg",
  .drop_na = TRUE
  )

glimpse(frt_tbl)
Rows: 2
Columns: 8
$ .model_id       <int> 1, 3
$ .parsnip_engine <chr> "lm", "glm"
$ .parsnip_mode   <chr> "regression", "regression"
$ .parsnip_fns    <chr> "linear_reg", "linear_reg"
$ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw       <list> [<tbl_df[64 x 3]>], [<tbl_df[64 x 3]>]
extract_wflw(frt_tbl, 1:nrow(frt_tbl))
[[1]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: glm 

Here we can see that the gee model was dropped from the output because the function could not build the model due to the multilevelmod library not being loaded. This is a great way to drop models that cannot be built due to missing libraries or other reasons.

Conclusion

The .drop_na parameter is a great way to drop models that cannot be built due to missing libraries or other reasons. This is a great addition to the fast_classification() and fast_regression() functions.

Happy coding!