Off to CRAN! {tidyAML}

Categories: code, rtip, tidyaml, tidymodels
Author

Steven P. Sanderson II, MPH

Published

February 13, 2023

Introduction

Are you tired of spending hours tuning and testing different machine learning models for your regression or classification problems? The new R package {tidyAML} is here to simplify the process. It is a simple interface for automatic machine learning that fits into the tidymodels framework.

The tidyAML package is designed to provide a simple API that automates the machine learning pipeline, from data preparation through model selection, training, and prediction. The goal is that you no longer have to spend hours setting up and testing different models by hand; tidyAML handles the repetitive work for you.

In this initial release (version 0.0.1), tidyAML introduces a number of new features and minor fixes to improve the overall user experience. Here are some of the updates in this release:

New Features:

  • make_regression_base_tbl() and make_classification_base_tbl() functions for creating base tables for regression and classification problems, respectively.
  • internal_make_spec_tbl() function for making the specification table for the machine learning pipeline.
  • internal_set_args_to_tune() function for setting arguments to tune the models. This is not yet fully functional, but it is included in this initial release to gather feedback.
  • create_workflow_set() function for creating a set of workflows to test different models.
  • get_model(), extract_model_spec(), extract_wflw(), extract_wflw_fit(), and extract_wflw_pred() functions for extracting different parts of the machine learning pipeline.
  • match_args() function for matching arguments between the base and specification tables.

Minor Fixes and Improvements:

  • Updates to fast_classification_parsnip_spec_tbl() and fast_regression_parsnip_spec_tbl() to use the make_regression_base_tbl() and make_classification_base_tbl() functions and the internal_make_spec_tbl() function.
  • Addition of a class for the base table functions and using that class in internal_make_spec_tbl().
  • Update to the DESCRIPTION file to require R >= 3.4.0.

In short, tidyAML aims to take the manual work out of the machine learning pipeline. It provides a simple API that reduces the need to set up and test different models by hand, and the updates in this initial release should make your machine learning work easier and more efficient.

Functions

There are too many functions to cover in this post, so you can find them all in the package's function reference.

Examples

Even though there are too many functions to cover them all, a small example can showcase several of them. So let’s get at it!

library(tidyAML)
library(recipes)
library(dplyr)

rec_obj <- recipe(mpg ~ ., data = mtcars)

frt_tbl <- fast_regression(
  .data = mtcars, 
  .rec_obj = rec_obj, 
  .parsnip_eng = c("lm","glm"),
  .parsnip_fns = "linear_reg"
  )

glimpse(frt_tbl)
Rows: 2
Columns: 8
$ .model_id       <int> 1, 2
$ .parsnip_engine <chr> "lm", "glm"
$ .parsnip_mode   <chr> "regression", "regression"
$ .parsnip_fns    <chr> "linear_reg", "linear_reg"
$ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw       <list> [<tbl_df[24 x 1]>], [<tbl_df[24 x 1]>]
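
A side note on the output above: mtcars has 32 rows, yet each prediction tibble has 24. One plausible explanation, which is an assumption on my part and not something stated in this post, is that the data is split 75/25 before fitting. The sketch below uses rsample directly (not tidyAML) to show that such a split of mtcars leaves 24 training rows; the seed is arbitrary.

```r
library(rsample)

set.seed(123)  # arbitrary seed, only so this sketch is reproducible

# Assumption: a 75/25 split, which is rsample's default proportion
split_obj <- initial_split(mtcars, prop = 0.75)

n_train <- nrow(training(split_obj))
n_test  <- nrow(testing(split_obj))

# Row counts for the full data, the training set, and the test set
c(total = nrow(mtcars), train = n_train, test = n_test)
```

If that assumption holds, the 24-row predictions would correspond to the training portion of the split.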

Now let’s go through the extractors.

The get_model() function.

get_model(frt_tbl, 2) |>
  glimpse()
Rows: 1
Columns: 8
$ .model_id       <int> 2
$ .parsnip_engine <chr> "glm"
$ .parsnip_mode   <chr> "regression"
$ .parsnip_fns    <chr> "linear_reg"
$ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, glm, TRUE…
$ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw       <list> [<tbl_df[24 x 1]>]
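
Conceptually, picking a model by its id behaves like a row filter on the .model_id column. The sketch below is not the package's implementation; it uses a toy table (toy_tbl and wanted_ids are hypothetical names) with only the identifying columns from the output above, and plain dplyr.

```r
library(dplyr)

# Toy stand-in for the model table; only the identifying columns are kept
toy_tbl <- tibble(
  .model_id       = 1:2,
  .parsnip_engine = c("lm", "glm"),
  .parsnip_mode   = "regression",
  .parsnip_fns    = "linear_reg"
)

# Hypothetical equivalent of get_model(toy_tbl, 2): keep rows with a matching id
wanted_ids <- 2
picked <- filter(toy_tbl, .model_id %in% wanted_ids)

picked
```

Using %in% rather than == means the same pattern would also work for several ids at once.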

The extract_model_spec() function.

extract_model_spec(frt_tbl, 1)
[[1]]
Linear Regression Model Specification (regression)

Computational engine: lm 

Or do multiples:

extract_model_spec(frt_tbl, 1:2)
[[1]]
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
Linear Regression Model Specification (regression)

Computational engine: glm 

The extract_wflw() function.

extract_wflw(frt_tbl, 1)
[[1]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 

Or do multiples:

extract_wflw(frt_tbl, c(1, 2))
[[1]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: glm 

The extract_wflw_fit() function.

extract_wflw_fit(frt_tbl, 1)
[[1]]
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────

Call:
stats::lm(formula = ..y ~ ., data = data)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt  
   28.21291     -1.60712      0.03458     -0.02189      0.56925     -5.69276  
       qsec           vs           am         gear         carb  
    0.69956      0.39398      1.50212     -0.35338      0.48289  

Or do multiples:

extract_wflw_fit(frt_tbl, 1:2)
[[1]]
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────

Call:
stats::lm(formula = ..y ~ ., data = data)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt  
   28.21291     -1.60712      0.03458     -0.02189      0.56925     -5.69276  
       qsec           vs           am         gear         carb  
    0.69956      0.39398      1.50212     -0.35338      0.48289  


[[2]]
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────

Call:  stats::glm(formula = ..y ~ ., family = stats::gaussian, data = data)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt  
   28.21291     -1.60712      0.03458     -0.02189      0.56925     -5.69276  
       qsec           vs           am         gear         carb  
    0.69956      0.39398      1.50212     -0.35338      0.48289  

Degrees of Freedom: 23 Total (i.e. Null);  13 Residual
Null Deviance:      935.1 
Residual Deviance: 121.5    AIC: 131
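
The lm and glm workflows above report the same coefficients. That is expected: a GLM with a Gaussian family and identity link is ordinary least squares. The base R check below is independent of tidyAML and fits on the full mtcars data, so its numbers will differ from the 24-row fits shown above; it only illustrates the equivalence.

```r
# Fit the same formula with both engines on the full mtcars data
fit_lm  <- lm(mpg ~ ., data = mtcars)
fit_glm <- glm(mpg ~ ., data = mtcars, family = gaussian())

# Coefficients agree up to numerical tolerance
coef_match <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
coef_match

# For a Gaussian GLM the residual deviance equals lm's residual sum of squares
dev_match <- isTRUE(all.equal(deviance(fit_lm), deviance(fit_glm)))
dev_match
```

Both checks should come back TRUE, which is why the two engines also produce the same predictions further down.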

Finally, the extract_wflw_pred() function.

extract_wflw_pred(frt_tbl, 2)
[[1]]
# A tibble: 24 × 1
   .pred
   <dbl>
 1  24.8
 2  26.5
 3  18.5
 4  13.9
 5  24.6
 6  29.1
 7  14.0
 8  17.9
 9  10.0
10  23.4
# … with 14 more rows

Or do multiples:

extract_wflw_pred(frt_tbl, 1:2)
[[1]]
# A tibble: 24 × 1
   .pred
   <dbl>
 1  24.8
 2  26.5
 3  18.5
 4  13.9
 5  24.6
 6  29.1
 7  14.0
 8  17.9
 9  10.0
10  23.4
# … with 14 more rows

[[2]]
# A tibble: 24 × 1
   .pred
   <dbl>
 1  24.8
 2  26.5
 3  18.5
 4  13.9
 5  24.6
 6  29.1
 7  14.0
 8  17.9
 9  10.0
10  23.4
# … with 14 more rows
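
Since the result is a plain list of tibbles, it can be stacked into one long table for comparison. The sketch below does not call tidyAML; pred_list is a hypothetical stand-in that reuses only the first three predictions printed above, and the stacking is done with dplyr::bind_rows().

```r
library(dplyr)

# Toy stand-in for the returned list: one tibble of predictions per model
# (first three values copied from the printed output above)
pred_list <- list(
  tibble(.pred = c(24.8, 26.5, 18.5)),
  tibble(.pred = c(24.8, 26.5, 18.5))
)

# Stack the tibbles; .id records each element's position in the list
pred_long <- bind_rows(pred_list, .id = ".model_id")

pred_long
```

Because the list is unnamed, the id column holds the positions "1" and "2" as character values.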

Voilà!