Data Preppers with {healthyR.ai}

Categories: code, rtip, healthyrai, preprocessor

Author: Steven P. Sanderson II, MPH

Published: February 24, 2023

Introduction

There are many different methods to choose from when modeling your data. This brings with it a fundamental issue: how to prepare your data for the chosen algorithm. The {healthyR.ai} package has a family of functions that help solve this issue for some algorithms, though of course not all of them; covering every algorithm would be utterly exhausting for me to do on my own.

In healthyR.ai, I call these Data Preppers because they prep the data you supply into the format the algorithm needs in order to function properly.

Let’s take a look at one.

Function

Here we are going to use the hai_c50_data_prepper() function, which has the following signature:

hai_c50_data_prepper(.data, .recipe_formula)

Here are its two simple arguments:

  • .data - The data that you are passing to the function. It can be any type of data that is accepted by the data parameter of the recipes::recipe() function.
  • .recipe_formula - The formula that is going to be passed to the recipe. For example, if you are using the iris data, then the formula would most likely be something like Species ~ .

Example

Here is a small example:

library(healthyR.ai)

hai_c50_data_prepper(.data = Titanic, .recipe_formula = Survived ~ .)

Recipe

Inputs:

      role #variables
   outcome          1
 predictor          4

Operations:

Factor variables from tidyselect::vars_select_helpers$where(is.charac...

rec_obj <- hai_c50_data_prepper(Titanic, Survived ~ .)
get_juiced_data(rec_obj)

# A tibble: 32 × 5
   Class Sex    Age       n Survived
   <fct> <fct>  <fct> <dbl> <fct>   
 1 1st   Male   Child     0 No      
 2 2nd   Male   Child     0 No      
 3 3rd   Male   Child    35 No      
 4 Crew  Male   Child     0 No      
 5 1st   Female Child     0 No      
 6 2nd   Female Child     0 No      
 7 3rd   Female Child    17 No      
 8 Crew  Female Child     0 No      
 9 1st   Male   Adult   118 No      
10 2nd   Male   Adult   154 No      
# … with 22 more rows
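
The Operations line in the printed recipe ("Factor variables from ...") is consistent with what {recipes} prints for a step_string2factor() step, so the prepper appears to build a recipe from the formula and turn character columns into factors. Here is a rough sketch of that idea written with {recipes} directly. It is an assumption based on the printed output, not the package's actual source; the toy data frame and its column names are made up for illustration, and the all_nominal() selector is swapped in for the one shown in the output above.

```r
library(recipes)

# Hypothetical toy data with character columns (not from the article)
toy <- data.frame(
  outcome = c("yes", "no", "yes", "no"),
  group   = c("a", "b", "a", "b"),
  value   = c(1.5, 2.0, 3.5, 4.0),
  stringsAsFactors = FALSE
)

# Build a recipe from a formula, as the prepper's arguments suggest
rec <- recipe(outcome ~ ., data = toy)

# Assumed equivalent of the prepper's single step:
# convert string columns to factors
rec <- step_string2factor(rec, all_nominal())

# Estimate the recipe and pull out the processed training data
prepped <- prep(rec)
baked   <- bake(prepped, new_data = NULL)

str(baked)
```

The last two calls play the role that get_juiced_data() plays in the example above: they return the training data after the recipe's steps have been applied.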

Here are the rest of the data-preppers at the time of writing this article:
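
You can also ask the installed version of the package which data preppers it exports. This is a minimal sketch, assuming every data prepper's name ends in "_data_prepper" the way hai_c50_data_prepper does; the result depends on the package version you have installed.

```r
library(healthyR.ai)

# Assumption: all data preppers share the "_data_prepper" suffix
exported <- getNamespaceExports("healthyR.ai")
preppers <- sort(grep("_data_prepper$", exported, value = TRUE))

print(preppers)
```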

Voila!