Calibrate and Plot a Time Series with {healthyR.ts}

code
rtip
healthyrts
timeseries
Author

Steven P. Sanderson II, MPH

Published

February 22, 2023

Introduction

In time series analysis, it is common to split the data into training and testing sets to evaluate the accuracy of a model. However, it is important to ensure that the model is calibrated on the training set before evaluating its performance on the testing set. The {healthyR.ts} library provides a function called calibrate_and_plot() that simplifies this process.
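To make the train/test idea concrete, here is a minimal base R sketch of a chronological split. It is only an illustration, not how {healthyR.ts} does it internally. The 12-month holdout and the names y, n_test, train, and test are assumptions chosen for this example.

```r
# AirPassengers ships with base R (datasets package): 144 monthly observations
y <- as.numeric(AirPassengers)

# Hold out the final 12 months as the testing set (an illustrative choice)
n_test <- 12
n <- length(y)

# Time series splits keep the time order: earliest points train, latest points test
train <- y[seq_len(n - n_test)]
test  <- y[(n - n_test + 1):n]

length(train)
length(test)
```

Unlike a random split, the testing observations here always come after the training observations, so the model is never evaluated on data that precedes what it was fit on.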

Function

Here is the full function call:

calibrate_and_plot(
  ...,
  .type = "testing",
  .splits_obj,
  .data,
  .print_info = TRUE,
  .interactive = FALSE
)

Here are the function's arguments:

  • ... - The workflow(s) you want to add to the function.
  • .type - Either "training" or "testing", which selects whether training(splits) or testing(splits) is used for calibration. The default is "testing".
  • .splits_obj - The splits object.
  • .data - The full data set.
  • .print_info - The default is TRUE, which prints the calibration accuracy tibble and the resulting plot.
  • .interactive - The default is FALSE. This controls whether the forecast plot is interactive (via plotly) or not.

Example

By default, calibrate_and_plot() prints the calibration accuracy tibble and the resulting plot. This is controlled by the .print_info argument, which defaults to TRUE. The forecast plot is static unless you set the .interactive argument to TRUE, which renders it with plotly.

Here’s an example of how to use the calibrate_and_plot() function:

library(healthyR.ts)
library(dplyr)
library(timetk)
library(parsnip)
library(recipes)
library(workflows)
library(rsample)

# Get the Data
data <- ts_to_tbl(AirPassengers) |>
  select(-index)

# Split the data into training and testing sets
splits <- time_series_split(
   data
  , date_col
  , assess = 12
  , skip = 3
  , cumulative = TRUE
)

# Make the recipe object
rec_obj <- recipe(value ~ ., data = training(splits))

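# Make the Model
# Note: penalty and mixture apply to regularized engines such as glmnet;
# the "lm" engine fits ordinary least squares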
model_spec <- linear_reg(
   mode = "regression"
   , penalty = 0.5
   , mixture = 0.5
) |>
   set_engine("lm")

# Make the workflow object
wflw <- workflow() |>
   add_recipe(rec_obj) |>
   add_model(model_spec) |>
   fit(training(splits))

# Get our output
output <- calibrate_and_plot(
  wflw
  , .type = "training"
  , .splits_obj = splits
  , .data = data
  , .print_info = FALSE
  , .interactive = TRUE
 )

The resulting output is a list that includes the calibration tibble, the model accuracy tibble, and a plotly plot showing the original time series data along with the fitted values for the training set.

Let’s take a look at the output.

output$calibration_tbl
# Modeltime Table
# A tibble: 1 × 5
  .model_id .model     .model_desc .type .calibration_data 
      <int> <list>     <chr>       <chr> <list>            
1         1 <workflow> LM          Test  <tibble [132 × 4]>

output$model_accuracy
# A tibble: 1 × 9
  .model_id .model_desc .type   mae  mape  mase smape  rmse   rsq
      <int> <chr>       <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1         1 LM          Test   31.4  12.0  1.31  11.9  41.7 0.846
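For readers unfamiliar with the columns in the accuracy tibble, the sketch below computes several of them by hand in base R. It uses common textbook definitions, which may differ in detail from what the package computes. The actual and predicted vectors are hypothetical values, not taken from the AirPassengers fit. mase is omitted because it also needs a scaling term from a naive forecast.

```r
# Hypothetical observed values and model predictions (illustrative only)
actual    <- c(100, 200, 300, 400)
predicted <- c(110, 190, 310, 390)

err <- actual - predicted

# Mean absolute error: average size of the errors, in the units of the data
mae <- mean(abs(err))

# Root mean squared error: like mae, but penalizes large errors more
rmse <- sqrt(mean(err^2))

# Mean absolute percentage error: errors relative to the actual values
mape <- mean(abs(err / actual)) * 100

# Symmetric MAPE: errors relative to the average of actual and predicted
smape <- mean(abs(err) / ((abs(actual) + abs(predicted)) / 2)) * 100

# R-squared, taken here as the squared correlation of actual and predicted
rsq <- cor(actual, predicted)^2

round(c(mae = mae, rmse = rmse, mape = mape, smape = smape, rsq = rsq), 3)
```

Lower values are better for mae, mape, smape, and rmse, while rsq closer to 1 indicates that the predictions track the actual values more closely.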

And…

output$plot
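The example above calibrates on the training set. To evaluate on the held-out data instead, the same call can be made with .type = "testing", which is the default shown in the function signature. The following is a condensed sketch of that variant, assuming the same packages are installed. The name output_test is hypothetical, and the model specification is shortened to a plain lm fit for brevity.

```r
library(healthyR.ts)
library(dplyr)
library(timetk)
library(parsnip)
library(recipes)
library(workflows)
library(rsample)

# Same data and split as in the example above
data <- ts_to_tbl(AirPassengers) |>
  select(-index)

splits <- time_series_split(
  data, date_col, assess = 12, skip = 3, cumulative = TRUE
)

# Fit a linear model workflow on the training portion only
wflw <- workflow() |>
  add_recipe(recipe(value ~ ., data = training(splits))) |>
  add_model(linear_reg(mode = "regression") |> set_engine("lm")) |>
  fit(training(splits))

# Calibrate on the 12 held-out months and request a static plot
output_test <- calibrate_and_plot(
  wflw,
  .type = "testing",
  .splits_obj = splits,
  .data = data,
  .print_info = FALSE,
  .interactive = FALSE
)

output_test$model_accuracy
```

Comparing this accuracy tibble with the training one gives a rough sense of how much the fit degrades on data the model has not seen.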

Overall, the calibrate_and_plot() function is a useful tool for simplifying the process of calibrating time series models on a training set and evaluating their performance on a testing set.

Voila!