library(healthyR.ts)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
#> == Welcome to healthyR.ts ======================================================
#> If you find this package useful, please leave a star: https://github.com/spsanderson/healthyR.ts
#> If you encounter a bug or want to request an enhancement please file an issue at:
#>    https://github.com/spsanderson/healthyR.ts/issues
#> Thank you for using healthyR.ts!

Introduction

In this vignette we will discuss how to use the tidy_fft function, what it does, and what it produces.

The Function

The tidy_fft function takes six parameters, three of which have sensible defaults. Supply it with a complete time series, one with no missing data, because gaps will affect your results.
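Because gaps matter, it can help to check the series before calling the function. The following is a minimal base R sketch, assuming a monthly series. The toy_tbl object and its values are made up for illustration and are not part of healthyR.ts.

```r
# Hypothetical monthly series with April deliberately removed.
all_months <- seq(as.Date("2015-01-01"), as.Date("2015-12-01"), by = "month")
toy_tbl <- data.frame(
  date_col = all_months[-4],
  value    = c(10, 12, 11, 12, 14, 15, 13, 12, 11, 10, 9)
)

# Every month we expect between the first and last observation.
expected_dates <- seq(min(toy_tbl$date_col), max(toy_tbl$date_col), by = "month")

# Months that are absent from the data, and whether any value is NA.
missing_dates <- expected_dates[!expected_dates %in% toy_tbl$date_col]
has_na_values <- anyNA(toy_tbl$value)

print(missing_dates)
print(has_na_values)
```

If missing_dates is not empty or has_na_values is TRUE, the series would need to be padded or imputed before it is passed on.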

Function and Parameters

The function and its full parameters are as follows:

tidy_fft(
  .data,
  .date_col,
  .value_col,
  .frequency = 12L,
  .harmonics = 1L,
  .upsampling = 10L
)

The .data argument is the formatted time series data that gets passed to the function. The .date_col argument is the column that holds the datetime of interest. The .value_col argument is the column that holds the value being analyzed; this can be counts, averages, or any other type of value in the time series. The .frequency argument describes the cyclical nature of the data, for example 12 for monthly or 7 for weekly. The .harmonics argument tells the function how many times the fft should be run internally and how many filters should be made. Finally, the .upsampling argument tells the function how much to upsample the time parameter.

Let us now work through a simple example.

Example

Data

Let's get started with some data.

suppressPackageStartupMessages(library(healthyR.data))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(timetk))

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == 'I') %>%
  summarise_by_time(
    .date_var = visit_end_date_time,
    .by = "month",
    value = n()
  ) %>%
  filter_by_time(
    .date_var = visit_end_date_time,
    .start_date = "2015",
    .end_date = "2019"
  ) %>%
  rename(date_col = visit_end_date_time)

Now that we have our sample data, let’s check it out.

glimpse(data_tbl)
#> Rows: 60
#> Columns: 2
#> $ date_col <dttm> 2015-01-01, 2015-02-01, 2015-03-01, 2015-04-01, 2015-05-01, ~
#> $ value    <int> 1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981,~

Plot Data

Let's take a look at a time series plot of the data.

suppressPackageStartupMessages(library(timetk))

data_tbl %>%
  plot_time_series(
    .date_var = date_col,
    .value    = value
  )

Now that we know what our data looks like, let's run it through the function and assign the result to a variable called output.

Run Function

output <- tidy_fft(
  .data = data_tbl,
  .date_col = date_col,
  .value_col = value,
  .harmonics = 8,
  .frequency = 12,
  .upsampling = 5
)

Now that we have run the function, let’s take a look at the output.

Output

The function invisibly returns a list object, hence the need to assign it to a variable. The list contains four sections:

  • data
  • plots
  • parameters
  • model

Output Data

In this section we will go over all of the data components that are returned. We can access them in the usual way with output$data, which itself returns another list of seven objects. Let's go through them all.

data

The data element accessed by output$data$data is the original data with a few elements added to it. Let’s take a look:

output$data$data %>%
  glimpse()
#> Rows: 2,400
#> Columns: 6
#> $ harmonic   <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
#> $ time       <dbl> 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2,~
#> $ y_actual   <int> 1172, NA, NA, NA, NA, 966, NA, NA, NA, NA, 961, NA, NA, NA,~
#> $ y_hat      <dbl> 978.4624, 979.1071, 979.7605, 980.4221, 981.0918, 981.7692,~
#> $ x          <dbl> 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0,~
#> $ error_term <dbl> 193.537557, NA, NA, NA, NA, -15.769221, NA, NA, NA, NA, -24~
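The time column above steps by 0.2, which is what one would expect from .upsampling = 5. A rough base R illustration of such an index follows. It is a sketch consistent with the printed output (60 months and 2,400 rows across 8 harmonics), not the package's internal code, and the variable names are assumptions.

```r
n_obs      <- 60L  # months in the example data
upsampling <- 5L   # value passed as .upsampling
harmonics  <- 8L   # value passed as .harmonics

# Refined time index: five steps per original observation.
time_up <- seq(from = 1, by = 1 / upsampling, length.out = n_obs * upsampling)

# Actual values exist only at whole-number time points; the rest would be NA.
is_observed <- abs(time_up - round(time_up)) < 1e-8

head(time_up)                # starts 1.0, 1.2, 1.4, ...
length(time_up)              # rows per harmonic
length(time_up) * harmonics  # rows across all harmonics
sum(is_observed)             # original observations
```

This also explains the NA entries in y_actual and error_term: only every fifth row lines up with an observed month.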

error_data

The error_data element accessed by output$data$error_data is a tibble that has the original data, plus a few other elements and an error term that is the actual value minus the harmonic output. This is done for each harmonic level.

output$data$error_data %>%
  glimpse()
#> Rows: 480
#> Columns: 6
#> $ harmonic   <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
#> $ time       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ~
#> $ y_actual   <int> 1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 98~
#> $ y_hat      <dbl> 978.4624, 981.7692, 985.2620, 988.9026, 992.6511, 996.4664,~
#> $ x          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ~
#> $ error_term <dbl> 193.5375572, -15.7692207, -24.2620436, 17.0973566, -1.65113~
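As a quick check of the definition, the error term can be reproduced by hand from the first few y_actual and y_hat values printed above. This is only an arithmetic sketch in base R using numbers copied from that output.

```r
# First five actual and fitted values for harmonic 1, copied from the output above.
y_actual <- c(1172, 966, 961, 1006, 991)
y_hat    <- c(978.4624, 981.7692, 985.2620, 988.9026, 992.6511)

# Error term: actual value minus the harmonic output.
error_term <- y_actual - y_hat
round(error_term, 4)
```

The rounded results agree with the error_term column shown above.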

input_vector

The input_vector is just the value column that was passed to the function.

output$data$input_vector
#>  [1] 1172  966  961 1006  991 1073 1143 1130 1061 1101  981 1069 1065  980 1115
#> [16]  997 1083 1032  962  993  921  911  928 1030 1072  938 1077  961 1041 1060
#> [31] 1018  988 1007 1009  979 1023 1145  985 1015 1016 1040 1117 1057 1040  829
#> [46] 1027  949  916 1009  918  961  908  961  904  913  862  849  913  860  887

maximum_harmonic_tbl

The maximum_harmonic_tbl is a tibble that holds the data for the maximum harmonic entered into the function. Because it uses the highest harmonic level, it will be the most flexible data returned.

output$data$maximum_harmonic_tbl %>%
  glimpse()
#> Rows: 300
#> Columns: 6
#> $ harmonic   <fct> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,~
#> $ time       <dbl> 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2,~
#> $ y_actual   <int> 1172, NA, NA, NA, NA, 966, NA, NA, NA, NA, 961, NA, NA, NA,~
#> $ y_hat      <dbl> 987.7486, 990.7363, 993.1633, 995.0839, 996.5661, 997.6894,~
#> $ x          <dbl> 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0,~
#> $ error_term <dbl> 184.251402, NA, NA, NA, NA, -31.689374, NA, NA, NA, NA, -40~

differenced_value_tbl

The differenced_value_tbl is a tibble that has a lag 1 difference of the value column supplied.

output$data$differenced_value_tbl %>%
  glimpse()
#> Rows: 59
#> Columns: 1
#> $ value <int> -206, -5, 45, -15, 82, 70, -13, -69, 40, -120, 88, -4, -85, 135,~
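A lag 1 difference subtracts each value from the one that follows it, which is why 60 observations yield 59 rows. A minimal base R sketch using the first six input values (this is an equivalent calculation, not necessarily how the package computes it):

```r
# First six monthly values from the example data.
value <- c(1172, 966, 961, 1006, 991, 1073)

# Lag 1 difference: value[t] - value[t - 1].
diff_value <- diff(value, lag = 1)
diff_value
```

The result matches the first five entries of the value column above.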

dff_tbl

The dff_tbl is a tibble that holds the fft values: the complex value along with its real and imaginary parts.

output$data$dff_tbl %>%
  glimpse()
#> Rows: 60
#> Columns: 3
#> $ dff_trans <cpl> 59925.00000+0.00000i, -608.62672-917.15896i, -187.91767-1762~
#> $ real_part <dbl> 59925.000000, -608.626716, -187.917671, -267.179120, -94.543~
#> $ imag_part <dbl> 0.000000, -917.158962, -1762.682114, -519.897564, 66.663422,~
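A table of this shape can be sketched with base R's stats::fft(). The example below uses only the first year of the input vector and a plain data.frame; the column names mirror the output above, but the construction itself is an assumption about how such a table could be built, not the package's code.

```r
# First year of the input vector (12 monthly values).
x <- c(1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981, 1069)

# Discrete Fourier transform of the series.
fft_vals <- stats::fft(x)

# Split each complex coefficient into its real and imaginary parts.
fft_tbl <- data.frame(
  dff_trans = fft_vals,
  real_part = Re(fft_vals),
  imag_part = Im(fft_vals)
)

head(fft_tbl, 3)

# The first coefficient is the sum of the series, with no imaginary part.
all.equal(fft_tbl$real_part[1], sum(x))
```

The same property explains the first row of the full output above: 59925 is the sum of all 60 input values.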

ts_obj

The last piece of the data section is the ts_obj. This is a ts version of the input_vector.

output$data$ts_obj
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 2015 1172  966  961 1006  991 1073 1143 1130 1061 1101  981 1069
#> 2016 1065  980 1115  997 1083 1032  962  993  921  911  928 1030
#> 2017 1072  938 1077  961 1041 1060 1018  988 1007 1009  979 1023
#> 2018 1145  985 1015 1016 1040 1117 1057 1040  829 1027  949  916
#> 2019 1009  918  961  908  961  904  913  862  849  913  860  887
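An object like this can be built in base R with stats::ts(). The sketch below uses the first two years of the input vector and assumes a January 2015 start with a frequency of 12, matching the printed table. It is an equivalent construction, not necessarily the one used inside tidy_fft.

```r
# First 24 monthly values from the input vector.
x <- c(
  1172,  966,  961, 1006,  991, 1073, 1143, 1130, 1061, 1101,  981, 1069,
  1065,  980, 1115,  997, 1083, 1032,  962,  993,  921,  911,  928, 1030
)

# Monthly ts object starting in January 2015.
ts_sketch <- stats::ts(x, start = c(2015, 1), frequency = 12)

ts_sketch
frequency(ts_sketch)
```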

Output Plots

There are a total of five plots that are returned in the list. Three of them are ggplot plots and two of them are plotly::ggplotly plots.

harmonic_plot

The harmonic_plot is a ggplot plot that shows all of the harmonic waves on the same graph when .harmonics is set greater than 1.

output$plots$harmonic_plot

diff_plot

The diff_plot is a ggplot plot of the lag 1 differenced_value_tbl.

output$plots$diff_plot

max_har_plot

The max_har_plot is a ggplot plot of the maximum harmonic wave entered into .harmonics.

output$plots$max_har_plot

harmonic_plotly

The harmonic_plotly is a plotly::ggplotly plot of the harmonic_plot.

output$plots$harmonic_plotly

max_har_plotly

The max_har_plotly is a plotly::ggplotly plot of the max_har_plot.

output$plots$max_har_plotly

Output Parameters

parameters

The parameters element is a list of input parameters and internal parameters.

output$parameters
#> $harmonics
#> [1] 8
#> 
#> $upsampling
#> [1] 5
#> 
#> $start_date
#> [1] "2015-01-01 UTC"
#> 
#> $end_date
#> [1] "2019-12-01 UTC"
#> 
#> $freq
#> [1] 12

Output Model

The model portion has four pieces to it which we will look at below.

m

The parameter m is an internal parameter that is equal to .frequency / 2 (here 12 / 2 = 6). This is fed into TSA::harmonic along with the ts_obj.

The parameter harmonic_obj is the object returned from TSA::harmonic.

The parameter harmonic_model is the linear model, fitted with stats::lm, of the ts_obj on the harmonic terms from TSA::harmonic.

The parameter model_summary is a summary of the harmonic model.

output$model$m
#> [1] 6
output$model$harmonic_obj %>% head()
#>        cos(2*pi*t) cos(4*pi*t)   cos(6*pi*t) cos(8*pi*t)  cos(10*pi*t)
#> [1,]  1.000000e+00         1.0  1.000000e+00         1.0  1.000000e+00
#> [2,]  8.660254e-01         0.5  3.419656e-13        -0.5 -8.660254e-01
#> [3,]  5.000000e-01        -0.5 -1.000000e+00        -0.5  5.000000e-01
#> [4,]  1.216555e-12        -1.0 -5.468654e-12         1.0  4.263785e-12
#> [5,] -5.000000e-01        -0.5  1.000000e+00        -0.5 -5.000000e-01
#> [6,] -8.660254e-01         0.5  3.319385e-12        -0.5  8.660254e-01
#>      cos(12*pi*t)   sin(2*pi*t)   sin(4*pi*t)   sin(6*pi*t)   sin(8*pi*t)
#> [1,]            1 -4.722001e-13 -9.444002e-13 -5.054579e-12 -1.888800e-12
#> [2,]           -1  5.000000e-01  8.660254e-01  1.000000e+00  8.660254e-01
#> [3,]            1  8.660254e-01  8.660254e-01 -4.370648e-12 -8.660254e-01
#> [4,]           -1  1.000000e+00  2.433110e-12 -1.000000e+00 -4.866219e-12
#> [5,]            1  8.660254e-01 -8.660254e-01 -7.560404e-13  8.660254e-01
#> [6,]           -1  5.000000e-01 -8.660254e-01  1.000000e+00 -8.660254e-01
#>       sin(10*pi*t)
#> [1,]  1.276978e-12
#> [2,]  5.000000e-01
#> [3,] -8.660254e-01
#> [4,]  1.000000e+00
#> [5,] -8.660254e-01
#> [6,]  5.000000e-01
output$model$harmonic_model
#> 
#> Call:
#> stats::lm(formula = ts_obj ~ har_)
#> 
#> Coefficients:
#>      (Intercept)   har_cos(2*pi*t)   har_cos(4*pi*t)   har_cos(6*pi*t)  
#>          998.750            -1.008            28.600            10.900  
#>  har_cos(8*pi*t)  har_cos(10*pi*t)  har_cos(12*pi*t)   har_sin(2*pi*t)  
#>           21.500            27.108             6.750            23.582  
#>  har_sin(4*pi*t)   har_sin(6*pi*t)   har_sin(8*pi*t)  har_sin(10*pi*t)  
#>           -9.469             3.600            -8.487           -27.282
output$model$model_summary
#> 
#> Call:
#> stats::lm(formula = ts_obj ~ har_)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -140.6  -58.0    9.1   38.7  127.6 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)       998.750      9.358 106.732   <2e-16 ***
#> har_cos(2*pi*t)    -1.008     13.234  -0.076   0.9396    
#> har_cos(4*pi*t)    28.600     13.234   2.161   0.0357 *  
#> har_cos(6*pi*t)    10.900     13.234   0.824   0.4142    
#> har_cos(8*pi*t)    21.500     13.234   1.625   0.1108    
#> har_cos(10*pi*t)   27.108     13.234   2.048   0.0460 *  
#> har_cos(12*pi*t)    6.750      9.358   0.721   0.4742    
#> har_sin(2*pi*t)    23.582     13.234   1.782   0.0811 .  
#> har_sin(4*pi*t)    -9.469     13.234  -0.715   0.4778    
#> har_sin(6*pi*t)     3.600     13.234   0.272   0.7868    
#> har_sin(8*pi*t)    -8.487     13.234  -0.641   0.5244    
#> har_sin(10*pi*t)  -27.282     13.234  -2.062   0.0447 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 72.48 on 48 degrees of freedom
#> Multiple R-squared:  0.3057, Adjusted R-squared:  0.1466 
#> F-statistic: 1.921 on 11 and 48 DF,  p-value: 0.05984
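To see what the harmonic regression is doing, the same kind of model can be sketched in base R without TSA. The cosine and sine columns below are built by hand as a stand-in for TSA::harmonic; the names tt, cos_terms, sin_terms, and fit are illustrative assumptions, and the data are the 60 input values shown earlier.

```r
# The 60 monthly values from input_vector.
x <- c(
  1172,  966,  961, 1006,  991, 1073, 1143, 1130, 1061, 1101,  981, 1069,
  1065,  980, 1115,  997, 1083, 1032,  962,  993,  921,  911,  928, 1030,
  1072,  938, 1077,  961, 1041, 1060, 1018,  988, 1007, 1009,  979, 1023,
  1145,  985, 1015, 1016, 1040, 1117, 1057, 1040,  829, 1027,  949,  916,
  1009,  918,  961,  908,  961,  904,  913,  862,  849,  913,  860,  887
)
ts_obj <- stats::ts(x, start = c(2015, 1), frequency = 12)

# Number of harmonic pairs: half the frequency.
m  <- frequency(ts_obj) / 2
tt <- as.numeric(stats::time(ts_obj))

# Hand-built harmonic terms. The sine term for k = m is dropped because
# sin(12 * pi * t) is zero at every monthly time point.
cos_terms <- sapply(seq_len(m),     function(k) cos(2 * pi * k * tt))
sin_terms <- sapply(seq_len(m - 1), function(k) sin(2 * pi * k * tt))
colnames(cos_terms) <- paste0("cos", seq_len(m))
colnames(sin_terms) <- paste0("sin", seq_len(m - 1))
har_ <- cbind(cos_terms, sin_terms)

# Ordinary least squares fit of the series on the harmonic terms.
fit <- stats::lm(x ~ har_)
length(coef(fit))   # intercept plus 11 harmonic terms
df.residual(fit)    # 60 observations minus 12 coefficients
coef(fit)[1]        # intercept, equal to the mean of the series
```

With five complete years of data the harmonic terms each sum to zero, so the intercept equals the series mean of 998.75, in line with the summary above.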