library(healthyR.ts)
#> Registered S3 method overwritten by 'tune':
#>   method                   from
#>   required_pkgs.model_spec parsnip
#> == Welcome to healthyR.ts ======================================================
#> If you find this package useful, please leave a star: https://github.com/spsanderson/healthyR.ts
#> If you encounter a bug or want to request an enhancement please file an issue at:
#>    https://github.com/spsanderson/healthyR.ts/issues
#> Thank you for using healthyR.ts!

# Introduction

In this vignette we will discuss how to use the tidy_fft function, what it does, and what it produces.

# The Function

The tidy_fft function has only a few parameters, six to be exact, and several of them have sensible defaults. When you use this function, it is important to supply a full time series data set with no missing data, as gaps will affect your results.
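Since missing periods affect the result, it can help to check the series before calling the function. The following is a minimal base R sketch, not part of healthyR.ts: the helper name is_complete_monthly and the toy data are hypothetical, and it assumes a monthly series stamped on the first day of each month.

```r
# Hypothetical helper (not part of healthyR.ts): checks that a monthly series
# has exactly one row per month between its first and last date and no NA values.
is_complete_monthly <- function(dates, values) {
  dates <- as.Date(dates)
  expected <- seq(min(dates), max(dates), by = "month")
  no_gaps <- length(dates) == length(expected) && all(sort(dates) == expected)
  no_gaps && !anyNA(values)
}

# Toy data: the 12 months of 2015
dates  <- seq(as.Date("2015-01-01"), as.Date("2015-12-01"), by = "month")
values <- c(10, 12, 11, 13, 12, 14, 15, 13, 12, 11, 10, 12)

is_complete_monthly(dates, values)                  # complete series
is_complete_monthly(dates[-6], values[-6])          # June is missing
is_complete_monthly(dates, replace(values, 3, NA))  # one value is NA
```

A check like this only flags the problem; how to fill or drop the gaps is a separate modelling decision.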

## Function and Parameters

The function and its full parameters are as follows:

tidy_fft(
  .data,
  .date_col,
  .value_col,
  .frequency = 12L,
  .harmonics = 1L,
  .upsampling = 10L
)

The .data argument is the formatted time series data that gets passed to the function. The .date_col argument is the column that holds the datetime of interest. The .value_col argument is the column that holds the value being analyzed; this can be counts, averages, or any other type of value in the time series. The .frequency argument describes the cyclical nature of the data, for example 12 for monthly or 7 for weekly. The .harmonics argument tells the function how many times the fft should be run internally and how many filters should be made. Finally, the .upsampling argument tells the function how much to upsample the time parameter.
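To build some intuition for what a harmonic filter and an upsampled time index look like, here is a minimal base R sketch. It is an illustration under stated assumptions, not the internals of tidy_fft: the helper keep_harmonics and the synthetic signal are hypothetical, and the layout of the upsampled time grid (evenly spaced points starting at 1) is an assumption.

```r
# Base R sketch of harmonic filtering; an illustration, not tidy_fft's internals.
# Hypothetical helper: keep the mean plus the first k harmonics of a series.
keep_harmonics <- function(x, k) {
  n   <- length(x)
  dff <- fft(x)
  filtered <- complex(length.out = n)          # all-zero complex vector
  idx <- 1                                     # index 1 holds the sum of the series
  if (k > 0) idx <- c(idx, 1 + seq_len(k), n + 1 - seq_len(k))
  filtered[idx] <- dff[idx]                    # keep low frequencies and their conjugates
  Re(fft(filtered, inverse = TRUE)) / n        # base fft() is unnormalized
}

# Synthetic monthly signal: level 10 plus one yearly cycle over 5 years
month_idx <- 0:59
x <- 10 + 3 * cos(2 * pi * month_idx / 12)

flat  <- keep_harmonics(x, 0)   # only the mean survives
exact <- keep_harmonics(x, 5)   # the yearly cycle is the 5th harmonic of 60 points

# Assumed upsampled time grid: 5 points per observation, starting at 1
up_time <- seq(1, by = 1 / 5, length.out = 60 * 5)
head(up_time, 3)
```

In this sketch a larger k lets the reconstruction follow the data more closely, while a larger upsampling factor only makes the time grid finer.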

Let us now work through a simple example.

# Example

## Data

Let's get started with some data.

suppressPackageStartupMessages(library(healthyR.data))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(timetk))

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == 'I') %>%
  summarise_by_time(
    .date_var = visit_end_date_time,
    .by = "month",
    value = n()
  ) %>%
  filter_by_time(
    .date_var = visit_end_date_time,
    .start_date = "2015",
    .end_date = "2019"
  ) %>%
  rename(date_col = visit_end_date_time)

Now that we have our sample data, let’s check it out.

glimpse(data_tbl)
#> Rows: 60
#> Columns: 2
#> $ date_col <dttm> 2015-01-01, 2015-02-01, 2015-03-01, 2015-04-01, 2015-05-01, ~
#> $ value    <int> 1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981,~

## Plot Data

Let's take a look at a time series plot of the data.

suppressPackageStartupMessages(library(timetk))

data_tbl %>%
  plot_time_series(
    .date_var = date_col,
    .value    = value
  )

Now that we know what our data looks like, let's run it through the function and assign the result to a variable called output.

## Run Function

output <- tidy_fft(
  .data = data_tbl,
  .date_col = date_col,
  .value_col = value,
  .harmonics = 8,
  .frequency = 12,
  .upsampling = 5
)

Now that we have run the function, let’s take a look at the output.

# Output

The function invisibly returns a list object, hence the need to assign it to a variable. The returned list has four sections:

• data
• plots
• parameters
• model

## Output Data

In this section we will go over all of the data components that are returned. We can access all of the data in the usual format output$data, which itself returns another list of objects, seven to be specific. Let's go through them.

### data

The data element, accessed by output$data$data, is the original data with a few elements added to it. Let's take a look:

output$data$data %>%
  glimpse()
#> Rows: 2,400
#> Columns: 6
#> $ harmonic   <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
#> $ time       <dbl> 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2,~
#> $ y_actual   <int> 1172, NA, NA, NA, NA, 966, NA, NA, NA, NA, 961, NA, NA, NA,~
#> $ y_hat      <dbl> 978.4624, 979.1071, 979.7605, 980.4221, 981.0918, 981.7692,~
#> $ x          <dbl> 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0,~
#> $ error_term <dbl> 193.537557, NA, NA, NA, NA, -15.769221, NA, NA, NA, NA, -24~

### error_data

The error_data element, accessed by output$data$error_data, is a tibble that has the original data plus a few other elements and an error term, which is the actual value minus the harmonic output. This is done for each harmonic level.

output$data$error_data %>%
  glimpse()
#> Rows: 480
#> Columns: 6
#> $ harmonic   <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
#> $ time       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ~
#> $ y_actual   <int> 1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 98~
#> $ y_hat      <dbl> 978.4624, 981.7692, 985.2620, 988.9026, 992.6511, 996.4664,~
#> $ x          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ~
#> $ error_term <dbl> 193.5375572, -15.7692207, -24.2620436, 17.0973566, -1.65113~

### input_vector

The input_vector is just the value column that was passed to the function.

output$data$input_vector
#>  [1] 1172  966  961 1006  991 1073 1143 1130 1061 1101  981 1069 1065  980 1115
#> [16]  997 1083 1032  962  993  921  911  928 1030 1072  938 1077  961 1041 1060
#> [31] 1018  988 1007 1009  979 1023 1145  985 1015 1016 1040 1117 1057 1040  829
#> [46] 1027  949  916 1009  918  961  908  961  904  913  862  849  913  860  887

### maximum_harmonic_tbl

The maximum_harmonic_tbl is a tibble that has data for the maximum harmonic entered into the function; this will be the most flexible data returned.

output$data$maximum_harmonic_tbl %>%
  glimpse()
#> Rows: 300
#> Columns: 6
#> $ harmonic   <fct> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,~
#> $ time       <dbl> 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2,~
#> $ y_actual   <int> 1172, NA, NA, NA, NA, 966, NA, NA, NA, NA, 961, NA, NA, NA,~
#> $ y_hat      <dbl> 987.7486, 990.7363, 993.1633, 995.0839, 996.5661, 997.6894,~
#> $ x          <dbl> 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0,~
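The error_term column described under error_data can be checked by hand. The sketch below copies the first five y_actual and y_hat values of harmonic 1 from the error_data glimpse and recomputes the difference in base R; because the printed y_hat values are rounded, the results only agree with the printed error_term to about three decimal places.

```r
# Values copied from the error_data glimpse (first five rows, harmonic 1)
y_actual <- c(1172, 966, 961, 1006, 991)
y_hat    <- c(978.4624, 981.7692, 985.2620, 988.9026, 992.6511)

# error_term is described as the actual value minus the harmonic output
error_term <- y_actual - y_hat
round(error_term, 4)

# error_term values as printed in the glimpse, for comparison
printed <- c(193.5375572, -15.7692207, -24.2620436, 17.0973566, -1.65113)
max(abs(error_term - printed))   # small, limited by the rounding of y_hat
```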

### dff_tbl

The dff_tbl is a tibble that holds the fft values: the complex transform along with its real and imaginary parts.
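A table of this shape can be sketched with base R's fft(). This is an assumption about the general approach rather than the package's actual code; the object name dff_sketch is hypothetical, the values are copied from the input_vector shown earlier, and the column names simply mirror those of dff_tbl.

```r
# Monthly counts copied from the input_vector shown in this vignette
x <- c(
  1172,  966,  961, 1006,  991, 1073, 1143, 1130, 1061, 1101,  981, 1069,
  1065,  980, 1115,  997, 1083, 1032,  962,  993,  921,  911,  928, 1030,
  1072,  938, 1077,  961, 1041, 1060, 1018,  988, 1007, 1009,  979, 1023,
  1145,  985, 1015, 1016, 1040, 1117, 1057, 1040,  829, 1027,  949,  916,
  1009,  918,  961,  908,  961,  904,  913,  862,  849,  913,  860,  887
)

dff <- fft(x)                      # base R fast Fourier transform
dff_sketch <- data.frame(          # hypothetical stand-in for dff_tbl
  dff_trans = dff,
  real_part = Re(dff),
  imag_part = Im(dff)
)

head(dff_sketch, 3)
all.equal(Re(dff[1]), sum(x))      # the first coefficient is the sum of the series
```

The first coefficient carries the overall level of the series (its sum), while the remaining coefficients describe the cyclical components.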

output$data$dff_tbl %>%
  glimpse()
#> Rows: 60
#> Columns: 3
#> $ dff_trans <cpl> 59925.00000+0.00000i, -608.62672-917.15896i, -187.91767-1762~
#> $ real_part <dbl> 59925.000000, -608.626716, -187.917671, -267.179120, -94.543~
#> $ imag_part <dbl> 0.000000, -917.158962, -1762.682114, -519.897564, 66.663422,~

### ts_obj

The last piece of the data section is the ts_obj. This is a ts version of the input_vector.

output$data$ts_obj
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 2015 1172  966  961 1006  991 1073 1143 1130 1061 1101  981 1069
#> 2016 1065  980 1115  997 1083 1032  962  993  921  911  928 1030
#> 2017 1072  938 1077  961 1041 1060 1018  988 1007 1009  979 1023
#> 2018 1145  985 1015 1016 1040 1117 1057 1040  829 1027  949  916
#> 2019 1009  918  961  908  961  904  913  862  849  913  860  887

## Output Plots

There are a total of five plots that are returned in the list. Three of them are ggplot plots and two of them are plotly::ggplotly plots.

### harmonic_plot

The harmonic_plot is a ggplot plot that shows all of the harmonic waves on the same graph if you set .harmonics greater than 1.

output$plots$harmonic_plot

### diff_plot

The diff_plot is a ggplot plot of the lag 1 differenced_value_tbl.

output$plots$diff_plot

### max_har_plot

The max_har_plot is a ggplot plot of the maximum harmonic wave entered into .harmonics.

output$plots$max_har_plot

### harmonic_plotly

The harmonic_plotly is a plotly::ggplotly plot of the harmonic_plot.

output$plots$harmonic_plotly

### max_har_plotly

The max_har_plotly is a plotly::ggplotly plot of the max_har_plot.

output$plots$max_har_plotly

## Output Parameters

### parameters

The parameters element is a list of input parameters and internal parameters.

output$parameters
#> $harmonics
#> [1] 5
#>
#> $upsampling
#> [1] 8
#>
#> $start_date
#> [1] "2015-01-01 UTC"
#>
#> $end_date
#> [1] "2019-12-01 UTC"
#>
#> $freq
#> [1] 12

## Output Model

The model portion has four pieces to it, which we will look at below.

### m

The parameter m is an internal parameter that is equal to .frequency / 2 (here 12 / 2 = 6). This is fed into TSA::harmonic along with the ts_obj.

The parameter harmonic_obj is the object returned from TSA::harmonic.

The parameter harmonic_model is the linear model fitted with stats::lm to the ts_obj using the TSA::harmonic terms.

The parameter model_summary is a summary of the harmonic model.

output$model$m
#> [1] 6

output$model$harmonic_obj %>%
  head()
#>        cos(2*pi*t) cos(4*pi*t)   cos(6*pi*t) cos(8*pi*t)  cos(10*pi*t)
#> [1,]  1.000000e+00         1.0  1.000000e+00         1.0  1.000000e+00
#> [2,]  8.660254e-01         0.5  3.419656e-13        -0.5 -8.660254e-01
#> [3,]  5.000000e-01        -0.5 -1.000000e+00        -0.5  5.000000e-01
#> [4,]  1.216555e-12        -1.0 -5.468654e-12         1.0  4.263785e-12
#> [5,] -5.000000e-01        -0.5  1.000000e+00        -0.5 -5.000000e-01
#> [6,] -8.660254e-01         0.5  3.319385e-12        -0.5  8.660254e-01
#>      cos(12*pi*t)   sin(2*pi*t)   sin(4*pi*t)   sin(6*pi*t)   sin(8*pi*t)
#> [1,]            1 -4.722001e-13 -9.444002e-13 -5.054579e-12 -1.888800e-12
#> [2,]           -1  5.000000e-01  8.660254e-01  1.000000e+00  8.660254e-01
#> [3,]            1  8.660254e-01  8.660254e-01 -4.370648e-12 -8.660254e-01
#> [4,]           -1  1.000000e+00  2.433110e-12 -1.000000e+00 -4.866219e-12
#> [5,]            1  8.660254e-01 -8.660254e-01 -7.560404e-13  8.660254e-01
#> [6,]           -1  5.000000e-01 -8.660254e-01  1.000000e+00 -8.660254e-01
#>       sin(10*pi*t)
#> [1,]  1.276978e-12
#> [2,]  5.000000e-01
#> [3,] -8.660254e-01
#> [4,]  1.000000e+00
#> [5,] -8.660254e-01
#> [6,]  5.000000e-01

output$model$harmonic_model
#>
#> Call:
#> stats::lm(formula = ts_obj ~ har_)
#>
#> Coefficients:
#>      (Intercept)   har_cos(2*pi*t)   har_cos(4*pi*t)   har_cos(6*pi*t)
#>          998.750            -1.008            28.600            10.900
#>  har_cos(8*pi*t)  har_cos(10*pi*t)  har_cos(12*pi*t)   har_sin(2*pi*t)
#>           21.500            27.108             6.750            23.582
#>  har_sin(4*pi*t)   har_sin(6*pi*t)   har_sin(8*pi*t)  har_sin(10*pi*t)
#>           -9.469             3.600            -8.487           -27.282

output$model$model_summary
#>
#> Call:
#> stats::lm(formula = ts_obj ~ har_)
#>
#> Residuals:
#>    Min     1Q Median     3Q    Max
#> -140.6  -58.0    9.1   38.7  127.6
#>
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)
#> (Intercept)       998.750      9.358 106.732   <2e-16 ***
#> har_cos(2*pi*t)    -1.008     13.234  -0.076   0.9396
#> har_cos(4*pi*t)    28.600     13.234   2.161   0.0357 *
#> har_cos(6*pi*t)    10.900     13.234   0.824   0.4142
#> har_cos(8*pi*t)    21.500     13.234   1.625   0.1108
#> har_cos(10*pi*t)   27.108     13.234   2.048   0.0460 *
#> har_cos(12*pi*t)    6.750      9.358   0.721   0.4742
#> har_sin(2*pi*t)    23.582     13.234   1.782   0.0811 .
#> har_sin(4*pi*t)    -9.469     13.234  -0.715   0.4778
#> har_sin(6*pi*t)     3.600     13.234   0.272   0.7868
#> har_sin(8*pi*t)    -8.487     13.234  -0.641   0.5244
#> har_sin(10*pi*t)  -27.282     13.234  -2.062   0.0447 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 72.48 on 48 degrees of freedom
#> Multiple R-squared:  0.3057, Adjusted R-squared:  0.1466
#> F-statistic: 1.921 on 11 and 48 DF,  p-value: 0.05984
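
The structure of this model can be approximated in base R without TSA. The sketch below is a hedged reconstruction, not the package's code: it assumes the harmonic terms are plain cos(2*pi*k*t) and sin(2*pi*k*t) columns evaluated at time(ts_obj), as the printed column names suggest, and it rebuilds the series from the values shown in input_vector.

```r
# Monthly counts copied from the input_vector shown in this vignette
x <- c(
  1172,  966,  961, 1006,  991, 1073, 1143, 1130, 1061, 1101,  981, 1069,
  1065,  980, 1115,  997, 1083, 1032,  962,  993,  921,  911,  928, 1030,
  1072,  938, 1077,  961, 1041, 1060, 1018,  988, 1007, 1009,  979, 1023,
  1145,  985, 1015, 1016, 1040, 1117, 1057, 1040,  829, 1027,  949,  916,
  1009,  918,  961,  908,  961,  904,  913,  862,  849,  913,  860,  887
)
ts_obj <- ts(x, start = c(2015, 1), frequency = 12)
tt <- as.numeric(time(ts_obj))      # time in years: 2015, 2015.083, ...

# Six cosine and five sine terms, matching the printed column names;
# sin(12*pi*t) is left out because it is numerically zero at monthly time points.
cos_terms <- sapply(1:6, function(k) cos(2 * pi * k * tt))
sin_terms <- sapply(1:5, function(k) sin(2 * pi * k * tt))
har_ <- cbind(cos_terms, sin_terms)
colnames(har_) <- c(paste0("cos(", 2 * 1:6, "*pi*t)"),
                    paste0("sin(", 2 * 1:5, "*pi*t)"))

fit <- lm(x ~ har_)
coef(fit)[["(Intercept)"]]   # matches mean(x), since each regressor sums to ~0
df.residual(fit)             # 60 observations minus 12 estimated coefficients
```

Because every harmonic column sums to roughly zero over five complete years, the intercept is simply the mean of the series, which lines up with the intercept of 998.750 reported in the summary.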