```
x <- mtcars$mpg
x
```

```
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
```

code, tidydensity, bootstrap, weeklytip

Author: Steven P. Sanderson II, MPH

Published: November 4, 2022

In the real world, the data set we have is usually a sample, because we typically do not know what the actual population is. This is where **bootstrapping** tends to come into play. It gives us a handle on the plausible parameter values by taking repeated samples of the data that is available to us.

At its core it is a **resampling** method **with replacement** that assigns measures of accuracy to sample estimates. See the Wikipedia article on bootstrapping for more background.
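To make the idea concrete, here is a minimal base-R sketch of bootstrapping the mean of `mpg`. It does not use `{TidyDensity}`; the seed, the number of resamples, and the choice of the mean as the statistic are illustrative assumptions of mine.

```r
# Base-R sketch of the bootstrap idea; this is not TidyDensity code.
set.seed(123)      # arbitrary seed, only so the run is reproducible
x <- mtcars$mpg    # the observed sample (n = 32)

n_sims <- 2000     # number of bootstrap resamples (illustrative choice)

# Each resample draws length(x) values from x WITH replacement,
# then records the statistic of interest (here, the mean).
boot_means <- replicate(
  n_sims,
  mean(sample(x, size = length(x), replace = TRUE))
)

# The spread of the resampled means estimates the standard error of the mean,
# and their quantiles give a simple percentile interval.
boot_se <- sd(boot_means)
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

boot_se
boot_ci
```

The "measure of accuracy" here is `boot_se` together with the percentile interval in `boot_ci`.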

In this post I am going to go over how to use the *bootstrap* function set with `{TidyDensity}`. You can find the `pkgdown` site with all function references here: TidyDensity

The first thing we will need is a dataset, and for this we are going to pick on the `mtcars` dataset and more specifically the `mpg` column. So let’s get to it!

```
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
```

We see that `x` is a numeric vector, which is what we need for these `{TidyDensity}` functions. The first function that `x` will be put through is a function called `tidy_bootstrap()`.

Let’s take a look at the full function call and parameters of this function.
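Here is a sketch of the call with every argument written out. The values for `.num_sims` (2000) and `.proportion` (0.8) are the defaults this post states; `"continuous"` for `.distribution_type` is my assumption, so confirm it with `?tidy_bootstrap`. The `requireNamespace()` guard is only there so the snippet runs cleanly when the package is not installed.

```r
# Assumes the TidyDensity package is installed: install.packages("TidyDensity")
if (requireNamespace("TidyDensity", quietly = TRUE)) {
  library(TidyDensity)

  x <- mtcars$mpg

  # All arguments spelled out. "continuous" is an assumed default;
  # check ?tidy_bootstrap for the installed version.
  boot_default <- tidy_bootstrap(
    .x = x,
    .num_sims = 2000,
    .proportion = 0.8,
    .distribution_type = "continuous"
  )
}
```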

What you see above are the defaults for the function. Now let’s go through the parameters.

- `.x` - This is of course the numeric vector that you are passing to the function; in our case right now, it is `x`, which we set to `mtcars$mpg`.
- `.num_sims` - This is how many simulations you want to run of `x`. This is done **with replacement**, so this dictates how many bootstrap samples of `x` we want to take.
- `.proportion` - How much of the data do you want to sample? The default here is 80%.
- `.distribution_type` - What kind of distribution are you sampling from? Is it a continuous or discrete distribution? This is important for plotting.

The function returns a `tibble` with the bootstrap column as a `list` object. Let’s take a look at `tidy_bootstrap(x)`. We are going to set simulations to 50 instead of the default 2000.
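A call along these lines would produce that kind of output. The seed is an arbitrary choice of mine, and because resampling is random the sampled values will vary from run to run.

```r
# Assumes the TidyDensity package is installed.
if (requireNamespace("TidyDensity", quietly = TRUE)) {
  library(TidyDensity)

  x <- mtcars$mpg

  set.seed(123)  # arbitrary seed; the draws are random
  tb <- tidy_bootstrap(x, .num_sims = 50)  # 50 resamples instead of 2000
  print(tb)
}
```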

```
# A tibble: 50 × 2
   sim_number bootstrap_samples
   <fct>      <list>
 1 1          <dbl [25]>
 2 2          <dbl [25]>
 3 3          <dbl [25]>
 4 4          <dbl [25]>
 5 5          <dbl [25]>
 6 6          <dbl [25]>
 7 7          <dbl [25]>
 8 8          <dbl [25]>
 9 9          <dbl [25]>
10 10         <dbl [25]>
# … with 40 more rows
```

The column `bootstrap_samples` holds the bootstrapped resamples of `x` at the given `.proportion`, in this instance, 80%.

From this point we can go straight to using the `bootstrap_stat_plot()` function if we choose. Under the hood it will make use of `bootstrap_unnest_tbl()`. All this function does is act as a helper to unnest the `bootstrap_samples` column of the tibble returned from `tidy_bootstrap()`.

Let’s take a look below.
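To show what "unnesting" amounts to, here is a base-R sketch of the same reshaping. It is not the package's implementation; the variable names and the use of `floor()` to get 25 draws out of 32 values are my assumptions, chosen to match the `<dbl [25]>` entries shown earlier.

```r
# Base-R sketch of unnesting a list of resamples; not TidyDensity internals.
set.seed(123)                       # arbitrary seed
x <- mtcars$mpg

n_sims <- 50
n_draw <- floor(0.8 * length(x))    # 25 values per resample (assumed rounding)

# A list with one numeric vector per simulation, like a list column.
samples <- replicate(
  n_sims,
  sample(x, size = n_draw, replace = TRUE),
  simplify = FALSE
)

# One row per drawn value, labelled with the simulation it came from.
unnested <- data.frame(
  sim_number = factor(rep(seq_len(n_sims), times = lengths(samples))),
  y = unlist(samples)
)

dim(unnested)   # 50 simulations x 25 draws = 1250 rows, 2 columns
head(unnested)
```

With the package itself, the equivalent step is passing the result of `tidy_bootstrap()` to `bootstrap_unnest_tbl()`.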

```
# A tibble: 1,250 × 2
   sim_number     y
   <fct>      <dbl>
 1 1           22.8
 2 1           15.5
 3 1           21.5
 4 1           15
 5 1           10.4
 6 1           22.8
 7 1           16.4
 8 1           30.4
 9 1           26
10 1           19.2
# … with 1,240 more rows
```

Now let’s get into the `bootstrap_stat_plot()` function of `{TidyDensity}`.

The function `bootstrap_stat_plot()` was designed to handle data either from the `tidy_bootstrap()` or `bootstrap_unnest_tbl()` functions only. This was to ensure that the right type of data was being passed in and to ensure that the right type of output was guaranteed.

Let’s take a full look at the function call.
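Rather than retype the signature from memory, one way to see the arguments and defaults for whichever version you have installed is to print them with `args()`. This sketch assumes `{TidyDensity}` is installed; the guard simply skips the call otherwise.

```r
# Assumes the TidyDensity package is installed.
if (requireNamespace("TidyDensity", quietly = TRUE)) {
  # Print the argument list, with defaults, of the installed version.
  print(args(TidyDensity::bootstrap_stat_plot))

  # The argument names on their own.
  plot_args <- names(formals(TidyDensity::bootstrap_stat_plot))
  plot_args
}
```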

There are a few interesting parameters here, but like before we will go through all of them.

- `.data` - This is the data that gets passed from either `tidy_bootstrap()` or `bootstrap_unnest_tbl()`.
- `.value` - This is the column from `bootstrap_unnest_tbl()` that you want to visualize; this is typically `y`.
- `.stat` - There are multiple cumulative stats that will work with this plot. These are all built directly into the `{TidyDensity}` package. You can find the supported ones at the reference page.
- `.show_groups` - Do you want to show all of the simulation groups? TRUE/FALSE.
- `.show_ci_labels` - If set to TRUE, the confidence interval labels will be shown on the graph as the final value.
- `.interactive` - Do you want a `plotly` plot? Who doesn’t?
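As a quick illustration of what a cumulative stat is, here is a base-R sketch of a running mean over the first five draws of simulation 1 shown earlier. I am assuming "cumulative" means the statistic is recomputed as each additional draw is included; this is not the package's own code.

```r
# Base-R sketch of a cumulative (running) mean; not TidyDensity internals.
y <- c(22.8, 15.5, 21.5, 15.0, 10.4)  # first five draws of simulation 1

# Running total divided by the number of draws seen so far.
cumulative_mean <- cumsum(y) / seq_along(y)
cumulative_mean
```

The last element is just the ordinary mean of all five values, so each curve settles toward its resample's overall statistic as more draws are included.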

Now let’s walk through a few examples.
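A first example might look like the sketch below, which plots the cumulative mean. It assumes `{TidyDensity}` is installed and that `"cmean"` is an accepted value for `.stat`; the seed and the 50 simulations are illustrative choices.

```r
# Assumes the TidyDensity package is installed.
if (requireNamespace("TidyDensity", quietly = TRUE)) {
  library(TidyDensity)

  x <- mtcars$mpg

  set.seed(123)  # arbitrary seed
  tb <- tidy_bootstrap(x, .num_sims = 50)

  # .value is the unnested column (typically y); "cmean" is assumed
  # to be one of the supported cumulative stats.
  p_mean <- bootstrap_stat_plot(tb, .value = y, .stat = "cmean")
  print(p_mean)
}
```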

You can see from this output that the statistic you choose is printed in the chart title and on the y axis. The caption will also tell you how many simulations are present. Let’s look at skewness as another example.
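A skewness version could be sketched as follows. I am assuming the stat is named `"cskewness"`, so check the reference page for the exact spelling; turning on `.show_groups` here is my own choice to show the individual simulation lines.

```r
# Assumes the TidyDensity package is installed.
if (requireNamespace("TidyDensity", quietly = TRUE)) {
  library(TidyDensity)

  x <- mtcars$mpg

  set.seed(123)  # arbitrary seed
  tb <- tidy_bootstrap(x, .num_sims = 50)

  # "cskewness" is an assumed stat name; .show_groups = TRUE draws
  # every simulation group on the plot.
  p_skew <- bootstrap_stat_plot(
    tb,
    .value = y,
    .stat = "cskewness",
    .show_groups = TRUE
  )
  print(p_skew)
}
```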

Voilà!