Mixture Distributions with {TidyDensity}

Categories: code, rtip, tidydensity, mixturemodels

Author: Steven P. Sanderson II, MPH

Published: December 13, 2022

Introduction

A mixture distribution is a probability distribution created by combining two or more simpler distributions. This allows us to model complex data that may have multiple underlying patterns. For example, heights measured in a population that pools two groups with different average heights can be bimodal, and a mixture of two normal distributions describes such data far better than any single normal distribution.

To create a mixture distribution, we first need to specify the individual distributions that will be combined, as well as the weights that determine how much each distribution contributes to the overall mixture. The weights are non-negative and sum to 1. Once we have these components, the density of the mixture at any value (or its probability, for discrete components) is the weighted sum of the component densities at that value.
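To make the weighted sum concrete, here is a minimal base-R sketch of a two-component normal mixture, f(x) = w1 * f1(x) + w2 * f2(x). The helper names `dmix()` and `rmix()` and the example weights 0.7 and 0.3 are hypothetical choices for illustration; they are not part of {TidyDensity}. `rmix()` samples by first picking a component with probability equal to its weight and then drawing from that component.

```r
# Hypothetical helpers (not part of {TidyDensity}): a normal mixture with
# explicit weights w, component means and component standard deviations.
dmix <- function(x, w, means, sds) {
  stopifnot(length(w) == length(means), length(means) == length(sds),
            all(w >= 0), isTRUE(all.equal(sum(w), 1)))
  # f(x) = sum over k of w_k * f_k(x)
  total <- numeric(length(x))
  for (k in seq_along(w)) {
    total <- total + w[k] * dnorm(x, mean = means[k], sd = sds[k])
  }
  total
}

rmix <- function(n, w, means, sds) {
  # Pick a component for each draw with probability w_k, then sample from it
  comp <- sample(seq_along(w), size = n, replace = TRUE, prob = w)
  rnorm(n, mean = means[comp], sd = sds[comp])
}

w <- c(0.7, 0.3); means <- c(0, 5); sds <- c(1, 1)

# The weighted sum is itself a density: it integrates to (numerically) 1
integrate(dmix, lower = -10, upper = 15, w = w, means = means, sds = sds)$value

# The sample mean should be close to sum(w * means) = 1.5
set.seed(2022)
mean(rmix(10000, w, means, sds))
```

Because the weights sum to 1 and each component integrates to 1, the mixture integrates to 1 as well, which is what the `integrate()` call checks numerically.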

Mixture distributions are useful in a variety of applications, such as data analysis and machine learning. In data analysis, they can model data that is not well described by a single distribution, for example data with several peaks. In machine learning, Gaussian mixture models are widely used for clustering and density estimation. Overall, mixture distributions are a powerful tool for understanding and working with complex data.

Function

Let’s take a look at a function in {TidyDensity} that allows us to do this. At the moment, the function does not take a weights argument.

tidy_mixture_density(...)

Now let’s take a look at the arguments that get supplied to the ... parameter.

  • ... - The random data you want to mix: either a call that returns a numeric vector, for example rnorm(50, 0, 1), or a {TidyDensity} distribution function such as tidy_normal(.mean = 5, .sd = 1)
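Since there is no weights argument, one workaround is to control each component's share through its sample size. This rests on an assumption: the example output below has 100 + 50 = 150 rows, which suggests the inputs are pooled, so each input's effective weight is its share of the draws. The helper `mixture_sizes()` is hypothetical; it splits a total sample size across components in proportion to the desired weights using largest-remainder rounding.

```r
# Hypothetical helper: split a total sample size n across components in
# proportion to the desired weights w (largest-remainder rounding).
mixture_sizes <- function(n, w) {
  stopifnot(all(w >= 0), sum(w) > 0)
  w <- w / sum(w)
  sizes <- floor(n * w)
  leftover <- n - sum(sizes)
  if (leftover > 0) {
    # Give the leftover draws to the components with the largest remainders
    idx <- order(n * w - sizes, decreasing = TRUE)[seq_len(leftover)]
    sizes[idx] <- sizes[idx] + 1
  }
  sizes
}

sizes <- mixture_sizes(150, c(0.7, 0.3))
sizes  # 105 45

# The pooled sample then carries (approximately) weights 0.7 and 0.3.
# Assumption: tidy_normal()'s .n argument sets the number of draws.
# output <- tidy_mixture_density(
#   rnorm(sizes[1], 0, 1),
#   tidy_normal(.n = sizes[2], .mean = 5, .sd = 1)
# )
```

The largest-remainder step guarantees the sizes always add up to `n`, even when `n * w` is not a whole number.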

Example

Let’s take a look at an example.

library(TidyDensity)

output <- tidy_mixture_density(
  rnorm(100, 0, 1), 
  tidy_normal(.mean = 5, .sd = 1)
)

As you can see, you can pass either a call that returns a numeric vector, such as rnorm(), or a {TidyDensity} distribution function.
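To see roughly what the mixture looks like without the package, here is a base-R approximation of the example. It assumes the two inputs are simply pooled (consistent with the 150-row `dist_tbl` in the output) and that the density table is a kernel density estimate evaluated on 150 grid points; I have not confirmed that `tidy_mixture_density()` calls `stats::density()` this way, so treat the numbers as illustrative only. Because the inputs have 100 and 50 draws, the implied weights are 2/3 and 1/3.

```r
set.seed(123)
a <- rnorm(100, 0, 1)   # stands in for rnorm(100, 0, 1)
b <- rnorm(50, 5, 1)    # stands in for the draws from tidy_normal(.mean = 5, .sd = 1)
pooled <- c(a, b)       # 150 values, matching the 150 rows of dist_tbl

# With no weights argument, each input's share is its share of the draws
implied_w <- c(length(a), length(b)) / length(pooled)   # 2/3 and 1/3
implied_mean <- sum(implied_w * c(0, 5))                # 5/3

# A kernel density estimate on a grid with one point per observation
kde <- density(pooled, n = length(pooled))
dens_tbl <- data.frame(x = kde$x, y = kde$y)
head(dens_tbl)
```

The estimated density has a tall peak near 0 (weight 2/3) and a lower one near 5 (weight 1/3), which is the bimodal shape a single normal distribution cannot capture.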

Let’s take a look at the outputs.

output$data
$dist_tbl
# A tibble: 150 × 2
       x       y
   <int>   <dbl>
 1     1  0.442 
 2     2  1.80  
 3     3  0.571 
 4     4 -0.0365
 5     5  0.854 
 6     6 -0.634 
 7     7 -0.189 
 8     8 -0.415 
 9     9  1.36  
10    10  0.107 
# … with 140 more rows

$dens_tbl
# A tibble: 150 × 2
       x         y
   <dbl>     <dbl>
 1 -4.17 0.0000995
 2 -4.08 0.000145 
 3 -3.98 0.000207 
 4 -3.89 0.000294 
 5 -3.79 0.000413 
 6 -3.70 0.000574 
 7 -3.60 0.000788 
 8 -3.51 0.00107  
 9 -3.41 0.00145  
10 -3.32 0.00193  
# … with 140 more rows

$input_data
$input_data$`rnorm(100, 0, 1)`
  [1]  0.44169781  1.80418306  0.57133927 -0.03649729  0.85387119 -0.63383074
  [7] -0.18854658 -0.41451222  1.36023418  0.10726858  0.08526992 -0.64879496
 [13]  0.69255412 -0.75735669  0.19705920 -0.17721516 -0.63079170 -1.39983310
 [19]  1.01755199 -0.83631414  0.72912414 -0.14737137  1.27082258 -1.04753889
 [25] -0.16141490  0.22198899  2.83598596 -0.22484669 -0.58487594 -0.62746477
 [31] -0.81873031  1.74559087  1.36529721  1.45023471 -0.06258668  2.14467649
 [37]  0.10043517 -0.67990809  2.85050168 -1.45216256  0.01049808  0.22827703
 [43] -0.51146361  0.43143915 -0.59915348  1.61324991 -0.58580448 -0.46120961
 [49]  0.98191810 -0.31593955  0.86164296  1.18808250  1.09066101  0.39150090
 [55]  0.50730674  1.88640675  1.55522681 -0.65149477 -0.27561149 -0.31867192
 [61]  0.08555271 -1.00047014  1.12127311 -1.23597493  0.96384070  0.99097697
 [67] -0.25932523  0.25407058 -0.35294377 -0.72055148 -0.40429088 -0.08843004
 [73]  0.95498089 -0.68453125  1.67531797 -0.20665261  0.57318766 -0.12758793
 [79] -0.38044927  1.81833828  1.05959931  0.08519174  0.16865694 -0.15828443
 [85]  0.08736815  0.70222886  1.27180668  0.76483122 -0.43573173  0.02909088
 [91] -1.31286933 -0.09244617  0.22188836 -0.88909052  1.22243358  0.48397190
 [97]  0.82291445  0.46595188  0.68619052 -1.65739185

$input_data$`tidy_normal(.mean = 5, .sd = 1)`
# A tibble: 50 × 7
   sim_number     x     y    dx       dy      p     q
   <fct>      <int> <dbl> <dbl>    <dbl>  <dbl> <dbl>
 1 1              1  6.20  1.24 0.000230 0.886   6.20
 2 1              2  3.95  1.40 0.000639 0.146   3.95
 3 1              3  4.59  1.55 0.00158  0.342   4.59
 4 1              4  3.16  1.70 0.00349  0.0326  3.16
 5 1              5  5.03  1.86 0.00689  0.513   5.03
 6 1              6  4.89  2.01 0.0122   0.455   4.89
 7 1              7  5.49  2.16 0.0195   0.687   5.49
 8 1              8  6.78  2.32 0.0283   0.962   6.78
 9 1              9  5.17  2.47 0.0376   0.566   5.17
10 1             10  6.36  2.62 0.0464   0.913   6.36
# … with 40 more rows
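In the tidy_normal() output, the p and q columns appear to be the cumulative probability of each draw y and the quantile recovered from that probability, which is why q repeats y. That reading is my interpretation of the printed values, not something documented here; the base-R check below reproduces it and compares against a few of the printed rows.

```r
set.seed(42)
y <- rnorm(50, mean = 5, sd = 1)
p <- pnorm(y, mean = 5, sd = 1)   # cumulative probability of each draw
q <- qnorm(p, mean = 5, sd = 1)   # quantile of that probability: recovers y
all.equal(q, y)  # TRUE

# Spot-check against a few printed rows of tidy_normal(.mean = 5, .sd = 1) above
printed <- data.frame(y = c(6.20, 3.95, 3.16, 6.78),
                      p = c(0.886, 0.146, 0.0326, 0.962))
max(abs(pnorm(printed$y, mean = 5, sd = 1) - printed$p))  # small; y is rounded in the printout
```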

And now the visuals that come with it.

output$plots
$line_plot
[plot not reproduced]

$dens_plot
[plot not reproduced]
The function also returns the input calls.

output$input_fns
[[1]]
rnorm(100, 0, 1)

[[2]]
tidy_normal(.mean = 5, .sd = 1)
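The entries of $input_fns and the names of $input_data look like the deparsed argument expressions. I have not checked how {TidyDensity} does this internally, but a common base-R idiom for recording what was passed through ... is `substitute(list(...))`. The helper `capture_inputs()` below is hypothetical and only sketches that idiom.

```r
# Hypothetical helper: record both the expressions passed through ...
# and their evaluated values, keyed by the deparsed expression text.
capture_inputs <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated calls
  values <- list(...)                           # evaluated results
  names(values) <- vapply(exprs, function(e) paste(deparse(e), collapse = ""),
                          character(1))
  list(input_fns = exprs, input_data = values)
}

res <- capture_inputs(rnorm(5, 0, 1), runif(3))
names(res$input_data)  # "rnorm(5, 0, 1)" "runif(3)"
res$input_fns[[1]]     # prints the call rnorm(5, 0, 1)
```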

Voila!