Fitting a Distribution to Data in R

rtip
distribution
Author

Steven P. Sanderson II, MPH

Published

November 2, 2023

Introduction

The gamma distribution is a continuous probability distribution that is often used to model waiting times or other positively skewed data. It is a two-parameter distribution, where the shape parameter controls the skewness of the distribution and the scale parameter controls the spread of the distribution.
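To make the roles of the two parameters concrete, here is a minimal base R sketch. The helper name gamma_moments() is made up for illustration and is not part of any package. It uses the standard formulas for a gamma distribution with shape k and scale theta: mean k * theta, variance k * theta^2, and skewness 2 / sqrt(k).

```r
# Theoretical moments of a gamma distribution with a given shape and scale
# (gamma_moments is a hypothetical helper written for this illustration)
gamma_moments <- function(shape, scale) {
  c(mean     = shape * scale,
    variance = shape * scale^2,
    skewness = 2 / sqrt(shape))
}

# Same scale, larger shape: the skewness falls as the shape grows
gamma_moments(shape = 1, scale = 0.3)
gamma_moments(shape = 4, scale = 0.3)

# Same shape, larger scale: the spread grows, the skewness is unchanged
gamma_moments(shape = 1, scale = 0.6)

# Compare the theoretical values with a large simulated sample
set.seed(123)
x <- rgamma(1e5, shape = 1, scale = 0.3)
c(mean = mean(x), variance = var(x))
```

The skewness depends only on the shape, while the scale stretches the whole distribution without changing its form.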

Fitting a gamma distribution to a dataset in R

There are two main ways to fit a gamma distribution to a dataset in R:

  1. Maximum likelihood estimation (MLE): This method chooses the parameter values under which the observed data are most probable, that is, the values that maximize the likelihood function.
  2. Method of moments: This method estimates the parameters of the gamma distribution by equating the sample mean and variance to the theoretical mean and variance of the gamma distribution.
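Both approaches can be sketched in base R without any extra packages. This is only an illustration under a few assumptions: the data are simulated here with rgamma() rather than taken from the post, neg_loglik() is a hypothetical helper name, and the general-purpose optim() function stands in for whatever optimizer a dedicated fitting package uses internally.

```r
set.seed(123)
x <- rgamma(500, shape = 1, scale = 0.3)  # simulated data for illustration

# Method of moments: solve mean = shape * scale and var = shape * scale^2
# (var() uses the n - 1 denominator, which matters little at this sample size)
mom_scale <- var(x) / mean(x)
mom_shape <- mean(x)^2 / var(x)

# Maximum likelihood: minimise the negative log-likelihood.
# The parameters are optimised on the log scale so they stay positive.
neg_loglik <- function(log_par, data) {
  -sum(dgamma(data, shape = exp(log_par[1]), scale = exp(log_par[2]), log = TRUE))
}
opt <- optim(par = log(c(mom_shape, mom_scale)), fn = neg_loglik, data = x)
mle_shape <- exp(opt$par[1])
mle_scale <- exp(opt$par[2])

# Side-by-side comparison of the two sets of estimates
rbind(
  moments = c(shape = mom_shape, scale = mom_scale),
  mle     = c(shape = mle_shape, scale = mle_scale)
)
```

Starting the optimizer from the method-of-moments estimates is a common convenience, since they are cheap to compute and usually land close to the maximum likelihood solution.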

MLE is the more common and generally more reliable method of fitting a gamma distribution to a dataset. To fit a gamma distribution to a dataset using MLE, we can use the fitdist() function from the fitdistrplus package.

# Install the packages if necessary
# install.packages(c("fitdistrplus", "TidyDensity"))

# Load the packages
library(fitdistrplus)
library(TidyDensity)

# Simulate 500 gamma draws with tidy_gamma() and keep the simulated values
set.seed(123)
data <- tidy_gamma(.n = 500)$y

# Fit a gamma distribution to the data by maximum likelihood
fit <- fitdist(data, distr = "gamma", method = "mle")

The fit object contains the estimated parameters of the gamma distribution, as well as other information about the fit. We can access the estimated parameters using the coef() function. Note that the tidy_gamma() function from the TidyDensity package defaults to .shape = 1 and .scale = 0.3. Since the rate is 1/.scale, the true rate of the simulated data is about 3.33.

# Get the estimated parameters of the gamma distribution
coef(fit)
   shape     rate 
1.031833 3.594773 

Now let’s see how that compares to the estimate from the built-in TidyDensity function:

util_gamma_param_estimate(data)$parameter_tbl[1,c("shape","scale","shape_ratio")]
# A tibble: 1 × 3
  shape scale shape_ratio
  <dbl> <dbl>       <dbl>
1 0.983 0.292        3.36

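In the above, the printed values are consistent with shape_ratio being the shape divided by the scale. Because the estimated shape is close to 1, it is also close to the rate (1/scale, about 3.42 here), so it can be compared with the rate of 3.59 from fitdist() and the true rate of about 3.33.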
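A quick arithmetic check on the rounded figures printed above helps relate the columns to the rate. This sketch assumes that shape_ratio is computed as shape divided by scale, which is inferred from the printed numbers and not confirmed from the package source.

```r
# Rounded values copied from the tibble printed above
shape <- 0.983
scale <- 0.292

rate_from_scale  <- 1 / scale      # rate as the reciprocal of the scale
shape_over_scale <- shape / scale  # assumed definition of shape_ratio

round(c(rate = rate_from_scale, shape_over_scale = shape_over_scale), 2)
```

The two quantities coincide exactly only when the shape is 1. With a shape estimate just under 1, they differ slightly.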

Try on your own!

I encourage you to try fitting a gamma distribution to your own data. You can use the fitdistrplus package in R to fit a gamma distribution to any dataset of positive values. Once you have fitted a gamma distribution to your data, you can use the estimated parameters to generate random samples from the fitted distribution or to calculate the probability of observing a value in a particular range.
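As a starting point, here is a minimal sketch of putting the estimates to use with base R's gamma functions. The parameter values are copied from the coef(fit) output earlier in the post. The variable names and the cut-off values 0.5 and 1 are arbitrary choices for illustration.

```r
# Estimates reported by coef(fit) earlier in the post
shape_hat <- 1.031833
rate_hat  <- 3.594773

# Draw new random samples from the fitted distribution
set.seed(123)
new_samples <- rgamma(1000, shape = shape_hat, rate = rate_hat)

# Probability of observing a value of at most 0.5
p_below <- pgamma(0.5, shape = shape_hat, rate = rate_hat)

# Probability of observing a value above 1
p_above <- pgamma(1, shape = shape_hat, rate = rate_hat, lower.tail = FALSE)

# Median of the fitted distribution
q_median <- qgamma(0.5, shape = shape_hat, rate = rate_hat)

c(p_below = p_below, p_above = p_above, median = q_median)
```

Because the gamma distribution is continuous, the probability of any single exact value is zero, so probabilities are computed for ranges with pgamma(), while dgamma() gives the density at a point.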