# Fitting a Distribution to Data in R

rtip
distribution
Author

Steven P. Sanderson II, MPH

Published

November 2, 2023

# Introduction

The gamma distribution is a continuous probability distribution that is often used to model waiting times or other positively skewed data. It is a two-parameter distribution, where the shape parameter controls the skewness of the distribution and the scale parameter controls the spread of the distribution.

# Fitting a gamma distribution to a dataset in R

There are two main ways to fit a gamma distribution to a dataset in R:

1. Maximum likelihood estimation (MLE): This method estimates the parameters of the gamma distribution that are most likely to have produced the observed data.
2. Method of moments: This method estimates the parameters of the gamma distribution by equating the sample mean and variance to the theoretical mean and variance of the gamma distribution.

MLE is the more common and generally more reliable method of fitting a gamma distribution to a dataset. To fit a gamma distribution to a dataset using MLE, we can use the `fitdist()` function from the `fitdistrplus` package.

``````# Install the fitdistrplus package if necessary
#install.packages("fitdistrplus")

library(fitdistrplus)
library(TidyDensity)

set.seed(123)
data <- tidy_gamma(.n = 500)\$y

# Fit a gamma distribution to the data
fit <- fitdist(data, distr = "gamma", method = "mle")``````

The `fit` object contains the estimated parameters of the gamma distribution, as well as other information about the fit. We can access the estimated parameters using the `coef()` function. Now the `tidy_gamma()` function from the TidyDensity package comes with a default setting of a `.scale = 0.3` and `shape = 1`. The rate is `1/.scale`, so by default it is 3.33333

``````# Get the estimated parameters of the gamma distribution
coef(fit)``````
``````   shape     rate
1.031833 3.594773 ``````

Now let’s see how that compares to the built in TidyDensity function:

``util_gamma_param_estimate(data)\$parameter_tbl[1,c("shape","scale","shape_ratio")]``
``````# A tibble: 1 × 3
shape scale shape_ratio
<dbl> <dbl>       <dbl>
1 0.983 0.292        3.36``````

In the above, the `shape_ratio` is the `rate`

I encourage you to try fitting a gamma distribution to your own data. You can use the `fitdistrplus` package in R to fit a gamma distribution to any dataset. Once you have fitted a gamma distribution to your data, you can use the estimated parameters to generate random samples from the gamma distribution or to calculate the probability of observing a particular value.