Introduction

Cumulative mean is a statistical measure that calculates the mean of a set of numbers up to a certain point in time or after a certain number of observations. It is also known as a running average or moving average.

Cumulative mean can be useful in a variety of contexts. For example:

Tracking progress: Cumulative mean can be used to track progress over time. For instance, a teacher might use it to track the average test scores of her students throughout the school year.
Analyzing trends: Cumulative mean can help identify trends in data. For example, a business might use it to track the average revenue generated by a new product over the course of several months.
Smoothing data: Cumulative mean can be used to smooth out fluctuations in data. For instance, a meteorologist might use it to calculate the average temperature over the course of a year, which would help to smooth out the effects of daily temperature fluctuations.

In summary, cumulative mean is a useful statistical measure that can help track progress, analyze trends, and smooth out fluctuations in data.

Function

The function we will review is cmean() from the {TidyDensity} R package. Let’s take a look at it.

cmean()

The only argument is .x which is a numeric vector as this is a vectorized function. Let’s see it in use.

Example

First let’s load in TidyDensity

library(TidyDensity)

Ok now let’s make some data. For this we are going to use the simple rnorm() function.

x <- rnorm(100)

head(x)

[1] -0.8293250 -1.2983499  2.2782337 -0.1521549  0.6859169  0.3809020

Ok, now that we have our vector, let’s run it through the function and see what it outputs and then we will graph it.

cmx <- cmean(x)
head(cmx)

[1] -0.8293249774 -1.0638374319  0.0501862766 -0.0003990095  0.1368641726
[6]  0.1775371452

Now let’s graph it.

plot(cmx, type = "l")

Ok nice, so can we do this on grouped data or lists of data? Of course! First let’s use a for loop to generate a list of rnorm() values.

# Initialize an empty list to store the generated values
my_list <- list()

# Generate values using rnorm(5) in a for loop and store them in the list
for (i in 1:5) {
  my_list[[i]] <- rnorm(100)
}

# Print the generated list
purrr::map(my_list, head)

[[1]]
[1] -0.8054353 -0.4596541 -0.2362475  1.1486398 -0.7242154  0.5184610

[[2]]
[1]  0.3243327  0.7170802 -0.5963424 -1.0307104  0.3388504  0.5717486

[[3]]
[1]  1.7360816 -1.0359467 -0.3206138 -1.2157684 -0.8841356  0.1856481

[[4]]
[1] -1.1401642 -0.4437817 -0.2555245 -0.1809040 -0.2131763 -0.1251750

[[5]]
[1]  0.08835903 -1.79153379 -2.15010900  0.67344844  1.06125849  0.99848796

Now that we have our list object let’s go ahead and plot the values out after we pass the data through cmean().

library(purrr)

my_list |>
  map(\(x) x |> cmean() |> plot(type = "l"))

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

From here I think it is easy to see how one could do this on gruoped data as well with dplyr’s group_by().