TidyDensity Update

TidyDensity 1.5.2 boosts R workflows with faster quantile_normalize() and flexible tidy_mixture_density() combinations. Upgrade and explore now!
code
rtip
Author

Steven P. Sanderson II, MPH

Published

September 8, 2025

Keywords

Programming, TidyDensity, quantile_normalize(), tidy_mixture_density(), statistical distribution R, R mixture models, quantile normalization algorithm, mixture density combinations, R statistical package update, efficient quantile normalization, mixture model combination types, how to use tidy_mixture_density with combination types in R, performance improvements in quantile_normalize function TidyDensity, breaking changes in TidyDensity 1.5.2 quantile_normalize output, additive and multiplicative mixture models in TidyDensity R package, memory efficient quantile normalization for large datasets in R

Key Update: TidyDensity 1.5.2 delivers significant performance improvements and enhanced mixture modeling capabilities, but introduces breaking changes.

TidyDensity 1.5.2 has arrived with substantial improvements for R programmers who work with statistical distributions. Released on September 6, 2025, this update brings a redesigned quantile_normalize() function and new mixture modeling capabilities through an enhanced tidy_mixture_density(). While these changes offer compelling performance benefits, they also introduce breaking changes that require careful consideration for existing workflows.


What is TidyDensity?

TidyDensity is an R package designed to simplify the generation, analysis, and visualization of random numbers from statistical distributions within the tidyverse ecosystem. The package provides a consistent, tidy interface for working with distributional data, making it invaluable for simulation studies, statistical modeling, and exploratory data analysis.

Core capabilities include:

  • Generating tidy random samples from numerous distributions
  • Creating and analyzing mixture models
  • Performing quantile normalization for cross-sample comparability
  • Seamless integration with tidyverse workflows

Breaking Changes: The New quantile_normalize()

Performance Revolution

The most significant change in TidyDensity 1.5.2 is the complete redesign of the quantile_normalize() function. This algorithmic overhaul delivers substantial performance improvements.

Technical Implementation

The new algorithm leverages vectorized operations and indexing techniques, moving away from the classical approach that relied on memory-intensive intermediate storage. The redesign focuses on:

  • Reduced redundant sorting operations
  • In-place memory operations where possible
  • Optimized index mapping for restoring original order
  • Enhanced algorithmic efficiency for large datasets
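
To make the steps above concrete, here is a minimal base R sketch of quantile normalization that sorts each column once and uses rank indices to restore the original order. It is an illustration only, not TidyDensity's internal code: the function name quantile_normalize_sketch() is hypothetical, and the sketch assumes a numeric matrix with no missing values.

```r
# Base R sketch of quantile normalization (NOT TidyDensity's implementation).
# Assumes a numeric matrix (or data frame) without NA values.
quantile_normalize_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)

  # Sort each column once; matrix() keeps the shape when n == 1.
  sorted <- matrix(apply(x, 2, sort), nrow = n)

  # The reference distribution is the mean of each sorted row.
  reference <- rowMeans(sorted)

  # Rank indices map every element back to its original position.
  # ties.method = "first" yields a permutation of 1..n per column.
  ranks <- matrix(apply(x, 2, rank, ties.method = "first"), nrow = n)

  matrix(reference[ranks], nrow = n, ncol = ncol(x), dimnames = dimnames(x))
}

set.seed(42)
m  <- matrix(rnorm(20), ncol = 4)
qn <- quantile_normalize_sketch(m)

# Every column is now a permutation of the same reference values.
apply(qn, 2, sort)
```

The only sorting work is one sort and one rank per column; the final assignment is a single vectorized index lookup.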

Why Breaking Changes Occurred

The algorithmic improvements come with a trade-off: the output is not identical to earlier versions. The statistical properties are unchanged (same quantiles, same normalization effect), but the exact element-wise values may differ between versions. The biggest difference is the return value: the function now returns only the normalized data. The old version also returned the input data and intermediate information such as the rows with duplicate ranks.

library(TidyDensity)
# Example: Both versions produce identical quantiles
data <- matrix(rnorm(20), ncol = 4)
normalized_data <- quantile_normalize(data)
print(normalized_data)
           [,1]       [,2]       [,3]       [,4]
[1,] -0.9980322  0.8648786  0.8648786  1.2291761
[2,] -0.5656369 -0.5656369  1.2291761  0.8648786
[3,]  1.2291761  0.2846513  0.2846513  0.2846513
[4,]  0.2846513  1.2291761 -0.9980322 -0.9980322
[5,]  0.8648786 -0.9980322 -0.5656369 -0.5656369
# All columns now share identical quantile distributions
# But individual elements may differ slightly from v1.5.1
as.data.frame(normalized_data) |>
  sapply(function(x) quantile(x, probs = seq(0, 1, 1 / 4)))
             V1         V2         V3         V4
0%   -0.9980322 -0.9980322 -0.9980322 -0.9980322
25%  -0.5656369 -0.5656369 -0.5656369 -0.5656369
50%   0.2846513  0.2846513  0.2846513  0.2846513
75%   0.8648786  0.8648786  0.8648786  0.8648786
100%  1.2291761  1.2291761  1.2291761  1.2291761
# Return in tibble format
quantile_normalize(
  data.frame(
    rand_norm_1 = rnorm(30),
    rand_norm_b = rnorm(30)
  ),
  .return_tibble = TRUE
)
# A tibble: 30 × 2
   rand_norm_1 rand_norm_b
         <dbl>       <dbl>
 1     -0.589       1.28  
 2     -1.17        1.04  
 3     -1.85       -0.0239
 4      0.899      -0.232 
 5      0.204       1.21  
 6      1.21       -0.796 
 7     -1.60        1.11  
 8      0.0248     -1.17  
 9     -1.22       -0.710 
10     -1.00        0.370 
# ℹ 20 more rows

Important: The quantile normalization properties are preserved: all columns have identical quantiles after processing. Only the individual element values may differ from earlier versions.
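
One way to check this property in your own tests is to compare the sorted values of every column. The helper below is a sketch with a hypothetical name, has_shared_quantiles(), and an arbitrary default tolerance; it is not part of TidyDensity. The matrix reuses the normalized values printed above.

```r
# Hypothetical helper: TRUE when all columns contain the same sorted values.
has_shared_quantiles <- function(x, tolerance = 1e-8) {
  x <- as.matrix(x)
  sorted <- matrix(apply(x, 2, sort), nrow = nrow(x))
  # The first column is recycled down every column for the comparison.
  all(abs(sorted - sorted[, 1]) <= tolerance)
}

# Values copied from the normalized output shown earlier in this post.
normalized <- matrix(
  c(-0.9980322, -0.5656369,  1.2291761,  0.2846513,  0.8648786,
     0.8648786, -0.5656369,  0.2846513,  1.2291761, -0.9980322,
     0.8648786,  1.2291761,  0.2846513, -0.9980322, -0.5656369,
     1.2291761,  0.8648786,  0.2846513, -0.9980322, -0.5656369),
  ncol = 4
)

has_shared_quantiles(normalized)            # TRUE
has_shared_quantiles(matrix(1:6, ncol = 2)) # FALSE
```

A check like this tests the property that is guaranteed to hold across versions, rather than exact element values that may change.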

New Features: Enhanced tidy_mixture_density()

Flexible Combination Types

TidyDensity 1.5.2 introduces a powerful .combination_type parameter to tidy_mixture_density(), enabling five different ways to combine distributions:

Combination Type   Description                             Use Case
stack              Concatenate all data points (default)   Traditional mixture models
add                Element-wise addition                   Additive effects modeling
subtract           Element-wise subtraction                Difference analysis
multiply           Element-wise multiplication             Interaction effects
divide             Element-wise division                   Ratio analysis
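
The semantics in the table can be sketched on two plain vectors in base R. This is an illustration of what each type means, not the package's internals: combine_sketch() is a hypothetical name, and the equal-length check for the element-wise types is an assumption of this sketch, not documented TidyDensity behavior.

```r
# Hypothetical illustration of the five combination types on two vectors.
combine_sketch <- function(x, y,
                           type = c("stack", "add", "subtract",
                                    "multiply", "divide")) {
  type <- match.arg(type)
  # Assumption of this sketch: element-wise types need equal lengths.
  if (type != "stack" && length(x) != length(y)) {
    stop("Element-wise combinations need vectors of equal length.")
  }
  switch(type,
    stack    = c(x, y),  # concatenate all data points
    add      = x + y,
    subtract = x - y,
    multiply = x * y,
    divide   = x / y
  )
}

set.seed(1)
x <- rnorm(50)
y <- rbeta(50, 0.5, 0.5)

types <- c("stack", "add", "subtract", "multiply", "divide")
# stack yields 100 values; the four element-wise types yield 50.
sapply(types, function(t) length(combine_sketch(x, y, t)))
```

Note that stacking grows the sample while the element-wise types keep its length.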

Practical Examples

# Traditional mixture model (default behavior)
mix_stack <- tidy_mixture_density(
  rnorm(100, 0, 1), 
  tidy_normal(.mean = 5, .sd = 1),
  .combination_type = "stack"
)
mix_stack
$data
$data$dist_tbl
# A tibble: 150 × 2
       x      y
   <int>  <dbl>
 1     1 -0.609
 2     2 -0.370
 3     3 -0.308
 4     4 -0.786
 5     5  0.437
 6     6 -0.552
 7     7  0.303
 8     8 -0.652
 9     9 -0.144
10    10 -0.260
# ℹ 140 more rows

$data$dens_tbl
# A tibble: 150 × 2
       x        y
   <dbl>    <dbl>
 1 -4.28 0.000118
 2 -4.19 0.000171
 3 -4.10 0.000245
 4 -4.01 0.000349
 5 -3.91 0.000489
 6 -3.82 0.000677
 7 -3.73 0.000931
 8 -3.63 0.00126 
 9 -3.54 0.00170 
10 -3.45 0.00226 
# ℹ 140 more rows

$data$input_data
$data$input_data$`rnorm(100, 0, 1)`
  [1] -0.60865878 -0.37038002 -0.30760339 -0.78599279  0.43657861 -0.55247080
  [7]  0.30320995 -0.65226084 -0.14429342 -0.25965742 -1.57960296  0.07128014
 [13]  0.65531402  1.35367040 -0.30645307 -0.82656469  1.32659588  0.36869342
 [19]  0.31268606  1.84046365 -1.35549208 -0.15825175  0.68863337 -1.39859775
 [25] -1.07112427  1.45502151  0.06602545 -0.39876615  0.05499137  0.09214760
 [31]  0.38800665 -1.04310666 -0.93508809  0.78018540 -0.14736187  0.48487063
 [37] -0.71797977 -0.09083663  0.24619862  0.42560605 -0.91303163 -0.40070704
 [43] -0.09056107  2.12683480  0.97909343  0.25586273  0.06160965 -0.24959411
 [49] -0.63688175  0.61513865 -1.80508425 -0.10904217 -1.49586272  0.65779129
 [55] -0.21556674  1.45041449  1.64820547 -0.00864845  1.14990888 -0.14165598
 [61]  1.08637758 -0.47666081  0.31451903  1.59206247 -0.31551530 -1.60855895
 [67]  0.91927450 -0.56171737 -0.17915531  0.25223463  0.99074046  1.09265035
 [73] -0.42699577 -1.42269492  0.28942361 -0.93808071 -0.38747430 -1.04629553
 [79] -0.93624539 -0.89624495 -0.94646613  1.43409772  0.40376921  2.20170782
 [85]  0.14770417 -1.10348135 -0.84095040  0.95636639  1.13483275 -0.43345698
 [91]  0.77418611  0.24623017 -0.49152719  0.97051886 -0.40725688  0.02543623
 [97] -0.16623957 -1.15500277  1.37865589  1.67954844

$data$input_data$`tidy_normal(.mean = 5, .sd = 1)`
# A tibble: 50 × 7
   sim_number     x     y    dx       dy      p     q
   <fct>      <int> <dbl> <dbl>    <dbl>  <dbl> <dbl>
 1 1              1  6.52  2.33 0.000974 0.936   6.52
 2 1              2  3.39  2.45 0.00261  0.0538  3.39
 3 1              3  5.54  2.57 0.00629  0.706   5.54
 4 1              4  5.40  2.69 0.0136   0.655   5.40
 5 1              5  5.64  2.81 0.0262   0.739   5.64
 6 1              6  5.40  2.93 0.0457   0.656   5.40
 7 1              7  5.02  3.05 0.0718   0.508   5.02
 8 1              8  4.34  3.17 0.102    0.254   4.34
 9 1              9  4.81  3.29 0.133    0.423   4.81
10 1             10  4.08  3.41 0.160    0.179   4.08
# ℹ 40 more rows



$plots
$plots$line_plot


$plots$dens_plot



$input_fns
[1] "rnorm(100, 0, 1), tidy_normal(.mean = 5, .sd = 1)"
# Additive mixture for modeling combined effects
mix_additive <- tidy_mixture_density(
  rnorm(50), 
  rbeta(50, 0.5, 0.5), 
  .combination_type = "add"
)
mix_additive
$data
$data$dist_tbl
# A tibble: 50 × 2
       x       y
   <int>   <dbl>
 1     1  0.172 
 2     2 -0.385 
 3     3  0.388 
 4     4  0.168 
 5     5 -1.48  
 6     6  0.743 
 7     7  0.859 
 8     8  0.783 
 9     9 -0.0703
10    10 -0.992 
# ℹ 40 more rows

$data$dens_tbl
# A tibble: 50 × 2
       x        y
   <dbl>    <dbl>
 1 -2.41 0.000289
 2 -2.28 0.000893
 3 -2.16 0.00236 
 4 -2.04 0.00531 
 5 -1.91 0.0103  
 6 -1.79 0.0174  
 7 -1.66 0.0259  
 8 -1.54 0.0353  
 9 -1.41 0.0461  
10 -1.29 0.0597  
# ℹ 40 more rows

$data$input_data
$data$input_data$`rnorm(50)`
 [1] -0.28375667 -1.35693228 -0.44459842 -0.44189941 -1.75245474  0.56591611
 [7]  0.64969767  0.43861900 -0.29576390 -1.05083449  0.46476746 -0.09441544
[13]  0.01988715 -0.11553959 -1.06165613 -0.23215249 -0.73166009  0.85851074
[19] -0.23347946 -0.98707523  0.48980891  0.45443754 -0.06019617 -0.28090697
[25]  0.02640269  0.34780762  0.08271394  0.38223602  0.37200374  0.15833057
[31]  0.67451345  0.19271746 -0.76646273 -0.61174894 -0.66437076  0.41119339
[37]  0.94342842  1.79174540 -0.78712893  0.84426079  1.21105485 -1.08434366
[43]  0.34320348  1.51119066 -1.54429610 -0.53518346 -0.12958712  0.40503043
[49]  1.10792452 -0.35614745

$data$input_data$`rbeta(50, 0.5, 0.5)`
 [1] 0.4558286432 0.9722089635 0.8326968294 0.6098241262 0.2693505903
 [6] 0.1769712879 0.2097711862 0.3447365735 0.2255041322 0.0593279079
[11] 0.3605619162 0.7653921143 0.9300206371 0.0649047973 0.8248588302
[16] 0.4746486057 0.0007107969 0.0303139028 0.1092924293 0.9994465190
[21] 0.5447640613 0.6838621697 0.4264545201 0.0350483756 0.1439936791
[26] 0.9999991418 0.9971223513 0.9366023326 0.2380888988 0.3954270399
[31] 0.6355559970 0.0082225336 0.1935222058 0.8301693526 0.0006154042
[36] 0.9468539743 0.5805084634 0.9630710788 0.8536182424 0.0636560268
[41] 0.3383464734 0.8648131947 0.2472292868 0.7353653812 0.6462800353
[46] 0.3418460528 0.8706317638 0.0537689028 0.4028589675 0.5659417332



$plots
$plots$line_plot


$plots$dens_plot



$input_fns
[1] "rnorm(50), rbeta(50, 0.5, 0.5)"
# Multiplicative interactions
mix_multiplicative <- tidy_mixture_density(
  rnorm(50), 
  rbeta(50, 0.5, 0.5), 
  .combination_type = "multiply"
)
mix_multiplicative
$data
$data$dist_tbl
# A tibble: 50 × 2
       x       y
   <int>   <dbl>
 1     1  0.0128
 2     2 -0.352 
 3     3 -0.228 
 4     4  0.0318
 5     5  0.0846
 6     6  0.0148
 7     7  0.301 
 8     8  0.125 
 9     9 -0.196 
10    10  1.58  
# ℹ 40 more rows

$data$dens_tbl
# A tibble: 50 × 2
        x       y
    <dbl>   <dbl>
 1 -1.49  0.00113
 2 -1.42  0.00577
 3 -1.34  0.0217 
 4 -1.27  0.0605 
 5 -1.19  0.127  
 6 -1.12  0.205  
 7 -1.04  0.261  
 8 -0.968 0.272  
 9 -0.893 0.243  
10 -0.818 0.197  
# ℹ 40 more rows

$data$input_data
$data$input_data$`rnorm(50)`
 [1]  0.32370511 -1.21400213 -1.20899806  0.03519770  0.53932734  0.02665457
 [7]  0.41845197  0.14043657 -0.55713865  1.68772225  0.33995465 -1.38935947
[13]  0.74516592 -0.04187743 -1.86673573  0.14417007 -0.44909386 -0.46806361
[19] -1.36766494  0.19793102  0.85812595  0.38882190 -1.00188631  0.43322473
[25]  0.43850846  0.84118862  1.63111722 -0.80401369 -0.96695329  0.44273011
[31]  0.18966768  0.18008685  0.47594963  2.67993093  0.56240726 -0.68272322
[37]  2.07827084  2.87539786  0.07352364 -0.59474395  0.54737811  0.70341946
[43] -0.46216676 -1.04391929 -0.53857891  0.60391106 -1.06072413  0.36132956
[49] -1.24601342 -0.67495126

$data$input_data$`rbeta(50, 0.5, 0.5)`
 [1] 0.03961826 0.28965374 0.18885811 0.90263774 0.15690042 0.55506535
 [7] 0.71961247 0.88995670 0.35266007 0.93680182 0.79821917 0.79099196
[13] 0.47833454 0.88224189 0.06153885 0.96803949 0.16677721 0.08155236
[19] 0.78180736 0.89987410 0.06377253 0.99049015 0.92083908 0.81639438
[25] 0.11599055 0.89253034 0.99961689 0.06746365 0.94993344 0.93847486
[31] 0.04346734 0.97125746 0.01042851 0.07114067 0.04258062 0.31699967
[37] 0.57796230 0.62059110 0.28030112 0.95068023 0.21437376 0.46588256
[43] 0.60081481 0.98127380 0.03041642 0.67000409 0.72883674 0.26381406
[49] 0.43238751 0.07012001



$plots
$plots$line_plot


$plots$dens_plot



$input_fns
[1] "rnorm(50), rbeta(50, 0.5, 0.5)"
# Subtraction for differencing
mix_subtract <- tidy_mixture_density(
  rnorm(50),
  rbeta(50, 0.5, 0.5),
  .combination_type = "subtract"
)
mix_subtract
$data
$data$dist_tbl
# A tibble: 50 × 2
       x       y
   <int>   <dbl>
 1     1 -0.934 
 2     2  0.0255
 3     3 -0.807 
 4     4  0.639 
 5     5 -0.151 
 6     6  0.261 
 7     7  0.383 
 8     8 -1.06  
 9     9 -1.17  
10    10  0.868 
# ℹ 40 more rows

$data$dens_tbl
# A tibble: 50 × 2
       x        y
   <dbl>    <dbl>
 1 -3.33 0.000267
 2 -3.22 0.000687
 3 -3.10 0.00159 
 4 -2.99 0.00333 
 5 -2.88 0.00633 
 6 -2.76 0.0109  
 7 -2.65 0.0173  
 8 -2.54 0.0252  
 9 -2.43 0.0342  
10 -2.31 0.0442  
# ℹ 40 more rows

$data$input_data
$data$input_data$`rnorm(50)`
 [1]  0.04741171  0.78106971 -0.51649345  0.79333600  0.84698156  0.44755311
 [7]  0.42599173 -0.05861854 -0.17167764  1.86283568 -0.34825311  1.15277765
[13] -0.43388248 -0.44234758  0.18575088 -1.14766103 -1.00124274 -1.29743634
[19]  0.04227262  1.88358997 -1.03996542  0.01229659  0.54674453  0.78417878
[25] -0.68734596  1.46836234  1.17552232  0.21217058  0.58419970  1.79239641
[31] -0.15530648  0.77885429  1.54672370  1.11665693  0.35566983  0.52467994
[37]  0.30117165 -0.38017897 -0.35182655 -0.50842405  1.54094057  0.01395280
[43] -0.56581282 -0.36566571  0.98543508 -0.48095752 -0.08275619 -0.53918661
[49] -0.51094105  0.65497036

$data$input_data$`rbeta(50, 0.5, 0.5)`
 [1] 0.981019342 0.755544020 0.290997064 0.153894509 0.997501146 0.187018584
 [7] 0.042874119 0.997810266 0.998535547 0.994524920 0.049740075 0.053181972
[13] 0.009592371 0.127241021 0.385442338 0.141428843 0.966020263 0.998885522
[19] 0.194088533 0.709788033 0.479590987 0.346260596 0.958567049 0.796353667
[25] 0.928775994 0.981901862 0.241413100 0.061032775 0.789884132 0.895207272
[31] 0.011005119 0.931750374 0.761540209 0.632937614 0.959700133 0.065693893
[37] 0.654884437 0.980215158 0.887362439 0.347511106 0.982633970 0.108274936
[43] 0.898851715 0.602077489 0.104709882 0.286286952 0.181128702 0.670655445
[49] 0.287199855 0.910472770



$plots
$plots$line_plot


$plots$dens_plot



$input_fns
[1] "rnorm(50), rbeta(50, 0.5, 0.5)"
# Division for ratios
mix_divide <- tidy_mixture_density(
  rnorm(50),
  rbeta(50, 0.5, 0.5),
  .combination_type = "divide"
)
mix_divide
$data
$data$dist_tbl
# A tibble: 50 × 2
       x         y
   <int>     <dbl>
 1     1     3.00 
 2     2    -1.79 
 3     3     0.603
 4     4     1.36 
 5     5 25721.   
 6     6  9101.   
 7     7     1.80 
 8     8    -0.620
 9     9    -0.698
10    10     1.73 
# ℹ 40 more rows

$data$dens_tbl
# A tibble: 50 × 2
        x        y
    <dbl>    <dbl>
 1 -445.  6.46e- 3
 2   89.1 5.56e- 4
 3  623.  2.49e-20
 4 1157.  3.46e-20
 5 1691.  5.86e-19
 6 2225.  1.04e-19
 7 2759.  1.06e-18
 8 3293.  0       
 9 3827.  0       
10 4362.  0       
# ℹ 40 more rows

$data$input_data
$data$input_data$`rnorm(50)`
 [1]  1.14765189 -0.65037474  0.56076429  0.14529634  2.07643060  1.35312739
 [7]  1.36098704 -0.38382420 -0.60398679  1.72618260  1.02326275  0.53904945
[13]  0.58427997 -1.24784343  0.54468193  0.23675504 -0.06278437 -0.19938179
[19]  0.29774594 -1.73700726 -0.95278639  1.50377661  0.76641470 -1.64577160
[25] -1.87538645  0.20415661 -0.02288698  0.03017884 -1.11147919 -0.53853737
[31]  0.58745346  1.53857208 -0.71316156  0.30820280  1.12513966 -1.22796997
[37] -0.43473722 -1.17160252 -1.49085069 -0.97810140  0.77343726 -0.27780074
[43]  0.10606935  1.18324993 -0.66469916  0.77692003 -1.72619510  0.12750687
[49] -0.63319646 -0.44339251

$data$input_data$`rbeta(50, 0.5, 0.5)`
 [1] 3.820203e-01 3.628845e-01 9.300254e-01 1.067735e-01 8.072951e-05
 [6] 1.486745e-04 7.576313e-01 6.194406e-01 8.654029e-01 9.957012e-01
[11] 4.249757e-01 1.018361e-01 9.576092e-01 9.578569e-03 1.354495e-03
[16] 9.547469e-01 9.999004e-01 6.847354e-03 2.532576e-01 8.101000e-01
[21] 9.979821e-01 7.362087e-01 1.718800e-01 1.436152e-01 7.798669e-01
[26] 9.892079e-01 5.585185e-01 8.395558e-01 2.514851e-03 6.805656e-01
[31] 9.669103e-01 1.653675e-04 7.284081e-01 9.800184e-01 9.538534e-02
[36] 8.859544e-01 9.990685e-01 1.933901e-01 9.998451e-01 5.631759e-01
[41] 4.015837e-02 3.758858e-01 9.970344e-01 6.033153e-01 5.793981e-01
[46] 2.345981e-03 9.560933e-01 7.753198e-01 2.004754e-01 7.424075e-01



$plots
$plots$line_plot


$plots$dens_plot



$input_fns
[1] "rnorm(50), rbeta(50, 0.5, 0.5)"

Each combination returns comprehensive output including:

  • Tidy data tables with combined distributions
  • Density estimates for visualization
  • Ready-to-use plots for immediate analysis
  • Input function metadata for reproducibility
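
The components listed above can be pulled out of the returned list by name. The short example below uses only the element names visible in the printed output ($data, $plots, $input_fns); the seed is arbitrary and the example assumes TidyDensity 1.5.2 or later is installed.

```r
library(TidyDensity)

set.seed(123)
mix <- tidy_mixture_density(
  rnorm(50),
  rbeta(50, 0.5, 0.5),
  .combination_type = "add"
)

# Combined values and their density estimate as tibbles
combined <- mix$data$dist_tbl
density  <- mix$data$dens_tbl
head(combined)

# Plots are stored in the list and print when called
mix$plots$dens_plot

# The input expressions, kept as text for reproducibility
mix$input_fns
```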

Best Practices and Recommendations

For Existing Users

  • Gradually migrate critical workflows after thorough testing
  • Document any code that depends on exact quantile_normalize() outputs
  • Leverage new mixture modeling for more sophisticated statistical modeling
  • Test downstream analyses to ensure compatibility
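
For code that has to run against both the old and the new return shape during a migration, a small wrapper can help. This is a sketch under a loud assumption: it supposes the pre-1.5.2 list stored the normalized matrix in an element named normalized_data. Check that name against your installed version before relying on it. The function name as_normalized_matrix() is hypothetical.

```r
# Hypothetical migration helper: accept either return shape and
# always hand back a plain matrix of normalized values.
as_normalized_matrix <- function(res, element = "normalized_data") {
  # ASSUMPTION: older versions returned a list with the normalized
  # matrix stored under `element`; verify the name in your version.
  if (is.list(res) && !is.data.frame(res)) {
    if (is.null(res[[element]])) {
      stop("Element '", element, "' not found in the result list.")
    }
    res <- res[[element]]
  }
  # Newer versions return the data directly (matrix or tibble).
  as.matrix(res)
}

# Works with a mocked old-style list and with a bare matrix.
m <- matrix(c(1, 2, 3, 4), ncol = 2)
old_style <- list(original_data = m, normalized_data = m)

identical(as_normalized_matrix(old_style), as_normalized_matrix(m))
```

Routing every call through one wrapper keeps the version-specific logic in a single place while downstream code is being tested.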

For New Users

  • Start with v1.5.2 to benefit from performance improvements immediately
  • Explore mixture modeling capabilities for creative statistical applications
  • Use in tidyverse pipelines for seamless data science workflows

Looking Forward

TidyDensity 1.5.2 represents a significant evolution in the package’s capabilities. The performance improvements in quantile_normalize() make it more suitable for large-scale data science applications, while the enhanced tidy_mixture_density() opens new possibilities for sophisticated statistical modeling.

The breaking changes, though initially challenging, position the package for better scalability and more efficient memory usage, crucial factors for modern data science workflows.

Conclusion

TidyDensity 1.5.2 delivers substantial improvements that will benefit R programmers working with statistical distributions. The 48.6% performance improvement in quantile_normalize() and flexible mixture modeling capabilities make this update highly valuable, despite the breaking changes.

Key takeaways:

  • ✅ Significant performance gains across all dataset sizes
  • ✅ Enhanced mixture modeling with five combination types
  • ✅ Preserved statistical properties in quantile normalization
  • ⚠️ Breaking changes require testing of existing workflows
  • 🚀 Improved scalability for large-scale data science applications

Ready to upgrade? Update to TidyDensity 1.5.2 and test your critical workflows to ensure compatibility. The performance benefits and new capabilities make this update well worth the migration effort.
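
Before testing, it can help to confirm which version is actually installed. The check below is a small sketch; needs_upgrade() is a hypothetical helper name, and the message assumes the release is available from your configured CRAN mirror.

```r
# Hypothetical helper: TRUE when a package is missing or older than required.
needs_upgrade <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  utils::packageVersion(pkg) < min_version
}

if (needs_upgrade("TidyDensity", "1.5.2")) {
  message("Run install.packages(\"TidyDensity\") to get 1.5.2 or later.")
}
```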

💡 What’s your experience with TidyDensity 1.5.2?


Happy Coding! 🚀


You can connect with me at any one of the below:

Telegram Channel here: https://t.me/steveondata

LinkedIn Network here: https://www.linkedin.com/in/spsanderson/

Mastodon Social here: https://mstdn.social/@stevensanderson

RStats Network here: https://rstats.me/@spsanderson

GitHub Network here: https://github.com/spsanderson

Bluesky Network here: https://bsky.app/profile/spsanderson.com

My Book: Extending Excel with Python and R here: https://packt.link/oTyZJ

You.com Referral Link: https://you.com/join/EHSLDTL6