Random number generation is a basic skill for statistics, simulations, and data analysis in R. This guide should help you get familiar with some of those basics.

Introduction

Random number generation is a basic component of many statistical analyses, simulations, and data science workflows in R. Whether you’re running Monte Carlo simulations, creating sample datasets, or implementing statistical algorithms, knowing how to generate random numbers efficiently is a valuable skill. This article will walk you through the most common methods for generating random numbers in R, with practical examples and visualizations to help you understand the concepts better.

Understanding Random Number Generation in R

R provides a powerful suite of functions for generating random numbers from various probability distributions. These functions are prefixed with r (for random), followed by the abbreviated name of the distribution. For example, rnorm() generates random numbers from a normal distribution, while runif() generates random numbers from a uniform distribution.
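
The r prefix is one of four prefixes that base R uses for each distribution; d, p, and q give the density, cumulative probability, and quantile functions. The following is a minimal sketch using the standard normal distribution, with illustrative variable names:

```r
# d: density at a point (height of the standard normal curve at 0)
density_at_zero <- dnorm(0)

# p: cumulative probability P(X <= 0)
prob_below_zero <- pnorm(0)

# q: quantile, the inverse of p (here, the median)
median_value <- qnorm(0.5)

# r: random draws, the focus of this article
set.seed(1)
draws <- rnorm(3)
```

Only the r functions consume random numbers; the other three are deterministic.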

Basic Random Number Generation

Uniform Distribution with runif()

The runif() function generates random numbers from a continuous uniform distribution, where every value in the specified range is equally likely. By default the range runs from 0 to 1.

# Generate 5 random numbers between 0 and 1
runif(5)

[1] 0.9631528 0.6473717 0.8502536 0.5132711 0.3478398

# Generate 10 random numbers between 10 and 20
runif(10, min = 10, max = 20)

 [1] 18.06993 14.80453 15.20832 16.79679 16.00856 10.91564 18.10421 19.74378
 [9] 12.87939 15.08066
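
One way to see what min and max do is to rescale default draws by hand. This is a small sketch using only base R; the variable names are illustrative:

```r
# Draw with explicit bounds
set.seed(1)
direct <- runif(5, min = 10, max = 20)

# Reset the seed and rescale draws from the default 0-to-1 range
set.seed(1)
rescaled <- 10 + (20 - 10) * runif(5)

# The two approaches should agree up to floating-point tolerance
all.equal(direct, rescaled)

# Every draw stays inside the requested range
all(direct >= 10 & direct <= 20)
```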

Normal Distribution with rnorm()

The rnorm() function generates random numbers from a normal (Gaussian) distribution with a specified mean and standard deviation.

# Generate 5 random numbers from a normal distribution with mean 0 and sd 1
rnorm(5)

[1] 0.3169287 0.4466374 0.5926026 1.7024986 1.5120444

# Generate 10 random numbers with mean 100 and sd 20
rnorm(10, mean = 100, sd = 20)

 [1] 109.83534  93.56975 113.24464 128.31562 103.70459  87.70473  89.19045
 [8]  92.77541  80.40960 128.66981
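
The mean and sd arguments also accept vectors, which are matched to the draws element by element. A short sketch, with a hypothetical group_means vector chosen for illustration:

```r
set.seed(1)

# One draw per mean: the mean vector is matched element by element
group_means <- c(0, 100, 1000)
draws <- rnorm(3, mean = group_means, sd = 0.1)
draws

# Each draw sits close to its own mean because sd is small
abs(draws - group_means)
```

This can be handy when simulating observations for several groups in one call.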

Random Integers with sample()

The sample() function is well suited to generating random integers or sampling from a specific set of values. By default it samples without replacement, so each value appears at most once.

# Generate 5 random integers between 1 and 100
sample(1:100, 5)

[1] 81 70 98 46 89

# Sample with replacement (allowing repeated values)
sample(1:10, 15, replace = TRUE)

 [1] 8 9 5 4 7 4 8 3 7 5 4 9 1 4 8
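
One behavior worth knowing: when its first argument is a single number n of at least 1, sample() treats it as 1:n. The sketch below illustrates this and one possible workaround; pool is a hypothetical vector used only for the example:

```r
set.seed(1)

# A single number n is treated as 1:n, so this draws from 1 to 10
from_shorthand <- sample(10, 3)

# Pitfall: a length-one vector such as c(50) is also read as 1:50,
# so the result is not necessarily 50
pool <- c(50)
surprising <- sample(pool, 1)

# Sampling a position and indexing avoids the shorthand
safe <- pool[sample.int(length(pool), 1)]
```

This mostly matters when the vector being sampled is built by other code and may shrink to a single element.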

Setting Seeds for Reproducibility

When working with random numbers, it’s often essential to make your results reproducible. The set.seed() function allows you to get the same sequence of “random” numbers each time you run your code.

# Set a seed for reproducibility
set.seed(123)
# Generate random numbers
runif(5)

[1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673

# Run again with the same seed for the same results
set.seed(123)
runif(5)

[1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
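
A seed fixes the whole stream of numbers that follows, not just the next call. As a small illustration (variable names are made up for the example), two calls after one seed continue the same stream:

```r
set.seed(123)
first_call  <- runif(3)
second_call <- runif(3)   # continues the stream, so it differs from first_call

set.seed(123)
all_at_once <- runif(6)

# The two short calls together match the single longer call
identical(c(first_call, second_call), all_at_once)
```

This is why inserting an extra random call early in a script changes every random result after it, even when the seed is unchanged.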

Visualizing Random Distributions

Visualizing random number distributions helps you understand their properties better. Here’s a comparison of different random distributions:

par(mfrow = c(2, 2))
# Uniform Distribution
hist(runif(1000, min = 0, max = 1), breaks = 30, 
     main = "Uniform Distribution", xlab = "Value", col = "lightblue")

# Normal Distribution
hist(rnorm(1000, mean = 0, sd = 1), breaks = 30, 
     main = "Normal Distribution", xlab = "Value", col = "lightgreen")

# Random Sampling
hist(sample(1:100, 1000, replace = TRUE), breaks = 30, 
     main = "Random Sampling", xlab = "Value", col = "lightcoral")

# Probability Density Functions Comparison
x <- seq(-4, 4, length.out = 100)
plot(x, dnorm(x), type = "l", col = "blue", lwd = 2, ylim = c(0, 1), 
     main = "Probability Density Functions Comparison", xlab = "Value",
     ylab = "Density")
lines(x, dnorm(x, mean = 1, sd = 0.5), col = "red", lwd = 2)
lines(x, dnorm(x, mean = -1, sd = 0.5), col = "green", lwd = 2)

par(mfrow = c(1, 1))

The plots above show:

Uniform Distribution: A flat distribution where all values in the range have equal probability
Normal Distribution: The classic bell-shaped curve with most values clustered around the mean
Random Sampling: Integers from 1 to 100 drawn with replacement, which gives a roughly flat histogram
Probability Density Functions Comparison: Three normal density curves with different means and standard deviations

Advanced Random Number Generation

R provides functions for generating random numbers from many other probability distributions. Here are some of the most commonly used ones:

Binomial Distribution (rbinom)

Useful for modeling success/failure scenarios with a fixed number of trials.

# Generate 10 random numbers from a binomial distribution 
# with 20 trials and probability 0.5
rbinom(10, size = 20, prob = 0.5)

 [1] 18 14 14 11 11 14  8 11  7 10
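
Each value returned by rbinom() is a count of successes, so its long-run average should be near size * prob. A rough check, assuming a sample of 10,000 draws is large enough for the purpose:

```r
set.seed(42)

# size = 1 gives single success/failure outcomes coded as 1 and 0
coin_flips <- rbinom(10, size = 1, prob = 0.5)

# With many draws, the average count should be near size * prob = 10
counts <- rbinom(10000, size = 20, prob = 0.5)
mean(counts)
```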

Poisson Distribution (rpois)

Perfect for modeling the number of events occurring in a fixed time period.

# Generate 10 random numbers from a Poisson distribution with lambda = 5
rpois(10, lambda = 5)

 [1] 6 1 3 2 7 4 6 6 3 4
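
A Poisson distribution has mean and variance both equal to lambda, which gives a quick sanity check on simulated counts. A sketch with an illustrative sample size of 10,000:

```r
set.seed(42)
events <- rpois(10000, lambda = 5)

# Both values should land close to lambda = 5
mean(events)
var(events)
```

If real count data show a variance far above the mean, a Poisson model may be a poor fit.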

Other Useful Distributions

R supports many other probability distributions, including:

# Gamma distribution
rgamma(5, shape = 2, scale = 1)

[1] 1.0359106 0.4800262 1.5689731 0.9516189 1.6413728

# Beta distribution
rbeta(5, shape1 = 2, shape2 = 5)

[1] 0.16886543 0.36738431 0.26962401 0.27630634 0.04024299

# Chi-squared distribution
rchisq(5, df = 3)

[1] 0.8795775 2.8472263 1.0508443 1.6913313 1.1525831
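
Some of these functions offer more than one parameterization. For example, rgamma() accepts either scale or rate, where rate is 1 / scale. The sketch below draws the same values both ways; the seed and parameter values are arbitrary:

```r
set.seed(7)
by_scale <- rgamma(5, shape = 2, scale = 2)

set.seed(7)
by_rate <- rgamma(5, shape = 2, rate = 1 / 2)

# Same seed and equivalent parameters give the same draws
all.equal(by_scale, by_rate)
```

Supply only one of rate or scale in a given call.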

Visualizing Advanced Distributions

par(mfrow = c(2, 2))

# Gamma Distribution
hist(rgamma(1000, shape = 2, scale = 1), breaks = 30, 
     main = "Gamma Distribution", xlab = "Value", col = "lightblue")

# Beta Distribution
hist(rbeta(1000, shape1 = 2, shape2 = 5), breaks = 30, 
     main = "Beta Distribution", xlab = "Value", col = "lightgreen")

# Poisson Distribution
hist(rpois(1000, lambda = 5), breaks = 30, 
     main = "Poisson Distribution", xlab = "Value", col = "lightcoral")

# Chi-square Distribution
hist(rchisq(1000, df = 3), breaks = 30, 
     main = "Chi-square Distribution", xlab = "Value", col = "lightyellow")

par(mfrow = c(1, 1))

The plots above show histograms of 1,000 random draws from each of the following distributions:

Gamma Distribution: Useful for modeling waiting times
Beta Distribution: Often used for modeling probabilities or proportions
Poisson Distribution: Models count data or rare events
Chi-square Distribution: Used in hypothesis testing
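
A theoretical density curve can be drawn over a histogram by putting the histogram on the density scale with freq = FALSE. Here is a sketch for the gamma case, with plot titles and colors chosen only for illustration:

```r
set.seed(123)
gamma_draws <- rgamma(1000, shape = 2, scale = 1)

# freq = FALSE scales the bars so their total area is 1
h <- hist(gamma_draws, breaks = 30, freq = FALSE,
          main = "Gamma Draws with Theoretical Density",
          xlab = "Value", col = "lightblue")

# Overlay the matching theoretical density in red
curve(dgamma(x, shape = 2, scale = 1), add = TRUE, col = "red", lwd = 2)
```

The same pattern should work for the other continuous distributions by swapping in dbeta() or dchisq().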

Common Pitfalls and Best Practices

Pitfall 1: Not Setting a Seed

If you don’t set a seed, you’ll get different random numbers each time you run your code, which can make debugging difficult and results irreproducible.

Best Practice: Always Set a Seed for Reproducibility

set.seed(42)  # Choose any number you like

Pitfall 2: Using the Same Seed in Parallel Processing

When using parallel processing, setting the same seed in each parallel worker makes every worker produce the same sequence of random numbers, so results are duplicated rather than independent.

Best Practice: Use Parallel-Safe RNG Methods

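library(parallel)
cl <- makeCluster(4)                  # start 4 worker processes
clusterSetRNGStream(cl, iseed = 42)   # give each worker its own reproducible RNG stream
stopCluster(cl)                       # shut the workers down when finished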
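
clusterSetRNGStream() relies on R's L'Ecuyer-CMRG generator, which can hand out separate streams. The idea can be sketched without starting a cluster by using nextRNGStream() from the parallel package; draw_from() is a hypothetical helper written for this example, not part of any package:

```r
library(parallel)

# Switch to the L'Ecuyer-CMRG generator, remembering the previous setting
old_kind <- RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# State of the first stream, then a second, separate stream derived from it
stream1 <- .Random.seed
stream2 <- nextRNGStream(stream1)

# Hypothetical helper: draw n uniform numbers starting from a given stream
draw_from <- function(stream, n) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}

a <- draw_from(stream1, 3)
b <- draw_from(stream2, 3)
a_again <- draw_from(stream1, 3)

identical(a, a_again)  # same stream, same numbers
identical(a, b)        # different streams, different numbers

# Restore the generator that was active before
RNGkind(old_kind[1])
```

In practice each worker would receive its own stream, which is roughly what clusterSetRNGStream() arranges for you.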

Pitfall 3: Ignoring the Properties of the Distribution

Using the wrong distribution for your data can lead to incorrect results.

Best Practice: Select the Appropriate Distribution

Choose the distribution that best models your data:

Use rnorm() for continuous, roughly symmetric, bell-shaped data
Use rpois() for counts of events
Use rbinom() for the number of successes in a fixed number of trials (size = 1 gives binary outcomes)

Your Turn! Interactive Section

Now, let’s put your knowledge into practice with a simple exercise.

Exercise: Generate 1000 random numbers from a normal distribution with mean 50 and standard deviation 10. Then calculate their mean and standard deviation to verify they are close to the expected values.

Solution

# Set seed for reproducibility
set.seed(123)

# Generate 1000 random numbers from normal distribution
random_numbers <- rnorm(1000, mean = 50, sd = 10)

# Calculate mean and standard deviation
mean(random_numbers)  # Should be close to 50

[1] 50.16128

sd(random_numbers)    # Should be close to 10

[1] 9.91695

The mean should be approximately 50, and the standard deviation should be approximately 10, with small variations due to randomness.
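
How close is close? The standard error of the sample mean, sd / sqrt(n), gives a rough yardstick. The sketch below reuses the exercise's seed and treats two standard errors as an informal threshold, which is an assumption made for illustration:

```r
n <- 1000
true_sd <- 10

# Typical size of the gap between the sample mean and the true mean
standard_error <- true_sd / sqrt(n)

set.seed(123)
random_numbers <- rnorm(n, mean = 50, sd = true_sd)

# Observed gap, compared against two standard errors
deviation <- abs(mean(random_numbers) - 50)
deviation < 2 * standard_error
```

Quadrupling n halves the standard error, so larger samples land closer to the target on average.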

Quick Takeaways

Function Pattern: Random number generators in R follow the pattern r + distribution name (e.g., rnorm, runif)
Reproducibility: Use set.seed() to make your random numbers reproducible
Common Distributions:
- runif() for uniform distribution
- rnorm() for normal distribution
- sample() for random sampling
- rbinom() for binomial distribution
- rpois() for Poisson distribution
Visualization: Always visualize your random numbers to verify their distribution
Parameters: Each distribution function has specific parameters that control its shape

Conclusion

Random number generation is a powerful tool in R programming that enables everything from simple sampling to complex statistical simulations. By understanding the different distribution functions and their parameters, you can generate the precise type of random data you need for your analyses.

Now that you have a solid foundation in generating random numbers in R, try incorporating these techniques into your next data analysis project or statistical simulation. Remember to set a seed for reproducibility, choose the appropriate distribution for your data, and visualize your results to ensure they meet your expectations.

FAQs

1. Why do I get different random numbers each time I run my code?

If you don’t set a seed using set.seed(), R will generate different random numbers each time. To get reproducible results, always set a seed at the beginning of your script.

2. Which function should I use to generate random integers?

Use the sample() function to generate random integers. For example, sample(1:100, 10) generates 10 distinct random integers between 1 and 100; add replace = TRUE to allow repeated values.

3. How do I generate random numbers from a custom probability distribution?

For a discrete distribution, you can pass custom probabilities to the sample() function through its prob argument. For example:

sample(1:6, 10, replace = TRUE, prob = c(0.1, 0.1, 0.1, 0.1, 0.1, 0.5))

This will generate random numbers from 1 to 6, with 6 being five times more likely to appear than the other numbers.
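
To confirm that the weights behave as intended, the observed proportions from a large sample can be compared with the probabilities supplied. A sketch, assuming 10,000 draws is enough for a visual check:

```r
set.seed(42)
weights <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.5)
rolls <- sample(1:6, 10000, replace = TRUE, prob = weights)

# Proportion of each face; factor() keeps all six levels in the table
observed <- prop.table(table(factor(rolls, levels = 1:6)))
round(observed, 2)
```

The proportion for 6 should come out near 0.5 and the others near 0.1.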

4. What’s the difference between sampling with and without replacement?

When sampling with replacement (replace = TRUE), the same value can be selected multiple times. Without replacement (replace = FALSE), each value can only be selected once.
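
The difference shows up clearly when the sample size reaches or exceeds the number of available values. A brief sketch with illustrative variable names:

```r
set.seed(42)

# Without replacement: drawing all 10 values gives a shuffled 1:10
without_rep <- sample(1:10, 10)
anyDuplicated(without_rep) == 0

# With replacement: repeats are allowed
with_rep <- sample(1:10, 10, replace = TRUE)

# Asking for more values than exist fails without replacement;
# tryCatch() captures the error message instead of stopping the script
too_many <- tryCatch(sample(1:5, 10),
                     error = function(e) conditionMessage(e))
too_many
```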

5. How can I check if my random numbers follow the expected distribution?

Use visualization techniques like histograms, density plots, or Q-Q plots to verify your random numbers follow the expected distribution:

x <- rnorm(1000)        # draw once so both plots describe the same sample
hist(x, breaks = 30)    # Histogram for normal distribution
qqnorm(x)               # Q-Q plot for normal distribution
qqline(x)               # reference line for the Q-Q plot
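
Plots can be complemented with a formal test. One option, shown here as a sketch rather than a recommendation, is the Kolmogorov-Smirnov test from base R's stats package, comparing the sample with a standard normal distribution:

```r
set.seed(123)
x <- rnorm(1000)

# Compare the sample with a normal distribution with mean 0 and sd 1
result <- ks.test(x, "pnorm", mean = 0, sd = 1)
result$p.value
```

A small p-value suggests the sample departs from the stated distribution; a large one does not prove a match, it only fails to find a difference.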
