Mastering Random Sampling in R with the sample() Function

Author

Steven P. Sanderson II, MPH

Published

March 12, 2024

Introduction

The sample() function in R draws random samples from a vector, either with or without replacement and optionally with unequal selection probabilities. It’s an essential function for tasks such as data analysis, Monte Carlo simulations, and randomized experiments. In this blog post, we’ll explore the sample() function in detail and provide examples to help you understand how to use it effectively.

Understanding the sample() Function

The sample() function in R has the following syntax:

sample(x, size, replace = FALSE, prob = NULL)
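  • x: This is the vector from which you want to draw the sample. If x is a single number n (with n >= 1), sample() draws from 1:n instead.
  • size: This specifies the number of elements you want to sample from x. If you omit it, it defaults to length(x), so sample(x) returns a random permutation of x.
  • replace: This is a logical argument that determines whether sampling should be done with replacement (TRUE) or without replacement (FALSE). The default value is FALSE.
  • prob: This is an optional vector of probability weights, one per element of x, allowing you to perform weighted random sampling. The weights do not need to sum to 1, but they must be non-negative and not all zero.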
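
Two behaviors of the x and size arguments are easy to miss, so here is a minimal sketch of them. The vector x and the seed value are made up for illustration only.

```r
set.seed(123)  # arbitrary seed, chosen only so the sketch is repeatable

x <- c(10, 20, 30, 40, 50)

# With size omitted, sample() returns a random permutation of x
shuffled <- sample(x)
length(shuffled) == length(x)  # TRUE
setequal(shuffled, x)          # TRUE

# If x is a single number n >= 1, sample() draws from 1:n, not from x itself
from_single <- sample(10, 3)
all(from_single %in% 1:10)     # TRUE
```

The second case matters when the vector you pass in can shrink to length one: sample(c(10), 1) draws from 1:10 rather than always returning 10.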

Examples

Simple Random Sampling

Let’s start with a basic example of simple random sampling without replacement:

# Creating a vector
numbers <- 1:10

# Drawing a sample of 5 elements without replacement
sample_without_replacement <- sample(numbers, 5)

This code will generate a random sample of 5 unique elements from the numbers vector. The output might look something like:

print(sample_without_replacement)
[1] 7 8 3 6 1
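
Because the result changes on every run, it can help to fix the random seed when you need repeatable output. The sketch below assumes a seed of 42, which is an arbitrary choice, and the names first_draw and second_draw are used only for this illustration.

```r
numbers <- 1:10

set.seed(42)  # 42 is arbitrary; any integer works
first_draw <- sample(numbers, 5)

set.seed(42)  # resetting the same seed replays the same random stream
second_draw <- sample(numbers, 5)

identical(first_draw, second_draw)  # TRUE
anyDuplicated(first_draw) == 0      # TRUE: no repeats without replacement
```

Calling set.seed() once at the top of a script is usually enough to make the whole script reproducible.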

Sampling with Replacement

Sometimes, you may want to sample with replacement, which means that an element can be selected multiple times. To do this, you can set the replace argument to TRUE:

# Drawing a sample of 5 elements with replacement
sample_with_replacement <- sample(numbers, 5, replace = TRUE)

This code might produce an output like:

print(sample_with_replacement)
[1] 1 3 6 6 2

Notice that the number 6 appears twice in the sample, since we’re sampling with replacement.
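
Replacement also determines how large a sample can be. As a rough sketch, the code below asks for 15 elements from a vector of 10, first without and then with replacement. The tryCatch() wrapper and the variable names are just one way to show the failure without stopping the script.

```r
numbers <- 1:10

# Asking for more elements than the vector holds fails without replacement
oversized <- tryCatch(
  sample(numbers, 15),
  error = function(e) conditionMessage(e)
)
is.character(oversized)  # TRUE: an error message was captured, not a sample

# With replace = TRUE the same request succeeds
with_repeats <- sample(numbers, 15, replace = TRUE)
length(with_repeats)             # 15
anyDuplicated(with_repeats) > 0  # TRUE: 15 draws from 10 values must repeat
```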

Weighted Random Sampling

The prob argument in the sample() function allows you to perform weighted random sampling. This means that elements have different probabilities of being selected based on the provided weights. Here’s an example:

# Creating a vector of weights
weights <- c(0.1, 0.2, 0.3, 0.4)

# Drawing a weighted sample of 3 elements without replacement
weighted_sample <- sample(1:4, 3, replace = FALSE, prob = weights)

In this example, the numbers 1, 2, 3, and 4 have weights of 0.1, 0.2, 0.3, and 0.4, respectively. The output might look like:

print(weighted_sample)
[1] 4 3 2

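In this draw, the two highest-weighted elements (4 and 3) both appear. That is the expected tendency, although any single sample can differ. Keep in mind that when sampling without replacement, the weights are applied sequentially: after each pick, the next element is chosen with probability proportional to the weights of the elements that remain.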
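
A single sample of 3 says little about the weights, so one way to see their effect is to draw many times with replacement and compare the observed proportions to the weights. This is only an illustrative check; the seed, the number of draws (10,000), and the variable names are assumptions made for the sketch.

```r
set.seed(2024)  # arbitrary seed for repeatability

weights <- c(0.1, 0.2, 0.3, 0.4)

# Draw many times with replacement so every draw uses the weights unchanged
draws <- sample(1:4, 10000, replace = TRUE, prob = weights)

# Observed proportions for the values 1 to 4; these should land close to
# 0.1, 0.2, 0.3, and 0.4
observed <- as.numeric(prop.table(table(factor(draws, levels = 1:4))))
round(observed, 2)
```

With replace = TRUE each draw is independent, which is why the proportions can be compared to the weights directly.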

Your Turn!

Now that you’ve seen several examples of using the sample() function in R, it’s time to put your knowledge to the test! Here are some exercises for you to try:

  1. Generate a random sample of 10 elements from the letters of the English alphabet.
  2. Sample 5 elements with replacement from the vector c(10, 20, 30, 40, 50).
  3. Create a vector of weights and perform weighted random sampling to select 3 elements from the vector c("apple", "banana", "orange", "grape").
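
If you get stuck, here is one possible way to approach the three exercises. It is a sketch, not the only answer: the seed and the weights in fruit_weights are made up, and Exercise 1 is read here as sampling without replacement.

```r
set.seed(1)  # arbitrary seed so reruns give the same answers

# Exercise 1: letters is R's built-in vector of lowercase a to z
letter_sample <- sample(letters, 10)

# Exercise 2: sampling with replacement allows repeated values
values_sample <- sample(c(10, 20, 30, 40, 50), 5, replace = TRUE)

# Exercise 3: fruit_weights is a hypothetical set of weights, one per fruit
fruits <- c("apple", "banana", "orange", "grape")
fruit_weights <- c(0.4, 0.3, 0.2, 0.1)
fruit_sample <- sample(fruits, 3, prob = fruit_weights)

print(letter_sample)
print(values_sample)
print(fruit_sample)
```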

Feel free to experiment with different combinations of arguments and datasets to solidify your understanding of the sample() function. Happy sampling!