# Create a vector with numbers 1 through 10
<- 1:10
numbers
# Draw a random sample of 5 elements without replacement
<- sample(numbers, 5)
sample_without_replacement
# Print the sampled values
print(sample_without_replacement)
[1] 7 9 10 4 5
Steven P. Sanderson II, MPH
May 19, 2025
Programming, How to Select Random Samples in R, Random Sampling in R, R Sample Function, Select Random Rows in R, Random Samples in R Examples, Sampling Without Replacement in R, Sampling With Replacement in R, Random Sampling from Data Frames in R, Sampling from R Vectors, Sampling from Matrices in R, How to Sample Random Rows from a Data Frame in R, Step-by-Step Guide to Using sample() in R, How to Sample from Vectors and Matrices in R, Working Examples of R Random Sampling With and Without Replacement, Tutorial on Reproducible Random Sampling in R
Random sampling is a fundamental technique in statistics, simulation, and data analysis. Whether you are building a model, testing a hypothesis, or simulating data, learning how to randomly select samples from your dataset is a must. In R, the built-in sample()
function is an easy and powerful way to obtain random samples from vectors, data.frames, and even matrices.
In this article, we will explain the sample()
function in detail, provide working examples, and show you how to perform both sampling with and without replacement. By the end, you will be able to confidently use random sampling to support your data analysis tasks in R.
Random sampling is useful for many tasks. With random samples, you can:
In R, the sample()
function is a versatile tool that lets you randomly draw items from a collection—whether that collection is a simple vector, a data.frame, or even a matrix. In the following sections, we will explain the syntax of sample()
, show examples with and without replacement, and provide sample code for various data structures.
sample()
FunctionThe basic syntax of the sample()
function in R is as follows:
Let’s break down the arguments:
TRUE
) or without (the default value FALSE
).This function works by randomly shuffling the elements of x
when size
is not specified. When you set the size
argument, sample()
returns a random subset of the elements from x
.
For more examples and detailed explanations on sample()
, many great resources are available (https://www.statology.org/random-sample-in-r/) (https://www.r-bloggers.com/2024/03/mastering-random-sampling-in-r-with-the-sample-function/).
Vectors are the simplest data structure in R. Let’s start with a few examples that show how to use the sample()
function to draw random samples from a vector.
In simple random sampling, each element is only selected once. Here’s how you can sample 5 elements from a vector of numbers without replacement:
# Create a vector with numbers 1 through 10
numbers <- 1:10
# Draw a random sample of 5 elements without replacement
sample_without_replacement <- sample(numbers, 5)
# Print the sampled values
print(sample_without_replacement)
[1] 7 9 10 4 5
Every time you run this script, you will see a different order for the five unique elements chosen from 1 to 10. This is because sampling without replacement means no element is repeated (https://www.r-bloggers.com/2024/03/mastering-random-sampling-in-r-with-the-sample-function/).
When sampling with replacement, the same element can be selected more than once. This is useful if you need to simulate scenarios where an observation might appear multiple times. To sample with replacement, simply set replace = TRUE
:
# Draw a sample of 5 elements with replacement from the same vector
sample_with_replacement <- sample(numbers, 5, replace = TRUE)
# Print the sampled values
print(sample_with_replacement)
[1] 2 3 2 10 10
Because replacement is allowed, you might see the same number appear more than once (for instance, you might get 3 3 7 2 9
). This method is also commonly used in bootstrapping methods .
Often you need to randomly select rows from a data.frame instead of just sampling from a vector of numbers. This is very useful when splitting data into training and testing sets, or when you need a subset for exploratory analysis.
Imagine you have the following data.frame:
# Create a simple data.frame with names and ages
df <- data.frame(
Name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
Age = c(25, 30, 35, 28, 22)
)
# Randomly select 3 rows from the data.frame without replacement
df_sample <- df[sample(nrow(df), 3), ]
# Print the sampled data.frame
print(df_sample)
Name Age
4 Diana 28
5 Eve 22
2 Bob 30
sample()
returns the total number of rows in the data.frame.This will return a new data.frame with 3 randomly selected rows, which might be useful for quick exploratory analysis or as input for further processing. Sampling rows using this technique is common when the dataset is large and you need to quickly check a random subset .
Matrices in R are two-dimensional arrays, and you can also use the sample()
function to work with them. The following examples demonstrate two common approaches to sampling from a matrix: sampling random elements from the entire matrix and sampling random rows.
You might want to pick random elements from a matrix regardless of rows and columns. For example:
# Create a 3x3 matrix with numbers from 1 to 9
matrix_data <- matrix(1:9, nrow = 3, byrow = TRUE)
print(matrix_data)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
# Sample 4 random elements from the matrix (note: the matrix is treated as a vector)
random_elements <- sample(matrix_data, 4)
# Print the sampled elements
print(random_elements)
[1] 8 2 5 6
Explanation: - R internally treats the matrix as a vector when passed to sample()
. Hence, the function picks 4 random values from all values in the matrix.
If you need to randomly select rows (maintaining the matrix structure), you can do this by sampling the row indices:
# Sample 2 random rows from the matrix
random_rows <- matrix_data[sample(nrow(matrix_data), 2), ]
# Print the sampled rows, which still keep the matrix-like structure
print(random_rows)
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 1 2 3
Explanation:
This technique is particularly useful when dealing with multivariate data stored as a matrix and you wish to preserve entire rows for subsequent analysis .
Sometimes, you need elements to have a higher chance of being selected. This is where the prob
argument comes into play. For example, let’s say you have a vector representing four options and they should not all have the same chance of appearing in the sample:
# Define a vector representing four different items
items <- c("Apple", "Banana", "Cherry", "Date")
# Define the weights so that "Date" has the highest probability of selection
weights <- c(0.1, 0.2, 0.3, 0.4)
# Sample 3 elements from items using the weights (without replacement)
weighted_sample <- sample(items, 3, replace = FALSE, prob = weights)
# Print the weighted sample
print(weighted_sample)
[1] "Date" "Cherry" "Apple"
Explanation:
prob
parameter assigns selection probabilities. In this example, “Date” (with the highest weight of 0.4) is more likely to be picked.set.seed()
for Reproducible ResultsIn random sampling, you might want to generate the same random output each time you run your code—especially when sharing code with colleagues or including examples in your reports. R’s set.seed()
function lets you do exactly that.
For example:
# Set the seed to ensure reproducibility
set.seed(42)
# Sample 5 numbers without replacement from 1 to 10
reproducible_sample <- sample(1:10, 5)
print(reproducible_sample)
[1] 1 5 10 8 2
# Re-run with the same seed to see the same sample
set.seed(42)
reproducible_sample <- sample(1:10, 5)
print(reproducible_sample)
[1] 1 5 10 8 2
Using set.seed()
guarantees that the random sequence is the same in every run, which is important for debugging and sharing reproducible research .
Now that we have covered the basics and more advanced examples of using the sample()
function, it’s time for you to practice! Here are some exercises to try on your own:
Exercise 1:
Generate a random sample of 10 elements from the letters of the English alphabet without replacement.
Hint: Use letters
(a built-in vector in R) and the sample()
function.
[1] "d" "r" "q" "o" "g" "z" "e" "n" "y" "w"
Exercise 2:
Sample 5 elements with replacement from the vector c(10, 20, 30, 40, 50)
.
[1] 30 10 10 30 40
Exercise 3:
Create a vector of weights and perform weighted random sampling to select 3 elements from the vector c("apple", "banana", "orange", "grape")
.
Make sure that “orange” has the highest probability of being selected.
sample()
Function:
The sample()
function in R is robust for drawing random elements from vectors, data.frames, and even matrices.
With vs. Without Replacement:
– Use replace = FALSE
for unique sampling.
– Use replace = TRUE
when you allow repeated values in the sample.
Working with Complex Data Structures:
You can sample rows from data.frames or from entire matrices by using functions such as nrow()
to index your data.
Weighted Sampling:
The prob
argument allows you to specify weights for elements, making some more likely to be sampled than others.
Reproducibility:
Use set.seed()
to ensure that your random samples are the same across multiple runs, which is critical for reproducible research.
These points will help guide your use of random sampling.
Q1: What does sampling with replacement mean?
A: Sampling with replacement means that when you choose an element, it is “put back” into the pool of values. This allows the same element to be selected more than once. For example, using sample(numbers, 5, replace = TRUE)
might select one number twice while missing another .
Q2: How is weighted random sampling useful?
A: Weighted random sampling allows you to assign different probabilities to each element in your vector. This is useful in simulations where certain outcomes are more likely than others. By using the prob
argument, you can simulate more realistic scenarios where not all elements have an equal chance of being selected.
Q3: Can I use the sample()
function on data.frames?
A: Yes, you can. By sampling the row indices using sample(nrow(your_dataframe), size)
, you can randomly select rows from a data.frame. This method is especially useful for creating training and testing sets.
Q4: How do I ensure that my random sampling is reproducible?
A: Use the set.seed()
function at the start of your script. Setting a seed (e.g., set.seed(42)
) ensures that the sequence of random numbers—and thus your samples—is the same each time you run your code.
Q5: What if I try to sample more elements than are available in my vector?
A: If you attempt to sample without replacement more elements than exist in the vector, R will return an error. To prevent this, either ensure that the requested size
does not exceed the length of the vector or set replace = TRUE
if duplicates are acceptable.
Random sampling is an essential skill for any R programmer. Whether you’re working with simple vectors, data.frames, or matrices, the sample()
function allows you to extract random subsets of your data with ease. In this article, we covered how to use sample()
for both non-repetitive selection (without replacement) and for allowing repeated values (with replacement). We also touched on weighted sampling—useful when some elements should be more likely to appear—and demonstrated how to achieve reproducibility using set.seed()
.
Make sure to experiment with the different options provided by the sample()
function as part of your workflow.
If you found this article helpful, please feel free to comment below, share it on your favorite social media channels, or subscribe for more R programming tutorials.
sample()
functionprob
argumentset.seed()
Happy Coding! 🚀
You can connect with me at any one of the below:
Telegram Channel here: https://t.me/steveondata
LinkedIn Network here: https://www.linkedin.com/in/spsanderson/
Mastadon Social here: https://mstdn.social/@stevensanderson
RStats Network here: https://rstats.me/@spsanderson
GitHub Network here: https://github.com/spsanderson
Bluesky Network here: https://bsky.app/profile/spsanderson.com
My Book: Extending Excel with Python and R here: https://packt.link/oTyZJ
You.com Referral Link: https://you.com/join/EHSLDTL6