```
<- 1:100
ages <- sample(ages, size = 10)
random_sample random_sample
```

` [1] 53 13 84 50 55 9 12 38 79 15`

rtip

Author

Steven P. Sanderson II, MPH

Published

June 21, 2023

Sampling is a fundamental technique in data analysis and statistical modeling. It allows us to draw meaningful insights and make inferences about a larger population based on a representative subset. In the world of R programming, the `sample()`

function stands as a versatile tool that enables us to create random samples efficiently. In this post, we will explore the `sample()`

function and its various applications through a series of plain English examples.

First, let’s take a look at the syntax:

where:

`x`

is the dataset or vector from which to take the sample`size`

is the number of elements to include in the sample`replace`

is a logical value that indicates whether or not to allow sampling with replacement (the default is FALSE)`prob`

is a vector of probabilities that can be used to weight the sample (the default is NULL)

Let’s say we have a dataset containing the ages of 100 people. To create a random sample of 10 individuals, we can use the `sample()`

function as follows:

` [1] 53 13 84 50 55 9 12 38 79 15`

The `sample()`

function randomly selects 10 values from the `ages`

vector, without replacement, resulting in a new vector named `random_sample`

. This technique represents simple random sampling, where each individual in the population has an equal chance of being included in the sample.

In some scenarios, we might want to allow repeated selections from the population. Let’s say we have a bag with colored balls, and we want to simulate drawing 5 balls with replacement. Here’s how we can achieve it:

```
colors <- c("red", "blue", "green", "yellow")
sample_with_replacement <- sample(colors, size = 5, replace = TRUE)
sample_with_replacement
```

`[1] "yellow" "yellow" "green" "green" "red" `

The `sample()`

function, with the `replace = TRUE`

argument, enables us to randomly select 5 colors from the `colors`

vector, allowing duplicates. This approach represents sampling with replacement, where each selection is independent of the previous ones.

In certain situations, we may want to assign different probabilities to elements in the population. Let’s assume we have a list of items and corresponding weights denoting their probabilities of being selected. We can use the sample() function with the `prob`

parameter to achieve weighted sampling. Consider the following example:

```
library(dplyr)
items <- c("apple", "banana", "orange")
weights <- c(0.4, 0.2, 0.4)
weighted_sample <- sample(items, size = 1, prob = weights)
weighted_sample
```

`[1] "apple"`

```
tibble(x = 1:10) |>
group_by(x) |>
mutate(rs = sample(items, size = 1, prob = weights)) |>
ungroup()
```

```
# A tibble: 10 × 2
x rs
<int> <chr>
1 1 orange
2 2 apple
3 3 apple
4 4 apple
5 5 apple
6 6 orange
7 7 orange
8 8 orange
9 9 apple
10 10 orange
```

By specifying the `prob`

argument with the corresponding weights, the `sample()`

function randomly selects a single item from the `items`

vector. The probability of each item being chosen is proportional to its weight. In this case, “apple” and “orange” have a higher chance (40% each) of being selected compared to “banana” (20%).

Stratified sampling involves dividing the population into subgroups or strata and then sampling from each stratum proportionally. Let’s assume we have a dataset of students’ grades in different subjects, and we want to select a sample that maintains the proportion of students from each subject. We can achieve this using the `sample()`

function along with additional parameters. Consider the following example:

```
subjects <- c("Math", "Science", "English", "History")
grades <- c(80, 90, 85, 70, 75, 95, 60, 92, 88, 83, 78, 91)
strata <- factor(subjects)
stratified_sample <- unlist(
by(
grades,
rep(strata, 3),
FUN = function(x) sample(x, size = 2)
)
)
stratified_sample
```

```
English1 English2 History1 History2 Math1 Math2 Science1 Science2
78 60 92 91 80 75 90 95
```

In this example, we use the by() function to group the grades by subject (`strata`

). Then, we apply the sample() function to each subgroup (subject) using the FUN argument. The result is a stratified sample of two grades from each subject, maintaining the relative proportions of students in the final sample.

The sample() function in R provides a powerful tool for generating random samples for various purposes. Whether you need simple random sampling, sampling with replacement, weighted sampling, or even stratified sampling, the sample() function can cater to your needs. By understanding and utilizing its various parameters, you can leverage the capabilities of sampling to gain insights from your data and make informed decisions. So go ahead, experiment with different sampling techniques using the sample() function, and unlock the potential of your data!