Multiple Choice: Data Wrangling with dplyr

Questions

Given df <- data.frame(A = c(5, 3, 2, 8), B = c("X", "Y", "Z", "X")), which command will filter rows where column A is greater than 4?

A. filter(df, A > 4)
B. df |> filter(A > 4)
C. df |> select(A > 4)
D. Both A and B

If df <- data.frame(A = 1:5, B = letters[1:5]), what will df |> select(B) return?

A. A data frame with both columns A and B
B. Only column B as a data frame
C. A vector with values in column B
D. An error

Which dplyr function would you use to create a new column C that is double the values in column A in df <- data.frame(A = 1:3, B = 4:6)?

A. df |> mutate(C = A * 2)
B. df |> select(C = A * 2)
C. df |> filter(C = A * 2)
D. df |> arrange(C = A * 2)

Given df <- data.frame(A = c(3, 1, 2), B = c("apple", "banana", "apple")), which command will sort df by column A in ascending order?

A. df |> filter(A)
B. df |> select(A)
C. df |> arrange(A)
D. df |> mutate(A)

If df <- data.frame(A = c("apple", "banana", "apple"), B = c(5, 10, 15)), which command will calculate the mean of B grouped by A?
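
A grouped mean is usually built from two steps: group_by() to define the groups and summarise() to collapse each group to one value. The following is a minimal sketch, assuming dplyr is installed and R 4.1 or later (for the |> pipe); the column name mean_B is an illustrative choice, not part of the question.

```r
library(dplyr)

df <- data.frame(A = c("apple", "banana", "apple"), B = c(5, 10, 15))

# Group rows by A, then reduce each group to the mean of its B values.
# mean_B is a hypothetical name chosen for this sketch.
result <- df |>
  group_by(A) |>
  summarise(mean_B = mean(B))

print(result)
```

With this data, both groups average 10: apple is (5 + 15) / 2 and banana is its single value, 10.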

What will df |> filter(B == "apple") do for df <- data.frame(A = 1:3, B = c("apple", "banana", "apple"))?

A. Filter rows where A equals “apple”
B. Filter rows where B equals “apple”
C. Remove rows where B equals “apple”
D. Select column B only

To select columns A and C from a data frame df, which command is correct?

A. df |> select(A, C)
B. df |> filter(A, C)
C. df |> mutate(A, C)
D. df |> arrange(A, C)

Given df <- data.frame(A = c(1, 2, 3), B = c(10, 20, 30)), which command will add a new column C with values A + B?

A. df |> select(C = A + B)
B. df |> filter(C = A + B)
C. df |> mutate(C = A + B)
D. df |> arrange(C = A + B)

If df <- data.frame(A = c("X", "Y", "X", "Y"), B = c(2, 4, 6, 8)), which command will sum the values of B for each group in A?
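
Per-group totals follow the same group_by() plus summarise() pattern, with sum() as the aggregating function. This is one possible sketch, assuming dplyr is installed; the column name total_B is an illustrative choice.

```r
library(dplyr)

df <- data.frame(A = c("X", "Y", "X", "Y"), B = c(2, 4, 6, 8))

# One output row per distinct value of A, holding the sum of that group's B.
# total_B is a hypothetical name chosen for this sketch.
result <- df |>
  group_by(A) |>
  summarise(total_B = sum(B))

print(result)
```

Here group X totals 2 + 6 = 8 and group Y totals 4 + 8 = 12.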

Which command will arrange df in descending order by column B for df <- data.frame(A = c("apple", "banana", "cherry"), B = c(5, 3, 8))?

A. df |> arrange(desc(B))
B. df |> arrange(B, desc)
C. df |> filter(desc(B))
D. df |> select(desc(B))

Given df <- data.frame(A = c(10, 20, 30), B = c(100, 200, 300)), which command will filter rows where B is greater than 150?

A. df |> filter(B > 150)
B. df |> filter(A > 150)
C. df |> select(B > 150)
D. df |> mutate(B > 150)

If df <- data.frame(A = c(3, 1, 4), B = c(2, 6, 8)), which command will sort df by column B in descending order?

A. df |> arrange(B)
B. df |> arrange(desc(B))
C. df |> mutate(B)
D. df |> filter(B)

Given df <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6)), which command will select only column A from df?

A. df |> select(A)
B. df |> filter(A)
C. df |> mutate(A)
D. df |> summarise(A)

If df <- data.frame(A = c("apple", "banana", "cherry"), B = c(10, 20, 30)), which command will create a new column C that is the square of B?

A. df |> mutate(C = B^2)
B. df |> filter(C = B^2)
C. df |> select(C = B^2)
D. df |> arrange(C = B^2)

What does the command df |> summarise(avg_B = mean(B)) do for df <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))?

A. Summarizes the data frame by calculating the mean of column B
B. Summarizes the data frame by calculating the mean of column A
C. Creates a new column avg_B with the mean of column B
D. Filters the rows based on the mean of B

If df <- data.frame(A = c("X", "Y", "X", "Y"), B = c(5, 10, 15, 20)), which command will calculate the total sum of B for each group in A?

Which dplyr function allows you to select columns by their position in the data frame?

A. select()
B. filter()
C. mutate()
D. arrange()

Given df <- data.frame(A = c(10, 20, 30), B = c(100, 200, 300)), which command will filter rows where A is less than or equal to 20?

A. df |> filter(A <= 20)
B. df |> select(A <= 20)
C. df |> mutate(A <= 20)
D. df |> summarise(A <= 20)

If df <- data.frame(A = c("X", "Y", "Z"), B = c(5, 6, 7)), which command will create a new column C that joins the values of A and B into a single string?

A. df |> mutate(C = paste(A, B))
B. df |> mutate(C = A + B)
C. df |> select(C = A + B)
D. df |> filter(C = A + B)

Which dplyr function would you use to group the data frame df by column A and then filter out groups where the sum of B is less than 10?
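
When filter() is applied to a grouped data frame, summary functions such as sum() are evaluated within each group, so a single condition can keep or drop whole groups. The sketch below assumes dplyr is installed; the question gives no data, so the data frame here is hypothetical.

```r
library(dplyr)

# Hypothetical example data; the question does not supply a data frame.
df <- data.frame(A = c("X", "Y", "X", "Y"), B = c(2, 4, 6, 8))

# sum(B) is computed per group of A; only rows from groups whose
# total is at least 10 survive. ungroup() drops the grouping afterwards.
result <- df |>
  group_by(A) |>
  filter(sum(B) >= 10) |>
  ungroup()

print(result)
```

In this data, group X sums to 8 and is dropped, while group Y sums to 12, so both Y rows are kept.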

Given df <- data.frame(A = c(5, 10, 15), B = c("apple", "banana", "cherry")), which command will return rows where column B is either “apple” or “banana”?

A. df |> filter(B == "apple" | B == "banana")
B. df |> filter(B %in% c("apple", "banana"))
C. Both A and B
D. None of the above

If df <- data.frame(A = c(5, 6, 7), B = c(3, 6, 9)), which command will create a new column C that is the ratio of B to A?

A. df |> mutate(C = B / A)
B. df |> filter(C = B / A)
C. df |> select(C = B / A)
D. df |> arrange(C = B / A)

What does the command df |> summarise(avg_A = mean(A, na.rm = TRUE)) do for df <- data.frame(A = c(1, 2, 3, NA))?

A. It calculates the mean of column A, ignoring NA values
B. It calculates the sum of column A, ignoring NA values
C. It replaces NA values with 0 in column A
D. It filters out rows where A is NA

Which of the following commands will sort the data frame df by column A in descending order and then by column B in ascending order?

A. df |> arrange(desc(A), B)
B. df |> arrange(A, desc(B))
C. df |> arrange(A, B)
D. df |> arrange(desc(A, B))

Given df <- data.frame(A = c("X", "Y", "X", "Z"), B = c(10, 20, 30, 40)), which command will filter df to only include rows where A is either “X” or “Z”?

A. df |> filter(A == "X" | A == "Z")
B. df |> filter(A %in% c("X", "Z"))
C. Both A and B
D. None of the above

If df <- data.frame(A = c(10, 20, 30), B = c(100, 200, 300)), which command will create a new column C that is 10% of column B?

A. df |> mutate(C = B * 0.1)
B. df |> filter(C = B * 0.1)
C. df |> arrange(C = B * 0.1)
D. df |> select(C = B * 0.1)

Given df <- data.frame(A = c(1, 2, 3, 4, 5), B = c(10, 20, 30, 40, 50)), which command will return the rows where column B is greater than 25 and less than 45?
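
A range condition can be written as two comparisons inside one filter() call; conditions separated by commas are combined with a logical AND. This is a minimal sketch, assuming dplyr is installed.

```r
library(dplyr)

df <- data.frame(A = c(1, 2, 3, 4, 5), B = c(10, 20, 30, 40, 50))

# Keep rows where B is strictly between 25 and 45.
# The comma is equivalent to writing B > 25 & B < 45.
result <- df |>
  filter(B > 25, B < 45)

print(result)
```

Only the rows with B equal to 30 and 40 satisfy both conditions.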

Which dplyr function is used to rename an existing column in a data frame?

A. select()
B. mutate()
C. rename()
D. filter()

If df <- data.frame(A = c(5, 6, 7), B = c(8, 9, 10)), which command will return a summary of the total sum of A and B?
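
Without group_by(), summarise() collapses the whole data frame to a single row. The sketch below assumes dplyr is installed and reads "total sum of A and B" as one total per column, with a combined total added for comparison; the names total_A, total_B, and grand_total are illustrative choices.

```r
library(dplyr)

df <- data.frame(A = c(5, 6, 7), B = c(8, 9, 10))

# Collapse the data frame to one row of column totals.
# grand_total covers the alternative reading of the question.
totals <- df |>
  summarise(
    total_A = sum(A),
    total_B = sum(B),
    grand_total = sum(A) + sum(B)
  )

print(totals)
```

For this data, column A sums to 18, column B sums to 27, and the combined total is 45.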

Which command will group the data frame df by column A and calculate the mean of column B for each group, then filter to only include groups where the mean of B is greater than 15?
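
One way to express this is to compute the per-group mean with summarise() and then filter the summary table. The sketch assumes dplyr is installed; the question gives no data, so the data frame and the column name mean_B are hypothetical.

```r
library(dplyr)

# Hypothetical example data; the question does not supply a data frame.
df <- data.frame(A = c("X", "Y", "X", "Y"), B = c(5, 10, 15, 30))

# Step 1: one row per group with its mean of B.
# Step 2: keep only the groups whose mean exceeds 15.
result <- df |>
  group_by(A) |>
  summarise(mean_B = mean(B)) |>
  filter(mean_B > 15)

print(result)
```

Group X has a mean of 10 and is dropped; group Y has a mean of 20 and is kept.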