Data Wrangling

Before beginning the exercises, create a sample data frame named employees for use in the exercises.

Select only the Name and Department columns of employees who are older than 30.

Solution:

employees |> 
  filter(Age > 30) |> 
  select(Name, Department)

Create a new column called Bonus that is 10% of Salary, then arrange the data frame by Bonus in descending order.

Solution:

employees |>
  mutate(Bonus = Salary * 0.10) |>
  arrange(desc(Bonus))

Filter for employees who work in the “IT” department, select only their Name and Salary, and arrange the result in ascending order by Salary.

Solution:

employees |>
  filter(Department == "IT") |>
  select(Name, Salary) |>
  arrange(Salary)

Find the average Salary by Department.

Solution:

employees |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary))

For employees with more than 10 years of experience, create a new column called Seniority with values “Senior” if Years_Experience is greater than 10, otherwise “Junior”. Select only the Name, Years_Experience, and Seniority columns.

Solution:

employees |>
  filter(Years_Experience > 10) |>
  mutate(Seniority = ifelse(Years_Experience > 10, "Senior", "Junior")) |>
  select(Name, Years_Experience, Seniority)

Calculate the average Salary for employees over the age of 30.

Solution:

employees |>
  filter(Age > 30) |>
  summarise(avg_salary = mean(Salary))

Calculate the total Salary for each department and then create a new column Total_Budget that is 1.1 times this total (representing a 10% budget increase).

Solution:

employees |>
  group_by(Department) |>
  summarise(Total_Salary = sum(Salary)) |>
  mutate(Total_Budget = Total_Salary * 1.10)

Filter the employees who have a Salary above 60,000, arrange them by Age, and select only the Name, Age, and Salary columns.

Solution:

employees |>
  filter(Salary > 60000) |>
  arrange(Age) |>
  select(Name, Age, Salary)

Add a new column Age_Group that categorizes employees as “Young” if Age is less than 30, and “Experienced” if 30 or older. Then filter for employees in the “Experienced” age group.

Solution:

employees |>
  mutate(Age_Group = ifelse(Age < 30, "Young", "Experienced")) |>
  filter(Age_Group == "Experienced")

Find the average Years_Experience by Department and arrange the departments in descending order of experience.

Solution:

employees |>
  group_by(Department) |>
  summarise(avg_experience = mean(Years_Experience)) |>
  arrange(desc(avg_experience))

Calculate the average Salary for employees in the “Finance” and “Marketing” departments only.

Solution:

employees |>
  filter(Department %in% c("Finance", "Marketing")) |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary))

Add a new column Adjusted_Salary that is 95% of the current Salary for employees in “HR” and “Finance”. Then calculate the average Adjusted_Salary for each department.

Solution:

employees |>
  mutate(Adjusted_Salary = ifelse(Department %in% c("HR", "Finance"), Salary * 0.95, Salary)) |>
  group_by(Department) |>
  summarise(avg_adjusted_salary = mean(Adjusted_Salary))

Select the Department and Salary columns, then calculate the sum of Salary for each department.

Solution:

employees |>
  select(Department, Salary) |>
  group_by(Department) |>
  summarise(total_salary = sum(Salary))

Filter for employees over the age of 40, then group by Department and calculate the count of employees in each department.

Solution:

employees |>
  filter(Age > 40) |>
  group_by(Department) |>
  summarise(employee_count = n())

Create a column Experience_Ratio that is Years_Experience divided by Age. Filter for employees with a Experience_Ratio greater than 0.3, then arrange by Experience_Ratio in descending order.

Solution:

employees |>
  mutate(Experience_Ratio = Years_Experience / Age) |>
  filter(Experience_Ratio > 0.3) |>
  arrange(desc(Experience_Ratio))

Filter for employees with a Salary greater than 60,000, then calculate the minimum and maximum Age in this group, and arrange by minimum Age.

Solution:

employees |>
  filter(Salary > 60000) |>
  summarise(min_age = min(Age), max_age = max(Age)) |>
  arrange(min_age)

For each department, calculate the average Salary, and then add a column Above_Avg_Salary that marks TRUE if an employee’s Salary is above their department’s average, and FALSE otherwise.

Solution:

employees |>
  group_by(Department) |>
  mutate(avg_salary = mean(Salary)) |>
  mutate(Above_Avg_Salary = Salary > avg_salary) |>
  select(Name, Department, Salary, Above_Avg_Salary)

Filter for employees in the “IT” department, add a column called Tenure_Level that labels them as “Experienced” if they have 10 or more years of experience, otherwise “New”. Select only the Name, Tenure_Level, and Salary.

Solution:

employees |>
  filter(Department == "IT") |>
  mutate(Tenure_Level = ifelse(Years_Experience >= 10, "Experienced", "New")) |>
  select(Name, Tenure_Level, Salary)

Group by Department and calculate the total number of employees in each department. Arrange in descending order by this count.

Solution:

employees |>
  group_by(Department) |>
  summarise(employee_count = n()) |>
  arrange(desc(employee_count))

Filter for employees in “HR” or “Marketing”, group by Department, calculate the average Salary and Years_Experience in each department, and arrange by average Salary in ascending order.

Solution:

employees |>
  filter(Department %in% c("HR", "Marketing")) |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary), avg_experience = mean(Years_Experience)) |>
  arrange(avg_salary)

Perform a left join on employees2 and salaries based on EmployeeID. Show all columns, and observe which employees have missing salary information.

Solution:

employees2 |>
  left_join(salaries, by = "EmployeeID")

Perform a right join on employees2 and salaries based on EmployeeID. Check which employees in the salaries data frame do not have corresponding information in employees.

Solution:

employees2 |>
  right_join(salaries, by = "EmployeeID")

Use an inner join to combine employees2 and salaries by EmployeeID. Display only the records where both employee information and salary information are available.

Solution:

employees2 |>
  inner_join(salaries, by = "EmployeeID")

Use a full join to combine employees2 and salaries based on EmployeeID. Identify any employees or salary records without a match in the other table.

Solution:

employees2 |>
  full_join(salaries, by = "EmployeeID")

Perform a left join on employees2 and salaries based on EmployeeID, then filter to show only employees with salaries above 60,000.

Solution:

employees2 |>
  left_join(salaries, by = "EmployeeID") |>
  filter(Salary > 60000)

First, perform a left join between employees2 and salaries based on EmployeeID. Then, join the resulting data with departments to match each employee with their department manager based on the Department column.

Solution:

employees2 |>
  left_join(salaries, by = "EmployeeID") |>
  left_join(departments, by = c("Department" = "DepartmentName"))

Perform an inner join between employees2 and salaries based on EmployeeID. Then, calculate the average salary for each department.

Solution:

employees2 |>
  inner_join(salaries, by = "EmployeeID") |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary, na.rm = TRUE))

Use left_join to join the employees2 data frame to itself on the Department column to compare employees within the same department.

Solution:

employees2 |>
  left_join(employees, by = "Department", suffix = c("_1", "_2")) |>
  filter(EmployeeID_1 != EmployeeID_2)

Use full_join to combine employees2 and departments by matching the Department column in employees2 with the DepartmentName column in departments.

Solution:

employees2 |>
  full_join(departments, by = c("Department" = "DepartmentName"))

Perform a full join on employees2 and salaries based on EmployeeID. Filter the result to show only the records where either Name or Salary is NA to identify employees without salary data or vice versa.

Solution:

employees2 |>
  full_join(salaries, by = "EmployeeID") |>
  filter(is.na(Name) | is.na(Salary))