Data Wrangling

Before you begin, create a sample data frame named employees to use in questions 1-20.
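A minimal sketch of such a data frame, assuming only the columns the exercises refer to (Name, Department, Age, Salary, Years_Experience) and that dplyr is installed. The rows and values are invented for illustration, so results will differ from those of the exercise's own dataset.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

glimpse(employees)
```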

Dataset for questions 1-20

Dataset for questions 21-30 (data frames employees2, salaries, and departments)
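A hypothetical stand-in for these three tables, assuming the key columns named in questions 21-30 (EmployeeID, Department, and DepartmentName as in question 29). The Manager column and every value are invented; some IDs and departments are deliberately left unmatched so that the joins have something to reveal.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
# IDs 4 and 5 have no salary row; ID 6 has no employee row
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)
# No "Marketing" row; "Legal" has no employees
departments <- data.frame(
  DepartmentName = c("IT", "HR", "Finance", "Legal"),
  Manager        = c("Mia", "Noah", "Omar", "Pia")
)
```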

1. Filter and Select:

Select only the Name and Department columns of employees who are older than 30.
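A possible solution, sketched with dplyr against invented sample rows (rebuilt here so the block runs on its own; the values are hypothetical):

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Keep employees older than 30, then keep only two columns
result <- employees |>
  filter(Age > 30) |>
  select(Name, Department)

print(result)
```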

2. Mutate and Arrange:

Create a new column called Bonus that is 10% of Salary, then arrange the data frame by Bonus in descending order.
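One way to do this, again on hypothetical sample rows: `mutate()` adds the column and `arrange(desc(...))` sorts from largest to smallest.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Bonus is 10% of Salary; highest bonus first
result <- employees |>
  mutate(Bonus = Salary * 0.10) |>
  arrange(desc(Bonus))

print(result)
```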

3. Filter, Select, and Arrange:

Filter for employees who work in the “IT” department, select only their Name and Salary, and arrange the result in ascending order by Salary.
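A sketch of the three steps chained with the native pipe, using invented sample rows:

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# IT employees only, two columns, lowest salary first
result <- employees |>
  filter(Department == "IT") |>
  select(Name, Salary) |>
  arrange(Salary)

print(result)
```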

4. Group By and Summarize:

Find the average Salary by Department.
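A possible solution on hypothetical sample rows; `summarise()` returns one row per department, sorted by department name.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# One summary row per department
result <- employees |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary))

print(result)
```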

5. Filter, Mutate, and Select:

For employees with more than 10 years of experience, create a new column called Seniority with values “Senior” if Years_Experience is greater than 10, otherwise “Junior”. Select only the Name, Years_Experience, and Seniority columns.
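A literal sketch of the task on invented sample rows. Because the filter already keeps only employees with more than 10 years of experience, every remaining row is labeled "Senior"; the `if_else()` branch for "Junior" only matters if the filter is dropped or changed.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Filter first, then label, then keep three columns
result <- employees |>
  filter(Years_Experience > 10) |>
  mutate(Seniority = if_else(Years_Experience > 10, "Senior", "Junior")) |>
  select(Name, Years_Experience, Seniority)

print(result)
```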

6. Filter and Summarize:

Calculate the average Salary for employees over the age of 30.
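A sketch on hypothetical sample rows; without `group_by()`, `summarise()` collapses the filtered rows into a single summary row.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Average salary of everyone over 30 (one row)
result <- employees |>
  filter(Age > 30) |>
  summarise(avg_salary = mean(Salary))

print(result)
```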

7. Group By, Mutate, and Summarize:

Calculate the total Salary for each department and then create a new column Total_Budget that is 1.1 times this total (representing a 10% budget increase).
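One possible ordering, on invented sample rows: summarise the totals first, then derive Total_Budget from them with `mutate()`.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Total per department, then a 10% increase on that total
result <- employees |>
  group_by(Department) |>
  summarise(Total_Salary = sum(Salary)) |>
  mutate(Total_Budget = Total_Salary * 1.1)

print(result)
```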

8. Filter, Arrange, and Select:

Filter for employees with a Salary above 60,000, arrange them by Age, and select only the Name, Age, and Salary columns.
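A possible solution, sketched on hypothetical sample rows:

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Salaries above 60,000, youngest first, three columns
result <- employees |>
  filter(Salary > 60000) |>
  arrange(Age) |>
  select(Name, Age, Salary)

print(result)
```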

9. Mutate and Filter:

Add a new column Age_Group that categorizes employees as “Young” if Age is less than 30, and “Experienced” if 30 or older. Then filter for employees in the “Experienced” age group.
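A sketch on invented sample rows; `if_else()` assigns the label and the filter then keeps one group.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Under 30 is "Young"; 30 or older is "Experienced"
result <- employees |>
  mutate(Age_Group = if_else(Age < 30, "Young", "Experienced")) |>
  filter(Age_Group == "Experienced")

print(result)
```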

10. Group By, Summarize, and Arrange:

Find the average Years_Experience by Department and arrange the departments in descending order of experience.
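One way to chain the three verbs, on hypothetical sample rows:

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Mean experience per department, most experienced first
result <- employees |>
  group_by(Department) |>
  summarise(avg_experience = mean(Years_Experience)) |>
  arrange(desc(avg_experience))

print(result)
```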

11. Filter, Group By, and Summarize:

Calculate the average Salary for each of the “Finance” and “Marketing” departments, excluding all other departments.
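A possible solution on invented sample rows; `%in%` keeps both departments in a single filter step.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Only Finance and Marketing, one average per department
result <- employees |>
  filter(Department %in% c("Finance", "Marketing")) |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary))

print(result)
```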

12. Mutate, Group By, and Summarize:

Add a new column Adjusted_Salary that equals 95% of Salary for employees in “HR” and “Finance” and the unchanged Salary for everyone else. Then calculate the average Adjusted_Salary for each department.

Solution:

library(dplyr)

# Scale Salary by 0.95 for HR and Finance only; other departments keep their Salary
employees |>
  mutate(Adjusted_Salary = ifelse(Department %in% c("HR", "Finance"), Salary * 0.95, Salary)) |>
  group_by(Department) |>
  summarise(avg_adjusted_salary = mean(Adjusted_Salary))

13. Select and Group By:

Select the Department and Salary columns, then calculate the sum of Salary for each department.
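A sketch on hypothetical sample rows; the `select()` step is not required for the sum, but it keeps the intermediate data frame to the two columns the exercise names.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Two columns, then the salary total per department
result <- employees |>
  select(Department, Salary) |>
  group_by(Department) |>
  summarise(total_salary = sum(Salary))

print(result)
```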

14. Filter and Group By:

Filter for employees over the age of 40, then group by Department and calculate the count of employees in each department.
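A possible solution on invented sample rows; `n()` counts the rows in each group.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Headcount of over-40 employees per department
result <- employees |>
  filter(Age > 40) |>
  group_by(Department) |>
  summarise(n = n())

print(result)
```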

15. Mutate, Filter, and Arrange:

Create a column Experience_Ratio that is Years_Experience divided by Age. Filter for employees with an Experience_Ratio greater than 0.3, then arrange by Experience_Ratio in descending order.
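A sketch on hypothetical sample rows:

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Share of life spent working, keep ratios above 0.3, largest first
result <- employees |>
  mutate(Experience_Ratio = Years_Experience / Age) |>
  filter(Experience_Ratio > 0.3) |>
  arrange(desc(Experience_Ratio))

print(result)
```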

16. Filter, Summarize, and Arrange:

Filter for employees with a Salary greater than 60,000, then calculate the minimum and maximum Age in this group, and arrange by minimum Age.
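A literal sketch on invented sample rows. Without a `group_by()`, `summarise()` returns a single row, so the final `arrange()` has nothing to reorder; it only has an effect if the summary is computed per group (for example per Department, which the exercise does not specify).

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Age range of the high earners (one summary row)
result <- employees |>
  filter(Salary > 60000) |>
  summarise(min_age = min(Age), max_age = max(Age)) |>
  arrange(min_age)

print(result)
```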

17. Group By and Mutate:

For each department, calculate the average Salary, and then add a column Above_Avg_Salary that marks TRUE if an employee’s Salary is above their department’s average, and FALSE otherwise.
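A sketch on hypothetical sample rows. Using `mutate()` on grouped data (instead of `summarise()`) keeps every employee row while computing the mean within each department; `ungroup()` removes the grouping afterwards.

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Department mean attached to each row, then compared to the row's Salary
result <- employees |>
  group_by(Department) |>
  mutate(
    avg_salary       = mean(Salary),
    Above_Avg_Salary = Salary > avg_salary
  ) |>
  ungroup()

print(result)
```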

18. Filter, Mutate, and Select:

Filter for employees in the “IT” department, then add a column called Tenure_Level that labels them “Experienced” if they have 10 or more years of experience and “New” otherwise. Select only the Name, Tenure_Level, and Salary columns.
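A possible solution on invented sample rows; note the `>=` so that exactly 10 years counts as "Experienced".

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# IT staff labeled by tenure (10+ years is "Experienced")
result <- employees |>
  filter(Department == "IT") |>
  mutate(Tenure_Level = if_else(Years_Experience >= 10, "Experienced", "New")) |>
  select(Name, Tenure_Level, Salary)

print(result)
```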

19. Group By, Summarize, and Arrange:

Group by Department and calculate the total number of employees in each department. Arrange in descending order by this count.
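A sketch on hypothetical sample rows (dplyr's `count(Department, sort = TRUE)` is a shorter equivalent, but the version below follows the exercise's group-then-summarise wording):

```r
library(dplyr)

# Hypothetical sample rows (invented values, not the exercise's own dataset)
employees <- data.frame(
  Name             = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"),
  Department       = c("IT", "HR", "IT", "Finance", "Marketing", "HR", "IT"),
  Age              = c(28, 35, 42, 31, 45, 26, 38),
  Salary           = c(55000, 48000, 72000, 65000, 61000, 45000, 58000),
  Years_Experience = c(4, 9, 15, 8, 20, 2, 12)
)

# Headcount per department, largest first
result <- employees |>
  group_by(Department) |>
  summarise(n = n()) |>
  arrange(desc(n))

print(result)
```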

20. Mixed Operations:

Filter for employees in “HR” or “Marketing”, group by Department, calculate the average Salary and Years_Experience in each department, and arrange by average Salary in ascending order.

Solution:

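library(dplyr)

# Keep HR and Marketing, then average salary and experience per department,
# listing the lower-paid department first
employees |>
  filter(Department %in% c("HR", "Marketing")) |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary), avg_experience = mean(Years_Experience)) |>
  arrange(avg_salary)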

21. Left Join:

Perform a left join on employees2 and salaries based on EmployeeID. Show all columns, and observe which employees have missing salary information.
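A sketch using small invented tables (rebuilt here so the block runs on its own). A left join keeps every row of employees2; employees without a salary row get NA in Salary.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)

# All employees, with Salary where a match exists
result <- left_join(employees2, salaries, by = "EmployeeID")
print(result)

# Employees with missing salary information
result |> filter(is.na(Salary))
```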

22. Right Join:

Perform a right join on employees2 and salaries based on EmployeeID. Check which records in the salaries data frame have no corresponding employee in employees2.
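A sketch on invented tables. A right join keeps every row of salaries, so salary records without a matching employee show NA in the employees2 columns (Name, Department).

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)

# All salary records, with employee details where a match exists
result <- right_join(employees2, salaries, by = "EmployeeID")
print(result)

# Salary records that have no employee
result |> filter(is.na(Name))
```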

23. Inner Join:

Use an inner join to combine employees2 and salaries by EmployeeID. Display only the records where both employee information and salary information are available.
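A sketch on invented tables; an inner join already drops unmatched rows on both sides, so no extra filter is needed.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)

# Only IDs present in both tables
result <- inner_join(employees2, salaries, by = "EmployeeID")
print(result)
```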

24. Full Join:

Use a full join to combine employees2 and salaries based on EmployeeID. Identify any employees or salary records without a match in the other table.
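A sketch on invented tables. A full join keeps every row from both sides; NA in Salary marks employees without a salary row, and NA in Name marks salary rows without an employee.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)

# Every ID from either table
result <- full_join(employees2, salaries, by = "EmployeeID")
print(result)

no_salary   <- result |> filter(is.na(Salary))  # employees without salary data
no_employee <- result |> filter(is.na(Name))    # salaries without an employee
```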

25. Join with Filtering:

Perform a left join on employees2 and salaries based on EmployeeID, then filter to show only employees with salaries above 60,000.
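A sketch on invented tables. `filter()` drops rows where the condition is NA, so employees with no salary disappear along with those at or below 60,000.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)

# Join, then keep salaries above 60,000
result <- employees2 |>
  left_join(salaries, by = "EmployeeID") |>
  filter(Salary > 60000)

print(result)
```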

26. Multiple Joins:

First, perform a left join between employees2 and salaries based on EmployeeID. Then, join the resulting data with departments to match each employee with their department manager based on the Department column.
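A sketch on invented tables, assuming departments stores the department name in a column called DepartmentName (as in question 29) and the manager in a hypothetical Manager column; the second join therefore maps Department to DepartmentName. If your departments table has a Department column, `by = "Department"` is enough.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)
departments <- data.frame(
  DepartmentName = c("IT", "HR", "Finance", "Legal"),
  Manager        = c("Mia", "Noah", "Omar", "Pia")  # hypothetical column
)

# Employees plus salary, then plus their department's manager
result <- employees2 |>
  left_join(salaries, by = "EmployeeID") |>
  left_join(departments, by = c("Department" = "DepartmentName"))

print(result)
```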

27. Summarize After Join:

Perform an inner join between employees2 and salaries based on EmployeeID. Then, calculate the average salary for each department.
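A sketch on invented tables; only employees with a salary row contribute to the department averages.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)

# Matched rows only, then one average per department
result <- employees2 |>
  inner_join(salaries, by = "EmployeeID") |>
  group_by(Department) |>
  summarise(avg_salary = mean(Salary))

print(result)
```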

28. Self-Join:

Use left_join to join the employees2 data frame to itself on the Department column to compare employees within the same department.
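A sketch on an invented table, assuming dplyr 1.1.0 or later for the `relationship` argument. Joining on Department pairs every employee with every colleague in the same department (including themselves), which is a many-to-many match; declaring it silences dplyr's warning. The `suffix` keeps the left-hand names and tags the right-hand copies, and the final filter (an optional extra step) drops self-pairs.

```r
library(dplyr)

# Hypothetical sample table (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)

# Every employee paired with every employee in the same department
pairs <- left_join(
  employees2, employees2,
  by = "Department",
  suffix = c("", "_colleague"),
  relationship = "many-to-many"
)

# Drop rows that pair an employee with themselves
colleagues <- pairs |> filter(EmployeeID != EmployeeID_colleague)
print(colleagues)
```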

29. Join with Different Key Columns:

Use full_join to combine employees2 and departments by matching the Department column in employees2 with the DepartmentName column in departments.
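A sketch on invented tables (the Manager column is hypothetical). A named vector in `by` matches columns with different names; the output keeps the left-hand name, Department.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
departments <- data.frame(
  DepartmentName = c("IT", "HR", "Finance", "Legal"),
  Manager        = c("Mia", "Noah", "Omar", "Pia")  # hypothetical column
)

# Match employees2$Department against departments$DepartmentName
result <- full_join(employees2, departments, by = c("Department" = "DepartmentName"))
print(result)
```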

30. Identify Missing Data with Outer Join:

Perform a full join on employees2 and salaries based on EmployeeID. Filter the result to show only the records where Name or Salary is NA, which identifies employees without salary data and salary records without a matching employee.
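A sketch on invented tables; `|` keeps a row when either column is missing.

```r
library(dplyr)

# Hypothetical sample tables (invented values, not the exercise's own data)
employees2 <- data.frame(
  EmployeeID = c(1, 2, 3, 4, 5),
  Name       = c("Ana", "Ben", "Cara", "Dev", "Eli"),
  Department = c("IT", "HR", "IT", "Finance", "Marketing")
)
salaries <- data.frame(
  EmployeeID = c(1, 2, 3, 6),
  Salary     = c(55000, 62000, 72000, 50000)
)

# Unmatched rows from either side of the join
result <- employees2 |>
  full_join(salaries, by = "EmployeeID") |>
  filter(is.na(Name) | is.na(Salary))

print(result)
```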