Text Data Cleaning

Somsak Chanaim

International College of Digital Innovation, CMU

July 15, 2025

Background

You are a data analyst at a company. You’ve received an employee dataset that contains inconsistencies, extra spaces, mixed cases, and improperly formatted values. Your job is to clean and transform this dataset using appropriate Excel functions to prepare it for reporting.

Raw Employee Data

Download from GOOGLE DRIVE

Employee_ID Full_Name Department_Code
1001 Mr. John Smith. IT01
1002 Ms. SARAH KIM mk02
1003 Dr. Somchai Wong hr03
1004 MRS. Anne GREEN Fn04
1005 ms. lena wong. HR03
1006 MR. PETER jackson mk02
1007 Dr. james o’neill it01
1008 Mrs. AMY chen fn04
1009 mr. george WASHINGTON HR03
1010 Ms. Emily Taylor mk02

1. Remove Extra Spaces using TRIM()

Problem: Column Full_Name contains irregular spaces.

Tast: Create a column Clean_Name that removes extra spaces.

Formula:

=TRIM(B2)

Example Output for Row 2 (B2 = “Mr. John Smith.\(~~\)”): Mr. John Smith.

2. Capitalize Name Properly using PROPER()

Problem: Names are inconsistently cased.

Tast: Create column Proper_Name with capitalized first letters.

Formula:

=PROPER(TRIM(B2))

Example Output: Ms. Sarah Kim

3. Extract Title using TEXTBEFORE()

Problem: You need to extract the title (Mr., Ms., etc.)

Tast: Create column Title.

Formula (modern Excel 365):

=TEXTBEFORE(E2," ")

Example Output: Ms.

4. Extract First and Last Name using TEXTSPLIT()

Tast: From Proper_Name, extract first and last name into three columns.

Formula:

=TEXTSPLIT(C2," ")

5. Format Email using LOWER() and CONCAT() or TEXTJOIN()

Problem: Create corporate email using first.last@company.com format.

Tast: Column Email should contain lowercase email.

Formula:

=LOWER(D2&"."&E2&"@company.com")

Assumes D2 = First Name, E2 = Last Name

Example Output: sarah.kim@company.com

6. Convert Department Code and Look Up Full Name using UPPER() + XLOOKUP()

Problem: Department codes are mixed-case. You also need to display the full department name.

a) Normalize Department Code

Formula:

=UPPER(F2)

b) Look up Department Name

Assume lookup table in range H2:I5:

Code Department
IT01 IT
MK02 Marketing
HR03 Human Resources
FN04 Finance

Formula to find name from code:

=XLOOKUP(G2, H2:H5, I2:I5, "Not Found")