Supervised Learning

Somsak Chanaim

International College of Digital Innovation, CMU

September 13, 2025

Learning Objectives

  • Be able to identify and explain appropriate techniques for handling different types of datasets

  • Be able to create visualizations to show model performance

  • Be able to use software to apply those techniques

  • Demonstrate the ability to communicate results from applying selected learning methods to data

Supervised Learning

Supervised learning is a method in Machine Learning where the system learns from data that already contains answers or outcomes (labels).

It uses sample data consisting of input-output pairs to build a model that can predict outcomes for new, unseen data (a minimal sketch follows the definitions below).

It can be expressed as

\[y= f(x)+\varepsilon\]

where

  • y is the output, also called the dependent variable, target variable, or label

  • x is the input, also called the independent variable, feature, or attribute

  • \(f(\cdot)\) is the function that maps input to output

  • \(\varepsilon\) is the error term
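
To make the notation concrete, here is a minimal sketch in Python (assuming NumPy and scikit-learn are installed) that simulates data from a known \(f(x)\) with noise \(\varepsilon\) and fits a model to recover it. The choice \(f(x) = 3x + 5\) is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulate y = f(x) + eps with a known f(x) = 3x + 5 (an invented example)
x = rng.uniform(0, 10, size=(100, 1))   # inputs / features
eps = rng.normal(0, 1, size=100)        # error term (epsilon)
y = 3 * x.ravel() + 5 + eps             # outputs / labels

# Supervised learning: estimate f from the (input, output) pairs
model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)  # should come out close to 3 and 5

# Predict outcomes for new incoming data
print(model.predict([[4.0], [7.5]]))
```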

Types of Supervised Learning

1. Regression

  • Used to predict the value of a target variable that is numerical

  • Example algorithms: Linear Regression, Ridge Regression, Lasso, Support Vector Regression, Gradient Boosting

  • Typical examples predict continuous outcomes such as prices, demand, or temperature

2. Classification

  • Used to predict the value of a target variable that is categorical, e.g., Yes/No, Group A/B/C

  • Example algorithms: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), Neural Networks; a short example contrasting the two task types follows this list
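
The two task types differ only in the target variable: numeric versus categorical. A minimal sketch, assuming scikit-learn and its bundled toy datasets, with one model of each kind:

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: numeric target (a disease-progression score)
X_r, y_r = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_r, y_r)
print("regression output:", reg.predict(X_r[:1]))      # a continuous number

# Classification: categorical target (iris species 0, 1, or 2)
X_c, y_c = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X_c, y_c)
print("classification output:", clf.predict(X_c[:1]))  # a class label
```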

Examples of Business Applications

1. Regression

  • Sales Forecasting: Retail businesses can use Regression (e.g., Linear Regression) to forecast next month’s sales.

    • Input: Season, price, promotions, inventory level

    • Output: Future sales (amount)

\[\begin{aligned} \text{Sales}_{t+1} &= \beta_0 + \beta_1 \cdot \text{Season}_t + \beta_2 \cdot \text{Price}_t \\&~~~~+ \beta_3 \cdot \text{Promotion}_t + \beta_4 \cdot \text{Inventory}_t + \varepsilon_t \end{aligned}\]

Where:

  • \(\beta_0\) = intercept (baseline sales when all predictors are zero)
  • \(\beta_1, \beta_2, \beta_3, \beta_4\) = coefficients showing the effect of each variable
  • \(\varepsilon_t\) = error term (random factors not explained by the model)

Example with numbers (for illustration):

\[ \begin{aligned} \text{Sales}_{t+1} &= 200 + 50 \cdot \text{Season}_t - 30 \cdot \text{Price}_t \\ &~~~~+ 80 \cdot \text{Promotion}_t + 0.5 \cdot \text{Inventory}_t + \varepsilon_t \end{aligned} \]

So if:

  • Season = 1 (holiday season)

  • Price = 20 (per unit)

  • Promotion = 1 (campaign active)

  • Inventory = 500 units

Then predicted sales would be:

\[\begin{aligned} \text{Sales}_{t+1} &= 200 + 50(1) - 30(20) + 80(1) + 0.5(500) \\ &= 200 + 50 - 600 + 80 + 250 \\ &= -20 \end{aligned}\]

(the predicted sales are negative, which is impossible in practice; it signals that the price is set too high and the strategy should be revisited).
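
The same arithmetic in Python, using the illustrative coefficients above (these values come from the worked example, not from real data):

```python
# Illustrative coefficients from the sales-forecasting example above
b0, b_season, b_price, b_promo, b_inv = 200, 50, -30, 80, 0.5

def predict_sales(season, price, promotion, inventory):
    """Linear-regression prediction (error term omitted)."""
    return (b0 + b_season * season + b_price * price
            + b_promo * promotion + b_inv * inventory)

print(predict_sales(season=1, price=20, promotion=1, inventory=500))  # -20.0
```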

Another example

Price Prediction: Real estate businesses use Regression models such as Ridge Regression or Random Forest to predict house prices.

  • Input: House size, number of rooms, location, year built

  • Output: House price (numeric value)

\[\begin{aligned} \text{Price} &= \beta_0 + \beta_1 \cdot \text{Size} + \beta_2 \cdot \text{Rooms} + \beta_3 \cdot \text{Location}\\&~~~ + \beta_4 \cdot \text{YearBuilt} + \varepsilon \end{aligned}\]

Where:

  • \(\beta_0\) = intercept (baseline price)
  • \(\beta_1, \beta_2, \beta_3, \beta_4\) = coefficients for each feature
  • \(\varepsilon\) = error term (unexplained variation)

Example with numbers:

\[\begin{aligned} \text{Price} &= 50{,}000 + 200 \cdot \text{Size} + 15{,}000 \cdot \text{Rooms}\\&~~~~ + 80{,}000 \cdot \text{LocationIndex} + 500 \cdot \text{YearBuilt} + \varepsilon \end{aligned}\]

If:

  • Size = 120 m²
  • Rooms = 3
  • LocationIndex = 2 (good neighborhood)
  • YearBuilt = 2015

Then:

\[ \text{Price} = 50{,}000 + 200(120) + 15{,}000(3) + 80{,}000(2) + 500(2015) \]

\[ = 50{,}000 + 24{,}000 + 45{,}000 + 160{,}000 + 1{,}007{,}500 = 1{,}286{,}500 \]

Predicted house price = 1.29 million (approx.)
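
The same calculation as a short Python sketch, again with the illustrative coefficients from the example above:

```python
# Illustrative coefficients from the house-price example above
def predict_price(size, rooms, location_index, year_built):
    """Linear-regression prediction (error term omitted)."""
    return (50_000 + 200 * size + 15_000 * rooms
            + 80_000 * location_index + 500 * year_built)

print(predict_price(size=120, rooms=3, location_index=2, year_built=2015))
# 1286500
```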

2. Classification

  • Churn Prediction: Telecom companies or subscription-based services use Classification models (e.g., Random Forest, Logistic Regression) to detect which customers are likely to stop using the service, then take marketing actions to retain them.

    • Input: Age, gender, usage history, complaints

    • Output: Churn (Yes/No)

Logistic Regression Equation for Churn Prediction

\[ Pr(\text{Churn} = 1) \;=\; \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Gender} + \beta_3 \cdot \text{UsageHistory} + \beta_4 \cdot \text{Complaints})}} \]

Where:

  • \(Pr(\text{Churn} = 1)\) = probability that a customer will churn (Yes)
  • \(\beta_0\) = intercept
  • \(\beta_1, \beta_2, \beta_3, \beta_4\) = coefficients for each feature
  • Predictors (inputs): Age, Gender, Usage History, Complaints

Example (illustrative coefficients):

\[ Pr(\text{Churn} = 1) = \frac{1}{1 + e^{-( -2.5 + 0.03 \cdot \text{Age} + 0.8 \cdot \text{Gender} + 1.2 \cdot \text{Complaints} - 0.05 \cdot \text{UsageHistory})}} \]

  • For a 40-year-old male customer with 2 complaints and a low usage history, the predicted probability comes out high, meaning the company should intervene (see the sketch below).
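
A minimal sketch of the same logistic formula in Python. The usage-history value of 10 is an assumed placeholder for "low usage", since the text gives no number:

```python
import math

def churn_probability(age, gender, usage_history, complaints):
    """Logistic regression with the illustrative coefficients above."""
    z = (-2.5 + 0.03 * age + 0.8 * gender
         - 0.05 * usage_history + 1.2 * complaints)
    return 1 / (1 + math.exp(-z))

# 40-year-old male (gender=1), 2 complaints, low usage (assumed value 10)
p = churn_probability(age=40, gender=1, usage_history=10, complaints=2)
print(round(p, 2))  # about 0.80 -> high churn risk, worth intervening
```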

Another example

Fraud Detection: Banks or insurance companies use Classification models (e.g., Neural Networks, Gradient Boosting) to detect suspicious transactions.

  • Input: Transaction amount, location, time

  • Output: Normal transaction / Fraudulent transaction

Example Questions and Answers from the Decision Tree (a rule-based transcription of these paths appears after the examples)

If a transaction has Amount = 250, what will the model classify it as?

A1:

  • The first split is Amount ≥ 226.
  • Since 250 ≥ 226 → the model goes left.
  • Predicted class: Fraud (≈ 78% probability).

A transaction has Amount = 180 and Location = Rural. What is the prediction?

A2:

  • First split: 180 < 226 → go right.
  • Next split: Amount ≥ 152 → true, go right.
  • Next: Amount < 195 → true, go left.
  • Next split: Amount ≥ 163 → true, go left.
  • Prediction: Fraud (≈ 62% probability).

A transaction has Amount = 120. What does the model predict?

A3:

  • First split: 120 < 226 → go right.
  • Next: 120 < 152 → true, go left.
  • Next: 120 < 156 → true, go left.
  • Prediction: Fraud (≈ 67% probability).

If a transaction has Amount = 170, Location = Urban, what is the prediction?

A4:

  • First: 170 < 226 → go right.
  • Next: Amount ≥ 152 → true.
  • Next: Amount < 195 → true.
  • Next: Amount ≥ 163 → true.
  • Prediction: Fraud (≈ 62% probability).

If Amount = 90, what happens?

A5:

  • 90 < 226 → right.
  • 90 < 152 → left.
  • 90 < 156 → left.
  • Prediction: Fraud (≈ 67%).
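
The paths traced above can be collected into one rule function. This is a hand transcription of the splits described in the answers, not the actual fitted tree (which is not reproduced here); only the branches visited in the Q&A are known, and Location never changes the outcome on those paths:

```python
def classify_transaction(amount):
    """Transcription of the decision-tree paths described above.

    Returns (predicted class, approximate fraud probability).
    Only the branches traced in the Q&A are known.
    """
    if amount >= 226:                          # root split
        return "Fraud", 0.78                   # A1: e.g., Amount = 250
    if amount >= 152:
        if amount < 195 and amount >= 163:
            return "Fraud", 0.62               # A2/A4: Amount = 180 or 170
    elif amount < 156:                         # 156-split as described in A3/A5
        return "Fraud", 0.67                   # A3/A5: Amount = 120 or 90
    raise ValueError("branch not traced in the examples above")

for amount in (250, 180, 120, 170, 90):
    print(amount, classify_transaction(amount))
```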

Advantages of Supervised Learning

  • High accuracy when training data is of good quality

  • Models can be easily adjusted to fit the problem

Disadvantages of Supervised Learning

  • Requires labeled data, which may be costly to collect

  • Model performance depends on data completeness

Software for Study: Orange Data Mining

Advantages of Orange Data Mining

1. Easy to Use with a Drag-and-Drop Interface

  • Users do not need prior coding experience to analyze data; they simply drag and drop modules onto the workspace.

  • Suitable for beginners who want to learn data analysis and Machine Learning.

2. Comprehensive Tools for Data Analysis and Machine Learning

  • Supports both basic data analysis such as Exploratory Data Analysis (EDA) and building Machine Learning models.

  • Provides a wide range of tools, such as Classification, Regression, Clustering, PCA, and Text Mining (a minimal scripting example follows).
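
Beyond the drag-and-drop GUI, Orange also ships as a Python package (orange3 on PyPI), so the same learners can be scripted. A minimal sketch, assuming Orange3's scripting API (names and call signatures can shift slightly between versions):

```python
import Orange

# Load a bundled example dataset (iris, with labeled classes)
data = Orange.data.Table("iris")

# Fit a classification model and predict on the first few rows
learner = Orange.classification.TreeLearner()
model = learner(data)
print(model(data[:5]))  # predicted class indices for the first five rows
```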

3. Open Source with Add-ons

  • Orange is an open-source program that is free to use and offers many add-ons for specific tasks, such as Text Mining.
  • Available on Windows, macOS, and Linux.

4. Interactive Visualization

  • Orange provides visualization tools that support interactive displays, such as Scatter Plot, Heatmap, Decision Tree, and Network Graph.

  • These help users understand data and results more easily.