Supervised Learning

Somsak Chanaim

International College of Digital Innovation, CMU

September 13, 2025

Learning Objectives

  • Be able to identify and explain appropriate techniques for handling different types of datasets

  • Be able to create visualizations to show model performance

  • Be able to use software to apply those techniques

  • Demonstrate the ability to communicate results from applying selected learning methods to data

Supervised Learning

Supervised learning is a method in Machine Learning where the system learns from data that already contains answers or outcomes (labels).

It uses sample data consisting of input-output pairs to build a model that can predict outcomes for new, unseen data (a minimal sketch follows the definitions below).

It can be expressed as

\[y= f(x)+\varepsilon\]

where

  • y is the output, also called the dependent variable, target variable, or label

  • x is the input, also called the independent variable, feature, or attribute

  • \(f(\cdot)\) is the function that maps input to output

  • \(\varepsilon\) is the error term
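
To make the notation concrete, here is a minimal sketch in Python (assuming NumPy and scikit-learn are installed) that simulates data from a known \(f(x)\) with noise \(\varepsilon\) and fits a model to recover it. The choice \(f(x) = 3x + 5\) is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulate y = f(x) + eps with a known f(x) = 3x + 5 (an invented example)
x = rng.uniform(0, 10, size=(100, 1))   # inputs / features
eps = rng.normal(0, 1, size=100)        # error term (epsilon)
y = 3 * x.ravel() + 5 + eps             # outputs / labels

# Supervised learning: estimate f from the (input, output) pairs
model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)  # should come out close to 3 and 5

# Predict outcomes for new incoming data
print(model.predict([[4.0], [7.5]]))
```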

Types of Supervised Learning

1. Regression

  • Used to predict the value of a target variable that is numerical

  • Example algorithms: Linear Regression, Ridge Regression, Lasso, Support Vector Regression, Gradient Boosting

  • Typical examples predict continuous outcomes such as prices, demand, or temperature

2. Classification

  • Used to predict the value of a target variable that is categorical, e.g., Yes/No, Group A/B/C

  • Example algorithms: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), Neural Networks; a short example contrasting the two task types follows this list
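
The two task types differ only in the target variable: numeric versus categorical. A minimal sketch, assuming scikit-learn and its bundled toy datasets, with one model of each kind:

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: numeric target (a disease-progression score)
X_r, y_r = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_r, y_r)
print("regression output:", reg.predict(X_r[:1]))      # a continuous number

# Classification: categorical target (iris species 0, 1, or 2)
X_c, y_c = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X_c, y_c)
print("classification output:", clf.predict(X_c[:1]))  # a class label
```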

Examples of Business Applications

1. Regression

  • Sales Forecasting: Retail businesses can use Regression (e.g., Linear Regression) to forecast next month’s sales.

    • Input: Season, price, promotions, inventory level

    • Output: Future sales (amount)

\[\begin{aligned} \text{Sales}_{t+1} &= \beta_0 + \beta_1 \cdot \text{Season}_t + \beta_2 \cdot \text{Price}_t \\&~~~~+ \beta_3 \cdot \text{Promotion}_t + \beta_4 \cdot \text{Inventory}_t + \varepsilon_t \end{aligned}\]

Where:

  • \(\beta_0\) = intercept (baseline sales when all predictors are zero)
  • \(\beta_1, \beta_2, \beta_3, \beta_4\) = coefficients showing the effect of each variable
  • \(\varepsilon_t\) = error term (random factors not explained by the model)

Example with numbers (for illustration):

\[ \begin{aligned} \text{Sales}_{t+1} &= 200 + 50 \cdot \text{Season}_t - 30 \cdot \text{Price}_t \\ &~~~~+ 80 \cdot \text{Promotion}_t + 0.5 \cdot \text{Inventory}_t + \varepsilon_t \end{aligned} \]

So if:

  • Season = 1 (holiday season)

  • Price = 20 (per unit)

  • Promotion = 1 (campaign active)

  • Inventory = 500 units

Then predicted sales would be:

\[\begin{aligned} \text{Sales}_{t+1} &= 200 + 50(1) - 30(20) + 80(1) + 0.5(500) \\ &= 200 + 50 - 600 + 80 + 250 \\ &= -20 \end{aligned}\]

(the predicted sales are negative, which is impossible in practice; it signals that the price is set too high and the strategy should be revisited).
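
The same arithmetic in Python, using the illustrative coefficients above (these values come from the worked example, not from real data):

```python
# Illustrative coefficients from the sales-forecasting example above
b0, b_season, b_price, b_promo, b_inv = 200, 50, -30, 80, 0.5

def predict_sales(season, price, promotion, inventory):
    """Linear-regression prediction (error term omitted)."""
    return (b0 + b_season * season + b_price * price
            + b_promo * promotion + b_inv * inventory)

print(predict_sales(season=1, price=20, promotion=1, inventory=500))  # -20.0
```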

Another example

Price Prediction: Real estate businesses use Regression models such as Ridge Regression or Random Forest to predict house prices.

  • Input: House size, number of rooms, location, year built

  • Output: House price (numeric value)

\[\begin{aligned} \text{Price} &= \beta_0 + \beta_1 \cdot \text{Size} + \beta_2 \cdot \text{Rooms} + \beta_3 \cdot \text{Location}\\&~~~ + \beta_4 \cdot \text{YearBuilt} + \varepsilon \end{aligned}\]

Where:

  • \(\beta_0\) = intercept (baseline price)
  • \(\beta_1, \beta_2, \beta_3, \beta_4\) = coefficients for each feature
  • \(\varepsilon\) = error term (unexplained variation)

Example with numbers:

\[\begin{aligned} \text{Price} &= 50{,}000 + 200 \cdot \text{Size} + 15{,}000 \cdot \text{Rooms}\\&~~~~ + 80{,}000 \cdot \text{LocationIndex} + 500 \cdot \text{YearBuilt} + \varepsilon \end{aligned}\]

If:

  • Size = 120 m²
  • Rooms = 3
  • LocationIndex = 2 (good neighborhood)
  • YearBuilt = 2015

Then:

\[ \text{Price} = 50{,}000 + 200(120) + 15{,}000(3) + 80{,}000(2) + 500(2015) \]

\[ = 50{,}000 + 24{,}000 + 45{,}000 + 160{,}000 + 1{,}007{,}500 = 1{,}286{,}500 \]

Predicted house price = 1.29 million (approx.)
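
The same calculation as a short Python sketch, again with the illustrative coefficients from the example above:

```python
# Illustrative coefficients from the house-price example above
def predict_price(size, rooms, location_index, year_built):
    """Linear-regression prediction (error term omitted)."""
    return (50_000 + 200 * size + 15_000 * rooms
            + 80_000 * location_index + 500 * year_built)

print(predict_price(size=120, rooms=3, location_index=2, year_built=2015))
# 1286500
```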

2. Classification

  • Churn Prediction: Telecom companies or subscription-based services use Classification models (e.g., Random Forest, Logistic Regression) to detect which customers are likely to stop using the service, then take marketing actions to retain them.

    • Input: Age, gender, usage history, complaints

    • Output: Churn (Yes/No)

Logistic Regression Equation for Churn Prediction

\[ Pr(\text{Churn} = 1) \;=\; \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Gender} + \beta_3 \cdot \text{UsageHistory} + \beta_4 \cdot \text{Complaints})}} \]

Where:

  • \(Pr(\text{Churn} = 1)\) = probability that a customer will churn (Yes)
  • \(\beta_0\) = intercept
  • \(\beta_1, \beta_2, \beta_3, \beta_4\) = coefficients for each feature
  • Predictors (inputs): Age, Gender, Usage History, Complaints

Example (illustrative coefficients):

\[ Pr(\text{Churn} = 1) = \frac{1}{1 + e^{-( -2.5 + 0.03 \cdot \text{Age} + 0.8 \cdot \text{Gender} + 1.2 \cdot \text{Complaints} - 0.05 \cdot \text{UsageHistory})}} \]

  • For a 40-year-old male customer with 2 complaints and a low usage history, the predicted probability comes out high, meaning the company should intervene (see the sketch below).
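
A minimal sketch of the same logistic formula in Python. The usage-history value of 10 is an assumed placeholder for "low usage", since the text gives no number:

```python
import math

def churn_probability(age, gender, usage_history, complaints):
    """Logistic regression with the illustrative coefficients above."""
    z = (-2.5 + 0.03 * age + 0.8 * gender
         - 0.05 * usage_history + 1.2 * complaints)
    return 1 / (1 + math.exp(-z))

# 40-year-old male (gender=1), 2 complaints, low usage (assumed value 10)
p = churn_probability(age=40, gender=1, usage_history=10, complaints=2)
print(round(p, 2))  # about 0.80 -> high churn risk, worth intervening
```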

Another example

Fraud Detection: Banks or insurance companies use Classification models (e.g., Neural Networks, Gradient Boosting) to detect suspicious transactions.

  • Input: Transaction amount, location, time

  • Output: Normal transaction / Fraudulent transaction

Example Questions and Answers from the Decision Tree (a rule-based transcription of these paths appears after the examples)

If a transaction has Amount = 250, what will the model classify it as?

A1:

  • The first split is Amount ≥ 226.
  • Since 250 ≥ 226 → the model goes left.
  • Predicted class: Fraud (≈ 78% probability).

A transaction has Amount = 180 and Location = Rural. What is the prediction?

A2:

  • First split: 180 < 226 → go right.
  • Next split: Amount ≥ 152 → true, go right.
  • Next: Amount < 195 → true, go left.
  • Next split: Amount ≥ 163 → true, go left.
  • Prediction: Fraud (≈ 62% probability).

A transaction has Amount = 120. What does the model predict?

A3:

  • First split: 120 < 226 → go right.
  • Next: 120 < 152 → true, go left.
  • Next: 120 < 156 → true, go left.
  • Prediction: Fraud (≈ 67% probability).

If a transaction has Amount = 170, Location = Urban, what is the prediction?

A4:

  • First: 170 < 226 → go right.
  • Next: Amount ≥ 152 → true.
  • Next: Amount < 195 → true.
  • Next: Amount ≥ 163 → true.
  • Prediction: Fraud (≈ 62% probability).

If Amount = 90, what happens?

A5:

  • 90 < 226 → right.
  • 90 < 152 → left.
  • 90 < 156 → left.
  • Prediction: Fraud (≈ 67%).
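
The paths traced above can be collected into one rule function. This is a hand transcription of the splits described in the answers, not the actual fitted tree (which is not reproduced here); only the branches visited in the Q&A are known, and Location never changes the outcome on those paths:

```python
def classify_transaction(amount):
    """Transcription of the decision-tree paths described above.

    Returns (predicted class, approximate fraud probability).
    Only the branches traced in the Q&A are known.
    """
    if amount >= 226:                          # root split
        return "Fraud", 0.78                   # A1: e.g., Amount = 250
    if amount >= 152:
        if amount < 195 and amount >= 163:
            return "Fraud", 0.62               # A2/A4: Amount = 180 or 170
    elif amount < 156:                         # 156-split as described in A3/A5
        return "Fraud", 0.67                   # A3/A5: Amount = 120 or 90
    raise ValueError("branch not traced in the examples above")

for amount in (250, 180, 120, 170, 90):
    print(amount, classify_transaction(amount))
```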

Advantages of Supervised Learning

  • High accuracy when training data is of good quality

  • Models can be easily adjusted to fit the problem

Disadvantages of Supervised Learning

  • Requires labeled data, which may be costly to collect

  • Model performance depends on data completeness

Software for Study: Orange Data Mining

Advantages of Orange Data Mining

1. Easy to Use with a Drag-and-Drop Interface

  • Users do not need prior coding experience to analyze data; they simply drag and drop modules onto the workspace.

  • Suitable for beginners who want to learn data analysis and Machine Learning.

2. Comprehensive Tools for Data Analysis and Machine Learning

  • Supports both basic data analysis such as Exploratory Data Analysis (EDA) and building Machine Learning models.

  • Provides a wide range of tools, such as Classification, Regression, Clustering, PCA, and Text Mining (a minimal scripting example follows).
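
Beyond the drag-and-drop GUI, Orange also ships as a Python package (orange3 on PyPI), so the same learners can be scripted. A minimal sketch, assuming Orange3's scripting API (names and call signatures can shift slightly between versions):

```python
import Orange

# Load a bundled example dataset (iris, with labeled classes)
data = Orange.data.Table("iris")

# Fit a classification model and predict on the first few rows
learner = Orange.classification.TreeLearner()
model = learner(data)
print(model(data[:5]))  # predicted class indices for the first five rows
```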

3. Open Source with Add-ons

  • Orange is an open-source program that is free to use and offers many add-ons for specific tasks, such as Text Mining.
  • Available on Windows, macOS, and Linux.

4. Interactive Visualization

  • Orange provides visualization tools that support interactive displays, such as Scatter Plot, Heatmap, Decision Tree, and Network Graph.

  • These help users understand data and results more easily.