Classification Models

Somsak Chanaim

International College of Digital Innovation, CMU

September 18, 2025

Introduction

Classification problems arise in many business contexts. The goal is to predict which predefined group a new observation belongs to. Common examples include:

1. Customer Analytics

Example Problems:

  • Customer Churn Prediction:
    • Classify customers as likely to churn or likely to stay, in order to design effective retention strategies.
  • Customer Segmentation:
    • Classify customers into groups such as premium customers, regular customers, or high-potential target groups.
  • Customer Loyalty Prediction:
    • Classify whether or not a customer will become loyal to the brand.

2. Risk Analytics

Example Problems:

  • Credit Scoring:
    • Classify customers as high-risk (likely to default) or low-risk for loan repayment.
  • Fraud Detection:
    • Classify transactions as likely fraudulent or normal.
  • Insurance Risk Assessment:
    • Classify policyholders as high-risk or low-risk.

3. Marketing Analytics

Example Problems:

  • Campaign Response Prediction:
    • Classify customers as likely or unlikely to respond to a marketing campaign (e.g., clicking a link or purchasing a product).
  • Sentiment Analysis:
    • Classify customer reviews as positive, negative, or neutral.
  • Recommendation System:
    • Classify which products customers are most likely to purchase.

4. Supply Chain Management

Example Problems:

  • Predictive Maintenance:
    • Classify machines as likely to fail or still healthy/operational.
  • Order Prioritization:
    • Classify orders as urgent (priority shipping) or normal delivery.

5. HR Analytics

Example Problems:

  • Employee Attrition Prediction:
    • Classify whether an employee is likely to leave (attrition) or stay in the organization.
  • Employee Potential Assessment:
    • Classify employees based on potential, such as high-potential (rising stars) or low-potential (needs skill development).

6. Financial Management

Example Problems:

  • Liquidity Analysis:
    • Classify companies as financially liquid (good liquidity) or not.
  • Portfolio Classification:
    • Classify investment portfolios based on risk levels such as high, medium, or low risk.

7. Retail and E-commerce

Example Problems:

  • Purchase Prediction:
    • Classify which product category a customer is most likely to purchase.
  • Inventory Management:
    • Classify products as high-demand or low-demand.
  • Product Recommendation:
    • Classify which products customers are most likely to purchase together.

8. Healthcare

Example Problems:

  • Disease Diagnosis:
    • Classify patients as being at risk for certain diseases (e.g., cancer, diabetes) or healthy.
  • Readmission Prediction:
    • Classify whether a patient is likely to be readmitted to the hospital.
  • Risk Stratification:
    • Classify patients based on the severity of their conditions.

9. Technology and IT

Example Problems:

  • Malware Detection:
    • Classify software as malware or not.
  • Spam Detection:
    • Classify emails as spam or normal.
  • Error Classification:
    • Classify system errors, such as hardware-related or software-related issues.

Classification Models

Logistic Regression and Decision Tree

These are models commonly used for classification problems,
but they differ in how they work and how their results are interpreted:

1. Logistic Regression

Logistic Regression Equation

\[ Pr(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n)}} \]

How it works

  • Logistic Regression models the probability of a class using a linear combination of the input features.

  • It applies the logistic (sigmoid) function to transform predictions into values between 0 and 1.

  • The output is interpreted as follows:
    if the probability > 0.5 (or a chosen threshold), the model classifies the instance as class 1; otherwise, as class 0.
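
In R, the sigmoid transform and the 0.5 threshold take only a few lines; the coefficients below are invented purely for illustration:

# Hypothetical coefficients (illustration only)
beta0 <- -1
beta1 <- 0.8
x <- c(-2, 0, 1, 3)

# Linear predictor, then the logistic (sigmoid) transform
eta <- beta0 + beta1 * x
p   <- 1 / (1 + exp(-eta))  # equivalently: plogis(eta)

# Classify using the 0.5 threshold
cbind(x, p = round(p, 3), class = ifelse(p > 0.5, 1, 0))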

Advantages

  • Suitable when there is a linear relationship between independent variables (\(x\)) and the log-odds.

  • Easy to interpret results (e.g., coefficients indicate the direction and magnitude of relationships).

Disadvantages

  • Cannot capture complex or non-linear relationships.

  • Sensitive to outliers.

2. Decision Tree

How it works

  • A Decision Tree uses recursive partitioning to split the data repeatedly, creating a tree-like structure.

  • At each node, the tree selects the variable and split point that most improve class purity (measured by, e.g., the Gini index or entropy).

  • Leaf nodes represent the final class of the data.
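
To make these purity criteria concrete, here is a minimal R sketch of the Gini index and entropy for a node's class proportions (tree libraries compute these internally; the functions are written out only for illustration):

# Impurity measures for a vector of class proportions p (summing to 1)
gini    <- function(p) 1 - sum(p^2)
entropy <- function(p) -sum(ifelse(p > 0, p * log2(p), 0))

gini(c(1, 0))        # 0   : a pure node has zero impurity
gini(c(0.5, 0.5))    # 0.5 : a 50/50 node is maximally impure (two classes)
entropy(c(0.5, 0.5)) # 1   : entropy is also maximal at 50/50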

Advantages

  • Easy to understand and interpret (the tree shows explicit rules for classification).

  • Can capture non-linear relationships effectively.

  • Works well with mixed data types (numeric and categorical).

Disadvantages

  • Prone to overfitting if the tree grows too deep.

  • Less robust when the dataset is very noisy.

Comparison

Feature                       Logistic Regression                  Decision Tree
---------------------------   ----------------------------------   ---------------------------------------
Complexity of relationships   Suitable for linear relationships    Can capture non-linear relationships
Interpretability              Easy to interpret (coefficients)     Easy to interpret (tree structure)
Sensitivity to outliers       High                                 Low
Risk of overfitting           Low                                  High (if tree depth is not controlled)
Robustness to noisy data      Low                                  Moderate

Decision Tree with Iris dataset

Select Data

Decision in Classification Trees

A decision is a test on a feature that helps separate classes.

Previous Example:

  • In the Iris dataset: “Is Petal.Length ≤ 2.45?”

    • If Yes → Class = Setosa
    • If No → further splitting (next decision)
  • Each decision aims to maximize class purity (using the Gini index, entropy/information gain, or misclassification error).
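
The split quoted above can be reproduced with the rpart package that ships with R; a minimal sketch:

library(rpart)

# Fit a classification tree to the built-in iris data
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)  # the first split is Petal.Length < 2.45, isolating setosa

# plot(fit); text(fit)  # draw the tree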

Components of a Classification Tree

  1. Root Node

    • The starting point of the tree.
    • Contains the entire dataset before any split.
    • Represents the first decision rule that best separates the classes.
    • Example: “Sepal.Length < 5.5?”
  2. Decision Nodes

    • Internal nodes where the tree continues to split based on feature tests.
    • Each decision node asks a Yes/No question about a feature.
    • Example: “Sepal.Width ≥ 2.8?” or “Sepal.Length < 6.2?”
  3. Branches

    • The connectors (edges) between nodes.
    • Show the outcome of a decision at a node.
    • Usually labeled “Yes” or “No” depending on whether the condition is true or false.
  4. Leaf Nodes (Terminal Nodes)

    • The endpoints of the tree.
    • Represent the final classification result.
    • Each leaf is assigned a class label (e.g., Setosa, Versicolor, Virginica).
    • Often includes class probabilities or percentages.

Summary

  • Root Node → entry point (whole dataset).

  • Decision Nodes → internal splits (feature tests).

  • Branches → paths (Yes/No).

  • Leaf Nodes → final classification output.

Classification

Example: Predicting Whether a Customer Will Purchase

Suppose we want to build a Logistic Regression model to predict
whether a customer will purchase a product (Buy = Yes/No),
based on information such as Age, Income, and Gender.

  • Variables:

    • Age: Customer’s age

    • Income: Annual income

    • Gender: Gender (Male/Female)

    • Buy: Purchase decision (No/Yes)

The logistic regression model is built using the basic equation:

\[ \begin{aligned} \text{Logit}(P) =& \beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Income}\\ &+ \beta_3 \cdot \text{GenderMale} \end{aligned} \]

\(\text{Logit}(P)\) is the logarithm of the odds:

\[ \text{Logit}(P) = \ln\left(\frac{P}{1-P}\right) \]

Where \(P\) is the probability that the outcome is “Yes” (\(P(\text{Buy} = \text{Yes})\)).

  • \(\beta_0\): Intercept (constant term)

  • \(\beta_1, \beta_2, \beta_3\): Coefficients of the variables

  • \(\text{GenderMale}\): A dummy variable created from the categorical variable Gender,
    where “Male” = 1 and “Female” = 0
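
In R this dummy coding happens automatically when Gender is stored as a factor; model.matrix() shows the design column that glm() will use (a small standalone illustration):

# "Female" is the reference level because factor levels default to alphabetical order
g <- data.frame(Gender = factor(c("Female", "Male", "Male")))
model.matrix(~ Gender, data = g)
# returns an (Intercept) column of 1s and a GenderMale column of 0/1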

Build a Logistic Regression Model

The model will predict whether a customer will purchase (Buy = Yes/No)
using the variables Age, Income, Gender.
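
A minimal R sketch of the fit, assuming the data sit in a hypothetical data frame named customers with the four columns listed above (the coefficient estimates interpreted below come from the summary of such a model):

# 'customers' is a hypothetical data frame with columns Buy, Age, Income, Gender
# Buy should be a factor (No/Yes) or 0/1 for family = binomial
fit <- glm(Buy ~ Age + Income + Gender,
           family = binomial, data = customers)
summary(fit)  # coefficient table interpreted below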

Interpreting the Coefficients

  • Intercept = -0.6951
    The Intercept represents the log-odds when all variables are 0
    (e.g., Age = 0, Income = 0, and Gender = Female).
    However, the intercept usually has no direct business interpretation.

  • Age = 0.002941
    This means that for every 1-year increase in age, the odds of purchase (Buy = 1) change according to the Odds Ratio:

    \[e^{0.002941} \approx 1.0029\]

    or a very small change (~0.3%), which is not statistically significant (p-value = 0.792).

  • Income = -8.422e-07
    This means that for every 1-unit increase in income (e.g., 1 dollar), the odds of purchase decrease slightly,
    but the effect is so small that it is practically meaningless.
    The very high p-value (0.898) provides no statistical evidence that income affects purchase decisions.

  • GenderMale = 0.1274
    This means that males are slightly more likely to purchase compared to females,
    with an Odds Ratio of:

    \[e^{0.1274} \approx 1.136\]

    or about a 13.6% increase in the odds of purchase.
    However, since the p-value = 0.665 is very high,
    there is no statistical evidence that this difference is significant.
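
The odds ratios quoted above are just the exponentiated coefficients, which R computes in one line (assuming the fitted object fit from the sketch above):

# Odds ratios for every term at once
exp(coef(fit))
# e.g. exp(0.002941) ~ 1.0029 for Age and exp(0.1274) ~ 1.136 for GenderMale,
# matching the values discussed above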

Practice with Orange Data Mining

Example 2: Will a Customer Apply for a Credit Card?

We want to predict whether a customer will apply for a credit card (the outcome variable Subscribed) using the following variables:

  • Spending, CreditScore, Married (Marital Status)

Logistic Regression Equation

Based on the model summary below, the equation takes the form:

\[ \begin{aligned} \text{Logit}(P) =& \beta_0 + \beta_1 \cdot \text{Spending} + \beta_2 \cdot \text{CreditScore} \\ &+ \beta_3 \cdot \text{MarriedMarried} \\ &+ \beta_4 \cdot \text{MarriedSingle} \end{aligned} \]

where Divorced is the reference category of Married.


Call:
glm(formula = Subscribed ~ Spending + CreditScore + Married, 
    family = binomial, data = data2)

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)     6.154e-01  6.029e-01   1.021   0.3074  
Spending       -1.718e-05  4.713e-05  -0.364   0.7155  
CreditScore    -1.370e-03  7.682e-04  -1.783   0.0745 .
MarriedMarried -4.316e-01  3.684e-01  -1.171   0.2414  
MarriedSingle  -5.690e-01  3.724e-01  -1.528   0.1266  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 381.91  on 299  degrees of freedom
Residual deviance: 375.85  on 295  degrees of freedom
AIC: 385.85

Number of Fisher Scoring iterations: 4
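
Note that R made Divorced the reference category of Married simply because it comes first alphabetically. Assuming Married is stored as a factor in data2, the reference level can be inspected, or changed before refitting:

# Check the factor levels; the first one is the reference category
levels(data2$Married)  # expected: "Divorced" "Married" "Single"

# To use a different reference group (e.g., Single) before refitting:
data2$Married <- relevel(data2$Married, ref = "Single")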

Interpretation of Results

(1) Coefficients (Estimates)

  • Intercept = 0.6154

    • Represents the log-odds of subscription (Subscribed = 1) when Spending and CreditScore are 0, and the customer belongs to the reference group (Divorced).
    • This value has little practical meaning.
  • Spending = -1.718e-05 (~0)

    • Spending has virtually no effect on the likelihood of subscription.
  • CreditScore = -0.00137

    • Higher credit scores tend to reduce the likelihood of subscription.
    • Odds Ratio = \(e^{-0.00137} \approx 0.9986\) → the odds of subscription decrease slightly (~0.14% per 1-unit increase in CreditScore).
  • MarriedMarried = -0.4316

    • Customers who are married are less likely to subscribe compared to divorced customers.
    • Odds Ratio = \(e^{-0.4316} \approx 0.65\) → about 35% lower odds of subscribing compared to divorced customers.
  • MarriedSingle = -0.5690

    • Customers who are single are less likely to subscribe compared to divorced customers.
    • Odds Ratio = \(e^{-0.5690} \approx 0.57\) → about 43% lower odds of subscribing compared to divorced customers.
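
To turn the fitted log-odds into an actual probability for a new customer, predict() with type = "response" applies the logistic transformation described in the next section. A sketch, assuming the fitted glm object is stored as model and using made-up input values:

# Hypothetical new customer; column names must match the training data
new_customer <- data.frame(
  Spending    = 2000,
  CreditScore = 650,
  Married     = factor("Single", levels = c("Divorced", "Married", "Single"))
)
predict(model, newdata = new_customer, type = "response")  # P(Subscribed = Yes)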

What is Log-Odds?

Log-Odds

Log-Odds is the output of a Logistic Regression model, calculated from the equation:

\[ \begin{aligned} \log\left(\frac{Pr(\text{Event})}{1 - Pr(\text{Event})}\right) = &\beta_0 + \beta_1 X_1 + \beta_2 X_2\\ &+ \dots + \beta_n X_n \end{aligned} \]

1. Meaning of Log-Odds

  • Log-Odds measures the likelihood of the event of interest (Event = 1).

  • When Log-Odds increases → the probability of the event occurring increases.

  • When Log-Odds decreases → the probability of the event occurring decreases.

  • If Log-Odds = 0, the event and non-event are equally likely (Pr(Event) = 0.5).

2. Converting Log-Odds to Probability

Because Log-Odds is not intuitive, we usually transform it into a probability using:

\[ Pr(\text{Event}) = \frac{e^{\text{Log-Odds}}}{1 + e^{\text{Log-Odds}}} \]

Examples:

  • If Log-Odds = 0, then \(Pr(Event) = \frac{e^0}{1+e^0} = 0.5\) (50%).

  • If Log-Odds = 2, then \(Pr(Event) = \frac{e^2}{1+e^2} \approx 0.88\) (88%).

  • If Log-Odds = -2, then \(Pr(Event) = \frac{e^{-2}}{1+e^{-2}} \approx 0.12\) (12%).
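
These three conversions can be verified in R with the built-in logistic function plogis():

plogis(c(0, 2, -2))
# 0.500 0.881 0.119, matching the 50%, 88%, and 12% above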

3. Relationship Between Coefficients and Log-Odds

In logistic regression, the coefficients (Estimates) represent the change in Log-Odds when \(X\) increases by 1 unit.

  • If \(\beta > 0\) → as \(X\) increases, Log-Odds increase → probability of the event increases.

  • If \(\beta < 0\) → as \(X\) increases, Log-Odds decrease → probability of the event decreases.

4. Converting Log-Odds to Odds Ratio (OR)

We often use the Odds Ratio (OR) to make Log-Odds easier to interpret, using:

\[ \text{Odds Ratio} = e^{\beta} \]

  • If OR > 1 → the variable increases the odds of the event occurring.

  • If OR = 1 → the variable has no effect on the odds of the event.

  • If OR < 1 → the variable decreases the odds of the event.

Example:

  • If \(\beta = 0.7\), then Odds Ratio = \(e^{0.7} \approx 2.01\) →
    the variable roughly doubles the odds of the event occurring.

  • If \(\beta = -0.7\), then Odds Ratio = \(e^{-0.7} \approx 0.50\) →
    the variable roughly halves the odds of the event.

Summary

  1. Log-Odds indicate the tendency of an event to occur (increase or decrease).

  2. Log-Odds can be converted into probability using the logistic function.

  3. Positive Log-Odds mean higher probability of the event,
    while negative values mean lower probability.

  4. Converting to the Odds Ratio (OR) makes interpretation easier:
    OR > 1 means increased odds, OR < 1 means decreased odds.

Practice with Orange Data Mining

Workflow similar to the first example

Conclusion

Logistic Regression

  • Works well when the relationship between the predictors and the log-odds is approximately linear.

  • Easy to interpret, but cannot capture complex relationships.

Decision Tree

  • Suitable for more complex relationships and can be adjusted to reduce overfitting (e.g., using pruning).

  • Easy to interpret when the tree is small, but can become difficult to read if the tree grows too deep.
