Classification Models

Somsak Chanaim

International College of Digital Innovation, CMU

September 18, 2025

Introduction

Classification problems arise in many business contexts. The goal is to predict which predefined group a new observation belongs to. Common examples include:

1. Customer Analytics

Example Problems:

  • Customer Churn Prediction:
    • Classify customers as likely to churn or likely to stay, in order to design effective retention strategies.
  • Customer Segmentation:
    • Classify customers into groups such as premium customers, regular customers, or high-potential target groups.
  • Customer Loyalty Prediction:
    • Classify whether or not a customer will become loyal to the brand.

2. Risk Analytics

Example Problems:

  • Credit Scoring:
    • Classify customers as high-risk (likely to default) or low-risk for loan repayment.
  • Fraud Detection:
    • Classify transactions as likely fraudulent or normal.
  • Insurance Risk Assessment:
    • Classify policyholders as high-risk or low-risk.

3. Marketing Analytics

Example Problems:

  • Campaign Response Prediction:
    • Classify customers as likely or unlikely to respond to a marketing campaign (e.g., clicking a link or purchasing a product).
  • Sentiment Analysis:
    • Classify customer reviews as positive, negative, or neutral.
  • Recommendation System:
    • Classify which products customers are most likely to purchase.

4. Supply Chain Management

Example Problems:

  • Predictive Maintenance:
    • Classify machines as likely to fail or still healthy/operational.
  • Order Prioritization:
    • Classify orders as urgent (priority shipping) or normal delivery.

5. HR Analytics

Example Problems:

  • Employee Attrition Prediction:
    • Classify whether an employee is likely to leave (attrition) or stay in the organization.
  • Employee Potential Assessment:
    • Classify employees based on potential, such as high-potential (rising stars) or low-potential (needs skill development).

6. Financial Management

Example Problems:

  • Liquidity Analysis:
    • Classify companies as financially liquid (good liquidity) or not.
  • Portfolio Classification:
    • Classify investment portfolios based on risk levels such as high, medium, or low risk.

7. Retail and E-commerce

Example Problems:

  • Purchase Prediction:
    • Classify which product category a customer is most likely to purchase.
  • Inventory Management:
    • Classify products as high-demand or low-demand.
  • Product Recommendation:
    • Classify which products customers are most likely to purchase together.

8. Healthcare

Example Problems:

  • Disease Diagnosis:
    • Classify patients as being at risk for certain diseases (e.g., cancer, diabetes) or healthy.
  • Readmission Prediction:
    • Classify whether a patient is likely to be readmitted to the hospital.
  • Risk Stratification:
    • Classify patients based on the severity of their conditions.

9. Technology and IT

Example Problems:

  • Malware Detection:
    • Classify software as malware or not.
  • Spam Detection:
    • Classify emails as spam or normal.
  • Error Classification:
    • Classify system errors, such as hardware-related or software-related issues.

Classification Models

Logistic Regression and Decision Tree

These are models commonly used for classification problems,
but they differ in how they work and how their results are interpreted:

1. Logistic Regression

Logistic Regression Equation

\[ Pr(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n)}} \]

How it works

  • Logistic Regression models the probability of a class using a linear combination of the input features.

  • It applies the logistic (sigmoid) function to transform predictions into values between 0 and 1.

  • The output is interpreted as follows:
    if the probability > 0.5 (or a chosen threshold), the model classifies the instance as class 1; otherwise, as class 0.
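
In R, the sigmoid transform and the 0.5 threshold take only a few lines; the coefficients below are invented purely for illustration:

# Hypothetical coefficients (illustration only)
beta0 <- -1
beta1 <- 0.8
x <- c(-2, 0, 1, 3)

# Linear predictor, then the logistic (sigmoid) transform
eta <- beta0 + beta1 * x
p   <- 1 / (1 + exp(-eta))  # equivalently: plogis(eta)

# Classify using the 0.5 threshold
cbind(x, p = round(p, 3), class = ifelse(p > 0.5, 1, 0))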

Advantages

  • Suitable when there is a linear relationship between independent variables (\(x\)) and the log-odds.

  • Easy to interpret results (e.g., coefficients indicate the direction and magnitude of relationships).

Disadvantages

  • Cannot capture complex or non-linear relationships.

  • Sensitive to outliers.

2. Decision Tree

How it works

  • A Decision Tree uses recursive partitioning to split the data repeatedly, creating a tree-like structure.

  • At each node, the tree selects the variable and split point that most improve class purity (measured by, e.g., the Gini index or entropy).

  • Leaf nodes represent the final class of the data.
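
To make these purity criteria concrete, here is a minimal R sketch of the Gini index and entropy for a node's class proportions (tree libraries compute these internally; the functions are written out only for illustration):

# Impurity measures for a vector of class proportions p (summing to 1)
gini    <- function(p) 1 - sum(p^2)
entropy <- function(p) -sum(ifelse(p > 0, p * log2(p), 0))

gini(c(1, 0))        # 0   : a pure node has zero impurity
gini(c(0.5, 0.5))    # 0.5 : a 50/50 node is maximally impure (two classes)
entropy(c(0.5, 0.5)) # 1   : entropy is also maximal at 50/50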

Advantages

  • Easy to understand and interpret (the tree shows explicit rules for classification).

  • Can capture non-linear relationships effectively.

  • Works well with mixed data types (numeric and categorical).

Disadvantages

  • Prone to overfitting if the tree grows too deep.

  • Less robust when the dataset is very noisy.

Comparison

Feature                       Logistic Regression                  Decision Tree
---------------------------   ----------------------------------   ---------------------------------------
Complexity of relationships   Suitable for linear relationships    Can capture non-linear relationships
Interpretability              Easy to interpret (coefficients)     Easy to interpret (tree structure)
Sensitivity to outliers       High                                 Low
Risk of overfitting           Low                                  High (if tree depth is not controlled)
Robustness to noisy data      Low                                  Moderate

Decision Tree with Iris dataset

Select Data

Decision in Classification Trees

A decision is a test on a feature that helps separate classes.

Previous Example:

  • In the Iris dataset: “Is Petal.Length ≤ 2.45?”

    • If Yes → Class = Setosa
    • If No → further splitting (next decision)
  • Each decision aims to maximize class purity (using the Gini index, entropy/information gain, or misclassification error).
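
The split quoted above can be reproduced with the rpart package that ships with R; a minimal sketch:

library(rpart)

# Fit a classification tree to the built-in iris data
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)  # the first split is Petal.Length < 2.45, isolating setosa

# plot(fit); text(fit)  # draw the tree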

Components of a Classification Tree

  1. Root Node

    • The starting point of the tree.
    • Contains the entire dataset before any split.
    • Represents the first decision rule that best separates the classes.
    • Example: “Sepal.Length < 5.5?”
  2. Decision Nodes

    • Internal nodes where the tree continues to split based on feature tests.
    • Each decision node asks a Yes/No question about a feature.
    • Example: “Sepal.Width ≥ 2.8?” or “Sepal.Length < 6.2?”
  3. Branches

    • The connectors (edges) between nodes.
    • Show the outcome of a decision at a node.
    • Usually labeled “Yes” or “No” depending on whether the condition is true or false.
  4. Leaf Nodes (Terminal Nodes)

    • The endpoints of the tree.
    • Represent the final classification result.
    • Each leaf is assigned a class label (e.g., Setosa, Versicolor, Virginica).
    • Often includes class probabilities or percentages.

Summary

  • Root Node → entry point (whole dataset).

  • Decision Nodes → internal splits (feature tests).

  • Branches → paths (Yes/No).

  • Leaf Nodes → final classification output.

Classification

Example: Predicting Whether a Customer Will Purchase

Suppose we want to build a Logistic Regression model to predict
whether a customer will purchase a product (Buy = Yes/No),
based on information such as Age, Income, and Gender.

  • Variables:

    • Age: Customer’s age

    • Income: Annual income

    • Gender: Gender (Male/Female)

    • Buy: Purchase decision (No/Yes)

The logistic regression model is built using the basic equation:

\[ \begin{aligned} \text{Logit}(P) =& \beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Income}\\ &+ \beta_3 \cdot \text{GenderMale} \end{aligned} \]

\(\text{Logit}(P)\) is the logarithm of the odds:

\[ \text{Logit}(P) = \ln\left(\frac{P}{1-P}\right) \]

Where \(P\) is the probability that the outcome is “Yes” (\(P(\text{Buy} = \text{Yes})\)).

  • \(\beta_0\): Intercept (constant term)

  • \(\beta_1, \beta_2, \beta_3\): Coefficients of the variables

  • \(\text{GenderMale}\): A dummy variable created from the categorical variable Gender,
    where “Male” = 1 and “Female” = 0
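
In R this dummy coding happens automatically when Gender is stored as a factor; model.matrix() shows the design column that glm() will use (a small standalone illustration):

# "Female" is the reference level because factor levels default to alphabetical order
g <- data.frame(Gender = factor(c("Female", "Male", "Male")))
model.matrix(~ Gender, data = g)
# returns an (Intercept) column of 1s and a GenderMale column of 0/1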

Build a Logistic Regression Model

The model will predict whether a customer will purchase (Buy = Yes/No)
using the variables Age, Income, Gender.
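
A minimal R sketch of the fit, assuming the data sit in a hypothetical data frame named customers with the four columns listed above (the coefficient estimates interpreted below come from the summary of such a model):

# 'customers' is a hypothetical data frame with columns Buy, Age, Income, Gender
# Buy should be a factor (No/Yes) or 0/1 for family = binomial
fit <- glm(Buy ~ Age + Income + Gender,
           family = binomial, data = customers)
summary(fit)  # coefficient table interpreted below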

Interpreting the Coefficients

  • Intercept = -0.6951
    The Intercept represents the log-odds when all variables are 0
    (e.g., Age = 0, Income = 0, and Gender = Female).
    However, the intercept usually has no direct business interpretation.

  • Age = 0.002941
    This means that for every 1-year increase in age, the odds of purchase (Buy = 1) change according to the Odds Ratio:

    \[e^{0.002941} \approx 1.0029\]

    or a very small change (~0.3%), which is not statistically significant (p-value = 0.792).

  • Income = -8.422e-07
    This means that for every 1-unit increase in income (e.g., 1 dollar), the odds of purchase decrease slightly,
    but the effect is so small that it is practically meaningless.
    The very high p-value (0.898) provides no statistical evidence that income affects purchase decisions.

  • GenderMale = 0.1274
    This means that males are slightly more likely to purchase compared to females,
    with an Odds Ratio of:

    \[e^{0.1274} \approx 1.136\]

    or about a 13.6% increase in the odds of purchase.
    However, since the p-value = 0.665 is very high,
    there is no statistical evidence that this difference is significant.
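
The odds ratios quoted above are just the exponentiated coefficients, which R computes in one line (assuming the fitted object fit from the sketch above):

# Odds ratios for every term at once
exp(coef(fit))
# e.g. exp(0.002941) ~ 1.0029 for Age and exp(0.1274) ~ 1.136 for GenderMale,
# matching the values discussed above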

Practice with Orange Data Mining

Example 2: Will a Customer Apply for a Credit Card?

We want to predict whether a customer will apply for a credit card (the outcome variable Subscribed) using the following variables:

  • Spending, CreditScore, Married (Marital Status)

Logistic Regression Equation

Based on the model summary below, the equation takes the form:

\[ \begin{aligned} \text{Logit}(P) =& \beta_0 + \beta_1 \cdot \text{Spending} + \beta_2 \cdot \text{CreditScore} \\ &+ \beta_3 \cdot \text{MarriedMarried} \\ &+ \beta_4 \cdot \text{MarriedSingle} \end{aligned} \]

where Divorced is the reference category of Married.


Call:
glm(formula = Subscribed ~ Spending + CreditScore + Married, 
    family = binomial, data = data2)

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)     6.154e-01  6.029e-01   1.021   0.3074  
Spending       -1.718e-05  4.713e-05  -0.364   0.7155  
CreditScore    -1.370e-03  7.682e-04  -1.783   0.0745 .
MarriedMarried -4.316e-01  3.684e-01  -1.171   0.2414  
MarriedSingle  -5.690e-01  3.724e-01  -1.528   0.1266  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 381.91  on 299  degrees of freedom
Residual deviance: 375.85  on 295  degrees of freedom
AIC: 385.85

Number of Fisher Scoring iterations: 4
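
Note that R made Divorced the reference category of Married simply because it comes first alphabetically. Assuming Married is stored as a factor in data2, the reference level can be inspected, or changed before refitting:

# Check the factor levels; the first one is the reference category
levels(data2$Married)  # expected: "Divorced" "Married" "Single"

# To use a different reference group (e.g., Single) before refitting:
data2$Married <- relevel(data2$Married, ref = "Single")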

Interpretation of Results

(1) Coefficients (Estimates)

  • Intercept = 0.6154

    • Represents the log-odds of subscription (Subscribed = 1) when Spending and CreditScore are 0, and the customer belongs to the reference group (Divorced).
    • This value has little practical meaning.
  • Spending = -1.718e-05 (~0)

    • Spending has virtually no effect on the likelihood of subscription.
  • CreditScore = -0.00137

    • Higher credit scores tend to reduce the likelihood of subscription.
    • Odds Ratio = \(e^{-0.00137} \approx 0.9986\) → the odds of subscription decrease slightly (~0.14% per 1-unit increase in CreditScore).
  • MarriedMarried = -0.4316

    • Customers who are married are less likely to subscribe compared to divorced customers.
    • Odds Ratio = \(e^{-0.4316} \approx 0.65\) → about 35% lower odds of subscribing compared to divorced customers.
  • MarriedSingle = -0.5690

    • Customers who are single are less likely to subscribe compared to divorced customers.
    • Odds Ratio = \(e^{-0.5690} \approx 0.57\) → about 43% lower odds of subscribing compared to divorced customers.
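
To turn the fitted log-odds into an actual probability for a new customer, predict() with type = "response" applies the logistic transformation described in the next section. A sketch, assuming the fitted glm object is stored as model and using made-up input values:

# Hypothetical new customer; column names must match the training data
new_customer <- data.frame(
  Spending    = 2000,
  CreditScore = 650,
  Married     = factor("Single", levels = c("Divorced", "Married", "Single"))
)
predict(model, newdata = new_customer, type = "response")  # P(Subscribed = Yes)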

What is Log-Odds?

Log-Odds

Log-Odds is the output of a Logistic Regression model, calculated from the equation:

\[ \begin{aligned} \log\left(\frac{Pr(\text{Event})}{1 - Pr(\text{Event})}\right) = &\beta_0 + \beta_1 X_1 + \beta_2 X_2\\ &+ \dots + \beta_n X_n \end{aligned} \]

1. Meaning of Log-Odds

  • Log-Odds measures the likelihood of the event of interest (Event = 1).

  • When Log-Odds increases → the probability of the event occurring increases.

  • When Log-Odds decreases → the probability of the event occurring decreases.

  • If Log-Odds = 0, the event and non-event are equally likely (Pr(Event) = 0.5).

2. Converting Log-Odds to Probability

Because Log-Odds is not intuitive, we usually transform it into a probability using:

\[ Pr(\text{Event}) = \frac{e^{\text{Log-Odds}}}{1 + e^{\text{Log-Odds}}} \]

Examples:

  • If Log-Odds = 0, then \(Pr(Event) = \frac{e^0}{1+e^0} = 0.5\) (50%).

  • If Log-Odds = 2, then \(Pr(Event) = \frac{e^2}{1+e^2} \approx 0.88\) (88%).

  • If Log-Odds = -2, then \(Pr(Event) = \frac{e^{-2}}{1+e^{-2}} \approx 0.12\) (12%).
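
These three conversions can be verified in R with the built-in logistic function plogis():

plogis(c(0, 2, -2))
# 0.500 0.881 0.119, matching the 50%, 88%, and 12% above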

3. Relationship Between Coefficients and Log-Odds

In logistic regression, the coefficients (Estimates) represent the change in Log-Odds when \(X\) increases by 1 unit.

  • If \(\beta > 0\) → as \(X\) increases, Log-Odds increase → probability of the event increases.

  • If \(\beta < 0\) → as \(X\) increases, Log-Odds decrease → probability of the event decreases.

4. Converting Log-Odds to Odds Ratio (OR)

We often use the Odds Ratio (OR) to make Log-Odds easier to interpret, using:

\[ \text{Odds Ratio} = e^{\beta} \]

  • If OR > 1 → the variable increases the odds of the event occurring.

  • If OR = 1 → the variable has no effect on the odds of the event.

  • If OR < 1 → the variable decreases the odds of the event.

Example:

  • If \(\beta = 0.7\), then Odds Ratio = \(e^{0.7} \approx 2.01\) →
    the variable roughly doubles the odds of the event occurring.

  • If \(\beta = -0.7\), then Odds Ratio = \(e^{-0.7} \approx 0.50\) →
    the variable roughly halves the odds of the event.

Summary

  1. Log-Odds indicate the tendency of an event to occur (increase or decrease).

  2. Log-Odds can be converted into probability using the logistic function.

  3. Positive Log-Odds mean higher probability of the event,
    while negative values mean lower probability.

  4. Converting to the Odds Ratio (OR) makes interpretation easier:
    OR > 1 means increased odds, OR < 1 means decreased odds.

Practice with Orange Data Mining

Workflow similar to the first example

Conclusion

Logistic Regression

  • Works well when the relationship between the predictors and the log-odds is approximately linear.

  • Easy to interpret, but cannot capture complex relationships.

Decision Tree

  • Suitable for more complex relationships and can be adjusted to reduce overfitting (e.g., using pruning).

  • Easy to interpret when the tree is small, but can become difficult to read if the tree grows too deep.
