International College of Digital Innovation, CMU
September 18, 2025
Classification problems arise in many business contexts. These problems aim to predict which predefined group a new observation belongs to, for example, whether a customer will churn, whether a loan will default, or whether a transaction is fraudulent.
Logistic Regression and Decision Trees are two models commonly used for classification problems, but they differ in how they work and how their results are interpreted:
Logistic Regression Equation
\[ Pr(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n)}} \]
How it works
Logistic Regression fits a linear model on the log-odds scale to predict the probability of a class.
It applies the logistic (sigmoid) function to transform predictions into values between 0 and 1.
The output is interpreted as follows:
if the probability > 0.5 (or a chosen threshold), the model classifies the instance as class 1; otherwise, as class 0.
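As a minimal sketch of this mechanism in R (the coefficient and feature values below are made up for illustration, not taken from any fitted model):

```r
# Sigmoid transform of a linear score, then a 0.5 classification threshold
sigmoid <- function(z) 1 / (1 + exp(-z))

beta <- c(intercept = -1.0, x1 = 0.8, x2 = 0.5)  # hypothetical coefficients
x    <- c(1, 2.0, -0.5)                          # 1 for the intercept, then x1, x2

p <- sigmoid(sum(beta * x))               # Pr(y = 1 | x), here about 0.59
predicted_class <- ifelse(p > 0.5, 1, 0)  # class 1, because p exceeds the threshold
```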
Advantages
Suitable when there is a linear relationship between independent variables (\(x\)) and the log-odds.
Easy to interpret results (e.g., coefficients indicate the direction and magnitude of relationships).
Disadvantages
Cannot capture complex or non-linear relationships.
Sensitive to outliers.
How it works
A Decision Tree uses recursive partitioning to split the data repeatedly, creating a tree-like structure.
Each node selects the most suitable variable and splitting criterion (e.g., Gini index or entropy) to maximize classification effectiveness.
Leaf nodes represent the final class of the data.
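A minimal sketch of this procedure, using the rpart package and the built-in iris data (both chosen here purely for illustration):

```r
# Recursive partitioning with rpart; split quality is measured by the Gini index
library(rpart)

tree <- rpart(Species ~ ., data = iris, method = "class",
              parms = list(split = "gini"))

print(tree)   # shows the root node, the decision nodes with their split rules,
              # and the leaf (terminal) nodes with the predicted class
```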
Advantages
Easy to understand and interpret (the tree shows explicit rules for classification).
Can capture non-linear relationships effectively.
Works well with mixed data types (numeric and categorical).
Disadvantages
Prone to overfitting if the tree grows too deep.
Less robust when the dataset is very noisy.
| Feature | Logistic Regression | Decision Tree |
|---|---|---|
| Complexity of relationships | Suitable for linear relationships | Can capture non-linear relationships |
| Interpretability | Easy to interpret (based on coefficients) | Easy to interpret (tree structure) |
| Sensitivity to outliers | High | Low |
| Risk of overfitting | Low | High (if tree depth not controlled) |
| Handling of noisy data | Low | Moderate |
A decision is a test on a feature that helps separate classes.
Previous Example:
In the Iris dataset: “Is Petal.Length ≤ 2.45?”
Each decision aims to maximize class purity (using Gini index, Entropy/Information Gain, or Misclassification Error).
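For example, the effect of the split “Is Petal.Length ≤ 2.45?” on class purity can be checked directly in R (a small sketch using the built-in iris data):

```r
# Gini impurity before and after splitting iris on Petal.Length <= 2.45
gini <- function(labels) {
  p <- prop.table(table(labels))  # class proportions within the node
  1 - sum(p^2)
}

left  <- iris$Species[iris$Petal.Length <= 2.45]  # only setosa -> perfectly pure
right <- iris$Species[iris$Petal.Length >  2.45]  # versicolor and virginica

gini(iris$Species)                                   # parent node impurity, ~0.667
weights <- c(length(left), length(right)) / nrow(iris)
weights[1] * gini(left) + weights[2] * gini(right)   # impurity after the split, ~0.333
```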
Components of a Classification Tree
Root Node
Decision Nodes
Branches
Leaf Nodes (Terminal Nodes)
Summary
Root Node → entry point (whole dataset).
Decision Nodes → internal splits (feature tests).
Branches → paths (Yes/No).
Leaf Nodes → final classification output.
Suppose we want to build a Logistic Regression model to predict
whether a customer will purchase a product (Buy = Yes/No),
based on information such as Age, Income, and Gender.
Variables:
Age: Customer's age
Income: Annual income
Gender: Gender (Male/Female)
Buy: Purchase decision (No/Yes)
The logistic regression model is built using the basic equation:
\[ \begin{aligned} \text{Logit}(P) =& \beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Income}\\ &+ \beta_3 \cdot \text{GenderMale} \end{aligned} \]
\(\text{Logit}(P)\) is the logarithm of the odds:
\[ \text{Logit}(P) = \ln\left(\frac{P}{1-P}\right) \]
Where \(P\) is the probability that the outcome is “Yes” (\(P(\text{Buy} = \text{Yes})\)).
\(\beta_0\): Intercept (constant term)
\(\beta_1, \beta_2, \beta_3\): Coefficients of the variables
\(\text{GenderMale}\): A dummy variable created from the categorical variable Gender,
where “Male” = 1 and “Female” = 0
Build a Logistic Regression Model
The model will predict whether a customer will purchase (Buy = Yes/No)
using the variables Age, Income, Gender.
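A sketch of the corresponding R call (the data frame name data1 is assumed; the column names follow the variable list above):

```r
# Fit the purchase model; R creates the GenderMale dummy from the factor automatically
data1$Gender <- factor(data1$Gender, levels = c("Female", "Male"))  # Female = reference
data1$Buy    <- factor(data1$Buy,    levels = c("No", "Yes"))       # model Pr(Buy = Yes)

model1 <- glm(Buy ~ Age + Income + Gender, family = binomial, data = data1)
summary(model1)   # coefficients, standard errors, z values, and p-values
```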
Interpreting the Coefficients
Intercept = -0.6951
The Intercept represents the log-odds when all variables are 0
(e.g., Age = 0, Income = 0, and Gender = Female).
However, the intercept usually has no direct business interpretation.
Age = 0.002941
This means that for every 1-year increase in age, the odds of purchase (Buy = 1) are multiplied by the odds ratio:
\[e^{0.002941} \approx 1.0029\]
or a very small change (~0.3%), which is not statistically significant (p-value = 0.792).
Income = -8.422e-07
This means that for every 1-unit increase in income (e.g., 1 dollar), the odds of purchase slightly decrease,
but the effect is so small that it is practically meaningless.
The very high p-value (0.898) indicates this variable has no effect on purchase decisions.
GenderMale = 0.1274
This means that males are slightly more likely to purchase compared to females,
with an Odds Ratio of:
\[e^{0.1274} \approx 1.136\]
or about a 13.6% increase in the odds of purchase.
However, since the p-value = 0.665 is very high,
there is no statistical evidence that this difference is significant.
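The odds ratios quoted above can be read off the fitted model directly (continuing the assumed model1 object from the earlier sketch):

```r
# Exponentiate the coefficients to get odds ratios
exp(coef(model1))
# e.g. Age:        exp(0.002941) ~ 1.0029  (about +0.3% odds per year)
#      GenderMale: exp(0.1274)   ~ 1.136   (about +13.6% odds vs. females)
```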
We want to predict whether a customer will apply for a credit card (Subscribed = Yes/No) using the variables Spending, CreditScore, and Married (marital status: Divorced, Married, or Single).
Logistic Regression Equation
Based on the model summary, the equation may look like this:
\[ \begin{aligned} \text{Logit}(P) =&\ \beta_0 + \beta_1 \cdot \text{Spending} + \beta_2 \cdot \text{CreditScore} \\ &+ \beta_3 \cdot \text{MarriedMarried} \\&+ \beta_4 \cdot \text{MarriedSingle} \end{aligned} \]
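The summary below was produced by a glm() call of this form (the data frame data2 appears in the Call line of the output):

```r
# Fit the subscription model and print the summary shown below
model2 <- glm(Subscribed ~ Spending + CreditScore + Married,
              family = binomial, data = data2)
summary(model2)
```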
Call:
glm(formula = Subscribed ~ Spending + CreditScore + Married,
family = binomial, data = data2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.154e-01 6.029e-01 1.021 0.3074
Spending -1.718e-05 4.713e-05 -0.364 0.7155
CreditScore -1.370e-03 7.682e-04 -1.783 0.0745 .
MarriedMarried -4.316e-01 3.684e-01 -1.171 0.2414
MarriedSingle -5.690e-01 3.724e-01 -1.528 0.1266
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 381.91 on 299 degrees of freedom
Residual deviance: 375.85 on 295 degrees of freedom
AIC: 385.85
Number of Fisher Scoring iterations: 4
Interpretation of Results
(1) Coefficients (Estimates)
Intercept = 0.6154
The intercept is the log-odds of applying (Subscribed = 1) when Spending and CreditScore are 0 and the customer belongs to the reference group (Divorced).
Spending = -1.718e-05 (~0)
CreditScore = -0.00137
MarriedMarried = -0.4316
MarriedSingle = -0.5690
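Converted to odds ratios (computed here from the printed estimates), these become:

```r
# Odds ratios implied by the printed coefficients
exp(c(Spending       = -1.718e-05,  # ~1.000  (essentially no effect per unit spent)
      CreditScore    = -1.370e-03,  # ~0.9986 (odds fall ~0.14% per credit-score point)
      MarriedMarried = -0.4316,     # ~0.65   (vs. the Divorced reference group)
      MarriedSingle  = -0.5690))    # ~0.57   (vs. the Divorced reference group)
```

Note that none of these effects is statistically significant at the 0.05 level (all p-values are above 0.07), so the odds ratios are descriptive rather than conclusive.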
Log-Odds
Log-Odds is the output of a Logistic Regression model, calculated from the equation:
\[ \begin{aligned} \log\left(\frac{Pr(\text{Event})}{1 - Pr(\text{Event})}\right) = &\beta_0 + \beta_1 X_1 + \beta_2 X_2\\ &+ \dots + \beta_n X_n \end{aligned} \]
1. Meaning of Log-Odds
Log-Odds measures the likelihood of the event of interest (Event = 1).
When Log-Odds increases → the probability of the event occurring increases.
When Log-Odds decreases → the probability of the event occurring decreases.
If Log-Odds = 0, the event and non-event are equally likely (Pr(Event) = 0.5).
2. Converting Log-Odds to Probability
Because Log-Odds is not intuitive, we usually transform it into a probability using:
\[ Pr(\text{Event}) = \frac{e^{\text{Log-Odds}}}{1 + e^{\text{Log-Odds}}} \]
Examples:
If Log-Odds = 0, then \(Pr(Event) = \frac{e^0}{1+e^0} = 0.5\) (50%).
If Log-Odds = 2, then \(Pr(Event) = \frac{e^2}{1+e^2} \approx 0.88\) (88%).
If Log-Odds = -2, then \(Pr(Event) = \frac{e^{-2}}{1+e^{-2}} \approx 0.12\) (12%).
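These three conversions are easy to verify in R:

```r
# Convert log-odds to probabilities with the logistic function
log_odds <- c(0, 2, -2)
exp(log_odds) / (1 + exp(log_odds))   # 0.50, 0.88, 0.12 (rounded)
# equivalently: plogis(log_odds)
```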
3. Relationship Between Coefficients and Log-Odds
In logistic regression, the coefficients (Estimates) represent the change in Log-Odds when \(X\) increases by 1 unit.
If \(\beta > 0\) → as \(X\) increases, Log-Odds increase → probability of the event increases.
If \(\beta < 0\) → as \(X\) increases, Log-Odds decrease → probability of the event decreases.
4. Converting Log-Odds to Odds Ratio (OR)
We often use the Odds Ratio (OR) to make Log-Odds easier to interpret, using:
\[ \text{Odds Ratio} = e^{\beta} \]
If OR > 1 → the variable increases the chance of the event occurring.
If OR = 1 → the variable has no effect on the chance of the event.
If OR < 1 → the variable decreases the chance of the event.
Example:
If \(\beta = 0.7\), then Odds Ratio = \(e^{0.7} \approx 2.01\) →
the variable roughly doubles the odds of the event occurring.
If \(\beta = -0.7\), then Odds Ratio = \(e^{-0.7} \approx 0.50\) →
the variable cuts the odds of the event roughly in half.
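The same two conversions in R:

```r
exp(c(0.7, -0.7))   # ~2.01 and ~0.50: odds roughly doubled vs. roughly halved
```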
Summary
Log-Odds indicate the tendency of an event to occur (increase or decrease).
Log-Odds can be converted into probability using the logistic function.
Positive Log-Odds mean higher probability of the event,
while negative values mean lower probability.
Converting to the Odds Ratio (OR) makes interpretation easier:
OR > 1 means increased odds, OR < 1 means decreased odds.
Logistic Regression
Works well if the data has a linear relationship.
Easy to interpret, but cannot capture complex relationships.
Decision Tree
Suitable for more complex relationships and can be adjusted to reduce overfitting (e.g., using pruning).
Easy to interpret when the tree is small, but can become difficult to read if the tree grows too deep.