International College of Digital Innovation, CMU
September 19, 2025
viewof N0 = Inputs.range([100, 500], {step: 10, label: "N"})
viewof beta0 = Inputs.range([-10, 10], {value: 2, step: 0.2, label: "Intercept (a)"})
viewof beta1 = Inputs.range([-5, 5], {value: 1, step: 0.2, label: "Slope (b)"})
viewof SD = Inputs.range([0.5, 5], {value: 1, step: 0.25, label: "SD"})
viewof clicks = Inputs.button("Click to randomize")

Linear Regression is a statistical and machine learning technique used to model the relationship between \(x\) (the independent variable, or predictor) and \(y\) (the dependent variable, or response), assuming that the relationship is linear. Regression is a powerful tool for business data analysis.
Pearson Correlation is a statistical measure used to evaluate the linear relationship between two variables.
Formula for Calculating Pearson Correlation
\[r = \dfrac{\displaystyle\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\displaystyle\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}\]
Where:
\(x_i, y_i\) are the sample data points of variables \(x\) and \(y\)
\(\bar{x}, \bar{y}\) are the means of \(x\) and \(y\)
\(n\) is the number of observations
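To make the formula concrete, here is a minimal Python sketch (NumPy assumed; the data points are made up for illustration) that computes \(r\) directly from the definition and cross-checks it against NumPy's built-in `np.corrcoef`:

```python
import numpy as np

# Made-up sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson r from the definition: co-deviation term over the product of spreads
xd, yd = x - x.mean(), y - y.mean()
r = np.sum(xd * yd) / np.sqrt(np.sum(xd**2) * np.sum(yd**2))

print(r)                        # ~0.999: very strong positive relationship
print(np.corrcoef(x, y)[0, 1])  # same value from NumPy's built-in
```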
Values of Pearson Correlation
The value of \(r\) ranges from \(-1\) to \(+1\):
\(r = +1\): Perfect positive relationship (Positive Linear Relationship)
\(r = -1\): Perfect negative relationship (Negative Linear Relationship)
\(r = 0\): No linear relationship between \(x\) and \(y\).
| Value of \(\lvert r \rvert\) | Strength of Linear Relationship |
|---|---|
| \(0.9\) to \(1.0\) | Very Strong linear relationship |
| \(0.7\) to \(0.9\) | Strong linear relationship |
| \(0.5\) to \(0.7\) | Moderate linear relationship |
| \(0.3\) to \(0.5\) | Weak linear relationship |
| \(0.0\) to \(0.3\) | Very Weak (almost no) linear relationship |
**Measures only linear relationships.** Pearson correlation is applicable only when the relationship between variables is linear; if the relationship is nonlinear, the value of \(r\) may misleadingly suggest no relationship.
**Sensitive to outliers.** The value of \(r\) can change drastically if outliers are present in the data.
**Requires quantitative data.** Pearson correlation can only be applied to numerical (quantitative) data.
**Assumes normality.** Both variables should approximately follow a normal distribution, and their variances should be similar.
Example 1: Positive correlation
Example 2: No correlation
Finance: Examining the relationship between two stock prices (CAPM model)
Science: Checking the relationship between variables such as temperature and humidity
Education: Exploring the relationship between study hours and exam performance
Computed correlations for the example scatterplots (figures omitted here): \(r = 0.0222\) and \(r = -0.0064\) show essentially no linear relationship, while \(r = 0.9866\), \(r = 0.8251\), and \(r = 0.8939\) show strong to very strong positive relationships.
Linear Regression attempts to find the best-fitting linear equation (a straight line in the single-predictor case) of the form:
\[y = f(x_1,x_2,\cdots,x_n)+\varepsilon =\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon\]
\(y\): Dependent variable (response)
\(x_1, x_2, \dots, x_n\): Independent variables (predictors)
\(\beta_0\): Constant term (intercept) or the point where the line crosses the \(y\)-axis
\(\beta_1, \beta_2, \dots, \beta_n\): Coefficients of the independent variables
\(\varepsilon\): Error term or residual
The goal of Linear Regression is to find the coefficients \(\beta_0, \beta_1, \dots, \beta_n\) that make the straight-line equation describe the data as accurately as possible by minimizing the error between the predicted values (\(\hat{y}\)) and the actual values (\(y\)).
The most common method used to fit the model is Ordinary Least Squares (OLS).
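As a quick illustration of OLS fitting (not the tool used for the interactive demos above), the following Python sketch generates data from a known line and recovers the intercept and slope with `np.polyfit`; the true parameters and noise level are assumptions that mirror the slider defaults above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known line y = 2 + 1*x plus Gaussian noise (assumed values)
x = rng.uniform(0, 10, size=200)
y = 2 + 1 * x + rng.normal(0, 1, size=200)

# OLS fit of a degree-1 polynomial (a straight line)
b1, b0 = np.polyfit(x, y, deg=1)  # coefficients returned from highest degree down
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # close to 2 and 1
```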
viewof N01 = Inputs.range([10, 30], {step: 10, label: "N"})
viewof beta01 = Inputs.range([-10, 10], {value: 2, step: 0.2, label: "Intercept (a)"})
viewof beta11 = Inputs.range([-5, 5], {value: 1, step: 0.2, label: "Slope (b)"})
viewof SD1 = Inputs.range([3, 7], {value: 3, step: 1, label: "SD"})
viewof clicks2 = Inputs.button("Click to randomize")

The principle of Ordinary Least Squares (OLS), in the context of the Linear Regression equation above, is to find the coefficients (\(\beta_0, \beta_1, \dots, \beta_n\)) that minimize the sum of squared errors between the actual values (\(y_i\)) and the predicted values (\(\hat{y}_i\)).
viewof sd_lr = Inputs.range([0.1, 5], {value: 1, step: 0.1, label: "Noise SD (σ)"})
viewof click_lr = Inputs.button("Resample data")
// User line (blue)
viewof a_user_lr = Inputs.range([-10, 10], {value: 0, step: 0.5, label: "User intercept (a_user)"})
viewof b_user_lr = Inputs.range([-5, 5], {value: 1, step: 0.2, label: "User slope (b_user)"})

Live readout: OLS: \(y = \hat{a} + \hat{b}x\) with its SSE, versus USER: \(y = a_{\text{user}} + b_{\text{user}}x\) with its SSE (the numeric values update interactively).
OLS (Ordinary Least Squares) tries to find the best line (or plane, if more than one \(x\)) that fits the data by choosing \(\beta_0, \beta_1, \dots, \beta_n\) so that the prediction of \(y\) is as close as possible to the real values.
Principle of OLS
OLS aims to find the coefficients (\(\beta\)) that minimize the Residual Sum of Squares (RSS):
\[ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
Where:
\(y_i\): the actual value of the dependent variable for the \(i\)-th observation
\(\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_n x_{in}\):
the predicted value from the model
\(n\): the number of observations
Mathematical Procedure
\[y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_n x_{in} + \varepsilon_i, \quad i = 1, 2, \ldots, m\]
\[\begin{aligned}Y &= \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, X = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1n} \\ 1 & x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix},\\ \beta &= \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix},\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}\end{aligned}\]
Predicted values (\(\hat{y}\)) in matrix form:
\[
\hat{Y} = X \beta
\]
Residuals:
\[
\varepsilon = Y - \hat{Y} = Y - X \beta
\]
Residual Sum of Squares (RSS):
\[
RSS = \varepsilon^T \varepsilon = (Y - X \beta)^T (Y - X \beta)
\]
Find \(\beta\) that minimizes \(RSS\):
\[
\hat{\beta} = (X^T X)^{-1} X^T Y
\]
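The closed-form solution translates directly into code. A minimal NumPy sketch (synthetic data; the true coefficients are assumptions) builds the design matrix \(X\) and solves the normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 3 + 2*x1 - 1*x2 + noise  (true betas are assumptions)
m = 100
x1, x2 = rng.normal(size=m), rng.normal(size=m)
y = 3 + 2 * x1 - 1 * x2 + rng.normal(0, 0.5, size=m)

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones(m), x1, x2])

# Normal equations: solve (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately [3, 2, -1]

# RSS at the minimizer
resid = y - X @ beta_hat
print("RSS:", resid @ resid)
```

Solving the system with `np.linalg.solve` avoids forming the explicit inverse \((X^TX)^{-1}\), which is both faster and numerically more stable.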
Simple Linear Regression
Uses only one independent variable (\(x\))
Equation:
\[
y = \beta_0 + \beta_1 x + \varepsilon
\]
Example: Using height (\(x\)) to predict weight (\(y\))
Multiple Linear Regression
Uses more than one independent variable (\(x_1, x_2, \dots, x_n\))
Equation:
\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon
\]
Example: Using age (\(x_1\)) and income level (\(x_2\)) to predict savings amount (\(y\))
Linearity: The relationship between independent and dependent variables must be linear.
Independence of errors: The error terms (\(\varepsilon\)) should not be correlated with each other.
Constant variance (Homoscedasticity): The errors should have the same variance across all levels of the independent variables.
Normality: The errors should be approximately normally distributed.
No Multicollinearity: Independent variables should not be too highly correlated with each other.
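These assumptions can be probed numerically. The sketch below is illustrative only, assuming SciPy and statsmodels are available and using synthetic data: it checks residual normality (Shapiro-Wilk), error independence (Durbin-Watson), and predictor correlation:

```python
import numpy as np
from scipy.stats import shapiro
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)

# Synthetic regression data (illustrative only)
m = 200
x1, x2 = rng.normal(size=m), rng.normal(size=m)
y = 1 + 0.5 * x1 + 2 * x2 + rng.normal(0, 1, size=m)

X = np.column_stack([np.ones(m), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Normality of errors: Shapiro-Wilk (p > 0.05 is consistent with normality)
print("Shapiro-Wilk p-value:", shapiro(resid).pvalue)

# Independence of errors: Durbin-Watson (values near 2 suggest no autocorrelation)
print("Durbin-Watson:", durbin_watson(resid))

# Multicollinearity: pairwise correlation between predictors (near 0 here)
print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])
```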
Business Problem: The company wants to forecast sales for the next month.
Regression Equation
\[ \begin{aligned} \text{sales}=&\beta_0+\beta_1\text{advertising_spend}\\ &+\beta_2\text{product_price} + \beta_3\text{promotion}+ \varepsilon \end{aligned} \]
Results
Interpreting the Coefficients
(Intercept) = 3716.95
If advertising_spend, product_price, and promotion are all 0, the expected sales (sales) will be 3716.95 units (this serves as the baseline average sales).
advertising_spend = 0.48895
For every 1-unit increase in advertising spend (advertising_spend), average sales increase by 0.49 units (statistically significant).
product_price = -4.11545
For every 1-unit increase in product price (product_price), average sales decrease by 4.12 units.
However, the p-value for this variable is 0.253, which is greater than 0.05.
This means the effect is not statistically significant (we cannot confirm that price truly impacts sales).
promotion = 1525.80
If promotion (promotion) increases by 1 unit, average sales increase by 1525.8 units (statistically significant).
Model Performance
Residual standard error = 1844
On average, the predicted values (fitted values) deviate from the actual sales by about 1844 units.
Multiple R-squared = 0.61
The model explains approximately 61% of the variation in sales, which is considered a moderate level of explanatory power.
Adjusted R-squared = 0.5978
This value adjusts R-squared for the number of predictors in the model.
It sits only slightly below the regular R-squared, though the non-significant product_price term may not have substantially improved the model.
F-statistic = 50.04, p-value < 2.2e-16
The model as a whole is statistically significant, because the p-value is less than 0.05.
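The results above read like standard OLS summary output. As a hedged sketch of how such a model could be fit, the Python code below uses statsmodels with synthetic data whose true coefficients are assumptions chosen to echo the reported estimates; it will not reproduce the exact numbers:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Hypothetical monthly data; column names follow the regression equation above
n = 100
df = pd.DataFrame({
    "advertising_spend": rng.uniform(1000, 20000, n),
    "product_price": rng.uniform(50, 150, n),
    "promotion": rng.integers(0, 2, n),
})
# True coefficients here are assumptions echoing the reported estimates
df["sales"] = (3700 + 0.5 * df["advertising_spend"]
               - 4 * df["product_price"]
               + 1500 * df["promotion"]
               + rng.normal(0, 1800, n))

# Fit OLS and print the coefficient table, R-squared, and F-statistic
model = smf.ols("sales ~ advertising_spend + product_price + promotion", data=df).fit()
print(model.summary())
```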
Business Problem: Determine the optimal product price to maximize sales.
Example: Use Nonlinear Regression or Polynomial Regression to capture the non-linear relationship between price and customer demand.
Regression Equation
\[ \begin{aligned} \text{demand}=\beta_0+\beta_1\text{price}+\beta_2\text{price}^2+\varepsilon \end{aligned} \]
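Because this model is still linear in the coefficients, OLS handles the squared term directly. A minimal NumPy sketch (synthetic price-demand data; the curve's coefficients are assumptions) fits the quadratic and locates the fitted peak:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic demand curve: rises then falls with price (coefficients assumed)
price = rng.uniform(1, 20, 150)
demand = 100 + 30 * price - 1.5 * price**2 + rng.normal(0, 20, 150)

# Degree-2 polynomial fit: demand = b0 + b1*price + b2*price^2
b2, b1, b0 = np.polyfit(price, demand, deg=2)
print(f"demand = {b0:.1f} + {b1:.2f}*price + {b2:.2f}*price^2")

# Demand-maximizing price at the parabola's vertex (valid only if b2 < 0)
print("demand-maximizing price:", -b1 / (2 * b2))  # close to 10 here
```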
Business Problem: Analyze how investments in different advertising campaigns affect sales.
Independent Variables:
Advertising spend by channel (e.g., Facebook, Google Ads)
Duration of campaign
Dependent Variable:
Sales lift (sales_lift)
Example: Use Multiple Regression to identify which marketing channels generate the highest ROAS (Return on Ad Spend).
Regression Equation
\[ \begin{aligned} \text{sales_lift}=\beta_0+\beta_1\text{facebook_ads} + \beta_2\text{google_ads}+\varepsilon \end{aligned} \]
Business Problem: A manufacturing company wants to forecast future raw material demand to better manage inventory.
Independent Variables:
Seasonality
Historical Sales
Dependent Variable:
Raw material demand (demand)
Example: Use Time Series Regression, or combine Regression with time-series models (e.g., ARIMA + Regression).
Regression Equation
\[ \begin{aligned} \text{demand}=\beta_0+ \beta_1\text{sin_term} + \beta_2\text{cos_term}+\varepsilon \end{aligned} \]
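The sin/cos pair turns the seasonal cycle into ordinary regressors. A minimal NumPy sketch (synthetic monthly data; the amplitudes are assumptions) constructs the Fourier terms and fits them by OLS:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic monthly demand with an annual cycle (amplitudes assumed)
t = np.arange(120)                     # 10 years of monthly observations
sin_term = np.sin(2 * np.pi * t / 12)  # yearly seasonality as Fourier terms
cos_term = np.cos(2 * np.pi * t / 12)
demand = 500 + 80 * sin_term + 40 * cos_term + rng.normal(0, 25, t.size)

# OLS on the seasonal regressors via the least-squares solver
X = np.column_stack([np.ones(t.size), sin_term, cos_term])
beta_hat, *_ = np.linalg.lstsq(X, demand, rcond=None)
print(beta_hat)  # approximately [500, 80, 40]
```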
Business Problem: A hotel business wants to analyze the factors that influence customer satisfaction.
Independent Variables:
Service Quality
Cleanliness
Room Price
Dependent Variable:
Customer satisfaction score (satisfaction)
Example: Use Linear Regression to build a model that identifies which factors have the greatest impact on satisfaction scores.
Regression Equation
\[ \begin{aligned} \text{satisfaction} =&\beta_0+\beta_1\text{service_quality} \\&+ \beta_2\text{cleanliness} + \beta_3\text{room_price}+\varepsilon \end{aligned} \]