International College of Digital Innovation, CMU
June 6, 2025
A business problem is a situation or obstacle that prevents a business from achieving its goals.
Examples include:
declining profits
falling sales
losing customers
rising costs
inefficient operations
If left unaddressed, these issues can affect a company’s competitiveness or long-term survival.
Businesses hold large volumes of data but often lack deep analytical insight
Data science enables data-driven decision-making
It helps reduce costs, improve efficiency, and forecast future trends
Data-Driven Business Problem Solving
This approach uses available data to analyze root causes and make strategic decisions based on evidence, rather than on intuition or experience alone.
Clearly Define the Problem
A good problem statement is clear, measurable, and time-bound.
Reframe it as a question that data can answer.
Collect Relevant Data
Daily/itemized sales data
Customer purchasing behavior
Promotion and competitor activity
Customer feedback and reviews
Technique | Use Case | Example |
---|---|---|
Descriptive Analytics | Describe what happened | “Sales dropped on weekdays” |
Diagnostic Analytics | Identify the cause | “Customers aged 20–30 stopped buying” |
Predictive Analytics | Forecast what may happen | “Sales may decline next quarter” |
Prescriptive Analytics | Recommend actions | “Adjust pricing or launch a campaign” |
Communicate Results Clearly
Dashboards
Data storytelling
Meeting slides focusing on impact and actionable insights
Make Decisions and Monitor Outcomes
Test solutions (e.g., A/B testing)
Measure impact through KPIs
Apply continuous improvement cycles
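One way to judge an A/B test is a two-proportion z-test on conversion rates. The sketch below is a minimal illustration using only the Python standard library; the function name and the visitor and conversion counts are hypothetical, and a real test would also need a pre-planned sample size.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed with the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical counts: variant A is the current page, variant B the new offer.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; the KPI itself (here, conversion rate) still has to be the one that matters to the business.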
Problem: Customers Are Not Returning
Data Used:
Purchase history
Time gap between purchases
Customer satisfaction scores (e.g., Net Promoter Score, NPS)
Solution Approaches:
Analyze Customer Lifetime Value (CLV)
Perform segmentation and send targeted promotions
Build a recommendation system to encourage repeat purchases
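The "time gap between purchases" signal can be turned into a simple at-risk flag. The sketch below assumes a customer is at risk when the time since their last purchase exceeds twice their usual gap; the customer IDs, dates, and the factor of 2 are hypothetical choices for illustration.

```python
from datetime import date

# Hypothetical purchase history: customer ID -> purchase dates.
purchase_history = {
    "C001": [date(2025, 1, 5), date(2025, 2, 4), date(2025, 3, 6)],
    "C002": [date(2025, 4, 1), date(2025, 5, 1), date(2025, 5, 30)],
    "C003": [date(2025, 5, 20)],
}
today = date(2025, 6, 1)

def average_gap_days(dates):
    """Mean number of days between consecutive purchases, or None if fewer than two."""
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None

def is_at_risk(dates, today, factor=2.0):
    """Flag a customer whose current silence is longer than factor x their usual gap."""
    usual_gap = average_gap_days(dates)
    if usual_gap is None:
        return False  # not enough history to judge
    return (today - max(dates)).days > factor * usual_gap

at_risk = [cid for cid, dates in purchase_history.items() if is_at_risk(dates, today)]
print(at_risk)
```

Customers flagged this way are natural targets for the segmented promotions listed above.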
Private Sector
Marketing: Low customer retention, low conversion rates, low brand awareness
Operations: Overstocked inventory, demand fluctuation, slow processes
Finance: Poor cost control, inaccurate profit forecasts, cash flow issues
Human Resources: High employee turnover, low engagement, lack of tech talent
Government / Public Sector
Unequal access to public services: Slow complaint handling, staff shortages
Misallocated budget: Inability to measure policy impact
Lack of data for policy decisions: e.g., outdated data on low-income citizens
Corruption and low transparency: Weak internal auditing systems
Urban issues: Traffic, PM2.5, and public health, which require data-driven spatial and behavioral management
Individual Level
Personal finance: Overspending, lack of savings, unsure about investments
Career development: Unaware of market-relevant skills, lack of career planning
Health & behavior: Lack of personal health data, risky habits
Lifelong learning: Unsure how to upskill or choose the right courses
Life decisions: Choosing a house/car, planning for family life
Business Problem | Data Problem | Technique Used |
---|---|---|
Customer churn | Predict churn | Classification |
Poor product sales | Forecast sales | Time Series Forecasting |
Ineffective promotions | Segment customers | Clustering |
Fraud detection | Identify abnormal behavior | Anomaly Detection |
Customer Lifetime Value (CLV)
Customer Retention Rate
Return on Investment (ROI)
Inventory Turnover Rate
Click-Through Rate (CTR)
The total value a customer is expected to generate for the business throughout their entire relationship.
Formula:
\[ \text{CLV} = \text{Average Purchase Value} \times \text{Purchase Frequency} \times \text{Customer Lifespan} \]
Example:
Average purchase value = 500 THB per purchase
Purchase frequency: once a month → 12 purchases/year
Customer lifespan: 3 years
\[ \text{CLV} = 500 \times 12 \times 3 = 18{,}000 \text{ THB} \]
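The same calculation as a small Python function, as a sketch. The function and parameter names are illustrative; this is the simple CLV formula above, with no discounting or margin adjustment.

```python
def customer_lifetime_value(avg_purchase_value, purchases_per_year, lifespan_years):
    """Simple CLV: purchase value x purchase frequency x customer lifespan."""
    return avg_purchase_value * purchases_per_year * lifespan_years

# Values from the example: 500 THB per purchase, 12 purchases a year, 3 years.
clv = customer_lifetime_value(500, 12, 3)
print(f"CLV = {clv:,} THB")  # CLV = 18,000 THB
```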
The percentage of existing customers who continue to use the service over a given period.
Formula:
\[ \text{Retention Rate} = \left( \frac{E - N}{S} \right) \times 100 \]
\(E\): Number of customers at the end of the period
\(N\): Number of new customers acquired during the period
\(S\): Number of customers at the start of the period
Example:
Start of year: 1,000 customers
End of year: 1,200 customers (including 400 new customers)
\[ \text{Retention Rate} = \left( \frac{1{,}200 - 400}{1{,}000} \right) \times 100 = 80\% \]
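A sketch of the retention formula in Python, with illustrative function and parameter names. The churn rate shown alongside it is simply the complement of retention; it is added here for convenience and is not part of the formula above.

```python
def retention_rate(start_customers, end_customers, new_customers):
    """Percentage of starting customers still active at the end of the period."""
    if start_customers <= 0:
        raise ValueError("start_customers must be positive")
    return 100 * (end_customers - new_customers) / start_customers

# Values from the example: S = 1,000, E = 1,200, N = 400.
retention = retention_rate(start_customers=1000, end_customers=1200, new_customers=400)
churn = 100 - retention
print(f"Retention: {retention:.0f}%, churn: {churn:.0f}%")  # Retention: 80%, churn: 20%
```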
Measures the return on an investment to evaluate its efficiency or profitability.
Formula:
\[ \text{ROI} = \left( \frac{\text{Net Profit}}{\text{Investment Cost}} \right) \times 100 \]
Example:
Investment: 100,000 THB
Net profit: 30,000 THB
\[ \text{ROI} = \left( \frac{30{,}000}{100{,}000} \right) \times 100 = 30\% \]
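The ROI formula as a short Python sketch. The function name is illustrative, and the second call uses hypothetical figures to show that a loss produces a negative ROI.

```python
def roi_percent(net_profit, investment_cost):
    """Return on investment as a percentage of the amount invested."""
    if investment_cost <= 0:
        raise ValueError("investment_cost must be positive")
    return 100 * net_profit / investment_cost

# Values from the example above.
roi = roi_percent(net_profit=30_000, investment_cost=100_000)
print(f"ROI: {roi:.0f}%")  # ROI: 30%

# Hypothetical loss-making project: net profit is negative, so ROI is negative.
loss_roi = roi_percent(net_profit=-5_000, investment_cost=50_000)
print(f"ROI: {loss_roi:.0f}%")  # ROI: -10%
```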
Measures how quickly inventory is sold and replenished. A higher rate means inventory is held for less time.
Formula:
\[ \text{Inventory Turnover} = \frac{\text{Cost of Goods Sold (COGS)}}{\text{Average Inventory}} \]
Example:
Annual COGS = 1,000,000 THB
Average inventory value = 250,000 THB
\[ \text{Inventory Turnover} = \frac{1{,}000{,}000}{250{,}000} = 4 \text{ cycles/year} \]
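A sketch of the turnover calculation in Python. The `days_in_inventory` helper is an addition for illustration: it converts turnover into the average number of days stock is held, assuming a 365-day year.

```python
def inventory_turnover(cogs, average_inventory):
    """Number of times inventory is sold and replaced in the period."""
    if average_inventory <= 0:
        raise ValueError("average_inventory must be positive")
    return cogs / average_inventory

def days_in_inventory(turnover, days_per_year=365):
    """Average days an item sits in stock (assumes a 365-day year)."""
    return days_per_year / turnover

# Values from the example above.
turnover = inventory_turnover(cogs=1_000_000, average_inventory=250_000)
holding_days = days_in_inventory(turnover)
print(f"Turnover: {turnover:.0f} cycles/year, about {holding_days:.2f} days in stock")
```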
Measures the percentage of impressions of an ad or link that resulted in a click.
Formula:
\[ \text{CTR} = \left( \frac{\text{Number of Clicks}}{\text{Number of Impressions}} \right) \times 100 \]
Example:
Ad impressions: 10,000
Clicks: 300
\[ \text{CTR} = \left( \frac{300}{10{,}000} \right) \times 100 = 3\% \]
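CTR is most useful when comparing several ads side by side. In the sketch below the first row repeats the example above; the ad names and the second row are hypothetical.

```python
# Per-ad counts. "Banner A" matches the example above; "Banner B" is hypothetical.
ads = [
    {"ad": "Banner A", "impressions": 10_000, "clicks": 300},
    {"ad": "Banner B", "impressions": 8_000, "clicks": 120},
]

def ctr_percent(clicks, impressions):
    """Click-through rate as a percentage; 0.0 when the ad was never shown."""
    if impressions == 0:
        return 0.0
    return 100 * clicks / impressions

for row in ads:
    row["ctr"] = ctr_percent(row["clicks"], row["impressions"])
    print(f'{row["ad"]}: {row["ctr"]:.1f}%')

best_ad = max(ads, key=lambda row: row["ctr"])["ad"]
```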
Behavioral data (e.g., customer actions)
Transaction data (e.g., purchase history)
Real-time logs (e.g., system or clickstream data)
External data (e.g., weather, social media)
Languages: Python, R, SQL
Libraries: Pandas, Scikit-learn, Statsmodels
Visualization: Tableau, Power BI, matplotlib
Cloud Platforms: Google Colab, Azure, AWS
Type | Example Models | When to Use |
---|---|---|
Classification | Logistic Regression, Random Forest | To predict categories |
Regression | Linear Regression, XGBoost | To predict numeric values |
Clustering | K-Means, DBSCAN | To find hidden groups |
Recommendation | Collaborative Filtering | To recommend products |
Time Series | ARIMA, Prophet | To forecast future trends |
Investigate using Time Series Decomposition
Analyze the marketing campaign → Promotion Analysis
Perform Customer Segmentation
Build a Sales Forecasting Model
Create a dashboard for clear and interactive insights
Use data storytelling
Focus on business impact, not just model accuracy
Comply with data privacy rules such as the GDPR (General Data Protection Regulation)
Bias in data can lead to flawed models
Ensure model results are explainable and understandable to stakeholders
Business Problem: Need to forecast daily sales per branch to optimize inventory management.
Models Commonly Used
Time Series Models: ARIMA, SARIMA
Gradient Boosting: XGBoost, LightGBM
Deep Learning: LSTM (Long Short-Term Memory)
Workflow
Collect Data: Daily sales, promotions, holidays, temperature, etc.
Feature Engineering: Create new features such as lag sales, holiday dummy variables
Model Selection & Training: Choose and train the model using historical data
Model Evaluation: Assess using metrics like MAE, RMSE
Deployment: Use the model to forecast future sales
Feedback Loop: Continuously update and improve the model based on performance
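The feature engineering and evaluation steps can be sketched without any modelling library. The example below builds lag features and scores two baseline forecasts with MAE and RMSE. It does not implement ARIMA, XGBoost, or LSTM; the baselines only show what any such model would have to beat. The sales figures are hypothetical.

```python
import math

# Hypothetical daily sales for one branch over two weeks (Mon..Sun, Mon..Sun).
daily_sales = [120, 135, 128, 140, 160, 210, 190,
               125, 138, 130, 145, 165, 220, 195]

def make_lag_features(series, lags=(1, 7)):
    """Return (features, target) rows, where lag_k is the value k days earlier."""
    rows = []
    for t in range(max(lags), len(series)):
        features = {f"lag_{lag}": series[t - lag] for lag in lags}
        rows.append((features, series[t]))
    return rows

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

rows = make_lag_features(daily_sales)
actual = [target for _, target in rows]
naive_forecast = [features["lag_1"] for features, _ in rows]     # yesterday's sales
seasonal_forecast = [features["lag_7"] for features, _ in rows]  # same weekday last week

for name, forecast in [("naive", naive_forecast), ("seasonal", seasonal_forecast)]:
    print(f"{name}: MAE={mae(actual, forecast):.2f}, RMSE={rmse(actual, forecast):.2f}")
```

On data with a weekly pattern like this, the same-weekday baseline usually beats the yesterday baseline, which is why lag-7 and holiday features are common inputs to the models listed above.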
Models Used
Collaborative Filtering: Matrix Factorization (SVD, ALS)
Content-Based Filtering: TF-IDF + Cosine Similarity
Deep Learning: Autoencoders, Neural Collaborative Filtering
Workflow
Collect user viewing data (user–movie interaction logs)
Build user and content profiles (user embeddings, movie features)
Train models to predict preferences (e.g., probability of watching)
Generate personalized recommendations
A/B test to evaluate user satisfaction
Improve the model based on feedback and new interactions
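As a small illustration of the workflow, the sketch below uses item-based collaborative filtering with cosine similarity. This is a simpler stand-in for the matrix factorization and deep learning models listed above, not a replacement for them. The users, movie titles, and ratings are hypothetical.

```python
import math

# Hypothetical interaction logs: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob":   {"Movie A": 4, "Movie B": 5, "Movie D": 4},
    "carol": {"Movie C": 5, "Movie E": 4},
    "dave":  {"Movie A": 5, "Movie D": 5},
}

def item_vectors(ratings):
    """Invert the logs into movie -> {user: rating} profiles."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, ratings, top_n=2):
    """Rank unseen movies by similarity to what the user rated, weighted by rating."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {
        candidate: sum(cosine(vec, vectors[item]) * rating for item, rating in seen.items())
        for candidate, vec in vectors.items()
        if candidate not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice", ratings))
```

In production the ranking step would be followed by the A/B test in the workflow, since offline similarity scores do not guarantee that users actually watch the suggestions.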
Models Used:
Geospatial Clustering: K-Means, DBSCAN
Regression Models: Random Forest Regressor
Predictive Modeling: Gradient Boosting, Decision Trees
Workflow:
Collect data: existing store locations, foot traffic, revenue, competitor density
Create GIS maps: plot coordinates and connect with spatial data
Cluster locations to find areas similar to high-performing stores
Forecast expected sales at potential new locations
Recommend high-potential areas for expansion
Monitor actual performance post-launch
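The "forecast expected sales at potential new locations" step can be sketched with a nearest-neighbour estimate: a candidate site is assumed to earn roughly what the most similar existing stores earn. This is a swapped-in, simpler technique than the Random Forest or Gradient Boosting models listed above. All store figures and site names are hypothetical.

```python
import math

# Hypothetical existing stores: (daily foot traffic, competitors within 1 km, monthly revenue in THB).
stores = [
    (1200, 2, 900_000),
    (1100, 3, 820_000),
    (400, 1, 310_000),
    (450, 5, 250_000),
    (1300, 6, 700_000),
]
# Hypothetical candidate sites: (daily foot traffic, competitors within 1 km).
candidates = {"Site X": (1150, 2), "Site Y": (420, 4)}

def scale(value, low, high):
    """Min-max scale to [0, 1] so both features carry comparable weight."""
    return (value - low) / (high - low) if high > low else 0.0

def knn_expected_revenue(site, stores, k=2):
    """Average revenue of the k existing stores most similar to the candidate site."""
    traffic = [s[0] for s in stores]
    competitors = [s[1] for s in stores]

    def normalise(t, c):
        return (scale(t, min(traffic), max(traffic)),
                scale(c, min(competitors), max(competitors)))

    target = normalise(*site)
    nearest = sorted(stores, key=lambda s: math.dist(target, normalise(s[0], s[1])))[:k]
    return sum(s[2] for s in nearest) / k

for name, site in candidates.items():
    print(f"{name}: expected revenue ~ {knn_expected_revenue(site, stores):,.0f} THB/month")
```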
Passengers complain about unexpected fare increases or excessively high prices during peak hours.
However, without price adjustments, drivers may reject ride requests. The challenge is to balance passenger satisfaction against driver incentives.
Models Used:
Regression Models: Linear, Ridge, Lasso, Gradient Boosting
Reinforcement Learning: Multi-Armed Bandit, Deep Q-Network (DQN)
Demand Forecasting: XGBoost, LSTM (for time-dependent patterns)
Geospatial Analytics: Heatmaps, Clustering
Data Science Pipeline:
Collect relevant data
Ride request volume by location
Trip duration, traffic density
Cancellation and surge pricing behavior
External factors: weather, holidays, events
Analyze Demand & Supply
Detect demand surges via heatmaps and time series
Identify areas with driver shortages
Forecast future demand
Apply Dynamic Pricing Algorithms
Adjust fares based on supply-demand ratio
Use Reinforcement Learning to find the most effective pricing
Conduct A/B Testing
Compare fixed pricing vs. dynamic pricing groups
Evaluate usage, wait times, customer satisfaction
Deploy & Monitor
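The "adjust fares based on supply-demand ratio" step can be sketched as a capped linear rule. The sensitivity and cap values are assumptions chosen for illustration; a reinforcement learning approach would learn such values from outcomes rather than fix them in advance.

```python
def surge_multiplier(ride_requests, available_drivers, sensitivity=0.5, cap=2.0):
    """Fare multiplier that rises with the demand/supply ratio, capped to limit price shocks."""
    if available_drivers <= 0:
        return cap  # no supply at all: apply the maximum allowed multiplier
    ratio = ride_requests / available_drivers
    if ratio <= 1.0:
        return 1.0  # enough drivers: no surge
    return min(cap, 1.0 + sensitivity * (ratio - 1.0))

def fare(base_fare, ride_requests, available_drivers):
    """Base fare adjusted by the current surge multiplier, rounded to 2 decimals."""
    return round(base_fare * surge_multiplier(ride_requests, available_drivers), 2)

# Hypothetical zones: (ride requests, available drivers) in the last 10 minutes.
zones = {"Zone 1": (50, 50), "Zone 2": (90, 60), "Zone 3": (300, 50)}
for zone, (requests, drivers) in zones.items():
    print(f"{zone}: multiplier {surge_multiplier(requests, drivers):.2f}, "
          f"100 THB base fare becomes {fare(100, requests, drivers):.2f} THB")
```

The cap addresses the passenger complaint about excessive peak prices, while the multiplier itself keeps drivers motivated to accept rides in busy zones.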
✅ Outcomes:
Reduced passenger wait times → Higher satisfaction
Maintained driver motivation → More accepted rides
Improved ride matching → Lower cancellation rates
Increased average driver earnings during peak hours
Models Used:
Clustering: K-Means, Hierarchical Clustering
Classification: Random Forest, Logistic Regression
Forecasting: Time Series + Machine Learning
Workflow:
Analyze sales data by store and product
Cluster stores based on sales behavior
Forecast demand per store using customer behavior patterns
Align inventory with the target segment of each store
Use ML to identify “low-demand” items → reduce overproduction
Track weekly performance and adjust models based on actual sales
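The "cluster stores based on sales behavior" step can be illustrated with a tiny K-Means written in plain Python. The store names, sales-mix figures, and the choice of two clusters with fixed starting centroids are assumptions; a library implementation such as scikit-learn's would normally be used in practice.

```python
import math

# Hypothetical sales mix per store: (share from ready-to-eat food, share from household goods).
store_mix = {
    "Store 1": (0.70, 0.10),
    "Store 2": (0.65, 0.15),
    "Store 3": (0.20, 0.60),
    "Store 4": (0.25, 0.55),
    "Store 5": (0.68, 0.12),
}

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids as cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

points = list(store_mix.values())
# Fixed starting centroids keep the example deterministic.
centroids = k_means(points, centroids=[points[0], points[2]])
labels = {name: nearest_centroid(mix, centroids) for name, mix in store_mix.items()}
print(labels)
```

Stores in the same cluster can then share an inventory profile, which is the alignment step in the workflow above.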
Understanding the business problem is the first step
Match the problem with the right Data Science techniques
Present actionable and practical results
“Data is valuable only when turned into action.”