Data Visualization

Somsak Chanaim

International College of Digital Innovation, CMU

June 7, 2025

Importance of Data Visualization

1. Business Perspective

Accurate and Timely Decision-Making

Data visualization enables executives to quickly understand market trends, revenue, profits, and customer behavior.

Performance Monitoring

Dashboards can be used to display sales, costs, profit margins, and other key business factors.

Customer Analysis

Analyze demographic data, behavior, and customer segmentation to develop effective marketing strategies.

2. Economics Perspective

Economic Trend Forecasting

Line charts and heat maps help visualize economic conditions such as inflation, GDP growth, and unemployment rates.

Policy Analysis

Governments and economists can use data visualization to design policies that effectively address economic challenges.

Cross-Country/Regional Comparisons

Maps and bar charts can be used to compare economic growth across countries or regions.

3. Finance & Investment

Stock Market Trend Monitoring

Bar and line charts help investors understand stock market behavior, indices, and various assets.

Risk Management

Use box plots or histograms to analyze risk-related data, such as the distribution of returns.

Portfolio Analysis

Visualize profits, losses, and asset allocation within an investment portfolio.

4. Healthcare

Disease Monitoring and Epidemiology

Use heatmaps or bubble charts to track disease outbreaks and trends.

Medical Resource Management

Analyze hospital bed availability, patient volumes, and medication usage rates.

Treatment Development

Use scatter plots or violin plots to analyze data from clinical trials and research studies.

5. Education

Analyzing Academic Achievement

Use data visualization to monitor students’ academic performance.

Enhancing Teaching and Learning Systems

Analyze learner behavior through Learning Analytics Dashboards.

Global Education Comparison

Use maps or bar charts to evaluate the quality of education across countries.

6. Science & Research

Experimental Data Analysis

Use box plots or scatter plots to examine relationships between variables.

Visualizing Complex Data Patterns

Use PCA (Principal Component Analysis) or heatmaps to better understand high-dimensional data.

Communicating Research Findings Effectively

Graphs and visual aids help scientists clearly explain their study results.

Basic Types of Data Visualization to Know

Data visualization comes in many forms, each suited to different types of analysis. Let’s explore the main categories everyone should know.

1. Trend Visualization

Used to Observe Data Trends or Changes Over Time

  • Line Chart: Displays data trends over time, such as monthly sales or stock prices.

  • Area Chart: Similar to a line chart but emphasizes the area under the curve to show accumulated volume.

Example Use Cases: Tracking trends in GDP, inflation rates, or service user volumes.

2. Distribution Visualization

Used to Show the Patterns and Spread of Data

  • Histogram: Visualizes the distribution of values, e.g., population income.

  • Box Plot (Box-and-Whisker Plot): Shows the median, the quartiles, the range covered by the whiskers, and any outliers.

  • Density Plot: Displays a smoothed estimate of the data’s distribution, like a histogram without bin edges.

Example Use Cases: Analyzing customer spending or student exam scores.
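To make the box plot description concrete, here is a minimal Python sketch of the numbers a box plot draws. It assumes the common 1.5 × IQR rule for whiskers and outliers (charting tools differ in the exact rule), and the exam scores and the function name box_plot_summary are made up for illustration.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot draws, using the common 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach the most extreme points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical exam scores; 12 sits far below the rest.
scores = [12, 55, 60, 62, 65, 68, 70, 72, 75, 80, 85]
summary = box_plot_summary(scores)
print(summary)
```

With these scores the box spans 61 to 73.5, the median line sits at 68, and 12 is drawn as a separate outlier point.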

Example: Income Comparison Between Males and Females

3. Comparison Visualization

Used to Compare Data Across Different Categories

  • Bar Chart: Used to compare values across categories, e.g., sales by product.

  • Stacked Bar Chart: Used to compare parts of a whole, such as market share by company.

  • Horizontal Bar Chart: Useful when category labels are long or when comparing many items.

Example Use Cases: Comparing company revenues or the number of customers in each group.

Example: Two employees – Chatchai and Boat

4. Relationship Visualization

Used to Analyze the Relationship Between Two or More Variables

  • Scatter Plot: Shows the relationship between two variables, such as housing price vs. land size.

  • Bubble Plot: Similar to a scatter plot but uses the size of the points to represent a third variable.

Example Use Case: Exploring the relationship between interest rates and investment levels.

5. Geospatial Visualization

Used to Visualize Data Related to Maps or Geographic Coordinates

  • Heat Map: Visualizes data density or distribution over a map, such as population density.

  • Choropleth Map: Displays values by region, such as average income per province.

  • Bubble Map: Uses the position and size of bubbles to show specific values over geographic areas.

Example Use Case: Analyzing sales distribution across regions.

6. Hierarchical & Network Visualization

Used to Visualize Hierarchical Structures or Network Relationships

  • Tree Diagram: Represents tree-like structures such as organizational charts.

  • Sunburst Chart: A circular version of the tree diagram, ideal for nested data.

  • Network Graph: Shows relationships and connections between entities, such as social networks.

Example Use Cases: Analyzing corporate structure or relationships in a social network.

7. Composition Visualization

Used to Show How Different Parts Make Up a Whole

  • Pie Chart: Used to display proportions of categories.

  • Donut Chart: Similar to a pie chart but with a blank center.

  • Treemap: A better alternative to pie charts when there are many categories.

Example Use Case: Showing the proportion of total sales by product category.

There Are Many Types of Data Visualization, Depending on Data Characteristics and Analysis Goals

Choose Based on Your Analytical Purpose

  • Distribution of Data ➝ Histogram, Box Plot

  • Data Comparison ➝ Bar Chart

  • Trends and Changes Over Time ➝ Line Chart

  • Variable Relationships ➝ Scatter Plot

  • Geospatial Data Display ➝ Heat Map, Choropleth

Data Distribution

Histogram

Illustration by ChatGPT


A histogram is a type of bar chart used to represent the distribution of numerical data.

Each bar shows the frequency of data points falling within a specific range or “bin”.

This makes it easier to observe the shape of the distribution or identify clusters and gaps in the data.

Characteristics of a Histogram

  • X-axis: Represents the range of values (bins), which are divided into specified intervals, such as age ranges, test scores, numerical sizes, or time periods.

  • Y-axis: Represents the frequency — the number of data points that fall within each bin.

  • Bars: The height of each bar corresponds to the number of observations in that bin. The taller the bar, the more data falls within that interval.
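The three components above can be sketched in a few lines of Python. This is an illustration only: the test scores, the bin width of 10, and the function name histogram_counts are assumptions, not part of the course files.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each [left, left + bin_width) bin."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        left = start + index * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical integer test scores, binned in steps of 10 starting at 50.
scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 79, 81, 88, 93]
counts = histogram_counts(scores, bin_width=10, start=50)

# Text version of the chart: bin range (X-axis) and bar length (Y-axis).
for left, n in counts.items():
    print(f"{left:>3}-{left + 9:<3} {'#' * n}  ({n})")
```

Changing bin_width changes how smooth or detailed the shape looks, which is the same effect as adjusting the bin width in a charting tool.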

Histograms are often used to analyze data distribution, such as:

  • Examining the distribution of test scores

  • Analyzing the age distribution of a population

  • Observing the frequency of customer visits across different time periods

  • Exploring numerical data in statistical analysis

Comparing Data Using Histograms

The incomes of Group A and Group B are normally distributed:
Group A ~ \(N(\mu_1, \sigma_1^2)\)
Group B ~ \(N(\mu_2, \sigma_2^2)\)
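A comparison like this can be simulated with the Python standard library. In the sketch below, the means and standard deviations are hypothetical placeholders for \(\mu_1, \sigma_1, \mu_2, \sigma_2\), and shared_bin_counts is an illustrative helper. The key point is that both groups use the same bin edges, so their bars can be compared directly.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical (mean, standard deviation) values for the two income groups.
group_a = [random.gauss(30_000, 5_000) for _ in range(2_000)]
group_b = [random.gauss(40_000, 8_000) for _ in range(2_000)]

def shared_bin_counts(values, edges):
    """Count values in each [edges[i], edges[i + 1]) interval."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Same bin edges for both groups so the bars line up.
edges = list(range(0, 80_001, 10_000))
counts_a = shared_bin_counts(group_a, edges)
counts_b = shared_bin_counts(group_b, edges)

for i in range(len(edges) - 1):
    print(f"{edges[i]:>6}-{edges[i + 1]:<6}  A: {counts_a[i]:>4}  B: {counts_b[i]:>4}")
print("mean A:", round(statistics.mean(group_a)), " mean B:", round(statistics.mean(group_b)))
```

Under these assumed parameters, Group B's counts peak further to the right and are spread over more bins, reflecting its larger mean and standard deviation.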

Advantages of Using Histograms

  1. Clear View of Data Distribution:
    Histograms reveal how data is spread—whether it’s concentrated or dispersed.

  2. Detect Anomalies:
    They help identify unusual values or outliers in the dataset.

  3. Easy Comparison:
    You can quickly compare data frequencies across intervals or between groups.

Hands-On Practice

Creating Histograms from File

You will create histograms using the Excel file: histogram.xlsx

Each column represents data from a specific probability distribution:

Variable   Distribution               Reference Link
x1         Normal distribution        Wikipedia (TH)
x2         t-distribution             Wikipedia
x3         F-distribution             Wikipedia
x4         Beta distribution          Wikipedia
x5         Chi-squared distribution   Wikipedia
x6         Gamma distribution         Wikipedia

Steps in Microsoft Excel

  1. Open histogram.xlsx
  2. Select the data for one variable (x1 to x6)
  3. Go to the Insert tab → choose Histogram from the Charts section
  4. Right-click the X-axis → choose Format Axis → adjust Bin width as needed
  5. Repeat to create a separate chart for each distribution

Steps in Jamovi

  1. Open Jamovi

  2. Go to Open → This PC → Load histogram.xlsx

  3. Navigate to the Exploration tab → select Descriptives

  4. Drag variables x1 to x6 into the Variables panel

  5. In the right panel:

    • Enable Plots → Histogram

    • Optionally enable Density for smoother comparison

  6. Click “OK” to generate the plots

Tip for Interpretation

  • x1 (Normal): bell-shaped, symmetric

  • x2 (t-distribution): similar to normal but with heavier tails

  • x3 and x5 (F and Chi-squared): typically skewed right

  • x4 (Beta) and x6 (Gamma): shapes depend on parameters, often skewed
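The "symmetric" versus "skewed right" descriptions can also be checked numerically. The sketch below uses simulated stand-ins rather than the columns of histogram.xlsx. The distribution parameters are assumptions, and sample_skewness is a simple moment-based estimate written for illustration.

```python
import random
import statistics

def sample_skewness(values):
    """Moment-based skewness: about 0 for symmetric data, > 0 for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

random.seed(7)
n = 5_000
# Simulated stand-ins; the parameters are illustrative assumptions.
normal_like = [random.gauss(0, 1) for _ in range(n)]
# Gamma with shape 2 and scale 2 is a chi-squared distribution with 4 degrees of freedom.
chi2_like = [random.gammavariate(2.0, 2.0) for _ in range(n)]

skew_normal = sample_skewness(normal_like)
skew_chi2 = sample_skewness(chi2_like)
print("normal     :", round(skew_normal, 2))
print("chi-squared:", round(skew_chi2, 2))
```

A value near zero matches a symmetric, bell-shaped histogram, while a clearly positive value matches a histogram with a long right tail.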

Data Comparison

Bar Chart


A bar chart uses rectangular bars to represent quantitative data. The X-axis typically displays categories or groups, while the Y-axis shows numerical values such as frequency, quantity, or percentage.

Key Features of a Bar Chart

  • Used to compare values across different categories

  • Can be vertical or horizontal

  • Ideal for comparing group data such as monthly sales or number of customers by branch

1. Standard Bar Chart (Grouped Bar Chart)

Grouped Bar Chart → Used to compare values across different groups side-by-side (in parallel).

2.1 Stacked Bar Chart

Stacked Bar Chart → Used to visualize the composition or proportion of each group stacked on top of one another.

2.2 Proportional Stacked Bar Chart

Used to display the proportion of each category within a group, in a way that makes it easier to compare the overall structure across groups.
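The proportions behind such a chart are simple to compute. Here is a minimal Python sketch using the category/group/value sample data from this deck. The function name shares_within_category is an illustrative choice.

```python
# Sample data from this deck: (category, group, value).
rows = [("A", "X", 10), ("A", "Y", 15), ("B", "X", 20),
        ("B", "Y", 25), ("C", "X", 30), ("C", "Y", 35)]

def shares_within_category(rows):
    """Return {category: {group: percent of that category's total}}."""
    totals = {}
    for category, _, value in rows:
        totals[category] = totals.get(category, 0) + value
    shares = {}
    for category, group, value in rows:
        shares.setdefault(category, {})[group] = 100 * value / totals[category]
    return shares

shares = shares_within_category(rows)
for category, parts in shares.items():
    print(category, {g: round(p, 1) for g, p in parts.items()})
```

Every bar then has the same total height (100%), so the chart compares the mix of X and Y across categories rather than the absolute values.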

3. Horizontal Bar Chart

Horizontal Bar Chart → Useful when category names are long or when horizontal orientation improves readability.

Sample Data for Creating Different Types of Bar Charts in Excel

You can copy the data from this slide and paste it into Excel to create all four types of bar charts.

Long-format data, with one row per category and group combination as in the table below, cannot be used directly to create a grouped or stacked bar chart in Excel.

category group value
A X 10
A Y 15
B X 20
B Y 25
C X 30
C Y 35




\[\rightarrow\]

You must first restructure the data into the format below before you can create a bar chart in Excel:

category X Y
A 10 15
B 20 25
C 30 35
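The restructuring step (long format to wide format) can be sketched in Python as follows. The function pivot_wider and its parameter names are illustrative assumptions. In Excel itself, a PivotTable does the same job.

```python
def pivot_wider(rows, index, columns, values):
    """Reshape long-format rows into one dict per distinct index value."""
    wide = {}
    for row in rows:
        # Start a new output row the first time an index value is seen.
        target = wide.setdefault(row[index], {index: row[index]})
        target[row[columns]] = row[values]
    return list(wide.values())

long_rows = [
    {"category": "A", "group": "X", "value": 10},
    {"category": "A", "group": "Y", "value": 15},
    {"category": "B", "group": "X", "value": 20},
    {"category": "B", "group": "Y", "value": 25},
    {"category": "C", "group": "X", "value": 30},
    {"category": "C", "group": "Y", "value": 35},
]

wide_rows = pivot_wider(long_rows, index="category", columns="group", values="value")
for row in wide_rows:
    print(row)
```

The same function handles the year/group/value table used for line charts by passing index="year".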

Line Chart


A line chart is used to display trends or changes in data over time.

Data Table (Copy and Paste into Excel):

year group value
2000 A 5
2000 B 7
2001 A 8
2001 B 12
2002 A 15
2002 B 18
2003 A 20
2003 B 25
2004 A 28
2004 B 30
2005 A 35
2005 B 40

\[\rightarrow\]

However, you cannot create a proper line chart until the data is restructured like this:

year A B
2000 5 7
2001 8 12
2002 15 18
2003 20 25
2004 28 30
2005 35 40

Variable Relationships

Scatter Plot

A scatter plot is a chart that shows the relationship between two variables, X and Y, by representing each observation as a dot on the graph.
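One common way to put a number on the pattern seen in a scatter plot is the Pearson correlation coefficient. This is a supplementary sketch, not part of the original slides. The land-size and price figures are hypothetical, and pearson_r is written out by hand for clarity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect rising line, -1 a perfect falling line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: land size (square meters) and housing price (million baht).
land_size = [100, 150, 200, 250, 300, 350]
price = [1.1, 1.6, 1.9, 2.7, 3.0, 3.4]

r = pearson_r(land_size, price)
print("correlation:", round(r, 3))
```

A value close to +1 corresponds to dots that rise along a nearly straight line. The coefficient only measures linear association, so the scatter plot itself should still be inspected.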

Change to Log Scale for Better Readability

Applying a log scale (logarithmic axis) can make it easier to observe patterns when data spans several orders of magnitude — especially in skewed distributions or exponential growth.
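The effect can be illustrated without any plotting library. The sketch below assumes hypothetical values (for example, city populations) and compares where each point would sit on a linear axis and on a log axis, both expressed as a fraction of the axis length.

```python
import math

# Hypothetical values spanning several orders of magnitude.
values = [1_200, 8_500, 40_000, 310_000, 2_500_000, 10_000_000]

# Position on a linear axis that runs from 0 to the maximum value.
linear_pos = [v / max(values) for v in values]

# Position on a log10 axis that runs from the smallest to the largest value.
log_values = [math.log10(v) for v in values]
lo, hi = min(log_values), max(log_values)
log_pos = [(x - lo) / (hi - lo) for x in log_values]

for v, lin, lg in zip(values, linear_pos, log_pos):
    print(f"{v:>10,}  linear: {lin:5.3f}  log: {lg:5.3f}")
```

On the linear axis the four smallest values are squeezed into the first few percent of the axis, while on the log axis the points are spread out much more evenly.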

Download data from Google Drive

Bubble Plot

A bubble plot is an extension of a scatter plot that uses the size of the bubbles to represent an additional variable. This allows you to visualize three variables at once (X, Y, and bubble size), or four when bubble color is also used.

Components of a Bubble Plot

  • X-axis: A quantitative (numerical) variable

  • Y-axis: A quantitative (numerical) variable

  • Bubble Size: Represents the value of a third variable (e.g., population, sales)

  • Bubble Color (Optional): Can be used to represent categories or groups (e.g., country, product type)
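When mapping a variable to bubble size, a widely recommended convention is to make the bubble's area, not its radius, proportional to the value. Otherwise large values look exaggerated. Here is a minimal sketch under that assumption. The populations, the maximum radius of 30, and the function name bubble_radius are illustrative.

```python
import math

def bubble_radius(value, max_value, max_radius=30.0):
    """Scale radius with the square root so bubble AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)

# Hypothetical populations in millions.
populations = {"Country A": 5, "Country B": 20, "Country C": 80}
largest = max(populations.values())
radii = {name: bubble_radius(p, largest) for name, p in populations.items()}

for name, r in radii.items():
    print(f"{name}: population {populations[name]:>2}M -> radius {r:.1f}")
```

Country C has 16 times the population of Country A, so its bubble has 16 times the area but only 4 times the radius.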

Example Use Cases for Bubble Plot

  • Economics: Display GDP (X) vs. Unemployment Rate (Y), with bubble size representing population.

  • Business: Display Sales (X) vs. Profit (Y), with bubble size representing number of customers.

  • Public Health: Display Life Expectancy (X) vs. Average Income (Y), with bubble size representing population.

Bubble Plot Example
