International College of Digital Innovation, CMU
September 1, 2025
Credit: https://www.thinklytics.io/how-to-choose-the-better-graph-for-data-visualization/
Choosing the right chart depends on the type of data you have and the message you want to convey. Below are general guidelines:
Histogram:
Represents the distribution of a continuous dataset.
Useful for understanding the underlying frequency distribution of continuous or discrete data.
Bar Chart:
Used to compare values across categories.
Helpful for showing trends over time or comparing values across groups.
Pie Chart:
Suitable for showing proportions of a whole.
Avoid using more than 5–7 categories, as it becomes difficult to interpret.
Line Chart:
Ideal for showing trends and changes over continuous intervals (e.g., time, temperature).
Useful when displaying multiple series.
Scatter Plot:
Displays the relationship between two variables.
Helps identify patterns and outliers.
Bubble Chart:
Represents three dimensions of data: x-axis, y-axis, and bubble size.
Useful for showing relationships among three variables.
Financial Chart:
Used for technical analysis of financial data (e.g., candlestick or OHLC charts showing open, high, low, and close prices).
Highlights trends, price momentum, etc.
When choosing a chart, consider the nature of your data, the story you want to tell, and your audience.
Experiment with different chart types and select the one that best communicates your message.
For further reading, visit https://r-graph-gallery.com
Key components typically found in statistical graphs include:
Axes: Horizontal (X) and vertical (Y) reference lines.
Data Points: Markers representing values.
Lines and Curves: Show trends in continuous data.
Bars: Represent category values in bar graphs.
Title and Labels: Provide context and explain axes.
Symbols: Distinguish datasets (shapes, colors).
Legend: Explains symbols and colors.
Color: Enhances clarity and highlights information.
Gridlines: Help interpret values.
Scale: Defines numerical values on axes.
Tick Marks: Mark intervals on axes.
Frame: The boundary enclosing the graph.
A well-designed statistical graph improves clarity and communication.
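As a rough illustration, the base R sketch below maps several of these components to plotting arguments and helper functions. The two data series are made up for the example; axes and tick marks are drawn by plot() by default.

```r
# Made-up data for two series
x <- 1:10
y1 <- x^1.5
y2 <- 2 * x + 3

plot(x, y1,
     type = "b",              # data points joined by lines
     pch = 16,                # symbol: filled circle
     col = "steelblue",       # color
     xlim = c(0, 10),         # scale of the x-axis
     ylim = c(0, 35),         # scale of the y-axis
     main = "Two Series",     # title
     xlab = "x", ylab = "y")  # axis labels
lines(x, y2, type = "b", pch = 17, col = "darkorange")  # second series
grid()                                                 # gridlines
legend("topleft",
       legend = c("y1", "y2"),
       col = c("steelblue", "darkorange"),
       pch = c(16, 17))       # legend explains symbols and colors
box()                         # frame around the plot region
```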
A histogram looks like a bar chart, but its adjacent bars display the frequency distribution of numeric data within specified intervals (bins).
The x-axis shows value ranges.
The y-axis shows frequency counts.
How a histogram is typically constructed:
Data Collection: Gather a set of data that you want to analyze.
Divide into Intervals (Bins): Divide the range of the data into intervals or bins. Each bin represents a specific range of values.
Count Frequencies: Count the number of data points that fall into each bin.
Create Bars: Draw bars above each bin on the histogram. The height of each bar corresponds to the frequency of data points in that bin.
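The four steps can be reproduced by hand in R as a sanity check. This is a minimal sketch on simulated data: cut() assigns each value to a right-closed bin (the same convention hist() uses by default), table() counts the values per bin, and those counts should match the ones computed by hist().

```r
# Step 1: data collection (simulated values, rounded to one decimal)
set.seed(1)
x <- round(rnorm(200, mean = 50, sd = 10), 1)

# Step 2: divide the range into bins of width 5
breaks <- seq(floor(min(x) / 5) * 5, ceiling(max(x) / 5) * 5, by = 5)

# Step 3: count frequencies; intervals are (a, b], the lowest one closed
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
counts <- as.vector(table(bins))

# Step 4: draw bars whose heights are the counts
h <- hist(x, breaks = breaks, col = "lightblue",
          main = "Histogram with bins of width 5", xlab = "x")

# Manual counts next to hist()'s counts
print(data.frame(bin = levels(bins), manual = counts, hist = h$counts))
```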
Histograms are useful for identifying symmetry, skewness, and data spread.
The hist() function in R is used to create a histogram, which is a type of plot that shows the distribution of a numeric variable.
Key arguments

| Argument | Description |
|---|---|
| x | A numeric vector (the data you want to plot) |
| breaks | Controls the number or placement of bins |
| main | Title of the plot |
| xlab, ylab | Labels for the x- and y-axis |
| col | Fill color for the bars |
| border | Color of the borders around the bars |
You can find color names or hex color codes at https://www.color-hex.com, for example:
#fff8e7, #a8e4a0, #b2ec5d, #e8f48c, #bfefff, #e0ffff, #e0b0ff
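A minimal example that uses every argument in the table might look like this. The scores vector is simulated for illustration, and the fill color is one of the hex codes listed above.

```r
set.seed(42)
scores <- rnorm(100, mean = 70, sd = 8)   # hypothetical exam scores

h <- hist(scores,
          breaks = 10,                    # suggested number of bins
          main = "Distribution of Exam Scores",
          xlab = "Score",
          ylab = "Frequency",
          col = "#bfefff",                # hex color code
          border = "gray30")              # bar border color
```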
The mean-variance criterion
The mean-variance criterion is a decision-making approach commonly used in finance and investment theory. It was introduced by Harry Markowitz in 1952 and is a key component of modern portfolio theory (MPT).
The criterion aims to optimize investment decisions by considering two key factors: the expected return and the volatility (or risk) of a portfolio.
Mean (Expected Return): The average return an investor can expect from a portfolio. The higher the expected return, the better.
Variance (or Standard Deviation): A measure of the volatility, or risk, of a portfolio's returns. A lower variance indicates less risk.
The criterion seeks the optimal portfolio by balancing these two factors. Investors are assumed to be risk-averse: among portfolios with the same expected return they prefer the one with lower risk, and among portfolios with the same risk they prefer the one with higher expected return.
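The comparison can be sketched in R under simple assumptions: two hypothetical portfolios with simulated daily returns, summarized by their sample mean and variance. The helpers mv_summary() and dominates() are written for this example only; dominates() returns TRUE when the first portfolio has an expected return at least as high and a variance at least as low as the second, with at least one of the two strictly better. When neither portfolio dominates the other, the mean-variance criterion alone does not pick one, and the choice depends on the investor's risk preference.

```r
set.seed(123)
# Hypothetical daily returns (simulated, not real market data)
portfolio_a <- rnorm(250, mean = 0.0008, sd = 0.010)
portfolio_b <- rnorm(250, mean = 0.0005, sd = 0.015)

# Summarize a return series by its mean (expected return) and variance (risk)
mv_summary <- function(r) c(mean = mean(r), variance = var(r))

# TRUE if s1 is no worse than s2 on both criteria and strictly better on one
dominates <- function(s1, s2) {
  no_worse <- s1[["mean"]] >= s2[["mean"]] && s1[["variance"]] <= s2[["variance"]]
  better   <- s1[["mean"]] >  s2[["mean"]] || s1[["variance"]] <  s2[["variance"]]
  no_worse && better
}

stats_a <- mv_summary(portfolio_a)
stats_b <- mv_summary(portfolio_b)
print(rbind(A = stats_a, B = stats_b))
cat("A dominates B:", dominates(stats_a, stats_b), "\n")
cat("B dominates A:", dominates(stats_b, stats_a), "\n")
```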
Does it work?
No: a second plain hist() call starts a new plot and replaces the first histogram. To overlay the two histograms, add the argument add = TRUE to the second hist() call.
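A minimal sketch of the overlay, assuming two simulated return series. The names return.stock1 and return.stock2 are stand-ins here, since the original data-generating code is not reproduced in this section. Both histograms share the same breaks, and the y-axis limit is computed from both so the second histogram is not clipped by the axes of the first.

```r
set.seed(2025)
# Hypothetical simulated returns (stand-ins for the course data)
return.stock1 <- rnorm(500, mean = 0.01, sd = 0.05)
return.stock2 <- rnorm(500, mean = 0.02, sd = 0.08)

# Common break points covering both series
brks <- pretty(range(c(return.stock1, return.stock2)), n = 30)

# Tallest bar across both histograms, computed without drawing
y_max <- max(hist(return.stock1, breaks = brks, plot = FALSE)$counts,
             hist(return.stock2, breaks = brks, plot = FALSE)$counts)

# The first call creates the plot; add = TRUE draws the second on top of it
hist(return.stock1, breaks = brks, ylim = c(0, y_max),
     col = rgb(0, 0, 1, 0.4), border = "white",
     main = "Returns of Stock 1 and Stock 2", xlab = "Return")
hist(return.stock2, breaks = brks, add = TRUE,
     col = rgb(1, 0, 0, 0.4), border = "white")
```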
Question
If you also run the code for return.stock3, how can you draw all three histograms in the same plot?
When adding a legend with legend(), you can choose its position with “topleft”, “bottomleft”, “topright”, or “bottomright”.
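For example, the sketch below places the legend in the top-left corner of a plot of two normal density curves. The curves are only there to give the legend something to describe; besides the four corners, legend() also accepts “top”, “bottom”, “left”, “right”, and “center”.

```r
x <- seq(-4, 4, length.out = 200)

plot(x, dnorm(x), type = "l", col = "blue", lwd = 2,
     ylab = "Density", main = "Legend in the top-left corner")
lines(x, dnorm(x, sd = 1.5), col = "red", lwd = 2, lty = 2)

# The position keyword is the first argument
legend("topleft",
       legend = c("sd = 1", "sd = 1.5"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)
```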
A density plot is a smoothed version of a histogram that shows the probability density of a continuous variable.
In R, we can create a density plot using the base plot() and density() functions.
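Alternatively, use the pipe operator (the native |> in R 4.1 and later) to pass the data to density() and then to plot().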
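A short sketch of both styles, using the built-in mtcars data. The native pipe |> requires R 4.1 or later; in older versions, %>% from the magrittr package behaves similarly for this case.

```r
data(mtcars)

# Base style: estimate the density, then plot the result
d <- density(mtcars$mpg)
plot(d, main = "Density of mpg", xlab = "Miles per gallon")

# Pipe style: the same steps, read left to right
mtcars$mpg |> density() |> plot(main = "Density of mpg (piped)", xlab = "Miles per gallon")
```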
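To overlay a density curve on a histogram, add the argument probability = TRUE (equivalent to freq = FALSE) to the hist() function so that the y-axis shows density instead of counts.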
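A sketch of this overlay on simulated data: with probability = TRUE the bar areas sum to 1, so a kernel density estimate (density()) and a theoretical normal curve (dnorm() drawn with curve()) can share the same y-axis. Using the sample mean and standard deviation for the normal curve is an assumption of this example.

```r
set.seed(10)
vals <- rnorm(1000, mean = 0, sd = 1)   # simulated data
m <- mean(vals)
s <- sd(vals)

# Density scale on the y-axis instead of counts
h <- hist(vals, probability = TRUE, ylim = c(0, 0.5),
          col = "gray90", border = "white",
          main = "Histogram with density curves", xlab = "Value")

# Kernel density estimate from the data
lines(density(vals), col = "blue", lwd = 2)

# Theoretical normal curve; inside curve(), x is the plotting grid
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2, lty = 2)

legend("topright", legend = c("Kernel density", "Normal curve"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)
```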
xlab: This argument sets the label for the x-axis in a plot.
ylab: This argument sets the label for the y-axis in a plot.
lwd: This stands for “line width” and controls the thickness of lines in a plot. The default is lwd = 1; larger values make lines thicker.
xlim: This argument sets the limits (range) of the x-axis. It takes a vector of two numbers, where the first is the lower limit and the second is the upper limit.
ylim: This argument sets the limits (range) of the y-axis. It also takes a vector of two numbers, specifying the lower and upper limits.
lty: This stands for “line type” and controls the style of lines in a plot. It accepts integers or character strings representing different line styles. Common lty values:
lty = 1 or “solid”: A solid line (default).
lty = 2 or “dashed”: A dashed line.
lty = 3 or “dotted”: A dotted line.
lty = 4 or “dotdash”: A dot-dash line.
lty = 5 or “longdash”: A long-dash line.
lty = 6 or “twodash”: A two-dash line.
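The sketch below draws one horizontal line for each of these line types, passing the character name to lty (the matching integer gives the same result), with lwd = 2 so the patterns are easier to see.

```r
# Integer codes and their character names
line_types <- c(solid = 1, dashed = 2, dotted = 3,
                dotdash = 4, longdash = 5, twodash = 6)

# Empty plot region to draw on
plot(NULL, xlim = c(0, 10), ylim = c(0.5, 6.8), axes = FALSE,
     xlab = "", ylab = "", main = "Line types (lwd = 2)")

for (i in seq_along(line_types)) {
  abline(h = i, lty = names(line_types)[i], lwd = 2)
  text(0, i + 0.3, adj = 0, cex = 0.8,
       labels = paste0("lty = ", line_types[[i]], ' or "', names(line_types)[i], '"'))
}
```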
If we want to color the bars red where x > 1.75:
Step 1: Compute the histogram without drawing it: assign the result of hist() to a new variable (here h) and use the argument plot = FALSE in the hist() function.
Step 2: Use the bin midpoints stored in h$mids to pick a color for each bar (red for midpoints above the cutoff, in this case 1.75), then draw the histogram with the plot() function.
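A minimal sketch of the two steps on simulated standard-normal data. The cutoff 1.75 comes from the text; comparing it against the bin midpoints in h$mids is one way to choose the colors, and gray for the remaining bars is an arbitrary choice.

```r
set.seed(99)
x <- rnorm(1000)

# Step 1: compute the histogram object without drawing it
h <- hist(x, plot = FALSE)

# Step 2: one color per bar, based on each bin's midpoint
bar_colors <- ifelse(h$mids > 1.75, "red", "gray80")
plot(h, col = bar_colors, main = "Bars with midpoint > 1.75 in red", xlab = "x")
abline(v = 1.75, lty = 2)   # mark the cutoff
```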
Task:
Create a basic histogram of the mpg (miles per gallon) variable from the mtcars dataset.
Instructions:
Load the mtcars dataset using data(mtcars).
Use the hist() function to create the histogram.
Add a title and labels to the axes using the main, xlab, and ylab arguments.
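One possible solution (the fill and border colors are an arbitrary choice):

```r
# Load the mtcars dataset
data(mtcars)

# Basic histogram with a title and axis labels
h <- hist(mtcars$mpg,
          main = "Histogram of Miles per Gallon",
          xlab = "Miles per gallon (mpg)",
          ylab = "Number of cars",
          col = "lightblue",
          border = "black")
```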
Task:
Create histograms of the Sepal.Length variable from the iris dataset with different bin widths.
Instructions:
Load the iris dataset using data(iris).
Use the breaks argument in the hist() function to create three histograms with different numbers of bins (e.g., breaks = 5, breaks = 15, breaks = 30).
Observe how changing the number of bins affects the histogram. Note that a single number passed to breaks is only a suggestion: R rounds the break points to “pretty” values, so the actual number of bins may differ slightly.
Solution:
# Load the iris dataset
data(iris)
# Histogram with 5 bins
hist(iris$Sepal.Length,
     breaks = 5,
     main = "Histogram of Sepal Length (5 bins)",
     xlab = "Sepal Length",
     col = "lightgreen",
     border = "black")
# Histogram with 15 bins
hist(iris$Sepal.Length,
     breaks = 15,
     main = "Histogram of Sepal Length (15 bins)",
     xlab = "Sepal Length",
     col = "lightcoral",
     border = "black")
# Histogram with 30 bins
hist(iris$Sepal.Length,
     breaks = 30,
     main = "Histogram of Sepal Length (30 bins)",
     xlab = "Sepal Length",
     col = "lightblue",
     border = "black")
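Because R treats a single number for breaks only as a suggestion, it can help to check how many bins were actually used. This sketch compares the three suggested values and, for contrast, passes an explicit vector of six break points spanning the data range, which yields exactly five equal-width bins.

```r
data(iris)

# Actual number of bins for each suggested value of breaks
suggested <- c(5, 15, 30)
actual <- sapply(suggested, function(b) {
  length(hist(iris$Sepal.Length, breaks = b, plot = FALSE)$counts)
})
print(data.frame(suggested, actual))

# Explicit break points give exactly the bins you ask for
r <- range(iris$Sepal.Length)
h5 <- hist(iris$Sepal.Length, breaks = seq(r[1], r[2], length.out = 6),
           col = "lightgreen", border = "black",
           main = "Sepal Length with exactly 5 bins", xlab = "Sepal Length")
```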
Task:
Create a histogram of the weight variable from a custom dataset and apply custom colors to the bars.
Instructions:
Create a vector weight <- c(58, 62, 67, 70, 73, 75, 80, 85, 90, 95, 100, 105, 110).
Use the hist() function to plot the histogram.
Apply a custom color to the bars using the col argument (e.g., col = "skyblue").
Customize the border color of the bars using the border argument (e.g., border = "black").
Task:
Overlay a density line on top of the histogram of the mpg variable from the mtcars dataset and add a rug plot.
Instructions:
Load the mtcars dataset.
Use hist(mtcars$mpg, freq = FALSE) to plot the histogram with density on the y-axis.
Use the lines() function with the density() function to add a density curve.
Add a rug plot using the rug() function.
Solution:
# Load the mtcars dataset
data(mtcars)
# Plot the histogram with density
hist(mtcars$mpg,
     freq = FALSE,
     main = "Histogram with Density and Rug Plot",
     xlab = "Miles Per Gallon",
     ylab = "Density",
     col = "lightgray",
     border = "black")
# Add a density line
lines(density(mtcars$mpg),
      col = "red",
      lwd = 2)
# Add a rug plot
rug(mtcars$mpg)
Task:
Compare the histograms of two different variables (Sepal.Length and Sepal.Width) from the iris dataset on the same plot.
Instructions:
Load the iris dataset.
Use the hist() function to plot the histogram of Sepal.Length with a specific color and transparency (col = rgb(1, 0, 0, 0.5)).
Use the hist() function again to plot the histogram of Sepal.Width on the same plot by setting add = TRUE and choosing a different color (col = rgb(0, 0, 1, 0.5)).
Add a legend to differentiate between the two histograms.
Solution:
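# Load the iris dataset
data(iris)
# Compute both histograms without drawing them to get shared axis limits;
# otherwise the first plot's x-axis (about 4 to 8) would clip Sepal.Width (about 2 to 4.5)
h_len <- hist(iris$Sepal.Length, plot = FALSE)
h_wid <- hist(iris$Sepal.Width, plot = FALSE)
x_lim <- range(c(h_len$breaks, h_wid$breaks))
y_lim <- c(0, max(c(h_len$counts, h_wid$counts)))
# Plot the histogram of Sepal.Length
hist(iris$Sepal.Length,
     col = rgb(1, 0, 0, 0.5),
     xlim = x_lim,
     ylim = y_lim,
     main = "Histogram of Sepal Length and Sepal Width",
     xlab = "Length/Width (cm)",
     ylab = "Frequency",
     border = "black")
# Add the histogram of Sepal.Width on the same plot
hist(iris$Sepal.Width,
     col = rgb(0, 0, 1, 0.5),
     add = TRUE,
     border = "black")
# Add a legend
legend("topright",
       legend = c("Sepal Length", "Sepal Width"),
       fill = c(rgb(1, 0, 0, 0.5), rgb(0, 0, 1, 0.5)))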