Visualizing Data in R with Default Package: Histograms

Somsak Chanaim

International College of Digital Innovation, CMU

September 1, 2025

Overall

Credit: https://www.thinklytics.io/how-to-choose-the-better-graph-for-data-visualization/

How to Select the Chart?

Choosing the right chart depends on the type of data you have and the message you want to convey. Below are general guidelines:

Histogram:

  • Represents the distribution of a continuous dataset.

  • Useful for understanding the underlying frequency distribution of continuous or discrete data.

Bar Chart:

  • Used to compare values across categories.

  • Helpful for showing trends over time or comparing values across groups.

Pie Chart:

  • Suitable for showing proportions of a whole.

  • Avoid using more than 5–7 categories, as it becomes difficult to interpret.

Line Chart:

  • Ideal for showing trends and changes over continuous intervals (e.g., time, temperature).

  • Useful when displaying multiple series.

Scatter Plot:

  • Displays the relationship between two variables.

  • Helps identify patterns and outliers.

Bubble Chart:

  • Represents three dimensions of data: x-axis, y-axis, and bubble size.

  • Useful for showing relationships among three variables.

Financial Chart:

  • Used for technical analysis of financial data.

  • Highlights trends, price momentum, etc.

When choosing a chart, consider the nature of your data, the story you want to tell, and your audience.

Experiment with different chart types and select the one that best communicates your message.

For further reading, visit https://r-graph-gallery.com

How to choose the better graph

https://www.thinklytics.io

The Components of a Statistical Graph

Key components typically found in statistical graphs include:

  1. Axes: Horizontal (X) and vertical (Y) reference lines.

  2. Data Points: Markers representing values.

  3. Lines and Curves: Show trends in continuous data.

  4. Bars: Represent category values in bar graphs.

  5. Title and Labels: Provide context and explain axes.

  6. Symbols: Distinguish datasets (shapes, colors).

  7. Legend: Explains symbols and colors.

  8. Color: Enhances clarity and highlights information.

  9. Gridlines: Help interpret values.

  10. Scale: Defines numerical values on axes.

  11. Tick Marks: Mark intervals on axes.

  12. Frame: The boundary enclosing the graph.

A well-designed statistical graph improves clarity and communication.

Histogram

A histogram is a bar chart that displays the frequency distribution of data within specified intervals (bins).

  • The x-axis shows value ranges.

  • The y-axis shows frequency counts.

How a histogram is typically constructed:

  • Data Collection: Gather a set of data that you want to analyze.

  • Divide into Intervals (Bins): Divide the range of the data into intervals or bins. Each bin represents a specific range of values.

  • Count Frequencies: Count the number of data points that fall into each bin.

  • Create Bars: Draw bars above each bin on the histogram. The height of each bar corresponds to the frequency of data points in that bin.

Histograms are useful for identifying symmetry, skewness, and data spread.
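The construction steps above can be sketched in base R as follows. This is a minimal illustration, not the code from the original slides: the simulated data, the seed, and the bin width of 10 are all assumptions chosen for the example.

```r
# Step 1: data (simulated here; the seed and distribution are assumptions)
set.seed(123)
x <- rnorm(200, mean = 50, sd = 10)

# Step 2: divide the range of the data into bins of width 10
breaks <- seq(floor(min(x) / 10) * 10, ceiling(max(x) / 10) * 10, by = 10)

# Step 3: count the number of observations in each bin
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
counts <- table(bins)
print(counts)

# Step 4: draw one bar per bin; hist() repeats the counting for the same breaks
h <- hist(x, breaks = breaks,
          main = "Histogram built from explicit bins",
          xlab = "x", ylab = "Frequency")
```

The counts from cut() and table() should match h$counts, because both use right-closed intervals with the lowest value included.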


The hist() function

The hist() function in R is used to create a histogram, which is a type of plot that shows the distribution of a numeric variable.

Key Arguments

Argument      Description
x             A numeric vector (the data you want to plot)
breaks        Controls the number or placement of bins
main          Title of the plot
xlab, ylab    Labels for the x- and y-axis
col           Fill color for the bars
border        Color of the borders around the bars
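A short sketch that uses every argument listed above in one call. The data are simulated and the specific values (20 breaks, the colors, the labels) are illustrative assumptions.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)

h <- hist(x,
          breaks = 20,                  # suggested number of bins
          main   = "Distribution of x", # plot title
          xlab   = "Value",             # x-axis label
          ylab   = "Frequency",         # y-axis label
          col    = "lightblue",         # fill color of the bars
          border = "white")             # color of the bar borders
```

hist() also returns the computed bins invisibly, so assigning the result to h lets you inspect h$breaks and h$counts afterwards.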

Change the color

You can find color names and hex codes at https://www.color-hex.com

Try these hex colors:

#fff8e7, #a8e4a0, #b2ec5d, #e8f48c, #bfefff, #e0ffff, #e0b0ff
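As a small sketch, a hex code from the list can be passed directly to col as a string. The data below are simulated for illustration only.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)

# A hex code is passed as a character string, including the leading "#"
h <- hist(x, col = "#a8e4a0")

# Built-in color names work the same way; colors() lists all of them
head(colors())
```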

Change the title

Change border color

#fff8e7, #a8e4a0, #b2ec5d, #e8f48c, #bfefff, #e0ffff, #e0b0ff
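The title and border settings can be tried together in one call. This is only a sketch with simulated data; the title text and the two hex colors are arbitrary choices from the list above.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)

h <- hist(x,
          main   = "Histogram of simulated values", # replaces the default title
          col    = "#bfefff",                       # fill color
          border = "#e0b0ff")                       # border color
```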

Application of the histogram

The mean-variance criterion

The mean-variance criterion is a decision-making approach commonly used in finance and investment theory. It was introduced by Harry Markowitz in 1952 and is a key component of modern portfolio theory (MPT).

The mean-variance criterion aims to optimize investment decisions by considering two key factors: the expected return and the volatility (or risk) of a portfolio.

  • Mean (Expected Return): This represents the average return an investor can expect from a portfolio. The higher the expected return, the better.

  • Variance (or Standard Deviation): This measures the volatility or risk associated with the returns of a portfolio. A lower variance indicates less risk.

The mean-variance criterion seeks the optimal portfolio by balancing these two factors. Investors are assumed to be risk-averse: for a given expected return they prefer lower risk, and for a given level of risk they prefer a higher expected return.
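To make the two quantities concrete, the sketch below computes the mean and standard deviation of two return series. The series are simulated, not real market data, and the names return.stock1 and return.stock2 are assumed here by analogy with return.stock3 mentioned later.

```r
# Hypothetical daily returns for two stocks (simulated, not real data)
set.seed(2025)
return.stock1 <- rnorm(1000, mean = 0.0005, sd = 0.010)
return.stock2 <- rnorm(1000, mean = 0.0005, sd = 0.020)

# Mean (expected return) and standard deviation (risk) of each series
summary_table <- data.frame(
  stock = c("stock1", "stock2"),
  mean  = c(mean(return.stock1), mean(return.stock2)),
  sd    = c(sd(return.stock1), sd(return.stock2))
)
print(summary_table)
```

Both series are generated with the same expected return, so a risk-averse investor would prefer the one with the smaller standard deviation.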

Draw the two histograms in the same plot

Does it work?

How do we fix it?

By default, each hist() call starts a new plot, so the second histogram replaces the first. We need to pass the argument add = TRUE to the second hist() call.
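A minimal sketch of the fix, using simulated returns. The variable names, the shared breaks, and the semi-transparent colors are assumptions for illustration rather than the code from the original slides.

```r
# Hypothetical daily returns for two stocks (simulated, not real data)
set.seed(2025)
return.stock1 <- rnorm(1000, mean = 0.0005, sd = 0.010)
return.stock2 <- rnorm(1000, mean = 0.0005, sd = 0.020)

# Shared breaks covering both series, so the bars line up
breaks <- pretty(range(return.stock1, return.stock2), n = 30)

# First histogram creates the plot
h1 <- hist(return.stock1, breaks = breaks, col = rgb(1, 0, 0, 0.5),
           main = "Returns of two stocks", xlab = "Return")

# Second histogram is drawn on top of the existing plot
h2 <- hist(return.stock2, breaks = breaks, col = rgb(0, 0, 1, 0.5),
           add = TRUE)
```

The fourth value in rgb() is the opacity; 0.5 keeps the overlapping bars visible through each other.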

Question

If you run the code for return.stock3, how can you draw all three histograms in the same plot?

Add legend

You can set the position with a keyword such as "topleft", "bottomleft", "topright", or "bottomright".
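As a sketch, a legend can be added after the histograms have been drawn. The data, labels, and colors below are illustrative assumptions.

```r
# Hypothetical daily returns for two stocks (simulated, not real data)
set.seed(2025)
return.stock1 <- rnorm(1000, mean = 0.0005, sd = 0.010)
return.stock2 <- rnorm(1000, mean = 0.0005, sd = 0.020)
breaks <- pretty(range(return.stock1, return.stock2), n = 30)

col1 <- rgb(1, 0, 0, 0.5)
col2 <- rgb(0, 0, 1, 0.5)

hist(return.stock1, breaks = breaks, col = col1,
     main = "Returns of two stocks", xlab = "Return")
hist(return.stock2, breaks = breaks, col = col2, add = TRUE)

# The first argument is the position keyword; fill draws colored boxes
lg <- legend("topright",
             legend = c("Stock 1", "Stock 2"),
             fill = c(col1, col2))
```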

Add box
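hist() does not draw a frame around the plot region by default. One way to add it, shown here with simulated data, is to call box() after the histogram.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)

h <- hist(x, col = "lightblue", main = "Histogram with a frame")

# Draw a box around the plot region of the current plot
box()
```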

The Density Plot

A density plot is a smoothed version of a histogram that shows the probability density of a continuous variable.

In R, we can create a density plot using the base plot() and density() functions.
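A minimal sketch with simulated data: density() estimates the curve and plot() draws it.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)

# Kernel density estimate with the default settings
d <- density(x)

# Draw the estimated density curve
plot(d, main = "Density plot of x", xlab = "x")
```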

Use the pipe operator
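The same density plot can be written as a pipeline. This sketch assumes the native pipe |>, which is available from R 4.1 onward; the original slides may have used a different pipe.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)

# x is passed to density(), and the result is passed to plot()
x |> density() |> plot(main = "Density plot of x")

# Equivalent nested call without the pipe
d <- density(x)
```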

Histogram + Density

Add the argument probability = TRUE to the hist() function so that the y-axis shows density instead of counts.

  • xlab: This argument sets the label for the x-axis in a plot.

  • ylab: This argument sets the label for the y-axis in a plot.

  • lwd: This stands for “line width” and controls the thickness of lines in a plot. The default line width is lwd = 1, and increasing this value will make the line thicker.
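A sketch combining these arguments: the histogram is drawn on the density scale and the density curve is added with lines(). The data, labels, and colors are illustrative assumptions.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)

# probability = TRUE rescales the bars so that their total area is 1
h <- hist(x, probability = TRUE,
          main = "Histogram with density curve",
          xlab = "x", ylab = "Density",
          col = "lightgray")

# Add the density curve on top; lwd = 2 makes the line thicker
lines(density(x), col = "red", lwd = 2)
```

Without probability = TRUE the bars are on the count scale, and the density curve would appear almost flat along the bottom of the plot.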

Density + Histogram

Histogram + Density

  • xlim: This argument sets the limits (range) of the x-axis. It takes a vector of two numbers, where the first number is the lower limit and the second is the upper limit.

  • ylim: This argument sets the limits (range) of the y-axis. It also takes a vector of two numbers, specifying the lower and upper limits.

  • lty: This stands for “line type” and controls the style of lines in a plot. It accepts integers or character strings representing different line styles. Common lty values:

    • lty = 1 or “solid”: A solid line (default).

    • lty = 2 or “dashed”: A dashed line.

    • lty = 3 or “dotted”: A dotted line.

    • lty = 4 or “dotdash”: A dot-dash line.

    • lty = 5 or “longdash”: A long-dash line.

    • lty = 6 or “twodash”: A two-dash line.
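The sketch below draws the density curve first and then adds the histogram, using xlim, ylim, and lty. The axis limits and colors are assumptions chosen to suit the simulated data.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500)
d <- density(x)

# Density first: fixed axis ranges and a dashed line
plot(d,
     xlim = c(-4, 4),    # x-axis from -4 to 4
     ylim = c(0, 0.5),   # y-axis from 0 to 0.5
     lty = 2, lwd = 2, col = "blue",
     main = "Density + Histogram", xlab = "x")

# Histogram added on the same density scale, semi-transparent
h <- hist(x, probability = TRUE, add = TRUE,
          col = rgb(0.5, 0.5, 0.5, 0.3))
```

Setting xlim and ylim in the first call matters because later layers added with add = TRUE or lines() cannot change the axes.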

Extra Topic: Add another color to the histogram

Suppose we want the bars for x > 1.75 to be red.


Step 1: Create the histogram without plotting it: use the argument plot = FALSE in the hist() function and assign the result to a new variable (here, h).

Step 2: Choose the cut-off from the midpoints stored in object h (in this case, 1.75) and draw the histogram with the plot() function.
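The two steps can be sketched as follows. The data are simulated, and the breaks are chosen in steps of 0.25 so that 1.75 falls exactly on a bin edge; both are assumptions made for this illustration, not the original slide code.

```r
# Simulated data (an assumption for illustration)
set.seed(1)
x <- rnorm(500, mean = 1, sd = 1)

# Step 1: compute the histogram without drawing it
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.25)
h <- hist(x, breaks = breaks, plot = FALSE)

# Step 2: one color per bar, red where the bin midpoint is above 1.75
bar_col <- ifelse(h$mids > 1.75, "red", "lightblue")
plot(h, col = bar_col,
     main = "Bars above 1.75 in red", xlab = "x")
```

Because 1.75 is a bin edge here, the red bars contain exactly the observations with x > 1.75. With default breaks a bar may straddle the cut-off, and its color then depends only on where its midpoint lies.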

Exercise 1: Basic Histogram

Task:

Create a basic histogram of the mpg (miles per gallon) variable from the mtcars dataset.

Instructions:

  • Load the mtcars dataset using data(mtcars).

  • Use the hist() function to create the histogram.

  • Add a title and labels to the axes using the main, xlab, and ylab arguments.

Solution:

# Load the mtcars dataset
data(mtcars)

# Create a basic histogram
hist(mtcars$mpg,
     main = "Histogram of Miles Per Gallon",
     xlab = "Miles Per Gallon",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")

Exercise 2: Adjusting Bin Width

Task:

Create histograms of the Sepal.Length variable from the iris dataset with different numbers of bins.

Instructions:

  • Load the iris dataset using data(iris).

  • Use the breaks argument in the hist() function to create three histograms with different numbers of bins (e.g., breaks = 5, breaks = 15, breaks = 30). A single number is treated as a suggestion, so the actual number of bins may differ slightly.

  • Observe how changing the number of bins affects the histogram.

Solution:

# Load the iris dataset
data(iris)

# breaks = n is only a suggestion: R picks "pretty" cut points,
# so the actual number of bins may differ from n

# Histogram with about 5 bins
hist(iris$Sepal.Length,
     breaks = 5,
     main = "Histogram of Sepal Length (breaks = 5)",
     xlab = "Sepal Length",
     col = "lightgreen",
     border = "black")

# Histogram with about 15 bins
hist(iris$Sepal.Length,
     breaks = 15,
     main = "Histogram of Sepal Length (breaks = 15)",
     xlab = "Sepal Length",
     col = "lightcoral",
     border = "black")

# Histogram with about 30 bins
hist(iris$Sepal.Length,
     breaks = 30,
     main = "Histogram of Sepal Length (breaks = 30)",
     xlab = "Sepal Length",
     col = "lightblue",
     border = "black")

Exercise 3: Customizing Colors

Task:

Create a histogram of the weight variable from a custom dataset and apply custom colors to the bars.

Instructions:

  • Create a vector weight <- c(58, 62, 67, 70, 73, 75, 80, 85, 90, 95, 100, 105, 110).

  • Use the hist() function to plot the histogram.

  • Apply a custom color to the bars using the col argument (e.g., col = "skyblue").

  • Customize the border color of the bars using the border argument (e.g., border = "black").

Solution:

# Create a custom weight vector
weight <- c(58, 62, 67, 70, 73, 75, 80, 85, 90, 95, 100, 105, 110)

# Plot the histogram with custom colors
hist(weight,
     main = "Histogram of Weight",
     xlab = "Weight (kg)",
     ylab = "Frequency",
     col = "skyblue",
     border = "black")

Exercise 4: Adding Density and Rug Plot

Task:

Overlay a density line on top of the histogram of the mpg variable from the mtcars dataset and add a rug plot.

Instructions:

  • Load the mtcars dataset.

  • Use hist(mtcars$mpg, freq = FALSE) to plot the histogram with density on the y-axis.

  • Use the lines() function with the density() function to add a density plot.

  • Add a rug plot using the rug() function.

Solution:

# Load the mtcars dataset
data(mtcars)

# Plot the histogram with density
hist(mtcars$mpg,
     freq = FALSE,
     main = "Histogram with Density and Rug Plot",
     xlab = "Miles Per Gallon",
     ylab = "Density",
     col = "lightgray",
     border = "black")

# Add a density line
lines(density(mtcars$mpg),
      col = "red",
      lwd = 2)

# Add a rug plot
rug(mtcars$mpg)

Exercise 5: Comparing Two Distributions

Task:

Compare the histograms of two different variables (Sepal.Length and Sepal.Width) from the iris dataset on the same plot.

Instructions:

  • Load the iris dataset.

  • Use the hist() function to plot the histogram of Sepal.Length with a specific color and transparency (col = rgb(1, 0, 0, 0.5)).

  • Use the hist() function again to plot the histogram of Sepal.Width on the same plot by setting add = TRUE and choosing a different color (col = rgb(0, 0, 1, 0.5)).

  • Add a legend to differentiate between the two histograms.

Solution:

# Load the iris dataset
data(iris)

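# Plot the histogram of Sepal.Length
# xlim and ylim are widened so the Sepal.Width bars added below are not cut off
hist(iris$Sepal.Length,
     col = rgb(1, 0, 0, 0.5),
     xlim = c(2, 8),
     ylim = c(0, 40),
     main = "Histogram of Sepal Length and Sepal Width",
     xlab = "Length/Width",
     ylab = "Frequency",
     border = "black")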

# Add the histogram of Sepal.Width on the same plot
hist(iris$Sepal.Width,
     col = rgb(0, 0, 1, 0.5),
     add = TRUE,
     border = "black")

# Add a legend
legend("topright",
       legend = c("Sepal Length", "Sepal Width"),
       fill = c(rgb(1, 0, 0, 0.5), rgb(0, 0, 1, 0.5)))