International College of Digital Innovation, CMU
April 8, 2025
A bar plot is a type of chart or graph that uses bars to represent data values.
It is essentially a visual display of information where individual bars or columns correspond to different categories or groups, and the length or height of each bar reflects the magnitude of the data it represents.
Bar plots are commonly used for visualizing categorical data and making comparisons between different groups.
The components of a bar plot:
Axes:
The horizontal axis typically represents categories or groups.
The vertical axis represents the values, frequencies, or percentages associated with each category.
Bars:
Each bar is a rectangular column that extends from the baseline (zero) to a height corresponding to the value it represents.
The width of the bars may vary, but they are usually uniform.
Spacing:
Bars are usually separated by gaps so that each category reads as a distinct bar.
Bar plots are useful for showing the distribution of data, identifying patterns, and making comparisons between different categories.
In R, you can create a bar plot using the barplot() function. We will create a basic bar plot with random data:
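A minimal sketch of such a plot, using the vector and argument names described in the breakdown that follows. The category names, the seed, the color, and the title are illustrative assumptions, not taken from the original listing.

```r
# Illustrative data (assumed): four categories with random integer values
set.seed(123)
categories <- c("A", "B", "C", "D")
values <- sample(1:20, size = length(categories))

# barplot() draws the bars and returns the x positions of the bar midpoints
bar_midpoints <- barplot(height = values,
                         names.arg = categories,
                         col = "skyblue",
                         main = "Basic Bar Plot",
                         xlab = "Categories",
                         ylab = "Values")
```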
Let’s break down the code:
Vectors:
values is a vector containing the corresponding values for each category.
categories is a vector containing the names of the categories.
Arguments:
height is the data you want to visualize (a numeric variable).
names.arg is used to specify the names of the categories (a character or factor variable).
col sets the color of the bars (you can choose any color).
main sets the title of the plot.
ylab and xlab set the labels for the y-axis and x-axis, respectively.
The table() function in R is used to create a contingency table, which shows the frequency distribution of a variable or the cross-tabulation of multiple variables.
It counts the number of occurrences of each unique value in a vector, factor, or set of factors.
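As a short illustration of both uses, the sketch below counts one variable and then cross-tabulates two. The choice of the built-in mtcars dataset and the variable names gear_counts and am_gear_counts are assumptions made for this example.

```r
data(mtcars)

# One-way table: how many cars have each number of gears
gear_counts <- table(mtcars$gear)
print(gear_counts)

# Two-way contingency table: transmission type (am) by number of gears
am_gear_counts <- table(mtcars$am, mtcars$gear)
print(am_gear_counts)
```

A one-way table can be passed straight to barplot() as the bar heights; a two-way table produces stacked or grouped bars.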
Rotate the plot by setting horiz = TRUE.
Bar colors: "#ffff00", "#0000ff"
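A hedged sketch of a horizontal bar plot with the two colors listed above. The data (transmission type by number of gears in mtcars), the title, and the legend labels are assumptions chosen for illustration; only horiz = TRUE and the two hex colors come from the text.

```r
data(mtcars)

# Two-row contingency table: am (0 = automatic, 1 = manual) by gear
am_gear_counts <- table(mtcars$am, mtcars$gear)

# horiz = TRUE draws the bars horizontally, so the value axis is the x-axis
barplot(am_gear_counts,
        horiz = TRUE,
        col = c("#ffff00", "#0000ff"),
        main = "Transmission Type by Number of Gears",
        xlab = "Frequency",
        ylab = "Number of Gears",
        legend.text = c("Automatic", "Manual"))
```

Note that with horiz = TRUE the xlab and ylab roles swap: xlab now labels the frequencies.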
Use t() to transpose the table and create another bar plot.
The argument beside = TRUE is used to create grouped bar plots, where bars corresponding to different categories are placed side by side rather than stacked on top of each other.
Bar colors: "#ff0000", "#00ff00", "#0000ff"
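The two ideas can be combined in one sketch: transposing a table swaps which variable defines the groups, and beside = TRUE places the bars of each group next to each other. The mtcars variables, group names, and titles here are assumptions for illustration.

```r
data(mtcars)

# Rows = transmission type (am), columns = number of gears
am_gear_counts <- table(mtcars$am, mtcars$gear)

# t() swaps rows and columns: rows = gears, columns = transmission type
gear_am_counts <- t(am_gear_counts)

# Each column becomes one group; each row becomes one bar within the group
barplot(gear_am_counts,
        beside = TRUE,
        col = c("#ff0000", "#00ff00", "#0000ff"),
        names.arg = c("Automatic", "Manual"),
        main = "Number of Gears by Transmission Type",
        xlab = "Transmission",
        ylab = "Frequency",
        legend.text = rownames(gear_am_counts),
        args.legend = list(title = "Gears", x = "topright"))
```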
Task:
Create a basic bar plot to visualize the frequency of different gear types in the mtcars dataset.
Instructions:
Load the mtcars dataset using data(mtcars).
Use the table() function to calculate the frequency of the gear variable.
Use the barplot() function to create a bar plot.
Add a title and labels to the axes using the main, xlab, and ylab arguments.
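One possible solution, sketched in the style of the solutions to the later tasks. The variable name gear_freq, the title, and the axis labels are illustrative choices, not part of the original page.

```r
# Load the dataset
data(mtcars)

# Calculate the frequency of each gear type
gear_freq <- table(mtcars$gear)

# Create the bar plot with a title and axis labels
barplot(gear_freq,
        main = "Frequency of Gear Types",
        xlab = "Number of Gears",
        ylab = "Frequency")
```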
Task:
Create a horizontal bar plot of the frequency of cylinder types in the mtcars dataset.
Instructions:
Use the table() function to calculate the frequency of the cyl variable.
Use the barplot() function with the horiz = TRUE argument to create a horizontal bar plot.
Customize the colors of the bars using the col argument.
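One possible solution, again only a sketch. The variable name cyl_freq, the colors, and the labels are assumptions.

```r
# Calculate the frequency of each cylinder type
cyl_freq <- table(mtcars$cyl)

# Create a horizontal bar plot; the frequencies now run along the x-axis
barplot(cyl_freq,
        horiz = TRUE,
        col = c("red", "green", "blue"),
        main = "Frequency of Cylinder Types",
        xlab = "Frequency",
        ylab = "Number of Cylinders")
```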
Task:
Create a grouped bar plot to compare the frequency of cylinder types (cyl) by the number of gears (gear) in the mtcars dataset.
Instructions:
Use the table() function to create a contingency table of cyl and gear.
Use the barplot() function to create a grouped bar plot by setting beside = TRUE.
Add a legend to the plot using the legend.text and args.legend arguments.
Solution:
# Create a contingency table of cyl and gear
cyl_gear_table <- table(mtcars$cyl, mtcars$gear)
# Create a grouped bar plot
barplot(cyl_gear_table,
        beside = TRUE,
        main = "Cylinder Types by Gear Types",
        xlab = "Number of Gears",
        ylab = "Frequency",
        col = c("red", "green", "blue"),
        legend.text = rownames(cyl_gear_table),
        args.legend = list(title = "Cylinders", x = "topright"))
Task:
Create a stacked bar plot to show the distribution of cylinder types (cyl) within different gear types (gear) in the mtcars dataset.
Instructions:
Use the table() function to create a contingency table of cyl and gear.
Use the barplot() function to create a stacked bar plot (default behavior).
Customize the colors of the bars using a color palette.
Solution:
# Create a contingency table of cyl and gear
cyl_gear_table <- table(mtcars$cyl, mtcars$gear)
# Create a stacked bar plot
barplot(cyl_gear_table,
        main = "Stacked Bar Plot of Cylinder Types by Gear Types",
        xlab = "Number of Gears",
        ylab = "Frequency",
        col = c("red", "green", "blue"),
        legend.text = rownames(cyl_gear_table),
        args.legend = list(title = "Cylinders", x = "topright"))
Task:
Create a bar plot showing the counts of different species in the iris dataset, with custom axis labels, colors, and bar widths.
Instructions:
Use the table() function to calculate the frequency of the Species variable in the iris dataset.
Use the barplot() function to create the bar plot.
Customize the bar plot with the following:
Use custom colors for each bar.
Add custom names to the x-axis labels using the names.arg argument.
Adjust the width of the bars using the width argument.
Solution:
# Calculate the frequency of species
species_freq <- table(iris$Species)
# Create the bar plot with customizations
# Note: a single width value has no visible effect unless xlim is also set
barplot(species_freq,
        main = "Frequency of Species in Iris Dataset",
        xlab = "Species",
        ylab = "Frequency",
        col = c("purple", "orange", "cyan"),
        names.arg = c("Setosa", "Versicolor", "Virginica"),
        width = 0.7,
        xlim = c(0, 3),
        border = "black")
A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical proportions. In R, you can create a pie chart using the pie() function.
See Listing 1 for details.
pie() Function:
The pie() function creates a pie chart.
The first argument, x, is the data you want to visualize, which in this case is the frequency of different cylinder types.
The main argument specifies the title of the chart.
The col argument is used to set custom colors for the slices of the pie chart.
The labels argument customizes the labels that appear on the slices of the pie chart.
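A minimal sketch of a pie() call that uses the four arguments described above. It assumes the cylinder frequencies come from mtcars$cyl; the title, colors, and label text are illustrative and may differ from the original listing.

```r
# Frequency of each cylinder type in mtcars (assumed data source)
cyl_freq <- table(mtcars$cyl)

# Build one label per slice from the table's category names
slice_labels <- paste(names(cyl_freq), "cylinders")

# x = data, main = title, col = slice colors, labels = slice labels
pie(x = cyl_freq,
    main = "Distribution of Cylinder Types",
    col = c("red", "green", "blue"),
    labels = slice_labels)
```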
The pie chart is one of the most popular charts, but also one of the most criticized in data visualization.
❌ Why pie charts are often not recommended:
🎯 Humans are bad at comparing angles
It’s hard for people to accurately compare the size of pie slices, especially when values are close.
Bar charts allow for easier and more accurate comparison using length, not angle.
Example: Can you easily tell the difference between 23% and 27% in a pie chart? Not really.
🍩 Too many slices = confusion
Pie charts get messy with more than 4–5 categories.
Colors become harder to distinguish.
Labels may overlap or require a legend, which makes interpretation slower.
📉 No clear axis = no easy comparison
There’s no y-axis or consistent baseline.
We lose the advantage of alignment (as in bar plots) which helps the eye compare quantities.
📊 Bar chart does everything better
Bar charts:
Are easier to read
Handle small differences better
Are simpler to label
Are more scalable to many categories
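The comparison argued above can be tried directly by drawing the same close values both ways. This is only a sketch: the four shares, including the 23% and 27% from the example, and the category names are made up for illustration.

```r
# Hypothetical shares (percent) with deliberately close values
shares <- c(A = 23, B = 27, C = 25, D = 25)

# Draw the two charts side by side, then restore the previous layout
old_par <- par(mfrow = c(1, 2))
pie(shares, main = "Pie chart")
barplot(shares, main = "Bar chart", ylab = "Share (%)", ylim = c(0, 30))
par(old_par)
```

In the bar chart the bars share a common baseline, so the difference between A and B shows up as a difference in height.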
✅ When pie charts might still be acceptable:
Use Case | Okay? |
---|---|
Only 2–3 clear categories | ✅ Yes |
Audience is general/public | ✅ Maybe |
Style or storytelling focus | ✅ Yes (e.g. in infographics) |
Detailed analysis or comparison | ❌ No |
Feature | Pie Chart | Bar Chart |
---|---|---|
Easy to compare | ❌ Hard | ✅ Easy |
Best for small N | ✅ Yes (2–3 items) | ✅ Yes (scalable) |
Labels/readability | ❌ Poor | ✅ Better |
Style/visual appeal | ✅ Sometimes | ⚖️ Depends on use |
Edward Tufte (data viz pioneer): “The only worse design than a pie chart is several of them.”
Stephen Few: Recommends bar or dot plots for almost every case where pie charts are used.