International College of Digital Innovation, CMU
October 30, 2024
A bar plot (or bar chart) is a type of graph that represents categorical data with rectangular bars.
Each bar’s length or height is proportional to the value it represents.
Bar plots are commonly used to compare the sizes of different categories, making it easy to visualize and interpret differences between groups.
Key Features of a Bar Plot:
Categories on One Axis: Typically, the categories are plotted along the x-axis (horizontal axis) in a vertical bar plot. In a horizontal bar plot, the categories are plotted along the y-axis.
Bar Length/Height: The length or height of each bar corresponds to the value of the category it represents. The higher or longer the bar, the greater the value.
Spacing Between Bars: Bars are usually separated by spaces to distinguish between different categories.
Customizable: Bar plots can be customized in various ways, including the color of the bars, the orientation (vertical or horizontal), whether the bars are stacked or grouped, and the use of additional elements like labels, legends, and grid lines.
Types of Bar Plots:
Vertical Bar Plot: Bars are vertical, with categories on the x-axis and values on the y-axis.
Horizontal Bar Plot: Bars are horizontal, with categories on the y-axis and values on the x-axis.
Stacked Bar Plot: Bars are stacked on top of each other to show sub-group values within the same category.
Grouped Bar Plot: Bars for different sub-groups are placed next to each other, making it easier to compare sub-group values within each category.
Uses of Bar Plots:
Comparing categories: Bar plots are ideal for comparing the size of different categories within a dataset.
Visualizing distributions: They can show the distribution of a categorical variable.
Highlighting trends: Bar plots can help highlight trends or patterns in categorical data over time or across different groups.
The geom_bar() function in ggplot2 is used to create bar plots in R.
This function can generate bar plots in two primary ways: by counting the occurrences of each category in a dataset (the default behavior) or by plotting pre-summarized data where the height of the bars represents the values provided.
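The default counting behavior can be sketched as follows. This is a minimal example, assuming the mpg dataset that ships with ggplot2 (the class variable discussed here is a column of that dataset); the object name p_count is only illustrative.

```r
library(ggplot2)

# mpg ships with ggplot2; each row describes one car model.
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar()  # default stat = "count": bar height = number of rows per class

p_count
```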
Explanation:
aes(x = class): Maps the class variable to the x-axis.
geom_bar(): Automatically counts the number of occurrences for each class.
If you already have summarized data (e.g., counts), you can use geom_bar(stat = "identity").
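A small sketch of plotting pre-summarized values. The data frame df_summary and its contents are hypothetical, made up for illustration; only the column names group and count follow the text.

```r
library(ggplot2)

# Hypothetical pre-summarized data: one row per group with its count.
df_summary <- data.frame(
  group = c("A", "B", "C"),
  count = c(10, 25, 15)
)

p_identity <- ggplot(df_summary, aes(x = group, y = count)) +
  geom_bar(stat = "identity")  # use the count column directly as bar height

p_identity
```

ggplot2 also provides geom_col(), which is shorthand for geom_bar(stat = "identity").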
Explanation:
aes(x = group, y = count): Maps group to the x-axis and count to the y-axis.
geom_bar(stat = "identity"): Uses the provided count values directly.
A stacked bar plot shows the distribution of a second categorical variable within each bar.
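A minimal sketch of a stacked bar plot, again assuming the mpg dataset from ggplot2, with drv as the second categorical variable. The name p_stacked is illustrative.

```r
library(ggplot2)

# Mapping drv to fill splits each class bar into segments, one per drive type.
# geom_bar() stacks the segments by default (position = "stack").
p_stacked <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar()

p_stacked
```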
Explanation:
aes(fill = drv): Fills the bars based on the drv (drive type) variable.
To create grouped bars instead of stacked bars, use position = "dodge" or position = "dodge2".
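Both position settings can be tried side by side. This sketch assumes the mpg dataset; the object names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = class, fill = drv))

# Grouped bars: one bar per drv value, placed next to each other within a class.
p_dodge <- base_plot + geom_bar(position = "dodge")

# Same idea using the "dodge2" position.
p_dodge2 <- base_plot + geom_bar(position = "dodge2")

p_dodge
p_dodge2
```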
Explanation:
position = "dodge": Places bars for each group side by side instead of stacking them.
position = "dodge2": Similar to "dodge", but adds a small amount of padding between the bars within each group.
You can customize the colors of the bars using scale_fill_manual() or other color scales.
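A short sketch of a manual fill scale, assuming the mpg dataset. The vector name class_colors and the specific colors are illustrative choices, not a recommended palette.

```r
library(ggplot2)

# Named vector: names are levels of class, values are color names.
class_colors <- c(
  "2seater" = "brown", "compact" = "red", "midsize" = "blue",
  "minivan" = "purple", "pickup" = "orange",
  "subcompact" = "pink", "suv" = "green"
)

p_colors <- ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar() +
  scale_fill_manual(values = class_colors)

p_colors
```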
Remark: In the named vector passed to values, the left-hand side of each pair is a level of the class variable, and the right-hand side is the color name assigned to it.
To flip the axes and create a horizontal bar plot, use coord_flip().
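A minimal sketch of a horizontal bar plot, assuming the mpg dataset; p_flip is an illustrative name.

```r
library(ggplot2)

# Build the usual vertical bar plot, then swap the axes.
p_flip <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_flip()  # class now runs along the vertical axis

p_flip
```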
Explanation:
coord_flip(): Swaps the x and y axes to create a horizontal bar plot.
You can use facets to create multiple bar plots for different subsets of data.
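Faceting can be sketched as follows, assuming the mpg dataset and using drv to split the data into panels; p_facet is an illustrative name.

```r
library(ggplot2)

# One panel per drive type, laid out in columns (". ~ drv").
p_facet <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  facet_grid(. ~ drv)

p_facet
```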
Explanation:
facet_grid(.~ drv): Creates a separate plot for each level of drv.
To change the order of the bars, you can reorder the factor levels of the variable mapped to the x-axis. This can be done using the reorder() function within aes().
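A sketch of sorting bars by frequency, assuming the mpg dataset. The labs() call is an addition here, used only to replace the long default axis title.

```r
library(ggplot2)

# table(class)[class] looks up, for every row, how often that row's class occurs.
# Negating it makes reorder() put the most frequent class first.
p_sorted <- ggplot(mpg, aes(x = reorder(class, -table(class)[class]))) +
  geom_bar() +
  labs(x = "class")

p_sorted
```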
Explanation:
reorder(class, -table(class)[class]): Reorders the levels of class based on the frequency of each level in descending order.
The - sign before table(class)[class] sorts the bars in descending order.
If the y-axis of your bar plot represents a continuous variable, you can sort the bars based on that variable.
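A sketch of sorting by a value column. The data frame df_summary and its values are hypothetical; only the column names group and count follow the text.

```r
library(ggplot2)

# Hypothetical pre-summarized data.
df_summary <- data.frame(
  group = c("A", "B", "C"),
  count = c(10, 25, 15)
)

# reorder(group, -count) orders the groups from largest to smallest count.
p_sorted_values <- ggplot(df_summary, aes(x = reorder(group, -count), y = count)) +
  geom_bar(stat = "identity") +
  labs(x = "group")

p_sorted_values
```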
Explanation:
reorder(group, -count): Reorders the group variable based on count in descending order.
stat = "identity": Tells ggplot2 to use the provided value directly.
The following exercises will help you practice creating and customizing bar plots using ggplot2 in R:
Task: Create a bar plot using the mpg dataset to show the count of cars in each class.
Hint: Use geom_bar() with aes(x = class).
Task: Modify the bar plot from Exercise 1 to fill the bars with a custom color, such as steelblue.
Hint: Use fill = "steelblue" inside geom_bar().
Task: Create a grouped bar plot showing the count of cars by class and drv (drive type).
Hint: Map drv to the fill aesthetic and set position = "dodge" in geom_bar().
Task: Create a stacked bar plot showing the count of cars by class and drv.
Hint: Map drv to the fill aesthetic. The default position is "stack".
Task: Create a horizontal bar plot of car counts by class.
Hint: Use coord_flip() to flip the axes.
Task: Create a bar plot of car counts by class, sorting the bars in descending order by count.
Hint: Use reorder(class, -table(class)[class]) within aes(x = ...).
Task: Add custom labels to the x and y axes of the bar plot, and give the plot a descriptive title.
Hint: Use labs(title = ..., x = ..., y = ...) to customize labels.
Task: Create a faceted bar plot showing the count of cars by class, with separate panels for each drv value.
Hint: Use facet_grid(.~ drv) to create facets.
Task: Create a bar plot showing the count of cars by class, using a custom color palette for the bars.
Hint: Use scale_fill_manual(values = c("red", "blue", "green", ...)) to apply custom colors.
Solution:
library(ggplot2)

# Bar plot of car counts by class, with one manually chosen color per class
mpg |>
  ggplot() +
  aes(x = class, fill = class) +
  geom_bar() +
  scale_fill_manual(values = c("compact" = "red",
                               "midsize" = "blue",
                               "suv" = "green",
                               "minivan" = "purple",
                               "pickup" = "orange",
                               "subcompact" = "pink",
                               "2seater" = "brown")) +
  labs(title = "Custom Color Bar Plot by Car Class",
       x = "Class",
       y = "Count",
       fill = "Class")