Visualizing Data in R with ggplot2:
Bar Plot

Somsak Chanaim

International College of Digital Innovation, CMU

October 30, 2024

What is a bar plot?

A bar plot (or bar chart) is a type of graph that represents categorical data with rectangular bars.

Each bar’s length or height is proportional to the value it represents.

Bar plots are commonly used to compare the sizes of different categories, making it easy to visualize and interpret differences between groups.

Key Features of a Bar Plot:

  • Categories on One Axis: Typically, the categories are plotted along the x-axis (horizontal axis) in a vertical bar plot. In a horizontal bar plot, the categories are plotted along the y-axis.

  • Bar Length/Height: The length or height of each bar corresponds to the value of the category it represents. The higher or longer the bar, the greater the value.

  • Spacing Between Bars: Bars are usually separated by spaces to distinguish between different categories.

  • Customizable: Bar plots can be customized in various ways, including the color of the bars, the orientation (vertical or horizontal), whether the bars are stacked or grouped, and the use of additional elements like labels, legends, and grid lines.

Types of Bar Plots:

  1. Vertical Bar Plot: Bars are vertical, with categories on the x-axis and values on the y-axis.

  2. Horizontal Bar Plot: Bars are horizontal, with categories on the y-axis and values on the x-axis.

  3. Stacked Bar Plot: Bars are stacked on top of each other to show sub-group values within the same category.

  4. Grouped Bar Plot: Bars for different sub-groups are placed next to each other, making it easier to compare sub-group values within each category.

Uses of Bar Plots:

  • Comparing categories: Bar plots are ideal for comparing the size of different categories within a dataset.

  • Visualizing distributions: They can show the distribution of a categorical variable.

  • Highlighting trends: Bar plots can help highlight trends or patterns in categorical data over time or across different groups.

The geom_bar() function in ggplot2

The geom_bar() function in ggplot2 is used to create bar plots in R.

This function can generate bar plots in two primary ways: by counting the occurrences of each category in a dataset (the default behavior) or by plotting pre-summarized data where the height of the bars represents the values provided.

Explanation:

  • aes(x = class): Maps the class variable to the x-axis.

  • geom_bar(): Automatically counts the number of occurrences for each class.
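The code this explanation refers to is not reproduced in these notes. A minimal sketch of what it might look like, assuming the mpg dataset that ships with ggplot2 (the object name p_count is only illustrative):

```r
library(ggplot2)

# mpg is bundled with ggplot2; class is a categorical column (compact, suv, ...)
p_count <- ggplot(mpg) +
  aes(x = class) +   # categories on the x-axis
  geom_bar()         # default stat = "count": bar height = number of rows per class

p_count
```

Because no y aesthetic is supplied, geom_bar() computes the heights itself by counting rows.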

Bar Plot with Pre-Summarized Data

If you already have summarized data (e.g., counts), you can use geom_bar(stat = "identity").

Explanation:

  • aes(x = group, y = count): Maps group to the x-axis and count to the y-axis.

  • geom_bar(stat = "identity"): Uses the provided count values directly.
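As an illustration of the pre-summarized case, here is a small sketch. The data frame df and its values are made up for this example; only the column names group and count come from the explanation above.

```r
library(ggplot2)

# Hypothetical pre-summarized data: one row per group, counts already computed
df <- data.frame(
  group = c("A", "B", "C"),
  count = c(10, 25, 15)
)

p_identity <- ggplot(df) +
  aes(x = group, y = count) +
  geom_bar(stat = "identity")   # use count as the bar height, no counting is done

p_identity
```

ggplot2 also provides geom_col(), which draws the same plot without the stat argument.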

Stacked Bar Plot

A stacked bar plot shows the distribution of a second categorical variable within each bar.

Explanation:

  • aes(fill = drv): Fills the bars based on the drv (drive type) variable.
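A stacked version might be sketched as follows, again assuming the mpg dataset from ggplot2 (p_stacked is an illustrative name):

```r
library(ggplot2)

# Mapping drv to fill splits each class bar into segments, one per drive type.
# geom_bar() stacks the segments by default (position = "stack").
p_stacked <- ggplot(mpg) +
  aes(x = class, fill = drv) +
  geom_bar()

p_stacked
```

The total height of each bar is still the number of cars in that class; the colored segments show how that total divides among the drv values.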

Grouped Bar Plot

To create grouped bars instead of stacked bars, use position = "dodge" or position = "dodge2".

Explanation:

  • position = "dodge": Places bars for each group side by side instead of stacking them.

  • position = "dodge2": Similar to "dodge", but adds a small amount of padding between the bars within each group.
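The two position settings can be compared with a sketch like the one below, assuming the mpg dataset from ggplot2 (p_dodge and p_dodge2 are illustrative names):

```r
library(ggplot2)

# Side-by-side bars: one bar per drv value within each class
p_dodge <- ggplot(mpg) +
  aes(x = class, fill = drv) +
  geom_bar(position = "dodge")

# Same grouping, with padding between the bars of each group
p_dodge2 <- ggplot(mpg) +
  aes(x = class, fill = drv) +
  geom_bar(position = "dodge2")

p_dodge
p_dodge2
```

In both plots every bar starts at zero, so sub-group counts can be compared directly within a class.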

Bar Plot with Custom Colors

You can customize the colors of the bars using scale_fill_manual() or other color scales.

Remark: In the named vector passed to scale_fill_manual(values = ...), the left-hand side of each = is a level of the variable class, and the right-hand side is the color name assigned to it.
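A sketch of a manual fill scale, assuming the mpg dataset from ggplot2. The particular colors and the names class_colors and p_colors are arbitrary choices for illustration.

```r
library(ggplot2)

# Named vector: names are the levels of class, values are color names
class_colors <- c(
  "2seater"    = "brown",
  "compact"    = "red",
  "midsize"    = "blue",
  "minivan"    = "purple",
  "pickup"     = "orange",
  "subcompact" = "pink",
  "suv"        = "green"
)

p_colors <- ggplot(mpg) +
  aes(x = class, fill = class) +            # fill must be mapped for the scale to apply
  geom_bar() +
  scale_fill_manual(values = class_colors)

p_colors
```

Naming the vector ties each color to a specific class, so the assignment does not depend on the order of the factor levels.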

Horizontal Bar Plot

To flip the axes and create a horizontal bar plot, use coord_flip().

Explanation:

  • coord_flip(): Swaps the x and y axes to create a horizontal bar plot.
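A horizontal version might be sketched like this, assuming the mpg dataset from ggplot2 (p_horizontal is an illustrative name):

```r
library(ggplot2)

p_horizontal <- ggplot(mpg) +
  aes(x = class) +
  geom_bar() +
  coord_flip()   # draw the x aesthetic (class) on the vertical axis

p_horizontal
```

Note that coord_flip() only changes how the plot is drawn. Labels set with labs(x = ...) still refer to the variable mapped to x, here class, even though it now appears on the vertical axis.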

Bar Plot with Facets

You can use facets to create multiple bar plots for different subsets of data.

Explanation:

  • facet_grid(.~ drv): Creates a separate panel for each level of drv, arranged side by side in columns.
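A faceted bar plot could be sketched as follows, assuming the mpg dataset from ggplot2 (p_facets is an illustrative name):

```r
library(ggplot2)

p_facets <- ggplot(mpg) +
  aes(x = class) +
  geom_bar() +
  facet_grid(. ~ drv)   # one column of panels per drv value (4, f, r)

p_facets
```

Each panel counts only the cars with that drive type, and all panels share the same y-axis scale by default, which keeps the bar heights comparable across panels.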

Sorting Bars by Count

To sort the bars, reorder the factor levels of the variable mapped to the x-axis. This can be done with the reorder() function inside aes().

Explanation:

  • reorder(class, -table(class)[class]): Reorders the levels of class based on the frequency of each level in descending order.

  • The - sign before table(class)[class] sorts the bars in descending order.
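The reordering described above might be used like this, assuming the mpg dataset from ggplot2 (p_sorted is an illustrative name):

```r
library(ggplot2)

p_sorted <- ggplot(mpg) +
  # table(class)[class] looks up, for every row, how many cars share its class;
  # reorder() then orders the levels by that (negated) frequency
  aes(x = reorder(class, -table(class)[class])) +
  geom_bar() +
  labs(x = "class")   # replace the long default axis title

p_sorted
```

Without labs(x = ...), the x-axis title would be the full reorder() expression.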

Sorting Bars by a Continuous Variable

If the y-axis of the bar plot represents a continuous variable, you can sort the bars by that variable.

Explanation:

  • reorder(group, -count): Reorders the group variable based on count in descending order.

  • stat = "identity": Tells ggplot2 to use the provided value directly.
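A sketch of sorting by a supplied value. The data frame df and its values are made up for this example; only the column names group and count come from the explanation above.

```r
library(ggplot2)

# Hypothetical pre-summarized data
df <- data.frame(
  group = c("A", "B", "C"),
  count = c(10, 25, 15)
)

p_sorted_value <- ggplot(df) +
  aes(x = reorder(group, -count), y = count) +   # largest count first
  geom_bar(stat = "identity") +
  labs(x = "group")

p_sorted_value
```

Dropping the minus sign, as in reorder(group, count), would sort the bars in ascending order instead.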

Here are 10 exercises to help you practice creating and customizing bar plots using ggplot2 in R:

Exercises

Exercise 1: Basic Bar Plot

  • Task: Create a bar plot using the mpg dataset to show the count of cars in each class.

  • Hint: Use geom_bar() with aes(x = class).

solution

library(ggplot2)  # provides ggplot() and the mpg dataset

mpg |>
  ggplot() +
  aes(x = class) +
  geom_bar() +
  labs(title = "Count of Cars by Class",
       x = "Class",
       y = "Count")

Exercise 2: Bar Plot with Custom Colors

  • Task: Modify the bar plot from Exercise 1 to fill the bars with a custom color, such as steelblue.

  • Hint: Use fill = "steelblue" inside geom_bar().

solution

mpg |>
  ggplot() +
  aes(x = class) +
  geom_bar(fill = "steelblue") +
  labs(title = "Count of Cars by Class",
       x = "Class",
       y = "Count")

Exercise 3: Grouped Bar Plot

  • Task: Create a grouped bar plot showing the count of cars by class and drv (drive type).

  • Hint: Map drv to the fill aesthetic and set position = "dodge" in geom_bar().

solution

mpg |>
  ggplot() +
  aes(x = class, fill = drv) +
  geom_bar(position = "dodge") +
  labs(title = "Count of Cars by Class and Drive Type",
       x = "Class",
       y = "Count",
       fill = "Drive Type")

Exercise 4: Stacked Bar Plot

  • Task: Create a stacked bar plot showing the count of cars by class and drv.

  • Hint: Map drv to the fill aesthetic. The default position is "stack".

solution

mpg |>
  ggplot() +
  aes(x = class, fill = drv) +
  geom_bar() +
  labs(title = "Stacked Bar Plot of Cars by Class and Drive Type",
       x = "Class",
       y = "Count",
       fill = "Drive Type")

Exercise 5: Horizontal Bar Plot

  • Task: Create a horizontal bar plot of car counts by class.

  • Hint: Use coord_flip() to flip the axes.

solution

mpg |>
  ggplot() +
  aes(x = class) +
  geom_bar(fill = "lightgreen") +
  # labs() names aesthetics, not screen positions: x is still class after the flip
  labs(title = "Horizontal Bar Plot of Cars by Class",
       x = "Class",
       y = "Count") +
  coord_flip()

Exercise 6: Bar Plot Sorted by Count

  • Task: Create a bar plot of car counts by class, sorting the bars in descending order by count.

  • Hint: Use reorder(class, -table(class)[class]) within aes(x = ...).

solution

mpg |>
  ggplot() +
  aes(x = reorder(class, -table(class)[class])) +
  geom_bar() +
  labs(title = "Sorted Count of Cars by Class",
       x = "Class",
       y = "Count")

Exercise 7: Bar Plot with Custom Labels

  • Task: Add custom labels to the x and y axes of the bar plot, and give the plot a descriptive title.

  • Hint: Use labs(title = ..., x = ..., y = ...) to customize labels.

solution

mpg |>
  ggplot() +
  aes(x = class) +
  geom_bar(fill = "orange") +
  labs(title = "Car Count by Class",
       x = "Car Class",
       y = "Total Count")

Exercise 8: Bar Plot with Facets

  • Task: Create a faceted bar plot showing the count of cars by class, with separate panels for each drv value.

  • Hint: Use facet_grid(.~ drv) to create facets.

solution

ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "skyblue") +
  facet_grid(.~ drv) +
  labs(title = "Car Count by Class, Faceted by Drive Type", 
           x = "Class", 
           y = "Count")

Exercise 9: Bar Plot with Custom Fill Colors

  • Task: Create a bar plot showing the count of cars by class, using a custom color palette for the bars.

  • Hint: Use scale_fill_manual(values = c("red", "blue", "green", ...)) to apply custom colors.

solution

mpg |>
  ggplot() +
  aes(x = class, fill = class) +
  geom_bar() +
  scale_fill_manual(values = c(
    "compact"    = "red",
    "midsize"    = "blue",
    "suv"        = "green",
    "minivan"    = "purple",
    "pickup"     = "orange",
    "subcompact" = "pink",
    "2seater"    = "brown"
  )) +
  labs(title = "Custom Color Bar Plot by Car Class",
       x = "Class",
       y = "Count",
       fill = "Class")