Histogram

Sample Dataset
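
The solutions below assume a data frame named data with the columns age, income, spending_score, gender, and region. The original dataset is not reproduced in this section, so the following is only a synthetic stand-in for trying the exercises. The sample size, value ranges, distributions, and region labels are assumptions for illustration, not the actual data.

```r
library(ggplot2)  # needed by the solutions that follow, not by the simulation itself

set.seed(123)  # make the simulated values reproducible
n <- 200       # assumed sample size

# Hypothetical stand-in dataset; all values and category labels are illustrative
data <- data.frame(
  age            = sample(18:70, n, replace = TRUE),
  income         = round(rlnorm(n, meanlog = log(50000), sdlog = 0.4)),
  spending_score = sample(1:100, n, replace = TRUE),
  gender         = sample(c("Male", "Female"), n, replace = TRUE),
  region         = sample(c("North", "South", "East", "West"), n, replace = TRUE)
)

str(data)  # inspect column names and types
```

The gender labels "Male" and "Female" are chosen to match the names used in scale_fill_manual() in a later solution; if the real data uses different labels, that call would need to be adjusted.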

Exercises

Plot a histogram of age to observe the distribution. Use ggplot2 to create this basic plot.

Solution:

data |>
  ggplot() +
  aes(x = age) +
  geom_histogram() +  # no bins or binwidth given, so ggplot2 falls back to 30 bins
  labs(title = "Distribution of Age",
       x = "Age",
       y = "Count")

Plot a histogram of income with a specific bin width (e.g., bin width of 5000). Experiment with different bin widths to see how they affect the histogram.

Solution:

data |>
  ggplot() +
  aes(x = income) +
  geom_histogram(binwidth = 5000) +
  labs(title = "Income Distribution with Bin Width = 5000",
       x = "Income",
       y = "Count")
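
One way to experiment with bin widths is to build the same histogram several times and compare how many bars each version produces. The sketch below does this on simulated incomes (the demo data frame and the chosen widths are hypothetical, not part of the exercise dataset) and uses layer_data() to read back the bins ggplot2 computed.

```r
library(ggplot2)

set.seed(42)
# Simulated incomes, for illustration only
demo <- data.frame(income = round(rlnorm(500, meanlog = log(50000), sdlog = 0.4)))

widths <- c(1000, 5000, 20000)  # bin widths to compare

# Build one histogram per bin width
plots <- lapply(widths, function(w) {
  ggplot(demo) +
    aes(x = income) +
    geom_histogram(binwidth = w) +
    labs(title = paste("Bin width =", w), x = "Income", y = "Count")
})

# layer_data() returns one row per bin, so the row count is the number of bars
n_bins <- sapply(plots, function(p) nrow(layer_data(p)))
names(n_bins) <- widths
n_bins

plots[[2]]  # display the histogram with bin width 5000
```

A narrow width yields many bars and a noisy outline, while a wide width yields few bars and can hide features such as skew or multiple peaks.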

Plot a histogram of spending_score and change the fill color to “skyblue” to give it a distinct look.

Solution:

data |> 
  ggplot() +
  aes(x = spending_score) +
  geom_histogram(fill = "skyblue") +
  labs(title = "Spending Score Distribution",
       x = "Spending Score",
       y = "Count")

Plot a histogram of age, change the fill color to “orange”, and set the border color of each bin to “black”.

Solution:

data |>  
  ggplot() +
  aes(x = age) +
  geom_histogram(fill = "orange", color = "black") +
  labs(title = "Distribution of Age with Orange Fill and Black Borders",
       x = "Age",
       y = "Count")

Overlay a density plot on top of a histogram of income to observe both the histogram and the smoothed density line.

Solution:

data |>
  ggplot() +
  aes(x = income, y = after_stat(density)) +  # density scale, so bars and curve are comparable
  geom_histogram(fill = "lightblue",
                 color = "black",
                 alpha = 0.7) +
  geom_density(color = "red", linewidth = 1) +  # linewidth is the line-thickness aesthetic since ggplot2 3.4.0
  labs(title = "Income Distribution with Density Overlay",
       x = "Income",
       y = "Density")
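
The overlay works because after_stat(density) rescales the bar heights so that the total bar area is 1, which is the same scale a density curve uses. A minimal check of this, on simulated incomes (the demo data frame is hypothetical, not the exercise dataset):

```r
library(ggplot2)

set.seed(1)
# Simulated incomes, for illustration only
demo <- data.frame(income = rlnorm(500, meanlog = log(50000), sdlog = 0.4))

p <- ggplot(demo) +
  aes(x = income, y = after_stat(density)) +
  geom_histogram(bins = 30)

bars <- layer_data(p)  # one row per bin, including xmin, xmax, count, and density

# Area of each bar is height (density) times width; the areas should sum to 1
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
total_area
```

With the default count scale the bars would be thousands of times taller than the density curve, and the curve would appear as a flat line along the x-axis.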

Create a histogram of spending_score and use facet_grid to separate the histograms by gender.

Solution:

data |>  
  ggplot() +
  aes(x = spending_score) +
  geom_histogram(fill = "purple", color = "black", bins = 15) +
  facet_grid(. ~ gender) +
  labs(title = "Spending Score Distribution by Gender",
       x = "Spending Score",
       y = "Count")

Create a histogram of income and use facet_grid to split it by region to compare distributions across regions.

Solution:

data |>  
  ggplot() +
  aes(x = income) +
  geom_histogram(fill = "green", color = "black", bins = 20) +
  facet_grid(. ~ region) +
  labs(title = "Income Distribution by Region",
       x = "Income",
       y = "Count")

Create a histogram of age and apply a different theme (e.g., theme_minimal() or theme_classic()).

Solution:

data |>  
  ggplot() +
  aes(x = age) +
  geom_histogram(fill = "blue", color = "white") +
  theme_minimal() +
  labs(title = "Age Distribution with Minimal Theme",
       x = "Age",
       y = "Count")

Create a histogram of income and fill the bars based on the gender variable.

Solution:

data |>
  ggplot() +
  aes(x = income, fill = gender) +
  geom_histogram(position = "identity",
                 alpha = 0.6,
                 color = "black",
                 bins = 20) +
  scale_fill_manual(values = c("Male" = "lightblue",
                               "Female" = "pink")) +
  labs(title = "Income Distribution by Gender",
       x = "Income",
       y = "Count",
       fill = "Gender")
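
The position = "identity" argument matters here: by default geom_histogram() stacks the groups on top of each other, whereas "identity" draws every group from the baseline so the two distributions overlap (hence the alpha transparency). The sketch below, on simulated data (the demo data frame is hypothetical), compares where the bars start under each setting.

```r
library(ggplot2)

set.seed(7)
# Simulated data, for illustration only
demo <- data.frame(
  income = rlnorm(500, meanlog = log(50000), sdlog = 0.4),
  gender = rep(c("Male", "Female"), each = 250)
)

base <- ggplot(demo) + aes(x = income, fill = gender)

overlaid <- layer_data(base + geom_histogram(position = "identity", bins = 20))
stacked  <- layer_data(base + geom_histogram(position = "stack", bins = 20))

# With "identity", every bar starts at zero
all(overlaid$ymin == 0)

# With "stack", one group's bars start where the other group's bars end
any(stacked$ymin > 0)
```

Overlaid bars show each group's own counts directly, while stacked bars show the combined total, which makes the upper group's shape harder to read.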

Create a histogram of spending_score and use facet_grid(gender ~ region) to create a grid of histograms by both gender and region.

Solution:

data |>  
  ggplot() +
  aes(x = spending_score) +
  geom_histogram(fill = "orange", color = "black", bins = 15) +
  facet_grid(gender ~ region) +
  labs(title = "Spending Score by Gender and Region",
       x = "Spending Score",
       y = "Count")