Visualizing Data in R with Default Package: Scatter Plot

Somsak Chanaim

International College of Digital Innovation, CMU

October 30, 2024

Load libraries

library(gapminder)
library(dplyr)
library(RColorBrewer)

A scatter plot displays the relationship between two continuous variables.

It is widely used in statistics and data analysis to identify patterns, trends, and correlations between the variables.

Each observation is drawn as a dot, and the dot's position on the graph is given by its values on the two variables.

The horizontal axis (x-axis) usually represents one variable, and the vertical axis (y-axis) represents the other.

Each point therefore represents one pair of values (x, y).

By examining how the points are distributed, you can judge the direction, form, and strength of the relationship between the variables.

Scatter plots are particularly useful for spotting trends, outliers, clusters, and other patterns in the data.

How do you draw a scatter plot in R with the plot() function?

To create a scatter plot with plot(), pass the variable for the x-axis and the variable for the y-axis as two numeric vectors of the same length (for example, two columns of a data frame).

Example:
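Here is a minimal sketch of the basic call. It uses two short made-up vectors, x and y. Any numeric vectors of equal length would work.

```r
# Two numeric vectors of equal length (made-up illustrative values)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

# plot(x, y) draws one point for each (x[i], y[i]) pair
plot(x, y)
```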

Changing color, symbol, and size
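The sketch below is one way to write the example that the following bullets describe. The data values, the title text, and the color "darkgreen" are illustrative assumptions. cex (a symbol size multiplier, default 1) is included to cover the "size" part of this slide.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

plot(x, y,
     main = "Scatter Plot of y vs. x",  # main title
     xlab = "x values",                 # x-axis label
     ylab = "y values",                 # y-axis label
     pch  = 19,                         # solid dot
     col  = "darkgreen",                # point color
     cex  = 1.5)                        # symbols drawn at 1.5 times the default size
```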

In this example:

  • x and y are vectors representing the data for the x-axis and y-axis, respectively.

  • main, xlab, and ylab are optional parameters specifying the main title, x-axis label, and y-axis label, respectively.

  • pch sets the type of plotting symbol (in this case, a solid dot).

  • col sets the color of the dots.

You can adjust these arguments to suit your data and preferences. If your data are stored in a data frame, refer to the variables by their column names. For example:
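The short sketch below uses R's built-in cars data frame, chosen here only for illustration. It has two columns: speed in mph and stopping distance dist in feet. Both calls draw the same plot. The second call uses the formula interface y ~ x together with a data argument.

```r
# cars is a built-in data frame with columns speed and dist
head(cars)

# Option 1: pass the columns with $
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Option 2: formula interface, y ~ x, with data =
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```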

This is a simple scatter plot. Additional arguments and functions can add more features, such as regression lines and text labels, depending on what your analysis requires.

pch

In base R graphics, the pch (plotting character) argument of plot() sets the symbol drawn for each point, so you can customize how the points look.

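Here are some common values for the pch argument:

  • pch = 0: Hollow square
  • pch = 1: Hollow circle (the default)
  • pch = 2: Hollow triangle, point up
  • pch = 3: Plus sign (+)
  • pch = 4: Cross (X)
  • pch = 5: Hollow diamond
  • pch = 6: Hollow triangle, point down
  • pch = 8: Asterisk
  • pch = 15: Solid square
  • pch = 16: Solid circle
  • pch = 17: Solid triangle, point up
  • pch = 19: Solid circle (slightly larger than 16)
  • pch = 20: Small solid dot (bullet)
  • pch = 21 to 25: Circle, square, diamond, triangle up, and triangle down whose inside can be filled with bg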
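To see every symbol at once, you can use a sketch like the one below. It plots pch values 0 to 25 in a single row and prints each number underneath. The orange fill only appears on symbols 21 to 25.

```r
# Show every standard plotting symbol, 0 to 25, with its number underneath
pch_values <- 0:25
plot(x = pch_values, y = rep(1, length(pch_values)),
     pch = pch_values, cex = 2,
     bg = "orange",            # only affects the fillable symbols 21 to 25
     ylim = c(0.5, 1.5), axes = FALSE,
     xlab = "", ylab = "", main = "pch values 0 to 25")
text(x = pch_values, y = 0.8, labels = pch_values, cex = 0.8)
```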

pch also accepts a vector, so you can give each group of points its own symbol.

The legend() function adds a legend explaining which symbol (pch value) belongs to which group.
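Here is a sketch of that pattern. It assumes the built-in iris data and gives each species its own symbol. The three pch values are an arbitrary choice.

```r
# One symbol per species: map the factor levels to pch values
species   <- iris$Species
sym       <- c(1, 2, 3)                  # hollow circle, hollow triangle, plus
point_pch <- sym[as.integer(species)]    # one symbol for each row

plot(iris$Sepal.Length, iris$Sepal.Width,
     pch = point_pch,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")

legend("topright",
       legend = levels(species),  # setosa, versicolor, virginica
       pch = sym)
```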

You can experiment with different pch values to customize the appearance of points in your scatter plot based on your preferences.

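Two more important arguments for scatter plots

  • bg = fill color inside the symbol (used only by the fillable symbols, pch = 21 to 25).

  • col = border color of the symbol (for pch = 0 to 20, col colors the whole symbol).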
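Here is a minimal sketch with the built-in cars data. It uses pch = 21 so that both col and bg take effect. The colors are arbitrary choices.

```r
plot(cars$speed, cars$dist,
     pch = 21,          # fillable circle: border uses col, inside uses bg
     col = "black",     # border color
     bg  = "skyblue",   # fill color
     cex = 1.5,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```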

Can you see something?

iris dataset
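iris is built into R. It holds four measurements, in centimeters, for 150 flowers from three species. The sketch below plots petal length against petal width and colors the points by species. The colors are chosen arbitrarily.

```r
# Petal measurements, colored by species
species_palette <- c("red", "darkgreen", "blue")
species_col     <- species_palette[as.integer(iris$Species)]

plot(iris$Petal.Length, iris$Petal.Width,
     pch = 19, col = species_col,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     main = "iris: petal length vs. width")
legend("topleft", legend = levels(iris$Species),
       col = species_palette, pch = 19)
```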

Example

If speed > 20, color the point red; otherwise, color it blue.
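This example appears to use R's built-in cars data, where speed is in mph and stopping distance dist is in feet. The sketch below uses ifelse(), which returns one color for each row.

```r
# Red when speed > 20, blue otherwise
point_col <- ifelse(cars$speed > 20, "red", "blue")

plot(dist ~ speed, data = cars, pch = 19, col = point_col,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```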

If speed < 10 and dist < 40, color the point red; otherwise, color it blue.
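This applies the same idea with two conditions combined, again assuming the cars data. Use the element-wise & here. && only works on single values and is not meant for whole columns.

```r
# Red only when BOTH conditions hold for that row
point_col <- ifelse(cars$speed < 10 & cars$dist < 40, "red", "blue")

plot(dist ~ speed, data = cars, pch = 19, col = point_col,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```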

Add a regression line to the scatter plot

Linear regression
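Here is a sketch that again assumes the cars data. lm() fits a least-squares line, and abline() draws it using the fitted intercept and slope.

```r
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

fit <- lm(dist ~ speed, data = cars)  # model: dist = b0 + b1 * speed
abline(fit, col = "blue", lwd = 2)    # abline() takes the intercept and slope from the lm object
coef(fit)                             # (Intercept) about -17.58, speed about 3.93
```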

Add a smooth line to the scatter plot
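One way to add a smooth, non-linear trend is lowess() from base R's stats package. This method is an assumption; the loess-based scatter.smooth() would also work.

```r
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# lowess() returns a list with x (sorted) and y (smoothed values)
smooth_fit <- lowess(cars$speed, cars$dist)
lines(smooth_fit, col = "red", lwd = 2)
```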

Add both linear and non-linear lines
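Here is a sketch that overlays both lines on the cars data and adds a legend to tell them apart. The colors and line types are arbitrary.

```r
plot(dist ~ speed, data = cars, pch = 19, col = "gray40",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Linear: straight line from a least-squares fit
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "blue", lwd = 2)

# Non-linear: lowess smoother
lines(lowess(cars$speed, cars$dist), col = "red", lwd = 2, lty = 2)

legend("topleft", legend = c("Linear (lm)", "Non-linear (lowess)"),
       col = c("blue", "red"), lwd = 2, lty = c(1, 2))
```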

Gapminder data from the gapminder package

Scatter plot of lifeExp vs. gdpPercap
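Here is a sketch of the plot using all the data. gapminder has one row per country for each five-year period from 1952 to 2007. The axis labels below are illustrative.

```r
library(gapminder)

# All years together: one point per country per year
plot(lifeExp ~ gdpPercap, data = gapminder,
     pch = 20, col = "gray30",
     xlab = "GDP per capita (US$, inflation-adjusted)",
     ylab = "Life expectancy (years)")
```

gdpPercap is strongly right-skewed. Adding log = "x" to plot() often spreads the points out more evenly.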

Scatter plot of lifeExp vs. gdpPercap for the year 2007 only
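Here is a sketch that keeps only the 2007 rows with dplyr's filter(). In base R, subset(gapminder, year == 2007) would do the same.

```r
library(gapminder)
library(dplyr)

# Keep only the 2007 observations (one row per country)
gap2007 <- gapminder %>% filter(year == 2007)

plot(lifeExp ~ gdpPercap, data = gap2007,
     xlab = "GDP per capita (US$)", ylab = "Life expectancy (years)",
     main = "Life expectancy vs. GDP per capita, 2007")
```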

The same 2007 scatter plot, with two changes:

  • Add a blue linear regression line.

  • Draw the points as red circles.
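Here is a sketch of the two changes. "Red circle" could mean a hollow circle (pch = 1) or a solid one (pch = 19); this sketch assumes hollow.

```r
library(gapminder)
library(dplyr)

gap2007 <- gapminder %>% filter(year == 2007)

plot(lifeExp ~ gdpPercap, data = gap2007,
     pch = 1, col = "red",                  # red hollow circles; use pch = 19 for solid
     xlab = "GDP per capita (US$)", ylab = "Life expectancy (years)",
     main = "Life expectancy vs. GDP per capita, 2007")

fit2007 <- lm(lifeExp ~ gdpPercap, data = gap2007)
abline(fit2007, col = "blue", lwd = 2)      # blue linear regression line
```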

RColorBrewer

Select the “Accent” palette
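Here is a sketch that colors the 2007 points by continent using the RColorBrewer "Accent" palette. brewer.pal(n, "Accent") returns n colors, up to a maximum of 8 for this palette.

```r
library(gapminder)
library(dplyr)
library(RColorBrewer)

gap2007 <- gapminder %>% filter(year == 2007)

# One "Accent" color per continent
n_groups   <- nlevels(gap2007$continent)
accent_col <- brewer.pal(n_groups, "Accent")

plot(lifeExp ~ gdpPercap, data = gap2007,
     pch = 19,
     col = accent_col[as.integer(gap2007$continent)],
     xlab = "GDP per capita (US$)", ylab = "Life expectancy (years)")
```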

Add a legend
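This self-contained sketch builds on the Accent example. legend() maps each continent name to its color. The legend position is an arbitrary choice.

```r
library(gapminder)
library(dplyr)
library(RColorBrewer)

gap2007    <- gapminder %>% filter(year == 2007)
accent_col <- brewer.pal(nlevels(gap2007$continent), "Accent")

plot(lifeExp ~ gdpPercap, data = gap2007,
     pch = 19, col = accent_col[as.integer(gap2007$continent)],
     xlab = "GDP per capita (US$)", ylab = "Life expectancy (years)")

# One legend entry per continent, in factor-level order to match the colors
legend("bottomright",
       legend = levels(gap2007$continent),
       col = accent_col, pch = 19, title = "Continent")
```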

Change the background and add grid lines

Basic way
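Here is a sketch of one basic approach. It assumes the background is set with par(bg = ...) and that grid() is called after plot(). The colors are arbitrary.

```r
# Set the figure background color; save the old settings so they can be restored
old_par <- par(bg = "lightyellow")

plot(dist ~ speed, data = cars, pch = 19, col = "blue",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
grid(col = "gray70", lty = "dotted")  # grid lines at the axis tick positions

par(old_par)  # restore the previous graphics settings
```

Because grid() runs after the points are drawn, the grid lines sit on top of the points.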

Another way
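One alternative, sketched below, shades only the plotting region. It works in four steps:

1. Draw empty axes with type = "n".
2. Fill the region with rect(), using the coordinates from par("usr").
3. Add grid lines.
4. Add the points with points() so they sit on top.

```r
# 1. Set up the axes without drawing points (type = "n")
plot(dist ~ speed, data = cars, type = "n",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# 2. Fill only the plotting region; par("usr") gives c(x1, x2, y1, y2)
usr <- par("usr")
rect(usr[1], usr[3], usr[2], usr[4], col = "gray92", border = NA)

# 3. White grid lines, then the points on top
grid(col = "white", lty = "solid", lwd = 1.5)
points(cars$speed, cars$dist, pch = 19, col = "steelblue")
box()  # redraw the plot border over the rectangle edge
```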

Exercise

Exercise 1: Basic Scatter Plot

Solution:

# Basic scatter plot
plot(x = mtcars$wt, y = mtcars$mpg, 
     main = "Scatter Plot of Weight vs. MPG",
     xlab = "Weight", 
     ylab = "Miles Per Gallon")

Objective: Create a basic scatter plot with default settings.

  • Use the plot() function to create a scatter plot of mtcars data with wt (weight) on the x-axis and mpg (miles per gallon) on the y-axis.

  • Add a main title “Scatter Plot of Weight vs. MPG”.

Exercise 2: Customizing Point Colors and Sizes

Solution:

# Scatter plot with customized point colors and sizes
plot(x = mtcars$wt, y = mtcars$mpg, 
     col = mtcars$cyl,        # Color points based on the number of cylinders
     pch = 19,                # Solid circles
     cex = 1.5,               # Increase point size
     xlab = "Weight", 
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Customized Points")

# Add a legend (sort() lists the groups as 4, 6, 8)
cyl_groups <- sort(unique(mtcars$cyl))
legend("topright",
       legend = cyl_groups,
       col = cyl_groups,
       pch = 19,
       title = "Number of Cylinders")

Objective: Customize the appearance of the points.

  • Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.

  • Use different colors for points based on the number of cylinders (cyl).

  • Increase the size of the points.

Exercise 3: Adding Regression Line

Solution:

# Scatter plot with regression line
plot(x = mtcars$wt, y = mtcars$mpg, 
     xlab = "Weight", 
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Regression Line")

# Add regression line
abline(lm(mpg ~ wt, data = mtcars), col = "red")

Objective: Add a regression line to the scatter plot.

  • Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.

  • Add a regression line to the plot.

Exercise 4: Adding Text Annotations

Solution:

# Create scatter plot
plot(x = mtcars$wt, y = mtcars$mpg, 
     xlab = "Weight", 
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Annotations")

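# Annotate specific points, selected by car name
cars_to_label <- c("Mazda RX4", "Hornet 4 Drive", "Duster 360")
idx <- match(cars_to_label, rownames(mtcars))
text(x = mtcars$wt[idx], y = mtcars$mpg[idx],
     labels = rownames(mtcars)[idx],
     pos = 4,        # place each label to the right of its point
     cex = 0.8,
     col = "blue")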

Objective: Annotate specific points with text.

  • Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.

  • Annotate the points for specific cars (e.g., “Mazda RX4”, “Hornet 4 Drive”, “Duster 360”) with their names.

Exercise 5: Scatter Plot with Multiple Groups

Solution:

# Create a scatter plot with multiple groups
plot(x = mtcars$wt, y = mtcars$mpg,
     col = mtcars$cyl,        # Color points based on the number of cylinders
     pch = mtcars$cyl,        # Use different symbols for each group (pch 4, 6, 8)
     xlab = "Weight",
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Multiple Groups")

# Add a legend (sort() lists the groups as 4, 6, 8)
cyl_groups <- sort(unique(mtcars$cyl))
legend("topright",
       legend = cyl_groups,
       col = cyl_groups,
       pch = cyl_groups,
       title = "Number of Cylinders")

Objective: Plot multiple groups with different symbols and colors.

  • Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.

  • Differentiate points by the number of cylinders (cyl) using different symbols and colors.