International College of Digital Innovation, CMU
October 30, 2024
A scatter plot is a type of data visualization that is used to display the relationship between two continuous variables.
It is often employed in statistics and data analysis to identify patterns, trends, and correlations between the variables.
In a scatter plot, each data point is represented by a dot, and the position of the dot on the graph corresponds to the values of the two variables.
The horizontal axis (x-axis) typically represents one variable, while the vertical axis (y-axis) represents the other variable.
Each point on the graph represents a pair of values from the two variables.
By examining the distribution of points, you can gain insights into the nature of the relationship between the variables.
Scatter plots are particularly useful for identifying trends, outliers, clusters, or any other patterns that may exist in the data.
To create a scatter plot in R using the plot() function, you need to provide the vectors or data frame columns representing the variables you want to plot on the x-axis and y-axis.
Example:
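The original listing did not survive at this point, so the following is only a minimal sketch of such a call. The vectors x and y hold made-up illustrative values, and the title and axis labels are placeholders rather than the author's originals.

```r
# Hypothetical sample data, invented purely for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

# Basic scatter plot with a title, axis labels, symbol type, and color
plot(x, y,
     main = "Scatter Plot Example",  # main title
     xlab = "X-axis",                # x-axis label
     ylab = "Y-axis",                # y-axis label
     pch = 19,                       # solid dot
     col = "blue")                   # color of the dots
```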
In this example:
x and y are vectors representing the data for the x-axis and y-axis, respectively.
main, xlab, and ylab are optional parameters specifying the main title, x-axis label, and y-axis label, respectively.
pch sets the type of plotting symbol (in this case, a solid dot).
col sets the color of the dots.
You can customize these parameters based on your specific data and preferences. If you have a data frame, you can use column names to access variables. For example:
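As a sketch of the data frame case, the call below uses the cars data frame that ships with R (columns speed and dist) and accesses its columns with the $ operator. The choice of cars, the labels, and the color are assumptions made for illustration.

```r
# 'cars' is a built-in data frame with the columns speed (mph) and dist (ft)
plot(cars$speed, cars$dist,
     main = "Stopping Distance vs. Speed",
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     pch = 19,           # solid dot
     col = "darkgreen")  # color of the dots
```

The same plot can also be written with the formula interface, plot(dist ~ speed, data = cars), which avoids repeating the data frame name.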
This example assumes a simple scatter plot, but you can use additional parameters and functions to add more features, such as regression lines, labels, and more, depending on your analysis requirements.
In R, the pch (plot character) argument is used in the base plot function to specify the type of plotting symbol or point character to be used in the scatter plot. The pch parameter allows you to customize the appearance of the points on the plot.
Here are some common values for the pch argument:
pch = 0: Hollow square
pch = 1: Hollow circle (the default)
pch = 2: Hollow triangle, point up
pch = 3: Plus sign
pch = 4: X (cross)
pch = 5: Hollow diamond
pch = 6: Hollow triangle, point down
pch = 7: Square with an X inside
pch = 8: Asterisk
pch = 9: Diamond with a plus sign inside
pch = 19: Solid dot (filled circle)
The pch argument can also be used to give different groups of points different plotting symbols.
The legend() function can then be used to add a legend explaining the meaning of each pch value.
You can experiment with different pch values to customize the appearance of points in your scatter plot based on your preferences.
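One way to experiment is to draw every symbol next to its number. The sketch below lays out the base R symbols 0 to 25 in two rows; the variable names, layout, and colors are illustrative choices, not part of the original notes.

```r
# Reference chart of the base R plotting symbols (pch = 0 to 25)
pch_values <- 0:25
x_pos <- (pch_values %% 13) + 1     # column 1 to 13
y_pos <- 2 - (pch_values %/% 13)    # row 2 for pch 0-12, row 1 for pch 13-25

plot(x_pos, y_pos,
     pch = pch_values,              # one symbol per point
     cex = 2,
     col = "black",
     bg = "orange",                 # fill color, only visible for pch 21 to 25
     xlim = c(0.5, 13.5), ylim = c(0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Base R plotting symbols (pch = 0 to 25)")

# Print each pch number below its symbol
text(x_pos, y_pos, labels = pch_values, pos = 1, offset = 1)
```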
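Two important color arguments for scatter plot symbols:
bg = Fill color inside the symbol (only used by the fillable symbols, pch = 21 to 25).
col = Border color of the symbol (for all other symbols, col sets the color of the whole symbol).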
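A short sketch of how bg and col interact, assuming the built-in mtcars data and arbitrary example colors:

```r
# pch = 21 is a circle whose border and fill can be colored separately
plot(mtcars$wt, mtcars$mpg,
     pch = 21,
     col = "black",      # border color of each circle
     bg = "skyblue",     # fill color inside each circle
     cex = 1.5,
     xlab = "Weight",
     ylab = "Miles Per Gallon",
     main = "Border (col) and Fill (bg) Colors")
```

With a non-fillable symbol such as pch = 1 or pch = 19, the bg value would be ignored and only col would take effect.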
Can you see something?
If speed > 20, assign the color red; otherwise, assign blue.
If speed < 10 and dist < 40, assign the color red; otherwise, assign blue.
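The two coloring rules can be sketched with ifelse(), which returns one color per row. The variable names speed and dist suggest the built-in cars data frame; that choice, and the names point_col_1 and point_col_2, are assumptions made here.

```r
# Rule 1: red if speed > 20, otherwise blue
point_col_1 <- ifelse(cars$speed > 20, "red", "blue")
plot(cars$speed, cars$dist,
     col = point_col_1, pch = 19,
     xlab = "Speed", ylab = "Distance",
     main = "Red where speed > 20")

# Rule 2: red if speed < 10 and dist < 40, otherwise blue
point_col_2 <- ifelse(cars$speed < 10 & cars$dist < 40, "red", "blue")
plot(cars$speed, cars$dist,
     col = point_col_2, pch = 19,
     xlab = "Speed", ylab = "Distance",
     main = "Red where speed < 10 and dist < 40")
```

Note the single & in the second rule: it compares the columns element by element, whereas && is meant for a single TRUE/FALSE value.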
Scatter plot of lifeExp versus gdpPercap.
Scatter plot of lifeExp versus gdpPercap, using only the data for the year 2007.
Add a blue linear regression line.
Draw the points as red circles (set with pch and col).
Select the “Accent” color palette.
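The code for these steps is not shown here, so the following is only a sketch under stated assumptions. The column names lifeExp, gdpPercap, and year match the gapminder dataset from the gapminder package, but the notes do not name the data source. The helper plot_life_vs_gdp() is a hypothetical name, and placing gdpPercap on the x-axis is also an assumption. The “Accent” palette step is not covered.

```r
# Hypothetical helper: scatter plot of lifeExp against gdpPercap for one year,
# drawn with red circles and a blue linear regression line.
# 'data' must contain the numeric columns year, gdpPercap, and lifeExp.
plot_life_vs_gdp <- function(data, selected_year = 2007) {
  one_year <- data[data$year == selected_year, ]  # keep only the chosen year

  plot(one_year$gdpPercap, one_year$lifeExp,
       pch = 1, col = "red",                      # red hollow circles
       xlab = "GDP per capita",
       ylab = "Life expectancy",
       main = paste("Life Expectancy vs. GDP per Capita,", selected_year))

  fit <- lm(lifeExp ~ gdpPercap, data = one_year) # simple linear regression
  abline(fit, col = "blue", lwd = 2)              # blue regression line
  invisible(fit)                                  # return the fitted model
}

# Usage, assuming the gapminder package is installed
if (requireNamespace("gapminder", quietly = TRUE)) {
  plot_life_vs_gdp(gapminder::gapminder, 2007)
}
```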
Basic way
Objective: Create a basic scatter plot with default settings.
Use the plot() function to create a scatter plot of the mtcars data with wt (weight) on the x-axis and mpg (miles per gallon) on the y-axis.
Add the main title “Scatter Plot of Weight vs. MPG”.
Solution:
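No solution code appears for the basic scatter plot exercise, so here is one possible answer, offered as a sketch rather than the author's own solution. It relies only on the defaults of plot() plus the requested title.

```r
# Basic scatter plot of weight against miles per gallon, default settings
plot(x = mtcars$wt, y = mtcars$mpg,
     main = "Scatter Plot of Weight vs. MPG")
```

Because xlab and ylab are not set, R labels the axes with the expressions passed to x and y (mtcars$wt and mtcars$mpg).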
Objective: Customize the appearance of the points.
Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.
Use different colors for points based on the number of cylinders (cyl).
Increase the size of the points.
Solution:
# Scatter plot with customized point colors and sizes
plot(x = mtcars$wt, y = mtcars$mpg,
     col = mtcars$cyl,   # Color points based on the number of cylinders
     pch = 19,           # Solid circles
     cex = 1.5,          # Increase point size
     xlab = "Weight",
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Customized Points")
# Add a legend (sorted so the groups appear as 4, 6, 8)
legend("topright",
       legend = sort(unique(mtcars$cyl)),
       col = sort(unique(mtcars$cyl)),
       pch = 19,
       title = "Number of Cylinders")
Objective: Add a regression line to the scatter plot.
Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.
Add a regression line to the plot.
Solution:
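No solution code appears for the regression line exercise, so the following is one possible answer, given as a sketch. It fits a simple linear model with lm() and draws it with abline(); the line color, line width, and title are arbitrary choices.

```r
# Scatter plot of weight against miles per gallon
plot(x = mtcars$wt, y = mtcars$mpg,
     xlab = "Weight",
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Regression Line")

# Fit mpg as a linear function of wt and add the fitted line to the plot
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)
```

abline() must be called after plot(), because it draws on the plot that is currently open.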
Objective: Annotate specific points with text.
Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.
Annotate the points for specific cars (e.g., “Mazda RX4”, “Hornet 4 Drive”, “Duster 360”) with their names.
Solution:
# Create scatter plot
plot(x = mtcars$wt, y = mtcars$mpg,
     xlab = "Weight",
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Annotations")
# Annotate specific points, selected by row name
cars_to_label <- c("Mazda RX4", "Hornet 4 Drive", "Duster 360")
text(x = mtcars[cars_to_label, "wt"], y = mtcars[cars_to_label, "mpg"],
     labels = cars_to_label,
     pos = 4,     # Place each label to the right of its point
     cex = 0.8,
     col = "blue")
Objective: Plot multiple groups with different symbols and colors.
Create a scatter plot of mtcars with wt on the x-axis and mpg on the y-axis.
Differentiate points by the number of cylinders (cyl) using different symbols and colors.
Solution:
# Create a scatter plot with multiple groups
plot(x = mtcars$wt, y = mtcars$mpg,
     col = mtcars$cyl,   # Color points based on the number of cylinders
     pch = mtcars$cyl,   # Use different symbols for each group
     xlab = "Weight",
     ylab = "Miles Per Gallon",
     main = "Scatter Plot with Multiple Groups")
# Add a legend (sorted so the groups appear as 4, 6, 8)
legend("topright",
       legend = sort(unique(mtcars$cyl)),
       col = sort(unique(mtcars$cyl)),
       pch = sort(unique(mtcars$cyl)),
       title = "Number of Cylinders")