R Data Structure: Vector

Somsak Chanaim

International College of Digital Innovation, CMU

November 29, 2024

Data Structure

Data Structure in R (ref: First Steps in R)

Object vs Function

In R, objects and functions are two fundamental concepts, but they serve different purposes and have different characteristics.

Object

An object in R is a data structure that stores values or data. Objects can be of various types, such as vectors, lists, matrices, data frames, or even more complex structures like models.

Purpose: Objects are used to hold data that you can manipulate, analyze, or pass as inputs to functions.

Function

A function in R is a set of instructions or code designed to perform a specific task. Functions take inputs (arguments), execute a series of commands, and often return a result.

Purpose: Functions are used to perform operations, calculations, or transformations on data (objects).
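
A minimal sketch of the distinction described above. The names `scores` and `average` are illustrative and do not come from the original slides.

```r
# An object stores data: here, a numeric vector (hypothetical values)
scores <- c(80, 90, 100)

# A function performs a task: mean() takes an object as input
# and returns a result, which we store in another object
average <- mean(scores)
print(average)  # [1] 90
```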

Characters

In R, characters are a basic data type used to represent text.

They are typically stored as character vectors.
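
As a small illustration (the object name `greeting` is an assumption made for this sketch), a character value is written in quotes:

```r
# Character values are written in double or single quotes
greeting <- "Hello, R"

typeof(greeting)  # "character"
nchar(greeting)   # 8 (number of characters in the string)
```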

Numerics

In R, numeric data types are used to represent numbers. This includes both integers and floating-point (decimal) numbers.

Integers

In R, integers are a specific type of numeric data that represent whole numbers.

We can define integers with the L suffix:
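
A short sketch comparing a number written with and without the L suffix. The names `x` and `y` are illustrative.

```r
x <- 5L  # integer, because of the L suffix
y <- 5   # double, the default storage type for numbers

typeof(x)      # "integer"
typeof(y)      # "double"
is.numeric(x)  # TRUE: integers are also numeric
is.numeric(y)  # TRUE
```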

Logicals

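In R, logical data types are used to represent boolean values, which can be either TRUE or FALSE. T and F are predefined shortcuts for these values, but unlike TRUE and FALSE they are ordinary variables that can be overwritten, so the full names are safer.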

Logical values are essential for controlling the flow of programs through conditional statements and loops, and they are also useful for indexing and subsetting data.
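
The following sketch shows a logical value controlling a conditional statement. The variable names and messages are made up for the illustration.

```r
is_raining   <- TRUE
has_umbrella <- FALSE

typeof(is_raining)  # "logical"

# Logical values decide which branch of an if statement runs
if (is_raining && !has_umbrella) {
  message_text <- "Stay inside"
} else {
  message_text <- "Go out"
}
print(message_text)  # [1] "Stay inside"
```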

Operators in R

Operators in R are used to perform various operations on variables and values.

They can be categorized into several types:

  • Arithmetic Operators

  • Comparison Operators

  • Logical Operators

  • Assignment Operators

  • etc.

Arithmetic Operators

In R, arithmetic operators are used to perform basic mathematical operations on numeric values.

These operators are fundamental to performing calculations and are applied element-wise when used with vectors.

Basic arithmetic operations:

  • Addition (+)

  • Subtraction (-)

  • Multiplication (*)

  • Division (/)

  • Exponentiation (^)

  • Modulus (%%)

  • Integer Division (%/%)
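
Each operator in the list above can be tried on a pair of numbers. The values 7 and 2 are arbitrary choices for this sketch.

```r
a <- 7
b <- 2

a + b    # 9
a - b    # 5
a * b    # 14
a / b    # 3.5
a ^ b    # 49
a %% b   # 1  (remainder after division)
a %/% b  # 3  (whole-number part of the division)
```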

Comparison Operators

In R, comparison operators are used to compare two values or variables. The result of a comparison operation is a logical value (TRUE or FALSE).


Greater than (>):


Less than (<):


Equal to (==):


Greater than or equal to (>=):


Less than or equal to (<=):


Not equal to (!=):
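
A brief sketch of the six comparison operators listed above, using two arbitrary values `a` and `b`.

```r
a <- 5
b <- 3

a > b   # TRUE
a < b   # FALSE
a == b  # FALSE
a >= 5  # TRUE
b <= 2  # FALSE
a != b  # TRUE
```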

Logical Operators

Logical operators are used to perform logical operations, often in the context of conditional statements or when working with Boolean values (TRUE or FALSE).

Logical operators

  • & : Element-wise logical AND

  • | : Element-wise logical OR

  • ! : Negation (unary operator that negates a logical value)

These operators are essential for making decisions and controlling the flow of your code.

Logical operator AND (&)

Example

Logical operator OR (|)

Example

Logical operator Negation or NOT (!)

Example

Example
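
The original examples are not preserved here, so the following is a sketch with made-up values (`p`, `q`, and `age` are assumptions) showing AND, OR, and NOT, followed by a combined condition.

```r
p <- TRUE
q <- FALSE

p & q  # FALSE: AND is TRUE only when both sides are TRUE
p | q  # TRUE:  OR is TRUE when at least one side is TRUE
!p     # FALSE: NOT flips the value

# Combining comparisons with logical operators
age <- 25
in_range <- (age >= 18) & (age < 60)  # TRUE
not_adult <- !(age >= 18)             # FALSE
```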

Assignment Operators

In R, assignment operators are used to assign values to variables. They are fundamental for storing data, defining variables, and setting up computations.

  • <-: The most commonly used assignment operator in R. It assigns the value on the right to the variable on the left.

  • =: Also used for assignment but is generally preferred for specifying arguments in function calls rather than variable assignment.

Example

or

Remark

When we assign any data structure to an object name, R does not display the value on your screen.
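
A short sketch of both assignment forms and of the remark above. The names `x`, `y`, `total`, and `z` are illustrative.

```r
x <- 10  # assignment with <-
y = 20   # assignment with =, which also works at the top level

total <- x + y  # nothing is printed by an assignment

total         # typing the name prints the value: [1] 30
print(total)  # same result with an explicit print()

# Wrapping an assignment in parentheses assigns and prints in one step
(z <- x * 2)  # [1] 20
```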

Reserved Words in R

In R, reserved words (or keywords) are special words that have a specific meaning within the language.

These words cannot be used as identifiers (such as variable names, function names, etc.) because they are part of the language syntax.

List of Reserved Words in R:

  1. Control Flow Keywords:
    • if, else
    • repeat
    • while
    • function
    • for
    • in
    • next
    • break
  2. Logical Constants:
    • TRUE
    • FALSE
    • NULL
    • NA (Not Available, a missing value; variants include NA_integer_, NA_real_, NA_complex_, NA_character_)
  3. Special Values:
    • Inf (represents infinity)
    • NaN (Not a Number)
  4. Others:
    • ... (Ellipsis, used to pass additional arguments to functions), along with ..1, ..2, etc.

Note: return and ~ often appear alongside these words, but they are not reserved. return() is an ordinary function, and ~ (tilde, used in model formulas) is an operator.

Reserved Words in Context:

  1. Control Flow:
    • The keywords if, else, for, in, while, repeat, break, and next are used to control the flow of execution in an R program. (return() is commonly used with them, but it is a function, not a reserved word.)
  2. Logical Constants:
    • TRUE, FALSE, and NULL are used to represent logical values and the absence of any value.
  3. Special Values:
    • Inf and NaN represent mathematical concepts (infinity and an undefined result, respectively).
    • NA is used for missing data, which is very common in data analysis.
  4. Function Definition:
    • The function keyword is used to define new functions in R.

Importance of Reserved Words:

  • Syntax Rules: Reserved words form the core syntax of R, and their correct usage is essential for writing valid R code.

  • Naming Restrictions: Since reserved words have specific functions within the language, you cannot use them as variable or function names, which helps avoid confusion and errors in the code.

Understanding reserved words in R helps in writing clear, error-free code and avoids conflicts in naming conventions.
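
To see the naming restriction in practice, the sketch below tries to parse an assignment to a reserved word. Using parse() inside tryCatch() is a choice made for this illustration so that the script keeps running; `result` and `if_value` are hypothetical names.

```r
# Assigning to a reserved word is a syntax error.
# parse() checks the code without running it, and tryCatch() captures the error.
result <- tryCatch(
  {
    parse(text = "if <- 1")
    "parsed"
  },
  error = function(e) "syntax error"
)
print(result)  # [1] "syntax error"

# A name that merely contains a reserved word is allowed
if_value <- 1
```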

Vector

In R, the c() function is one of the most fundamental functions.

  • It is used to combine or concatenate elements to create a vector.

  • The c() function can take multiple arguments and combine them into a single vector.

A vector of numbers

A vector of characters

A vector of logicals

Mixed element types are coerced to character
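
The four cases above can be sketched as follows. The object names and values are illustrative, since the original examples are not preserved.

```r
num_vec <- c(1, 2.5, 10)         # numeric vector
chr_vec <- c("a", "b", "c")      # character vector
lgl_vec <- c(TRUE, FALSE, TRUE)  # logical vector

# Mixing types in one vector: everything is converted to character
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
typeof(mixed)  # "character"
```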

The sequence in R

In R, you can create sequences of numbers using various functions.

The most common ways to generate sequences are by using

  • the : operator.

  • the seq() function.

  • the rep() function.

1. Using the : Operator

The : operator generates a simple sequence of integers.
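
For instance (the names `up` and `down` are illustrative):

```r
up   <- 1:5  # 1 2 3 4 5
down <- 5:1  # 5 4 3 2 1 (counts down when the first number is larger)

typeof(up)   # "integer"
```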

2. Using the seq() function

The seq() function provides more control over the sequence, including the ability to set the step size, length, and more.

  seq(from, to, by)
  • from: The starting number.

  • to: The ending number.

  • by: The step size to increment by.
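
A few sketches of seq() with the arguments described above. The last call uses length.out, another argument of seq(), to illustrate setting the length instead of the step size. The object names are illustrative.

```r
odds      <- seq(from = 1, to = 10, by = 2)   # 1 3 5 7 9
countdown <- seq(from = 10, to = 0, by = -5)  # 10 5 0

# Fix the number of values instead of the step size
quarters <- seq(from = 0, to = 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
```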

3. Using the rep() function

The rep() function is used to replicate the elements of a vector.

rep(x, times, each)
  • x: The vector to replicate.

  • each: Repeat each element of x this many times.

  • times: Repeat the whole vector x this many times.
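
The difference between times and each is easiest to see side by side. The object names below are illustrative.

```r
by_times <- rep(x = c(1, 2), times = 3)  # 1 2 1 2 1 2
by_each  <- rep(x = c(1, 2), each = 3)   # 1 1 1 2 2 2

# Both together: each is applied first, then the result is repeated
both <- rep(x = c(1, 2), times = 2, each = 2)  # 1 1 2 2 1 1 2 2
```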

These methods allow you to generate sequences easily and are fundamental for data manipulation and iteration in R.

Arithmetic Operators for vectors of length greater than one

Example

Arithmetic Operation examples
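
The original examples are not preserved, so this is a sketch with made-up vectors `x` and `y`. Each operator works on the elements position by position.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

x + y  # 11 22 33
y - x  # 9 18 27
x * y  # 10 40 90
y / x  # 10 10 10
x ^ 2  # 1 4 9

# A single number is reused for every element of the vector
x + 100  # 101 102 103
```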

Important

In R, e is a way to express numbers in scientific notation. Specifically:

  • 1e2 means \(1 \times 10^2\), which equals 100.

Explanation:

  • The e in 1e2 stands for “exponent,” so:
    • 1e2 is equivalent to 1 * (10 ^ 2)
    • This makes it easier to represent very large or very small numbers without writing out all the zeros.

Examples:

Here are a few more examples of using scientific notation in R:

  • 2e3 is equal to \(2 \times 10^3\), which equals 2000.
  • 5e-2 is equal to \(5 \times 10^{-2}\), which equals 0.05.
  • 3.14e1 is equal to \(3.14 \times 10^1\), which equals 31.4.

This notation is particularly useful for dealing with very large or very small numbers in calculations.

Note: You cannot use e alone without a number before and after it. For example, e2 is read as an object name, not as a number.
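
A sketch of the notation and of the restriction in the note. Wrapping the failing expression in tryCatch() is a choice made for this illustration so the script keeps running. It assumes no object called `e2` exists in the session, and `e_alone` is a hypothetical name.

```r
1e2     # 100
2e3     # 2000
5e-2    # 0.05
3.14e1  # 31.4

1e2 == 100  # TRUE

# e2 without a leading number is looked up as an object name
e_alone <- tryCatch(
  eval(parse(text = "e2")),
  error = function(e) "e2 is not a number"
)
print(e_alone)
```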

Comparison Operators for vectors of length greater than one

Example

Greater than

Less than

Equal to

Greater than or equal to

Less than or equal to

Not equal to
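
With vectors, each comparison is made position by position and returns a logical vector. The vectors `x` and `y` below are made up for this sketch.

```r
x <- c(1, 5, 10)
y <- c(2, 5, 8)

x > y   # FALSE FALSE  TRUE
x < y   #  TRUE FALSE FALSE
x == y  # FALSE  TRUE FALSE
x >= y  # FALSE  TRUE  TRUE
x <= y  #  TRUE  TRUE FALSE
x != y  #  TRUE FALSE  TRUE
```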

How to merge/combine vectors

Merge/Access/Replace value in vector

Merge

Warning

Merge vector A and vector B

Merge vector B and vector A

Note: merge A and B != merge B and A
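
The contents of A and B are not preserved here, so the values below are assumptions. The sketch shows that the order of the arguments to c() decides the order of the result, and that combining different types converts the result to a single type.

```r
A <- c(1, 2, 3)  # hypothetical values
B <- c(4, 5)     # hypothetical values

AB <- c(A, B)  # 1 2 3 4 5
BA <- c(B, A)  # 4 5 1 2 3

identical(AB, BA)  # FALSE: the order differs

# Combining numbers with text turns every element into character
A_text <- c(A, "x")  # "1" "2" "3" "x"
```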

Access

From D, show the values at positions 1, 2, 3, 4 and 5.

Or

From D, show the values at even positions.

From D, show every value except the one at position 5.

From D, show the values at positions 9 and 1, in that order.
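
The contents of D are not preserved here, so the sketch below assumes a ten-element vector. The other object names are also illustrative.

```r
D <- seq(from = 10, to = 100, by = 10)  # hypothetical D: 10 20 ... 100

first_five <- D[c(1, 2, 3, 4, 5)]  # 10 20 30 40 50
first_five_alt <- D[1:5]           # same result, written with :

# Even positions: 2, 4, 6, ...
even_pos <- D[seq(from = 2, to = length(D), by = 2)]  # 20 40 60 80 100

# A negative index drops that position
without_fifth <- D[-5]  # 10 20 30 40 60 70 80 90 100

# The order of the index decides the order of the result
ninth_then_first <- D[c(9, 1)]  # 90 10
```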

Replace

From D, change the value at position 1 to 21.

From D, change every value from position 1 through position 5 to 25.

From D, change the values at position 1 and position 10 to 30 and 35 respectively.
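
A sketch of the three replacements, again assuming a hypothetical ten-element D. The same square-bracket index used for access goes on the left of the assignment.

```r
D <- seq(from = 10, to = 100, by = 10)  # hypothetical D: 10 20 ... 100

D[1] <- 21                # position 1 becomes 21
D[1:5] <- 25              # positions 1 to 5 all become 25
D[c(1, 10)] <- c(30, 35)  # position 1 becomes 30, position 10 becomes 35

D  # 30 25 25 25 25 60 70 80 90 35
```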

How to check object types

typeof() and class() functions

  • typeof(): This function tells you the internal storage mode or type of the object, which is how R internally represents the data. It focuses on the low-level storage type.

  • class(): This function returns the class or high-level type of an object, which often corresponds to how the object is treated by R’s methods.

Common types returned by typeof() include:

  • “logical”

  • “integer”

  • “double”

  • “complex”

  • “character”

  • “list”

  • “NULL”

Some common classes are:

  • “numeric”

  • “factor”

  • “data.frame”

  • “matrix”

  • “lm” (linear model)
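
The difference between the two functions shows up most clearly on objects such as factors and data frames. The objects below are made up for this sketch.

```r
typeof(1.5)  # "double"
class(1.5)   # "numeric"

typeof(1L)   # "integer"
class(1L)    # "integer"

f <- factor(c("a", "b"))
typeof(f)    # "integer" (stored internally as integer codes)
class(f)     # "factor"

df <- data.frame(x = 1:2)
typeof(df)   # "list" (a data frame is stored as a list of columns)
class(df)    # "data.frame"

m <- matrix(1:4, nrow = 2)
typeof(m)    # "integer"
class(m)     # includes "matrix"
```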

Is the object a character/numeric/logical/integer?

In R, is.xxxx() functions are a family of functions used to check if an object is of a particular type or class. These functions return TRUE if the object matches the specified type or class, and FALSE otherwise.

  • is.character(): Checks if an object is of type character.

  • is.numeric(): Checks if an object is of type numeric (either integer or double).

  • is.logical(): Checks if an object is of type logical (TRUE or FALSE).

  • is.integer(): Checks if an object is of type integer.
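
A few checks, using literal values chosen for this sketch:

```r
is.character("a")  # TRUE
is.character(1)    # FALSE

is.numeric(1.5)    # TRUE
is.numeric(1L)     # TRUE  (integers count as numeric)
is.numeric("1")    # FALSE (a number in quotes is character)

is.integer(1)      # FALSE (1 without the L suffix is a double)
is.integer(1L)     # TRUE

is.logical(TRUE)   # TRUE
```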

How to delete an object

Assign NULL to an object

We can discard the data held by an object by assigning NULL to it. The name itself stays in the environment, now holding NULL.

check

Use the rm() function

The rm() function is used to remove objects from the environment.

check

To remove all objects from the environment, we can use this code:

rm(list = ls())
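
The two approaches can be compared with exists(), which reports whether a name is currently defined. The object names `temp_a` and `temp_b` are hypothetical.

```r
temp_a <- 1:3
temp_b <- "text"

# Assigning NULL discards the data, but the name is still defined
temp_a <- NULL
exists("temp_a")  # TRUE
is.null(temp_a)   # TRUE

# rm() removes the name itself
rm(temp_b)
exists("temp_b")  # FALSE
```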

Save/Load object in the environment

Save objects from the environment to a file

Note: you can use <anyname>.RData

Load file into the environment
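
The original code is not preserved, so the following is a sketch using save() and load(). The object `scores`, the file name my_objects.RData, and the use of a temporary folder are all assumptions made for this illustration.

```r
scores <- c(80, 90, 100)  # hypothetical object to save

# Build a file path in a temporary folder (any <anyname>.RData works)
rdata_path <- file.path(tempdir(), "my_objects.RData")

# Save the object to the file
save(scores, file = rdata_path)

# Remove it, then bring it back from the file
rm(scores)
load(rdata_path)

scores  # 80 90 100
```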

Useful Function: sample()

The sample() function in R is used to generate a random sample of elements from a specified set of data, with or without replacement.

Syntax:

sample(x, size, replace = FALSE, prob = NULL)

Parameters:

  • x: A vector of elements from which to choose.

  • size: The number of items to choose.

  • replace: Logical; if TRUE, sampling is done with replacement (elements can be selected more than once). Default is FALSE.

  • prob: A vector of probability weights for obtaining the elements of the vector being sampled.

Examples

  1. Basic Sampling Without Replacement:
  2. Sampling With Replacement:
  3. Sampling with Specified Probabilities:
  4. Random Permutation:
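
The four cases above can be sketched as follows. The input vectors, sizes, and weights are made up, and set.seed() is added only so that repeated runs give the same draws. No expected output is shown because the values are random.

```r
set.seed(123)  # make the random draws reproducible

# 1. Three distinct values from 1 to 10
s1 <- sample(x = 1:10, size = 3)

# 2. Fifteen values from 1 to 10; repeats are allowed
s2 <- sample(x = 1:10, size = 15, replace = TRUE)

# 3. Ten draws where "H" is chosen with probability 0.7 and "T" with 0.3
s3 <- sample(x = c("H", "T"), size = 10, replace = TRUE, prob = c(0.7, 0.3))

# 4. With only x given, sample() returns x in a random order
s4 <- sample(x = 1:5)
```

Note that size can only exceed the length of x when replace = TRUE.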

Useful Function: paste() and paste0()

The paste() and paste0() functions in R are used to concatenate strings or other objects into a single string. While they serve similar purposes, they have some key differences in how they handle separators.

Examples

  1. Create a vector object named ID with values ranging from ‘ID:1’ to ‘ID:1000’.
  2. Create a vector object named ID with values ranging from ‘ID: 1’ to ‘ID: 1000’.
  3. Create a vector object named ID with values ranging from ‘ID:1-ICDI’ to ‘ID:1000-ICDI’.
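
One possible way to build the three vectors is sketched below. The suffixed names `ID_1`, `ID_2`, and `ID_3` are used here only to keep the three results apart. paste() puts a space between its arguments by default, while paste0() uses no separator.

```r
# 1. No space: "ID:1", "ID:2", ..., "ID:1000"
ID_1 <- paste0("ID:", 1:1000)

# 2. Default separator is a space: "ID: 1", ..., "ID: 1000"
ID_2 <- paste("ID:", 1:1000)

# 3. Text on both sides of the number: "ID:1-ICDI", ..., "ID:1000-ICDI"
ID_3 <- paste0("ID:", 1:1000, "-ICDI")

# The sep argument of paste() sets a custom separator
paste("ID", 1, sep = ":")  # "ID:1"
```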

Useful Function: length()

The length() function in R is used to determine the number of elements in an object.

It returns the count of elements present in vectors, lists, arrays, or other objects in R.

For instance, if you have a vector containing numbers or strings, length() will provide the count of elements present within that vector.
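
A few quick checks (the object names are illustrative):

```r
nums <- c(10, 20, 30)
length(nums)     # 3

length(letters)  # 26 (letters is a built-in vector of a to z)
length(NULL)     # 0

# A single string is a vector of length 1; nchar() counts its characters
word <- "hello"
length(word)     # 1
nchar(word)      # 5
```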

Exercise: vector part 1

1. Create a Numeric Vector

  • Create a numeric vector with the values 10, 20, 30, 40, and 50. Assign it to a variable named my_vector.

solution

my_vector <- c(10, 20, 30, 40, 50)

2. Access Elements in a Vector

  • Given the vector my_vector, access the third element in the vector.

solution

third_element <- my_vector[3]

3. Vector Length

  • Find the length of my_vector.

solution

vector_length <- length(my_vector)

4. Sum of Vector Elements

  • Calculate the sum of all elements in my_vector.

solution

vector_sum <- sum(my_vector)

5. Vector Arithmetic

  • Create a new vector my_vector2 with the values 1, 2, 3, 4, and 5. Add my_vector and my_vector2 element-wise.

solution

my_vector2 <- c(1, 2, 3, 4, 5)
result_vector <- my_vector + my_vector2

6. Logical Indexing

  • Create a logical vector indicating which elements of my_vector are greater than 25.

solution

logical_vector <- my_vector > 25

7. Subsetting with a Condition

  • Subset my_vector to only include elements that are greater than 25.

solution

subset_vector <- my_vector[my_vector > 25]

8. Replacing Elements in a Vector

  • Replace the value of the second element in my_vector with 99.

solution

my_vector[2] <- 99

9. Vector Repetition

  • Create a vector that repeats the values 1, 2, 3 three times.

solution

repeated_vector <- rep(x = 1:3, times = 3)

Exercise: vector part 2

Five exercises focusing on the seq(), rep(), paste()/paste0(), and sample() functions in R:

10. Generate a Sequence

  • Use the seq() function to create a sequence of numbers from 5 to 50 with a step of 5. Assign the result to a variable named my_seq.

solution

my_seq <- seq(from = 5, to = 50, by = 5)

11. Repeat Elements

  • Use the rep() function to create a vector that repeats the elements 1, 2, and 3, each five times.

solution

my_rep <- rep(x = c(1, 2, 3), each = 5)

12. Combine Strings with Numbers

  • Create a character vector using the paste() function that combines the string “Day” with the numbers from 1 to 7. The result should be c("Day 1", "Day 2", ..., "Day 7").

solution

days <- paste("Day", 1:7)

13. Random Sampling

  • Use the sample() function to generate a random sample of 5 unique numbers from the sequence you created in Exercise 10 (my_seq). Assign the result to my_sample.

solution

my_sample <- sample(x = my_seq, size = 5)

14. Creating IDs with paste0()

  • Generate a vector of IDs using the paste0() function. The IDs should be in the format “ID1”, “ID2”, …, “ID10”. Use paste0() and the seq() function to create this vector.

solution

ids <- paste0("ID", seq(from = 1, to = 10))