International College of Digital Innovation, CMU
October 29, 2025
Data Structure in R ref: First Steps in R
In R, objects and functions are two fundamental concepts, but they serve different purposes and have different characteristics.
Object
An object in R is a data structure that stores values or data. Objects can be of various types, such as vectors, lists, matrices, data frames, or even more complex structures like models.
Purpose: Objects are used to hold data that you can manipulate, analyze, or pass as inputs to functions.
Function
A function in R is a set of instructions or code designed to perform a specific task. Functions take inputs (arguments), execute a series of commands, and often return a result.
Purpose: Functions are used to perform operations, calculations, or transformations on data (objects).
In R, characters are a basic data type used to represent text.
They are typically stored as character vectors.
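As a minimal illustration (the names `greeting` and `fruits` are made up for this sketch), text is written in quotes and stored as a character vector:

```r
greeting <- "Hello, R"                   # a character vector of length 1
fruits <- c("apple", "banana", "kiwi")   # a character vector of length 3

class(greeting)   # "character"
nchar(fruits)     # number of characters in each element: 5 6 4
```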
In R, numeric data types are used to represent numbers. This includes both integers and floating-point (decimal) numbers.
In R, integers are a specific type of numeric data that represent whole numbers.
We can define integers with the L suffix:
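A short sketch of the difference, using the illustrative names `a` and `b`: a number typed without the suffix is stored as a double, while the same number with `L` is stored as an integer.

```r
a <- 5    # stored as a double by default
b <- 5L   # the L suffix makes it an integer

typeof(a)       # "double"
typeof(b)       # "integer"
is.integer(b)   # TRUE
a == b          # TRUE: the values are equal even though the storage types differ
```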
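In R, logical data types are used to represent boolean values, which can be either TRUE or FALSE. T and F are predefined shorthands for them, but they are ordinary variables that can be overwritten, so TRUE and FALSE are the safer spelling.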
Logical values are essential for controlling the flow of programs through conditional statements and loops, and they are also useful for indexing and subsetting data.
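The sketch below shows both uses on a made-up vector called `scores`: a comparison produces a logical vector, which can then subset the data or drive an `if` statement.

```r
scores <- c(45, 72, 88, 51)   # hypothetical data
passed <- scores >= 50        # logical vector: FALSE TRUE TRUE TRUE

scores[passed]                # subsetting with a logical vector: 72 88 51

if (all(passed)) {
  message("Everyone passed")
} else {
  message("Someone failed")   # this branch runs, because 45 is below 50
}
```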
Operators in R are used to perform various operations on variables and values.
They can be categorized into several types:
Arithmetic Operators
Comparison Operators
Logical Operators
Assignment Operators
etc.
In R, arithmetic operators are used to perform basic mathematical operations on numeric values.
These operators are fundamental to performing calculations and are applied element-wise when used with vectors.
Basic arithmetic operations:
Addition (+)
Division (/)
Modulus (%%)
Subtraction (-)
Exponentiation (^)
Multiplication (*)
Integer Division (%/%)
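The operators listed above can be tried directly at the console. The values and the vector names `x` and `y` below are arbitrary choices for illustration.

```r
7 + 2     # 9
7 - 2     # 5
7 * 2     # 14
7 / 2     # 3.5
7 ^ 2     # 49
7 %% 2    # 1  (remainder)
7 %/% 2   # 3  (integer quotient)

# Applied to vectors, the same operators work element by element
x <- c(10, 20, 30)
y <- c(1, 2, 3)
x + y     # 11 22 33
x / y     # 10 10 10
```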
In R, comparison operators are used to compare two values or variables. The result of a comparison operation is a logical value (TRUE or FALSE).
Greater than (>):
Less than (<):
Equal to (==):
Greater than or equal to (>=):
Less than or equal to (<=):
Not equal to (!=):
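A small sketch of each comparison, assuming two illustrative values `a` and `b`:

```r
a <- 5
b <- 8

a > b    # FALSE
a < b    # TRUE
a == b   # FALSE
a >= 5   # TRUE
b <= 5   # FALSE
a != b   # TRUE
```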
Logical operators are used to perform logical operations, often in the context of conditional statements or when working with Boolean values (TRUE or FALSE).
Logical operators
& : Element-wise logical AND
| : Element-wise logical OR
! : Negation (Unary operator for negating a logical value)
These operators are essential for making decisions and controlling the flow of your code.
Logical operator AND (&)
Example
Logical operator OR (|)
Example
Logical operator Negation, or NOT (!)
Example
Example
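One possible set of examples for the three operators, using two made-up logical vectors `x` and `y` that cover every TRUE/FALSE combination:

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

x & y   # TRUE FALSE FALSE FALSE  (TRUE only where both are TRUE)
x | y   # TRUE TRUE TRUE FALSE    (TRUE where at least one is TRUE)
!x      # FALSE FALSE TRUE TRUE   (each value flipped)

# Combining comparisons: is age between 18 and 65?
age <- 30                 # hypothetical value
age >= 18 & age <= 65     # TRUE
```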
In R, assignment operators are used to assign values to variables. They are fundamental for storing data, defining variables, and setting up computations.
<-: The most commonly used assignment operator in R. It assigns the value on the right to the variable on the left.
=: Also used for assignment but is generally preferred for specifying arguments in function calls rather than variable assignment.
Example
or
Remark
When we assign any data structure to an object name, R does not display the value on your screen.
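The following sketch illustrates both operators and the remark about printing. The names `x`, `y`, and `z` are arbitrary.

```r
x <- 10   # assignment with <- (nothing is printed)
y = 20    # assignment with = also works at the top level

x               # typing the name prints the value: [1] 10
(z <- x + y)    # wrapping an assignment in parentheses assigns and prints: [1] 30

# Inside a function call, = names an argument; it does not create a variable
round(3.14159, digits = 2)   # 3.14
```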
In R, reserved words (or keywords) are special words that have a specific meaning within the language.
These words cannot be used as identifiers (such as variable names, function names, etc.) because they are part of the language syntax.
List of Reserved Words in R
Control Flow:
if, else
repeat
while
for
in
next
break
if, else, for, while, repeat, break, and next are used to control the flow of execution in an R program. return is often used alongside them, but in R it is an ordinary function rather than a reserved word.
Logical Constants and NULL:
TRUE
FALSE
NULL
TRUE and FALSE represent logical values, and NULL represents the absence of any value.
Special Values:
Inf (represents infinity)
NaN (Not a Number)
NA (Not Available or missing value, with variants like NA_integer_, NA_real_, NA_complex_, NA_character_)
Inf and NaN represent mathematical concepts (infinity and an undefined result, respectively).
NA is used for missing data, which is very common in data analysis.
Special Symbols:
... (Ellipsis, used to pass additional arguments to functions)
~ (Tilde, used in model formulas; strictly an operator rather than a reserved word)
Function Definition:
function
The function keyword is used to define new functions in R.
Importance of Reserved Words
Syntax Rules: Reserved words form the core syntax of R, and their correct usage is essential for writing valid R code.
Naming Restrictions: Since reserved words have specific functions within the language, you cannot use them as variable or function names, which helps avoid confusion and errors in the code.
Understanding reserved words in R helps in writing clear, error-free code and avoids conflicts in naming conventions.
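To see the naming restriction in practice, the sketch below tries to use reserved words as variable names. Because such code would normally stop a script, it is parsed from a string and the error is caught; `try_parse` is a hypothetical helper written only for this illustration.

```r
# Returns "ok" if the code is syntactically valid, "syntax error" otherwise
try_parse <- function(code) {
  tryCatch({ parse(text = code); "ok" }, error = function(e) "syntax error")
}

try_parse("if <- 1")      # "syntax error": if cannot be a variable name
try_parse("my_if <- 1")   # "ok": an ordinary name is fine

# Some reserved words parse but are rejected when the assignment runs
result <- tryCatch({ eval(parse(text = "TRUE <- 0")); "ok" },
                   error = function(e) "assignment error")
result                    # "assignment error"
```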
The c() function
This is one of the most fundamental functions.
It is used to combine or concatenate elements to create a vector.
The c() function can take multiple arguments and combine them into a single vector.
A numeric vector
A character vector
A logical vector
Mixed element types are coerced to a single type (character when any element is a string)
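A minimal sketch of the four cases above, with made-up vector names:

```r
nums  <- c(1, 2, 3.5)           # numeric vector
chars <- c("a", "b", "c")       # character vector
logic <- c(TRUE, FALSE, TRUE)   # logical vector
mixed <- c(1, "a", TRUE)        # mixed types are coerced to character

class(nums)    # "numeric"
class(chars)   # "character"
class(logic)   # "logical"
mixed          # "1" "a" "TRUE"
```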
In R, you can create sequences of numbers using various functions.
The most common ways to generate sequences are by using
the : operator.
the seq() function.
the rep() function.
1. Using the : Operator
The : operator generates a simple sequence of integers.
These methods allow you to generate sequences easily and are fundamental for data manipulation and iteration in R.
Example
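The three approaches listed above could look like this. The specific numbers are arbitrary choices for illustration.

```r
1:5                              # 1 2 3 4 5
5:1                              # 5 4 3 2 1

seq(from = 0, to = 10, by = 2)   # 0 2 4 6 8 10
seq(0, 1, length.out = 5)        # 0.00 0.25 0.50 0.75 1.00

rep(7, times = 3)                # 7 7 7
rep(c(1, 2), times = 2)          # 1 2 1 2
rep(c(1, 2), each = 2)           # 1 1 2 2
```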
Arithmetic Operation examples
Important
In R, e is a way to express numbers in scientific notation. Specifically:
1e2 means \(1 \times 10^2\), which equals 100.
Explanation:
e in 1e2 stands for “exponent,” so:
1e2 is equivalent to 1 * (10 ^ 2)
Examples:
Here are a few more examples of using scientific notation in R:
2e3 is equal to \(2 \times 10^3\), which equals 2000.
5e-2 is equal to \(5 \times 10^{-2}\), which equals 0.05.
3.14e1 is equal to \(3.14 \times 10^1\), which equals 31.4.
This notation is particularly useful for dealing with very large or very small numbers in calculations.
Note: We cannot use the notation e alone without a number before and after it, as shown in the example below.
Example
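A sketch of the notation in use. The last part assumes that no variable named `e2` has been defined in the session; under that assumption R treats `e2` as an unknown variable name rather than as a number.

```r
1e2               # 100
2e3               # 2000
5e-2              # 0.05
3.14e1            # 31.4
1e2 == 1 * 10^2   # TRUE

# Without a leading number, e2 is read as a variable name, not as 10^2
res <- tryCatch(e2, error = function(e) "object not found")
res               # "object not found" (assuming e2 was never defined)
```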
Greater than
Less than
Equal to
Greater than or equal to
Less than or equal to
Not equal to
Merge/Access/Replace value in vector
Merge
Warning
Merge vector A and vector B
Merge vector B and vector A
Note: merging A and B does not give the same vector as merging B and A, because the order of the elements differs.
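The original values of A and B are not given here, so the sketch below uses made-up vectors to show why the order matters:

```r
A <- c(1, 2, 3)     # hypothetical example vectors
B <- c(10, 20)

AB <- c(A, B)       # 1 2 3 10 20
BA <- c(B, A)       # 10 20 1 2 3

identical(AB, BA)   # FALSE: same elements, different order
```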
Access
From D, show the values at positions 1, 2, 3, 4, and 5.
Or
From D, show the values at the even positions.
From D, show every value except the one at position 5.
From D, show the values at positions 9 and 1, in that order.
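The access tasks above can be sketched as follows. The contents of D are an assumption made for this example (ten values, 10 to 100); only the indexing expressions matter.

```r
D <- seq(10, 100, by = 10)     # hypothetical vector: 10 20 ... 100

D[1:5]                         # positions 1 to 5: 10 20 30 40 50
D[c(1, 2, 3, 4, 5)]            # the same, listing each position
D[seq(2, length(D), by = 2)]   # even positions: 20 40 60 80 100
D[-5]                          # everything except position 5
D[c(9, 1)]                     # position 9, then position 1: 90 10
```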
Replace
From D, change the value at position 1 to 21.
From D, change every value from position 1 through position 5 to 25.
From D, change the values at position 1 and position 10 to 30 and 35 respectively.
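A sketch of the three replacements applied one after another, again assuming a made-up D with ten values:

```r
D <- seq(10, 100, by = 10)   # hypothetical vector: 10 20 ... 100

D[1] <- 21                   # replace position 1
D[1:5] <- 25                 # positions 1 to 5 all become 25 (the value is recycled)
D[c(1, 10)] <- c(30, 35)     # position 1 becomes 30, position 10 becomes 35

D                            # 30 25 25 25 25 60 70 80 90 35
```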
typeof() and class() functions
typeof(): This function tells you the internal storage mode or type of the object, which is how R internally represents the data. It focuses on the low-level storage type.
class(): This function returns the class or high-level type of an object, which often corresponds to how the object is treated by R’s methods.
Common types returned by typeof() include:
“logical”
“integer”
“double”
“complex”
“character”
“list”
“NULL”
Some common classes are:
“numeric”
“factor”
“data.frame”
“matrix”
“lm” (linear model)
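The difference between the two functions shows up most clearly on objects whose class differs from their storage type. The objects `x`, `f`, and `df` below are made up for illustration.

```r
x <- 3.14
f <- factor(c("low", "high", "low"))
df <- data.frame(id = 1:2, score = c(90, 85))

typeof(x);  class(x)    # "double"  / "numeric"
typeof(f);  class(f)    # "integer" / "factor"
typeof(df); class(df)   # "list"    / "data.frame"
```

A factor is stored as integer codes and a data frame is stored as a list of columns, which is why `typeof()` and `class()` disagree for them.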
Is the object a character/numeric/logical/integer?
In R, is.xxxx() functions are a family of functions used to check if an object is of a particular type or class. These functions return TRUE if the object matches the specified type or class, and FALSE otherwise.
is.character(): Checks if an object is of type character.
is.numeric(): Checks if an object is of type numeric (either integer or double).
is.logical(): Checks if an object is of type logical (TRUE or FALSE).
is.integer(): Checks if an object is of type integer.
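A few calls that show how these checks behave, including the cases that tend to surprise beginners:

```r
is.character("R")   # TRUE
is.numeric(3.5)     # TRUE  (double)
is.numeric(3L)      # TRUE  (integer)
is.integer(3)       # FALSE (3 without L is a double)
is.integer(3L)      # TRUE
is.logical(TRUE)    # TRUE
is.numeric("3")     # FALSE (a number in quotes is a character)
```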
Assign NULL to an object
Assigning NULL to an object discards the data it held, but the name itself stays in the environment with the value NULL. To remove the name as well, use rm().
check
Use the rm() function
The rm() function is used to remove objects from the environment.
check
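One way to run the two checks is with `exists()`, which reports whether a name is still defined. The object names below are hypothetical.

```r
tmp_null <- c(1, 2, 3)
tmp_null <- NULL
exists("tmp_null")    # TRUE: the name is still defined
is.null(tmp_null)     # TRUE: but it no longer holds any data

tmp_rm <- c(4, 5, 6)
rm(tmp_rm)
exists("tmp_rm")      # FALSE: the name is gone from the environment
```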
Save an object from the environment to a file
Note: you can use any file name of the form <anyname>.RData
Load the file back into the environment
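A minimal round trip with `save()` and `load()`. The object name `exam_scores` and the file name are made up, and the file is written to a temporary directory so the sketch does not touch the working directory.

```r
exam_scores <- c(90, 85, 77)                        # hypothetical object
path <- file.path(tempdir(), "exam_scores.RData")   # any name ending in .RData works

save(exam_scores, file = path)   # write the object to disk
rm(exam_scores)                  # remove it from the environment
exists("exam_scores")            # FALSE

load(path)                       # restore it under its original name
exam_scores                      # 90 85 77
```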
The sample() function in R is used to generate a random sample of elements from a specified set of data, with or without replacement.
x: A vector of elements from which to choose.
size: The number of items to choose.
replace: Logical; if TRUE, sampling is done with replacement (elements can be selected more than once). Default is FALSE.
prob: A vector of probability weights for obtaining the elements of the vector being sampled.
Examples
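A few possible calls covering each argument. The results are random, so no specific output is shown; `set.seed()` is used only to make a run repeatable, and the seed value and variable names are arbitrary.

```r
set.seed(123)   # fix the random seed so the results can be reproduced

draw <- sample(x = 1:10, size = 3)                          # 3 distinct values from 1 to 10
flips <- sample(x = c("H", "T"), size = 5, replace = TRUE)  # 5 coin flips
colours <- sample(x = c("red", "blue"), size = 10, replace = TRUE,
                  prob = c(0.8, 0.2))                       # "red" is favoured 4 to 1
shuffled <- sample(1:5)                                     # a random permutation of 1 to 5
```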
The paste() and paste0() functions in R are used to concatenate strings or other objects into a single string. While they serve similar purposes, they have some key differences in how they handle separators.
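The main difference is the default separator: `paste()` inserts a space between its arguments, while `paste0()` inserts nothing. A short sketch with arbitrary strings:

```r
paste("Data", "Science")                  # "Data Science"  (default sep = " ")
paste0("Data", "Science")                 # "DataScience"   (no separator)
paste("2025", "10", "29", sep = "-")      # "2025-10-29"
paste0("Q", 1:4)                          # "Q1" "Q2" "Q3" "Q4"  (vectorized)
paste(c("a", "b", "c"), collapse = "+")   # "a+b+c"  (collapse into one string)
```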
The length() function in R is used to determine the number of elements in an object.
It returns the count of elements present in vectors, lists, arrays, or other objects in R.
For instance, if you have a vector containing numbers or strings, length() will provide the count of elements present within that vector.
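For example:

```r
length(c(10, 20, 30))              # 3
length(c("a", "b"))                # 2
length(list(1, "x", TRUE, NULL))   # 4  (a NULL inside list() still counts as an element)
length(NULL)                       # 0
length("hello")                    # 1  (one string; use nchar() to count characters)
```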
… my_vector.
Using my_vector, access the third element in the vector.
… my_vector.
… my_vector.
Create my_vector2 with the values 1, 2, 3, 4, and 5. Add my_vector and my_vector2 element-wise.
Check which elements of my_vector are greater than 25.
Subset my_vector to only include elements that are greater than …
… my_vector with 99.
The 5 exercises focusing on the seq(), rep(), paste()/paste0(), and sample() functions in R:
1. Use the seq() function to create a sequence of numbers from 5 to 50 with a step of 5. Assign the result to a variable named my_seq.
2. Use the rep() function to create a vector that repeats the elements 1, 2, and 3, each five times.
3. Use the paste() function to combine the string “Day” with the numbers from 1 to 7. The result should be c("Day 1", "Day 2", ..., "Day 7").
4. Use the sample() function to generate a random sample of 5 unique numbers from the sequence you created in Exercise 1 (my_seq). Assign the result to my_sample.
5. Create a vector of IDs with the paste0() function. The IDs should be in the format “ID1”, “ID2”, …, “ID10”. Use paste0() and the seq() function to create this vector.