Data Structure: Data Frame
A list of common functions in R for creating, manipulating, and analyzing data frames:
1. Creating Data Frames
data.frame() – Create a data frame by combining vectors of equal length.
as.data.frame() – Convert an object (for example, a matrix or list) to a data frame.
tibble::tibble() – Create a modern data frame (tibble) with more compact printing and stricter subsetting rules (from the tibble package).
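A minimal base R sketch of the two creation functions above. The column names and values are invented sample data; tibble() is left out so the example runs without extra packages.

```r
# Each column vector has the same length (3), giving a 3-row data frame
df <- data.frame(
  name  = c("Ann", "Ben", "Cal"),
  age   = c(28, 35, 42),
  score = c(90.5, 78.0, 85.25)
)

# as.data.frame() converts other objects; an unnamed matrix gets columns V1, V2, V3
m <- matrix(1:6, nrow = 2)
df_m <- as.data.frame(m)

print(df)
print(df_m)
```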
2. Accessing and Modifying Data Frames
$ – Access a column in a data frame by name (e.g., df$column).
[] – Subset rows, columns, or individual elements in a data frame (e.g., df[1, ], df[, "column"], df[2, 3]).
names() – Get or set the column names of a data frame.
rownames() – Get or set row names.
colnames() – Get or set column names (equivalent to names() for a data frame).
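The access and modification functions above can be combined as follows. This is a small sketch on a hypothetical two-column data frame, not an exhaustive tour of indexing rules.

```r
df <- data.frame(name = c("Ann", "Ben", "Cal"), age = c(28, 35, 42))

ages      <- df$age             # one column as a vector
first_row <- df[1, ]            # first row, all columns (still a data frame)
cell      <- df[2, "name"]      # a single element
older     <- df[df$age > 30, ]  # rows matching a logical condition

names(df)[2] <- "years"              # rename the second column
rownames(df) <- c("a", "b", "c")     # replace the default row names
cols <- colnames(df)

print(older)
print(cols)
```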
3. Inspecting Data Frames
str() – Display the structure of a data frame (column types and the first few values).
summary() – Get summary statistics for each column.
head() / tail() – Display the first or last few rows (six by default).
dim() – Get the dimensions (number of rows and columns).
nrow() – Get the number of rows.
ncol() – Get the number of columns.
View() – Open the data frame in a spreadsheet-like viewer (in RStudio).
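A quick inspection pass might look like this sketch; the data are made up, and View() is skipped because it needs an interactive session.

```r
df <- data.frame(x = 1:10, y = letters[1:10])

str(df)            # column types and first values
summary(df)        # per-column summaries
head(df, 3)        # first 3 rows
tail(df, 2)        # last 2 rows

dims <- dim(df)    # rows, then columns
n_rows <- nrow(df)
n_cols <- ncol(df)
print(dims)
```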
4. Data Manipulation Functions
subset() – Subset rows based on conditions.
transform() – Add or modify columns in a data frame.
merge() – Merge two data frames by common columns or row names.
cbind() – Add columns to a data frame.
rbind() – Add rows to a data frame.
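The base R manipulation functions above are sketched below on two hypothetical tables (employees and departments). merge() is shown with its default behaviour, which keeps only matching rows.

```r
df    <- data.frame(id = 1:4, dept = c("A", "B", "A", "C"), salary = c(50, 60, 55, 70))
depts <- data.frame(dept = c("A", "B"), floor = c(1, 2))

# subset(): rows with salary above 52, keeping only two columns
high <- subset(df, salary > 52, select = c(id, salary))

# transform(): add a column computed from an existing one
df2 <- transform(df, bonus = salary * 0.1)

# merge(): join on the shared "dept" column; dept "C" has no match and is dropped
joined <- merge(df, depts, by = "dept")

# cbind() adds a column; rbind() appends a row whose column names match
df3 <- cbind(df, active = TRUE)
df4 <- rbind(df, data.frame(id = 5, dept = "B", salary = 65))

print(joined)
```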
5. dplyr Package Functions
filter() – Subset rows based on conditions.
select() – Select specific columns.
mutate() – Add new columns or modify existing columns.
arrange() – Sort data frame rows by one or more columns.
group_by() – Group data by one or more columns (usually used with summarize()).
summarize() / summarise() – Create summary statistics for grouped data.
distinct() – Remove duplicate rows.
rename() – Rename columns in a data frame.
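One common way to chain these verbs is with the %>% pipe. This sketch assumes the dplyr package is installed; the salary table is invented for illustration.

```r
library(dplyr)

df <- data.frame(
  dept   = c("A", "B", "A", "C", "B"),
  salary = c(50, 60, 55, 70, 60)
)

result <- df %>%
  filter(salary >= 55) %>%                    # keep rows meeting a condition
  mutate(salary_k = salary * 1000) %>%        # add a derived column
  group_by(dept) %>%                          # group for the summary step
  summarize(avg_salary_k = mean(salary_k), n = n()) %>%
  arrange(desc(avg_salary_k)) %>%             # highest average first
  rename(department = dept)                   # rename a column

unique_rows <- distinct(df)          # drops the repeated ("B", 60) row
only_salary <- select(df, salary)    # keep a single column

print(result)
```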
6. Applying Functions
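apply() – Apply a function to rows or columns of a data frame (the data frame is first converted to a matrix, so columns of mixed types are coerced to a common type, often character).
lapply() – Apply a function to each column and return a list.
sapply() – Apply a function to each column and return a simplified result (a vector or matrix when possible).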
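The difference between the three apply functions shows up in their return types. A short sketch on an all-numeric data frame (so apply() does not coerce anything to character):

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

row_sums       <- apply(df, 1, sum)   # MARGIN = 1: one result per row
col_means_list <- lapply(df, mean)    # a list with one element per column
col_means      <- sapply(df, mean)    # the same results simplified to a named vector

print(row_sums)
print(col_means)
```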
7. Handling Missing Data
is.na() – Check for missing values in a data frame.
na.omit() – Remove rows with NA values.
na.fill() – Fill in missing values (from the zoo package).
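A base R sketch of detecting and dropping missing values; the data are invented, and zoo's na.fill() is not used so the example has no package dependencies.

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

na_mask    <- is.na(df)              # logical matrix, TRUE where a value is missing
na_per_col <- colSums(is.na(df))     # number of NAs in each column
complete   <- na.omit(df)            # only row 1 has no missing values

print(na_per_col)
print(complete)
```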
8. Reshaping Data
reshape() – Reshape data from wide to long format or vice versa.
reshape2::melt() and reshape2::dcast() – Melt wide data into long format or cast long data into wide format (from the reshape2 package, which is retired in favor of tidyr).
tidyr::pivot_longer() / tidyr::pivot_wider() – Reshape data from wide to long and from long to wide, respectively (from the tidyr package).
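A round trip between wide and long formats with tidyr might look like the following sketch. It assumes tidyr is installed; the quarterly sales table and the names "quarter" and "sales" are hypothetical.

```r
library(tidyr)

wide <- data.frame(id = 1:2, q1 = c(10, 20), q2 = c(30, 40))

# Wide -> long: one row per (id, quarter) pair
long <- pivot_longer(wide, cols = c(q1, q2), names_to = "quarter", values_to = "sales")

# Long -> wide: spread "quarter" back out into columns
wide_again <- pivot_wider(long, names_from = quarter, values_from = sales)

print(long)
print(wide_again)
```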
9. Combining Data Frames
bind_rows() – Combine data frames by rows (from dplyr).
bind_cols() – Combine data frames by columns (from dplyr).
full_join(), left_join(), right_join(), inner_join() – Join data frames based on common columns (from dplyr).
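The sketch below contrasts binding with the four join types on small invented tables, assuming dplyr is installed. The join result sizes follow from which id values appear in each table.

```r
library(dplyr)

a       <- data.frame(id = c(1, 2), x = c("p", "q"))
b       <- data.frame(id = 3, x = "r")
c_extra <- data.frame(flag = c(TRUE, FALSE))
info    <- data.frame(id = c(1, 3), y = c(100, 300))

stacked <- bind_rows(a, b)         # 3 rows; columns matched by name
widened <- bind_cols(a, c_extra)   # 2 rows; columns placed side by side

inner <- inner_join(stacked, info, by = "id")  # ids present in both: 1 and 3
left  <- left_join(stacked, info, by = "id")   # all ids from stacked; y is NA for id 2
right <- right_join(a, info, by = "id")        # all ids from info; x is NA for id 3
full  <- full_join(a, info, by = "id")         # every id from either table: 1, 2, 3

print(left)
```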
10. Other Useful Functions
table() – Create a frequency table for one or more columns.
duplicated() – Identify duplicate rows.
order() – Get the order of rows based on one or more columns (used for sorting, e.g., df[order(df$column), ]).
with() – Evaluate an expression within the context of a data frame.
attach() / detach() – Temporarily add a data frame to the R search path (use with caution: attached columns can mask other objects, and later changes to the data frame are not reflected).
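A few of these helpers on a small invented table. with() is used instead of attach() here because it scopes the column names to a single expression and leaves the search path unchanged.

```r
df <- data.frame(
  dept   = c("A", "B", "A", "B", "A"),
  salary = c(50, 60, 50, 65, 55)
)

counts <- table(df$dept)                      # frequency of each dept value
dup    <- duplicated(df)                      # TRUE for row 3, an exact repeat of row 1
sorted <- df[order(df$dept, -df$salary), ]    # by dept, then salary descending
avg    <- with(df, mean(salary))              # refer to columns without df$

print(counts)
print(sorted)
```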
These functions cover a range of operations from basic data frame creation and inspection to advanced manipulation and reshaping, providing a robust toolkit for data analysis in R.