Data Structure: Data Frame
A list of common functions in R for creating, manipulating, and analyzing data frames:
1. Creating Data Frames
data.frame() – Create a data frame by combining vectors of equal length.
as.data.frame() – Convert an object to a data frame.
tibble::tibble() – Create a modern data frame (tibble) with better printing and handling of large datasets (from the tibble package).
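As a minimal sketch of the creation functions above: the column names and values here are made up for illustration, and the tibble part only runs if the tibble package happens to be installed.

```r
# Build a data frame from equal-length vectors (illustrative values).
df <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  score = c(90, 85, 72)
)

# Convert another object, here a matrix, to a data frame.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
df_from_matrix <- as.data.frame(m)

print(df)
print(df_from_matrix)

# Optional: the same data as a tibble, if the package is available.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(name = c("Ana", "Ben", "Cy"), score = c(90, 85, 72))
  print(tb)
}
```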
2. Accessing and Modifying Data Frames
$ – Access a column in a data frame by name (e.g., df$column).
[] – Subset rows, columns, or individual elements in a data frame.
names() – Get or set column names of a data frame.
rownames() – Get or set row names.
colnames() – Get or set column names.
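The access and renaming functions above might be used as follows. This is a sketch on a small made-up data frame; the names df, id, score, and points are hypothetical.

```r
df <- data.frame(id = 1:3, score = c(90, 85, 72))

df$score                       # the score column as a plain vector
df[2, "score"]                 # single element: row 2 of column "score"
df[df$score > 80, ]            # rows where score exceeds 80
df[, "score", drop = FALSE]    # one column, kept as a data frame

# Rename the second column and assign row names.
names(df)[2] <- "points"
rownames(df) <- c("a", "b", "c")

colnames(df)                   # "id" "points"
df["b", "points"]              # look up by row name and column name
```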
3. Inspecting Data Frames
str() – Display the structure of a data frame.
summary() – Get summary statistics for each column.
head()/tail() – Display the first or last few rows.
dim() – Get the dimensions (number of rows and columns).
nrow() – Get the number of rows.
ncol() – Get the number of columns.
View() – Open the data frame in a spreadsheet-like viewer (in RStudio).
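A quick inspection pass could look like the sketch below. The data frame is invented for the example, and View() is left out because it needs an interactive session.

```r
df <- data.frame(id = 1:10, value = seq(0.5, 5, by = 0.5))

str(df)            # column types and a preview of the values
summary(df)        # min, quartiles, mean, max per column
first_rows <- head(df, 3)
last_rows  <- tail(df, 2)
print(first_rows)
print(last_rows)

dim(df)            # rows, then columns
nrow(df)
ncol(df)
```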
4. Data Manipulation Functions
subset() – Subset rows based on conditions.
transform() – Add or modify columns in a data frame.
merge() – Merge two data frames by common columns or row names.
cbind() – Add columns to a data frame.
rbind() – Add rows to a data frame.
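The base R manipulation functions above can be sketched on two small, made-up tables (people and cities are hypothetical names). Note that merge() performs an inner join unless told otherwise.

```r
people <- data.frame(id = 1:3, age = c(25, 40, 31))
cities <- data.frame(id = c(1, 3, 4), city = c("Oslo", "Lima", "Kyiv"))

# Keep only rows that satisfy a condition.
over_30 <- subset(people, age > 30)

# Add a derived column.
people_months <- transform(people, age_months = age * 12)

# Inner join on the shared id column: only ids 1 and 3 appear in both.
joined <- merge(people, cities, by = "id")

# Append a column, then append a row.
with_flag   <- cbind(people, flag = c(TRUE, FALSE, TRUE))
more_people <- rbind(people, data.frame(id = 4, age = 52))

print(joined)
```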
5. dplyr Package Functions
filter() – Subset rows based on conditions.
select() – Select specific columns.
mutate() – Add new columns or modify existing columns.
arrange() – Sort data frame rows by one or more columns.
group_by() – Group data by one or more columns (usually used with summarize()).
summarize()/summarise() – Create summary statistics for grouped data.
distinct() – Remove duplicate rows.
rename() – Rename columns in a data frame.
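These verbs are usually chained with the pipe. The sketch below assumes the dplyr package is installed; the sales data and column names are invented for illustration.

```r
library(dplyr)

sales <- data.frame(
  region = c("north", "south", "north", "south", "north"),
  amount = c(10, 20, 30, 40, 10)
)

result <- sales %>%
  distinct() %>%                               # drop the repeated (north, 10) row
  filter(amount >= 10) %>%                     # keep rows meeting a condition
  mutate(amount_k = amount / 1000) %>%         # add a derived column
  group_by(region) %>%
  summarize(total = sum(amount), .groups = "drop") %>%
  arrange(desc(total)) %>%                     # largest total first
  rename(sales_total = total) %>%
  select(region, sales_total)

print(result)
```

With this input, south totals 60 and north totals 40 once the duplicate row is removed.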
6. Applying Functions
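apply() – Apply a function to rows or columns of a data frame (the data frame is first coerced to a matrix, so this works best when all columns share a type).
lapply() – Apply a function to each column and return a list.
sapply() – Apply a function to each column and return a simplified result (a vector or matrix where possible).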
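The difference between the three functions is mostly in the shape of what comes back. A small sketch on an all-numeric, made-up data frame:

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# MARGIN = 1 works row by row; MARGIN = 2 would work column by column.
row_sums <- apply(df, 1, sum)

# One result per column, returned as a list.
col_means_list <- lapply(df, mean)

# Same computation, simplified to a named numeric vector.
col_means <- sapply(df, mean)

print(row_sums)
print(col_means_list)
print(col_means)
```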
7. Handling Missing Data
is.na() – Check for missing values in a data frame.
na.omit() – Remove rows with NA values.
na.fill() (from the zoo package) – Fill in missing values.
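A sketch of the missing-data helpers on a made-up data frame follows. The na.fill() call is guarded so that it only runs if the zoo package is installed; a base R replacement is shown alongside it as an alternative.

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

is.na(df)                        # logical matrix, TRUE where a value is missing
na_counts <- colSums(is.na(df))  # number of missing values per column
complete  <- na.omit(df)         # keeps only rows with no NA at all

# Base R alternative: replace missing x values with 0.
x_filled <- df$x
x_filled[is.na(x_filled)] <- 0

# Optional: the same fill using zoo, if the package is available.
if (requireNamespace("zoo", quietly = TRUE)) {
  x_filled_zoo <- zoo::na.fill(df$x, 0)
  print(x_filled_zoo)
}

print(na_counts)
print(complete)
```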
8. Reshaping Data
reshape() – Reshape data from wide to long format or vice versa.
reshape2::melt() and reshape2::dcast() – Convert wide data to long format (melt) and long data back to wide format (dcast), from the reshape2 package.
tidyr::pivot_longer()/tidyr::pivot_wider() – Reshape data from wide to long (pivot_longer) and from long to wide (pivot_wider), from the tidyr package.
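A round trip between wide and long layouts could be sketched with tidyr as below. This assumes the tidyr package is installed; the columns id, q1, and q2 are invented for the example.

```r
library(tidyr)

wide <- data.frame(id = 1:2, q1 = c(10, 20), q2 = c(30, 40))

# Wide to long: one row per (id, quarter) pair.
long <- pivot_longer(
  wide,
  cols      = c(q1, q2),
  names_to  = "quarter",
  values_to = "value"
)

# Long back to wide: one column per quarter again.
back <- pivot_wider(long, names_from = quarter, values_from = value)

print(long)
print(back)
```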
9. Combining Data Frames
bind_rows() – Combine data frames by rows (from dplyr).
bind_cols() – Combine data frames by columns (from dplyr).
full_join(), left_join(), right_join(), inner_join() – Join data frames based on common columns (from dplyr).
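The join variants differ in which unmatched rows they keep. The sketch below assumes dplyr is installed and uses two small, made-up tables that share an id column.

```r
library(dplyr)

a <- data.frame(id = 1:3, x = c("a", "b", "c"))
b <- data.frame(id = c(2L, 3L, 4L), y = c(TRUE, FALSE, TRUE))

# Stack rows, or place columns side by side (row counts must match).
stacked <- bind_rows(a, data.frame(id = 4L, x = "d"))
side    <- bind_cols(a, data.frame(z = c(0.1, 0.2, 0.3)))

inner <- inner_join(a, b, by = "id")  # ids present in both: 2, 3
left  <- left_join(a, b, by = "id")   # all ids from a; y is NA for id 1
right <- right_join(a, b, by = "id")  # all ids from b; x is NA for id 4
full  <- full_join(a, b, by = "id")   # every id from either table

print(full)
```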
10. Other Useful Functions
table() – Create a frequency table for one or more columns.
duplicated() – Identify duplicate rows.
order() – Get the order of rows based on one or more columns.
with() – Evaluate an expression within the context of a data frame.
attach()/detach() – attach() adds a data frame to the R search path so its columns can be referenced by name; detach() removes it (use with caution, since attached columns can mask other objects).
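A short sketch of these helpers on a made-up data frame follows. attach() is deliberately left out; with() covers the same need without changing the search path.

```r
df <- data.frame(
  group = c("b", "a", "b", "a"),
  value = c(2, 1, 2, 3)
)

counts <- table(df$group)   # how often each group occurs
dups   <- duplicated(df)    # TRUE for a row that repeats an earlier row

# Sort by group ascending, then by value descending within each group.
sorted <- df[order(df$group, -df$value), ]

# Refer to columns by name without repeating df$.
mean_value <- with(df, mean(value))

print(counts)
print(dups)
print(sorted)
print(mean_value)
```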
These functions cover a range of operations from basic data frame creation and inspection to advanced manipulation and reshaping, providing a robust toolkit for data analysis in R.