Data Structure: Data Frame

The following list covers common R functions for creating, manipulating, and analyzing data frames:

1. Creating Data Frames

  • data.frame() – Create a data frame from vectors of equal length; a shorter vector is recycled only if its length divides the longest one evenly.
  • as.data.frame() – Convert an object to a data frame.
  • tibble::tibble() – Create a tibble, a stricter data frame variant from the tibble package that prints only the first rows and the columns that fit on screen, and never converts strings to factors.
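
As a minimal sketch of the base R constructors above, the following builds one data frame from vectors and another from a matrix. The column names and values are made up for illustration.

```r
# Build a small data frame from equal-length vectors (illustrative data).
df <- data.frame(
  name = c("Ann", "Bob", "Cleo"),
  age  = c(28, 35, 42),
  stringsAsFactors = FALSE  # keep strings as character on any R version
)

# Convert a matrix to a data frame; unnamed columns become V1, V2.
m <- matrix(1:6, nrow = 3)
df_from_matrix <- as.data.frame(m)

print(df)
print(df_from_matrix)
```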

2. Accessing and Modifying Data Frames

  • $ – Access a column by name (e.g., df$column); on a plain data frame, $ also accepts a partial column name.
  • [] – Subset rows, columns, or individual elements (e.g., df[1:3, ], df[, "column"], df[2, 3]).
  • names() – Get or set column names of a data frame.
  • rownames() – Get or set row names.
  • colnames() – Get or set column names; for a data frame this is equivalent to names().
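
The access and renaming operators above can be sketched as follows, using a small made-up data frame with hypothetical columns id and score.

```r
df <- data.frame(id = 1:3, score = c(90, 85, 77))

all_scores  <- df$score              # whole column as a vector
one_value   <- df[2, "score"]        # single element: row 2, column "score"
high_scores <- df[df$score > 80, ]   # rows where score exceeds 80

# Rename the second column and assign row names.
names(df)[2] <- "points"
rownames(df) <- c("a", "b", "c")

# Row names can now be used for subsetting.
b_points <- df["b", "points"]
```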

3. Inspecting Data Frames

  • str() – Display the structure of a data frame.
  • summary() – Get summary statistics for each column.
  • head() / tail() – Display the first or last few rows.
  • dim() – Get the dimensions (number of rows and columns).
  • nrow() – Get the number of rows.
  • ncol() – Get the number of columns.
  • View() – Open the data frame in a spreadsheet-style viewer (note the capital V; most commonly used in RStudio).
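
A quick sketch of the inspection functions above, run on a made-up ten-row data frame. View() is left out because it needs an interactive session.

```r
df <- data.frame(x = 1:10, y = letters[1:10])

str(df)        # column types and a preview of the values
summary(df)    # per-column summary statistics

first_rows <- head(df, 3)   # first three rows
last_rows  <- tail(df, 2)   # last two rows

dims   <- dim(df)    # c(rows, columns)
n_rows <- nrow(df)
n_cols <- ncol(df)
```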

4. Data Manipulation Functions

  • subset() – Subset rows based on conditions.
  • transform() – Add or modify columns in a data frame.
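  • merge() – Join two data frames on common columns or row names (an inner join by default; see the all, all.x, and all.y arguments).
  • cbind() – Add columns to a data frame; the row counts must be compatible.
  • rbind() – Append rows to a data frame; the column names must match.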
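
The base R manipulation functions above might be used as in this sketch. The sales and regions data frames are hypothetical.

```r
sales   <- data.frame(id = c(1, 2, 3), amount = c(100, 250, 75))
regions <- data.frame(id = c(1, 2, 4), region = c("North", "South", "East"))

# Keep rows where amount exceeds 80.
big <- subset(sales, amount > 80)

# Add a derived column without modifying the original.
sales_k <- transform(sales, amount_k = amount / 1000)

# Inner join on id: only ids present in both data frames survive.
joined <- merge(sales, regions, by = "id")

# Append one row and one column.
more_rows <- rbind(sales, data.frame(id = 4, amount = 300))
more_cols <- cbind(sales, flagged = c(TRUE, FALSE, TRUE))
```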

5. dplyr Package Functions

  • filter() – Subset rows based on conditions.
  • select() – Select specific columns.
  • mutate() – Add new columns or modify existing columns.
  • arrange() – Sort data frame rows by one or more columns.
  • group_by() – Group data by one or more columns (usually used with summarize()).
  • summarize() / summarise() – Collapse each group (or the whole data frame, if ungrouped) into a row of summary statistics.
  • distinct() – Keep only unique rows, dropping duplicates.
  • rename() – Rename columns in a data frame.
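
Several of the dplyr verbs above are typically chained with the pipe operator. The sketch below assumes the dplyr package is installed; the team and score columns are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  team  = c("a", "a", "b", "b", "b"),
  score = c(10, 20, 5, 15, 25)
)

result <- df %>%
  filter(score > 5) %>%                           # drop the low score
  group_by(team) %>%                              # one group per team
  summarise(mean_score = mean(score), n = n()) %>%
  arrange(desc(mean_score))                       # highest mean first

print(result)
```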

6. Applying Functions

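  • apply() – Apply a function over the rows (MARGIN = 1) or columns (MARGIN = 2); the data frame is first converted to a matrix, so mixed column types are coerced to a common type.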
  • lapply() – Apply a function to each column and return a list.
  • sapply() – Apply a function to each column and simplify the result to a vector or matrix where possible.
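
The difference between the three apply-style functions can be sketched on an all-numeric data frame (made-up values), where the matrix conversion done by apply() is harmless.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

col_means_list <- lapply(df, mean)   # list with one element per column
col_means      <- sapply(df, mean)   # same values as a named numeric vector
row_sums       <- apply(df, 1, sum)  # MARGIN = 1 works row by row

print(col_means)
print(row_sums)
```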

7. Handling Missing Data

  • is.na() – Test each element for NA; returns a logical matrix when given a data frame.
  • na.omit() – Remove rows with NA values.
  • zoo::na.fill() – Replace NA values with a specified fill value (from the zoo package).
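
A minimal base R sketch of finding and dropping missing values, on made-up data. To avoid a package dependency, the fill step here uses plain indexing rather than zoo::na.fill().

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

na_mask   <- is.na(df)           # logical matrix, TRUE where a value is NA
na_counts <- colSums(na_mask)    # number of NAs per column

# Drop every row that contains at least one NA.
complete <- na.omit(df)

# Replace NAs in one column with a chosen value (0 is an arbitrary choice).
filled <- df
filled$x[is.na(filled$x)] <- 0
```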

8. Reshaping Data

  • reshape() – Reshape data from wide to long format or vice versa.
  • reshape2::melt() / reshape2::dcast() – Convert wide data to long (melt) and long data to wide (dcast); the reshape2 package is retired and its authors recommend tidyr instead.
  • tidyr::pivot_longer() / tidyr::pivot_wider() – Reshape from wide to long and from long to wide, respectively (from the tidyr package).
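
As a sketch of wide-to-long reshaping with no extra packages, the base reshape() function can be called as below. The column names score_1, score_2, and round are hypothetical; the tidyr functions do the same job with a simpler interface.

```r
wide <- data.frame(id = 1:2, score_1 = c(10, 20), score_2 = c(30, 40))

# Stack score_1 and score_2 into a single "score" column,
# recording which column each value came from in "round".
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_1", "score_2"),
  v.names   = "score",
  timevar   = "round",
  times     = 1:2,
  idvar     = "id"
)

print(long)
```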

9. Combining Data Frames

  • bind_rows() – Combine data frames by rows (from dplyr).
  • bind_cols() – Combine data frames by columns (from dplyr).
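  • full_join(), left_join(), right_join(), inner_join() – Join data frames on common columns (from dplyr); they keep all rows from both inputs, all rows from the first, all rows from the second, and only matching rows, respectively.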
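
The join variants above differ only in which unmatched rows they keep. This sketch assumes dplyr is installed and uses two made-up data frames that share an id column.

```r
library(dplyr)

a <- data.frame(id = c(1, 2, 3), x = c("p", "q", "r"))
b <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

left  <- left_join(a, b, by = "id")    # ids 1, 2, 3; y is NA for id 1
inner <- inner_join(a, b, by = "id")   # ids 2 and 3 only
full  <- full_join(a, b, by = "id")    # ids 1 through 4

stacked <- bind_rows(a, a)             # rows of a, twice
```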

10. Other Useful Functions

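  • table() – Build a frequency table (counts) for one or more columns.
  • duplicated() – Return a logical vector marking rows that repeat an earlier row.
  • order() – Return the row indices that sort the data by one or more columns (e.g., df[order(df$column), ]).
  • with() – Evaluate an expression using the data frame's columns as variables (e.g., with(df, mean(column))).
  • attach() / detach() – Add a data frame to, or remove it from, the R search path; use with caution, because attached columns can mask or be masked by other objects with the same name.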
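
The utility functions above might be combined as in this sketch. The grp and val columns are made up; attach() is omitted because of the caution already noted.

```r
df <- data.frame(
  grp = c("a", "b", "a", "b", "a"),
  val = c(3, 1, 3, 2, 5)
)

counts   <- table(df$grp)                 # how often each group occurs
dups     <- duplicated(df)                # TRUE for row 3, which repeats row 1
sorted   <- df[order(df$grp, -df$val), ]  # by grp, then val descending
mean_val <- with(df, mean(val))           # columns used as plain variables
```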

These functions cover data frame creation, inspection, manipulation, and reshaping in base R and in the tibble, dplyr, tidyr, reshape2, and zoo packages.