Which of the following is an example of continuous data?
Number of books
Gender
Height
Hair color
Education level
Answer:
Semi-structured data is best managed in:
Relational databases
NoSQL databases
Data lakes
Data warehouses
Data marts
Answer:
A healthcare provider collects patient data from various sources, including electronic health records, medical imaging, and wearable devices. Which V’s of Big Data apply to this scenario?
Volume and Variety
Volume and Velocity
Variety and Veracity
Velocity and Veracity
Volume, Velocity, and Variety
Answer:
A financial institution must ensure that the data used for fraud detection is accurate and trustworthy. Which V of Big Data is most critical in this context?
Volume
Velocity
Variety
Veracity
All of the above
Answer:
II) Match the following terms with their definitions:
Definition
A. Raw facts and figures that are collected from various sources and can be processed to extract meaningful information.
B. A small set of data that can be easily managed and analyzed using traditional data processing tools.
C. Process of extracting meaningful information from data
D. A field that solely focuses on the storage and retrieval of data without any analysis.
E. Structured information that has been processed and organized to provide insights.
F. A massive amount of data sets that cannot be stored, processed, or analyzed using traditional tools.
G. A multidisciplinary field that aims to produce broader insights by combining skills from statistics, computer science, and domain expertise.
Term
1. Data: Answer
2. Big Data: Answer
3. Data Analytics: Answer
4. Data Science: Answer
III) Fill in the Blanks
______ data are measures of ‘types’ and may be represented by a name, symbol, or a number code.
A ______ is a subset of an organization’s data that is usually created for a specific user group or department.
______ is the programming language used to manage structured data.
A data ______ is a hybrid data storage architecture that combines the features of data warehouses and data lakes.
______ data does not have a predefined data model and is best managed in non-relational (NoSQL) databases.
The ______ phase in CRISP-DM involves determining the business question, objective, and success criteria.
IV) Written Answer
Explain the differences between a data analyst, a data engineer, and a data scientist. Provide examples of the main tasks for each role.
The roles of data analyst, data engineer, and data scientist each involve working with data but have distinct responsibilities and skill sets. Here’s a breakdown of the differences between these roles and examples of their main tasks:
Data Analyst
Primary Focus: Analyzing and interpreting data to provide actionable insights.
Key Responsibilities:
Data Cleaning and Preparation: Preparing data for analysis by handling missing values, correcting data errors, and ensuring data quality.
Data Visualization: Creating charts, graphs, and dashboards to communicate findings clearly to stakeholders.
Statistical Analysis: Performing statistical analyses to identify trends, patterns, and correlations within data.
Reporting: Generating regular reports to summarize data insights and support decision-making processes.
Example Tasks:
Using tools like Excel, Tableau, or Power BI to create visual reports.
Analyzing sales data to determine which products are performing best.
Conducting A/B testing to evaluate the effectiveness of marketing campaigns.
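As an illustration of the sales-analysis task above, the following is a minimal sketch in plain Python. The product names, quantities, and prices are hypothetical sample data, and the helper names `revenue_by_product` and `rank_products` are invented for this example; an analyst would more typically do the same work in Excel, Tableau, or Power BI.

```python
from collections import defaultdict

# Hypothetical sales records: (product, units_sold, unit_price_cents).
# Prices are kept in integer cents to avoid floating-point rounding.
sales = [
    ("notebook", 120, 250),
    ("pen", 300, 80),
    ("backpack", 15, 3500),
    ("notebook", 80, 250),
]


def revenue_by_product(records):
    """Sum revenue (units * unit price) per product, in cents."""
    totals = defaultdict(int)
    for product, units, price_cents in records:
        totals[product] += units * price_cents
    return dict(totals)


def rank_products(records):
    """Return (product, revenue_cents) pairs, highest revenue first."""
    totals = revenue_by_product(records)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_products(sales)
for product, revenue_cents in ranking:
    print(f"{product}: {revenue_cents / 100:.2f}")
```

The ranked list is the kind of summary that would then feed a chart or a regular report for stakeholders.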
Data Engineer
Primary Focus: Building and maintaining the infrastructure and architecture for data generation, storage, and processing.
Key Responsibilities:
Data Pipeline Development: Designing and implementing data pipelines to collect, process, and store data from various sources.
Database Management: Setting up and managing databases or data warehouses to ensure efficient data storage and retrieval.
Data Integration: Integrating data from different sources and ensuring consistency and reliability.
Performance Optimization: Ensuring the performance and scalability of data systems.
Example Tasks:
Using tools like Apache Hadoop, Apache Spark, or AWS services to build data processing pipelines.
Creating ETL (Extract, Transform, Load) processes to move data from transactional databases to data warehouses.
Ensuring data security and compliance with regulations.
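The ETL task listed above can be sketched in a few lines. This is only an illustrative outline using Python's built-in `sqlite3` module with two in-memory databases standing in for a transactional database and a data warehouse. The table names (`orders`, `customer_totals`), the columns, and the sample rows are all assumptions made for the example.

```python
import sqlite3


def run_etl(source, warehouse):
    """Copy completed-order totals per customer from source to warehouse."""
    # Extract: read raw order rows from the transactional store.
    rows = source.execute(
        "SELECT customer, amount_cents, status FROM orders"
    ).fetchall()

    # Transform: keep completed orders, normalize names, total per customer.
    totals = {}
    for customer, amount_cents, status in rows:
        if status == "completed":
            key = customer.strip().lower()
            totals[key] = totals.get(key, 0) + amount_cents

    # Load: write the aggregated rows into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
        sorted(totals.items()),
    )
    warehouse.commit()
    return len(totals)


# Hypothetical transactional data.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (customer TEXT, amount_cents INTEGER, status TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Alice", 1200, "completed"),
        (" alice ", 800, "completed"),
        ("Bob", 500, "cancelled"),
        ("Carol", 3000, "completed"),
    ],
)
source.commit()

warehouse = sqlite3.connect(":memory:")
loaded = run_etl(source, warehouse)
result = warehouse.execute(
    "SELECT customer, total_cents FROM customer_totals ORDER BY customer"
).fetchall()
print(loaded, result)
```

In a production setting the same three steps would usually run on a scheduler and at far larger scale, which is where tools such as Spark come in.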
Data Scientist
Primary Focus: Applying advanced analytical and machine learning techniques to extract deeper insights and make data-driven predictions.
Key Responsibilities:
Data Exploration: Exploring and understanding large datasets to identify potential patterns and relationships.
Model Building: Developing machine learning models to predict outcomes or classify data.
Advanced Analytics: Performing complex analyses, including predictive modeling, clustering, and natural language processing.
Experimentation: Designing experiments to test hypotheses and validate models.
Example Tasks:
Using programming languages like Python or R to develop predictive models.
Implementing natural language processing algorithms to analyze text data.
Applying clustering techniques to segment customers based on behavior.
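To make the customer-segmentation task above concrete, here is a small sketch of k-means clustering written in plain Python. The customer data (visits per month, average spend) and the starting centroids are hypothetical, and the `kmeans` function is a simplified teaching version; in practice a data scientist would likely use a library implementation in Python or R.

```python
def kmeans(points, centroids, iterations=10):
    """Simplified k-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (
                sum(p[0] for p in cluster) / len(cluster),
                sum(p[1] for p in cluster) / len(cluster),
            )
            if cluster
            else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 8), (10, 90), (12, 100), (11, 95)]

# Assumed starting centroids: one occasional and one frequent customer.
centroids, clusters = kmeans(customers, [(1, 10), (11, 95)])
print(centroids)
print(clusters)
```

With data this cleanly separated, the two resulting clusters correspond to occasional low spenders and frequent high spenders.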
Summary of Differences
Data Analyst: Focuses on extracting actionable insights from data through analysis and visualization.
Data Engineer: Concentrates on building and maintaining the data infrastructure necessary for data analysis and processing.
Data Scientist: Utilizes advanced statistical and machine learning techniques to make predictions and derive deeper insights from data.