Exercise: Data

I) Multiple Choice Questions

Which of the following is NOT a form of data?
1. Text
2. Numeric
3. Audio/Visual
4. Emotions
5. Models

Answer:

Quantitative data are measures of:
1. Types
2. Values or counts
3. Categories
4. Names
5. Symbols

Answer:

Which of the following is an example of continuous data?
1. Number of books
2. Gender
3. Height
4. Hair color
5. Education level

Answer:

Semi-structured data is best managed in:
1. Relational databases
2. NoSQL databases
3. Data lakes
4. Data warehouses
5. Data marts

Answer:

A healthcare provider collects patient data from various sources, including electronic health records, medical imaging, and wearable devices. Which V’s of Big Data apply to this scenario?
1. Volume and Variety
2. Volume and Velocity
3. Variety and Veracity
4. Velocity and Veracity
5. Volume, Velocity, and Variety

Answer:

A financial institution must ensure that the data used for fraud detection is accurate and trustworthy. Which V of Big Data is most critical in this context?
1. Volume
2. Velocity
3. Variety
4. Veracity
5. All of the above

Answer:

II) Match the following terms with their definitions:

Definition

A. Raw facts and figures that are collected from various sources and can be processed to extract meaningful information.

B. A small set of data that can be easily managed and analyzed using traditional data processing tools.

C. The process of extracting meaningful information from data.

D. A field that solely focuses on the storage and retrieval of data without any analysis.

E. Structured information that has been processed and organized to provide insights.

F. A massive amount of data sets that cannot be stored, processed, or analyzed using traditional tools.

G. A multidisciplinary field that aims to produce broader insights by combining skills from statistics, computer science, and domain expertise.

Term

1. Data: Answer
2. Big Data: Answer
3. Data Analytics: Answer
4. Data Science: Answer

III) Fill in the Blanks

______ data are measures of ‘types’ and may be represented by a name, symbol, or a number code.
A ______ is a subset of an organization’s data that is usually created for a specific user group or department.
______ is the programming language used to manage structured data.
A data ______ is a hybrid data storage architecture that combines the features of data warehouses and data lakes.
______ data does not have a predefined data model and is best managed in non-relational (NoSQL) databases.
The ______ phase in CRISP-DM involves determining the business question, objective, and success criteria.

IV) Written Answer

Explain the differences between a data analyst, a data engineer, and a data scientist. Provide examples of the main tasks for each role.

The roles of data analyst, data engineer, and data scientist each involve working with data but have distinct responsibilities and skill sets. Here’s a breakdown of the differences between these roles and examples of their main tasks:

Data Analyst

Primary Focus: Analyzing and interpreting data to provide actionable insights.

Key Responsibilities:

Data Cleaning and Preparation: Preparing data for analysis by handling missing values, correcting data errors, and ensuring data quality.
Data Visualization: Creating charts, graphs, and dashboards to communicate findings clearly to stakeholders.
Statistical Analysis: Performing statistical analyses to identify trends, patterns, and correlations within data.
Reporting: Generating regular reports to summarize data insights and support decision-making processes.

Example Tasks:

Using tools like Excel, Tableau, or Power BI to create visual reports.
Analyzing sales data to determine which products are performing best.
Conducting A/B testing to evaluate the effectiveness of marketing campaigns.
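
As a rough illustration of the sales-analysis task above, the following Python sketch totals revenue per product and ranks the products from best to worst. The sales records, the product names, and the helper rank_products_by_revenue are hypothetical and exist only for this example; an analyst would more likely do the same thing in a spreadsheet or BI tool.

```python
from collections import defaultdict

# Hypothetical sales records: (product, units sold, unit price).
SALES = [
    ("notebook", 10, 2.5),
    ("pen", 40, 1.0),
    ("backpack", 3, 30.0),
    ("pen", 20, 1.0),
    ("notebook", 6, 2.5),
]


def rank_products_by_revenue(sales):
    """Return (product, total revenue) pairs, highest revenue first."""
    revenue = defaultdict(float)
    for product, units, unit_price in sales:
        revenue[product] += units * unit_price
    return sorted(revenue.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for product, total in rank_products_by_revenue(SALES):
        print(f"{product}: {total:.2f}")
```

With this made-up data the backpack ranks first even though it sells the fewest units, which is the kind of finding an analyst would surface in a report.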

Data Engineer

Primary Focus: Building and maintaining the infrastructure and architecture for data generation, storage, and processing.

Key Responsibilities:

Data Pipeline Development: Designing and implementing data pipelines to collect, process, and store data from various sources.
Database Management: Setting up and managing databases or data warehouses to ensure efficient data storage and retrieval.
Data Integration: Integrating data from different sources and ensuring consistency and reliability.
Performance Optimization: Ensuring the performance and scalability of data systems.

Example Tasks:

Using tools like Apache Hadoop, Apache Spark, or AWS to build data processing frameworks.
Creating ETL (Extract, Transform, Load) processes to move data from transactional databases to data warehouses.
Ensuring data security and compliance with regulations.
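
The ETL task above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions, not a production pipeline: two in-memory SQLite databases stand in for the transactional database and the data warehouse, and the table names (orders, customer_totals), the sample rows, and the helpers run_etl and build_demo_source are all hypothetical.

```python
import sqlite3


def run_etl(source, warehouse):
    """Copy cleaned, aggregated order totals from source to warehouse."""
    # Extract: read raw rows from the (hypothetical) transactional table.
    rows = source.execute("SELECT customer, amount FROM orders").fetchall()

    # Transform: normalize customer names, skip rows with a missing amount,
    # and total the amounts per customer.
    totals = {}
    for customer, amount in rows:
        if amount is None:
            continue
        key = customer.strip().lower()
        totals[key] = totals.get(key, 0.0) + amount

    # Load: write the aggregated rows into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
        sorted(totals.items()),
    )
    warehouse.commit()
    return len(totals)


def build_demo_source():
    """Create an in-memory stand-in for a transactional database."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("Alice", 20.0), (" alice ", 5.0), ("Bob", None), ("Bob", 12.5)],
    )
    source.commit()
    return source


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_etl(build_demo_source(), warehouse)
    query = "SELECT customer, total FROM customer_totals ORDER BY customer"
    print(warehouse.execute(query).fetchall())
```

The transform step is where data integration and consistency show up: the two spellings of the same customer are merged, and the row with a missing amount is skipped rather than loaded.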

Data Scientist

Primary Focus: Applying advanced analytical and machine learning techniques to extract deeper insights and make data-driven predictions.

Key Responsibilities:

Data Exploration: Exploring and understanding large datasets to identify potential patterns and relationships.
Model Building: Developing machine learning models to predict outcomes or classify data.
Advanced Analytics: Performing complex analyses, including predictive modeling, clustering, and natural language processing.
Experimentation: Designing experiments to test hypotheses and validate models.

Example Tasks:

Using programming languages like Python or R to develop predictive models.
Implementing natural language processing algorithms to analyze text data.
Applying clustering techniques to segment customers based on behavior.
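
To make the customer-segmentation task above concrete, here is a toy k-means clustering on a single numeric feature, written in plain Python. It is a simplified sketch: the spend figures and the function kmeans_1d are hypothetical, the starting centroids are fixed by hand for reproducibility, and a data scientist would normally use a library implementation with several features and a better initialization.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Toy k-means on one numeric feature; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
    return centroids, labels


# Hypothetical monthly spend per customer.
MONTHLY_SPEND = [12, 15, 14, 90, 95, 100]

if __name__ == "__main__":
    centers, segments = kmeans_1d(MONTHLY_SPEND, centroids=[0, 50])
    print(centers, segments)
```

On this made-up data the algorithm separates the three low spenders from the three high spenders, giving two segments that could then be targeted differently.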

Summary of Differences

Data Analyst: Focuses on extracting actionable insights from data through analysis and visualization.
Data Engineer: Concentrates on building and maintaining the data infrastructure necessary for data analysis and processing.
Data Scientist: Utilizes advanced statistical and machine learning techniques to make predictions and derive deeper insights from data.
