Sample of Supervised Learning Examination

Part 1 Multiple-choice questions along with their answers (20 items, 10 points)

I) Multiple Choice Questions

In linear regression, the relationship between the input variable X and the output variable Y is assumed to be:
1. Linear
2. Non-linear
3. Exponential
4. Polynomial

Answer:

In a linear regression problem, the output variable is always:
1. A string
2. A binary value
3. A discrete label
4. A continuous value

Answer:

In simple linear regression, how many independent variables are used?
1. None
2. One
3. Two
4. Multiple

Answer:

Which of the following metrics is commonly used to evaluate a regression model?
1. Accuracy
2. Precision
3. Mean Squared Error (MSE)
4. Confusion Matrix

Answer:

The R-squared value of a regression model explains:
1. The proportion of the variance in the dependent variable explained by the independent variable
2. The strength and direction of the relationship between variables
3. The slope of the regression line
4. The average residual error in the predictions

Answer:

Use the following data to answer questions 6-10

Data Description

The dataset contains information about different advertising campaigns run by a company. The objective is to predict the sales revenue generated from each campaign based on the amount spent on advertising.

Advertising_Spend ($): The amount of money (in USD) spent on advertising for a particular campaign.
Sales_Revenue ($): The sales revenue (in USD) generated from the campaign.

Campaign ID	Advertising Spend ($)	Sales Revenue ($)
1	10000	150000
2	12000	180000
3	15000	210000
4	18000	250000
5	20000	270000
6	25000	350000
7	30000	400000
8	35000	450000
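
As a rough illustration of how a model could be fitted to the table above, the following sketch estimates a simple linear regression by ordinary least squares and reports the mean squared error and R-squared. The variable names (`spend`, `revenue`, `beta_0`, `beta_1`) are assumptions made for this example and are not part of the examination material.

```python
import numpy as np

# Values copied from the advertising table above (USD).
spend = np.array([10000, 12000, 15000, 18000, 20000, 25000, 30000, 35000], dtype=float)
revenue = np.array([150000, 180000, 210000, 250000, 270000, 350000, 400000, 450000], dtype=float)

# Closed-form ordinary least squares estimates for one predictor.
x_mean, y_mean = spend.mean(), revenue.mean()
beta_1 = np.sum((spend - x_mean) * (revenue - y_mean)) / np.sum((spend - x_mean) ** 2)
beta_0 = y_mean - beta_1 * x_mean

# Fitted values and residuals on the same eight campaigns.
predicted = beta_0 + beta_1 * spend
residuals = revenue - predicted

# Mean squared error and R-squared (share of variance explained).
mse = np.mean(residuals ** 2)
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((revenue - y_mean) ** 2)

print(f"intercept (beta_0): {beta_0:.2f}")
print(f"slope (beta_1): {beta_1:.4f}")
print(f"MSE: {mse:.2f}")
print(f"R-squared: {r_squared:.4f}")
```

The sign of `beta_1` indicates the direction of the fitted relationship, and `beta_0` is the fitted revenue at zero advertising spend.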

A company wants to predict sales revenue based on the amount spent on advertising. Which of the following is the best model to use?
1. Logistic Regression
2. Simple Linear Regression
3. Polynomial Regression
4. K-Means Clustering

Answer:

Which of the following is the dependent variable in the dataset?
1. Advertising Spend
2. Campaign ID
3. Sales Revenue
4. Both Advertising Spend and Sales Revenue

Answer:

What is the independent variable in the dataset?
1. Advertising Spend
2. Campaign ID
3. Sales Revenue
4. None of the above

Answer:

Regarding Figure 1, what type of relationship is expected between Advertising Spend and Sales Revenue?
1. No relationship
2. Negative linear relationship
3. Positive linear relationship
4. Exponential relationship

Answer:

The equation of the linear regression model is in the form \[\text{Sales\_Revenue}=\beta_0 +\beta_1\times\text{Advertising\_Spend}.\] What does $\beta_0$ represent in this context?
1. The rate of increase in sales revenue for each dollar spent
2. The minimum sales revenue that can be generated
3. The average sales revenue across all campaigns
4. The fixed base sales revenue when no money is spent on advertising

Answer:

Which of the following algorithms is not typically used for classification tasks?
1. Logistic Regression
2. K-Nearest Neighbors (KNN)
3. Linear Regression
4. Decision Trees

Answer:

In a binary classification problem, which of the following performance metrics is best for evaluating how well the model distinguishes between the two classes?
1. Mean Squared Error (MSE)
2. Mean Absolute Error (MAE)
3. Adjusted R-squared
4. Confusion Matrix

Answer:

In binary classification, what does the model output?
1. A continuous value between 0 and 1
2. A predicted probability and a class label
3. A list of potential class labels
4. A vector of distances to neighboring points

Answer:

What is overfitting in the context of classification models?
1. A model that generalizes well to unseen data
2. A model that has too few features
3. A model that performs well on the training data but poorly on new, unseen data
4. A model that fails to capture the patterns in the training data

Answer:

Which of the following metrics is NOT used to evaluate classification models?
1. Accuracy
2. Precision
3. R-Squared
4. F1-Score

Answer:

In a decision tree, what is the leaf node?
1. A node that has been pruned
2. A node that represents a final class label
3. A node where a split occurs
4. The root node of the tree

Answer:

What does it mean if a classification model has high bias?
1. It performs well on the training data but poorly on the test data.
2. It performs poorly on both the training data and the test data.
3. It performs well on the test data but poorly on the training data.
4. It performs well on high-dimensional datasets.

Answer:

What does the confusion matrix represent in classification tasks?
1. A matrix that summarizes the distance between true and predicted values
2. A matrix that shows the number of correct and incorrect predictions
3. A matrix that lists all the features used for classification
4. A matrix that compares the performance of multiple models

Answer:

In a confusion matrix, what does the True Positive (TP) entry represent?
1. The number of instances where the model correctly predicted the positive class
2. The number of instances where the model incorrectly predicted the positive class
3. The number of instances where the model correctly predicted the negative class
4. The number of instances where the model incorrectly predicted the negative class

Answer:

Suppose a model has the following confusion matrix:
- True Positives (TP): 80
- True Negatives (TN): 70
- False Positives (FP): 20
- False Negatives (FN): 40

What is the precision of the model?

1. 0.80
2. 0.67
3. 0.89
4. 0.88

Answer:
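
The metrics named in the confusion-matrix questions above can be computed directly from the four counts. The sketch below uses the standard definitions; the function name `classification_metrics` is a hypothetical helper introduced for this example only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return common classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total   # share of all predictions that are correct
    precision = tp / (tp + fp)     # share of predicted positives that are truly positive
    recall = tp / (tp + fn)        # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the confusion matrix in the last question of Part 1.
metrics = classification_metrics(tp=80, tn=70, fp=20, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```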

Part 2 Answer the following questions or complete the statements by writing appropriate words or amounts in the answer blanks (10 items, 10 points)

Use the data below to answer questions 21-25

The dataset below represents a company’s analysis of factors affecting sales revenue based on advertising spend in TV, radio, and newspapers.

Data Description for Multiple Linear Regression

This dataset represents a company’s advertising efforts across three different media channels (TV, Radio, and Newspaper) and the resulting sales revenue generated from these campaigns. The goal is to predict sales revenue based on the amount of money spent on each type of advertisement.

Here are the key details:

Campaign_ID: A unique identifier for each advertising campaign.
TV_Ad_Spend ($): The amount of money spent on TV advertisements for each campaign, measured in USD.
Radio_Ad_Spend ($): The amount of money spent on radio advertisements for each campaign, measured in USD.
Newspaper_Ad_Spend ($): The amount of money spent on newspaper advertisements for each campaign, measured in USD.
Sales_Revenue ($): The total sales revenue generated by each campaign, measured in USD.

Campaign ID	TV Spend ($)	Radio Spend ($)	Newspaper Spend ($)	Sales Revenue ($)
1	230,000	37,000	69,000	22,000
2	44,000	39,000	45,000	10,000
3	17,000	45,000	69,000	9,000
4	151,000	41,000	58,000	18,000
5	180,000	10,000	20,000	12,000
6	8,000	49,000	15,000	8,000
7	57,000	32,000	37,000	11,000
8	120,000	43,000	55,000	15,000
9	197,000	25,000	34,000	17,000
10	75,000	35,000	29,000	14,000

Multiple Linear Regression Model

We will use the following model to predict sales revenue:

\[\text{Sales\_Revenue}=\beta_0+\beta_1\times \text{TV\_Ad\_Spend}+\beta_2 \times \text{Radio\_Ad\_Spend}+\beta_3 \times \text{Newspaper\_Ad\_Spend}\]

where:

$\beta_0$ is the intercept
$\beta_1,\beta_2,\beta_3$ are the coefficients for TV, Radio, and Newspaper Ad Spend, respectively.
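
One possible way to estimate this model from the table above is ordinary least squares on a design matrix with an intercept column, sketched below with NumPy. The names `ad_spend`, `sales`, `design`, and `predict` are assumptions for this example. The figures referenced in the questions are not reproduced here, so the fitted values may differ from those shown in the figures.

```python
import numpy as np

# Columns: TV, Radio, Newspaper spend (USD), copied from the table above.
ad_spend = np.array([
    [230_000, 37_000, 69_000],
    [44_000, 39_000, 45_000],
    [17_000, 45_000, 69_000],
    [151_000, 41_000, 58_000],
    [180_000, 10_000, 20_000],
    [8_000, 49_000, 15_000],
    [57_000, 32_000, 37_000],
    [120_000, 43_000, 55_000],
    [197_000, 25_000, 34_000],
    [75_000, 35_000, 29_000],
], dtype=float)
sales = np.array([22_000, 10_000, 9_000, 18_000, 12_000,
                  8_000, 11_000, 15_000, 17_000, 14_000], dtype=float)

# Prepend a column of ones so the first coefficient is the intercept beta_0.
design = np.column_stack([np.ones(len(sales)), ad_spend])
coefficients, *_ = np.linalg.lstsq(design, sales, rcond=None)
beta_0, beta_1, beta_2, beta_3 = coefficients

# Pearson correlation between TV spend and sales revenue.
tv_sales_corr = np.corrcoef(ad_spend[:, 0], sales)[0, 1]


def predict(tv, radio, newspaper):
    """Predicted sales revenue for the given spend on each channel (USD)."""
    return beta_0 + beta_1 * tv + beta_2 * radio + beta_3 * newspaper


print("beta_0..beta_3:", np.round(coefficients, 2))
print("corr(TV, Sales):", round(tv_sales_corr, 2))
print("example prediction:", round(predict(0, 100_000, 50_000)))
```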

How many independent variables are included in the regression model?

Answer: (Type number)

What is the correlation between TV Spend and Sales Revenue (rounded to two decimal places)?

Answer:

Regarding Figure 2, what are the values of $\beta_1$ (the coefficient for TV Spend), $\beta_2$ (the coefficient for Radio Spend), and $\beta_3$ (the coefficient for Newspaper Spend) in the regression model (rounded to two decimal places)?
- Answer: $\beta_1$ =
- Answer: $\beta_2$ =
- Answer: $\beta_3$ =

Among $\beta_1$ (TV Spend), $\beta_2$ (Radio Spend), and $\beta_3$ (Newspaper Spend), which coefficient has the highest effect on the output (Sales Revenue) in the regression model?

Answer: ______ has the highest value in the regression model.

What is the predicted sales revenue if TV Spend = $0, Radio Spend = $100,000, and Newspaper Spend = $50,000, based on the regression model shown in Figure 3 (rounded to the nearest whole number)?

Answer:

Use the decision tree as shown in Figure 4 to answer questions 26-28

Data Description:

This decision tree is based on three variables:

Likes Gravity (Boolean): A binary variable representing whether a person likes gravity or not (values: true or false).
Likes Dogs (Boolean): A binary variable representing whether a person likes dogs or not (values: true or false).
Age (Continuous): A numerical variable that indicates a person’s age.

What is the first decision node in the tree?

Answer:

What is the outcome if a person likes gravity, does not like dogs, and is younger than 40.5 years?

Answer: The outcome is ______.

What is the outcome if a person likes gravity, does not like dogs, and is 40.5 years or older?

Answer: The outcome is ______.

Use the classification model performance shown in Table 1 to answer questions 29-30

Table 1 Model Performance Comparison

Metric	Model A	Model B
Accuracy	0.82	0.90
Precision	0.84	0.84
Recall	0.91	0.87
F1-Score	0.80	0.80
AUC	0.91	0.91

If accuracy is the most important metric, which model would you choose?

Answer:

If recall is the most important metric, which model would you choose?

Answer: