Pre-test: Decision Tree with Orange Data Mining

The Car Evaluation Dataset (1997) is a classic dataset used in machine learning and data analysis, especially for classification tasks. It was originally derived from a decision model created to evaluate car purchases based on specific criteria. This dataset is available from the UCI Machine Learning Repository.

1 Features of the Dataset

The Car Evaluation dataset contains six input attributes (features) and one target variable. All features are categorical, which makes the dataset well suited to decision trees, random forests, and other classification algorithms.

2 Features (Attributes)

Buying: The price of the car from the perspective of the buyer. It has 4 possible values:
- vhigh: very high
- high: high
- med: medium
- low: low
Maint (Maintenance): The cost of maintaining the car. It also has 4 possible values:
- vhigh: very high
- high: high
- med: medium
- low: low
Doors: The number of doors the car has. It has 4 possible values:
- 2: 2 doors
- 3: 3 doors
- 4: 4 doors
- 5more: 5 or more doors
Persons: The capacity of the car in terms of passengers. It has 3 possible values:
- 2: 2 persons
- 4: 4 persons
- more: more than 4 persons
Lug_boot: The size of the luggage boot. It has 3 possible values:
- small: small
- med: medium
- big: big
Safety: The safety level of the car. It has 3 possible values:
- low: low safety
- med: medium safety
- high: high safety

3 Target Variable

Class (Car evaluation): The final evaluation of the car based on the attributes mentioned above. It has 4 possible values:
- unacc: unacceptable
- acc: acceptable
- good: good
- vgood: very good

4 Example Records in the Dataset

Buying	Maint	Doors	Persons	Lug_boot	Safety	Class
vhigh	vhigh	2	2	small	low	unacc
low	low	4	more	big	high	vgood
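
As a minimal sketch of how such records can be handled in code, the snippet below parses the two example rows and checks every value against the attribute domains listed above. The names `DOMAINS`, `RAW`, and `parse_records` are illustrative assumptions and are not part of Orange.

```python
# Attribute domains as listed in the sections above.
DOMAINS = {
    "Buying": ["vhigh", "high", "med", "low"],
    "Maint": ["vhigh", "high", "med", "low"],
    "Doors": ["2", "3", "4", "5more"],
    "Persons": ["2", "4", "more"],
    "Lug_boot": ["small", "med", "big"],
    "Safety": ["low", "med", "high"],
    "Class": ["unacc", "acc", "good", "vgood"],
}

# The two example records from the table, as tab-separated text.
RAW = (
    "vhigh\tvhigh\t2\t2\tsmall\tlow\tunacc\n"
    "low\tlow\t4\tmore\tbig\thigh\tvgood"
)


def parse_records(raw):
    """Split tab-separated lines into dicts keyed by column name."""
    columns = list(DOMAINS)
    records = []
    for line in raw.splitlines():
        values = line.split("\t")
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
        record = dict(zip(columns, values))
        # Reject any value that is not in the documented domain.
        for name, value in record.items():
            if value not in DOMAINS[name]:
                raise ValueError(f"unexpected value {value!r} for {name}")
        records.append(record)
    return records


records = parse_records(RAW)
for record in records:
    print(record)
```

All values are kept as strings, including the ones that look numeric (such as Doors = 4), because every attribute in this dataset is categorical.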

5 Objective

The goal of analyzing this dataset is typically to predict the “Class” of a car (whether it is unacceptable, acceptable, good, or very good) based on its attributes like price, maintenance cost, number of doors, safety, etc.

To interpret results from the Car Evaluation Dataset using a decision tree, we follow the same principles as interpreting any decision tree, with a specific focus on the categorical features and target classes in this dataset. Let’s break down the steps using an example decision tree trained on this data.

How many observations (instances) does the dataset contain?

Use all features to predict the target class, then double-click the Tree widget. What is the minimum tree depth required to achieve a classification accuracy (CA) greater than 0.90?

Decision Tree 3) Classification with Tree.

If persons equals 5 and safety is low, what is the prediction from the tree?

If persons equals 5, safety is high, buying is high, and maint is vhigh, what is the prediction from the tree?

If persons equals 5, safety is high, buying is high, maint is high, and lug_boot is small, what is the prediction from the tree?
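
Questions like these are answered by following one path from the root of the tree to a leaf, taking the branch that matches the record's value at each node. The sketch below illustrates that traversal on a purely hypothetical toy tree: `TOY_TREE`, its splits, and the placeholder leaf names are assumptions made for illustration and are not the tree Orange learns from the car data.

```python
# Hypothetical toy tree, NOT the tree Orange learns from the car data.
# An internal node is a tuple (feature, {value: subtree}); a leaf is a string.
TOY_TREE = (
    "lug_boot",
    {
        "small": "leaf_A",
        "med": (
            "doors",
            {"2": "leaf_B", "3": "leaf_C", "4": "leaf_C", "5more": "leaf_C"},
        ),
        "big": "leaf_D",
    },
)


def predict(tree, record):
    """Walk from the root to a leaf, following the record's feature values."""
    node = tree
    while not isinstance(node, str):
        feature, branches = node
        node = branches[record[feature]]
    return node


# A record only needs the features that are tested along its path.
print(predict(TOY_TREE, {"lug_boot": "med", "doors": "2"}))
```

In the Tree Viewer the same reading applies: start at the top node, follow the branch whose condition the record satisfies, and report the class shown at the leaf that is reached.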

6 The confusion matrix

Accuracy: In classification problems, accuracy is the number of correct predictions made by the model divided by the total number of predictions made.

\[CA =\dfrac{TP+TN}{TP+FP+FN+TN} \]

What is the value of CA (3 decimals)?
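
With four target classes, the same ratio can be read directly from the confusion matrix: the correct predictions lie on the diagonal, and the total is the sum of all cells. The following is a minimal sketch; the counts in `MATRIX` are hypothetical placeholders and are not results from the car data.

```python
LABELS = ["unacc", "acc", "good", "vgood"]

# Hypothetical counts for illustration only.
# Rows are actual classes, columns are predicted classes, in LABELS order.
MATRIX = [
    [50, 3, 0, 0],
    [4, 20, 1, 0],
    [0, 2, 5, 1],
    [0, 0, 1, 6],
]


def accuracy(matrix):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(round(accuracy(MATRIX), 3))
```

To answer the question, replace the placeholder counts with the values shown in the Confusion Matrix widget.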

Precision:

\[Precision =\dfrac{TP}{TP+FP}\]

What is the value of Precision (3 decimals)?

Recall:

\[Recall = \dfrac{TP}{TP+FN}\]

What is the value of Recall (3 decimals)?

F1 score:

\[F1 = \dfrac{2(Precision)(Recall)}{Precision+Recall}\]

What is the value of F1 (3 decimals)?
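
For a multi-class target, TP, FP, and FN are counted for one class at a time, treating that class as positive and all other classes as negative. The sketch below applies the three formulas above to a single class of a confusion matrix. The counts are hypothetical, and `per_class_scores` is an illustrative helper, not an Orange function. Orange may report a value averaged over classes unless a specific target class is selected, so check which setting the widget is using.

```python
LABELS = ["unacc", "acc", "good", "vgood"]

# Hypothetical counts for illustration only.
# Rows are actual classes, columns are predicted classes, in LABELS order.
MATRIX = [
    [50, 3, 0, 0],
    [4, 20, 1, 0],
    [0, 2, 5, 1],
    [0, 0, 1, 6],
]


def per_class_scores(matrix, k):
    """Return (precision, recall, f1) for class index k, one-vs-rest."""
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
    fn = sum(matrix[k]) - tp                 # actually k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = per_class_scores(MATRIX, LABELS.index("acc"))
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

The guards for zero denominators return 0.0 when a class is never predicted or never occurs; that convention is a choice made for this sketch.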