Exercise: Association Rules
Applications of Association Rules
Association Rule Mining can be highly beneficial for businesses in various industries, as it helps to uncover hidden patterns, relationships, and insights from large datasets. The most well-known use case is Market Basket Analysis, but the applications extend to many areas. Below are key business applications and examples:
1. Market Basket Analysis (Retail)
This is the most common application of association rule mining, used to understand which products are frequently purchased together. This analysis helps businesses optimize their product placement, promotions, and bundling strategies.
- Example: A supermarket might find that customers who buy bread are likely to buy butter. Based on this insight, the store can:
  - Place bread and butter closer together on shelves.
  - Offer discounts or bundled promotions (e.g., buy bread, get butter at a discount).
  - Suggest butter as a recommended product when a customer adds bread to their online cart.
2. Cross-Selling and Upselling (E-commerce)
E-commerce platforms can use association rules to suggest additional products to customers based on their purchase history or items currently in their cart. This enhances the shopping experience and increases revenue.
- Example: Amazon’s “Customers who bought this item also bought…” section applies the same idea, suggesting items that other shoppers bought together with the current one. This cross-selling tactic increases the chances that customers will purchase additional items.
3. Product Placement Optimization (Retail)
Retailers can use association rule mining to place frequently bought-together items closer to each other in the store, which can improve the customer shopping experience and increase sales.
- Example: In a grocery store, the insight that people often buy chips and soda together can lead to these products being placed in the same aisle.
4. Recommendation Systems (Entertainment)
Streaming services such as Netflix or Spotify can use association rule mining, typically alongside other recommendation techniques such as collaborative filtering, to suggest new content based on a user’s past behavior. The goal is to enhance the user experience by providing personalized recommendations.
- Example: A streaming service can recommend movies and TV shows that other users with similar viewing habits have watched; association rules capture patterns such as “users who watched X also watched Y.”
5. Targeted Advertising (Digital Marketing)
Advertisers can use association rule mining to create more effective ad campaigns. By understanding which products or services are often consumed together, businesses can design ads that target these associations.
- Example: An online advertising platform might show an ad for baby strollers to users who have searched for diapers, knowing that these items are often associated.
Example Data
Transaction Data
| transaction | coke | milk | green tea | snack | water |
|---|---|---|---|---|---|
| TRAN: 1 | 1 | 0 | 0 | 1 | 1 |
| TRAN: 2 | 1 | 0 | 1 | 0 | 1 |
| TRAN: 3 | 1 | 0 | 0 | 0 | 1 |
| TRAN: 4 | 0 | 1 | 1 | 1 | 1 |
| TRAN: 5 | 1 | 1 | 0 | 0 | 0 |
| TRAN: 6 | 0 | 1 | 0 | 1 | 1 |
| count | | | | | |
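
To check the count row, here is a minimal Python sketch. It assumes each row of the table is encoded as a Python set holding the items marked 1 in that row; the names `TRANSACTIONS` and `item_counts` are illustrative and not part of the exercise.

```python
from collections import Counter

# Assumed encoding: one set per row of the Transaction Data table,
# containing the items marked 1 in that row.
TRANSACTIONS = [
    {"coke", "snack", "water"},               # TRAN: 1
    {"coke", "green tea", "water"},           # TRAN: 2
    {"coke", "water"},                        # TRAN: 3
    {"milk", "green tea", "snack", "water"},  # TRAN: 4
    {"coke", "milk"},                         # TRAN: 5
    {"milk", "snack", "water"},               # TRAN: 6
]


def item_counts(transactions):
    """Count in how many transactions each item appears (the 'count' row)."""
    counts = Counter()
    for basket in transactions:
        counts.update(basket)  # adds 1 for every item in this transaction
    return counts


counts = item_counts(TRANSACTIONS)
# most_common() lists items from highest to lowest count.
for item, n in counts.most_common():
    print(f"{item}: {n}")
```

Sorting with `Counter.most_common()` puts the top-selling products first; items with equal counts may appear in either order.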
What are the names of the top-selling products?
Answer:
1. Support
Support refers to the proportion or frequency of transactions that contain a specific itemset in the entire dataset. It indicates how often an item or itemset appears in the data.
Formula: \[ \text{Support}(A) = \frac{\text{Number of transactions containing A}}{\text{Total number of transactions}} \]
- Support tells us how frequently a particular item or set of items occurs.
- Low support might indicate that the rule is not relevant because the itemset occurs rarely.
Example: With 6 transactions, “coke” appears in 4 of them (TRAN: 1, 2, 3, and 5), so the support for “coke” is \[ \text{Support}(\text{coke}) = \frac{4}{6} = \frac{2}{3} \approx 0.6667 \] This means that 66.67% of all transactions include “coke.”
Question: \(\text{Support}(\text{snack}) =\)
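The support formula translates directly into code. This is a minimal sketch, assuming each transaction is stored as a Python set of the items bought; the helper name `support` is hypothetical.

```python
# Assumed encoding of the Transaction Data table: one set of items per row.
TRANSACTIONS = [
    {"coke", "snack", "water"},               # TRAN: 1
    {"coke", "green tea", "water"},           # TRAN: 2
    {"coke", "water"},                        # TRAN: 3
    {"milk", "green tea", "snack", "water"},  # TRAN: 4
    {"coke", "milk"},                         # TRAN: 5
    {"milk", "snack", "water"},               # TRAN: 6
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # `itemset <= basket` is True when itemset is a subset of the basket.
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


print(round(support({"coke"}, TRANSACTIONS), 4))  # 0.6667
```

Passing a multi-item set such as `{"coke", "water"}` gives the support of that itemset, because the subset test requires every item to be in the transaction.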
2. Confidence
Confidence measures the likelihood that if an item \(A\) appears in a transaction, item \(B\) will also appear. It evaluates the accuracy or reliability of an association rule.
Formula: \[ \text{Confidence}(A \Rightarrow B) = \frac{\text{Number of transactions containing both A and B}}{\text{Number of transactions containing A}} \]
- Confidence tells us the probability of seeing item \(B\) in transactions that already contain item \(A\).
- A higher confidence indicates a stronger association between \(A\) and \(B\).
Example: “coke” appears in 4 transactions, and 1 of those 4 (TRAN: 1) also includes “snack,” so the confidence of the rule “If a customer buys coke, they will buy snack” is: \[ \text{Confidence}(\text{coke} \Rightarrow \text{snack}) = \frac{1}{4} = 0.25 \] This means that 25% of transactions involving “coke” also involve “snack.”
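Confidence only needs two counts: transactions containing \(A\), and transactions containing both \(A\) and \(B\). A minimal sketch, assuming the same set-per-transaction encoding of the table; `support_count` and `confidence` are illustrative names.

```python
# Assumed encoding of the Transaction Data table: one set of items per row.
TRANSACTIONS = [
    {"coke", "snack", "water"},               # TRAN: 1
    {"coke", "green tea", "water"},           # TRAN: 2
    {"coke", "water"},                        # TRAN: 3
    {"milk", "green tea", "snack", "water"},  # TRAN: 4
    {"coke", "milk"},                         # TRAN: 5
    {"milk", "snack", "water"},               # TRAN: 6
]


def support_count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent: count(A and B) / count(A)."""
    n_a = support_count(antecedent, transactions)
    if n_a == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    n_ab = support_count(set(antecedent) | set(consequent), transactions)
    return n_ab / n_a


print(confidence({"coke"}, {"snack"}, TRANSACTIONS))  # 0.25
```

Swapping the arguments gives the confidence of the reverse rule, which generally differs: confidence is not symmetric, because the denominator is the count of the antecedent.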
Question: \(\text{Confidence}(\text{snack} \Rightarrow \text{coke}) =\)
3. Lift
Lift measures how much more likely \(B\) is to be purchased when \(A\) is purchased, compared to the probability of purchasing \(B\) regardless of whether \(A\) is purchased. It is used to assess the strength of the association between \(A\) and \(B\).
Formula: \[ \text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)}= \frac{\text{Support}(A, B)}{\text{Support}(A)\times\text{Support}(B)} \]
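The two forms of the lift formula agree because confidence is itself a ratio of supports. Writing \(N\) for the total number of transactions, and \(n_A\) and \(n_{AB}\) for the number of transactions containing \(A\) and containing both \(A\) and \(B\) (notation introduced here only for this derivation), dividing the numerator and denominator by \(N\) gives:
\[ \text{Confidence}(A \Rightarrow B) = \frac{n_{AB}}{n_A} = \frac{n_{AB}/N}{n_A/N} = \frac{\text{Support}(A, B)}{\text{Support}(A)} \]
\[ \text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)} = \frac{\text{Support}(A, B)}{\text{Support}(A)\times\text{Support}(B)} \]
If \(A\) and \(B\) are independent, \(\text{Support}(A, B) = \text{Support}(A)\times\text{Support}(B)\), which is why a lift of 1 means no association. For example, coke appears in 4 of the 6 transactions, snack in 3, and both together only in TRAN: 1, so \(\text{Lift}(\text{coke} \Rightarrow \text{snack}) = \frac{1/6}{(4/6)\times(3/6)} = 0.5\).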
- Lift > 1: \(A\) increases the likelihood of \(B\) occurring.
- Lift = 1: \(A\) and \(B\) are independent (no association).
- Lift < 1: \(A\) reduces the likelihood of \(B\) occurring.
Example: For the rule “If a customer buys coke, they will buy snack,” the lift is
\[ \text{Lift}(\text{coke} \Rightarrow \text{snack}) = \frac{\text{Confidence}(\text{coke} \Rightarrow \text{snack})}{\text{Support}(\text{snack})} = \frac{0.25}{0.5} = 0.5 \]
A lift of 0.5 indicates that customers who buy coke are less likely to buy snack than customers in general: snack appears in half of all transactions, but in only 1 of the 4 transactions that contain coke. This might mean that customers who buy coke prefer an alternative or have different purchasing habits.
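Because each support is a count divided by the number of transactions \(N\), lift can also be written as \(N \times \text{count}(A \text{ and } B) / (\text{count}(A) \times \text{count}(B))\). The minimal sketch below uses that count form so the arithmetic stays in integers until the final division. It assumes the same set-per-transaction encoding; `count_containing` and `lift` are hypothetical helper names.

```python
# Assumed encoding of the Transaction Data table: one set of items per row.
TRANSACTIONS = [
    {"coke", "snack", "water"},               # TRAN: 1
    {"coke", "green tea", "water"},           # TRAN: 2
    {"coke", "water"},                        # TRAN: 3
    {"milk", "green tea", "snack", "water"},  # TRAN: 4
    {"coke", "milk"},                         # TRAN: 5
    {"milk", "snack", "water"},               # TRAN: 6
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def lift(antecedent, consequent, transactions):
    """Lift of antecedent => consequent, computed as N * n_AB / (n_A * n_B)."""
    n = len(transactions)
    n_a = count_containing(antecedent, transactions)
    n_b = count_containing(consequent, transactions)
    n_ab = count_containing(set(antecedent) | set(consequent), transactions)
    return n * n_ab / (n_a * n_b)


print(lift({"coke"}, {"snack"}, TRANSACTIONS))  # 0.5
```

Unlike confidence, lift is symmetric: swapping antecedent and consequent leaves \(N \times \text{count}(A \text{ and } B) / (\text{count}(A) \times \text{count}(B))\) unchanged.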
Question: \(\text{Lift}(\text{snack} \Rightarrow \text{coke}) =\)
Summary:
- Support: Indicates how often an item or itemset appears in the dataset (relative frequency).
- Confidence: Measures how likely it is that item \(B\) will appear in transactions that contain item \(A\) (conditional probability).
- Lift: Measures the strength of the relationship between \(A\) and \(B\) by comparing the confidence of \(A \Rightarrow B\) with the overall frequency of \(B\), that is, with the value expected if \(A\) and \(B\) were independent (association strength relative to independence).
These three metrics are commonly used together to assess the quality and significance of association rules in data mining.
:::
Association Rules with Orange Data Mining
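Step 1: Select Options and choose Add-ons…, then install the Associate add-on, which provides the Frequent Itemsets and Association Rules widgets.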
Step 2: Open the workflow file “Asso.ows”, and in the File widget choose “Trans.xlsx”.
Open the Frequent Itemsets widget. For each item or itemset below, state whether its support is greater than 49% (TRUE or FALSE):
- coke
- coke and water
- green tea
- milk and water
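To check the Frequent Itemsets results by hand, here is a brute-force sketch that tries every combination of items and keeps those whose support exceeds a threshold. It assumes the same set-per-transaction encoding of the table, and `frequent_itemsets` is a hypothetical name. Trying all combinations is only practical for a handful of items; Orange's widget may use a more efficient algorithm (such as FP-growth), but for this small table the set of frequent itemsets is the same.

```python
from itertools import combinations

# Assumed encoding of the Transaction Data table: one set of items per row.
TRANSACTIONS = [
    {"coke", "snack", "water"},               # TRAN: 1
    {"coke", "green tea", "water"},           # TRAN: 2
    {"coke", "water"},                        # TRAN: 3
    {"milk", "green tea", "snack", "water"},  # TRAN: 4
    {"coke", "milk"},                         # TRAN: 5
    {"milk", "snack", "water"},               # TRAN: 6
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support > min_support.

    Brute force: with 5 items there are only 31 non-empty candidate itemsets.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            hits = sum(1 for basket in transactions if set(combo) <= basket)
            if hits / n > min_support:
                result[frozenset(combo)] = hits / n
    return result


# "More than 49%" from the exercise, as a strict inequality.
for itemset, supp in sorted(frequent_itemsets(TRANSACTIONS, 0.49).items(),
                            key=lambda kv: -kv[1]):
    print(sorted(itemset), round(supp, 4))
```

Itemsets are stored as `frozenset` so they can be used as dictionary keys regardless of item order.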
Open the Association Rules widget. For each rule below, give its lift value and state whether it is greater than 1 (TRUE or FALSE):
- \(\text{Lift}(\text{milk and water}\Rightarrow \text{green tea and snack}) =\)
- \(\text{Lift}(\text{milk}\Rightarrow \text{snack}) =\)
- \(\text{Lift}(\text{green tea and water}\Rightarrow \text{coke}) =\)
- \(\text{Lift}(\text{coke}\Rightarrow \text{snack and water}) =\)
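To cross-check the Association Rules widget, the sketch below computes support, confidence, and lift for rules whose sides may contain several items. It assumes the same set-per-transaction encoding of the table; `count_containing`, `rule_metrics`, and `RULES` are illustrative names. Orange may display these values rounded, so compare them approximately.

```python
# Assumed encoding of the Transaction Data table: one set of items per row.
TRANSACTIONS = [
    {"coke", "snack", "water"},               # TRAN: 1
    {"coke", "green tea", "water"},           # TRAN: 2
    {"coke", "water"},                        # TRAN: 3
    {"milk", "green tea", "snack", "water"},  # TRAN: 4
    {"coke", "milk"},                         # TRAN: 5
    {"milk", "snack", "water"},               # TRAN: 6
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) of the rule antecedent => consequent."""
    n = len(transactions)
    n_a = count_containing(antecedent, transactions)
    n_b = count_containing(consequent, transactions)
    n_ab = count_containing(antecedent | consequent, transactions)
    support = n_ab / n
    confidence = n_ab / n_a
    lift = n * n_ab / (n_a * n_b)  # equals confidence / Support(B)
    return support, confidence, lift


# The four rules from the exercise, as (antecedent, consequent) pairs.
RULES = [
    ({"milk", "water"}, {"green tea", "snack"}),
    ({"milk"}, {"snack"}),
    ({"green tea", "water"}, {"coke"}),
    ({"coke"}, {"snack", "water"}),
]

for a, b in RULES:
    s, c, l = rule_metrics(a, b, TRANSACTIONS)
    print(f"{sorted(a)} => {sorted(b)}: support={s:.3f}, "
          f"confidence={c:.3f}, lift={l:.3f}, lift > 1: {l > 1}")
```

A multi-item consequent such as “snack and water” is treated as a single itemset, so its support counts only transactions that contain both items.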