Atsushi Takayama · Jan 20, 2019
## 1.1: What Is Pattern Discovery? Why Is It Important?
Patterns:
- a set of items, subsequences, or substructures that occur frequently together (or strongly correlated) in a data set
- represent intrinsic and important properties of datasets
Pattern discovery: Uncovering patterns from massive data sets
## 1.2: Frequent Patterns and Association Rules
(absolute) support (count): the number of transactions in which the itemset appears
minsup: minimum support threshold; if an itemset’s (relative) support meets or exceeds minsup, the itemset is frequent
Association rules: X → Y (s, c)
Support s: the probability that a transaction contains X ∪ Y
Confidence c: the conditional probability that a transaction containing X also contains Y
c = sup(X ∪ Y) / sup(X)
Association rule mining: find all of the rules, X → Y
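A minimal Python sketch (the toy transactions are assumed for illustration, echoing the lecture’s beer/diaper example) of computing s and c for one candidate rule:

```python
# Assumed toy transaction database.
transactions = [
    {"beer", "nuts", "diaper"},
    {"beer", "coffee", "diaper"},
    {"beer", "diaper", "eggs"},
    {"nuts", "eggs", "milk"},
    {"nuts", "coffee", "diaper", "eggs", "milk"},
]

def support(itemset, db):
    """Relative support: fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in db) / len(db)

X, Y = {"beer"}, {"diaper"}
s = support(X | Y, transactions)   # probability a transaction contains X and Y
c = s / support(X, transactions)   # c = sup(X ∪ Y) / sup(X)
print(f"{X} -> {Y} (s={s:.2f}, c={c:.2f})")  # s=0.60, c=1.00
```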
## 1.3: Compressed Representation: Closed Patterns and Max-Patterns
A long pattern contains a combinatorial number of sub-patterns, so there are far too many frequent patterns for any computer to compute or store them all!
Closed pattern (close pattern, closed frequent pattern): A pattern (itemset) X is closed if X is frequent and there exists no super-pattern Y ⊃ X, with the same support as X
Max-pattern (max frequent pattern): A pattern X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X
A closed pattern gives lossless compression (the support of every frequent pattern can be recovered from the closed patterns), while a max-pattern gives lossy compression: we only know whether a pattern is frequent, not its exact support.
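A minimal sketch, assuming a toy table of precomputed frequent itemsets with their support counts, showing how closed and max patterns are derived:

```python
# Assumed toy input: all frequent itemsets with their support counts.
frequent = {
    frozenset("a"): 4,
    frozenset("b"): 3,
    frozenset("ab"): 3,
    frozenset("abc"): 2,
}

def is_closed(x, freq):
    # Closed: no proper super-pattern has the same support.
    return not any(x < y and freq[y] == freq[x] for y in freq)

def is_max(x, freq):
    # Max: no proper super-pattern is frequent at all.
    return not any(x < y for y in freq)

closed = [sorted(x) for x in frequent if is_closed(x, frequent)]
maxpat = [sorted(x) for x in frequent if is_max(x, frequent)]
print("closed:", closed)  # ['a'], ['a','b'], ['a','b','c'] (supports recoverable)
print("max:", maxpat)     # ['a','b','c'] only (sub-pattern supports are lost)
```

Note that sup(b) is recoverable from the closed patterns (it equals the largest support among b’s closed supersets, here sup(ab) = 3), but not from the max-pattern alone.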
## 2.1: The Downward Closure Property of Frequent Patterns
The downward closure (also called “Apriori”) property of frequent patterns
Any subset of a frequent itemset must be frequent. (if {beer, diaper, nuts} is frequent, so is {beer, diaper})
Apriori pruning principle: if any subset of an itemset S is infrequent, then there is no chance for S to be frequent.
## 2.2: The Apriori Algorithm
level-wise, candidate generation and test
Initially, scan DB once to get frequent 1-itemset
Repeat
- Generate length-(k+1) candidate itemsets from length-k frequent itemsets
- Test the candidates against DB to find frequent (k+1)-itemsets
- Set k := k+1
Until no frequent or candidate set can be generated
Return all the frequent itemsets derived
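A minimal Python sketch of this level-wise loop (toy data, no optimizations):

```python
from itertools import combinations

def apriori(db, minsup):
    """db: list of transactions (sets); minsup: absolute support count."""
    items = {i for t in db for i in t}
    # Scan DB once to get frequent 1-itemsets.
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in db) >= minsup}
    all_freq, k = set(freq), 1
    while freq:
        # Generate length-(k+1) candidates from length-k frequent itemsets.
        cands = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # Apriori pruning: every length-k subset of a candidate must be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k))}
        # Test the surviving candidates against the DB.
        freq = {c for c in cands if sum(c <= t for t in db) >= minsup}
        all_freq |= freq
        k += 1
    return all_freq

db = [{"beer", "nuts", "diaper"}, {"beer", "diaper"}, {"nuts", "diaper"}]
print(apriori(db, minsup=2))  # includes frozenset({'beer', 'diaper'})
```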
## 2.3: Extensions or Improvements of Apriori
Scanning the entire database repeatedly is costly. Partition it into k sub-databases: any globally frequent pattern must be frequent in at least one of the sub-databases (otherwise its total count could not reach minsup).
How to partition the database? So that each sub-database fits into main memory; then each partition needs only one scan.
## 2.4: Mining Frequent Patterns by Exploring Vertical Data Format
ECLAT: Equivalence Class Transformation
- 10: a, b, d, e
- 20: a, b, e
- 30: b, c, e
Transform this into
- a: 10, 20
- b: 10, 20, 30
- c: 30
- d: 10
- e: 10, 20, 30
Then the TID-list of “ae” is the intersection of the TID-lists of “a” and “e”: {10, 20}, so sup(ae) = 2.
Using diffsets (storing the difference between an itemset’s TID-list and its prefix’s, rather than the full lists) accelerates mining.
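A minimal sketch of the vertical-format idea, using the corrected TID-lists above: the support of any itemset is the size of the intersection of its items’ TID-lists.

```python
# TID-lists from the example above.
tidlists = {
    "a": {10, 20},
    "b": {10, 20, 30},
    "c": {30},
    "d": {10},
    "e": {10, 20, 30},
}

def tids(itemset):
    """TID-list of an itemset = intersection of its items' TID-lists."""
    return set.intersection(*(tidlists[i] for i in itemset))

print(tids("ae"))        # {10, 20}, so sup(ae) = 2
print(len(tids("abe")))  # 2
```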
## 2.5: FPGrowth: A Pattern Growth Approach
FPGrowth: frequent pattern growth
- Find frequent single items and partition the database based on each such item
- Recursively grow frequent patterns by doing the above for each partitioned database (also called conditional database)
Use data structure FP-tree
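A minimal sketch of the pattern-growth recursion, using explicit conditional (projected) databases instead of an actual FP-tree; the FP-tree is a compressed representation of exactly these projections, but the recursion is the same.

```python
from collections import Counter

def pattern_growth(db, minsup, suffix=frozenset()):
    """db: list of transactions (sets); yields (frequent itemset, count)."""
    counts = Counter(i for t in db for i in t)
    for item, cnt in counts.items():
        if cnt < minsup:
            continue
        pattern = suffix | {item}
        yield pattern, cnt
        # Conditional database: transactions containing `item`, with `item`
        # removed; keep only items ordered before `item` to avoid duplicates.
        cond_db = [{i for i in t if i < item} for t in db if item in t]
        yield from pattern_growth(cond_db, minsup, pattern)

db = [{"f", "a", "c", "m", "p"}, {"f", "a", "c", "b", "m"}, {"f", "b"}]
for pat, cnt in pattern_growth(db, minsup=2):
    print(sorted(pat), cnt)
```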
## 2.6: Mining Closed Patterns
CLOSET+: Mining Closed Itemsets by Pattern-Growth
## Programming Assignment 1
## 3.1: Limitation of the Support-Confidence Framework
Not all patterns mined are interesting. Interestingness measure:
- Objective: support, confidence, correlation
- Subjective (may vary from person to person): query-based, unexpected, freshness, timeliness, multi-dimensional visualization tools
## 3.2: Interestingness Measures: Lift and χ2
Support & confidence are not good enough as interesting measures, what else?: Lift and χ2
Null transactions: Transactions that contain neither B nor C
Too many null transactions causes a problem with Lift and χ2.
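A small worked example (toy counts assumed) of why Lift is not null-invariant:

```python
def lift(n_bc, n_b, n_c, n_total):
    """Lift(B, C) = P(B, C) / (P(B) * P(C)), computed from raw counts."""
    return (n_bc / n_total) / ((n_b / n_total) * (n_c / n_total))

# 100 transactions: B and C co-occur exactly as often as chance predicts.
print(lift(n_bc=10, n_b=20, n_c=50, n_total=100))   # 1.0 (independent)
# Same B/C counts, but 900 null transactions added: Lift jumps to 10.
print(lift(n_bc=10, n_b=20, n_c=50, n_total=1000))  # 10.0 (looks correlated!)
```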
## 3.3: Null Invariance Measures
## 3.4: Comparison of Null-Invariant Measures
## 4.1: Mining Multi-Level Associations
Items often form hierarchies
eg. Level 1: Milk, Level 2: 2% milk & skim milk
Shared multi-level mining: use the lowest min-support to pass down the set of candidates (mine the database in one scan)
Redundancy filtering: remove redundant rules that can be derived from higher level rules
Customized min-supports: set different min-supports for different kinds of items
## 4.2: Mining Multi-Dimensional Associations
## 4.3: Mining Quantitative Associations
## 4.4: Mining Negative Correlations
Rare patterns: very low support but interesting → handle by setting customized min-supports
Negative patterns (negative correlated patterns): unlikely to happen together
sup(A ∪ B) << sup(A) × sup(B)
Problem: the support-based definition is not null-invariant
A Kulczynski measure-based definition
- If itemsets A and B are frequent but (P(A|B)+P(B|A))/2<ε, where ε is a negative pattern threshold, then A and B are negatively correlated
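A minimal sketch of this check (the ε value is assumed for illustration); note that the total transaction count never appears, which is why Kulczynski is null-invariant:

```python
def kulczynski(n_ab, n_a, n_b):
    """Kulc(A, B) = (P(A|B) + P(B|A)) / 2, computed from raw counts."""
    return (n_ab / n_b + n_ab / n_a) / 2

# A and B are each frequent (100 occurrences) but almost never co-occur.
eps = 0.1  # assumed negative-pattern threshold
print(kulczynski(n_ab=2, n_a=100, n_b=100) < eps)  # True: negatively correlated
```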
## 4.5: Mining Compressed Patterns
Want to find a small set of typical patterns that compress the full result. In the lecture’s example (patterns P1–P5):
- P1 and P2 share similar items and have similar support, so list only P2, which has more items
- the same holds for P4 and P5
## 5.1: Sequential Pattern and Sequential Pattern Mining
Sequential patterns: shopping sequences, medical treatments, natural disasters, weblog click streams, programming execution sequences, DNA, protein, etc.
Transaction DB, sequence DB vs time-series DB
Gapped vs non-gapped
Within a basket (one element of the sequence), order doesn’t matter.
eg. Given support threshold min_sup = 2, <(ab)c> is a sequential pattern in the lecture’s example database
(a and b occur together in one element, then c follows)
Algorithm requirement: efficient, scalable, finding complete set, incorporating various kinds of user-specific constraints
The Apriori property still holds: if a subsequence s_1 is infrequent, none of s_1’s super-sequences can be frequent
## 5.2: GSP: Apriori-Based Sequential Pattern Mining
- Scan for length-1 sequences, then prune those whose support is below min_sup.
- Combine the frequent length-1 sequences to form candidate sequences of length 2.
- Scan for length-2 sequences, then prune those whose support is below min_sup.
- Combine the frequent length-2 sequences to form candidate sequences of length 3.
- and so on
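A minimal sketch of this loop for sequences of single items (real GSP also handles itemset elements and time constraints, which are omitted here):

```python
def is_subseq(s, seq):
    """True if s is a (gapped) subsequence of seq."""
    it = iter(seq)
    return all(x in it for x in s)

def gsp(db, minsup):
    """db: list of sequences (tuples of items); minsup: absolute count."""
    items = {i for seq in db for i in seq}
    freq = [(i,) for i in items if sum(i in seq for seq in db) >= minsup]
    result = list(freq)
    while freq:
        # Join step: s and t overlap on all but their first/last item.
        cands = {s + (t[-1],) for s in freq for t in freq if s[1:] == t[:-1]}
        # Count step: scan the DB and keep candidates meeting min_sup.
        freq = [c for c in cands
                if sum(is_subseq(c, seq) for seq in db) >= minsup]
        result += freq
    return result

db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d")]
print(gsp(db, minsup=2))  # includes ('a',), ('a', 'c'), ('a', 'b', 'c'), ...
```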
## 5.3: SPADE — Sequential Pattern Mining in Vertical Data Format
SPADE: Sequential PAttern Discovery using Equivalence classes
## 5.4: PrefixSpan — Sequential Pattern Mining by Pattern-Growth
## 5.5: CloSpan — Mining Closed Sequential Patterns
A closed sequential pattern s: there exists no super-pattern s’ ⊃ s such that s’ and s have the same support (in other words, s is the longest sequence at that support level).
Property P_1: if s ⊃ s_1 and the projected databases of s and s_1 have the same size, the projections are identical, so s and s_1 have the same support and the redundant branch of the search can be pruned.
## 6.1: Mining Spatial Associations
Spatial frequent patterns and association rule: A → B [s%, c%]
- s%: support
- c%: confidence
A and B:
- Topological relations: intersects, overlaps, disjoint, etc.
- Spatial orientations: left_of, west_of, under, etc.
- Distance information: close_to, within_distance, etc.
Example: If x is a large town and x intersects with a highway, then x is likely to be adjacent to water (lakes, rivers, etc.) with certain support and confidence.
is_a(x, large_town) ^ intersect(x, highway) → adjacent_to(x, water) [7%, 85%]
Hierarchy of spatial relationship: eg. close_to is a generalization of near_by, touch, intersect, contain
Progressive refinement: first search for rough relationship, then refine it
To save mining cost:
- Step 1: use MBRs (Minimum Bounding Rectangles) or an R-tree for a rough, cheap estimate
- Step 2: apply a more detailed (and expensive) algorithm only to the objects that passed Step 1
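A minimal sketch of the Step 1 rough filter, assuming axis-aligned MBRs stored as (xmin, ymin, xmax, ymax) tuples:

```python
def mbr_overlap(a, b):
    """Cheap test: do two minimum bounding rectangles intersect?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

town = (0, 0, 10, 10)
highway = (8, -5, 9, 50)
lake = (30, 30, 40, 40)
print(mbr_overlap(town, highway))  # True: pass on to the exact Step 2 test
print(mbr_overlap(town, lake))     # False: pruned cheaply, no exact test needed
```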
## 6.2: Mining Spatial Colocation Patterns
A, B, C, D: eg. regions where West Nile Virus is spreading, mosquito regions, cities, locations where birds are present, etc.
Edges: co-location (eg. in the lecture’s example graph, instances 6, 17 and 3 are co-located)
Rowset: list of places where particular features are co-located
cp: conditional probability
## 6.3: Mining and Aggregating Patterns over Multiple Trajectories
Time-annotated symbolic sequence
A flock pattern requires at least m entities moving together for at least k consecutive timestamps within a circular region of radius r. → A bit too rigid…
Convoy pattern is density-based, so it’s looser than the flock pattern.
Swarm pattern is even more relaxed.
## 6.4: Mining Semantics-Rich Movement Patterns
Frequent Movement Pattern focuses on the semantics of places, as opposed to Frequent Sequential Pattern that focuses on spatial pattern.
eg. Home→Starbucks→Office (the exact location does not matter)
## 6.5: Mining Periodic Movement Patterns
Periodic Movement Patterns in Sparse Data: eg. a bird periodically returning to its nest
Find reference spots with density-based method, then detect periods using Fourier Transform and auto-correlation.
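A minimal sketch of the period-detection step on synthetic data, using autocorrelation with numpy (the lecture pairs this with the Fourier transform; only autocorrelation is shown):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(365)
signal = ((t % 7) < 2).astype(float)        # at the reference spot 2 days/week
signal += rng.normal(0, 0.1, size=t.size)   # observation noise

x = signal - signal.mean()
acf = np.correlate(x, x, mode="full")[x.size:]  # autocorrelation at lags 1..364
period = int(np.argmax(acf[:30])) + 1           # strongest lag under 30 days
print(period)  # 7
```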
## 7.1: From Frequent Pattern Mining to Phrase Mining
Phrase: A natural, meaningful, unambiguous semantic unit (unigrams are often ambiguous)
General principle: Exploit information redundancy and data-driven criteria to determine phrase boundaries and salience
Methodology: Exploring three ideas
- Frequent pattern mining and colocation analysis
- Phrasal segmentation
- Quality phrase assessment
## 7.2: Previous Phrase Mining Methods
Reducing annotation cost: use Web n-grams instead of manual annotation; state of the art ≈ 95% accuracy, 88% phrase-level F-score
Topic modeling: each topic is represented by a word distribution
Three strategies on phrase mining with topic modeling:
- Strategy 1: Generate bag-of-words → generate sequence of tokens (Bigram Topic Model, Topical N-Gram (TNG), Phrase-Discovering LDA (PDLDA)); high model complexity (tends to overfit) and high inference cost
- Strategy 2: Post bag-of-words model inference, visualize topics with n-grams (TurboTopics, KERT)
- Strategy 3: Prior to bag-of-words model inference, mine phrases and impose them on the bag-of-words model
## 7.3: ToPMine: Phrase Mining without Training Data
The problem with Strategy 2 (topic modeling → phrase mining) is that the words of one phrase may be assigned to different topics, splitting the phrase
Solution: Switch order!
- Phrase mining, document segmentation, and phrase ranking
- Topic model inference with phrase constraint
Collocation mining: any standard collocation significance measure will work.
Phrase candidate generation:
eg. merge adjacent tokens whose collocation significance exceeds a threshold α = 5 (roughly 5 standard deviations above what chance would predict)
## 7.4: SegPhrase: Phrase Mining with Tiny Training Sets
ToPMine: no training data required
SegPhrase: a small set of training data may enhance the quality of phrase mining significantly
Classification → Phrasal Segmentation (SegPhrase)
Classification → Phrasal Segmentation → Classification → Phrasal Segmentation (SegPhrase+)
## Programming Assignment 2
## 8.1: Frequent Pattern Mining in Data Streams
Stream data mining tasks
- Pattern mining in data streams (OUR INTEREST HERE!)
- Multi-dimensional on-line analysis of streams
- Clustering data streams
- Classification of stream data
- Mining outliers and anomalies in stream data
Unrealistic to mine precise patterns. Approximate answers are often sufficient. How to mine frequent patterns with good approximation?
Lossy Counting algorithm: do not keep items with very low support count
- Advantage: guaranteed error bound
- Disadvantage: may still need to keep a large set of counters (traces)
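A minimal sketch of Lossy Counting, with bucket width w = ⌈1/ε⌉; every kept count underestimates the true count by at most εN:

```python
from math import ceil

def lossy_count(stream, epsilon):
    w = ceil(1 / epsilon)      # bucket width
    counts, deltas = {}, {}    # item -> count, maximum possible undercount
    for n, item in enumerate(stream, 1):
        bucket = ceil(n / w)
        if item in counts:
            counts[item] += 1
        else:
            counts[item], deltas[item] = 1, bucket - 1
        if n % w == 0:  # at each bucket boundary, drop low-count traces
            for i in [i for i in counts if counts[i] + deltas[i] <= bucket]:
                del counts[i], deltas[i]
    # counts[i] <= true count of i <= counts[i] + deltas[i] <= counts[i] + eps*n
    return counts

stream = ["a", "b", "a", "c", "a", "a", "b", "d"] * 50
print(lossy_count(stream, epsilon=0.05))  # 'a' keeps the largest count
```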
## 8.2: Pattern Discovery for Software Bug Mining
CP-Miner:
- Tokenize code
- Compute hash of the token sequence of each line to construct a sequence DB
- Find similar patterns (allowing some gaps due to insertions) in the sequence DB
- Find “forgot-to-change” tokens
## 8.3: Pattern Discovery for Image Analysis
## 8.4: Advanced Topics on Pattern Discovery: Pattern Mining and Society — Privacy Issue
Three categories of privacy issues arising out of data mining
- Input privacy (or data hiding): hide data to prevent miners from reliably extracting confidential or private information
- Output privacy (or knowledge hiding): no disclosure of sensitive patterns or knowledge from datasets
- Owner privacy: does not allow any party to reliably learn the data or sensitive information that other owners hold
Ensuring Input Privacy
K-anonymity privacy requirement and subsequent studies
- k-anonymity: each equivalence class contains at least k records
- It is still not sufficient, thus leads to further studies, such as l-diversity, t-closeness, and differential privacy
Statistical distortion: Using randomization algorithms
- Independent attribute perturbation
- Dependent attribute perturbation: take care of correlations across attributes
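A minimal sketch of independent attribute perturbation on synthetic data (the noise scale is chosen arbitrarily for illustration): noise is added to each attribute independently, so aggregate statistics survive while individual records are masked.

```python
import numpy as np

rng = np.random.default_rng(1)
ages = rng.integers(20, 60, size=10_000).astype(float)  # sensitive attribute
released = ages + rng.normal(0, 15, size=ages.size)     # perturbed copy

# Aggregates stay close, so miners can still estimate distributions...
print(round(ages.mean(), 2), round(released.mean(), 2))
# ...but any single released value says little about that person's true age.
```

Dependent attribute perturbation would instead draw the noise jointly (eg. preserving the covariance across attributes) so that cross-attribute correlations are not destroyed.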