[PDF] Discovering Significant Patterns | Semantic Scholar (2024)

Topics

Pattern Discovery Technique, Spurious Patterns

223 Citations

Layered critical values: a powerful direct-adjustment approach to discovering significant patterns
    Geoffrey I. Webb

    Computer Science, Mathematics

    Machine Learning

  • 2008

The assignment of different critical values to different areas of the search space as an approach to alleviating this problem is investigated, using a variant of a technique originally developed for other purposes.

  • 60
  • Highly Influenced
  • PDF
A tutorial on statistically sound pattern discovery
    W. Hämäläinen, Geoffrey I. Webb

    Computer Science, Mathematics

    Data Mining and Knowledge Discovery

  • 2018

This tutorial introduces the key statistical and data mining theory and techniques that underpin statistically sound pattern discovery research or practice and clarifies alternative interpretations of statistical dependence and introduces appropriate tests for evaluating statistical significance of patterns in different situations.

  • 38
  • Highly Influenced
  • PDF
Finding the real patterns
    Geoffrey I. Webb

    Computer Science, Mathematics

    PAKDD

  • 2007

The problem of false discoveries is discussed, and techniques for avoiding them are presented.

  • PDF
Permutation Strategies for Mining Significant Sequential Patterns
    Andrea Tonon, Fabio Vandin

    Computer Science, Mathematics

    2019 IEEE International Conference on Data Mining…

  • 2019

The results of the experimental evaluation show that PROMISE is an efficient method that allows the discovery of statistically significant sequential patterns from transactional datasets while properly controlling for false discoveries.

  • 16
Significant Pattern Mining on Continuous Variables
    M. Sugiyama, K. Borgwardt

    Computer Science, Mathematics

    ArXiv

  • 2017

This work solves the open problem of significant pattern mining on continuous variables by using Spearman's rank correlation coefficient to represent the frequency of a pattern and detects true patterns with higher precision and recall than competing methods that require a prior binarization of the data.

Statistically Significant Pattern Mining With Ordinal Utility
    Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma

    Computer Science

    IEEE Transactions on Knowledge and Data…

  • 2023

This work proposes an iterative multiple testing procedure that can alternately reject a hypothesis and safely ignore hypotheses that are less useful than the rejected one, and shows that the proposed method always discovers equally or more useful patterns than Tarone-Bonferroni and Subfamily-wise Multiple Testing (SMT).

Constrained independence for detecting interesting patterns
    T. Delacroix, Ahcène Boubekki, P. Lenca, S. Lallich

    Computer Science

    2015 IEEE International Conference on Data…

  • 2015

This work defines constrained independence, a generalization of the notion of independence, to describe probabilistic models for evaluating redundancy in frequent itemset mining, and provides algorithms, integrated within the mining process, for determining non-redundant itemsets.

  • 2
A statistical significance testing approach to mining the most informative set of patterns
    Jefrey Lijffijt, P. Papapetrou, K. Puolamäki

    Computer Science

    Data Mining and Knowledge Discovery

  • 2012

The novel problem of finding the smallest set of patterns that explains most about the data in terms of a global p value is studied and it is found that a greedy algorithm gives good results on real data and that it can formulate and solve many known data-mining tasks.

  • 35
  • Highly Influenced
  • PDF
Assessing the statistical significance of association rules
    W. Hämäläinen

    Computer Science, Mathematics

    ArXiv

  • 2014

This paper inspects the most common measure functions (frequency, confidence, degree of dependence, χ², correlation coefficient, and J-measure) and redundancy reduction techniques, and gives new theoretical results which can be used to guide the search for statistically significant association rules.

Efficient mining of the most significant patterns with permutation testing
    Leonardo Pellegrina, Fabio Vandin

    Computer Science, Mathematics

    Data Mining and Knowledge Discovery

  • 2020

This work develops TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output, and provides theoretical evidence of its effectiveness.

  • 37
  • PDF

48 References

Discovering Predictive Association Rules
    N. Megiddo, R. Srikant

    Computer Science, Mathematics

    KDD

  • 1998

Empirical evaluation shows that on typical datasets the fraction of rules that may be false discoveries is very small, and a novel approach is presented for estimating the number of "false discoveries" at any cutoff level.

  • 121
  • Highly Influential
  • PDF
Discovering significant rules
    Geoffrey I. Webb

    Computer Science

    KDD '06

  • 2006

Generic techniques that allow definitions of true and false discoveries to be specified in terms of arbitrary statistical hypothesis tests and which provide strict control over the experiment wise risk of false discoveries are presented.

  • 105
  • Highly Influential
  • PDF
Pruning and summarizing the discovered associations
    B. Liu, W. Hsu, Y. Ma

    Computer Science

    KDD '99

  • 1999

The technique first prunes the discovered associations to remove those insignificant associations, and then finds a special subset of the unpruned associations to form a summary of the discovered association rules, which are then called the direction setting rules.

  • 485
  • PDF
Finding the Most Interesting Patterns in a Database Quickly by Using Sequential Sampling
    T. Scheffer, S. Wrobel

    Computer Science

    J. Mach. Learn. Res.

  • 2002

A sampling algorithm is presented that solves this problem by issuing a small number of database queries while guaranteeing precise bounds on the confidence and quality of solutions, and it is proved that there is no sampling algorithm for a popular class of utility functions that cannot be estimated with bounded error.

  • 87
  • PDF
A Statistical Theory for Quantitative Association Rules
    Y. Aumann, Yehuda Lindell

    Computer Science, Mathematics

    KDD '99

  • 1999

A new definition of quantitative association rules based on statistical inference theory is introduced, reflecting the intuition that the goal of association rules is to find extraordinary and therefore interesting phenomena in databases.

  • 297
  • PDF
Efficient mining of emerging patterns: discovering trends and differences
    Guozhu Dong, Jinyan Li

    Computer Science

    KDD '99

  • 1999

It is believed that EPs with low to medium support, such as 1%-20%, can give useful new insights and guidance to experts, in even “well understood” applications.

  • 1,160
  • PDF
Frequent subgraph discovery
    Michihiro Kuramochi, G. Karypis

    Computer Science, Chemistry

    Proceedings 2001 IEEE International Conference on…

  • 2001

The empirical results show that the algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.

  • 1,239
  • PDF
Finding association rules that trade support optimally against confidence
    T. Scheffer

    Computer Science

    Intell. Data Anal.

  • 2005

This work presents a fast algorithm that finds the n best rules which maximize the resulting criterion, and dynamically prunes redundant rules and parts of the hypothesis space that cannot contain better solutions than the best ones found so far.

  • 251
  • PDF
Beyond market baskets: generalizing association rules to correlations
    Sergey Brin, R. Motwani, Craig Silverstein

    Computer Science, Business

    SIGMOD '97

  • 1997

This work develops the notion of mining rules that identify correlations (generalizing associations), and proposes measuring significance of associations via the chi-squared test for correlation from classical statistics, enabling the mining problem to reduce to the search for a border between correlated and uncorrelated itemsets in the lattice.

  • 1,617
  • PDF
Constraint-Based Rule Mining in Large, Dense Databases
    R. Bayardo, R. Agrawal, D. Gunopulos

    Computer Science

    Proceedings 15th International Conference on Data…

  • 1999

A new algorithm that directly exploits all user-specified constraints including minimum support, minimum confidence, and a new constraint that ensures every mined rule offers a predictive advantage over any of its simplifications is described.

  • 665
  • Highly Influential
  • PDF

