Topics
Pattern Discovery Technique, Spurious Patterns
223 Citations
- Geoffrey I. Webb
- 2008
Computer Science, Mathematics
Machine Learning
The assignment of different critical values to different areas of the search space as an approach to alleviating this problem is investigated, using a variant of a technique originally developed for other purposes.
- 60
- Highly Influenced
- PDF
- W. Hämäläinen, Geoffrey I. Webb
- 2018
Computer Science, Mathematics
Data Mining and Knowledge Discovery
This tutorial introduces the key statistical and data mining theory and techniques that underpin statistically sound pattern discovery research and practice, clarifies alternative interpretations of statistical dependence, and introduces appropriate tests for evaluating the statistical significance of patterns in different situations.
- 38
- Highly Influenced
- PDF
- Geoffrey I. Webb
- 2007
Computer Science, Mathematics
PAKDD
The problem of false discoveries is discussed, and techniques for avoiding them are presented.
- PDF
- Andrea Tonon, Fabio Vandin
- 2019
Computer Science, Mathematics
2019 IEEE International Conference on Data Mining…
The results of the experimental evaluation show that PROMISE is an efficient method that allows the discovery of statistically significant sequential patterns from transactional datasets while properly controlling for false discoveries.
- 16
- M. Sugiyama, K. Borgwardt
- 2017
Computer Science, Mathematics
ArXiv
This work solves the open problem of significant pattern mining on continuous variables by using Spearman's rank correlation coefficient to represent the frequency of a pattern and detects true patterns with higher precision and recall than competing methods that require a prior binarization of the data.
- Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma
- 2023
Computer Science
IEEE Transactions on Knowledge and Data…
This work proposes an iterative multiple testing procedure that alternately rejects a hypothesis and safely ignores hypotheses less useful than the rejected one, and shows that the proposed method always discovers patterns that are at least as useful as those found by Tarone-Bonferroni and Subfamily-wise Multiple Testing (SMT).
- T. Delacroix, Ahcène Boubekki, P. Lenca, S. Lallich
- 2015
Computer Science
2015 IEEE International Conference on Data…
This work defines constrained independence, a generalization of the notion of independence, to describe probabilistic models for evaluating redundancy in frequent itemset mining, and provides algorithms, integrated within the mining process, for determining non-redundant itemsets.
- 2
- Jefrey Lijffijt, P. Papapetrou, K. Puolamäki
- 2012
Computer Science
Data Mining and Knowledge Discovery
The novel problem of finding the smallest set of patterns that explains the most about the data in terms of a global p-value is studied; a greedy algorithm is found to give good results on real data, and the approach can be used to formulate and solve many known data-mining tasks.
- 35
- Highly Influenced
- PDF
- W. Hämäläinen
- 2014
Computer Science, Mathematics
ArXiv
This paper inspects the most common measure functions (frequency, confidence, degree of dependence, χ², correlation coefficient, and J-measure) and redundancy reduction techniques, and gives new theoretical results which can be used to guide the search for statistically significant association rules.
- 1
- Highly Influenced
- PDF
- Leonardo Pellegrina, Fabio Vandin
- 2020
Computer Science, Mathematics
Data Mining and Knowledge Discovery
This work develops TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output, and provides theoretical evidence of its effectiveness.
- 37
- PDF
48 References
- N. Megiddo, R. Srikant
- 1998
Computer Science, Mathematics
KDD
Empirical evaluation shows that on typical datasets the fraction of rules that may be false discoveries is very small, and a novel approach is presented for estimating the number of "false discoveries" at any cutoff level.
- 121
- Highly Influential
- PDF
- Geoffrey I. Webb
- 2006
Computer Science
KDD '06
Generic techniques that allow definitions of true and false discoveries to be specified in terms of arbitrary statistical hypothesis tests and which provide strict control over the experiment wise risk of false discoveries are presented.
- 105
- Highly Influential
- PDF
- B. Liu, W. Hsu, Y. Ma
- 1999
Computer Science
KDD '99
The technique first prunes the discovered associations to remove insignificant ones, and then finds a special subset of the unpruned associations, called the direction setting rules, to form a summary of the discovered association rules.
- 485
- PDF
- T. Scheffer, S. Wrobel
- 2002
Computer Science
J. Mach. Learn. Res.
A sampling algorithm is presented that solves this problem by issuing a small number of database queries while guaranteeing precise bounds on the confidence and quality of solutions; it is also proved that there is no sampling algorithm for a popular class of utility functions that cannot be estimated with bounded error.
- 87
- PDF
- Y. Aumann, Yehuda Lindell
- 1999
Computer Science, Mathematics
KDD '99
A new definition of quantitative association rules based on statistical inference theory is introduced, reflecting the intuition that the goal of association rules is to find extraordinary and therefore interesting phenomena in databases.
- 297
- PDF
- Guozhu Dong, Jinyan Li
- 1999
Computer Science
KDD '99
It is believed that emerging patterns (EPs) with low to medium support, such as 1%-20%, can give useful new insights and guidance to experts, even in “well understood” applications.
- 1,160
- PDF
- Michihiro Kuramochi, G. Karypis
- 2001
Computer Science, Chemistry
Proceedings 2001 IEEE International Conference on…
The empirical results show that the algorithm scales linearly with the number of input transactions and discovers frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems, such as canonical labeling of graphs and subgraph isomorphism, that do not arise in traditional frequent itemset discovery.
- 1,239
- PDF
- T. Scheffer
- 2005
Computer Science
Intell. Data Anal.
This work presents a fast algorithm that finds the n best rules which maximize the resulting criterion, and dynamically prunes redundant rules and parts of the hypothesis space that cannot contain better solutions than the best ones found so far.
- 251
- PDF
- Sergey Brin, R. Motwani, Craig Silverstein
- 1997
Computer Science, Business
SIGMOD '97
This work develops the notion of mining rules that identify correlations (generalizing associations), and proposes measuring significance of associations via the chi-squared test for correlation from classical statistics, enabling the mining problem to reduce to the search for a border between correlated and uncorrelated itemsets in the lattice.
- 1,617
- PDF
- R. Bayardo, R. Agrawal, D. Gunopulos
- 1999
Computer Science
Proceedings 15th International Conference on Data…
A new algorithm that directly exploits all user-specified constraints including minimum support, minimum confidence, and a new constraint that ensures every mined rule offers a predictive advantage over any of its simplifications is described.
- 665
- Highly Influential
- PDF