Abstract: In this paper we present a technique to derive rules describing contrast sets. Contrast sets are a formalism for representing group differences. We propose a novel approach to describe directional contrasts using rules in which the contrasting effect is partitioned into pairs of groups. Our approach uses a directional Fisher Exact Test to find significant differences across groups. We use a Bonferroni within-search adjustment to control type I errors and a pruning technique to prevent the derivation of non-significant contrast set specializations.
Keywords: Contrast Sets, association rules, Fisher exact test, Bonferroni adjustment
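To make the statistical step concrete, here is a minimal sketch of a one-sided (directional) Fisher exact test on a 2x2 table, followed by a plain Bonferroni check. The function names, the table layout (rows are the two groups, columns are pattern present/absent), and the use of a single flat Bonferroni divisor are assumptions made for illustration; the within-search adjustment described in the abstract may differ.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical): a = group 1 with the pattern,
    b = group 1 without it, c = group 2 with it, d = group 2 without it.
    Returns P(X >= a) under the hypergeometric null, i.e. evidence that
    the pattern is more frequent in group 1 than in group 2.
    """
    row1 = a + b          # size of group 1
    col1 = a + c          # total number of cases with the pattern
    n = a + b + c + d     # total number of cases
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def is_significant(p_value, alpha, n_tests):
    """Plain Bonferroni check (an assumption, not the paper's exact scheme)."""
    return p_value < alpha / n_tests


p = fisher_exact_greater(3, 1, 1, 3)
print(p, is_significant(p, alpha=0.05, n_tests=10))
```

Swapping the two rows tests the opposite direction, which is one way to obtain a directional contrast for each ordered pair of groups.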
Abstract: In this paper we propose a framework for defining and discovering optimal association rules involving a numerical attribute A in the consequent. The consequent has the form of an interval condition (A < x, A ⩾ x, or A ∈ I, where I is an interval or a set of intervals of the form [x_l, x_u)). Optimality is defined with respect to leverage, a well-known association rule interest measure. The generated rules are called Maximal Leverage Rules (MLR) and are derived from Distribution Rules. The principle for finding the MLR is related to the Kolmogorov-Smirnov goodness-of-fit statistical test. We propose different methods for MLR generation, taking into account leverage optimality and readability. We theoretically demonstrate the optimality of the main exact methods, and measure the leverage loss of the approximate methods. We show empirically that the discovery process is scalable.
Keywords: Numerical association rules, leverage, optimal association rules, distribution rules
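As an illustration of the interest measure, the sketch below computes the leverage of a rule whose consequent is A < x, using the standard definition leverage = P(antecedent and consequent) - P(antecedent) * P(consequent), and then scans the observed values for the threshold with the highest leverage. The brute-force scan and all names are assumptions for illustration; this is not the MLR generation method of the paper.

```python
def leverage_lt(values, in_antecedent, x):
    """Leverage of the rule 'antecedent -> A < x'.

    values[i] is the numerical attribute A for case i and
    in_antecedent[i] says whether case i satisfies the antecedent.
    """
    n = len(values)
    p_ant = sum(in_antecedent) / n
    p_cons = sum(v < x for v in values) / n
    p_both = sum(1 for v, m in zip(values, in_antecedent) if m and v < x) / n
    return p_both - p_ant * p_cons


def best_threshold(values, in_antecedent):
    """Hypothetical helper: threshold among observed values with maximal leverage."""
    candidates = sorted(set(values))
    return max(candidates, key=lambda x: leverage_lt(values, in_antecedent, x))


values = [1, 2, 3, 4, 5, 6, 7, 8]
in_antecedent = [True, True, True, True, False, False, False, False]
x = best_threshold(values, in_antecedent)
print(x, leverage_lt(values, in_antecedent, x))
```

Since leverage_lt equals P(antecedent) times the difference between the conditional and the overall empirical distribution functions at x, maximizing it resembles locating the point of largest deviation in a Kolmogorov-Smirnov comparison.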
Abstract: The Symbolic Aggregate Approximation (iSAX) is widely used in time series data mining. Its popularity arises from the fact that it greatly reduces time series size, is symbolic, allows lower bounding, and is space efficient. However, it requires setting two parameters, the symbolic length and the alphabet size, which limits the applicability of the technique. The optimal parameter values are highly application dependent. Typically, they are either set to a fixed value or probed experimentally for the best configuration. In this work we propose an approach to automatically estimate iSAX's parameters. The approach (AutoiSAX) not only discovers the best parameter setting for each time series in the database, but also finds the alphabet size for each iSAX symbol within the same word. It is based on simple and intuitive ideas from time series complexity and statistics. The technique can be smoothly embedded in existing data mining tasks as an efficient sub-routine. We analyze its impact on visualization interpretability, classification accuracy, and motif mining. Our contribution aims to make iSAX a more general approach as it evolves towards a parameter-free method.
Keywords: Time series, data mining, representation, iSAX, parameters
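To show what the two parameters control, here is a minimal sketch of a plain SAX-style discretization with a fixed word length and a single alphabet size. It assumes the usual recipe (z-normalization, piecewise aggregate approximation, equiprobable Gaussian breakpoints) and a series length divisible by the word length; it does not implement AutoiSAX or per-symbol alphabet sizes, and the function name is hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, word_length, alphabet_size):
    """Return a list of integer symbols in 0 .. alphabet_size - 1.

    Assumptions for this sketch: population standard deviation for
    z-normalization and equal-sized segments.
    """
    if len(series) % word_length != 0:
        raise ValueError("series length must be divisible by word_length")
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]
    seg = len(z) // word_length
    # Piecewise aggregate approximation: one mean per segment.
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(word_length)]
    # Breakpoints splitting N(0, 1) into equiprobable regions.
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    return [bisect_right(breakpoints, v) for v in paa]


print(sax_word([1, 1, 1, 1, 9, 9, 9, 9], word_length=2, alphabet_size=4))
```

A larger word length keeps more temporal detail and a larger alphabet keeps more amplitude detail, which is why fixed values for both rarely suit every series.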