文档介绍:A High-Performance Distributed Algorithm for Mining Association Rules
Assaf Schuster, Ran Wolff, and Dan Trock
Technion – Israel Institute of Technology
¡
Email: assaf,ranw,dtrock ¢ ***@
Abstract the largest itemset in Apriori [3], to typically just a single
scan in modern ARM algorithms such as Sampling and DIC
We present a new distributed association rule mining [17, 5].
(D-ARM) algorithm that demonstrates superlinear speedup Much progress has also been made in parallelized algo-
with the number puting nodes. The algorithm is rithms. With these, the architecture of the parallel system
the first D-ARM algorithm to perform a single scan over plays a key role. For instance, many algorithms were pro-
the database. As such, its performance is unmatched by posed which take advantage of the fast interconnect, or the
any previous algorithm. Scale-up experiments over stan- shared memory, of puters. The latest develop-
dard synthetic benchmarks demonstrate stable run time re- ment with these is [18], in which each process makes just
gardless of the number puters. Theoretical analysis two passes over its portion of the database.
reveals a tighter bound on error probability than the one
puters are, however, very costly. Hence,
shown in the corresponding sequential algorithm.
although these algorithms were shown to scale up to 128
processors, aniz