Iceberg queries have recently been identified as important queries for many applications
belonging to this category. These applications can be found in data mining [3, 20, 26],
information retrieval [15, 18, 24, 25], decision support and data warehousing [7], web
mining [9], and top-k queries [10, 11]. Iceberg queries were formally introduced by
Fang et al. [12], who also present detailed application examples. These
queries have been extended to data cubes in [7]. Moreover, they are covered in database
textbooks, e.g., [23]. These queries are characterized by a huge input and a small output.
The iceberg refers to the input, and the tip of it refers to the output. Typical applications
4748 Khaled AlSabti
of iceberg queries can involve very large databases, e.g., several gigabytes or more [13].
Below, we give a formal definition of the iceberg queries that we consider in this work.
Problem statement: Iceberg queries are characterized as queries with a huge input and a
small output. In this paper, we consider an important class of these queries, which
returns frequently occurring values from a set of attributes. Formally, given a relation R
that consists of n tuples, each with m attributes, and a set of attributes
a_{i_1}, a_{i_2}, ..., a_{i_k}, find the value combinations of a_{i_1}, a_{i_2}, ..., a_{i_k}
(i.e., the tip of the iceberg) that are replicated in R more often than a pre-specified
threshold f. The assumptions are (1) relation R cannot fit into main memory
and (2) f is a relatively large percentage, so that the output of the query is very small
compared to the input. The type of the specified attribute(s) affects the computational
requirement of the problem. For categorical attribut