A Dissertation for the Degree of Doctor Scientiarum
An Information Theoretic Approach
to Machine Learning
Robert Jenssen
May 2005
FACULTY OF SCIENCE
Department of Physics
University of Tromsø, NO-9037 Tromsø, Norway. Telephone: +47 77 64 51 50. Fax: +47 77 64 55 80.
To Guro
Abstract
In this thesis, the theory and applications of machine learning systems that use information
theoretic criteria as performance measures are studied.
A new clustering algorithm based on maximizing the Cauchy-Schwarz (CS) divergence measure between probability density functions (pdfs) is proposed. The CS divergence is estimated non-parametrically using the Parzen window technique for density estimation. The problem domain is transformed from discrete 0/1 cluster membership values to continuous membership values. A constrained gradient ascent algorithm is implemented to perform the maximization. The gradients are stochastically approximated to reduce the computational complexity, making the algorithm more practical. Parzen window annealing is incorporated into the algorithm to help avoid convergence to a local maximum. The clustering results obtained on synthetic and real data are encouraging.
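As a rough illustration (a minimal sketch, not the implementation developed in the thesis), the Parzen-window estimate of the CS divergence between two sample sets can be written in a few lines. The Gaussian kernel width `sigma` is an assumed free parameter here; the kernel normalization constant is identical in all three integrals and therefore cancels in the CS ratio.

```python
import numpy as np

def _cross_term(X, Y, sigma):
    # Mean of Gaussian kernels over all sample pairs (x_i, y_j): the
    # Parzen estimate of the integral of p(x)q(x) dx, up to a common
    # normalization constant that cancels in the CS ratio below.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (4.0 * sigma ** 2)).mean()

def cs_divergence(X, Y, sigma=1.0):
    """Parzen-window estimate of the Cauchy-Schwarz divergence
    D_CS(p, q) = -log( integral(pq) / sqrt(integral(p^2) integral(q^2)) )."""
    pq = _cross_term(X, Y, sigma)
    pp = _cross_term(X, X, sigma)
    qq = _cross_term(Y, Y, sigma)
    return -np.log(pq / np.sqrt(pp * qq))
```

The estimate is zero when both sample sets coincide and grows as the underlying densities separate, which is what makes it usable as a clustering objective.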
The Parzen window-based estimator for the CS divergence is shown to have a dual expression as a measure of the cosine of the angle between cluster mean vectors in a feature space determined by the eigenspectrum of a Mercer kernel matrix. A spectral clustering algorithm is derived and implemented in feature spaces defined by the spectrum of the affinity matrix and the Laplacian matrix, respectively. Using tools from statistics for Parzen window-size selection, the new spectral algorithm operates in a fully automatic mode with respect to the width of the Mercer kernel. A connection to the graph cut is also provided. The performance of the new algorithm is quite promising.
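For orientation, a generic two-cluster spectral partition of this kind (a standard normalized-Laplacian sign split, not the thesis's exact algorithm; the Gaussian width `sigma` is an assumed parameter rather than the automatically selected one described above) can be sketched as:

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Two-way spectral clustering sketch: Gaussian (Mercer) affinity
    matrix, symmetrically normalized Laplacian, and a sign split on the
    eigenvector of the second-smallest eigenvalue (the relaxed graph cut)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))       # affinity matrix
    deg = K.sum(axis=1)                        # node degrees
    Dm = 1.0 / np.sqrt(deg)
    L = np.eye(len(X)) - Dm[:, None] * K * Dm[None, :]
    _, V = np.linalg.eigh(L)                   # eigenvalues ascending
    return (V[:, 1] > 0).astype(int)           # sign split of 2nd eigenvector
```

The sign split realizes the connection to the graph cut mentioned above: the second eigenvector is the continuous relaxation of the binary cut indicator on the affinity graph.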
It is further shown that Parzen window-based estimators for Rényi’s quadratic entropy
and an integrated squared error (ISE) pdf di