Generalization performance of graph-based
semi-supervised classification ∗
Hong Chen Luoqing Li
Faculty of Mathematics and Computer Science, Hubei University
Wuhan 430062, P. R. China
Abstract
Semi-supervised learning has been of growing interest over the past few years and
many methods have been proposed. Although there are various algorithms that implement
semi-supervised learning, the crucial question of how the generalization
error depends on the number of labeled and unlabeled examples is still poorly understood. In
this paper, we consider a regularized graph-based semi-supervised classification
algorithm. By introducing a definition of graph cut, we illustrate some relations
between the graph cut and the regularization error. Then, based on structural invariants
of the data graph, generalization error bounds for the graph-based algorithm are
established.
Keywords Semi-supervised learning, generalization error, graph Laplacian, graph
cut, localized envelope
AMS(2000) subject classification 68T05; 68G05; 68P30
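As a concrete illustration of the kind of graph-based regularization algorithm analyzed in this paper, the following is a minimal sketch of Laplacian-regularized classification. The function name, the Gaussian similarity weights, and the parameters sigma and gamma are illustrative assumptions, not the paper's own construction:

```python
import numpy as np

def laplacian_regularized_labels(X, y_labeled, labeled_idx, sigma=1.0, gamma=0.1):
    """Sketch of graph-based semi-supervised classification.

    Minimizes  sum over labeled i of (f_i - y_i)^2  +  gamma * f^T L f,
    where L = D - W is the unnormalized graph Laplacian of a
    Gaussian-weighted similarity graph built over ALL points
    (labeled and unlabeled alike).
    """
    n = X.shape[0]
    # Gaussian similarity weights: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    J = np.zeros((n, n))                    # diagonal indicator of labeled points
    J[labeled_idx, labeled_idx] = 1.0
    y = np.zeros(n)
    y[labeled_idx] = y_labeled              # zero target at unlabeled points
    # First-order optimality condition of the objective: (J + gamma L) f = J y
    f = np.linalg.solve(J + gamma * L, y)
    return np.sign(f)                       # predicted labels in {-1, +1}
```

With two well-separated clusters and a single labeled point in each, the unlabeled points inherit the label of their cluster, since the Laplacian penalty forces the classifier to vary slowly along densely connected regions of the graph.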
1 Introduction
The problem of learning from labeled and unlabeled data has attracted considerable
attention in recent years. The central question in semi-supervised learning is how to improve
generalization performance by utilizing unlabeled data. Several semi-supervised methods
have been proposed in recent years, including margin-based methods (see [19, 22]), co-training
[6], and various graph-based methods (see [3, 7]). In general, these algorithms assume that
∗The research was partially supported by NSFC under grant 10771053, by the National Research
Foundation for the Doctoral Program of Higher Education of China (SRFDP) under grant 20060512001,
and by Natural Science Foundation of Hubei Province under grant 2007ABA139.

the data are sampled from an underlying manifold embedded in a high-dimensional space. Recent
approaches to dimensionality reduction, feature selection, and classification belong to this
setting (see [4, 26]). Recently, Belkin