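Topic-Based Academic Community Identification Algorithm
WANG Mengxing, LU Meilian**
(State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876)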
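Abstract: Topology-based community identification algorithms place high demands on the dataset, among other problems. To address this, this paper proposes a topic-based academic community identification algorithm. First, a topic model is used to extract the relationships among authors and to build an author association network; the GN algorithm is then applied to this network to identify communities. The method effectively resolves the loose network structure caused by sparse citation relationships, thereby yielding good community construction results and improving the modularity of the communities.
Keywords: data mining; topic model; community identification; GN algorithm; LDA
CLC number: TP391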
Academic Community Identification Based on Topic
WANG Mengxing, LU Meilian
(State Key Laboratory of Networking & Switching Technology, Beijing University of Posts &
Telecommunications, Beijing 100876)
Abstract: In the field of community identification, algorithms based on topology cannot
obtain good identification results when the information in the dataset is sparse. To solve
this problem, this paper proposes a new academic community identification algorithm based
on topic. First, a topic model is used to extract the relationships among authors and build the
author association network. Then, on the basis of this network, the GN algorithm
is used to find communities. As the experimental results show, the proposed algorithm
effectively solves the problem of loose network structure resulting from sparse citation
information. The proposed algorithm also obtains good community identification results and
improves the modularity of the communities.
Key words: Data Mining; Topic Model; Community Identification; GN Algorithm; LDA
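The two-stage pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that the abstract does not confirm: each author is already described by a topic distribution (the values in `topic_dist` below are made up rather than produced by LDA), two authors are linked when the cosine similarity of their distributions reaches a hypothetical `threshold`, and only a single Girvan-Newman (GN) split is performed instead of the full divisive procedure. All function names are illustrative.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two topic-probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_author_network(topic_dist, threshold):
    """Link two authors when their topic vectors are similar enough (assumed rule)."""
    adj = {author: set() for author in topic_dist}
    for a, b in combinations(topic_dist, 2):
        if cosine(topic_dist[a], topic_dist[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Connected components found by breadth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v] - comp:
                comp.add(w)
                queue.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Edge betweenness for an unweighted graph (Brandes-style accumulation)."""
    bet = {}
    for s in adj:
        dist, order, queue = {s: 0}, [], deque([s])
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                bet[edge] = bet.get(edge, 0.0) + share
                delta[v] += share
    return bet


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


def modularity(adj, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for c in communities:
        inside = sum(1 for v in c for w in adj[v] if w in c) / 2
        degree = sum(len(adj[v]) for v in c)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


# Hypothetical per-author topic distributions (in the paper these come from a topic model).
topic_dist = {
    "A1": [0.9, 0.1], "A2": [0.8, 0.2], "A3": [0.7, 0.3],
    "B1": [0.3, 0.7], "B2": [0.2, 0.8], "B3": [0.1, 0.9],
}
network = build_author_network(topic_dist, threshold=0.7)
communities = girvan_newman_split(network)
print(sorted(sorted(c) for c in communities))
print(round(modularity(network, communities), 3))
```

In this toy data the authors form two tightly linked groups joined by a single edge, so the first GN split separates them. A complete GN run would keep removing edges and retain the partition with the highest modularity.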
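0 Introduction
Many real-world systems can be represented as abstract graphs of complex networks, in which each node corresponds to an individual and each edge connecting two nodes represents some relationship between those individuals. As research into the properties and mathematical characteristics of networks has deepened, scholars have found that most networks share the common feature of community structure. In other words, the whole network is composed of a number of “groups” or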