Journal of Shanghai Jiaotong University, Vol. 44, No. 11, Nov. 2010
Article ID: 1006-2467(2010)11-1496-05
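Topic Evolution Based on LDA Topic Association
CHU Keming, LI Fang
(School of Electronic, Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai 200240)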
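Abstract: Topic evolution helps people acquire information quickly and understand trends. This paper proposes a method for mining how topics change over time, realizing topic evolution through topic extraction and topic association. Topics are extracted automatically from the document collection of each time period, and the number of topics may vary from one period to the next. Topics are associated by computing, for every pair of topics in adjacent time periods, the distance between their distributions and the similarity of their feature vectors. Experimental results show that the method describes not only how the strength of a given topic changes over time, but also the emergence of new topics, the disappearance of old topics, and the evolution of topic content over time.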
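The association step described in the abstract can be illustrated with a minimal sketch. It assumes each topic is a word-probability vector over a shared vocabulary, uses Jensen-Shannon divergence as the distribution distance, and uses cosine similarity as a stand-in for the feature-vector similarity. The function names, the reuse of the word distribution as the feature vector, the rule for combining the two measures, and the thresholds `max_jsd` and `min_cos` are all illustrative assumptions, not the authors' actual settings.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate_topics(prev_topics, next_topics, max_jsd=0.5, min_cos=0.5):
    """Link topic i of one period to topic j of the next period.

    Assumed rule (hypothetical): link when the divergence is small
    AND the similarity is large. Thresholds are placeholders.
    """
    links = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            if js_divergence(p, q) <= max_jsd and cosine_similarity(p, q) >= min_cos:
                links.append((i, j))
    return links


# Toy example: two topics per period over a four-word vocabulary.
period_1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
period_2 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
links = associate_topics(period_1, period_2)
```

Under this sketch, a topic in the later period with no incoming link would be read as a new topic, and a topic in the earlier period with no outgoing link as one that has disappeared.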
Key words: topic detection; topic association; topic evolution; latent Dirichlet allocation
CLC number: TP 391
Document code: A
Topic Evolution Based on LDA and Topic Association
CHU Ke-ming, LI Fang
(School of Electronic, Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai 200240, China)
Abstract: Topic evolution helps people to learn information quickly. In this paper, a method is proposed to discover the evolution of topics over time by topic detection and by relating topics in different time periods. The method applies the LDA model to temporal documents to extract topics. The number of topics can differ across time periods. Relating topics in consecutive time periods is based on Jensen-Shannon divergence and feature similarity. Experiments show that the method can detect new topics and describe the evolution of topics over time effectively. It not only shows that the topics evolve with time, but also that the