Sampling of Social Media Research Data
Sampling Methods for Social Media Data
祝建华 Jonathan Zhu
Collection of Social Media Data
Offline:
• Survey
• Experiment
• Content analysis
• Observations
• etc.
Online:
• Online surveys
• Server log files
• Client log files
• Webpage crawling
• etc.
Determining factors:
1. Validity (data representative of the population under study)
2. Reliability (minimal quantity of data for maximum precision)
3. Practicality (no access to backend data; everything from open sources)
Structure of Social Media Data
1. Attributes-based data
I    X    Y    Z    …
1    X1   Y1   Z1   …
2    X2   Y2   Z2   …
…
n    Xn   Yn   Zn   …
2. Network data
     I1   I2   I3   …   In
I1   -    e12  e13  …   e1n
I2   e21  -    e23  …   e2n
I3   e31  e32  -    …   e3n
…
In   en1  en2  en3  …   -
3. Network data
     Egos           | Alters
I    X    Y    Z    | J1   J2   …
1    X1   Y1   Z1   | J11  J21  …
2    X2   Y2   Z2   | J12  J22  …
…
n    Xn   Yn   Zn   | J1n  J2n  …
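The three layouts above can be illustrated with a small sketch. It is not from the slides: the toy values, the names `attributes`, `edges`, `alters_of`, and `ego_records`, and the choice to store the unused diagonal as 0 are all assumptions made for illustration.

```python
# Hypothetical toy data: three users (I = 1..3) with attributes X, Y, Z.
attributes = {
    1: {"X": 34, "Y": "f", "Z": 120},
    2: {"X": 27, "Y": "m", "Z": 45},
    3: {"X": 41, "Y": "f", "Z": 300},
}

# Network data as an adjacency matrix: edges[i][j] == 1 means user i+1
# links to user j+1. The diagonal ("-" on the slide) is stored as 0 here.
edges = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]


def alters_of(ego, matrix):
    """Return the 1-based IDs of the alters that `ego` links to."""
    row = matrix[ego - 1]
    return [j + 1 for j, e in enumerate(row) if e and j != ego - 1]


def ego_records(attrs, matrix):
    """Combine each ego's attributes with its list of alters (layout 3)."""
    return {i: {**attrs[i], "alters": alters_of(i, matrix)} for i in attrs}


records = ego_records(attributes, edges)
```

Here `records[1]` holds the attributes of ego 1 together with its alters, which mirrors the Egos/Alters layout; a real study would replace the toy values with collected data.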
Sampling Approaches
Data Structure: Attributes Data, Network Data, Network Data
Social Media:
• Blogs, Wikis, …: Probability Sampling
• SNSs: Probability or Snowball Sampling
• Microblogging sites, P2P sites, …: Snowball Sampling
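Snowball sampling on network data can be sketched as a breadth-first expansion from a set of seed users. This is a generic textbook version, not the PADS method from the authors' paper; the function name `snowball_sample`, the `waves` parameter, and the toy `graph` are assumptions for illustration.

```python
from collections import deque


def snowball_sample(graph, seeds, waves):
    """Start from `seeds` and add their alters for `waves` rounds.

    `graph` maps each user to the list of users it links to.
    Returns sampled users in the order they were reached.
    """
    sampled = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(sampled)
    frontier = deque(sampled)
    for _ in range(waves):
        next_frontier = deque()
        while frontier:
            ego = frontier.popleft()
            for alter in graph.get(ego, []):
                if alter not in seen:
                    seen.add(alter)
                    sampled.append(alter)
                    next_frontier.append(alter)
        frontier = next_frontier
    return sampled


# Hypothetical toy network.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["f"],
    "f": [],
}
one_wave = snowball_sample(graph, ["a"], 1)
two_waves = snowball_sample(graph, ["a"], 2)
```

Each extra wave reaches users one more link away from the seeds, so the sample depends heavily on which seeds are chosen.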
Traditional Sampling Methods Used in Social Science Research
Nonprobability Sampling:
• Convenience
• Purposive
• Snowballing
• Quota
• etc.
Probability Sampling:
• Simple Random (e.g., Random Digit Dialing)
• Systematic
• Stratified
• Cluster
• Multistage
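Three of the probability methods listed above (simple random, systematic, stratified) can be sketched with the Python standard library. The function names, the `rng` argument, and the fixed per-stratum sample size are assumptions for illustration; the sketch also assumes a complete sampling frame is available, which is often not the case for social media.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement, each with equal probability."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit, k = len(frame) // n."""
    if n <= 0 or n > len(frame):
        raise ValueError("n must be between 1 and len(frame)")
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then draw a simple random sample in each."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


rng = random.Random(0)        # fixed seed so the sketch is repeatable
frame = list(range(100))      # hypothetical frame of 100 unit IDs
srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda u: u % 2, 3, rng)
```

Cluster and multistage designs build on the same pieces: sample groups first, then sample units within the chosen groups.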
Comparison of Four Sampling Methods
Source: Zhu et al. (2011). Social Science Computer Review,
http://ssc./content/early/2010/09/16/
Our Recent Works
• Probability sampling:
J. J. H. Zhu, Q. Mo, F. Wang, & H. Lu (2011). A random
digit search (RDS) method for sampling of blogs and
other web content. Social Science Computer Review, 29
(3). http://ssc./content/early/2010/09/16/
• Snowball sampling:
J. J. H. Zhu, L. Zhang, M. H. Yang, Q. R. Liu, H. Lu, & J.
Jiang (2011). Popular-Alter Driven Sampling (PADS) for