Internet Mathematics Vol. 1, No. 2: 226-251
A Brief History of
Generative Models for
Power Law and Lognormal
Distributions
Michael Mitzenmacher
Abstract. Recently, I became interested in a current debate over whether file size
distributions are best modelled by a power law distribution or a lognormal distribution.
In trying to learn enough about these distributions to settle the question, I found a rich
and long history, spanning many fields. Indeed, several recently proposed models from
the computer science community have antecedents in work from decades ago. Here,
I briefly survey some of this history, focusing on underlying generative models that
lead to these distributions. One finding is that lognormal and power law distributions
connect quite naturally, and hence, it is not surprising that lognormal distributions
have arisen as a possible alternative to power law distributions across many fields.
1. Introduction
Power law distributions (also often referred to as heavy-tail distributions, Pareto
distributions, Zipfian distributions, etc.) are now pervasive in computer science;
see, for example, [Broder et al. 00, Crovella and Bestavros 97, Faloutsos et al.
99]. Numerous other examples can be found in the extensive bibliography of this
paper.
© A K Peters, Ltd.
This paper was specifically motivated by a recent paper by Downey [Downey
01] challenging the now conventional wisdom that file sizes are governed by a
power law distribution. The argument was substantiated both by collected data
and by the development of an underlying generative model which suggested that
file sizes were better modeled by a lognormal distribution. In my attempts
to learn more about this question, I was drawn to the history of lognormal
and power law distributions. As part of this process, I delved into past and
present literature, and came across some interesting facts that appear not