The Promises and Perils of Mining GitHub

Eirini Kalliamvakou, University of [...]
[...] Gousios, Delft University of Technology
[...] Blincoe, University of [...]
Leif Singer, University of [...]
[...] M. German*, University of [...]
[...] Damian, University of [...]

ABSTRACT
With over 10 million git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub's event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories in GitHub and how users take advantage of GitHub's main features (commits, pull requests, and issues). Our results indicate that, while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. We show, for example, that the majority of the projects are personal and inactive; that GitHub is also being used for free storage and as a Web hosting service; and that almost 40% of all pull requests do not appear as merged, even though they were. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub.

Categories and Subject Descriptors
[Software Engineering]: Management - Software configuration management

General Terms
Software Engineering

Keywords
Mining software repositories, git, GitHub, code [...]

1. INTRODUCTION
GitHub is a collaborative code hosting site built on top of the git version control system. GitHub introduced a

*Corresponding Author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distrib