Google+ Social Networks with Node Attributes


Papers

If you use our dataset in your papers, please cite the following papers.

  • Neil Zhenqiang Gong, Wenchang Xu and Dawn Song. "Reciprocity in Social Networks: Measurements, Predictions, and Implications". arXiv:1302.6309, 2013.
  • Neil Zhenqiang Gong, Ameet Talwalkar, Lester Mackey, Ling Huang, Eui Chul Richard Shin, Emil Stefanov, Elaine (Runting) Shi and Dawn Song. "Joint Link Prediction and Attribute Inference using a Social-Attribute Network". Accepted by ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2013.
  • Neil Zhenqiang Gong, Wenchang Xu, Ling Huang, Prateek Mittal, Emil Stefanov, Vyas Sekar and Dawn Song. "Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+". In ACM/USENIX Internet Measurement Conference (IMC), 2012.
  • Neil Zhenqiang Gong, Ameet Talwalkar, Lester Mackey, Ling Huang, Eui Chul Richard Shin, Emil Stefanov, Elaine (Runting) Shi and Dawn Song. "Jointly Predicting Links and Inferring Attributes using a Social-Attribute Network (SAN)". In ACM Workshop on Social Network Mining and Analysis (SNA-KDD), 2012.


    Dataset Summary

    This published dataset, consisting of four Google+ snapshots, is a subset of the dataset studied in our IMC'12 paper. Each snapshot includes both the directed social structure and the node attributes, which together can be represented by the following Social-Attribute Network (SAN). Snapshots 3 and 4 were crawled after Google+ was opened to the public.

    [Figure: a Social-Attribute Network (SAN)]


    Table I. Dataset summary

    Snapshot    #Social nodes  #Social links  #Attribute nodes  #Attribute links  Crawled time  TimeID
    snapshot 1      4,693,129     47,130,325           991,545         3,644,103  Jul. 2011     0
    snapshot 2     17,091,929    271,915,755         3,108,141        14,693,125  Aug. 2011     1
    snapshot 3     26,244,659    410,445,770         4,147,389        19,344,382  Sep. 2011     2
    snapshot 4     28,942,911    462,994,069         4,443,631        20,592,962  Oct. 2011     3


    Dataset Format

    Directed social structure

    UserIDFrom UserIDTo TimeID
    Each line corresponds to a directed link from UserIDFrom to UserIDTo. UserIDs are anonymized to integers starting from 0. TimeID is 0, 1, 2 or 3, indicating which snapshot this directed link appears in.
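
    As an illustration of the three-column format above, here is a minimal parsing sketch. It assumes the fields are separated by whitespace, which this page does not state; the names SocialLink, parse_social_links and links_per_time_id are hypothetical helpers, not part of the dataset.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

VALID_TIME_IDS = {0, 1, 2, 3}  # TimeIDs listed in the dataset summary


class SocialLink(NamedTuple):
    """One directed link from src to dst, tagged with its TimeID."""
    src: int
    dst: int
    time_id: int


def parse_social_links(lines: Iterable[str]) -> Iterator[SocialLink]:
    """Yield one SocialLink per non-empty 'UserIDFrom UserIDTo TimeID' line.

    Assumption: fields are whitespace-separated.
    """
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        src, dst, time_id = (int(f) for f in fields)
        if src < 0 or dst < 0:
            raise ValueError(f"line {line_no}: UserIDs must be >= 0")
        if time_id not in VALID_TIME_IDS:
            raise ValueError(f"line {line_no}: unexpected TimeID {time_id}")
        yield SocialLink(src, dst, time_id)


def links_per_time_id(links: Iterable[SocialLink]) -> Counter:
    """Count how many link lines carry each TimeID."""
    return Counter(link.time_id for link in links)


# Made-up sample lines, for illustration only.
sample = ["0 1 0", "1 0 0", "2 0 1", "", "3 2 3"]
counts = links_per_time_id(parse_social_links(sample))
print(dict(counts))  # {0: 2, 1: 1, 3: 1}
```

    With a real file, the same generator can be fed an open file handle, so the links are streamed rather than loaded into memory at once.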

    Node attributes

    UserID AttriID TimeID
    Each line corresponds to an undirected attribute link between a user and an attribute. AttriIDs are anonymized to negative integers starting from -1. Again, TimeID is 0, 1, 2 or 3, indicating which snapshot this attribute link appears in.
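
    The attribute-link format can be read in much the same way. The sketch below groups attributes by user and assumes whitespace-separated fields; load_user_attributes is a hypothetical helper. Its optional time_id filter keeps only lines carrying exactly that TimeID, since this page does not say whether a link is listed once or repeated per snapshot.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def load_user_attributes(
    lines: Iterable[str], time_id: Optional[int] = None
) -> Dict[int, Set[int]]:
    """Map each UserID to the set of AttriIDs it is linked to.

    Assumption: fields are whitespace-separated. If time_id is given,
    only lines whose TimeID equals it are kept.
    """
    user_attrs: Dict[int, Set[int]] = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        user_id, attri_id, t = (int(f) for f in fields)
        if user_id < 0:
            raise ValueError(f"line {line_no}: UserID must be >= 0")
        if attri_id >= 0:
            raise ValueError(f"line {line_no}: AttriID must be negative")
        if time_id is None or t == time_id:
            user_attrs[user_id].add(attri_id)
    return dict(user_attrs)


# Made-up sample lines, for illustration only.
sample = ["0 -1 0", "0 -2 1", "1 -1 1"]
all_attrs = load_user_attributes(sample)
only_t1 = load_user_attributes(sample, time_id=1)
print(all_attrs)
print(only_t1)
```

    Because UserIDs are non-negative and AttriIDs are negative, the two ID spaces never collide, so social nodes and attribute nodes can share one integer-keyed graph structure if that is convenient.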

    Attribute types

    AttriID AttriType
    Each line corresponds to an attribute. AttriType could be employer, school, major or places_lived.
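
    A lookup table from AttriID to AttriType can be built along the same lines. This sketch assumes whitespace-separated fields and that the type appears literally as one of the four strings listed above; load_attribute_types is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable

# The four attribute types named in the format description.
ATTRI_TYPES = {"employer", "school", "major", "places_lived"}


def load_attribute_types(lines: Iterable[str]) -> Dict[int, str]:
    """Map each AttriID to its AttriType from 'AttriID AttriType' lines.

    Assumption: fields are whitespace-separated and the type string
    contains no spaces.
    """
    types: Dict[int, str] = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        attri_id = int(fields[0])
        attri_type = fields[1]
        if attri_id >= 0:
            raise ValueError(f"line {line_no}: AttriID must be negative")
        if attri_type not in ATTRI_TYPES:
            raise ValueError(f"line {line_no}: unknown AttriType {attri_type!r}")
        types[attri_id] = attri_type
    return types


# Made-up sample lines, for illustration only.
sample = ["-1 employer", "-2 school", "-3 school", "-4 places_lived"]
attri_types = load_attribute_types(sample)
type_counts = Counter(attri_types.values())
print(type_counts["school"])  # 2
```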

    Dataset Release Policy

  • To download the dataset, please email Neil Zhenqiang Gong (sunblaze.gplus@gmail.com) with "[G+ Request]" in the subject line. We will reply with the links to download the dataset. In your email, please include the following information (if we don't know each other). The information is needed for verification purposes.
  • If your papers use our dataset, please cite our papers.
  • You're not allowed to further distribute the dataset without our permission.
  • Emailing us to request the dataset implies that you are aware of and agree to the above policies.