Neil Zhenqiang Gong

         Ph.D. Candidate
         Computer Science Division
         University of California Berkeley
         Office: 721 Soda Hall


Google+ Social Networks with Node Attributes

Dataset Summary

The released dataset consists of four Google+ snapshots and is a subset of the dataset studied in our IMC'12 paper. Each snapshot includes both the directed social structure and node attributes, which together can be represented as a Social-Attribute Network: users are social nodes, attributes are attribute nodes, and an attribute link connects a user to each attribute he or she has. Snapshots 3 and 4 were crawled after Google+ was opened to the public.

Table I. Dataset summary

Snapshot    #Social nodes  #Social links  #Attri nodes  #Attri links  Crawled time  TimeID
snapshot 1      4,693,129     47,130,325       991,545     3,644,103  Jul., 2011    0
snapshot 2     17,091,929    271,915,755     3,108,141    14,693,125  Aug., 2011    1
snapshot 3     26,244,659    410,445,770     4,147,389    19,344,382  Sep., 2011    2
snapshot 4     28,942,911    462,994,069     4,443,631    20,592,962  Oct., 2011    3

Dataset Format

Directed social structure

UserIDFrom UserIDTo TimeID
Each line corresponds to a directed link from UserIDFrom to UserIDTo. UserIDs are anonymized as integers starting from 0. TimeID is 0, 1, 2, or 3 and indicates the snapshot in which the directed link first appears.
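The format above can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes the three columns are separated by whitespace (the page does not state the delimiter), and the names SocialLink and parse_social_links are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class SocialLink(NamedTuple):
    src: int      # UserIDFrom
    dst: int      # UserIDTo
    time_id: int  # snapshot in which the link first appears (0-3)


def parse_social_links(lines: Iterable[str]) -> Iterator[SocialLink]:
    """Yield one SocialLink per non-blank 'UserIDFrom UserIDTo TimeID' line."""
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        src, dst, time_id = (int(f) for f in fields)
        if src < 0 or dst < 0:
            raise ValueError(f"line {line_no}: UserIDs must be non-negative")
        if time_id not in (0, 1, 2, 3):
            raise ValueError(f"line {line_no}: TimeID must be 0, 1, 2 or 3")
        yield SocialLink(src, dst, time_id)


# Hypothetical sample lines, not taken from the real dataset.
sample = ["0 1 0", "1 0 2", "", "2 0 3"]
links = list(parse_social_links(sample))
```

Because the parser is a generator over any iterable of lines, an open file handle can be passed in place of the sample list, which avoids loading hundreds of millions of links into memory at once.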

Node attributes

UserID AttriID TimeID
Each line corresponds to an undirected attribute link between a user and an attribute. AttriIDs are anonymized as negative integers starting from -1. Again, TimeID is 0, 1, 2, or 3 and indicates the snapshot in which this link first appears.

Attribute types

AttriID AttriType
Each line corresponds to an attribute. AttriType is one of employer, school, major, or places_lived.
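To show how the attribute-link and attribute-type files fit together, the sketch below groups each user's attributes by type. It assumes whitespace-separated columns, and the function names are hypothetical; an AttriID missing from the type file is filed under "unknown" rather than rejected.

```python
from typing import Dict, Iterable, Set

VALID_TYPES = {"employer", "school", "major", "places_lived"}


def parse_attribute_types(lines: Iterable[str]) -> Dict[int, str]:
    """Map each AttriID (a negative integer) to its AttriType."""
    types: Dict[int, str] = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        attri_id, attri_type = int(fields[0]), fields[1]
        if attri_id >= 0 or attri_type not in VALID_TYPES:
            raise ValueError(f"unexpected attribute line: {line!r}")
        types[attri_id] = attri_type
    return types


def attributes_by_user(
    link_lines: Iterable[str], types: Dict[int, str]
) -> Dict[int, Dict[str, Set[int]]]:
    """Group each user's AttriIDs by attribute type, ignoring TimeID."""
    profiles: Dict[int, Dict[str, Set[int]]] = {}
    for line in link_lines:
        fields = line.split()
        if not fields:
            continue
        user_id, attri_id, _time_id = (int(f) for f in fields)
        attri_type = types.get(attri_id, "unknown")
        profiles.setdefault(user_id, {}).setdefault(attri_type, set()).add(attri_id)
    return profiles


# Hypothetical sample lines, not taken from the real dataset.
attri_types = parse_attribute_types(["-1 employer", "-2 school"])
profiles = attributes_by_user(["0 -1 0", "0 -2 1", "5 -1 3"], attri_types)
```

Since UserIDs are non-negative and AttriIDs are negative, the two ID spaces never collide, so social and attribute nodes could also share a single node table if that is more convenient.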

Reconstructing the tth Snapshot

To obtain the tth snapshot, where t = 1, 2, 3, or 4, keep all links whose TimeID is less than t. For example, snapshot 2 consists of the links with TimeID 0 or 1.
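The rule above amounts to a single filter on the TimeID column. The sketch below assumes links have already been parsed into (from, to, TimeID) integer tuples and that the same rule is applied to social links and attribute links alike; the function name snapshot is hypothetical.

```python
from typing import Iterable, List, Tuple

Link = Tuple[int, int, int]  # (UserIDFrom, UserIDTo or AttriID, TimeID)


def snapshot(links: Iterable[Link], t: int) -> List[Link]:
    """Return the links present in the t-th snapshot (t = 1, 2, 3, 4)."""
    if t not in (1, 2, 3, 4):
        raise ValueError("t must be 1, 2, 3 or 4")
    # A link first seen in snapshot TimeID + 1 is present in every later snapshot,
    # so snapshot t keeps exactly the links with TimeID < t.
    return [link for link in links if link[2] < t]


# Hypothetical sample links, not taken from the real dataset.
all_links = [(0, 1, 0), (1, 2, 1), (2, 0, 3)]
second = snapshot(all_links, 2)
fourth = snapshot(all_links, 4)
```

Here second holds the two links with TimeID 0 and 1, while fourth holds all three, reflecting that each snapshot contains the previous one.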


Dataset Release Policy

  • To download the dataset, please send an email to Neil Zhenqiang Gong with "[G+ Request]" in the subject line. We will reply with the links to download the dataset. In your email, please include the following information (if we don't know each other):

    • Name.
    • Affiliation.
    • Homepage.
    • A short description of what you plan to do with our dataset. A few keywords (e.g., link prediction, attribute inference, evolution) are enough. We don't need to know the details.
    This information is needed for verification purposes.
  • If your papers use our dataset, please cite our papers.
  • You're not allowed to further distribute the dataset without our permission.
  • Requesting the dataset by email implies that you are aware of and agree to the above policies.