Malicious Data Cleaning

I define malicious data as data that makes data-driven applications useless, that is unsolicited, or that is harmful to users. For instance, in reputation systems such as Yelp, an attacker (e.g., the owner of a restaurant) could register many accounts and write fake reviews to manipulate the reputation of any restaurant, which makes the reputation system meaningless. Moreover, popular online social networks (OSNs) suffer from a pervasive threat of malicious accounts: Sybil identities registered en masse by miscreants for the purposes of spam and abuse. Reports from August 2012 claim that as many as 83 million of Facebook's 900 million users are in fact fake or malicious. These abusive accounts leverage their access to millions of benign users to disseminate scams, carry out phishing attacks, distribute malware, harvest private user data, and spread disinformation, both by hijacking conversations within a social network and by manipulating the authority of brands (via "+1" or "like" clicks).
Our work aims to detect malicious accounts using the social relationships among users. The intuition is that it is hard for attackers to establish trust relationships between malicious accounts and benign users. Specifically, we design SybilBelief, a scalable semi-supervised learning framework that leverages Markov Random Fields and Loopy Belief Propagation to detect malicious (Sybil) accounts from the social network that connects the users.
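To make the idea concrete, here is a minimal sketch of loopy belief propagation on a pairwise Markov Random Field with two labels (benign and Sybil). It is not the SybilBelief implementation. The function name `loopy_bp`, the prior values (0.9 for known benign, 0.1 for known Sybil, 0.5 for unlabeled), the homophily strength `w`, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
from collections import defaultdict


def loopy_bp(edges, priors, w=0.9, iters=10):
    """Return an approximate P(benign) for every node of an undirected graph.

    edges  : iterable of (u, v) pairs (social links)
    priors : dict node -> prior P(benign); unlabeled nodes default to 0.5
    w      : assumed probability that two linked accounts share a label
    iters  : number of synchronous message-passing rounds
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Node potentials as (benign, sybil) pairs.
    phi = {u: (priors.get(u, 0.5), 1.0 - priors.get(u, 0.5)) for u in nbrs}
    # msg[(u, v)] is the message from u to v, initialised to uniform.
    msg = {(u, v): (0.5, 0.5) for u in nbrs for v in nbrs[u]}

    def combine(u, skip=None):
        # Node potential of u times all incoming messages except from `skip`.
        b, s = phi[u]
        for k in nbrs[u]:
            if k != skip:
                mb, ms = msg[(k, u)]
                b, s = b * mb, s * ms
        return b, s

    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            b, s = combine(u, skip=v)
            # Sum over u's label, weighted by the edge potential.
            out_b = w * b + (1.0 - w) * s
            out_s = (1.0 - w) * b + w * s
            z = out_b + out_s
            new[(u, v)] = (out_b / z, out_s / z)  # normalise to avoid underflow
        msg = new

    beliefs = {}
    for u in nbrs:
        b, s = combine(u)
        beliefs[u] = b / (b + s)
    return beliefs


# Toy graph: a benign triangle (a, b, c), a Sybil triangle (x, y, z),
# and a single attack edge c-x joining the two regions.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]
priors = {"a": 0.9, "b": 0.9, "y": 0.1, "z": 0.1}  # a few labeled accounts
beliefs = loopy_bp(edges, priors)
```

In this toy setting the unlabeled node `c` ends up with a belief above 0.5 and `x` with a belief below 0.5. The single attack edge carries less evidence than the links inside each region, which mirrors the intuition that trust relationships between Sybil and benign accounts are scarce.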