Conference 2015
Top image

 
Home
Program LNMB Conference
Invited Speakers LNMB Conference
Program PhD presentations
Abstracts PhD presentations
Registration LNMB Conference
Announcement NGB/LNMB Seminar
Abstracts/Bios NGB/LNMB Seminar
Registration NGB/LNMB Seminar
Registered Participants
Conference Office
How to get there
 
Return to LNMB Site
 

Alexandre Proutiere: Community Detection via Random and Adaptive Sampling

Abstract: Extracting structures or communities in networks is a central task in many disciplines including social sciences, biology, computer science, statistics, and physics. Applications are numerous. For instance, in social networks, one hopes that identifying clusters of users provides fundamental insights into the way users behave and interact, and in turn, helps the design of efficient recommender systems or the development of other marketing and advertisement techniques. Naturally, a user attached to a particular community interacts a lot more with users within this cluster than with other users. Most methods for community detection assume that user interactions can be represented as a graph whose edges represent user pairs known to interact. This graph is first extracted from observed pairwise interactions and then partitioned into communities. Hence in most studies, the process of gathering information on users and the extraction of communities are decoupled. In this lecture, we explore a more general framework where strategies to gather information, i.e., sampling schemes, and clustering algorithms can be designed jointly. The interaction between pairs of nodes may be sampled from a large available data set, which allows a given pair of users to be sampled several times. For a given budget of user pair samples or observations, we wish to jointly design a sampling strategy (the sequence of sampled user pairs) and a clustering algorithm that recovers the hidden communities with the highest possible accuracy, i.e., the proportion of misclassified users has to be minimized. We investigate two classes of sampling strategies: (i) non-adaptive random strategies where the sequence of observed user pairs is decided a priori, and (ii) adaptive strategies under which the user pair sampled next depends on the previously sampled pairs, and the corresponding outcomes. For both classes of sampling strategy, we identify fundamental performance limits satisfied by any joint sampling and clustering algorithm, and also devise simple algorithms that approach these limits. We finally present streaming memory-limited versions of these algorithms.