CTRL+Z: Recovering Anonymized Social Graphs

Zhang, Yang; Humbert, Mathias; Surma, Bartlomiej; Manoharan, Praveen; Vreeken, Jilles; Backes, Michael

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Paper

CTRL+Z: Recovering Anonymized Social Graphs

MPS-Authors

/persons/resource/persons79525

Vreeken, Jilles
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resource

No external resources are shared

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

arXiv:1711.05441.pdf
(Preprint), 4MB

Supplementary Material (public)

There is no public supplementary material available

Citation

Zhang, Y., Humbert, M., Surma, B., Manoharan, P., Vreeken, J., & Backes, M. (2017). CTRL+Z: Recovering Anonymized Social Graphs. Retrieved from http://arxiv.org/abs/1711.05441.

Cite as: https://hdl.handle.net/21.11116/0000-0000-6463-0

Abstract

Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods.