PRIVACY LEAKAGE IN HEALTH SOCIAL NETWORKS

Authors


Abstract

Members of health social networks may be susceptible to privacy leaks because of the amount of information they leave behind. The threat to privacy increases when members of these networks reuse their pseudonyms in other social networks. Quantitative estimates are needed to evaluate the magnitude of the risk of re-identifying users from such networks. These estimates will enable managers and members of health social communities to take corrective measures. We introduce a new re-identification attack, the social network attack, that takes advantage of the fact that users reuse their pseudonyms. To demonstrate the attack, we established links between MedHelp and Twitter (two popular social networks) based on matching pseudonyms. We used Bayesian networks to model the re-identification risk and stylometric techniques to identify the strength of the links. On the basis of our model, 7-11.8% of the MedHelp members in the sample population who reused their pseudonyms in Twitter were re-identifiable, compared with 1% of those who did not. The risk estimates were measured at the 5% risk threshold. Our model was able to re-identify users with a sensitivity of 41% and a specificity of 96%. The potential for re-identification increases as more data is accumulated from these profiles, which makes the threat of re-identification more serious.
