Analyzing group E-mail exchange to detect data leakage



Organizations today spend a great deal of time and effort on preventing e-mail leakage. However, existing solutions remain unsatisfactory: some addressing mistakes go undetected, and in some cases correct recipients are wrongly flagged as potential mistakes. In this article we present a new approach for preventing e-mail addressing mistakes in organizations. The approach is based on analyzing e-mail exchanges among members of an organization and identifying groups based on common topics. When a new e-mail is about to be sent, each recipient is analyzed. A recipient is approved if the e-mail's content belongs to at least one topic common to both the sender and the recipient. This check can be applied even if the sender and recipient have never communicated directly before. The new approach was evaluated using the Enron e-mail data set and compared with a well-known method for detecting e-mail addressing mistakes. The results show that the proposed approach detects 87% of non-legitimate recipients while incorrectly classifying only 0.5% of the legitimate recipients. These results outperform previous work, which reports a detection rate of 82% without reference to the false positive rate.
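
The approval rule described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the article: the abstract does not say how topics are extracted or how content is matched to a topic, so the keyword-set topics, the `min_overlap` threshold, and the names `TOPICS`, `MEMBERS`, `email_topics`, and `check_recipients` are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical topics, each represented as a set of keywords. This
# representation is an assumption; the abstract does not specify one.
TOPICS: Dict[str, Set[str]] = {
    "trading": {"gas", "price", "contract", "trade"},
    "legal": {"lawsuit", "counsel", "contract", "filing"},
    "hr": {"vacation", "payroll", "benefits"},
}

# Hypothetical topic-group membership, imagined as the result of
# analyzing past e-mail exchanges within the organization.
MEMBERS: Dict[str, Set[str]] = {
    "alice": {"trading", "legal"},
    "bob": {"trading"},
    "carol": {"hr"},
}


def email_topics(text: str, min_overlap: int = 2) -> Set[str]:
    """Return the topics whose keywords overlap the e-mail text enough."""
    words = set(text.lower().split())
    return {
        name
        for name, keywords in TOPICS.items()
        if len(words & keywords) >= min_overlap
    }


def check_recipients(sender: str, recipients: List[str], text: str) -> Dict[str, bool]:
    """Approve a recipient if the e-mail belongs to at least one topic
    shared by the sender and that recipient."""
    content = email_topics(text)
    sender_topics = MEMBERS.get(sender, set())
    return {
        r: bool(content & sender_topics & MEMBERS.get(r, set()))
        for r in recipients
    }


result = check_recipients(
    "alice", ["bob", "carol"], "the gas price in the new contract"
)
# bob shares the "trading" topic with alice and the text matches it;
# carol shares no topic with alice, so she is flagged for review.
```

Because the check relies on shared topic membership rather than on a history of messages between the two users, the sketch can approve a recipient the sender has never written to directly, which mirrors the property claimed above.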