A crime reports analysis system to identify related crimes



The popularity of online and anonymous options for reporting crimes, such as tip websites and text messaging, has increased the amount of textual information available to law enforcement personnel. However, locating, filtering, extracting, and combining this information to solve crimes is time-consuming. In response, we are developing entity and document similarity algorithms that automatically identify overlapping and complementary information. These algorithms are essential components of systems that combine and contrast crime information. The entity similarity algorithm integrates a domain-specific hierarchical lexicon with Jaccard coefficients. The document similarity algorithm combines the entity similarity scores using a Dice coefficient. We describe the evaluation of both components. To evaluate the entity similarity algorithm, we compared it and four generic algorithms against a gold standard. Our entity similarity algorithm showed the strongest correlation with the gold standard (r = 0.710). To evaluate the document similarity algorithm, we first developed a test bed containing witness reports for 17 crimes shown in video clips. We evaluated five versions of the algorithm that differ in the importance assigned to different entity types. Cosine similarity served as a baseline for comparing the versions' accuracy in recognizing reports that describe the same crime and distinguishing them from reports on different crimes. The best version achieved 92% accuracy.
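The abstract names three measures: Jaccard coefficients over a hierarchical lexicon for entities, a Dice coefficient for combining entity scores into a document score, and cosine similarity as a baseline. The sketch below is one possible reading of how these pieces could fit together. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy child-to-parent `LEXICON`, the choice to compare terms by the Jaccard coefficient of their ancestor sets, the best-match "soft Dice" combination in `document_similarity`, the hypothetical per-entity-type `weights` (standing in for the varying importance of entity types), and the plain bag-of-words `cosine_similarity` baseline.

```python
from collections import Counter
import math

# Hypothetical toy lexicon (child -> parent); NOT the authors' domain lexicon.
# Must be acyclic, otherwise ancestors() would loop forever.
LEXICON = {
    "pistol": "handgun",
    "revolver": "handgun",
    "handgun": "firearm",
    "rifle": "firearm",
    "firearm": "weapon",
    "knife": "blade",
    "blade": "weapon",
}

def ancestors(term):
    """Return the term plus every ancestor reachable in LEXICON."""
    chain = {term}
    while term in LEXICON:
        term = LEXICON[term]
        chain.add(term)
    return chain

def entity_similarity(a, b):
    """Assumed scheme: Jaccard coefficient of the two terms' ancestor sets."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)

def document_similarity(doc_a, doc_b, weights=None):
    """Soft Dice score: 2|A∩B| / (|A| + |B|), where each entity contributes
    its best entity_similarity against same-type entities in the other report.

    doc_a, doc_b: lists of (entity_type, term) pairs.
    weights: hypothetical importance per entity type (default 1.0).
    """
    weights = weights or {}

    def weight(etype):
        return weights.get(etype, 1.0)

    def matched(src, dst):
        # Sum of weighted best-match scores for each entity in src.
        total = 0.0
        for etype, term in src:
            scores = [entity_similarity(term, t2) for et2, t2 in dst if et2 == etype]
            total += weight(etype) * max(scores, default=0.0)
        return total

    denom = sum(weight(t) for t, _ in doc_a) + sum(weight(t) for t, _ in doc_b)
    if denom == 0:
        return 0.0
    # Matching in both directions plays the role of the "2 x overlap" numerator.
    return (matched(doc_a, doc_b) + matched(doc_b, doc_a)) / denom

def cosine_similarity(text_a, text_b):
    """Baseline: cosine of raw term-frequency vectors of whitespace tokens."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[k] * vb[k] for k in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

r1 = [("weapon", "pistol"), ("vehicle", "sedan")]
r2 = [("weapon", "revolver"), ("vehicle", "sedan")]
print(entity_similarity("pistol", "revolver"))  # 3 shared of 5 -> 0.6
print(document_similarity(r1, r2))              # (1.6 + 1.6) / 4 -> about 0.8
print(document_similarity(r1, r2, {"weapon": 2.0}))
print(cosine_similarity("red sedan fled north", "a red sedan drove north"))
```

In this reading, "pistol" and "revolver" share the ancestors "handgun", "firearm", and "weapon", so they score well above "pistol" and "knife", which share only "weapon". Raising a type's weight makes partial matches on that type count for more, which is one way the five algorithm versions could differ.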