Mitigating High Latency Outliers for Cloud-Based Telecommunication Services



Telecommunication applications are distinguished by their stringent requirements for availability and completion times. A highly available, low-latency, distributed data store is therefore a critical component of cloud-based realizations of telecommunication services. We present a systematic experimental evaluation of state-of-the-art database systems as components of telecommunication applications. We show that while their average latencies are well within the required time scales, the latency distribution exhibits a long tail of unacceptably large outliers, which may significantly impair the ability of telecommunication applications to meet their performance requirements. To address these high latency outliers, we present a new solution, implemented in a Bell Labs system code-named Flurry. Flurry uses the first response from a replica rather than waiting for responses from all replicas or from a quorum. To handle incorrect responses arising from message losses, Flurry uses a novel checking algorithm based on vector clocks to determine the correctness of a replica's response. Experimental results show that Flurry significantly reduces both the average response time and the probability of unacceptable response times, to values that would allow meeting the availability and completion time thresholds required for telecommunication services. © 2012 Alcatel-Lucent.
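The abstract does not spell out Flurry's checking algorithm, so the following is only an illustrative sketch of the general idea: query all replicas in parallel, take responses in arrival order, and accept the first one whose vector clock is at least as recent as a clock the client expects. The names `dominates`, `first_valid_response`, and `expected_clock`, the acceptance rule itself, and the two demo replicas are assumptions made for this example and are not taken from Flurry.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def dominates(a, b):
    """True if vector clock a is >= vector clock b in every component.

    Clocks are dicts mapping a node id to a counter; a missing entry counts as 0.
    """
    return all(a.get(k, 0) >= b.get(k, 0) for k in set(a) | set(b))


def first_valid_response(replicas, key, expected_clock):
    """Return the first (value, clock) whose clock dominates expected_clock.

    replicas is a list of callables taking a key and returning (value, clock).
    Hypothetical acceptance rule: a response is treated as correct only if its
    clock is not older than what the client already knows. Returns None if no
    replica passes the check.
    """
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replica, key) for replica in replicas]
        for future in as_completed(futures):  # yields in completion order
            try:
                value, clock = future.result()
            except Exception:
                continue  # a failed replica is treated like a missing response
            if dominates(clock, expected_clock):
                return value, clock
        return None
    finally:
        # Do not block the caller on replicas that have not answered yet.
        pool.shutdown(wait=False)


# Demo replicas (hypothetical): a fast replica that missed an update,
# and a slower replica that holds the latest write.
def fast_stale_replica(key):
    return "old", {"a": 1}


def slow_fresh_replica(key):
    time.sleep(0.05)
    return "new", {"a": 2}


result = first_valid_response(
    [fast_stale_replica, slow_fresh_replica], "subscriber-42", {"a": 2}
)
```

In this sketch the stale reply arrives first but is rejected by the clock check, so the caller falls back to the next reply instead of waiting for a full quorum. When the first reply is already current, it is returned immediately, which is where the latency benefit would come from.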