Summary of "The Tail at Scale" Paper

Every day we interact with many web services such as Google, Netflix, and Instagram. Google, for example, returns search results almost instantly while processing terabytes of data spread across many servers. We enjoy using such services because they are responsive. But it is challenging for service providers to keep tail latency short as the size and complexity of the system scale up or as overall use increases.

What is Tail Latency?


Why is it important to reduce Tail Latency?

Just as fault-tolerant computing aims to create a reliable whole out of less-reliable parts, large online services need to create a predictably responsive whole out of less-predictable parts.
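To see why less-predictable parts matter at scale, here is a minimal sketch of the arithmetic. It assumes each server is slow independently with the same probability, and that a request must wait for every server it fans out to. The function name `prob_slow_request` and the numbers (a 1% chance of a slow server, fan-outs of 1, 10, and 100) are illustrative choices, not measurements.

```python
def prob_slow_request(p_slow_server: float, fanout: int) -> float:
    """Probability that at least one of `fanout` parallel calls is slow.

    Assumes servers are slow independently of each other, which is a
    simplification made for this sketch.
    """
    return 1.0 - (1.0 - p_slow_server) ** fanout


if __name__ == "__main__":
    # Illustrative numbers: each server is slow 1% of the time.
    for fanout in (1, 10, 100):
        print(fanout, round(prob_slow_request(0.01, fanout), 3))
    # prints:
    # 1 0.01
    # 10 0.096
    # 100 0.634
```

Under these assumptions, a delay that affects only 1 in 100 calls to a single server affects roughly 63% of requests once each request touches 100 servers, which is why a rare slow response on one machine becomes a common slow response for the whole service.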

Why Does Latency Variability Exist?

Reducing Component Variability

While effective caching layers can be useful, even a necessity in some systems, they do not directly address tail latency.

Living with Latency Variability

Cross-Request Long-Term Adaptations

Within Request Immediate-Response (Short-Term) Adaptations

Large Information Retrieval Systems



This paper explains why tail latency matters as the size and complexity of a system scale up, describes the causes of latency variability, and presents proven tail-tolerant techniques for reducing the overall latency of the system.