Introduction to Rate Limiting
Have you ever wondered how some websites protect themselves from DDoS attacks, or why you get status code HTTP 429 in response to calling some API (e.g., the GitHub API) too many times? The mechanism behind both is rate limiting.
What is Rate Limiting?
Rate limiting is used to control the amount of incoming traffic to a service or an API. For example, an API might limit each user to 100 requests per hour using a rate limiter. If you send more than 100 requests within that hour, you will receive a TooManyRequests error. This way it controls incoming traffic and helps mitigate DDoS attacks.
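On the wire, a rejected call typically looks something like this (the Retry-After header and the response body are illustrative; each API formats them differently):

HTTP/1.1 429 Too Many Requests
Retry-After: 3600
Content-Type: application/json

{"message": "rate limit exceeded, retry after an hour"}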
Types of Rate Limiting
- Server-level rate limiting
- Server-level rate limiting can be done using a server such as Nginx (see the sketch after this list).
- By running Nginx in front of our service, we can control the traffic based on the incoming URL or IP address.
- You can learn more about it here.
- API-level rate limiting
- API-level rate limiting can be achieved by writing a middleware on top of the API endpoints.
- Every request is inspected by the middleware, which decides whether the request should be processed or rejected.
- If the request is rejected by the middleware, the API responds with status code HTTP 429.
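For the server-level flavor, a minimal Nginx sketch looks like the following (the zone name, size, and rate are illustrative, not from the original post):

# nginx.conf excerpt
http {
    # Track clients by IP; allow 10 requests per second each
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

    server {
        listen 80;
        location / {
            limit_req zone=mylimit;
            # Respond with 429 instead of the default 503 on rejection
            limit_req_status 429;
            proxy_pass http://localhost:8888;
        }
    }
}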
In this post, we will discuss API-level rate limiting in Go.
API-Level Rate Limiting
Let’s understand API-level rate limiting by implementing a simple rate limiter, as middleware on an API endpoint, that accepts one request per second.
// main.go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Create a new HTTP multiplexer
	mux := http.NewServeMux()
	// Wrap helloHandler with the ratelimit middleware
	mux.HandleFunc("/hello", ratelimit(helloHandler))
	fmt.Println("Server listening at :8888...")
	http.ListenAndServe(":8888", mux)
}

func helloHandler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("Hello World!\n"))
}
The above code snippet listens at the endpoint localhost:8888/hello. We limit the number of incoming calls to the endpoint by wrapping helloHandler in the ratelimit middleware, which decides whether each incoming request should be processed or rejected.
Let’s implement ratelimit using the package golang.org/x/time/rate. This package internally implements the token bucket algorithm, which we will discuss in the next section.
// RateLimiter.go
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// Global limiter: refills at 1 token per second, with a burst of 1
var limiter = rate.NewLimiter(1, 1)

func ratelimit(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Check whether the request should be processed or rejected
		if !limiter.Allow() {
			http.Error(w, "Limit on number of requests has been exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	}
}
We first create a global Limiter instance by calling NewLimiter with rate and burst both equal to one. The ratelimit method accepts helloHandler as a parameter and returns a new http.HandlerFunc. Inside the returned handler, limiter.Allow() checks whether the request should be processed. If not, it responds with status code http.StatusTooManyRequests; otherwise the request proceeds to helloHandler.
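Allow is not the only option x/time/rate offers. If you would rather queue a caller than reject it outright, the limiter's Wait method blocks until a token is available. A minimal sketch of such a variant (the handler name is my own, not from the original post; it slots into the same file and package as ratelimit above):

// ratelimitWait queues requests instead of rejecting them.
func ratelimitWait(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Wait blocks until a token is available or the
		// request's context is cancelled.
		if err := limiter.Wait(r.Context()); err != nil {
			http.Error(w, err.Error(), http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	}
}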
For example, I ran the above code locally and sent two requests concurrently. One of them failed with a 429 Too Many Requests error, which is what we expected.
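If you want to reproduce this yourself, a small client like the following (my own sketch, assuming the server above is running on :8888) sends two concurrent requests and prints the resulting status codes:

// client.go - run separately while the server is up
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Get("http://localhost:8888/hello")
			if err != nil {
				fmt.Println(err)
				return
			}
			defer resp.Body.Close()
			// Expect one "200 OK" and one "429 Too Many Requests"
			fmt.Println(resp.Status)
		}()
	}
	wg.Wait()
}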
How Does the Token Bucket Algorithm Work?
- Let’s define a struct TokenBucket with Rate, Burst and Available as fields.
// Burst is the maximum number of tokens the bucket can hold.
// Rate is the number of tokens added to the bucket per second.
// Available is the number of tokens currently available in the bucket.
type TokenBucket struct {
	Rate      int
	Burst     int
	Available int
	mu        sync.RWMutex
}
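The final example below calls a NewTokenBucket constructor that the post does not show; a plausible sketch (my assumption) initializes a full bucket and starts refilling it in the background:

// NewTokenBucket creates a bucket and starts refilling it.
func NewTokenBucket(rate, burst int) *TokenBucket {
	tokenBucket := &TokenBucket{
		Rate:      rate,
		Burst:     burst,
		Available: burst, // start with a full bucket
	}
	// Refill tokens in the background (Fill is defined below)
	go tokenBucket.Fill()
	return tokenBucket
}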
The Fill method adds Rate tokens to the bucket every second:
func (tokenBucket *TokenBucket) Fill() {
	ticker := time.NewTicker(time.Second)
	for range ticker.C {
		tokenBucket.mu.Lock()
		// Add Rate tokens, but never exceed the bucket's capacity (Burst)
		tokenBucket.Available = min(tokenBucket.Burst, tokenBucket.Available+tokenBucket.Rate)
		tokenBucket.mu.Unlock()
	}
}
The Take method decrements the available token count and returns true if a token was available:
func (tokenBucket *TokenBucket) Take() bool {
	tokenBucket.mu.Lock()
	defer tokenBucket.mu.Unlock()
	if tokenBucket.Available > 0 {
		tokenBucket.Available--
		return true
	}
	return false
}
- In the code snippet below, a TokenBucket is created with burst and rate equal to 1, and five goroutines are spawned, each of which tries to take a token.
package main

import (
	"fmt"
	"sync"
)

func main() {
	tokenBucket := NewTokenBucket(1, 1)
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if tokenBucket.Take() {
				fmt.Println("Available")
			} else {
				fmt.Println("Not Available")
			}
		}()
	}
	wg.Wait()
}
When running the above code, the first goroutine to call Take gets the only token, and the remaining goroutines fail because just one token was available in that second.
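A typical run prints something like this (the order of the lines may vary between runs):

Available
Not Available
Not Available
Not Available
Not Available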
The complete implementation of the token bucket algorithm is available here.
More Use Cases
- We have discussed a use case that limits the number of requests globally. What if we want to limit the number of requests per user? (A minimal sketch follows this list.)
- Hint: an in-memory cache, with old entries invalidated over time
- How would you scale the above use case to millions of users?
- Hint: an external cache such as Redis or Memcached
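As a starting point for the first question, here is a minimal sketch (my own, not from the original post) that keeps one rate.Limiter per client in an in-memory map, keyed by the request's remote address:

// RateLimiterPerUser.go - hypothetical extension of RateLimiter.go
package main

import (
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

var (
	// visitors maps a client key (remote address here) to its own limiter.
	visitors = make(map[string]*rate.Limiter)
	visMu    sync.Mutex
)

func getLimiter(key string) *rate.Limiter {
	visMu.Lock()
	defer visMu.Unlock()
	lim, ok := visitors[key]
	if !ok {
		lim = rate.NewLimiter(1, 1) // 1 request per second per client
		visitors[key] = lim
	}
	return lim
}

func ratelimitPerUser(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// r.RemoteAddr is "host:port", so this is only a rough key;
		// a real API would key on an authenticated user ID instead.
		if !getLimiter(r.RemoteAddr).Allow() {
			http.Error(w, "Limit on number of requests has been exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	}
}

A production version would also evict idle entries from the map (the first hint above) so that it does not grow without bound.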
In conclusion, rate limiting helps secure APIs against DDoS attacks and keeps incoming traffic under control.