Rate Limiting protects a service or system from excessive load caused by malicious or misconfigured actors, while allowing well-behaved clients a fair share of the system's resources.
A publicly available system can be overwhelmed by client requests, causing degraded performance or outages. Misconfigured or malicious clients may issue excessive requests, using up the available resources and impacting availability for other clients.
Clients should be able to expect a level of availability that meets a stated service level agreement, or to receive an equal share of resources with other clients when performance degrades.
Use rate limiting to promise and enforce a request limit over time intervals for each client. Communicate the client’s limit, remaining allocation, and time before reset via standard means, such as HTTP headers.
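One common convention (assumed here for illustration; header names vary between APIs, and the IETF RateLimit header fields draft standardizes similar semantics) is to return the limit metadata on every response:

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1712345678
```

Here `X-RateLimit-Reset` is typically either a Unix timestamp or the number of seconds until the allocation refills; a `429 Too Many Requests` response with a `Retry-After` header is the usual signal once the limit is exhausted.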
Define different classes of rate-limiting, depending on the service, to encourage clients to authenticate, allowing for better auditing and accounting in the system. As trust in a client increases, increase their request limit. Anonymous clients should receive the lowest limits. Authenticated clients should receive a higher limit. Administrative clients should receive the highest limits.
If some resources are disproportionately expensive, use cost-based rate limiting. Requests that cost the system more in compute or other resources use more of a client's allocated limit. This increases fairness among clients and reduces the attack surface for a denial of service.
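Cost-based limiting can be sketched by giving each client a budget of cost units per window rather than a raw request count. The endpoint costs and class below are hypothetical, single-process illustrations, not a production implementation:

```python
import time

# Hypothetical per-endpoint costs; tune these to measured resource usage.
ENDPOINT_COSTS = {"/search": 10, "/items": 1}

class CostBasedLimiter:
    """Grants each client `budget` cost units per fixed window.
    Expensive requests deduct more, so heavy endpoints throttle sooner."""

    def __init__(self, budget, window_seconds, clock=time.time):
        self.budget = budget
        self.window = window_seconds
        self.clock = clock          # injectable clock for testing
        self.spent = {}             # (client, window index) -> units spent

    def allow(self, client, path):
        cost = ENDPOINT_COSTS.get(path, 1)
        key = (client, int(self.clock() // self.window))
        if self.spent.get(key, 0) + cost > self.budget:
            return False
        self.spent[key] = self.spent.get(key, 0) + cost
        return True
```

With a budget of 10 units, a single `/search` request uses the whole allocation, while `/items` requests could be made ten times in the same window.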
Two common implementations of rate limiting are the fixed window and the token bucket.
In the fixed window algorithm, a client is allocated a maximum of n requests for every fixed size time window, with a known boundary. Once the client has performed n requests, they must wait until the end of the window before their request allotment is refilled.
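The fixed window algorithm can be sketched in a few lines of Python. This is an in-memory, single-process sketch; sharing state across application instances would require an external store:

```python
import time

class FixedWindowLimiter:
    """Allows at most `limit` requests per client in each fixed window
    of `window_seconds`, with windows aligned to fixed boundaries."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock          # injectable clock for testing
        self.counts = {}            # (client, window start) -> request count

    def allow(self, client):
        window_start = int(self.clock() // self.window) * self.window
        key = (client, window_start)
        if self.counts.get(key, 0) >= self.limit:
            return False
        self.counts[key] = self.counts.get(key, 0) + 1
        return True
```

Note the known boundary: a client denied late in one window regains its full allotment the moment the next window starts, which is easy to communicate but permits a burst of up to 2n requests straddling the boundary.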
The token bucket algorithm (similar to the leaky bucket algorithm) grants each client a bucket of at most n tokens. Each request uses 1 token. The bucket’s tokens are refilled at a rate of m per interval t. Once the client has emptied their bucket, they must only wait for the interval t to pass before they can perform another m requests.
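A token bucket can likewise be sketched in-memory (again a single-process illustration, with a continuous refill rate in tokens per second rather than a discrete refill of m per interval):

```python
import time

class TokenBucketLimiter:
    """Grants each client a bucket of `capacity` tokens, refilled at
    `refill_rate` tokens per second. Each request spends one token, so
    unused capacity allows short bursts."""

    def __init__(self, capacity, refill_rate, clock=time.time):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock          # injectable clock for testing
        self.buckets = {}           # client -> (tokens, last refill time)

    def allow(self, client):
        now = self.clock()
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill lazily, based on time elapsed since the last request.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens < 1:
            self.buckets[client] = (tokens, now)
            return False
        self.buckets[client] = (tokens - 1, now)
        return True
```

The lazy refill avoids a background timer: each request tops up the bucket from the elapsed time before deciding whether a token remains.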
The fixed window is conceptually simpler for clients to reason about and respond to. The token bucket accommodates clients that burst requests.
With its low latency and data structures that support expiry and sorting, Redis is a popular choice for implementing rate limiting shared amongst application instances.
Prefab.cloud supports multiple rate-limiting algorithms as a service.
In addition to its content acceleration and security features, Cloudflare offers a rate-limiting product for an additional fee.
Load balancers like HAProxy can provide some rate limiting. As they operate below the application layer, a load balancer may lack the context needed to perform fine-grained limiting based on resources or users.