Rate Limiting, Circuit Breakers, and Retry Storms: Building Resilient APIs
Your API handles 1,000 requests/second smoothly. Then a downstream service goes down, clients start retrying, and suddenly you're drowning in 50,000 requests/second. Here's how to prevent cascade failures.
The Retry Storm That Took Down Production
Here's what happened to a real system: a database replica went down for 2 minutes. The API started returning 500 errors. Every client had the same retry logic: retry 3 times with no backoff. 1,000 failing requests/second × 3 retries each meant an extra 3,000 requests/second hammering an already struggling server. The primary database couldn't handle the spike. Now everything was down.
This is called a retry storm, and it's one of the most common causes of cascading failures in distributed systems.
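The amplification gets worse when retries compound across layers of a call chain. Here's a back-of-the-envelope sketch (the layer counts are hypothetical, not from the incident above): each layer that retries N times multiplies traffic by N + 1.

```python
# Toy model of retry amplification: when every layer in a call chain
# retries N times with no backoff, one user request can fan out into
# (N + 1) ** depth attempts at the bottom layer.
def retry_amplification(retries_per_layer: int, depth: int) -> int:
    """Worst-case number of attempts reaching the deepest service."""
    return (retries_per_layer + 1) ** depth

# One layer of 3 retries quadruples traffic; three layers multiply it 64x,
# enough to turn a 1,000 req/s baseline into tens of thousands of req/s.
for depth in (1, 2, 3):
    print(depth, retry_amplification(3, depth))
```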
Fix 1: Exponential Backoff with Jitter
Never retry immediately. Use exponential backoff (wait longer each retry) with random jitter (prevent all clients retrying at the same instant):
```python
import random
import time

import httpx


def fetch_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = httpx.get(url, timeout=5.0)
            response.raise_for_status()
            return response.json()
        except (httpx.HTTPStatusError, httpx.TimeoutException):
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller handle the failure
            # Exponential backoff: base delays of 1s, 2s, 4s
            base_delay = 2 ** attempt
            # Jitter: randomize so clients don't all retry at the same instant
            jitter = random.uniform(0, base_delay)
            delay = base_delay + jitter
            print(f"Retry {attempt + 1}/{max_retries} in {delay:.1f}s")
            time.sleep(delay)
```

Fix 2: Circuit Breaker Pattern
A circuit breaker tracks failures. After a threshold (e.g., 5 failures in 30 seconds), it 'opens' and immediately returns errors without calling the downstream service. After a cooldown period, it lets one request through to test if the service recovered.
```python
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = 'closed'        # normal operation
    OPEN = 'open'            # failing, reject all requests
    HALF_OPEN = 'half_open'  # testing recovery


class CircuitBreakerOpen(Exception):
    """Raised when the breaker is open and the call is rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN  # let one request through
            else:
                raise CircuitBreakerOpen("Service unavailable")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN


# Usage (inside a view that charges a payment):
payment_breaker = CircuitBreaker(failure_threshold=5)

try:
    result = payment_breaker.call(stripe.PaymentIntent.create, amount=1000)
except CircuitBreakerOpen:
    return Response(
        {'error': 'Payment service temporarily unavailable'},
        status=503,
    )
```

Fix 3: Server-Side Rate Limiting
Protect your API from both legitimate overuse and malicious abuse. Use a sliding window rate limiter backed by Redis:
```python
# Django middleware: Redis-backed sliding window rate limiter
import time
import uuid

import redis
from django.http import JsonResponse


class RateLimitMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        self.redis = redis.Redis(host='redis', port=6379)

    def __call__(self, request):
        # Behind a proxy, take the first hop in X-Forwarded-For
        forwarded = request.META.get('HTTP_X_FORWARDED_FOR')
        client_ip = (forwarded.split(',')[0].strip() if forwarded
                     else request.META['REMOTE_ADDR'])
        key = f'rate_limit:{client_ip}'

        # Sliding window: 100 requests per 60 seconds
        now = time.time()
        pipe = self.redis.pipeline()
        pipe.zremrangebyscore(key, 0, now - 60)   # drop entries older than 60s
        pipe.zadd(key, {str(uuid.uuid4()): now})  # unique member, scored by time
        pipe.zcard(key)                           # count requests in the window
        pipe.expire(key, 60)                      # auto-cleanup of idle keys
        _, _, request_count, _ = pipe.execute()

        if request_count > 100:
            return JsonResponse(
                {'error': 'Rate limit exceeded. Try again in 60s.'},
                status=429,
                headers={'Retry-After': '60'},
            )
        return self.get_response(request)
```

The Resilience Checklist
Client-side:
- Exponential backoff + jitter on all retries.
- Set request timeouts (never wait forever).
- Implement circuit breakers for each external dependency.

Server-side:
- Rate limiting per IP and per API key.
- Graceful degradation (return cached data when the database is slow).
- Health check endpoints for load balancers.
- Bulkhead pattern (isolate critical services from non-critical ones).

Infrastructure:
- Auto-scaling based on request queue depth, not just CPU.
- Separate read replicas for read-heavy endpoints.
- CDN for static assets and cacheable API responses.
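The bulkhead pattern from the checklist can be sketched with a bounded semaphore: cap how many concurrent calls one dependency may hold so a slow, non-critical service can't exhaust the whole worker pool. This is a minimal illustration; the `BulkheadFull` name and the pool sizes are made up for the example.

```python
import threading


class BulkheadFull(Exception):
    """Raised when a dependency's slot pool is exhausted (name is illustrative)."""


class Bulkhead:
    """Limit concurrent in-flight calls to a single dependency."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Fail fast instead of queueing: a full pool means the
        # dependency is already saturated.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("No capacity left for this dependency")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


# Hypothetical sizing: critical payment calls get 20 slots, while a
# non-critical recommendations service gets only 5, so its slowdowns
# can never starve payments of worker threads.
payments_bulkhead = Bulkhead(max_concurrent=20)
recommendations_bulkhead = Bulkhead(max_concurrent=5)
```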
"Everything fails, all the time," as Amazon CTO Werner Vogels puts it. The question isn't whether your dependencies will go down, but whether your system handles it gracefully when they do.
