Why Rate Limiting Matters
In 2025, 73% of public APIs experienced some form of abuse—ranging from credential stuffing to runaway automation scripts that burned through millions of tokens in hours. Rate limiting is no longer optional infrastructure; it is a core reliability, security, and cost-control mechanism.
Without rate limiting, a single misconfigured client can exhaust your database connection pool, inflate your cloud bill by orders of magnitude, and deny service to every legitimate user. Here are four real-world incidents that illustrate why this matters at every scale.
A single developer published an open-source script that proxied OpenAI requests without any rate limiting on the proxy layer. Within 72 hours, the script was embedded in thousands of projects. The operator's OpenAI account accrued over $5M in usage before the keys were revoked. Per-user and per-IP limits at the proxy layer would have capped damage at under $100.
Attackers fired 40,000 card-validation requests per hour against a merchant's Stripe integration using rotating IPs. Without per-IP rate limits on the merchant's own backend, each attempt hit Stripe and incurred API costs. Stripe's own per-key limits eventually triggered a suspension of the merchant's account.
A newly deployed GitHub Actions workflow had an infinite retry loop on cache miss. During a CI spike, 3,000 parallel jobs each retried the cache API every 500ms. GitHub's rate limiting responded with 429s, but the client code treated 429 as a transient error and retried immediately — amplifying the problem. Exponential backoff with jitter would have self-healed within minutes.
When Twitter migrated from v1 to v2 and tightened rate limits from 900 to 500 requests per 15-minute window, hundreds of production applications that relied on "free" unlimited access collapsed overnight. Teams with proper rate limit header parsing (Retry-After, X-Rate-Limit-Remaining) recovered in hours; others spent days debugging cascading timeouts.
In short, rate limiting serves four distinct roles:
- Traffic protection: caps request volume per IP or key before it saturates your origin.
- Cost control: prevents runaway automation from generating unexpected cloud and API charges.
- Fairness: ensures high-volume users cannot degrade the experience for others.
- SLA compliance: keeps latency and availability within contractual bounds for all tenants.
Rate Limiting Algorithms Compared
Choosing the right algorithm determines whether your API rejects bursts gracefully or allows them intentionally. Each algorithm trades precision, memory usage, and implementation complexity differently.
| Algorithm | Burst Handling | Memory | Precision | Best Use Case |
|---|---|---|---|---|
| Token Bucket | Allows burst up to bucket size | O(1) per key | Approximate | Public APIs, general purpose |
| Sliding Window Log | No burst — precise count | O(N requests in window) | Exact | Strict per-user financial APIs |
| Fixed Window Counter | 2× burst at window boundary | O(1) per key | Low | Simple quota enforcement |
| Leaky Bucket | Rejects excess bursts | O(1) per key | Moderate | Rate-smoothing, queuing systems |
Token Bucket
- Strengths: natural burst allowance mirrors real user behavior; O(1) storage per key; tokens replenish continuously, so there are no sharp window resets.
- Weaknesses: approximate; an attacker can always drain the full bucket in the first millisecond of each refill cycle.
- Best for: the default choice for public REST APIs where short bursts are acceptable (e.g., loading a dashboard that fires 5 requests at once). See the in-memory sketch after this list.

Sliding Window Log
- Strengths: exact request count within any rolling time window; no boundary burst problem; best auditability.
- Weaknesses: stores a timestamp for every request in the window, so memory consumption is proportional to request volume; expensive to query at high traffic.
- Best for: financial APIs, SMS/email sending APIs, any context where you must guarantee exactly N per window with no exceptions.

Fixed Window Counter
- Strengths: simplest possible implementation (a single Redis INCR + EXPIRE); lowest memory and CPU overhead.
- Weaknesses: boundary burst problem: a user can fire N requests at the end of window T and N more at the start of T+1, effectively 2N in a short span.
- Best for: coarse daily/monthly quota enforcement (e.g., "1,000 API calls per day") where sub-minute precision does not matter.

Leaky Bucket
- Strengths: guarantees a smooth, constant output rate regardless of input bursts; ideal for protecting downstream services from spikes.
- Weaknesses: bursts are rejected rather than queued, which frustrates legitimate users with bursty patterns; harder for end users to reason about.
- Best for: upstream rate-smoothing before hitting a downstream service with strict capacity (e.g., a payment processor or external partner API).
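To make the token bucket mechanics concrete, here is a minimal in-memory sketch; the class and parameter names are illustrative, not from any specific library, and the Redis section below uses simpler counter-based patterns for the same purpose.

// Minimal in-memory token bucket (illustrative sketch)
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;           // max burst size
    this.tokens = capacity;             // start full
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  tryRemoveToken() {
    // Refill continuously based on elapsed time, so there is no window reset
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // allowed
    }
    return false;   // rate limited
  }
}

// Usage: ~100 requests/minute steady state, bursts of up to 20
const bucket = new TokenBucket(20, 100 / 60);
if (!bucket.tryRemoveToken()) {
  // reject with 429
}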
Redis-Based Rate Limiting Implementation
Redis is the de facto standard for production rate limiting because it provides sub-millisecond latency, atomic operations via Lua scripts, and native TTL support. Here are the key patterns you need.
Fixed Window Counter — INCR + EXPIRE
Note: this pattern is often mislabeled a token bucket, but a plain INCR + EXPIRE counter is the fixed window algorithm from the comparison above.
// Node.js — fixed window counter via Redis INCR + EXPIRE
async function isAllowed(redis, key, limit, windowSeconds) {
const current = await redis.incr(key);
if (current === 1) {
// First request in window — set TTL
await redis.expire(key, windowSeconds);
}
return current <= limit;
}
// Usage
const allowed = await isAllowed(redis, `rl:${userId}`, 100, 60);
if (!allowed) {
res.status(429).json({ error: 'Rate limit exceeded', retryAfter: 60 });
return;
}

Sliding Window Log — ZADD + ZREMRANGEBYSCORE
// Sliding Window Log — stores timestamp per request as sorted set
async function slidingWindowCheck(redis, key, limit, windowMs) {
const now = Date.now();
const windowStart = now - windowMs;
const pipeline = redis.pipeline();
// Remove timestamps outside the window
pipeline.zremrangebyscore(key, '-inf', windowStart);
// Add current request timestamp
pipeline.zadd(key, now, `${now}-${Math.random()}`);
// Count requests in window
pipeline.zcard(key);
// Set TTL so keys self-clean
pipeline.expire(key, Math.ceil(windowMs / 1000) + 1);
const results = await pipeline.exec();
// ioredis-style pipelines return [err, value] pairs; index 2 is the ZCARD reply
const requestCount = results[2][1];
return requestCount <= limit;
}

Atomic Lua Script (Race-Condition Safe)
The INCR + EXPIRE pattern above has a tiny race window between the two commands. Use a Lua script to make the operation fully atomic — Redis executes Lua scripts as a single unit with no interruption.
-- rate_limit.lua — atomic fixed window counter
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
return 0 -- rate limited
end
local new_val = redis.call('INCR', key)
if new_val == 1 then
redis.call('EXPIRE', key, window)
end
return 1 -- allowed
-- Load and call from Node.js:
-- const result = await redis.eval(luaScript, 1, key, limit, windowSecs);

Using rate-limiter-flexible (npm)
import { RateLimiterRedis } from 'rate-limiter-flexible';
import { createClient } from 'redis';
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
const rateLimiter = new RateLimiterRedis({
storeClient: redisClient,
keyPrefix: 'rl',
points: 100, // max requests
duration: 60, // per 60 seconds
blockDuration: 30, // block for 30s after limit exceeded
});
// Express middleware
export async function rateLimitMiddleware(req, res, next) {
try {
const key = req.user?.id ?? req.ip;
await rateLimiter.consume(key);
next();
} catch (rejRes) {
const secs = Math.ceil(rejRes.msBeforeNext / 1000);
res.set('Retry-After', String(secs));
res.status(429).json({ error: 'Too Many Requests', retryAfter: secs });
}
}

Production notes:
- Redis Cluster vs. single node: use cluster mode for HA. Be aware that a Lua script must run on a single shard, so force the keys it touches into the same slot with hash tags, e.g. rl:{userId}.
- Lua atomicity: Redis executes Lua scripts single-threaded with no preemption. This is your safest option for race-free counters, but keep scripts short to avoid blocking other clients.
- TTL management: always set a TTL. Without it, rate limit keys accumulate indefinitely. Use a TTL slightly longer than your window (window + 1 second) to avoid edge-case expiry during window evaluation.
- Graceful degradation: if Redis is unavailable, decide upfront whether to fail open (allow all traffic) or fail closed (block all traffic). Most APIs fail open with logging to avoid an outage-within-an-outage; see the sketch below.
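A minimal fail-open wrapper, assuming the rate-limiter-flexible setup from above. The type check is the pattern the library documents: consume() rejects with a RateLimiterRes on a limit hit and with the underlying Error on a store failure.

// Fail-open middleware: allow traffic if the limiter backend is down
// (sketch; assumes the rateLimiter instance defined above)
export async function rateLimitFailOpen(req, res, next) {
  const key = req.user?.id ?? req.ip;
  try {
    await rateLimiter.consume(key);
    return next();
  } catch (rejected) {
    if (rejected instanceof Error) {
      // Store failure (e.g., Redis down), not a limit hit: log and fail open
      console.error('rate limiter unavailable, failing open', rejected);
      return next();
    }
    // Genuine limit hit: rejected is a RateLimiterRes
    const secs = Math.ceil(rejected.msBeforeNext / 1000);
    res.set('Retry-After', String(secs));
    res.status(429).json({ error: 'Too Many Requests', retryAfter: secs });
  }
}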
API Gateway Rate Limiting Solutions
For most teams, the right move is to enforce rate limiting at the gateway layer before requests reach application code. Here is how the major platforms compare.
Kong Rate Limiting Plugin (YAML Config)
# kong.yml — declarative config for the rate limiting plugin
plugins:
  - name: rate-limiting
    config:
      minute: 100              # requests per minute
      hour: 2000               # requests per hour
      policy: redis            # local | cluster | redis
      redis_host: redis.internal
      redis_port: 6379
      redis_timeout: 2000
      limit_by: consumer       # ip | consumer | credential | header | path
      hide_client_headers: false
      fault_tolerant: true     # fail open if Redis is down

AWS API Gateway — Usage Plans + Throttling
// AWS CDK — API Gateway Usage Plan with throttling
const plan = api.addUsagePlan('BasicPlan', {
name: 'Basic',
throttle: {
rateLimit: 100, // requests per second (steady state)
burstLimit: 200, // token bucket size (max concurrent burst)
},
quota: {
limit: 10000, // total requests
period: Period.MONTH, // per calendar month
},
});
// Attach an API key to the plan
const key = api.addApiKey('MyApiKey');
plan.addApiKey(key);
// Attach the plan to a stage
plan.addApiStage({ stage: api.deploymentStage });

Cloudflare Rate Limiting
Cloudflare's rule builder lets you create rate limit rules based on IP, User-Agent, ASN, cookie, header value, or request path. Rules fire before the request reaches your origin, making them ideal for DDoS mitigation.
# Cloudflare rate limiting rule (Terraform)
resource "cloudflare_rate_limit" "api" {
  zone_id   = var.zone_id
  threshold = 100
  period    = 60 # seconds

  match {
    request {
      url_pattern = "example.com/api/*"
      schemes     = ["HTTPS"]
      methods     = ["GET", "POST"]
    }
  }

  action {
    mode    = "ban"
    timeout = 300 # ban for 5 minutes
    response {
      content_type = "application/json"
      body         = "{\"error\": \"rate limited\"}"
    }
  }
}

| Solution | Cost | Complexity | Algorithm | Best For |
|---|---|---|---|---|
| Kong | Free (OSS) / Paid | Medium | Sliding Window | Self-hosted microservices |
| AWS API GW | Pay-per-call | Low | Token Bucket | AWS-native serverless APIs |
| Cloudflare | $5/mo + overages | Low | Fixed Window | Edge DDoS protection |
| Nginx limit_req | Free | Medium | Leaky Bucket | Simple single-server setup |
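The table's last row deserves a concrete example: nginx's limit_req module implements the leaky bucket algorithm. A minimal configuration sketch (zone name, rates, and upstream are illustrative):

# nginx.conf — leaky bucket via limit_req (illustrative values)
http {
    # 10 MB shared zone keyed by client IP, draining at 100 requests/minute
    limit_req_zone $binary_remote_addr zone=api:10m rate=100r/m;

    server {
        location /api/ {
            # allow up to 20 excess requests to queue; reject beyond that
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}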
API Authentication Methods Compared
Rate limiting works best when paired with strong authentication — you can only enforce fair use if you know who is making the request. Here is how the four major approaches compare across eight dimensions.
| Factor | API Keys | OAuth 2.0 | JWT | mTLS |
|---|---|---|---|---|
| Complexity | Very Low | High | Medium | High |
| Revocation | Immediate (DB lookup) | Token introspection | Hard (requires blacklist) | Certificate revocation (CRL/OCSP) |
| Expiry | Manual rotation | Short-lived tokens | Built-in (exp claim) | Certificate validity period |
| Use Case | Server-to-server, simple | User-delegated access | Stateless auth at scale | Zero-trust service mesh |
| Client Type | Any | Browser, Mobile, Server | Any | Server-to-server only |
| Server Load | DB lookup per request | Token exchange overhead | Verify signature only | TLS handshake overhead |
| Standards | None (ad-hoc) | RFC 6749 / 6750 | RFC 7519 | RFC 5246 / 8446 |
| Ecosystem | Universal | Excellent (Google, GitHub…) | Excellent (every language) | Strong (service mesh) |
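As a baseline for the first column, a minimal API key check might look like the sketch below. The header name, storage scheme, and db.findKeyByHash helper are assumptions, not a standard; keys are stored hashed so a database leak does not expose live credentials.

// API key middleware (sketch): store only a SHA-256 hash of each key
import crypto from 'crypto';

export async function apiKeyMiddleware(req, res, next) {
  const presented = req.get('X-API-Key');
  if (!presented) {
    return res.status(401).json({ error: 'Missing API key' });
  }
  // Hash the presented key and look the hash up (db.findKeyByHash is hypothetical)
  const hash = crypto.createHash('sha256').update(presented).digest('hex');
  const record = await db.findKeyByHash(hash);
  if (!record || record.revoked) {
    return res.status(401).json({ error: 'Invalid API key' });
  }
  req.apiClient = record.clientId; // use as the rate limit key upstream
  next();
}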
JWT Best Practices
JWTs are everywhere — and widely misused. These are the non-negotiable practices for production JWT systems.
Access tokens should expire in 15 minutes. Refresh tokens can live for 7 days but must be single-use and rotated on every use. If a refresh token is used twice (replay attack), invalidate the entire refresh token family immediately.
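The replay-detection rule can be sketched as follows. This is a minimal illustration: the Redis key layout and the issueTokenPair helper are assumptions of this sketch, not a library API.

// Refresh rotation with replay detection (sketch)
async function rotateRefreshToken(redis, oldToken) {
  const payload = jwt.verify(oldToken, refreshSecret); // throws if expired/invalid
  // Single-use check: has this jti been redeemed before?
  const alreadyUsed = await redis.getset(`rt:used:${payload.jti}`, '1');
  if (alreadyUsed) {
    // Replay detected: revoke the entire token family
    await redis.set(`rt:revoked:${payload.family}`, '1');
    throw new Error('Refresh token replay detected');
  }
  await redis.expire(`rt:used:${payload.jti}`, 7 * 24 * 3600); // keep for the token lifetime
  if (await redis.get(`rt:revoked:${payload.family}`)) {
    throw new Error('Token family revoked');
  }
  // Issue a new access/refresh pair in the same family
  return issueTokenPair(payload.sub, payload.family);
}

Here, issueTokenPair is a hypothetical wrapper around the signing calls shown next.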
// Issuing tokens
const accessToken = jwt.sign(
{ sub: userId, iss: 'api.yourdomain.com', aud: 'app', jti: uuidv4() },
privateKey,
{ algorithm: 'RS256', expiresIn: '15m' }
);
const refreshToken = jwt.sign(
{ sub: userId, family: familyId, jti: uuidv4() },
refreshSecret,
{ expiresIn: '7d' }
);

HS256 vs RS256
HS256 uses a shared secret: every service that needs to verify tokens must hold the secret, creating a wide attack surface. RS256 uses a private key to sign (held only by your auth server) and a public key to verify (distributed freely to all services). Use RS256 in any multi-service architecture.
- HS256: shared secret. All verifiers must know the secret; one compromised service leaks all tokens.
- RS256: private key signs, public key verifies. Services fetch the public key from a JWKS endpoint.
Standard JWT Claims
- iss (Issuer): who created the token (e.g., "auth.yourdomain.com"). Validate this on every request.
- sub (Subject): the user or entity the token represents. Usually your userId.
- aud (Audience): the intended recipient service. Validate to prevent token reuse across services.
- exp (Expiration): Unix timestamp after which the token must be rejected. Always validate.
- iat (Issued At): when the token was created. Useful for detecting clock skew attacks.
- jti (JWT ID): a unique token identifier. Required for implementing token blacklists.

JWT payloads are Base64URL-encoded, not encrypted. Anyone with the token can decode the payload without a key. Never include email addresses, phone numbers, credit card details, or any personally identifiable information. The payload is visible to clients, intermediary proxies, and anyone with access to server logs. Include only a userId and the minimal claims needed for routing and authorization decisions.
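Validating those claims with the jsonwebtoken library looks like this; the key path and the exact issuer/audience values are assumptions matching the signing example above.

// Verifying claims with jsonwebtoken (sketch)
import jwt from 'jsonwebtoken';
import fs from 'fs';

const publicKey = fs.readFileSync('./keys/jwt-public.pem'); // path is illustrative

function verifyAccessToken(token) {
  // verify() checks the signature plus exp, iss, and aud in one call
  return jwt.verify(token, publicKey, {
    algorithms: ['RS256'],          // never accept "none" or an unexpected alg
    issuer: 'api.yourdomain.com',   // must match the iss claim set at signing
    audience: 'app',                // must match the aud claim
  });
}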
OAuth 2.0 Flows — When to Use Each
OAuth 2.0 defines several grant types optimized for different client environments. Choosing the wrong flow is a common security mistake.
Authorization Code (+ PKCE): The user is redirected to the authorization server, authenticates, and is redirected back with an authorization code. The code is exchanged for tokens server-side (or via PKCE for public clients). PKCE (Proof Key for Code Exchange) prevents authorization code interception attacks and is mandatory for mobile and SPA apps.
Client Credentials: Your service authenticates directly with the auth server using its client ID and secret, and receives an access token. There is no user redirect and no consent screen. Use it when your backend needs to call another service's API under its own identity.
Device Flow: The device displays a short code and a URL. The user opens the URL on another device (phone or laptop) and enters the code to authorize. Meanwhile, the device polls the token endpoint until authorization is granted or the code times out.
| Flow | User Present? | Redirect? | PKCE? | Token Storage |
|---|---|---|---|---|
| Auth Code + PKCE | Yes | Yes | Required | HttpOnly cookie or memory |
| Client Credentials | No | No | N/A | Server env variable / vault |
| Device Flow | Yes (on 2nd device) | No | No | Device secure storage |
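For the machine-to-machine case, the token request is a single POST defined by RFC 6749 §4.4. The endpoint URL, env variable names, and scope value here are illustrative:

// Client credentials grant: exchange client ID/secret for an access token
async function getServiceToken() {
  const response = await fetch('https://auth.yourdomain.com/oauth/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'client_credentials',
      client_id: process.env.OAUTH_CLIENT_ID,
      client_secret: process.env.OAUTH_CLIENT_SECRET,
      scope: 'inventory:read', // scope value is illustrative
    }),
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  const { access_token, expires_in } = await response.json();
  return { accessToken: access_token, expiresInSeconds: expires_in };
}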
Rate Limit Headers & Client Handling
Well-behaved APIs communicate rate limit state through standard response headers. This lets clients self-throttle before hitting the limit, dramatically reducing unnecessary 429 responses.
Standard Response Headers
HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 47
RateLimit-Reset: 1742600000
# Legacy X-prefixed variants still emitted by many providers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1742600000
# When rate limited (HTTP 429):
HTTP/1.1 429 Too Many Requests
Retry-After: 32
RateLimit-Policy: 100;w=60;comment="sliding window"
Content-Type: application/json
{
"error": "too_many_requests",
"message": "Rate limit exceeded. Retry after 32 seconds.",
"retryAfter": 32
}

- RateLimit-Limit: the maximum number of requests allowed in the current window. Set this to the limit for the caller's current tier.
- RateLimit-Remaining: requests remaining in the current window. Clients should start throttling themselves when this approaches zero.
- RateLimit-Reset: Unix timestamp (seconds) when the current window resets. Clients can calculate the exact wait time.
- Retry-After: seconds to wait before retrying (defined in RFC 7231; the 429 status it accompanies comes from RFC 6585). Sent with 429 and 503 responses. Takes priority over RateLimit-Reset.
- RateLimit-Policy: new IETF draft header. Describes the rate limit policy as limit;w=window;burst=N. Helps clients understand the algorithm.

Client-Side 429 Handling — Exponential Backoff with Jitter
// Fetch with exponential backoff + jitter
async function fetchWithRetry(url, options = {}, maxRetries = 5) {
let attempt = 0;
while (attempt <= maxRetries) {
const response = await fetch(url, options);
if (response.status !== 429) return response;
const retryAfter = response.headers.get('Retry-After');
const baseWait = retryAfter
? parseInt(retryAfter, 10) * 1000
: Math.min(1000 * 2 ** attempt, 64000);
// Add jitter: +/- 10% of base wait to avoid thundering herd
const jitter = baseWait * 0.2 * (Math.random() - 0.5);
const waitMs = Math.round(baseWait + jitter);
console.warn(`429 received. Waiting ${waitMs}ms before retry ${attempt + 1}`);
await new Promise(resolve => setTimeout(resolve, waitMs));
attempt++;
}
throw new Error(`Max retries exceeded for ${url}`);
}

Security Hardening Techniques
Rate limiting and authentication are the foundation, but production APIs need additional layers. Here are the hardening techniques used at scale.
Sign the request body and key headers using HMAC-SHA256 with a shared secret. The server recomputes the signature and rejects any request where it does not match. This prevents request tampering in transit and replay attacks (use a timestamp in the signed content, reject if timestamp is older than 5 minutes).
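On the receiving side, verification recomputes the HMAC over the same timestamp.body payload, compares in constant time, and enforces the 5-minute freshness window described above. A minimal sketch; the header names match the client helper that follows, and the length guard is an assumption of this sketch:

// Server-side verification for the signing scheme below (sketch)
import crypto from 'crypto';

function verifyRequest(body, headers, secret) {
  const timestamp = headers['x-timestamp'];
  const signature = headers['x-signature']; // format: "sha256=<hex>"
  if (!timestamp || !signature) return false;
  // Reject stale (or future-dated) requests to block replays: 5-minute window
  const ageSeconds = Math.floor(Date.now() / 1000) - Number(timestamp);
  if (ageSeconds > 300 || ageSeconds < -300) return false;
  const payload = timestamp + '.' + JSON.stringify(body);
  const expected = 'sha256=' +
    crypto.createHmac('sha256', secret).update(payload).digest('hex');
  // Constant-time compare; guard lengths first since timingSafeEqual throws on mismatch
  if (signature.length !== expected.length) return false;
  return crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
}

The matching client-side signing helper: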
// HMAC request signing
import crypto from 'crypto';
function signRequest(body, secret) {
const timestamp = Math.floor(Date.now() / 1000).toString();
const payload = timestamp + '.' + JSON.stringify(body);
const sig = crypto.createHmac('sha256', secret).update(payload).digest('hex');
return { 'X-Timestamp': timestamp, 'X-Signature': `sha256=${sig}` };
}

Internal APIs (metrics endpoints, admin APIs, database management interfaces) should only be reachable from known IP ranges. Use your cloud provider's network controls (AWS security groups, GCP firewall rules, Cloudflare IP Access Rules) to allowlist your office IPs, VPN exit nodes, and deployment server IPs. Layer this with authentication; never rely solely on IP allowlisting, as source addresses can be spoofed in certain network configurations.
Mutual TLS means both the client and server authenticate each other with certificates — not just the server proving identity to the client. In a Kubernetes environment, Istio or Linkerd can enforce mTLS automatically between all services in the mesh, with zero code changes. Every service gets a SPIFFE/SPIRE identity certificate. This eliminates the need for API keys between internal services entirely and makes lateral movement after a breach dramatically harder.
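In the Istio case, mesh-wide enforcement is a single resource. A sketch, assuming a default Istio installation where istio-system is the root namespace:

# Require mTLS for all workloads in the mesh (Istio PeerAuthentication)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # mesh-wide when applied to the root namespace
spec:
  mtls:
    mode: STRICT            # reject any plaintext traffic between services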
Versioning your API allows you to make breaking changes without forcing all clients to update simultaneously. Two common approaches:
- URL path versioning (/api/v1/users): cacheable, visible in logs, easy to route at the gateway level. Recommended for public APIs.
- Header versioning (Accept: application/vnd.api+json;version=2): cleaner URLs, but harder to test in browsers and debug in logs.
When receiving webhooks, you cannot use OAuth since the provider is pushing to you. Instead, validate the HMAC signature that providers include in the request headers (e.g., X-Hub-Signature-256 from GitHub, Stripe-Signature from Stripe).
// Webhook signature validation (Stripe/GitHub pattern)
function validateWebhook(payload, signature, secret) {
const expected = crypto
.createHmac('sha256', secret)
.update(payload, 'utf8')
.digest('hex');
const trusted = `sha256=${expected}`;
// timingSafeEqual throws if buffer lengths differ, so check length first
if (signature.length !== trusted.length) return false;
// Use timingSafeEqual to prevent timing attacks
return crypto.timingSafeEqual(
Buffer.from(signature),
Buffer.from(trusted)
);
}