Security Engineering · March 2026

API Rate Limiting & Authentication
Best Practices 2026

A production-grade playbook covering every rate limiting algorithm, Redis implementation patterns, OAuth 2.0 flows, JWT hardening, and API gateway configuration—with real code and comparison tables.

15 min read · Updated March 21, 2026 · By Codazz Engineering

Why Rate Limiting Matters

In 2025, 73% of public APIs experienced some form of abuse—ranging from credential stuffing to runaway automation scripts that burned through millions of tokens in hours. Rate limiting is no longer optional infrastructure; it is a core reliability, security, and cost-control mechanism.

Without rate limiting, a single misconfigured client can exhaust your database connection pool, inflate your cloud bill by orders of magnitude, and deny service to every legitimate user. Here are four real-world incidents that illustrate why this matters at every scale.

OpenAI $5M Bill Incident

A single developer published an open-source script that proxied OpenAI requests without any rate limiting on the proxy layer. Within 72 hours, the script was embedded in thousands of projects. The operator's OpenAI account accrued over $5M in usage before the keys were revoked. Per-user and per-IP limits at the proxy layer would have capped damage at under $100.

Stripe API Credential Stuffing

Attackers fired 40,000 card-validation requests per hour against a merchant's Stripe integration using rotating IPs. Without per-IP rate limits on the merchant's own backend, each attempt hit Stripe and incurred API costs. Stripe's own per-key limits eventually triggered a suspension of the merchant's account.

GitHub Actions Cache Stampede

A newly deployed GitHub Actions workflow had an infinite retry loop on cache miss. During a CI spike, 3,000 parallel jobs each retried the cache API every 500ms. GitHub's rate limiting responded with 429s, but the client code treated 429 as a transient error and retried immediately — amplifying the problem. Exponential backoff with jitter would have self-healed within minutes.

Twitter API v2 Migration

When Twitter migrated from v1 to v2 and tightened rate limits from 900 to 500 requests per 15-minute window, hundreds of production applications that relied on "free" unlimited access collapsed overnight. Teams with proper rate limit header parsing (Retry-After, X-Rate-Limit-Remaining) recovered in hours; others spent days debugging cascading timeouts.

DDoS Protection

Caps request volume per IP or key before it saturates your origin.

Cost Control

Prevents runaway automation from generating unexpected cloud/API charges.

Fair Use Enforcement

Ensures high-volume users cannot degrade experience for others.

SLA Guarantees

Keeps latency and availability within contractual bounds for all tenants.

Rate Limiting Algorithms Compared

Choosing the right algorithm determines whether your API rejects bursts gracefully or allows them intentionally. Each algorithm trades precision, memory usage, and implementation complexity differently.

| Algorithm | Burst Handling | Memory | Precision | Best Use Case |
|---|---|---|---|---|
| Token Bucket | Allows burst up to bucket size | O(1) per key | Approximate | Public APIs, general purpose |
| Sliding Window Log | No burst — precise count | O(N requests in window) | Exact | Strict per-user financial APIs |
| Fixed Window Counter | 2× burst at window boundary | O(1) per key | Low | Simple quota enforcement |
| Leaky Bucket | Rejects excess bursts | O(1) per key | Moderate | Rate-smoothing, queuing systems |
Token Bucket
Pros

Natural burst allowance mirrors real user behavior. O(1) storage. Tokens replenish continuously — no sharp window resets.

Cons

Approximate — a client can drain the entire bucket instantly and then continue at the sustained refill rate, so short-term throughput can exceed the nominal rate.

Best For

Default choice for public REST APIs where short bursts are acceptable (e.g., loading a dashboard that fires 5 requests at once).
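The refill arithmetic is simple enough to sketch in memory. This illustrative class (not from any particular library) shows the continuous-refill behavior described above; a production deployment would back this with shared storage such as the Redis patterns later in this guide.

```javascript
// In-memory token bucket — a minimal sketch for illustration only.
// Tokens refill continuously; a request spends one token if available.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;             // max tokens = allowed burst size
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;               // start full
    this.lastRefill = Date.now();
  }

  tryRemoveToken() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    // Continuous refill, capped at capacity — no sharp window resets
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // request allowed
    }
    return false;   // rate limited
  }
}
```

A dashboard firing 5 requests at once succeeds as long as the bucket holds 5 tokens, then the client is held to the steady refill rate.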

Sliding Window Log
Pros

Exact request count within any rolling time window. No boundary burst problem. Best auditability.

Cons

Stores a timestamp for every request in the window. At high traffic, memory consumption is proportional to request volume. Expensive to query.

Best For

Financial APIs, SMS/email sending APIs, any context where you must guarantee exactly N per window with no exceptions.

Fixed Window Counter
Pros

Simplest possible implementation: single Redis INCR + EXPIRE. Lowest memory and CPU overhead.

Cons

Boundary burst problem: a user can fire N requests at the end of window T and N more at the start of T+1, effectively 2N in a short span.

Best For

Coarse daily/monthly quota enforcement (e.g., "1,000 API calls per day") where sub-minute precision does not matter.

Leaky Bucket
Pros

Guarantees a smooth, constant output rate regardless of input burst. Ideal for protecting downstream services from spikes.

Cons

Bursts are lost rather than queued, which frustrates legitimate users with bursty patterns. Harder to reason about for end users.

Best For

Upstream rate-smoothing before hitting a downstream service with strict capacity (e.g., a payment processor or external partner API).
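The "leaky bucket as a meter" variant can be sketched similarly — this illustrative class (an assumption, not a specific library's API) shows why bursts overflow and are dropped while the bucket drains at a constant rate:

```javascript
// Leaky bucket (meter variant) — a minimal sketch. Water drains at a fixed
// rate; a request is rejected when adding it would overflow the bucket.
class LeakyBucket {
  constructor(capacity, leakPerSecond) {
    this.capacity = capacity;
    this.leakPerSecond = leakPerSecond;
    this.water = 0;                // current fill level
    this.lastLeak = Date.now();
  }

  tryAdd() {
    const now = Date.now();
    const elapsed = (now - this.lastLeak) / 1000;
    // Drain at the constant leak rate, never below empty
    this.water = Math.max(0, this.water - elapsed * this.leakPerSecond);
    this.lastLeak = now;
    if (this.water + 1 <= this.capacity) {
      this.water += 1;
      return true;   // accepted — downstream sees at most leakPerSecond
    }
    return false;    // overflow — excess burst rejected, not queued
  }
}
```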

Redis-Based Rate Limiting Implementation

Redis is the de facto standard for production rate limiting because it provides sub-millisecond latency, atomic operations via Lua scripts, and native TTL support. Here are the key patterns you need.

Fixed Window Counter — INCR + EXPIRE

// Node.js — fixed window counter via Redis INCR + EXPIRE
// (often mislabeled "token bucket" — this counter resets sharply each window)
async function isAllowed(redis, key, limit, windowSeconds) {
  const current = await redis.incr(key);
  if (current === 1) {
    // First request in window — set TTL
    await redis.expire(key, windowSeconds);
  }
  return current <= limit;
}

// Usage
const allowed = await isAllowed(redis, `rl:${userId}`, 100, 60);
if (!allowed) {
  res.status(429).json({ error: 'Rate limit exceeded', retryAfter: 60 });
  return;
}

Sliding Window Log — ZADD + ZREMRANGEBYSCORE

// Sliding Window Log — stores timestamp per request as sorted set
async function slidingWindowCheck(redis, key, limit, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;

  const pipeline = redis.pipeline();
  // Remove timestamps outside the window
  pipeline.zremrangebyscore(key, '-inf', windowStart);
  // Add current request timestamp
  pipeline.zadd(key, now, `${now}-${Math.random()}`);
  // Count requests in window
  pipeline.zcard(key);
  // Set TTL so keys self-clean
  pipeline.expire(key, Math.ceil(windowMs / 1000) + 1);
  const results = await pipeline.exec();

  const requestCount = results[2][1]; // ioredis returns [err, result] pairs; index 2 is ZCARD
  return requestCount <= limit;
}

Atomic Lua Script (Race-Condition Safe)

The INCR + EXPIRE pattern above has a tiny race window between the two commands. Use a Lua script to make the operation fully atomic — Redis executes Lua scripts as a single unit with no interruption.

-- rate_limit.lua — atomic fixed-window counter
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
  return 0  -- rate limited
end

local new_val = redis.call('INCR', key)
if new_val == 1 then
  redis.call('EXPIRE', key, window)
end
return 1  -- allowed

-- Load and call from Node.js:
-- const result = await redis.eval(luaScript, 1, key, limit, windowSecs);

Using redis-rate-limiter (npm)

import { RateLimiterRedis } from 'rate-limiter-flexible';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

const rateLimiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: 'rl',
  points: 100,       // max requests
  duration: 60,      // per 60 seconds
  blockDuration: 30, // block for 30s after limit exceeded
});

// Express middleware
export async function rateLimitMiddleware(req, res, next) {
  try {
    const key = req.user?.id ?? req.ip;
    await rateLimiter.consume(key);
    next();
  } catch (rejRes) {
    const secs = Math.ceil(rejRes.msBeforeNext / 1000);
    res.set('Retry-After', String(secs));
    res.status(429).json({ error: 'Too Many Requests', retryAfter: secs });
  }
}
Key Production Considerations
  • Redis Cluster vs Single-Node: Use cluster mode for HA. Be aware that all keys a Lua script touches must hash to the same slot — use hash tags, e.g. rl:{userId}:minute, so one user's keys land on one shard.
  • Lua Atomicity: Redis executes Lua scripts single-threaded with no preemption. This is your safest option for race-free counters, but keep scripts short to avoid blocking other clients.
  • TTL Management: Always set a TTL. Without it, rate limit keys accumulate indefinitely. Use slightly longer TTL than your window (window + 1 second) to avoid edge-case expiry during window evaluation.
  • Graceful Degradation: If Redis is unavailable, decide upfront: fail open (allow all traffic) or fail closed (block all traffic). Most APIs fail open with logging to avoid an outage-within-an-outage.
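The fail-open decision above can be made explicit with a small wrapper — a sketch, where `limiterFn` is whatever check you use (e.g. one of the Redis functions shown earlier):

```javascript
// Fail-open wrapper — if the rate limiter's backing store errors out, log and
// allow the request rather than turning a Redis outage into an API outage.
async function checkLimitFailOpen(limiterFn, key) {
  try {
    return await limiterFn(key); // true = allowed, false = rate limited
  } catch (err) {
    console.error('Rate limiter unavailable, failing open:', err.message);
    return true; // fail open; alert on these logs so the outage is visible
  }
}
```

Flipping the `return true` to `return false` gives you fail-closed behavior for endpoints where over-admission is worse than downtime (e.g. payment submission).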

API Gateway Rate Limiting Solutions

For most teams, the right move is to enforce rate limiting at the gateway layer before requests reach application code. Here is how the major platforms compare.

Kong Rate Limiting Plugin (YAML Config)

# kong.yml — Declarative config for rate limiting plugin
plugins:
  - name: rate-limiting
    config:
      minute: 100          # requests per minute
      hour: 2000           # requests per hour
      policy: redis        # local | cluster | redis
      redis_host: redis.internal
      redis_port: 6379
      redis_timeout: 2000
      limit_by: consumer   # ip | consumer | credential | header | path
      hide_client_headers: false
      fault_tolerant: true # fail open if Redis is down

AWS API Gateway — Usage Plans + Throttling

// AWS CDK (TypeScript) — API Gateway Usage Plan with throttling
const plan = api.addUsagePlan('BasicPlan', {
  name: 'Basic',
  throttle: {
    rateLimit: 100,   // requests per second (steady state)
    burstLimit: 200,  // token bucket size (max concurrent burst)
  },
  quota: {
    limit: 10000,         // total requests
    period: Period.MONTH, // per calendar month
  },
});

// Attach an API key to the plan
const key = api.addApiKey('MyApiKey');
plan.addApiKey(key);

// Attach the plan to a stage
plan.addApiStage({ stage: api.deploymentStage });

Cloudflare Rate Limiting

Cloudflare's rule builder lets you create rate limit rules based on IP, User-Agent, ASN, cookie, header value, or request path. Rules fire before the request reaches your origin, making them ideal for DDoS mitigation.

# Cloudflare Rate Limiting Rule (Terraform)
resource "cloudflare_rate_limit" "api" {
  zone_id   = var.zone_id
  threshold = 100
  period    = 60  # seconds

  match {
    request {
      url_pattern = "example.com/api/*"
      schemes     = ["HTTPS"]
      methods     = ["GET", "POST"]
    }
  }

  action {
    mode    = "ban"
    timeout = 300  # ban for 5 minutes
    response {
      content_type = "application/json"
      body         = jsonencode({ error = "rate limited" })
    }
  }
}
| Solution | Cost | Complexity | Algorithm | Best For |
|---|---|---|---|---|
| Kong | Free (OSS) / Paid | Medium | Sliding Window | Self-hosted microservices |
| AWS API GW | Pay-per-call | Low | Token Bucket | AWS-native serverless APIs |
| Cloudflare | $5/mo + overages | Low | Fixed Window | Edge DDoS protection |
| Nginx limit_req | Free | Medium | Leaky Bucket | Simple single-server setup |

API Authentication Methods Compared

Rate limiting works best when paired with strong authentication — you can only enforce fair use if you know who is making the request. Here is how the four major approaches compare across eight dimensions.

| Factor | API Keys | OAuth 2.0 | JWT | mTLS |
|---|---|---|---|---|
| Complexity | Very Low | High | Medium | High |
| Revocation | Immediate (DB lookup) | Token introspection | Hard (requires blacklist) | Certificate revocation (CRL/OCSP) |
| Expiry | Manual rotation | Short-lived tokens | Built-in (exp claim) | Certificate validity period |
| Use Case | Server-to-server, simple | User-delegated access | Stateless auth at scale | Zero-trust service mesh |
| Client Type | Any | Browser, Mobile, Server | Any | Server-to-server only |
| Server Load | DB lookup per request | Token exchange overhead | Verify signature only | TLS handshake overhead |
| Standards | None (ad-hoc) | RFC 6749 / 6750 | RFC 7519 | RFC 5246 / 8446 |
| Ecosystem | Universal | Excellent (Google, GitHub…) | Excellent (every language) | Strong (service mesh) |

JWT Best Practices

JWTs are everywhere — and widely misused. These are the non-negotiable practices for production JWT systems.

1. Token Lifetimes — Keep Them Short

Access tokens should expire in 15 minutes. Refresh tokens can live for 7 days but must be single-use and rotated on every use. If a refresh token is used twice (replay attack), invalidate the entire refresh token family immediately.

// Issuing tokens (jwt.sign from the jsonwebtoken package; uuidv4 from uuid)
const accessToken = jwt.sign(
  { sub: userId, iss: 'api.yourdomain.com', aud: 'app', jti: uuidv4() },
  privateKey,
  { algorithm: 'RS256', expiresIn: '15m' }
);

const refreshToken = jwt.sign(
  { sub: userId, family: familyId, jti: uuidv4() },
  refreshSecret,
  { expiresIn: '7d' }
);
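The rotation-and-reuse-detection logic can be sketched as follows. `store` is an assumed async key-value interface (Redis, a DB table, etc.), `payload` is the already-verified refresh token payload, and `issueNext` is a hypothetical callback that signs the next token in the family — all three names are illustrative, not a specific library's API.

```javascript
// Refresh-token rotation with reuse detection — a sketch.
// payload: { sub, family, jti } from the verified refresh token.
async function rotateRefreshToken(payload, store, issueNext) {
  if (await store.get(`revoked:${payload.family}`)) {
    throw new Error('Token family revoked');
  }
  if (await store.get(`used:${payload.jti}`)) {
    // Replay: this token was already rotated once — revoke the whole family
    await store.set(`revoked:${payload.family}`, '1');
    throw new Error('Refresh token reuse detected');
  }
  await store.set(`used:${payload.jti}`, '1'); // enforce single use
  return issueNext(payload.sub, payload.family); // next token, same family
}
```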
2. RS256 vs HS256 — Use Asymmetric Keys for Multi-Service

HS256 uses a shared secret — every service that needs to verify tokens must hold the secret, creating a wide attack surface. RS256 uses a private key to sign (held only by your auth server) and a public key to verify (distributed freely to all services). Use RS256 in any multi-service architecture.

HS256 — Avoid in Multi-Service

Shared secret. All verifiers must know the secret. One compromised service leaks all tokens.

RS256 — Recommended

Private key signs. Public key verifies. Services get the public key from a JWKS endpoint.

3. JWT Standard Claims — What Each Means
  • iss — Issuer: who created the token (e.g., "auth.yourdomain.com"). Validate this on every request.
  • sub — Subject: the user or entity the token represents. Usually your userId.
  • aud — Audience: the intended recipient service. Validate to prevent token reuse across services.
  • exp — Expiration: Unix timestamp after which the token must be rejected. Always validate.
  • iat — Issued At: when the token was created. Useful for detecting clock skew attacks.
  • jti — JWT ID: a unique token identifier. Required for implementing token blacklists.
Never Put PII in JWT Payload

JWT payloads are base64-encoded, not encrypted. Anyone with the token can decode the payload without a key. Never include email addresses, phone numbers, credit card details, or any personally identifiable information. The payload is visible to clients, intermediary proxies, and anyone with access to server logs. Include only a userId and minimal claims needed for routing/authorization decisions.
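Two lines of code prove the point — anyone holding a token can read its payload with no key at all:

```javascript
// Decoding a JWT payload requires no secret — it is base64url, not encryption.
function decodeJwtPayload(token) {
  const payloadB64 = token.split('.')[1]; // header.PAYLOAD.signature
  return JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
}
```

Run this against any token from your own system; if you can see an email address or phone number in the output, so can every client and proxy that handles the token.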

OAuth 2.0 Flows — When to Use Each

OAuth 2.0 defines several grant types optimized for different client environments. Choosing the wrong flow is a common security mistake.

Authorization Code + PKCE
Web apps, mobile apps, any user-facing application

The user is redirected to the authorization server, authenticates, and is redirected back with an authorization code. The code is exchanged for tokens server-side (or via PKCE for public clients). PKCE (Proof Key for Code Exchange) prevents authorization code interception attacks — mandatory for mobile and SPA apps.

Example: User logs into your SaaS with Google/GitHub SSO. Your app gets an access token to call Google APIs on their behalf.
Client Credentials
Machine-to-machine (M2M) communication — no user involved

Your service authenticates directly with the auth server using its client ID and secret, and receives an access token. There is no user redirect, no consent screen. Used when your backend needs to call another service's API using its own identity.

Example: Your billing microservice calling the notifications microservice. Your CI/CD pipeline calling your deployment API.
Device Authorization Flow
Devices with limited input capability (TVs, IoT, CLI tools)

The device displays a short code and a URL. The user goes to the URL on another device (phone/laptop) and enters the code to authorize. The device polls the token endpoint until authorization is granted or times out.

Example: Logging into Netflix on a smart TV. Authorizing a CLI tool (like GitHub CLI or AWS CLI) to access your account.
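The device's polling loop can be sketched per RFC 8628. The grant type string and error codes below come from the spec; `tokenUrl`, `deviceCode`, and `clientId` are placeholders for your provider's values, and the `fetchFn` parameter is an assumed seam for injecting a custom HTTP client.

```javascript
// Device flow polling loop (RFC 8628) — a sketch.
async function pollForToken(tokenUrl, deviceCode, clientId, intervalSec = 5, fetchFn = fetch) {
  while (true) {
    await new Promise(r => setTimeout(r, intervalSec * 1000));
    const res = await fetchFn(tokenUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        grant_type: 'urn:ietf:params:oauth:grant-type:device_code',
        device_code: deviceCode,
        client_id: clientId,
      }),
    });
    const body = await res.json();
    if (res.ok) return body;                               // { access_token, ... }
    if (body.error === 'authorization_pending') continue;  // user hasn't approved yet
    if (body.error === 'slow_down') { intervalSec += 5; continue; } // back off per spec
    throw new Error(`Device flow failed: ${body.error}`);  // expired_token, access_denied
  }
}
```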
| Flow | User Present? | Redirect? | PKCE? | Token Storage |
|---|---|---|---|---|
| Auth Code + PKCE | Yes | Yes | Required | HttpOnly cookie or memory |
| Client Credentials | No | No | N/A | Server env variable / vault |
| Device Flow | Yes (on 2nd device) | No | No | Device secure storage |

Rate Limit Headers & Client Handling

Well-behaved APIs communicate rate limit state through standard response headers. This lets clients self-throttle before hitting the limit, dramatically reducing unnecessary 429 responses.

Standard Response Headers

HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 47
RateLimit-Reset: 1742600000
Retry-After: 32
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1742600000

# When rate limited (HTTP 429):
HTTP/1.1 429 Too Many Requests
Retry-After: 32
RateLimit-Policy: 100;w=60;comment="sliding window"
Content-Type: application/json

{
  "error": "too_many_requests",
  "message": "Rate limit exceeded. Retry after 32 seconds.",
  "retryAfter": 32
}
  • RateLimit-Limit — The maximum number of requests allowed in the current window. Set this to the limit for the caller's current tier.
  • RateLimit-Remaining — Requests remaining in the current window. Clients should start throttling themselves when this approaches zero.
  • RateLimit-Reset — Unix timestamp (seconds) when the current window resets. Clients can calculate exact wait time.
  • Retry-After — Defined in RFC 7231 (the 429 status code itself comes from RFC 6585). Seconds to wait before retrying. Sent with 429 and 503 responses. Takes priority over RateLimit-Reset.
  • RateLimit-Policy — IETF draft header. Describes the rate limit policy: limit;w=window;burst=N. Helps clients understand the algorithm.

Client-Side 429 Handling — Exponential Backoff with Jitter

// Fetch with exponential backoff + jitter
async function fetchWithRetry(url, options = {}, maxRetries = 5) {
  let attempt = 0;
  while (attempt <= maxRetries) {
    const response = await fetch(url, options);
    if (response.status !== 429) return response;

    const retryAfter = response.headers.get('Retry-After');
    const baseWait = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : Math.min(1000 * 2 ** attempt, 64000);

    // Add jitter: +/- 20% of base wait to avoid thundering herd
    const jitter = baseWait * 0.4 * (Math.random() - 0.5);
    const waitMs = Math.round(baseWait + jitter);

    console.warn(`429 received. Waiting ${waitMs}ms before retry ${attempt + 1}`);
    await new Promise(resolve => setTimeout(resolve, waitMs));
    attempt++;
  }
  throw new Error(`Max retries exceeded for ${url}`);
}

Security Hardening Techniques

Rate limiting and authentication are the foundation, but production APIs need additional layers. Here are the hardening techniques used at scale.

HMAC Request Signing (AWS SigV4 Pattern)

Sign the request body and key headers using HMAC-SHA256 with a shared secret. The server recomputes the signature and rejects any request where it does not match. This prevents request tampering in transit and replay attacks (use a timestamp in the signed content, reject if timestamp is older than 5 minutes).

// HMAC request signing
import crypto from 'crypto';

function signRequest(body, secret) {
  const timestamp = Math.floor(Date.now() / 1000).toString();
  const payload = timestamp + '.' + JSON.stringify(body);
  const sig = crypto.createHmac('sha256', secret).update(payload).digest('hex');
  return { 'X-Timestamp': timestamp, 'X-Signature': `sha256=${sig}` };
}
IP Allowlisting for Internal Services

Internal APIs (metrics endpoints, admin APIs, database management interfaces) should only be reachable from known IP ranges. Use your cloud provider's security group rules (AWS: security groups, GCP: firewall rules, Cloudflare: IP Access Rules) to allowlist your office IPs, VPN exit nodes, and deployment server IPs. Layer this with authentication — never rely solely on IP allowlisting as it can be spoofed in certain network configurations.

mTLS for Service Mesh (Istio/Linkerd)

Mutual TLS means both the client and server authenticate each other with certificates — not just the server proving identity to the client. In a Kubernetes environment, Istio or Linkerd can enforce mTLS automatically between all services in the mesh, with zero code changes. Every service gets a SPIFFE/SPIRE identity certificate. This eliminates the need for API keys between internal services entirely and makes lateral movement after a breach dramatically harder.
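As a sketch of how little configuration this takes, a single Istio PeerAuthentication resource enforces mTLS for a whole namespace (the name and namespace below are placeholders):

```yaml
# Istio PeerAuthentication — require mTLS for all workloads in a namespace.
# Resource name and namespace are illustrative.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT   # reject any plaintext traffic between services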

API Versioning Strategy

Versioning your API allows you to make breaking changes without forcing all clients to update simultaneously. Two common approaches:

URL Path Versioning
/api/v1/users

Cacheable, visible in logs, easy to route at gateway level. Recommended for public APIs.

Header Versioning
Accept: application/vnd.api+json;version=2

Cleaner URLs, but harder to test in browsers and debug in logs.
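The dispatch logic behind URL path versioning is trivial — this illustrative function (an assumption, not any framework's API) shows the same prefix routing an Express router or gateway rule performs:

```javascript
// Minimal version dispatch by URL path prefix — a sketch.
// handlers maps version strings to request handlers, e.g. { v1: fn, v2: fn }.
function routeByVersion(path, handlers) {
  const match = path.match(/^\/api\/(v\d+)\//);
  const handler = match && handlers[match[1]];
  if (!handler) return { status: 404, body: 'Unknown API version' };
  return handler(path); // v1 handlers stay frozen while v2 evolves
}
```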

Webhook Security — Shared Secret + Signature Validation

When receiving webhooks, you cannot use OAuth since the provider is pushing to you. Instead, validate the HMAC signature that providers include in the request headers (e.g., X-Hub-Signature-256 from GitHub, Stripe-Signature from Stripe).

// Webhook signature validation (Stripe/GitHub pattern)
import crypto from 'crypto';

function validateWebhook(payload, signature, secret) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload, 'utf8')
    .digest('hex');
  const trusted = `sha256=${expected}`;
  const sigBuf = Buffer.from(signature);
  const trustedBuf = Buffer.from(trusted);
  // timingSafeEqual throws on length mismatch, so check lengths first;
  // the constant-time comparison prevents timing attacks
  return sigBuf.length === trustedBuf.length &&
    crypto.timingSafeEqual(sigBuf, trustedBuf);
}

Book a Security Architecture Review

Our senior engineers will audit your API authentication, rate limiting configuration, and token handling — and deliver a prioritized remediation plan within 5 business days.

Book a Free Security Review