Understanding rate limits
Third-party APIs enforce rate limits to protect their infrastructure. Common limit types:
- Requests per second (burst limit) — typically 1–10 requests per second
- Requests per minute — common for REST APIs
- Daily quotas — often used for expensive operations like LLM API calls
- Concurrent connection limits
Exceeding a rate limit returns an HTTP 429 (Too Many Requests) response. Hitting the limit repeatedly may result in temporary or permanent API key suspension.
The token bucket implementation
For controlling your own outbound request rate, implement a token bucket algorithm. Tokens represent the right to make one API request. Tokens are added to the bucket at the API's allowed rate. A request consumes one token. If the bucket is empty, the request waits.
This smooths request traffic and ensures you never exceed the API's limits, even during burst conditions.
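The token bucket described above can be sketched in a few lines. This is a minimal, thread-safe single-process version; the class name and interface are illustrative, not from a particular library.

```python
import threading
import time

class TokenBucket:
    """Token bucket limiter: refills at `rate` tokens/sec, up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill based on elapsed time, capped at bucket capacity.
                self.tokens = min(
                    self.capacity,
                    self.tokens + (now - self.last_refill) * self.rate,
                )
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Time until the next whole token becomes available.
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)
```

Call `bucket.acquire()` immediately before each outbound request; bursts drain the bucket up to `capacity`, after which calls are paced at `rate` per second.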
Handling 429 responses
When you receive a 429:
1. Check for a Retry-After header — this tells you exactly how long to wait.
2. If no Retry-After is present, wait with exponential backoff (start at 1 second, double each retry).
3. Add jitter (a small random delay) to prevent retry storms when multiple workers hit the limit simultaneously.
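The steps above can be sketched as a retry wrapper. `send` is a stand-in for your HTTP call and is assumed to return a status code, a header dict, and a body; adapt it to whatever client you use.

```python
import random
import time

def request_with_retry(send, max_retries: int = 5):
    """Call `send()` -> (status, headers, body); retry on HTTP 429."""
    delay = 1.0  # initial backoff in seconds
    for _ in range(max_retries):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Prefer the server's explicit Retry-After value when present.
        retry_after = headers.get("Retry-After")
        wait = float(retry_after) if retry_after is not None else delay
        # Jitter spreads out retries from workers that were limited together.
        wait += random.uniform(0, wait * 0.1)
        time.sleep(wait)
        delay *= 2  # exponential backoff for the next attempt
    raise RuntimeError("rate limited: retries exhausted")
```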
Queuing for high-volume workloads
For workloads that need to make thousands of API calls (bulk data sync, report generation, notification delivery), use a job queue with concurrency limits rather than making all calls in parallel. A queue gives you:
- Control over the request rate
- Automatic retry on failure
- Visibility into backlog and processing rate
- Graceful handling of API downtime
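A minimal in-process version of such a queue can be built with the standard library. This sketch uses a fixed worker pool for the concurrency limit and requeues failed jobs for automatic retry; a production deployment would typically use a persistent queue (e.g. Redis- or database-backed) so the backlog survives restarts. The function name and signature are illustrative.

```python
import queue
import threading

def run_with_concurrency(jobs, handler, workers: int = 4, max_attempts: int = 3):
    """Process `jobs` through `handler` with at most `workers` concurrent calls."""
    q: "queue.Queue" = queue.Queue()
    for job in jobs:
        q.put((job, 1))  # (payload, attempt number)
    results, failures = [], []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job, attempt = q.get_nowait()
            except queue.Empty:
                return  # backlog drained
            try:
                result = handler(job)
                with lock:
                    results.append(result)
            except Exception:
                # Automatic retry: requeue until attempts are exhausted.
                if attempt < max_attempts:
                    q.put((job, attempt + 1))
                else:
                    with lock:
                        failures.append(job)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failures
```

The queue length at any moment is your backlog, and completed-jobs-per-second is your processing rate, which gives you the visibility the list above mentions.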
Caching to reduce API calls
Cache API responses where the data does not change frequently. A company's address from a data enrichment API, a user's account status from a CRM, or a product's price from a catalogue API — these do not change on every request. Even a 5-minute TTL cache can dramatically reduce API call volume for high-traffic integrations.
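A TTL cache of this kind needs only a dictionary and a clock. This is a single-process sketch (class and method names are illustrative); with multiple app servers you would typically back it with a shared store such as Redis instead.

```python
import time

class TTLCache:
    """In-memory cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value, or call `fetch()` and cache its result."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = fetch()  # e.g. the actual API call
        self._store[key] = (now + self.ttl, value)
        return value
```

With a 5-minute TTL (`TTLCache(ttl_seconds=300)`), repeated lookups of the same key within the window cost one API call instead of one per request.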