TTS Rate Limit Error Due to Token Usage Exceeded – Agora Support Center

Issue Description:

Users may encounter recurring “rate limit reached” errors when using the Text-to-Speech (TTS) feature, even when the configured requests-per-minute (RPM) limit has not been modified and appears sufficiently high. This error prevents new TTS conversions from processing as expected.

Platform/SDK:

Agora TTS SDK integrated with MiniMax (third-party provider)

Error Message:

Module: tts Code: 1000 Message: recv vendor error, message: rate limit, code: 1002
HTTP 429 - Rate limit error

Step by Step Solution:

1. Verify RPM Setting:

- Confirm that your TTS requests-per-minute (RPM) configuration matches your expected workload.
- If the limit is already 100 RPM or higher and your request rate stays below it, the request count is unlikely to be the cause; proceed to the next steps.

2. Review Token Usage with MiniMax:

- Access your MiniMax account or quota dashboard.
- Check the token usage per minute and ensure it does not exceed your allowed quota.
- Note that long text inputs can consume a large number of tokens, potentially reaching the limit even with few requests.

3. Shorten Text Inputs (If Necessary):

- Divide long TTS input text into smaller segments.
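- Send the segments as multiple calls spread over time rather than as a single large request. Splitting alone does not reduce the total number of tokens, so segments sent within the same minute still count toward the same per-minute quota.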
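
The splitting step above can be sketched as follows. This is a minimal illustration, not part of the Agora or MiniMax SDKs: the function name `split_text` and the `max_chars` default are assumptions, and character count is used only as a rough stand-in for however the provider counts tokens.

```python
import re


def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into segments of at most max_chars characters.

    Sentences are kept whole where possible; a single sentence longer
    than max_chars is cut at the character limit.
    NOTE: max_chars is a hypothetical budget, not a documented quota.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any sentence that cannot fit in one segment.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be sent as its own TTS request, paced so that the combined size of the segments sent in any one minute stays under the quota.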

4. Monitor Session Logs:

- Review session logs for each Agent ID and Turn ID to identify requests triggering the rate limit.
- Look for timestamps where multiple or lengthy TTS calls overlap.
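
One way to spot overlapping or lengthy calls is to group the logged requests by minute and total their input size. The sketch below assumes you have already extracted records of the form `(timestamp, turn_id, char_count)` from your session logs; that record shape, the function names, and the character budget are all hypothetical, and character count is only an approximation of token usage.

```python
from collections import defaultdict
from datetime import datetime


def usage_per_minute(records):
    """Group (iso_timestamp, turn_id, char_count) records by minute.

    The record layout is an assumption; adapt it to your own log format.
    """
    buckets = defaultdict(lambda: {"requests": 0, "chars": 0, "turns": []})
    for timestamp, turn_id, char_count in records:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        bucket = buckets[minute]
        bucket["requests"] += 1
        bucket["chars"] += char_count
        bucket["turns"].append(turn_id)
    return dict(buckets)


def minutes_over_budget(records, char_budget):
    """Return the minutes whose total input size exceeds char_budget."""
    usage = usage_per_minute(records)
    return sorted(m for m, b in usage.items() if b["chars"] > char_budget)


if __name__ == "__main__":
    sample = [
        ("2024-05-01T10:00:05", "turn-1", 4000),
        ("2024-05-01T10:00:40", "turn-2", 3500),
        ("2024-05-01T10:01:10", "turn-3", 200),
    ]
    for minute in minutes_over_budget(sample, char_budget=5000):
        print(minute.isoformat(), usage_per_minute(sample)[minute]["turns"])
```

Minutes flagged this way can be compared against the timestamps of the 429 errors to confirm that input volume, not request count, is what triggers the limit.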

5. Retry with Backoff:

- Implement an exponential backoff mechanism to automatically retry requests when a 429 rate limit error occurs.
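
A minimal sketch of such a retry loop is shown below. `RateLimitError` and the `call` argument are placeholders for however your integration surfaces the 429 error and issues the TTS request; they are not names from the Agora or MiniMax SDKs. The delay doubles on each attempt up to a cap, and a random jitter is applied so that concurrent clients do not retry in lockstep.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 / vendor code 1002."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Run call(), retrying on RateLimitError with exponential backoff.

    The delay ceiling doubles each attempt (base_delay, 2x, 4x, ...) up to
    max_delay; the actual wait is drawn uniformly from [0, ceiling].
    All default values are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

Usage would look like `retry_with_backoff(lambda: send_tts(segment))`, where `send_tts` is your own request function that raises `RateLimitError` when the provider returns a rate limit response.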

6. Confirm Resolution:

- Once token usage is within quota and text lengths have been reduced, test again to confirm the “rate limit reached” error no longer appears.

Root Cause:

Although the RPM limit is not exceeded, long text inputs cause the TTS requests to consume more tokens per minute than the MiniMax quota allows. The MiniMax API enforces token-based rate limits in addition to request-based limits, so exceeding either one triggers the “rate limit” error.
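
To make the distinction concrete, the sketch below checks one minute of traffic against both kinds of limit. All numbers are hypothetical and chosen only for illustration; they are not actual MiniMax quotas, and `check_limits` is not an SDK function.

```python
def check_limits(request_token_counts, rpm_limit, tokens_per_minute_limit):
    """Compare one minute of requests against a request limit and a token limit."""
    requests = len(request_token_counts)
    tokens = sum(request_token_counts)
    return {
        "requests": requests,
        "tokens": tokens,
        "rpm_exceeded": requests > rpm_limit,
        "tokens_exceeded": tokens > tokens_per_minute_limit,
    }


# Hypothetical minute: 20 requests of 3,000 tokens each.
result = check_limits([3000] * 20, rpm_limit=100, tokens_per_minute_limit=50_000)
print(result)
```

Here 20 requests is well under the 100 RPM limit, but 20 × 3,000 = 60,000 tokens exceeds the assumed 50,000-token budget, so the token limit is the one that trips.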

Prevention/Best Practice:

- Keep TTS input lengths moderate to avoid high token consumption.
- Regularly monitor MiniMax token usage in addition to request throughput.
- Implement automated rate limit handling (e.g., retry after delay) to minimize service interruptions.

Corresponding Document/Link:

- Agora TTS Documentation
- MiniMax API Rate Limits Guide