Issue Description:
Users may encounter recurring “rate limit reached” errors when using the Text-to-Speech (TTS) feature, even though the configured requests-per-minute (RPM) limit has not been changed and appears sufficiently high. The error prevents new TTS conversions from being processed.
Platform/SDK:
Agora TTS SDK integrated with MiniMax (third-party provider)
Error Message:
Module: tts Code: 1000 Message: recv vendor error, message: rate limit, code: 1002
HTTP 429 - Rate limit error
Step by Step Solution:
1. Verify RPM Setting:
- Confirm that your TTS requests-per-minute (RPM) configuration matches your expected workload.
- If it is already set to at least 100 RPM and the error persists, the request limit is unlikely to be the cause; proceed to the next steps.
2. Review Token Usage with MiniMax:
- Access your MiniMax account or quota dashboard.
- Check the token usage per minute and compare it against your allowed quota.
- Note that long text inputs can consume a large number of tokens, potentially reaching the limit even with few requests.
3. Shorten Text Inputs (If Necessary):
- Divide long TTS input text into smaller segments.
- Send the segments as multiple calls rather than a single large request, and space the calls out so that the total tokens sent per minute stay within the quota.
4. Monitor Session Logs:
- Review session logs for each Agent ID and Turn ID to identify requests triggering the rate limit.
- Look for timestamps where multiple or lengthy TTS calls overlap.
5. Retry with Backoff:
- Implement an exponential backoff mechanism to automatically retry requests when a 429 rate limit error occurs.
6. Confirm Resolution:
- Once token usage is within the MiniMax quota and text lengths have been reduced, test again to confirm the “rate limit reached” error no longer appears.
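Steps 3 and 5 can be sketched together as follows. This is a minimal illustration, not the Agora or MiniMax API: split_text, with_backoff, RateLimitError, and the 200-character default are all hypothetical names and values chosen for the example, and character count is only a rough stand-in for however MiniMax counts tokens.

```python
import random
import re
import time


class RateLimitError(Exception):
    """Hypothetical exception; raise it from your own client code on HTTP 429."""


def split_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt (capped, with jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Usage sketch (synthesize is a placeholder for your own TTS call):
# for chunk in split_text(long_text):
#     with_backoff(lambda: synthesize(chunk))
```

The jitter term spreads retries out so that several sessions hitting the limit at once do not all retry at the same instant.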
Root Cause:
Although the RPM limit was not exceeded, the TTS requests consumed more tokens per minute than the MiniMax quota allowed because the text inputs were long. The MiniMax API enforces token-based rate limits in addition to request-based limits, so exceeding either one triggers the “rate limit” error.
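To see how this can happen, the sketch below groups request records by minute and flags the minutes where the request count is within the RPM limit but the token total is over the token limit. The function name, the record shape, and the limit values in the example are assumptions made for illustration; the token counts are your own estimates, and the provider may measure its window differently from the fixed calendar minutes used here.

```python
from collections import defaultdict


def find_token_overruns(requests, rpm_limit, tpm_limit):
    """Return minutes where requests are within rpm_limit but tokens exceed tpm_limit.

    requests: iterable of (unix_timestamp_seconds, estimated_token_count) pairs.
    Result maps minute index -> (request count, token total).
    """
    per_minute = defaultdict(lambda: [0, 0])  # minute -> [request count, token total]
    for timestamp, tokens in requests:
        bucket = per_minute[int(timestamp // 60)]
        bucket[0] += 1
        bucket[1] += tokens
    return {
        minute: (count, total)
        for minute, (count, total) in sorted(per_minute.items())
        if count <= rpm_limit and total > tpm_limit
    }


# Illustrative numbers only: three long requests in the first minute.
sample = [(0, 4000), (10, 4000), (20, 4000), (70, 500)]
overruns = find_token_overruns(sample, rpm_limit=100, tpm_limit=10000)
```

With these made-up numbers, the first minute contains only 3 requests (far below 100 RPM) but 12,000 estimated tokens, which is over the assumed 10,000-token limit.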
Prevention/Best Practice:
- Keep TTS input lengths moderate to avoid high token consumption.
- Regularly monitor MiniMax token usage in addition to request throughput.
- Implement automated rate limit handling (e.g., retry after delay) to minimize service interruptions.
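One way to apply these practices on the client side is to track a token budget over a sliding window and delay a request until it fits. The class below is a hedged sketch only: TokenBudget, its methods, and the limit passed to it are hypothetical, and the real quota and token-counting rules should be taken from your MiniMax account.

```python
import time
from collections import deque


class TokenBudget:
    """Client-side sliding-window budget: at most `limit` tokens per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for sends still inside the window
        self.used = 0

    def _expire(self, now):
        # Drop sends that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens):
        """Seconds to wait before `tokens` more tokens fit; 0.0 if they fit now."""
        if tokens > self.limit:
            raise ValueError("request is larger than the whole budget; split it first")
        now = self.clock()
        self._expire(now)
        needed = self.used + tokens - self.limit
        if needed <= 0:
            return 0.0
        freed = 0
        for timestamp, spent in self.events:
            freed += spent
            if freed >= needed:
                # Enough budget is free once this send leaves the window.
                return timestamp + self.window - now
        return 0.0  # not reached, because tokens <= limit

    def record(self, tokens):
        """Call after a request is sent to count it against the budget."""
        self.events.append((self.clock(), tokens))
        self.used += tokens
```

Before each TTS call, sleep for wait_time(estimated_tokens), send the request, then call record(estimated_tokens). This reduces how often the provider-side limit is hit, but it does not replace retry handling for 429 responses.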