Issue Description
A Conversational AI agent may deliver real-time voice responses successfully while failing to deliver live transcripts through the Signaling service. This issue typically occurs in Flutter applications where the agent is active and voice interaction works, yet the client never receives the expected transcription messages, even though transcription is enabled in the service configuration.
While this guide uses a Flutter integration as its primary example, the underlying token requirements apply to all platforms that use Agora Conversational AI.
Platform/SDK
Service: Agora Conversational AI
Client SDK: Agora RTM SDK v2.2.5 for Flutter
Operating System: Android
Error Characterization
The SDK usually returns no explicit error. The primary symptom is that no messages arrive on the Signaling channel assigned to transcripts: voice interaction works, but no transcript text appears in the user interface.
Root Cause
The missing transcripts are caused by a token privilege mismatch: the agent was started with a legacy RTC-only token. Such a token lets the agent join the media channel for voice interaction, but it does not authorize access to the Real-Time Messaging (RTM) service. As a result, the agent cannot join the Signaling channel and cannot publish the generated transcription events to the client.
Step by Step Solution
- Verify Agent Runtime Configuration
Confirm that the agent startup parameters explicitly enable the data transmission path. Set the following properties in the agent configuration:
enable_rtm=true
data_channel=rtm
transcript.enable=true
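The three settings can be checked in code before the agent is launched. This is a minimal Python sketch; the nested layout it assumes (for example, `transcript.enable` stored as `{"transcript": {"enable": True}}`) is for illustration only, so map each dotted path onto the request schema your agent start call actually uses. The helper names `lookup` and `misconfigured_settings` are hypothetical.

```python
from typing import Any, Dict, List

# Settings named in this guide, written as dotted paths. The nesting is an
# assumption for illustration; adapt the paths to your real request schema.
REQUIRED_SETTINGS: Dict[str, Any] = {
    "enable_rtm": True,
    "data_channel": "rtm",
    "transcript.enable": True,
}


def lookup(config: Dict[str, Any], dotted_path: str) -> Any:
    """Resolve a dotted path such as 'transcript.enable' in a nested dict."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # path missing entirely
        node = node[key]
    return node


def misconfigured_settings(config: Dict[str, Any]) -> List[str]:
    """Return the required settings that are missing or set to the wrong value."""
    return [
        path
        for path, expected in REQUIRED_SETTINGS.items()
        if lookup(config, path) != expected
    ]
```

For example, `misconfigured_settings({"enable_rtm": True, "data_channel": "rtm", "transcript": {"enable": False}})` returns `['transcript.enable']`, so the agent would be started without transcript publishing.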
- Validate Client-Side RTM Connectivity
Verify that the Flutter application can successfully exchange standard RTM messages with other clients in the same channel. This step ensures that the signaling subscription logic within the client app is functional and that the issue is isolated to the agent's publishing capabilities.
- Cross-Reference with Conversational AI Playground
Test the identical App ID and agent configuration in the Agora Conversational AI Playground. If transcripts appear correctly there, the issue is isolated to token generation or SDK integration within your own application.
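The elimination order of the three checks above can be written down as a small decision function. This is a hypothetical helper, not part of any Agora SDK; it only encodes which layer each observation rules out.

```python
from enum import Enum


class Suspect(Enum):
    """Where the preceding checks point the investigation next."""
    AGENT_CONFIG = "agent runtime configuration"
    CLIENT_RTM = "client-side RTM subscription logic"
    APP_TOKEN_OR_INTEGRATION = "token generation or SDK integration in your app"
    NOT_ISOLATED = "not isolated by these checks"


def isolate(config_ok: bool, client_rtm_ok: bool, playground_transcripts_ok: bool) -> Suspect:
    """Apply the checks in the same order as the steps above."""
    if not config_ok:
        # enable_rtm / data_channel / transcript.enable not set as required
        return Suspect.AGENT_CONFIG
    if not client_rtm_ok:
        # The app cannot even exchange ordinary RTM messages
        return Suspect.CLIENT_RTM
    if playground_transcripts_ok:
        # Same App ID and agent configuration work in the Playground
        return Suspect.APP_TOKEN_OR_INTEGRATION
    return Suspect.NOT_ISOLATED
```

The scenario described in this guide (correct configuration, working client RTM, transcripts visible in the Playground) yields `Suspect.APP_TOKEN_OR_INTEGRATION`, which is what leads to the token changes below.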
- Transition to AccessToken2 (007 Format)
Deprecated RTC-only tokens are insufficient for Conversational AI features. Update your token server to generate AccessToken2 tokens in the 007 format, which is the token format required for RTM 2.x and Conversational AI services.
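A quick guard on the token server output is to check the version prefix: AccessToken2 strings start with `007`, while the older AccessToken format starts with `006`. A `007` prefix alone does not prove the token carries RTM privileges; that still depends on the services added when the token is built. The helper names here are hypothetical.

```python
# Version prefixes of the Agora token formats referred to in this guide.
TOKEN_FORMATS = {"006": "AccessToken (legacy)", "007": "AccessToken2"}


def token_format(token: str) -> str:
    """Describe a token by its 3-character version prefix."""
    return TOKEN_FORMATS.get(token[:3], "unknown format")


def require_access_token2(token: str) -> None:
    """Fail fast if the token server is still issuing pre-AccessToken2 tokens."""
    if not token.startswith("007"):
        raise ValueError(f"expected AccessToken2 (007), got {token_format(token)!r}")
```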
- Consolidate RTC and RTM Service Privileges
When generating the agent's token, explicitly include privileges for both services: the RTC service, which enables voice interaction, and the RTM service, which authorizes the agent to join the Signaling channel and publish transcript events. Without the RTM service privilege, the agent is not authorized to transmit data, even if transcription is enabled in the backend settings.
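To confirm that a generated token actually carries both services, it can be decoded offline. The sketch below is modeled on the layout used by Agora's open-source AccessToken2 reference builders: the `007` prefix followed by a Base64-encoded, zlib-compressed body of little-endian fields, with service type 1 for RTC and 2 for RTM. Treat that layout as an assumption and verify it against the token library you use. `token_services` and `has_rtc_and_rtm` are hypothetical names, and the sketch does not verify the signature.

```python
import base64
import struct
import zlib
from typing import Dict

# Service type IDs assumed from Agora's open-source AccessToken2 reference
# builders; verify against the token library you use.
SERVICE_NAMES = {1: "RTC", 2: "RTM"}


class _Reader:
    """Cursor over the decompressed token body (little-endian fields)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def uint16(self) -> int:
        (value,) = struct.unpack_from("<H", self.data, self.pos)
        self.pos += 2
        return value

    def uint32(self) -> int:
        (value,) = struct.unpack_from("<I", self.data, self.pos)
        self.pos += 4
        return value

    def string(self) -> bytes:
        # Strings are a uint16 length prefix followed by raw bytes.
        length = self.uint16()
        value = self.data[self.pos:self.pos + length]
        self.pos += length
        return value


def token_services(token: str) -> Dict[str, Dict[int, int]]:
    """Map each service in a 007 token to its {privilege_id: expire} table.

    Does NOT verify the signature; it only inspects what the token claims.
    """
    if not token.startswith("007"):
        raise ValueError("not an AccessToken2 (007) token")
    r = _Reader(zlib.decompress(base64.b64decode(token[3:])))
    r.string()   # signature
    r.string()   # app ID
    r.uint32()   # issue timestamp
    r.uint32()   # expire
    r.uint32()   # salt
    services: Dict[str, Dict[int, int]] = {}
    for _ in range(r.uint16()):
        service_type = r.uint16()
        privileges: Dict[int, int] = {}
        for _ in range(r.uint16()):
            privilege_id = r.uint16()
            privileges[privilege_id] = r.uint32()
        # Skip the service-specific fields that follow the privilege table.
        if service_type == 1:      # RTC: channel name, then user ID
            r.string()
            r.string()
        elif service_type == 2:    # RTM: user ID
            r.string()
        else:
            raise ValueError(f"service type {service_type} not handled by this sketch")
        services[SERVICE_NAMES[service_type]] = privileges
    return services


def has_rtc_and_rtm(token: str) -> bool:
    """True if the token claims both the RTC and RTM services."""
    return {"RTC", "RTM"}.issubset(token_services(token))
```

Running `has_rtc_and_rtm` on the agent's token before starting the agent turns the privilege requirement above into a check that fails loudly instead of silently dropping transcripts.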
- Re-initiate the Agent Session
After updating your token generation logic, restart the agent with the new AccessToken2 token. Have the client rejoin the channel, then monitor the Signaling channel to confirm that transcript payloads arrive.
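Monitoring can be made explicit with a timeout, so a silent transcript channel is reported instead of going unnoticed. This sketch assumes the app pushes each decoded RTM message into an `asyncio.Queue`, which stands in for the Flutter RTM message listener. `first_transcript` is hypothetical, and `is_transcript` is a caller-supplied predicate because the exact message shape depends on your integration.

```python
import asyncio
from typing import Any, Callable, Dict, Optional

Message = Dict[str, Any]


async def first_transcript(
    messages: "asyncio.Queue[Message]",
    timeout_s: float,
    is_transcript: Callable[[Message], bool],
) -> Optional[Message]:
    """Return the first transcript message, or None if none arrives in time."""

    async def next_transcript() -> Message:
        while True:
            message = await messages.get()
            if is_transcript(message):
                return message
            # Other signaling traffic (presence, custom messages) is skipped.

    try:
        return await asyncio.wait_for(next_transcript(), timeout_s)
    except asyncio.TimeoutError:
        return None
```

A `None` result while voice is audible points back to the token privileges rather than to the UI rendering code.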
- Implement Payload Decoding for Flutter
Once RTM messages arrive, make sure your Flutter application parses the transcription structure correctly. The data format is the same one used by the Web and JavaScript toolkits; map it to corresponding Dart models for UI rendering.
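The parsing and model mapping can be sketched as follows, in Python for brevity; a Dart model class would mirror `TranscriptSegment` field for field. The field names (`object`, `text`, `turn_id`, `final`) and the `.transcription` suffix on `object` are assumptions for illustration, so compare them with real payloads logged from your RTM listener before relying on them. Keeping only the newest segment per speaker and turn lets interim text be replaced in place as the final text arrives.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TranscriptSegment:
    speaker: str   # prefix of the "object" field, e.g. "user" or "assistant" (assumed)
    text: str
    turn_id: int
    final: bool


def parse_transcript(raw: str) -> Optional[TranscriptSegment]:
    """Decode one RTM message body; return None for non-transcript messages."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        return None
    kind = str(msg.get("object", ""))
    if not kind.endswith(".transcription"):
        return None
    return TranscriptSegment(
        speaker=kind.split(".", 1)[0],
        text=str(msg.get("text", "")),
        turn_id=int(msg.get("turn_id", 0)),
        final=bool(msg.get("final", False)),
    )


def latest_by_turn(raw_messages: List[str]) -> Dict[Tuple[str, int], TranscriptSegment]:
    """Keep the newest segment per (speaker, turn) so interim text is replaced."""
    latest: Dict[Tuple[str, int], TranscriptSegment] = {}
    for raw in raw_messages:
        segment = parse_transcript(raw)
        if segment is not None:
            latest[(segment.speaker, segment.turn_id)] = segment
    return latest
```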
Best Practice
Standardize on AccessToken2: Adopt the 007 token format as the default for all projects that use RTM 2.x to avoid permission-related failures.
Service Audit: When configuring AI agents, always audit the token generation logs, or decode the generated token, to confirm that it includes both the RTC and RTM services.
Pre-flight Check: If voice interaction is functional but transcripts are missing, verify the token capabilities before troubleshooting the client-side UI or rendering logic.