Issue Description
AI Voice Activity Detection (AIVAD) fails to initialize during active sessions even though semantic-based speech detection is configured. System logs show the detection state as inactive. This occurs when the HTTP request payload carries contradictory instructions from the legacy and modern configuration schemas: the deprecated parameter enable_aivad: false is sent alongside the new semantic turn detection settings.
Platform/SDK
Service: Agora Conversational AI
Configuration Interface: RESTful API or Convo Studio
Affected Versions: All versions prior to v2.6
Root Cause
The issue occurs when the request body includes both the new and the old configuration parameters for voice activity detection:
- New format: turn_detection.config.end_of_speech.mode: "semantic"
- Old format: advanced_features.enable_aivad
When these coexist, the system prioritizes the new semantic configuration, but the presence of the deprecated enable_aivad: false parameter can still cause AIVAD to appear disabled in certain tools or logs.
The conflict originated in the front-end Studio tool, which was still appending the outdated enable_aivad parameter to requests even when the new semantic configuration was in use. This caused confusion and inconsistent behavior across environments.
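To illustrate the conflict described above, here is a minimal Python sketch that checks whether a request body mixes the two schemas. It assumes the function is given the JSON object that directly contains turn_detection and advanced_features, and that the semantic setting is nested as in the JSON structure shown in the solution section. The function name is hypothetical and not part of any Agora SDK.

```python
from typing import Any, Mapping


def has_conflicting_vad_config(properties: Mapping[str, Any]) -> bool:
    """Return True when semantic end-of-speech detection and the
    deprecated enable_aivad flag are both present in the same object.

    Assumption: `properties` is the object that directly holds the
    `turn_detection` and `advanced_features` keys.
    """
    turn_detection = properties.get("turn_detection") or {}
    config = turn_detection.get("config") or {}
    end_of_speech = config.get("end_of_speech") or {}
    uses_semantic = end_of_speech.get("mode") == "semantic"

    advanced = properties.get("advanced_features") or {}
    # The flag conflicts by being present at all, whatever its value.
    has_legacy_flag = "enable_aivad" in advanced

    return uses_semantic and has_legacy_flag
```

Running a check like this before sending the request makes the mixed-schema case visible in client code rather than only in system logs.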
Step-by-Step Solution
Enforce the Unified Turn Detection Schema
Define all voice activity detection settings under the turn_detection object. Use the following JSON structure so that semantic mode is correctly prioritized:
```json
"turn_detection": {
  "mode": "default",
  "config": {
    "start_of_speech": {
      "mode": "semantic"
    },
    "end_of_speech": {
      "mode": "semantic",
      "semantic_config": {
        "max_wait_ms": 1500,
        "silence_duration_ms": 480
      }
    },
    "speech_threshold": 0.5
  }
}
```
Purge Legacy Feature Flags
Remove the enable_aivad property from the advanced_features section in both custom request bodies and backend presets. Ensuring the absence of this deprecated field prevents the engine from receiving conflicting activation signals.
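As a rough sketch of this cleanup step, the Python helper below removes the deprecated flag from a request body before it is sent. It assumes the function receives the object that directly contains advanced_features. The helper name is hypothetical, and the function returns a cleaned copy rather than mutating the caller's data.

```python
import copy
from typing import Any, Dict


def strip_legacy_aivad(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `properties` without
    advanced_features.enable_aivad.

    Assumption: `properties` is the object that directly holds the
    `advanced_features` key. All other fields are left untouched.
    """
    cleaned = copy.deepcopy(properties)
    advanced = cleaned.get("advanced_features")
    if isinstance(advanced, dict):
        # pop with a default so a missing key is not an error
        advanced.pop("enable_aivad", None)
    return cleaned
```

Returning a copy keeps the original payload available for logging or comparison, which can help confirm that only the deprecated field was removed.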
Align SIP Presets with Modern Schemas
Audit the sip_default preset and any other predefined templates. These global configurations must be updated to follow the new schema to ensure full compatibility with semantic-based detection logic.
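One possible way to run such an audit is sketched below in Python. It assumes the presets have been exported into a mapping of preset name to preset body, with advanced_features at the top level of each body. That storage layout and the function name are assumptions made for illustration only. How presets are actually stored or retrieved is not covered here.

```python
from typing import Any, List, Mapping


def presets_with_legacy_flag(
    presets: Mapping[str, Mapping[str, Any]]
) -> List[str]:
    """Return the sorted names of presets that still carry
    advanced_features.enable_aivad.

    Assumption: `presets` maps a preset name (for example
    "sip_default") to its body as a plain dict.
    """
    flagged = []
    for name, body in presets.items():
        advanced = body.get("advanced_features") or {}
        if "enable_aivad" in advanced:
            flagged.append(name)
    return sorted(flagged)
```

Any preset name returned by this check is a candidate for migration to the turn_detection schema.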
Decommission Private Parameters
Discontinue the use of private or experimental fields such as semantic_cfg.use_semantic_eos. These parameters are scheduled for removal in version 2.6 and later, and their presence may cause unpredictable behavior in future SDK releases.
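Because private fields may sit at different depths in a payload, a recursive scan is one way to locate them. The Python sketch below walks a decoded JSON value and reports the dotted path of every key whose name appears in a deny-list. The deny-list contains only use_semantic_eos, the one field named above. The function name and the deny-list constant are hypothetical, and the list should be extended with any other private fields you know to be in use.

```python
from typing import Any, FrozenSet, List

# Only the field named in this article; extend as needed.
PRIVATE_KEYS: FrozenSet[str] = frozenset({"use_semantic_eos"})


def find_private_fields(
    node: Any,
    private_keys: FrozenSet[str] = PRIVATE_KEYS,
    path: str = "",
) -> List[str]:
    """Return dotted paths of every key in `node` whose name is in
    `private_keys`, searching nested dicts and lists."""
    hits: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            if key in private_keys:
                hits.append(child_path)
            hits.extend(find_private_fields(value, private_keys, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(
                find_private_fields(item, private_keys, f"{path}[{index}]")
            )
    return hits
```

An empty result means none of the listed private fields were found in the payload.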
Utilize Studio UI for Parameter Calibration
Prefer adjusting detection sensitivity and timeouts through the official Studio user interface. This ensures that the generated JSON payloads are compliant with the latest schema requirements and prevents the accidental inclusion of hidden deprecated fields.
Conclusion
Eliminating the outdated parameter path ensures that the media engine correctly parses the semantic detection instructions. This change restores full AIVAD functionality and provides a stable foundation for AI-driven voice interactions.
Corresponding Document/Link
- CSD-77608