Safety and Alignment for Voice Agents
Bringing AI safety to voice agents, and voice-first interaction to AI safety.
SafeVoiceAgents is motivated by a two-way question: how can AI safety methods be adapted to voice agents, and how should voice-first interaction reshape the broader AI safety agenda? The workshop brings advances in alignment, evaluation, and governance to spoken, streaming, and full-duplex conversational agents — while drawing attention to safety challenges that arise uniquely in voice and are difficult to capture in text-only settings.
Voice is rapidly becoming a dominant interface between humans and AI, as advances in audio language models enable more natural and capable spoken conversational agents. The deployment of systems such as ChatGPT Voice, Gemini Live, Doubao, and related platforms signals this shift from text-centric AI toward voice-native human–agent interaction. As users come to rely on voice agents to seek information, receive recommendations, and interact with digital services, ensuring that these systems behave safely, reliably, and in accordance with human values becomes urgent. Yet safety research has not kept pace with the rapid progress in voice agent capabilities, and most existing safety and alignment studies remain fundamentally text-centric.
Voice differs from text in safety-critical ways at both the interaction and model levels. At the interaction level, speech conveys not only semantic content but also prosody, tone, emotion, speaking style, and other paralinguistic cues; spoken interactions may also carry background sounds and non-speech acoustic signals. Many acoustically distinct utterances can correspond to the same transcript while conveying different intent or social meaning — so safety mechanisms for voice agents must reason beyond text alone. At the model level, voice agents are increasingly powered by audio language models that map both speech and text into a shared decoding process. Practical models heavily compress the audio stream, which can bias them toward textual content, weaken faithful use of acoustic cues, and degrade multimodal reasoning. The result is a clear gap: deployment is becoming audio-first, while most successful safety and alignment methods remain text-first.
Voice agent safety is at an inflection point. Three processes are converging: deployment is becoming voice-first to provide natural human–AI interaction, model architectures are becoming voice-native and increasingly real-time, and the evidence base for unsafe model behavior has matured from warnings to reproducible failure modes.
Recent literature now provides enough critical mass to make voice agent safety a community agenda rather than a set of isolated papers. Safety risks have been observed across jailbreaks, robustness, privacy, accent, emotion, adversarial attacks, speaking style, and compositional audio language prompting. In parallel, voice-native benchmarks, judges, fairness studies, and early protective mechanisms are beginning to define what beyond-transcription evaluation and mitigation should look like. Without a dedicated venue, progress may remain scattered across the speech and AI safety communities, slowing the development of safety standards at the very moment voice is becoming a primary interface between humans and AI.
Contribute to the first workshop on voice agent safety.
We welcome research papers across nine topic areas, from voice-specific safety risks to policy and governance. Submissions follow the NeurIPS 2026 Author Instructions.
Read the Call for Papers →