useAiVoice Voice Interaction

Browser Compatibility

This feature uses the browser's native Web Speech API for speech-to-text. In Google Chrome, it requires a secure context (HTTPS or localhost) to work properly.

useAiVoice is an integrated voice-processing hook. It encapsulates the browser's native speech-to-text (STT) capabilities and also includes a built-in Web Audio analysis engine that outputs real-time waveform data, pairing naturally with the AiVoiceTrigger component for dynamic visual feedback.

Key Features

  • Visual Sync: A built-in frequency analyzer returns a real-time amplitudes array that can drive waveform animations directly.
  • Physical Recording: A built-in MediaRecorder produces a .webm audio blob when recording stops.
  • Smart VAD: Supports Voice Activity Detection to automatically stop recording when the user finishes speaking.
  • Real-time STT: Supports interimResults to provide partial transcripts while the user is still speaking.
  • Plug-and-Play: Automatically manages microphone permissions, AudioContext lifecycle, and proper cleanup.
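To make the Visual Sync bullet concrete, here is a hedged sketch of how frequency data from a Web Audio AnalyserNode could be bucketed into a `waveCount`-length amplitudes array. The `toAmplitudes` helper is an illustration, not the hook's actual implementation; the real bucketing may differ.

```typescript
// Sketch (assumption): turn AnalyserNode byte frequency data into
// `waveCount` normalized bars suitable for a waveform visualizer.
function toAmplitudes(freqData: Uint8Array, waveCount: number): number[] {
  const bucketSize = Math.floor(freqData.length / waveCount) || 1
  const bars: number[] = []
  for (let i = 0; i < waveCount; i++) {
    let sum = 0
    for (let j = 0; j < bucketSize; j++) {
      sum += freqData[i * bucketSize + j] ?? 0
    }
    // getByteFrequencyData yields values in 0-255; normalize to 0-1.
    bars.push(sum / bucketSize / 255)
  }
  return bars
}
```

In a browser you would fill `freqData` each animation frame via `analyser.getByteFrequencyData(freqData)` and hand the result to the visualizer.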

Basic Usage

Interactive demo on the original page: Basic Interaction Demo (requires a browser with Web Speech API support; Chrome recommended).
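Since the live demo cannot run here, the snippet below sketches the basic start/stop wiring. The option and return names follow the API tables below, but `useAiVoice` is replaced by a local stub (an assumption, not the library's implementation) so the wiring is runnable outside a browser; in an app you would import the real hook from the library instead.

```typescript
// Local stub mirroring the documented contract of useAiVoice.
type Ref<T> = { value: T }

interface VoiceOptions {
  language?: string
  onResult?: (transcript: string) => void
}

function useAiVoice(options: VoiceOptions = {}) {
  const isRecording: Ref<boolean> = { value: false }
  const transcript: Ref<string> = { value: '' }
  return {
    isRecording,
    transcript,
    async start() { isRecording.value = true },  // real hook also requests the mic
    stop() {                                     // real hook finalizes STT here
      isRecording.value = false
      transcript.value = 'hello world'           // stand-in recognition result
      options.onResult?.(transcript.value)
    },
  }
}

// Typical wiring: toggle recording, collect final results in onResult.
const results: string[] = []
const { isRecording, transcript, start, stop } = useAiVoice({
  language: 'en-US',
  onResult: (t) => results.push(t),
})
void start()
stop()
```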

Integration with AiSender

Integrate the voice trigger into AiSender for an input experience similar to mainstream AI assistants.

Interactive demo on the original page: Integrated Input Field.
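One way to picture the glue code: forward the hook's final result into the sender's send action. This is a hedged sketch; `sendMessage` is a hypothetical stand-in for AiSender's send behavior, not its real API.

```typescript
// Sketch: forward the final transcript to a send action once recording stops.
const sent: string[] = []
function sendMessage(text: string): void {
  sent.push(text) // hypothetical stand-in for AiSender's send behavior
}

// Matches the onStop signature in the options table below:
// (transcript: string, blob: Blob | null) => void
function onStop(transcript: string, _blob: Blob | null): void {
  const text = transcript.trim()
  if (text) sendMessage(text) // skip empty recognitions
}

onStop('  turn on the lights  ', null)
```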

Advanced Case: Spheromorphism AI Chat

Interactive demo on the original page: Full Voice Chat Loop. The assistant opens with "Hello! I am your voice assistant. What would you like to talk about?"
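The loop behind such a demo can be sketched as: recognize speech, append the user message, generate a reply, then show or speak it. In this hedged sketch `getReply` is a placeholder for your model call, not a library API.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string }

const history: ChatMessage[] = [
  { role: 'assistant', content: 'Hello! I am your voice assistant. What would you like to talk about?' },
]

// Placeholder reply generator; swap in a real model call.
function getReply(userText: string): string {
  return `You said: ${userText}`
}

// Feed this from the hook's onResult callback; in a browser the reply could
// then be read aloud via speechSynthesis.speak(new SpeechSynthesisUtterance(reply)).
function handleTranscript(text: string): void {
  history.push({ role: 'user', content: text })
  history.push({ role: 'assistant', content: getReply(text) })
}

handleTranscript('hi')
```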

API

UseAiVoiceOptions

| Property | Description | Type | Default |
| --- | --- | --- | --- |
| language | Recognition language | string | 'zh-CN' |
| interimResults | Request partial results | boolean | true |
| continuous | Continuous recognition | boolean | false |
| vad | Enable smart silence detection | boolean | true |
| vadThreshold | Silence duration before auto-stop (ms) | number | 2000 |
| volumeThreshold | Volume sensitivity threshold (0-1) | number | 0.05 |
| waveCount | Number of amplitude bars | number | 20 |
| useSTT | Enable browser speech recognition | boolean | true |
| onStart | Callback when recording starts | () => void | - |
| onStop | Callback with final transcript and audio blob | (transcript: string, blob: Blob \| null) => void | - |
| onResult | Callback for a finalized transcript | (transcript: string) => void | - |
| onPartialResult | Callback for interim transcripts | (transcript: string) => void | - |
| onError | Error callback | (error: unknown) => void | - |
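The interplay of vad, vadThreshold, and volumeThreshold can be sketched as a simple decision: auto-stop once the volume has stayed below volumeThreshold for vadThreshold milliseconds. This is an illustrative model (the `shouldAutoStop` helper is an assumption), not the hook's actual code.

```typescript
// Sketch of the VAD decision implied by the options above: track the last
// moment voice was detected, and auto-stop after vadThreshold ms of silence.
interface VadState { lastVoiceAt: number }

function shouldAutoStop(
  state: VadState,
  volume: number,          // 0-1, from the analyser
  now: number,             // current timestamp in ms
  volumeThreshold = 0.05,  // defaults mirror the options table
  vadThreshold = 2000,
): boolean {
  if (volume >= volumeThreshold) {
    state.lastVoiceAt = now // speech detected: reset the silence timer
    return false
  }
  return now - state.lastVoiceAt >= vadThreshold
}
```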

Return Value

| Export | Description | Type |
| --- | --- | --- |
| isRecording | Reactive recording state | Ref<boolean> |
| transcript | Confirmed final text | Ref<string> |
| interimTranscript | Real-time partial text | Ref<string> |
| amplitudes | Real-time waveform data for AiVoiceTrigger | Ref<number[]> |
| volume | Real-time volume (0-100) | Ref<number> |
| audioBlob | Generated audio file | Ref<Blob \| null> |
| start | Start recording and recognition | () => Promise<void> |
| stop | Stop and collect results | () => void |
| cancel | Cancel recording and discard the current result | () => void |
| sttSupported | Browser support check | boolean |

Released under the MIT License.