If you've tried building voice AI apps with Gemini, you probably hit the same issues I did:
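- Gemini expects 16kHz PCM input and sends 24kHz output, but browsers typically run audio at 44.1kHz or 48kHz, so you have to resample in both directions
- The audio is little-endian PCM16. Int16Array reads samples in the platform's byte order, so using it directly gives you garbage on any device that isn't little-endian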
- Buffer too little = choppy audio. Buffer too much = noticeable latency
- Audio chunks arrive while playing - chain them wrong and you get gaps
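
To make the first two points concrete, here is a minimal sketch of decoding and resampling. It is not the library's code: `pcm16leToFloat32` and `resampleLinear` are hypothetical helper names, and the resampler is plain linear interpolation with no low-pass filter, so expect some aliasing when downsampling.

```typescript
// Decode little-endian PCM16 bytes into Float32 samples in [-1, 1).
// DataView lets us state the byte order explicitly; a plain Int16Array
// would read the bytes in the host platform's byte order instead.
function pcm16leToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return out;
}

// Naive linear-interpolation resampler (hypothetical helper, assumption:
// mono audio, no anti-aliasing filter).
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

For microphone input you would call something like `resampleLinear(samples, 48000, 16000)` before sending; for playback, decode each chunk with `pcm16leToFloat32` first.
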

I spent some hours debugging this, so I extracted the working code into a library.

npm install gemini-live-react
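
For the buffering and chunk-chaining points above, one common pattern is to keep a running "next start time" and schedule each chunk exactly where the previous one ends. The sketch below is my own illustration, not the library's API: `createChunkScheduler` and `leadSeconds` are hypothetical names, and the small lead time stands in for the buffer size you would tune.

```typescript
interface ChunkScheduler {
  // Returns the time (in seconds) at which the chunk should start playing.
  schedule(now: number, durationSeconds: number): number;
}

// Hypothetical helper: back-to-back scheduling with a small safety lead.
function createChunkScheduler(leadSeconds: number): ChunkScheduler {
  let nextStartTime = Number.NEGATIVE_INFINITY;
  return {
    schedule(now: number, durationSeconds: number): number {
      // First chunk, or playback ran dry: restart slightly in the future
      // so the next few chunks have time to arrive.
      if (nextStartTime < now) {
        nextStartTime = now + leadSeconds;
      }
      const startTime = nextStartTime;
      // The following chunk begins exactly where this one ends (no gap).
      nextStartTime += durationSeconds;
      return startTime;
    },
  };
}

// Browser usage sketch (Web Audio API), assuming `ctx` is an AudioContext
// and `audioBuffer` holds one decoded chunk:
//   const source = ctx.createBufferSource();
//   source.buffer = audioBuffer;
//   source.connect(ctx.destination);
//   source.start(scheduler.schedule(ctx.currentTime, audioBuffer.duration));
```

A larger `leadSeconds` tolerates more network jitter at the cost of latency; a smaller one does the opposite, which is the trade-off from the list above.
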