HN – Real-time talking avatar pipeline?

NavTalk lets you upload ~2 s of video and immediately get a realtime avatar with lip sync and live voice interaction, at <2 s latency. There's no obvious long training step; it looks more like instant personalization.

Is this just feature extraction + conditioning on a pretrained talking-head model? Curious what the minimal pipeline is (feature encoding, identity representation, realtime inference) and how people are doing this efficiently.
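
To make the question concrete, here is my rough mental model as a toy Python sketch. This is a guess at the shape of the pipeline, not NavTalk's actual design: every name in it (`encode_identity`, `audio_to_motion`, `render`, `stream_avatar`) is a hypothetical placeholder, and the function bodies are trivial arithmetic marking where pretrained networks would go.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Everything here is a hypothetical stand-in: the "models" are toy
# functions that only mark where pretrained networks would sit.

Frame = List[float]       # a flattened image (toy representation)
AudioChunk = List[float]  # a short window of PCM samples


@dataclass(frozen=True)
class Identity:
    """Appearance features computed once from the short reference clip."""
    embedding: List[float]


def encode_identity(reference_frames: List[Frame]) -> Identity:
    # Placeholder for a pretrained appearance encoder: average the frames.
    n = len(reference_frames)
    dim = len(reference_frames[0])
    return Identity([sum(f[i] for f in reference_frames) / n for i in range(dim)])


def audio_to_motion(chunk: AudioChunk) -> float:
    # Placeholder for an audio-to-motion model: mean absolute amplitude
    # used as a crude "mouth openness" signal.
    return sum(abs(s) for s in chunk) / len(chunk) if chunk else 0.0


def render(identity: Identity, motion: float) -> Frame:
    # Placeholder for a generator conditioned on identity + motion.
    return [v * (1.0 + motion) for v in identity.embedding]


def stream_avatar(identity: Identity, audio: Iterable[AudioChunk]) -> Iterator[Frame]:
    # Per-chunk loop: no training step, only forward passes.
    for chunk in audio:
        yield render(identity, audio_to_motion(chunk))
```

The point of the sketch is the split: the identity is encoded once per upload, and everything after that is a per-chunk forward pass, which would explain the lack of a visible training step if that is what is going on.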

Any insights or similar open-source patterns?

I'm already comfortable building voice AI bots, but I'm struggling with the face side.

https://navtalk.ai/