Business

Tavus Research Models Phoenix-3, Raven-0, and Hummingbird-0 Redefine Realism and Perception in AI Systems

Six Months After Launch, Tavus' Phoenix-3, Raven-0, and Hummingbird-0 Are Redefining the Future of Human-AI Interaction

San Francisco, CA – Earlier this year, Tavus quietly rolled out a suite of research models that would go on to reshape how the industry thinks about AI avatars and perception: Phoenix-3, a frontier rendering model; Raven-0, the first contextual perception system; and Hummingbird-0, a zero-shot lip-sync engine.

Now these models are powering a new generation of applications for Fortune 500 companies and startups alike, where AI doesn't just look real, it feels real. From conversational video interfaces to multilingual dubbing pipelines, Tavus' technology has kicked off a revolution in how humans interact with AI systems.

The Research Team

Behind these advances is a tightly coordinated team of researchers at Tavus: Damian Willary, Eloi du Bois, Karthik Ragunath Ananda Kumar, Minh Anh Nguyễn, Mustafa Isik, Jack Saunders, Roey Paz-Priel, Mert Gerdan, Chenglai Zhong, Haiyao Xiao, and Ari Korin.

This team’s mix of backgrounds—from rendering pipelines to perception systems to multimodal AI—was the catalyst that made the March release possible.

Phoenix-3: Solving the Uncanny Valley

For years, AI avatars struggled with the "uncanny valley": rendered faces that moved but didn't emote. Phoenix-3 changed that. Using a Gaussian diffusion backbone, it renders full-face animation in real time, capturing blinks, micro-expressions, and emotional nuance. The result is something that feels less like a simulation and more like a person on the other side of the screen.

Raven-0: From Vision to Perception

Most machine vision systems see the world as pixels and categories. Raven-0 treats it as context. It's the first AI perception system that interprets intent, emotion, and subtle cues in real time, an approach that's already proving valuable in healthcare, education, and customer engagement.

Hummingbird-0: Lip Sync That Just Works

Born out of Phoenix-3's development, Hummingbird-0 quickly took on a life of its own. The state-of-the-art model can align audio and video with zero training or fine-tuning, while preserving both identity and realism. For creators, studios, and enterprises, that means faster dubbing, seamless localization, and entirely new workflows for video production.

Six Months of Impact

Since launch, developers have built a wave of new applications on top of Tavus' APIs.

Benchmarks have confirmed what early adopters are seeing in practice: Hummingbird-0 sets a new bar in lip-sync accuracy, visual quality, and identity preservation, while Phoenix-3 has brought real-time rendering to a fidelity level once thought impossible.

A Turning Point for AI Video

"The release of Phoenix-3, Raven-0, and Hummingbird-0 wasn't just about making avatars look real. It was about making them feel present. It's a turning point in how AI connects with people, and it's only the start." – the Tavus Research Team