AI at Your Service: Digital Avatars With Speech Capabilities Offer Interactive Customer Experiences

Editor’s note: This post is part of the AI On blog series, which explores the latest techniques and real-world applications of agentic AI, chatbots and copilots. The series will also highlight the NVIDIA software and hardware powering advanced AI agents, which form the foundation of AI query engines that gather insights and perform tasks to transform everyday experiences and reshape industries.

To enhance productivity and upskill workers, organizations worldwide are seeking ways to provide consistent, around-the-clock customer service with greater speed, accuracy and scale.

Intelligent AI agents offer one such solution. They deliver advanced problem-solving capabilities and integrate vast and disparate sources of data to understand and respond to natural language.

Powered by generative AI and agentic AI, digital avatars are boosting efficiency across industries including healthcare, telecom, manufacturing and retail. According to Gartner, by 2028, 45% of organizations with more than 500 employees will use employee AI avatars to expand the capacity of human capital.1

From educating prospects on policies to giving customers personalized solutions, AI is helping organizations optimize revenue streams and elevate employee knowledge and productivity.

Where Context-Aware AI Avatars Are Most Impactful

Staying ahead in a competitive, evolving market requires continuous learning and analysis. AI avatars — also referred to as digital humans — are addressing key concerns and enhancing operations across industries.

One key benefit of agentic digital human technology is the ability to offer consistent, multilingual support and personalized guidance for a variety of use cases.

For instance, a medical AI agent can provide 24/7 virtual intake and support telehealth services. Or a virtual financial advisor can enhance client security and financial literacy by alerting bank customers to potential fraud, or by offering personalized promotions and investment tips based on their unique portfolios.

These digital humans boost efficiency, cut costs and enhance customer loyalty. Some key ways digital humans can be applied include:

Personalized, On-Brand Customer Assistance: A digital human interface can provide a personal touch when educating new customers on a company’s products and service portfolios. It can also deliver ongoing customer support, offering immediate responses and solving problems without the need for a live operator.
Enhanced Employee Onboarding: Intelligent AI assistants can offer streamlined, adaptable, personalized employee onboarding, whether in hospitals or offices, by providing consistent access to updated institutional knowledge at scale. With pluggable, customizable retrieval-augmented generation (RAG), these assistants can deliver real-time answers to queries while maintaining a deep understanding of company-specific data.
Seamless Communication Across Languages: In global enterprises, communication barriers can slow down operations. AI-powered avatars with natural language processing capabilities can communicate effortlessly across languages. This is especially useful in customer service or employee training environments where multilingual support is crucial.
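The retrieval-augmented generation pattern behind these assistants can be sketched minimally: embed the knowledge base, retrieve the most relevant passages for a query, then assemble them into a grounded prompt. Below is a toy sketch using a bag-of-words similarity as a stand-in for a real embedding model (all function names are illustrative, not part of any NVIDIA API):

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a production system would use a neural encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank knowledge-base documents by similarity to the query.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    # Assemble retrieved context into a prompt for the language model.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = [
    "New hires must complete security training within 30 days.",
    "The cafeteria is open from 8 a.m. to 3 p.m. on weekdays.",
]
print(build_prompt("When is security training due?", kb))
```

Because the retrieval step is pluggable, the same assistant can be pointed at updated institutional documents without retraining the underlying model.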

Learn more by listening to the NVIDIA AI Podcast episode with Kanjun Qiu, CEO of Imbue, who shares insights on how to build smarter AI agents.

Interactive AI Agents With Text-to-Speech and Speech-to-Text

With text-to-speech and speech-to-text capabilities, AI agents can offer enhanced interactivity and engagement in customer service interactions.

SoftServe, an IT consulting and digital services provider, has built several digital humans for a variety of use cases, highlighting the technology’s potential to enhance user experiences.

SoftServe’s Digital Concierge is accelerated by NVIDIA AI Blueprints and NVIDIA ACE technologies to rapidly deploy scalable, customizable digital humans across diverse infrastructures.

GEN, SoftServe’s virtual customer service assistant and digital concierge, makes customer service more engaging by providing lifelike interactions, continuous availability, personalized responses and simultaneous access to all necessary knowledge bases.

SoftServe also developed FINNA, an AI-powered virtual financial advisor that can provide financial guidance tailored to a client’s profile and simplify complex financial terminology. It helps streamline onboarding and due diligence, supporting goal-oriented financial planning and risk assessment.

AISHA is another AI-powered digital human developed by SoftServe with NVIDIA technology. Created for the UAE Ministry of Justice, the digital human significantly improves judicial processes by reducing case review times, enhancing the accuracy of rulings and providing rapid access to legal databases. It demonstrates how generative AI can bridge the gap between technology and meaningful user interaction to enhance customer service and operational efficiency in the judicial sector.

How to Design AI Agents With Avatar and Speech Features

Designing AI agents with avatar and speech features involves several key steps:

Determine the use case: Choose between 2D or 3D avatars based on the required level of immersion and interaction.
Develop the avatar: For 3D avatars, use specialized software and technical expertise to create lifelike movements and photorealism. For 2D avatars, opt for quicker development suited to web-embedded solutions.
Integrate speech technologies: Use NVIDIA Riva for world-class automatic speech recognition, along with text-to-speech, to enable verbal interactions.
Choose rendering options: Use NVIDIA Omniverse RTX Renderer technology or Unreal Engine tools for 3D avatars to achieve high-quality output and compute efficiency.
Plan deployment: Tap cloud-native deployment for real-time output and scalability, particularly for interactive web or mobile applications.
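The steps above come together at runtime in a simple turn loop: transcribe the user's speech, generate a response, then synthesize audio to drive the avatar. The sketch below uses stub functions in place of the speech and language services; every function name here is a hypothetical placeholder, not an actual Riva or Blueprint API:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    user_text: str
    agent_text: str

def transcribe(audio: bytes) -> str:
    # Placeholder for a speech-to-text call; here audio is pre-transcribed text.
    return audio.decode("utf-8")

def generate_reply(text: str, history: list) -> str:
    # Placeholder for the LLM or RAG agent backing the avatar.
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    # Placeholder for text-to-speech; a real service would return audio
    # that also drives the avatar's facial animation.
    return text.encode("utf-8")

def handle_turn(audio: bytes, history: list) -> bytes:
    # One conversational turn: ASR -> agent -> TTS, with history recorded.
    user_text = transcribe(audio)
    agent_text = generate_reply(user_text, history)
    history.append(Turn(user_text, agent_text))
    return synthesize(agent_text)

history = []
out = handle_turn(b"What are your hours?", history)
print(out.decode())
```

In a deployed system, each stub would be swapped for a streaming service call, and the loop would run continuously against microphone input.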

For an overview on how to design interactive customer service tools, read the technical blogs on how to “Build a Digital Human Interface for AI Apps With an NVIDIA AI Blueprint” and “Expanding AI Agent Interface Options With 2D and 3D Digital Human Avatars.”

NVIDIA AI Blueprint for Digital Humans

The latest release of the NVIDIA AI Blueprint for digital humans introduces several updates that enhance the interactivity and responsiveness of digital avatars, including dynamic switching between RAG models. Users can experience this directly in preview.

The integration of the Audio2Face-2D microservice in the blueprint means developers can create 2D digital humans, which require significantly less processing power compared with 3D models, for web- and mobile-based applications.

2D avatars are better suited for simpler interactions and platforms where photorealism isn’t necessary. This makes them ideal for scenarios like telemedicine, where quick loading times with lower bandwidth requirements are crucial.

Another significant update is the introduction of user attention detection through vision AI. This feature enables digital humans to detect when a user is present — even if they are idle or on mute — and initiate interaction, such as greeting the user. This capability is particularly beneficial in kiosk scenarios, where engaging users proactively can enhance the service experience.
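The greeting logic that attention detection enables can be sketched as a small state machine: greet once when a face first appears, then reset after the user has been absent long enough that the next visitor should be greeted anew. This is an illustrative sketch only; the class, method names and timeout value are hypothetical, not part of the blueprint's API:

```python
IDLE_TIMEOUT = 30.0  # seconds without presence before the session resets (assumed value)

class AttentionGreeter:
    """Greets a user once when presence is first detected, resetting after idle time."""

    def __init__(self, timeout: float = IDLE_TIMEOUT):
        self.timeout = timeout
        self.last_seen = None   # timestamp of the most recent detection
        self.greeted = False

    def on_frame(self, face_detected: bool, now: float):
        """Called per camera frame with the vision model's detection result.

        Returns a greeting string on first contact, otherwise None.
        """
        if face_detected:
            first_contact = self.last_seen is None
            self.last_seen = now
            if first_contact and not self.greeted:
                self.greeted = True
                return "Hello! How can I help you today?"
        elif self.last_seen is not None and now - self.last_seen > self.timeout:
            # User walked away: reset so the next visitor is greeted again.
            self.last_seen = None
            self.greeted = False
        return None

greeter = AttentionGreeter()
print(greeter.on_frame(True, now=0.0))   # greeting on first detection
print(greeter.on_frame(True, now=1.0))   # no repeat greeting while the user stays
```

The same gating approach works whether presence comes from a face detector, a person detector or any other vision signal the kiosk exposes.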

Getting Started

NVIDIA AI Blueprints make it easy to start building and setting up virtual assistants by offering ready-made workflows and tools to accelerate deployment. Whether for a simple AI-powered chatbot or a fully animated digital human interface, the blueprints offer resources to create AI assistants that are scalable, aligned with an organization’s brand and deliver a responsive, efficient customer support experience.

 

1. Gartner®, Hype Cycle for the Future of Work, 2024, Tori Paulman, Emily Rose et al., July 2024

GARTNER is a registered trademark and service mark and Hype Cycle is a trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
