
PRESENCE at NVIDIA GPU Technology Conference (GTC) 2025

by Hannes Fassold – JOANNEUM RESEARCH

Introduction

In the swiftly changing field of Extended Reality (XR), the research project PRESENCE aims to revolutionize human interactions in virtual environments. This blog post spotlights the real-time pose estimation algorithm researched by JOANNEUM RESEARCH within PRESENCE, which Hannes Fassold presented as a poster at the NVIDIA GPU Technology Conference (GTC) 2025, held March 17-21 in San Jose, California.

Attracting roughly 25,000 attendees and featuring over 1,100 sessions and workshops, NVIDIA’s GPU Technology Conference 2025 solidified its status as a premier event for AI developers, researchers and industry leaders. Often referred to as the “Super Bowl of AI,” the conference centered on the theme of AI reaching a critical inflection point, with a strong emphasis on generative AI, AI reasoning capabilities, physical AI and the concept of AI Factories. This focus extended into the realm of spatial computing, with significant developments showcased in Augmented Reality (AR) and Virtual Reality (VR). The advancements in GPU technology provide the underlying engine for rendering the increasingly sophisticated, AI-enhanced virtual and augmented worlds explored throughout GTC 2025.

Figure 1: NVIDIA GPU Technology Conference 2025 held at San Jose McEnery Convention Center

PRESENCE poster presentation at GTC 2025

On Monday, 17th March, Hannes Fassold presented the work “A Flexible Framework for High-Quality Real-Time Human Pose Estimation” at the GTC poster session. It describes an AI-powered human pose estimation algorithm designed for real-time multimedia applications. The approach combines robust person detection using Scaled-YOLOv4, efficient optical-flow-based tracking and the high-quality RTMPose algorithm for skeleton extraction. Operating at 25 frames per second for up to five individuals simultaneously, the system demonstrates robust performance across content from diverse VR/XR scenarios. Experimental results confirm that the algorithm performs well in challenging conditions, such as crowded scenes with substantial occlusion, thereby offering significant potential for enhancing human-computer interaction and automated behavior analysis. Beyond the VR/XR environments of PRESENCE, the real-time pose estimation algorithm can also be used in application areas such as surveillance and healthcare. A sketch of such a detect-track-estimate pipeline is shown below.
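To make the pipeline structure concrete, the following Python sketch shows one plausible way a detect-track-estimate loop of this kind can be organized. It is not the actual implementation from the poster: `detect_persons`, `track_boxes` and `estimate_pose` are hypothetical placeholders standing in for Scaled-YOLOv4, the optical-flow-based tracker and RTMPose, and the detection interval and keypoint layout are assumed values.

```python
import numpy as np

DETECT_EVERY_N_FRAMES = 10   # assumed interval; the expensive detector runs only periodically
MAX_PERSONS = 5              # real-time budget from the poster: up to five individuals

def detect_persons(frame):
    """Placeholder for Scaled-YOLOv4 person detection; returns bounding boxes (x, y, w, h)."""
    return []

def track_boxes(prev_frame, frame, boxes):
    """Placeholder for optical-flow-based propagation of the boxes to the current frame."""
    return boxes

def estimate_pose(frame, box):
    """Placeholder for RTMPose skeleton extraction inside one person box."""
    return np.zeros((17, 2))  # e.g. 17 keypoints in a COCO-style layout (assumed)

def process_stream(frames):
    """Detect persons periodically, track them in between, extract one skeleton per person."""
    boxes, prev_frame = [], None
    for idx, frame in enumerate(frames):
        if idx % DETECT_EVERY_N_FRAMES == 0 or not boxes:
            boxes = detect_persons(frame)[:MAX_PERSONS]
        else:
            boxes = track_boxes(prev_frame, frame, boxes)
        skeletons = [estimate_pose(frame, box) for box in boxes]
        prev_frame = frame
        yield skeletons
```

Interleaving a periodically run detector with lightweight optical-flow tracking is a common way to keep such a pipeline within a 25 fps real-time budget, since full-frame detection is typically the most expensive step.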

Figure 2: Senior researcher Hannes Fassold from JOANNEUM RESEARCH presenting his poster about real-time pose estimation at GTC 2025

Real-time pose estimation for Intelligent Virtual Humans in PRESENCE

The presented real-time pose estimation algorithm is a crucial component of the action classification algorithm for the Virtual Humans (which can be either smart avatars, holoported humans, or intelligent agents) developed by JOANNEUM RESEARCH in PRESENCE WP4. By accurately recognizing user actions and gestures in real time, virtual avatars and holoported humans can behave naturally, improving the sense of presence and realism in digital environments. This capability enables seamless human-computer interaction, allowing users to control avatars through body movements, engage in natural communication with virtual characters, and experience adaptive responses based on their actions.

JOANNEUM RESEARCH has developed a first prototype of a real-time action recognition algorithm for virtual humans in PRESENCE. It employs the presented pose estimation algorithm together with a neural network for pose-based action recognition and is able to detect actions performed by virtual humans, such as raising a hand, waving hands, crossing hands in front, picking something up, or falling down. The action recognition is done by analyzing the estimated poses of a virtual human over a short time span (roughly two seconds) and classifying the resulting pose trajectory with a neural network. As can be seen in Figure 3, the action recognition algorithm successfully recognizes actions for different kinds of virtual humans. A sketch of such a trajectory classifier is given below.
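As an illustration of the trajectory classification step, the sketch below feeds a window of roughly two seconds of estimated poses (50 frames at 25 fps) into a small recurrent network. The architecture, keypoint layout, hidden size and action labels used here are assumptions for illustration only; the actual network used in PRESENCE is not described in detail in this post.

```python
import torch
import torch.nn as nn

NUM_FRAMES = 50        # ~2 seconds at 25 fps
NUM_KEYPOINTS = 17     # assumed COCO-style skeleton
ACTIONS = ["raise hand", "wave hands", "cross hands", "pick up", "fall down"]

class PoseActionClassifier(nn.Module):
    """Small GRU over per-frame keypoint vectors, followed by a linear classifier (assumed architecture)."""
    def __init__(self, hidden_size=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=NUM_KEYPOINTS * 2, hidden_size=hidden_size,
                          batch_first=True)
        self.head = nn.Linear(hidden_size, len(ACTIONS))

    def forward(self, poses):                     # poses: (batch, frames, keypoints, 2)
        batch = poses.shape[0]
        x = poses.reshape(batch, NUM_FRAMES, -1)  # flatten the keypoints of each frame
        _, h = self.rnn(x)                        # h: (1, batch, hidden)
        return self.head(h.squeeze(0))            # logits over the action classes

# Usage: classify one pose trajectory (dummy data in place of real estimated poses)
model = PoseActionClassifier()
trajectory = torch.randn(1, NUM_FRAMES, NUM_KEYPOINTS, 2)
action = ACTIONS[model(trajectory).argmax(dim=-1).item()]
```

In practice such a classifier would be run on a sliding window of the poses delivered by the real-time pose estimation pipeline, so that a new action prediction is available every few frames.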

Figure 3: Illustration of successfully detected actions for different kinds of virtual humans (left: holoported human, right: smart avatar).
