One of the central challenges in Extended Reality (XR) is creating virtual humans that feel believable, expressive, and diverse. Whether used as avatars, virtual companions, trainees, or background characters populating immersive environments, digital humans play a critical role in establishing a convincing sense of presence.

Within the PRESENCE project, Didimo has continued to advance the technologies that make high-quality virtual humans accessible at scale. Over the last few years, our research has focused on two complementary goals: improving the fidelity of digital character creation from minimal user input and enabling the efficient generation of large populations of unique characters suitable for interactive applications.

These efforts have resulted in innovations presented at ACM SIGGRAPH in 2022 and 2024, demonstrating how research can help reconcile realism, scalability, and production efficiency.

Creating a Digital Human from a Single Photo or Concept Art

One of the longstanding challenges in digital human creation is the amount of time, expertise, and specialized equipment traditionally required to produce realistic 3D characters. Creating a digital human often involves complex capture setups, detailed 3D modeling, and extensive manual refinement by skilled artists. While these workflows can deliver impressive results, they are difficult to scale to the needs of modern XR experiences.

To address this challenge, Didimo developed a fully automated pipeline (Dias et al., 2022) capable of reconstructing a high-fidelity 3D face from a single frontal image. The input can be either a photograph of a real person or a realistic piece of concept art, making the technology applicable to both avatar creation and digital content production workflows.

The system combines machine learning, computer vision, and computer graphics techniques to infer the underlying three-dimensional structure of a face from a two-dimensional image. A neural network first estimates the overall facial geometry, generating an initial 3D representation that captures the major characteristics of the individual or character. Rather than relying on large collections of real-world 3D facial scans, the system was trained on photorealistic synthetic data. This lets it learn robust relationships between facial appearance and shape while avoiding the difficulty of acquiring large-scale 3D ground-truth datasets.

The reconstruction is then refined through a detailed analysis of facial landmarks, including the eyes, nose, mouth, jawline, and forehead. This additional step allows the generated model to preserve subtle facial characteristics that contribute to identity and recognizability. The result is a realistic, animatable digital human that maintains a strong resemblance to the original image while being suitable for real-time applications.
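
The refinement idea can be illustrated with a toy example. The sketch below is not Didimo's implementation. It assumes a small linear shape model (a mean shape plus weighted basis offsets), an orthographic frontal projection that simply drops the depth coordinate, and plain gradient descent on the 2D landmark error. All names and numbers (`MEAN_LANDMARKS`, `BASIS`, `refine`) are hypothetical.

```python
"""Toy landmark refinement: adjust shape coefficients so that projected 3D
landmarks match 2D landmarks detected in the image. Illustrative only."""

# Hypothetical model: 3 landmarks (x, y, z) and 2 shape basis vectors.
MEAN_LANDMARKS = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -1.0, 0.5)]
BASIS = [
    [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 0.0)],  # wider eye spacing
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, -0.5, 0.2)],  # longer chin
]


def landmarks_3d(coeffs):
    """Evaluate the linear shape model: mean + sum_k coeffs[k] * BASIS[k]."""
    points = []
    for i, mean_pt in enumerate(MEAN_LANDMARKS):
        points.append(tuple(
            mean_pt[d] + sum(c * BASIS[k][i][d] for k, c in enumerate(coeffs))
            for d in range(3)
        ))
    return points


def project(points):
    """Orthographic projection (assumed frontal view): drop z."""
    return [(x, y) for x, y, _ in points]


def landmark_error(coeffs, targets_2d):
    """Sum of squared 2D distances between projected and target landmarks."""
    proj = project(landmarks_3d(coeffs))
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(proj, targets_2d))


def refine(initial_coeffs, targets_2d, steps=200, lr=0.1):
    """Gradient descent on the landmark error from an initial estimate."""
    coeffs = list(initial_coeffs)
    for _ in range(steps):
        proj = project(landmarks_3d(coeffs))
        grads = []
        for k in range(len(coeffs)):
            g = 0.0
            for i, ((px, py), (tx, ty)) in enumerate(zip(proj, targets_2d)):
                g += 2 * (px - tx) * BASIS[k][i][0]
                g += 2 * (py - ty) * BASIS[k][i][1]
            grads.append(g)
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


# Targets are produced from known coefficients so the result can be checked.
true_coeffs = [0.4, -0.6]
targets = project(landmarks_3d(true_coeffs))
refined = refine([0.0, 0.0], targets)  # [0.0, 0.0] stands in for the network's estimate
```

Here the starting coefficients play the role of the neural network's initial estimate, and the loop pulls them toward the values that best explain the detected landmarks. A production system would use a far richer model, a perspective camera, and a more robust optimizer.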

Supporting both photographs and realistic concept art significantly broadens the range of applications for the technology. Users can create personalized avatars from their own images, while artists and content creators can transform character illustrations into fully animatable 3D assets without requiring a complete manual modeling workflow. This flexibility creates a bridge between creative design and production, helping accelerate the creation of virtual humans for XR experiences.

For the PRESENCE project, such capabilities are particularly valuable. By making it easier to create digital humans that either faithfully represent real individuals or accurately reproduce designed characters, the technology helps strengthen the sense of identity, immersion, and social presence within virtual environments.

Semantic Character Creation

While generating a digital human from a photograph or concept art provides an intuitive creation workflow, there are many situations where no reference image exists. Game designers may wish to populate a virtual world with previously unseen characters, researchers may need synthetic populations with controlled characteristics, and XR creators may want to rapidly explore alternative character designs.

To address these use cases, Didimo extended its character generation technology to support character creation directly from semantic descriptions. Instead of starting from an image, users can specify a set of demographic attributes, such as age, gender, and broad population characteristics, together with a collection of facial shape descriptors that capture the proportions and appearance of key facial regions.

These descriptors provide intuitive control over characteristics such as head shape, facial proportions, eye geometry, nose dimensions, lip shape, jaw structure, ear morphology, and other anatomical traits. Internally, the system translates these high-level descriptions into a consistent three-dimensional facial representation using the same statistical models that underpin the image-based reconstruction pipeline.
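
One way to picture this translation is as a lookup from descriptors to offsets in a shape space. The sketch below is an assumption made for illustration, not the actual statistical model: it uses a made-up four-dimensional coefficient space, age as the only demographic attribute, and hand-written offsets. `CharacterSpec`, `DESCRIPTOR_OFFSETS`, `AGE_PRIORS`, and `spec_to_coefficients` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical descriptor vocabulary: each (region, value) pair nudges a few
# coefficients of a small, made-up 4-dimensional shape space.
DESCRIPTOR_OFFSETS = {
    ("lips", "wide"): [0.0, 0.8, 0.0, 0.0],
    ("lips", "narrow"): [0.0, -0.8, 0.0, 0.0],
    ("forehead", "narrow"): [-0.5, 0.0, 0.0, 0.0],
    ("jaw", "broad"): [0.0, 0.0, 0.9, 0.0],
    ("nose", "prominent"): [0.0, 0.0, 0.0, 0.7],
}

# Hypothetical demographic priors (mean coefficients per age group).
AGE_PRIORS = {
    "young": [0.1, 0.0, -0.2, 0.0],
    "adult": [0.0, 0.0, 0.0, 0.0],
    "elderly": [-0.1, 0.0, 0.3, 0.2],
}


@dataclass
class CharacterSpec:
    """A semantic description: one demographic attribute plus shape descriptors."""
    age: str
    descriptors: dict = field(default_factory=dict)  # region -> value


def spec_to_coefficients(spec):
    """Start from the demographic prior, then add each descriptor's offset."""
    coeffs = list(AGE_PRIORS[spec.age])
    for region, value in spec.descriptors.items():
        offset = DESCRIPTOR_OFFSETS.get((region, value))
        if offset is None:
            raise ValueError(f"Unknown descriptor: {region}={value}")
        coeffs = [c + o for c, o in zip(coeffs, offset)]
    return coeffs


spec = CharacterSpec(age="young", descriptors={"lips": "wide", "forehead": "narrow"})
coeffs = spec_to_coefficients(spec)
```

The resulting coefficient vector could then be fed to the same kind of shape model used for image-based reconstruction, which is what makes the two creation routes compatible.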

This approach offers two important advantages. First, it enables the creation of entirely new characters that are not derived from existing photographs, providing greater creative freedom while avoiding dependence on reference imagery. Second, it allows developers to generate controlled populations of virtual humans with specific demographic and morphological distributions, which can be particularly valuable for simulations, training environments, and large-scale XR experiences.
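
As a minimal sketch of what a controlled population could look like in practice, the example below draws character descriptions from target distributions with a fixed random seed. The attribute names, values, and proportions are illustrative assumptions, not taken from Didimo's tooling, and each attribute is sampled independently for simplicity.

```python
import random
from collections import Counter

# Hypothetical target distributions; names and proportions are illustrative.
TARGET_DISTRIBUTIONS = {
    "age": {"young": 0.3, "adult": 0.5, "elderly": 0.2},
    "face_width": {"narrow": 0.25, "average": 0.5, "wide": 0.25},
}


def sample_population(size, distributions, seed=0):
    """Draw `size` character specs, sampling each attribute independently."""
    rng = random.Random(seed)  # fixed seed gives a reproducible population
    population = []
    for _ in range(size):
        spec = {}
        for attribute, weights in distributions.items():
            values = list(weights)
            probabilities = [weights[v] for v in values]
            spec[attribute] = rng.choices(values, weights=probabilities)[0]
        population.append(spec)
    return population


population = sample_population(1000, TARGET_DISTRIBUTIONS, seed=42)
age_counts = Counter(spec["age"] for spec in population)
```

Each sampled description can then be turned into a character, and the seed makes it possible to regenerate the same crowd later. Correlated attributes would require sampling from a joint distribution instead.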

Because both image-based reconstruction and descriptor-based generation ultimately produce compatible digital human representations, they can be integrated seamlessly within the same production workflow. Creators may generate some characters from photographs, others from concept art, and others entirely from semantic descriptions, while maintaining visual consistency and compatibility with downstream animation and asset-generation processes.

For the PRESENCE project, this capability represents an important step toward more flexible and scalable virtual human creation. Rather than being limited to reproducing existing individuals, XR applications can generate diverse populations of believable digital humans tailored to the needs of a specific virtual environment, helping create richer and more engaging social experiences.

Beyond Reconstruction: Scaling Character Creation

While realistic avatar reconstruction is important, many XR experiences and games require something equally challenging: large populations of unique digital humans.

Creating hundreds or thousands of characters manually is often impractical. Each character typically requires modeling, texturing, rigging, animation setup, and asset fitting, resulting in substantial production costs and long development cycles.

Building on the technologies developed for facial reconstruction, Didimo extended its character generation pipeline to support large-scale character creation while preserving artistic direction and technical consistency.

This work, presented at ACM SIGGRAPH 2024 (Dias et al., 2024), introduces a framework capable of generating fully rigged, dressed, and animatable 3D characters from a single template character. Rather than replacing artists, the system amplifies their work by allowing a single carefully crafted template to serve as the foundation for an unlimited number of variations.

The process begins with a template character created by an artist and designed to match the visual style and technical requirements of a specific project. The system automatically extracts the character’s stylistic characteristics and maps them onto a flexible underlying human representation.

New characters can then be generated from photographs, concept art, or descriptive parameters. Their facial features, body characteristics, textures, and other visual attributes are automatically adapted to match the original artistic style. Existing assets such as clothing, hairstyles, accessories, and animations are transferred automatically, significantly reducing manual effort.
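
The following toy example illustrates why a shared underlying representation makes this kind of transfer straightforward. It is a simplified sketch under stated assumptions, not the published method: style is modeled as a per-vertex offset from a neutral shape, all characters are assumed to share one mesh topology, and an asset is assumed to be bound to vertex indices. All function names and values are hypothetical.

```python
"""Toy style and asset transfer on a shared mesh topology. Illustrative only."""


def add(a, b):
    """Component-wise sum of two 3D points."""
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def extract_style(template_vertices, neutral_vertices):
    """Assumed style representation: per-vertex offset from a neutral shape."""
    return [sub(t, n) for t, n in zip(template_vertices, neutral_vertices)]


def apply_style(identity_vertices, style_offsets):
    """Stylize a new identity by adding the template's per-vertex offsets."""
    return [add(v, s) for v, s in zip(identity_vertices, style_offsets)]


def transfer_asset(asset_bindings, character_vertices):
    """Re-place an asset whose points are bound to (vertex index, offset)."""
    return [add(character_vertices[i], offset) for i, offset in asset_bindings]


# Three-vertex stand-ins for full meshes that share one topology.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.5, 0.0)]  # artist's stylized head
new_identity = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (0.0, 1.1, 0.0)]

style = extract_style(template, neutral)
stylized = apply_style(new_identity, style)

# A hat point bound slightly above vertex 2 follows the new character.
hat_bindings = [(2, (0.0, 0.1, 0.0))]
hat_points = transfer_asset(hat_bindings, stylized)
```

Because every generated character keeps the same vertex ordering as the template, anything bound to those vertices (clothing, hair, rig weights) can be re-evaluated on the new shape without manual refitting. Real assets would also need collision handling and artist-defined constraints, which this sketch omits.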

This approach allows developers to rapidly populate virtual worlds with diverse characters while maintaining visual coherence across the entire experience.

Supporting Presence Through Diversity and Realism

A convincing sense of presence depends not only on realism, but also on diversity. Virtual environments become more believable when they are populated by characters that look distinct, behave naturally, and reflect the variety found in real-world communities.

The technologies developed by Didimo contribute to this goal in two complementary ways.

First, they make it easier to create personalized digital humans that faithfully represent individual users. Second, they enable the generation of large numbers of unique characters that can inhabit virtual spaces without appearing repetitive or artificially duplicated.

Together, these capabilities help create richer and more engaging XR experiences, supporting the broader objectives of the PRESENCE project.

Looking Ahead

The evolution from single-photo facial reconstruction to scalable character generation illustrates how advances in artificial intelligence, computer graphics, and automation are transforming the creation of virtual humans.

As XR experiences continue to grow in scale and sophistication, the demand for believable, expressive, and diverse digital characters will only increase. By combining research innovation with production-ready tools, Didimo aims to make the creation of high-quality virtual humans faster, more accessible, and better suited to the immersive experiences envisioned by PRESENCE.

The ultimate goal is not simply to create more digital characters, but to enable more meaningful and authentic human interactions in virtual environments.

References

Mariana Dias, Pedro Coelho, Rui Figueiredo, Rita Carvalho, Verónica Orvalho, and Alexis Roche. “Creating infinite characters from a single template: How automation may give super powers to 3D artists.” In ACM SIGGRAPH 2024 Talks, 2024.

Mariana Dias, Alexis Roche, Margarida Fernandes, and Verónica Orvalho. “High-fidelity facial reconstruction from a single photo using photo-realistic rendering.” In ACM SIGGRAPH 2022 Talks, 2022.


Figure (images not reproduced here): example characters, each labeled with its text description:
“A young East Asian female with wide lips and narrow forehead”
“An elderly Caucasian male with broad jawline, prominent nose, and blue eyes”
“An adult Middle East female with upward-tilted eyes and narrow face”
“An adult Hispanic male with small eyes and narrow nose”

Scalable Virtual Human Creation from Images and Text