Realistic 3D faces created on the fly

by time news

The dramatic advances in creative artificial intelligence (AI) never cease to amaze. The latest evidence comes from collaborative research between Nvidia and Stanford University (Computational Imaging Lab), which generates videos of faces in three dimensions by coupling the synthesis of super-realistic 3D images with a deep-learning system. The work was published in December 2021 and will be presented in June at the next IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

The method starts from a few two-dimensional photos to create series of realistic 3D images of human faces (or cats), varying the viewing angle fluidly thanks to unsupervised learning (without prior labeling of the data). Synthesizing 3D models from 2D photos is not a new problem in itself; it is a long-standing challenge. The method relies on a deep-learning technique called generative adversarial networks (GANs), in which two neural networks are pitted against each other, one creating data and the other evaluating its quality, thereby driving its improvement. The novelty was to couple this GAN with a super-resolution rendering technique.
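To make the adversarial setup concrete, here is a minimal GAN training sketch in PyTorch. The network shapes, latent size and hyperparameters are illustrative assumptions, not details of the Nvidia-Stanford model.

```python
# Minimal GAN sketch: a generator creates data, a discriminator judges it,
# and each network improves by competing with the other.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 128  # hypothetical sizes

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # real/fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch: torch.Tensor) -> None:
    n = real_batch.size(0)
    # Discriminator: score real data as 1, generated data as 0.
    fake = generator(torch.randn(n, latent_dim)).detach()
    loss_d = (bce(discriminator(real_batch), torch.ones(n, 1)) +
              bce(discriminator(fake), torch.zeros(n, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: try to make the discriminator score fakes as real.
    loss_g = bce(discriminator(generator(torch.randn(n, latent_dim))),
                 torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```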

Another advantage of the method is its new hybrid network architecture (known as "explicit-implicit"), which avoids the pitfalls of previous approaches: those were very computationally intensive and produced 3D images of low resolution.
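To illustrate what an "explicit-implicit" hybrid can look like, here is a minimal sketch in the spirit of the tri-plane representation this line of work builds on: features are stored on three explicit 2D planes, and a small implicit MLP decodes the feature gathered for any queried 3D point. All sizes and names are illustrative assumptions, not code from the paper.

```python
# Hybrid explicit-implicit lookup: explicit 2D feature planes plus an
# implicit MLP decoder. Sizes are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

C, R = 32, 256  # feature channels, plane resolution (hypothetical)
planes = nn.Parameter(torch.randn(3, C, R, R) * 0.01)   # XY, XZ, YZ planes
decoder = nn.Sequential(nn.Linear(C, 64), nn.ReLU(),
                        nn.Linear(64, 4))               # e.g. color + density

def query(points: torch.Tensor) -> torch.Tensor:
    """points: (N, 3) in [-1, 1]^3 -> (N, 4) decoded values."""
    feats = torch.zeros(points.size(0), C)
    for plane, dims in zip(planes, [[0, 1], [0, 2], [1, 2]]):
        # Project each 3D point onto the plane, bilinearly sample features.
        uv = points[:, dims].view(1, -1, 1, 2)           # (1, N, 1, 2)
        s = F.grid_sample(plane.unsqueeze(0), uv,
                          align_corners=True)            # (1, C, N, 1)
        feats = feats + s.squeeze(0).squeeze(-1).T       # (N, C)
    return decoder(feats)
```

Because the bulky 3D structure lives in cheap 2D planes and only a tiny MLP runs per sampled point, a design of this kind is far lighter than evaluating a large implicit network everywhere, which is part of what made earlier approaches so slow.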

The impact could be immense for the video game industry, but also for animated films and, more broadly, for the creative content industry.

The result is quite stunning: very realistic, geometrically consistent and, above all, obtained in real time. In addition, the method allows "morphing" effects, continuously transforming the physiognomy of the characters through interpolation between faces that lie close together in the space of possibilities explored by the AI.
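Latent interpolation of this kind is a standard GAN technique: two faces correspond to two latent codes, and blending the codes yields a smooth sequence of intermediate faces. A minimal sketch, with an untrained stand-in generator (a real system would use the trained model):

```python
# Morphing by latent interpolation: blend two latent codes and decode each
# intermediate code, producing a smooth transition between two faces.
import torch
import torch.nn as nn

latent_dim = 64
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, 128), nn.Tanh())  # stand-in model

z_a = torch.randn(1, latent_dim)  # latent code of face A
z_b = torch.randn(1, latent_dim)  # latent code of face B

frames = []
for t in torch.linspace(0.0, 1.0, steps=30):
    z_t = (1 - t) * z_a + t * z_b          # linear blend in latent space
    with torch.no_grad():
        frames.append(generator(z_t))      # one intermediate "morph" frame
```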

Pose and emotion

Another recent advance in generative AI concerns audio-driven facial animation (from a recorded voice) with joint learning of pose and emotion: the expression conveyed by speech (sadness, anger, joy…) is immediately transferred to the movements and facial features of the characters, human or not. One can easily imagine the power of a system that would combine the real-time generation of realistic 3D face models from 2D photos with their animation by audio, all served by generative adversarial learning.
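As a rough idea of what such an audio-driven animator involves, the sketch below maps a window of speech features to per-frame head-pose angles and expression weights. The class, feature sizes and outputs are invented for illustration; they do not come from any published system.

```python
# Hypothetical audio-to-animation sketch: a recurrent encoder turns speech
# features into per-frame head pose and expression parameters that a
# renderer could apply to a 3D face.
import torch
import torch.nn as nn

class AudioToFace(nn.Module):
    def __init__(self, audio_dim=80, hidden=128, n_expr=52):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, 3)       # head yaw, pitch, roll
        self.expr_head = nn.Linear(hidden, n_expr)  # expression weights

    def forward(self, mel: torch.Tensor):
        """mel: (batch, time, audio_dim) speech features."""
        h, _ = self.rnn(mel)
        return self.pose_head(h), torch.sigmoid(self.expr_head(h))

model = AudioToFace()
pose, expr = model(torch.randn(1, 200, 80))  # 200 frames of dummy audio
```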
