Engineers and researchers from Samsung’s AI Center in Moscow and Skolkovo Institute of Science and Technology have created a model that can generate realistic animated talking heads from images without traditional methods like 3D modeling.
Samsung opened AI research centers last year in Moscow, Cambridge, and Toronto.
“Effectively, the learned model serves as a realistic avatar of a person,” said engineer Egor Zakharov in a video explaining the results.
Well-known faces seen in the paper include Marilyn Monroe, Albert Einstein, Leonardo da Vinci’s Mona Lisa, and RZA from the Wu-Tang Clan. The technology, which focuses on synthesizing photorealistic head images from facial landmarks, could be applied to video games, video conferences, or digital avatars like the kind now available on Samsung’s Galaxy S10. Facebook is also working on realistic avatars for its virtual reality initiatives.
Such tech could clearly also be used to create deepfakes.
Few-shot learning means the model can begin to animate a face using just a few images of an individual, or even a single one. Before the model can animate previously unseen faces, it is first meta-trained on the VoxCeleb2 dataset of videos.
During the training process, the system uses three neural networks: an embedder network maps frames to vectors, a generator network maps facial landmarks into synthesized video frames, and a discriminator network assesses the realism and pose of the generated images.
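To make the division of labor among the three networks concrete, here is a minimal toy sketch of the data flow. It is not the researchers' code: the function names, the use of plain lists of floats in place of images, and the arithmetic inside each function are all assumptions made for illustration. Real networks would be learned convolutional models.

```python
from typing import List

Vector = List[float]  # hypothetical stand-in for an image or an embedding


def embedder(frame: Vector, landmarks: Vector) -> Vector:
    """Toy stand-in: map a frame and its landmarks to an identity vector."""
    # Assumption: the "embedding" is just the element-wise mean of the inputs.
    return [(f + l) / 2.0 for f, l in zip(frame, landmarks)]


def generator(landmarks: Vector, embedding: Vector) -> Vector:
    """Toy stand-in: synthesize a frame from target landmarks and an identity."""
    return [l + e for l, e in zip(landmarks, embedding)]


def discriminator(frame: Vector, landmarks: Vector) -> float:
    """Toy stand-in: score in (0, 1], higher when the frame matches the pose."""
    distance = sum((f - l) ** 2 for f, l in zip(frame, landmarks))
    return 1.0 / (1.0 + distance)


# One pass through the pipeline: embed a source frame, then render a new pose.
source_frame = [0.2, 0.4, 0.6]
source_landmarks = [0.0, 0.5, 1.0]
target_landmarks = [1.0, 0.5, 0.0]

identity = embedder(source_frame, source_landmarks)
fake_frame = generator(target_landmarks, identity)
score = discriminator(fake_frame, target_landmarks)
```

The point of the sketch is only the wiring: the embedder sees the source person, the generator sees the target pose plus the identity vector, and the discriminator judges the result against that pose.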
“Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters. We show that such an approach is able to learn highly realistic and personalized talking head models of new people and even portrait paintings,” coauthors said in a summary of the paper on arXiv.
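One way to picture person-specific initialization is sketched below. It assumes that each of the few available frames yields an embedding vector and that the vectors are averaged to seed the person-specific parameters before fine-tuning. The averaging step, the function names, and the dictionary layout are assumptions for illustration, not details given in the quote above.

```python
from typing import Dict, List

Vector = List[float]


def average_embedding(embeddings: List[Vector]) -> Vector:
    """Average per-frame embeddings into one person-level vector."""
    if not embeddings:
        raise ValueError("at least one frame embedding is required")
    count = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


def init_person_specific(shared: Dict[str, float],
                         embeddings: List[Vector]) -> Dict[str, object]:
    """Hypothetical helper: keep meta-learned weights, seed the rest per person."""
    return {
        "shared": dict(shared),                   # copied from meta-training
        "person": average_embedding(embeddings),  # seeded from a few frames
    }


# Two frames of a new person, each already reduced to a 2-D embedding.
few_shot_embeddings = [[1.0, 2.0], [3.0, 4.0]]
params = init_person_specific({"w": 0.5}, few_shot_embeddings)
```

Because most parameters start from values learned during meta-training, only a short fine-tuning run on the handful of new frames would be needed in a scheme like this.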
In other recent AI work on mimicking human faces, University of Washington researchers last year shared how they created ObamaNet, a lip-sync model based on Pix2Pix and trained on videos of the former U.S. president.
University of California, Berkeley researchers introduced a model last fall that uses YouTube videos to train AI agents to dance or perform acrobatic moves like backflips.