The paper describing this model was published by researchers at the Samsung AI Center and can be read on arXiv. According to the study, the researchers used artificial neural networks to transfer facial expressions and head movements from source footage onto a target face, so that the target face appears to talk and move the same way the source face does.
Neither the technique nor the system is entirely new; it belongs to the broader field of AI-generated synthetic imagery. The technique can produce a video from a single photo of a person talking, but generating one to two minutes of footage still requires a large amount of training data to analyze the photo against.
The Moscow-based researchers further write in the paper that videos generated from a single photo or painting, showing the person in the picture talking or moving, are convincing but not flawless.
The technique front-loads the facial identification process: a large dataset is used in advance to train a model that can find the parts of a source face that correspond to parts of a target face. More data yields better video and animation, but thanks to this single-shot learning approach, even one picture, of Einstein or of the Mona Lisa, is enough to create a moving face.
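The single-shot idea described above can be caricatured in code: start from a network whose weights were pre-trained on a large dataset, then briefly fine-tune it on the one photo (and its extracted landmarks) available for the new person. The sketch below is purely illustrative, using tiny random tensors as stand-ins for real landmarks and images; none of the names or sizes come from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a generator pre-trained on many faces (assumption: in the
# real system this network is far larger and meta-trained on video data).
gen = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))

K = 1  # single-shot: only one photo of the target person
landmarks = torch.randn(K, 8)      # stand-in for extracted facial landmarks
target_frames = torch.randn(K, 8)  # stand-in for the target photo

opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
initial = nn.functional.mse_loss(gen(landmarks), target_frames).item()

# Brief fine-tuning on the few frames available for this person.
for _ in range(50):
    loss = nn.functional.mse_loss(gen(landmarks), target_frames)
    opt.zero_grad()
    loss.backward()
    opt.step()

final = loss.item()  # reconstruction error drops as the model adapts
```

The point of the sketch is only the shape of the procedure: heavy pre-training once, then a very small amount of person-specific adaptation.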
A generative adversarial network (GAN) is also used in the process. It pits two models against each other: a generator produces candidate images while a discriminator tries to tell the generated faces from real ones, and the competition continues until the output resembles a human face closely enough, reportedly around 90 percent, to meet expectations.
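The adversarial loop just described can be sketched minimally. In the toy example below, 1-D numbers stand in for face images: the discriminator D learns to score real samples high and generated ones low, and the generator G is updated to fool it. Everything here, network sizes, data, hyperparameters, is illustrative, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(32, 1) * 0.5 + 2.0  # "real" data (stand-in for photos)
    fake = G(torch.randn(32, 4))           # generated samples

    # Discriminator step: label real as 1, generated as 0.
    d_loss = bce(D(real), torch.ones(32, 1)) + \
             bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make D label the generated samples as real.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

Note the `detach()` on the generator's output during the discriminator step: each network is updated only against its own loss, which is what makes the training adversarial rather than cooperative.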
The paper includes other examples meant to show the quality of the fake talking faces. In one, frames of a person were captured from cable news to recreate his likeness, and as a result the news ticker was recreated as well. The video still contains some odd artifacts that deserve scrutiny.
This new system is remarkable, but it can be applied only to the face and upper torso; it cannot, for example, produce an animation of fingers snapping.