In a few minutes, Meta AI makes children's character doodles "move", creating a wonderful animated world

2021-12-20

Have you ever thought about turning a child's drawing into an animation? As shown below, children draw unique and creative characters and animals: stars with feet, birds with extremely long legs. Parents and teachers can usually tell what a child's drawing is meant to express, but the task is hard for AI, because children's drawings tend to be constructed in abstract, idiosyncratic ways. Take the "people" in children's drawings: they come in many different forms, colors, sizes, and proportions, with little consistency in body symmetry, shape, or perspective. Plenty of AI tools and techniques can handle realistic drawings, but children's drawings add a degree of variety and unpredictability that makes recognizing what they depict far more complex. Many AI researchers are working to overcome this challenge so that AI systems can better recognize the wide range of character drawings children create.

Recently, Meta announced an AI system that automatically animates children's hand-drawn figures and humanoid characters (i.e., characters with two arms, two legs, and one head) without any manual guidance, turning a static picture into an animation within minutes. Upload a child's drawing of a kitten or a bee to Meta AI, for example, and you will see the drawing become a dancing character with surprisingly lifelike movements.

Trial address: https://sketch.metademolab.com/

By uploading a drawing to Meta's prototype system, users can watch it become a jumping character, and can download the animation to share with friends and family. If they wish, they can also submit their drawings to help improve the AI model.

Meta turns a drawing into an animation in four steps: detect the figure and recognize it as humanoid; lift the figure out of the scene with a character mask; prepare it for animation through "rigging"; and animate it using 3D motion capture.

Detecting the humanoid figure

The first step is to distinguish the human figures in a drawing from the background and from other kinds of characters. Existing object-detection methods recognize children's drawings reasonably well, but their segmentation masks are not accurate enough for animation. To work around this, Meta takes the bounding boxes produced by the object detector and applies a series of morphological operations and image-processing steps to obtain the mask.

Meta AI uses Mask R-CNN, a convolutional-neural-network-based object-detection model, to extract the characters in children's drawings. Although Mask R-CNN is pretrained on one of the largest segmentation datasets, that dataset consists of photos of real-world objects, not drawings, so the model has to be fine-tuned before it can handle drawings. Meta AI fine-tuned a Mask R-CNN with a ResNet-50 + FPN backbone to predict a single class, "humanoid figure", on roughly 1,000 drawings.
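Meta has not published its fine-tuning code. As a rough illustration only, here is what such a single-class fine-tune might look like with torchvision's off-the-shelf Mask R-CNN (ResNet-50 + FPN backbone); the `drawings_loader` DataLoader, the optimizer choice, and the hyperparameters are assumptions, not Meta's actual pipeline.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_humanoid_maskrcnn():
    # Start from Mask R-CNN with a ResNet-50 + FPN backbone, pretrained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    num_classes = 2  # background + a single "humanoid figure" class

    # Swap the box head for one that predicts just our two classes
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

    # Swap the mask head likewise
    in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
    return model

def finetune(model, drawings_loader, epochs=10, lr=1e-4):
    # `drawings_loader` is a hypothetical DataLoader yielding (images, targets)
    # with boxes, labels, and masks annotated on ~1,000 children's drawings
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, targets in drawings_loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            losses = model(images, targets)  # dict of box/class/mask losses
            loss = sum(losses.values())
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
```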
After fine-tuning, the model detects the human figures in the test set well, though failure cases remain. These fall into four categories: the detected figure does not cover the whole drawing (for example, a tail is left out); the figure is not separated from the background; several figures are not separated from one another; and non-human characters (such as trees) are recognized as humanoid by mistake.

Lifting the figure from the scene with a character mask

After identifying and extracting the characters in a drawing, the next step toward animation is to separate them from the rest of the scene and from the background, a process called masking. The mask must trace the character's outline accurately, because it is used to build a mesh that is then deformed to produce the animation. When everything goes right, the mask contains every part of the character and excludes any background content.

Although Mask R-CNN can output masks, Meta AI found them unsuitable for animation: when the appearance of body parts varies widely, the predicted mask often fails to capture the whole character. Picture a drawing in which a large yellow triangle is the body and a single pencil stroke forms the arms; the mask predicted by Mask R-CNN typically omits the pencil stroke connecting the two hands.

Meta AI therefore developed a method based on classical image processing that is more robust to character variation. It crops the image using the predicted humanoid bounding box, applies adaptive thresholding and morphological closing/dilation operations, flood-fills from the edges of the bounding box, and takes the mask to be the largest polygon left unfilled (a code sketch of this pipeline follows below).

(Figure: comparison between Mask R-CNN masks and the classical image-processing method.)

Simple and effective as this method is at extracting masks accurate enough for animation, it can still fail when the background is cluttered, when characters are drawn too close together, or when the page has creases, tears, or shadows.
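Meta's exact implementation is not public; the OpenCV sketch below is one plausible reading of the pipeline described above. The threshold block size, kernel size, and the corner flood-fill seed are all guesses.

```python
import cv2
import numpy as np

def extract_character_mask(gray, box):
    """Classical mask extraction: crop to the detected box, threshold,
    close/dilate to bridge thin strokes, flood-fill the background from
    the border, and keep the largest unfilled region as the figure."""
    x, y, w, h = box  # humanoid bounding box from the detector
    crop = gray[y:y + h, x:x + w]

    # Adaptive threshold: pencil strokes become foreground (255)
    binary = cv2.adaptiveThreshold(crop, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 115, 8)

    # Morphological closing, then dilation, to bridge small gaps in strokes
    kernel = np.ones((3, 3), np.uint8)
    solid = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    solid = cv2.dilate(solid, kernel, iterations=2)

    # Flood-fill the background from the crop border (assumes the top-left
    # corner is background); anything the fill cannot reach is enclosed by
    # strokes, i.e. inside the figure
    flood = solid.copy()
    ff_mask = np.zeros((flood.shape[0] + 2, flood.shape[1] + 2), np.uint8)
    cv2.floodFill(flood, ff_mask, (0, 0), 255)
    enclosed = cv2.bitwise_not(flood)          # interior holes
    figure = cv2.bitwise_or(solid, enclosed)   # strokes + filled interior

    # Keep the largest connected component ("largest unfilled polygon")
    n, labels, stats, _ = cv2.connectedComponentsWithStats(figure)
    if n <= 1:
        return figure
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```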
Preparing for animation through "rigging"

Children draw an enormous variety of body shapes, going far beyond the conventional notion of a human figure with a complete head, arms, legs, and torso. Some children draw stick figures with no torso at all, the arms and legs attached directly to the head; others draw stranger shapes, with legs extending from the head or arms from the thighs. Meta AI therefore needed a rigging method that could accommodate this variation in body shape.

They chose the human-pose-estimation model AlphaPose to identify the key points of a figure: the hips, shoulders, elbows, knees, wrists, and ankles. AlphaPose is trained on real-world images, so before it could detect poses in children's drawings it had to be retrained to handle the kinds of variation those drawings contain. Specifically, Meta AI collected and annotated a small internal dataset of children's drawings of human figures, and with a pose detector trained on that initial dataset built an internal tool that let parents upload and animate their children's drawings. As more data came in, Meta AI iteratively retrained the model until it reached high accuracy.

Animating the figure using 3D motion capture

With the mask and the joint predictions in hand, everything needed for animation is in place. Meta AI first generates a mesh from the extracted mask and textures it with the original drawing, then uses the predicted joint positions to create a skeleton for the character. The character is moved into a new pose by rotating its bones and deforming the mesh to match the new joint positions; retargeting the character through a sequence of consecutive poses yields the animation.

Children commonly draw body parts from their most recognizable angle: legs and feet from the side, head and torso from the front. Meta AI exploits this habit in the motion-retargeting step: for the lower body and the upper body, it automatically determines whether the motion should be read from the front or from the side. Specifically, it projects the motion onto a single 2D plane and uses the projection to drive the character, and it validated the retargeting results with a perceptual user study run on Mechanical Turk. The step-by-step detection process is shown in the figure below.

Meta AI notes that accounting for this shift of perspective helps, because many kinds of motion do not lie in a single projection plane. In rope skipping, for example, the arms and wrists move mainly in the frontal plane, while the bending legs move mostly in the sagittal plane. Rather than fixing one projection plane for the whole motion-capture pose, Meta AI therefore determines projection planes for the upper body and the lower body separately, as the sketch below illustrates.
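The article does not say how each plane is chosen. The sketch below uses a simple motion-variance heuristic as a stand-in for that decision, and the 12-joint skeleton layout and joint indices are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 12-joint skeleton; axes: x = left/right, y = up, z = front/back,
# so the frontal plane keeps (x, y) and the sagittal plane keeps (z, y).
UPPER_BODY = [0, 1, 2, 3, 4, 5]    # shoulders, elbows, wrists (assumed indices)
LOWER_BODY = [6, 7, 8, 9, 10, 11]  # hips, knees, ankles (assumed indices)

def choose_plane(poses, joints):
    """Pick the projection plane that preserves the most motion for these
    joints: spread along x favors the frontal plane, spread along z the
    sagittal. (Heuristic stand-in; the article only says the planes are
    chosen separately for the upper and lower body.)"""
    pts = poses[:, joints, :]                  # (frames, joints, 3)
    return "frontal" if pts[..., 0].std() >= pts[..., 2].std() else "sagittal"

def project(poses, joints, plane):
    """Project the 3D motion-capture joints onto the chosen 2D plane; the
    resulting 2D trajectories drive the character's bones."""
    pts = poses[:, joints, :]
    return pts[..., [0, 1]] if plane == "frontal" else pts[..., [2, 1]]

# Example: for a jump-rope clip the arms would stay frontal while the
# bending legs project to the sagittal plane.
poses = np.random.default_rng(0).normal(size=(120, 12, 3))  # fake mocap frames
for name, joints in (("upper", UPPER_BODY), ("lower", LOWER_BODY)):
    plane = choose_plane(poses, joints)
    print(name, plane, project(poses, joints, plane).shape)
```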

Further ahead, with AR glasses, the stories in a drawing could come to life in the real world, and the characters could dance or talk with the child who drew them. (Xinhua News Agency)

Editor: Li Ling    Responsible editor: Chen Jie

Source: jiqizhixin
