The team's two-stage technique controls a character using a full-body kinematic motion reference, prioritizing precise imitation. (Credit: Disney Research/YouTube)
Engineers at Disney Research have enabled robots to learn to dance by tapping into unstructured motion data.
The team utilized a two-stage technique to control a character using full-body kinematic motion.
First, they trained a variational autoencoder to create a latent space encoding by processing short motion segments from unstructured data. Next, they used this encoding to train a conditional policy, linking kinematic input to dynamics-aware output.
By separating these stages, the team enhanced the quality of latent codes and avoided issues like mode collapse. They demonstrated the method’s efficiency and robustness in simulations and on a bipedal robot, successfully bringing dynamic motions to life.
Efficient motion training
Physics-based character animation has improved greatly in recent years through imitation-driven reinforcement learning, which allows for accurate tracking across many skills. However, current methods have not achieved this with a single policy that can handle diverse, raw dynamic motions while also providing full-body control.
Learning-based character control methods have evolved considerably, particularly in kinematic and physics-based motion synthesis. Kinematic approaches use compact motion representations or generative models to produce seamless and plausible motions, sometimes incorporating physics engines to help eliminate artifacts.
Physics-based techniques driven by deep reinforcement learning mainly concentrate on imitating reference animations but often need elaborate configurations for different skills. Some approaches adopt latent spaces to train policies, balancing data diversity and control accuracy. Nevertheless, these often require custom setups or extensive retraining.
The Disney Research team’s new technique efficiently trains a single policy, offering robust, diverse, and high-fidelity full-body control.
Dynamic control framework
The proposed method of character motion control has two parts. First, the variational autoencoder (VAE) is trained to produce a latent representation of motion from randomly sampled short windows of data. This latent space captures essential features of motion on a large and diverse collection of clips.
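As a rough illustration of this first stage, the sketch below draws random short windows from a set of clips and computes the KL term found in a standard VAE objective. The function names, the window length, and the toy clips are assumptions made for illustration; the article does not describe the team's actual network or data format.

```python
import math
import random

def sample_window(clips, window_len, rng):
    """Pick a random clip, then a random contiguous window inside it.

    `clips` is a list of clips; each clip is a list of frames (hypothetical layout).
    """
    eligible = [clip for clip in clips if len(clip) >= window_len]
    if not eligible:
        raise ValueError("no clip is long enough for the requested window")
    clip = rng.choice(eligible)
    start = rng.randrange(len(clip) - window_len + 1)
    return clip[start:start + window_len]

def kl_to_standard_normal(mean, log_var):
    """KL divergence from a diagonal Gaussian N(mean, exp(log_var)) to N(0, I)."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)
    )

# Toy "unstructured" data: three clips of different lengths, two values per frame.
clips = [[[float(t), 0.1 * t] for t in range(n)] for n in (30, 45, 12)]
rng = random.Random(0)
window = sample_window(clips, window_len=8, rng=rng)
```

In a full VAE, an encoder would map each window to `mean` and `log_var`, and this KL term would be added to a reconstruction loss.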
In the second stage, a reinforcement learning policy is trained to use this latent code and current motion data to control the character, aiming at accurate tracking and smooth motions. Conditioning the RL policy on both the current kinematic state and the latent code helps align new inputs with learned motions.
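One simple way to picture this conditioning is to concatenate the kinematic state with the latent code before passing it to the policy. The sketch below assumes this concatenation and uses a single linear layer as a stand-in for the policy; the names, dimensions, and weights are hypothetical and not taken from the Disney Research work.

```python
def build_policy_input(kinematic_state, latent_code):
    """Join the current kinematic state and the latent code into one vector."""
    return list(kinematic_state) + list(latent_code)

def linear_policy(weights, bias, policy_input):
    """Stand-in policy: one linear layer mapping the input vector to actions."""
    for row in weights:
        if len(row) != len(policy_input):
            raise ValueError("weight row length must match policy input length")
    return [
        b + sum(w * x for w, x in zip(row, policy_input))
        for row, b in zip(weights, bias)
    ]

# Hypothetical sizes: 3 kinematic values, 2 latent values, 2 action outputs.
kinematic_state = [0.2, -0.1, 0.05]
latent_code = [1.0, -1.0]
policy_input = build_policy_input(kinematic_state, latent_code)

weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # first action reads the first kinematic value
    [0.0, 0.0, 0.0, 1.0, 1.0],  # second action reads only the latent code
]
bias = [0.0, 0.5]
action = linear_policy(weights, bias, policy_input)
```

A trained policy would normally be a neural network optimized with reinforcement learning; the linear layer here only shows how both inputs reach the same decision function.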
The authors noted that training also includes rewards for tracking accuracy, survival (staying alive), and smoothness, along with domain randomization to enhance robustness and prevent overfitting. The method effectively handles unseen inputs while maintaining high fidelity in motion control for both virtual and robotic characters.
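The reward terms and the domain randomization mentioned above could be combined along the following lines. The article only names the ingredients, so the functional forms, weights, parameter names, and randomization range in this sketch are all assumptions.

```python
import math
import random

def tracking_reward(actual, target, scale=2.0):
    """Approaches 1 as the pose matches the reference (assumed exponential form)."""
    squared_error = sum((a - t) ** 2 for a, t in zip(actual, target))
    return math.exp(-scale * squared_error)

def smoothness_penalty(action, prev_action):
    """Penalize large changes between consecutive actions."""
    return sum((a - p) ** 2 for a, p in zip(action, prev_action))

def total_reward(actual, target, action, prev_action, fallen,
                 w_track=1.0, w_alive=0.5, w_smooth=0.1):
    """Weighted sum of tracking, alive, and smoothness terms (assumed weights)."""
    alive = 0.0 if fallen else 1.0
    return (w_track * tracking_reward(actual, target)
            + w_alive * alive
            - w_smooth * smoothness_penalty(action, prev_action))

def randomize_dynamics(nominal, rng, spread=0.1):
    """Scale each nominal physical parameter by a random factor in [1-spread, 1+spread]."""
    return {
        name: value * rng.uniform(1.0 - spread, 1.0 + spread)
        for name, value in nominal.items()
    }

rng = random.Random(0)
nominal = {"mass": 10.0, "friction": 0.8}  # hypothetical simulation parameters
episode_params = randomize_dynamics(nominal, rng)
perfect = total_reward([0.1, 0.2], [0.1, 0.2], [0.0], [0.0], fallen=False)
```

Resampling the physical parameters at the start of each training episode is one common way to keep a policy from overfitting to a single simulated model.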
Furthermore, the technique effectively scales with motion diversity and training complexity, accurately tracks unseen dynamic motions, and interfaces with common animation techniques.
Robust motion techniques
Researchers claim that demonstrations on both virtual and physical humanoid characters show that this method robustly executes expressive motions, even at the physical limits of hardware.
Users can precisely control character motions through the kinematic motion interface, and the two-stage training method can handle a wide range of skills. The method is expected to combine with various control modalities and generative tasks, although this has not been tested directly.
However, the method has trouble with movements that involve long-term planning, such as acrobatics, which may require more sophisticated designs. Furthermore, although the approach works well for tracking kinematic references, its generative potential is still unknown.
Researchers claim that by demonstrating expressive motions on robotic hardware, this work unites advances in computer graphics and robotics and suggests that self-supervised and RL techniques could lead to universal control policies.
ABOUT THE EDITOR
Jijo Malayil
Jijo is an automotive and business journalist based in India. Armed with a BA in History (Honors) from St. Stephen's College, Delhi University, and a PG diploma in Journalism from the Indian Institute of Mass Communication, Delhi, he has worked for news agencies, national newspapers, and automotive magazines. In his spare time, he likes to go off-roading, engage in political discourse, travel, and teach languages.