Conditional Temporal Variational AutoEncoder for Action Video Prediction
Publication in refereed journal

Other information
Abstract: To synthesize a realistic action sequence from a single human image, it is crucial to model both the motion patterns and the diversity in the action video. This paper proposes an Action Conditional Temporal Variational AutoEncoder (ACT-VAE) to improve motion prediction accuracy and capture movement diversity. ACT-VAE predicts pose sequences for an action clip from a single input image. It is implemented as a deep generative model that maintains temporal coherence according to the action category through a novel temporal modeling of the latent space. Further, ACT-VAE is a general action sequence prediction framework. When connected with a plug-and-play Pose-to-Image network, ACT-VAE can synthesize image sequences. Extensive experiments show that our approach predicts accurate poses and synthesizes realistic image sequences, surpassing state-of-the-art approaches. Compared to existing methods, ACT-VAE improves model accuracy and preserves diversity.
All Author(s) List: Xiaogang Xu, Yi Wang, Liwei Wang, Bei Yu, Jiaya Jia
Journal name: International Journal of Computer Vision
Volume Number: 131
Issue Number: 10
Publisher: Springer Science+Business Media
Pages: 2699-2722
Languages: English-United States
Keywords: Variational AutoEncoder, Action modeling, Temporal coherence, Adversarial learning

Last updated on 2024-04-03 at 14:10