by Xiaoyu Shi1*, Zhaoyang Huang1*, Fu-Yun Wang1*, Weikang Bian1*, Dasong Li1, Yi Zhang1, Manyuan Zhang1, Ka Chun Cheung2, Simon See2, Hongwei Qin3, Jifeng Dai4, Hongsheng Li1
1CUHK-MMLab 2NVIDIA 3SenseTime 4Tsinghua University
@article{shi2024motion,
  title={Motion-i2v: Consistent and controllable image-to-video generation with explicit motion modeling},
  author={Shi, Xiaoyu and Huang, Zhaoyang and Wang, Fu-Yun and Bian, Weikang and Li, Dasong and Zhang, Yi and Zhang, Manyuan and Cheung, Ka Chun and See, Simon and Qin, Hongwei and others},
  journal={SIGGRAPH 2024},
  year={2024}
}
Overview of Motion-I2V. The first stage of Motion-I2V aims to deduce the motions that can plausibly animate the reference image. Conditioned on the reference image and a text prompt, it predicts the motion field maps between the reference frame and all future frames. The second stage propagates the reference image's content to synthesize the frames. A novel motion-augmented temporal layer enhances 1-D temporal attention with warped features; this enlarges the temporal receptive field and alleviates the difficulty of directly learning complicated spatial-temporal patterns.
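To make the idea of the motion-augmented temporal layer concrete, the sketch below is a simplified illustration (not the repository's actual module; all class and function names are hypothetical): reference-frame features are warped to every timestep with the predicted flow fields, and the warped features are fed as extra keys/values to 1-D temporal attention applied independently at each spatial location.

```python
# Illustrative sketch of a motion-augmented temporal layer (simplified;
# not the repository's actual implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_by_flow(feat, flow):
    """Backward-warp feat (B, C, H, W) with flow (B, 2, H, W); flow[:, 0] = dx, flow[:, 1] = dy in pixels."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow        # (B, 2, H, W)
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0                # normalize to [-1, 1]
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                   # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)


class MotionAugmentedTemporalAttention(nn.Module):
    def __init__(self, channels, num_heads=8):
        super().__init__()
        # channels must be divisible by num_heads
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feats, flows):
        # feats: (B, T, C, H, W) video features; flows: (B, T, 2, H, W),
        # predicted flow from the reference frame (t = 0) to every frame t.
        b, t, c, h, w = feats.shape
        ref = feats[:, 0]                                          # (B, C, H, W)
        warped = torch.stack(
            [warp_by_flow(ref, flows[:, i]) for i in range(t)], dim=1
        )                                                          # (B, T, C, H, W)

        # 1-D attention along the temporal axis at each spatial location,
        # with flow-warped reference features as extra keys/values.
        q = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        kv = torch.cat(
            [q, warped.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)], dim=1
        )
        out, _ = self.attn(q, kv, kv)                              # (B*H*W, T, C)
        return out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)   # (B, T, C, H, W)
```

Letting each frame attend to flow-warped reference features is what widens the temporal receptive field compared with plain frame-wise 1-D temporal attention.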
- Install the environment
conda env create -f environment.yaml
- Download models
git clone https://huggingface.co/wangfuyun/Motion-I2V
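As an alternative to git clone (which typically requires git-lfs for the large checkpoint files), the same repository can be fetched with the huggingface_hub Python package; a minimal sketch, assuming huggingface_hub is installed:

```python
# Download the released weights from the Hugging Face Hub
# (repo id taken from the URL above; the target folder is arbitrary).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="wangfuyun/Motion-I2V", local_dir="Motion-I2V")
```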
- Run the code
python -m scripts.app