Visual narratives
Prompts can be changed over time to tell visual stories.
Long(er) video generation
By default, VideoPoet outputs 2-second videos. The model can also generate longer videos by predicting 1 second of video output given a 1-second input clip. Repeating this process, with each newly generated second serving as the next input, can produce a video of any duration. Despite the short input context, the model shows strong object identity preservation not seen in prior works, as demonstrated in these longer-duration clips.
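The chaining described above can be sketched as a simple loop. This is only an illustration, not VideoPoet's actual code: the function names, the integer stand-in for frames, and the frame rate of 8 fps are all assumptions made for the example, and `dummy_predictor` is a placeholder for the real model.

```python
from typing import Callable, List, Sequence

Frame = int  # hypothetical stand-in for a real frame (e.g. an image array)


def extend_video(
    clip: Sequence[Frame],
    predict_next_second: Callable[[Sequence[Frame]], List[Frame]],
    fps: int,
    target_seconds: int,
) -> List[Frame]:
    """Grow `clip` to `target_seconds` by repeatedly predicting 1 more second."""
    if len(clip) < fps:
        raise ValueError("need at least 1 second of input video")
    video = list(clip)
    while len(video) < target_seconds * fps:
        # Assumption: each step conditions only on the most recent 1 second.
        context = video[-fps:]
        new_frames = predict_next_second(context)
        if len(new_frames) != fps:
            raise ValueError("predictor must return exactly 1 second of frames")
        video.extend(new_frames)
    return video[: target_seconds * fps]


def dummy_predictor(context: Sequence[Frame]) -> List[Frame]:
    # Placeholder for the model: simply continues a frame counter.
    last = context[-1]
    return [last + i + 1 for i in range(len(context))]


start = list(range(16))  # a 2-second clip at the assumed 8 fps
long_video = extend_video(start, dummy_predictor, fps=8, target_seconds=10)
print(len(long_video))
```

Because each step sees only the last second, the cost per step stays constant no matter how long the video grows; the trade-off is that anything older than the context window must be carried implicitly by those frames.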
Interactive video editing
Interactive editing is also possible: the model extends an input video by a short duration, and the user selects from a list of generated examples. By selecting the best video from a list of candidates, we can finely control the types of desired motion in a larger generated video. Here we generate three samples without text conditioning and the final one with text conditioning.
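A minimal sketch of this extend-and-select loop follows. Everything in it is hypothetical: `propose_stub` stands in for the model, strings stand in for frames, and the `choose` callback stands in for the user's pick. It mirrors the setup described above, with three candidates generated without text conditioning and a final one with it.

```python
from typing import Callable, List, Optional

Clip = List[str]  # hypothetical stand-in for a list of frames


def propose_stub(video: Clip, seed: int, prompt: Optional[str]) -> Clip:
    # Placeholder for the model: returns a labelled 2-frame extension.
    tag = f"s{seed}" if prompt is None else f"s{seed}:{prompt}"
    return [f"{tag}-f{i}" for i in range(2)]


def candidate_extensions(
    video: Clip,
    propose: Callable[[Clip, int, Optional[str]], Clip],
    prompt: str,
    num_unconditioned: int = 3,
) -> List[Clip]:
    """Unconditioned candidates first, then one text-conditioned candidate."""
    candidates = [propose(video, seed, None) for seed in range(num_unconditioned)]
    candidates.append(propose(video, num_unconditioned, prompt))
    return candidates


def interactive_extend(
    video: Clip,
    propose: Callable[[Clip, int, Optional[str]], Clip],
    choose: Callable[[List[Clip]], int],
    prompts: List[str],
) -> Clip:
    """Extend `video` once per prompt, keeping the candidate `choose` picks."""
    result = list(video)
    for prompt in prompts:
        candidates = candidate_extensions(result, propose, prompt)
        result.extend(candidates[choose(candidates)])
    return result


# Here the "user" always picks the last (text-conditioned) candidate.
edited = interactive_extend(
    ["in-f0", "in-f1"], propose_stub, lambda c: len(c) - 1, ["zoom out"]
)
print(edited)
```

In a real tool, `choose` would display the candidates and wait for the user's selection; keeping it as a callback leaves the generation loop independent of the interface.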
Controllable video editing
The VideoPoet model can edit a subject to follow different motions, such as dance styles.