VideoPoet, developed by Google Research, is a significant advance in video generation, particularly in its ability to produce large, engaging, high-quality motions.
VideoPoet turns an autoregressive language model into a high-quality video generator. It relies on the MAGVIT V2 video tokenizer and the SoundStream audio tokenizer, which convert images, video, and audio clips of variable length into a sequence of discrete codes in a unified vocabulary.
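To make the idea of a single shared code sequence concrete, here is a toy Python sketch. It assumes that each modality's tokenizer emits integer codes in its own local range and that a shared vocabulary is formed by giving each modality a fixed offset. The vocabulary sizes, the offset scheme, and the function names `to_shared` and `from_shared` are hypothetical illustrations, not VideoPoet's published configuration, and the integer lists merely stand in for real MAGVIT V2 or SoundStream outputs.

```python
# Toy illustration of a shared token vocabulary across modalities.
# ASSUMPTION: the sizes below are made-up placeholders, not VideoPoet's real
# vocabulary sizes, and the integer codes stand in for tokenizer outputs.
VOCAB_SIZES = {"text": 1000, "video": 4096, "audio": 1024}


def build_offsets(vocab_sizes):
    """Give each modality a contiguous block of ids in the shared vocabulary."""
    offsets, start = {}, 0
    for name, size in vocab_sizes.items():
        offsets[name] = start
        start += size
    return offsets, start  # `start` ends up equal to the total vocabulary size


OFFSETS, TOTAL_VOCAB = build_offsets(VOCAB_SIZES)


def to_shared(modality, codes):
    """Shift modality-local codes into the shared id space."""
    size = VOCAB_SIZES[modality]
    for c in codes:
        if not 0 <= c < size:
            raise ValueError(f"code {c} out of range for {modality}")
    return [OFFSETS[modality] + c for c in codes]


def from_shared(token):
    """Map a shared id back to (modality, local code)."""
    for name, size in VOCAB_SIZES.items():
        start = OFFSETS[name]
        if start <= token < start + size:
            return name, token - start
    raise ValueError(f"token {token} is outside the shared vocabulary")


# One interleaved sequence: a text prompt followed by video and audio codes.
sequence = to_shared("text", [5, 17]) + to_shared("video", [0, 4095]) + to_shared("audio", [3])
print(sequence)  # [5, 17, 1000, 5095, 5099]
```

Because every modality occupies its own id range, a single language model can read and emit any mix of them, and each id can still be routed back to the right decoder.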
Because these codes are compatible with text-based language models, they can be combined with other modalities such as text. An autoregressive language model then learns across video, image, audio, and text, predicting the next video or audio token in the sequence one token at a time.
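The one-token-at-a-time prediction loop can be sketched in a few lines. This is a deliberately simplified illustration: a bigram counter stands in for the model (VideoPoet uses a large transformer language model), and the token ids and the helper names `train_bigram`, `predict_next`, and `generate` are hypothetical. Only the generation loop, where each predicted token is appended and fed back in, reflects the autoregressive idea described above.

```python
from collections import Counter, defaultdict

# Toy stand-in for the autoregressive model: a bigram counter that predicts
# the next token from the previous one. VideoPoet uses a large transformer
# language model instead; this only illustrates the one-token-at-a-time loop.


def train_bigram(sequences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prefix):
    """Return the most frequent successor of the last prefix token, or None."""
    followers = counts.get(prefix[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prefix, max_new_tokens):
    """Append one predicted token at a time, feeding the growing sequence back in."""
    out = list(prefix)
    for _ in range(max_new_tokens):
        nxt = predict_next(counts, out)
        if nxt is None:  # no known continuation: stop early
            break
        out.append(nxt)
    return out


# Hypothetical shared-vocabulary ids: 1 and 2 act as "text" tokens, 100+ as "video" tokens.
training = [[1, 2, 100, 101, 102], [1, 2, 100, 101, 102]]
model = train_bigram(training)
print(generate(model, [1, 2], max_new_tokens=3))  # [1, 2, 100, 101, 102]
```

A real model conditions on the whole prefix rather than only the last token, but the outer loop is the same: a text prompt goes in as the prefix and video or audio tokens come out one at a time.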
The training framework also combines several multimodal generative learning objectives: text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio.
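One common way to train a single sequence model on many tasks is to express each task as a conditioning prefix followed by the target tokens, and to compute the loss only on the target. The sketch below assumes that formulation; the task-marker ids, the `SEP` and `MASK` tokens, and the helpers `build_example` and `inpainting_condition` are hypothetical and are not VideoPoet's documented input format.

```python
# ASSUMPTION: hypothetical special-token ids, not VideoPoet's actual vocabulary.
TASK_TOKENS = {
    "text_to_video": 9000,
    "image_to_video": 9001,
    "video_continuation": 9002,
    "video_inpainting": 9003,
    "video_to_audio": 9004,
}
SEP = 9100   # separates conditioning tokens from target tokens
MASK = 9101  # placeholder for hidden video tokens in inpainting inputs


def build_example(task, condition, target):
    """Return (tokens, loss_mask); the loss covers only the target positions."""
    prefix = [TASK_TOKENS[task]] + list(condition) + [SEP]
    tokens = prefix + list(target)
    loss_mask = [0] * len(prefix) + [1] * len(target)
    return tokens, loss_mask


def inpainting_condition(video_tokens, hidden):
    """Replace the positions listed in `hidden` with MASK so the model must fill them in."""
    return [MASK if i in hidden else t for i, t in enumerate(video_tokens)]


# Text-to-video: text tokens condition, video tokens are the target.
print(build_example("text_to_video", [5, 17], [1000, 1001]))
# Inpainting: a partially masked clip conditions, the full clip is the target.
video = [1000, 1001, 1002, 1003]
print(build_example("video_inpainting", inpainting_condition(video, {1, 2}), video))
```

Under this framing, adding another objective such as stylization or video-to-audio only changes what goes into the condition and the target, not the model itself.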
VideoPoet can generate videos in both square and portrait orientations, which suits short-form content, and it can also generate audio from an input video.
Because it handles many video-centric inputs and outputs within one model, VideoPoet shows that language models can synthesize and edit videos while maintaining good temporal consistency.
Pros:
Motions with high accuracy
SoundStream audio tokenizer
MAGVIT V2 video tokenizer
Cons:
Restricted orientation options
Unpredictable results
No live editing