Runway’s Gen-2 AI Video Model: A New Frontier in Text-to-Video Generation

Introducing Runway’s Gen-2 AI Video Model

Runway, the startup that helped create the AI art generator Stable Diffusion, is preparing to launch the first public test of its Gen-2 AI video model. The company bills Gen-2 as the first publicly available text-to-video model, a significant step up from its Gen-1 video-to-video model: rather than transforming existing footage, Gen-2 aims to let users create 3-second videos from scratch using simple text prompts.

Google and Meta Entering the Text-to-Video Race

Both Google and Meta have shown off their own text-to-video models, but have been tight-lipped about releasing them to the public. In contrast, Runway has been steadily building its reputation with video editing tools and AI models, starting with its Gen-1 model, which could transform existing videos based on text prompts or reference images.

ModelScope – An Alternative AI Video Generator

For those eager to try AI video generation, ModelScope, an AI text-to-video system developed by the DAMO Vision Intelligence Lab (a research division of Alibaba), is another option. ModelScope has generated buzz for its sometimes awkward and quirky 2-second video clips created using a basic diffusion model.

Open Source and GPU Server Requirements

ModelScope is open source and available on Hugging Face, though users may need to pay a small fee to run the system on a separate GPU server. Tech YouTuber Matt Wolfe offers a tutorial on setting it up, and users with the technical skills and sufficient VRAM can run the code themselves.
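For readers who want to try running it themselves, a minimal sketch using Hugging Face's diffusers library might look like the following. This is an illustrative example, not official setup instructions: the model ID `damo-vilab/text-to-video-ms-1.7b` is the ModelScope checkpoint published on Hugging Face, and the exact pipeline API can vary between diffusers versions.

```python
def generate_clip(prompt, out_path="modelscope_clip.mp4"):
    """Generate a short video clip from a text prompt with ModelScope.

    Assumes the `diffusers`, `torch`, and `accelerate` packages are
    installed and a CUDA GPU with enough VRAM is available.
    """
    # Heavy dependencies are imported lazily so the sketch can be read
    # and inspected without them installed.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    # Load the ModelScope text-to-video weights in half precision.
    pipe = DiffusionPipeline.from_pretrained(
        "damo-vilab/text-to-video-ms-1.7b",
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use

    # Run the diffusion process and write the frames out as a video;
    # results are the short, often quirky clips the model is known for.
    result = pipe(prompt, num_inference_steps=25)
    return export_to_video(result.frames[0], out_path)
```

Calling `generate_clip("a panda eating bamboo")` would download the weights on first use and write a short clip to disk; expect the output to have the rough, 2-second character described above.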

Copyright Concerns with Training Data

ModelScope’s training data appears to include a significant number of videos and images from Shutterstock, as many generated clips carry the stock photo site’s watermark. This issue is not unique to ModelScope: Getty Images has sued Stability AI over similar concerns with its AI art generator, Stable Diffusion.

Runway Aiming to Stand Out in AI Research

Runway seeks to make a name for itself in the competitive AI research field. Its Gen-1 system was trained on a large-scale dataset of images, videos, and text-image pairs, though the researchers note a shortage of high-quality video-text datasets. It remains to be seen how Runway’s more polished text-to-video model will compare to heavyweights like Google in the future.

More content at UsAndAI. Join our community and follow us on our Facebook page, Facebook Group, and Twitter.
