Thursday, June 1, 2023

Google Joins Meta in Creating AI-Powered Text-to-Video Generator

Meta, the company founded by Mark Zuckerberg, is not the only company building an AI system that generates video from text. Google has been working on one, too. On Wednesday, Google Brain researchers introduced Imagen Video, a program that can generate realistic-looking video clips from a text prompt.

The system extends Google’s original Imagen tool, which generates still images, to moving visuals, producing original videos that stay largely consistent from frame to frame. In their paper, Google’s researchers wrote that Imagen Video not only produces high-quality videos but also offers a high degree of controllability and world knowledge. It can generate diverse videos and text animations in various artistic styles, and it shows an understanding of 3D objects.

Imagen Video can generate 5.3-second clips at a resolution of 1,280 by 768 pixels and 24 frames per second. Google's researchers built the program by training its models on videos and still images that had already been annotated with text descriptions. Given a text prompt, Imagen Video attempts to render the described imagery as a video.

“While training on natural video data only enables the model to learn dynamics in natural settings, the model can learn about different image styles (such as sketch, painting, etc.) by training on images,” the paper added. “As a result, this joint training enables the model to generate interesting video dynamics in different styles.”

Imagen Video was trained on an internal dataset of 14 million video-text pairs and 60 million image-text pairs, as well as roughly 400 million images from the open LAION-400M dataset. The researchers found that the model shows some grasp of three-dimensional objects and environments: it can generate videos of rotating objects while roughly preserving their structure.

Imagen Video could usher in a new era of video creation, and it can generate clips in less than a minute. However, Google’s researchers are not releasing the technology to the public for now. The team has already put measures in place to prevent Imagen Video from producing fake, explicit, or harmful content. Even so, because the model was trained on a limited range of videos and images, the researchers remain concerned that it may reinforce stereotypes.

“While our internal testing suggest much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter. We have decided not to release the Imagen Video model or its source code until these concerns are mitigated,” the researchers wrote. Meta, by contrast, intends to make its own text-to-video generator available to the public after further testing, and all videos generated with Meta's tool will carry a watermark.

