This year has been one of the most exciting yet for AI. For six months now there has been a lot of buzz around DALL-E and Midjourney, AI programs that generate images from mere text. And if you think that was impressive, wait until you hear what AI programs can do now.
According to various recent reports, Google and Meta are each independently creating AI-powered programs that can generate video out of text inputs. Both companies have given impressive demonstrations of what their systems can do.
Meta’s Make-a-Video was the first to come onto the scene. A team of engineers at the company unveiled their research in a blog post where they detailed what it was capable of.
“With just a few words or lines of text, Make-A-Video can bring imagination to life and create one-of-a-kind videos full of vivid colours, characters, and landscapes. The system can also create videos from images or take existing videos and create new ones that are similar.”
It is important to note that Make-A-Video is not a standalone project but rather a continuation of work that began with Make-A-Scene, Meta's own generative AI image system. According to the post, Make-A-Scene was unique compared to the rest because it could "create photorealistic illustrations and storybook-quality art using words, lines of text, and freeform sketches."
As for Make-A-Video's capabilities, Meta insists that the system is very much a work in progress. The company has, however, posted a few GIFs and videos showing some of the results of its work.
You can check the gallery in Meta's blog post. Suffice it to say that the system is pretty impressive: even though it is still in development, it manages to produce some pretty decent videos.
Can it create a feature film?
Not yet. But at this rate of development, we shouldn't be surprised if that happens sooner than we expected.
Google is not to be left behind in the AI race either. This week, the company announced the development of Imagen Video, an AI model that it claims can produce 1280 x 768 video at 24 frames per second from just a text prompt.
This announcement came just a week after Meta unveiled its breakthrough in the same space, underscoring the stiff competition to develop what has the potential to be one of the biggest disruptors in the creative industry.
Corporate PR aside, the research paper shows some very promising results for what is claimed to be just the first step in development. Compared to Make-A-Video's output, Imagen Video seems a little better, but only by a bit. Both are still pretty basic at this point; the best you can get from these systems is roughly what you would expect from AI programs that are still learning to interpret the real world.
In fact, the training data for Google's Imagen Video comes from the publicly available LAION-400M dataset of image-text pairs. One result has been "problematic" output that can contain sexually explicit and violent content.
For this reason, we should expect both projects to take a while longer to become genuinely useful in the creative market. Google says: "We have decided not to release the Imagen Video model or its source code until these concerns are mitigated."
Overall, these developments mark a very positive step in AI research. After years of work, we are finally getting to see some genuinely useful applications for AI, especially in the creative industry. Still, there remains a lot of room for improvement.