OpenAI unveils powerful, creepy new text-to-video model

The generative AI company behind ChatGPT and DALL-E has a new toy: Sora, a text-to-video model that can generate pretty convincing 60-second clips from prompts like “a stylish woman walks down a Tokyo street…” and “a movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet…”

A lot of the AI video generation we’ve seen so far fails to sustain a consistent reality, redesigning faces and clothing and objects from one frame to the next. Sora, however, “understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” says OpenAI in its announcement post (using the word “understands” loosely).

The Sora clips are impressive and creepy. If I weren’t looking closely (say, if I were just scrolling past them on social media), I’d probably think many of them were real. The clip generated from the prompt “a Chinese Lunar New Year celebration video with Chinese Dragon” looks at first like typical documentary footage of a parade. But then you realize that the people are oddly proportioned and seem to be stumbling. It’s like the moment in a dream when you suddenly notice that everything is a little bit wrong.

“The current model has weaknesses,” writes OpenAI. “It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”

My favorite demonstration of Sora’s weaknesses is a video in which a plastic chair begins morphing into a Cronenberg lifeform.

Sora is not currently available to the public, and OpenAI says it’s assessing social risks of the model and working on mitigating them, for instance with “a detection classifier that can tell when a video was generated by Sora.”

It’s fascinating as a research project, but of course, OpenAI isn’t just interested in doing cool computer science. If it can outmaneuver copyright critics and legislators, it’s here to make bank. The company says it’s currently “granting [Sora] access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.” 

One commenter on X optimistically wondered if models like Sora will one day allow the public to wrest control of filmmaking away from Hollywood by making movies purely with prompts. But I wonder where they think the source material for all this generated video will come from if not, you know, filmmakers? Hollywood movies may already look pretty homogeneous, but auto-reproducing Marvel Cinematic Universe-style CGI and car commercial drone shots isn’t exactly bringing creative expression to the masses, if you ask me. (The blog post notably doesn’t mention Sora’s training material.)

Nevertheless, we’re already seeing generative AI used in games, both in ways that are directly visible to us, such as generating art and voices, and in ways that are less obvious, such as generating code, early concept art, or marketing material. A recent survey found that 31% of game development professionals use generative AI in some capacity. Combined with other software, I wonder what this kind of machine learning-driven video simulation could do besides generate slightly-off CG-like clips.

I don’t think anyone really knows what the consequences of all this machine learning development will be, but it isn’t slowing down, so we’re going to find out. OpenAI and other companies are explicitly working not just toward better image and video and text generators, but toward “artificial general intelligence” or AGI—as in, sci-fi AI.

“Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI,” says OpenAI.
