The future of AI video after Sora is impressive and flawed

August 2024 · 9 minute read

This is the future of AI video.

When videos like these are made completely by artificial intelligence.

None of these videos depict real people, places or events.

At first glance, the images amaze and confound: A woman strides along a city street alive with pedestrians and neon lights. A car kicks up a cloud of dust on a mountain road.

But upon closer inspection, anomalies appear: The dust plumes don’t always quite line up with the car’s rear wheels. And those pedestrians are stalking that woman like some eerie zombie horde.

This is Sora, a new tool from OpenAI that can create lifelike, minute-long videos from simple text prompts. When the company unveiled it on Feb. 15, experts hailed it as a major moment in the development of artificial intelligence. Google and Meta also have unveiled new AI video research in recent months. The race is on toward an era when anyone can almost instantly create realistic-looking videos without sophisticated CGI tools or expertise.

Disinformation researchers are unnerved by the prospect. Last year, fake AI photos of former president Donald Trump running from police went viral, and New Hampshire primary voters were targeted this January with fake, AI-generated audio of President Biden telling them not to vote. It’s not hard to imagine lifelike fake videos erupting on social media to further erode public trust in political leaders, institutions and the media.

For now, Sora is open only to testers and select filmmakers; OpenAI declined to say when Sora will be available to the general public. “We’re announcing this technology to show the world what’s on the horizon,” said Tim Brooks, a research scientist at OpenAI who co-leads the Sora project.

The videos that appear here were created by the company, some at The Washington Post’s request. Sora uses technology similar to artificial intelligence chatbots, such as OpenAI’s ChatGPT, to translate human-written prompts into requests with sufficient detail to produce a video.

Some are shockingly realistic. When Sora was asked to create a scene from California’s rugged Big Sur coastline, the AI tool’s output was stunning.

Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway. (OpenAI)

Aerial video uploaded in 2023 by Philip Thurston of the actual coastline of Big Sur in California. (Philip Thurston/Getty Images)

Although “garay point beach” is not a real place, Sora produced a video that is almost indistinguishable from this real video of the Big Sur coast near Pfeiffer Falls shot by photographer Philip Thurston. If anything, the fake scene looks more majestic than the real one.

Humans and animals are harder. But here, too, Sora produces surprisingly lifelike results. Take a look at this scene of a cat demanding breakfast.

Prompt: A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer. (OpenAI)

The texture of the cat’s fur, the intricate shadows on the blankets and the way the person’s face responds to the cat’s intrusion are all realistic. But take another look at that paw.

Sora seems to have trouble with cause and effect, so when the cat moves its left front paw, another appendage sprouts to replace it. The person’s hand is accurately rendered — a detail previous AI tools have struggled with — but it’s in an odd spot.

A similar thing happens in this scene from a Holi spring festival in India, which OpenAI produced at The Post’s request.

Prompt: Drone view of a crowd of people celebrating the festival of Holi in a city center in India. The people laugh and run through the streets throwing colored powder at each another. The drone zooms out and the shot pans around the rest of the city, showing a skyline and the sun beginning to set. (OpenAI)

Sora produces a realistic drone shot of the colorful celebration, but some people in the crowd blur together, while others sprout clones.

Sora was created by training an AI algorithm on countless hours of videos licensed from other companies and public data scraped from the internet, said Natalie Summers, a spokesperson for the Sora project. By ingesting all that video, the AI amasses knowledge of what certain things and concepts look like. Brooks compared the model’s growth to the way humans come intuitively to understand the world instead of explicitly learning the laws of physics.

Successive versions of the model have gotten better, said Bill Peebles, the other co-lead on the Sora project. Early versions couldn’t even make a credible dog, he said. “There would be legs coming out of places where there should not be legs.”

This video shows Sora has gotten the canine thing down. But these frolicking gray wolf pups still merge and reemerge with mesmerizing weirdness.

Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing. (OpenAI)

How about a scene from a classic Hollywood film? At The Post’s request, Sora produced an actor and a sensibility that seem plucked directly from a real movie.

Prompt: A person in a 1930s Hollywood movie sits at a desk. They pick up a cigarette case, remove a cigarette and light it with a lighter. The person takes a long drag from the cigarette and sits back in their chair. Golden age of Hollywood, black and white film style. (OpenAI)

But Sora clearly is confounded by how to light a cigarette. It knows the process involves hands, a lighter and smoke, but it can’t quite figure out what the hands do or in what order.

There are other problems. Look closely at the telephone. It has two handsets and a cord that stretches upward to become part of the lamp. Other items on the desk look vaguely real, but it’s unclear what they’re supposed to be.

“The model is definitely not yet perfect,” Brooks said.

Other videos show struggles, too. In this one, a man runs realistically on a treadmill — except he’s facing backward.

Prompt: Step-printing scene of a person running, cinematic film shot in 35mm. (OpenAI)

And even when Sora gets it right, problems may lurk. Take this video Sora made of a Victoria crowned pigeon. Tech critic and author Brian Merchant pointed out that the video looks quite similar to a real one of the same bird filmed by a photographer whose images are available on Shutterstock.

Prompt: This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird’s head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird’s striking appearance. (OpenAI)

Stock footage of a close-up shot showing a Victoria crowned pigeon. (Shutterstock)

OpenAI has a partnership with Shutterstock to use its videos to train AI. But because Sora is also trained on videos taken from the public web, owners of other videos could raise legal challenges alleging copyright infringement. AI companies have argued that using publicly available online images, text and video amounts to “fair use” and is legal under copyright law. But authors, artists and news organizations have sued OpenAI and others, saying they never gave permission or received payment for their work to be used this way.

The AI field is struggling with other problems, as well. Sora and other AI video tools can’t produce sound, for example. Although there has been rapid improvement in AI tools over the past year, they are still unpredictable, often making up false information when asked for facts.

Meanwhile, “red teamers” are assessing Sora’s propensity to create hateful content and perpetuate biases, said Summers, the project spokesperson.

Still, the race to produce lifelike AI videos isn’t slowing down. One of Google’s efforts, called Lumiere, can fill in pieces cut out of real videos. Here, it fills in the black section from the video on the left.

“Our primary goal in this work is to enable novice users to generate visual content in a creative and flexible way,” Google said in a research paper. The company declined to make a Lumiere researcher available for an interview.

Other companies have begun commercializing AI video technology. New York-based start-up Runway has developed tools to help people quickly edit things into or out of real video clips.

A screen recording shows Runway’s Inpainting AI tool being used to edit a video. (Runway)

OpenAI has even bigger dreams for its tech. Researchers say AI could one day help computers understand how to navigate physical spaces or build virtual worlds that people could explore.

“There’s definitely going to be a new class of entertainment experiences,” Peebles said, predicting a future in which “the line between video game and movie might be more blurred.”

Prompt: Extreme close up of a 24 year old woman’s eye blinking, standing in Marrakech during magic hour, cinematic film shot in 70mm. (OpenAI)

About this story

Editing by Karly Domb Sadof and Yun-Hee Kim. Design editing by Betty Chavarria. Video production by Nicki DeMarco. Copy editing by Melissa Ngo.
