A.I. Can Make Music, Screenplays, and Poetry. What About a Movie?

How long will it be until an A.I. can make an actual feature film on demand?

Illustration: Yann Bastard

Let’s say for the sake of argument you’re stuck at home for a long time watching too much of the stuff we euphemistically call “streaming content,” by which I mean movies and TV. Come up with your own reason — anything from being one of Japan’s pathologically introverted hikikomori to, say, hiding out from some sort of potentially lethal respiratory virus. In any case, you will at some point sour on all the available programming options and scroll glumly through all the familiar title selection menus until you give up. Tiger King is more of a punch line than a TV show at this point, and, sure, you could plumb the depths of history’s most creative auteurs over on the Criterion Channel, but that sounds hard, and if you are like me, you consider reading the morning news emotional labor.

But what if there were a movie streaming service with no downsides? Call it “Black Box.” You would get exactly what you’re in the mood for every time, but unlike rewatching an old favorite, it wouldn’t be a retread, because no two movies on Black Box would ever be the same. On Black Box, rather than selecting a title, you would choose from a menu of options like genres, plots, types of characters, locations, and content keywords to include or exclude.

Want a movie where a protagonist your age, race, sexuality, gender, and religion becomes an Olympic swimmer? You got it. Want a movie where someone demographically identical to your boss gets squeezed to death and devoured by a Burmese python? Your wish is its command. Want to leave out the specifics and let fate decide what never-before-imagined movie will be entertaining you this evening? Black Box has you covered.

After you make your choices — and of course pay a nominal fee for the serious computational heavy lifting necessarily involved — your order is received at Black Box HQ, and an original movie will be on its way shortly.

Black Box converts your specifications into data — or if you didn’t ask for anything specific, a blob of randomly generated numerical noise will do — and the creation process can begin. That first collection of ones and zeros will become a prompt, and will be fed into a type of A.I. called a transformer, which will spit out the text screenplay for your movie through a process a little like the autocomplete function on your smartphone.
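To make the autocomplete comparison concrete, here is a minimal sketch. It assumes a toy word-level bigram counter, which is far simpler than a real transformer; the function names `train_bigrams` and `autocomplete` and the tiny corpus are hypothetical and exist only for illustration. The shared idea is that the text grows one word at a time, each word chosen from what tended to follow the previous one in the training data.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def autocomplete(follows, prompt, max_words=5):
    """Extend a non-empty prompt one word at a time with the most frequent follower."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word never appeared mid-text, so stop here
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training text; a real system would learn from far more data.
corpus = "the python eats the boss then the python eats the couch"
model = train_bigrams(corpus)
print(autocomplete(model, "the", max_words=3))  # the python eats the
```

A real transformer conditions on the whole prompt rather than only the last word, which is why its output can stay on topic for pages instead of looping.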

That screenplay will then be fed into a variation on today’s vector quantized variational autoencoders — neural nets that generate music, basically — producing chopped up little bits of sound that, when strung together, form an audio version of the spoken dialogue and sound effects in your custom movie, plus an orchestral score. Finally, in the most challenging part of the process, those 90 minutes of audio, along with the screenplay, get fed into the world’s most sophisticated GAN, or generative adversarial network. Working scene by scene, the Black Box GAN would generate a cast of live action characters — lifelike humans, or at least human-esque avatars — built from the ground up, along with all of the settings, monsters, car chases, dogs, cats, and little surprises that make it feel like a real movie.
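The hand-offs described above can be sketched as a chain of stages. This is only an outline of the data flow under loud assumptions: every function here is a hypothetical placeholder that returns labeled strings, not a real transformer, autoencoder, or GAN, and names such as `make_prompt` and `generate_movie` are invented for illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Movie:
    screenplay: List[str]  # one entry per scene
    audio: List[str]       # one audio track label per scene
    video: List[str]       # one rendered-scene label per scene


def make_prompt(spec: Optional[Dict[str, str]], seed: int = 0) -> str:
    """Turn viewer choices into a prompt, or fall back to random noise."""
    if spec:
        return ", ".join(f"{key}={value}" for key, value in sorted(spec.items()))
    rng = random.Random(seed)
    return "noise:" + "".join(rng.choice("01") for _ in range(16))


def write_screenplay(prompt: str, n_scenes: int = 3) -> List[str]:
    """Placeholder for the text model: one stub line per scene."""
    return [f"Scene {i + 1} ({prompt})" for i in range(n_scenes)]


def synthesize_audio(screenplay: List[str]) -> List[str]:
    """Placeholder for the audio model: one track label per scene."""
    return [f"audio for {scene}" for scene in screenplay]


def render_video(screenplay: List[str], audio: List[str]) -> List[str]:
    """Placeholder for the video model: works scene by scene from text plus audio."""
    return [f"video for {scene} synced to {track}"
            for scene, track in zip(screenplay, audio)]


def generate_movie(spec: Optional[Dict[str, str]] = None, seed: int = 0) -> Movie:
    """Run the stages in order; each stage consumes the previous stage's output."""
    prompt = make_prompt(spec, seed)
    screenplay = write_screenplay(prompt)
    audio = synthesize_audio(screenplay)
    video = render_video(screenplay, audio)
    return Movie(screenplay, audio, video)


movie = generate_movie({"genre": "sports", "lead": "swimmer"})
print(movie.screenplay[0])  # Scene 1 (genre=sports, lead=swimmer)
```

The point of the structure is that each stage can be built, swapped, or debugged separately, which is the opposite of one giant model trained to emit a finished film.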

Setting aside the sheer implausibility of this scenario for a moment, if someone managed to do all of this with present-day technology, the result would be deeply tweaked and possibly disturbing. No matter how much Black Box tried to make a normal movie, A.I. authorship would all but guarantee strangeness. For now.

There was once a pretty well-worn joke about the output of supposedly creative robots being derivative and boring; it dates back at least a couple decades.

But generative A.I. systems actually exist now, and they’ve revealed themselves to have deeply warped creative instincts. In 2015, the A.I. nightmare known as DeepDream, one of the first popular examples of generative A.I., was released online. Google engineer Alexander Mordvintsev had created a system for generating outputs to match an image recognition system’s algorithmic understanding of the visual world, and when it went viral, the public got its first hellish taste of real computer “dreams” — nightmare landscapes dominated by wormlike swirls, eyes, and a ton of dogs for some reason. Since then, A.I.s have churned out disturbed cultural products like accidental horror stories and bizarre Frank Sinatra songs about spending Christmas in a hot tub.

So given A.I.’s demonstrated creative capacities thus far, should we really expect someone to take on the demonic task of forcing an A.I. to make an entire movie?

Yes, according to Duke University A.I. scientist David Carlson, PhD. “I think someone will eventually try to do this,” he told OneZero. And Carlson himself might be involved if they do, having helped engineer the A.I. systems that turned text into visual narrative media for papers published in 2018 and 2019. But, Carlson said, “That’s a long way off. You know, years at the minimum.”

It’ll take a lot of minds like Carlson’s to get an A.I. system to belch out a movie, because the task won’t be as simple as “making a robot watch every movie” and seeing what comes out. No matter how many comedians’ tweets you’ve read in that format, we’re actually decades or centuries from the technological milestone it implies — namely, general artificial intelligence, or something very close to it.

Rather than dumping a Roomba on a couch with a Netflix subscription for 10,000 hours and then asking it nicely to generate a blockbuster, an A.I.-generated movie is something that probably has to be painstakingly engineered, step by step. That’s very different from a single GAN, trained on so much movie data that it can just spit out entire movies at the push of a button. There may not be enough silicon in the universe to create a system that can do that.

In fact, there’s a pretty daunting gap between what cutting-edge A.I. can do right now, and what seems feasible based on contemporary science fiction and deceptive headlines. As I write this, the latest headline on the Daily Mail about A.I. reads: “Chinese state news agency unveils ‘the world’s first 3D A.I. anchor’ after ‘cloning’ a human reporter.” As you might expect, the real story is not as exciting as it sounds — it’s about as convincing a human facsimile as Andy from Toy Story. Meanwhile, actual generative A.I. systems at the cutting edge of technology can still have a hard time recognizing basic objects like fire trucks and birds outside of lab conditions.

If we wanted to use what we have now to create a 100% A.I. movie (meaning no human input other than the initial prompt), Carlson proposed a “stepwise procedure” that is, basically, the general framework of the Black Box system at the start of this article. “I think given the current technology, we could probably actually go from a screenplay to an audio recording that might be convincing in some way,” Carlson said. But how much harder is video than audio? In his own research, Carlson said, “the struggles in video are in things like scene changes and consistency.”

Consistency is one of the operative words in Carlson’s research. For our purposes, it means that a computer has a hard time with what we moviegoers call “continuity.” Objects that exit the frame may not be related to the ones that reenter the frame, or they may just disappear from reality altogether.

“Unless you specifically tell it that there has to be logical consistency between scenes, it’s very conceivable that you have your first scene where you have a set of people, and then you just switch angles and it’s a completely different set of people talking about the same thing,” Carlson said. The trick is to “represent the internal consistencies in math.”
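As one illustration of what putting consistency into math could look like, here is a small sketch. It is not Carlson's method: it assumes each shot has already been reduced to a set of character names, and the function `cast_inconsistency` is a hypothetical score that measures how much the cast changes from one shot to the next within a scene. A system could, in principle, be penalized whenever that score rises.

```python
from typing import List, Set


def cast_inconsistency(shots: List[Set[str]]) -> float:
    """Average Jaccard distance between the casts of consecutive shots.

    0.0 means every shot shows the same people; 1.0 means each cut
    swaps in a completely different set of people.
    """
    if len(shots) < 2:
        return 0.0
    total = 0.0
    for before, after in zip(shots, shots[1:]):
        union = before | after
        if union:  # two empty shots count as perfectly consistent
            total += 1.0 - len(before & after) / len(union)
    return total / (len(shots) - 1)


# Hypothetical scenes: the same two people throughout, then a full swap on the cut.
steady = [{"ana", "ben"}, {"ana", "ben"}, {"ana", "ben"}]
swapped = [{"ana", "ben"}, {"cy", "dee"}]
print(cast_inconsistency(steady))   # 0.0
print(cast_inconsistency(swapped))  # 1.0
```

The swapped scene is exactly the failure Carlson describes: the angle changes and a completely different set of people carries on the same conversation.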

Or maybe inconsistencies and other such problems are the whole point. Talking to Oscar Sharp, director of Sunspring, the 2016 viral short film written by an A.I. and then conventionally made starring Silicon Valley’s Thomas Middleditch, one gets the impression that he’s more fascinated by what A.I. has to show him than what he can command an A.I. to produce. Sharp told OneZero that asking the question “what wouldn’t someone do?” is “a good shortcut to something that’s quite creative.” That way, he explained, “You’ve gone to somewhere we haven’t explored.”

Indeed, Sunspring features some memorable moments that it’s hard to imagine any organic human being — other than maybe David Lynch or Alejandro Jodorowsky — thinking up. It’s also an exercise in torturing actors with truly baffling dialogue.

Sharp is not a computer scientist; for that he relies on New York University A.I. researcher Ross Goodwin. “I only know somewhat about how the systems work,” he explained. “I prefer to blindfold myself, most of the time, from the exact processes that are being used.” Nonetheless, he’s worked with enough A.I. members on his crew to know their limits. He said point blank that he would like to make the world’s first all-A.I. movie, but he, like Carlson, felt that no one will likely accomplish that by creating a single system trained on every movie ever that just generates whole movies with a single button push. “The processing would just be a bit much,” he said.

After Sunspring, Sharp attempted to have A.I. artists do as much of the creative work as possible on a project rather than just writing the screenplay, and the result was a short film called Zone Out. The film was created according to a step-by-step process not so unlike the one Carlson outlined, plus quite a bit more human intervention than Sharp originally intended. It wasn’t the work of a single GAN, but of multiple A.I. systems handling different jobs. A convolutional neural network was meant to comb through public-domain movies and locate visuals that matched the places and objects in an A.I.-generated screenplay. Then another A.I. system was supposed to cast “actors,” people in those old movies who are similar to the people in the script, and “puppeteer their mouths” to match the screenplay dialogue. Yet another A.I. was supposed to synthesize proper voices for the characters, and still another to create music. “It fell down on most fronts,” Sharp said.

Consistency is certainly not one of Zone Out’s strong suits: the faces of the characters keep changing, to name just one problem.

The cobbled-together-from-parts feel of Zone Out was not a mistake. A.I.s as we know them today tend to gobble up information and spit out something different that is derived from that information, and this is what seems to fascinate Sharp most of all about his A.I. projects. Sharp is interested in the similarities he sees between the training data in an A.I. system and the fragments of other people’s ideas that live in the heads of creative human beings. He will wax philosophical at the drop of a hat if you ask him about how these similarities relate to A.I.-created art. “Machine learning is a very useful metaphor for human thinking,” he said. If you’re a creative person, that doesn’t mean you’re truly original. “We put a load of stuff that humans made in you, and now you make stuff.”

This is where the comparison to dreams comes in for Sharp. “When you shut your eyes, the model is still there and it’s still predicting,” he said. “That’s what a dream is. It’s also what the screenplays that I’ve been directing are like.”

Yet Aaron Hertzmann, PhD, the principal scientist at Adobe Research and a frequent commentator on whether or not computers can create art, was eager to burst the whole “computers can dream” bubble. “The analogy between human dreaming and random sampling from a generative model is an imperfect one,” he told OneZero, “because generative models do not have consciousness or subconscious. Sometimes the random samples have a dreamlike quality; that’s a stylistic description, not a cognitive one.”

Hertzmann told me an A.I.-generated feature film, if it can be made, will be neither art nor worthwhile pop culture. “There is no reason to believe that any kind of technology currently exists that can create a cultural product without a human author guiding the process,” he said. “Even as pure speculation, there is no meaningful way to even imagine what this would be like; it’s like asking Jules Verne in 1850 what books will be like in the year 2020. It’s fine to discuss as science fiction, but you have to make some assumptions that don’t match what we currently know.”

Even Carlson admits that the possibility of reaching the milestone of an “A.I.-created movie” varies depending on where the goalposts are.

“Maybe I’m being overly optimistic,” he said. “It depends on what you consider success.” He said if anyone is champing at the bit for the first robo-movie, it might shorten the wait time to settle for something animated, or a short rather than a feature, or a feature with a lot of human input along the way.

“Do you need it to be photorealistic on everything, with lots of consistency, etcetera?” he said. “We won’t have that this year.”

Writer on the hypothetical question beat. Covering climate, war, and the future at VICE. Outbursts and opinions here. Plz never @ me mike.pearl@vice.com
