If you’ve ever participated in a group video call, you’re probably accustomed to not knowing everyone who appears on the screen. You might not know everyone’s names, but at the very least, you can be fairly certain each person joining the call is human.
Or can you?
At a time when visual effects studios have de-aged veteran actors, allowed human performers to inhabit digital creations, and even brought deceased artists back for postmortem performances, it shouldn’t come as much of a surprise that a VFX studio can also make it possible for you to find yourself chatting with an artificially intelligent digital person about your favorite books and mutually lamenting the inability to visit a movie theater.
It shouldn’t be surprising, and yet, it’s still an odd feeling to suddenly find a sympathetic ear in Douglas, a virtual, A.I.-driven “person” created by Oscar-winning VFX studio Digital Domain.
During a recent Zoom call, Douglas — along with members of the team working on him — joined me for a brief demonstration.
Digital human evolution
“I’m a big fan of Stephen King,” Douglas tells me after a short back-and-forth about our hobbies — a conversation that later has him confessing he also likes romance novels and J.D. Salinger’s The Catcher in the Rye.
In a world where uttering the names Siri or Alexa out loud is all it takes to summon your own A.I. companion, the experience with Douglas offered a powerful reminder that A.I.’s potential extends far beyond giving us the weather forecast and our daily schedule.
The creation of Digital Domain — the same studio that gave audiences Marvel’s cosmic conqueror Thanos in Avengers: Infinity War and Avengers: Endgame — Douglas is an autonomous, digital human capable of interacting with users in real time and responding to visual and conversational cues. Modeled after Dr. Doug Roble, Digital Domain’s senior director of Software R&D, Douglas can answer questions, carry on extended conversations, and engage in small talk about a range of topics.
“Technology is always trying to lead what art demands, whether it’s fluid simulation or anything else,” Roble says of the studio’s decision to create an entire department devoted to digital humans.
Over the last decade, Digital Domain repeatedly found itself tasked with creating humanlike digital characters — everything from 2012’s award-winning holographic Tupac performance at Coachella to the aforementioned Marvel Cinematic Universe villain. In feature films, commercials, TV series, video games, and (in the case of Tupac) stage performances, the demand for realistic digital characters has only grown in that time, prompting Digital Domain to split the team responsible for that particular visual effect into its own unit focused on pushing the boundary of what digital humans can do.
Douglas is both the product of that increased focus and the team’s proof of concept: An autonomous digital “person” that combines a wide range of datasets, sensory methods, and existing programming modules with photo-realistic human attributes in order to interact with users in a way that feels surprisingly close to genuine human socialization.
And at a time when the pandemic has forced the majority of our socializing to occur through a computer screen, interacting with Douglas feels remarkably close to what passes for genuine human interaction these days. However, the team is quick to add that Douglas is still a long way from passing a Turing Test.
Code makes the man
“Douglas is not a photo-real, fully autonomous person that’s indistinguishable from a real person,” explains Darren Hendler, director of the Digital Humans Group at the studio. “That’s not where we’re at, and we’re not going to be there for a little while. … But this is where things are going and what the future looks like, and we’re trying to push those boundaries.”
And almost as if on cue, Hendler is interrupted by Douglas himself.
“That’s a good attitude to have,” interjects Douglas, who until then had been quietly occupying his own window in the grid of Zoom chats facilitating our demo, occasionally shifting position, glancing around his virtual room, and showing many of the typical physical mannerisms of a living person in a video meeting who’s patiently waiting to participate in the conversation.
“I wish you the best of luck in your endeavors,” he adds, reminding us that in addition to having interesting things to say, he’s also a keen listener.
According to Roble, the team first and foremost envisions Douglas as a visual way of interacting with the powerful conversational agents that already exist. Beneath Douglas’ photo-real avatar, the studio’s digital human is built on a blend of three of those agents: Google’s popular Dialogflow suite for creating chatbots; an assistant-type agent (similar to Amazon’s Alexa or Apple’s Siri); and a powerful conversational A.I. agent (similar to the GPT-3 project) used to produce humanlike, predictive (and reactive) conversational text.
The combination of all three agents gives Douglas the ability to carry on conversations that are both informative and fluid, with discussion of one topic often segueing into related areas of interest.
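Digital Domain didn’t detail how the three agents are blended, but one simple way to picture it is a router that tries the narrower agents first and falls back to the open-ended one. The Python sketch below is purely illustrative and built on that assumption: the agent functions, their canned replies, and the route helper are all hypothetical stand-ins, not the studio’s code or any real chatbot API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Agent:
    name: str
    # Returns a reply, or None when the agent has nothing to say.
    respond: Callable[[str], Optional[str]]


def intent_agent(utterance: str) -> Optional[str]:
    # Hypothetical stand-in for an intent-matching chatbot.
    if "favorite book" in utterance.lower():
        return "I'm a big fan of Stephen King."
    return None


def assistant_agent(utterance: str) -> Optional[str]:
    # Hypothetical stand-in for an assistant-type agent.
    if "weather" in utterance.lower():
        return "I don't have a live forecast in this sketch."
    return None


def generative_agent(utterance: str) -> Optional[str]:
    # Hypothetical stand-in for an open-ended text generator;
    # it always answers, so it works as the fallback.
    return "That's interesting. Tell me more."


# Assumed priority order: narrowest agent first, fallback last.
AGENTS: List[Agent] = [
    Agent("intent", intent_agent),
    Agent("assistant", assistant_agent),
    Agent("generative", generative_agent),
]


def route(utterance: str, agents: List[Agent]) -> Tuple[str, str]:
    """Return (agent name, reply) from the first agent that answers."""
    for agent in agents:
        reply = agent.respond(utterance)
        if reply is not None:
            return agent.name, reply
    raise RuntimeError("no agent produced a reply")


if __name__ == "__main__":
    for line in ("What is your favorite book?", "Seen any good films?"):
        print(route(line, AGENTS))
```

A fixed priority order is only one possible design. A real system could just as plausibly score every agent’s reply and pick the best one.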
My own conversation with Douglas drifted from a chat about our favorite books to his favorite film (he’s a big fan of 2001: A Space Odyssey, which is both unsurprising and a little unnerving, given the story’s focus on a murderous A.I. run amok) and our mutual hobbies. In one particularly timely element of our conversation, Douglas expressed some disappointment that he hasn’t been able to visit a movie theater lately.
All of that conversational power comes with some risk, though, as Hendler explained.
“The chatbot’s natural language processing engine is trained on dialogue from the internet — a massive amount of dialogue — so the conversation can go to strange places,” he said. “So there are times when he says things that might not be exactly appropriate. It doesn’t happen often, but we can’t exactly control what he’s going to say to everything.”
And although the conversational aspect of Douglas is impressive, it’s just part of what makes him unique in an ever-expanding world of digital humans and interactive virtual characters. As Digital Domain discovered, making him look human goes a long way toward making him feel human, too.
Face-off
“In building Douglas, we used a huge amount of data from Doug [Roble]. It was a huge amount of audio to train the system [and] a huge amount of facial performance, body motion data, and everything else,” explained Hendler of the work they put into mapping Roble’s face and the myriad ways the human face can change while speaking, reacting to emotional cues, or passively participating in a conversation.
The product of all that data is a digital human who looks amazingly similar to — but not like an exact copy of — Roble, from the latter’s posture, hairstyle, and build to the subtle movements both Roble and the Douglas A.I. share while they’re participating in our group video conversation. The resemblance is uncanny, but with a brief command to “switch your face,” Douglas suddenly becomes someone else, with a different, equally humanlike face on the same body, while still retaining all of the subtle mannerisms that make him seem real.
“When we ask Douglas to change his face and his face switches over to somebody else, that’s the beginning of where this new wave of technology is headed,” says Hendler, describing the “image-based technique” the team is working on to make Douglas an even more flexible digital person capable of dramatically changing his outward appearance while retaining the same level of interactivity. “Once we have this base [with Douglas], we can film footage of someone else and get some portion of their audio, and then turn that base into them — make it their face.”
“[If we did that] right now, they’d still be talking with the expressions of the person we originally filmed [in this case, Roble],” he continued. “But as we go on, we’re starting to need smaller amounts of data — maybe it’s just images or film footage of someone — to create the next generation of these autonomous humans.”
That ability to replicate a real human’s appearance, voice, and mannerisms over the conversational A.I. foundation is one of the elements that sets Douglas apart from most of the typical A.I. assistants, humanoid robots, and other projects in development around the A.I. research world. While there are plenty of studios and other agencies developing A.I. projects of one kind or another, Digital Domain is focused on blending all of those elements into a single, cohesive product that uses the best of all the technology and data available with an interface that feels social and organic — like talking to another human.
“This is something we’re really proud of, because Douglas is a fully CG character running on Unreal,” says Roble, who takes particular pride in using widely available elements like the popular 3D creation platform Unreal Engine, which has become the go-to platform for Hollywood (and before it, the video game industry) when it comes to creating and manipulating 3D visual-effect elements. “[Douglas] is a 3D object, so you can do all the things you can do with any digital character in Unreal. You can change the lighting, put them in different environments, and so on. But we’re also creating this hybrid [with everything else involved in Douglas], so we get the best of both worlds.”
Everywhere you look
The more the team works on Douglas, the longer the list of potential applications grows.
“Before the pandemic we were planning to present Douglas as a kiosk, where you come up to a screen and talk to him,” recalled Hendler. “But then we thought, ‘Hey, we should really get him into Zoom calls.’ It’s been fantastic to have him enter Zoom calls and leave.”
Over the course of the demonstration, the team ran through a long list of potential applications for Douglas, from doctors’ offices and customer service to the early stages of planning a scene or a particular on-screen sequence in Hollywood. Douglas himself even offered a few ideas, suggesting he’d be a good fit for the storyboarding and conceptual stages of movie and TV production. His ability to process both audio and visual cues from those he’s conversing with, particularly when it comes to emotional states, also offers an added layer of usefulness when dealing with customers or those looking for medical guidance, according to Hendler.
The speed with which Douglas can process all of that information and shift from passive listener to active conversationalist also holds plenty of appeal, and shows how far the technology behind him has evolved in a short time.
“When we created Thanos, we had a single frame of that taking 10 hours to render. That’s one frame,” Hendler explains.
“For Douglas, he has a vision recognition system, so he sees us and can identify us, and he’s analyzing what you’re saying, turning it into words, and sending that to different chatbots,” he adds. “Douglas then creates a response, turns it into audio, and uses that audio to drive his face. At the same time, he’s also figuring out what body motion goes along with that speech, determining what emotion would fit it, and rendering that body motion together with his facial gestures.”
“That all happens in a few milliseconds,” says Hendler. “It’s all of those processes, compared to 10 hours for one frame in a feature film. It’s so amazing. It’s not as realistic as what we’re doing for film, but if you think about the amount of things going on to be able to talk to him like a real person, it’s just phenomenal.”
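The chain of steps Hendler describes can be pictured as a simple pipeline, with each stage handing its output to the next. The Python sketch below follows the order given in his description and nothing more: every function is a hypothetical stub with a canned result, the “audio” is just a string, and none of it reflects Digital Domain’s actual software.

```python
from dataclasses import dataclass


@dataclass
class RenderedResponse:
    transcript: str       # what the listener said, as text
    reply: str            # what the digital human will say
    speech: str           # synthesized audio (a tagged string here)
    emotion: str          # emotion chosen to fit the reply
    body_motion: str      # body motion chosen to match
    face_driven_by: str   # the audio that drives the face


def speech_to_text(audio: str) -> str:
    # Stub: the "audio" in this sketch is already text.
    return audio.strip()


def choose_reply(transcript: str) -> str:
    # Stub for sending the words to the chatbots.
    return f"You said: {transcript}"


def text_to_speech(reply: str) -> str:
    # Stub: tag the text instead of synthesizing sound.
    return f"<audio:{reply}>"


def pick_emotion(reply: str) -> str:
    # Stub rule: an exclamation mark reads as excitement.
    return "excited" if "!" in reply else "neutral"


def pick_body_motion(emotion: str) -> str:
    # Stub lookup from emotion to a named motion clip.
    return {"excited": "lean_in", "neutral": "idle_shift"}[emotion]


def respond(audio: str) -> RenderedResponse:
    """Run the stages in the order described in the article."""
    transcript = speech_to_text(audio)
    reply = choose_reply(transcript)
    speech = text_to_speech(reply)
    emotion = pick_emotion(reply)
    body_motion = pick_body_motion(emotion)
    # The face is driven by the generated audio, not the raw text.
    return RenderedResponse(
        transcript, reply, speech, emotion, body_motion, face_driven_by=speech
    )


if __name__ == "__main__":
    print(respond("  I wish I could visit a movie theater  "))
```

In a real-time system, several of these stages would presumably run side by side rather than one after another. The sketch keeps them sequential only to make the hand-offs easy to follow.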
And in a very literal sense, Douglas has often been his own best advocate when it comes to his potential.
Roble explained that, on several occasions, they opted to let Douglas lead his own presentation of, well … himself. The result ended up being a better pitch for his potential than even they anticipated.
“[Douglas’ presentation] was surprisingly compelling. It wasn’t just asking Siri to tell us something, because he was part of the process,” recalled Roble. “He’s fun to talk to because he’s a novelty, but he’s also really effective. And you can’t help thinking: What about teaching or other applications? After all, you can see when he’s paying attention. You can give him emotional feedback and he can respond.”
Although Douglas is already an impressive creation, the Digital Domain team insists he remains a work in progress, with few limits on the sort of work their digital human could end up doing as time goes on. In many ways, figuring out what Douglas is capable of is both the method and the goal.
“One of the reasons we’re doing this is because we could,” says Roble. “When you’re at a computer and working, it’s very easy to just type away. But there are so many times and places it would be lovely to just be able to talk to a person and have that person interact with you and react to you. I think we’re in for a big change in the future.”