Ori Inbar, the CEO and founder of Augmented World Expo, stood on stage in front of a live audience in October and had a conversation—with himself. Or rather, with a virtual hologram of himself.
Created using facial recognition, volumetric capture technology, and holography, the fake Inbar spoke in his voice, using words produced by ChatGPT. Fake Ori and real Ori traded barbs about each other’s skill sets. “Your words are a pile of cliches,” said the real Inbar to his doppelganger. “I’m not so sure that yours aren’t the same,” said the AI. Real Inbar underscored the key message of the demo, saying, “It’s a human’s job to double down on what makes us human.”
As tools to create synthetic virtual humans become widespread, less expensive, and accessible to all, it's going to be increasingly difficult to tell what's real and what's fake. It may not matter, and it's likely to produce entirely new revenue streams.
Real Tom Hanks can’t be everywhere all at once, but fake Tom Hanks can. Better yet, he can remain eternally young. In a preview of what’s to come, Hanks and Robin Wright have signed on to a Robert Zemeckis film in which Metaphysic, a company that specializes in creating digital humans, will de-age them live and in real time using generative AI, eliminating the need for laborious compositing or special effects work.
Elsewhere, you can find future-media-savvy Paris Hilton in myriad virtual personae. She's an AI-driven chatbot on Meta's Instagram, WhatsApp, and Facebook platforms, and she's among the personalities being virtualized by Metaphysic. Her other ventures include dabbling in NFTs and building her own Roblox world.
Ninety-one-year-old James Earl Jones signed away his signature Darth Vader voice to Disney, which will recreate it using voice-cloning technology from a company called Respeecher. Respeecher uses sound bites to "clone" an actor's voice, allowing a studio to employ it in new roles in perpetuity. Actors can be anywhere, everywhere, all at once, forever.
The Birth of Virtual Hollywood
Sophisticated video effects were once expensive and confined to specially equipped studios. Today, new tools are putting the power of special effects—virtual human creation—into the hands of everyone. You may have seen or read about unauthorized virtual humans of celebrities like Tom Hanks and Tom Cruise, created without permission or license. A fake Hanks recently appeared on social media selling a dental plan, and Gayle King's likeness was pushing weight-loss gummies.
Shenanigans like these have Hollywood celebrities flocking to digital studios to create, claim, and manage their own digital likenesses, both to combat fakes and to secure the royalties those likenesses can earn. Remington Scott, CEO and chief architect of Hyperreal in New York City, creates amazingly realistic A-list virtual humans. At the same time, he's developing the underlying rights management system to ensure that digital likenesses are tracked and compensated across multiple platforms. "Own, copyright, and monetize is our mantra," says Scott.
Hyperreal produces what it calls hypermodels—ultra-realistic digital re-creations of a person's entire body, face, voice, motion-captured performance movements, and mannerisms. The company has created near-perfect likenesses of a young Paul McCartney, TikTok star Madison Beer, and others. Scott wants to digitize once but use the likeness everywhere, whether that's TikTok, Fortnite, or Netflix.
The resolution and the realism will vary by platform—a TikTok video versus a large-screen movie, for example. The cost of creating a virtual human will vary, too, based on the level of realism. "Everything about performance capture is tied to a technology, and the technology is always changing," says Scott. He's been a part of Hollywood's digital scene for decades, from the first digitized home video games in the 1980s, to the first motion capture studio dedicated to entertainment, to directing the motion capture on the first hyperrealistic feature film. "It's about data compression, and while it's getting better every day, we're still in beta," he says.
When building an actor's digital double, Hyperreal develops "source code," an amalgam derived from an individual's motion, voice, appearance, and logic. The highly detailed 3-D models built around iconic images can be licensed for future use, with royalties going to the artist or their studio. Hyperreal takes a fee for managing the platform.
Metaphysic has been making a name for itself, too. In a bit of digital trickery, company cofounder Tom Graham recently swapped his face, live on the TED stage, with that of his event moderator, TED curator Chris Anderson. Graham even spoke in Anderson's perfect British accent, created using Metaphysic's voice cloning. Metaphysic is luring top-tier actors who recognize the need to get out in front of ownership of their virtual likenesses. The company's roster of clients includes Tom Hanks, Octavia Spencer, and Anne Hathaway. Like Hyperreal, Metaphysic will create ultra-realistic virtual humans and manage the usage rights.
Will Kreth, founder of HAND (Human & Digital), says we need "a set of standards-based, resolvable persistent IDs for both real and virtual identities." His company is creating a system of "citation-backed notability," verifying that someone is indeed a "real" celebrity in entertainment or sports, based on their work credits and linked data. The company's registry creates a unique digital identifier to automate the provenance of three things: a notable legal person (i.e., birthdate and legal citizenship of a country), a licensable virtual human identity (i.e., every version of a notable legal human that was or will be scanned or captured for performances), and a fictional character (a non-human character, such as Batman or Barbie, often portrayed by a notable legal person via their image and/or voice likeness).
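To make those three entity types concrete, here is an illustrative data model. It is not HAND's actual schema; every class and field name below is a hypothetical sketch of how such a registry might link real people, their captured likenesses, and the characters they portray.

```python
# Hypothetical sketch of a likeness-provenance registry (not HAND's real schema).
from dataclasses import dataclass, field

@dataclass
class NotableLegalPerson:
    """A real, citation-verified human: the anchor of provenance."""
    person_id: str          # persistent, resolvable identifier
    legal_name: str
    birthdate: str
    citizenship: str
    work_credits: list[str] = field(default_factory=list)  # citation-backed notability

@dataclass
class VirtualHumanIdentity:
    """One scanned or captured, licensable version of a notable legal person."""
    identity_id: str
    person_id: str          # links back to the real human who can be compensated
    capture_date: str
    capture_method: str     # e.g., "volumetric scan", "voice clone"

@dataclass
class FictionalCharacter:
    """A non-human character (Batman, Barbie) portrayed via a person's likeness."""
    character_id: str
    name: str
    portrayed_by: list[str] = field(default_factory=list)  # person_ids
```

Because every virtual identity resolves back to a person ID, any platform using a likeness could, in principle, look up who must be paid.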
Virtual at Birth
You don't need a real human to create a virtual human. A new generation of celebrities is being born virtual, with influence and the power to be monetized. Mave is an AI-powered K-pop girl group. Its members have an ultra-human look, can speak four languages, and have racked up 20 million streams on Spotify for "Pandora," their debut single. The entertainment company behind the phenomenon, Kakao, created Mave using a combination of motion capture of the human body and face, AI for speech and lyric creation, synthetic audio, and more.
Beyond entertainment, everyday virtual humans are hard at work, too. On Chinese TV, you'll see QVC-like shopping segments in which virtual clones of real streamers sell products on e-commerce platforms like Taobao and Douyin (platforms that viewers outside China can't sign up for).
Once the avatars are generated through facial recognition and motion capture, they are trained to read AI-generated scripts, with their mouth movements perfectly synced to the words. Build these synthetic humans once and you can use them again and again simply by altering the text. Synthetic stars like these are selling billions of dollars' worth of goods in a single e-commerce shopping event in China, according to reporting by MIT Technology Review.
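To see why that economics works, here is a minimal sketch of the build-once, reuse-forever loop. Everything in it, from the function names to the AvatarModel type, is a hypothetical placeholder standing in for a trained avatar rig, a cloned voice, and a lip-sync renderer, not any vendor's actual SDK.

```python
# Hypothetical pipeline: capture a streamer once, then generate endless clips from text.
from dataclasses import dataclass

@dataclass
class AvatarModel:
    """Stands in for a captured streamer: face rig, cloned voice, gesture library."""
    name: str

def synthesize_speech(avatar: AvatarModel, script: str) -> bytes:
    # Stand-in: a real pipeline would run the script through the avatar's cloned
    # voice via a text-to-speech engine.
    return b"audio-for:" + script.encode()

def render_lip_synced_video(avatar: AvatarModel, audio: bytes) -> bytes:
    # Stand-in: a real pipeline would drive the face rig so mouth shapes match
    # the phonemes in the audio track.
    return b"video-of:" + avatar.name.encode() + b":" + audio

def make_pitch(avatar: AvatarModel, product_script: str) -> bytes:
    """The expensive capture happened once; each new product needs only new text."""
    audio = synthesize_speech(avatar, product_script)
    return render_lip_synced_video(avatar, audio)

host = AvatarModel(name="streamer_clone_01")
clip = make_pitch(host, "Tonight only: 20 percent off the new wireless earbuds!")
```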
Anina.Net (her screen name), an American actress and fashion designer living in Beijing, makes a nice living representing products on shopping networks in China. She told me she isn't overly concerned about virtual streamers stealing her job. "Typically, they can't do much," she says. "They can't hold up products or demonstrate anything. They can't even breathe, so they look quite unnatural." At least, for now.
Shopping virtual assistants are just the beginning. We’re also seeing newscasters going virtual. Hour One, a virtual human company, can create news videos that simulate a real newscast, including different camera angles, chyrons, spliced-in video footage, and other content.
Virtual Humans for Working Stiffs
What do you do if you're not an A-list celeb or a big shopping network? The cost of creating high-quality motion capture varies: studio day rates of $500 are not uncommon, and top-quality full-body motion capture can cost thousands of dollars.
One of the easiest ways to get started with synthetic humans is by using the libraries of synths available on sites like Synthesia, D-ID, and Hour One. These are widely used for internal communications, training videos, promos, explainer clips, and more. You choose from a number of realistic video characters, assign one a voice and a tone (enthusiastic or serious, say), and feed it a script, as in the sketch below. The result isn't a likeness of you, but it yields synthetic humans that can be used in a variety of ways.
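Synthesia, for one, exposes this choose-an-avatar, feed-it-a-script workflow through a REST API. The sketch below is modeled on the shape of its publicly documented v2 videos endpoint; treat the exact field names, avatar ID, and auth format as assumptions and check the current docs before relying on them.

```python
# Sketch of creating an avatar video via an API shaped like Synthesia's v2 endpoint.
import requests

API_KEY = "your-api-key"  # issued when you sign up for API access

payload = {
    "test": True,  # assumption: watermarks the draft while you experiment
    "title": "Webinar navigation intro",
    "input": [
        {
            "scriptText": "Welcome! Here's how to navigate today's session.",
            "avatar": "anna_costume1_cameraA",  # a stock character; IDs vary
            "background": "off_white",
        }
    ],
}

resp = requests.post(
    "https://api.synthesia.io/v2/videos",
    json=payload,
    headers={"Authorization": API_KEY},
)
resp.raise_for_status()
print(resp.json()["id"])  # poll this video ID until rendering completes
```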
Frederic Werner of the International Telecommunication Union uses synthetic humans to help event attendees navigate the organization's AI for Good webinars. "It seemed like a waste of time to have a human dress up in office clothes just to tell attendees how to navigate the site, over and over again," he says. "With 150 webinars a year, and three hours of a human's time for each one, it made sense to automate routine functions."
Werner chose Synthesia to create a video avatar. It is ridiculously simple to use, offering 140 avatars, a selection of clothes, and some control over their style of speech. The results look stilted; there's no mistaking these fakes for humans. But you definitely get a usable video that can introduce a lesson, a training session, or a webinar.
D-ID is a little slicker. In addition to letting you select from a library of avatars, the program incorporates AI image generation, so you can ask it to synthesize a character from your imagination—a ghost, for example—and have that character recite a typed-in script or a recording of your own voice. This produces video that can be a little more creative.
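That image-plus-script pattern also has an API behind it. The sketch below follows the shape of D-ID's public "talks" endpoint as I understand it (a source image plus a text script yields a talking-head clip), but the field names, auth format, and portrait URL are assumptions to verify against the current documentation.

```python
# Sketch of animating a still portrait via an API shaped like D-ID's /talks endpoint.
import requests

resp = requests.post(
    "https://api.d-id.com/talks",
    headers={"Authorization": "Basic your-api-key"},  # assumption: Basic auth with your key
    json={
        # Hypothetical URL: a still of the character you synthesized or uploaded.
        "source_url": "https://example.com/ghost-portrait.png",
        "script": {
            "type": "text",  # or point to an uploaded recording of your own voice
            "input": "Boo! Welcome to tonight's training session.",
        },
    },
)
resp.raise_for_status()
talk_id = resp.json()["id"]
print(talk_id)  # poll the talk by ID to fetch the finished video's URL
```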
Hour One is the most AI-rich of these services. Its built-in generative AI will write a script for you based on prompts, choose footage for your video, and let you choose an avatar. You can create multiple scenes for your video. Here is my 3-minute creation.
As you experiment with these tools, remember that people can react viscerally and negatively to synthetic humans. Outside of the top-of-the-line celebrity models, virtual humans tend not to move or use their hands, for instance, and they don't show normal human emotions. So use them judiciously, says Werner. "They make sense in situations where the virtual humans do something like an intro, followed by the introduction of real humans," he says.
We are rapidly moving past the uncanny valley, where virtual beings looked like zombies, to an age where it will be difficult to tell who's real and who's a digital fabrication. Usage of our digital likenesses will become codified and commonplace. We're in for a wild ride.