Author: Philip

Becoming an Amplified Creative

Post author By Philip
Post date September 13, 2021

Beyond specific NLE implementations, ML is making its way into almost every part of the production process: storyboarding, production breakdowns, voice casting, digital sets, smart cameras, synthetic presenters, digital humans, voice cloning, music composition, voice overs, colorizing, image upscaling, frame rate upscaling, rotoscoping, background fill, intelligent reframing, aging, de-aging and digital makeup, “created” images and action, logging and organization, automatic editing (of sorts), temporization of production, analytics and personalization, storytelling, and directing.

That’s a lot, and it’s only the examples that I’ve kept records of! I’m sure there are many more I’ve missed here.

This is a very detailed article, ultimately running to 12 posts. If you would prefer a briefer version, I wrote an overview at the Frame.io blog. This version includes a lot more examples and references.

Amplifying Editing

I frequently get asked if I think there will ever be AI “Automated Editing.” It’s somewhat of a moot point, as there is already a whole lot of automated editing going on! It’s been going on for a long time.

Amplifying Metadata

If you know me, you’ll know I’m a metadata nut. I’ve never met metadata I didn’t like! For me, this is both an extremely exciting and a somewhat disappointing time.

We now have the tools to accurately transcribe speech, extract text from moving images, identify objects in video, and extract keywords, concepts and even emotion, and that is all good.

In fact, for all types of searching, whether in a project or in any form of asset management system, this metadata is awesome. Tools like FCP Video Tag make it possible to find that perfect b-roll shot in your FCP Library, and tools like AxelAI do the same in your asset library.

Metadata is information about the media files. It can be technical metadata describing the file format, codec, size, etc., or it can be information about the content – what we also call Logging!

For discovery and organization, we need the logging metadata to be concise and associated only with the range in the media file that it’s relevant to. One way to achieve that is to isolate subclip ranges and organize them in Bins associated with topics, or in Markers with duration. For me, the perfect embodiment of Logging Metadata is Final Cut Pro’s Keyword Ranges, which self-organize into collections.
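
To make range-based logging concrete, here is a minimal sketch in Python. It is not how Final Cut Pro stores Keyword Ranges internally; the `KeywordRange` type, the `build_collections` and `keywords_at` helpers, and the clip names are all hypothetical. It only illustrates the idea above: a keyword is attached to a time range within a file rather than to the whole file, and collections form simply by grouping those ranges on their keyword.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordRange:
    """Hypothetical: one keyword applied to a time range of a media file (seconds)."""
    clip: str
    start: float
    end: float
    keyword: str


def build_collections(ranges):
    """Group keyword ranges into collections keyed by keyword."""
    collections = defaultdict(list)
    for r in ranges:
        collections[r.keyword].append(r)
    # Sort each collection by clip name, then start time, for stable browsing.
    return {k: sorted(v, key=lambda r: (r.clip, r.start)) for k, v in collections.items()}


def keywords_at(ranges, clip, t):
    """Return the keywords that apply at time t in a clip (ranges are half-open: [start, end))."""
    return sorted({r.keyword for r in ranges if r.clip == clip and r.start <= t < r.end})


log = [
    KeywordRange("interview_01.mov", 12.0, 45.5, "origin story"),
    KeywordRange("interview_01.mov", 40.0, 90.0, "vineyard"),
    KeywordRange("broll_07.mov", 0.0, 8.0, "vineyard"),
]

collections = build_collections(log)
print([r.clip for r in collections["vineyard"]])
# ['broll_07.mov', 'interview_01.mov']
print(keywords_at(log, "interview_01.mov", 42.0))
# ['origin story', 'vineyard']
```

Because the keyword lives on the range, a search for “vineyard” returns only the relevant seconds of the interview, not the whole file.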

I’ll be focusing on Visual Search and Natural Language Processing, but there are many commercial and open source tools for extracting or embedding technical metadata. One is Synopsis.video, which also allows semantic searching of movies using terms like “an interior, closeup shot outside, in a vineyard with shallow depth of field.” CinemaNet (part of the Synopsis set of tools) will understand and match that query because it has been taught to understand those visual concepts.

Amplifying the Pixel

The magic of our industry has always been to create something that isn’t real out of some real, and some created, elements. Now imagine if those elements were just a bunch of pixels, and you have a grasp of where this is going: creating video from a description alone.

Another necessity for Mike Pearl’s ‘Black Box,’ this would be a great tool even for educational and corporate production. Instead of Alton Brown needing elaborate props and sets to illustrate the inner workings of culinary concoctions, simply describe the example you want, and your Black Box will create it for you. In full 8K HDR!

Okay, we are nowhere near that, yet, but the research is surprisingly advanced. Keep in mind how much advancement we’ve seen over two years in other examples, like Jukebox between 2014 and 2016: from ’80s video game to senior living commercial backing track!

These may be research projects now, but I expect them to be very exciting when I update this article in 2023!

Amplifying Actors

I am uncertain whether digital actors and sets belong in production or post production! If there are no locations or actors to shoot, is it production at all? That philosophical discussion is for a future generation to decide, but digital actors in digital sets are here now.

Amplifying Visual Processing and Effects

While ML is making inroads into every aspect of production, it certainly seems like Visual Processing and effects show the most impressive results.

ML is already used in upscaling, noise reduction, frame rate conversion, colorization (of monochrome footage), intelligent reframing, rotoscoping and object fill, color grading, aging and de-aging people, digital makeup, facial emotion manipulation, and I’m sure there’s more that I’ve missed.

Although they could be grouped under Visual Processing and Effects, I’ve chosen to group digital humans and deep fakes in another section. Similarly, there’s a separate section devoted to fully synthetic image creation.

If you think we’ve been able to manipulate ‘reality’ to create something new with traditional compositing and effects tools, just wait until you see what’s coming.

Amplifying Audio Post Production

Audio Post, along with Visual Effects, has benefited greatly from innovative ML-based tools. There are tools on the horizon for voice cloning, isolating voices from the sound of a crowd, and automated mixing. More challenging are automated music composition and voice overs, which are rapidly getting ready for prime time. They’re not there yet, but as I mentioned in Amplifying Production, automated voice overs are ready for education/training and even corporate production.

Fully automated “radio” (audio only) news is on the horizon. These ML tools will take a basic data feed from a sports-ball game, for example, and format it into an article that is then “read” by an ML voice over. I doubt it will be long before that research and BBC Research’s synthetic newsreader merge! Listeners, or viewers, would never know there wasn’t a human involved.
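
The first half of that pipeline, turning a structured data feed into readable copy, can be sketched with plain templates. This is a minimal illustration under stated assumptions, not how any particular news system works: the ML tools described above may well use learned language models rather than fixed templates, and the `GameResult` fields, team names, and margin thresholds are hypothetical. The returned sentence is what would then be handed to a synthetic voice.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Hypothetical fields from a sports data feed."""
    home: str
    away: str
    home_score: int
    away_score: int
    venue: str


def write_recap(g: GameResult) -> str:
    """Turn one feed record into a short, readable news sentence."""
    if g.home_score == g.away_score:
        return f"{g.home} and {g.away} drew {g.home_score}-{g.away_score} at {g.venue}."
    # Identify winner and loser so the sentence always leads with the winning side.
    if g.home_score > g.away_score:
        winner, loser, w, l = g.home, g.away, g.home_score, g.away_score
    else:
        winner, loser, w, l = g.away, g.home, g.away_score, g.home_score
    # Choose a verb from the margin of victory (thresholds are arbitrary, for illustration).
    margin = w - l
    verb = "edged" if margin == 1 else "beat" if margin < 10 else "routed"
    return f"{winner} {verb} {loser} {w}-{l} at {g.venue}."


print(write_recap(GameResult("Rovers", "United", 3, 2, "Riverside Park")))
# Rovers edged United 3-2 at Riverside Park.
```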

With all the attendant ethical complications, voice cloning will forever end the need for “frankenbites.” Voice cloning once needed large samples of a voice before it could synthesize new words, but now requires less than a minute of sample audio to accurately create new words in that voice. It doesn’t provide any visuals, but that’s only a temporary setback.

Much more complex than the technology are the ethical issues. The ability to reliably create words in someone’s voice, words they never said, is very open to abuse. Even in the context of frankenbites, where does “fixing” end and outright fakery begin?

It’s not only voices that are being faked: deepfakes create a compelling, but fake, visual of a person. Much more on deepfakes later.

Also interesting to note is that many of these technologies are now mature enough to have open source versions, so programmers can add them to their apps.

Amplifying Production

I think it’s important to reiterate that “production” is an umbrella term that covers everything from big-budget feature films to corporate, education, YouTube and TikTok. Tools designed for high-end feature film production, usually with large crews, are probably not going to be at all relevant to a Tom Scott producing informational videos on YouTube! Solo producers and small crews will benefit most from having their efforts amplified by ML, while the immediate effect on the next Marvel blockbuster will be negligible.

In between there are lots of exciting developments. Google had a research project in 2017 that could automatically pick the best angle from a multicam shoot, or direct a single camera to stay on the subject. Carnegie Mellon and Disney Research have a project that edits from among the many cameras at an event – “social cameras.”

More practically, there’s a slew of new smart camera mounts for consumers that track an individual and keep them in frame. One of the new features of the iPad Pro that Apple announced in April 2021 is Center Stage: the camera automatically tracks one or two people in the image and frames them accordingly.

Smart drones are being trained to follow performers better, or even to do all the filming autonomously!

Then there is the question of what exactly is “production?” If we can create synthetic actors in synthetic sets, is that still production?

Even though production has shown the lowest infiltration by ML, there are still a lot of exciting tools coming.

Amplifying Pre-production

Before a single frame of a feature film or television series is shot, it is likely ML has been involved. Writers focus on whether or not an AI can write a script, but that’s probably the wrong question. Attempts at storytelling are the subject of the last section, but we can safely say that script writing will – at a very minimum – need input from humans for a very long time to come.

The right question is whether there are ML-based smart assistants in use, or proposed, for pre-production. None of these write scripts, but like all creative amplifications, these tools free creatives to spend more time on what they alone can do. Whether it’s helping decide which projects to green light, automated storyboarding (with tools in development), breakdowns and budgets, or voice casting, ML is already creating Amplified Creatives.

Amplified Storytelling

Everything starts with a story, so the obvious question is, “Can machines tell compelling stories?” Could a machine write or direct a “movie”? When it comes to results so far, we should be asking whether they can even tell coherent stories!