I frequently get asked if I think there will ever be AI “Automated Editing.” It’s somewhat of a moot point, as there is already a whole lot of automated editing going on! It’s been going on for a long time.
If you know me, you'll know I'm a metadata nut. I've never metadata I didn't like! For me this is an extremely exciting, and yet somewhat disappointing, time.
We now have tools to accurately transcribe audio, extract text from moving images, identify objects in the video, and extract keywords, concepts and even emotion, and that is all good.
In fact, for all types of searching, whether it's in a project or any form of asset management system, this metadata is awesome. Tools like FCP Video Tag make it possible to find that perfect b-roll shot in your FCP Library, and tools like AxelAI do the same in your asset library.
Metadata is information about the media files. It can be technical metadata describing the file format, codec, size, etc., or it can be information about the content, which is what we also call Logging!
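To make the technical side of that distinction concrete, here's a minimal sketch that pulls technical metadata out of a media file. It assumes ffprobe (part of the FFmpeg project) is installed, and "interview.mov" is a hypothetical clip:

```python
import json
import subprocess

def technical_metadata(path):
    """Extract technical metadata (container, duration, codecs) with ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True)
    info = json.loads(result.stdout)
    return {
        "container": info["format"]["format_name"],
        "duration_sec": float(info["format"]["duration"]),
        "codecs": [s.get("codec_name") for s in info["streams"]],
    }

# technical_metadata("interview.mov") might return something like:
# {'container': 'mov,mp4,m4a,3gp,3g2,mj2', 'duration_sec': 94.3,
#  'codecs': ['h264', 'aac']}
```

That's the "about the file" half; the logging half is about what's in the picture and sound.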
For discovery and organization we need the logging metadata to be concise and associated only with the range in the media file that it's relevant to. One way to achieve that is to isolate subclip ranges and organize them in Bins associated with topics, or in Markers with duration. For me, the perfect embodiment of Logging Metadata is Final Cut Pro's Keyword Ranges, which self-organize into collections.
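As a minimal sketch of how keyword ranges might self-organize into collections (the class and field names here are my own invention, not FCP's actual data model):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class KeywordRange:
    clip: str       # media file the range belongs to
    start: float    # range start, in seconds
    end: float      # range end, in seconds
    keyword: str    # the logging keyword applied to this range

def keyword_collections(ranges):
    """Group keyword ranges into keyword collections, FCP-style."""
    groups = defaultdict(list)
    for r in ranges:
        groups[r.keyword].append(r)
    return dict(groups)

ranges = [
    KeywordRange("beach.mov", 12.0, 18.5, "sunset"),
    KeywordRange("beach.mov", 40.0, 44.0, "surfing"),
    KeywordRange("pier.mov", 3.0, 9.0, "sunset"),
]
# keyword_collections(ranges) yields a "sunset" collection holding two
# ranges from different clips, and a "surfing" collection with one.
```

The point is that the keyword attaches to a range, not a whole clip, so the collections assemble themselves.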
I'll be focusing on Visual Search and Natural Language Processing, but there are many commercial and open source tools for extracting or embedding technical metadata, including Synopsis.video. Synopsis.video also allows semantic searching of movies using terms like "an interior, closeup shot outside, in a vineyard with shallow depth of field", which CinemaNet (part of the Synopsis set of tools) will understand and match because it has been taught those visual concepts.
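Under the hood, semantic search like that generally comes down to comparing embeddings. Here's a minimal sketch of the idea, where embed_text() and the per-shot embedding vectors are hypothetical stand-ins for a trained model like CinemaNet (this is not Synopsis's actual API):

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two embedding vectors, from -1 to 1."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_shots(query, shot_embeddings, embed_text, top_k=5):
    """Rank shots by how closely their embeddings match the query text.

    shot_embeddings: dict mapping shot id -> embedding vector
    embed_text: hypothetical model call mapping text -> embedding vector
    """
    q = embed_text(query)
    scored = [(cosine_similarity(q, vec), shot_id)
              for shot_id, vec in shot_embeddings.items()]
    return sorted(scored, reverse=True)[:top_k]

# search_shots("vineyard closeup, shallow depth of field", shots, embed_text)
# returns the shots whose visual embeddings sit closest to the query text.
```

Because both the text and the images live in the same embedding space, the search matches concepts, not just keywords.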
The magic of our industry has always been to create something that isn't real out of some real, and some created, elements. Well, imagine that those elements were just a bunch of pixels, and you have a grasp of where this is going: creating video out of a description alone.
Beyond being a necessity for Mike Pearl's 'Black Box,' it would be a great tool even for educational and corporate production. Instead of Alton Brown needing elaborate props and sets to illustrate the inner workings of culinary concoctions, simply describe the example you want, and your Black Box will create it for you. In full 8K HDR!
Okay, we are nowhere near that yet, but the research is surprisingly advanced. Keep in mind how much advancement we've seen over two years in other examples: Jukebox, between 2014 and 2016, went from 80's video game to senior living commercial backing track!
These may be research projects now, but I expect them to be very exciting when I update this article in 2023!
I am uncertain whether digital actors and sets belong in production or post production! If there’s no location or actors to shoot, is it production at all? That philosophical discussion is for a future generation to decide, but digital actors in digital sets are here now.
While ML is making inroads into every aspect of production, it certainly seems like Visual Processing and effects show the most impressive results.
ML is already being used for upscaling, noise reduction, frame rate conversion, colorization (of monochrome footage), intelligent reframing, rotoscoping and object fill, color grading, aging and de-aging people, digital makeup, and facial emotion manipulation, and I'm sure there's more that I've missed.
Although they could be grouped under Visual Processing and Effects, I’ve chosen to group digital humans and deep fakes in another section. Similarly, there’s a separate section devoted to fully synthetic image creation.
If you think we’ve been able to manipulate ‘reality’ to create something new with traditional compositing and effects tools, just wait until you see what’s coming.
Audio Post, along with Visual Effects, has benefited greatly from innovative ML-based tools. There are tools on the horizon for voice cloning, isolating voices from the sound of a crowd, and automated mixing. More challenging are automated music composition and voice overs, which are rapidly getting ready for Prime Time. They're not there yet, but as I mentioned in Amplifying Production, automated voice overs are already usable for education/training and even corporate production.
Fully automated "radio" (audio only) news is on the horizon. These ML tools will take a basic data feed from a sports-ball game, for example, and format it into an article that is then "read" by an ML voice over. I doubt it will be long before that research and BBC Research's synthetic newsreader merge! Listeners, or viewers, would never know there wasn't a human involved.
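The article-generation half of that pipeline is largely templating over structured data. Here's a minimal sketch; the feed fields and the synthesize_speech() call are hypothetical stand-ins, not any broadcaster's actual system:

```python
def game_recap(feed):
    """Turn a structured score feed into a short news sentence."""
    if feed["home_score"] >= feed["away_score"]:
        winner, w = feed["home"], feed["home_score"]
        loser, l = feed["away"], feed["away_score"]
    else:
        winner, w = feed["away"], feed["away_score"]
        loser, l = feed["home"], feed["home_score"]
    verb = "edged" if w - l <= 3 else "beat"   # vary wording by margin
    return f"{winner} {verb} {loser} {w}-{l} on {feed['date']}."

feed = {"home": "Rovers", "home_score": 24,
        "away": "United", "away_score": 21, "date": "Saturday"}
article = game_recap(feed)   # "Rovers edged United 24-21 on Saturday."

# A text-to-speech step (hypothetical API) would then "read" the article:
# audio = synthesize_speech(article, voice="news_anchor")
```

Real systems layer in more varied phrasing and context, but the data-to-article-to-voice chain is the same.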
With all the attendant ethical complications, voice cloning will forever end the need for "frankenbites." Voice cloning once needed large samples of a voice before it could synthesize new words, but now requires less than a minute of sample audio to accurately create new words in that voice. It doesn't provide any visuals, but that's only a temporary setback.
Much more complex than the technology are the ethical issues. The ability to reliably create words in someone's voice, words they never said, is very open to abuse. Even in the context of frankenbites, how far does "fixing" a line go before it becomes an outright fake?
It's not only voices that are being faked: deepfakes create a compelling, but fake, visual of a person. Much more on deepfakes later.
Also interesting to note is that many of these technologies are now mature enough that there are open source versions, so programmers can add them to their apps.
I think it's important to reiterate that "production" is an umbrella term, covering everything from big budget feature films to corporate, education, YouTube and TikTok. Tools designed for high end feature film production, usually with large crews, are probably not going to be at all relevant to a Tom Scott producing informational videos on YouTube! Solo and small production crews will benefit most from having their efforts amplified by ML, while the immediate effect on the next Marvel blockbuster will be negligible.
In between there are lots of exciting developments. Google had a research project in 2017 that automatically picked the best angle from a multicam shoot, or directed a single camera to stay on the subject. Carnegie Mellon and Disney Research have a project that edits from among the many cameras at an event ("social cameras").
More practically, there's a slew of new smart camera mounts for consumers that track an individual and keep them on camera. One of the new features of Apple's iPad Pro, announced in April 2021, is Center Stage: the camera automatically tracks one or two people in the frame and reframes accordingly.
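The core of that kind of auto-framing is simple to sketch: detect the subject, then crop so they stay centered. Here's a minimal single-frame version using OpenCV's stock face detector; this is my own illustration, not Apple's Center Stage implementation:

```python
import cv2

# OpenCV's bundled frontal-face detector (ships with opencv-python)
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def auto_frame(frame, out_w=1280, out_h=720):
    """Crop a frame so the first detected face sits at the center.

    Assumes the source frame is at least out_w x out_h pixels.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    h, w = frame.shape[:2]
    if len(faces) == 0:
        cx, cy = w // 2, h // 2          # no face found: keep center framing
    else:
        x, y, fw, fh = faces[0]
        cx, cy = x + fw // 2, y + fh // 2
    # Clamp the crop window so it stays inside the source frame
    left = min(max(cx - out_w // 2, 0), w - out_w)
    top = min(max(cy - out_h // 2, 0), h - out_h)
    return frame[top:top + out_h, left:left + out_w]
```

A real tracker runs this per frame and smooths the crop position over time, so the reframe glides rather than jumps.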
Smart drones are being trained to better follow performers; some projects even have autonomous drones do all the filming!
Then there is the question of what, exactly, "production" is. If we can create synthetic actors in synthetic sets, is that still production?
Even though production has shown the lowest infiltration by ML, there are still a lot of exciting tools coming.
Before a single frame of a feature film or television series is shot, it is likely ML has been involved. Writers focus on whether or not an AI can write a script, but that’s probably the wrong question. Attempts at storytelling are the subject of the last section, but we can safely say that script writing will – at a very minimum – need input from humans for a very long time to come.
The right question would be whether there are ML-based smart assistants in use, or proposed, for pre-production. None of these write scripts, but like all creative amplifications, these tools free creatives to spend more time on what they alone can do. Whether it's helping decide which projects to green light, automated storyboarding tools in development, script breakdowns and budgets, or voice casting, ML is already creating Amplified Creatives.
Everything starts with a story, so the obvious question is, "Can machines tell compelling stories?" Could a machine write or direct a "movie"? When it comes to the results so far, we should be asking if they can even tell coherent stories!
Every time there is a major shift in technology it is uncomfortable. Established workflows and patterns change. We have a saying at our place: “No-one voluntarily changes their workflow!” The innovations that are coming are going to be disruptive.
How quickly and directly these changes are going to affect you depends on where you create within the broad spectrum of film, television, corporate, education and other production.
At the big money end, everything remains extremely conservative. Changing editing platforms is major news. (Well, editing a major motion picture on something other than Avid's Media Composer is news, but I digress.) Because of the investments involved, changing a workflow is a major challenge: there are months of testing before a commitment is made. If you work on network TV or feature films, nothing will change in the next couple of years.
Realistically, that end of the market will not embrace ML driven Smart Tools until they’re Avid sanctioned and available in Media Central.
Outside of that market, uptake will be mixed. The most adept will explore which of these new tools enhance their creativity. I expect the smaller the creative group, the faster the uptake. These new smart tools particularly benefit small independent production groups, whether they're creating their own projects or providing creative and physical services to clients. The more efficiently they can work, the more they can create and/or remain competitive.
Although the exact words are a summary of Charles Darwin’s thesis and not his own, the intent is certainly applicable:
It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.
Insist on working without amplification and you will fade into irrelevance. Adapt, adopt and amplify your creativity and thrive. As Alvin Toffler said:
The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn.