If you’ve used Final Cut Pro, Premiere Pro or Blackmagic Resolve, you are already using some Artificial Intelligence/Machine Learning tools. Final Cut Pro is catching up with the other two NLEs in its use of Machine Learning. It’s likely that some of these standalone tools, or features from them, will migrate into an NLE near you, but in the meantime here are five amazing tools you could/should be using now.
Automated Rotoscoping
Rotoscoping tools have consistently evolved and improved, from the days of b-splines in Commotion to today’s Rotobrush 2 in Adobe After Effects, but rotoscoping remains one of the most tedious post-production tasks, and one well suited to Machine Learning. RunwayML uses object recognition for both object extraction and background fill. Its monthly subscription is based on output format: up to 720p output is free and 1080p output is just $15 a month. ProRes and 4K output is $35 a month and is apparently their most popular subscription.
Custom Music Creation
One of the technologies I was in awe of when I was a more active editor was Smartsound’s Sonicfire Pro, which customized music tracks to match a specified video length. Very clever stuff, and it was indeed invented there. Having experienced Sonicfire Pro, I found Soundraw immediately familiar when I came across it.
In Soundraw’s interface you choose a Mood or Video Theme, Style, Length and Instruments. In about 15 seconds Soundraw will generate 15 new, original and unique tracks for you. Choose one to edit and you have even more control over tempo, key, instruments and mix. You can load a video to synchronize with.
Soundraw’s durations lack the precision of Sonicfire Pro, as they only increment in 30 or 60 second intervals, but for royalty-free music I’ll take it. There are two plans, starting from $16.60 per month.
Royalty-free and unique custom models that have never existed
One way to avoid being busted using Stock Images to populate your advertising campaign, or to fill a bit of b-roll, is to make sure you never use a photo of a real person. Enter Generated Photos, where every face is unique and completely fake! You have complete control over how diverse you want your fake population to be.
There is also This Person Does Not Exist for random face generation.
Spokesperson Avatars
The mainstay of many production businesses is the “talking head” shoot, which requires organizing the on-camera talent, gear and crew, and a location. Not every “talking head” shoot is for something wonderful. More often they are fairly basic corporate communication or education pieces, often melded with a PowerPoint style presentation. Synthesia.io takes your text input – typed or pasted in – and animates one of their many Avatars to speak that text.
Lumberjack System is using Synthesia.io to replace me in social media and help videos. It typically takes me about 5-10 minutes to set up and preview the audio and about 15 minutes for a rendered two-minute video. The base plan allows for 10 minutes of video for $30 per month.
Here’s an example of a version one Avatar. These are pretty good, but you will see minor “uncanny valley” moments if you watch closely. They are rolling out their “more natural” version 2.0 avatars at the moment, but I haven’t experienced them yet. One nice editing feature is that all sentences start and end on the same frame with the new version! If only our real world presenters would be so consistent!
AI Studios is a new competitor to Synthesia.io, while Rephrase.io uses similar technology to customize a single presentation into thousands of unique videos.
When I drafted my Becoming an Amplified Creative article in May last year, I predicted that something like these Avatars was coming in “2-3 years.” In my pre-publication revision three months later, I had to include Synthesia.io, and two months after that I was a customer! This technology is evolving rapidly and it won’t be long before these Avatars are indistinguishable from live shoots with humans.
In all cases, these are real people who have been “sampled” in 10 minutes of video before being processed into the Avatars. There are dozens of Avatars and over a hundred languages. According to Synthesia.io, all performers have given permission and were compensated.
“Computer, enhance”
Visual processing is where Machine Learning has made the most advances. Research Papers – technology developments not yet products – can regenerate faces in great detail from very low resolution or damaged inputs, fill in detail to upscale images, and much more. Topaz Labs have released tools for upscaling and de-noising that are invaluable for documentary work, where the originals often leave much to be desired in a 4K world!
Their Video Enhance AI not only upscales, but de-noises, de-interlaces, does some restoration and frame rate conversion. At US$200 it seems like magic to someone who started with ¾” quality! Adobe Camera RAW edged out Video Enhance AI for upscale quality in an A.I. Upscaling Software Shootout at Pro Video Coalition.