Categories
Artificial Intelligence Machine Learning The Technology of Production

AI/Machine Learning Basics on the Digital Production BuZZ

I don’t always cross post my appearances on Larry Jordan’s Digital Production BuZZ, but I thought I gave a particularly good explanation of the basics of AI and Machine Learning, and how they might apply in production, so I’m sharing this one.

https://www.digitalproductionbuzz.com/interview/philip-hodgetts-the-basics-of-ai-explained/#.W9iRBi2ZN1Q

Categories
Intelligent Assistance Software Lumberjack

Where we’ll be at IBC 2018

Greg and I will be at IBC 2018 and we’re looking forward to seeing you there.

If you’d like to pick our brains for up to an hour, then schedule a meeting with us. We’ll run through your workflow and offer suggestions on where there might be efficiencies, or we’re happy to demonstrate the innovative Lumberjack Builder. If you’re a Lumberjack customer, we’d love to hear how you’ve been using it and how it could be better for you. We’ll even buy you a beer!

Other than those meetings we’ll be mostly hanging around the Atomos ProRes RAW Theater as that seems to be the center of FCP X action this year.

As there’s no Supermeet this year, those of us who would normally see each other there are celebrating at the Not Very Supermeet, so come join us.

Categories
Interesting Technology Machine Learning The Technology of Production

The Advantage of Web APIs

Web APIs (Application Programming Interfaces) allow us to send data to a remote service and get a result back. Machine learning tools and Cognitive Services, like speech-to-text and image recognition, are mostly online APIs. Trained machines can be integrated into apps, but in general these services operate through an API.
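The round trip is simple in principle: upload the media, get structured text back. As a minimal sketch, assuming a hypothetical speech-to-text endpoint and API key (real services like Speechmatics differ in their auth and payload details):

```python
import json
import urllib.request

# Hypothetical endpoint for illustration only -- not a real service URL.
API_URL = "https://api.example.com/v1/transcribe"

def build_request(audio_path: str, api_key: str) -> urllib.request.Request:
    """Package an audio file as a POST request to the (hypothetical) service."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    return urllib.request.Request(
        API_URL,
        data=audio_bytes,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "audio/mp4",
        },
        method="POST",
    )

# Sending it and reading the transcript back would then look like:
# with urllib.request.urlopen(build_request("interview.m4a", KEY)) as resp:
#     transcript = json.loads(resp.read())["transcript"]
```

The point of the sketch is how little the app itself does: everything interesting happens on the service side, which is exactly why the results keep improving without the local developer shipping an update.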

The big advantage is that they keep getting better, without the local developer getting involved.

Nearly two years ago I wrote of my experience with SpeedScriber*, which was the first of the machine learning based transcription apps on the market. At the time I was impressed that I could get the results of a 16 minute interview back in less than 16 minutes, including prep and upload time. Usually the overall time was around the run time of the file.

Upload time is the downside of web based APIs and is significantly holding back image recognition on video. That is why high quality proxy files are created for audio that is to be transcribed: they reduce upload time.

My most recent example, sourced from a 36 minute WAV, took around one minute to convert to an archival quality m4a, which reduced the file size from 419 MB to 71 MB. The roughly five times faster upload – now 2’15” compared with more than 12 minutes for the original – more than compensates for the small prep time for the m4a.
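The trade-off is easy to sanity check. Using the figures from that example (sizes in megabytes, times in minutes):

```python
# Figures from the example above: a 36 minute WAV vs its m4a proxy.
wav_mb, m4a_mb = 419, 71       # file sizes in megabytes
prep_min = 1.0                 # time to convert the WAV to m4a
wav_upload_min = 12.0          # upload time for the original WAV
m4a_upload_min = 2.25          # 2'15" upload for the m4a

size_ratio = wav_mb / m4a_mb                   # ~5.9x smaller
total_m4a_min = prep_min + m4a_upload_min      # 3'15" prep + upload, all-in
minutes_saved = wav_upload_min - total_m4a_min # ~8'45" saved per file
```

Even counting the conversion time, the proxy route is well ahead, and the gap grows with longer source files.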

The result was emailed back to me in 2’30”. That’s 36 minutes of speech transcribed with about 98% accuracy, in 2.5 minutes – more than 14x real time. The entire time from instigating the upload to finished transcript was 5’45” for 36 minutes of interview.

These APIs keep getting faster and can run on much “heavier iron” than my local iMac which is no doubt part of the reason they are so fast, but that’s just another reason they’re good for developers. Plus, every time the speech-to-text algorithm gets improved, every app that calls on the API gets the improvement for free.

*I haven’t used SpeedScriber recently, but I would expect that it has similarly benefited from improvements on the service side of the API it works with.

Categories
Lumberjack Machine Learning Metadata

Speech-to-Text: Recent Example

For a book project I recorded a 46 minute interview and had it transcribed by Speechmatics.com (as part of our testing for Lumberjack Builder). The interview was about 8600 words raw.

The good news is that it was over 99.8% accurate: I corrected 15 words out of a final 8100. The interview had good audio. I’m sure an audio perfectionist would have made it better, as would recording in a perfect environment, but this was pretty typical of most interview setups. It was recorded to a Zoom H1n as a WAV file. No compression.

Naturally, my off-mic questions and commentary were not transcribed accurately, but they were never expected or intended to be. Although, to be fair, the audio was clear enough that a human transcriber would probably have got closer.

The less good news: my one female speaker was identified as about 15 different people! If I wanted a perfect transcript I probably would have cleaned up the punctuation, as it wasn’t completely clean. But the reality is that people do not speak in nice, neat sentences.

But neither the speaker identification nor the punctuation matters for the uses I’m going to make of the transcript. I recognize that accurate punctuation would be needed for closed (or open) captioning output, but for production purposes perfect reproduction of the words is enough.

Multiple speakers will be handled in Builder’s Keyword Manager and reduced to one there. SpeedScriber has a feature to eliminate the speaker ID totally, which I would have used if a perfect output was my goal. For this project I simply eliminated any speaker ID.

The punctuation would also not be an issue in Builder, where we break on periods, but you can combine and break paragraphs with simple keystrokes. It’s not a problem for the book project as it will mostly be rewritten from spoken form to a more formal written style.

Most importantly for our needs, near perfect text is the perfect input for keyword, concept and emotion extraction.

Categories
General

Putting Words in Their Mouths!

While researching further into Machine Learning to gain a better understanding of what’s possible and how it might be applied, I found a couple of audio-related articles. While mostly still in the lab, this research will guarantee the perfect Frankenbite in the future!

Categories
Adobe Apple Apple Pro Apps Interesting Technology Machine Learning Nature of Work The Business of Production The Technology of Production

Maybe 10 Years is Enough for Final Cut Pro X

On the night of the Supermeet 2011 Final Cut Pro X preview I was told that this was the “foundation for the next 10 years.” Well, as of last week, seven of the ten have elapsed. I do not, for one minute, think that Apple intended to convey a ten year limit to Final Cut Pro X’s ongoing development, but maybe it’s smart to plan obsolescence. To limit the time an app continues to be developed before its suitability for the task is re-evaluated.

Categories
Business Intelligent Assistance Software

For your own sake: “Read the Fine Help!”

I speak as both a customer of software (among other things) and a developer of niche software and in both voices I want to scream “Read the Help” many times a day.

We get many emails where someone has tried to use one of our apps and “it hasn’t worked” and they’re “really stressed”. At least 80% are solved by copying and pasting part of the Help. For sure it’s annoying for us to write the Help and then have to provide it in bite-size chunks to the customer. It takes time and that costs us money, but that’s not the reason you should read the Help.

Reading the Help will reduce your stress and get you answers faster.

Categories
Machine Learning Nature of Work

What Do We Want Machine Learning to be Used for in Post?

As someone who’s watched the development of machine learning, and who is in the business of providing tools for post production workflows that “take the boring out of post”, you’d think I’d be full of ideas about how post can be enhanced by machine learning.

I’m not.

Categories
Lumberjack

Logging Real Time? Don’t Panic, Back-time!

At Lumberjack System, we frequently get pushback that it’s “too hard” to log during the shoot. If you’re manipulating a camera (reframing, etc.) then sure, you can’t log. And if you’re holding a boom, ditto. But if you’re monitoring audio during recording, or running the interview, then it is totally possible to log during the shoot with no added stress.

I do it all the time. For my family history project I set up cameras, mics and audio recorders, and run the interview and log. Lunch with Philip and Greg is much the same, with the added complication of eating.

The best approach is to work in back-time mode and relax. Back-time eliminates the stress of anticipating when an answer starts because you’re logging “in the past”.

I typically work with the 5 second back time on by default. This means I can be fully engaged with the subject while asking the question, and continue to be engaged with them as they start their answer. I can glance down after a few seconds (fewer than five!) and tap on the keyword start.

This takes the stress and tension out of having to “get it right on the moment.”

Back-time also allows us to add a new keyword and log it from up to 90 seconds in the past. The keyword range’s end is always the current time.
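The back-time rule described above can be sketched in a few lines. This is my own illustration of the logic, not Lumberjack’s actual implementation; the constant names are assumptions:

```python
BACKTIME_DEFAULT = 5.0   # seconds; the default back-time discussed above
BACKTIME_MAX = 90.0      # a new keyword can reach up to 90s into the past

def keyword_range(tap_time: float, backtime: float = BACKTIME_DEFAULT):
    """Return the (start, end) of a keyword logged at tap_time.

    Back-time shifts the start into the past, clamped to the start of
    the recording; the end is always the current (tap) time.
    """
    backtime = min(backtime, BACKTIME_MAX)
    start = max(0.0, tap_time - backtime)
    return start, tap_time
```

The design point is that the logger never has to anticipate a moment: tapping a few seconds late still captures the start of the answer.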

Categories
Career Item of Interest

The Terence and Philip Show Episode 82: What Do You Get Paid For?

When I wrote yesterday’s blog post on Aging Out, I had completely forgotten this episode of The Terence and Philip Show we recorded back in February. Turns out it couldn’t be more relevant.

In this show we discuss the important role of professional skills and experience, and the difference between having the tools, knowing how to use them, and creating with those tools.

With so much changing, new careers will need to be invented.