As big a fan of metadata as I am, I do acknowledge that the most useful metadata is also the most expensive to obtain. Technical metadata from the camera comes at almost no cost: set the camera correctly and metadata on frame size, codec, frame rate, Timecode, Time of Shoot, etc. comes effectively free.
Not so content metadata: where the shot was taken, who is in the shot, what people are saying, what people are talking about (which requires understanding), or where people are. That metadata takes time (and mostly human involvement) to add, making it quite expensive.
Back in 2008, when we released First Cuts for FCP, we knew the power of metadata to kick-start the editing process for non-scripted production. First Cuts didn’t reach its potential because the expense of the metadata offset some of the benefits.
That’s why I am so interested in the potential for Machine Learning to reduce the cost of acquiring Content Metadata. Once we can derive the metadata affordably, we can use that to kick-start the creative process and avoid the paralysis of an empty timeline!
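To make the idea concrete, here is a deliberately simplified sketch of one kind of content metadata extraction: pulling rough topic keywords out of a transcript. Everything here is a toy assumption for illustration; in practice the transcript would come from Machine Learning speech-to-text, and real topic extraction would use far more sophisticated language models than a stopword-filtered word count.

```python
from collections import Counter
import re

# Toy stopword list for illustration only; real systems use
# proper NLP tooling rather than a hand-rolled set.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "we", "at", "that", "this", "on", "for", "with", "was"}

def topic_keywords(transcript: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stopword terms as rough topic tags."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical transcript, standing in for speech-to-text output.
transcript = (
    "We shot the interview at the harbour. The harbour light was "
    "perfect, and the interview covered the history of the harbour."
)
print(topic_keywords(transcript))  # → ['harbour', 'interview', 'shot']
```

Even this crude approach shows the economics: once the transcript exists, "what people are talking about" metadata costs nothing extra to compute, instead of requiring a person to watch and log every clip.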