Categories
Metadata, Random Thought

What is the fifth type of metadata?

Right now I’m in the middle of updating and adding to my digital photo library by scanning in old photos, negatives and (eventually) slides. Of course, the photos aren’t in albums (too heavy to ship from Australia to the US) and there are no extensive notes on any of them, because “I’ll always remember these people and places!” Except I don’t remember a lot of the people, and putting particular events in order is tricky when they’re more than “a few” years old, or when they were before my time (a lot have been scanned in for my mother’s blog/journal).

Last time I wrote about the different types of metadata, we had identified four:

  • Source Metadata is stored in the file from the outset by the camera or capture software, such as in EXIF format. It is usually immutable.
  • Added Metadata is beyond the scope of the camera or capture software and has to come from a human. This is generally what we think about when we add log notes – people, place, etc.
  • Derived Metadata is calculated using a non-human external information source and includes location from GPS, facial recognition, or automatic transcription.
  • Inferred Metadata is metadata that can be assumed from other metadata without an external information source. It may be used to help obtain Added metadata.

See the original post for a clearer distinction between the four types of metadata. Last night I realized there is at least one additional form of metadata, which I’ll call Analytical Metadata. The other choice was Visually Obvious Invisible Metadata, but I thought that was confusing!

Analytical metadata is information encoded in the picture, about the picture, probably mostly related to people, places and context. The most obvious example is a series of photos without any event information. By analyzing who was wearing what clothes and correlating between shots, the images related to an event can be grouped together even without an overall group shot. Or perhaps only one shot clearly identifies the location, but it can be cross-correlated to the other pictures in the group by clothing.

Similarly a painting, picture, decoration or architectural element that appears in more than one shot can be used to identify the location for all the shots at that event. I’ve even used hair styles as a general time-period indicator, but that’s not a very fine-grained tool!  Heck, even the presence or absence of someone in a picture can identify a time period: that partner is in the picture so it must be between 1982 and 1987.

I also discovered two more sources of metadata. One source of Source Metadata is found on negatives, which are numbered, giving a clear indication of time sequence. (Of course digital cameras have this and more.) The other important source of metadata for this exercise has been a form of Added Metadata: notes on the back of the image! Fortunately, for long periods of time Kodak Australia printed the month and year of processing on the back. Rest assured that has been most helpful for trying to put my lifetime of photos into some sort of order. At the rate I’m going, it will take me the last third of my life to organize the images from the first two thirds.

Another discovery: facial recognition in iPhoto ’09 is nowhere near as good as it seems in the demonstration. Not surprising, because most facial recognition technology is still in its infancy. I also think it prefers the sharpness of digital images to scans of prints, but even with a digital source it seems to attempt a guess at about one in five faces, and to be accurate about 30% of the time. It will get better, and it’s worth naming the identified faces and adding ones that were missed to gain the ability to sort by person. It’s also worthwhile going through and deleting the false positives – faces recognized in the dots of newspapers or the patterns in wallpaper, etc. – so they don’t show up when it’s attempting to match faces.

Added June 2: Apparently we won’t be getting this type of metadata from computers any time soon!

Categories
General, Interesting Technology, Video Technology

What we learnt from the editor/software face-off at NAB

Let’s start by saying we’re working with a very specific type of video production: trade-show style video where there is an A-roll interview and limited b-roll that goes specifically with the A-roll. These are generally shot on a trade-show booth with shots of product from the booth.

Finisher was originally conceived as the book-end to First Cuts. First Cuts saves documentarians many weeks of work by creating first cuts almost instantly while you explore the stories in the footage you have. These cuts are complete with story arc and b-roll. We worked on the assumption that an editor would probably delete the b-roll while they worked on cutting the a-roll into the finished form. (Although not necessarily: I cut one piece while keeping the b-roll around to save me having to go find it again.)

Finisher was suggested by Loren Miller of Keyguides fame, who wanted an “editing valet” that would take his a-roll and add the b-roll and lower thirds back in. That suggestion became Finisher.

However, I’ve long been interested in applying it to these trade-show-type edits, which had never been near First Cuts and had to use much simplified metadata. My gut told me that an experienced editor would be faster, but that the cost effectiveness of a novice with Finisher would be compelling.

I was wrong. As it turned out, I ended up being the editing contender. I was happy about that because I trust my abilities – I’m fast and effective at this type of video. Up against me was the software’s co-developer, Greg Clarke. Greg’s first FCP lessons (other than import XML, export XML, open a Sequence) were on Sunday afternoon ahead of a Tuesday afternoon shootout. To say his editing skills and FCP skills were rudimentary is a huge understatement!

Greg had his edit complete in 27 minutes from being presented with raw footage. (Both competitors saw the footage together in raw form in a new project.) This video shows the Greg + Finisher cut. It’s acceptable but could definitely use an experienced eye.

My cut took 40 minutes to add in the lower third and all the b-roll. There is a third cut, where I took the Greg + Finisher cut and applied my editorial experience to it; that took an additional 11 minutes, for a total of 38 minutes. Yep, almost exactly the same time to get to a finished result.

Until you work on the cost side of the equation. Let’s assume that an experienced editor is going to work for $45 an hour for this type of work. (That’s approximately the Editor’s Guild rate for an assistant on a low budget documentary.) Let’s also assume that we’re paying Interns $15 an hour.

Rounding to the nearest quarter hour for easy math, my cut was $33.75 to the producer; the basic Finisher cut would be $7.50; and the Finisher-plus-novice cut with craft editor tidy-up (however you’d phrase that elegantly) would add another $7.50 of craft editor on top of the cost of the Intern cut.

Under half price.
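
For the record, here’s that arithmetic as a quick sketch, using the assumed rates above and quarter-hour billing (the tidy-up pass uses the $7.50 figure quoted above):

```python
# Quick sketch of the cost comparison: quarter-hour billing at the assumed
# rates of $45/hour (craft editor) and $15/hour (intern).

EDITOR_RATE = 45.0
INTERN_RATE = 15.0

def billable(minutes, rate):
    """Round to the nearest quarter hour and bill at the hourly rate."""
    quarter_hours = round(minutes / 15)
    return quarter_hours * 0.25 * rate

editor_cut   = billable(40, EDITOR_RATE)   # 0.75 h x $45 = $33.75
finisher_cut = billable(27, INTERN_RATE)   # 0.50 h x $15 = $7.50
tidy_up      = 7.50                        # the extra craft-editor pass, as above

print(editor_cut, finisher_cut, finisher_cut + tidy_up)   # 33.75 7.5 15.0
```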

Scaling production

Here’s where it gets exciting (for me anyway – I am easily excited). The Digital Cinema Society and Studio Daily produced some forty videos during NAB 2009, with the talented Chris Knell editing. Let’s assume that Chris got paid the hourly rate he should have and worked 10-hour days (with breaks) to get forty videos done within the week. By rights he should have been paid on the order of $1800 for that time.

One craft editor can tidy and clean four videos an hour (five based on my numbers, but let’s say four). Each video will take an Intern about 30 minutes to prepare for the craft editor, so we need two Interns to feed the skilled craft editor four videos an hour (two Interns each producing two cuts with Finisher per hour). Now 10 videos can be produced in 2.5 hours instead of 10 hours (getting them to the audience faster).

Faster and cheaper: cost per day is 2.5 x 45 = $112.50, plus 2 x 2.5 x 15 = $75, for a daily total of $187.50. Over the four days the total cost to the producer is $750, not $1800, and the editor also gets to enjoy NAB – show or hospitality. The massive reduction in time also means that one crew could shoot and edit without damaging their personal health.
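
Run as a quick sketch, the scaling numbers look like this (same assumed rates as before):

```python
# Sketch of the daily-cost arithmetic: one craft editor tidying four Finisher
# cuts an hour, fed by two interns each preparing two cuts an hour.

EDITOR_RATE = 45.0
INTERN_RATE = 15.0

videos_per_day = 10
editor_hours   = videos_per_day / 4       # 2.5 hours at four videos per hour
intern_hours   = 2 * editor_hours         # two interns for the same 2.5 hours

cost_per_day = editor_hours * EDITOR_RATE + intern_hours * INTERN_RATE
print(cost_per_day)       # 112.50 + 75.00 = 187.50
print(cost_per_day * 4)   # 750.0 over four show days, versus roughly $1800
```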

So, what I learnt at the Face-off is that Finisher is a tool I can use as an editor (more on that shortly); it helps scale large-volume production to get results out faster; and it can substantially reduce the cost of mass-producing these types of video. It was not only Studio Daily producing forty videos: FreshDV, Mac Video and MacBreak were also producing video and could have achieved similar savings.

Analysis

Both approaches required logging the material. During the Face-off we both trimmed or subclipped our b-roll to individual shots. (Here’s a tip we both used: drop the b-roll clip or clips in a Sequence and add edits, deleting bad sections of b-roll as you go, then convert to independent clips and name them something appropriate. Finisher will use the names as metadata.)

We also trimmed up our A-roll, adding Markers as we went. For Finisher, the Markers were added as Sequence Markers and given a duration that the novice wanted to cover with b-roll. I was placing Markers in the A-roll clip itself – so they would move when I cut the clip – which let me locate where b-roll shots would go based on topic.

What I learnt was that, if I adopted the convention from Finisher and added comments to my Markers that matched clip names, I could automate the process of laying clips into the Timeline – 2 minutes for the Finisher round trip vs 10 or so to do it manually. It’s basically an automation tool.
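
As a rough illustration of the automation involved, here is a sketch that reads an exported Final Cut Pro XML file and lists which named b-roll clip each Marker comment is asking for. The element names (marker, comment, in) are my reading of the FCP XML interchange format, the clip names are invented, and the matching logic is purely illustrative rather than how Finisher actually works.

```python
# Illustrative sketch only: read Markers from an exported FCP XML (xmeml) file
# and report which b-roll clip name each Marker comment matches. Element names
# follow my reading of the FCP XML interchange format; the real Finisher round
# trip is more involved than this.
import xml.etree.ElementTree as ET

def marker_to_broll(xml_path, broll_names):
    tree = ET.parse(xml_path)
    matches = []
    for marker in tree.iter("marker"):
        comment = (marker.findtext("comment") or "").strip()
        start = marker.findtext("in")
        for clip_name in broll_names:
            if comment and comment.lower() == clip_name.lower():
                matches.append((start, clip_name))
    return matches

# Hypothetical clip names created while subclipping the b-roll
broll = ["Booth wide shot", "Product close-up", "Crowd reaction"]
for frame, clip in marker_to_broll("sequence_export.xml", broll):
    print(f"frame {frame}: lay in '{clip}'")
```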

Plus, as an editor, I’d be closer to being finished because I’d place my Markers a heck of a lot better than a novice did.

But it’s really the scaling and cost reduction for mass production that came as a surprise – a pleasant one.

Categories
Interesting Technology, The Technology of Production

What were the technical trends at NAB 2009?

There certainly wasn’t much new in NLEs at NAB. Avid had already made their announcements, Apple are keeping to their own schedule that apparently doesn’t include NAB (although Apple folk were in town) and Adobe have a 4.1 update coming for Premiere Pro CS4. The only new NLE version was Sony’s Vegas, which moves up to version 9. With, of course, RED support. Can’t forget the RED support – it was everywhere (again).

Lenses for RED, native support, non-native support: everyone has something for RED, or Scarlet/Epic coming up. Lenses are already appearing for those not-yet-shipping cameras.

Even camera technology seemed to take a year off. I certainly became convinced of the format superiority (leaving aside lenses and convenience factors) of AVCCAM, which is a pro version of the consumer AVCHD format with higher bit rates. The evidence supports the hypothesis that AVCCAM at 21 or 24 Mbits/sec should produce a much higher quality image than MPEG-2 at the same bitrate. Before this NAB I was only convinced “in theory”. Of course, choose the AVCCAM path and you’ll be transcoding on import to FCP or Avid to much larger ProRes or DNxHD files, which is an optional (and recommended) path for HDV or XDCAM HD/EX.

Everyone has a 3D story to tell. Panasonic promise 3D-all-the-way workflows “coming” and there were all sorts of tools on the floor for working with 3D, projecting 3D, viewing 3D…  As one of my friends quipped “The presentations were amazing. What’s more I took off my glasses and the 3D experience continued around me!”

I confess to being a little torn on 3D (and Twitter, but that’s another post). I’ve seen some really amazing footage, and some that simply tries too hard to be 3D. I also worry how we’ll adapt to sudden jumps in perspective as the 3D camera cuts to a different shot. I noticed a little of this when viewing an excerpt from the U2 3D concert film. There are natural analogs to cutting in 2D – in effect we build our view of the world from many quick closeups, so cutting in film and TV parallels that.

I can’t think of an analog for the sudden jumps in position in 3D space and perspective that would help our brains adapt. Maybe we’ll just adapt and I’m jumping at shadows? Who knows. I don’t plan on 3D soon.

Nor do I expect to see Flash supported on a TV in my home for at least a couple of years. That’s the problem Adobe faces in getting support for Flash on TVs and set-top boxes. For a start it will require a lot more horsepower than those boxes already have, but Moore’s law will take care of that without a blink. A bigger problem is the slow turn-over cycle of televisions. Say it’s 6 months before the first sets come out (and none are yet announced); it’s probably ten years before any particular provider can rely on the majority of sets being Flash enabled. Assuming it catches on.

So I rather see that as a non-announcement. Remember, the cable industry already has its own Tru2way technology for interactivity on set-top boxes.

I am much more interested in Adobe’s new Strobe framework, even though it could take some business away from my own OpenTVNetwork.

For the geeky, my favorite new piece of technology at the show would have to be Blackmagic Design’s Ultrascope – an HD scope package where you just add a PC and monitor to the $695 hardware and software bundle for a true HD scope at an affordable price.

I’ve already given my opinions on the Blackmagic Design announcements, AJA announcements and Panasonic announcements during the show.

Two more trends this year: storage keeps getting cheaper and better, and voice and facial recognition technologies are becoming more widespread.

I am amazed at the way hardware-RAID protected systems have fallen in cost. Not only the drives themselves but the enclosures are getting to the point where it’s no longer cost-effective to build your own, certainly not if you want RAID 5 or 6.

Five years ago the only company demonstrating facial and speech recognition was Virage, whom I didn’t see this year. But there are an increasing number of companies with speech recognition that seems to be, overall, about the same quality as that bundled with Adobe’s Premiere Pro and Soundbooth CS4, i.e. it can get reasonably high accuracy with well-paced, clean audio and no accent. Good enough for locating footage.

Facial recognition seems to be everywhere, from Google’s Picasa to news transcription services. Not only do they detect cuts, they also recognize the people in the shots, prompting when a new face is detected.

How long before the metadata that powers First Cuts no longer has to be input by a person? That’s what really excited me about NAB 2009.

Categories
Interesting Technology

What do Blackmagic Design’s NAB announcements mean to you?

For the details of the announcements, see my news report at the Digital Production BuZZ.

Among a blizzard of NAB announcements, Blackmagic Design’s Ultrascope is another of Grant Petty’s breakthrough products. Grant’s goal has always been to bring down the price of truly professional tools without sacrificing quality.

Until now, HD monitoring has not kept pace with the drop in prices for other parts of the HD production workflow. The Ultrascope runs on commodity PC hardware (i.e. cheap) and a 24″ display to bring six SD or HD Waveform Monitors into a single display, for a total investment of around $2000. The bundle includes a DeckLink card and the Ultrascope software for $695: bring your own PC and monitor.

Like the VideoHub router, Ultrascope breaks through the price/performance barrier. All we can wish for now are future software updates that add Vectorscope and other scopes to the display. (All things in time I guess.)

The optical fiber support in HDLink and a new DeckLink card positions Blackmagic Design well for the “big iron plant” business. Optical fiber is a little out of my league, but it is becoming increasingly important in those large facilities, and previously the signal needed to be converted to HD-SDI before capture. The new card takes that conversion out of the picture for direct capture to anything offered.

While I didn’t mention it in the main press release, I was interested to notice that there is now Linux support for Blackmagic cards and their Media Player software. Linux is not widespread in the post industry except in the large facilities that would also be likely targets for VideoHub and optical fiber support.

Seems to me that Blackmagic Design are providing more and more for the higher end facility while maintaining low cost products for the wider production community. And that’s a good thing.

Categories
Interesting Technology

How will AJA’s NAB announcements affect you?

For the details on the releases see my story at the Digital Production BuZZ, AJA’s NAB Announcements.

The Ki Pro is the most exciting announcement I’ve heard at NAB so far this year and is likely to garner a number of awards before the week is out. It’s a direct shot at Panasonic, who are constantly touting AVC-Intra as “pristine 10 bit full raster capture”: that quality is now available to any camcorder, regardless of format, recorded direct to ProRes 422. It’s even possible to shoot with an SD camera and have the Ki Pro scale to HD before converting to ProRes. At $3995 it’s comparable to similar recorders from Panasonic for AVC-I and AVCCAM.

It’s a smart device – recording either to removable hard drive modules that come complete with FW800, or to Flash RAM modules in the ExpressCard 34 form factor that will go directly into any modern Mac laptop. I’m told there’s also an ‘exoskeleton’ that mounts the Ki Pro under the camera, between the camera and its mount, so it doesn’t need to hang off the camera.

This is a great product for those who mostly want to shoot, say, XDCAM EX/HD but require higher quality at times; or for those with older cameras who want to move forward to a ProRes workflow. Unlike the JVC GY-HM700 or GY-HM100 “Final Cut Pro ready” camcorders, the Ki Pro is full raster ProRes master quality while the JVC records in XDCAM HD within a QuickTime movie.

Definitely the Ki Pro is an amazing product, if only they could get the price down a little.

The Io Express appears to be a direct challenge to Matrox’s MXO 2, at a slightly lower price point. The key difference is that the Io Express, like the Io HD, converts to ProRes 422 in hardware before sending it to the computer. The MXO 2 pushes uncompressed video through the ExpressCard34 slot (or PCIe slot on a desktop) where it can optionally be converted to ProRes on the CPU. (Of course Matrox have new products as well, the MXO 2 mini at $449, which I’ll cover shortly.)

With fewer inputs than the Io HD (although not that many fewer, mostly reduced audio input support) the Io Express at US $995 is pretty darned cool.

Finally, the Kona LHi and Xena LHi (essentially the same card, with minor differences due to platform support) seem to be everything the Kona 3 was, with added support for HDMI in and out. At only US$1495 they’re cheaper than the Kona LH/LHe, with more capability than the Kona 3, which was twice the price. Plus the new cards have analog input support that was missing from the Kona 3.

A great set of new tools for us all to play with. Now, let’s see what everyone else has been up to!

Categories
Interesting Technology

What did Panasonic reveal at their NAB 2009 Press Conference?

With Panasonic executives lining the wall, National Marketing Manager for Services Jim Wickizer reminded the crowd of Panasonic’s role in the last 10 Olympics and revealed that Vancouver 2010 will be shot exclusively with Panasonic P2 HD – the official recording format for the Vancouver 2010 Winter Olympic Games. Interestingly, he noted the format would be 1080i60, which is not my first choice for fast action sports.

John Baisely, President of Panasonic Broadcast, waxed lyrical about MPEG-4 AVC (H.264) compression, commenting that it’s used in a “full range” of cameras, conflating the all-I-frame AVC-Intra used with P2 cards and the AVCCAM range of camcorders featuring long GOP H.264 MPEG-4.

In probably the most exciting announcement, Panasonic revealed the P2 E series of cards. The E series is faster at ‘up to’ 1.2 Gbit/sec but, more importantly, it is a more economical series, with 64 GB coming in under $1000 ($998), 32 GB at $625 and 16 GB at just $420. Unlike the original P2 media, the E series has a limited life of five years. The 16 and 32 GB cards will be available in May, with the 64 GB coming in August. This significantly changes the cost dynamics of P2 media, making it much more affordable to a wider range of people.

For the first time that I noticed, Panasonic have stopped using 720p24 as their benchmark for record time on P2 media, instead stating that a 64 GB P2 E card will hold one hour of 1920 x 1080 (full raster) 10 bit 4:2:2 intra-frame recording. With five-slot cameras that’s a lot of continuous recording time at the highest HD resolution.

Director of Product Marketing Joe Facchini took the stage to reintroduce the HPX-300 – originally released just a few months ago – with 3MOS chips (3MOS is Panasonic’s way of saying three CMOS chips). With 10 bit AVC-Intra 4:2:2 recording, 20 variable frame rates and Dynamic Range Stretch it is a very nice camera. What was new is that there is going to be a customized studio configuration, for under $10,000.

Joe also addressed the rolling shutter issue that affects some CMOS implementations, like that in the HPX-300 (and most CMOS camcorders for that matter). He announced that a future firmware update for the HPX-300 will have “Flashband Compensation” to accommodate flashes that take less than a full frame, by borrowing information from an adjacent frame.

New to the P2 range are:

AG-HPG20 P2 Portable – a 10 bit, 4:2:2 general-purpose portable player/recorder weighing just 2.5 lbs (about 1 kg). The HPG20 has HD-SDI in and out for easy integration into existing workflows.

AJ-PCD35 – a five-slot P2 card reader that connects to the computer via PCIe for high-speed transfer.

AJ-HRW10 – a P2 ‘rapid writer’ that offloads up to five P2 cards at a time to two 3.5” hard drive RAIDs simultaneously. It includes the PCD35 and connects via Gbit Ethernet to the rest of your facility.

The only new P2 Varicam is the AJ-HPX3700, which outputs 4:4:4 RGB dual link signals live from the camera and records HD in camera to 4:2:2. It is positioned as a premium production Varicam.

Robert Harris, VP of Marketing and Product Development, took to the stage to talk about the success of the AVCCAM format – based on the consumer AVCHD format but with higher bitrate options for improved quality. It’s pitched as being “for those who can’t afford P2 independent frame products”, like schools, event videographers, churches, etc.

AVCCAM records to ubiquitous SD media at data rates comparable to HDV. Like HDV, AVCCAM is long GOP, although AVCCAM is H.264 rather than MPEG-2. H.264 is also known as AVC, the Advanced Video Codec for MPEG-4. AVCCAM is gaining NLE support and theoretically provides significantly higher quality at any given data rate: H.264 is generally considered to be 2-4 times more efficient than MPEG-2 (HDV and XDCAM HD/EX).

So, while both HDV and AVCCAM produce Long GOP material, all else being equal, the AVCCAM footage will be significantly higher quality than that from HDV. All else being equal!

Panasonic announced a new camera to join the existing two products in the AVCCAM line, the AG-HMC70 and HMC-150. The new camera, the AG-HMC40, is a compact handheld camcorder (prosumer form factor) that weighs in at around 2.2 lbs (about 1 kg), with 1/4” 3MOS chips, 12x optical zoom, Dynamic Range Stretch and Cine-like gamma. The HMC-40 records full raster 1080 at 60i, 30P and 24P, as well as 720p60 and SD. Well equipped with outputs, the camera features HDMI, USB 2, and composite and component out. An optional XLR input adapter has manual level control. The HMC-40 will be available in August, will carry an MSRP of $3195 and records to standard SD cards.

Also in the announcements from Robert:

HMR-10 – a compact, portable, battery-powered recorder/player with a 3.5” screen, HDMI and HD-SDI output, HD-SDI input, USB port, audio input and remote start/stop. At the highest bitrate it offers 3 hrs of full raster recording, or 12 hrs at 1440 x 1080 and a lower bitrate. (1440 x 1080 matches HDV and XDCAM HD/EX at below 50 Mbits/sec.)

Billed as “HD quality”, the AG-HCK10 is a compact camera head with 1/4” 3MOS imagers. It teams with the HMR-10, with iris, focus, zoom and remote control coming from the deck over cables up to 10 meters each.

Both the deck and the compact camera head will be available in August, with the HMR-10 coming in at $2650 and the camera head similarly priced.

That completed the new product announcements, but Robert Harris returned to the stage to commit Panasonic to supporting 3D throughout the entire camera-to-home workflow. He noted that the recent Monsters vs Aliens release had 28% of the screens showing a 3D version, but those screens took in 56% of the total revenue! No wonder the industry is heading for 3D. The slide showed a single camera that had two lenses on the body – most unusual looking, as they appeared to merge into the body.

No timescale was revealed for the Panasonic push to 3D but they are previewing technologies, particularly display technologies, on the NAB 2009 booth in the Central Hall.

Categories
Video Technology

How to find a needle in a haystack

Over the weekend I got a call from a client who was having trouble capturing the P2-based media he’d shot at HD Expo last Thursday. Now, direct digital ingest of DVCPRO HD off P2 media (or copies) through Log and Transfer into Final Cut Pro has been one of the simplest and most straightforward workflows since Apple introduced it with FCP 4.5. Basically “it just works”, except it didn’t.

My client followed Shane Ross’ article on importing P2 media to FCP 6 over at creativecow.net and all was good up until the point where the media should have shown up in Log and Transfer: nothing happened.

P2 workflow isn’t my strongest suit, so I referred my guy to Shane. Independently the client contacted his associate at Panasonic about the problem. Both concluded that FCP needed to be re-installed, which I dutifully did, uninstalling FCP 6 first, then re-installing.

There was no change! Troubleshooting is a logical process, something that seems to elude most people. As Sherlock Holmes said:

“Eliminate all other factors, and the one which remains must be the truth.”

We had a P2 card reader, so to determine whether it was the media or the copies, we tried direct-from-card import; disk images of the cards; and folder copies of the cards. (Both disk images and folder copies had carefully maintained the file structure from the P2 cards, something that is crucially important.)

To try and eliminate FCP from the equation we tested with the demo versions of P2 Log from Imagine Products and Raylight for Mac from DV Film. Both applications crashed upon any attempt to convert and neither would show a preview. That wasn’t looking good for the media’s integrity; however, the client had played back the media after shooting, so there was good reason to believe it was fine.

Finally, the MXF4QT demo from Hamburg Pro Audio showed us that the media was fine, so what was the problem? 

Now come the two most important tools for troubleshooting: Google.com and CreativeCow.net. While Creative COW has a fairly good search engine, I generally prefer to search via Google so that other sites are included. However, this time a search for “Can’t import P2 media Final Cut Pro” turned up a single thread that suggested there was a conflict between Noise Industries’ “FX Factory” product and the DVCPRO HD codec.

Search terms are important. I usually start with the important words of the problem and application or platform. Too few words and you’ll never find the solution in the results; too many and there will be no match. I like to think about how the answer might be structured and search for words I expect to find in the answer.

What I didn’t know until later was that the FX Factory application has an uninstall option, which would have been much cleaner than searching and deleting applications or components that don’t show up in a Spotlight search but do show up in a Find in the Finder. (Apparently Spotlight won’t show results from the Library or System folders “to protect you from yourself”!)

Once FX Factory was completely uninstalled, the P2 media appeared in the Log and Transfer window as expected, and presumably would also work in the P2 Log and Raylight demos, which appear to draw on the Apple DVCPRO HD codec. MXF4QT doesn’t call that codec so it was able to show the media.

I didn’t check versions of FX Factory and there could well be an update that resolves this problem. My client was more interested in getting to work editing at that point.

Categories
Interesting Technology, Metadata, Video Technology

What are the different types of metadata we can use in production and post production?

I’ve been thinking a lot about metadata – data about the video and audio assets – particularly since we use metadata extensively in our Intelligent Assistance software products and for media items for sale in Open TV Network. And the new “Faces” and “Places” features in iPhoto ’09 show just how useful metadata can be.

Back in the days when tape-based acquisition ruled there wasn’t much metadata available. If you were lucky there would be an identifying note on or with the tape. For linear editing that was all that was available at the source – the tape. The only other source metadata would be frame rate, frame size and tape format, and perhaps some user bits with the Timecode. With a linear system that was all you could use anyway.

With non-linear editing we moved media into the digital domain and added additional metadata: reel names, clip names, descriptions, etc., and with digital acquisition formats we’re getting more source metadata from the cameras.

But there are more types of metadata than just what the camera provides and what an editor or assistant enters. In fact we think there are four types of metadata: Source, Added, Derived and Inferred. But before I expand on that, let me diverge a little to talk about “Explicit” and “Implicit” metadata.

These terms have had reasonable currency on the Internet and there’s a good post on the subject at Udi’s Spot “Implicit kicks explicit’s *ss.” In this usage, explicit metadata is what people provide explicitly (like pushing a story to the top of Digg) while implicit metadata is based on the tracks that we inadvertently leave.

Actions that create explicit metadata include:

  • Rating a video on YouTube.
  • Rating a song in your music player.
  • Digging a website on Digg.

Actions that create implicit metadata include:

  • Watching a video on YouTube.
  • Buying a product on Amazon.
  • Skipping past a song in your music player as soon as it gets annoying.

We didn’t think those terms were all that useful for production and post production, so instead we settled on the four types noted above.

Source

Source Metadata is stored in the file from the outset by the camera or capture software, such as in EXIF format. It is usually immutable. Examples (there’s a quick code sketch after the list):

  • timecode and timebase
  • date
  • reel number
  • codec
  • file name
  • duration
  • GPS data
  • focal length, aperture, exposure
  • white balance setting
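
For still images at least, a lot of this Source metadata is easy to get at programmatically. A minimal sketch, assuming the Pillow imaging library is installed and using an invented file name (video formats keep the equivalent information in their container headers rather than EXIF):

```python
# Minimal sketch: read the Source metadata a camera wrote into a JPEG's EXIF
# block. Assumes the Pillow library; the file name is just an example.
from PIL import Image
from PIL.ExifTags import TAGS

def read_source_metadata(path):
    exif = Image.open(path).getexif()
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

for tag, value in read_source_metadata("IMG_0001.jpg").items():
    print(tag, value)    # e.g. DateTime, FNumber, FocalLength, WhiteBalance
```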

Added

Added Metadata is beyond the scope of the camera or capture software and has to come from a human. It can be added by a person on-set (e.g. Adobe OnLocation) or during the logging process. Examples:

  • keywords
  • comments
  • event name
  • person’s name
  • mark good
  • label
  • auxiliary timecode
  • transcription of speech (not done by software)

Derived

Derived Metadata is calculated using a non-human external information source. Examples:

  • speech recognition software can produce a transcription
  • a language algorithm can derive keywords from a transcription
  • locations can be derived from GPS data using mapping data (e.g. Eiffel Tower, Paris, France) or even identifying whether somewhere is in a city or the country
  • recalculation of duration when video and audio have different timebases
  • OCR of text within a shot.

Derived metadata is in its infancy but I expect to see a lot more over the next few years.
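
To make the second example concrete, here is a deliberately trivial sketch of deriving keyword metadata from a transcription by word frequency. Real language algorithms are far more sophisticated; the stop-word list and sample transcript are invented for illustration.

```python
# Toy sketch of Derived metadata: pull candidate keywords out of a transcription
# by word frequency, ignoring common stop words. Real language tools do far more.
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "we", "on", "from"}

def derive_keywords(transcription, count=5):
    words = re.findall(r"[a-z']+", transcription.lower())
    candidates = [w for w in words if w not in STOP_WORDS and len(w) > 3]
    return [word for word, _ in Counter(candidates).most_common(count)]

transcript = ("We shot the interview on the trade show floor and the booth "
              "demo covers the workflow from camera to edit to the web.")
print(derive_keywords(transcript))   # e.g. ['shot', 'interview', 'trade', 'show', 'floor']
```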

Inferred

Inferred Metadata is metadata that can be assumed from other metadata without an external information source. It may be used to help obtain Added metadata. Examples: 

  • time of day and GPS data can group files that were shot at the same location during a similar time period (if this event is given a name, it is Added metadata)
  • if time of day timecode for a series of shots is within a period over different locations, and there is a big gap until the next time of day timecode, it can be assumed that those shots were made together at a series of related events (and if they are named, this becomes Added metadata)
  • facial recognition software recognizes a person in 3 different shots (Inferred), but it needs to be told the person’s name and if its guesses are correct (Added) 

We already use inferred metadata in some of our software products. I think we will be using more in the future.
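
The second example above is easy to sketch: sort shots by time-of-day timecode and start a new group whenever the gap to the previous shot exceeds some threshold. The clip names, times and two-hour threshold below are invented for illustration.

```python
# Sketch of Inferred metadata: group shots into probable events by looking for
# large gaps in time-of-day timecode. Clip names, times and threshold are examples.
from datetime import datetime, timedelta

shots = {
    "Clip_001": "2009-04-18 09:12",
    "Clip_002": "2009-04-18 09:30",
    "Clip_003": "2009-04-18 10:05",
    "Clip_004": "2009-04-18 15:40",   # big gap, so probably a new event
    "Clip_005": "2009-04-18 15:55",
}

def group_by_time_gap(shots, max_gap=timedelta(hours=2)):
    ordered = sorted(shots, key=lambda name: shots[name])
    groups, current = [], [ordered[0]]
    for prev, name in zip(ordered, ordered[1:]):
        gap = (datetime.strptime(shots[name], "%Y-%m-%d %H:%M")
               - datetime.strptime(shots[prev], "%Y-%m-%d %H:%M"))
        if gap > max_gap:
            groups.append(current)
            current = []
        current.append(name)
    groups.append(current)
    return groups

print(group_by_time_gap(shots))
# [['Clip_001', 'Clip_002', 'Clip_003'], ['Clip_004', 'Clip_005']]
```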

So that’s what we see as the different types of metadata that are useful for production and post production.

Categories
Distribution, Interesting Technology

How will video be distributed on the Internet?

At his Blog Maverick, Mark Cuban reveals the “Great Internet Lie”: apparently the utopian dream of distributing “Television and Film” over the Internet is doomed. He goes on to “conclusively” prove that it’s not possible to reach Network-sized audiences (even cable-network-sized) simultaneously across the Internet.

And he’s right. Certainly I’ve got nothing to refute his numbers. Real time streaming is difficult and expensive. And totally unnecessary. Broadcast Television and its cable/satellite brethren deliver simultaneous viewing to many millions of people at a time, without incremental cost. The bigger the audience, the more profitable the show, because there’s no incremental cost and bigger audiences mean more viewers for advertisements.

The trouble is, that’s trying to resolve a new problem with an old solution. (When you only have a hammer, every problem looks like a nail.) The problem that will need to be resolved in the future is “how do we deliver a broad multiplicity of program choices tailored to individual tastes and smaller per-show audience numbers?” 

Programming has to be available when people want to watch it, not at some “appointed” time. It’s been more than five years since I watched real-time television. PVRs and digital downloads replace broadcast schedules. Even for the big-pipe broadcasters, appointment television is dying. Broadcast will eventually become the home of sports (where real-time delivery to big audiences is highly desirable), American Idol and reality TV. These are the only shows that will garner large-enough audiences to meet the mass-audience requirement of a broadcast station or network.

So, how do we deliver that multiplicity of program choices – from traditional and new media suppliers – to meet customer demands? None other than Vint Cerf, one of the “inventors” of the Internet, feels that the future of video on the Internet is downloading instead of streaming in real time like a broadcaster.

Already the majority of the video on the Internet is downloaded – progressive download (a.k.a. fast start) drives YouTube, Google Video and pretty much every how-to and travel video website on the Internet. It’s only the traditional media folk who think emulating their old business is the way to build a new business, so Hulu and ABC.com stream. (Hint: it’s never been the case and isn’t now.)

Progressive download requires much simpler technological configurations. It’s far less sensitive to the variability of speeds on the public Internet and shared neighborhood nodes, and it meets the needs of the future, instead of the past.

Advocates of real time streaming attempt to draw a distinction between a viewing and “ownership” of a copy. Real time streaming never leaves a copy on the viewer’s hard drive; download does. But realistically, viewing = (potential) ownership has been the de facto reality since the introduction of the Betamax in 1975. Ever since then, every broadcast has had the potential for people to keep a personal copy of the show. Digital only accelerates that, with built-in PVRs and DVRs in cable and satellite boxes.

One trend that defines the current state of mass-market television is increased choice and control over viewing schedule, moving the viewer away from the broadcast or cable programmer to effectively programming their own entertainment channel. This is the trend that will increase in the future and that’s why Mark Cuban is completely right and totally missing the point.

Addition: It seems like I’m not the only one questioning Mark’s assumptions. The Forrester Blog for Consumer Product Strategy Professionals posted “Mark Cuban Goes off on the Internet Video Lie” pointing out that Broadcast and Internet delivery were complementary media. They also have links to some of their other writings on the subject.

Categories
Distribution, Video Technology

What’s the difference between a codec and a container or wrapper?

It’s a subject of widespread confusion, often leading to only a partial understanding.

There are file containers, sometimes called wrappers, that wrap around a number of video and audio tracks. Each of those tracks will have an appropriate video or audio codec. “Codec” is a concatenation of “coder” and “decoder”. Basically it’s like using a secret code or cryptography: as long as the encoder and the decoder understand each other, we get video and audio back out at the other end.

Think of a shipping container. There’s this standard “wrapper” (the container) which tells us nothing. Inside could be a car, computer or a million wrist watches. Like the shipping container, file containers can carry many different types of content – the video and audio tracks. These tracks are encoded with some sort of codec. Most codecs compress the video to reduce file size and time to download (and to increase field recording times in production), but there are codecs that work with uncompressed video. Every track has to have a codec for video and for audio.
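
If you want to see this structure for yourself, the ffprobe tool from the free FFmpeg project reports a file’s container format and the codec of each track. A quick sketch, assuming ffprobe is installed and using an invented file name:

```python
# Sketch: ask ffprobe (part of FFmpeg, assumed installed) for a file's
# container format and the codec used by each of its tracks (streams).
import json
import subprocess

def inspect(path):
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True)
    info = json.loads(result.stdout)
    print("Container:", info["format"]["format_name"])
    for stream in info["streams"]:
        print(" ", stream["codec_type"], "track ->", stream["codec_name"])

inspect("example.mov")   # e.g. Container: mov,mp4,m4a,3gp,3g2,mj2
                         #      video track -> prores
                         #      audio track -> pcm_s16le
```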

Common containers are QuickTime (which supports over 160 codecs at last count), AVI (which probably supports almost that many) and MPEG-4, which supports only a few codecs, but very versatile ones. Common codecs are “MPEG-4”, “Sorenson”, “H.264”, “Animation”, “Cinepak”, etc. (DivX is its own thing, as I’ll explain.)

Most QuickTime codecs are for production purposes. The older QT codecs that were used for .mov on the web have been “deprecated” by Apple. They no longer show up as export options in the default install of QuickTime. Nor should they. They’re way too inefficient by modern standards. The last new QT distribution codec was Sorenson Video 3 in July 2001. In codec terms that’s just a little after the Jurassic era.

AVI has been a workhorse. I refer to it as the Zombie format because Microsoft officially killed it in 1996 (when the last development was done). It is still in use in production on PCs and very popular for distribution on the Internet, with more modern codecs. Most AVI production codecs are specific to their hardware parent. A modern .avi file is likely to be a “DivX” file.

DivX is actually a hybrid of an AVI wrapper with an MPEG-4 Advanced Simple Profile (see later) video codec and an MP3 audio track. This is such a bad hybrid of codecs and formats that for a while DivX had to have its own player. (MPEG-4 video should go with AAC audio.)

Most often the MPEG-4 codecs are used in the MPEG-4 container. This is a modern, standards-based container not owned by any one company; it is an official International Organization for Standardization (ISO) standard. The basic file format was donated by Apple and is heavily based on the QuickTime container, but is NOT the same. You can’t just change .mov to .mp4 (or the reverse) and hope it’ll work. (It will in the QT player but nowhere else.)
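
Instead of renaming, the tracks have to be rewrapped into the other container (and re-encoded if a codec isn’t legal there). A minimal sketch using the ffmpeg command-line tool (assumed installed; file names are examples), copying existing tracks into an MPEG-4 container without re-encoding, which only works when they already use MP4-legal codecs such as H.264 video and AAC audio:

```python
# Sketch: rewrap a QuickTime movie's tracks into an MPEG-4 container without
# re-encoding, using ffmpeg (assumed installed). The tracks must already use
# MP4-legal codecs (e.g. H.264 video and AAC audio) or they would need to be
# re-encoded instead of copied.
import subprocess

subprocess.run(["ffmpeg", "-i", "example.mov", "-c", "copy", "example.mp4"],
               check=True)
```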

The first MPEG-4 video codec that the Moving Picture Experts Group (a.k.a. MPEG) approved is properly called MPEG-4 Part 2, in ‘Simple Profile’ or ‘Advanced Simple Profile’ flavors. This was such a great marketing name that Apple just called it “MPEG-4”, thereby creating huge confusion for everyone as the distinction between codec and container was totally blurred! Thanks Apple! Not! Apple only supported Simple Profile; Sorenson and DivX used Advanced Simple Profile, and there were components for QuickTime (not made by Apple) that played Advanced Simple Profile MPEG-4 as well as Simple Profile MPEG-4.

DivX uses the Advanced Simple Profile but in an AVI wrapper, as noted above.

Then, just a few years ago, MPEG approved a new codec to be used in the same MPEG-4 wrapper, called (in full) MPEG-4 Part 10, the Advanced Video Codec (AVC). The ITU also standardized the same codec independently of MPEG-4 (so it could be used in other wrappers) as H.264. Whatever the name – MPEG-4 Part 10, Advanced Video Codec or H.264 – it’s all the same codec.

And yes it is possible to put an AVC/H.264 video track in a QT .mov, but that’s a different container and only QT will play it. MPEG-4 is an ISO standard and there are more than 20 player implementations.

It is AVC/H.264 video with AAC audio (the MPEG audio standard) in an MPEG-4 container that is now playable in QuickTime Player, iTunes, on Apple devices, in 20 standard players and in Flash 9 release 3 or later (9r3 was finalized in Nov 2007 and is now widely installed). Microsoft have also announced that H.264 MPEG-4 support is coming to Silverlight in 2009, and Windows Media Player has support built in for those same files.

For what it’s worth, the 3GPP and 3GPP2 cell phone formats are also part of the MPEG-4 family.

Hope that helps and makes sense.