Deep Learning for Metadata Extraction

The Media Institute has worked extensively in Automated Metadata Extraction, with a portfolio of past and current projects innovating in the areas of:

  • Compact Video Signatures for search and unique identification
  • Semantic tagging of moving and still images
  • Object and feature identification

The media industry is eagerly anticipating a wealth of new technologies to help automate media operations, and the research community is equally keen to deliver genuinely valuable new capabilities. Among the most rapidly evolving research arenas, the field of “Deep Learning” is already affecting our everyday lives, from “Google Translate” to Netflix viewing recommendations. This type of AI (Artificial Intelligence) allows computers to learn in much the same way as humans: by being exposed to a range of experience and information, and naturally deriving meaning from it. But, just like young humans, deep neural networks (DNNs for short) need help gathering information and validating that their interpretations are correct. Once a reliable set of data, the “ground truth”, is available in sufficient quantity and quality, computers can learn!
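To make that learning loop concrete, the sketch below shows supervised training in miniature: a network is repeatedly shown examples, its interpretations are compared against human-validated labels, and it adjusts itself from its errors. This is purely illustrative; the data, labels, network shape and use of PyTorch are assumptions, not a description of any particular Media Institute system.

```python
# Minimal, illustrative sketch of supervised learning from labelled "ground truth".
import torch
import torch.nn as nn

# Hypothetical ground truth: 100 feature vectors, each with a human-assigned label 0-9.
features = torch.randn(100, 32)
labels = torch.randint(0, 10, (100,))

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    optimiser.zero_grad()
    predictions = model(features)        # the network's current interpretation
    loss = loss_fn(predictions, labels)  # compare against the human-validated labels
    loss.backward()                      # derive corrections from the errors
    optimiser.step()                     # adjust the network accordingly
```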

One well-known source of ground truth is ImageNet, an open research project comprising some 14 million images tagged and validated by humans against tens of thousands of semantic noun categories. For example, 856 of those nouns relate to birds and types of bird, and these are used to tag 812,000 images of birds. As a result of this work, researchers around the world can use this ground truth to train their own DNN systems to recognize and tag content.
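As a hedged illustration of how that ground truth gets reused, the sketch below loads a publicly available ResNet-50 network pre-trained on ImageNet (via torchvision) and uses it to tag a single still image. The model choice and the file path "photo.jpg" are placeholders for illustration, not systems or assets mentioned in this article.

```python
# Tag a still image with a network pre-trained on ImageNet ground truth.
import torch
from torchvision import models
from torchvision.models import ResNet50_Weights
from PIL import Image

weights = ResNet50_Weights.DEFAULT          # standard torchvision ImageNet weights
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()           # matching resize/crop/normalise pipeline
image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # placeholder image path

with torch.no_grad():
    scores = model(image).softmax(dim=1)

# Print the five most likely ImageNet noun categories for this image.
top5 = scores.topk(5)
for score, idx in zip(top5.values[0], top5.indices[0]):
    print(f"{weights.meta['categories'][idx.item()]}: {score.item():.2f}")
```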

But within the media community more is needed: automatically tagging video with ‘cats’ and ‘beaches’ is a great start, but verbs (walking, fighting) and other semantic data are essential too. What is the shot type, the shot narrative, the quality of the source material, the copyright status? And once this metadata is generated, individuals and organizations need it in an easy-to-read, structured and system-interchangeable form for operational use.
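As one possible illustration of that structured, interchangeable form, generated tags might be packaged into a record like the JSON sketch below. The field names, values and schema here are hypothetical and shown only to make the idea concrete; they do not represent an industry standard or a Media Institute format.

```python
# Illustrative packaging of generated tags as structured, exchangeable metadata.
import json
from datetime import datetime, timezone

shot_metadata = {
    "asset_id": "example-clip-0001",                      # placeholder identifier
    "timecode": {"in": "00:01:12:05", "out": "00:01:18:20"},
    "nouns": ["cat", "beach"],                             # objects detected in the shot
    "verbs": ["walking"],                                  # actions detected in the shot
    "shot_type": "wide",
    "source_quality": "HD 1080i",
    "rights": {"copyright": "Example Broadcaster", "cleared": True},
    "generated_at": datetime.now(timezone.utc).isoformat(),
}

# Serialise for exchange between systems, e.g. handing off to an asset-management ingest API.
print(json.dumps(shot_metadata, indent=2))
```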