
Metadata Concepts


A file in MediaHaven consists of a list of metadata fields. Each metadata field has a key, and the keys are unique within a file's list of fields. The standard MediaHaven fields are documented in Metadata Fields.

Field Definitions

A field definition describes the properties of a field.

Property

Description

Key

  • The name of the field

  • Keys are case sensitive

  • The keys of the top fields are unique (case insensitive), so two top fields cannot have keys that differ only in case

Sub

Metadata fields can be nested inside other metadata fields (see complex field). Such nested fields are called sub fields or child fields and have a parent field. 

Top

The inverse of the property Sub: a top field is not nested inside another field. Each file consists of a list of top fields whose keys are unique.

Flattened Key

A sub field cannot be uniquely identified by its own key, because the keys of sub fields are not unique across all fields. Concatenating the key of the parent field(s) with the key of the sub field yields a unique flattened key. For example, the field Keywords with the sub field Keyword yields the flattened key KeywordsKeyword.

The flattened key is accepted by the MediaHaven REST API 2.0, but the dotted key is the preferred format.

Dotted Key

The same principle as for flattened keys, except that the parts are joined with a dot (.) and the first part is the family.

  • The descriptive field Keyword has the dotted key Descriptive.Keywords.Keyword

The dotted key is the reference format for the MediaHaven REST API 2.0.
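As a minimal sketch of the two key formats described above, the helpers below build a flattened key and a dotted key from a path of field keys. The function names and the family name ExampleFamily are hypothetical placeholders chosen for illustration; they are not part of the MediaHaven API.

```python
def flattened_key(path):
    """Concatenate the parent key(s) and the sub field key without a separator."""
    return "".join(path)


def dotted_key(family, path):
    """Join the family and every key in the path with dots, family first."""
    return ".".join([family, *path])


path = ["Keywords", "Keyword"]  # parent field key, then sub field key
print(flattened_key(path))
# "ExampleFamily" is a placeholder, not a real MediaHaven family name.
print(dotted_key("ExampleFamily", path))
```

Because the flattened form has no separator, it cannot be split back into its parts without knowing the field definitions, which is one reason a separated form is easier to work with.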

Lucene Key

  • The name of the field in Lucene. For legacy reasons it can differ significantly from the key of the field.

  • As of 18.1, the REST API accepts the field key instead of the Lucene key in queries. For sub fields you must use the flattened key. This makes it possible to bypass the legacy Lucene keys.

  • As of 19.1, the Lucene key always corresponds to the flattened key

The Lucene key is no longer accepted by the MediaHaven REST API 2.0.

Required 

If true, the field is present with a non-empty value on every record.

Read Only

The value of this field cannot be changed using metadata updates.

Index

  • If true, make the field searchable in the index

  • Note: a field can be present (stored) in the index without being indexed (searchable).

Global

The standard search queries a hidden virtual field whose value is a join of all fields marked as global.
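One way to picture the hidden virtual field is as a concatenation of the values of all global fields. In the sketch below, the function name, the single-space separator, and the sample record are assumptions made for illustration; the text above does not specify how the values are joined.

```python
def global_search_text(record, global_keys):
    """Join the values of all fields marked as global into one searchable string."""
    # Assumption: values are joined with a single space.
    return " ".join(str(record[key]) for key in global_keys if key in record)


# Hypothetical record; only Title and Description are treated as global here.
record = {"Title": "Harbour at dawn", "Description": "Alice Bob Cedric", "Width": 1920}
text = global_search_text(record, ["Title", "Description"])

print(text)
print("Bob" in text)   # the Description value is part of the virtual field
print("1920" in text)  # Width is not global, so its value is absent
```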

Advanced Search

If true, the field is available in the advanced search. Requires the field to be indexed.

Tokenize

Informally speaking, tokenizing means splitting the string value of the field into tokens on white space. Requires the field to be indexed.

For example the field Description with value "Alice Bob Cedric" is split into the ordered tokens "Alice", "Bob" and "Cedric".

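The file can then be matched in the index:

  • using these tokens separately, in any order: +Description:Cedric +Description:Alice

  • using a phrase of adjacent tokens in their original order: +Description:"Bob Cedric"

  • note: +Description:"Alice Cedric" will not match, because Alice and Cedric are not adjacent tokens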
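The matching behaviour above can be sketched with a few lines of plain Python. This is a simplified model under stated assumptions, not the Lucene implementation: it only splits on white space and ignores anything else a real analyzer might do, such as lowercasing. The function names are hypothetical.

```python
def tokenize(value):
    """Split a string value into ordered tokens on white space."""
    return value.split()


def matches_terms(tokens, terms):
    """True if every required term occurs as a token, in any position."""
    return all(term in tokens for term in terms)


def matches_phrase(tokens, phrase):
    """True if the words of the phrase occur as adjacent tokens in that order."""
    words = phrase.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


tokens = tokenize("Alice Bob Cedric")
print(tokens)
print(matches_terms(tokens, ["Cedric", "Alice"]))  # separate tokens, any order
print(matches_phrase(tokens, "Bob Cedric"))        # adjacent, in order
print(matches_phrase(tokens, "Alice Cedric"))      # Bob sits in between
```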

Family

Describes the family to which the field belongs. See the section Metadata Families below for more information.

Metadata Families


Metadata Field

A metadata field is either:

  • Simple: it has a scalar value, for example a string, long or boolean

  • Complex: its value consists of a list of other fields, termed sub fields
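The distinction between simple and complex fields can be sketched as a small recursive data structure. The Python classes below only borrow the names SimpleField and ComplexField for illustration; their attributes, the helper leaf_values, and the sample keyword values are assumptions, not the MediaHaven implementation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleField:
    key: str
    value: Union[str, int, bool]  # a scalar value


@dataclass
class ComplexField:
    key: str
    # The value of a complex field is a list of sub fields.
    sub_fields: List[Union["SimpleField", "ComplexField"]] = field(default_factory=list)


def leaf_values(f):
    """Collect the scalar values of a field and all of its nested sub fields."""
    if isinstance(f, SimpleField):
        return [f.value]
    values = []
    for sub in f.sub_fields:
        values.extend(leaf_values(sub))
    return values


# Hypothetical keyword values for the Keywords field with its Keyword sub fields.
keywords = ComplexField("Keywords", [SimpleField("Keyword", "archive"),
                                     SimpleField("Keyword", "news")])
print(leaf_values(keywords))
```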

The types SimpleField and ComplexField are each further specialised into additional types.

| Base Class | Sub Class #1 | Sub Class #2 | Comment | Examples |
|---|---|---|---|---|
| SimpleField | | | Contains a string value of fewer than 32K characters. | Title |
| SimpleField | BooleanField | | Contains a boolean value. In the MediaHaven 1.0 web site and the old External Metadata [deprecated], this is shown as either "0" or "1". | IsFragment, ContainsGeoData |
| SimpleField | EnumField | | Contains a string value. The value is limited to a specific set of values defined on the field definition. | |
| SimpleField | TimeCodeField | | Contains a timecode in ISO timecode format, e.g. "00:25:12.840". | StartTimeCode |
| SimpleField | LongField | | Contains a long value, e.g. "5845988". | Width, Height, FileSize |
| SimpleField | LongField | FramesField | Contains a long value left-padded with zeroes to 10 digits, e.g. "0000000025". The rationale is to make it lexicographically sortable in Lucene, because regular long values are not. | FragmentStartFrames |
| SimpleField | DateField | | Contains a full ISO 8601 date with microsecond precision in Zulu time. The machine specification of the format is yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'. | 1925-11-30T15:23:48.123456Z |
| SimpleField | TextField | | Contains a string value of any length; also used when the value is tokenized. | Description |
| ComplexField | | | Contains as value a list of metadata fields. | Browses |
| ComplexField | MultiItemField | | Contains as value a list of simple fields. The string values of sub fields are trimmed and empty values are ignored. The sub fields have no order. Sub fields can share the same key, but their values must then differ (case insensitive). | Authors |
| ComplexField | MultiItemField | ListField | A specialized MultiItemField where all sub fields must have the same (predefined) key. This allows for representing the value as a list. | Keywords, Categories, Publications |
| ComplexField | MultiItemField | MapField | A specialized MultiItemField where all sub fields have a unique key. This allows for representing the value as a (hash) map. | Browses/Browse |
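The value rules for FramesField, DateField and MultiItemField described above can be sketched as follows. The function names are hypothetical, the sub field key Author is an assumed example, and dropping a duplicate value (rather than rejecting the update) is an assumption, since the text only states that values must differ.

```python
from datetime import datetime, timezone


def frames_value(frames):
    """Left-pad a frame count with zeroes to 10 digits so strings sort like numbers."""
    return f"{frames:010d}"


def date_value(moment):
    """Format an aware datetime as yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z' in Zulu time."""
    utc = moment.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat(timespec="microseconds") + "Z"


def normalize_multi_item(sub_fields):
    """Trim values, skip empty ones, and skip case-insensitive duplicates per key."""
    seen = set()
    result = []
    for key, value in sub_fields:
        value = value.strip()
        if not value:
            continue
        marker = (key, value.casefold())
        if marker in seen:
            continue  # Assumption: a duplicate is dropped rather than rejected.
        seen.add(marker)
        result.append((key, value))
    return result


# Unpadded numbers sort incorrectly as strings; padded ones sort numerically.
print(sorted(["9", "25", "100"]))
print(sorted(frames_value(n) for n in (9, 25, 100)))

print(date_value(datetime(1925, 11, 30, 15, 23, 48, 123456, tzinfo=timezone.utc)))

# "Author" is an assumed sub field key used only for this example.
print(normalize_multi_item([("Author", " Alice "), ("Author", "alice"),
                            ("Author", ""), ("Author", "Bob")]))
```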


