Ingest Monitoring

The Ingest Monitoring offers an overview of your organisation's items in MediaHaven. Each item in MediaHaven has a list of identifiers and a particular status, and is linked to a series of events. First we describe the concepts of status, events and identifiers. Then we describe how these concepts are used in the design of the ingest into MediaHaven and how they are represented in the three views of the MediaHaven Ingest Monitoring.

ArchiveStatus

Value

Remark

on_ingest_tape

The item has been detected by the tape ingest server but has not yet been transferred to the ingest server.

in_progress

The item has been detected on the ingest server and is being processed.

failed

The item was not processed successfully. The events for this item include one or more events with outcome nok, and these contain the reason for the failure.

on_disk

The item is successfully archived on two or more disks. For files where the ultimate location is tape, this status will not appear.

on_tape

The item is successfully archived on N tapes. Depending on the installation, N can be 1 (archive), 2 (archive and backup) or 3 (archive, backup and vault).

completed

Only used for files of the type:

  • Ensemble (Collection, Set, Newspaper): all files in the ensemble have ArchiveStatus on_disk, on_tape or completed.

  • Metadata Only: the ArchiveStatus is set to completed immediately upon creation.
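
The status values and the ensemble rule above can be written down as a small sketch. This is an illustration only, not MediaHaven code: the class name `ArchiveStatus` follows the heading, while `ARCHIVED` and `ensemble_is_completed` are hypothetical names, and the handling of an empty ensemble is an assumption.

```python
from enum import Enum


class ArchiveStatus(Enum):
    """ArchiveStatus values as listed in the table above."""
    ON_INGEST_TAPE = "on_ingest_tape"
    IN_PROGRESS = "in_progress"
    FAILED = "failed"
    ON_DISK = "on_disk"
    ON_TAPE = "on_tape"
    COMPLETED = "completed"


# Statuses that every file of an ensemble must have before the
# ensemble itself is considered completed.
ARCHIVED = {ArchiveStatus.ON_DISK, ArchiveStatus.ON_TAPE, ArchiveStatus.COMPLETED}


def ensemble_is_completed(member_statuses):
    """Return True when all files in the ensemble are on_disk, on_tape or completed."""
    statuses = list(member_statuses)
    # Assumption: an ensemble without any files is not reported as completed.
    return bool(statuses) and all(status in ARCHIVED for status in statuses)
```

With this sketch, an ensemble whose files are `on_disk` and `on_tape` counts as completed, while one that still contains an `in_progress` file does not.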

Events

MediaHaven logs the complete history of an item as a list of successive PREMIS events. The outcome of an event is either ok (good) or nok (bad).

Here we offer an overview of event types.

Event

Description

CREATE

The item has been created in MediaHaven. From this point on the three identifiers (see below) are defined.

SIP_DETECTED

The item has been detected either by the tape ingest server (tapeserver) or by the standard ingest service (pretranscoder).

TAPE_ZIP

(tape ingest only) The item has been successfully packed into a ZIP archive.

UPDATE

The metadata of the item has been updated in the database.

publish

The file has been manually published by the user or has been automatically published.

pretranscoded

Event generated by the standard ingest service (pretranscoder), typically to log validation errors (e.g. MD5 checksum, archive validation) with outcome nok.

transcode

The service transcoder has started the transcoding and storage allocation of the item.

transcoded

The service transcoder has finished the transcoding and storage allocation of the item. For installations where the ultimate location is not tape, the item will have the status on_disk from this point onward.

archived_on_tape_ARCHIVE

(For installations that write to 1 or more tapes) The item has been successfully written to an archive tape. The comment of the event contains the barcode of the tape. If the item has been successfully written to all required tapes, the item will have the status on_tape after the batch of tape writes is complete.

archived_on_tape_BACKUP

(For installations that write to 2 or more tapes) The item has been successfully written to a backup tape. The comment of the event contains the barcode of the tape. If the item has been successfully written to all required tapes, the item will have the status on_tape after the batch of tape writes is complete.

archived_on_tape_VAULT

(For installations that write to 3 tapes) The item has been successfully written to a vault tape. The comment of the event contains the barcode of the tape. If the item has been successfully written to all required tapes, the item will have the status on_tape after the batch of tape writes is complete.

export

An export of the item has been requested by a user. The event can have outcome nok if permission was denied.

exported

The export of the item has been completed with outcome ok or nok.

Examples of the sequence of events

  • An item submitted using the standard online ingest and stored on disk: SIP_DETECTED CREATE UPDATE transcode transcoded → on_disk

  • An item submitted using the standard online ingest with an MD5 checksum failure: SIP_DETECTED CREATE pretranscoded → failed

  • An item submitted using the standard online ingest and stored on an archive and backup tape: SIP_DETECTED CREATE UPDATE transcode transcoded archived_on_tape_ARCHIVE archived_on_tape_BACKUP → on_tape

  • An item submitted using the standard online ingest which could not be written to tape due to an internal error: SIP_DETECTED CREATE UPDATE transcode transcoded archived_on_tape_ARCHIVE archived_on_tape_BACKUP → failed

  • An item read from an ingest tape and stored on an archive and backup tape: SIP_DETECTED CREATE TAPE_ZIP SIP_DETECTED UPDATE transcode transcoded archived_on_tape_ARCHIVE archived_on_tape_BACKUP → on_tape

  • An item read from an ingest tape but with an MD5 checksum failure on the ingest tape: SIP_DETECTED CREATE TAPE_ZIP → failed
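
Because the reason for a failure is carried by the events with outcome nok, a client that has the event list of an item can extract those reasons with a simple filter. The sketch below assumes a hypothetical `Event` record with `type`, `outcome` and `comment` fields; these names and the comment text are illustrative and do not describe the MediaHaven data model or API.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record; field names are assumptions."""
    type: str          # e.g. "SIP_DETECTED", "CREATE", "pretranscoded"
    outcome: str       # "ok" or "nok"
    comment: str = ""  # free text, e.g. an error message


def failure_reasons(events):
    """Return (event type, comment) for every event with outcome nok, in order."""
    return [(event.type, event.comment) for event in events if event.outcome == "nok"]


# The MD5 failure example from the list above; the comment text is made up.
history = [
    Event("SIP_DETECTED", "ok"),
    Event("CREATE", "ok"),
    Event("pretranscoded", "nok", "MD5 checksum mismatch"),
]
print(failure_reasons(history))
```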

Identification

To uniquely identify items in the system, MediaHaven uses three identifiers:

  1. Umid (aka MediaObjectId): This is a 64-character hex string uniquely representing the item in MediaHaven.

  2. FragmentId: This is a 96-character hex string representing a fragment of an item (e.g. a clip from a video, a page from a document). An item always has at least one fragment, termed the main fragment.

  3. ExternalId: (Advanced installations only) This field contains the identifier of the item provided by the customer. The ingest flow is tightly coupled with this field: it will reject or accept an item based on whether the ExternalId already exists in the system. The Design section below goes into detail about this.
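
The lengths given above make it easy to sanity-check identifiers before using them, for example when copying them from the monitoring views into a script. The following is a minimal sketch; the function names are hypothetical, and accepting both upper- and lowercase hex digits is an assumption.

```python
import re

# 64 hex characters for a Umid, 96 for a FragmentId (lengths from the list above).
# Assumption: both uppercase and lowercase hex digits are accepted.
UMID_RE = re.compile(r"[0-9a-fA-F]{64}")
FRAGMENT_ID_RE = re.compile(r"[0-9a-fA-F]{96}")


def looks_like_umid(value: str) -> bool:
    """True if value is exactly 64 hex characters."""
    return UMID_RE.fullmatch(value) is not None


def looks_like_fragment_id(value: str) -> bool:
    """True if value is exactly 96 hex characters."""
    return FRAGMENT_ID_RE.fullmatch(value) is not None
```

These checks only validate the shape of an identifier; they say nothing about whether the item exists in MediaHaven.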

Views

The Ingest Monitoring offers three views:

  1. SIPs: Offers an overview of items coupled with an ExternalId, showing only the most recent submission of each ExternalId.

  2. Files: Offers a complete history of items submitted to MediaHaven, including failed or deleted submissions.

  3. Events: Offers the complete history of events in MediaHaven. When restricted to events of a single item, it shows the complete history of events that occurred to that item.
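
The difference between the first two views can be illustrated with a short sketch: the Files view lists every submission, while the SIPs view keeps only the most recent submission per ExternalId. The `Submission` record and its field names are hypothetical and serve only to show the selection logic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    """Hypothetical record of one submitted item; field names are assumptions."""
    external_id: str
    fragment_id: str
    status: str
    deleted: bool
    submitted_at: datetime


def files_view(submissions):
    """All submissions, including failed or deleted ones, oldest first."""
    return sorted(submissions, key=lambda s: s.submitted_at)


def sips_view(submissions):
    """Only the most recent submission of each ExternalId."""
    latest = {}
    for sub in submissions:
        if not sub.external_id:
            continue  # items without an ExternalId are not shown in this view
        current = latest.get(sub.external_id)
        if current is None or sub.submitted_at > current.submitted_at:
            latest[sub.external_id] = sub
    return list(latest.values())
```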

Design

The ingest views are designed around the following philosophy:

In the SIPs view you see the state of the most recent submission of each item with an ExternalId. In the Files view you see the complete history of the submissions of an item into MediaHaven, including failed submissions.

When you present an item with an ExternalId to the system, the outcome depends on whether that ExternalId already exists. If it already exists with status in_progress, on_disk or on_tape and the old item is not deleted, the new item is rejected and an event SIP_ALREADY_PROCESSING or SIP_ALREADY_ARCHIVED is generated on the existing item. If the ExternalId already exists with status failed, the old item is deleted automatically and a new item is created. If the ExternalId already existed but the old item was deleted, the new item is also accepted.
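
The acceptance rules above can be summarised as a small decision function. This is a sketch of the described behaviour, not the actual ingest implementation. The names `Existing`, `Decision` and `decide` are hypothetical, and the mapping of in_progress to SIP_ALREADY_PROCESSING and of on_disk or on_tape to SIP_ALREADY_ARCHIVED is an assumption based on the event names. Statuses the text does not cover raise an error rather than guess.

```python
from typing import NamedTuple, Optional


class Existing(NamedTuple):
    """The item that already carries the ExternalId (hypothetical record)."""
    status: str
    deleted: bool


class Decision(NamedTuple):
    accept: bool
    event_on_existing: Optional[str] = None  # event logged on the existing item
    delete_existing: bool = False


def decide(existing: Optional[Existing]) -> Decision:
    """Decide whether a new submission with a given ExternalId is accepted."""
    # No item with this ExternalId yet, or the old item was deleted: accept.
    if existing is None or existing.deleted:
        return Decision(accept=True)
    # A failed item is deleted automatically and replaced by the new one.
    if existing.status == "failed":
        return Decision(accept=True, delete_existing=True)
    # Assumed mapping of status to rejection event (see lead-in).
    if existing.status == "in_progress":
        return Decision(accept=False, event_on_existing="SIP_ALREADY_PROCESSING")
    if existing.status in ("on_disk", "on_tape"):
        return Decision(accept=False, event_on_existing="SIP_ALREADY_ARCHIVED")
    raise ValueError(f"behaviour for status {existing.status!r} is not described")
```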

Let us explain this with an example using the ExternalId ara_510_2242_00.

SIPs

The screenshot below shows, in the SIPs view, the item with ExternalId ara_510_2242_00 and status on_tape, which was successfully archived on Oct 7, 2016.

Files

The screenshot below shows, in the Files view, the complete history of submissions of the item ara_510_2242_00. The most recent submission has status on_tape, while the previous submissions have status failed and have been deleted (shown by a shaded or striped colour). In the example the item ara_510_2242_00 has been submitted 8 times. Notice that each submission has its own FragmentId while they all share the same ExternalId. By design of the ingest, there can be only one non-deleted item for the same ExternalId.

Events (success)

In the Events view of the successful on_tape submission we see the following events. Notice that the item has been successfully written to the archive tape A00001L7 and the backup tape B00001L7. Afterwards it was exported to an export tape E00000L6.

Events (failure)

In the Events view of a failed submission we see, for example, the following events.

Notice the event pretranscoded with outcome nok. The comment contains the following error:

METS: referred file does not exist: OriginalData/510_2242_000_00049_000/510_2242_000_00049_000_0_0001.tif

The submitted item was an archive, and the METS file linked with it described a file which was not found inside the archive.
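
An error of this kind can be caught before submission by checking the archive locally. The sketch below is not the validation performed by the pretranscoder. It assumes the archive is a ZIP file, that the METS file uses the standard METS and XLink namespaces, and that the `xlink:href` values of the `FLocat` elements are paths relative to the archive root. The function name `missing_mets_files` is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def missing_mets_files(archive, mets_name):
    """Return the paths referenced by the METS file that are absent from the archive.

    archive: path or file-like object of a ZIP archive (assumption: ZIP format).
    mets_name: name of the METS file inside the archive.
    """
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read(mets_name))

    missing = []
    # Each mets:FLocat element points to one file via its xlink:href attribute.
    for flocat in root.iter(f"{{{METS_NS}}}FLocat"):
        href = flocat.get(f"{{{XLINK_NS}}}href")
        if href and href not in names:
            missing.append(href)
    return missing
```

An empty result means every file referenced by the METS file was found in the archive under the stated assumptions.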