Complex Objects 2.0 aka SIP

Introduction

We introduce an extension to the existing SIP model from Complex Objects 1.0. The file structure remains unchanged: a ZIP file with files organised in folders, containing a top-level METS file that describes the files in the ZIP. The structure of the METS has been changed to conform to the METS standard and to be expressive enough to describe record trees (https://mediahaven.atlassian.net/wiki/spaces/CS/pages/4064641076).

MediaHaven can already export a complete record tree together with a METS XML file.

Record Trees

Complex Objects 2.0 can now describe any of the record trees (https://mediahaven.atlassian.net/wiki/spaces/CS/pages/4064641076) one can model in MediaHaven.

Example 1

  • Newspaper

    • Newspaper page 1

      • TIFF image

        • Original representation

      • JP2 image

        • Original representation

    • Newspaper page 2

      • TIFF image

        • Original representation

      • JP2 image

        • Original representation

    • …

    • ALTO XML for the entire newspaper

Example 2

  • Dossier

    • Document 1 (e-mail)

      • Original representation (e-mail)

      • Email attachment 1

        • Original representation

      • Email attachment 2

        • Original representation

    • Document 2

      • Original representation

    • …
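
To make the nesting in Example 2 concrete, the tree can be sketched as a small recursive data structure. This is only an illustration: the `Node` class and the `render` helper are hypothetical names and are not part of MediaHaven or of the METS format. The elided entries ("…") are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry of a record tree (hypothetical helper, not a MediaHaven type)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Example 2 from above, without the elided entries.
dossier = Node("Dossier", [
    Node("Document 1 (e-mail)", [
        Node("Original representation (e-mail)"),
        Node("Email attachment 1", [Node("Original representation")]),
        Node("Email attachment 2", [Node("Original representation")]),
    ]),
    Node("Document 2", [Node("Original representation")]),
])

print("\n".join(render(dossier)))
```

Every node may carry both representations and further child records, which is what allows an e-mail to hold its own representation next to its attachments.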

METS

METS

SIP

The structure of the SIP is unchanged from the old Complex Objects 1.0.

Requirements

Unchanged from Complex Objects Reference

The Complex Ingest workflow imposes the following requirements, which go beyond well-formed XML and validation against the provided XSDs.

  1. Every file in the archive must be referenced by the METS. If files are not referenced or if a referenced file is missing, the entire archive is rejected.

  2. The MD5 checksums provided in the XML are compared against the MD5 checksums calculated from the files. The entire archive is rejected if any check fails.

  3. The file paths used in the METS are paths relative to the root of the accompanying archive.
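
The three requirements above can be checked locally before uploading an archive. The following is a minimal sketch, not the actual Complex Ingest implementation. It assumes that the top-level METS file is named `mets.xml` (the name is an assumption, adjust `METS_NAME` as needed), that each `mets:file` element carries its MD5 in the standard `CHECKSUM` attribute, and that its `mets:FLocat` child holds a plain relative path in `xlink:href`. The function name `check_sip` is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
import zipfile

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
METS_NAME = "mets.xml"  # assumption: name of the top-level METS file


def md5_of(archive, name):
    """Compute the MD5 hex digest of one archive member, reading in chunks."""
    digest = hashlib.md5()
    with archive.open(name) as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sip(zip_path):
    """Return a list of problems; an empty list means all three checks passed."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        # Files present in the archive, excluding directories and the METS itself.
        present = {n for n in archive.namelist() if not n.endswith("/")}
        present.discard(METS_NAME)

        root = ET.fromstring(archive.read(METS_NAME))
        referenced = set()
        for file_el in root.iter(f"{{{METS_NS}}}file"):
            flocat = file_el.find(f"{{{METS_NS}}}FLocat")
            href = None if flocat is None else flocat.get(f"{{{XLINK_NS}}}href")
            if not href:
                problems.append(f"file {file_el.get('ID')} has no location")
                continue
            # Requirement 3: paths are relative to the archive root.
            if href.startswith("/") or ".." in href.split("/"):
                problems.append(f"not a path relative to the root: {href}")
                continue
            referenced.add(href)
            # Requirement 1 (first half): referenced files must exist.
            if href not in present:
                problems.append(f"referenced file is missing: {href}")
                continue
            # Requirement 2: the declared MD5 must match the calculated one.
            declared = (file_el.get("CHECKSUM") or "").lower()
            if declared and declared != md5_of(archive, href):
                problems.append(f"MD5 mismatch: {href}")

        # Requirement 1 (second half): every file must be referenced.
        for name in sorted(present - referenced):
            problems.append(f"file is not referenced by the METS: {name}")
    return problems
```

Calling `check_sip("my_sip.zip")` returns an empty list for an archive that passes the three checks. The workflow rejects the entire archive on the first failure, whereas this sketch collects all problems so that they can be fixed in one pass.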