SIPs
Introduction
Complex Objects 2.0 extends the existing SIP model from Complex Objects 1.0. The file structure remains unchanged: a ZIP file with files organised in folders and a top-level METS file describing the files in the ZIP. The structure of the METS has been changed so that it complies with the METS standard and is expressive enough to describe Record Trees.
MediaHaven already supports exporting a complete record tree together with a METS XML.
Retrieving the created records
When the top-level intellectual object is created, it gets a metadata field Structural.Relations.ContainedBy that contains the fragment ID of the SIP. On the SIP, the inverse relation Structural.Relations.Contains contains the fragment ID of the top-level intellectual object.
Record Trees
Complex Objects 2.0 can now describe any Record Tree that can be modelled in MediaHaven.
Example 1
- Newspaper
  - Newspaper page 1
    - TIFF image
      - Original representation
    - JP2 image
      - Original representation
  - Newspaper page 2
    - TIFF image
      - Original representation
    - JP2 image
      - Original representation
  - …
  - ALTO XML for the entire newspaper
Example 2
- Dossier
  - Document 1 (e-mail)
    - Original representation (e-mail)
    - Email attachment 1
      - Original representation
    - Email attachment 2
      - Original representation
  - Document 2
    - Original representation
  - …
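A record tree such as Example 1 can be pictured as a simple nested data structure. The sketch below is only an illustration: the `Node` class and the `render` and `page` helpers are hypothetical names, not part of MediaHaven, and the nesting (images under pages, representations under images, the ALTO XML directly under the newspaper) is an assumption based on the order of the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One record or representation in a record tree (hypothetical model)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def page(number):
    """Build one newspaper page with a TIFF and a JP2 image."""
    return Node(f"Newspaper page {number}", [
        Node("TIFF image", [Node("Original representation")]),
        Node("JP2 image", [Node("Original representation")]),
    ])


newspaper = Node("Newspaper", [
    page(1),
    page(2),
    Node("ALTO XML for the entire newspaper"),
])

print("\n".join(render(newspaper)))
```

The depth-first walk in `render` visits a parent before its children, which matches the order in which the examples list their records.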
METS
The SIP contains a metadata sidecar file in the METS format describing the entire SIP. Because it describes the entire SIP, the METS file can be quite large; it is stored as an additional representation under the original representation of the SIP. When the original representation is removed as part of the lifecycle of the SIP, this additional representation is removed as well.
SIP
The structure of the SIP is unchanged from the old Complex Objects 1.0.
Requirements
Unchanged from Complex Objects Reference
The Complex Ingest workflow imposes the following requirements, which go beyond well-formed XML and validation against the provided XSDs.
Every file in the archive must be referenced by the METS. If files are not referenced or if a referenced file is missing, the entire archive is rejected.
The MD5 checksums provided in the XML are compared against the calculated MD5 checksums. The entire archive is rejected if any check fails.
The file paths used in the METS are paths relative to the root of the accompanying archive.
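The requirements above can be checked locally before uploading a SIP. The following is a minimal sketch, not the Complex Ingest implementation. It assumes that files are listed as standard METS `file` elements with `CHECKSUM` and `CHECKSUMTYPE` attributes and an `FLocat` child carrying an `xlink:href`, that the METS file is named `mets.xml` (a hypothetical name), and that the METS file itself does not need to be referenced. The function name `check_sip_files` and the problem messages are also made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET
import zipfile

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def check_sip_files(zip_path, mets_name="mets.xml"):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        root = ET.fromstring(archive.read(mets_name))

        # Map each referenced path (relative to the archive root)
        # to its declared MD5 checksum, or None when no MD5 is given.
        declared = {}
        for file_el in root.iter(f"{{{METS_NS}}}file"):
            flocat = file_el.find(f"{{{METS_NS}}}FLocat")
            href = None if flocat is None else flocat.get(f"{{{XLINK_NS}}}href")
            if href is None:
                continue
            is_md5 = file_el.get("CHECKSUMTYPE") == "MD5"
            declared[href] = file_el.get("CHECKSUM") if is_md5 else None

        # Files physically present, ignoring directories and the METS itself
        # (assumption: the METS does not have to reference itself).
        present = {
            info.filename
            for info in archive.infolist()
            if not info.is_dir() and info.filename != mets_name
        }
        referenced = set(declared)

        for path in sorted(present - referenced):
            problems.append(f"not referenced by METS: {path}")
        for path in sorted(referenced - present):
            problems.append(f"referenced but missing: {path}")

        # Compare declared MD5 checksums with freshly calculated ones.
        for path in sorted(present & referenced):
            expected = declared[path]
            if expected is None:
                continue
            digest = hashlib.md5()
            with archive.open(path) as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                problems.append(f"MD5 mismatch: {path}")
    return problems
```

Calling `check_sip_files("sip.zip")` returns an empty list when every file is referenced, present and matches its checksum. Because the workflow rejects the entire archive on a single failure, any non-empty result means the SIP should be fixed before upload.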
Rules
Lifecycle
See also MediaHaven Record Status for an overview of statuses.
| Status | Meaning | Storage: what happens to the original representation of the SIP? |
|---|---|---|
| | The SIP has been created but not yet picked up. | |
| | The SIP workflow has started and the SIP is being processed. | |
| | The SIP and all its content have been successfully ingested. In older versions the status was | |
| | The SIP itself contains invalid (meta)data or one of the files from the SIP was rejected during ingest. | Same behaviour as |
Rejections
SIPs are rejected for the following reasons, which relate to the ZIP file itself or to the embedded METS XML.
| | Message | Meaning |
|---|---|---|
| 1 | | The METS XML is not found inside the SIP |
| 2 | | The system cannot unambiguously determine the METS XML because multiple XML files are present in the root directory |
| 3 | | The file is not a valid ZIP file |
| 4 | | The ZIP file has become corrupted or truncated |
| 5 | | The sidecar file is corrupted |
| 6 | | The sidecar file is corrupted |
| 7 | | The METS XML is not well-formed XML or does not comply with the METS XSD |
| 8 | | One or more files described by the METS XML are not physically present in the ZIP file |
| 9 | | One or more files not described by the METS XML are present in the ZIP file |
| 10 | | The attribute |
| 11 | | When duplicate files are not allowed (the default), a validation error is raised even before the file is extracted |
| 12 | | The metadata field |
| 13 | | The METS XML contains multiple instances of the same value for |
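The first few checks in this table (valid ZIP, intact ZIP, exactly one XML file in the root directory) can be approximated with the Python standard library. This is a sketch under assumptions: the function name `find_mets` and the error texts are hypothetical and do not correspond to MediaHaven's actual messages, and a root-level XML file is identified here purely by having no folder in its path and a `.xml` extension.

```python
import zipfile


def find_mets(zip_path):
    """Return the name of the single root-level XML file, or raise ValueError."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("not a valid ZIP file")

    with zipfile.ZipFile(zip_path) as archive:
        # testzip() reads every member and returns the name of the first
        # one whose CRC or header is bad, or None when all members are intact.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupted member: {bad_member}")

        # Only XML files directly in the root are METS candidates.
        root_xml = [
            name
            for name in archive.namelist()
            if "/" not in name and name.lower().endswith(".xml")
        ]

    if not root_xml:
        raise ValueError("no METS XML found in the root of the SIP")
    if len(root_xml) > 1:
        raise ValueError(f"ambiguous METS XML, candidates: {sorted(root_xml)}")
    return root_xml[0]
```

XML files inside folders, such as an ALTO file stored next to the pages, do not make the METS ambiguous in this sketch because only the root directory is inspected.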
Beyond this, the SIP can be rejected because individual objects are rejected during ingest:
| | Check | Meaning |
|---|---|---|
| 1 | Duplicate MD5 | When the MD5 is not provided in the METS XML, the dynamic calculation during the ingest can still raise a validation error |
| 2 | Virus scanning | A virus was detected |
| 3 | Format | |
| 4 | Validation | A Validation Module determined that the object is not valid |
| 5 | Transformation | If configured as strict, failed transformations lead to rejection |
Limits 25.2
The following limits apply. For advanced installations, these limits can have different (higher) values.
| Limit | Value |
|---|---|
| Total size | 250 GB |
| Number of files in SIP | 10000 |
The SIP will be rejected if these limits are exceeded.
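A SIP can be checked against these limits before upload. The sketch below makes several assumptions that the text does not settle: that "total size" means the sum of the uncompressed file sizes, that 1 GB is 10^9 bytes, and that every file in the ZIP (including the METS) counts towards the file limit. The function name `check_limits` and its parameters are hypothetical; the defaults mirror the table above and can be raised for advanced installations.

```python
import zipfile

MAX_TOTAL_BYTES = 250 * 10**9  # assumption: 250 GB with 1 GB = 10^9 bytes
MAX_FILE_COUNT = 10_000


def check_limits(zip_path, max_bytes=MAX_TOTAL_BYTES, max_files=MAX_FILE_COUNT):
    """Return a list of exceeded limits; an empty list means the SIP fits."""
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries are not files and are left out of both totals.
        files = [info for info in archive.infolist() if not info.is_dir()]

    # file_size is the uncompressed size recorded in the ZIP directory.
    total_bytes = sum(info.file_size for info in files)

    problems = []
    if total_bytes > max_bytes:
        problems.append(f"total size {total_bytes} bytes exceeds {max_bytes}")
    if len(files) > max_files:
        problems.append(f"{len(files)} files exceeds {max_files}")
    return problems
```

Because the sizes come from the ZIP directory, the check does not need to extract or read the file contents.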