External Source URL
This feature is part of release 23.3.
Major update in release 24.3, offering asynchronous validation for data objects.
Introduction
Rather than providing a file directly at the time of ingest, the client provides an external source URL. The system downloads the file from that URL during the ingest.
Features
The feature is available in the API using the parameter fileUrl when POSTing the record: https://integration.mediahaven.com/mediahaven-rest-api/v2/api-docs/index.html#url_file_upload
The URL is stored in the read-only metadata field Internal.ExternalSourceUrl.
If the MD5 checksum, calculated after downloading the file from the URL, matches that of an existing record, the new record is rejected.
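The duplicate check described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names and the set of known checksums are hypothetical, and the sketch only shows how an MD5 digest computed over downloaded chunks could be compared against existing records.

```python
import hashlib
from typing import Iterable, Set


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a byte stream delivered in chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(chunks: Iterable[bytes], known_checksums: Set[str]) -> bool:
    """True when the downloaded content matches the MD5 of an existing record."""
    return md5_of_chunks(chunks) in known_checksums


# Hypothetical data: one existing record whose content was b"hello world".
existing = {hashlib.md5(b"hello world").hexdigest()}

# Same bytes, delivered in two chunks: the new record would be rejected.
print(is_duplicate([b"hello ", b"world"], existing))

# Different bytes: the new record would be accepted.
print(is_duplicate([b"other file"], existing))
```

Hashing chunk by chunk keeps memory use constant regardless of file size, which matters when the file is streamed from a remote URL.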
Protocols
The only supported protocols are HTTPS and HTTP.
Future versions might extend this feature to include object stores such as S3.
For now, for object stores, create a temporary URL for the resource or make the object store public and access it via HTTPS.
Business Rules
General rules
The URL must adhere to https://www.rfc-editor.org/rfc/rfc1630. In particular, unsafe characters shall not be used unencoded: spaces, control characters, some characters whose ASCII code is used differently in different national character variant 7-bit sets, and all 8-bit characters beyond DEL (7F hex) of the ISO Latin-1 set.
Redirects are not allowed
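A client can catch most violations of the protocol and general rules before POSTing the record. The following is a minimal sketch under simplifying assumptions: the function name is hypothetical, and the check only covers the protocol, spaces, control characters, and characters beyond DEL. It does not cover the national-variant characters mentioned in the RFC, and it is not the server's actual validation.

```python
from typing import List
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def find_url_problems(url: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    try:
        parts = urlsplit(url)
    except ValueError as exc:
        return [f"malformed URL: {exc}"]

    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        problems.append(f"unsupported protocol: {parts.scheme or '(none)'}")
    if not parts.netloc:
        problems.append("missing host")

    # Inspect the raw string: spaces, control characters (including DEL)
    # and anything beyond 7F hex must be percent-encoded.
    for ch in url:
        code = ord(ch)
        if ch == " " or code < 0x20 or code >= 0x7F:
            problems.append(f"unencoded unsafe character: {ch!r}")
            break
    return problems


print(find_url_problems("https://example.com/my%20file.mp4"))
print(find_url_problems("https://example.com/my file.mp4"))
print(find_url_problems("s3://bucket/key"))
```

Unsafe characters can be encoded with `urllib.parse.quote` on the path component before building the final URL.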
Synchronous validation when POSTing the record
Up to version 24.2, the external URL is always validated synchronously: the URL is validated syntactically, and an HTTP HEAD request is made to obtain the file size from the HTTP response header Content-Length. From version 24.3, validation for Data Objects is done asynchronously and the Content-Length header is no longer required.
Rule | Up to version | From version | From version |
---|---|---|---|
HEAD request returns | Yes | Yes | |
The header | Yes | Yes | |
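To check in advance whether a URL is likely to pass the synchronous validation, a client could issue its own HEAD request. The sketch below rests on assumptions: the function names are hypothetical, and the expected status (200 here) is an assumption, since the status expected by the system is not stated in this section. The `require_content_length` flag mirrors the relaxed rule for Data Objects from version 24.3.

```python
import http.client
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import urlsplit


def file_size_from_head(status: int, headers: Mapping[str, str],
                        require_content_length: bool = True) -> Optional[int]:
    """Interpret a HEAD response; raise ValueError when the URL would be rejected."""
    if 300 <= status < 400:
        raise ValueError("redirects are not allowed")
    if status != 200:  # assumed expected status
        raise ValueError(f"unexpected status {status}")
    # HTTP header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("content-length")
    if raw is None:
        if require_content_length:
            raise ValueError("Content-Length header is missing")
        return None
    return int(raw)


def head(url: str, timeout: float = 10.0) -> Tuple[int, Dict[str, str]]:
    """Send a single HEAD request; http.client never follows redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()


# Offline example with a hand-written response instead of a real request.
print(file_size_from_head(200, {"Content-Length": "1024"}))
```

Against a real server, the two functions combine as `file_size_from_head(*head(url))`.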
Asynchronous validation during ingest
The following rules apply to the external source URL when the system internally downloads the URL:
The URL responds to a GET with a valid response with status 200 OK.
Responses other than 200 OK are retried after a pause of 10 seconds, up to 4 additional times, for a total of 5 attempts.
When the response is 429 Too Many Requests and it contains the response header Retry-After, the system pauses for that amount of time instead.
Redirects are not allowed
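The retry rules above can be sketched as follows. This is an illustration under assumptions, not the system's implementation: the function names are hypothetical, only the seconds form of Retry-After is handled (not the HTTP-date form), and every non-200 response is treated uniformly because this section does not say whether a redirect is retried or rejected immediately.

```python
import time
from typing import Callable, Mapping, Tuple

MAX_ATTEMPTS = 5              # 1 initial attempt + 4 retries
DEFAULT_PAUSE_SECONDS = 10.0

Response = Tuple[int, Mapping[str, str]]


def pause_for(status: int, headers: Mapping[str, str]) -> float:
    """Pause before the next attempt: Retry-After on 429, else 10 seconds."""
    if status == 429:
        normalized = {name.lower(): value for name, value in headers.items()}
        retry_after = normalized.get("retry-after", "").strip()
        if retry_after.isdigit():
            return float(retry_after)
    return DEFAULT_PAUSE_SECONDS


def fetch_with_retries(fetch: Callable[[], Response],
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call fetch() until it returns 200 OK, for at most MAX_ATTEMPTS attempts."""
    status = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers = fetch()
        if status == 200:
            return status, headers
        if attempt < MAX_ATTEMPTS:
            sleep(pause_for(status, headers))
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts, last status {status}")


# Offline example: scripted responses and a recording sleep instead of real waits.
scripted = iter([(503, {}), (429, {"Retry-After": "3"}), (200, {})])
pauses = []
result = fetch_with_retries(lambda: next(scripted), sleep=pauses.append)
print(result[0], pauses)
```

Passing `fetch` and `sleep` as parameters keeps the retry policy separate from the transport, so the policy can be exercised without network access or real delays.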