External Source URL

  • This feature is part of release 23.3

  • Major update in release 24.3 offering asynchronous validation for data objects

Introduction

Instead of providing a file directly at the time of ingest, you provide an external source URL. The system downloads the file from that URL during the ingest.

Features

Protocols

  • The only supported protocols are HTTPS and HTTP

  • Future versions might extend this feature to include object stores such as S3

  • For now, to ingest from an object store, either create a temporary URL for the resource or make the object store public, and access it via HTTPS

Business Rules

General rules

  • The URL must adhere to https://www.rfc-editor.org/rfc/rfc1630. In particular, unsafe characters shall not be used unencoded: spaces, control characters, some characters whose ASCII code is used differently in different national character variant 7-bit sets, and all 8-bit characters beyond DEL (7F hex) of the ISO Latin-1 set

  • Redirects are not allowed
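
The general rules above can be approximated with a small pre-flight check before a record is submitted. The following is a minimal sketch, not the system's actual validator: the function name `check_source_url` and its messages are hypothetical, and it simplifies the RFC 1630 rule by flagging every character outside printable ASCII plus malformed percent-escapes. It does not cover the national character variant cases.

```python
import re
from urllib.parse import urlsplit

# Protocols listed as supported in this document.
ALLOWED_SCHEMES = {"http", "https"}

# Simplification: anything outside printable ASCII (0x21-0x7E) is treated as
# unsafe when unencoded. This covers spaces, control characters, DEL and all
# 8-bit characters.
_UNSAFE = re.compile(r"[^\x21-\x7e]")
# A "%" must be followed by two hex digits to form a valid escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def check_source_url(url: str) -> list[str]:
    """Return a list of problems; an empty list means the URL passed.

    Hypothetical helper for illustration only.
    """
    problems = []
    if _UNSAFE.search(url):
        problems.append("contains unencoded unsafe characters")
    if _BAD_ESCAPE.search(url):
        problems.append("contains a malformed percent-escape")
    try:
        parts = urlsplit(url)
    except ValueError:
        problems.append("cannot be parsed as a URL")
        return problems
    if parts.scheme not in ALLOWED_SCHEMES:
        problems.append("scheme must be http or https")
    if not parts.hostname:
        problems.append("missing host")
    return problems
```

For example, `check_source_url("https://example.com/my file.mp4")` reports the unencoded space, while the same URL with `%20` in place of the space passes.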

Synchronous validation when POSTing the record

Up to version 24.2, the external URL is always validated synchronously: the URL is validated syntactically, and an HTTP HEAD request is made to obtain the file size from the HTTP response header “Content-Length”. From version 24.3, for records with https://mediahaven.atlassian.net/wiki/spaces/CS/pages/4064378938 Data, validation is done asynchronously and the HTTP header Content-Length is no longer required.

Rule: HEAD request returns 200 OK

  • Up to version 24.2: Yes

  • From version 24.3 for flat data objects: Yes

  • From version 24.3 for data objects: No, it is checked asynchronously during ingest. When the validation fails the record will be rejected

Rule: The header Content-Length is present in the HEAD response and is a valid positive number

  • Up to version 24.2: Yes

  • From version 24.3 for flat data objects: Yes

  • From version 24.3 for data objects: No longer validated or mandatory because the file size is determined by actually downloading the file during ingest
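
The two synchronous rules can be expressed as a check on the outcome of the HEAD request. This is a sketch under assumptions: `check_head_response` is a hypothetical name, and the status code and headers are passed in by the caller so that the logic stays independent of any particular HTTP client.

```python
import re
from typing import Mapping, Optional


def check_head_response(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return an error message, or None when both synchronous rules pass.

    Hypothetical helper for illustration only.
    """
    # Rule 1: the HEAD request must return 200 OK.
    if status != 200:
        return f"HEAD returned {status}, expected 200"

    # Rule 2: Content-Length must be present and a valid positive number.
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("content-length")
    if raw is None:
        return "Content-Length header is missing"
    value = raw.strip()
    if not re.fullmatch(r"[0-9]+", value) or int(value) <= 0:
        return "Content-Length is not a valid positive number"
    return None
```

The status and headers would come from a HEAD request made with an HTTP client configured not to follow redirects, since redirects are not allowed.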

Asynchronous validation during ingest

The following rules apply to the external source URL when the system downloads the file from it:

  • The URL must respond to a GET request with status 200 OK

  • Responses other than 200 OK are retried after a pause of 10 seconds, up to 4 additional times, for a total of 5 attempts

  • Redirects are not allowed
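
The retry rule above can be sketched as a simple loop. This is an illustration only, not the system's implementation: `download_with_retry` and the `fetch` callable are hypothetical, and the sketch assumes that a redirect status is handled like any other non-200 response, which the rules above do not state explicitly.

```python
import time
from typing import Callable, Tuple

MAX_ATTEMPTS = 5       # 1 initial attempt + 4 additional attempts
PAUSE_SECONDS = 10.0   # pause between two attempts


def download_with_retry(
    fetch: Callable[[], Tuple[int, bytes]],
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns status 200 or the attempts run out.

    fetch is a hypothetical callable that performs one GET request without
    following redirects and returns (status_code, body).
    """
    last_status = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, body = fetch()
        if status == 200:
            return body
        last_status = status
        # Pause only if another attempt will follow.
        if attempt < MAX_ATTEMPTS:
            sleep(PAUSE_SECONDS)
    raise RuntimeError(
        f"giving up after {MAX_ATTEMPTS} attempts, last status {last_status}"
    )
```

Passing `sleep` as a parameter keeps the loop easy to exercise without waiting; in the worst case the loop makes 5 requests and pauses 4 times.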