Batches

Introduction

Batches were introduced in 20.2 to operate easily and safely on large amounts of data. A batch operates on a set of records selected by a filter, and that set is linked to one of the available tasks. Batches can be started manually through the MediaHaven REST API or created by a workflow process.

Error Handling & Reporting 24.2+

A batch handles every record matching the provided query. Depending on the outcome for each record, one of the following properties of the batch changes:

  • The record did change → Completed increments by 1

  • The record did not change → Skipped increments by 1

  • Failure → Failed increments by 1

The batch does not abort on a failure but keeps processing the subsequent records, unless at least 20% of the total number of records have failed. In that case, the batch is aborted and assigned the status TooManyFailed.
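
The counting and abort rule described above can be sketched as follows. This is a minimal illustration, not the MediaHaven implementation: the names `tally` and `BatchCounters`, the outcome labels, and the assumption that the 20% threshold is checked after every failure against the total number of records in the batch are all hypothetical.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD_PERCENT = 20  # "at least 20% of the total failed"


@dataclass
class BatchCounters:
    completed: int = 0
    skipped: int = 0
    failed: int = 0
    status: str = "Processing"


def tally(outcomes: list[str]) -> BatchCounters:
    """Count per-record outcomes and abort once failures reach 20% of the total.

    Each outcome is "changed", "unchanged" or "failure" (hypothetical labels).
    """
    counters = BatchCounters()
    total = len(outcomes)
    for outcome in outcomes:
        if outcome == "changed":
            counters.completed += 1
        elif outcome == "unchanged":
            counters.skipped += 1
        else:
            counters.failed += 1
            # Assumption: the threshold is evaluated after each failure.
            # Integer arithmetic avoids floating-point rounding at the boundary.
            if counters.failed * 100 >= FAILURE_THRESHOLD_PERCENT * total:
                counters.status = "TooManyFailed"
                return counters
    counters.status = "CompletedWithErrors" if counters.failed else "Completed"
    return counters
```

With ten records, a single failure leaves the batch running to the end (CompletedWithErrors), while a second failure reaches the 20% mark and stops it.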

Failed records can be retrieved through monitoring (Batches → Failed records) or the API: /batches/:batchId/failures
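
As a rough sketch, the request for the failures endpoint could be built like this. The base URL is a placeholder, the helper name `failures_request` is hypothetical, and authentication is left out because it depends on the installation; only the path /batches/:batchId/failures comes from the text above.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; replace with the REST API root of your installation.
BASE_URL = "https://mediahaven.example.com/api"


def failures_request(batch_id: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a GET request for /batches/:batchId/failures."""
    url = f"{base_url.rstrip('/')}/batches/{quote(batch_id, safe='')}/failures"
    # Authentication headers are intentionally omitted in this sketch.
    return Request(url, method="GET")
```

The returned object can be passed to `urllib.request.urlopen` once the required credentials have been added.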

Status

| Status | Meaning |
| --- | --- |
| Waiting | The batch has been created but no page has been picked up yet. |
| Processing | At least one page of the batch is being processed. |
| Completed | All records have been processed and no record failed. |
| CompletedWithErrors | All records have been processed and at least 1 record failed. |
| PostBatchFailed | All records have been processed but an exception occurred in the post-batch step. |
| TooManyFailed | At least 20% of the records failed and the batch was aborted as a consequence. |
| Cancelling | A request has been sent to cancel the batch. No new jobs will be created. |
| Cancelled | The batch was cancelled and all of its existing jobs have finished. |
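
Client code that reacts to these statuses could model them as an enumeration. The sketch below is illustrative only: the class and helper names are hypothetical, and treating Waiting, Processing and Cancelling as the only non-final statuses is an assumption drawn from the descriptions above, not a documented guarantee.

```python
from enum import Enum


class BatchStatus(Enum):
    WAITING = "Waiting"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    COMPLETED_WITH_ERRORS = "CompletedWithErrors"
    POST_BATCH_FAILED = "PostBatchFailed"
    TOO_MANY_FAILED = "TooManyFailed"
    CANCELLING = "Cancelling"
    CANCELLED = "Cancelled"


# Assumption: these statuses still have work pending; all others are final.
IN_PROGRESS = {BatchStatus.WAITING, BatchStatus.PROCESSING, BatchStatus.CANCELLING}


def is_finished(status: str) -> bool:
    """Return True when the status string names a final status."""
    # BatchStatus(status) raises ValueError for an unknown status string.
    return BatchStatus(status) not in IN_PROGRESS
```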

API Permissions

POST batches/

Any user can create batches for the index of their organisation; a created batch searches as the user who created it.

GET batches/

The returned batches depend on the functions of the user:

| Function | Effect |
| --- | --- |
| None | Can read the batches created by this user |
| ADMIN_BATCHES | Can read all the batches from the index of the organisation of this user |
| ADMIN_BATCHES + ADMIN_VIEW_ALL_ORGANISATIONS | Can read all batches from all indices |
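
The read rules above amount to a small decision function. The sketch below only restates the table; the function name and the returned labels are hypothetical, and the fallback for a user who holds ADMIN_VIEW_ALL_ORGANISATIONS without ADMIN_BATCHES is an assumption, since the table does not cover that case.

```python
def read_scope(functions: set[str]) -> str:
    """Return which batches a user with the given functions can read."""
    if "ADMIN_BATCHES" in functions:
        if "ADMIN_VIEW_ALL_ORGANISATIONS" in functions:
            return "all indices"
        return "own organisation"
    # Assumption: without ADMIN_BATCHES, other functions grant nothing extra.
    return "own batches"
```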

PATCH batches/ 24.3+

The following functions are needed to update a batch partially:

| Function | When needed? |
| --- | --- |
| ADMIN_BACKEND_SERVICES | Own batches |
| ADMIN_BACKEND_SERVICES + ADMIN_BATCHES | Batches from the same organisation |
| ADMIN_BACKEND_SERVICES + ADMIN_BATCHES + ADMIN_EDIT_ALL_ORGANISATIONS | Batches from other organisations |
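
Because each row adds one function to the previous row, the check reduces to a subset test. The following is a sketch under that reading; the dictionary keys and the name `may_patch` are hypothetical labels for the three cases in the table.

```python
# Functions required per relation between the user and the batch.
PATCH_REQUIREMENTS: dict[str, frozenset[str]] = {
    "own": frozenset({"ADMIN_BACKEND_SERVICES"}),
    "same_organisation": frozenset({"ADMIN_BACKEND_SERVICES", "ADMIN_BATCHES"}),
    "other_organisation": frozenset(
        {"ADMIN_BACKEND_SERVICES", "ADMIN_BATCHES", "ADMIN_EDIT_ALL_ORGANISATIONS"}
    ),
}


def may_patch(functions: set[str], relation: str) -> bool:
    """Return True when the user holds every function required for the relation."""
    return PATCH_REQUIREMENTS[relation] <= frozenset(functions)
```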

DELETE batches/ 24.3+

Requires the function ADMIN_BACKEND_SERVICES.

For cancelling a batch you need the same functions as for PATCH.

Cleanup 24.2+

Completed batches are cleaned up 30 days after their finish date.
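
As a worked example of the retention period, the earliest cleanup moment can be computed from the finish date. This is a sketch only: the helper name is hypothetical, and whether cleanup happens exactly at that moment or at the next scheduled run is not stated here.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)


def cleanup_due(finished_at: datetime) -> datetime:
    """Return the earliest moment at which a finished batch may be cleaned up."""
    return finished_at + RETENTION


finished = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(cleanup_due(finished).isoformat())  # 2024-03-31T12:00:00+00:00
```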

Multi indices 22.2+

Normally, a batch is executed on the index of the organisation to which the user belongs. When the zeticon@installation or system@installation user starts a batch, it is executed across all indices available on the system.

Heavy batches 24.3+

Requires the function ADMIN_BACKEND_SERVICES.

To prevent heavy batches from clogging the system, the following actions can be taken:

  • Lower the priority of the batch using a PATCH

  • Change the worker daemon zone using a PATCH

  • Cancel the batch by sending a DELETE request, which sets the status to Cancelling. Cancellation is not instant: the existing jobs of the batch are allowed to finish and no new jobs are created. Once those jobs have finished, the status is set to Cancelled.
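
Since cancellation is asynchronous, a client that has sent the DELETE request may want to wait until the batch leaves the Cancelling status. The sketch below shows one way to do that. It does not call the API itself: `get_status` is a hypothetical callable that the caller supplies to fetch the current status string, and the polling interval and limit are arbitrary example values.

```python
import time
from typing import Callable


def wait_until_cancelled(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the batch leaves the Cancelling status or polls run out.

    Returns the last status seen, which is "Cancelling" if the limit was hit.
    """
    status = get_status()
    for _ in range(max_polls):
        if status != "Cancelling":
            break
        sleep(poll_seconds)
        status = get_status()
    return status
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.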
