Scheduler

Introduction

The scheduler is a timed process which moves records from a set of source cluster groups to a set of destination cluster groups. It does so by creating export jobs whose source pool is a selection matching the source cluster groups and whose destinations are all destination cluster groups, so that a mirrored copy is created on each destination cluster group. The mirroring concept is described in https://mediahaven.atlassian.net/wiki/spaces/CS/pages/20643843

Example

The scheduler at Meemoo schedules records from the source cluster group “mob” to the destination cluster groups “tape_archive_8”, “tape_backup_8” and “tape_vault_8”.
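
Purely as an illustration, this example could be expressed with the cluster group settings listed under Parameters below; the dictionary layout is a sketch, and the actual configuration lives on the scheduler’s settings page.

  # Hypothetical mapping of the Meemoo example onto the cluster group settings.
  meemoo_scheduler = {
      "CLUSTER_GROUP_SOURCES": ["mob"],
      "CLUSTER_GROUP_ARCHIVE": "tape_archive_8",
      "CLUSTER_GROUP_BACKUP": "tape_backup_8",
      "CLUSTER_GROUP_VAULT": "tape_vault_8",
  }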

Batches

The scheduler runs in batches. Each batch performs the following steps (a sketch of the loop follows the list).

  1. Check if any exports from the previous batch are still waiting or processing; if so, exit

  2. Cutting: Find all records that have been successfully written to all destination cluster groups (typically during the previous batch)

    1. Logically delete the record and pool combinations belonging to the source cluster groups

    2. See https://mediahaven.atlassian.net/wiki/spaces/DEVELOPMENT/pages/2735079625 for the cascading effect

  3. The previous batch is now complete

  4. Exports: Create a new set of export jobs from the source cluster group(s) to the destination cluster groups

  5. These exports form the new batch and can take hours to complete

  6. Exit
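
The steps above can be summarised in a minimal sketch, assuming records and exports are plain in-memory structures: each record carries a set written_to of cluster group names, and all field and function names are hypothetical. The real scheduler operates on MediaHaven records, pools and export jobs.

  def run_batch(records, previous_exports, source_groups, destination_groups):
      # 1. If exports from the previous batch are still waiting or processing, exit.
      if any(e["status"] in ("waiting", "processing") for e in previous_exports):
          return previous_exports

      # 2. Cutting: every record that was successfully written to all destination
      #    cluster groups loses its copies on the source cluster groups.
      for record in records:
          if destination_groups <= record["written_to"]:
              record["written_to"] -= source_groups
      # 3. The previous batch is now complete.

      # 4./5. Exports: create a new set of export jobs, one per destination cluster
      #    group, for records that still only live on the source cluster groups.
      new_batch = []
      for record in records:
          if record["written_to"] & source_groups and not destination_groups <= record["written_to"]:
              for destination in destination_groups:
                  new_batch.append({"record": record["id"], "destination": destination, "status": "waiting"})
      # 6. Exit; these exports form the new batch and can take hours to complete.
      return new_batch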

Parameters

See the settings page of a scheduler for all settings and their descriptions.

  • Controls the source and destination cluster groups

    • CLUSTER_GROUP_SOURCES

    • CLUSTER_GROUP_ARCHIVE 

    • CLUSTER_GROUP_BACKUP

    • CLUSTER_GROUP_VAULT

  • Controls what to schedule

    • SCHEDULER_SKIP_ORGANISATIONS

  • Controls the conditions for when to schedule (see the sketch after this list)

    • SCHEDULER_SIZE_TO_FREE

    • SCHEDULER_MINIMUM_SIZE

    • SCHEDULER_FREE_SPACE_THRESHOLD

    • SCHEDULER_MAX_WAITING_PERIOD Since 22.1

  • Limits

    • SCHEDULER_MAXIMUM_FILES

    • SCHEDULER_CUTTING_MINIMUM_AGE

  • Controls whether to write export files in parallel (see below)

    • SCHEDULER_PARALLEL_ORGANISATIONS Since 22.1

    • SCHEDULER_PARALLEL_AMOUNT Since 22.1
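
As an illustration only, the scheduling conditions might combine along these lines; the assumed meaning of each setting is hypothetical, and the settings page of the scheduler remains the authoritative description.

  def should_schedule(source_free_space, candidate_total_size, oldest_waiting_days, settings):
      # Since 22.1: files waiting longer than SCHEDULER_MAX_WAITING_PERIOD are picked
      # up regardless of the other conditions (see Prioritization below).
      if oldest_waiting_days >= settings["SCHEDULER_MAX_WAITING_PERIOD"]:
          return True
      # Assumed semantics: only schedule when the source is filling up and there is
      # enough material to make a batch worthwhile.
      return (source_free_space < settings["SCHEDULER_FREE_SPACE_THRESHOLD"]
              and candidate_total_size >= settings["SCHEDULER_MINIMUM_SIZE"])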

Is Online

In this mode the scheduler checks the tape databases belonging to its destination cluster groups to see which tapes are present. Any newly detected tapes are marked as online, while tapes that are no longer detected are marked as offline. See the property “is online” at https://mediahaven.atlassian.net/wiki/spaces/CS/pages/20643843 .
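
A minimal sketch of this reconciliation, assuming the tape database can be reduced to a set of currently detected tape barcodes; all names are hypothetical.

  def reconcile_online_flags(known_tapes, detected_barcodes):
      # Newly detected tapes become online; tapes no longer detected become offline.
      for tape in known_tapes:
          tape["is_online"] = tape["barcode"] in detected_barcodes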

Verify / Healing

This feature makes the scheduler verify random samples of records, to check whether they have been successfully written to the destination cluster groups. It does so by performing the reverse operation, namely creating export jobs from the destination cluster group to the source cluster group.

If the export is successful, the record will automatically be cut in the next regular batch of the scheduler, because the scheduler detects a file that has already been successfully written to all destination cluster groups.

If the export fails, NOK PREMIS events are generated for the records.

Healing is a procedure where an export job is created from another mirrored copy (not the mirrored copy that led to the failed export) to the source cluster group, and the mirrored copy that failed is marked as logically deleted. In the next batch the scheduler will pick up the file from the source cluster group and write it again to all destination cluster groups. For example, if a record was written to three tapes A0, B0 and C0 and the B0 copy is discovered to be corrupt, the B0 copy will be deleted. After the next completed batch the record will be written to new mirrored copies A1, B1 and C1, in addition to the already existing A0 and C0 copies.
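
A minimal sketch of the healing step, following the tape example above; the record structure and all names are hypothetical, and the real procedure is carried out through export jobs.

  def heal(record, failed_copy, source_group):
      # Mark the corrupt mirrored copy (B0 in the example) as logically deleted.
      record["copies"][failed_copy]["logically_deleted"] = True
      # Pick a surviving mirrored copy (A0 or C0) to restore the file to the source
      # cluster group; the next completed batch then writes new copies A1, B1 and C1.
      surviving = next(name for name, copy in record["copies"].items()
                       if name != failed_copy and not copy.get("logically_deleted"))
      return {"record": record["id"], "source": surviving, "destination": source_group}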

Prioritization

By default the scheduler will prioritize bigger files over smaller ones.

The setting SCHEDULER_MAX_WAITING_PERIOD defines the maximum number of days after ingest that the scheduler will wait before picking up a file, regardless of its size.
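
As a sketch, the resulting order could look as follows, assuming each record exposes its size and the number of days since ingest; the ordering within the group of overdue files is an assumption.

  def schedule_order(records, max_waiting_period_days):
      # Records that have waited longer than SCHEDULER_MAX_WAITING_PERIOD come first,
      # regardless of size; within each group, bigger files precede smaller ones.
      return sorted(records,
                    key=lambda r: (r["days_since_ingest"] <= max_waiting_period_days,
                                   -r["size_bytes"]))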

Parallel scheduling

By default the scheduler exports files to the same storage pool(s) until the configured algorithm picks a different pool.

This can mean that when there are 2 tapes available, only one tape gets used until it’s full. As writing to tape is fairly slow, this is not optimal.

In 22.1 a new feature was introduced that, if activated, writes files to tape triplets in parallel. It is controlled by the following settings (a sketch follows the list):

  • SCHEDULER_PARALLEL_ORGANISATIONS

    • Configures the organisations this feature is activated for.

  • SCHEDULER_PARALLEL_AMOUNT

    • The number of tape triplets to use in parallel
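
A minimal sketch of this distribution, assuming files are simply spread round-robin over the configured number of tape triplets; the actual pool selection is done by the configured algorithm.

  def assign_triplets(file_ids, parallel_amount):
      # With SCHEDULER_PARALLEL_AMOUNT = 2, consecutive files alternate between
      # triplet 0 and triplet 1, so both tape sets are written to at the same time.
      return {file_id: index % parallel_amount for index, file_id in enumerate(file_ids)}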

Future

In the future the scheduler should be rewritten to write from one super cluster to another super cluster, for example from the super cluster “gpfs-buffer” to the super cluster “tape LTO-8”.