Scheduler
Introduction
The scheduler is a timed process that moves records from a set of source cluster groups to a set of destination cluster groups. It does so by creating export jobs whose source pool is a selection matching the source cluster groups and whose targets are all destination cluster groups, so that a mirrored copy is created on each destination cluster group. The mirroring concept is described in https://mediahaven.atlassian.net/wiki/spaces/CS/pages/20643843
Example
The scheduler at Meemoo schedules records from the source cluster group “mob” to the destination cluster groups “tape_archive_8”, “tape_backup_8” and “tape_vault_8”.
Batches
The scheduler runs in batches.
Check whether any exports from the previous batch are still waiting or processing; if so, exit
Cutting: Check for all records which have been successfully written to the destination cluster groups (typically during the previous batch)
Logically delete the record and pool combinations belonging to the source cluster groups
See https://mediahaven.atlassian.net/wiki/spaces/DEVELOPMENT/pages/2735079625 for cascading effect
The previous batch is now complete
Exports: Create a number of new export jobs from the source cluster group(s) to the destination cluster groups
These exports form the new batch and can take hours to complete
Exit
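The steps above can be sketched as a single function. This is a minimal illustration and not the scheduler's actual code: the `Record` and `Export` classes, the status strings and the `run_batch` function are hypothetical names introduced for this example. The cluster group names in the usage note come from the Meemoo example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record: tracks which cluster groups hold a copy."""
    name: str
    copies: set = field(default_factory=set)


@dataclass
class Export:
    """Hypothetical export job with a simplified status."""
    record: Record
    destination: str
    status: str = "waiting"  # "waiting", "processing", "completed" or "failed"


def run_batch(previous_exports, records, sources, destinations):
    """One scheduler run; returns the export jobs that form the new batch."""
    # Step 1: exit while the previous batch is still busy.
    if any(e.status in ("waiting", "processing") for e in previous_exports):
        return []

    # Step 2 (cutting): records present on every destination cluster group
    # lose their copies on the source cluster groups (logical delete).
    for record in records:
        if set(destinations) <= record.copies:
            record.copies -= set(sources)

    # Step 3 (exports): one export per destination for every record
    # that still has a copy on a source cluster group.
    new_batch = []
    for record in records:
        if record.copies & set(sources):
            for destination in destinations:
                new_batch.append(Export(record, destination))
    return new_batch
```

Calling `run_batch([], records, ["mob"], ["tape_archive_8", "tape_backup_8", "tape_vault_8"])` would first cut every record that already has all three tape copies and then create three exports for each record that still lives only on “mob”.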
Parameters
See the settings pages of a scheduler for all settings and their descriptions
Controls the source and destination cluster groups
CLUSTER_GROUP_SOURCES
CLUSTER_GROUP_ARCHIVE
CLUSTER_GROUP_BACKUP
CLUSTER_GROUP_VAULT
Controls what to schedule
SCHEDULER_SKIP_ORGANISATIONS
Controls the conditions for when to schedule
SCHEDULER_SIZE_TO_FREE
SCHEDULER_MINIMUM_SIZE
SCHEDULER_FREE_SPACE_THRESHOLD
SCHEDULER_MAX_WAITING_PERIOD
Since 22.1
SCHEDULER_CUTTING_MINIMUM_AGE
Limits that control how much to schedule
SCHEDULER_MAXIMUM_FILES
SCHEDULER_MAXIMUM_SOURCE_TAPES
Since 24.1
Controls whether to write export files in parallel (see below)
SCHEDULER_PARALLEL_ORGANISATIONS
Since 22.1
SCHEDULER_PARALLEL_AMOUNT
Since 22.1
Is Online
In this mode the scheduler checks the tape databases belonging to its destination cluster groups to see which tapes are present. Any newly detected tapes are marked as online, while any tapes that are no longer detected are marked as offline. See the property “is online” at https://mediahaven.atlassian.net/wiki/spaces/CS/pages/20643843 .
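The online/offline bookkeeping amounts to comparing two sets of tapes. The sketch below illustrates that comparison only; the function name and the tape labels are hypothetical, and how the tape databases are actually queried is not covered here.

```python
def update_online_status(known_online, detected):
    """Return (newly_online, newly_offline) as two sets of tape labels.

    known_online: tapes currently marked as online
    detected:     tapes found in the tape databases during this run
    """
    known_online = set(known_online)
    detected = set(detected)
    newly_online = detected - known_online   # detected for the first time
    newly_offline = known_online - detected  # no longer detected
    return newly_online, newly_offline
```

For instance, with `{"A0", "B0"}` marked online and `{"B0", "C0"}` detected, the tape `C0` would be marked online and `A0` offline.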
Verify / Healing
This feature makes the scheduler verify random samples of records, to check whether they have been successfully written to the destination cluster groups. It does so by performing the reverse operation, namely creating export jobs from the destination cluster group to the source cluster group.
If the export is successful, the file will be cut automatically in the next regular run of the scheduler, because the scheduler detects a file that has already been successfully written to all destination cluster groups.
If the export fails, it generates NOK PREMIS events for the records
Healing is a procedure in which an export job is created from another mirrored copy (not the copy that led to the failed export) to the source cluster group, and the mirrored copy that failed is marked as logically deleted. In the next batch the scheduler picks up the file from the source cluster group and writes it again to all destination cluster groups. For example, if a record was written to three tapes A0, B0 and C0 and the B0 copy was discovered to be corrupt, B0 will be deleted. After the next completed batch the record will be written to new mirrored copies A1, B1 and C1, in addition to the already existing A0 and C0 copies.
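The tape example can be traced with a small sketch. The function names are hypothetical, and choosing the first healthy copy as the restore source is an assumption made for illustration; the text does not say which healthy copy is used.

```python
def heal(copies, corrupt_copy):
    """Return (restore_from, remaining) for a record with one corrupt copy."""
    healthy = [c for c in copies if c != corrupt_copy]
    if not healthy:
        raise ValueError("no healthy mirrored copy left to restore from")
    # Assumption: restore from the first healthy copy. The corrupt copy is
    # dropped from the list, which stands in for the logical delete.
    return healthy[0], healthy


def after_next_batch(remaining, new_copies):
    """The next batch writes the record again to all destination cluster groups."""
    return remaining + new_copies
```

With copies `["A0", "B0", "C0"]` and `B0` corrupt, `heal` restores from `A0` and leaves `["A0", "C0"]`; after the next batch writes `A1`, `B1` and `C1`, the record has five copies.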
Prioritization
By default the scheduler will prioritize bigger files over smaller ones.
The setting SCHEDULER_MAX_WAITING_PERIOD defines the maximum number of days since ingest the scheduler will wait before picking up a file, regardless of its size.
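One way to picture this rule is as a sort key. The sketch below is an assumption about how the two rules combine: files that have waited longer than the maximum period are placed first, and within each group bigger files come before smaller ones. The tuple layout and the function name are hypothetical.

```python
from datetime import date, timedelta


def prioritize(files, today, max_waiting_days):
    """Order files for scheduling.

    files: list of (name, size_in_bytes, ingest_date) tuples
    max_waiting_days: value of SCHEDULER_MAX_WAITING_PERIOD
    """
    deadline = today - timedelta(days=max_waiting_days)

    def key(entry):
        _name, size, ingested = entry
        overdue = ingested <= deadline
        # False sorts before True, so overdue files come first;
        # the negative size puts bigger files first within each group.
        return (not overdue, -size)

    return sorted(files, key=key)
```

A small file ingested long ago would therefore be scheduled ahead of a large file ingested yesterday.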
Parallel scheduling
By default the scheduler exports files to the same storage pool(s) until the configured https://mediahaven.atlassian.net/wiki/spaces/CS/pages/3471114243 algorithm picks a different pool.
This can mean that when there are 2 tapes available, only one tape gets used until it’s full. As writing to tape is fairly slow, this is not optimal.
In 22.1 a new feature was introduced that, if activated, writes files to tape triplets in parallel. It is controlled by the following settings:
SCHEDULER_PARALLEL_ORGANISATIONS
Configures the organisations this feature is activated for.
SCHEDULER_PARALLEL_AMOUNT
The number of tape triplets to use in parallel
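To illustrate the effect of SCHEDULER_PARALLEL_AMOUNT, the sketch below spreads files over several tape triplets. How the scheduler really divides files over triplets is not described here; the round-robin split and the function name are purely assumptions for illustration.

```python
def distribute(files, parallel_amount):
    """Spread files over `parallel_amount` tape triplets (round-robin)."""
    if parallel_amount < 1:
        raise ValueError("parallel_amount must be at least 1")
    triplets = [[] for _ in range(parallel_amount)]
    for index, name in enumerate(files):
        # Assumption: file i goes to triplet i modulo the parallel amount.
        triplets[index % parallel_amount].append(name)
    return triplets
```

With a parallel amount of 2, two triplets are written at the same time instead of filling one tape before starting the next.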
Order
The order in which export jobs are created, and hence also the order in which they are optionally limited, is determined as follows:
Case | Order | Since |
---|---|---|
The source is tape | Storage Pool ID, Position on tape | |
The source is not tape | Organisation, Archive Date, Cluster ID | |
See https://mediahaven.atlassian.net/wiki/spaces/CS/pages/20643843 for additional information about these terms
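The two cases in the table map naturally onto a sort key. In this sketch the dictionary keys (`storage_pool_id`, `position_on_tape`, `organisation`, `archive_date`, `cluster_id`) and the function names are hypothetical stand-ins for the terms in the table; the optional `limit` stands in for the limits that cap how many export jobs are created.

```python
def export_order_key(job, source_is_tape):
    """Sort key following the ordering table."""
    if source_is_tape:
        return (job["storage_pool_id"], job["position_on_tape"])
    return (job["organisation"], job["archive_date"], job["cluster_id"])


def ordered_jobs(jobs, source_is_tape, limit=None):
    """Sort candidate jobs and optionally keep only the first `limit`."""
    ordered = sorted(jobs, key=lambda job: export_order_key(job, source_is_tape))
    return ordered if limit is None else ordered[:limit]
```

Because the limit is applied after sorting, the jobs that survive a limit are always the first ones in this order.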
Future
In the future the scheduler should be rewritten to write from one super cluster to another super cluster, for example from the super cluster “gpfs-buffer” to the super cluster “tape LTO-8”.