Search for records

Introduction

Our search engine contains all records and their https://mediahaven.atlassian.net/wiki/spaces/CS/pages/42074134 and associated https://mediahaven.atlassian.net/wiki/spaces/CS/pages/2703949892. Some of these fields are indexed and can be specified in search queries.

Permissions

When a search query does not specify certain fields, the system automatically adds a default term for each of them to the query. There are two types of permissions:

  • Soft permissions are applied by default but can be circumvented by specifying the field explicitly in the query

  • Hard permissions can never be circumvented

Field

Default

Type

Description

Internal.IsInIngestSpace

+Internal.IsInIngestSpace:0

SOFT

By default only return records no longer part of a zone

Structural.VersioningStatus

+Structural.VersioningStatus:(Head Untracked)

SOFT

By default only return records whose versioning status is not Draft

Administrative.DeleteStatus

+Administrative.DeleteStatus:NotDeleted

SOFT

By default only return non-deleted records

RightsManagement.Permissions

+RightsManagement.Permissions.Read:(...)
with ... being all the groups the user is a member of

HARD

Only return records to which the user has the read right: https://mediahaven.atlassian.net/wiki/spaces/CS/pages/3938877461

Administrative.RecordPhase

+Administrative.RecordPhase:(...) with ... being all the record phases the user has access to

HARD

Only return records belonging to a record phase to which the user has access

For example, by default, only records that are no longer in an ingest space are returned. To circumvent this, manually add +Internal.IsInIngestSpace:<any value> to the query.
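The difference between soft and hard permissions can be illustrated with a minimal Python sketch. It is not the actual implementation: the function names, the simple field detection and the example hard terms are hypothetical, and only the default terms themselves are taken from the table above.

```python
import re

# Soft defaults copied from the table above: field name -> default term.
SOFT_DEFAULTS = {
    "Internal.IsInIngestSpace": "+Internal.IsInIngestSpace:0",
    "Structural.VersioningStatus": "+Structural.VersioningStatus:(Head Untracked)",
    "Administrative.DeleteStatus": "+Administrative.DeleteStatus:NotDeleted",
}


def mentions_field(query: str, field: str) -> bool:
    """Return True if the query contains a term on `field`, e.g. `Field:value`."""
    # Hypothetical detection rule: the field name is followed by a colon and
    # is not the tail of a longer dotted name.
    pattern = r"(?<![\w.])" + re.escape(field) + r":"
    return re.search(pattern, query) is not None


def apply_defaults(query: str, hard_terms: list[str]) -> str:
    """Append soft defaults for unspecified fields, then all hard terms."""
    parts = [query.strip()] if query.strip() else []
    for field, default_term in SOFT_DEFAULTS.items():
        # Soft: skipped as soon as the caller specifies the field explicitly.
        if not mentions_field(query, field):
            parts.append(default_term)
    # Hard: always appended, whatever the caller wrote.
    parts.extend(hard_terms)
    return " ".join(parts)
```

Under these assumptions, a call such as apply_defaults("+Internal.IsInIngestSpace:1", ["+Administrative.RecordPhase:(phase1)"]) keeps the caller's own ingest space term, still adds the versioning and delete status defaults, and always appends the hard term. The value phase1 is a placeholder, not a real record phase.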

Search Normalization

The underlying search functionality of Mediahaven endeavours to return as many relevant records as possible. To achieve this, several normalization steps are applied. These steps vary depending on which type of field definition is associated with a metadata field.

Global search and advanced search on TextFields

The following normalization steps are applied when a user performs a free-text search (global search) or searches within a specific field configured as a TextField:

  • Tokenization: splits the value into individual words or tokens

  • Lowercasing: converts all characters to their lowercase equivalent

  • ASCII conversion: transforms characters with diacritics or accents into their ASCII equivalents

  • Stopword filtering: removes words that occur frequently but carry little semantic meaning

  • Stemming: reduces words to their root or base form. Example: "running", "ran", and "runner" all share the same root "run"
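The five steps above can be sketched in Python as a small pipeline. This is an illustration only: the stopword list and the suffix-stripping stemmer are toy stand-ins (assumptions, not the analyzers used by the search engine), and the toy stemmer does not handle irregular forms such as "ran".

```python
import re
import unicodedata

# Illustrative stopword list; the real list is not documented here.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the"}

# Suffixes tried in order by the toy stemmer (assumption, not a real algorithm).
SUFFIXES = ("ing", "ers", "er", "ed", "s")


def ascii_fold(text: str) -> str:
    """Decompose accented characters and drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Collapse a doubled final letter, e.g. "runn" -> "run".
            if len(word) >= 2 and word[-1] == word[-2]:
                word = word[:-1]
            break
    return word


def normalize_text(value: str) -> list[str]:
    tokens = re.findall(r"\w+", value)                 # 1. tokenization
    tokens = [t.lower() for t in tokens]               # 2. lowercasing
    tokens = [ascii_fold(t) for t in tokens]           # 3. ASCII conversion
    tokens = [t for t in tokens if t and t not in STOPWORDS]  # 4. stopwords
    return [naive_stem(t) for t in tokens]             # 5. stemming
```

With this sketch, normalize_text("The Runners are running in the Café") yields ["run", "run", "cafe"], so a query and a stored value that differ only in case, accents, stopwords or inflection end up with the same tokens.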

Advanced search on SimpleFields

The following normalization steps are applied when a user searches within a specific field configured as a SimpleField (or a subclass of it):

  • Lowercasing: converts all characters to their lowercase equivalent

  • ASCII conversion: transforms characters with diacritics or accents into their ASCII equivalents
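For comparison, a minimal Python sketch of the SimpleField case is shown below. The function name is hypothetical, and treating the value as a single unit is an assumption based on the absence of tokenization, stopword filtering and stemming in the list above.

```python
import unicodedata


def normalize_simple_value(value: str) -> str:
    """Lowercase the whole value, then replace accented characters by ASCII."""
    lowered = value.lower()
    decomposed = unicodedata.normalize("NFKD", lowered)
    # Combining accents and other non-ASCII characters are dropped here.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Under these assumptions, "Crème Brûlée" and "creme brulee" normalize to the same string, while different word forms such as "running" and "run" remain distinct.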