Similarity search using Open AI (POC)
This optional module is part of release 24.4 and is still in the proof-of-concept (POC) stage.
Overview
This document provides technical information for the new module "Similarity search using Open AI (POC)", which includes the plugin "Similarity Search". The module uses embeddings generated by Open AI during the ingest process, stores the embedding vector in the record metadata, and offers an API method to retrieve similar objects for an existing object. It is the premium version of similarity search; the basic version is the module "Similarity search with perceptual hashes (POC)".
Description
Similarity search takes an existing object and returns the objects most similar to it. See https://mediahaven.atlassian.net/wiki/spaces/CS/pages/4586110979 for details.
Activation
This feature will be automatically activated for the tertiary organisation on integration environments.
An Open AI API key must be configured for the module to work correctly.
Create the following field definitions:
MapField namedDynamic.EmbeddingsPoc
VectorField namedDynamic.EmbeddingsPoc.OpenAi with dimensions = 1536 and index = true
Publish them.
Enable the module SIMILARITY_SEARCH_OPEN_AI_POC for the customer’s organisation. See the REST documentation for information on how to do that.
Obtain the Open AI API key for this environment. The development Open AI API key is stored in LastPass.
Update the property Secret of the plugin OPEN_AI_EMBEDDINGS_POC with the above Open AI API key. See the Postman request “Update secret for plugin”.
Embedding
The embedding generated using Open AI is specifically crafted to include both:
The metadata of the object
The data of the object (extracted from the preview: technically PathToPreview if it has the JPG format, otherwise PathToKeyframe)
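To illustrate how stored embedding vectors can be compared, here is a minimal sketch in Python. It assumes cosine similarity as the metric, which is a common choice for embeddings but is not confirmed by this document; the function names and the in-memory dictionary of vectors are hypothetical and only serve the example. The real vectors have 1536 dimensions, as set in the VectorField definition.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Matches the "dimensions" value of the VectorField definition in this document.
EMBEDDING_DIMENSIONS = 1536


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as not similar to anything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str,
    embeddings: Dict[str, Sequence[float]],
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Rank all other records by similarity to the query record (hypothetical helper)."""
    query = embeddings[query_id]
    scored = [
        (record_id, cosine_similarity(query, vector))
        for record_id, vector in embeddings.items()
        if record_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

In practice the ranking would be done by the search index rather than in application code (the field is defined with index = true); the sketch only shows the underlying idea.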
Caveats
The module only works on objects ingested after the activation has been fully completed.
Front-end
The MediaHaven front end will offer a context menu option for an object to show similar objects.
API
There is a new API method, GET records/:recordId/similar, to return similar objects. See the REST documentation for further information.
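As a rough illustration, the call could look like the following Python sketch. Only the path records/:recordId/similar comes from this document. The base URL, the bearer-token authentication, and the JSON response body are assumptions made for the example, so check the REST documentation for the actual values and the response format.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def similar_records_url(base_url: str, record_id: str) -> str:
    """Build the URL for GET records/:recordId/similar under a given REST base URL."""
    quoted_id = urllib.parse.quote(record_id, safe="")
    return f"{base_url.rstrip('/')}/records/{quoted_id}/similar"


def fetch_similar_records(base_url: str, record_id: str, access_token: str) -> Any:
    """Request the similar objects for one record.

    Assumptions (not confirmed by this document): bearer-token
    authentication and a JSON response body.
    """
    request = urllib.request.Request(
        similar_records_url(base_url, record_id),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call would then be fetch_similar_records(base_url, record_id, access_token), where base_url is the REST base URL of the environment and record_id identifies an object that was ingested after the module was activated.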