In the current version (24.4), this is still a proof of concept (POC).
Overview
This document provides technical support information for the new module "Full-text search (POC)", which includes the plugin "OCR (POC)". The module runs Optical Character Recognition (OCR) on objects during the ingest process and stores the extracted text in the record metadata, so that records can be searched by the text of their files.
Activation
Create a field definition Dynamic.PocOcr of type TextField. Also mark the field definition as global, so that end users get full-text search through the global search.
Link the module FULL_TEXT_SEARCH_POC to the customer's organisation. See the REST API documentation for information on how to do that.
Once the module is activated, OCR is carried out for all files that are newly ingested via the ingest 1.5 and 2.0 flows for that organisation. The extracted text is saved to the Dynamic.PocOcr field.
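To illustrate why storing the extracted text in a global field makes it searchable, here is a minimal conceptual sketch in Python. It is not the product's implementation: the record layout (a dict with a "metadata" key), the function name global_search, and the case-insensitive substring matching are all assumptions made for illustration. Only the field name Dynamic.PocOcr comes from this document.

```python
from typing import Iterable

OCR_FIELD = "Dynamic.PocOcr"  # field name taken from this document


def global_search(records: Iterable[dict], query: str,
                  global_fields: tuple = (OCR_FIELD,)) -> list:
    """Return the records in which any global field contains the query.

    Hypothetical helper: matching is a plain case-insensitive substring
    test, which is an assumption and not how the real search works.
    """
    needle = query.casefold()
    hits = []
    for record in records:
        metadata = record.get("metadata", {})
        if any(needle in str(metadata.get(field, "")).casefold()
               for field in global_fields):
            hits.append(record)
    return hits


# Hypothetical records; the OCR text is made up for the example.
records = [
    {"id": 1, "metadata": {OCR_FIELD: "Invoice 2024-017 for ACME Corp"}},
    {"id": 2, "metadata": {OCR_FIELD: "Meeting notes, quarterly review"}},
    {"id": 3, "metadata": {}},  # record without stored OCR text
]

matches = global_search(records, "invoice")
print([r["id"] for r in matches])  # prints [1]
```

A record without a value in Dynamic.PocOcr, such as record 3 above, never matches on file text, which is why the field has to be filled during ingest.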
Supported file types
Format | Supported file extensions |
---|---|
PDF files | |
Emails | |
Microsoft Office | |
Web pages | |
Plain text | |