...
Once activated, OCR is carried out for all files newly ingested via the ingest 1.5 and 2.0 flows for that organisation. The extracted text is saved to the Dynamic.PocOcr field.
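As a rough illustration of checking whether OCR text was stored for a document, the sketch below reads the Dynamic.PocOcr field from a record. It assumes, purely for illustration, that a record is available as a nested JSON-like dictionary in which the field name is a dotted path; the record shape, the `Id` key, and the helper `get_dotted` are hypothetical and not part of the product's API.

```python
from typing import Any, Optional

OCR_FIELD = "Dynamic.PocOcr"  # field name taken from the text above


def get_dotted(record: dict, path: str) -> Optional[Any]:
    """Walk a nested dict along a dotted path; return None if any part is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical record shape, for illustration only.
record = {"Id": "doc-1", "Dynamic": {"PocOcr": "Invoice 2024-001 ..."}}

ocr_text = get_dotted(record, OCR_FIELD)
# Treat a missing, empty, or whitespace-only value as "no OCR text".
has_ocr = bool(ocr_text and ocr_text.strip())
```

A missing field returns `None` instead of raising an error, which may be convenient for files ingested before OCR was activated.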
Supported file types
The following table lists the file types that are confirmed to work.
In theory, every format supported by Apache Tika should also work (see https://tika.apache.org/3.0.0-BETA/formats.html), but this is not guaranteed for formats not listed in the table below.
Format | Supported file extensions |
---|---|
PDF files | |
Emails | |
Microsoft Office | |
Web pages | |
Plain text | |
Images | |