...

Once activated, OCR runs on every file that is newly ingested through the ingest 1.5 and 2.0 flows for that organisation. The extracted text is saved to the Dynamic.PocOcr field.

Supported file types

The following table contains file types that are confirmed to work.

In theory, every format that Apache Tika supports should work: https://tika.apache.org/3.0.0-BETA/formats.html. However, this is not guaranteed for file formats that are not listed in the table below.

Format              Supported file extensions
PDF files           pdf
Emails              msg, eml
Microsoft Office    doc, docx, ppt, pptx, xls, xlsx
Web pages           htm, html, asp, php
Plain text          txt, rtf, log
Images              jpg, png, bmp
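
The table above can be turned into a quick pre-ingest check. The following is a minimal sketch, not part of the product: the names CONFIRMED_EXTENSIONS and confirmed_format are hypothetical, and the extension sets are copied by hand from the table (with the plain-text entry read as rtf), so they would need to be kept in sync with it.

```python
from pathlib import PurePath
from typing import Optional

# Extensions confirmed to work, copied by hand from the table of supported
# file types. This mapping is an illustration, not an official list.
CONFIRMED_EXTENSIONS = {
    "PDF files": {"pdf"},
    "Emails": {"msg", "eml"},
    "Microsoft Office": {"doc", "docx", "ppt", "pptx", "xls", "xlsx"},
    "Web pages": {"htm", "html", "asp", "php"},
    "Plain text": {"txt", "rtf", "log"},
    "Images": {"jpg", "png", "bmp"},
}


def confirmed_format(filename: str) -> Optional[str]:
    """Return the table's format name for a filename, or None if unconfirmed."""
    # Only the last suffix is considered, compared case-insensitively.
    ext = PurePath(filename).suffix.lower().lstrip(".")
    for format_name, extensions in CONFIRMED_EXTENSIONS.items():
        if ext in extensions:
            return format_name
    return None


if __name__ == "__main__":
    for name in ["report.PDF", "mail.eml", "scan.tiff"]:
        print(name, "->", confirmed_format(name))
```

A result of None only means the format is not on the confirmed list. Such a file may still be processed if Tika supports it, but that is not guaranteed.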