Duplicate Prevention

Introduction

Duplicate prevention stops the same file from being stored more than once. Keeping several copies of the same file wastes storage and creates redundant metadata. Duplicates are detected by comparing MD5 checksums, which are always calculated during ingest.
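As an illustration only, checksum-based detection can be sketched as follows. The names md5_checksum, ChecksumIndex, ingest and DuplicateFileError are hypothetical. The in-memory dictionary stands in for whatever store the system actually uses, and rejecting a duplicate by raising an exception is an assumption about how the rejection is reported.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Optional


class DuplicateFileError(Exception):
    """Hypothetical error raised when an ingested file's checksum is already known."""


def md5_checksum(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


class ChecksumIndex:
    """Hypothetical in-memory index mapping MD5 checksum -> stored file id."""

    def __init__(self) -> None:
        self._by_checksum: Dict[str, str] = {}

    def find(self, checksum: str) -> Optional[str]:
        return self._by_checksum.get(checksum)

    def add(self, checksum: str, file_id: str) -> None:
        self._by_checksum[checksum] = file_id


def ingest(index: ChecksumIndex, file_id: str, stream: BinaryIO) -> str:
    """Compute the checksum during ingest and reject content already indexed."""
    checksum = md5_checksum(stream)
    existing = index.find(checksum)
    if existing is not None:
        raise DuplicateFileError(f"{file_id} duplicates {existing}")
    index.add(checksum, file_id)
    return checksum


if __name__ == "__main__":
    index = ChecksumIndex()
    ingest(index, "file-1", io.BytesIO(b"hello"))
    try:
        ingest(index, "file-2", io.BytesIO(b"hello"))
    except DuplicateFileError as err:
        print(err)  # file-2 duplicates file-1
```

Reading the stream in chunks keeps memory use flat for large files, since MD5 can be updated incrementally.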

Configuration

Duplicate prevention can be disabled per organisation, if needed, by setting the organisation setting allow_duplicate_files to true, either through the API or in the Settings & Management module.
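A minimal sketch of how the organisation setting could gate the duplicate check. Only allow_duplicate_files comes from this page; OrganisationSettings and is_rejected_as_duplicate are hypothetical, and the sketch assumes the set of known checksums belongs to the same organisation. It does not show the actual API request or the Settings & Management screen.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class OrganisationSettings:
    """Hypothetical model of the per-organisation setting."""

    # Assumed default: prevention is active unless this is set to True.
    allow_duplicate_files: bool = False


def is_rejected_as_duplicate(
    settings: OrganisationSettings,
    known_checksums: Set[str],
    checksum: str,
) -> bool:
    """Return True if an upload with this checksum should be rejected."""
    if settings.allow_duplicate_files:
        # Prevention is disabled for this organisation: accept every upload.
        return False
    # Assumption: known_checksums holds checksums of this organisation's files.
    return checksum in known_checksums
```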

Edge cases

The system will still allow duplicate files in the following edge cases:

  1. The same file is uploaded simultaneously, either through the same API or through different APIs.

  2. A file is uploaded and then deleted, the same file is uploaded again (creating a new file), and the deleted original is then restored from the recycle bin. The same file now exists twice.
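The following is a hypothetical model, not the system's real implementation, of one way both edge cases can arise. It assumes the duplicate check only looks at active files (so a file in the recycle bin does not block a re-upload), that restoring from the recycle bin does not repeat the check, and that simultaneous uploads each run the check before either has been stored. FileStore and its methods are illustrative names.

```python
from typing import Dict


class FileStore:
    """Hypothetical store used only to illustrate the two edge cases."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}       # file_id -> checksum
        self.recycle_bin: Dict[str, str] = {}  # file_id -> checksum

    def is_duplicate(self, checksum: str) -> bool:
        # Assumed: only active files are checked, not the recycle bin.
        return checksum in self.active.values()

    def commit(self, file_id: str, checksum: str) -> None:
        self.active[file_id] = checksum

    def delete(self, file_id: str) -> None:
        self.recycle_bin[file_id] = self.active.pop(file_id)

    def restore(self, file_id: str) -> None:
        # Assumed: restoring does not repeat the duplicate check.
        self.active[file_id] = self.recycle_bin.pop(file_id)

    def copies(self, checksum: str) -> int:
        return sum(1 for c in self.active.values() if c == checksum)


def simultaneous_uploads(store: FileStore, checksum: str) -> None:
    """Edge case 1: both uploads check before either has been committed."""
    ok_a = not store.is_duplicate(checksum)
    ok_b = not store.is_duplicate(checksum)
    if ok_a:
        store.commit("upload-a", checksum)
    if ok_b:
        store.commit("upload-b", checksum)


def delete_reupload_restore(store: FileStore, checksum: str) -> None:
    """Edge case 2: upload, delete, upload again, then restore the original."""
    store.commit("original", checksum)
    store.delete("original")
    # The original is in the recycle bin, so the check passes.
    if not store.is_duplicate(checksum):
        store.commit("reupload", checksum)
    store.restore("original")
```

The interleaving of the two simultaneous uploads is written out step by step instead of using real threads, so the outcome is deterministic. In both scenarios store.copies(checksum) ends up as 2.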