6 steps for classifying documents effectively

  1. Set Up Tags & Keywords
    • Before any scanning begins, the business must define the basis on which documents will be categorized. For example, HR documents might be categorized by keywords such as Employee ID, Employee Name and Date of Joining.
    • Your Document Management System (DMS) must be capable of setting up custom “Document Types” with custom Tags & Keywords for those document types.
    • These tags & keywords will be crucial to effectively filing documents in the correct folders, assigning the correct permissions, assigning the correct destruction policies and for search and retrieval.
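As a rough illustration of the idea above, a document type can be modelled as a name plus the tags it requires. This is only a sketch: the `DocumentType` class, the `missing_tags` helper and the sample values are hypothetical and do not correspond to any particular DMS.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentType:
    """A custom document type and the tags/keywords it requires (hypothetical model)."""
    name: str
    required_tags: list[str] = field(default_factory=list)

    def missing_tags(self, tags: dict[str, str]) -> list[str]:
        """Return the required tags that are absent or empty in `tags`."""
        return [t for t in self.required_tags if not tags.get(t)]


# Assumed HR document type using the keywords from the example above.
HR_RECORD = DocumentType(
    name="HR Record",
    required_tags=["Employee ID", "Employee Name", "Date of Joining"],
)

# A document whose joining date has not been captured yet (made-up values).
incomplete = {"Employee ID": "E-1042", "Employee Name": "Asha Rao"}
print(HR_RECORD.missing_tags(incomplete))
```

A check like this could run before filing, so that a document with incomplete tags is sent back to the operator rather than filed in the wrong place.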
  2. Set Up Rules for Document Destruction
    • Where possible, documents should not be kept indefinitely. This saves storage space and reduces the organisation’s exposure to litigation.
    • Most document types can be assigned a Retention Period – the number of years the document must be kept. This can be decided through internal consultations with the document type’s users and with your organisation’s legal team.
    • Each custom “Document Type” in the DMS must have a retention period field. Based on this field and the date of creation (or the date of declaration as a record), the DMS should automatically assign an expiry date.
    • When a document reaches its expiry date, the DMS should start a workflow that informs users the document is about to be destroyed and gives them the option to stop the destruction if they provide a valid reason.
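The expiry calculation described above is simple date arithmetic. The sketch below assumes the retention period is stored as a whole number of years; the function names `expiry_date` and `is_due_for_review` are illustrative, not part of any specific DMS.

```python
from datetime import date


def expiry_date(start: date, retention_years: int) -> date:
    """Return the date on which the retention period ends.

    `start` is the creation date or the date of declaration as a record.
    """
    try:
        return start.replace(year=start.year + retention_years)
    except ValueError:
        # `start` is 29 February and the target year is not a leap year.
        return start.replace(year=start.year + retention_years, day=28)


def is_due_for_review(start: date, retention_years: int, today: date) -> bool:
    """True once the document has reached its expiry date."""
    return today >= expiry_date(start, retention_years)


print(expiry_date(date(2018, 3, 15), 7))
print(is_due_for_review(date(2018, 3, 15), 7, today=date(2025, 3, 14)))
```

In this sketch, a document that is due for review would trigger the notification workflow rather than being deleted immediately, which leaves room for users to object.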
  3. Set Up Access Control Rules
    • Each user in the DMS should have access only to the documents they are authorized to see. Further, it should be decided in advance which users from which departments can view, edit or delete documents in which repositories.
    • The DMS should be capable of providing multiple levels of access to different users and user groups.
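One way to picture these rules is as a table mapping a user group and a repository to the actions allowed. The group names, repository names and permissions below are invented for illustration; a real DMS will have its own permission model.

```python
from enum import Enum


class Action(Enum):
    VIEW = "view"
    EDIT = "edit"
    DELETE = "delete"


# Hypothetical rules: (user group, repository) -> actions that group may perform.
ACCESS_RULES = {
    ("hr_staff", "HR"): {Action.VIEW, Action.EDIT},
    ("hr_managers", "HR"): {Action.VIEW, Action.EDIT, Action.DELETE},
    ("finance_staff", "Finance"): {Action.VIEW},
}


def is_allowed(groups: list[str], repository: str, action: Action) -> bool:
    """Return True if any of the user's groups grants `action` on `repository`."""
    return any(
        action in ACCESS_RULES.get((group, repository), set())
        for group in groups
    )


print(is_allowed(["hr_staff"], "HR", Action.DELETE))
print(is_allowed(["hr_staff", "hr_managers"], "HR", Action.DELETE))
```

Anything not listed in the table is denied, which matches the principle that users should only reach documents they are explicitly authorized for.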
  4. Capture Documents in the Correct Format
    • Documents are often captured in unsuitable formats or at poor quality, making them difficult for both machines and people to read.
    • Common scanning mistakes are:
      • Selecting a very low resolution
      • Selecting a very high resolution – making the file unnecessarily large
      • Saving in JPEG or other lossy formats that reduce image quality
      • Warped paper alignment, resulting in lopsided images that are not machine readable
      • Marks and tears on the original paper
    • Ideally, images will be scanned:
      • At 150-200 dpi resolution
      • In TIFF format (which can easily be OCRed and stored as PDF later on)
      • With the page properly aligned
      • From originals free of wrinkles, tears or marks, though these can’t always be avoided
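The resolution and format guidance above can be turned into a simple pre-scan check. This sketch only encodes the thresholds stated in this step; the function name `scan_setting_warnings` and the set of lossy formats are assumptions made for illustration.

```python
# Formats treated as lossy in this sketch (an assumption, not an exhaustive list).
LOSSY_FORMATS = {"jpeg", "jpg"}


def scan_setting_warnings(dpi: int, file_format: str) -> list[str]:
    """Return warnings for scan settings outside the guidance given above."""
    warnings = []
    if dpi < 150:
        warnings.append("resolution is below 150 dpi")
    elif dpi > 200:
        warnings.append("resolution is above 200 dpi; the file may be unnecessarily large")
    if file_format.lower() in LOSSY_FORMATS:
        warnings.append("lossy format reduces image quality; consider TIFF")
    return warnings


print(scan_setting_warnings(100, "JPEG"))
print(scan_setting_warnings(200, "tiff"))
```

Alignment and paper damage cannot be checked from settings alone, so those still need a visual check by the scanning operator.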
  5. OCR the Documents
    • Optical Character Recognition (OCR) is a mature technology, and most DMSs should provide it.
    • A good OCR system creates a searchable PDF version of scanned TIFF/JPEG/PDF images.
    • A good OCR system will allow you to specify fields to be picked up automatically based on pattern recognition. For example, in the HR document example above, the system could be trained to automatically populate the values for Employee ID, Employee Name and Date of Joining.
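To make the field pickup concrete, the sketch below uses regular expressions as a simple stand-in for the pattern recognition a real OCR product would provide. The label wording, the date format and the sample text are all assumptions; actual OCR output will be noisier and will need more tolerant patterns.

```python
import re

# Hypothetical patterns: each assumes the field appears as "Label: value" on one line.
FIELD_PATTERNS = {
    "Employee ID": re.compile(r"Employee ID:\s*(\S+)"),
    "Employee Name": re.compile(r"Employee Name:\s*(.+)"),
    "Date of Joining": re.compile(r"Date of Joining:\s*(\d{4}-\d{2}-\d{2})"),
}


def extract_fields(ocr_text: str) -> dict[str, str]:
    """Return the tag values that could be found in the OCR text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


# Made-up OCR output for an HR document.
sample = "Employee ID: E-1042\nEmployee Name: Asha Rao\nDate of Joining: 2019-06-03\n"
print(extract_fields(sample))
```

Fields that do not match are simply left out of the result, so they can be flagged for manual entry.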
  6. Smartly Categorize using Automated Tools
    • The OCR step above can pick up some tag/keyword values automatically.
    • The scanning operator may have to enter some additional tags/keywords manually.
    • Once all relevant tags/keywords are populated, the system should automatically put the documents in the correct folders, give the correct permissions and assign the correct destruction policies.
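The final filing step can be sketched as a lookup from document type to a filing policy. Everything named here is hypothetical: the `FilingPolicy` fields, the folder path, the group name and the 7-year retention value are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingPolicy:
    """Where a document type is filed and which rules apply (hypothetical model)."""
    base_folder: str
    subfolder_tag: str      # tag whose value names the subfolder
    access_group: str
    retention_years: int


# Assumed policy table; the values are illustrative only.
POLICIES = {
    "HR Record": FilingPolicy("/HR/Employees", "Employee ID", "hr_staff", 7),
}


def file_document(doc_type: str, tags: dict[str, str]) -> dict[str, object]:
    """Derive the folder, access group and retention period from type and tags."""
    policy = POLICIES[doc_type]
    return {
        "folder": f"{policy.base_folder}/{tags[policy.subfolder_tag]}",
        "access_group": policy.access_group,
        "retention_years": policy.retention_years,
    }


print(file_document("HR Record", {"Employee ID": "E-1042"}))
```

Because the folder, permissions and destruction policy all follow from the document type and its tags, the quality of the tagging in the earlier steps determines how reliable this automation is.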
At EisenVault we strive to make document classification a breeze. Our team will consult with you initially to set up all the tags, keywords, permissions, access control rules and destruction policies. The system will then automatically perform categorization using advanced OCR technology.