Document Analytics uses optical character recognition (OCR) technologies to process documents so that they can be searched and analyzed; combined with Natural Language Processing (NLP), it provides a powerful solution that is easy to use. OCR is widely used for data entry from printed paper records such as insurance, legal and other forms, passport documents, invoices, bank statements and business cards. Over the years, OCR technologies have evolved into specialized domain-specific engines such as receipt OCR, check OCR and invoice OCR.
It is fair to say that OCR technology itself is a commodity. There are many open-source as well as commercial tools that provide varying levels of accuracy for selected domains. However, these tools typically work in batch mode on a single system, sequentially processing files over hours or days.
With the explosion of data and of big data technologies to handle it, these stand-alone tools no longer meet business needs. Businesses today need to process large numbers of documents in near real time, in such a way that they can quickly index, search, analyze and aggregate the content. They can no longer afford to manually create individual templates, manually check the results of the OCR and manually fix incorrect recognitions.
Orzota BigForce Document Analytics Solution
The Orzota BigForce Document Analytics Solution uses Big Data technologies to process a large number of documents (image files, PDFs, etc.), automatically indexing them along the way to provide instant search and analysis capabilities. A cognitive engine provides an easy-to-use interface for the mobile user, using NLP to search for and return the requested data. A fully functional big data search facility provides powerful regular expression search for the desktop user as well.
The raw files are stored in the Hadoop Distributed File System (HDFS). Depending on the type of document, an appropriate OCR engine is chosen, and the files are processed in parallel using Spark. Sophisticated NLP algorithms make intelligent decisions using dynamically generated templates for various classes of documents.
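As a rough illustration of the two ideas in this step (choosing an engine by document type, then processing files in parallel), here is a minimal Python sketch. It is not the BigForce implementation: a standard-library thread pool stands in for Spark, in-memory byte strings stand in for files in HDFS, and the engine functions (`invoice_ocr`, `receipt_ocr`, `generic_ocr`) and the `ENGINES` table are hypothetical placeholders that only tag their input rather than performing real OCR.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real OCR engines. Each one just tags the
# decoded input so the dispatch logic can be observed.
def invoice_ocr(data: bytes) -> str:
    return "invoice-text:" + data.decode("utf-8")

def receipt_ocr(data: bytes) -> str:
    return "receipt-text:" + data.decode("utf-8")

def generic_ocr(data: bytes) -> str:
    return "generic-text:" + data.decode("utf-8")

# Assumed mapping from document class to the engine that handles it.
ENGINES = {"invoice": invoice_ocr, "receipt": receipt_ocr}

def process_document(doc):
    """Pick an engine for one (doc_type, data) pair and run it."""
    doc_type, data = doc
    engine = ENGINES.get(doc_type, generic_ocr)  # fall back to a generic engine
    return doc_type, engine(data)

def process_all(docs, workers=4):
    """Process documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_document, docs))

docs = [("invoice", b"INV-001"), ("receipt", b"R-17"), ("passport", b"P-9")]
results = process_all(docs)
for doc_type, text in results:
    print(doc_type, text)
```

In a cluster setting, the same per-document function would be applied across a distributed collection of files instead of a local thread pool; the dispatch table is the part that carries over unchanged.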
The resulting data is automatically indexed, making it instantly searchable. The data is also converted into its target elements (e.g., various form fields, sender, receiver, address) and stored in a columnar database such as HBase.
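To make the indexing and field-extraction step more concrete, the following Python sketch builds a small in-memory inverted index over recognized text and pulls out two fields with regular expressions. Everything here is an assumption for illustration: the `From:`/`To:` line layout, the `FIELD_PATTERNS` table and the function names are hypothetical, a plain dictionary stands in for the search index, and the extracted fields are simply returned rather than written to HBase.

```python
import re
from collections import defaultdict

# Hypothetical patterns for two target fields; real documents would need
# templates matched to each document class.
FIELD_PATTERNS = {
    "sender": re.compile(r"^From:\s*(.+)$", re.MULTILINE),
    "receiver": re.compile(r"^To:\s*(.+)$", re.MULTILINE),
}

def extract_fields(text):
    """Return the target fields found in one document's recognized text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1).strip()
    return fields

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    "doc1": "From: Acme Corp\nTo: Jane Doe\nInvoice total 120 USD",
    "doc2": "From: Jane Doe\nTo: Acme Corp\nReceipt for 45 USD",
}
index = build_index(docs)
print(search(index, "invoice usd"))
print(extract_fields(docs["doc1"]))
```

Each extracted field maps naturally to a column keyed by document id, which is the shape a store such as HBase expects.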
A customizable analytics layer provides an easy-to-use interface and powerful analytics based on specific customer requirements. Full search capabilities let you search and view results instantly. For mobile users, a cognitive engine provides an easy-to-use free-form query interface (see sample query in diagram above).
Orzota BigForce Document Analytics Solution Advantages
The Orzota BigForce Document Analytics Solution provides many advantages over traditional tools and methods of processing documents:
- Process large numbers of documents in parallel, in near real time
- Identify templates and classes of documents
- Configure and use an appropriate OCR engine based on the document
- Search for almost anything in the processed documents
- Allow for sophisticated business insights and analytics
- Search for documents in natural language through the cognitive (AI) engine
Sign up for a demo today and find out how the Orzota BigForce Document Analytics Solution can help your business.