Optical Character Recognition

 

Introduction

Optical Character Recognition (OCR) is a Catalyst Zia AI-driven service that electronically detects handwritten or printed characters in images or digital documents, and converts the detected characters to machine-encoded text. Zia detects text in photos and scanned documents, breaks the text down into individual characters, and identifies the language it is written in.

The recognized text is presented as a JSON response, along with a confidence score that indicates its accuracy. You can code the Catalyst application to store the recognized data or process it further in any way you require. Zia OCR can automatically detect and recognize text in 19 languages: 9 international languages and 10 Indian languages.

OCR is widely used in web and mobile applications that are created to read content from scanned or photographed documents, flyers, menus, posters, signs, and other files containing text. The identified text can be stored digitally or used for further data processing.   

Catalyst provides Zia OCR in the Java and Node.js SDK packages, and you can integrate it in your Catalyst web or Android application. The Catalyst console provides easy access to code templates for these environments that you can implement in your application's code.

You can also test Zia OCR in the console by uploading sample images or documents that contain text and obtaining the recognized text. This gives you a better idea of Zia's accuracy and the OCR response format.

You can refer to the Java SDK documentation and Node.js SDK documentation for code samples of Zia OCR. Refer to the API documentation to learn about the API available for OCR.

 

Key Concepts

Before you learn about the use cases and implementation of Optical Character Recognition, it's important to understand its fundamental concepts in detail.

Text Recognition Process

OCR systems in general follow a top-down approach to the text detection and identification process. 

When an image or a digital document is submitted to Zia OCR, the text detection and recognition process proceeds as follows: 

  1. Zia analyzes the structure of the image and divides it into blocks of contiguous sets of textual lines, like paragraphs. 
    Note: A block could also contain pictorial content. However, any content that is not text, such as diagrams, symbols, or images will not be identified by Zia OCR. 
  2. Zia then breaks the blocks down further and identifies individual lines of text. 
  3. The lines of text are then divided into words and each word is broken down into individual characters. 
  4. Zia compares the characters it has detected with its dataset and runs advanced algorithms and analysis to identify the characters and recognize words based on the character groupings.
  5. Zia also identifies the language the content is in by processing it through volumes of probabilities and hypotheses using Intelligent Character Recognition (ICR) technology. 
  6. The processed and recognized text is finally returned to the user as either a JSON or a document response.
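
The top-down order of steps 1 to 3 can be illustrated with a minimal sketch. It is not Zia's implementation: it works on already-digital text rather than pixels, and the function name decompose and the blank-line rule for separating blocks are assumptions made only for illustration.

```python
def decompose(text):
    """Split text top-down: blocks -> lines -> words -> characters.

    Illustrative only. Assumes blocks are separated by a blank line,
    which is a stand-in for the layout analysis an OCR engine performs.
    """
    hierarchy = []
    for block in text.split("\n\n"):            # step 1: blocks
        if not block.strip():
            continue
        lines = []
        for line in block.splitlines():         # step 2: lines in a block
            words = line.split()                # step 3: words in a line
            if words:
                # step 3 (continued): each word becomes a list of characters
                lines.append([list(word) for word in words])
        hierarchy.append(lines)
    return hierarchy


sample = "Zia OCR\nreads text\n\nSecond block"
result = decompose(sample)
print(len(result), "blocks;", len(result[0]), "lines in the first block")
```

Each level of the nested list corresponds to one level of the process: result[0] is the first block, result[0][0] its first line, and result[0][0][1] the characters of the second word in that line.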
 

Model Types

Zia OCR enables you to provide files of the following five model types:

  • OCR: The user can specify this model type in the input for all general image and document files that need to be processed for optical character recognition. Zia will perform a generic text recognition and text analysis processing.
  • AADHAAR: The user can specify this model type in the input when they specifically require Indian Aadhaar card image files to be processed for optical character recognition. For this type, the user must upload two images, one of the front and one of the back of the Aadhaar card. You cannot test the AADHAAR model type in the console, and the Java and Node.js SDKs do not support inputs of the AADHAAR model. You can process this model only using the API.
  • PASSBOOK: The user can specify this model type in the input when they specifically require Indian bank passbook image files to be processed for optical character recognition. Zia will then restrict its processing to the format of a passbook and provide quicker results.
  • CHEQUE: The user can specify this model type in the input when they specifically require Indian cheque image files to be processed for optical character recognition. Zia only processes cheques of the CTS-2010 format. You will not be able to test the CHEQUE type from the console either. You can process this type using the API.
  • PAN: The user can specify this model type in the input when they specifically require Indian PAN card image files to be processed for optical character recognition. Zia will recognize the textual characters in the PAN card. You will be able to process this type only using the API.
Note: The AADHAAR, PASSBOOK, CHEQUE, and PAN model types are only relevant to Indian users. These model types are also not available to users accessing from the EU data center. Non-Indian users and users from the EU DC can access the general OCR model type.

The model type is optional while sending the API request. If it is not specified, Zia will process the file as an OCR type by default.
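
The defaulting rule and the EU data center restriction described above could be checked in application code before a request is sent. The following is a minimal sketch under stated assumptions: the helper name resolve_model_type, the eu_data_center flag, and the case normalization are hypothetical and are not part of the Catalyst SDKs or API.

```python
# Model type names as listed in this section.
MODEL_TYPES = {"OCR", "AADHAAR", "PASSBOOK", "CHEQUE", "PAN"}
# Types the note above describes as unavailable from the EU data center.
INDIA_SPECIFIC_TYPES = MODEL_TYPES - {"OCR"}
DEFAULT_MODEL_TYPE = "OCR"


def resolve_model_type(model_type=None, eu_data_center=False):
    """Return the model type to send, applying the documented default.

    Hypothetical helper: upper-casing the input is a convenience of this
    sketch, not documented API behavior.
    """
    if model_type is None:
        return DEFAULT_MODEL_TYPE
    name = model_type.upper()
    if name not in MODEL_TYPES:
        raise ValueError(f"Unknown model type: {model_type}")
    if eu_data_center and name in INDIA_SPECIFIC_TYPES:
        raise ValueError(f"{name} is not available from the EU data center")
    return name


print(resolve_model_type())            # OCR
print(resolve_model_type("passbook"))  # PASSBOOK
```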

 

Supported Languages

The OCR and PASSBOOK models can detect and recognize textual content in 9 international languages and 10 Indian languages. However, the AADHAAR model only supports the 10 Indian languages and English.

The CHEQUE model processes the content in English by default, and does not support any other languages.

Languages Supported by the OCR, PASSBOOK, and AADHAAR Models

  1. English
  2. Hindi
  3. Bengali
  4. Marathi
  5. Telugu
  6. Tamil
  7. Gujarati
  8. Urdu
  9. Kannada
  10. Malayalam
  11. Sanskrit

Additional International Languages Supported by OCR and PASSBOOK Models

  1. Arabic
  2. Chinese
  3. French
  4. Italian
  5. Japanese
  6. Portuguese
  7. Romanian
  8. Spanish

If the user doesn't specify the language, Zia can detect the language automatically. Zia can recognize handwritten content as long as the text is legible, clear, and uses a standard font structure. However, it cannot recognize any non-textual content such as images or diagrams.
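
The language support described in this section can be captured in a small lookup table. This is a sketch only: it uses the language names from the lists above rather than the API's language keys (see the API documentation for those), the helper name is_language_supported is hypothetical, and the PAN model is left out because this section does not state which languages it supports.

```python
INDIAN_LANGUAGES = {
    "Hindi", "Bengali", "Marathi", "Telugu", "Tamil",
    "Gujarati", "Urdu", "Kannada", "Malayalam", "Sanskrit",
}
INTERNATIONAL_LANGUAGES = {
    "English", "Arabic", "Chinese", "French", "Italian",
    "Japanese", "Portuguese", "Romanian", "Spanish",
}

# Per-model support as described in the prose of this section.
SUPPORTED_LANGUAGES = {
    "OCR": INDIAN_LANGUAGES | INTERNATIONAL_LANGUAGES,
    "PASSBOOK": INDIAN_LANGUAGES | INTERNATIONAL_LANGUAGES,
    "AADHAAR": INDIAN_LANGUAGES | {"English"},
    "CHEQUE": {"English"},
}


def is_language_supported(model_type, language):
    """Return True if the model type can recognize the given language."""
    if model_type not in SUPPORTED_LANGUAGES:
        raise KeyError(f"No language list known for model type: {model_type}")
    return language in SUPPORTED_LANGUAGES[model_type]


print(is_language_supported("OCR", "French"))      # True
print(is_language_supported("AADHAAR", "French"))  # False
```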

 

Input Format

Zia OCR supports input files in the following formats for processing:

  1. .jpg/.jpeg
  2. .png
  3. .tiff
  4. .bmp
  5. .pdf

You could provide a space for the user to upload the image or document file from the device's memory to the Catalyst application. You can also code the Catalyst application to use the end user device's camera to capture a photo with textual content, and process the image as the input file.

The input provided using the API contains the source file, the language of the text to be recognized (optional), and the model type (optional).

You can check the request format from the API documentation.

The user must follow these guidelines while providing the input, for better results:

  • Avoid providing blurred or unrecognizable text in images.
  • Ensure that the text in an image file is clear, visible, and legible.
  • If handwritten text is present in an image file, ensure that it uses a standard font.
  • The image size must not be too small.
  • For the AADHAAR model type, two images of the front and back of the Aadhaar card must be provided.
  • The file size must not exceed 20 MB. For the AADHAAR model type, each image of the front and back must be under 20 MB.
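
The checkable parts of these guidelines (file format, file size, and the two-image rule for AADHAAR) can be verified before a file is submitted. The sketch below is one possible approach, not part of Catalyst: the helper name validate_input and its input shape are hypothetical, and it assumes that 20 MB means 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".bmp", ".pdf"}
# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 20 * 1024 * 1024


def validate_input(files, model_type="OCR"):
    """Check (filename, size_in_bytes) pairs against the guidelines.

    Returns a list of problems; an empty list means the input passed
    the checks that can be made without inspecting the image itself.
    """
    problems = []
    if model_type == "AADHAAR" and len(files) != 2:
        problems.append("AADHAAR needs two images: front and back")
    for name, size in files:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 20 MB")
    return problems


print(validate_input([("menu.PNG", 350_000)]))   # []
print(validate_input([("notes.docx", 12_000)]))  # ['notes.docx: unsupported format']
```

Clarity and legibility of the text cannot be checked this way; those guidelines still depend on the quality of the captured image.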
 

Response Format

Zia returns the response of OCR processing in the following ways:

  • In the Console
    When you upload a sample image or a document file to be processed in the console, it will return the response in two formats:
    • Document response: This returns formatted, readable text that is visually segregated into lines and paragraphs based on the original content, along with a confidence score for the OCR model type, expressed as a percentage.
    • JSON response: This returns the recognized text in JSON format along with the confidence score for the OCR model type.
  • Using the SDKs
    When you send an image or document file using the SDKs or an API request, you will receive a JSON response containing the recognized text in the same format mentioned above. Using the SDKs, you can customize how the JSON response is formatted in your code. For example, you can return separate paragraphs or individual words from a line as the response. For more information, refer to the Java and Node.js SDK documentation.
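
As a rough illustration of post-processing a JSON response into paragraphs, lines, and words, consider the sketch below. The response shape is an assumption: the keys "text" and "confidence", the sample values, and the use of a blank line between paragraphs are hypothetical, so check the API documentation for the actual format. The helper name split_response is also hypothetical.

```python
import json


def split_response(raw_json):
    """Break a (hypothetical) OCR JSON response into smaller units.

    Assumes the payload has a "text" string and an optional "confidence"
    value, with paragraphs separated by a blank line.
    """
    payload = json.loads(raw_json)
    text = payload["text"]
    # Each paragraph becomes a list of its lines.
    paragraphs = [p.splitlines() for p in text.split("\n\n") if p.strip()]
    words = text.split()
    return payload.get("confidence"), paragraphs, words


# Made-up sample payload, used only to exercise the helper.
sample_response = json.dumps(
    {"confidence": 92.5, "text": "Pay to\nJohn Doe\n\nDate 01-01-2024"}
)
confidence, paragraphs, words = split_response(sample_response)
print(confidence)
print(paragraphs)
print(words)
```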
 

Benefits

  1. Automatic Language Detection
    Zia can operate in an automatic mode and run its algorithms to detect and identify the language of the text in an input file. However, to speed up the text recognition process, the user can specify the language of the text, if they know it, while submitting it for OCR processing. This enables Zia to restrict its processing and analysis to a limited dataset, leading to quicker and more accurate results.
  2. Rapid Performance
    Zia OCR processes files and generates the results in a fast and effective manner. Catalyst ensures a high throughput of data transmission and a minimal latency in serving requests. The quick response time enhances your application's performance, and provides a satisfying experience for the end user.
  3. Highly Accurate and Reliable Results
    Zia is an AI-driven assistant that undergoes repeated systematic training to generate results with higher accuracy and a lower error margin. The AI is trained using various machine learning techniques to perform complex computations and analysis. The training process is highly rigorous: it studies and analyzes large volumes of data, which ensures that the results generated are precise, accurate, and reliable.
  4. Seamless Integration
    You can easily implement OCR in your application without having to worry about the underlying logic or the backend set-up. You can implement the ready-made code templates provided for the Java and Node.js platforms in any of your Catalyst applications that require the use of OCR.
  5. Testing in the Console
    The testing feature in the console enables you to verify the efficiency of Zia OCR. You can upload sample images and documents with text, and view the results. This allows you to get an idea about the format and accuracy of the response that will be generated when you implement it in your application.
 

Use Cases

Text detection and recognition technologies are implemented in a wide range of applications and scenarios. The following are some use cases for Zia OCR:

  • A text conversion Android application implements Zia OCR to convert handwritten and hard copy text content into digital documents. The application scans a picture of the source text and submits it for OCR processing, where the content is broken down and individual characters are recognized and grouped. Catalyst produces the final intelligible result within seconds, and the application displays it to the end user. The user can then edit, format, or search the text file like any other digital document.
  • An application linked to traffic cameras implements Zia OCR to read the license plate registration numbers of traffic rule offenders from captured images. The images are uploaded automatically, and Zia performs OCR processing on them to decipher the registration numbers. The recognized registration numbers are processed further to obtain the identity of the vehicle owner. The application also stores the data in Catalyst Data Store tables.

Some other examples where Zia OCR can be implemented include:

  • An application that recognizes text in image files and converts them to PDFs with the text, or vice-versa
  • An application that enables you to quickly digitize your Aadhaar cards and bank passbooks and store them as PDF files
  • A crime investigation application that scans crime scene photos for textual content such as from street signs, billboards, or graffiti
  • Applications that digitize magazines, journals, contracts, pamphlets or posters by scanning hard copies of them
 

Implementation

This section only covers working with OCR in the Catalyst console. Refer to the SDK and API documentation sections for implementing Zia OCR in your application's code.

As mentioned earlier, you can access the code templates that will enable you to integrate OCR in your Catalyst application from the console, and also test the feature by uploading sample images and documents and obtaining the recognized text.

Access Optical Character Recognition

To access Optical Character Recognition in your Catalyst console:

  1. Navigate to Zia Services under Discover, then click Access Now on the Optical Character Recognition window.
  2. Click Try a Demo in the Optical Character Recognition feature page.

    This will open the Optical Character Recognition feature.
 

Test Optical Character Recognition in the Catalyst Console

You can test OCR by either selecting a sample image or PDF file from Catalyst or by uploading your own file.

To process a sample file and obtain the results:

  1. Click Select a Sample Image in the box.
  2. Select an image or a PDF file from the samples provided.

    OCR will process the file and detect and identify the textual content in it. Since it is a sample file, the language of the text and the model type are provided by Catalyst automatically.

    The recognized text is displayed in the console under the Result section.

    You can view the JSON response by clicking View Response.

To upload your own image or PDF file with text:

  1. Click Upload under the Result section.

    If you're opening Optical Character Recognition after you have closed it, click Browse Files in this box.
  2. Upload a file from your local system.
    Note: The file must be in .jpg/.jpeg, .png, .bmp, .tiff, or .pdf format. The file size must not exceed 20 MB.
  3. Select the model type and the languages in the file's text, if you are aware of them. You can select General for the OCR model or Passbook for the PASSBOOK model. You can select multiple languages, if the file contains text in multiple languages.
  4. Click Proceed.

The console will process the file and display the recognized textual content, and the confidence score if it is of the OCR model type. You can copy the recognized text using the copy icon.

You can check the JSON response in a similar way.


 

Access Code Templates for Optical Character Recognition

You can implement Optical Character Recognition in your Catalyst application using the code templates provided by Catalyst for Java and Node.js platforms.

You can access them from the section below the test window. Click either the Java SDK or NodeJS SDK tab, and copy the code using the copy icon. You can paste this code in your web or Android application's code wherever you require.

In Java, you can process the input file as a new File, specify the model type using ZCOCRModelType and the languages using setLanguageCode. Refer to the API documentation for the keys of the supported languages and model types.

As mentioned earlier, you can format the JSON response that you receive. The Java code enables you to obtain specific paragraphs, individual lines in a paragraph, or individual words in a line.

The Node.js code processes the input file as the object ocrPromise. You can provide the input file name, set the model type using modelType, and the languages using language.
