Optical Character Recognition
Optical Character Recognition (OCR) is a Zia AI-driven service that electronically detects handwritten or printed characters in images or digital documents and converts them to machine-encoded text. Zia detects text in photos and scanned documents, breaks it down into individual characters, and identifies the language it is in.
The recognized text is presented as a JSON response, along with a confidence score that informs you of its accuracy. You can code the Catalyst application to store the recognized data or process it further in any way you require. Zia OCR can automatically detect and recognize texts in 10 major languages.
OCR is widely used in web and mobile applications that are created to read content from scanned or photographed documents, flyers, menus, posters, signs, and other files containing text. The identified text can be stored digitally or used for further data processing.
Catalyst provides Zia OCR in the Java and Node.js SDK packages, and you can integrate it in your Catalyst web or Android application. The Catalyst console provides easy access to code templates for these environments that you can implement in your application's code.
You can also test Zia OCR by uploading sample images or documents that contain text in the console and obtain the recognized text, to get a better idea of Zia's accuracy and the OCR response format.
Before you learn about the use cases and implementation of Optical Character Recognition, it's important to understand its fundamental concepts in detail.
OCR systems in general follow a top-down approach to the text detection and identification process.
When an image or a digital document is submitted to Zia OCR, the text detection and recognition process proceeds as follows:
- Zia analyzes the structure of the image and divides it into blocks of contiguous sets of textual lines, like paragraphs.
Note: A block could also contain pictorial content. However, any content that is not text, such as diagrams, symbols, or images will not be identified by Zia OCR.
- Zia then breaks the blocks down further and identifies individual lines of text.
- The lines of text are then divided into words and each word is broken down into individual characters.
- Zia compares the characters it has detected with its dataset and runs advanced algorithms and analysis to identify the characters and recognize words based on the character groupings.
- Zia also identifies the language the content is in by processing it through volumes of probabilities and hypotheses using Intelligent Character Recognition (ICR) technology.
- The processed and recognized text is finally returned to the user as either a JSON or a document response.
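The top-down order described above (blocks, then lines, then words, then characters) can be illustrated on plain text. The sketch below is not how Zia works internally, since real OCR segments regions of an image rather than a string. It only shows the shape of the hierarchy, using a hypothetical `decompose` function written for this example.

```javascript
// Illustrative only: mimics the block -> line -> word -> character hierarchy
// on an already-digital string. Real OCR segments image regions, not strings.
function decompose(text) {
  return text
    .split(/\n\s*\n/) // a blank line separates blocks (paragraphs)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => ({
      // each block is divided into lines
      lines: block.split("\n").map((line) => ({
        // each line is divided into words, and each word into characters
        words: line
          .trim()
          .split(/\s+/)
          .filter((word) => word.length > 0)
          .map((word) => ({ characters: Array.from(word) })),
      })),
    }));
}

const sample = "Zia reads\ntext\n\nTop down";
const blocks = decompose(sample);

console.log(blocks.length); // number of blocks found
console.log(blocks[0].lines.length); // number of lines in the first block
console.log(blocks[0].lines[0].words[1].characters); // characters of the second word
```

Each level only needs the output of the level above it, which is why the process is described as top-down.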
A model type is a key attribute that describes the type of OCR feature supported by Catalyst. All general image and document files that you process for the common optical character recognition feature will fall under the OCR Model Type. You will need to specify this as the model type, whenever you process an image or a document through the Catalyst OCR API or SDK.
Catalyst also enables you to process ID proofs and official documents, and perform secure identity checks through an independent feature called Identity Scanner. These will fall under their respective model types of AADHAR, PAN, CHEQUE and PASSBOOK.
The OCR models can detect and recognize textual content in 9 international languages and 10 Indian languages.
Additional International Languages
If the user doesn't specify the language, Zia can detect the language automatically. Zia can recognize handwritten content as long as the text is legible, clear, and uses a standard font structure. However, it cannot recognize any non-textual content such as images or diagrams.
Zia OCR supports input files in the following formats for processing: .jpg/.jpeg, .png, .bmp, .tiff, and .pdf.
You could provide a space for the user to upload the image or document file from the device's memory to the Catalyst application. You can also code the Catalyst application to use the end user device's camera to capture a photo with textual content, and process the image as the input file.
You can check the request format from the API documentation.
The user must follow these guidelines while providing the input, for better results:
- Avoid providing blurred or unrecognizable text in images.
- Ensure that the text in an image file is clear, visible, and legible.
- If handwritten text is present in an image file, ensure that it uses a standard font.
- The image size must not be too small.
Zia returns the response of OCR processing in the following ways:
- In the Console
When you upload a sample image or a document file to be processed in the console, it will return the response in two formats:
- Document response: This returns formatted, readable text that is visually separated into lines and paragraphs based on the original content, along with a confidence score for the OCR model type, expressed as a percentage.
- JSON response: This returns the recognized text in JSON format along with the confidence score for the OCR model type.
- Using the SDKs
When you send an image or document file using an API request, you will receive a JSON response containing the recognized text in the same format mentioned above. You can customize the formatting of the JSON response in your code using SDKs. For example, you can return separate paragraphs or individual words from a line as the response. For more information, refer to the Java and Node.js SDK documentation.
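As a rough illustration of customizing the response in your own code, the sketch below splits recognized text into paragraphs, lines, and words, and checks the confidence score against a threshold. The response shape is an assumption: the field names `text` and `confidence`, the sample values, and the `summarizeOcrResponse` helper are hypothetical and are not taken from the Catalyst SDK. Check the API documentation for the actual keys.

```javascript
// Hypothetical response: the `text` and `confidence` field names and the
// sample values are assumptions made for this example.
const sampleResponse = JSON.stringify({
  confidence: 92.5,
  text: "INVOICE 1042\nDate: 12 March\n\nTotal due: 450",
});

// Splits the recognized text into smaller units and flags low-confidence results.
function summarizeOcrResponse(rawJson, minConfidence) {
  const parsed = JSON.parse(rawJson);
  const paragraphs = parsed.text
    .split(/\n\s*\n/) // a blank line separates paragraphs
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
  const lines = paragraphs.flatMap((paragraph) =>
    paragraph.split("\n").map((line) => line.trim())
  );
  return {
    paragraphs,
    lines,
    firstLineWords: lines.length > 0 ? lines[0].split(/\s+/) : [],
    trusted: parsed.confidence >= minConfidence,
  };
}

const summary = summarizeOcrResponse(sampleResponse, 80);
console.log(summary.paragraphs.length);
console.log(summary.lines);
console.log(summary.firstLineWords);
console.log(summary.trusted);
```

The threshold of 80 is arbitrary. A suitable value depends on how costly a misread character is in your application.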
- Automatic Language Detection
Zia can operate in an automatic mode and run its algorithms to detect and identify the language of the text in an input file. However, to speed up the text recognition process, the user can specify the language of the text, if they know it, while submitting it for OCR processing. This enables Zia to restrict its processing and analysis to a limited dataset, leading to quicker and more accurate results.
- Rapid Performance
Zia OCR processes files and generates the results in a fast and effective manner. Catalyst ensures a high throughput of data transmission and a minimal latency in serving requests. The quick response time enhances your application's performance, and provides a satisfying experience for the end user.
- Highly Accurate and Reliable Results
Zia is an AI-driven assistant that undergoes repeated systematic training to generate results with higher accuracy and a lower error margin. The AI is trained using various machine learning techniques to perform complex computations and analysis. The training process is highly rigorous: the model studies and analyzes large volumes of data, which helps ensure that the results generated are precise, accurate, and reliable.
- Seamless Integration
You can easily implement OCR in your application without having to worry about the underlying logic or the backend set-up. You can implement the ready-made code templates provided for the Java and Node.js platforms in any of your Catalyst applications that requires the use of OCR.
- Testing in the Console
The testing feature in the console enables you to verify the efficiency of Zia OCR. You can upload sample images and documents with text, and view the results. This allows you to get an idea about the format and accuracy of the response that will be generated when you implement it in your application.
Text detection and recognition technologies are implemented in a wide range of applications and scenarios. The following are some use cases for Zia OCR:
- A text conversion Android application implements Zia OCR to convert handwritten and hard copy text content into digital documents. The application captures a picture of the source text and submits it for OCR processing, where the content is broken down and individual characters are recognized and grouped. Catalyst then returns the recognized text within seconds, and the application displays it to the end user. The user can then edit, format, or search the text like any other digital document.
- An application linked to traffic cameras reads the license plate registration numbers of traffic rule offenders. It implements Zia OCR to read text from captured images of license plates. The images are uploaded automatically, and Zia performs OCR processing on them to decipher the registration numbers. The recognized registration numbers are processed further to obtain the identity of the vehicle owner. The application also stores the data in Catalyst Data Store tables.
Some other examples where Zia OCR can be implemented include:
- An application that recognizes text in image files and converts them to PDFs with the text, or vice-versa
- A crime investigation application that scans crime scene photos for textual content such as from street signs, billboards, or graffiti
- Applications that digitize magazines, journals, contracts, pamphlets or posters by scanning hard copies of them
This section only covers working with OCR in the Catalyst console. Refer to the SDK and API documentation sections for implementing Zia OCR in your application's code.
As mentioned earlier, you can access the code templates that will enable you to integrate OCR in your Catalyst application from the console, and also test the feature by uploading sample images and documents and obtaining the recognized text.
To access Optical Character Recognition in your Catalyst console:
- Navigate to Zia Services under Discover, then click Access Now on the Optical Character Recognition window.
- Click Try a Demo in the Optical Character Recognition feature page.
This will open the Optical Character Recognition feature.
You can test OCR by either selecting a sample image or PDF file from Catalyst or by uploading your own file.
To process a sample file and obtain the results:
- Click Select a Sample Image in the box.
- Select an image or a PDF file from the samples provided.
OCR will process the file, then detect and identify the textual content in it. Since it is a sample file, the language of the text and the model type are provided by Catalyst automatically.
The recognized text is displayed in the console under the Result section.
You can view the JSON response by clicking View Response.
To upload your own image or PDF file with text:
- Click Upload under the Result section.
If you're opening Optical Character Recognition after you have closed it, click Browse Files in this box.
- Upload a file from your local system.
Note: The file must be in .jpg/.jpeg, .png, .bmp, .tiff, or .pdf format. The file size must not exceed 20 MB.
- Select the model type and the languages in the file's text, if you are aware of them. You can select General for the OCR model. You can select multiple languages if the file contains text in multiple languages.
- Click Proceed.
The console will process the file and display the recognized textual content, and the confidence score if it is of the OCR model type. You can copy the recognized text using the copy icon.
You can check the JSON response in a similar way.
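If your own application accepts files for OCR, you may want to apply the format and size limits from the note above before sending a file. The sketch below is one way to do that check. The `checkOcrInput` helper is hypothetical, the reading of "20 MB" as 20 * 1024 * 1024 bytes is an assumption, and the limits enforced by the API may differ from the console's, so confirm them in the API documentation.

```javascript
const path = require("path");

// Formats listed in the console upload note.
const ALLOWED_EXTENSIONS = new Set([".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".pdf"]);
// Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes.
const MAX_SIZE_BYTES = 20 * 1024 * 1024;

// Returns whether a file looks acceptable, plus a list of any problems found.
function checkOcrInput(fileName, sizeInBytes) {
  const problems = [];
  const extension = path.extname(fileName).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(extension)) {
    problems.push(`unsupported format: ${extension || "(none)"}`);
  }
  if (sizeInBytes > MAX_SIZE_BYTES) {
    problems.push("file exceeds 20 MB");
  }
  return { ok: problems.length === 0, problems };
}

console.log(checkOcrInput("receipt.PNG", 350000));
console.log(checkOcrInput("notes.docx", 1200));
console.log(checkOcrInput("scan.pdf", 21 * 1024 * 1024));
```

This check only looks at the file name and size. It does not verify that the file content really matches its extension.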
You can access the code templates from the section below the test window. Click either the Java SDK or NodeJS SDK tab, and copy the code using the copy icon. You can paste this code in your web or Android application's code wherever you require.
In Java, you can process the input file as a new File, specify the model type using ZCOCRModelType and the languages using setLanguageCode. Refer to the API documentation for the keys of the supported languages and model types.
As mentioned earlier, you can format the JSON response that you receive. The Java code enables you to obtain specific paragraphs, individual lines in a paragraph, or individual words in a line.
The Node.js code processes the input file as the object ocrPromise. You can provide the input file name, set the model type using modelType, and the languages using language.
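The sketch below shows one way to assemble the `modelType` and `language` options before passing them to the OCR call. It does not call the Catalyst SDK. The `buildOcrOptions` helper is hypothetical, the model type names are copied from this page, and `"eng"` is only a placeholder for a language key, so take the real keys from the API documentation.

```javascript
// Model type names as listed on this page; confirm the exact keys in the API documentation.
const MODEL_TYPES = ["OCR", "AADHAR", "PAN", "CHEQUE", "PASSBOOK"];

// Hypothetical helper (not part of the Catalyst SDK) that builds an options
// object with the `modelType` and `language` keys.
function buildOcrOptions(modelType, language) {
  if (!MODEL_TYPES.includes(modelType)) {
    throw new Error(`Unknown model type: ${modelType}`);
  }
  const options = { modelType };
  // Leave `language` out when it is not known, so automatic detection can apply.
  if (typeof language === "string" && language.trim() !== "") {
    options.language = language.trim();
  }
  return options;
}

console.log(buildOcrOptions("OCR")); // language omitted
console.log(buildOcrOptions("OCR", "eng")); // "eng" is a placeholder language key
```

Validating the model type up front surfaces a typo as an immediate error in your own code instead of a failed request.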