Revision as of 14:04, 14 November 2016


Introduction to Optical Character Recognition (OCR)

1. Introduction

  • Optical Character Recognition (OCR) is a tool that detects information in natural images and converts it into machine-coded text, such as words, symbols, and numbers. It is still an active research area, and novel algorithms are published from time to time. Recognizing the characters in an image is both interesting and essential because it helps greatly in many areas: automatic license plate recognition, book and document scanning, assistive technology for blind and visually impaired users, zip-code recognition for post offices, and much more.
  • In this page, I would like to introduce a basic and simple method to convert typed letters and numbers into machine-coded text.

2. Algorithm Overview

  • OCR is a simple example of machine learning. Machine learning is all about learning patterns and features from known data and then making predictions accordingly, and it is powerful in the field of pattern recognition. One fundamental approach is to compute statistics from the data. For example, one way to decide whether two sinusoidal signals are the same is to compare their maximum magnitude, period, and relative phase. These three numbers are statistics extracted from the data; by comparing them with those of known sinusoidal signals, we can predict which function describes the unknown signal.
  • Most machine learning involves a training process and a testing process. For a specific task, we first identify a group of representative numbers (features) that describe the data well, like the maximum magnitude of a sinusoidal signal. The training process computes these statistical features for labeled data whose classes we already know and stores the values. In the testing process, we compute the same statistics for the unknown data and compare them against the whole training set; if the statistics of an unknown sample are very similar, or even identical, to those of a known class, we predict that the sample belongs to that class. In short, testing means extracting the feature statistics and then comparing their values to classify the unknown data.
  • In some cases, we need pre-processing to transform the data before computing the statistics, such as a Fourier Transform if we want to extract the formants from an audio signal.
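As a toy illustration of the train/compare loop described above, here is a minimal Python sketch. The feature choice (maximum magnitude and mean), the sample signals, and the function names are my own assumptions for demonstration, not the project's code:

```python
import math

def extract_features(samples):
    """Compute simple statistics from a signal: max magnitude and mean."""
    return (max(abs(s) for s in samples), sum(samples) / len(samples))

def classify(unknown, training_set):
    """Return the label whose stored features are closest to the unknown's."""
    f_unknown = extract_features(unknown)
    best_label, best_dist = None, float("inf")
    for label, samples in training_set.items():
        dist = math.dist(f_unknown, extract_features(samples))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# "Training": precompute features of labeled (marked) data.
training = {
    "sine_small": [math.sin(t / 10) for t in range(100)],
    "sine_big": [3 * math.sin(t / 10) for t in range(100)],
}

# "Testing": a slightly perturbed large-amplitude sine should match "sine_big".
unknown = [3.1 * math.sin(t / 10) for t in range(100)]
print(classify(unknown, training))  # → sine_big
```

The OCR algorithm below follows the same pattern, except the "features" are the character images themselves and the comparison is a 2-D correlation.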

3. Algorithm Assumptions

  • The proposed algorithm is simple and easy to learn. The purpose of this project is to welcome talents like you to get involved in the world of recognition.
  • We assume the input image has a clean background. "Clean" here means the contrast between the background and the characters is high enough to detect; the algorithm works best, and its success rate is highest, when the background is white. Colored letters are intended to be supported by the project; however, if the returned results look strange, try converting the image to grayscale, because binarization then yields a relatively robust threshold for classification.
  • We assume all characters are arranged horizontally, although they may span several lines. The technique we use to segment each character requires this particular arrangement.
  • We assume all characters are machine-printed or close to it, because the training data set contains only one particular letter style. Even some machine-printed test images may therefore return undesired results.
  • We assume the test data contains characters whose font size is at least 42 × 24 pixels. If the characters are smaller, you may want to use the MATLAB function 'imdilate' to increase their thickness before running the program.
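For readers curious what 'imdilate' does to a thin character, here is a rough Python sketch of binary dilation with a small cross-shaped neighborhood (an assumption for illustration; MATLAB's imdilate takes an explicit structuring element):

```python
def dilate(img):
    """Grow the 1-valued (text) regions of a binary image by one pixel
    in the 4-neighborhood, thickening thin strokes."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # A pixel turns on if it, or any 4-neighbor, is on.
            if any(0 <= r + dr < rows and 0 <= c + dc < cols
                   and img[r + dr][c + dc]
                   for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))):
                out[r][c] = 1
    return out

# A thin 1-pixel stroke becomes noticeably thicker after one dilation.
glyph = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
thick = dilate(glyph)
```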

4. Main Steps of the Algorithm

1. Segmentation

  • This is the pre-processing part. The preliminary step is to convert the image into a binary image with the MATLAB function 'im2bw'; in our convention, a white pixel returns 0 and a black (text) pixel returns 1.
  • After that, crop the image to fit the text: find the minimum and maximum row and column indices that contain a text pixel, then crop the image to those indices.
  • Then extract the characters line by line. We sum up each row and look for the first row whose sum is zero; that row's index tells us where to trim. This separates the first line of text from the remaining lines, and repeating the procedure separates all the lines that contain text.
  • Finally, we trim each character in each line using the MATLAB function 'bwlabel'. The call 'L = bwlabel(BW)' returns a label matrix L, the same size as BW, that contains labels for the 8-connected objects found in BW. With these labels we can extract every single character from each line. The code is from: http://www.mathworks.com/matlabcentral/fileexchange/8031-image-segmentation---extraction.
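The row-sum line-splitting step above can be sketched in a few lines. This is an illustrative Python version (the function name and toy image are assumptions), not the project's MATLAB code:

```python
def split_lines(bw):
    """Split a binary image (1 = text pixel) into horizontal text lines
    by scanning for all-zero rows, as described above."""
    lines, current = [], []
    for row in bw:
        if sum(row) == 0:      # blank row: the current line (if any) ends here
            if current:
                lines.append(current)
                current = []
        else:                  # row contains text: extend the current line
            current.append(row)
    if current:
        lines.append(current)
    return lines

page = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],          # blank row separating two text lines
        [1, 1, 0, 0]]
print(len(split_lines(page)))  # → 2
```

The same idea applied to column sums within each extracted line would separate individual characters, though the project uses 'bwlabel' for that step instead.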

2. Classification

  • According to Tou and Gonzalez, “The principal function of a pattern recognition system is to yield decisions concerning the class membership of the patterns with which it is confronted.” For this project, the goal is to compare the image of each trimmed character with the images in the training data. Because each character image is made up of numerical pixel values, we can compute the correlation between two images in two dimensions. The following formula gives the correlation value; a higher value means a better match.

$ r = \frac{ \sum_{m=1}^{M} \sum_{n=1}^{N} \left( A_{mn}-\bar{A} \right) \left( B_{mn}-\bar{B} \right) } {\sqrt{ \left( \sum_{m=1}^{M} \sum_{n=1}^{N} \left( A_{mn}-\bar{A} \right)^2 \right) \left( \sum_{m=1}^{M} \sum_{n=1}^{N} \left( B_{mn}-\bar{B} \right)^2 \right)}} $

  • Then we find the maximum correlation value among all comparisons with the training data; the letter or number that produces this maximum is the recognized symbol for each character.
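The correlation formula above (what MATLAB computes as corr2) can be sketched in plain Python. This is an illustrative reimplementation under the assumption that both images are equal-sized 2-D lists, not the project's code:

```python
import math

def corr2(a, b):
    """2-D correlation coefficient r of two equally sized images,
    as defined by the formula above (cf. MATLAB's corr2)."""
    n = sum(len(row) for row in a)
    mean_a = sum(map(sum, a)) / n
    mean_b = sum(map(sum, b)) / n
    num = den_a = den_b = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            num += (x - mean_a) * (y - mean_b)
            den_a += (x - mean_a) ** 2
            den_b += (y - mean_b) ** 2
    return num / math.sqrt(den_a * den_b)

# An image correlates perfectly (r = 1) with itself,
# and negatively with its inverse.
template = [[1, 0], [0, 1]]
print(corr2(template, template))  # → 1.0
```

Recognition then amounts to computing r between the trimmed character and every training template and taking the template with the largest r.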

5. References

  • I also took some ideas from this project description:
    • http://www.ele.uri.edu/~hansenj/projects/ele585/OCR/OCR.pdf
  • This is the reference for the explanation of machine learning:
    • https://en.wikipedia.org/wiki/Machine_learning
  • Here is another useful link on OCR that may help you explore it more in depth:
    • https://www.mathworks.com/help/vision/ref/ocr.html
