Recognizing Handwritten Digits with scikit-learn

Sowmiya K
5 min read · Mar 5, 2021

Recognizing handwritten text is a problem that can be traced back to the first automatic machines that needed to recognize individual characters in handwritten documents. Think about, for example, the ZIP codes on letters at the post office and the automation needed to recognize these five digits. Perfect recognition of these codes is necessary in order to sort mail automatically and efficiently.

Another application that comes to mind is OCR (Optical Character Recognition) software, which must read handwritten text or the pages of printed books and turn them into electronic documents in which each character is well defined. The problem of handwriting recognition goes further back in time, however, to the early 20th century (the 1920s), when Emanuel Goldberg (1881–1970) began studying it and suggested that a statistical approach would be an optimal choice.

To address this issue in Python, the scikit-learn library provides a good example to better understand this technique, the issues involved, and the possibility of making predictions.

The scikit-learn library (http://scikit-learn.org/) enables you to approach this type of data analysis. The data to be analyzed usually consists of numerical values or strings, but it can also involve images and sounds.

The problem in this project is to read and interpret an image of a handwritten digit and predict the numeric value it represents. As in other scikit-learn workflows, you will have an estimator that learns through the fit() function and, once it has reached a sufficient degree of predictive capability (a sufficiently valid model), produces a prediction with the predict() function. We will then discuss the training set and validation set, created this time from a series of images.
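As a minimal sketch of the fit()/predict() pattern described above, here is a toy classifier trained on two hand-made points. The toy data and the default SVC settings are assumptions made for illustration; they are not part of the digits project itself.

```python
from sklearn import svm

# Toy training data (assumed for illustration): two points, two classes.
X_train = [[0, 0], [1, 1]]
y_train = [0, 1]

# Every scikit-learn estimator exposes fit() to learn from data...
estimator = svm.SVC()
estimator.fit(X_train, y_train)

# ...and predict() to label new, unseen samples.
prediction = estimator.predict([[2, 2]])
print(prediction)
```

The same two calls are all we need later on; only the data changes from toy points to digit images.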

The Digits dataset consists of 1,797 images that are 8x8 pixels in size. Each image is a handwritten digit in grayscale.

  • Now open a new Jupyter (IPython) Notebook session from the command line.

Let us start by importing the libraries needed for our model, including the svm module of the scikit-learn library, and loading the Digits dataset.
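The imports and loading step might look like the sketch below. The gamma and C values passed to SVC are assumptions chosen for illustration, since the original settings are not shown here; tune them for your own runs.

```python
from sklearn import datasets, svm

# Load the Digits dataset that ships with scikit-learn.
digits = datasets.load_digits()

# Define the estimator. gamma and C are assumed values, not taken from the article.
svc = svm.SVC(gamma=0.001, C=100.0)

# images holds the 8x8 matrices; data holds the same pixels flattened to 64 features.
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)
```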

After loading the dataset, you can analyze its content. First, you can read a lot of information about the dataset by calling the DESCR attribute.
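A short sketch of reading that description; the variable name description is just an illustrative choice.

```python
from sklearn import datasets

digits = datasets.load_digits()

# DESCR is a plain string describing the dataset's origin and layout.
description = digits.DESCR
print(description)
```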

The images of the handwritten digits are contained in the digits.images array. Each element of this array is an image represented by an 8x8 matrix of numerical values that correspond to a grayscale from white, with a value of 0, to black, with a value of 16.
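To make the matrix representation concrete, the sketch below prints the first image and checks the value range across the whole dataset. The variable names are illustrative.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each entry of digits.images is one 8x8 matrix of pixel intensities.
first_image = digits.images[0]
print(first_image.shape)  # (8, 8)
print(first_image)

# Smallest and largest intensity found anywhere in the dataset.
lowest = digits.images.min()
highest = digits.images.max()
print(lowest, highest)
```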

Displaying this matrix with the matplotlib library produces a grayscale image of the digit.
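One way to render the first image is sketched here. The choice of the gray_r colormap (which draws 0 as white and the highest value as black) and the figure size are assumptions for illustration.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

fig, ax = plt.subplots(figsize=(3, 3))
# gray_r maps low values to white and high values to black.
ax.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_title(f"Label: {digits.target[0]}")
ax.axis("off")
```

In a notebook the figure renders automatically at the end of the cell; in a plain script, call plt.show() afterwards.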

The numerical values represented by the images, i.e., the targets, are contained in the digits.target array.
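A quick look at the targets, as a sketch:

```python
from sklearn import datasets

digits = datasets.load_digits()

# One integer label (0 through 9) per image.
targets = digits.target
print(targets.shape)  # (1797,)
print(targets[:10])   # labels of the first ten images
```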

Learning and Predicting

Let us visualize the images and labels in our dataset. The dataset contains 1,797 elements, so we will use the first 1,791 as a training set and the last six as a validation set. We can look at these six handwritten digits in detail by using the matplotlib library.
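The six validation digits can be plotted with a sketch like the following. The 2x3 grid layout and the colormap are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# The last six samples form the validation set.
validation_images = digits.images[1791:]
validation_labels = digits.target[1791:]

fig, axes = plt.subplots(2, 3, figsize=(6, 4))
for ax, image, label in zip(axes.ravel(), validation_images, validation_labels):
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Label: {label}")
    ax.axis("off")
```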

Now we train the svc estimator that we defined earlier.

Now we have to test our estimator, making it interpret the six digits of the validation set.
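The training and testing steps can be sketched together as follows. The SVC hyperparameters are assumed values, as before, and the block reloads the data so that it runs on its own.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
svc = svm.SVC(gamma=0.001, C=100.0)  # assumed hyperparameters

# Train on the first 1,791 samples.
svc.fit(digits.data[:1791], digits.target[:1791])

# Predict the last six samples and compare with the true labels.
predicted = svc.predict(digits.data[1791:])
expected = digits.target[1791:]
print("predicted:", predicted)
print("expected: ", expected)
```

Printing both arrays side by side makes it easy to check each of the six predictions against its true label.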

As we can see, the svc estimator has learned correctly. It is able to recognize the handwritten digits, correctly interpreting all six digits of the validation set.

Now let us look at the scikit-learn 4-step modeling pattern.

First, let’s split our dataset into training and test sets so that, after we train our model, we can check that it generalizes well to new data.
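A minimal sketch of the split using train_test_split. The 25% test fraction and the fixed random_state are assumptions chosen for reproducibility, not values stated in the article.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out 25% of the samples for testing (assumed fraction).
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

print(x_train.shape)  # (1347, 64)
print(x_test.shape)   # (450, 64)
```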

Step 1: Import the model we want to use. Here, that is logistic regression.

Step 2: Make an instance of the model.

Step 3: Train the model on the training data.

Step 4: Predict the labels of new data and measure the performance of our model.
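Put together, the four steps might look like the sketch below. The split parameters and the raised max_iter (so the solver has room to converge on raw pixel values) are assumptions, and the exact score will depend on them and on your scikit-learn version.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Step 1: import the model class.
from sklearn.linear_model import LogisticRegression

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Step 2: make an instance of the model (max_iter is an assumed setting).
logistic_regr = LogisticRegression(max_iter=10000)

# Step 3: train the model on the training set.
logistic_regr.fit(x_train, y_train)

# Step 4: predict labels for unseen data and measure accuracy.
predictions = logistic_regr.predict(x_test)
score = logistic_regr.score(x_test, y_test)
print(f"Test accuracy: {score:.3f}")
```

Here score() returns the fraction of test samples whose predicted label matches the true label.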

As the accuracy score shows, our model predicts the correct label for more than 95% of the test digits. Hence, we can conclude that it works more than 95% of the time on new data.

Sowmiya K

Working for HCL as a Senior Software Engineer | Interested in Networking | cybersecurity | Python