Handwritten Digit Recognition Using Machine Learning
Machine learning and deep learning play a significant role in computer technology and artificial intelligence. With machine learning and AI, human effort can be reduced in recognition, learning, prediction, and many other areas. This is where computer vision comes into the picture. Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to understand and automate tasks that the human visual system can do. Now let's learn a little more about machine learning. Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves.
In this article, we are going to perform our task on the handwritten digit dataset available in sklearn.datasets, and we shall check whether we should accept the null hypothesis or not. Here our null hypothesis is: "The digits dataset of the scikit-learn library predicts the digit correctly at least 95% of the time."
What is Handwritten Digit Recognition?
Handwritten digit recognition is the ability of computers to recognize human handwritten digits. It is a hard task for the machine because handwritten digits are not perfect and can be written in many different styles. Handwritten digit recognition solves this problem by taking an image of a digit and recognizing the digit present in it.
To check the evidence for the null hypothesis, we make our predictions using two algorithms. If the model reaches an accuracy of at least 0.95 in both cases, we accept our null hypothesis; otherwise, we shall reject it.
First we shall import all the necessary libraries. Then we load the dataset, which ships with sklearn, using datasets.load_digits(). To see what the dataset contains, we shall use the dir() function.
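Since the original notebook is not reproduced here, a minimal sketch of this loading step might look like the following (the variable name digits is our own choice):

```python
# Load the built-in digits dataset from scikit-learn.
from sklearn import datasets

digits = datasets.load_digits()
print(dir(digits))  # lists the dataset's attributes: data, images, target, target_names, DESCR, ...
```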
This dataset consists of 1,797 images that are 8x8 pixels in size. Each image is a handwritten digit in grayscale. Now let’s check the length of the dataset.
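Continuing the sketch, the length can be checked with len():

```python
print(len(digits.images))      # 1797 images in total
print(digits.images[0].shape)  # each image is an (8, 8) array of grayscale values
```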
Now let's check the target values of the dataset.
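A quick look at the targets, continuing the same sketch:

```python
print(digits.target)        # the label (0-9) for each image
print(digits.target_names)  # the ten possible classes: 0 through 9
```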
Below we have plotted some random images from the dataset.
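One possible way to plot a few random samples with matplotlib (our own illustrative code, not necessarily the author's original figure):

```python
import random
import matplotlib.pyplot as plt

# Pick five random indices and show the corresponding images with their labels.
indices = random.sample(range(len(digits.images)), 5)
fig, axes = plt.subplots(1, 5, figsize=(10, 3))
for ax, idx in zip(axes, indices):
    ax.imshow(digits.images[idx], cmap='gray')
    ax.set_title(f"Label: {digits.target[idx]}")
    ax.axis('off')
plt.show()
```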
After all this, let's create the model. First we choose the support vector machine algorithm. So we shall import SVC from sklearn.svm and create an SVC object. Then we shall apply GridSearchCV to find the best parameters for training our model, which we retrieve using best_params_. A sketch of this step follows below.
For training and testing purposes, we shall take 1,257 images for training and the rest for testing. Then we shall plot the confusion matrix, as shown in the sketch after this paragraph.
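The parameter grid below is purely illustrative; the grid actually searched in the original may differ, and for simplicity the search here runs on the full dataset:

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Flatten the 8x8 images into 64-dimensional feature vectors.
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

# Illustrative grid; the exact parameters searched in the original are not shown.
param_grid = {'C': [1, 10, 100], 'gamma': ['scale', 0.001], 'kernel': ['rbf', 'linear']}
svc_grid = GridSearchCV(SVC(), param_grid, cv=5)
svc_grid.fit(X, y)
print(svc_grid.best_params_)  # best parameter combination found by cross-validation
```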
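Continuing the sketch with a hypothetical split via scikit-learn's train_test_split (train_size=1257 leaves 540 images for testing):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# 1,257 images for training, the remaining 540 for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=1257, random_state=0)

# Train an SVC with the parameters found by the grid search.
svc = SVC(**svc_grid.best_params_)
svc.fit(X_train, y_train)

print(confusion_matrix(y_test, svc.predict(X_test)))
```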
Now let's print the accuracy.
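The exact number depends on the split and the chosen parameters:

```python
print(svc.score(X_test, y_test))  # fraction of test digits classified correctly
```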
Now we shall check the accuracy of another algorithm: random forest. We shall follow the same process. First we create the model by importing RandomForestClassifier from the sklearn.ensemble library. Then we shall apply GridSearchCV to choose the best parameters for our training, as sketched below.
Now we shall train our model on the training data.
Using a heatmap, we shall plot the confusion matrix.
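A corresponding sketch for the random forest, again with an illustrative parameter grid:

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative grid; the parameters searched in the original are not shown.
rf_param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [None, 10, 20]}
rf_grid = GridSearchCV(RandomForestClassifier(random_state=0), rf_param_grid, cv=5)
rf_grid.fit(X_train, y_train)
print(rf_grid.best_params_)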
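Training a forest with the selected parameters on the same training split:

```python
# Refit a random forest using the best parameters from the grid search.
rf = RandomForestClassifier(**rf_grid.best_params_, random_state=0)
rf.fit(X_train, y_train)
```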
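One way to draw the confusion matrix as a heatmap is with seaborn (an assumption; the original may have used a different plotting library):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

rf_cm = confusion_matrix(y_test, rf.predict(X_test))

plt.figure(figsize=(7, 5))
sns.heatmap(rf_cm, annot=True, fmt='d')  # annotate each cell with its count
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
```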
Let’s print the test score.
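And the test score for the random forest, on the same held-out test set:

```python
print(rf.score(X_test, y_test))  # random forest accuracy on the test split
```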
From the above two test scores, it is clear that we get 95% accuracy only in the case of the support vector machine, so the claim does not hold for both algorithms. Hence we shall reject the null hypothesis and accept the alternative one.
You can find the working code on GitHub.
“I am thankful to mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com”