A Beginner’s Guide to Computer Vision

Computer Vision is one of the most fascinating and fastest-growing fields in technology. A helpful comparison is the combination of our eyes and brain: the eyes take in the input, and a neural network in the brain makes decisions and tells us what the image contains. In simple terms, Computer Vision means training models to recognize objects visually. This article explains the basics of computer vision with practical examples.

This article covers the following topics:

  1. Tasks in Computer Vision
  2. How do machines process Images?
  3. What are Channels and Kernels?
  4. What are Edges, and how are they detected?
  5. What is Padding and why is it important?
  6. What is Pooling and why is it important?
  7. Convolution Operation
  8. Most Important Libraries for Computer Vision in Python
  9. Project ideas for practice

Tasks in Computer Vision

The demand for computer vision applications is growing in almost every field, including healthcare, security, transport, automobiles, retail, insurance, fashion, media, and agriculture. Applications such as autonomous vehicles and social media filters are gaining traction. Other typical applications of Computer Vision include face detection and recognition, object detection and classification, object segmentation, and text detection and recognition. We’ll discuss beginner-level machine learning projects on computer vision in the following sections of this blog.

Data quality plays a vital role in developing any computer vision application, because a model can only be as accurate as the data it is trained on allows.

Some applications of Computer Vision
Source: (https://blog.superannotate.com/introduction-to-computer-vision/)

How do machines process Images?

Machines can only process numbers, not images. So the machine first converts an image into numbers, typically a NumPy array in Python, where each number describes the brightness or color of one pixel. Kernels are then applied to the channels of the image to extract features, and the results are stored in feature maps. Further Convolution operations process these feature maps to extract more features. This is how machines process images and extract important information from them.
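To make the idea of "an image as numbers" concrete, here is a minimal sketch using NumPy. The tiny 2x2 image and its pixel values are made up for illustration; a real photo is the same kind of array, only much larger.

```python
import numpy as np

# A tiny 2x2 colour image: height x width x 3 colour values (red, green, blue).
# Each value is an integer from 0 (dark) to 255 (bright).
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

print(image.shape)   # (2, 2, 3): 2 rows, 2 columns, 3 colour channels
print(image[0, 0])   # top-left pixel, pure red: [255   0   0]

# Averaging the three colour values gives a simple grayscale version:
# a single 2D grid of numbers (libraries usually use a weighted average instead).
gray = image.mean(axis=2)
print(gray.shape)    # (2, 2)
```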

What are Channels and Kernels?

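Channels are the layers of information that make up an image or a feature map. A color image has three channels (red, green, and blue), and inside a model each kernel produces a new channel that highlights one kind of pattern. For example, if we are training a model to detect human faces, some channels may respond to eyes and others to face outlines, and this information is extracted using Kernels.

Kernels are small matrices of numbers that work as filters: they filter the image to bring out the desired information. In a neural network, kernels start with random values and are adjusted during training.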

In computer vision, we use many kernels to extract all the patterns from the image to get all the information out of it.

Operation done by kernel on Image
(https://setosa.io/ev/image-kernels/)

Many predefined kernels with fixed values exist for common effects such as blurring and sharpening, and you can also write your own kernels for your images.

Kernels to sharpen images

Isn’t it amazing to write your own kernels? This gives you the power to design kernels that suit your images. Suitable kernels help extract information easily and at a lower computational cost.
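As a rough sketch of what writing your own kernel can look like, the example below defines a common sharpening kernel by hand and applies it with NumPy. The helper name apply_kernel and the 5x5 test image are assumptions made for this illustration, not part of any library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def apply_kernel(image, kernel):
    """Slide `kernel` over a 2D `image` and sum the element-wise products."""
    windows = sliding_window_view(image, kernel.shape)  # every kernel-sized patch
    return (windows * kernel).sum(axis=(-2, -1))

# A hand-written sharpening kernel: boost the centre pixel, subtract its neighbours.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

# Hypothetical 5x5 grayscale image: one brighter spot on a dark background.
image = np.full((5, 5), 10)
image[2, 2] = 50

sharpened = apply_kernel(image, sharpen)
print(sharpened)
# [[ 10 -30  10]
#  [-30 210 -30]
#  [ 10 -30  10]]
```

The bright spot becomes much brighter relative to its surroundings, while flat areas keep their value because the kernel entries sum to 1. On real images, OpenCV’s filter2D function can apply a kernel like this one.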

What are Edges?

Think of how our eyes and brain process images: we recognize objects mainly by their shapes, edges, and outlines. I have left out color here because it is often much less important for recognizing an object.

Edges are the most important cue that helps us recognize objects.

Let’s take an example of the image given below. 

Sketch of an Elephant

This is a simple sketch of an elephant drawn with a pencil, without any color. Still, we can easily recognize the outlined image as an elephant.

All it takes to understand the image is a few well-placed edges. Edge detection kernels detect these edges.
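To give a feel for what an edge detection kernel computes, here is a minimal sketch. It assumes the simplest horizontal kernel, [-1, 0, 1], which boils down to "right neighbour minus left neighbour"; the half-dark, half-bright image is made up for the example.

```python
import numpy as np

# Hypothetical 4x6 grayscale image: dark left half (0), bright right half (255).
image = np.zeros((4, 6), dtype=np.int32)
image[:, 3:] = 255

# Applying the horizontal kernel [-1, 0, 1] means: right neighbour minus left neighbour.
# Array slicing computes that for every pixel that has both neighbours.
edges = image[:, 2:] - image[:, :-2]

print(edges[0])  # [  0 255 255   0]
```

The result is zero wherever the brightness is constant and large only around the boundary between the dark and bright halves, which is exactly where the edge is.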

Edge detection can also be done using the Canny() function of OpenCV. Its results are shown below:  

What is Padding and Its Importance?

Padding means adding a margin around an image by filling the extra space with a constant color such as white or black. In the era of social media, many people use it without knowing: the thick white borders people add to their photos are a form of padding, as shown below.

In computer vision, padding helps the model pay attention to the corners and borders of the original image. During Convolution, pixels near the border are covered by the kernel fewer times than pixels in the center, yet edges and corners can hold vital information. Padding lets the kernel cover them more often so that this information is not lost.

Number of times pixels get parsed in different parts of an image during Convolution
(https://prasantdixit.medium.com/?p=b887ebdb4a9f)

There is another reason for padding: after applying the Convolution operation, the size of the output decreases.

output_size = image_size - kernel_size + 1

Note: kernel_size must not be larger than image_size, and the formula assumes the kernel moves one pixel at a time.

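Size of the image after applying padding

out_pad_size = image_size + 2*padding

Applying Convolution to the padded image therefore gives output_size = image_size + 2*padding - kernel_size + 1. In the above image, a padding of one is applied to the original image, so with a 3x3 kernel the output size is the same as the original image.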
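The size arithmetic above can be checked with a few lines of Python. This is a small sketch that assumes the kernel moves one pixel at a time; the function name conv_output_size is made up for the example, while np.pad is the NumPy function that adds the border.

```python
import numpy as np

def conv_output_size(image_size, kernel_size, padding=0):
    """Output width/height of a convolution that moves one pixel at a time."""
    return image_size + 2 * padding - kernel_size + 1

print(conv_output_size(5, 3))             # 3: the output shrinks without padding
print(conv_output_size(5, 3, padding=1))  # 5: same size as the input

# np.pad adds the border itself; here one ring of zeros (black) around a 5x5 image.
image = np.ones((5, 5))
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)                       # (7, 7) = image_size + 2 * padding
```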

What is Pooling and its Importance?

Pooling is generally used to downsample the output of a Convolution operation. When padding is applied, the output size does not decrease after Convolution, but this is not always good for us: large outputs demand more computational power and take a significant amount of time to process. To overcome this problem, we use Pooling.

There are various types of Pooling:

  1. Max Pooling
  2. Min Pooling
  3. Global Average Pooling

Max Pooling is the most commonly used type. It is typically applied after a Convolution operation to downsample its output; with a 2x2 window and a stride of 2, the width and height are halved. Max Pooling can be understood from the diagram below:

Max Pooling takes the maximum value out of each window and stores it in the output. This retains the strongest, most important responses and discards the less important ones, which helps the model focus on important features and train better.
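Here is a minimal NumPy sketch of Max Pooling with a 2x2 window and a stride of 2. The function name max_pool_2x2 and the 4x4 feature map are assumptions for the example, and the function expects an even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = feature_map.shape
    # Group the values into non-overlapping 2x2 blocks, then take each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 0, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 4]
#  [7 9]]
```

The 4x4 input becomes a 2x2 output, and each output value is the largest value of the corresponding 2x2 block.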

Convolution Operation

Convolution is an operation applied to images to extract important information from them. A kernel slides over the image, and at each position the overlapping values are multiplied and summed. The results are stored in the output, also known as a feature map because it stores information about features.

We can compare it with our eyes as well. How do we look at an image that is too large to take in at once? We scan it from the top left to the bottom right.

Convolution works the same way, from the top left to the bottom right. It can be understood from the following diagram:
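The same top-left to bottom-right sliding can be written out as two plain loops. This is a simplified sketch, assuming no padding and a kernel that moves one pixel at a time; the function name convolve2d and the example values are made up. Like the convolution layers in most deep learning libraries, it does not flip the kernel.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and build a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):          # top to bottom
        for col in range(out_w):      # left to right
            patch = image[row:row + kh, col:col + kw]
            feature_map[row, col] = np.sum(patch * kernel)
    return feature_map

image = np.arange(16).reshape(4, 4)   # hypothetical 4x4 image with values 0..15
kernel = np.ones((3, 3)) / 9          # simple averaging kernel

feature_map = convolve2d(image, kernel)
print(feature_map.shape)  # (2, 2) = 4 - 3 + 1 in each direction
```

Each value in the feature map is the average of one 3x3 patch of the image, so the top-left value is the average of the top-left patch, and so on.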

This much information is enough to understand the background of Computer Vision and get started. Now you are ready to jump into this field.

Most Important Computer Vision Libraries 

Libraries are an essential part of coding in Python because most of the operations we want to apply are already implemented in them. Here are some of the most important libraries, which perform well and are in demand in the industry.

  1. OpenCV

OpenCV is one of the most extensive open-source computer vision libraries; its name is short for Open Source Computer Vision. It provides functions for altering brightness and contrast, edge detection, and many other operations, several of which are built on kernels.

The most important functions, used in almost every computer vision project:

i) imread / imshow : to read and show images

ii) cvtColor : to convert between color spaces, for example RGB2GRAY

iii) resize : to resize an image

iv) ones / zeros : NumPy functions that create an image filled with ones or zeros

v) transpose : to transpose the image matrix

Many other libraries and projects use OpenCV for operations such as reading and preprocessing images.

2. TensorFlow

TensorFlow is a machine learning framework used to design Convolutional Neural Networks for tasks like detection and classification. Many pre-trained Computer Vision models can also be used through TensorFlow.

3. Dlib

Dlib is a library written in C++ that can also be used from Python. It is one of the most popular libraries for face detection and face landmark detection. It is easy to use and gives excellent results.

4. MediaPipe

MediaPipe is known for its ready-to-use real-time Face detection, Face landmark detection, Object detection, and many other tasks. It provides live customizable Computer Vision models. MediaPipe is used to build ML Pipelines on edge devices like Android, iOS, and the web.

5. PyTorch

PyTorch is a well-known deep learning framework. Together with its companion library torchvision, it is used for data augmentation, i.e., enlarging a dataset by applying changes such as rotation and color adjustment, and for normalizing images. Pre-trained Computer Vision models can also be used in PyTorch. PyTorch is also used to design Convolutional Neural Network architectures for detection, classification, and many other tasks.

Excellent Machine Learning Project Ideas on Computer Vision for Practice

There are many Computer Vision projects, and all of them seem fascinating. The following ones are interesting and easy enough for a beginner to get hands-on experience in Computer Vision.

  1. Face Detection and Recognition

Face detection is one of the most popular projects in Computer Vision. It can be done in a few lines using the frontal face detector of the Dlib library. Dlib also helps to detect the landmarks of a face.

Face Detection and Recognition

2. Object Detection and Classification

Object detection is easy to implement using pre-trained models, which are available through TensorFlow and PyTorch. Reusing a pre-trained model and adapting it to a new task is called Transfer Learning. In this task, the main component is the data needed to train a customized object detection and classification model.

3. Sketch using Geometrical figure

OpenCV provides functions to create geometrical figures like rectangles, circles, and many others. Using these Geometrical figures with the correct placement and color, you can form a meaningful figure. 

Sketch of Doraemon

4. BarCode and QR Code Detection and Recognition

A barcode and QR code detector and recognizer can be built using OpenCV and the pyzbar library. It is a straightforward and exciting project that recognizes barcodes and QR codes and returns their decoded content, such as links.

5. Number Digit Detection and Recognition

The dataset for this task is already available and is known as MNIST. A model can easily be trained on it to recognize handwritten digits. It is one of the most basic projects, done by almost every machine learning beginner.

Keep Learning. There is a lot to explore in the field of computer vision. These computer vision projects will give you a good start as a beginner, and you can explore other advanced projects to enhance your skills.


ombirsharma
